#deepseek

23 items

Why has ChatGPT become so annoying and disagreeable? (www.reddit.com via reddit) 156 pts·126 replies· 4d

deepseek chatgpt claude
DeepSeek Updated their repo DeepGEMM testing Mega MoE (www.reddit.com via reddit) 87 pts·8 replies· 10h

https://github.com/deepseek-ai/DeepGEMM/pull/304 https://preview.redd.it/vcmqwmvzijvg1.png?width=1014&format=png&auto=webp&s=76b1739925f0699b0763aa7814614dd40329c41e https://github.com/deepseek-ai/DeepGEMM/commit/a050d09461e86eb6bba35a8c74…

moe deepseek
Guys we have to change the pelican test (www.reddit.com via reddit) 48 pts·64 replies· 1d

So I have been seeing more of those pelican-on-a-bike SVG tests, and while they work, I feel like (and maybe you guys do too) they are getting kinda benchmaxxed, so we should switch things up soon. This is my idea: generate me an HTML SVG of…

↯ MiniMax 2.7 deepseek glm minimax+4
We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch (www.reddit.com via reddit) 38 pts·15 replies· 2d

deepseek gpt-5 sonnet+2
[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book (www.reddit.com via reddit) 20 pts·3 replies· 1d

I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: LayerNorm → RMSNorm, learned positional encodings → RoPE, GELU →…

moe deepseek llama
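The first swap in the book excerpt above (LayerNorm → RMSNorm) can be illustrated with a minimal, dependency-free Python sketch. This is not the book's code: the function names `layer_norm` and `rms_norm` and the epsilon defaults are illustrative assumptions. The difference it shows is that LayerNorm subtracts the mean and adds a learned bias, while RMSNorm only rescales by the root-mean-square of the input.

```python
import math
from typing import List, Optional


def layer_norm(x: List[float], gain: Optional[List[float]] = None,
               bias: Optional[List[float]] = None, eps: float = 1e-5) -> List[float]:
    """LayerNorm: center on the mean, scale by the std dev, then apply gain and bias."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    g = gain if gain is not None else [1.0] * n
    b = bias if bias is not None else [0.0] * n
    return [(v - mean) * inv_std * gi + bi for v, gi, bi in zip(x, g, b)]


def rms_norm(x: List[float], gain: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    """RMSNorm: no mean subtraction and no bias, only division by the root-mean-square."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    g = gain if gain is not None else [1.0] * n
    return [v / rms * gi for v, gi in zip(x, g)]


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0]))  # zero-mean, unit-variance output
    print(rms_norm([3.0, 4.0]))         # unit mean-square output
```

For an input that already has zero mean, the two functions return nearly identical values, which gives some intuition for why the cheaper RMSNorm can stand in for LayerNorm.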
Single-question LLM comparison (www.reddit.com via reddit) 10 pts·1 reply· 8w

haiku grok deepseek+6
Best LLM for logic/spatial reasoning on small context inputs? (www.reddit.com via reddit) 2 pts·2 replies· 21h

My system has 32GB RAM and 8GB VRAM. I tried out DeepSeek-R1-Distill-Qwen-7B-Q6_K_L.gguf and it was vastly inadequate for what I wanted, so I'm looking for other suggestions.

deepseek qwen
Show HN: A book that builds GPT-2, Llama 3, DeepSeek from scratch in PyTorch (news.ycombinator.com via hn) 2 pts·1 reply· 1d

I'm a software engineer who works with LLMs professionally (Forward Deployed Engineer at TrueFoundry). Over the past year I built up implementations of five LLM architectures from scratch and wrote a book around them.

deepseek llama openai
Claude down? TokenMonopoly will help you find the best deals in AI subs (tokenmonopoly.com via hn) 2 pts· 3d

TokenMonopoly Live leaderboard of AI API deals — pricing, subscriptions, and SWE-bench scores for Claude, GPT, Gemini, Kimi, DeepSeek, Llama and more. Compare 27 benchmarked models across 96 hosts by price-per-performance, refreshed daily.

swe-bench deepseek llama+2
Is my 'Retry Tax' math correct for DeepSeek V3/V4 agents? (Project Feedback) (www.reddit.com via reddit) 2 pts·14 replies· 5w

deepseek agentic
Feedback on iOS app with local AI models (www.reddit.com via reddit) 1 pt· 8h

Hey everyone, I just shipped an iOS app that runs local AI models. It currently has 12 models: Gemma 4, Llama 3.3, Qwen3, DeepSeek R1 Distill, Phi-4, etc.

deepseek llama gemma+1
For AI agents: is per‑token pricing killing your budget? Looking for feedback on time‑based subscriptions. (www.reddit.com via reddit) 1 pt·3 replies· 1d

Hey r/AI_Agents, I run an inference service (cheapestinference.com) and we're exploring a different pricing model that might be more predictable for agent workloads. Instead of per‑token billing, we offer **dedicated 8‑hour time windows**…

deepseek qwen llama
I built an MCP server that gives Claude Code image/video generation, web search, and smart multi-model routing (www.reddit.com via reddit) 1 pt·2 replies· 2d

deepseek gemini mcp+2
Claude Code with Pro subscription + OpenRouter in parallel — what's the cleanest setup? (www.reddit.com via reddit) 3 replies· 8h

Hi there, I have a Claude Pro subscription and use Claude Code daily. I'd also like to use Claude Code routed through my OpenRouter API key so I can experiment with other models (GLM-5.1, DeepSeek, Kimi, Gemini, etc.) — without giving up m…

↯ GLM 5.1 deepseek glm sonnet+3
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help (www.reddit.com via reddit) 8 replies· 9h

Hey all. This just got delivered yesterday.

↯ Qwen 2.5 deepseek ollama qwen+1
Deepseek-r1 thinks for 30 minutes? (www.reddit.com via reddit) 9 replies· 20h

I was trying to ask a question about coding using DeepSeek-R1-0528-Qwen3-8B-Q4_K_M, and the thinking took 30 minutes??? https://preview.redd.it/kex3fgg4lgvg1.png?width=277&format=png&auto=webp&s=5f7e7cdc8502b935ea8b8fb83e0e4af60c3c4533 I h…

↯ Qwen 3 deepseek
DeepSeek V4 reportedly drops late April. 1M context, multimodal, Claude-level coding. (www.reddit.com via reddit) 21 replies· 1d

Leaks point to a late-April release. Key specs: 1M-token context window; native multimodal (image/video input); projected ~85% SWE-Bench Verified (ties or beats Claude Opus 4.6); base model remains free.

swe-bench deepseek opus+1
Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026. (www.reddit.com via reddit) 9 replies· 1d

After months of testing, I finally have a local setup that doesn't make me want to go back to the API. Hardware: RTX 3090 (24GB VRAM). Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M. Inference: llama.cpp…

deepseek ollama llama+1
Looking for people with different hardware to help benchmark local LLM behavioral reliability (www.reddit.com via reddit) 3 replies· 2d

I've been working on measuring how LLMs actually behave (not what they know) across different hardware setups. Things like: does the model cave when you push back on a correct answer?

↯ Qwen 3.5 mistral deepseek
AI lied to me about a video game existing, so I sued it in the High Court of the Internet and got 2 settlement games (www.reddit.com via reddit) 6 replies· 3d

TL;DR: Claude hallucinated "Champions Career Mode." I threatened to sue Anthropic. Claude admitted guilt and built me a custom HTML5 game as settlement.

deepseek anthropic claude
4 LLM group chat (www.reddit.com via reddit) 8 replies· 3d

grok deepseek claude
Why can't most open-source models answer this question, while most closed-source models can most of the time? (www.reddit.com via reddit) 30 replies· 3d

grok deepseek glm+5
Is 32GB Mac enough for engineering/coding, or stick to Claude? (www.reddit.com via reddit) 13 replies· 3d

↯ Qwen 2.5 deepseek sonnet claude
