So apparently the model gets beaten by Qwen 3.6 on every benchmark reported by Cohere Labs. You get lower RAM usage (assuming model offload) and slightly better performance, in exchange for, imo, significantly lower output quality.
model
DeepSeek-V4-Pro
huggingface.co/deepseek-ai/DeepSeek-V4-Pro ↗
78864 downloads, 2553 likes, text-generation, transformers
from the model card
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Technical Report
Introduction
We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models, DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization:
- Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
- Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
- Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followe…
discussions
- DeepSeek 4 30 ongoing since 2026-05-29
- DeepSeek 4 181 2026-04-22 – 2026-06-01
recent items
DOA model by Cohere Labs (www.reddit.com via reddit)
Running DeepSeek-V4-Flash on a Raspberry Pi (twitter.com via hn) I ran DeepSeek-V4-Flash on a Raspberry Pi 5 (8GB edition) by streaming model weights from a PCIe attached NVMe SSD. Codex (GPT-5.5 xhigh) and Claude Code (Opus 4.8 max) drove…
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention (arxiv.org)
DeepSeek V4 Pro beats GPT-5.5 Pro on precision (runtimewire.com via hn) DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: Model A was tighter, more literal, and more reliable under constraints, while Model B was good but a little too w…
DStudio – local DeepSeek V4 with a design studio, reachable from your phone (github.com via hn) A native, local-first desktop app for DeepSeek V4: chat, a coding agent and a design studio, all running on your Mac. Nothing leaves the device.
Share your agentic LLMs and average cost ($/MTokens) (www.reddit.com via reddit)
Show HN: One API Key for 45 AI Models – Pay per Token, OpenAI Compatible (modelhub-api.com via hn) DeepSeek V4 math score equals GPT-5.5 (91) and trails by just 4-6 points in other categories, at 97% lower cost. Is the AI quality as good as GPT?
Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper (dnhkng.github.io via reddit) I needed a smarter model for my local Hermes Agent setup, so I moved to DeepSeek v4 Flash. First things first: running 4 concurrent threads on vLLM, I can hit ~400 tok/s. 400 x 60 x 60 x 24 x 30 is ~1B tokens per month!
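The monthly figure in the excerpt above is plain arithmetic. A minimal sketch, assuming the ~400 tok/s aggregate rate is sustained around the clock for a 30-day month (the function name `tokens_per_month` is illustrative and does not come from the post):

```python
# Back-of-envelope check of the "~1B tokens per month" figure.
# Assumption: the aggregate throughput is sustained 24/7 with no downtime.

def tokens_per_month(tokens_per_second: float, days: int = 30) -> float:
    """Return total tokens generated over `days` at a constant rate."""
    seconds = 60 * 60 * 24 * days  # seconds in the period
    return tokens_per_second * seconds

total = tokens_per_month(400)
print(f"{total:,.0f} tokens/month")  # prints "1,036,800,000 tokens/month"
```

Real throughput would likely be lower, since the estimate ignores idle time, prompt processing, and restarts.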
Mimo v2.5 is better deal than DeepSeek v4 flash (news.ycombinator.com) So hear me out. Not only is mimo v2.5 better than dsv4 flash on almost all benchmarks, but also on pricing.
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected (www.reddit.com via reddit) Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…
Command Code - confusing messages (www.reddit.com via reddit) Hi, I'm a little confused. I was doing a code review of one of my repositories, mainly just testing out different models to see what came back.
planning with composer 2.5 executing with deepseek v4 flash (www.reddit.com via reddit) I am thinking of buying the 20-dollar pro. Does this approach make sense?
Alternate to ChatGPT Pro (www.reddit.com via reddit) I had briefly used ChatGPT pro feature - in the chat app. It was quite amazing.
DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162) (www.reddit.com via reddit) In case you're not aware already, the DeepSeek V4 series is finally getting supported on llama.cpp with this PR! The PR is at a very early stage right now, so only try it if you're consciously willing to experiment out of curiosity and acc…
DeepSeek V4 managed to reverse engineer Teamspeak's Licensing System with $3.88 (old.reddit.com via hn)
DeepSWE Audit: DeepSeek-v4-pro results are unreliable (github.com via hn) DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks drawn from active open-source repositories. The benchmark includes 113 tasks across TypeScript, Go, Python, JavaScript…
Bringing Up DeepSeek-V4-Flash on AMD MI300X (fergusfinn.com via hn) At Doubleword we are building an inference cloud designed for volume. To do that we have to reckon with the enveloping compute shortage.
DeepSeek-V4-Flash (official FP8) running across 2x DGX Spark (forums.developer.nvidia.com via hn) I didn’t create this recipe you guys did but I was finally able to find it and get Deepseek v4 Flash working with 200k Context on 2 Nodes. Sharing this since I couldn’t find a confirmed end-to-end recipe for the official DeepSeek-V4-Flash…
How DeepSeek's architecture is shattering Silicon Valley's token moat (venturebeat.com via hn) DeepSeek’s announcement over the weekend that it has made its 75% price cut permanent on its flagship V4 Pro model is a disruptive assault on the capital-heavy business models of Silicon Valley’s frontier labs. The reduction on DeepSeek V4…
The first framework that can post train DeepSeek V4-pro on a single-node? (news.ycombinator.com) Hi all, We just opensourced a project called Orbit, which can RL post train trillion scale LLMs like deepseek v4. We found it pretty cool!
SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More (swe-rebench.com via reddit) Hi all, Sorry for going missing — we’ve been collecting a larger, higher-quality set of more complex tasks. We’re excited to share a major leaderboard update covering the past three months.
Show HN: Free open source coding models in Slack (www.runcord.com via hn) Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep Every signup get…
Did DeepSeek v4 suddenly become more expensive? (imgur.com via hn)
Show HN: Train Claude Code's replacement (ds4 and pi and aoe) (github.com via hn) Remember how Meta monitored employee activity closely for a few months, and then had a bunch of layoffs related to AI efficiency? (oh right that was like 3 days ago).
Looking for a working Deepseek-v4-Flash quant (www.reddit.com) Best I tried so far is https://huggingface.co/nsparks/DeepSeek-V4-Flash-FP4-FP8-GGUF with the custom llama.cpp fork, but it suffers from low quality and random incoherent output. VLLM wouldn't support anything other than H100s for DS4.
Has anyone gotten their editor to work with Deepseek v4 FIM? (www.reddit.com) I tried to follow the docs here https://api-docs.deepseek.com/guides/fim_completion to get it up and running in VSCode or Zed with my api key but it doesn't work, I think it's got something to do with the request body, has anyone got autoc…
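For the request-body question in the post above, a fill-in-the-middle request generally carries the text before the cursor and the text after it as separate fields. A rough sketch of such a body, assuming an OpenAI-style completions payload with `prompt` and `suffix` fields; the endpoint URL and the model id here are assumptions and placeholders, so check them against the linked guide before relying on them:

```python
import json

# Assumed endpoint; verify against the FIM guide linked in the post.
FIM_URL = "https://api.deepseek.com/beta/completions"
# Placeholder, not a real model name; substitute the id from the docs.
MODEL_ID = "your-model-id"

def build_fim_body(prefix: str, suffix: str, max_tokens: int = 64) -> str:
    """Serialize a completion request asking the model to fill the gap
    between `prefix` (text before the cursor) and `suffix` (text after it)."""
    body = {
        "model": MODEL_ID,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

# Build (but do not send) a request body for a small Python snippet.
print(build_fim_body("def fib(n):\n", "    return fib(n - 1) + fib(n - 2)\n"))
```

Comparing this payload with what the editor extension actually sends (for example, chat-style `messages` instead of `prompt` and `suffix`) may help narrow down where the mismatch is.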
DeepSeek V4 Flash at 8.4 tok/s on 3×3090: patching the GGUFs that won't load on cchuter's llama.cpp fork (www.reddit.com) My apologies if anything does not make sense, I literally don't know what I am doing, I'm not a programmer, just a simple vibe coder with a Claude subscription. That said, if you have 200gb of sys ram+vram and want to run deepseek v4 flash…
GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) (www.reddit.com) Trying to figure out the right box for my team and wanted to see if anyone had any clue which would be a better fit or if it is not worth our time in our budget. Situation: 5 of us doing agentic coding (lots of long context getting re-sent…