model

DeepSeek-V4-Pro

huggingface.co/deepseek-ai/DeepSeek-V4-Pro ↗

78,864 downloads · 2,553 likes · text-generation · transformers

from the model card

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Technical Report

Introduction

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models, DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.

Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.

Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.

We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followe…
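The parameter counts and efficiency percentages quoted in the model card excerpt can be turned into ratios with a few lines of arithmetic. The sketch below is only an illustration of those quoted figures; the helper names `active_fraction` and `reduction_factor` are hypothetical and not part of any DeepSeek tooling.

```python
# Figures quoted in the model card excerpt (parameter counts in billions).
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of an MoE model's parameters that are activated per token."""
    return active_b / total_b


def reduction_factor(relative_cost: float) -> float:
    """Convert 'needs only X of the baseline' into an 'N times less' factor."""
    return 1.0 / relative_cost


for name, p in MODELS.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")

# Quoted 1M-token figures for DeepSeek-V4-Pro relative to DeepSeek-V3.2.
print(f"Single-token FLOPs: {reduction_factor(0.27):.1f}x lower")
print(f"KV cache: {reduction_factor(0.10):.1f}x lower")
```

On the quoted numbers, roughly 3.1% of the Pro model's parameters and 4.6% of the Flash model's parameters are active for any given token.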

discussions

DeepSeek 4 9 ongoing since 2026-06-22
DeepSeek 4 20 2026-06-11 – 2026-06-21
DeepSeek 4 37 2026-05-29 – 2026-06-13
DeepSeek 4 181 2026-04-22 – 2026-06-01

recent items

Show HN: DeepSeek Flash inverted the economics of agent products (www.rtrvr.ai via hn) +2 14h

There is an adversarial relationship between developers and big model labs. Model labs charged developers higher API prices to subsidize their own agent harness offerings.

↯ Copilot ↯ DeepSeek 4 deepseek copilot sonnet+2
We got DeepSeek-V4-Pro serving in 20 seconds (inferize.ai via hn) +3 16h

Inferize is building highly optimized, elastic inference for AI workloads. Ridiculously fast, efficient LLM serving that scales with demand.

↯ DeepSeek 4 deepseek
Ask HN: How to avoid LLMs struggling with Lisp parens? (news.ycombinator.com) +21 2d

LLMs seem to love certain languages (Python, Bash, etc.), but they all seem to struggle with Lisp (e.g. Racket or Emacs Lisp).

↯ DeepSeek 4
DeepSeek V4 Flash optimized framework and model variants for DGX Spark (github.com via hn) +21 3d

ds4 - Mixed NVFP4 serving of DeepSeek V4 Flash on the NVIDIA Spark family (GB10) ⚠️ This GitHub repository is for archival / mirror purposes only. Active development happens at git.kokoham.com/sleepy/ds4-nvfp4-spark.

↯ DeepSeek 4 deepseek
Microsoft considers DeepSeek as OpenAI costs mount (www.digitimes.com via hn) +4 3d

Microsoft is reportedly considering introducing a fine-tuned version of the Chinese open-source model DeepSeek V4 into its enterprise artificial intelligence (AI) tool Copilot Cowork, as a lower-cost alternative to models from OpenAI and A…

↯ Copilot ↯ Cowork ↯ DeepSeek 4 cowork deepseek copilot+1
Cheapest way to run Claude Opus 4.8 on a <$30 monthly budget? (www.reddit.com via reddit) 4d

Which option gives the most actual Opus 4.8 usage volume: Kiro Pro, Claude Pro or something else? My monthly budget is $30.

↯ Opus 4.8 ↯ Sonnet 4.6 ↯ DeepSeek 4 deepseek sonnet gemini+1
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence (arxiv.org) 7d

We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both su…

↯ DeepSeek 4 moe deepseek
GLM 5.2 via Claude Code is the first non-Claude model that feels close to Opus (www.reddit.com via reddit) 8d

I’ve been using GLM 5.2 with Claude Code through its Anthropic-compatible API endpoint. I’ve tested it on various projects, including but not limited to database development, backend payment API work, backend and frontend debugging, Larave…

↯ DeepSeek 4 ↯ Glm glm deepseek opus+2
Using Claude Opus as planner + DeepSeek as worker in Claude Code — anyone solved the single-session routing problem? (www.reddit.com via reddit) 8d

I've been running a hybrid planner/worker setup with Claude Code and hit a tricky constraint I'm hoping the community has thoughts on. The setup: Planner: Claude Opus for architecture, planning, and review. Worker: DeepSeek V4 Pro / DeepSe…

↯ DeepSeek 4 deepseek opus anthropic+1
Native Coding Agent Optimized for Local LLM and DeepSeek v4 with Vector Memory (code.intellios.ai via hn) +1 9d

cwcode: a terminal coding agent built around DeepSeek V4 Pro, Qwen3.6‑27B, Kimi, Azure, and anything else that speaks OpenAI’s chat API. Written in Go.

↯ DeepSeek 4 deepseek openai
DeepSeek V4 Pro at 5% the cost of Claude – what it takes to close the gap (howardchen.substack.com via hn) +8 9d

Hash-anchored edits, a sticky prefix cache, and the autonomous loops we run on production code. We’ve been using DeepSeek V4 Pro as our daily-driver coding model for…

↯ DeepSeek 4 deepseek
Kimi 2.7 vs. DeepSeek Coder (simpletechguides.com via hn) +31 10d

Kimi K2.7 Code vs MiMo Code vs DeepSeek V4 Pro: Three Open-Source Coding Tools Compared. Three Chinese AI labs shipped major coding tools in the same window this spring: Moonshot AI released Kimi K2.7 Code, Xiaomi shipped MiMo Code, and Dee…

↯ DeepSeek 4 deepseek
DeepSeek-V4 Can't Read Images? I Made It Read (www.dataleadsfuture.com via hn) +2 11d

Don't wait for a multimodal model; you can use it now. Have you ever had that frustrating moment: you are coding with deepseek-v4 in OpenCode, your code throws an error, you want to…

↯ DeepSeek 4 deepseek
International Market Retention Strategy After the Fable 5 Export Ban (www.reddit.com via reddit) 11d

Like many of you, I lost access to Fable 5 on June 12. The next day, I co-authored a strategy paper with Claude addressing the core business problem: how does Anthropic retain its international market now that cloud-only deployment has bee…

↯ DeepSeek 4 deepseek anthropic
Ask HN: Which cheap Chinese LLM are you using? (news.ycombinator.com) +4 12d

In the last month or two, starting with DeepSeek V4 Pro, quite a few low-priced Chinese models have come out. Their performance looks more or less similar to me: Mimo V2.5 Pro, MiniMax M3, and the just-released GLM 5.2, etc.

↯ Glm ↯ Minimax ↯ DeepSeek 4 minimax glm deepseek
Fable 5 Max confidently wrong about PDF encryption status (www.reddit.com via reddit) 2w

I just ran into a bizarre hallucination with Fable 5 Max regarding file analysis. I uploaded several PDFs to Fable 5 Max, and for two of them Claude completely refused to process them, claiming the files were password-protected.

↯ Hallucination ↯ DeepSeek 4 hallucination deepseek
How can Deepseek v4 top the coding leaderboards and still sit 8 months behind the frontier? (www.reddit.com) 2w

Two numbers on this model that don't sit comfortably with each other. The Pro config posts coding scores near the top of every board, 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench.

↯ Swe Bench ↯ DeepSeek 4 swe-bench gpt-5 deepseek+1
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention (www.reddit.com via reddit) 2w

Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Ne…

↯ DeepSeek 4 deepseek
Deepseek v4 pro (www.reddit.com via reddit) 2w

Hello, I've run out of Pro+. Is it possible to use DS4 in Cursor IDE? Thanks.

↯ DeepSeek 4 deepseek cursor
Bit of a lull or Winter is Coming? (www.reddit.com via reddit) 2w

It feels as though we’re at an inflection point and I was wondering what others’ take is on the current situation: On the frontier end we have OpenAI and Anthropic gearing up for their IPO, so it’s all Mythos and wow and it seems plausible…

↯ Mistral ↯ Anthropic Mythos ↯ DeepSeek 4 mistral mythos openai+1
Can I finetune Deepseek V4-flash with two rtx pro 6000s (www.reddit.com via reddit) 2w

Well, I know it may be very tight on 192GB. However, is there any framework for fine-tuning DS4-flash with 4-bit QLoRA?

↯ DeepSeek 4 deepseek
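
The "very tight on 192GB" concern in the post above can be checked with a back-of-the-envelope estimate. This is a rough sketch, assuming the 284B parameter count from the model card excerpt, exactly 4 bits per weight, and decimal gigabytes. It ignores quantization metadata, LoRA adapter weights, optimizer state, activations, and KV cache, so it does not show whether fine-tuning is actually feasible. The function name is hypothetical.

```python
def quantized_weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal GB.

    Assumption: every parameter is stored at `bits_per_param` bits and
    quantization scales/zero-points are ignored.
    """
    return params_billion * bits_per_param / 8


TOTAL_VRAM_GB = 192        # figure stated in the post (two GPUs combined)
FLASH_PARAMS_BILLION = 284  # DeepSeek-V4-Flash, per the model card excerpt

weights_gb = quantized_weight_gb(FLASH_PARAMS_BILLION, bits_per_param=4)
headroom_gb = TOTAL_VRAM_GB - weights_gb

print(f"4-bit weights: ~{weights_gb:.0f} GB")
print(f"Left for everything else: ~{headroom_gb:.0f} GB")
```

Under these assumptions the base weights alone take about 142 GB, leaving about 50 GB for everything the estimate leaves out.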
DOA model by Cohere Labs (www.reddit.com via reddit) 2w

So apparently the model gets beaten by qwen 3.6 on every benchmark reported by cohere labs. You are getting lower RAM (considering model offload) usage and slightly better performance for imo significantly less output quality.

↯ DeepSeek 4 deepseek qwen
Running DeepSeek-V4-Flash on a Raspberry Pi (twitter.com via hn) +1 2w

I ran DeepSeek-V4-Flash on a Raspberry Pi 5 (8GB edition) by streaming model weights from a PCIe-attached NVMe SSD. Codex (GPT-5.5 xhigh) and Claude Code (Opus 4.8 max) drove…

↯ DeepSeek 4 gpt-5 deepseek codex+2
Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper (dnhkng.github.io via reddit) 2w

I needed a smarter model for my local Hermes Agent setup, so I moved to DeepSeek v4 Flash. First things first: running 4 concurrent threads on vLLM, I can hit ~400 tok/s. 400 x 60 x 60 x 24 x 30 is ~1B tokens per month!

↯ DeepSeek 4 vllm deepseek
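
The monthly-volume arithmetic in the post above is easy to reproduce. The sketch below assumes the quoted ~400 tok/s is sustained around the clock for a 30-day month, which is an idealization; the function name is hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def tokens_per_month(tokens_per_second: float, days: int = 30) -> float:
    """Tokens generated if the given throughput is sustained continuously."""
    return tokens_per_second * SECONDS_PER_DAY * days


monthly = tokens_per_month(400)
print(f"{monthly:,.0f} tokens per month (~{monthly / 1e9:.2f}B)")
```

At a constant 400 tok/s this comes to 1,036,800,000 tokens, which matches the post's "~1B" figure.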
Share your agentic LLMs and average cost ($/MTokens) (www.reddit.com via reddit) 2w

↯ DeepSeek 4 deepseek opus agentic
DStudio – local DeepSeek V4 with a design studio, reachable from your phone (github.com via hn) +1 2w

DStudio: a native, local-first desktop app for DeepSeek V4 with chat, a coding agent and a design studio, all running on your Mac. Nothing leaves the device.

↯ DeepSeek 4 deepseek
Mimo v2.5 is better deal than DeepSeek v4 flash (news.ycombinator.com) +1 2w

So hear me out. Not only is Mimo v2.5 better than DeepSeek v4 Flash on almost all benchmarks, but its pricing is better too.

↯ DeepSeek 4 deepseek
Show HN: One API Key for 45 AI Models – Pay per Token, OpenAI Compatible (modelhub-api.com via hn) +2 2w

DeepSeek V4 math score equals GPT-5.5 (91) and trails by just 4-6 points in other categories — at 97% lower cost. Is the AI quality as good as GPT?

↯ DeepSeek 4 gpt-5 deepseek openai
DeepSeek V4 Pro beats GPT-5.5 Pro on precision (runtimewire.com via hn) +8412 2w

DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: Model A was tighter, more literal, and more reliable under constraints, while Model B was good but a little too w…

↯ DeepSeek 4 gpt-5 deepseek
