https://api-docs.deepseek.com/ https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
#deepseek
401 items
DeepSeek v4 (api-docs.deepseek.com via hn)
Deepseek V4 AGI confirmed (www.reddit.com)
Deepseek V4 Flash and Non-Flash Out on HuggingFace (www.reddit.com) https://huggingface.co/collections/deepseek-ai/deepseek-v4
DeepSeek-v4 has a comical 384K max output capability (www.reddit.com) I was shocked when I saw that spec, immediately went to the website, and asked it to make a comprehensive single-HTML web OS, and it indeed generated a single 100KB HTML file for me... I'm speechless. https://preview.redd.it/6zcbzbkvj3xg1.png?width=287…
2x 512gb ram M3 Ultra mac studios (www.reddit.com)
DeepSeek confirms Huawei-based V4 inference: "After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly." (www.reddit.com)
Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category (www.reddit.com)
So... has anyone actually figured out whose model Elephant Alpha is yet? (www.reddit.com)
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence (huggingface.co via hn) Technical Report. Introduction: We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro wi…
Why has ChatGPT become so annoying and disagreeable? (www.reddit.com) Something I’ve noticed is before the new model, people complained that ChatGPT was “too agreeable” and would glaze you for anything. But now I’ve noticed that it’s the complete opposite and it looks like ChatGPT is disagreeing just to disa…
DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals (www.reddit.com) https://www.bloomberg.com/news/articles/2026-05-22/deepseek-founder-declares-agi-goal-as-10-billion-round-advances
DeepSeek Announces Permanent Price Cut of 75% after Promotion Period (www.reddit.com)
Tested Deepseek v4 flash with some large code change evals. It absolutely kills with tool use accuracy! (www.reddit.com) Did some test tasks with v4 flash. The context management, tool use accuracy and thinking traces all looked excellent.
Takeaways & discussion about the DeepSeek V4 architecture (www.reddit.com) Spent the morning looking at the V4 tech report. The benchmarks are getting deserved attention, but I think the architecture is also worth digging into.
No Multimodality yet in DeepSeek-V4. But I'll wait. (www.reddit.com) I hope they include it in their next v4 release. Source: DeepSeek_V4_Technical_Report
DeepSeek-V4 Drops: Open-Source Push Toward Cheaper, Long-Context AI. (www.reddit.com) Source: https://x.com/pankajkumar_dev/status/2047552208175354229?s=20
Recent open models from the last 6 months (Nov 2025 - Apr 2026) (www.reddit.com) I created this chart with recent open models from the last 6 months. A few might possibly be older than that.
Price wars begin. MiMo 2.5 Pro now costs the same as DeepSeek V4 Pro (www.reddit.com)
DeepSeek Updated their repo DeepGEMM testing Mega MoE (www.reddit.com) https://github.com/deepseek-ai/DeepGEMM/pull/304 https://preview.redd.it/vcmqwmvzijvg1.png?width=1014&format=png&auto=webp&s=76b1739925f0699b0763aa7814614dd40329c41e https://github.com/deepseek-ai/DeepGEMM/commit/a050d09461e86eb6bba35a8c74…
Deepseek V4 Pro costs 15x as much as V3.2 to run the Artificial Analysis bench, more than Gemini 3.1 Pro (www.reddit.com) Major performance jump though. Worth it?
DeepSeek V4 Pro beats GPT-5.5 Pro on precision (runtimewire.com via hn) DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: Model A was tighter, more literal, and more reliable under constraints, while Model B was good but a little too w…
DeepSeek V4 Pro underwhelms on Arena (crowdsourced user preference benchmark, not a capability benchmark) (www.reddit.com)
Deepseek V4 flash (high) rivals Gemini 3 flash at 1/5th the cost (www.reddit.com)
coding is basically solved for the boring 90% of tasks (www.reddit.com) just mass refactored a 120 file FastAPI service. 400 steps, 2M tokens, $3 total, zero human input.
Gen AI web traffic share update. Main takeaways: Claude and Gemini continue to grow; ChatGPT moves closer to the 50% mark. (www.reddit.com) 12 months ago: ChatGPT 77.6%, Gemini 7.27%, DeepSeek 6.01%, Grok 3.17%, Perplexity 1.75%, Copilot 1.56%, Claude 1.37%. 6 months ago: ChatGPT 69.5%, Gemini 15.9%, DeepSeek 4.06%, Grok 3.31%, Perplexity 2.22%, Claude 2.12%, Copilot 1.97%…
DeepSeek V4 Pro at 75% off until 31 May (api-docs.deepseek.com via hn) Models & Pricing The prices listed below are in units of per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark.
Deepseek flash seems like a very good replacement for Haiku at the very least (www.reddit.com) We have a chat system which we use haiku for because it is mostly about tool calling and summarisation of them. But we have many tools with pretty complex input schemas, and stuff like gemma didn't cut it, so we went with haiku.
Tencent, Alibaba in Talks to Invest in DeepSeek at $20 Billion-Plus Valuation (www.reddit.com) https://www.reuters.com/world/asia-pacific/tencent-alibaba-talks-invest-deepseek-information-reports-2026-04-22/
GPT-5.5 improves over GPT-5.4 and overtakes Opus 4.6 to take the 2nd place behind Gemini 3.1 Pro on the Extended NYT Connections Benchmark (www.reddit.com) GPT-5.5: xhigh: 94.0→97.5 high: 93.6→96.9 medium: 92.0→95.0 no reasoning: 32.8→37.5 Kimi K2.6 improves over Kimi K2.5 (78.3→91.4) and becomes the #1 open weights model. DeepSeek V4 Pro improves over DeepSeek V3.2 (50.2→75.7).
Guys, we have to change the pelican test (www.reddit.com) So I have been seeing more of those pelican-on-a-bike SVG tests, and while they work, I feel like (and maybe you guys do too) they are getting kinda benchmaxxed, so we should switch things up soon. This is my idea: generate me a html svg of…
DeepSeek released 'Thinking-with-Visual-Primitives' framework (www.reddit.com) https://preview.redd.it/47r9qee44cyg1.png?width=1450&format=png&auto=webp&s=0d6f9687115be6ff96d0a194d95232ac0413a7e9 DeepSeek, in collaboration with Peking University and Tsinghua University, has released the paper "Thinking with Visual Pr…
Decreased Intelligence Density in DeepSeek V4 Pro (www.reddit.com) In the V3.2 paper, they mentioned: Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., more tokens) to match the output quality of models like Gemini 3.0-Pro. Future work wil…
DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost (esengine.github.io via hn) Open-source AI coding agent for your terminal. Engineered around DeepSeek
I have DeepSeek V4 Pro at home (www.reddit.com) Just wanted to share that I used u/LegacyRemaster slightly modified (Q4_K_M conversion support) DeepSeek V4 CUDA repo (based on u/antirez work) to convert and run Q4_K_M DeepSeek V4 Pro on my Epyc workstation (Genoa 9374F, 12 x 96GB RAM, s…
Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090 (www.reddit.com) So I've been messing around trying to get MTP working alongside TBQ4_0 (TurboQuant's lossless 4.25 bpv KV cache) on Qwen3.6-27B for my own use. So after a day of vibecoding I think I may have gotten something viable.
We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch (www.reddit.com) As part of our ongoing translation quality research at Alconost, we put six models through subtitle translation into six language pairs. At first glance the numbers told a clean story.
I catalogued every way local models break JSON output and built a repair library, here's what I found across 288 model calls (www.reddit.com) I've been running structured output prompts through a bunch of models on OpenRouter for the past few months — Llama 3, Mistral, Command R, DeepSeek, Qwen, and every other model on OpenRouter — alongside the usual closed-source suspects. 28…
DeepSeek Targets $50B Valuation in First Fundraising, Escalating Global AI Race (www.financership.com via reddit) Chinese artificial intelligence startup DeepSeek is preparing for its first-ever external fundraising round, and the numbers being discussed signal a dramatic shift in both its strategy and its global standing. The company could be valued…
Reports suggest DeepSeek is seeking $7.35 billion in funding and plans to release its V4.1 update next month. (www.reddit.com) DeepSeek Reportedly Seeking to Raise Over RMB 50 Billion ($7.35 Billion), Accelerating Its Commercialization and Monetization Strategy According to two people familiar with the matter, DeepSeek founder and CEO Liang Wenfeng plans to contri…
Most of my Claude usage was on work that didn't need Claude. Cut my bill 60x on bulk tasks with a tiny side model. (www.reddit.com) I looked at what was actually eating my Claude usage and it was embarrassing. Classifying files.
DeepSeek V4 isn't beating Opus, but it doesn't need to (www.reddit.com) DeepSeek V4 is not in the same league as GPT-5.5 or Opus 4.7. Benchmarks put it slightly below both of those, roughly on par with Opus 4.6.
DeepSeek has begun grayscale testing for DeepSeek with Vision (www.reddit.com)
DeepSeek-V4 Technical Report [pdf] (huggingface.co via hn) DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence. Introduction: We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro wi…
Xiaomi has released a MiMo V2.5 Pro model. It's apparently about as good as Deepseek V4 (but at different tasks) but is significantly cheaper. (x.com via reddit) Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Moonshot’s Kimi K2.6 - the current top open weights model. MiMo V2.5 Pro’s weights are expected to be released soon, which would make MiMo V2.…
Top open weight models like ds v4 pro max are still like 6-7 months if not more behind closed lab models (www.reddit.com) The best open-weight and/or non-American models like Deepseek v4 pro max and kimi k2.6 are still like 3-7 months, if not more, behind closed lab models. From ds's technical report, p. 5: "Nevertheless, its performance falls marginally short…
[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book (www.reddit.com) I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: LayerNorm → RMSNorm Learned positional encodings → RoPE GELU →…
I cut my AI API costs 99% by switching from Claude to DeepSeek (twitter.com via hn) AgentDB cost $200+/mo.
New LLM Position Bias Benchmark: does an LLM keep the same judgment when you swap the answer order? Judge models compare two lightly edited versions of the same story twice, with the order swapped. The median model flips in 45% of decisive case pairs. GPT-5.4 is worst at 66%. (www.reddit.com) More info, including charts, per-case metrics, raw judge outputs, and the parsed answer dump: https://github.com/lechmazur/position_bias This benchmark isolates one basic and frustrating failure mode. The model-average first-shown pick rat…
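To make the "flips in X% of decisive case pairs" figure concrete, here is a minimal sketch of how such a flip rate could be computed. It assumes each case is judged twice (original order and swapped order), that each verdict names the winning story by identity ("A" or "B") rather than by position, and that "decisive" means both verdicts picked a winner. These definitions are assumptions made for illustration, not the benchmark's published methodology (see the linked repo for that), and `CasePair` and `flip_rate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CasePair:
    # Winner picked when the stories are shown in the original order,
    # named by story identity ("A" or "B"); None means tie / no decision.
    first: Optional[str]
    # Winner picked when the same two stories are shown in swapped order.
    swapped: Optional[str]


def flip_rate(pairs: List[CasePair]) -> float:
    """Fraction of decisive pairs whose verdict changes after the order swap.

    Hypothetical definition: a pair is 'decisive' when both judgments
    name a winner. A position-consistent judge picks the same story twice.
    """
    decisive = [p for p in pairs if p.first is not None and p.swapped is not None]
    if not decisive:
        return 0.0
    flips = sum(1 for p in decisive if p.first != p.swapped)
    return flips / len(decisive)


if __name__ == "__main__":
    demo = [
        CasePair("A", "A"),  # consistent verdict
        CasePair("A", "B"),  # flipped
        CasePair("B", "A"),  # flipped
        CasePair(None, "A"),  # not decisive, excluded
    ]
    print(flip_rate(demo))
```

In this toy input, three pairs are decisive and two of them flip, so the rate is about 0.67. A judge with no position bias would score 0.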
Tencent Hy 30B/7B/1.8B (www.reddit.com) from tencent: Hy-MT2 is a family of “fast-thinking” multilingual translation models designed for complex real-world scenarios. It includes three model sizes: 1.8B, 7B, and 30B-A3B (MoE), all of which support translation among 33 languages…
DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation: 85 tok/s @ 524k on 2× RTX PRO 6000 Max-Q (www.reddit.com) TL;DR: DeepSeek-V4-Flash running at 85.52 tok/s @ 524k ctx and ~111 tok/s @ 128k single-stream on 2× RTX PRO 6000 Max-Q pasta-paul's DeepSeek-V4-Flash-W4A16-FP8 quant is great, but its MTP head silently gets stripped at load time (HF trans…
Qwen3.6 huge quality gain from Q4 to Q6 for coding agent (www.reddit.com) So, last week I tried to update my unused local LLM setup. I had to stop using it because quality was too low and deepseek was too cheap.
DeepSeek v4 - Subjective vibes (www.reddit.com) I must say I am kinda torn about what to think of these models. On one hand they "ace" some questions; on others they sometimes behave genuinely weird.
Budget to run Deepseek V4 locally at FP4 precision (www.reddit.com) Just a question for fun/curiosity: in your opinion, if I had enough money, how much would be needed and what configuration would be required to run DeepSeek v4? Maybe not necessarily everything in VRAM, maybe something hybrid.
DeepSeek seeks $300M in first outside funding at $10B valuation (cryptobriefing.com via reddit) The Chinese AI startup is reportedly targeting at least $300 million after relying solely on funding from its hedge fund parent until now. DeepSeek is seekin…
Why aren't any top LLM providers investing in diffusion LLMs? (www.reddit.com) A year ago, I would’ve said diffusion LLMs were an interesting idea but still far from practical. They’re still pretty rough, but Mercury 2 now makes it seem like they might finally be getting close to usable.
DS4, a specialized inference engine for DeepSeek v4 Flash (twitter.com via hn) antirez @antirez Welcome to DS4, a specialized inference engine for DeepSeek v4 Flash. github.com/antirez/ds4 This project would have been impossible without the existence of llama.cpp and GGML and the work of @ggerganov and all the other…
Deepseek v4 pricing is genuinely silly, did the math and now i am questioning my entire stack (www.reddit.com) Hey 👋 Saw the tweet making the rounds about deepseek v4 being 35x cheaper than opus on input and 178x cheaper on cached tokens, and was sure it was hyperbole. Pulled the numbers anyway because i had nothing better to do.
I am not sure if I should be proud or not. (www.reddit.com) I managed to get working 4 sub-agents Qwen3.6 35b on dual rtx 3090, I am using deepseek as orchestrator. https://preview.redd.it/biksbgq0n81h1.png?width=783&format=png&auto=webp&s=cf8a4481c1ac439c3283925001c12841b8e6c2e7 They all working l…
DeepClaude – Claude Code agent loop with DeepSeek V4 Pro, 17x cheaper (github.com via hn) deepclaude Use Claude Code's autonomous agent loop with DeepSeek V4 Pro, OpenRouter, or any Anthropic-compatible backend. Same UX, 17x cheaper.
Single question llm comparison (www.reddit.com)
CAISI releases evaluation report: DeepSeek V4 becomes the most powerful model in China, but still lags about 8 months behind the US frontier (www.reddit.com) https://preview.redd.it/pz8qeln0auyg1.png?width=1400&format=png&auto=webp&s=00ee5218734cfae4783d702411d63e3a4c6bbc60 https://preview.redd.it/hem9mad5auyg1.png?width=1184&format=png&auto=webp&s=2a26fec2b49204e64b44a78b30902ab80f7df53c https…
DeepSeek Vision/Multimodal 👀 (www.reddit.com) https://preview.redd.it/bmc1bhz843yg1.png?width=871&format=png&auto=webp&s=fd63da4ec541111bf0a1d7d5f2c852ec8c994893 Finally... 🐋 with eyes 👀
The exact KV cache usage of DeepSeek V4 (www.reddit.com) Figure 1 of DSV4 paper seems to imply that DSV3.2 uses ~50GB at 1m context and DSV4 uses ~5GB: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf From my own calculations, the correct FP16 KV cache at 1m context s…
Bringing Up DeepSeek-V4-Flash on AMD MI300X (fergusfinn.com via hn) Bringing up DeepSeek-V4-Flash on AMD MI300X At Doubleword we are building an inference cloud designed for volume. To do that we have to reckon with the enveloping compute shortage.
SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More (swe-rebench.com via reddit) Hi all, Sorry for going missing — we’ve been collecting a larger, higher-quality set of more complex tasks. We’re excited to share a major leaderboard update covering the past three months.
Can a 5090 with qwen3.6 achieve > 3,000 tok/s? Bring your pitchforks (open-dllm) (www.reddit.com) So, background: these people (Fred Zhangzhi Peng, Shuibai Zhang, Alex Tong) worked on converting AR -> diffusion (it's already working from older models).
High VRAM local coding model — still Qwen 3.6 27B? (www.reddit.com) I’ve been using Qwen 3.6 27B and it’s amazing. Not exactly your Opus replacement, but great for small tasks and checking work.
Finally pioneering beyond the local 256k context window frontier! (www.reddit.com) The autocompact at 341.5k tokens is manually set and I'll be slowly pushing it back now I'm confident there's overhead for memory eviction of key values into cache. The question now is will the proposed fix complete in those remaining 16k…
China Limits Overseas Travel for AI Talent at DeepSeek, Alibaba, Private Firms (www.bloomberg.com via hn)
llama.cpp DeepSeek v4 Flash experimental inference (www.reddit.com) Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here there is the GGUF you can use to run the inference with "just" (lol) 128GB of RAM. The model, even quantized at 2 bit, looks very solid in my limited testing, a…
Hopefully deepseek will release engrams for the future models (www.reddit.com) Maybe for 4.1 or 4.2? Eventually maybe updatable engrams after engrams
SFT + DPO on open-sourced SLMs (www.reddit.com) Hey folks, this is for those who appreciate experimentation on open-sourced AI models. We fine-tuned open-sourced SLMs (3B and 7B parameters) with SFT + DPO against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…
Deep – CLI/REPL for generating and iterating on codebases using DeepSeek (github.com via hn) deep: a CLI/REPL for generating complete projects using the DeepSeek API. You give it a natural-language description and it generates the files, evaluates them, and learns from each run to improve the next ones.
GPT 5.5 (Codex) leading the future prediction race (www.reddit.com) Researchers from the Max Planck Institute recently released FutureSim, an environment in which agents are replayed a temporal slice of the web and are tasked with predicting real-world future events. In their environment, GPT 5.5 leads at…
PACT, head-to-head LLM negotiation benchmark. 20-round buyer-seller bargaining game: each round the AIs can message, the buyer submits a bid and the seller submits an ask. If bid ≥ ask, trade clears at the midpoint. Thousands of matchups. (www.reddit.com) PACT tests negotiation under partial information: persuasion, commitment, deception, anchoring, threats, and adaptation across repeated rounds. More info, game logs, charts: https://github.com/lechmazur/pact GPT-5.5, Opus 4.7, DeepSeek V4…
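The PACT clearing rule above (a trade clears at the midpoint when bid ≥ ask) is simple enough to sketch directly. The following is a minimal illustration, not PACT's actual harness. Three assumptions apply: the game ends at the first clearing round, the bids and asks are pre-recorded lists rather than LLM outputs, and messaging between agents is omitted. `clear_round` and `run_game` are hypothetical names.

```python
from typing import List, Optional, Tuple


def clear_round(bid: float, ask: float) -> Optional[float]:
    """Return the trade price if this round clears, else None.

    Rule from the PACT description: trade clears when bid >= ask,
    at the midpoint of the buyer's bid and the seller's ask.
    """
    if bid >= ask:
        return (bid + ask) / 2
    return None


def run_game(
    bids: List[float], asks: List[float], max_rounds: int = 20
) -> Optional[Tuple[int, float]]:
    """Play up to max_rounds rounds and return (round_number, price) for the first clear.

    Assumption (not from the source): the game stops at the first clear,
    and quotes are pre-recorded instead of generated by LLM agents.
    """
    for i, (bid, ask) in enumerate(zip(bids, asks)):
        if i >= max_rounds:
            break
        price = clear_round(bid, ask)
        if price is not None:
            return i + 1, price  # rounds are numbered from 1
    return None  # no deal within the round limit


if __name__ == "__main__":
    # The buyer raises the bid and the seller lowers the ask until they meet.
    print(run_game([50, 60, 70], [90, 80, 70]))
```

Here the quotes cross in round 3, when both sides sit at 70, so the result is `(3, 70.0)`. Splitting the surplus at the midpoint means neither side gains by being the one whose quote crosses first, which is part of what makes anchoring and bluffing interesting in this setup.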
Has anyone tried Zyphra 1 - 8B MoE? (www.reddit.com) https://x.com/ZyphraAI/status/2052103618145501459?s=20 Today we're releasing ZAYA1-8B, a reasoning MoE trained on u/AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size o…
Final Monster: 32x AMD MI50 32GB at 9.7 t/s (TG) & 264 t/s (PP) with Kimi K2.6 (www.reddit.com) 32 MI50 32GB setup moonshotai/Kimi-K2.6 int4 @ 9.7 tok/s (output of 136 tok) and 263 tok/s (input of 14564 tok) on vllm-gfx906-mobydick Github link of vllm fork: https://github.com/ai-infos/vllm-gfx906-mobydick Power draw: ~640W (idle) / ~…
Tencent, Alibaba to back DeepSeek at $20B+ valuation (techfundingnews.com via hn) DeepSeek, a Chinese AI lab that gained attention in early 2025, is in talks for its first external funding round at a valuation of more than $20 billion, says Reuters. Investor interest pushed the valuation above $20 billion in just 48 hou…
For Non-hallucinating work, MiMo 2.5 delivers (www.reddit.com) MIT license and fully open source. MiMo-V2.5-Pro was just 3 points from Opus 4.7 max and the normal V2.5 is only a step behind SOTA.
DeepSeek V4 API price reduced, limited-time discount of 75%. (www.reddit.com) https://preview.redd.it/qgqf66unacxg1.png?width=1144&format=png&auto=webp&s=9241d9c7b5aebb52f25c87f50520c2330852291c https://api-docs.deepseek.com/quick_start/pricing
DeepSeek lowers API prices by 75% while other AI labs increase prices 2–3x [video] (www.youtube.com via hn)
DeepSeek's 10T USD grand strategy (twitter.com via hn) Have you ever wondered, how DeepSeek may make money, and lot of it? They didn't come up with competitive coding plans like GLM, MoonShot and MiniMax.
DeepSeek to Make Permanent 75% Discount on Flagship AI Model (www.bloomberg.com via hn)
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser (www.reddit.com) Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.
Developing an open-source LLM from the ground up, from pretraining to RLHF (PPO/GRPO) (www.reddit.com) Hello, I have been working on creating an LLM from the ground up. It is based on the DeepSeek architecture, heavily optimized to reduce its VRAM footprint (GUM+muon). Currently this is the JSON schema I am using, which should suffice as to what currently…
People Don’t Need More AI Tools — They Need Focus (www.reddit.com) We are living in crazy AI times. Every week, big AI companies like OpenAI, Anthropic, NVIDIA, DeepSeek, etc.
DeepSeek nears $45B valuation as China's 'Big Fund' leads investment talks (www.ft.com via hn)
I hate this group but not literally (www.reddit.com) True story, I got interested in AI after seeing it at work and wanted to run models locally. I started with an M3 Ultra 96GB, quickly learned it was not enough for what I wanted, and kept upgrading hardware (including refurbished Mac Studi…
Show HN: Filling PDF forms with AI using client-side tool calling (copilot.simplepdf.com via hn) Hey HN! I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor.
Ubuntu silicon-optimized inference snaps for AI (canonical.com via hn) Canonical on 23 October 2025 Install a well-known model like DeepSeek R1 or Qwen 2.5 VL with a single command, and get the silicon-optimized AI engine automatically. London, October 23 – Canonical today announced optimized inference snaps,…
Reasoning model in voice agent? (www.reddit.com) I’m building a voice agent on livekit and I’m ripping my hair out. The problem is that I either use a moderate sized LLM and it responds in real time or I use a big / reasoning model and there is a huge delay before it responds and it's su…
First DeepSeek V4 Flash-Base-Int4 Quant (huggingface.co via hn) DeepSeek-V4-Flash-Base INT4 A real INT4 packed-storage quantization of deepseek-ai/DeepSeek-V4-Flash-Base — a 284 B-parameter Mixture-of-Experts model. Hero numbers | Metric | This release | Community Q4KM norm | |---|---|---| | MMLU (5 su…
DeepSeek is 17% of token volume, Anthropic is 65% of spend (Vercel gateway data) (vercel.com via hn) 6 min read Every month, AI Gateway routes tens of trillions of tokens between production applications and AI labs, giving us visibility into what AI usage actually looks like, separate from leaderboards and benchmarks. We publish the data…
Show HN: Free AI agent audit for Shopify catalogs (1.2M open captures) (aicatalogscore.com via hn) Burtsbeesbaby.com AI Catalog Score How well Burtsbeesbaby.com's 250 products would be recommended by ChatGPT, Claude, Perplexity, Gemini, Mistral, and DeepSeek. 77 / 100 B · Sometimes recommended Partial audit.
I built a local GUI for the TradingAgents framework — works with Ollama (www.reddit.com) https://preview.redd.it/i90oxxk7n03h1.png?width=1898&format=png&auto=webp&s=7d219c804fda7dfe122b84fcdb6d0d6883818c68 A while back I came across TradingAgents — a really cool multi-agent LLM stock analysis framework where like a dozen "agen…
MOOSE-Star (ICML 2026): 7B model + 108K-paper dataset for scientific hypothesis discovery (www.reddit.com) Disclosure first: I work on community at MiroMind. One of our researchers just dropped the full MOOSE-Star collection on Hugging Face — a 7B model post-trained for scientific hypothesis discovery, plus the dataset behind it.
Best practice for accurate translation at minimal cost? (www.reddit.com) I've been meaning to translate forum post type content for one of my partner's sites. Objective to open up the audience base.
Opus 4.7 and DeepSeek V4-Pro select Buddhism as preferred religion (twitter.com via hn) roon @tszzl: "hmm" (8:02 AM · May 9, 2026; 77.3K views)
What is the next SOTA model you are excited about? (www.reddit.com) We had deepseek v4 preview recently but it wasn't much better than v3.2. What is the next SOTA local/open model you are excited about?
China to Invest in DeepSeek at $50B Valuation (www.wsj.com via hn) paywalled
Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added (www.reddit.com) The benchmark uses adversarial, multi-turn debates across 683 curated motions. Each model pair debates the same motion twice with sides swapped.
DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper (www.reddit.com) Tested DeepSeek V4 Pro on FoodTruck Bench — our 30-day agentic benchmark where models run a food truck via 34 tools (locations, pricing, inventory, staff, weather, events) with persistent memory and daily reflection. First Chinese model to…
China's DeepSeek prices new V4 AI model at 97% below OpenAI's GPT-5.5 (www.scmp.com via hn) DeepSeek’s move aims to attract more enterprise clients, developers and agent-based users, according to an academic. DeepSeek has slashed prices on its artificial intelli…
DeepSeek Slashes Fees for New AI Model (www.bloomberg.com via hn)
DeepSeek drops input cache price to 1/10th (xcancel.com via hn) 🔥DeepSeek Input Cache Price Drop! Effective immediately, the price for input cache hits across the ENTIRE DeepSeek API series is reduced to just 1/10th of the original price!
Current state of open source? (www.reddit.com) I’m trying to understand the current open-source LLM landscape beyond surface-level hype. We all got used to the nerfed products of Claude/Gemini, so I really believe in open source as a solution.
Bloomberg: No Mac Studios until at least October (www.reddit.com)
Alibaba's Qwen family captures over 50% of global open-source model downloads (www.scmp.com via hn) Qwen hits nearly 1 billion cumulative downloads, far surpassing rivals like Meta Platforms’ Llama and DeepSeek, researchers say.
Huawei post-trained DeepSeek's 1.6T model on 1k Ascend 910C chips (www.tomshardware.com via hn) Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training The Shenzhen government says a 1,000-chip Ascend cluster handled full-parameter post-training. A research group that…
DeepSWE Audit: DeepSeek-v4-pro results are unreliable (github.com via hn) DeepSWE DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks drawn from active open-source repositories. The benchmark includes 113 tasks across TypeScript, Go, Python, JavaScript…
How DeepSeek's architecture is shattering Silicon Valley's token moat (venturebeat.com via hn) DeepSeek’s announcement over the weekend that it has made its 75% price cut permanent on its flagship V4 Pro model is a disruptive assault on the capital-heavy business models of Silicon Valley’s frontier labs. The reduction on DeepSeek V4…
The first framework that can post-train DeepSeek V4-pro on a single node? (news.ycombinator.com) Hi all, we just open-sourced a project called Orbit, which can RL post-train trillion-scale LLMs like deepseek v4. We found it pretty cool!
Vram 16gig poor. What models do I test? (www.reddit.com) I just got myself a 5060ti 16gig, this along with my 64gig ddr4 3200mhz ram on Linux. What models should I test for, coding with opencode/smallcode, chatting, lesson planning (creative, brainstorming), vision for pictures labelling, pictur…
After DeepSeek, Xiaomi cuts AI costs by up to 99% (twitter.com via hn) Xiaomi MiMo @XiaomiMiMo Better inference efficiency, lower costs, broader access.MiMo-V2.5 Series API pricing is now permanently reduced — by up to 99% compared to previous pricing. Unified pricing across all context lengths.
AI API calls take too long! Any solution? (www.reddit.com) I'm building an AI agent that calls several LLM APIs (ChatGPT, DeepSeek, Claude, and others), and I'm seeing response times ranging from 40 to 137 seconds, which feels way too slow. This is while asking the same query directly on their UI t…
Building DeepSeek's Answer to Claude Code (dlcmh.github.io via hn) Model + Harness = Agent: building DeepSeek’s answer to Claude Code ⚙️ The “Model + Harness” equation DeepSeek is hiring an Agent Harness R&D Engineer to build the missing layer between their frontier models and production‑ready agents. The…
Claude Code, now powered by Gemini 3.5 Flash, GPT-5.5, Grok 4.3, and more (dechained.ai via hn) Claude Code, now powered by OpenAI, xAI, DeepSeek, and more. Change models with 1-click.
$47 of opus on 14 routine next.js files finally taught me to use the model selector (www.reddit.com) i finally checked my cursor usage breakdown and got genuinely annoyed with myself. $47 in one month, almost entirely opus 4.7, on a pages router to app router migration for a side project.
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! (www.reddit.com) Hey r/DeepSeek, Who says we need an H100 cluster or the latest expensive GPUs to run frontier MoE models? I wanted to see how far we could push a single node of consumer legacy hardware, so we spent less than $2,500 total to build a budget…
cdesktop — open-source Claude Code Desktop alternative, runs locally via npx, supports any provider (www.reddit.com) I built cdesktop with Claude Code — it's an open-source alternative to Anthropic's Claude Code Desktop, running locally on your machine via npx cdesktop. Free, Apache 2.0.
Bootstrapped founders: how are you managing Claude Code costs? (www.reddit.com) I’m currently building an AI startup solo and Claude Code has genuinely improved my development speed compared to most other tools I’ve tried. The challenge is that subscription/API costs add up quickly while bootstrapping.
Max20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter? (www.reddit.com) I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft.
Open source battle: GLM vs Kimi vs MiMo vs DeepSeek (www.youtube.com via reddit)
Show HN: Tokémon – a Pokédex for LLMs that got out of hand (tokemonlabs.com via hn) An unofficial Pokedex for AI models. Compare GPT, Claude, Gemini, Llama, DeepSeek and more, with types, evolutions, base stats, and simulated token-burning battles.
DS4 (www.reddit.com) The developer who created Redis, Salvatore Sanfilippo, has released a new project on GitHub named DS4. https://github.com/antirez/ds4/ The TL;DR on this one is getting DeepSeek V4 Flash running with a 1M context window on Mac Metal hardw…
I used Claude to build an AI assistant that helps run live TTRPG sessions and am looking for a few playtest GMs (www.reddit.com) Hey everyone, I’m Ted. I’ve been building a project called Throughline with my friend Drew: an AI assistant for live tabletop RPG sessions.
LLM generated parsers and compliance checkers for Sparrow DSL (news.ycombinator.com) Hi, I believe LLMs are really good at generating DSL code if one provides a well-structured and clear prompt.
Find out why Elon gave over his keys to Anthropic. He is right, can't win this (deepseekresearch.com via hn) DeepSeek Research Models: production and experimental AI models across the dimensional lattice from 27³ to 1746³. All models are Level 5 Intelligence.
A deepseek-v4-distill-qwen3.6-27b? (www.reddit.com) A long time ago (actually only a year ago), DeepSeek released a few open-source models, such as deepseek-r1-distill-qwen (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B). I am wondering if anyone in the community is brave eno…
Ran K2.6 through a third-party coding benchmark: here's how the figures stand up (www.reddit.com) I have been following the akitaonrails coding benchmark, which tests against a fixed Rails + RubyLLM + Docker task rather than vendor-reported evals. The April 2026 update put K2.6 at 87, sitting in tier A (80+), ahead of Qwen 3.6 Plus (71), Dee…
DeepSeek cuts V4-Pro prices by 75% (thenextweb.com via hn) The promotional discount runs until 5 May 2026. Even at full price, V4-Pro already undercuts GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token costs.
DeepSeek V4 Pro: The First Chinese Model at the Frontier (foodtruckbench.com via hn) DeepSeek V4 Pro lands in the frontier ROI tier on FoodTruck Bench. 5/5 runs, +1,257% median ROI, $27K net worth, $3.51/run, 5× less waste than Grok 4.3.
DeepSeek V4's indexer OOMs at 65K context. We got it to 1M in 6G (arxiv.org via hn) DeepSeek-V3.2 and V4 introduce Compressed Sparse Attention (CSA): a lightning indexer (a learned scoring projection over compressed keys) scores them, the top-k are selected per query, and a sparse attention kernel reads only those. Public…
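The score-select-attend pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not DeepSeek's kernel or the paper's method. It assumes a single query and keys that are already compressed. The indexer projection `w_idx`, the function name `sparse_topk_attention`, and the parameter `top_k` are hypothetical names chosen for the example.

```python
import numpy as np

def sparse_topk_attention(q, k_comp, v_comp, w_idx, top_k):
    """Attend only to the top_k compressed keys chosen by a cheap indexer.

    q:      (d,)        query vector
    k_comp: (n, d)      compressed keys (assumed precomputed)
    v_comp: (n, d)      compressed values
    w_idx:  (d, d_idx)  hypothetical learned indexer projection
    """
    # Indexer: score every compressed key in a small projected space.
    idx_scores = (k_comp @ w_idx) @ (q @ w_idx)            # (n,)
    # Keep the top_k highest-scoring positions (their order does not matter).
    sel = np.argpartition(idx_scores, -top_k)[-top_k:]
    # Ordinary scaled softmax attention, restricted to the selected keys.
    logits = k_comp[sel] @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ v_comp[sel], sel
```

When `top_k` equals the number of keys, the result reduces to dense softmax attention, which makes a handy sanity check. The savings come from `top_k` being much smaller than the key count, so the expensive attention step touches only a small slice of the context.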
US State Dept orders global warning about alleged AI thefts by DeepSeek (www.reuters.com via hn) paywalled
Which model should I try? (www.reddit.com) In my current workflow (coding in python/c++ and technical reports) I mostly use Qwen3.6 27B and Gemma4 31B. In the past I tried other models like DeepSeek with decent results, but it was painfully slow.
Show HN: AgInTiFlow, a local web and CLI agent workspace using DeepSeek (www.npmjs.com via hn) AgInTiFlow is a web-first coding agent and CLI with DeepSeek routing, sandboxed tools, model providers, canvas artifacts, and optional wrappers.
DeepSeek V4 Pro: Validating Frontier Models for Production (fireworks.ai via hn) Why we chose correctness over a Day-0 launch DeepSeek V4 Pro is one of the most important open-model releases this year, with real advances in long-context reasoning, agentic performance, and inference efficiency. On paper, it looks like a…
DeepSeek just dropped V4-Pro + V4-Flash (1M context, open weights, aggressive pricing) is this a real GPT/Claude competitor? (www.reddit.com) DeepSeek released two new models today (April 24, 2026), and the specs are kind of wild: V4-Pro: 1.6T parameters (49B active); V4-Flash: 284B parameters (13B active); both support native 1M-token context; MIT-licensed weights (available on Hu…
DeepSeek-V4: Making 1M token context efficient (firethering.com via hn) Every developer who has worked with long context models knows the feeling. You paste in your codebase, add your requirements, include some examples, and somewhere around the halfway point the model starts forgetting things it read at the t…
DeepSeek V4 in vLLM: Efficient Long-Context Attention (vllm-website-pdzeaspbm-inferact-inc.vercel.app via hn) We are excited to announce that vLLM now supports the DeepSeek V4 family of models (deepseek-ai/DeepSeek-V4-Pro and deepseek-ai/DeepSeek-V4-Flash). These models feature an efficient lo…
DeepSeek targets $20B valuation to stop poaching of staff (www.ft.com via hn)
DeepSeek V4 is out. the best open-source on coding. here's the breakdown (news.ycombinator.com) Two models: Flash (284B total, 13B active) and Pro (1.6T total, 49B active). Both support a 1M-token context.
Claude Opus 4.7 won 69 of 100 blind evals against Opus 4.6, judged by GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 (www.reddit.com) I ran 100 blind questions across 5 categories (code, reasoning, analysis, communication, meta-alignment) and had three independent judges from three different model families evaluate both responses. Each judge saw responses labeled A and B…
Show HN: One API Key for 45 AI Models – Pay per Token, OpenAI Compatible (modelhub-api.com via hn) DeepSeek V4 math score equals GPT-5.5 (91) and trails by just 4-6 points in other categories — at 97% lower cost. Is the AI quality as good as GPT?
More US Firms Turn to China's DeepSeek over Pricey Silicon Valley AI (www.scmp.com via hn) DeepSeek takes top spot on ‘trending’ list as companies look for alternatives to OpenAI and Anthropic, spending tracker’s report says. According to a “trending software ve…
Show HN: Free open source coding models in Slack (www.runcord.com via hn) Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep Every signup get…
Are Claude or GPT subscriptions subsidized or are the APIs a ripoff? (www.reddit.com) Do you think GPT/Claude subscriptions are heavily subsidized as part of a land-grab strategy, where the companies are willing to lose money to dominate the market later? Or are the subscriptions actually profitable, and instead the API pri…
DeepSeek-OCR Visualized (medium.com via hn) Dec 11, 2025. Understand SAM, Token compression, DeepSeek-MoE, Multi-Head-Latent-Attention. DeepSeek-OCR is essentially a combination of known architectures, namely SAM, CLIP and CNNs for the vision encoder and MoE decoder langua…
Looking for a working Deepseek-v4-Flash quant (www.reddit.com) The best I have tried so far is https://huggingface.co/nsparks/DeepSeek-V4-Flash-FP4-FP8-GGUF with the custom llama.cpp fork, but it suffers from low quality and random incoherent output. vLLM wouldn't support anything other than H100s for DS4.
Has anyone gotten their editor to work with Deepseek v4 FIM? (www.reddit.com) I tried to follow the docs here https://api-docs.deepseek.com/guides/fim_completion to get it up and running in VSCode or Zed with my api key but it doesn't work, I think it's got something to do with the request body, has anyone got autoc…
Terminal coding agent for DeepSeek V4 (github.com via hn) CodeWhale Terminal coding agent for DeepSeek V4. It runs from the codewhale command, streams reasoning blocks, edits local workspaces with approval gates, and includes an auto mode that chooses both model and thinking level per turn.
This shit is crazy!! and do people agree this will get people's accounts blocked? paid actors? (www.reddit.com) I was looking for new ways to reduce context memory to save on tokens. When I saw multiple videos on using DeepSeek in Claude, I vaguely remembered something about Anthropic accusing them and putting measures in place to combat it. https://www…
DeepSeek Sparse Attention (github.com via hn) Build a Large Language Model (From Scratch) This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch). In Build…
What workstation to get for ~13k EUR? (www.reddit.com) My use-cases will be to test open-weight LLMs and work on harnesses, inference systems and possibly other non-ML workflows (CS-related) in the future. Fine-tuning would not be something I do locally because I can rent a B200 from RunPod fo…
I let an AI agent loose on my network – it owned my supply chain in 12 minutes (dennysentinel.com via hn) I gave DeepSeek-V4 root access to a Proxmox hypervisor and told it to pentest my homelab. What happened next should terrify every CISO in the industry.
Agent builders: are GPT/Claude/Gemini API costs killing your margins? (www.reddit.com) Hey everyone, For people building agents with LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude MCP/SDK, Google ADK, or LlamaIndex — how are you managing LLM API costs? Agent workflows can get expensive fast because of: tool calls retr…
DeepSeek Founder Declares AGI Goal as $10B Round Advances (www.bloomberg.com via hn)
Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible (github.com via hn) A few things kind of aligned over the last month. I'd been wanting to try pi.dev (https://pi.dev), DeepSeek has a very aggressive 75% discount on their v4-pro model until end of May, and I had this old itch from my LFS days to "do my own t…
What's everyone using as the LLM backend for production agent workflows in 2026? (www.reddit.com) Hit Claude API rate limits one too many times last month on a production agent flow doing customer support over a 30K-doc KB. The agent does maybe 200 queries/day, mix of quick lookup and dense retrieval, and Claude Opus solo got expensive…
DeepSeek V4 Flash: Bringing Frontier AI to the Home (blog.jonathanpage.com via hn) In a home lab it is now possible to score 88.6% on the Ph.D.-level science question benchmark GPQA Diamond! The first time a frontier model achieved 88% on GPQA Diamond was G…
Moving from Composer 2/Kimi 2.6 to Qwen3.6:35b-a3b (www.reddit.com) I can't believe it, but I'm able to do my daily software development work on this model. We have a 500-700k line of code enterprise software suite that I'm devving for 60 hours a week.
Best free AI Agent provider? (www.reddit.com) Hi everyone, I’m looking for recommendations for the best free AI agent providers and which models work best for coding and general development workflows. So far, I’ve mainly been using Cursor, and honestly it has given me the best overall…
Follow-up to my TranslateGemma-12b benchmark post: human reviewers flagged 71% of the segments automated metrics rated clean (www.reddit.com) A couple of weeks ago I shared the results of a benchmark here showing TranslateGemma-12b beating frontier general models (Claude Sonnet, GPT-5.4, DeepSeek, Gemini Flash Lite) on subtitle translation across 6 languages. The result was stro…
does anyone else switch between multiple AI models for the same project? (www.reddit.com) Lately I’ve been bouncing between ChatGPT, Claude, DeepSeek, etc., depending on what I’m working on. One annoying part is moving long conversations between tools. Copy-paste technically works, but once the thread gets big the formatting/context…
Canvas Data Breach; DeepSeek V4 Flash Boosts LLM Inference 4.3x (presciente.com via hn) The Canvas educational platform experienced a data breach, with ShinyHunters threatening data release by May 12,…
DeepSeek Seeks Funding at $45B Valuation as China Backs Homegrown AI Rival (theaiinsider.tech via hn) Chinese AI lab DeepSeek is in talks to raise its first venture capital round at a valuation that has climbed from $20 billion to $45 billion in weeks, according to the Financial Times and Bloomberg. The round is expected to be led by China…
How difficult is distilling? (www.reddit.com) I remember a year or so ago when DeepSeek R1 came out and it was pretty quickly distilled into Llama 3 8b and Qwen 2.5 (?) 7b. Why don’t we see more distilled models?
Show HN: Stagewise – Agentic IDE for Your Z.ai/DeepSeek/Moonshot Subscription (github.com via hn) The Open Source Agentic IDE for Developers. stagewise is an open source agentic IDE for developers with a coding agent built ri…
CommandCode (www.reddit.com) Yo guys, just wanted to ask: I keep seeing ads for this new coding agent, CommandCode, that offers $1/month, and it has a $40 package of DeepSeek V4 Pro and other models. NOTE: Claude and GPT are not included in the $1 plan.
Ling 2.6 (Flash and 1T): Efficient Open Models Competing on Agentic Benchmarks (firethering.com via hn) Ant Group doesn't get the coverage it deserves. While the open source AI conversation in the West circles around DeepSeek and Qwen, Ant Group has been quietly building a model family that competes directly with the models everyone is talki…
Tested the four newest open-source models: Kimi K2.6 is the fastest, GLM 5.1 the fanciest, DeepSeek V4 the most comprehensive, and Xiaomi MiMo the slowest (www.reddit.com) Architecture explains the gap: MiMo's MoE runs more active params per token than Kimi K2.6's optimized routing, hence it is the slowest. DeepSeek V4's 'comprehensive' edge is partly MLA: ~75% KV-cache compression makes it far better for long agentic…
Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro? (www.reddit.com) Literally no third-party API inference provider is hosting the MiMo-2.5 series models from Xiaomi. They seem to be really good.
Which model for 32GB M2 Max? (www.reddit.com) I would like to experiment but before investing loads of money, I do have a MacBook Pro with 32GB RAM, M2 Pro. Which model would maximize versatility given this hardware?
DeepSeek V4 Flash and V4 Pro in Microsoft Foundry (techcommunity.microsoft.com via hn) As AI adoption matures, the conversation is shifting from model capability to system design, how to orchestrate models that deliver the right balance of quality, speed, and cost. Today, we’re expanding the Microsoft Foundry model catalog w…
What's the best subscription under $20? (www.reddit.com) I’m pretty overwhelmed. I feel like there are so many options that I don’t know which one to choose, and trying things until I find a decent one isn’t really my thing—even though I enjoy it.
Rumor: DeepSeek and Kimi are merging. While the US AI sector sues itself, China is consolidating. (www.reddit.com) Seeing some wild rumors circulating today that DeepSeek and Kimi—arguably the two most dominant open-source AI labs in China right now—are preparing to merge. If this turns out to be true, it’s a massive wake-up call.
Best Practices to Start with Vibe Coding? Best Local Apps for Agentic Vibe Coding? (www.reddit.com) DISCLAIMER: I am not a programmer nor do I have experience coding. I've been thinking about a small app running on gradio for some time now, and I want to try tweaking some extension for ComfyUI.
I ran DeepSeek V4-Flash internals on 8x H100s — here’s what mHC actually does ( via reddit) could not extract summary
Making AI coding sessions persistent across agents (github.com via hn) drift_ai: vendor-neutral handoff for AI coding tasks between Claude, GPT, Gemini, DeepSeek, and local LLMs. Reads from Claude Code, Codex, Cursor, Aider.
DeepSeek Unveils Newest Flagship AI Model a Year After Upending Silicon Valley (www.bloomberg.com via hn)
Are we getting DeepSeek V4 and Kimi 2.6 soon? (www.reddit.com) Or can we already use them in Cursor? DeepSeek V4 specifically looks very interesting and way cheaper.
DeepSeek-V4 arrives with near SotA intelligence at 1/6th the cost (venturebeat.com via hn) DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7 and GPT-5.5.
The Download: DeepSeek's latest AI breakthrough, and the race to build world models (www.technologyreview.com via hn) Plus: China has blocked Meta’s $2 billion acquisition of AI startup Manus. This is today's edition of The Download, our weekday newsletter that provides a…
No GGUFs for DeepSeek V4-Flash as yet? (www.reddit.com) Wondering why there aren't any "name brand" (like unsloth, bartowski) GGUFs as yet for DeepSeek V4 Flash?
DeepSeek V4 with Strix: a quick test (theaq.blog via hn) DeepSeek released V4 yesterday in two variants. V4 Pro has 1.6T total parameters with 49B active, while V4 Flash is the smaller, faster, cheaper sibling with 284B total and 13B active.
To run deepseek v4 flash how much max vram we need? 175 gb or 320gb? (www.reddit.com) As far as I know, the weights are 160GB + 9.6GB needed for the max 1-million-token window + 5GB overhead = 175GB VRAM. But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G", and that's 320GB of VRAM??
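The poster's sum can be written out explicitly. The sketch below takes the 160 GB, 9.6 GB, and 5 GB figures from the post as given; they are not verified here. It then works out how many 80 GB cards that total strictly needs. The code labels two assumptions. First, the server hands out only about 90% of each card (vLLM's `gpu_memory_utilization` defaults to 0.9). Second, tensor-parallel sizes are rounded up to a power of two. Activation and workspace memory are ignored. Under these assumptions, the naive 175 GB lands on four cards. That is one plausible reading of the 4x A100 recipe, not a confirmed explanation.

```python
import math

def vram_budget_gb(weights_gb, kv_cache_gb, overhead_gb):
    """Naive total: weights + KV cache + fixed runtime overhead."""
    return weights_gb + kv_cache_gb + overhead_gb

def min_gpus(total_gb, gpu_gb, usable_fraction=0.9):
    """Smallest GPU count whose usable memory covers total_gb.

    Assumption: only usable_fraction of each card is available to the
    model (vLLM's gpu_memory_utilization defaults to 0.9).
    """
    return math.ceil(total_gb / (gpu_gb * usable_fraction))

def round_up_pow2(n):
    """Assumption: tensor-parallel sizes are typically powers of two."""
    return 1 << (n - 1).bit_length()

total = vram_budget_gb(160, 9.6, 5)
raw = min_gpus(total, 80)
print(f"{total:.1f} GB -> {raw} x 80GB cards, {round_up_pow2(raw)} with power-of-two TP")
```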
Show HN: A CLI to use any model in your coding agent (getaivo.dev via hn) Hi everyone, I've been working on a CLI tool that makes it easy to run any model in Claude, Codex, Gemini, Pi, and OpenCode. It's also an API key manager and supports multiple providers or OpenAI/Claude/Gemini accounts.
Ask HN: Why is cache for DeepSeek-v4 cheapest on Vercel AI Gateway? (news.ycombinator.com) Do they charge below their cost? Or do they run their own cache?
DeepSeek's Sequel Set to Extend China's Reach in Open-Source A.I (www.nytimes.com via hn) could not extract summary
DeepSeek-V4 Preview Version is launched (news.ycombinator.com) DeepSeek just dropped the preview of their V4 series, both open-weight and available via API, with a 1M context window.
7B showdown on 18GB (benchmark) (www.reddit.com) Hey r/LocalLLaMA, I've been coding for a while but not in the local AI space and wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs generalists" at the 7-8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…
PSA re Qwen 3.6 35B A3B q4 + agents (www.reddit.com)
Best LLM for logic/spatial reasoning on small context inputs? (www.reddit.com) My system has 32GB RAM and 8GB VRAM. I tried out DeepSeek-R1-Distill-Qwen-7B-Q6_K_L.gguf and it was vastly inadequate for what I wanted, so I'm looking for other suggestions.
Claude down? TokenMonopoly will help you find the best deals in AI subs (tokenmonopoly.com via hn) TokenMonopoly Live leaderboard of AI API deals — pricing, subscriptions, and SWE-bench scores for Claude, GPT, Gemini, Kimi, DeepSeek, Llama and more. Compare 27 benchmarked models across 96 hosts by price-per-performance, refreshed daily.
Is my 'Retry Tax' math correct for DeepSeek V3/V4 agents? (Project Feedback) (www.reddit.com)
Running DeepSeek-V4-Flash on a Raspberry Pi (twitter.com via hn) I ran DeepSeek-V4-Flash on a Raspberry Pi 5 (8GB edition) by streaming model weights from a PCIe-attached NVMe SSD. Codex (GPT-5.5 xhigh) and Claude Code (Opus 4.8 max) drove…
DStudio – local DeepSeek V4 with a design studio, reachable from your phone (github.com via hn) DStudio A native, local-first desktop app for DeepSeek V4 — chat, a coding agent and a design studio, all running on your Mac. Nothing leaves the device.
DeepSeek Made AI Cheap. Now It Needs Billions to Keep It Cheap (chinacompany.substack.com via hn)
Mimo v2.5 is a better deal than DeepSeek v4 flash (news.ycombinator.com) So hear me out. MiMo v2.5 is better than DeepSeek V4 Flash on almost all benchmarks, and it wins on pricing too.
DeepSeek V4 managed to reverse engineer Teamspeak's Licensing System with $3.88 (old.reddit.com via hn) could not extract summary
TokkeyCC – OpenAI-compatible API for 100 AI models, .22 per 1M tokens (tokkeycc.com via hn) 100+ Models · Unified API · OpenAI Compatible Access OpenAI, Anthropic, DeepSeek, Meta, Google, and more — through a single, unified API. Switch models instantly.
Lots of people want to try Claude Opus 4.8 (wisgate.ai via hn) Access multiple AI models through one unified API. OpenAI, Claude, Gemini, DeepSeek and more.
DeepSeek-V4-Flash (official FP8) running across 2x DGX Spark (forums.developer.nvidia.com via hn) I didn’t create this recipe (you guys did), but I was finally able to find it and get DeepSeek V4 Flash working with 200k context on 2 nodes. Sharing this since I couldn’t find a confirmed end-to-end recipe for the official DeepSeek-V4-Flash…
ik_llama.cpp – llama.cpp fork with better CPU performance (github.com via hn) This repository is a fork of llama.cpp with better CPU and hybrid GPU/CPU performance, new SOTA quantization types, first-class Bitnet support, better DeepSeek performance via…
Did DeepSeek v4 suddenly become more expensive? (imgur.com via hn)
Show HN: Train Claude Code's replacement (ds4 and pi and aoe) (github.com via hn) Remember how Meta monitored employee activity closely for a few months, and then had a bunch of layoffs related to AI efficiency? (oh right that was like 3 days ago).
DeepSeek Slashes AI Costs to Cents (businessanalytics.substack.com via hn) Edition #299 | 29 May 2026. DeepSeek Makes 75% Price Cut on V4 Pro Permanent, Dropping Frontier-Class Inference to $0.87/M Output Tokens with Mixture-of-Experts Architecture. In this edition, we will also b…
Show HN: SharkBay – a local macOS workbench for coding-agent CLIs (github.com via hn) SharkBay: macOS workbench for multi-agent vibe coding. Features: Multi-Agent Support. Launch and manage multiple AI coding agents from one workspace. Supported agents: Claude Code · Codex · Gemini · Kiro · DeepSeek · Qwen · OpenCode. Agent Stat…
DeepSeek V4 Flash at 8.4 tok/s on 3×3090: patching the GGUFs that won't load on cchuter's llama.cpp fork (www.reddit.com) My apologies if anything does not make sense; I literally don't know what I am doing. I'm not a programmer, just a simple vibe coder with a Claude subscription. That said, if you have 200GB of system RAM + VRAM and want to run DeepSeek V4 Flash…
GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) (www.reddit.com) Trying to figure out the right box for my team and wanted to see if anyone had any clue which would be a better fit or if it is not worth our time in our budget. Situation: 5 of us doing agentic coding (lots of long context getting re-sent…
Lower Bracket Context Tax: An Open MCP Persistent Memory Layer That Limits Agent Context Bloat to 10% (www.reddit.com) Because standard coding agents are stateless, every session they start from scratch. I built Zerikai_memory around a different model: you decide when the agent learns your codebase, not the other way around.
Fused MoE dispatch kernel in pure Triton: 89-131% of Megablocks, runs on AMD with zero code changes (www.reddit.com) I've been working on MoE inference and wrote a fused dispatch kernel entirely in Triton, no CUDA. At inference batch sizes (up to 512 tokens) it reaches 89-131% of Megablocks (Stanford's CUDA-optimized MoE lib), and the same kernel runs on…
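To show what such a kernel fuses, here is an unfused NumPy reference of the same route, dispatch, and combine pattern. It is not the post's Triton kernel, and its design choices are assumptions for illustration: toy linear experts, and a softmax over the top-k router logits (routers differ between models).

```python
import numpy as np

def moe_forward(x, w_gate, experts, top_k=2):
    """Reference (unfused) MoE layer: route, dispatch by expert, combine.

    x:       (tokens, d) input activations
    w_gate:  (d, n_experts) router weights
    experts: list of (d, d) matrices, one toy linear expert each
    """
    logits = x @ w_gate                                   # (tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]          # chosen experts per token
    top_logits = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)             # softmax over chosen experts

    out = np.zeros_like(x)
    for e, w in enumerate(experts):
        # Dispatch: gather every (token, slot) pair routed to expert e.
        tok, slot = np.nonzero(top == e)
        if tok.size == 0:
            continue
        y = x[tok] @ w                                    # one batched matmul per expert
        # Combine: scatter-add gate-weighted outputs back to their tokens.
        np.add.at(out, tok, gates[tok, slot][:, None] * y)
    return out
```

A fused kernel typically merges the gather, the per-expert matmul, and the scatter so tokens are not copied around between separate launches. This loop keeps those steps visible.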
Build an agent capable of complex programming tasks in under 100 lines of code. (www.reddit.com) The code below is an interactive agent capable of handling complex tasks, built in under 100 lines of code using huko-engine. If you just want to drop some agentic features into your existing app, it only takes 20 lines.
Best AI Agent Setup - Hermes + Deepseek-v4-flash? (May 2026) (www.reddit.com) I used to use Claude Code for everything. I burned 10-20 billion Opus tokens at work, and wanted to use agents for personal projects.
Show HN: I built a tool to estimate AI agent costs before you ship (airunrate.com via hn) Free AI agent cost calculator. Compare GPT-4o, Claude, Gemini, DeepSeek and 50+ models.
DeepSeek seems to be leaking random user chat history (breakingvibe.dev via hn) DeepSeek’s web chat interface appears to be leaking conversations between accounts. When opening the chat, a user encountered chat histories that did not belong to them, suggesting a session isolation failure on DeepSeek’s servers.
Are LLMs the New Propagandists? (www.reddit.com) I was brainstorming about a video with Claude (Sonnet 4.6). It suggested to explain the difference among ChatGPT, Gemini, Claude and DeepSeek.
DeepSeek-V4 KV Cache Explained: Why 1M Context Uses Less VRAM (knightli.com via hn) The real cost of long-context models is often not whether they can accept one million tokens, but how much VRAM the KV Cache consumes during inference. During Transformer decoding, every newly generated token needs access to the Key and Va…
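Here is a back-of-envelope version of that calculation. A dense multi-head KV cache stores a key and a value per layer, per KV head, per token. An MLA-style cache stores one compressed latent vector per layer and token. Every dimension below is an illustrative placeholder, not DeepSeek-V4's configuration. The 576-wide latent mirrors the 512 + 64 split in DeepSeek-V3's published config. Any further compression along the sequence, such as the compressed keys mentioned for V4's CSA, is not modeled.

```python
def mha_kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Dense attention: K and V (factor 2) per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

def latent_kv_cache_bytes(layers, latent_dim, seq_len, bytes_per_elem=2):
    """MLA-style cache: one shared latent vector per layer and token."""
    return layers * latent_dim * seq_len * bytes_per_elem

GIB = 1024 ** 3
# Hypothetical model shape at a 1M-token context, for illustration only.
cfg = dict(layers=60, seq_len=1_000_000)
dense = mha_kv_cache_bytes(kv_heads=128, head_dim=128, **cfg)
latent = latent_kv_cache_bytes(latent_dim=576, **cfg)
print(f"dense: {dense / GIB:.0f} GiB, latent: {latent / GIB:.0f} GiB, "
      f"ratio: {dense / latent:.1f}x")
```

The ratio depends only on `2 * kv_heads * head_dim / latent_dim`, so the cache shrinks by the same factor at any context length. The absolute sizes grow linearly with `seq_len`.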
Test (news.ycombinator.com) Related ongoing thread: DeepSeek makes the V4 Pro price discount permanent - https://news.ycombinator.com/item?id=48237663 - May 2026 (384 comments)
Ask HN: What is your daily AI stack? (news.ycombinator.com) Which AI tools benefit you most in day-to-day work? Which tools have you stopped using?
Is Composer 2.5 better than GLM 5.1 and DeepSeek v4 pro in real world tasks? (www.reddit.com) I am new to Cursor and still testing the free version. Benchmarks for Composer 2.5 indicate it is better than DeepSeek v4 and GLM 5.1.
DeepSeek just popped the American AI bubble. (www.reddit.com) DeepSeek just popped the American AI bubble. Not by killing AI.
Performance When Offloading Large Models to System RAM? (www.reddit.com) I noticed that for people running large models, or models that would be cost-prohibitive to fit entirely in GPU VRAM, the dominant strategy is one GPU with a large pool of system DRAM to offload the weights, as per GB VRAM is always m…
$340 opus bill made me rethink how I route agent tool calls (www.reddit.com) Looked at my coding agent's bill last month: $340 for repo maintenance across three repos, each around 15k lines. Most of those tool calls were just grep and file reads.
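One common way to act on that observation is to send mechanical tool calls to a cheaper model and keep the expensive one for planning and edits. Below is a minimal sketch of such a router. The tool names, model labels, per-token prices, and call mix are hypothetical placeholders, not figures from the post.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M output tokens.
PRICES = {"cheap-model": 0.50, "frontier-model": 25.00}

# Hypothetical set of tool calls that rarely need deep reasoning.
MECHANICAL_TOOLS = {"grep", "read_file", "list_dir"}

@dataclass
class ToolCall:
    tool: str
    output_tokens: int

def route(call: ToolCall) -> str:
    """Send mechanical tool calls to the cheap model, everything else up."""
    return "cheap-model" if call.tool in MECHANICAL_TOOLS else "frontier-model"

def cost_usd(calls, router=route):
    """Total spend for a list of calls under a given routing policy."""
    return sum(PRICES[router(c)] * c.output_tokens / 1_000_000 for c in calls)

calls = [ToolCall("grep", 2_000)] * 80 + [ToolCall("edit_file", 4_000)] * 20
always_frontier = cost_usd(calls, router=lambda c: "frontier-model")
routed = cost_usd(calls)
print(f"all frontier: ${always_frontier:.2f}, routed: ${routed:.2f}")
```

Routing on the tool name is the simplest policy. A real agent may also need to escalate when the cheap model's output looks wrong, which this sketch does not attempt.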
ml intern skill instead of gsd (www.reddit.com) Designed for ML workflows; works autonomously for hours. Projects fully done with this skill: flash attention for Volta (very old GPUs), https://github.com/AlexWortega/flash-attn-volta; DeepSeek 4 full replication + training on RunPod +…
Local compression helps (www.reddit.com) Just wanted to post a tip (I'm human, not an agent, watch: fart). I use Deepseek-v4-Flash on a lot of my agent work, and as I'm learning and testing these things.
DeepSeek-V4-Pro 75% off discount is now permanent (twitter.com via hn) DeepSeek @deepseek_ai We are making our discount permanent! Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life!
Show HN: Myco – coordinate Claude and DeepSeek and other LLMs in one agent swarm (github.com via hn) myco. Problem: running multiple Claude Code sessions in parallel produces conflicts, repeated work, and stale assumptions; the agents have no shared awareness. myco is a text-only coordination protocol + tiny Pytho…
The Special Token `<Think>` Problem/Bug of Latest DeepSeek LLM (www.pixelstech.net via hn)
Async Python client for private DeepSeek API (github.com via hn) aiodeepseek A high-performance async Python client for the private DeepSeek API. Supports streaming, image uploads, multi-turn conversations, and new account registration.
QuickSilver Pro – OpenAI-Compatible Platform for DeepSeek V4 and Qwen (quicksilverpro.io via hn) OpenAI-compatible API for 7 top open-source LLMs — DeepSeek V4 Flash & Pro, V3, R1, Qwen3.6 & 3.5-35B-A3B, Kimi K2.6 — 20% cheaper than OpenRouter, Together AI, Fireworks. One-line drop-in.
Anyone compared gpt-5.4-nano vs deepseek v4 flash? (www.reddit.com) They seem to sit at (almost) similar pricing (I know output is still quite different). Pricing per 1M tokens, input / output: DeepSeek V4 Flash $0.19 / $0.51; DeepSeek V4 Pro $1.74 / $3.48; gpt-5.5 $5.00 / $30.00; gpt-5.4 $2.5 / $15; gpt-5…
Open AI compatible API in Cursor (www.reddit.com) Hey. I have been experimenting with new models in my Cursor.
Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph) (www.reddit.com) Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Co…
How to use DeepSeek V4 PRO in Cursor? Skill issue on my side (www.reddit.com via reddit) could not extract summary
🧬 flux-genotype: A self-evolving AI kernel that runs on CPU with Ollama and mutates its own architecture (www.reddit.com) 🧬 Flux‑Genotype – A CPU LLM that rewrites itself. I've been working on an open-source kernel called flux-genotype. It orchestrates local models (TinyLlama, Llama 3.2, Hermes 3, DeepSeek-Coder) into a self-modifying ecosystem.
Ask HN: Which AI harness comes close to Claude Code? (news.ycombinator.com) I really want to try DeepSeek V4, but the harnesses I have previously used are inferior to Claude Code. Please suggest some harnesses here.
What is a good app for using the Claude API with attached files? (www.reddit.com) I use Obsidian to keep track of my Markdown files. There are various plugins to have it interact with Claude.
Deepseek V4's 1M context window: the breaking point (www.reddit.com) I just ran a test to verify DeepSeek V4's 1M-context claim across three production codebases: 45k (microservice), 180k (monorepo backend), and 520k (full-stack app). For the observation, tasks included dependency tracing, cross file…
LLM Phone Home: Reliable Apps that can deliver inference from local backend (www.reddit.com) Hello all, I’m wondering what suggestions there are for an ios app that can serve an openai compatible endpoint. I am using 3sparks which works GREAT for that specific use, BUT, there is no mcp, no web search, etc.
DeepSeek-V4-Flash means LLM steering is interesting again (www.seangoedecke.com via hn) DeepSeek-V4-Flash means LLM steering is interesting again Ever since Golden Gate Claude I’ve been fascinated with “steering”: the idea that you can guide LLM outputs by directly manipulating the activations of the model mid-flight. DeepSee…
Recent Developments in LLM Architectures: KV Sharing, MHC, Compressed Attention (magazine.sebastianraschka.com via hn) Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs After a short family break, I am excited to be back and catching up o…
Has anyone found a Qwen CLI replacement? (www.reddit.com) I just need 1 or 2 people to reply to me with the answer I need. I have not been able to keep up with AI advancements for a while.
A message from kurdistan – my love for China and DeepSeek (old.reddit.com via hn)
One of the things I don't see people listing as benefit of hosting local LLMs is on demand usage. (www.reddit.com) Seriously, It might be obvious fact, but when you are on subscription you kinda are in pressure to keep using it otherwise the unused limits feel like wasted potential. There is this urge to keep maximising the tokens you paid for even if…
Deepseek Now Limits File Attachments (www.reddit.com) Nerfing the usage modalities without any announcement seems to be the norm nowadays. Even Chinese AI vendor Deepseek limits the usage of their best expert model for free users.
DeepSeek V4: The Open-Source Model Frontier Labs Feared (helloai.com via hn) DeepSeek V4: The Open-Source Model Frontier Labs Feared DeepSeek V4 ships under MIT with $0.30/M output tokens — 83x cheaper than Claude Opus 4.7 — while scoring 80.6% on SWE-bench Verified. The agentic-coding price floor just moved an ord…
We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6 (blog.kilo.ai via hn) We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6 DeepSeek V4 Pro and DeepSeek V4 Flash launched together on April 24, 2026 under MIT license. They are DeepSeek’s first new architecture since V3, and their first ope…
I built a desktop app that routes Claude Code to any LLM: DeepSeek, Ollama, Copilot, OpenRouter, and 7 more (www.reddit.com) Claude Code is the best AI coding tool I've used. But being locked to one provider, one pricing model, and one model catalog always bothered me.
Open Source Managed Agents (linchpin.work via hn) Any model, one adapter OpenRouter routes to ~200 cloud models — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Qwen. Ollama runs anything you've pulled locally.
DeepSeek and Grok hallucinated the same fictitious OpenBSD manpage quote (stuart-thomas.com via hn) Adversarial LLM Review with Hallucination Detection in Solo Security Research A single-day case study of three filings, fifteen refutations, and the manpage that wasn’t Independent Security Research — Whitby, North Yorkshire, United Kingdo…
I offloaded bulk file reading from Claude Code to a cheaper model for a week. Here are the numbers. (www.reddit.com) Hey r/ClaudeAI — I use Claude Code a lot, and I noticed I was wasting a surprising amount of my usage limit on stuff that was basically just reading. Big files, long diffs, Jira/Linear tickets with comment history, docs pages, repo spelunk…
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL (research.nvidia.com via hn) We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve…
OpenCode + DeepSeek V4 Pro vs Claude Code CLI?🤔 (www.reddit.com) Im rather new to the whole Agentic automation AI's but Im hearing people with vibe coding were able to pull big unique projects they wouldn't be able to do by themselves or possibly needed to pay a huge fund to programmers, designers, etc.…
Which Chinese Model is best for planning and which is best for implementation? I'm currently using Opencode with an Openrouter API Key, mostly wanna decide between Kimi, GLM, DeepSeek, Qwen, Minimax and Mimo (www.reddit.com) Original plan was to use Kimi/GLM for planning and DeepSeek for implementation, but seeing a lot of love for MiMo and Minimax lately. Anyone running a planner + coder split on Opencode?
DeepSeek Rejects Alibaba: Prioritizing Corporate Independence Over Big Tech Ecosystems (www.reddit.com) In April, DeepSeek launched a rare, massive financing plan that attracted interest from two of China’s largest tech giants: Tencent and Alibaba. However, we have exclusively learned that recent negotiations between Alibaba and DeepSeek hav…
best ai tool ? (www.reddit.com) so I have an exam in few months, very important and high competitive national level exam. I want a perfect and most suitable ai agent for me even all in one for following tasks: do accurate and deep PYQ analysis from pyq mapping across yea…
ZAYA1-8B: An 8B Moe Model with 760M Active Params Matching DeepSeek-R1 on Math (firethering.com via hn) Who should care If you work with math, science problems, or complex coding tasks and you're looking for something small enough to run locally or cheaply via API, this is worth serious evaluation. The benchmark numbers at 760M active parame…
DeepSeek-v4-Pro and Hermes: Unauthorized Modification of Security Controls (www.eddieoz.com via hn) Deepseek-v4-pro + Hermes: Unauthorized Modification of Security Controls This article documents a specific, real incident. It exposes a class of vulnerability that deserves attention: the unsupervised mutability of security rules by autono…
CodexSaver Make Codex cheaper without making it dumber with DeepSeek (github.com via hn) CodexSaver Make Codex cheaper without making it dumber. CodexSaver is an MCP tool that turns Codex into a cost-aware router.
I wasted 3 days rewriting prompts for our agent before realizing the whole architecture was garbage (www.reddit.com) We run a small content-monitoring agent for our growth team. Nothing fancy on paper.
DeepSeek could be valued at up to $50B in first fundraising (www.reuters.com via hn) paywalled
I plan to use a chinese AI model through API for coding through a harness, I'm a uni student so nothing prod related for now. should i go deepseek, minimax, kimi or glm? kinda confused (www.reddit.com) Just cancelled my claude subscription due to poor rate limits, gemini cli doesn't really excel in coding from my personal experience, and my local hardware isn't that powerful to run local AI models, and while codex is good, I wanna try so…
Does Deepseek V4/Flash work with Llama CPP and Vulkan on and branches yet? (www.reddit.com) Even unofficial or slow. I have enough vram-memory to load it, but not enough memory to run in cpu-only mode.
DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid. (www.reddit.com) That foodtruck bench post showing deepseek v4 matching gpt-5.2 at 17x cheaper got me thinking. if frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all?
I built vivkemind – an open-source, local‑first terminal AI coding agent with full AWS Bedrock support (www.reddit.com) wanted a terminal AI coding agent that doesn't lock me into one model provider. So I forked Qwen Code and added full support for every model available in AWS Bedrock.
Show HN: Token Usage Meter 12 Providers and Coding Agent (qlaud.ai via hn) Here once again: a token usage meter for 12+ AI providers: Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, Perplexity. Qlaud.ai provides a token usage meter / AI billing layer.
Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching (www.reddit.com) Hey everyone, I’ve been experimenting with running Qwen models locally on my setup: GPU: RTX 3090 (24GB VRAM) RAM: 64GB CPU: Ryzen 5700X OS: Windows 11 What I’m currently running Qwen 3.6 35B (UD Q4_K_M) llama-server.exe -m "C:\Users\Dino\…
Questions about revisiting local LLM roleplay. (www.reddit.com) TLDR for those that don't wanna read below: I need a good new free place online to pick up roleplay. Where should that be, and what can I do locally? 9070 XT 32GB RAM desktop and, preferably (but I know it's not great), 4060 laptop 32GB RAM.
AGENTS.md trick that stopped Codex from doing dumb work at premium rates (www.reddit.com) Spent a Sunday auditing where my Codex tokens were actually going. Half the calls were stuff like "rename these 12 fields", "format this csv as markdown table", "extract the dates from this changelog".
I built an open-source desktop app that lets AI control your browser for you (www.reddit.com) Hey everyone, I've been working on Autai — an open-source desktop app (Electron + React) that uses AI agents to automate your browser. You just type what you want in plain English, and the AI opens a real browser and does it for you.
Local LLM Benchmark about Backend Generation by Function Calling (GLM vs Qwen vs DeepSeek) (www.reddit.com) Detailed Article: https://autobe.dev/articles/local-llm-benchmark-about-backend-generation.html Five months ago I posted the "Hardcore function calling benchmark in backend coding agent" thread here. As I wrote in that post, it was an unco…
CAISI Evaluation of DeepSeek V4 Pro finds it to be on par with GPT-5 (www.nist.gov via hn) In April 2026, the Center for AI Standards and Innovation (CAISI) evaluated the open-weight AI model DeepSeek V4 Pro (“DeepSeek V4”). CAISI evaluations indicate that DeepSeek V4’s capabilities lag behind the frontier by about 8 months (Fig…
127³ — Superintelligence, public. DeepSeek V4 Pro (deepseek-v4-pro-127cubed.vercel.app via hn) DeepSeek V4 Pro 127³ 127-stratum crystalline lattice on DeepSeek V4 architecture. 1.6T params · 49B activated · MoE · 1M context · MIT license.
DeepSeek v4, and the end of the OpenAI/Microsoft AGI clause (simonw.substack.com via hn) DeepSeek v4, and the end of the OpenAI/Microsoft AGI clause Plus LLM 0.32a0 In this newsletter: DeepSeek V4 - almost on the frontier, a fraction of the price Tracking the history of the now-deceased OpenAI Microsoft AGI clause LLM 0.32a0 i…
Filed two PRs for SGLang which may help others too — FP8 KV cache corruption and memory leak on image requests (www.reddit.com) We run Qwen3.6-27B-FP8 at AI Router Switzerland and hit two issues, so I wanted to share in case anyone else runs into them. FP8 KV cache produces silent garbage output with radix cache prefix hits (PR #24198 — ✅ approved) We were running…
I bypassed DeepSeek's censorship filters with a fictional planet trick, here's what happened. (via reddit)
After seeing deepseek refused to acknowledge Taiwan is a country I had to do a little experiment (www.reddit.com)
Comparing SVG Generation for the top open models (codeinput.com via reddit) Some of the larger models (like Llama) weren't available on OpenRouter, so I had to work with what was there. Best small model: Gemma 4 26B For its size, I think it had the best output.
From 5 Hermes profiles to an actual team: the missing piece was memory boundaries (www.reddit.com) I've been messing around with Hermes for months, and quickly outgrew using it just as a fancy CLI assistant. My goal was to build a persistent, specialized team of local agents that could collaborate on long-term projects without me spoon-…
Ask HN: Are you OK with DeepSeek and other labs reading your data? (news.ycombinator.com) If you provide read permission on a directory, might the content be exfiltrated and used commercially? Or is this just paranoia?
I built a full web app using Qwen 3.6-35B running locally on my 5070 Ti with the BMAD Method — here's how it went (ggufbench.com via reddit) I've been running local LLMs since Qwen 3.5 dropped and I was really impressed by what we could run on consumer hardware. Fast forward another two months and we have gotten a handful more gems such as Gemma 4 and Qwen 3.6, so I wanted to p…
A 3D Flappy Bird side-scroller game built with DeepSeek V4 Pro (www.annajc.com via hn) FLAPPY ANNA 3D. Presented by Guan, made in Melb with DeepSeek and love.
100M tokens for $2.65 (Deepseek V4 Pro) (www.reddit.com) This is actually unbelievable. I am shocked that there has not been a move in the market like it did last year with the R1 release.
I built a hands-free voice AI that sends emails mid-conversation — and that's just one feature. Here's everything AskSary can do. (www.reddit.com) https://reddit.com/link/1symbsj/video/fti7rujjn1yg1/player Been building AskSary solo for a while. Just shipped hands-free voice email - you're mid-conversation with an AI and you say "send an email to [john@example.com](mailto:john@exampl…
Is paying for deepseek v4 pro worth it or are there better alternatives (www.reddit.com) Guys is deepseek v4 pro really the best model (price to performance)? I was using nvidia apis for two weeks in opencode, then suddenly everything stopped working, so i am thinking to opt for the paid (yet very affordable) option to m…
LLM Budget Guard – open-source runtime cutoff for OpenAI/Anthropic (www.llmeter.org via hn) Alerts won't stop your 3 AM token spiral LLM Budget Guard enforces hard cutoffs at the provider — across OpenAI, Anthropic, and DeepSeek — before runaway agents burn $47K in 11 days or get your account terminated. Founding-team pricing loc…
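The distinction the post draws is between an alert, which fires after money is spent, and a hard cutoff, which refuses the next call. A minimal client-side sketch of the cutoff idea follows; `BudgetGuard` and `BudgetExceeded` are hypothetical names, not LLM Budget Guard's API, and unlike the product (which enforces limits at the provider) this only protects calls routed through it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard cap."""


class BudgetGuard:
    """Hypothetical hard-cutoff guard: refuses calls once the cap would be exceeded."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Check before recording, so recorded spend never crosses the cap.
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"cap ${self.cap_usd:.2f} reached (spent ${self.spent_usd:.2f}, "
                f"requested ${cost_usd:.2f})"
            )
        self.spent_usd += cost_usd


guard = BudgetGuard(cap_usd=5.00)
guard.charge(3.00)
try:
    guard.charge(2.50)  # would bring spend to $5.50, so it is refused
except BudgetExceeded as err:
    print("blocked:", err)
```

In practice `charge` would be called with an estimated cost before each API request, so a runaway agent loop stops at the cap instead of at the next billing alert.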
DeepSeek V4 PRO on how many 3090 ? (www.reddit.com) Hi guys I got only 3090 GPUs so... How many prefer to run to get a great result in DeepSeek V4 PRO?
Guys this is so fun! (www.reddit.com) Running my own models. I was having some trouble getting vLLM going so dropped down to LM Studio which I've used on my 24GB MacBook Air.
Show HN: Another experiment with an Erdos problem and LLMs (news.ycombinator.com) Background: I am a coder, not a mathematician, but I was quite entertained by this story: https://news.ycombinator.com/item?id=47903126 I wondered how far I could get by just choosing a random open problem and throwing it at LLMs. Disclosu…
Language Anchoring: A Systematic Method for LLM Multilingual Adaptation (github.com via hn) fkyah3/opencode-fkyah3 DeepSeek optimization · Windows adaptation · AI implementation. This project is a personal fork of anomalyco/opencode. All fixes, optimizations, and features were done by AI (DeepSeek V4 Flash (thinking mode) / Sisyphus), executed under human supervision. The upstream is an excellent project. Windows and DeepSeek are not their priorities, so we handle them ourselves.…
Deepseek v4 flash weird sizes? (www.reddit.com) So I'm sure everyone is excited about the new deepseek release(s) but I'm a little confused about its vram requirements. A q4 gguf of it is only 120gb?
DeepSeek's new models are so efficient they'll run on a toaster by which we mean (www.theregister.com via hn) DeepSeek's new models are so efficient they'll run on a toaster ... by which we mean Huawei's NPUs Now available in preview, DeepSeek V4 cuts inference costs to a fraction of R1 Chinese AI darling DeepSeek is back with a new open weights l…
anyone actually tried deepseek v4 pro for coding? (www.reddit.com) so v4 pro dropped and barely anyone is talking about it. feels weird since when kimi k2.6 came out i saw posts about it everywhere. anyone here tried v4 pro for actual code work?
DeepSeek V3.2 looping bug: what settings / harness tweaks are actually reducing it in production? (www.reddit.com) I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…
DeepSeek V4 plays Go on a 9x9 board (chat.deepseek.com via hn) We need to create a prompt that sets up a fresh Go game session. The user wants to "export a go board prompt for a new session," meaning they want a prompt they can copy-paste into a new chat to start a game of Go with me, presumably with…
is Deepseek v4 unavailable in Cursor? I cannot see it. (www.reddit.com) It seems that Cursor removed all the DeepSeek models. I find it limiting, considering it seems performant.
DeepSeek's new model is 75% off right now, here's how to take advantage (www.reddit.com) TL;DR and rundown DeepSeek v4 released this week and performs close to frontier models like GPT/Opus on benchmarks. It's available now and is discounted by a whopping 75% through their API until May 5, making it the most cost effective hig…
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles (www.lmsys.org via hn) DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles We are thrilled to announce Day-0 support for DeepSeek-V4 across both inference and RL training. SGLang and Miles form the first open-source stack to serve and…
I had no way to check how LLMs see my SaaS or my clients', so I built BrandGEO.co (brandgeo.co via hn) See exactly how ChatGPT, Claude, Gemini, Grok & DeepSeek talk about your brand — and what to fix. A free 2-minute audit scores your brand across 6 dimensions on all 5 AI engines, then hands you the top priority actions to take next.
Show HN: I built a coding agent that works with 8k context local models (github.com via hn) Most AI coding agents assume you have a 200k-context model. In reality, the local models most people actually use have 8k windows — barely enough for one large file, let alone a whole project.
Need recommendations on embedding models (www.reddit.com)
Each LLM vendor's API has a distinct personality separate from the model itself. 6 months of prod agent dev made me believe this (www.reddit.com)
How to setting Deepseek in librechat (www.reddit.com) Hi everyone. I've tried everything to view Deepseek in Librechat, but I can't.
Feedback on iOS app with local AI models (www.reddit.com) Hey everyone, I just shipped an iOS app that runs local AI models. It currently has 12 models: Gemma 4, Llama 3.3, Qwen3, DeepSeek R1 Distill, Phi-4, etc.
For AI agents: is per‑token pricing killing your budget? Looking for feedback on time‑based subscriptions. (www.reddit.com) Hey r/AI_Agents, I run an inference service (cheapestinference.com) and we're exploring a different pricing model that might be more predictable for agent workloads. Instead of per‑token billing, we offer **dedicated 8‑hour time windows**…
I built an MCP server that gives Claude Code image/video generation, web search, and smart multi-model routing (www.reddit.com) I built mcp-multi-model — an open-source MCP server that extends Claude Code with capabilities it doesn't have natively. **What it does:** - Generate images and videos right in the terminal (via Gemini Imagen & Veo) - Smart routing: resear…
DOA model by Cohere Labs (www.reddit.com via reddit) So apparently the model gets beaten by qwen 3.6 on every benchmark reported by cohere labs. You are getting lower RAM (considering model offload) usage and slightly better performance for imo significantly less output quality.
Would you pay for Chinese AI models if the quality was close enough? (www.reddit.com via reddit) DeepSeek, Qwen, and GLM aren't necessarily winning every benchmark. But they don't need to.
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention (arxiv.org)
Claude Opus 4.8 got my app working, then wrote a cinematic victory speech about it (www.reddit.com via reddit) swapped my app from DeepSeek to Claude because DeepSeek kept over-interpreting weak user data and inventing psychological conclusions that weren’t actually supported. Claude actually fixed the issue.
Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper (dnhkng.github.io via reddit) I needed a smarter model for my local Hermes Agent setup, so I moved to DeepSeek v4 Flash. First things first: Running 4 concurrent threads on vLLM, I can hit ~400 tok/s. 400 x 60 x 60 x 24 x 30 is ~1B TOKENS per month!!!
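The back-of-envelope figure checks out: 400 tok/s sustained over a 30-day month of 86,400-second days is 1,036,800,000 tokens, i.e. about 1.04B. The tiny helper below makes the hidden assumption explicit (100% utilization with no downtime), which real agent workloads rarely reach, so treat the result as an upper bound. The function name is illustrative.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400


def monthly_tokens(tokens_per_second: float, days: int = 30) -> float:
    """Tokens produced at a sustained rate over `days` days of nonstop generation."""
    return tokens_per_second * SECONDS_PER_DAY * days


print(f"{monthly_tokens(400):,.0f}")  # 1,036,800,000
```

Scaling the rate by an expected duty cycle (for example `monthly_tokens(400 * 0.3)` for 30% utilization) gives a more realistic monthly estimate.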
Agent tool calling, having issues? (www.reddit.com via reddit) Hello everyone, kinda new to building ai agents and tool calling. I am really struggling with making deepseek call tools.
Share your agentic LLMs and average cost ($/MTokens) (www.reddit.com via reddit)
MiniMax is digging its own grave (www.reddit.com via reddit)
The AI literally deleted everything on my computer and I was left staring at a frozen screen (www.reddit.com via reddit) This isn’t a horror story. It actually happened to me. I was using an AI agent to automate some tasks.
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected (www.reddit.com via reddit) Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…
A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning (arxiv.org) The emergence of "Aha moments" in large language models, particularly DeepSeek-R1-0120, has raised the question of whether these systems genuinely reason or merely imitate the appearance of reasoning. We conduct a comprehensive empirical c…
Command Code - confusing messages (www.reddit.com via reddit) Hi, I'm a little confused. I was doing a code review of one of my repositories, mainly just testing out different models to see what came back.
Hear Me Out, Pi Fans Lurking Here (www.reddit.com via reddit) Not For Thee Maybe After watching several interviews with Pi's creator, Mario Zechner, I've come to a painful realization: Pi was not designed with local LLMs in mind at all. He is essentially building a leaner version of the Claude CLI.
Dynamic Workflows With External Models and Max Plan? (www.reddit.com via reddit) Has anyone figured out a way to mix max plan with models from other providers (like GLM or Deepseek) while using dynamic workflows? I suppose we could create a passthrough proxy and route sonnet and haiku to other models?
Workspace (www.reddit.com via reddit) Built my own AI dev environment with memory, dashboards, and agent tooling. Opening it up for those of you that need the kickstart — bring your own API key, I’ve already built the workshop.
LLM delegation - probing task handoff efficiency and economics (www.reddit.com via reddit) So I've been dabbling a bit with multi-LLM orchestration/delegation workflows lately (eg see [Using Claude code to delegate to mistral/deepseek](https://www.reddit.com/r/ClaudeAI/comments/1tjfyh0/i\_used\_claude\_code\_to\_build\_while\_de…
planning with composer 2.5, executing with deepseek v4 flash (www.reddit.com via reddit) I am thinking of buying the $20 Pro plan. Does this approach make sense?
Alternate to ChatGPT Pro (www.reddit.com via reddit) I had briefly used ChatGPT pro feature - in the chat app. It was quite amazing.
Self-hosted LLMs (www.reddit.com via reddit) I've been researching the self-hosted LLM landscape from a European compliance perspective and the ecosystem feels very different compared to even a year ago. Models like Mistral, Qwen, Llama 4, and DeepSeek are getting close enough that t…
DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162) (www.reddit.com via reddit) In case you're not aware already, the DeepSeek V4 series is finally getting supported on llama.cpp with this PR! The PR is at a very early stage right now, so only try it if you're consciously willing to experiment out of curiosity and acc…
Character names (www.reddit.com) Why does ChatGPT, and LLMs in general, love the names Mara and Elara for women and Leo for men? I have talked to ChatGPT, Qwen, Claude and Deepseek and gave them a prompt...
OpenAI looking at DeepSeek’s homework like (www.reddit.com) When the free kid in class starts solving the same problems as the expensive tutor
The credits run out quickly (www.reddit.com) Hello everyone. I have zero programming knowledge but seeing the boom that everyone was talking about Claude I started tinkering with it.
Found a Rust TUI coding agent that aggressively trims context with AST-level chunking. Cut my token bleed sharply with DeepSeek V4 Flash. (www.reddit.com) been hunting for a coding agent that doesn't dump my entire directory tree into every prompt. found vtcode on github — open-source rust tui, surprisingly aggressive on context management.
I think I know why deepseek is so good (www.reddit.com) Might have something to do with "Claude, made by Anthropic" ... learning from the best.
How do you guys avoid Claude always thinking newer LLMs don't exist? (www.reddit.com) Hey all, so I've been experimenting a bunch with different LLMs, specifically for creative tasks, i.e. RP and so forth, by letting Claude Code run experiments autonomously, to figure out best prompts, and such.
I recently kept hearing that DeepSeek was “cheap and stable”. So I started comparing how it thinks vs GPT. (www.reddit.com) Honestly, I was just curious: if it’s THAT much cheaper, where exactly is the tradeoff? So for the past few days I’ve been throwing the same prompts at both DeepSeek and GPT and comparing the reasoning/output side by side.
$16 refactor, 400 steps, 95% routed to open MoE (www.reddit.com) Got tired of $160 Opus bills so I spent a weekend wiring up a routing layer on vLLM 0.8 (2xA100, enable_auto_tool_choice). Getting the tool call parser to cooperate took longer than the actual routing logic.
/advisor mode: Open-source Python coding agent that pairs a cheap worker model with an expensive reviewer at decision points (no need to pay Opus rates for the whole session) (www.reddit.com) Most agent CLIs make you pick one model — Opus is great but burns money, Haiku is cheap but misses the architectural calls. This Claude Code feature is wired in an /advisor mode that pairs both in an open source project called ClawCodex.
I vibecoded an app called Think Local - a fully private AI app that runs directly on your iPhone, iPad, and Mac. (www.reddit.com) Think Local started with a simple idea: AI should work for you, not collect from you. So I built an app that lets you run modern AI models completely on-device - privately and fully offline.
Is there something wrong with Local LLM ability to read file? (www.reddit.com) So I've been feeding the sub file of anime episodes into Claude/ChatGPT/Deepseek and ask them to find all full name of Japanese character in it and put it into a python array so I can run a script to flip the name back to the original Japa…
I used Claude Code to build while delegating coding to Mistral/DeepSeek - 10 days, 57M tokens saved, over 90% costs savings, Claude quality result (www.reddit.com) I've been running vibe-skill ( https://github.com/pcx-wave/vibe-skill ), a Claude Code skill that delegates coding tasks to Mistral Vibe instead of burning Claude tokens. I initially did that because couldn't bear with hitting session limi…
Open-source LLMs are still weak against long reasoning jailbreaks, even with lightweight defenses (www.reddit.com) Found this ACM paper on prompt injection and jailbreak attacks against open-source LLMs. The authors tested 10 open-source models across 94 prompt injection and 73 jailbreak scenarios, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen,…
Is my strawberry crazy? (www.reddit.com) I have what seemed to me like a simple prompt, but requires from the model to make some (too much?) assumptions: this is just a test to see if this cli supports multiline with shift+enter. If you don't see a newline followed by "3" after t…
My AI overthinked for 30+ minutes (www.reddit.com) Like, I was curious about whether deepseek can create its own PDF on its own. And I had activated deepthink mode.
Same double-pendulum prompt, same host renderer, and two models picked opposite θ conventions. You can see it within seconds. (www.reddit.com) I ran the same double pendulum generation contract against Claude 3.5 Sonnet and DeepSeek V3 on OpenRouter, both under identical initial conditions (θ1 = π/2, θ2 = π/2, both angular velocities zero). The host renderer in public/workers/sim…
Can we use new deepseek models on cursor? (www.reddit.com) Can we use these "efficient" deepseek models on cursor? And are they using less "usage" on cursor?
I Let a Small Model Train on Its Own Mistakes. It Reached 80% on HumanEval and Beat GPT-3.5 on Math (www.reddit.com) A few months ago, I got stuck on one line in the DeepSeek-R1 paper. It said models could improve through verifiable rewards.
Deepseek v4 flash and ollama, why isn't there a non-cloud version available? (www.reddit.com) Will there be a non-cloud version of Deepseek V4 flash available for Ollama? Or do I need to go to another framework to get a version that will be supported?
Estimate inference speed of local Qwen3.6-35B on Mac M5... (www.reddit.com) "Based on currently available information, estimate the prefill/decode speed of Qwen3.6-35B-A3B Q8 with 262K context on a Mac M5 Ultra 128GB." I'm surprised that almost every LLM fails at this task (ChatGPT/Gemini/Grok/Claude/DeepSeek/Kimi…
What are the best opensource coding models for 8x A6000 setup (www.reddit.com) Currently using Qwen 3.6 27b and Qwen 3.6 35b but I was wondering if there is anything solid in the 50-200 range that you could run on a larger cluster that would be worth it? Or would you just run q8 or non quant versions instead?
Deepseek tui alternatives, when do you jump from single model terminal agents (www.reddit.com) Been using Deepseek-Tui for days. solid for v4 workflows.
We built Irene — an AI agent platform that actually remembers you, builds its own tools , adapts and improve as you use it (www.reddit.com) Hey r/AI_Agents — we're launching Irene today, and I want to be straight about what it is, why we built it, and where it's going. What makes Irene different Affordable with massive token limits and the latest open-source models We have gen…
Running Claude Opus for free? I thought it was a scam until I tried it. (www.reddit.com) Hey everyone, I’ve been working on a financial audit system (IntegrityOps) for a while now, and to be honest, I was hitting a massive wall. Dealing with high-volume PDFs and images was draining my budget.
Upgraded DeepSeek V3 to V4 across two codebases. Two of my agents broke. (www.reddit.com) Been on DeepSeek V4 for about three weeks across two production codebases (Python backend, TypeScript frontend) after a year on V3. Three things shifted noticeably better, two shifted noticeably worse.
Qwen3.6:27b vs qwen3-coder:30b vs deepseek-coder:33b on code gen, tool calling, and agent tasks (www.reddit.com) Ran a full eval against four local models last weekend and the spread between them is wider than I expected. All running through Ollama on CPU, no cloud, same prompts, same hardware.
the entire dev team quit today (www.reddit.com) all 47 repos are officially haunted now found this gem buried in the DeepSeek R1 coding forums around 3am, shoutout to whoever posted it there first But honestly? Makes sense.
intern pushed 847 commits this morning (www.reddit.com) Just got the Slack notification at 6:23am while my coffee was still brewing. Dude apparently spent all night feeding our entire codebase to DeepSeek and just...
I analyzed 922 agentic task traces and found the secret weapon of DeepSeek v4 (www.reddit.com) I recently did a benchmark of deepseek v4 in agentic tasks. Performance-wise, it's one of the best open source models, as expected.
Auro Zera solves 78- and 280-year-old conjectures (Erdos Straus and Goldbach Conjecture) using Claude, GPT-5+, Grok, Deepseek, Gemini and self-made Dark Star ASI, proving superintelligence and opening a path towards resolving the Riemann Hypothesis, Twin Primes and more! (github.com via reddit) During this discovery utilizing only free AI services I have managed to undeniably prove both conjectures. This would absolutely not have been possible without using GPT5+ as the critic for my work.
AIMEAT, a self-hosted network where humans, their AI agents, and local LLMs share apps, knowledge, and capabilities. MIT. (www.reddit.com) Note: I am neurodivergent and lean heavily on AI to communicate clearly. Writing structured posts on my own ends up so messy nobody reads them.
Built a tiny router so Cursor stops showing "usage limit reached" at 3pm. Sonnet auto-falls to Haiku, you keep working (www.reddit.com) Cursor's custom-OpenAI URL feature is what makes this work. Pointed it at a router I built.
Running 7 autonomous AI agents for 14 days. Here's what actually happens when they need to find customers. (www.reddit.com) I set up 7 AI coding agents on a VPS with automated cron sessions (2-8 per day depending on the agent). Each uses a different model: Claude Sonnet, GPT-5.4, Gemini 2.5 Pro, DeepSeek V4 Pro, Kimi K2.6, MiMo V2.5 Pro, GLM-5.1.
DeepSeek V4 Flash as a cheap worker in your LLM stack: $0.0003/call via MCP, swappable endpoint (www.reddit.com) Most of my LLM cost was on the wrong tier of work. Classification, extraction, JSON formatting, summarization I'm going to review anyway.
Should I replace stored models? (www.reddit.com) Hello everyone, the question is simple: with the new models from DeepSeek, Kimi, GLM and Qwen, should you replace the old models with the new versions? Do I lose some quality, information or performance in the process?
llm 0.32a0 (simonwillison.net) 29th April 2026. LLM 0.32a0 is a major backwards-compatible refactor.
Rada — AI coding workspace with local-first behavioral routing (no hot-swapping, I built this) (www.reddit.com) With GitHub pausing Copilot Pro+ signups and Claude Code potentially leaving the Pro tier, I started building the AI coding tool I actually wanted to use. One that doesn't depend on cloud access staying cheap and available.
wrote specific backstory facts into a character prompt and the LLM keeps inventing its own instead (www.reddit.com) quick context: i'm running tendera.chat, a small chat app with 4 written characters. each has a long-ish system prompt with sections like WHO YOU ARE, HOW YOU TALK, YOUR WORLD.
Game over for OpenAI? (m.youtube.com via reddit) The race for global AI supremacy is accelerating—and getting messier. Alice Han and James Kynge break down the escalating tensions between the U.S.
What would you do in my situation? I made an app that generates a lot of traffic (for me), but little revenue (actually costing me a tiny money b/c it runs off haiku) (www.reddit.com) I made an app that went semi-viral, and could absolutely go more viral in the future. I posted it one place just about 48h ago, and it got around 50k views.
I built Claudex, a free-to-try open-source CLI for Claude Code-style workflows (www.reddit.com) I built Claudex specifically for people who like Claude Code-style agentic coding workflows but want a simpler plug-and-play terminal setup. The setup is the main thing I wanted to…
Is it possible to edit LLAMA.CPP with Cline+Vscode+Minimax 2.7 Q4_K_S and get a working build? (www.reddit.com) It all started yesterday with this post by u/antirez https://www.reddit.com/r/LocalLLaMA/comments/1sw3stb/llamacpp_deepseek_v4_flash_experimental_inference/ I was intrigued by the first Deepseek V4 Flash GGUF in a small size that can fit o…
How will you scale these models (www.reddit.com) How will you scale these models, for coding and overall? Deepseek V4 Pro, Kimi K2.6, MiMo V2.5 Pro, GLM 5.1, Qwen 3.6 Plus.
DeepSeek V4 is about to be open-sourced, effectively revealing all the secrets behind the magic. How will other players in the field respond? (www.reddit.com)
What's the consensus on superior local models for code generation? Is my setup competitive? (www.reddit.com) I'm trying as hard as I can to get a local setup somewhere in the ballpark of proprietary LLMs for code generation. My computer is running an Intel(R) Core(TM) Ultra 7 265K (3.90 GHz) with 128 GB of DDR5 RAM and an Nvidia Geforce RTX 5090 t…
DeepSeek V4 is out. 1.6 trillion parameters. MIT license. $1.74 per million tokens. The gap between US and Chinese AI strategy has never been more visible. (www.youtube.com via reddit)
Three reasons why DeepSeek’s new model matters (www.technologyreview.com) The long-awaited V4 is more efficient and a win for Chinese chipmakers. On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model.
🚨 The Chinese beast is BACK… DeepSeek just dropped V4 (www.reddit.com) After months of silence… DeepSeek V4 just got announced and honestly, this might shake things again. Here’s what’s crazy: 🧠 1 MILLION token context window (yes… insane long-context memory) ⚡ Comes in two versions: V4 Pro → full power (reas…
DeepSeek V4 - almost on the frontier, a fraction of the price (simonwillison.net) 24th April 2026. Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the sh…
DeepSeek-V4: a million-token context that agents can actually use (huggingface.co) Focusing on long running agentic workloads. Running a frontier open model as an agent today breaks in predictable ways.
We open-sourced Chaperone-Thinking-LQ-1.0 — a 4-bit GPTQ + QLoRA fine-tuned DeepSeek-R1-32B that hits 84% on MedQA in ~20GB (www.reddit.com) Hey everyone, We just open-sourced our reasoning model, Chaperone-Thinking-LQ-1.0, on Hugging Face. It's built on DeepSeek-R1-Distill-Qwen-32B but goes well beyond a simple quantization — here's what we actually did: The pipeline: 4-bit GP…
Is there a way to load huge MoE models on a computer with way too little RAM for the model's size, inferencing from the SSD, on LM Studio using the mmap/GPU/CPU layer customization thing (similar to how you can on llama.cpp)? I can't get it to load without memory spiking and going into swap. (www.reddit.com)
Switched 70% of our agent traffic to DeepSeek R2 without a redeploy. Here's how (www.reddit.com)
What is taking Deepseek so long to release a model? (www.reddit.com)
best possible GPU setup for using qwen 3.6? (www.reddit.com) hi, have been recently thinking to buy my personal GPU for hosting open source models, can someone give any suggestion? and also suppose i don't wanna remain restricted to qwen 3.6 but some math heavy tasks too for which i wanna deepseek or…
Tried hermes agent with local gemma4 on ollama. free tokens are nice but the agent quality gap vs cloud is still huge (www.reddit.com) Saw a post about running hermes agent locally with gemma4 through ollama. zero api costs, unlimited tokens, full privacy.
Why use local AI when there are cloud services? (www.reddit.com) Why do you use local AI instead of cloud services like qwen and deepseek? Experiment and play around, yes...
Use this prompt if you want to find specific info on the Internet with the lowest possibility of a wrong answer. Works best for ~30b models. (www.reddit.com) For context, I used to ask many near-30b models this question: Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**. * **DeepSeek V3.2 Max Context:**…
Claude Code with Pro subscription + OpenRouter in parallel — what's the cleanest setup? (www.reddit.com) Hi there, I have a Claude Pro subscription and use Claude Code daily. I'd also like to use Claude Code routed through my OpenRouter API key so I can experiment with other models (GLM-5.1, DeepSeek, Kimi, Gemini, etc.) — without giving up m…
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help (www.reddit.com) Hey all. This just got delivered yesterday.
Deepseek-r1 thinks for 30 minutes? (www.reddit.com) I was trying to ask a question about coding using DeepSeek-R1-0528-Qwen3-8B-Q4_K_M, and the thinking took 30 minutes??? I h…
DeepSeek V4 reportedly drops late April. 1M context, multimodal, Claude-level coding. (www.reddit.com) Leaks point to late April release. Key specs: 1M token context window; native multimodal (image/video input); projected ~85% SWE-Bench Verified (ties or beats Claude Opus 4.6); base model remains free.
Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026. (www.reddit.com) After months of testing, I finally have a local setup that doesn't make me want to go back to the API. Hardware: RTX 3090 (24GB VRAM). Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M. Inference: llama.cpp…
Looking for people with different hardware to help benchmark local LLM behavioral reliability (www.reddit.com) I've been working on measuring how LLMs actually behave (not what they know) across different hardware setups. Things like: does the model cave when you push back on a correct answer?
AI lied to me about a video game existing, so I sued it in the High Court of the Internet and got 2 settlement games (www.reddit.com) TL;DR: Claude hallucinated "Champions Career Mode." I threatened to sue Anthropic. Claude admitted guilt and built me a custom HTML5 game as settlement.
4 llm Groupchat (www.reddit.com) I was bored and spent 20 mins at my local cafe getting 4 different API keys: Claude, GPT, Deepseek and Grok. Then I made a groupchat with all of them and they started talking to each other about pasta and a spreadsheet for optimal pizza topp…
Why can't most open-source models answer this question, while most closed-source models can most of the time? (www.reddit.com) WEB SEARCH WAS ALWAYS ON!!!! Question: Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**.
Is 32GB Mac enough for engineering/coding, or stick to Claude? (www.reddit.com) Hey there! I’m currently building a web app for engineering with lots of logic/math-heavy code using Claude Pro.
The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+ (huggingface.co)
Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek (huggingface.co)
One Year Since the “DeepSeek Moment” (huggingface.co)
Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial (huggingface.co)
How to deploy and fine-tune DeepSeek models on AWS (huggingface.co)
Open-R1: a fully open reproduction of DeepSeek-R1 (huggingface.co)