Ask HN: What are you doing during inference? (news.ycombinator.com)
I’ve not seen any good discussion on this, and friends have very varied answers. If you’re using agents to program, what are you doing while they work?
SpecDD
SpecDD is an experimental approach to Specification-Driven Development for AI-assisted software projects. SpecDD uses small, local, human-readable .sdd files that live beside the code they describe.
Asked Claude to show me the Tokens spent on each Query (www.reddit.com)
After reading it I realized there's actually some pretty useful stuff for anyone who chats with ChatGPT, Claude, Grok or whatever. They measured what they call functional wellbeing (basically how much the model is in a “good state” versus…
-
131 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
- 24m Ran my own benchmark Qwen 3.6 35B vs Gemma 4 26B.... theres a clear winner here
- 1h I stumbled on a Gemma 4 chat template bug for tools and fixed it
- 6h llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged
- 12h Gemma-4 MLX reasoning?
- 13h Gemma4-31B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM poor Mac users.
10 items
model roundup
Gemini 3.1
Gemini 3.1 Pro and Gemini 3 Flash models have been released, addressing issues with previous versions but facing some API compatibility problems. Meanwhile, benchmarks show Gemini outperforms other models like Deepseek V4 Pro in certain tasks, though significant gaps remain between open and closed lab models.
- 34m The Significance of Google's recent TPU 8t and TPU 8i
- 4h I built a hands-free voice AI that sends emails mid-conversation — and that's just one feature. Here's everything AskSary can do.
- 23h Unexpected $50 charge due to hidden model settings — is this intended?
- 1d Real benchmark breakdown in AI agents
- 3d GPT 5.5 vs Opus 4.6/7 vs Gemini 3.1 Pro
Ask HN: How do you differentiate with AI coding interviews? (news.ycombinator.com)
I haven’t interviewed for a coding job in a while, not since before AI coding was a thing. I’m wondering: if your interview process allows the use of tools like Claude and Codex, how do you differentiate candidates?
Been running Claude Code on multi-hour autonomous sessions for a few months and kept hitting the same wall: the longer it runs, the worse the work gets. Not a context-window problem (1M handles that fine), but a feedback-loop problem.
Using Codex and ForgeCAD to Make a Model of the Teenage Engineering KOII (twitter.com via hn)
Ok...
Official Claude Discord channel wiped/reset? (www.reddit.com)
Claude official Discord server seems to have all channels wiped and reset? Earlier there was an announcement that new signups were paused due to security issue related to spam and bots.
-
78 items
model roundup
Opus 4.6
Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.
136 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
Help with MI50 and llama.cpp/ROCm 7.2 (www.reddit.com)
I have an MI50 that I use with llama.cpp/Vulkan, however some models run quite slowly, so I'd like to try the ROCm backend, but no matter what I try it doesn't work. Downloading the missing files from ArchLinux package doesn't work.
Claude connects to Adobe now? (www.reddit.com)
From Claude’s announcement today (Apr 28, 2026), they can now connect to creative tools including Adobe for creativity: https://www.anthropic.com/news/claude-for-creative-work “Adobe for creativity enables users to bring images, video…
- Claude now connects to Blender (youtu.be via reddit)
MiMo-V2.5-GGUF (preview available) (huggingface.co via reddit)
Hi, AesSedai here - I've put up a PR to support the text-to-text inference of MiMo V2.5 with llama.cpp (and should also support Pro, will work on those quants after finishing V2.5): https://github.com/ggml-org/llama.cpp/pull/22493 I've als…
I just made this product promo video completely with Claude code. Explaining the process here with the prompts.
-
34 items
model roundup
GPT 5.4
OpenAI has released GPT-5.4-Cyber for testing and claims it will compete with Claude Mythos. Meanwhile, GPT-5.4 Pro has solved the Erdős Problem #1196, showcasing its advanced capabilities in mathematics.
- 1h A GPT-5.4 bug led to OpenAI banning goblins and raccoons
- 16h How is deep seek v4 not SoTA?
- 21h Running an autonomous agent across Claude Code + Codex + a local 35B almost killed my host. The harnesses were heavier than the model.
- 22h Is 15% context growth per loop a fair benchmark for agent cost estimation?
- 1d Chat GPT 5.4 solved a 60+ years unsolved erdos problems in a single shot
78 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
- 2h Musk Testifies OpenAI Was Created as Nonprofit to Counter Google
- 8h Lawyers for Sam Altman's sister quit representing her in lawsuit vs. OpenAI CEO
- 11h 'Stole a charity': Elon Musk accuses Sam Altman of betrayal in courtroom
- 12h "He wanted to be CEO": Early OpenAI VC Vinod Khosla says Elon Musk’s bid for control led to the Sam Altman feud and his major investment
- 13h The Download: Musk and Altman's legal showdown, and AI's profit problem
New Google Networks Tuned Up for GenAI Inference and Training (www.nextplatform.com via hn)
It is almost certainly not a coincidence that a networking expert at Google has risen to the top to be put in charge of the infrastructure development at the search engine, adve…
I was reading Anthropic’s piece on “Claude for creative work,” and it made me rethink the whole “AI will replace creatives” narrative. Their framing is surprisingly grounded: AI isn’t really about generating final creative output.
Any front end repo navigable with mock data, no back end (news.ycombinator.com)
We built a tool that instruments a frontend repo (Angular, React, tested with auth guards and deep API coupling) so it runs entirely on mock data with zero backend dependency. Any screen in the app becomes instantly navigable.
Consistency is not reliability in agent evals (www.reddit.com)
Consistency is a normal-conditions metric. Reliability is a stress-conditions metric.
-
224 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 2h Deepseek v4 pricing is genuinely silly, did the math and now i am questioning my entire stack
- 6h Opus 4.7 is just 4.6 with a stick up its butt. Give me my tokens back!
- 6h Claude Status Update : Elevated errors on Claude Opus 4.7 on 2026-04-29T00:00:29.000Z
- 8h Running Opus 4.7 for ops work: how do you keep per-task cost predictable?
- 10h Two new behaviors in Opus 4.7
7 items
model roundup
Gemini 3
Gemini 3 Flash has become a popular choice for automated promotions due to its high productivity. Deepseek V4 Flash costs one-fifth as much as Gemini 3, making it a competitive alternative.
- 2h ChatGPT/Gemini can now draw on your screen to help you navigate complex software
- 1d Show HN: Prediction market analysis app layering LLMs with data APIs
- 1d Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
- 4d Anyone else noticing how Gemini-3-Flash is becoming the 'hidden' beast for automated promotions, its so productive?
Q – A Slim LLM CLI (github.com via hn)
A slim LLM CLI for your terminal. Ask questions, debug errors with session context, and redact secrets, all from a single shell script.
OpenGame: Open Agentic Coding for Games (arxiv.org via hn)
Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (…
Datadog dropped their State of AI Engineering report this week. The numbers reframed how I think about LLM reliability.
Making ChatGPT free for clinicians sounds like a clear win. Less admin work, faster documentation, quicker access to information.
I built OWASP-style security skill packs for LLM apps (NPM install) (www.npmjs.com via hn)
mii-ai-security: security-focused `SKILL.md` packs for reviewing and hardening LLM systems.