Differences Between Opus 4.6 and Opus 4.7 on MineBench (www.reddit.com)
model roundup
Gemini 3.1
-
Some notes: For what's supposedly the SOTA model that beats all other models on essentially every benchmark, I honestly expected it to be a lot more consistent. You'll notice how sometimes it focused too much on the scenery (like the arcade…
-
Qwen3.6 35B: paratroopers puzzle (www.reddit.com)
I keep presenting local and huge cloud models with the same challenge: "Two paratroopers land on an infinite 1D numeric axis at distinct, unknown integer coordinates. They both execute the exact same deterministic program.
-
SFT + DPO on open-sourced SLMs (www.reddit.com)
Hey folks, this is for those who appreciate experimentation on open-sourced AI models. We fine-tuned open-sourced SLMs (3B and 7B parameters) with SFT + DPO against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…
-
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com)
Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…
-
Gemini 3.1 Pro #1 at METR Timeline 80% Success Rate (1.5H) (www.reddit.com)
#2 at 50% success rate (task length: 6H 24M)
-
Gemini 3.1 Flash TTS – with directed prompts (simonwillison.net via hn)
Google released Gemini 3.1 Flash TTS today, a new text-to-speech model that can be directed using prompts. It's presented via the standard Gemini API using gemini-3.1-flash-tts-preview as the model ID, …
-
Logan Kilpatrick on X: "Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available v…
-
Show HN: Hormuz Trail - Oregon Trail parody/black-box AI coding exercise (hormuztrail.com via hn)
I jokingly told a co-worker Iran might make a good Oregon Trail parody. Then I built it.
-
I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.
-
Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro (quesma.com via hn)
An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.
-
We are open-sourcing the backbone of our voice AI stack - Dograh, a self-hostable, open-source voice agent platform. Three core things make it work. Visual Workflow Builder. What it is: drag-and-drop builder for designing voice agent c…