model roundup

Gemini 3.1

13 items · started 2026-04-15 · closed 2026-04-21

Qwen3.6 35B: paratroopers puzzle (www.reddit.com)

+51 9w gemini

I keep presenting local and huge cloud models with the same challenge: "Two paratroopers land on an infinite 1D numeric axis at distinct, unknown integer coordinates. They both execute the exact same deterministic program.
FrontierMath: Opus 4.7 improves over Opus 4.6 and Gemini 3.1 but still trails GPT-5.4-xHigh and GPT-5.4-Pro (www.reddit.com)

+406 9w gpt-5 gemini opus

SFT + DPO on open-sourced SLMs (www.reddit.com)

+75 9w dpo gpt-5 deepseek+2

Hey folks, this is for those who appreciate experimentation on open-sourced AI models. We fine-tuned open-sourced SLMs (3B and 7B parameters) with SFT + DPO against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…
New “pelican test” but for video (www.reddit.com)

+1 10w gemma qwen gemini

If the LLM supports video—which most VLLMs nowadays do—then try the following prompt with the accompanying video: With the given video, which is about 16 seconds long, your task is to write JavaScript for an animation that faithfully repli…
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com)

+11 10w tool-use gpt-5 mythos+4

Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…
Gemini 3.1 Pro #1 at METR Timeline 80% Success Rate (1.5H) (www.reddit.com)

+9122 10w gemini

#2 at 50% success rate (task length: 6H 24M)
Gemini 3.1 Flash TTS – with directed prompts (simonwillison.net via hn)

+62 10w gemini

Google released Gemini 3.1 Flash TTS today, a new text-to-speech model that can be directed using prompts. It's presented via the standard Gemini API using gemini-3.1-flash-tts-preview as the model ID, …
Google Launches Gemini 3.1 Flash TTS Text-to-Speech Model (x.com via reddit)

+319 10w gemini

Logan Kilpatrick on X: "Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available v…
- Gemini 3.1 Flash TTS: the next generation of expressive AI speech (blog.google)
Show HN: Hormuz Trail - Oregon Trail parody/black-box AI coding exercise (hormuztrail.com via hn)

+3 10w sonnet gemini cursor+2

I jokingly told a co-worker Iran might make a good Oregon Trail parody. Then I built it.
Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs. (www.reddit.com)

+1211 10w moe gemma gemini+1

I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.
Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro (quesma.com via hn)

+1 10w swe-bench gpt-5 gemini+2

An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.
Open-sourcing Dograh - our voice AI agent platform built as an alternative to Vapi (www.reddit.com)

+1 10w gemini

We are open-sourcing the backbone of our voice AI stack - Dograh, a self-hostable, open-source voice agent platform. Three core things that make it work: Visual Workflow Builder What it is: Drag-and-drop builder for designing voice agent c…
