model roundup

Gemini 3.5

43 items · started 2026-05-19 · closed 2026-06-01

DocumentAI Visual Benchmark - GPT 5.5, Gemini 3.5, Qwen... (www.maltebuettner.eu via hn)

+1 3w qwen gemini

DocumentAI bbox benchmark. In my previous post, I talked a bit about the recent developments in the field of DocumentAI. Now comes the practical part.
Gemini 3.5 Flash beats Opus 4.8 on bluffbench (bsky.app via hn)

+22 4w gemini opus

Re-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out!
Show HN: Audit your Anki flashcards at flashcardaudit.com (flashcardaudit.com via hn)

+3 4w gemini

Hey, my name is Tyler, I made this. flashcardaudit.com is a tool that allows users to upload an Anki collection so that an AI auditor (Gemini 3.5 Flash) can review the factual correctness of each Anki card.
Extra High thinking level, possibly with Gemini 3.5 Pro, may soon be released (www.reddit.com)

+1 4w gemini

Ask HN: Is it just me or has Gemini enshittified in the last three weeks? (news.ycombinator.com)

+21 4w gemini claude-code

As someone who's been using the Gemini Pro plan for the past 9 months, I noticed a massive jump in the amount of rate-limiting I'm getting from Gemini since around the beginning of May. It seems to coincide with the updated UI and the rele…
How do you handle trying new models without spending too much? (www.reddit.com)

+22 4w qwen gemini

New models pop up constantly: Qwen 3.7, Gemini 3.5 Flash, etc. Every time a better one launches, I want to try it, but I don't want to add more subscriptions.
Image processing? (www.reddit.com)

+11 4w gemini claude-code

How good is Claude’s image processing capability? Basically, I want Claude Code to detect any issues in AI-generated presentations (around 5–7 presentations with 5–8 slides each).
Why are AI models getting more expensive? (www.reddit.com)

+1125 4w gemini opus

The trend before was that models became less expensive for their capabilities, many corporations bet on that, and it backfired. Opus 4.7, GPT 5.5, Gemini 3.5 flash.
Google makes Gemini 3.5 Flash the default AI model for billions of users (techthreedots.com via hn)

+2 5w gemini

Google is rolling out Gemini 3.5 Flash as the default model behind the Gemini app and AI Mode in Search this week, putting its newest model directly in front of billions of users worldwide. The switch matters because it changes the model m…
Direct LLM vs Model Context Protocol (MCP): A benchmark on API costs and latency. (www.reddit.com)

+22 5w model-context-protocol gemini mcp

Like everyone else, I’ve been testing the newly released Gemini 3.5 Flash. The speed is phenomenal, but I wanted to see how it handles large, structured data aggregations directly in the prompt versus using a delegated tool architecture.
Tell HN: Gemini 3.5 Flash breaks in stupid ways (news.ycombinator.com)

+51 5w hallucination gemini

I thought I was going crazy, trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers. Apparently once you add a "Grading criteria" text, the model collapses into a "compressed toward the…
Google is cooking, just give them some time (Gemini 3.5 Pro) (www.reddit.com)

+5645 5w gemini

Grok 4.3 tops the Consistency Leaderboard in the LLM Sycophancy Benchmark, largely because it is one of the most cautious models. (www.reddit.com)

+263 5w gpt-4 mistral grok+2

Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.
Gemini 3.5 deleted 28,745 lines, broke production, and wrote a fake post-mortem (old.reddit.com via hn)

+1 5w gemini

Google's latest creation: Gemini 3.5 Flash vs all (www.reddit.com)

+16567 5w grok gemini chatgpt

https://gemini.google.com/share/c2a187275e26 archive link https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698 https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2 https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1…
datasette-agent-charts 0.1a2 (simonwillison.net)

5w gemini

21st May 2026 - "View SQL query" buttons below rendered charts. Recent articles - Datasette Agent - 21st May 2026 - Gemini 3.5 Flash: more expensive, but Google plan to use it for everything - 19th May 2026 - The last six months in LLMs in…
Gemini 3.5 Flash beating GPT 5.5, a bigger and pricier model, in agentic benchmarks (second image is from Zapier automation benchmarks) (www.reddit.com)

2 5w gemini agentic

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it. (www.reddit.com)

+173 5w gemini

Post I/O Review related to AI (pros and cons) (www.reddit.com)

2 5w gemini mcp agentic

Well, it was not as disastrous as many people say, but there were some pros and cons that everyone will agree with. Btw, Gemini 3.5 Flash is an absolutely amazing model; don't pay attention to some peo…
Gemini 3.5 Flash ranks #1 on Automation Bench (from Zapier), beating every other frontier model at a much lower cost (www.reddit.com)

+96 5w gemini

Gemini 3.5 Flash vs Gemma4 31B - building SuperMario (Sound on!) (www.reddit.com)

+121 5w gemma gemini

Asked new Google Model to build SuperMario. Compared with Local Gemma4.
Claude Code, now powered by Gemini 3.5 Flash, GPT-5.5, Grok 4.3, and more (dechained.ai via hn)

+3 5w grok gpt-5 deepseek+3

Claude Code, now powered by OpenAI, xAI, DeepSeek, and more. Change models with 1-click.
Gemini 3.5 Flash scores 76.7% on SimpleBench, just 0.2% short of GPT 5.5 Pro's score (www.reddit.com)

+187 5w gemini

Surprised it scored that high on these questions, considering how it scored in some other fields. (no open-ended version score yet)
Gemini 3.5 Flash improves over Gemini 3.1 Pro on the Short Story Creative Writing Benchmark: -2.3 → -1.8. (www.reddit.com)

+203 5w gemini

This benchmark uses head-to-head comparisons of stories written in response to the same constrained creative briefs. The target range is 600-800 words.
Gemini 3.5 Flash has 14x cost multiplier in GitHub Copilot (github.blog via hn)

+1 5w copilot gemini

Gemini 3.5 Flash is generally available for GitHub Copilot. Gemini 3.5 Flash, Google’s latest Flash-tier model, is now rolling out on GitHub Copilot. In our early testing, Gemini 3.5 Flash delivers near-Pro coding quality at Flash-tier spee…
Don't share your opinion, if you didn't test it !!! (www.reddit.com)

+7 5w gemini opus agentic

I see many people forming opinions based on what they previously saw or on what others have said. Even though they don't test models thoroughly, they still give their opinion, which is so frustrating.
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room (www.reddit.com)

+4127 5w qwen gemini

https://preview.redd.it/42ak5qmus82h1.png?width=1133&format=png&auto=webp&s=744ea3dfc06c83d0c4d8aa128c39b3238b17d7be Qwen 3.7 Max sitting at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just released Gemini 3.5 Flash.…
Gemini 3.5 flash is not that great at coding (www.reddit.com)

+12 5w gemini cursor

https://cursor.com/evals
Gemini 3.5 Flash vs GPT 5.5? What's your opinion on it (www.reddit.com)

+31 5w gemini

Gemini 3.5 Flash costs more to run while being less Intelligent than 3.1 Pro (www.reddit.com)

+55 5w gemini

I'm surprised
Gemini 3.5 Flash scores 1479 on the Debate Benchmark. Ratings are Elo-like and centered near 1500. (www.reddit.com)

+93 5w gemini

100s of topics. They include dating apps, school smartphones, older-adult care, shrinkflation, eurozone politics.
Gemini 3.5 Flash: cost per puzzle vs. performance on the Extended NYT Connections Benchmark (www.reddit.com)

+91 5w gemini

More info: https://github.com/lechmazur/nyt-connections/
Gemini 3.5 Flash: more expensive, but Google plan to use it for everything (simonwillison.net)

5w gemini

19th May 2026. Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google ap…
datasette-llm-accountant 0.1a4 (simonwillison.net)

5w gemini

19th May 2026 - Fixed bug tracking chains of responses. Refs datasette-llm#7 Recent articles - Gemini 3.5 Flash: more expensive, but Google plan to use it for everything - 19th May 2026 - The last six months in LLMs in five minutes - 19th…
- datasette-llm 0.1a8 (simonwillison.net)
Open weights GLM and Mimo are better than Gemini 3.5 flash according to arena (www.reddit.com)

+33 5w glm gemini

While we are weathering the gemini 3.5 flash hype, keep in mind that according to arena, GLM and Mimo are better. https://arena.ai/leaderboard/text/coding-no-style-control #7 GLM #9 Mimo #12 Gemini 3.5 Flash
Gemini 3.5 flash scores, hasn’t even beat GPT 5.4 xhigh (www.reddit.com)

+2028 5w gemini

Gemini 3.5 Flash looks worse than it seems on Artificial Analysis (www.reddit.com)

+1311 5w gemini

Looking at Artificial Analysis, Gemini 3.5 Flash seems to compare strangely against Gemini 3.1 Pro. Numbers from Artificial Analysis: Gemini 3.1 Pro - Intelligence score: 57 - Cost: $892 - Pricing: $2 / $12 per 1M input/output tokens Gemin…
Google announces agent-optimized Gemini 3.5 Flash and a do-anything model called Omni (arstechnica.com)

5w gemini

At last year’s I/O event, Google was still talking about the 2.5 branch of Gemini, and what a difference a year makes. We’ve gone through the 3.0 and 3.1 families since then, and now it’s on to version 3.5.
Gemini 3.5 Flash: frontier intelligence with action (blog.google via hn)

+14275 5w gemini

Today, we’re introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. This represents a major leap forward in building more capable, intelligent agents.
Gemini 3.5 flash costs 3 times more than the previous version and 30x more than gemini 1.5 flash. (www.reddit.com)

+9434 5w gemini opus

Gemini Flash costs almost as much as flagship models. If Gemini 3.5 Pro scales like that, it'll cost more than Claude Opus 3.
Gemini 3.5 Flash Agents built a real Complete OS from scratch! (www.reddit.com)

+4615 5w gemini

https://x.com/Google/status/2056789235500466273?s=20 Google asked its agents to build a working operating system from scratch using u/Antigravity 2.0 and Gemini 3.5 Flash. Gemini built a real OS from scratch.
Gemini 3.5 confirmed by google deepmind employee (www.reddit.com)

+14734 5w deepmind gemini

