DocumentAI Visual Benchmark - GPT 5.5, Gemini 3.5, Qwen... (www.maltebuettner.eu via hn)
model roundup
Gemini 3.5
-
DocumentAI bbox benchmark. In my previous post, I talked a bit about the recent developments in the field of DocumentAI. Now comes the practical part.
-
Gemini 3.5 Flash beats Opus 4.8 on bluffbench (bsky.app via hn)
Re-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out!
-
Show HN: Audit your Anki flashcards at flashcardaudit.com (flashcardaudit.com via hn)
Hey, my name is Tyler, I made this. flashcardaudit.com is a tool that allows users to upload an Anki collection so that an AI auditor (Gemini 3.5 Flash) can review the factual correctness of each Anki card.
-
Ask HN: Is it just me or has Gemini enshittified in the last three weeks? (news.ycombinator.com)
As someone who's been using the Gemini Pro plan for the past 9 months, I noticed a massive jump in the amount of rate-limiting I'm getting from Gemini since around the beginning of May. It seems to coincide with the updated UI and the rele…
-
How do you handle trying new models without spending too much? (www.reddit.com)
New models pop up constantly: Qwen 3.7, Gemini 3.5 Flash, etc. Every time a better one launches, I want to try it, but I don't want to add more subscriptions.
-
Image processing? (www.reddit.com)
How good is Claude’s image processing capability? Basically, I want Claude Code to detect any issues in AI-generated presentations (around 5–7 presentations with 5–8 slides each).
-
Gemini 3.5 Flash Looks Good for How Fast It Is (thezvi.substack.com via hn)
Google once again has a model worth at least some consideration. Gemini 3.5 Flash is likely the best model out there at its particular speed point, as long as you don’t mind that it is a Gemin…
-
Why are AI models getting more expensive? (www.reddit.com)
The trend before was that models became less expensive for their capabilities, many corporations bet on that, and it backfired. Opus 4.7, GPT 5.5, Gemini 3.5 flash.
-
Google makes Gemini 3.5 Flash the default AI model for billions of users (techthreedots.com via hn)
Google is rolling out Gemini 3.5 Flash as the default model behind the Gemini app and AI Mode in Search this week, putting its newest model directly in front of billions of users worldwide. The switch matters because it changes the model m…
-
Like everyone else, I’ve been testing the newly released Gemini 3.5 Flash. The speed is phenomenal, but I wanted to see how it handles large, structured data aggregations directly in the prompt versus using a delegated tool architecture.
-
Tell HN: Gemini 3.5 Flash breaks in stupid ways (news.ycombinator.com)
I thought I was going crazy, trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers. Apparently once you add a "Grading criteria" text, the model collapses into a "compressed toward the…
-
Google is cooking just give them sometime (gemini 3.5 pro) (www.reddit.com)
-
Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.
-
Google's latest creation: Gemini 3.5 Flash vs all (www.reddit.com)
https://gemini.google.com/share/c2a187275e26 archive link https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698 https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2 https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1…
-
datasette-agent-charts 0.1a2 (simonwillison.net)
21st May 2026 - "View SQL query" buttons below rendered charts.
-
Post I/O Review related to AI (pros and cons ) (www.reddit.com)
Well, it was not as disastrous as many people say, but there were some pros and cons which everyone will agree with. Btw, Gemini 3.5 Flash is an absolutely amazing model; don't pay attention to some peo…
-
Gemini 3.5 Flash vs Gemma4 31B - building SuperMario (Sound on!) (www.reddit.com)
Asked new Google Model to build SuperMario. Compared with Local Gemma4.
-
Gemini 3.5 deleted 28,745 lines, broke production, and wrote a fake post-mortem (www.reddit.com via hn)
-
Claude Code, now powered by Gemini 3.5 Flash, GPT-5.5, Grok 4.3, and more (dechained.ai via hn)
Claude Code, now powered by OpenAI, xAI, DeepSeek, and more. Change models with 1-click.
-
Surprised it scored that high on these questions, considering how it scored in some other fields. (no open-ended version score yet)
-
This benchmark uses head-to-head comparisons of stories written in response to the same constrained creative briefs. The target range is 600-800 words.
-
Gemini 3.5 Flash has 14x cost multiplier in GitHub Copilot (github.blog via hn)
Gemini 3.5 Flash is generally available for GitHub Copilot. Gemini 3.5 Flash, Google’s latest Flash-tier model, is now rolling out on GitHub Copilot. In our early testing, Gemini 3.5 Flash delivers near-Pro coding quality at Flash-tier spee…
-
Don't share your opinion, if you didn't test it !!! (www.reddit.com)
I see many people giving their opinion based on what they previously saw or on what others have said. Even though they don't test models thoroughly, they still give their opinion, which is so frustrating.
-
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room (www.reddit.com)
https://preview.redd.it/42ak5qmus82h1.png?width=1133&format=png&auto=webp&s=744ea3dfc06c83d0c4d8aa128c39b3238b17d7be Qwen 3.7 Max sitting at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just released Gemini 3.5 Flash.…
-
Gemini 3.5 Flash vs GPT 5.5?? What's your opinion on it (www.reddit.com)
-
I'm surprised
-
100s of topics. They include dating apps, school smartphones, older-adult care, shrinkflation, eurozone politics.
-
More info: https://github.com/lechmazur/nyt-connections/
-
Gemini 3.5 Flash: more expensive, but Google plan to use it for everything (simonwillison.net)
19th May 2026. Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google ap…
-
datasette-llm-accountant 0.1a4 (simonwillison.net)
19th May 2026 - Fixed bug tracking chains of responses. Refs datasette-llm#7
-
datasette-llm 0.1a8 (simonwillison.net)
-
While we are weathering the gemini 3.5 flash hype, keep in mind that according to arena, GLM and Mimo are better. https://arena.ai/leaderboard/text/coding-no-style-control #7 GLM #9 Mimo #12 Gemini 3.5 Flash
-
Gemini 3.5 flash scores, hasn’t even beat GPT 5.4 xhigh (www.reddit.com)
-
Gemini 3.5 Flash looks worse than it seems on Artificial Analysis (www.reddit.com)
Looking at Artificial Analysis, Gemini 3.5 Flash seems to compare strangely against Gemini 3.1 Pro. Numbers from Artificial Analysis: Gemini 3.1 Pro - Intelligence score: 57 - Cost: $892 - Pricing: $2 / $12 per 1M input/output tokens Gemin…
-
At last year’s I/O event, Google was still talking about the 2.5 branch of Gemini, and what a difference a year makes. We’ve gone through the 3.0 and 3.1 families since then, and now it’s on to version 3.5.
-
Gemini 3.5 Flash: frontier intelligence with action (blog.google via hn)
Gemini 3.5: frontier intelligence with action. Today, we’re introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. This represents a major leap forward in building more capable, intelligent agents.
-
Gemini Flash costs almost as much as flagship models. If Gemini 3.5 Pro scales like that, it'll cost more than Claude Opus 3.
-
Gemini 3.5 Flash Agents built a real Complete OS from scratch! (www.reddit.com)
https://x.com/Google/status/2056789235500466273?s=20 Google asked its agents to build a working operating system from scratch using u/Antigravity 2.0 and Gemini 3.5 Flash. Gemini built a real OS from scratch.
-
Behold, Gemini 3.5 Flash! (www.reddit.com)
-
Gemini 3.5 flash is not that great at coding (www.reddit.com)
-
Gemini 3.5 confirmed by google deepmind employee (www.reddit.com)