model roundup

Gemini 3.5

45 items · started 2026-05-19 · closed 2026-06-01

  1. documentai bbox benchmark: In my previous post, I talked a bit about the recent developments in the field of DocumentAI. Now comes the practical part.

  2. Re-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out!

  3. Hey, my name is Tyler, I made this. flashcardaudit.com is a tool that allows users to upload an Anki collection so that an AI auditor (Gemini 3.5 Flash) can review the factual correctness of each Anki card.

  4. could not extract summary

  5. As someone who's been using the Gemini Pro plan for the past 9 months, I noticed a massive jump in the amount of rate-limiting I'm getting from Gemini since around the beginning of May. It seems to coincide with the updated UI and the rele…

  6. New models pop up constantly: Qwen 3.7, Gemini 3.5 Flash, etc. Every time a better one launches, I want to try it, but I don't want to add more subscriptions.

  7. How good is Claude’s image processing capability? Basically, I want Claude Code to detect any issues in AI-generated presentations (around 5–7 presentations with 5–8 slides each).

  8. Gemini 3.5 Flash Looks Good For How Fast It Is: Google once again has a model worth at least some consideration. Gemini 3.5 Flash is likely the best model out there at its particular speed point, as long as you don’t mind that it is a Gemin…

  9. The trend before was that models became less expensive for their capabilities, many corporations bet on that, and it backfired. Opus 4.7, GPT 5.5, Gemini 3.5 flash.

  10. Google is rolling out Gemini 3.5 Flash as the default model behind the Gemini app and AI Mode in Search this week, putting its newest model directly in front of billions of users worldwide. The switch matters because it changes the model m…

  11. Like everyone else, I’ve been testing the newly released Gemini 3.5 Flash. The speed is phenomenal, but I wanted to see how it handles large, structured data aggregations directly in the prompt versus using a delegated tool architecture.

  12. I thought I was going crazy, trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers. Apparently once you add a "Grading criteria" text, the model collapses into a "compressed toward the…

  13. could not extract summary

  14. Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.

  15. https://gemini.google.com/share/c2a187275e26 archive link https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698 https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2 https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1…

  16. 21st May 2026 - "View SQL query" buttons below rendered charts. Recent articles - Datasette Agent - 21st May 2026 - Gemini 3.5 Flash: more expensive, but Google plan to use it for everything - 19th May 2026 - The last six months in LLMs in…

  17. could not extract summary

  18. could not extract summary

  19. Post-I/O review related to AI (pros and cons): Well, it was not as disastrous as many people say, but there were some pros and cons that everyone will agree with. Btw, Gemini 3.5 Flash is an absolutely amazing model; don't pay attention to some peo…

  20. could not extract summary

  21. Asked the new Google model to build Super Mario. Compared with local Gemma4.

  22. could not extract summary

  23. Claude Code, now powered by OpenAI, xAI, DeepSeek, and more. Change models with 1-click.

  24. Surprised it scored that high on these questions, considering how it scored in some other fields. (no open-ended version score yet)

  25. This benchmark uses head-to-head comparisons of stories written in response to the same constrained creative briefs. The target range is 600-800 words.

  26. Gemini 3.5 Flash is generally available for GitHub Copilot: Gemini 3.5 Flash, Google’s latest Flash-tier model, is now rolling out on GitHub Copilot. In our early testing, Gemini 3.5 Flash delivers near-Pro coding quality at Flash-tier spee…

  27. I see many people forming and sharing opinions based on what they previously saw or on what others say. Even though they don't test models thoroughly, they still give their opinion, which is so frustrating.

  28. https://preview.redd.it/42ak5qmus82h1.png?width=1133&format=png&auto=webp&s=744ea3dfc06c83d0c4d8aa128c39b3238b17d7be Qwen 3.7 Max sitting at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just released Gemini 3.5 Flash.…

  29. could not extract summary

  30. I'm surprised

  31. 100s of topics. They include dating apps, school smartphones, older-adult care, shrinkflation, eurozone politics.

  32. More info: https://github.com/lechmazur/nyt-connections/

  33. Gemini 3.5 Flash: more expensive, but Google plan to use it for everything (19th May 2026): Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google ap…

  34. 19th May 2026 - Fixed bug tracking chains of responses. Refs datasette-llm#7 Recent articles - Gemini 3.5 Flash: more expensive, but Google plan to use it for everything - 19th May 2026 - The last six months in LLMs in five minutes - 19th…

  35. While we are weathering the Gemini 3.5 Flash hype, keep in mind that according to Arena, GLM and Mimo are better. https://arena.ai/leaderboard/text/coding-no-style-control Rankings: #7 GLM, #9 Mimo, #12 Gemini 3.5 Flash.

  36. could not extract summary

  37. Looking at Artificial Analysis, Gemini 3.5 Flash seems to compare strangely against Gemini 3.1 Pro. Numbers from Artificial Analysis: Gemini 3.1 Pro: intelligence score 57; cost $892; pricing $2 / $12 per 1M input/output tokens. Gemin…

  38. At last year’s I/O event, Google was still talking about the 2.5 branch of Gemini, and what a difference a year makes. We’ve gone through the 3.0 and 3.1 families since then, and now it’s on to version 3.5.

  39. Gemini 3.5: frontier intelligence with action. Today, we’re introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. This represents a major leap forward in building more capable, intelligent agents.

  40. Gemini Flash costs almost as much as flagship models. If Gemini 3.5 Pro scales like that, it'll cost more than Claude Opus 3.

  41. https://x.com/Google/status/2056789235500466273?s=20 Google asked its agents to build a working operating system from scratch using u/Antigravity 2.0 and Gemini 3.5 Flash. Gemini built a real OS from scratch.

  42. could not extract summary

  43. could not extract summary
