model roundup
Gemini 3.1
-
Building an image processing pipeline with two Gemini calls per request: (1) receive an image URL; (2) gemini-2.5-flash: multimodal analysis call that generates a scene description prompt; (3) gemini-3.1-flash-image-preview: takes that prompt + original…
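A minimal sketch of the two-call flow described above, in Python. The two model calls are passed in as plain callables, so no real client library or API signature is assumed. The names `process_image`, `describe`, and `generate` are hypothetical. The post is cut off before it says what the second call receives besides the prompt; passing the original image URL is an assumption.

```python
from typing import Callable

# Hypothetical signatures: each callable wraps one model call.
DescribeFn = Callable[[str], str]          # image URL -> scene description prompt
GenerateFn = Callable[[str, str], bytes]   # (prompt, image URL) -> image bytes


def process_image(image_url: str, describe: DescribeFn, generate: GenerateFn) -> bytes:
    """Run the two-call pipeline for a single request."""
    if not image_url:
        raise ValueError("image_url must be non-empty")
    # Call 1: multimodal analysis (gemini-2.5-flash in the post).
    prompt = describe(image_url).strip()
    if not prompt:
        raise RuntimeError("analysis call returned an empty prompt")
    # Call 2: image model (gemini-3.1-flash-image-preview in the post).
    # Assumption: the original image URL is forwarded alongside the prompt.
    return generate(prompt, image_url)


# Stub callables stand in for real API clients so the sketch runs offline.
def fake_describe(url: str) -> str:
    return f"a scene described from {url}"


def fake_generate(prompt: str, url: str) -> bytes:
    return f"{prompt}|{url}".encode()
```

Keeping the model calls behind callables makes it easy to swap in real clients later and to test the control flow without network access.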
-
Hey everyone, Currently building an iOS app for generating images from simple prompts, plus a few extra features on top. I'm using the gemini-3.1-flash-image-preview model.
-
Last week, we announced the “Simple Attention Network” and trained Needle, a 26M function-calling model that beats models 10-25x its size. Some LocalLlama Redditors asked if we could make a router model.
-
Most model leaderboards are just benchmark scores. I've been building one that ranks by real usage instead - how much each model is actually being run and talked about, plus cost and speed - and the order comes out almost unrecognisable.
-
I built AgentTape to rank models on more than just benchmarks - it blends benchmark performance with who's actually using and talking about a model, plus cost and speed. It scores every public model from public signals (GitHub, Hugging Fac…
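One way a blended ranking like this could work, sketched in Python. This is not AgentTape's actual formula: the field names, the 0..1 normalization, and the weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelSignals:
    name: str
    benchmark: float  # normalized 0..1, higher is better
    usage: float      # normalized 0..1, higher is better
    cost: float       # normalized 0..1, LOWER is better
    speed: float      # normalized 0..1, higher is better


# Hypothetical weights; they sum to 1.0.
WEIGHTS = {"benchmark": 0.4, "usage": 0.3, "cost": 0.15, "speed": 0.15}


def blended_score(m: ModelSignals) -> float:
    """Weighted sum of the signals, with cost inverted so cheaper scores higher."""
    return (
        WEIGHTS["benchmark"] * m.benchmark
        + WEIGHTS["usage"] * m.usage
        + WEIGHTS["cost"] * (1.0 - m.cost)
        + WEIGHTS["speed"] * m.speed
    )


def rank(models: List[ModelSignals]) -> List[ModelSignals]:
    """Return models ordered from highest to lowest blended score."""
    return sorted(models, key=blended_score, reverse=True)
```

With weights like these, a model with a strong benchmark score but little usage can rank below a cheaper, widely used one, which matches the reordering the post describes.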
-
How to parse tables from pdf's (www.reddit.com)
My advice from testing extensively this month on tables: Convert the PDFs to PNGs and then parse with Gemini 3.1 Pro and low thinking. You will not get better results elsewhere.
-
Ran a head-to-head on two open-weight models for tool-calling on a 4-core CPU, no GPU, no cherry-picking. Wanted to see if the small specialist (Needle, 26M, distilled from Gemini 3.1 for function calls) actually holds up against a small g…
-
I'm planning to build a full application from scratch and want to lean on an AI model to act as my co-developer. My main priorities are top-tier system design capabilities and rock-solid coding skills.
-
A/B tested Gemini 3.1 Pro vs. Claude Opus 4.6 – usage quota and quality (www.reddit.com via hn)
-
Erdos Unit Distance Problem - Gemini 3.1 Pro's interpretation (www.reddit.com)
-
Basically what the title says. I have the Pro plan and I want to add the Gemini 3.1 Flash Lite model so I can use it.
-
3.5 Flash has been a very underwhelming release: it scores lower than Gemini 3.1 Pro and costs more. It also lags behind 5.5 medium in both intelligence and cost.
-
Since the last post I've added: a Huxley module (Brave New World style behavioral conditioning); a Baudrillard module (synthetic intimacy, trust collapse, simulation); 30 more models including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1; Multi-jud…
-
I decided a couple of months ago to dabble in AI comic and book generators. Then an idea came to me a few weeks ago to make comics with my friend's picture so I could roast him about something XD (Sorry Timo, I put you on blast XDD).
-
Grok vs. ChatGPT vs. Gemini Comparison 2026: Complete Guide (Tested) (aithinkerlab.com via hn)
The 30-Second Verdict. Best for science & reasoning: Gemini 3.1 Pro, which leads GPQA Diamond (94.3%) and ARC-AGI-2 (77.1%). Best for coding: ChatGPT (GPT-5.5), with 88.7% on SWE-Bench Verified.
-
Show HN: Pokémon SVG Generation LLM Benchmark (svg-bench.fenx.work via hn)
Pokémon SVG Bench. Table headers: Visual Score, SVG Structure, Rank, Model, Total, S1, S2, S3. Arrow 1.1 (Official API): 40.93, 39.00, 52.20, 35.20. Gemini 3.1 Pro (Official API, reasoning_effort: medium): 32.63, 55.20, 42.20, 20.20. Gem…
-
I tested GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on financial-control (albertquaisie.substack.com via hn)
I Tested GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview on Financial-Control Scenarios. The Hardest Part Was the Evaluation.
-
Needle is a 26M model for single-shot tool calling. The small-model headline is interesting, but I think the more useful claim is about agent architecture: A lot of tool calling is not reasoning.
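To illustrate the "tool calling is not reasoning" point, here is a rough Python sketch of the part that happens after a single-shot tool-calling model responds: parse one structured call and dispatch it. The JSON shape (`name` plus `arguments`) and the `get_weather` tool are assumptions for illustration, not Needle's documented output format.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by function name.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would query a weather service.
    return f"weather for {city}"


def dispatch(model_output: str) -> Any:
    """Parse one tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    if not isinstance(call, dict):
        raise TypeError("tool call must be a JSON object")
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise TypeError("arguments must be a JSON object")
    return TOOLS[name](**args)
```

For example, `dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')` calls `get_weather(city="Oslo")`. The model only has to map text to a name and arguments; validation and execution stay in ordinary code.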