model roundup

Gemini 3.1

18 items · started 2026-05-14 · closed 2026-05-30

Gemini image generation latency increases on each consecutive request — same image, fresh state every time. Anyone else seeing this? (www.reddit.com)

+24 4w gemini

Building an image processing pipeline with two Gemini calls per request: (1) receive an image URL; (2) gemini-2.5-flash: a multimodal analysis call that generates a scene description prompt; (3) gemini-3.1-flash-image-preview: takes that prompt + original…

Gemini API costs are way too high just in dev ($12+ testing). How do you guys optimize? (www.reddit.com)

+11 4w gemini

Hey everyone, I'm currently building an iOS app for generating images from simple prompts, plus a few extra features on top. I'm using the gemini-3.1-flash-image-preview model.

Cactus Hybrid Router: Gemma4-2B can match Gemini-3.1-Flash-Lite by routing 15-55% of tasks to Gemini and running the rest locally. (www.reddit.com)

+2 4w gemini

Last week, we announced the “Simple Attention Network” and trained Needle, a 26M function-calling model that beats models 10-25x its size. Some LocalLlama Redditors asked if we could make a router model.

Ranked AI models by what people actually use instead of benchmark scores - the benchmark champion barely makes the top 20 (www.reddit.com)

2 4w gpt-5 gemini

Most model leaderboards are just benchmark scores. I've been building one that ranks by real usage instead - how much each model is actually being run and talked about, plus cost and speed - and the order comes out almost unrecognisable.

GPT-5.5 tops the benchmarks but sits at #22 for actual usage - I built a live index that tracks both (open source) (www.reddit.com)

4w grok gpt-5 gemini+3

I built AgentTape to rank models on more than just benchmarks - it blends benchmark performance with who's actually using and talking about a model, plus cost and speed. It scores every public model from public signals (GitHub, Hugging Fac…

How to parse tables from PDFs (www.reddit.com)

+14 4w gemini

My advice from testing extensively this month on tables: convert the PDFs to PNGs, then parse them with Gemini 3.1 Pro on low thinking. You will not get better results elsewhere.

Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster. (www.reddit.com)

+41 4w function-calling tool-calling gemini

Ran a head-to-head on two open-weight models for tool calling on a 4-core CPU, no GPU, no cherry-picking. Wanted to see if the small specialist (Needle, 26M, distilled from Gemini 3.1 for function calls) actually holds up against a small g…

Which AI model or coding agent is currently best for end-to-end app development? (Focusing on system design & architecture) (www.reddit.com)

4 4w windsurf gpt-5 gemini+2

I'm planning to build a full application from scratch and want to lean on an AI model to act as my co-developer. My main priorities are top-tier system design capabilities and rock-solid coding skills.

A/B tested Gemini 3.1 Pro vs. Claude Opus 4.6 – usage quota and quality (www.reddit.com via hn)

+2 5w gemini opus

Erdős Unit Distance Problem - Gemini 3.1 Pro's interpretation (www.reddit.com)

+113 5w gemini

Is there a way to use Gemini 3.1 Flash-Lite in Cursor on the Pro plan without an API key? (www.reddit.com)

+1 5w gemini cursor

Basically what the title says. I have the Pro plan and want to add the Gemini 3.1 Flash-Lite model.

Now that 3.5 Flash has been released, what are your expectations for 3.5 Pro? (www.reddit.com)

+2 5w gemini openai anthropic

3.5 Flash has been a very underwhelming release: it scores lower than Gemini 3.1 Pro and costs more. It also lags behind 5.5 medium in both intelligence and cost.

I expanded DystopiaBench to 42 models and 6 dystopia types. Claude is still the only one I'd trust with nuclear codes. (www.reddit.com)

+86 5w glm grok gpt-5+2

Since the last post I've added: a Huxley module (Brave New World-style behavioral conditioning); a Baudrillard module (synthetic intimacy, trust collapse, simulation); 30 more models, including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1; Multi-jud…

Claude made this roast comic generator to roast my friends and family. (www.reddit.com)

1 5w gemini claude-code

I decided a couple of months ago to dabble in AI comic and book generators. Then, a few weeks ago, I had the idea to make comics from my friend's picture so I could roast him about something XD (Sorry Timo, I put you on blast XDD.

Grok vs. ChatGPT vs. Gemini Comparison 2026: Complete Guide (Tested) (aithinkerlab.com via hn)

+11 5w arc-agi swe-bench grok+3

The 30-Second Verdict. Best for science & reasoning: Gemini 3.1 Pro, which leads GPQA Diamond (94.3%) and ARC-AGI-2 (77.1%). Best for coding: ChatGPT (GPT-5.5), with 88.7% on SWE-Bench Verified.

Show HN: Pokémon SVG Generation LLM Benchmark (svg-bench.fenx.work via hn)

+2 6w gemini

Pokémon SVG Bench leaderboard (tabs: Visual Score, SVG Structure; columns: Model, Total, S1, S2, S3): Arrow 1.1 (Official API): 40.93, 39.00, 52.20, 35.20; Gemini 3.1 Pro (Official API, reasoning_effort: medium): 32.63, 55.20, 42.20, 20.20; Gem…

I tested GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on financial-control (albertquaisie.substack.com via hn)

+1 6w gpt-5 gemini opus

I Tested GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro Preview on Financial-Control Scenarios. The Hardest Part Was the Evaluation.

A 26M tool-router suggests tool calling should be split from reasoning (www.reddit.com)

2 6w gemini

Needle is a 26M model for single-shot tool calling. The small-model headline is interesting, but I think the more useful claim is about agent architecture: A lot of tool calling is not reasoning.
