model roundup

Gemini 3.1

10 items · started 2026-04-23 · closed 2026-05-02

  1. I was trying to find a problem in my math-heavy code and asked an agent (Gemini 3.1) to find the issue. Often, when I know it’s a hard problem, I let it be and go get coffee or lunch.

  2. I've been benching GPT-5.5 for the past couple days and would like to share my findings. This is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower - a highly complex social…

  3. Last week it was broken, then they "fixed" something a few days ago. Now again...

  4. Cost & Performance Efficiency: Training Cost-Performance (8t): +170% to +180% gain (2.7x–2.8x); Inference Cost-Performance (8i): +80% gain; Training Power Efficiency (8t): +124% gain in performance-per-watt; Inference Power Efficiency (8i): +1…

  5. I’ve been using Cursor for ~1.5 years, mainly with Gemini 3.1 Pro. Recently I ran into a serious pricing issue.

  6. I dove deep into the most recent benchmark stats for GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro via official reports and third-party evaluations. I found an interesting thing: there’s no such thing as a “one-size-fits-all model.” My finding…

  7. Hi folks, I've been benching Kimi K2.6 for the past few days, and I'd like to share my findings. For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower - a…

  8. Major performance jump though. Worth it?

  9. could not extract summary

  10. I built a plugin that lets Claude Code delegate work to Gemini CLI. I started this after finding myself reaching for Gemini more often on long-context repo work.
