model roundup

DeepSeek 3.2

3 items · started 2026-04-21 · closed 2026-04-25

  1. I created this chart with recent open models from the last 6 months. A few might possibly be older than that.

  2. More info, including charts, per-case metrics, raw judge outputs, and the parsed answer dump: https://github.com/lechmazur/position_bias
     This benchmark isolates one basic and frustrating failure mode. The model-average first-shown pick rat…
