model roundup

DeepSeek 3.2

3 items · started 2026-04-21 · closed 2026-04-25

Recent Open models from last 6 Months - Nov 2025 - Apr 2026 (www.reddit.com)

+11628 9w mistral glm gemma+1

I created this chart of recent open models from the last 6 months. A few may be older than that.

New LLM Position Bias Benchmark: does an LLM keep the same judgment when you swap the answer order? Judge models compare two lightly edited versions of the same story twice, with the order swapped. The median model flips in 45% of decisive case pairs. GPT-5.4 is worst at 66%. (www.reddit.com)

+192 9w mistral gpt-5 deepseek

More info, including charts, per-case metrics, raw judge outputs, and the parsed answer dump: https://github.com/lechmazur/position_bias This benchmark isolates one basic and frustrating failure mode. The model-average first-shown pick rat…
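
The flip rate described in the post title can be illustrated with a minimal sketch. This is not the benchmark's own code. It assumes that a "decisive" pair is one where neither verdict is a tie, and that a "flip" means the judge preferred a different story version once the order was swapped. The names `PairVerdict`, `is_decisive`, and `flip_rate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PairVerdict:
    """Two verdicts on the same pair of story versions (hypothetical structure).

    Each field names the version the judge preferred: "A", "B", or "tie".
    """
    original_order: str  # verdict when version A is shown first
    swapped_order: str   # verdict when version B is shown first


def is_decisive(v: PairVerdict) -> bool:
    # Assumption: a pair counts as decisive only if neither verdict is a tie.
    return v.original_order != "tie" and v.swapped_order != "tie"


def flip_rate(verdicts: Sequence[PairVerdict]) -> Optional[float]:
    """Fraction of decisive pairs where the preferred version changed after the swap."""
    decisive = [v for v in verdicts if is_decisive(v)]
    if not decisive:
        return None  # undefined when there are no decisive pairs
    flips = sum(v.original_order != v.swapped_order for v in decisive)
    return flips / len(decisive)


# Made-up verdicts for illustration only; not data from the benchmark.
sample = [
    PairVerdict("A", "A"),    # consistent
    PairVerdict("A", "B"),    # flip
    PairVerdict("B", "A"),    # flip
    PairVerdict("tie", "A"),  # not decisive, excluded
    PairVerdict("B", "B"),    # consistent
]
print(flip_rate(sample))
```

Under these assumptions, a judge with no position bias would score near 0, and one that always picks whichever version is shown first would score 1.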

2x 512gb ram M3 Ultra mac studios (www.reddit.com)

+346106 9w glm deepseek