My experience with testing all frontier open-weight models against GPT and Claude

reddit-localllama · www.reddit.com · 13 pts · 20 replies · 3d

I spent about a week testing open-weight models for real work, comparing them against what I already know from ChatGPT, Gemini, and Claude. The gap between what benchmarks suggest and what happens when you give these models something to ve…