Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%

reddit-ai_agents · www.reddit.com · 28 pts · 15 replies · 1d

Anthropic's flagship model just took a pretty significant accuracy hit on one of the most important AI benchmarks out there. So here's the deal: Claude Opus 4.6 was recently tested on BridgeBench, which specifically measures how often AI m…

hallucination · opus · anthropic · claude
