I built a self-evolving agentic loop that ran 104 iterations autonomously to find questions that break every LLM — here's the architecture

reddit-claudeai · www.reddit.com ·2 pts·4 replies ↗ ·12h

Why I built this: I wanted to find the next "strawberry problem" — simple questions any kid can answer but every LLM gets wrong. Instead of manually testing questions, I built a system that does it autonomously.