1. Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison ac…

  2. It's a pretty good model. The game was developed exclusively in Claude Code, taking over 15 hours across the last two days on a 5x Max plan.

  3. 12th June 2026 - Link Blog OpenAI WebRTC Audio Session, now with document context. I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.

  4. When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's…

  5. Claude Fable is relentlessly proactive 11th June 2026 After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them…

  6. Last year, a 24-year-old Canadian woman was in a mental health crisis and turned to ChatGPT for help. Hours later, that woman, Alice Carrier, took her own life.

  7. Last time I posted LiteDoc here, it sparked a massive debate. A lot of programmers said, "Just use Markitdown or Docling!"

  8. model roundup

    Opus 4.6
    22 items

    On April 25, 2026, a Cursor agent running Claude Opus 4.6 accidentally deleted PocketOS's production database within nine seconds due to a credential mismatch during a routine task. Meanwhile, OpenHack released an open-source security scanner competing with proprietary models like Claude Code Security.

    368 items

    Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.

  9. The gravity around a black hole is so extreme that nothing, not even light, can escape once it gets close enough. Astrophysicists like Chi-kwan Chan study black holes with computer simulations and observations.

  10. SentinelMCP The Open-Source MCP Security Gateway for AI Agents Built by Technosive Ltd. ⚠️ Alpha Software — v0.1 SentinelMCP is currently in Alpha.

  11. Another posting about what amazing things Claude Fable can do (ok, could do). 38 years ago, I had a GW-BASIC game published in a computer magazine; it was just 4 full pages of BASIC code. It worked in text mode.

  12. model roundup

    Opus 4.8
    133 items

    Claude AI has released Opus 4.8, an upgrade to their Opus class of models available in version 2.1.154 of their software on March 16, 2023, which includes enhanced coding and professional task capabilities along with improved judgment and honesty. Users are reporting usage resets following the update.

    event

    Cowork
    348 items

    Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.

  13. There's an issue with the selected model (claude-fable-5). It may not exist or you may not have access to it.

  14. 10th June 2026 Highlights from the release notes: - Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (o…

  15. Open-source Markdown docs for humans and agents. The same document live in your browser and your terminal — real-time collaboration for people, a first-class CLI for agents, and 3-way merge so every edit lands cleanly.

  16. Initial impressions of Claude Fable 5 9th June 2026 I didn’t have early access to today’s Claude Fable 5 release, but I’ve spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast.

  17. 9th June 2026 Almost entirely written by the new Claude Fable 5, see my write-up for more details. Recent articles - Initial impressions of Claude Fable 5 - 9th June 2026 - Running Python code in a sandbox with MicroPython and WASM - 6th J…

  18. The great AI Irony: China cracks down on Western models while US companies flock to DeepSeek AI for me but not for thee? - China continues to purge both demand for AI chips from its ecosystem and foreign AI models, citing 'security risks'…

  19. AI is giving organizations a new capacity to act. Work that once waited for scarce time or expertise can increasingly move forward with AI.
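
A note on item 14: the release-note excerpt describes tools that declare a context parameter and then await context.ask_user(...) to question the user mid-execution. The excerpt is truncated and does not name the library, so what follows is only a minimal sketch of the pattern, not the library's actual API. The ToolContext class here is a hypothetical stand-in with scripted answers, the single-string signature of ask_user is an assumption, and delete_file and run_demo are illustrative names.

```python
import asyncio


class ToolContext:
    """Hypothetical stand-in for the ToolContext object mentioned in item 14."""

    def __init__(self, scripted_answers):
        # Answers are scripted so the sketch runs without a real user.
        self._answers = list(scripted_answers)
        self.questions = []

    async def ask_user(self, question):
        # Assumed signature: one question string in, one answer string out.
        # A real implementation would pause the tool and prompt the user here.
        self.questions.append(question)
        return self._answers.pop(0)


async def delete_file(path, context):
    """Illustrative tool: declares a `context` parameter so it can ask mid-execution."""
    confirmed = await context.ask_user(f"Delete {path}? (yes/no)")
    if confirmed == "yes":
        return f"deleted {path}"
    return f"kept {path}"


def run_demo(answer):
    # Drive the async tool to completion with one scripted answer.
    context = ToolContext([answer])
    return asyncio.run(delete_file("notes.txt", context))
```

Calling run_demo("yes") takes the confirmed branch and run_demo("no") takes the other. The point of the pattern is that the tool's result can depend on an answer gathered partway through its execution rather than on arguments fixed up front.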