OpenAI WebRTC Audio Session, now with document context (simonwillison.net)
12th June 2026 - Link Blog. I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.
Android App - can't download artefacts (www.reddit.com via reddit)
It's been a week or so that I am unable to download any artefact from any of my chats on my Android smartphone Claude app. I just get an error message "Couldn't download the file" no matter what the file is .XLSX, .md, .html, .py I hav…
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning (arxiv.org) discussed ↗
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's…
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility (arxiv.org) discussed ↗
Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison ac…
I built real time chess under 10 hours with Claude fable (chessv2.com via reddit)
I built chessv2, a real-time multiplayer chess game you can play in your browser and play with your friends. It’s free to play at chessv2.com, and the full source is MIT-licensed on GitHub.
Claude Fable is relentlessly proactive (simonwillison.net)
11th June 2026 - After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them…
The gravity around a black hole is so extreme that nothing, not even light, can escape once it gets close enough. Astrophysicists like Chi-kwan Chan study black holes with computer simulations and observations.
Thoughts on fable 5 (I got some use in before the take down) (www.reddit.com via reddit)
Look, obviously it's better, I'm not denying that. But it still regularly makes mistakes and fails to consider edge cases in code I ask it to generate without me explicitly saying "watch out for x potential edge case" or "reason about the…
-
383 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 17m Fable 5 is offline. Switch to Opus, jump to OpenAI, or just wait?
- 19m Anthropic spent a week arguing it should control who uses its most powerful model. Then the government used that exact argument against it. A timeline.
- 1h Are Mythos and Fable pure marketing?
- 1h AI is the new nuclear bomb, USA is going to gatekeep it and control it, they'll make sure no other country creates a mythos.... and that's the best case scenario
- 2h If Fable is "too good" to export does this mean no more better LLMs?
33 items
model roundup
Opus 4.7
Anthropic has released Claude Opus 4.8, an upgrade over 4.7 with enhanced judgment and independence. Meanwhile, a new benchmark called The Singularity Gate tests AI models like Opus 4.7 and GPT-5.5 for their ability to predict scientific discoveries beyond their training data.
The Role of Feedback Alignment in Self-Distillation (arxiv.org) discussed ↗
Using Claude as my assistant (www.reddit.com via reddit)
Hey. Since Claude is much better than chat, I changed to it and found out that people go crazy with projects.
- Claude is using your computer? (www.reddit.com via reddit)
- Am I using Claude incorrectly? (www.reddit.com via reddit)
- Using Claude for writing (www.reddit.com via reddit)
- Using Claude everyday (www.reddit.com)
- Using Claude to invest (www.reddit.com)
- How are filmmakers using claude? (www.reddit.com)
- How are you using Claude for marketing? (www.reddit.com)
- Using Claude for Humanities? (www.reddit.com)
- 3D Models using Claude (www.reddit.com)
- Using Claude Daily (www.reddit.com)
- Using Claude for everything (www.reddit.com)
- THE PROBLEM WITH "JUST USING CLAUDE" (www.reddit.com)
- How are you using Claude in your business? (www.reddit.com)
datasette-agent 0.2a0 (simonwillison.net)
10th June 2026 Highlights from the release notes: - Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (o…
- datasette-agent 0.1a4 (simonwillison.net)
- Show HN: Datasette Agent (simonwillison.net via hn)
- datasette-agent 0.1a3 (simonwillison.net)
- datasette-agent 0.1a2 (simonwillison.net)
- datasette-agent 0.1a1 (simonwillison.net)
Superficial Beliefs in LLM Decision-Making (arxiv.org) discussed ↗
Lawsuit: ChatGPT validated suicidal woman's distrust of crisis lines (arstechnica.com)
Last year, a 24-year-old Canadian woman was in a mental health crisis and turned to ChatGPT for help. Hours later, that woman, Alice Carrier, took her own life.
Investing in multi-agent AI safety research (deepmind.google)
What's to stop Anthropic releasing its models under Anthropic Ireland Limited? (www.reddit.com via reddit)
I mean any of them, not just Fable. Is there something like the CLOUD Act that would prevent it?
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity (arxiv.org) discussed ↗
New OpenAI Academy courses for the next era of work (openai.com)
AI is giving organizations a new capacity to act. Work that once waited for scarce time or expertise can increasingly move forward with AI.
-
77 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, including sizes up to 31B parameters and featuring Dense and Mixture-of-Experts architectures. Notable community highlights include the release of Gemma 4 12B as an encoder-free unified model for laptops, its availability via llama-server on an RTX 5070 Ti GPU, and detailed visual guides showcasing its capabilities.
Steganography Without Modification: Hidden Communication via LLM Seeds (arxiv.org) discussed ↗
claude having way too much fun down the rabbithole (www.reddit.com)
i finished designing a simple website, and specified an easter egg at the end. Claude went quiet for almost 2 minutes.
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis (arxiv.org) discussed ↗
Initial impressions of Claude Fable 5 (simonwillison.net)
9th June 2026 - I didn’t have early access to today’s Claude Fable 5 release, but I’ve spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast.
There is no going back to pre-AI, but LLM based AI has been a net drag. (www.reddit.com via reddit)
I have been working in software for 12 years. There is no "done right".
TripoSplat: Generate 3D models from a single image
I asked a coding agent to build a beautiful website showcasing the monuments of Paris as 3D Gaussian splats. I never opened an image generator.
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs (arxiv.org) discussed ↗
solo founder. $20.5K MRR.