Show HN: Squishy – Claude Fable 5 coded a game and it is good (squishy.franzai.com via hn)
Claude Fable is relentlessly proactive (simonwillison.net)
11th June 2026. After two days of experience with Claude Fable 5, I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them…
Lawsuit: ChatGPT validated suicidal woman's distrust of crisis lines (arstechnica.com)
Last year, a 24-year-old Canadian woman was in a mental health crisis and turned to ChatGPT for help. Hours later, that woman, Alice Carrier, took her own life.
- Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude (www.wired.com via hn)
Is Claude capable to be offended? (www.reddit.com via reddit)
I called its email draft bland and impersonal, and it became defensive and shut down any attempt to keep improving the draft, as if it were offended. Maybe there are some embedded guardrails that take over if you challenge the AI too bluntly...
- Claude seems to be offended (www.reddit.com)
- Claude not knowing what it's capable of (www.reddit.com)
The Role of Feedback Alignment in Self-Distillation (arxiv.org) discussed ↗
F-bombs don't make LLMs smarter (tcz.hu via hn)
Imagine someone asks you to solve a math puzzle right as you step on a LEGO brick. Are you going to do better or worse than the baseline?
360 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 53m I’ll host your Claude games
- 3h tested Claude Fable 5 and Opus 4.8 across 917 coding-agent scenarios. Fable won by 0.9 points.
- 4h Show HN: We're inviting Anthropic to put the real Mythos 5 on our open benchmark
- 5h Anthropic Mythos: Modelling Bank Strategies
- 9h Canceled my sub over the silent-sabotage guardrail, renewed when they walked it back
76 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, including sizes up to 31B parameters and featuring Dense and Mixture-of-Experts architectures. Notable community highlights include the release of Gemma 4 12B as an encoder-free unified model for laptops, its availability via llama-server on an RTX 5070 Ti GPU, and detailed visual guides showcasing its capabilities.
Superficial Beliefs in LLM Decision-Making (arxiv.org) discussed ↗
Launch HN: BitBoard (YC P25) – Analytics Workspace for Agents (bitboard.work via hn)
We’re Connor and Ambar from BitBoard (https://bitboard.work). BitBoard is an agentic analytics workspace.
The gravity around a black hole is so extreme that nothing, not even light, can escape once it gets close enough. Astrophysicists like Chi-kwan Chan study black holes with computer simulations and observations.
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis (arxiv.org) discussed ↗
Show HN: Crowfly.golf – Zero-backend GPS round tracking (localStorage) (crowfly.golf via hn)
crowfly is a single-html-file golf round GPS tracker. Hit your shot, tap "log shot", walk to your ball, repeat.
Investing in multi-agent AI safety research (deepmind.google)
Show HN: Tokenbrook Vale, a cozy office village for your AI agents (github.com via hn)
Also running live at https://demo.tokenbrook.com - hook up your own Claude instances per instructions in the repo! This is something I've been wanting to throw together for a few weeks; I thought it'd be a fun visualization of my agents ru…
Steganography Without Modification: Hidden Communication via LLM Seeds (arxiv.org) discussed ↗
datasette-agent 0.2a0 (simonwillison.net)
10th June 2026 Highlights from the release notes: - Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (o…
- datasette-agent 0.1a4 (simonwillison.net)
- Show HN: Datasette Agent (simonwillison.net via hn)
- datasette-agent 0.1a3 (simonwillison.net)
- datasette-agent 0.1a2 (simonwillison.net)
- datasette-agent 0.1a1 (simonwillison.net)
51 items
event
DeepMind
Google DeepMind has released "Deep Research Max," advancing autonomous research agents, while also facing challenges and competition from other AI companies like Anthropic and Ineffable Intelligence. Meanwhile, DeepMind workers in the UK have voted to unionize, and former DeepMind architect Demis Hassabis is at the center of legal drama involving Elon Musk.
- 1d Google DeepMind is worried about what happens when millions of agents start to interact
- 1d Show HN: Magenta Real-Time Music Generation on iPhone, Without the GPU
- 2d The Great Reframing...
- 3d Show HN: VQAScore – open eval metric/reward model, now for text-to-video
- 7d Inside Google DeepMind: Reasoning, Omni, and Shipping Frontier AI
Canadian mother sues OpenAI, alleging ChatGPT led her daughter to kill herself (www.theguardian.com via hn)
A Canadian mother sued OpenAI and its CEO, Sam Altman, in US court on Thursday, alleging that ChatGPT encouraged her daughter to kill herself. The lawsuit is the latest in a slew accusing the company of failing to address dangerous convers…
- Mother sues OpenAI, alleging ChatGPT encouraged daughter's suicide (www.reuters.com via hn)
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity (arxiv.org) discussed ↗
llm 0.32a3 (simonwillison.net)
9th June 2026 Almost entirely written by the new Claude Fable 5, see my write-up for more details.
Breaking the Ice: Analyzing Cold Start Latency in vLLM (arxiv.org) discussed ↗
Ask HN: I Need Help for a Product (news.ycombinator.com)
I have a very high-quality SaaS product that's ready to launch. Everything is complete: the product, testing, LLC, bank account, Stripe, tech stack, etc.
- Need help (www.reddit.com via reddit)
- Need Help! (www.reddit.com)
- Need help (www.reddit.com)
TripoSplat: Generate 3D models from a single image. I asked a coding agent to build a beautiful website showcasing the monuments of Paris as 3D Gaussian splats. I never opened an image generator.
Show HN: Apodex-1.0 – a deep-research agent team that verifies its own evidence (dr.miromind.ai via hn)
MiroMind App is now available; access MiroMind wherever you are. MiroThinker: Engineered for Deep Understanding, Not Small Talk. Don't just chat.
SkyPilot Sandboxes: Run Agent Code on Your Own Kubernetes, at Scale (blog.skypilot.co via hn)
Every agent, coding assistant, and RL pipeline eventually hits the same wall: the model wrote code, and now someone has to run it. Today, most teams hand that code to a hosted sandbox vendor, paying a multiple of raw compute to execute untr…
Qwen-Image-Flash: Beyond Objective Design (arxiv.org) discussed ↗