Are coding agents getting expensive, or are we measuring cost the wrong way? (www.reddit.com via reddit)
Seeing the recent token-burn discussion around agentic coding made me think the bigger issue is not just price. A coding agent can be expensive and still be worth it if it removes real engineering effort.
With all the SpaceX IPO noise lately and the launch of Fable, I got in the mood to test its capabilities with a physics game! A few hours later I had a 2,500-line, single-file browser (and mobile(!)) game where you boostback a Falcon…
Access OpenAI models and Codex through your Oracle cloud commitment | OpenAI Use your existing Oracle cloud commitment to give teams access to OpenAI’s most advanced models and Codex, without creating a new purchasing path.
The Role of Feedback Alignment in Self-Distillation (arxiv.org) discussed ↗
datasette-agent 0.2a0 (simonwillison.net)
10th June 2026 Highlights from the release notes: - Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (o…
- datasette-agent 0.1a4 (simonwillison.net)
- Show HN: Datasette Agent (simonwillison.net via hn)
- datasette-agent 0.1a3 (simonwillison.net)
- datasette-agent 0.1a2 (simonwillison.net)
- datasette-agent 0.1a1 (simonwillison.net)
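The mid-execution question pattern described in the 0.2a0 release notes can be illustrated with a small stand-in. This is a sketch, not the datasette-agent API: only the names `ToolContext` and `context.ask_user(...)` come from the release notes, while the class body, the single-string `ask_user` signature, the `delete_rows` tool, and the `run_tool` helper are all hypothetical.

```python
import asyncio


class ToolContext:
    """Stand-in for the real ToolContext (assumption): answers come from a scripted list."""

    def __init__(self, scripted_answers):
        self._answers = list(scripted_answers)
        self.questions = []  # record of what the tool asked

    async def ask_user(self, question):
        # A real agent would pause here until the person replies.
        self.questions.append(question)
        await asyncio.sleep(0)  # yield control, as a real prompt would
        return self._answers.pop(0)


async def delete_rows(table, context):
    """Hypothetical tool: confirm with the user before a destructive step."""
    confirmed = await context.ask_user(f"Delete all rows from {table}? (yes/no)")
    if confirmed != "yes":
        return f"Left {table} untouched"
    return f"Deleted rows from {table}"


def run_tool(table, answers):
    """Hypothetical driver: run the tool once with scripted user answers."""
    context = ToolContext(answers)
    result = asyncio.run(delete_rows(table, context))
    return result, context.questions
```

The point of the pattern is that the tool suspends at the `await` and resumes with the user's answer, so the confirmation lives inside the tool rather than in the agent's prompt.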
Eidentic Eidentic is the open-source TypeScript SDK for AI agents with self-improving memory and production fundamentals built in. Durable execution, enforced cost ceilings, multi-tenant isolation, GDPR erasure, and sandboxed tools — not b…
118 items
model roundup
Opus 4.8
Claude AI has released Opus 4.8, an upgrade to their Opus class of models available in version 2.1.154 of their software on March 16, 2023, which includes enhanced coding and professional task capabilities along with improved judgment and honesty. Users are reporting usage resets following the update.
79 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, including sizes up to 31B parameters and featuring Dense and Mixture-of-Experts architectures. Notable community highlights include the release of Gemma 4 12B as an encoder-free unified model for laptops, its availability via llama-server on an RTX 5070 Ti GPU, and detailed visual guides showcasing its capabilities.
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis (arxiv.org) discussed ↗
Investing in multi-agent AI safety research (deepmind.google)
Steganography Without Modification: Hidden Communication via LLM Seeds (arxiv.org) discussed ↗
- Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude (www.wired.com via hn)
We're opening up beta access and looking for a handful of B2B teams to test something we've been building. It's an AI agent that runs your product demos on a live video call.
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity (arxiv.org) discussed ↗
So Google’s new DiffusionGemma-26B-A4B-it is pretty wild. It's a discrete text diffusion model, which means instead of generating one token at a time autoregressively, it predicts and refines a whole block of up to 256 tokens in parallel.
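The contrast with one-token-at-a-time decoding can be sketched with a toy loop. This is not DiffusionGemma's algorithm: `toy_denoiser` is a hypothetical stand-in that cheats by reading the target and invents its confidence scores, and committed tokens are never revised, which a real diffusion model may do. It only shows the shape of the idea: start from a fully masked block, score every masked slot in parallel, and commit several slots per step.

```python
MASK = None  # placeholder for a not-yet-decoded slot


def toy_denoiser(block, target):
    """Stand-in for the model: propose a token and a confidence for every masked slot.

    The confidence is made up: slots next to already-filled slots score higher.
    """
    proposals = {}
    for i, tok in enumerate(block):
        if tok is MASK:
            neighbours = [block[j] for j in (i - 1, i + 1) if 0 <= j < len(block)]
            filled = sum(n is not MASK for n in neighbours)
            proposals[i] = (target[i], filled + 1.0 / (i + 1))
    return proposals


def decode_block(target, steps):
    """Fill a whole block in `steps` parallel refinement passes."""
    block = [MASK] * len(target)
    history = []
    for step in range(steps):
        proposals = toy_denoiser(block, target)
        if not proposals:
            break
        # Commit enough slots per pass that the block is full by the last pass.
        remaining_steps = steps - step
        k = -(-len(proposals) // remaining_steps)  # ceiling division
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            block[i] = proposals[i][0]
        history.append(list(block))
    return block, history
```

With a six-token target and `steps=3`, each pass commits two slots, so the block is finished in three passes instead of six sequential token steps.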
Breaking the Ice: Analyzing Cold Start Latency in vLLM (arxiv.org) discussed ↗
352 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
51 items
event
DeepMind
Google DeepMind has released "Deep Research Max," advancing autonomous research agents, while also facing challenges and competition from other AI companies like Anthropic and Ineffable Intelligence. Meanwhile, DeepMind workers in the UK have voted to unionize, and former DeepMind architect Demis Hassabis is at the center of legal drama involving Elon Musk.
- 8h Google DeepMind is worried about what happens when millions of agents start to interact
- 20h Show HN: Magenta Real-Time Music Generation on iPhone, Without the GPU
- 1d The Great Reframing...
- 2d Show HN: VQAScore – open eval metric/reward model, now for text-to-video
- 6d Inside Google DeepMind: Reasoning, Omni, and Shipping Frontier AI
MCP server for writing fiction (www.reddit.com via reddit)
Title: Anyone using MCP servers with Claude for serious writing workflows? I’ve been experimenting with MCP servers connected to Claude, and I’m starting to think this is one of the more important shifts in practical LLM usage.
Initial impressions of Claude Fable 5 (simonwillison.net)
9th June 2026 I didn’t have early access to today’s Claude Fable 5 release, but I’ve spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast.
Agent Memory SDK – persistent memory for AI agents (github.com via hn)
agent-memory-sdk TypeScript SDK for building AI agents with automatic, scoped, persistent memory. agent-memory-sdk wraps model calls with memory recall and learning so apps can keep useful user, thread, and operation context without manual…
- Persistent Memory for Coding Agents (www.agent-memory.dev via hn)
- Mnemory – Persistent memory for AI agents (github.com via hn)
llm 0.32a3 (simonwillison.net)
9th June 2026 Almost entirely written by the new Claude Fable 5, see my write-up for more details.
Coinbase launches AI agent accounts that can trade and spend on your behalf (www.coindesk.com via hn)
"Coinbase for Agents" is a new platform that lets AI assistants like ChatGPT and Claude connect to users’ Coinbase accounts to trade crypto, access data and eventu…
TripoSplat Generate 3D models from a single image I asked a coding agent to build a beautiful website showcasing the monuments of Paris as 3D Gaussian splats. I never opened an image generator.
Qwen-Image-Flash: Beyond Objective Design (arxiv.org) discussed ↗
I’ve been thinking about whether paying for a premium model actually improves writing, or whether it mostly produces a more polished version of the model’s default style. This looks at Fable 5 from that angle: where stronger reasoning migh…
Show HN: Bosun – a small model that keeps an agent's memory graph clean (huggingface.co via hn)
Bosun-XS (0.6B) Launch post: Introducing Bosun → The judge that keeps an agent's memory — its knowledge graph — clean. As an agent accumulates memory as a graph of facts linked by relationships, Bosun-XS decides, edge by edge, which connec…