Is MCP a sign of the reopening of the internet? (bakkenbaeck.com via hn)
Back in Web 2.0 times, openness was the default for web platforms. Will AI agents lead us back there? A long time ago, we had Web 2.0!
From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot
You have a robot, a folder of demonstration data on the Hugging Face Hub, and a new task you want it to learn. Today that takes five separate tools: one to record…
My Claude refuses to give me a Grand Plan for my business. (www.reddit.com)
This happened to me a week ago when I asked Claude to give me the Grand Plan to execute the future plans we were discussing. It outright said "No, I won't give you!" and the reason is that I have so many chats going on and haven't don…
datasette-agent 0.3a0 (simonwillison.net)
15th June 2026 - New tool, execute_write_sql, which requests user approval and then writes to a database - taking user permissions into account. #27 I added a mechanism for asking user approval in datasette-agent 0.2a0.
- datasette-agent 0.2a0 (simonwillison.net)
- datasette-agent 0.1a4 (simonwillison.net)
- Show HN: Datasette Agent (simonwillison.net via hn)
- datasette-agent 0.1a3 (simonwillison.net)
- datasette-agent 0.1a2 (simonwillison.net)
- datasette-agent 0.1a1 (simonwillison.net)
Large Language Models (LLMs) achieve strong performance on reasoning tasks, but whether this reflects faithful logical inference or heuristic approximation remains unclear. We study this question in legal entailment by comparing three para…
On automatic programming (www.reddit.com via reddit)
With the advent of agents, automatic programming has become something really serious. You have now an always-on buddy ready to help, implement, and validate your implementation plans and code changes.
371 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 56m The State of Fable, the Jailbreak Problem, SpaceX Acquires Cursor
- 6h A Red-Team Study of Anthropic Fable 5 and Opus 4.8 Models
- 10h Claude Opus caught malware hidden in my repo, then reverse engineered the whole thing
- 12h 4 in 10 AI agents headed for demotion or the rubbish bin (Gartner)
- 17h Show HN: VulnFeed – 9 security tools your AI agent can call (MCP server)
7 items
model roundup
Opus 4.7
Opus 4.7 has gained more frequent use among certain users, such as creative professionals and therapists, who prefer it over newer versions like 4.8 or older ones like 4.6. Some users report that Opus 4.7 handles context documents better than later versions.
SpaceX to acquire AI coding platform Cursor for $60 billion (arstechnica.com)
SpaceX will acquire AI coding tool Cursor for $60 billion in an all-stock transaction, the companies announced today. The deal is expected to close in the third quarter.
- SpaceX to Acquire Cursor (xcancel.com via hn)
Import AI 461: "Alignment is not on track"; FrontierCode; and synthetic research interns (importai.substack.com)
Import AI 461: "Alignment is not on track"; FrontierCode; and synthetic research interns Where are your agents right now? Welcome to Import AI, a newsletter about AI research.
Noema64
Noema64 is an open-source explainable chess engine that uses a language model as a persistent strategic planner while deterministic Go code owns legal move validation, game state, fallback, UCI protocol behavior, traces, and local…
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's…
OpenAI WebRTC Audio Session, now with document context (simonwillison.net)
12th June 2026 - Link Blog OpenAI WebRTC Audio Session, now with document context. I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.
Fixing Claude's memory with a Postgres database (www.makeuseof.com via hn)
If you've spent any serious time with Claude Code, you've likely already been frustrated by the AI forgetting everything. You spend the first twenty minutes of the session describing project structure, coding conventions, and why you're us…
Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison ac…
132 items
event
GLM
Recent developments in the AI space highlight significant advancements from Chinese companies, particularly Zai's upgrade of GLM-5.1, which has shown substantial improvements. Meanwhile, there are concerns about a widespread intelligence drop across various models and discussions around the potential openness of leading AI projects like GLM 5.1.
412 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
Every major LLM provider had at least one significant outage in 2025. Anthropic, OpenAI, Gemini — all of them, at some point, just stopped responding mid-request.
11th June 2026 - Link Blog Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude (via) Big scoop for Maxwell Zeff at Wired: “We’re changing Fable 5’s safeguards for frontier LLM development to make them visibl…
- Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude (www.wired.com via hn)
Show HN: Hyperbox - $40/month Mac mini rentals (hyperbox.sh via hn)
Hey HN, Following the OpenClaw craze, I saw a huge need for hosting personal Claws/Hermes agents on macOS [0]. So I built an agent to scrape eBay for below-market M-series Macs and built a Mac mini datacenter [1].
[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo a quiet day lets us reflect on a great essay Sarah Guo is a friend of the pod and Queen of AI, and after our Satya crossover pod (great recap here from Goku…
The gravity around a black hole is so extreme that nothing, not even light, can escape once it gets close enough. Astrophysicists like Chi-kwan Chan study black holes with computer simulations and observations.
Claude vs. ChatGPT for Code Review: Which Is Better? (theaileverageweekly.com via hn)
- Claude Code Ultra Review (www.reddit.com)