New OpenAI Academy courses for the next era of work (openai.com)
AI is giving organizations a new capacity to act. Work that once waited for scarce time or expertise can increasingly move forward with AI.
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility (arxiv.org) discussed ↗
Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison ac…
Is Claude purposefully ignoring its own capabilities to consume tokens?! (www.reddit.com via reddit)
All I want to do is for Claude to send me a message at a scheduled time, for the sole reason of starting a 5h session while I am asleep. I tried multiple things, amongst which the following: - "/loop at 7:00am today just say "HI", nothing…
Claude Fable is relentlessly proactive (simonwillison.net)
11th June 2026. After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them…
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning (arxiv.org) discussed ↗
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's…
Lawsuit: ChatGPT validated suicidal woman's distrust of crisis lines (arstechnica.com)
Last year, a 24-year-old Canadian woman was in a mental health crisis and turned to ChatGPT for help. Hours later, that woman, Alice Carrier, took her own life.
datasette-agent 0.2a0 (simonwillison.net)
10th June 2026. Highlights from the release notes: - Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (o…
- datasette-agent 0.1a4 (simonwillison.net)
- Show HN: Datasette Agent (simonwillison.net via hn)
- datasette-agent 0.1a3 (simonwillison.net)
- datasette-agent 0.1a2 (simonwillison.net)
- datasette-agent 0.1a1 (simonwillison.net)
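The 0.2a0 release note above describes tools that declare a context parameter and call await context.ask_user(...) mid-execution. A minimal sketch of that pattern follows. It does not use datasette-agent itself: FakeToolContext, delete_rows, call_tool and run_demo are hypothetical names, and the single-string ask_user signature is an assumption, since the note elides the arguments.

```python
import asyncio
import inspect


class FakeToolContext:
    """Hypothetical stand-in for the ToolContext named in the release notes."""

    def __init__(self, scripted_answers):
        self._answers = list(scripted_answers)
        self.questions = []

    async def ask_user(self, question):
        # Assumption: a real context would pause and wait for the user.
        # Here a scripted answer is returned so the sketch runs without a UI.
        self.questions.append(question)
        return self._answers.pop(0)


async def delete_rows(table, context):
    """Example tool that asks for confirmation before a destructive step."""
    confirmed = await context.ask_user(f"Delete all rows from {table}?")
    return f"deleted {table}" if confirmed else "cancelled"


async def call_tool(tool, context, **kwargs):
    # Pass the context only to tools that declare a `context` parameter.
    if "context" in inspect.signature(tool).parameters:
        kwargs["context"] = context
    return await tool(**kwargs)


def run_demo(answer):
    # Drive one tool call with a single scripted yes/no answer.
    context = FakeToolContext([answer])
    return asyncio.run(call_tool(delete_rows, context, table="logs"))
```

Calling run_demo(True) walks the confirmed path and run_demo(False) the cancelled one; the actual datasette-agent API may differ in names and arguments.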
Ask HN: What will be the next big memory management system for AI Agents? (news.ycombinator.com)
We have all seen RAG and Graph Knowledge, but in your opinion or if you know of some cool project, what’s the next innovation that could achieve true perpetual memory and true personalization?
Past 150 seats, you're forced into their Enterprise tier. Seats stop including any usage at all, and every single token bills at standard API rates.
Access OpenAI models and Codex through your Oracle cloud commitment | OpenAI
Use your existing Oracle cloud commitment to give teams access to OpenAI’s most advanced models and Codex, without creating a new purchasing path.
The Role of Feedback Alignment in Self-Distillation (arxiv.org) discussed ↗
Ask HN: How are you designing human review for production AI agents? (news.ycombinator.com)
For people running agents in real workflows: what do you let the agent do automatically, what requires human approval, and how do you keep the review process auditable?
Anthropic Mythos (event, 363 items)
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
Security (event, 336 items)
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 2h Ask HN: Phishing from 646-257-4500
- 23h Chaining LLM and web bugs to Admin
- 1d Fable's policy on no zero-day-retention is a serious problem for Enterprise customers
- 1d Visa Vulnerability Agentic Harness for Project Glasswing
- 1d Are we defaulting to VM-level sandboxing before understanding the threat model?
Superficial Beliefs in LLM Decision-Making (arxiv.org) discussed ↗
Show HN: Open-source Git-like Markdown docs for humans and agents (www.datacompany.dev via hn)
Open-source Markdown docs for humans and agents. The same document live in your browser and your terminal — real-time collaboration for people, a first-class CLI for agents, and 3-way merge so every edit lands cleanly.
OpenAI WebRTC Audio Session, now with document context (simonwillison.net)
12th June 2026 - Link Blog OpenAI WebRTC Audio Session, now with document context. I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity (arxiv.org) discussed ↗
Curious If Others Use Claude For Mental Health Support and If So How (www.reddit.com via reddit)
So it's essentially as the title says: I'm curious whether others are using Claude for mental health support, how they are doing so, and what their opinions are on using AI for mental health related issues as a whole. I'm still rather hesitant to use AI for very…
Investing in multi-agent AI safety research (deepmind.google)
Steganography Without Modification: Hidden Communication via LLM Seeds (arxiv.org) discussed ↗
Show HN: Full-self browsing - agents can drive any web task using this CLI Skill (www.webcli.sh via hn)
Initial impressions of Claude Fable 5 (simonwillison.net)
9th June 2026. I didn’t have early access to today’s Claude Fable 5 release, but I’ve spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast.
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis (arxiv.org) discussed ↗
llm 0.32a3 (simonwillison.net)
9th June 2026. Almost entirely written by the new Claude Fable 5, see my write-up for more details.
Claude pro users: How do you actually maximize your subscription (www.reddit.com via reddit)
I recently subscribed to Claude Pro and I feel like I’m probably only using a fraction of what it’s capable of. My current use cases are: deep research and brainstorming; business ideas and startup planning; long-form strategy discussions; Cr…
- About claude pro subscription (www.reddit.com via reddit)
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs (arxiv.org) discussed ↗
TripoSplat: Generate 3D models from a single image
I asked a coding agent to build a beautiful website showcasing the monuments of Paris as 3D Gaussian splats. I never opened an image generator.