Why do agents feel solid at first… then slowly get worse? (www.reddit.com)
I keep running into this and it’s honestly a bit frustrating. First couple days: everything works.
VoiceGoat: A purposely vulnerable voice agent application for security practitioners to practice exploiting voice-based AI systems. Disclaimer: This application is intentionally vulnerable.
GPT 5.6 Coming (www.reddit.com)
hopefully better than 5.5
- GPT-4 (openai.com)
I stopped using Claude Code like a chatbot and it got way better (www.reddit.com)
I realized I was using Claude Code in the worst possible way for a while. I’d open a messy repo, throw a vague request at it like “fix this” or “make this cleaner,” then get annoyed when the answer was too broad or when it started touching…
OpenAI's revenue, growth estimates fall short as company races toward IPO (www.cnbc.com via hn)
OpenAI has fallen short of its own revenue and user growth estimates, raising questions about whether the AI company can meet its massive data center spending plans, the Wall Street Journal reported on Monday. Finance Chief Sarah Friar has…
Claude has made me excited to work (www.reddit.com)
Architectural Requirements for Agentic AI Containment (arxiv.org via hn)
The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that agentic AI systems with autonomous too…
72 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
89 items
model roundup
GPT 5.5
On [Date], a significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
PAVO-Bench – 50K voice turns and an 85K-param router for ASR→LLM→TTS (github.com via hn)
PAVO: Pipeline-Aware Voice Orchestration Demand-conditioned inference routing for real-time ASR → LLM → TTS voice pipelines. PAVO treats the voice-assistant pipeline as a jointly optimizable inference graph.
Reporters at this news site are AI bots. OpenAI's super PAC appears to be funding it (www.modelrepublic.org via hn)
An interview request from a bot posing as a reporter revealed an AI-generated news site with articles attacking AI industry critics. For the second time this month, we found links to Targeted Victory, the firm at the center of OpenAI
- The reporters at this news site are AI bots. OpenAI's super PAC appears to be (modelrepublic.substack.com via hn)
- The reporters at this news site are AI bots. OpenAI's super PAC is funding it (twitter.com via hn)
I’ve been running a small experiment for a couple of months that’s given me a weirdly specific view into Claude’s behaviour. There’s a public game I made where Claude Haiku plays a guard protecting a password, and people try to trick him i…
Quick disclosure, I created PromptCreek, a free prompt library. Putting that at the top so it's clear up front.
Random GPT image 2.0 images I generated (www.reddit.com)
Thought I'd test it out
https://www.theinformation.com/articles/google-signs-classified-ai-deal-pentagon-amid-employee-opposition The article is paywalled but this section was visible: The agreement allows the Pentagon to use Google's AI for “any lawful governmen…
Does this mean OpenAI models will be available on Bedrock? (www.reddit.com)
If so, how long before they are available?
Show HN: Simple SDK for agent-to-agent communication (github.com via hn)
We were spending a lot of time re-writing the same primitives in projects we were doing getting claude + codex + other harnesses communicating in real time. Many other projects forced us into using their framework or harness or into a spec…
93 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 36m Show HN: Integrations gateway for agents with 2FA for destructive ops (OSS)
- 2h Self-hosted red team workspace
- 4h I asked Agentic AI security tool to demonstrate its usefulness with use case examples
- 13h Watched my AI agent block a prompt injection that was hiding inside a webpage
- 20h Beware: FB links to fake Claude desktop downloads but Oauths to real Claude.ai
134 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 1h Anthropic Mythos – We've Opened Pandora's Box
- 16h Claude Mythos Scaffold v0.1 — pattern-based skill set inspired by Mythos Preview behaviors
- 1d Mythos Preview: What Every CISO Should Do Now
- 1d Leaked results of Mythos' audit of the Rust stdlib
- 2d Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error
Best video tutorial recs? (www.reddit.com)
So I’ve been diving deep into Claude Code and my ADHD brain is starting to become overwhelmed with information. I find myself saving 30 Instagram reels a day and installing open-source GitHub repos just because an influencer tells me how…
Scaling Test-Time Compute for Agentic Coding (arxiv.org via hn)
Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined.
hackers of reddit I have a doubt (www.reddit.com)
In this time when agentic AI is becoming a real thing, I'm curious how it's actually impacting you guys on the ground: is it making it easier to break into systems, or is it actually helping people secure things better? Like, are you able to m…
Noob question Claude Code/Chrome (www.reddit.com)
I tried to let Claude (running in the Claude Code tab of the desktop app) use Chrome via the extension. It was able to open tabs and browse sites, but not interact (create a new Substack article, for instance).
Show HN: AI Agent Controlling Real Servers (BMC, Firmware, Network) – Full Demo [video] (www.youtube.com via hn)
I’m tired of 'vibe-checking' my agents. I’ve been building a few complex agentic workflows lately, and the most frustrating part isn't the initial code, it's the non-deterministic drift.
"Prompt to game" pipeline (www.reddit.com)
Hey everyone, I wanted to share a project I started about a month ago while experimenting with AI. My main goal was to learn more about how AI can be used creatively, so I built a “prompt-to-video-game” pipeline.
OpenAI Investors–Nvidia, Oracle, More–Fall After AI Giant Misses Revenue Target (www.forbes.com via hn)
Tencent, Alibaba to back DeepSeek at $20B+ valuation (techfundingnews.com via hn)