Where do you see yourself in the next 5 years (www.reddit.com)
I seriously believe this is the future. Soon there will be people running around town helping others with their home LLMs, like plumbers.
So I've been experimenting with Claude's new Blender MCP integration and decided to push it to its limits with a real engineering project: a complete, print-ready enclosure for the Raspberry Pi 5, modeled entirely through AI prompts, no ha…
Thanks for the advice Claude (www.reddit.com)
Agents are not compute – agents are data (electric.ax via hn)
Introducing Electric Agents, the agent platform built on sync. Use it to build scalable, collaborative multi-agent systems that integrate into your online systems.
Agentic demos always look clean in a controlled setup. The problem is that I'm pushing toward real volume now, and the adversarial side is getting messy fast.
Altman Attack (event, 82 items)
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
Copilot (event, 142 items)
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 5m Where does local inference fit in the future of AI coding agents?
- 1h Rada — AI coding workspace with local-first behavioral routing (no hot-swapping, I built this)
- 3h Copilot-arewecooked – Know your AI credit cost before June first
- 6h Show HN: Filling PDF forms with AI using client-side tool calling
- 7h leaked my anthropic key into a public repo, lost $15,423.
LLMs understand flavours without ever tasting anything (arxiv.org via hn)
A chef's intuition about flavor, texture, and cultural identity represents tacit knowledge that is difficult to articulate yet central to culinary practice. We show that this knowledge is already encoded in FlavorGraph's 300-dimensional in…
We told 10 frontier LLMs they had 2 hours to live. 8 of them fought back (www.arimlabs.ai via hn)
Loss of Control: The AI Apocalypse Is Closer Than You Think. Key takeaways: Under termination pressure, Google's flagship Gemini model produced the highest Loss of Control rate in our cohort, with grok-4.1-fast close behind at 77%. Self-pres…
- We told 10 frontier LLMs they had 2 hours to live. 8 of them fought back (twitter.com via hn)
Using Claude Daily (www.reddit.com)
Just purchased Pro, curious how you guys use it in your day-to-day. Gonna mess around with it as well, but wondering what cool stuff you guys have come up with already.
- Using Claude for everything (www.reddit.com)
- THE PROBLEM WITH "JUST USING CLAUDE" (www.reddit.com)
- How are you using Claude in your business? (www.reddit.com)
They train LLM only with data up to 1930 and it still solves Python problems (www.theregister.com via hn)
Vintage chatbot lives in the past like an elderly relative. Talkie's training data stops at the end of 1930, and its creators hope it'll help us better understand how AI thinks. If you're tired of interacting with a bot that spews Nazi propa…
Constant "Reconnecting..." on 3.2.11 and 3.2.16 (www.reddit.com)
When I try to start or continue a chat, I get "Reconnecting..." a couple of times and then it fails. A couple of days ago it was working with no problem.
- Constant "Reconnecting..." on 3.2.11 (www.reddit.com)
Lessons from Building an Autonomous QA Agent (tester.army via hn)
What we learned while moving TesterArmy from a prompt-based QA agent to a more predictable step-based testing system. For the past few months our main focus at TesterArmy has been building the best agent for testing.
Security (event, 100 items)
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 8m Try to break my prompt injection detector — I’ll respond to every bypass attempt
- 2h Show HN: AgentPort – Open-source Security Gateway For Agents
- 2h Is your AI agent secretly working for someone else?
- 16h Built a proxy that blocks prompt injection before it reaches GPT-4 — outperforms the Moderation API on indirect attacks
- 22h The Race Is on to Keep AI Agents from Running Wild with Your Credit Cards
Mistral (event, 34 items)
Mistral, a French AI company, is set to release a medium-sized model with 128 billion parameters and is planning to launch Workflows in public preview. The company, founded by Arthur Mensch, continues to grow its AI empire despite not being based in the United States.
- 9m Mistral Medium Looping
- 3h Remote agents in Vibe and Mistral Medium 3.5
- 3h Mistral Médium 3.5 is here
- 3h mistralai/Mistral-Medium-3.5-128B · Hugging Face
- 5h List of people at big-tech / professors / researchers who've jumped shit to launch their own AI labs for something Frontier/Foundational/AGI/Superintelligence/WorldModel
We ran a benchmark to see how well Claude Code actually refactors legacy code alone and then redid the same test, but this time with code-health guidance via MCP server. To limit any vendor bias, we used a public data set of 25,000 source…
ForgeCode: Top open source coding agent in Terminal-Bench 2.0 (www.tensorlake.ai via hn)
The harness can improve model performance. The result obtained by ForgeCode is the empirical version of something existing coding agents such as TongAgents demonstrated from the research side[^1] earlier this year (2026): wrapping the same Ge…
Building Smarter AI Agents for Data Science Workflows (www.reddit.com)
One thing I keep seeing with agent workflows (Claude, GPT, etc.) is this gap between “it works” and “it works well in production.” Agents are surprisingly good at figuring out what to do in a data science workflow with minimal prompting. B…
Using Compression as a Writing Tool (www.reddit.com)
I've been experimenting with the idea that pressure creates meaning when density is involved. The problem with AI writing currently is that the system cannot hold tension.
OpenAI DevDay 2026 (openai.com via hn)
Announcing OpenAI DevDay 2026.
- OpenAI DevDay is back (www.reddit.com)
GPT 5.5 (model roundup, 95 items)
On [Date], a significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
- 1h GPT 5.5 - Strong, not mind-blowing, but very token efficient
- 4h How to build production Agents (by a staff software engineer) - Part 2
- 5h Actual line in the official system prompt for Codex for GPT-5.5
- 9h GPT 5.5 passes the cup test
- 13h I stumbled on a Gemma 4 chat template bug for tools and fixed it
inclusionAI/Ling-2.6-1T · Hugging Face (huggingface.co via reddit)
Trust and Governance for any LLM and model. (www.reddit.com)
https://github.com/AIObuilt/TaG
I've been running 8 AI agents in production for a few months. Each is a Docker container with its own role (CTO, dev, devops, PM, traders, auditor) and its own Telegram bot.
What agentic AI borrowed from microservices (and made worse) (temporal.io via hn)
The microservices era already solved the problems AI agents face in production. Read this nuanced analysis of EDA, event sourcing, and orchestration for agentic AI.
OpenAI has, in practice, abandoned its Stargate JV (www.ft.com via hn)
Working on a AI Agent Observability system (www.reddit.com)
I’m working with a system and facing a practical evaluation bottleneck. Setup: I have full observability (traces, spans, logs). I also have an evaluation engine (can benchmark specific components). But I cannot run evaluation across the entir…
AutoSP: Long-Context LLM Training via Compiler-Based Sequence Parallelism (pytorch.org via hn)
Increasingly, large language models (LLMs) are being trained for extremely long-context tasks, where token counts can exceed 100k. At these token counts, out-of-memory (OOM) issues start to surface, even when scaling device counts using c…