GUI is dying. Agents are the new interfaces. (www.reddit.com)
For the last 20 years, software was built for humans clicking buttons. Open dashboard.
Elon Musk Seemingly Admits xAI Has Used OpenAI's Models to Train Its Own (www.wired.com via hn)
While testifying on Thursday in federal court, Elon Musk seemed to indicate that his AI lab may have used OpenAI’s models to train xAI’s own. He touched upon the topic while sitting on the witness stand answering cross-examination question…
- Elon Musk Admits xAi is Distilling OpenAI Models (www.reddit.com)
Terminal AI Coding Agents Comparison Table (terminaltrove.com via hn)
Compare AI Coding Agents: compare 40+ AI coding agents side-by-side. Features, pricing, Terminal Bench benchmarks, MCP support, and more.
The LLM Is Not a Junior Engineer (jacobharr.is via hn)
April 29, 2026. A collection of different thoughts about how LLMs might be thoughtfully incorporated into modern software development practices. In the wake of my last essay on why I don’t vibe code, I heard from various people on the I…
The Human Creativity Benchmark – Evaluating Generative AI in Creative Work (contralabs.com via hn)
1.0 Introduction. When professional creatives evaluate AI-generated work, their judgments produce two distinct signals. The first is convergence: evaluators agree on what works, revealing shared best practices like readable typography, funct…
Why I built it: I was already using Claude for tasks. And I had my own task manager with full control over my data.
The math behind how LLMs are trained and served (www.dwarkesh.com via hn)
Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served. It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public AP…
Elon Musk says his xAI startup's models were partially trained on OpenAI's tech (www.sfchronicle.com via hn)
Elon Musk continued his clash with OpenAI’s attorney Thursday in Oakland federal court, with Musk saying he was being repeatedly cut off by the lawyer. Musk’s testimony during the trial’s third day also revealed that his startup xAI, which…
24 items
event
Windsurf
Windsurf 2.0 has been released with improved local and cloud agent integration and bug fixes. The update follows a series of announcements about AI tools and MCP servers, including gondola.ai's hotel search server and Stork for indexing over 14,000 AI tools.
- 15m I built an open-source bridge so AI agents can read WHOOP health data safely
- 22h Non-technical founder: Is Cursor Pro worth $20/mo for React+Supabase, or am I fighting the wrong battle?
- 1d Kimi K2.6 helping me uninstall macOS apps
- 2d Best value in the 20$ range coding agents? I want the best quality and high-usage-limit I can get at that price.
- 3d OpenAIs Agentic Shift
240 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 20m Opus 4.7 have less parameters than 4.6?
- 2h AI Security Institute: GPT-5.5 "may be the strongest model we have tested" for cyber exploits, including Mythos
- 3h A medicine student with no coding experience tried to create a studying agent: Felicity.
- 4h How to become more efficient with live artifacts?
- 7h We Asked GPT-5.5 and Claude Opus 4.7 to Design 5 UIs
Was constantly missing permission prompts running multiple Claude sessions in parallel. Built Claude Ops to fix it - a local browser dashboard that shows every session's live status, current tool, and spawned subagents.
Anthropic wants to be the AWS of agentic AI (thenewstack.io via hn)
Anthropic's Managed Agents platform bundles sandboxing, checkpointing, and persistent memory into a single API layer — and the company's ambitions look a lot less like a model provider and a lot more like AWS.
Spell-Checking with LLMs (revise.io via hn)
It's 2026 and word processors are still not using LLMs to power their spelling and grammar checkers. I'm really not sure why.
My organization uses Composer 2 (locally and cloud agents) for all of our development tasks, ever since it was released. It's spectacular.
Personal AI Agents (www.reddit.com)
Hey everyone, I’m looking to build a custom AI agent (or multi-agent system) and would appreciate some advice on the best frameworks and tools to execute this. I want an automated daily workflow, rather than just querying a standard LLM in…
- Ask HN: Show Us Your Personal Agents? (news.ycombinator.com)
- AI Agents (www.reddit.com)
- How to set up personal agents? (www.reddit.com)
- Personal Knowledge Base for AI Agents (www.reddit.com)
How to prompt for font detection? (www.reddit.com)
I'm trying to use claude to find different fonts from images of pages from a book but it seems to be getting a lot of it wrong. Is there a specific setup or model that is best geared for analyzing text from images?
The Block Model Behind Warp's Agentic Development Environment (www.warp.dev via hn)
Warp has come a long way since it initially set out to modernize the terminal. In the screenshot above, an agent is working through a plan alongside a developer's own shell commands — running its own commands, reasoning, proposing a diff —…
Running Local Agentic PDF Search with Eno (enopdf.com via hn)
eno can drive its full agentic search against a local, open-weight model running on your own hardware. When you do, your PDFs, your queries, and every intermediate step of the agent loop stay on your machine.
104 items
model roundup
GPT 5.5
On [Date], a significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
- 32m Anyone using OpenAi's Privacy Filter?
- 45m GPT-5.5 is the second model to complete AISI multi-step cyber-attack simulation
- 1h GPT5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost
- 5h GPT-5.5 authorship and order effects
- 11h Which AI agents do you use to automatise your process ?
151 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 1h Claude Code vs Cursor vs Copilot vs Codeium: Which AI coding assistant is actually worth paying for?
- 2h Am I doing it right? Claude Desktop with VSCode Claude Code
- 3h Show HN: Sampletext.store/ We built a dumb web shop and we cannot look away
- 7h LocalPilot with Ollama as a Replacement for CoPilot in VS2026
- 7h TDD and Rules Enforcement using Hooks
I gave my claude code buddy a black hat (www.reddit.com)
Bringing Claude Code into the physical world! I'm working on an open-source project to make AI dev sessions more fun and interactive.
what are the biggest risks of agentic AI in supply chain production? (www.reddit.com)
I think LLMs have trouble with conceptual generalization because every dimension of their representations ALWAYS participates in every operation, so concepts can't be isolated from surface features. So, I was thinking that if we had a good…
A lot of model talk still starts from which one feels smartest. Inside Cursor, I’m not sure that is the first question anymore.
Benchmarking Local LLM/Harness Combinations (neuralnoise.com via hn)
I’ve been running a small benchmark, harness-bench, that pairs local LLMs (served via llama.cpp’s llama-server) with agent harnesses (Aider, Claude Code, OpenCode, Pi, Qwen CLI) on 16 software-engineering tasks across Python, PyTorch, J…
- Benchmarking Local LLM/Harness Combinations (neuralnoise.com via reddit)
I run multiple Claude Code sessions at the same time, across multiple projects/modules. Sometimes I need my existing sessions to work together, but CC does not support inter-session messaging so I have to be the message bus.
Coding agents expose this: same VPS, 3 runs, ~65% drift (webbynode.com via hn)
Developer documentation-style problem-resolution hub for VPS, hosting, and deployment. Real benchmarks, real CLI output, real solutions.
Digging into Claude Code and codex source codes to understand how they work (nimasadri11.github.io via hn)
The Annotated Coding Agent: Comparing the architectures of Codex CLI and Claude Code. I spent the past few days reading and understanding the Codex CLI and Claude Code codebases.
How to export claude chat with all my pdfs and links and pics? (www.reddit.com)
basically the title, how to do it? when I copy the chat it shows only the text used, not the pics, pdfs, link