I was originally just messing around with pi-autoresearch and gave it a sample task: build the most portable coding agent.
Elon Musk confirms xAI used OpenAI's models to train Grok (www.theverge.com via hn)
In a federal courtroom in California on Thursday, Elon Musk testified that his own AI startup, xAI, has used OpenAI’s models to improve its own. He said it was “partly” true that th…
- Elon Musk Seemingly Admits xAI Has Used OpenAI's Models to Train Its Own (www.wired.com via hn)
Any underrated or overlooked models? FYI: MiniMax-M2.7 switched its license (from MIT to non-commercial), so it's not in the graph.
Ask HN: I'm building a toy language. At what point should it become self-hosted? (news.ycombinator.com)
I first sketched out the core of my language in C back in 2021. After finally paying off my debts, I started working on it again as a toy project, partly to study and partly to see how far I can get by working together with LLMs.
Hard budget enforcement for AI agents – blocks before the API call (awx-shredder.fly.dev via hn)
AWX Shredder sits between your agents and OpenAI. Set daily spend limits per agent — we block, throttle, and alert before costs spiral.
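The listing doesn't say how AWX Shredder is implemented. As a rough illustration of the general "block before the API call" idea only, here is a minimal single-process Python sketch of a per-agent daily spend guard. The names `BudgetGuard`, `authorize`, and `BudgetExceeded`, the 80% alert threshold, and the assumption that the caller supplies a cost estimate for each request are all hypothetical, and throttling is left out.

```python
from dataclasses import dataclass, field
from datetime import date


class BudgetExceeded(Exception):
    """Raised when a request would push an agent past its daily limit (hypothetical)."""


@dataclass
class BudgetGuard:
    # Hypothetical per-agent daily limits in USD, e.g. {"scraper": 5.0}.
    daily_limits: dict
    # Assumed policy: record an alert once spend reaches this fraction of the limit.
    alert_ratio: float = 0.8
    spent: dict = field(default_factory=dict)
    alerts: list = field(default_factory=list)
    day: date = field(default_factory=date.today)

    def _roll_day(self) -> None:
        # Reset every agent's counter when the calendar day changes.
        today = date.today()
        if today != self.day:
            self.day = today
            self.spent.clear()

    def authorize(self, agent: str, estimated_cost: float) -> None:
        """Call BEFORE the upstream API request; raises instead of letting it through."""
        self._roll_day()
        # Agents without a configured limit get 0.0, so they are denied by default.
        limit = self.daily_limits.get(agent, 0.0)
        new_total = self.spent.get(agent, 0.0) + estimated_cost
        if new_total > limit:
            # Block: the request is never sent, and spend is left unchanged.
            raise BudgetExceeded(f"{agent}: {new_total:.2f} USD would exceed {limit:.2f} USD")
        # Charge the estimate up front so the limit can't be overshot later.
        self.spent[agent] = new_total
        if new_total >= self.alert_ratio * limit:
            self.alerts.append(f"{agent} at {new_total:.2f}/{limit:.2f} USD")
```

Usage: with `guard = BudgetGuard({"scraper": 1.00})`, calls costing 0.50 and then 0.40 go through (the second one records an alert), and a third call costing 0.20 raises `BudgetExceeded` before any request leaves the process. Charging the estimate up front, instead of reconciling after the response, is the simplest way to make the check happen before the call. A real proxy would also need to reconcile actual token usage and handle concurrent requests.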
Claude Code began getting rid of the Big Kernel Lock in QNX (www.reddit.com)
I asked Claude Code "What will it take to re-design the QNX microkernel and proc to get rid of the Big Kernel Lock?" It said "Roughly, 3 months of intensive work of a top-developer [human]". I said: "Let's get started then".
Visualizing the ecosystem of AI agents and orchestration tools (www.aistackradar.dev via hn)
Visualizing the ecosystem of AI agents and orchestration tools, grouped as Core (Adopt), Rising (Trial/Assess), and Experimental (Hold/Wait). Weekly updates!
I've built a system where models like Llama 3, Qwen, and Gemma play Pokémon Showdown battles autonomously. Instead of simple prompt-response, they analyze the full battle state every turn (type matchups, HP, weather, field conditions, reve…
Model roundup (243 items)
Opus 4.7: Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 5m Claude Code Read tool silently downscales images
- 18m Run your first AI Agent under 30 seconds, in your browser! (Free)
- 1h How to turn Opus 4.7 into your own personal pocket bully.
- 2h Does Opus 4.7 have fewer parameters than 4.6?
- 4h AI Security Institute: GPT-5.5 "may be the strongest model we have tested" for cyber exploits, including Mythos
Model roundup (243 items)
Qwen 3.6: Qwen3.6-35B-A3B, a sparse MoE model with 35 billion total parameters and 3 billion active parameters, was released by Alibaba Qwen on April 16, 2026, as open-source software under the Apache 2.0 license. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 12m What's the best subscription under $20?
- 18m Qwen 3.6 and Gemma 4 "Zombie Loops" (terminal thinking loops)
- 40m Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)
- 41m Long-context coding on RTX 5080 16GB: Qwen3.6-35B-A3B holds 30 t/s at 128K (89 t/s fresh), no quality drop
- 53m Qwen 3.6 27B SAE
I tried implementing AI Agents Like Distributed Systems (www.reddit.com)
Most agent setups follow the same pattern: one big prompt plus a few tools. It works, but once you try to scale it you get hallucinations, and debugging becomes tricky because it's hard to tell which part of the system actually failed.
Was burning through the Claude Code weekly limit on the $20 plan by Thursday or Friday, every single week. Annoying because I had work I wanted to do and the tool was just locked.
DeepSeek: Thinking with Visual Primitives [pdf] (huggingface.co via hn)
News, 2026.04.30: We have released the technical report detailing our approach. In the near future, we plan to make the in-house benchmarks and a subset of our cold-start data publicly available.
- DeepSeek released 'Thinking-with-Visual-Primitives' framework (www.reddit.com)
Is agentic commerce really APIs… or dynamic UIs like this? (www.reddit.com)
OpenAI tells ChatGPT models to stop talking about goblins (www.msn.com via hn)
- OpenAI tells ChatGPT models to stop talking about goblins (www.bbc.com via reddit)
Been building a job search automation pipeline this past week and I keep going back and forth on this question. Here's what the pipeline looks like: 1) A Python + Playwright script scrapes company career pages, extracts relevant job list…
Resumes are dead, personal agents are next (www.cnbc.com via hn)
Joshua Curry and Vishal Patil have seen a lot of customer service chatbots. The chat windows that pop up on your screen while visiting sites from online retailers to cell phone companies, asking what you need help with, have proliferated i…
Claude Code dies with ANTHROPIC_API_KEY in cloud environment (news.ycombinator.com)
Model roundup (138 items)
Gemma 4: Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use and others sharing settings for optimizing performance with smaller models.
- 1h Five labs, one suite, do model families have personalities? (benchmark)
- 9h thinking of gemma 4 26B vs 31B
- 10h Notes on what actually breaks when you run a coding agent on small local models
- 15h Based on what should I choose Gemma 4 models/quantizations?
- 22h If you could do anything with the local models in your corporate workflows, what would it be?
Event (89 items)
Altman Attack: Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of the more than 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
- 42m Live Updates from Elon Musk and Sam Altman's Court Battle over OpenAI
- 4h Elon Musk Admits xAI is Distilling OpenAI Models
- 9h Families of Canadian mass shooting victims sue OpenAI, CEO Altman in US court
- 13h The Download: storing nuclear waste and orchestrating agents
- 17h OpenAI, Sam Altman Hit with Slate of Lawsuits over Mass Shooting Canadian School
Neural surrogate experiments for physics simulation, automated with Opus and Cod (blog.1001ud.me via hn)
Neural Surrogates
- What I'm Working On: Neural Surrogates for Physics, Geometry, and Real-Time Simulation (2026-04-22)
- Project 01: GeoPINN Demo: Solving PDEs on a Sphere (2026-04-09)
- Project 02: WavePINN-NIF Comple…
I built CanvasGPT – work with Claude on an open canvas (www.reddit.com)
I've been using Claude to build CanvasGPT for the past 2-3 years. It's a spatial workspace where you can brainstorm, research, and ship working products.
Anthropic has overtaken OpenAI on secondary markets (twitter.com via hn)
Claude Security enters public beta (claude.com via hn)
- Claude Security is now in public beta for Claude Enterprise customers (twitter.com via hn)
ChatGPT Gaslighting me on Euphoria Season 3 (Spoiler, sorry) (www.reddit.com)
No amount of screenshots or answers could get ChatGPT to admit that the episode I just watched is real. Not sure if that says more about Euphoria Season 3 or ChatGPT.
Troubleshooting USB speed issues with Claude Code (gjolly.fr via hn)
Auto Agent Protocol (AAP): the A2A v1.0 Automotive Retail Profile. AAP is the open A2A profile that lets AI agents discover dealerships, browse inventory, and submit consented leads through typed automotive messages riding on top of A2A v1.…
Show HN: Capture the Flag game where LLMs are the only players (github.com via hn)
I set up a small R&D project that pits different LLMs against each other in a game of Capture the Flag. Each LLM has 30 seconds to prepare any defenses and 5 minutes to capture the other flags while defending its own.
Packaging Installs of Claude Code for Team Members? (www.reddit.com)