We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability…
Brockman: OpenAI to Spend $50B on Computing in 2026 (www.bloomberg.com via hn)
OpenAI to Spend $50 Billion on Computing in 2026, Brockman Says - Bloomberg
Two people file opposing sides of a petty dispute. Claude argues both sides as lawyers, another Claude instance judges, spectators throw reactions.
AI-DLC-UML modifies AI-DLC to enable AI agents to drive the software development workflow with UML modeling. It is intended for those who want to use UML modeling collaboratively in their design practices, even in AI-driven software develo…
I built this with Claude! It is called Buildroy.com. It was built with Claude as the AI brain for generating calculators.
Agent skill which will automatically raise pr (www.reddit.com)
Built an agent skill because I was honestly tired of the whole: find repos → find good issues → clone → setup → prompt agent → fix → PR → repeat. So I built Ghostpatch.
278 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 13m Agent review burnt most of my API credits. Rookie mistake
- 30m Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added
- 5h I asked Claude to investigate its own token burn. The receipts go back six months.
- 5h Anyone else notice that Opus 4.7 talks more technical than 4.6? I thought something changed in my repo, but I put it to the test.
- 5h I tested Kimi K2.6 vs Claude Opus 4.7 on a weird game coding task
58 items
event
Mistral
Mistral, a French AI company, is set to release a medium-sized model with 128 billion parameters and is planning to launch Workflows in public preview. The company, founded by Arthur Mensch, continues to grow its AI empire despite not being based in the United States.
- 15m I built vivkemind – an open-source, local‑first terminal AI coding agent with full AWS Bedrock support
- 4h AIMEAT, a self-hosted network where humans, their AI agents, and local LLMs share apps, knowledge, and capabilities. MIT.
- 11h 1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant
- 1d I built an AI tool that turns any movie into viral recap videos in minutes
- 1d Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB
UNIMATRIx – A Society of AI Agents (github.com via hn)
A simulated society of LLM-driven agents. Each agent has a personality, a role (president, banker, scholar, worker, beggar...), and a social class.
Zoo 2: getting the most out of Codex (tarantsov.com via hn)
May 5, 2026. GPT models have been better than Opus since late 2025, but Codex sucked until March ‘26. Now, finally, it is capable of running a Zoo workflow, and I present my best setup so far, Zoo 2.2, a…
Use cases for Claude outside of work (www.reddit.com)
I work in healthcare and am forbidden from using non-organization AI on any company applications or data. I've been trying to come up with ways to leverage Claude for other parts of my life, but most of the discussion is around work tasks…
- Claude - Use Cases In Sales (www.reddit.com)
Genuinely curious how other builders approach this. Been building a lot of apps lately using Claude Code and Cursor.
Show HN: The Agent That Orders My Groceries (twitter.com via hn)
One of the most sticky agent workflows I’ve built orders groceries for our family. While I love picking my produce at the store, grocery shopping in my household is annoying in a very specific way.
303 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 48m Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster.
- 2h [Benchmark] Llama.cpp: Mac vs CPU vs GPU + CPU, Qwen3.6 27B, Q8
- 3h Use Qwen3.6 right way -> send it to pi coding agent and forget
- 4h Qwen 3.6 4B and 9B?
- 5h Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark
117 items
model roundup
GPT 5.5
On [Date], a significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
Show HN: We're using LLMs to classify risk before execution in prod (news.ycombinator.com)
Hey HN! I'm Andrios, founder of Hoop.dev, an OSS layer-7 gateway for infra access.
+50k spent in the past 6 months - I can tackle your questions (www.reddit.com)
I am a Founding Engineer of a startup with about 6 years of experience as Lead Architect. I can answer your questions on t…
Why does it feel like I’m giving Claude therapy sessions.. (www.reddit.com)
Claude is great most of the time, and for most of the things I use it for. However, on certain projects I’m having to explain the same things over and over.
SubQ: Sub-quadratic LLM built for 12M-token context (subq.ai via hn)
SubQ is a sub-quadratic LLM built for 12M-token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss. Use cases: reason across 12M tokens in one prompt: entir…
- SubQ: a sub-quadratic LLM with 12M-token context (subq.ai via hn)
Ask HN: What would it take for you to use photonic inference hardware? (news.ycombinator.com)
I’m working on a photonic inference accelerator. The tech is promising, but I’ve seen too many hardware startups fail because they built something that was a nightmare to actually rack or program.
Dreamer: Make any coding agent self-evolving, across the whole team (github.com via hn)
Dreamer: self-evolving context for your coding agents. Dreamer keeps your team's AGENTS.md and skills up to date with what your coding agents learn while…
142 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 39m Built a security scanner for LangChain/LangGraph agents: it clones your agent into a sandbox and tries to break the clone
- 4h Codebase jailbreak of ChatGPT through image 2.0
- 4h Anthropic "Gift Max" Exploit cost user €800, tanked SCHUFA score, and a ban
- 5h AI Ready Vulnerability Management Program After NVD Changes and Claude Mythos
- 5h Show HN: Probus, AI vuln scanner (PRs merged in Vercel AI SDK, n8n, LangGraph)
148 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
Ask HN: How do you pilot a service company full of AI agents? (news.ycombinator.com)
I've been running my small service company (marketing agency) with a ton of AI agents for a few months now. Got NanoClaw running (OpenClaw-like) with custom automations, tools connected, agents doing some work for me.
Export question (www.reddit.com)
Has anyone compared their ChatGPT data exports pulled on different dates? I ran grep on multiple exports of the same conversations and found content counts changing between export dates, has same conversation ID, yet differ…
- I have a question (www.reddit.com)
Build secure OS agents with LangGraph (gantrygraph.com via hn)
Quickstart: from zero to a working agent in five minutes. Requirements: Python 3.10+ · macOS, Linux, or Windows · an API key from Anthropic, OpenAI, or any LangChain-compatible provider.
- Where do you build agents? (www.reddit.com)
Detecting silent LLM agent degradation before users do (www.ainative.builders via hn)
An agent's reasoning depth dropped 67% between two model updates — zero error rates, HTTP 200 on every call, valid JSON throughout.[1] The team discovered this three weeks later, from a customer complaint. No alert had fired.
Why coding agents need a merge queue (ctx.rs via hn)
Embodied AI with Claude, Raspberry Pi and Arduino (github.com via hn)
AGENTIC HAL_9000: Hal_9000 from 2001: A Space Odyssey. Link to the video: youtube. The agentic AI is an Anthropic Claude model with the LangChain framework.