When Claude tells you to "stop spiraling and go to bed" (www.reddit.com)
From fabian on 𝕏: https://x.com/fabianstelzer/status/2051260931758272863
No, I won't tell you how. No, this is not for anyone who is not already a proven contributor to the fine-tuning space.
Process-Level Reward Modeling for Agentic Data Analysis (arxiv.org via hn)
Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remai…
Why AI Agents Need Proof Chains, Not Just Logs (github.com via hn)
Atlas Trust Infrastructure is the public-facing trust model and documentation surface for Atlas: a metadata-first trust control plane for authorized security workflows, evidence retention, release trust, and busi…
I have a "second brain" filesystem as markdown files that I have been maintaining for months that started out in Claude Code as the interface + file read/write layer... This system just stores a collection of personal todo items, long/shor…
What's new in CC 2.1.124 (+166 tokens) and CC 2.1.126 (-87 tokens) (www.reddit.com)
NEW: System Reminder: File modification detected (budget exceeded) — Tells the agent when a user or linter changed a file but the diff was omitted because other modified files already exceeded the snippet budget, and directs it to read the…
-
31 items
model roundup
GPT 5.4
OpenAI has released GPT-5.4-Cyber for testing and claims it will compete with Claude Mythos. Meanwhile, GPT-5.4 Pro has solved Erdős Problem #1196, showcasing its advanced capabilities in mathematics.
- 25m Amp's GPT 5.5 Model Analysis
- 12h GPT 5.4 showing as 5.5?
- 19h Running 7 autonomous AI agents for 14 days. Here's what actually happens when they need to find customers.
- 1d Local LLM Benchmark about Backend Generation by Function Calling (GLM vs Qwen vs DeepSeek)
- 1d LLM proxy that lets Claude Code talk to any model
168 items
event
Cowork
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
- 40m Using local model for Claude Desktop results in conversation name always be "Untitled"
- 3h Claude Managed Agents - Is it Ready?
- 9h [Bug] Cowork mode stuck in infinite loading when sending a message — anyone else?
- 10h Managing product requirements using a custom Live Artifact
- 10h Claude consistently over-delivers
Show HN: A tiny C program where an LLM rewires its DAG while running (github.com via hn)
liteflow: A ~1000-line C program that runs YAML-defined DAGs, where an LLM can edit the graph mid-run. When a task fails, a planner LLM gets the stderr and emits one of four verbs: RETRY, PATCH, INSERT_BEFORE, ABORT.
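As a rough illustration of the four-verb idea (not liteflow's actual code, which is written in C), here is a minimal Python sketch of how a runner might apply a planner decision to a failed task. The `Task`, `Dag`, and `apply_decision` names, the linear task list, and the exact meaning given to each verb are assumptions made for this example only.

```python
from dataclasses import dataclass, field

# The four verbs named in the project description.
VERBS = {"RETRY", "PATCH", "INSERT_BEFORE", "ABORT"}


@dataclass
class Task:
    name: str
    cmd: str


@dataclass
class Dag:
    # Hypothetical simplification: tasks are kept in a linear run order.
    tasks: list = field(default_factory=list)


def apply_decision(dag, index, decision):
    """Apply one planner decision to the failed task at `index`.

    Returns the index of the task to run next, or None to stop the run.
    The semantics below are assumed, not taken from liteflow itself.
    """
    verb = decision["verb"]
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if verb == "RETRY":
        # Run the same task again, unchanged.
        return index
    if verb == "PATCH":
        # Replace the failed task's command, then run it again.
        dag.tasks[index].cmd = decision["cmd"]
        return index
    if verb == "INSERT_BEFORE":
        # Add a new task ahead of the failed one; it now sits at `index`.
        dag.tasks.insert(index, Task(decision["name"], decision["cmd"]))
        return index
    # ABORT: give up on the run.
    return None
```

Restricting the planner to a small closed set of verbs means its output can be validated before anything touches the graph; an unrecognized verb is rejected rather than interpreted.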
Get ChatGPT to create a game based on your interests (www.reddit.com)
If you’ve got memory turned on, here’s a fun prompt: “Imagine the ultimate video game. Tailored specially to me and my interests.
Ctx – Persistent Memory for Claude Code, Cursor, and AI Coding Tools (github.com via hn)
ctx (Context): ctx is a system, not a prompt. A lightweight, file-based system that enables AI coding assistants to persist, structure, and rehydrate project context across sessions.
For quite a while, I've enjoyed having a Claude panel and a Codex panel in my Cursor application. It was practical that I didn't need to use three applications at once, but had everything in one: Cursor.
-
118 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
291 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 1h Mtplx – 2.24x faster TPS – The native MTP inference engine for Apple Silicon
- 2h How do you estimate total memory usage?
- 6h Best Llama Config for Turboquant_Plus? (Stats below)
- 14h Sglang is better for serving a model for a personal agent harness?
- 15h Deep research + report "a la McKinsey" with Hermes Agent and qwen3.6-35b-a3b Q6_K.
Why ChatGPT answers instead of saying "I don't know" (medium.com via hn)
I Forced ChatGPT Into Adversarial Tests: Here’s What It Actually Does Under Uncertainty | by Chris Russell | May, 2026 | Medium
Claude throwing shade at JavaScript 🤣 (www.reddit.com)
Claude and I are debating the stack for a new project, when ..... 🤣 I felt like I had to share this exchange after I read #3
A Mental Model for Agentic Work (basti.io via hn)
May 5, 2026 (AI Agents, Company Operations, Software Engineering). Something shifted in the first quarter of 2026. Not a feature launch, not a new product - a structural change in how work happens.
Most complex prompt (www.reddit.com)
It occurred to me that I'm (successfully) micromanaging Claude (code), but that it might be capable of doing complex long horizon tasks. What's the most complex thing you've done in a single (or tiny number of) prompts?
Is your codex also gotten slower in past few days or is it just me? (www.reddit.com)
Been using Codex for a few months now. I use it in VS Code.
I’m not playing a gotcha game here. AI is undeniably changing software engineering and I can’t think of a better AI use case than coding.
-
131 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 1h Lasso Security 2024: ~20% of LLM-suggested packages don't exist — and attackers now register the popular hallucinations with malware (slopsquatting)
- 7h LLM anomaly detectors are not a cause for concern despite Mythos
- 7h Using Claude-4.6-Sonnet and Opus 4.6 in a multi-agent "Code Review Swarm" (Visual Sandbox) - try in minutes!
- 10h What Opus 4.7 Tics/Tells have you noticed?
- 10h Five Eyes agencies issue first coordinated agentic AI security guidance
132 items
model roundup
Qwen 3.5
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and an efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
- 2h vLLM Just Merged TurboQuant Fix for Qwen 3.5+
- 9h A simple "hack" to speed up prompt processing for Qwen 3.5/3.6 in LM Studio
- 10h APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier
- 22h Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB
- 2d Updated: RTX6k (Server, 450w) Qwen3.5-122B-A10B (MXFP4_MOE) Benchmarks (llama.cpp)
I've been on Max for two months and I finally sat down and tracked where my tokens actually go. breakdown of a typical day: - ~40% file reads, git status, project context scanning: stuff that doesn't need opus at all - ~25% test generation…
Show HN: Kanban-CLI – a web UI for local Markdown todo lists (github.com via hn)
As we all are, I've been experimenting with ways to reduce external SaaS spend, and continually bring traditionally external pieces of context (PRs, docs, Trello boards) into one monorepo. I have toyed with a markdown todo list and se…
AgentShield – spending firewall for AI agents (github.com via hn)
AgentShield: A spending firewall for autonomous AI agents. Before an agent executes a payment, it submits a spend intent to AgentShield.
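To make the "spend intent" flow concrete, here is a minimal Python sketch of the general pattern: the agent describes a payment before making it, and a policy check approves or denies it. This is not AgentShield's API; `SpendIntent`, `SpendFirewall`, `review`, and both limits are hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendIntent:
    # Hypothetical fields; the real project's intent schema may differ.
    agent_id: str
    amount_cents: int
    merchant: str


class SpendFirewall:
    """Approve or deny intents against a per-transaction cap and a total budget."""

    def __init__(self, per_tx_limit_cents, budget_cents):
        self.per_tx_limit_cents = per_tx_limit_cents
        self.budget_cents = budget_cents
        self._spent = {}  # agent_id -> cents approved so far

    def review(self, intent):
        """Return True and record the spend if the intent passes every rule."""
        if intent.amount_cents <= 0:
            return False
        if intent.amount_cents > self.per_tx_limit_cents:
            return False
        spent = self._spent.get(intent.agent_id, 0)
        if spent + intent.amount_cents > self.budget_cents:
            return False
        self._spent[intent.agent_id] = spent + intent.amount_cents
        return True
```

In this sketch the agent would call `review` first and only execute the payment when it returns True, so denied intents never reach a payment provider.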
Hey everyone, thinking about upgrading to Claude Max pretty soon and before I pull the trigger I wanted to ask if anyone has good full guides or tutorials on actually getting the most out of it. Not just "here's what the plan includes" typ…
Stripped an AI agent down to a bash loop – No Framework (github.com via hn)
Seed: An autonomous AI agent that builds other autonomous AI agents, running 24/7 on a $25 Raspberry Pi Zero 2W. No API keys.
I built a tool that lets you publish your Claude Design artifacts to a real website directly from chat. I built this because chats in claude.ai already have everything they need to make a full stack web app: code execution, file creation,…