1. paywalled

  2. My main coding agents are CodeX-CLI and OpenCode (Harness seems to have some problems). I also use CodeWhale, Antigravity-CLI and OpenClaude as supplements (because of network issues, I don't really dare to use Claude Code).

  3. I’ve been noticing this disturbing trend for quite a while. By high karma I mean well over 1000 karma.

  4. With the end of Moore's Law, optimizing code for performance has become paramount for meeting ever-increasing compute demands, particularly in hyperscale data centers where even small efficiency gains translate to significant resource and…

  5. Most discussions about agent memory focus on what to store and how to represent it. But the problem I keep running into is different: knowing when a past memory is actually relevant to bring up.

  6. The first thing we needed was a way to identify which tests were flaky and how often they failed. Luckily, the team had already built a dashboard on top of Datadog's CI Visibility feature that gives us a clear picture of the flakiest tests…

  7. I've been building with agents lately, and I've started wondering whether we're missing some of the engineering foundations that made traditional software manageable at scale. In traditional software, state is straightforward.

  8. When resuming a large but old session, you are presented with the choice to "Resume from Summary (Recommended)". But I couldn't find any info on what it costs in session usage.

  9. model roundup

    Sonnet 4.6
    14 items

    Several updates and comparisons revolved around Sonnet 4.6, including its performance in dashboard analytics alongside Opus 4.8, and its role in processing critical requirements for a benchmark test with Gemma 4 31B QAT.

    model roundup

    Qwen 3.6
    49 items

    Qwen/Qwen3.6-35B-A3B is a post-trained causal language model with 35 billion parameters, offering improvements in agentic coding and reasoning context retention. Community benchmarks show it performs well on an RTX 4060 laptop with speculative decoding, though some note worse vision capabilities compared to Gemma4.

  10. Does anyone manage to have Claude browse Reddit in real time? I'm trying to set up the MCP reddit-mcp-buddy but it keeps giving errors.

  11. I've been running Ollama locally for a while and the one thing I kept missing was voice. Every solution I found either sent audio to the cloud, needed a GPU, or was locked to macOS.

  12. AI agent runs amok in Fedora and elsewhere [LWN subscriber-only content]. Agentic AI systems can be used to do a variety of things autonomously on behalf of a human user: open or manage bugs, generate code, submit pull-requests, and (appare…

  13. A few months ago, my computer crashed while I had 12 Claude Code sessions running in iTerm. When it didn't restore, I was pissed.

  14. This week's Visa + ChatGPT payments headline got a lot of people focused on the wrong part of the story. The interesting shift is not that an agent can buy something now. It's that we're moving from AI as assistant to AI as operator. Once an…

  15. Anthropic may designate certain models as “Covered Models” when they cross capability thresholds that warrant additional safeguards or other treatment. This page lists the models currently designated as Covered Models and describes the dat…

  16. Help! I'm trying to build a new website for work, and have it 80% there and looking really good.

  17. 🧠 GNOM-HUB: The local-first multi-agent forge that compiles AI swarms into immutable products. 8 Agents.

  18. model roundup

    Opus 4.6
    21 items

    On April 25, 2026, a Cursor agent running Claude Opus 4.6 accidentally deleted PocketOS's production database within nine seconds due to a credential mismatch during a routine task. Meanwhile, OpenHack released an open-source security scanner competing with proprietary models like Claude Code Security.

    event

    Security
    354 items

    OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.

  19. We propose inverse rubric optimization (IRO): tasks where an agent must learn the preferences of a black-box judge under a label budget. IRO tasks induce rich agent behavior and smooth scaling, making them a useful testbed for agent scienc…

  20. I am preparing to release a software project under the AGPLv3. The goal is traditional copyleft reciprocity - if you use it or host it, share your changes.

  21. Ask the SpaceX IPO filing like an analyst. Grounded across 84 indexed sources, including prospectus summaries, risk factors, MD&A, launch vehicle pages, Starlink materials, xAI/X references, charts, and image exhibits.

  22. Hi, I was wondering what the best practice is for designing eval tests for agents. Ideally I'd like to have a comprehensive set of unit tests that run simple prompts and analyse the results.

  23. Rohan Kumar on using social media for fun and profit. By Aadil Pickle, Jun 2026. Photos by Nick Dybel. Rohan’s job title is “Vice President of Content Strategy” at Night Media. In the streets, though, they call him “the Rick Rubin of brainrot”.

  24. How AI Agents Reshape Knowledge Work: Computer raises task autonomy, lowers cost, and widens the scope of work users take on. Frontier AI systems are closing the gap between model intelligence and real-world utility.

  25. In just 4 days, we have built https://appetals.com/ using our own /sutra, the Product Lifecycle Management Agent we have built for Claude Code, and it's not just a static site. It has: Astro - frontend Payload CMS - Headless content manage…

  26. I want to say a final thing about my Fable first reaction: I dedicated my life to programming and I'll use every innovation in the field, also to extract value and bring it to the local inference world, to Redis, and so forth. But:

  27. Wall #003 · walls.sh Know your headroom. A free macOS menu bar app that shows your Claude Code usage as a live % — the 5-hour session and the 7-day week — color-coded before a limit stops you mid-task.

  28. I kept running into the same problem building agent features: you want to give a model real tools — bash, file edits, grep — but you don't want it anywhere near your host, and spinning up a container/VM per session is heavy, slow, and a pa…