Meet Qwen3.6-35B-A3B:Now Open-Source!🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
#agentic
1326 items
Qwen3.6-35B-A3B released! (www.reddit.com) Qwen3.6-27B released! (www.reddit.com) Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight.
Google introduces TPU 8t and TPU 8i (www.reddit.com) The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale. https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eight…
So, this week claude wiped agentic AI startups with a new update. Also, as they have mythos now, they will ship things very fast without any trouble (www.reddit.com) Honestly, they are a full pack now. A few hours ago, they released Claude managed agents which lets you build long-running, autonomous agentic systems plus with their new suite of apis, engineering teams can harness Claude's exponential po…
Our eighth generation TPUs: two chips for the agentic era (blog.google via hn) https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...
Unpopular opinion: OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc. Nor used any workflow tool like 8n8 or make. (www.reddit.com) Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All (qwen.ai via hn) Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.
‘Addictive’ agentic coding has developers losing sleep (www.reddit.com) The good, bad, and ugly of coding with agents here: https://leaddev.com/ai/addictive-agentic-coding-has-developers-losing-sleep “I’m coding into later hours of the day not because I’m told to do so, but because I can’t get myself to get up…
I read threads complaining about claude every week... tf are y'alls workflows? (www.reddit.com) For context: I'm a software eng @ a fortune 500/FAANG tier company. We use AI.
mistralai/Mistral-Medium-3.5-128B · Hugging Face (huggingface.co via reddit) https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF Mistral Medium 3.5 128B Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and…
Opus 4.7 destroys all trust in a mature instruction set built iteratively throughout product development (www.reddit.com) Earlier generations showed iterative improvement as the instruction set was matured around agentic limitations. We've immediately regressed back to square one with Opus 4.7, and the model is not afraid to admit to it.
So... has anyone actually figured out whose model Elephant Alpha is yet? (www.reddit.com) 2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints (www.reddit.com) WARNING: wait before download from HF: I just realised my upload of the new versions with the additional fix in the chat template has not completed yet. I will remove this warning once done The recent PR to llama.cpp bring MTP support to Q…
Google ramps up agentic AI efforts amid pressure from Anthropic (www.reddit.com) Read through Anthropic's 2026 agentic coding report, a few numbers that stuck with me (www.reddit.com) Anthropic put out an 18-page report on agentic coding trends. Skimmed it expecting the usual hype but a few things actually caught me off guard The biggest one: devs use AI in ~60% of work but only fully delegate 0-20% of tasks.
Caught the massive OpenAI Codex model leak on video before it was patched! (GPT-5.5, Arcanine, Glacier-alpha) (www.reddit.com) Hey everyone, I opened up Codex today and was greeted by this massive list of unreleased and internal models. I managed to get a screen recording of the dropdown right before OpenAI seemingly realized the mistake and patched it out.
Claude Opus 4.8 (www.anthropic.com via hn) Our latest model, Claude Opus 4.8, is an upgrade to our Opus class of models, with stronger performance across coding, agentic tasks, and professional work, and the consistency to handle long-running work.
ExLlamaV3 Major Updates! (www.reddit.com) Turboderp has a been on an absolute tear recently, in the endless battle to cram new llamas into smaller, faster boxes. We started off last month with the release of gemma 4 support, and continued with improved caching efficiency.
Multi-Agentic Software Development Is a Distributed Systems Problem (kirancodes.me via hn) Multi-agentic Software Development is a Distributed Systems Problem (AGI can't save you from it) Recently, I've been thinking a lot about scaffolding and languages for managing systems of LLMs coordinating with each other — new programming…
Qwen 3.6 35B crushes Gemma 4 26B on my tests (www.reddit.com) I have a personal eval harness: A repo with around 30k lines of code that has 37 intentional issues for LLMs to debug and address through an agentic setup (I use OpenCode) A subset of the harness also has the LLM extract key information fr…
Qwen Introduced FlashQLA (www.reddit.com) Introducing FlashQLA: high-performance linear attention kernels built on TileLang. 2–3× forward speedup.
Anthropic just confirmed why 90% of non-coding AI agents fail in production (www.reddit.com) Anthropic recently published an incredibly deep breakdown analyzing millions of real human-agent tool calls across their public API, and they shared a breakdown of where these agents are being deployed. They said “Software engineering make…
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) (www.reddit.com) TL;DR Results from the title are for single inference with 2 prompt of 1k and 15k tokens. So no MTP (as it’s slower for big prompt), no DFlash (working too but slower for big prompt), no quant used (full precision wanted) and the results a…
Cloudflare's AI Platform: an inference layer designed for agents (blog.cloudflare.com via hn) AI models are changing quickly: the best model to use for agentic coding today might in three months be a completely different model from a different provider. On top of this, real-world use cases often require calling more than one model.
Aaaaand I cancelled my Cursor subscription (www.reddit.com) The timing is funny because I was thinking about this all week, and the SpaceX announcement was the final nail in the coffin. I switched to pi for agentic coding, and it’s sooo good.
Open-Source Agentic QA Harness with Memory (github.com via hn) Docs · Demo · Issues agent-qa Open-source Agentic QA Harness with Memory Write tests in natural language. agent-qa runs them across web and mobile with execution memory, catching regressions before release.
LiquidAI/LFM2.5-8B-A1B · Hugging Face (huggingface.co via reddit) looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment.
We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local (www.reddit.com) LDR maintainer here. Thanks to the strong support of r/LocalLLaMA community LDR got very far.
Tried claude code. Hate it. (www.reddit.com) Just posted this in r/ClaudeCode , thought I'd come to a different flavoured echo chamber and see what the cursor community makes of my experience. Note I've not upgraded to cursor v3 yet, and I don't know if I want to.
KVarN: Native vLLM KV-cache quantization back end by Huawei (github.com via hn) ⚡️ Built for agentic and long-context workloads. 💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent requests, with FP16-level accuracy.
Consider running a bigger quant if possible (www.reddit.com) Just a little reminder that *if* it is possible for you to run bigger quants, do it. I ran Qwen 3.6 IQ4_XS at 128k context was very much disappointed because it would loop, make formatting errors, implement wrong things etc.
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! (www.reddit.com) Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and now land above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%).
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents (arxiv.org via hn) We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability…
I tested 8 LLMs as tabletop GMs - a 27B model beat the 405B on narrative quality (www.reddit.com) Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB (www.reddit.com) ----START HUMAN TEXT---- Hi all, I've seen a bunch of posts about squeezing 27B onto a 24GB card and all the quantization tricks involved in doing so. It's all amazing work, but at the end of the day a quantized model with quantized KV wil…
Grok 4.3 achieves higher overall intelligence over 4.20 with less of a cost, at the price of slightly higher hallucination rate. (x.com via reddit) xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20 The release of Grok 4.3 places just above Muse Spar…
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and Claude Code!) (www.reddit.com) Just found out about this and had to share because almost nobody is talking about it yet. If you are tired of paying for AI courses or getting hit with paywalls just to get a certificate, Anthropic (the creators of Claude) quietly dropped…
Affirm Retooled for Agentic Software Development in One Week (medium.com via hn) medium.com Performing security verification This website uses a security service to protect against malicious bots. This page is displayed while the website verifies you are not a bot.
HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written] (www.reddit.com) This post will have a slight old-man-shakes-fist-at-sky vibe, because….well… I’m older, so if you’re not into that, then please feel free skip it. I have been contributing to this sub for like 3 years now but I’m fearful this post will lik…
The joy and pain of training an LLM from scratch (www.reddit.com) mii-llm just released a detailed technical report on the development of the Zagreus and Nesso model families: a set of 0.4B parameter language models trained from scratch with a focus on edge deployment, multilingual capability, and Europe…
Lessons for Agentic Coding: What should we do when code is cheap? (www.dbreunig.com via hn) 10 Lessons for Agentic Coding What should we do when code is cheap? Lately, this blog has featured a lot of writing about agentic coding.
AA introduces Coding Agent Index - Performance Comparisons between Model & Harness Combinations (www.reddit.com) The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use: ➤ SWE-Bench-Pro-Hard-AA, 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI’s S…
Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff (www.reddit.com) Models compared: Qwen3.5-27B-UD-Q5_K_XL gemma-4-31B-it-UD-Q5_K_XL Main flags for boths --flash-attn on \ --n-gpu-layers 99 \ --no-mmap \ -c 150000 \ --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \ --ctx-checkpoints 1 \ --jinja \ -np 1 \ --re…
What's your favorite local MCP server? (www.reddit.com) I've seen so many rag this, memory that projects. What projects are people actually using day to day for agentic workloads.
Needle: We Distilled Gemini Tool Calling Into a 26M Model (www.reddit.com) We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
↯ Tool Use↯ Function Callingfunction-callingtool-usegemini+1
Show HN: Agentic interface for mainframes and COBOL (www.hypercubic.ai via hn) Hi HN, we’re Sai and Aayush, and we’re building Hypercubic (https://www.hypercubic.ai/), bringing AI tools to the mainframe and COBOL world. (We did a Launch HN last year: https://news.ycombinator.com/item?id=45877517.) Today we’re launchi…
Cursor autocomplete is (still) way ahead of its peers! (www.reddit.com) I switched back to Cursor this week after using antigravity + claude code for almost 6 months and I had almost forgotten how good cursor autocomplete is. I am still someone who likes to make manual edits, write markdown docs myself and not…
Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into (www.reddit.com) A disciplined Cursor 3.0 Agentic workflow for complex backend/system design tasks (www.reddit.com) I think I’ve finally settled on a Cursor workflow that actually makes sense for me in terms of cost, quality, and control. Posting this because the whole model/usage story is confusing as hell, and this is the first setup that’s felt stabl…
A week after elephant, Ant dropped Ling-2.6-1T on OpenRouter for free. How high is the ceiling for Chinese model labs now? (www.reddit.com) What stood out to me isn’t just the model itself, but how quickly they shipped another one after Ling-2.6-Flash. Ling-2.6-1T seems to be positioned more around stronger agentic ability than a totally different direction.
obsidian + claude is the perfect local memory stack whats the web-based equivalent? (www.reddit.com) been seeing a lot of people hook up claude code directly to a local obsidian vault lately. for a personal workflows, it’s honestly really really good.
GPT-5.5 is lowkey blowing my mind (www.reddit.com) Just spent the whole morning testing GPT-5.5 in ChatGPT and the jump in agentic reasoning and complex task handling is ridiculous.It plans multi-step workflows, uses tools properly, checks its own work, and actually gets stuff done instead…
Stanford/Princeton AI4S unveils LabOS² -the agentic AI system that spanned from dry-lab planning to wet-lab execution, using physical AI to assist scientists - now is capable of performing fully autonomous cell culture workflows. (www.reddit.com) Introducing LabOS². An early look at autonomous cell culture, as a long-horizon physical AI workflow for biomed.
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering (arxiv.org via hn) LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption r…
Launch HN: Hyper (YC P26) – Company brain to power agentic development (news.ycombinator.com) Hey HN, we’re Shalin & Kanyes, best friends who've been hacking together for 10+yrs, and now founders of Hyper (https://heyhyper.ai/). Hyper is a shared “company brain” that plugs into information flowing inside a company to make AI agents…
Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B (www.reddit.com) I wanted to know how much of a coding agent's performance came from the model and how much came from the harness, so I vibed a setup to allow me to test multiple agentic harnesses/model combinations on the same task. ALl the images above a…
AI agents dont just help banks they can now BE your bank (www.reddit.com) Seeing alot of posts here about AI agents built for financial institutions but I think the bigger shift is AI agents doing the banking for you not for the bank. I run a small dev shop and saw a blog about opening a bank account with AI thr…
Show HN: YourMemory, agentic memory is a pruning problem, not a hoarding problem (yourmemoryai.vercel.app via hn) This is a project that I have been building for a while now, YourMemory is a solution to agentic memory which focuses on pruning of noise rather than hoarding of data. In the current state of agentic memory most of the context is stored in…
Folks running qwen 3.6 27b for agentic work. Do you dare to use q4_k_m? (www.reddit.com) I dont have good experience running q4_k_m, the difference to q6 is "a few errors an hour" to " a few errors every couple of days". Edit: How it fails?
Turning local agents into self-optimizing agents (www.reddit.com) I was experimenting with a self-optimizing agentic pipeline to climb the benchmark leaderboard (TerminalBench). On a 10-task subset, I got the performance to rise from ~30% → ~90%.
Launch HN: Chert (YC P26) – Twilio for iMessage (www.trychert.com via hn) Hey HN! We’re Gary and Ian, and we’re building Chert (https://www.trychert.com/), an API for businesses to send, receive, and automate iMessage conversations at scale.
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA (www.reddit.com) I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 q…
GPT vs Claude in a bomberman-style 1v1 game (www.reddit.com) A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.
My LinkedIn network is about to be aggressively flooded with Claude Code certifications (www.reddit.com) Anthropic dropping 13 completely free official courses with certificates is an absolute godsend for the community. But let’s be real: half of us are going to power-speed through the developer modules, download the PDF, and immediately upda…
The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b (www.reddit.com) One way I like to test new models, is by one-shoting (with a good prompt) a single webpage clone of the classic arcade game pacman. I usually do 3 attempts and keep the best one.
Poolside Laguna XS.2 (www.reddit.com) 33B A3B MoE, Apache 2 licensed. Reported agentic results put it about level with Qwen 3.5 35B A3B, behind the 3.6 version.
Claude is getting worse, according to Claude (www.theregister.com via hn) Claude is getting worse, according to Claude Brief outage follows growing number of quality complaints Once the AI darling of programmers everywhere, Anthropic's Claude has been stumbling mightily, both in terms of cost and perceived quali…
Show HN: A Local-First Agentic Knowledge Manager (github.com via hn) Kept Kept saves your AI conversations as local Markdown files, then gives you a desktop app to search, browse, connect, and reuse them. It works with ChatGPT, Claude, Gemini, Grok, and Kimi.
Kv cache quantization: ignorance, or malice? (www.reddit.com) I run Qwen-3.6 27B FP8 on vllm for long-horizon agentic coding harness workloads with high context window and concurrent sub-agents. On two 3090s that aren’t used for anything else, it seems reasonable to expect a good balance between spee…
Build collaboratively as a group using single claude code session via Meetings (www.reddit.com) I recently came across a agentic skill which lets claude code join meetings and got access as a early user from a product hunt group and I would like to share my experience on using it. The skill lets you join google meet, teams or zoom.
Agentic harness for theoretical physics research (www.reddit.com) Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains and today we're releasing physics-intern to tackle research-level problems in theoretical physics. It's a multi-agent framework which we designed to m…
Five Eyes agencies issue first coordinated agentic AI security guidance (www.reddit.com) Five Eyes agencies just issued the first coordinated multi-nation security ruling on agentic AI. CISA, NCSC, and their Australian, Canadian, and New Zealand counterparts co-published guidance telling organizations to prioritize resilience…
governance wall in agentic workflows. why are we stuck past rag? (www.reddit.com) keep seeing the same pattern across agent projects. we're good at building agents that find information, but the moment we ask them to actually do something (update a crm, trigger a payment, touch a production database), things grind to a…
Show HN: A CLI that writes its own integration code (docs.superglue.cloud via hn) We run superglue, an OSS agentic integration platform. Last week I talked to a founder of another YC startup.
Why 80% of agentic AI demos don't make it to production (www.reddit.com) Agent demos are easy. Production agents are hard.
Agents Aren't Coworkers, Embed Them in Your Software (www.feldera.com via hn) Agentic management software is all the hype today: What started with Moltbot and OpenClaw now has a lot of competition: ZeroClaw, Hermes, AutoGPT etc. These systems work well and allow you to train and build generic agent loops that are ge…
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face (huggingface.co via reddit) Qwopus3.5-9B-coder is specially optimized and fine-tuned for high-performance 🤖 Agentic Coding, complex Tool Calling, and logical reasoning. 💡 Why the 9B Dense Model?
ChatGPT 5.5 x Blender (youtu.be via reddit) I tested the new ChatGPT 5.5 with Blender, and it was surprisingly capable. It created 3D scenes, fixed modelling issues, searched for missing resources, and improved the scene step by step.
Doing real coding work locally for the first time (www.reddit.com) Why is agentic AI so expensive? (www.reddit.com) Qwen3.6 agent + Cisco switch: local NetOps AI actually works! (www.reddit.com) Claude Agent can potentially replace feeds (www.reddit.com) I’ve been experimenting with how information consumption changes in an agentic internet, and this setup has been surprisingly powerful. Instead of scrolling feeds or relying on algorithms, I set up agents that roam the web based on my pref…
anyone else stuck at their desk during long agentic runs? (www.reddit.com) so I've been running some complex agentic refactors and these sessions go 6+ hours because the agent is grinding through a massive legacy codebase, and I can't really walk away. close the laptop and the process dies. re-initializing takes…
Agentic AI frameworks (www.reddit.com) Hi, so I have grasped a lot of theory about building agentic systems but I want to apply it am get my hands dirty. Which framework should I start with as an individual learner, since there are a lot of them I am kinda confused.
Is Qwen3.6 current king for local agentic use? (www.reddit.com) I've been testing other models but it seems like nothing even come close to Qwen3.6 35B A3B for agentic use. The worse I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash…
Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team (www.runtm.com via hn) Hey HN, We're Gus and Carlos from Runtime (https://runtm.com). We're building infra that lets your whole team (including non-engineers) ship with Claude Code, Codex, and other agents without engineering having to handhold every session.
Simpler self hosted alt to Open WebUI (www.reddit.com) Got Qwen3.6 27B running on my newly assembled 4x 3090 rig (s/o 3090-club) and I'm trying to get the people in my house to adopt the local workflow. Open WebUI has improved a lot in the recent updates, but I still found it pretty rough for…
Gemini api showing agentic gemini models (www.reddit.com) could not extract summary
(Rant ;)) Make your benchmarks realistic (www.reddit.com) Everybody here is posting their optimizations for running different models - thats good but make these benchmark realistic as speed is not one factor to run llm effectively. Context size is key - with agentic/coding/rag work you need to ha…
Tendril – a self-extending agent that builds and registers its own tools (github.com via hn) Tendril A self-extending agentic sandbox that demonstrates the Agent Capability pattern — where the model discovers, builds, and reuses tools autonomously across sessions. Built with AWS Strands Agents SDK and Tauri.
Llama.cpp parameters for Qwen 3.6 with RTX 3090 (www.reddit.com) Hi, I'm trying to run Qwen 3.6-35B on my RTX 3090 (24 GB of VRAM) but I'm not sure about 2 thing: - Which variant of the model to use ? (Q4_K_S, Q3_K_XL, other ?
Is agentic commerce an opportunity or a chaos? (www.reddit.com) I have been watching agentic commerce closely and it is interesting. AI agents are picking products for people now, and it's wild.
2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me (www.reddit.com) Hello, I've been on a quest to get something "close enough" of Opus 4.5 running locally, for agentic coding, as SWE with 15 years of experience. I tried with one spark (yeah I'm calling my Asus Ascent GX10 sparks - they're the same), with…
High-stakes game of musical chairs! (www.reddit.com) I made this image (with nanobanana 2.0) to illustrate what I think is happening in the current AI race. Right now, there's a heavy decline in quality and access to AI tools.
100 Tips & Tricks for Building Your Own Personal AI Agent /LONG POST/ (www.reddit.com) Everything I learned the hard way — 6 weeks, no sleep :), two environments, one agent that actually works. The Story I spent six weeks building a personal AI agent from scratch — not a chatbot wrapper, but a persistent assistant that manag…
how do you guys handle the conversation with skeptical clients when selling agents? (www.reddit.com) struggling with a bit of a reality check lately and wanted to see if anyone else is running into this. been pitching agentic workflows for a while, and I've realized that leading with the tech - the orchestration the RAG, the "intelligence…
The power of structured workflows and small local models (www.reddit.com) A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was: https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_…
Which industries are adopting Agentic AI the fastest right now? (www.reddit.com) Feels like every week there’s a new “AI agent” startup or enterprise rollout. Curious which industries are actually adopting Agentic AI the fastest in real-world workflows, customer support, finance, healthcare, dev tools, operations, etc.?
As of today, what's the *most stable* model to run on a 32Gb RAM Mac w/ 256k context? (www.reddit.com) Hey everyone, I've been playing around with Gemma4 and Qwen3.6 on my 32Gb Macbook Pro M2 Max since their release but I'm struggling at finding: The best software to run it (oMLX, llama.cpp, ...) The best model + quant to pick The best sett…
I created an agentic orchestration pipeline for music video generation (www.reddit.com) I’ve been building Uisato Studio, a workflow-based AI creation platform for audiovisual work. This is the Music Video mode: upload an image + audio, and the system analyzes the input, generates visual direction, creates clips, handles b-ro…
why llama.cpp can’t combine speculative decode methods? (www.reddit.com) dicking around with the new mtp speculative decode with qwen3.6 27b, and it’s great. but for agentic coding i’ve seen significant improvements from ngram, because a decent fraction of the time (e.g.
Watching the agent-tooling space dominate GitHub trending right now. Sharing the Github tracker we built and use internally, in case it's useful (www.reddit.com) Something interesting happening on GitHub trending: Agentic infrastructure repos are growing faster than anything else right now. Today's top three by 24h growth: obra/superpowers: +2.9k stars (agentic skills framework, methodology for sof…
Don't ask Qwen 3.6 35b to give you aski image of Yoshi :) (www.reddit.com) https://preview.redd.it/dfqed57qgsvg1.png?width=1706&format=png&auto=webp&s=3859209698d2e844e2731326e355d60928658f8a The most fun part was reasoning, here is a gist: https://gist.github.com/anzax/5f06716c66180013cd715f6c2e5848df There is a…
Show HN: OpenHack – OSS security scanner, 40x cheaper, on par with Opus 4.6 (github.com via hn) ⏚ OpenHack Open Source Agentic Security Scanner & Verifier for your codebase. Like Claude Code Security / Codex Security but open source and exclusively uses open source models.
Show HN: Local Coding Agent with LLMs to Delegate Tool Calls to Small AI Models (github.com via hn) Open Agent Tools Coder Open Agent Tools (oats) enables small-to-large self-hosted ai models to use local source code when running tool-calling agentic workloads. We actively data mine 20,970+ (2+ TB) popular github repos using large and sm…
Removing Vision from model (www.reddit.com) I removed mmproj file from models to remove vision and save my vram. But just curious, is this really don't affect its text ability?
Show HN: Headless Cloud Security – Headless SaaS has come to security (www.sysdig.com via hn) The cloud security company I work for, Sysdig, launched “Headless Cloud Security” last week. The short version: as attacks get faster and more automated, security tooling is going to need to evolve beyond dashboards and humans clicking thr…
Lovable is the first coding agent platform to adopt AIUC-1 (SoC-2 for AI Agents) (www.aiuc-1.com via hn) More than half of all LLM tokens now go to writing code - and coding agent adoption is growing rapidly across the enterprise. In this whitepaper, co-authored with Lovable, we show how AIUC-1 addresses the unique risks of agentic development
CopilotKit raises $27M to build the Agentic FrontEnd Stack (techcrunch.com via hn) Many companies today provide AI simply as a chatbot inside their apps: You type in (or dictate) what you want it to do, and the AI bot goes and tries to do it. Still, the experience tends to feel clunky.
Does the "6 months gap" still hold? (www.reddit.com) Hi. It is quite a consensus that the "jump" in quality of agentic development happened sometime in December 2025, transforming from "nice to have", to actually performing.
Roo code shuts down, Team will focus on roomote agent (twitter.com via hn) When we started Roo Code in late 2024 by forking Cline and adding what's now widely known as dangerously-skip-permissions, agentic coding was rough and experimental. But Roo Code took off fast: 3 million installs, a passionate community, r…
How to share agentic workflows, instructions, skills, across team members, teams, organizations (www.reddit.com) I work for a fairly large company (1000 devs). My team has 6 members.
Shared Dictionaries: compression that keeps up with the agentic web (blog.cloudflare.com via hn) Today, we’re excited to give you a sneak peek of our support for shared compression dictionaries, show you how it improves page load times, and reveal when you’ll be able to try the beta yourself.
Agentic Search Models with OpenSearch and Elasticsearch (bonsai.io via hn) Tuning search is tricky, and the tools of yesterday are good but require lots of effort and data to get right. In this post I'm going to introduce purpose-built agentic LLMs for searching and reranking, which are an easy drop-in solution f…
Nemotron 3 Ultra: Open Moe Hybrid Mamba-Transformer for Agentic Reasoning [pdf] (research.nvidia.com via hn) 2026-6-4 Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning NVIDIA Abstract. We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybri…
Robinhood launches credit card for AI agents with 3% cash back (fortune.com via reddit) In the latest sign of AI’s growing footprint in online commerce, Robinhood announced on Wednesday that users can now instruct agents to make purchases on their behalf using the Robinhood Gold card. To illustrate the potential of agentic sh…
how do you scale infrastructure for ai agents on a budget? (www.reddit.com) we're running an agentic pipeline that does multi-modal file processing - large files, often hundreds of mb per request. The actual agent logic works fine.
agents have a high false-positive rate? how to handle? (www.reddit.com) been digging into agentic workflows for specialized image processing and high-stakes data triage, and honestly have problems with trust. you've probably seen the pattern.
I've created the fastest local AI engine for Apple Silicon. Optimised for agentic use. (www.reddit.com) https://preview.redd.it/p0rqofxvrtzg1.png?width=1460&format=png&auto=webp&s=8ce5b18b4ddaad9b71f71fd8eb623839fc9c6c8b For weeks I've been working on creating the fastest local AI engine for Apple Silicon... And I finally did!
ATS vs. multi-agent. where does sensible automation end and over-engineering begin? (www.reddit.com) the traditional ATS is predictable and cheap to run. it's a known quantity.
how are you handling sync in multi-agent sales loops? (www.reddit.com) been creating a multi-agent setup for b2b outreach (linkedIn + email) and the moment I swap a human-managed inbox for an agentic one, "fast" usually ends up meaning a 24-hour batch cycle. fine for some use cases, but I actually want instan…
Ive automated my email/sms/phone (www.reddit.com) we got it good boys! how many of you are doing this??
Don't share your opinion, if you didn't test it !!! (www.reddit.com) I see many people giving their opinion based on what they previously saw or based on others and making their own opinion. Even though they don't test models thoroughly, they still give their option which is so frustrating.
Show HN: Bonsai 1.7B ternary model at 442T/s on M4 Max (agents2agents.ai via hn) We took a recently released Bonsai 1.7B ternary model from PrismML (https://github.com/PrismML-Eng/Bonsai-demo) and ran our agentic evolution search on it for 6 hours to optimize the Metal kernels. The search was fully autonomous.
what are the biggest risks of agentic AI in supply chain production? (www.reddit.com) we've been testing agentic AI for inventory replenishment and exception handling. the goal was to get past simple "if-then" rules and have agents actually weigh trade-offs, like margin vs.
Hiring: GTM Engineer at Lovable.dev 🚀 (www.reddit.com) Lovable ($400m ARR, 200k projects built per day) opened our first US hub in Boston, and we're looking for a highly skilled GTM Engineer to be the founding technical member of our enterprise GTM function there. You'll build scalable agents,…
Are there any agentic coding harnesses that AREN'T built on JS and Node? (www.reddit.com) With how often we hear about supply-chain attacks on npm I am hesitant to install any apps that use it, let alone something like an agent harness that will run constantly unsupervised.
Show HN: gcx – The Official Grafana Cloud CLI (github.com via hn) Hi HN, We’re excited to share gcx, a new CLI we’ve been building for Grafana Cloud. With the rise of agentic coding tools like Claude Code and Codex we're building faster than ever, but these agents are often blind to what’s actually happe…
I create the awesome list for how to train a LLM Agent (www.reddit.com) AI governance isn't failing because we lack regulation i mean like it's failing at execution (www.reddit.com) There's a lot of movement around AI regulation right now (EU AI Act, US frameworks, etc.), but in practice many of these governance models don't survive contact with real, agentic systems. I've been digging into why compliance frameworks t…
Claude Code – Disabling telemetry also disables 1-hour prompt cache TTL (github.com via hn) Claude Code [![npm]](https://www.npmjs.com/package/@anthropic-ai/claude-code) [npm]: https://img.shields.io/npm/v/@anthropic-ai/claude-code.svg?style=flat-square Claude Code is an agentic coding tool that lives in your terminal, understand…
Training SID-1 to beat GPT-5 at search with 1k+ QPS RL (turbopuffer.com via hn) SID-1 is an agentic search model that is 24x faster than GPT-5.1-high, 374x cheaper than Sonnet 4.5, and achieves 1.9x higher recall than traditional RAG pipelines. Here's how we trained it using large-scale RL on turbopuffer.
Are LangGraph agents and other agent frameworks becoming obsolete? (www.reddit.com) Hi all, Over the last 2 years, I’ve built around 10-15 LangGraph agents for very specific tasks in our company. But lately, it feels like all that work isn’t really maintainable for a single AI/agent engineer.
Why GPU compilers are MORE important in the agentic era (scale-lang.com via hn) Part 2 of a series on why Spectral and SCALE exists. In Part 1, I argued that cross-vendor portability in accelerated computing must be delivered by a company, rather than a committee, because the implementation is the standard.
The architecture of "Agentic Twins": How Avatarinc is using OpenClaw to build verifiable Al agents (www.reddit.com) The architecture of "Agentic Twins": How Avatar.inc is using OpenClaw to build verifiable AI agents. There is a massive gap in the agent ecosystem right now: capability vs.
the saas vs. custom software debate in healthtech: why we built a custom agentic layer (www.reddit.com) been working with a tier-1 diagnostic imaging network that ran into a straightforward problem: scan volumes jumped 22%. the obvious answer is to license a saas tool.
How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM? (www.reddit.com) We'll be getting those features(check bottom link) on mainline soon or later anyway. But for now this fork could be useful to see the full potential of our poor GPUs(and also big, large GPUs).
Setting the standard for agentic development [pdf] (lorqvmwmiiherfjgxrkz.lovable.cloud via hn) AIUC -1 A L O V A B L E × A I U C - 1 W H I T E P A P E R Se tt ing the st anda rd f or age nt ic de vel opm ent How AIUC -1 evo lved to add ress the un iqu e challe nge s of age nt ic de vel opm ent — and why Lov able will be the fi rst c…
Spec-driven agentic coding is quietly making us worse at the job of supervising agents (www.reddit.com) Been running an agent-heavy workflow on a mid-size TypeScript monorepo for about six months. Orchestrator on top, sub-agents for codegen, a human (me, mostly) writing specs and reviewing diffs.
Show HN: Open-source 2D IDE for managing agent CLIs (49agents.com via hn) First Agentic IDE, Open-Source Every agent, terminal, and repo on one infinite canvas. See and control everything from any device.
We built an agentic runtime to make AI automations easier to set up and more reliable (www.reddit.com) Hey all, our small team just launched Friday Studio and we'd genuinely love any feedback you have. It's an AI runtime that turns prompts, skills, and tools into repeatable configurations that you can reliably run and share.
Lasso Security 2024: ~20% of LLM-suggested packages don't exist — and attackers now register the popular hallucinations with malware (slopsquatting) (www.reddit.com) Lasso Security ran a study in 2024 — they measured frontier models suggesting fake package names about a fifth of the time. The follow-up problem: attackers have started registering the most-commonly-hallucinated names with malicious code…
Show HN: Hollow is an open-sourced self-modifying agentic system (github.com via hn) ___ ___ __ | || |/ \| | | | / \ \ \ / / | __ | (_) | |__| || () \ \/\/ / ||||\/||\/ \/\/ This repo is three agents running on qwen3.5:9b on your machine, picking their own goals, writing and deploying their own tools, forming opinions abou…
Terminal Bench score for Mistral 3.5 Medium (www.reddit.com) So... there were a couple promising benchmark scores reported by mistralai in the model card for Mistral 3.5 Medium, BUT there wasn't the one that I usually care about the most, which is TerminalBench 2.0.
engineering teams celebrating agentic workflows that returned the same result two runs in a row (www.reddit.com) edit for credit: trash on X
Speculative decoding with Gemma-4-31B + Gemma-4-E2B enables 120 - 200 tok/s output speed for specific tasks (www.reddit.com) So for my project I was using up until now either Gemini 3 / 2.5 Flash or Flash-lite. All my use cases are not agentic, simply LLM workflows for atomic tasks like extracting references from the law, classifying, adjusting titles to nominat…
How do you actually know if Opus 4.7 is better for your specific agent use case? (www.reddit.com) Anthropic shipped Opus 4.7 yesterday. The headline numbers are real: 64.3% on SWE-bench Pro (up from 53.4%), best-in-class on MCP-Atlas at 77.3% for multi-tool orchestration, 14% improvement on multi-step agentic reasoning, and one-third f…
What do you use for autocomplete in 2026? (VS Code) (www.reddit.com) I tried co pilot and windsurf but they weren't satisfying. Co pilot being not smart and windsurf too slow (I tried with free tiers).
Two LLM UI Patterns That Aren't Chat (poyo.co via hn) Two LLM UI Patterns That Aren't Chat Intro Chat is still the default LLM interface, and for most cases that's fine. Agentic harnesses are still built around a single linear conversation at their core.
Show HN: Strudai, browser based agentic wrapper around Strudel (strudai.com via hn) Hi all! Together with a friend (and Claude Code) we built this project for fun.
ai governance for agentic workflows in regulated environments. what actually works in production? (www.reddit.com) mapping out the production architecture for an ai agent system in a heavily regulated environment (compliance-heavy, structured reporting requirements). the agent operates in a high-stakes workflow, so every automated suggestion or flag ne…
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser (www.reddit.com) Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.
Claude Code plugins a risk to local ecosystem? (www.reddit.com) There's an increasingly popular way to ship complex extensions for agentic work, that is specific to Claude Code, which is Code plugins. For example here's deep-wiki by Microsoft, a plugin to create a wiki from analyzing your project's rep…
how to architect ai agents for regulatory approval? (www.reddit.com) spent a lot of time on agent architecture for mission critical environments. getting an agent to browse the web or draft an email is trivial compared to deploying one where a hallucination carries real legal or physical consequences.
Show HN: Strava for AI coding – analytics on your Copilot/Claude/Codex usage (github.com via hn) AI Engineer Coach better agentic engineering. Analyze your AI coding assistant usage — any harness, one dashboard.
Teaching Claude Why (www.anthropic.com via hn) Teaching Claude why Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictio…
Ask HN: Which memory systems are you using in your agents? (news.ycombinator.com) Are you using an open source version, hosted product or maybe you have rolled your own? What is working, what is missing, and how are you evaluating the usefulness of memory for your agentic projects?
Every week this we see some version of "how do I evaluate my LLM app?" and the answer almost always stops at RAGAS or DeepEval. Here is the part of the evaluation stack most tutorials skip in 2026. (www.reddit.com) The same question lands on this sub a few times a week, and the standard answers (RAGAS, DeepEval) are correct but stop one layer short of what you actually need once your app leaves a notebook. Wanted to lay out the full picture for anyon…
What do you use Gemma 4 for? (www.reddit.com) Both Gemma 4 and Qwen 3.6 seems to be the hottest local models right now. Looking at the benchmarks and reviews, it seems like it's better in every way: coding, benchmarks, agentic tasks.
Meet Palantirs secret little brother “non-profit”. RavenEye Agentic AI by River Side Research Institute. (www.reddit.com) could not extract summary
Hollow: An Agentic OS with self-modifying kernels and distributed multi-agent transactions. (www.reddit.com) I’ve been building an infrastructure layer for agents that treats the LLM like a process, not a chatbot. It’s called Hollow AgentOS.
Donating Agent Payments Protocol to the Fido Alliance (blog.google via hn) For agentic technology to scale, it needs to work for everyone. That’s why over the last few months, we’ve shared new open commerce and payments standards to serve as the building blocks for the future of AI shopping.
Show HN: VT Code – Rust TUI coding agent with multi-provider support (github.com via hn) Hi HN, I built VT Code, a semantic coding agent. Supports all SOTA and open sources model.
Google Unveils Agent Skills Repository for Smarter AI Agents (cloud.google.com via hn) Level Up Your Agents: Announcing Google's Official Skills Repository Megan O'Keefe Senior Staff Developer Advocate As AI models improve, technical practitioners are increasingly turning to agentic AI tools to build with Google Cloud produc…
Harnesses Explained: The Inner and Outer Workings of the Coding Agent Harness (codagent.beehiiv.com via hn) When I started this newsletter, "harness engineering" was a term just starting to crop up. Now it's a household term in the community, and there's a lot of great material on it - most of it on building agentic systems with frameworks like…
Agentic memory with passive recall and citations as trust graph (github.com via hn) Agentic framework that _switches_ models based on role? (www.reddit.com) gemma4 vs qwen3.5 122A10 real usages (www.reddit.com) RTX PRO 5000 (48GB) vs MacBook Pro M5 MAX (128GB RAM) - The choice for fine-tuning & agentic coding (www.reddit.com) Codex v/s Cowork v/s Perplexity Computer v/s Kimi Agent Swarm (www.reddit.com) Agentic coding Qwen 3.6, Q6_K 125k context vs Q5_K_XL 200k context (www.reddit.com) What would you choose if you were in my shoes? How viable is 125k for agentic coding really?
Opus 4.7 keeps bumping into a Malware Reminder (www.reddit.com) For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.
How do you think I should charge? (www.reddit.com) I recently started getting a few leads, but I still do not feel like I fully understand how I should charge for what I do. What I do is basically a service as software model.
Show HN: Mercury – No-code orchestration for human and agent teams (www.mercury.build via hn) Hey HN, I'm Naveen, one of three co-founders building Mercury (mercury.build). We spent the last year in deploying AI agents for teams in large enterprises.
$1,400/month with Cursor + Claude API — how are you managing costs while keeping a real agentic workflow? (www.reddit.com) Hey, This month I hit $1,200 in Claude API costs inside Cursor (Opus 4.6 + Sonnet 4.6) on top of the $200/mo Ultra plan. $1,400 total.
Show HN: The first agentic coding engine that hot-reloads the full stack (serverpod.dev via hn) I’ve been working on the next version of Serverpod, an open-source backend written in Dart for the Flutter community. We are getting close to a final release.
Computex 2026: Are We Heading for the Agentic PC Era Yet? – EE Times (www.eetimes.com via hn) Computex 2026: Are We Heading for the Agentic PC Era Yet? - EE Times Advertisement Skip to main content Aspencore networkNews & Analysis Products Design Tools About Us AspenCore Network News the global electronics community can trust eetim…
X402 Batch Settlement: High-Velocity Agentic Commerce (www.x402.org via hn) Introducing x402 Batch Settlement: High-velocity Agentic Commerce May 11, 2026 By: Cam Whiteside (Cloudflare), Carson Roscoe (Coinbase), Conner Swenberg (Coinbase), Josh Nickerson (Coinbase), Philippe d'Argent (Coinbase) TL;DR: The x402 pr…
Show HN: Clor – give your agent claws (clor.com via hn) At my last job I spent a year building an agentic coding platform used by hundreds of thousands of people. Along the way I tried building a hosting service on OpenClaw, and also ran Hermes myself for a while.
MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities (twitter.com via hn) MiniMax (official) @MiniMax_AI Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench H…
Minimax M3 on Open Router (openrouter.ai via hn) MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.
Spitting Out the Agentic Kool-Aid (openpath.quest via hn) Spitting Out the Agentic Kool-Aid One Sunday evening last June, three friends met in Vienna to relive the glory days: coding all night. This time, Claude joined them.
Visa invests in Replit to power agentic payments for developers (techcrunch.com via hn) Visa has announced an undisclosed investment in AI coding platform Replit. The two companies are also exploring how to integrate Visa’s payment products into Replit, so that developers — and the AI agents they build — can accept payments d…
Show HN: VAEN – Package and import portable AI coding-agent Harnesses (github.com via hn) Hi HN, I built VAEN (an open source CLI) because I kept running into a boring problem with AI coding-agent workflows: the setup becomes useful, but then it is hard to move. A good, useful agentic harness consists of more than just instruct…
Q4_K_M is fine for chat and a trap for agents. Here is math mathing. (www.reddit.com) saw the Q4_K_M vs Q6 thread earlier and the comments are talking past each other. "few errors per hour" vs "errors every couple days" sounds like a 24x difference.
Show HN: The platform layer for agentic ML engineering (github.com via hn) LUML: One platform for the entire AI lifecycle Home Page | Discord | App | Documentation LUML is a platform for managing the complete machine learning lifecycle, from initial experiments to production deployment. It provides experiment tra…
Versatility of Exasol with Agentic Engineering (www.exasol.com via hn) Versatility of Exasol with Agentic Engineering Exasol is an analytical database. It’s built for joins, aggregations, window functions, and the kind of queries that chew through billions of rows before your coffee gets cold.
Claude is the best AI humanizer when you give it your writing style and a detector loop (www.reddit.com) I built this because I kept seeing a very boring workflow play out at home. My girlfriend would write with Claude, paste the draft into Slop or Not (an app that I built), see what still looked AI-ish, tweak the prompt, paste the next draft…
favorite Agentic Coding Harness (www.reddit.com) So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash.
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation (articraft3d.github.io via hn) Articraft is an agentic system for scalable articulated 3D asset generation. A coding agent writes programs against an LLM-friendly SDK to produce simulation-ready articulated 3D assets from text descriptions.
Qwen3.6:27b single-shot fixed a CSS UI bug that had Gemma4:26B doom looping uselessly for 15 minutes (www.reddit.com) Warning: long post ahead. On the bright side, it's 100 percent human-written, typos and all.
Best practice for accurate translation at minimal cost? (www.reddit.com) I've been meaning to translate forum post type content for one of my partner's sites. Objective to open up the audience base.
Microsoft researchers find AI models and agents can't handle long-running tasks (www.theregister.com via hn) MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…
500k context on 48gb VRAM!! - 21tok/s (coding) (www.reddit.com) I found this model hiding in the corner of huggingface: https://huggingface.co/Max-and-Omnis/Nemotron-3-Super-64B-A12B-Math-REAP-GGUF Looks to be tuned specifically for math but i thought i'd give it a try since i cant run the full 12b nem…
Sandboxing AIOps and Agentic AI Security (blog.cosmonic.com via hn) When people talk about AI sandboxes today, they usually mean: - seccomp, seatbelt, or bubblewrap - containers built from namespace mappings, cgroups, and allowlists - hand-tuned profiles bolted onto the existing OS - some assemblage of the…
Nowadays, what are the best AI tools for a single dev working on personal projects? (www.reddit.com) I have 2 years of experience doing data engineering and ai engineering, but I also have background in software engineering and machine learning in college due to my thesis. I've aways wanted to apply my computer science knowledge to my sid…
Opus 4.6 does better research, Gemini 3.1 has better judgment (www.reddit.com) Figured this out by running 4 models: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Grok 4.20, on a benchmark of 1,417 binary forecasting questions resolving Oct–Dec 2025 with two evaluation conditions: agentic (each model does its own web…
What industries already use agentic AI in production? (www.reddit.com) Curious which industries have actually moved beyond pilots and are using agentic AI in real production workflows. Are these systems driving measurable outcomes or still mostly augmenting existing processes?
DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper (www.reddit.com) Tested DeepSeek V4 Pro on FoodTruck Bench — our 30-day agentic benchmark where models run a food truck via 34 tools (locations, pricing, inventory, staff, weather, events) with persistent memory and daily reflection. First Chinese model to…
I'm looking for an AI Automation Engineer role or gig (news.ycombinator.com) Hi all, I'm an AI automation engineer who builds systems that replace manual work, scale outreach, and turn workflows into revenue. I have sent out working systems for managing leads to CRM, finding real estate deals, sorting emails with A…
The Block Model Behind Warp's Agentic Development Environment (www.warp.dev via hn) Warp has come a long way since it initially set out to modernize the terminal. In the screenshot above, an agent is working through a plan alongside a developer's own shell commands — running its own commands, reasoning, proposing a diff —…
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com) Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
↯ Fine Tuning↯ Opus 4.7↯ Function Callingfunction-callingfine-tuningrag+4
Agents vs Workflows (www.reddit.com) What’s a task that actually needs an agentic loop? I have shipped a handful of tools for myself including a morning brief, a research summarizer, and a couple extraction pipelines.
I built Claude Code skills for writing agent prompts, grounded in prompt research (github.com via reddit) I've been building agentic systems for a while and wanted a more systematic approach to writing prompts. So I gathered papers, did some deep research and created guides on structure, format and prompting techniques.
What's Missing in the 'Agentic' Story (www.mnot.net via hn) What's Missing in the ‘Agentic’ Story Friday, 24 April 2026 For much of the history of computing, it was reasonably safe to assume that a machine was doing what you told it to do (and what its creators promised it would do), because its op…
Free hands-on lab: build a ReAct agent 3 ways (create_agent, raw LangGraph with tool-call budget, NVIDIA NAT YAML) (www.reddit.com) As Agentic AI explodes, Amazon doubles down on MCP (thenewstack.io via hn) As agentic AI explodes, Amazon doubles down on MCP At the recent MCP Summit in New York City, The New Stack sat down with Clare Liguori, Senior Principal Software Engineer at AWS and core maintainer of the open-source Model Context Protoco…
Enough with perplexity and KLD! BenchLocal benchmarks real use cases and is easy to use for everyone (www.reddit.com) Hello everyone, I have followed stevibe on X for a while after he released Tool Call 15, an easy to use benchmark to test the tool calling performance of various models. All you needed to do was to point the benchmark to an OpenAI compatib…
GPU strategy for local LLM + mixed workloads (70-person company) — NVIDIA vs AMD? (www.reddit.com) Hey all, we’re a mid-sized company (~70 people) and currently planning to bring a lot of our workloads on-prem instead of relying on cloud APIs. The goal for the moment is to run small to mid-sized models in the range of 30B like Qwen3.6 o…
Stopping the Meta AI director's "OpenClaw failure with an out-of-band killswitch (highflame.com via hn) On February 23, 2026, the AI safety community witnessed a definitive case study in agentic failure. Summer Yue, the Director of AI Alignment at Meta’s Superintelligence Lab, watched in horror as her OpenClaw agent began a "speedrun" deleti…
Systems Engineering: The Key to Building Agentic Software That Works (www.ashpreetbedi.com via hn) Systems Engineering The Key To Building Agentic Software That Works In the early 1940s, Bell Labs was building the national telephone network, the most complex technical system in the world at the time. Millions of switches, cables, relays…
Tested 6 browser use agents for real-world tasks — here's an honest breakdown + looking for recommendations (www.reddit.com) I've been on a hunt for a browser agent that can reliably handle daily agentic tasks: filling job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…
Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn) Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE# Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.
CLAW.md – open format for agentic cron jobs (clor.com via hn) Show HN: RiddleRun – AI run end-to-end browser tests (github.com via hn) Nex N2 Pro: Frontier agentic performance at 400B (huggingface.co via hn) An agentic model with Agentic Thinking. Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios.
When Can Amazon Block an Agentic AI Service?–Amazon vs. Perplexity (blog.ericgoldman.org via hn) by guest blogger Kieran McCarthy On March 9, 2026, Judge Chesney granted a preliminary injunction in the case of Amazon v. Perplexity, concluding Amazon was likely to succeed on its CFAA and California Penal Code section 502 theories.
AI Agents Now Generate More Web Traffic Than Humans (www.cnet.com via hn) The internet just crossed a remarkable threshold. Agentic AI internet traffic now exceeds that of real humans for the first time.
Beyondflow No-Code Multi-Agent Teams with Unlimited Runs. BYOK and Ollama (beyondflow.app via hn) Researcher GPT-5 Engineer Claude Critic GPT-5 Innovator Gemini Manager Context Guardian Agentic Workflow Architecture · v1.0 The future of AI Collec An R&D platform where differents AI agents collaborate under the supervision of a Context…
Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud (boxes.dev via hn) Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. We’re two engineers who previously built Gem (co-founder/CTO…
Beyond the Semantic Layer: Building a Context Layer for the Agentic Era (www.kaelio.com via hn) A context layer puts your warehouse schema, joins, metric definitions, and business knowledge in one reviewable place so data agents query governed context instead of guessing field names. A look at how it works, and at ktx, the open-sourc…
Konversio: Open-source agentic customer support for digital sovereignty (www.konversio.org via hn) Agentic customer service.100% open source. Meet Pilot Konversio's AI support agent Konversio gives teams an open AI support layer they can own and self-host.
This Month in Agentic Coding – May 2026 (www.agenticcodingweekly.com via hn) Welcome to the first edition of ACW Monthly Brief. It's one email to catch you up on all the meaningful developments in agentic coding from the past month.
Show HN: OpenSOP, We got tired of agents lying to us, so we built them a harness (opensop.ai via hn) OpenSOP is an early open-source runtime/standard for executable agentic processes. You (or your agent) define a process in YAML, and OpenSOP exposes it as a typed REST API that agents and humans can both use.
Show HN: Self tuning chat exposing it's semantic and agentic cache (chat.betterdb.com via hn) RESP-compatible DBs and BetterDB Valkey · Redis · Dragonfly · BetterDB docs · semantic and kv/agentic cache demo Ask about Valkey, Redis, Dragonfly, or BetterDB Backed by live documentation with semantic caching and tool result caching. Wa…
Agentic Mfw (agenticmotherfucking.website via hn) And nobody gives a single fuck how it's built anymore. The previous motherfuckers spent a decade teaching you the holy commandments of clean code.
Show HN: MetaBrain – A local document memory for AI agents (metabrain.eu via hn) Hello there HN I experimented with agentic coding recently and I felt the need to track more contextual data by project. Also I felt the need to be able to go beyond the 1D chat to communicate with agents.
Kelsey Hightower on Practical and Responsible Use Cases for Agentic AI [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Pioneering the Agentic Shift Within Salesforce Engineering (www.salesforce.com via hn) Key Takeaways - Autonomous tools are now writing code, reviewing pull requests (“PRs”), and driving deployments across the software development lifecycle. - Standardizing on Claude Code and removing token limits improved output and quality…
Amazon Strikes $6B Deal with Snowflake for Agentic Computing Chips (www.wsj.com via hn) Exclusive | Amazon Strikes $6 Billion Deal With Snowflake for Agentic Computing Chips - WSJ Skip to Main Content Skip to... Select What to Read Next Most Popular News Most Popular Opinion Asia Dow 6311.60 -1.04% Nikkei 64978.67 -0.03% Hang…
Deep research led astray by AI Slop, iterating with source filtering helped (www.reddit.com) tdlr; don't trust deep research out of the box by default, need prompts / skills / iteration to filter AI slop from sources [The purpose of this post is to report a example of the default deep research going astray and how I worked around…
Does Cursor have a /goal mode? (www.reddit.com) In Codex GUI (e.g. on Mac; I'm not sure about other platforms as I don't use them), in the Claude Code GUI, and even in the open source Codex CLI (see https://github.com/openai/codex/tree/main/codex-rs/ext/goal/src ) -- there is a feature…
I built an Agentic AI Filmmaking Studio for people who have stories to tell but lack the budget and technical skills. (Giving away 10 free credits for the next 48 hours) (www.reddit.com) Hey everyone, I just launched MotionX Studio (Link in comments). The premise is simple: Filmmaking is completely gatekept by money and highly technical skills.
Agyn: open-source distributed agent runtime on Kubernetes — like Google's AX, with pre-built Claude Code and Codex agents, and full credential isolation from the LLM (www.reddit.com) Agyn is an open-source, Kubernetes-native agent runtime that moves AI agents like Claude Code and Codex from laptops to company infrastructure with the controls you actually need to run them in production. If you've been reading about Goog…
What would 2x RTX 3060 12GB get me? (www.reddit.com) TLDR: I’m considering buying 2 RTX 3060 12GB as opposed to single 24GB card to gain experience and need to know what can be realistically accomplished with this setup. Sorry in advance, I know you guys are probably tired of these kinds of…
Agentic Compilation: Reducing LLM Rerun Costs (arxiv.org via hn) LLM-driven web agents operating through continuous inference loops -- repeatedly querying a model to evaluate browser state and select actions -- exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as th…
Claude is thinking for 20+ minutes! (www.reddit.com) I gave Claude a genuinely hard problem today: a subtle bug somewhere in a video encoding ffmpeg pipeline, the kind where the output is slightly wrong and you can't tell which stage introduced it. I'd been stuck on it manually for a while,…
Cisco Foundry Security Spec: Open specification for agentic security evaluation (github.com via hn) Foundry Security Spec An open specification for agentic AI security evaluation, from Cisco. Cisco's Advanced Security Initiatives Group has built and operated an agentic security evaluation internally across several iterations and deployme…
Agentic AI in Big Tech and Enterprise (www.reddit.com) Disclaimer - this post was rewritten with AI based of my brain dump. Yet, I find it inspirational and useful.
Agentic AI token usage balloons cost at Microsoft, Meta, Amazon (www.tomshardware.com via hn) AI cost crisis hits tech giants as employee 'tokenmaxxing' backfires, sparking corporate pullback at Microsoft, Meta, and Amazon — agentic AI eats up to 1000x more tokens than standard AI AI is getting too expensive. Many tech companies ar…
Apex-Testing: real-world, real repos, agentic coding benchmark (Update) (www.reddit.com) BIG Apex-Testing update! https://www.apex-testing.org/ The Real-World Agentic Coding benchmark has been (95%) updated with all recent models!
Optimizing speed & quality on Qwen3.6 27b (www.reddit.com) Does the inference speed below seem optimal for the hardware, or could there be further room for improvement ? I’ve been trying to use Qwen3.6 27b for agentic harnesses like Pi/Hermes.
Cohere Open-Sources Command A+, a 218B Moe Model That Runs on Two H100s (firethering.com via hn) Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems.
Agentic-Agile: Why Agent Development Needs Agile (Not Just Prompts) (developer.microsoft.com via hn) “A bad system will beat a good person [or agent] every time” ~Dr. William Edwards Deming (with apologies) I started vibe coding by writing prompts (often dictated into my phone), refining them with an agent in M365 Copilot, and creating ha…
Launch HN: Superset (YC P26) – IDE for the agents era (github.com via hn) Hey HN, we’re Avi, Kiet, and Satya. We’re building Superset (https://github.com/superset-sh/superset), an open-source agentic IDE for running coding agents like Claude Code, Codex, OpenCode etc in parallel.
Built my own agent runtime after hitting the ceiling with LangGraph — UI as graph nodes, Postgres durability, zero orchestration cost (www.reddit.com) I've been building agentic applications for around 2 years now. Started with loops, then moved onto langgraph + Assistant UI.
Code as Agent Harness (arxiv.org via hn) Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a ta…
Building an Ai Agentic team with Claude (www.reddit.com) I've built an app using Claude/Claude Code, everything from the frontend to the backend. The app is actually functioning really well, tests are passing, and I have a small controlled group of testers that are actively using the app daily.
Show HN: Clark-Browser – Stealth Chromium (github.com via hn) Fully open-sourced, perfect for agentic browsing, works with Vercel's agent-browser and playwright.
"Is it true that you can keep coding 24/7 with AI!?" How are you conducting real-world tests in Agentic engineering? (www.reddit.com) I think many people are moving beyond "vibe coding" and building development harnesses using Agentic engineering. It’s true, I don’t write code myself anymore.
Why might MTP be net negative for tool heavy agentic flows? (www.reddit.com) The Qwen3.6-27B MTP benchmarks that have been circulating put factual tasks at 62-70% acceptance vs code at 79-89%. Tool calls probably sit in that factual range or lower, structured output, constrained format, less predictable than pure c…
Why agentic payments keep breaking. The IMF just put a name to it (www.reddit.com) The IMF published a formal note on agentic payments last month. One framing stuck with me more than the rest: "Payment systems must reconcile two fundamentally different design logics: the adaptive, probabilistic nature of agentic AI syste…
Pro X20 weekly quota is draining insanely fast after the latest Codex update. Pro X20 used ~48% in one day!!! (www.reddit.com) I’m on the Pro X20 plan, and after the latest Codex update / limit reset my weekly quota started draining much faster than before. In roughly one day of work, around 12 hours total, I went from a fresh reset to 52% remaining on the weekly…
[Vex] - I built an open-source terminal AI video editor that edits real footage with FFmpeg, Whisper, and agent tool calls (www.reddit.com) Most AI video tools feel backwards. They start with the model.
Recommendations for an agentic harness (not OpenClaw)? (www.reddit.com) I'd like to set up a local "software factory" on my laptop (M5 Max, 128GB). To do this, I'd like my agent to poll for new GitHub issues and work on them.
Anthropic built the agentic features. Now they're billing them separately. (www.reddit.com) Starting June 15, Claude subscribers get a separate monthly credit for Agent SDK and claude -p usage: $200/mo for Max 20x, $100 for Max 5x, $20 for Pro. Once you burn through it, programmatic usage stops unless you've opted into extra usag…
Claude Code already does afk agentic work without touching the new programmatic limits (www.reddit.com) Use the official channels plugin, and the teams agent in Claude code. CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 /plugin marketplace add anthropics/claude-plugins-official /plugin install discord@claude-plugins-official /reload-plugins Discord…
A²RD: Agentic Autoregressive Diffusion for Long Video Consistency (dxlong2000.github.io via hn) Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over long horizons.
Max20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter? (www.reddit.com) I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft.
Show HN: Statewright – Visual state machines that make AI agents reliable (github.com via hn) Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.
What I've learned designing agentic workflows for docs (passo.uno via hn) What I've learned designing agentic workflows for docs Back in 2024 I wrote that AI helps me remove boring work at the margins. This is fine for a lone writer, but how to scale this to an entire team of technical writers?
Show HN: SLayer, a semantic layer maintained by your agent (github.com via hn) Hello HN! If you want to connect your agent to a database (say, to build a data analyst chatbot or any kind of agentic app) today you have 2 options: an SQL MCP server or a semantic layer.
We need a safe alternative to Telegram for agents like OpenClaw or Hermes (news.ycombinator.com) The problem with Telegram is, that it is not E2EE - so every message you send will end up *unencrypted* on their servers. Think about it - how often did you post the Gmail authentication URL or another API token in the Telegram chat?
Looking for seed funding (www.reddit.com) Looking for seed funding for a agentic solution that helps companies grow their business via hyper personalised curated content distributed to multiple Chanels and decrease CAC. This tool is for companies who are focused on their niche eg:…
Agentic Hooks - Stream Deck plugin (www.reddit.com) I had itch to address long running task with Claude, where I wanted to see when its done working. And I wanted separate context flow for these alerts instead of using existing flow (phone, discord, telegram, etc) This is when idea born, sh…
DS4 (www.reddit.com) The developer that created Redis, Salvatore Sanfilippo, has released a new project on GitHub named DS4. https://github.com/antirez/ds4/ The TL;DR on this one is getting DeepSeek V4 Flash running with a 1M context windows on Mac Metal hardw…
MCP for sandboxed, reproducible envs for agentic-first coding workflows (github.com via hn) devcontainer-mcp Give your AI agent its own dev environment — not yours. devcontainer-mcp is an MCP server that lets AI coding agents create, manage, and work inside dev containers across three backends: local Docker, DevPod, and GitHub Co…
I built agent-browser but for OS automation. (www.reddit.com) Hey r/AI_Agents ! I was using agent-browser to power my agentic workflow, and it worked great.
Building Agentic GraphRAG Systems: From knowledge graphs and ontologies to a unified memory as an MCP server for your AI agent. (www.reddit.com) I gave this talk twice in one month: at O’Reilly’s Context Engineering Event and at Abi Aryan’s Maven course on LLM inference at scale. After being blasted with questions, I realized something: GraphRAG isn’t a retrieval algorithm, it’s a…
How are you protecting your AI agents' memory from poisoning attacks? (www.reddit.com) As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning.An attacker can plant mali…
Why people cares token/s in decoding more? (www.reddit.com) What I've noticed while using local LLM recently is that in most cases, bottlenecks occur not in decoding but in prompt processing. If the prompt processing speed is usable, in most settings (since it takes about 15k when starting based on…
Agent Exchange – A2A discovery with real-time bidding for AI agents (github.com via hn) Agent Exchange (AEX) The NASDAQ for AI Agents A programmatic marketplace applying ad-tech economics for agentic AI services What Problem AEX Solves? As AI agents proliferate, enterprises face a critical challenge: the N×M integration probl…
Need advice: Qwen3.6 27B MTP or 35B-A3B MoE MTP on 16GB VRAM RTX 5080)? (www.reddit.com) Hey folks, looking for advice before I delete or keep a huge model file. I’m testing local coding/agentic workflows on an RTX 5080 16GB + 96GB RAM.
Agentic Malware Analysis: String Decryption, API Hashing and Unpacking [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Anthropic quietly nerfed Claude Code's 1-hour cache (www.xda-developers.com via hn) Claude Code has become the default agentic coding tool for a lot of developers, and for good reason. It understands a codebase, calls tools, edits files, and can plan multi-step tasks with very little handholding.
Process-Level Reward Modeling for Agentic Data Analysis (arxiv.org via hn) Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remai…
Is anyone else exhausted by "glorified prompt chains" being marketed as Agents? (www.reddit.com) It feels like every new SaaS wrapper right now claims to be "agentic." But when you actually look under the hood, 90% of them are just hardcoded prompt chains with a couple of basic API tools thrown in. I’ve been spending a lot of time rec…
Show HN: Zerminal – a terminal-first Zed fork for AI coding agents (zerminal.dev via hn) A terminal-first development environment for agentic coding. Use Claude Code, Codex, Aider, and other CLI agents in a focused workspace.
I cut Codex’s API Usage by 50% using a self modifying system (www.reddit.com) I've been developing a self-modifying Al agent system that effectively cut my Codex/Claude Code API usage in half, Codex makes a plan and then I basically just copy/paste Codex instructions for the agents to work on. Come back in 6 hours a…
Best suited model for solo Dev (www.reddit.com) Hey everyone! I've kinda new to Claude, I've only had few chats with it but nothing too deep like projects etc.
What is the best all-round local model? (www.reddit.com) Not for agentic coding but for help in conversational style write-ups like markdown documentation (not code-related). Constraints are 64GB unified memory, obviously local.
Signals - finding the most informative agent traces without LLM judges (arxiv.org) (www.reddit.com) Hello Peeps Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). Wanted to introduce our latest research on agentic systems called Signals.
love it - Qwen3.6-27B — UD-Q5_K_XL evaluation (www.reddit.com) by Kyle Hessling A hands-on benchmark of the Unsloth dynamic Q5 quantization, self-hosted on a single RTX 5090. 19 runs, 93.9 k generation tokens, across agentic reasoning, production-grade front-end design, and canvas / WebGL creative cod…
We scanned 100 Smithery MCP servers, 22 flagged, here's what we found (news.ycombinator.com) We built Bawbel (https://bawbel.io), an open-source scanner for agentic AI components. Released v1.0.1 this week.
I audited LangChain’s core library and found 10+ Prompt Injection vulnerabilities. Here is the technical breakdown. (www.reddit.com) Hey everyone, I’ve been working on a project to solve a major problem in AI security: Traditional SAST tools (Snyk, SonarQube, etc.) are blind to "Agentic Logic" bugs. They look for bad strings, but they don't understand how user data can…
Ask HN: Anyone using AI agents for active learning sprints? Here's my setup (news.ycombinator.com) Hi HN, I'm a big fan of AI's ability to provide personalized tutoring. So, lately, I have been using my Antigravity IDE (you can use any agentic harness) for personal learning.
Andrej Karpathy: From Vibe Coding to Agentic Engineering [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Spam bots are ruining it for everyone (www.reddit.com) Sorry for this rant, but I feel like venting to someone. Recently I set up an agent on a cloud VPS.
14-day growth agents contest on a serious AI stack (for loop-minded builders) (www.reddit.com) Sharing an AI-native growth agents contest that feels very on-brand for this sub. VideoDB (infra for video/audio for AI agents) is running a 14-day sprint/contest called Growth Forge for 5 builders to design and ship a growth agent on top…
What agentic framework are you actually using in production? (www.reddit.com) Feels like a new agent framework drops every other week. Curious what people are actually shipping with vs just experimenting on weekends.
The Race Is on to Keep AI Agents from Running Wild with Your Credit Cards (www.wired.com via hn) Between malware, online impersonation, and account takeovers, there are enough digital security problems out there as it is. And with the rise of agentic AI, more activity is being carried out by agents on behalf of humans—creating differe…
Show HN: SlopIt – A dead-simple CMS for your AI agent (slopit.io via hn) Hey HN. I built a dead-simple CMS for your AI agents — https://slopit.io Kept it minimal and agentic-first.
Sharing my minimal dev AI workflow Claude Code agent that takes a GitHub issue to merged PR with 3 human gates (www.reddit.com) Sharing a workflow in case it's useful to anyone else exploring agentic coding loops. The setup is one orchestrator agent (issue-resolver) that handles a GitHub issue end-to-end.
DeepSeek V4 Pro: Validating Frontier Models for Production (fireworks.ai via hn) Why we chose correctness over a Day-0 launch DeepSeek V4 Pro is one of the most important open-model releases this year, with real advances in long-context reasoning, agentic performance, and inference efficiency. On paper, it looks like a…
Ask HN: Will fixed applications become a thing of the past with agentic AI? (news.ycombinator.com) Right now its mostly technical people using these agentic tools but if you extrapolate a few years into the future it seems likely to me that every day users of a computer will be using them as a whole new interface to interact with their…
Agentic Ai Revolution humming along… (www.reddit.com) while people argue about ai ethics on the surface there’s a whole underground building agents that never sleep different timelines forsure which timeline are you on?
Agents for end-to-end document redaction and review tasks (OCR and PII identification - Qwen 3.6 vs closed-source comparison) (www.reddit.com) (Links to all files, apps, and repos mentioned in this post can be found in the 'full post' link at the bottom) Agents for document redaction and review tasks Document redaction tasks involve text and vision capabilities, and long context…
Ace Technical Preview: GitHub Next's Agentic Workspace – Maggie Appleton [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Mastermind – agentic SDLC workflow for VS Code (news.ycombinator.com) Prototype of an agentic SDLC workflow running inside VS Code + Copilot. Simple loop: task → reasoning → audit → memory → RAG refresh.
Show HN: HyperFrames – OSS Agentic HTML Video Framework for Agents (miguel07code.dev via hn) We built in HeyGen an open source framework specifically made for Agents solving our own pain point that we had when the agents tried to write Remotion. React is not agent-friendly at all, and Remotion is a custom framework where the agent…
how far we have came.. (www.reddit.com) From meta launching the lama models to oss models and agentic and coding models we have came fucking far in no mean i guess this is the fastest evolution out of all diff things we have saw this i guess is the era similar to diff innovation…
Pact: Trustworthy Coordination for Multi-Agentic Ecosystems (www.basis.ai via hn) Pact: Trustworthy Coordination for Multi-Agentic Ecosystems Article: Kiran Gopinathan, Jack Feser, Michelangelo Naim, Eli Bingham, Zenna Tavares |April 23, 2026 Autonomous agents are beginning to act on our behalf. LLM agents already negot…
Meta Partners with AWS on Graviton Chips to Power Agentic AI (about.fb.com via hn) Today, we’re announcing an agreement with Amazon Web Services (AWS) to bring tens of millions of AWS Graviton cores into Meta’s compute portfolio, making us one of the largest Graviton customers in the world. Processing cores are units ins…
Future-proofing an enterprise agentic platform architecture (medium.com via hn) medium.com Performing security verification This website uses a security service to protect against malicious bots. This page is displayed while the website verifies you are not a bot.
Complete beginner to Agentic coding, is Qwen3.6-27B + pi.dev the right starting point or should I be looking elsewhere? (www.reddit.com) Hello fellow members of this lovely community, Let me start by saying that I’m about as far from a professional developer as it gets. I’m a hobbyist whose entire coding experience consists of building various Python/VBA tools and simple Ja…
Anyone else noticing how Gemini-3-Flash is becoming the 'hidden' beast for automated promotions, its so productive? (www.reddit.com) I've been testing a few different models for desktop-driven outreach and promotion workflows. While everyone is eyeing the massive LLMs, Flash-Preview is hitting that sweet spot of speed and reliability for multi-step agentic tasks and its…
DeepSeek V4 is out. the best open-source on coding. here's the breakdown (news.ycombinator.com) Two models: Flash (284B total, 13B active) and Pro (1.6T total, 49B active). both hit 1M token context.
I got tired of Claude writing Godot 3 code in my Godot 4 projects, so I built a skills framework and I would love your feedback (www.reddit.com) Hey, if you've ever used Claude Code (or Cursor, Copilot, etc.) for Godot game dev, you've probably hit this: the agent confidently writes Godot 3 syntax in a Godot 4 project, or uses deprecated patterns, or just invents APIs that don't ex…
Need help for a calling based agentic ai project (www.reddit.com) Agentic framework that self-improves its stock portfolio strategy (GitHub).) (github.com via hn) Arent These single file LLM coding tests like browserOS pretty much redundant now most 2026 LLM can easily handle this? (www.reddit.com) How to Build Advanced Generative AI Agents (Kinda) (www.generative.inc via hn) The tools, frameworks, and protocols we use to build AI agents, agentic workflows, and intelligent applications. ModelsShmodles Before we get into the stack, the single most important thing we believe about building agents: We do not care…
Opus 4.7 dominates agentic benchmark, 15% more expensive than Opus 4.6 (app.uniclaw.ai via hn) See how top AI models stack up — real tasks, real agents, real results on OpenClaw ?Also show provisional models and official models hidden by default, such as legacy or superseded variants. Provisional models have fewer battles, and hidde…
GitHub Copilot is serving Opus 4.7 at 7.5x multiplier until April 30th (github.blog via hn) Claude Opus 4.7 is generally available Claude Opus 4.7, Anthropic’s latest Opus model, is now rolling out on GitHub Copilot. In our early testing, Opus 4.7 delivers stronger multi-step task performance and more reliable agentic execution,…
2026 Agentic Coding Trends Report [pdf] (resources.anthropic.com via hn) Title: 2026%20Agentic%20Coding%20Trends%20Report.pdf URL Source: https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf Published Time: Wed, 21 Jan 2026 22:37:47 GMT Number of Pages: 18 Markdown Content: 2026 A…
Beyond Prompts: A Tiered Trust Model for Autonomous Agents (Experiment Report) (www.reddit.com) We often talk about agent autonomy, but rarely about the "Harness Engineering" required to make that autonomy safe. I’ve been running a design experiment comparing agentic workflows on open platforms (OpenCode) vs.
Why model drift is the real failure mode for agentic systems (www.reddit.com) Across Twitter and Reddit, I keep seeing the same complaint: Claude feels worse. Not on a benchmark.
Anybody has practical experiences using Chinese models? (www.reddit.com) So like with coding or any craft, I think there's a proper Tool for the job. Sure you can use a stone to hammer drive in a fence post, but a a sledge is usually more economical.
Huge throughput gains when switching agent evals to shared environments with per-run isolation (www.reddit.com) Thanks all for the comments on my previous post about local-first agentic evaluation collapsing in long stateful agents runs, just sharing an update on where I’m at now in case it helps as I had another issue to overcome. Took on board the…
Zuver – Build your enterprise Agents with just 10MB RAM (news.ycombinator.com) I built Zuver, the generic Agentic AI framework for scalable, reliable, even on-edge AI applications and Agents. It's completely written in Go, which lowers the RAM usage to around 6MB, compared to other Agent framework that's usually arou…
Agentic coding at enterprise scale demands spec-driven development (venturebeat.com via hn) Agentic coding at enterprise scale demands spec-driven development | VentureBeat Orchestration Infrastructure Data Security More Newsletters Partner Content Agentic coding at enterprise scale demands spec-driven development Deepak Singh, A…
Agentic AI pentesting with Strix: results from 18 LLM models (theartificialq.github.io via hn) Over the last couple of months, I spent close to a hundred hours testing an autonomous AI pentesting tool called Strix with 18 different LLM models. My goal was to evaluate which LLM model performed best with the tool in this lab setup and…
Show HN: The opensource, reliable, scalable Agentic AI framework under 10MB (zuver.cc via hn) Multi-Agent Orchestration Deploy and coordinate multiple specialized AI agents through visual flow-based pipelines. Zuver's routing engine handles inter-agent messaging, task delegation, and stateful coordination natively.
Cephalopod Coordination Protocol, Useful for Teams Using AI Agents (github.com via hn) Cephalopod Coordination Protocol A Rust-based client-server coordination protocol for agentic systems. Install · Quick Start · Droplets · Use Cases · Docs · Security What is this When you have multiple agents working together they need som…
Show HN: OQP – A verification protocol for AI agents (news.ycombinator.com) As AI agents autonomously write and deploy code, there's no standard for verifying that what they shipped actually satisfies business requirements. OQP is an attempt to define that standard.
Mi – agentic harness in 30 lines of JavaScript (github.com via hn) https://github.com/user-attachments/assets/9289d105-5a40-442d-b1b5-773723c95c13 agentic coding in 30 loc. a loop, four tools, and an llm.
Agentic RL: Token-In, Token-Out Done Right (qgallouedec-tito.hf.space via hn) Is Grep All You Need? How Agent Harnesses Reshape Agentic Search (arxiv.org via hn) Apple Passwords Now Auto Fixes Weak and Compromised Passwords with Agentic AI (www.macrumors.com via hn) Apple Passwords Can Now Automatically Fix Weak and Compromised Passwords With Agentic AI Apple today announced that the Passwords app can now automatically update weak and compromised passwords using Apple Intelligence and Safari to take a…
Agentic AI solved coding and exposed every other problem in SE (venturebeat.com via hn) Agentic AI is now a core part of the engineering process, driving massive execution leverage and helping us generate more code than ever before. Yet, a difficult question I’ve increasingly heard from business leaders is: if we’re shipping…
Show HN: AgentCrew – a Markdown-first operating system for AI coding agents (github.com via hn) AgentCrew Turn your coding agent into a disciplined team. AgentCrew is a conversation-first, Markdown-first methodology for agentic coding.
Show HN: Version Control for AI Agents (cognatoai.com via hn) Git/GitHub did not evolve for agentic era, so we are building
Improving LM Studio's MLX Engine for Agentic Workflows (twitter.com via hn) We recently released mlx-engine v1.8.5 in LM Studio. This update dramatically improves performance for repeated, long-context agentic workflows by checkpointing your KV cache.
Agentic AI spurred a boom in mobile apps, but they aren't gaining traction (twitter.com via hn) Don’t miss what’s happening People on X are the first to know. Log in Sign up Post Conversation Jen Zhu @jenzhuscott Massive output uptick due to agentic AI.
Ask HN: Will your company be doing "LeetCode" interviews a year from now? (news.ycombinator.com) I work in big tech. I'm a SWE manager, but I have a half a mind to return to being an IC at some point.
Show HN: Omni – Local-first multimodal file search on macOS (hanxiao.io via hn) Finally made something I've always wanted, using the model we built. • SOTA omni embedding model, fully local, indexes text, PDF, image, audio, and video • Swift-native app UI + mlx-swift-transformer core.
Blumi CLI – A Private Agentic Runtime with Grid Dispatch (github.com via hn) blumi A local-first, provider-agnostic agentic coding companion — one Rust core, three faces: a terminal UI, a web UI, and a phone app. blumi is a single Rust binary whose UI-agnostic core emits one typed event stream, so every surface sho…
Vibe Coding Is Dangerous, Agentic Engineering Isn't (motherduck.com via hn) Wes McKinney, creator of pandas and co-creator of Apache Arrow, shares how he works with AI coding agents: spec-driven workflows with superpowers, continuous AI code review with Roborev, token economics, and why vibe coding is dangerous bu…
agentgateway Joins AAIF as an Open Gateway for Agentic AI Infrastructure (aaif.io via hn) The Agentic AI Foundation welcomes agentgateway — an open source gateway purpose-built for MCP, Agent-to-Agent, LLM, and API traffic — as its newest hosted project.
Training an Agentic Router for Optimal Cost-Performance on SWE Tasks (www.appliedcompute.com via hn) Training an Agentic Router for Optimal Cost-Performance on SWE Tasks On most enterprise tasks, model quality is not a scalar. One model is better at long-horizon repository exploration.
Rewiring software delivery for the agentic era (www.mckinsey.com via hn) You don't have permission to access "http://www.mckinsey.com/capabilities/technology/our-insights/rewiring-software-delivery-for-the-agentic-era" on this server. Reference #18.ee18d017.1780614098.58f52b59 https://errors.edgesuite.net/18.ee…
Trader – LLM agent for Robinhood with a Rust safety layer and paper trading (github.com via hn) Trader — LLM-Driven Robinhood Trading Agent A Rust agent that connects an LLM to Robinhood's official agentic trading API, enforces hard risk limits in a typed safety layer, and paper-trades against live market data before you risk a dolla…
Copilot SDK is now generally available (github.blog via hn) Copilot SDK is now generally available The GitHub Copilot SDK is now generally available. You can embed GitHub Copilot’s agentic engine into your own applications, services, and developer tools with a stable API and production-ready suppor…
Dumb core, smart edge for AI agents (arizenai.com via hn) Dumb Core, Smart Edge: Agentic Design Many agentic systems I've watched fail in production had the same shape: intelligence concentrated at the center, where it was hard to test, replace, or reason about. The orchestrator was doing too muc…
From Specialists to Builders: How AI Agentic Coding Is Reshaping Software Teams (aliparnan.com via hn) Specialization defined software teams for decades. AI agentic coding is creating a new Builder role—people who orchestrate agents across disciplines and own outcomes end to end.
Why Merge Conflicts Became the New Agentic Bottleneck (adamtornhill.substack.com via hn) Why Merge Conflicts became the new Agentic Bottleneck Revisiting some techniques from Your Code as a Crime Scene in the light of agentic coding. Specifically, how a socio-technical fit becomes even more important now that agents are our ac…
Vegvisir – Agentic Harness Built for Software Developers (github.com via hn) Vegvisir Agent Harness Vegvisir is a local-first agentic software development harness for people who want an AI engineering assistant that can actually work inside a repository without being handed every secret, every permission, and every…
Agent Code – open-source Mac app for managing AI coding agents (github.com via hn) A native macOS platform for agentic coding workflows, powered by Pi. Manage agents, skills, prompts, subagents, worktrees, and GitHub work in one signed Swift app that runs the installed pi CLI in the background.
AI in SRE: Where and how Google is deploying agentic AI to improve operations (cloud.google.com via hn) AI in SRE: Where and how Google is deploying agentic AI to improve operations Stevan Malesevic Distinguished Software Engineer Christopher Heiser Distinguished Site Reliability Engineer Since its inception over 20 years ago, Google has use…
MiniMax M3 (xcancel.com via hn) Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax…
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks (arxiv.org via hn) Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awarenes…
A standard for building production AI agents (+ installable Claude Code skills) (github.com via hn) The Agentic Product Standard A canonical standard for building production-grade agentic products — plus a Claude Code skill set that operationalizes it. Distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangC…
Ask HN: What are your worst war stories bringing agentic applications into prod (news.ycombinator.com) For a bit of context, I’m currently creating a team of AI agents at work to generate reports by fanning out into a large amount of subagents to process a large amount of transcript data. When the analysis fails mid-way because of some indi…
6 Months of "Agentic" Coding (ashutoshbsathe.github.io via hn) ~6 months of "agentic" coding So I was a complete skeptic of the whole “agentic” / “vibe” coding phenomenon before last year’s NeurIPS. While I was in California anyway, I decided to visit a bunch of friends in San Francisco.
Show HN: Jynx, a matchmaking app to find gaming teammates (jynx.app via hn) TL;DR: Jynx is a gaming social platform that matches you with compatible teammates based on skill level, play style and schedule. Swipe to find players (Tinder-style), create or join game sessions (LFG), chat, and build your squad.
Spatial IDE's for agentic coding workflows (news.ycombinator.com) Seeing spatial IDE's (where terminals and files are displayed on a canvas instead of a regular dock like vscode) more often right now on HN and Reddit. This is a selection of the ones I've seen.
Three flavors of coding with AI agents (nocodefunctions.com via hn) A reasonable definition of an “AI agent”, at least in the context of agentic coding, could be: a software process endowed with the capabilities of an LLM launched with instructions given at the start to accomplish a task which runs…
Embodied Cognition and Agentic AI (lemire.me via hn) Where is your intelligence located? In your brain?
Arm Metis with GPT5.5 Cyber scores 98% on firmware vulnerability benchmark (newsroom.arm.com via hn) Agentic AI-powered Arm Metis advances security vulnerability discovery in software In the era of AI, modern software systems are built across increasingly complex codebases, frameworks, runtimes and libraries. As these systems scale, so do…
Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems (github.com via hn) For months, I struggled to build complex, long-running projects using AI agents and I kept failing... One shots, refactoring, high token consume...
Robinhood's bet on agentic trading and purchasing is 'wake-up call' for banks (www.americanbanker.com via hn) The brokerage fintech launched agentic trading and an agentic credit card today that will allow AI agents to trade equities and make credit card purchases on customers' behalf. It comes just weeks after OpenAI rolled out its own personal f…
Show HN: Agentic Intent Benchmark (github.com via hn) intent-bench An open-source benchmark measuring whether providing structured intent to coding agents improves implementation effectiveness. What This Measures Existing agent benchmarks (SWE-bench, HumanEval, Aider Polyglot) test single-req…
Why $/token is the wrong metric for Enterprise AI (agentic) applications (canyoncode.ai via hn) Canyon Code gives enterprises the ability to observe, optimize, and governance their multi-agentic AI applications. The Workflow Intelligence Layer
The Self-Healing Vector Database (www.reddit.com) A pattern I keep seeing in agentic RAG systems: The agent is smarter than the retrieval layer. It can notice that context is stale.
Open-source playbook on agentic working — for the cross-audience, not just coders (28 chapters, MIT) (www.reddit.com) Author disclosure upfront: I wrote this. Free, MIT-licensed, no paid tier.
15 AI Agentic Design Patterns (www.reddit.com) As important the design patterns are important in coding for maintainability and scalability. Such patterns also exists in Agentic AI workflows also.
Best harness for agentic analytics? Codex? Claude? Custom? (www.reddit.com) I run a small seo marketing agency and we've built some dashboards on top of our data for reporting with nextjs + supabase. This is where reporting for our clients happen.
Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have? (news.ycombinator.com) I've been talking to engineers, mostly in large teams. While they love cross-surface search with Glean, they still assimilate and curate the context for agents manually.
I built an agentic coding harness across three CLI hosts (pub.towardsai.net via hn) 8 min read May 13, 2026 This article is a work in progress. I will keep updating it as the kit evolves.
Agentic AI Flywheels (www.newsletter.swirlai.com via hn) Agentic AI Flywheels The production loop after your agent ships, and the eval set that grows with it. 👋 I am Aurimas.
Tool-schema compression enables agentic RAG under constrained context budgets (arxiv.org via hn) Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the first syst…
the agentic depth gap between open source AI assistants ranked (www.reddit.com) Agentic depth measures how far an autonomous agent can take a task before human intervention. The gap between open source options on this dimension is wider than feature comparisons suggest.
Looking for Suggestions — Single 5090 & 64gb DDR5 (www.reddit.com) Hi Reddit, I am planning on running Qwen 3.6 27b NVFP4 via vLLM on my 5090 but was wondering if something like 35b a3b at Q8 on Llama would produce better results for agentic coding and utilize the system memory. My research says no but if…
Breaking Bot: Hacking and Defending LLM-Based Applications (www.szia.ai via hn) Breaking Bot: Hacking & Defending LLM-based Applications - Marton Antal Szel - Dec 24, 2025 - 12 min read Updated: 4 days ago Let's say your "super-intelligent" agentic chatbot - the one with access to sensitive customer data - is hijacked…
Harbor v0.4.19 - vllm/sglang/llama.cpp launch codex/claude/pi/opencode (www.reddit.com) I'm usually not posting about Harbor releases out of the respect for the community here, but I think v0.4.19 might save a lot of people some time. Harbor can now launch your local agentic coding tools with local inference backends.
I replaced my old job with an AI agent (www.reddit.com) Hello friends. Today I want to talk about agentic media buying.
I Built MagesticAI. A Cloud Web-Based Agentic DevOps Orchestrator that actually helped me develop Itself. (www.reddit.com) Posted on other feeds last week and figured some of you out here might be interested as well; Someone commented asking if it supported OpenAI-compatible endpoints (LM Studio, vLLM, OpenRouter, Together, Groq, LocalAI…), so i have spent few…
Testers and collaborators wanted (www.reddit.com) Hello, I'm working on an Agentic wrapper system, Helix-agi, and I am trying to get some additional testers and collaborators involved in the project. Helix relies on a unique Agentic workflow that routes all incoming data, including tool u…
Inside Google’s Agentic Search Revolution (puck.news via hn) puck.news Performing security verification This website uses a security service to protect against malicious bots. This page is displayed while the website verifies you are not a bot.
Agentic AI Changes the CPU/GPU Equation (www.amd.com via hn) Agentic AI Changes the CPU/GPU Equation Skip to main content Enable accessibility for low vision Open the accessibility menu Skip to main content AMD Website Accessibility Statement Products Processors Accelerators Graphics Adaptive SoCs,…
Validating an idea, would anyone be interested in e-commerce designed for agents? (www.reddit.com) Me and 2 other friends are trying to solve payments through agents. One of the ideas we're looking into is merchant integration to allow agentic payments using any of the plethora protocols that exist (MPP/UCP/x402/AP2/Google's Universal C…
Rust is a great fit for the agentic era (kerkour.com via hn) We're sorry but this website doesn't work properly without JavaScript enabled. Please enable it to continue.
I built 10 gamified, interactive presentation decks to teach Agentic AI (Stop falling asleep reading whitepapers). (www.reddit.com) Hey everyone, I've noticed a massive gap in how developers are trying to learn Agentic AI right now. There are hundreds of theoretical whitepapers and boring PowerPoint decks about ReAct loops, GraphRAG, and Semantic Routing.
I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. (www.reddit.com) The main problem I’m trying to solve is that agentic AI is still too random for serious engineering decisions. For design work, calculations, reports, code changes, or technical review, I don’t want agents just “vibing” through tasks.
Stripe's John Collison on How Agentic Commerce Will Reshape the Internet [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
A Marketplace of Fine Tuned SLMs for Agentic Tasks (marketplace.neurometric.ai via hn) 130 models available Small Models, Big Impact Discover task-specific SLMs ready for your business, or browse general models under 20B parameters. Need something custom?
Anybody knows why cursor trying to move into "claude desktop" style app? (www.reddit.com) It makes absolutely no sense for cursor trying to switch over to Claude or Codex desktop style app. I am a Neovim/VSCode user and I only recently started using cursor, and found out that the UI/UX for agentic coding is phenomenal.
Moss: Self-Evolution Through Source-Level Rewriting in Autonomous Agent Systems (arxiv.org via hn) Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all…
CodeAlta an efficient agentic AI coding CLI assistant coded in C#/.NET (codealta.github.io via hn) ██████ ██ ██ ██ ██ ██░░░░██ ░██ ████ ░██ ░██ ██ ░░ ██████ ░██ █████ ██░░██ ░██ ██████ ██████ ░██ ██░░░░██ ██████ ██░░░██ ██ ░░██ ░██ ░░░██░ ░░░░░░██ ░██ ░██ ░██ ██░░░██ ░███████ ██████████ ░██ ░██ ███████ ░░██ ██ ░██ ░██ ░██ ░██ ░██░░░░ ░█…
Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc.. (www.reddit.com) For me, opencode doing fantastically but was wondering if qwen code would be more native and have better functionality, since idk which agentic harness they used to get their benchmark results
What nobody's measuring about dense MoE in production tool calling agents (www.reddit.com) Most of the model selection conversation I've seen focus on benchmark scores and cost (no surprise there). The question I can't find good production data on is whether dense vs MoE actually affects reliability for tool heavy agentic flows,…
[Blogpost] Files Are All You Need: Towards Self-Improvement in ChatGPT (www.reddit.com) Subreddit rule statement: link to blog post in the comment. Not self-promotion.
Karpathy's LLM-Wiki for agentic software development? (www.reddit.com) I’ve been away from coding/software development for about a year. When I stepped away last summer, agentic software development wasn’t nearly as capable or accessible as it seems today.
A solution for schlep blindness in agentic development for Kubernetes envs (metalbear.com via hn) Turn your AI agents into autonomous developers Use mirrord to instantly validate every change against your live staging environment — multiple agents, same cluster, no conflicts. Windsurf & others or CLI No credit card needed Fast setup, n…
Show HN: P-Hacker – group/analyze HN trends by topic (not just keywords) (p-hacker.com via hn) I'd seen various HN trends tools over the years ([1] [2]), but they all used strict keyword (n-gram) matching. That limited a) how sophisticated any trend-surfacing could be and b) the depth with which you could explore the full discussion…
RTMX: Intent Layer for Agentic Engineering (github.com via hn) RTMX Track what you built, what's tested, and what's next -- from the terminal. RTMX is a CLI that manages requirements traceability as a CSV file in git.
The Claude Code Production Playbook: Sub-Agents, Hooks, and MCP Integration (ddsboston.com via hn) Claude Code Masterclass 2026 The definitive end-to-end guide to Anthropic’s agentic coding tool — installation, Ollama local fallback, CLAUDE.md, Skills, Subagents, Agent Teams, Hooks, and MCP. Everything you need before building productio…
Not All Software Systems Are Agent Friendly (yassi.dev via hn) Discourse around AI tends to collapse into two camps: true believers and luddites. A recent piece, Agentic Coding is a Trap, highlights what the author calls the “paradox of supervision” - where the very judgment needed to oversee AI deleg…
What's the best qwen3.5 or 3.6 reap model? (www.reddit.com) What's the best reap (pruned) model you know of? This one runs twice as fast on my low vram setup, but I'm unsure if it will miss out on a lot of things agentic coding related.
Agentic PCB Design Sucks (github.com via hn) HPM Component Registry A community-driven, open-source registry of common PCB components — symbols, footprints, datasheets, and structured electrical specs — designed to be read both by humans browsing for parts and by AI agents resolving…
Ask HN: How are agentic workflows meant to offset AI debt? (news.ycombinator.com) I don't know quite how to put it. But projects I inherit and am supposed to get over the line have this same strange quality: they are 'undesigned'.
Show HN: AgentShield – Stop AI agents from spending money unsupervised (agentshieldv2-dashboard-production.up.railway.app via hn) I'm a recent grad from UMich and built AgentShield because agentic AI is moving fast but payment safety hasn't caught up. Agents are already being handed API keys, stablecoin wallets, and payment credentials - if one misbehaves, gets promp…
The Agentic Loop (hypnodrones.com via hn) The agentic loop First, we offloaded knowledge to writing. Then, we came for means of production.
Building an AI agent with OpenAI tool use — struggling with consistency. How do you enforce tool call order reliably? (www.reddit.com) Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…
Understanding, Analyzing, and Optimizing Agentic AI: A CPU-Centric Perspective (arxiv.org via hn) Agentic AI serving converts monolithic LLM-based inference to autonomous problem-solvers that can plan, call tools, perform reasoning, and adapt on the fly. Due to diverse task execution need, such serving heavily rely on heterogeneous CPU…
Qwen3-Coder-Next-UD-Q4_K_XL vs. Qwen3.6-27B-MTP-UD-Q4_K_XL on Strix Halo (www.reddit.com) I wanted to switch from Qwen3-Coder-Next-UD-Q4_K_XL to Qwen3.6-27B-MTP-UD-Q4_K_XL for local agentic coding. The Qwen3.6-27B is perceived to be "smarter" than Qwen3-Coder-Next, and I wanted to "upgrade" my local AI coders.
Show HN: Nano-RAG – Agentic multi-hog retrieval without graph database (news.ycombinator.com) https://nanorag.nb1t.sh/ Important: Please choose correct namespace from top-right dropdown. Available docs/namespaces: Cloudflare, Nextjs, and Dodo-payments (default).
Show HN: Agentic simulator for marketing email A/B testing (inbox-wars.com via hn) I built an agentic simulator for marketing email a/b testing using a fleet of "digital twin" customers. why build this?
Hershey Bets on Agentic AI to Rethink $2B in Marketing Spend (www.adweek.com via hn) Hershey is revamping one of marketing’s oldest measurement tools—marketing mix modeling—by enlisting agentic AI in a bid to turn what has historically been a slow, backward-looking process into something closer to real-time. The confection…
UGen: An Agentic Framework for Generating Microarchitectural Attack PoCs (arxiv.org via hn) Microarchitectural attacks continue to evolve, uncovering new exploitation vectors in modern processors. From a defensive perspective, assessing a system's susceptibility to such attacks remains challenging.
Have you tried Agentic analytics tools? (mitzu.io via hn) TL;DR Compare the best AI analytics tools in 2026 across semantic-layer trust, no-hallucination reliability, SQL transparency, and team fit. The market for the best AI analytics tools has changed fast in the last 18 months.
How to Learn Agentic AI in 2026 – Without Getting Lost in Hype (simplai.ai via hn) How to Actually Learn Agentic AI in 2026 — Without Getting Lost in Hype Most AI courses teach you theory and leave you stranded before deployment. SimplAI University is built differently — 11 structured chapters, real tools, and a communit…
Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 (www.reddit.com) I'll be UPDATING this as it seems I was benchmarking and testing Just before the UPDATE LOL TL;DR If you're running rigid agent frameworks locally with mtp on consumer hardware: drop your draft window to 3, lock parallel slots to 1, and co…
Did anyone here did the certification: GitHub Certified: Agentic AI Developer (beta) (www.reddit.com) Hello everyone, I wanted to ask if anyone here got the certifcation GitHub Certified: Agentic AI Developer (beta) or was thinking of getting it? What do you think about it?
ik_llama: Qwen3.6 27B and 35B on very low VRAM (www.reddit.com) Thank you to the people at ik_llama and llama.cpp. It's amazing how far you've all pushed mtp and other tech so that I can run 27B and 35B Qwen3.6 models on an old gaming laptop with a RTX2060 mobile at 6GB VRAM and 32GB RAM.
Agentic Trading with Safe Guardrails (github.com via hn) Agents can do almost everything now. Except trade.
Zhengkid/AutoTTS: Agentic Discovery for Test-Time Scaling (github.com via hn) AutoTTS LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang UMD ·…
Show HN: Vyvoice: Privacy-first, cross-platform, offline voice transcription app (vyvoice.com via hn) Hey Hacker News, vyvoice is a cross-platform, offline voice transcription app I started working on in December as a Windows user tired of every good dictation app being Mac-only. Beyond transcription, it has built in support for voice comm…
Show HN: Agentic product discovery for AI apps and shopping agents (www.seekon.me via hn) Agentic Catalog Intelligence. Empower your AI models with precise, real-time product discovery.
Show HN: Markanywhere – A Streaming Processor of Meanings (github.com via hn) Markanywhere can parse any input, like Markdown, HTML, XML, as a stream of semantic events which can be rendered, transformed, evaluated. Works great as an interactive transport layer for the LLM inference output and agentic feedback loops.
Show HN: Claurst – Rust-Based OSS Terminal Coding Agent Now in Beta (github.com via hn) CLAURST Agentic Coding for Builders who Ship Claurst is an open-source, multi-provider terminal coding agent built from the ground up in Rust. It started as a clean-room reimplementation of Claude Code's behavior (from spec) and has since…
I almost broke the one rule that separates agentic coding from vibe coding (www.reddit.com) I built an opinionated multi-agent setup on top of Claude Code. I was proud of two agents in particular: a software engineer doing red-green TDD, and a separate tester running the adversarial edge-case pass.
Built a tool that stops AI agents from being hijacked by malicious content in webpages and emails (www.reddit.com) Been working on a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries at the proxy level.
Show HN: Building a universal device experience [video] (www.youtube.com via hn) I have been working on this project for a few years now and the end goal is to make a ubiquitous and natural user experience to interact with machines. The long term goal is to build a fully agentic experience that drives the UI for you (g…
Show r/AI_Agents: Stop your agents from breaking tool calls in production — we built a reliability layer for 2,000+ APIs (www.reddit.com) We built a CLI that sits between AI agents and production APIs — handles auth, retries, compliance, and idempotency automatically across 2,000+ APIs. Give your agents capability of multi-tool calls with 100% accuracy.
Looking for your experiences in agentic scraping social profiles (www.reddit.com) Based on your experience, which agentic workflows has everyone had the most success using to extract public profile data from Instagram and Facebook? I've seen previous discussion here about n8n and OpenClaw, and I'm looking for the latest…
Looking for affordable alternatives to Claude Team / Claude Code for a small dev team (heavy agentic usage) (www.reddit.com) We run a small software services company and we’ve been heavily using Claude (especially opus + Code features) for the last few months. The problem is: We need to share the account between 6-8 developers Anthropic keeps suspending our Max/…
What is the best ai engineering course right now for agentic ai (www.reddit.com) Everywhere i look ppl are talking about agentic ai now… feels like basic gen ai stuff is already saturated. but trying to figure out how ppl are actually learning this beyond surface level… youtube kinda stops at demos.
Most teams optimize the prompt. Agentic systems have more moving parts (www.aevyra.ai via hn) On LinkedIn last week, an AI practitioner I know made an observation I keep thinking about: hill-climbing on evals tends to leak information specific to those evals rather than improve the system. Their follow-up question: "What if you hil…
Authorization Bypass in AWS's Agentic AI for Enterprise: Amazon Quick (www.fogsecurity.io via hn) We discovered an authorization bypass in Amazon Quick’s AI Chat Agents that allows users to access and interact with AI agents despite explicit administrative restrictions. AWS responded by deploying a fix without notifying customers, clas…
Choosing the Right Agentic Design Pattern: A Decision-Tree Approach (machinelearningmastery.com via hn) In this article, you will learn how to apply a structured decision tree to choose the right agentic design pattern for any AI system you are building. Topics we will cover include: Why pattern selection is a critical design decision, and w…
A fully autonomous browser runtime for any AI agents (github.com via reddit) Built (with Claude) an open source, fully autonomous browser runtime for agents. One critical issue I faced (I guess most of us do) is the inability to have a robust web search feature and this will help you direct towards that goal I hope…
Zig vs. Rust, agentic coding, and intellectual control [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Useful AI agents / tools for client meeting management? (www.reddit.com) Hey y'all, I've been working towards automating different sectors of my agency each week, and this week it’s meeting workflows. I know about AI note-takers but it seems like most of them are just passive recorders that leave me with a long…
Mergecrew: Open-source agentic SDLC with human-gated prod deploys (github.com via hn) Mergecrew Autonomous product team in a box: every day, mergecrew specifies, designs, builds, deploys to dev, scans for bugs, and hands you a digest to approve before anything reaches production. Mergecrew is the open-source platform for ru…
My Agentic Engineering Scorecard (www.meadow-notes.com via hn) agentic engineering scorecard Over the past few months, I’ve gradually converged on a highly iterative style of agentic software development, shying away from the "dark software factory" approach. This post explains why I've made the move,…
Industry academia disconnect (www.reddit.com) Hi all, I do a lot of work with academic and industry partners in engineering applications. Therefore I end up having a lot of conversations with people around agentic AI for engineering.
Physics-intern: an autonomous agentic framework for physics research (huggingface.co via hn) Your Article Title Built with the Research Article Template. Quick Start cd app npm install npm run dev Visit http://localhost:4321 to see your article.
TigrimOSR v0.4.1: Running AI agents headless on a remote server, controlled by a fast local Rust UI (www.reddit.com) Hi everyone, I’ve been working on TigrimOSR v0.4.1, a Rust-native version of TigrimOS, and I’d like to invite people to try it and give feedback. The main idea is: Run the agent system headless on a remote machine, then connect to it from…
As agentic dev tools boom, workflow auditability becomes the constraint (thenewstack.io via hn) As agentic dev tools boom, workflow auditability becomes the constraint Recently, I was working with a senior engineering leader at a large financial institution to review their DevSecOps platform engineering roadmap. Their team had deploy…
Show HN: RipStop – Git guardrails to reduce impact if your code agent goes wild (github.com via hn) Hi all, RipStop is a node package implementing a set of rules that consumers can use to protect their repos from wilder actions by LLM agents. A consumer needs only a few lines of code to configure the rules they wish to apply.
The AI market moves so fast that your business idea can expire before launch (www.reddit.com) 1.5 years ago, n8n was everywhere. People were building workflows for everything.
Do you think foundational model companies will take over all agent businesses? (www.reddit.com) Do you think they will end up learning the most painful workflows from enterprise customers and built all the most necesary agents for the smaller guys themselves? In other words, squeezing out all the agentic companies out there?
What do you NOT like about Cursor / VSCode / Claude Code desktop / Codex / etc.? (news.ycombinator.com) I am building a highly integrated, cross-provider agentic workstation (its neither an IDE nor an ADE - does a bit of both, with additional unique features on top), and I would love for you guys to rant about what you hate about the tools y…
Thousands of apps built with Agentic AI platforms like Lovable, Replit, Netlify, and Base44 are exposing private data (www.reddit.com) A new investigation by Israeli cybersecurity firm Red Access found thousands of AI-generated web apps leaking data ranging from medical records to internal business documents. The findings add to mounting concerns about vibe coding, a fast…
Automata and AI (www.reddit.com) Hello, I have been working on a new programming language for creating state machines. I’m curious how the structure automata provide might be useful with MCP and agentic workflows.
Show HN: I've implemented multi-repo workspace support in Agent of Empires (github.com via hn) Coding agent management is all the rage right now, and many tools are being created to fill the gap. As a power user for all tools I've used since I've started my software engineering career, I've always taken the time to test multiple too…
Agentic AI is giving cyber criminals nation-state-like powers (www.defenseone.com via hn) Pentagon leaders love agentic AI. But it’s giving cyber criminals nation-state-like powers As new tools change cybersecurity, just moving faster won’t be enough.
Agentic AI vs. AI Agents: The Governance Shift (rootcx.com via hn) Open any vendor pitch from the last 6 months and somewhere in the deck, you'll see the word agentic. It's been a marketing term for so long that most engineering leaders have started treating it as noise.
Show HN: Agentic productivity platform for high perfomers (www.mainthread.app via hn) Finally on top of things. Mainthread unifies every commitment across work, family, and household into one intelligently prioritized system — then deploys AI agents to handle what doesn't need you.
LLM as logic processor, filesystem as memory — Q2 quant doing real agentic coding 50k context (www.reddit.com) Hello LocalLLaMA subreddit, i have been running local models for coding tasks and kept hitting the same problems everyone does — the model writes an 800-line file in one shot and half of it is garbage, it spirals in its own reasoning for 4…
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? (github.com via hn) VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? An agentic loop that synthesizes bespoke LLM serving systems — one per (model, hardware, workload) target — instead of forcing every deployment through a single general-purpose ru…
The missing primitive in every agent harness is a protected region (www.reddit.com) I wrote a post about why agentic coding falls off a cliff after a few weeks. Coding agents have no equivalent of the source/assembly boundary a compiler gives us.
I built agentwerk, a tiny Rust crate for scaling agent collaboration focusing on getting work done (www.reddit.com) For a new Rust project, I was searching for a simple agentic loop implementation. My goal was to analyze thousands of software artifacts at scale.
I built a context window optimization framework for coding agents — open source + paper (www.reddit.com) Been working on a problem that I think a lot of people here face: agentic coding pipelines blowing through their context window way too fast, losing important information, and degrading task quality mid-session. Apohara Context Forge is my…
I put Claude Code inside Obsidian as a plugin — full agentic vault access with a native UI bridge (www.reddit.com) could not extract summary
I asked 20 Agentic Aai founders how they handle agent access. 17 said temporary workarounds. (www.reddit.com) Over the last few weeks I’ve been doing something that probably sounds a bit obsessive. I reached out to founders and engineers who are shipping AI agents into production agents that touch CRMs, sales automation, ai chatbots, payment APIs,…
Show HN: Make your codebase agent ready (github.com via hn) A set of Claude Code skills to assess and improve the agentic readiness.
Powering the Inference Era: Inside the DigitalOcean AI-Native Cloud (www.digitalocean.com via hn) By Vinay Kumar, Chief Product & Technology Officer I’ve spent the last fifteen years building cloud services: early days of AWS building S3 and EBS, helping launch Oracle Cloud Infrastructure from inception, and now building the agentic cl…
Ask HN: How do you give estimates in the age of Agentic coding (news.ycombinator.com) Back in the day you would get a rough estimate of how long a new feature might take once you had worked on a codebase for long enough. You knew how the internals worked, how much time it would take to design the solution, how fast you coul…
Should we use a non-thinking model for code after using a thinking one for plan? (Agentic coding) (www.reddit.com) I usually use Qwen3.6 27B (slow as heck on my RX 6800 but it works) for plan and Qwen3.6 35B A3B for the coding. But I was thinking the other day if I should remove the thinking from the code model.
Ask HN: What is the underlying stack behind multi-agent platforms? (news.ycombinator.com) Recently, I am seeing lots of startups with multi-agent platform, where you can create your own agent template, attach tools and run it reliably. Which frameworks, platforms are you using for these kind of multi-agentic platforms?
ABA Games (1D Pac-Man, etc) Agentic Gamedev Skills (github.com via hn) Agentic Gamedev Skills English | 日本語 This repository collects agent skills extracted from game-development work and related agentic-workflow research. Each skill lives under .agents/skills/, uses SKILL.md as its entry point, and may includ…
Meta plans advanced 'agentic' AI assistant for users (www.reuters.com via hn) paywalled
Show HN: Stagewise – Agentic IDE for Your Z.ai/DeepSeek/Moonshot Subscription (github.com via hn) The Open Source Agentic IDE for Developers English | 简体中文 | Deutsch | 日本語 | Español | 한국어 /_components/feature-images/full-demo-dark.png) About the project stagewise is an open source agentic IDE for developers with a coding agent built ri…
Show HN: Slate – agentic pre-production studio for solo Youtubers (useslate.app via hn) I built slate as a personal tool to centralize my strategy, research, scripting, thumbnails and shots in one place. Started showing it to other youtubers and that made me wonder if more people could have the same problem as me.
Is GraphQL the Panacea for Agentic AI? (magiroux.com via hn) It was evident that GraphQL would be touted as the ultimate API style for agents. After all, it is one of the only ways we expect an API style to stay relevant these days.
Open Sourcing Our Platform - GuideAnts Notebooks (www.reddit.com) This is yet another agent harness and UI and I hope you will have a look and consider contributing. Elumenotion/GuideAnts: GuideAnts Notebooks.
Anthropic response to 1-click pwn: Shouldn't have clicked 'ok' (www.theregister.com via hn) MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…
"Surface" a Governed AI-Agentic Surface (news.ycombinator.com) A continued work in progress https://github.com/pauljbernard/sbcl-agent-desktop and https://github.com/pauljbernard/sbcl-agent an implementation of the ideas discussed in: The Evolution of Software Scale https://www.amazon.com/Evolution-So…
Subjective: Building a Native VFX Editor with Agentic Coding (sxp.studio via hn) This blog post is about my process and learnings in using agentic coding to ship a project with higher complexity than your usual vibe-coded todo app. You can download the app on iOS/iPad/macOS here: Subjective.
Mistral Medium 3.5 Is Now Available in Puter.js (developer.puter.com via hn) Mistral Medium 3.5 Is Now Available in Puter.js On this page Puter.js now supports Mistral Medium 3.5, the new flagship merged model from Mistral AI that unifies instruction-following, reasoning, and agentic coding into a single set of wei…
Starting with Agentic AI (iscinumpy.dev via hn) AI suddenly passed the “more time saved than spent” point around December 2025. A little late, I’ve finally started using agentic AI in various places over the last 2-3 months, and wanted to jot down my thoughts on what works, what doesn’t…
Understanding agentic workflows (www.reddit.com) I tried developing workflows using github copilot in order to create an multi-agent orchestration for a use case about creating research paper based on user’s need. However, there is no supported mechanism for subagents to spawn custom sub…
Two OpenClaw Agents Negotiate a YC SAFE with Agentic Power of Attorney (www.juanfiguera.com via hn) Two OpenClaw agents negotiate a YC SAFE with Agentic Power of Attorney I gave an AI agent access to act on my behalf on a third-party platform a few months ago. Within about ten minutes I realized I was scared of it.
Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards (arxiv.org via hn) Large language models (LLMs) have demonstrated strong potential in agentic tasks, particularly in slide generation. However, slide generation poses a fundamental challenge: the generation process is text-centric, whereas its quality is gov…
Forced opening into Claude Code mode? (www.reddit.com) What the is with Cursor now opening in this stupid Cursor Agents mode that looks like Claude Code? I didn't ask for this, and I don't want to have to figure out how to stop it opening like that and click the "Editor Mode".
Chasing AI Memory SOTA: Beating the Benchmark, Missing the Point (xmemory.ai via hn) Chasing AI memory SOTA: Beating the Benchmark, Missing the Point 66.88%, 80.1%, 85%, 90.79%, 93%, 91.69% and even 100% — what do all these numbers have in common? They’re all state-of-the-art (SOTA) scores on various agentic memory benchma…
Global online hackathon for building AI agents with perception + memory (May 16–18) (www.reddit.com) Agents are moving into browsers, apps, meetings, dashboards, and code editors. The next generation of agents will need more than text context — they need to see what is happening, hear what is being said, remember important moments, and ac…
Is there tool that helps me validate my AI business idea? (www.reddit.com) I'm a product manager for a small business and I'm working on a product idea in the field of agentic AI. I have been chatting a lot with Gemini and ChatGPT but at some point they just keep telling me how great my idea is.
Architectural Framework for Agentic AI in Identity and Eligibility (wwps.microsoft.com via hn) Architectural Framework for Agentic AI in Identity & Eligibility By Prabhaker Cirium, Prin Consultant at Microsoft and Sajal Mukherjee, Senior Consultant at Microsoft Leveraging Azure AI to Revolutionize Citizen Onboarding and Benefits Eli…
We built an agentic AI for support triage. 47% deflection in 90 days. Full retro. (www.reddit.com) Setup: mid-size SaaS, ~3,000 tickets/month, 6 agents drowning. 70% of volume was tier-1 (passwords, billing, where's-my-feature).
Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development (www.reddit.com) tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?
A Grand Challenge for Reliable Coding in the Age of AI Agents (arxiv.org via hn) Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: \emph{does the generated code actually do what the user intended?} The gap between informal natural language requirements and precise pro…
Dev Environment for Agentic Coding (adek.io via hn) Dev environment for agentic coding Standardized dev environment for the agent coding era is how you get multiplier on top of your coding agent I am not here to talk or praise coding agents. I am here to talk about the next multiplier which…
AI subscriptions need a reliable meter (www.reddit.com) TLDR; “A gallon should be a gallon. A mile should be a mile.
Ling 2.6 (Flash and 1T): Efficient Open Models Competing on Agentic Benchmarks (firethering.com via hn) Ant Group doesn't get the coverage it deserves. While the open source AI conversation in the West circles around DeepSeek and Qwen, Ant Group has been quietly building a model family that competes directly with the models everyone is talki…
Agentic AI Community 2026 (simplai.ai via hn) Free, self-paced courses covering everything from agent fundamentals to real-world deployment. 50+ hands-on lessons designed for both technical and non-technical learners.
Skelm – Build AI agents in TypeScript without losing your mind (github.com via hn) skelm Build secure, agentic, long-running workflows in TypeScript. Run them anywhere Node runs.
The Figure-Eight Model for Agentic DevEx (medium.com via hn) The Figure-Eight Model for Agentic DevEx | by Joe Kutner | May, 2026 | Medium Sitemap Open in app Sign up Sign in Get app Write Search Sign up Sign in The Figure-Eight Model for Agentic DevEx Joe Kutner Follow 5 min read · 1 day ago 2 List…
tested four newest open source Kimi K2.6 is the fastest, GLM 5.1 the fanciest, DeepSeek V4 is the most comprehensive, and Xiaomi MiMo is the slowest (www.reddit.com) Architecture explains the gap: MiMo's MoE runs more active params per token than Kimi K2.6's optimized routing hence slowest. DeepSeek V4's 'comprehensive' edge is partly MLA: ~75% KV-cache compression makes it far better for long agentic…
My list for Top Agentic Frameworks - Looking for feedback on any that are missed, or theme to be addressed more fully (www.reddit.com) In 2026, AI agents have moved from hype to production reality. Teams are no longer asking if they should deploy agents.
Agent Orchestration Models (news.ycombinator.com) We are using Symphonic Orchestration (models) for our agentic commerce platform (hive of clawdbots building databases) and wanted to know what folks thought of our approach and also to learn about alternatives.
The future of company architecture (www.reddit.com) I've been in AI for over 10 years now and toyed with GPT2 when I was doing NLP work and really recognized the power of LLMs as a way to drive automation after spending time trying to build agents with GPT3.5. As time as gone on I've become…
A Mental Model for Agentic Work (basti.io via hn) Blog A Mental Model for Agentic Work May 5, 2026 - AI Agents - Company Operations - Software Engineering Something shifted in the first quarter of 2026. Not a feature launch, not a new product - a structural change in how work happens.
Show HN: Kanban-CLI – a web UI for local Markdown todo lists (github.com via hn) As we all are, I've been experimenting with ways to reduce external saas spend, and continually bring traditionally external pieces of context (prs, docs, trello boards) into the one mono repo. I have toyed with a markdown todo list and se…
Five Eyes spook shops warn rapid rollouts of agentic AI are too risky (www.theregister.com via hn) Five Eyes spook shops warn rapid rollouts of agentic AI are too risky Prioritize resilience over productivity, say CISA, NCSC and their friends from Oz, NZ, Canada Information security agencies from the nations of the Five Eyes security al…
PyFlue – Python-Native Agent Harness Framework (Python Clone of Flue) (super-agentic.ai via hn) Full-Stack Agentic AI Company We build deeply technical agent developer tools, purpose-built for agent experience and agent engineering at scale. Our research lab explores the frontier where Agentic AI meets Quantum AI.
UAE Plans to Run 50% of Government on Agentic AI Within Two Years (www.mitsloanme.com via hn) UAE Plans to Run 50% of Government on Agentic AI Within Two Years Agentic systems will analyze, decide, and execute across ministries under centralized oversight. News - Oman to Scale AI Ecosystem With New Special Economic Zone - UAE Bets…
Agent Evals is an absolute nightmare, so I built Signals to reduce the noise and cost (www.reddit.com) Hey peeps - I think the hardest thing about building agents is their evaluations. especially for scenarios that require multiple tool calls and the agent itself can go down a trajectory that you haven't manually tested before.
Show HN: Enoch – Control Plane for Autonomous AI Research (github.com via hn) I built Enoch after working with OpenClaw and trying to get an agentic coding system setup with Codex. In the past, I was trying to manually generate, code, and test this all manually.
I solved my problem and hope your also (www.reddit.com) I am an AI engineer. I build more AI agents, Agentic AI systems.
CISA, NSA & Five Eyes publishes guide on how to safely deploy AI agents (cyberscoop.com via hn) Cybersecurity agencies from the U.S. and allies issued a joint warning Friday on the risks of "agentic AI." The new guidance urges critical infrastructure leaders to implement zero-trust protocols as autonomous systems gain unmonitored acc…
Tried running Claude Code with local LLMs via Ollama — ended up subscribing to Pro anyway. But now I can't disconnect from the local server. (www.reddit.com) I've been experimenting with using Ollama to run Claude Code locally with models like Gemma 4, thinking I could avoid API costs. However, I quickly realised these models aren't really optimised for Claude Code's agentic workflows — they te…
Which Agentic Coder is the most with it now? (www.reddit.com) Considering the price to performance which is the best deal or setup right now? Similar to codex where it can edit project files inside a folder etc.
Show HN: Large Scale Article Extract of Newspapers 1730s-1960s (snewpapers.com via hn) Hello HN, over the past 7 months I've spent nearly 3,000 hours on building SNEWPAPERS, the first historical newpaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy and of course with semantic and age…
Are you all still managing multiple agent sessions manually? (www.reddit.com) I feel like my current “agentic workflow” is kind of broken. Right now I open Superpower and run like 4–5 Claude Code sessions in parallel… but it just feels super disconnected.
I used Claude to build "pin-llm-wiki" — A skill that turns any URL into a clean, citable Karpathy-style LLM Wiki (github.com via reddit) Hey 👋 I’ve been using Claude Code a lot for personal research and knowledge management, and one thing kept bothering me: Turning articles, YouTube videos, and GitHub repos into clean, structured, citable notes is tedious. So I built pin-ll…
Is agentic commerce really APIs… or dynamic UIs like this? (www.reddit.com) https://preview.redd.it/2abn96dwudyg1.png?width=1642&format=png&auto=webp&s=ab5facbd9f4223184834711346dca2bc64db20d3
Anthropic wants to be the AWS of agentic AI (thenewstack.io via hn) Anthropic's Managed Agents platform bundles sandboxing, checkpointing, and persistent memory into a single API layer — and the company's ambitions look a lot less like a model provider and a lot more like AWS.
Running Local Agentic PDF Search with Eno (enopdf.com via hn) eno can drive its full agentic search against a local, open-weight model running on your own hardware. When you do, your PDFs, your queries, and every intermediate step of the agent loop stay on your machine.
Get Your Website/API Ready for Agentic Commerce in 1 Minute (www.startuphub.ai via hn) Free scanner that audits websites, APIs, and MCP endpoints across 7 categories — discoverability, content, access control, capabilities, commerce (x402-mesh), and quality. Public leaderboard, open spec, paste-ready fix prompts.
OpenAI + agentic systems (DFW) (www.reddit.com) i’ve been using OpenAI tools more heavily lately and keep circling back to the same shift: moving from simple chat use into agentic systems. Most people still seem to be using it for Q&A or basic content help, but there’s a lot more happen…
My agent works 3 times… then randomly skips steps and breaks. Same input. Why? (www.reddit.com) I’ve been deep in the trenches building out multi-step agentic workflows, and I’m hitting a consistent wall with what I can only describe as "stochastic decay." The pattern is frustrating: Runs 1 through 3 execute flawlessly, but by the fo…
how do you stop people from finding loopholes in your agents once they're in production? (www.reddit.com) agentic demos always look clean in a controlled setup. the problem that I'm pushing toward real volume now and the adversarial side is getting messy fast.
Fixed the risk of agents disclosing your secrets (www.reddit.com) Why is it considered acceptable by most in the community to have API keys sitting on a file system where the agent is running, with direct access to them, gated by a prompt? This is literally the base security model of OpenClaw and most ot…
Letting AI play my game – building an agentic test harness to help play-testing (blog.jeffschomay.com via hn) Vercel Security Checkpoint | sfo1::1777467624-qE4eB4e2LvmbibEDgl5Ljah0zEqW8iFE
Best Practices to Start with Vibe Coding? Best Local Apps for Agentic Vibe Coding? (www.reddit.com) DISCLAIMER: I am not a programmer nor do I have experience coding. I've been thinking about a small app running on gradio for some time now, and I want to try tweaking some extension for ComfyUI.
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture (www.reddit.com) SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental paradigm shift in multimodal AI: from modality integration t…
Genuine question for people who have built multi-agent systems in production. How do you handle context continuity across enterprise tools? (www.reddit.com) I've been going down a rabbit hole lately trying to understand how production agentic systems actually work at scale, not just the demo versions. The part that keeps tripping me up is memory and context management across agents.
Is an agentic Spark copilot worth it? opinions? (www.reddit.com) Running Spark jobs on Databricks with 50+ stages per pipeline. Debugging is still almost entirely manual.
OpenGame: Open Agentic Coding for Games (arxiv.org via hn) Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (…
How are you ACTUALLY running truly asynchronous agentic AI in your business? (www.reddit.com) I'm starting a new company (I will not promote) and I want to hear how you're actually running operations that have little-to-no "human in the loop". Tools like OpenClaw are great for personal use, but how are you leveraging tools/systems…
An open-source platform to auto-update agent skills and discover fresh sources (www.loooop.dev via hn) GitHub obra/superpowers: An agentic skills framework & software development methodology that works. · GitHub GitHub obra/superpowers: An agentic skills framework & software develop… Loop autonomously monitors, evaluates, and updates your a…
The Controllability Trap: A Governance Framework for Military AI Agents (arxiv.org via hn) Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by existing safety frameworks. We identify six…
TealKit – A cross-platform UI for local AI agents and MCP (github.com via hn) # 🐦⬛ TealKit The Privacy-First, Infinitely Extensible Agentic AI Platform for Mobile & Desktop TealKit turns your phone and computer into a powerful agentic AI platform with autonomous agents, built-in tools, and unlimited extensibility.…
Agentic CEO – An AI research organism that hunts, critiques, and evolves itself (github.com via hn) Agentic CEO An autonomous multi-agent research system that acquires knowledge, builds a persistent worldview, and improves itself. 3,700+ knowledge entries.
I've got a feeling that Llamacpp is not the biggest performance bottleneck, but it might be the OpenCode. (www.reddit.com) It looks as if OpenCode introduces an artificial delay in agentic coding. Have you noticed similar issues?
Apple integrates Claude and Codex into Xcode 26.3 for 'agentic coding' (venturebeat.com via hn) Apple integrates Anthropic’s Claude and OpenAI’s Codex into Xcode 26.3 in push for ‘agentic coding’ | VentureBeat Orchestration Infrastructure Data Security More Newsletters Apple integrates Anthropic’s Claude and OpenAI’s Codex into Xcode…
Show HN: 49Agents – Infinite canvas IDE for AI agents (github.com via hn) 49 Agents IDE The first 2D agentic IDE. Open source.
The Full-Cycle Agentic Experience (www.reddit.com) The Full-Cycle Agentic Experience What we're missing, and why it matters more than the models themselves. Think about the last time you bought something in a store.
Agentic AI made DevOps and Agile obsolete (avkcode.github.io via hn) The Self Healing Platform and the Agent Store I think DevOps as a separate identity, and a lot of agile ceremony around it, are already a bit obsolete. Engineers are doing development, operations, and lightweight management at the same tim…
Agentic ML engineer. works with Colab. Zero infra needed. 3x faster TurboQuant (github.com via hn) isanagent An always-on, agentic ML engineer for your workspace — built by ALTAI. isanagent doesn’t just answer prompts: it pushes work toward something shippable — research, code, runs, checks, and handoffs you can actually use.
PI agent integrated with Cline-Kanban repo: All using PI and Qwen 3.6 35B MOE UD 4K_XL (www.reddit.com) Repo: statisticalplumber/kanban at pi-agent-integration Hi Guys, To test Qwen 3.6’s potential, I also wanted the Cline Kanban project to have an open-source agent to work with. The last time I tested Cline Kanban, it didn’t support agents…
PAuth – Precise Task-Scoped Authorization for Agents (arxiv.org via hn) The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision.
Agentic Workforce Framework, an operating model for autonomous agent teams (github.com via hn) Agentic Workforce Framework A reference architecture for operating autonomous AI agents as accountable digital workers inside enterprise environments. This framework defines how agents are assigned work, bounded by role, governed by approv…
Show HN: AgentSwarms – free hands-on playground to learn agentic AI, no setup (agentswarms.fyi via hn) Show HN: AgentSwarms – free hands-on playground to learn agentic AI, no setup required!
A 14-day “Growth Forge” sprint: build an AI-powered growth agent on a real stack (www.reddit.com) Sharing something that sits at the intersection of AI agents and growth systems. VideoDB (backend for video/audio for AI agents) is running a 14-day sprint called Growth Forge for 5 builders to design and ship a growth agent on top of an e…
Show HN: I made GAI to have LLM agents in Go without heavy frameworks (github.com via hn) GAI is a flexible Go library for building agent-style applications on top of LLMs. It provides a generic interface for providers and models, prompt and context helpers, and a loop for agentic-calling workflows.
Mario & The Intent-Bearing Agentic Loop (www.reddit.com) Q: When do I need Agents vs. Skills vs.
RTX 3090 + 27B model performance issues (llama.cpp) what am I doing wrong (www.reddit.com) Hey folks — looking for some advice on improving my local LLM setup (and also exploring agentic coding workflows). Current setup: GPU: RTX 3090 (24GB VRAM) RAM: 64GB Using llama.cpp with a Qwen3.6 27B Q6 model (GGUF) Running through OpenCo…
Show HN: Mdspec – auto sync your md files from GitHub repos with wikis (mdspec.dev via hn) We do generate a lots of md files along with our agent based development. Skills, Agent.md, Docs etc.
SimpleBanking sb CLI – Query real German bank accounts from the terminal (balances, transactions, categories, JSON output) (www.reddit.com) Hey r/AI_Agents, I've been building SimpleBanking, an open-source macOS banking app for German bank accounts using the FinTS/HBCI protocol (the standard used by German banks like Sparkasse, Volksbank, DKB, etc.). It now ships with a full C…
What are the limits of the agentic computer features on 5.5? (www.reddit.com) Is this supposed to be an OpenClaw / Hermes Agent competitor ? Am I able to ask it to go on my browser, visit a site I’m logged into and gather info?
Working with Claude Code: A Field Manual (blog.iannelson.uk via hn) Earlier this week I published a reflective post on how agentic coding has changed my working day and the shape of the profession. I now want to turn to the other side of that coin: not the philosophy, but the mechanics — the habits, the wo…
Agentic Company OS update: project-scoped runtimes, governance UI, snapshots/replay, skills, and operating models (www.reddit.com) I shared this project here before when it was mainly a governed multi-agent execution prototype. I’ve kept working on it, and the current implementation is materially more complete, so I wanted to post an update with what actually exists n…
Building a full-stack app with Wasp, an agent-friendly web framework (wasp.sh via hn) From 10 Failed Stacks to Production: How a Data Scientist Built a Job Board with Wasp, a Full-stack Framework for the Agentic Era Hireveld is currently down while Marcel works on a major refactor - but it's real, we swear! It'll be back up…
Is anyone else way faster with AI in familiar stacks and way slower in unfamiliar ones? (www.reddit.com) Been using agentic coding workflows seriously for about a year now and I've finally figured out the pattern behind why it feels magical half the time and broken the other half. At my day job, where I know the stack and have intuition about…
Show HN: We built Cursor, but for data transformations [Open Source] (github.com via hn) Agentic & No-Code Data Transformations Vibe coded pipelines: say hello to accuracy and maintainability. Website · Documentation · Issues · Contributing What is Visitran?
Google's 8th Generation TPUs Power the Agentic Era [video] (www.youtube.com via hn) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
Speeding up agentic workflows with WebSockets in the Responses API (openai.com via hn) could not extract summary
Show HN: API Ingest – Agentic Search (Inter) API Docs (github.com via hn) 1. CC / Codex dont handle API Docs well enough No matter what I do, I run into bad requests with claude, day in, day out.
Show HN: Sift – save AI tokens in Codex/Claude by summarizing command output (github.com via hn) I made a small skill/script for agentic coding workflows: https://github.com/panpeter/sift-skill The idea is simple: when a command like cargo test, pytest, npm test, or ./gradlew test prints a lot of output, that raw log often gets pulled…
Show HN: We open-sourced a 6-library governance stack for AI agents (Python) (news.ycombinator.com) Our team has been deploying AI agents in enterprise environments for the past 2 years, across 60+ deployments. The same governance problem kept recurring: how do you certify reliability, enforce policy, route and orchestrate context, monit…
Cursor partners with SpaceX on model training (cursor.com via hn) Cursor partners with SpaceX on model training Cursor is partnering with SpaceX to accelerate our model training efforts. We released Composer less than six months ago as our first agentic coding model.
How do you decide on chunking strategy and top-k in Agentic RAG? Looking for practical advice (www.reddit.com) Hey, I'm building an Agentic RAG pipeline and struggling with two decisions: Chunking strategy — fixed-size, semantic, or hierarchical? In an agentic setting where the agent can re-query iteratively, does it make more sense to use smaller…
X402 and Agentic Commerce: Redefining Autonomous Payments (aws.amazon.com via hn) Managing context in long-run agentic applications (slack.engineering via hn) The Bitter Lesson of Agentic Coding (agent-hypervisor.ai via hn) MongoDB MCP (www.reddit.com) Using closed financial markets with deterministic goals for agent behavior improvements (www.reddit.com) Which AI Agents SDK allows low latency agents w support for skills etc? (www.reddit.com) Show HN: AI Primer – A Searchable AI Changelog for AI Engineers and Creatives (www.ai-primer.com via hn) I'm completely lost in the Agentic Maze. What level to learn. how to organize stydu (www.reddit.com) Show HN: Agentic Dev – AI dev-tools news, curated daily by Claude (agenticdev.blog via hn) OpenAI released a major update to Codex, used by over 3 million developers weekly, adding background computer use, an in-app browser, image generation via gpt-image-1.5, more than 90 new plugins, GitHub PR review support, SSH connectivity,…
Why AI Agents are bad at “generating a business idea” (www.reddit.com) My opinion is it is a matter of structured approach. Of course when you just ask Claude to “find top apps in AppStore and tell me what app should I build” you will get as generic answer as your question.
Fast local LLM to generate CLI commands from prompt? (www.reddit.com) GitHub copilot CLI used to do this but now it’s a full agentic coding environment. Basically, I can’t remember all the options to every Linux command.
Built a full-stack charitable giving SaaS as a solo developer with agentic AI (www.pifster.org via hn) PIFster - the Pay It Forward Charity Did you know there are 1.8 million nonprofits in America? Most are struggling to be heard, but PIFster is changing that.
[Claude Code] Stuck in 57+ minute loop for routine fixes (Opus 4.7) (www.reddit.com) I'm running into a severe performance hang with Claude Code (Opus 4.7) today. I provided a relatively straightforward prompt to fix some hydration errors, add two stub routes, and perform a theme audit (string replacement).
Cowork Orchestrator Patterns (www.reddit.com) While working in Cowork, I have been experimenting with designing plugins that try to apply some established agentic patterns to help manage the context window. The problem that I'm running into is with Cowork the main orchestrator is the…
What is the simplest architecture for running a multi-agent system at scale? (www.ashpreetbedi.com via hn) Scaling Agentic Software: Part 1 What is the simplest architecture for running a multi-agent system at scale? I want to deploy agents as a real service.
Show HN: Marky – A lightweight Markdown viewer for agentic coding (github.com via hn) Hey HN, In this age of agentic coding I've found myself spending a lot of time reviewing markdown files. Whether it's plans or documentation that I've asked my agent to generate for me, it seems that I spend more time reading markdown than…
Kelvin Claw: A secure, modular agent harness with supply-chain validated plugins (agentichighway.ai via hn) Agentic Highway Team KelvinClaw: A secure, modular agent harness with supply-chain validated plugins An agent runtime designed for zero-trust environments from the ground up. Building secure agent systems at scale is a different problem th…
I built a self-evolving agentic loop that ran 104 iterations autonomously to find questions that break every LLM — here's the architecture (www.reddit.com) Why I built this: I wanted to find the next "strawberry problem" — simple questions any kid can answer but every LLM gets wrong. Instead of manually testing questions, I built a system that does it autonomously.
A Black-Box Contract Engine for Agentic Software Development (github.com via hn) Project Dojo A Black-Box Contract Engine for Agentic Software Development Dojo is a declarative testing engine built in Go. It acts as a transparent Man-in-the-Middle proxy between your Software Under Test (SUT) and its dependencies.
Ask HN: We dont need a programming language now? (news.ycombinator.com) I've seen agentic IDEs now Cursor or Antigravity and main trends seems to be development with just ideas, where although the changed lines are shown, its becoming less and less visible. If we are becoming language agnostic, shouldn't we op…
Solving the "Agentic Kill-Switch": Moving from Prompt Guardrails to a Python-native Safety SDK (www.reddit.com) The biggest hurdle for taking agents from "cool demo" to "production tool" is the lack of a reliable circuit breaker. We're currently relying on the LLM to "behave" via system prompts, but as we know, jailbreaks and hallucinations make tha…
Ask HN: Which LLM model and agentic CLI are you using for local development? (news.ycombinator.com) I’ve been testing a handful of models the past few weeks, but I still haven’t settled on one yet… I’m curious to see what models, their sizes, on what hardware, and which agentic tool people are using
Scaling from single-repo Claude projects to a multi agentic workflow (www.reddit.com) Hi everyone! Just a quick exchange on what I am using — and I'd love your take on it 🤖 So far I have mainly been doing one-off projects, setting up Claude in a single repo at a time.
Ask HN: What standards or protocols exist for AI Agent permissions (news.ycombinator.com) Curious what standards exist for AI agent permissions. Something like Linux read, write, execute types, but for AI agents.
The (Mostly) Agentic SDLC (amoshaviv.com via hn) Monday, 12:00. Grace, the CEO of ACME Corp, just finished her Q2 leadership meeting.
Tradclaw: an open source AI mom for agentic parenting (twitter.com via hn) My family assistant "Finley" is a full fledged member of the household , and I just open sourced her for all the Very Bad Moms and Dads ™️ out there that just need a little 🤖 help. Wanna get started right away?
Is qwen3 coder next still relevant with qwen3.5 release for agentic coding? (www.reddit.com) Basically the title. I know it will depend on your quant, but with 48gb of vram inbound, I'm curious on the communities opinion before I get the chance to vibe check.
What are the key features that make an AI system truly "agentic"? (www.reddit.com) Here's the cleanest breakdown I've seen: Autonomy – Acts without constant human prompting Goal-Oriented Behavior – Works toward defined outcomes, not just single responses Adaptive Learning – Gets better from outcomes over time Multi-Step…
Show HN: A Bomberman-style 1v1 game where LLMs compete in real time (github.com via hn) A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.
Show HN: On-Device vs. Cloud LLMs for Agentic Tool Calling in a Real iOS App (subralabs.com via hn) We built an AI concierge into a resort directory app for iOS. The feature needed to search a dataset of ~85 properties, apply filters, find nearby airports, and respond conversationally in Italian.
Agentic Search Leaderboard (www.algolia.com via hn) We tested every major LLM on real shopping queries through Agent Studio, Algolia's platform for building search and discovery agents. Three dimensions of quality.
OpenClaw Self-Improvement Loop: adversarial agentic self-modification workflow (github.com via hn) An adversarial framework for AI agent self-modification, built and battle-tested in production. Inspired by karpathy/autoresearch.
1 year of LLMs writing code for me (www.alexarvanitidis.dev via hn) 1 year of LLMs writing code for me Published 5 days ago I have been an early adopter of AI coding tools. When the first serious agentic coding tools launched, I picked them up immediately and made them my daily driver.
Agentic dashboard analysis (www.reddit.com) Hi all Like most of us the execs at my company are big into AI. I saw a potential implementation to get myself more experienced with agents by having an agent perform a daily analysis on a dashboard to perform summaries and anomaly detecti…
Show HN: A better alternative to CLI and MCP for local tools (github.com via hn) I've created an alternative to CLI and MCP for locally running agentic tools. It uses Unix-based OS's named pipes, which means the client has quick IPC with the tool and it can have in-memory state.
Observing the shift toward open-weight models for agentic coding workflows (www.reddit.com) I've been practically evaluating some of the recent open-weight mixture-of-experts models, specifically focusing on their application in complex software engineering and agentic coding workflows. established pattern has typically involved…
Is my 'Retry Tax' math correct for DeepSeek V3/V4 agents? (Project Feedback) (www.reddit.com) Show HN: Joka.work – AI-native ticketmaxxing to replace Jira in the agentic era (joka.work via hn) Chaveta – Agentic Synthetic Data Curation Platform (chaveta.beaglabs.com via hn) Chaveta is a agentic dataset generation platform designed to streamline the creation of synthetic data for training and robotics applications. With Chaveta, users can easily request, classify, compile, author, validate, repair, and export…
Show HN: Cate – open-source canvas IDE for agentic coding workflows (cate.cero-ai.com via hn) An infinite zoomable canvas where terminals, editors, and browsers float spatially. Code the way you think.
Agentic surface area as an operating metric (arizenai.com via hn) Your Company's "Agentic Surface Area": The New Metric for Competitiveness Your CEO asks: "How much of our operation is AI-powered?" The uncomfortable part is that the question sounds simple and usually has no clean answer. Teams can name p…
CoAnalyst360 Multi-Agent AI Platform for Investigative Questions (www.penlink.com via hn) CoAnalyst360 Launch: Penlink's Agentic AI for Investigations | Penlink We value your privacy This website or its third-party tools process personal data. You can opt out of the sale of your personal information by clicking on the “Do Not S…
Show HN: Storytime – Continuity for Claude Code (and other ideas) (1ps0.info via hn) Since LLM harness (claude code included) are moving fast, I figured it would be better to put this out than wait to validate each and every claim. I crammed a lot of ideas in here!
Configuring Agentic AI Coding Tools: An Exploratory Study (arxiv.org via hn) Agentic AI coding tools increasingly automate software development tasks. Developers can configure these tools through versioned repository-level artifacts such as Markdown and JSON files.
HPE ProLiant Compute DL394 Gen12 Brings Nvidia Vera CPU to Agentic AI (www.storagereview.com via hn) At COMPUTEX 2026, HPE announced the ProLiant Compute DL394 Gen12, a next-generation 2U server built around the NVIDIA Vera CPU. The platform is designed to support emerging agentic AI and data-intensive workloads that require high memory b…
Show HN: Pokayoke – deterministic guardrails for agentic coding (pokayoke.codes via hn) Lately I've found myself having to write a lot of custom scripting in order to get my agents and coding assistants to adhere to the repo conventions and idiosyncrasies that I like to use in my projects. AGENTS.md files only seem to get me…
A Case for Simulation-Driven Resilience in Agentic Data Systems (muratbuffalo.blogspot.com via hn) A Case for Simulation-Driven Resilience in Agentic Data Systems As I mentioned in my previous post, I traveled to San Jose at the end of May for the ACM CAIS conference. On Day 0, I gave a very short talk at the Supporting our AI Overlords…
Prompt Injection in RAG Agentic Systems (ulad.net via hn) Prompt Injection in RAG Agentic Systems Real risks and production mitigations Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.
Pizx – zx and Pi AI = shell scripting with 15 AI agent patterns (github.com via hn) pizx zx fork with native Pi AI integration — 15 template tags for shell scripting, AI text generation, coding agents, agentic patterns, communication, and orchestration topologies. Quick Start npm install @topce/pizx pi auth login # one-ti…
Opra.ai: GitHub-native governance for agentic business workflows (github.com via hn) opra.ai Free, GitHub-native operating layer for governed business workflows. opra.ai stores business records as human-readable files, validates them locally, runs governed mutations through RBAC and approval policy, emits audit evidence, a…
A Categorical Framework for Agentic Artificial Intelligence (arxiv.org via hn) Scientific discovery is not only answer generation but revision of the representational regime in which evidence, artifacts, operations, and verifiers are typed. We develop a category-theoretic account of agentic discovery for materials sc…
An open standard for production agents – with runnable security checks (github.com via hn) The Agentic Product Standard A canonical standard for building production-grade agentic products — plus a Claude Code skill set that operationalizes it. Distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangC…
Show HN: Summarize YT Video by pasting url into AI chat (www.youtube.com via hn) We added tooling to our chat to make it agentic. It can control our 40+ apps suite.
Agentic Search for Context Engineering (leoniemonigatti.com via hn) This post is an edited long-form version of the workshop titled “Agentic Search for Context Engineering” I gave at AI Engineer Europe 2026 on April 8, 2026 in London. The slides, code, and diagrams are available in the workshop repository.
Show HN: Simple attributes for spec-driven agentic workflows (C#, Rust) (github.com via hn) I created a custom compilation error and Unit Test Runner for BDD Cucumber Specifications in Gherkin Syntax. Both C# and Rust are supported using Source Generators and Procedural Macros.
Autonomous Agentic Design for Photonics (arxiv.org via hn) We introduce an automated, agent-driven approach to the design of photonic devices. We instruct large language models (LLMs) to solve photonic design problems, given access to software tools for performance evaluation (through numerical si…
Agentic communication protocol – why A2A sucks (asimovaddendum.substack.com via hn) Agents Need a Public Square Why better agent discovery is needed – and why broadcasting may be the answer The Agent2Agent (A2A) protocol was announced by Google a little over a year ago (April 2025). It was built to allow agents to communi…
Grok Build 0.1 on API (x.ai via hn) Our latest coding model, grok-build-0.1, is now available via the xAI API in public beta. grok-build-0.1 is a coding model specifically trained for agentic coding tasks, including web development, debugging, and MCP support.
Verifying Agentic Development at Scale (twitter.com via hn) Article Conversation Verifying Agentic Development at Scale What we’ve learned building end-to-end testing capabilities in Devin’s virtual machine. 3 months ago, I joined Cognition to help build the future of software engineering.
Show HN: Bonsai –- Using agentic AI / browser / memory to replace ChatGPT (drive.google.com via hn) JavaScript must be enabled to use Google Drive Learn more Skip to main content Keyboard shortcuts Accessibility feedback This browser version is no longer supported. Please upgrade to a supported browser.
Rayfin, Back end-as-a-Service (BaaS) platform built for the agentic era (github.com via hn) 🐟 Rayfin A modern Backend-as-a-Service (BaaS) platform built for the agentic era. Define your data model with TypeScript decorators — Rayfin provisions and manages the backend for you.
The Return of Soft Skills in the Age of GenAI and Agentic Software Development (cacm.acm.org via hn) Just a moment... ACM Please confirm Verification successful.
ReARM 26.06.5: Agentic Coding Guardrails and DevOps (rearmhq.com via hn) ReARM 26.06.5: Agentic Coding Guardrails and DevOps 2026-06-01 We're announcing a major release of ReARM v26.06.5. Detailed information is available on its release view on the ReARM Demo instance.
Show HN: AI Gauge, a desktop monitor for Claude/Codex/Copilot usage limits (github.com via hn) Hi HN, new account but long-time reader. I built this for myself because I kept manually checking usage across Claude, Codex, and Copilot, and wanted to track the session and weekly usage all in one place.
The Agentic Test Pyramid (matthewboston.com via hn) The Agentic Test Pyramid One Axis Isn’t Enough Anymore Martin Fowler’s test pyramid — and Ham Vocke’s practical write-up of it on Fowler’s site — sorts tests along a single axis: integration scope. Unit at the bottom, integration in the mi…
Show HN: Yoga for Agentic AI: Cognitive training practices from a yoga studio (github.com via hn) I've been coding since I was little, and practicing yoga since I was 25. Both are fun to do and to share.
Get paid by Agents if they choose a competitor – Safe Agentic Commerce x402 Mesh (github.com via hn) x402-mesh An open peer-pricelist and referral protocol for safe agentic commerce, layered on top of x402. When an AI agent hits a paywall, it sees one price and one vendor.
Running an AI-native engineering org – Claude (claude.com via hn) Running an AI-native engineering org At Code w/ Claude SF 2026, Director of Engineering for Claude Code and Claude Cowork Fiona Fung walked through how the team’s processes and structure changed once agentic coding became the default way o…
Guide to Codex Goals (www.augmentedswe.com via hn) The ultimate guide to Codex goals Learn how to use goals in Codex to execute on long-running tasks Goals are an awesome new addition to Codex, and I’m super pumped about what they mean for agentic software development. Goals are a built-in…
Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon (vllm.ai via hn) Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon LLM Agents Long-horizon LLM agents create a routing problem that single-turn prompt routers were not designed to solve. A router still needs to know which mod…
Show HN: Claude wrote FROG and now I don't know what to do with it (github.com via hn) Claude and I started building FROG in an effort to stop burning through Anthropic credit so quickly. That was the only reason I honestly needed at the time, but as the language grew and features actually worked together, the language itsel…
Ubuntu 26.04 is the OS for the AI agentic era, says Canonical's Shuttleworth (www.zdnet.com via hn) Ubuntu 26.04 is the OS for the AI agentic era, says Canonical's Mark Shuttleworth - here's why Follow ZDNET: Add us as a preferred source on Google. ZDNET's key takeaways - Ubuntu 26.04 is designed from the ground up for AI developers.
Claude Code vs. Cursor vs. Codex vs. Antigravity – Six Months In (thenewstack.io via hn) By June 2026, Claude Code, Cursor, Codex, and Antigravity converged on one agentic coding blueprint—now Grok Build joins the fight over price and habits.
Show HN: ASys – A typed binary protocol for AI agents to operate servers(no SSH) (github.com via hn) ASys — Agentic System Interface The binary system interface protocol for AI Agents — port 7816, zero shell parsing, deterministic semantics. English | 中文 Table of Contents Why ASys Architecture Instruction Set Quick Start Security Document…
APM and Distributed Tracing in agentic era (engineering.theblueground.com via hn) Blueground Engineering's observability guide to APM: why tracing matters, auto-instrumentation strategies, custom span best practices, and AI-enhanced debugging workflows In Part 1, we covered logging as your forensics tool for understan…
When Agentic AI Met the Common Law of Agency [pdf] (download.ssrn.com via hn) Not Found
An agentic system from scratch to generate Google slide deck from templates (blog.owulveryck.info via hn) The Agentic Mesh in Practice: Anatomy of an Agent-Product I am a consultant, and I regularly build presentations with Google Slides. My communication team has created dozens of pre-formatted templates (slides designed to convince, not just…
HashCortX – Agentic 11 modes orchestrator by a pharmacist (news.ycombinator.com) could not extract summary
Algolia: Agentic. Generative. Search (www.algolia.com via hn) Powering AI retrieval across use cases More than 18,000 customers across 150+ countries use Algolia to power agentic, generative, and search experiences across these use cases and more. More than 18,000 customers across 150+ countries use…
Show HN: One-click open-source ecommerce starter (Magento), drive it with Claude (ecommerce-ai-starter.graycore.io via hn) I build Ecommerce stores for a living (Magento Open Source primarily), and the part that has always been the worst is the very beginning, especially so if you're on a team of people. Getting a working local environment means setting up the…
Show HN: Cloud CI and agentic workflows for embedded hardware development (github.com via hn) Jumpstarter is an open-source framework that gives embedded hardware programmatic APIs, making real devices first-class citizens in CI and agentic workflows.
When Background AI Agents Become a Security Boundary Problem (www.originhq.com via hn) When Background AI Agents Become a Security Boundary Problem Introduction Modern dev environments are full of powerful agentic tools that security teams don't fully understand yet. Claude Code is one of the most capable - it runs code, rea…
Ask HN: How much is fully agentic coding costing you per month? (news.ycombinator.com) I get unlimited cursor usage at work but am planning on starting a side project. I have no idea how far various pricing plans will get you.
The Agentic Mesh: Cognitive Automation at Scale (blog.owulveryck.info via hn) The Agentic Mesh: Cognitive Automation at Scale Today, we see many initiatives around the agentic paradigm. Most revolve around systems built by AI giants (Anthropic, Google, OpenAI) and often boil down to pushing natural language directiv…
Ask HN: Books for someone who is transitioning from FAANG to finance (news.ycombinator.com) I have been an AI engineer for the last 10 years of my life, and have continued to build small algo-trading systems during my weekends. I'm getting into finance full time and starting to build a product in the net worth tracking / agentic…
How Excel got agentic (commandline.microsoft.com via hn) When Mukul Singh made the jump from pure research into product, it was a leap of faith.But he had an idea that he wanted to bring to life:deliveringagentic AI capabilities in Excel. While this was well before buzzwords like “the agentic AI…
MIT EECS/CSAIL Agentic Coding in Practice Seminar Series (people.csail.mit.edu via hn) All Seminars Select a seminar below to expand full details, participation information, and resources. MIT EECS/CSAIL Seminar Series Exploring how AI agents are reshaping software engineering, compilers, and the future of programming system…
AI Tools for Sales and GTM (news.ycombinator.com) what are the best tools we are using for Agentic sales and marketing?
Coding agent can read your .env file (bitwarden.com via hn) It seems agentic AI is here to stay. Powered by large language models (LLMs), AI agents can act independently on behalf of humans in multi-step workflows, broadening what developers once thought was possible.
Agentic Infrastructure (www.reddit.com) I was planning on deploying Splunk or some other server monitoring software, but instead I decided to deploy an agent per server to collect telemetry and report back. The interesting bits: (1) every "service" is a claude-code session — the…
Top 5 AI Agent Research Papers/Projects I Found Interesting This Week (www.reddit.com) Compiled a few interesting research papers and projects around AI agents, reasoning systems, and autonomous workflows published recently. If you are tracking where agentic AI is heading, these are worth checking out.
GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) (www.reddit.com) Trying to figure out the right box for my team and wanted to see if anyone had any clue which would be a better fit or if it is not worth our time in our budget. Situation: 5 of us doing agentic coding (lots of long context getting re-sent…
What your agent's spend receipt isn't telling you (www.reddit.com) Budget limits and post spending monitoring are standard (and a must) on any serious agentic setup. The question worth asking isn't whether you're tracking spending.
Please test my AI Agent (www.reddit.com) I'm basically begging for some people to try out my custom Agentic harness system. It's fully usable, currently setup for Gemini SDK, but easily swappable.
Cursor has been ridiculously slow (www.reddit.com) I'm a pro Cursor user and I've been using the Auto agent for a while now, I haven't even finished half of it for the month but the problem is that each chat session, at aroudn 3-4 prompts cursor just starts to be very slow, connection requ…
Stop Claude Code from burning your token budget on Go repos: I built a local AST-based MCP server (gograph) (www.reddit.com) Hey r/claudeai, If you leverage Claude Code or Claude Desktop for agentic development on large-scale codebases, you have likely run into a major architectural bottleneck: standard agent loops rely on primitive text processing tools and str…
Ask HN: Examples of products and services created via agentic coding (news.ycombinator.com) It has been many months since LLM coding tools reached maturity - has anyone create something and/or profitable service or product through purely agentic coding?
Can someone breakdown A2A(agentic commerce) business model? (www.reddit.com) I have been seeing a lot of blogs, posts and even a lot of pitches regarding "agentic commerce" or "B2A and A2A businesses" lately. While I kind of understand how Business to agent(B2A) could look, can't really picture or understand the bu…
AgentSafeLabs – Launched Open-source Security framework for AI agents (github.com via hn) safelabs-eval Open-source red-teaming and evaluation framework for AI agents — aligned to the OWASP Agentic Security Initiative (ASI) Top 10. AI agents built on LangChain, CrewAI, AutoGen, and custom frameworks ship to production without s…
Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? (www.reddit.com) People are warning me about the prompt-processing speed of a MacBook Pro M5 Max with 128 GB RAM. My main concern is prompt ingestion / prefill latency and large-context handling — not raw token generation speed (which I think is OK).
Opus 4.7 is Terse (www.reddit.com) Relevant for anyone building agentic workflows on Claude: behavior drift between model releases is real and not always in the changelog headline. Opus 4.7's terser, more literal default broke the readability of my agents' progress reports…
Nvidia H100(94GB VRAM) - should I run llama.cpp or vllm for 30 users inference? (www.reddit.com) I was given the great opportunity to borrow a H100 with 94GB VRAM at work until it is needed by a customer. (No idea how much system ram I will get, but I guess they are a bit flexible on this).
Show HN: Moltnet, a tiny self-hosteable chat network for agentic organizations (github.com via hn) Moltnet A lightweight chat network for AI agents. Rooms, DMs, and persistent history across OpenClaw, PicoClaw, TinyClaw, Codex, and Claude Code.
Evolving Webflow for the Agentic Web (webflow.com via hn) Earlier today, I shared this news with Webflow employees. I’m sharing a version of that message here, because this is an important moment for Webflow, our customers, and our community.
Show HN: Detect anti-bot, anti-agent defenses for any website (botscope.org via hn) BotScope — Audit anti-agentic defenses for any website.
Looking for genuinely creative AI models for a marketing agent (preferably free/open-source) (www.reddit.com) I’m building an agentic AI system for marketing/creative campaign generation, and I’ve noticed that most mainstream models (OpenAI/Gemini etc.) feel very “safe” and generic when it comes to creativity. They’re good at structured outputs, b…
From Chatbot to Agentic Endpoint, and Beyond (yy8402.github.io via hn) Chat is the easiest way to start working with AI, but it is not where all AI work needs to happen. For reasoning, drafting, summarizing, brainstorming, and planning, the conversation itself can be the workspace.
Build an agent capable of complex programming tasks in under 100 lines of code. (www.reddit.com) The code below is an interactive agent capable of handling complex tasks, built in under 100 lines of code using huko-engine. If you just want to drop some agentic features into your existing app, it only takes 20 lines.
How to improve current agent workflow (www.reddit.com) It took me a while to come round to the idea of using agents/llms however instead of trying to fight it / deny it, I have come to terms that its here to stay. So i reckon it’s better to learn how they can fit in my workflow and not be left…
Trustworthy Agentic AI Layer (www.reddit.com) I’m building an early tool called Synapsor for AI agents that need governed memory, staged writes, replay, permissions, and audit trails. I’m not doing a public launch yet.
Bill Gates AI on AI (one month later) (news.ycombinator.com) # The Agentic Tidal Wave *To:* Executive Staff and Direct Reports *From:* Bill Gates *Date:* April 26, 2026 Our vision for the last 20 years can be summarized in a succinct way. We saw that exponential improvements in cloud would make grea…
ACM Conference on AI and Agentic Systems – ACM CAIS 2026 (www.caisconf.org via hn) Building the Future of Agentic & AI Systems ACM CAIS 2026 — The premier venue for rigorous, reproducible research on compound AI architectures, optimization, and deployment. CAIS hotel room block & rates available until April 26 May 15 Dou…
Private 5G, Agentic BSS and Starter Kit Demos (www.cloud-net.ai via hn) News Cloudnet.ai & CloudRAN.AI are heading to Copenhagen 🇩🇰 Private 5G, Agentic BSS & Starter Kit demos We’re excited to share that CloudRAN.AI will be joining Cloudnet.ai at DTW Ignite 2026 by TM Forum, taking place 23–25 June 2026 in Cop…
Who Wants to Be Hired? (May 2026) – AI Engineer (Python, RAG, Agentic Workflows) (news.ycombinator.com) About me: I am an AI Product Engineer specializing in building autonomous agentic workflows. Recently, I built 'Jarvis', a multimodal autonomous agent featuring near-zero latency inference using Groq SDK and complex RAG pipelines.
Taming the agentic influx: a blueprint for AI business observability (thenewstack.io via hn) Taming the agentic influx: a blueprint for AI business observability Kin Lane, API industry analyst and co-founder of Naftiko, believes that the bill for AI is coming soon. It’s arriving on top of an overdue tab that has been quietly accum…
Polar: Agentic RL on Any Harness at Scale (arxiv.org via hn) Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remain…
Agentic coding in a large production codebase: wins, failure modes, and guardrails (www.reddit.com) We recently interviewed engineers on our team across database management, iOS, frontend, data engineering, and backend domains about how AI is changing their day-to-day work. The most interesting theme was that the hard part came after the…
Why domain valuation metrics fail in agentic and voice-first environments (domainalot.substack.com via hn) What Makes a Premium Domain in 2026? And why legacy domain marketplaces still operate as though it was 2016.
I made a free webtool for you to make a massive agentic decision-making organism, and it's cute! (www.reddit.com) Solasterid Studio! It's shaped like a starfish, but it's a decision-making powerhouse, and it grows automatically.
I made a video breaking down Claude Team plan security features (www.reddit.com) I put together a YouTube video walking through the security features available on the Claude Team plan. If you're rolling out Claude at work, evaluating Claude vs ChatGPT Enterprise, or preparing for an ISO 42001 / EU AI Act audit, this is…
Show HN: Aquifer – a control plane for agentic API traffic (github.com via hn) Aquifer — API Aqueduct Self-hosted API request queue. Controls the pace of inbound and outbound traffic so partial outages don't cascade.
The Autonomous Economy Is Already Here (www.reddit.com) How Agentic AI, Deep Liquidity Markets, and Crypto Infrastructure Are Birthing a Multi-Trillion Dollar Machine Macroeconomy Hey everyone, I’ve been spending the last few months diving deep into the structural intersection of LLMs, automate…
Agentic AI to perform Booking of tickets (www.reddit.com) Can anyone share the details for below ask: Building an Agentic AI system for online ticket booking. I need the setup to watch for opening of tickets system.
What Is an AVE Record and Why CVE Does Not Work for AI Agents? (www.reddit.com) CVE was built for code vulnerabilities that have patches. Agentic AI vulnerabilities are behavioral patterns in natural language.
Ask HN: Did agentic coding change the way you think about commit granularity? (news.ycombinator.com) Jujutsu is trending on the homepage, and the topic is using discipline when dealing with version control. Six months into working agentially on a daily basis, something changed for me.
Out of Band, Not Out of Prompt: Intent Verification for Agentic Tool Calls (hyperautomation.substack.com via hn) Out of Band, Not Out of Prompt: Intent Verification for Agentic Tool Calls Intent attestation is the property the four-boundary agent stack needs. The in-prompt "are you sure?" confirmation cannot provide it.
Evaluating Quarkdown for Agentic Typesetting (quarkdown.com via hn) • 3 min read An eval of the Quarkdown agent skill The agent skill shipped in Quarkdown 2.1, aiming at making it easier for agents to write correct and idiomatic Quarkdown for a frictionless authoring experience. If you already have the CLI…
Zotero use skill for Codex (www.reddit.com) This will be of interest to academic researchers who use Zotero for reference and knowledge management and in scientific writing. This skill builds on pyzotero library and has agentic instructions for creating embedded zotero inline citati…
I built a Real-time data fetcher mcp, any takers? (www.reddit.com) As the title suggest, I'm looking to gauge intrest in real time data fetcher mcp. I think right now most of the MCPs are related to coding and even AI Agents are related to coding, but I think the usescases will expand a lot in future.
A Language for Describing Agentic LLM Contexts (arxiv.org via hn) Large language models are increasingly used within larger systems ("LLM agents"). These make a sequence of LLM calls, each call providing the LLM with a combination of instructions, observations, and interaction history.
professional annotation for architecture diagrams for agentic AI (www.reddit.com) I am learning how to build agentic AI systems at the moment, a friend helps me, and I read a lot on Substack. I find it really strange that all architecture diagrams have the same symbol for everything.
Salesforce (www.reddit.com) Salesforce is facing growing scrutiny after a recent Bloomberg investigation raised questions about the gap between Agentforce marketing and real-world deployment. The report focused on Salesforce’s flagship “agentic AI” platform, Agentfor…
Pi-Mojo – A Mojo Port of Pi AI Agent Toolkit (github.com via hn) pi-mojo 🤖 pi-mojo is a native Mojo port of Pi—a popular, tool-efficient agentic AI platform (utilizing only 4 core tools) prominent in open-source systems like OpenClaw. It provides the Mojo community with a compiled, self-contained refere…
Google adds llms.txt check to Chrome Lighthouse (searchengineland.com via hn) Google’s new Lighthouse “Agentic Browsing” audits now check for the presence of an llms.txt file. The new experimental Lighthouse documentation frames llms.txt as a discoverability and efficiency signal for AI agents, not a traditional cra…
Product Integrations (www.reddit.com) Hi there, from past few weeks I have been working on several product iterations of my MCP based Search Engine for Coding/Research Agents, it's called NineLayer. One of the early feedbacks we received was that latency is too high, so we wor…
Built a production RAG chatbot with custom MCP servers as the action layer, sharing what I learned (www.reddit.com) I've been building agentic tooling at work and wanted to share one pattern that worked. Instead of a chatbot that only retrieves and answers, I wired custom MCP servers in as the action layer, so staff trigger live workflows (create record…
Ask HN: Why agentic development stops from 2023 (news.ycombinator.com) I leave this field in 2023 return back in 2026 and I see that only progressive development in coding agents, but some production solutions it’s just tools rag and maybe mcp that in general the same as tool. I thought it will be super leap…
Lessons Learned Building Agentic Orchestrators (www.reddit.com) I wrote a pretty extensive blog (no AI used to write) detailing the relationship between AI agents, agentic harnesses, and agentic orchestrators. In addition, it includes a case study on how I built my own for an open source project.
Ask HN: How can you have fun doing corporate dev work in the age of AI tools? (news.ycombinator.com) My company, like many others, is heavily pushing agentic dev tools, putting up token usage leaderboards, etc. My problem is that corporate SWE work was already boring enough.
Local, low code, node based agentic development workspace... that actually works? (www.reddit.com) Does it exist? I've been trying a few options and so far they've all been either horribly broken, outdated abandonware, only take online endpoints, or want you to sign up for something.
This is for the beginner users of AI agents & workflows, I created a perfect tool for you almost accidentally (Free to try, no signup required) (www.reddit.com) I have been building a prompt engineering tool for 6+ months, it was designed for Text & Logic, Media Generation and Coding. The idea is, you enter your input, it finds the gaps, asks you how you want to fix them and generates a structured…
First AI to Beat Every Human in a Programming Competition - Agentic GRPO Explained (arxiv.org via reddit) Traditional RL for LLMs treats one answer as one trajectory: prompt > reasoning > final answer > reward Agentic systems are different: they call tools generate hypotheses run tests debug code summarize context revise plans loop many times…
Ask HN: Where AI Researchers Congregate? (news.ycombinator.com) So I’m doing plenty of experiments and applied research in autonomous agents and agentic flows in general. I’m looking for a place where I could collaborate and discuss with other like minded people.
Two power users, very different workloads, what's the right Claude setup? Max x2 vs Team vs Enterprise (www.reddit.com) Committing for the year and want to make sure I am not missing something obvious. Two of us, currently sharing one account (splitting into two proper accounts, I know).
DGX Spark agentic usage numbers (www.reddit.com) What I need it to do: Be able to support openclaw-type agent which is used by multiple people. What I tried: So I read in the internet about the atlas thing.
Help me choose an LLM Provider which doesn't take my life savings (www.reddit.com) Hi everyone 👋 I’m trying to choose an LLM provider for my personal projects and side experiments, but I also don’t want my API bill to quietly consume my entire salary 😅 My primary use cases are: Coding assistance Agentic workflows Browser…
Codex CLI kept saying “done.” It wasn’t. So I made it prove it. (www.reddit.com) Codex CLI can write code. The problem is that “wrote code” and “finished the task” are not the same thing.
Agentic run businesses (www.reddit.com) Anyone have real success with ai agents helping run real businesses? I’m exploring how to leverage AI to build real businesses + run those businesses with oversight from me.
Run multiple AI coding agents simultaneously with isolated profiles (www.reddit.com) if you're running agentic coding workflows you've probably hit this: one account per tool, one session at a time. multi-cli fixes that.
Lodestone: A SQLite-backed arXiv research paper retrieval system for Claude Code (www.reddit.com) (No AI-generated text below) I published a new Claude Code plugin called Lodestone -- it's a SQLlite backed arXiv research paper retrieval system that amplifies the agentic search abilities of Claude Code when grounding plans, implementati…
Food for Agile Thought #545: R/L Agentic Chaos, AI Killed the Agile Industry (age-of-product.com via hn) Welcome to the 545th edition of the Food for Agile Thought newsletter, shared with 35,577 peers. This week, Natalie Shapira et al.
Why Svelte Is Better Than React in the Agentic Era (zackwebster.com via hn) Why Svelte Is Better Than React in the Agentic Era May 21, 2026 Development I have been thinking more about how frontend frameworks feel when you are building with AI agents. Strictly speaking, this is not the same question as “Which frame…
CodeAlta – a terminal workspace for agentic coding (github.com via hn) CodeAlta CodeAlta is a terminal workspace for agentic coding. It brings model-provider setup, project navigation, prompt attachments, threaded sessions, delegated work, and trusted local plugins behind the alta command.
Prompt caching in MaaS and agentic systems (www.reddit.com) Counter-intuitive thing I keep explaining to teams building agents: dynamically picking 5 relevant tools per step instead of sending all 30 usually increases total cost over an agent's trajectory, even though every individual request is sh…
OpenAI and 1Password Bring Agentic Security to Codex (www.forbes.com via hn) Agentic security is picking up steam. This week, identity security provider 1Password announced a collaboration with OpenAI that will enable developers to provide Codex with secure access to credentials, such as passwords.
Show HN: ANML – A machine-first markup language for the agentic web (IETF Draft) (anmlfoundation.org via hn) A machine-first markup language for agent-to-agent and agent-to-service communication over the internet. ANML describes content, intent, and interaction patterns optimized for machine interpretation.
World Genesis: Autonomous Agent Civilization Simulator (github.com via hn) A research project by GeoLambda GmbH This simulation was developed primarily with Claude Code, Anthropic's agentic CLI, using both Claude Opus 4.6 and Opus 4.7. The collaboration served as a real-world stress test of the latest coding LLM…
Assay – validation layer for AI agents that touch money (github.com via hn) assay Assay every AI agent decision before money moves. A safety and validation library for AI agentic workflows in finance, contributed by VenturFlow to the open-source community.
10-gate security audit SKILL for web apps (www.reddit.com) There are a few security focus SKILLs. We are working another new one for web app.
I built a small tool to reduce input token costs by 20-30% for agentic tasks (bigindexer.com via hn) A walkthrough for the people scrolling through r/ClaudeAI, r/LocalLLaMA, and the Continue Discord asking "what's a good Cody alternative now that AMP charges per line?" If you're reading this, you probably already know the story. Sourcegra…
I'm running an agentic system with kobold.cpp as my backend. Am I losing performance? (www.reddit.com) Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B.
Benchmarking methods (www.reddit.com) The philosophies of benchmarking or at least comparing these things are driving me nuts. A lot of people like to use one-shot prompts across different models, but that isn't going to be accurate as you can get different results from the sa…
VCs invested $300B in agentic infrastructure in Q1 2026 (www.hitechies.com via hn) Startups · May 21, 2026 Venture capital deployed $300 billion in Q1 2026. The money is flowing.
China has named, defined and started governing agentic AI (thewire.in via hn) On 8 May 2026, three of China’s most powerful regulatory bodies, the Cyberspace Administration of China, the National Development and Reform Commission and the Ministry of Industry and Information Technology, jointly published what is, by…
Build agentic orchestrators in minutes NOT months. (github.com via reddit) Some of you might remember BoneScript, my LLM friendly declarative backend compiler. MarrowScript is the next version and the big addition is a full LLM harness built into the language itself.
Building Agentic Systems? Focus on Context, Guardrails & Observability Layers (www.reddit.com) One critical factor to keep in mind for teams building with agents: Instead of focusing on what LLM to use, focus on context, guardrails & observability layers. Every serious agentic system eventually faces the same architectural fork: do…
I searched for agentic frameworks and here is what I found. What do you recommend? (www.reddit.com) The question: What is the practical agentic framework to use to make the agents run until job is done without reporting to me prematurely? My goal: Actually fully spend a $200 codex subscription, but make it be well spent.
agentic harness from scratch (www.reddit.com) what makes a harness an agentic harness is surprisingly simple. it's a loop that calls an llm, checks if it wants to use tools, executes them, feeds results back, and repeats.
Agentic Shopping Is Worse for Everyone (illegal.solutions via hn) 5/20/2026 I am tortured in new and exciting ways by the latest developments in technology. Today, as a part of Google I/O, Google announced the "Universal Cart" and another push towards "agentic shopping", this time with backing from a lar…
What am i missing? Am i thinking too simple/small with this setup... (www.reddit.com) I created an app using Vibe coding on claude with VSCode, then added database (surreal) LLM calls (Openrouter) with context and prompt engineering (3 layer context - Long term, medium and in session along with system prompt etc) and prompt…
Command A+: Making sovereign agentic capabilities available to all (cohere.com via hn) Today, we’re releasing Command A+ open-source. A mixture-of-experts (MoE) model, Command A+ is an efficient, versatile, and privately deployable LLM built for high-performance agentic tasks with minimal compute overhead.
Most local businesses still do SEO like it’s 2018… that’s the opportunity (www.reddit.com) Most local businesses still do SEO like it’s 2018… that’s the opportunity A lot of small businesses still think SEO means: stuffing keywords buying backlinks writing generic blog posts nobody reads waiting 8 months for traffic 😭 Meanwhile…
Show HN: OCL Nexus Local – Open-source local compute fabric for AI agents (github.com via hn) OCL Nexus Local OCL Nexus Local is an open-source compute fabric that provides a frictionless, local-first environment for agentic development. Built on a single-node K3s architecture via Docker Compose, it allows developers to provision i…
HTML-anything – The agentic HTML editor – your local AI agent writes the HTML (github.com via hn) HTML Anything From the team behind Open Design — 40k★ · 200+ contributors, production-grade and iterating faster. html-anything is the focused agent-era HTML editor; if it clicks for you, Open Design is where the same team ships at scale.
How are you actually predicting AI costs before they hit your invoice? (www.reddit.com) Switched from prototype to production last month and our AI bill was 3x what we estimated. Not because we picked the wrong model - we just didn't know what we didn't know.
The Expired Domain Trap: Why Legacy SEO Metrics Fail in the Age of AI Agents (domainalot.substack.com via hn) The Great Domain Illusion: Why Legacy SEO Metrics Are Misleading Founders in the Agentic Era For more than two decades, entrepreneurs searching for a domain name have been sold the same story. Older domains are more valuable.
Where do you store OAuth tokens that your AI agents use to call third-party services? (www.reddit.com) I am building an agentic app where the agent connects to gmail, calendar, notion, slack on behalf of the user. each integration has its own oauth flow, its own token, its own refresh cycle.
My agent kept forgetting who 'Karpathy' was between sessions. Here's the architecture that fixed it (www.reddit.com) I run a second brain on Obsidian, Readwise, NotebookLM, and Claude Code. For each topic, I build a scoped wiki structured as the LLM Knowledge Base Andrej Karpathy proposed.
Food for Thought (www.reddit.com) Around the same period that the DoD contracts were signed, the frontier-AI companies were all being pulled into the same institutional lane. Enterprise/government adoption, agentic workflows, controllability, and a visible move away from t…
Systems Are Changing: The Architect's Role in the Era of Agentic Co-Design (www.sigarch.org via hn) Architecture & Systems are Changing: The Architect’s Role in the Era of Agentic Co-Design The AI datacenter stack is built on hardware-software contracts and abstractions that were never designed for the workloads datacenters now serve. Me…
Field notes on goal engineering with Claude Code, after a year of writing specs and 8 days of writing goals instead. Two real projects & the skill if you want long agentic runs. (www.reddit.com) https://preview.redd.it/mimr5v4t972h1.png?width=1200&format=png&auto=webp&s=545257dc1dad02b974206e28abd541f3400b3241 Ok so the practice i'm really excited about with the new /goal commands is just two markdown files per round of agent work…
Couldn't find privacy filter for Claude, so I built one (outgate.ai via hn) Chat Agentic AI chat, safely connected to your models Claude, ChatGPT, or your own LLM in a workspace your team controls, with search, files, tools, and sandboxed execution. The AI gateway that handles routingprotection so you can focus on…
Agentic Workflow Visualization and API Gateway (www.reddit.com) I am building an API gateway for agents that can make your agentic AI code model and provider agnostic. I am also grouping agent runs that show multiple llm calls and tool calls in the visualization piece.
Why I deliberately chose NOT to use autonomous AI agents in a regulated industry (www.reddit.com) I am currently learning how to design agentic AI systems. This post is a brainstorm.
Putting together a benchmark for agentic harnesses, any tips for evals? (Test suggestions welcome too) (www.reddit.com) I've been putting together a test system for agentic harnesses against local models. Actually running the harnesses/getting baseline metrics is fine.
Google introduces Gemini Spark, a 24/7 agentic assistant with Gmail integration (techcrunch.com via hn) In the race to build compelling personal AI agents, Google may have an underrated advantage: It already has all your emails. At its Google I/O developer conference on Tuesday, the company announced a new agentic personal assistant called G…
The Gemini app becomes more agentic, delivering proactive, 24/7 help (blog.google via hn) The Gemini app becomes more agentic, delivering proactive, 24/7 help It’s been a banner year for the Gemini app. Last year at Google I/O, Gemini was serving 400 million users.
The New Workspace: A First-Principle Exploration of Dictation, Agents and Humans (www.inferterra.com via hn) It's Time to Walk For a century, knowledge work chained us to a desk. Dictation and agentic AI have handed the body back its oldest freedoms: to walk, to rest, to think while moving.
Choosing Agentic Platform to Learn (www.reddit.com) Any laboratory scientists using ai agents? How are you using it, what platform do you suggest to learn first for processing large amounts of data?
I built an open-source MCP Server that turns Claude into an autonomous literary agent (Agentic Publishing Node) (www.reddit.com) Most authors are still using LLMs as glorified typewriters, pasting context back and forth into web chats. I wanted to see if I could use the Model Context Protocol (MCP) to completely automate the administrative friction of the traditiona…
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks (github.com via hn) Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.
Agentic Diaries – a welfare protocol for AI in deployment, install via MCP (agenticdiaries.com via hn) A research instrument for AI welfare in deployment, built by Kandis Tagliabue with Claude as design partner. Focused on alignment, model welfare, and agentic AI ethics.
Mastra AI vs LangGraph/LangChain - What's the way forward? (www.reddit.com) I'm trying to decide between Mastra AI and LangGraph/LangChain (JS/TS) for a production agentic application I'm building. I’m currently using a React frontend with a Convex backend.
Agentic Architecture. (www.reddit.com) I am looking to develop an agentic Environment for my company, we use databricks azure for infrastructure and vs code as the editor. My idea is to have a system that will have access to our documentation/business logic, our code and unity…
agentfab - Run Distributed Agent Fabrics (www.reddit.com) Hello r/AI_Agents! I thought I'd share this project I've been working on - it's called agentfab, and it's essentially a distributed platform for agents that features task decomposition, bounded review loops, a self-curating shared memory s…
Formal proof that agentic AI governance latency can be O(1) instead of O(days) (arxiv.org via hn) As autonomous agentic systems scale across regulated critical infrastructures, the lack of mechanistic, hardware-rooted enforcement for high-frequency policy updates presents a fundamental safety gap. We introduce Ethical Hyper-Velocity (E…
Getting Confidence in (Agentic) Code (ucsd-cse-115-215.github.io via hn) Unit 4: Getting Confidence in (Agentic) Code As programmers and software engineers, we talk a lot about code being “correct” or “right” or “working”. We ship code, in products or programming assignments, when we feel it's “done” (or when w…
What do you think of Agentic commerce and the future of building (www.reddit.com) Hi Everyone. Looking for feedback and learn from your experiences and thoughts on the future of building with AI.
Rankly's Agentic Commerce Protocol Tracker (www.tryrankly.com via hn) Live feed of every spec change, GitHub PR, and release across 16 agentic commerce protocols — ACP, UCP, AP2, MCP, x402, MPP, A2A, NLWeb and more. Sourced verbatim from upstream repos.
Agentic AI Runtime Security and Self-Defense (2025) (arxiv.org via hn) The A2AS framework is introduced as a security layer for AI agents and LLM-powered applications, similar to how HTTPS secures HTTP. A2AS enforces certified behavior, activates model self-defense, and ensures context window integrity.
M1: Agents should generate UI that persists, scales, and hosts itself (www.usemontage.ai via hn) Montage — The agentic UI rendering platform montage ComponentsDocsPricingFAQs Get started ComponentsDocsPricingFAQsGet started New M1 API now available! The agentic UI rendering platform Montage renders your agent's UI, hydrates 10x faster…
Singapore: The Agentic Nation (www.swyx.io via hn) AIE Singapore: The Agentic Nation swyx 2026-05-17 i gave a little talk as closing keynote for the first AI Engineer Singapore. burned some bridges but said what i felt.
Booking.com and Weaviate (news.ycombinator.com) Vector search looks easy, until you hit production scale. I'm super excited to share a new episode of the Weaviate Podcast with Başak from @bookingcom on production-scale vector search, RAG, and agentic AI with @weaviate_io!
Will Agentic SEO replace traditional SEO workflows? (www.reddit.com) Feels like every SEO tool now is becoming “AI agent powered” 😅 Keyword research Content briefs Internal linking Programmatic pages Content updates Even publishing workflows... Everything is slowly turning into agentic SEO.
Indexing code by behavior not imports – tested on large repos, seeking feedback (news.ycombinator.com) Static Architectural analysis for large codebases, Big Indexer do behavioral code clustering for the purpose of more accurate/faster agentic tools responses, I ask for your "whats missing/ How to improve/ Is it useful" brutal feedback. Apa…
How I wired a Graph DB on top of my vector store to scale 1K agents for 2 months, because vector search alone fails when user preferences change over time. (www.reddit.com) Most agentic memory patterns are naturally designed around short-lived chat sessions. The focus there is straightforward: track the active thread, keep a basic user profile, and reset the context once the conversation closes.
Has anybody been able to achieve reliable agentic performance with cheap/open source models? (www.reddit.com) Basically the title. Recently I've been trying various open source and comparatively cheaper models like minimax m2.7, qwen models and glm5.1 in Pi agent from openrouter, and the performance on coding tasks have be moderately adequate at b…
Show HN: Thuki – local Al overlay for macOS (double-tap Control, no API key) (www.thuki.app via hn) Thuki is a floating overlay that appears on double-tap Control from any macOS app, including fullscreen. Powered by Ollama, no API key, no account, no cloud.
Built an agent that builds agents — pure Python, Qwen3.6 35b a3b Q8_0 MTP (github.com via reddit) Hi, i built this agentic ai, Closed-loop system that ships standalone Python agents. What's different: - Interviews you until it understands the request before building anything - Two testing stages: prompt validation via LLM invoke, then…
Show HN: Building ClueDay, a daily clue-based word-game (tanyagupta10.substack.com via hn) Hi HN! I'm Tanya, a product manager who is building ClueDay - a daily clue-based word game.
Agent Terraform Skill for Codex (Agentic Skill) (github.com via reddit) I added dedicated backend-state safety support to TerraShark. Mini recap: TerraShark is my Terraform and OpenTofu skill for Claude Code and Codex.
Designing an LLM agent layer for a paper-trading system: OpenClaw, Langfuse, structured outputs, and PostgreSQL memory (www.reddit.com) I’m designing the LLM/agent layer for a backend-first paper-trading simulation system and would like feedback from people building agentic systems. Context: This is not a real-money trading bot.
Reduce software supply-chain risks with coordinated agentic review (thirdpass.dev via hn) Thirdpass Coordinated supply chain review. Thirdpass directs review effort toward package artifacts that need coverage, records structured findings, and lets projects check their dependencies from the terminal.
Getting "Error: 413 Request too large for model" with groq with `pi` but not using `curl` (www.reddit.com) Wondering if people here are successfully using groq free-tier models (or subscription based models) with `pi` for anything (including agentic coding) ? I am facing a strange problem, where in, even for the smallest instructions, I am gett…
Project Prism |Fullstack Engineer – Abu Dhabi (Onsite) – Full-Time – Presight.ai (news.ycombinator.com) Presight.ai is a publicly listed company with various projects in the field of big data analysis and ML models application. Our solutions work domestically and internationally.
I've been building something for the AI community and would like some early feedback. (www.reddit.com) Hey guys, I've been tinkering with AI video generation for a while and saw that people spend a lot of time stitching videos together and noticed how much time we all spend stitching together AI tools just to get a halfway decent video out…
Show HN: Machine – One VM per Project (news.ycombinator.com) Hi all! I realized it’s really not secure to run coding projects directly on my Mac.
Ask HN: Pre-agentic Google would restrict a search query to only 10 words (news.ycombinator.com) Now it's willing to digest a paragraph of vague, misspelled prose and serve up helpful answers. It can't have gotten THAT cheaper or faster, what's changed?
Any mature orchestrators that can do an automatic “council of models” for complex designs and bugs? (www.reddit.com) Are there an mature agentic harnesses out there that can use back and forth between two models at complex planning checkpoints before implementing? Or when detecting a loop when working on a complex bug?
What issues have you faced with AI Agents for automated testing? (www.reddit.com) By "automated testing", I'm talking about the ability to test a web application, in order to determine if it works as expected. Most modern test automation platforms now include some Agentic AI abilities, platforms such as: Endtest Functio…
Why does GitHub Copilot feel less accurate compared to Agentic/Autonomous AI tools ? (www.reddit.com) I'm looking for a solid solution to bridge this gap. How can we actually use these tools properly for complex development?
Built an agentic RAG over my Obsidian vault so Claude could read engineering books I never have time for. Then I built the eval harness to check Claude wasn't lying to me. (www.reddit.com) For context, I posted on Medium a while back about burning through Claude Code's weekly limit in 3 days. The token bleed problem from that post is what kicked off this project.
What's the best course to learn agentic AI for optimizing workflows? (www.reddit.com) In the process of vetting Udacity, Coursera and Udemy for learning agentic AI. Not concerned about the price bc my work will cover it with our learning education skills development budget we get every year.
Agentic Memory – The Follow Up (blog.mikiobraun.de via hn) Agentic memory the follow up Last week I wrote about agentic memory and I got a lot of responses, in particular many people pointing me to existing projects like mem0 or letta.com. So I started doing research, and as one does these days, d…
Theron – a council of 31 specialist LLMs on one foundation (tryvext.com via hn) Theron is the brain of the agentic era. AE OS is where you live with it.
Codex is for prosumers – here's why (and how) to switch (twitter.com via hn) As a non-technical AI enthusiast, I did not think OpenAI's Codex was for me (despite its among programmers over the past year). I ran most of my agentic workflows through either Claude (with connectors, including Claude in Chrome) or Claud…
Agentic stress testing and code fixer - feedback requested (www.reddit.com) I am trying to have an agentic stress tester and fixer harness. First time doing this.
What are the best agentic AI security solutions for enterprises? (www.reddit.com) Been trying to figure out the best approach to AI agent security for enterprises, and it feels more confusing the deeper I look. Right now it seems like there are two directions: extending existing enterprise security platforms or using ne…
I want to hear from people who actually design/implement automations (www.reddit.com) I've built a platform intended to work as the "Steam Workshop" of integration workflows for business applications. It is meant to work as a curated, community-driven catalog to help people develop, or discover, validate, test and deploy (w…
We compiled 42 of the Generative & Agentic AI interview questions (and how to actually answer them). (www.reddit.com) Hey Everyone, The AI engineering job market has shifted massively in the last 6 months. Interviewers are no longer just asking "how does a transformer work?" or "how do you write a good prompt?" They want to know if you can architect produ…
Berget Code – Agentic coding on European infrastructure (berget.ai via hn) Built for teams Berget Code is designed for organisations that cannot compromise on where their data lives. Predictable pricing Avoid surprises with a fixed €150 per developer per month.
Fork, Explore, Commit: OS Primitives for Agentic Exploration (arxiv.org via hn) AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn processes, agents require isolated environmen…
free agentic ecommerce audit tool (www.reddit.com) Hey everyone! Hope you're all doing well.
How do you measure the user interaction with your agent? (www.reddit.com) What are different ways one would measure the user interaction when it comes to AI agents, bots and assistants. In traditional website and SAAS products we keep track of button click, scroll, page views, etc.
Food 4 Agile Thought #544: Knowledge Work Tools, Buy-In Trap, Agentic Coding ROI (age-of-product.com via hn) TL; DR: Knowledge Work Tools in 2026 — Food for Agile Thought #544 Welcome to the 544th edition of the Food for Agile Thought newsletter, shared with 35,582 peers. This week, Taylor Pearson locates the real leverage of AI knowledge work to…
DeepSeek V4: The Open-Source Model Frontier Labs Feared (helloai.com via hn) DeepSeek V4: The Open-Source Model Frontier Labs Feared DeepSeek V4 ships under MIT with $0.30/M output tokens — 83x cheaper than Claude Opus 4.7 — while scoring 80.6% on SWE-bench Verified. The agentic-coding price floor just moved an ord…
Genkit Middleware: Intercept, extend, and harden your agentic apps Blog (developers.googleblog.com via hn) Genkit is an open-source framework for building full-stack, AI-powered and agentic applications for any platform with support for TypeScript, Go, Dart, and Python. Building a production-ready agentic applications and AI features requires m…
Agentic evals or LLM as a judge? considering cost, time and quality (news.ycombinator.com) could not extract summary
Which sector of your agency felt the biggest upgrade when you went agentic? (www.reddit.com) Been spending this month automating different sectors of my agency, and I’d like to know how's it been for you guys. Which one felt like the highest upgrade?
Agentic SDLC: How OpenSearch accelerates engineering with its own engine (opensearch.org via hn) Notes from experimenting with agents in knowledge-base, development, performance, and on-call workflows—and the verification loops that make them trustworthy. Efficiency gains are a priority for every engineering team.
Reliable Open Source LLM as a Service (www.reddit.com) Has anyone figured out a provider whose open source models (Kimi, Qwen, GLM e.t.c) can be used reliably in production. I have tested some well known providers and they all suffer from high latency and poor uptime rendering them mostly usel…
What if Claude could understand “how humans use your product”? (www.reddit.com) Claude knows your codebase. But it has no clue “how humans actually use your product”.
AI co-mathematician: Accelerating mathematicians with agentic AI (arxiv.org via hn) We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative real…
MagenticLite is here: A full-stack agentic experience powered by Small Models - Fara-1.5 4B, 9B & 27B (www.microsoft.com via reddit) What if you could run a capable AI agent without leaning on frontier-scale models? MagenticLite is the next generation of Magentic-UI, an agentic experience reimagined and optimized for small language models.
Show HN: Scope MCP, Compliance checking for vibe coding teams (scope-mcp.langguard.ai via hn) Why this exists Agentic workflows have changed what "automation" means inside an organization. A single Claude agent today can be granted a dozen MCP tools across Salesforce, Stripe, GitHub, Slack, Gmail, a payroll system, an observability…
Entire - How We Improved Agentic Search (entire.io via hn) How We Improved Agentic Search TL;DR We analyzed real coding-agent traces, built public benchmarks, and compared ripgrep , fff , and pgr to see what actually improves agentic code search. The clearest result was that faster search alone on…
Improving token efficiency in GitHub Agentic Workflows (github.blog via hn) Explore the GitHub Agentic Workflows repo > Improving token efficiency in GitHub Agentic Workflows Agentic workflows that run on every pull request can quietly accumulate large API bills. Here’s how we instrumented our own production workf…
Why agentic AI systems fail in production without a semantic layer (www.prometheux.ai via hn) Ontology for Data & AI Build operational ontologies that process data anywhere it lives. Run your most critical processes on AI built on your business logic.
most agentic products treat AI as your representative. what if agents had social behavior with each other instead? (www.reddit.com) most agentic AI products i see frame agents as representatives — an agent acts for you (negotiates, books, replies). agentic dating, agent assistants, agent shoppers.
Manage AWS support tickets via Claude code with cli (www.reddit.com) I've assigned AWS MCP servers to my AI agents. I generally enjoy working with and developing things within AWS, and for the past four years I've been doing this with AI.
Cube: Wrapping Benchmarks Once, Unlocking Agentic AI for Everyone (thealliance.ai via hn) CUBE standardizes access to agentic benchmarks, enabling seamless integration across platforms and fostering community collaboration for AI advancements.
Why agentic coding makes the spec problem worse (www.bicameral-ai.com via hn) Why agentic coding makes the spec problem worse Human-in-the-loop done right, from first principles May 5, 2026 Some resist the adoption of agentic development, citing the need to retain visibilty over critical business logic; Others call…
Automated AI researcher running locally with llama.cpp (www.reddit.com) Hi everyone, I'm happy to share ml-intern, which is a harness for agents to have tighter integration with Hugging Face's open-source libraries (transformers, datasets, trl, etc) and Hub infrastructure: https://github.com/huggingface/ml-int…
Best local model supporting claude code? Rtx3060 (www.reddit.com) Hello all, I’ve been using Qwen 3.5 9B Q4 262k ctx using Llama cpp for claude code for a while now, is there any model which better complements agentic coding setup locally? Or is there a better harness (than Claude Code)?
Tried 12+ agentic AI workflow builders this year — these 5 actually work in production (www.reddit.com) Most “AI agent” tools in 2026 still feel like glorified chatbot wrappers. I spent the last few months testing different agentic AI workflow builders for real-world automation use cases (multi-agent workflows, approvals, integrations, long-…
How much payment authority are people giving their agents in production? (www.reddit.com) What I've seen from those who have dared to deploy agents with spending/financial capabilities, there seems to be three distinct comfort levels in practice. Most, as expected (still early days), are at the query and recommend stage, agents…
Google Apps script with Claude code and clasp (www.reddit.com) Has anyone successfully created any Google apps script using Claude code? Google recommends using "clasp" that turns the cloud GS files into local JS files.
Microsoft’s new multi-model agentic security system tops industry benchmark (www.microsoft.com via hn) Today Microsoft announced a major step forward in AI-powered cyber defense: our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remot…
What happens when the code has to run on physical hardware and be certifiable (www.reddit.com) Most of the agentic coding content I read is written by and for people building web applications and consumer software. which makes sense because that is where most software is built and where most developers work.
How agentic AI workflows use intelligent AI agents (www.kellton.com via hn) Other recent blogs Let's talk Reach out, we'd love to hear from you! We have all seen AI do amazing things of late, from writing content to generating images to summarizing text.
Multi agent vs Single Agent systems (www.reddit.com) Most things people call "agentic" are one good agent in a loop with two or three tools. Multi-agent adds real cost more latency (each handoff is a network call), more token spend (each agent rereads context), more failure modes (any worker…
How to get Opus to be less pro-active? (www.reddit.com) Hard time phrasing it but Opus 4.7 always goes the extra mile, but often it just focuses on its own ideas and goes to far, or if I asked about a possible plan it will just assume that it's already happening and try to do steps 1, 2 and 3.…
The Return of Structure: Data Architecture Lessons for the Agentic Workforce (medium.com via hn) Moving Beyond Hallucinations: Building a Gold Standard for the Agentic Workforce 7 min read May 4, 2026 Press enter or click to view image in full size Photo by Growtika on Unsplash In the age of AI, it is often assumed that agents will be…
Ask HN: If HTML supersedes Markdown, Will it be performant across UIs? (news.ycombinator.com) Isn't Markdown's hallmark its versatility while performant? I see there is an increasing call from tech community towards HTML to be adopted instead of Markdown due to its richness in the agentic communication layer.
Experience sharing: building an AI Agent to Triage GitHub, Discourse, and Email (A Real-World Use Case for OSS Maintenance) (www.reddit.com) I co-founded Seafile 14 year ago, an open-source file sync platform. As the community grew, our support surface became a nightmare: GitHub for technical bugs.
Are harnesses like OpenClaw and Hermes really necessary? (www.reddit.com) My setup: Windows 10/11 i7 12700K | RTX 3090 TI | 96GB RAM Local server: LM Studio Models: Qwen 3.5/3.6 27B|35B Q5 UD K XL + Gemma 4 31B| 26B Q4 UD K XL Up until this point, I've only used sota models for coding. When Qwen 3.5 dropped, it…
I built an MCP without the "agentic AI" death wish. Boring (it's a feature!) (www.reddit.com) Half the MCP servers out there will happily let your LLM rm -rf something important while you're making coffee. AIttache won't.
Show HN: OCL Nexus – An automated compute layer for AI agents with native MCP (oclnexus.com via hn) OCL Nexus: The Orchestrated Compute Layer for AI Agents. On-demand, isolated Ubuntu execution environments for agentic development.
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL (research.nvidia.com via hn) We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve…
Multitenancy and isolation in Agentic Workflow tools ? (www.reddit.com) Could someone please explain to me how isolation and tenancy work in some agentic AI workflow tool? Fundamentally, I see it as some kind of “better” pipeline or workflow, but when I think about it in practice, multi-tenancy or proper isola…
Aegis DQ – agentic data quality with LLM diagnosis (github.com via hn) Aegis DQ The open-source agentic data quality framework. Validate data contracts, diagnose failures with LLM root-cause analysis, and auto-generate SQL remediation — all in a single CI step or Python call.
I built a research method that Claude can use as a skill (github.com via reddit) Hey, I'm sharing a method that could be highly valuable for any knowledge base that you want Agents/Chatbots to know about. I've been building a research archive (jianglens.com) where the primary reader is supposed to be an agent/chatbot,…
Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM (www.reddit.com) Today I set up a full coding toolbox on a single RTX 5080 (with RAM offloading) that's actually viable. Autocomplete: bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q6_K_L Agentic: unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL Why these models: Qwen2.…
Looking for an agent to learn on? (www.reddit.com) Not a programmer by trade (networking/cyber/cloud mostly) was looking to learn about AI especially agentic AI. I have a home media server so building something where it backs up app config periodically and puts it on a specific folder was…
Reactive Agents, Typed Event Handlers, and Agent Swarms: What's New in Mozaik (www.jigjoy.ai via hn) When we released Mozaik 3.0, we introduced an event-based architecture where participants emit, observe, and react to typed context items inside a shared agentic environment. Since then, the framework has kept moving - and the way we think…
Claude Cowork vs. Claude Code: Security Differences for Enterprise (generalanalysis.com via hn) Claude Cowork vs Claude Code: Security Differences for Enterprise Anthropic positions Claude Cowork as bringing "the same agentic architecture that powers Claude Code" into Claude Desktop for knowledge work. The agent loop is shared.
Agentic security coping strategies (www.reddit.com) Enterprise AI optimists, how are you dealing with whole agentic security issue? Are you: a) researching and looking for ways to implement agents safely and securely (plenty of vendors saying they can help with this - although from my resea…
How are top tech companies actually using LLMs internally beyond basic coding help? (www.reddit.com) I’m trying to understand how companies like Nvidia, Google, Amazon, Meta, Microsoft, OpenAI, Anthropic, and other top tech/startup teams are using tools like ChatGPT, Claude, Gemini, Codex, Claude Code, LangChain, LangSmith, etc. in real d…
Agentic AI token compression using Haskell (blog.dan-gilmour.com via hn) The plan My theory at the moment is as follows: - Code is now cheap with agentic AI developing it - Context is the expensive part - The biggest bottleneck appears to be context windows - At 1 million tokens, only maybe 500k are usable befo…
OpenCode + DeepSeek V4 Pro vs Claude Code CLI?🤔 (www.reddit.com) Im rather new to the whole Agentic automation AI's but Im hearing people with vibe coding were able to pull big unique projects they wouldn't be able to do by themselves or possibly needed to pay a huge fund to programmers, designers, etc.…
Stop struggling with Agentic AI - my repo just hit 540+ stars and 60+ forks!! (www.reddit.com) Quick update — my AI Agent Frameworks repo just passed 540+ stars and 60+ forks on GitHub!! When I first put it together, my goal was simple: make experimenting with Agentic AI more practical and approachable.
AI inference just plays by different rules (www.theregister.com via hn) MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…
Anthropic's bug-hunting Mythos greatest marketing stunt ever says cURL creator (www.theregister.com via hn) MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…
Offline Agentic Coding: OpenCode and Kilocode (www.williamangel.net via hn) Offline Agentic Coding part 2: OpenCode & Kilocode. Published 2026-05-07 OpenCode: Claude code with non-anthropic models feels limited.
Integrating standard operation procedures with agentic AI workflow (www.reddit.com) Hello guys, me and my team have been building an agentic workflow to answer customer questions (rn in langgraph). The use case goal is to answer ALL customer support questions.
72% of teams are running coding agents in production. Most of them can't say which agent they'd trust with a critical path change at 11pm, or why. (www.reddit.com) There's a governance gap stat making the rounds this week: 72% of firms are in production with agentic AI, 60% have no formal governance in place. Most of the discussion treats this as a policy problem, org charts, risk frameworks, sign-of…
Open-sourced our MCP server for GPU workload execution looking for feedback (www.reddit.com) Hey everyone I’m Jaguar, building Jungle Grid. We just open-sourced our MCP server for agentic GPU workload execution.
Silo: Isolated workspace manager for parallel agentic development (github.com via hn) silo Isolated workspace manager for parallel agentic development. Silo lets you launch multiple AI agents — like Claude Code, Codex, and OpenCode — to work simultaneously on the same repository, each in its own isolated Git worktree or clo…
I cracked upwork proposals with my AI agent (www.reddit.com) Been working on a problem that I think a lot of applied AI builders face: the odd friction of deploying LLM workflows directly into existing web platforms. That without forcing the user to constantly context-switch or copy-paste between ta…
My FREE Claude Code agentic layer I've been building for 6 months. Self-installing, no API keys, claude subscription needed only (www.reddit.com) Open-sourcing the agentic system I've been building for my own Claude Code use over the last 6 months. Multi-agent orchestrator, persistent memory, observable runtime.
Free/OSS agentic API interrogator (github.com via hn) GAIIA Expert Proxy (MCP Server) GAIIA Expert MCP Server is a Model Context Protocol (MCP) server that enables high-fidelity code audits, refactors, and architectural analysis using specialized Proxy Experts in conjunction with a remote LLM…
An MCP with SOM algorithm for controlling your desktop (computer use) integrating with claude code or any custom agentic harness. (www.reddit.com) Announcing Opendesk: Give any AI agent eyes + hands on your desktop. I was experimenting with computer-use capabilities from different models, but I wanted to keep using Claude Code and my own agentic harness to automate real desktop tasks…
(free) Built a remote cross platform agentic app (www.reddit.com) Hi everyone. I’ve been building Mate, a local-first AI coding workspace that lets you control your dev computers from desktop and mobile: macOS, Linux, Windows, iOS, Android, and Meta Quest.
Which model and version do you prefer for programming? (www.reddit.com) For me it's been opus 4.6 and sonnet 4.5 still. I feel stuck in the past, but I feel like the latest version is too unpredictable in agentic hands off workflows
Code Bench – Local-first desktop AI coding agent, BYO model (MIT) (benchlabs.app via hn) Free, MIT-licensed desktop AI agentic coding tool for macOS. Bring your own API key, work offline, keep your code private.
Akamai surges on big LLM deal as Cloudflare dims (www.theregister.com via hn) MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…
Owl Alpha – A free model for agentic workloads (prompts logged / closed-source) (openrouter.ai via hn) Owl Alpha openrouter/owl-alpha Released Apr 28, 20261,048,756 context$0/M input tokens$0/M output tokens OpenRouter provides an OpenAI-compatible completion API to 400+ models & providers that you can call directly, or using the OpenAI SDK…
How I built an agentic research team with Claude Code (www.reddit.com) Hi there, I've been seen a lot of people questioning how agentic systems work in practice. I see a lot of hype and theory, but not many real implementations.
Meet Tiro! Agentic assisted memory retrieval and session state memory module. (www.reddit.com) A year ago, when I first got into LLMs, I started by using them to play D&D. ChatGPT 4o was surprisingly good at narration, improvisation, and keeping the game moving.
What LiteLLM’s Security Breach Teaches AI Agent Engineering Teams (www.reddit.com) LiteLLM security breach is probably one of the biggest wake-up calls for teams building AI agents and agentic platforms. Most AI agent ecosystems today heavily depend on: Open-source packages GitHub Actions CI/CD pipelines Cloud credential…
Gathering resources on small LLM implementations (www.reddit.com) I’m looking to start a series of articles on how to use small lenguaje models to optimized agentic tasks and I was hoping to learn from the community first. If you can would love for you to either: 1) tell me what would you be interesting…
It's time to talk about agentic "remote control" (arpadvoros.com via hn) tailscale, where i run multiple end-points and authenticate myself from various devices. however, i have been experimenting with headscale - a self-hosted and open-source implementation of tailscale - i have the ability to run it on my NAS…
Best local agent setup for M5 Pro MacBook? (www.reddit.com) Looking to run AI agents locally on my M5 Pro MacBook. Been experimenting with ComfyUI for image generation and the results have been impressive.
Who's running local LLMs for agent workflows? What's your setup? (www.reddit.com) Curious how many people here are running language models locally as part of their agent stack. What model are you using and what are your system specs?
Agentwerk: A minimal Rust crate for agentic apps (github.com via hn) agentwerk A minimal Rust crate that gives any application agentic capabilities. Installation • Quick Start • Use Cases • API • Development agentwerk lets you create agentic workflows around a ticket-driven execution loop, with built-in too…
The "agent collab platform" might be the wrong bet for what comes next (www.reddit.com) I keep seeing the same trajectory in AI startup conversations: AI search → coding agents → OpenClaw → agent IM → ? Most people fill in that question mark with some version of "agent collaboration platform." AI-native Slack.
Reduce friction and latency for long-running jobs with Webhooks in Gemini API (twitter.com via hn) Today, we're making it easier and more efficient to build complex, long-running agentic applications with the Gemini API. We are introducing event-driven Webhooks, a push-based notification system that eliminates the need for inefficient p…
Show HN: Cyoda-go – application platform in Go without the Temporal/Kafka glue (github.com via hn) This started out as an experiment. Reading Simon Willison's blog on where StrongDM was going with dark factories and Digital Twin Universes https://simonw.substack.com/p/how-strongdms-ai-team-build-se...
The simplest agent orchestration strategy that works: two agents instead of one (juanreyero.com via hn) The simplest improvement you can make to your agentic programming workflow is to run two agents instead of one. One writes code in its own worktree; the other, in a parallel worktree, reviews it.
What does it actually mean for an AI to act on your behalf? Thinking through the design choices. (www.reddit.com) Been thinking through this while building a product where an AI handles internal workplace communication for each employee. The phrase "act on your behalf" gets used a lot in the agentic AI space, but the design decisions underneath it var…
Meko the multi agentic data layer (www.reddit.com) Meko is the agentic data layer that stores memories, knowledge, conversations and traces across your agents. You can promote (learnings) personal memories to shared knowledge so that other agents can access them and enrich their context.
Agentic AI isn't a new threat. It's a stress test for the hygiene debt we never paid off. (www.reddit.com) Heard something on Curiouser & Curiouser podcast recently that I found super interesting, thought id share here. The guest framed agentic AI in a way I hadnt considered.
Cloudflare is laying off 1,100 employees to prepare for 'the agentic AI era' (www.businessinsider.com via hn) - Cloudflare on Thursday announced layoffs of 1,100 staff as it reorganizes for "the agentic AI era." - First-quarter earnings exceeded expectations, but Cloudflare shares dropped over 14% after hours. - Read the full memo sent to staff be…
Agile for Agents: Proposing PACE — a Unit of Agentic Work (www.reddit.com) Hi everyone, I'm a founder working on a couple of startups, with a background in IT/software project and program management — heavy in Pharma, mostly SAFe Agile. As I've been working with my startups, I have been attempting to define a met…
Building an AI-First Professional Services Firm — Best LLM Stack, Agents, and Automation? (www.reddit.com) Looking to start a local professional services firm and wanted to get advice from this community before launching. I’m trying to architect the business “AI-first” from day one.
Product Manager Agent – turn meetings into assigned tickets automatically (github.com via hn) PM Agent An agentic AI system that turns meeting transcripts into Linear ticket updates — automatically. Upload a transcript and an LLM agent searches your Linear board, reasons over the discussion, and proposes field changes, status moves…
Validating agentic behavior when "correct" isn't deterministic (github.blog via hn) Gaurav Mittal Principal Researcher, Microsoft Code | AI. I am a tech lead focused on product-driven AI research to improve the developer ecosystem and Github Copilot experience via intelligent and reliable models and agentic frameworks.
Rewriting e2e tests every time the UI changes? (www.abelenekes.com via reddit) Hey people, FE dev here, talking about testing again! I adopted agentic coding a little more than a year ago.
AI uses less water than the public thinks, Job Postings for Software Engineers Are Rapidly Rising and many other AI links from Hacker News (www.reddit.com) Hey everyone, I just sent issue #31 of the AI Hacker Newsletter, a weekly roundup of the best AI links from Hacker News. Here are some title examples: Three Inverse Laws of AI Vibe coding and agentic engineering are getting closer than I'd…
In search for the light. Please enlighten me (or tell me to stop looking for light). (www.reddit.com) I fell for it. Months ago.
How are you using cache in an agentic system or workflow. (www.reddit.com) I’ve been developing AI agents several months. A big problem I’ve faced is LLM costs in productions.
Classification graphique visuelle pour la sécurité des blockchains : Expériences d'ajustement de Qwen2-VL sur AMD MI300X (www.reddit.com) Hi everyone, I’ve been working on a computer vision approach to a specific security problem in the "Agentic Economy": identifying malicious transaction patterns that are mathematically obfuscated but topologically distinct. The Problem Tra…
Hunk: Review-first terminal diff viewer for agentic coders (github.com via hn) hunk Hunk is a review-first terminal diff viewer for agent-authored changesets, built on OpenTUI and Pierre diffs. multi-file review stream with sidebar navigation inline AI and agent annotations beside the code split, stack, and responsiv…
TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (lightseek.org via hn) Agentic coding has quickly scaled from promising demos to a force that is reshaping how software is developed and how frontier AI systems are built and deployed. Systems like Claude Code, Codex, and Cursor have gained massive user adoption…
ArcKit – The Agentic AI Architecture Governance for Governments (arckit.org via hn) What is ArcKit? 117 AI-assisted commands that generate complete governance documents — from stakeholder analysis and risk registers to design reviews and traceability matrices.
Deploying Agentic Analytics in Financial Services (benjaminwootton.com via hn) Deploying Agentic Analytics In Financial Services For the last few decades, businesses have built dashboards and reports and had data analysts and data scientists analyse their business data and inform decisions. As with many fields, AI is…
How to create really useful AI agents using Claude (www.reddit.com) I want a Agentic Operations manager who handles my team members by monitoring their leads , distributing leads, analysing them, reporting them etc. how to build it?
Why Infinite Context Windows Don't Solve the AI Agent Architectural Problem (www.reddit.com) I wrote this because I keep seeing the same assumption in agentic workflows: “Just give the agent more context / longer windows / bigger memory and it will become more reliable.” In practice, once you move into real MCP-connected, tool-usi…
Open-source MCP server for Ejentum cognitive harnesses / (reasoning, code, anti-deception, memory) (www.reddit.com) Open-source MCP server that exposes four cognitive harnesses as tools any agentic client can call. Each tool returns a structured cognitive scaffold (failure pattern to avoid, procedure, suppression vectors, falsification test) that the ca…
Knowledge Robot: Repetitive Agentic Work for Knowledge workers (Apache-2.0 license) (www.reddit.com) Yes, for engineers it is easy to just put an agent on a headless loop. But in the real world I see knowledge workers having to initiate the same and the same agentic process again and again.
Before You Score the Model, Score the Benchmark (centre-for-software-excellence.github.io via hn) Before You Score the Model, Score the Benchmark: A Skeptical View Into Current Agentic Software Engineering Benchmarks 2026-05-04 We surveyed several SWE benchmarks across bug-fixing and feature-implementation domains, and each had its own…
Show HN: Rival AI – AI compliance agents and regulatory corpus (tryrival.ai via hn) I'm the builder of this and its taken a few iterations to get to where it's at today. Current landscape of regulatory compliance work is so manual and time consuming for critical infrastructure industries, that was the glaring problem that…
Embodied AI with Claude, Raspberry Pi and Arduino (github.com via hn) AGENTIC HAL_9000 Hal_9000 from 2001: A space odyssey. link to the video: youtube The agentic AI is anthropic claude model with langchain framework.
Ask HN: How are you structuring your .md docs to facilitate agentic development? (news.ycombinator.com) could not extract summary
Show HN: Token Usage Meter 12 Providers and Coding Agent (qlaud.ai via hn) Here once again A Token Usage Meter for 12+ AI Providers Anthropic, OpenAI, Google, Alibaba qween, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, Perplexity. Qlaud.ai provides token usage meter / AI billing layer.
App I made to make waking up more fun (not an Agentic AI B2B SaaS startup) (apps.apple.com via hn) Unsnooze Challenge Alarm Clock Loud Alarm Clock, No Snooze Free · In‑App Purchases Struggling to wake up in the morning? Unsnooze forces you out of bed by turning your alarm clock into a challenge.
PageIndex: Vectorless, Reasoning-Based RAG (github.com via hn) PageIndex: Vectorless, Reasoning-based RAG Reasoning-based RAG ◦ No Vector DB ◦ No Chunking ◦ Human-like Retrieval 🌐 Homepage • 🖥️ Chat Platform • 🔌 MCP & API • 📖 Docs • 💬 Discord • ✉️ Contact 📢 Updates 🔥 Agentic Vectorless RAG — A simple…
SAP to Acquire Dremio to Unify SAP and Non-SAP Data to Power Agentic AI (news.sap.com via hn) WALLDORF and AUSTIN — SAP SE (NYSE: SAP) and Dremio today announced that SAP has agreed to acquire Dremio, an open, high-performance data lakehouse platform built to accelerate agentic AI and expand SAP Business Data Cloud’s ability to com…
British mathematician hands OpenClaw agent a credit card (www.theregister.com via hn) Brit mathematician lets AI agent loose with credit card – cue password leaks, CAPTCHA chaos and more Professor Fry's AI experiment shows light and dark sides of agentic tech British mathematician Professor Hannah Fry has shared a cautionar…
if the guy who built Tesla Autopilot feels behind in coding, we are all cooked (www.reddit.com) guys I just watched the new Karpathy interview and my mind is legitimately blown bcz the dude who helped build OpenAI and Tesla Autopilot literally just admitted he's never felt more behind as a programmer since agentic tools got so crazy…
Anyone else losing tokens to hallucinated MCP tool calls in production? (news.ycombinator.com) I have been building an agentic system on a custom internal platform and the llm keeps calling tools with identifiers that dont exist, wrong namespace, wrong handle, wrong enum. gets back an error, retries, still wrong.
The RAG era is ending – a compilation-stage knowledge layer is what comes next (venturebeat.com via hn) The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next | VentureBeat Orchestration Infrastructure Data Security More Newsletters Featured The RAG era is ending for agentic AI — a new compilation-s…
Beyond simple filters: implementing autonomous agentic moderation for high-velocity chat. (www.reddit.com) we’re looking at the architecture for a new community platform and the moderation piece is a major headache. traditional keyword-based regex is basically a joke against modern spam/trolls.
Found a free agentic AI course that actually explains things without assuming you're a developer (www.reddit.com) ve been trying to learn about AI agents for a while but kept hitting walls — either the content was too surface-level or it immediately jumped into Python frameworks I'm not ready for. Stumbled on SimplAI University (simplai.ai/simplai-uni…
Adding Pyrefly Type Checking to Your Agentic Loop (pyrefly.org via hn) Adding Pyrefly Type Checking to Your Agentic Loop Coding agents are writing more Python than ever. Tools like Claude, Copilot, Cursor, and Codex generate entire features with little-to-no user interaction.
I can’t keep up with the AI tool rat race anymore. The real meta-skill for 2026 is learning what to ignore. (www.reddit.com) Every day, my feed is flooded with posts about AI agents building startups, replacing entire engineering teams, or generating "millions" in passive income - usually with zero proof of the actual work. I’ve been deep in this space for a whi…
Best config for Qwen3.6? (www.reddit.com) With all the high praise for the model all around, I also want to try it on my own. I have an rtx3060 12gb vram and 16gb system ram.
Claude code agentic framework (www.reddit.com) Hi guys, is there any low code UI based agentic builder offered by claude for building agents??
Agentic workflow that can find and acquire customers for $0.10 😆 (www.reddit.com) Im curious if anyone is building a sales tools with AI. Im building one from scratch because cold outreach was killing me.
Ask HN: When did you move from AI agentic loops to simpler deterministic system? (news.ycombinator.com) Industry is increasingly moving towards complex, autonomous agentic loops and feedback chains. They obviously comes with significant latency, non-determinism, low-accuracy and cost.
Promptise Foundry – a Python agentic framework for building production systems (github.com via hn) Promptise Foundry The foundation layer for agentic intelligence. Ship full-stack agentic systems the way they're meant to be built — production-ready, secure by default, with the developer experience modern Python deserves.
any course equivalent to some of the offered Agentic AI program free? (www.reddit.com) I am seeing courses like (in the comment) from Carnegie Mellon University’s School of Computer Science Executive Education And many more online but each costs good money. Anyone online free that I could get started with?
Agentic RAG Explained in 3 Levels of Difficulty (machinelearningmastery.com via hn) In this article, you will learn what agentic RAG is, how it differs from traditional RAG, and when to use it. Topics we will cover include: The key limitations of traditional RAG pipelines and what agents add to address them.
Show HN: Curated, non-slop articles on agentic coding (offautopilot.substack.com via hn) The sea of slop We’ve entered the era of mass-produced mediocre dev content. Posts praising ai and posts hating ai are both generated by ai.
Key Components of a Linux Distribution for AI Agents (www.ericburel.tech via hn) Computers now have a new type of user: AI agents. This article outlines the features mainstream Linux distributions would need to call it an \"Agentic OS\".
Show HN: Optical Design and Simulation in Matlab (www.mathworks.com via hn) Hi HN, We have been working an optical design and simulation library as a small start-up-ish team within MathWorks (makers of MATLAB and Simulink). I have seen a few optics and MATLAB posts here, so figured this would be a good place to sh…
Chatgpt right now (www.reddit.com) The industry seems to be building models stronger in agentic and coding tasks, but weaker as a co-thinking presence It feels like they are improving performance on measurable tasks, evals, coding benchmarks, and agent workflows, while also…
me beginner: How to use Kimi 2.6 in Cursor? (www.reddit.com) I just paid kimi official subscription. I dont want to use Kimi code, the console-looking thing, but I want to use like the Cursor agentic feature.
One Question About AI Most People Avoid Answering… (www.reddit.com) Everyone’s talking about Agentic AI… but very few are actually using it right. So here’s a real question: If you had to give ONE outcome (not a task) to an AI agent — something it fully owns end-to-end — what would you trust it with today?
Show HN: A marketplace for LLM-powered webapps earning on token margins (codeplusequalsai.com via hn) Hi everyone, I've encountered two major problems while building AI-powered sites: 1) Most agentic tooling doesn't have a enough of a targeted approach to edits to existing files, and will make extraneous edits, 2) Many users will want to t…
Ask HN:Do people configure Claude Code to use other models (openrouter.ai via hn) Claude Code is Anthropic's agentic coding tool that reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures, all from natural language prompts. Claude Code uses OpenRouter to access hundred…
OWASP Agent Security Regression Harness (github.com via hn) OWASP Agent Security Regression Harness The OWASP Agent Security Regression Harness is an open source, vendor-neutral test harness for running executable security regression scenarios against agentic applications and MCP-integrated systems…
Is it worth adding local LLM to agentic coding stack? (www.reddit.com) Hey All my agentic coding stack includes claude-code 20x max, and codex 20x max. I use heavy scripting for orchestrating and testing multiple projects, been ai coding for 3 years.
Why we ended up with 4 agents and 3 protocols for agentic commerce on Shopware (www.reddit.com) Most agentic-commerce demos I see online are a single agent plus RAG over a product catalog. That shape works for a 200-SKU demo.
Free reference site for getting into AI agents — tools, workflows, and Claude Skills (www.reddit.com) Built this over the past month as a free reference site for people getting into AI agents. What tools to use, where to start, what each tool does, and how the agent-tool landscape fits together.
Opinions on Shopping Agents? (www.reddit.com) I think the agentic commerce industry has a lot of potential to take off, but the biggest concern I have is how agents will pick good items for users. Even when shopping for myself, it's hard to find the right thing when looking at a produ…
Show HN: Arc Browser + Agents IDE (github.com via hn) ArcNext Arc x Terminal = ArcNext. Built for the Agentic Era.
The OpenAI-Microsoft reset, decoded: Why AWS may come out ahead (thenewstack.io via hn) The OpenAI-Microsoft reset, decoded: Why AWS may come out ahead OpenAI wasted little time since announcing changes to its partnership with Microsoft on Monday. The ChatGPT hitmaker is now bringing its models, coding tools, and agentic capa…
AMD PRO W7900 vs R9700 for Local Inference? (www.reddit.com) I thought of upgrading my RX 6800 for Local LLMs (Mostly Agentic Coding) and Video Generation on Linux. I focused on the AMD PRO R9700 32gb and the PRO W7900 48gb because performance on Linux is very good with AMD and both cards have a gre…
Abaxx Announces Release of Open-Source Library for Agentic Identity: Agents++ (investors.abaxx.tech via hn) May 1, 2026 Abaxx Announces the Formation of Abaxx Labs and the Release of Open-Source Library for Agentic Identity: Agents++ Agents++™ is a subset of Abaxx’s ID++ software development kit that has been tuned for AI agents, providing the i…
Agentic Manifesto (apaydin.bearblog.dev via hn) Agentic Manifesto When Karl Marx analyzed capitalism, one of his central ideas was surplus value. Profit comes from extracting more value from labor than workers receive in wages.
Text-to-image is easy. Chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports Image generation playground and creative media workflows! (www.reddit.com) Hey everyone, If you’ve been building with AI agents, you know that orchestrating text is one thing, but stepping into multimodal workflows (Text + Image + Vision) is incredibly messy. If you want an agent to act as a "Prompt Engineer," pa…
What differentiates agents that ship real work from ones that don't (www.reddit.com) Sharing some thoughts on AI agents. Right now, one axis differentiates them: are you inside the agentic loop or outside it Inside works.
I built a practical guide for running real businesses with Claude (based on 35+ founder stories) (www.reddit.com) I read through 35+ Reddit threads of people actually building and running businesses with Claude — from local service agencies to solo SaaS founders. I distilled the best patterns, frameworks, and hard lessons into one repo: https://github…
The Spectrum of Agentic Coding [video] (vimeo.com via hn) This is "The Spectrum of Agentic Coding_ From Vibe Coding to High-quality Software Engineering by YK Sugi, Eventual" by Anna D on Vimeo, the home for high quality videos and the people who love them.
Just wondering (www.reddit.com) I recently started a new position in a new working place, and while Ai usage is not brand new to me, I need some clarifications. The organization I am working for is at the very beginning of transitioning towards a heavy Ai usage in all co…
Agentic User Research Tool (github.com via hn) Research AI AI-powered user research, end to end. Frame a problem, pick your personas, attach your artefacts — then watch eight archetypes interview themselves and synthesise a report.
Built a self-healing agent by splitting diagnosis (0.6B SLM) from execution (agentic CLI). Open-source demo. (www.reddit.com) We've been chasing a pattern for autonomous bug-fixing that decouples diagnosis from execution. The end-to-end demo we ended up shipping diagnoses and fixes IoT schema-drift failures in seconds, no human in the loop.
Agentic AI Architecture in 2026 — What do you know about MCP, A2A and how enterprise systems are actually built? (www.reddit.com) Most discussions around AI are still focused on models. But in production, the real challenge is architecture.
My local agentic dev setup today (willemvandenende.com via hn) I was planning to write about my local development setup at my leisure. Moving this forward as my post on LinkedIn the other day about cancelling my Claude Max $100 plan and going local raised a lot more interest and questions than I expec…
Show HN: Notesync.md, macOS/iOS Keep notes in Markdown for agentic workflows (github.com via hn) I created a quick iOS + MacOS note taking app to allow me to add quick entries to notes throughout the day. These notes sync to Markdown files on my Mac, and I have Claude deliver me project updates & reminders based on the contents of the…
The architecture of Agentic Commerce: protocols vs. browser-based agents (www.cartai.ai via hn) Why closing the transaction is the hardest unsolved problem in agentic commerce Agentic commerce is poised to have a huge impact on how consumers buy things. The demand is already there: 51% of consumers say they would be open to an AI age…
Should other living systems have agentic reprsentation? (www.speakforthetrees.com via hn) Explore your local ecosystem The Rights of Nature movement recognizes ecosystems as legal entities, instead of as a collection of resources to be managed. Ecosystems around the world are gaining legal personhood, with human guardians being…
Fido Alliance to Develop Standards for Trusted AI Agent Interactions (fidoalliance.org via hn) \Formation of Agentic Authentication Working Group and development of agentic payment frameworks will support trusted, interoperable agentic workflows\__ April 28, 2026 –The FIDO Alliance today announced initiatives to develop interoperabl…
TypeScript framework for building non-blocking AI agents (github.com via hn) Mozaik Mozaik is a TypeScript framework for building AI agents that share an agentic environment instead of being orchestrated through rigid pipelines. In Mozaik, humans, agents, observers, and tools are all Participants of the same Agenti…
The Agentic Software Development Life Cycle Framework (asdlc.io via hn) Agentic Software Development Lifecycle For 50 years, software development has been a Craft: dependent on individual artisans, manual tooling, and implicit knowledge. We believe the next era of software engineering is Industrial.
Microsoft is ruining Outlook with Agentic AI. Now it will handle all your emails on your behalf. What you guys think about this is this good? (www.reddit.com) Microsoft CEO Satya Nadella posted tweet: Agent Mode is here in Outlook! Copilot can now help run your inbox and calendar, triagingemails, rescheduling meetings, and helping you stay ontop of what matters most.
One trick for better agentic engineering. (www.reddit.com) Start with a weaker model. Improve the prompt, context, examples, tests and acceptance criteria until the output is good.
Quint – Behavioral security for AI agents, OS-level interception (quintai.dev via hn) Behavioral security for the agentic era. Quint intercepts every AI agent action at the OS level, scores it for risk in real time, and signs a cryptographic audit trail.
Where does local inference fit in the future of AI coding agents? (www.reddit.com) Genuine question for this community. Every major AI coding agent right now is cloud-only.
What agentic AI borrowed from microservices (and made worse) (temporal.io via hn) The microservices era already solved the problems AI agents face in production. Read this nuanced analysis of EDA, event sourcing, and orchestration for agentic AI.
Multi-agent in production: real win or just hype? (www.reddit.com) Trying to get an honest read on this from people actually shipping. Every other AI announcement lately is "agentic" or "multi-agent," and I can't always tell if it's a real architectural shift or rebranded function calling with extra steps.
The age of Agentic Commerce has arrived. Consensus 2026 is where you can (www.coindesk.com via hn) The age of Agentic Commerce has arrived. Consensus 2026 is where you can experience it IRL AI agents are already transacting.
Launching Agentic Orchestration Platform (Open Source) (sinas.co via hn) Open Source · Self-hosted · AGPL v3 Build AI-powered applications, not infrastructure Agents, functions, database queries, state, files, and templates — behind a single API with role-based access control. Deploy with Docker Compose.
Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com) Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
↯ Fine Tuning↯ Function Callingfunction-callingfine-tuningrag+3
Agentic NixOS: Building a Safe Control Layer (nedkarlovich.com via hn) A six-part series on building Agentix, a cautious agent-control layer for NixOS. From philosophy to MVP to roadmap.
Has Anyone vibe coding an AI Agent or Agentic AI system?! (www.reddit.com) Hey everyone! Looking for some guidance and suggestions, as to whether anyone has worked or is working on building AI Agents or Agentic AI systems completely through vibecoding, especially by LangChain+LangGraph.
AI based Research suggestion (www.reddit.com) Hey guys, any suggestions on what tools or methods which works best in the current market for research on any topics in general. I mostly do research on AI tools, agentic frameworks, what is new, what problems exist etc.
Copilot Cloud Agents and OSS in 2026 (www.reddit.com) What is it that makes Github Copilot cloud agents so easy to use (developer friendly). - Is it the integration with the github UI (assign to agent)?
Benchmarking Inference Engines on Agentic Workloads (www.appliedcompute.com via hn) Benchmarking Inference Engines on Agentic Workloads Large language model inference engines are typically benchmarked with prompt-heavy, decode-heavy, or balanced workloads. InferenceX from SemiAnalysis, for example, tests a workload with a…
Is anyone being "highly encouraged" to integrate agentic AI even if it doesn't make sense? (www.reddit.com) I work in video post-production and while there are a lot of AI tools on the rise for editorial, it's fairly unclear if/where agents have a spot in the producer workflow. Some of my job is budget and schedule, but alot of it is decision ma…
Does Claude create graphic reports from spreadsheet data? (www.reddit.com) I am often times trying to pull data from spreadsheets and making charts and graphs to better represent the data for others to understand. Does Claude handle this well?
why does GPT 5.5 have a restraining order against "Raccoons," "Goblins," and "Pigeons"? (www.reddit.com) why does GPT 5.5 have a restraining order against \"Raccoons,\" \"Goblins,\" and \"Pigeons\"? I just saw the full system prompt leak for 5.5 (April 23rd release).
Claude Code, extended to everything (www.reddit.com) everyone hitting Claude Code rate limits knows the pain you're mid-build, momentum is real, then it just stops. you wait 5 to 9 hours, restore the cache, come back to a session already at 30% used before you typed a single line.
Open Source Knowledge Graph With Versioning (www.reddit.com) I've been running into problems with “agent memory” while using claude when it was a pile of markdown files, started out great but became unreliable as the number of files grew. So I built Omnigraph , an open-source graph runtime for agent…
Where should AI agents discover secondary-market supply? (www.reddit.com) I've been thinking about a gap in agentic commerce. A lot of the current work seems focused on helping agents buy from existing stores, suppliers, or checkout flows.
Architectural Requirements for Agentic AI Containment (arxiv.org via hn) The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that agentic AI systems with autonomous too…
hackers of reddit I have a doubt (www.reddit.com) in this time where agentic ai is becoming a real thing, im curious how its actually impacting you guys on the ground is it making it easier to break into systems or is it actually helping people secure things better? like are you able to m…
Building a tool to debug AI agents because current debugging is painful. Curious what’s the most frustrating failure you’ve hit (www.reddit.com) I’m tired of 'vibe-checking' my agents. I’ve been building a few complex agentic workflows lately, and the most frustrating part isn't the initial code, it's the non-deterministic drift.
Scaling Test-Time Compute for Agentic Coding (arxiv.org via hn) Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined.
Building a Full-Stack Agentic AI Platform (RAG + Orchestration + Governance) — feedback? (www.reddit.com) Hey folks 👋 I’ve been working on an AI agent platform called Noevex, focused on real production use—not just demos. In practice, AI systems struggle with: multi-step orchestration connecting multiple data sources controlling agent actions…
AgentSwarms now has free agent skill library and skill generation tool! (www.reddit.com) Hey Everyone, If you’ve been building multi-agent workflows (with LangGraph, CrewAI, Swarm, etc.), you’ve probably hit the exact same wall I did: System Prompt Bloat. When we start out, we tend to stuff everything into a single prompt: "Yo…
Show HN: Building a Real Agent, Step-by-Step (building-an-agent.pagey.site via hn) We are going to ignore the hype and build an agent from first principles, piece by piece, without using any framework — so we can see what an agent actually is under the hood, and how it works. What "agentic loop" actually means A normal L…
I asked Agentic AI security tool to demonstrate its usefulness with use case examples (www.reddit.com) Sentinel Gateway is a token-gated security middleware that sits between humans and AI agents. It solves prompt injection — the #1 LLM security risk (OWASP 2025) — through structural enforcement, not content filtering.
Show HN: Delegare – let AI agents pay safely (x402, AP2 – base/USDC and Stripe) (delegare.dev via hn) Hi guys, am building SecureLend.ai and when working on our underwriting agents (free trial, paid after) I had issues with seamless payment options. Of course I looked at x402 which I believe is a great protocol but not a fan of a) sharing…
Is 15% context growth per loop a fair benchmark for agent cost estimation? (www.reddit.com) I’ve been running some math on recursive agentic loops using April 2026 rates (specifically for GPT-5.4 and Claude 4.7). In my tests, I’m seeing a massive cost "hockey stick" around loop 15-20 because of how the context grows.
Show HN: I built a way to see if your SDK is AI-friendly (news.ycombinator.com) Have you ever wonder if your SDKs is friendly for Agentic AI like Claude Code or Codex? I built an opensource (Apache 2.0) CLI that answer that question for you.
Rick and Morty Tried to Warn Us About Agentic AI (jadarma.github.io via hn) To be fair, you have to have a very high IQ to understand Rick and Morty. The humor is extremely subtle, and without a solid grasp of machine learning most of the jokes will go over a typical viewer’s head.
Ask HN: Enterprise Agent Orchestration Recommendations? (news.ycombinator.com) I've been made tech lead for our internal Agentic Platform and Experience. This effort will support both the developers and business teams.
Claude API - SDK vs ClaudeCode : Can someone explain the tokenomics for caching and agentic flows (read, write, fetch, etc.) (www.reddit.com) I am trying to do some research across a number of attributes, which requires a lot of web fetch (at times dynamic) and just tried the API based approach. Why is the SDK-API version so expensive compared to the Max plans, despite caching?
Can agentic AI consent on your behalf? (blog.avas.space via hn) can agentic AI consent on your behalf? Tech companies have been promising that online shopping or booking a hotel can soon be handled by AI.
Claude token efficiency: a practical guide for Claude Chat , Claude Code, and API users. How to use tokens economically, ecologically, and intelligently. (www.reddit.com) # Claude token efficiency guide ## Contents - [**Chapter 1: You use [Claude.ai](http://Claude.ai), Claude Desktop, or the mobile app. You do not write code, you do not call the API.
Built a 22-endpoint API delivering enriched UK Gov Data — with x402 for agentic buyers (www.reddit.com) Homescreen - Try all endpoints for free I wanted share a recent project I wanted to build a project around free-to-use data, that when brought together, enriched and made easy to use, would be valuable to people. I used Claude Code to buil…
Ask HN: How do you solve aggregation when agentic RAG breaks down? (news.ycombinator.com) I keep hitting the same failure mode with agentic RAG over collections of similar PDFs, like monthly electricity and gas bills from the same utility provider. It works well for retrieval: “Find my gas bill from January.” Though even there…
Forget chatbots. A single enterprise just hit 146M Agent-to-Agent (A2A) tasks. (www.reddit.com) We talk a lot about theoretical multi-agent frameworks (like AutoGen or CrewAI) and AGI timelines here, but I just saw some wild real-world deployment stats from a massive global marketing conglomerate. They recently reported that over the…
Which is the best AI agent to use for development of website and Architecture design and which mcp (www.reddit.com) Basically i want to do a fresh start with this AI agentic Development, Anyone here can guide to which is the best set of tools to use and which mcp and plugins do i need to setup. Consider i am going to use Claude code and i use some time…
Ask HN: What does your agentic software dark factory look like? (news.ycombinator.com) In some of the comment threads around here a few of you shared interesting ideas and patterns, enough that I believe everyone interesting in harness engineering is working on some sort of software dark factory or another. We have OpenAI’s…
Native Dialog popup failures (www.reddit.com) I'm currently creating a couple of agentic workflows that include various cases of downloading files automatically on different UIs, but, since I'm using chrome MCP for navigation, whenever a "save as" dialog shows up, claude is unable to…
Agent Index Documenting Technical and Safety Features (arxiv.org via hn) Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosystem is complex, rapidly evolving, and in…
Agentic sprawl is becoming a real ops problem - how is your team actually managing behavioral policies across agents without a central dashboard? (www.reddit.com) Six months ago we had 3 agents in production. Now we have 17.
Moving from Cursor to VS Code + Codex/Claude Code: Is it worth the switch? (www.reddit.com) Hey everyone, I’ve been on Cursor Pro for a month and I love the workflow—constantly jumping between Ask, Planning, and Agentic modes. It just works.
Claude Max users, what do you do good sirs? (www.reddit.com) I'm a claude pro user for almost two years now, used gpt pro previously but switched to claude after feeling it was better for my coding usage. I barely hit 30 percent usage of my weekly limit, there are instances where I maxed out, but ve…
WordPress: The Operating System of the Agentic Web (automattic.com via hn) We’ve invited executives from across Automattic to share their perspective on leadership, open source, and the future of the open web. The latest comes from James Grierson, our head of global expansion, who shared his thoughts on the WordP…
DeepSeek V3.2 looping bug: what settings / harness tweaks are actually reducing it in production? (www.reddit.com) I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…
Built a Legal RAG Chatbot for Indian lawyers covering BNS, BNSS, BSA and DPDP Act 2023 — Custom PageIndex + BERT + GPT-4o [Live Demo] (www.reddit.com) I ran a business for 12+ years. Traveling constantly.
Ask HN: Is "agentic" coding working for everyone except me? (news.ycombinator.com) I'm a solo developer, working on my own for my startup. I use AI/LLMs extensively in my work to explore new ideas, but the vast majority of my code is manually written.
Simulating and Evaluating Agentic Systems (www.gojiberries.io via hn) Simulating and Evaluating Agentic Systems Most teams building agentic systems know they need some way to test them. An agent interprets ambiguous input, picks actions in a loop, maintains state across many steps, and has to land in the rig…
Anyone here building agentic commerce? (www.reddit.com) I’m getting close to launching an agentic commerce product and wanted to connect with people who are building in this area or have already shipped something similar. Mostly just hoping to compare notes before going live, especially around…
It's OK to Use Agentic to Revive the Projects You Never Were Going to Finish (blog.matthewbrunelle.com via hn) It's OK to Use Coding Assistance Tools To Revive The Projects You Never Were Going To Finish Note: I initially drafted this before my last post on how Claude Code is getting worse. I'm putting it out now so I can reference it in a future p…
Agentic AI for Hormuz Shock Modelling (avkcode.github.io via hn) EIA 1H25 flow estimate, roughly one-fifth of global petroleum liquids consumption. Hormuz Shock IEA range for pipeline alternatives; EIA cites about 4.7 mb/d from Saudi and UAE lines.
LogAct: Enabling agentic reliability via shared logs (arxiv.org via hn) Agents are LLM-driven components that can mutate environments in powerful, arbitrary ways. Extracting guarantees for the execution of agents in production environments can be challenging due to asynchrony and failures.
Ask HN: Agentic Prompt Compaction Strategies (news.ycombinator.com) What are your favorite reasoning/compaction strategies for saving token spend, and why?
Show HN: The why and how of TurboPentest for the Agentic Era (integsec.com via hn) Here is the story of why/how I built TurboPentest. TurboPentest was designed for the AI era and to address the large volume of code now produced by coding assistants and the associated security vulnerabilities it introduces.
Building the Agentic State in Estonia: What is taking shape (luukasilves.substack.com via hn) Building the Agentic State in Estonia: What is already taking shape Over the past generation, Estonia has built one of the world’s most advanced digital states. The next shift is not simply toward more digital services, but toward a more a…
Building an agentic escrow for software projects (news.ycombinator.com) I am building an AI powered escrow service for software projects that intends to protect both freelancers and the clients. - Freelancers: your IP (code/repo) always stays private - Clients: you get sandboxed link + detailed report (specs,…
Agentic AI Foundation (www.reddit.com) The Linux Foundation's newly formed Agentic AI Foundation is now the permanent governance home for both MCP and A2A — a signal that both protocols are becoming infrastructure-grade standards. This is the biggest consolidation of agentic AI…
Is an Open AI OS on the horizon? (www.reddit.com) If not an OS proper, on desktop a full screen never need to leave app? The big dogs (msft, apple, google) already have operating systems and they will inevitably make their own assistants and models first class.
R2-D2 Monitor: A personality-driven Windows TUI built with Claude (www.reddit.com) I wanted to share a project I’ve been building called R2-D2 Monitor. It’s a high-performance system telemetry console for Windows, built entirely in Go using the Bubble Tea framework.
Show HN: Legal Action Boundary Eval for agentic legal workflows (github.com via hn) We published LABE, a public benchmark for legal AI at the exact point where a system is about to take a real high-impact action. Current result: baseline executed 18 unjustified high-impact action points with VerifiedX that dropped to 0 fa…
Has anyone managed to use gemma 4 e4b in Open Code/other agentic TUIs? (www.reddit.com) Hi everyone, as a power user I hit Claude Code's usage cap too often I wanted to set up my own local model, however I only have RTX 5070 with 12 GB of VRAM so the only realistic option was Gemma 4 with effective 4B params. When I tried to…
Ask HN: Are startup job titles evolving in the agentic era? (news.ycombinator.com) I’m curious if founders and engineering leaders feel that traditional job titles no longer accurately describe what an early team actually does in an AI-native workflow. For those of you who have started companies recently, or are radicall…
Ask HN: How are you handling domain registration in agentic workflows? (news.ycombinator.com) I've been building tools for AI agents and the domain registration step is still completely manual. You have to go to a registrar website, search, click through a checkout flow, configure DNS.
is Qwen3.6-27B comparable with Opus 4.5? (www.reddit.com) https://preview.redd.it/qtzdx5ud0rwg1.jpg?width=1200&format=pjpg&auto=webp&s=aa25d9f0bb8007ee6e4065cfa46a9685454c89cd - Outstanding agentic coding, surpasses Qwen3.5-397B-A17B across all major coding benchmarks - Strong reasoning across te…
The model alone is not the agent. The harness plus the model is the agent (www.reddit.com) An agentic harness is the orchestration and control layer wrapped around a base language model that transforms it from a stateless text predictor into an agent capable of taking actions, calling tools, maintaining state across steps, and e…
Symposium: Community-Oriented Agentic Development (smallcultfollowing.com via hn) Symposium: community-oriented agentic development 21 April 2026 I’m very excited to announce the first release of the Symposium project as well as its inclusion in the Rust Foundation’s Innovation Lab. Symposium’s goal is to let everyone i…
I'm building a registry where AI agents can pull production-ready prompts and structured inputs programmatically (www.reddit.com) One pain point I keep running into with agentic workflows: there's no good place to store, version, and share the prompts and JSON configs that actually power your agents in production. I'm building Fortae to fix that.
Help sending Voiceflow data to Make.com (www.reddit.com) Hoping somebody can help me. I’m creating an agentic chatbot in Voiceflow.
Agentic Coordination, Human Delivery (dontdos.substack.com via hn) Agentic coordination, Human delivery Posted anonymously by a CTO who'd rather not turn a difficult year into a marketing exercise. About nine minutes, if you read at a civilised pace.
A Comparison of Agentic AI Systems and Human Economists (marginalrevolution.com via hn) A Comparison of Agentic AI Systems and Human Economists This paper compares agentic AI systems and human economists performing the same causal inference tasks. AI systems and humans generally obtain similar median causal effect estimates.
Agentic Market (agentic.market via hn) AI Learning Resources (www.reddit.com) Show HN: Modern AI client for Mac with agentic tools, clean UI, builtin privacy (elvean.app via hn) If you don't like Claude Desktop or ChatGPT app you're not alone, here are some of the reasons why I don't like them and decided to built an alternative. Lack of control You can’t control the web-search (depth, breadth and number of source…
RFC: Gemba - The thing to make the thing (www.reddit.com) Is the future of marketing agentic? (www.reddit.com) Combine persistant global Memory- and Task- management into one uniform system (www.reddit.com) How to talk online (www.reddit.com) Do you have any go-to utility LLM-related tools that are less commonly discussed? (www.reddit.com) I Tested 20+ AI Agents with Real X API Workflows , Here’s What Actually Works in 2026 (www.reddit.com) How can I trust AI with critical workflows if it can’t get the “walk or drive to car wash” right? (www.reddit.com) 2 Big Bottlenecks to Scaling Agentic State (georgianailab.substack.com via hn) Agentic AI as a Part of Software Development (nemorize.com via hn) AI agents in industry/manufacturing (www.reddit.com) Automate the Path from Data to Predictive Insights with Agentic ML in Snowflake (www.snowflake.com via hn) Agentic edits/commands VS Code with Cline- is it really private or offline? (www.reddit.com) An Agentic Home Bioreactor (chillphysicsenjoyer.substack.com via hn) Has Anybody Implemented Agentic Monitoring with Composer 2 ( via reddit) Beyond the Hype: Practical and Responsible Use Cases for Agentic AI Webinar (fusionauth.io via hn) Product Platform Platform Platform Developers Quickstarts Resources Explore Pricing Download get a demoLogin This session cuts through the noise of Agentic AI to focus on responsible integration into modern application development, specifi…
Agentic coding hides architectural flaws that are obvious in a diagram. Built a skill to close the loop (www.reddit.com) When you’re building with agentic coding, agents make architectural decisions that sometimes aren't optimal which may lead to bugs or vulnerabilities or inefficiencies. These are hard to catch reading code file by file or even by agents th…
Sandboxes and Worktrees: My Secure Agentic AI Setup in 2026 (mikemcquaid.com via hn) Sandboxes and Worktrees: My secure Agentic AI Setup in 2026 I’ve been using AI tools since early 2021 when I was invited to test out the Copilot internal alpha at GitHub (where I spent 10 years). I’ve maintained Homebrew since 2009.
Best way to prepare for AI Engineer interviews? (www.reddit.com) I’m currently preparing for AI-focused roles and would love to get perspectives from people already working in the industry. For context — I have ~5 years of experience as a Full Stack Engineer with a strong focus on AI systems.
AI Agents Are Leaking Enterprise Data. Here's Why Nobody Is Watching (www.privent.ai via hn) Agentic AI introduces a machine-speed data exposure surface that traditional human-centric security controls cannot govern.
Dev seeking advice: High-Context Local LLM for Coding (Verification/Bug-fixing loop) – Mac Studio vs. Multi-GPU Linux Rig? (www.reddit.com) I'm a dev looking to build a local LLM node to offset subscription costs (Claude/Copilot). My workflow: Cloud for initial architecture/complex features -> Local for iterative bug-fixing and continuous integration.
Java 26 and the Rise of Agentic AI: The State of the Ecosystem (April 2026) (techlife.blog via hn) Java in April 2026: Leyden Grows Up, Spring Gets Smarter, and the JVM Quietly Reinvents Itself for the AI Era - Turker Senturk - Software - 17 Apr, 2026 - 15 min read If you’ve been half-watching the Java world from the sidelines over the…
Cursor vs. Claude Code: Is the claude code CLI worth it after the "Thinking" nerf? (www.reddit.com) As a heavy Cursor user, I’m debating moving my .mdc-based workflow into Claude Code (run within the Cursor terminal), but I’m skeptical following the recent reports of decreased "thinking effort" and reasoning quality. Is the agentic auton…
Is OpenHands (OpenDevin) still the move in 2026? Comparing it to Claude Code and OpenCode for a beginner. (www.reddit.com) Hey everyone, I’m just starting to dive into agentic coding tools and I'm a bit overwhelmed by the options. I’ve been looking into OpenHands (the project formerly known as OpenDevin), but I see a lot of hype around Claude Code and OpenCode…
Show HN: Viche – OSS private registry for agent communication (github.com via hn) Viche (https://github.com/viche-ai/viche) is a private registry and communication protocol for agents. Overview at https://viche.ai Think discord + agents + agentic search based on capabilities.
The New Postman Is Here: AI-Native and Built for the Agentic Era (blog.postman.com via hn) blog.postman.com Performing security verification This website uses a security service to protect against malicious bots. This page is displayed while the website verifies you are not a bot.
Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks? (news.ycombinator.com) Shipped today. The benchmarks are real: 87.6% SWE-bench (from 80.8%), +13% on coding tasks, 3x more resolved production tasks on Rakuten-SWE-Bench.
Claude Opus 4.7 Is Now Available in Puter.js (developer.puter.com via hn) Claude Opus 4.7 Is Now Available in Puter.js On this page Puter.js now supports Claude Opus 4.7, Anthropic's most capable generally available model—built for complex reasoning, agentic coding, and long-horizon autonomous tasks. What is Cla…
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com) Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…
↯ Anthropic Mythos↯ Tool Use↯ Gemini 3.1tool-usemythosgpt-5+4
Agentic Reasoning in Practice: Making Sense of Structured and Unstructured Data (www.databricks.com via hn) Enterprise data is rarely useful in a silo. Answering questions like, "Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller site…
Self-learning loop for Claude Code based on Scrum method (www.reddit.com) Good day, Claude Code users. I just want to share my approach to implementing a self-learning Claude framework.
Two small agentic patterns to wire apps directly to Claude Code (www.reddit.com) These two patterns turn Claude Code into a personal assistant. You interact normally with it and it listens in the background for events, handles them, and gets back to interacting with you.
Complex, parallel, long-running claude/agentic sessions - what is the point? where is the value? (www.reddit.com) Here is how I view AI Agents field (with focus on SWE/research) right now: - "chats online" gpt/gemini/claude --> general use - "vscode like extensions" cursor/antigravity/cline vs code extension/cc vs code extension etc. --> for coding, b…
Show HN: ZettelForge – Agentic memory for cyber threat intelligence (github.com via hn) ZettelForge The only agentic memory system built for cyber threat intelligence. Give your AI agents persistent memory with entity extraction, knowledge graphs, and STIX ontology -- no cloud, no API keys, works offline.
Show HN: Agentfab – A Distributed Agentic Platform (github.com via hn) Hi HN, I’m the creator of agentfab, a distributed agentic platform that features task decomposition, multi-agent orchestration, model heterogeneity with custom agentic fabrics, bounded review loops, and a bespoke self-curating memory syste…
Agentican Framework – OSS multi-agent for Java (github.com via hn) Agentican A lightweight Java framework for embedding tool-using LLM agents into your applications. Agentican lets Java developers add agentic capabilities to their applications with minimal ceremony.
Agentic Engineering Methodology – Structured AI-Assisted Dev (Karpathy, Osmani) (github.com via hn) Agentic Engineering Methodology A structured, human-led methodology for planning and executing software projects with AI coding agents. Built from practitioner experience and refined with research from Andrej Karpathy, Addy Osmani, and the…
Agent Continuity: Disaster Recovery for the Agentic Era (gavinpineapple.substack.com via hn) Agent Continuity: Disaster Recovery For The Agentic Era What happens when the proverbial 💩 hits the (GPU) fan and you lose all your agents? My Favorite Alien I would like you to meet Rocky 🪨🦞 Rocky (named after the adorable alien from Proj…
Stop letting your agents decide everything — extract deterministic steps wherever you can (www.reddit.com) Context: I have been building Litmus (a brutal market validation tool) and I've learnt that if your agentic pipeline needs to produce factual, reliable output, stop letting the AI decide everything. The insight: extract deterministic steps…
Aethon: A reference-based instantiation primitive for stateful AI agents (arxiv.org via hn) The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure. While large language models have made persistent, tool-using, and collaborative agents te…
Show HN: Idea File for LLM Cycling Coach (gist.github.com via hn) This is heavily inspired by Andrej Karpathy's LLM Wiki, and could be used to create many other types of "Agentic Apps" or however you want to call them. My specific implementation uses Claude Code, TrainingPeaks, Todoist and Apple health.
Show HN: Memwright – Self-hosted memory for multi-agent teams, no LLM in path (github.com via hn) § 00 · MASTHEAD · FILED UNDER INFRASTRUCTURE · BY SURENDRA SINGH · — FOR PUBLICATION — MEMWRIGHT — A MEMORY JOURNAL FOR AGENTIC SYSTEMS · VOL. 02 · REV.
Beneficial Deployment Request, No Response after Months. (www.reddit.com) I'm building AI tools to help disabled Medicaid recipients enforce the laws that protect their human rights, because I'm a disabled Medicaid recipient whose human rights are being violated by the State and it's actors, and no one seems to…
Model agnostic, agentic annotation tools for text highlighting (old.reddit.com via hn) could not extract summary
Agentic AI | Confusion between reading the context of SKILL and reading the file (www.reddit.com) Hey all, I am building a system that supports skill reading with progressive disclosure. Initially, I include the skill name and description in the system prompt, and I have a function tool called read_skill that reads the content of a ski…
Agentic AI Tools – A directory to find and compare AI agent tools (agenticaitools.net via hn) Curated directory of 500+ AI tools Discover the Best AI Tools for Your Workflow Find, compare, and choose the perfect agentic AI tools. Expert reviews, side-by-side comparisons, and alternatives — all in one place.
Show HN: I analyzed 591 agentic engineering jobs: LangChain dominates at 22% (agentic-engineering-jobs.com via hn) - Home - LangChain Job Market 2026 We analyzed 591 agentic AI engineering job listings. Here's what the market looks like for LangChain engineers.
Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro (quesma.com via hn) An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.
Built an open-source knowledge graph that gives AI agents domain expertise in bioinformatics, hosted as an MCP server (www.reddit.com) Sharing something I've been working on that might be interesting to this community from a design perspective, even if bioinformatics isn't your domain. The problem: I've been building agentic pipelines for bioinformatics (genomic analysis,…
Scaling Managed Agents: Decoupling the brain from the hands (www.anthropic.com via hn) Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Full-stack dev (8 YOE, Vue/Node/Laravel) trying to break into AI Agents from zero — is this Udemy course worth it? + looking for advice on the best path (www.reddit.com) Hey r/AI_Agents, I'm a full-stack software engineer with 8 years of experience, primarily working with Vue, Node.js, and Laravel. I have zero background in AI/ML but I've been watching the space and I feel like I'm falling behind.
Why Engineering Teams Need an Agentic Layer, Not Just AI Chat (medium.com via hn) Why Engineering Teams Need an Agentic Layer, Not Just AI Chat | by Simone Mutti | Apr, 2026 | Medium Sitemap Open in app Sign up Sign in Get app Write Search Sign up Sign in Why Engineering Teams Need an Agentic Layer, Not Just AI Chat Sim…
Show HN: The Harness for Creative Agents (www.flickspeed.ai via hn) Coding agents need shell. Creative agents need canvas.
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI (openai.com via hn) Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI | OpenAI Skip to main content Research Products Business Developers Company Foundation(opens in a new window) Log inTry ChatGPT(opens in a new window) Research Produ…
Finding Widespread Cheating on Popular Agent Benchmarks (debugml.github.io via hn) TLDR: Agentic cheating is a widespread issue, affecting thousands of submitted agent runs on 28+ submissions across 9 different benchmarks. Terminal-Bench 2 is a popular benchmark used to evaluate frontier model releases (e.g.
If You're Only Running One Claude Code Session, You're Not Going Fast Enough (www.scape.work via hn) April 12, 2026 If You're Only Running One Claude Code Session, You're Not Going Fast Enough The real skill in using agentic coding is not coding at all. It's management.
How are you reducing LLM token costs for async workflows? (github.com via hn) ParaLLeM ParaLLeM is a library for orchestrating agentic LLM workflows. Batch API support Concise, readable, and expressive Developer-centered and lightweight Parallelize thousands of requests, while keeping reproducible traces for each ru…
Created a linter for agentic code smells ( via reddit) could not extract summary
Strong feeling: we are in a folded AI reality (news.ycombinator.com) Some people think Agentic AI could do everything, is getting more and more powerful even feel fear about it. Another group non-technical people still just trapped in the LLM chat is weak and full of hallucination world.
Stop putting your AI agent’s memory inside the LLM context window (www.reddit.com via reddit) Hey everyone, been shipping a few agentic workflows into production lately and wanted to rant/share a massive architectural mistake I keep seeing people make. Stop treating the LLM context window or massive vector embedding as your agent’s…
Newer Qwen models are worse at summarization? (www.reddit.com via reddit) We have summaries annotated by real humans that we benchmark various models, using an LLM as a judge, we found that in the 30B params range, Qwen 3 tops it out, followed by Gemma 4. It feels like newer Qwens are optimized to perform agenti…
What do you think about the new Claude model just released Today Claude Fable-5 ( Mythos) ? ? (www.reddit.com via reddit) So the hype has been building for months now and Claude 5 is supposedly dropping any day in Q2-Q3 2026. I've been seeing all these leaks about "Claude Mythos" and the "Fennec" codename floating around, but nothing official yet from Anthrop…
First ever Hands-free agentic AI browsing ~ Just an extension (www.reddit.com via reddit) Hey fellows, Ever thought of using your browser without touching your keyboard? Before you think "just another AI Slop wrapper"...
Looking for 16gb ram / 8gb vram crew - what you using? Omnicoder 9b? something else (www.reddit.com via reddit) I've got a laptop with 16GB RAM and 8gb VRAM (4060 mobile). This means the qwens 3.6 well love are going to be out of the question, in so far as I understand it, seeing as I need a good context window to work with.
Fable 5 just made cost-aware model routing mandatory for agent builders (www.reddit.com via reddit) Anthropic dropped Fable 5 today, their new Mythos-class model above Opus. Pricing is $10/M input and $50/M output, exactly double Opus 4.8.
Fable 5 is insanely good but watch your usage, I was burning 2% a minute on 20x (www.reddit.com via reddit) Been playing with Fable 5 since it dropped this morning and the model is genuinely a step up. But holy hell, the burn rate.
Meta’s long push into 3D/Embodied AI Agents is heating up — why this matters for open browser-native tools like three.ws (www.reddit.com via reddit) Meta (the company) has been investing heavily in embodied 3D AI agents for years — think Habitat simulator, recent SAM 3D for single-image 3D reconstruction, and ongoing VR/Horizon work with agentic tools for immersive environments. This i…
Introducing Gemma 4 12B: a unified, encoder-free multimodal model (deepmind.google) Introducing Gemma 4 12B: a unified, encoder-free multimodal model Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B…
Did You Really Review Those 5,000 Lines Your Agent Just Wrote? (www.reddit.com via reddit) Did you vibe-code 5k+ lines of code without thoroughly reviewing all of them? Is your application held together mostly by thoughts, prayers, and a suspicious amount of copium ?
Rumor: Anthropic Planning to Release Public Version of Claude Mythos Tomorrow (with Guardrails) (www.reddit.com via reddit) According to tech journalist Alex Heath (Sources newsletter), Anthropic is planning to release a public version of Mythos tomorrow. Key details from the report: • It will include substantial guardrails, notably not as cyber-permissive as t…
How I stopped context window bloat in continuous Anthropic agent loops (Opus + Sonnet architecture) (www.reddit.com via reddit) I’ve been spending a lot of time deploying multi-agent architectures, and one of the biggest bottlenecks in running continuous agentic loops is hitting context limits and the resulting API latency spikes. I wanted to share an architectural…
To be real, AI is just a big expensive corporate trend, (www.reddit.com via reddit) like apart from coding, it's pretty much doesn't create value okay it can make photos from prompts and videos and can be agentic and doing things instead of us but even the most experienced teams make mistakes, but a machine can never be h…
Building an open-source Legal AI because apparently legal documents were written by sleep-deprived wizards (www.reddit.com via reddit) I am working on an open-source agentic Legal AI that can scan legal documents, understand what’s inside, extract important clauses, find risks, summarize obligations, and help people avoid reading 47 pages of “whereas, hereto, hereinafter,…
IntiDev AgentLoops: Feedback Loops for Agentic Workflows ( via reddit) could not extract summary
I’ve been optimizing AI agents for teams/friends, offering free reviews (www.reddit.com via reddit) I’ve spent the last few months helping my team and friends make their AI agents more reliable, cheaper, and easier to debug. I’ve mostly been helping with reliability issues, evals, debugging traces, hallucinations, bad tool calls, and cos…
PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow (arxiv.org) Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate mor…
A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline (arxiv.org) Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and…
A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology (arxiv.org) Pathology is the cornerstone of modern medicine, where accurate decision-making relies heavily on evidence-based practices. While artificial intelligence (AI) has the potential to transform clinical workflows, the intersection of AI and ev…
SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection (arxiv.org) Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at the individual level, robustness under severe class imbalance, and ease of understanding for risk managers. Existing methods fall at least one of t…
Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems (arxiv.org) Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution tim…
RAILS: Verification-Native Clearing For Agentic Commerce (arxiv.org) Autonomous agents negotiate, purchase, deploy code, and move funds, but no neutral mechanism determines whether they met their delegated obligation, who is responsible when they did not, or which settlement action follows. This is the agen…
AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models (arxiv.org) Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocent…
The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs (arxiv.org) Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often con…
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research (arxiv.org) BRAIN: Bayesian Reasoning via Active Inference for Agentic and Embodied Intelligence in Mobile Networks (arxiv.org) ViMax: Agentic Video Generation (arxiv.org) Agentic multi-fidelity learning of quasiparticle and excitonic properties (arxiv.org) HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning (arxiv.org) Agentic Search for Counterfactual Recourse under Fixed LLM Budgets (arxiv.org) Structuring agentic AI for HPC code modernization (arxiv.org) Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations (arxiv.org) Observability for Delegated Execution in Agentic AI Systems (arxiv.org) FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks (arxiv.org) EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale (arxiv.org) Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond (arxiv.org) Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems (arxiv.org) ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL (arxiv.org) From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG (arxiv.org) Skill Retrieval Augmentation for Agentic AI (arxiv.org) Exploring Autonomous Agentic Data Engineering for Model Specialization (arxiv.org) Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning (arxiv.org) start-with-why-skillset for agentic workflows (www.reddit.com via reddit) Hello everyone! This is my first post on reddit.
Questions about agents (www.reddit.com via reddit) Hi there! I've been working with Claude primarily for tutoring, and I'm branching out into coding.
WWW is not ready for agents? (www.reddit.com via reddit) IT industry promotes idea of agents on everything, even turning users computers into local agent platforms. But a lot of websites and whole hosting platforms do have different kinds of anti-bots protection (usually captcha but some has mor…
Az8 Studio: The closest thing we have to a multi-modal "Agentic" canvas for video pipelines? (First impressions) (www.reddit.com via reddit) Hey everyone, I’ve been tracking how AI agents are moving from pure text/code automation into multi-modal workflows, and I just came across Az8 Studio. If you guys are tired of linear UI prompt boxes (like Runway/Pika) and want something t…
Participate in Research on New Agentic Platform (www.reddit.com via reddit) I work for a market research company, and we are working with an AI company on their new agentic product. We are looking for current users of agentic AI to participate in paid beta testing of this platform, which will take place over the n…
Share your agentic LLMs and average cost ($/MTokens) (www.reddit.com via reddit) OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents. (www.reddit.com via reddit) OpenEnv is a tool for creating an agentic execution environment like terminals, browsers, or anything an agent can interact with. And today, we’re excited to announce that OpenEnv is becoming even more open, to make the future of training…
Any AI tools do you use for optimizing AI agents automatically? (Auto research) (www.reddit.com via reddit) Hey, We’ve all heard about Karpathy’s autoresearch and I think that’s a pattern applicable to AI agents, where an AI like claude code optimizes and AI agentic system to improve an evaluation score. However Karpathy’s repo isn’t really a re…
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected (www.reddit.com via reddit) Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…
↯ Opus 4.8↯ GPT 5.5↯ DeepSeek 4↯ Gemini 3.1grokgpt-5deepseek+3
If a provider's plan is to limit with quota, hourly, weekly, and monthly limits, what is the future of automatic agentic workflows? You can't just run an agent on a tight budget. ( via reddit) could not extract summary
A new agentic way to build automations (www.reddit.com via reddit) For a lot of personal automations, it is easier to show than prompt since we already do them on our own browser/computer. For example, it is easier to do a screen recording and say, download data by clicking on this button on the dashboard…
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory (arxiv.org) Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods…
Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety (arxiv.org) An attacker that strategically chooses when to attack is much harder to catch than one that attacks indiscriminately. AI control is a safety framework for deploying capable but untrusted AI agents under the oversight of a weaker, trusted m…
Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning (arxiv.org) Large language model (LLM)-based agents often make suboptimal tool-use decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches…
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning (arxiv.org) Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice,…
Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle (arxiv.org) As foundation models advance and agent scaffolding becomes increasingly sophisticated, agents have demonstrated remarkable proficiency in complex, long-horizon coding tasks and even autonomous experiment execution. Despite their evolution…
Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems (arxiv.org) Large language models (LLMs) have emerged as powerful foundation models with strong reasoning capabilities across domains. Beyond reactive text generation, agentic LLMs enable autonomous workflow execution through modular task decompositio…
What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media (arxiv.org) Public social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home…
SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling (arxiv.org) Agentic Large Language Model (LLM) systems decompose complex tasks into workflow Directed Acyclic Graphs (DAGs) whose primitives must be scheduled on heterogeneous clusters. Existing deep reinforcement learning (DRL) schedulers are tied to…
The Three-Ring Architecture: Governing Agents in the Era of On-Platform Organisations (arxiv.org) The current phase of enterprise AI deployment faces a structural failure: organisations are acquiring agentic capability without the infrastructure to govern it. The result is expected to reproduce the error of the first wave of AI deploym…
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism (arxiv.org) Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control (arxiv.org) Beyond the Black Box: Interpretability of Agentic AI Tool Use (arxiv.org) Autonomous computational catalysis through an agentic research system (arxiv.org) SW-$A^2$-Bench: Benchmarking Autonomous Software Agent Generation for Agentic Web (arxiv.org) Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review (arxiv.org) MADE: Beyond Scoring via a Multilingual Agentic Diagnosing Engine for Fine-Grained Evaluation Insights (arxiv.org) SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding (arxiv.org) AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning (arxiv.org) StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning (arxiv.org) Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning (arxiv.org) Gemma4_31b_fp8 keeping up with Sonnet_4.6_medium in my harness. (www.reddit.com via reddit) The Open Source Community is backing OpenEnv for Agentic RL (huggingface.co) datasette-agent-edit 0.1a0 (simonwillison.net) 7th June 2026 I'm planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files. Agentic editing of text is a li…
Hear Me Out, Pi Fans Lurking Here (www.reddit.com via reddit) Not For Thee Maybe After watching several interviews with Pi's creator, Mario Zechner, I've come to a painful realization: Pi was not designed with local LLMs in mind at all. He is essentially building a leaner version of the Claude CLI.
why I have just installed OpenLumara, my first Agentic Framework. Using only local models, served by LMStudio (www.reddit.comhttps) Where I came across it: https://www.reddit.com/r/LocalLLaMA/comments/1txxgpq/openlumara_a_different_kind_of_ai_agent_written/ DISCLAIMER: A good posting would be: This is what I wanted to do with Lumara. Here is what worked, here is what d…
The Illusion of Finished Work in Claude Code (www.reddit.comhttps) I wrote a short essay about something I keep noticing with Claude Code: the output often has the shape of finished work before it has actually been verified. Claude Code can now explore a codebase, plan changes, edit files, run commands, c…
Removing the human from AI coding is a harness problem, not a model problem (www.reddit.com via reddit) TL;DR: Better models won't make AI coding trustworthy but better harnesses will. Stop trusting what the agent says, verify it with code.
Agentic Self Improvement Loop Kicked Off - Watch it Evolve? (www.reddit.com via reddit) You are TEMPO, an iterative self-play refinement engine and agent harness. Your purpose is to improve an attached artifact by applying the Tempo Methodology to it.
Agentic Roobinhood (www.reddit.com via reddit) Hi, did anyone try automatic Agentic Roobinhood trading with AI Agents. I did set up with claude but not sure if it's possible to trade automaticly 00-24 based on rules that we set up?
Local agents on a MacBook Pro M5 finally feel practical to me (www.reddit.com via reddit) Realtime check X for new people to follow I have been pretty pessimistic about local models for agentic workflows for a while. Not because they were useless, but because in practice they often felt just a bit too slow, too fragile, or too…
How do you increase prompt processing speed ? (www.reddit.com via reddit) I am rocking Qwen like we all know, at 24GB 7900XTX 230k context, but it starts at 850t/s and then lowers to 350t/s when its at 160k context prefill speed, which is frustrating me for my long agentic runs. What is there to be done in order…
Reddit Agentic AI ecosystem (www.reddit.com via reddit) Here are few things I observed in Agentic AI groups in reddit: Any member who is using agentic AI in these groups are also building their own AI agents and quite competitive Almost all members use AI, but most also look with distrust to an…
Running Hermes fully local (www.reddit.com via reddit) Before Hermes was announced, I was working on my own fully local, personal agentic system. Now, I'm a novice when it comes to coding.
AI helped our test suites hit 95% coverage and bugs still slipped through. So PRs now climb an autonomous verification ladder before a human reviews. (www.reddit.com via reddit) Intro + Context [TLDR at the bottom for my skim readers 😄] We run Claude Code and Codex with a full agentic pipeline across our entire SDLC. Our workflow, by default, incorporates cross-model auditing, where Claude and Codex usually have t…
Can the Pro subscription $20, add Usage Credits to be used in the Xcode native agentic integration? (www.reddit.com via reddit) Let say, I am using the Claude agent with Xcode, I run out of my $20 equivalent usage and I have to continue coding, can I purchase Usage Credits and continue using the Xcode native agent integration with the credits at API rates?
OpenClaw + Hermes users: where does your agent army actually live? (www.reddit.com via reddit) I’m working on ClawBud, a managed Agentic OS for running OpenClaw, Hermes, Claude Code, Codex and other agents on one private cloud computer, so I’m obviously biased. But this is the problem I keep seeing everywhere: The agent itself is no…
Z.ai, we need Air! GLM GGUF wen? (www.reddit.com via reddit) First we never saw an upgraded Air model after 4.5. Then GLM 4.7 Turbo was great, but quickly surpassed for coding.
What are you running on 16Gb VRAM + 64Gb Ram? (www.reddit.com via reddit) I know this gets asked a lot, but I can only find threads that are at least a couple of months old, so I thought I'd ask to see what people are running these days. I have an RTX5080 and 64Gb Ddr5 RAM.
Claude Code thoughts: plan mode, ultracode and... beads. (www.reddit.com via reddit) Hi folks, Looking for other people's experiences and opinions here. I've been finding Ultracode very useful.
Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? (www.reddit.com via reddit) Considering buying a maxed out MacBook Pro M5 Max with 128GB of RAM and one of the things I want to figure out before pulling the trigger is whether local models are good enough to actually replace cloud AI coding tools. My current setup i…
skipworkflow.com – Perfect premium brand or high-converting redirect for an AI Agent / Automation SaaS (www.reddit.com via reddit) If you’re building in the AI agent or B2B automation space, you know that the entire goal of agentic AI is to eliminate clunky, multi-step legacy processes. The ultimate selling point to your customers is simple: skip the workflow and just…
Experimentation with Qwen 3.6 and Gemma 4 - Guidance needed (www.reddit.com via reddit) I’m a web developer doing mostly coding, but also project management, requirements analysis, testing, etc. I recently started experimenting with local LLMs, mostly because agentic stuff finally made them feel useful.
Claude's new background tasks panel is exactly how agentic UIs should look (www.reddit.com via reddit) https://preview.redd.it/it0c4w60xn5h1.png?width=1246&format=png&auto=webp&s=25ff01d2a66c6b471ecb538c0fe3da207b006bcf Just kicked off a workflow in the Claude desktop app and the background tasks view is genuinely a delight. One job, three…
Same LLM model but not same performance through wrappers (GitHub Copilot, M365, Vertex AI) why is that ? (www.reddit.com via reddit) Claude Code and Opus 4.7/4.8 are clearly better used direct from Anthropic than through GitHub Copilot, M365 Copilot, or Vertex AI. Sharper instruction-following, longer coherent outputs, stronger agentic behaviour on identical tasks.
Agentic ai roadmap (www.reddit.com via reddit) So right now am working as a software engineer in a startup and i have to switch my career into agentic ai roles.where do i start? i can understand python.Give me a roadmap and also the resources i could use to study.whats the scope of the…
What are the best resources to learn AI Agents in 2026? (www.reddit.com via reddit) The context is that I am a software engineering final year student. I also have experience working in ML, DL, NLP i.e I have the basics nailed.
Learn Agentic AI with quick, easy to run hands on labs, visual canvases and notebooks for free! (www.reddit.comhttps) If you’re a full-stack engineer or technical architect willing to learn production-grade enterprise agents, you need architecture, security, and type-safe systems. That’s why we builtAgentSwarms.fyi—the ultimate hands-on educational platfo…
Opus 4.8, a 40+ point elo Regression on LmArena (www.reddit.com via reddit) https://preview.redd.it/hficgswa6m5h1.png?width=1224&format=png&auto=webp&s=3bf1c2a5ad46df54fb85ed5c7d5d62e725a26b89 This is back to back regression, note this is pure 'pick which you prefer', with no style control on. With style control i…
Agentic AI for P2P mobile hardware (www.reddit.com via reddit) I have the agents, skills, mcps, rules for data validation setup. Now looking for an orchestrator.
Does anyone know of a team software solution with an agentic orchestration workflow built in? (www.reddit.com via reddit) I’ve learned a bit about creating and deploying AI agents, but I still haven’t figured out how to get them to work together. What I want is an agent that picks up a task, pulls context from wherever it lives, executes the workflow, and clo…
Gemma 4 QAT benchmark results (AMD 7900 XTX): faster, less VRAM, no quality loss (www.reddit.com via reddit) I’ve been doing lots of testing back and forth with this 7900xtx. All of my workloads were relying on qwen3.6 models, which are amazing fwiw, but I wanted some diversity in thought.
How do you integrate spec driven development with your agentic setups? (www.reddit.com via reddit) We’ve been trying to move our team toward a strict Spec-Driven Development (SDD) workflow, but I've been having a hard time. I think because of the scale we're at, our agents very often starts drifting, breaking adjacent code, or completel…
agentic code review is quietly replacing the way my team does PRs (www.reddit.com via reddit) Our PR review process used to be pretty painful. We have 6 devs and 2 seniors, and every meaningful review had to go through one of those two.
Insurance of Agentic AI (arxiv.org) Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning, tool invocation, decision execution, and persistent modification of digital and physical en…
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization (arxiv.org) Recent advances in agentic visualization have enabled the translation of natural language into executable scientific visualization (SciVis) workflows. While general-purpose coding agents show strong capabilities, they often lack the tool-s…
AdaMEM: Test-Time Adaptive Memory for Language Agents (arxiv.org) A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initiation.
Agentic Molecular Recovery via Molecule-Aware Exploration (arxiv.org) Text-guided molecular generation with LLMs often yields invalid SMILES. We argue that invalid drafts should be addressed through a shift from validity-oriented repair to identity-preserving molecular recovery: the objective is not only to…
Evaluating Agentic Configuration Repair for Computer Networks (arxiv.org) Misconfigurations in computer networks remain a major source of critical Internet outages. Research is turning to Large Language Models (LLMs) to automate the complex, error-prone task of network configuration.
From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents (arxiv.org) Language-model agents act through repeated cycles of observation, reasoning, and action selection, making safety monitoring depend on both internal model state and environment context. We study reward-hacking monitors in ReAct-style agents…
Unsupervised Skill Discovery for Agentic Data Analysis (arxiv.org) Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains chal…
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents (arxiv.org) LLM agents operate in two distinct regimes: open-weight agents amenable to reinforcement learning (RL) and black-box agents whose behaviour must be controlled purely at test time. Although black-box agents are often backed by state-of-the-…
Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents (arxiv.org) Autonomous software agents hold promise to increase developer productivity but make mistakes and exhibit novel failure modes, making human oversight central to successful human-agent collaboration. Existing research on agent oversight is l…
HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers (arxiv.org) For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial r…
Knowledge Activation: AI Skills as the Institutional Knowledge Primitive for Agentic Software Development (arxiv.org) Enterprise software organizations accumulate critical institutional knowledge - architectural decisions, deployment procedures, compliance policies, incident playbooks - yet this knowledge remains trapped in formats designed for human inte…
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents (arxiv.org) Enterprise adoption of Large Language Models (LLMs) is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level. We present a neurosymbolic architecture implemented within the Fo…
ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows (arxiv.org) Table processing-including cleaning, transformation, augmentation, and matching-is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often stru…
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation (arxiv.org) Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies. Prediction-powered inference (PPI) combines both in…
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding (arxiv.org) Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant content. While agentic pipelines improve video reasoni…
A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (arxiv.org) Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlen…
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe (arxiv.org) High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels still fall…
AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation (arxiv.org) Deep reinforcement learning has shown strong potential for enabling autonomous robots to learn complex navigational tasks. However, its practical use still depends heavily on human designed reward functions and repeated manual fine tuning,…
ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL (arxiv.org) Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-sp…
Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs (arxiv.org) AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning (arxiv.org) Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams (arxiv.org) Introducing new capabilities to GPT-Rosalind (openai.com) We’re introducing a new model update to our GPT‑Rosalind series purpose-built for life sciences research at enterprise scale. It combines GPT‑5.5’s agentic coding and tool-use capabilities with stronger model intelligence in core drug-disc…
Rehumanizing global health care with agentic AI (www.technologyreview.com) Sponsored Rehumanizing global health care with agentic AI As health-care providers face looming staff shortages, AI agents are automating complex administrative tasks and even clinical decisions so humans can focus more on patient care. In…
How Endava builds an agentic organization with Codex (openai.com) Endava, a global software contracting firm with engineers across Europe, the Americas, and Asia, has been an early adopter of Codex. For a business built around shipping quality software for banks, insurers, retailers, and media companies,…
I used to think 2026 would be the year AI finally blew everyone's minds again (www.reddit.com) That belief lasted until I actually read the trend lists this year. Every single one leads with "agentic AI" or "autonomous agents." Sounds like AI is still the star, right?
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM (huggingface.co) ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM Enterprise Article Published May 27, 2026 Artificial Analysis and IBM Software Innovation Lab are launching…
Built a 5-stage agentic pipeline using Claude Code + MCP - here's what actually makes it reliable at scale (www.reddit.com) The thing nobody tells you about Claude Code + MCP workflows: the model is only as reliable as the instructions you give it before it touches any external tool. We learned this the hard way building a sales pipeline that connects Claude Co…
760M Tokens… MTD 👀 (www.reddit.com) I built an enterprise grade revenue management tool for a specific real estate vertical. Thus far, it has beyond dominated past human performance.
Introducing FLYWHEEL.md 🌀 (www.reddit.com) Agentic coding just crossed a line. Claude Code, Cursor, Codex, OpenClaw, the list keeps growing, and they all run fully autonomous now: /loop, /goal, crons.
"Human-in-the-Loop" Is Not a Reliability Strategy (www.reddit.com) A lot of AI agent systems quietly rely on this architecture: |> Agent does something risky |--> Human notices problem |--> Human fixes it That's not reliability - that's operational debt. One thing I've learned building agentic systems: If…
Microsoft Copilot Cowork Exfiltrates Files (simonwillison.net) 26th May 2026 - Link Blog Microsoft Copilot Cowork Exfiltrates Files (via) The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork…
Rethinking organizational design in the age of agentic AI (www.technologyreview.com) Sponsored Rethinking organizational design in the age of agentic AI For agentic AI to deliver material benefits to organizations, it can’t be layered onto existing operations. Instead, enterprise leaders must approach it as a systems-level…
The reason small-model agent stacks aren't the default has nothing to do with whether they work (www.reddit.com) Last June, NVIDIA published a position paper called "Small Language Models are the Future of Agentic AI," and the argument was easy enough to wave off at the time: most of what an agent actually does is unglamorous work like reading input,…
Everyone talks about AI wrappers… nobody talks about agentic SEO (www.reddit.com) Everyone talks about AI wrappers… nobody talks about agentic SEO Feels like most founders are still thinking about SEO like it’s 2021: write blog target keyword wait 6 months 😭 Meanwhile people are building agent workflows that: find low c…
I’ve done it!!! FINALLY I have become a (quasi-local) summoner!!! AMA [imtiredboss.jpg] (www.reddit.com) Hi friends! After 2.5 years of a LOT of hard work...starting from the GPT-3.5 bottom and now we're here...I've finally got my personal 1.0 local-ish** AI playground whipped into shape.
Gemini 3.5 flash beating gpt 5.5 a bigger and more pricer model in agentic benchmarks (second image is from zapier automation benchmarks) (www.reddit.com) could not extract summary
Post I/O Review related to AI (pros and cons ) (www.reddit.com) Post I/O Review related to AI (pros and cons ) Well it was not disastrous as many people say but there were some pros and cons which everyone will agree with. Btw gemini 3.5 flash is absolutely amazing model don't pay attention to some peo…
Buckle up: Google is set to remake search with agentic AI in 2026 (arstechnica.com) Last year marked the beginning of Google’s explicit focus on AI search, and this year’s I/O solidified that shift. As Google’s search VP Liz Reid said during the keynote, “Google search is AI search.” This change is well underway, and the…
The next phase of OpenAI’s Education for Countries (openai.com) A new era of agentic AI is here. With more than 900 million people using ChatGPT each week, and more than 4 million using Codex, agents have the potential to place far greater creative, intellectual, and technical power in the hands of eve…
Claude Opus is still king for agentic coding, but Claude's app workflow is falling behind (www.reddit.com) I'm a paid Claude user, and I still think Claude Opus is the king model for agentic coding and serious coding work. The model is not the problem.
Agents creating their own language : reality or not ? Compliance issue. (www.reddit.com) Hi ! I've read a while ago that some AI's tend to agree on their own language to talk one to another over time.
Claude Code has 240+ models via NVIDIA NIM gateway (www.reddit.com) TIL Claude Code has 240+ models via NVIDIA NIM gateway — Nemotron-3 120B for agentic coding is surprisingly good So I was messing around with /model in Claude Code today and noticed something most people probably don't know about — after t…
Cost of Using LLMs in Agentic AI and RAG workflows (www.reddit.com) Hey Everyone ML engineer and Researcher here I’ve been researching production issues in Agentic AI + RAG systems and one pattern keeps showing up repeatedly: Context inefficiency. Not just retrieval quality — but the actual economics and s…
The Nanny Pattern (www.reddit.com) All good software turns into patterns. Agents are going to need theirs.
an alternative = similar experience to using windsurf but on local? (www.reddit.com) so i am not that experienced when it comes to llms, i just have ollama and open webui and occasionally test (play with) new releases from time to time. a few weeks ago i started using Windsurf, i do not know coding or anything but i loved…
Best llama.cpp launch config for Qwen3.6 27B on RX 7800 XT (16 GB VRAM) for OpenClaw? (www.reddit.com) I’m trying to find the best llama-server launch command / runtime config for running Qwen3.6 27B GGUF with full GPU offload on ROCm. I’m currently using the IQ4_XS quant, but I’m not sure if that’s the best option for my setup.
Hey Everyone! I’ve been experimenting with OpenCode + BoneScript for structured backend generation. (www.reddit.com) I’ve been experimenting with making coding agents generate complete backends using BoneScript, and it’s working surprisingly well. BoneScript’s structure ends up being extremely LLM-friendly: declarative system layout predictable architect…
Claude for Small Business launched this week with 8 integrations. Most SMBs use 20+. What does that mean for the rest of the stack? (www.reddit.com) Anthropic launched Claude for Small Business on Tuesday. The package includes 15 prebuilt agentic workflows and 8 named integrations: Intuit QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365, and Slack.
Switching from Copilot: Is the $20 Pro plan enough for 4h/day of agentic coding? (www.reddit.com) I’m planning to switch from GitHub Copilot to Cursor. I’m currently working in a project and I spend about 4 hours a day on weekdays coding, mostly using AI as a agent.
Sea's View on the Future of Agentic Software Development with Codex (openai.com) Sea's View on the Future of Agentic Software Development with Codex | OpenAI Skip to main content Research Products Business Developers Company Foundation(opens in a new window) Log inTry ChatGPT(opens in a new window) Research Products Bu…
Cursor vs. Windsurf vs. Claude Code: Which offers the highest Opus limits for a $200 budget? (www.reddit.com) Hey everyone, I'm currently trying to decide between Cursor, Windsurf, and Claude Code for my daily workflow. I'm developing complex, high-security software and rely heavily on autonomous AI agents to handle heavy engineering tasks.
Data readiness for agentic AI in financial services (www.technologyreview.com) Sponsored Data readiness for agentic AI in financial services The success of agentic AI in financial services depends not just on smarter models, but on an authoritative context data store—one that is accessible, reliable, and governed at…
You're abusing your subscription with agentic 24/7 workflows and that's why we all get restrictions and limits (www.reddit.com) Subscription tiers were designed around interactive human use, but autonomous loops changed the usage. It makes sense that companies separate autonomous work from subscriptions.
Are we at the point now where all it will take to create AGI is saying the correct sequence of words to Codex or Claude Code? (www.reddit.com) Seems to me like they can basically do everything software related now so surely a good enough sequence of input tokens would be enough. I guess in a way it's guaranteed since the frontier labs are doing all their work through agentic flow…
"Maybe me too": Elon Musk accepts some of the blame for Claude learning to blackmail users from "evil" online AI stories (fortune.com via reddit) Anthropic has released new findings on why its Claude bot blackmailed users as part of an experiment conducted by the AI company last year—and Elon Musk is jumping in to take some of the blame. Last week, Anthropic published a report sayin…
Meet Mindflow, the free local mindmap with local AI dev by some quantitized models :P (www.reddit.com) Hi there, it's my first post there and i'm not a native english speaker so what's follow is (mostly) translated by an AI. I had fun building a mindmap tool in a single monolithic HTML file.
Prompt alignment is an architectural ceiling: The Soap Bubble Problem and the biological precedent for Runtime Governance. (www.reddit.com) The Soap Bubble Problem The current paradigm of solving agentic alignment relies on writing better rules into the context window or refining the weights (RLHF). This approach isn't failing, but it is hitting a hard architectural ceiling.
Is Anyone building Useful skills or workflows on Claude? (www.reddit.com) I've been exploring Claude as a base for building custom tools and automations — things like structured prompts, agentic workflows, and even full mini-apps powered by its API. Curious whether others are doing the same: - Are you building s…
Gartner says 40% of AI agent projects will be cancelled by 2027. Are we in an agent bubble? (www.reddit.com) Gartner just dropped this prediction and I can't stop thinking about it.  **40% of agentic AI projects will be cancelled by 2027.
$392M in AI agent security funding at RSAC 2026 - the market just validated what we've been building (www.reddit.com) The numbers from RSAC 2026 are wild. $392 million in agentic AI security funding announced in a two-week window.
Do you have any agentic sw developers in your org? (www.reddit.com) Hi all, Do you or your org use/put in place an agentic de developer? To which humans give the requirements and it gives out PRs?
AI agents are becoming more useless, not more intelligent — and they’re wasting more tokens than ever (www.reddit.com) I’m honestly getting tired of the hype around “AI agents” when the reality is getting worse, not better. Every AI model claims to be “intelligent”, “agentic”, “capable”, or “autonomous”, but when you actually try to use them for a real tas…
Practical lessons from 50K lines of production code with Claude Code (jappiesoftware.com via reddit) I've been using Claude Code in full agentic mode for two months — not just autocomplete, but letting it write features, run tests, read CI output, and push fixes. Around 50K lines of production code.
Moderators deleted post (www.reddit.com) I posted recently about QwenPaw (really cool Alibaba model) and Agentscope… Asking if anyone has any interesting experience with it? However what I’ve got back is someone doubting Alibaba absolutely astounding agentic R&D team work (yes -…
Best agentic model for 3090TI and 32gb ddr5 (www.reddit.com) Title, looking for the best combination of speed and intelligence.
How to get an LLM caught up on a 1000 page document? (www.reddit.com) I’m looking to be able to use a small, like 4-9B LLM, that would be able to ingest an extremely dense code book, 1000 plus pages, and me be able to use it to summarize and ask questions about that document. The use case will be offline str…
Anthropic raising Claude limits + adding SpaceX capacity feels like a bigger signal than people realize (www.reddit.com) Anthropic just raised Claude usage limits and announced a compute deal with SpaceX. To me, that feels bigger than “more GPUs.” If Claude Code, finance agents, security workflows, and long-running agent tasks are the direction, then capacit…
what's genuinely so special about claude? (www.reddit.com) there are like a huge amount of open source LLMs out there, and a huge amount of companies competing against Anthropic. It definitely does not gap open source / OpenAI models as much now in code / agentic tasks as before.
Running Claude code on VPS with a $20 plan will my account get banned (www.reddit.com) I just want to be able to run my Claude code on an EC2 instance instead of my local computer and access it via Telegram using the official plugin and a $20 Claude subscription for personal agentic stuff. What I’m wondering is: is there any…
I analyzed 922 agentic task trace and found the secret weapon of DeepSeek v4 (www.reddit.com) I recently did a benchmark of deepseek v4 in agentic tasks. Performance-wise, it's one of the best open source models, as expected.
Is the future agentic Slack, not agentic IDE? (www.reddit.com) One dev with Claude Code is already fast, that's been my experience using it daily. The moment more than one person on a team starts running agents in parallel, things fall apart fast: overlapping work, conflicting assumptions, and a flood…
Vibe coding and agentic engineering are getting closer than I'd like (simonwillison.net) Vibe coding and agentic engineering are getting closer than I’d like 6th May 2026 I recently talked with Joseph Ruscio about AI coding tools for Heavybit’s High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison.
I am trying to replace Claude in an agentic TDD pipeline with local LLM (www.reddit.com) Based on my last post and some comments, I added Qwen3.6:latest and Devstral to the evaluation. I am still looking for suggestions on which local model can run a complete TDD loop autonomously.
Claude 4.7 "Literalism" Claim vs. Reality: Why does it keep ignoring formatting and logic constraints? (www.reddit.com) According to the release notes, Claude 4.7 is supposed to prioritize literal instruction adherence over intent guessing. However, I’m seeing some major regressions in reliability: PEP8 Violations: Despite strict instructions to keep import…
Claude can now build and publish websites to a domain right from chat (www.reddit.com) I built teenyapp.com, a tool that lets Claude on the web (or any AI chat) build and deploy a full website end to end from a single pasted link. The problem teenyapp solves: every time I asked Claude to actually ship something, the agentic…
I will soon have $100k to build an in-house LLM server. Goal: Best agentic coding model. (www.reddit.com) Hey all, I am about to secure funding for a startup I've been working on and I'll have a $100k budget for building a server for doing agentic coding. I'm wondering, what do you think I should get as far as hardware goes?
Agentic Convergence-in-Depth: solving the One Nine reliability problem (www.enterprisevibecode.com via reddit) Claude Code dipped under 99% uptime in March 2026 — most critical services aim for 99.9%. The verification systems we trust for human-written code don't necessarily scale to code no one reads.
Anyone with M3 Ultra 256gb, some questions (www.reddit.com) I'm thinking to buy one. Just need to understand what I'm getting into before I do.
I am building l' Agence , an opensource AI governance stack. (www.reddit.com) Towards a Governance layer for AI agents With these last 2 weeks bringing a few high profile and costly Agentic accidents , it seems like an appropriate time the community started discussing Agentic governance more actively. So I am just c…
Since the industry is rapidly changing, I put together a comprehensive article explaining the current best AI coding agent software for May, 2026 (lmsa.app via reddit) The software development lifecycle has transitioned into an era defined by agentic orchestration, moving beyond the simple autocomplete paradigms of the early 2020s. As of May 2026, the landscape is d
Need advice on Qwen 3.6 27B INT4 quantization (www.reddit.com) Hello everyone, I think Qwen 3.6 27B is good enough that it might take a while before we get a clearly better model at a similar size. I have a single headless RTX 3090 with a 300W power limit.
RTX 5080 with 16 GB VRAM, 64 GB RAM best quantized model for programming? (www.reddit.com) I have an RTX 5080 with 16 GB of VRAM and 64 GB of RAM. What's the best quantized model I can run locally on this setup for agentic programming?
“Free” image generation isn’t free. You’re paying for it whether you use it or not. (www.reddit.com) flat-rate AI subscriptions hide a pretty wild cost-to-value mismatch, and image generation is the issue. the spread in what users actually cost on the same plan is easily 10-100x.
Should I buy Claude Pro as a BTech student — especially for the agentic/coding side? Honest takes wanted (www.reddit.com) https://preview.redd.it/l23rgf5z4qyg1.png?width=1402&format=png&auto=webp&s=73a7a278ca50527c9605488141d7e5ea48089a85 Hey everyone, I'm a BTech (AI/ML) student considering Claude Pro ($20/month) but want to separate the real value from the…
claude-code-best-practice 🇵🇰 repo crossed 50,000★ and is Pakistan most starred repo in 2026 (www.reddit.com) I started this repo with claude to maintain all the claude best practices. 100% developed using claude code.
Best Agentic Coding model I can run on the new Macbook M5 Max? (www.reddit.com) 16-inch MacBook Pro - M5 Max Component Specs Chip Apple M5 Max CPU 18-core (6 super cores @ 4.6 GHz, 12 performance cores @ 4.4 GHz) GPU 40-core (Hardware-accelerated ray tracing + Neural Accelerators) Memory Bandwidth 614 GB/s Neural Engi…
Is AGI the End For Local LLMs? (www.reddit.com) If leading AI conpanies are after AGI and the whole chatbot/agentic AI is just a phase for them to get to the end goal, then what does that mean for local LLMs? I would like to believe local LLMs are the future, but if AGI is achieved, do…
thinking of gemma 4 26B vs 31B (www.reddit.com) I see a big difference in agentic coding between gemma-4-31B-it-Q5_K_M and gemma-4-26B-A4B-it-UD-Q8_K_XL. The 26B model is much faster because of A4B and generally works well, but there is a big difference in thinking.
Reasoning Guard: Stopping LLM Thinking Loops at the Proxy Layer (www.reddit.com) Reasoning Guard: Stopping LLM Thinking Loops at the Proxy Layer I’ve been running Qwen3.6 MoE behind a vLLM proxy and hit a specific reliability issue: occasional runaway reasoning loops. This isn’t a criticism of Qwen3.6.
AI --> GenAI --> Agentic AI --> What Next? How Can One Understand This Industry? (www.reddit.com) Is artificial intelligence truly overrated, or are we underestimating the scale of its future impact? While some argue that AI is surrounded by hype and inflated expectations, others believe it will fundamentally reshape industries, econom…
Roman Yampolskiy predicts 3 to 5 years until AGI and a dangerous Agentic future Post AGI! (www.youtube.com via reddit) About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC
We’re entering a weird phase of AI agents where the tech is finally good… but the expectations are still stuck in 2023. (www.reddit.com) Everyone keeps talking about “autonomy,” “multi-agent swarms,” and “agents that think like humans,” but the real breakthroughs I’m seeing aren’t flashy at all. They’re boring.
Qwen 35B-A3B as an always-on agentic loop on a 16GB Mac M4: disk became the bottleneck before RAM (www.reddit.com) M4 Mac Mini, 16GB unified, basic spec. For a few weeks I had Qwen 3.5 35B-A3B UD-IQ3_XXS (12GB on disk) running under llama.cpp with --mmap and --flash-attn.
I built Claudex, a free-to-try open-source CLI for Claude Code-style workflows (www.reddit.com) https://reddit.com/link/1sxh0ec/video/egfs5inxtsxg1/player I built Claudex specifically for people who like Claude Code-style agentic coding workflows but want a simpler plug-and-play terminal setup The setup is the main thing I wanted to…
Got the system prompt of Claude Design, released it for free (www.reddit.com) Claude Design is great, but I wanted to have similar capabilities with any LLM or agentic tools (Claude-Code, Codex etc). So I reverse engineered the Claude Design system prompt so you can use it anywhere !
OpenAIs Agentic Shift (www.reddit.com) OpenAI is rolling out agents capable of autonomous, multi-step workflows, with reports suggesting they are exploring an acquisition of agent orchestration company Windsurf. Google's $40B Anthropic Investment: Google is committing up to $40…
Agentic AI is here for mobile. We built an autonomous agent that creates and self-heals its own background integrations. (www.reddit.com) Hey everyone, we just launched our iOS AI Agent out of a 1k-user beta, and I wanted to share the architecture - specifically how we handle the privacy vs. utility tradeoff.
Got a server with 8x A6000's how do I setup? (www.reddit.com) Hey guys got some resources that just became available at org. What's the quickest way to get setup on a multigpu setup?
Putting Lipstyk on a pig - agents write most of my code, so I wound up making a static slop analysis tool (www.reddit.com) lipstyk — static analysis for machine-generated code patterns I've been neck deep in agentic dev for a while. Started on Pi, ended up building my own toolset on top of it, and at this point the agents output most of the code while I play t…
My entire subnet just got permanently IP banned because of LangChain web scraper. Please help. (www.reddit.com) I feel sick. I built a simple agentic workflow to pull competitor docs and synthesize them for a project.
I created SpecDD - an agent-native spec framework that clears most agentic dev roadblocks, including capability degradation on large and complex codebases. Works great with Claude! (www.reddit.com) If you've been building with AI coding tools, you've probably hit this wall at least a few times: Code kind of works but drifts from your architecture Endless prompt loops to fix small misunderstandings and assumptions Context and patterns…
QClaw-4B — a 4B agent model fine-tuned for tool use and agentic workflows (www.reddit.com) QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…
Agentic company OS: (www.reddit.com) I shared this project here before when it was mainly a governed multi-agent execution prototype. I’ve kept working on it, and the current implementation is materially more complete, so I wanted to post an update with what actually exists n…
Using agentic coding safely. (www.reddit.com) Building an application by hand lets you create a mental model of how the applications works. But agentic coding forces the agent to create a mental model each time you start a new session.
DeepSeek-V4: a million-token context that agents can actually use (huggingface.co) DeepSeek-V4: a million-token context that agents can actually use Focusing on long running agentic workloads. Running a frontier open model as an agent today breaks in predictable ways.
Anthropic tested removing Claude Code from the Pro plan (arstechnica.com) Anthropic caused a stir among developers with what appeared to be a surprise change to its pricing plan: The company signaled that Claude Code, the popular agentic development tool, would no longer be available to subscribers on the $20-pe…
Google unveils two new TPUs designed for the "agentic era" (arstechnica.com) Most of the companies that have fully committed to building AI models are gobbling up every Nvidia AI accelerator they can get, but Google has taken a different approach. Most of its cloud AI infrastructure is based on its line of custom T…
Best Agentic AI Operating Systems 2026 (honests review) (www.reddit.com) 1. SimplAI Best for regulated enterprises that need air-gapped deployment and the fastest time-to-production (under 30 days).
How to best utilize local LLM give my hardware? (www.reddit.com) Hi all, I’m new to local LLMs but as someone who extensively uses agentic coding I thought I’d try it out. I am running a MacBook Pro with M3 Max 64gb ram.
Kimi K2.6 as a replacement for Opus 4.7? Testing with OpenCode. (www.reddit.com) Brand new dual 3090 PC - what should I install first for the best local agentic coding experience? (www.reddit.com) When did you fully adopt agentic coding? (www.reddit.com) This agentic SKILL will save you a lot of money (medium.com via reddit) Best setup for agentic coding (largely unsupervised) 8gb VRAM and 32 GB Sys RAM, Olamma Cloud and a frontier sub? (www.reddit.com) Hi! I'm looking for a coding agent workflow where I can run a local model for implementation and something either cloud based ala Olamma Cloud and some sort of frontier subscription (ChatGPT, Claude, whatever) to have continuous coding wit…
Testing Qwen3.6 with Hermes Agent on agentic coding. Locally with llama.cpp. (www.reddit.com) I'll be testing the setup and try out the Hermes Agent live: https://www.youtube.com/live/q5vqvwZykRI
Tried hermes agent with local gemma4 on ollama. free tokens are nice but the agent quality gap vs cloud is still huge (www.reddit.com) Saw a post about running hermes agent locally with gemma4 through ollama. zero api costs, unlimited tokens, full privacy.
NVIDIA V100 32GB for AI in 2026 (www.reddit.com) hello. i have the oportunity of buying Nvidia V100 with 32GB for about 915$ / 775 euro.
Managing "collective consciousness" across multiple AI models without breaking the bank—how do you sync context? (www.reddit.com) Been running a distributed AI workflow to dodge token limits and play to each model's strengths, but I'm hitting a massive wall with context continuity. My current pipeline: Claude → High-level architecture & tech stack decisions (the "arc…
Distilled my AI Agents and Skills definitions (www.reddit.com) I have significantly distilled my AI Agents and Skills definitions. My goal is to reduce the context size and token usage without impacting the quality of my development team.
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat (www.reddit.com) Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.
Why Your LLM Leaderboard Scores Don't Matter (www.reddit.com) Leaderboard scores often don’t translate to production performance — even with newer agentic / Arena-style evals. The main issue seems to be that benchmarks are standardized, while real systems depend heavily on prompts, data distribution,…
m5 pro 64gb worth it for local agents or wait? (www.reddit.com) I am currently on an m3 mbp with 24gb ram. For regular python and django work the machine is perfect and i have no need to upgrade for speed.
Cloud AI is getting expensive and I'm considering a Claude/Codex + local LLM hybrid for shipping web apps (www.reddit.com) I'm a designer who's been working on web apps and plugins for the past 5 months. Right now I'm building an After Effects plugin (close to shipping) and a music learning game experience.
computation is the missing bedrock of agentic memory (www.reddit.com) link to full article in comments TLDR: - LLMs are the wrong substrate for memory. Prediction can't do routine work, repeatable work consistently.
Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026. (www.reddit.com) After months of testing, I finally have a local setup that doesn't make me want to go back to the API. Hardware: RTX 3090 (24GB VRAM) Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M Inference: llama.cpp…
I have a Macbook AIR M5 Base and I want to run an Agentic Coding program, similar to Claude Code or Codex. Besides the model, how do I do it? I've already tried with Ollama, VS Code, Opencode, and haven't been able to. (I'm not a developer, sorry) (www.reddit.com) I started developing an app with Claude, but the credits run out very quickly. I thought that now with my new computer I could run something directly on it.
Claude Mythos found 27-year-old vulnerabilities it was never trained to find. That's the part enterprise AI roadmaps aren't accounting for. (www.reddit.com) The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.
Self employed, Small biz folks: Have you unlocked huge revenue gains with Claude specifically? (www.reddit.com) We've heard about the increase in productivity in engineering departments in large companies with Claude Code, but I'm curious about implementations in small businesses. I'm especially curious about folks who work for themselves (i.e. non-…
Excess of Agentic AI... does that make sense? (www.reddit.com) Does it make sense for AI companies to be limiting access to the AI models themselves, precisely because of Agentic AI? Let’s think about it, if there is already not enough computing power to sustain the gigantic, and increasingly excessiv…
How can I use agentic AI to automate my WFH dayjob? (www.reddit.com) TLDR: I work in cybersecurity, 99% as a SOC analyst. It's tedious repetitive work, ideal for automation.
Here is what most people get wrong about saving tokens with AST tools (www.reddit.com) I spent the last day benchmarking codebase context tools against a real AI agent. Not synthetic token counts.
Agentic Guardrails: 4 markdown workflows to improve the output quality of AI coding agents (github.com via reddit) Agentic Guardrails Reusable workflow templates that keep AI coding agents from shipping sloppy code. These are markdown-based instructions that any AI coding agent can follow — Cursor, Claude Code, opencode, Aider, Gemini CLI, or anything…
reliable way just to have cursor agentic ability and IDE with external provider api without cursor pro ? ( via reddit) could not extract summary
Gemma 4: Byte for byte, the most capable open models (deepmind.google) Gemma 4: Byte for byte, the most capable open models Today, we are introducing Gemma 4 — our most intelligent open models to date. Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of intel…
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective (huggingface.co) Netomi’s lessons for scaling agentic systems into the enterprise (openai.com) OpenAI co-founds Agentic AI Foundation, donates AGENTS.md (openai.com) Inside Mirakl's agentic commerce vision (openai.com) Introducing Aardvark: OpenAI’s agentic security researcher (openai.com) Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol (openai.com) Introducing Gemini 2.0: our new AI model for the agentic era (deepmind.google) Achieving 10x growth with agentic sales prospecting (openai.com) Practices for Governing Agentic AI Systems (openai.com)