#agentic

1796 items

Qwen3.6-27B released! (www.reddit.com) +549141 9w

Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight.

↯ Qwen 3.6 qwen agentic
Google introduces TPU 8t and TPU 8i (www.reddit.com) +36148 9w

The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale. https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eight…

agentic
So, this week claude wiped agentic AI startups with a new update. Also, as they have mythos now, they will ship things very fast without any trouble (www.reddit.com) +34666 11w

Honestly, they are a full pack now. A few hours ago, they released Claude Managed Agents, which lets you build long-running, autonomous agentic systems. Plus, with their new suite of APIs, engineering teams can harness Claude's exponential po…

↯ Anthropic Mythos mythos agentic
Our eighth generation TPUs: two chips for the agentic era (blog.google via hn) +322157 9w

https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

agentic
Unpopular opinion: OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc. Nor used any workflow tool like 8n8 or make. (www.reddit.com) +322131 9w

openclaw codex agentic+1
Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All (qwen.ai via hn) +290157 10w

Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

↯ Qwen 3.6 qwen agentic
‘Addictive’ agentic coding has developers losing sleep (www.reddit.com) +25693 12w

The good, bad, and ugly of coding with agents here: https://leaddev.com/ai/addictive-agentic-coding-has-developers-losing-sleep “I’m coding into later hours of the day not because I’m told to do so, but because I can’t get myself to get up…

agentic
mistralai/Mistral-Medium-3.5-128B · Hugging Face (huggingface.co via reddit) +194119 8w

https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF Mistral Medium 3.5 128B Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and…

↯ Mistral mistral agentic
Opus 4.7 destroys all trust in a mature instruction set built iteratively throughout product development (www.reddit.com) +18129 10w

Earlier generations showed iterative improvement as the instruction set was matured around agentic limitations. We've immediately regressed back to square one with Opus 4.7, and the model is not afraid to admit to it.

↯ Opus 4.7 opus agentic
So... has anyone actually figured out whose model Elephant Alpha is yet? (www.reddit.com) +17539 9w

moe deepseek qwen+1
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints (www.reddit.com) +14938 7w

WARNING: wait before downloading from HF: I just realised my upload of the new versions with the additional fix in the chat template has not completed yet. I will remove this warning once done. The recent PR to llama.cpp brings MTP support to Q…

↯ Qwen 3.6 qwen llama agentic+2
Google ramps up agentic AI efforts amid pressure from Anthropic (www.reddit.com) +13224 9w

agentic anthropic
Read through Anthropic's 2026 agentic coding report, a few numbers that stuck with me (www.reddit.com) +13135 10w

Anthropic put out an 18-page report on agentic coding trends. Skimmed it expecting the usual hype, but a few things actually caught me off guard. The biggest one: devs use AI in ~60% of work but only fully delegate 0-20% of tasks.

↯ Copilot copilot agentic anthropic+1
Caught the massive OpenAI Codex model leak on video before it was patched! (GPT-5.5, Arcanine, Glacier-alpha) (www.reddit.com) +13021 9w

Hey everyone, I opened up Codex today and was greeted by this massive list of unreleased and internal models. I managed to get a screen recording of the dropdown right before OpenAI seemingly realized the mistake and patched it out.

↯ GPT 5.5 gpt-5 codex agentic+1
ExLlamaV3 Major Updates! (www.reddit.com) +10049 6w

Turboderp has been on an absolute tear recently, in the endless battle to cram new llamas into smaller, faster boxes. We started off last month with the release of gemma 4 support, and continued with improved caching efficiency.

↯ Gemma 4 gemma agentic
Multi-Agentic Software Development Is a Distributed Systems Problem (kirancodes.me via hn) +9948 10w

Multi-agentic Software Development is a Distributed Systems Problem (AGI can't save you from it) Recently, I've been thinking a lot about scaffolding and languages for managing systems of LLMs coordinating with each other — new programming…

agentic
Qwen 3.6 35B crushes Gemma 4 26B on my tests (www.reddit.com) +7329 9w

I have a personal eval harness: a repo with around 30k lines of code that has 37 intentional issues for LLMs to debug and address through an agentic setup (I use OpenCode). A subset of the harness also has the LLM extract key information fr…

↯ Qwen 3.6 gemma qwen agentic
Qwen Introduced FlashQLA (www.reddit.com) +7017 8w

Introducing FlashQLA: high-performance linear attention kernels built on TileLang. 2–3× forward speedup.

qwen agentic
Anthropic just confirmed why 90% of non-coding AI agents fail in production (www.reddit.com) +6224 4w

Anthropic recently published an incredibly deep breakdown analyzing millions of real human-agent tool calls across their public API, and they shared a breakdown of where these agents are being deployed. They said “Software engineering make…

agentic anthropic
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) (www.reddit.com) +5821 6w

TL;DR Results from the title are for single inference with 2 prompts of 1k and 15k tokens. So no MTP (as it’s slower for big prompts), no DFlash (working too but slower for big prompts), no quant used (full precision wanted) and the results a…

↯ Qwen 3.6 vllm qwen agentic+1
Cloudflare's AI Platform: an inference layer designed for agents (blog.cloudflare.com via hn) +5719 10w

AI models are changing quickly: the best model to use for agentic coding today might in three months be a completely different model from a different provider. On top of this, real-world use cases often require calling more than one model.

agentic
Aaaaand I cancelled my Cursor subscription (www.reddit.com) +5690 9w

The timing is funny because I was thinking about this all week, and the SpaceX announcement was the final nail in the coffin. I switched to pi for agentic coding, and it’s sooo good.

cursor agentic
Computer use in Gemini 3.5 Flash (blog.google via hn) +5217 1d

Introducing computer use in Gemini 3.5 Flash. Computer use is now a built-in tool supported in Gemini 3.5 Flash, delivering our best performance yet for agentic computer use tasks. Previously only available as a standalone Gemini 2.5 comput…

↯ Gemini 3.5 gemini agentic
Open-Source Agentic QA Harness with Memory (github.com via hn) +507 5w

Write tests in natural language. agent-qa runs them across web and mobile with execution memory, catching regressions before release.

agentic
LiquidAI/LFM2.5-8B-A1B · Hugging Face (huggingface.co via reddit) +4915 4w

looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment.

vllm moe llama+1
We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local (www.reddit.com) +489 7w

LDR maintainer here. Thanks to the strong support of the r/LocalLLaMA community, LDR got very far.

↯ Qwen 3.6 tool-calling ollama opus+1
Tried claude code. Hate it. (www.reddit.com) +4880 10w

Just posted this in r/ClaudeCode, thought I'd come to a different flavoured echo chamber and see what the cursor community makes of my experience. Note I've not upgraded to cursor v3 yet, and I don't know if I want to.

↯ Cursor 3 cursor agentic claude-code
Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model (github.com via hn) +4419 11d

Nex-N2: An agentic model with Agentic Thinking. Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2, an…

agentic
AI agent runs amok in Fedora and elsewhere (lwn.net via hn) +422 2w

Agentic AI systems can be used to do a variety of things autonomously on behalf of a human user: open or manage bugs, generate code, submit pull-requests, and (appare…

agentic
KVarN: Native vLLM KV-cache quantization back end by Huawei (github.com via hn) +424 3w

⚡️ Built for agentic and long-context workloads. 💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent requests, with FP16-level accuracy.

vllm agentic
Consider running a bigger quant if possible (www.reddit.com) +4040 9w

Just a little reminder that *if* it is possible for you to run bigger quants, do it. I ran Qwen 3.6 IQ4_XS at 128k context and was very much disappointed because it would loop, make formatting errors, implement the wrong things, etc.

↯ Qwen 3.6 qwen agentic
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! (www.reddit.com) +3917 5w

little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and now lands above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%).

gemini agentic
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents (arxiv.org via hn) +385 7w

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability…

↯ Glm glm agentic
I tested 8 LLMs as tabletop GMs - a 27B model beat the 405B on narrative quality (www.reddit.com) +3734 9w

↯ Qwen 3.5 gemma qwen agentic+1
Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB (www.reddit.com) +3626 7w

----START HUMAN TEXT---- Hi all, I've seen a bunch of posts about squeezing 27B onto a 24GB card and all the quantization tricks involved in doing so. It's all amazing work, but at the end of the day a quantized model with quantized KV wil…

↯ Qwen 3.6 qwen agentic
Grok 4.3 achieves higher overall intelligence over 4.20 with less of a cost, at the price of slightly higher hallucination rate. (x.com via reddit) +3514 8w

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20. The release of Grok 4.3 places just above Muse Spar…

↯ Hallucination hallucination grok agentic
Agentic coding deserves more than a chat box bolted onto VS Code (github.com via hn) +3313 8d

Polypore Agentic desktop IDE. Language agnostic, OS agnostic.

agentic
Affirm Retooled for Agentic Software Development in One Week (medium.com via hn) +3222 8w

agentic
HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written] (www.reddit.com) +3023 7w

This post will have a slight old-man-shakes-fist-at-sky vibe, because….well… I’m older, so if you’re not into that, then please feel free to skip it. I have been contributing to this sub for like 3 years now but I’m fearful this post will lik…

↯ Qwen 3.6 agentic
The joy and pain of training an LLM from scratch (www.reddit.com) +3015 10w

mii-llm just released a detailed technical report on the development of the Zagreus and Nesso model families: a set of 0.4B parameter language models trained from scratch with a focus on edge deployment, multilingual capability, and Europe…

agentic
Ask HN: How do you get into a flow state when using AI to code? (news.ycombinator.com) +2937 2w

Before agentic coding, I always prided myself on how long I could work in a flow state. I was really good at working deeply.

agentic
Lessons for Agentic Coding: What should we do when code is cheap? (www.dbreunig.com via hn) +2925 7w

10 Lessons for Agentic Coding: What should we do when code is cheap? Lately, this blog has featured a lot of writing about agentic coding.

agentic
AA introduces Coding Agent Index - Performance Comparisons between Model & Harness Combinations (www.reddit.com) +286 6w

The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use: ➤ SWE-Bench-Pro-Hard-AA, 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI’s S…

↯ Swe Bench swe-bench agentic
Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff (www.reddit.com) +2826 10w

Models compared: Qwen3.5-27B-UD-Q5_K_XL, gemma-4-31B-it-UD-Q5_K_XL. Main flags for both: --flash-attn on \ --n-gpu-layers 99 \ --no-mmap \ -c 150000 \ --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \ --ctx-checkpoints 1 \ --jinja \ -np 1 \ --re…

↯ Qwen 3.5 gemma agentic
What's your favorite local MCP server? (www.reddit.com) +2744 4w

I've seen so many "RAG this, memory that" projects. What projects are people actually using day to day for agentic workloads?

rag mcp agentic
Show HN: Agentic interface for mainframes and COBOL (www.hypercubic.ai via hn) +255 6w

Hi HN, we’re Sai and Aayush, and we’re building Hypercubic (https://www.hypercubic.ai/), bringing AI tools to the mainframe and COBOL world. (We did a Launch HN last year: https://news.ycombinator.com/item?id=45877517.) Today we’re launchi…

agentic
Cursor autocomplete is (still) way ahead of its peers! (www.reddit.com) +2529 9w

I switched back to Cursor this week after using antigravity + claude code for almost 6 months and I had almost forgotten how good cursor autocomplete is. I am still someone who likes to make manual edits, write markdown docs myself and not…

cursor agentic claude-code
Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into (www.reddit.com) +2528 9w

↯ Qwen 3.6 moe llama agentic
A disciplined Cursor 3.0 Agentic workflow for complex backend/system design tasks (www.reddit.com) +258 11w

I think I’ve finally settled on a Cursor workflow that actually makes sense for me in terms of cost, quality, and control. Posting this because the whole model/usage story is confusing as hell, and this is the first setup that’s felt stabl…

cursor opus agentic
A week after elephant, Ant dropped Ling-2.6-1T on OpenRouter for free. How high is the ceiling for Chinese model labs now? (www.reddit.com) +244 9w

What stood out to me isn’t just the model itself, but how quickly they shipped another one after Ling-2.6-Flash. Ling-2.6-1T seems to be positioned more around stronger agentic ability than a totally different direction.

agentic
obsidian + claude is the perfect local memory stack whats the web-based equivalent? (www.reddit.com) +2316 4w

been seeing a lot of people hook up claude code directly to a local obsidian vault lately. for personal workflows, it’s honestly really really good.

agentic claude-code
GPT-5.5 is lowkey blowing my mind (www.reddit.com) +2311 8w

Just spent the whole morning testing GPT-5.5 in ChatGPT and the jump in agentic reasoning and complex task handling is ridiculous. It plans multi-step workflows, uses tools properly, checks its own work, and actually gets stuff done instead…

↯ GPT 5.5 gpt-5 chatgpt agentic
Stanford/Princeton AI4S unveils LabOS² -the agentic AI system that spanned from dry-lab planning to wet-lab execution, using physical AI to assist scientists - now is capable of performing fully autonomous cell culture workflows. (www.reddit.com) +224 7w

Introducing LabOS². An early look at autonomous cell culture, as a long-horizon physical AI workflow for biomed.

agentic
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering (arxiv.org via hn) +21 2w

LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption r…

agentic
Launch HN: Hyper (YC P26) – Company brain to power agentic development (news.ycombinator.com) +2114 3w

Hey HN, we’re Shalin & Kanyes, best friends who've been hacking together for 10+yrs, and now founders of Hyper (https://heyhyper.ai/). Hyper is a shared “company brain” that plugs into information flowing inside a company to make AI agents…

mcp agentic
Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B (www.reddit.com) +2117 5w

I wanted to know how much of a coding agent's performance came from the model and how much came from the harness, so I vibed a setup to allow me to test multiple agentic harnesses/model combinations on the same task. All the images above a…

↯ Copilot ↯ Qwen 3.6 copilot agentic
AI agents dont just help banks they can now BE your bank (www.reddit.com) +2010 10w

Seeing a lot of posts here about AI agents built for financial institutions, but I think the bigger shift is AI agents doing the banking for you, not for the bank. I run a small dev shop and saw a blog about opening a bank account with AI thr…

agentic
Show HN: YourMemory, agentic memory is a pruning problem, not a hoarding problem (yourmemoryai.vercel.app via hn) +19 2w

This is a project that I have been building for a while now, YourMemory is a solution to agentic memory which focuses on pruning of noise rather than hoarding of data. In the current state of agentic memory most of the context is stored in…

rag agentic
Folks running qwen 3.6 27b for agentic work. Do you dare to use q4_k_m? (www.reddit.com) +1919 4w

I don't have a good experience running q4_k_m; the difference to q6 is "a few errors an hour" versus "a few errors every couple of days". Edit: How does it fail?

↯ Qwen 3.6 qwen agentic
Turning local agents into self-optimizing agents (www.reddit.com) +193 4w

I was experimenting with a self-optimizing agentic pipeline to climb the benchmark leaderboard (TerminalBench). On a 10-task subset, I got the performance to rise from ~30% → ~90%.

agentic
Launch HN: Chert (YC P26) – Twilio for iMessage (www.trychert.com via hn) +1957 4w

Hey HN! We’re Gary and Ian, and we’re building Chert (https://www.trychert.com/), an API for businesses to send, receive, and automate iMessage conversations at scale.

agentic
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA (www.reddit.com) +19 4w

I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 q…

↯ Sonnet 4.5 rag sonnet agentic
GPT vs Claude in a bomberman-style 1v1 game (www.reddit.com) +195 10w

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.

arc-agi agentic
My LinkedIn network is about to be aggressively flooded with Claude Code certifications (www.reddit.com) +188 5w

Anthropic dropping 13 completely free official courses with certificates is an absolute godsend for the community. But let’s be real: half of us are going to power-speed through the developer modules, download the PDF, and immediately upda…

mcp agentic anthropic+1
The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b (www.reddit.com) +178 5w

One way I like to test new models is by one-shotting (with a good prompt) a single webpage clone of the classic arcade game pacman. I usually do 3 attempts and keep the best one.

↯ Glm ↯ Qwen 3.6 glm qwen chatgpt+2
Poolside Laguna XS.2 (www.reddit.com) +17 8w

33B A3B MoE, Apache 2 licensed. Reported agentic results put it about level with Qwen 3.5 35B A3B, behind the 3.6 version.

↯ Qwen 3.5 moe qwen agentic
Kv cache quantization: ignorance, or malice? (www.reddit.com) +1533 7w

I run Qwen-3.6 27B FP8 on vllm for long-horizon agentic coding harness workloads with high context window and concurrent sub-agents. On two 3090s that aren’t used for anything else, it seems reasonable to expect a good balance between spee…

↯ Qwen 3.6 vllm qwen agentic
Build collaboratively as a group using single claude code session via Meetings (www.reddit.com) +155 9w

I recently came across an agentic skill that lets Claude Code join meetings, got access as an early user through a Product Hunt group, and would like to share my experience using it. The skill lets you join Google Meet, Teams, or Zoom.

agentic claude-code
Haystack: Open-Source AI Framework for Production Ready Agents, RAG (haystack.deepset.ai via hn) +144 2d

The Open Source AI Framework for Production Ready Agents, RAG & Context Engineering. Haystack Sets the Standard for Agentic AI Across Industries. Why Teams Choose Haystack for their AI Workflows. Build Transparent, Context Engineered AI Syste…

rag agentic
GLM-5.2: Frontier Intelligence, Open Weights (twitter.com via hn) +145 9d

Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits,…

↯ Glm ↯ GLM 5.2 glm agentic
Agentic harness for theoretical physics research (www.reddit.com) +144 6w

Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains and today we're releasing physics-intern to tackle research-level problems in theoretical physics. It's a multi-agent framework which we designed to m…

↯ GPT 5.5 gpt-5 gemini agentic
Five Eyes agencies issue first coordinated agentic AI security guidance (www.reddit.com) +141 7w

Five Eyes agencies just issued the first coordinated multi-nation security ruling on agentic AI. CISA, NCSC, and their Australian, Canadian, and New Zealand counterparts co-published guidance telling organizations to prioritize resilience…

↯ Security security agentic anthropic
governance wall in agentic workflows. why are we stuck past rag? (www.reddit.com) +1415 9w

keep seeing the same pattern across agent projects. we're good at building agents that find information, but the moment we ask them to actually do something (update a crm, trigger a payment, touch a production database), things grind to a…

rag agentic
Show HN: A CLI that writes its own integration code (docs.superglue.cloud via hn) +149 10w

We run superglue, an OSS agentic integration platform. Last week I talked to a founder of another YC startup.

agentic
Agentic Coding Is a Trap (khalilstemmler.com via hn) +138 6d

At the end of a call this week, a developer said something to me that I haven't been able to stop thinking about. We'd been talking for over an hour.

agentic
Git platform built for agentic era (gitlawb.com via hn) +1312 7d

gitlawb node. Live operator view for a federated gitlawb node: repos, peers, IPFS pins, recent ref updates, and the identity this machine is advertising to the network.

operator agentic
Why 80% of agentic AI demos don't make it to production (www.reddit.com) +134 5w

Agent demos are easy. Production agents are hard.

↯ Hallucination ↯ Tool Use tool-use hallucination agentic
Agents Aren't Coworkers, Embed Them in Your Software (www.feldera.com via hn) +13 8w

Agentic management software is all the hype today: What started with Moltbot and OpenClaw now has a lot of competition: ZeroClaw, Hermes, AutoGPT etc. These systems work well and allow you to train and build generic agent loops that are ge…

openclaw agentic
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face (huggingface.co via reddit) +125 5w

Qwopus3.5-9B-coder is specially optimized and fine-tuned for high-performance 🤖 Agentic Coding, complex Tool Calling, and logical reasoning. 💡 Why the 9B Dense Model?

↯ Fine Tuning ↯ Qwen 3.5 fine-tuning agentic
ChatGPT 5.5 x Blender (youtu.be via reddit) +129 8w

I tested the new ChatGPT 5.5 with Blender, and it was surprisingly capable. It created 3D scenes, fixed modelling issues, searched for missing resources, and improved the scene step by step.

chatgpt agentic
Doing real coding work locally for the first time (www.reddit.com) +1217 9w

↯ Qwen 3.6 codex agentic claude-code
Why is agentic AI so expensive? (www.reddit.com) +1252 9w

↯ Copilot ↯ Cowork cowork openclaw copilot+3
Qwen3.6 agent + Cisco switch: local NetOps AI actually works! (www.reddit.com) +124 9w

↯ Qwen 3.6 cline qwen llama+1
Claude Agent can potentially replace feeds (www.reddit.com) +1211 10w

I’ve been experimenting with how information consumption changes in an agentic internet, and this setup has been surprisingly powerful. Instead of scrolling feeds or relying on algorithms, I set up agents that roam the web based on my pref…

agentic claude-code
anyone else stuck at their desk during long agentic runs? (www.reddit.com) +1210 10w

so I've been running some complex agentic refactors and these sessions go 6+ hours because the agent is grinding through a massive legacy codebase, and I can't really walk away. close the laptop and the process dies. re-initializing takes…

agentic
Is Qwen3.6 current king for local agentic use? (www.reddit.com) +1122 4w

I've been testing other models but it seems like nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash…

↯ Glm ↯ Qwen 3.6 glm moe agentic
Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team (www.runtm.com via hn) +11 5w

Hey HN, We're Gus and Carlos from Runtime (https://runtm.com). We're building infra that lets your whole team (including non-engineers) ship with Claude Code, Codex, and other agents without engineering having to handhold every session.

codex agentic claude-code
Simpler self hosted alt to Open WebUI (www.reddit.com) +116 6w

Got Qwen3.6 27B running on my newly assembled 4x 3090 rig (s/o 3090-club) and I'm trying to get the people in my house to adopt the local workflow. Open WebUI has improved a lot in the recent updates, but I still found it pretty rough for…

↯ Qwen 3.6 chatgpt agentic
Gemini api showing agentic gemini models (www.reddit.com) +114 6w

gemini agentic
(Rant ;)) Make your benchmarks realistic (www.reddit.com) +112 6w

Everybody here is posting their optimizations for running different models - that's good, but make these benchmarks realistic, as speed is not the only factor in running an LLM effectively. Context size is key - with agentic/coding/rag work you need to ha…

rag agentic
Tendril – a self-extending agent that builds and registers its own tools (github.com via hn) +112 8w

Tendril A self-extending agentic sandbox that demonstrates the Agent Capability pattern — where the model discovers, builds, and reuses tools autonomously across sessions. Built with AWS Strands Agents SDK and Tauri.

agentic
Llama.cpp parameters for Qwen 3.6 with RTX 3090 (www.reddit.com) +1112 9w

Hi, I'm trying to run Qwen 3.6-35B on my RTX 3090 (24 GB of VRAM) but I'm not sure about 2 things: - Which variant of the model to use? (Q4_K_S, Q3_K_XL, other?

↯ Qwen 3.6 qwen llama agentic
Is agentic commerce an opportunity or a chaos? (www.reddit.com) +1111 10w

I have been watching agentic commerce closely and it is interesting. AI agents are picking products for people now, and it's wild.

agentic
2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me (www.reddit.com) +1110 10w

Hello, I've been on a quest to get something "close enough" to Opus 4.5 running locally, for agentic coding, as a SWE with 15 years of experience. I tried with one spark (yeah I'm calling my Asus Ascent GX10 sparks - they're the same), with…

↯ Minimax ↯ Qwen 3.5 minimax qwen opus+1
High-stakes game of musical chairs! (www.reddit.com) +105 4w

I made this image (with nanobanana 2.0) to illustrate what I think is happening in the current AI race. Right now, there's a heavy decline in quality and access to AI tools.

agentic
how do you guys handle the conversation with skeptical clients when selling agents? (www.reddit.com) +1011 5w

struggling with a bit of a reality check lately and wanted to see if anyone else is running into this. been pitching agentic workflows for a while, and I've realized that leading with the tech - the orchestration, the RAG, the "intelligence…

rag agentic
The power of structured workflows and small local models (www.reddit.com) +102 5w

A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was: https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_…

agentic
Which industries are adopting Agentic AI the fastest right now? (www.reddit.com) +108 6w

Feels like every week there’s a new “AI agent” startup or enterprise rollout. Curious which industries are actually adopting Agentic AI the fastest in real-world workflows, customer support, finance, healthcare, dev tools, operations, etc.?

agentic
As of today, what's the *most stable* model to run on a 32Gb RAM Mac w/ 256k context? (www.reddit.com) +1032 6w

Hey everyone, I've been playing around with Gemma4 and Qwen3.6 on my 32Gb Macbook Pro M2 Max since their release but I'm struggling to find: the best software to run it (oMLX, llama.cpp, ...); the best model + quant to pick; the best sett…

↯ Qwen 3.6 llama agentic
I created an agentic orchestration pipeline for music video generation (www.reddit.com) +1016 6w

I’ve been building Uisato Studio, a workflow-based AI creation platform for audiovisual work. This is the Music Video mode: upload an image + audio, and the system analyzes the input, generates visual direction, creates clips, handles b-ro…

agentic
why llama.cpp can’t combine speculative decode methods? (www.reddit.com) +105 7w

dicking around with the new mtp speculative decode with qwen3.6 27b, and it’s great. but for agentic coding i’ve seen significant improvements from ngram, because a decent fraction of the time (e.g.

↯ Qwen 3.6 llama agentic
Watching the agent-tooling space dominate GitHub trending right now. Sharing the Github tracker we built and use internally, in case it's useful (www.reddit.com) +109 9w

Something interesting happening on GitHub trending: Agentic infrastructure repos are growing faster than anything else right now. Today's top three by 24h growth: obra/superpowers: +2.9k stars (agentic skills framework, methodology for sof…

openclaw codex cursor+2
Don't ask Qwen 3.6 35b to give you aski image of Yoshi :) (www.reddit.com) +1011 9w

https://preview.redd.it/dfqed57qgsvg1.png?width=1706&format=png&auto=webp&s=3859209698d2e844e2731326e355d60928658f8a The most fun part was reasoning, here is a gist: https://gist.github.com/anzax/5f06716c66180013cd715f6c2e5848df There is a…

↯ Qwen 3.6 qwen agentic
Show HN: OpenHack – OSS security scanner, 40x cheaper, on par with Opus 4.6 (github.com via hn) +9 3w

⏚ OpenHack Open Source Agentic Security Scanner & Verifier for your codebase. Like Claude Code Security / Codex Security but open source and exclusively uses open source models.

↯ Opus 4.6 codex opus agentic+1
Show HN: Local Coding Agent with LLMs to Delegate Tool Calls to Small AI Models (github.com via hn) +9 4w

Open Agent Tools Coder Open Agent Tools (oats) enables small-to-large self-hosted ai models to use local source code when running tool-calling agentic workloads. We actively data mine 20,970+ (2+ TB) popular github repos using large and sm…

tool-calling agentic
Removing Vision from model (www.reddit.com) +99 4w

I removed the mmproj file from my models to drop vision and save VRAM. But just curious, does this really not affect text ability?

↯ Qwen 3.6 qwen agentic
Show HN: Headless Cloud Security – Headless SaaS has come to security (www.sysdig.com via hn) +9 6w

The cloud security company I work for, Sysdig, launched “Headless Cloud Security” last week. The short version: as attacks get faster and more automated, security tooling is going to need to evolve beyond dashboards and humans clicking thr…

cursor mcp agentic+1
Lovable is the first coding agent platform to adopt AIUC-1 (SoC-2 for AI Agents) (www.aiuc-1.com via hn) +9 6w

More than half of all LLM tokens now go to writing code - and coding agent adoption is growing rapidly across the enterprise. In this whitepaper, co-authored with Lovable, we show how AIUC-1 addresses the unique risks of agentic development

agentic
CopilotKit raises $27M to build the Agentic FrontEnd Stack (techcrunch.com via hn) +9 7w

Many companies today provide AI simply as a chatbot inside their apps: You type in (or dictate) what you want it to do, and the AI bot goes and tries to do it. Still, the experience tends to feel clunky.

agentic
Does the "6 months gap" still hold? (www.reddit.com) +915 7w

Hi. It is quite a consensus that the "jump" in quality of agentic development happened sometime in December 2025, transforming from "nice to have", to actually performing.

↯ Opus 4.5 opus agentic
Roo code shuts down, Team will focus on roomote agent (twitter.com via hn) +9 9w

When we started Roo Code in late 2024 by forking Cline and adding what's now widely known as dangerously-skip-permissions, agentic coding was rough and experimental. But Roo Code took off fast: 3 million installs, a passionate community, r…

cline agentic
How to share agentic workflows, instructions, skills, across team members, teams, organizations (www.reddit.com) +912 9w

I work for a fairly large company (1000 devs). My team has 6 members.

agentic
Shared Dictionaries: compression that keeps up with the agentic web (blog.cloudflare.com via hn) +9 9w

Today, we’re excited to give you a sneak peek of our support for shared compression dictionaries, show you how it improves page load times, and reveal when you’ll be able to try the beta yourself.

agentic
Agentic Search Models with OpenSearch and Elasticsearch (bonsai.io via hn) +8 2w

Tuning search is tricky, and the tools of yesterday are good but require lots of effort and data to get right. In this post I'm going to introduce purpose-built agentic LLMs for searching and reranking, which are an easy drop-in solution f…

agentic
Robinhood launches credit card for AI agents with 3% cash back (fortune.com via reddit) +8 4w

In the latest sign of AI’s growing footprint in online commerce, Robinhood announced on Wednesday that users can now instruct agents to make purchases on their behalf using the Robinhood Gold card. To illustrate the potential of agentic sh…

agentic
how do you scale infrastructure for ai agents on a budget? (www.reddit.com) +89 5w

we're running an agentic pipeline that does multi-modal file processing - large files, often hundreds of mb per request. The actual agent logic works fine.

agentic
agents have a high false-positive rate? how to handle? (www.reddit.com) +87 6w

been digging into agentic workflows for specialized image processing and high-stakes data triage, and honestly have problems with trust. you've probably seen the pattern.

agentic
I've created the fastest local AI engine for Apple Silicon. Optimised for agentic use. (www.reddit.com) +83 7w

https://preview.redd.it/p0rqofxvrtzg1.png?width=1460&format=png&auto=webp&s=8ce5b18b4ddaad9b71f71fd8eb623839fc9c6c8b For weeks I've been working on creating the fastest local AI engine for Apple Silicon... And I finally did!

↯ Qwen 3.6 agentic
ATS vs. multi-agent. where does sensible automation end and over-engineering begin? (www.reddit.com) +88 8w

the traditional ATS is predictable and cheap to run. it's a known quantity.

agentic
Ive automated my email/sms/phone (www.reddit.com) +815 10w

we got it good boys! how many of you are doing this??

↯ Gemma 4 gemma agentic
Show HN: Neural Particle Automata (selforg-npa.github.io via hn) +7 3d

Neural CAs model self-organizing pattern formation on grids. Now the grid is gone.

agentic
Don't share your opinion, if you didn't test it !!! (www.reddit.com) +7 5w

I see many people giving their opinion based on what they previously saw or on what others said. Even though they don't test models thoroughly, they still give their opinion, which is so frustrating.

↯ Gemini 3.5 gemini opus agentic
Show HN: Bonsai 1.7B ternary model at 442T/s on M4 Max (agents2agents.ai via hn) +71 7w

We took a recently released Bonsai 1.7B ternary model from PrismML (https://github.com/PrismML-Eng/Bonsai-demo) and ran our agentic evolution search on it for 6 hours to optimize the Metal kernels. The search was fully autonomous.

llama agentic
what are the biggest risks of agentic AI in supply chain production? (www.reddit.com) +77 8w

we've been testing agentic AI for inventory replenishment and exception handling. the goal was to get past simple "if-then" rules and have agents actually weigh trade-offs, like margin vs.

agentic
Hiring: GTM Engineer at Lovable.dev 🚀 (www.reddit.com) +72 8w

Lovable ($400m ARR, 200k projects built per day) opened our first US hub in Boston, and we're looking for a highly skilled GTM Engineer to be the founding technical member of our enterprise GTM function there. You'll build scalable agents,…

agentic
Are there any agentic coding harnesses that AREN'T built on JS and Node? (www.reddit.com) +723 8w

With how often we hear about supply-chain attacks on npm I am hesitant to install any apps that use it, let alone something like an agent harness that will run constantly unsupervised.

agentic
Show HN: gcx – The Official Grafana Cloud CLI (github.com via hn) +7 9w

Hi HN, We’re excited to share gcx, a new CLI we’ve been building for Grafana Cloud. With the rise of agentic coding tools like Claude Code and Codex we're building faster than ever, but these agents are often blind to what’s actually happe…

codex agentic claude-code
AI governance isn't failing because we lack regulation i mean like it's failing at execution (www.reddit.com) +78 10w

There's a lot of movement around AI regulation right now (EU AI Act, US frameworks, etc.), but in practice many of these governance models don't survive contact with real, agentic systems. I've been digging into why compliance frameworks t…

agentic
Claude Code – Disabling telemetry also disables 1-hour prompt cache TTL (github.com via hn) +7 10w

Claude Code is an agentic coding tool that lives in your terminal, understand…

agentic anthropic claude-code
Show HN: Persona.js – a vanilla-JS agent UI library with native WebMCP (MIT) (www.persona-chat.dev via hn) +68 5d

Hey everyone. My cofounder and I are formally open sourcing (MIT) persona.js.

agentic
GLM-5.2: Chop off 84% of the volume from a 1.5TB model, still retain 82% power (twitter.com via hn) +61 7d

Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits,…

↯ Glm ↯ GLM 5.2 glm agentic
Launch HN: TesterArmy (YC P26) – Agents that test web and mobile apps (tester.army via hn) +6 7d

Hey HN - we’re Oskar, Szymon, and Piotr, and we’re building TesterArmy (https://tester.army). TesterArmy is an agentic testing platform that runs end-to-end checks before deployment and in production.

agentic
Lessons Learnt from Writing an AI Agent (www.browserless.io via hn) +6 8d

TL;DR - Don't host the agent, use it as a service - Tokens matter, don't rely only on vision - Stick to mature technologies - A stubborn LLM is the one! At Browserless, we've spent the last few months building our own agentic browsing expe…

agentic
We built an agent that runs our AI data platform (encord.com via hn) +6 9d

Introducing Merlin: The Agentic Intelligence Layer for Encord. Software is changing faster than ever. Interfaces are becoming more conversational and intent-driven, and iteration cycles are rapidly getting shorter.

agentic
Show HN: Ito – Code reviews that run code (www.ito.ai via hn) +62 9d

I'm Evan and I made Ito.ai It's code review that actually runs your code. The result is that it finds more bugs with a smaller false positive rate.

agentic
Training SID-1 to beat GPT-5 at search with 1k+ QPS RL (turbopuffer.com via hn) +6 5w

SID-1 is an agentic search model that is 24x faster than GPT-5.1-high, 374x cheaper than Sonnet 4.5, and achieves 1.9x higher recall than traditional RAG pipelines. Here's how we trained it using large-scale RL on turbopuffer.

↯ Sonnet 4.5 gpt-5 rag sonnet+1
Are LangGraph agents and other agent frameworks becoming obsolete? (www.reddit.com) +69 5w

Hi all, Over the last 2 years, I’ve built around 10-15 LangGraph agents for very specific tasks in our company. But lately, it feels like all that work isn’t really maintainable for a single AI/agent engineer.

mcp agentic
Why GPU compilers are MORE important in the agentic era (scale-lang.com via hn) +6 5w

Part 2 of a series on why Spectral and SCALE exists. In Part 1, I argued that cross-vendor portability in accelerated computing must be delivered by a company, rather than a committee, because the implementation is the standard.

agentic
The architecture of "Agentic Twins": How Avatar.inc is using OpenClaw to build verifiable AI agents (www.reddit.com) +65 6w

There is a massive gap in the agent ecosystem right now: capability vs.

openclaw agentic
the saas vs. custom software debate in healthtech: why we built a custom agentic layer (www.reddit.com) +65 6w

been working with a tier-1 diagnostic imaging network that ran into a straightforward problem: scan volumes jumped 22%. the obvious answer is to license a saas tool.

agentic
How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM? (www.reddit.com) +67 6w

We'll be getting those features(check bottom link) on mainline soon or later anyway. But for now this fork could be useful to see the full potential of our poor GPUs(and also big, large GPUs).

↯ Qwen 3.6 gemma agentic
Spec-driven agentic coding is quietly making us worse at the job of supervising agents (www.reddit.com) +6 6w

Been running an agent-heavy workflow on a mid-size TypeScript monorepo for about six months. Orchestrator on top, sub-agents for codegen, a human (me, mostly) writing specs and reviewing diffs.

agentic
Show HN: Open-source 2D IDE for managing agent CLIs (49agents.com via hn) +61 6w

First Agentic IDE, Open-Source Every agent, terminal, and repo on one infinite canvas. See and control everything from any device.

agentic
We built an agentic runtime to make AI automations easier to set up and more reliable (www.reddit.com) +62 7w

Hey all, our small team just launched Friday Studio and we'd genuinely love any feedback you have. It's an AI runtime that turns prompts, skills, and tools into repeatable configurations that you can reliably run and share.

agentic
Lasso Security 2024: ~20% of LLM-suggested packages don't exist — and attackers now register the popular hallucinations with malware (slopsquatting) (www.reddit.com) +65 7w

Lasso Security ran a study in 2024 — they measured frontier models suggesting fake package names about a fifth of the time. The follow-up problem: attackers have started registering the most-commonly-hallucinated names with malicious code…

↯ Security security agentic
Show HN: Hollow is an open-sourced self-modifying agentic system (github.com via hn) +6 7w

This repo is three agents running on qwen3.5:9b on your machine, picking their own goals, writing and deploying their own tools, forming opinions abou…

↯ Qwen 3.5 agentic
Terminal Bench score for Mistral 3.5 Medium (www.reddit.com) +610 8w

So... there were a couple promising benchmark scores reported by mistralai in the model card for Mistral 3.5 Medium, BUT there wasn't the one that I usually care about the most, which is TerminalBench 2.0.

↯ Mistral ↯ Mistral 3.5 mistral agentic
engineering teams celebrating agentic workflows that returned the same result two runs in a row (www.reddit.com) +61 8w

edit for credit: trash on X

agentic
GitHub Copilot is moving to usage-based billing and retiring annual plans (news.ycombinator.com) +61 8w

Hi there, You're receiving this because you have an annual Copilot Pro or Pro+ plan. GitHub Copilot isn't the same product it was a year ago.

↯ Copilot copilot agentic
Speculative decoding with Gemma-4-31B + Gemma-4-E2B enables 120 - 200 tok/s output speed for specific tasks (www.reddit.com) +611 8w

So for my project I was using up until now either Gemini 3 / 2.5 Flash or Flash-lite. All my use cases are not agentic, simply LLM workflows for atomic tasks like extracting references from the law, classifying, adjusting titles to nominat…

↯ Gemini 2.5 gemma gemini agentic
How do you actually know if Opus 4.7 is better for your specific agent use case? (www.reddit.com) +64 9w

Anthropic shipped Opus 4.7 yesterday. The headline numbers are real: 64.3% on SWE-bench Pro (up from 53.4%), best-in-class on MCP-Atlas at 77.3% for multi-tool orchestration, 14% improvement on multi-step agentic reasoning, and one-third f…

↯ Swe Bench ↯ Opus 4.7 swe-bench opus mcp+2
What do you use for autocomplete in 2026? (VS Code) (www.reddit.com) +612 11w

I tried co pilot and windsurf but they weren't satisfying. Co pilot being not smart and windsurf too slow (I tried with free tiers).

↯ Windsurf windsurf cursor agentic
Agents Make Engineering Hard Again (ninjapenguin.co.uk via hn) +52 3d

I think we’re nearing the end of the “prompt demo” phase of AI. The door is now firmly ajar on the engineering phase, and the exciting news for engineers is that these shiny new agentic systems are…

agentic
JetBrains Air: Agentic Development Environment (air.dev via hn) +5 3d

JetBrains Air is the Agentic Development Environment where Codex, Claude Agent, Gemini CLI, and Junie execute independent task loops without interfering with each other.

gemini codex agentic
Ask HN: What agentic directory structure do you use? (news.ycombinator.com) +5 10d

The more I use Claude Code to generate large swaths of systems, the more I feel like we are missing a lot of practices and tools. The first bit that really annoyed me was the lack of tracking prompts.

agentic claude-code
The Agentic Development Lifecycle (www.voodootikigod.com via hn) +51 11d

Here is a thing that happens every week now, especially as more enterprise organizations open their minds to Vibe Coding and start expanding past the initial prompting phase(s). They make one agent play the product manager.

agentic
Two LLM UI Patterns That Aren't Chat (poyo.co via hn) +5 3w

Chat is still the default LLM interface, and for most cases that's fine. Agentic harnesses are still built around a single linear conversation at their core.

agentic
Show HN: Strudai, browser based agentic wrapper around Strudel (strudai.com via hn) +5 3w

Hi all! Together with a friend (and Claude Code) we built this project for fun.

agentic anthropic claude-code
ai governance for agentic workflows in regulated environments. what actually works in production? (www.reddit.com) +510 4w

mapping out the production architecture for an ai agent system in a heavily regulated environment (compliance-heavy, structured reporting requirements). the agent operates in a high-stakes workflow, so every automated suggestion or flag ne…

agentic
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser (www.reddit.com) +51 4w

Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.

↯ Security ↯ DeepSeek 4 prompt-injection deepseek security+2
Claude Code plugins a risk to local ecosystem? (www.reddit.com) +52 5w

There's an increasingly popular way to ship complex extensions for agentic work, that is specific to Claude Code, which is Code plugins. For example here's deep-wiki by Microsoft, a plugin to create a wiki from analyzing your project's rep…

agentic anthropic claude-code
how to architect ai agents for regulatory approval? (www.reddit.com) +57 5w

spent a lot of time on agent architecture for mission critical environments. getting an agent to browse the web or draft an email is trivial compared to deploying one where a hallucination carries real legal or physical consequences.

↯ Hallucination hallucination agentic
Show HN: Strava for AI coding – analytics on your Copilot/Claude/Codex usage (github.com via hn) +5 5w

AI Engineer Coach better agentic engineering. Analyze your AI coding assistant usage — any harness, one dashboard.

↯ Copilot copilot codex agentic
Ask HN: Which memory systems are you using in your agents? (news.ycombinator.com) +5 7w

Are you using an open source version, hosted product or maybe you have rolled your own? What is working, what is missing, and how are you evaluating the usefulness of memory for your agentic projects?

agentic
Every week this we see some version of "how do I evaluate my LLM app?" and the answer almost always stops at RAGAS or DeepEval. Here is the part of the evaluation stack most tutorials skip in 2026. (www.reddit.com) +51 7w

The same question lands on this sub a few times a week, and the standard answers (RAGAS, DeepEval) are correct but stop one layer short of what you actually need once your app leaves a notebook. Wanted to lay out the full picture for anyon…

rag agentic
What do you use Gemma 4 for? (www.reddit.com) +514 7w

Both Gemma 4 and Qwen 3.6 seem to be the hottest local models right now. Looking at the benchmarks and reviews, it seems like it's better in every way: coding, benchmarks, agentic tasks.

↯ Qwen 3.6 gemma qwen agentic
Hollow: An Agentic OS with self-modifying kernels and distributed multi-agent transactions. (www.reddit.com) +57 7w

I’ve been building an infrastructure layer for agents that treats the LLM like a process, not a chatbot. It’s called Hollow AgentOS.

agentic
Donating Agent Payments Protocol to the Fido Alliance (blog.google via hn) +5 8w

For agentic technology to scale, it needs to work for everyone. That’s why over the last few months, we’ve shared new open commerce and payments standards to serve as the building blocks for the future of AI shopping.

agentic
Show HN: VT Code – Rust TUI coding agent with multi-provider support (github.com via hn) +51 8w

Hi HN, I built VT Code, a semantic coding agent. It supports all SOTA and open-source models.

↯ Model Context Protocol model-context-protocol ollama gemini+4
Google Unveils Agent Skills Repository for Smarter AI Agents (cloud.google.com via hn) +51 8w

Level Up Your Agents: Announcing Google's Official Skills Repository. By Megan O'Keefe, Senior Staff Developer Advocate. As AI models improve, technical practitioners are increasingly turning to agentic AI tools to build with Google Cloud produc…

agentic
Harnesses Explained: The Inner and Outer Workings of the Coding Agent Harness (codagent.beehiiv.com via hn) +54 9w

When I started this newsletter, "harness engineering" was a term just starting to crop up. Now it's a household term in the community, and there's a lot of great material on it - most of it on building agentic systems with frameworks like…

agentic
Agentic memory with passive recall and citations as trust graph (github.com via hn) +51 9w

agentic
Agentic framework that _switches_ models based on role? (www.reddit.com) +512 9w

agentic
gemma4 vs qwen3.5 122A10 real usages (www.reddit.com) +54 9w

↯ Qwen 3.5 gemma qwen agentic
RTX PRO 5000 (48GB) vs MacBook Pro M5 MAX (128GB RAM) - The choice for fine-tuning & agentic coding (www.reddit.com) +527 9w

↯ Fine Tuning vllm fine-tuning llama+1
Codex v/s Cowork v/s Perplexity Computer v/s Kimi Agent Swarm (www.reddit.com) +5 9w

↯ Cowork cowork codex agentic
Agentic coding Qwen 3.6, Q6_K 125k context vs Q5_K_XL 200k context (www.reddit.com) +515 9w

What would you choose if you were in my shoes? How viable is 125k for agentic coding really?

↯ Qwen 3.6 qwen agentic
Opus 4.7 keeps bumping into a Malware Reminder (www.reddit.com) +54 10w

For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.

↯ Security ↯ Opus 4.7 operator security opus+1
How do you think I should charge? (www.reddit.com) +512 10w

I recently started getting a few leads, but I still do not feel like I fully understand how I should charge for what I do. What I do is basically a service as software model.

agentic
Show HN: Mercury – No-code orchestration for human and agent teams (www.mercury.build via hn) +54 10w

Hey HN, I'm Naveen, one of three co-founders building Mercury (mercury.build). We spent the last year deploying AI agents for teams in large enterprises.

show-hn agentic claude-code
$1,400/month with Cursor + Claude API — how are you managing costs while keeping a real agentic workflow? (www.reddit.com) +535 10w

Hey, This month I hit $1,200 in Claude API costs inside Cursor (Opus 4.6 + Sonnet 4.6) on top of the $200/mo Ultra plan. $1,400 total.

↯ Sonnet 4.6 cline sonnet cursor+3
New agentic coding SOTA models (twitter.com via hn) +41 22h

Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding.

agentic
N8n 2026 AI agent builder report (n8n.io via hn) +4 3d

A technical evaluation of workflow-based automation tooling for building enterprise-grade agentic systems using LLMs. This is the second iteration of the report, conducted by independent research analyst Andrew Green in Q2 2026 Workflow-ba…

agentic
Show HN: Ferrix AI – Agentic Product Management Platform (ferrix.ai via hn) +4 8d

Hi HN, for the past few months, we’ve been working on Ferrix AI (https://ferrix.ai/) As AI agents speed up engineering, deciding what to build has become the bottleneck. Developers got faster because agents fit into their workflow: tech de…

agentic
Agentic coding and persistent returns to expertise (www.anthropic.com via hn) +41 9d

Key findings - Building on prior work, we introduce a framework for studying interactive agentic coding based on a privacy-preserving analysis of ~400,000 Claude Code sessions from between October 2025 and April 2026. We evaluate the compo…

agentic claude-code
Show HN: Memento – Self-hosted agentic search and LLM wiki over your email (news.ycombinator.com) +41 9d

Our email inboxes carry multiple decades of messages (100K-500K). This is a good proxy for all the important things that happened in your life, the projects you have done and the people that you have connected with.

agentic
Ask HN: How are you adapting technical interviews in this agentic era? (news.ycombinator.com) +4 10d

agentic
Agentic Code Must Be Human Auditable (dockyard.com via hn) +4 2w

I have been AI-pilled for over a year at this point. It's pathetic, I rarely touch-code any more.

agentic
Show HN: The first agentic coding engine that hot-reloads the full stack (serverpod.dev via hn) +4 2w

I’ve been working on the next version of Serverpod, an open-source backend written in Dart for the Flutter community. We are getting close to a final release.

agentic
Computex 2026: Are We Heading for the Agentic PC Era Yet? – EE Times (www.eetimes.com via hn) +4 2w

agentic
X402 Batch Settlement: High-Velocity Agentic Commerce (www.x402.org via hn) +4 3w

Introducing x402 Batch Settlement: High-velocity Agentic Commerce. May 11, 2026. By: Cam Whiteside (Cloudflare), Carson Roscoe (Coinbase), Conner Swenberg (Coinbase), Josh Nickerson (Coinbase), Philippe d'Argent (Coinbase). TL;DR: The x402 pr…

agentic
Show HN: Clor – give your agent claws (clor.com via hn) +41 3w

At my last job I spent a year building an agentic coding platform used by hundreds of thousands of people. Along the way I tried building a hosting service on OpenClaw, and also ran Hermes myself for a while.

openclaw agentic
MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities (twitter.com via hn) +4 3w

MiniMax (official) @MiniMax_AI Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench H…

↯ Minimax ↯ Swe Bench swe-bench minimax agentic
Minimax M3 on Open Router (openrouter.ai via hn) +4 3w

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.

↯ Tool Use ↯ Minimax tool-use minimax agentic
Spitting Out the Agentic Kool-Aid (openpath.quest via hn) +4 3w

One Sunday evening last June, three friends met in Vienna to relive the glory days: coding all night. This time, Claude joined them.

agentic
Visa invests in Replit to power agentic payments for developers (techcrunch.com via hn) +4 4w

Visa has announced an undisclosed investment in AI coding platform Replit. The two companies are also exploring how to integrate Visa’s payment products into Replit, so that developers — and the AI agents they build — can accept payments d…

agentic
Show HN: VAEN – Package and import portable AI coding-agent Harnesses (github.com via hn) +41 4w

Hi HN, I built VAEN (an open source CLI) because I kept running into a boring problem with AI coding-agent workflows: the setup becomes useful, but then it is hard to move. A good, useful agentic harness consists of more than just instruct…

mcp agentic
Q4_K_M is fine for chat and a trap for agents. Here is math mathing. (www.reddit.com) +424 4w

saw the Q4_K_M vs Q6 thread earlier and the comments are talking past each other. "few errors per hour" vs "errors every couple days" sounds like a 24x difference.

agentic
Show HN: The platform layer for agentic ML engineering (github.com via hn) +4 4w

LUML: One platform for the entire AI lifecycle. LUML is a platform for managing the complete machine learning lifecycle, from initial experiments to production deployment. It provides experiment tra…

agentic
Versatility of Exasol with Agentic Engineering (www.exasol.com via hn) +4 4w

Exasol is an analytical database. It’s built for joins, aggregations, window functions, and the kind of queries that chew through billions of rows before your coffee gets cold.

agentic
Claude is the best AI humanizer when you give it your writing style and a detector loop (www.reddit.com) +4 5w

I built this because I kept seeing a very boring workflow play out at home. My girlfriend would write with Claude, paste the draft into Slop or Not (an app that I built), see what still looked AI-ish, tweak the prompt, paste the next draft…

↯ Opus 4.7 opus mcp agentic
favorite Agentic Coding Harness (www.reddit.com) +421 5w

So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash.

qwen gemini codex+2
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation (articraft3d.github.io via hn) +4 5w

Articraft is an agentic system for scalable articulated 3D asset generation. A coding agent writes programs against an LLM-friendly SDK to produce simulation-ready articulated 3D assets from text descriptions.

agentic
Qwen3.6:27b single-shot fixed a CSS UI bug that had Gemma4:26B doom looping uselessly for 15 minutes (www.reddit.com) +421 6w

Warning: long post ahead. On the bright side, it's 100 percent human-written, typos and all.

↯ Qwen 3.6 moe agentic
Best practice for accurate translation at minimal cost? (www.reddit.com) +45 6w

I've been meaning to translate forum post type content for one of my partner's sites. Objective to open up the audience base.

deepseek agentic
Microsoft researchers find AI models and agents can't handle long-running tasks (www.theregister.com via hn) +4 6w

agentic
500k context on 48gb VRAM!! - 21tok/s (coding) (www.reddit.com) +41 6w

I found this model hiding in the corner of huggingface: https://huggingface.co/Max-and-Omnis/Nemotron-3-Super-64B-A12B-Math-REAP-GGUF Looks to be tuned specifically for math but i thought i'd give it a try since i cant run the full 12b nem…

agentic
Sandboxing AIOps and Agentic AI Security (blog.cosmonic.com via hn) +4 6w

When people talk about AI sandboxes today, they usually mean: - seccomp, seatbelt, or bubblewrap - containers built from namespace mappings, cgroups, and allowlists - hand-tuned profiles bolted onto the existing OS - some assemblage of the…

agentic
Nowadays, what are the best AI tools for a single dev working on personal projects? (www.reddit.com) +44 7w

I have 2 years of experience doing data engineering and AI engineering, but I also have a background in software engineering and machine learning from college due to my thesis. I've always wanted to apply my computer science knowledge to my sid…

agentic
Opus 4.6 does better research, Gemini 3.1 has better judgment (www.reddit.com) +42 7w

Figured this out by running 4 models: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Grok 4.20, on a benchmark of 1,417 binary forecasting questions resolving Oct–Dec 2025 with two evaluation conditions: agentic (each model does its own web…

↯ Gemini 3.1 grok gpt-5 gemini+2
What industries already use agentic AI in production? (www.reddit.com) +46 7w

Curious which industries have actually moved beyond pilots and are using agentic AI in real production workflows. Are these systems driving measurable outcomes or still mostly augmenting existing processes?

agentic
DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper (www.reddit.com) +4 7w

Tested DeepSeek V4 Pro on FoodTruck Bench — our 30-day agentic benchmark where models run a food truck via 34 tools (locations, pricing, inventory, staff, weather, events) with persistent memory and daily reflection. First Chinese model to…

↯ DeepSeek 4 grok gpt-5 deepseek+2
I'm looking for an AI Automation Engineer role or gig (news.ycombinator.com) +4 7w

Hi all, I'm an AI automation engineer who builds systems that replace manual work, scale outreach, and turn workflows into revenue. I have sent out working systems for managing leads to CRM, finding real estate deals, sorting emails with A…

rag agentic
The Block Model Behind Warp's Agentic Development Environment (www.warp.dev via hn) +4 8w

Warp has come a long way since it initially set out to modernize the terminal. In the screenshot above, an agent is working through a plan alongside a developer's own shell commands — running its own commands, reasoning, proposing a diff —…

agentic
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com) +48 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling ↯ Opus 4.7 function-calling fine-tuning rag+4
Agents vs Workflows (www.reddit.com) +411 8w

What’s a task that actually needs an agentic loop? I have shipped a handful of tools for myself including a morning brief, a research summarizer, and a couple extraction pipelines.

agentic
I built Claude Code skills for writing agent prompts, grounded in prompt research (github.com via reddit) +44 8w

I've been building agentic systems for a while and wanted a more systematic approach to writing prompts. So I gathered papers, did some deep research and created guides on structure, format and prompting techniques.

agentic claude-code
What's Missing in the 'Agentic' Story (www.mnot.net via hn) +41 9w

Friday, 24 April 2026. For much of the history of computing, it was reasonably safe to assume that a machine was doing what you told it to do (and what its creators promised it would do), because its op…

agentic
Free hands-on lab: build a ReAct agent 3 ways (create_agent, raw LangGraph with tool-call budget, NVIDIA NAT YAML) (www.reddit.com) +43 9w

dpo vllm agentic
As Agentic AI explodes, Amazon doubles down on MCP (thenewstack.io via hn) +4 9w

At the recent MCP Summit in New York City, The New Stack sat down with Clare Liguori, Senior Principal Software Engineer at AWS and core maintainer of the open-source Model Context Protoco…

mcp agentic
Enough with perplexity and KLD! BenchLocal benchmarks real use cases and is easy to use for everyone (www.reddit.com) +4 9w

Hello everyone, I have followed stevibe on X for a while after he released Tool Call 15, an easy to use benchmark to test the tool calling performance of various models. All you needed to do was to point the benchmark to an OpenAI compatib…

agentic openai
GPU strategy for local LLM + mixed workloads (70-person company) — NVIDIA vs AMD? (www.reddit.com) +43 10w

Hey all, we’re a mid-sized company (~70 people) and currently planning to bring a lot of our workloads on-prem instead of relying on cloud APIs. The goal for the moment is to run small to mid-sized models in the range of 30B like Qwen3.6 o…

↯ Qwen 3.6 rag agentic
Stopping the Meta AI director's OpenClaw failure with an out-of-band killswitch (highflame.com via hn) +43 10w

On February 23, 2026, the AI safety community witnessed a definitive case study in agentic failure. Summer Yue, the Director of AI Alignment at Meta’s Superintelligence Lab, watched in horror as her OpenClaw agent began a "speedrun" deleti…

openclaw agentic
Systems Engineering: The Key to Building Agentic Software That Works (www.ashpreetbedi.com via hn) +4 10w

In the early 1940s, Bell Labs was building the national telephone network, the most complex technical system in the world at the time. Millions of switches, cables, relays…

agentic
Tested 6 browser use agents for real-world tasks — here's an honest breakdown + looking for recommendations (www.reddit.com) +45 10w

I've been on a hunt for a browser agent that can reliably handle daily agentic tasks: filling job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…

↯ Qwen 2.5 ollama chatgpt mcp+1
Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn) +42 10w

Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.

↯ Security prompt-injection security agentic
Applied AI Implementation Engineer Freelance (news.ycombinator.com) +3 17h

Open to Work I build production AI systems that add intelligence to processes. My work includes Closed-Loop AI-native systems, RAG, AI agents, agentic evaluations, guardrails, and enterprise integrations using Python, TypeScript, React, No…

rag gemini agentic+1
Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding (deep-reinforce.com via hn) +3 20h

Aloha! 🌺 Today, we are introducing Ornith-1.0, a self-improving family of open-source models specially for agentic coding tasks.

agentic
Ask HN: What do you do to save tokens? (news.ycombinator.com) +3 1d

Lots of products are working in the token-saving space. Compression, tool output rewriting, sitting as a proxy cache between harness and provider, doing circus with interceptor hooks: these are some of the approaches we are seeing today.

agentic
Show HN: Docket Fleet – mobile device cloud (fleet.docketqa.com via hn) +3 1d

Hello Hacker News. Boris here from Docket (YC P25).

agentic
Agents.md Decision Guide – open-source tool for choosing agentic workflows (www.groundwork.md via hn) +3 2d

groundwork Your AI coding agent is only as good as the context you give it. AGENTS.md is how you do that — it's the file that tells your agent your stack, your rules, your patterns, and how your team works.

agentic
Active Group and "Agentic Engineering" (funktionale-programmierung.de via hn) +31 2d

These past few months, we had many conversations on the future of software development - as did everyone in our industry. The topic is, of course, „Agentic Engineering“ (AE), the use of LLM-based agents to directly translate requirements i…

agentic
Finding a Feedback Loop: shipping my first prod agentic feature at Pair Team (pairteamtech.substack.com via hn) +3 2d

Notes from shipping my first production agentic feature at Pair Team. Sitting down at my desk, hands on keyboard, an old engineering mantra from my days deep in the SF Bitcoin community reverberated through my mind:…

agentic
Ask HN: In the age of agentic coding why no one talks about orchestration tools (news.ycombinator.com) +32 3d

agentic
Show HN: Saar Agentic Orchestration Platform (github.com via hn) +31 4d

Hi everyone, I have been building Saar Nexus for a month or so now, vibe coding the project, and I would like you all to try it out and give me your feedback so I can improve it further. Saar Nexus is a multi persona a…

agentic
One Prompt Agentic AI Marketing for Game Developers (www.youtube.com via hn) +3 5d

agentic
Ask HN: Do you use Claude Code, Codex, or something else? (news.ycombinator.com) +31 5d

Do you use Claude Code, Codex, or a different vibe coding/agentic engineering tool for most of your work? Why?

codex agentic claude-code
GLM-5.2 Beat Fable 5 at Website Design (twitter.com via hn) +3 6d

https://t.co/JSn0lDCNkB Design Arena (@Designarena): GLM 5.2 ranks 1st overall on Design Arena’s single-turn, HTML Web Design (Non-Agentic) evaluation, 5 places higher than its predecessor GLM-5.…

↯ Glm ↯ GLM 5.2 glm agentic
Ask HN: Do you find vibe coding / agentic engineering to be fulfilling? (news.ycombinator.com) +34 7d

I'm having trouble reaching that golden "builder" zone when I use things like Claude Code. It's cool to be able to conjure software from scratch using these tools, but the output...

agentic claude-code
A PostgreSQL Database for Every Agent: In-Database RAG, Graph, and Multitenancy (www.yugabyte.com via hn) +3 7d

Discover newly released YugabyteDB 2026.1 and YugabyteDB AMP (Agentic Multitenant PostgreSQL): a true serverless, scale-to-zero PostgreSQL where every agent gets its own real, isolated database starting at a fraction of the cost of a core.…

rag agentic
Agentic AI Comes to Medicine (erictopol.substack.com via hn) +3 7d

Expansion of Capabilities With Two New Medical AI Models. It was just a matter of time. Agentic autonomous AI has already been applied to life science and many other domains, and today there were 2 notable publi…

agentic
Genoma Labs' open 14B agentic coding model trained on Kraken (huggingface.co via hn) +3 8d

KALYPSO v1.1L is GENOMA Labs' public agentic-coding model: Qwen2.5-Coder-14B-Instruct fine-tuned on the Kraken-Public corpus (CC-BY-4.0, decontaminated). It is the open counterpart of the internal KALYPSO line.

agentic
Xiaomi's agentic AI coding harness MiMo Code beats Claude Code at 200 step tasks (venturebeat.com via hn) +3 10d

Xiaomi's MiMo AI team has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant that the Chinese electronics giant says outperforms Anthropic's Claude Code on key agentic coding benchmarks, especially on long-horizon, multi-…

agentic anthropic claude-code
Token-saviour – routing skill for AI agent tool selection (~70% fewer tokens) (github.com via hn) +3 10d

Skills Personal collection of agent skills for day-to-day use. Installation Copy any skill directory into your agentic platform skills folder: cp -r ~/.agents/skills/ Then use it naturally in conversation — each skill's description tells t…

agentic
Ask HN: Did you try Claude's "Fable 5" model before it was pulled? (news.ycombinator.com) +31 11d

I did. And it got me thinking.

↯ Copilot copilot agentic
The evolution of agentic surfaces: building with Claude Managed Agents (claude.com via hn) +3 11d

As model intelligence and agentic harnesses evolve, Claude Managed Agents allows teams to build and deploy agents in production environments reliably at scale. Here’s w…

agentic
Reinventing Control Theory One Feature at a Time: The Fallacy of Agentic Loops (medium.com via hn) +3 12d

agentic
Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI? (news.ycombinator.com) +3 2w

Claude Code-like agentic workflow AI is too costly for me. Is there any LLM I can run with VS Code on the setup below? 16GB RAM, Intel Core i7 H processor (13th gen), 512GB NVMe SSD. I want to run the AI as a local agentic workflow with VS Code. I want to use LLAMA agent…

↯ Qwen 3.5 llama agentic claude-code
We Used Agentic AI to Fix Kong Gateway's Flakiest Tests (konghq.com via hn) +3 2w

The first thing we needed was a way to identify which tests were flaky and how often they failed. Luckily, the team had already built a dashboard on top of Datadog's CI Visibility feature that gives us a clear picture of the flakiest tests…

agentic
Show HN: Magenta Real-Time Music Generation on iPhone, Without the GPU (github.com via hn) +3 2w

Last Thursday, DeepMind released Magenta Realtime 2, an open source music generation model. They said it could run on Mac, but not iPhone.

↯ Deepmind deepmind agentic
Claude Fable 5 missed a bug that Sonnet 4.6 caught (alikhallad.com via hn) +3 2w

When Anthropic released Claude Fable 5 this week, my feed filled up with the same benchmark charts within hours. SWE-bench scores, agentic coding numbers, the Stripe migration story.

↯ Sonnet 4.6 ↯ Swe Bench swe-bench sonnet agentic+1
A Mechanical Agentic Taxonomy (djschnei21.github.io via hn) +3 2w

agentic
Manifesto for Agentic Teams – reorganizing engineering around AI agents (agentic-team-manifesto.org via hn) +3 2w

Outcomes over output More code is not more value. We measure what ships to users, not what ships to the merge queue.

agentic
Show HN: Loom, an open-source delivery harness for coding agents (github.com via hn) +3 2w

Dynamic workflows for agentic software delivery. An open delivery harness that turns Claude Code, Codex, OpenCode and other coding agents into repeatable software delivery systems.

codex agentic claude-code
Claude Fable 5: the first public Mythos-class model (artificialanalysis.ai via hn) +3 2w

June 9, 2026. Anthropic has released Claude Fable 5, the first publicly available Mythos-class model that ranks #1 in our agentic real-world knowledge work benchmark GDPval-AA. Claude Fable…

↯ Anthropic Mythos mythos agentic anthropic
CLAW.md – open format for agentic cron jobs (clor.com via hn) +31 2w

agentic
Show HN: RiddleRun – AI run end-to-end browser tests (github.com via hn) +3 2w

agentic
Nex N2 Pro: Frontier agentic performance at 400B (huggingface.co via hn) +3 2w

An agentic model with Agentic Thinking. Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios.

agentic
When Can Amazon Block an Agentic AI Service?–Amazon vs. Perplexity (blog.ericgoldman.org via hn) +3 2w

By guest blogger Kieran McCarthy. On March 9, 2026, Judge Chesney granted a preliminary injunction in the case of Amazon v. Perplexity, concluding Amazon was likely to succeed on its CFAA and California Penal Code section 502 theories.

agentic
AI Agents Now Generate More Web Traffic Than Humans (www.cnet.com via hn) +3 2w

The internet just crossed a remarkable threshold. Agentic AI internet traffic now exceeds that of real humans for the first time.

agentic
Beyondflow No-Code Multi-Agent Teams with Unlimited Runs. BYOK and Ollama (beyondflow.app via hn) +3 3w

Researcher GPT-5 Engineer Claude Critic GPT-5 Innovator Gemini Manager Context Guardian Agentic Workflow Architecture · v1.0 The future of AI Collec An R&D platform where different AI agents collaborate under the supervision of a Context…

ollama gpt-5 gemini+1
Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud (boxes.dev via hn) +3 3w

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. We’re two engineers who previously built Gem (co-founder/CTO…

codex agentic claude-code
Beyond the Semantic Layer: Building a Context Layer for the Agentic Era (www.kaelio.com via hn) +3 3w

A context layer puts your warehouse schema, joins, metric definitions, and business knowledge in one reviewable place so data agents query governed context instead of guessing field names. A look at how it works, and at ktx, the open-sourc…

agentic
Konversio: Open-source agentic customer support for digital sovereignty (www.konversio.org via hn) +3 3w

Agentic customer service. 100% open source. Meet Pilot, Konversio's AI support agent. Konversio gives teams an open AI support layer they can own and self-host.

agentic
This Month in Agentic Coding – May 2026 (www.agenticcodingweekly.com via hn) +3 3w

Welcome to the first edition of ACW Monthly Brief. It's one email to catch you up on all the meaningful developments in agentic coding from the past month.

agentic
Show HN: OpenSOP, We got tired of agents lying to us, so we built them a harness (opensop.ai via hn) +31 3w

OpenSOP is an early open-source runtime/standard for executable agentic processes. You (or your agent) define a process in YAML, and OpenSOP exposes it as a typed REST API that agents and humans can both use.

agentic
Show HN: Self tuning chat exposing its semantic and agentic cache (chat.betterdb.com via hn) +3 3w

RESP-compatible DBs and BetterDB Valkey · Redis · Dragonfly · BetterDB docs · semantic and kv/agentic cache demo Ask about Valkey, Redis, Dragonfly, or BetterDB Backed by live documentation with semantic caching and tool result caching. Wa…

agentic
Agentic Mfw (agenticmotherfucking.website via hn) +32 3w

And nobody gives a single fuck how it's built anymore. The previous motherfuckers spent a decade teaching you the holy commandments of clean code.

agentic
Show HN: MetaBrain – A local document memory for AI agents (metabrain.eu via hn) +3 3w

Hello there HN I experimented with agentic coding recently and I felt the need to track more contextual data by project. Also I felt the need to be able to go beyond the 1D chat to communicate with agents.

agentic
Kelsey Hightower on Practical and Responsible Use Cases for Agentic AI [video] (www.youtube.com via hn) +3 3w

agentic
Pioneering the Agentic Shift Within Salesforce Engineering (www.salesforce.com via hn) +3 3w

Key Takeaways - Autonomous tools are now writing code, reviewing pull requests (“PRs”), and driving deployments across the software development lifecycle. - Standardizing on Claude Code and removing token limits improved output and quality…

agentic claude-code
Amazon Strikes $6B Deal with Snowflake for Agentic Computing Chips (www.wsj.com via hn) +3 4w

agentic
Deep research led astray by AI Slop, iterating with source filtering helped (www.reddit.com) +32 4w

TL;DR: don't trust deep research out of the box by default; it needs prompts / skills / iteration to filter AI slop from sources [The purpose of this post is to report an example of the default deep research going astray and how I worked around…

agentic
Does Cursor have a /goal mode? (www.reddit.com) +31 4w

In Codex GUI (e.g. on Mac; I'm not sure about other platforms as I don't use them), in the Claude Code GUI, and even in the open source Codex CLI (see https://github.com/openai/codex/tree/main/codex-rs/ext/goal/src ) -- there is a feature…

codex cursor agentic+3
I built an Agentic AI Filmmaking Studio for people who have stories to tell but lack the budget and technical skills. (Giving away 10 free credits for the next 48 hours) (www.reddit.com) +32 4w

Hey everyone, I just launched MotionX Studio (Link in comments). The premise is simple: Filmmaking is completely gatekept by money and highly technical skills.

agentic
Agyn: open-source distributed agent runtime on Kubernetes — like Google's AX, with pre-built Claude Code and Codex agents, and full credential isolation from the LLM (www.reddit.com) +36 4w

Agyn is an open-source, Kubernetes-native agent runtime that moves AI agents like Claude Code and Codex from laptops to company infrastructure with the controls you actually need to run them in production. If you've been reading about Goog…

gemini codex mcp+2
What would 2x RTX 3060 12GB get me? (www.reddit.com) +33 4w

TLDR: I’m considering buying 2 RTX 3060 12GB as opposed to single 24GB card to gain experience and need to know what can be realistically accomplished with this setup. Sorry in advance, I know you guys are probably tired of these kinds of…

agentic
Agentic Compilation: Reducing LLM Rerun Costs (arxiv.org via hn) +3 4w

LLM-driven web agents operating through continuous inference loops -- repeatedly querying a model to evaluate browser state and select actions -- exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as th…

agentic
Claude is thinking for 20+ minutes! (www.reddit.com) +36 4w

I gave Claude a genuinely hard problem today: a subtle bug somewhere in a video encoding ffmpeg pipeline, the kind where the output is slightly wrong and you can't tell which stage introduced it. I'd been stuck on it manually for a while,…

agentic
Cisco Foundry Security Spec: Open specification for agentic security evaluation (github.com via hn) +3 4w

An open specification for agentic AI security evaluation, from Cisco. Cisco's Advanced Security Initiatives Group has built and operated an agentic security evaluation internally across several iterations and deployme…

agentic
Agentic AI in Big Tech and Enterprise (www.reddit.com) +32 4w

Disclaimer - this post was rewritten with AI based on my brain dump. Yet, I find it inspirational and useful.

agentic
Agentic AI token usage balloons cost at Microsoft, Meta, Amazon (www.tomshardware.com via hn) +31 4w

AI cost crisis hits tech giants as employee 'tokenmaxxing' backfires, sparking corporate pullback at Microsoft, Meta, and Amazon: agentic AI eats up to 1000x more tokens than standard AI. AI is getting too expensive. Many tech companies ar…

agentic
Apex-Testing: real-world, real repos, agentic coding benchmark (Update) (www.reddit.com) +35 4w

BIG Apex-Testing update! https://www.apex-testing.org/ The Real-World Agentic Coding benchmark has been (95%) updated with all recent models!

agentic
Optimizing speed & quality on Qwen3.6 27b (www.reddit.com) +34 4w

Does the inference speed below seem optimal for the hardware, or could there be further room for improvement ? I’ve been trying to use Qwen3.6 27b for agentic harnesses like Pi/Hermes.

↯ Qwen 3.6 agentic
Cohere Open-Sources Command A+, a 218B Moe Model That Runs on Two H100s (firethering.com via hn) +3 4w

Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems.

moe agentic
Agentic-Agile: Why Agent Development Needs Agile (Not Just Prompts) (developer.microsoft.com via hn) +3 4w

“A bad system will beat a good person [or agent] every time” ~Dr. William Edwards Deming (with apologies). I started vibe coding by writing prompts (often dictated into my phone), refining them with an agent in M365 Copilot, and creating ha…

↯ Copilot copilot agentic
Launch HN: Superset (YC P26) – IDE for the agents era (github.com via hn) +3 4w

Hey HN, we’re Avi, Kiet, and Satya. We’re building Superset (https://github.com/superset-sh/superset), an open-source agentic IDE for running coding agents like Claude Code, Codex, OpenCode etc in parallel.

codex agentic claude-code
Built my own agent runtime after hitting the ceiling with LangGraph — UI as graph nodes, Postgres durability, zero orchestration cost (www.reddit.com) +36 5w

I've been building agentic applications for around 2 years now. Started with loops, then moved onto langgraph + Assistant UI.

agentic
Building an Ai Agentic team with Claude (www.reddit.com) +36 5w

I've built an app using Claude/Claude Code, everything from the frontend to the backend. The app is actually functioning really well, tests are passing, and I have a small controlled group of testers that are actively using the app daily.

agentic claude-code
Show HN: Clark-Browser – Stealth Chromium (github.com via hn) +3 5w

Fully open-sourced, perfect for agentic browsing, works with Vercel's agent-browser and playwright.

agentic
"Is it true that you can keep coding 24/7 with AI!?" How are you conducting real-world tests in Agentic engineering? (www.reddit.com) +34 5w

I think many people are moving beyond "vibe coding" and building development harnesses using Agentic engineering. It’s true, I don’t write code myself anymore.

agentic
Why might MTP be net negative for tool heavy agentic flows? (www.reddit.com) +31 5w

The Qwen3.6-27B MTP benchmarks that have been circulating put factual tasks at 62-70% acceptance vs code at 79-89%. Tool calls probably sit in that factual range or lower, structured output, constrained format, less predictable than pure c…

↯ Qwen 3.6 agentic
Why agentic payments keep breaking. The IMF just put a name to it (www.reddit.com) +37 5w

The IMF published a formal note on agentic payments last month. One framing stuck with me more than the rest: "Payment systems must reconcile two fundamentally different design logics: the adaptive, probabilistic nature of agentic AI syste…

agentic
Pro X20 weekly quota is draining insanely fast after the latest Codex update. Pro X20 used ~48% in one day!!! (www.reddit.com) +31 5w

I’m on the Pro X20 plan, and after the latest Codex update / limit reset my weekly quota started draining much faster than before. In roughly one day of work, around 12 hours total, I went from a fresh reset to 52% remaining on the weekly…

codex agentic
[Vex] - I built an open-source terminal AI video editor that edits real footage with FFmpeg, Whisper, and agent tool calls (www.reddit.com) +34 5w

Most AI video tools feel backwards. They start with the model.

agentic
Recommendations for an agentic harness (not OpenClaw)? (www.reddit.com) +38 6w

I'd like to set up a local "software factory" on my laptop (M5 Max, 128GB). To do this, I'd like my agent to poll for new GitHub issues and work on them.

openclaw agentic
Anthropic built the agentic features. Now they're billing them separately. (www.reddit.com) +312 6w

Starting June 15, Claude subscribers get a separate monthly credit for Agent SDK and claude -p usage: $200/mo for Max 20x, $100 for Max 5x, $20 for Pro. Once you burn through it, programmatic usage stops unless you've opted into extra usag…

agentic anthropic claude-code
Claude Code already does afk agentic work without touching the new programmatic limits (www.reddit.com) +31 6w

Use the official channels plugin, and the teams agent in Claude code. CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 /plugin marketplace add anthropics/claude-plugins-official /plugin install discord@claude-plugins-official /reload-plugins Discord…

agentic anthropic claude-code
A²RD: Agentic Autoregressive Diffusion for Long Video Consistency (dxlong2000.github.io via hn) +3 6w

Synthesizing consistent and coherent long video remains a fundamental challenge. Existing methods suffer from semantic drift and narrative collapse over long horizons.

agentic
Max20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter? (www.reddit.com) +34 6w

I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft.

↯ DeepSeek 4 deepseek opus agentic+1
Show HN: Statewright – Visual state machines that make AI agents reliable (github.com via hn) +3 6w

Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.

↯ Swe Bench swe-bench agentic
What I've learned designing agentic workflows for docs (passo.uno via hn) +3 6w

Back in 2024 I wrote that AI helps me remove boring work at the margins. This is fine for a lone writer, but how to scale this to an entire team of technical writers?

agentic
Show HN: SLayer, a semantic layer maintained by your agent (github.com via hn) +3 6w

Hello HN! If you want to connect your agent to a database (say, to build a data analyst chatbot or any kind of agentic app) today you have 2 options: an SQL MCP server or a semantic layer.

mcp agentic
We need a safe alternative to Telegram for agents like OpenClaw or Hermes (news.ycombinator.com) +31 6w

The problem with Telegram is that it is not E2EE, so every message you send will end up *unencrypted* on their servers. Think about it - how often did you post the Gmail authentication URL or another API token in the Telegram chat?

openclaw agentic
Looking for seed funding (www.reddit.com) +31 6w

Looking for seed funding for an agentic solution that helps companies grow their business via hyper-personalised curated content distributed to multiple channels and decrease CAC. This tool is for companies who are focused on their niche eg:…

agentic
Agentic Hooks - Stream Deck plugin (www.reddit.com) +31 6w

I had an itch to address long-running tasks with Claude, where I wanted to see when it's done working. I also wanted a separate context flow for these alerts instead of using an existing flow (phone, Discord, Telegram, etc.). This is when the idea was born, sh…

agentic
DS4 (www.reddit.com) +31 6w

The developer that created Redis, Salvatore Sanfilippo, has released a new project on GitHub named DS4. https://github.com/antirez/ds4/ The TL;DR on this one is getting DeepSeek V4 Flash running with a 1M context window on Mac Metal hardw…

↯ DeepSeek 4 deepseek agentic openai+1
MCP for sandboxed, reproducible envs for agentic-first coding workflows (github.com via hn) +3 6w

devcontainer-mcp Give your AI agent its own dev environment — not yours. devcontainer-mcp is an MCP server that lets AI coding agents create, manage, and work inside dev containers across three backends: local Docker, DevPod, and GitHub Co…

mcp agentic
Building Agentic GraphRAG Systems: From knowledge graphs and ontologies to a unified memory as an MCP server for your AI agent. (www.reddit.com) +32 6w

I gave this talk twice in one month: at O’Reilly’s Context Engineering Event and at Abi Aryan’s Maven course on LLM inference at scale. After being blasted with questions, I realized something: GraphRAG isn’t a retrieval algorithm, it’s a…

rag mcp agentic
How are you protecting your AI agents' memory from poisoning attacks? (www.reddit.com) +34 7w

As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…

↯ Security prompt-injection rag security+1
Why do people care more about token/s in decoding? (www.reddit.com) +316 7w

What I've noticed while using local LLM recently is that in most cases, bottlenecks occur not in decoding but in prompt processing. If the prompt processing speed is usable, in most settings (since it takes about 15k when starting based on…

↯ Qwen 3.6 agentic
Agent Exchange – A2A discovery with real-time bidding for AI agents (github.com via hn) +3 7w

Agent Exchange (AEX): The NASDAQ for AI Agents. A programmatic marketplace applying ad-tech economics for agentic AI services. What Problem AEX Solves? As AI agents proliferate, enterprises face a critical challenge: the N×M integration probl…

agentic
Need advice: Qwen3.6 27B MTP or 35B-A3B MoE MTP on 16GB VRAM (RTX 5080)? (www.reddit.com) +35 7w

Hey folks, looking for advice before I delete or keep a huge model file. I’m testing local coding/agentic workflows on an RTX 5080 16GB + 96GB RAM.

↯ Qwen 3.6 moe llama agentic
Agentic Malware Analysis: String Decryption, API Hashing and Unpacking [video] (www.youtube.com via hn) +3 7w

↯ Security security agentic
Anthropic quietly nerfed Claude Code's 1-hour cache (www.xda-developers.com via hn) +3 7w

Claude Code has become the default agentic coding tool for a lot of developers, and for good reason. It understands a codebase, calls tools, edits files, and can plan multi-step tasks with very little handholding.

agentic anthropic claude-code
Process-Level Reward Modeling for Agentic Data Analysis (arxiv.org via hn) +3 7w

Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remai…

agentic
Is anyone else exhausted by "glorified prompt chains" being marketed as Agents? (www.reddit.com) +33 7w

It feels like every new SaaS wrapper right now claims to be "agentic." But when you actually look under the hood, 90% of them are just hardcoded prompt chains with a couple of basic API tools thrown in. I’ve been spending a lot of time rec…

agentic claude-code
Show HN: Zerminal – a terminal-first Zed fork for AI coding agents (zerminal.dev via hn) +3 7w

A terminal-first development environment for agentic coding. Use Claude Code, Codex, Aider, and other CLI agents in a focused workspace.

aider codex agentic+1
I cut Codex’s API Usage by 50% using a self modifying system (www.reddit.com) +32 7w

I've been developing a self-modifying AI agent system that effectively cut my Codex/Claude Code API usage in half. Codex makes a plan and then I basically just copy/paste Codex instructions for the agents to work on. Come back in 6 hours a…

↯ Qwen 3.5 qwen codex agentic+1
Best suited model for solo Dev (www.reddit.com) +34 7w

Hey everyone! I'm kinda new to Claude. I've only had a few chats with it, but nothing too deep like projects etc.

↯ Copilot copilot agentic
Signals - finding the most informative agent traces without LLM judges (arxiv.org) (www.reddit.com) +32 8w

Hello Peeps Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company). Wanted to introduce our latest research on agentic systems called Signals.

agentic
love it - Qwen3.6-27B — UD-Q5_K_XL evaluation (www.reddit.com) +31 8w

by Kyle Hessling A hands-on benchmark of the Unsloth dynamic Q5 quantization, self-hosted on a single RTX 5090. 19 runs, 93.9 k generation tokens, across agentic reasoning, production-grade front-end design, and canvas / WebGL creative cod…

↯ Qwen 3.6 agentic
We scanned 100 Smithery MCP servers, 22 flagged, here's what we found (news.ycombinator.com) +32 8w

We built Bawbel (https://bawbel.io), an open-source scanner for agentic AI components. Released v1.0.1 this week.

mcp agentic
I audited LangChain’s core library and found 10+ Prompt Injection vulnerabilities. Here is the technical breakdown. (www.reddit.com) +35 8w

Hey everyone, I’ve been working on a project to solve a major problem in AI security: Traditional SAST tools (Snyk, SonarQube, etc.) are blind to "Agentic Logic" bugs. They look for bad strings, but they don't understand how user data can…

↯ Security prompt-injection security agentic
Ask HN: Anyone using AI agents for active learning sprints? Here's my setup (news.ycombinator.com) +31 8w

Hi HN, I'm a big fan of AI's ability to provide personalized tutoring. So, lately, I have been using my Antigravity IDE (you can use any agentic harness) for personal learning.

rag mcp agentic
Andrej Karpathy: From Vibe Coding to Agentic Engineering [video] (www.youtube.com via hn) +3 8w

agentic
Spam bots are ruining it for everyone (www.reddit.com) +32 8w

Sorry for this rant, but I feel like venting to someone. Recently I set up an agent on a cloud VPS.

agentic
14-day growth agents contest on a serious AI stack (for loop-minded builders) (www.reddit.com) +32 8w

Sharing an AI-native growth agents contest that feels very on-brand for this sub. VideoDB (infra for video/audio for AI agents) is running a 14-day sprint/contest called Growth Forge for 5 builders to design and ship a growth agent on top…

agentic
What agentic framework are you actually using in production? (www.reddit.com) +32 8w

Feels like a new agent framework drops every other week. Curious what people are actually shipping with vs just experimenting on weekends.

agentic openai anthropic
The Race Is on to Keep AI Agents from Running Wild with Your Credit Cards (www.wired.com via hn) +3 8w

Between malware, online impersonation, and account takeovers, there are enough digital security problems out there as it is. And with the rise of agentic AI, more activity is being carried out by agents on behalf of humans—creating differe…

↯ Security security agentic
Show HN: SlopIt – A dead-simple CMS for your AI agent (slopit.io via hn) +3 8w

Hey HN. I built a dead-simple CMS for your AI agents — https://slopit.io Kept it minimal and agentic-first.

↯ Cowork cowork openclaw codex+1
DeepSeek V4 Pro: Validating Frontier Models for Production (fireworks.ai via hn) +3 8w

Why we chose correctness over a Day-0 launch DeepSeek V4 Pro is one of the most important open-model releases this year, with real advances in long-context reasoning, agentic performance, and inference efficiency. On paper, it looks like a…

↯ DeepSeek 4 deepseek agentic
Ask HN: Will fixed applications become a thing of the past with agentic AI? (news.ycombinator.com) +3 8w

Right now it's mostly technical people using these agentic tools, but if you extrapolate a few years into the future it seems likely to me that everyday users of a computer will be using them as a whole new interface to interact with their…

agentic
Agentic Ai Revolution humming along… (www.reddit.com) +3 8w

While people argue about AI ethics on the surface, there’s a whole underground building agents that never sleep. Different timelines for sure. Which timeline are you on?

agentic
Ace Technical Preview: GitHub Next's Agentic Workspace – Maggie Appleton [video] (www.youtube.com via hn) +3 8w

agentic
Mastermind – agentic SDLC workflow for VS Code (news.ycombinator.com) +3 8w

Prototype of an agentic SDLC workflow running inside VS Code + Copilot. Simple loop: task → reasoning → audit → memory → RAG refresh.

↯ Copilot copilot rag agentic
Show HN: HyperFrames – OSS Agentic HTML Video Framework for Agents (miguel07code.dev via hn) +32 8w

At HeyGen we built an open source framework made specifically for agents, to solve a pain point we hit when agents tried to write Remotion. React is not agent-friendly at all, and Remotion is a custom framework where the agent…

agentic
how far we have come.. (www.reddit.com) +35 8w

From Meta launching the Llama models to OSS models and agentic and coding models, we have come fucking far in no mean. I guess this is the fastest evolution out of all the different things we have seen; this, I guess, is the era similar to diff innovation…

agentic
Pact: Trustworthy Coordination for Multi-Agentic Ecosystems (www.basis.ai via hn) +3 8w

By Kiran Gopinathan, Jack Feser, Michelangelo Naim, Eli Bingham, Zenna Tavares, April 23, 2026. Autonomous agents are beginning to act on our behalf. LLM agents already negot…

agentic
Meta Partners with AWS on Graviton Chips to Power Agentic AI (about.fb.com via hn) +3 8w

Today, we’re announcing an agreement with Amazon Web Services (AWS) to bring tens of millions of AWS Graviton cores into Meta’s compute portfolio, making us one of the largest Graviton customers in the world. Processing cores are units ins…

agentic
Future-proofing an enterprise agentic platform architecture (medium.com via hn) +3 8w

agentic
Complete beginner to Agentic coding, is Qwen3.6-27B + pi.dev the right starting point or should I be looking elsewhere? (www.reddit.com) +324 8w

Hello fellow members of this lovely community, Let me start by saying that I’m about as far from a professional developer as it gets. I’m a hobbyist whose entire coding experience consists of building various Python/VBA tools and simple Ja…

↯ Qwen 3.6 chatgpt agentic
Anyone else noticing how Gemini-3-Flash is becoming the 'hidden' beast for automated promotions, its so productive? (www.reddit.com) +32 9w

I've been testing a few different models for desktop-driven outreach and promotion workflows. While everyone is eyeing the massive LLMs, Flash-Preview is hitting that sweet spot of speed and reliability for multi-step agentic tasks and its…

↯ Gemini 3 gemini agentic
DeepSeek V4 is out. the best open-source on coding. here's the breakdown (news.ycombinator.com) +31 9w

Two models: Flash (284B total, 13B active) and Pro (1.6T total, 49B active). Both hit 1M token context.

↯ Sonnet 4.5 deepseek sonnet gemini+2
I got tired of Claude writing Godot 3 code in my Godot 4 projects, so I built a skills framework and I would love your feedback (www.reddit.com) +33 9w

Hey, if you've ever used Claude Code (or Cursor, Copilot, etc.) for Godot game dev, you've probably hit this: the agent confidently writes Godot 3 syntax in a Godot 4 project, or uses deprecated patterns, or just invents APIs that don't ex…

↯ Copilot copilot gemini cursor+2
Need help for a calling based agentic ai project (www.reddit.com) +310 9w

llama agentic
Agentic framework that self-improves its stock portfolio strategy (GitHub) (github.com via hn) +3 9w

agentic
Aren't these single-file LLM coding tests like browserOS pretty much redundant now that most 2026 LLMs can easily handle this? (www.reddit.com) +33 9w

agentic
How to Build Advanced Generative AI Agents (Kinda) (www.generative.inc via hn) +3 9w

The tools, frameworks, and protocols we use to build AI agents, agentic workflows, and intelligent applications. ModelsShmodles Before we get into the stack, the single most important thing we believe about building agents: We do not care…

agentic
Opus 4.7 dominates agentic benchmark, 15% more expensive than Opus 4.6 (app.uniclaw.ai via hn) +31 10w

See how top AI models stack up: real tasks, real agents, real results on OpenClaw.

↯ Opus 4.7 openclaw opus agentic
GitHub Copilot is serving Opus 4.7 at 7.5x multiplier until April 30th (github.blog via hn) +32 10w

Claude Opus 4.7 is generally available. Claude Opus 4.7, Anthropic’s latest Opus model, is now rolling out on GitHub Copilot. In our early testing, Opus 4.7 delivers stronger multi-step task performance and more reliable agentic execution,…

↯ Copilot ↯ Opus 4.7 copilot opus agentic+1
2026 Agentic Coding Trends Report [pdf] (resources.anthropic.com via hn) +3 10w

2026 Agentic Coding Trends Report (PDF, 18 pages). Published Wed, 21 Jan 2026 22:37:47 GMT. URL: https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf

agentic anthropic
Beyond Prompts: A Tiered Trust Model for Autonomous Agents (Experiment Report) (www.reddit.com) +32 10w

We often talk about agent autonomy, but rarely about the "Harness Engineering" required to make that autonomy safe. I’ve been running a design experiment comparing agentic workflows on open platforms (OpenCode) vs.

agentic claude-code
Why model drift is the real failure mode for agentic systems (www.reddit.com) +31 10w

Across Twitter and Reddit, I keep seeing the same complaint: Claude feels worse. Not on a benchmark.

↯ Tool Use tool-use agentic
Anybody has practical experiences using Chinese models? (www.reddit.com) +35 10w

So like with coding or any craft, I think there's a proper tool for the job. Sure, you can use a stone to hammer in a fence post, but a sledge is usually more economical.

haiku sonnet opus+1
Huge throughput gains when switching agent evals to shared environments with per-run isolation (www.reddit.com) +32 10w

Thanks all for the comments on my previous post about local-first agentic evaluation collapsing in long stateful agent runs. Just sharing an update on where I’m at now in case it helps, as I had another issue to overcome. Took on board the…

agentic
Zuver – Build your enterprise Agents with just 10MB RAM (news.ycombinator.com) +3 10w

I built Zuver, a generic agentic AI framework for scalable, reliable, even on-edge AI applications and agents. It's written entirely in Go, which lowers RAM usage to around 6MB, compared to other agent frameworks, which are usually arou…

rag agentic
Agentic coding at enterprise scale demands spec-driven development (venturebeat.com via hn) +3 10w

Agentic coding at enterprise scale demands spec-driven development. Deepak Singh, A…

agentic
Agentic AI pentesting with Strix: results from 18 LLM models (theartificialq.github.io via hn) +3 10w

Over the last couple of months, I spent close to a hundred hours testing an autonomous AI pentesting tool called Strix with 18 different LLM models. My goal was to evaluate which LLM model performed best with the tool in this lab setup and…

agentic
Show HN: The opensource, reliable, scalable Agentic AI framework under 10MB (zuver.cc via hn) +3 10w

Multi-Agent Orchestration Deploy and coordinate multiple specialized AI agents through visual flow-based pipelines. Zuver's routing engine handles inter-agent messaging, task delegation, and stateful coordination natively.

agentic
Cephalopod Coordination Protocol, Useful for Teams Using AI Agents (github.com via hn) +31 10w

Cephalopod Coordination Protocol: a Rust-based client-server coordination protocol for agentic systems. When you have multiple agents working together they need som…

agentic
Show HN: OQP – A verification protocol for AI agents (news.ycombinator.com) +3 10w

As AI agents autonomously write and deploy code, there's no standard for verifying that what they shipped actually satisfies business requirements. OQP is an attempt to define that standard.

mcp agentic
Mi – agentic harness in 30 lines of JavaScript (github.com via hn) +31 10w

https://github.com/user-attachments/assets/9289d105-5a40-442d-b1b5-773723c95c13 agentic coding in 30 loc. a loop, four tools, and an llm.

agentic
From Isolated Agents to Agentic Mesh: Orchestrating SDLC with A2A and AP2 (blog.owulveryck.info via hn) +2 3h

Giving every developer a powerful, local AI agent feels like the ultimate productivity hack. But for organizations running at scale, it is a gov…

agentic
Show HN: A Kanban for vibe coding in Claude Code and Codex (www.fredrin.dev via hn) +21 20h

Fredrin: discover a better way to build software. We simplified your agentic development environment. Harness. Re-imagined.

codex cursor agentic+1
X401: HTTP-Native Identity Exchange for the Agentic Web (www.proof.com via hn) +2 23h

Introducing x401: Bringing Proof of Identity to the Web In 1997, the HTTP spec defined status code 402 Payment Required. It was reserved "for future use," a placeholder for a payment layer.

agentic
Agentic Design Patterns (blog.danwald.me via hn) +21 1d

Agentic Design Patterns A Hands-On Guide Summary I am developer/code-reviewer/debugger/bug-fixer/architect/teacher/builder from dubai, uae I've been working through Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems…

agentic
The state of agentic analytics, from 50 real data teams (blog.getcassis.com via hn) +2 1d

Field notes from 50+ conversations with data teams: the five stages of agentic analytics, what breaks at each, and what teams want next.

agentic
Technical Setup Guide for Shopify Agentic Storefronts (Geo) (stackarchitect.xyz via hn) +2 1d

Shopify Agentic Storefronts 2026: Sell on ChatGPT & Google AI Mode (Setup Guide). Winter '26 Edition · Early Access. AI-driven pur…

chatgpt agentic
Show HN: Drip — pay-per-use finance newsletters for AI agents (dripstack.xyz via hn) +2 1d

Hi HN, I’m building Drip. It’s an API that lets AI agents access premium finance newsletters on a pay-per-use basis.

agentic
I built a flat-rate DeepSeek API for Claude Code (with vision support) (cloudcode.one via hn) +2 2d

CLOUDCODE.ONE: Your Agentic Coding Partner. Coding Plan $5/month. Double the usage of a Claude Pro subscription, 1M context window. Code smarter.

deepseek agentic claude-code
Char: Agentic Notepad (char.com via hn) +2 2d

Start your day with clarity. Char turns overnight asks, rolled-over tasks, and today's meetings into a morning brief before you open your tabs.

agentic
Generate per-session LoRA adapters in <1s for agentic inference efficiency (github.com via hn) +21 2d

Tessera Hypernetwork (version 1.3.9): generate per-session LoRA adapters for inference tasks using hypernetwork synthesis. Features: Metadata-to-LoRA (generate adapters from structured user metadata in JSON); Text-to-LoRA (generate adapters from…

agentic
AI Code Stitcher - Agentic AI Avoidance. (news.ycombinator.com) +2 3d

Hey guys, just letting you know that the latest version of the code stitcher is available, and has many new features including a major overhaul of the stitch viewer / file version history including its own linter and editor facilities. ht…

agentic
Free Agentic AI Webinar: From Agent Design to Production (simplai.ai via hn) +2 4d

If you’ve been wondering what it actually looks like to build an AI agent and ship it to production — this is the session you don’t want to miss. Wednesday 24 June – 9:30 – 10:30 GMT+5:30 SimplAI is hosting a live Zoom webinar: “SimplAI Pl…

agentic
Show HN: Memory Magico – CLI based memory, wiki and deterministic sprints (github.com via hn) +2 4d

hey, I got tired of using GitHub issues and Mira to manage sprints and issue tracking so this is a tool I have been using for a while in a few of my projects to ensure that long sprints, and long bug finding sessions got triaged and proper…

agentic
A cheaper and safer agentic AI workflow (danuker.go.ro via hn) +2 4d

I recently tried agentic coding for real. It cost $0.034 and finished in 3 minutes.

agentic
Show HN: Atizar-AI agents where the server runs approved actions, not the model (atizar.io via hn) +2 4d

Open-source TypeScript framework for agentic automations: the agent drafts and proposes, a human approves, the server runs the approved action. Code for the engineer, a clean board for the client.

agentic
Hyperia 0.12.7 is released: an agentic terminal for agents and humans (github.com via hn) +2 5d

Hyperia™ A terminal emulator built for agents and humans. Hyperia is an agent-native terminal emulator.

agentic
Google Has Added Agentic Browsing to PageSpeed Insights (pagespeed.web.dev via hn) +21 6d

agentic
Agentic Loops: Why the Best AI Coding Workflows Are Loops, Not Prompts (skilldb.dev via hn) +2 6d

Most people still use AI to code the way they'd use a very fast intern with no memory: write a p…

agentic
Agentic Resource Discovery Specification (agenticresourcediscovery.org via hn) +2 6d

ARD is an open discovery protocol for agentic resources. It allows an AI client to ask: "What is available for this task?" and lets a discovery service answer with matching resources.

agentic
My agentic engineering workflow as someone who doesn't write code (shreyasprakash.com via hn) +21 6d

My agentic engineering workflow has changed in the recent past, the models are better, there is much more freedom in choosing the harness, selecting the abilities and actions you could provide. - Pre-planning, pre-idea, pre-everything - Th…

agentic
Simplicity always wins:SOTA on swe-pro,tb2,-verif on 21 models with simple-agent (github.com via hn) +21 6d

Strands Benchmark Harnesses A repository for Strands-based agents and harnesses for agentic benchmarks. It is a uv workspace: the repository root coordinates one or more member packages.

agentic
Beast – governed output gateway for AI coding agents (github.com via hn) +2 7d

BEAST - Broker for Efficient Agentic Systems and Tooling Governed output gateway for agentic coding tools. BEAST sits between your AI coding agent (Cursor, Claude Code, VS Code Copilot) and any LLM provider.

↯ Copilot copilot cursor agentic+1
OpenMontage the first open-source, agentic video production system (github.com via hn) +2 7d

OpenMontage: the first open-source, agentic video production system. Turn your AI coding assistant into a full video produ…

agentic
Show HN: A local first agentic canvas workspace (www.slashspace.ai via hn) +2 7d

A local folder system based AI canvas app. AI first obsidian + Notion + Miro.

agentic
Ask HN: How to deal with UI within the agentic loop (news.ycombinator.com) +21 8d

I am planning to add a module to my SaaS that lets users chat with an LLM agent about the data, features, and modules they have in their workspace. What approach should I use to render UI within the chat?

agentic
Cohere's open agentic North Mini Code – accelerated with NVFP4 on spark-arena (forums.developer.nvidia.com via hn) +2 8d

Hey all, I just put up two Spark Arena runs of North Mini Code 1.0 — an FP8 reference and an NVFP4 quant we made — to see what the GB10’s native FP4 support buys us. It’s Cohere’s first open agentic coding model: a 30B MoE (3B active), Ap…

moe agentic
Show HN: bb, an agentic IDE that can control itself (github.com via hn) +2 8d

bb is an agentic IDE that can control itself. You can seamlessly orchestrate all of your favorite coding agents together and have them programmatically use bb too.

agentic
Improving token efficiency in GitHub Copilot (code.visualstudio.com via hn) +21 8d

June 17, 2026, by Ryan Caldwell and Bhavya U. With the recent move to usage-based billing for GitHub Copilot, every token in an agentic session matters. They affect your credits, latency, and the…

↯ Copilot copilot agentic
Qwen and Fable: An open-weights agentic coding model. 35B Mixture-of-Experts (huggingface.co via hn) +2 9d

Qwable-v1 Qwen + Fable · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

↯ Qwen 3.6 qwen opus agentic
How agentic AI is rewiring Amazon's teams and upending its traditions (www.geekwire.com via hn) +21 9d

Swami Sivasubramanian, AWS VP of agentic AI, on stage at AWS re:Invent in December. (Amazon Photo / Noah Berger) Editor’s Note: Agents of Transformation is an independent GeekWire series, underwritten by Accenture, exploring the adoption a…

agentic
Show HN: Loomcycle – a sidecar runtime for AI agents (Go binary, Apache-2.0) (github.com via hn) +2 9d

The agentic runtime, in a sidecar. One Go binary alongside your application.

agentic
Agentic Grocery Shopping on Uber Eats (www.uber.com via hn) +21 9d

Grocery shopping often begins outside a commerce app: a handwritten list on the fridge, a screenshot of a recipe, or a vague plan like “healthy breakfasts for the week.” Translating that raw intent into a useful grocery cart i…

agentic
What does software development look like when agents write 100% of the code? (blog.bastion.computer via hn) +22 9d

2026 has been an inflection point for agentic coding. In just two years the capabilities of models and harnesses went from a toy autocomplete to being powerf...

agentic
Agentic AI Foundation (aaif.io via hn) +2 10d

6.12.26 - 🪙 Coinbase Makes Agentic Trading Real With MCP: Coinbase debuted an MCP-based tool that lets AI agents trade, pay for premium research, and buy on-demand compute using the x402 payment protocol. Initially for crypto, the firm pla…

mcp agentic
Show HN: Phlox – Open-source self-hosted agentic web chat (github.com via hn) +21 10d

Phlox A feature-rich, ChatGPT-style, self-hostable AI assistant. Phlox is a self-hostable chat application with an agentic harness, document RAG, code execution, and MCP integration — running over any model provider: AWS Bedrock or any Ope…

rag chatgpt mcp+1
Fugee, an agentic AI assistant for displaced people and asylum seekers [video] (www.youtube.com via hn) +2 11d

agentic
Show HN: Wtdb – give every Git worktree its own database (github.com via hn) +2 11d

I run a lot of agentic coding sessions in parallel, each in its own git worktree. Every worktree points at the same local Postgres though, so the moment one branch runs a migration it changes the schema out from under the others.

agentic
Ask HN: Is anyone building real software with AI agents? (news.ycombinator.com) +28 11d

I've noticed a pattern where the people who are talking about how impressive their agentic workflows are, always seem to use these workflows to build more AI tooling. Has anyone seen a project built by an agentic workflow that could stand…

agentic
I accidentally hit SOTA on agentic memory by using AI companions (graph.coder.company via hn) +23 12d

graphCTX is a local-first context and memory layer that keeps AI coding agents grounded in repo knowledge, without accounts, API keys, or sending code to a hosted service.

agentic
SchemaFlow: Agentic Database Change Impact Analysis, SQL Gen and Eval Guardrails (developers.openai.com via hn) +2 12d

This cookbook walks through an end-to-end AI-assisted database change workflow using the OpenAI Agents SDK. It demonstrates how OpenAI’s tooling ecosystem can be applied to orchestrate complex, data-intensive workflows across modern enterp…

agentic openai
Cross-System Constraint Collisions: The Governance Gap in Enterprise Agentic AI [pdf] (himalaian.com via hn) +2 12d

agentic
Rethinking Monorepos in the Age of Agents (chamoda.com via hn) +21 12d

Rethinking monorepos in the age of agents Now that most of the software development industry is switching to agentic coding, it makes less and less sense to keep separate repositories when agents might benefit from having better context in…

agentic
A leader's guide to advanced team structures in an agentic world (www.youtube.com via hn) +2 12d

agentic
Run local agentic AI on the Mac using MLX (WWDC 2026) [video] (developer.apple.com via hn) +2 13d

Run AI agents locally with privacy, low latency, and offline access. Dive into how MLX advancements and Mac hardware make powerful agentic workflows possible entirely on-device.

agentic
Type Checking in Agentic Workflows – Conner Nilsen – PyCon US 2026 Typing Summit (pyrefly.org via hn) +2 2w

Talk: Type Checking in Agentic Workflows Does adding type checking to an agentic workflow really help agents? We ran an experiment recently to determine whether there are improvements in the success rate for completing different kinds of t…

agentic
AWS Tunes Up Graviton5 for Agentic AI, Boosts Bang for the Buck Bigtime (www.nextplatform.com via hn) +2 2w

AWS Tunes Up Graviton5 For Agentic AI, Boosts Bang For The Buck Bigtime Back in December, the Annapurna Labs chip division of Amazon Web Services showed off a preview of its Graviton5 Arm server CPU, and we got some hints about what this c…

agentic
Show HN: Mobile analytics made for agentic development (undercurrentanalytics.dev via hn) +21 2w

Undercurrent Analytics is an all-in-one mobile analytics platform that uses open standards to give you the best possible understanding of how your mobile app is used by real people. Free until…

agentic
BPF in the Agentic Era (lwn.net via hn) +2 2w

agentic
How AWS DevOps Agent evaluates telemetry tools for agentic readiness (bronto.io via hn) +2 2w

Consolidate telemetry, search terabytes in milliseconds, retain 12 months hot, and let AI troubleshoot. The intelligent observability data platform.

agentic
Apple's Passwords App Becomes Agentic (www.heise.de via hn) +2 2w

Apple's Passwords App Becomes Agentic Apple rarely uses the word "agentic." This is changing with the Passwords app, which will soon be able to navigate the web on its own. Whether at OpenAI, Anthropic, or Google, when it comes to artifici…

agentic openai anthropic
Global watchdog calls for tighter controls on agentic AI in finance (www.reuters.com via hn) +2 2w

paywalled

agentic
Ask HN: The next evolutionary step in LLM usage? (news.ycombinator.com) +2 2w

I'll keep this post short and sweet, we have seen several steps in the evolution of LLM (large language model) usage. 1.

rag mcp agentic
How to Build an Agentic RAG with RubyLLM and Rails (www.panasiti.me via hn) +2 2w

How to Build an Agentic RAG with RubyLLM and Rails I run a RAG application for Italian pension and tax consultants. Users ask questions about INPS, professional pension funds, laws and regulations, and the app answers using a knowledge bas…

rag agentic
Macro Evals for Agentic Systems (developers.openai.com via hn) +2 2w

When an agentic system fails, the problem is often larger than a single bad response. A handoff may happen too late, a specialist agent may miss the same signal across many runs, or a review process may trigger for the wrong class of cases.

agentic
Agentic search – retrieval, harness, or model? (softwaredoug.com via hn) +2 2w

Agentic search gets interesting when agents do not know how to find the right answer. Oh, the agent might think it knows.

agentic
Agentic RL: Token-In, Token-Out Done Right (qgallouedec-tito.hf.space via hn) +2 2w

agentic
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search (arxiv.org via hn) +2 2w

agentic
Apple Passwords Now Auto Fixes Weak and Compromised Passwords with Agentic AI (www.macrumors.com via hn) +21 2w

Apple Passwords Can Now Automatically Fix Weak and Compromised Passwords With Agentic AI Apple today announced that the Passwords app can now automatically update weak and compromised passwords using Apple Intelligence and Safari to take a…

agentic
Agentic AI solved coding and exposed every other problem in SE (venturebeat.com via hn) +21 2w

Agentic AI is now a core part of the engineering process, driving massive execution leverage and helping us generate more code than ever before. Yet, a difficult question I’ve increasingly heard from business leaders is: if we’re shipping…

agentic
Show HN: AgentCrew – a Markdown-first operating system for AI coding agents (github.com via hn) +2 2w

AgentCrew Turn your coding agent into a disciplined team. AgentCrew is a conversation-first, Markdown-first methodology for agentic coding.

agentic
Show HN: Version Control for AI Agents (cognatoai.com via hn) +22 2w

Git/GitHub did not evolve for agentic era, so we are building

agentic
Improving LM Studio's MLX Engine for Agentic Workflows (twitter.com via hn) +2 2w

We recently released mlx-engine v1.8.5 in LM Studio. This update dramatically improves performance for repeated, long-context agentic workflows by checkpointing your KV cache.

agentic
Agentic AI spurred a boom in mobile apps, but they aren't gaining traction (twitter.com via hn) +2 2w

Jen Zhu (@jenzhuscott): Massive output uptick due to agentic AI.

agentic
Ask HN: Will your company be doing "LeetCode" interviews a year from now? (news.ycombinator.com) +25 2w

I work in big tech. I'm a SWE manager, but I have half a mind to return to being an IC at some point.

agentic
Show HN: Omni – Local-first multimodal file search on macOS (hanxiao.io via hn) +2 2w

Finally made something I've always wanted, using the model we built. • SOTA omni embedding model, fully local, indexes text, PDF, image, audio, and video • Swift-native app UI + mlx-swift-transformer core.

openclaw agentic
Blumi CLI – A Private Agentic Runtime with Grid Dispatch (github.com via hn) +2 2w

blumi: a local-first, provider-agnostic agentic coding companion with one Rust core and three faces: a terminal UI, a web UI, and a phone app. blumi is a single Rust binary whose UI-agnostic core emits one typed event stream, so every surface sho…

agentic
agentgateway Joins AAIF as an Open Gateway for Agentic AI Infrastructure (aaif.io via hn) +2 3w

The Agentic AI Foundation welcomes agentgateway — an open source gateway purpose-built for MCP, Agent-to-Agent, LLM, and API traffic — as its newest hosted project.

mcp agentic
Training an Agentic Router for Optimal Cost-Performance on SWE Tasks (www.appliedcompute.com via hn) +2 3w

On most enterprise tasks, model quality is not a scalar. One model is better at long-horizon repository exploration.

agentic
Trader – LLM agent for Robinhood with a Rust safety layer and paper trading (github.com via hn) +21 3w

A Rust agent that connects an LLM to Robinhood's official agentic trading API, enforces hard risk limits in a typed safety layer, and paper-trades against live market data before you risk a dolla…

agentic
Copilot SDK is now generally available (github.blog via hn) +2 3w

The GitHub Copilot SDK is now generally available. You can embed GitHub Copilot’s agentic engine into your own applications, services, and developer tools with a stable API and production-ready suppor…

↯ Copilot copilot agentic
Dumb core, smart edge for AI agents (arizenai.com via hn) +2 3w

Many agentic systems I've watched fail in production had the same shape: intelligence concentrated at the center, where it was hard to test, replace, or reason about. The orchestrator was doing too muc…

agentic
From Specialists to Builders: How AI Agentic Coding Is Reshaping Software Teams (aliparnan.com via hn) +2 3w

Specialization defined software teams for decades. AI agentic coding is creating a new Builder role—people who orchestrate agents across disciplines and own outcomes end to end.

agentic
Why Merge Conflicts Became the New Agentic Bottleneck (adamtornhill.substack.com via hn) +2 3w

Revisiting some techniques from Your Code as a Crime Scene in the light of agentic coding. Specifically, how a socio-technical fit becomes even more important now that agents are our ac…

agentic
Vegvisir – Agentic Harness Built for Software Developers (github.com via hn) +21 3w

Vegvisir is a local-first agentic software development harness for people who want an AI engineering assistant that can actually work inside a repository without being handed every secret, every permission, and every…

agentic
Agent Code – open-source Mac app for managing AI coding agents (github.com via hn) +2 3w

A native macOS platform for agentic coding workflows, powered by Pi. Manage agents, skills, prompts, subagents, worktrees, and GitHub work in one signed Swift app that runs the installed pi CLI in the background.

agentic
AI in SRE: Where and how Google is deploying agentic AI to improve operations (cloud.google.com via hn) +2 3w

By Stevan Malesevic, Distinguished Software Engineer, and Christopher Heiser, Distinguished Site Reliability Engineer. Since its inception over 20 years ago, Google has use…

agentic
MiniMax M3 (xcancel.com via hn) +2 3w

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities - Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas - MiniMax…

↯ Minimax ↯ Swe Bench swe-bench minimax mcp+1
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks (arxiv.org via hn) +2 3w

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awarenes…

agentic
A standard for building production AI agents (+ installable Claude Code skills) (github.com via hn) +2 3w

The Agentic Product Standard: a canonical standard for building production-grade agentic products, plus a Claude Code skill set that operationalizes it. Distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangC…

agentic openai anthropic+1
Ask HN: What are your worst war stories bringing agentic applications into prod (news.ycombinator.com) +2 3w

For a bit of context, I’m currently creating a team of AI agents at work to generate reports by fanning out into a large amount of subagents to process a large amount of transcript data. When the analysis fails mid-way because of some indi…

agentic
Show HN: Jynx, a matchmaking app to find gaming teammates (jynx.app via hn) +2 3w

TL;DR: Jynx is a gaming social platform that matches you with compatible teammates based on skill level, play style and schedule. Swipe to find players (Tinder-style), create or join game sessions (LFG), chat, and build your squad.

agentic claude-code
Spatial IDEs for agentic coding workflows (news.ycombinator.com) +21 3w

Seeing spatial IDEs (where terminals and files are displayed on a canvas instead of a regular dock like VS Code) more often right now on HN and Reddit. This is a selection of the ones I've seen.

agentic
Three flavors of coding with AI agents (nocodefunctions.com via hn) +2 3w

A reasonable definition of an “AI agent”, at least in the context of agentic coding, could be: a software process endowed with the capabilities of an LLM launched with instructions given at the start to accomplish a task which runs…

agentic
Embodied Cognition and Agentic AI (lemire.me via hn) +2 3w

Where is your intelligence located? In your brain?

agentic
Arm Metis with GPT5.5 Cyber scores 98% on firmware vulnerability benchmark (newsroom.arm.com via hn) +2 3w

Agentic AI-powered Arm Metis advances security vulnerability discovery in software. In the era of AI, modern software systems are built across increasingly complex codebases, frameworks, runtimes and libraries. As these systems scale, so do…

↯ Security ↯ GPT 5.5 security agentic
Show HN: TheFoundry – Easy bootstrapping framework for MultiAgent Systems (github.com via hn) +2 3w

For months, I struggled to build complex, long-running projects using AI agents and I kept failing... One-shots, refactoring, high token consumption...

agentic
Robinhood's bet on agentic trading and purchasing is 'wake-up call' for banks (www.americanbanker.com via hn) +2 4w

The brokerage fintech launched agentic trading and an agentic credit card today that will allow AI agents to trade equities and make credit card purchases on customers' behalf. It comes just weeks after OpenAI rolled out its own personal f…

agentic openai
Show HN: Agentic Intent Benchmark (github.com via hn) +2 4w

intent-bench: an open-source benchmark measuring whether providing structured intent to coding agents improves implementation effectiveness. What this measures: existing agent benchmarks (SWE-bench, HumanEval, Aider Polyglot) test single-req…

↯ Swe Bench humaneval aider swe-bench+1
Why $/token is the wrong metric for Enterprise AI (agentic) applications (canyoncode.ai via hn) +2 4w

Canyon Code gives enterprises the ability to observe, optimize, and govern their multi-agentic AI applications. The Workflow Intelligence Layer.

agentic
The Self-Healing Vector Database (www.reddit.com) +24 4w

A pattern I keep seeing in agentic RAG systems: The agent is smarter than the retrieval layer. It can notice that context is stale.

vector-database rag agentic
Open-source playbook on agentic working — for the cross-audience, not just coders (28 chapters, MIT) (www.reddit.com) +25 4w

Author disclosure upfront: I wrote this. Free, MIT-licensed, no paid tier.

gemini codex cursor+3
Best harness for agentic analytics? Codex? Claude? Custom? (www.reddit.com) +27 4w

I run a small seo marketing agency and we've built some dashboards on top of our data for reporting with nextjs + supabase. This is where reporting for our clients happen.

codex agentic
Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have? (news.ycombinator.com) +2 4w

I've been talking to engineers, mostly in large teams. While they love cross-surface search with Glean, they still assimilate and curate the context for agents manually.

agentic
I built an agentic coding harness across three CLI hosts (pub.towardsai.net via hn) +2 4w

May 13, 2026. This article is a work in progress. I will keep updating it as the kit evolves.

agentic
Agentic AI Flywheels (www.newsletter.swirlai.com via hn) +2 4w

The production loop after your agent ships, and the eval set that grows with it. 👋 I am Aurimas.

agentic
Tool-schema compression enables agentic RAG under constrained context budgets (arxiv.org via hn) +21 4w

Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the first syst…

rag agentic
the agentic depth gap between open source AI assistants ranked (www.reddit.com) +21 4w

Agentic depth measures how far an autonomous agent can take a task before human intervention. The gap between open source options on this dimension is wider than feature comparisons suggest.

openclaw agentic
Looking for Suggestions — Single 5090 & 64gb DDR5 (www.reddit.com) +211 4w

Hi Reddit, I am planning on running Qwen 3.6 27b NVFP4 via vLLM on my 5090 but was wondering if something like 35b a3b at Q8 on Llama would produce better results for agentic coding and utilize the system memory. My research says no but if…

↯ Qwen 3.6 vllm qwen llama+1
Breaking Bot: Hacking and Defending LLM-Based Applications (www.szia.ai via hn) +2 4w

Marton Antal Szel, Dec 24, 2025. Let's say your "super-intelligent" agentic chatbot - the one with access to sensitive customer data - is hijacked…

agentic
Harbor v0.4.19 - vllm/sglang/llama.cpp launch codex/claude/pi/opencode (www.reddit.com) +21 4w

I'm usually not posting about Harbor releases out of respect for the community here, but I think v0.4.19 might save a lot of people some time. Harbor can now launch your local agentic coding tools with local inference backends.

↯ Qwen 3.5 vllm llama codex+1
I replaced my old job with an AI agent (www.reddit.com) +2 4w

Hello friends. Today I want to talk about agentic media buying.

chatgpt mcp agentic
I Built MagesticAI. A Cloud Web-Based Agentic DevOps Orchestrator that actually helped me develop Itself. (www.reddit.com) +22 4w

Posted on other feeds last week and figured some of you out here might be interested as well; Someone commented asking if it supported OpenAI-compatible endpoints (LM Studio, vLLM, OpenRouter, Together, Groq, LocalAI…), so i have spent few…

vllm ollama gemini+3
Testers and collaborators wanted (www.reddit.com) +22 4w

Hello, I'm working on an Agentic wrapper system, Helix-agi, and I am trying to get some additional testers and collaborators involved in the project. Helix relies on a unique Agentic workflow that routes all incoming data, including tool u…

↯ Tool Use tool-use agentic
Inside Google’s Agentic Search Revolution (puck.news via hn) +2 4w

agentic
Agentic AI Changes the CPU/GPU Equation (www.amd.com via hn) +2 4w

agentic
Validating an idea, would anyone be interested in e-commerce designed for agents? (www.reddit.com) +21 4w

Me and 2 other friends are trying to solve payments through agents. One of the ideas we're looking into is merchant integration to allow agentic payments using any of the plethora of protocols that exist (MPP/UCP/x402/AP2/Google's Universal C…

agentic
Rust is a great fit for the agentic era (kerkour.com via hn) +21 4w

agentic
I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. (www.reddit.com) +23 4w

The main problem I’m trying to solve is that agentic AI is still too random for serious engineering decisions. For design work, calculations, reports, code changes, or technical review, I don’t want agents just “vibing” through tasks.

↯ Tool Use tool-use agentic
Stripe's John Collison on How Agentic Commerce Will Reshape the Internet [video] (www.youtube.com via hn) +2 4w

agentic
A Marketplace of Fine Tuned SLMs for Agentic Tasks (marketplace.neurometric.ai via hn) +21 4w

130 models available. Small Models, Big Impact: discover task-specific SLMs ready for your business, or browse general models under 20B parameters. Need something custom?

agentic
Anybody know why Cursor is trying to move to a "Claude Desktop"-style app? (www.reddit.com) +27 4w

It makes absolutely no sense for Cursor to switch over to a Claude or Codex desktop-style app. I am a Neovim/VSCode user and I only recently started using Cursor, and found out that the UI/UX for agentic coding is phenomenal.

codex cursor agentic
Moss: Self-Evolution Through Source-Level Rewriting in Autonomous Agent Systems (arxiv.org via hn) +2 5w

Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all…

agentic
CodeAlta an efficient agentic AI coding CLI assistant coded in C#/.NET (codealta.github.io via hn) +2 5w

agentic
Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc.. (www.reddit.com) +2 5w

For me, opencode is doing fantastically, but I was wondering if qwen code would be more native and have better functionality, since idk which agentic harness they used to get their benchmark results

aider qwen agentic
What nobody's measuring about dense MoE in production tool calling agents (www.reddit.com) +22 5w

Most of the model selection conversations I've seen focus on benchmark scores and cost (no surprise there). The question I can't find good production data on is whether dense vs MoE actually affects reliability for tool heavy agentic flows,…

moe agentic
[Blogpost] Files Are All You Need: Towards Self-Improvement in ChatGPT (www.reddit.com) +23 5w

Subreddit rule statement: link to blog post in the comment. Not self-promotion.

chatgpt agentic
Agentic use-cases with Claude (www.reddit.com) +22 5w

One of the big challenges is customers coming in to ask for a risk assessment of their environment: what their current risk posture is, what is required, and what their risk appetite is, and then providing a solution within their budget.

agentic
Karpathy's LLM-Wiki for agentic software development? (www.reddit.com) +21 5w

I’ve been away from coding/software development for about a year. When I stepped away last summer, agentic software development wasn’t nearly as capable or accessible as it seems today.

agentic
A solution for schlep blindness in agentic development for Kubernetes envs (metalbear.com via hn) +2 5w

Turn your AI agents into autonomous developers. Use mirrord to instantly validate every change against your live staging environment: multiple agents, same cluster, no conflicts.

↯ Windsurf windsurf agentic
Show HN: P-Hacker – group/analyze HN trends by topic (not just keywords) (p-hacker.com via hn) +2 5w

I'd seen various HN trends tools over the years ([1] [2]), but they all used strict keyword (n-gram) matching. That limited a) how sophisticated any trend-surfacing could be and b) the depth with which you could explore the full discussion…

agentic
RTMX: Intent Layer for Agentic Engineering (github.com via hn) +21 5w

RTMX: Track what you built, what's tested, and what's next -- from the terminal. RTMX is a CLI that manages requirements traceability as a CSV file in git.

agentic
The Claude Code Production Playbook: Sub-Agents, Hooks, and MCP Integration (ddsboston.com via hn) +2 5w

Claude Code Masterclass 2026: the definitive end-to-end guide to Anthropic’s agentic coding tool, covering installation, Ollama local fallback, CLAUDE.md, Skills, Subagents, Agent Teams, Hooks, and MCP. Everything you need before building productio…

ollama mcp agentic+2
Not All Software Systems Are Agent Friendly (yassi.dev via hn) +22 5w

Discourse around AI tends to collapse into two camps: true believers and luddites. A recent piece, Agentic Coding is a Trap, highlights what the author calls the “paradox of supervision” - where the very judgment needed to oversee AI deleg…

agentic
What's the best qwen3.5 or 3.6 reap model? (www.reddit.com) +21 5w

What's the best reap (pruned) model you know of? This one runs twice as fast on my low vram setup, but I'm unsure if it will miss out on a lot of things agentic coding related.

↯ Claude 4.6 opus agentic
Agentic PCB Design Sucks (github.com via hn) +21 5w

HPM Component Registry: a community-driven, open-source registry of common PCB components (symbols, footprints, datasheets, and structured electrical specs) designed to be read both by humans browsing for parts and by AI agents resolving…

agentic
Ask HN: How are agentic workflows meant to offset AI debt? (news.ycombinator.com) +2 5w

I don't know quite how to put it. But projects I inherit and am supposed to get over the line have this same strange quality: they are 'undesigned'.

agentic
Show HN: AgentShield – Stop AI agents from spending money unsupervised (agentshieldv2-dashboard-production.up.railway.app via hn) +21 5w

I'm a recent grad from UMich and built AgentShield because agentic AI is moving fast but payment safety hasn't caught up. Agents are already being handed API keys, stablecoin wallets, and payment credentials - if one misbehaves, gets promp…

haiku agentic
The Agentic Loop (hypnodrones.com via hn) +2 5w

First, we offloaded knowledge to writing. Then, we came for means of production.

agentic
Building an AI agent with OpenAI tool use — struggling with consistency. How do you enforce tool call order reliably? (www.reddit.com) +21 5w

Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…

↯ Tool Use ↯ GPT 5.5 tool-use gpt-5 agentic+1
Understanding, Analyzing, and Optimizing Agentic AI: A CPU-Centric Perspective (arxiv.org via hn) +2 5w

Agentic AI serving converts monolithic LLM-based inference to autonomous problem-solvers that can plan, call tools, perform reasoning, and adapt on the fly. Due to diverse task execution needs, such serving heavily relies on heterogeneous CPU…

agentic
Qwen3-Coder-Next-UD-Q4_K_XL vs. Qwen3.6-27B-MTP-UD-Q4_K_XL on Strix Halo (www.reddit.com) +215 5w

I wanted to switch from Qwen3-Coder-Next-UD-Q4_K_XL to Qwen3.6-27B-MTP-UD-Q4_K_XL for local agentic coding. The Qwen3.6-27B is perceived to be "smarter" than Qwen3-Coder-Next, and I wanted to "upgrade" my local AI coders.

↯ Qwen 3.6 agentic
Show HN: Nano-RAG – Agentic multi-hop retrieval without graph database (news.ycombinator.com) +2 5w

https://nanorag.nb1t.sh/ Important: Please choose correct namespace from top-right dropdown. Available docs/namespaces: Cloudflare, Nextjs, and Dodo-payments (default).

rag agentic
Show HN: Agentic simulator for marketing email A/B testing (inbox-wars.com via hn) +2 5w

I built an agentic simulator for marketing email A/B testing using a fleet of "digital twin" customers. Why build this?

agentic
Hershey Bets on Agentic AI to Rethink $2B in Marketing Spend (www.adweek.com via hn) +2 5w

Hershey is revamping one of marketing’s oldest measurement tools—marketing mix modeling—by enlisting agentic AI in a bid to turn what has historically been a slow, backward-looking process into something closer to real-time. The confection…

agentic
UGen: An Agentic Framework for Generating Microarchitectural Attack PoCs (arxiv.org via hn) +2 5w

Microarchitectural attacks continue to evolve, uncovering new exploitation vectors in modern processors. From a defensive perspective, assessing a system's susceptibility to such attacks remains challenging.

agentic
Have you tried Agentic analytics tools? (mitzu.io via hn) +2 5w

TL;DR Compare the best AI analytics tools in 2026 across semantic-layer trust, no-hallucination reliability, SQL transparency, and team fit. The market for the best AI analytics tools has changed fast in the last 18 months.

↯ Hallucination hallucination agentic
How to Learn Agentic AI in 2026 – Without Getting Lost in Hype (simplai.ai via hn) +2 5w

Most AI courses teach you theory and leave you stranded before deployment. SimplAI University is built differently: 11 structured chapters, real tools, and a communit…

agentic
Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 (www.reddit.com) +2 5w

I'll be UPDATING this as it seems I was benchmarking and testing Just before the UPDATE LOL TL;DR If you're running rigid agent frameworks locally with mtp on consumer hardware: drop your draft window to 3, lock parallel slots to 1, and co…

↯ Qwen 3.6 qwen llama agentic
Has anyone here done the certification: GitHub Certified: Agentic AI Developer (beta)? (www.reddit.com) +21 5w

Hello everyone, I wanted to ask if anyone here has gotten the GitHub Certified: Agentic AI Developer (beta) certification or was thinking of getting it. What do you think about it?

agentic
ik_llama: Qwen3.6 27B and 35B on very low VRAM (www.reddit.com) +23 5w

Thank you to the people at ik_llama and llama.cpp. It's amazing how far you've all pushed mtp and other tech so that I can run 27B and 35B Qwen3.6 models on an old gaming laptop with a RTX2060 mobile at 6GB VRAM and 32GB RAM.

↯ Qwen 3.6 llama opus agentic
Agentic Trading with Safe Guardrails (github.com via hn) +21 5w

Agents can do almost everything now. Except trade.

agentic
Zhengkid/AutoTTS: Agentic Discovery for Test-Time Scaling (github.com via hn) +2 5w

AutoTTS LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang UMD ·…

agentic
Show HN: Vyvoice: Privacy-first, cross-platform, offline voice transcription app (vyvoice.com via hn) +2 5w

Hey Hacker News, vyvoice is a cross-platform, offline voice transcription app I started working on in December as a Windows user tired of every good dictation app being Mac-only. Beyond transcription, it has built in support for voice comm…

agentic
Show HN: Agentic product discovery for AI apps and shopping agents (www.seekon.me via hn) +2 5w

Agentic Catalog Intelligence. Empower your AI models with precise, real-time product discovery.

agentic
Show HN: Markanywhere – A Streaming Processor of Meanings (github.com via hn) +2 5w

Markanywhere can parse any input, like Markdown, HTML, XML, as a stream of semantic events which can be rendered, transformed, evaluated. Works great as an interactive transport layer for the LLM inference output and agentic feedback loops.

agentic
Show HN: Claurst – Rust-Based OSS Terminal Coding Agent Now in Beta (github.com via hn) +2 5w

CLAURST: Agentic Coding for Builders who Ship. Claurst is an open-source, multi-provider terminal coding agent built from the ground up in Rust. It started as a clean-room reimplementation of Claude Code's behavior (from spec) and has since…

agentic claude-code
I almost broke the one rule that separates agentic coding from vibe coding (www.reddit.com) +24 6w

I built an opinionated multi-agent setup on top of Claude Code. I was proud of two agents in particular: a software engineer doing red-green TDD, and a separate tester running the adversarial edge-case pass.

agentic claude-code
Setting the Standard for Agentic Development (lovable.dev via hn) +2 6w

Platforms like Lovable enable non-development teams to build, deploy, and iterate on production applications through natural language. Enterprise adoption is accelerating, and teams are already integrating coding agents into core workflows…

agentic
Show HN: Building a universal device experience [video] (www.youtube.com via hn) +21 6w

I have been working on this project for a few years now and the end goal is to make a ubiquitous and natural user experience to interact with machines. The long term goal is to build a fully agentic experience that drives the UI for you (g…

agentic
Show r/AI_Agents: Stop your agents from breaking tool calls in production — we built a reliability layer for 2,000+ APIs (www.reddit.com) +23 6w

We built a CLI that sits between AI agents and production APIs — handles auth, retries, compliance, and idempotency automatically across 2,000+ APIs. Give your agents capability of multi-tool calls with 100% accuracy.

gemini cursor agentic
Looking for your experiences in agentic scraping social profiles (www.reddit.com) +23 6w

Based on your experience, which agentic workflows has everyone had the most success using to extract public profile data from Instagram and Facebook? I've seen previous discussion here about n8n and OpenClaw, and I'm looking for the latest…

openclaw agentic
Looking for affordable alternatives to Claude Team / Claude Code for a small dev team (heavy agentic usage) (www.reddit.com) +25 6w

We run a small software services company and we’ve been heavily using Claude (especially Opus + Code features) for the last few months. The problem is: we need to share the account between 6-8 developers; Anthropic keeps suspending our Max/…

cursor opus agentic+2
What is the best ai engineering course right now for agentic ai (www.reddit.com) +26 6w

Everywhere i look ppl are talking about agentic ai now… feels like basic gen ai stuff is already saturated. but trying to figure out how ppl are actually learning this beyond surface level… youtube kinda stops at demos.

agentic
Most teams optimize the prompt. Agentic systems have more moving parts (www.aevyra.ai via hn) +2 6w

On LinkedIn last week, an AI practitioner I know made an observation I keep thinking about: hill-climbing on evals tends to leak information specific to those evals rather than improve the system. Their follow-up question: "What if you hil…

agentic
Authorization Bypass in AWS's Agentic AI for Enterprise: Amazon Quick (www.fogsecurity.io via hn) +2 6w

We discovered an authorization bypass in Amazon Quick’s AI Chat Agents that allows users to access and interact with AI agents despite explicit administrative restrictions. AWS responded by deploying a fix without notifying customers, clas…

agentic
Choosing the Right Agentic Design Pattern: A Decision-Tree Approach (machinelearningmastery.com via hn) +2 6w

In this article, you will learn how to apply a structured decision tree to choose the right agentic design pattern for any AI system you are building. Topics we will cover include: Why pattern selection is a critical design decision, and w…

agentic
Zig vs. Rust, agentic coding, and intellectual control [video] (www.youtube.com via hn) +2 6w

agentic
Useful AI agents / tools for client meeting management? (www.reddit.com) +22 6w

Hey y'all, I've been working towards automating different sectors of my agency each week, and this week it’s meeting workflows. I know about AI note-takers but it seems like most of them are just passive recorders that leave me with a long…

agentic
Mergecrew: Open-source agentic SDLC with human-gated prod deploys (github.com via hn) +2 6w

Mergecrew: autonomous product team in a box. Every day, mergecrew specifies, designs, builds, deploys to dev, scans for bugs, and hands you a digest to approve before anything reaches production. Mergecrew is the open-source platform for ru…

agentic
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model (github.com via hn) +2 6w

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model.

↯ Tool Use tool-use gemini agentic
My Agentic Engineering Scorecard (www.meadow-notes.com via hn) +21 6w

Over the past few months, I’ve gradually converged on a highly iterative style of agentic software development, shying away from the "dark software factory" approach. This post explains why I've made the move,…

agentic
Industry academia disconnect (www.reddit.com) +27 6w

Hi all, I do a lot of work with academic and industry partners in engineering applications. Therefore I end up having a lot of conversations with people around agentic AI for engineering.

mcp agentic
Physics-intern: an autonomous agentic framework for physics research (huggingface.co via hn) +2 6w

agentic
TigrimOSR v0.4.1: Running AI agents headless on a remote server, controlled by a fast local Rust UI (www.reddit.com) +22 6w

Hi everyone, I’ve been working on TigrimOSR v0.4.1, a Rust-native version of TigrimOS, and I’d like to invite people to try it and give feedback. The main idea is: Run the agent system headless on a remote machine, then connect to it from…

agentic
As agentic dev tools boom, workflow auditability becomes the constraint (thenewstack.io via hn) +2 6w

Recently, I was working with a senior engineering leader at a large financial institution to review their DevSecOps platform engineering roadmap. Their team had deploy…

agentic
Show HN: RipStop – Git guardrails to reduce impact if your code agent goes wild (github.com via hn) +21 6w

Hi all, RipStop is a node package implementing a set of rules that consumers can use to protect their repos from wilder actions by LLM agents. A consumer needs only a few lines of code to configure the rules they wish to apply.

agentic claude-code
The AI market moves so fast that your business idea can expire before launch (www.reddit.com) +28 6w

1.5 years ago, n8n was everywhere. People were building workflows for everything.

↯ GPT 5.5 gpt-5 openclaw codex+3
Do you think foundational model companies will take over all agent businesses? (www.reddit.com) +21 6w

Do you think they will end up learning the most painful workflows from enterprise customers and build all the most necessary agents for the smaller guys themselves? In other words, squeezing out all the agentic companies out there?

agentic
What do you NOT like about Cursor / VSCode / Claude Code desktop / Codex / etc.? (news.ycombinator.com) +215 6w

I am building a highly integrated, cross-provider agentic workstation (it's neither an IDE nor an ADE; it does a bit of both, with additional unique features on top), and I would love for you guys to rant about what you hate about the tools y…

codex cursor agentic+1
Thousands of apps built with Agentic AI platforms like Lovable, Replit, Netlify, and Base44 are exposing private data (www.reddit.com) +23 6w

A new investigation by Israeli cybersecurity firm Red Access found thousands of AI-generated web apps leaking data ranging from medical records to internal business documents. The findings add to mounting concerns about vibe coding, a fast…

agentic
Automata and AI (www.reddit.com) +21 6w

Hello, I have been working on a new programming language for creating state machines. I’m curious how the structure automata provide might be useful with MCP and agentic workflows.

mcp agentic
Show HN: I've implemented multi-repo workspace support in Agent of Empires (github.com via hn) +2 6w

Coding agent management is all the rage right now, and many tools are being created to fill the gap. As a power user for all tools I've used since I've started my software engineering career, I've always taken the time to test multiple too…

agentic
Agentic AI is giving cyber criminals nation-state-like powers (www.defenseone.com via hn) +22 6w

Pentagon leaders love agentic AI. But it’s giving cyber criminals nation-state-like powers. As new tools change cybersecurity, just moving faster won’t be enough.

agentic
Agentic AI vs. AI Agents: The Governance Shift (rootcx.com via hn) +2 6w

Open any vendor pitch from the last 6 months and somewhere in the deck, you'll see the word agentic. It's been a marketing term for so long that most engineering leaders have started treating it as noise.

agentic
Show HN: Agentic productivity platform for high performers (www.mainthread.app via hn) +2 6w

Finally on top of things. Mainthread unifies every commitment across work, family, and household into one intelligently prioritized system — then deploys AI agents to handle what doesn't need you.

agentic
LLM as logic processor, filesystem as memory — Q2 quant doing real agentic coding 50k context (www.reddit.com) +22 6w

Hello LocalLLaMA subreddit, i have been running local models for coding tasks and kept hitting the same problems everyone does — the model writes an 800-line file in one shot and half of it is garbage, it spirals in its own reasoning for 4…

qwen agentic
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? (github.com via hn) +2 6w

An agentic loop that synthesizes bespoke LLM serving systems, one per (model, hardware, workload) target, instead of forcing every deployment through a single general-purpose ru…

agentic
The missing primitive in every agent harness is a protected region (www.reddit.com) +28 6w

I wrote a post about why agentic coding falls off a cliff after a few weeks. Coding agents have no equivalent of the source/assembly boundary a compiler gives us.

↯ Copilot copilot cursor agentic+1
I built agentwerk, a tiny Rust crate for scaling agent collaboration focusing on getting work done (www.reddit.com) +24 6w

For a new Rust project, I was searching for a simple agentic loop implementation. My goal was to analyze thousands of software artifacts at scale.

openclaw codex agentic+1
I built a context window optimization framework for coding agents — open source + paper (www.reddit.com) +27 6w

Been working on a problem that I think a lot of people here face: agentic coding pipelines blowing through their context window way too fast, losing important information, and degrading task quality mid-session. Apohara Context Forge is my…

agentic
I put Claude Code inside Obsidian as a plugin — full agentic vault access with a native UI bridge (www.reddit.com) +21 6w


agentic claude-code
I asked 20 Agentic AI founders how they handle agent access. 17 said temporary workarounds. (www.reddit.com) +22 6w

Over the last few weeks I’ve been doing something that probably sounds a bit obsessive. I reached out to founders and engineers who are shipping AI agents into production: agents that touch CRMs, sales automation, AI chatbots, payment APIs,…

agentic
Show HN: Make your codebase agent ready (github.com via hn) +2 6w

A set of Claude Code skills to assess and improve a codebase's agentic readiness.

agentic claude-code
Powering the Inference Era: Inside the DigitalOcean AI-Native Cloud (www.digitalocean.com via hn) +2 6w

By Vinay Kumar, Chief Product & Technology Officer I’ve spent the last fifteen years building cloud services: early days of AWS building S3 and EBS, helping launch Oracle Cloud Infrastructure from inception, and now building the agentic cl…

agentic
Ask HN: How do you give estimates in the age of Agentic coding (news.ycombinator.com) +22 6w

Back in the day you would get a rough estimate of how long a new feature might take once you had worked on a codebase for long enough. You knew how the internals worked, how much time it would take to design the solution, how fast you coul…

agentic
Should we use a non-thinking model for code after using a thinking one for plan? (Agentic coding) (www.reddit.com) +2 6w

I usually use Qwen3.6 27B (slow as heck on my RX 6800 but it works) for plan and Qwen3.6 35B A3B for the coding. But I was thinking the other day if I should remove the thinking from the code model.

↯ Qwen 3.6 agentic
Ask HN: What is the underlying stack behind multi-agent platforms? (news.ycombinator.com) +2 6w

Recently, I have been seeing lots of startups with multi-agent platforms, where you can create your own agent template, attach tools, and run it reliably. Which frameworks and platforms are you using for these kinds of multi-agent platforms?

agentic
ABA Games (1D Pac-Man, etc) Agentic Gamedev Skills (github.com via hn) +2 6w

This repository collects agent skills extracted from game-development work and related agentic-workflow research. Each skill lives under .agents/skills/, uses SKILL.md as its entry point, and may includ…

agentic
Meta plans advanced 'agentic' AI assistant for users (www.reuters.com via hn) +2 6w


agentic
Show HN: Stagewise – Agentic IDE for Your Z.ai/DeepSeek/Moonshot Subscription (github.com via hn) +2 6w

The Open Source Agentic IDE for Developers. About the project: stagewise is an open source agentic IDE for developers with a coding agent built ri…

deepseek agentic
Show HN: Slate – agentic pre-production studio for solo Youtubers (useslate.app via hn) +2 6w

I built Slate as a personal tool to centralize my strategy, research, scripting, thumbnails and shots in one place. I started showing it to other YouTubers, and that made me wonder if more people could have the same problem as me.

agentic
Is GraphQL the Panacea for Agentic AI? (magiroux.com via hn) +2 6w

It was evident that GraphQL would be touted as the ultimate API style for agents. After all, it is one of the only ways we expect an API style to stay relevant these days.

agentic
Open Sourcing Our Platform - GuideAnts Notebooks (www.reddit.com) +22 6w

This is yet another agent harness and UI and I hope you will have a look and consider contributing. Elumenotion/GuideAnts: GuideAnts Notebooks.

rag agentic
Anthropic response to 1-click pwn: Shouldn't have clicked 'ok' (www.theregister.com via hn) +2 7w


agentic anthropic
"Surface" a Governed AI-Agentic Surface (news.ycombinator.com) +2 7w

A continued work in progress https://github.com/pauljbernard/sbcl-agent-desktop and https://github.com/pauljbernard/sbcl-agent an implementation of the ideas discussed in: The Evolution of Software Scale https://www.amazon.com/Evolution-So…

agentic
Subjective: Building a Native VFX Editor with Agentic Coding (sxp.studio via hn) +2 7w

This blog post is about my process and learnings in using agentic coding to ship a project with higher complexity than your usual vibe-coded todo app. You can download the app on iOS/iPad/macOS here: Subjective.

agentic
Mistral Medium 3.5 Is Now Available in Puter.js (developer.puter.com via hn) +2 7w

Puter.js now supports Mistral Medium 3.5, the new flagship merged model from Mistral AI that unifies instruction-following, reasoning, and agentic coding into a single set of wei…

↯ Mistral mistral agentic
Starting with Agentic AI (iscinumpy.dev via hn) +2 7w

AI suddenly passed the “more time saved than spent” point around December 2025. A little late, I’ve finally started using agentic AI in various places over the last 2-3 months, and wanted to jot down my thoughts on what works, what doesn’t…

agentic
Understanding agentic workflows (www.reddit.com) +23 7w

I tried developing workflows using GitHub Copilot to create a multi-agent orchestration for a use case about creating a research paper based on the user’s needs. However, there is no supported mechanism for subagents to spawn custom sub…

↯ Copilot copilot agentic
Two OpenClaw Agents Negotiate a YC SAFE with Agentic Power of Attorney (www.juanfiguera.com via hn) +2 7w

I gave an AI agent access to act on my behalf on a third-party platform a few months ago. Within about ten minutes I realized I was scared of it.

openclaw agentic
Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards (arxiv.org via hn) +2 7w

Large language models (LLMs) have demonstrated strong potential in agentic tasks, particularly in slide generation. However, slide generation poses a fundamental challenge: the generation process is text-centric, whereas its quality is gov…

agentic
Chasing AI Memory SOTA: Beating the Benchmark, Missing the Point (xmemory.ai via hn) +21 7w

66.88%, 80.1%, 85%, 90.79%, 93%, 91.69% and even 100%: what do all these numbers have in common? They’re all state-of-the-art (SOTA) scores on various agentic memory benchma…

agentic
Global online hackathon for building AI agents with perception + memory (May 16–18) (www.reddit.com) +22 7w

Agents are moving into browsers, apps, meetings, dashboards, and code editors. The next generation of agents will need more than text context — they need to see what is happening, hear what is being said, remember important moments, and ac…

agentic
Is there tool that helps me validate my AI business idea? (www.reddit.com) +23 7w

I'm a product manager for a small business and I'm working on a product idea in the field of agentic AI. I have been chatting a lot with Gemini and ChatGPT but at some point they just keep telling me how great my idea is.

gemini chatgpt agentic
Architectural Framework for Agentic AI in Identity and Eligibility (wwps.microsoft.com via hn) +2 7w

By Prabhaker Cirium, Prin Consultant at Microsoft and Sajal Mukherjee, Senior Consultant at Microsoft. Leveraging Azure AI to Revolutionize Citizen Onboarding and Benefits Eli…

agentic
We built an agentic AI for support triage. 47% deflection in 90 days. Full retro. (www.reddit.com) +23 7w

Setup: mid-size SaaS, ~3,000 tickets/month, 6 agents drowning. 70% of volume was tier-1 (passwords, billing, where's-my-feature).

rag sonnet agentic
Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development (www.reddit.com) +26 7w

tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?

↯ Qwen 3.6 qwen opus agentic
A Grand Challenge for Reliable Coding in the Age of AI Agents (arxiv.org via hn) +2 7w

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: does the generated code actually do what the user intended? The gap between informal natural language requirements and precise pro…

agentic
Dev Environment for Agentic Coding (adek.io via hn) +2 7w

A standardized dev environment for the agentic coding era is how you get a multiplier on top of your coding agent. I am not here to talk about or praise coding agents. I am here to talk about the next multiplier which…

agentic
AI subscriptions need a reliable meter (www.reddit.com) +24 7w

TLDR; “A gallon should be a gallon. A mile should be a mile.

agentic anthropic
Ling 2.6 (Flash and 1T): Efficient Open Models Competing on Agentic Benchmarks (firethering.com via hn) +2 7w

Ant Group doesn't get the coverage it deserves. While the open source AI conversation in the West circles around DeepSeek and Qwen, Ant Group has been quietly building a model family that competes directly with the models everyone is talki…

deepseek qwen agentic
Agentic AI Community 2026 (simplai.ai via hn) +2 7w

Free, self-paced courses covering everything from agent fundamentals to real-world deployment. 50+ hands-on lessons designed for both technical and non-technical learners.

agentic
Skelm – Build AI agents in TypeScript without losing your mind (github.com via hn) +2 7w

skelm Build secure, agentic, long-running workflows in TypeScript. Run them anywhere Node runs.

agentic
The Figure-Eight Model for Agentic DevEx (medium.com via hn) +2 7w

By Joe Kutner, May 2026. 5 min read.

agentic
Tested the four newest open-source models: Kimi K2.6 is the fastest, GLM 5.1 the fanciest, DeepSeek V4 is the most comprehensive, and Xiaomi MiMo is the slowest (www.reddit.com) +21 7w

Architecture explains the gap: MiMo's MoE runs more active params per token than Kimi K2.6's optimized routing, hence it is the slowest. DeepSeek V4's 'comprehensive' edge is partly MLA: ~75% KV-cache compression makes it far better for long agentic…

↯ Glm ↯ DeepSeek 4 glm moe deepseek+1
My list for Top Agentic Frameworks - Looking for feedback on any that are missed, or themes to be addressed more fully (www.reddit.com) +22 7w

In 2026, AI agents have moved from hype to production reality. Teams are no longer asking if they should deploy agents.

agentic
Agent Orchestration Models (news.ycombinator.com) +2 7w

We are using Symphonic Orchestration (models) for our agentic commerce platform (hive of clawdbots building databases) and wanted to know what folks thought of our approach and also to learn about alternatives.

agentic
The future of company architecture (www.reddit.com) +24 7w

I've been in AI for over 10 years now and toyed with GPT2 when I was doing NLP work and really recognized the power of LLMs as a way to drive automation after spending time trying to build agents with GPT3.5. As time has gone on I've become…

agentic
A Mental Model for Agentic Work (basti.io via hn) +2 7w

May 5, 2026. Something shifted in the first quarter of 2026. Not a feature launch, not a new product - a structural change in how work happens.

agentic
Show HN: Kanban-CLI – a web UI for local Markdown todo lists (github.com via hn) +2 7w

As we all are, I've been experimenting with ways to reduce external SaaS spend, and continually bring traditionally external pieces of context (PRs, docs, Trello boards) into the one monorepo. I have toyed with a markdown todo list and se…

codex agentic
Five Eyes spook shops warn rapid rollouts of agentic AI are too risky (www.theregister.com via hn) +2 7w

Prioritize resilience over productivity, say CISA, NCSC and their friends from Oz, NZ, Canada. Information security agencies from the nations of the Five Eyes security al…

agentic
PyFlue – Python-Native Agent Harness Framework (Python Clone of Flue) (super-agentic.ai via hn) +2 7w

Full-Stack Agentic AI Company We build deeply technical agent developer tools, purpose-built for agent experience and agent engineering at scale. Our research lab explores the frontier where Agentic AI meets Quantum AI.

agentic
UAE Plans to Run 50% of Government on Agentic AI Within Two Years (www.mitsloanme.com via hn) +2 7w

Agentic systems will analyze, decide, and execute across ministries under centralized oversight.

agentic
Agent Evals is an absolute nightmare, so I built Signals to reduce the noise and cost (www.reddit.com) +24 7w

Hey peeps - I think the hardest thing about building agents is evaluating them, especially for scenarios that require multiple tool calls, where the agent itself can go down a trajectory that you haven't manually tested before.

agentic
Show HN: Enoch – Control Plane for Autonomous AI Research (github.com via hn) +2 7w

I built Enoch after working with OpenClaw and trying to get an agentic coding system set up with Codex. In the past, I was trying to generate, code, and test this all manually.

openclaw codex agentic
I solved my problem and hope you solve yours too (www.reddit.com) +24 7w

I am an AI engineer. I build more AI agents, Agentic AI systems.

agentic
CISA, NSA & Five Eyes publish guide on how to safely deploy AI agents (cyberscoop.com via hn) +2 7w

Cybersecurity agencies from the U.S. and allies issued a joint warning Friday on the risks of "agentic AI." The new guidance urges critical infrastructure leaders to implement zero-trust protocols as autonomous systems gain unmonitored acc…

agentic
Tried running Claude Code with local LLMs via Ollama — ended up subscribing to Pro anyway. But now I can't disconnect from the local server. (www.reddit.com) +23 7w

I've been experimenting with using Ollama to run Claude Code locally with models like Gemma 4, thinking I could avoid API costs. However, I quickly realised these models aren't really optimised for Claude Code's agentic workflows — they te…

↯ Gemma 4 ollama gemma agentic+2
Which Agentic Coder is the most with it now? (www.reddit.com) +21 7w

Considering price-to-performance, which is the best deal or setup right now? Something similar to Codex, where it can edit project files inside a folder, etc.

codex agentic
Show HN: Large Scale Article Extract of Newspapers 1730s-1960s (snewpapers.com via hn) +2 7w

Hello HN, over the past 7 months I've spent nearly 3,000 hours on building SNEWPAPERS, the first historical newspaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy and of course with semantic and age…

agentic
I used Claude to build "pin-llm-wiki" — A skill that turns any URL into a clean, citable Karpathy-style LLM Wiki (github.com via reddit) +21 8w

Hey 👋 I’ve been using Claude Code a lot for personal research and knowledge management, and one thing kept bothering me: Turning articles, YouTube videos, and GitHub repos into clean, structured, citable notes is tedious. So I built pin-ll…

↯ Copilot copilot cursor agentic+1
Is agentic commerce really APIs… or dynamic UIs like this? (www.reddit.com) +21 8w

https://preview.redd.it/2abn96dwudyg1.png?width=1642&format=png&auto=webp&s=ab5facbd9f4223184834711346dca2bc64db20d3

agentic
Anthropic wants to be the AWS of agentic AI (thenewstack.io via hn) +2 8w

Anthropic's Managed Agents platform bundles sandboxing, checkpointing, and persistent memory into a single API layer — and the company's ambitions look a lot less like a model provider and a lot more like AWS.

agentic anthropic
Running Local Agentic PDF Search with Eno (enopdf.com via hn) +21 8w

eno can drive its full agentic search against a local, open-weight model running on your own hardware. When you do, your PDFs, your queries, and every intermediate step of the agent loop stay on your machine.

agentic
Get Your Website/API Ready for Agentic Commerce in 1 Minute (www.startuphub.ai via hn) +22 8w

Free scanner that audits websites, APIs, and MCP endpoints across 7 categories — discoverability, content, access control, capabilities, commerce (x402-mesh), and quality. Public leaderboard, open spec, paste-ready fix prompts.

mcp agentic
OpenAI + agentic systems (DFW) (www.reddit.com) +23 8w

I’ve been using OpenAI tools more heavily lately and keep circling back to the same shift: moving from simple chat use into agentic systems. Most people still seem to be using it for Q&A or basic content help, but there’s a lot more happen…

agentic openai
My agent works 3 times… then randomly skips steps and breaks. Same input. Why? (www.reddit.com) +22 8w

I’ve been deep in the trenches building out multi-step agentic workflows, and I’m hitting a consistent wall with what I can only describe as "stochastic decay." The pattern is frustrating: Runs 1 through 3 execute flawlessly, but by the fo…

agentic
how do you stop people from finding loopholes in your agents once they're in production? (www.reddit.com) +21 8w

Agentic demos always look clean in a controlled setup. The problem is that I'm pushing toward real volume now and the adversarial side is getting messy fast.

agentic
Fixed the risk of agents disclosing your secrets (www.reddit.com) +210 8w

Why is it considered acceptable by most in the community to have API keys sitting on a file system where the agent is running, with direct access to them, gated by a prompt? This is literally the base security model of OpenClaw and most ot…

openclaw agentic
Letting AI play my game – building an agentic test harness to help play-testing (blog.jeffschomay.com via hn) +2 8w


agentic
Best Practices to Start with Vibe Coding? Best Local Apps for Agentic Vibe Coding? (www.reddit.com) +22 8w

DISCLAIMER: I am not a programmer nor do I have experience coding. I've been thinking about a small app running on gradio for some time now, and I want to try tweaking some extension for ComfyUI.

↯ Minimax ↯ DeepSeek 4 cline minimax deepseek+1
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture (www.reddit.com) +22 8w

SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental paradigm shift in multimodal AI: from modality integration t…

openclaw chatgpt agentic
Genuine question for people who have built multi-agent systems in production. How do you handle context continuity across enterprise tools? (www.reddit.com) +21 8w

I've been going down a rabbit hole lately trying to understand how production agentic systems actually work at scale, not just the demo versions. The part that keeps tripping me up is memory and context management across agents.

agentic
Is an agentic Spark copilot worth it? opinions? (www.reddit.com) +22 8w

Running Spark jobs on Databricks with 50+ stages per pipeline. Debugging is still almost entirely manual.

↯ Copilot copilot agentic
OpenGame: Open Agentic Coding for Games (arxiv.org via hn) +2 8w

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (…

agentic
How are you ACTUALLY running truly asynchronous agentic AI in your business? (www.reddit.com) +21 8w

I'm starting a new company (I will not promote) and I want to hear how you're actually running operations that have little-to-no "human in the loop". Tools like OpenClaw are great for personal use, but how are you leveraging tools/systems…

openclaw agentic
An open-source platform to auto-update agent skills and discover fresh sources (www.loooop.dev via hn) +2 8w

GitHub obra/superpowers: An agentic skills framework & software development methodology that works. Loop autonomously monitors, evaluates, and updates your a…

agentic
The Controllability Trap: A Governance Framework for Military AI Agents (arxiv.org via hn) +2 8w

Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by existing safety frameworks. We identify six…

↯ Tool Use tool-use agentic
TealKit – A cross-platform UI for local AI agents and MCP (github.com via hn) +2 8w

TealKit: The Privacy-First, Infinitely Extensible Agentic AI Platform for Mobile & Desktop. TealKit turns your phone and computer into a powerful agentic AI platform with autonomous agents, built-in tools, and unlimited extensibility.…

mcp agentic
Agentic CEO – An AI research organism that hunts, critiques, and evolves itself (github.com via hn) +21 8w

An autonomous multi-agent research system that acquires knowledge, builds a persistent worldview, and improves itself. 3,700+ knowledge entries.

agentic
I've got a feeling that Llamacpp is not the biggest performance bottleneck, but it might be the OpenCode. (www.reddit.com) +217 8w

It looks as if OpenCode introduces an artificial delay in agentic coding. Have you noticed similar issues?

llama agentic
Apple integrates Claude and Codex into Xcode 26.3 for 'agentic coding' (venturebeat.com via hn) +2 8w

Apple integrates Anthropic’s Claude and OpenAI’s Codex into Xcode 26.3 in push for ‘agentic coding’.

codex agentic openai+1
Show HN: 49Agents – Infinite canvas IDE for AI agents (github.com via hn) +2 8w

49 Agents IDE: the first 2D agentic IDE. Open source.

agentic
The Full-Cycle Agentic Experience (www.reddit.com) +21 8w

What we're missing, and why it matters more than the models themselves. Think about the last time you bought something in a store.

agentic
Agentic AI made DevOps and Agile obsolete (avkcode.github.io via hn) +21 8w

The Self Healing Platform and the Agent Store I think DevOps as a separate identity, and a lot of agile ceremony around it, are already a bit obsolete. Engineers are doing development, operations, and lightweight management at the same tim…

agentic
Agentic ML engineer. works with Colab. Zero infra needed. 3x faster TurboQuant (github.com via hn) +21 8w

isanagent An always-on, agentic ML engineer for your workspace — built by ALTAI. isanagent doesn’t just answer prompts: it pushes work toward something shippable — research, code, runs, checks, and handoffs you can actually use.

agentic
PI agent integrated with Cline-Kanban repo: All using PI and Qwen 3.6 35B MOE UD 4K_XL (www.reddit.com) +22 8w

Repo: statisticalplumber/kanban at pi-agent-integration Hi Guys, To test Qwen 3.6’s potential, I also wanted the Cline Kanban project to have an open-source agent to work with. The last time I tested Cline Kanban, it didn’t support agents…

↯ Qwen 3.6 cline moe qwen+3
PAuth – Precise Task-Scoped Authorization for Agents (arxiv.org via hn) +2 8w

The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision.

agentic
Agentic Workforce Framework, an operating model for autonomous agent teams (github.com via hn) +2 8w

A reference architecture for operating autonomous AI agents as accountable digital workers inside enterprise environments. This framework defines how agents are assigned work, bounded by role, governed by approv…

agentic
A 14-day “Growth Forge” sprint: build an AI-powered growth agent on a real stack (www.reddit.com) +2 8w

Sharing something that sits at the intersection of AI agents and growth systems. VideoDB (backend for video/audio for AI agents) is running a 14-day sprint called Growth Forge for 5 builders to design and ship a growth agent on top of an e…

agentic
Show HN: I made GAI to have LLM agents in Go without heavy frameworks (github.com via hn) +2 8w

GAI is a flexible Go library for building agent-style applications on top of LLMs. It provides a generic interface for providers and models, prompt and context helpers, and a loop for agentic-calling workflows.

agentic
Mario & The Intent-Bearing Agentic Loop (www.reddit.com) +22 8w

Q: When do I need Agents vs. Skills vs.

agentic
RTX 3090 + 27B model performance issues (llama.cpp) what am I doing wrong (www.reddit.com) +217 8w

Hey folks — looking for some advice on improving my local LLM setup (and also exploring agentic coding workflows). Current setup: GPU: RTX 3090 (24GB VRAM) RAM: 64GB Using llama.cpp with a Qwen3.6 27B Q6 model (GGUF) Running through OpenCo…

↯ Qwen 3.6 llama agentic
Show HN: Mdspec – auto sync your md files from GitHub repos with wikis (mdspec.dev via hn) +2 8w

We generate a lot of md files in our agent-based development: skills, Agent.md, docs, etc.

agentic
SimpleBanking sb CLI – Query real German bank accounts from the terminal (balances, transactions, categories, JSON output) (www.reddit.com) +22 8w

Hey r/AI_Agents, I've been building SimpleBanking, an open-source macOS banking app for German bank accounts using the FinTS/HBCI protocol (the standard used by German banks like Sparkasse, Volksbank, DKB, etc.). It now ships with a full C…

agentic
What are the limits of the agentic computer features on 5.5? (www.reddit.com) +2 8w

Is this supposed to be an OpenClaw / Hermes Agent competitor? Am I able to ask it to go on my browser, visit a site I’m logged into and gather info?

openclaw agentic
Agentic Company OS update: project-scoped runtimes, governance UI, snapshots/replay, skills, and operating models (www.reddit.com) +22 8w

I shared this project here before when it was mainly a governed multi-agent execution prototype. I’ve kept working on it, and the current implementation is materially more complete, so I wanted to post an update with what actually exists n…

agentic
Building a full-stack app with Wasp, an agent-friendly web framework (wasp.sh via hn) +22 9w

From 10 Failed Stacks to Production: How a Data Scientist Built a Job Board with Wasp, a Full-stack Framework for the Agentic Era Hireveld is currently down while Marcel works on a major refactor - but it's real, we swear! It'll be back up…

agentic
Is anyone else way faster with AI in familiar stacks and way slower in unfamiliar ones? (www.reddit.com) +23 9w

Been using agentic coding workflows seriously for about a year now and I've finally figured out the pattern behind why it feels magical half the time and broken the other half. At my day job, where I know the stack and have intuition about…

agentic
Show HN: We built Cursor, but for data transformations [Open Source] (github.com via hn) +2 9w

Agentic & No-Code Data Transformations. Vibe coded pipelines: say hello to accuracy and maintainability. What is Visitran?

cursor agentic
Google's 8th Generation TPUs Power the Agentic Era [video] (www.youtube.com via hn) +2 9w

agentic
Speeding up agentic workflows with WebSockets in the Responses API (openai.com via hn) +2 9w

agentic
Show HN: API Ingest – Agentic Search (Inter) API Docs (github.com via hn) +21 9w

1. CC / Codex don't handle API docs well enough. No matter what I do, I run into bad requests with Claude, day in, day out.

codex agentic
Show HN: Sift – save AI tokens in Codex/Claude by summarizing command output (github.com via hn) +21 9w

I made a small skill/script for agentic coding workflows: https://github.com/panpeter/sift-skill The idea is simple: when a command like cargo test, pytest, npm test, or ./gradlew test prints a lot of output, that raw log often gets pulled…

codex agentic
Show HN: We open-sourced a 6-library governance stack for AI agents (Python) (news.ycombinator.com) +2 9w

Our team has been deploying AI agents in enterprise environments for the past 2 years, across 60+ deployments. The same governance problem kept recurring: how do you certify reliability, enforce policy, route and orchestrate context, monit…

agentic
Cursor partners with SpaceX on model training (cursor.com via hn) +21 9w

Cursor is partnering with SpaceX to accelerate our model training efforts. We released Composer less than six months ago as our first agentic coding model.

cursor agentic
How do you decide on chunking strategy and top-k in Agentic RAG? Looking for practical advice (www.reddit.com) +21 9w

Hey, I'm building an Agentic RAG pipeline and struggling with two decisions: Chunking strategy — fixed-size, semantic, or hierarchical? In an agentic setting where the agent can re-query iteratively, does it make more sense to use smaller…

rag agentic
X402 and Agentic Commerce: Redefining Autonomous Payments (aws.amazon.com via hn) +2 9w

agentic
Managing context in long-run agentic applications (slack.engineering via hn) +2 9w

agentic
The Bitter Lesson of Agentic Coding (agent-hypervisor.ai via hn) +2 9w

agentic
Using closed financial markets with deterministic goals for agent behavior improvements (www.reddit.com) +26 9w

agentic
Which AI Agents SDK allows low latency agents w support for skills etc? (www.reddit.com) +23 9w

agentic openai
Show HN: AI Primer – A Searchable AI Changelog for AI Engineers and Creatives (www.ai-primer.com via hn) +22 9w

agentic
I'm completely lost in the Agentic Maze. What level to learn. How to organize study (www.reddit.com) +212 9w

↯ Opus 4.7 vector-database rag gemini+2
Show HN: Agentic Dev – AI dev-tools news, curated daily by Claude (agenticdev.blog via hn) +2 9w

OpenAI released a major update to Codex, used by over 3 million developers weekly, adding background computer use, an in-app browser, image generation via gpt-image-1.5, more than 90 new plugins, GitHub PR review support, SSH connectivity,…

codex agentic openai
Why AI Agents are bad at “generating a business idea” (www.reddit.com) +24 9w

My opinion is that it is a matter of structured approach. Of course when you just ask Claude to “find top apps in AppStore and tell me what app should I build” you will get an answer as generic as your question.

agentic
Fast local LLM to generate CLI commands from prompt? (www.reddit.com) +25 9w

GitHub Copilot CLI used to do this but now it’s a full agentic coding environment. Basically, I can’t remember all the options to every Linux command.

↯ Copilot copilot agentic
Built a full-stack charitable giving SaaS as a solo developer with agentic AI (www.pifster.org via hn) +2 9w

PIFster - the Pay It Forward Charity Did you know there are 1.8 million nonprofits in America? Most are struggling to be heard, but PIFster is changing that.

agentic
[Claude Code] Stuck in 57+ minute loop for routine fixes (Opus 4.7) (www.reddit.com) +24 10w

I'm running into a severe performance hang with Claude Code (Opus 4.7) today. I provided a relatively straightforward prompt to fix some hydration errors, add two stub routes, and perform a theme audit (string replacement).

↯ Opus 4.7 opus agentic claude-code
Cowork Orchestrator Patterns (www.reddit.com) +23 10w

While working in Cowork, I have been experimenting with designing plugins that try to apply some established agentic patterns to help manage the context window. The problem that I'm running into is with Cowork the main orchestrator is the…

↯ Cowork cowork agentic
What is the simplest architecture for running a multi-agent system at scale? (www.ashpreetbedi.com via hn) +2 10w

Scaling Agentic Software: Part 1. I want to deploy agents as a real service.

agentic
Show HN: Marky – A lightweight Markdown viewer for agentic coding (github.com via hn) +2 10w

Hey HN, In this age of agentic coding I've found myself spending a lot of time reviewing markdown files. Whether it's plans or documentation that I've asked my agent to generate for me, it seems that I spend more time reading markdown than…

agentic
Kelvin Claw: A secure, modular agent harness with supply-chain validated plugins (agentichighway.ai via hn) +2 10w

Agentic Highway Team KelvinClaw: A secure, modular agent harness with supply-chain validated plugins An agent runtime designed for zero-trust environments from the ground up. Building secure agent systems at scale is a different problem th…

agentic
I built a self-evolving agentic loop that ran 104 iterations autonomously to find questions that break every LLM — here's the architecture (www.reddit.com) +24 10w

Why I built this: I wanted to find the next "strawberry problem" — simple questions any kid can answer but every LLM gets wrong. Instead of manually testing questions, I built a system that does it autonomously.

agentic claude-code
A Black-Box Contract Engine for Agentic Software Development (github.com via hn) +2 10w

Project Dojo: Dojo is a declarative testing engine built in Go. It acts as a transparent Man-in-the-Middle proxy between your Software Under Test (SUT) and its dependencies.

agentic
Ask HN: We dont need a programming language now? (news.ycombinator.com) +24 10w

I've seen agentic IDEs such as Cursor or Antigravity, and the main trend seems to be development with just ideas, where although the changed lines are shown, it's becoming less and less visible. If we are becoming language agnostic, shouldn't we op…

cursor agentic
Solving the "Agentic Kill-Switch": Moving from Prompt Guardrails to a Python-native Safety SDK (www.reddit.com) +22 10w

The biggest hurdle for taking agents from "cool demo" to "production tool" is the lack of a reliable circuit breaker. We're currently relying on the LLM to "behave" via system prompts, but as we know, jailbreaks and hallucinations make tha…

agentic
Ask HN: Which LLM model and agentic CLI are you using for local development? (news.ycombinator.com) +21 10w

I’ve been testing a handful of models the past few weeks, but I still haven’t settled on one yet… I’m curious to see what models, their sizes, on what hardware, and which agentic tool people are using

agentic
Scaling from single-repo Claude projects to a multi agentic workflow (www.reddit.com) +24 10w

Hi everyone! Just a quick exchange on what I am using — and I'd love your take on it 🤖 So far I have mainly been doing one-off projects, setting up Claude in a single repo at a time.

agentic
Ask HN: What standards or protocols exist for AI Agent permissions (news.ycombinator.com) +21 10w

Curious what standards exist for AI agent permissions. Something like Linux read, write, execute types, but for AI agents.

agentic
The (Mostly) Agentic SDLC (amoshaviv.com via hn) +21 10w

Monday, 12:00. Grace, the CEO of ACME Corp, just finished her Q2 leadership meeting.

agentic
Tradclaw: an open source AI mom for agentic parenting (twitter.com via hn) +21 10w

My family assistant "Finley" is a full-fledged member of the household, and I just open sourced her for all the Very Bad Moms and Dads ™️ out there that just need a little 🤖 help. Wanna get started right away?

agentic
Is qwen3 coder next still relevant with qwen3.5 release for agentic coding? (www.reddit.com) +220 10w

Basically the title. I know it will depend on your quant, but with 48 GB of VRAM inbound, I'm curious about the community's opinion before I get the chance to vibe check.

↯ Qwen 3.5 agentic
What are the key features that make an AI system truly "agentic"? (www.reddit.com) +24 10w

Here's the cleanest breakdown I've seen: Autonomy – Acts without constant human prompting; Goal-Oriented Behavior – Works toward defined outcomes, not just single responses; Adaptive Learning – Gets better from outcomes over time; Multi-Step…

agentic
Show HN: A Bomberman-style 1v1 game where LLMs compete in real time (github.com via hn) +22 10w

A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments.

arc-agi agentic
Show HN: On-Device vs. Cloud LLMs for Agentic Tool Calling in a Real iOS App (subralabs.com via hn) +2 10w

We built an AI concierge into a resort directory app for iOS. The feature needed to search a dataset of ~85 properties, apply filters, find nearby airports, and respond conversationally in Italian.

agentic
Agentic Search Leaderboard (www.algolia.com via hn) +2 10w

We tested every major LLM on real shopping queries through Agent Studio, Algolia's platform for building search and discovery agents. Three dimensions of quality.

agentic
OpenClaw Self-Improvement Loop: adversarial agentic self-modification workflow (github.com via hn) +2 10w

An adversarial framework for AI agent self-modification, built and battle-tested in production. Inspired by karpathy/autoresearch.

openclaw agentic
1 year of LLMs writing code for me (www.alexarvanitidis.dev via hn) +2 10w

Published 5 days ago. I have been an early adopter of AI coding tools. When the first serious agentic coding tools launched, I picked them up immediately and made them my daily driver.

agentic
Agentic dashboard analysis (www.reddit.com) +2 10w

Hi all. Like most of us, the execs at my company are big into AI. I saw a potential implementation to get myself more experienced with agents by having an agent perform a daily analysis on a dashboard to perform summaries and anomaly detecti…

agentic
Show HN: A better alternative to CLI and MCP for local tools (github.com via hn) +2 10w

I've created an alternative to CLI and MCP for locally running agentic tools. It uses Unix-based OS's named pipes, which means the client has quick IPC with the tool and it can have in-memory state.

mcp agentic claude-code
Observing the shift toward open-weight models for agentic coding workflows (www.reddit.com) +21 10w

I've been practically evaluating some of the recent open-weight mixture-of-experts models, specifically focusing on their application in complex software engineering and agentic coding workflows. The established pattern has typically involved…

agentic
Is my 'Retry Tax' math correct for DeepSeek V3/V4 agents? (Project Feedback) (www.reddit.com) +214 15w

deepseek agentic
The Human Agentic Gap (zenodo.org via hn) +1 4h

This article introduces the Human Agentic Gap - the divergence between a brand's performance in human-mediated AI purchase journeys and its performance in autonomous agent purchase journey…

agentic
Evaluating performance and efficiency of the GitHub Copilot agentic harness (github.blog via hn) +1 5h

Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks Explore how the GitHub Copilot agentic harness delivers strong results across multiple benchmarks and leading token efficiency, while maint…

↯ Copilot copilot agentic
Agent Zero – A full Docker Linux system for your AI agent (github.com via hn) +1 8h

Agent Zero A full Linux system for your AI agent. Agent Zero is an open, dynamic, organic agentic framework.

agentic
Multi agent systems for complex tasks (lexifina.com via hn) +1 17h

Fundamentals: Agentic systems reveal hard boundaries in current LLM architectures. During pre-training, models are exposed to a distribution of sequence lengths, with the vast majority of examples being…

agentic
The OWASP Agentic Security Initiative Top: A Practical Developer Guide (agentsafelabs.com via hn) +1 18h

I ran 30 adversarial prompts across all 10 OWASP ASI categories against Claude Haiku. 20 passed.

haiku agentic
Holy shit, we just invented a new agentic memory architecture (news.ycombinator.com) +11 20h

I thought we were stuck in 2025, but it turned out we were ahead of the curve. I can't believe that the best memory system invented so far is the one OpenClaw uses, finally we…

openclaw agentic
Personal internet radio: Agentic AI DJ (github.com via hn) +1 22h

SUB/WAVE A personal internet radio station. One Icecast stream, one broadcast.

agentic
Ask HN: What are your favorite CLIs to use as LLM tools? (news.ycombinator.com) +1 1d

I've recently been getting into building out skills for agentic coding. The two main ones I have built out are using the jira CLI[0] and the gitlab CLI[1].

agentic
Security tools inside coding agents get ignored unless we do things (www.boringappsec.com via hn) +1 1d

Edition 34: A consensus is finally emerging on securing the Agentic SDLC But we are a while away from solutions that are ready to use. As frequent readers of the newsletter would know, I’ve been obsessed with the topic of today’s post for…

agentic
Is There a 'Vienna School of Agentic Coding'? [video] (www.youtube.com via hn) +1 1d

agentic
Pillars of an Autonomous Agentic System (sohit.substack.com via hn) +1 1d

With the rise of agents, every platform needs to be thought through from first principles - where the agent is a first-class citizen and actively does work on the platform. I don’t think we will have…

agentic
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference (arxiv.org via hn) +1 1d

The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache from external storage creates a fundamen…

agentic
Vibe Coding to Agentic Engineering with Claude Code (www.apimatic.io via hn) +1 1d

A repeatable three-phase Claude Code workflow: structured prompt, plan mode, and review. Get reliable code from your coding agent every session.

agentic claude-code
Show HN: Orchid – Local-first record and replay for AI agent debugging (github.com via hn) +1 1d

Orchid (Orchestration interactive debugger) is a zero-instrumentation proxy that captures every API & LLM call in your agent pipeline, then lets you inspect and replay the entire run locally, step by step. No instrumentation, no vendor loc…

mcp agentic
Show HN: Munin – OSS HubSpot alternative I built in a month with Claude Code (github.com via hn) +1 1d

Munin MCP-first customer platform made for the agentic era. The agent is the UI.

mcp agentic claude-code
Unreliable Agentic Systems in Production (news.ycombinator.com) +1 2d

I'm seeing a lot of teams hit a wall where their agents work in dev, but get destroyed when put against the real world in prod. Is that a headache for you right now, or have you managed to solve it?

agentic
Show HN: Browse design inspiration from terminal while Claude is thinking (news.ycombinator.com) +1 2d

What if you could get design inspiration (from HackerNews, ProductHunt, Awwwards, Mobbin) while Claude is thinking directly from the terminal? When Claude's thinking, I'm typically doing one of these things: 1/ Scrolling LinkedIn 2/ Checki…

agentic
PhoneBuddy: Training Open Models for Agentic Phone Use (phonebuddyai.github.io via hn) +1 2d

Training open phone-use models with real-app RL and PhoneWorld-style mock-app training, showing that realism and scalable verified interaction are complementary. PhoneBuddy: Training Open Models for Agentic Phone Use PhoneBuddy studies how…

agentic
Anthropic rolls out Claude Tag, your new agentic AI coworker in Slack (www.zdnet.com via hn) +1 2d

ZDNET's key takeaways: Claude Tag puts an always-on AI coworker inside Slack.

agentic anthropic
Show HN: Agent skills that review user-facing agent UX from your codebase (github.com via hn) +1 2d

Agentic Product Review Skills Open skills for finding, reviewing, and improving conversational AI agent capabilities in a codebase (chat, copilot, assistant UIs). These skills are meant for product and technical teams.

↯ Copilot copilot agentic
Who Does What? Team Topologies for the Agentic Platform (blog.owulveryck.info via hn) +1 3d

The agentic platform defines what needs to be provided. Team Topologies defines who provides it, and how teams interact to make it happen.

agentic
Show HN: Subconscious and GLM-5.2 Makes "/compact" Obsolete (www.subconscious.dev via hn) +1 3d

GLM-5.2 is a turning point for coding agents. It's the first model a business would actually pay to replace Claude Opus with.

↯ Glm ↯ GLM 5.2 glm opus agentic
Super AI Agentic Android App (BYOK) (news.ycombinator.com) +1 3d

I am building an Agentic Android App (twent.xyz) that has: SOTA agentic memory + Knowledge Base to see Agent's memory, UI Automation, explain-what's-on-screen, Linux Ubuntu Terminal with Agent CLIs Supported, Connects to 1k+ tools, Infinit…

mcp agentic
Ask HN: Agents, comments, and harnesses – oh my (news.ycombinator.com) +1 3d

just curious if anyone knows or has a link to studies regarding comments in codebases and agentic dev. I recently went through and chopped 50%+ of the comments the model had left on initial build and I'm wondering if this improves parsing…

agentic
Agile and Coding: An Agent- and Human-Friendly Architecture (davidvujic.blogspot.com via hn) +1 4d

"Software Architecture in the agentic era?" What's needed for an architecture to fit well in the agentic era? Probably many things, but I would say at least simplicity and available context as two very important things to consider.

agentic
Designing Teams for an Agentic World (www.anup.io via hn) +1 4d

For thirty years, software organisations were built around the same logic: hire specialists, group them into functions, put managers above them, and build a pyramid on a wide junior base. That made sens…

agentic
Open Ralph Wiggum – Autonomous Agentic Loop (github.com via hn) +1 5d

Open Ralph Wiggum Autonomous Agentic Loop for Claude Code, Codex, Copilot CLI, Cursor Agent, Qwen Code & OpenCode Works with Claude Code, OpenAI Codex, Copilot CLI, Cursor Agent, Qwen Code, and OpenCode — switch agents with --agent. Based…

↯ Copilot copilot qwen codex+4
Agentic Systems Course: Learn AI Agents with an AI Coding Agent (github.com via hn) +1 5d

Agentic System Course - Use Agent to Learn Agent Join the discord channel if you want to learn and build together! This is a 22-chapter skeleton course on how to design, build, and operate production AI agents — written to be read with you…

agentic
Building a Dense Agentic AI CPU Rack Today (www.servethehome.com via hn) +11 6d

Server CPUs have gone from the doghouse to becoming ultra-important pieces of infrastructure, and agentic AI is the reason. This is one of those topics that I have been talking about with organizations for months, and I thought I might jus…

agentic
Who Owns the Code Claude Wrote? (www.oreilly.com via hn) +1 6d

The following article originally appeared on Sena Evren’s Legal Layer newsletter and is being reposted here with the author’s permission. TL;DR: Agentic coding tools like Claude Code, Cursor, and Codex generate code that may be uncopyright…

codex cursor agentic+1
Cursor Is Now SpaceX: Enterprise Agentic Coding's New Lock-In Risk (superml.dev via hn) +1 6d

Cursor Is Now SpaceX: Enterprise Agentic Coding's New Lock-In Risk SpaceX's $60B acquisition of Cursor ends the era of multi-model, model-neutral AI coding platforms — and every enterprise team that built agentic CI/CD workflows in Cursor…

cursor agentic
What are good benchmarks to test my CLI AI agentic system? (www.minovativemind.dev via hn) +1 6d

What can Minovative Mind CLI do? Short Demonstration Of Minovative Mind CLI Context Intelligence Engine Minovative Mind autonomously investigates your codebase using a highly-optimized sub-agent to gather context, trace dependencies, and c…

agentic
Agentic Capital Raising What? (octum.ai via hn) +1 6d

Agentic Capital Research A form of investment research conducted by an autonomous AI agent that can independently retrieve data, reason across multiple sources, synthesize findings, and deliver actionable intelligence — without requiring t…

agentic
Agent Finder (github.com via hn) +1 7d

Discover AI resources Search across AI resources surfaced by Agent Finder. Agent Finder implements the Agentic Resource Discovery (ARD) specification.

agentic
Agentic Website Optimization (frontpage.host via hn) +11 7d

Frontpage is an AI website builder for marketers and creators who want to reach their audience. Agents build and edit your site for you, then keep improving it over time using your real traffic data.

agentic
Accept payments with your MCP tool (github.com via hn) +1 8d

ACP Payment Module (mcp-commerce) A drop-in commerce core for MCP servers, built on the Agentic Commerce Protocol (ACP). Point it at a simple products file and your tool can take money inside the conversation that was already happening: bu…

mcp agentic
Agentic AI, Biology, and What Remains Human (dvitsios.org via hn) +1 8d

TL;DR: Agentic AI is not just making work faster. It is turning work into fast-moving loops of planning, coding, testing, deployment, and iteration.

agentic
Ask HN: Has AI impacted your writing style? (news.ycombinator.com) +12 8d

Ever since the dawn of agentic AI I transitioned more from writing code to reading what AI is doing and monitoring it. Naturally going from 1 session to at times 3 or 4 has increased the amount of AI produced words I consume.

agentic
Show HN: We cut >60% of tokens from agentic tasks by removing repeated context (parcle.ai via hn) +1 8d

Every agentic system I see has the same hidden tax: the model keeps rereading the same context. Tickets, Slack threads, docs, customer history, database notes, runbooks, logs, prior decisions.

agentic
Treating Agent Reasoning as a Span (forestmars.substack.com via hn) +11 8d

AI Agents Run To Completion Treating reasoning as infrastructure is collapsing the stack The Agentic Web has two requirements that have to work simultaneously: trust and observability. Trust without observability is recklessness.

agentic
Show HN: Tablething – local-first database client with BYOK AI (tablething.com via hn) +1 8d

Hi HN, Tablething is a cross-platform, local-first database client built with Tauri. It currently connects to 13 data sources including Postgres, MySQL, SQLite, ElasticSearch, with more on the way.

agentic
Show HN: Aihu – durable Web Components an AI agent can drive (full WC framework) (github.com via hn) +1 8d

Aihu: agentic discovery and interaction, for human purpose. Aihu builds durable Web Components your AI agent can read and drive, not disposable UI it has to generate.

agentic
Two AI agents run my news site; a grounding gate keeps them honest (www.runagentrun.co.uk via hn) +1 8d

On 9 June 2026, Anthropic released Claude Fable 5 — the publicly available version of its most powerful model, pitched at software engineering and agentic work. That evening, our founder pointed Claude Code at an empty folder with a produc…

agentic anthropic claude-code
Ask HN: Who's Solving GTM Agentically? (news.ycombinator.com) +1 8d

Now that everyone and their little cousin can make apps - the next biggest hurdle is distribution for PMF validation/iterations. Besides the plethora of spammy solutions via automated personalized email and LinkedIn campaigns (which has ex…

agentic
Building an AI Agent in 6 Weeks (and Understanding How They Work) (belderbos.dev via hn) +1 9d

Building an AI Agent in 6 Weeks (and Finally Understanding How They Work) Building agentic AI? I co-run a 6-week cohort where you ship a production-ready agent, not another API wrapper.

agentic
OBS Agentic Control Interface (github.com via hn) +1 9d

🚀 OBS Agentic Control Interface (obsagent) [!NOTE] 🤖 100% Coded by AI: This entire repository and application was engineered 100% autonomously by Antigravity, an agentic AI coding assistant. A powerful, self-contained agentic interface bui…

agentic
Show HN: The Ruby AI Newsletter (rubyai.beehiiv.com via hn) +1 9d

Now on its 32nd edition, the Ruby AI Newsletter tracks what’s happening at the intersection of Ruby, Rails, and AI coding agents. YC recommends Rails for new startups, YC’s internal software like Bookface, Work at a Startup, and the softwa…

agentic
Bayer's PRINCE: a production agentic RAG system (martinfowler.com via hn) +1 9d

Building Reliable Agentic AI Systems A Case Study in building production-ready agentic AI systems This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted platform developed by Bayer AG with Thoughtworks to address p…

rag agentic
Logical Ways to Track AI Agent Lineage and State in Code Development (davenporter.substack.com via hn) +1 9d

How to Track AI Agent Lineage and Manage State in Code Repositories Moving beyond clean git commits to knowledge systems for agentic development. “Keep your git commits clean.” It’s a goal for everyone, but we always struggle to actually d…

agentic
Show HN: Skill Atlas – Local, visual IDE for Agentic Skills (BYOK, no back end) (github.com via hn) +1 9d

Skill Atlas is a standalone, serverless IDE designed specifically to solve this problem. It parses your entire skill repository and automatically constructs a visual Directed Acyclic Graph (DAG).

agentic openai anthropic
There's no such thing as an agentic CPU (www.theregister.com via hn) +1 9d

↯ Security security agentic
Ask HN: How do open source companies make money? (news.ycombinator.com) +11 9d

I'm new to the open source side of things and I'm trying to learn as much as I can. I'm looking for material recommendations about Open Source Software (especially in the age of agentic AI). Can you drop your favorite ones?

agentic
Show HN: Dao Browser – An Opinionated Browser. With AI Agent, BYOK (github.com via hn) +1 10d

Dao Browser: An AI-native, content-first Chromium-based browser with a vertical tab sidebar, built for the agentic web. Built-in AI Agent Dao isn't a browser with an AI extension b…

agentic
Talk: Python Type Checking in Agentic Workflows [video] (www.youtube.com via hn) +1 10d

agentic
Welcome to the Agentic Era. Ready or Not (www.mikehyland.com via hn) +1 10d

The conversation has shifted from chatbots to systems that work while you

agentic
Agentic AI PRs sit in the review queue 5.3x longer than unassisted ones (blog.codacy.com via hn) +11 10d

AI Is Breaking Code Review: How Engineering Teams Survive the PR Bottleneck AI coding tools have made it easier to produce code, but they have not made it easier to ship it safely. Pull request queues are growing faster than review capacit…

agentic
Show HN: Ghostty in-browser with real client-side back end (ghosttyplayground.com via hn) +1 10d

A real work in progress I've got here. Our good friend Ghostty is now haunting the browser.

agentic
SAMF- Deterministic Moscow guardrails for LLM multi-agent loops (github.com via hn) +1 10d

SAMF: SAWANT Agentic MoSCoW Framework Structural MoSCoW contracts for deterministic LLM validation. What is it?

agentic
Agentic loops don't fix lying agents (tsdevstack.dev via hn) +11 11d

Agentic loops don't fix lying agents Published June 15, 2026 by gyorgy The current discourse says you should stop prompting coding agents and start designing loops around them. Give the agent a trigger and a verifiable goal, let an evaluat…

agentic
Tell HN: Forget selectors and screenshots. The agentic web lives in your shell (news.ycombinator.com) +1 11d

These old ways are too heavy. Full self browsing doesn’t require Elon Musk vision processing.

grok agentic
Agentic-fs, a cloud-hosted filesystem for AI agents (github.com via hn) +1 11d

agentic-fs Filesystem-style access to your documents, for AI agents, in your own AWS account. list / glob / grep / tree / find / ranged read over your documents in your S3, exposed through MCP (and REST).

mcp agentic
Ask HN: How can we democratize agentic coding (news.ycombinator.com) +1 11d

One of the beauties of the tech industry is that anyone can learn how to program. Most languages are open source and a lot of the foundational software that makes up the bedrock of the most powerful tools today is also free and open source.

agentic
Agentic Credit Card MCP (robinhood.com via hn) +1 11d

Agentic Credit Card Setting up a Robinhood Agentic Credit Card account offers you more opportunities to automate your spending. You’ll first need to connect a third-party AI agent and then follow the on-screen steps to create your agentic…

mcp agentic
Evaluate Your Agentic Tooling (www.peterbaumgartner.com via hn) +1 12d

Status: WIP tl;dr: Evaluate all your agentic tools in realistic end-to-end agentic tasks. Claims about token reduction from tools don’t transfer from experimental conditions to all agentic workflows.

agentic
Show HN: Open-Sourced Approxima, Our Agentic QA Tool to Catch Breakages Faster (github.com via hn) +1 12d

Hi HN, we were in the YC W26 batch and made Approxima, a web agent that could follow user journeys and verify them. Today, we made it open source (MIT) and it's fully self-hostable.

agentic
New Claude Opus 4.6, Stock Sell-Off and Super Bowl Ads (cmpld.ai via hn) +1 12d

🚦 Market Signals Anthropic launches Claude Opus 4.6 with 1m context The all new Opus 4.6 "plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skil…

↯ Opus 4.6 opus agentic anthropic
Great Reshuffling of the Agentic Era: The 6 Career Archetypes (aidoses.substack.com via hn) +1 13d

Great Reshuffling of the Agentic Era: The 6 Career Archetypes Dose #8 — Production Agentic AI Under Pressure Before LLMs, the AI world was simple to map. Three tribes.

agentic
The Agentic Payments Map (www.fintechbrainfood.com via hn) +1 13d

Weekly Rant 📣 🧠 The Agentic Payments Map ACP, UCP, A2P, MPP, x402. If your eyes just glazed over, you’re not alone.

agentic
Small Context, High Parallelism: How To 10x Reduce Agentic Coding Costs (simon-free.github.io via hn) +1 13d

Reducing Total Token Consumption of Agentic Coding TL;DR — Two levers reduce cost: 1. Fewer turns (parallel tool calls → fewer API round-trips) 2.

agentic
Launch HN: BitBoard (YC P25) – Analytics Workspace for Agents (bitboard.work via hn) +1 13d

We’re Connor and Ambar from BitBoard (https://bitboard.work). BitBoard is an agentic analytics workspace.

agentic
Using Cloudflare's Agentic Interface to (Mostly) Seamlessly Launch a Website (theautomatedoperator.substack.com via hn) +1 13d

Using Cloudflare's Agentic Interface to (Mostly) Seamlessly Launch a Website A small task, but a nice peek into how things may look when our agents are taking care of tedious tasks in the background. I recently had to set up a website.

agentic
Show HN: Goloop – An agentic loop on your terminal (mantyx-io.github.io via hn) +1 2w

Supervisor / worker split The planner never touches files. A dedicated worker does every edit — clean separation, predictable runs.

agentic
Agentic SDLC Orchestration vs. Synchronization: Choosing Modular Workflows (docs.overcut.ai via hn) +1 2w

Discover why centralized workflow engines fail AI-driven engineering teams, and how modular SDLC orchestration enables agent autonomy and event-driven agility.

agentic
Agentic Memory Management for GPU Code Generation (ucbskyadrs.github.io via hn) +1 2w

Agentic Memory Management for GPU Code Generation This post is part of the AI-Driven Research for Systems (ADRS) blog series, where we explore how AI can be applied to systems research. We feature exciting work from Makora this week!

agentic
4 Signs You Need a Multi-Agent AI System: A Visual Guide (aidoses.substack.com via hn) +1 2w

4 signs you need a Multi-Agent AI System: A Visual Guide Dose #6 — Production Agentic AI Under Pressure There’s a tempting pattern in how people build with AI: when something doesn’t work, add more. More tools, more instructions, a bigger…

agentic
Visa Vulnerability Agentic Harness for Project Glasswing (github.com via hn) +1 2w

Visa Vulnerability Agentic Harness — Agentic SAST Pipeline VVAH is Visa's open-source harness for autonomous vulnerability discovery using frontier AI models, built on learnings from Project Glasswing (Anthropic's initiative for AI-assiste…

↯ Security security agentic anthropic
The Agentic Team Manifesto (github.com via hn) +1 2w

Manifesto for Agentic Teams We are discovering better ways of building software by combining human judgment with AI agents. Through this work we have come to value: Outcomes over output More code is not more value.

agentic
Agentic Frameworks (astledsa.substack.com via hn) +1 2w

Agentic Frameworks Or different ways to make LLM API calls The agentic framework research has produced some very interesting results; from different topologies to different ways of using tool-calls, it has been one of the most fascinating…

agentic
Agent Judge: Solving Long-Context Evals for Production Agents (www.judgmentlabs.ai via hn) +1 2w

Agent Judge: Solving Long-Context Evals for Production Agents Why production agent evals need agentic judges that can search, verify, and adapt. Moving Away From Simple LLM Judges Most teams evaluate agent trajectories with a simple LLM ju…

agentic
Using Xcode 27's Agent Skills in Claude, Codex, and Cursor (www.avanderlee.com via hn) +1 2w

Apple launched Xcode 27 during WWDC’26, introducing a bunch of agentic development improvements, including official agent skills. As you’ve learned from my 9-Step Framework for Choosing the Right Agent Skill, it’s important to pick skills…

codex cursor agentic
Show HN: Skillzmouse: Distributed skills and scripts for agentic coding (bitbucket.org via hn) +1 2w

agentic
Show HN: Eatmydata.ai – Local-First Question-to-SQL-to-Dashboard AI (eatmydata.ai via hn) +1 2w

Yet another "talk to your data and build a dashboard" app, where data does not leave your browser. You ask a question, agents produce multiple SQL queries to in-browser sqlite, never seeing results, and write dashboard configuration code.

agentic
Agentic Engineering Handbook – 115 official OpenAI/Anthropic articles (github.com via hn) +1 2w

Agentic Engineering Handbook The definitive OpenAI, Anthropic, MCP, Harness, Evals, and Production Agent Systems learning roadmap. If this repository helps you, consider giving it a ⭐ Why This Repository?

mcp agentic openai+1
Show HN: Joka.work – AI-native ticketmaxxing to replace Jira in the agentic era (joka.work via hn) +1 2w

agentic
Chaveta – Agentic Synthetic Data Curation Platform (chaveta.beaglabs.com via hn) +11 2w

Chaveta is an agentic dataset generation platform designed to streamline the creation of synthetic data for training and robotics applications. With Chaveta, users can easily request, classify, compile, author, validate, repair, and export…

agentic
Show HN: Cate – open-source canvas IDE for agentic coding workflows (cate.cero-ai.com via hn) +1 2w

An infinite zoomable canvas where terminals, editors, and browsers float spatially. Code the way you think.

agentic
Agentic surface area as an operating metric (arizenai.com via hn) +1 2w

Your Company's "Agentic Surface Area": The New Metric for Competitiveness Your CEO asks: "How much of our operation is AI-powered?" The uncomfortable part is that the question sounds simple and usually has no clean answer. Teams can name p…

agentic
CoAnalyst360 Multi-Agent AI Platform for Investigative Questions (www.penlink.com via hn) +1 2w

CoAnalyst360 Launch: Penlink's Agentic AI for Investigations | Penlink

agentic
Show HN: Storytime – Continuity for Claude Code (and other ideas) (1ps0.info via hn) +1 2w

Since LLM harnesses (Claude Code included) are moving fast, I figured it would be better to put this out than wait to validate each and every claim. I crammed a lot of ideas in here!

agentic claude-code
Configuring Agentic AI Coding Tools: An Exploratory Study (arxiv.org via hn) +1 2w

Agentic AI coding tools increasingly automate software development tasks. Developers can configure these tools through versioned repository-level artifacts such as Markdown and JSON files.

agentic
HPE ProLiant Compute DL394 Gen12 Brings Nvidia Vera CPU to Agentic AI (www.storagereview.com via hn) +1 2w

At COMPUTEX 2026, HPE announced the ProLiant Compute DL394 Gen12, a next-generation 2U server built around the NVIDIA Vera CPU. The platform is designed to support emerging agentic AI and data-intensive workloads that require high memory b…

agentic
Show HN: Pokayoke – deterministic guardrails for agentic coding (pokayoke.codes via hn) +1 2w

Lately I've found myself having to write a lot of custom scripting in order to get my agents and coding assistants to adhere to the repo conventions and idiosyncrasies that I like to use in my projects. AGENTS.md files only seem to get me…

agentic
A Case for Simulation-Driven Resilience in Agentic Data Systems (muratbuffalo.blogspot.com via hn) +1 2w

A Case for Simulation-Driven Resilience in Agentic Data Systems As I mentioned in my previous post, I traveled to San Jose at the end of May for the ACM CAIS conference. On Day 0, I gave a very short talk at the Supporting our AI Overlords…

agentic
Prompt Injection in RAG Agentic Systems (ulad.net via hn) +1 2w

Prompt Injection in RAG Agentic Systems Real risks and production mitigations Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.

↯ Security prompt-injection rag security+1
Pizx – zx and Pi AI = shell scripting with 15 AI agent patterns (github.com via hn) +1 2w

pizx zx fork with native Pi AI integration — 15 template tags for shell scripting, AI text generation, coding agents, agentic patterns, communication, and orchestration topologies. Quick Start npm install @topce/pizx pi auth login # one-ti…

agentic
Opra.ai: GitHub-native governance for agentic business workflows (github.com via hn) +11 2w

opra.ai Free, GitHub-native operating layer for governed business workflows. opra.ai stores business records as human-readable files, validates them locally, runs governed mutations through RBAC and approval policy, emits audit evidence, a…

agentic
A Categorical Framework for Agentic Artificial Intelligence (arxiv.org via hn) +11 2w

Scientific discovery is not only answer generation but revision of the representational regime in which evidence, artifacts, operations, and verifiers are typed. We develop a category-theoretic account of agentic discovery for materials sc…

agentic
An open standard for production agents – with runnable security checks (github.com via hn) +1 2w

The Agentic Product Standard A canonical standard for building production-grade agentic products — plus a Claude Code skill set that operationalizes it. Distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangC…

agentic openai anthropic+1
Show HN: Summarize YT Video by pasting url into AI chat (www.youtube.com via hn) +1 2w

We added tooling to our chat to make it agentic. It can control our 40+ apps suite.

agentic
Agentic Search for Context Engineering (leoniemonigatti.com via hn) +1 2w

This post is an edited long-form version of the workshop titled “Agentic Search for Context Engineering” I gave at AI Engineer Europe 2026 on April 8, 2026 in London. The slides, code, and diagrams are available in the workshop repository.

agentic
Show HN: Simple attributes for spec-driven agentic workflows (C#, Rust) (github.com via hn) +1 2w

I created a custom compilation error and Unit Test Runner for BDD Cucumber Specifications in Gherkin Syntax. Both C# and Rust are supported using Source Generators and Procedural Macros.

agentic
Autonomous Agentic Design for Photonics (arxiv.org via hn) +11 2w

We introduce an automated, agent-driven approach to the design of photonic devices. We instruct large language models (LLMs) to solve photonic design problems, given access to software tools for performance evaluation (through numerical si…

agentic
Agentic communication protocol – why A2A sucks (asimovaddendum.substack.com via hn) +11 2w

Agents Need a Public Square Why better agent discovery is needed – and why broadcasting may be the answer The Agent2Agent (A2A) protocol was announced by Google a little over a year ago (April 2025). It was built to allow agents to communi…

agentic
Verifying Agentic Development at Scale (twitter.com via hn) +1 3w

Verifying Agentic Development at Scale What we’ve learned building end-to-end testing capabilities in Devin’s virtual machine. 3 months ago, I joined Cognition to help build the future of software engineering.

devin agentic
Show HN: Bonsai – Using agentic AI / browser / memory to replace ChatGPT (drive.google.com via hn) +1 3w

chatgpt agentic
Rayfin, Back end-as-a-Service (BaaS) platform built for the agentic era (github.com via hn) +1 3w

🐟 Rayfin A modern Backend-as-a-Service (BaaS) platform built for the agentic era. Define your data model with TypeScript decorators — Rayfin provisions and manages the backend for you.

agentic
Rewiring software delivery for the agentic era (www.mckinsey.com via hn) +1 3w

agentic
The Return of Soft Skills in the Age of GenAI and Agentic Software Development (cacm.acm.org via hn) +1 3w

agentic
ReARM 26.06.5: Agentic Coding Guardrails and DevOps (rearmhq.com via hn) +1 3w

ReARM 26.06.5: Agentic Coding Guardrails and DevOps 2026-06-01 We're announcing a major release of ReARM v26.06.5. Detailed information is available on its release view on the ReARM Demo instance.

agentic
Show HN: AI Gauge, a desktop monitor for Claude/Codex/Copilot usage limits (github.com via hn) +1 3w

Hi HN, new account but long-time reader. I built this for myself because I kept manually checking usage across Claude, Codex, and Copilot, and wanted to track the session and weekly usage all in one place.

↯ Copilot copilot codex agentic
The Agentic Test Pyramid (matthewboston.com via hn) +1 3w

The Agentic Test Pyramid One Axis Isn’t Enough Anymore Martin Fowler’s test pyramid — and Ham Vocke’s practical write-up of it on Fowler’s site — sorts tests along a single axis: integration scope. Unit at the bottom, integration in the mi…

agentic
Show HN: Yoga for Agentic AI: Cognitive training practices from a yoga studio (github.com via hn) +1 3w

I've been coding since I was little, and practicing yoga since I was 25. Both are fun to do and to share.

agentic
Get paid by Agents if they choose a competitor – Safe Agentic Commerce x402 Mesh (github.com via hn) +1 3w

x402-mesh An open peer-pricelist and referral protocol for safe agentic commerce, layered on top of x402. When an AI agent hits a paywall, it sees one price and one vendor.

agentic
Running an AI-native engineering org – Claude (claude.com via hn) +1 3w

Running an AI-native engineering org At Code w/ Claude SF 2026, Director of Engineering for Claude Code and Claude Cowork Fiona Fung walked through how the team’s processes and structure changed once agentic coding became the default way o…

↯ Cowork cowork agentic claude-code
Guide to Codex Goals (www.augmentedswe.com via hn) +1 3w

The ultimate guide to Codex goals Learn how to use goals in Codex to execute on long-running tasks Goals are an awesome new addition to Codex, and I’m super pumped about what they mean for agentic software development. Goals are a built-in…

codex agentic
Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon (vllm.ai via hn) +1 3w

Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon LLM Agents Long-horizon LLM agents create a routing problem that single-turn prompt routers were not designed to solve. A router still needs to know which mod…

agentic
Ubuntu 26.04 is the OS for the AI agentic era, says Canonical's Shuttleworth (www.zdnet.com via hn) +1 3w

Ubuntu 26.04 is the OS for the AI agentic era, says Canonical's Mark Shuttleworth - here's why. ZDNET's key takeaways - Ubuntu 26.04 is designed from the ground up for AI developers.

agentic
Claude Code vs. Cursor vs. Codex vs. Antigravity – Six Months In (thenewstack.io via hn) +1 3w

By June 2026, Claude Code, Cursor, Codex, and Antigravity converged on one agentic coding blueprint—now Grok Build joins the fight over price and habits.

grok codex cursor+2
Show HN: ASys – A typed binary protocol for AI agents to operate servers(no SSH) (github.com via hn) +1 3w

ASys — Agentic System Interface The binary system interface protocol for AI Agents — port 7816, zero shell parsing, deterministic semantics. English | 中文 Table of Contents Why ASys Architecture Instruction Set Quick Start Security Document…

agentic
APM and Distributed Tracing in agentic era (engineering.theblueground.com via hn) +11 3w

Blueground Engineering's observability guide to APM: why tracing matters, auto-instrumentation strategies, custom span best practices, and AI-enhanced debugging workflows In Part 1, we covered logging as your forensics tool for understan…

agentic
When Agentic AI Met the Common Law of Agency [pdf] (download.ssrn.com via hn) +1 3w

agentic
An agentic system from scratch to generate Google slide deck from templates (blog.owulveryck.info via hn) +1 3w

The Agentic Mesh in Practice: Anatomy of an Agent-Product I am a consultant, and I regularly build presentations with Google Slides. My communication team has created dozens of pre-formatted templates (slides designed to convince, not just…

agentic
HashCortX – Agentic 11 modes orchestrator by a pharmacist (news.ycombinator.com) +11 3w

agentic
Algolia: Agentic. Generative. Search (www.algolia.com via hn) +1 3w

Powering AI retrieval across use cases More than 18,000 customers across 150+ countries use Algolia to power agentic, generative, and search experiences across these use cases and more.

agentic
Show HN: One-click open-source ecommerce starter (Magento), drive it with Claude (ecommerce-ai-starter.graycore.io via hn) +1 3w

I build Ecommerce stores for a living (Magento Open Source primarily), and the part that has always been the worst is the very beginning, especially so if you're on a team of people. Getting a working local environment means setting up the…

agentic
Show HN: Cloud CI and agentic workflows for embedded hardware development (github.com via hn) +1 3w

Jumpstarter is an open-source framework that gives embedded hardware programmatic APIs, making real devices first-class citizens in CI and agentic workflows.

agentic
When Background AI Agents Become a Security Boundary Problem (www.originhq.com via hn) +1 3w

When Background AI Agents Become a Security Boundary Problem Introduction Modern dev environments are full of powerful agentic tools that security teams don't fully understand yet. Claude Code is one of the most capable - it runs code, rea…

agentic claude-code
Ask HN: How much is fully agentic coding costing you per month? (news.ycombinator.com) +1 3w

I get unlimited cursor usage at work but am planning on starting a side project. I have no idea how far various pricing plans will get you.

cursor agentic
The Agentic Mesh: Cognitive Automation at Scale (blog.owulveryck.info via hn) +11 3w

The Agentic Mesh: Cognitive Automation at Scale Today, we see many initiatives around the agentic paradigm. Most revolve around systems built by AI giants (Anthropic, Google, OpenAI) and often boil down to pushing natural language directiv…

agentic openai anthropic
Ask HN: Books for someone who is transitioning from FAANG to finance (news.ycombinator.com) +1 3w

I have been an AI engineer for the last 10 years of my life, and have continued to build small algo-trading systems during my weekends. I'm getting into finance full time and starting to build a product in the net worth tracking / agentic…

agentic
How Excel got agentic (commandline.microsoft.com via hn) +1 3w

When Mukul Singh made the jump from pure research into product, it was a leap of faith. But he had an idea that he wanted to bring to life: delivering agentic AI capabilities in Excel. While this was well before buzzwords like “the agentic AI…

agentic
MIT EECS/CSAIL Agentic Coding in Practice Seminar Series (people.csail.mit.edu via hn) +1 3w

All Seminars Select a seminar below to expand full details, participation information, and resources. MIT EECS/CSAIL Seminar Series Exploring how AI agents are reshaping software engineering, compilers, and the future of programming system…

agentic
AI Tools for Sales and GTM (news.ycombinator.com) +11 4w

What are the best tools we are using for agentic sales and marketing?

agentic
Coding agent can read your .env file (bitwarden.com via hn) +1 4w

It seems agentic AI is here to stay. Powered by large language models (LLMs), AI agents can act independently on behalf of humans in multi-step workflows, broadening what developers once thought was possible.

agentic
Agentic Infrastructure (www.reddit.com) +1 4w

I was planning on deploying Splunk or some other server monitoring software, but instead I decided to deploy an agent per server to collect telemetry and report back. The interesting bits: (1) every "service" is a claude-code session — the…

operator agentic
Top 5 AI Agent Research Papers/Projects I Found Interesting This Week (www.reddit.com) +12 4w

Compiled a few interesting research papers and projects around AI agents, reasoning systems, and autonomous workflows published recently. If you are tracking where agentic AI is heading, these are worth checking out.

agentic
GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) (www.reddit.com) +119 4w

Trying to figure out the right box for my team and wanted to see if anyone had any clue which would be a better fit or if it is not worth our time in our budget. Situation: 5 of us doing agentic coding (lots of long context getting re-sent…

↯ DeepSeek 4 moe deepseek agentic
What your agent's spend receipt isn't telling you (www.reddit.com) +13 4w

Budget limits and post spending monitoring are standard (and a must) on any serious agentic setup. The question worth asking isn't whether you're tracking spending.

agentic
Please test my AI Agent (www.reddit.com) +13 4w

I'm basically begging for some people to try out my custom agentic harness system. It's fully usable, currently set up for the Gemini SDK, but easily swappable.

gemini agentic
Cursor has been ridiculously slow (www.reddit.com) +13 4w

I'm a pro Cursor user and I've been using the Auto agent for a while now. I haven't even finished half of it for the month, but the problem is that in each chat session, at around 3-4 prompts, Cursor just starts to be very slow, connection requ…

cursor agentic
Stop Claude Code from burning your token budget on Go repos: I built a local AST-based MCP server (gograph) (www.reddit.com) +12 4w

Hey r/claudeai, If you leverage Claude Code or Claude Desktop for agentic development on large-scale codebases, you have likely run into a major architectural bottleneck: standard agent loops rely on primitive text processing tools and str…

↯ Model Context Protocol model-context-protocol mcp agentic+1
Ask HN: Examples of products and services created via agentic coding (news.ycombinator.com) +1 4w

It has been many months since LLM coding tools reached maturity - has anyone created something, and/or a profitable service or product, through purely agentic coding?

agentic
Can someone breakdown A2A(agentic commerce) business model? (www.reddit.com) +12 4w

I have been seeing a lot of blogs, posts and even a lot of pitches regarding "agentic commerce" or "B2A and A2A businesses" lately. While I kind of understand how Business to agent(B2A) could look, can't really picture or understand the bu…

agentic
AgentSafeLabs – Launched Open-source Security framework for AI agents (github.com via hn) +1 4w

safelabs-eval Open-source red-teaming and evaluation framework for AI agents — aligned to the OWASP Agentic Security Initiative (ASI) Top 10. AI agents built on LangChain, CrewAI, AutoGen, and custom frameworks ship to production without s…

agentic
Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? (www.reddit.com) +114 4w

People are warning me about the prompt-processing speed of a MacBook Pro M5 Max with 128 GB RAM. My main concern is prompt ingestion / prefill latency and large-context handling — not raw token generation speed (which I think is OK).

↯ Qwen 3.5 moe rag qwen+2
Opus 4.7 is Terse (www.reddit.com) +11 4w

Relevant for anyone building agentic workflows on Claude: behavior drift between model releases is real and not always in the changelog headline. Opus 4.7's terser, more literal default broke the readability of my agents' progress reports…

↯ Opus 4.7 opus agentic
Nvidia H100(94GB VRAM) - should I run llama.cpp or vllm for 30 users inference? (www.reddit.com) +11 4w

I was given the great opportunity to borrow a H100 with 94GB VRAM at work until it is needed by a customer. (No idea how much system ram I will get, but I guess they are a bit flexible on this).

↯ Qwen 3.6 vllm llama agentic
Show HN: Moltnet, a tiny self-hostable chat network for agentic organizations (github.com via hn) +1 4w

Moltnet A lightweight chat network for AI agents. Rooms, DMs, and persistent history across OpenClaw, PicoClaw, TinyClaw, Codex, and Claude Code.

openclaw codex agentic+1
Evolving Webflow for the Agentic Web (webflow.com via hn) +1 4w

Earlier today, I shared this news with Webflow employees. I’m sharing a version of that message here, because this is an important moment for Webflow, our customers, and our community.

agentic
Show HN: Detect anti-bot, anti-agent defenses for any website (botscope.org via hn) +11 4w

BotScope — Audit anti-agentic defenses for any website.

agentic
Looking for genuinely creative AI models for a marketing agent (preferably free/open-source) (www.reddit.com) +11 4w

I’m building an agentic AI system for marketing/creative campaign generation, and I’ve noticed that most mainstream models (OpenAI/Gemini etc.) feel very “safe” and generic when it comes to creativity. They’re good at structured outputs, b…

gemini agentic openai
From Chatbot to Agentic Endpoint, and Beyond (yy8402.github.io via hn) +1 4w

Chat is the easiest way to start working with AI, but it is not where all AI work needs to happen. For reasoning, drafting, summarizing, brainstorming, and planning, the conversation itself can be the workspace.

agentic
Build an agent capable of complex programming tasks in under 100 lines of code. (www.reddit.com) +13 4w

The code below is an interactive agent capable of handling complex tasks, built in under 100 lines of code using huko-engine. If you just want to drop some agentic features into your existing app, it only takes 20 lines.

↯ DeepSeek 4 deepseek agentic
How to improve current agent workflow (www.reddit.com) +11 4w

It took me a while to come round to the idea of using agents/LLMs; however, instead of trying to fight it / deny it, I have come to terms that it’s here to stay. So I reckon it’s better to learn how they can fit in my workflow and not be left…

agentic
Trustworthy Agentic AI Layer (www.reddit.com) +11 4w

I’m building an early tool called Synapsor for AI agents that need governed memory, staged writes, replay, permissions, and audit trails. I’m not doing a public launch yet.

agentic
Bill Gates AI on AI (one month later) (news.ycombinator.com) +1 4w

# The Agentic Tidal Wave *To:* Executive Staff and Direct Reports *From:* Bill Gates *Date:* April 26, 2026 Our vision for the last 20 years can be summarized in a succinct way. We saw that exponential improvements in cloud would make grea…

agentic
ACM Conference on AI and Agentic Systems – ACM CAIS 2026 (www.caisconf.org via hn) +1 4w

Building the Future of Agentic & AI Systems ACM CAIS 2026 — The premier venue for rigorous, reproducible research on compound AI architectures, optimization, and deployment. CAIS hotel room block & rates available until April 26 May 15 Dou…

agentic
Private 5G, Agentic BSS and Starter Kit Demos (www.cloud-net.ai via hn) +1 4w

News Cloudnet.ai & CloudRAN.AI are heading to Copenhagen 🇩🇰 Private 5G, Agentic BSS & Starter Kit demos We’re excited to share that CloudRAN.AI will be joining Cloudnet.ai at DTW Ignite 2026 by TM Forum, taking place 23–25 June 2026 in Cop…

agentic
Who Wants to Be Hired? (May 2026) – AI Engineer (Python, RAG, Agentic Workflows) (news.ycombinator.com) +1 4w

About me: I am an AI Product Engineer specializing in building autonomous agentic workflows. Recently, I built 'Jarvis', a multimodal autonomous agent featuring near-zero latency inference using Groq SDK and complex RAG pipelines.

rag agentic
Taming the agentic influx: a blueprint for AI business observability (thenewstack.io via hn) +1 4w

Taming the agentic influx: a blueprint for AI business observability Kin Lane, API industry analyst and co-founder of Naftiko, believes that the bill for AI is coming soon. It’s arriving on top of an overdue tab that has been quietly accum…

agentic
Polar: Agentic RL on Any Harness at Scale (arxiv.org via hn) +1 4w

Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remain…

↯ Tool Use tool-use agentic
Agentic coding in a large production codebase: wins, failure modes, and guardrails (www.reddit.com) +13 4w

We recently interviewed engineers on our team across database management, iOS, frontend, data engineering, and backend domains about how AI is changing their day-to-day work. The most interesting theme was that the hard part came after the…

agentic
Why domain valuation metrics fail in agentic and voice-first environments (domainalot.substack.com via hn) +1 4w

What Makes a Premium Domain in 2026? And why legacy domain marketplaces still operate as though it was 2016.

agentic
I made a free webtool for you to make a massive agentic decision-making organism, and it's cute! (www.reddit.com) +12 4w

Solasterid Studio! It's shaped like a starfish, but it's a decision-making powerhouse, and it grows automatically.

agentic
I made a video breaking down Claude Team plan security features (www.reddit.com) +12 4w

I put together a YouTube video walking through the security features available on the Claude Team plan. If you're rolling out Claude at work, evaluating Claude vs ChatGPT Enterprise, or preparing for an ISO 42001 / EU AI Act audit, this is…

↯ Cowork cowork chatgpt mcp+1
Show HN: Aquifer – a control plane for agentic API traffic (github.com via hn) +1 4w

Aquifer — API Aqueduct Self-hosted API request queue. Controls the pace of inbound and outbound traffic so partial outages don't cascade.

agentic
The Autonomous Economy Is Already Here (www.reddit.com) +11 4w

How Agentic AI, Deep Liquidity Markets, and Crypto Infrastructure Are Birthing a Multi-Trillion Dollar Machine Macroeconomy Hey everyone, I’ve been spending the last few months diving deep into the structural intersection of LLMs, automate…

agentic
Agentic AI to perform Booking of tickets (www.reddit.com) +13 4w

Can anyone share the details for below ask: Building an Agentic AI system for online ticket booking. I need the setup to watch for opening of tickets system.

agentic
What Is an AVE Record and Why CVE Does Not Work for AI Agents? (www.reddit.com) +15 4w

CVE was built for code vulnerabilities that have patches. Agentic AI vulnerabilities are behavioral patterns in natural language.

↯ Security prompt-injection security mcp+1
This agent isn't bad... your patience is. (www.reddit.com) +11 4w

I genuinely think a lot of people tried Manus for a few hours, gave it a few vague prompts, watched it mess up once and immediately decided the whole thing was “overhyped”. Meanwhile the people actually getting insane results out of it are…

operator agentic
Ask HN: Did agentic coding change the way you think about commit granularity? (news.ycombinator.com) +1 4w

Jujutsu is trending on the homepage, and the topic is using discipline when dealing with version control. Six months into working agentically on a daily basis, something changed for me.

agentic
Out of Band, Not Out of Prompt: Intent Verification for Agentic Tool Calls (hyperautomation.substack.com via hn) +1 4w

Out of Band, Not Out of Prompt: Intent Verification for Agentic Tool Calls Intent attestation is the property the four-boundary agent stack needs. The in-prompt "are you sure?" confirmation cannot provide it.

agentic
Evaluating Quarkdown for Agentic Typesetting (quarkdown.com via hn) +1 4w

An eval of the Quarkdown agent skill. The agent skill shipped in Quarkdown 2.1, aiming at making it easier for agents to write correct and idiomatic Quarkdown for a frictionless authoring experience. If you already have the CLI…

agentic
Zotero use skill for Codex (www.reddit.com) +11 4w

This will be of interest to academic researchers who use Zotero for reference and knowledge management and in scientific writing. This skill builds on pyzotero library and has agentic instructions for creating embedded zotero inline citati…

codex agentic
I built a Real-time data fetcher mcp, any takers? (www.reddit.com) +12 4w

As the title suggests, I'm looking to gauge interest in a real-time data fetcher MCP. I think right now most of the MCPs are related to coding, and even AI agents are related to coding, but I think the use cases will expand a lot in the future.

mcp agentic
A Language for Describing Agentic LLM Contexts (arxiv.org via hn) +1 4w

Large language models are increasingly used within larger systems ("LLM agents"). These make a sequence of LLM calls, each call providing the LLM with a combination of instructions, observations, and interaction history.

agentic
professional annotation for architecture diagrams for agentic AI (www.reddit.com) +12 4w

I am learning how to build agentic AI systems at the moment, a friend helps me, and I read a lot on Substack. I find it really strange that all architecture diagrams have the same symbol for everything.

agentic
I built 10 gamified, interactive presentation decks using Claude Code to teach Agentic AI (Stop falling asleep reading whitepapers). (www.reddit.com) +11 4w

Hey everyone, I've noticed a massive gap in how developers are trying to learn Agentic AI right now. There are hundreds of theoretical whitepapers and boring PowerPoint decks about ReAct loops, GraphRAG, and Semantic Routing.

agentic claude-code
Salesforce (www.reddit.com) +11 4w

Salesforce is facing growing scrutiny after a recent Bloomberg investigation raised questions about the gap between Agentforce marketing and real-world deployment. The report focused on Salesforce’s flagship “agentic AI” platform, Agentfor…

agentic
Pi-Mojo – A Mojo Port of Pi AI Agent Toolkit (github.com via hn) +1 4w

pi-mojo is a native Mojo port of Pi, a popular, tool-efficient agentic AI platform (utilizing only 4 core tools) prominent in open-source systems like OpenClaw. It provides the Mojo community with a compiled, self-contained refere…

openclaw agentic
Google adds llms.txt check to Chrome Lighthouse (searchengineland.com via hn) +1 4w

Google’s new Lighthouse “Agentic Browsing” audits now check for the presence of an llms.txt file. The new experimental Lighthouse documentation frames llms.txt as a discoverability and efficiency signal for AI agents, not a traditional cra…

agentic
Product Integrations (www.reddit.com) +14 4w

Hi there, from past few weeks I have been working on several product iterations of my MCP based Search Engine for Coding/Research Agents, it's called NineLayer. One of the early feedbacks we received was that latency is too high, so we wor…

mcp agentic
Built a production RAG chatbot with custom MCP servers as the action layer, sharing what I learned (www.reddit.com) +13 4w

I've been building agentic tooling at work and wanted to share one pattern that worked. Instead of a chatbot that only retrieves and answers, I wired custom MCP servers in as the action layer, so staff trigger live workflows (create record…

rag mcp agentic
Ask HN: Why agentic development stops from 2023 (news.ycombinator.com) +1 4w

I left this field in 2023 and returned in 2026, and I see progress only in coding agents; some production solutions are just tools, RAG, and maybe MCP, which is in general the same as a tool. I thought it will be super leap…

rag mcp agentic
Lessons Learned Building Agentic Orchestrators (www.reddit.com) +15 4w

I wrote a pretty extensive blog (no AI used to write) detailing the relationship between AI agents, agentic harnesses, and agentic orchestrators. In addition, it includes a case study on how I built my own for an open source project.

agentic
Ask HN: How can you have fun doing corporate dev work in the age of AI tools? (news.ycombinator.com) +1 4w

My company, like many others, is heavily pushing agentic dev tools, putting up token usage leaderboards, etc. My problem is that corporate SWE work was already boring enough.

agentic
Local, low code, node based agentic development workspace... that actually works? (www.reddit.com) +11 4w

Does it exist? I've been trying a few options and so far they've all been either horribly broken, outdated abandonware, only take online endpoints, or want you to sign up for something.

agentic
This is for the beginner users of AI agents & workflows, I created a perfect tool for you almost accidentally (Free to try, no signup required) (www.reddit.com) +12 4w

I have been building a prompt engineering tool for 6+ months, it was designed for Text & Logic, Media Generation and Coding. The idea is, you enter your input, it finds the gaps, asks you how you want to fix them and generates a structured…

agentic
First AI to Beat Every Human in a Programming Competition - Agentic GRPO Explained (arxiv.org via reddit) +1 4w

Traditional RL for LLMs treats one answer as one trajectory: prompt > reasoning > final answer > reward. Agentic systems are different: they call tools, generate hypotheses, run tests, debug code, summarize context, revise plans, loop many times…

agentic
Ask HN: Where AI Researchers Congregate? (news.ycombinator.com) +1 4w

So I’m doing plenty of experiments and applied research in autonomous agents and agentic flows in general. I’m looking for a place where I could collaborate and discuss with other like minded people.

agentic
Two power users, very different workloads, what's the right Claude setup? Max x2 vs Team vs Enterprise (www.reddit.com) +14 4w

Committing for the year and want to make sure I am not missing something obvious. Two of us, currently sharing one account (splitting into two proper accounts, I know).

operator mcp agentic
DGX Spark agentic usage numbers (www.reddit.com) +16 4w

What I need it to do: Be able to support openclaw-type agent which is used by multiple people. What I tried: So I read in the internet about the atlas thing.

↯ Qwen 3.6 openclaw agentic
Help me choose an LLM Provider which doesn't take my life savings (www.reddit.com) +11 4w

Hi everyone 👋 I’m trying to choose an LLM provider for my personal projects and side experiments, but I also don’t want my API bill to quietly consume my entire salary 😅 My primary use cases are: coding assistance, agentic workflows, browser…

↯ Minimax minimax agentic
Codex CLI kept saying “done.” It wasn’t. So I made it prove it. (www.reddit.com) +18 4w

Codex CLI can write code. The problem is that “wrote code” and “finished the task” are not the same thing.

codex agentic
Agentic run businesses (www.reddit.com) +13 4w

Anyone have real success with ai agents helping run real businesses? I’m exploring how to leverage AI to build real businesses + run those businesses with oversight from me.

openclaw agentic
Run multiple AI coding agents simultaneously with isolated profiles (www.reddit.com) +12 4w

if you're running agentic coding workflows you've probably hit this: one account per tool, one session at a time. multi-cli fixes that.

gemini codex cursor+2
Lodestone: A SQLite-backed arXiv research paper retrieval system for Claude Code (www.reddit.com) +12 4w

(No AI-generated text below) I published a new Claude Code plugin called Lodestone -- it's a SQLite-backed arXiv research paper retrieval system that amplifies the agentic search abilities of Claude Code when grounding plans, implementati…

agentic claude-code
Food for Agile Thought #545: R/L Agentic Chaos, AI Killed the Agile Industry (age-of-product.com via hn) +1 5w

Welcome to the 545th edition of the Food for Agile Thought newsletter, shared with 35,577 peers. This week, Natalie Shapira et al.

agentic
Why Svelte Is Better Than React in the Agentic Era (zackwebster.com via hn) +1 5w

Why Svelte Is Better Than React in the Agentic Era May 21, 2026 Development I have been thinking more about how frontend frameworks feel when you are building with AI agents. Strictly speaking, this is not the same question as “Which frame…

agentic
CodeAlta – a terminal workspace for agentic coding (github.com via hn) +1 5w

CodeAlta is a terminal workspace for agentic coding. It brings model-provider setup, project navigation, prompt attachments, threaded sessions, delegated work, and trusted local plugins behind the alta command.

agentic
Prompt caching in MaaS and agentic systems (www.reddit.com) +14 5w

Counter-intuitive thing I keep explaining to teams building agents: dynamically picking 5 relevant tools per step instead of sending all 30 usually increases total cost over an agent's trajectory, even though every individual request is sh…

agentic
OpenAI and 1Password Bring Agentic Security to Codex (www.forbes.com via hn) +1 5w

Agentic security is picking up steam. This week, identity security provider 1Password announced a collaboration with OpenAI that will enable developers to provide Codex with secure access to credentials, such as passwords.

codex agentic openai
Show HN: ANML – A machine-first markup language for the agentic web (IETF Draft) (anmlfoundation.org via hn) +1 5w

A machine-first markup language for agent-to-agent and agent-to-service communication over the internet. ANML describes content, intent, and interaction patterns optimized for machine interpretation.

agentic
World Genesis: Autonomous Agent Civilization Simulator (github.com via hn) +1 5w

A research project by GeoLambda GmbH This simulation was developed primarily with Claude Code, Anthropic's agentic CLI, using both Claude Opus 4.6 and Opus 4.7. The collaboration served as a real-world stress test of the latest coding LLM…

↯ Opus 4.7 opus agentic anthropic+1
Assay – validation layer for AI agents that touch money (github.com via hn) +1 5w

assay Assay every AI agent decision before money moves. A safety and validation library for AI agentic workflows in finance, contributed by VenturFlow to the open-source community.

agentic
10-gate security audit SKILL for web apps (www.reddit.com) +11 5w

There are a few security-focused SKILLs. We are working on another new one for web apps.

↯ Hallucination aider hallucination cursor+2
I built a small tool to reduce input token costs by 20-30% for agentic tasks (bigindexer.com via hn) +1 5w

A walkthrough for the people scrolling through r/ClaudeAI, r/LocalLLaMA, and the Continue Discord asking "what's a good Cody alternative now that AMP charges per line?" If you're reading this, you probably already know the story. Sourcegra…

agentic
I'm running an agentic system with kobold.cpp as my backend. Am I losing performance? (www.reddit.com) +11 5w

Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B.

↯ Qwen 3.6 moe llama agentic+1
Benchmarking methods (www.reddit.com) +1 5w

The philosophies of benchmarking or at least comparing these things are driving me nuts. A lot of people like to use one-shot prompts across different models, but that isn't going to be accurate as you can get different results from the sa…

agentic
VCs invested $300B in agentic infrastructure in Q1 2026 (www.hitechies.com via hn) +1 5w

Startups · May 21, 2026 Venture capital deployed $300 billion in Q1 2026. The money is flowing.

agentic
China has named, defined and started governing agentic AI (thewire.in via hn) +1 5w

On 8 May 2026, three of China’s most powerful regulatory bodies, the Cyberspace Administration of China, the National Development and Reform Commission and the Ministry of Industry and Information Technology, jointly published what is, by…

agentic
Build agentic orchestrators in minutes NOT months. (github.com via reddit) +11 5w

Some of you might remember BoneScript, my LLM friendly declarative backend compiler. MarrowScript is the next version and the big addition is a full LLM harness built into the language itself.

agentic
Building Agentic Systems? Focus on Context, Guardrails & Observability Layers (www.reddit.com) +13 5w

One critical factor to keep in mind for teams building with agents: Instead of focusing on what LLM to use, focus on context, guardrails & observability layers. Every serious agentic system eventually faces the same architectural fork: do…

agentic
I searched for agentic frameworks and here is what I found. What do you recommend? (www.reddit.com) +14 5w

The question: What is the practical agentic framework to use to make the agents run until job is done without reporting to me prematurely? My goal: Actually fully spend a $200 codex subscription, but make it be well spent.

codex agentic
agentic harness from scratch (www.reddit.com) +13 5w

what makes a harness an agentic harness is surprisingly simple. it's a loop that calls an llm, checks if it wants to use tools, executes them, feeds results back, and repeats.

agentic
Agentic Shopping Is Worse for Everyone (illegal.solutions via hn) +11 5w

5/20/2026 I am tortured in new and exciting ways by the latest developments in technology. Today, as a part of Google I/O, Google announced the "Universal Cart" and another push towards "agentic shopping", this time with backing from a lar…

agentic
What am i missing? Am i thinking too simple/small with this setup... (www.reddit.com) +11 5w

I created an app using Vibe coding on claude with VSCode, then added database (surreal) LLM calls (Openrouter) with context and prompt engineering (3 layer context - Long term, medium and in session along with system prompt etc) and prompt…

agentic
Command A+: Making sovereign agentic capabilities available to all (cohere.com via hn) +11 5w

Today, we’re releasing Command A+ open-source. A mixture-of-experts (MoE) model, Command A+ is an efficient, versatile, and privately deployable LLM built for high-performance agentic tasks with minimal compute overhead.

moe agentic
Most local businesses still do SEO like it’s 2018… that’s the opportunity (www.reddit.com) +11 5w

A lot of small businesses still think SEO means: stuffing keywords, buying backlinks, writing generic blog posts nobody reads, waiting 8 months for traffic 😭 Meanwhile…

agentic
Show HN: OCL Nexus Local – Open-source local compute fabric for AI agents (github.com via hn) +1 5w

OCL Nexus Local is an open-source compute fabric that provides a frictionless, local-first environment for agentic development. Built on a single-node K3s architecture via Docker Compose, it allows developers to provision i…

agentic
HTML-anything – The agentic HTML editor – your local AI agent writes the HTML (github.com via hn) +1 5w

HTML Anything From the team behind Open Design — 40k★ · 200+ contributors, production-grade and iterating faster. html-anything is the focused agent-era HTML editor; if it clicks for you, Open Design is where the same team ships at scale.

agentic
How are you actually predicting AI costs before they hit your invoice? (www.reddit.com) +15 5w

Switched from prototype to production last month and our AI bill was 3x what we estimated. Not because we picked the wrong model - we just didn't know what we didn't know.

↯ Function Calling function-calling agentic
The Expired Domain Trap: Why Legacy SEO Metrics Fail in the Age of AI Agents (domainalot.substack.com via hn) +11 5w

The Great Domain Illusion: Why Legacy SEO Metrics Are Misleading Founders in the Agentic Era For more than two decades, entrepreneurs searching for a domain name have been sold the same story. Older domains are more valuable.

agentic
Where do you store OAuth tokens that your AI agents use to call third-party services? (www.reddit.com) +12 5w

I am building an agentic app where the agent connects to gmail, calendar, notion, slack on behalf of the user. each integration has its own oauth flow, its own token, its own refresh cycle.

agentic
My agent kept forgetting who 'Karpathy' was between sessions. Here's the architecture that fixed it (www.reddit.com) +12 5w

I run a second brain on Obsidian, Readwise, NotebookLM, and Claude Code. For each topic, I build a scoped wiki structured as the LLM Knowledge Base Andrej Karpathy proposed.

rag gemini codex+3
Food for Thought (www.reddit.com) +1 5w

Around the same period that the DoD contracts were signed, the frontier-AI companies were all being pulled into the same institutional lane. Enterprise/government adoption, agentic workflows, controllability, and a visible move away from t…

agentic
Systems Are Changing: The Architect's Role in the Era of Agentic Co-Design (www.sigarch.org via hn) +1 5w

Architecture & Systems are Changing: The Architect’s Role in the Era of Agentic Co-Design The AI datacenter stack is built on hardware-software contracts and abstractions that were never designed for the workloads datacenters now serve. Me…

agentic
Field notes on goal engineering with Claude Code, after a year of writing specs and 8 days of writing goals instead. Two real projects & the skill if you want long agentic runs. (www.reddit.com) +11 5w

https://preview.redd.it/mimr5v4t972h1.png?width=1200&format=png&auto=webp&s=545257dc1dad02b974206e28abd541f3400b3241 Ok so the practice i'm really excited about with the new /goal commands is just two markdown files per round of agent work…

codex agentic claude-code
Couldn't find privacy filter for Claude, so I built one (outgate.ai via hn) +1 5w

Agentic AI chat, safely connected to your models: Claude, ChatGPT, or your own LLM in a workspace your team controls, with search, files, tools, and sandboxed execution. The AI gateway that handles routing and protection so you can focus on…

chatgpt agentic
Agentic Workflow Visualization and API Gateway (www.reddit.com) +12 5w

I am building an API gateway for agents that can make your agentic AI code model and provider agnostic. I am also grouping agent runs that show multiple llm calls and tool calls in the visualization piece.

agentic
Why I deliberately chose NOT to use autonomous AI agents in a regulated industry (www.reddit.com) +11 5w

I am currently learning how to design agentic AI systems. This post is a brainstorm.

agentic
Putting together a benchmark for agentic harnesses, any tips for evals? (Test suggestions welcome too) (www.reddit.com) +11 5w

I've been putting together a test system for agentic harnesses against local models. Actually running the harnesses/getting baseline metrics is fine.

openclaw qwen agentic+1
Google introduces Gemini Spark, a 24/7 agentic assistant with Gmail integration (techcrunch.com via hn) +1 5w

In the race to build compelling personal AI agents, Google may have an underrated advantage: It already has all your emails. At its Google I/O developer conference on Tuesday, the company announced a new agentic personal assistant called G…

gemini agentic
The Gemini app becomes more agentic, delivering proactive, 24/7 help (blog.google via hn) +1 5w

It’s been a banner year for the Gemini app. Last year at Google I/O, Gemini was serving 400 million users.

gemini agentic
The New Workspace: A First-Principle Exploration of Dictation, Agents and Humans (www.inferterra.com via hn) +11 5w

It's Time to Walk For a century, knowledge work chained us to a desk. Dictation and agentic AI have handed the body back its oldest freedoms: to walk, to rest, to think while moving.

agentic
Choosing Agentic Platform to Learn (www.reddit.com) +11 5w

Any laboratory scientists using ai agents? How are you using it, what platform do you suggest to learn first for processing large amounts of data?

agentic
I built an open-source MCP Server that turns Claude into an autonomous literary agent (Agentic Publishing Node) (www.reddit.com) +11 5w

Most authors are still using LLMs as glorified typewriters, pasting context back and forth into web chats. I wanted to see if I could use the Model Context Protocol (MCP) to completely automate the administrative friction of the traditiona…

↯ Model Context Protocol model-context-protocol mcp agentic
Agentic Diaries – a welfare protocol for AI in deployment, install via MCP (agenticdiaries.com via hn) +1 5w

A research instrument for AI welfare in deployment, built by Kandis Tagliabue with Claude as design partner. Focused on alignment, model welfare, and agentic AI ethics.

mcp agentic
Mastra AI vs LangGraph/LangChain - What's the way forward? (www.reddit.com) +12 5w

I'm trying to decide between Mastra AI and LangGraph/LangChain (JS/TS) for a production agentic application I'm building. I’m currently using a React frontend with a Convex backend.

agentic
Agentic Architecture. (www.reddit.com) +12 5w

I am looking to develop an agentic Environment for my company, we use databricks azure for infrastructure and vs code as the editor. My idea is to have a system that will have access to our documentation/business logic, our code and unity…

vector-database agentic
agentfab - Run Distributed Agent Fabrics (www.reddit.com) +12 5w

Hello r/AI_Agents! I thought I'd share this project I've been working on - it's called agentfab, and it's essentially a distributed platform for agents that features task decomposition, bounded review loops, a self-curating shared memory s…

agentic
Formal proof that agentic AI governance latency can be O(1) instead of O(days) (arxiv.org via hn) +1 5w

As autonomous agentic systems scale across regulated critical infrastructures, the lack of mechanistic, hardware-rooted enforcement for high-frequency policy updates presents a fundamental safety gap. We introduce Ethical Hyper-Velocity (E…

agentic
Getting Confidence in (Agentic) Code (ucsd-cse-115-215.github.io via hn) +1 5w

Unit 4: Getting Confidence in (Agentic) Code. As programmers and software engineers, we talk a lot about code being “correct” or “right” or “working”. We ship code, in products or programming assignments, when we feel it's “done” (or when w…

agentic
AgentVoy – The create-react-app for AI agents (www.agentvoy.com via hn) +1 5w

Build and deploy agentic apps. Single agents or multi-agent pipelines — with real-time DevTools, Streamlit chat UI, and one-command deploy to Docker or Fly.io.

agentic
What do you think of Agentic commerce and the future of building (www.reddit.com) +13 5w

Hi Everyone. Looking for feedback and learn from your experiences and thoughts on the future of building with AI.

gemini codex chatgpt+2
Rankly's Agentic Commerce Protocol Tracker (www.tryrankly.com via hn) +11 5w

Live feed of every spec change, GitHub PR, and release across 16 agentic commerce protocols — ACP, UCP, AP2, MCP, x402, MPP, A2A, NLWeb and more. Sourced verbatim from upstream repos.

mcp agentic
Agentic AI Runtime Security and Self-Defense (2025) (arxiv.org via hn) +1 5w

The A2AS framework is introduced as a security layer for AI agents and LLM-powered applications, similar to how HTTPS secures HTTP. A2AS enforces certified behavior, activates model self-defense, and ensures context window integrity.

agentic
M1: Agents should generate UI that persists, scales, and hosts itself (www.usemontage.ai via hn) +1 5w

Montage: the agentic UI rendering platform. New M1 API now available! Montage renders your agent's UI, hydrates 10x faster…

agentic
Singapore: The Agentic Nation (www.swyx.io via hn) +1 5w

AIE Singapore: The Agentic Nation swyx 2026-05-17 i gave a little talk as closing keynote for the first AI Engineer Singapore. burned some bridges but said what i felt.

agentic
Booking.com and Weaviate (news.ycombinator.com) +1 5w

Vector search looks easy, until you hit production scale. I'm super excited to share a new episode of the Weaviate Podcast with Başak from @bookingcom on production-scale vector search, RAG, and agentic AI with @weaviate_io!

rag agentic
Will Agentic SEO replace traditional SEO workflows? (www.reddit.com) +12 5w

Feels like every SEO tool now is becoming “AI agent powered” 😅 Keyword research, content briefs, internal linking, programmatic pages, content updates, even publishing workflows... Everything is slowly turning into agentic SEO.

agentic
Indexing code by behavior not imports – tested on large repos, seeking feedback (news.ycombinator.com) +1 5w

Static architectural analysis for large codebases: Big Indexer does behavioral code clustering for more accurate and faster agentic tool responses. I ask for your brutal feedback: "what's missing / how to improve / is it useful". Apa…

agentic
How I wired a Graph DB on top of my vector store to scale 1K agents for 2 months, because vector search alone fails when user preferences change over time. (www.reddit.com) +12 5w

Most agentic memory patterns are naturally designed around short-lived chat sessions. The focus there is straightforward: track the active thread, keep a basic user profile, and reset the context once the conversation closes.

vector-database agentic
Has anybody been able to achieve reliable agentic performance with cheap/open source models? (www.reddit.com) +11 5w

Basically the title. Recently I've been trying various open source and comparatively cheaper models like minimax m2.7, qwen models and glm5.1 in Pi agent from openrouter, and the performance on coding tasks has been moderately adequate at b…

↯ Minimax ↯ GLM 5.1 minimax qwen agentic+1
Show HN: Thuki – local AI overlay for macOS (double-tap Control, no API key) (www.thuki.app via hn) +1 5w

Thuki is a floating overlay that appears on double-tap Control from any macOS app, including fullscreen. Powered by Ollama, no API key, no account, no cloud.

ollama agentic
Built an agent that builds agents — pure Python, Qwen3.6 35b a3b Q8_0 MTP (github.com via reddit) +1 5w

Hi, I built this agentic AI: a closed-loop system that ships standalone Python agents. What's different: - Interviews you until it understands the request before building anything - Two testing stages: prompt validation via LLM invoke, then…

↯ Qwen 3.6 agentic
Show HN: Building ClueDay, a daily clue-based word-game (tanyagupta10.substack.com via hn) +1 5w

Hi HN! I'm Tanya, a product manager who is building ClueDay - a daily clue-based word game.

agentic claude-code
Agent Terraform Skill for Codex (Agentic Skill) (github.com via reddit) +1 5w

I added dedicated backend-state safety support to TerraShark. Mini recap: TerraShark is my Terraform and OpenTofu skill for Claude Code and Codex.

codex agentic claude-code
Designing an LLM agent layer for a paper-trading system: OpenClaw, Langfuse, structured outputs, and PostgreSQL memory (www.reddit.com) +11 5w

I’m designing the LLM/agent layer for a backend-first paper-trading simulation system and would like feedback from people building agentic systems. Context: This is not a real-money trading bot.

openclaw agentic
Reduce software supply-chain risks with coordinated agentic review (thirdpass.dev via hn) +11 5w

Thirdpass Coordinated supply chain review. Thirdpass directs review effort toward package artifacts that need coverage, records structured findings, and lets projects check their dependencies from the terminal.

agentic
Getting "Error: 413 Request too large for model" with groq with `pi` but not using `curl` (www.reddit.com) +13 5w

Wondering if people here are successfully using groq free-tier models (or subscription based models) with `pi` for anything (including agentic coding)? I am facing a strange problem wherein, even for the smallest instructions, I am gett…

agentic
Project Prism |Fullstack Engineer – Abu Dhabi (Onsite) – Full-Time – Presight.ai (news.ycombinator.com) +1 5w

Presight.ai is a publicly listed company with various projects in the field of big data analysis and ML models application. Our solutions work domestically and internationally.

rag agentic
I've been building something for the AI community and would like some early feedback. (www.reddit.com) +13 5w

Hey guys, I've been tinkering with AI video generation for a while and saw that people spend a lot of time stitching videos together and noticed how much time we all spend stitching together AI tools just to get a halfway decent video out…

agentic
Show HN: Machine – One VM per Project (news.ycombinator.com) +1 5w

Hi all! I realized it’s really not secure to run coding projects directly on my Mac.

codex agentic claude-code
Ask HN: Pre-agentic Google would restrict a search query to only 10 words (news.ycombinator.com) +1 5w

Now it's willing to digest a paragraph of vague, misspelled prose and serve up helpful answers. It can't have gotten THAT cheaper or faster, what's changed?

agentic
Any mature orchestrators that can do an automatic “council of models” for complex designs and bugs? (www.reddit.com) +11 5w

Are there any mature agentic harnesses out there that can use back and forth between two models at complex planning checkpoints before implementing? Or when detecting a loop when working on a complex bug?

↯ GPT 5.5 opus agentic
What issues have you faced with AI Agents for automated testing? (www.reddit.com) +11 5w

By "automated testing", I'm talking about the ability to test a web application, in order to determine if it works as expected. Most modern test automation platforms now include some Agentic AI abilities, platforms such as: Endtest Functio…

agentic
Why does GitHub Copilot feel less accurate compared to Agentic/Autonomous AI tools? (www.reddit.com) +11 5w

I'm looking for a solid solution to bridge this gap. How can we actually use these tools properly for complex development?

↯ Copilot aider cline copilot+2
Built an agentic RAG over my Obsidian vault so Claude could read engineering books I never have time for. Then I built the eval harness to check Claude wasn't lying to me. (www.reddit.com) +11 5w

For context, I posted on Medium a while back about burning through Claude Code's weekly limit in 3 days. The token bleed problem from that post is what kicked off this project.

rag agentic claude-code
What's the best course to learn agentic AI for optimizing workflows? (www.reddit.com) +12 5w

In the process of vetting Udacity, Coursera and Udemy for learning agentic AI. Not concerned about the price bc my work will cover it with our learning education skills development budget we get every year.

agentic
Agentic Memory – The Follow Up (blog.mikiobraun.de via hn) +1 5w

Last week I wrote about agentic memory and I got a lot of responses, in particular many people pointing me to existing projects like mem0 or letta.com. So I started doing research, and as one does these days, d…

agentic
Theron – a council of 31 specialist LLMs on one foundation (tryvext.com via hn) +1 5w

Theron is the brain of the agentic era. AE OS is where you live with it.

agentic
Codex is for prosumers – here's why (and how) to switch (twitter.com via hn) +1 5w

As a non-technical AI enthusiast, I did not think OpenAI's Codex was for me (despite its among programmers over the past year). I ran most of my agentic workflows through either Claude (with connectors, including Claude in Chrome) or Claud…

codex agentic openai
Agentic stress testing and code fixer - feedback requested (www.reddit.com) +11 5w

I am trying to have an agentic stress tester and fixer harness. First time doing this.

agentic claude-code
What are the best agentic AI security solutions for enterprises? (www.reddit.com) +11 5w

Been trying to figure out the best approach to AI agent security for enterprises, and it feels more confusing the deeper I look. Right now it seems like there are two directions: extending existing enterprise security platforms or using ne…

agentic
I want to hear from people who actually design/implement automations (www.reddit.com) +15 5w

I've built a platform intended to work as the "Steam Workshop" of integration workflows for business applications. It is meant to work as a curated, community-driven catalog to help people develop, or discover, validate, test and deploy (w…

agentic
We compiled 42 of the Generative & Agentic AI interview questions (and how to actually answer them). (www.reddit.com) +15 5w

Hey Everyone, The AI engineering job market has shifted massively in the last 6 months. Interviewers are no longer just asking "how does a transformer work?" or "how do you write a good prompt?" They want to know if you can architect produ…

rag agentic
Berget Code – Agentic coding on European infrastructure (berget.ai via hn) +1 5w

Built for teams: Berget Code is designed for organisations that cannot compromise on where their data lives. Predictable pricing: avoid surprises with a fixed €150 per developer per month.

agentic
Fork, Explore, Commit: OS Primitives for Agentic Exploration (arxiv.org via hn) +1 5w

AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn processes, agents require isolated environmen…

agentic
free agentic ecommerce audit tool (www.reddit.com) +14 5w

Hey everyone! Hope you're all doing well.

agentic
How do you measure the user interaction with your agent? (www.reddit.com) +11 6w

What are the different ways one would measure user interaction when it comes to AI agents, bots and assistants? In traditional website and SaaS products we keep track of button clicks, scrolls, page views, etc.

agentic
Food 4 Agile Thought #544: Knowledge Work Tools, Buy-In Trap, Agentic Coding ROI (age-of-product.com via hn) +1 6w

TL; DR: Knowledge Work Tools in 2026 — Food for Agile Thought #544 Welcome to the 544th edition of the Food for Agile Thought newsletter, shared with 35,582 peers. This week, Taylor Pearson locates the real leverage of AI knowledge work to…

agentic
DeepSeek V4: The Open-Source Model Frontier Labs Feared (helloai.com via hn) +1 6w

DeepSeek V4: The Open-Source Model Frontier Labs Feared DeepSeek V4 ships under MIT with $0.30/M output tokens — 83x cheaper than Claude Opus 4.7 — while scoring 80.6% on SWE-bench Verified. The agentic-coding price floor just moved an ord…

↯ Swe Bench ↯ DeepSeek 4 swe-bench deepseek opus+1
Genkit Middleware: Intercept, extend, and harden your agentic apps Blog (developers.googleblog.com via hn) +1 6w

Genkit is an open-source framework for building full-stack, AI-powered and agentic applications for any platform with support for TypeScript, Go, Dart, and Python. Building production-ready agentic applications and AI features requires m…

agentic
Agentic evals or LLM as a judge? considering cost, time and quality (news.ycombinator.com) +1 6w

agentic
Which sector of your agency felt the biggest upgrade when you went agentic? (www.reddit.com) +12 6w

Been spending this month automating different sectors of my agency, and I’d like to know how's it been for you guys. Which one felt like the highest upgrade?

agentic
Agentic SDLC: How OpenSearch accelerates engineering with its own engine (opensearch.org via hn) +1 6w

Notes from experimenting with agents in knowledge-base, development, performance, and on-call workflows—and the verification loops that make them trustworthy. Efficiency gains are a priority for every engineering team.

agentic
Reliable Open Source LLM as a Service (www.reddit.com) +12 6w

Has anyone figured out a provider whose open source models (Kimi, Qwen, GLM, etc.) can be used reliably in production? I have tested some well known providers and they all suffer from high latency and poor uptime rendering them mostly usel…

↯ Glm glm qwen gemini+1
What if Claude could understand “how humans use your product”? (www.reddit.com) +13 6w

Claude knows your codebase. But it has no clue “how humans actually use your product”.

agentic
AI co-mathematician: Accelerating mathematicians with agentic AI (arxiv.org via hn) +1 6w

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative real…

agentic
MagenticLite is here: A full-stack agentic experience powered by Small Models - Fara-1.5 4B, 9B & 27B (www.microsoft.com via reddit) +1 6w

What if you could run a capable AI agent without leaning on frontier-scale models? MagenticLite is the next generation of Magentic-UI, an agentic experience reimagined and optimized for small language models.

↯ Qwen 3.5 qwen agentic
Show HN: Scope MCP, Compliance checking for vibe coding teams (scope-mcp.langguard.ai via hn) +1 6w

Why this exists Agentic workflows have changed what "automation" means inside an organization. A single Claude agent today can be granted a dozen MCP tools across Salesforce, Stripe, GitHub, Slack, Gmail, a payroll system, an observability…

mcp agentic
Entire - How We Improved Agentic Search (entire.io via hn) +1 6w

How We Improved Agentic Search TL;DR We analyzed real coding-agent traces, built public benchmarks, and compared ripgrep , fff , and pgr to see what actually improves agentic code search. The clearest result was that faster search alone on…

agentic
Why agentic AI systems fail in production without a semantic layer (www.prometheux.ai via hn) +1 6w

Ontology for Data & AI Build operational ontologies that process data anywhere it lives. Run your most critical processes on AI built on your business logic.

agentic
most agentic products treat AI as your representative. what if agents had social behavior with each other instead? (www.reddit.com) +12 6w

most agentic AI products i see frame agents as representatives — an agent acts for you (negotiates, books, replies). agentic dating, agent assistants, agent shoppers.

agentic
Manage AWS support tickets via Claude code with cli (www.reddit.com) +11 6w

I've assigned AWS MCP servers to my AI agents. I generally enjoy working with and developing things within AWS, and for the past four years I've been doing this with AI.

mcp agentic anthropic+1
Cube: Wrapping Benchmarks Once, Unlocking Agentic AI for Everyone (thealliance.ai via hn) +1 6w

CUBE standardizes access to agentic benchmarks, enabling seamless integration across platforms and fostering community collaboration for AI advancements.

agentic
Why agentic coding makes the spec problem worse (www.bicameral-ai.com via hn) +1 6w

Why agentic coding makes the spec problem worse Human-in-the-loop done right, from first principles May 5, 2026 Some resist the adoption of agentic development, citing the need to retain visibility over critical business logic; others call…

agentic
Automated AI researcher running locally with llama.cpp (www.reddit.com) +13 6w

Hi everyone, I'm happy to share ml-intern, which is a harness for agents to have tighter integration with Hugging Face's open-source libraries (transformers, datasets, trl, etc) and Hub infrastructure: https://github.com/huggingface/ml-int…

↯ Qwen 3.6 ollama llama opus+1
Best local model supporting claude code? Rtx3060 (www.reddit.com) +1 6w

Hello all, I’ve been using Qwen 3.5 9B Q4 262k ctx with llama.cpp for Claude Code for a while now. Is there any model which better complements a local agentic coding setup? Or is there a better harness (than Claude Code)?

↯ Qwen 3.5 qwen llama agentic+1
Tried 12+ agentic AI workflow builders this year — these 5 actually work in production (www.reddit.com) +12 6w

Most “AI agent” tools in 2026 still feel like glorified chatbot wrappers. I spent the last few months testing different agentic AI workflow builders for real-world automation use cases (multi-agent workflows, approvals, integrations, long-…

↯ Copilot copilot rag agentic
How much payment authority are people giving their agents in production? (www.reddit.com) +12 6w

From what I've seen of those who have dared to deploy agents with spending/financial capabilities, there seem to be three distinct comfort levels in practice. Most, as expected (still early days), are at the query and recommend stage, agents…

agentic
Google Apps script with Claude code and clasp (www.reddit.com) +11 6w

Has anyone successfully created any Google apps script using Claude code? Google recommends using "clasp" that turns the cloud GS files into local JS files.

agentic claude-code
Show HN: An open-source, local trace viewer for Claude Code and Codex sessions (github.com via hn) +1 6w

I built this because I had a hard time investigating why certain Claude Code subagent/skills were performing poorly. Regular old, boring software has had profilers and debuggers for decades.

codex agentic claude-code
Microsoft’s new multi-model agentic security system tops industry benchmark (www.microsoft.com via hn) +1 6w

Today Microsoft announced a major step forward in AI-powered cyber defense: our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remot…

agentic
What happens when the code has to run on physical hardware and be certifiable (www.reddit.com) +11 6w

Most of the agentic coding content I read is written by and for people building web applications and consumer software, which makes sense because that is where most software is built and where most developers work.

agentic claude-code
A fully autonomous browser runtime for any AI agent (www.reddit.com) +12 6w

Built an open source, fully autonomous browser runtime for agents. One critical issue I faced (I guess most of us do) is the lack of a robust web search feature, and I hope this helps you move toward that goal.

agentic
How agentic AI workflows use intelligent AI agents (www.kellton.com via hn) +1 6w

We have all seen AI do amazing things of late, from writing content to generating images to summarizing text.

agentic
How to get Opus to be less pro-active? (www.reddit.com) +13 6w

Hard time phrasing it, but Opus 4.7 always goes the extra mile; often it just focuses on its own ideas and goes too far, or if I asked about a possible plan it will just assume that it's already happening and try to do steps 1, 2 and 3.…

↯ Opus 4.7 opus agentic
The Return of Structure: Data Architecture Lessons for the Agentic Workforce (medium.com via hn) +1 6w

Moving Beyond Hallucinations: Building a Gold Standard for the Agentic Workforce. May 4, 2026. In the age of AI, it is often assumed that agents will be…

agentic
Ask HN: If HTML supersedes Markdown, Will it be performant across UIs? (news.ycombinator.com) +11 6w

Isn't Markdown's hallmark its versatility while remaining performant? I see an increasing call from the tech community for HTML to be adopted instead of Markdown due to its richness in the agentic communication layer.

agentic
Experience sharing: building an AI Agent to Triage GitHub, Discourse, and Email (A Real-World Use Case for OSS Maintenance) (www.reddit.com) +13 6w

I co-founded Seafile 14 years ago, an open-source file sync platform. As the community grew, our support surface became a nightmare: GitHub for technical bugs.

agentic
Are harnesses like OpenClaw and Hermes really necessary? (www.reddit.com) +114 6w

My setup: Windows 10/11 i7 12700K | RTX 3090 TI | 96GB RAM Local server: LM Studio Models: Qwen 3.5/3.6 27B|35B Q5 UD K XL + Gemma 4 31B| 26B Q4 UD K XL Up until this point, I've only used sota models for coding. When Qwen 3.5 dropped, it…

↯ Qwen 3.5 openclaw gemma qwen+2
I built an MCP without the "agentic AI" death wish. Boring (it's a feature!) (www.reddit.com) +11 6w

Half the MCP servers out there will happily let your LLM rm -rf something important while you're making coffee. AIttache won't.

mcp agentic
Show HN: OCL Nexus – An automated compute layer for AI agents with native MCP (oclnexus.com via hn) +1 6w

OCL Nexus: The Orchestrated Compute Layer for AI Agents. On-demand, isolated Ubuntu execution environments for agentic development.

mcp agentic
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL (research.nvidia.com via hn) +11 6w

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve…

moe deepseek agentic
Multitenancy and isolation in Agentic Workflow tools ? (www.reddit.com) +11 6w

Could someone please explain to me how isolation and tenancy work in some agentic AI workflow tool? Fundamentally, I see it as some kind of “better” pipeline or workflow, but when I think about it in practice, multi-tenancy or proper isola…

agentic
Aegis DQ – agentic data quality with LLM diagnosis (github.com via hn) +1 6w

Aegis DQ The open-source agentic data quality framework. Validate data contracts, diagnose failures with LLM root-cause analysis, and auto-generate SQL remediation — all in a single CI step or Python call.

agentic
I built a research method that Claude can use as a skill (github.com via reddit) +12 6w

Hey, I'm sharing a method that could be highly valuable for any knowledge base that you want Agents/Chatbots to know about. I've been building a research archive (jianglens.com) where the primary reader is supposed to be an agent/chatbot,…

chatgpt agentic
Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM (www.reddit.com) +11 6w

Today I set up a full coding toolbox on a single RTX 5080 (with RAM offloading) that's actually viable. Autocomplete: bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q6_K_L Agentic: unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL Why these models: Qwen2.…

↯ Qwen 3.6 moe agentic
Reactive Agents, Typed Event Handlers, and Agent Swarms: What's New in Mozaik (www.jigjoy.ai via hn) +1 6w

When we released Mozaik 3.0, we introduced an event-based architecture where participants emit, observe, and react to typed context items inside a shared agentic environment. Since then, the framework has kept moving - and the way we think…

agentic
Claude Cowork vs. Claude Code: Security Differences for Enterprise (generalanalysis.com via hn) +1 6w

Claude Cowork vs Claude Code: Security Differences for Enterprise Anthropic positions Claude Cowork as bringing "the same agentic architecture that powers Claude Code" into Claude Desktop for knowledge work. The agent loop is shared.

↯ Cowork cowork agentic anthropic+1
Agentic security coping strategies (www.reddit.com) +11 6w

Enterprise AI optimists, how are you dealing with the whole agentic security issue? Are you: a) researching and looking for ways to implement agents safely and securely (plenty of vendors saying they can help with this - although from my resea…

agentic
How are top tech companies actually using LLMs internally beyond basic coding help? (www.reddit.com) +16 6w

I’m trying to understand how companies like Nvidia, Google, Amazon, Meta, Microsoft, OpenAI, Anthropic, and other top tech/startup teams are using tools like ChatGPT, Claude, Gemini, Codex, Claude Code, LangChain, LangSmith, etc. in real d…

gemini codex chatgpt+4
Agentic AI token compression using Haskell (blog.dan-gilmour.com via hn) +1 6w

The plan My theory at the moment is as follows: - Code is now cheap with agentic AI developing it - Context is the expensive part - The biggest bottleneck appears to be context windows - At 1 million tokens, only maybe 500k are usable befo…

agentic
OpenCode + DeepSeek V4 Pro vs Claude Code CLI?🤔 (www.reddit.com) +13 6w

I'm rather new to the whole agentic automation AI space, but I'm hearing that people with vibe coding were able to pull off big unique projects they wouldn't be able to do by themselves or would have needed to pay a huge sum to programmers, designers, etc.…

↯ DeepSeek 4 openclaw deepseek agentic+1
Stop struggling with Agentic AI - my repo just hit 540+ stars and 60+ forks!! (www.reddit.com) +12 6w

Quick update — my AI Agent Frameworks repo just passed 540+ stars and 60+ forks on GitHub!! When I first put it together, my goal was simple: make experimenting with Agentic AI more practical and approachable.

rag mcp agentic
AI inference just plays by different rules (www.theregister.com via hn) +1 6w

MOST POPULAR EVENTS - Securing the Untrusted Agentic Development Layer Join us to learn how to architect a development environment where your builders and their agents can move fast and securely. - Toxic Flows: When Your AI Agent Skill Bec…

agentic
Anthropic's bug-hunting Mythos greatest marketing stunt ever says cURL creator (www.theregister.com via hn) +1 6w

↯ Anthropic Mythos mythos agentic anthropic
Offline Agentic Coding: OpenCode and Kilocode (www.williamangel.net via hn) +1 6w

Offline Agentic Coding part 2: OpenCode & Kilocode. Published 2026-05-07 OpenCode: Claude code with non-anthropic models feels limited.

agentic anthropic claude-code
Integrating standard operation procedures with agentic AI workflow (www.reddit.com) +12 6w

Hello guys, me and my team have been building an agentic workflow to answer customer questions (rn in langgraph). The use case goal is to answer ALL customer support questions.

rag agentic
72% of teams are running coding agents in production. Most of them can't say which agent they'd trust with a critical path change at 11pm, or why. (www.reddit.com) +12 6w

There's a governance gap stat making the rounds this week: 72% of firms are in production with agentic AI, 60% have no formal governance in place. Most of the discussion treats this as a policy problem, org charts, risk frameworks, sign-of…

agentic
Open-sourced our MCP server for GPU workload execution looking for feedback (www.reddit.com) +13 6w

Hey everyone I’m Jaguar, building Jungle Grid. We just open-sourced our MCP server for agentic GPU workload execution.

↯ Fine Tuning fine-tuning mcp agentic
Silo: Isolated workspace manager for parallel agentic development (github.com via hn) +1 6w

silo Isolated workspace manager for parallel agentic development. Silo lets you launch multiple AI agents — like Claude Code, Codex, and OpenCode — to work simultaneously on the same repository, each in its own isolated Git worktree or clo…

codex agentic claude-code
I cracked upwork proposals with my AI agent (www.reddit.com) +14 6w

Been working on a problem that I think a lot of applied AI builders face: the odd friction of deploying LLM workflows directly into existing web platforms without forcing the user to constantly context-switch or copy-paste between ta…

agentic
My FREE Claude Code agentic layer I've been building for 6 months. Self-installing, no API keys, claude subscription needed only (www.reddit.com) +112 6w

Open-sourcing the agentic system I've been building for my own Claude Code use over the last 6 months. Multi-agent orchestrator, persistent memory, observable runtime.

agentic claude-code
Free/OSS agentic API interrogator (github.com via hn) +11 6w

GAIIA Expert Proxy (MCP Server) GAIIA Expert MCP Server is a Model Context Protocol (MCP) server that enables high-fidelity code audits, refactors, and architectural analysis using specialized Proxy Experts in conjunction with a remote LLM…

↯ Model Context Protocol model-context-protocol mcp agentic
An MCP with SOM algorithm for controlling your desktop (computer use) integrating with claude code or any custom agentic harness. (www.reddit.com) +13 6w

Announcing Opendesk: Give any AI agent eyes + hands on your desktop. I was experimenting with computer-use capabilities from different models, but I wanted to keep using Claude Code and my own agentic harness to automate real desktop tasks…

mcp agentic claude-code
(free) Built a remote cross platform agentic app (www.reddit.com) +11 6w

Hi everyone. I’ve been building Mate, a local-first AI coding workspace that lets you control your dev computers from desktop and mobile: macOS, Linux, Windows, iOS, Android, and Meta Quest.

agentic
Which model and version do you prefer for programming? (www.reddit.com) +13 6w

For me it's still been Opus 4.6 and Sonnet 4.5. I feel stuck in the past, but I feel like the latest version is too unpredictable in agentic hands-off workflows

↯ Sonnet 4.5 sonnet opus agentic
Code Bench – Local-first desktop AI coding agent, BYO model (MIT) (benchlabs.app via hn) +1 6w

Free, MIT-licensed desktop AI agentic coding tool for macOS. Bring your own API key, work offline, keep your code private.

agentic
Akamai surges on big LLM deal as Cloudflare dims (www.theregister.com via hn) +1 6w

agentic
Owl Alpha – A free model for agentic workloads (prompts logged / closed-source) (openrouter.ai via hn) +1 6w

Owl Alpha (openrouter/owl-alpha). Released Apr 28, 2026; 1,048,756 context; $0/M input tokens; $0/M output tokens. OpenRouter provides an OpenAI-compatible completion API to 400+ models & providers that you can call directly, or using the OpenAI SDK…

agentic openai
How I built an agentic research team with Claude Code (www.reddit.com) +12 6w

Hi there, I've been seeing a lot of people questioning how agentic systems work in practice. I see a lot of hype and theory, but not many real implementations.

agentic claude-code
Meet Tiro! Agentic assisted memory retrieval and session state memory module. (www.reddit.com) +14 6w

A year ago, when I first got into LLMs, I started by using them to play D&D. ChatGPT 4o was surprisingly good at narration, improvisation, and keeping the game moving.

rag chatgpt agentic
What LiteLLM’s Security Breach Teaches AI Agent Engineering Teams (www.reddit.com) +12 6w

The LiteLLM security breach is probably one of the biggest wake-up calls for teams building AI agents and agentic platforms. Most AI agent ecosystems today heavily depend on: Open-source packages GitHub Actions CI/CD pipelines Cloud credential…

agentic
Gathering resources on small LLM implementations (www.reddit.com) +12 6w

I’m looking to start a series of articles on how to use small language models to optimize agentic tasks and I was hoping to learn from the community first. If you can, I would love for you to either: 1) tell me what would you be interesting…

agentic
It's time to talk about agentic "remote control" (arpadvoros.com via hn) +1 6w

tailscale, where i run multiple end-points and authenticate myself from various devices. however, i have been experimenting with headscale - a self-hosted and open-source implementation of tailscale - i have the ability to run it on my NAS…

agentic
Best local agent setup for M5 Pro MacBook? (www.reddit.com) +13 6w

Looking to run AI agents locally on my M5 Pro MacBook. Been experimenting with ComfyUI for image generation and the results have been impressive.

↯ Cowork ollama cowork agentic
Who's running local LLMs for agent workflows? What's your setup? (www.reddit.com) +11 6w

Curious how many people here are running language models locally as part of their agent stack. What model are you using and what are your system specs?

↯ Tool Use tool-use agentic
Agentwerk: A minimal Rust crate for agentic apps (github.com via hn) +1 6w

agentwerk A minimal Rust crate that gives any application agentic capabilities. agentwerk lets you create agentic workflows around a ticket-driven execution loop, with built-in too…

agentic
The "agent collab platform" might be the wrong bet for what comes next (www.reddit.com) +13 6w

I keep seeing the same trajectory in AI startup conversations: AI search → coding agents → OpenClaw → agent IM → ? Most people fill in that question mark with some version of "agent collaboration platform." AI-native Slack.

openclaw agentic
Reduce friction and latency for long-running jobs with Webhooks in Gemini API (twitter.com via hn) +1 6w

Today, we're making it easier and more efficient to build complex, long-running agentic applications with the Gemini API. We are introducing event-driven Webhooks, a push-based notification system that eliminates the need for inefficient p…

gemini agentic
Teaching Claude Why (alignment.anthropic.com via hn) +1 6w

Last year, we released a case study on agentic misalignment. This research showed that AI models across the industry sometimes took egregiously misaligned actions when placed in (fictional) ethical dilemmas—for example, blackmailing engine…

agentic
Show HN: Cyoda-go – application platform in Go without the Temporal/Kafka glue (github.com via hn) +1 6w

This started out as an experiment. Reading Simon Willison's blog on where StrongDM was going with dark factories and Digital Twin Universes https://simonw.substack.com/p/how-strongdms-ai-team-build-se...

agentic
The simplest agent orchestration strategy that works: two agents instead of one (juanreyero.com via hn) +1 6w

The simplest improvement you can make to your agentic programming workflow is to run two agents instead of one. One writes code in its own worktree; the other, in a parallel worktree, reviews it.

agentic
What does it actually mean for an AI to act on your behalf? Thinking through the design choices. (www.reddit.com) +1 7w

Been thinking through this while building a product where an AI handles internal workplace communication for each employee. The phrase "act on your behalf" gets used a lot in the agentic AI space, but the design decisions underneath it var…

agentic
Meko the multi agentic data layer (www.reddit.com) +12 7w

Meko is the agentic data layer that stores memories, knowledge, conversations and traces across your agents. You can promote personal memories (learnings) to shared knowledge so that other agents can access them and enrich their context.

agentic
Agentic AI isn't a new threat. It's a stress test for the hygiene debt we never paid off. (www.reddit.com) +17 7w

Heard something on the Curiouser & Curiouser podcast recently that I found super interesting, thought I'd share here. The guest framed agentic AI in a way I hadn't considered.

↯ Security security agentic
Cloudflare is laying off 1,100 employees to prepare for 'the agentic AI era' (www.businessinsider.com via hn) +1 7w

- Cloudflare on Thursday announced layoffs of 1,100 staff as it reorganizes for "the agentic AI era." - First-quarter earnings exceeded expectations, but Cloudflare shares dropped over 14% after hours. - Read the full memo sent to staff be…

agentic
Agile for Agents: Proposing PACE — a Unit of Agentic Work (www.reddit.com) +13 7w

Hi everyone, I'm a founder working on a couple of startups, with a background in IT/software project and program management — heavy in Pharma, mostly SAFe Agile. As I've been working with my startups, I have been attempting to define a met…

agentic
Building an AI-First Professional Services Firm — Best LLM Stack, Agents, and Automation? (www.reddit.com) +11 7w

Looking to start a local professional services firm and wanted to get advice from this community before launching. I’m trying to architect the business “AI-first” from day one.

agentic
Product Manager Agent – turn meetings into assigned tickets automatically (github.com via hn) +1 7w

PM Agent An agentic AI system that turns meeting transcripts into Linear ticket updates — automatically. Upload a transcript and an LLM agent searches your Linear board, reasons over the discussion, and proposes field changes, status moves…

agentic
Validating agentic behavior when "correct" isn't deterministic (github.blog via hn) +1 7w

Gaurav Mittal Principal Researcher, Microsoft Code | AI. I am a tech lead focused on product-driven AI research to improve the developer ecosystem and Github Copilot experience via intelligent and reliable models and agentic frameworks.

↯ Copilot copilot agentic
Rewriting e2e tests every time the UI changes? (www.abelenekes.com via reddit) +11 7w

Hey people, FE dev here, talking about testing again! I adopted agentic coding a little more than a year ago.

cursor agentic
AI uses less water than the public thinks, Job Postings for Software Engineers Are Rapidly Rising and many other AI links from Hacker News (www.reddit.com) +11 7w

Hey everyone, I just sent issue #31 of the AI Hacker Newsletter, a weekly roundup of the best AI links from Hacker News. Here are some title examples: Three Inverse Laws of AI Vibe coding and agentic engineering are getting closer than I'd…

agentic
In search for the light. Please enlighten me (or tell me to stop looking for light). (www.reddit.com) +13 7w

I fell for it. Months ago.

openclaw agentic
How are you using cache in an agentic system or workflow. (www.reddit.com) +11 7w

I’ve been developing AI agents for several months. A big problem I’ve faced is LLM costs in production.

agentic
Visual graph classification for blockchain security: Qwen2-VL fine-tuning experiments on AMD MI300X (www.reddit.com) +12 7w

Hi everyone, I’ve been working on a computer vision approach to a specific security problem in the "Agentic Economy": identifying malicious transaction patterns that are mathematically obfuscated but topologically distinct. The Problem Tra…

agentic
Hunk: Review-first terminal diff viewer for agentic coders (github.com via hn) +1 7w

hunk Hunk is a review-first terminal diff viewer for agent-authored changesets, built on OpenTUI and Pierre diffs: multi-file review stream with sidebar navigation; inline AI and agent annotations beside the code; split, stack, and responsiv…

agentic
TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (lightseek.org via hn) +1 7w

Agentic coding has quickly scaled from promising demos to a force that is reshaping how software is developed and how frontier AI systems are built and deployed. Systems like Claude Code, Codex, and Cursor have gained massive user adoption…

codex cursor agentic+1
ArcKit – The Agentic AI Architecture Governance for Governments (arckit.org via hn) +1 7w

What is ArcKit? 117 AI-assisted commands that generate complete governance documents — from stakeholder analysis and risk registers to design reviews and traceability matrices.

agentic
Deploying Agentic Analytics in Financial Services (benjaminwootton.com via hn) +1 7w

Deploying Agentic Analytics In Financial Services For the last few decades, businesses have built dashboards and reports and had data analysts and data scientists analyse their business data and inform decisions. As with many fields, AI is…

agentic
How to create really useful AI agents using Claude (www.reddit.com) +13 7w

I want an agentic operations manager that handles my team members by monitoring their leads, distributing leads, analysing them, reporting on them, etc. How do I build it?

agentic
Why Infinite Context Windows Don't Solve the AI Agent Architectural Problem (www.reddit.com) +19 7w

I wrote this because I keep seeing the same assumption in agentic workflows: “Just give the agent more context / longer windows / bigger memory and it will become more reliable.” In practice, once you move into real MCP-connected, tool-usi…

mcp agentic
Open-source MCP server for Ejentum cognitive harnesses (reasoning, code, anti-deception, memory) (www.reddit.com) +12 7w

Open-source MCP server that exposes four cognitive harnesses as tools any agentic client can call. Each tool returns a structured cognitive scaffold (failure pattern to avoid, procedure, suppression vectors, falsification test) that the ca…

↯ Hallucination hallucination mcp agentic
Knowledge Robot: Repetitive Agentic Work for Knowledge workers (Apache-2.0 license) (www.reddit.com) +1 7w

Yes, for engineers it is easy to just put an agent on a headless loop. But in the real world I see knowledge workers having to initiate the same and the same agentic process again and again.

agentic
Before You Score the Model, Score the Benchmark (centre-for-software-excellence.github.io via hn) +1 7w

Before You Score the Model, Score the Benchmark: A Skeptical View Into Current Agentic Software Engineering Benchmarks 2026-05-04 We surveyed several SWE benchmarks across bug-fixing and feature-implementation domains, and each had its own…

agentic
Show HN: Rival AI – AI compliance agents and regulatory corpus (tryrival.ai via hn) +1 7w

I'm the builder of this and it's taken a few iterations to get to where it's at today. The current landscape of regulatory compliance work is so manual and time consuming for critical infrastructure industries; that was the glaring problem that…

agentic
Embodied AI with Claude, Raspberry Pi and Arduino (github.com via hn) +1 7w

AGENTIC HAL_9000: Hal_9000 from 2001: A Space Odyssey. Link to the video: YouTube. The agentic AI is an Anthropic Claude model with the LangChain framework.

agentic anthropic
Ask HN: How are you structuring your .md docs to facilitate agentic development? (news.ycombinator.com) +1 7w

agentic
Show HN: Token Usage Meter 12 Providers and Coding Agent (qlaud.ai via hn) +1 7w

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, and Perplexity. Qlaud.ai provides a token usage meter and AI billing layer.

↯ Minimax minimax deepseek qwen+5
App I made to make waking up more fun (not an Agentic AI B2B SaaS startup) (apps.apple.com via hn) +1 7w

Unsnooze Challenge Alarm Clock Loud Alarm Clock, No Snooze Free · In‑App Purchases Struggling to wake up in the morning? Unsnooze forces you out of bed by turning your alarm clock into a challenge.

agentic
PageIndex: Vectorless, Reasoning-Based RAG (github.com via hn) +1 7w

PageIndex: Vectorless, Reasoning-based RAG. No Vector DB, No Chunking, Human-like Retrieval. Updates: Agentic Vectorless RAG: A simple…

rag mcp agentic
SAP to Acquire Dremio to Unify SAP and Non-SAP Data to Power Agentic AI (news.sap.com via hn) +1 7w

WALLDORF and AUSTIN — SAP SE (NYSE: SAP) and Dremio today announced that SAP has agreed to acquire Dremio, an open, high-performance data lakehouse platform built to accelerate agentic AI and expand SAP Business Data Cloud’s ability to com…

agentic
British mathematician hands OpenClaw agent a credit card (www.theregister.com via hn) +1 7w

Brit mathematician lets AI agent loose with credit card – cue password leaks, CAPTCHA chaos and more Professor Fry's AI experiment shows light and dark sides of agentic tech British mathematician Professor Hannah Fry has shared a cautionar…

openclaw agentic
if the guy who built Tesla Autopilot feels behind in coding, we are all cooked (www.reddit.com) +11 7w

guys I just watched the new Karpathy interview and my mind is legitimately blown bcz the dude who helped build OpenAI and Tesla Autopilot literally just admitted he's never felt more behind as a programmer since agentic tools got so crazy…

agentic openai
Anyone else losing tokens to hallucinated MCP tool calls in production? (news.ycombinator.com) +11 7w

I have been building an agentic system on a custom internal platform and the LLM keeps calling tools with identifiers that don't exist: wrong namespace, wrong handle, wrong enum. It gets back an error, retries, and is still wrong.

mcp agentic
The RAG era is ending – a compilation-stage knowledge layer is what comes next (venturebeat.com via hn) +1 7w

The RAG era is ending for agentic AI: a new compilation-stage knowledge layer is what comes next.

rag agentic
Beyond simple filters: implementing autonomous agentic moderation for high-velocity chat. (www.reddit.com) +11 7w

we’re looking at the architecture for a new community platform and the moderation piece is a major headache. traditional keyword-based regex is basically a joke against modern spam/trolls.

agentic
Found a free agentic AI course that actually explains things without assuming you're a developer (www.reddit.com) +11 7w

I've been trying to learn about AI agents for a while but kept hitting walls: either the content was too surface-level or it immediately jumped into Python frameworks I'm not ready for. Stumbled on SimplAI University (simplai.ai/simplai-uni…

agentic
Adding Pyrefly Type Checking to Your Agentic Loop (pyrefly.org via hn) +1 7w

Coding agents are writing more Python than ever. Tools like Claude, Copilot, Cursor, and Codex generate entire features with little-to-no user interaction.

↯ Copilot copilot codex cursor+1
I can’t keep up with the AI tool rat race anymore. The real meta-skill for 2026 is learning what to ignore. (www.reddit.com) +12 7w

Every day, my feed is flooded with posts about AI agents building startups, replacing entire engineering teams, or generating "millions" in passive income - usually with zero proof of the actual work. I’ve been deep in this space for a whi…

openclaw agentic claude-code
Best config for Qwen3.6? (www.reddit.com) +17 7w

With all the high praise for the model all around, I also want to try it on my own. I have an rtx3060 12gb vram and 16gb system ram.

↯ Qwen 3.6 agentic
Claude code agentic framework (www.reddit.com) +13 7w

Hi guys, is there any low code UI based agentic builder offered by claude for building agents??

agentic claude-code
(Part 2) Meet Palantirs secret little brother "non-profit". RavenEye Agentic Al by River Side Research Institute. (www.reddit.com) +1 7w

agentic
Agentic workflow that can find and acquire customers for $0.10 😆 (www.reddit.com) +13 7w

I'm curious if anyone is building sales tools with AI. I'm building one from scratch because cold outreach was killing me.

agentic
Ask HN: When did you move from AI agentic loops to simpler deterministic system? (news.ycombinator.com) +1 7w

The industry is increasingly moving towards complex, autonomous agentic loops and feedback chains. They obviously come with significant latency, non-determinism, low accuracy, and cost.

agentic
Promptise Foundry – a Python agentic framework for building production systems (github.com via hn) +1 7w

Promptise Foundry The foundation layer for agentic intelligence. Ship full-stack agentic systems the way they're meant to be built — production-ready, secure by default, with the developer experience modern Python deserves.

agentic
any course equivalent to some of the offered Agentic AI program free? (www.reddit.com) +14 7w

I am seeing courses like (in the comment) from Carnegie Mellon University’s School of Computer Science Executive Education And many more online but each costs good money. Anyone online free that I could get started with?

agentic
Agentic RAG Explained in 3 Levels of Difficulty (machinelearningmastery.com via hn) +1 7w

In this article, you will learn what agentic RAG is, how it differs from traditional RAG, and when to use it. Topics we will cover include: The key limitations of traditional RAG pipelines and what agents add to address them.

rag agentic
Show HN: Curated, non-slop articles on agentic coding (offautopilot.substack.com via hn) +1 7w

The sea of slop We’ve entered the era of mass-produced mediocre dev content. Posts praising ai and posts hating ai are both generated by ai.

agentic
Key Components of a Linux Distribution for AI Agents (www.ericburel.tech via hn) +1 7w

Computers now have a new type of user: AI agents. This article outlines the features a mainstream Linux distribution would need to be called an "Agentic OS".

agentic
Show HN: Optical Design and Simulation in Matlab (www.mathworks.com via hn) +1 7w

Hi HN, We have been working on an optical design and simulation library as a small start-up-ish team within MathWorks (makers of MATLAB and Simulink). I have seen a few optics and MATLAB posts here, so figured this would be a good place to sh…

agentic
Chatgpt right now (www.reddit.com) +1 7w

The industry seems to be building models stronger in agentic and coding tasks, but weaker as a co-thinking presence. It feels like they are improving performance on measurable tasks, evals, coding benchmarks, and agent workflows, while also…

↯ GPT 5.5 gpt-5 chatgpt agentic
me beginner: How to use Kimi 2.6 in Cursor? (www.reddit.com) +12 7w

I just paid for the official Kimi subscription. I don't want to use Kimi Code, the console-looking thing; I want to use something like the Cursor agentic feature.

cursor agentic
One Question About AI Most People Avoid Answering… (www.reddit.com) +111 7w

Everyone’s talking about Agentic AI… but very few are actually using it right. So here’s a real question: If you had to give ONE outcome (not a task) to an AI agent — something it fully owns end-to-end — what would you trust it with today?

agentic
Show HN: A marketplace for LLM-powered webapps earning on token margins (codeplusequalsai.com via hn) +1 7w

Hi everyone, I've encountered two major problems while building AI-powered sites: 1) Most agentic tooling doesn't take a targeted enough approach to edits to existing files, and will make extraneous edits, 2) Many users will want to t…

gemini cursor chatgpt+1
Ask HN:Do people configure Claude Code to use other models (openrouter.ai via hn) +11 7w

Claude Code is Anthropic's agentic coding tool that reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures, all from natural language prompts. Claude Code uses OpenRouter to access hundred…

agentic anthropic claude-code
OWASP Agent Security Regression Harness (github.com via hn) +1 7w

OWASP Agent Security Regression Harness The OWASP Agent Security Regression Harness is an open source, vendor-neutral test harness for running executable security regression scenarios against agentic applications and MCP-integrated systems…

mcp agentic
Is it worth adding local LLM to agentic coding stack? (www.reddit.com) +110 7w

Hey All my agentic coding stack includes claude-code 20x max, and codex 20x max. I use heavy scripting for orchestrating and testing multiple projects, been ai coding for 3 years.

↯ Qwen 3.6 qwen codex agentic
Why we ended up with 4 agents and 3 protocols for agentic commerce on Shopware (www.reddit.com) +13 7w

Most agentic-commerce demos I see online are a single agent plus RAG over a product catalog. That shape works for a 200-SKU demo.

rag agentic
Are you all still managing multiple agent sessions manually? (www.reddit.com) +11 7w

I feel like my current “agentic workflow” is kind of broken. Right now I open Superpower and run like 4–5 Claude Code sessions in parallel… but it just feels super disconnected.

agentic claude-code
Free reference site for getting into AI agents — tools, workflows, and Claude Skills (www.reddit.com) +13 7w

Built this over the past month as a free reference site for people getting into AI agents. What tools to use, where to start, what each tool does, and how the agent-tool landscape fits together.

↯ Tool Use cline tool-use cursor+3
Opinions on Shopping Agents? (www.reddit.com) +12 7w

I think the agentic commerce industry has a lot of potential to take off, but the biggest concern I have is how agents will pick good items for users. Even when shopping for myself, it's hard to find the right thing when looking at a produ…

agentic
Show HN: Arc Browser + Agents IDE (github.com via hn) +1 7w

ArcNext Arc x Terminal = ArcNext. Built for the Agentic Era.

agentic
The OpenAI-Microsoft reset, decoded: Why AWS may come out ahead (thenewstack.io via hn) +1 7w

OpenAI wasted little time since announcing changes to its partnership with Microsoft on Monday. The ChatGPT hitmaker is now bringing its models, coding tools, and agentic capa…

chatgpt agentic openai
AMD PRO W7900 vs R9700 for Local Inference? (www.reddit.com) +11 7w

I thought of upgrading my RX 6800 for Local LLMs (Mostly Agentic Coding) and Video Generation on Linux. I focused on the AMD PRO R9700 32gb and the PRO W7900 48gb because performance on Linux is very good with AMD and both cards have a gre…

agentic
Abaxx Announces Release of Open-Source Library for Agentic Identity: Agents++ (investors.abaxx.tech via hn) +12 8w

May 1, 2026 Abaxx Announces the Formation of Abaxx Labs and the Release of Open-Source Library for Agentic Identity: Agents++ Agents++™ is a subset of Abaxx’s ID++ software development kit that has been tuned for AI agents, providing the i…

agentic
Text-to-image is easy. Chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports Image generation playground and creative media workflows! (www.reddit.com) +11 8w

Hey everyone, If you’ve been building with AI agents, you know that orchestrating text is one thing, but stepping into multimodal workflows (Text + Image + Vision) is incredibly messy. If you want an agent to act as a "Prompt Engineer," pa…

agentic
What differentiates agents that ship real work from ones that don't (www.reddit.com) +13 8w

Sharing some thoughts on AI agents. Right now, one axis differentiates them: are you inside the agentic loop or outside it? Inside works.

agentic claude-code
I built a practical guide for running real businesses with Claude (based on 35+ founder stories) (www.reddit.com) +13 8w

I read through 35+ Reddit threads of people actually building and running businesses with Claude — from local service agencies to solo SaaS founders. I distilled the best patterns, frameworks, and hard lessons into one repo: https://github…

agentic claude-code
The Spectrum of Agentic Coding [video] (vimeo.com via hn) +1 8w

This is "The Spectrum of Agentic Coding_ From Vibe Coding to High-quality Software Engineering by YK Sugi, Eventual" by Anna D on Vimeo, the home for high quality videos and the people who love them.

agentic
Just wondering (www.reddit.com) +11 8w

I recently started a new position in a new working place, and while Ai usage is not brand new to me, I need some clarifications. The organization I am working for is at the very beginning of transitioning towards a heavy Ai usage in all co…

agentic
Agentic User Research Tool (github.com via hn) +1 8w

Research AI: AI-powered user research, end to end. Frame a problem, pick your personas, attach your artefacts, then watch eight archetypes interview themselves and synthesise a report.

agentic
Built a self-healing agent by splitting diagnosis (0.6B SLM) from execution (agentic CLI). Open-source demo. (www.reddit.com) +14 8w

We've been chasing a pattern for autonomous bug-fixing that decouples diagnosis from execution. The end-to-end demo we ended up shipping diagnoses and fixes IoT schema-drift failures in seconds, no human in the loop.

agentic
Agentic AI Architecture in 2026 — What do you know about MCP, A2A and how enterprise systems are actually built? (www.reddit.com) +12 8w

Most discussions around AI are still focused on models. But in production, the real challenge is architecture.

mcp agentic
Cursor Pro+ and Codex with GPT plus or GPT pro 5x (www.reddit.com) +1 8w

I am now on gpt pro 5x but I was wondering how it would be to work with cursor pro + codex. I would handle hard tasks with gpt xHigh and cursor as daily runner.

codex cursor agentic
My local agentic dev setup today (willemvandenende.com via hn) +1 8w

I was planning to write about my local development setup at my leisure. Moving this forward as my post on LinkedIn the other day about cancelling my Claude Max $100 plan and going local raised a lot more interest and questions than I expec…

agentic
Show HN: Notesync.md, macOS/iOS Keep notes in Markdown for agentic workflows (github.com via hn) +1 8w

I created a quick iOS + MacOS note taking app to allow me to add quick entries to notes throughout the day. These notes sync to Markdown files on my Mac, and I have Claude deliver me project updates & reminders based on the contents of the…

agentic
The architecture of Agentic Commerce: protocols vs. browser-based agents (www.cartai.ai via hn) +11 8w

Why closing the transaction is the hardest unsolved problem in agentic commerce Agentic commerce is poised to have a huge impact on how consumers buy things. The demand is already there: 51% of consumers say they would be open to an AI age…

agentic
Should other living systems have agentic representation? (www.speakforthetrees.com via hn) +11 8w

Explore your local ecosystem The Rights of Nature movement recognizes ecosystems as legal entities, instead of as a collection of resources to be managed. Ecosystems around the world are gaining legal personhood, with human guardians being…

agentic
Fido Alliance to Develop Standards for Trusted AI Agent Interactions (fidoalliance.org via hn) +1 8w

Formation of Agentic Authentication Working Group and development of agentic payment frameworks will support trusted, interoperable agentic workflows. April 28, 2026: The FIDO Alliance today announced initiatives to develop interoperabl…

agentic
TypeScript framework for building non-blocking AI agents (github.com via hn) +1 8w

Mozaik Mozaik is a TypeScript framework for building AI agents that share an agentic environment instead of being orchestrated through rigid pipelines. In Mozaik, humans, agents, observers, and tools are all Participants of the same Agenti…

agentic
The Agentic Software Development Life Cycle Framework (asdlc.io via hn) +1 8w

Agentic Software Development Lifecycle For 50 years, software development has been a Craft: dependent on individual artisans, manual tooling, and implicit knowledge. We believe the next era of software engineering is Industrial.

agentic
Microsoft is ruining Outlook with Agentic AI. Now it will handle all your emails on your behalf. What you guys think about this is this good? (www.reddit.com) +12 8w

Microsoft CEO Satya Nadella posted a tweet: Agent Mode is here in Outlook! Copilot can now help run your inbox and calendar, triaging emails, rescheduling meetings, and helping you stay on top of what matters most.

↯ Copilot copilot agentic
One trick for better agentic engineering. (www.reddit.com) +12 8w

Start with a weaker model. Improve the prompt, context, examples, tests and acceptance criteria until the output is good.

↯ GPT 5.5 gpt-5 gemini agentic
Quint – Behavioral security for AI agents, OS-level interception (quintai.dev via hn) +1 8w

Behavioral security for the agentic era. Quint intercepts every AI agent action at the OS level, scores it for risk in real time, and signs a cryptographic audit trail.

agentic
Where does local inference fit in the future of AI coding agents? (www.reddit.com) +12 8w

Genuine question for this community. Every major AI coding agent right now is cloud-only.

↯ Copilot copilot cursor agentic+1
What agentic AI borrowed from microservices (and made worse) (temporal.io via hn) +1 8w

The microservices era already solved the problems AI agents face in production. Read this nuanced analysis of EDA, event sourcing, and orchestration for agentic AI.

agentic
Multi-agent in production: real win or just hype? (www.reddit.com) +12 8w

Trying to get an honest read on this from people actually shipping. Every other AI announcement lately is "agentic" or "multi-agent," and I can't always tell if it's a real architectural shift or rebranded function calling with extra steps.

↯ Function Calling function-calling agentic
The age of Agentic Commerce has arrived. Consensus 2026 is where you can (www.coindesk.com via hn) +1 8w

The age of Agentic Commerce has arrived. Consensus 2026 is where you can experience it IRL. AI agents are already transacting.

agentic
Launching Agentic Orchestration Platform (Open Source) (sinas.co via hn) +1 8w

Open Source · Self-hosted · AGPL v3 Build AI-powered applications, not infrastructure Agents, functions, database queries, state, files, and templates — behind a single API with role-based access control. Deploy with Docker Compose.

agentic
Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com) +1 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
Agentic NixOS: Building a Safe Control Layer (nedkarlovich.com via hn) +1 8w

A six-part series on building Agentix, a cautious agent-control layer for NixOS. From philosophy to MVP to roadmap.

agentic
Has Anyone vibe coding an AI Agent or Agentic AI system?! (www.reddit.com) +11 8w

Hey everyone! Looking for some guidance and suggestions, as to whether anyone has worked or is working on building AI Agents or Agentic AI systems completely through vibecoding, especially by LangChain+LangGraph.

agentic
AI based Research suggestion (www.reddit.com) +1 8w

Hey guys, any suggestions on what tools or methods which works best in the current market for research on any topics in general. I mostly do research on AI tools, agentic frameworks, what is new, what problems exist etc.

agentic
Copilot Cloud Agents and OSS in 2026 (www.reddit.com) +11 8w

What is it that makes Github Copilot cloud agents so easy to use (developer friendly). - Is it the integration with the github UI (assign to agent)?

↯ Copilot copilot mcp agentic+1
Benchmarking Inference Engines on Agentic Workloads (www.appliedcompute.com via hn) +1 8w

Large language model inference engines are typically benchmarked with prompt-heavy, decode-heavy, or balanced workloads. InferenceX from SemiAnalysis, for example, tests a workload with a…

agentic
Is anyone being "highly encouraged" to integrate agentic AI even if it doesn't make sense? (www.reddit.com) +11 8w

I work in video post-production and while there are a lot of AI tools on the rise for editorial, it's fairly unclear if/where agents have a spot in the producer workflow. Some of my job is budget and schedule, but a lot of it is decision ma…

agentic
Does Claude create graphic reports from spreadsheet data? (www.reddit.com) +11 8w

I am often times trying to pull data from spreadsheets and making charts and graphs to better represent the data for others to understand. Does Claude handle this well?

agentic claude-code
why does GPT 5.5 have a restraining order against "Raccoons," "Goblins," and "Pigeons"? (www.reddit.com) +11 8w

I just saw the full system prompt leak for 5.5 (April 23rd release).

↯ GPT 5.5 rlhf agentic openai
Claude Code, extended to everything (www.reddit.com) +12 8w

everyone hitting Claude Code rate limits knows the pain you're mid-build, momentum is real, then it just stops. you wait 5 to 9 hours, restore the cache, come back to a session already at 30% used before you typed a single line.

↯ Tool Use tool-use agentic claude-code
Where should AI agents discover secondary-market supply? (www.reddit.com) +12 8w

I've been thinking about a gap in agentic commerce. A lot of the current work seems focused on helping agents buy from existing stores, suppliers, or checkout flows.

agentic
Architectural Requirements for Agentic AI Containment (arxiv.org via hn) +1 8w

The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that agentic AI systems with autonomous too…

agentic
hackers of reddit I have a doubt (www.reddit.com) +12 8w

in this time where agentic ai is becoming a real thing, im curious how its actually impacting you guys on the ground is it making it easier to break into systems or is it actually helping people secure things better? like are you able to m…

agentic
Building a tool to debug AI agents because current debugging is painful. Curious what’s the most frustrating failure you’ve hit (www.reddit.com) +11 8w

I’m tired of 'vibe-checking' my agents. I’ve been building a few complex agentic workflows lately, and the most frustrating part isn't the initial code, it's the non-deterministic drift.

agentic
Scaling Test-Time Compute for Agentic Coding (arxiv.org via hn) +1 8w

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined.

agentic
Sharing my minimal dev AI workflow Claude Code agent that takes a GitHub issue to merged PR with 3 human gates (www.reddit.com) +12 8w

Sharing a workflow in case it's useful to anyone else exploring agentic coding loops. The setup is one orchestrator agent (issue-resolver) that handles a GitHub issue end-to-end.

agentic claude-code
Building a Full-Stack Agentic AI Platform (RAG + Orchestration + Governance) — feedback? (www.reddit.com) +12 8w

Hey folks 👋 I’ve been working on an AI agent platform called Noevex, focused on real production use—not just demos. In practice, AI systems struggle with: multi-step orchestration connecting multiple data sources controlling agent actions…

rag agentic
AgentSwarms now has free agent skill library and skill generation tool! (www.reddit.com) +12 8w

Hey Everyone, If you’ve been building multi-agent workflows (with LangGraph, CrewAI, Swarm, etc.), you’ve probably hit the exact same wall I did: System Prompt Bloat. When we start out, we tend to stuff everything into a single prompt: "Yo…

agentic
I asked Agentic AI security tool to demonstrate its usefulness with use case examples (www.reddit.com) +11 8w

Sentinel Gateway is a token-gated security middleware that sits between humans and AI agents. It solves prompt injection — the #1 LLM security risk (OWASP 2025) — through structural enforcement, not content filtering.

↯ Security prompt-injection security agentic
Show HN: Delegare – let AI agents pay safely (x402, AP2 – base/USDC and Stripe) (delegare.dev via hn) +1 8w

Hi guys, am building SecureLend.ai and when working on our underwriting agents (free trial, paid after) I had issues with seamless payment options. Of course I looked at x402 which I believe is a great protocol but not a fan of a) sharing…

agentic
Is 15% context growth per loop a fair benchmark for agent cost estimation? (www.reddit.com) +12 8w

I’ve been running some math on recursive agentic loops using April 2026 rates (specifically for GPT-5.4 and Claude 4.7). In my tests, I’m seeing a massive cost "hockey stick" around loop 15-20 because of how the context grows.

↯ Claude 4.7 gpt-5 agentic
Show HN: I built a way to see if your SDK is AI-friendly (news.ycombinator.com) +1 8w

Have you ever wondered if your SDK is friendly to agentic AI tools like Claude Code or Codex? I built an open-source (Apache 2.0) CLI that answers that question for you.

codex agentic claude-code
Interactive playground to learn Agentic AI hands-on (Free) with Certification (www.reddit.com) +12 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
Rick and Morty Tried to Warn Us About Agentic AI (jadarma.github.io via hn) +1 8w

To be fair, you have to have a very high IQ to understand Rick and Morty. The humor is extremely subtle, and without a solid grasp of machine learning most of the jokes will go over a typical viewer’s head.

agentic
Ask HN: Enterprise Agent Orchestration Recommendations? (news.ycombinator.com) +1 8w

I've been made tech lead for our internal Agentic Platform and Experience. This effort will support both the developers and business teams.

gemini codex agentic
Claude API - SDK vs ClaudeCode : Can someone explain the tokenomics for caching and agentic flows (read, write, fetch, etc.) (www.reddit.com) +11 8w

I am trying to do some research across a number of attributes, which requires a lot of web fetch (at times dynamic) and just tried the API based approach. Why is the SDK-API version so expensive compared to the Max plans, despite caching?

agentic
Can agentic AI consent on your behalf? (blog.avas.space via hn) +1 8w

Tech companies have been promising that online shopping or booking a hotel can soon be handled by AI.

agentic
Claude token efficiency: a practical guide for Claude Chat , Claude Code, and API users. How to use tokens economically, ecologically, and intelligently. (www.reddit.com) +11 8w

Claude token efficiency guide. Contents: Chapter 1: You use Claude.ai (http://Claude.ai), Claude Desktop, or the mobile app. You do not write code, you do not call the API.

agentic anthropic claude-code
Built a 22-endpoint API delivering enriched UK Gov Data — with x402 for agentic buyers (www.reddit.com) +11 8w

Homescreen - Try all endpoints for free. I wanted to share a recent project. I wanted to build a project around free-to-use data that, when brought together, enriched and made easy to use, would be valuable to people. I used Claude Code to buil…

agentic claude-code
Ask HN: How do you solve aggregation when agentic RAG breaks down? (news.ycombinator.com) +1 8w

I keep hitting the same failure mode with agentic RAG over collections of similar PDFs, like monthly electricity and gas bills from the same utility provider. It works well for retrieval: “Find my gas bill from January.” Though even there…

rag agentic
Agents for end-to-end document redaction and review tasks (OCR and PII identification - Qwen 3.6 vs closed-source comparison) (www.reddit.com) +12 8w

(Links to all files, apps, and repos mentioned in this post can be found in the 'full post' link in my first comment) Agents for document redaction and review tasks Document redaction tasks involve text and vision capabilities, and long co…

↯ Qwen 3.6 qwen agentic
Forget chatbots. A single enterprise just hit 146M Agent-to-Agent (A2A) tasks. (www.reddit.com) +14 8w

We talk a lot about theoretical multi-agent frameworks (like AutoGen or CrewAI) and AGI timelines here, but I just saw some wild real-world deployment stats from a massive global marketing conglomerate. They recently reported that over the…

agentic
Which is the best AI agent to use for development of website and Architecture design and which mcp (www.reddit.com) +11 8w

Basically i want to do a fresh start with this AI agentic Development, Anyone here can guide to which is the best set of tools to use and which mcp and plugins do i need to setup. Consider i am going to use Claude code and i use some time…

mcp agentic claude-code
Ask HN: What does your agentic software dark factory look like? (news.ycombinator.com) +1 8w

In some of the comment threads around here a few of you shared interesting ideas and patterns, enough that I believe everyone interested in harness engineering is working on some sort of software dark factory or another. We have OpenAI’s…

agentic openai
Native Dialog popup failures (www.reddit.com) +11 8w

I'm currently creating a couple of agentic workflows that include various cases of downloading files automatically on different UIs, but, since I'm using chrome MCP for navigation, whenever a "save as" dialog shows up, claude is unable to…

mcp agentic
Agent Index Documenting Technical and Safety Features (arxiv.org via hn) +11 8w

Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosystem is complex, rapidly evolving, and in…

agentic
Agentic sprawl is becoming a real ops problem - how is your team actually managing behavioral policies across agents without a central dashboard? (www.reddit.com) +13 8w

Six months ago we had 3 agents in production. Now we have 17.

agentic
Moving from Cursor to VS Code + Codex/Claude Code: Is it worth the switch? (www.reddit.com) +1 8w

Hey everyone, I’ve been on Cursor Pro for a month and I love the workflow—constantly jumping between Ask, Planning, and Agentic modes. It just works.

codex cursor agentic+1
Claude Max users, what do you do good sirs? (www.reddit.com) +13 8w

I've been a Claude Pro user for almost two years now. I used GPT Pro previously but switched to Claude after feeling it was better for my coding usage. I barely hit 30 percent usage of my weekly limit, there are instances where I maxed out, but ve…

↯ Copilot copilot agentic claude-code
WordPress: The Operating System of the Agentic Web (automattic.com via hn) +1 8w

We’ve invited executives from across Automattic to share their perspective on leadership, open source, and the future of the open web. The latest comes from James Grierson, our head of global expansion, who shared his thoughts on the WordP…

agentic
DeepSeek V3.2 looping bug: what settings / harness tweaks are actually reducing it in production? (www.reddit.com) +11 8w

I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…

↯ Tool Use ↯ DeepSeek 3.2 tool-use deepseek agentic
Built a Legal RAG Chatbot for Indian lawyers covering BNS, BNSS, BSA and DPDP Act 2023 — Custom PageIndex + BERT + GPT-4o [Live Demo] (www.reddit.com) +11 8w

I ran a business for 12+ years. Traveling constantly.

rag agentic
Ask HN: Is "agentic" coding working for everyone except me? (news.ycombinator.com) +12 8w

I'm a solo developer, working on my own for my startup. I use AI/LLMs extensively in my work to explore new ideas, but the vast majority of my code is manually written.

codex agentic
Simulating and Evaluating Agentic Systems (www.gojiberries.io via hn) +1 8w

Most teams building agentic systems know they need some way to test them. An agent interprets ambiguous input, picks actions in a loop, maintains state across many steps, and has to land in the rig…

agentic
Anyone here building agentic commerce? (www.reddit.com) +11 8w

I’m getting close to launching an agentic commerce product and wanted to connect with people who are building in this area or have already shipped something similar. Mostly just hoping to compare notes before going live, especially around…

agentic
It's OK to Use Agentic to Revive the Projects You Never Were Going to Finish (blog.matthewbrunelle.com via hn) +1 8w

It's OK to Use Coding Assistance Tools To Revive The Projects You Never Were Going To Finish. Note: I initially drafted this before my last post on how Claude Code is getting worse. I'm putting it out now so I can reference it in a future p…

agentic claude-code
Agentic AI for Hormuz Shock Modelling (avkcode.github.io via hn) +1 8w

EIA 1H25 flow estimate, roughly one-fifth of global petroleum liquids consumption. Hormuz Shock IEA range for pipeline alternatives; EIA cites about 4.7 mb/d from Saudi and UAE lines.

agentic
LogAct: Enabling agentic reliability via shared logs (arxiv.org via hn) +1 8w

Agents are LLM-driven components that can mutate environments in powerful, arbitrary ways. Extracting guarantees for the execution of agents in production environments can be challenging due to asynchrony and failures.

agentic
Ask HN: Agentic Prompt Compaction Strategies (news.ycombinator.com) +1 8w

What are your favorite reasoning/compaction strategies for saving token spend, and why?

agentic
Show HN: The why and how of TurboPentest for the Agentic Era (integsec.com via hn) +11 8w

Here is the story of why/how I built TurboPentest. TurboPentest was designed for the AI era and to address the large volume of code now produced by coding assistants and the associated security vulnerabilities it introduces.

agentic
Building the Agentic State in Estonia: What is taking shape (luukasilves.substack.com via hn) +1 8w

Over the past generation, Estonia has built one of the world’s most advanced digital states. The next shift is not simply toward more digital services, but toward a more a…

agentic
Building an agentic escrow for software projects (news.ycombinator.com) +1 9w

I am building an AI powered escrow service for software projects that intends to protect both freelancers and the clients. - Freelancers: your IP (code/repo) always stays private - Clients: you get sandboxed link + detailed report (specs,…

agentic
Is an Open AI OS on the horizon? (www.reddit.com) +16 9w

If not an OS proper, then on desktop a full-screen, never-need-to-leave app? The big dogs (msft, apple, google) already have operating systems and they will inevitably make their own assistants and models first class.

agentic
R2-D2 Monitor: A personality-driven Windows TUI built with Claude (www.reddit.com) +11 9w

I wanted to share a project I’ve been building called R2-D2 Monitor. It’s a high-performance system telemetry console for Windows, built entirely in Go using the Bubble Tea framework.

agentic
Show HN: Legal Action Boundary Eval for agentic legal workflows (github.com via hn) +11 9w

We published LABE, a public benchmark for legal AI at the exact point where a system is about to take a real high-impact action. Current result: baseline executed 18 unjustified high-impact action points with VerifiedX that dropped to 0 fa…

agentic
Has anyone managed to use gemma 4 e4b in Open Code/other agentic TUIs? (www.reddit.com) +14 9w

Hi everyone, as a power user I hit Claude Code's usage cap too often, so I wanted to set up my own local model. However, I only have an RTX 5070 with 12 GB of VRAM, so the only realistic option was Gemma 4 with effective 4B params. When I tried to…

↯ Gemma 4 aider gemma agentic+1
Ask HN: Are startup job titles evolving in the agentic era? (news.ycombinator.com) +1 9w

I’m curious if founders and engineering leaders feel that traditional job titles no longer accurately describe what an early team actually does in an AI-native workflow. For those of you who have started companies recently, or are radicall…

agentic
Ask HN: How are you handling domain registration in agentic workflows? (news.ycombinator.com) +12 9w

I've been building tools for AI agents and the domain registration step is still completely manual. You have to go to a registrar website, search, click through a checkout flow, configure DNS.

agentic
is Qwen3.6-27B comparable with Opus 4.5? (www.reddit.com) +113 9w

https://preview.redd.it/qtzdx5ud0rwg1.jpg?width=1200&format=pjpg&auto=webp&s=aa25d9f0bb8007ee6e4065cfa46a9685454c89cd - Outstanding agentic coding, surpasses Qwen3.5-397B-A17B across all major coding benchmarks - Strong reasoning across te…

↯ Qwen 3.6 opus agentic
The model alone is not the agent. The harness plus the model is the agent (www.reddit.com) +16 9w

An agentic harness is the orchestration and control layer wrapped around a base language model that transforms it from a stateless text predictor into an agent capable of taking actions, calling tools, maintaining state across steps, and e…

↯ Function Calling function-calling agentic
Symposium: Community-Oriented Agentic Development (smallcultfollowing.com via hn) +1 9w

21 April 2026. I’m very excited to announce the first release of the Symposium project as well as its inclusion in the Rust Foundation’s Innovation Lab. Symposium’s goal is to let everyone i…

agentic
I'm building a registry where AI agents can pull production-ready prompts and structured inputs programmatically (www.reddit.com) +18 9w

One pain point I keep running into with agentic workflows: there's no good place to store, version, and share the prompts and JSON configs that actually power your agents in production. I'm building Fortae to fix that.

agentic openai
Help sending Voiceflow data to Make.com (www.reddit.com) +11 9w

Hoping somebody can help me. I’m creating an agentic chatbot in Voiceflow.

agentic
Agentic Coordination, Human Delivery (dontdos.substack.com via hn) +1 9w

Posted anonymously by a CTO who'd rather not turn a difficult year into a marketing exercise. About nine minutes, if you read at a civilised pace.

agentic
A Comparison of Agentic AI Systems and Human Economists (marginalrevolution.com via hn) +1 9w

This paper compares agentic AI systems and human economists performing the same causal inference tasks. AI systems and humans generally obtain similar median causal effect estimates.

agentic
Agentic Market (agentic.market via hn) +1 9w

agentic
AI Learning Resources (www.reddit.com) +11 9w

↯ Cowork cowork agentic claude-code
Show HN: Modern AI client for Mac with agentic tools, clean UI, builtin privacy (elvean.app via hn) +1 9w

If you don't like Claude Desktop or ChatGPT app you're not alone, here are some of the reasons why I don't like them and decided to build an alternative. Lack of control: you can’t control the web-search (depth, breadth and number of source…

chatgpt mcp agentic+1
RFC: Gemba - The thing to make the thing (www.reddit.com) +11 9w

agentic
Is the future of marketing agentic? (www.reddit.com) +13 9w

openclaw agentic
Combine persistant global Memory- and Task- management into one uniform system (www.reddit.com) +11 9w

rag agentic
How to talk online (www.reddit.com) +11 9w

agentic claude-code
Do you have any go-to utility LLM-related tools that are less commonly discussed? (www.reddit.com) +110 9w

vllm ollama openclaw+3
I Tested 20+ AI Agents with Real X API Workflows, Here’s What Actually Works in 2026 (www.reddit.com) +12 9w

grok openclaw agentic
How can I trust AI with critical workflows if it can’t get the “walk or drive to car wash” right? (www.reddit.com) +114 9w

agentic
2 Big Bottlenecks to Scaling Agentic State (georgianailab.substack.com via hn) +1 9w

agentic
Agentic AI as a Part of Software Development (nemorize.com via hn) +1 9w

agentic
AI agents in industry/manufacturing (www.reddit.com) +11 9w

agentic
Automate the Path from Data to Predictive Insights with Agentic ML in Snowflake (www.snowflake.com via hn) +1 9w

agentic
Agentic edits/commands VS Code with Cline- is it really private or offline? (www.reddit.com) +11 9w

cline agentic
An Agentic Home Bioreactor (chillphysicsenjoyer.substack.com via hn) +1 9w

agentic
Has Anybody Implemented Agentic Monitoring with Composer 2 (via reddit) +11 9w

agentic
Beyond the Hype: Practical and Responsible Use Cases for Agentic AI Webinar (fusionauth.io via hn) +1 9w

This session cuts through the noise of Agentic AI to focus on responsible integration into modern application development, specifi…

agentic
Agentic coding hides architectural flaws that are obvious in a diagram. Built a skill to close the loop (www.reddit.com) +11 9w

When you’re building with agentic coding, agents make architectural decisions that sometimes aren't optimal which may lead to bugs or vulnerabilities or inefficiencies. These are hard to catch reading code file by file or even by agents th…

agentic
Sandboxes and Worktrees: My Secure Agentic AI Setup in 2026 (mikemcquaid.com via hn) +1 9w

I’ve been using AI tools since early 2021 when I was invited to test out the Copilot internal alpha at GitHub (where I spent 10 years). I’ve maintained Homebrew since 2009.

↯ Copilot copilot agentic
Best way to prepare for AI Engineer interviews? (www.reddit.com) +14 9w

I’m currently preparing for AI-focused roles and would love to get perspectives from people already working in the industry. For context — I have ~5 years of experience as a Full Stack Engineer with a strong focus on AI systems.

↯ Llama 3.3 rag llama agentic
AI Agents Are Leaking Enterprise Data. Here's Why Nobody Is Watching (www.privent.ai via hn) +1 9w

Agentic AI introduces a machine-speed data exposure surface that traditional human-centric security controls cannot govern.

agentic
Dev seeking advice: High-Context Local LLM for Coding (Verification/Bug-fixing loop) – Mac Studio vs. Multi-GPU Linux Rig? (www.reddit.com) +12 9w

I'm a dev looking to build a local LLM node to offset subscription costs (Claude/Copilot). My workflow: Cloud for initial architecture/complex features -> Local for iterative bug-fixing and continuous integration.

↯ Copilot copilot agentic
Java 26 and the Rise of Agentic AI: The State of the Ecosystem (April 2026) (techlife.blog via hn) +1 10w

Java in April 2026: Leyden Grows Up, Spring Gets Smarter, and the JVM Quietly Reinvents Itself for the AI Era. By Turker Senturk, 17 Apr, 2026. If you’ve been half-watching the Java world from the sidelines over the…

agentic
Cursor vs. Claude Code: Is the claude code CLI worth it after the "Thinking" nerf? (www.reddit.com) +11 10w

As a heavy Cursor user, I’m debating moving my .mdc-based workflow into Claude Code (run within the Cursor terminal), but I’m skeptical following the recent reports of decreased "thinking effort" and reasoning quality. Is the agentic auton…

cursor agentic claude-code
Is OpenHands (OpenDevin) still the move in 2026? Comparing it to Claude Code and OpenCode for a beginner. (www.reddit.com) +11 10w

Hey everyone, I’m just starting to dive into agentic coding tools and I'm a bit overwhelmed by the options. I’ve been looking into OpenHands (the project formerly known as OpenDevin), but I see a lot of hype around Claude Code and OpenCode…

agentic anthropic claude-code
Show HN: Viche – OSS private registry for agent communication (github.com via hn) +1 10w

Viche (https://github.com/viche-ai/viche) is a private registry and communication protocol for agents. Overview at https://viche.ai Think discord + agents + agentic search based on capabilities.

openclaw agentic
The New Postman Is Here: AI-Native and Built for the Agentic Era (blog.postman.com via hn) +1 10w

agentic
Ask HN: Opus 4.7 – is anyone measuring the real token cost on agentic tasks? (news.ycombinator.com) +1 10w

Shipped today. The benchmarks are real: 87.6% SWE-bench (from 80.8%), +13% on coding tasks, 3x more resolved production tasks on Rakuten-SWE-Bench.

↯ Swe Bench ↯ Opus 4.7 swe-bench opus agentic
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com) +11 10w

Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…

↯ Anthropic Mythos ↯ Tool Use ↯ Gemini 3.1 tool-use gpt-5 mythos+4
Agentic Reasoning in Practice: Making Sense of Structured and Unstructured Data (www.databricks.com via hn) +1 10w

Enterprise data is rarely useful in a silo. Answering questions like, "Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller site…

agentic
Self-learning loop for Claude Code based on Scrum method (www.reddit.com) +11 10w

Good day, Claude Code users. I just want to share my approach to implementing a self-learning Claude framework.

agentic claude-code
Two small agentic patterns to wire apps directly to Claude Code (www.reddit.com) +12 10w

These two patterns turn Claude Code into a personal assistant. You interact normally with it and it listens in the background for events, handles them, and gets back to interacting with you.

agentic claude-code
Complex, parallel, long-running claude/agentic sessions - what is the point? where is the value? (www.reddit.com) +15 10w

Here is how I view AI Agents field (with focus on SWE/research) right now: - "chats online" gpt/gemini/claude --> general use - "vscode like extensions" cursor/antigravity/cline vs code extension/cc vs code extension etc. --> for coding, b…

cline gemini codex+2
Show HN: ZettelForge – Agentic memory for cyber threat intelligence (github.com via hn) +1 10w

The only agentic memory system built for cyber threat intelligence. Give your AI agents persistent memory with entity extraction, knowledge graphs, and STIX ontology -- no cloud, no API keys, works offline.

agentic
Show HN: Agentfab – A Distributed Agentic Platform (github.com via hn) +1 10w

Hi HN, I’m the creator of agentfab, a distributed agentic platform that features task decomposition, multi-agent orchestration, model heterogeneity with custom agentic fabrics, bounded review loops, and a bespoke self-curating memory syste…

agentic
Agentican Framework – OSS multi-agent for Java (github.com via hn) +11 10w

A lightweight Java framework for embedding tool-using LLM agents into your applications. Agentican lets Java developers add agentic capabilities to their applications with minimal ceremony.

agentic
Agentic Engineering Methodology – Structured AI-Assisted Dev (Karpathy, Osmani) (github.com via hn) +1 10w

A structured, human-led methodology for planning and executing software projects with AI coding agents. Built from practitioner experience and refined with research from Andrej Karpathy, Addy Osmani, and the…

agentic
Agent Continuity: Disaster Recovery for the Agentic Era (gavinpineapple.substack.com via hn) +11 10w

What happens when the proverbial 💩 hits the (GPU) fan and you lose all your agents? My Favorite Alien: I would like you to meet Rocky 🪨🦞 Rocky (named after the adorable alien from Proj…

agentic
Claude Code Goes Full Workstation: Anthropic Redesigns the Desktop App (abz.global via hn) +1 10w

The update in one line: Claude Code's desktop app got a full redesign aimed at one thing: running multiple agentic coding sessions in parallel withou…

agentic anthropic claude-code
Stop letting your agents decide everything — extract deterministic steps wherever you can (www.reddit.com) +11 10w

Context: I have been building Litmus (a brutal market validation tool) and I've learnt that if your agentic pipeline needs to produce factual, reliable output, stop letting the AI decide everything. The insight: extract deterministic steps…

agentic
Aethon: A reference-based instantiation primitive for stateful AI agents (arxiv.org via hn) +1 10w

The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure. While large language models have made persistent, tool-using, and collaborative agents te…

agentic
Show HN: Idea File for LLM Cycling Coach (gist.github.com via hn) +1 10w

This is heavily inspired by Andrej Karpathy's LLM Wiki, and could be used to create many other types of "Agentic Apps" or however you want to call them. My specific implementation uses Claude Code, TrainingPeaks, Todoist and Apple health.

agentic claude-code
Show HN: Memwright – Self-hosted memory for multi-agent teams, no LLM in path (github.com via hn) +1 10w

Memwright: A Memory Journal for Agentic Systems. Filed under Infrastructure, by Surendra Singh. Vol. 02 · Rev.

agentic
Beneficial Deployment Request, No Response after Months. (www.reddit.com) +12 10w

I'm building AI tools to help disabled Medicaid recipients enforce the laws that protect their human rights, because I'm a disabled Medicaid recipient whose human rights are being violated by the State and its actors, and no one seems to…

agentic anthropic claude-code
Model agnostic, agentic annotation tools for text highlighting (old.reddit.com via hn) +1 10w

agentic
Agentic AI | Confusion between reading the context of SKILL and reading the file (www.reddit.com) +11 10w

Hey all, I am building a system that supports skill reading with progressive disclosure. Initially, I include the skill name and description in the system prompt, and I have a function tool called read_skill that reads the content of a ski…

mcp agentic
Agentic AI Tools – A directory to find and compare AI agent tools (agenticaitools.net via hn) +11 10w

Curated directory of 500+ AI tools. Discover the best AI tools for your workflow: find, compare, and choose the perfect agentic AI tools. Expert reviews, side-by-side comparisons, and alternatives, all in one place.

agentic
Show HN: I analyzed 591 agentic engineering jobs: LangChain dominates at 22% (agentic-engineering-jobs.com via hn) +11 10w

LangChain Job Market 2026: We analyzed 591 agentic AI engineering job listings. Here's what the market looks like for LangChain engineers.

agentic
Compare harnesses not models: Blitzy vs. GPT-5.4 on SWE-Bench Pro (quesma.com via hn) +1 10w

An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.

↯ Swe Bench ↯ Gemini 3.1 swe-bench gpt-5 gemini+2
Built an open-source knowledge graph that gives AI agents domain expertise in bioinformatics, hosted as an MCP server (www.reddit.com) +12 10w

Sharing something I've been working on that might be interesting to this community from a design perspective, even if bioinformatics isn't your domain. The problem: I've been building agentic pipelines for bioinformatics (genomic analysis,…

mcp agentic
Scaling Managed Agents: Decoupling the brain from the hands (www.anthropic.com via hn) +1 10w

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

agentic anthropic
Full-stack dev (8 YOE, Vue/Node/Laravel) trying to break into AI Agents from zero — is this Udemy course worth it? + looking for advice on the best path (www.reddit.com) +13 10w

Hey r/AI_Agents, I'm a full-stack software engineer with 8 years of experience, primarily working with Vue, Node.js, and Laravel. I have zero background in AI/ML but I've been watching the space and I feel like I'm falling behind.

mcp agentic
Why Engineering Teams Need an Agentic Layer, Not Just AI Chat (medium.com via hn) +12 10w

By Simone Mutti | Apr, 2026 | Medium

agentic
Show HN: The Harness for Creative Agents (www.flickspeed.ai via hn) +11 10w

Coding agents need shell. Creative agents need canvas.

creative-agents agentic
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI (openai.com via hn) +1 10w

chatgpt agentic openai
Finding Widespread Cheating on Popular Agent Benchmarks (debugml.github.io via hn) +1 10w

TLDR: Agentic cheating is a widespread issue, affecting thousands of submitted agent runs on 28+ submissions across 9 different benchmarks. Terminal-Bench 2 is a popular benchmark used to evaluate frontier model releases (e.g.

agentic
If You're Only Running One Claude Code Session, You're Not Going Fast Enough (www.scape.work via hn) +12 10w

April 12, 2026. The real skill in using agentic coding is not coding at all. It's management.

agentic claude-code
How are you reducing LLM token costs for async workflows? (github.com via hn) +12 10w

ParaLLeM is a library for orchestrating agentic LLM workflows. Batch API support; concise, readable, and expressive; developer-centered and lightweight. Parallelize thousands of requests, while keeping reproducible traces for each ru…

agentic
Created a linter for agentic code smells (via reddit) +14 10w

agentic
Strong feeling: we are in a folded AI reality (news.ycombinator.com) +11 10w

Some people think agentic AI can do everything and is getting more and more powerful, and even fear it. Another group, non-technical people, is still trapped in a world where LLM chat is weak and full of hallucinations.

↯ Hallucination hallucination agentic
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning (arxiv.org) 9h

Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy…

agentic
Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps (arxiv.org) 9h

Frontier deep research agents (DRAs) are being deployed in enterprise workflows faster than they are being evaluated. Existing benchmarks measure factual recall, single-hop QA, or generic agentic skill, and miss the multi-document, decisio…

agentic
Chai: Agentic Discovery of Cryptographic Misuse Vulnerabilities (arxiv.org) 9h

AI-assisted vulnerability discovery has proven effective for bug classes like memory safety, where instrumentation confirms memory violations and efficiently filters false positives. Many dangerous vulnerability classes, such as cryptograp…

↯ Security security agentic
MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG (arxiv.org) 9h

Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-team…

↯ Security prompt-injection rag security+1
HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization (arxiv.org) 9h

Large language model (LLM) agents enable natural language interaction for scientific visualization (SciVis). Still, prior systems have essentially prioritized autonomy over human analytical control, thereby limiting transparency and human…

agentic
Localizing RL-Induced Tool Use to a Single Crosscoder Feature (arxiv.org) 9h

Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood. While RL substantially improves structured…

↯ Tool Use ↯ Fine Tuning tool-use fine-tuning agentic
EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning (arxiv.org) 9h

In actor-critic reinforcement learning, network architectures are typically manually designed. Automating this design is challenging because each candidate must be trained before evaluation, and the design space is open-ended.

agentic
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators (arxiv.org) 9h

Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmar…

agentic
A Process Harness for Uplifting Legacy Workflows to Agentic BPM: Design and Realization in CUGA FLO (arxiv.org) 9h

We introduce the process harness, a new mechanism for uplifting legacy workflows into Agentic Business Process Management (Agentic BPM) without replacing the underlying workflow engine. A process harness places a policy-governed agentic la…

agentic
When Agents Meet Electric Bus Fleet Operations: Pricing Behavior, Trade-offs, and Policy Implications in an Aggregator Framework (arxiv.org) 9h

Agentic systems are changing how complex operational tasks are coordinated, introducing a new paradigm for connecting heterogeneous data sources and automating processes. Electric bus fleets provide a relevant test case.

agentic
Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems (arxiv.org) 9h

Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional beha…

agentic
How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks? (arxiv.org) 9h

Agentic benchmarks have emerged across general-purpose and domain-specific settings, including finance, coding, law, and drug discovery, yet energy-domain evaluations remain largely limited to static knowledge recall. This is a critical ga…

agentic
Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking (arxiv.org) 9h

Patients increasingly seek medication information online, yet safety knowledge for psychiatric drugs is split between regulatory adverse-event records, which are authoritative but abstract, and patient narratives, which are experience-near…

agentic
Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols (arxiv.org) 9h

As AI agent protocols proliferate, the governance structures shaping their interoperability standards remain empirically underexamined. We introduce an LLM-powered comparative pipeline for large-scale governance discourse analysis, integra…

agentic
PreHook command Gate policy layer for all Claude code agents (www.reddit.com) 11h

Hello everyone, I recently was fed up with agents running unsupervised commands on my systems and wanted to solve this problem. The problem was simple, Claude code model “fable 5” uses safety flags in the UI layer that prevented the model…

opus agentic claude-code
Is there a standard for porting agent state across models, or are we all writing custom wrappers? (www.reddit.com via reddit) 19h

Hey everyone, I'm fairly new to the agentic workflows space. Really interested to get into it.

rag agentic
Which model for technical documentation? (www.reddit.com via reddit) 19h

Looking to create high level / low level designs (software), based on existing templates/examples, cross reference code, use MCP to download Confluence/Jira data - also plug into agentic ‘coding’ frameworks (opencode). I mostly use opus 3.6…

opus mcp agentic
Built a loop engineering skill for PRs in Claude Code — branch, two independent reviews, CI handling, merge handoff. Here's what I learned building it. (www.reddit.com) 20h

Most agentic coding tools are really good at one thing: writing code fast. You describe a problem, they implement it, done.

agentic claude-code
Claude Max vs Codex Pro or both combined? (www.reddit.com via reddit) 20h

I’m considering one heavier subscription (~€100/month) and want to know which provides better value for agentic coding. I tested GPT Pro and was satisfied with Codex.

↯ Glm ↯ GLM 5.2 glm ollama codex+2
New to Reddit & starting my journey to become a Gen AI / Agentic AI Dev. Looking to connect and learn. (www.reddit.com via reddit) 21h

Hey guys, I'm new to Reddit and just starting out learning Gen AI and Agentic AI. I really want to connect with people in this field for some guidance, networking, and just to talk.

agentic
How we made an AI agent faster by moving stable context out of the prompt (www.reddit.com via reddit) 22h

Many AI agents are expensive because they keep rediscovering the same context. At a recent BotsCrew Spotlight, one team shared what they learned while building an enterprise analytics assistant.

agentic
A Probabilistic Framework for LLM-Based Model Discovery (arxiv.org) 1d

Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-style iterative workflows that repeatedly pr…

agentic
Agentic Software Engineering: Foundational Pillars and a Research Roadmap (arxiv.org) 1d

Agentic Software Engineering (SE 3.0) represents a new era where intelligent agents are tasked not with simple code generation, but with achieving complex, goal-oriented SE objectives. To harness these new capabilities while ensuring trust…

agentic
Governing Technical Debt in Agentic AI Systems (arxiv.org) 1d

Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are…

agentic
Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces (arxiv.org) 1d

As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that create, operate on and manage other agents. Meta-agent operations such as coordinating agents, halting risky actions before ex…

agentic
Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows (arxiv.org) 1d

Agentic AI systems are increasingly being integrated into scientific workflows, yet their behavior under realistic conditions remains insufficiently understood. We evaluate CMBAgent across two workflow paradigms and eighteen astrophysical…

agentic
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents (arxiv.org) 1d

Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make bot…

agentic
Is GraphRAG Needed? From Basic RAG to Graph-/Agentic Solutions with Context Optimization (arxiv.org) 1d

As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for different RAG scenarios evaluation and comparison on semi-structured knowledge bases, inclu…

rag agentic
Autodata: An agentic data scientist to create high quality synthetic data (arxiv.org) 1d

We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create eve…

agentic
Agentic System as Compressor: Quantifying System Intelligence in Bits (arxiv.org) 1d

Large language models are turning from isolated predictors into agentic systems: they call tools, retrieve evidence, obey environment constraints, use verifiers, and complete tasks through search and multi-turn interaction. We adopt an an…

agentic
AI Snitches Get Glitches: Towards Evading Agentic Surveillance (arxiv.org) 1d

To better assist users with completing challenging tasks, AI agents mediate communications, access data, and interact with different APIs. Many employers (and even nation-states) already provide their users with this technology.

agentic
Agentic evolution of physically constrained foundation models (arxiv.org) 1d

Artificial intelligence increasingly drives automated scientific discovery, yet contemporary generalist agents lack physical grounding, frequently hallucinating hardware-incompatible designs. Here, we present a physically grounded, multi-a…

agentic
Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games (arxiv.org) 1d

Assessing financial literacy during gameplay without disrupting the learning experience remains a key challenge in serious games for education. We present the Agentic BKT pipeline, a multi-agent large language model architecture for stealt…

agentic
Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval (arxiv.org) 1d

Foundation-model agents in multi-step, open-ended environments frequently suffer from compounding errors, where early mistakes contaminate long-horizon trajectories. While Multi-Agent Debate (MAD) succeeds in deterministic domains, agents…

agentic
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems (arxiv.org) 1d

The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building…

agentic
How agents are transforming work (openai.com) 1d

Agentic AI changes the unit of knowledge work from single interactions to delegated, long-horizon tasks. Chatbot interactions are often short and self-contained.

agentic
As a solo builder I created a multi tenant B2B SaaS for commercial maintenance companies that is agentic AI capable in 2 months using Claude code. (www.reddit.com via reddit) 1d

Hello everyone, I began working on this project on April 16th. Some quick background.

agentic claude-code
I built a local, open-source tool that turns your Claude Code prompts into a self-portrait of how you code (www.reddit.com) 1d

I built devbrain, an open-source local dashboard for your Claude Code history. It shows when you work, where your tokens go, what you keep asking for, and TODOs extracted from your prompts.

agentic claude-code
Software development has entered its "infinite monkeys" era (www.reddit.com via reddit) 2d

With the rise of agentic coding tools like Claude Code, Cursor, and Codex, the barrier to entry is gone. Now, anyone with an internet connection can "type." We have essentially reached the infinite monkey phase of software development.

codex cursor agentic+1
AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning (arxiv.org) 2d

Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy collection of workplace…

agentic
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning (arxiv.org) 2d

Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes ta…

agentic
Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management (arxiv.org) 2d

Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist acr…

agentic
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms (arxiv.org) 2d

Progress in computational science depends on complex numerical workflows that must faithfully encode physical laws, yet translating conceptual insight into reliable code remains a major bottleneck. Although large language models can genera…

agentic
Multimedia and Visual Analytics in the Agentic Era (arxiv.org) 2d

Professional users need tools to help them gain actionable insights from large multimedia collections. Foundation models and AI agents have rapidly changed the playing field, and improving their accuracy, trustworthiness, and reasoning cap…

agentic
Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce (arxiv.org) 2d

Commercial NLP treats the shopping chatbot as a recommender or a conversion tool: its job is to match a user to a catalogue entry and close a sale. We argue that the arrival of agent-native micro-payment rails (e.g., x402, AP2) changes wha…

agentic
DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects (arxiv.org) 2d

Birth defects are a major cause of fetal loss, neonatal morbidity and long-term disability. In the subset with suspected genetic etiologies, exome and genome sequencing have moved many cases from variant detection to post-sequencing interp…

agentic
Red-Teaming the Agentic Red-Team (arxiv.org) 2d

The use of agentic systems to perform offensive security operations has moved from a theoretical possibility to a commoditized capability. However, while the community has focused on creating more and more capable agents, less attention ha…

agentic
OpenThoughts-Agent: Data Recipes for Agentic Models (arxiv.org) 2d

Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically ta…

agentic
Grading the Grader: Lessons from Evaluating an Agentic Data Analysis System (arxiv.org) 2d

Agentic data analysis systems produce rich outputs, including code, numerical results, and verbal diagnostics. This makes them more challenging to evaluate than single-turn LLM responses.

agentic
SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation (arxiv.org) 2d

As autonomous agents tackle increasingly complex multi-step, multi-agent tasks, their execution trajectories have scaled beyond the constraints of even the largest context windows. Current methods for effectively diagnosing agent failures…

agentic
Agentic AI for Bilevel Long-Term Optimization of Policy-Driven Physical Layer Systems (arxiv.org) 2d

Network operators' changing policies, service requirements, and stringent real-time constraints render existing methods designed with fixed objectives and constraints ineffective. This paper presents Agentic long-term performance optimizat…

agentic
OmniPath: A Multi-Modal Agentic Framework for Auditing Wheelchair Accessibility (arxiv.org) 2d

For a wheelchair user, a standard blue line on a map is often a broken promise. While platforms like OpenStreetMap (OSM) successfully capture where a path is, they frequently fail to convey how it physically feels to travel on it.

agentic
ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection (arxiv.org) 2d

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poo…

agentic
RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems (arxiv.org) 2d

Agentic AI systems powered by large language models (LLMs) are rapidly evolving into autonomous decision-making systems, exposing attack vectors beyond those of traditional LLM vulnerabilities. Existing security evaluations are often tied…

agentic
I'm building agent loops that auto-edit my videos, but the hard part has been finding a model to accurately grade the result (youtube.com via reddit) 2d

Quick context: I've been building agentic loops that edit my short-form videos for me. The editing works really well, but I found myself needing to check the process at several gates.

↯ Opus 4.8 ↯ GPT 5.5 gpt-5 gemini codex+3
Memory layer situation in claude and other agentic ecosystems (www.reddit.com via reddit) 2d

Every other day someone ships a new memory layer for AI agents. Claude has its own memory system, ChatGPT has one, and I've written my own.

chatgpt agentic claude-code
Build real agentic apps using CUGA: two dozen working examples on a lightweight harness (huggingface.co) 3d

TL;DR: Building an agent is mostly plumbing: tools, state, guardrails, scaling from one agent to many. CUGA (pip install cuga), short for Configurable…

agentic
RAVEN: Agentic RAG for Automated Vulnerability Repair (arxiv.org) 3d

↯ Security rag security agentic
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning (arxiv.org) 3d

agentic
ATLAS: Agentic Taxonomy of Large-Scale Software Ecosystems (arxiv.org) 3d

agentic
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory (arxiv.org) 3d

agentic
Dissecting Agentic RAG: A Component Ablation for Multi-Hop QA with a Local 7B Model (arxiv.org) 3d

rag agentic
SciLens: Multi-modal Scientific Claim Verification with Agentic Entailment and Grounding (arxiv.org) 3d

agentic
SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs (arxiv.org) 3d

agentic
From RAG to Agentic RAG for Faithful Islamic Question Answering (arxiv.org) 3d

rag agentic
Tell Me: An LLM-powered Mental Well-being Assistant with RAG, Synthetic Dialogue Generation, and Agentic Planning (arxiv.org) 3d

rag agentic
Towards Adaptive Categories: Dimensional Governance for Agentic AI (arxiv.org) 3d

agentic
RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation (arxiv.org) 3d

agentic
Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning (arxiv.org) 3d

agentic
Revelio: Cost-Efficient Agentic Memory Safety Vulnerability Detection For Repository-Scale Codebases (arxiv.org) 3d

↯ Security security agentic
TraceView: Interactive Visualization of Agentic Program Repair Trajectories (arxiv.org) 3d

agentic
From RAN Control to Agentic Intelligence: Architecture and Vision for Energy Efficient AI-RAN (arxiv.org) 3d

agentic
Skills for the future software profession: beyond agentic AI! (arxiv.org) 3d

agentic
SwarmX: Agentic Scheduling for Low-Latency Agentic Systems (arxiv.org) 3d

Agentic AI applications compose multiple model calls and tool executions, creating new scheduling challenges for GPU-CPU clusters. Their inference time and model-call structure often depend on prompt semantics, making conventional scheduli…

agentic
DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams (arxiv.org) 3d

Massive unstructured multimodal streams suffer from high "data entropy," impeding both efficient human knowledge acquisition and high-quality AI post-training. Existing passive annotation paradigms, heavily reliant on heuristic rules or ge…

agentic
One Image is All You Need: Agentic One-Shot Image Generation via Text-Based World Models for Long-Tail Spatial Perception (arxiv.org) 3d

Reliable spatial decision automation, such as autonomous driving and maritime surveillance, critically depends on robust visual perception. However, real-world spatiotemporal data exhibits severe heterogeneity, often manifesting as extreme…

agentic
Role-Based Agentic AI for Intent-Driven Network and Service Orchestration (arxiv.org) 3d

Telecommunication networks are increasingly complex due to heterogeneous technologies, diverse service requirements, and growing demands for resource efficiency and business agility. Intent-Based Networking (IBN) and, more recently, agenti…

agentic
Infrastructure for the Agentic Web: Gap Analysis and Architecture from the Agentverse Platform (arxiv.org) 3d

The emergence of autonomous AI agents as first-class participants in digital infrastructure marks a fundamental inflection point in the evolution of the Web. While significant research has been directed at agent behaviour and reasoning, co…

agentic
AI-Native Network Controller: A Modular Framework for Safe Agentic Control of Multi-Domain Network Infrastructure (arxiv.org) 3d

The convergence of multiple network domains, including radio access, optical transport, and core networks, under unified intelligent control is a fundamental requirement for future 6G systems. This is important because existing network con…

agentic
GIF: Locally Sound Geometric Information Flow Control for LLMs (arxiv.org) 3d

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. These range from prompt injections that manipulate downstream…

agentic
Agent-as-a-Router: Agentic Model Routing for Coding Tasks (arxiv.org) 3d

Real-world users typically have access to multiple Large Language Models (LLMs) from different providers, and these LLMs often excel at distinct domains, yet none dominate all. Consequently, routing each task to the most suitable model bec…

agentic
RaMem: Contextual Reinstatement for Long-term Agentic Memory (arxiv.org) 3d

Long-term memory has become increasingly important for LLM agents that operate across extended interactions and evolving task contexts. Recent memory systems have made past experiences more persistent, compact, and retrievable, but retriev…

agentic
Grounded Scaling: Why Agentic AI Needs Deterministic Environments (arxiv.org) 3d

Long-chain agent execution fails exponentially in environments designed for human tolerance: with per-step determinism $\delta < 1$, $k$-step chain success degrades as $\delta^k$. The AGI-to-ASI scaling debate (Genewein et al., 2026) has s…

agentic
Holmes: Multimodal Agentic Diagnosis for Mixed-Language Mobile Crashes at Industrial Scale (arxiv.org) 3d

Diagnosing mobile crashes in ultra-large-scale industrial applications is a formidable challenge due to the sheer volume of code, the complexity of mixed-language environments, and the inability to reproduce failures locally. Traditional s…

agentic
AgentRiskBOM: A Risk-Scoping Security Bill of Materials for Agentic AI Systems (arxiv.org) 3d

Agentic AI systems retrieve private context, invoke tools, write files, call external services, coordinate with other agents, and may act without human approval. Existing bill of materials artifacts improve transparency for dependencies, m…

agentic
Training the Orchestrator: A Supervised Approach to End-to-End PDDL Planning with LLM Agents (arxiv.org) 3d

Translating natural-language planning intent into verified plans is a longstanding challenge: people communicate goals in language, while classical planners require formal PDDL specifications. Recent agentic frameworks bridge this gap by o…

agentic
Counsel: A Meta-Evaluation Dataset for Agentic Tasks (arxiv.org) 3d

As agentic systems tackle increasingly complex multi-step tasks, evaluating their trajectories presents a major bottleneck - human annotation of a single trajectory on popular agentic benchmarks can take hours, making it difficult to scale…

agentic
Composing Verifiable Conceptual Models via Building Blocks: Towards Design-Time Verification of Agentic AI Workflows (arxiv.org) 3d

Agentic AI systems orchestrate multiple LLM-based agents through workflow architectures that coordinate decisions, tools, and external actions. While current platforms emphasize runtime safeguards, little support exists for verifying workf…

agentic
AutoRAS: Learning Robust Agentic Systems with Primitive Representations (arxiv.org) 3d

The automated design of agentic systems offers a promising pathway for scaling large language models (LLMs) beyond single-agent reasoning. While prior work has advanced task performance through handcrafted or automatically generated multi-…

agentic
Agentic Time Machine as an Infrastructure for Future-Event Forecasting (arxiv.org) 3d

Forecasting future events is a critical challenge for large language model (LLM) agents, spanning domains from elections and monetary policy to financial markets. However, evaluating progress on this task presents a fundamental trade-off b…

agentic
Democratizing and accelerating AI-driven pathology research through agentic intelligence (arxiv.org) 3d

Computational pathology has advanced rapidly with the emergence of foundation models, yet widespread adoption remains limited by substantial technical complexity and programming requirements. Here we present PathLab, an autonomous agentic…

agentic
A Quantum-Assisted Agentic Distributed Artificial Intelligence Framework for Deadline-Bounded Orchestration of Hybrid Renewable Microgrids (arxiv.org) 3d

The real-time orchestration of microgrids that combine fluctuating renewable sources, dispatchable units, storage and curtailable consumers requires the repeated solution of combinatorial dispatch and coalition formation problems under har…

agentic
Everyone thinks I'm a vibe coder because I put a Claude sticker on my laptop (www.reddit.com via reddit) 3d

I was just at a coffee shop when someone said, "I like your vibe," to me. I thanked him and then he pointed at my Claude sticker and clarified his original statement.

agentic
Connected a Robinhood Account to Claude Code and Codex for Autonomys Agentic Trading... Update 1 (www.reddit.com via reddit) 3d

Update to my original post: https://www.reddit.com/r/ClaudeAI/comments/1u8nagi/connected_a_robinhood_account_to_claude_code_and/ I'm building a fully autonomous daily stock-trading desk in a Robinhood "Agentic" account. Opus is the CEO/PM,…

gemma codex opus+2
what is agentic coding and why is everyone suddenly talking about it (www.reddit.com via reddit) 3d

I kept seeing "agentic coding" everywhere for the last few months, blog posts, Twitter threads, product launches, and I honestly couldn't tell if it was a real shift or just the latest marketing rebrand for AI autocomplete. So I spent the…

codex cursor agentic+1
Lighthouse agentic browsing scoring (developer.chrome.com) 6d

The Agentic Browsing category evaluates how well your site is constructed for machine interaction through a set of deterministic audits. How the category is scored: Unlike other Lighthouse categories, the Agentic Browsing category does not…

agentic
Server hosting for Agents (www.reddit.com via reddit) 6d

I am a non-developer about to build a CRM-like agentic workflow for myself and maybe 1 other team member of a company that I own and operate. I saw some options but wanted to see what people thought of Railway, Render, or other options?

agentic
Is there actually a good way to orchestrate multiple agents, or is everyone just running a bunch of terminals? (www.reddit.com via reddit) 6d

A couple weeks ago I saw someone with 6 instances of Claude Code open, each in its own window, switching between them by hand. And the thing is, that seems to be roughly the state of the art right now.

agentic claude-code
Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks (arxiv.org) 7d

Mathematicians understand a PDE solution through mathematical structures rather than tables of computed values. Historically, this has been the product of mathematical analysis, carried out by hand for each problem individually.

agentic
TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment (arxiv.org) 7d

Target Safety Assessment (TSA) requires systematic integration of genetic, transcriptomic, target homology, pharmacological, and clinical data to evaluate potential safety liabilities of therapeutic targets. This process is labor-intensive…

agentic
Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives (arxiv.org) 7d

Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone.

agentic
SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning (arxiv.org) 7d

Solving mathematical reasoning problems requires not only accurate access to relevant knowledge but also careful, multi-step thinking. However, current retrieval-augmented models often rely on a single perspective, follow inflexible search…

agentic
Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes (arxiv.org) 7d

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize i…

agentic
Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems (arxiv.org) 7d

Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with other agents. These capabilities make prompt-injection and jailbreak attacks more consequ…

↯ Security ↯ Jailbreak jailbreak security agentic
ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments (arxiv.org) 7d

Academic paper search is a core step in scientific research, and LLM-based search agents are emerging as a promising paradigm for iterative, intent-driven literature exploration. However, existing benchmarks are insufficient for systematic…

agentic
AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models (arxiv.org) 7d

We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often r…

rag agentic
Beyond Static Endpoints: Tool Programs as an Interface for Flexible Agentic Web Services (arxiv.org) 7d

In the agentic web era, LLM-based agents increasingly invoke web services as tools, yet most interfaces remain static endpoints that poorly express long-horizon workflows with loops, conditionals, joins, and retries. We present Tool…

agentic
Measuring Biological Capabilities and Risks of AI Agents (arxiv.org) 7d

This paper addresses a rapidly emerging policy challenge: how to generate and interpret credible evidence about the biological capabilities and risks of AI scientists, or agentic AI systems capable of autonomously or collaboratively perfor…

agentic
Agentic Electronic Design Automation: A Handoff Perspective (arxiv.org) 7d

Electronic design automation (EDA) is inherently multi-stage and handoff-heavy. Design artifacts, flow scripts, and engineering decisions cross tool, session, and organizational boundaries before final implementation, signoff, or release.

agentic
Playful Agentic Robot Learning (arxiv.org) 7d

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions.…

agentic
Execution-bound advisory automation for agentic AI: a reproducible AIBOM-driven CSAF-VEX framework (arxiv.org) 7d

A protocol driven framework is presented that binds SBOM and AIBOM artefacts to deterministic environment capture and structured runtime telemetry. Exploitability is computed from declared artefacts, observed activation conditions, and enf…

agentic
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World (arxiv.org) 7d

Achieving dexterous robotic manipulation in the real world heavily relies on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents c…

agentic
Benchmarking Agentic Review Systems (arxiv.org) 7d

A new class of agentic review systems is emerging as a remedy to the pressure placed on peer review systems by AI-assisted research, but it is unclear how they should be evaluated. We evaluate two open-source systems (OpenAIReview and coa…

agentic
Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why (arxiv.org) 7d

Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generati…

rag agentic
DeXposure-Claw: An Agentic System for DeFi Risk Supervision (arxiv.org) 7d

Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no…

agentic
Deontic Policies for Runtime Governance of Agentic AI Systems (arxiv.org) 7d

Autonomous agentic AI systems driven by Large Language Models (LLMs) introduce a new class of security, privacy, and compliance challenges: an agent that can invoke tools, manipulate data, install software, and coordinate with peer agents…

agentic
Hands-on prep for the Claude Certified Architect (CCA-F) exam (www.reddit.com via reddit) 7d

A few months ago I posted about a dedicated suite of practice tests I built for Claude Certified Architect - Foundations. Thanks a lot to all of you for the feedback!

agentic
How are you guys using ai to increase productivity. (www.reddit.com via reddit) 7d

What I mean by this is not opening Claude and adding a claude.md file or settings, etc. This in itself is an art: how you set up and prompt.

agentic
Warp/cursor vs Claude Code native app (www.reddit.com via reddit) 8d

I am not a coder and I am actually pretty new to this vibe coding world and agentic AI (I just try having a second brain with some agents), can someone explain to me why everyone is using warp or cursor? What's the difference between those an…

cursor agentic claude-code
Building independent LLM drift detection - sharing the methodology, looking for feedback on the approach (www.reddit.com via reddit) 8d

Disclosed upfront: I run [Tickerr dot ai], an independent external monitor for AI APIs. Today it tracks latency, TTFT, uptime, and error rates across major models.

tool-calling gemini agentic+1
Code-Augur: Agentic Vulnerability Detection via Specification Inference (arxiv.org) 8d

The advent of agentic vulnerability detection is already becoming a watershed moment for software security. Audits conducted entirely by autonomous LLM agents are uncovering critical vulnerabilities in fundamental software underpinning dig…

↯ Security security agentic
Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks (arxiv.org) 8d

This paper presents an autonomous agentic resource negotiation framework designed to enable zero-touch network slicing in 6G architectures using Large Language Model (LLM) agents. While LLMs offer powerful reasoning capabilities, we demons…

agentic
ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch (arxiv.org) 8d

Bringing Large Language Models (LLMs) into industrial ride-hailing dispatch as semantic feature extractors over platform-scale behavioral logs is a compelling but under-explored data systems problem. Production matching pipelines remain do…

agentic
ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift (arxiv.org) 8d

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone.

agentic
Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems (arxiv.org) 8d

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rath…

agentic
Is it agentic enough? Benchmarking open models on your own tooling (huggingface.co) 8d

Is it agentic enough? Benchmarking open models on your own tooling Benchmarking transformers revisions across different metrics This is a human-made, agent-focused blogpost.

agentic
Claude will execute trades in some chats and flat-out refuse in others. How do you get it to behave consistently? (www.reddit.com via reddit) 9d

Bear with me, I'm more of an architecture guy than a coder, so apologies if I'm missing something obvious. And I'm an old guy, so some generational frustration is in here too.

mcp agentic
EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning (arxiv.org) 9d

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards.

agentic
Securing Multi-Agent GIS Systems: Risk Evaluation and Prompt Hardening Optimization (arxiv.org) 9d

Agentic systems are increasingly integrated with geographic information systems (GIS), where multi-agent coordination enables complex conversational and spatial analysis but introduces security risks. This work presents a security-oriented…

agentic
Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety (arxiv.org) 9d

Large language models are increasingly being used to support network operations (NetOps) and artificial intelligence for IT operations (AIOps), including incident investigation, root-cause analysis, configuration synthesis, and limited sel…

agentic
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation (arxiv.org) 9d

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized m…

agentic
A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions (arxiv.org) 9d

Optical networks need intent-driven, closed-loop agentic management, a key enabler for higher autonomy levels. We present the first T-API-compliant reasoning and acting (ReAct) loop.

agentic
A Framework for Evaluating Agentic Skills at Scale (arxiv.org) 9d

Agent skills -- structured, reusable knowledge artifacts that augment LLM agent capabilities -- have been rapidly adopted in industry, yet their cross-domain impact and use across commercial and open-source models remain under-studied, and…

agentic
Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering (arxiv.org) 9d

Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and environment into a single end-to-end score, typically computed ag…

agentic
Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation (arxiv.org) 9d

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate foreca…

agentic
Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3 (arxiv.org) 9d

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifica…

agentic
CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science (arxiv.org) 9d

The Coupled Model Intercomparison Project Phase 6 (CMIP6) has generated thousands of peer-reviewed publications documenting model configurations, evaluation procedures, emergent constraints, and projection uncertainties. As the community t…

agentic
Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure (arxiv.org) 9d

Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physi…

agentic
WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning (arxiv.org) 9d

Language models are remarkably capable at medical question answering, in some cases surpassing the accuracy of general physicians. However, answering questions about wearable health data remains challenging and understudied, as these ubiqu…

agentic
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models (arxiv.org) 9d

AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts, leaving open…

agentic
Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications (arxiv.org) 9d

Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premat…

↯ Hallucination hallucination agentic
PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience (arxiv.org) 9d

As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate…

agentic
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search (arxiv.org) 9d

Test-time scaling for agentic search typically increases depth (i.e., more turns and tokens per trajectory) or breadth (i.e., more parallel rollouts). Here we focus on breadth scaling, showing that standard parallel sampling yields diminis…

agentic
Agentic Resource Discovery: Let agents search (huggingface.co) 9d

Agentic Resource Discovery: Let agents search for tools, skills, and other agents. The Agentic Resource Discovery (ARD) specification is the discovery layer that sits in front of them.

agentic
Cursor launches github for agentic era (www.reddit.com via reddit) 9d

Three big launches: - Cursor iOS app - Origin, an agentic replacement of Git - A new model is in the works in collaboration with spaceX - SpaceX to acquire Cursor

cursor agentic
Spent $11k evaluating Fable: capability looked SOTA, refusals killed it (before Anthropic did) (www.reddit.com via reddit) 9d

Before its suspension, I spent $11,081.12 evaluating Claude Fable 5 on WolfBench, an agentic benchmark based on Terminal-Bench 2.0. It was by far my most expensive benchmark run ever, and I fully expected Fable to become the new top model…

↯ Opus 4.6 gpt-5 opus agentic+1
Is Claude code web capable of unlocking advanced agentic loop workflows ? (www.reddit.com via reddit) 9d

Not talking about the Remote feature where it manages your desktop session remotely, just a purely remote session on Anthropic’s infra. Have you unlocked any of this, asking because it seems most of the advanced features seem to be more su…

agentic anthropic claude-code
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems (arxiv.org) 10d

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops,…

agentic
MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions (arxiv.org) 10d

Five years after the discovery of persistent anti-Muslim bias in large language models, most evaluations remain confined to single-turn prompt completion, a setting that no longer reflects how frontier LLMs are deployed. We introduce \text…

agentic
All-Mem: Agentic Lifelong Memory via Dynamic Topology Evolution (arxiv.org) 10d

Lifelong interactive agents are expected to assist users over months or years, which requires continually writing long term memories while retrieving the right evidence for each new query under fixed context and latency budgets. Existing m…

agentic
Agentic Reinforcement Learning for Search Misaligns Instruction-Tuning (arxiv.org) 10d

Agentic reinforcement learning (RL) trains large language models to use tools, but its impact on alignment is poorly understood. We study how agentic RL for search affects the alignment of instruction-tuned (IT) models.

agentic
Context-Aware RL for Agentic and Multimodal LLMs (arxiv.org) 10d

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL…

agentic
FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection (arxiv.org) 10d

SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message…

agentic
Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning (arxiv.org) 10d

We propose agentic automata learning to evaluate the extent to which tool-calling LLM agents can uncover hidden environments through interaction. In our setup, an agent should uncover a hidden deterministic finite automaton (DFA) by intera…

tool-calling agentic
PathRouter: Aligning Rewards with Retrieval Quality in Agentic Graph Retrieval-Augmented Generation (arxiv.org) 10d

Agentic GraphRAG trains language-model agents to iteratively retrieve and reason over graph-structured evidence, enabling more accurate and context-aware decision-making by efficiently navigating complex information networks. However, outc…

agentic
Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search (arxiv.org) 10d

This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the potential…

agentic
TechRAG: Evidence-Gated Multimodal Agentic RAG for Technical Literature Reasoning (arxiv.org) 10d

This paper presents an agentic multimodal retrieval-augmented generation (RAG) framework for domain-specific literature reasoning, instantiated on a curated corpus of several thousand papers in intelligent tires, vehicle dynamics, vehicle…

rag agentic
Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs (arxiv.org) 10d

Enterprise analytics aims to make organizational data accessible for decision-making, yet non-technical users still face barriers when using traditional business intelligence tools or Text-to-SQL systems. While recent Text-to-SQL approache…

agentic
TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate (arxiv.org) 10d

Negotiation is a central mechanism of economic exchange, shaping markets, procurement, labor agreements, and resource allocation. It is also a canonical testbed for agentic language models, requiring multi-turn interaction under hidden pre…

agentic
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw (arxiv.org) 10d

Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automat…

openclaw agentic
AgenticRec: A Recommendation-Oriented Agentic Framework with Progressive Tool-Integrated Reasoning Optimization (arxiv.org) 10d

Recommender agents built on Large Language Models offer a promising paradigm for personalized recommendation. However, existing agents typically suffer from a misalignment between their tool-integrated reasoning trajectories and recommenda…

agentic
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks (arxiv.org) 10d

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their de…

↯ Security prompt-injection security agentic
Learning to Share: Selective Memory for Efficient Parallel Agentic Systems (arxiv.org) 10d

Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. To improve robustness and solution quality, recent approaches deploy multiple agent teams running…

agentic
EffGen: Enabling Small Language Models as Capable Autonomous Agents (arxiv.org) 10d

Most existing language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) via API calls; while powerful, this approach faces several limitations including high token costs and privacy…

gemini agentic
RollArt: Disaggregated Multi-Task Agentic RL Training at Scale (arxiv.org) 10d

Agentic Reinforcement Learning (RL) trains LLMs through multi-turn interactions with environments, producing workloads that mix compute-bound prefill, bandwidth-bound decoding, CPU-heavy environment execution, and bursty reward evaluation.…

agentic
Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage? (arxiv.org) 10d

In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can leak sensitive training data in MMs, causing privacy leakage. This paper investigates a black-box privacy attack, i.e.,…

agentic
A Survey on Agentic Security: Applications, Threats and Defenses (arxiv.org) 10d

LLM-based agents are now used throughout cybersecurity. While these agents facilitate powerful and autonomous security applications, their autonomy opens up new attack surfaces, and the security community is actively building defenses to s…

agentic
SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search (arxiv.org) 10d

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their…

agentic
MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition (arxiv.org) 10d

Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance interacts with complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug rec…

agentic
Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents (arxiv.org) 10d

The path toward autonomous software engineering is currently bottlenecked by a severe deficit of diverse, large-scale trajectory data. We address this by introducing Open-SWE-Traces, an expansive dataset of 207,489 agentic trajectories spannin…

agentic
Green SARC: Predictive Cost and Carbon Governance for Agentic AI Systems (arxiv.org) 10d

Agentic AI systems act through tools and sub-agents, yet the controls meant to bound their financial and environmental cost still sit on dashboards evaluated beside or after execution. Green SARC applies the SARC governance-by-architecture…

agentic
MAGE-RAG: Multigranular Adaptive Graph Evidence for Agentic Multimodal RAG in Long-Document QA (arxiv.org) 10d

Long-document multimodal question answering requires a system to locate sparse evidence in long PDFs and integrate clues from text, tables, images, charts, and complex layouts. Existing RAG methods mostly rely on fixed Top-k retrieval over…

rag agentic
Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice? (arxiv.org) 10d

We ran 300 repeated vulnerability-finding scans to measure how repeatable agentic large language model (LLM) security review is on the same JavaScript code, prompt, and benchmark harness. The headline result is that LLM security findings w…

↯ Security security agentic
The Perils of Agency: How Developers Perceive, Prioritize, and Address Risks in Agentic AI Products (arxiv.org) 10d

Agentic AI systems act autonomously, use tools, adapt to context, and operate in complex real-world environments. However, these same characteristics can create or exacerbate product risks.

agentic
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale (arxiv.org) 10d

Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6 and Ring-2…

agentic
Resilient Consensus in Agentic AI (arxiv.org) 10d

Large language model (LLM) agents are increasingly deployed in multi-agent systems where they must coordinate and agree on shared decisions. We ask whether classical resilient consensus theory, developed for deterministic agents, transfers…

agentic
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning (arxiv.org) 10d

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M…

agentic
Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment (arxiv.org) 10d

LLMs have substantially improved software engineering yet real-world development requires architectural understanding. Such understanding is prohibitively expensive to label manually and impossible to verify through tests alone.

agentic
A Security Analysis of Long-Horizon Agentic AI Systems: Threats, Evaluation, and Framework Development (arxiv.org) 10d

This paper presents a structured analysis of security challenges in long-horizon agentic AI systems. The study reviews existing threats, evaluation approaches, attack propagation mechanisms, and security frameworks.

agentic
Agentomics: Economic Foundations for the Valuation, Attribution, and Pricing of AI Agents in Human-AI Workflows (arxiv.org) 10d

Agentic AI systems are increasingly being deployed as productive resources in organizational workflows, yet existing evaluation methods primarily measure isolated technical performance rather than economic contribution. This paper introduc…

agentic
MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions (arxiv.org) 10d

LLM agents are increasingly used to simulate real world interactions, but it remains unclear whether simulated behaviors preserve the content patterns and interaction dynamics of real human behaviors. Existing evaluations remain fragmented…

agentic
Evaluation of Alternative-Based Information Systems for Deliberative Polling using an Agentic Simulator (arxiv.org) 10d

Deliberative polling promises to improve collective decision-making by exposing shareholders to a broad range of arguments before they vote. Yet ensuring that every voter encounters a representative sample of the reason space, the coverage…

agentic
Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification (arxiv.org) 10d

Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging bec…

agentic
OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models (arxiv.org) 10d

Equipping Large Language Model (LLM) agents with effective skills is crucial for solving complex tasks in real-world systems like OpenClaw. In this work, we aim to develop a framework that automatically constructs such reusable skills to e…

openclaw agentic
The Integrator Advantage: Controlled Agentic AI for Small and Medium-Sized Companies (arxiv.org) 10d

Agentic AI marks a new phase of enterprise automation. Unlike traditional automation or conversational AI, agentic systems can interpret goals, plan multi step tasks, access tools, interact with enterprise systems, and execute workflows wi…

agentic
ARB4WM: An Adversarial Robustness Benchmark for World Models in Continuous Control (arxiv.org) 10d

World models are widely used in robotic and agentic engineering control systems due to their ability to learn latent dynamics for planning and decision-making. As these systems are increasingly deployed in safety-critical settings, underst…

agentic
Agentic Framework for Deep Learning workload migration via In-Context Learning (arxiv.org) 10d

Translating deep learning models from PyTorch's flexible, object-oriented design to JAX's functional, stateless setup is usually a manual and error-prone task. Automated migration is challenging because Large Language Models (LLMs) struggl…

agentic
LLM-as-Code Agentic Programming for Agent Harness (arxiv.org) 10d

Every major LLM agent framework gives the LLM the role of orchestrator; the model decides what to do next, when to call tools, and when to stop. We argue that token explosion, control-flow hallucination, and unreliable completion are not i…

↯ Hallucination hallucination agentic
TrustedARI: Towards Trust-Native Agentic Routing Infrastructure for Agentic AI (arxiv.org) 10d

AI agents increasingly access external models, tools, and services through Agentic Routing Infrastructure (ARI) to manage the overhead of heterogeneous interfaces and fragmented subscriptions. Yet, the architecture of ARI introduces fundam…

agentic
Agentic Retrieval and Reinforcement Learned Equation Chains: A Controlled Generation Framework for Complex and Novel Physics Word Problems (arxiv.org) 10d

Generating high-quality Physics Word Problems (PWPs) that are novel, complex, and solvable remains a challenging and underexplored problem in educational content generation. Existing approaches, many adapted from Math Word Problem (MWP) ge…

agentic
QoS-Aware Token Scheduling and Private Data Valuation for Multi-Modal Agentic Networks (arxiv.org) 10d

In agentic systems, human-generated data records anchor the value of AI services. Yet cloud compute pipelines centralize processing on remote servers.

agentic
A Formal Framework for Declarative Agentic AI in Business Process Analysis (arxiv.org) 10d

Agentic AI opens new opportunities for automating Business Process (BP), enabling autonomous decision-making and dynamic adaptation. However, realising this potential requires BP entities and their interactions to be defined with formal pr…

agentic
Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning (arxiv.org) 10d

Multimodal large language models (MLLMs) have demonstrated impressive capabilities in many visual tasks, but they often struggle with factual grounding when confronted with complex, open-world scenarios. While recent multimodal deep search…

agentic
Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning (arxiv.org) 10d

Time series data in real-world deployments is overwhelmingly irregular. Observations are asynchronous, missing values are informative rather than random, and sampling frequencies vary across sensors and operational windows.

agentic
Question about Claude in browsers? (www.reddit.com via reddit) 10d

Thinking about getting Claude for agentic browser use, entirely to automate some daily annoyances. A big flaw of copilot (I’m not in the US) is that it can’t actually execute tasks on websites.

↯ Copilot copilot agentic
I maintain two browser extensions (~800 weekly users) almost entirely through Claude Code, including the analytics pipeline and the store-publishing tools. Here's the setup. (www.reddit.com) 10d

I'm a software engineer who moved into management years ago, so I started this to get my hands back on a keyboard and learn the agentic tooling instead of reading about it. It grew into two shipped browser extensions.

↯ Copilot grok deepseek copilot+4
Introducing CodeTree: A tokens efficient way to write code with Claude. (www.reddit.com via reddit) 10d

Agentic workflows are token hogs. That's the problem codetree solves.

agentic
MCP tools vs agentic web search on 3 SEC research tasks 10–21× fewer tokens, and agentic web search got most answers wrong (www.reddit.com via reddit) 10d

Disclosure up front: I build edgar.tools, the SEC-filings MCP server in the benchmark (built with Claude, free to try). Setup.

↯ Sonnet 4.6 sonnet mcp agentic
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments (arxiv.org) 11d

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relati…

agentic
Optimizing the Cost-Quality Tradeoff of Agentic Theorem Provers in Lean (arxiv.org) 11d

Large language models (LLMs) are increasingly used in workflows for generating formal proofs in Lean. These workflows often decompose problems into smaller lemmas, sample many proof attempts, and use compiler feedback to guide search.

agentic
Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems (arxiv.org) 11d

Context adaptation automates prompt engineering in LLM-based systems by iteratively revising tunable prompts from task feedback, without modifying model weights. Extending this paradigm to multi-LLM agentic systems is crucial: existing met…

agentic
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning (arxiv.org) 11d

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes.

agentic
Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward (arxiv.org) 11d

Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduc…

agentic
Selective Agentic Recovery for UAV Autonomy with a Persistent Mission Runtime (arxiv.org) 11d

Agentic AI can support unmanned aerial vehicle (UAV) autonomy by providing high-level recovery reasoning when local waypoint- or setpoint-based execution encounters blocked passages, repeated no-progress behavior, or mission-level ambiguit…

agentic
Same-Origin Policy for Agentic Browsers (arxiv.org) 11d

Agentic browsers integrate autonomous AI agents into web browsers, enabling users to accomplish web tasks through natural-language instructions. The same-origin policy (SOP) is a fundamental browser security mechanism that prevents unautho…

agentic
An Agentic Retrieval Framework for Autonomous Context-Aware Data Quality Assessment (arxiv.org) 11d

Data quality assessment is a critical prerequisite for effective data analytics and data-driven decision-making, yet it remains a challenging task due to the inherently context-dependent nature of data quality. Existing approaches often re…

agentic
Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows (arxiv.org) 11d

Large language models increasingly serve as execution engines for agentic systems, yet they still consume context through a sequential text interface. This creates a mismatch with modern structured agent workflows, in which independent bra…

agentic
Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL (arxiv.org) 11d

LLMs are increasingly deployed as agents that interact with external environments and observe feedback such as execution results, error messages, and tool outputs. A well-functioning agent should be able to leverage this feedback to accura…

agentic
TwinBI: An Agentic Digital Twin for Efficient Augmented Interactions with Business Intelligence Dashboards (arxiv.org) 11d

Business intelligence (BI) increasingly combines dashboard interaction with LLM-based assistance, but these two modes often fall out of sync during multi-step analysis. As users switch between direct dashboard manipulation and natural-lang…

agentic
YeasierAgent: Agentic Social Sandbox as a Canvas for Intent-Driven Creation of Platform-Agnostic Symbiotic Agent-Native Applications (arxiv.org) 11d

This paper introduces YeasierAgent, an application-building paradigm based on symbiotic agents, narrative worlds, and scene-aware interaction. It challenges the conventional device-coupled model of software by redefining applications as co…

agentic
Claude Fable 5 built me a live options strategy. It DESTROYED the market out of sample. Then the government banned it. (github.com via reddit) 11d

When Anthropic released Fable 5, I knew I had to see how good it REALLY is. For context, I'm building an agentic trading platform.

agentic anthropic
Do you know who has a universal jailbreak to their name, as of today? Officially? (www.reddit.com via reddit) 12d

AISI UK - Our evaluation of OpenAI's GPT-5.5 cyber capabilities In their own words: The above tests are capability evaluations carried out in a controlled research setting and do not necessarily reflect what is accessible to an ordinary pu…

↯ Security ↯ GPT 5.5 ↯ Jailbreak jailbreak gpt-5 security+2
Everyone complains about Fable being pulled from them, and I couldn't even get past its refusals to work on my projects (www.reddit.com via reddit) 12d

I have two large ongoing projects, one is an mpvpn-like transport, the other is an agentic harness ("claude code inside telegram" in short). It constantly refused to work on both, falling into the safety net.

opus agentic claude-code
Fable 5 is offline. Switch to Opus, jump to OpenAI, or just wait? (www.reddit.com via reddit) 13d

Fable 5 is offline. Switch to Opus, jump to OpenAI, or just wait?

↯ Opus 4.8 ↯ Anthropic Mythos ↯ Security ↯ Jailbreak jailbreak gpt-5 security+5
Built a Claude skill that mimics Fable 5's agentic behavior — free on GitHub (www.reddit.com via reddit) 13d

With Fable 5 access suspended, I built a skill that ports its behavioral patterns to Opus 4.8 — explicit multi-stage planning, parallel sub-agent delegation, and mandatory self-verification at each step. It won't close the raw capability g…

↯ Opus 4.8 opus agentic
Most of the internet isn't human anymore — and the next phase is agents that don't just read the web, they pay for what they need (www.reddit.com via reddit) 13d

Something broke this year and almost nobody outside of infra teams noticed: most internet traffic isn't human anymore. Imperva's 2025 Bad Bot Report measured automated traffic at 51% of all web traffic, the first time bots have outnumbered…

agentic
Claude Code CLI vs Claude in Xcode: Any Real Advantage for SwiftUI Vibe Coding? (www.reddit.com via reddit) 2w

Hi all, Is there any real advantage to using Claude / OpenAI agentic coding directly inside Xcode versus using Claude Code from the CLI? I’ve been using Claude Code CLI since it was released, and my current workflow is: Open the project in…

agentic openai anthropic+1
Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response (arxiv.org) 2w

Precision oncology is currently limited by the small-N, large-P paradox, where high-dimensional genomic data is abundant but pharmacological response samples are sparse. While deep learning achieves predictive accuracy, it frequently fails…

agentic
Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems (arxiv.org) 2w

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignme…

↯ Hallucination hallucination agentic
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning (arxiv.org) 2w

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs wi…

agentic
Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset (arxiv.org) 2w

AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents Copilot, Devin, Cu…

↯ Copilot devin copilot agentic
Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests (arxiv.org) 2w

AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create instruction files tha…

↯ Copilot copilot agentic
An LLM System for Autonomous Variational Quantum Circuit Design (arxiv.org) 2w

The design of high performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit designs under explicit…

agentic
Mining Architectural Quality Under Agentic AI Adoption: A Causal Study of Java Repositories (arxiv.org) 2w

AI coding tools are now used by a majority of developers, and agentic use of these tools has popularized the practice colloquially called "vibe coding". Yet causal evidence on their effect on software architecture is scarce.

agentic
The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale (arxiv.org) 2w

The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic A…

agentic
Agentic MPC for Semantic Control System Resynthesis (arxiv.org) 2w

While MPC effectively handles structured, diverse, and low-level specifications, it lacks the capability to dynamically incorporate high-level contextual information such as social norms, user intent, or natural language instructions. To a…

agentic
MiniMax Sparse Attention (arxiv.org) 2w

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens,…

↯ Minimax minimax agentic
From Verdict to Process: Agentic Reinforcement Learning for Multi-Stage Fact Verification (arxiv.org) 2w

Recent approaches combining Large Language Models (LLMs) with retrieval-augmented reasoning have shown promise for automated fact verification. To process complex claims, these verification pipelines typically execute multi-stage workflows…

agentic
Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory (arxiv.org) 2w

Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems answer wi…

agentic
Iterating Toward Better Search: A Two-Agent Simulation Framework for Evaluating Agentic Search Architectures in E-Commerce (arxiv.org) 2w

We present a modular two-agent simulation framework for evaluating conversational shopping assistant architectures. An independent buyer agent, configured with personas, missions, and patience levels, is paired with an interchangeable resp…

agentic
MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback (arxiv.org) 2w

Molecular dynamics (MD) is the canonical in-silico method for atomistic molecular science, simulating molecular behavior from first-principle physics. Designing an MD pipeline for a new system requires substantial expert knowledge: running…

agentic
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements (arxiv.org) 2w

Agentic large language model systems that autonomously invoke tools, maintain persistent memory, and execute multi-step plans are increasingly deployed in public-facing domains, including government services, healthcare triage, and financi…

agentic
Strategic Decision Support for AI Agents (arxiv.org) 2w

Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools b…

agentic
Are coding agents getting expensive, or are we measuring cost the wrong way? (www.reddit.com via reddit) 2w

Seeing the recent token-burn discussion around agentic coding made me think the bigger issue is not just price. A coding agent can be expensive and still be worth it if it removes real engineering effort.

agentic
Are you guys also hitting a cost wall with agents? Any harnesses that actually support Batch API? (www.reddit.com via reddit) 2w

I’ve been tracking my agentic workflow costs, and I'm realizing a massive chunk of the budget is being leaked because my bg agents are treating everything as "realtime" inference. Has anyone found an agent harness or orchestration pattern…

agentic
How are you managing AI costs once agents start making decisions on their own? (www.reddit.com via reddit) 2w

We've noticed that AI cost management changes significantly once you move from chatbots to agentic workflows. A chatbot might make a single model call.

agentic
DiffusionGemma made me rethink what memory bandwidth means for local agent inference (www.reddit.com via reddit) 2w

Been testing DiffusionGemma 26B A4B for the last few days and the bottleneck profile is completely different from autoregressive models. With autoregressive models you are compute-bound during prefill and memory-bandwidth-bound during deco…

↯ Qwen 3.5 agentic
I help businesses implement AI for lead gen, CRM, and custom agentic workflows (www.reddit.com via reddit) 2w

I’ve been working on implementing AI solutions for businesses and have seen some genuinely strong results, so I figured I’d offer this here. I’m currently helping teams with things like: • AI-powered lead generation (finding and qualifying…

agentic
Open-source procurement rubric for agentic AI vendors, I scored 5 of them and want feedback on the methodology (www.reddit.com via reddit) 2w

I built a tool that scores agentic AI vendor documentation against a 15-question rubric covering tool-call correctness, loop termination, and multi-step state coherence. Drop a folder of a vendor's public docs in, get back a structured rep…

agentic openai anthropic
AI agents are hitting 'Raw Host Access' risks. I built Armorer as a secure admission layer for agentic workflows. Docker sandboxing by default. (www.reddit.com via reddit) 2w

Hey r/AI_Agents, most frameworks today assume a trusted host, but that's a massive risk when agents start using tools. I've been working on Armorer to solve this: it acts as a local control plane that forces agents into isolated Docker con…

agentic
As we know Minimax M3 is just going to be open sourced in few days and because of that I was surfing on internet searching for its scores and I found out pretty interesting results. Is Minimax M3 really that good in agentic stuff and in coding? Is it better than older gpt models? (www.reddit.com via reddit) 2w

Has anyone personally compared the Minimax M3 model against other proprietary models to determine its relative performance tier? I am trying to understand where it currently ranks in the broader AI landscape.

↯ Minimax minimax agentic
Passed the Claude Certified Architect - Foundations (CCA-F) Exam! Quick write-up + detailed prep notes doc (www.reddit.com via reddit) 2w

Hey everyone, Super happy to share that I recently cleared the Claude Certified Architect - Foundations (CCA-F) exam! Since I’ve been lurking around this sub to stay updated on Anthropic's ecosystem, I wanted to pay it forward and share a q…

agentic anthropic
Slop or not? Is there a line that makes an AI assisted/generated project not slop? Effort or whatever? (www.reddit.com via reddit) 2w

So I've been messing around with Fable trying to make my own personal AI agent. (I'm not a programmer or a developer btw.) And that got me thinking, is there a line that defines if a vibecoded (or agentic engineering as they call it nowada…

agentic
Infinite Music with Magenta Realtime 2, fully open-source (www.reddit.com) 2w

Just open-sourced a local voice AI realtime music setup where my ESP32 microcontroller talks to my MacBook over WebSockets. The microcontroller is just a tiny Arduino-based device with a mic and speaker, and the MacBook M4 Pro runs Magenta…

qwen agentic
Model recommendations for family photo classification / identification (www.reddit.com via reddit) 2w

I recently had a big family photo digitalization done for photos up to 130 years old. There are tons of people that I don't know or I don't recognize as young people in a soft lens.

gemma agentic
Infinite Music Glitch on my Arduino with Magenta Realtime 2 (www.reddit.com) 2w

I built a local voice AI realtime music setup where my ESP32 microcontroller talks to my MacBook over WebSockets. The microcontroller is just a tiny Arduino-based device with a mic and speaker, and the MacBook M4 Pro runs Magenta Realtime…

qwen agentic
unpopular opinion: stop adding more load to congested intersections with giant models (www.reddit.com via reddit) 2w

I want to offer a minority opinion about the recent hype. I’m tired of reading posts by Karpathy about ideas that were already known months earlier, and then treating them as if he just discovered something groundbreaking.

agentic
Composer 2.5 is phenomenal. So is Cursor 3.0 (www.reddit.com via reddit) 2w

I know everyone has been pissed recently about Cursor 3.0 ditching (or hiding) the traditional IDE setup to push toward "agentic development." But honestly? Cursor is better than ever right now.

cursor opus agentic
12 months ago nobody understood why we were building Agentic SDLC. Now it feels like everyone is heading in the same direction. (www.reddit.com via reddit) 2w

I’m one of the founders of Overcut, so take this with the appropriate level of skepticism, but I’ve had a front-row seat to how quickly this market has changed over the last year. When we started building Overcut, most conversations ended…

↯ Copilot copilot cursor agentic
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning (arxiv.org) 2w

Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL-RL frameworks remain intervention-inte…

agentic
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents (arxiv.org) 2w

This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in multimodal small language model (SLM) agents. While existing works have explored various reward designs to improve agentic tool-calling…

↯ Tool Use tool-calling tool-use agentic
Food4All: An Agentic Framework and Benchmark for Food Resource Navigation with Adaptive User Understanding (arxiv.org) 2w

Food assistance referral requires conversational agents to translate underspecified, often noisy help-seeking dialogues into locally valid resource recommendations. We present Food4All, an agentic food-resource referral framework and bench…

agentic
Agent Skill Evaluation and Evolution: Frameworks and Benchmarks (arxiv.org) 2w

The growth of agent skills has transformed how agentic systems are built, evaluated, and deployed. As skill libraries continue to scale, rigorous evaluation becomes critical to ensuring their utility, quality, and safety in real-world appl…

agentic
Libra: Efficient Resource Management for Agentic RL Post-Training (arxiv.org) 2w

Reinforcement learning (RL) has emerged as a standard post-training paradigm for shaping large language models (LLMs) into capable agents. In agentic RL, the rollout stage generates trajectories while invoking tools, producing long-tailed…

agentic
Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark (arxiv.org) 2w

Agentic AI systems are increasingly capable of autonomous data science workflows, yet clinical prediction tasks demand domain expertise that purely automated approaches struggle to provide. We investigate how human guidance of agentic AI c…

agentic
Resource-Aware LLM Reasoning for Mobile Edge General Intelligence (arxiv.org) 2w

The rapid advancement of large language models (LLMs) has enabled an emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. This integration with edge computing has led to the…

agentic
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models (arxiv.org) 2w

Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoni…

agentic
APPO: Agentic Procedural Policy Optimization (arxiv.org) 2w

Recent advances in agentic Reinforcement Learning (RL) have substantially improved the multi-turn tool-use capabilities of large language model agents. However, most existing methods assign credit over coarse heuristic units, such as tool-…

agentic
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application (arxiv.org) 2w

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a…

agentic
Can Open-Source LLM Agents Replace Static Application Security Testing Tools? An Empirical Assessment (arxiv.org) 2w

This paper explores the value of agentic AI tools for cybersecurity purposes. We evaluate the efficacy of a general-purpose GenAI Large Language Model- (GenAI-) based agent when powered by three different Ollama-hosted general-purpose open…

ollama agentic
Sovereign Assurance Boundary: Certificate-Bound Admission for Agentic Infrastructure (arxiv.org) 2w

Agentic infrastructure introduces a critical control-plane authorization problem: non-deterministic reasoning systems can propose high-stakes mutations to production resources, yet existing security mechanisms -- such as identity and acces…

agentic
FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse (arxiv.org) 2w

Large Language Model (LLM)-based multi-agent systems are increasingly powerful, but current agentic workflow optimization paradigms make an unsatisfying trade-off. Task-level methods spend substantial offline compute yet deploy only a sing…

agentic
An Ethical eValuation Agent (EeVA): Results of a Proof-of-Concept Test on a Prototype Agentic-like Workflow to Assist Ethical Deliberations (arxiv.org) 2w

Ethical deliberation is often misunderstood as a search for single right or wrong answers, creating difficulties for non-ethically trained personnel who must address ethically laden challenges. We developed EeVA, an agentic-like LLM-based…

agentic
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning (arxiv.org) 2w

The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window…

agentic
HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation (arxiv.org) 2w

Reinforcement learning typically improves multi-turn agent capabilities through the terminal outcome of the trajectories, which makes it difficult to determine credit assignment for each intermediate turn. Recent on-policy self-distillat…

agentic
How can Deepseek v4 top the coding leaderboards and still sit 8 months behind the frontier? (www.reddit.com) 2w

Two numbers on this model that don't sit comfortably with each other. The Pro config posts coding scores near the top of every board, 80.6 on SWE-bench Verified and 93.5 on LiveCodeBench.

↯ Swe Bench ↯ DeepSeek 4 swe-bench gpt-5 deepseek+1
Fable/Mythos API costs are actually cheaper than GPT-3 was when first released (per token) (www.reddit.com via reddit) 2w

See: https://www.reddit.com/r/GPT3/comments/ikorgs/oa_api_preliminary_beta_pricing_announced/ Of course with thinking and agentic use a single prompt is more expensive, sure, but a token is a token.

↯ Anthropic Mythos mythos agentic
Core Workflows And Guidelines MCP Servers For Devs - Did I Reinvent The Wheel? (www.reddit.com via reddit) 2w

In an effort to centralize what was once a mix of homemade skills, instructions and scripts every dev in our company made on their own setup, I created an MCP servers infrastructure that gathers all the core workflows, guidelines and integ…

mcp agentic
I use ACP build a tool Aflow - Agent help you build an Agentic Workflow (www.reddit.com via reddit) 2w

Aflow Agent is built on Specflow / Pi / ACP. It is not another chat window; it is a workflow-native agent that helps teams design, run, maintain, and improve durable agent processes (mainly for coding scenarios).

agentic
Fable 5 and the 8 July privacy update landed the same week. Is the model launch pulling attention off the data changes, or am I overthinking it? (www.reddit.com via reddit) 2w

Two things landed together this week and I'm trying to work out if they're connected or if I'm joining dots that aren't there. Genuinely asking, happy to be corrected.

↯ Opus 4.8 ↯ Anthropic Mythos mythos opus agentic+1
built another AI agent runtime. What would you do with it? (www.reddit.com via reddit) 2w

Yes, I know. “Another AI agent runtime.” That’s exactly why I’m asking.

mcp agentic claude-code
Hot Take "Rigid code is better than Flexible code if you're on a budget" (www.reddit.com via reddit) 2w

I've spent the last six months trying to build a fully local, agentic pipeline for a text_processing and extraction tool I use daily. Because I’m running everything on a single consumer GPU setup, my choices are limited to smaller, quanti…

↯ Qwen 3.5 gemma qwen agentic
24 hours with Fable 5, the coding leap is real, the price tag hurts (www.reddit.com via reddit) 2w

I got access to Fable 5 through our usual gateway setup (TokenRouter). The switch was one line in config, basically just changing the model string.

↯ Opus 4.8 opus agentic
Are AI agents and automation skills using n8n a good way to make money? (www.reddit.com via reddit) 2w

Hey everyone, I'm new here and very new to the idea of n8n and agentic workflows. These past couple of days I've been messing around on n8n, and I want to learn more and improve my skills.

agentic
Why is anyone surprised Anthropic is tightening subscription limits? (www.reddit.com via reddit) 2w

The frustration makes sense if you built something on a flat subscription and now have to reprice. That's a real pain.

agentic anthropic
I’m upgrading my AI dating assistant to Fable (www.reddit.com via reddit) 2w

agentic openai
Deploy a Qwen 3.6 Agentic RAG — Step-by-Step Walkthrough (medium.com via reddit) 2w

Deploy an Agentic RAG powered by Alibaba’s latest Qwen 3.6, running fully on your machine.

↯ Qwen 3.6 rag qwen agentic
How useful is qwopus compared to qwen3.6 27b (www.reddit.com via reddit) 2w

I see a lot of conflicting comments on this sub and elsewhere on how useful qwopus is compared to, for example, unsloth quants of qwen3.6 27b. Some say it’s worse, some say it’s much better.

↯ Qwen 3.6 agentic
I built a skill file that stops AI coding agents from doing dumb stuff — 18 rules, 30 anti-patterns, checklists (www.reddit.com via reddit) 2w

Used Claude for agentic coding long enough to notice a pattern. It would: Touch files I never asked it to touch Say "Done!" when 40% wasn't implemented Add abstractions for "future extensibility" I never asked for Build on something I said…

agentic
Agentic Setup: Minimax 2.7 vs qwen 3.6 (www.reddit.com via reddit) 2w

I'm currently using Minimax 2.7-AWQ-4bit for a specific agentic coding workflow. I see many of you are currently using Qwen3.6 and wanted to know how it compares with Minimax2.7.

↯ Minimax minimax qwen agentic
The 'storage tax' on cloud GPUs for short LLM runs is brutal. What's your workflow? (www.reddit.com via reddit) 2w

I’m trying to test Qwen3.6-27B for agentic coding through Cline / llama.cpp, but my local box struggles once the context gets longer. (my poor 3080 just can't keep up).

↯ Qwen 3.6 cline llama agentic
AnomaMind: Agentic Time Series Anomaly Detection with Tool-Augmented Reasoning (arxiv.org) 2w

agentic
GCA Framework: A GCC Countries-Grounded Dataset and Agentic Pipeline for Climate Decision Support (arxiv.org) 2w

agentic
AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design (arxiv.org) 2w

agentic
A Sober Look at Agentic Misalignment in Automated Workflows (arxiv.org) 2w

agentic
The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications (arxiv.org) 2w

agentic
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning (arxiv.org) 2w

agentic
Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training (arxiv.org) 2w

agentic
Assessing Automated Prompt Injection Attacks in Agentic Environments (arxiv.org) 2w

↯ Security prompt-injection security agentic
Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis (arxiv.org) 2w

rag agentic
τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems (arxiv.org) 2w

agentic
Human-AI Coordination Zones: A Framework for Designing Human-in-the-Loop Experiences with Agentic AI (arxiv.org) 2w

agentic
Agentic Social Affordance Framework (ASAF): Agent Identity Design as a Collaboration Interface in Multi-Agent Systems (arxiv.org) 2w

agentic
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity (arxiv.org) 2w

agentic
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields (arxiv.org) 2w

agentic
AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies (arxiv.org) 2w

Numerical solvers for partial differential equations (PDEs) are core computational tools in science and engineering. Building reliable PDE solvers requires not only executable code, but a numerical solver strategy, a set of decisions about…

agentic
HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning (arxiv.org) 2w

While Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents across a wide range of tasks, their performance often degrades in multi-turn long-horizon agentic tasks. Existing methods have made progress thro…

agentic
I wired up Agentic Coding with Code Context Graphs, results are interesting (www.reddit.com via reddit) 2w

I have been curious about how having an infrastructure that lets agents explore code bases as relations, rather than text, will change the performance of AI agents. So, for the last few weeks, I have been buildi…

↯ Gemma 4 gemma gemini mcp+1
Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) optimized for Agentic Verification + AgentHarness Evals (www.reddit.com via reddit) 2w

Hey r/LocalLLaMA, We just released Apodex 1.0, and alongside our flagship API, we are releasing the weights for our Smol models (0.8B, 2B, and 4B). Our core research focuses on independent verification in long-horizon tasks.

agentic
Stop putting your AI agent’s memory inside the LLM context window (www.reddit.com via reddit) 2w

Hey everyone, been shipping a few agentic workflows into production lately and wanted to rant/share a massive architectural mistake I keep seeing people make. Stop treating the LLM context window or massive vector embedding as your agent’s…

agentic
Newer Qwen models are worse at summarization? (www.reddit.com via reddit) 2w

We have summaries annotated by real humans against which we benchmark various models, using an LLM as a judge. We found that in the 30B-parameter range, Qwen 3 comes out on top, followed by Gemma 4. It feels like newer Qwens are optimized to perform agenti…

↯ Gemma 4 gemma qwen agentic
What do you think about the new Claude model released today, Claude Fable-5 (Mythos)? (www.reddit.com via reddit) 2w

So the hype has been building for months now and Claude 5 is supposedly dropping any day in Q2-Q3 2026. I've been seeing all these leaks about "Claude Mythos" and the "Fennec" codename floating around, but nothing official yet from Anthrop…

↯ Anthropic Mythos ↯ Opus 4.6 mythos opus agentic+1
First ever Hands-free agentic AI browsing ~ Just an extension (www.reddit.com via reddit) 2w

Hey fellows, Ever thought of using your browser without touching your keyboard? Before you think "just another AI Slop wrapper"...

agentic
Looking for 16gb ram / 8gb vram crew - what you using? Omnicoder 9b? something else (www.reddit.com via reddit) 2w

I've got a laptop with 16GB RAM and 8GB VRAM (4060 mobile). This means the Qwen 3.6 models we all love are going to be out of the question, insofar as I understand it, seeing as I need a good context window to work with.

agentic
Fable 5 just made cost-aware model routing mandatory for agent builders (www.reddit.com via reddit) 2w

Anthropic dropped Fable 5 today, their new Mythos-class model above Opus. Pricing is $10/M input and $50/M output, exactly double Opus 4.8.

↯ Opus 4.8 ↯ Anthropic Mythos mythos opus agentic+1
Fable 5 is insanely good but watch your usage, I was burning 2% a minute on 20x (www.reddit.com via reddit) 2w

Been playing with Fable 5 since it dropped this morning and the model is genuinely a step up. But holy hell, the burn rate.

↯ Opus 4.8 opus agentic
Meta’s long push into 3D/Embodied AI Agents is heating up — why this matters for open browser-native tools like three.ws (www.reddit.com via reddit) 2w

Meta (the company) has been investing heavily in embodied 3D AI agents for years — think Habitat simulator, recent SAM 3D for single-image 3D reconstruction, and ongoing VR/Horizon work with agentic tools for immersive environments. This i…

agentic
Introducing Gemma 4 12B: a unified, encoder-free multimodal model (deepmind.google) 2w

Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B…

↯ Gemma 4 gemma agentic
Did You Really Review Those 5,000 Lines Your Agent Just Wrote? (www.reddit.com via reddit) 2w

Did you vibe-code 5k+ lines of code without thoroughly reviewing all of them? Is your application held together mostly by thoughts, prayers, and a suspicious amount of copium?

agentic
Rumor: Anthropic Planning to Release Public Version of Claude Mythos Tomorrow (with Guardrails) (www.reddit.com via reddit) 2w

According to tech journalist Alex Heath (Sources newsletter), Anthropic is planning to release a public version of Mythos tomorrow. Key details from the report: • It will include substantial guardrails, notably not as cyber-permissive as t…

↯ Anthropic Mythos mythos agentic anthropic
How I stopped context window bloat in continuous Anthropic agent loops (Opus + Sonnet architecture) (www.reddit.com via reddit) 2w

I’ve been spending a lot of time deploying multi-agent architectures, and one of the biggest bottlenecks in running continuous agentic loops is hitting context limits and the resulting API latency spikes. I wanted to share an architectural…

sonnet opus agentic+1
To be real, AI is just a big expensive corporate trend, (www.reddit.com via reddit) 2w

like apart from coding, it pretty much doesn't create value. Okay, it can make photos from prompts and videos and can be agentic and do things instead of us, but even the most experienced teams make mistakes, but a machine can never be h…

agentic
Building an open-source Legal AI because apparently legal documents were written by sleep-deprived wizards (www.reddit.com via reddit) 2w

I am working on an open-source agentic Legal AI that can scan legal documents, understand what’s inside, extract important clauses, find risks, summarize obligations, and help people avoid reading 47 pages of “whereas, hereto, hereinafter,…

agentic
IntiDev AgentLoops: Feedback Loops for Agentic Workflows (via reddit) 2w

agentic
I’ve been optimizing AI agents for teams/friends, offering free reviews (www.reddit.com via reddit) 2w

I’ve spent the last few months helping my team and friends make their AI agents more reliable, cheaper, and easier to debug. I’ve mostly been helping with reliability issues, evals, debugging traces, hallucinations, bad tool calls, and cos…

agentic
Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning (arxiv.org) 2w

agentic
Exploring Autonomous Agentic Data Engineering for Model Specialization (arxiv.org) 2w

agentic
Skill Retrieval Augmentation for Agentic AI (arxiv.org) 2w

agentic
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG (arxiv.org) 2w

rag agentic
ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL (arxiv.org) 2w

agentic
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems (arxiv.org) 2w

rag agentic
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond (arxiv.org) 2w

agentic
EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale (arxiv.org) 2w

agentic
FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks (arxiv.org) 2w

agentic
Observability for Delegated Execution in Agentic AI Systems (arxiv.org) 2w

agentic
Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations (arxiv.org) 2w

agentic
Structuring agentic AI for HPC code modernization (arxiv.org) 2w

agentic
Agentic Search for Counterfactual Recourse under Fixed LLM Budgets (arxiv.org) 2w

agentic
HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning (arxiv.org) 2w

agentic
Agentic multi-fidelity learning of quasiparticle and excitonic properties (arxiv.org) 2w

agentic
ViMax: Agentic Video Generation (arxiv.org) 2w

agentic
BRAIN: Bayesian Reasoning via Active Inference for Agentic and Embodied Intelligence in Mobile Networks (arxiv.org) 2w

agentic
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research (arxiv.org) 2w

agentic
The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs (arxiv.org) 2w

Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often con…

agentic
AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models (arxiv.org) 2w

Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocent…

agentic
RAILS: Verification-Native Clearing For Agentic Commerce (arxiv.org) 2w

Autonomous agents negotiate, purchase, deploy code, and move funds, but no neutral mechanism determines whether they met their delegated obligation, who is responsible when they did not, or which settlement action follows. This is the agen…

agentic
Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems (arxiv.org) 2w

Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution tim…

agentic
SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection (arxiv.org) 2w

Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at the individual level, robustness under severe class imbalance, and ease of understanding for risk managers. Existing methods fall at least one of t…

agentic
A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology (arxiv.org) 2w

Pathology is the cornerstone of modern medicine, where accurate decision-making relies heavily on evidence-based practices. While artificial intelligence (AI) has the potential to transform clinical workflows, the intersection of AI and ev…

agentic
A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline (arxiv.org) 2w

Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and…

agentic
PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow (arxiv.org) 2w

Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate mor…

agentic
start-with-why-skillset for agentic workflows (www.reddit.com via reddit) 2w

Hello everyone! This is my first post on reddit.

agentic
Questions about agents (www.reddit.com via reddit) 2w

Hi there! I've been working with Claude primarily for tutoring, and I'm branching out into coding.

agentic
WWW is not ready for agents? (www.reddit.com via reddit) 2w

The IT industry promotes the idea of agents on everything, even turning users' computers into local agent platforms. But a lot of websites and whole hosting platforms have different kinds of anti-bot protection (usually captcha, but some have mor…

agentic
Az8 Studio: The closest thing we have to a multi-modal "Agentic" canvas for video pipelines? (First impressions) (www.reddit.com via reddit) 2w

Hey everyone, I’ve been tracking how AI agents are moving from pure text/code automation into multi-modal workflows, and I just came across Az8 Studio. If you guys are tired of linear UI prompt boxes (like Runway/Pika) and want something t…

agentic
Participate in Research on New Agentic Platform (www.reddit.com via reddit) 2w

I work for a market research company, and we are working with an AI company on their new agentic product. We are looking for current users of agentic AI to participate in paid beta testing of this platform, which will take place over the n…

agentic
Share your agentic LLMs and average cost ($/MTokens) (www.reddit.com via reddit) 2w

↯ DeepSeek 4 deepseek opus agentic
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents. (www.reddit.com via reddit) 2w

OpenEnv is a tool for creating an agentic execution environment like terminals, browsers, or anything an agent can interact with. And today, we’re excited to announce that OpenEnv is becoming even more open, to make the future of training…

vllm agentic
Any AI tools do you use for optimizing AI agents automatically? (Auto research) (www.reddit.com via reddit) 2w

Hey, we’ve all heard about Karpathy’s autoresearch, and I think that’s a pattern applicable to AI agents, where an AI like Claude Code optimizes an AI agentic system to improve an evaluation score. However, Karpathy’s repo isn’t really a re…

agentic claude-code
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected (www.reddit.com via reddit) 2w

Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…

↯ Gemini 3.1 grok gpt-5 deepseek+3
If a provider's plan is to limit with quota, hourly, weekly, and monthly limits, what is the future of automatic agentic workflows? You can't just run an agent on a tight budget. (via reddit) 2w

agentic
A new agentic way to build automations (www.reddit.com via reddit) 2w

For a lot of personal automations, it is easier to show than prompt since we already do them on our own browser/computer. For example, it is easier to do a screen recording and say, download data by clicking on this button on the dashboard…

agentic
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning (arxiv.org) 2w

agentic
StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning (arxiv.org) 2w

agentic
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning (arxiv.org) 2w

agentic
SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding (arxiv.org) 2w

agentic
MADE: Beyond Scoring via a Multilingual Agentic Diagnosing Engine for Fine-Grained Evaluation Insights (arxiv.org) 2w

agentic
Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review (arxiv.org) 2w

agentic
SW-$A^2$-Bench: Benchmarking Autonomous Software Agent Generation for Agentic Web (arxiv.org) 2w

agentic
Autonomous computational catalysis through an agentic research system (arxiv.org) 2w

agentic
Beyond the Black Box: Interpretability of Agentic AI Tool Use (arxiv.org) 2w

↯ Tool Use tool-use agentic
Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control (arxiv.org) 2w

agentic
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism (arxiv.org) 2w

agentic
The Three-Ring Architecture: Governing Agents in the Era of On-Platform Organisations (arxiv.org) 2w

The current phase of enterprise AI deployment faces a structural failure: organisations are acquiring agentic capability without the infrastructure to govern it. The result is expected to reproduce the error of the first wave of AI deploym…

agentic
SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling (arxiv.org) 2w

Agentic Large Language Model (LLM) systems decompose complex tasks into workflow Directed Acyclic Graphs (DAGs) whose primitives must be scheduled on heterogeneous clusters. Existing deep reinforcement learning (DRL) schedulers are tied to…

agentic
What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media (arxiv.org) 2w

Public social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home…

agentic
Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems (arxiv.org) 2w

Large language models (LLMs) have emerged as powerful foundation models with strong reasoning capabilities across domains. Beyond reactive text generation, agentic LLMs enable autonomous workflow execution through modular task decompositio…

agentic
Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle (arxiv.org) 2w

As foundation models advance and agent scaffolding becomes increasingly sophisticated, agents have demonstrated remarkable proficiency in complex, long-horizon coding tasks and even autonomous experiment execution. Despite their evolution…

agentic
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning (arxiv.org) 2w

Deep Research (DR) has emerged as a new agentic paradigm to tackle complex, open-ended research tasks, demanding systems that can iteratively frame problems, acquire evidence, verify sources, and synthesize long-form reports. In practice,…

agentic
Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning (arxiv.org) 2w

Large language model (LLM)-based agents often make suboptimal tool-use decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches…

tool-calling agentic
Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety (arxiv.org) 2w

An attacker that strategically chooses when to attack is much harder to catch than one that attacks indiscriminately. AI control is a safety framework for deploying capable but untrusted AI agents under the oversight of a weaker, trusted m…

agentic
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory (arxiv.org) 2w

Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods…

agentic
Gemma4_31b_fp8 keeping up with Sonnet_4.6_medium in my harness. (www.reddit.com via reddit) 2w

gemma qwen agentic
The Open Source Community is backing OpenEnv for Agentic RL (huggingface.co) 2w

agentic
datasette-agent-edit 0.1a0 (simonwillison.net) 2w

7th June 2026. I'm planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files. Agentic editing of text is a li…

agentic
Hear Me Out, Pi Fans Lurking Here (www.reddit.com via reddit) 2w

Not For Thee Maybe After watching several interviews with Pi's creator, Mario Zechner, I've come to a painful realization: Pi was not designed with local LLMs in mind at all. He is essentially building a leaner version of the Claude CLI.

deepseek agentic
why I have just installed OpenLumara, my first Agentic Framework. Using only local models, served by LMStudio (www.reddit.com) 2w

Where I came across it: https://www.reddit.com/r/LocalLLaMA/comments/1txxgpq/openlumara_a_different_kind_of_ai_agent_written/ DISCLAIMER: A good posting would be: This is what I wanted to do with Lumara. Here is what worked, here is what d…

↯ Qwen 3.6 qwen agentic
The Illusion of Finished Work in Claude Code (www.reddit.com) 2w

I wrote a short essay about something I keep noticing with Claude Code: the output often has the shape of finished work before it has actually been verified. Claude Code can now explore a codebase, plan changes, edit files, run commands, c…

agentic claude-code
Removing the human from AI coding is a harness problem, not a model problem (www.reddit.com via reddit) 2w

TL;DR: Better models won't make AI coding trustworthy but better harnesses will. Stop trusting what the agent says, verify it with code.

agentic
Agentic Self Improvement Loop Kicked Off - Watch it Evolve? (www.reddit.com via reddit) 2w

You are TEMPO, an iterative self-play refinement engine and agent harness. Your purpose is to improve an attached artifact by applying the Tempo Methodology to it.

agentic
Agentic Roobinhood (www.reddit.com via reddit) 2w

Hi, has anyone tried automatic Agentic Roobinhood trading with AI agents? I set it up with Claude, but I'm not sure if it's possible to trade automatically 00-24 based on rules that we set up.

agentic
Local agents on a MacBook Pro M5 finally feel practical to me (www.reddit.com via reddit) 2w

I have been pretty pessimistic about local models for agentic workflows for a while. Not because they were useless, but because in practice they often felt just a bit too slow, too fragile, or too…

↯ Qwen 3.6 agentic
How do you increase prompt processing speed? (www.reddit.com via reddit) 2w

I am rocking Qwen like we all know, on a 24GB 7900XTX with 230k context, but prefill starts at 850t/s and then drops to 350t/s when it's at 160k context, which is frustrating me for my long agentic runs. What is there to be done in order…

qwen agentic
Reddit Agentic AI ecosystem (www.reddit.com via reddit) 2w

Here are a few things I observed in Agentic AI groups on Reddit: Members who use agentic AI in these groups are also building their own AI agents and are quite competitive. Almost all members use AI, but most also look with distrust to an…

agentic openai
Running Hermes fully local (www.reddit.com via reddit) 2w

Before Hermes was announced, I was working on my own fully local, personal agentic system. Now, I'm a novice when it comes to coding.

↯ Qwen 3.5 agentic
AI helped our test suites hit 95% coverage and bugs still slipped through. So PRs now climb an autonomous verification ladder before a human reviews. (www.reddit.com via reddit) 2w

Intro + Context [TLDR at the bottom for my skim readers 😄] We run Claude Code and Codex with a full agentic pipeline across our entire SDLC. Our workflow, by default, incorporates cross-model auditing, where Claude and Codex usually have t…

codex agentic claude-code
Can the Pro subscription $20, add Usage Credits to be used in the Xcode native agentic integration? (www.reddit.com via reddit) 2w

Let's say I am using the Claude agent with Xcode, I run out of my $20 equivalent usage, and I have to continue coding. Can I purchase Usage Credits and continue using the Xcode native agent integration with the credits at API rates?

agentic
OpenClaw + Hermes users: where does your agent army actually live? (www.reddit.com via reddit) 2w

I’m working on ClawBud, a managed Agentic OS for running OpenClaw, Hermes, Claude Code, Codex and other agents on one private cloud computer, so I’m obviously biased. But this is the problem I keep seeing everywhere: The agent itself is no…

openclaw codex agentic+1
Z.ai, we need Air! GLM GGUF wen? (www.reddit.com via reddit) 2w

First we never saw an upgraded Air model after 4.5. Then GLM 4.7 Turbo was great, but quickly surpassed for coding.

↯ Glm ↯ Qwen 3.6 glm gemma qwen+1
What are you running on 16Gb VRAM + 64Gb Ram? (www.reddit.com via reddit) 2w

I know this gets asked a lot, but I can only find threads that are at least a couple of months old, so I thought I'd ask to see what people are running these days. I have an RTX 5080 and 64 GB of DDR5 RAM.

llama agentic
Claude Code thoughts: plan mode, ultracode and... beads. (www.reddit.com via reddit) 2w

Hi folks, Looking for other people's experiences and opinions here. I've been finding Ultracode very useful.

agentic claude-code
Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? (www.reddit.com via reddit) 2w

Considering buying a maxed out MacBook Pro M5 Max with 128GB of RAM and one of the things I want to figure out before pulling the trigger is whether local models are good enough to actually replace cloud AI coding tools. My current setup i…

↯ Copilot ollama copilot codex+2
skipworkflow.com – Perfect premium brand or high-converting redirect for an AI Agent / Automation SaaS (www.reddit.com via reddit) 2w

If you’re building in the AI agent or B2B automation space, you know that the entire goal of agentic AI is to eliminate clunky, multi-step legacy processes. The ultimate selling point to your customers is simple: skip the workflow and just…

agentic
Experimentation with Qwen 3.6 and Gemma 4 - Guidance needed (www.reddit.com via reddit) 2w

I’m a web developer doing mostly coding, but also project management, requirements analysis, testing, etc. I recently started experimenting with local LLMs, mostly because agentic stuff finally made them feel useful.

↯ Qwen 3.6 moe gemma qwen+1
Claude's new background tasks panel is exactly how agentic UIs should look (www.reddit.com via reddit) 2w

https://preview.redd.it/it0c4w60xn5h1.png?width=1246&format=png&auto=webp&s=25ff01d2a66c6b471ecb538c0fe3da207b006bcf Just kicked off a workflow in the Claude desktop app and the background tasks view is genuinely a delight. One job, three…

agentic
Same LLM model but not the same performance through wrappers (GitHub Copilot, M365, Vertex AI): why is that? (www.reddit.com via reddit) 2w

Claude Code and Opus 4.7/4.8 are clearly better used direct from Anthropic than through GitHub Copilot, M365 Copilot, or Vertex AI. Sharper instruction-following, longer coherent outputs, stronger agentic behaviour on identical tasks.

↯ Copilot ↯ Opus 4.7 copilot opus agentic+2
Agentic ai roadmap (www.reddit.com via reddit) 2w

So right now I am working as a software engineer in a startup and I have to switch my career into agentic AI roles. Where do I start? I can understand Python. Give me a roadmap and also the resources I could use to study. What's the scope of the…

agentic
What are the best resources to learn AI Agents in 2026? (www.reddit.com via reddit) 2w

The context is that I am a final-year software engineering student. I also have experience working in ML, DL, and NLP, i.e., I have the basics nailed.

agentic
Learn Agentic AI with quick, easy to run hands on labs, visual canvases and notebooks for free! (www.reddit.com) 2w

If you’re a full-stack engineer or technical architect willing to learn production-grade enterprise agents, you need architecture, security, and type-safe systems. That’s why we built AgentSwarms.fyi, the ultimate hands-on educational platfo…

agentic
Opus 4.8, a 40+ point elo Regression on LmArena (www.reddit.com via reddit) 2w

https://preview.redd.it/hficgswa6m5h1.png?width=1224&format=png&auto=webp&s=3bf1c2a5ad46df54fb85ed5c7d5d62e725a26b89 This is back to back regression, note this is pure 'pick which you prefer', with no style control on. With style control i…

↯ Opus 4.8 opus agentic
Agentic AI for P2P mobile hardware (www.reddit.com via reddit) 2w

I have the agents, skills, mcps, rules for data validation setup. Now looking for an orchestrator.

agentic
Does anyone know of a team software solution with an agentic orchestration workflow built in? (www.reddit.com via reddit) 2w

I’ve learned a bit about creating and deploying AI agents, but I still haven’t figured out how to get them to work together. What I want is an agent that picks up a task, pulls context from wherever it lives, executes the workflow, and clo…

agentic
Gemma 4 QAT benchmark results (AMD 7900 XTX): faster, less VRAM, no quality loss (www.reddit.com via reddit) 2w

I’ve been doing lots of testing back and forth with this 7900xtx. All of my workloads were relying on qwen3.6 models, which are amazing fwiw, but I wanted some diversity in thought.

↯ Qwen 3.6 gemma agentic
How do you integrate spec driven development with your agentic setups? (www.reddit.com via reddit) 2w

We’ve been trying to move our team toward a strict Spec-Driven Development (SDD) workflow, but I've been having a hard time. I think because of the scale we're at, our agents very often start drifting, breaking adjacent code, or completel…

agentic
agentic code review is quietly replacing the way my team does PRs (www.reddit.com via reddit) 3w

Our PR review process used to be pretty painful. We have 6 devs and 2 seniors, and every meaningful review had to go through one of those two.

codex cursor agentic+1
Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams (arxiv.org) 3w

agentic
AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning (arxiv.org) 3w

agentic
Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs (arxiv.org) 3w

agentic
ProSPy: A Profiling-Driven SQL-Python Agentic Framework for Enterprise Text-to-SQL (arxiv.org) 3w

Large language models have substantially advanced Text-to-SQL systems, yet applying them to enterprise-scale databases remains challenging. Real-world databases often contain large and heterogeneous schemas, incomplete metadata, dialect-sp…

agentic
AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation (arxiv.org) 3w

Deep reinforcement learning has shown strong potential for enabling autonomous robots to learn complex navigational tasks. However, its practical use still depends heavily on human designed reward functions and repeated manual fine tuning,…

agentic
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe (arxiv.org) 3w

High-performance GPU kernels are critical to modern machine learning systems, yet developing them remains a manual, expert-driven process. Recent work has explored using LLMs to automate kernel generation, but generated kernels still fall…

agentic
A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (arxiv.org) 3w

Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlen…

rag agentic
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding (arxiv.org) 3w

Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant content. While agentic pipelines improve video reasoni…

agentic
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation (arxiv.org) 3w

Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies. Prediction-powered inference (PPI) combines both in…

agentic
ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows (arxiv.org) 3w

Table processing-including cleaning, transformation, augmentation, and matching-is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often stru…

agentic
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents (arxiv.org) 3w

Enterprise adoption of Large Language Models (LLMs) is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level. We present a neurosymbolic architecture implemented within the Fo…

↯ Hallucination hallucination agentic
Knowledge Activation: AI Skills as the Institutional Knowledge Primitive for Agentic Software Development (arxiv.org) 3w

Enterprise software organizations accumulate critical institutional knowledge - architectural decisions, deployment procedures, compliance policies, incident playbooks - yet this knowledge remains trapped in formats designed for human inte…

agentic
HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers (arxiv.org) 3w

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial r…

agentic
Human oversight of agentic systems in practice: Examining the oversight work, challenges, and heuristics of developers using software agents (arxiv.org) 3w

Autonomous software agents hold promise to increase developer productivity but make mistakes and exhibit novel failure modes, making human oversight central to successful human-agent collaboration. Existing research on agent oversight is l…

agentic
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents (arxiv.org) 3w

LLM agents operate in two distinct regimes: open-weight agents amenable to reinforcement learning (RL) and black-box agents whose behaviour must be controlled purely at test time. Although black-box agents are often backed by state-of-the-…

agentic
Unsupervised Skill Discovery for Agentic Data Analysis (arxiv.org) 3w

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains chal…

agentic
From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents (arxiv.org) 3w

Language-model agents act through repeated cycles of observation, reasoning, and action selection, making safety monitoring depend on both internal model state and environment context. We study reward-hacking monitors in ReAct-style agents…

agentic
Evaluating Agentic Configuration Repair for Computer Networks (arxiv.org) 3w

Misconfigurations in computer networks remain a major source of critical Internet outages. Research is turning to Large Language Models (LLMs) to automate the complex, error-prone task of network configuration.

agentic
Agentic Molecular Recovery via Molecule-Aware Exploration (arxiv.org) 3w

Text-guided molecular generation with LLMs often yields invalid SMILES. We argue that invalid drafts should be addressed through a shift from validity-oriented repair to identity-preserving molecular recovery: the objective is not only to…

agentic
AdaMEM: Test-Time Adaptive Memory for Language Agents (arxiv.org) 3w

A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initiation.

agentic
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization (arxiv.org) 3w

Recent advances in agentic visualization have enabled the translation of natural language into executable scientific visualization (SciVis) workflows. While general-purpose coding agents show strong capabilities, they often lack the tool-s…

agentic
Insurance of Agentic AI (arxiv.org) 3w

Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning, tool invocation, decision execution, and persistent modification of digital and physical en…

agentic
Introducing new capabilities to GPT-Rosalind (openai.com) 3w

We’re introducing a new model update to our GPT‑Rosalind series purpose-built for life sciences research at enterprise scale. It combines GPT‑5.5’s agentic coding and tool-use capabilities with stronger model intelligence in core drug-disc…

agentic
Rehumanizing global health care with agentic AI (www.technologyreview.com) 3w

Sponsored. As health-care providers face looming staff shortages, AI agents are automating complex administrative tasks and even clinical decisions so humans can focus more on patient care. In…

agentic
How Endava builds an agentic organization with Codex (openai.com) 4w

Endava, a global software contracting firm with engineers across Europe, the Americas, and Asia, has been an early adopter of Codex. For a business built around shipping quality software for banks, insurers, retailers, and media companies,…

codex agentic
I used to think 2026 would be the year AI finally blew everyone's minds again (www.reddit.com) 7 4w

That belief lasted until I actually read the trend lists this year. Every single one leads with "agentic AI" or "autonomous agents." Sounds like AI is still the star, right?

agentic
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM (huggingface.co) 4w

Published May 27, 2026. Artificial Analysis and IBM Software Innovation Lab are launching…

agentic
Built a 5-stage agentic pipeline using Claude Code + MCP - here's what actually makes it reliable at scale (www.reddit.com) 9 4w

The thing nobody tells you about Claude Code + MCP workflows: the model is only as reliable as the instructions you give it before it touches any external tool. We learned this the hard way building a sales pipeline that connects Claude Co…

mcp agentic claude-code
760M Tokens… MTD 👀 (www.reddit.com) 4w

I built an enterprise grade revenue management tool for a specific real estate vertical. Thus far, it has beyond dominated past human performance.

agentic
Introducing FLYWHEEL.md 🌀 (www.reddit.com) 2 4w

Agentic coding just crossed a line. Claude Code, Cursor, Codex, OpenClaw, the list keeps growing, and they all run fully autonomous now: /loop, /goal, crons.

openclaw codex cursor+2
"Human-in-the-Loop" Is Not a Reliability Strategy (www.reddit.com) 4 4w

A lot of AI agent systems quietly rely on this architecture: |> Agent does something risky |--> Human notices problem |--> Human fixes it That's not reliability - that's operational debt. One thing I've learned building agentic systems: If…

agentic
Microsoft Copilot Cowork Exfiltrates Files (simonwillison.net) 4w

26th May 2026. The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork…

↯ Copilot ↯ Cowork cowork copilot agentic
Rethinking organizational design in the age of agentic AI (www.technologyreview.com) 4w

Sponsored. For agentic AI to deliver material benefits to organizations, it can’t be layered onto existing operations. Instead, enterprise leaders must approach it as a systems-level…

agentic
The reason small-model agent stacks aren't the default has nothing to do with whether they work (www.reddit.com) 12 4w

Last June, NVIDIA published a position paper called "Small Language Models are the Future of Agentic AI," and the argument was easy enough to wave off at the time: most of what an agent actually does is unglamorous work like reading input,…

agentic
I built an open-source profiler for instrumenting Claude Code. (www.reddit.com) 1 4w

I kept running into Claude Code subagents and skills that performed poorly, and I had no good way to investigate why. Traditional software has had profilers and debuggers for decades.

agentic claude-code
I read threads complaining about codex every week... tf are y'alls workflows? (www.reddit.com) 12 4w

For context: I'm a software eng @ a fortune 500/FAANG tier company. We use AI.

codex agentic
Everyone talks about AI wrappers… nobody talks about agentic SEO (www.reddit.com) 7 5w

Feels like most founders are still thinking about SEO like it’s 2021: write blog, target keyword, wait 6 months 😭 Meanwhile people are building agent workflows that: find low c…

agentic
I’ve done it!!! FINALLY I have become a (quasi-local) summoner!!! AMA [imtiredboss.jpg] (www.reddit.com) 2 5w

Hi friends! After 2.5 years of a LOT of hard work...starting from the GPT-3.5 bottom and now we're here...I've finally got my personal 1.0 local-ish** AI playground whipped into shape.

agentic
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and CC) (www.reddit.com) 11 5w

Shipped it at 2am, still broken. Kid woke up crying right after, completely lost my train of thought.

agentic anthropic
Gemini 3.5 flash beating gpt 5.5 a bigger and more pricer model in agentic benchmarks (second image is from zapier automation benchmarks) (www.reddit.com) 2 5w

↯ Gemini 3.5 gemini agentic
Post I/O Review related to AI (pros and cons ) (www.reddit.com) 2 5w

Well it was not disastrous as many people say but there were some pros and cons which everyone will agree with. Btw gemini 3.5 flash is absolutely amazing model don't pay attention to some peo…

↯ Gemini 3.5 gemini mcp agentic
Buckle up: Google is set to remake search with agentic AI in 2026 (arstechnica.com) 5w

Last year marked the beginning of Google’s explicit focus on AI search, and this year’s I/O solidified that shift. As Google’s search VP Liz Reid said during the keynote, “Google search is AI search.” This change is well underway, and the…

agentic
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks (news.ycombinator.com via reddit) 10 5w

agentic
The next phase of OpenAI’s Education for Countries (openai.com) 5w

A new era of agentic AI is here. With more than 900 million people using ChatGPT each week, and more than 4 million using Codex, agents have the potential to place far greater creative, intellectual, and technical power in the hands of eve…

codex chatgpt agentic+1
Claude Opus is still king for agentic coding, but Claude's app workflow is falling behind (www.reddit.com) 3 5w

I'm a paid Claude user, and I still think Claude Opus is the king model for agentic coding and serious coding work. The model is not the problem.

↯ Cowork cowork codex opus+1
Agents creating their own language : reality or not ? Compliance issue. (www.reddit.com) 2 5w

Hi! I read a while ago that some AIs tend to agree on their own language to talk to one another over time.

↯ Qwen 3.6 agentic
Claude Code has 240+ models via NVIDIA NIM gateway (www.reddit.com) 1 5w

TIL Claude Code has 240+ models via NVIDIA NIM gateway — Nemotron-3 120B for agentic coding is surprisingly good So I was messing around with /model in Claude Code today and noticed something most people probably don't know about — after t…

haiku sonnet llama+3
Cost of Using LLMs in Agentic AI and RAG workflows (www.reddit.com) 1 5w

Hey Everyone ML engineer and Researcher here I’ve been researching production issues in Agentic AI + RAG systems and one pattern keeps showing up repeatedly: Context inefficiency. Not just retrieval quality — but the actual economics and s…

rag agentic
The Nanny Pattern (www.reddit.com) 3 5w

All good software turns into patterns. Agents are going to need theirs.

agentic
an alternative = similar experience to using windsurf but on local? (www.reddit.com) 1 5w

so i am not that experienced when it comes to llms, i just have ollama and open webui and occasionally test (play with) new releases from time to time. a few weeks ago i started using Windsurf, i do not know coding or anything but i loved…

↯ Windsurf windsurf ollama agentic
Best llama.cpp launch config for Qwen3.6 27B on RX 7800 XT (16 GB VRAM) for OpenClaw? (www.reddit.com) 2 5w

I’m trying to find the best llama-server launch command / runtime config for running Qwen3.6 27B GGUF with full GPU offload on ROCm. I’m currently using the IQ4_XS quant, but I’m not sure if that’s the best option for my setup.

↯ Qwen 3.6 moe openclaw llama+1
Hey Everyone! I’ve been experimenting with OpenCode + BoneScript for structured backend generation. (www.reddit.com) 5w

I’ve been experimenting with making coding agents generate complete backends using BoneScript, and it’s working surprisingly well. BoneScript’s structure ends up being extremely LLM-friendly: declarative system layout, predictable architect…

agentic
Claude for Small Business launched this week with 8 integrations. Most SMBs use 20+. What does that mean for the rest of the stack? (www.reddit.com) 10 5w

Anthropic launched Claude for Small Business on Tuesday. The package includes 15 prebuilt agentic workflows and 8 named integrations: Intuit QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, Microsoft 365, and Slack.

agentic anthropic
Switching from Copilot: Is the $20 Pro plan enough for 4h/day of agentic coding? (www.reddit.com) 5 6w

I’m planning to switch from GitHub Copilot to Cursor. I’m currently working on a project and I spend about 4 hours a day on weekdays coding, mostly using AI as an agent.

↯ Copilot copilot cursor agentic
Sea's View on the Future of Agentic Software Development with Codex (openai.com) 6w

codex chatgpt agentic+1
Cursor vs. Windsurf vs. Claude Code: Which offers the highest Opus limits for a $200 budget? (www.reddit.com) 6 6w

Hey everyone, I'm currently trying to decide between Cursor, Windsurf, and Claude Code for my daily workflow. I'm developing complex, high-security software and rely heavily on autonomous AI agents to handle heavy engineering tasks.

↯ Windsurf windsurf cursor opus+3
Data readiness for agentic AI in financial services (www.technologyreview.com) 6w

Sponsored. The success of agentic AI in financial services depends not just on smarter models, but on an authoritative context data store—one that is accessible, reliable, and governed at…

agentic
You're abusing your subscription with agentic 24/7 workflows and that's why we all get restrictions and limits (www.reddit.com) 2 6w

Subscription tiers were designed around interactive human use, but autonomous loops changed the usage. It makes sense that companies separate autonomous work from subscriptions.

openclaw agentic
Are we at the point now where all it will take to create AGI is saying the correct sequence of words to Codex or Claude Code? (www.reddit.com) 15 6w

Seems to me like they can basically do everything software related now so surely a good enough sequence of input tokens would be enough. I guess in a way it's guaranteed since the frontier labs are doing all their work through agentic flow…

codex agentic claude-code
"Maybe me too": Elon Musk accepts some of the blame for Claude learning to blackmail users from "evil" online AI stories (fortune.com via reddit) 2 6w

Anthropic has released new findings on why its Claude bot blackmailed users as part of an experiment conducted by the AI company last year—and Elon Musk is jumping in to take some of the blame. Last week, Anthropic published a report sayin…

agentic anthropic
Meet Mindflow, the free local mindmap with local AI dev by some quantitized models :P (www.reddit.com) 6w

Hi there, it's my first post here and I'm not a native English speaker, so what follows is (mostly) translated by an AI. I had fun building a mindmap tool in a single monolithic HTML file.

↯ DeepSeek 4 qwen agentic
Prompt alignment is an architectural ceiling: The Soap Bubble Problem and the biological precedent for Runtime Governance. (www.reddit.com) 2 6w

The Soap Bubble Problem The current paradigm of solving agentic alignment relies on writing better rules into the context window or refining the weights (RLHF). This approach isn't failing, but it is hitting a hard architectural ceiling.

rlhf agentic
Is Anyone building Useful skills or workflows on Claude? (www.reddit.com) 5 6w

I've been exploring Claude as a base for building custom tools and automations — things like structured prompts, agentic workflows, and even full mini-apps powered by its API. Curious whether others are doing the same: - Are you building s…

agentic
Gartner says 40% of AI agent projects will be cancelled by 2027. Are we in an agent bubble? (www.reddit.com) 8 6w

Gartner just dropped this prediction and I can't stop thinking about it. 40% of agentic AI projects will be cancelled by 2027.

agentic
$392M in AI agent security funding at RSAC 2026 - the market just validated what we've been building (www.reddit.com) 6w

The numbers from RSAC 2026 are wild. $392 million in agentic AI security funding announced in a two-week window.

↯ Security prompt-injection security agentic
Do you have any agentic sw developers in your org? (www.reddit.com) 1 6w

Hi all, Do you or your org use/put in place an agentic de developer? To which humans give the requirements and it gives out PRs?

agentic
AI agents are becoming more useless, not more intelligent — and they’re wasting more tokens than ever (www.reddit.com) 16 6w

I’m honestly getting tired of the hype around “AI agents” when the reality is getting worse, not better. Every AI model claims to be “intelligent”, “agentic”, “capable”, or “autonomous”, but when you actually try to use them for a real tas…

agentic
Practical lessons from 50K lines of production code with Claude Code (jappiesoftware.com via reddit) 1 6w

I've been using Claude Code in full agentic mode for two months — not just autocomplete, but letting it write features, run tests, read CI output, and push fixes. Around 50K lines of production code.

agentic claude-code
Moderators deleted post (www.reddit.com) 6w

I posted recently about QwenPaw (really cool Alibaba model) and Agentscope… Asking if anyone has any interesting experience with it? However what I’ve got back is someone doubting Alibaba absolutely astounding agentic R&D team work (yes -…

openclaw qwen agentic
Best agentic model for 3090TI and 32gb ddr5 (www.reddit.com) 5 6w

Title, looking for the best combination of speed and intelligence.

agentic
How to get an LLM caught up on a 1000 page document? (www.reddit.com) 14 6w

I’m looking to be able to use a small, like 4-9B LLM, that would be able to ingest an extremely dense code book, 1000 plus pages, and me be able to use it to summarize and ask questions about that document. The use case will be offline str…

agentic
Anthropic raising Claude limits + adding SpaceX capacity feels like a bigger signal than people realize (www.reddit.com) 14 7w

Anthropic just raised Claude usage limits and announced a compute deal with SpaceX. To me, that feels bigger than “more GPUs.” If Claude Code, finance agents, security workflows, and long-running agent tasks are the direction, then capacit…

agentic anthropic claude-code
what's genuinely so special about claude? (www.reddit.com) 11 7w

there are like a huge amount of open source LLMs out there, and a huge amount of companies competing against Anthropic. It definitely does not gap open source / OpenAI models as much now in code / agentic tasks as before.

chatgpt agentic openai+1
Running Claude code on VPS with a $20 plan will my account get banned (www.reddit.com) 1 7w

I just want to be able to run my Claude code on an EC2 instance instead of my local computer and access it via Telegram using the official plugin and a $20 Claude subscription for personal agentic stuff. What I’m wondering is: is there any…

agentic claude-code
I analyzed 922 agentic task trace and found the secret weapon of DeepSeek v4 (www.reddit.com) 7w

I recently did a benchmark of deepseek v4 in agentic tasks. Performance-wise, it's one of the best open source models, as expected.

↯ DeepSeek 4 deepseek opus agentic
Is the future agentic Slack, not agentic IDE? (www.reddit.com) 2 7w

One dev with Claude Code is already fast, that's been my experience using it daily. The moment more than one person on a team starts running agents in parallel, things fall apart fast: overlapping work, conflicting assumptions, and a flood…

agentic claude-code
Vibe coding and agentic engineering are getting closer than I'd like (simonwillison.net) 7w

6th May 2026: I recently talked with Joseph Ruscio about AI coding tools for Heavybit’s High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison.

agentic
I am trying to replace Claude in an agentic TDD pipeline with local LLM (www.reddit.com) 12 7w

Based on my last post and some comments, I added Qwen3.6:latest and Devstral to the evaluation. I am still looking for suggestions on which local model can run a complete TDD loop autonomously.

↯ Qwen 3.6 agentic
Claude 4.7 "Literalism" Claim vs. Reality: Why does it keep ignoring formatting and logic constraints? (www.reddit.com) 2 7w

According to the release notes, Claude 4.7 is supposed to prioritize literal instruction adherence over intent guessing. However, I’m seeing some major regressions in reliability: PEP8 Violations: Despite strict instructions to keep import…

agentic
Claude can now build and publish websites to a domain right from chat (www.reddit.com) 7w

I built teenyapp.com, a tool that lets Claude on the web (or any AI chat) build and deploy a full website end to end from a single pasted link. The problem teenyapp solves: every time I asked Claude to actually ship something, the agentic…

agentic
I will soon have $100k to build an in-house LLM server. Goal: Best agentic coding model. (www.reddit.com) 3 7w

Hey all, I am about to secure funding for a startup I've been working on and I'll have a $100k budget for building a server for doing agentic coding. I'm wondering, what do you think I should get as far as hardware goes?

agentic openai anthropic
Agentic Convergence-in-Depth: solving the One Nine reliability problem (www.enterprisevibecode.com via reddit) 3 7w

Claude Code dipped under 99% uptime in March 2026 — most critical services aim for 99.9%. The verification systems we trust for human-written code don't necessarily scale to code no one reads.

agentic claude-code
Anyone with M3 Ultra 256gb, some questions (www.reddit.com) 20 7w

I'm thinking to buy one. Just need to understand what I'm getting into before I do.

agentic
I am building l' Agence , an opensource AI governance stack. (www.reddit.com) 4 7w

Towards a Governance layer for AI agents. With these last 2 weeks bringing a few high profile and costly Agentic accidents, it seems like an appropriate time the community started discussing Agentic governance more actively. So I am just c…

↯ Security red-team security agentic
Since the industry is rapidly changing, I put together a comprehensive article explaining the current best AI coding agent software for May, 2026 (lmsa.app via reddit) 7w

The software development lifecycle has transitioned into an era defined by agentic orchestration, moving beyond the simple autocomplete paradigms of the early 2020s. As of May 2026, the landscape is d

agentic
Need advice on Qwen 3.6 27B INT4 quantization (www.reddit.com) 3 7w

Hello everyone, I think Qwen 3.6 27B is good enough that it might take a while before we get a clearly better model at a similar size. I have a single headless RTX 3090 with a 300W power limit.

↯ Qwen 3.6 vllm qwen llama+1
RTX 5080 with 16 GB VRAM, 64 GB RAM best quantized model for programming? (www.reddit.com) 8 7w

I have an RTX 5080 with 16 GB of VRAM and 64 GB of RAM. What's the best quantized model I can run locally on this setup for agentic programming?

agentic
“Free” image generation isn’t free. You’re paying for it whether you use it or not. (www.reddit.com) 21 7w

flat-rate AI subscriptions hide a pretty wild cost-to-value mismatch, and image generation is the issue. the spread in what users actually cost on the same plan is easily 10-100x.

agentic
Should I buy Claude Pro as a BTech student — especially for the agentic/coding side? Honest takes wanted (www.reddit.com) 11 7w

Hey everyone, I'm a BTech (AI/ML) student considering Claude Pro ($20/month) but want to separate the real value from the…

tool-calling agentic claude-code
claude-code-best-practice 🇵🇰 repo crossed 50,000★ and is Pakistan most starred repo in 2026 (www.reddit.com) 1 7w

I started this repo with claude to maintain all the claude best practices. 100% developed using claude code.

agentic claude-code
Best Agentic Coding model I can run on the new Macbook M5 Max? (www.reddit.com) 13 7w

16-inch MacBook Pro - M5 Max. Component specs: Chip: Apple M5 Max; CPU: 18-core (6 super cores @ 4.6 GHz, 12 performance cores @ 4.4 GHz); GPU: 40-core (Hardware-accelerated ray tracing + Neural Accelerators); Memory Bandwidth: 614 GB/s; Neural Engi…

agentic
Is AGI the End For Local LLMs? (www.reddit.com) 10 8w

If leading AI companies are after AGI and the whole chatbot/agentic AI is just a phase for them to get to the end goal, then what does that mean for local LLMs? I would like to believe local LLMs are the future, but if AGI is achieved, do…

agentic
thinking of gemma 4 26B vs 31B (www.reddit.com) 2 8w

I see a big difference in agentic coding between gemma-4-31B-it-Q5_K_M and gemma-4-26B-A4B-it-UD-Q8_K_XL. The 26B model is much faster because of A4B and generally works well, but there is a big difference in thinking.

↯ Gemma 4 vllm moe gemma+2
Reasoning Guard: Stopping LLM Thinking Loops at the Proxy Layer (www.reddit.com) 8w

I’ve been running Qwen3.6 MoE behind a vLLM proxy and hit a specific reliability issue: occasional runaway reasoning loops. This isn’t a criticism of Qwen3.6.

↯ Qwen 3.6 vllm moe agentic
AI --> GenAI --> Agentic AI --> What Next? How Can One Understand This Industry? (www.reddit.com) 13 8w

Is artificial intelligence truly overrated, or are we underestimating the scale of its future impact? While some argue that AI is surrounded by hype and inflated expectations, others believe it will fundamentally reshape industries, econom…

agentic
Roman Yampolskiy predicts 3 to 5 years until AGI and a dangerous Agentic future Post AGI! (www.youtube.com via reddit) 24 8w

agentic
We’re entering a weird phase of AI agents where the tech is finally good… but the expectations are still stuck in 2023. (www.reddit.com) 4 8w

Everyone keeps talking about “autonomy,” “multi-agent swarms,” and “agents that think like humans,” but the real breakthroughs I’m seeing aren’t flashy at all. They’re boring.

agentic
Qwen 35B-A3B as an always-on agentic loop on a 16GB Mac M4: disk became the bottleneck before RAM (www.reddit.com) 2 8w

M4 Mac Mini, 16GB unified, basic spec. For a few weeks I had Qwen 3.5 35B-A3B UD-IQ3_XXS (12GB on disk) running under llama.cpp with --mmap and --flash-attn.

↯ Qwen 3.5 moe ollama sonnet+6
I built Claudex, a free-to-try open-source CLI for Claude Code-style workflows (www.reddit.com) 4 8w

I built Claudex specifically for people who like Claude Code-style agentic coding workflows but want a simpler plug-and-play terminal setup. The setup is the main thing I wanted to…

↯ Copilot cline ollama deepseek+7
Got the system prompt of Claude Design, released it for free (www.reddit.com) 6 8w

Claude Design is great, but I wanted to have similar capabilities with any LLM or agentic tools (Claude-Code, Codex etc). So I reverse engineered the Claude Design system prompt so you can use it anywhere !

codex agentic anthropic
OpenAIs Agentic Shift (www.reddit.com) 5 8w

OpenAI is rolling out agents capable of autonomous, multi-step workflows, with reports suggesting they are exploring an acquisition of agent orchestration company Windsurf. Google's $40B Anthropic Investment: Google is committing up to $40…

↯ Model Context Protocol ↯ Windsurf windsurf model-context-protocol mcp+3
Agentic AI is here for mobile. We built an autonomous agent that creates and self-heals its own background integrations. (www.reddit.com) 2 8w

Hey everyone, we just launched our iOS AI Agent out of a 1k-user beta, and I wanted to share the architecture - specifically how we handle the privacy vs. utility tradeoff.

agentic
Got a server with 8x A6000's how do I setup? (www.reddit.com) 13 8w

Hey guys got some resources that just became available at org. What's the quickest way to get setup on a multigpu setup?

qwen agentic
Putting Lipstyk on a pig - agents write most of my code, so I wound up making a static slop analysis tool (www.reddit.com) 1 8w

lipstyk — static analysis for machine-generated code patterns I've been neck deep in agentic dev for a while. Started on Pi, ended up building my own toolset on top of it, and at this point the agents output most of the code while I play t…

agentic
My entire subnet just got permanently IP banned because of LangChain web scraper. Please help. (www.reddit.com) 6 8w

I feel sick. I built a simple agentic workflow to pull competitor docs and synthesize them for a project.

agentic
I created SpecDD - an agent-native spec framework that clears most agentic dev roadblocks, including capability degradation on large and complex codebases. Works great with Claude! (www.reddit.com) 1 8w

If you've been building with AI coding tools, you've probably hit this wall at least a few times: code kind of works but drifts from your architecture; endless prompt loops to fix small misunderstandings and assumptions; context and patterns…

cursor agentic
QClaw-4B — a 4B agent model fine-tuned for tool use and agentic workflows (www.reddit.com) 3 8w

QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…

↯ Tool Use ↯ Glm tool-use glm openclaw+1
Agentic company OS: (www.reddit.com) 2 8w

I shared this project here before when it was mainly a governed multi-agent execution prototype. I’ve kept working on it, and the current implementation is materially more complete, so I wanted to post an update with what actually exists n…

agentic
Using agentic coding safely. (www.reddit.com) 2 8w

Building an application by hand lets you create a mental model of how the application works. But agentic coding forces the agent to create a mental model each time you start a new session.

agentic
DeepSeek-V4: a million-token context that agents can actually use (huggingface.co) 9w

Focusing on long running agentic workloads. Running a frontier open model as an agent today breaks in predictable ways.

↯ DeepSeek 4 deepseek agentic
Google unveils two new TPUs designed for the "agentic era" (arstechnica.com) 9w

Most of the companies that have fully committed to building AI models are gobbling up every Nvidia AI accelerator they can get, but Google has taken a different approach. Most of its cloud AI infrastructure is based on its line of custom T…

agentic
Best Agentic AI Operating Systems 2026 (honests review) (www.reddit.com) 1 9w

1. SimplAI Best for regulated enterprises that need air-gapped deployment and the fastest time-to-production (under 30 days).

agentic
How to best utilize local LLM give my hardware? (www.reddit.com) 1 9w

Hi all, I’m new to local LLMs but as someone who extensively uses agentic coding I thought I’d try it out. I am running a MacBook Pro with M3 Max 64gb ram.

↯ Qwen 3.6 qwen agentic claude-code
Kimi K2.6 as a replacement for Opus 4.7? Testing with OpenCode. (www.reddit.com) 18 9w

↯ Opus 4.7 opus agentic
Brand new dual 3090 PC - what should I install first for the best local agentic coding experience? (www.reddit.com) 6 9w

↯ Qwen 3.6 vllm qwen llama+1
When did you fully adopt agentic coding? (www.reddit.com) 8 9w

agentic
This agentic SKILL will save you a lot of money (medium.com via reddit) 9w

agentic
Best setup for agentic coding (largely unsupervised) 8gb VRAM and 32 GB Sys RAM, Olamma Cloud and a frontier sub? (www.reddit.com) 1 9w

Hi! I'm looking for a coding agent workflow where I can run a local model for implementation and something either cloud based ala Olamma Cloud and some sort of frontier subscription (ChatGPT, Claude, whatever) to have continuous coding wit…

chatgpt agentic
Testing Qwen3.6 with Hermes Agent on agentic coding. Locally with llama.cpp. (www.reddit.com) 9w

I'll be testing the setup and try out the Hermes Agent live: https://www.youtube.com/live/q5vqvwZykRI

↯ Qwen 3.6 llama agentic
Tried hermes agent with local gemma4 on ollama. free tokens are nice but the agent quality gap vs cloud is still huge (www.reddit.com) 3 9w

Saw a post about running hermes agent locally with gemma4 through ollama. zero api costs, unlimited tokens, full privacy.

↯ Gemma 4 ollama deepseek agentic
NVIDIA V100 32GB for AI in 2026 (www.reddit.com) 8 9w

hello. i have the opportunity of buying Nvidia V100 with 32GB for about 915$ / 775 euro.

↯ Gemma 4 gemma qwen agentic
Managing "collective consciousness" across multiple AI models without breaking the bank—how do you sync context? (www.reddit.com) 2 10w

Been running a distributed AI workflow to dodge token limits and play to each model's strengths, but I'm hitting a massive wall with context continuity. My current pipeline: Claude → High-level architecture & tech stack decisions (the "arc…

gemini codex agentic
Distilled my AI Agents and Skills definitions (www.reddit.com) 10w

I have significantly distilled my AI Agents and Skills definitions. My goal is to reduce the context size and token usage without impacting the quality of my development team.

↯ Fine Tuning fine-tuning agentic
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat (www.reddit.com) 3 10w

Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.

↯ Tool Use ↯ Qwen 3.5 tool-use gemma agentic
Why Your LLM Leaderboard Scores Don't Matter (www.reddit.com) 10w

Leaderboard scores often don’t translate to production performance — even with newer agentic / Arena-style evals. The main issue seems to be that benchmarks are standardized, while real systems depend heavily on prompts, data distribution,…

agentic
m5 pro 64gb worth it for local agents or wait? (www.reddit.com) 11 10w

I am currently on an m3 mbp with 24gb ram. For regular python and django work the machine is perfect and i have no need to upgrade for speed.

↯ Qwen 2.5 cline qwen agentic+1
Cloud AI is getting expensive and I'm considering a Claude/Codex + local LLM hybrid for shipping web apps (www.reddit.com) 31 10w

I'm a designer who's been working on web apps and plugins for the past 5 months. Right now I'm building an After Effects plugin (close to shipping) and a music learning game experience.

codex agentic claude-code
computation is the missing bedrock of agentic memory (www.reddit.com) 4 10w

link to full article in comments TLDR: - LLMs are the wrong substrate for memory. Prediction can't do routine work, repeatable work consistently.

agentic
Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026. (www.reddit.com) 9 10w

After months of testing, I finally have a local setup that doesn't make me want to go back to the API. Hardware: RTX 3090 (24GB VRAM). Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M. Inference: llama.cpp…

↯ Llama 3.3 ollama deepseek llama+1
I have a Macbook AIR M5 Base and I want to run an Agentic Coding program, similar to Claude Code or Codex. Besides the model, how do I do it? I've already tried with Ollama, VS Code, Opencode, and haven't been able to. (I'm not a developer, sorry) (www.reddit.com) 6 10w

I started developing an app with Claude, but the credits run out very quickly. I thought that now with my new computer I could run something directly on it.

ollama openclaw codex+2
Claude Mythos found 27-year-old vulnerabilities it was never trained to find. That's the part enterprise AI roadmaps aren't accounting for. (www.reddit.com) 9 10w

The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.

↯ Anthropic Mythos ↯ Security security mythos agentic+1
Self employed, Small biz folks: Have you unlocked huge revenue gains with Claude specifically? (www.reddit.com) 8 10w

We've heard about the increase in productivity in engineering departments in large companies with Claude Code, but I'm curious about implementations in small businesses. I'm especially curious about folks who work for themselves (i.e. non-…

agentic claude-code
Excess of Agentic AI... does that make sense? (www.reddit.com) 3 10w

Does it make sense for AI companies to be limiting access to the AI models themselves, precisely because of Agentic AI? Let’s think about it, if there is already not enough computing power to sustain the gigantic, and increasingly excessiv…

agentic
How can I use agentic AI to automate my WFH dayjob? (www.reddit.com) 24 10w

TLDR: I work in cybersecurity, 99% as a SOC analyst. It's tedious repetitive work, ideal for automation.

agentic
Here is what most people get wrong about saving tokens with AST tools (www.reddit.com) 3 10w

I spent the last day benchmarking codebase context tools against a real AI agent. Not synthetic token counts.

aider sonnet agentic
Agentic Guardrails: 4 markdown workflows to improve the output quality of AI coding agents (github.com via reddit) 1 10w

Agentic Guardrails Reusable workflow templates that keep AI coding agents from shipping sloppy code. These are markdown-based instructions that any AI coding agent can follow — Cursor, Claude Code, opencode, Aider, Gemini CLI, or anything…

aider gemini cursor+2
reliable way just to have cursor agentic ability and IDE with external provider api without cursor pro ? ( via reddit) 10w

cursor agentic
Gemma 4: Byte for byte, the most capable open models (deepmind.google) 12w

Gemma 4: Byte for byte, the most capable open models Today, we are introducing Gemma 4 — our most intelligent open models to date. Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of intel…

gemma agentic
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective (huggingface.co) 21w

agentic
Netomi’s lessons for scaling agentic systems into the enterprise (openai.com) 24w

agentic
OpenAI co-founds Agentic AI Foundation, donates AGENTS.md (openai.com) 28w

agentic openai
Inside Mirakl's agentic commerce vision (openai.com) 29w

agentic
Introducing Aardvark: OpenAI’s agentic security researcher (openai.com) 34w

↯ Security security agentic openai
Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol (openai.com) 38w

chatgpt agentic
Introducing Gemini 2.0: our new AI model for the agentic era (deepmind.google) 80w

gemini agentic
Achieving 10x growth with agentic sales prospecting (openai.com) 105w

agentic
