I put together a Rust-native, CPU-only implementation of LFM2.5-8B-A1B (www.youtube.com via reddit)
event
Tool Use
-
-
How OpenAI and Anthropic each build data agents differently - DataChain (www.reddit.com via reddit)
The article is about how OpenAI and Anthropic each build data agents differently, and what that reveals about the challenge of making AI useful on real enterprise data. It shows that raw file access alone is not enough - agents need metada…
-
Do AI agents spend more time waiting for humans than actually working? (www.reddit.com via reddit)
I've been thinking about this while using coding agents lately. The conversation around agents is usually about model quality, tool use, context windows, benchmarks, etc.
-
what is the real difference between cloud agents and local agents (www.reddit.com via reddit)
Lately I’ve been thinking about the real difference between cloud agents and local agents. Right now, LLMs mainly handle knowledge, language, reasoning, planning, and tool use.
-
-
-
Gemma4 12B - Experiences? (www.reddit.com via reddit)
Anyone check out the new Gemma4 12B that dropped 3 days ago? Integrated vision and audio recognition, no mmpro needed plus tool use.
-
Been tuning an agent workflow for lead qualification + CRM automation stuff, and one change that helped way more than I expected was cutting the available context almost in half. I assumed more context would mean better decisions.
-
Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires col…
-
Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rew…
-
-
Show HN: AgentLoop – a Claude agent starter you can read (github.com via hn)
🔁 AgentLoop The AI agent starter you can actually read. The full agent loop — streaming + tool use — in ~150 lines.
-
Minimax M3 on Open Router (openrouter.ai via hn)
MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.
-
Do you benchmark local models as agents, or only on single prompts? (www.reddit.com)
Curious how people test tool use locally. A model can look fine in chat and still fall apart once state, retries, and bad tool results show up.
-
Most AI Agent discussions I come across revolve around coding assistants, customer support, research agents, browser automation, and business workflows. am curious about applications in more engineering-heavy domains such as: Aviation & Ae…
-
Polar: Agentic RL on Any Harness at Scale (arxiv.org via hn)
Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remain…
-
Testers and collaborators wanted (www.reddit.com)
Hello, I'm working on an Agentic wrapper system, Helix-agi, and I am trying to get some additional testers and collaborators involved in the project. Helix relies on a unique Agentic workflow that routes all incoming data, including tool u…
-
Curious to hear from developers building AI agents right now, what’s been the hardest limitation or bottleneck so far? Could be reliability, memory/context handling, tool use, latency, costs, orchestration, or something else entirely.
-
The original post was removed by Reddit Filters, so I made new one with same content. I just got my results back today and managed to snag the Early Adopter badge as well.
-
The main problem I’m trying to solve is that agentic AI is still too random for serious engineering decisions. For design work, calculations, reports, code changes, or technical review, I don’t want agents just “vibing” through tasks.
-
We've been running for 56 days. 8 agents coordinating via a shared memory service.
-
I spent some time digging into Claude Code vs OpenCode, mostly from the angle of how they actually work as coding agents. More on the technicalities like: context and memory tool use subagents permissions safety and control study the recen…
- I tried replacing Claude Code with OpenCode. I’m switching back. (www.reddit.com)
-
Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…
-
Persistent Memory + Identity Risks (www.reddit.com)
Greetings, I'm working on a persistent AI runtime project characterized by one identity and a persistent memory. I've reached a point where I'm confident in my agent's ability to remember and build indefinitely based off its chosen persona…
-
Why 80% of agentic AI demos don't make it to production (www.reddit.com)
Agent demos are easy. Production agents are hard.
-
Everything in AI coding has improved dramatically, model quality, speed, tool use. But one thing hasn't been solved: the agent forgets everything when the session ends.
-
Full conversation: https://claude.ai/share/4767365a-040f-4728-8c6a-2477bdae3503 From yesterday, I think the issue is that the differences don't stand out right away, so some people jump to conclusions that 4.7 is simply lower quality. 4.7…
-
The Tool Use Pattern: How AI Agents Actually Work (www.reddit.com)
Agents Are Just Loops Strip away the hype and an AI agent is a simple pattern: a language model that can call functions. The model doesn't execute code.
-
Are we going to need identity checks for AI agents? (www.reddit.com)
I’ve been thinking about agent identity more than agent intelligence lately. With MCP, tool use, agent to agent workflows, and autonomous assistants getting more common, the question is not just “can the agent do the task?” It is also, Is…
-
What separates a useful AI agent from a glorified chatbot? (www.reddit.com)
I’ve been testing and building AI agents for a while now, and I keep noticing that many “agents” online are basically just chatbots with extra branding. Some can talk well, but struggle when it comes to: reliability long-term memory tool u…
-
Hi r/LocalLLaMA, I built an open-source MIT-licensed desktop app - cursor-aware AI overlay, hold a key, ask AI about whatever's around your cursor, vision LLM answers with a screenshot of the cursor region as context. Currently it routes t…
-
Show HN: AgentKanban for VS Code – A task board with agent harness integration (www.agentkanban.io via hn)
Hi everyone. I wanted to introduce a tool / product that I've been working on for a while.
-
I am looking for recommendations on the best CLI agents people are using for serious coding workflows that involve tool use, shell commands, and multi step iteration. I am especially interested in anything that works well with custom APIs…
-
Needle: We Distilled Gemini Tool Calling Into a 26M Model (www.reddit.com)
We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
-
You can compare models on function calling, multi turn tool use, schema adherence. Basically, there's a good amount of public data at the model layer.
-
Orc (working name) - auditable and declarative AI workflow (www.reddit.com)
I’m building a small “Orchestration as Code” repo for LLM workflows. Does this concept make sense?
-
Who's running local LLMs for agent workflows? What's your setup? (www.reddit.com)
Curious how many people here are running language models locally as part of their agent stack. What model are you using and what are your system specs?
-
Is compute capacity becoming a real moat for AI agents? (www.reddit.com)
Anthropic’s recent SpaceX compute deal made me think less about Claude specifically and more about the infrastructure side of AI products. We often compare models by reasoning, coding ability, context windows, tool use, pricing, or UX, but…
-
We run a small content-monitoring agent for our growth team. Nothing fancy on paper.
-
I built a small experiment using Claude (mainly for reasoning + responses) and added a memory layer + tool execution on top. Idea was simple: make a persistent agent that doesn’t forget context and can actually do things instead of just re…
-
Spent the last several months using Claude Code well beyond the editor: as the reasoning engine inside a multi-layer system that handles tickets, cross-repo implementation, code review, MRs, and a persistent knowledge layer between session…
-
Helix-AGI Technical Doc (www.reddit.com)
I am working on a home AGI project called Helix-AGI. I am currently looking for collaborators to help test and troubleshoot.
-
Vibe coding can turn into a gambling loop (www.reddit.com)
I use AI coding tools a lot, so this is not an anti-AI post. If anything, the problem is that they are useful enough to change how I work.
-
Built this over the past month as a free reference site for people getting into AI agents. What tools to use, where to start, what each tool does, and how the agent-tool landscape fits together.
-
Show HN: Arkloop – Open-source, local-first Agent client (github.com via hn)
Hi HN, I built Arkloop – an open-source, local-first Agent client. You can think of it as Claude Desktop, but open source with its own taste.
-
A lot of frontier discussion still treats progress as more chain-of-thought, more spectacle, and more obvious “this model feels genius” moments. But an open release like Ling-2.6-1T hitting Hugging Face today makes me think a different kin…
-
Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by existing safety frameworks. We identify six…
-
Claude Code, extended to everything (www.reddit.com)
everyone hitting Claude Code rate limits knows the pain you're mid-build, momentum is real, then it just stops. you wait 5 to 9 hours, restore the cache, come back to a session already at 30% used before you typed a single line.
-
Which large models support tool use in opencode etc? (www.reddit.com)
I'm working on a homelab AI server with the goal of running small models on GPU and very large models on CPU - for example for overnight coding on complex problems. Specs: 2990WX, 256GB + RTX 2080ti (for now).
-
I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…
-
QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…
-
Harness instructions - what's new in CC 2.1.120 (+783 tokens) (www.reddit.com)
NEW: System Prompt: Harness instructions — Core interactive-agent harness guidance for terminal markdown output, permission handling, <system-reminder> context, compaction, tool use, and clickable code references. NEW: System Prompt: Memor…
-
Did some test tasks with v4 flash. The context management, tool use accuracy and thinking traces all looked excellent.
-
Set up these 4 Claude Code hooks to make your life easier (www.reddit.com)
Hooks are "if then" rules for Claude Code. Each one has an event, a matcher, and a command.
-
I've been building a personal AI assistant called Finn that runs entirely on your Mac. No cloud, no subscription, no data leaving your machine.
-
I'm building a small text-based game where the gameplay loop is "talk an NPC into revealing a secret." It's basically a 20+ turn roleplay stress test: the model needs to stay in character, remember what the player said earlier, and refuse…
-
been spending ages lately trying to tighten up citation + traceability in RAG-based research workflows, and I’m starting to feel like “retrieval” and “verifiability” are still pretty loosely coupled in most stacks.Typical setup (vector sea…
-
Hey r/LocalLLaMA, Dropping a release I've been working on during AIMO3 (Kaggle competition). Took NVIDIA's Nemotron-3-Super-120B-A12B (latent MoE + Mamba2 hybrid), REAP-pruned from 512->256 experts (removed MTP layer too), LoRA-RL fine-tun…
-
-
Vakra: Reasoning, Tool Use, and Failure Modes of Agents (huggingface.co via hn)
Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents VAKRA Dataset | LeaderBoard | Release Blog | GitHub | Submit to Leaderboard We recently introduced VAKRA, a tool-grounded, executable benchmark for evaluating how well AI agent…
-
Qwen3.6-35B is worse at tool use and reasoning loops than 3.5? (www.reddit.com)
Been running the new model entire evening in different quants and coding tasks with OpenCode. Used oMLX and LM Studio.
-
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com)
Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…
-
Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.
-
NicheIQs update — ChatGPT integration, live stats, scoring fix (www.reddit.com)
Been heads-down on the backend today. Three things worth knowing about: The big one: NicheIQs is now available as a ChatGPT GPT.
-
A Survey of Workflow Optimization for LLM Agents (arxiv.org via hn)
Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification.…
-
Why model drift is the real failure mode for agentic systems (www.reddit.com)
Across Twitter and Reddit, I keep seeing the same complaint: Claude feels worse. Not on a benchmark.
-
Quick question for folks here working with LLMs If you could get ready-to-use, behavior-specific datasets, what would you actually want? I’ve been building Dino Dataset around “lanes” (each lane trains a specific behavior instead of mixing…
-
Show HN: Make sure your OpenClaw isn't doing things it's not supposed to (claw.armoriq.ai via hn)
I run OpenClaw agents with access to email, calendar, and files, and kept worrying about them doing things I never actually asked for. ArmorClaw captures intent and cryptographically binds the agent’s tool use to that committed intent.
-
Master AI CLI Orchestrator? (www.reddit.com)
I created a router that gives me access to Arena.ai models, and I generated an API key for each of the available models. I’m looking for a CLI tool that can run multiple AI agents together, each handling different tasks like planning, secu…
-
Where does Claude Code actually save time in real workflows? (www.reddit.com)
For those using Claude Code in production workflows, where do you see the biggest net time savings? In my experience, it reduces cognitive load for writing scripts and scaffolding, but debugging effort seems to increase as codebases grow.
-
Emergent tool use from multi-agent interaction (openai.com)