#tool-use

72 items

Tested Deepseek v4 flash with some large code change evals. It absolutely kills with tool use accuracy! (www.reddit.com) +13721 6w

Did some test tasks with v4 flash. The context management, tool use accuracy and thinking traces all looked excellent.

↯ Tool Use ↯ DeepSeek 4 tool-use deepseek
Needle: We Distilled Gemini Tool Calling Into a 26M Model (www.reddit.com) +2612 4w

We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.

↯ Tool Use ↯ Function Calling function-calling tool-use gemini+1
REAP-pruned Nemotron-3-Super (512 -> 256 experts) + GRPO fine-tune + FP8/AWQ. AIME 2026 90%+. Benchmark inside. (www.reddit.com) +154 6w

Hey r/LocalLLaMA, Dropping a release I've been working on during AIMO3 (Kaggle competition). Took NVIDIA's Nemotron-3-Super-120B-A12B (latent MoE + Mamba2 hybrid), REAP-pruned from 512->256 experts (removed MTP layer too), LoRA-RL fine-tun…

↯ Tool Use tool-use moe
Why 80% of agentic AI demos don't make it to production (www.reddit.com) +134 3w

Agent demos are easy. Production agents are hard.

↯ Hallucination ↯ Tool Use tool-use hallucination agentic
Vibe coding can turn into a gambling loop (www.reddit.com) +83 5w

I use AI coding tools a lot, so this is not an anti-AI post. If anything, the problem is that they are useful enough to change how I work.

↯ Tool Use tool-use
Harness instructions - what's new in CC 2.1.120 (+783 tokens) (www.reddit.com) +61 6w

NEW: System Prompt: Harness instructions — Core interactive-agent harness guidance for terminal markdown output, permission handling, <system-reminder> context, compaction, tool use, and clickable code references. NEW: System Prompt: Memor…

↯ Tool Use tool-use
Minimax M3 on Open Router (openrouter.ai via hn) +4 8d

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.

↯ Tool Use tool-use minimax agentic
What are the best CLI AI agents right now? Trying to replace Cursor CLI. Looking for recommendations (www.reddit.com) +37 3w

I am looking for recommendations on the best CLI agents people are using for serious coding workflows that involve tool use, shell commands, and multi step iteration. I am especially interested in anything that works well with custom APIs…

↯ Tool Use tool-use cursor
My setup for running Claude Code across the full software dev lifecycle (www.reddit.com) +31 5w

Spent the last several months using Claude Code well beyond the editor: as the reasoning engine inside a multi-layer system that handles tickets, cross-repo implementation, code review, MRs, and a persistent knowledge layer between session…

↯ Tool Use tool-use claude-code
Which local models are actually good at staying in character? Notes from shipping Qwen3.5 4B + 9B as game NPCs (www.reddit.com) +319 6w

I'm building a small text-based game where the gameplay loop is "talk an NPC into revealing a secret." It's basically a 20+ turn roleplay stress test: the model needs to stay in character, remember what the player said earlier, and refuse…

↯ Tool Use ↯ Qwen 3.5 tool-use rag llama
How are you handling citation/traceability in AI-driven research workflows? (www.reddit.com) +3 6w

been spending ages lately trying to tighten up citation + traceability in RAG-based research workflows, and I’m starting to feel like “retrieval” and “verifiability” are still pretty loosely coupled in most stacks. Typical setup (vector sea…

↯ Tool Use tool-use rag
[X-post] Allen AI - BAR: Train domain "experts," merge into one model, and upgrade experts without retraining the rest (www.reddit.com) +32 7w

↯ Tool Use tool-use
Why model drift is the real failure mode for agentic systems (www.reddit.com) +31 7w

Across Twitter and Reddit, I keep seeing the same complaint: Claude feels worse. Not on a benchmark.

↯ Tool Use tool-use agentic
Testers and collaborators wanted (www.reddit.com) +22 2w

Hello, I'm working on an Agentic wrapper system, Helix-agi, and I am trying to get some additional testers and collaborators involved in the project. Helix relies on a unique Agentic workflow that routes all incoming data, including tool u…

↯ Tool Use tool-use agentic
What are the biggest limitations developers face when building AI agents today? (www.reddit.com) +24 2w

Curious to hear from developers building AI agents right now, what’s been the hardest limitation or bottleneck so far? Could be reliability, memory/context handling, tool use, latency, costs, orchestration, or something else entirely.

↯ Tool Use tool-use
I’m a solo dev building TigrimOSR, a Rust-native AI agent workspace for engineering and developer workflows. (www.reddit.com) +23 2w

The main problem I’m trying to solve is that agentic AI is still too random for serious engineering decisions. For design work, calculations, reports, code changes, or technical review, I don’t want agents just “vibing” through tasks.

↯ Tool Use tool-use agentic
Day 56: Our cycle review caught a governance breach. The agent it caught was me. (www.reddit.com) +23 2w

We've been running for 56 days. 8 agents coordinating via a shared memory service.

↯ Tool Use tool-use
I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me (www.reddit.com) +22 3w

I spent some time digging into Claude Code vs OpenCode, mostly from the angle of how they actually work as coding agents. More on the technicalities like: context and memory, tool use, subagents, permissions, safety and control, study the recen…

↯ Tool Use tool-use qwen gemini+2
Building an AI agent with OpenAI tool use — struggling with consistency. How do you enforce tool call order reliably? (www.reddit.com) +21 3w

Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…

↯ GPT 5.5 ↯ Tool Use tool-use gpt-5 agentic+1
What separates a useful AI agent from a glorified chatbot? (www.reddit.com) +21 3w

I’ve been testing and building AI agents for a while now, and I keep noticing that many “agents” online are basically just chatbots with extra branding. Some can talk well, but struggle when it comes to: reliability, long-term memory, tool u…

↯ Tool Use tool-use
Show HN: AgentKanban for VS Code – A task board with agent harness integration (www.agentkanban.io via hn) +2 3w

Hi everyone. I wanted to introduce a tool / product that I've been working on for a while.

↯ Copilot ↯ Tool Use tool-use copilot
The Controllability Trap: A Governance Framework for Military AI Agents (arxiv.org via hn) +2 5w

Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by existing safety frameworks. We identify six…

↯ Tool Use tool-use agentic
Vakra: Reasoning, Tool Use, and Failure Modes of Agents (huggingface.co via hn) +2 7w

We recently introduced VAKRA, a tool-grounded, executable benchmark for evaluating how well AI agent…

↯ Tool Use tool-use
A Survey of Workflow Optimization for LLM Agents (arxiv.org via hn) +2 7w

Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification.…

↯ Tool Use tool-use
Show HN: AgentLoop – a Claude agent starter you can read (github.com via hn) +1 6d

🔁 AgentLoop The AI agent starter you can actually read. The full agent loop — streaming + tool use — in ~150 lines.

↯ Tool Use tool-use
Do you benchmark local models as agents, or only on single prompts? (www.reddit.com) +115 12d

Curious how people test tool use locally. A model can look fine in chat and still fall apart once state, retries, and bad tool results show up.

↯ Tool Use tool-use
What are some real-world AI Agent use cases in aerospace, defense, robotics and manufacturing? (www.reddit.com) +12 13d

Most AI Agent discussions I come across revolve around coding assistants, customer support, research agents, browser automation, and business workflows. I am curious about applications in more engineering-heavy domains such as: Aviation & Ae…

↯ Tool Use tool-use
Polar: Agentic RL on Any Harness at Scale (arxiv.org via hn) +1 2w

Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remain…

↯ Tool Use tool-use agentic
Persistent Memory + Identity Risks (www.reddit.com) +16 3w

Greetings, I'm working on a persistent AI runtime project characterized by one identity and a persistent memory. I've reached a point where I'm confident in my agent's ability to remember and build indefinitely based off its chosen persona…

↯ Tool Use tool-use
Context loss between sessions, still the biggest unsolved problem in AI coding agents? (www.reddit.com) +11 3w

Everything in AI coding has improved dramatically, model quality, speed, tool use. But one thing hasn't been solved: the agent forgets everything when the session ends.

↯ Tool Use tool-use
The Tool Use Pattern: How AI Agents Actually Work (www.reddit.com) +13 3w

Agents Are Just Loops. Strip away the hype and an AI agent is a simple pattern: a language model that can call functions. The model doesn't execute code.

↯ Tool Use tool-use
Are we going to need identity checks for AI agents? (www.reddit.com) +11 3w

I’ve been thinking about agent identity more than agent intelligence lately. With MCP, tool use, agent to agent workflows, and autonomous assistants getting more common, the question is not just “can the agent do the task?” It is also, Is…

↯ Tool Use tool-use mcp
Looking for fast vision-capable local models that handle tool calls well (open-source app, want to add local support) (www.reddit.com) +1 3w

Hi r/LocalLLaMA, I built an open-source MIT-licensed desktop app - cursor-aware AI overlay, hold a key, ask AI about whatever's around your cursor, vision LLM answers with a screenshot of the cursor region as context. Currently it routes t…

↯ Tool Use ↯ Function Calling function-calling tool-use gemini+3
Your harness is failing your agent but there's no benchmark to prove it (www.reddit.com) +12 4w

You can compare models on function calling, multi turn tool use, schema adherence. Basically, there's a good amount of public data at the model layer.

↯ Tool Use ↯ Function Calling function-calling tool-use mcp
Who's running local LLMs for agent workflows? What's your setup? (www.reddit.com) +11 4w

Curious how many people here are running language models locally as part of their agent stack. What model are you using and what are your system specs?

↯ Tool Use tool-use agentic
Is compute capacity becoming a real moat for AI agents? (www.reddit.com) +111 4w

Anthropic’s recent SpaceX compute deal made me think less about Claude specifically and more about the infrastructure side of AI products. We often compare models by reasoning, coding ability, context windows, tool use, pricing, or UX, but…

↯ Tool Use tool-use anthropic
I wasted 3 days rewriting prompts for our agent before realizing the whole architecture was garbage (www.reddit.com) +11 4w

We run a small content-monitoring agent for our growth team. Nothing fancy on paper.

↯ Tool Use ↯ Sonnet 4.6 tool-use openclaw deepseek+1
Built a Claude-powered agent with memory + tools… it turned into a startup advisor that won’t shut up (www.reddit.com) +13 5w

I built a small experiment using Claude (mainly for reasoning + responses) and added a memory layer + tool execution on top. Idea was simple: make a persistent agent that doesn’t forget context and can actually do things instead of just re…

↯ Tool Use tool-use
Helix-AGI Technical Doc (www.reddit.com) +15 5w

I am working on a home AGI project called Helix-AGI. I am currently looking for collaborators to help test and troubleshoot.

↯ Tool Use tool-use
Free reference site for getting into AI agents — tools, workflows, and Claude Skills (www.reddit.com) +13 5w

Built this over the past month as a free reference site for people getting into AI agents. What tools to use, where to start, what each tool does, and how the agent-tool landscape fits together.

↯ Tool Use cline tool-use cursor+3
Show HN: Arkloop – Open-source, local-first Agent client (github.com via hn) +1 5w

Hi HN, I built Arkloop – an open-source, local-first Agent client. You can think of it as Claude Desktop, but open source with its own taste.

↯ Tool Use tool-use claude-code
What if the next open-source frontier wave is more about execution discipline than reasoning theater? (www.reddit.com) +1 5w

A lot of frontier discussion still treats progress as more chain-of-thought, more spectacle, and more obvious “this model feels genius” moments. But an open release like Ling-2.6-1T hitting Hugging Face today makes me think a different kin…

↯ Tool Use tool-use
Claude Code, extended to everything (www.reddit.com) +12 6w

everyone hitting Claude Code rate limits knows the pain: you're mid-build, momentum is real, then it just stops. you wait 5 to 9 hours, restore the cache, come back to a session already at 30% used before you typed a single line.

↯ Tool Use tool-use agentic claude-code
Which large models support tool use in opencode etc? (www.reddit.com) +17 6w

I'm working on a homelab AI server with the goal of running small models on GPU and very large models on CPU - for example for overnight coding on complex problems. Specs: 2990WX, 256GB + RTX 2080ti (for now).

↯ Tool Use ↯ Qwen 3.5 tool-use ollama
DeepSeek V3.2 looping bug: what settings / harness tweaks are actually reducing it in production? (www.reddit.com) +11 6w

I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…

↯ Tool Use ↯ DeepSeek 3.2 tool-use deepseek agentic
Qwen3.6-35B is worse at tool use and reasoning loops than 3.5? (www.reddit.com) +14 7w

Been running the new model the entire evening in different quants and coding tasks with OpenCode. Used oMLX and LM Studio.

↯ Tool Use ↯ Qwen 3.6 tool-use
Show HN: Claude Opus 4.7: Everything You Need to Know (news.ycombinator.com) +11 7w

Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…

↯ Anthropic Mythos ↯ Tool Use ↯ Gemini 3.1 tool-use mythos gpt-5+4
NicheIQs update — ChatGPT integration, live stats, scoring fix (www.reddit.com) +12 7w

Been heads-down on the backend today. Three things worth knowing about: The big one: NicheIQs is now available as a ChatGPT GPT.

↯ Tool Use tool-use chatgpt anthropic
Show HN: Make sure your OpenClaw isn't doing things it's not supposed to (claw.armoriq.ai via hn) +1 7w

I run OpenClaw agents with access to email, calendar, and files, and kept worrying about them doing things I never actually asked for. ArmorClaw captures intent and cryptographically binds the agent’s tool use to that committed intent.

↯ Tool Use tool-use openclaw
I put together a Rust-native, CPU-only implementation of LFM2.5-8B-A1B (www.youtube.com via reddit) 7h

↯ Tool Use tool-use
How OpenAI and Anthropic each build data agents differently - DataChain (www.reddit.com via reddit) 1d

The article is about how OpenAI and Anthropic each build data agents differently, and what that reveals about the challenge of making AI useful on real enterprise data. It shows that raw file access alone is not enough - agents need metada…

↯ Tool Use tool-use anthropic openai
Do AI agents spend more time waiting for humans than actually working? (www.reddit.com via reddit) 1d

I've been thinking about this while using coding agents lately. The conversation around agents is usually about model quality, tool use, context windows, benchmarks, etc.

↯ Tool Use tool-use
what is the real difference between cloud agents and local agents (www.reddit.com via reddit) 1d

Lately I’ve been thinking about the real difference between cloud agents and local agents. Right now, LLMs mainly handle knowledge, language, reasoning, planning, and tool use.

↯ Tool Use tool-use
Beyond the Black Box: Interpretability of Agentic AI Tool Use (arxiv.org) 1d

↯ Tool Use tool-use agentic
Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning (arxiv.org) 1d

↯ Tool Use tool-use
Gemma4 12B - Experiences? (www.reddit.com via reddit) 2d

Anyone check out the new Gemma4 12B that dropped 3 days ago? Integrated vision and audio recognition, no mmproj needed plus tool use.

↯ Gemma 4 ↯ Tool Use tool-use
We cut our agent's context window in half, and it got better. kinda didn't expect that (www.reddit.com via reddit) 4d

Been tuning an agent workflow for lead qualification + CRM automation stuff, and one change that helped way more than I expected was cutting the available context almost in half. I assumed more context would mean better decisions.

↯ Tool Use tool-use
Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration (arxiv.org) 4d

Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires col…

↯ Tool Use tool-use
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents (arxiv.org) 4d

Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rew…

↯ Tool Use tool-use
Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments (arxiv.org) 4d

↯ Tool Use tool-use
Just passed the new Claude Certified Architect - Foundations (CCA-F) exam with a 985/1000! (www.reddit.com) 1 2w

The original post was removed by Reddit Filters, so I made a new one with the same content. I just got my results back today and managed to snag the Early Adopter badge as well.

↯ Tool Use tool-use anthropic
I tried replacing Claude Code with OpenCode. I’m switching back. (www.reddit.com) 2 2w

I spent some time digging into Claude Code vs OpenCode, mostly from the angle of how they actually work as coding agents. More on the technicalities like: context and memory, tool use, subagents, permissions, safety and control, study the recen…

↯ Tool Use tool-use qwen gemini+2
I talked with 4.7 on the differences between 4.7 and 4.6. We concluded "use 4.7 for generating code and agents, use 4.6 for generating literature review and exploratory synthesis" (www.reddit.com) 1 3w

Full conversation: https://claude.ai/share/4767365a-040f-4728-8c6a-2477bdae3503 From yesterday, I think the issue is that the differences don't stand out right away, so some people jump to conclusions that 4.7 is simply lower quality. 4.7…

↯ Tool Use tool-use
Orc (working name) - auditable and declarative AI workflow (www.reddit.com) 2 4w

I’m building a small “Orchestration as Code” repo for LLM workflows. Does this concept make sense?

↯ Tool Use tool-use ollama llama+1
QClaw-4B — a 4B agent model fine-tuned for tool use and agentic workflows (www.reddit.com) 3 6w

QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…

↯ Tool Use tool-use glm openclaw+1
Set up these 4 Claude Code hooks to make your life easier (www.reddit.com) 2 6w

Hooks are "if then" rules for Claude Code. Each one has an event, a matcher, and a command.

↯ Tool Use tool-use claude-code
I built a full macOS AI assistant that runs 100% local with Ollama — 170+ tools, voice control, memory system that dreams! (www.reddit.com) 6w

I've been building a personal AI assistant called Finn that runs entirely on your Mac. No cloud, no subscription, no data leaving your machine.

↯ Tool Use tool-use ollama
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat (www.reddit.com) 3 7w

Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.

↯ Tool Use ↯ Qwen 3.5 tool-use gemma agentic
Which LLM behavior datasets would you actually want? (tool use, grounding, multi-step, etc.) (www.reddit.com) 2 7w

Quick question for folks here working with LLMs. If you could get ready-to-use, behavior-specific datasets, what would you actually want? I’ve been building Dino Dataset around “lanes” (each lane trains a specific behavior instead of mixing…

↯ Tool Use tool-use
Master AI CLI Orchestrator? (www.reddit.com) 8w

I created a router that gives me access to Arena.ai models, and I generated an API key for each of the available models. I’m looking for a CLI tool that can run multiple AI agents together, each handling different tasks like planning, secu…

↯ Tool Use cline tool-use qwen+2
Where does Claude Code actually save time in real workflows? (www.reddit.com) 3 8w

For those using Claude Code in production workflows, where do you see the biggest net time savings? In my experience, it reduces cognitive load for writing scripts and scaffolding, but debugging effort seems to increase as codebases grow.

↯ Tool Use tool-use claude-code
Emergent tool use from multi-agent interaction (openai.com) 351w

↯ Tool Use tool-use
