#tool-calling

45 items

Hot take: the biggest bottleneck in AI agents right now isn't models, frameworks, or even cost. It's that nobody knows how to properly evaluate if their agent is actually working (www.reddit.com) +8429 9w

tool-calling rag
We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local (www.reddit.com) +489 7w

LDR maintainer here. Thanks to the strong support of r/LocalLLaMA community LDR got very far.

↯ Qwen 3.6 tool-calling ollama opus+1
AI agent roadmap for developers who can code but have never built an agent (www.reddit.com) +1013 9w

tool-calling mcp openai
Show HN: Local Coding Agent with LLMs to Delegate Tool Calls to Small AI Models (github.com via hn) +9 4w

Open Agent Tools Coder: Open Agent Tools (oats) enables small-to-large self-hosted AI models to use local source code when running tool-calling agentic workloads. We actively data mine 20,970+ (2+ TB) popular GitHub repos using large and sm…

tool-calling agentic
Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster. (www.reddit.com) +41 4w

Ran a head-to-head on two open-weight models for tool-calling on a 4-core CPU, no GPU, no cherry-picking. Wanted to see if the small specialist (Needle, 26M, distilled from Gemini 3.1 for function calls) actually holds up against a small g…

↯ Function Calling ↯ Gemini 3.1 function-calling tool-calling gemini
Computer use is 45x more expensive than a structured API call (www.reddit.com) +45 5w

Hi r/AI_Agents, I recently did a benchmark on computer use agents vs api calls as part of a feature launch for my company. I wanted to share the benchmark here since it seems relevant to this sub: See, most teams default to computer use ag…

tool-calling
how do you design an ai agent to handle heavy data processing and large files? (www.reddit.com) +34 5w

looking for architectural patterns on handling data gravity in production agent pipelines. every tutorial I've found assumes light text payloads or short tool-calling loops, but once your agents have to actually interact with massive sourc…

vector-database tool-calling
AI Support Agents & Workflows Worth Exploring in 2026 (www.reddit.com) +31 6w

Been exploring how AI agents are slowly changing customer support workflows, especially for smaller teams trying to scale without adding headcount. Some interesting tools/workflows worth checking out: • SparrowDesk’s Zoona: AI support agen…

tool-calling openai
What we learned trying to fine-tune a small tool-calling model from production traces (and what not to do) (www.reddit.com) +3 10w

TL;DR: We wanted a small, fast model for multi-turn tool-calling. Training on clean, curated data worked brilliantly (1.7B student beating a 744B teacher).

tool-calling
Show HN: Kitchen Rush, Overcooked inspired LLM tool calling benchmark (github.com via hn) +2 10d

An agent tool-calling benchmark where latency matters as much as intelligence. Why this exists: Most tool-calling benchmarks (BFCL, τ-bench, ToolSandbox, AppWorld) check whether a model makes the right calls, and the world politely waits w…

tool-calling
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM (deemwar-products.github.io via hn) +2 3w

mochallama: A local, tool-calling LLM inside your JVM. The only in-process, tool-calling local LLM for the JVM: Spring-first, OpenAI-compatible, llama.cpp-backed via Project Panama FFM. No JNI, no daemon, no native-install dance.

tool-calling llama openai
Show HN: Prism Coder – Qwen3.5-14B fine-tuned for MCP tool-routing decisions (github.com via hn) +2 5w

🧠 Prism Coder: Persistent memory + tool-calling intelligence for AI a…

↯ Qwen 3.5 tool-calling mcp
The Oats Protocol – Open Agent Tools for Local Coding Agents (news.ycombinator.com) +2 5w

Recently I was using functiongemma and watched it load and run local source code as a tool call without any training/tuning. A couple days later I got Qwen35 in Open-WebUI to use the "native" tool-calling.

tool-calling
Is Haiku good for building a chatbot with MCP tools? (www.reddit.com) +22 7w

Hi, We’re experimenting with building a chatbot that handles consumer interactions. The agent currently has access to about 5–8 tools, and we’re exploring different models to find the right balance of speed, cost, and tool-calling reliabil…

tool-calling haiku mcp
I tried to get my AI agent to schedule a meeting over email. The failure mode revealed a problem almost nobody in the agent space is talking about. (www.reddit.com) +22 9w

I've been building an AI agent that operates across SMS, email, WhatsApp, and Slack — and the hardest problem I've run into isn't tool-calling or reasoning. It's what happens when the agent interacts with multiple people who have different…

tool-calling
What we learned building a data agent that talks to 4 database types simultaneously (DAB benchmark) (www.reddit.com) +21 10w

UC Berkeley published DataAgentBench (DAB) in March — 54 queries across PostgreSQL, MongoDB, SQLite, and DuckDB. Best score so far is 54.3% (PromptQL + Gemini).

tool-calling gemini mcp+1
How does Google Antigravity IDE actually work internally? (www.reddit.com) +12 4w

Hey everyone, I’ve been exploring Google Antigravity recently, and I’m really curious about its internal architecture and engineering design. From the demos, it seems much more advanced than a normal AI coding assistant — almost like an au…

devin tool-calling cursor
ReAct tool-calling issue: Orchestration model computes internally instead of using tools (www.reddit.com) +13 4w

Built a local ReAct-style calculator agent with 6 tools: add, subtract, multiply, divide, modulo, etc. The setup is: orchestrator agent, dynamic tool selection, ReAct loop, tools exposed as functions. Problem: Even when the user asks multi-step ari…

↯ Qwen 3.5 tool-calling gemma
Training a 22MB prompt injection classifier (www.stackone.com via hn) +1 5w

When we started building Defender (our prompt injection guard for MCP tool-calling agents), the constraint was simple and unforgiving: ship inline inside a TypeScript Lambda, st…

↯ Security tool-calling prompt-injection security+1
Needle-rs – AI Function calling in the browser, 258 KB WASM (needle-rs.pages.dev via hn) +1 5w

AI TOOL CALLING · WASM · NO_STD Below is a 26M-parameter tool-calling transformer running entirely in this tab — no server, no API key, no data leaving your device. The model is Needle by Cactus Compute; needle-rs is the pure-Rust runtime…

↯ Function Calling function-calling tool-calling
sAI2.m6s (www.reddit.com) +12 5w

Hey everyone, I'm designing a powerful, autonomous AI chatbot (agent), fully private, using a Python backend (for the core intelligence and tool-calling loops) and a Flutter frontend for a cross-platform UI. Since this moves past a basic…

↯ Security tool-calling prompt-injection security
I built an OSS CLI to catch regressions when migrating between LLMs (www.reddit.com) +12 5w

I’ve been working on EvalShift, an open-source Python CLI for testing whether moving from one LLM/model version to another introduces regressions. The use case is simple: You have prompts, agents, or tool-calling workflows that work well o…

tool-calling gpt-5 gemini
Built a practical voice-first AI tool for ADHD/executive dysfunction — one-tap brain dump → structured reminders & tasks (not a full autonomous agent) (www.reddit.com) +12 6w

Not a full autonomous agent in the Auto-GPT / LangChain sense, but I built something that uses AI in a very practical, daily way for executive dysfunction / ADHD brains. SAVI is a one-tap voice capture tool.

tool-calling
Show HN: Mlx-code – I built a "backyard shed" AI coding agent for Mac (github.com via hn) +1 6w

mlx-code: A lightweight coding agent for Mac, built on Apple's MLX framework. Fast local inference, built-in prompt caching, robust tool-calling.

tool-calling
Help setting up Chrome MCP for Hermes Agent (www.reddit.com) +12 7w

Hi everyone, I'm trying to set up Chrome MCP (Model Context Protocol) for Hermes Agent and need some guidance. **Background:** - Hermes Agent (by NousResearch) has self-learning features - I want to integrate Chrome browser automation via…

↯ Model Context Protocol tool-calling model-context-protocol mcp
Title: Is it just me, or is the "Multi-Agent Swarm" the new "Over-Engineered Spreadsheet"? (www.reddit.com) +11 8w

We’re four months into 2026 and every demo I see features "15 agents working together to write a blog post." In my experience, the more agents you add, the higher the "Cognitive Tax." You get more hallucinations, more token cost, and more…

tool-calling claude-code
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn) +1 10w

Prompt injection defense framework for AI tool-calling Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…

↯ Security ↯ Function Calling function-calling tool-calling prompt-injection+2
Self-Evolution for Multi-Turn Tool-Calling Agents via Divergence-Point Preference Learning (arxiv.org) 3d

tool-calling
LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents (arxiv.org) 7d

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifiers, constraints, and conditions observe…

tool-calling
Building independent LLM drift detection - sharing the methodology, looking for feedback on the approach (www.reddit.com via reddit) 8d

Disclosed upfront: I run [Tickerr dot ai], an independent external monitor for AI APIs. Today it tracks latency, TTFT, uptime, and error rates across major models.

tool-calling gemini agentic+1
Kimi K2.7 Code: 1T MoE, $0.95/M tokens, MIT license, beats Opus 4.8 on MCP tool-calling (www.reddit.com via reddit) 9d

Moonshot AI released Kimi K2.7 Code on June 12 — a coding-focused open-weight model. Key specs: - 1 trillion params (MoE, 32B active, 384 experts) - 256K context window - Modified MIT license — weights on Hugging Face - $0.95/M input, $4.0…

↯ Swe Bench tool-calling swe-bench moe+5
Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning (arxiv.org) 10d

We propose agentic automata learning to evaluate the extent to which tool-calling LLM agents can uncover hidden environments through interaction. In our setup, an agent should uncover a hidden deterministic finite automaton (DFA) by intera…

tool-calling agentic
CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward (arxiv.org) 11d

We present CacheRL, a system for training small agent foundation models that achieves 92 percent process accuracy on multi-step tool-calling tasks, approaching GPT-5's 94 percent while requiring 100 times less compute. Our approach address…

tool-calling gpt-5
JudgeOS V5.7 / EBH — The Governance Firewall Above AI, Robots, Agents, and Autonomous Workflows (www.reddit.com via reddit) 2w

Below is the whole-system tree map showing how JudgeOS V5.7 / EBH connects the locked core, Universal Adapter, domain adapters, capability registry, evidence trust, exact-action ALLOW binding, receipt/replay layer, SDK, dashboard, and exec…

tool-calling
Has anyone tried turning Gemini Web + Playwright into an OpenAI-compatible API to skip paid API purchases? (www.reddit.com via reddit) 2w

I've always wondered if there was a way to avoid expensive API costs while still building AI-powered applications. That got me thinking: what if, instead of using the official API, I simply automated a web browser to interact with Gemini d…

tool-calling gemini openai
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents (arxiv.org) 2w

This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in multimodal small language model (SLM) agents. While existing works have explored various reward designs to improve agentic tool-calling…

↯ Tool Use tool-calling tool-use agentic
I built a visual, local-first AI agent platform - no Docker, no terminal, double-click installer (v0.3.5, open source) (www.reddit.com) 2w

tool-calling ollama openclaw+2
ASA: Backbone-Training-Free Representation Engineering for Tool-Calling Agents (arxiv.org) 2w

tool-calling
Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning (arxiv.org) 2w

Large language model (LLM)-based agents often make suboptimal tool-use decisions, including unsupported tool invocation and hallucinated direct responses, which may accumulate errors throughout multi-step interactions. Existing approaches…

tool-calling agentic
we really all are going to make it, aren't we? 2x3090 setup. (www.reddit.com) 3 6w

i'm blown away. i saw someone made a post the other day about "club-3090" and after having sonnet patch some fixes into it, specifically a sse-session drop bug and a bug with tool-calling, it's fair to say that even "budget" setups like my…

tool-calling sonnet
Should I buy Claude Pro as a BTech student — especially for the agentic/coding side? Honest takes wanted (www.reddit.com) 11 7w

Hey everyone, I'm a BTech (AI/ML) student considering Claude Pro ($20/month) but want to separate the real value from the…

tool-calling agentic claude-code
I built AI agents that play Pokemon Showdown autonomously using free LLM APIs via tool-calling (www.reddit.com) 8w

I've built a system where models like Llama 3, Qwen, and Gemma play Pokémon Showdown battles autonomously. Instead of simple prompt-response, they analyze the full battle state every turn (type matchups, HP, weather, field conditions, reve…

↯ Llama 3 tool-calling gemma qwen+1
llm 0.32a1 (simonwillison.net) 8w

29th April 2026 - Fixed a bug in 0.32a0 where tool-calling conversations were not correctly reinflated from SQLite. #1426 Recent articles - LLM 0.32a0 is a major backwards-compatible refactor - 29th April 2026 - Tracking the history of the…

tool-calling
TPS wasn't enough, tool-calling pass rate decided the winner in my Qwen 7B runs (www.reddit.com) 3 8w

I kept running into the same problem: TPS and TTFT tell you which config is fast, and perplexity is helpful only as a rough quality signal. None of them reliably tell you how the model will behave after changing quant, ctx size, kv_cache,…

tool-calling qwen
Small models fail at tool selection - but it's not what I expected (www.reddit.com) 8 10w

Been running small models (1.5B-4B) with tool-calling agents. They consistently failed at selecting the right tool from 80+ options.

tool-calling
