#function-calling

31 items

Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation (www.reddit.com) +9035 8w

Evaluated Qwen 3.6 27B across BF16, Q4_K_M, and Q8_0 GGUF quant variants with llama-cpp-python using Neo AI Engineer. Benchmarks used: HumanEval: code generation HellaSwag: commonsense reasoning BFCL: function calling Total samples: HumanE…

↯ Function Calling ↯ Qwen 3.6 humaneval function-calling qwen+1
Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster. (www.reddit.com) +41 4w

Ran a head-to-head on two open-weight models for tool-calling on a 4-core CPU, no GPU, no cherry-picking. Wanted to see if the small specialist (Needle, 26M, distilled from Gemini 3.1 for function calls) actually holds up against a small g…

↯ Function Calling ↯ Gemini 3.1 function-calling tool-calling gemini
After 3 months of switching between Claude Sonnet 4.6, GPT-5.5, and Gemini 3.1 daily — here's my actual routing (www.reddit.com) +45 4w

Not benchmarks — actual tasks, actual results. Claude Sonnet 4.6 for: - Long documents that need nuanced analysis - Writing where voice and precision matter - Reasoning through edge cases in code - Anything where "think carefully" is the r…

↯ Function Calling ↯ Sonnet 4.6 function-calling gpt-5 sonnet+1
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com) +48 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling ↯ Opus 4.7 function-calling fine-tuning rag+4
Context checkpoint erasure in llama.cpp ? (www.reddit.com) +37 10w

Has anyone been able to solve or mitigate context checkpoints being erased during single user inference, specifically when function calling is part of the chat history? I've been using Qwen 3.5 35B A3B for some time (now using 3.6), tested…

↯ Function Calling ↯ Qwen 3.6 function-calling qwen llama
What’s the cheapest way to give a local Llama 3 internet access? (SearXNG isn’t cutting it) (www.reddit.com) +219 5w

Finally got Llama 3 70B running locally and wired up function calling so it can search the web. First tried self-hosting SearXNG, but the results are pretty messy.

↯ Function Calling function-calling llama
Show HN: I built a search engine for llms.txt sites (statespace.com via hn) +2 8w

More and more developer tools are adopting the llms.txt standard to build AI-friendly versions of their docs. The problem is that it's very hard to search across them.

↯ Mistral ↯ Function Calling function-calling vector-database mistral+1
I built an open source protocol that gives every AI tool a signed contract — so your agent verifies before executing, saves tokens by choosing card depth, and leaves an auditable receipt on every call. No blind function calling. (www.reddit.com) +12 4w

▎ What problem does this solve? Right now most agents call tools based on a name and a JSON Schema.

↯ Function Calling function-calling
Tool calling vs prompt routing for search decisions? (www.reddit.com) +13 4w

Hi, would appreciate your help. I have a summary of a given topic plus past conversation history.

↯ Function Calling function-calling
opensource router slm with 50-100ms latency and 99% accuracy that runs locally (www.reddit.com) +11 5w

i am working on a router slm that helps in multiple agent orchestration , excels in tool calling but every option comes with a tradeoff of its own , you are invited to give your approaches to refine the architecture 1 - if we use multiple…

↯ Function Calling function-calling
How are you actually predicting AI costs before they hit your invoice? (www.reddit.com) +15 5w

Switched from prototype to production last month and our AI bill was 3x what we estimated. Not because we picked the wrong model - we just didn't know what we didn't know.

↯ Function Calling function-calling agentic
Needle-rs – AI Function calling in the browser, 258 KB WASM (needle-rs.pages.dev via hn) +1 5w

AI TOOL CALLING · WASM · NO_STD Below is a 26M-parameter tool-calling transformer running entirely in this tab — no server, no API key, no data leaving your device. The model is Needle by Cactus Compute; needle-rs is the pure-Rust runtime…

↯ Function Calling function-calling tool-calling
Looking for fast vision-capable local models that handle tool calls well (open-source app, want to add local support) (www.reddit.com) +1 6w

Hi r/LocalLLaMA, I built an open-source MIT-licensed desktop app - cursor-aware AI overlay, hold a key, ask AI about whatever's around your cursor, vision LLM answers with a screenshot of the cursor region as context. Currently it routes t…

↯ Tool Use ↯ Function Calling function-calling tool-use gemini+3
Your harness is failing your agent but there's no benchmark to prove it (www.reddit.com) +12 6w

You can compare models on function calling, multi turn tool use, schema adherence. Basically, there's a good amount of public data at the model layer.

↯ Tool Use ↯ Function Calling function-calling tool-use mcp
Function calling works great in demos. In production, it’s a different story. (www.reddit.com) +15 7w

I’ve been working on adding function calling to an LLM-based support system over the past few weeks. Thought I’d share a few things that didn’t behave the way the demos suggest.

↯ Function Calling function-calling
Qwem Meetup Presentation: Function Calling Harness, from 6.75% to 100% (typia.io via hn) +1 7w

TL;DR - AutoBe — AI backend auto-generation agent - Production-grade backend from natural language conversation - 4 AST types + 4-tier compiler validation + self-healing loops - Schema specs are the new prompts - Typia — The infrastructure…

↯ Function Calling function-calling
Local LLM Benchmark about Backend Generation by Function Calling (GLM vs Qwen vs DeepSeek) (www.reddit.com) +1 7w

Detailed Article: https://autobe.dev/articles/local-llm-benchmark-about-backend-generation.html Five months ago I posted the "Hardcore function calling benchmark in backend coding agent" thread here. As I wrote in that post, it was an unco…

↯ Glm ↯ Function Calling ↯ Sonnet 4.6 function-calling glm gpt-5+3
Qwen Meetup Draft Review Required (Function Calling Harness 2 - CoT Compliance from 9.91% to 100%) (autobe.dev via reddit) +1 7w

Talk at Qwen Meetup Korea end of May. Looking for review on this draft before I build PPT slides off it.

↯ Function Calling ↯ Qwen 3 function-calling qwen
Multi-agent in production: real win or just hype? (www.reddit.com) +12 8w

Trying to get an honest read on this from people actually shipping. Every other AI announcement lately is "agentic" or "multi-agent," and I can't always tell if it's a real architectural shift or rebranded function calling with extra steps.

↯ Function Calling function-calling agentic
Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com) +1 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
Interactive playground to learn Agentic AI hands-on (Free) with Certification (www.reddit.com) +12 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
The model alone is not the agent. The harness plus the model is the agent (www.reddit.com) +16 9w

An agentic harness is the orchestration and control layer wrapped around a base language model that transforms it from a stateless text predictor into an agent capable of taking actions, calling tools, maintaining state across steps, and e…

↯ Function Calling function-calling agentic
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn) +1 10w

Prompt injection defense framework for AI tool-calling Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…

↯ Security ↯ Function Calling function-calling tool-calling prompt-injection+2
Beyond Function Calling: Benchmarking Tool-Using Agents under Tool-Environment Unreliability (arxiv.org) 1d

Large language models are increasingly deployed as agents that solve tasks by interacting with external tool environments. Although recent tool-use benchmarks increasingly cover complex task settings, they still largely assume clean, stabl…

↯ Function Calling function-calling
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents (arxiv.org) 2w

Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve depend…

↯ Tool Use ↯ Function Calling function-calling tool-use mcp
Built a broadcast dashboard monitoring AI agent developments across 21 primary sources - here's what I'm tracking and what's missing (www.reddit.com via reddit) 2w

Agent-related developments are some of the hardest signals to track right now - they're spread across arXiv papers, GitHub repos, model release notes, incident reports, and policy documents simultaneously. I've been running a pipeline that…

↯ Tool Use ↯ Function Calling function-calling tool-use
nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face (huggingface.co via reddit) 2w

Model Overview Description: DiffusionGemma 26B A4B IT is an open-weights multimodal generative model developed by Google DeepMind that processes text, image, and video inputs to produce text output via discrete diffusion. Built on the Gemm…

↯ Deepmind ↯ Function Calling ↯ Gemma 4 function-calling deepmind moe+1
Building Expertise in Claude - Seeking Quality Learning Resources (www.reddit.com) 13 5w

Hi everyone, I'm on a mission to become a serious expert in Claude and AI, and I'm building a structured learning path. I want to create content that's actually valuable - with real practical applications, not surface-level tutorials.

↯ Function Calling function-calling rag anthropic
ReAct or CodeAct, that is the question (www.reddit.com) 5 6w

Hi guys, Idk what you think, but for me, one of the biggest discussions in the AI engineering field is this issue: ReAct vs. CodeAct.

↯ Function Calling function-calling mcp
Qwen3.6:27b vs qwen3-coder:30b vs deepseek-coder:33b on code gen, tool calling, and agent tasks (www.reddit.com) 6 7w

Ran a full eval against four local models last weekend and the spread between them is wider than I expected. All running through Ollama on CPU, no cloud, same prompts, same hardware.

↯ Function Calling ↯ Qwen 3.6 humaneval function-calling ollama+1
Function calling and other API updates (openai.com) 158w

↯ Function Calling function-calling

← all tags