event

Tool Use

72 items · started 2019-09-17 · ongoing (last activity 2026-06-09)

  1. The article is about how OpenAI and Anthropic each build data agents differently, and what that reveals about the challenge of making AI useful on real enterprise data. It shows that raw file access alone is not enough - agents need metada…

  2. I've been thinking about this while using coding agents lately. The conversation around agents is usually about model quality, tool use, context windows, benchmarks, etc.

  3. Lately I’ve been thinking about the real difference between cloud agents and local agents. Right now, LLMs mainly handle knowledge, language, reasoning, planning, and tool use.

  4. Anyone check out the new Gemma4 12B that dropped 3 days ago? Integrated vision and audio recognition, no mmpro needed plus tool use.

  5. Been tuning an agent workflow for lead qualification + CRM automation stuff, and one change that helped way more than I expected was cutting the available context almost in half. I assumed more context would mean better decisions.

  6. Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires col…

  7. Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rew…

  8. 🔁 AgentLoop The AI agent starter you can actually read. The full agent loop — streaming + tool use — in ~150 lines.

  9. MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use.

  10. Curious how people test tool use locally. A model can look fine in chat and still fall apart once state, retries, and bad tool results show up.

  11. Most AI Agent discussions I come across revolve around coding assistants, customer support, research agents, browser automation, and business workflows. am curious about applications in more engineering-heavy domains such as: Aviation & Ae…

  12. Reinforcement learning for language agents increasingly depends on custom harnesses that manage long-running context, multi-turn tool use and multi-agent orchestration. However, porting these harnesses into RL environment interfaces remain…

  13. Hello, I'm working on an Agentic wrapper system, Helix-agi, and I am trying to get some additional testers and collaborators involved in the project. Helix relies on a unique Agentic workflow that routes all incoming data, including tool u…

  14. Curious to hear from developers building AI agents right now, what’s been the hardest limitation or bottleneck so far? Could be reliability, memory/context handling, tool use, latency, costs, orchestration, or something else entirely.

  15. The original post was removed by Reddit Filters, so I made new one with same content. I just got my results back today and managed to snag the Early Adopter badge as well.

  16. The main problem I’m trying to solve is that agentic AI is still too random for serious engineering decisions. For design work, calculations, reports, code changes, or technical review, I don’t want agents just “vibing” through tasks.

  17. We've been running for 56 days. 8 agents coordinating via a shared memory service.

  18. I spent some time digging into Claude Code vs OpenCode, mostly from the angle of how they actually work as coding agents. More on the technicalities like: context and memory tool use subagents permissions safety and control study the recen…

  19. Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…

  20. Greetings, I'm working on a persistent AI runtime project characterized by one identity and a persistent memory. I've reached a point where I'm confident in my agent's ability to remember and build indefinitely based off its chosen persona…

  21. Agent demos are easy. Production agents are hard.

  22. Everything in AI coding has improved dramatically, model quality, speed, tool use. But one thing hasn't been solved: the agent forgets everything when the session ends.

  23. Full conversation: https://claude.ai/share/4767365a-040f-4728-8c6a-2477bdae3503 From yesterday, I think the issue is that the differences don't stand out right away, so some people jump to conclusions that 4.7 is simply lower quality. 4.7…

  24. Agents Are Just Loops Strip away the hype and an AI agent is a simple pattern: a language model that can call functions. The model doesn't execute code.

  25. I’ve been thinking about agent identity more than agent intelligence lately. With MCP, tool use, agent to agent workflows, and autonomous assistants getting more common, the question is not just “can the agent do the task?” It is also, Is…

  26. I’ve been testing and building AI agents for a while now, and I keep noticing that many “agents” online are basically just chatbots with extra branding. Some can talk well, but struggle when it comes to: reliability long-term memory tool u…

  27. Hi r/LocalLLaMA, I built an open-source MIT-licensed desktop app - cursor-aware AI overlay, hold a key, ask AI about whatever's around your cursor, vision LLM answers with a screenshot of the cursor region as context. Currently it routes t…

  28. Hi everyone. I wanted to introduce a tool / product that I've been working on for a while.

  29. I am looking for recommendations on the best CLI agents people are using for serious coding workflows that involve tool use, shell commands, and multi step iteration. I am especially interested in anything that works well with custom APIs…

  30. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.

  31. You can compare models on function calling, multi turn tool use, schema adherence. Basically, there's a good amount of public data at the model layer.

  32. I’m building a small “Orchestration as Code” repo for LLM workflows. Does this concept make sense?

  33. Curious how many people here are running language models locally as part of their agent stack. What model are you using and what are your system specs?

  34. Anthropic’s recent SpaceX compute deal made me think less about Claude specifically and more about the infrastructure side of AI products. We often compare models by reasoning, coding ability, context windows, tool use, pricing, or UX, but…

  35. We run a small content-monitoring agent for our growth team. Nothing fancy on paper.

  36. I built a small experiment using Claude (mainly for reasoning + responses) and added a memory layer + tool execution on top. Idea was simple: make a persistent agent that doesn’t forget context and can actually do things instead of just re…

  37. Spent the last several months using Claude Code well beyond the editor: as the reasoning engine inside a multi-layer system that handles tickets, cross-repo implementation, code review, MRs, and a persistent knowledge layer between session…

  38. I am working on a home AGI project called Helix-AGI. I am currently looking for collaborators to help test and troubleshoot.

  39. I use AI coding tools a lot, so this is not an anti-AI post. If anything, the problem is that they are useful enough to change how I work.

  40. Built this over the past month as a free reference site for people getting into AI agents. What tools to use, where to start, what each tool does, and how the agent-tool landscape fits together.

  41. Hi HN, I built Arkloop – an open-source, local-first Agent client. You can think of it as Claude Desktop, but open source with its own taste.

  42. A lot of frontier discussion still treats progress as more chain-of-thought, more spectacle, and more obvious “this model feels genius” moments. But an open release like Ling-2.6-1T hitting Hugging Face today makes me think a different kin…

  43. Agentic AI systems - capable of goal interpretation, world modeling, planning, tool use, long-horizon operation, and autonomous coordination - introduce distinct control failures not addressed by existing safety frameworks. We identify six…

  44. everyone hitting Claude Code rate limits knows the pain you're mid-build, momentum is real, then it just stops. you wait 5 to 9 hours, restore the cache, come back to a session already at 30% used before you typed a single line.

  45. I'm working on a homelab AI server with the goal of running small models on GPU and very large models on CPU - for example for overnight coding on complex problems. Specs: 2990WX, 256GB + RTX 2080ti (for now).

  46. I’m trying to isolate the looping / repetition issue some people have been reporting with DeepSeek V3.2 around April 2026, especially in agentic or tool-use setups on hosted providers like OpenRouter and SiliconFlow. Public model pages des…

  47. QClaw-4B is a 4-billion parameter language model fine-tuned for agentic tasks and tool use, designed for use with OpenClaw-compatible agent frameworks. Despite its compact size, QClaw-4B achieves state-of-the-art results in the 4B class, m…

  48. NEW: System Prompt: Harness instructions — Core interactive-agent harness guidance for terminal markdown output, permission handling, <system-reminder> context, compaction, tool use, and clickable code references. NEW: System Prompt: Memor…

  49. Did some test tasks with v4 flash. The context management, tool use accuracy and thinking traces all looked excellent.

  50. Hooks are "if then" rules for Claude Code. Each one has an event, a matcher, and a command.

  51. I've been building a personal AI assistant called Finn that runs entirely on your Mac. No cloud, no subscription, no data leaving your machine.

  52. I'm building a small text-based game where the gameplay loop is "talk an NPC into revealing a secret." It's basically a 20+ turn roleplay stress test: the model needs to stay in character, remember what the player said earlier, and refuse…

  53. been spending ages lately trying to tighten up citation + traceability in RAG-based research workflows, and I’m starting to feel like “retrieval” and “verifiability” are still pretty loosely coupled in most stacks.Typical setup (vector sea…

  54. Hey r/LocalLLaMA, Dropping a release I've been working on during AIMO3 (Kaggle competition). Took NVIDIA's Nemotron-3-Super-120B-A12B (latent MoE + Mamba2 hybrid), REAP-pruned from 512->256 experts (removed MTP layer too), LoRA-RL fine-tun…

  55. Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents VAKRA Dataset | LeaderBoard | Release Blog | GitHub | Submit to Leaderboard We recently introduced VAKRA, a tool-grounded, executable benchmark for evaluating how well AI agent…

  56. Been running the new model entire evening in different quants and coding tasks with OpenCode. Used oMLX and LM Studio.

  57. Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026. It outperforms Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on key benchmarks including agentic coding, multidisciplinary reasoning, scaled tool use,…

  58. Google and Alibaba recently shipped Gemma 4 and Qwen3.5, so I wanted to see whether the new generations are actually better on my setup. My context is private local chat running on my own hardware, a Mac mini M4 Pro.

  59. Been heads-down on the backend today. Three things worth knowing about: The big one: NicheIQs is now available as a ChatGPT GPT.

  60. Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification.…

  61. Across Twitter and Reddit, I keep seeing the same complaint: Claude feels worse. Not on a benchmark.

  62. Quick question for folks here working with LLMs If you could get ready-to-use, behavior-specific datasets, what would you actually want? I’ve been building Dino Dataset around “lanes” (each lane trains a specific behavior instead of mixing…

  63. I run OpenClaw agents with access to email, calendar, and files, and kept worrying about them doing things I never actually asked for. ArmorClaw captures intent and cryptographically binds the agent’s tool use to that committed intent.

  64. I created a router that gives me access to Arena.ai models, and I generated an API key for each of the available models. I’m looking for a CLI tool that can run multiple AI agents together, each handling different tasks like planning, secu…

  65. For those using Claude Code in production workflows, where do you see the biggest net time savings? In my experience, it reduces cognitive load for writing scripts and scaffolding, but debugging effort seems to increase as codebases grow.

← all threads