#tool-calling
3 items
What we learned trying to fine-tune a small tool-calling model from production traces (and what not to do) (www.reddit.com via reddit) What we learned building a data agent that talks to 4 database types simultaneously (DAB benchmark) (www.reddit.com via reddit) UC Berkeley published DataAgentBench (DAB) in March — 54 queries across PostgreSQL, MongoDB, SQLite, and DuckDB. Best score so far is 54.3% (PromptQL + Gemini).
Small models fail at tool selection - but it's not what I expected (www.reddit.com via reddit)