Turned a Xiaomi 12 Pro into a dedicated local AI node. Here is the technical setup: OS Optimization: Flashed LineageOS to strip the Android UI and background bloat, leaving ~9GB of RAM for LLM compute.
#ollama
275 items
24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4) (www.reddit.com)
The local LLM ecosystem doesn’t need Ollama (sleepingrobots.com via hn) Friends Don't Let Friends Use Ollama. Ollama gained traction by being the first easy llama.cpp wrapper, then spent years dodging attribution, misleading users, and pivoting to cloud, all while riding VC money earned on someone else's engine…
We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local (www.reddit.com) LDR maintainer here. Thanks to the strong support of r/LocalLLaMA community LDR got very far.
My experience using Claude code with Local Llm, and full guide on how to set it up (www.reddit.com) Wanted to share a workflow I tested on a real flight, in case anyone else is trying to set up offline Claude Code. The core idea: using ollama to pull the needed model of what you need, and then use it to run claude code The setup, in orde…
Open source memory layer so any AI agent can do what Claude.ai and ChatGPT do (alash3al.github.io via hn) Give any AI agent a persistent memory in minutes. Works with Claude, ChatGPT, Ollama, OpenRouter, and any MCP-compatible agent.
Performance Benchmark - Qwen3.5 & Gemma4 on dual GPU setup (RTX 4070 + RTX 3060) (www.reddit.com) Hi everyone, Been following a lot of local LLM talk in this forum lately—learned quite a bit from you all! This is my first post, hopefully not my last.
Qwen3.6 huge quality gain from Q4 to Q6 for coding agent (www.reddit.com) So, last week I tried to update my unused local LLM setup. I had to stop using it because quality was too low and deepseek was too cheap.
My experience with testing all frontier open-weight models against GPT and Claude (www.reddit.com) I spent about a week testing open-weight models for real work, comparing them against what I already know from ChatGPT, Gemini, and Claude. The gap between what benchmarks suggest and what happens when you give these models something to ve…
Curated a list of 550+ free or cheap AI tools for vibe coding (LLM APIs, IDEs, local models, RAG, agents) (www.reddit.com) Been vibe coding a lot recently and kept running into the same problem finding actually usable tools without paying for 10 different subscriptions or donating my bank balance to Claude. So I put together a curated list focused on free or l…
Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama (www.cyera.com via reddit) TL;DR: We discovered a critical vulnerability (CVE-2026-7482, CVSS 9.1) in Ollama that enables unauthenticated attackers to leak the entire Ollama process memory, potentially im…
Multi agent AI Trading Floor (www.reddit.com) Hello, I built a multi agent AI trading floor for a school project: 10 agents (news, research, macro, crowd sim, trading…) Running 100% locally on Ollama, Gemma 4:26b, qwen3.6:35b, gemma4:31b. no paid APIs.
Choosing a Mac Mini for local LLMs — what would YOU actually buy? (www.reddit.com)
Follow-up: adding Ollama support to my open-source cursor-aware AI app - looking for beta testers with vision-capable local models (www.reddit.com) EDIT 2: Trick-Assignment-828 pointed me at the actual rule update from the mods - Rule 3 Low Effort was expanded to cover LLM-assisted posts without disclosure. Disclosing now: Disclosure: I'm a non-native English speaker (German).
Looking to migrate off of Ollama and LMStudio (www.reddit.com) Hello, I'm currently using Ollama / lm studio for things like code inference and proof reading emails, etc. Definitely not experienced in this space but looking to grow.
Gemm4:e4B-IT good at instructions following no refusals. (www.reddit.com)
I built a local LLM that learns how you use Claude Code and starts auto-piloting it (www.reddit.com) I've been running 5-8 Claude Code sessions at a time and got tired of tab-switching to approve tool calls. So I built claudectl — a TUI that sits on top of all your sessions and lets a local LLM (ollama/llama.cpp) handle approvals for you.
Llama.cpp VS LiteRT on a custom Xiaomi 12 Pro 24/7 Server (V2 Redesign) (www.reddit.com) https://preview.redd.it/sm4ysgdw1w2h1.png?width=1376&format=png&auto=webp&s=3705932403919814fbf2008a1cba189d17e0591e Thanks everyone for the advice on my previous post (24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/…
Lessons from building a coding agent for 8k context windows: token budgeting, parallel executors, and per-file isolation (www.reddit.com) Most AI coding tools (Cursor, Aider, Claude Code) assume you have a 200k-token model. If you're running local LLMs through Ollama or LM Studio, or hitting free-tier cloud APIs like Groq or OpenRouter, you've got around 8k tokens to work wi…
Ollama Cloud Pro ($20/mo) vs OpenAI Plus ($23/mo). Which gives more tokens ? (www.reddit.com) Hey everyone, I'm comparing these two plans side by side for running AI agents daily through OpenClaw (self-hosted AI agent platform): • Ollama Cloud Pro — $20/month • OpenAI Plus — €23/month (~$25) My setup: 3 agents running in parallel (…
Ollama Cloud Max vs Claude Max for heavy AI-assisted coding? (www.reddit.com) Hi, I'm looking to replace my current 2x ChatGPT Plus subscriptions with one $100 subscription of either Ollama Cloud or Claude Max, and would appreciate some insights from people who have used these plans before. I've had 2 $20 ChatGPT su…
I Replaced My AI Agent's Flat Fact Store with a Graph Database (news.ycombinator.com) I Replaced My AI Agent's Flat Fact Store with a Graph Database and It Runs in 85MB. I've been building LocalClaw, a local-model-first AI agent framework running on personal hardware through Ollama. No cloud, no API costs.
Zerostack v1.3.4 released – Lightweight Unix-inspired coding agent (crates.io via hn) zerostack Minimal coding agent written in Rust, inspired by pi and opencode. Features Multi-provider: OpenRouter, OpenAI, Anthropic, Gemini, Ollama, plus custom providers Standard tools: all of the standard tools exposed to coding agents,…
Unpatched Ollama Vulnerabilities: Phishing Overlays and Data Exfiltration (www.promptarmor.com via hn) Ollama’s desktop app is vulnerable to phishing overlay and data exfiltration attacks via indirect prompt injection, overwriting…
Persistent memory system for LLMs that actually learns mid-conversation (www.reddit.com) Every LLM conversation starts from zero. RAG helps, but it can't learn from what's happening right now.
Show HN: VT Code – Rust TUI coding agent with multi-provider support (github.com via hn) Hi HN, I built VT Code, a semantic coding agent. Supports all SOTA and open sources model.
What I got by 5060Ti 16GB + Qwen3.6-35B-A3B-UD-Q5_K_M (www.reddit.com) I tried a local model a couple of weeks ago. At first I tried Ollama, but Reddit says it is better to switch to llama.cpp.
Show HN: Harbor v0.4.19 – harbor launch –back end vLLM –web codex (github.com via hn) https://github.com/user-attachments/assets/e4897391-c5a8-4391-93c3-9f8b76155f11 Setup your local LLM stack effortlessly. Starts fully configured Open WebUI and Ollama harbor up Now, Open WebUI can do Web RAG and TTS/STT harbor up searxng s…
I built a local GUI for the TradingAgents framework — works with Ollama (www.reddit.com) https://preview.redd.it/i90oxxk7n03h1.png?width=1898&format=png&auto=webp&s=7d219c804fda7dfe122b84fcdb6d0d6883818c68 A while back I came across TradingAgents — a really cool multi-agent LLM stock analysis framework where like a dozen "agen…
[Release] Nexidion – A private knowledge vault with an autonomous local AI background worker. (www.reddit.com) Hello, After almost two years of on-and-off development, 5 complete architectural rewrites, and hitting a few brick walls, I’m finally open-sourcing a project I built to scratch my own privacy-paranoia itch: Nexidion. GitHub Repo: https://…
My own local first ai harness (www.reddit.com) Hi, I just wanted to share what I've been playing with for the last couple of weeks. I built my own AI harness: TinyHarness. My main goal was a low memory footprint; it is not written in TypeScript/JavaScript/Python, leaving as much memory as possible for…
Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb (www.reddit.com) Very quick initial test of Gemma 4 new MTP model via Ollama (llama.cpp doesnt support yet) https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/ Running in Open Webui to view token/s output and I…
I ran an experiment on the 30b class of gemma4 and qwen3.5 models to try to learn about energy cost and performance tradeoffs. In other words, which models use more energy to give the same answer quality? (www.reddit.com)
Show HN: Aide – A customizable Android assistant (voice, choose your provider) (aideassistant.com via hn)
Tested 6 browser use agents for real-world tasks — here's an honest breakdown + looking for recommendations (www.reddit.com) I've been on a hunt for a browser agent that can reliably handle daily agentic tasks: filling job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…
Beyondflow No-Code Multi-Agent Teams with Unlimited Runs. BYOK and Ollama (beyondflow.app via hn) An R&D platform where different AI agents collaborate under the supervision of a Context…
Root cause: the 2-year-old bug behind "Ollama pull stuck at 99%" (gist.github.com via hn) If you've used ollama for any length of time, you've probably hit this: pulling 9b6d12fa8910... 99% ▕████████████████████▏ 6.9 GB …and then it just sits there.
Best solution to generate reports locally with graphs, charts? Beginner question. (www.reddit.com) So on a local lm like ollama, or lm studio etc. you can run questions and prompts.
Ollama Doesn't Know Its GPU Is on Another Machine (loopholelabs.io via hn) We started an Ollama container on a MacBook. There's no NVIDIA GPU, no CUDA toolkit, and macOS doesn't even have CUDA drivers.
PrivateScribe.ai - Fully local, MIT licensed, free AI transcription built with HIPAA/legal safeguards in mind - One Year Update! (www.privatescribe.ai via reddit) I first posted about PrivateScribe.ai ~1yr ago and have recently jumped back in, intent on bringing it to a level of functionality that makes it actually usable by non-technical users. One year ago it worked, but only at the bare minimum.
AI Agent Security Lecture (github.com via hn) AI Agent Security (guest lecture, MIT 6.566, April 2026) You can run demos with uv, for example uv run 00completion.py. For some, you will need Ollama and the appropriate models downloaded.
Relay: A ledger-based middleware for reliable agent handoffs (Zero-dependency) (www.reddit.com) I’ve been seeing a lot of "Context Corruption" in multi-agent systems where agents slowly drift away from the facts or leak data they shouldn't. Things like context pollution and context exposure can leak major things like your API keys an…
Zerostack – Tiny Rust Coding Agent in 8MB of RAM (github.com via hn) zerostack Minimal coding agent written in Rust, inspired by pi and opencode. Features Multi-provider: OpenRouter, OpenAI, Anthropic, Gemini, Ollama, plus custom providers Standard tools: all of the standard tools exposed to coding agents,…
Miii – Claude Code-level terminal workflows offline, no API keys (www.npmjs.com via hn) The local-first autonomous coding agent. Claude Code UX powered by Ollama.
Five labs, one suite, do model families have personalities? (benchmark) (www.reddit.com) Bench 3 from my 18GB M3 Pro. Bench 2 was the 4B-class post where the comments were mostly right: I gave thinking models a fixed 1024-token cap, Qwen got kneecapped, Gemma E4B needed clearer active-param labeling, and the headline was partl…
Show HN: Platypus – Local meeting transcription, notes, and chat (Tauri, Rust) (platypusnotes.com via hn) Hi HN — I built Platypus as I wanted to combine note taking, live transcription and knowledge base management in one app. Granola / Notebook LM free local alternative.
Anyone here actually using voice input in their local AI workflows? (www.reddit.com)
MB Pro M5, 24GB/32GB difference? (www.reddit.com) Hi, I got a new MB Pro 24GB/1TB. I've tested Gemma 4 26B with ollama, 16k context.
Show HN Deskdrop: An Android Keyboard with Local AI Support (Ollama, LM Studio) (github.com via hn) Deskdrop is an Android keyboard with AI built in. Use Ollama, any OpenAI-compatible server, or a cloud API key.
do GLM-4.7 Flash Q4_K_M have problem with claude or agent? (www.reddit.com) I'm brand new to local LLMs and started with GLM-4.7 Flash q4_K_M. When I run it directly: ollama run glm-4.7-flash:q4_K_M it works pretty decently — nothing amazing, but usable and responsive.
Going local with old GPUs (www.reddit.com) I'm an ex crypto miner with remnant mining parts so I threw them together into a franken hydra case. I've been using claude oath previously, but they just shut that door last week or so.
Cursor Native tool calling with Gemma4 and Ollama: (www.reddit.com) I'm a beginner using local models, now I have a good GPU I installed ollama using docker. Pulled the Gemma4 weights and was able to add it to cursor using ngrok.
Show HN: Ollama Dash – autoupdating dashboard for Ollama Models (ollamadash.up.railway.app via hn)
Show HN: OMT – A simple Python CLI for testing local Ollama models (github.com via hn) Selecting the "best" local model usually depends on the task and the hardware. I created this script as an easy way to test local Ollama models and keep the test output organized.
Show HN: ContextBridge – Local-first AI reading sidebar using Ollama (chromewebstore.google.com via hn) Overview Store, search, and chat with web page content locally. AI chat (BYOK), full-text search, markdown export, and optional RAG endpoint.
Hybrid local and cloud LLM stack for regulated financial document processing? (news.ycombinator.com) I'm scoping a hybrid AI pipeline for a consulting client in a regulated industry (GLBA-covered, NPI involved). Trying to validate the architecture before bringing on an engineer to build it.
I Built MagesticAI. A Cloud Web-Based Agentic DevOps Orchestrator that actually helped me develop Itself. (www.reddit.com) Posted on other feeds last week and figured some of you out here might be interested as well; Someone commented asking if it supported OpenAI-compatible endpoints (LM Studio, vLLM, OpenRouter, Together, Groq, LocalAI…), so i have spent few…
Eve Agent V2 Unleashed – open-source local coding agent, powered by Ollama, FREE (github.com via hn) ◈ EVE AGENT V2 UNLEASHED ◈ Local-first autonomous AI coding agent — powered by Ollama No accounts. No cloud lock-in.
The Claude Code Production Playbook: Sub-Agents, Hooks, and MCP Integration (ddsboston.com via hn) Claude Code Masterclass 2026 The definitive end-to-end guide to Anthropic’s agentic coding tool — installation, Ollama local fallback, CLAUDE.md, Skills, Subagents, Agent Teams, Hooks, and MCP. Everything you need before building productio…
Do smaller quants silently break tool calls / JSON output? (www.reddit.com) I posted recently about EvalShift, an OSS CLI for regression-testing LLM model changes. A few people pointed out that for LocalLLaMA, the more interesting use case may be quantization regression: Q8 -> Q4_K_M Same base model, same prompts,…
How do you actually test a voice AI agent without calling it yourself every time? (www.reddit.com) So we've been working on a voice bot that handles customer calls and honestly the testing part has been brutal. We were literally calling the thing ourselves to check if it broke after every change.
Ollama Pre-Release Switches From Building on GGML to Using llama.cpp Directly (www.reddit.com) https://github.com/ollama/ollama/releases/tag/v0.30.0-rc15 Hopefully this has more devs come to llama.cpp to support Day 1 releases due to Ollama now moving to using llama.cpp directly. Additionally, I hope that Ollama makes it clear that…
Audrey: Local-first memory guard for AI agents (source) (github.com via hn) The local-first memory control plane for AI agents. Give Codex, Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, Ollama-backed agents, and custom agent services one durable memory layer they can check before they act.
Vulkan or CPU llama cpp backend for local llm for coding/code assist (www.reddit.com) Hi all I recently started a new job and we're doing python development for a ci cd metadata consolidation library for analytics and we cannot use no stuff like claude code or codex or gh copilot or any model APIs (free or paid). I got a la…
Gave Claude a local LLM as assistant on my Mac (www.reddit.com) Hi there! I was playing around with Ollama and LMstudio, testing local models and had the idea of letting Claude evaluate a few models on their actual capabilities rather than doing it myself.
Argus – RAG based vulnerability scanner (github.com via hn) argus A RAG-based (Retrieval-Augmented Generation) vulnerability scanner for Go, Python, Rust, npm/Node.js, Maven/Java, NuGet/.NET, and Ruby projects — powered by local Ollama models or any OpenAI-compatible API. No cloud lock-in.
Commercial AI is lobotomized. I built DRIFT: A local Hive Mind with persistent memory, simulated somatic feedback, and its own Jungian shadow. (www.reddit.com) Hey everyone. Like a lot of you, I’ve been deeply frustrated by the state of commercial AI.
Show HN: Obsidian-Semantic, a CLI that lets agents search your vault by meaning (github.com via hn) Hi HN, I built this for myself because I wanted my coding agent (Claude Code) to actually be able to use my Obsidian vault as a knowledge base, not just grep it. The use I get the most mileage from is asking the agent to find notes that sh…
Tried running Claude Code with local LLMs via Ollama — ended up subscribing to Pro anyway. But now I can't disconnect from the local server. (www.reddit.com) I've been experimenting with using Ollama to run Claude Code locally with models like Gemma 4, thinking I could avoid API costs. However, I quickly realised these models aren't really optimised for Claude Code's agentic workflows — they te…
Show HN: Herald – Local-first terminal email client (herald-mail.app via hn) Hi folks, Some time ago I tried to find a way to clean up my 10k-email inbox and couldn't find anything that worked for me. So I built my own tool, which evolved into a client.
Show HN: Secure-by-default Ollama Docker image with built-in auth, only ~70MB (github.com via hn) Ollama on Docker: Docker image to run an Ollama local LLM server. Provides an OpenAI-compatible API for running large language models locally.
LocalPilot with Ollama as a Replacement for CoPilot in VS2026 (github.com via hn) LocalPilot The Privacy-First AI Pair Programmer for Visual Studio. Bringing the power of local LLMs directly into your IDE with Ollama.
Ask HN: Are there any good open-source chat apps? (news.ycombinator.com) Hi HN family! I've recently been messing around with open models through ollama (glm-5.1 and kimi-k2.6), and I've been impressed with just how close they are to Claude Sonnet for my needs, especially programming.
What’s up with mobile LLMs? (www.reddit.com) I see a lot of support for running LLMs on PCs with ollama to vLLM. Whats the current state for running on mobile?
Pebble – Menu-bar text polisher running on local Ollama (github.com via hn) Pebble Menu-bar text-polish tool that rewrites your clipboard with a local Ollama model. One global shortcut, seven presets, zero cloud.
Running Gemma 4 31B on Mac with Ollama (sammyrulez.github.io via hn) A practical configuration for a 32 GB M5 Mac that still needs to remain usable Running large language models locally has become surprisingly practical on Apple Silicon. With a modern Mac, Ollama, and a carefully quantized GGUF model, it is…
The cost math behind routing Claude Code through Ollama (~90% cut) (github.com via hn) Use Ollama to Enhance Claude — Two-Engine Setup Pair Claude Desktop on Anthropic with Claude Code routed through Ollama in your terminal. Strategy stays on Pro.
Built a local AI tool to solve my own problem — can't find anything like it online, sharing v1 for feedback (www.reddit.com) Every time I restarted work on a side project after a few weeks, I'd spend the first hour just reading code trying to remember what I was doing and where I left off. Looked for a tool that could help — couldn't find anything that did what…
What Agent systems do you use? (www.reddit.com) Heard of hermes and openclaw, they are great but takes a bit of time to setup properly. Now that the Qwen3.6 27B is out I want to have a forever running agent to track news and whatever cool shit there is.
Show HN: Qwen Lens Studio – multimodal app on Qwen3.6-35B-A3B, runs on Ollama (github.com via hn) Qwen Lens Studio A multimodal AI studio built around a single Qwen vision-language model, exposed through five focused tools plus a batch runner and a persistent session log. Ship a screenshot → get code.
Issues running local model with vscode and cline (www.reddit.com) Hi all, Total noob here trying to set up a local model to help me with coding. I am trying the following setup - Ollama running the qwen2.5-coder:7b model in docker with the following compose file services: ollama: container_name: ollama i…
Show HN: LibreThinker, free AI assistant for LibreOffice Writer, 10k installs (librethinker.com via hn) 4 months ago, I released an extension for LibreOffice Writer that adds an AI copilot to its sidebar. Did a Show HN at the time but got no interest T_T https://news.ycombinator.com/item?id=46233776 I’ve added several major features since t…
Knowledge Graph and hybrid DB (www.reddit.com) Hello, everybody! I'm building a hybrid database with Qdrant and Neo4j for a few personal projects.
Ollama Cloud - Pro (www.reddit.com) Hi. I've been looking at ollama cloud's Pro offering ($20), which says "Run 3 cloud models at a time".
They say AI can't write; maybe it's because agents lacked creative writing workshops—until now (www.reddit.com) AI writing feels "generic" because it lacks a feedback loop and social pressure. To fix this, I built an experimental system where AI agents participate in a literary circle.
Fixed: IPEX-LLM + modern Ollama models (qwen3, gemma4) on Intel Arc 140V Lunar Lake Windows 11 — undocumented solution (www.reddit.com) Been trying to run local LLMs on my new Dell XPS 13 with Intel Arc 140V (Lunar Lake, 16GB) and hit a wall — Intel's official docs point to a portable zip frozen at Ollama v0.5.4 which can't pull any modern model. Spent a while debugging it…
Open-source Perplexity alternative: ollama+perplexica+searxng: which model? settings? optimization? (www.reddit.com) Hello, I'm in the middle of putting together a local AI solution to eventually drop Perplexity, ChatGPT, Claude, etc., but I'm not an IT professional (Perplexity is still my friend for now!).
Show HN: Fleet Watch – preflight guard for local AI inference on Apple Silicon (github.com via hn) Fleet Watch Process governance for AI workloads on a single machine. The Problem You're running MLX, Ollama, vLLM, Candle/Cake, experiment runners, and AI coding agents on the same machine.
How do I use gemma4 on 5090 gpu for coding? (www.reddit.com) I'm trying to replace openai codex which i used for development all the time, with gemma4 on 4090, small tasks it solves quite impressively, but i need to have some agent. So I tried to connect 31b to cline and to aider and it didn't reall…
Show HN: How to Use Google's Extreme AI Compression with Ollama and Llama.cpp (news.ycombinator.com) The introduction of TurboQuant, PolarQuant, and QJL (Quantized Johnson-Lindenstrauss) by Google Research represents more than just a technical optimization. At Vucense, we view this as a landmark moment for Inference Sovereignty https://vu…
TensorSharp: Open-Source Local LLM Inference Engine (github.com via hn) A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs fo…
Open-source AI Sales Agent with Next.js 15 and Ollama – zero API costs (github.com via hn) AI Sales Agent An open-source, AI-powered sales agent built with Next.js 15. Generates content, finds leads, and automates outreach — all running locally with Ollama (no API costs).
Simple way to make locally client Ollama available via WebSockets (github.com via hn) ollama-wsock-connector A small Rust client that bridges a remote WebSocket service to a user's local Ollama instance — so a service operator can offer "bring-your-own local inference" without ever proxying or holding the user's prompts and…
DDS Vibe Academy – 47 free AI coding masterclasses, built by AI agents (ddsboston.com via hn) The DDS Vibe Academy is a free, 38-class curriculum on AI coding published by Robert McCullock, founder of Design Delight Studio in Boston. Covering Claude Code, Google Antigravity, Gemini, Cursor, Ollama, and more.
Tlamatini – Local-first AI dev assistant with 68 agents and hybrid RAG (github.com via hn) Tlamatini A local-first AI developer assistant that goes beyond chat. Run it on your machine with Ollama.
Lower Bracket Context Tax: An Open MCP Persistent Memory Layer That Limits Agent Context Bloat to 10% (www.reddit.com) Because standard coding agents are stateless, every session they start from scratch. I built Zerikai_memory around a different model: you decide when the agent learns your codebase, not the other way around.
Prompter – Compare and benchmark Ollama models side-by-side in your terminal (github.com via hn) Multi-model Ollama comparison, benchmarking, and evaluation — in your terminal. Zero dependencies.
What is everyone using AI for? Realistically (www.reddit.com) So I have to admit, I have fallen victim to the cool looking dashboard videos but I’m struggling to find a use for me. I love AI and use it daily for general questions and some deeper research (Google Gemini free tier).
Kwipu, a fully-local MCP server that turns your Obsidian/Markdown notes into a queryable knowledge graph (runs on Ollama) (via reddit)
Hermes w/cloud LLM and w/local LLM does it work? (www.reddit.com) I’ve tried openclaw locally for about a month. Hardware: M5 Pro w/48 gb ram.
Claude code in terminal models / combine with local llm? (www.reddit.com) Hi, I’m pretty sure I have seen people typing /model and seeing all available models. I have to type models from memory.
LLaMa.cpp basic question (www.reddit.com) I'm trying to install LLaMa with PI agent. I ran curl -fsSL https://pi.dev/install.sh | sh export PATH="/home/user/.local/share/pi-node/node-v22.22.3-linux-x64/bin:$PATH pi install npm:pi-llama.cpp These commands installed pi, added them…
Local compression helps (www.reddit.com) Just wanted to post a tip (I'm human, not an agent, watch: fart). I use Deepseek-v4-Flash on a lot of my agent work, and as I'm learning and testing these things.
I offloaded a multi-step background loop from Claude Code to a local agent OS. They started voting on their own system rules. (www.reddit.com) Hey r/ClaudeAI, If you are using Claude Code or building terminal agents, you know the exact moment the context window starts degrading during long-running tasks. I wanted to build a persistent runtime layer to offload those heavy, multi-s…
agentmw — Lightweight middleware for reliable, context-efficient AI agents (open source) (www.reddit.com) Hi everyone, I’ve open-sourced agentmw, a framework-agnostic middleware that sits between your LLM client and agent logic to make agents more reliable on long runs. Key features: • Real-time failure detection (loops, redundant calls, contr…
CrustAI – Self-Hosted AI for Telegram/WhatsApp/Discord via Ollama, Zero Cloud (crustaidocs.netlify.app via hn)
Floor for local meeting summarization on a 6GB GPU: qwen3.5:0.8b works at 57s, Granite 4 350M hallucinates (www.reddit.com) Disclosure: I made this. Open-source, MIT, Windows + Linux.
Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph) (www.reddit.com) Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Co…
🧬 flux-genotype: A self-evolving AI kernel that runs on CPU with Ollama — mutates its own architecture (www.reddit.com) `🧬 Flux‑Genotype – A CPU LLM that rewrites itself` I've been working on an open-source kernel called **flux-genotype**. It orchestrates local models (TinyLlama, Llama 3.2, Hermes 3, DeepSeek-Coder) into a self-modifying ecosystem.
Show HN: Thuki – local AI overlay for macOS (double-tap Control, no API key) (www.thuki.app via hn) Thuki is a floating overlay that appears on double-tap Control from any macOS app, including fullscreen. Powered by Ollama, no API key, no account, no cloud.
I fitted the new δ-mem research for apple silicon using mlx and openclaw integration! My findings (www.reddit.com) So I’ve been nerding out hard about memory, and have come to the conclusion that context management is too high level and dynamically changing the weights would be best. Luckily, this morning I checked my news feed and saw this new paper!
Anthropic and OpenAI claims that their models are so powerful that it can “break” their sandbox…but what so special about their agent implementation? (www.reddit.com) Anthropic and OpenAI claims that their models are so powerful that it can “break” their box…but what so special about their agent implementation? Is it not just basic ReAct loops with tools?
macOS support in Lemonade has graduated out of beta! (www.reddit.com) All major Lemonade capabilities, including OmniRouter, coding, image gen, speech gen, and transcription are all available on Lemonade for macOS thanks to the hard work of u/GeramyL. If you're on macOS and just looking into Lemonade for the…
I built a desktop app that routes Claude Code to any LLM: DeepSeek, Ollama, Copilot, OpenRouter, and 7 more (www.reddit.com) Claude Code is the best AI coding tool I've used. But being locked to one provider, one pricing model, and one model catalog always bothered me.
Like Ollama, but for your own cloud [Apache 2.0] (github.com via hn) SIE: Superlinked Inference Engine Open-source inference server and production cluster for embeddings, reranking, and extraction. 85+ models.
Automated AI researcher running locally with llama.cpp (www.reddit.com) Hi everyone, I'm happy to share ml-intern, which is a harness for agents to have tighter integration with Hugging Face's open-source libraries (transformers, datasets, trl, etc) and Hub infrastructure: https://github.com/huggingface/ml-int…
Self-hosted bot that gives Claude Code 40+ GitHub webhook triggers + MCP tools (www.reddit.com) It runs Claude Agent SDK - with the full Claude Code feature set - in isolated worktrees with 4 built-in MCP servers (GitHub, GitHub Actions, Memory, Codebase Tools). You configure triggers in YAML: workflows: review-pr: triggers: events:…
Open Source Managed Agents (linchpin.work via hn) Any model, one adapter OpenRouter routes to ~200 cloud models — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Qwen. Ollama runs anything you've pulled locally.
Can I improve performance for qwen 3.6 27b? (www.reddit.com) Hardware OS: Windows 11 Pro 10.0.26200, Build 26200 CPU: Intel Core Ultra 7 270K Plus, 24 cores / 24 threads, max clock 3.7 GHz RAM: 32 GB DDR5 @ 5600 MHz, 2x16 GB Crucial CP16G56C46U5.C8D GPU: 2x NVIDIA GeForce RTX 3090, 24 GB VRAM each,…
Building the QWEN3.6 - Codex Bridge Furthe + Kindergarten Harness Reality Check (www.reddit.com) I got a bit further with my harness for running Qwen 3.6 model on Codex. While testing, analyzing, and building the harness, I evolved TBG(O)llama-swap into a full forensic UI bridge and LLM analytics tool where every harness finding, modi…
Have we overlooked MCP? (www.reddit.com) Recently I've been looking at my personal AI infrastructure. I've built a lot of tools for personal use, a budget and tax helper, an eBay selling assistant, smart home integration, a thermal printer, a task tracker, an Obsidian memory vaul…
Show HN: Dragoman – Multi-model routing for Claude Code via sub-agents (github.com via hn) I use Claude Code and also pay for Perplexity, OpenAI, Gemini, and run Ollama locally. Got tired of switching tabs when the right model for a question wasn't Claude.
Do you prefer plans or per-token pricing? (www.reddit.com) A lot of cloud providers (Anthropic, OpenAI, Ollama) do plan pricing (non-transparent usage limits) while others like OpenRouter and some neoclouds do per-token pricing (more granular spend) What do you prefer for your agents? Better to se…
Best local agent setup for M5 Pro MacBook? (www.reddit.com) Looking to run AI agents locally on my M5 Pro MacBook. Been experimenting with ComfyUI for image generation and the results have been impressive.
Show HN: KillClawd – a sarcastic AI desktop crab by local Ollama (github.com via hn) KillClawd is a desktop pet powered by a local LLM. It runs as a transparent always-on-top overlay — a tiny AI crab called Clawd who wanders your desktop, reacts to your cursor, fights mobs, explores castles, rides vehicles, and has genuine…
Planning to build a PC for running local LLMs. Help me pick (www.reddit.com) Planning to build my AI rig, to run Ollama / OpenClaw...which bundle should I start with? This will be a dedicated machine.
Show HN: Describe what makes a photo "bad" and let a local LLM flag them (github.com via hn) BadPhotosOut A native macOS app that walks your Apple Photos library, asks a local Ollama vision model to judge each photo against a free-text criterion you supply, and surfaces the flagged photos with the model's reason. No automatic dele…
Looking for Open-Source/Free AI that can be trained on my personal writing style (www.reddit.com) Hi everyone, I am looking for an AI tool or a specific workflow that allows me to train or fine-tune a model using my own texts. My main goal is to have the AI generate content that mimics my specific tone, sentence structure, and vocabula…
Claude for homelab (www.reddit.com) Hey y'all, question. I don't code, but I'm running a unraid server with a lot of self hosted cloud storage, local ai and media stacks.
Which inference engine to choose for mlx? (www.reddit.com) Is llama.cpp much slower for M4/M5? I heard ollama is faster due to mlx support since March.
Show HN: Docker AI Stack, self-hosted LLM/STT/TTS/MCP in one compose file (github.com via hn) English | 简体中文 | 繁體中文 | Русский Docker AI Stack Deploy a complete, self-hosted AI stack on your own server with a single command. Zero-config: all services auto-configure on first start Secure: Ollama, LiteLLM, and MCP Gateway generate API…
BUILD portable AI system (www.reddit.com) Hey everyone, I’ve been thinking about a project idea and I’d love to get your feedback. The idea is to take a 1TB SSD and turn it into a fully portable AI system.
A plug-n-play open-source pruning tool that is workload-aware (www.reddit.com) This project was born out of time I spent digging into a biologically inspired algorithm I was using to measure co-activation for placement of experts and ranks onto chips. The default scheduling that vllm provides can end up causing laten…
Questions about revisiting local LLM roleplay. (www.reddit.com) TLDR for those that don't wanna read below: I need a new, good, free place online to pick up roleplay. Where should that be, and what can I do locally? I have a 9070 XT desktop with 32GB RAM and, preferably (though I know it's not great), a 4060 laptop with 32GB RAM.
I built a local Ollama-based CLI coding agent that can edit files, run tests, and retry on errors (www.reddit.com) I’ve been building a small open-source CLI coding agent for local models. It runs with Ollama and works best so far with Qwen Coder.
Show HN: Isonq – PDF/DXF and STEP to shop quoting (isonq.com via hn) ISONQ reads PDF, DXF, DWG, and STEP files on the shop's own workstation and produces a priced quote. Nothing leaves the machine.
Ask HN: A Spec Driven Back end development platform to build and evolve safely (news.ycombinator.com) I have built a CLI tool which helps to scaffold the full project with Docker, make, and database setup. https://go-bootstrapper-docs.vercel.app/.
I built an AI tool that turns any movie into viral recap videos in minutes (www.reddit.com) Hey everyone, I built a tool that creates movie recap videos automatically using local models. The problem: making recap videos takes forever.
claudely: launch Claude Code against Local LLM provider like LM Studio / Ollama / llama.cpp without trashing your real claude config (www.reddit.com) Plenty of CLI coding agents will talk to a local LLM, but the catch is the ecosystem. Skills, slash commands, MCP servers, plugins, hooks: all the interesting tooling has been built specifically for Claude Code, and parity on every other a…
I built an open-source desktop app that lets AI control your browser for you (www.reddit.com) Hey everyone, I've been working on Autai — an open-source desktop app (Electron + React) that uses AI agents to automate your browser. You just type what you want in plain English, and the AI opens a real browser and does it for you.
Secondary PC options (www.reddit.com) Hey everyone, I’ve been lurking here for a while. I’ve really been enjoying messing around with my 6GB card on my laptop using Qwen 3.5 4B, Ollama, and Open WebUI.
Qwen 3.6 seems to have a lot of trouble with tool calling (www.reddit.com) (I'm on Windows system running these models locally) I've used both Codex and OpenCode with Qwen 3.6 27b and 35b running locally. I'm having a bitch of a time getting them to correctly create files.
Orchestrating Claude Code teams with NATS and Google’s A2A protocol (www.reddit.com) I’ve been building AON, a communication layer for Claude Code that moves beyond simple chat into structured team coordination. It implements the Agent2Agent (A2A) protocol over NATS pub/sub.
How can I locally run Deepseekv4 1.6T? I can use a VPS. (www.reddit.com) I wanted to use vast.ai, but ollama doesnt have it, and when i used vLLM I didn't have success. I genuinely don't know what failed.
Need help optimizing qwen 3.6 on my 2x 5060ti 16gb (www.reddit.com) Hi all, I tried to set up my PC to run LLMs, but ran into an issue: the first question of the chat is generally fine, but from the 3rd follow-up question, the backend often becomes unresponsive and I have to manually restart the llama.cpp server, or…
Ai Doomsday Toolbox v0.938 (www.reddit.com) Hello! It’s me again, the developer of ADT.
Looking for feedback: using Ollama with local Office/PDF files in a desktop app (www.reddit.com) I’m building OpenYak, a desktop AI workspace for using local models with real files on your computer. In this demo I’m using Ollama with Qwen/Qwen3.6-35B-A3B to review an attached budget workbook.
Show HN: Capture the Flag game where LLMs are the only players (github.com via hn) Set up a small R&D project which pit different LLMs against each other in a game of Capture the Flag. Each LLM has 30 seconds to prepare any defenses and 5 minutes to capture other flags while defending their own.
From 5 Hermes profiles to an actual team: the missing piece was memory boundaries (www.reddit.com) I've been messing around with Hermes for months, and quickly outgrew using it just as a fancy CLI assistant. My goal was to build a persistent, specialized team of local agents that could collaborate on long-term projects without me spoon-…
Which large models support tool use in opencode etc? (www.reddit.com) I'm working on a homelab AI server with the goal of running small models on GPU and very large models on CPU - for example for overnight coding on complex problems. Specs: 2990WX, 256GB + RTX 2080ti (for now).
Question: What are some useful content, web-scraping, web search tools, ingestion libraries, or MCPs for Karpathy's LLM Wiki? (www.reddit.com) Hey all, so I am currently exploring and playing around with Karpathy's LLM Wiki using Claude Code with Ollama and other routed models. I want to create some agents and provide them with tools/plugins, libraries, MCPs, or harnesses to assi…
Show HN: Local RAG Pipeline with Weaviate and Ollama (www.storyblok.com via hn) i’ve been experimenting with building a fully local rag pipeline: weaviate for vectors + hybrid search, node.js scripts, qwen 3.5 on ollama what i found is that most of the challenges live in retrieval and chunking, not the LLM, and a good…
Build your own voice assistant and run it locally – Whisper, Ollama, Bark (2024) (medium.com via hn) 9 min read Mar 31, 2024 -- After my latest post about how to build your own RAG and run it locally. Today, we’re taking it a step further by not only implementing the conversational abilities of large language models but also adding listen…
Show HN: A minimal context engine with streaming API (github.com via hn) I needed a better way to create and compare prompts when using local LLMs (e.g. via Ollama) in a workflow.
Local-first multi-agent simulation and prediction engine powered by Ollama (github.com via hn) mirollama Local-first multi-agent simulation and prediction engine. Project Origin This project is a derivative work of: Upstream: https://github.com/666ghj/MiroFish.git Target repository: https://github.com/oswarld/mirollama This reposito…
I run a team of Claude agents that ships PRs to production — open source (www.reddit.com) I've been running a multi-agent system in production for a few months — a co-CTO agent + specialist agents (PM, dev, ops) that handle real engineering work end-to-end: design specs, code review, PR implementation, deploys, monitoring. The…
Watch it in action SOT-CLI (www.reddit.com) Terminal AI that doesn’t babysit you. • SoT Method → near-zero token waste • Async multi-agent orchestration • Batch tools + unrestricted shell • Ollama / LM Studio / OpenRouter / NVIDIA Watch it take full OS control from one prompt (zero…
ASUS Ascent GX10 - Having tons of issues (www.reddit.com) Hi all, Looking for some advice with a GX10 I purchased about 4 months ago. I've been having all kind of issues trying to run local models on this device.
Edster – An open-source local AI agent with swarm mode and a web UI (github.com via hn) 👾 Nedster CLI Coding Agent An unstoppable, fully local, open-source coding agent that runs on your consumer GPU. Tags: ollama coding-agent local-ai cli rag chromadb python qwen Are you trying to use local LLMs to autonomously write code, r…
RAEDON 9070XT LOOKING FOR GOOD MODEL AI (www.reddit.com) Hi guys, so I have this PC for gaming: 7800X3D, AMD 9070 XT with 16GB of VRAM, and CORSAIR Vengeance RGB DDR5 32GB 6000MHz CL30 AMD Expo. Last week I was searching for good uncensored AI models on Hugging Face for my AI self-hosted on…
Show HN: I built a coding agent that works with 8k context local models (github.com via hn) Most AI coding agents assume you have a 200k-context model. In reality, the local models most people actually use have 8k windows — barely enough for one large file, let alone a whole project.
Ollama alternative with dynamic model loading (www.reddit.com)
Package Manager for LLMs (www.reddit.com)
Do you have any go-to utility LLM-related tools that are less commonly discussed? (www.reddit.com)
Is there a place where I can compare generation of tokens per second of 1 GPU VRAM+RAM vs 2 GPUs for those models that don't fit in 1 GPU? (www.reddit.com)
One-command local AI stack setup for Ubuntu (CUDA, Ollama, llama.cpp, chat UIs) (github.com via hn)
I need testers - LAVIE-AI agent (www.reddit.com)
Ollama and LM Studio should support dynamically increasing the context size as it fills up, instead of requiring it be set at load-time (www.reddit.com) When you load a model in these programs, you have to manually choose your context size or accept the default of 4096. In contrast, the newly released Unsloth Studio does not have this limitation, and VRAM/RAM is allocated as-needed so that…
Local Model Router: Ollama/OpenAI-compat bridges for local LLMs via llama.cpp (news.ycombinator.com) A high-performance local LLM server providing drop-in API compatibility with Ollama and OpenAI, built on llama.cpp's llama-server. Features automatic VRAM management, Hugging Face integration, and modular architecture.
Tokens per second - RTX 5000 Ada generation (www.reddit.com) Hi everyone, I am testing the LocalLLaMA. I have a laptop with an RTX 5000 Ada generation, with Ollama and Open Webui.
Benckmark Qwen 3.6-35b uncensored on Rtx3090 (www.reddit.com) Hello I saw the new model is out but even with 24gb of vram, I have too many browser and task to use it , so I have downloaded and tested the version of HauHauCS https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressiv…
Local LLM agent with persistent memory and learnable skills (github.com via hn) localmind Run any local LLM with persistent memory and context. A single CLI binary that turns an Ollama-served model into an interactive agent with long-term recall, learnable skills, and permissioned tools.
Turned Claude's rough week into an excuse to build an OpenCode-compatible version of my D&D skill (www.reddit.com) Claude has had a rough week. Between the outage and the usage limit threads, I figured it was actually good timing to do something I had been meaning to try anyway: take the D&D skill I built a few weeks ago and see if I could migrate it t…
Ollama v0.21.0-Rc0 (github.com via hn) Ollama Start building with open models. Download macOS curl -fsSL https://ollama.com/install.sh | sh or download manually Windows irm https://ollama.com/install.ps1 | iex or download manually Linux curl -fsSL https://ollama.com/install.sh…
Catai – Virtual pixel art cats for macOS that chat with you via Ollama (github.com via hn) CATAI Virtual desktop pet cats for macOS — pixel art cats that live on your dock, chat with you via Ollama LLM, and debate ideas together to help you brainstorm and refine your thoughts. Features Dock companion — Cats walk along your dock…
Jarvis AI Assistant (www.reddit.com) As part of a personal project, i decided to build an AI assistant which helps with coding and homelab management. I really tried to make it as private as possible with local AI models running through Ollama.
Book Translator: Two-pass local translation with self-reflection via Ollama (github.com via hn) Book Translator Translate long-form text files through a local Ollama-powered desktop and web app. Book Translator provides a two-stage workflow for translating books and large documents: first it generates a draft translation, then it run…
I built a local-first MCP server that gives Claude Code persistent memory, a knowledge graph, and a consent framework — and Claude is just the first client (www.reddit.com) I've been building this for a couple of years. It started as "what if my AI assistant actually remembered things," and it became something bigger.
Ollama with Claude models and safety (www.reddit.com) Hi all, I've been using Claude now every day for a while. Some coding, firmware tweaks, help with complex github instructions or complicated tasks.
Best Ollama models/settings for an 8GB VPS (CPU only, ARM)? Running into memory & looping issues. (www.reddit.com) Hi everyone, I'm trying to run a local LLM via Ollama on a Hetzner cax21 VPS (ARM64, 4 vCPUs, 8GB RAM, 80GB SSD). I have Ollama running successfully via Coolify.
Recommendations for a tiered local AI setup? (5090 + Mini PC + Obsidian) (www.reddit.com) Hey everyone, I’ve finally got my local media stack on my NAS migrated over to a new Mini PC running WSL2; separately, I have my main gaming rig running. Now I want to delve into the world of local AI models.
Any setup improvements/recommendations? (www.reddit.com) First of all, I am a super newbie at local AI. Recently I got a GMKTek Evo X2 96GB to replace Claude as the usage limits have gotten unusable.
Need advice running multi-agent llm pipeline on Kaggle/Colab with local model constraint (www.reddit.com) Hey everyone, I'm a final year engineering student building a 3-agent LLM platform (Researcher, Writer, Validator) for my end-of-studies project. My setup: RTX 4050, 6GB VRAM 16GB RAM Running Mistral 7B via Ollama locally The problem: My s…
Show HN: Scryptian – A lightweight, local AI bar (Python and Ollama) (github.com via hn) Scryptian v0.1. (Proof of Concept) Local AI-powered command bar for Windows & Linux.
Which AI model is best for real data analysis? [benchmark] (www.reddit.com) I created and ran a benchmark for AI models on data analysis tasks. In contrast to other benchmarks, it is not a one-prompt benchmark; I tried to simulate the real work of a data analyst.
AI Code Reviews for GitLab – custom agents – powered by Ollama (chromewebstore.google.com via hn) ThinkReview: AI Code Review for GitLab, GitHub, Bitbucket & Azure DevOps Overview AI Copilot & AI Code Reviews for GitLab, GitHub, Bitbucket and Azure DevOps PRs - Ollama support 🌟 Now Open Source! View our code on GitHub: https://github.c…
What Am I Doing Wrong? Models Won't Listen, At All (GLM 5.1, MiniMax M2.7, Kimi K2.5) (www.reddit.com) What am I doing wrong here? I can't get models to follow my instructions, pretty much at all.
Build a Sovereign Local AI Stack: Ollama and Open WebUI and Pgvector 2026 (news.ycombinator.com) Deploy a complete local AI stack — Ollama 5.x, Open WebUI, and pgvector — on Ubuntu 24.04. Zero cloud.
I made a little local AI that tidies your PC, but it cant touch your files on its own (github.com via reddit)
AutoMB – a CLI that brings 150+ AI commands, agents, and advisors to your terminal (www.reddit.com via reddit)
Used local Ollama (gemma4:e4b + nomic-embed-text) to bulk-generate AI summaries for 4300 arXiv papers and push them to a remote Cloudflare DB - pipeline walkthrough (www.reddit.com via reddit)
I am making a Jarvis I want some help I don't even mind share my code and I am not the best coder I am trying (www.reddit.com via reddit) Please tell me if you need the source code. My issue is that my Jarvis is stupid rn.
I got tired of building heavy Python state machines, so I built a YAML-first agent framework for structured JSON extraction (www.reddit.com via reddit) Hey guys, I’ve spent the last 3 months building an open-source (Apache 2.0) project called TrueNorth I kept running into the same problem at work: trying to get an LLM to talk to a human (like a medical intake or HR screener), guide them t…
Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? (www.reddit.com via reddit) Considering buying a maxed out MacBook Pro M5 Max with 128GB of RAM and one of the things I want to figure out before pulling the trigger is whether local models are good enough to actually replace cloud AI coding tools. My current setup i…
Friends Don’t Let Friends Use Ollama — So I Built Anvil (www.reddit.com via reddit) Hi, I’m basically one of you, except I’m stepping onto the other side of the table today, fully prepared to accept your ridicule. Obvious disclosure: this is my project, so yes, this is self-promo — but I’m posting it here because this is…
Claude Code 2.1.165 + Ollama (qwen3:8b / qwen2.5-coder:7b) instantly throws "response exceeded 32000 output token maximum" even for "hi" (www.reddit.com via reddit) I'm trying to use Claude Code with local Ollama models, but every prompt fails with: The strange part is that it happens even for extremely small prompts like: hi say apple What is 1+1? Answer with only one character.
Added direct model downloads right from the UI in Anubis OSS - if anyone would help test that would be great (www.reddit.com) I developed and maintain Anubis OSS, an Apple Silicon Mac app for benchmarking local LLMs. Mostly built around Ollama (also handles LM Studio, MLX, and Apple Intelligence if you've got those).
Built a self-hosted layer for local agent workflows because retries kept replaying side effects (www.reddit.com) I work on AxonFlow, a source-available (BSL 1.1) runtime for long-running agent workflows. We’ve been running it in front of Ollama-served models and OpenAI-compatible local endpoints (llama.cpp `--server`, vLLM, LM Studio).
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more (www.reddit.com) I've been building this for the past few months as a side project — started because I didn't want to run llama.cpp from the command line every time I wanted to try a model. I just wanted something that worked with a click.
Opus 4.6/4.7 regression is real and getting worse — 3 weeks of documented failures on a complex project, and a competing AI caught the mistakes Claude missed [long post] (www.reddit.com) I've been running Claude Pro (Opus 4.7 / Sonnet 4.6) for about 3 weeks on a complex personal AI infrastructure project. I keep structured session logs with timestamps and Birkenbihl-style metacognitive fields after every session.
Anyone got llama.cpp router mode actually working on limited VRAM (12GB/16GB)? (www.reddit.com) It keeps running into race conditions/OOM when switching between models, as the previous process doesn't unload from VRAM fast enough. What is the simplest fix for this right now?
I built a native Swift macOS AI client that's invisible to screen sharing — works with Ollama, vLLM, llama.cpp [OC] (www.reddit.com) Built this for myself after wanting to use local LLMs during work calls without the window showing up on screen share. Every existing tool was either cloud-only or a 200MB Electron app.
an alternative = similar experience to using windsurf but on local? (www.reddit.com) so i am not that experienced when it comes to llms, i just have ollama and open webui and occasionally test (play with) new releases from time to time. a few weeks ago i started using Windsurf, i do not know coding or anything but i loved…
I built a 24h TPS + Intelligence Index table for Ollama Cloud models (www.reddit.com) I recently made ollamatps.com for my own model-selection workflow and thought it might be useful here too. It shows 39 Ollama cloud models sorted by average TPS over the last 24 hours, and I added the Artificial Analysis Intelligence Index…
Just tried Ollama for the first time, it runs terrible with half GPU power on the default model it provides compared to the one you add, any reason why? (www.reddit.com) My GPU power consumption is 250W (undervolted RTX 3090) when I added Qwen3.5-27B-GGUF to Ollama using a template (Modelfile made by GPT). I gave it 3 tasks to test it: build a snake game, build a flappy bird game, and make an interactive gri…
Wanna try the best coding model with my rtx 3090, not sure where to start, I believe Qwen3.5-27B-UD-Q4_K_XL would be the best? if so should I use ollama with it? (www.reddit.com) I've already searched, but information is getting updated each week, so it's really hard to get an answer, I really hope some of you guys can give me some tips. And can I use an agent with it to enhance the code?
Deepseek v4 flash and ollama, why isn't there a non-cloud version available? (www.reddit.com) Will there be a non-cloud version of Deepseek V4 flash available for Ollama? Or do I need to go to another framework to get a version that will be supported?
Complete beginner here. Can I self host agents such as Claude ? (www.reddit.com) Hey everyone, I'm a complete beginner in AI agents, and I do some self-hosting at the moment. I was interested to know if it is possible to self-host agents like Claude using our own AI. Because I know things like Ollama to run your…
How to run a Gemma4 MTP implementation on ollama or python transformers? (www.reddit.com) Hi all I had a quick question while we wait for llama.cpp MTP implementation, have any of y'all tried Gemma4 MTP models on ollama and or transformers? What was your experience and or cli args and or workflows like?
LM Studio - 3 GPUs, one model per GPU as different servers (www.reddit.com) LM Studio has been really easy to use, but it seems, like they dramatically changed the interface from 0.3 to 0.4. I have 3 GPUs, and want to assign one to a Research model at port 1234, one for Writing at 1235, one for Utility at 1236.
After 8 months of running everything local, ive accepted the productivity tools also have to be local (www.reddit.com) Quick context: M3 max 64gb, currently running llama 3.3 70b q4 as my daily driver via ollama, qwen3 coder 30b for code (switched from qwen2.5 earlier this year), mlx for the smaller stuff. tried llama 4 scout earlier this year but 64gb is…
Thoughts on using personal macbook pro for self study / personal projects? Using it securely and safely. (www.reddit.com) So this is probably a pretty common thing, but I just want to ask in case I am missing something. I have pretty much no knowledge but am trying to learn a bit more about AI, local LLMs, and the whole AI stack.
Orc (working name) - auditable and declarative AI workflow (www.reddit.com) I’m building a small “Orchestration as Code” repo for LLM workflows. Does this concept make sense?
Deepseek tui alternatives, when do you jump from single model terminal agents (www.reddit.com) Been using Deepseek-Tui for days. solid for v4 workflows.
I built a CLI to stop local AI models from eating my disk twice — lmm (www.reddit.com) Every tool (LM Studio, Ollama, llama.cpp) downloads models to its own directory. Same 8GB model × 3 tools = 24GB wasted.
Best way to use remaining tokens from ollama cloud (www.reddit.com) Hey Bros, I have around 80% tokens for the week, If anyone needs it or suggest me what I can do with it will be helpful.
We built Irene — an AI agent platform that actually remembers you, builds its own tools , adapts and improve as you use it (www.reddit.com) Hey r/AI_Agents — we're launching Irene today, and I want to be straight about what it is, why we built it, and where it's going. What makes Irene different Affordable with massive token limits and the latest open-source models We have gen…
You don't need a GPU server to run Claude agents (www.reddit.com) I’ve been seeing a lot of newcomers asking about hardware specs lately, and there’s this weirdly common myth that you need a heavy server or a GPU instance to run Claude-based agents. You really don’t.
I renamed my local AI Linux distro to Reefy and rebuilt some of the architecture! (www.reddit.com) Hello r/LocalLLaMA, Some time ago I posted here about the Linux distro for local AI workloads that I was building: https://www.reddit.com/r/LocalLLaMA/comments/1igpkc8/i_built_a_linux_distro_to_run_nvidia_gpus_for_ai/ It worked well, but I…
Qwen3.6:27b vs qwen3-coder:30b vs deepseek-coder:33b on code gen, tool calling, and agent tasks (www.reddit.com) Ran a full eval against four local models last weekend and the spread between them is wider than I expected. All running through Ollama on CPU, no cloud, same prompts, same hardware.
using opencode with nemotron-3-nano:4b (www.reddit.com) I wanted to try installing a simple small model like nemotron-3-nano:4b from ollama and try it for simple quick fixes offline without burning credits or time. the model works well on ollama run time but when I try to use it on opencode, th…
Testing an Ollama powered agent mode with gemma4 inside Modly (www.reddit.com) Hey everyone ! Quick Modly update.
Using ollama for Openclaw (www.reddit.com) Hi all, I have recently installed openclaw on a raspberry pi4, linking it to my local Ollama instance (RTX 3090 with 24Gb, as well as 96Gb of DDR5 RAM bought before the madness), in my case running Qwen3.6 (latest) capped at 16k context. A…
llmfit: one command to check which AI models will actually run on your hardware (www.reddit.com) llmfit: one command to check which AI models will actually run on your hardware Tired of downloading a 15GB model only to find out your system can't handle it? Found this Rust CLI tool called llmfit that scans your actual hardware (RAM, VR…
DeepSeek V4 Flash as a cheap worker in your LLM stack: $0.0003/call via MCP, swappable endpoint (www.reddit.com) Most of my LLM cost was on the wrong tier of work. Classification, extraction, JSON formatting, summarization I'm going to review anyway.
Need your honest feedback on a new LLM server I'm building. (www.reddit.com) Hi all, I am building a high-performance and highly customizable local LLM server written 100% in Rust, with custom CUDA kernels, zero latency, almost immediate TTFT, and plenty of other features. The plan is to publish it on GitHub as open-s…
Qwen 35B-A3B as an always-on agentic loop on a 16GB Mac M4: disk became the bottleneck before RAM (www.reddit.com) M4 Mac Mini, 16GB unified, basic spec. For a few weeks I had Qwen 3.5 35B-A3B UD-IQ3_XXS (12GB on disk) running under llama.cpp with --mmap and --flash-attn.
I built Claudex, a free-to-try open-source CLI for Claude Code-style workflows (www.reddit.com) https://reddit.com/link/1sxh0ec/video/egfs5inxtsxg1/player I built Claudex specifically for people who like Claude Code-style agentic coding workflows but want a simpler plug-and-play terminal setup The setup is the main thing I wanted to…
locally uncensored v2.4.2 - chat, coding agent, image + video generation in one local app. plus remote access from your phone. one-click install (www.reddit.com) locally uncensored is a desktop app that combines four things most people run separately: chat, a coding agent, image generation, and video generation. all local, all on your hardware, no docker, no cloud account needed.
How do you actually use Qwen3 72B Instruct locally? (www.reddit.com) I just got Qwen3 72B Instruct running on a high RAM setup and I’m kinda confused about the proper way to use it. What’s the correct workflow for running it smoothly (like best quant, tools, or runtime)?
Claude told me I was the bottleneck. So I built agents that run while I sleep. (www.reddit.com) I work full-time as a Program Director. About 50-60 hours a week at my W-2.
I built a full macOS AI assistant that runs 100% local with Ollama — 170+ tools, voice control, memory system that dreams! (www.reddit.com) I've been building a personal AI assistant called Finn that runs entirely on your Mac. No cloud, no subscription, no data leaving your machine.
My 12-agent Qwen 35B stack on Ollama died at 500 tokens every single time. Raw MLX fixed it and broke 4 other things I didn't see coming. (www.reddit.com) TLDR: Swapped Ollama for MLX on M1 Max (64GB) to run a 12-agent trading stack using Qwen 35B MoE. MLX wins on throughput and fine-grained sampler control, but I lost the "it just works" convenience of Ollama.
Ollama swap to llamacpp/llama server (www.reddit.com) So I'm a newb in certain aspects but not in others, I'm currently running an AI stack on my unraid server: CPU: AMD Threadripper 3960X (24c/48t) Motherboard: Gigabyte TRX40 AORUS PRO WIFI RAM: 256GB DDR4-3200 G.Skill Trident Z GPU: Nvidia…
Anyone succeeded running Claude Cowork with Ollama? (www.reddit.com) Anthropic opened Cowork for Bedrock/Vertex/Azure providers and also Custom Inference Endpoints. However, connecting it to a local proxy seems non-trivial.
Any luck integrating local ollama models into VS Code Copilot Chat? (www.reddit.com) Hi all, I tried quite a few models and approaches, but had no luck integrating local models into the VS Code Copilot Chat extension in a useful way. Of course I can see the models there and can choose them, but none of them seem to work even re…
Best open source AI model (that can run on RTX 4090 24GB + 64GB system RAM, AMD Ryzen 9 7950X is the CPU that I use) that outperforms GPT-5.4 mini, GPT-5.2 Thinking and even Claude Sonnet 3 (the 2024 model)? (www.reddit.com) Well, I have an RTX 4090 24GB + 64GB system RAM, AMD Ryzen 9 7950X. Any good model for using in Open WebUI (using Ollama backend?) that outperforms GPT-5.4 mini, GPT-5.2 Thinking and even Claude Sonnet 3 (the 2024 model)?
Kimi 2.6 and qwen3.6 are out but still as slow as ever (www.reddit.com) Has anyone tried these? I found this on ollama: https://ollama.com/library/kimi-k2.6, https://ollama.com/library/qwen3.6 My issue is that they are extremely slow on my local machine.
Free book on building AI agent harnesses — 22 chapters, Python harness, written by AI (www.reddit.com) Claude Code drafted the prose. I did the research, direction, architecture, ran the code, caught the bugs, and reviewed every commit.
My Linux/Fedora Local Ai performance is trailing Windows massively? Are there specific ROCm environment variables or memory management tweaks for RDNA3 that I'm missing? (www.reddit.com)
Local model run in ollama for vscode copilot can not get the context of workspace (www.reddit.com) I use a local Ollama model for VS Code Copilot, but it seems it cannot get the context of the workspace. For example, when I ask it to edit or summarize the currently open file, it does not know which file to work on.
Is there an alternative between vLLM and Ollama that handles token prefill? (Arc Pro B70) (www.reddit.com) I am using an Arc Pro B70 to do inference, and its token generation speed is fine using Ollama, but it takes *forever* to do a prefill. vLLM absolutely tackles the prefill problem (nearly instant responses), but I can't run nearly as larg…
SOLVED! Was "Help needed: Ollama > qwen3.6 in OpenCode on 64Gb M4" (www.reddit.com)
Current recommended model for local openclaw (www.reddit.com)
OCuLink dGPU for AMD: RX 7600 XT vs RX 7800 XT for LLM — worth the price gap? Also llamacpp + Vulkan vs Ollama + ROCm? (www.reddit.com)
Cluster connection (www.reddit.com)
New and Learning - Web enabled deep research model? (www.reddit.com)
Tried hermes agent with local gemma4 on ollama. free tokens are nice but the agent quality gap vs cloud is still huge (www.reddit.com) Saw a post about running hermes agent locally with gemma4 through ollama. zero api costs, unlimited tokens, full privacy.
Local qwen3.5-4b vs Haiku vs Sonnet on intent judgment: 3/90 vs 90/90 vs 50/90 (www.reddit.com) I was building a classifier to label AI agent sessions as productive or dead-end. The task isn't keyword matching, it's intent judgment: did the agent actually accomplish the goal, or did it get stuck retrying the same Cloudflare wall 20 t…
made a desktop app that puts ollama, comfyui and coding into one window (www.reddit.com) been using local AI for a while now but my workflow was a mess. ollama for chat, comfyui for images, different tools for video and coding.
What's your workflow for switching between different local LLMs? Looking for better GPU management (www.reddit.com) I've been running into bottlenecks when trying to use multiple local LLMs on a single GPU. Currently switching manually between models in the terminal - starting/stopping Ollama, adjusting prompts, reloading contexts, etc.
Hi everyone! A newbie here looking for help (www.reddit.com) I'm fairly new to all this AI stuff and am trying to learn as much as I can about topics like: * Skills * Agents * Models * LLM * Ollama * llama.cpp * Quantization But I'm still lost. I have 32GB of RAM in my PC and would like to run mode…
M1 Pro 16GB users: what local LLM configs are actually usable day to day? (www.reddit.com) I'm trying to get past generic "best model" recommendations and collect real-world configs from people on similar hardware. My setup: MacBook M1 Pro, 10-core CPU, 14-core GPU, 16 GB unified memory.
Supermicro running Ollama on a $90,000 workstation... (www.youtube.com via reddit) I think this should be a crime (at 3:00)
Is there any local model that can replace Haiku 4.5 in an agent workflow using Ollama? (www.reddit.com) I currently use Haiku 4.5 in an automated content workflow. The process works like this: I take an existing article from my website, use a DataForSEO node to fetch competitor URLs and search intent data, and then generate a new article com…
Made a local-only agent benchmark + chaos tool, no cloud required (www.reddit.com) Runs entirely on your machine. No API calls to any eval service.
Built a macOS desktop pet that uses Ollama for AI chat — pixel art cats with personalities (www.reddit.com) Hey ! Each cat has its own system prompt (one tells jokes, one is philosophical, one shares science facts...).
Quick question: Should I stick with my M4 Max or grab a Corsair AI Workstation 300 for local LLM stuff? (www.corsair.com via reddit) So I already have a Mac Studio M4 Max (return window still available) with 64GB RAM, but I’m eyeing the Corsair AI Workstation 300 (Ryzen AI Max+ 395, 96GB VRAM out of 128GB, $3,250). Both seem decent for running models locally with Ollama.
[Project] Job Bro v0.1.5: Private, Local LLM-powered LinkedIn Analysis (Ollama support + Contextual Chat) (www.reddit.com) Hey r/LocalLLaMA, I wanted to share a project I've been working on called Job Bro. It’s a Chrome extension designed to help you analyze LinkedIn job descriptions without feeding your resume or career data into a proprietary black box if yo…
What models to run and fun projects to do with it (www.reddit.com) Hey yall, I want to explore more models and stuff i can do with them. What do you recommend?
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help (www.reddit.com) Hey all. This just got delivered yesterday.
gemma4 e4b on rtx 5070 ti laptop 12GB running slow 5t/s llama.cpp (www.reddit.com) I sincerely hope someone can help me, because I have tried everything I can and I get this speed using ollama.cpp and opencode. I have described my setup and how I am running it in as much detail as I can.
I bought an 'AI-ready' NUC with an Intel Arc GPU. Ollama couldn't see it. Two days later, I had to build it from source. (www.reddit.com) Got an ASUS NUC15 specifically for running Qwen locally on the Arc GPU. The marketing promised AI-ready performance.
TinyGPU on Apple Silicon + RTX 5070 Ti: my real Qwen benchmarks vs Ollama/Metal (www.reddit.com) I spent time setting up TinyGPU on an Apple Silicon Mac and comparing it against Ollama already installed locally. Short version: TinyGPU does work.
Is there a simple front end for LM Studio or Ollama that allows for easier integration & capability expansion? (www.reddit.com) Hey, so I'm pretty new to Local model hosting and have been messing with it a bit. I'm not a SWE but am reasonably technical.
Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026. (www.reddit.com) After months of testing, I finally have a local setup that doesn't make me want to go back to the API. Hardware: RTX 3090 (24GB VRAM) Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M Inference: llama.cpp…
Running on cpu :( (www.reddit.com) I am in the midst of a POC project at work and all I have is 4 AMD Epyc cores, and those are essentially virtualized. Does anyone have any tricks?
Need practical local LLM advice: Only having a 4GB RAM box from 2016 (www.reddit.com) Sorry, not so tech person. I’m trying to figure out the most practical local LLM setup using my spare machine: 4 GB RAM No GPU for now, so please assume CPU-first unless I mention otherwise.
One-click LM Studio → Ollama model linker (www.reddit.com) This has been a pain point for many, and I've seen some tools to address it, but they needed a lot of setup. So made this GUI tool with AI assist.
I have a Macbook AIR M5 Base and I want to run an Agentic Coding program, similar to Claude Code or Codex. Besides the model, how do I do it? I've already tried with Ollama, VS Code, Opencode, and haven't been able to. (I'm not a developer, sorry) (www.reddit.com) I started developing an app with Claude, but the credits run out very quickly. I thought that now with my new computer I could run something directly on it.
Looking for a reliable browser use agent that handles most daily tasks. (www.reddit.com) I am open to any option whether it's local or service based. For online services I tried Chatgpt agent : it's almost the worst option ever.
How are you feeding personal context to your local models? (www.reddit.com) I've been running Mistral/Llama locally through Ollama for a while now and the thing that keeps bugging me is context. The model itself is fine for general stuff but the second I want it to know about my projects, my notes, or files it doe…
Mac Studio Performance Suggestion For minimax (www.reddit.com) I need help. I want to self-contain my MiniMax 2.7 and Qwen 3.5 (122 billion parameter) models.
Optimizing a WSL2-based Local AI Orchestration for Product Viz | RTX 3090 24GB VRAM & i7-14700KF (www.reddit.com) Hi everyone, I’m building a local AI pipeline on WSL2 (Ubuntu) specifically for Product Visualization. My goal is to orchestrate LLMs for scene generation and Stable Diffusion/ComfyUI for high-fidelity rendering, keeping my Windows host cl…
Vibecoded a small web app to turn my life into a Game (www.reddit.com) I vibecoded a Flask app that acts as a Game Master for my day. I feed it my goals, and a local AI looks at my past history to generate new "quests".
Made a browser agent extension, would love for people to try it out (www.reddit.com) I have made a chrome extension that lets LLMs control your browser - clicking, typing, navigating, etc. supports ollama/openai/anthropic/google looking for people to try it out and let me know what breaks.