model roundup

Qwen 2.5

7 items · started 2026-05-17 · closed 2026-05-30

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B (www.reddit.com)

19 4w vllm llama

I have been coming to this subreddit to understand what the optimal config is to run a model on a given hardware setup. I referred to specific benchmarks, but they are too generic and do not consider the underlying hardware.

Hermes w/cloud LLM and w/local LLM does it work? (www.reddit.com)

+13 4w ollama gpt-5 openclaw+1

I’ve tried openclaw locally for about a month. Hardware: M5 Pro w/48 GB RAM.

My experience using Claude code with Local Llm, and full guide on how to set it up (www.reddit.com)

+2511 4w ollama claude-code

Wanted to share a workflow I tested on a real flight, in case anyone else is trying to set up offline Claude Code. The core idea: use ollama to pull the model you need, then use it to run Claude Code. The setup, in orde…

Show HN: Charm – on-device spelling, grammar, and prediction for macOS (www.theodorehq.com via hn)

+2 4w gemma qwen

I've spent the last year building Charm, a native macOS menu bar app that corrects spelling, fixes grammar, and predicts your next word. Three features: - Spells: NSSpellChecker plus a local LLM for context-aware corrections (catches "defi…

I offloaded a multi-step background loop from Claude Code to a local agent OS. They started voting on their own system rules. (www.reddit.com)

+1 5w ollama qwen claude-code

Hey r/ClaudeAI, If you are using Claude Code or building terminal agents, you know the exact moment the context window starts degrading during long-running tasks. I wanted to build a persistent runtime layer to offload those heavy, multi-s…

An overview of modern LLM compiler stack: writing an interactive and hackable compiler (www.reddit.com)

+13 5w

Hey r/LocalLLaMA, The production ML compiler stack is brutal: TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other.

"Qwen 3 72B" doesn't exist — and it's in a surprising number of places that act like it does (www.reddit.com)

9 5w hallucination moe qwen

spent today auditing my own model catalog and noticed 39 of my own pages confidently reference "qwen 3 72b" with apache 2.0 licensing, a 2025-09-15 release date, and a 131k context window. seemed normal — qwen 2.5 had a 72b, why wouldn't q…
