model roundup
Qwen 3
-
LLM speed t/s (www.reddit.com)
-
7B showdown on 18GB (benchmark) (www.reddit.com)
Hey r/LocalLLaMA, I've been coding for a while but I'm new to the local AI space, and I wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs generalists" at the 7-8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…
-
Acceptable prompt processing speed for you? (www.reddit.com)
-
TPU v7x Ironwood vs Nvidia B200 (www.reddit.com)
Google published Ironwood inference benchmarks in their AI-Hypercomputer/tpu-recipes repo. Nvidia has InferenceMAX numbers for B200.
-
Tokens per second - RTX 5000 Ada generation (www.reddit.com)
Hi everyone, I am testing local LLMs. I have a laptop with an RTX 5000 Ada generation GPU, running Ollama and Open WebUI.
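A metric recurring across these threads is decode speed in tokens per second. For Ollama specifically, a non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which t/s can be computed directly. A minimal sketch, using a hypothetical sample response rather than a live server:

```python
# Sketch: compute decode tokens/sec from an Ollama /api/generate
# (stream=False) response. Ollama reports eval_count (tokens
# generated) and eval_duration (time spent generating, in ns).
def tokens_per_second(response: dict) -> float:
    duration_ns = response.get("eval_duration", 0)
    # Guard against a missing or zero duration.
    if duration_ns <= 0:
        return 0.0
    return response["eval_count"] / (duration_ns / 1e9)

# Hypothetical sample: 128 tokens generated in 2 seconds.
sample = {"eval_count": 128, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 64.0
```

The same response also carries `prompt_eval_count` and `prompt_eval_duration`, so prompt-processing speed (the subject of the "acceptable prompt processing speed" thread above) can be computed the same way from those two fields.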