model roundup
Qwen 3
-
LLM speed t/s (www.reddit.com)
-
7B showdown on 18GB (benchmark) (www.reddit.com)
Hey r/LocalLLaMA, I've been coding for a while but I'm new to the local AI space, and I wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs generalists" at the 7-8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…
-
Acceptable prompt processing speed for you? (www.reddit.com)
-
TPU v7x Ironwood vs Nvidia B200 (www.reddit.com)
Google published Ironwood inference benchmarks in their AI-Hypercomputer/tpu-recipes repo. Nvidia has InferenceMAX numbers for B200.
-
Tokens per second - RTX 5000 Ada generation (www.reddit.com)
Hi everyone, I am testing local LLMs. I have a laptop with an RTX 5000 Ada generation GPU, running Ollama and Open WebUI.
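A metric recurring across these threads is decode speed in tokens per second. For Ollama specifically, a non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which t/s can be computed directly. A minimal sketch, using a hypothetical sample response rather than a live server:

```python
# Sketch: compute decode tokens/sec from an Ollama /api/generate
# (stream=False) response. Ollama reports eval_count (tokens
# generated) and eval_duration (time spent generating, in ns).
def tokens_per_second(response: dict) -> float:
    duration_ns = response.get("eval_duration", 0)
    # Guard against a missing or zero duration.
    if duration_ns <= 0:
        return 0.0
    return response["eval_count"] / (duration_ns / 1e9)

# Hypothetical sample: 128 tokens generated in 2 seconds.
sample = {"eval_count": 128, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 64.0
```

The same response also carries `prompt_eval_count` and `prompt_eval_duration`, so prompt-processing speed (the subject of the "acceptable prompt processing speed" thread above) can be computed the same way from those two fields.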