[Help] Running big dense models faster (www.reddit.com)
model roundup
Mistral 3.5
-
I have been trying Mistral 3.5 on my 4x RTX 3090 rig with llama.cpp. Inference is slow (about 11 t/s) even without anything being offloaded to the CPU.
-
Unsloth solved bug in Mistral Medium 3.5 implementation (www.reddit.com)
https://unsloth.ai/docs/models/mistral-3.5 "May 1, 2026 Update: We worked with Mistral to fix Mistral Medium 3.5 inference affecting some implementations, and released updated GGUFs with the fix (NOT related to Unsloth or our quants). The…
-
Terminal Bench score for Mistral 3.5 Medium (www.reddit.com)
So... there were a couple of promising benchmark scores reported by mistralai in the model card for Mistral 3.5 Medium, BUT not the one I usually care about the most, which is TerminalBench 2.0.
-
Is Mistral-3.5-Medium-128B broken in Llama CPP? (www.reddit.com)
Trying some of Bartowski's Q4 quants. Using Vulkan with the latest main branch as of a few hours ago.