Tested DFlash speculative decoding on oMLX: results are mixed

reddit-localllama · www.reddit.com · 4 pts · 2 replies · 1d

I spent this evening benchmarking the new DFlash block-diffusion speculative decoding in oMLX v0.3.5-rc1 on my Mac Studio (M2 Max, 96 GB). I couldn't find much real-world data out there, so here's what I got.
