DFlash is real: 2x token generation (tg) on small contexts with oMLX

reddit-localllama · www.reddit.com · 3 pts · 6 replies · 1d

Right from the oven with the latest commit:

DFLASH_MAX_CTX=8192 uv run python -m omlx.cli serve

oMLX - LLM inference, optimized for your Mac: https://github.com/jundot/omlx

Benchmark Model: Qwen3.5-35B-A3B-MLX-MXFP4-FP16

===================…
