model roundup

Qwen 3.6

4 items · started 2026-05-28 · closed 2026-05-31

Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS (www.reddit.com)

+154 4w claude-code

There's been talk lately about using HTML rather than Markdown in Claude Code. I was curious how this works with a local model, so I loaded up Qwen3.6 35B A3B at Q8 with an F16 KV cache.
VLLM gives 5x speed of llama but quants not available (unsloth/gguf). What to do? (www.reddit.com)

+939 4w vllm llama

EDIT - IGNORE. I MADE A MISTAKE.
I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong? (www.reddit.com)

+314 4w llama

I'm using llama.cpp, and I've tried Bartowski's and my own quants. When using Qwen3.5-122B or Qwen3.6-27B, I'm seeing really low draft acceptance in chats with interleaved code snippets (chatting with the LLM about programming / a code pro…
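Why low draft acceptance hurts: with speculative decoding (the MTP head plays the draft role here), the target model verifies k drafted tokens per step, so the speedup depends on how many of them survive. Below is a minimal sketch, assuming each drafted token is accepted independently with probability alpha (a simplification; real acceptance varies with content, e.g. code vs. prose). Both function names are hypothetical and are not llama.cpp APIs.

```python
def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens that the target model accepted."""
    if drafted <= 0:
        raise ValueError("drafted must be positive")
    return accepted / drafted


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha. The target model always contributes one token
    (a correction or a bonus), so the result lies between 1 and k + 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    for alpha in (0.3, 0.6, 0.9):
        print(alpha, round(expected_tokens_per_step(alpha, 3), 2))
```

With 3 drafted tokens per step, dropping from high to low acceptance cuts the tokens gained per verification roughly in half under this model, which is why mixed code/prose chats can erase most of the MTP benefit.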
Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model (www.reddit.com)

+1221 4w llama

I'm posting this because it may help others squeeze the most out of the 3060's 12GB of VRAM. All credit goes to spiritbuun's fork (github.com/spiritbuun/buun-llama-cpp) and mudler's APEX quantizations (huggingface.co/mudler).
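The budget arithmetic behind running a 17GB model on a 12GB card can be sketched as follows. This is a hypothetical illustration, not how the fork above decides tensor placement: it assumes whole layers of known size are either on the GPU or in system RAM, and `reserve_gb` is an assumed allowance for KV cache and compute buffers.

```python
def split_layers(
    layer_sizes_gb: list[float], vram_gb: float, reserve_gb: float
) -> tuple[int, float]:
    """Count how many whole layers fit in VRAM after a reserve.

    Returns (layers on GPU, GB of weights left in system RAM).
    reserve_gb is a hypothetical allowance for KV cache and buffers.
    """
    budget = vram_gb - reserve_gb
    if budget < 0:
        raise ValueError("reserve exceeds VRAM")
    used = 0.0
    on_gpu = 0
    for size in layer_sizes_gb:
        if used + size > budget:
            break  # the remaining layers stay in system RAM
        used += size
        on_gpu += 1
    offloaded = sum(layer_sizes_gb) - used
    return on_gpu, offloaded


if __name__ == "__main__":
    # Hypothetical: 17GB split evenly over 40 layers, 3GB reserved.
    n, off = split_layers([0.425] * 40, vram_gb=12.0, reserve_gb=3.0)
    print(n, round(off, 3))
```

Whatever the placement strategy, at least 17 - 12 = 5GB of weights cannot fit on the card, and a large filled context (72k here) pushes that number higher because the KV cache competes for the same VRAM.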