model roundup

Qwen 3.6

5 items · started 2026-05-28 · closed 2026-05-31

Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS (www.reddit.com)

+154 4w claude-code

There's been talk of late about using HTML rather than markdown in Claude Code. I was curious how this worked with a local model, so I loaded up Qwen3.6 35B A3B at Q8 with an F16 KV cache.

VLLM gives 5x speed of llama but quants not available (unsloth/gguf). What to do? (www.reddit.com)

+939 4w vllm llama

EDIT - IGNORE. I MADE A MISTAKE.

I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong? (www.reddit.com)

+314 4w llama

I'm using llama.cpp, and I've tried Bartowski's and my own quants. When using Qwen3.5-122B or Qwen3.6-27B, I'm seeing really low draft acceptance in chats with interleaved code snippets (chatting with the LLM about programming / a code pro…

Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model (www.reddit.com)

+1221 4w llama

I'm posting this because it may help others squeeze more out of the 3060's 12GB of VRAM. All credit goes to spiritbuun's fork (github.com/spiritbuun/buun-llama-cpp) and mudler's APEX quantizations (huggingface.co/mudler).

Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM) (www.reddit.com)

+1813 4w

Context: Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and handles prefill and decode as separate architectures and optimised use cases.