model roundup

Qwen 3.6

10 items · started 2026-05-27 · closed 2026-05-30

Qwen3.6 35B - TXT vs Markdown vs HTML vs HTML+CSS (www.reddit.com)

+154 4w claude-code

There's been talk lately about using HTML rather than Markdown in Claude Code. I was curious how this would work with a local model, so I loaded up Qwen3.6 35B A3B at Q8 with an F16 KV cache.
VLLM gives 5x speed of llama but quants not available (unsloth/gguf). What to do? (www.reddit.com)

+939 4w vllm llama

EDIT - IGNORE. I MADE A MISTAKE.
I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong? (www.reddit.com)

+314 4w llama

I'm using llama.cpp, and I've tried Bartowski's and my own quants. When using Qwen3.5-122B or Qwen3.6-27B, I'm seeing really low draft acceptance in chats with interleaved code snippets (chatting with the LLM about programming / a code pro…
Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model (www.reddit.com)

+1221 4w llama

I'm posting this because it may be helpful to squeeze the 12GB VRAM in the 3060. All credit goes to spiritbuun's fork (github.com/spiritbuun/buun-llama-cpp) and mudler's APEX quantizations (huggingface.co/mudler).
Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM) (www.reddit.com)

+1813 4w

Context: Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and handles prefill and decode as separate architectures and optimised use cases.
Qwen/Qwen-Image-Bench · Hugging Face (huggingface.co via reddit)

+6013 4w qwen

Model Description: Q-Judger is a vision-language model fine-tuned specifically for automated evaluation of text-to-image generated images. Given a text prompt and a generated image, the model evaluates the image on fine-grained quality crit…
Question: Llama cpp, whats good right now for: MTP, KV cache quant, Long context. (www.reddit.com)

+819 4w vllm qwen llama

Used the vLLM version of https://github.com/noonghunna/club-3090 and it worked fine for maybe 20-40k context; haven't tried the new one. Has anyone used the new patched llama.cpp one on a single 3090?
Local LLMs on Refurb M4 Max vs new M5 Max (www.reddit.com)

+216 4w gemma

Hoping the community can guide me on this one. I'm on the fence about the following purchase: Refurbished 16-inch MacBook Pro Apple M4 Max Chip with 16‑Core CPU and 40‑Core GPU, 64gb ram for $3,479.00 vs The new 16-inch MacBook Pro Apple M…
Anyone tried a setup like this? Is it a bad idea? 😅 (www.reddit.com)

+12 4w qwen

I’m considering building a local machine for AI inference using a Dell Precision T5820 and 2 Intel Arc A770s. From this I could get 32GB DDR4 RAM, 1TB SSD and 32GB VRAM, all for like $1000.
Need some advice on AI workflow (www.reddit.com)

+210 4w llama chatgpt mcp

Hi all, I'm somewhat new to the scene (been lurking for maybe 4-5 months now), but I think I have all the basics figured out. My setup: 9800x3d with 64GB of RAM, 6900xt with 16GB VRAM.
