Qwen 3.5

54 items · started 2026-04-13 · ongoing (last activity 2026-04-16)

Qwen3.5 50% expert reduction success news.ycombinator.com

hn ·4 pts·1 replies ↗ ·6h ago ·summary · moe
Spring benchmark update: Gemma 4 / Qwen3.5 vs Gemma 3 / Qwen3 for chat www.reddit.com

reddit-localllama ·3 replies ↗ ·5h ago ·summary · gemma
Local Coding Stacks www.reddit.com

reddit-localllama ·1 pts·2 replies ↗ ·5h ago ·summary · sonnet qwen gemma+1
I got it guys, I think I finally understand why you hate censored models www.reddit.com

reddit-localllama ·120 pts·54 replies ↗ ·21h ago ·summary · qwen
Gemma4 26b & E4B are crazy good, and replaced Qwen for me! www.reddit.com

reddit-localllama ·392 pts·100 replies ↗ ·1d ago ·summary · qwen llama gemma+2
Qwen3.5-35B running well on RTX4060 Ti 16GB at 60 tok/s www.reddit.com

reddit-localllama ·86 pts·35 replies ↗ ·21h ago ·summary · moe llama
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s www.reddit.com

reddit-localllama ·9 pts·78 replies ↗ ·17h ago ·summary · moe qwen gemma
What's your favorite small-medium local model? www.reddit.com

reddit-localllama ·5 pts·10 replies ↗ ·17h ago ·summary · gemma
How faster is Gemma 4 26B-A4B during inference vs 31B? www.reddit.com

reddit-localllama ·16 replies ↗ ·15h ago ·summary · moe qwen llama+1
Been trying to get Qwen 3.5 to stop reasoning using old methods like /no_think, it didn't work, but it said something like "too late" in its reasoning www.reddit.com

reddit-localllama ·2 pts·4 replies ↗ ·1d ago ·summary · qwen
Hey, has anyone here used Qwen3.5-27B-NVFP4-GGUF with llama.cpp yet? www.reddit.com

reddit-localllama ·3 pts·15 replies ↗ ·1d ago ·summary · llama
Please help me pick the right Qwen3.5-27B format/quant for RTX5090 www.reddit.com

reddit-localllama ·2 pts·1 replies ↗ ·1d ago ·summary · openclaw
DFlash is real: x2 tg on small context with oMLX www.reddit.com

reddit-localllama ·3 pts·6 replies ↗ ·1d ago ·summary
Qwen3.5 35b is sure still one the best local model (pulling above its weight) - More Details www.reddit.com

reddit-localllama ·1 pts·1 replies ↗ ·1d ago ·summary · moe qwen
Thinking issue [Qwen3.5] www.reddit.com

reddit-localllama ·1 pts ·1d ago ·summary · gemma
Summarizing text locally, medical literature www.reddit.com

reddit-localllama ·3 pts·7 replies ↗ ·1d ago ·summary · qwen
Hot Experts in your VRAM! Dynamic expert cache in llama.cpp for 27% faster CPU +GPU token generation with Qwen3.5-122B-A10B compared to layer-based single-GPU partial offload www.reddit.com

reddit-localllama ·9 pts·2 replies ↗ ·1d ago ·summary · llama claude
The LLM tunes its own llama.cpp flags (+54% tok/s on Qwen3.5-27B) www.reddit.com

reddit-localllama ·100 pts·51 replies ↗ ·2d ago ·summary · llama gemma
My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) www.reddit.com

reddit-localllama ·19 pts·34 replies ↗ ·1d ago ·summary · minimax qwen
Loading "stacks" of models on-demand? Does a tool like this exist? www.reddit.com

reddit-localllama ·3 pts·3 replies ↗ ·1d ago ·summary · gemma
I want to run qwen3.5 27B q4_k_m on CPU, and I need help. www.reddit.com

reddit-localllama ·2 pts·17 replies ↗ ·1d ago ·summary · llama
2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me www.reddit.com

reddit-localllama ·11 pts·10 replies ↗ ·2d ago ·summary · minimax qwen agentic+1
running models bigger than physical memory capacity www.reddit.com

reddit-localllama ·14 replies ↗ ·1d ago ·summary · qwen gemma
GRaPE 2 Model Family www.reddit.com

reddit-localllama ·3 pts ·1d ago ·summary
Any magic prompt that Local LLM never turning back until everything completed? (building frontend application with qwen3.5-35b-a3b) www.reddit.com

reddit-localllama ·1 pts·7 replies ↗ ·1d ago ·summary · gpt-5 codex claude-code+1
Llama.cpp llama-server command recommendations? www.reddit.com

reddit-localllama ·5 pts·3 replies ↗ ·2d ago ·summary · llama
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. www.reddit.com

reddit-localllama ·34 pts·11 replies ↗ ·2d ago ·summary · qwen
Qwen 122B is AMAZING but is my config right? (128GB M4 Max) www.reddit.com

reddit-localllama ·2 pts·9 replies ↗ ·1d ago ·summary · qwen llama
DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max) www.reddit.com

reddit-localllama ·83 pts·30 replies ↗ ·3d ago ·summary
What's the deal with Qwen3.5's and Gemma 4's reasoning traces? www.reddit.com

reddit-localllama ·3 pts·4 replies ↗ ·2d ago ·summary · gemma
Anybody got Qwen3.5-27B working with Intel Arc B70 (or similar) and proper optimization? www.reddit.com

reddit-localllama ·2 pts·15 replies ↗ ·1d ago ·summary · llama
Qwen 3.5 Small – on-device multimodal models – Alibaba / Qwen ai-tldr.dev

hn ·3 pts ·2d ago ·summary · qwen
Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? www.reddit.com

reddit-localllama ·43 pts·56 replies ↗ ·3d ago ·summary · qwen codex claude
Performance Benchmark - Qwen3.5 & Gemma4 on dual GPU setup (RTX 4070 + RTX 3060) www.reddit.com

reddit-localllama ·16 pts·5 replies ↗ ·2d ago ·summary
Can I combine a RTX5060ti 16gb with 7900XTX 24gb for llama.cpp? www.reddit.com

reddit-localllama ·3 pts·9 replies ↗ ·2d ago ·summary · qwen llama
Can LLM make small change to the software program? www.reddit.com

reddit-localllama ·1 pts·4 replies ↗ ·1d ago ·summary · qwen llama gemma
Open-source Perplexity alternative: ollama+perplexica+searxng: which model? settings? optimization? www.reddit.com

reddit-localllama ·2 pts ·2d ago ·summary · ollama qwen chatgpt+1
Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff www.reddit.com

reddit-localllama ·28 pts·26 replies ↗ ·3d ago ·summary · gemma agentic
Is qwen3 coder next still relevant with qwen3.5 release for agentic coding? www.reddit.com

reddit-localllama ·2 pts·20 replies ↗ ·2d ago ·summary · agentic
Been out of the loop - Will this work for EXO/MLX? www.reddit.com

reddit-localllama ·1 pts·1 replies ↗ ·2d ago ·summary · qwen
Why don't Groq (with a q) and Cerebras add new models www.reddit.com

reddit-localllama ·12 replies ↗ ·2d ago ·summary · gemma
3x3090 is faster in Ubuntu than win11, GPT-OSS 120B 120tg/s vs 6tg/s why? www.reddit.com

reddit-localllama ·3 pts·24 replies ↗ ·2d ago ·summary · llama
[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second www.reddit.com

reddit-localllama ·12 pts·8 replies ↗ ·3d ago ·summary · qwen gemma
Qwen 3.5 122B A10B running 50tok/s on DGX SPARK / Asus Ascent www.reddit.com

reddit-localllama ·5 pts·10 replies ↗ ·2d ago ·summary · qwen
Show HN: Hitoku Draft – context aware local macOS assistant github.com

hn ·3 pts ·3d ago ·summary · qwen gemma
current: 1x 16GB 5060Ti. worth a 2nd for OpenCode? www.reddit.com

reddit-localllama ·4 pts·9 replies ↗ ·2d ago ·summary · vllm llama
Gemma4 vs Qwen3.5! MoE vs Dense! Sota vs Obsolete! Porque no los dos? www.reddit.com

reddit-localllama ·4 replies ↗ ·2d ago ·summary · moe
Which AI model is best for real data analysis? [benchmark] www.reddit.com

reddit-localllama ·1 pts ·2d ago ·summary · glm gpt-5 ollama
How to run Qwen3.5-27B with speculative decoding with llama.cpp llama-server? www.reddit.com

reddit-localllama ·5 pts·14 replies ↗ ·3d ago ·summary · llama
DGX spark www.reddit.com

reddit-localllama ·1 pts·5 replies ↗ ·2d ago ·summary · vllm qwen llama+1
Lora training www.reddit.com

reddit-localllama ·3 pts·7 replies ↗ ·3d ago ·summary · qwen
What is the best way to deploy LLM on 3x3090? www.reddit.com

reddit-localllama ·13 replies ↗ ·2d ago ·summary · vllm gemma
Opinion on best suit for my hardware www.reddit.com

reddit-localllama ·2 replies ↗ ·2d ago ·summary · openclaw gemma
Speed on m5 pro 48Gb www.reddit.com

reddit-localllama ·3d ago ·summary · glm qwen gemma
