#moe

17 items

Qwen3.6-35B-A3B released! www.reddit.com

Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.

↯ Qwen 3.6

moe qwen agentic

reddit-localllama ·678 pts·234 replies ↗ ·6h ·summary
DeepSeek Updated their repo DeepGEMM testing Mega MoE www.reddit.com

https://github.com/deepseek-ai/DeepGEMM/pull/304 https://preview.redd.it/vcmqwmvzijvg1.png?width=1014&format=png&auto=webp&s=76b1739925f0699b0763aa7814614dd40329c41e https://github.com/deepseek-ai/DeepGEMM/commit/a050d09461e86eb6bba35a8c74…

moe deepseek

reddit-localllama ·87 pts·8 replies ↗ ·8h ·summary
Qwen3.5-35B running well on RTX4060 Ti 16GB at 60 tok/s www.reddit.com

Spent a bunch of time tuning llama.cpp on a Windows 11 box (i7-13700F 64GB) with an RTX 4060 Ti 16GB, trying to get unsloth Qwen3.5-35B-A3B-UD-Q4_K_L running well at 64k context. I finally got it into a pretty solid place, so I wanted to s…

↯ Qwen 3.5

moe llama

reddit-localllama ·86 pts·35 replies ↗ ·21h ·summary
[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book www.reddit.com

I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: LayerNorm → RMSNorm; learned positional encodings → RoPE; GELU →…

moe deepseek llama

reddit-localllama ·20 pts·3 replies ↗ ·1d ·summary
Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs. www.reddit.com

I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.

↯ Gemma 4

moe gemma

reddit-localllama ·12 pts·11 replies ↗ ·1d ·summary
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s www.reddit.com

Hey all, looking for some real-world advice on GPU choices for running the new dense models, mainly Qwen 3.5 27B and Gemma 4 31B. What I’m targeting: context: 64K+ (ideally higher later); speed: 30+ tok/s @ tg128 minimum; power: not critical…

↯ Qwen 3.5

moe qwen gemma

reddit-localllama ·9 pts·78 replies ↗ ·17h ·summary
Alibaba open-sources Qwen3.6-35B-A3B, a 35B MoE model with 3B active parameters huggingface.co

Qwen3.6-35B-A3B. Note: This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransf…

↯ Qwen 3.6

moe

hn ·8 pts ·6h ·summary
A note of warning about DFlash. www.reddit.com

It started out claiming a 4-5x speed advantage over the usual bf16 models (tests are less optimistic, but let's assume this is true). Then it turned out the MoE gain is not that good; the figure was for dense models.

moe

reddit-localllama ·5 pts·11 replies ↗ ·7h ·summary
Qwen3.5 50% expert reduction success news.ycombinator.com

We surgically removed half the experts from Qwen3.5-35B-A3B to create 8 memory-efficient domain specialists (coding, web, math, physics, biology, engineering, vocational, humanities). A cross-domain test shows a 96-point pass@5 gap between…

↯ Qwen 3.5

moe

hn ·4 pts·1 replies ↗ ·6h ·summary
How does MOE training ensure different experts are chosen? www.reddit.com

I’m training a coding model that is basically a large model and a mini model built into one. Think of it like a person with two heads.

moe

reddit-localllama ·4 pts·6 replies ↗ ·1d ·summary
Better MoE model inference with warp decode cursor.com

moe

hn ·3 pts ·4d ·summary
Qwen3.5 35b is surely still one of the best local models (punching above its weight) - More Details www.reddit.com

Last time I posted about how this model performed in creating a web app based on a provided research paper. I got so much love, seeing that people appreciated the post and, of course, the potential of this MoE model.

↯ Qwen 3.5

moe qwen

reddit-localllama ·1 pts·1 replies ↗ ·1d ·summary
Ask HN: How do you prepare for a mid career Research Engineer role at neo Labs news.ycombinator.com

Hey, I’m sure this question has been asked in various forms on HN. While I feel the answer might mostly stay the same, it changes with various developments in AI (the relevance of concepts like MoE, RL, etc. changes) and with tools like custom Open…

moe openclaw

hn ·1 pts ·1d ·summary
How much faster is Gemma 4 26B-A4B during inference vs 31B? www.reddit.com

I want to download one, and I usually run inference on CPU since I have an old GPU, so I'm concerned about speed. One link on the web (I posted it before and the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…

↯ Qwen 3.5

moe qwen llama+1

reddit-localllama ·16 replies ↗ ·15h ·summary
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? www.reddit.com

Thanks

↯ Gemma 4

moe gemma mcp

reddit-localllama ·1 replies ↗ ·1d ·summary
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) www.reddit.com

↯ Qwen 2.5

vllm moe qwen+1

reddit-localllama ·16 replies ↗ ·2d ·summary
Gemma4 vs Qwen3.5! MoE vs Dense! Sota vs Obsolete! Porque no los dos? www.reddit.com

↯ Qwen 3.5

moe

reddit-localllama ·4 replies ↗ ·2d ·summary
