Meet Qwen3.6-35B-A3B: now open-source! 🚀 A sparse MoE model with 35B total parameters and 3B active. Apache 2.0 license.
#moe
17 items
Qwen3.6-35B-A3B released! www.reddit.com
DeepSeek updated their DeepGEMM repo: testing Mega MoE www.reddit.com https://github.com/deepseek-ai/DeepGEMM/pull/304 https://github.com/deepseek-ai/DeepGEMM/commit/a050d09461e86eb6bba35a8c74…
Qwen3.5-35B running well on RTX 4060 Ti 16GB at 60 tok/s www.reddit.com Spent a bunch of time tuning llama.cpp on a Windows 11 box (i7-13700F, 64GB) with an RTX 4060 Ti 16GB, trying to get unsloth Qwen3.5-35B-A3B-UD-Q4_K_L running well at 64K context. I finally got it into a pretty solid place, so I wanted to s…
[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book www.reddit.com I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: LayerNorm → RMSNorm, learned positional encodings → RoPE, GELU →…
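As a sketch of the first of those swaps (my own minimal version, not the book's code): RMSNorm drops LayerNorm's mean-centering and bias term, normalizing by the root-mean-square alone.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Classic LayerNorm: center by the mean, scale by the std, learned shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm: no mean subtraction, no bias; divide by the RMS only."""
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return gamma * x / rms

x = np.random.default_rng(0).normal(size=(2, 8))
g = np.ones(8)
print(rms_norm(x, g).shape)  # (2, 8)
```

With `gamma = 1`, the output of `rms_norm` has unit RMS along the last axis (up to `eps`), which is the whole invariant the layer maintains.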
Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs. www.reddit.com I've been waiting for a capable free local LLM for a while. I think we're close — the quality is getting there fast, and Gemma 4 is the first open-weight model where I genuinely considered using it in production for simple-to-medium tasks.
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s www.reddit.com Hey all, Looking for some real-world advice on GPU choices for running the new dense models — mainly Qwen 3.5 27B and Gemma 4 31B. What I’m targeting Context: 64K+ (ideally higher later) Speed: 30+ tok/s @ tg128 minimum Power: not critical…
Alibaba open-sources Qwen3.6-35B-A3B, a 35B MoE model with 3B active parameters huggingface.co Qwen3.6-35B-A3B [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransf…
A note of warning about DFlash. www.reddit.com It started by claiming a 4-5x speed advantage over usual bf16 models (tests are less optimistic, but let's assume this is true). The MoE gain, however, is not that good; the quoted value was for dense models.
Qwen3.5 50% expert reduction success news.ycombinator.com We surgically removed half the experts from Qwen3.5-35B-A3B to create 8 memory efficient domain specialists (coding, web, math, physics, biology, engineering, vocational, humanities). A cross-domain test shows a 96-point pass@5 gap between…
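The post doesn't publish its method, but a toy sketch of what "removing half the experts" can look like mechanically (the top-1 toy layer and all names here are my own assumptions): drop the unwanted expert weights and slice the matching rows out of the router so its logits stay aligned with the surviving experts.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_experts = 8, 16
router_w = rng.normal(size=(num_experts, dim))                       # one logit row per expert
experts = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]  # toy expert weights

def moe_forward(x, router_w, experts):
    """Top-1 MoE: each token goes to its highest-scoring expert."""
    logits = x @ router_w.T              # [tokens, num_experts]
    idx = logits.argmax(axis=-1)         # chosen expert per token
    return np.stack([x[n] @ experts[i] for n, i in enumerate(idx)])

def prune(router_w, experts, keep):
    """Keep only the experts in `keep`; slice the matching router rows."""
    return router_w[keep], [experts[i] for i in keep]

x = rng.normal(size=(4, dim))
router_half, experts_half = prune(router_w, experts, keep=list(range(8)))
print(moe_forward(x, router_half, experts_half).shape)  # (4, 8)
```

Building domain specialists presumably means choosing `keep` per domain (e.g. the experts most often activated on coding traffic); nothing above tells you *which* experts to keep, only how the surgery stays consistent.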
How does MOE training ensure different experts are chosen? www.reddit.com I’m training a coding model that is basically a large model and a mini model built into one. Think of it like a person with two heads.
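One standard answer to that question (general background, not from the thread): routers are trained with an auxiliary load-balancing loss, as in Switch Transformer, so that sending every token to the same expert is penalized. A minimal numpy sketch (all names mine):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def load_balancing_loss(router_logits, num_experts):
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i,
    where f_i is the fraction of tokens routed (top-1) to expert i
    and P_i is the mean router probability for expert i.
    Minimum is 1.0 when routing is perfectly balanced."""
    probs = softmax(router_logits)                            # [tokens, experts]
    top1 = probs.argmax(axis=-1)
    f = np.bincount(top1, minlength=num_experts) / len(top1)  # routed fractions
    P = probs.mean(axis=0)                                    # mean probabilities
    return num_experts * float(np.dot(f, P))

balanced = np.eye(4) * 10          # each token strongly prefers a different expert
collapsed = np.zeros((4, 4))
collapsed[:, 0] = 10               # every token routed to expert 0
print(load_balancing_loss(balanced, 4), load_balancing_loss(collapsed, 4))
```

Because `f_i` is non-differentiable, the gradient flows through `P_i`: the loss pushes probability mass away from over-used experts, which over training keeps all experts selected often enough to learn.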
Better MoE model inference with warp decode cursor.com
Qwen3.5 35B is surely still one of the best local models (pulling above its weight) - More Details www.reddit.com Last time I posted on how this model performed in creating a webapp based on a provided research paper. I got so much love seeing that people appreciated the post and, of course, the potential of this MoE model.
Ask HN: How do you prepare for a mid-career Research Engineer role at neo Labs news.ycombinator.com Hey, I'm sure this question has been asked in various forms on HN. While I feel the answer might mostly stay the same, it changes with various developments in AI - the relevance of concepts like MoE, RL, etc. changes - and so do tools like custom Open…
How much faster is Gemma 4 26B-A4B during inference vs 31B? www.reddit.com I want to download one; I usually do inference on CPU (my GPU is old), so I'm concerned with speed. One link on the web (I have posted with it and the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…
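A back-of-envelope way to see why the A4B MoE should decode much faster on CPU (my own sketch; all numbers below are hypothetical, not from the thread): single-stream decoding is roughly memory-bandwidth-bound, so tokens/s scales with the bytes of weights streamed per token, which for an MoE is the *active* parameters, not the total.

```python
def decode_tps(active_params_b, bytes_per_param, mem_bw_gbps):
    """Back-of-envelope tokens/s for bandwidth-bound decoding:
    each token must stream the active weights from memory once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bw_gbps * 1e9 / bytes_per_token

# Hypothetical numbers: ~0.56 bytes/param for a Q4-ish quant (incl. overhead),
# dual-channel DDR5 at ~60 GB/s.
print(f"dense 31B:   {decode_tps(31, 0.56, 60):.1f} tok/s")
print(f"MoE 4B act.: {decode_tps(4, 0.56, 60):.1f} tok/s")
```

Under this model the speedup is simply the ratio of active parameters (31/4 ≈ 7.75x); real gaps are smaller because attention, KV-cache reads, and expert-load imbalance aren't free.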
Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? www.reddit.com Thanks
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) www.reddit.com
Gemma4 vs Qwen3.5! MoE vs Dense! SotA vs Obsolete! Why not both? www.reddit.com