Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled

huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled

182091 downloads · 146 likes · text-generation · transformers

From the model card:

A reasoning-distilled variant of Qwen3.6-35B-A3B, fine-tuned to imitate the chain-of-thought style of Claude Opus 4.7, Anthropic's frontier reasoning model. The goal is to port Claude-grade reasoning behavior into a permissively licensed Mixture-of-Experts (MoE) model that an individual can actually run.

Why this model

Claude-style reasoning, open weights. Claude Opus 4.7 is one of the strongest reasoning models available, but it is accessible only through a proprietary API. This model was fine-tuned on ~8k high-quality reasoning traces produced by Opus 4.7, teaching the base model to think before answering, inside explicit … blocks, with Claude's structure and cadence.

Sparse activation, dense knowledge. The base is a 35B-parameter MoE with 256 experts (8 routed + 1 shared), so only about 3B parameters are active per token. You get the capacity of a 35B model at roughly the per-token compute cost of a small dense model. All 35B parameters must still be held in memory, though, and full-quality bf16 inference runs on a single 80GB A100 or H100.

Long thinking supported. The context window is 64k tokens. On hard problems the model routinely emits 5–30k tokens of reasoning before giving its final answer. That is the whole point of a reasoning model, and it is why this one was specifically trained end-to-end with a teacher that also reasons explicitly.

Clean base to build on. The LoRA adapter is also published separately (…-adapter), so you can app…
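
The card's routing numbers (256 experts, 8 routed + 1 shared, about 3B of 35B parameters active per token) can be illustrated with a toy top-k router. This is a minimal sketch under stated assumptions, not Qwen's implementation: it assumes the common softmax, keep-top-k, renormalize gating scheme, treats every expert as a scalar function, and the helpers `route_token`, `moe_forward` and `make_expert` are hypothetical. It shows that, for a given token, only the shared expert plus 8 of the 256 routed experts do any work.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the card
TOP_K = 8          # routed experts chosen per token, per the card


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: gate_weight} for the top-k experts of one token.

    Assumed gating scheme (softmax -> keep top-k -> renormalize); the card
    does not describe the actual router.
    """
    m = max(router_logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}


def moe_forward(x, router_logits, experts, shared_expert, top_k=TOP_K):
    """The shared expert always runs; only the selected routed experts are evaluated."""
    out = shared_expert(x)
    for i, gate in route_token(router_logits, top_k).items():
        out += gate * experts[i](x)
    return out


# Usage: count how many routed experts actually run for one token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
calls = [0] * NUM_EXPERTS


def make_expert(i):  # hypothetical toy expert that records its own calls
    def expert(x):
        calls[i] += 1
        return 0.5 * x
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]
y = moe_forward(1.0, logits, experts, shared_expert=lambda x: x)
print(sum(calls))  # → 8
```

Per-token compute scales with the experts that actually run, which is why active rather than total parameters set the inference cost; all 256 experts still have to stay resident because any of them may be selected for the next token.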
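
The claim that full-quality bf16 inference fits on a single 80GB A100 or H100 can be checked with back-of-the-envelope arithmetic: bf16 stores each parameter in 2 bytes, and an MoE must keep all of its parameters loaded even though only about 3B are active per token. The sketch uses decimal gigabytes, treats 80 GB as the nominal card capacity, and applies the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. It ignores KV cache, activations and framework overhead, which must fit in whatever headroom remains; the constant names are illustrative.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 is a 16-bit format


def weight_gb(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 35e9   # every expert must be loaded
ACTIVE_PARAMS = 3e9   # approximate parameters used per token, per the card
GPU_GB = 80.0         # nominal capacity of an 80GB A100/H100

weights = weight_gb(TOTAL_PARAMS)
headroom = GPU_GB - weights            # left for KV cache, activations, overhead
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token = 2 * ACTIVE_PARAMS    # rule of thumb: ~2 FLOPs per active parameter

print(f"bf16 weights: {weights:.1f} GB, headroom: {headroom:.1f} GB")  # → bf16 weights: 70.0 GB, headroom: 10.0 GB
print(f"active fraction: {active_fraction:.1%}, ~{flops_per_token:.0e} FLOPs/token")
```

The KV cache for a long reasoning trace grows with sequence length and has to fit in that headroom, so how comfortably a 5–30k-token trace fits depends on the attention layout, which this excerpt does not describe.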
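
Because the model thinks inside explicit reasoning blocks and can spend 5–30k tokens there within a 64k context, client code usually needs to separate the reasoning from the final answer and to leave enough generation budget for both. The card excerpt does not name the delimiters, so this sketch assumes `<think>` and `</think>`, the convention used by Qwen3-family chat templates, and reads "64k" as 65,536 tokens. Both are assumptions, and `split_reasoning` and `generation_budget` are hypothetical helpers, not part of any published API.

```python
OPEN_TAG = "<think>"        # assumed delimiter (Qwen3-family convention), not stated in the card excerpt
CLOSE_TAG = "</think>"      # assumed delimiter
CONTEXT_TOKENS = 64 * 1024  # assumption: "64k" read as 65,536 tokens


def split_reasoning(text, open_tag=OPEN_TAG, close_tag=CLOSE_TAG):
    """Split raw model output into (reasoning, final_answer)."""
    start = text.find(open_tag)
    begin = start + len(open_tag) if start != -1 else 0
    end = text.find(close_tag, begin)
    if end == -1:
        if start != -1:
            # Block opened but never closed: generation was probably cut off
            # before the final answer was written.
            return text[begin:].strip(), ""
        return "", text.strip()  # no reasoning block at all
    # If only the closing tag appears (the opening tag may be part of the prompt),
    # everything before it is treated as reasoning.
    return text[begin:end].strip(), text[end + len(close_tag):].strip()


def generation_budget(prompt_tokens, context_tokens=CONTEXT_TOKENS):
    """Tokens left for reasoning plus the final answer once the prompt is placed."""
    return max(0, context_tokens - prompt_tokens)


reasoning, answer = split_reasoning("<think>\n2 + 2: add the units.\n</think>\nThe answer is 4.")
print(answer)                                     # → The answer is 4.
print(generation_budget(2_000))                   # → 63536
print(split_reasoning("<think>\nstill working"))  # → ('still working', '')
```

If the generation limit runs out mid-reasoning, the block never closes and no answer is produced, which `split_reasoning` reports as an empty answer; raising the limit toward the remaining budget is the usual remedy.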
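
In the standard LoRA formulation, an adapter stores two low-rank matrices per adapted weight, and applying it adds their scaled product to the frozen base weight: W' = W + (alpha / r) · B · A. The sketch folds such an update into a tiny matrix in pure Python to show what "applying" a separately published adapter means. It illustrates the generic technique only; the published adapter's rank, alpha and target modules are not given in this excerpt, and the values and helper names (`matmul`, `merge_lora`) below are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A.

    W: d_out x d_in base weight, A: r x d_in, B: d_out x r (standard LoRA shapes).
    """
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Hypothetical rank-1 adapter on a 2 x 2 identity weight, alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x d_in = 1 x 2
B = [[1.0], [0.0]]  # d_out x r = 2 x 1
print(merge_lora(W, A, B, alpha=2.0))  # → [[3.0, 4.0], [0.0, 1.0]]
```

Because the merged weight has the same shape as W, a merged model runs at the base model's inference cost, while keeping the adapter separate lets several adapters share one copy of the base weights. Hugging Face PEFT exposes this fold-in step as `merge_and_unload()` on a `PeftModel`.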
