model

Qwen3-0.6B

15189206 downloads·1189 likes·text-generation·transformers

from the model card

Qwen3-0.6B

Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.

Significantly enhanced reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.

Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.

Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.

Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Model Overview

Qwen3-0.6B…
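The thinking mode mentioned in the model card excerpt above means a response can carry a reasoning segment ahead of the final answer. The following is a minimal sketch of separating the two, assuming the raw output wraps its reasoning in `<think>...</think>` tags; the helper name `split_thinking` and the sample string are hypothetical and not taken from the model card.

```python
import re
from typing import Tuple

# Assumption: reasoning is delimited by <think>...</think> in the decoded text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response: str) -> Tuple[str, str]:
    """Split a decoded response into (thinking, answer).

    If no <think> block is present (e.g. non-thinking mode), the thinking
    part is an empty string and the whole response is the answer.
    """
    match = THINK_BLOCK.search(response)
    if match is None:
        return "", response.strip()
    thinking = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return thinking, answer


# Hypothetical sample output, for illustration only.
raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
thinking, answer = split_thinking(raw)
print(thinking)
print(answer)
```

Working on the decoded string keeps the sketch independent of any particular tokenizer or inference runtime; a real pipeline may prefer to split on token IDs instead.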

discussions

Qwen 3 · 9 items · ongoing since 2026-04-13

recent items

Show HN: Flint – A 30B model fine-tuned for less repetition (springboards.ai via hn) 5 pts· 21h

Frontier LLMs have very little output diversity, even for open-ended queries. We built Flint to see if we could reverse this.

↯ Qwen 3 mmlu
Compile English function descriptions into 22MB neural programs that run locally via llama.cpp (www.reddit.com via reddit) 15 pts·7 replies· 1d

We built a system where a neural compiler takes a plain-English function description and produces a "neural program" (a combination of a continuous LoRA adapter and a discrete pseudo-program). At inference time, these adapt a fixed interpr…

↯ Qwen 3 llama
Deepseek-r1 thinks for 30 minutes? (www.reddit.com via reddit) 9 replies· 20h

I was trying to ask a question about coding using DeepSeek-R1-0528-Qwen3-8B-Q4_K_M, and the thinking took 30 minutes??? https://preview.redd.it/kex3fgg4lgvg1.png?width=277&format=png&auto=webp&s=5f7e7cdc8502b935ea8b8fb83e0e4af60c3c4533 I h…

↯ Qwen 3 deepseek
Potential Local LLM Setup Question (www.reddit.com via reddit) 2 replies· 23h

I want to set up a local coding llm, maybe with Qwen3:30BA3B (i have heard it's good). I want to use what I have as much as possible, I have an old desktop with a Ryzen 5600G and 16GB DDR4 RAM.

↯ Qwen 3
Qwen 3 Coder Next has a bug! Help Test? (www.reddit.com via reddit) 31 replies· 1d

Hey y'all. So I've stumbled upon a really specific and esoteric "bug" where an llm can't comprehend a URL in like, 90% of scenarios.

↯ Qwen 3 qwen
Inference on Qwen3-Coder-480B-A35B-Instruct with 4xH200 (www.reddit.com via reddit) 7 replies· 1d

Hello guys, I want to run inference on Qwen3-Coder-480B-A35B-Instruct. I have a 4xH200 machine.

↯ Qwen 3
Built a Japanese ASR benchmark because existing ones can't measure quality differences properly (www.reddit.com via reddit) 9 pts· 2d

↯ Qwen 3 fine-tuning
ClaudeCodeCLI vs OpenCode vs Cline vs QwenCode (www.reddit.com via reddit) 2 pts·4 replies· 2d

↯ Qwen 3 cline mcp
openrouter/elephant-alpha is 99% Chinese, likely Qwen 3 Nex (www.reddit.com via reddit) 4 replies· 2d

↯ Qwen 3 qwen
