We have summaries annotated by real humans that we use to benchmark various models, with an LLM as the judge. We found that in the 30B-parameter range, Qwen 3 comes out on top, followed by Gemma 4. It feels like newer Qwens are optimized to perform agenti…
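Below is a minimal sketch of the scoring step described above. It assumes the judge returns one numeric score per (model summary, human reference) pair, and that models are ranked by mean score. `rank_models` and the judge interface are hypothetical. `toy_judge` is a word-overlap stand-in, used here only so the example runs without an LLM call.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical judge interface: (candidate_summary, human_reference) -> score.
# In a real harness this would wrap an LLM call with a grading prompt.
Judge = Callable[[str, str], float]


def rank_models(
    outputs: Dict[str, List[str]],
    references: List[str],
    judge: Judge,
) -> List[Tuple[str, float]]:
    """Score each model's summaries against the human references; rank by mean score."""
    results = []
    for model, summaries in outputs.items():
        if len(summaries) != len(references):
            raise ValueError(
                f"{model}: expected {len(references)} summaries, got {len(summaries)}"
            )
        scores = [judge(summary, ref) for summary, ref in zip(summaries, references)]
        results.append((model, mean(scores)))
    # Highest mean judge score first
    return sorted(results, key=lambda item: item[1], reverse=True)


def toy_judge(summary: str, reference: str) -> float:
    """Stand-in for an LLM judge: fraction of reference words that appear in the summary."""
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & set(summary.lower().split())) / len(ref_words)


# Toy inputs for illustration only, not benchmark data
references = ["the court dismissed the appeal"]
outputs = {
    "model_a": ["the appeal was dismissed by the court"],
    "model_b": ["a new trial was ordered"],
}
ranking = rank_models(outputs, references, toy_judge)
print(ranking)
```

Keeping the judge as a plain callable means the LLM client can be swapped in later without changing the aggregation code.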
model
Qwen3-0.6B
huggingface.co/Qwen/Qwen3-0.6B
15,189,206 downloads · 1,189 likes · text-generation · transformers
from the model card
Qwen3-0.6B

Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Model Overview

Qwen3-0.6B…
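The thinking/non-thinking switch is chosen at prompt time. The Qwen3 model card documents an `enable_thinking` argument to `tokenizer.apply_chat_template` for this. In thinking mode, the model writes its reasoning inside a `<think>...</think>` block before the final answer.

Below is a minimal string-level sketch for separating the two parts, assuming that tag layout. `split_thinking` is a hypothetical helper. The model card's own example works at the token level instead.

```python
from typing import Tuple


def split_thinking(text: str) -> Tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, final answer).

    Assumes the reasoning, if present, is wrapped in <think>...</think>
    and precedes the answer. Without a closing tag, the whole text is
    treated as the answer (e.g. non-thinking mode).
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


thinking_output = "<think>\n2 + 2 = 4, so answer 4.\n</think>\n\nThe answer is 4."
print(split_thinking(thinking_output))
print(split_thinking("Hello! How can I help?"))
```

Splitting on the closing tag, rather than requiring both tags, also covers output where the opening `<think>` was already part of the prompt.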
discussions
recent items
Newer Qwen models are worse at summarization? (www.reddit.com via reddit)
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization (www.reddit.com)
Levi: Run AlphaEvolve on your local QWEN 30B (www.reddit.com via reddit) Hi r/LocalLLaMA, I wanted to share something I'm excited about. I've been fascinated by AlphaEvolve and its results for more than a year now, but running the open-source frameworks gets expensive fast.
Looking for a local "NotebookLM for lawyers" setup: what am I doing wrong? (www.reddit.com via reddit) Hello everyone, I am totally new to local LLMs and have only used ChatGPT/Claude/NotebookLM before, so bear with me. I'm an attorney and would like to analyze and summarize case files locally for privacy/confidentiality reasons.
Gemma 4 E4B as a primary local LLM (replaced Qwen) (digg.com via hn) Gemma 4 E4B 6-bit is now my local model of choice and is loaded 24/7 on my Mac (using @lmstudio), replacing Qwen3, 3.5 4B after ~9 months of usage. What an insane model, congrats @GoogleDeepMind! The new setup replaces his nine-month daily…
Claude Code 2.1.165 + Ollama (qwen3:8b / qwen2.5-coder:7b) instantly throws "response exceeded 32000 output token maximum" even for "hi" (www.reddit.com via reddit) I'm trying to use Claude Code with local Ollama models, but every prompt fails with: The strange part is that it happens even for extremely small prompts like "hi", "say apple", or "What is 1+1? Answer with only one character."
Initial testing with llama-bench and 3 different Qwen3 models for my R9700 32GB (www.reddit.com via reddit) In a recent build, I used dual R9700 32GB cards, but I wanted to see how a single R9700 stacked up against other hardware I had access to. I created a simple benchmark with llama-bench and ran it on a few different setups.
PSA: Gemma 4 12B is NOT completely broken for coding and tool calling, you need a special chat template (www.reddit.com via reddit) This is a PSA for people like me who tried it and hit the wall with tool calls failing left and right, so much so that harnesses like OpenCode just didn't work: There is a fix for that. You need to pass a better chat template file, which i…
Tuning CPU-only Qwen3-30B inference with an IBM Quantum sampling loop (github.com via hn) Qwen Air QPU/MCP Lab: Quantum-enhanced autoresearch for high-performance, CPU-only Mixture-of-Experts LLM inference on legacy hardware. This repository contains the benchmark harness, MCP-style tool boundary, experiment logs, paper draft, a…
Turning every "no that's not what I meant" in chat into actual LoRA training data (www.reddit.com) I kept running local models on my own hardware. They'd say something dumb, I'd sit there going "no, that's not what I meant", I'd close the chat, and the model never learned. So I built the correction loop into a desktop app.
I ran a quantization shootout on Qwen3-Coder and the results are... interesting (www.reddit.com) Out of random curiosity, I ran a shootout on Qwen3-Coder-Next. I've been using the MXFP4_MOE from unsloth for a while, as it's just really fast on my system.
New Release of ROCm based MLX LLM Engine - lemon-mlx-engine (www.reddit.com) Hey everyone, lemon-mlx-engine just finished integrating TheRock / ROCm 7.13, which means you get to try the latest ROCm on your local hardware with the MLX engine! This also includes various bug fixes and kernel fi…
My pipeline for the best speech-to-transcript results (www.reddit.com) I had hoped the new ASR (automatic speech recognition) models would give me accurate output, but I was disappointed, especially when the input was multilingual and noisy (all my use cases). I had to put significant effort into audio pre/post…
Came home to find Pi with Qwen3.627B had run rm -rf ..... (www.reddit.com) on the build cache because it had run my computer out of disk space. So I assign my coding agent (pi) a task, and then leave the house.
MiroThinker-1.7, an open-weight deep research agent (Qwen3 MoE base): mini is 30B/3B active, curious what tok/s people get on consumer hardware (www.reddit.com) As usual, disclosure first: I'm on the team that built this. Our MiroThinker-1.7-deepresearch and 1.7-mini-deepresearch API went live. Mini is a deep research agent built on Qwen3 MoE (30B total, 3B active).
I trained TIME: short context-triggered thinking on a Qwen model instead of overthinking (www.reddit.com) Started this as a personal project for my Open-WebUI setup. Somehow it ended up as an ACL 2026 paper.
Seeking resources to read about llama.cpp server and how offloading works (www.reddit.com) SETUP INFO: Amd R9700 AI PRO. Using llama-cpp server, ROCM docker version.
Built a personal Jarvis-style AI using MCP and open models (www.reddit.com) Still heavily work in progress, but I finally built a personal Jarvis-style AI using MCP and open models. It currently supports memory, autonomous file editing, visible tool-call tracing, confirmation before dangerous actions, persistent c…
Predicting Rare LLM Failures with 30× Fewer Rollouts (www.lesswrong.com via hn) TL;DR: We estimate how often Qwen 3 4B exhibits rare harmful behaviors with 30× fewer rollouts than naive sampling, using a new method that interpolates between the model and a less-safe variant in logit space. Authors: Francisco Pernice (…
A Qwen finetune that feels VERY human (www.reddit.com) Hello guys, so, TL;DR: I was asked by multiple people to make an Assistant_Pepe_32B version, but the best base-model contender was Qwen3-32B, a model that is very hard to tune on anything other than STEM. The concept of Assistant_Pepe is an…
Two flat-fee agent endpoints, no token meter: OpenClaw chat ($7/mo, 128K ctx) + All You Can Code ($19/mo, 256K ctx). OpenAI v1. (www.reddit.com) For anyone running agents (coding or otherwise) who'd rather pay a flat fee than meter tokens. Two tiers, both flat fee, both unlimited: OpenClaw ($7/mo) - Nemotron-3-Nano-Omni-30B-A3B - 128K context - For general-purpose agents: research,…
As of May 2026, LongCat Dit 3.5B and Moss TTS 8B are the best SOTA TTS models, and Qwen TTS is not even close. (www.reddit.com) [Disclaimer: I am completely excluding Fish Audio S2 Pro because it's not a truly open-source model (non-commercial license).] For context, I asked many AIs for the best TTS model as of now, and most of them said Qwen 3 TTS, Voxtral, etc.…
How are you all handling state for long-running agents? Stateless sandboxes are eating my evenings (www.reddit.com) OK, I want to know if I am the only one. I've been running a local coding agent against Qwen3 Coder on a 4090 box, with a remote sandbox for the actual code execution.
Case Study: Dogfooding a Facebook Agent Before Deploying It to a Realtor (www.reddit.com) A real estate firm came to us wanting an AI agent that could run their Facebook page. Not a scheduler.
Qwen/WebWorld 32B/14B/8B (Qwen3 finetune) (www.reddit.com) WebWorld is a large-scale open-web world model series for training and evaluating web agents. It is trained on 1M+ real-world web interaction trajectories via a scalable hierarchical data pipeline, supporting: Long-horizon simulation (30+…
Show HN: Per-request emotion steering for vLLM, with batching preserved (github.com via hn) emotion-steering: Extract and serve CAA-style emotion steering vectors for any HuggingFace causal LM, with a fast vLLM path for Qwen3. (Pipeline diagram: labeled contrasts → extract → vectors + AUC report → serve…)
Code's open. Tried building a fully real-time on-device voice assistant + live translator on a phone (multilingual, STT→LLM→TTS, all local) on the Tether QVAC SDK. (www.reddit.com) Wanted to see if a real voice loop (speak, model thinks, speaks back) could run entirely on a single device today, no cloud. The same codebase doubles as a live translator (speak in language A, hear it back in language B).
[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost (dl.acm.org via reddit)
Looking for Small VLM/MLLM Alternatives to Qwen Series Models (www.reddit.com) I have tried the Qwen 3 VL family of models on my RTX 3060; the largest I can load is the 8B at Q8. The task is visual reasoning/instruction following.
Qwen Meetup Draft Review Required (Function Calling Harness 2 - CoT Compliance from 9.91% to 100%) (autobe.dev via reddit) Talk at Qwen Meetup Korea end of May. Looking for review on this draft before I build PPT slides off it.
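Below is a back-of-the-envelope check on the Hummingbird+ item (Qwen3-30B-A3B Q4 at 18 t/s in 24GB). It rests on several assumptions:
- Token generation is memory-bandwidth bound.
- A Q4 quant averages about 4.5 bits per weight once scales are included. This is an assumption, and actual Q4 variants differ.
- Each generated token reads only the ~3B active parameters, taking the "30B total, 3B active" split from the model name.
- KV-cache reads and activations are ignored.

The helper names are hypothetical.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def decode_bandwidth_gbs(active_params: float, bits_per_weight: float, tokens_per_s: float) -> float:
    """Memory bandwidth (GB/s) needed if each token reads all active weights once."""
    return weight_gb(active_params, bits_per_weight) * tokens_per_s


BITS_Q4 = 4.5  # assumed effective bits/weight for a Q4 quant, including scales

total_gb = weight_gb(30e9, BITS_Q4)  # the whole MoE must stay resident in memory
needed_gbs = decode_bandwidth_gbs(3e9, BITS_Q4, 18)  # only ~3B active params per token

print(f"weights: {total_gb:.1f} GB, bandwidth at 18 t/s: {needed_gbs:.1f} GB/s")
```

Under these assumptions, the weights take about 16.9 GB, which fits in 24GB. Sustaining 18 t/s needs roughly 30 GB/s of effective weight bandwidth. This is why a MoE with few active parameters is a plausible target for low-cost hardware.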