#qwen

53 items

Qwen3.6-35B-A3B released! www.reddit.com

Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.

↯ Qwen 3.6

moe qwen agentic

reddit-localllama · 678 pts · 234 replies · 6h

Gemma4 26b & E4B are crazy good, and replaced Qwen for me! www.reddit.com

My pre-Gemma 4 setup was as follows: Llama-swap, open-webui, and Claude code router on 2 RTX 3090s + 1 P40 (my third 3090 died, RIP) and 128 GB of system memory. Qwen 3.5 4B for semantic routing to the following models, with n_cpu_moe where…

↯ Qwen 3.5

qwen llama gemma+2

reddit-localllama · 392 pts · 100 replies · 1d

Released Qwen3.6-35B-A3B www.reddit.com

https://x.com/Alibaba_Qwen/status/2044768734234243427 https://huggingface.co/Qwen/Qwen3.6-35B-A3B

↯ Qwen 3.6

qwen

reddit-localllama · 207 pts · 38 replies · 6h

I got it guys, I think I finally understand why you hate censored models www.reddit.com

I was trying to do an easy task automatically with qwen-code using qwen3.5-122b. I could totally do it myself, but I wanted to see whether it could just do it entirely for me. But no: it refused.

↯ Qwen 3.5

qwen

reddit-localllama · 120 pts · 54 replies · 21h

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade www.reddit.com

Time and time again I find posts about these fine tunes that promise increased intelligence and reasoning with base models, and I continuously try them, realize they're botched, and delete them shortly after. I sometimes do resort to a low…

↯ Claude 4.6

qwen opus claude

reddit-localllama · 111 pts · 68 replies · 1d

Do you guys think there’s a high chance of Singularity being open source? www.reddit.com

glm qwen gemma+1

reddit-singularity · 74 pts · 67 replies · 4d

Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? www.reddit.com

I have a 5090, so my VRAM is limited to 32GB, but I find that Qwen3.5-27B-UD-Q5_K_XL with opencode (and mmproj) does a pretty good job for my use case (mainly web development). I use Claude and Codex here and there, recently a lot less, be…

↯ Qwen 3.5

qwen codex claude

reddit-localllama · 43 pts · 56 replies · 3d

PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. www.reddit.com

I'm sure everyone has seen the posts from people talking about Qwen 3.5 over-thinking, or maybe you've experienced it yourself. Considering we're like 2 months out from the release and I still see people talk about this issue, I decided it…

↯ Qwen 3.5

qwen

reddit-localllama · 34 pts · 11 replies · 2d

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) www.reddit.com

I'm not sure whether AesSedai's Q5_K_M version of Minimax M2.7 is too heavily lobotomized or the model itself is just kind of weak. I ran a simple experiment with both models using the recommended parameters.

↯ Qwen 3.5

minimax qwen

reddit-localllama · 19 pts · 34 replies · 1d

[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second www.reddit.com

↯ Qwen 3.5

qwen gemma

reddit-localllama · 12 pts · 8 replies · 3d

2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me www.reddit.com

Hello, I've been on a quest to get something "close enough" to Opus 4.5 running locally for agentic coding, as a SWE with 15 years of experience. I tried with one Spark (yeah, I'm calling my Asus Ascent GX10s Sparks; they're the same), with…

↯ Qwen 3.5

minimax qwen agentic+1

reddit-localllama · 11 pts · 10 replies · 2d

Single-question LLM comparison www.reddit.com

haiku grok glm+6

reddit-chatgptcoding · 10 pts · 1 reply · 8w

GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s www.reddit.com

Hey all, looking for some real-world advice on GPU choices for running the new dense models, mainly Qwen 3.5 27B and Gemma 4 31B. What I'm targeting: context 64K+ (ideally higher later); speed 30+ tok/s @ tg128 minimum; power: not critical…

↯ Qwen 3.5

moe qwen gemma

reddit-localllama · 9 pts · 78 replies · 17h

Tell HN: Qwen Free Tier Is Discontinued news.ycombinator.com

I kept getting 401 'token expired' errors on my existing Qwen session. Attempting to resume it after quitting, I got: qwen resume [API Error: 401 invalid access token or token expired] [API Error: 401 invalid access token or token expired]…

qwen

hn · 6 pts · 4 replies · 14h

Llama.cpp vs LM Studio on gaming PC www.reddit.com

Here is my experience: I've been using LM Studio with an RTX 5080 and 64GB RAM on Windows 11. I'm very happy with LM Studio except for the speed.

↯ Gemma 4

qwen llama gemma

reddit-localllama · 6 pts · 6 replies · 18h

Qwen 3.5 122B A10B running 50tok/s on DGX SPARK / Asus Ascent www.reddit.com

Hello guys, wanted to share this: https://github.com/albond/DGX_Spark_Qwen3.5-122B-A10B-AR-INT4. I am running it on my DGX Spark (Int4 V2) with the max context window and getting 50 tok/s with Multi-Token Prediction. It's working great for tool…

↯ Qwen 3.5

qwen

reddit-localllama · 5 pts · 10 replies · 2d

Alibaba's Qwen family captures over 50% of global open-source model downloads www.scmp.com

qwen

hn · 4 pts · 4 replies · 2d

A proxy routing all web traffic through Qwen, removing all enshittified crap geohot.github.io

zappa: an AI-powered mitmproxy. Soon, AI will be good enough to interact with the Internet in a way indistinguishable from a human. This can be an amazing opportunity for liberation from all the people who are targeting your attention.

qwen

hn · 3 pts · 7h

Summarizing text locally, medical literature www.reddit.com

Colleagues, I have a question: does anyone have a locally developed solution for summarizing text? Which quant of Qwen 3.5 27B would be able to summarize an entire chapter of medical literature, about 25-30 A4 pages, without hallucinations?

↯ Qwen 3.5

qwen

reddit-localllama · 3 pts · 7 replies · 1d

Qwen OAuth Free tier will be discontinued on 2026-04-15 github.com

An open-source AI agent that lives in your terminal. 🎉 News 2026-04-15: Qwen OAuth free tier has been discontinued.

qwen

hn · 3 pts · 1 reply · 1d

RTX 3090 llama.cpp flags help www.reddit.com

↯ Gemma 4

qwen llama gemma

reddit-localllama · 3 pts · 3 replies · 2d

Can I combine a RTX5060ti 16gb with 7900XTX 24gb for llama.cpp? www.reddit.com

↯ Qwen 3.5

qwen llama

reddit-localllama · 3 pts · 9 replies · 2d

NVIDIA + UMD released AF-Next: open audio-language model that outperforms Gemini-2.5-Pro on MMAU-Pro (75.01% vs 57.4%). Temporal Audio Chain-of-Thought anchors reasoning to timestamps. www.aiuniverse.news

qwen gemini

reddit-localllama · 3 pts · 2d

Qwen 3.5 Small – on-device multimodal models – Alibaba / Qwen ai-tldr.dev

↯ Qwen 3.5

qwen

hn · 3 pts · 2d

Show HN: Hitoku Draft – context aware local macOS assistant github.com

↯ Qwen 3.5

qwen gemma

hn · 3 pts · 3d

LoRA training www.reddit.com

↯ Qwen 3.5

qwen

reddit-localllama · 3 pts · 7 replies · 3d

Best LLM for logic/spatial reasoning on small context inputs? www.reddit.com

My system has 32gb RAM and 8gb VRAM. I tried out DeepSeek-R1-Distill-Qwen-7B-Q6_K_L.gguf and it was vastly inadequate for what I wanted so looking for other suggestions.

deepseek qwen

reddit-localllama · 2 pts · 2 replies · 19h

Been trying to get Qwen 3.5 to stop reasoning using old methods like /no_think, it didn't work, but it said something like "too late" in its reasoning www.reddit.com

Wait, I need to be careful about the "no_think" tag in the system prompt. The system prompt says /no_think.

↯ Qwen 3.5

qwen

reddit-localllama · 2 pts · 4 replies · 1d

Qwen 122B is AMAZING but is my config right? (128GB M4 Max) www.reddit.com

Hi! I hope it's okay for me to ask this here.

↯ Qwen 3.5

qwen llama

reddit-localllama · 2 pts · 9 replies · 1d

Open-source Perplexity alternative: ollama + perplexica + searxng: which model? Settings? Optimization? www.reddit.com

↯ Qwen 3.5

ollama qwen chatgpt+1

reddit-localllama · 2 pts · 2d

Dynamic tool lists vs KV cache: how do you handle this trade-off in LLM agents? www.reddit.com

qwen mcp

reddit-localllama · 2 pts · 8 replies · 2d

Local Coding Stacks www.reddit.com

I’m trying to reduce my reliance on Claude. I have a 5090/128GB RAM.

↯ Qwen 3.5

sonnet qwen gemma+1

reddit-localllama · 1 pt · 2 replies · 5h

Qwen3.5 35B is surely still one of the best local models (punching above its weight) - More Details www.reddit.com

Last time I posted about how this model performed at creating a web app based on a provided research paper. I was so happy to see how many people appreciated the post and, of course, the potential of this MoE model.

↯ Qwen 3.5

moe qwen

reddit-localllama · 1 pt · 1 reply · 1d

Best Ollama models/settings for an 8GB VPS (CPU only, ARM)? Running into memory & looping issues. www.reddit.com

Hi everyone, I'm trying to run a local LLM via Ollama on a Hetzner cax21 VPS (ARM64, 4 vCPUs, 8GB RAM, 80GB SSD). I have Ollama running successfully via Coolify.

ollama qwen gemma

reddit-localllama · 1 pt · 2 replies · 1d

Can an LLM make small changes to a software program? www.reddit.com

I'm currently vibe-coding (I'm new to vibe-coding) with Gemma 4 E4B Q4 and Qwen 3.5 9B Q5 (the KV cache is quantized to 4 bits with the new Google TurboQuant implemented in llama.cpp; I use koboldcpp, and the release notes said it's automatically activated): the…

↯ Qwen 3.5

qwen llama gemma

reddit-localllama · 1 pt · 4 replies · 1d

For AI agents: is per‑token pricing killing your budget? Looking for feedback on time‑based subscriptions. www.reddit.com

Hey r/AI_Agents, I run an inference service (cheapestinference.com) and we're exploring a different pricing model that might be more predictable for agent workloads. Instead of per‑token billing, we offer **dedicated 8‑hour time windows**…

deepseek qwen llama

reddit-ai_agents · 1 pt · 3 replies · 1d

Been out of the loop - Will this work for EXO/MLX? www.reddit.com

Had to sell my AI server and am down to an M4 MacBook Air 16GB. If I were to buy a used M1 Air with 16GB (run it headless) and connect the two via EXO + Thunderbolt, would it be possible to run a (19.6GB) Qwen 3.5-27B-Q5_K_M.gg…

↯ Qwen 3.5

qwen

reddit-localllama · 1 pt · 1 reply · 2d

DGX Spark www.reddit.com

↯ Qwen 3.5

vllm qwen llama+1

reddit-localllama · 1 pt · 5 replies · 2d

Is Local LLM (MCP) + Claude Code a Game Changer or Hype? Upgrading from 16GB M1 www.reddit.com

Hi everyone, I'm at a crossroads with my next Mac upgrade.

qwen mcp claude-code+1

reddit-localllama · 1 pt · 4 replies · 2d

MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help www.reddit.com

Hey all. This just got delivered yesterday.

↯ Qwen 2.5

deepseek ollama qwen+1

reddit-localllama · 8 replies · 7h

Need suggestions for local AI Machine www.reddit.com

I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).

↯ Gemma 4

minimax qwen openclaw+1

reddit-localllama · 11 replies · 10h

Lower inference speed of Gemma 4 26B-A4B on vLLM www.reddit.com

For my earlier use case I used to host Qwen 2.5 VL 7B GPTQ INT4. Now I was looking to switch to Gemma 4 26B-A4B, as it should improve both quality and latency, considering only 4B parameters are active.

↯ Qwen 2.5

vllm qwen

reddit-localllama · 8 replies · 11h

How much faster is Gemma 4 26B-A4B during inference vs 31B? www.reddit.com

I want to download one, and since I have an old GPU I usually run inference on the CPU, so I'm concerned about speed. One link on the web (I posted it, and the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…

↯ Qwen 3.5

moe qwen llama+1

reddit-localllama · 16 replies · 15h

I bought an 'AI-ready' NUC with an Intel Arc GPU. Ollama couldn't see it. Two days later, I had to build it from source. www.reddit.com

Got an ASUS NUC15 specifically for running Qwen locally on the Arc GPU. The marketing promised AI-ready performance.

ollama qwen

reddit-localllama · 10 replies · 17h

TinyGPU on Apple Silicon + RTX 5070 Ti: my real Qwen benchmarks vs Ollama/Metal www.reddit.com

I spent time setting up TinyGPU on an Apple Silicon Mac and comparing it against Ollama already installed locally. Short version: TinyGPU does work.

ollama qwen

reddit-localllama · 3 replies · 22h

Qwen 3 Coder Next has a bug! Help Test? www.reddit.com

Hey y'all. So I've stumbled upon a really specific and esoteric "bug" where an LLM can't comprehend a URL in like, 90% of scenarios.

↯ Qwen 3

qwen

reddit-localllama · 31 replies · 1d

Running models bigger than physical memory capacity www.reddit.com

Has anyone really tried running models bigger than physical memory capacity? I'd guess most users stick with running models that fit in DRAM + VRAM https://unsloth.ai/docs/models/qwen3.5 even Google Gemma 4 models are released with about 30+ bill…

↯ Qwen 3.5

qwen gemma

reddit-localllama · 14 replies · 1d

Laptop has AMD Radeon + RTX 3050 — Which GPU should I use and how do I force apps to use the RTX? www.reddit.com

↯ Qwen 2.5

qwen

reddit-localllama · 1 reply · 2d

Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) www.reddit.com

↯ Qwen 2.5

vllm moe qwen+1

reddit-localllama · 16 replies · 2d

The Mac Studio M5 Ultra Dilemma: Why does Apple make the memory tiers so awkward for LLM www.reddit.com

qwen

reddit-localllama · 21 replies · 2d

openrouter/elephant-alpha is 99% Chinese, likely Qwen 3 Nex www.reddit.com

↯ Qwen 3

qwen

reddit-localllama · 4 replies · 2d

Speed on M5 Pro 48GB www.reddit.com

Hey guys! How do you reckon a 30-50B model would run on a 48GB M5 Pro?

↯ Qwen 3.5

glm qwen gemma

reddit-localllama · 3d

Mac Studio Performance Suggestion for MiniMax www.reddit.com

↯ MiniMax 2.7

minimax ollama qwen

reddit-localllama ·12 replies ↗ ·3d ·summary
