Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
#qwen
53 items
Qwen3.6-35B-A3B released! www.reddit.com
Gemma4 26b & E4B are crazy good, and replaced Qwen for me! www.reddit.com My pre-Gemma 4 setup was as follows: llama-swap, open-webui, and Claude Code Router on 2 RTX 3090s + 1 P40 (my third 3090 died, RIP) and 128GB of system memory; Qwen 3.5 4B for semantic routing to the following models, with n_cpu_moe where…
Released Qwen3.6-35B-A3B www.reddit.com https://x.com/Alibaba_Qwen/status/2044768734234243427 https://huggingface.co/Qwen/Qwen3.6-35B-A3B
I got it guys, I think I finally understand why you hate censored models www.reddit.com I was trying to get qwen-code with qwen3.5-122b to do an easy task automatically. I can totally do it myself, but I wanted to see whether it could just do it entirely for me. But no, because it refused.
These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade www.reddit.com Time and time again I find posts about these fine tunes that promise increased intelligence and reasoning over the base models, and I continuously try them, realize they're botched, and delete them shortly after. I sometimes do resort to a low…
Do you guys think there’s a high chance of Singularity being open source? www.reddit.com
Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? www.reddit.com I have a 5090, so my VRAM is limited to 32GB, but I find that Qwen3.5-27B-UD-Q5_K_XL with opencode (and mmproj) does a pretty good job for my use case (mainly web development). I use claude and codex here and there, recently a lot less, be…
PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. www.reddit.com I'm sure everyone has seen the posts from people talking about Qwen 3.5 over-thinking, or maybe you've experienced it yourself. Considering we're like 2 months out from the release and I still see people talk about this issue, I decided it…
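The fix the PSA describes is the standard tool-calling loop: declare a tool schema, let the model emit a tool call, execute it, and feed the result back. A minimal sketch of the local dispatch side, assuming an OpenAI-compatible chat completions server (the tool name, schema, and the simulated model output below are illustrative, not from the post):

```python
import json

# Illustrative tool schema in the OpenAI-compatible format that most local
# servers (llama.cpp, vLLM, LM Studio) accept; all names here are made up.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

def calculator(expression: str) -> str:
    # Deliberately tiny: only two operands and one of + - * /.
    import operator
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    for sym, fn in ops.items():
        if sym in expression:
            a, b = expression.split(sym, 1)
            return str(fn(int(a.strip()), int(b.strip())))
    raise ValueError(f"unsupported expression: {expression}")

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool_calls entry to the local implementation."""
    args = json.loads(tool_call["function"]["arguments"])
    if tool_call["function"]["name"] == "calculator":
        return calculator(args["expression"])
    raise KeyError(tool_call["function"]["name"])

# Simulated model output: the shape of one tool_calls entry the chat
# completions API would return for this schema.
fake_call = {"function": {"name": "calculator",
                          "arguments": json.dumps({"expression": "6 * 7"})}}
print(dispatch(fake_call))  # -> 42
```

In a real loop the result string would be appended back as a `role: "tool"` message, which is what gives the model a concrete anchor and, per the post, cuts down the open-ended rumination.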
My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) www.reddit.com I'm not sure if AesSedai's Q5_K_M version of Minimax M2.7 is too lobotomized or if the model itself is just weak. I did a simple experiment with both models running with the recommended parameters.
[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second www.reddit.com
2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me www.reddit.com Hello, I've been on a quest to get something "close enough" to Opus 4.5 running locally, for agentic coding, as a SWE with 15 years of experience. I tried with one spark (yeah, I'm calling my Asus Ascent GX10s sparks - they're the same), with…
Single question llm comparison www.reddit.com
GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s www.reddit.com Hey all, looking for some real-world advice on GPU choices for running the new dense models — mainly Qwen 3.5 27B and Gemma 4 31B. What I’m targeting: Context: 64K+ (ideally higher later); Speed: 30+ tok/s @ tg128 minimum; Power: not critical…
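For targets like 64K context, whether everything fits next to the weights mostly comes down to KV-cache size, which is 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. A back-of-envelope sketch; the layer/head counts below are placeholders, not the actual Qwen 3.5 27B or Gemma 4 31B configs:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size: a K and a V vector per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# Hypothetical dense-model shape (NOT the real config of either model):
# 48 layers, 8 KV heads (GQA), head_dim 128, fp16 cache, 64K context.
gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     ctx=64 * 1024, bytes_per_elem=2) / 2**30
print(f"{gib:.1f} GiB")  # -> 12.0 GiB
```

Quantizing the cache (e.g. llama.cpp's `--cache-type-k`/`--cache-type-v` set to `q8_0` or `q4_0`) cuts this by 2–4x, which is often the difference between 64K fitting on a 24–32GB card or not.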
Tell HN: Qwen Free Tier Is Discontinued news.ycombinator.com I kept getting 401 'token expired' errors on my existing Qwen session. Attempting to resume it after quitting, I got: qwen resume [API Error: 401 invalid access token or token expired] [API Error: 401 invalid access token or token expired]…
Llama.cpp vs LM Studio on gaming PC www.reddit.com Here is my experience: I've been using LM Studio with an RTX 5080 and 64GB RAM on Windows 11. I'm very happy with LM Studio except for the speed.
Qwen 3.5 122B A10B running 50tok/s on DGX SPARK / Asus Ascent www.reddit.com Hello guys, wanted to share this: https://github.com/albond/DGX_Spark_Qwen3.5-122B-A10B-AR-INT4 I am running it on my DGX Spark Int4 V2 with max context window, and getting 50 tok/sec with Multi-Token Prediction. It's working great for tool…
Alibaba's Qwen family captures over 50% of global open-source model downloads www.scmp.com
A proxy routing all web traffic through Qwen, removing all enshittified crap geohot.github.io zappa: an AI-powered mitmproxy. Soon, AI will be good enough to interact with the Internet in a way indistinguishable from a human. This can be an amazing opportunity for liberation from all the people who are targeting your attention.
Summarizing text locally, medical literature www.reddit.com Colleagues, I have a question: does anyone have a locally developed solution for summarizing text? Which quant of qwen 3.5 27b would be able to summarize an entire chapter of medical literature, about 25-30 A4 pages, without hallucinations?
Qwen OAuth Free tier will be discontinued on 2026-04-15 github.com An open-source AI agent that lives in your terminal. 中文 | Deutsch | français | 日本語 | Русский | Português (Brasil) 🎉 News 2026-04-15: Qwen OAuth free tier has been discontinued.
RTX 3090 llamacpp flags help www.reddit.com
Can I combine a RTX 5060 Ti 16GB with 7900XTX 24GB for llama.cpp? www.reddit.com
NVIDIA + UMD released AF-Next: open audio-language model that outperforms Gemini-2.5-Pro on MMAU-Pro (75.01% vs 57.4%). Temporal Audio Chain-of-Thought anchors reasoning to timestamps. www.aiuniverse.news
Qwen 3.5 Small – on-device multimodal models – Alibaba / Qwen ai-tldr.dev
Show HN: Hitoku Draft – context-aware local macOS assistant github.com
Lora training www.reddit.com
Best LLM for logic/spatial reasoning on small context inputs? www.reddit.com My system has 32GB RAM and 8GB VRAM. I tried out DeepSeek-R1-Distill-Qwen-7B-Q6_K_L.gguf and it was vastly inadequate for what I wanted, so I'm looking for other suggestions.
Been trying to get Qwen 3.5 to stop reasoning using old methods like /no_think, it didn't work, but it said something like "too late" in its reasoning www.reddit.com Wait, I need to be careful about the "no_think" tag in the system prompt. The system prompt says /no_think.
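When a Qwen reasoning model ignores prompt-level switches like /no_think (as in the post above), a blunt fallback is stripping the reasoning block in post-processing. A minimal sketch, assuming the model wraps its reasoning in `<think>…</think>` tags as the Qwen 3 series does:

```python
import re

# Non-greedy, DOTALL so multi-line reasoning spans are matched;
# trailing \s* also removes the newline(s) after the closing tag.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> spans, keeping only the final answer."""
    return THINK_RE.sub("", text).strip()

raw = '<think>Wait, the system prompt says /no_think... too late.</think>\nThe answer is 4.'
print(strip_reasoning(raw))  # -> The answer is 4.
```

This hides the reasoning from the user but does not save any tokens; actually suppressing generation requires server-side support (e.g. a chat-template flag), which is exactly what the post found unreliable.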
Qwen 122B is AMAZING but is my config right? (128GB M4 Max) www.reddit.com Hi! I hope it's okay for me to ask this here.
Open-source Perplexity alternative: ollama + perplexica + searxng: which model? settings? optimization? www.reddit.com
Dynamic tool lists vs KV cache: how do you handle this trade-off in LLM agents? www.reddit.com
Local Coding Stacks www.reddit.com I’m trying to reduce my reliance on Claude. I have a 5090/128GB RAM.
Qwen3.5 35b is still one of the best local models (pulling above its weight) - More Details www.reddit.com Last time I posted about how this model performed in creating a webapp based on a provided research paper. I got so much love seeing that people appreciated the post and, of course, the potential of this MoE model.
Best Ollama models/settings for an 8GB VPS (CPU only, ARM)? Running into memory & looping issues. www.reddit.com Hi everyone, I'm trying to run a local LLM via Ollama on a Hetzner cax21 VPS (ARM64, 4 vCPUs, 8GB RAM, 80GB SSD). I have Ollama running successfully via Coolify.
Can an LLM make small changes to a software program? www.reddit.com I'm currently vibe-coding (I'm new to vibe-coding) with Gemma 4 E4B Q4 and Qwen 3.5 9B Q5 (KV is quantized to 4 bits with the new Google TurboQuant implemented in llama.cpp - I use koboldcpp, and the release said it's automatically activated): the…
For AI agents: is per‑token pricing killing your budget? Looking for feedback on time‑based subscriptions. www.reddit.com Hey r/AI_Agents, I run an inference service (cheapestinference.com) and we're exploring a different pricing model that might be more predictable for agent workloads. Instead of per‑token billing, we offer **dedicated 8‑hour time windows**…
Been out of the loop - Will this work for EXO/MLX? www.reddit.com Had to sell my AI server and am down to an M4 Macbook Air 16GB. If I were to buy a used M1 Air with 16GB (run it headless) and connect the two via EXO + Thunderbolt... would it be possible to run a (19.6GB) Qwen 3.5-27B-Q5_K_M.gg…
DGX spark www.reddit.com
Is Local LLM (MCP) + Claude Code a Game Changer or Hype? Upgrading from 16GB M1 www.reddit.com Hi everyone, I’m at a crossroads with my next Mac upgrade.
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help www.reddit.com Hey all. This just got delivered yesterday.
Need suggestions for local AI Machine www.reddit.com I’ve been running various AI harnesses like OpenClaw, ForgeCode, ClaudeCode, etc. Most of these are running via OpenRouter or Minimax (credits/subscription model).
Lower inference speed of Gemma4 26B-A4B on vLLM www.reddit.com For my earlier use case I used to host qwen 2.5 vl 7b gptq int4. Now I was looking to switch to Gemma4 26B-A4B, as it should improve performance as well as latency, considering only 4B parameters are active…
How much faster is Gemma 4 26B-A4B during inference vs the 31B dense model? www.reddit.com I want to download one, and I usually do inference on CPU since my GPU is old, so I'm concerned with speed. One link on the web (I posted it, but the post was removed): Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs sign…
I bought an 'AI-ready' NUC with an Intel Arc GPU. Ollama couldn't see it. Two days later, I had to build it from source. www.reddit.com Got an ASUS NUC15 specifically for running Qwen locally on the Arc GPU. The marketing promised AI-ready performance.
TinyGPU on Apple Silicon + RTX 5070 Ti: my real Qwen benchmarks vs Ollama/Metal www.reddit.com I spent time setting up TinyGPU on an Apple Silicon Mac and comparing it against Ollama already installed locally. Short version: TinyGPU does work.
Qwen 3 Coder Next has a bug! Help Test? www.reddit.com Hey y'all. So I've stumbled upon a really specific and esoteric "bug" where an LLM can't comprehend a URL in, like, 90% of scenarios.
running models bigger than physical memory capacity www.reddit.com has anyone really tried running models bigger than physical memory capacity? I'd guess most users stick with running models that fit in DRAM + VRAM https://unsloth.ai/docs/models/qwen3.5 even Google's Gemma 4 models are released with about 30+ bill…
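What makes over-capacity runs possible at all is that llama.cpp memory-maps GGUF files by default, so the OS pages weights in on demand (and evicts them under memory pressure) rather than loading the whole file up front. A minimal illustration of the underlying mmap mechanism in Python, with a throwaway temp file standing in for a model:

```python
import mmap
import os
import tempfile

# Create a stand-in "weights" file; only the pages we touch get read.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 * 256)   # 1 MiB of zeros
    f.seek(4096 * 100)              # a "tensor" deep inside the file
    f.write(b"QWEN")

with open(path, "rb") as f:
    # length=0 maps the whole file, but no data is read until accessed;
    # slicing triggers a page fault that pulls in just that page.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[4096 * 100: 4096 * 100 + 4]
    print(chunk)  # -> b'QWEN'
    mm.close()

os.remove(path)
```

The flip side is that once the working set exceeds RAM, the same mechanism degrades into constant SSD paging, which is why tokens/sec collapses in over-capacity runs; llama.cpp's `--no-mmap` flag opts out and allocates everything in RAM instead, failing outright when it doesn't fit.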
Laptop has AMD Radeon + RTX 3050 — Which GPU should I use and how do I force apps to use the RTX? www.reddit.com
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) www.reddit.com
The Mac Studio M5 Ultra Dilemma: Why does Apple make the memory tiers so awkward for LLMs www.reddit.com
openrouter/elephant-alpha is 99% Chinese, likely Qwen 3 Nex www.reddit.com
Speed on M5 Pro 48GB www.reddit.com Hey guys! How would you reckon a 30-50B model would run on a 48GB M5 Pro?
Mac Studio Performance Suggestion For minimax www.reddit.com