model

Qwen3.5-4B

10,156,913 downloads · 608 likes · image-text-to-text · transformers

from the model card

Qwen3.5-4B

Note: This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuance…

discussions

Qwen 3.5 · 15 · 2026-06-04 – 2026-06-13

recent items

DiffusionGemma made me rethink what memory bandwidth means for local agent inference (www.reddit.com via reddit) 2w

Been testing DiffusionGemma 26B A4B for the last few days and the bottleneck profile is completely different from autoregressive models. With autoregressive models you are compute-bound during prefill and memory-bandwidth-bound during deco…
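
The memory-bandwidth-bound decode regime mentioned in this post can be illustrated with a back-of-the-envelope estimate. This is a minimal sketch, not a benchmark: it assumes every active weight is read from memory once per generated token and ignores KV-cache reads, activations, and kernel overhead. The function name and the example numbers (400 GB/s, 4B active parameters, 4-bit weights) are hypothetical and are not taken from the post.

```python
def decode_tokens_per_second(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bytes_per_param: float,
) -> float:
    """Rough upper bound on autoregressive decode speed.

    Assumes each generated token requires streaming all active
    weights from memory exactly once (an idealized model).
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical example: 400 GB/s memory bandwidth, 4B active
# parameters, 4-bit weights (0.5 bytes per parameter).
estimate = decode_tokens_per_second(400, 4, 0.5)
print(f"Upper bound: about {estimate:.0f} tokens/s")
```

Under these assumptions the bound scales linearly with bandwidth and inversely with the number of active bytes, which is why quantization and sparse activation both raise the ceiling on decode speed.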

↯ Qwen 3.5 agentic
How do I prevent llama.cpp from offloading to swap? (www.reddit.com via reddit) 2w

I have tried preventing this issue by using llama.cpp flags. However, I still have the issue: whenever I'm close to my 96GB of RAM, llama-server / llama.cpp decides to offload the KV cache onto my swap.

↯ Qwen 3.5 qwen llama
NVFP4 with llama.cpp - FAQs? (www.reddit.com via reddit) 2w

Let's clarify everything related to NVFP4 in this thread. Sharing a few questions & links here.

↯ Qwen 3.5 llama
Ask HN: Can I run any local LLM without a GPU for a local agentic AI workflow? (news.ycombinator.com) +3 2w

A Claude Code-like agentic workflow AI is too costly for me. Is there any LLM I can run with VSCode on the setup below? 16 GB RAM, Intel Core i7 H-series processor (13th gen), 512 GB NVMe SSD. I want to run the AI as a local agentic workflow with VSCode. I want to use LLAMA agent…

↯ Qwen 3.5 llama agentic claude-code
Hot Take "Rigid code is better than Flexible code if you're on a budget" (www.reddit.com via reddit) 2w

I've spent the last six months trying to build a fully local, agentic pipeline for a text-processing and extraction tool I use daily. Because I’m running everything on a single consumer GPU setup, my choices are limited to smaller, quanti…

↯ Qwen 3.5 gemma qwen agentic
I have 4x 128 GB VRAM now, what should I do? (www.reddit.com via reddit) 2w

↯ Qwen 3.5 vllm qwen
nice_meme (www.reddit.com via reddit) 2w

https://preview.redd.it/z66h627yi96h1.png?width=1080&format=png&auto=webp&s=94040bb76c0f8099b58927771c2193dd6a5019da qwen3.5 9b at 0 bit quant>>>>>>>copilot

↯ Copilot ↯ Qwen 3.5 copilot
[Opinion/Benchmark] Gemma4-12B's architecture change is too big of a tradeoff; A quick reasoning comparison between Gemma4-12B and Qwen 3.5-9B (www.reddit.com via reddit) 2w

I took the liberty of testing both models today on my favorite benchmark question, head to head. Device: Apple Mac M3 Max 64GB. Environment: llama.cpp, all defaults. Gemma4-12B's token generation speed: 47 tps with MTP and 2 predicted tokens 29…

↯ Qwen 3.5 qwen llama
Nex N2 has a funny "few words do trick" reasoning (www.reddit.com via reddit) 2w

↯ Qwen 3.5 qwen
Preferred two LLM combo (www.reddit.com via reddit) 2w

I’m using my MacBook Pro M1 Pro with 32GB to run Qwen3.5-35B in Q4 as my coding agent. I have a gaming PC with a 5070 Ti that I’m currently not using but would like to.

↯ Qwen 3.5
Dense vs MoE quantization resilience (www.reddit.com via reddit) 2w

Which one is more resilient to quantization? Especially at 4-bit?

↯ Qwen 3.5 moe qwen
Running Hermes fully local (www.reddit.com via reddit) 2w

Before Hermes was announced, I was working on my own fully local, personal agentic system. Now, I'm a novice when it comes to coding.

↯ Qwen 3.5 agentic
It felt good to return my Asus Spark (www.reddit.com via reddit) 2w

It's an incredible little package, but too high a price to pay for the performance, and I simply didn't want to be part of the great "Superchip lie" - it could be super, but it's super ruined by its limited memory bandwidth even thoug…

↯ Qwen 3.5 moe qwen
Launch HN: General Instinct (YC P26) – Frontier models on edge devices (news.ycombinator.com) +42 2w

Hey HN, Guanming and Bill here from General Instinct (https://general-instinct.com/). After years of working in robotics, we kept running into the same problem: the best models never fit the hardware we actually had available.

↯ Qwen 3.5 moe
Show HN: Hitoku Draft – Context aware local assistant (hitoku.me via hn) +5 3w

Hi guys. I have been working on Hitoku Draft, an open-source, voice-first AI assistant that runs entirely locally.

↯ Qwen 3.5 gemma qwen
