model

Qwen2.5-1.5B-Instruct

huggingface.co/Qwen/Qwen2.5-1.5B-Instruct

13,065,099 downloads · 699 likes · text-generation · transformers

from the model card

Qwen2.5-1.5B-Instruct

Introduction

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON.
- More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 1.5B Qwen2.5 model, which has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
- Number of Parameters: 1.54B
- Number of Parameters (Non-Embedding): 1.31B
- Number of Layers: 28
- Number of Attention Heads (GQA): 12 fo…
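The parameter counts quoted in the model card excerpt above imply two numbers that are useful when sizing hardware: how many parameters sit in the (tied) embedding table, and roughly how much memory the raw weights need. The sketch below works both out from the quoted figures only. The bytes-per-weight table and the helper names are assumptions for illustration, not part of the model card, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Figures quoted in the model card excerpt, in millions of parameters.
TOTAL_PARAMS_M = 1540          # "Number of Parameters: 1.54B"
NON_EMBEDDING_PARAMS_M = 1310  # "Number of Parameters (Non-Embedding): 1.31B"

# Assumption (not from the card): storage cost per weight for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def embedding_params_m(total_m: int, non_embedding_m: int) -> int:
    """Parameters attributed to embeddings, in millions.

    With tied word embeddings the input and output tables share weights,
    so this difference covers a single table.
    """
    return total_m - non_embedding_m


def weight_size_gb(params_m: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return params_m * 1_000_000 * bytes_per_param / 1e9


if __name__ == "__main__":
    emb = embedding_params_m(TOTAL_PARAMS_M, NON_EMBEDDING_PARAMS_M)
    print(f"embedding parameters: ~{emb}M")
    for dtype, nbytes in BYTES_PER_PARAM.items():
        size = weight_size_gb(TOTAL_PARAMS_M, nbytes)
        print(f"{dtype:>8}: ~{size:.2f} GB of weights")
```

Because the card rounds both counts to two decimals, the results are approximate to within a few tens of millions of parameters.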

discussions

Qwen 2.5 7 2026-05-17 – 2026-05-30

recent items

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B (www.reddit.com) 19 4w

I have been coming to this subreddit to understand what the optimal config is to run a model on a given hardware setup. I referred to specific benchmarks, but they are too generic and do not consider the underlying hardware.

↯ Qwen 2.5 vllm llama
Hermes w/cloud LLM and w/local LLM does it work? (www.reddit.com) +13 4w

I’ve tried openclaw locally for about a month. Hardware: M5 Pro w/48 gb ram.

↯ Qwen 2.5 ollama gpt-5 openclaw+1
My experience using Claude code with Local Llm, and full guide on how to set it up (www.reddit.com) +2511 4w

Wanted to share a workflow I tested on a real flight, in case anyone else is trying to set up offline Claude Code. The core idea: use ollama to pull the model you need, then use it to run Claude Code. The setup, in orde…

↯ Qwen 2.5 ollama claude-code
Show HN: Charm – on-device spelling, grammar, and prediction for macOS (www.theodorehq.com via hn) +2 4w

I've spent the last year building Charm, a native macOS menu bar app that corrects spelling, fixes grammar, and predicts your next word. Three features: - Spells: NSSpellChecker plus a local LLM for context-aware corrections (catches "defi…

↯ Qwen 2.5 gemma qwen
I offloaded a multi-step background loop from Claude Code to a local agent OS. They started voting on their own system rules. (www.reddit.com) +1 5w

Hey r/ClaudeAI, If you are using Claude Code or building terminal agents, you know the exact moment the context window starts degrading during long-running tasks. I wanted to build a persistent runtime layer to offload those heavy, multi-s…

↯ Qwen 2.5 ollama qwen claude-code
An overview of modern LLM compiler stack: writing an interactive and hackable compiler (www.reddit.com) +13 5w

Hey r/LocalLLaMA, Production ML compiler stack is brutal: TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other.

↯ Qwen 2.5
"Qwen 3 72B" doesn't exist — and it's in a surprising number of places that act like it does (www.reddit.com) 9 5w

spent today auditing my own model catalog and noticed 39 of my own pages confidently reference "qwen 3 72b" with apache 2.0 licensing, a 2025-09-15 release date, and a 131k context window. seemed normal — qwen 2.5 had a 72b, why wouldn't q…

↯ Hallucination ↯ Qwen 2.5 hallucination moe qwen
