Wanted to share a workflow I tested on a real flight, in case anyone else is trying to set up offline Claude Code. The core idea: use Ollama to pull the model you need, then use it to run Claude Code. The setup, in orde…
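Since the whole point is that nothing can be downloaded mid-flight, a pre-flight check that the model is already on disk may be worth scripting. The sketch below is not from the post: it assumes Ollama is serving on its default local address and that its `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The tag `qwen2.5:1.5b` used in the usage note is a hypothetical placeholder for whatever you actually pulled.

```python
import json
import urllib.request

# Assumption: default address of a locally running Ollama server.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags_response: dict, wanted: str) -> bool:
    """Return True if `wanted` is among the models Ollama reports as local."""
    names = {m.get("name", "") for m in tags_response.get("models", [])}
    # Accept an exact "name:tag" match, or a bare name matching any tag.
    return wanted in names or any(n.split(":")[0] == wanted for n in names)


def fetch_local_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 2.0) -> dict:
    """Ask the local Ollama server which models it already has on disk."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def preflight(wanted: str) -> str:
    """Summarize whether `wanted` is ready for offline use."""
    try:
        tags = fetch_local_tags()
    except OSError as exc:  # covers connection refused and timeouts
        return f"Ollama server not reachable: {exc}"
    if model_is_pulled(tags, wanted):
        return "ready"
    return f"missing: run `ollama pull {wanted}` while still online"
```

Calling `preflight("qwen2.5:1.5b")` before boarding returns a one-line status. It only confirms the model is present locally, not that Claude Code is wired up to use it.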
model
Qwen2.5-1.5B-Instruct
huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
13065099 downloads · 699 likes · text-generation · transformers
from the model card
Qwen2.5-1.5B-Instruct

Introduction

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, and can generate up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 1.5B Qwen2.5 model, which has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
- Number of Parameters: 1.54B
- Number of Parameters (Non-Embedding): 1.31B
- Number of Layers: 28
- Number of Attention Heads (GQA): 12 fo…
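The parameter counts on the card give a rough idea of how much disk and memory the weights need on a laptop with no network. The arithmetic below is a back-of-the-envelope sketch, not something the card states: the two parameter counts come from the card, while the bytes-per-parameter widths are assumed nominal values. Real quantized files carry extra metadata and mixed-precision layers, and the KV cache and runtime overhead are not counted.

```python
TOTAL_PARAMS = 1.54e9          # "Number of Parameters" from the model card
NON_EMBEDDING_PARAMS = 1.31e9  # "Number of Parameters (Non-Embedding)" from the card

# Assumed nominal storage widths in bytes per parameter (illustrative only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def embedding_params(total: float, non_embedding: float) -> float:
    """Parameters held in the (tied) embedding table."""
    return total - non_embedding


def weight_gigabytes(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(f"embedding params: ~{embedding_params(TOTAL_PARAMS, NON_EMBEDDING_PARAMS) / 1e9:.2f}B")
for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_gigabytes(TOTAL_PARAMS, width):.2f} GB")
```

Under these assumptions the weights come to roughly 3.08 GB at 16 bits and 0.77 GB at 4 bits, which is why a model of this size is practical to carry on a flight.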
discussions
- Qwen 2.5 7 2026-05-17 – 2026-05-30
recent items
My experience using Claude code with Local Llm, and full guide on how to set it up (www.reddit.com)
Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B (www.reddit.com) I have been coming to this subreddit to understand what the optimal config is to run a model on a given hardware setup. I referred to specific benchmarks, but they are too generic and do not consider the underlying hardware.
Show HN: Charm – on-device spelling, grammar, and prediction for macOS (www.theodorehq.com via hn) I've spent the last year building Charm, a native macOS menu bar app that corrects spelling, fixes grammar, and predicts your next word. Three features: - Spells: NSSpellChecker plus a local LLM for context-aware corrections (catches "defi…
Hermes w/cloud LLM and w/local LLM does it work? (www.reddit.com) I’ve tried openclaw locally for about a month. Hardware: M5 Pro w/48 gb ram.
I offloaded a multi-step background loop from Claude Code to a local agent OS. They started voting on their own system rules. (www.reddit.com) Hey r/ClaudeAI, If you are using Claude Code or building terminal agents, you know the exact moment the context window starts degrading during long-running tasks. I wanted to build a persistent runtime layer to offload those heavy, multi-s…
An overview of modern LLM compiler stack: writing an interactive and hackable compiler (www.reddit.com) Hey r/LocalLLaMA, Production ML compiler stack is brutal: TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other.
"Qwen 3 72B" doesn't exist — and it's in a surprising number of places that act like it does (www.reddit.com) spent today auditing my own model catalog and noticed 39 of my own pages confidently reference "qwen 3 72b" with apache 2.0 licensing, a 2025-09-15 release date, and a 131k context window. seemed normal — qwen 2.5 had a 72b, why wouldn't q…