model roundup

Qwen 2.5

24 items · started 2026-04-13 · closed 2026-04-28

Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch (www.reddit.com)

+41 8w qwen

I’ve been working on an educational implementation repo for speculative decoding: https://github.com/shreyansh26/Speculative-Decoding The goal is not to wrap existing libraries, but to implement several speculative decoding methods from sc…
Using logit steering / KV Cache Dynamic Assembly to guide outputs from Small Language Models using ONNX Runtime (www.reddit.com)

+11 8w qwen

I've been using the browser-based ONNX runtime to experiment with logit steering, and I've been seeing shocking improvements over baseline generation. This is a Qwen 2.5 0.5B....
Show HN: Doxa – Open-source emergent simulator for geopolitical scenarios (github.com via hn)

+2 8w llama gemini

Hi! We, Vincenzo and Riccardo, built Doxa as an agnostic engine for emergent simulations with agents in constrained scenarios (geopolitical, economic, ...). It works well with LLMs like Qwen2.5:7B and Llama, but also cloud models such a…
Best model that can run on raspberry pi 5 with 8GB of RAM (www.reddit.com)

+1 9w qwen

I wanted to start a robotics project to build a robot that has an embedded AI. I tried Qwen 2.5-VL-3B and it was too big for the Raspberry Pi.
I ran a Hormuz Crisis emergent SIM: AIs started lying to hide a stalemate (news.ycombinator.com)

+42 9w

Over the past few days, to test the Doxa geopolitical-economic simulation engine, we recreated the Strait of Hormuz scenario with 5 actors to analyze the agents' emergent outcomes. We gave the US agent a "populist" persona and the Iran age…
Issues running local model with vscode and cline (www.reddit.com)

+21 9w cline ollama

Hi all, Total noob here trying to set up a local model to help me with coding. I am trying the following setup - Ollama running the qwen2.5-coder:7b model in docker with the following compose file services: ollama: container_name: ollama i…
7B showdown on 18GB (benchmark) (www.reddit.com)

+21 9w deepseek

Hey r/LocalLLaMA, I've been coding for a while but not in the local AI space and wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs generalists" at the 7-8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…
I Built a desktop app for generating LLM fine-tuning datasets — started it a week ago while learning FT (www.reddit.com)

+31 9w humaneval fine-tuning claude-code

Hey, I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app…
16GB VRAM x coding model (www.reddit.com)

+41 9w codex claude-code
Good Summarization SLMs for < 2000 tokens (www.reddit.com)

+23 9w qwen
Need help for running local llm on a server (www.reddit.com)

9 9w
OCuLink dGPU for AMD: RX 7600 XT vs RX 7800 XT for LLM — worth the price gap? Also llamacpp + Vulkan vs Ollama + ROCm? (www.reddit.com)

8 9w ollama qwen llama
Nyquest – Open-source LLM token compression proxy in Rust (15–75% savings) (github.com via hn)

+2 10w qwen

nyquest.ai Semantic Compression Proxy for LLMs Reduce LLM token usage by 15–75% without losing meaning. Drop-in proxy with 350+ compiled rules + local LLM semantic condensation (Qwen 2.5 1.5B).
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help (www.reddit.com)

8 10w ollama deepseek qwen+1

Hey all. This just got delivered yesterday.
m5 pro 64gb worth it for local agents or wait? (www.reddit.com)

11 10w cline qwen agentic+1

I am currently on an m3 mbp with 24gb ram. For regular python and django work the machine is perfect and i have no need to upgrade for speed.
Question about downloading an AI to run locally (www.reddit.com)

5 10w claude-code

Hi, I currently have a device running TrueNAS Scale. It has an i5 4570, 32GB of DDR3, and several SSDs for the NAS, and I recently installed a 12GB RTX 3060 in it with the aim of running a local AI, to call Claude Code or to have…
Lower inference speed of Gemma4 26BA4B on vllm. (www.reddit.com)

8 10w vllm qwen

For my earlier use case I hosted qwen 2.5 vl 7b gptq int4. Now I am looking to switch to Gemma4 26B A4B, as it would improve performance as well as latency, considering only 4B parameters are active.
24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks) Gemma4 VS Qwen2.5 (www.reddit.com)

+11 10w

Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps.
Laptop has AMD Radeon + RTX 3050 — Which GPU should I use and how do I force apps to use the RTX? (www.reddit.com)

1 10w qwen

I have a laptop with: • AMD Radeon GPU • NVIDIA RTX 3050 GPU • 16GB RAM I’m running Qwen 2.5 3B locally, but it’s using the CPU instead of my RTX 3050. Performance is much slower than expected.
Tested 6 browser use agents for real-world tasks — here's an honest breakdown + looking for recommendations (www.reddit.com)

+45 10w ollama chatgpt mcp+1

I've been on a hunt for a browser agent that can reliably handle daily agentic tasks: filling job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…
Hardware needed for Gemma 26B MoE vs Qwen 14B for ~100–300 users (vLLM, single node?) (www.reddit.com)

16 10w vllm moe gemma+1

I'm trying to figure out what sort of hardware setup I will need to accommodate a user base of 100 users (not necessarily concurrent). Does anyone have any idea what sort of setup I'd be looking at?
Looking for a reliable browser use agent that handles most daily tasks. (www.reddit.com)

12 10w ollama chatgpt mcp

I am open to any option, whether it's local or service-based. For online services I tried ChatGPT agent: it's almost the worst option ever.
Is 32GB Mac enough for engineering/coding, or stick to Claude? (www.reddit.com)

13 10w deepseek sonnet

Hey there! I’m currently building a web app for engineering with lots of logic/math-heavy code using Claude Pro.
What's the current best code autocomplete LLM for local deployment (as of April 2026)? (www.reddit.com)

+34 10w glm

I know this question has already been asked a thousand times, probably, but... what's the best or close-to-best model I can use with Continue for local IDE-like code autocomplete?
