model roundup

Qwen 2.5

24 items · started 2026-04-13 · closed 2026-04-28

  1. I’ve been working on an educational implementation repo for speculative decoding: https://github.com/shreyansh26/Speculative-Decoding The goal is not to wrap existing libraries, but to implement several speculative decoding methods from sc…

  2. I've been using an ONNX browser-based runtime to run experiments with logit steering, and I've been seeing shocking improvements over baseline generation. This is Qwen 2.5 0.5B....

  3. Hi! We, Vincenzo and Riccardo, built Doxa, an agnostic engine for emergent simulations with agents in constrained scenarios (geopolitical, economic, ...) that works well with LLMs like Qwen2.5:7B and Llama, but also with cloud models such a…

  4. I wanted to start a robotics project to build a robot with embedded AI. I tried Qwen2.5-VL-3B, and it was too big for the Raspberry Pi.

  5. Over the past few days, to test the Doxa geopolitical-economic simulation engine, we recreated the Strait of Hormuz scenario with 5 actors to analyze the agents' emergent outcomes. We gave the US agent a "populist" persona and the Iran age…

  6. Hi all, Total noob here trying to set up a local model to help me with coding. I am trying the following setup - Ollama running the qwen2.5-coder:7b model in docker with the following compose file services: ollama: container_name: ollama i…

  7. Hey r/LocalLLaMA, I've been coding for a while but not in the local AI space and wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs generalists" at the 7-8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…

  8. Hey, I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app…

  9. nyquest.ai: Semantic Compression Proxy for LLMs. Reduce LLM token usage by 15–75% without losing meaning. Drop-in proxy with 350+ compiled rules plus local LLM semantic condensation (Qwen 2.5 1.5B).

  10. Hey all. This just got delivered yesterday.

  11. I am currently on an M3 MacBook Pro with 24 GB of RAM. For regular Python and Django work the machine is perfect, and I have no need to upgrade for speed.

  12. Hi, I currently have a device running TrueNAS Scale. It has an i5-4570, 32 GB of DDR3, and several SSDs for NAS, and I recently installed a 12 GB RTX 3060 in it to run a local AI, to call Claude Code or to have…

  13. For my earlier use case, I hosted Qwen 2.5 VL 7B GPTQ Int4. Now I'm looking to switch to Gemma4 26B A4B, as it should improve both performance and latency, since only 4B parameters are active.

  14. https://preview.redd.it/2olx2ckl9evg1.jpg?width=4088&format=pjpg&auto=webp&s=b8ee69bff72a4ca21888dccf6f825da11b2b89a2 Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps.

  15. I have a laptop with an AMD Radeon GPU, an NVIDIA RTX 3050 GPU, and 16 GB of RAM. I'm running Qwen 2.5 3B locally, but it's using the CPU instead of my RTX 3050, and performance is much slower than expected.

  16. I've been on a hunt for a browser agent that can reliably handle daily agentic tasks: filling job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…

  17. I'm trying to figure out what sort of hardware setup I will need to accommodate a user base of 100 users (not necessarily concurrent). Does anyone have any idea what sort of setup I'd be looking at?

  18. I am open to any option, whether local or service-based. Among online services, I tried ChatGPT agent: it's almost the worst option ever.

  19. Hey there! I’m currently building a web app for engineering with lots of logic/math-heavy code using Claude Pro.

  20. I know this question has already been asked a thousand times, probably, but... what's the best or close-to-best model I can use with Continue for local IDE-like code autocomplete?
