model roundup
Qwen 2.5
-
I’ve been working on an educational implementation repo for speculative decoding (https://github.com/shreyansh26/Speculative-Decoding). The goal is not to wrap existing libraries, but to implement several speculative decoding methods from sc…
-
I've been using a browser-based ONNX runtime to run experiments with logit steering, and I've been seeing shocking improvements over baseline generation. This is with Qwen 2.5 0.5B...
-
Hi! We, Vincenzo and Riccardo, built Doxa as an agnostic engine for emergent agent simulations in constrained scenarios (geopolitical, economic, ...). It works well with LLMs like Qwen2.5:7B and Llama, but also with cloud models such a…
-
Best model that can run on a Raspberry Pi 5 with 8GB of RAM (www.reddit.com)
I wanted to start a robotics project and build a robot with an embedded AI. I tried Qwen 2.5-VL-3B, but it was too big for the Raspberry Pi.
-
I ran a Hormuz Crisis emergent SIM: AIs started lying to hide a stalemate (news.ycombinator.com)
Over the past few days, to test the Doxa geopolitical-economic simulation engine, we recreated the Strait of Hormuz scenario with 5 actors to analyze the agents' emergent outcomes. We gave the US agent a "populist" persona and the Iran age…
-
Issues running a local model with VS Code and Cline (www.reddit.com)
Hi all, total noob here trying to set up a local model to help me with coding. I'm trying the following setup: Ollama running the qwen2.5-coder:7b model in Docker, with the following compose file: services: ollama: container_name: ollama i…
-
7B showdown on 18GB (benchmark) (www.reddit.com)
Hey r/LocalLLaMA, I've been coding for a while but I'm new to the local AI space, and I wanted to run some benchmarks on my 18GB M3 Pro. The theme of this one was "specialists vs. generalists" in the 7–8B range: qwen2.5-coder:7b, deepseek-r1:7b, m…
-
Hey, I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning; I started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app…
-
16GB VRAM x coding model (www.reddit.com)
-
Good Summarization SLMs for < 2000 tokens (www.reddit.com)
-
Need help running a local LLM on a server (www.reddit.com)
-
nyquest.ai: Semantic Compression Proxy for LLMs
Reduce LLM token usage by 15–75% without losing meaning. Drop-in proxy with 350+ compiled rules plus local LLM semantic condensation (Qwen 2.5 1.5B).
-
MINISFORUM AI X1 Pro-370 (96GB) - Local Ollama Help (www.reddit.com)
Hey all. This just got delivered yesterday.
-
M5 Pro with 64GB: worth it for local agents, or wait? (www.reddit.com)
I'm currently on an M3 MacBook Pro with 24GB of RAM. For regular Python and Django work the machine is perfect, and I have no need to upgrade for speed.
-
Question about downloading an AI to run locally (www.reddit.com)
Hi, I currently have a device running TrueNAS Scale. It has an i5-4570, 32GB of DDR3, and several SSDs for NAS, and I recently installed a 12GB RTX 3060 in it to run a local AI, in order to call Claude Code or hav…
-
Lower inference speed of Gemma4 26B A4B on vLLM (www.reddit.com)
For my earlier use case, I hosted Qwen 2.5 VL 7B (GPTQ int4). Now I'm looking to switch to Gemma4 26B A4B, expecting it to improve performance and reduce latency, since only 4B parameters are active.
-
Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps.
-
I have a laptop with:
• AMD Radeon GPU
• NVIDIA RTX 3050 GPU
• 16GB RAM
I’m running Qwen 2.5 3B locally, but it’s using the CPU instead of my RTX 3050. Performance is much slower than expected.
-
I've been on the hunt for a browser agent that can reliably handle daily agentic tasks: filling out job applications, logging into sites and fetching data, making posts on my behalf, solving assignments and reporting results, and API/troubleshoo…
-
I'm trying to figure out what sort of hardware setup I'll need to accommodate a user base of 100 users (not necessarily concurrent). Does anyone have an idea what sort of setup I'd be looking at?
-
I'm open to any option, whether local or service-based. Among online services, I tried ChatGPT agent: it's almost the worst option ever.
-
Is a 32GB Mac enough for engineering/coding, or should I stick with Claude? (www.reddit.com)
Hey there! I’m currently using Claude Pro to build a web app for engineering with lots of logic- and math-heavy code.
-
I know this question has probably been asked a thousand times, but... what's the best (or close-to-best) model I can use with Continue for local, IDE-like code autocomplete?