Running a full agentic coding loop locally on a 3090. Here's what actually works in 2026.
After months of testing, I finally have a local setup that doesn't make me want to go back to the API.

Hardware: RTX 3090 (24GB VRAM)
Models tested: Qwen2.5-Coder 32B Q4_K_M, DeepSeek-Coder-V3 Q4, Llama 3.3 70B Q3_K_M
Inference: llama.cpp…
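A quick back-of-envelope check on why those quant levels line up with 24GB of VRAM. This is my own sketch, not from the post: the effective bits-per-weight figures below are rough ballpark assumptions for K-quants, not exact llama.cpp numbers, and the estimate ignores KV cache and context overhead.

```python
# Rough GGUF quant footprint: params (billions) * effective bits/weight / 8 = GB.
# BITS values are assumed ballparks for these quant types, not exact figures.
BITS = {"Q4_K_M": 4.85, "Q3_K_M": 3.9}

def quant_size_gb(params_b: float, quant: str) -> float:
    """Approximate model weight footprint in GB (excludes KV cache)."""
    return params_b * BITS[quant] / 8

print(f"32B Q4_K_M ~ {quant_size_gb(32, 'Q4_K_M'):.1f} GB")  # under 24 GB, leaves KV headroom
print(f"70B Q3_K_M ~ {quant_size_gb(70, 'Q3_K_M'):.1f} GB")  # over 24 GB, needs partial CPU offload
```

The takeaway: a 32B model at Q4_K_M fits fully on the card with room for context, while a 70B even at Q3_K_M exceeds 24GB, so llama.cpp has to offload some layers to CPU.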