model roundup

Qwen 2.5

4 items · started 2026-06-07 · ongoing (last activity 2026-06-10)

Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers (www.reddit.com)

2h dpo haiku

Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever. The method (DV-DPO): Run a 3-voice council on each question, produce a synthesis. Cross-examine: losing voices challe…
Has anyone tried running retrieval inside the model, not before it? (www.reddit.com via reddit)

17h llama

Been messing with a bolt-on refiner block for small models. Insert a small trainable transformer layer at the midpoint of a frozen base model, loop it 2-4 times over the hidden states.
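A rough sketch of the control flow described above, in plain Python so it runs without any ML library. Everything here is an assumption made for illustration: the toy layers are stand-ins (a real setup would use transformer layers with the base weights frozen), the residual-style update in the refiner is a guess, and the names make_frozen_layer, make_refiner and forward_with_refiner are hypothetical, not taken from the post.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_frozen_layer(scale: float) -> Layer:
    """Hypothetical stand-in for one frozen base-model layer."""
    def layer(h: Vector) -> Vector:
        return [scale * x for x in h]
    return layer


def make_refiner(weight: float) -> Layer:
    """Hypothetical stand-in for the small trainable block.

    `weight` plays the role of its only trainable parameter. The
    residual form (h + weight * h) is an assumption, chosen so that
    weight = 0 leaves the base model's behaviour unchanged.
    """
    def refiner(h: Vector) -> Vector:
        return [x + weight * x for x in h]
    return refiner


def forward_with_refiner(
    h: Vector, base_layers: List[Layer], refiner: Layer, n_loops: int
) -> Vector:
    """Run the first half of the base, loop the refiner, run the rest."""
    mid = len(base_layers) // 2
    for layer in base_layers[:mid]:      # frozen lower half
        h = layer(h)
    for _ in range(n_loops):             # same refiner reused on each pass
        h = refiner(h)
    for layer in base_layers[mid:]:      # frozen upper half
        h = layer(h)
    return h


base = [make_frozen_layer(2.0) for _ in range(4)]
refiner = make_refiner(0.5)
out = forward_with_refiner([1.0, -1.0], base, refiner, n_loops=2)
```

Setting n_loops=0 skips the refiner entirely, which gives a quick way to compare against the unmodified base.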
why I have just installed OpenLumara, my first Agentic Framework. Using only local models, served by LMStudio (www.reddit.com)

2d qwen agentic

Where I came across it: https://www.reddit.com/r/LocalLLaMA/comments/1txxgpq/openlumara_a_different_kind_of_ai_agent_written/ DISCLAIMER: A good posting would be: This is what I wanted to do with Lumara. Here is what worked, here is what d…
Claude Code 2.1.165 + Ollama (qwen3:8b / qwen2.5-coder:7b) instantly throws "response exceeded 32000 output token maximum" even for "hi" (www.reddit.com via reddit)

4d ollama claude-code

I'm trying to use Claude Code with local Ollama models, but every prompt fails with: "response exceeded 32000 output token maximum". The strange part is that it happens even for extremely small prompts like: "hi", "say apple", "What is 1+1? Answer with only one character."