#dpo

5 items

SFT + DPO on open-sourced SLMs (www.reddit.com) +75 7d

Hey folks, this is for those who appreciate experimentation on open-sourced AI models. We fine-tuned open-sourced SLMs (3B and 7B parameters) with SFT + DPO against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…

Gemini 3.1 dpo deepseek gpt-5 +2
Free hands-on lab: build a ReAct agent 3 ways (create_agent, raw LangGraph with tool-call budget, NVIDIA NAT YAML) (www.reddit.com) +43 4d

dpo vllm agentic
I tried a selective training method for hallucination — beats DPO and SFT with ~10% data (www.reddit.com) +26 2d

dpo hallucination
Ultra-ml-intern: huggingface/ml-intern's workflow as a Claude Code plugin (www.reddit.com) 1 2h

huggingface/ml-intern is HF's autonomous ML engineer: it reads papers, audits datasets, and ships SFT/DPO/LoRA/GRPO runs to HF Jobs. It's a standalone Python harness with its own agent loop calling the Claude API.

dpo claude-code
Fine-tune Llama 2 with DPO (huggingface.co) 141w

dpo llama