Hey folks, this is for those who appreciate experimenting with open-source AI models. We fine-tuned open-source SLMs (3B and 7B parameters) with SFT + DPO against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…
#dpo
5 items
SFT + DPO on open-sourced SLMs (www.reddit.com)
Free hands-on lab: build a ReAct agent 3 ways (create_agent, raw LangGraph with tool-call budget, NVIDIA NAT YAML) (www.reddit.com)
I tried a selective training method for hallucination — beats DPO and SFT with ~10% data (www.reddit.com)
Ultra-ml-intern: huggingface/ml-intern's workflow as a Claude Code plugin (www.reddit.com)
huggingface/ml-intern is HF's autonomous ML engineer — reads papers, audits datasets, ships SFT/DPO/LoRA/GRPO runs to HF Jobs. It's a standalone Python harness with its own agent loop calling the Claude API.
Fine-tune Llama 2 with DPO (huggingface.co)
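For readers new to the DPO runs mentioned above: this is a minimal, hedged sketch of the per-pair DPO objective, not code from any of the linked posts. It assumes you already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; in practice a library like TRL's `DPOTrainer` computes this over batches of tokenized pairs.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    reference model. beta scales the implicit reward.
    """
    # Implicit reward margins: how much more the policy favors each
    # response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log sigmoid(logits): minimized as the policy widens the gap
    # between chosen and rejected relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

If the policy matches the reference exactly, both margins are zero and the loss sits at log 2; pushing probability toward the chosen response (relative to the reference) drives it below that, which is the whole training signal DPO extracts from preference pairs without a separate reward model.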