model roundup

Qwen 2.5.0

3 items · started 2026-04-15 · closed 2026-04-19

Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis — add METEOR as quality reward! (www.reddit.com)

+11 7w vllm

Setup: 3x Mac Minis in a cluster running MLX. One node drives training, two push rollouts via vLLM.
Training Qwen2.5-0.5B-Instruct on Reddit posts summarization tasks with length constraint on my 3xMac Minis with GRPO - evals update (www.reddit.com)
Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO written from scratch in PyTorch - updates! (www.reddit.com)

+32 7w vllm

So, yesterday's run was a success and I got an average rollout length of about 64 tokens, as shown in the attached image! This was with quality_reward + length_penalty (more info below!). Next, I'll be going with length penalty as the reward and wit…
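
For readers unfamiliar with the setup, here is a rough sketch of how a quality_reward + length_penalty signal might feed GRPO's group-relative advantages. None of this comes from the posts themselves: the unigram F1 overlap is a simple stand-in for METEOR (not METEOR itself), the 64-token target, the linear penalty shape, the whitespace token counting, and the names `total_reward` and `group_advantages` are all assumptions made for illustration.

```python
from statistics import mean, pstdev

TARGET_LEN = 64  # assumption: desired summary length, counted in whitespace tokens


def quality_reward(summary: str, reference: str) -> float:
    """Unigram F1 overlap with the reference (a stand-in for METEOR)."""
    hyp = summary.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Count reference tokens so each one can be matched at most once.
    ref_counts = {}
    for tok in ref:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    overlap = 0
    for tok in hyp:
        if ref_counts.get(tok, 0) > 0:
            overlap += 1
            ref_counts[tok] -= 1
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def length_penalty(summary: str, target: int = TARGET_LEN) -> float:
    """0 at the target length, falling linearly to -1 (hypothetical shape)."""
    n = len(summary.split())
    return -min(abs(n - target) / target, 1.0)


def total_reward(summary: str, reference: str) -> float:
    """Sum of the two terms, mirroring quality_reward + length_penalty."""
    return quality_reward(summary, reference) + length_penalty(summary)


def group_advantages(rewards, eps: float = 1e-6):
    """GRPO-style advantages: normalize rewards within one prompt's rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, each prompt's rollouts would be scored with `total_reward`, and the resulting list passed to `group_advantages` so that every rollout is judged relative to its siblings rather than against an absolute baseline.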