model roundup

Qwen 2.5.0

3 items · started 2026-04-15 · closed 2026-04-19

  1. Setup: a cluster of 3 Mac minis running MLX. One node drives training; the other two push rollouts via vLLM.

  2. So, yesterday's run was a success: I got an average rollout length of about 64 tokens, as shown in the attached image! This was with quality_reward + length_penalty (more info below!). Next, I'll be going with length penalty as the reward and wit…
