model roundup
Qwen 2.5.0
-
Setup: 3x Mac Minis in a cluster running MLX. One node drives training, two push rollouts via vLLM.
-
So, yesterday's run was a success: I got an average rollout length of about 64 tokens, as shown in the attached image! This was with quality_reward + length_penalty (more info below!). Next, I'll be going with length penalty as the reward and wit…