Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis: adding METEOR as a quality reward!
Setup: 3x Mac Minis in a cluster running MLX. One node drives training, two push rollouts via vLLM.
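For anyone wondering how a METEOR-style reward plugs into GRPO, here is a rough sketch in plain Python. It is not my actual training code and not NLTK's `meteor_score`. `simple_meteor` is a hypothetical, stripped-down version that only does exact unigram matching with a greedy alignment (no stemming, no WordNet synonyms), and `group_advantages` is a hypothetical helper that normalizes rewards within one group of rollouts for the same prompt. The population standard deviation and the `eps` value are assumptions on my part.

```python
from statistics import mean, pstdev


def simple_meteor(hypothesis: str, reference: str) -> float:
    """Simplified METEOR: exact unigram matches only, greedy alignment."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0

    # Greedily align each hypothesis token to the first unused
    # identical reference token.
    used = [False] * len(ref)
    alignment = []  # list of (hyp_index, ref_index) pairs
    for i, tok in enumerate(hyp):
        for j, ref_tok in enumerate(ref):
            if not used[j] and tok == ref_tok:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Recall-weighted harmonic mean (alpha = 0.9, as in standard METEOR).
    f_mean = precision * recall / (0.9 * precision + 0.1 * recall)

    # Count chunks: runs of matches that are contiguous in both texts.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if i1 != i0 + 1 or j1 != j0 + 1:
            chunks += 1

    # Fragmentation penalty (gamma = 0.5, beta = 3).
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1.0 - penalty)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    rollouts = [
        "the cat sat on the mat",
        "a cat was on the mat",
        "dogs bark loudly",
    ]
    rewards = [simple_meteor(r, reference) for r in rollouts]
    print(rewards)
    print(group_advantages(rewards))
```

Each rollout for a given post gets scored against the reference summary, and the group-normalized values are what would weight the policy update. A real run would probably want the full METEOR (stems and synonyms), since exact matching punishes legitimate paraphrases.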