How does MoE training ensure different experts are chosen?

reddit-localllama · www.reddit.com · 4 pts · 6 replies · 1d

I’m training a coding model that is basically a large model and a mini model built into one. Think of it like a person with two heads.
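For context on the title question: one widely used way to keep a router from collapsing onto a single expert is an auxiliary load-balancing loss added to the training objective. The sketch below follows the Switch Transformer style formulation (number of experts times the sum over experts of "fraction of tokens routed there" times "mean router probability for that expert"). It is a plain-Python illustration, not the setup from this post; the function names and the toy logits are made up for the example, and whether this loss suits a large-plus-mini two-expert design is an assumption.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def load_balancing_loss(router_logits):
    """router_logits: one list per token, one logit per expert (hypothetical layout)."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f[i]: fraction of tokens whose top-1 expert is i (hard assignment).
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    f = [c / num_tokens for c in counts]

    # P[i]: mean router probability given to expert i (soft, differentiable in a real framework).
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    # Minimized when routing is uniform across experts.
    return num_experts * sum(fi * Pi for fi, Pi in zip(f, P))

# Toy router outputs for two tokens and two experts.
balanced = [[5.0, 0.0], [0.0, 5.0]]   # each token prefers a different expert
collapsed = [[5.0, 0.0], [5.0, 0.0]]  # every token prefers expert 0

balanced_loss = load_balancing_loss(balanced)
collapsed_loss = load_balancing_loss(collapsed)
print(balanced_loss, collapsed_loss)
```

The collapsed routing scores higher than the balanced one, so adding this term (scaled by a small coefficient) to the main loss pushes the router toward spreading tokens across experts. In a real training loop the same quantity would be computed on tensors so the gradient flows through the mean probabilities.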

moe
