A note of warning about DFlash.
It started saying 4/5x speed advantage against usual bf16 models (test are less optimistic but let think this is true). Then MoE gain is not that good, value was for dense models.
It started saying 4/5x speed advantage against usual bf16 models (test are less optimistic but let think this is true). Then MoE gain is not that good, value was for dense models.