We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT's iterative self-distillation mechanism by directly replacing the original model with a student model. This replacement allows the student to be trained with the same SSL objective used when pre-training HuBERT, eliminating the need for additional modules or architectural constraints. Experimental results on SUPERB show that DiceHuBERT consistently outperforms existing distillation methods, improving phoneme recognition performance by over 21% and ASR performance by more than 14%. Furthermore, DiceHuBERT demonstrates competitive performance across multiple tasks, highlighting its clear advantage.
- † Carnegie Mellon University
- ‡ Meta
- ** Work done while at Apple
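
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the idea under stated assumptions: a frozen, previously trained teacher produces frame features that are quantized into k-means pseudo-labels, and a smaller student is then trained from scratch with HuBERT's masked-prediction SSL objective on those labels, with no layer-wise or feature-wise mapping to the teacher. All module names, shapes, and hyperparameters (`TinyEncoder`, `mask_prob`, cluster count, etc.) are illustrative and are not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Stand-in Transformer encoder operating on pre-extracted frame features."""

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


def kmeans_pseudo_labels(feats: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """Assign each frame to its nearest centroid (the k-means itself is assumed fit offline)."""
    B, T, D = feats.shape
    dists = torch.cdist(feats.reshape(B * T, D), centroids)  # (B*T, K)
    return dists.argmin(dim=-1).reshape(B, T)                # (B, T) integer cluster targets


def masked_prediction_loss(student, proj, frames, targets, mask_prob=0.5):
    """HuBERT-style SSL objective: predict pseudo-labels only at masked frames."""
    mask = torch.rand(frames.shape[:2], device=frames.device) < mask_prob  # high rate for this toy example
    corrupted = frames.clone()
    corrupted[mask] = 0.0                             # crude stand-in for a learned mask embedding
    logits = proj(student(corrupted))                 # (B, T, K) cluster logits
    return F.cross_entropy(logits[mask], targets[mask])


if __name__ == "__main__":
    dim, num_clusters = 64, 100
    teacher = TinyEncoder(dim, num_layers=4).eval()   # frozen, previously trained model
    student = TinyEncoder(dim, num_layers=2)          # smaller student that replaces it
    proj = nn.Linear(dim, num_clusters)
    centroids = torch.randn(num_clusters, dim)        # would come from k-means on teacher features

    frames = torch.randn(2, 50, dim)                  # dummy acoustic frame features
    with torch.no_grad():
        targets = kmeans_pseudo_labels(teacher(frames), centroids)

    optimizer = torch.optim.Adam(list(student.parameters()) + list(proj.parameters()), lr=1e-4)
    optimizer.zero_grad()
    loss = masked_prediction_loss(student, proj, frames, targets)
    loss.backward()
    optimizer.step()
    print(f"masked-prediction loss: {loss.item():.3f}")
```

The design point the sketch is meant to convey is that the student optimizes the same pseudo-label prediction loss as HuBERT pre-training itself, so no extra distillation heads or teacher-student feature alignment modules are introduced.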