While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored because of the challenges of training large transformer models. In particular, large models exacerbate issues in FL because they are especially susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish the first benchmark for FL with DP in end-to-end ASR. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis reveals that these techniques together mitigate clipping bias and gradient heterogeneity across layers in deeper models. Consistent with these theoretical insights, our empirical results show that FL with DP is viable under strong privacy guarantees, provided a population of at least several million users. Specifically, we achieve user-level (7.2, δ)-DP (resp. (4.5, δ)-DP) with a 1.3% (resp. 4.6%) absolute drop in word error rate when extrapolating to high (resp. low) population scales for FL with DP in ASR. Although our experiments focus on ASR, the underlying principles we uncover, particularly those concerning gradient heterogeneity and layer-wise gradient normalization, offer broader guidance for designing scalable, privacy-preserving FL algorithms for large models across domains.
- * Equal Contributors
- † Purdue University
Figure 1: (ε, δ)-DP guarantees: the central seed model, trained on LibriSpeech (100h) and fine-tuned with federated learning and differential privacy on Common Voice (1,500h), shows practical quality while preserving (ε, δ)-DP when extrapolated to larger population and cohort sizes.
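
To make the per-layer clipping named in the abstract concrete, the following is a minimal NumPy sketch of one DP aggregation round. It is an illustration under stated assumptions, not the authors' implementation: the uniform budget split `clip_norm / sqrt(L)` across the `L` layers, the function names `clip_per_layer` and `dp_aggregate`, and the `noise_multiplier` parameter are all hypothetical choices made for this example. Layer-wise gradient normalization is not shown.

```python
import numpy as np


def clip_per_layer(update, clip_norm):
    """Clip each layer of one user's update to an equal share of the L2 budget."""
    # Assumption: uniform split, so each of the L layers gets clip_norm / sqrt(L)
    # and the L2 norm of the whole update is bounded by clip_norm.
    per_layer = clip_norm / np.sqrt(len(update))
    clipped = {}
    for name, delta in update.items():
        norm = np.linalg.norm(delta)
        # Layers already within their bound are left unchanged (scale == 1).
        scale = min(1.0, per_layer / (norm + 1e-12))
        clipped[name] = delta * scale
    return clipped


def dp_aggregate(user_updates, clip_norm, noise_multiplier, rng):
    """Sum clipped user updates, add Gaussian noise, and average over the cohort."""
    clipped = [clip_per_layer(u, clip_norm) for u in user_updates]
    cohort_size = len(clipped)
    aggregated = {}
    for name in clipped[0]:
        total = sum(c[name] for c in clipped)
        # Noise standard deviation is tied to the per-user sensitivity clip_norm.
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
        aggregated[name] = (total + noise) / cohort_size
    return aggregated


# Usage: two layers with very different update magnitudes.
rng = np.random.default_rng(0)
update = {"encoder": np.full(4, 10.0), "head": np.full(4, 1e-3)}
clipped = clip_per_layer(update, clip_norm=1.0)
averaged = dp_aggregate([update, update], clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

In this sketch the large `encoder` update is scaled down to its own bound while the small `head` update passes through untouched. A single global clip would instead shrink both layers by the same factor, which is one way heterogeneous gradient scales across layers can interact badly with clipping.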