A harmful assumption that may be drawn from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remains moderate to strong across demographics and tasks: for example, gender (rho >= 0.94) in co-reference resolution, and age (rho >= 0.98) and religion (rho >= 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution and representational balance (rho >= 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduces bias transfer across models, tasks or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.
- * Equal contribution
- † Work done while at Apple

