Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. Knowledge is often preserved by matching the original and fine-tuned model weights or feature pairs. However, such point-wise matching can be too strong, without explicit awareness of the feature neighborhood structures that also encode rich knowledge. We propose a novel regularization method, Proxy-FDA, that explicitly preserves the structural knowledge in feature space. Proxy-FDA performs Feature Distribution Alignment (using nearest neighbor graphs) between the pre-trained and fine-tuned feature spaces, and the alignment is further improved by informative proxies that are generated dynamically to increase data diversity. Experiments show that Proxy-FDA significantly reduces concept forgetting during fine-tuning, and we find a strong correlation between forgetting and a distributional distance metric (as compared to L2 distance). We further demonstrate Proxy-FDA’s benefits in various fine-tuning settings (end-to-end, few-shot and continual tuning) and across different tasks like image classification, captioning and VQA.
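To make the core idea concrete, here is a minimal PyTorch sketch of a nearest-neighbor-graph feature distribution alignment regularizer of the kind the abstract describes. It is an illustration under our own assumptions, not the paper's implementation: the names (`neighborhood_dist`, `fda_loss`), the cosine-similarity kNN graph, the KL objective, and the hyperparameters are all assumed, and the dynamically generated proxies that give Proxy-FDA its name are omitted.

```python
# Minimal sketch (assumptions labeled above): preserve each sample's
# neighborhood structure from the frozen pre-trained feature space
# while the model is fine-tuned. Not the official Proxy-FDA code.
import torch
import torch.nn.functional as F

def neighborhood_dist(feats: torch.Tensor, k: int) -> torch.Tensor:
    """Per-sample neighborhood distribution from a kNN similarity graph.

    feats: (N, D) batch of features. Returns (N, N) row-stochastic matrix
    where each row is a softmax over that sample's k nearest neighbors.
    """
    z = F.normalize(feats, dim=1)
    sim = z @ z.t()                          # (N, N) cosine similarities
    sim.fill_diagonal_(float('-inf'))        # exclude self-matches
    vals, idx = sim.topk(k, dim=1)           # keep only k nearest neighbors
    masked = torch.full_like(sim, float('-inf'))
    masked.scatter_(1, idx, vals)
    return F.softmax(masked, dim=1)          # zeros outside the kNN graph

def fda_loss(pre_feats: torch.Tensor, ft_feats: torch.Tensor,
             k: int = 5) -> torch.Tensor:
    """KL divergence aligning the fine-tuned neighborhood distributions
    to the frozen pre-trained ones (the structure we want to keep)."""
    with torch.no_grad():
        target = neighborhood_dist(pre_feats, k)   # frozen reference graph
    pred = neighborhood_dist(ft_feats, k)
    eps = 1e-8                                      # numerical stability
    kl = target * ((target + eps).log() - (pred + eps).log())
    return kl.sum(dim=1).mean()
```

In a training loop, such a term would typically be added to the task loss with a weight, e.g. `loss = task_loss + lam * fda_loss(pre_model(x), ft_model(x))`, where `pre_model` is the frozen pre-trained encoder and `ft_model` the one being fine-tuned (again, `lam` and the two-encoder setup are our assumptions for illustration).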