Self-Distillation

SDFT: Self-Distillation Enables Continual Learning
SDPO: Reinforcement Learning via Self-Distillation
Aligning Language Models from User Interactions