Self-Distillation
SDFT: Self-Distillation Enables Continual Learning
SDPO: Reinforcement Learning via Self-Distillation
Aligning Language Models from User Interactions