Optimizing the Learning Rate for the Online Training of Neural Networks
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:798-814, 2025.
Abstract
Efficient training via gradient-based optimization techniques is an essential building block of the success of artificial neural networks. Extensive research on the impact of the learning rate and on effective ways to estimate an appropriate value has partly enabled these techniques. Despite the proliferation of data streams generated by IoT devices, digital platforms, and many other sources, previous research has primarily focused on batch learning, which assumes that all training data is available a priori. However, characteristics such as the gradual emergence of data and distributional shifts, also known as \textit{concept drift}, pose additional challenges. Therefore, the findings on batch learning may not apply to streaming environments, where the underlying model needs to adapt on the fly each time a new data instance appears. In this work, we seek to address this knowledge gap by (i) evaluating typical learning rate schedules and techniques for adapting these schedules to concept drift, and (ii) investigating the effectiveness of optimization techniques with adaptive step sizes in the context of stream-based neural network training. We also (iii) introduce a novel \textit{pre-tuning} approach, which we find makes learning rate tuning performed prior to the stream learning process more effective than conventional tuning.
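
To make the setting concrete, the sketch below illustrates one plausible way a learning rate schedule can be adapted to concept drift during online training; it is not the paper's method. The synthetic stream, the inverse-time decay parameters, and the threshold-based drift stub are all assumptions introduced purely for illustration.

```python
# Illustrative sketch only: online SGD training on a data stream, with an
# inverse-time learning rate schedule that is restarted when drift is signalled.
import torch
import torch.nn as nn


def synthetic_stream(n=1000):
    """Toy data stream with one abrupt concept drift halfway through (illustration only)."""
    w = torch.randn(10)
    for t in range(n):
        if t == n // 2:
            w = torch.randn(10)               # abrupt change in the target concept
        x = torch.randn(1, 10)
        y = x @ w.unsqueeze(1)
        yield x, y


def drift_detected(loss_value):
    """Placeholder drift signal; a real system would use a detector such as ADWIN."""
    return loss_value > 5.0                   # arbitrary threshold, for illustration only


model = nn.Linear(10, 1)                      # toy model for a 10-feature stream
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

eta0, decay = 0.1, 1e-3                       # hypothetical schedule parameters
steps_since_reset = 0

for x, y in synthetic_stream():               # one (x, y) instance at a time
    # Inverse-time decay, restarted after each detected drift.
    lr = eta0 / (1.0 + decay * steps_since_reset)
    for group in optimizer.param_groups:
        group["lr"] = lr

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    steps_since_reset += 1

    if drift_detected(loss.item()):
        steps_since_reset = 0                 # reset the schedule so the model can re-adapt
```

Restarting the schedule on drift is only one of several adaptation strategies the abstract alludes to; optimizers with adaptive step sizes (e.g., Adam) would replace the manual schedule entirely.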