From Generalization Analysis to Optimization Designs for State Space Models

Fusheng Liu, Qianxiao Li
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:31383-31405, 2024.

Abstract

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-liu24ah,
  title = {From Generalization Analysis to Optimization Designs for State Space Models},
  author = {Liu, Fusheng and Li, Qianxiao},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {31383--31405},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/liu24ah/liu24ah.pdf},
  url = {https://proceedings.mlr.press/v235/liu24ah.html},
  abstract = {A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results.}
}
Endnote
%0 Conference Paper
%T From Generalization Analysis to Optimization Designs for State Space Models
%A Fusheng Liu
%A Qianxiao Li
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-liu24ah
%I PMLR
%P 31383--31405
%U https://proceedings.mlr.press/v235/liu24ah.html
%V 235
%X A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results.
APA
Liu, F. & Li, Q. (2024). From Generalization Analysis to Optimization Designs for State Space Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:31383-31405. Available from https://proceedings.mlr.press/v235/liu24ah.html.
