Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Junnan Li, Silvio Savarese, Caiming Xiong, Doyen Sahoo
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:38940-38962, 2025.

Abstract

Achieving effective unified pretraining on large time series corpora remains an open challenge in developing time series foundation models. Existing methods, such as Moirai, introduce multiple projection layers for time series of different frequencies to account for high data heterogeneity. We identify major drawbacks of this human-imposed frequency-level model specialization. First, frequency is not a reliable indicator for grouping pretraining data. Second, time series can display varied distributions even within a short window, and frequency-level specialization overlooks diversity at this granularity. To address these issues, this paper introduces Moirai-MoE, which removes human-defined data groupings and instead delegates the modeling of diverse time series patterns to sparse mixture-of-experts (MoE) layers within Transformers. With this design, Moirai-MoE eliminates reliance on heuristics and enables automatic token-level specialization. Extensive evaluations on 39 datasets demonstrate the superiority of Moirai-MoE over state-of-the-art foundation models. This study also conducts comprehensive model analyses to explore the inner workings of time series MoE foundation models.
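
To make the core idea concrete, below is a minimal sketch of a sparse mixture-of-experts feed-forward layer with top-k token-level routing, the general mechanism the abstract refers to. The dimensions, expert count, routing depth, and module names here are illustrative assumptions for exposition only, not the configuration or implementation used by Moirai-MoE.

```python
# Minimal sketch: sparse MoE feed-forward layer with top-k token routing.
# All hyperparameters (d_model, d_ff, num_experts, top_k) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent position-wise feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten into a stream of tokens.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Token-level routing: each token independently picks its top-k experts,
        # so specialization happens per token rather than per frequency group.
        gate_logits = self.router(tokens)                       # (tokens, experts)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Process only the tokens routed to expert e, weighted by their gate scores.
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot, None] * expert(tokens[token_ids])

        return out.reshape(batch, seq_len, d_model)


if __name__ == "__main__":
    layer = SparseMoEFeedForward(d_model=64, d_ff=256)
    y = layer(torch.randn(2, 32, 64))   # batch of 2 series, 32 tokens each
    print(y.shape)                      # torch.Size([2, 32, 64])
```

In a Transformer block, a layer like this would replace the dense feed-forward sublayer, so routing decisions are made per token rather than per human-defined frequency group.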

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-liu25an,
  title     = {Moirai-{M}o{E}: Empowering Time Series Foundation Models with Sparse Mixture of Experts},
  author    = {Liu, Xu and Liu, Juncheng and Woo, Gerald and Aksu, Taha and Liang, Yuxuan and Zimmermann, Roger and Liu, Chenghao and Li, Junnan and Savarese, Silvio and Xiong, Caiming and Sahoo, Doyen},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {38940--38962},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25an/liu25an.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25an.html}
}
Endnote
%0 Conference Paper
%T Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
%A Xu Liu
%A Juncheng Liu
%A Gerald Woo
%A Taha Aksu
%A Yuxuan Liang
%A Roger Zimmermann
%A Chenghao Liu
%A Junnan Li
%A Silvio Savarese
%A Caiming Xiong
%A Doyen Sahoo
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25an
%I PMLR
%P 38940--38962
%U https://proceedings.mlr.press/v267/liu25an.html
%V 267
APA
Liu, X., Liu, J., Woo, G., Aksu, T., Liang, Y., Zimmermann, R., Liu, C., Li, J., Savarese, S., Xiong, C. & Sahoo, D. (2025). Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:38940-38962. Available from https://proceedings.mlr.press/v267/liu25an.html.