tspDB: Time Series Predict DB

Anish Agarwal; Abdullah Alomar; Devavrat Shah

Proceedings of the NeurIPS 2020 Competition and Demonstration Track, PMLR 133:27-56, 2021.

Abstract

A major bottleneck of the current Machine Learning (ML) workflow is the time-consuming, error-prone engineering required to get data from a datastore or a database (DB) to the point where an ML algorithm can be applied to it. This is further exacerbated since ML algorithms are now trained on large volumes of data, yet we need predictions in real time, especially in a variety of time-series applications such as finance and real-time control systems. Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB. Such a system ideally: (i) provides an intuitive prediction query interface which alleviates the unwieldy data engineering; (ii) provides state-of-the-art statistical accuracy while ensuring incremental model updates, low model training time, and low latency for making predictions. As the main contribution, we explicitly instantiate a proof-of-concept, tspDB, which integrates directly with PostgreSQL. We rigorously test tspDB’s statistical and computational performance against state-of-the-art time series algorithms, including a Long Short-Term Memory (LSTM) neural network and DeepAR (an industry-standard deep learning library by Amazon). Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.1-1.3x higher relative accuracy. Computationally, tspDB is 59-62x and 94-95x faster than LSTM and DeepAR in terms of median ML model training time and prediction query latency, respectively. Further, compared to PostgreSQL’s bulk insert time and its SELECT query latency, tspDB is slower by only 1.3x and 2.6x, respectively. That is, tspDB is a real-time prediction system in that its model training and prediction query times are comparable to those of simply inserting and reading data from a DB. As an algorithmic contribution, we introduce an incremental multivariate matrix-factorization-based time series method, on which tspDB is built. We show this method also allows one to produce reliable prediction intervals by accurately estimating the time-varying variance of a time series, thereby addressing an important problem in time series analysis.

Cite this Paper

BibTeX


@InProceedings{pmlr-v133-agarwal21a,
  title = 	 {tspDB: Time Series Predict DB},
  author =       {Agarwal, Anish and Alomar, Abdullah and Shah, Devavrat},
  booktitle = 	 {Proceedings of the NeurIPS 2020 Competition and Demonstration Track},
  pages = 	 {27--56},
  year = 	 {2021},
  editor = 	 {Escalante, Hugo Jair and Hofmann, Katja},
  volume = 	 {133},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--12 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v133/agarwal21a/agarwal21a.pdf},
  url = 	 {https://proceedings.mlr.press/v133/agarwal21a.html},
  abstract = 	 {A major bottleneck of the current Machine Learning (ML) workflow is the time consuming, error prone engineering required to get data from a datastore or a database (DB) to the point an ML algorithm can be applied to it.  This is further exacerbated since ML algorithms are now trained on large volumes of data, yet we need predictions in real-time, especially in a variety of time-series applications such as finance and real-time control systems.  Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB.  Such a system ideally:  (i) provides an intuitive prediction query interface which alleviates the unwieldy data engineering;  (ii) provides state-of-the-art statistical accuracy while ensuring incremental model update, low model training time  and low latency for making predictions.  As the main contribution we explicitly instantiate a proof-of-concept, tspDB which directly integrates with PostgreSQL.  We rigorously test tspDB’s statistical and computational performance against the state-of-the-art time series algorithms, including a Long-Short-Term-Memory (LSTM) neural network and DeepAR (industry standard deep learning library by Amazon).  Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.1-1.3x higher relative accuracy.  Computationally, tspDB is 59-62x and 94-95x faster compared to LSTM and DeepAR in terms of median ML model training time and prediction query latency, respectively.  Further, compared to PostgreSQL’s bulk insert time and its SELECT query latency, tspDB is slower only by 1.3x and 2.6x respectively.  That is, tspDB is a real-time prediction system in that its model training / prediction query time is similar to just inserting, reading data from a DB. As an algorithmic contribution, we introduce an incremental multivariate matrix factorization based time series method, which tspDB is built off. 
We show this method also allows one to produce reliable prediction intervals by accurately estimating the time-varying variance of a time series, thereby addressing an important problem in time series analysis.}
}

Endnote

%0 Conference Paper
%T tspDB: Time Series Predict DB
%A Anish Agarwal
%A Abdullah Alomar
%A Devavrat Shah
%B Proceedings of the NeurIPS 2020 Competition and Demonstration Track
%C Proceedings of Machine Learning Research
%D 2021
%E Hugo Jair Escalante
%E Katja Hofmann
%F pmlr-v133-agarwal21a
%I PMLR
%P 27--56
%U https://proceedings.mlr.press/v133/agarwal21a.html
%V 133
%X A major bottleneck of the current Machine Learning (ML) workflow is the time consuming, error prone engineering required to get data from a datastore or a database (DB) to the point an ML algorithm can be applied to it.  This is further exacerbated since ML algorithms are now trained on large volumes of data, yet we need predictions in real-time, especially in a variety of time-series applications such as finance and real-time control systems.  Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB.  Such a system ideally:  (i) provides an intuitive prediction query interface which alleviates the unwieldy data engineering;  (ii) provides state-of-the-art statistical accuracy while ensuring incremental model update, low model training time  and low latency for making predictions.  As the main contribution we explicitly instantiate a proof-of-concept, tspDB which directly integrates with PostgreSQL.  We rigorously test tspDB’s statistical and computational performance against the state-of-the-art time series algorithms, including a Long-Short-Term-Memory (LSTM) neural network and DeepAR (industry standard deep learning library by Amazon).  Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.1-1.3x higher relative accuracy.  Computationally, tspDB is 59-62x and 94-95x faster compared to LSTM and DeepAR in terms of median ML model training time and prediction query latency, respectively.  Further, compared to PostgreSQL’s bulk insert time and its SELECT query latency, tspDB is slower only by 1.3x and 2.6x respectively.  That is, tspDB is a real-time prediction system in that its model training / prediction query time is similar to just inserting, reading data from a DB. As an algorithmic contribution, we introduce an incremental multivariate matrix factorization based time series method, which tspDB is built off. 
We show this method also allows one to produce reliable prediction intervals by accurately estimating the time-varying variance of a time series, thereby addressing an important problem in time series analysis.

APA


Agarwal, A., Alomar, A. & Shah, D. (2021). tspDB: Time Series Predict DB. Proceedings of the NeurIPS 2020 Competition and Demonstration Track, in Proceedings of Machine Learning Research 133:27-56. Available from https://proceedings.mlr.press/v133/agarwal21a.html.
