Scalable SDE Filtering and Inference with Apache Spark

Harish S. Bhat; R. W. M. A. Madushani; Shagun Rawat

Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, PMLR 53:18-34, 2016.

Abstract

In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.

Cite this Paper

BibTeX


@InProceedings{pmlr-v53-bhat16,
  title = 	 {Scalable SDE Filtering and Inference with Apache Spark},
  author = 	 {Bhat, Harish S. and Madushani, R. W. M. A. and Rawat, Shagun},
  booktitle = 	 {Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016},
  pages = 	 {18--34},
  year = 	 {2016},
  editor = 	 {Fan, Wei and Bifet, Albert and Read, Jesse and Yang, Qiang and Yu, Philip S.},
  volume = 	 {53},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Francisco, California, USA},
  month = 	 {14 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v53/bhat16.pdf},
  url = 	 {https://proceedings.mlr.press/v53/bhat16.html},
  abstract = 	 {In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.}
}

Endnote

%0 Conference Paper
%T Scalable SDE Filtering and Inference with Apache Spark
%A Harish S. Bhat
%A R. W. M. A. Madushani
%A Shagun Rawat
%B Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
%C Proceedings of Machine Learning Research
%D 2016
%E Wei Fan
%E Albert Bifet
%E Jesse Read
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v53-bhat16
%I PMLR
%P 18--34
%U https://proceedings.mlr.press/v53/bhat16.html
%V 53
%X In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.

RIS


TY  - CPAPER
TI  - Scalable SDE Filtering and Inference with Apache Spark
AU  - Harish S. Bhat
AU  - R. W. M. A. Madushani
AU  - Shagun Rawat
BT  - Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
DA  - 2016/12/06
ED  - Wei Fan
ED  - Albert Bifet
ED  - Jesse Read
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v53-bhat16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 53
SP  - 18
EP  - 34
L1  - http://proceedings.mlr.press/v53/bhat16.pdf
UR  - https://proceedings.mlr.press/v53/bhat16.html
AB  - In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.
ER  -

APA


Bhat, H.S., Madushani, R.W.M.A. & Rawat, S. (2016). Scalable SDE Filtering and Inference with Apache Spark. Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, in Proceedings of Machine Learning Research 53:18-34. Available from https://proceedings.mlr.press/v53/bhat16.html.
