Learning from Censored and Dependent Data: The case of Linear Dynamics
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3771-3787, 2021.
Observations from dynamical systems often exhibit irregularities, such as censoring, where values are recorded only if they fall within a certain range. Censoring is ubiquitous in practice, due to saturating sensors, limit-of-detection effects, image frame effects, and combined with the temporal dependencies within the data, makes the task of system identification particularly challenging. In light of recent developments on learning linear dynamical systems (LDSs), and on censored statistics with independent data, we revisit the decades-old problem of learning an LDS, from censored observations (Lee and Maddala (1985), Zeger and Brookmeyer (1986)). Here, the learner observes the state x_t \in R^d if and only if x_t belongs to some set S_t _x0012_\in R^d. We develop the first computationally and statistically efficient algorithm for learning the system, assuming only oracle to the sets St. Our algorithm, Stochastic Online Newton with Switching Gradients, is a novel second-order method that builds on the Online Newton Step (ONS) of Hazan et al. (2007). Our Switching-Gradient scheme does not always use (stochastic) gradients of the function we want to optimize, which we call censor-aware function. Instead, in each iteration, it performs a simple test to decide whether to use the censor-aware, or another censor-oblivious function, for getting a stochastic gradient. In our analysis, we consider a “generic” Online Newton method, which uses arbitrary vectors instead of gradients, and we prove an error-bound for it. This can be used to appropriately design these vectors, Leading to our Switching-Gradient scheme. This framework significantly deviates from the recent long line of works on censored statistics (e.g, Daskalakis et al. (2018); Kontonis et al. (2019); Daskalakis et al. (2019), which apply Stochastic Gradient Descent (SGD), and their analysis reduces to establishing conditions for off-the-shelf SGD-bounds. Our approach enables to relax these conditions, and gives rise to phenomena that might appear counterintuitive, given the previous works. Specifically, our method makes progress even when the current “survival probability” is exponentially small. We believe that our analysis framework will have applications in more settings where the data are subject to censoring.