Efficient active learning with generalized linear models

Jeremy Lewi, Robert Butera, Liam Paninski
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:267-274, 2007.

Abstract

Active learning can significantly reduce the amount of training data required to fit parametric statistical models for supervised learning tasks. Here we present an efficient algorithm for choosing the optimal (most informative) query when the output labels are related to the inputs by a generalized linear model (GLM). The algorithm is based on a Laplace approximation of the posterior distribution of the GLM's parameters. The algorithm requires only low-rank matrix manipulations and a single two-dimensional search to choose the optimal query and has complexity $O(n^2)$ (with $n$ the dimension of the feature space), making active learning with GLMs feasible even for high-dimensional feature spaces. In certain cases the two-dimensional search may be reduced to a one-dimensional search, further improving the algorithm's efficiency. Simulation results show that the model parameters can be estimated much more efficiently using the active learning technique than by using randomly chosen queries. We compute the asymptotic posterior covariance semi-analytically and demonstrate that the algorithm empirically achieves this asymptotic convergence rate, which is generally better than the convergence rate in the random-query setting. Finally, we generalize the approach to efficiently handle both output history effects (for applications to time-series models of autoregressive type) and slow, non-systematic drifts in the model parameters.
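
To make the abstract's pipeline concrete, the sketch below illustrates the general idea for logistic regression (a canonical GLM): maintain a Gaussian (Laplace) posterior over the weights, score candidate queries by approximate information gain, and update the posterior with an $O(n^2)$ rank-one formula. All names here (expected_info_gain, laplace_update, the candidate pool) are hypothetical stand-ins, not the authors' code; in particular, the brute-force search over a random pool replaces the paper's two-dimensional query optimization.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def expected_info_gain(x, mu, C):
        # Under a Gaussian (Laplace) posterior N(mu, C) on the weights, one
        # logistic observation at x carries Fisher information p(1-p) x x^T,
        # so the expected drop in posterior entropy is approximately
        # 0.5 * log(1 + p(1-p) x^T C x), with p evaluated at the posterior mean.
        p = sigmoid(mu @ x)
        return 0.5 * np.log1p(p * (1.0 - p) * (x @ C @ x))

    def laplace_update(x, y, mu, C):
        # Rank-one covariance update via the Sherman-Morrison formula -- an
        # O(n^2) low-rank matrix manipulation of the kind the abstract refers
        # to -- plus a single Newton-style step for the posterior mean.
        p = sigmoid(mu @ x)
        d = p * (1.0 - p)                    # curvature of the log-likelihood at x
        Cx = C @ x
        C_new = C - np.outer(Cx, Cx) * (d / (1.0 + d * (x @ Cx)))
        mu_new = mu + C_new @ x * (y - p)    # gradient of the new log-likelihood term
        return mu_new, C_new

    # Toy run: greedily query the most informative input from a candidate pool.
    rng = np.random.default_rng(0)
    n = 20                                   # feature-space dimension
    theta_true = rng.standard_normal(n)      # weights we are trying to learn
    mu, C = np.zeros(n), np.eye(n)           # Gaussian prior on the weights

    for t in range(200):
        pool = rng.standard_normal((100, n))
        x = pool[int(np.argmax([expected_info_gain(x, mu, C) for x in pool]))]
        y = float(rng.random() < sigmoid(theta_true @ x))   # simulated label
        mu, C = laplace_update(x, y, mu, C)

    print("estimation error:", np.linalg.norm(mu - theta_true))

Replacing the argmax line with a random draw from the pool gives the random-query baseline; on this toy problem the informative queries should drive the estimation error down noticeably faster, in line with the simulation comparison the abstract reports.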

Cite this Paper


BibTeX
@InProceedings{pmlr-v2-lewi07a,
  title = {Efficient active learning with generalized linear models},
  author = {Lewi, Jeremy and Butera, Robert and Paninski, Liam},
  booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics},
  pages = {267--274},
  year = {2007},
  editor = {Meila, Marina and Shen, Xiaotong},
  volume = {2},
  series = {Proceedings of Machine Learning Research},
  address = {San Juan, Puerto Rico},
  month = {21--24 Mar},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v2/lewi07a/lewi07a.pdf},
  url = {https://proceedings.mlr.press/v2/lewi07a.html},
  abstract = {Active learning can significantly reduce the amount of training data required to fit parametric statistical models for supervised learning tasks. Here we present an efficient algorithm for choosing the optimal (most informative) query when the output labels are related to the inputs by a generalized linear model (GLM). The algorithm is based on a Laplace approximation of the posterior distribution of the GLM's parameters. The algorithm requires only low-rank matrix manipulations and a single two-dimensional search to choose the optimal query and has complexity $O(n^2)$ (with $n$ the dimension of the feature space), making active learning with GLMs feasible even for high-dimensional feature spaces. In certain cases the two-dimensional search may be reduced to a one-dimensional search, further improving the algorithm's efficiency. Simulation results show that the model parameters can be estimated much more efficiently using the active learning technique than by using randomly chosen queries. We compute the asymptotic posterior covariance semi-analytically and demonstrate that the algorithm empirically achieves this asymptotic convergence rate, which is generally better than the convergence rate in the random-query setting. Finally, we generalize the approach to efficiently handle both output history effects (for applications to time-series models of autoregressive type) and slow, non-systematic drifts in the model parameters.}
}
Endnote
%0 Conference Paper
%T Efficient active learning with generalized linear models
%A Jeremy Lewi
%A Robert Butera
%A Liam Paninski
%B Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2007
%E Marina Meila
%E Xiaotong Shen
%F pmlr-v2-lewi07a
%I PMLR
%P 267--274
%U https://proceedings.mlr.press/v2/lewi07a.html
%V 2
%X Active learning can significantly reduce the amount of training data required to fit parametric statistical models for supervised learning tasks. Here we present an efficient algorithm for choosing the optimal (most informative) query when the output labels are related to the inputs by a generalized linear model (GLM). The algorithm is based on a Laplace approximation of the posterior distribution of the GLM's parameters. The algorithm requires only low-rank matrix manipulations and a single two-dimensional search to choose the optimal query and has complexity $O(n^2)$ (with $n$ the dimension of the feature space), making active learning with GLMs feasible even for high-dimensional feature spaces. In certain cases the two-dimensional search may be reduced to a one-dimensional search, further improving the algorithm's efficiency. Simulation results show that the model parameters can be estimated much more efficiently using the active learning technique than by using randomly chosen queries. We compute the asymptotic posterior covariance semi-analytically and demonstrate that the algorithm empirically achieves this asymptotic convergence rate, which is generally better than the convergence rate in the random-query setting. Finally, we generalize the approach to efficiently handle both output history effects (for applications to time-series models of autoregressive type) and slow, non-systematic drifts in the model parameters.
RIS
TY - CPAPER
TI - Efficient active learning with generalized linear models
AU - Jeremy Lewi
AU - Robert Butera
AU - Liam Paninski
BT - Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
DA - 2007/03/11
ED - Marina Meila
ED - Xiaotong Shen
ID - pmlr-v2-lewi07a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 2
SP - 267
EP - 274
L1 - http://proceedings.mlr.press/v2/lewi07a/lewi07a.pdf
UR - https://proceedings.mlr.press/v2/lewi07a.html
AB - Active learning can significantly reduce the amount of training data required to fit parametric statistical models for supervised learning tasks. Here we present an efficient algorithm for choosing the optimal (most informative) query when the output labels are related to the inputs by a generalized linear model (GLM). The algorithm is based on a Laplace approximation of the posterior distribution of the GLM's parameters. The algorithm requires only low-rank matrix manipulations and a single two-dimensional search to choose the optimal query and has complexity $O(n^2)$ (with $n$ the dimension of the feature space), making active learning with GLMs feasible even for high-dimensional feature spaces. In certain cases the two-dimensional search may be reduced to a one-dimensional search, further improving the algorithm's efficiency. Simulation results show that the model parameters can be estimated much more efficiently using the active learning technique than by using randomly chosen queries. We compute the asymptotic posterior covariance semi-analytically and demonstrate that the algorithm empirically achieves this asymptotic convergence rate, which is generally better than the convergence rate in the random-query setting. Finally, we generalize the approach to efficiently handle both output history effects (for applications to time-series models of autoregressive type) and slow, non-systematic drifts in the model parameters.
ER -
APA
Lewi, J., Butera, R. & Paninski, L. (2007). Efficient active learning with generalized linear models. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 2:267-274. Available from https://proceedings.mlr.press/v2/lewi07a.html.