Attributing Hacks

Ziqi Liu, Alex Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:794-802, 2017.

Abstract

In this paper, we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as to estimate the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are functions of continuous time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with a total variation penalty and show that the optimal solution is a 0th-order spline (a piecewise constant function) with a finite number of adaptively chosen knots. This allows the inference problem to be solved efficiently and at scale by solving a finite-dimensional optimization problem. Extensive experiments on real data sets show that our method significantly outperforms Cox’s proportional hazard model. We also conduct case studies and verify that the fitted functions indeed recover vulnerable features.
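To make the kind of objective described above concrete, here is a minimal sketch, not the authors' implementation. It assumes (our assumption, not stated in the abstract) that site i with nonnegative features x_i has hazard lambda_i(t) = sum_j x_ij * beta_j(t) with beta_j(t) >= 0, that each site is observed until time T_i with delta_i = 1 if it was hacked and 0 if censored, and that the penalized negative log-likelihood is -sum_i [delta_i * log lambda_i(T_i) - integral_0^{T_i} lambda_i(s) ds] + gamma * TV(beta). The coefficients are taken to be piecewise constant on hand-picked knots; the paper's adaptive knot selection and its solver are not reproduced. The helper names beta_at, cumulative_beta, penalized_nll and attribute are hypothetical.

```python
import numpy as np

# Hypothetical sketch: additive hazard with piecewise-constant coefficients.
# beta has shape (K, d): row k holds the d feature coefficients on
# [knots[k], knots[k+1]). Assumes knots[0] == 0 and knots[-1] >= every T_i.


def beta_at(beta, knots, t):
    """Coefficient vector beta(t) for a scalar time t."""
    k = np.searchsorted(knots, t, side="right") - 1
    k = min(max(k, 0), len(beta) - 1)  # clamp so t == knots[-1] uses the last piece
    return beta[k]


def cumulative_beta(beta, knots, t):
    """Integral of beta(s) ds over [0, t]; exact for piecewise-constant beta."""
    widths = np.clip(t - knots[:-1], 0.0, np.diff(knots))  # time spent in each piece
    return widths @ beta


def penalized_nll(beta, knots, X, T, delta, gamma):
    """Negative log-likelihood of the additive hazard model plus a TV penalty."""
    nll = 0.0
    for x, t, d in zip(X, T, delta):
        if d:  # hacked at time t: contributes -log lambda_i(t)
            nll -= np.log(x @ beta_at(beta, knots, t))
        nll += x @ cumulative_beta(beta, knots, t)  # cumulative hazard up to t
    tv = np.abs(np.diff(beta, axis=0)).sum()  # total variation across knots
    return nll + gamma * tv


def attribute(beta, knots, x, t):
    """Share of the hazard at time t contributed by each feature."""
    contrib = x * beta_at(beta, knots, t)
    return contrib / contrib.sum()


# Toy data (made up for illustration): two pieces, two features, two sites.
knots = np.array([0.0, 1.0, 2.0])
beta = np.array([[0.5, 0.1], [0.2, 0.4]])
X = np.array([[1.0, 0.0], [1.0, 1.0]])
T = np.array([1.5, 2.0])
delta = np.array([1, 0])  # first site hacked, second censored

print(penalized_nll(beta, knots, X, T, delta, gamma=0.1))
print(attribute(beta, knots, np.array([1.0, 1.0]), 1.5))
```

Because the hazard in this sketch is additive, each attack's hazard splits exactly into per-feature terms, which is one simple way to read "attributing" an attack to features; the abstract does not spell out the paper's attribution rule, so treat this as illustrative only.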

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-liu17a,
  title = {{Attributing Hacks}},
  author = {Liu, Ziqi and Smola, Alex and Soska, Kyle and Wang, Yu-Xiang and Zheng, Qinghua},
  booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  pages = {794--802},
  year = {2017},
  editor = {Singh, Aarti and Zhu, Jerry},
  volume = {54},
  series = {Proceedings of Machine Learning Research},
  month = {20--22 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v54/liu17a/liu17a.pdf},
  url = {https://proceedings.mlr.press/v54/liu17a.html},
  abstract = {In this paper, we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions over time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with total variation penalty and show that the optimal solution is a 0th order spline (a piecewise constant function) with a finite number of adaptively chosen knots. This allows the inference problem to be solved efficiently and at scale by solving a finite dimensional optimization problem. Extensive experiments on real data sets show that our method significantly outperforms Cox’s proportional hazard model. We also conduct case studies and verify that the fitted functions are indeed recovering vulnerable features.}
}
Endnote
%0 Conference Paper
%T Attributing Hacks
%A Ziqi Liu
%A Alex Smola
%A Kyle Soska
%A Yu-Xiang Wang
%A Qinghua Zheng
%B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2017
%E Aarti Singh
%E Jerry Zhu
%F pmlr-v54-liu17a
%I PMLR
%P 794--802
%U https://proceedings.mlr.press/v54/liu17a.html
%V 54
%X In this paper, we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions over time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with total variation penalty and show that the optimal solution is a 0th order spline (a piecewise constant function) with a finite number of adaptively chosen knots. This allows the inference problem to be solved efficiently and at scale by solving a finite dimensional optimization problem. Extensive experiments on real data sets show that our method significantly outperforms Cox’s proportional hazard model. We also conduct case studies and verify that the fitted functions are indeed recovering vulnerable features.
APA
Liu, Z., Smola, A., Soska, K., Wang, Y. & Zheng, Q. (2017). Attributing Hacks. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:794-802. Available from https://proceedings.mlr.press/v54/liu17a.html.
