Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction

Sébastien Giguère; Amélie Rolland; Francois Laviolette; Mario Marchand

Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction

Sébastien Giguère, Amélie Rolland, Francois Laviolette, Mario Marchand

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2021-2029, 2015.

Abstract

We address the pre-image problem encountered in structured output prediction and the one of finding a string maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many string kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. We also show that for many string kernels, the complexity of the problem increases significantly when the kernel is normalized. On the optical word recognition task, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a string maximizing the prediction function of kernel-based classifiers and regressors, we highlight that existing methods can be biased toward long strings that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at https://github.com/a-ro/preimage

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-giguere15,
  title = 	 {Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction},
  author = 	 {Giguère, Sébastien and Rolland, Amélie and Laviolette, Francois and Marchand, Mario},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2021--2029},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/giguere15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/giguere15.html},
  abstract = 	 {We address the pre-image problem encountered in structured output prediction and the one of finding a string maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many string kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. We also show that for many string kernels, the complexity of the problem increases significantly when the kernel is normalized. On the optical word recognition task, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a string maximizing the prediction function of kernel-based classifiers and regressors, we highlight that existing methods can be biased toward long strings that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at https://github.com/a-ro/preimage}
}

Endnote

%0 Conference Paper
%T Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction
%A Sébastien Giguère
%A Amélie Rolland
%A Francois Laviolette
%A Mario Marchand
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-giguere15
%I PMLR
%P 2021--2029
%U https://proceedings.mlr.press/v37/giguere15.html
%V 37
%X We address the pre-image problem encountered in structured output prediction and the one of finding a string maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many string kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. We also show that for many string kernels, the complexity of the problem increases significantly when the kernel is normalized. On the optical word recognition task, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a string maximizing the prediction function of kernel-based classifiers and regressors, we highlight that existing methods can be biased toward long strings that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at https://github.com/a-ro/preimage

RIS


TY  - CPAPER
TI  - Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction
AU  - Sébastien Giguère
AU  - Amélie Rolland
AU  - Francois Laviolette
AU  - Mario Marchand
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-giguere15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2021
EP  - 2029
L1  - http://proceedings.mlr.press/v37/giguere15.pdf
UR  - https://proceedings.mlr.press/v37/giguere15.html
AB  - We address the pre-image problem encountered in structured output prediction and the one of finding a string maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many string kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. We also show that for many string kernels, the complexity of the problem increases significantly when the kernel is normalized. On the optical word recognition task, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a string maximizing the prediction function of kernel-based classifiers and regressors, we highlight that existing methods can be biased toward long strings that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at https://github.com/a-ro/preimage
ER  -

APA


Giguère, S., Rolland, A., Laviolette, F. & Marchand, M.. (2015). Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2021-2029 Available from https://proceedings.mlr.press/v37/giguere15.html.

Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction

Abstract

Cite this Paper

Related Material