Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs

Victor Connes, Colin De La Higuera, Hoel Le Capitaine
Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR 153:176-190, 2021.

Abstract

In many web applications, user logs are extracted to build a user model which can be part of further development, recommendation systems or personalization. This is the case for education platforms like X5GON. In order to obtain community collaboration, these logs should be shared, but privacy issues naturally arise. In this work, we propose to build a user model from a data-set of logs: this will be a timed and probabilistic $k$-testable automaton, which can then be used to generate a new data-set having statistically close characteristics, yet in which the original sequences have been chunked sufficiently that the original logs can no longer be identified. Following ideas from Differential Privacy, we provide a second algorithm that eliminates any strings whose influence would be too great. Experiments validate the approach.
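
The generation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it ignores the timed component and the Differential Privacy filtering step, and simply treats a probabilistic $k$-testable model as a table of next-symbol counts indexed by the last $k-1$ symbols. The names `learn_k_testable`, `generate`, and `END` are hypothetical, chosen only for this illustration.

```python
import random
from collections import Counter, defaultdict

END = "</s>"  # hypothetical end-of-sequence marker, not taken from the paper


def learn_k_testable(sequences, k):
    """Count which symbol follows each context of at most k-1 preceding symbols."""
    counts = defaultdict(Counter)
    for seq in sequences:
        context = ()
        for symbol in list(seq) + [END]:
            counts[context][symbol] += 1
            # The next state keeps only the last k-1 symbols seen so far.
            context = (context + (symbol,))[-(k - 1):] if k > 1 else ()
    return counts


def generate(counts, k, rng, max_len=100):
    """Sample one synthetic sequence by walking the learned contexts."""
    context, out = (), []
    while len(out) < max_len:
        symbols, weights = zip(*counts[context].items())
        symbol = rng.choices(symbols, weights=weights)[0]
        if symbol == END:
            break
        out.append(symbol)
        context = (context + (symbol,))[-(k - 1):] if k > 1 else ()
    return out


# Toy logs (made up for illustration): each log is a sequence of page identifiers.
logs = [["home", "search", "video"], ["home", "search", "quiz"]]
model = learn_k_testable(logs, k=3)
synthetic = [generate(model, 3, random.Random(seed)) for seed in range(5)]
```

With a smaller $k$, the sampled sequences recombine shorter chunks of the original logs, which moves them further from any single original log at the cost of statistical fidelity.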

Cite this Paper


BibTeX
@InProceedings{pmlr-v153-connes21a,
  title = {Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs},
  author = {Connes, Victor and De La Higuera, Colin and Le Capitaine, Hoel},
  booktitle = {Proceedings of the Fifteenth International Conference on Grammatical Inference},
  pages = {176--190},
  year = {2021},
  editor = {Chandlee, Jane and Eyraud, Rémi and Heinz, Jeff and Jardine, Adam and van Zaanen, Menno},
  volume = {153},
  series = {Proceedings of Machine Learning Research},
  month = {23--27 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v153/connes21a/connes21a.pdf},
  url = {https://proceedings.mlr.press/v153/connes21a.html},
  abstract = {In many web applications, user logs are extracted to build a user model which can be part of further development, recommendation systems or personalization. This is the case for education platforms like X5GON. In order to obtain community collaboration, these logs should be shared, but privacy issues naturally arise. In this work, we propose to build a user model from a data-set of logs: this will be a timed and probabilistic $k$-testable automaton, which can then be used to generate a new data-set having statistically close characteristics, yet in which the original sequences have been chunked sufficiently that the original logs can no longer be identified. Following ideas from Differential Privacy, we provide a second algorithm that eliminates any strings whose influence would be too great. Experiments validate the approach.}
}
Endnote
%0 Conference Paper
%T Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs
%A Victor Connes
%A Colin De La Higuera
%A Hoel Le Capitaine
%B Proceedings of the Fifteenth International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2021
%E Jane Chandlee
%E Rémi Eyraud
%E Jeff Heinz
%E Adam Jardine
%E Menno van Zaanen
%F pmlr-v153-connes21a
%I PMLR
%P 176--190
%U https://proceedings.mlr.press/v153/connes21a.html
%V 153
%X In many web applications, user logs are extracted to build a user model which can be part of further development, recommendation systems or personalization. This is the case for education platforms like X5GON. In order to obtain community collaboration, these logs should be shared, but privacy issues naturally arise. In this work, we propose to build a user model from a data-set of logs: this will be a timed and probabilistic $k$-testable automaton, which can then be used to generate a new data-set having statistically close characteristics, yet in which the original sequences have been chunked sufficiently that the original logs can no longer be identified. Following ideas from Differential Privacy, we provide a second algorithm that eliminates any strings whose influence would be too great. Experiments validate the approach.
APA
Connes, V., De La Higuera, C. & Le Capitaine, H. (2021). Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs. Proceedings of the Fifteenth International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 153:176-190. Available from https://proceedings.mlr.press/v153/connes21a.html.
