Estimating Mutual Information in Under-Reported Variables

Konstantinos Sechidis, Matthew Sperrin, Emily Petherick, Gavin Brown
Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR 52:449-461, 2016.

Abstract

Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v52-sechidis16,
  title = {Estimating Mutual Information in Under-Reported Variables},
  author = {Sechidis, Konstantinos and Sperrin, Matthew and Petherick, Emily and Brown, Gavin},
  booktitle = {Proceedings of the Eighth International Conference on Probabilistic Graphical Models},
  pages = {449--461},
  year = {2016},
  editor = {Antonucci, Alessandro and Corani, Giorgio and Campos, Cassio Polpo},
  volume = {52},
  series = {Proceedings of Machine Learning Research},
  address = {Lugano, Switzerland},
  month = {06--09 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v52/sechidis16.pdf},
  url = {https://proceedings.mlr.press/v52/sechidis16.html},
  abstract = {Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.}
}
Endnote
%0 Conference Paper
%T Estimating Mutual Information in Under-Reported Variables
%A Konstantinos Sechidis
%A Matthew Sperrin
%A Emily Petherick
%A Gavin Brown
%B Proceedings of the Eighth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2016
%E Alessandro Antonucci
%E Giorgio Corani
%E Cassio Polpo Campos
%F pmlr-v52-sechidis16
%I PMLR
%P 449--461
%U https://proceedings.mlr.press/v52/sechidis16.html
%V 52
%X Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.
RIS
TY  - CPAPER
TI  - Estimating Mutual Information in Under-Reported Variables
AU  - Konstantinos Sechidis
AU  - Matthew Sperrin
AU  - Emily Petherick
AU  - Gavin Brown
BT  - Proceedings of the Eighth International Conference on Probabilistic Graphical Models
DA  - 2016/08/15
ED  - Alessandro Antonucci
ED  - Giorgio Corani
ED  - Cassio Polpo Campos
ID  - pmlr-v52-sechidis16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 52
SP  - 449
EP  - 461
L1  - http://proceedings.mlr.press/v52/sechidis16.pdf
UR  - https://proceedings.mlr.press/v52/sechidis16.html
AB  - Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.
ER  -
APA
Sechidis, K., Sperrin, M., Petherick, E. & Brown, G. (2016). Estimating Mutual Information in Under-Reported Variables. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 52:449-461. Available from https://proceedings.mlr.press/v52/sechidis16.html.
