Estimating Mutual Information in Under-Reported Variables

Konstantinos Sechidis, Matthew Sperrin, Emily Petherick, Gavin Brown
Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR 52:449-461, 2016.

Abstract

Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v52-sechidis16,
  title = {Estimating Mutual Information in Under-Reported Variables},
  author = {Konstantinos Sechidis and Matthew Sperrin and Emily Petherick and Gavin Brown},
  booktitle = {Proceedings of the Eighth International Conference on Probabilistic Graphical Models},
  pages = {449--461},
  year = {2016},
  editor = {Alessandro Antonucci and Giorgio Corani and Cassio Polpo Campos},
  volume = {52},
  series = {Proceedings of Machine Learning Research},
  address = {Lugano, Switzerland},
  month = {06--09 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v52/sechidis16.pdf},
  url = {http://proceedings.mlr.press/v52/sechidis16.html},
  abstract = {Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.}
}
Endnote
%0 Conference Paper
%T Estimating Mutual Information in Under-Reported Variables
%A Konstantinos Sechidis
%A Matthew Sperrin
%A Emily Petherick
%A Gavin Brown
%B Proceedings of the Eighth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2016
%E Alessandro Antonucci
%E Giorgio Corani
%E Cassio Polpo Campos
%F pmlr-v52-sechidis16
%I PMLR
%J Proceedings of Machine Learning Research
%P 449--461
%U http://proceedings.mlr.press
%V 52
%W PMLR
%X Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.
RIS
TY  - CPAPER
TI  - Estimating Mutual Information in Under-Reported Variables
AU  - Konstantinos Sechidis
AU  - Matthew Sperrin
AU  - Emily Petherick
AU  - Gavin Brown
BT  - Proceedings of the Eighth International Conference on Probabilistic Graphical Models
PY  - 2016/08/15
DA  - 2016/08/15
ED  - Alessandro Antonucci
ED  - Giorgio Corani
ED  - Cassio Polpo Campos
ID  - pmlr-v52-sechidis16
PB  - PMLR
SP  - 449
DP  - PMLR
EP  - 461
L1  - http://proceedings.mlr.press/v52/sechidis16.pdf
UR  - http://proceedings.mlr.press/v52/sechidis16.html
AB  - Under-reporting occurs in survey data when there is a reason to systematically misreport the response to a question. For example, in studies dealing with low birth weight infants, the smoking habits of the mother are very likely to be misreported. This creates problems for calculating effect sizes, such as bias, but these problems are commonly ignored due to lack of generally accepted solutions. We reinterpret this as a problem of learning from missing data, and particularly learning from positive and unlabelled data. By this formalisation we provide a simple method to incorporate prior knowledge of the misreporting and we present how we can use this knowledge to derive corrected point and interval estimates of the mutual information. Then we show how our corrected estimators outperform more complex approaches and we present applications of our theoretical results in real world problems and machine learning tasks.
ER  -
APA
Sechidis, K., Sperrin, M., Petherick, E. & Brown, G. (2016). Estimating Mutual Information in Under-Reported Variables. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, in PMLR 52:449-461.
