Being Properly Improper
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20891-20932, 2022.
Properness for supervised losses stipulates that the loss function shapes the learning algorithm towards the true posterior of the data generating distribution. Unfortunately, data in modern machine learning can be corrupted or twisted in many ways. Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior. Many papers cope with specific twists (e.g., label/feature/adversarial noise), but there is a growing need for a unified and actionable understanding atop properness. Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to "untwist" the twisted posterior into the clean posterior. Notably, we show that a nontrivial extension of a loss function called alpha-loss, which was first introduced in information theory, is twist-proper. We study the twist-proper alpha-loss under a novel boosting algorithm, called PILBoost, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper alpha-loss outperforms the proper log-loss on several variants of twisted data.