Data Representations in Learning
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:495-501, 1995.
This paper examines the effect of varying the coarseness (or fineness) of a data representation on the achievable learning or recognition accuracy. This accuracy is quantified by the least probability of error in recognition, also known as the Bayes error rate, assuming that there is a finite number of classes into which each data element can be classified. By modeling the granularity variation of the representation as a refinement of the underlying probability structure of the data, we examine how the recognition accuracy varies. Specifically, refining the data representation leads to improved bounds on the probability of error, confirming the intuitive notion that more information can lead to improved decision-making. This analysis may be extended to multiresolution methods, where both coarse-to-fine and fine-to-coarse variations in representation are possible. Our research was motivated by examining the change in the recognition accuracy of $k$-nearest neighbor classifiers as the resolution of the data (optical character images) is varied. In this domain, the data resolution is crucial in determining trade-offs between the speed and accuracy of the OCR system.
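The central claim can be illustrated numerically. For a discrete representation, the Bayes error is $1 - \sum_{c} \max_{y} p(c, y)$, where $c$ ranges over representation cells and $y$ over classes; merging cells (coarsening) can only raise this quantity, while splitting them (refining) can only lower it. The sketch below is a hypothetical illustration, not code from the paper; the particular joint distribution is invented for the example.

```python
def bayes_error(joint):
    """Bayes error of a discrete representation.

    joint: list of cells, each a list of joint probabilities p(cell, y)
    over the classes y; all entries together sum to 1.
    """
    return 1.0 - sum(max(cell) for cell in joint)

# Fine representation: four cells, two classes (invented probabilities).
fine = [[0.20, 0.05], [0.05, 0.20], [0.15, 0.10], [0.10, 0.15]]

# Coarse representation: merge the cells pairwise by summing joint masses.
coarse = [[a + b for a, b in zip(x, y)] for x, y in zip(fine[::2], fine[1::2])]

print(bayes_error(fine))    # error under the refined partition
print(bayes_error(coarse))  # error under the coarsened partition
```

Here the fine partition attains a Bayes error of 0.30, while merging informative cells drives it up to 0.50, matching the abstract's point that a finer representation yields a lower (or equal) achievable error.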