Bridging the Reliability Gap: INT8 Quantization Effects on Discrimination and Calibration in Medical Imaging
Proceedings of the 7th Conference on Health, Inference, and Learning, PMLR 333:952-965, 2026.
Abstract
Deploying medical imaging classifiers often requires reduced-precision inference to meet practical latency and memory budgets, yet the impact of quantization on discrimination and calibration varies across tasks and architectures. We evaluate three public medical imaging datasets (BrainMRI, ChestXray, SkinCancer) and eight ImageNet-pretrained backbones under FP32, FP16, INT8 post-training quantization (PTQ), and INT8 quantization-aware training (QAT). We report macro-averaged one-vs-rest ROC-AUC and AUPRC, calibration metrics (expected calibration error (ECE) and Brier score), and efficiency metrics (throughput and p50 and p99 batch latency) measured on GPU and CPU. FP16 closely matches FP32 across datasets, whereas INT8-PTQ can introduce substantial, architecture-dependent loss of discrimination along with shifts in calibration. INT8-QAT largely recovers floating-point behavior while still enabling integer inference. These results motivate evaluating accuracy, calibration, and efficiency together when selecting a quantization strategy for clinical deployment.
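The two calibration metrics named in the abstract can be made concrete with a short sketch. The following is a minimal Python illustration, assuming top-label ECE with equal-width confidence bins (15 by default) and the multiclass Brier score averaged over samples. The abstract does not state the binning scheme or the Brier variant used, so these choices, and the function names `expected_calibration_error` and `brier_score`, are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence


def expected_calibration_error(probs: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               n_bins: int = 15) -> float:
    """Top-label ECE with equal-width confidence bins.

    Assumption: equal-width bins over [0, 1] and top-1 confidence; the
    paper's exact ECE configuration is not given in the abstract.
    """
    n = len(labels)
    conf_sum: List[float] = [0.0] * n_bins     # summed top-1 confidence per bin
    correct_sum: List[float] = [0.0] * n_bins  # count of correct top-1 predictions per bin
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # argmax class
        conf = p[pred]
        b = min(int(conf * n_bins), n_bins - 1)        # confidence 1.0 falls in the last bin
        conf_sum[b] += conf
        correct_sum[b] += 1.0 if pred == y else 0.0
    # sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)| reduces to sum_b |correct_b - conf_b| / n
    return sum(abs(c - s) for c, s in zip(correct_sum, conf_sum)) / n


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multiclass Brier score: mean over samples of squared error vs. one-hot labels.

    Assumption: sum over classes, averaged over samples (one common convention).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


if __name__ == "__main__":
    # Toy softmax outputs for a 2-class problem (illustrative values only).
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    labels = [0, 1, 1, 1]
    print(expected_calibration_error(probs, labels, n_bins=10))  # ≈ 0.3
    print(brier_score(probs, labels))                            # ≈ 0.25
```

Running both functions on the softmax outputs of the FP32 and INT8 versions of the same model, with identical labels and bins, is one way to quantify the calibration shift the abstract describes. Because ECE depends on the number and placement of bins, holding the binning fixed across precisions keeps the comparison consistent.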