Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors
Proceedings of the Twelfth Symposium on Conformal
and Probabilistic Prediction with Applications, PMLR 204:56-73, 2023.
Abstract
Disk scrubbing is a process aimed at resolving read
errors on a disk by reading the data stored on
it. However, scrubbing the entire storage array at
once can adversely impact system performance,
particularly during periods of high input/output
operations. Additionally, the continuous reading of
data from disks when scrubbing can result in wear
and tear, especially on larger capacity disks, due
to the significant time and energy consumption
involved. To address these issues, we propose a
selective disk scrubbing method that enhances the
overall reliability and power efficiency in data
centers. Our method employs a machine learning model
based on Mondrian conformal prediction to identify
specific disks for scrubbing. The model proactively
predicts the health status of each disk in the
storage pool n days in advance, using an open-source
dataset. Disks predicted as non-healthy are marked
for replacement without further action. For healthy
drives, we create a set
and quantify their relative health across the entire
storage pool based on the predictor’s
confidence. This enables us to prioritize selective
scrubbing for drives with established scrubbing
frequency based on the scrub cycle. The method we
propose provides an efficient and dependable
solution for managing enterprise disk drives. By
scrubbing just 22.7% of the total storage disks, we
can achieve optimized energy consumption and reduce
the carbon footprint of the data center.
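
The triage described in the abstract can be illustrated with a minimal sketch of a Mondrian (label-conditional) conformal predictor. This is not the authors' implementation. The names `mondrian_p_values` and `triage`, the use of a failure probability as the nonconformity score, the calibration values, and the rule of scrubbing the least confident healthy drives first are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

HEALTHY, FAILING = 0, 1

def mondrian_p_values(prob_fail: float,
                      cal_scores: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Label-conditional conformal p-values for one drive.

    prob_fail is the failure probability from some underlying classifier
    (hypothetical here). cal_scores[label] holds the nonconformity scores of
    calibration drives whose true label is `label`.
    """
    # Assumed nonconformity score: one minus the probability of the candidate label.
    test_scores = {HEALTHY: prob_fail, FAILING: 1.0 - prob_fail}
    p_values = {}
    for label, scores in cal_scores.items():
        # Mondrian step: compare only against calibration drives of the same label.
        n_ge = sum(1 for s in scores if s >= test_scores[label])
        p_values[label] = (n_ge + 1) / (len(scores) + 1)
    return p_values

def triage(pool: Dict[str, float],
           cal_scores: Dict[int, Sequence[float]]) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Split a pool into drives to replace and healthy drives ranked for scrubbing."""
    replace, healthy = [], []
    for drive_id, prob_fail in pool.items():
        p = mondrian_p_values(prob_fail, cal_scores)
        if p[FAILING] > p[HEALTHY]:
            replace.append(drive_id)
        else:
            # Confidence in "healthy" is one minus the p-value of the other label.
            healthy.append((drive_id, 1.0 - p[FAILING]))
    # Assumption: the least confident healthy drives are scrubbed first.
    healthy.sort(key=lambda item: item[1])
    return replace, healthy

# Made-up calibration scores and failure probabilities, for illustration only.
cal_scores = {
    HEALTHY: [0.02, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6],
    FAILING: [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9],
}
pool = {"A": 0.01, "B": 0.35, "C": 0.97}
replace, scrub_order = triage(pool, cal_scores)
print(replace)
print(scrub_order)
```

Computing the p-values separately per label is what makes the predictor Mondrian: the validity guarantee then holds within each class, which matters when failing drives are far rarer than healthy ones. In this sketch, ties between the two p-values are resolved in favor of the healthy label, which is also an assumption.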