Hide-and-Seek Privacy Challenge: Synthetic Data Generation vs. Patient Re-identification
Proceedings of the NeurIPS 2020 Competition and Demonstration Track, PMLR 133:206-215, 2021.
The clinical time-series setting poses a unique combination of challenges to data modelling and sharing. Due to the high dimensionality of clinical time series, adequate de-identification to preserve privacy while retaining data utility is difficult to achieve using common de-identification techniques. An innovative approach to this problem is synthetic data generation. From a technical perspective, a good generative model for time-series data should preserve temporal dynamics; new sequences should respect the original relationships between high-dimensional variables across time. From the privacy perspective, the model should prevent patient re-identification. The NeurIPS 2020 Hide-and-Seek Privacy Challenge was a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the generation track ("hiders") and the patient re-identification track ("seekers") were directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. In this paper we present an overview of the competition design and highlight areas we feel should be changed for future iterations of this competition.