Differentially Private Universal Coresets for \((\alpha,\beta)\)-Fair \(k\)-Median and \(k\)-Means Clustering

Iftikhar Ahmad1, Wang Zu2
1University of Agriculture Faisalabad (UAF), Pakistan
2Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
DOI: https://doi.org/10.71448/bcds2564-2
Published: 30/12/2025
Cite this article as: Iftikhar Ahmad, Wang Zu. Differentially Private Universal Coresets for \((\alpha,\beta)\)-Fair \(k\)-Median and \(k\)-Means Clustering. Bulletin of Computer and Data Sciences, Volume 6 Issue 4. Page: 22-38.

Abstract

Fair clustering with \((\alpha,\beta)\)-proportionality constraints requires every cluster to satisfy lower and upper bounds on the representation of each sensitive group. Recent universal-coreset results enable scalable optimization by compressing the dataset into a small weighted subset that preserves constrained clustering costs simultaneously for all center sets and all feasible group-cardinality (coloring) constraints. In many applications, however, the sensitive attributes that motivate fairness constraints also require formal privacy protection, and releasing even a small coreset can leak membership and group information. We propose a modular framework for differentially private universal coresets for \((\alpha,\beta)\)-fair \(k\)-median and \(k\)-means in metric and Euclidean spaces. The framework consists of two stages: (i) construct a randomized \(\eta/2\)-universal coreset via random sampling and reweighting; (ii) release a privatized coreset by perturbing only the weight vector using the Gaussian mechanism, followed by post-processing (nonnegativity and mass renormalization). We prove that the released summary is \((\varepsilon,\delta)\)-differentially private and, with high probability, preserves constrained costs for all \((C,M)\) up to a multiplicative \((1\pm\eta)\) factor plus an additive privacy term that depends on the metric diameter bound, coreset size, and an \(\ell_2\) weight-sensitivity parameter. We further extend the method to streaming via merge-and-reduce and analyze privacy under continual observation using advanced composition and amplification-by-subsampling. The resulting DP universal coresets allow multi-query fair clustering and interactive constraint tuning without repeated access to the raw sensitive dataset.

Keywords: Fair clustering, \((\alpha,\beta)\)-proportionality, universal coreset, differential privacy, \(k\)-median, \(k\)-means, Gaussian mechanism, sensitivity sampling, merge-and-reduce streaming, privacy–utility tradeoff

Abstract

Fair clustering with \((\alpha,\beta)\)-proportionality constraints requires every cluster to satisfy lower and upper bounds on the representation of each sensitive group. Recent universal-coreset results enable scalable optimization by compressing the dataset into a small weighted subset that preserves constrained clustering costs simultaneously for all center sets and all feasible group-cardinality (coloring) constraints. In many applications, however, the sensitive attributes that motivate fairness constraints also require formal privacy protection, and releasing even a small coreset can leak membership and group information. We propose a modular framework for differentially private universal coresets for \((\alpha,\beta)\)-fair \(k\)-median and \(k\)-means in metric and Euclidean spaces. The framework consists of two stages: (i) construct a randomized \(\eta/2\)-universal coreset via random sampling and reweighting; (ii) release a privatized coreset by perturbing only the weight vector using the Gaussian mechanism, followed by post-processing (nonnegativity and mass renormalization). We prove that the released summary is \((\varepsilon,\delta)\)-differentially private and, with high probability, preserves constrained costs for all \((C,M)\) up to a multiplicative \((1\pm\eta)\) factor plus an additive privacy term that depends on the metric diameter bound, coreset size, and an \(\ell_2\) weight-sensitivity parameter. We further extend the method to streaming via merge-and-reduce and analyze privacy under continual observation using advanced composition and amplification-by-subsampling. The resulting DP universal coresets allow multi-query fair clustering and interactive constraint tuning without repeated access to the raw sensitive dataset.

Keywords: Fair clustering, \((\alpha,\beta)\)-proportionality, universal coreset, differential privacy, \(k\)-median, \(k\)-means, Gaussian mechanism, sensitivity sampling, merge-and-reduce streaming, privacy–utility tradeoff
Iftikhar Ahmad
University of Agriculture Faisalabad (UAF), Pakistan
Wang Zu
Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China

DOI

Cite this article as:

Iftikhar Ahmad, Wang Zu. Differentially Private Universal Coresets for \((\alpha,\beta)\)-Fair \(k\)-Median and \(k\)-Means Clustering. Bulletin of Computer and Data Sciences, Volume 6 Issue 4. Page: 22-38.

Publication history

Copyright © 2025 Iftikhar Ahmad, Wang Zu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Browse Advance Search