Efficient privacy loss accounting through subsampling and random assignment

Machine Learning


We consider the privacy amplification properties of a sampling scheme in which a user's data is used in k steps that are randomly and uniformly selected from a sequence (or set) of t steps. This sampling scheme has recently been applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), and it has been shown to offer practical advantages over standard Poisson sampling. Although existing theoretical analyses of this scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) yield bounds close to those for Poisson sampling, it still has two significant drawbacks. First, in many practical settings the resulting privacy parameters are not exact, owing to an approximation step in the analysis. Second, the computed parameters are either hockey-stick or Rényi divergences, both of which incur overhead when used for privacy loss accounting.
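
To make the two sampling schemes concrete, here is a minimal Python sketch contrasting them; the function names and parameters are illustrative, not from the paper. Under Poisson sampling a user's number of participations is random (Binomial(t, q)), whereas under random assignment it is exactly k.

```python
import random

def poisson_sample(num_users: int, q: float, t: int) -> list[list[int]]:
    """Poisson sampling: each user joins each of the t steps independently
    with probability q, so participation counts are Binomial(t, q)."""
    return [[u for u in range(num_users) if random.random() < q]
            for _ in range(t)]

def random_assignment(num_users: int, k: int, t: int) -> list[list[int]]:
    """Random assignment: each user is placed in exactly k steps chosen
    uniformly at random (without replacement) from the t steps."""
    steps: list[list[int]] = [[] for _ in range(t)]
    for u in range(num_users):
        for s in random.sample(range(t), k):
            steps[s].append(u)
    return steps

if __name__ == "__main__":
    random.seed(0)
    print(random_assignment(num_users=3, k=2, t=5))
    # Each of the 3 users appears in exactly 2 of the 5 steps.
```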

In this work, we show that the privacy loss distribution (PLD) of random assignment applied to differentially private algorithms can be computed efficiently. Applied to the Gaussian mechanism, our results show that the privacy-utility tradeoff of random assignment is at least as good as that of Poisson subsampling; in particular, random assignment is well suited to training with DP-SGD. To support these computations, we develop a new general tool for privacy loss accounting based on the notion of a PLD realization. This notion allows us to extend exact privacy loss accounting to subsampling schemes that previously required manual, noise-mechanism-specific analysis.
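
For context, the sketch below shows the standard PLD accounting workflow that such a result plugs into, using Google's open-source dp_accounting library as the baseline; the library, its function names (from_gaussian_mechanism, self_compose, get_epsilon_for_delta), and the parameter values are assumptions on our part, not taken from the paper. It computes epsilon for a Poisson-subsampled Gaussian mechanism; the random-assignment PLD constructed in the paper would be composed and converted to (epsilon, delta) through the same interface.

```python
# pip install dp-accounting  (assumed dependency, not part of the paper)
from dp_accounting.pld import privacy_loss_distribution as pld_lib

sigma, t, k = 1.0, 1000, 100   # noise multiplier, total steps, participations
delta = 1e-6

# Baseline: Gaussian mechanism under Poisson subsampling at rate k/t,
# self-composed over all t steps, then converted to epsilon at the target delta.
poisson_pld = pld_lib.from_gaussian_mechanism(
    standard_deviation=sigma, sampling_prob=k / t)
poisson_eps = poisson_pld.self_compose(t).get_epsilon_for_delta(delta)

# A random-assignment PLD (the object the paper shows how to compute) would
# replace poisson_pld above and flow through the same compose/convert steps.
print(f"Poisson-subsampled Gaussian: eps = {poisson_eps:.3f} at delta = {delta}")
```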

†Hebrew University of Jerusalem


