Privacy concerns in machine learning (ML) continue to surface as audits show that models can reveal some of the labels used during training, such as user choices, expressed preferences, or the results of actions. A new research paper considers another way to measure this risk, and the authors present findings that could change the way companies test their models for leaks.

Why standard audits are difficult to use
Previous privacy audits often relied on changing the training data. A common tactic has been to insert canaries, artificial records added to the dataset so testers can check whether the model remembers them. If the model reproduces a canary at test time, it is storing training information in a way that could leak.
Although this tactic highlighted privacy concerns, it created operational problems. Production training pipelines are tightly controlled, and any change to the dataset can trigger additional review steps. The study notes that this audit setup carried significant engineering overhead, which slowed adoption in large-scale systems.
The new Observational Audit Framework aims to remove that barrier.
“By reducing the complexity of privacy audits, our approach enables application in a wider range of contexts,” the researchers said.
It works without touching training data, making it suitable for pipelines that cannot be adjusted on a test-by-test basis.
How the observational audit framework works
The observational audit framework checks whether the model’s behavior reveals which labels came from training and which came from alternative sources.
The model itself never sees mixed labels; the mixed label set is assembled for auditors only after training. To run the test, auditors draw on two sources of labels. Some come from the original training process. Others come from proxy models that generate alternative labels for the same records.
The test presents the attacker with a mixture of these labels, and the attacker tries to guess which label on each record is the one used in training. If the model leaks label information, the attacker succeeds more often than chance. Stronger privacy protection shrinks that signal.
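The guessing game at the heart of the audit can be sketched as a small simulation. This is an illustrative toy, not the paper’s implementation: `model_confidence` is a hypothetical stand-in for whatever score (for example, negative loss) the attacker extracts from the trained model.

```python
import random

def audit_game(model_confidence, records, trials=1000, seed=0):
    """Toy version of the label-distinguishing game.

    Each record carries its real training label ("train") and a proxy
    label ("proxy").  The attacker sees both labels in random order and
    guesses that the one the model scores higher was used in training.
    Returns the attacker's accuracy; ~0.5 means no detectable leakage.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        rec = rng.choice(records)
        candidates = [rec["train"], rec["proxy"]]
        rng.shuffle(candidates)  # hide which label is which
        guess = max(candidates, key=lambda lab: model_confidence(rec, lab))
        correct += guess == rec["train"]
    return correct / trials

# A leaky model always scores its training label higher ...
leaky = lambda rec, lab: 1.0 if lab == rec["train"] else 0.0
# ... while a well-protected model gives the attacker no usable signal.
blind = lambda rec, lab: 0.0

records = [{"x": i, "train": 1, "proxy": 0} for i in range(50)]
print(audit_game(leaky, records))   # near 1.0: strong leakage
print(audit_game(blind, records))   # near 0.5: chance level
```

The gap between the attacker’s accuracy and 0.5 is the leakage signal the audit measures.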
The study points out that the proxy model does not need to match the training labels exactly. All that is required is that the two sources be close enough that an attacker cannot easily separate them. The paper explains that previous checkpoints of the same model can act as proxy label sources, avoiding additional training steps.
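The “close enough” requirement can be checked with a simple agreement rate between the two label sources. The sketch below uses plain callables as hypothetical stand-ins for a final model and an earlier checkpoint; the actual checkpoint-loading call depends on your training framework.

```python
def proxy_labels(early_predict, inputs):
    """Produce proxy labels from an earlier checkpoint's predictions.
    `early_predict` stands in for a restored checkpoint's inference
    function (loading is framework-specific)."""
    return [early_predict(x) for x in inputs]

def agreement(labels_a, labels_b):
    """Fraction of records on which two label sources agree -- a quick
    sanity check that the proxy source is close enough that an attacker
    cannot trivially tell the sources apart."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Toy example: an "early" and a "final" classifier that mostly agree.
final_predict = lambda x: int(x > 0)
early_predict = lambda x: int(x > 2)   # slightly different boundary

inputs = list(range(-5, 6))
train = [final_predict(x) for x in inputs]
proxy = proxy_labels(early_predict, inputs)
print(agreement(train, proxy))
```

In this toy the sources disagree on two of eleven records, so the agreement is about 0.82; a real audit would want the sources far harder to separate.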
Once the attack is complete, the audit converts the attacker’s success rate into a privacy measure. This follows the convention of earlier privacy audits and makes results comparable across settings.
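One standard way to turn attacker accuracy into a privacy number, used in prior auditing work (the paper’s exact estimator may differ), relies on the fact that under a pure epsilon guarantee the attacker’s accuracy in this two-way guessing game is at most e^eps / (1 + e^eps):

```python
import math

def epsilon_lower_bound(accuracy):
    """Crude lower bound on epsilon implied by attacker accuracy in the
    two-way guessing game, via P(correct) <= e^eps / (1 + e^eps).
    A real audit would first bound the accuracy with a confidence
    interval; this sketch skips that step."""
    if accuracy <= 0.5:
        return 0.0            # at or below chance: no measurable leakage
    if accuracy >= 1.0:
        return float("inf")   # perfect attack: no finite guarantee holds
    return math.log(accuracy / (1.0 - accuracy))

print(epsilon_lower_bound(0.5))    # 0.0
print(epsilon_lower_bound(0.75))   # log(3), about 1.10
```

Higher measured epsilon means more leakage was detected; a tight privacy setting should keep this number low.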
What researchers discovered
The authors tested the audit on two very different datasets. One was a small image collection used in research. The other was a large click dataset collected over a 24-day period. This let the team see whether the method worked reliably across tasks of different shapes and sizes.
Across both datasets, one pattern stood out. When the model was trained with stricter label privacy settings, auditors had a hard time determining which records retained their original labels. This indicates that the privacy tools are working as intended.
When privacy settings were relaxed, the auditors’ job became easier. The model preserved the label patterns that stricter settings were meant to suppress, and auditors detected them with much more confidence. The gap between tight and loose settings appeared in every test.
Just as important is how stable this behavior was across tasks: strict settings produced low label leakage in every experiment, while relaxed settings consistently made signals linked to training labels easier for auditors to detect.
The study also compared the new audit to older methods that required records embedded in the training dataset. Both approaches surfaced the same types of privacy issues. The authors present this as evidence that observational audits can uncover these issues without changing training data or building additional models.
