
Federated Learning (FL) has emerged as a pivotal technology in recent years, enabling different entities to train a model collaboratively without centralizing their data. This approach is particularly advantageous when organizations or individuals need to collaborate on model development without exposing sensitive data.
By keeping data local and performing model updates locally, FL reduces communication costs, facilitates the integration of disparate data, and preserves the unique characteristics of each participant's dataset. Despite these advantages, FL still carries a risk of indirect information leakage, especially during the model aggregation stage.
FL includes several data partitioning strategies: horizontal FL (HFL), vertical FL (VFL), and federated transfer learning. In HFL, the parties hold different samples that share the same attribute space, which makes it suitable for scenarios where regional branches of the same company aim to build richer datasets. Conversely, VFL involves non-competing entities with vertically partitioned data: they share overlapping data samples but differ in feature space.
Finally, transfer learning can be applied when there is little overlap in both data samples and features across multiple parties with uneven distributions. Each category has its own challenges and benefits, with HFL focusing on independent training, VFL leveraging deeper attribute dimensions for more accurate models, and transfer learning addressing diverse data distribution scenarios.
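The difference between horizontal and vertical partitioning can be made concrete with a toy example. The sketch below is purely illustrative: the dataset, the feature names, and the helper functions `horizontal_split` and `vertical_split` are hypothetical and do not come from the paper.

```python
# Toy dataset: each key is a sample ID, each list position is a feature.
FEATURES = ["age", "income", "balance", "purchases"]
DATA = {
    "u1": [34, 52000, 1200, 7],
    "u2": [45, 61000, 300, 2],
    "u3": [29, 48000, 950, 11],
    "u4": [52, 75000, 4100, 5],
}


def horizontal_split(data, ids_per_party):
    """HFL: each party holds different samples but every feature."""
    return [{sample_id: data[sample_id] for sample_id in ids} for ids in ids_per_party]


def vertical_split(data, features, features_per_party):
    """VFL: each party holds the same samples but a different feature subset."""
    parties = []
    for wanted in features_per_party:
        columns = [features.index(name) for name in wanted]
        parties.append(
            {sample_id: [row[c] for c in columns] for sample_id, row in data.items()}
        )
    return parties


# Two regional branches with disjoint customers and identical attributes.
hfl_parties = horizontal_split(DATA, [["u1", "u2"], ["u3", "u4"]])

# Two organizations that know the same customers through different attributes.
vfl_parties = vertical_split(
    DATA, FEATURES, [["age", "income"], ["balance", "purchases"]]
)
```

In the horizontal case every party sees complete rows for its own customers, while in the vertical case every party sees all customers but only some of the columns.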
Despite the absence of raw data sharing in FL, combining information across features or the presence of compromised participants can still lead to privacy breaches. Label inference attacks are a significant concern in this context: because labels are often sensitive in themselves, inferring them can reveal private information about the client.
To address this problem, researchers at the University of Pavia focus on defending against label inference attacks in VFL scenarios. They analyze these attacks and propose a defense mechanism called KD𝑘 (Knowledge Distillation and 𝑘-anonymity).
KD𝑘 relies on a knowledge distillation (KD) step and an obfuscation algorithm to enhance privacy protection. KD is a model compression technique that transfers knowledge from a larger teacher model to smaller student models, producing softer probability distributions instead of hard labels.
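To illustrate what "softer probability distributions" means, here is a minimal sketch of the temperature-scaled softmax commonly used in knowledge distillation. The logits and the temperature values are made-up assumptions for illustration; the article does not state which settings the researchers use.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer result."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher outputs for a four-class problem.
teacher_logits = [6.0, 2.0, 1.0, 0.5]

# A hard label keeps only the winning class.
winner = teacher_logits.index(max(teacher_logits))
hard_label = [1.0 if i == winner else 0.0 for i in range(len(teacher_logits))]

# Soft labels keep the relative likelihood of the other classes.
sharp = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)
```

With the higher temperature the top class still wins, but it holds a smaller share of the probability mass, so the label carries information about the remaining classes as well.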
Their framework includes a supervised network that generates soft labels for active participants, and these soft labels are then processed using 𝑘-anonymity to add uncertainty. Grouping the 𝑘 most probable labels makes it difficult for an attacker to accurately guess the single most probable label. The server top model then uses this partially anonymized data for collaborative VFL tasks.
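The grouping idea can be sketched as follows. This is not the obfuscation algorithm from the paper, whose details are not given here; it is one simple way, assumed for illustration, to make the 𝑘 most probable labels indistinguishable by sharing their combined probability equally. The function name `obfuscate_top_k` and the example values are hypothetical.

```python
def obfuscate_top_k(soft_label, k):
    """Spread the combined probability of the k most likely classes evenly among them."""
    if not 1 <= k <= len(soft_label):
        raise ValueError("k must be between 1 and the number of classes")
    # Indices of the k classes with the highest probability.
    top = sorted(range(len(soft_label)), key=lambda i: soft_label[i], reverse=True)[:k]
    shared = sum(soft_label[i] for i in top) / k
    result = list(soft_label)
    for i in top:
        result[i] = shared
    return result


# Hypothetical soft label for a four-class problem.
soft_label = [0.70, 0.15, 0.10, 0.05]
obfuscated = obfuscate_top_k(soft_label, k=3)
```

After this step the three leading classes carry the same probability, so an observer who only sees the obfuscated label cannot tell which of them was originally the most likely, while the total probability still sums to one.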
Experimental results show a significant decrease in the accuracy of label inference attacks across all three types outlined by Fu et al., demonstrating the effectiveness of the proposed defense mechanism. The research contributions include a robust countermeasure tailored to label inference attacks, validated through an extensive experimental campaign. Furthermore, the study provides a comprehensive comparison with existing defense strategies and highlights the superior performance of the proposed approach.
All credit for this study goes to the researchers of this project.

Arshad is an intern at MarktechPost. He is currently continuing his international studies. He holds a master's degree in physics from the Indian Institute of Technology, Kharagpur. He believes that understanding things from the fundamentals leads to new discoveries and advances in technology. He is passionate about leveraging tools such as mathematical models, ML models, and AI to understand the essence of things at a fundamental level.
