
Federated learning (FL) is an emerging machine learning (ML) setting in which a logically centralized coordinator orchestrates many distributed clients (such as mobile phones and laptops) to collaboratively train or evaluate models. This enables a wide range of ML applications to learn from end-user data while avoiding the significant costs and privacy risks of collecting raw data from clients. Prior research has focused on two defining characteristics of FL: the varying execution speeds of client devices and the non-IID distribution of client data.
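The coordination pattern described above is usually realized with Federated Averaging (FedAvg): each client trains locally on its own data, and the coordinator averages the resulting model weights. The sketch below is a minimal, self-contained illustration in plain Python/NumPy (a single gradient step of least-squares regression stands in for local training); it is not FedScale code.

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    """One step of local training on a client (here: a single
    gradient step of least-squares regression, for illustration)."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(global_weights, client_datasets):
    """One round of Federated Averaging: each selected client trains
    locally, then the coordinator averages the resulting weights,
    weighted by local dataset size."""
    updates, sizes = [], []
    for data in client_datasets:
        updates.append(local_update(global_weights.copy(), data))
        sizes.append(len(data[1]))
    return np.average(updates, axis=0, weights=np.array(sizes, float))

# Toy example: 3 clients, each with private data for a 2-feature model.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(8, 2)), rng.normal(size=8)) for _ in range(3)]
w = np.zeros(2)
for _ in range(5):
    w = fedavg_round(w, clients)
```

Only weights leave the clients in each round; the raw data `(X, y)` never reaches the coordinator, which is the core privacy argument for FL.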
A benchmark that thoroughly evaluates FL solutions should account for (1) data heterogeneity, (2) device heterogeneity, (3) heterogeneous connectivity, and (4) availability conditions at various scales, and its behavior in realistic FL scenarios should be studied across (5) a wide range of ML tasks. Although the first two factors are frequently cited in the literature, real-world network connectivity and client device availability affect both forms of heterogeneity and can hinder model convergence. Similarly, real-world FL deployments frequently involve thousands of concurrent participants drawn from millions of clients, so only large-scale evaluations can reveal whether an algorithm is resilient at that scale.
Overlooking even one of these components can skew an FL evaluation. Unfortunately, established FL benchmarks often fall short in several ways. First, they offer limited data realism for real-world FL applications. Despite providing multiple datasets and FL training targets (e.g., LEAF), their datasets often consist of synthetically created partitions derived from conventional datasets (e.g., CIFAR) and do not represent realistic FL characteristics. This is because these benchmarks are primarily based on classic ML benchmarks (e.g., MLPerf) or built for simulation-only FL frameworks such as TensorFlow Federated or PySyft.
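The synthetic partitions criticized above are commonly produced by drawing each client's label proportions from a Dirichlet distribution over a centralized dataset. A minimal sketch of that common technique (the helper name is illustrative, not from any particular benchmark):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Partition sample indices across clients so each client's label
    mix is drawn from Dirichlet(alpha). Smaller alpha gives more
    skewed (more non-IID) partitions; large alpha approaches IID."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Fraction of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, shard in zip(clients, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return clients

# Toy example: 1000 samples over 10 classes, split across 5 clients.
parts = dirichlet_partition(np.repeat(np.arange(10), 100), num_clients=5)
```

The point of the critique is that such label-skew knobs are a crude proxy: real FL datasets carry per-user partitions with naturally varying size, label mix, and feature distributions that a single `alpha` cannot capture.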
Second, existing benchmarks often ignore system speed, connectivity, and client availability (e.g., FedML and Flower). This prevents FL efforts from accounting for system efficiency and makes statistical performance look overly optimistic. Third, their scale is mostly small, because their experimental setups cannot emulate large-scale FL deployments. While real FL often involves thousands of participants in each training round, most existing benchmarking platforms can train only a few dozen participants per round.
Finally, most of them lack easy-to-use APIs for automated integration and would require significant engineering effort to benchmark at scale. The researchers introduce FedScale, an FL benchmark and accompanying runtime, to facilitate complete and standardized FL evaluation. To their knowledge, FedScale offers the most comprehensive collection of FL datasets for investigating various aspects of real FL deployments. It currently includes 20 realistic FL datasets of small, medium, and large scales, covering a wide range of task categories such as image classification, object detection, word prediction, speech recognition, and reinforcement learning.
FedScale Runtime standardizes and simplifies FL evaluation under more realistic conditions. It includes a mobile backend for on-device FL evaluation and a cluster backend for benchmarking on GPUs/CPUs with accurate FL statistics and system information, reporting actionable FL metrics (e.g., real client round durations). The cluster backend can efficiently train thousands of clients per round on a small number of GPUs. FedScale Runtime is also extensible, allowing rapid implementation of new algorithms and ideas through flexible APIs. The researchers conducted systematic experiments to demonstrate how FedScale enables exhaustive FL benchmarking, specifically addressing system stragglers, accuracy bias, and device energy trade-offs; in doing so, they emphasize the need to optimize system and statistical efficiency simultaneously.
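The straggler problem mentioned above arises because a synchronous round finishes only when its slowest sampled client does. A simple way to see this is to model each client's round length as local compute time plus model transfer time; the cost model below is an illustrative sketch under assumed device and network figures, not FedScale's exact formulation.

```python
def round_duration(num_samples, flops_per_sample, device_flops,
                   model_bytes, bandwidth_bps):
    """Estimate one client's round length as local compute time plus
    the time to download and upload the model over its network link.
    All parameters here are illustrative assumptions."""
    compute = num_samples * flops_per_sample / device_flops
    comm = 2 * model_bytes * 8 / bandwidth_bps  # download + upload
    return compute + comm

# A fast phone vs. a slow one training the same 20-sample batch
# on a 25 MB model (hypothetical device/network numbers).
fast = round_duration(20, 1e9, 1e12, 25e6, 20e6)  # fast SoC, 20 Mbps
slow = round_duration(20, 1e9, 1e11, 25e6, 2e6)   # slow SoC, 2 Mbps

# A synchronous round waits on max(fast, slow): the slow client
# (the straggler) dictates the round length for everyone.
```

Replaying such per-client compute and bandwidth heterogeneity from real device traces is what lets a benchmark expose straggler effects that purely statistical simulations miss.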
FedScale (fedscale.ai) provides high-level APIs for implementing, deploying, and evaluating FL algorithms at scale across various hardware and software backends. FedScale also features the most comprehensive FL benchmark suite, with tasks ranging from image classification and object detection to language modeling and speech recognition. Additionally, its datasets faithfully emulate the scenarios in which FL is deployed in practice. Best of all, it is open source, with the code freely available on GitHub.
This article is a summary written by Marktechpost staff based on the research paper 'FedScale: Benchmarking Model and System Performance of Federated Learning at Scale'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub repository, and reference article.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his Bachelor of Science in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is in image processing and he is passionate about building solutions around it. He loves connecting with people and collaborating on interesting projects.

