Flotilla, a new federated learning framework, promotes scalable, resilient distributed machine learning on edge devices. An assessment with over 200 clients demonstrates rapid fault tolerance, resource use comparable to existing frameworks such as Flower, OpenFL and FedML, and excellent scalability for large client counts.
The growing prevalence of mobile and edge computing, coupled with increasing demand for data privacy, is driving the development of distributed machine learning technologies such as federated learning (FL). This approach allows models to be trained on distributed data sources, minimizing the need to centralize sensitive information. However, existing FL frameworks often prioritize the learning process itself and ignore the practical challenges of deployment on diverse and potentially unreliable edge hardware. Roopkatha Banerjee, Prince Modi and colleagues at the Indian Institute of Science (IISc) and the Birla Institute of Technology and Science (BITS) address this with a newly developed framework entitled "Flotilla: A scalable, modular and resilient federated learning framework for heterogeneous resources." The team's work focuses on building a system that supports both synchronous and asynchronous learning strategies, is robust to client and server failures, and uses resources efficiently on edge devices.
Federated learning (FL), a distributed machine learning technique, enables model training across distributed networks of edge devices such as smartphones and IoT sensors without exchanging the raw data itself. This preserves data privacy and reduces communication costs, but presents significant engineering challenges when deploying at scale on resource-constrained hardware. Flotilla addresses these challenges with a framework designed for scalable, lightweight implementations of FL.
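The core idea of training without exchanging raw data can be illustrated with federated averaging (FedAvg), the canonical FL aggregation rule: clients train locally and send only model parameters, which the server averages weighted by each client's dataset size. This is a minimal, generic sketch of that idea, not Flotilla's actual API.

```python
# Minimal illustration of one federated averaging (FedAvg) round.
# Names and structure are illustrative, not taken from Flotilla.

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three clients, each holding a 2-parameter model and different data volumes.
# Only these parameter vectors travel to the server; the data stays local.
updates = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
sizes = [10, 30, 60]
print(fedavg(updates, sizes))  # -> [2.2, 1.2]
```

Clients with more data pull the global model harder, which is why the result sits closest to the third client's update.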
The architecture prioritizes asynchronous aggregation, meaning that model updates from participating devices do not need to be synchronized before they are incorporated into the global model. This contrasts with synchronous FL, which can suffer from delays caused by slow devices. Flotilla's stateless clients, combined with externalized session state, contribute to a resilient architecture that allows rapid failover, demonstrated in testing with over 200 clients. Stateless means that each client operates independently without retaining information about previous interactions, simplifying recovery from failures.
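The article does not give Flotilla's exact aggregation rule, but a common way asynchronous schemes work (in the style of FedAsync) is to mix each client update into the global model as soon as it arrives, discounting updates that were computed against an older version of the model. A hedged sketch, with all names and the discount formula assumed for illustration:

```python
# Illustrative asynchronous merge in the FedAsync style; not Flotilla's
# actual algorithm. A client's update is applied immediately on arrival,
# down-weighted by how stale (how many server rounds behind) it is.

def async_merge(global_w, client_w, client_round, server_round, alpha=0.6):
    """Mix one arriving client update into the global model."""
    staleness = server_round - client_round
    weight = alpha / (1 + staleness)  # staler updates count less
    return [(1 - weight) * g + weight * c for g, c in zip(global_w, client_w)]

g = [0.0, 0.0]
# Fresh update (staleness 0): applied with full weight alpha = 0.6.
g = async_merge(g, [1.0, 1.0], client_round=0, server_round=0)
print(g)  # -> [0.6, 0.6]
# Stale update (trained on round 0, server now at round 1): weight 0.3.
g = async_merge(g, [2.0, 2.0], client_round=0, server_round=1)
print(g)
```

Because no barrier is needed, a slow or failed client never blocks the round; its update simply arrives later with less influence.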
Flotilla's modular design allows flexible configuration of a variety of FL strategies and deep neural network (DNN) architectures. This adaptability was confirmed through evaluation with five different FL strategies and various DNN models. The framework's performance matches or exceeds that of existing FL frameworks such as Flower, OpenFL and FedML, particularly on resource-constrained devices such as Raspberry Pi and Jetson boards. These boards are a common platform for edge computing due to their low cost and energy consumption.
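Swapping FL strategies in a modular framework typically works through a small plug-in interface: each strategy implements a common aggregation contract and is selected by name. This hypothetical sketch shows the pattern in general terms; the class and registry names are assumptions, not Flotilla's API.

```python
# Hypothetical pluggable-strategy pattern, in the spirit of modular FL
# frameworks; this is not Flotilla's actual interface.

class Strategy:
    def aggregate(self, updates):
        """updates: list of (weights, n_samples) pairs from clients."""
        raise NotImplementedError

class FedAvgStrategy(Strategy):
    def aggregate(self, updates):
        total = sum(n for _, n in updates)
        dim = len(updates[0][0])
        return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# New strategies register here without touching the server loop.
STRATEGIES = {"fedavg": FedAvgStrategy}

def make_strategy(name):
    return STRATEGIES[name]()

agg = make_strategy("fedavg").aggregate([([1.0], 1), ([3.0], 3)])
print(agg)  # -> [2.5]
```

The server loop only ever calls `aggregate`, so evaluating five different strategies, as the team did, amounts to registering five such classes.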
Scalability testing reveals that Flotilla outperforms alternative frameworks in certain scenarios, suggesting its suitability for deployment in large, distributed learning environments. The framework's lightweight design and asynchronous aggregation contribute to its efficiency, reducing the computational burden on individual devices and improving overall system responsiveness.
