Research on Large-Scale Batch Training Part 5 (Machine Learning) | By Monodeep Mukherjee | June 2024


Concurrent Adversarial Learning for Large Batch Training

Authors: Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You

Abstract: Large-batch training has become a popular technique for training neural networks on a large number of GPU/TPU processors. As the batch size increases, stochastic optimizers tend to converge to sharp local minima, leading to poor test performance. Current methods typically use large-scale data augmentation to increase the batch size, but we find that as the batch size increases, the performance gain from data augmentation diminishes and data augmentation becomes insufficient after a certain point. In this paper, we propose to use adversarial learning to increase the batch size in large-batch training. Adversarial learning is a natural choice for smoothing decision surfaces and biasing them towards flat regions, but it has not been successfully applied to large-batch training because it requires at least two successive gradient calculations at each step, which at least doubles the running time compared to vanilla training, even with a large number of processors. To overcome this issue, we propose a novel concurrent adversarial learning (ConAdv) method that utilizes stale parameters to decouple the sequential gradient calculations in adversarial learning. Experimental results demonstrate that ConAdv can successfully increase the batch size of ResNet-50 training on ImageNet while maintaining high accuracy. In particular, we show that ConAdv can achieve a top-1 accuracy of 75.3% on ImageNet ResNet-50 training at a batch size of 96K, and combining ConAdv with data augmentation further improves the accuracy to 76.2%. This is the first study to successfully extend the ResNet-50 training batch size to 96K.
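To make the stale-parameter idea concrete, here is a minimal, sequential sketch of how adversarial perturbations can be generated from a parameter snapshot taken one step earlier, so the perturbation pass no longer has to wait for the current step's gradient. The model choice, FGSM-style perturbation, epsilon value, and one-step staleness are illustrative assumptions, not the authors' exact setup; in the paper the stale-parameter pass is what allows the two gradient computations to run concurrently across processors.

```python
# Sketch of the ConAdv idea: perturbations come from a STALE parameter snapshot,
# decoupling the two gradient computations of adversarial training.
# Assumptions (not from the paper): FGSM perturbation, epsilon=2/255, staleness of one step.
import copy
import torch
import torch.nn as nn
import torchvision

def conadv_step(model, stale_model, optimizer, loss_fn, x, y, epsilon=2.0 / 255):
    # 1) Generate the adversarial example with the stale snapshot. This pass does not
    #    depend on the current step's weight update, so it could overlap with it.
    x_adv = x.clone().detach().requires_grad_(True)
    adv_loss = loss_fn(stale_model(x_adv), y)
    adv_loss.backward()
    x_adv = (x + epsilon * x_adv.grad.sign()).detach()

    # 2) Update the current model on the adversarial examples.
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()

    # 3) Refresh the snapshot for the next step (one-step staleness in this sketch).
    stale_model.load_state_dict(model.state_dict())
    return loss.item()

if __name__ == "__main__":
    model = torchvision.models.resnet50(num_classes=1000)
    stale_model = copy.deepcopy(model).eval()   # used only to craft perturbations
    for p in stale_model.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(8, 3, 224, 224)             # toy batch standing in for ImageNet data
    y = torch.randint(0, 1000, (8,))
    print(conadv_step(model, stale_model, optimizer, loss_fn, x, y))
```

In a real large-batch run, step 1 would execute concurrently with the previous optimizer step on separate workers; the sequential loop above only illustrates the data flow.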


