Adaptive label error detection finds mislabeled data and cuts machine learning test error by 33.8%

Machine Learning


The reliability of machine learning classification systems is increasingly threatened by inaccurate ground truth labels, despite careful data curation by expert annotators. A research team from Johns Hopkins University and the Johns Hopkins Bloomberg School of Public Health (Zan Chaudhry, Norm H. Rothenberg, Brian Cuffo, and Craig K. Jones) is tackling this critical issue with a new approach to identifying mislabeled data. Their work introduces adaptive label error detection (ALED), a technique that uses feature extraction and Gaussian distribution modeling to identify samples with incorrect labels. Across multiple medical image datasets, ALED shows significantly improved sensitivity for error detection without sacrificing accuracy, providing a powerful tool to improve model performance: fine-tuning the model on system-corrected data reduced test set error by 33.8%.


Dealing with mislabeled data in medical image processing

Machine learning and artificial intelligence are becoming increasingly popular in scientific research, especially in medical image processing, where deep convolutional neural networks (DCNNs) have shown promise in image segmentation and classification. These networks are trained with gradient descent to minimize a loss criterion that measures the difference between the model’s predictions and the provided ground truth labels. The accuracy of these labels is critical: it determines how the model’s parameters are tuned during training and ultimately shapes the model’s performance. However, human annotation is inherently fallible and introduces mislabeled data, which poses a significant problem for artificial intelligence and machine learning research.
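As a concrete illustration of how label quality feeds into training, the sketch below fits a minimal logistic-regression classifier by gradient descent on a toy dataset in which some labels are flipped. All data, names, and numbers here are illustrative assumptions, not drawn from the authors' experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data; a fraction of labels is flipped to mimic
# annotation errors (illustrative setup, not the authors' datasets).
X = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [3, 3]], 100, axis=0)
y = np.repeat([0.0, 1.0], 100)          # true labels
noisy = y.copy()
flip = rng.choice(200, size=20, replace=False)
noisy[flip] = 1 - noisy[flip]           # 10% mislabeled

def train(X, y, lr=0.1, steps=500):
    """Gradient descent on the cross-entropy loss, which measures
    the mismatch between predictions and the provided labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
        grad_w = X.T @ (p - y) / len(y)      # loss gradient w.r.t. weights
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(X, noisy)
# Accuracy measured against the true labels: label noise steers the
# gradient updates, so the learned boundary inherits the noise.
acc = np.mean(((X @ w + b) > 0) == (y > 0.5))
```

Because the loss is computed against whatever labels are provided, every flipped label pulls the decision boundary in the wrong direction, which is exactly why detecting and correcting label errors improves the trained model.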

Examples of mislabeling have been found even in well-established benchmark datasets, highlighting the widespread nature of the problem. In medical imaging, the daily error rate for radiologists is estimated at 3-5%, with interobserver variation reaching approximately 25% in certain radiology tasks. To combat this, the team implemented the ALED detector as a Python package called statlab, designed to identify potentially incorrect labels in a dataset. By fine-tuning the neural network on data corrected with ALED, the team demonstrated a 33.8% reduction in test set error, indicating significant benefits for end users.

This methodology focuses on the geometry of the feature space, leveraging the internal representations of deep convolutional neural networks to detect mismatches between data points and their assigned labels. By identifying and correcting these mislabeled samples, the ALED detector aims to improve the overall accuracy and reliability of machine learning models used in medical image classification and other applications. The research team extracted intermediate features from a deep convolutional neural network and then removed noise to refine the data representation. These features were modeled with multidimensional Gaussian distributions, and mapping the reduced manifold of each class enabled accurate likelihood ratio tests to identify mislabeled samples. Experiments demonstrate that ALED significantly outperforms established label error detection methods across multiple medical image datasets.
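The Gaussian-modeling and likelihood-ratio steps described above can be sketched roughly as follows. `flag_label_errors` is a hypothetical helper written for this article, not the statlab implementation, and it omits the feature-extraction and denoising stages that precede it in the authors' pipeline.

```python
import numpy as np

def _gauss_logpdf(X, mean, cov):
    """Log-density of a multivariate Gaussian, evaluated per row of X."""
    d = X - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', d, inv, d)   # per-row quadratic form
    return -0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi))

def flag_label_errors(features, labels, threshold=0.0):
    """Sketch of the likelihood-ratio idea: fit one Gaussian per class
    in feature space, then flag samples whose assigned class fits much
    worse than the best competing class (hypothetical helper)."""
    classes = np.unique(labels)
    loglik = np.empty((len(labels), len(classes)))
    for j, c in enumerate(classes):
        Xc = features[labels == c]
        # Small ridge keeps the covariance positive definite
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(features.shape[1])
        loglik[:, j] = _gauss_logpdf(features, Xc.mean(axis=0), cov)
    idx = np.searchsorted(classes, labels)
    assigned = loglik[np.arange(len(labels)), idx]
    masked = loglik.copy()
    masked[np.arange(len(labels)), idx] = -np.inf
    best_other = masked.max(axis=1)
    # Likelihood-ratio test: a positive margin means some other class
    # explains the sample better than its assigned label does.
    return (best_other - assigned) > threshold
```

On well-separated feature clusters, a sample whose assigned class sits far from its own cluster but close to another will produce a large positive log-likelihood margin and be flagged for review.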

At the core of ALED is the ability to accurately identify errors in training data, a critical step in building more robust machine learning models. Measurements confirm that fine-tuning the neural network on ALED-corrected data reduces test set error by 33.8%. This improves the accuracy and reliability of predictive models, delivering tangible benefits to end users. The team specifically measured performance improvements from the reduction in classification errors after training on the corrected dataset, highlighting the practical impact of their work.

Further technical achievements include implementing ALED as a deployable Python package named statlab, facilitating wide adoption and integration into existing machine learning pipelines. Data analysis reveals that ALED has improved sensitivity in detecting mislabeled samples and addresses important limitations of existing reliable-learning approaches. The method extracts and denoises intermediate features, models each class distribution with a multidimensional Gaussian, and performs a likelihood ratio test to flag incorrect labels. Results across multiple medical image datasets show that ALED improves the sensitivity of error detection without reducing accuracy compared to existing label error detection techniques. The improvement extends to model performance: fine-tuning on ALED-corrected data reduced test set error by 33.8%.
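The end-to-end detect-correct-retrain workflow might look like the toy sketch below, which substitutes a simple nearest-centroid rule for both the classifier and the error detector. This is an assumption-laden illustration of the workflow's shape, not the paper's pipeline, and the 33.8% figure does not come from code like this.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    """Two well-separated toy classes (illustrative stand-in for images)."""
    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(5, 1, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

Xtr, ytr = make_data(100)
Xte, yte = make_data(100)
noisy = ytr.copy()
flip = rng.choice(len(ytr), size=30, replace=False)
noisy[flip] = 1 - noisy[flip]           # 15% simulated annotation errors

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cent, X):
    d = np.linalg.norm(X[:, None, :] - cent[None], axis=2)
    return d.argmin(axis=1)

# Stand-in for the ALED step: flag samples that sit closer to the
# competing class centroid than to their assigned one, then flip them.
cent = fit_centroids(Xtr, noisy)
d = np.linalg.norm(Xtr[:, None, :] - cent[None], axis=2)
rows = np.arange(len(noisy))
flagged = d[rows, noisy] > d[rows, 1 - noisy]
corrected = noisy.copy()
corrected[flagged] = 1 - corrected[flagged]

# Retrain on corrected labels and compare test error, mirroring the
# paper's detect-correct-fine-tune evaluation in miniature.
err_noisy = np.mean(predict(fit_centroids(Xtr, noisy), Xte) != yte)
err_corr = np.mean(predict(fit_centroids(Xtr, corrected), Xte) != yte)
```

The design point this mirrors is that the detector and the classifier are separate stages: labels are repaired first, and only then is the model refit, so any downstream learner can benefit from the cleaned dataset.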

This highlights the potential of ALED to enhance the generalization ability of deep learning systems by addressing data quality issues. The authors acknowledge that the performance of ALED can be affected by the hyperparameters used during model training and suggest further investigation into the optimal timing of ALED application within the training process. Future research may also consider using features extracted from different depths within the network to further improve detection accuracy.


