A new study demonstrates a novel approach to detecting precancerous lesions in large, high-resolution images. A team of Portuguese researchers has developed a machine learning solution that helps pathologists detect cervical dysplasia and fully automates the diagnosis of new samples. It is one of the first published works on this task to use whole-slide images.
Cervical cancer is the fourth most common cancer among women, with an estimated 604,000 new cases in 2020, according to the World Health Organization (WHO). However, it is also one of the most preventable and treatable cancers if detected early and managed appropriately. Screening and the detection of precancerous lesions (as well as vaccination) are therefore key to preventing the disease.
But what if we could develop a machine learning model to help classify lesions in the squamous epithelium (a type of epithelium that protects against microorganisms), a subjective task, using whole-slide images (WSI) that contain information from the entire tissue?
To this end, a team of researchers from Portugal’s Institute of Systems and Computer Engineering and Technology (INESC TEC) and the Institute of Molecular Anatomy and Pathology, IMP Diagnostics, developed a weakly supervised machine learning method to grade cervical dysplasia. The method combines annotated and unannotated data during model training.
This is especially useful given how hard it is to obtain annotations for pathology data. Because the images are so large, annotating them is time-consuming and tedious, and it is also highly subjective. This type of technique lets researchers develop well-performing models even when annotations are scarce during training.
The model then classifies each case as non-neoplastic or grades the cervical dysplasia, an abnormal growth of cells on the surface lining of the cervix, as a low-grade squamous intraepithelial lesion (LSIL) or a high-grade squamous intraepithelial lesion (HSIL).
“In the detection of cervical dysplasia, this is one of the first published studies to use whole slides and fully automate the diagnosis of new samples, following an approach that first segments regions of interest and then classifies them.”
Sara Oliveira, INESC TEC researcher
The potential of the “big picture”
This classification process is complex and can be “subjective”, so machine learning models can help pathologists with this work. Computer-aided diagnosis (CAD) systems also play an important role: they can act as an early flag for suspicious cases, alerting pathologists to those that require closer evaluation.
Sara Oliveira stresses, however, that the development of CAD systems for decision support in digital pathology is not yet a solved problem. “Indeed, computational pathology is still a relatively new field, and there are many challenges to solve before machine learning models can be effectively brought into clinical applications,” she said.
Using WSIs involves trade-offs, and the most common approach is to manually crop small regions of the slide. WSIs are typically very large, high-resolution images (often larger than 50,000 by 50,000 pixels), so they cannot easily fit into the graphics processing units (GPUs) used to train deep learning models.
“Despite the promising results, the fact that these approaches focus only on small regions (relative to the size of the slide) and require manual selection of the regions to classify is a major drawback from an implementation point of view, and it makes them more vulnerable,” says the researcher.
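To put those slide dimensions in perspective, the short Python sketch below does the arithmetic for a 50,000 by 50,000 pixel slide and counts how many fixed-size tiles it splits into. The uncompressed 8-bit RGB storage and the 512-pixel tile size are illustrative assumptions, not details reported in the study; real slide files are compressed and stored at several resolutions, so the in-memory figure is an estimate for a dense, full-resolution array.

```python
def wsi_memory_bytes(width: int, height: int, channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size of a slide held in memory as a dense, uncompressed array."""
    return width * height * channels * bytes_per_value


def tile_grid(width: int, height: int, tile: int) -> list[tuple[int, int]]:
    """Top-left (x, y) corners of non-overlapping tiles covering the slide.

    Tiles in the last row/column may extend past the border; a real pipeline
    would pad or crop them.
    """
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]


side = 50_000  # slide size quoted in the article
print(wsi_memory_bytes(side, side) / 1e9)                     # 7.5  (GB, 8-bit RGB)
print(wsi_memory_bytes(side, side, bytes_per_value=4) / 1e9)  # 30.0 (GB, float32)
print(len(tile_grid(side, side, 512)))                        # 9604 (hypothetical 512 px tiles)
```

Even at 8 bits per channel, a single slide of this size would occupy several gigabytes before any network activations are stored, which is why tile-based pipelines process slides piece by piece.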
Training a segmentation model
The framework consists of an epithelium segmentation step followed by dysplasia classification (non-neoplastic, LSIL, HSIL), which eliminates the need to manually identify epithelial regions and fully automates slide assessment. “The proposed classification approach achieved a balanced accuracy of 71.07% and a sensitivity of 72.18% in slide-level testing on 600 independent samples,” says the study’s first author.
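Balanced accuracy is worth unpacking, since it differs from plain accuracy when classes are imbalanced. A common definition, assumed here, is the mean of the per-class recalls. The article does not say how sensitivity was computed over three classes, so the sketch below treats any dysplastic slide (LSIL or HSIL) as a positive; that choice, the function names, and the toy labels are all hypothetical.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of slides of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def lesion_sensitivity(y_true, y_pred, negative="non-neoplastic"):
    """Assumption: a positive is any dysplastic slide (LSIL or HSIL), and it
    counts as detected if it is predicted as any dysplastic grade."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t != negative]
    detected = sum(p != negative for p in preds_on_positives)
    return detected / len(preds_on_positives)


# Toy labels for illustration only; these are not the study's data.
y_true = ["non-neoplastic"] * 4 + ["LSIL"] * 2 + ["HSIL"] * 2
y_pred = ["non-neoplastic", "non-neoplastic", "non-neoplastic", "LSIL",
          "LSIL", "HSIL",
          "HSIL", "non-neoplastic"]

print(per_class_recall(y_true, y_pred))             # {'non-neoplastic': 0.75, 'LSIL': 0.5, 'HSIL': 0.5}
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.5833
print(lesion_sensitivity(y_true, y_pred))           # 0.75
```

On the same toy labels, plain accuracy would be 5/8 = 0.625, slightly higher because the larger non-neoplastic class dominates the count.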
To train the segmentation model, the researchers used all 186 annotated slides, which contain a total of 312 tissue sections. The results indicate that “it is very rare for the model to fail to recognize large portions of the epithelium or misidentify important regions.”
After the segmentation step, the researchers could use the identified regions of interest (ROIs) to focus the classification and to use unannotated WSIs for training, enabling automated diagnosis of new cases. The classifier then grades the dysplasia from the tiles within those regions.
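Using the segmented regions to focus classification can be pictured as a simple tile filter: lay a grid over the epithelium mask and keep only tiles that are mostly epithelium. The mask format, the function name `select_roi_tiles`, and the 0.5 coverage cut-off are hypothetical choices for illustration; the study's actual tile selection may differ.

```python
def select_roi_tiles(mask, tile, min_fraction=0.5):
    """Return top-left (x, y) corners of tiles that are mostly inside the ROI.

    mask: 2D list of 0/1 values (1 = epithelium), e.g. a downsampled segmentation map
    tile: tile side length in mask pixels
    min_fraction: hypothetical cut-off for how much of a tile must be epithelium
    Partial tiles at the right and bottom borders are skipped.
    """
    height, width = len(mask), len(mask[0])
    selected = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            covered = sum(mask[r][c] for r in range(y, y + tile) for c in range(x, x + tile))
            if covered / (tile * tile) >= min_fraction:
                selected.append((x, y))
    return selected


# Toy 4x4 mask: epithelium occupies the top-left block and part of the top-right block.
mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(select_roi_tiles(mask, tile=2))                     # [(0, 0), (2, 0)]
print(select_roi_tiles(mask, tile=2, min_fraction=0.75))  # [(0, 0)]
```

Raising `min_fraction` trades coverage for purity: fewer tiles reach the classifier, but more of each tile is tissue the model is meant to grade.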
In this solution, the team trained a classification model on 383 annotated epithelial regions, split into training and validation sets. After testing different models and choosing the best one, the researchers retrained it with a small set of individually labeled tiles (263) added to the training set, to strengthen the classification learning task. This improved the tile selection process by combining the tiles selected from each epithelial region, which carry only the label of their corresponding bag, with tiles that have their own specific labels.
Finally, to take advantage of the full dataset, the team retrained the model after adding bags of tiles from the unannotated slides (1,198).
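These training stages mix labels of different strength. A minimal sketch, assuming a bag is simply a list of tile IDs sharing one label, shows how bag-level labels can be propagated to their tiles and combined with individually labeled tiles and with bags from unannotated slides (assumed here to carry only a slide-level diagnosis). The data format, all names, and the `slide_grade` rule that reports the most severe tile prediction are hypothetical; the article does not describe how tile predictions are turned into a slide diagnosis.

```python
from dataclasses import dataclass

# Ordinal encoding of the three classes named in the article.
GRADES = {"non-neoplastic": 0, "LSIL": 1, "HSIL": 2}


@dataclass(frozen=True)
class TrainingTile:
    tile_id: str
    label: str
    strong: bool  # True: tile labeled individually; False: label inherited from its bag


def expand_bags(bags):
    """bags maps bag_id -> (bag_label, tile_ids); each tile inherits its bag's label."""
    return [
        TrainingTile(tile_id, label, strong=False)
        for label, tile_ids in bags.values()
        for tile_id in tile_ids
    ]


def build_training_set(region_bags, labeled_tiles, slide_bags=None):
    """Combine weakly labeled bags with individually labeled tiles.

    region_bags: bags from annotated epithelial regions (hypothetical format)
    labeled_tiles: mapping tile_id -> label for individually labeled tiles
    slide_bags: optional bags from unannotated slides (slide-level label only)
    """
    tiles = expand_bags(region_bags)
    if slide_bags:
        tiles += expand_bags(slide_bags)
    tiles += [TrainingTile(t, lab, strong=True) for t, lab in labeled_tiles.items()]
    for tile in tiles:
        if tile.label not in GRADES:
            raise ValueError(f"unknown label {tile.label!r} for tile {tile.tile_id}")
    return tiles


def slide_grade(tile_predictions):
    """Hypothetical aggregation rule: report the most severe grade predicted on the slide."""
    return max(tile_predictions, key=GRADES.__getitem__)


region_bags = {"r1": ("LSIL", ["r1_t0", "r1_t1"]), "r2": ("HSIL", ["r2_t0"])}
labeled_tiles = {"x0": "non-neoplastic"}
slide_bags = {"s1": ("HSIL", ["s1_t0", "s1_t1", "s1_t2"])}

train = build_training_set(region_bags, labeled_tiles, slide_bags)
print(len(train), sum(t.strong for t in train))                # 7 1
print(slide_grade(["non-neoplastic", "LSIL", "non-neoplastic"]))  # LSIL
```

In the study's terms, `region_bags` would roughly correspond to the 383 annotated epithelial regions, `labeled_tiles` to the individually labeled tiles, and `slide_bags` to the 1,198 unannotated slides. A max-severity rule is easy to read but sensitive to a single false-positive tile, so one common refinement is to require a minimum number or proportion of tiles before escalating a grade.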
The principal investigator of this paper emphasizes that future work may aim to refine both parts of the model (segmentation and classification) and evaluate a fully integrated approach.
The 600-sample test set used in the current study was selected from the IMP Diagnostics dataset and is available “upon reasonable request.”
“At IMP Diagnostics, we are investing in improving cervical cancer diagnosis, and thus women’s health. This tool brings us one step closer to more efficient detection of precancerous lesions,” concludes Diana Montezuma Felizardo, pathologist and head of research and development at IMP Diagnostics.
