WEEP method description
Multiple-instance learning (MIL) is one of the standard frameworks for weakly supervised learning problems in computational pathology15. In the standard MIL framework, a bag (represented by a WSI) contains instances (represented by tiles), and the entire bag is classified as positive if at least one instance is classified as positive. However, in real-world computational pathology applications, bag-level labels are usually assigned based on a function of more than a single tile to achieve optimal predictive performance. The MIL framework can be divided into two steps: first, tile-level feature representations or prediction scores are obtained from a base CNN model; second, these feature representations or prediction scores are aggregated into a WSI-level prediction score using a tile-to-slide aggregation function or model. Different tile-to-slide aggregation functions are used in various weakly supervised classification scenarios16.
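The second MIL step can be illustrated with a minimal sketch of a few summary-statistic aggregators. The names `percentile`, `AGGREGATORS` and `slide_score` are hypothetical and not taken from the study code, and the tile scores are made-up values; the `max` entry corresponds to the standard MIL assumption that one positive instance makes the bag positive.

```python
from statistics import mean, median


def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical registry of tile-to-slide aggregation functions.
AGGREGATORS = {
    "max": max,                          # standard MIL assumption
    "mean": mean,
    "median": median,
    "p75": lambda s: percentile(s, 75),  # 75th percentile of tile scores
}


def slide_score(tile_scores, how="p75"):
    """Aggregate tile-level prediction scores into one slide-level score."""
    return AGGREGATORS[how](tile_scores)


# Made-up tile-level probabilities for one slide.
tile_scores = [0.10, 0.20, 0.40, 0.80, 0.90]
for name in AGGREGATORS:
    print(name, slide_score(tile_scores, name))
```

The choice of aggregator changes how many tiles influence the slide label: `max` depends on a single tile, whereas the mean, median and percentile depend on the distribution of tile scores.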
WEEP takes advantage of a basic property of MIL models applied to tile-level instances of histopathology WSIs. Specifically, we assume that there is a model that provides tile-level predictions that allow tiles to be ranked with respect to the prediction task, for example, based on the predicted class probability. For two tiles x_i and x_j, P(class = 1 | x_i) > P(class = 1 | x_j) means that x_i is more likely than x_j to belong to class 1, conditional on the model and the tile. For simple tile-to-slide aggregation functions (mean, median, or 75th percentile of the tile-level distribution), the predicted class probabilities of the tiles can be used directly as the ranking metric. However, if a second trainable model with attention is used for slide-level prediction, the attention weights can also be considered as the ranking metric.
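As a small illustration of the ranking step, the sketch below orders tiles by either metric. The function name `rank_tiles` and all numbers are hypothetical; the point is only that the two ranking metrics need not agree on the order of the tiles.

```python
def rank_tiles(metric):
    """Return tile indices ordered from the highest to the lowest metric value."""
    return sorted(range(len(metric)), key=lambda i: metric[i], reverse=True)


# Made-up values for three tiles of one slide.
probs = [0.30, 0.95, 0.60]      # P(class = 1 | tile) from a tile-level model
attention = [0.50, 0.10, 0.40]  # attention weights from a trainable aggregator

print(rank_tiles(probs))      # ranking by predicted class probability
print(rank_tiles(attention))  # ranking by attention weight
```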
A backward selection approach can be applied regardless of the ranking metric. This ensures that the selected subset is the smallest set of top-ranked tiles required for a positive classification at the WSI level. Backward selection is a common strategy for variable selection in predictive modeling tasks17. Here we apply it to identify the set of instances (tiles) required for a positive classification label (Table 1). In the empirical evaluation, we consider both the tile-level predicted class probability and the attention weights as ranking metrics. The WEEP algorithm makes it possible to determine the set of tiles required to assign the WSI label (Figure 1).
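The backward selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `weep_select` and `percentile` are hypothetical names, a slide is taken to be positive when its score is at or above the threshold, and `slide_score_fn` stands for any tile-to-slide aggregator that can be re-evaluated on a subset of tile indices.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def weep_select(ranking_metric, slide_score_fn, threshold):
    """Remove top-ranked tiles until the slide score falls below the threshold.

    ranking_metric: one value per tile (higher = ranked first).
    slide_score_fn: callable mapping a list of remaining tile indices
                    to a slide-level score.
    Returns (selected tile indices, slide score at each iteration).
    """
    remaining = sorted(range(len(ranking_metric)),
                       key=lambda i: ranking_metric[i], reverse=True)
    selected, trajectory = [], []
    while remaining:
        score = slide_score_fn(remaining)
        trajectory.append(score)
        if score < threshold:
            break  # slide is no longer classified as positive
        selected.append(remaining.pop(0))  # drop the highest-ranked tile
    return selected, trajectory


# Made-up tile probabilities; 75th percentile as the aggregator.
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
p75 = lambda idx: percentile([scores[i] for i in idx], 75)

selected, trajectory = weep_select(scores, p75, threshold=0.5)
print(selected)    # tiles whose removal flips the slide label
print(trajectory)  # slide score after each removal
```

For a slide that is already below the threshold, no tile is selected. If the score never drops below the threshold, every tile ends up in the selected set. The per-iteration scores are the values one would plot to follow the selection step by step.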

Overview of the WEEP methodology. a A weakly supervised learning scenario for image analysis in histopathology. First, the whole-slide image (WSI) is split into small patches called tiles. The tiles are then provided as input to a weakly supervised model consisting mainly of two modules: the feature extractor and the tile-to-slide aggregation function. The feature extraction module extracts a low-dimensional feature representation from each tile, and the tile-to-slide aggregation function (based on a summary statistic or another trainable model) provides the slide-level prediction score. The tile-to-slide aggregation function provides a tile ranking based on tile-level prediction scores or on the optimized attention scores of a trainable attention module. b The WEEP methodology, which applies step-by-step backward selection of tiles based on the ranking obtained from the tile-to-slide aggregation function. The first iteration removes the highest-ranked tile from the WSI and then applies the tile-to-slide aggregation function to the remaining tiles to obtain the slide-level prediction score. The iterations continue until the slide-level prediction score falls below the classification threshold (iteration 3 in this example). The removed tiles are those selected by WEEP as relevant to the positive classification of the WSI.
Study materials
This study included patients from the SöS-BC-4 cohort collected at Södersjukhuset (South General Hospital) in Stockholm, Sweden, between 2012 and 2018. One hematoxylin and eosin (H&E)-stained WSI, scanned at 40x magnification, was considered for each patient. WSI preprocessing followed the procedure described previously13. Further analysis included only the tiles predicted as invasive cancer (tile size: 598 × 598 pixels). This study was approved by the Swedish Ethical Review Authority (2018/2106-31, 2018/1462-32 and 2019–02336) and was conducted in accordance with the Declaration of Helsinki. According to the ethical approval, no additional informed consent was required for this non-interventional collection and analysis of data from patient records.
The models were optimized and validated on a subset of SöS-BC-4 (n = 1695) using 5-fold cross-validation (CV). The dataset was split into a CV training set and a CV test set for each CV fold. The CV training set was further divided into training and tuning sets14. Each data split was stratified by clinical NHG. The SöS-BC-4 data splits are shown in Supplementary Fig. 1.
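A stratified 5-fold split of this kind can be sketched with the standard library alone. The functions `stratified_folds` and `cv_splits`, the seed, and the label counts are hypothetical; the sketch covers only the outer CV split, not the further division into training and tuning sets.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so that each label is spread evenly."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # carry the position over between labels to keep fold sizes even
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k CV folds."""
    folds = stratified_folds(labels, k, seed)
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds)
                       if f != test_fold for i in fold)
        yield train, test


# Made-up grade labels (1, 2, 3), one per slide.
labels = [1] * 10 + [2] * 20 + [3] * 10
for train, test in cv_splits(labels):
    print(len(train), len(test), sorted(labels[i] for i in test))
```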
Description of the four weakly supervised modeling strategies
This study included four weakly supervised modeling strategies. The first modeling strategy used the ResNet-18 CNN architecture18 as a tile-level classification model. Tiles were assigned the weak slide-level label, and an ImageNet pre-trained model was used to initialize the model weights19, which were then fine-tuned as a supervised binary histological grade 1 vs. 3 classification model on the feature extraction training set (Supplementary Fig. 1). As the first tile-to-slide aggregation function, we used the 75th percentile of the tile-level prediction scores from the CNN model to provide slide-level prediction scores13.
In the second modeling strategy, tile-level feature vectors were extracted for the attention module training and tuning sets from the average pooling layer of the ResNet-18 model fine-tuned in the first modeling strategy (Supplementary Fig. 1). A trainable attention-based MIL (Atten-MIL) model was used as the second tile-to-slide aggregation function. It contains a trainable attention layer, inspired by ref. 15, that provides a slide-level prediction score by optimizing a weight for each tile-level feature vector and computing a weighted average of all tile feature vectors belonging to that slide. The third modeling strategy extracted features for the CV training set using a published foundation model called UNI20; a transformer-based MIL model called TransMIL21 was used as the tile-to-slide aggregation function. The fourth modeling strategy used Atten-MIL as the tile-to-slide aggregation function on features extracted with UNI. Optimization of the four modeling strategies is described in the supplementary material.
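The weighted averaging performed by an attention-based aggregator can be sketched as below. This is an illustration only: `softmax` and `attention_pool` are hypothetical names, the raw attention scores are taken as given (in a trained model they would come from the trainable attention layer), and softmax normalisation is an assumption about how the weights are made to sum to one.

```python
import math


def softmax(scores):
    """Normalise raw scores into positive weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, raw_attention):
    """Attention-weighted average of tile feature vectors for one slide.

    features: list of equal-length feature vectors, one per tile.
    raw_attention: one unnormalised attention score per tile (assumed given).
    Returns (slide-level feature vector, attention weights).
    """
    weights = softmax(raw_attention)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


# Made-up two-dimensional features for three tiles.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
raw_attention = [0.0, 0.0, math.log(2.0)]  # third tile gets twice the weight

pooled, weights = attention_pool(features, raw_attention)
print(weights)
print(pooled)
```

The resulting weights are the per-tile quantities that can serve as a ranking metric for tile selection, while the pooled vector is what a slide-level classifier would receive.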
Validation on the CV test set
The optimized weakly supervised models were validated on the CV test set of each CV fold. The slide-level prediction scores were then aggregated across the five CV test sets, the optimal classification threshold was determined using Youden's index, and WSIs were classified as grade 3 or grade 1.
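Threshold selection with Youden's index, J = sensitivity + specificity - 1, can be sketched as follows. The function name `youden_threshold`, the coding of grade 3 as the positive class (1) and grade 1 as the negative class (0), the rule that a slide is positive when its score is at or above the threshold, and the example scores are all assumptions made for illustration.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    labels: 1 = positive (assumed grade 3), 0 = negative (assumed grade 1).
    A slide is predicted positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to compute Youden's index")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Made-up slide-level scores pooled over CV test sets.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]

threshold, j = youden_threshold(scores, labels)
print(threshold, j)
```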
Quantitative and visual analysis of selected regions in various modeling strategies
We evaluated the WSI regions that contribute to the classification of histological grade 3 WSIs (n = 543). We illustrated the iterative backward selection approach of the WEEP methodology as a line plot (WEEP plot). Additionally, the distribution of the percentage of selected tiles per WSI in the CV test sets was examined using histograms. For example WSIs, the tiles selected by WEEP were visualized as binary masks overlaid on the tumor mask of the low-resolution WSI. All plots were created using the matplotlib package (v.3.6.2)22 in Python (v.3.10.8).