Datasets
Five datasets were used for training, validation, and retrospective testing: (1) Dataset 1, the training and validation set; (2) Dataset 2, the internal image test set; (3) Dataset 3, the external image test set; (4) Dataset 4, the internal video test set; and (5) Dataset 5, the external video test set. Datasets 1, 2, and 4 were collected retrospectively from November 2016 to November 2021 at Renmin Hospital of Wuhan University (RHWU). Dataset 3 was collected retrospectively from 6 hospitals, including Wuhan Central Hospital, China Three Gorges University and Yichang Central People’s Hospital. Dataset 5 was collected retrospectively from Beijing Cancer Hospital from June 9, 2020 to November 17, 2020.
Lesion inclusion criteria: (1) focal lesions (only one focal lesion within the same field of view). Lesion exclusion criteria: (1) multiple lesions (multiple focal lesions within the same field of view); (2) Type I lesions, Type III lesions, and ulcers; (3) views that were too close or too far; (4) submucosal lesions. Images from the same lesion were not split between the training, validation, and test sets. Qualified and unqualified images are shown in Supplementary Fig. 1. An expert endoscopist selected images and videos according to the lesion selection criteria. Internal and external videos were selected and edited by research assistants under expert guidance. The pathology results of the image and video test sets were reviewed by a gastroenterologist with more than 10 years of experience in the pathological diagnosis of gastric abnormalities.
Establishment of features
Features associated with gastric tumors were identified through a literature review. The PubMed database was searched for publications between January 1, 2011 and December 31, 2021 using the keywords “white light endoscopy” or “white light imaging”; “diagnosis” or “feature”; and “early gastric cancer” or “gastric dysplasia” or “gastric intraepithelial neoplasia”. A total of 164 publications were identified. Of these, 149 records were excluded because they were unrelated to the diagnosis of gastric tumors (n = 49), unrelated to white light endoscopy (WLE) (n = 97), or case reports (n = 3). Of the remaining 15 records, 1 could not be retrieved because the full text was not available. The other 14 records were then assessed for eligibility, and 8 were excluded because they were unrelated to the diagnostic features of gastric tumors.
In addition, a manual search added 6 records. Finally, 12 references were included. Based on this literature, two expert endoscopists and two algorithm engineers determined the features relevant to diagnosis. Ultimately, 13 features were selected. The feature establishment process is shown in Supplementary Fig. 2.
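The record counts reported above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative and are not taken from the study.

```python
# Record counts as reported in the literature review description.
identified = 164
excluded_at_screening = {
    "unrelated to gastric tumor diagnosis": 49,
    "unrelated to WLE": 97,
    "case reports": 3,
}
not_retrieved = 1
excluded_at_eligibility = 8
added_by_manual_search = 6

# Step through the screening flow in the order the text describes it.
sought_for_retrieval = identified - sum(excluded_at_screening.values())
assessed_for_eligibility = sought_for_retrieval - not_retrieved
included_from_database = assessed_for_eligibility - excluded_at_eligibility
included_total = included_from_database + added_by_manual_search

print(sought_for_retrieval, assessed_for_eligibility, included_total)  # 15 14 12
```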
Construction of ENDOANGEL-ED
Thirteen features identified through the literature review, comprising seven deep learning (DL) features and six quantitative features, were used to construct ENDOANGEL-ED.
Seven DL features were extracted using deep convolutional neural networks (DCNN 1–7). Feature extraction models 1–6 were trained, validated, and tested using images from Dataset 1. Images from the same lesion were not split between the training, validation, and test sets. DCNNs 1–6 were binary or three-category classification models, each aimed at determining one of the following six features: (1) Spontaneous bleeding: whether the lesion shows spontaneous bleeding. (2) Elevation: whether the lesion is elevated. (3) Depression: whether the lesion is depressed. (4) Boundary: whether the lesion has a clear boundary. (5) Surface: whether the surface of the lesion is rough or smooth. (6) Tone: whether the lesion is reddened, pale, or unchanged (same tone as the background mucosa). We compared the performance of supervised and semi-supervised algorithms when building DCNNs 1–6. Before the images were sent to DCNNs 1–6, they were first processed by a previously built YOLO-v3 model to localize abnormalities.27 Briefly, YOLO-v3 was trained to detect gastric lesions using 21,000 gastric images.11 It can detect focal lesions with a sensitivity of 96.90%.
The seventh feature extraction model had been developed previously using the ResNet-50 algorithm to classify 26 anatomical landmarks in esophagogastroduodenoscopy.28,29 Lesion location was further divided into three categories: upper-middle stomach, lower stomach, and indistinguishable.
Quantitative features were extracted and analyzed based on regions localized by YOLO-v3. These quantitative features include:
1. Lesion area aspect ratio: the ratio of the width to the height of the lesion in the image, which describes the overall shape of the lesion.
2. Spectral principal component information of the color of the lesion area: the image is converted from the red-green-blue (RGB) color space to the P color space, and the 10 dominant color features of the image in the P color space are extracted. Next, the mean pixel value of each color feature in the three channels is calculated, and the median of all mean pixel values is taken as the representative spectral principal component information. It was used to quantify lesion color features.
3. Image entropy of the S channel in the HSI color space of the lesion area: the image is converted from the RGB color space to the hue, saturation, intensity (HSI) color space, and the image entropy of the S channel is computed. This was another feature used to describe color.
4. Lesion area texture information: the local binary pattern method was used to analyze the statistical texture features of the images. Changes in texture information reflect changes in the gastric mucosa.
5. Histogram of oriented gradients of the lesion: characterized by the distribution (histogram) of gradient directions. Edges and corners carry more information about the shape of an object than flat areas. This index reflects information about lesion borders and shape.
6. Lesion area color moment: a simple but efficient color feature that reflects general brightness, color distribution area, and color distribution symmetry.
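To make several of these quantitative features concrete, the following is a minimal sketch in plain Python, not the implementation used in the study. It assumes the lesion region is available as a detection box `(x1, y1, x2, y2)` and as a list of 8-bit RGB pixels. The saturation formula is the standard HSI definition, the local binary pattern uses a basic 3×3 neighbourhood, and the color moments follow the common mean, standard deviation, and skewness formulation. All function names and parameter choices (such as the 256 histogram bins) are illustrative assumptions.

```python
import math
from collections import Counter

def aspect_ratio(box):
    """Width-to-height ratio of a detection box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)

def hsi_saturation(r, g, b):
    """HSI saturation, S = 1 - 3 * min(R, G, B) / (R + G + B)."""
    total = r + g + b
    return 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total

def saturation_entropy(pixels, bins=256):
    """Shannon entropy (in bits) of the histogram of the S channel."""
    counts = Counter(
        min(int(hsi_saturation(*p) * bins), bins - 1) for p in pixels
    )
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def lbp_code(gray, y, x):
    """Basic 8-neighbour local binary pattern code of interior pixel (y, x)."""
    center = gray[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    return sum(
        (gray[y + dy][x + dx] >= center) << bit
        for bit, (dy, dx) in enumerate(offsets)
    )

def color_moments(values):
    """Mean, standard deviation, and skewness of one color channel.

    Skewness is taken as the signed cube root of the third central moment.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skewness = math.copysign(abs(third) ** (1 / 3), third)
    return mean, math.sqrt(variance), skewness

# Toy inputs for illustration only (not study data).
toy_pixels = [(200, 50, 50), (200, 50, 50), (50, 50, 50), (50, 50, 50)]
print(aspect_ratio((10, 20, 110, 70)))
print(saturation_entropy(toy_pixels))
print(color_moments([p[0] for p in toy_pixels]))
```

In practice a histogram of local binary pattern codes over the whole lesion region, rather than a single code, would serve as the texture descriptor.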
Once the 7 DL-based features and 6 quantitative features were extracted, they were combined and input into fitting diagnostic models, including logistic regression (LR), decision trees (DT), support vector machines (SVM), and gradient boosting decision trees (GBDT). The best-performing model was selected to build ENDOANGEL-ED. Representative images of these features and a schematic of this study are shown in Fig. 3 and Supplementary Fig. 3. The workflow of this study is shown in Supplementary Fig. 4.

a Thirteen features, including 7 deep learning-based features and 6 quantitative features. b Framework for ENDOANGEL-ED development. HSI: hue, saturation, intensity.
Construction of a sole DL model for gastric tumor diagnosis
Using the ResNet-50 algorithm, we constructed a conventional sole DL model, trained on the same training set as ENDOANGEL-ED, to diagnose early gastric tumors under white light (WL). In developing the sole DL model, we compared two image preprocessing methods (keeping the YOLO-v3 detection box at its original size, or enlarging it to 1.2 times its size to include more information about the mucosa surrounding the lesion) as well as supervised and semi-supervised algorithms.
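The box enlargement step can be sketched as follows. This is an illustration under stated assumptions rather than the preprocessing code used in the study: it assumes that the factor of 1.2 is applied to both the width and the height about the box center, and that the enlarged box is clipped to the image borders. The function name `enlarge_box` and its parameters are hypothetical.

```python
def enlarge_box(box, image_size, scale=1.2):
    """Scale a detection box about its center and clip it to the image.

    box        -- (x1, y1, x2, y2) in pixels
    image_size -- (width, height) of the image in pixels
    scale      -- enlargement factor applied to width and height (assumed)
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    center_x, center_y = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2
    return (
        max(0, round(center_x - half_w)),
        max(0, round(center_y - half_h)),
        min(width, round(center_x + half_w)),
        min(height, round(center_y + half_h)),
    )

# A 100 x 100 box grows to 120 x 120 around the same center.
print(enlarge_box((100, 100, 200, 200), (640, 480)))  # (90, 90, 210, 210)
```

Clipping keeps the crop valid for lesions near the image edge, at the cost of an enlarged box that is no longer centered on the original detection.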
Internal Image Test, External Image Test, Internal Video Test, and External Video Test
The performance of ENDOANGEL-ED and the sole DL model was tested on the images and videos in Datasets 2–5.
Consecutive video test
The performance of ENDOANGEL-ED was tested on videos of consecutive patients undergoing esophagogastroduodenoscopy (EGD) at RHWU between March 2022 and June 2022.
Inclusion criteria were: (1) age 18 years or older; (2) undergoing sedated gastroscopy; (3) able to read, understand, and sign informed consent. Exclusion criteria were: (1) emergency bleeding; (2) food residue; (3) a history of gastrectomy or a diagnosis of remnant stomach; (4) no lesions or no pathological results. Enrolled patients were further selected according to the lesion criteria described above. Raw videos of eligible lesions were then collected. All videos were edited into video clips containing the target lesions. ENDOANGEL-ED was activated when the image frame was frozen. The feature predictions and the diagnosis made by ENDOANGEL-ED were displayed on the screen (Fig. 4 and Video 1).

The predictions for the six feature indices and the diagnostic result are displayed on the left.
Man-machine comparison
Man-machine comparisons were made using the internal and external videos. Thirty-one endoscopists from RHWU and 46 endoscopists from 44 other hospitals participated in the comparisons on the internal and external videos, respectively. They independently reviewed all video clips and answered either “neoplastic” or “non-neoplastic”. The external man-machine comparison was reanalyzed from a previously published trial.30 Endoscopist experience level was classified as novice [1–5 years of EGD (esophagogastroduodenoscopy) experience], senior (6–10 years of EGD experience), or expert (more than 10 years of EGD experience). Endoscopist performance was compared with that of ENDOANGEL-ED and the sole DL model.
MRMC study
Thirty-one endoscopists and the 127 video clips of the internal video test were involved in the multi-reader, multi-case (MRMC) study. Using a crossover design, endoscopists were randomly and evenly assigned to Group A (first reading the videos without ENDOANGEL-ED augmentation) and Group B (first reading the videos with ENDOANGEL-ED augmentation). After a 2-week washout period, the arrangement was reversed. Endoscopists were free to consider or ignore the augmentation based on their own judgment. Each endoscopist’s total time to read these cases was recorded. The study design is shown in Supplementary Fig. 5.
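The crossover allocation can be illustrated with a short sketch. It is not the randomization procedure used in the study, which is not described in detail; the function name, the seeded shuffle, and the condition labels are assumptions made for illustration.

```python
import random

def crossover_assignment(reader_ids, seed=0):
    """Randomly split readers into two groups with reversed reading orders.

    Returns a dict mapping each reader to (first_period, second_period).
    Group A reads unassisted first; Group B reads with AI assistance first.
    """
    rng = random.Random(seed)      # seeded so the allocation is reproducible
    shuffled = list(reader_ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # groups differ by at most one reader
    schedule = {}
    for reader in shuffled[:half]:
        schedule[reader] = ("unassisted", "assisted")   # Group A
    for reader in shuffled[half:]:
        schedule[reader] = ("assisted", "unassisted")   # Group B
    return schedule

schedule = crossover_assignment(range(1, 32), seed=42)
group_a = [r for r, order in schedule.items() if order[0] == "unassisted"]
group_b = [r for r, order in schedule.items() if order[0] == "assisted"]
print(len(group_a), len(group_b))  # 16 15
```

Because every reader sees every clip under both conditions, each reader serves as their own control, and the washout period between the two reading sessions limits recall of individual cases.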
Acceptance analysis using scales dedicated to AI systems
A 5-point Likert-type acceptance scale for the implementation of AI in gastrointestinal endoscopy, modified from the scale published by Tian et al.,31 was used. The scale consisted of 9 items to assess and compare endoscopists’ trust in, acceptance of, and confidence in the explainable AI system and the conventional sole DL system. Thirty-one endoscopists were invited to complete the scale. The scale form is attached in the Supplement.
Ethics
The RHWU Ethics Committee approved this study. The Institutional Review Board waived informed consent for retrospectively collected data. All enrolled patients signed informed consent. This study was registered with the Chinese Clinical Trial Registry as ChiCTR2100045963.
Statistical analysis
For the consecutive video test, the accuracy of ENDOANGEL-ED was estimated at 80%. The required sample size was calculated as 72 with an alpha of 0.05 and a power of 0.80 using the one-proportion test procedure (PASS 2021).
The performance of ENDOANGEL-ED, the sole DL model, and the endoscopists was assessed by accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). McNemar’s test was used to compare accuracy, sensitivity, and specificity. The χ² test was used to compare PPV and NPV between ENDOANGEL-ED and the sole DL model. Interrater agreement among endoscopists was calculated using Fleiss’ kappa. Performance metrics of endoscopists at different levels were compared with those of ENDOANGEL-ED and the sole DL model using the Mann-Whitney U test. Acceptance questionnaire items and other comparisons were analyzed using the Wilcoxon signed-rank test. P values < 0.05 were considered statistically significant.
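For readers less familiar with these measures, the following plain-Python sketch shows how the five performance metrics, McNemar’s test, and Fleiss’ kappa can be computed. It is illustrative only and is not the statistical software used in the study. The exact binomial form of McNemar’s test is an assumption (the variant used is not stated above), and all counts in the usage lines are hypothetical, not study results.

```python
from math import comb

def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts: cases one method classified
    correctly and the other did not, in each direction.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i][j] counts raters placing case i in class j.

    Every row must sum to the same number of raters.
    """
    n_cases = len(ratings)
    n_raters = sum(ratings[0])
    n_classes = len(ratings[0])
    total = n_cases * n_raters
    class_share = [sum(row[j] for row in ratings) / total
                   for j in range(n_classes)]
    per_case = [(sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1)) for row in ratings]
    observed = sum(per_case) / n_cases
    expected = sum(p * p for p in class_share)
    return (observed - expected) / (1 - expected)

# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=40, fp=10, tn=45, fn=5))
print(mcnemar_exact(b=1, c=9))
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]]))
```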
Reporting summary
For more information on the study design, see the Nature Research Reporting Summary linked to this article.