DENPAR: Intraoral periapical X-ray dataset for machine learning

The compressed dataset is available at Zenodo (https://doi.org/10.5281/zenodo.16645076)¹⁴ under the Creative Commons Attribution 4.0 license. The uncompressed folder consists of three directories, Training, Validation, and Test, which can be used as the official split for training, validating, and testing machine learning models. Additionally, the folder contains one metadata spreadsheet file that includes the X-ray characteristics.

Each directory contains five subdirectories: Images, Masks (radiograph-wise), Masks (tooth-wise), Bone Level Annotations, and Key Points Annotations. The metadata spreadsheet file provides detailed information on the arch type, anatomical orientation, and FDI notation of each tooth in all radiographs.

The Images and Masks (radiograph-wise) subdirectories contain, respectively, the IOPA radiographs in .jpg format and the corresponding radiograph-wise teeth mask images in .png format.

The Key Points Annotations subdirectory contains the keypoint annotation JSON files. Figure 5 visualizes the information in one such file: key points at the CEJ (cementoenamel junction) are marked with red circles, and apex points are marked with blue circles. Each tooth is enclosed by its own rectangular bounding box derived from the corresponding tooth mask. Bounding box coordinates are expressed as [x-minimum, y-minimum, x-maximum, y-maximum], where (x-minimum, y-minimum) is the upper-left corner and (x-maximum, y-maximum) is the lower-right corner of the box. The bounding boxes are included to meet the input requirements of certain keypoint detection models, which use bounding box constraints to improve the localization and accuracy of keypoint detection.
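To illustrate how a bounding box in this coordinate order can be derived from a tooth mask, here is a minimal sketch. It assumes the mask has already been loaded as rows of 0/1 pixel values and that the maximum coordinates are inclusive pixel indices; the dataset's exact convention is not stated here, and the function names `mask_to_bbox` and `points_inside` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]  # rows of 0/1 pixel values (assumed in-memory form)
BBox = Tuple[int, int, int, int]


def mask_to_bbox(mask: Mask) -> Optional[BBox]:
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    # Assumption: max coordinates are inclusive pixel indices.
    return (min(xs), min(ys), max(xs), max(ys))


def points_inside(bbox: BBox, points: Sequence[Tuple[float, float]]) -> List[bool]:
    """Check whether each (x, y) key point lies within the bounding box."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min <= x <= x_max and y_min <= y <= y_max for x, y in points]


# Toy 4 x 5 mask standing in for one tooth.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
bbox = mask_to_bbox(toy_mask)
inside = points_inside(bbox, [(2, 1), (4, 0)])
```

A check like `points_inside` can serve as a quick sanity test that the key points of a tooth fall within its box before the data is passed to a detection model.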

The Bone Level Annotations subdirectory contains the alveolar crestal bone level annotation JSON files and annotation files in Common Objects in Context (COCO) format. In the alveolar crestal bone level annotation files, the bone level in the 2D image is expressed as the X and Y coordinates of individual points, because the bone level is a line composed of multiple points. The radiograph-wise images and .json files are named after the corresponding IOPA radiograph.
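Because the bone level is stored as a sequence of points, it can be treated as a polyline. The sketch below assumes the X and Y coordinates have already been read into two parallel lists; the actual JSON field names are not given here, so none are used, and `polyline_length` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_points(xs: List[float], ys: List[float]) -> List[Point]:
    """Pair parallel X and Y coordinate lists into (x, y) points."""
    if len(xs) != len(ys):
        raise ValueError("X and Y coordinate lists must have equal length")
    return list(zip(xs, ys))


def polyline_length(points: List[Point]) -> float:
    """Sum the Euclidean lengths of consecutive segments, in pixels."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Toy coordinates, not taken from the dataset.
bone_level = to_points([0.0, 3.0, 6.0], [0.0, 4.0, 8.0])
length_px = polyline_length(bone_level)
```

The same point list can be drawn directly over the radiograph or rasterized into a line mask, depending on what the downstream model expects.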

The Masks (tooth-wise) subdirectory contains subfolders named after the corresponding IOPA radiographs, together with an annotation file in COCO format. Each subfolder contains the mask images of the individual teeth in .png format.

A summary of the dataset structure is shown in Figure 4. The diagram shows the content structure of the “Training” folder, with the “Validation” and “Test” folders following the same structure.
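As a quick way to check an extracted copy against this structure, the following sketch counts the files under each subdirectory of each split. The split folder names in `SPLITS` are assumptions based on the description above and may need adjusting to match the names in the downloaded archive; `count_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

# Assumed split folder names; adjust to match the extracted archive.
SPLITS = ("Training", "Validation", "Test")


def count_files(root: Path) -> Dict[str, Dict[str, int]]:
    """Map split name -> subdirectory name -> number of files beneath it."""
    summary: Dict[str, Dict[str, int]] = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # skip splits that are missing or named differently
        summary[split] = {
            sub.name: sum(1 for p in sub.rglob("*") if p.is_file())
            for sub in sorted(split_dir.iterdir())
            if sub.is_dir()
        }
    return summary
```

Calling `count_files(Path("path/to/extracted/folder"))` returns a nested dictionary. Comparing the counts across subdirectories of the same split is a simple way to spot an incomplete extraction.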

Radiograph-wise masks are useful for tasks such as identifying all teeth in a radiograph, while tooth-wise masks are useful for tasks such as identifying individual teeth. The annotation files in COCO format can be used directly or preprocessed, depending on the requirements of the machine learning model. The format is widely used because it is versatile: it supports customized annotation files and can be adapted to a variety of computer vision tasks.
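A typical preprocessing step for COCO-format annotations is grouping them by image. The sketch below relies only on the standard COCO keys (`images`, `annotations`, `image_id`, `file_name`, `bbox`) and assumes the dataset's files follow them; the toy dictionary and file name are illustrative, not taken from the dataset. Standard COCO stores boxes as [x, y, width, height], so a conversion to the corner form is included.

```python
from collections import defaultdict
from typing import Any, Dict, List


def annotations_by_file(coco: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Group COCO annotations under the file name of the image they belong to."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_name[ann["image_id"]]].append(ann)
    return dict(grouped)


def xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Toy annotation dictionary; in practice this would come from json.load().
toy_coco = {
    "images": [{"id": 1, "file_name": "example_001.jpg", "width": 100, "height": 80}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 20, 25, 45]},
    ],
    "categories": [{"id": 1, "name": "tooth"}],
}
grouped = annotations_by_file(toy_coco)
corners = xywh_to_xyxy(toy_coco["annotations"][0]["bbox"])
```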

Figure 6 shows the distribution of image resolutions across the training, validation, and test sets. The average resolutions of the training, validation, and test sets are 1059 x 960 pixels, 1084 x 944 pixels, and 1066 x 953 pixels, respectively. In the training set, resolutions range from a minimum of 549 x 717 pixels to a maximum of 1542 x 1537 pixels. In the validation set, they range from 627 x 698 pixels to 1366 x 1533 pixels. In the test set, they range from 567 x 685 pixels to 1542 x 1370 pixels.