Real-time detection and localization of honeycomb defects in concrete pillars using hybrid deep learning models



Honeycomb detection and localization were evaluated with several performance metrics: the confusion matrix, precision, recall, F1 score, ROC curve, Area Under the Curve (AUC), Intersection over Union (IoU), mean Average Precision (mAP), and overall accuracy. Together, these metrics determine how effectively the model detects and localizes honeycomb defects in pillar images. The experiments were run on a system with 8 GB of DDR4 RAM, an Intel Core i5 (8th Gen) processor, and an NVIDIA GeForce MX230 GPU. Although inexpensive, this hardware configuration was sufficient for model training and inference in this deep learning task. Apart from the hardware, standardized software settings ensured stable and reproducible training. Effective resource management, especially of the power supply, helped maintain processing efficiency and avoid system slowdowns during demanding operations. Red bounding boxes were used to mark honeycomb defects in pillar images so that the detections are readily visible. The system achieved a detection efficiency of 98.26%, even under challenging operating conditions. This result indicates the robustness and reliability of the model in real applications.

The autonomous inspection device ensures fast and efficient testing, which can reduce manual checking time considerably. This enables efficient maintenance of concrete structures, providing industries with a state-of-the-art device for structural integrity. A representative image of model detection and localization is shown in Fig. 5.

  • Before detection: The raw concrete pillars show damage, but they are not processed.

  • After detection: The system detects and labels the defects using the YOLOv5 and Mask R-CNN models.

Confusion matrix

The Confusion Matrix45,46 is a performance evaluation tool for classification models, introduced by Karl Pearson in 1904. In machine learning, it is a table that compares actual and predicted classifications and is used to calculate the accuracy of the model. The matrix consists of four key components: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). Metrics such as accuracy, precision, recall, and the F1 score are derived from these values. As shown in Fig. 6, the matrix is built by placing the actual labels on one axis and the predicted labels on the other, which gives a clear and simple visualization of the errors. For example, a perfectly accurate binary classifier would produce only TP and TN counts, while a poor model would show high FP and FN counts.

$$\begin{aligned} \text {Confusion Matrix} = \begin{pmatrix} \sum (y_{\text {true}}=1 \wedge y_{\text {pred}}=1) & \sum (y_{\text {true}}=1 \wedge y_{\text {pred}}=0) \\ \sum (y_{\text {true}}=0 \wedge y_{\text {pred}}=1) & \sum (y_{\text {true}}=0 \wedge y_{\text {pred}}=0) \end{pmatrix} \end{aligned}$$

(1)

Equation 1 represents the Confusion Matrix, where:

  • \(\sum (y_{\text {true}}=1 \wedge y_{\text {pred}}=1)\) is True Positives (TP), and \(\sum (y_{\text {true}}=1 \wedge y_{\text {pred}}=0)\) is False Negatives (FN).

  • \(\sum (y_{\text {true}}=0 \wedge y_{\text {pred}}=1)\) is False Positives (FP), and \(\sum (y_{\text {true}}=0 \wedge y_{\text {pred}}=0)\) is True Negatives (TN).

  • It quantifies classification performance by comparing actual labels \(y_{\text {true}}\) with predicted labels \(y_{\text {pred}}\).
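The layout of Equation 1 can be illustrated with a minimal Python sketch. The function name `confusion_matrix` and the toy labels are assumptions made for illustration only; they are not taken from the study's code or data.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TP, FN], [FP, TN]] for binary labels (1 = defect, 0 = no defect).

    The row/column order follows Equation 1: rows are actual labels (1, then 0),
    columns are predicted labels (1, then 0).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return [[tp, fn], [fp, tn]]


# Toy labels (hypothetical): 4 defective and 4 defect-free samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_matrix(y_true, y_pred))  # [[3, 1], [1, 3]]
```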

Fig. 6 Image representation of Confusion Matrix.

The confusion matrix in Fig. 6 illustrates that honeycomb defect detection and localization with Mask R-CNN, YOLOv5, and Non-Maximum Suppression (NMS) is highly accurate. The model achieves 1470 True Positives (TP) and 1470 True Negatives (TN), with only 26 False Positives (FP) and 26 False Negatives (FN). These low FP and FN counts indicate very few misclassifications and show that the system is robust. Mask R-CNN ensures precise segmentation, YOLOv5 enables real-time object detection, and NMS efficiently eliminates redundant overlapping detections. Together, these techniques allow the model to accurately distinguish defect-free pillar regions from true honeycomb defects.

Precision

Precision47 is a core performance measure commonly referred to for classification and object detection tasks. It describes the positive prediction accuracy by how many of the predicted positive instances are actually correct. Precision specifically is the percentage of true positives out of the total instances predicted as positive, and it is very important when the cost of false positives is extremely relevant. Examples include medical diagnosis, detection of structural defects, and surveillance systems. A higher precision score means the model was able to determine true defects from non-defects with confidence, which leads to safety and security when applied to real-life scenarios, while minimizing false alerts.

With a precision47 score of 0.9843, 98.43% of the regions the model flagged as honeycomb defects were actual defects. Precision is the ratio of True Positives (TP) to the sum of TP and False Positives (FP), i.e., the fraction of predicted positives that are actually correct. Such a score indicates a low number of false positives and demonstrates the model’s high reliability.

$$\begin{aligned} \text {Precision} = \frac{TP}{TP + FP} \end{aligned}$$

(2)

Equation (2) defines precision as the proportion of correctly predicted positives (TP) among all positive predictions (TP + FP).

The model’s precision score of 0.9843 highlights its ability to accurately identify honeycomb defects in images of concrete pillars: nearly all of the instances predicted as “defective” were true defects. Even a small percentage of false positives matters in structural health monitoring, where it can lead to unnecessary inspections, increased labor costs, and disrupted inspection workflows. Because the detection system raises alerts almost exclusively for truly defective instances, inspectors can focus on the flagged defects with more confidence and direct resources to remediating those areas when necessary. False predictions in safety-critical or high-stakes industrial applications can have severe consequences, particularly for infrastructure safety. A precision score this high is therefore an encouraging sign for any autonomous inspection process, since it suggests the model is ready for integration. Overall, it indicates a level of confidence appropriate for assisting inspections within industrial quality frameworks and for use in a real-time operational setting, where timely inspections can be folded into maintenance schedules.

Recall

Recall48, also known as sensitivity or the true positive rate, is an evaluation metric for classification and object detection. It measures how well a model identifies all relevant instances in a dataset; in other words, it indicates how many of the actual positive cases were detected. Recall is most important when a missed positive case has serious implications, as in defect detection in infrastructure or fault detection. A high recall means that the model identified most of the true positive cases, which makes it preferable when catching every defect matters more than avoiding false positives.

A recall48 score of 0.9812 in Table ?? means that the model correctly detected 98.12% of the actual honeycomb defects. Recall is TP/(TP + FN), the share of all actual positive cases that the model identifies. A high recall ensures few false negatives and excellent detection performance.

$$\begin{aligned} \text {Recall} = \frac{A}{A + B} \end{aligned}$$

(3)

Here, Eq. (3) defines the recall, where A represents True Positives (TP) and B represents False Negatives (FN).

The model achieved a recall score of 0.9812, indicating it was capable of identifying nearly all honeycomb defects in pillar structures. Considering this impressive recall value, it is evident that the model was able to identify almost all actual defect samples, and there were very few false negatives. High recall is very important in industry, where costs can be significant if a defect is missed, particularly if the defect relates to structural health monitoring and maintenance of concrete structures, where missing defects can reduce safety, continuity of operations, and long-term reliability. The recall value was so close to perfect as a result of the model’s ability to recognize nearly every case of honeycombing, even when imaging conditions were different or complex. This capability provides reliable detection and better overall confidence in the model’s performance when monitored and assessed using an automated inspection system, even when the system does not include a human operator to check every defect.

Precision and recall curve

The Precision-Recall Curve (PR Curve) is a vital evaluation metric in binary classification tasks, especially when dealing with imbalanced datasets. It depicts the trade-off between precision (positive predictive value) and recall (sensitivity) across various classification thresholds. In scenarios like defect detection in concrete structures, the PR Curve helps assess how well the model identifies actual defects while minimizing false alarms. Unlike ROC curves, PR curves are more informative when the positive class (e.g., presence of honeycomb) is rare. A high area under the PR curve indicates that the model maintains both high precision and recall, reflecting consistent and reliable defect detection.

The area under the Precision-Recall Curve, denoted as \(\mathcal{P}\mathcal{R}_{\text {AUC}}\), is calculated as:

$$\begin{aligned} \mathcal{P}\mathcal{R}_{\text {AUC}} = \int _{0}^{1} \text {Precision}(r) \, dr \end{aligned}$$

(4)

Here, \(\text {Precision}(r)\) denotes the precision at recall level \(r\). Equation (4) quantifies how well the model balances true positive identification and false positive reduction across thresholds.
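In practice the integral in Equation (4) is approximated from a finite set of (recall, precision) points. The sketch below uses the trapezoidal rule, which is one common approximation and not necessarily the one used in this study; the function name `pr_auc` and the sample points are illustrative assumptions.

```python
def pr_auc(recalls, precisions):
    """Approximate the area under a precision-recall curve with the trapezoidal rule.

    recalls and precisions are equal-length sequences of points on the curve.
    """
    points = sorted(zip(recalls, precisions))  # order by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical curve: precision stays at 1.0 up to recall 0.5, then falls to 0.5.
print(pr_auc([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]))  # 0.875
```

Step-wise interpolation, as used by some detection benchmarks, would give slightly different values for the same points.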

Fig. 7 Image representation of Accuracy and Loss.

The achieved area under the Precision-Recall Curve (PR AUC) in Fig. 7 is 0.975230, indicating exceptional performance. This high value reflects the model’s strong ability to detect actual honeycomb defects while minimizing incorrect defect predictions. In real-world structural safety assessments, such reliability is critical: false negatives could lead to undetected damage, while false positives may trigger unnecessary interventions. The value of 0.975230 implies that the model consistently maintains both high recall and high precision, validating its robustness in imbalanced conditions where honeycomb regions are far less common than non-defective regions. A nearly perfect PR AUC also enhances user trust in the model’s decision-making, making it suitable for integration into practical defect inspection pipelines. Fig. 7 shows the Precision-Recall Curve, visually confirming the model’s effectiveness across varying threshold levels and supporting its deployment in infrastructure health monitoring systems.

F1 score

The F1 Score49,50 is a performance measure that takes into account both precision and recall and is useful for evaluating binary classification tasks. Precision is the number of true positives divided by all positives predicted by the model, and recall is the number of true positives divided by all actual positives. The F1 Score combines the two into a single performance measure by taking their harmonic mean. It is useful for applications like defect detection, for example identifying honeycomb defects in concrete structures. In such applications, each prediction is either a correct detection, a false alarm (false positive), or a missed defect (false negative). When both missed defects and false alarms can have significant consequences, both kinds of failure need to be measured. The F1 Score therefore captures how well the model identifies defects while also penalizing false identifications.

The F1 Score \({\mathcal {F}}_1\) is calculated as:

$$\begin{aligned} {\mathcal {F}}_1 = \frac{2 \times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$

(5)

Where:

  • Precision = \(\frac{TP}{TP + FP}\),

  • Recall = \(\frac{TP}{TP + FN}\),

  • \(TP\) = True Positives, \(FP\) = False Positives, \(FN\) = False Negatives.

Equation 5 combines precision and recall into a single value, which is essential for evaluating model effectiveness.
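As a worked illustration of Equation 5, the sketch below computes precision, recall, and the F1 Score from raw counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name `f1_from_counts` is likewise an assumption.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed defects.
precision, recall, f1 = f1_from_counts(tp=90, fp=10, fn=30)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.9 0.75 0.8182
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1 Score here (0.8182) sits closer to the recall of 0.75 than the arithmetic mean (0.825) would.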

Fig. 8 Image representation of F1-Score.

The F1 Score achieved was 0.983500. Because the F1 Score is the harmonic mean of precision and recall, it lies between 0 and 1, and a value this close to 1 shows that the model balances the two: it finds nearly all honeycomb defects while raising few false alarms. The F1 Score can be sensitive to class balance, so it complements accuracy when defect and non-defect regions are unevenly represented. In any deployment, including infrastructure safety, a high F1 Score is desirable, because a low one could lead to misdiagnosis, either missed defects or repairs that were not required. The score obtained here indicates that the detection system provides reliable and actionable identification of defects. Fig. 8 shows the F1 Score plot for this evaluation.

Mean average precision (mAP)

Mean Average Precision (mAP)51,52 is a measure of a model’s performance, particularly in information retrieval and object detection. mAP averages the precision over different levels of recall, providing an estimate of overall accuracy across multiple thresholds. An mAP of 0.9752 shows that the model produces relevant output with high precision, as indicated in Table 5. The closer the mAP value is to 1, the stronger the ranking and identification performance of the model. The high score also indicates the strength of the model in minimizing false positives. The formula is given in Equation 6.

$$\begin{aligned} \text {mAP} = \frac{1}{K} \sum _{j=1}^{K} \int _0^1 Q_j(S) \, dS \end{aligned}$$

(6)

In Equation (6) K is the total number of classes, \(Q_j(S)\) represents the precision for class j as a function of recall S, S denotes recall ranging from 0 to 1, and \(\int _0^1 Q_j(S) \, dS\) corresponds to the area under the Precision-Recall curve for class j.
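A minimal sketch of Equation (6) for ranked detections follows. It approximates each class's integral by summing the precision at every recall step (non-interpolated average precision), which is one of several conventions in use. It assumes each detection has already been marked as a true or false positive, for example by matching against ground truth at the IoU threshold; the function names and sample data are hypothetical.

```python
def average_precision(detections, n_ground_truth):
    """Non-interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of ground-truth objects of this class.
    """
    if n_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank  # precision at the point where recall increases
    return ap / n_ground_truth  # each recall step has width 1 / n_ground_truth


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, n_ground_truth)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical two-class example (K = 2).
per_class = {
    "honeycomb": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "other": ([(0.6, True)], 1),
}
print(round(mean_average_precision(per_class), 4))  # 0.9167
```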

Notably, in this research, the mAP was computed with an Intersection over Union (IoU) threshold of 0.50. This threshold is suitable for the application at hand which is the detection of honeycomb defects in structural images, where a moderate level of localization accuracy is adequate for the detection of anomalies. Since the model has a high average IoU of 0.951467 (explained in the IoU section), the application of an IoU threshold of 0.50 is well-supported. It provides a balance between recall and precision and is still robust.

Table 5 Mean Average Precision (mAP) Score of the model.

Accuracy

Accuracy53,54 is one of the most basic evaluation metrics: it is the number of correctly predicted instances divided by the total number of predictions. For deep learning models that identify structural defects, it indicates how reliably the model differentiates defect (honeycomb) regions from non-defect regions. High accuracy is desirable since it suggests that the model is, in general terms, reliable. Nevertheless, when classes are imbalanced, accuracy should be interpreted alongside supplementary metrics. Accuracy provides a baseline perspective on how often the model’s predictions were correct, and it forms an important part of evaluating both the training and the generalization capabilities of the system in operation.

The accuracy \({\mathcal {A}}\) is mathematically defined as:

$$\begin{aligned} {\mathcal {A}} = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$

(6)

Where:

  • \(TP\): True Positives,

  • \(TN\): True Negatives,

  • \(FP\): False Positives,

  • \(FN\): False Negatives.

The accuracy equation gives the fraction of correct predictions across all evaluated cases.
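As a quick arithmetic check, the accuracy formula can be applied to the counts quoted for the confusion matrix in Fig. 6, taken here as 1470 TP, 1470 TN, 26 FP, and 26 FN. The function name `accuracy` is an illustrative assumption.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all evaluated cases."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Counts as quoted for the confusion matrix in Fig. 6.
acc = accuracy(tp=1470, tn=1470, fp=26, fn=26)
print(f"{acc:.4f}")  # 0.9826
```

With these counts the formula gives 2940/2992, about 98.26%.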

Fig. 9 Image representation of accuracy curve.

The model achieved a training accuracy of 98.26% and a validation accuracy of 97.80%, indicating excellent generalization from training data to previously unseen data, as shown in Fig. 9. The small gap between these values confirms that the model overfits very little, overfitting being a common shortcoming of deep neural networks. These metrics support high confidence in real-world evaluations and indicate that the model not only maintains very high classification accuracy but does so stably. In the context of detecting structural defects such as honeycomb patterns in concrete pillars, this gives confidence that misclassifications can be kept low, which reduces risk in preventive and responsive maintenance for structural safety and stability.

Novel contributions

  • Dual-model integration: The proposed pipeline uniquely combines YOLOv5 for rapid object detection with Mask R-CNN for precise segmentation, ensuring localization and region-wise accuracy.

  • Post-processing optimization: The use of Non-Maximum Suppression (NMS) eliminates duplicate detections, improving clarity in prediction results.

  • Confidence-based filtering: Only outputs above a defined confidence threshold (\(>0.5\)) are retained, enhancing prediction reliability.

  • End-to-end deployability: The algorithm supports deployment on cloud or edge platforms, extending its usability in real-time inspection systems.

These innovations collectively ensure the model’s novelty, robustness, and readiness for practical field deployment, as detailed in Table ??.
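The post-processing steps listed above, confidence-based filtering followed by NMS, can be sketched in plain Python as follows. This is a generic greedy NMS written for illustration, not the study's implementation: the box format `(x1, y1, x2, y2, score)`, the function names, and the NMS overlap threshold of 0.5 are assumptions. Only the confidence threshold of 0.5 comes from the text.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, conf_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then greedily suppress overlapping ones.

    detections: list of (x1, y1, x2, y2, score) tuples.
    iou_threshold is an assumed value; the source does not state it.
    """
    confident = [d for d in detections if d[4] > conf_threshold]
    confident.sort(key=lambda d: d[4], reverse=True)  # best score first
    kept = []
    for det in confident:
        # Keep a box only if it does not overlap too much with a better one.
        if all(box_iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: two near-duplicates, one separate box, one weak box.
detections = [
    (10, 10, 50, 50, 0.90),
    (12, 12, 52, 52, 0.80),      # overlaps the first box heavily -> suppressed
    (100, 100, 140, 140, 0.70),
    (200, 200, 240, 240, 0.30),  # below the confidence threshold -> dropped
]
print([d[4] for d in filter_and_nms(detections)])  # [0.9, 0.7]
```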

Loss curve

The Loss Curve55,56 in Fig. 10 is a graphical representation of the model’s loss value over training epochs and gives a visual indication of whether the model is learning correctly. The loss curve typically includes both training loss and validation loss, allowing us to monitor convergence and detect potential overfitting. A decreasing loss means that the model is learning, whereas a large gap between the two curves suggests overfitting or underfitting. The mathematical formula is given in Equation 7.

$$\begin{aligned} L_e = -\frac{1}{N} \sum _{i=1}^{N} \sum _{c=1}^{C} y_{i,c} \log ({\hat{y}}_{i,c}) \end{aligned}$$

(7)

Here, \(L_e\) represents the loss at epoch \(e\), while \(N\) is the total number of samples and \(C\) is the number of classes. The term \(y_{i,c}\) denotes the true label for the sample \(i\) and the class \(c\), where it is 1 if the class is correct and 0 otherwise. Meanwhile, \({\hat{y}}_{i,c}\) is the probability predicted for the class \(c\) of sample \(i\).
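Equation 7 is the categorical cross-entropy averaged over the samples. A minimal sketch with one-hot labels is given below; the function name, the clipping constant `eps`, and the toy inputs are illustrative assumptions.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy.

    y_true: list of one-hot label vectors (length C each).
    y_prob: list of predicted probability vectors (length C each).
    eps guards against log(0) and is an implementation detail, not part of Eq. 7.
    """
    n = len(y_true)
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            if y:  # only the true class contributes for one-hot labels
                total -= y * math.log(max(p, eps))
    return total / n


# Two samples, two classes (hypothetical predictions).
y_true = [[1, 0], [0, 1]]
y_prob = [[0.9, 0.1], [0.2, 0.8]]
print(round(cross_entropy(y_true, y_prob), 4))  # 0.1643
```

Evaluating this quantity on the training and validation sets after every epoch yields the two curves described for the loss plot.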

Fig. 10 Image representation of Loss Curve.

Figure 10 shows the training and validation loss over 20 epochs. The blue line shows a training loss that falls consistently, which reflects effective learning. The red dashed line shows that the validation loss does not increase; it remains slightly higher than the training loss, which means that the generalization gap is minimal. In the last epochs both losses level off, indicating that training has converged. The reduction of both curves is clear evidence of the improved performance of the model. The validation loss follows the downward trend without jumps, which is a sign of good training stability.

Receiver operating characteristics

During World War II, engineers and scientists introduced the Receiver Operating Characteristic57,58 (ROC) curve. It plots the true positive rate against the false positive rate and shows how well a model separates classes. The closer the curve lies to the top-left corner, the better the model performs. The defining rates are given in the equation below, and the ROC curve of the model is shown in Fig. 11.

$$\begin{aligned} \text {TPR} = \frac{A}{A + B}, \quad \text {FPR} = \frac{C}{C + D} \end{aligned}$$

(5)

In the ROC formula, A denotes True Positives, B denotes False Negatives, C denotes False Positives, and D denotes True Negatives.
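The ROC curve is obtained by evaluating TPR and FPR at every decision threshold, and the AUC is the area under the resulting points. The sketch below shows this with the trapezoidal rule; the function names and the toy scores are assumptions made for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0).

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (C/(C+D), A/(A+B))
    return points


def auc(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for three defective (1) and three defect-free (0) samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(y_true, scores)), 4))  # 0.8889
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, which matches the AUC of 8/9.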

Fig. 11 Image representation of Receiver operating characteristics.

The ROC curve in Fig. 11 shows the performance of the classification model. The x-axis shows the False Positive Rate (FPR), while the y-axis shows the True Positive Rate (TPR). The blue line represents the model, and it lies close to the top-left corner, which shows that the model has high sensitivity and specificity. The AUC obtained was 0.9828, which indicates excellent model performance. An AUC near 1 reflects high discriminative power, meaning that the model can differentiate between positive and negative classes with very few misclassifications.

Intersection over Union (IoU)

IoU (Intersection over Union) is an important metric for assessing agreement in object detection and segmentation. IoU59,60 measures the overlap between the predicted segmented area and the ground-truth segmented area, and it provides a clear indication of localization accuracy for object detection models. For this project, which localizes honeycomb defects in concrete pillars, the IoU value is important because it shows whether the model both detects the defects and places them accurately.

The IoU is mathematically defined as:

$$\begin{aligned} \text {IoU} = \frac{\text {Area of Overlap}}{\text {Area of Union}} \end{aligned}$$

(8)

Equation 8 defines IoU as the ratio of the area of overlap between the predicted and ground-truth regions to the area of their union, and it is a key measure for evaluating segmentation models.
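For segmentation masks, the overlap and union in Equation 8 are pixel counts. A minimal NumPy sketch is shown below; the function name `mask_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
import numpy as np


def mask_iou(pred, truth):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    overlap = np.logical_and(pred, truth).sum()
    return float(overlap / union)


# Hypothetical 4x4 masks: each covers 6 pixels, 4 of which coincide.
pred = np.zeros((4, 4), dtype=int)
truth = np.zeros((4, 4), dtype=int)
pred[0:2, 0:3] = 1
truth[0:2, 1:4] = 1
print(mask_iou(pred, truth))  # 0.5
```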

Fig. 12 Image representation of IoU for segmentation accuracy.

The model obtained an IoU score of 0.951467, which is a positive indication of its ability to detect and localize honeycomb defects in concrete pillars. IoU is a useful metric for assessing model performance because it provides an easily quantifiable measure of how much the predicted defect region overlaps the actual defect region, allowing us to assess both the detection and the localization of structural defects. An IoU score close to 1.0 means that the model is very precise in both detection and location. At this level, nearly all of the labeled defect area was covered by the prediction, and any misses were likely small portions of the entire defect area.

In structural defect detection, particularly for safety-critical applications such as honeycomb detection in concrete pillars, a high IoU score demonstrates that the model can accurately characterize both the defect and its boundaries. This is very important for maintenance and safety applications because it minimizes the chance of missing or misclassifying a safety-relevant defect that could compromise the integrity of the inspected structure. The high IoU therefore supports the model's robustness for practical use in inspection systems and its reliability in infrastructure management. The high IoU illustrated in Fig. 12 affirms the accuracy and effectiveness of the model in localizing defects.

Calibration curves

The Calibration Curve61,62 is used to estimate the degree to which the predicted probabilities of the model agree with the actual results. A well-calibrated model produces predicted probabilities that match the observed frequencies, which supports sound decision-making. If the curve closely traces the diagonal (45-degree) line, calibration is good, indicating that the predicted probabilities closely match the observed frequencies. Departures from this line indicate over- or under-confidence in the predictions, which can mean that the model needs further refinement. Fig. 13 shows the calibration curve, highlighting the calibration performance of the model.

Fig. 13 Image representation of calibration curve.

In the calibration curve of Fig. 13, the predicted probabilities of the model agree closely with the actual probabilities, following the orange dashed line that represents perfect calibration. The blue line stays close to the diagonal with only a few minor deviations in the lower probability ranges, so the model is well calibrated. The plotted points confirm that the model remains reliable over various probability intervals.

Calibration error curve

Calibration Curve Error is a performance evaluation metric for gauging how well a predictive model’s probability estimates reflect actual outcomes. In deep learning-based detection, particularly in structural analysis tasks such as detecting honeycomb defects in concrete pillars, this metric quantifies the reliability of predicted confidence scores. A well-calibrated model produces predictions whose confidence closely matches the observed frequencies. For example, among predictions made with a confidence of 0.80, approximately 80% should be correct. This supports trust in the confidence scores of the detection model, particularly in critical applications like infrastructure safety evaluations.

The Calibration Curve Error \({\mathcal {C}}_{\text {err}}\) is computed as:

$$\begin{aligned} {\mathcal {C}}_{\text {err}} = \frac{1}{B} \sum _{b=1}^{B} \left| {\mathbb {E}}[{\hat{p}}_b] - {\mathbb {P}}(y=1|{\hat{p}}_b) \right| \end{aligned}$$

(9)

Where:

  • \(B\) represents the number of probability bins,

  • \({\hat{p}}_b\) is the mean predicted probability in bin \(b\),

  • \({\mathbb {P}}(y=1|{\hat{p}}_b)\) is the actual accuracy within bin \(b\).

The Calibration Curve Error in Equation (9) is the average absolute deviation between predicted probabilities and actual outcomes over multiple bins. It helps check how well the model’s confidence scores are calibrated, supporting reliable probabilistic predictions in critical applications such as defect detection in concrete structures.
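A minimal sketch of Equation (9) follows. It assumes equal-width probability bins and averages only over non-empty bins, since the per-bin terms are undefined for empty bins; both choices, the function name, and the toy data are assumptions. Unlike the commonly used expected calibration error, this form does not weight bins by the number of samples they contain.

```python
def calibration_error(y_true, y_prob, n_bins=5):
    """Unweighted mean of |mean predicted probability - observed frequency| per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    gaps = []
    for members in bins:
        if not members:
            continue  # skip empty bins (assumption, see lead-in)
        mean_prob = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        gaps.append(abs(mean_prob - observed))
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical predictions: slightly under-confident in both bins.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.1, 0.9, 0.9]
print(round(calibration_error(y_true, y_prob, n_bins=2), 4))  # 0.1
```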

Fig. 14 Image representation of Calibration Error Curve.

In the proposed honeycomb detection method for concrete structures based on YOLOv5, Mask R-CNN, and Non-Maximum Suppression (NMS), the obtained Calibration Curve Error is 0.18000. A calibration error of zero would be perfect, so a value of 0.18 indicates a moderate level of calibration: the confidence levels assigned by the detector broadly track the actual occurrence probabilities of honeycomb defects, and decision-makers can rely on the model’s output with reasonable trust for field deployment. Calibration plays a critical role in structural safety assessment, where poorly calibrated predictions would lead to under- or over-estimation of defect severity. The result supports the ability of the combined architecture to separate defective from non-defective regions. Fig. 14 depicts the calibration error curve, comparing predicted and true probabilities.

Matthews correlation coefficient

The Matthews Correlation Coefficient (MCC)63 is a performance measure for binary classification problems, particularly helpful when one deals with imbalanced datasets. It assesses the correlation between true positives, true negatives, false positives, and false negatives. MCC gives an equally weighted measure of model performance and is a better metric than accuracy, mainly for defect detection problems such as detecting honeycomb defects in concrete structures.

The MCC \({\mathcal {M}}\) is calculated using:

$$\begin{aligned} {\mathcal {M}} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{aligned}$$

(10)

Where:

  • \(TP\) = True Positives,

  • \(TN\) = True Negatives,

  • \(FP\) = False Positives,

  • \(FN\) = False Negatives.

Equation (10) represents a balanced statistical measure that quantifies the accuracy of the model by incorporating all confusion matrix elements.
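Equation (10) can be evaluated directly from the four confusion-matrix counts. The sketch below uses hypothetical counts and returns 0 when the denominator vanishes, which is a common convention rather than part of the equation itself; the function name `mcc` is an assumption.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; lies in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts.
print(round(mcc(tp=90, tn=85, fp=15, fn=10), 4))  # 0.7509
print(mcc(tp=50, tn=50, fp=0, fn=0))              # 1.0  (perfect prediction)
print(mcc(tp=0, tn=0, fp=50, fn=50))              # -1.0 (fully inverted prediction)
```

The two extreme cases show the range of the coefficient: values near 1 indicate strong agreement, values near 0 indicate chance-level prediction, and negative values indicate an inverse relation.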

Fig. 15 Image representation of Matthews Correlation Coefficient.

The resulting Matthews Correlation Coefficient is 0.962000. MCC ranges from -1 to 1, where 1 denotes perfect prediction, 0 is no better than random, and negative values indicate an inverse relation between predicted and actual labels. A value of 0.962 therefore represents a strong positive correlation between predicted and actual values and indicates that the model classifies honeycomb defects with high accuracy, with few errors attributable to class imbalance or misclassification. Although the value is high, the remaining gap to 1 suggests that optimization of thresholds, feature selection, or data processing could further improve predictive accuracy. Fig. 15 shows the MCC plot for this evaluation.

Dice similarity coefficient

Dice Similarity Coefficient (DSC)64 is a similarity measurement between two sets, commonly used in image segmentation applications. It is widely used for measuring the accuracy of segmentation models by evaluating the similarity between predicted regions and ground truth. For defect detection work such as honeycomb detection in concrete structures, DSC is an important metric to measure model accuracy.

The DSC \({\mathcal {D}}\) is computed as:

$$\begin{aligned} {\mathcal {D}} = \frac{2 \times |X \cap Y|}{|X| + |Y|} \end{aligned}$$

(11)

Where:

  • \(X\) = Set of predicted positive pixels,

  • \(Y\) = Set of ground truth positive pixels,

  • \(|X \cap Y|\) = Number of overlapping pixels.

Equation (11) captures the overlap between predicted and true regions, with higher values reflecting better model performance.
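Following the set notation of Equation (11), the sketch below computes the DSC from two sets of pixel coordinates. The function name `dice`, the toy pixel sets, and the convention of returning 1.0 for two empty sets are illustrative assumptions.

```python
def dice(pred_pixels, truth_pixels):
    """Dice Similarity Coefficient between two sets of (row, col) pixel coordinates."""
    x, y = set(pred_pixels), set(truth_pixels)
    if not x and not y:
        return 1.0  # both empty: treated here as perfect agreement
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical regions: each has 6 pixels, 4 of which are shared.
pred_pixels = {(r, c) for r in range(2) for c in range(0, 3)}
truth_pixels = {(r, c) for r in range(2) for c in range(1, 4)}
print(round(dice(pred_pixels, truth_pixels), 4))  # 0.6667
```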

Fig. 16 Image representation of dice similarity coefficient.

The DSC value obtained is 0.921000, showing a strong overlap between the predicted and actual defective areas. With a DSC value near 1, the model delineates defect regions well and leaves little room for false positives or false negatives. The model produces high-quality segmentation and detects honeycomb defects in concrete consistently and precisely. This is significant in application scenarios where precise defect location is needed for safety analysis and maintenance. The high score also reflects the reliability of the model in detecting defect areas and minimizing detection faults. Fig. 16 shows the Dice Similarity Coefficient plot, further illustrating the quality of the model’s segmentation.


