UAV imagery and crack characteristics
Based on their morphology and propagation trends, cracks in bridge pavements can be classified into four types: transverse, longitudinal, block, and grid cracks. All four types share several common characteristics.
-
(1)
They propagate in irregular and unpredictable directions.
-
(2)
The crack width remains relatively uniform over short distances in the longitudinal direction.
-
(3)
Within the crack area, the optical reflectivity (or pixel value) is consistently lower than in the surrounding areas.
Owing to these properties, a crack divided into sufficiently small sections can be approximated as a series of nearly uniform rectangular strips. Plotting the grayscale values across such a section yields the characteristic U-shaped curve illustrated in Fig. 2, whose nadir corresponds to the lowest grayscale value. By exploiting these properties, a function template that closely fits the crack profile can be constructed and used to extract the image regions most strongly correlated with it, thereby identifying cracks. The skeleton of the segmented image is then extracted, from which the crack length and the degree of damage in the area can be estimated.

Gray value curve of crack section.
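To illustrate how a U-shaped profile arises, the following Python sketch builds a synthetic patch in which a short crack segment runs horizontally, averages each row along the crack direction (which the uniform local width permits), and locates the nadir of the resulting profile. The patch size, noise level, and the function name `cross_section_profile` are illustrative assumptions, not part of the original method.

```python
import numpy as np

def cross_section_profile(patch: np.ndarray) -> np.ndarray:
    """Average a crack strip along its length (axis 1) to get its cross-section.

    Assumes the crack runs roughly parallel to the rows of `patch`, so each
    row corresponds to one distance from the crack axis (illustrative helper).
    """
    return patch.astype(float).mean(axis=1)

# Hypothetical 15 x 20 patch: bright background with a dark band centred on row 7.
rng = np.random.default_rng(0)
rows = np.arange(15)[:, None]
clean = 180.0 - 100.0 * np.exp(-((rows - 7) ** 2) / 2.0)   # dip of depth 100
patch = clean + rng.normal(0.0, 5.0, size=(15, 20))         # additive noise

profile = cross_section_profile(patch)
nadir = int(np.argmin(profile))   # row with the lowest mean grayscale value
print(nadir, profile.round(1))
```

Averaging along the crack suppresses pixel noise while preserving the cross-sectional shape, which is why the short-segment assumption matters.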
Design of enhanced matched filter algorithm
The grayscale intensity within crack regions is typically lower than that of the surrounding areas. Consequently, crack locations can be detected by identifying morphological features or geometric shapes4. Traditional digital image processing methods that detect cracks from grayscale variations can be broadly divided into two groups according to the order of the derivative they compute:
-
(1)
First-derivative-based methods: These approximate the first derivative of the image intensity, and locations where the derivative reaches an extremum correspond to crack edges. Examples include the Roberts, Sobel5, Prewitt6, and Canny8 operators, as well as wavelet-based detection techniques.
-
(2)
Second-derivative-based methods: These focus on identifying zero-crossing points of the second derivative, interpreted as edges. Representative approaches include Laplacian of Gaussian (LoG) detection7 and zero-crossing detection, which exhibit heightened sensitivity to subtle curvature changes in edges.
The choice between these methods depends on application-specific requirements and desired edge-detection precision.
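The difference between the two families can be seen on a one-dimensional cross-section. This sketch uses NumPy's `np.gradient` as a simple stand-in for the two-dimensional operators named above (it is not Sobel or LoG itself); the synthetic profile (background 200, Gaussian dip of depth 100 and width σ = 2 pixels at x = 20) is an assumption chosen for illustration.

```python
import numpy as np

# Synthetic cross-section: background 200 with a Gaussian dark dip centred at x = 20.
x = np.arange(41)
profile = 200.0 - 100.0 * np.exp(-((x - 20) ** 2) / (2 * 2.0 ** 2))

# First derivative (central differences): its extrema mark the two crack edges.
d1 = np.gradient(profile)
left_edge, right_edge = int(np.argmin(d1)), int(np.argmax(d1))

# Second derivative: a strict sign change between neighbours is a zero crossing.
d2 = np.gradient(d1)
zero_crossings = np.where(d2[:-1] * d2[1:] < 0)[0] + 0.5   # midpoint between samples

print(left_edge, right_edge)   # 18 22
print(zero_crossings)
```

Both families locate the edges at roughly x = 20 ± σ; the first-derivative extrema fall on samples, while the zero crossings fall between them.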
The matched filter is a robust technique for detecting a known signal in a noisy background. By correlating (convolving) the input signal with a predefined template, it maximizes the output signal-to-noise ratio (SNR) for additive white noise and thus improves detection accuracy. It is widely applied in radar, communications, and biomedical signal processing, particularly for extracting weak signals from noise. In image processing, its core principle is to design a customized filter that captures the attributes of the target, so that image regions strongly correlated with the filter can be precisely localized. Historically, O’Gorman et al.9 pioneered cosine-based filters for fingerprint detection by exploiting ridge patterns, and Chaudhuri et al.10 advanced the field with Gaussian filters for retinal vessel detection, achieving notable results. More recently, Zhang et al.11 extended the technique to pavement crack detection.
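A one-dimensional example shows the matched-filter principle: correlating a noisy observation with the expected signal shape produces a peak where the signal is located. In this sketch the template is a dark Gaussian dip (σ = 3 samples), the noise level, seed, and embedding position are arbitrary assumptions, and NumPy's `np.correlate` performs the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Known signal shape: a dark Gaussian dip, similar to a crack cross-section.
t = np.arange(-10, 11)
dip = -np.exp(-(t ** 2) / (2 * 3.0 ** 2))

# Noisy observation with the dip embedded at a position unknown to the detector.
true_pos = 120
observed = rng.normal(0.0, 0.2, size=200)
observed[true_pos:true_pos + dip.size] += dip

# Matched filter: correlate the observation with the template itself.
# Output index k scores the match between the template and observed[k:k + 21].
response = np.correlate(observed, dip, mode="valid")
estimated_pos = int(np.argmax(response))
print(estimated_pos)
```

The peak height equals the template energy (about 5.3 here), while the filtered noise standard deviation is only about 0.46, which is the SNR gain the paragraph refers to.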
Building on these foundations, this study enhances the matched filter algorithm for crack detection (the methodological flowchart is shown in Fig. 3).
The grayscale values across a crack roughly follow an inverted Gaussian curve. Therefore, convolving the crack region with a Gaussian function to fit its grayscale values is also referred to as Gaussian matched filtering. The crack area f (x, y) in the image was modeled with a Gaussian function as follows:
$$f(x,y)=A\left[1-k\exp\left(-\frac{d^{2}}{2\sigma^{2}}\right)\right]$$
(1)
where f (x, y) is the grayscale intensity of the image at point (x, y), A is the local background intensity, k is the reflectivity of the measured object, d is the distance between point (x, y) and the line segment passing through the center of the object, and σ controls the spread of the intensity profile.
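Eq. (1) can be evaluated directly to check its U shape. The sketch below is in Python; the parameter values (A = 200, k = 0.5, σ = 1.5) are illustrative assumptions. The helper `half_depth_width` is an addition, not from the original: it uses the fact that the dip falls to half its depth at d = σ√(2 ln 2), so its full width at half depth is 2σ√(2 ln 2) ≈ 2.355σ, which relates σ to a visible crack width.

```python
import numpy as np

def crack_profile(d, A=200.0, k=0.5, sigma=1.5):
    """Eq. (1): grayscale intensity at distance d from the crack centre line.

    A: local background intensity; k: reflectivity term (depth of the dip);
    sigma: spread of the dip. Default values are hypothetical.
    """
    d = np.asarray(d, dtype=float)
    return A * (1.0 - k * np.exp(-d ** 2 / (2.0 * sigma ** 2)))

def half_depth_width(sigma):
    """Full width of the Gaussian dip at half its depth: 2 * sigma * sqrt(2 ln 2)."""
    return 2.0 * sigma * np.sqrt(2.0 * np.log(2.0))

d = np.linspace(-6, 6, 13)
print(crack_profile(d).round(1))   # U-shaped, minimum A * (1 - k) at d = 0
print(half_depth_width(1.5))       # about 3.53 pixels
```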
The designed optimal filter should have the same grayscale profile as the crack area:
$$h_{\mathrm{opt}}=-\exp\left(-\frac{d^{2}}{2\sigma^{2}}\right).$$
(2)
where hopt is the optimal filter function.
Because the crack area is assumed to be divided into small segments, each segment is approximated by a small rectangular strip, that is, locally by a straight line segment of length L. The corresponding convolution kernel is designed as
$$K(x,y)=-\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right),\quad |y|\le \frac{L}{2}$$
(3)

Enhanced matched filter algorithm’s flow-process diagram.
where L is the length of the line segment, x is the coordinate perpendicular to the segment, and y is the coordinate along it. To match line segments in different directions, kernel K (x, y) must be rotated accordingly. The relationship between the points of the rotated kernel and those of the horizontal kernel is given by the following equation:
$$p_i=p\begin{bmatrix}\cos\theta_i & -\sin\theta_i\\ \sin\theta_i & \cos\theta_i\end{bmatrix}^{\mathrm{T}}$$
(4)
where pi is the position of a point in the kernel rotated by the i-th angle θi, p is the corresponding point in the horizontal kernel, and T denotes the matrix transpose.
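Eq. (4) treats each point as a row vector, so the rotation is a right-multiplication by the transposed rotation matrix; this equals the usual counter-clockwise rotation R·pᵀ. The following Python sketch applies Eq. (4) and its inverse (pi R recovers p, since R is orthogonal); the helper names and the 30° example angle are assumptions for illustration.

```python
import numpy as np

def rotation_matrix(theta):
    """2-D rotation matrix R(theta) appearing in Eq. (4)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rotate_points(p, theta):
    """Eq. (4): p_i = p R^T, with each point stored as a row vector (x, y)."""
    return np.asarray(p, dtype=float) @ rotation_matrix(theta).T

# Points on the axis of the reference kernel (x = 0, along y), rotated by 30 degrees.
p = np.array([[0.0, -2.0], [0.0, 0.0], [0.0, 2.0]])
p_i = rotate_points(p, np.deg2rad(30))

# Inverse mapping p = p_i R (R is orthogonal) recovers the reference coordinates.
p_back = p_i @ rotation_matrix(np.deg2rad(30))
print(p_i.round(3))
```

The inverse form is the one needed in practice: to fill a rotated kernel, each grid point is mapped back into the horizontal kernel's coordinates and evaluated there.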
Because the two tails of the Gaussian curve extend to infinity, the curve is truncated at x = ±3σ for computational convenience, and the neighborhood N = {(x, y) | |x| ≤ 3σ, |y| ≤ L/2} is used. Therefore, the i-th kernel is given by the following equation:
$$K_i(x,y)=-\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right),\quad \forall\, p_i\in N$$
(5)
An additive white Gaussian noise model was used to describe the noise. The kernel must have zero mean so that uniform background regions produce no response; the i-th kernel function is therefore
$$K_i'(x,y)=K_i(x,y)-m_i,\quad \forall\, p_i\in N$$
(6)
where mi is the mean value of kernel Ki (x, y) over the neighborhood N.
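Eqs. (3) to (6) together define a bank of zero-mean directional kernels. The Python sketch below builds one kernel per direction by mapping each grid offset back into the unrotated kernel (the inverse of Eq. (4)), truncating to N, and subtracting the mean over N. The values σ = 1.5, L = 9, and twelve directions at 15° steps are assumptions; the text does not specify them.

```python
import numpy as np

def matched_filter_kernel(theta, sigma=1.5, length=9.0):
    """Zero-mean Gaussian matched-filter kernel for direction theta, Eqs. (3)-(6).

    For each grid offset q = (x, y) (x: column, y: row), p = q R gives its
    coordinates in the unrotated kernel: u across the segment and v along it.
    Points outside N = {|u| <= 3 sigma, |v| <= L/2} are set to 0.
    """
    half = int(np.ceil(np.hypot(3 * sigma, length / 2)))  # fits every rotation
    r = np.arange(-half, half + 1)
    y, x = np.meshgrid(r, r, indexing="ij")
    u = x * np.cos(theta) + y * np.sin(theta)     # perpendicular to the segment
    v = -x * np.sin(theta) + y * np.cos(theta)    # along the segment
    in_n = (np.abs(u) <= 3 * sigma) & (np.abs(v) <= length / 2)
    k = np.where(in_n, -np.exp(-u ** 2 / (2 * sigma ** 2)), 0.0)   # Eq. (5)
    m_i = k[in_n].mean()                                          # mean over N
    return np.where(in_n, k - m_i, 0.0)                           # Eq. (6)

# Twelve directions at 15-degree steps (an assumed discretisation).
bank = [matched_filter_kernel(np.pi * i / 12) for i in range(12)]
print(len(bank), bank[0].shape, abs(bank[0].sum()) < 1e-9)
```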
Crack information in multiple directions was extracted by convolving the image with the full set of directional kernels, and at each pixel the maximum response over all directions was retained as the initial crack identification. A connected-domain-based threshold was then applied to filter these results and obtain the crack detection output. When image quality was adequate, this step already yielded the final result. For images containing speckle noise, connected-domain denoising was additionally employed: segmented regions whose area fell below the threshold were discarded as noise. This denoising step provided supplementary robustness on top of the initial screening.
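The detection pipeline described above can be sketched in Python with SciPy's `ndimage.convolve` and `ndimage.label`. Several choices here are assumptions rather than the paper's settings: the candidate threshold is taken as a fraction (`rel_threshold`) of the maximum response, the minimum component area (`min_area`) is 100 pixels, components use 4-connectivity, and the test image (a horizontal crack plus a 3 × 3 dark speckle) is synthetic.

```python
import numpy as np
from scipy import ndimage

def directional_kernel(theta, sigma=1.5, length=9.0):
    """Zero-mean Gaussian matched-filter kernel for direction theta, Eqs. (3)-(6)."""
    half = int(np.ceil(np.hypot(3 * sigma, length / 2)))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)      # across the segment
    v = -x * np.sin(theta) + y * np.cos(theta)     # along the segment
    in_n = (np.abs(u) <= 3 * sigma) & (np.abs(v) <= length / 2)
    k = np.where(in_n, -np.exp(-u ** 2 / (2 * sigma ** 2)), 0.0)
    return np.where(in_n, k - k[in_n].mean(), 0.0)

def detect_cracks(image, n_angles=12, rel_threshold=0.3, min_area=100):
    """Max response over directions, thresholding, then connected-domain denoising."""
    img = np.asarray(image, dtype=float)
    responses = [ndimage.convolve(img, directional_kernel(np.pi * i / n_angles),
                                  mode="nearest") for i in range(n_angles)]
    response = np.max(responses, axis=0)                 # best direction per pixel
    candidates = response > rel_threshold * response.max()
    labels, _ = ndimage.label(candidates)                # 4-connected components
    areas = np.bincount(labels.ravel())
    keep = areas >= min_area
    keep[0] = False                                      # label 0 is background
    return candidates, keep[labels]

# Hypothetical 64 x 64 image: a horizontal crack on row 32 plus a 3 x 3 dark speckle.
rows = np.arange(64)[:, None]
img = np.full((64, 64), 200.0)
img[:, 5:59] = 200.0 * (1 - 0.5 * np.exp(-((rows - 32) ** 2) / (2 * 1.5 ** 2)))
img[9:12, 9:12] = 100.0
candidates, cracks = detect_cracks(img)
# The speckle passes the response threshold but is removed as a small component.
print(candidates[10, 10], cracks[10, 10], cracks[32, 30])
```

The speckle shows why the area filter is useful: a compact dark blob also correlates with the kernels, but its detected region stays much smaller than that of an elongated crack.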
The matched filter algorithm employing direction-specific kernels for multi-orientation feature extraction offers advantages over gradient-based methods, including higher detection accuracy and reduced sensitivity to noise. However, its dual-loop computational structure suffers from low operational efficiency. To enhance processing speed while preserving the algorithmic framework and detection precision, this study implements the following improvements:
-
(1)
Vectorization of Gaussian-matched filter kernels coupled with coordinate transformation via broadcasting mechanisms, enabling batch computation to reduce loop iterations;
-
(2)
Parallel processing integration, treating convolution operations of directional filters as independent tasks to leverage multi-core CPU capabilities through parallel computing frameworks;
-
(3)
Dynamic path generation with result caching mechanisms to improve file handling flexibility and manageability.
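Improvements (1) and (2) can be illustrated with NumPy broadcasting and Python's standard `concurrent.futures`. In this sketch, all directional kernels are generated in one broadcast computation with shape (n_angles, S, S) instead of a loop. Each convolution is then submitted as an independent task to a thread pool. `functools.lru_cache` stands in for caching by reusing the kernel bank across calls; the paper's caching concerns file handling, which is not reproduced here. Whether threads give a real speedup depends on whether the installed SciPy releases the GIL inside the convolution; that is an assumption, and a `ProcessPoolExecutor` is an alternative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

import numpy as np
from scipy import ndimage

@lru_cache(maxsize=8)
def kernel_bank(sigma=1.5, length=9.0, n_angles=12):
    """All directional kernels at once, shape (n_angles, S, S), via broadcasting."""
    half = int(np.ceil(np.hypot(3 * sigma, length / 2)))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]           # (S, S) offsets
    theta = np.pi * np.arange(n_angles) / n_angles
    c = np.cos(theta)[:, None, None]                           # (n, 1, 1)
    s = np.sin(theta)[:, None, None]
    u = x * c + y * s                                          # (n, S, S)
    v = -x * s + y * c
    in_n = (np.abs(u) <= 3 * sigma) & (np.abs(v) <= length / 2)
    k = np.where(in_n, -np.exp(-u ** 2 / (2 * sigma ** 2)), 0.0)
    m = k.sum(axis=(1, 2)) / in_n.sum(axis=(1, 2))             # per-kernel mean over N
    bank = np.where(in_n, k - m[:, None, None], 0.0)
    bank.flags.writeable = False    # the cached array is shared between callers
    return bank

def max_response(image, workers=4, **kernel_args):
    """Convolve with each directional kernel as an independent task; keep the max."""
    img = np.asarray(image, dtype=float)
    bank = kernel_bank(**kernel_args)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        responses = list(pool.map(
            lambda kern: ndimage.convolve(img, kern, mode="nearest"), bank))
    return np.max(responses, axis=0)

rng = np.random.default_rng(0)
img = 200.0 + rng.normal(0.0, 5.0, size=(48, 48))   # hypothetical test image
print(max_response(img).shape)
```

The parallel path must give exactly the same output as a serial loop over the kernels; only the scheduling changes.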
Crack skeleton extraction and length calculation
Crack skeleton extraction simplifies a segmented crack image into a skeleton one pixel wide. The crack length can then be obtained from the number of skeleton pixels and the actual length represented by each pixel, and the degree of crack damage within the area can be estimated from the crack length per unit area.

Principles of crack imaging.
Currently, the primary skeleton extraction methods include morphological thinning, distance transformation, and medial axis transformation. Morphological thinning was adopted in this study. Thinning relies on morphological erosion, an image processing technique based on set operations whose core idea is to slide a structuring element (usually a small, predefined shape, such as a circle or square) over the image and compare it with the underlying pixels.
If the structuring element fits entirely within the object at a given position, the pixel is preserved; otherwise, it is removed. In this manner, the boundary of the object shrinks inward, and objects smaller than the structuring element are removed completely. The crack skeleton was obtained by thinning the extracted and segmented crack images with this morphological method.
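The text does not name a specific thinning algorithm, so the Python sketch below substitutes the classic Zhang–Suen parallel thinning algorithm as one common morphological thinning method. Each pass removes boundary pixels that are not endpoints and whose removal keeps the shape connected, until the mask is one pixel wide. The synthetic 5-pixel-wide bar is an assumption for illustration; the returned pixel count corresponds to p in Eq. (7).

```python
import numpy as np

def zhang_suen_thinning(mask):
    """Thin a binary mask to a one-pixel-wide skeleton (Zhang-Suen algorithm)."""
    img = np.pad(np.asarray(mask, dtype=np.uint8), 1)   # zero border simplifies slicing
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            c = img[1:-1, 1:-1]
            # Neighbours P2..P9, clockwise starting from the pixel above.
            p2, p3, p4 = img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:]
            p5, p6, p7 = img[2:, 2:], img[2:, 1:-1], img[2:, :-2]
            p8, p9 = img[1:-1, :-2], img[:-2, :-2]
            nbrs = [p2, p3, p4, p5, p6, p7, p8, p9]
            b = sum(n.astype(int) for n in nbrs)          # number of object neighbours
            ring = nbrs + [p2]
            a = sum(((ring[j] == 0) & (ring[j + 1] == 1)).astype(int)
                    for j in range(8))                    # 0 -> 1 transitions around
            if step == 0:
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            remove = (c == 1) & (b >= 2) & (b <= 6) & (a == 1) & cond
            if remove.any():
                c[remove] = 0          # c is a view, so this edits img in place
                changed = True
    return img[1:-1, 1:-1].astype(bool)

# Hypothetical segmented crack: a 5-pixel-wide horizontal bar.
mask = np.zeros((20, 40), dtype=bool)
mask[8:13, 5:35] = True
skeleton = zhang_suen_thinning(mask)
p = int(skeleton.sum())   # skeleton pixel count, the p of Eq. (7)
print(p)
```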
The principle of crack imaging is illustrated in Fig. 4. The camera follows the pinhole imaging principle, in which light passing through the aperture forms a real image on the image plane. This real image is a projection of the object (crack) that is inverted both vertically and horizontally relative to the actual object. The distances from the object to the lens and from the lens to the image, together with the actual size of the object and its size in the image, form similar triangles. Therefore, given the camera's focal length f, the shooting distance d, the pixel size c, and the number of combined pixels x, the real-world length k represented by each pixel can be determined, and the actual crack length l follows from the pixel count p of the crack skeleton through simple proportional mapping.
The specific calculation method is as follows:
$$l=\frac{d\times p\times c\times x}{f}$$
(7)
$$k=\frac{d\times c\times x}{f}$$
(8)
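Eqs. (7) and (8) reduce to l = p × k. The Python sketch below evaluates them with hypothetical camera values in SI units (d = 10 m, c = 4 µm, x = 2, f = 16 mm, giving k = 5 mm per pixel). The helper `crack_density` illustrates crack length per unit area. It assumes the area is the image footprint (width × k by height × k), and its 4000 × 3000 image size is also an assumption.

```python
def pixel_ground_size(d, c, x, f):
    """Eq. (8): real-world length k covered by one pixel (same length unit as d)."""
    return d * c * x / f

def crack_length(p, d, c, x, f):
    """Eq. (7): l = d * p * c * x / f, which equals p * k."""
    return p * pixel_ground_size(d, c, x, f)

def crack_density(length, width_px, height_px, k):
    """Crack length per unit area of the imaged footprint (hypothetical helper)."""
    return length / ((width_px * k) * (height_px * k))

# Hypothetical values (SI units): d = 10 m, c = 4 um, x = 2 combined pixels, f = 16 mm.
k = pixel_ground_size(d=10.0, c=4e-6, x=2, f=0.016)          # about 0.005 m per pixel
l = crack_length(p=200, d=10.0, c=4e-6, x=2, f=0.016)         # about 1.0 m for 200 pixels
rho = crack_density(l, width_px=4000, height_px=3000, k=k)    # metres of crack per m^2
print(k, l, rho)
```

Counting skeleton pixels treats a diagonal step as one pixel length. Weighting diagonal links by √2 is a common refinement, but it is not part of Eq. (7).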
3D modeling reconstruction of cracked bridges
In bridge structures, different components experience varying stress states. For instance, in reinforced concrete box-girder bridges, the bottom flanges near mid-span typically undergo tension, while the top slabs endure compression. Cracks in bottom tension zones may indicate rebar yielding or fatigue, posing substantial safety risks. Conversely, surface cracks on top slabs, often caused by shrinkage or temperature variations, may be less critical for load-bearing capacity. Therefore, identifying crack locations is essential for assessing structural severity.
3D reconstruction technology uses field data collected by UAVs to reconstruct 3D models of actual scenes24. According to the type of data used, it can be categorized into point cloud reverse reconstruction, photo reverse reconstruction, and 3D scanning reverse reconstruction36. By integrating crack detection results with UAV flight and imaging parameters, each crack can be mapped to its actual spatial location, allowing engineers to assess both the existence and the functional impact of damage.
This spatially resolved modeling enables damage tracking over time, supports cross-temporal comparisons, and helps prioritize maintenance based on crack severity and structural relevance. While close-up images ensure fine-grained detection, 3D reconstruction consolidates these results into a comprehensive damage assessment framework. Notably, Kim et al.28,29 integrated damage detection results by projecting crack shapes onto 3D models, achieving effective visualization. Displaying the spatial locations of cracks through 3D reconstruction is therefore important for evaluating the performance of bridge structures. The overall process of data acquisition and modeling is illustrated in Fig. 5.

Data collection and modeling flowchart of 3D reconstruction.
-
(1)
Data collection: An oblique photography system with multiple sensors mounted on the same flight platform collects images from five angles (vertical, forward, backward, left, and right) and also records the GPS positions and shooting angles.
-
(2)
Preprocessing: The acquired image data undergo preprocessing, which includes denoising, color balancing, and other operations, to enhance the image quality.
-
(3)
Camera calibration: To ensure that the captured photos or videos can accurately restore the three-dimensional information of objects, the camera must be calibrated. Calibration determines the camera's internal (intrinsic) and external (extrinsic) parameters: the external parameters give the transformation between the world coordinate system and the camera coordinate system, and the internal parameters map camera coordinates to image pixels.
-
(4)
Image matching and orientation: Algorithms, such as feature point detection and descriptor extraction, are used to find the corresponding relationships between photos from different angles to achieve image matching.
-
(5)
3D model reconstruction: Based on the on-site data combined with the internal and external parameters, the objects are modeled in 3D. Regional network (block) joint adjustment and multi-view image matching are used to construct a triangulated irregular network, from which an untextured three-dimensional model is generated. Finally, by integrating the 3D model with the real spatial information of the images, the surface texture is mapped onto the model automatically, producing a high-resolution, realistic 3D model of the real scene with natural textures.
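The calibration parameters in steps (3) to (5) are what allow a detected crack pixel to be placed in space. The Python sketch below uses the standard pinhole model x ~ K(R X + t) to project a world point to a pixel. It then back-projects the pixel using a known camera-frame depth. The intrinsic matrix K, the 30° rotation about the optical axis, and the 10 m stand-off are hypothetical values. For a surface facing the camera, a focal length of f/(c·x) pixels makes one pixel span d·c·x/f on the object, which is Eq. (8).

```python
import numpy as np

def project(X_w, K, R, t):
    """World point -> pixel (u, v) with the pinhole model x ~ K (R X_w + t)."""
    X_c = R @ X_w + t                 # extrinsics: world -> camera frame
    uvw = K @ X_c                     # intrinsics: camera frame -> image plane
    return uvw[:2] / uvw[2]

def back_project(uv, depth, K, R, t):
    """Pixel (u, v) with known camera-frame depth Z -> world point."""
    ray = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))   # normalised ray, Z = 1
    X_c = ray * depth
    return R.T @ (X_c - t)            # invert X_c = R X_w + t

# Hypothetical calibration: focal length 2000 px, principal point (2000, 1500).
K = np.array([[2000.0, 0.0, 2000.0], [0.0, 2000.0, 1500.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(30)                    # camera rotated 30 degrees about its optical axis
R = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 10.0])        # surface plane 10 m in front of the camera

crack_point = np.array([1.0, 2.0, 0.0])
uv = project(crack_point, K, R, t)
recovered = back_project(uv, depth=(R @ crack_point + t)[2], K=K, R=R, t=t)
k_from_K = 10.0 / K[0, 0]             # metres per pixel at 10 m, i.e. d*c*x/f
print(uv, recovered, k_from_K)
```

In a full photogrammetric pipeline, the depth would come from the reconstructed mesh rather than being known in advance.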
