Data acquisition and processing
The LPBF experiments were performed at Lawrence Livermore National Laboratory with variable laser power and speed, as presented in References 13, 14, and 15. The system used a 1070 nm continuous-wave 400 W Yb-fiber laser with a beam diameter of approximately 100 μm at the focal point. The material used in all studies was 316L stainless steel powder with a 50 μm layer thickness placed on 316L plates. Single patches measuring between \(1\times 5\) mm\(^2\) and \(2\times 5\) mm\(^2\) were melted with laser powers between 50 and 375 W and scan speeds between 100 and 400 mm/s at a 100 μm hatch spacing. At scan speeds between 100 and 400 mm/s, the beam travels between 1 and 4 mm over a 10 ms interval, which is the length of the acoustic windows used below. A microphone affixed to the build chamber, positioned approximately 25 cm from the center of the build plate, recorded the acoustic measurements. The distance from the microphone to the tracks was far greater than the size of the lasing tracks themselves, so the acoustic measurements were largely agnostic to the specific position or direction of the laser. The acoustic emissions of the build process were recorded with an acoustic transducer synchronized to the x and y coordinates of the laser, using an experimental design similar to that described in Reference 39 and the data registration scheme described in Reference 13. The schematics presented in Figs. 2 and 3 illustrate the data acquisition and co-registration process. Data were collected at a sampling rate of 100 kHz with an AC-coupled low-pass filter of 6 dB and a 10X gain via a Stanford Research Systems preamplifier. Post-build X-ray radiography was performed at beamline 8.3.2 of the Advanced Light Source at Lawrence Berkeley National Laboratory to identify the spatial locations of pores ex situ. The laser position was then used to synchronize the pores with the corresponding acoustic emission time histories.
An important limitation of this experimental dataset is that it contains only single-layer scans and cannot probe lack-of-fusion pore formation events.

A schematic of an experiment to collect acoustic emission data during the LPBF process to assess pore formation.

The spatial-temporal registration scheme involves two main steps. First, the pore location(s) identified in radiography images are registered to the corresponding laser-measured locations on an \(x\)-\(y\) coordinate grid (a). Second, the time(s) at which the measured x and y coordinates align with a registered pore are denoted as \(t^*\), representing the registration time(s) (b). The acoustic signal is then partitioned into a pore-affiliated segment based on the registered pore time (\(t^*\)) and a random offset. This segmentation allows for the isolation and analysis of the acoustic data associated with the registered pore(s).
The acoustic data were divided into 10 ms windows and labeled as either pore or non-pore based on the radiography images. The start of each pore-affiliated window was offset so that the pore appeared at a random position within the 10 ms window. An additional offset compensated for the time of flight of the acoustic signal, based on the speed of sound in argon gas and an assumed distance of 25 cm between the melt pool and the acoustic sensor. This window selection ensures that the pores do not always occur at the same position in their corresponding acoustic windows. The partitioning replicates an in situ monitoring scheme, in which data are likely to be partitioned into segments without knowing the location of a possible pore before ML analysis. The remaining acoustic emissions are labeled as non-pores.
Acoustic data alignment (or co-registration)40 with the pores is performed before the data are provided for ML model training. To account for the time-of-flight delay, we apply an offset when collecting the time series partitions affiliated with pores, based on the speed of sound in argon and an approximate distance of 25 cm between the acoustic source and the microphone (roughly a 0.77 ms delay). Table 1 reports the acoustic microphone calibration data. Section 2 of Tempelman et al.14 and the references therein provide more information on the acoustic emission data collection process for the LPBF experiments.
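The time-of-flight offset and the random window placement can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the registration code used in the study: the speed of sound in argon (323 m/s, an approximate room-temperature value), the function names, and the placeholder signal are all hypothetical, while the 100 kHz sampling rate, 10 ms window, and 25 cm distance come from the text.

```python
import random

FS_HZ = 100_000        # sampling rate stated in the text (100 kHz)
WINDOW_S = 0.010       # 10 ms window length stated in the text
DISTANCE_M = 0.25      # assumed melt-pool-to-microphone distance from the text
C_ARGON_M_S = 323.0    # ASSUMPTION: approximate speed of sound in argon near room temperature


def time_of_flight_s(distance_m=DISTANCE_M, speed_m_s=C_ARGON_M_S):
    """Acoustic travel time from the melt pool to the microphone."""
    return distance_m / speed_m_s


def pore_window(signal, t_star_s, rng, fs_hz=FS_HZ, window_s=WINDOW_S):
    """Cut one window that contains the pore event at a random position.

    Returns the window and the index of the pore event inside it.
    """
    n_win = round(window_s * fs_hz)
    # Sample index at which the sound emitted at t* reaches the microphone.
    arrival = round((t_star_s + time_of_flight_s()) * fs_hz)
    # Random offset so the pore can land anywhere inside the window.
    offset = rng.randrange(n_win)
    # Clamp so the window stays inside the recorded signal.
    start = min(max(arrival - offset, 0), len(signal) - n_win)
    return signal[start:start + n_win], arrival - start


tof = time_of_flight_s()
signal = [0.0] * 50_000  # hypothetical placeholder record (0.5 s of samples)
window, pore_index = pore_window(signal, t_star_s=0.2, rng=random.Random(0))
print(f"time of flight: {tof * 1e3:.2f} ms, about {tof * FS_HZ:.0f} samples")
print(f"window length: {len(window)}, pore at index {pore_index}")
```

With these assumed values the delay is close to the 0.77 ms quoted above, which corresponds to roughly 77 samples at 100 kHz.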
Data description
We used acoustic datasets from five different experimental trials to train and test ML models, each corresponding to a different substrate used in the LPBF machine. We also trained and tested on the combination of all five datasets, which we refer to as the combined dataset. Table 2 summarizes the training and testing data split and the pore/non-pore labels. Each sample is a 10 ms window containing a time series of 1000 points (10 ms at the 100 kHz sampling rate). The trained ML models are then used to classify the test data samples as either pore or non-pore. All five experimental acoustic datasets have different numbers of pores and non-pores and exhibit class imbalance. The non-pore-to-pore ratio of acoustic sets 1, 2, and 4 is greater than 3, while the ratio is less than 2 for acoustic sets 3 and 5.
Note that each set (i.e., 1, 2, \(\ldots\), 5) refers to an individual LPBF experiment, and acoustic emissions are collected separately for each set. As the collected data are imbalanced, stratification is performed to split the data into two sets: training (imbalanced) and testing (balanced). Stratified sampling offers a significant advantage in this context. Since the population density of pores (air pockets within the material) varies significantly across different regions within each experiment, stratified sampling ensures that the ML model receives a balanced representation of these variations. This, in turn, helps the CNNs trained on these data achieve more consistent accuracy across different regions of the LPBF experiments, as shown in Fig. 7 and Tables 3 and 4. Furthermore, combining the acoustic data from all five experiments into a single training set allows us to compare the predictive performance of the CNNs across different sets. This comparative analysis provides insight into the models’ generalizability in the LPBF process. Undersampling techniques can be employed alongside stratified sampling to address the class imbalance within the training set. Undersampling reduces the representation of the majority class to match the size of the minority class. The testing set size is chosen to ensure that it contains enough samples for a statistically robust evaluation of the CNNs’ generalizability under limited and imbalanced training data. By incorporating such a data preprocessing step, we aim to train the ML models on a balanced and representative dataset, leading to more robust and generalizable CNN predictions for real-world LPBF applications.
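As an illustration of the split described above, the following sketch draws a balanced test set (the same number of samples per class) and leaves the remaining, imbalanced samples for training, with optional undersampling of the majority class. The function names, the toy label counts, and the test-set size are hypothetical; the exact splitting procedure used in the study is not reproduced here.

```python
import random


def split_balanced_test(labels, n_test_per_class, seed=0):
    """Return (train_idx, test_idx) with equally many test samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        test_idx.extend(idxs[:n_test_per_class])    # balanced test portion
        train_idx.extend(idxs[n_test_per_class:])   # imbalanced remainder
    return sorted(train_idx), sorted(test_idx)


def undersample(labels, idxs, seed=0):
    """Randomly drop majority-class indices down to the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for i in idxs:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for v in by_class.values():
        kept.extend(rng.sample(v, n_min))
    return sorted(kept)


# Toy labels following the paper's convention: 1 = pore, 2 = non-pore.
labels = [1] * 10 + [2] * 30
train_idx, test_idx = split_balanced_test(labels, n_test_per_class=4)
balanced_train = undersample(labels, train_idx)
print(len(train_idx), len(test_idx), len(balanced_train))
```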
Data augmentation
Mixup data augmentation is a technique that enhances ML model performance by addressing class imbalance problems and promoting better generalization32,38,41,42. In the Mixup approach, given a training dataset with input and output, augmented data is created by randomly selecting two samples and assigning weights \(\lambda\) and \(1-\lambda\) to each instance, where \(\lambda\) is randomly drawn from a beta distribution38. The beta distribution is a continuous probability distribution defined on the interval [0, 1]. It is commonly used to model random variables representing proportions or probabilities43. The resulting input-outputs are combined using linear interpolation, generating weakly labeled data. Training on weakly labeled data introduces a regularization effect on the ML models, which helps prevent overfitting and improves the neural network’s ability to generalize to unseen data44,45. By reducing the reliance on exact (or true/strong) labels and allowing the model to learn from a mixture of samples, Mixup effectively mitigates the impact of class imbalance, where the minority class (in this case, pores) is underrepresented compared to the majority class (non-pores). As a result, Mixup data augmentation strikes a balance between fitting the minority class and the majority class. This avoids underfitting, leading to improved ML model performance. By creating augmented (or weakly labeled) samples that blend the characteristics of different instances, Mixup enables the ML model to capture a broader range of variations. Hence, trained ML models become more robust in detecting potential discrepancies in the collected experimental data. This technique has proven to be effective in various domains, including image classification and segmentation tasks, where the class imbalance is a common challenge32,38,41,42. The data augmented sample \((\hat{x}, \hat{y})\) can be represented by:
$$\begin{aligned} \hat{x}&= \lambda \times x_{i} + (1-\lambda ) \times x_{j}\nonumber \\ \hat{y}&= \lambda \times y_{i} + (1-\lambda ) \times y_{j} \end{aligned}$$
(1)
where \((x_i, y_i)\) and \((x_j, y_j)\) are strongly labeled samples, with \(x_i, x_j \in X\) and \(y_i, y_j \in Y\). X denotes the input time series, and Y denotes the target pore or non-pore labels. As we are working with a classification problem, we assign a value of 1 to \(y_i \in Y\) for a pore and 2 for a non-pore. The interpolated \(\hat{y}\) is rounded to the nearest integer. This process adds noise to the datasets, forcing the neural network to learn from noisy samples.
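Equation (1) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the beta-distribution shape parameter `ALPHA` and the function name `mixup_pair` are assumptions, and the behavior at an exact tie (\(\lambda = 0.5\)) is not specified in the text, so the sketch simply follows Python's built-in rounding.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Apply Eq. (1) to one pair of samples and round the mixed label."""
    x_hat = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = lam * y_i + (1.0 - lam) * y_j
    # Labels are 1 (pore) and 2 (non-pore), so y_mix lies in [1, 2].
    y_hat = round(y_mix)
    return x_hat, y_hat


ALPHA = 0.2  # ASSUMPTION: shape parameter of the beta distribution
rng = random.Random(0)
lam = rng.betavariate(ALPHA, ALPHA)  # lambda drawn from Beta(ALPHA, ALPHA)

pore = [1.0, 3.0]       # hypothetical two-point "time series"
non_pore = [3.0, 1.0]
x_hat, y_hat = mixup_pair(pore, 1, non_pore, 2, lam)
print(lam, x_hat, y_hat)
```

When \(\lambda\) is close to 1 the mixed sample stays close to \(x_i\) and keeps its label; when it is close to 0 the sample and label follow \(x_j\).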
We used the non-pore-to-pore ratios to set the number of combinations or ‘mixups’ between two samples in our class-imbalanced data. By reusing the uncommon event (in this case, pores) multiple times, we effectively increase the size of the training dataset. This Mixup-enabled data augmentation strategy improves ML model training by reducing the emphasis on a handful of identical rare events, increasing robustness when learning from corrupt pore/non-pore labels, and improving generalization when faced with adversarial examples. For example, if there are 16 non-pore and four pore samples, the non-pore-to-pore ratio is 4:1. As indicated in Fig. 4, four Mixup iterations are performed in this case. In the first iteration, augmented training data are constructed from combinations between the four pore acoustic events and the first four non-pore events. In iteration 2 (Fig. 4), the same four pore acoustic events are mixed with the next four non-pore events. This Mixup process continues for the next two iterations until an approximately 1:1 ratio is reached. For the combined dataset (Table 2), the original non-pore-to-pore ratio is approximately 2:1. Accordingly, we perform two iterations for this case, which balances the data. Figure 5 shows the pore and non-pore sample distribution for the combined dataset before and after this Mixup process. The ratio improves from 2:1 (1694 non-pore and 822 pore samples) to 1:1 (337855 non-pore and 337007 pore samples).
The data augmentation method described in the preceding paragraphs uses a one-to-one ratio between unique combinations and synthetically constructed samples (one augmented sample for each combination). To increase the sample size further, multiple \(\lambda\) values were used to generate more samples per combination. For example, two different \(\lambda\) values construct two augmented samples from one combination. This research analyzed 1–10 \(\lambda\) values for each acoustic set and 1–3 \(\lambda\) values for the combined dataset.
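The ratio-driven iteration scheme and the use of several \(\lambda\) values per combination can be sketched as follows. The pairing rule inside each block (the k-th pore with the k-th non-pore of the block), the beta parameter `alpha`, and the function name are assumptions made for illustration; the text does not fix how combinations are enumerated, so this should be read as one plausible reading of the 4:1 example rather than the study's procedure.

```python
import random


def mixup_iterations(pores, non_pores, n_lambdas=1, alpha=0.2, seed=0):
    """Mix the same pores with successive blocks of non-pores.

    The number of iterations equals the (integer) non-pore-to-pore ratio.
    """
    rng = random.Random(seed)
    n_pore = len(pores)
    n_iter = len(non_pores) // n_pore  # e.g. 16 // 4 = 4 iterations
    augmented = []
    for it in range(n_iter):
        block = non_pores[it * n_pore:(it + 1) * n_pore]
        # ASSUMPTION: one-to-one pairing of pores with the current block.
        for pore, non_pore in zip(pores, block):
            for _ in range(n_lambdas):  # several lambdas per combination
                lam = rng.betavariate(alpha, alpha)
                x_hat = [lam * a + (1 - lam) * b for a, b in zip(pore, non_pore)]
                y_hat = round(lam * 1 + (1 - lam) * 2)  # 1 = pore, 2 = non-pore
                augmented.append((x_hat, y_hat))
    return augmented


# Toy data matching the 4:1 example: four pores and 16 non-pores.
pores = [[float(k)] * 5 for k in range(4)]
non_pores = [[float(-k)] * 5 for k in range(16)]
one_lambda = mixup_iterations(pores, non_pores, n_lambdas=1)
two_lambdas = mixup_iterations(pores, non_pores, n_lambdas=2)
print(len(one_lambda), len(two_lambdas))
```

Under this pairing, one \(\lambda\) per combination yields as many mixed samples as there are non-pore samples, and each additional \(\lambda\) adds the same number again.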
Reusing pores many times increases the size of the training dataset, facilitating better training of ML models by placing less emphasis on a handful of identical pores. As a result, a trained ML model is less likely to memorize the identical uncommon events, which increases the robustness of neural networks when learning from mixed pore/non-pore labels. Moreover, such ML models generalize better when faced with adversarial examples.

This is a schematic of a non-pore:pore = 4:1 Mixup method, where an equal number of non-pore and pore samples are mixed to generate mixed-up data. This process is repeated until the pore data is mixed with all non-pore data, making the dimensions of non-pore and mixed-up data the same.

Distribution of pore and non-pore samples before and after data augmentation.
ML-model using CNN
We develop a CNN to learn a mapping function between raw acoustic time series and pore/non-pore labels using the original and augmented training data. The convolutional layers and associated kernels allow us to learn this mapping by extracting representative features from the raw acoustic time series. Max pooling is employed to condense the number of abstract features, which are finally connected to a dense layer. Max pooling calculates the maximum value in each patch of each feature map, so the convolutional layer outputs are downsampled and the pooled feature maps retain the most prominent feature in each patch. Dropout is used in the dense layers to reduce overfitting, and early stopping is used as an additional safeguard against overfitting. Mathematically, this CNN architecture for acoustic measurement classification can be described as:
$$\begin{aligned} \textrm{output}\left( N_i, C_{\textrm{output}_j}\right) =F\left( \sum _{k=1}^{C_{\textrm{input}}}\varvec{W} \left( C_{\textrm{output}_j},k\right) *\textrm{input}\left( N_i, k\right) +\varvec{b}\left( C_{\textrm{output}_j}\right) \right) \end{aligned}$$
(2)
where N is the batch size, \(C_{\textrm{input}}\) is the number of acoustic measurements in the 10 ms window, \(C_{\textrm{output}}\) is the number of pore and non-pore labels (i.e., 2), \(\varvec{b}\) is the bias, \(\varvec{W}\) is the weight, F is an activation function, and \(*\) is a valid cross-correlation operator. Specifically, within the context of our problem:
- \(\textrm{output}\left( N_i, C_{\textrm{output}_j}\right)\): Represents the output value at the i-th sample in the batch (\(N_i\)) for the j-th output channel (\(C_{\textrm{output}_j}\)). This indicates the network produces multiple outputs corresponding to different pore/non-pore classifications.
- F: Activation function that introduces non-linearity. Common choices include ReLU or sigmoid.
- \(\sum _{k=1}^{C_{\textrm{input}}}\): Summation over all input channels (k) from 1 to \(C_{\textrm{input}}\).
- \(\varvec{W}\left( C_{\textrm{output}_j},k\right)\): Weight matrix specific to the j-th output and k-th input channels. This captures how the network learns to combine features from different input acoustic emission measurements.
- \(*\): The valid cross-correlation operator. It performs a ‘sliding dot product’ between the filter (weights) and the input without flipping the filter, as is standard in CNN implementations. Because a valid operation uses no padding, the output is shorter than the input by the filter length minus one. Here, it extracts correlations within the 10 ms window of acoustic emission measurements.
- \(\textrm{input}\left( N_i, k\right)\): Represents the k-th acoustic measurement for the i-th sample in the batch.
- \(\varvec{b}\left( C_{\textrm{output}_j}\right)\): Bias term for the j-th output channel.
The above Eq. (2) calculates a single neuron’s activation in the CNN’s output layer. The network essentially learns weights (\(\varvec{W}\)) to combine different acoustic measurements (captured by the input channels) within a 10 ms window using valid cross-correlation. The activation function (F) then introduces non-linearity to create a more expressive pore/non-pore classification model.
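To make the operation in Eq. (2) concrete, the sketch below evaluates one output channel in plain Python: a valid cross-correlation per input channel, a sum over channels, a bias, and ReLU as the activation F. The function names and the toy numbers are illustrative assumptions and not the network's trained weights.

```python
def valid_cross_correlation(signal, kernel):
    """Sliding dot product with no padding and no kernel flip."""
    n_out = len(signal) - len(kernel) + 1  # output shrinks by len(kernel) - 1
    return [
        sum(kernel[m] * signal[n + m] for m in range(len(kernel)))
        for n in range(n_out)
    ]


def conv_unit(channels, kernels, bias):
    """Eq. (2) for one output channel: sum over input channels, add bias, apply ReLU."""
    per_channel = [valid_cross_correlation(x, w) for x, w in zip(channels, kernels)]
    summed = [sum(vals) + bias for vals in zip(*per_channel)]
    return [max(0.0, v) for v in summed]


# Toy example: one input channel with four samples and a length-3 kernel.
response = valid_cross_correlation([1, 2, 3, 4], [1, 0, -1])
activated = conv_unit([[1, 2, 3, 4]], [[1, 0, -1]], bias=3.0)
# A 1000-sample window with a length-3 kernel gives 1000 - 3 + 1 outputs.
n_out_1000 = len(valid_cross_correlation([0.0] * 1000, [1.0, 1.0, 1.0]))
print(response, activated, n_out_1000)
```

The last line shows why a valid operation shortens the feature map: no output is produced where the kernel would overhang the ends of the window.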
We minimize the sparse categorical cross-entropy loss function (\(\mathcal {L}\)) to find the best model:
$$\begin{aligned} \mathcal {L}(\varvec{y}, \varvec{\hat{y}}) =-\sum _{i=1}^{N}y_{i}\textrm{log}\left({ \hat{y}}_{i}\right) , y_i \in \varvec{y}, \hat{y}_{i} \in \varvec{\hat{y}} \end{aligned}$$
(3)
where \(\varvec{y}\) is the ground truth, and \(\varvec{\hat{y}}\) is the prediction. \(y_i\) is the true label (pore or non-pore) for the i-th sample. \(\hat{y}_{i}\) is the predicted probability that the i-th sample belongs to the pore class (the predicted probabilities over all classes sum to 1).
Equation (3) defines the sparse categorical cross-entropy loss function, commonly used for multi-class classification problems. It penalizes the network for making incorrect predictions, and training aims to minimize the overall loss. Here, the network strives to minimize the difference between the predicted pore class probabilities (\(\hat{y}_{i}\)) and the true labels (\(y_i\)).
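In the sparse form of this loss, each true label is an integer class index, and the loss adds up the negative log of the probability that the model assigns to that true class. The sketch below illustrates this with a sum over samples, as in Eq. (3); the 0-based class indices (0 = pore, 1 = non-pore), the clipping constant `eps`, and the example probabilities are assumptions for illustration, and deep learning libraries often report the mean rather than the sum.

```python
import math


def sparse_categorical_cross_entropy(true_classes, predicted_probs, eps=1e-12):
    """Sum over samples of -log(probability assigned to the true class)."""
    total = 0.0
    for label, probs in zip(true_classes, predicted_probs):
        # Clip to avoid log(0) when a probability is exactly zero.
        total -= math.log(max(probs[label], eps))
    return total


# ASSUMPTION: 0-based class indices, 0 = pore and 1 = non-pore.
true_classes = [0, 1]
predicted_probs = [[0.5, 0.5], [0.25, 0.75]]  # each row sums to 1
loss = sparse_categorical_cross_entropy(true_classes, predicted_probs)
confident_loss = sparse_categorical_cross_entropy([0], [[1.0, 0.0]])
print(loss, confident_loss)
```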
For the activation function, we used a rectified linear unit (ReLU). ReLU is described as:
$$\begin{aligned} f(x) = \max (0, x) \end{aligned}$$
(4)
where x is an input to a neuron.

Schematic overview of the CNN architecture employed in this study. We are training a CNN on acoustic emission data to identify whether a signal is a pore or non-pore. The acoustic emission signal is co-registered to the location of individual pores/non-pores.
Figure 6 depicts a schematic of the CNN used in this study, developed after studying similar architectures in the literature46,47,48. The input is a raw acoustic time series of size 1000 \(\times\) 1. Three convolutional and max-pooling layers perform feature extraction. The three convolutional layers use the ReLU activation function with 64, 32, and 16 filter sizes, respectively. After the training samples pass through all these layers, the extracted features are flattened to a 1-dimensional array. This 1D array is then connected to a dense layer with dropout, where the pore labels are predicted. The output of the fully connected layer is the pore or non-pore classification. Several ML frameworks, such as deep neural networks, convolutional neural networks, and recurrent neural networks, may be employed for pore predictions, and all of them are suitable for handling complex datasets. A CNN was chosen because its kernels allow us to extract better features than a fully connected dense neural network.
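The way the feature-map length shrinks through three convolution and max-pooling stages can be traced with simple arithmetic. The kernel length of 3, the pooling size of 2, and the reading of the final "16" as the number of filters in the last layer are assumptions made only for this sketch; the text does not state these hyperparameters, so the resulting numbers are illustrative rather than the study's actual layer sizes.

```python
def conv_valid_len(n, kernel_size):
    """Length after a valid (unpadded) 1D convolution."""
    return n - kernel_size + 1


def pool_len(n, pool_size):
    """Length after non-overlapping max pooling."""
    return n // pool_size


def feature_lengths(n_input=1000, n_layers=3, kernel_size=3, pool_size=2):
    """Feature-map length at the input and after each conv + pool stage."""
    lengths = [n_input]
    n = n_input
    for _ in range(n_layers):
        n = pool_len(conv_valid_len(n, kernel_size), pool_size)
        lengths.append(n)
    return lengths


lengths = feature_lengths()
# ASSUMPTION: the last convolutional layer has 16 filters (output channels).
flattened_size = lengths[-1] * 16
print(lengths, flattened_size)
```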
Evaluation metrics
Accuracy is a metric commonly used for evaluating classification tasks. However, caution must be exercised for class-imbalanced data such as those presented in Table 2, since accuracy may be artificially high when a model simply favors the majority class. For a more complete evaluation of the model, precision, recall, and F1 score are used to measure the CNN’s performance49:
$$\begin{aligned} \textrm{Precision}= & {} \frac{\textrm{TP}}{\textrm{TP} + \textrm{FP}} \end{aligned}$$
(5)
$$\begin{aligned} \textrm{Recall}= & {} \frac{\textrm{TP}}{\textrm{TP} + \textrm{FN}} \end{aligned}$$
(6)
$$\begin{aligned} \mathrm{F1\ Score}= & {} \frac{2 \times \textrm{Precision} \times \textrm{Recall}}{\textrm{Precision} +\textrm{Recall}} \end{aligned}$$
(7)
where TP, FP, and FN refer to the numbers of true positives, false positives, and false negatives, respectively. The F1 score is the harmonic mean of precision and recall and is commonly used to evaluate individual classes in an unbalanced dataset. In this study, a TP is an acoustic event correctly identified as a pore; an FP is an acoustic event incorrectly identified as a pore; and an FN is an acoustic event that is a pore but is incorrectly identified as a non-pore.
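Equations (5) to (7) can be evaluated directly from true and predicted labels. The sketch below treats the pore class (label 1) as the positive class, in line with the definitions above; the function name, the zero-division convention (returning 0.0), and the example labels are assumptions for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 score with `positive` as the pore class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # ASSUMPTION: return 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = pore, 2 = non-pore.
y_true = [1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [1, 1, 2, 1, 2, 2, 2, 2]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

In this toy case the accuracy is higher than the pore-class F1 score, which illustrates why accuracy alone can look optimistic on imbalanced data.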
