Incremental learning-integrated TL framework
Figure 6 illustrates the incremental learning-integrated TL framework using material transfer task 1 as an example. The workflow is identical for the other two TL tasks, namely material transfer task 2 and the laser power transfer task. The framework is divided into three stages: (a) incremental learning-integrated source domain pre-training, (b) target domain re-training, and (c) target domain test. In the first stage, source domain pre-training (Fig. 6a), an incremental learning approach is used. Incremental learning is an ML strategy that continuously updates model parameters as new data become available, rather than retraining from scratch29,30. This approach is particularly effective for sequential data. Mathematically, the parameters \(\theta\) of an ML model are iteratively adjusted to minimize the loss function \(L(\theta )\) as new data points arrive. For a given time step \(t\), the parameter update can be expressed as
$${\theta }_{t+1}={\theta }_{t}-\eta {\nabla }_{\theta }L({\theta }_{t},({x}_{j},{y}_{j}))$$
(1)
where \(\eta\) is the learning rate and \(({x}_{j},{y}_{j})\) is the newly arrived data point. With this update, the model refines itself with each new data point without reloading all prior data, which improves computational efficiency. Because pre-training proceeds in stages rather than being repeated from scratch each time data are added, the computational cost is reduced significantly. During this stage, the ML models (XGBoost, LSTM, TCN, and transformer) are sequentially pre-trained on the source domain data, one laser power level at a time in increasing order. Laser power is used to define these levels because it has the greatest influence on the magnitude of energy consumption, especially its peak values. For example, in material transfer task 1, as shown in Table 8, the source domain consists of CoCrMo samples fabricated at two laser power levels, resulting in two sequential steps of data input to the ML models. Moreover, after pre-training on the data with the minimum laser power level, mini-batch training with a reduced learning rate is used to facilitate stable fine-tuning.
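The staged update of Eq. (1) can be sketched with a deliberately simple model. The example below assumes a linear model with squared loss in place of XGBoost, LSTM, TCN, or transformer, synthetic data for two hypothetical laser power stages, and hypothetical learning rates (0.05 for the first stage, 0.01 afterwards). It only illustrates how parameters carry over between stages instead of being re-initialized; for XGBoost, the analogous step would be adding trees to an existing booster (for example via the `xgb_model` argument of `xgboost.train`).

```python
import numpy as np

def sgd_update(theta, x, y, lr):
    """One incremental step of Eq. (1) for a linear model.

    Loss L(theta, (x, y)) = 0.5 * (x @ theta - y)**2, so grad = (x @ theta - y) * x.
    """
    grad = (x @ theta - y) * x
    return theta - lr * grad

def incremental_pretrain(stages, theta0, lr_first, lr_later, epochs=5):
    """Pre-train stage by stage, e.g. one stage per laser power level.

    `stages` is a list of (X, y) arrays ordered from the lowest to the highest
    laser power level (hypothetical grouping). Parameters are carried over
    between stages, and a reduced learning rate is used after the first stage.
    """
    theta = theta0.copy()
    for k, (X, y) in enumerate(stages):
        lr = lr_first if k == 0 else lr_later
        for _ in range(epochs):
            for x_j, y_j in zip(X, y):  # data points are presented one at a time
                theta = sgd_update(theta, x_j, y_j, lr)
    return theta

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
stages = []
for scale in (1.0, 1.5):  # two synthetic stages standing in for two power levels
    X = scale * rng.normal(size=(50, 2))
    stages.append((X, X @ true_theta))

theta = incremental_pretrain(stages, np.zeros(2), lr_first=0.05, lr_later=0.01)
print(np.round(theta, 3))
```

Only the new stage's data are visited in each stage, which is what avoids reloading and re-processing all earlier data.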

Fig. 6: Incremental learning-integrated TL framework. (a) Incremental learning-integrated source domain pre-training, (b) target domain re-training, and (c) target domain test.
After pre-training in the source domain, the ML model parameters are transferred to a new model, which is then fine-tuned on the training dataset of the target domain (Fig. 6b). Initializing from the source domain parameters allows the target domain model to converge faster during fine-tuning and can improve its predictive performance. Finally, the fine-tuned models are evaluated on the test data of the target domain (Fig. 6c), where the actual and predicted energy consumption values are compared to assess the predictive performance of the incremental learning-integrated TL framework.
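The effect of transferring parameters can be illustrated with a minimal sketch, assuming a linear model, a hypothetical set of "pre-trained" source parameters, and a small synthetic target dataset whose underlying relationship is close to the source one. Both runs use the same few fine-tuning steps; only the starting point differs. In a deep learning framework, the warm start corresponds to copying the pre-trained weights into the target model before fine-tuning.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of a linear model y_hat = X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def fine_tune(theta_init, X, y, lr=0.1, steps=5):
    """A few full-batch gradient-descent steps on the MSE, starting from theta_init."""
    theta = np.array(theta_init, dtype=float)  # copy so the source parameters stay untouched
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
theta_source = np.array([2.0, -1.0])       # hypothetical parameters from source pre-training
theta_target_true = np.array([2.2, -0.9])  # hypothetical, related target-domain relationship
X_t = rng.normal(size=(20, 2))             # small target-domain training set
y_t = X_t @ theta_target_true

theta_transfer = fine_tune(theta_source, X_t, y_t)  # warm start from transferred parameters
theta_scratch = fine_tune(np.zeros(2), X_t, y_t)    # cold start for comparison
print(mse(theta_transfer, X_t, y_t), mse(theta_scratch, X_t, y_t))
```

With the same fine-tuning budget, the warm-started model ends closer to the target relationship because it starts closer to it; the size of this benefit in practice depends on how similar the source and target domains are.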
Machine learning models
In this study, four ML models with different levels of complexity were used to predict energy consumption: XGBoost, LSTM, TCN, and transformer. XGBoost is a non-parametric, tree-based approach that handles small to moderate amounts of data very efficiently. However, it has no inherent temporal structure and therefore relies on lag features to represent temporal information. LSTM and TCN are two deep learning models of intermediate complexity, both well known for capturing temporal dependencies in time series data. LSTM uses gated recurrence to learn short- and medium-range temporal dependencies in sequence data, but it usually requires longer training time and careful regularization to prevent overfitting. TCN uses causal, dilated convolutions to capture long-range temporal dependencies in sequence data; it can be trained ~20–30% faster per epoch than LSTM and exhibits more stable gradients. The transformer introduces the self-attention mechanism to adaptively focus on both local and global temporal contexts. However, its computational cost scales quadratically with sequence length, and it typically requires substantially longer training time and larger datasets.
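Because XGBoost has no built-in notion of order, temporal context has to be supplied as lag features. A minimal sketch of building them from an energy series is shown below; the function name and the number of lags are illustrative assumptions, and in practice the lag matrix would be concatenated with the process parameters before being passed to a regressor such as `xgboost.XGBRegressor`. The same sliding-window construction with a window of 10 produces the input sequences used for the sequence models.

```python
import numpy as np

def make_lag_features(series, n_lags):
    """Turn a 1-D series into (X, y) pairs.

    Each row of X holds `n_lags` consecutive past values, and the matching
    entry of y is the value that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    X = np.stack([series[i : i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

X, y = make_lag_features([1, 2, 3, 4, 5], n_lags=2)
print(X.tolist(), y.tolist())

energy = np.arange(25.0)                      # hypothetical energy series
X10, y10 = make_lag_features(energy, n_lags=10)  # 10-step windows
print(X10.shape, y10.shape)
```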
LSTM is a type of recurrent neural network (RNN) designed to handle sequential data and capable of learning long-term dependencies35. By introducing a gate-regulated memory cell into the RNN, LSTM is very effective at analyzing time series data. The cell structure of LSTM is illustrated in Fig. 7b. It contains a memory cell state \({c}^{t}\), a hidden state \({h}^{t}\), a candidate cell state \({\widetilde{C}}^{t}\), a forget gate \({f}^{t}\), an input gate \({i}^{t}\), and an output gate \({o}^{t}\). The symbols \(\otimes\) and \(\oplus\) represent pointwise multiplication and pointwise addition, respectively. Blocks marked with tanh and \(\sigma\) represent the hyperbolic tangent and sigmoid functions, respectively. Equations (2)–(7) are the basic equations for the forward propagation of a typical LSTM model with a forget gate. \({W}_{{fh}}\), \({W}_{{fx}}\), \({W}_{{ih}}\), \({W}_{{ix}}\), \({W}_{{ch}}\), \({W}_{{cx}}\), \({W}_{{oh}}\), and \({W}_{{ox}}\) are the weight matrices of the gates and the candidate cell state, and \({b}_{f}\), \({b}_{i}\), \({b}_{c}\), and \({b}_{o}\) are the corresponding bias terms. This cell structure allows the network to maintain its state over time; because the memory cell is updated additively, gradients can propagate across many time steps during backpropagation, so long-term dependencies can be captured. The LSTM model used in this study consists of two stacked LSTM layers with 64 and 32 units and uses a learning rate of 0.001. It uses the ReLU activation function in the dense layers and the mean squared error as the loss function. The input sequence length is set to 10, so each prediction incorporates information from the previous 10 time steps. The model is trained with a batch size of 32 for 30 epochs and then fine-tuned with a reduced batch size of 16 for 10 epochs.
$${\text{Forget}}\,{\text{gate}}\,{f}^{t}=\sigma ({W}_{{fh}}{h}^{t-1}+{W}_{{fx}}{x}^{t}+{b}_{f})$$
(2)
$${\text{Input}}\,{\text{gate}}\,{i}^{t}=\sigma ({W}_{{ih}}{h}^{t-1}+{W}_{{ix}}{x}^{t}+{b}_{i})$$
(3)
$${\text{Memory}}\,{\text{cell}}\,{\text{state}}\,{c}^{t}={f}^{t}\otimes {c}^{t-1}+{i}^{t}\otimes {\widetilde{C}}^{t}$$
(4)
$${\text{Candidate}}\,{\text{cell}}\,{\text{state}}\,{\widetilde{C}}^{t}={\text{tanh}}\,({W}_{{ch}}{h}^{t-1}+{W}_{{cx}}{x}^{t}+{b}_{c})$$
(5)
$${\text{Output}}\,{\text{gate}}\,{o}^{t}=\sigma ({W}_{{oh}}{h}^{t-1}+{W}_{{ox}}{x}^{t}+{b}_{o})$$
(6)
$${\text{Hidden}}\,{\text{state}}\,{h}^{t}={o}^{t}\otimes {\text{tanh}}\,({c}^{t})$$
(7)
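A single forward step of a standard LSTM cell with a forget gate can be sketched in NumPy as follows. The tiny layer sizes and random weights are hypothetical and chosen only for illustration; the study's two-layer model with 64 and 32 units would stack cells like this one (typically through a deep learning library) rather than use this code directly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell with a forget gate.

    W maps each gate name ("f", "i", "c", "o") to a (W_h, W_x) pair of weight
    matrices; b maps each gate name to its bias vector.
    """
    f = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x_t + b["f"])        # forget gate
    i = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x_t + b["i"])        # input gate
    c_tilde = np.tanh(W["c"][0] @ h_prev + W["c"][1] @ x_t + b["c"])  # candidate cell state
    c = f * c_prev + i * c_tilde                                      # memory cell state
    o = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x_t + b["o"])        # output gate
    h = o * np.tanh(c)                                                # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4  # hypothetical sizes
W = {g: (rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
         rng.normal(scale=0.1, size=(n_hidden, n_in))) for g in "fico"}
b = {g: np.zeros(n_hidden) for g in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(10, n_in)):  # a 10-step input window
    h, c = lstm_step(x_t, h, c, W, b)
print(h)
```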

Fig. 7: Diagrams of the (a) XGBoost, (b) LSTM, (c) TCN, and (d) transformer models.
TCN is a convolutional architecture that uses dilated causal convolutions to capture long-range temporal dependencies36. A key feature of TCN, as illustrated in Fig. 7c, is the use of residual blocks, which stabilize training and allow deeper architectures while mitigating vanishing gradients. A residual block in TCN consists of two layers of dilated causal convolutions, where each convolution is performed with a specific dilation factor \(d\). The dilation factor introduces gaps between filter elements; when \(d\) increases exponentially with depth (e.g., 1, 2, 4, ...), the receptive field expands exponentially without significantly increasing the number of parameters. This allows the model to capture long-range dependencies efficiently. The output of a dilated causal convolution for a given layer can be represented as
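$${y}_{t}=\sum _{i=0}^{k-1}f(i)\cdot {x}_{t-d\cdot i}$$
(8)
where \({y}_{t}\) is the output at time step \(t\), \(f(i)\) is the \(i\)-th weight of a filter of size \(k\), and \({x}_{t-d\cdot i}\) is the input \(d\cdot i\) time steps in the past; because only current and past inputs are used, the convolution is causal. In the transformer model (Fig. 7d), the self-attention mechanism computes scaled dot-product attention over the query \(Q\), key \(K\), and value \(V\) matrices as
$${\rm{Attention}}(Q,K,V)={\rm{softmax}}\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$
(9)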
where \({d}_{k}\) is the dimensionality of the key vectors, \(Q{K}^{T}\) gives the attention scores between each query and all keys, and scaling by \(\sqrt{{d}_{k}}\) helps stabilize gradients during training. In the multi-head attention mechanism, this attention operation is performed in parallel across multiple heads, whose outputs are then concatenated and linearly transformed. The transformer model used in this study is built with an embedding dimension of 64, four attention heads, and a feed-forward layer size of 128. The model employs a dropout rate of 0.1 to prevent overfitting and uses the mean squared error as the loss function. Training is performed with a batch size of 4 and a learning rate of 0.001 for 20 epochs, and the model is then fine-tuned with a batch size of 4 and a reduced learning rate of 0.0001 for 10 epochs.
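The two core operations behind the TCN and transformer models can be sketched in NumPy: a dilated causal convolution (zero-padded on the left) and single-head scaled dot-product attention. The receptive-field helper assumes two convolutions with the same dilation per residual block, and the per-head dimension of 16 assumes the 64-dimensional embedding is split evenly across the four heads; both are assumptions, not details confirmed for this study's implementation.

```python
import numpy as np

def dilated_causal_conv1d(x, f, d):
    """y_t = sum_i f[i] * x[t - d*i]; terms with t - d*i < 0 act as zero padding."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for t in range(len(x)):
        for i, w in enumerate(f):
            s = t - d * i
            if s >= 0:  # causal: only the current and past inputs contribute
                y[t] += w * x[s]
    return y

def receptive_field(k, dilations, convs_per_block=2):
    """Receptive field of stacked residual blocks with `convs_per_block` convolutions each."""
    return 1 + convs_per_block * (k - 1) * sum(dilations)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights

# Filter of size 2 with d = 2, i.e. y_t = x_t + x_{t-2}
y = dilated_causal_conv1d(np.arange(8), f=[1.0, 1.0], d=2)
print(y)
print(receptive_field(k=3, dilations=[1, 2, 4, 8]))

# One attention head over a 10-step window (d_k = 16 assumes 64 / 4 heads)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 16)) for _ in range(3))
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape, weights.sum(axis=1))
```

Doubling the dilation at each block makes the receptive field grow exponentially with depth while each block adds only a fixed number of filter weights, which is the property described for TCN above.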
Several error metrics, including MAPE, RMSE, and R², are used to evaluate the performance of the ML models. MAPE measures the mean relative error, expressed as a percentage. RMSE is a quadratic scoring rule equal to the square root of the mean squared difference between predicted and actual values, which makes it especially sensitive to large errors. R², also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variables. These error metrics are defined as follows:
$${MAPE}\,( \% )=\frac{{\sum }_{i=1}^{n}\frac{\left|{y}_{i}-{\hat{y}}_{i}\right|}{{y}_{i}}}{n}\times 100$$
(10)
$${RMSE}=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}}{n}}$$
(11)
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{i}-\bar{y}\right)}^{2}}$$
(12)
where \(n\) is the number of observations, \({y}_{i}\) is the actual value, \({\hat{y}}_{i}\) is the predicted value, and \(\bar{y}\) is the mean of the actual values.
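Equations (10)–(12) translate directly into NumPy. The sketch below is a plain transcription with made-up example values; note that scikit-learn's `sklearn.metrics.mean_absolute_percentage_error` returns a fraction rather than a percentage, so its result must be multiplied by 100 to match Eq. (10), whereas `mean_squared_error` and `r2_score` can be used for RMSE (after a square root) and R².

```python
import numpy as np

def mape(y_true, y_pred):
    """Eq. (10): mean absolute percentage error, in percent (assumes y_true != 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / y_true) * 100.0)

def rmse(y_true, y_pred):
    """Eq. (11): root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Eq. (12): coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = [100.0, 200.0, 300.0]     # made-up values for illustration
predicted = [110.0, 190.0, 300.0]
print(mape(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```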
Experimental setup and design of experiments
Commercial gas-atomized CoCrMo and Inconel 718 (IN718) powders (average particle diameter of 81.5 μm) were used to fabricate the samples. Their chemical compositions are listed in Table 4. Low-carbon steel was used as the substrate material. The particle morphology observed under a scanning electron microscope (SEM) is shown in Fig. 8.

A customized hybrid manufacturing system (AMBIT™, Hybrid Manufacturing Technologies, Texas, USA) was used to fabricate the CoCrMo and IN718 samples. The AMBIT™ hybrid system integrates a laser deposition module with a HAAS TM-1P milling machine, as shown in Fig. 9, providing both directed energy deposition and computer numerical control (CNC) machining functions in a single unit. The system includes a 1000 W IPG fiber laser, a pneumatic powder feeder, and a computer-controlled motion system. The powder is delivered to the melt pool created by a highly focused laser beam, and argon is used as both the carrier and shielding gas. The CNC control panel controls the travel path used to fabricate the samples. The AMBIT™ core and the CNC machine are connected and operated on a three-phase 220 V supply to carry out the hybrid operations. In this study, energy consumption is defined as the instantaneous energy usage, measured in joules (J) per time step, over the entire fabrication process. Energy consumption is therefore calculated as:
$$\text{energy consumption}\,(\text{J})=\sqrt{3}\times \text{current}\,(\text{A})\times \text{voltage}\,(\text{V})\times \text{power factor}$$
(13)
where the current is time-variant and is measured using a data logger (HOBO U12-006, Bourne, MA) with a current transducer (HOBO CTV-B, Bourne, MA), the voltage is 220 V, and the power factor is 0.85.
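Equation (13) can be evaluated per logged current reading as sketched below. The current values are hypothetical. Dimensionally, √3 × I × V × PF is a three-phase power in watts; the second helper therefore shows, as an explicit assumption, one way to obtain joules per time step by multiplying by the step duration (1 s for IN718, where the two coincide numerically, and 20 s for CoCrMo).

```python
import math

SQRT3 = math.sqrt(3.0)

def three_phase_power_w(current_a, voltage_v=220.0, power_factor=0.85):
    """Eq. (13) for one current reading: sqrt(3) * I * V * PF (units: watts)."""
    return SQRT3 * current_a * voltage_v * power_factor

def energy_per_step_j(current_a, step_s, voltage_v=220.0, power_factor=0.85):
    """Assumption: energy over one time step = power * step duration (joules)."""
    return three_phase_power_w(current_a, voltage_v, power_factor) * step_s

readings_a = [10.0, 12.5, 11.2]  # hypothetical logged currents, one per time step
for current in readings_a:
    print(round(three_phase_power_w(current), 1),
          round(energy_per_step_j(current, step_s=20.0), 1))  # 20 s CoCrMo step
```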

Fig. 9: Customized hybrid manufacturing system used for fabricating the CoCrMo and IN718 samples.
The computer-aided design (CAD) model and scan pattern of the samples are shown in Fig. 10. Each sample was 15 mm long and 15 mm wide and consisted of five layers, each 0.54 mm thick. A rectilinear scanning strategy was applied, with the scan direction alternating between 0° and 90° from one layer to the next. The process parameters, including laser power, scanning speed, and powder feed rate, are provided in Table 5. For the CoCrMo samples, each parameter has two levels, resulting in a total of 8 samples. For the IN718 samples, laser power has three levels, while scanning speed and feed rate each have two levels, resulting in a total of 12 samples. The data sampling frequency was 0.05 Hz for each CoCrMo sample, so one time step was 20 s, and 1 Hz for each IN718 sample, so one time step was 1 s. Using two sampling frequencies yields two datasets with different temporal resolutions, covering a wider range of real-world scenarios.
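The sample counts are consistent with a full-factorial design (2 × 2 × 2 = 8 for CoCrMo and 3 × 2 × 2 = 12 for IN718). The sketch below enumerates such designs; apart from the IN718 laser powers of 700, 800, and 900 W, the level labels are hypothetical placeholders for the values in Table 5, and the enumeration order is not claimed to match the sample numbering.

```python
from itertools import product

def full_factorial(levels):
    """Enumerate every combination of factor levels (a full-factorial design)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]

def time_step_s(freq_hz):
    """Duration of one time step for a given sampling frequency."""
    return 1.0 / freq_hz

cocrmo = full_factorial({"laser_power": ["P1", "P2"],      # hypothetical labels
                         "scan_speed": ["S1", "S2"],
                         "feed_rate": ["F1", "F2"]})
in718 = full_factorial({"laser_power": ["700 W", "800 W", "900 W"],
                        "scan_speed": ["S1", "S2"],        # hypothetical labels
                        "feed_rate": ["F1", "F2"]})
print(len(cocrmo), len(in718))
print(time_step_s(0.05), time_step_s(1.0))
```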

Fig. 10: (a) CAD model of the fabricated CoCrMo and IN718 samples. (b) Rectilinear scanning scheme.
Data description
According to the design of experiments (DOE) shown in Table 5, 20 samples were fabricated from two materials by varying the process parameters. Table 6 lists all six input variables and their corresponding levels used to predict the energy consumption of the fabricated samples. Since energy consumption is time-variant, the time step was included as an input variable to provide more precise temporal information. Table 6 reports the range of time steps observed across all samples, from 1 to a maximum of 271. Among the three time-invariant process parameters, laser power and scanning speed each have four levels, while feed rate has three levels. In addition, layer and work indices were introduced for each sample to provide further process information.
The data collected from the 20 samples were organized into three TL tasks. Vanilla models, developed without TL or incremental learning, were selected as baselines for comparison with the incremental learning-integrated TL models. Table 7 summarizes the training and test data divisions used for the vanilla models across the different TL tasks.
Table 8 lists the divisions of the source and target domains and of the training and test datasets for the incremental learning-integrated TL models across the different TL tasks. In material transfer task 1, the source domain consisted of the CoCrMo samples fabricated at two laser power levels (samples 1–8), while the target domain involved the IN718 samples, with samples 9 and 20 used for re-training and samples 10–19 for testing. In material transfer task 2, the source domain consisted of the IN718 samples fabricated at three laser power levels (samples 9–20), and the target domain consisted of the CoCrMo samples, with samples 1 and 8 used for re-training and samples 2–7 for testing. In both material transfer tasks, only two samples were used for re-training in the target domain: one fabricated with the minimum values of all three process parameters and the other with the maximum values. This selection spans the full range of the process parameter space, which is intended to help the model generalize across the entire domain, including unseen combinations of process parameters. The laser power transfer task involved only the IN718 samples. Samples fabricated at 700 W (samples 9–12) and 800 W (samples 15–18) formed the source domain and were used to predict the energy consumption of the samples fabricated at 900 W. Only sample 13 was used for fine-tuning the model, while samples 14, 19, and 20 were used for testing. A single target domain sample was used for re-training to minimize computational cost while still enabling adequate model adaptation. Moreover, using samples fabricated at 700 W and 800 W to predict the energy consumption at 900 W is an effective strategy for reducing operational costs in real-world scenarios, because energy consumption is highest at a laser power of 900 W.
By utilizing lower power levels as the basis for prediction, the approach reduces the cost of extensive data collection and re-training efforts, making it more resource-efficient while still achieving excellent predictive performance.
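The sample divisions described above can be written down as data and checked for leakage between the source, re-training, and test sets. The sketch below takes the sample IDs from the text (Table 8 itself is not reproduced here); the dictionary keys and the `check_split` helper are illustrative names.

```python
from itertools import combinations

# Sample IDs per TL task, as described in the text
TASKS = {
    "material_transfer_1": {"source": range(1, 9), "target_train": [9, 20],
                            "target_test": range(10, 20)},
    "material_transfer_2": {"source": range(9, 21), "target_train": [1, 8],
                            "target_test": range(2, 8)},
    "laser_power_transfer": {"source": [9, 10, 11, 12, 15, 16, 17, 18],
                             "target_train": [13], "target_test": [14, 19, 20]},
}

def check_split(split):
    """Return the split as sets after checking that no sample appears in two subsets."""
    sets = {name: set(ids) for name, ids in split.items()}
    for (a, sa), (b, sb) in combinations(sets.items(), 2):
        overlap = sa & sb
        if overlap:
            raise ValueError(f"{a} and {b} share samples {sorted(overlap)}")
    return sets

for task, split in TASKS.items():
    sizes = {name: len(ids) for name, ids in check_split(split).items()}
    print(task, sizes)
```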
