Noise reduction of TD-NMR relaxation curves by a CNN model
In this study, polylactic acid (PLA) was utilized as a representative biodegradable polymer. Sixty-four different films were prepared by changing molding or crystallization conditions of crystallization temperature (75–120 °C), crystallization time (5–40 min), and nucleating agent concentration (0–1.5 wt%) of PLA (Table S1). These factors affect the molecular structure and properties of polymers. For example, Ma et al. demonstrated that crystallization temperature (100–130 °C) influences the crystallinity, tensile strength, and molecular structure of PLA29. Therefore, it can be assumed that differences in the molding conditions of the samples prepared in this study will cause variations in the properties and molecular structure of the polylactic acid film. The wide ranges of crystal structures, from almost amorphous to a high degree of crystallinity, were confirmed in X-ray scattering/diffraction measurements and polarized optical microscopic observations (Fig. S1). Low-field NMR measurements were carried out to evaluate the chain dynamics of PLA, and enzymatic degradation tests and tensile tests were conducted to obtain mechanical properties. Low-field NMR captures changes in molecular mobility associated with differences in the polymer’s higher-order structure and state. In particular, using the MSE pulse sequence, the shape of the relaxation curve in the sub-millisecond time domain can be interpreted in terms of contributions from crystalline, intermediate, and non-crystalline regions. The low-field NMR relaxation curves for all samples are shown in Fig. 2a, b. With increasing the crystallization temperature, the difference in slower relaxation regions in the time after 0.05 ms seemed to appear, suggesting that the difference in relaxation behavior is derived from the difference in crystalline structure of PLA. These differences were less pronounced when the dependence of nucleating agent concentration and crystallization time was plotted (Figs. S2 and S3). This may be due to the increase in crystal regions of PLA promotes faster T2 relaxation, thereby reducing signals from the non-crystalline regions, which exhibit slower T2 relaxation. It should be noted that the differences were hardly observable in real scale, suggesting the difference in dynamics in amorphous regions of PLA between the samples was less observed, because the chain mobility of PLA is frozen at room temperature due to high glass transition temperature (Tg ~ 60 °C). Furthermore, the samples crystallized at 90–120 °C have a high noise level even in log scale, making it difficult to discern the relative magnitudes of the signals.

a, b Relaxation curve before denoising in a real number scale and b log scale. Vertical scale “R” represents relaxation rate. Polylactic acids were crystallized with different crystallization temperatures, crystallization times, and concentrations of nucleating agents. The colors correspond to crystallization temperatures at 75 °C (blue), 90 °C (green), 105 °C (orange), and 120 °C (red).
The relaxation curves were denoised by a CNN model. We constructed a custom architecture based on SE-ResNet as shown in Fig. S430,31. Noiseless and noisy artificial data were created to mimic relaxation curves. The CNN model was trained with artificial relaxation curves with noisy data (input) and denoised data (output). An example of denoising for a simulation curve is shown in Fig. S5. The noises in relaxation curves were reduced through the CNN, indicating that the CNN model worked effectively for the denoising task.
After training, real data of polylactic acid relaxation curves were input for denoising. The denoising for the real data of low-field NMR measurements of PLA by the CNN model was attempted as shown in Fig. 3a, b. The denoised curves showed a linear decay on the logarithmic axis in the slow time region, suggesting high noise rejection performance in the measured data. In addition, the separation of the relaxation curves by molding conditions such as crystallization temperature was clearly visualized, suggesting improved interpretability of differences in molecular dynamics between samples. Similarly, the moderate differences in nucleating agent concentration and crystallization time were also distinguishable (Figs. S6 and S7). These results indicate that denoising the relaxation curves, even when differences are subtle, can assist in correlating relaxation behavior with the crystal structure of PLA. Furthermore, this suggests that the latent representation of the relaxation curves learned by the denoising model may correspond to material property information.

Relaxation curve of polylactic acid with different crystallization conditions after denoising in a real number scale and b log scale. The colors correspond to crystallization temperatures at 75 °C (blue), 90 °C (green), 105 °C (orange), and 120 °C (red).
Material property prediction by random forest regression
The material properties of PLA were predicted by random forest regression (RFR). As the material properties, enzymatic degradation rates (renzyme), strain at break, and Young’s modulus were evaluated for all films. For strain at break and Young’s modulus, the films were pre-treated by enzymatic degradation and/or light irradiation before the tensile tests (Y2-Y9, Table 1). All data points from the relaxation curves (either denoised or raw) were used as explanatory variables, and the property values were used as objective variables. Figure 4 shows the root mean square error (RMSE) of each property predicted by RFR. The RFR showed a similar RMSE trend for train data with and without denoising (Fig. 4a). However, the prediction using the test data of the non-denoised curves showed an increase in RMSE, suggesting that the regression performance of the RFR was reduced by noise, especially in unseen or test data (Figs. 4b–d and S8). A similar trend was observed in the coefficient of determination (R2) for most properties (Table 1), indicating that the predictions from the denoised relaxation curves show stable accuracy for both train and test. The improvement of prediction performance of material properties by denoising may be due to the clarification of the difference in relaxation curves by property, as shown in Fig. 3. In other words, noise in the relaxation curves likely caused overfitting on the training data, reducing predictive performance on unseen test data. Although there have been many attempts to predict properties and extract descriptors of polymers23,32,33, this study successfully predicted a wide variety of properties while using only low-field NMR relaxation curves as explanatory variables, which is a relatively easy method to obtain the information of chain dynamics as well as higher order structure. Our approach demonstrates the feasibility of a prediction approach of material properties based on simple low-field NMR measurements. In this study, the data were limited to 64 cases, and the evaluation by K-fold cross-validation was not stable due to differences in experimental conditions, which limited the estimation of generalization performance. In the future, it will be important to expand the data size, and it is expected that more reliable evaluation and improvement of generalization performance will be achieved when sufficient data is secured.

a, b Root mean squared error (RMSE) of predicted properties using raw (left bar) and denoised (right bar) NMR relaxation curves for a train data and b test data, c, d Parity Plot of Y1 using (c) denoising and (d) raw NMR relaxation curves. The vertical and horizontal axes correspond to predicted values and actual values, respectively. If the predicted and actual values are close, the plots are aligned on the diagonal in the Parity Plot. The color of the bar and plots corresponds to train data with denoising curves (blue), test data with denoising curves (orange), train data with raw curves (green), and test data with raw curves (red).
RFR also provides a measure called “feature importance“, which represents the contribution of each input data point to the prediction. Regions with high contributions can be interpreted as important time segments that significantly influence the target property. The distribution of feature importance thus offers insights that are valuable for advancing materials informatics. In the case of predicting Y1, the feature importance from the denoised curve is high around the 0.1–0.15 ms region, whereas the non-denoised curve shows localized distributions (Fig. 5). The distributions for other properties showed a similar trend (Figs. S9 and S10). While the averaging of multiple weak decision trees in RFR provides robustness to noise, the result of RFR was affected by the noise. These results consistently show the improvements in RFR performance due to noise removal by CNN. Since RFR is one of the effective approaches to NMR analysis21,34,35,36, the improved performance and interpretability achieved through denoising could assist in extracting effective descriptors from relaxation curves for property prediction.

a, b Feature importance of the enzyme degradation rate prediction model using a the denoising curve, and b an untreated curve. Vertical axis: Feature Importance, horizontal axis: corresponds to the time axis of the relaxation curve.
Feature extraction by GAP layers in a CNN model
The features of the relaxation curves were also extracted from the latent space in the CNN. In the current study, the global average pooling (GAP) layer was treated as the latent space in the denoising CNN model (Fig. S4). The correlation coefficients between each node value of GAP layers (GAP values) and each material property were calculated, and the one latent space with the strongest correlation with each material property was identified (Table 1, rlatent-properties). A notable correlation was observed for all properties except Y8, consistent with the poor RFR prediction accuracy for that property (Fig. 4a). Linear relationships between the GAP values and material properties were identified, as shown in Fig. 6a, b. These correlations suggest that the latent space of the CNN denoising model extracts the information of dynamics, which is related to the property. In addition, clusters corresponding to the crystallization temperature were observed in the scatter plot (Figs. 6 and S10), suggesting the extraction of crystal structure information from the relaxation curves in the latent space. These results indicate the effectiveness of the noise reduction model in extracting features from the relaxation curves.

a Correlation between the enzymatic degradation rate (Y1) and GAP values. b Correlation between Young’s modulus with treatments (Y9) and GAP values. The GAP values were selected by the highest correlation coefficients. The colors of the plots correspond to the crystallization temperatures: blue 75 °C, green 90 °C, orange 105 °C, and red 120 °C.
It has been pointed out that the material properties of polymers can correspond to the latent space of neural networks28,37. Attempts have been made to analyze the correspondence between latent space and glass transition temperature, to visualize the latent space by PCA analysis, and to map the polymer properties to the properties using RFR37,38,39,40. In these attempts, training data are generally expressed as strings in SMILES format, etc., to construct large-scale chemical language models. In contrast, our study used a lightweight training dataset with 310 data points and artificial 50,000 numbers of mimic relaxation curves, to form polymer fingerprints corresponding to various material properties of mechanical strength and biodegradability before and after degradation. Since the relaxation curve shape reflects the physical and chemical state of the nucleus, the approach using pattern recognition architecture may lead to essential and efficient information extraction. Visualization of important regions from relaxation curves using machine learning has been attempted by Okada et al.21, but this study extends this approach by synthesizing variables that correspond linearly to material properties from the relaxation curves to construct effective indicators as objective variables in the material properties.
Each latent space in the encoder–decoder structure includes any information in input and/or output data, which can be dealt with important feature in the original dimension. We therefore visualized the feature maps in the last convolutional layers corresponding to the selected GAP latent variables. In Fig. 7, the feature maps corresponding to the GAP variables are highly correlated with Y1 and Y9. Since the GAP value is calculated as the average of the feature maps, regions with large absolute values in the feature maps are considered to be regions with a strong contribution to the latent space-material property correspondence, representing important patterns of the input data41. The regions in the relatively early relaxation time were recognized as the important regions in the relaxation curve (Figs. 7 and S11). This behavior showed a different tendency for the feature importance of RFR (Fig. 5a). Because the kernel in CNN is suitable to extract the information in attenuation of the relaxation curve, more fluctuated regions were recognized as the important descriptors of low-field NMR for the properties. Generally, the entire latent space is often targeted as a black box in model validation, as seen in techniques such as Grad-CAM or non-linear analysis39,42,43. On the other hand, in this study, by calculating the correlation of properties for each latent space and specifying the latent space corresponding to the properties, the feature map was visualized as a heat map highlighting the regions with strong influence on that property.

a, b Feature maps in CNN for the NMR relaxation curves with a the biodegradability (Y1) and b Young’s modulus with treatments (Y9) for polylactic acid. Vertical axis: sample number, Horizontal axis: time axis of the relaxation curves. The GAP values were selected by the highest correlation coefficients among the relationships between several GAP values and objective properties.
Bayesian optimization for prompt searching the adequate process conditions
We then performed BO to identify effective process conditions using the latent space. BO is an algorithm for finding the input x that maximizes the output y in an unknown function y = f(x), using a small number of trials. It is widely applied in the process of searching for conditions to optimize material properties. To evaluate the contribution of the latent space to the optimization of the molding process, optimizations of the crystallization conditions with GAP values as the objective variable were performed, along with Bayesian optimization for conventional material properties and random search for comparisons. The sampling space in BO consists of the molding conditions for PLA. A Gaussian Process was employed as the surrogate model for BO, and the Upper Confidence Bound was used as the acquisition function. In order to reduce the influence of randomness on the interpretation of the results, a series of optimization processes was performed 200 times for each property. In most cases, the BO for GAP values and material properties showed higher performances compared with the random searches (Figs. 8 and S12). A significant difference in the number of trials required to reach 80% of the maximum value for a given property was observed between the optimization processes. The maximum value was simply selected as the highest for each objective value. The magnitude of the difference in the number of trials was determined numerically using Cohen’s d (Table 2). BO using GAP values significantly outperformed random search for Y6–Y9, and Cohen’s d also indicated a high-performance difference. The only other case of Y1 showed slightly worse performance than the random search, resulting in Cohen’s d of −0.2389, suggesting that the effect was small. The comparison of optimization from material properties and optimization from GAP also showed no significant difference for Y1–Y5 and Y8, while optimization from GAP had a weaker effect than optimization from material properties for Y6 and Y9, and a moderate effect for Y7. Overall, BO using latent-space objectives achieved improvements comparable to those achieved by conventional BO using actual property values, and both approaches performed better than random search.

BO was carried out by optimization of process conditions for GAP values with a number of trials. The results on the vertical axis were plotted as property values (Y9) instead of GAP values. The bootstrap sampling, or the resampling with replacement from the original dataset, was carried out. Blue: direct Bayesian optimization for properties, Orange: Bayesian optimization for GAP values correlated with properties, Green: random search, Black: maximum properties, Gray: 80% line of maximum properties.
Previous studies include Bayesian optimization of the degradability of polylactic acid using NMR spectral data as the objective variable44, materials property optimization combining evaluation of material properties by a neural network that has been pre-trained with a large amount of real data and candidate submission by a genetic algorithm45, and data derived from biological samples. The correspondence between latent space and qualitative scores of image classification models of biological samples has been studied46. Our study combines elements of these approaches, emphasizing both the use of readily obtainable experimental data and the ease of constructing an analytical model. The denoising model can be easily trained with artificial data, and once trained, its latent space encodes material property information extracted from measured low-field NMR data. It was shown that Bayesian optimization using the latent space can optimize various properties of polymers as well as real data, suggesting the feasibility of a property optimization process based on data that can be obtained more quickly and inexpensively than before.
Perspectives for the future
This study demonstrates the feasibility of using a CNN latent space obtained from a denoising task, instead of direct material property values, for optimizing biodegradable polymer processing. The denoising model developed in this study removed noise components from the signal and visualized small differences in the relaxation curves reflecting molecular dynamics. Noise removal also improved the accuracy of property regression and contributed to the visualization of important components. Furthermore, the specific latent space that correlates with material properties was identified from large latent spaces of the denoised model, suggesting the possibility of extracting information relating to material properties of biodegradable polymers. BO using these latent features as the objective variable for optimizing molding conditions showed performance comparable to using actual measured properties as the objective. Hence, this method is applicable to optimize the material design of biodegradable polymers without high experimental costs of degradation.
As a limitation of this study, experimental measurement of material properties remains essential, since identifying the GAP corresponding to a given property requires the property values themselves. However, by initially applying conventional methods and switching to the proposed approach once enough data has been collected, it may be possible to mitigate the impact of this limitation and improve the efficiency of materials development. Signal denoising itself is beneficial for data analysis, and given that the acquisition of material properties is often a bottleneck in materials development, optimization without explicit property acquisition, by leveraging the latent space of the denoising model, could be naturally integrated into the development process.
The application of the noise reduction model to the optimization of material properties in this study suggests the feasibility of a material property optimization process that omits the process of obtaining material properties. Since relaxation curves reflect molecular dynamics of polymers, some latent spaces in the denoising model are linearly correlated with the material property. This may enable the integrated replacement of multiple measurements for material properties with simple and high-throughput low-field NMR measurements, which is expected to contribute to the efficiency of material property optimization. In general, noise reduction models can be created relatively easily by constructing data sets by creating artificial data using random numbers or by adding white noise to measured data. Thus, this method is useful for materials with high experimental costs to evaluate their properties. Furthermore, noise reduction models have the potential to extract effective features from a wide range of NMR measurements, such as solution NMR, which captures exhaustive information at frequencies that reflect the exhaustive information through the chemicals and materials, as well as low-field NMR47,48. Thus, the latent space utilization of the noise reduction model treated in this study is expected to be applied to diverse fields, not limited to polymers and low-field NMR.
