Bayesian optimization of biodegradable polymers via machine learning driven features from low-field NMR data

Noise reduction of TD-NMR relaxation curves by a CNN model

In this study, polylactic acid (PLA) was used as a representative biodegradable polymer. Sixty-four different films were prepared by varying the molding (crystallization) conditions: crystallization temperature (75–120 °C), crystallization time (5–40 min), and nucleating agent concentration (0–1.5 wt%) (Table S1). These factors affect the molecular structure and properties of polymers. For example, Ma et al. demonstrated that crystallization temperature (100–130 °C) influences the crystallinity, tensile strength, and molecular structure of PLA²⁹. Therefore, the differences in molding conditions among the samples prepared in this study can be expected to cause variations in the properties and molecular structure of the PLA films. A wide range of crystal structures, from almost amorphous to highly crystalline, was confirmed by X-ray scattering/diffraction measurements and polarized optical microscopy (Fig. S1). Low-field NMR measurements were carried out to evaluate the chain dynamics of PLA, and enzymatic degradation tests and tensile tests were conducted to obtain degradation and mechanical properties. Low-field NMR captures changes in molecular mobility associated with differences in the polymer’s higher-order structure and state. In particular, with the MSE pulse sequence, the shape of the relaxation curve in the sub-millisecond time domain can be interpreted in terms of contributions from crystalline, intermediate, and non-crystalline regions. The low-field NMR relaxation curves for all samples are shown in Fig. 2a, b. As the crystallization temperature increased, differences appeared in the slower relaxation region after 0.05 ms, suggesting that the difference in relaxation behavior derives from the difference in the crystalline structure of PLA. These differences were less pronounced when the curves were plotted against nucleating agent concentration and crystallization time (Figs. S2 and S3). 
This may be because an increase in the crystalline regions of PLA promotes faster T₂ relaxation, thereby reducing the signal from the non-crystalline regions, which exhibit slower T₂ relaxation. It should be noted that the differences were hardly observable on a linear scale, suggesting that differences in the dynamics of the amorphous regions of PLA between samples were small, because the chain mobility of PLA is frozen at room temperature owing to its high glass transition temperature (T_g ~ 60 °C). Furthermore, the samples crystallized at 90–120 °C have a high noise level even on a log scale, making it difficult to discern the relative magnitudes of the signals.

**Fig. 2: Low-field NMR relaxation curves of polylactic acid with different crystallization process conditions.**

The relaxation curves were denoised by a CNN model. We constructed a custom architecture based on SE-ResNet, as shown in Fig. S4^30,31. Noiseless and noisy artificial data were created to mimic relaxation curves. The CNN model was trained on these artificial relaxation curves, with the noisy curves as input and the corresponding noiseless curves as the target output. An example of denoising for a simulated curve is shown in Fig. S5. The noise in the relaxation curves was reduced by the CNN, indicating that the model worked effectively for the denoising task.
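As a rough illustration of how such noiseless/noisy training pairs could be generated, the following minimal sketch builds artificial curves from random numbers. The two-component decay model, the parameter ranges, the noise level, and the function name `make_curve_pair` are all assumptions made for illustration; the source does not specify how the artificial curves were constructed.

```python
import math
import random


def make_curve_pair(n_points=200, t_max_ms=0.3, noise_sd=0.02, rng=None):
    """Return (times, clean, noisy) for one artificial relaxation curve."""
    if rng is None:
        rng = random.Random()
    # Assumed model: a fast Gaussian-type decay (rigid component) plus a
    # slower exponential decay (mobile component). All ranges are hypothetical.
    frac_fast = rng.uniform(0.2, 0.9)
    t2_fast = rng.uniform(0.01, 0.03)  # ms
    t2_slow = rng.uniform(0.05, 0.5)   # ms
    times = [t_max_ms * i / (n_points - 1) for i in range(n_points)]
    clean = [
        frac_fast * math.exp(-0.5 * (t / t2_fast) ** 2)
        + (1.0 - frac_fast) * math.exp(-t / t2_slow)
        for t in times
    ]
    # White (Gaussian) noise is added to form the network input.
    noisy = [y + rng.gauss(0.0, noise_sd) for y in clean]
    return times, clean, noisy


# Build a small toy dataset: noisy curves are inputs, clean curves are targets.
rng = random.Random(0)
dataset = [make_curve_pair(rng=rng) for _ in range(5)]
print(len(dataset), len(dataset[0][1]))
```

In a sketch like this, each noisy curve would be fed to the denoising network and the matching clean curve used as the regression target.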

After training, the measured low-field NMR relaxation curves of PLA were input to the CNN model for denoising (Fig. 3a, b). The denoised curves showed a linear decay on the logarithmic axis in the slow time region, suggesting high noise rejection performance on the measured data. In addition, the separation of the relaxation curves by molding conditions such as crystallization temperature was clearly visualized, suggesting improved interpretability of differences in molecular dynamics between samples. Similarly, the more moderate differences arising from nucleating agent concentration and crystallization time were also distinguishable (Figs. S6 and S7). These results indicate that denoising the relaxation curves, even when differences are subtle, can assist in correlating relaxation behavior with the crystal structure of PLA. Furthermore, this suggests that the latent representation of the relaxation curves learned by the denoising model may correspond to material property information.

**Fig. 3: Low-field NMR relaxation curves after CNN denoising.**

Material property prediction by random forest regression

The material properties of PLA were predicted by random forest regression (RFR). As material properties, the enzymatic degradation rate (r_enzyme), strain at break, and Young’s modulus were evaluated for all films. For strain at break and Young’s modulus, the films were pre-treated by enzymatic degradation and/or light irradiation before the tensile tests (Y2–Y9, Table 1). All data points of the relaxation curves (either denoised or raw) were used as explanatory variables, and the property values were used as objective variables. Figure 4 shows the root mean square error (RMSE) of each property predicted by RFR. The RFR showed a similar RMSE trend for the training data with and without denoising (Fig. 4a). However, predictions on the test data from the non-denoised curves showed an increased RMSE, suggesting that noise reduced the regression performance of the RFR, especially on unseen test data (Figs. 4b–d and S8). A similar trend was observed in the coefficient of determination (R²) for most properties (Table 1), indicating that predictions from the denoised relaxation curves are stably accurate for both training and test data. The improvement in prediction performance by denoising may be due to the clearer property-dependent differences between relaxation curves, as shown in Fig. 3. In other words, noise in the relaxation curves likely caused overfitting to the training data, reducing predictive performance on unseen test data. Although there have been many attempts to predict properties and extract descriptors of polymers^23,32,33, this study successfully predicted a wide variety of properties using only low-field NMR relaxation curves as explanatory variables; such curves are a relatively easy way to obtain information on chain dynamics as well as higher-order structure. Our results demonstrate the feasibility of predicting material properties from simple low-field NMR measurements. 
In this study, the data were limited to 64 cases, and the evaluation by K-fold cross-validation was not stable due to differences in experimental conditions, which limited the estimation of generalization performance. In the future, it will be important to expand the data size, and it is expected that more reliable evaluation and improvement of generalization performance will be achieved when sufficient data is secured.
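For reference, the two evaluation metrics used above can be computed as in the following minimal sketch. It uses the standard definitions of RMSE and R²; the function names and the toy numbers are illustrative assumptions and are not taken from the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values only (hypothetical, not measured data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_true, y_pred), 4), round(r2_score(y_true, y_pred), 4))
```

Computing both metrics separately on the training and test splits makes an overfitting gap visible: a low training RMSE paired with a markedly higher test RMSE is the pattern described for the non-denoised curves.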

**Fig. 4: Prediction of property from low-field NMR curves by random forest regression (RFR).**

Table 1 Material properties evaluated and machine learning results

RFR also provides a measure called “feature importance”, which represents the contribution of each input data point to the prediction. Regions with high contributions can be interpreted as important time segments that significantly influence the target property. The distribution of feature importance thus offers insights that are valuable for advancing materials informatics. In the case of predicting Y1, the feature importance from the denoised curve is high around the 0.1–0.15 ms region, whereas the non-denoised curve shows localized distributions (Fig. 5). The distributions for other properties showed a similar trend (Figs. S9 and S10). Although averaging over multiple weak decision trees makes RFR robust to noise, the RFR results were still affected by the noise. These results consistently show that noise removal by the CNN improves RFR performance. Since RFR is one of the effective approaches to NMR analysis^21,34,35,36, the improved performance and interpretability achieved through denoising could assist in extracting effective descriptors from relaxation curves for property prediction.

**Fig. 5: Feature importance in NMR relaxation curves estimated by random forest regressors.**

Feature extraction by GAP layers in a CNN model

Features of the relaxation curves were also extracted from the latent space of the CNN. In the current study, the global average pooling (GAP) layer was treated as the latent space of the denoising CNN model (Fig. S4). The correlation coefficients between each node value of the GAP layers (GAP values) and each material property were calculated, and the single latent variable with the strongest correlation with each material property was identified (Table 1, r_{latent-properties}). A notable correlation was observed for all properties except Y8, consistent with the poor RFR prediction accuracy for that property (Fig. 4a). Linear relationships between the GAP values and material properties were identified, as shown in Fig. 6a, b. These correlations suggest that the latent space of the CNN denoising model extracts information on dynamics that is related to the property. In addition, clusters corresponding to the crystallization temperature were observed in the scatter plots (Figs. 6 and S10), suggesting that crystal structure information is extracted from the relaxation curves in the latent space. These results indicate the effectiveness of the noise reduction model in extracting features from the relaxation curves.
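The selection step described above can be sketched as follows: each channel's feature map is averaged to give a GAP value, and the node whose GAP values correlate most strongly with a property across samples is kept. This is a minimal sketch assuming Pearson correlation and selection by absolute value; the source does not state which correlation coefficient was used, and the function names and toy numbers are hypothetical.

```python
import math


def global_average_pool(feature_maps):
    """Average each channel's feature map over the time axis."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_gap_node(gap_values, prop):
    """gap_values holds one list of GAP values per sample.

    Returns (node index, r) for the node with the largest |r| to prop.
    """
    n_nodes = len(gap_values[0])
    scores = [pearson_r([s[j] for s in gap_values], prop) for j in range(n_nodes)]
    best = max(range(n_nodes), key=lambda j: abs(scores[j]))
    return best, scores[best]


# Toy example: 4 samples, 3 channels, 2 time points per channel (made-up values).
samples = [
    [[0.1, 0.3], [1.0, 1.0], [0.5, 0.1]],
    [[0.4, 0.2], [2.0, 2.2], [0.2, 0.6]],
    [[0.0, 0.2], [3.0, 2.8], [0.9, 0.1]],
    [[0.5, 0.3], [4.0, 4.2], [0.1, 0.3]],
]
prop = [10.0, 20.0, 30.0, 40.0]
gap = [global_average_pool(s) for s in samples]
node, r = best_gap_node(gap, prop)
print(node, round(r, 3))
```

In this toy data the second channel (index 1) rises almost linearly with the property, so it is the one selected.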

**Fig. 6: Relationships between the material’s properties and selected GAP values.**

It has been pointed out that the material properties of polymers can correspond to the latent space of neural networks^28,37. Attempts have been made to analyze the correspondence between latent space and glass transition temperature, to visualize the latent space by PCA, and to map latent representations to polymer properties using RFR^37,38,39,40. In these attempts, training data are generally expressed as strings, such as SMILES, to construct large-scale chemical language models. In contrast, our study used a lightweight training dataset, with 310 data points and 50,000 artificial mimic relaxation curves, to form polymer fingerprints corresponding to various material properties relating to mechanical strength and biodegradability before and after degradation. Since the shape of the relaxation curve reflects the physical and chemical state of the nuclei, an approach using a pattern recognition architecture may lead to essential and efficient information extraction. Visualization of important regions of relaxation curves using machine learning has been attempted by Okada et al.²¹; this study extends that approach by synthesizing, from the relaxation curves, variables that correspond linearly to material properties, thereby constructing effective indicators that can serve as objective variables for the material properties.

Each latent variable in the encoder–decoder structure contains information on the input and/or output data, and this information can be traced back to important features in the original dimension. We therefore visualized the feature maps in the last convolutional layers corresponding to the selected GAP latent variables. Fig. 7 shows the feature maps corresponding to the GAP variables that are highly correlated with Y1 and Y9. Since the GAP value is calculated as the average of a feature map, regions with large absolute values in the feature map are considered to contribute strongly to the correspondence between latent space and material property, representing important patterns of the input data⁴¹. Regions at relatively early relaxation times were recognized as the important regions of the relaxation curve (Figs. 7 and S11). This tendency differed from that of the RFR feature importance (Fig. 5a). Because the CNN kernel is suited to extracting information on the attenuation of the relaxation curve, regions with larger fluctuations were recognized as the important low-field NMR descriptors for the properties. Generally, the entire latent space is treated as a black box in model validation, as in techniques such as Grad-CAM or non-linear analysis^39,42,43. In this study, by contrast, the correlation with properties was calculated for each latent variable and the latent variable corresponding to each property was specified, so that the feature map could be visualized as a heat map highlighting the regions with strong influence on that property.
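Because a GAP value is the mean of its feature map, the time regions that dominate it can be located by looking for large absolute activations. The sketch below is one simple way to do this, assuming a relative threshold on the peak absolute value; the threshold, the function name `important_regions`, and the toy numbers are illustrative assumptions, whereas the study itself presents the feature maps as heat maps.

```python
def important_regions(times, feature_map, rel_threshold=0.5):
    """Return the time points whose |activation| is at least
    rel_threshold times the largest |activation| in the feature map."""
    peak = max(abs(v) for v in feature_map)
    return [t for t, v in zip(times, feature_map) if abs(v) >= rel_threshold * peak]


# Toy feature map for one selected channel (made-up values, times in ms).
times = [0.00, 0.05, 0.10, 0.15, 0.20]
feature_map = [0.9, -1.2, 0.3, 0.1, -0.05]

gap_value = sum(feature_map) / len(feature_map)  # the channel's GAP value
print(gap_value)
print(important_regions(times, feature_map))  # -> [0.0, 0.05]
```

Absolute values are used because strongly negative activations shift the average just as much as strongly positive ones.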

**Fig. 7: Feature maps in CNN correlate to Y1 and Y9.**

Bayesian optimization for promptly searching for adequate process conditions

We then performed Bayesian optimization (BO) to identify effective process conditions using the latent space. BO is an algorithm for finding the input x that maximizes the output y of an unknown function y = f(x) in a small number of trials. It is widely applied to searching for conditions that optimize material properties. To evaluate the contribution of the latent space to the optimization of the molding process, the crystallization conditions were optimized with GAP values as the objective variable, and, for comparison, BO with conventional material properties as the objective variable and random search were also performed. The sampling space in BO consists of the molding conditions for PLA. A Gaussian process was employed as the surrogate model for BO, and the Upper Confidence Bound was used as the acquisition function. To reduce the influence of randomness on the interpretation of the results, the series of optimization processes was performed 200 times for each property. In most cases, BO for GAP values and for material properties performed better than random search (Figs. 8 and S12). A significant difference between the optimization processes was observed in the number of trials required to reach 80% of the maximum value of a given property, where the maximum was simply taken as the highest value of each objective variable. The magnitude of the difference in the number of trials was quantified using Cohen’s d (Table 2). BO using GAP values significantly outperformed random search for Y6–Y9, and Cohen’s d also indicated a large performance difference. The only case that performed slightly worse than random search was Y1, with a Cohen’s d of −0.2389, suggesting that the effect was small. A comparison between optimization using material properties and optimization using GAP values likewise showed no significant difference for Y1–Y5 and Y8, whereas optimization using GAP values had a weaker effect than optimization using material properties for Y6 and Y9, and a moderate effect for Y7. Overall, BO using latent-space objectives achieved improvements comparable to those achieved by conventional BO using actual property values, and both approaches performed better than random search.
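The quantities used in this comparison can be sketched in a few lines. The sketch below shows an Upper Confidence Bound selection over candidate conditions given surrogate predictions, the number of trials needed to reach 80% of the maximum, and Cohen’s d with a pooled standard deviation. The exploration weight `kappa`, the sign convention of d (first group minus second), the assumption of a positive maximum, and all toy numbers are assumptions for illustration; the source does not report these details, and the Gaussian process fit itself is omitted.

```python
import math
import statistics


def pick_next(means, stds, kappa=2.0):
    """Index of the candidate maximizing UCB = mean + kappa * std.

    means and stds are the surrogate's posterior predictions per candidate.
    """
    scores = [m + kappa * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


def trials_to_reach(history, best_value, fraction=0.8):
    """1-based trial number at which the objective first reaches
    fraction * best_value (assumes best_value > 0); None if never reached."""
    for i, y in enumerate(history, start=1):
        if y >= fraction * best_value:
            return i
    return None


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation: (mean(a) - mean(b)) / s_pooled."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Toy numbers only (hypothetical, not results from the study).
print(pick_next([0.5, 0.6, 0.4], [0.3, 0.05, 0.1]))   # -> 0
print(trials_to_reach([0.2, 0.5, 0.85, 0.9], 1.0))    # -> 3
bo_trials = [3, 4, 5, 4, 4]       # trials needed in repeated BO runs
random_trials = [7, 8, 9, 8, 8]   # trials needed in repeated random searches
print(cohens_d(bo_trials, random_trials))
```

With this sign convention, a negative d means the first group needed fewer trials on average than the second.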

**Fig. 8: Bayesian optimization (BO) of Y9 using the GAP latent variable as the objective variable.**

Table 2 Statistical tests of the difference between Bayesian optimization and random search

Previous studies include Bayesian optimization of the degradability of polylactic acid using NMR spectral data as the objective variable⁴⁴ and material property optimization that combines property evaluation by a neural network pre-trained on a large amount of real data with candidate proposal by a genetic algorithm⁴⁵. For data derived from biological samples, the correspondence between the latent space of image classification models and qualitative scores has also been studied⁴⁶. Our study combines elements of these approaches, emphasizing both the use of readily obtainable experimental data and the ease of constructing an analytical model. The denoising model can be easily trained with artificial data, and once trained, its latent space encodes material property information extracted from measured low-field NMR data. Bayesian optimization using the latent space was shown to optimize various polymer properties as well as optimization using measured property data does, suggesting the feasibility of a property optimization process based on data that can be obtained more quickly and inexpensively than before.

Perspectives for the future

This study demonstrates the feasibility of using a CNN latent space obtained from a denoising task, instead of direct material property values, for optimizing biodegradable polymer processing. The denoising model developed in this study removed noise components from the signal and visualized small differences in the relaxation curves reflecting molecular dynamics. Noise removal also improved the accuracy of property regression and contributed to the visualization of important components. Furthermore, specific latent variables that correlate with material properties were identified within the large latent space of the denoising model, suggesting the possibility of extracting information relating to the material properties of biodegradable polymers. BO using these latent features as the objective variable for optimizing molding conditions showed performance comparable to using actual measured properties as the objective. Hence, this method can be applied to optimizing the material design of biodegradable polymers without the high experimental cost of degradation tests.

As a limitation of this study, experimental measurement of material properties remains essential, since identifying the GAP corresponding to a given property requires the property values themselves. However, by initially applying conventional methods and switching to the proposed approach once enough data has been collected, it may be possible to mitigate the impact of this limitation and improve the efficiency of materials development. Signal denoising itself is beneficial for data analysis, and given that the acquisition of material properties is often a bottleneck in materials development, optimization without explicit property acquisition, by leveraging the latent space of the denoising model, could be naturally integrated into the development process.

The application of the noise reduction model to the optimization of material properties in this study suggests the feasibility of a material property optimization process that omits the step of measuring material properties. Since relaxation curves reflect the molecular dynamics of polymers, some latent variables in the denoising model are linearly correlated with material properties. This may allow multiple material property measurements to be replaced by simple, high-throughput low-field NMR measurements, which is expected to make material property optimization more efficient. In general, noise reduction models can be created relatively easily by constructing datasets from artificial data generated with random numbers or by adding white noise to measured data. Thus, this method is useful for materials whose properties are costly to evaluate experimentally. Furthermore, noise reduction models have the potential to extract effective features from a wide range of NMR measurements beyond low-field NMR, such as solution NMR, which captures exhaustive frequency-domain information on chemicals and materials^47,48. Thus, the use of the latent space of the noise reduction model presented in this study is expected to be applicable to diverse fields, not limited to polymers and low-field NMR.
