Screening strategy
Figure 1 summarizes the workflow for the stability investigation of polyacrylamide-based hydrogel coating via droplet microarray platform and machine learning integrating strategy. The workflow primarily consists of five parts: (i) Miniaturized parallel hydrogel coating synthesis enabled by molecule solution dispensing technology; (ii) ImageJ software assisted rapid evaluation of the hydrogel coating stability under harsh conditions; (iii) Screening of extracted molecular feature descriptors via correlation analyze and recursive elimination; (iv) Machine learning model construction while interpreting using game-theory based SHAP methodology; (v) Selection and synthesis of high-performance coating formulation based on model predication. This droplet microarray high-throughput preparation platform was equipped with a piezoelectric pipetting tip to achieve a precise dispensing of droplets. Benefiting from the high-throughput experimental platform, we developed an efficient and economical method to produce hundreds of hydrogel coatings on one substrate. This method was demonstrated to be generally applicable to prepare and characterize a broad range of hydrogel coatings, such as zwitterion, poly (ethylene glycol) (PEG), 2-hydroxyethyl methacrylate (HEMA)18,24,25 and polyacrylamide (this work). Moreover, manually analyzing the large amounts of data obtained in high-throughput experiments may be time-consuming and laborious, and it is easy to ignore some underlying information. An interpretable machine learning method could aid us in understanding the molecular features that determined the performance of all of hydrogel coatings we tested. We analyzed the 19 feature descriptors of each coating and elucidated their contribution to stability tendencies. This closed-loop integrating strategy that involved data generation, data analysis and experimental verification had accelerated the investigation of the polyacrylamide-based hydrogel coatings.

Workflow of droplet microarray platform and machine learning integrating strategy for the investigation of polyacrylamide-based hydrogel coatings.
Hydrogel coating preparation based on droplet microarray platform
Due to their biocompatibility, polyacrylamide-based hydrogels have extensive applications in biomedical settings26, such as contact lenses27, drug delivery systems28, and anti-biofouling coatings29. The stability of polyacrylamide-based hydrogel coatings under physiological conditions is crucial for their longer-term application. To maximize the investigation scope of the stability of polyacrylamide-based hydrogel coatings, 9 commercially available acrylamide-derived monomers were selected, and the molecular structural formula of these monomers were shown in Fig. 2a. They provided wide chemical diversity, including various alkane chain lengths, distinct functional groups, both linear and cyclic aliphatic structures, varying charge properties, and diverse wettability. Relying on the reactivity of the C = C bond and crosslinker Bis, acrylamide-derived monomers can form hydrogel networks through one-step UV-induced free radical polymerization (Fig. 2b). As shown in Fig. 2c, the combinatorial preparation of the hydrogel coatings was achieved via a miniaturized high-throughput manner. According to the preset program, different precursor solutions are sequentially pipetted onto the metal surface and undergo in-situ polymerization to form hydrogel coatings (details were presented in ‘Methods’). A library of 117 copolymer hydrogel coatings comprising unique single or binary combinatorial mixtures (75:25, 50:50, 25:75 in mass ratio) was fabricated using these monomers.

a Monomers for combinatorial synthesis via photopolymerization: acrylamide (M1), diethylacrylamide (M2), hydroxymethylacrylamide (M3), hydroxyethylacrylamide (M4), acryloylmorpholine (M5), (acrylamidopropyl)trimethyl-ammonium (M6), 2-acrylamido-2-methyl-propane sulfonic acid (M7), dimethylacrylamide (M8), N-isopropylacrylamide (M9). b The polymerization method used for hydrogel preparation. c Schematic diagram of the process to prepare a hydrogel coating microarray.
The pipetting process was monitored, and the corresponding digital image is shown in Fig. 3a. The top view and side view of the coating microarray images demonstrated the successful preparation and regular distribution of hydrogel coating spots (Fig. 3b, c). A single microarray could consist of 400 (20 × 20) unique coating formulations, and these coatings exhibited the same size and thickness. The optical microscopy image showed that the diameter of each coating spot is approximately 2 mm (Fig. 3d). Moreover, the green and red fluorescent dyes were observed to be evenly distributed in the whole spot after preparation of the coating microarray (Fig. 3e, f). This phenomenon confirmed the adequate mixing and homogeneous distribution of the monomers in the hydrogel coating microarrays. The uniform mixing is mainly due to significant turbulence during the drop pipetting18,25. The miniaturized high-throughput synthesis paradigm enabled rapid screening of hydrogel coatings.

a Droplet pipetting process. b Top view of the microarray with a 400 (20 × 20) hydrogel spots. c Side view of the microarray. d The optical microscopy image of a single spot. e Green and f red fluorescence images of a single spot.
High-throughput stability evaluation of the hydrogel coating microarrays
From the literature, the monomer and crosslinking would impact the properties of the hydrogel coatings30,31,32. Taking these two parameters as variables, we first examined the water immersion stability of polyacrylamide-based hydrogel coatings. The hydrogel coating spots after the immersion for 48 h were simply classified into intact spots, slightly damaged spots and seriously damaged spots. The typical images of these three types of coatings were shown in Fig. S1, while the damage degree was used to evaluate the immersion stability of coating spots. As shown in Fig. S2, insufficient crosslinker content weakens the elastic properties of the polymer network, leading to poor stability of the hydrogel coating in an immersion environment. On the other hand, excessive crosslinkers leave the hydrogel coatings too fragile to remain stable against swelling33. A similar trend was observed when varying the monomer content. Almost all hydrogel coatings at a monomer content of 20 wt.% and crosslinker content of 10 wt.% were demonstrated to remain intact in the water immersion environment (Fig. S2f). Therefore, the above two parameters were fixed at 20% and 10% in subsequent experiments to avoid the coating being damaged by swelling.
Further, a library of hydrogel coatings was built to investigate the impact of different monomer combinations on their stability in the presence of strong base, acid and mixed enzymes. To rapidly evaluate the extent of damage on these hydrogel coatings in a parallel fashion, we used a high-throughput image analysis method to quantitatively measure the retained area of coating spots after immersion in these harsh conditions, respectively. The quantitative measurement process of 10 typical coating spots was shown in Fig. S3. Figure 4a summarized the stability values (hereafter referred to as Stab value) of all hydrogel coatings involving single and binary combinations of M1 to M9. In this 9 × 45 matrix, red and blue squares represented the relatively high and low Stab value, respectively. The hydrogel coatings with optimal Stab value were marked in bright red. Although we were not able to discern clear trends in the stability of the polyacrylamide hydrogel coatings under harsh conditions in their composition, the introduction of hydrophobic monomer M9 generally yielded more hydrogel coatings with high Stab value. In contrast, a binary combination of M8 and M9 yielded the optimal hydrogel coating that exhibited stability in strong base, strong acid and mixed enzyme solution. Some other formulations such as a 75:25 combination of M4 and M9 or a 50:50 combination of M2 and M8 exhibited excellent stability in the immersion of strong acid and mixed enzyme solution, respectively. The homopolymer prepared by single M8 and M9 monomers exhibited decent stability, but their Stab value was lower compared to the optimal hydrogel coating in the binary combination.

a Heat map of the Stab value of the single and binary combination of hydrogel coatings using high-throughput assays. The heat map reported the specific compositions of coatings. The vertical axis (on the left) represented the molecular combinations, and the numbers on the horizontal axis (at the top) indicated the proportion of the first monomer (expressed by larger characters). The sum of the proportions of the two monomers was 100. The sum of SHAP values of each monomer for the Stab value of hydrogel coatings after the immersion of b strong base, c acid and d mixed enzyme solution.
To comprehensively analyze the impact of hydrogel composition on its stability. A machine learning regression model (GBR) was built to fit the relationship between the hydrogel formulation and their Stab value. The GBR model offers robustness to high-dimensional and sparse features, and excels in capturing complex nonlinear relationships34. SHAP analysis was used to interpret the machine learning model output. A simple dataset was generated to capture the hydrogel coating formulations, whereby each coating was represented using one-hot encoding based on the mass ratios of the monomers. Subsequently, SHAP analysis was used to interpret the machine learning model output. As shown in Fig. 4b–d, the sum of positive (red color) and negative (blue color) SHAP values for each monomer in all hydrogel coating formulations was displayed separately. The contributions of the monomers to hydrogel coating stability generally exhibited consistent trends in different environments. The results pointed to M2, M5, M8 and M9 facilitating the coatings to be stable under harsh conditions, while the introduction of M1, M3, M6 and M7 was associated with the coating destabilization. Based on the results of SHAP analysis, it can be preliminarily deduced that the linear and charged monomers appear to be harmful for the stability of hydrogel coatings, and monomers with side chains seem to improve the coating stability. This may be attributed to the presence of side chains increases the steric hindrance of the polymer network, enhancing the entanglement between molecular chains35. Moreover, the electrostatic repulsion in charged monomers would increase the distance between polymer chains in hydrogels, resulting in a decrease in the tightness of the coating network, thereby impacting the integrity of the hydrogel coatings and reducing their stability36,37. Although one-hot encoding of hydrogel coating formulations has revealed some trends in their behaviors, its lack of generalization ability restricts its application in guiding the design of hydrogel coatings in unexplored chemical spaces.
Identifying main molecular feature descriptors for the coating stability
To help us in identifying the main molecular features giving rise to the stability of polyacrylamide-based hydrogel coatings under different conditions, we first extracted a total of 19 types of feature descriptors for each monomer. These descriptors were classified into four categories, including physicochemical descriptors, 2D molecular graph descriptors, 3D shape descriptors and hydrogen-bonding descriptors. Each polyacrylamide molecule was first represented as a simplified molecular input line entry system (SMILES) string, and subsequently their feature descriptor extraction could be implemented in the RDKit library38. The brief introduction of these 19 feature descriptors was summarized in Table S1. The feature descriptors for the hydrogel coating were obtained by calculating the weighted average of monomer descriptor values according to the specific coating composition26. We built five different regression models and used these 19 feature descriptors as the input. The MSE and R2 values were used to evaluate the prediction accuracy of the regression models, and the model with the minimal MSE and maximal R2 value was selected for further study. As shown in Fig. 5a–c, among these regression models, the best model corresponding to predicting the Stab value of hydrogel coatings in strong base, strong acid and mixed enzyme conditions was the GBR, GBR and ABR, respectively. The MSE values of the models ranged from 300 to 400. Based on the best models, we performed the feature screening on these 19 molecular feature descriptors using correlation analysis and recursive elimination.

Prediction accuracy (lower MSE value and higher R2 value represent higher accuracy) of SVR, ABR, GBR, ANN and GPR regression models for the Stab value of hydrogel coatings in a strong base, b strong acid and c mixed enzyme environments. d The flow chart showed the correlation screening process. e Pearson correlation coefficients between all molecular feature descriptors, where cyan and brown represented positive and negative correlations, respectively, and deeper colors indicated stronger correlations.
Figure 5d, e showed the correlation screening process and the correlation coefficient r value between 19 molecular feature descriptors. For any pair of feature descriptors exhibiting a strong correlation, two machine learning models were built using one descriptor from the pair along with the remaining descriptors as inputs. After evaluating the prediction accuracies of the two models, the feature descriptor corresponding to the less accurate model was excluded from the input dataset. In the machine learning model to predict the Stab value of hydrogel coatings in strong base, strong acid and mixed enzyme solutions, the same 12 feature descriptors were obtained after a correlation screening process. As shown in Fig. 6a–c, the prediction accuracy of the model increased after correlation screening (evidenced by the decreased MSE value). The redundant information in the input dataset led to overfitting or noise amplification of the model39.

Correlation screening and subsequent recursive elimination process for the input molecular feature descriptors in a strong base, b strong acid and c mixed enzyme dataset. The red star represented the number of the feature descriptors corresponding to the highest accurate model. Corresponding detailed process of feature screening via correlation analysis and recursive elimination in d strong base, e strong acid and f mixed enzyme dataset. The Y-axis was the name of the feature descriptors while the X-axis meant the round of the recursion elimination process (round 0 represented the results after the correlation screening. Yellow, blue and red rhombi represented the feature descriptors at round 0, remaining feature descriptors after each round of screening, and main feature descriptors obtained by the recursive elimination process, respectively.
These 12 feature descriptors were further screened using recursive elimination. With the recursive elimination process proceeding, the prediction accuracy of the model gradually increased. Eliminating feature descriptors that weakly correlated with the hydrogel coating stability further improved the prediction performance of the models40. Notably, the prediction accuracy of the three models changed from an upward trend to a downward trend when the number of features was reduced to 5, 5, and 6, respectively. It indicated that the weakly correlated descriptors have all been screened out, and the recursive elimination could be terminated at this point. Finally, 5, 5, and 6 feature descriptors were identified as the main descriptors for predicting the Stab value of hydrogel coatings under strong base, strong acid, and mixed enzyme environments, respectively. The detailed main feature identifying process using correlation analysis and recursive elimination was visualized in Fig. 6d–f. The remaining feature descriptors after each round of recursive elimination can be clearly observed (marked with a blue rhombus). Red rhombuses highlighted the main feature descriptors obtained after the whole screening process, which were considered the most determinant set of descriptors for hydrogel coating stability. This set of main feature descriptors was related to all 4 categories, they were physicochemical descriptors (logP and TPSA), 2D molecular graph descriptors (HallKierAlpha and BertzCT), 3D molecular shape descriptors (Eccentricity, FractionCSP3, Asphericity and InertialShapeFactor) and hydrogen-bonding descriptors (HBD). Interestingly, the feature descriptors determining the stability of hydrogel coatings exhibited disparities in different environmental conditions. Moreover, the predicted Stab values and the corresponding experimentally measured values of the three models before and after the feature screening were shown in Fig. 7. The MSE value of the models were decreased from 362 to 196 in strong base, from 299 to 145 in strong acid, and from 355 to 219 in mixed enzyme conditions, respectively. The corresponding R²values also increased synchronously. The results demonstrated that the feature screening process using correlation screening and recursive elimination significantly improved the model prediction accuracy.

The distribution of machine learning predicted values and experimental values in a strong base, b strong acid and c mixed enzyme dataset.
Interpretable analysis and experimental validation for machine learning models
To aid in our understanding of the hidden relationship between main molecular feature descriptors and the coating stability, a bee swarm diagram was used to summarize the SHAP values of each main feature descriptor related to the Stab value. The color of each point represented the relative magnitude of the descriptor value within the entire dataset. Specifically, colors closer to deep blue and light green correspond to relatively larger and smaller values, respectively. The X-axis meant the SHAP values of each point. The absolute SHAP values quantified the magnitude of the impact of feature descriptors on hydrogel coating stability, while positive and negative values indicate enhancement or reduction of the stability, respectively. As shown in Fig. 8a–c, LogP dominated the stability performance of the polyacrylamide-based hydrogel coatings in all three environments. Generally speaking, LogP values provide an estimate of the octanol/water partition coefficient, serving as a proxy for an energetic characteristic of the molecules to describe their hydrophilicity/hydrophobicity. The SHAP values of LogP exhibited a wide and monotonic distribution, indicating that LogP was positively correlated with the stability of hydrogel coatings. Overall, hydrogel coatings with high LogP values can form a physical barrier, reducing the direct attack of environmental factors on the hydrogel network. Specifically, hydrophilic functional groups in the network, such as esters and amide bonds, are more prone to hydrolysis under base conditions. High LogP coatings reduced the exposure of these groups, thereby slowing down the hydrolysis process41. In acidic environments, hydrophobic molecular chains lacking protonatable sites preferentially aggregate in non-polar regions, minimizing interactions with H⁺ and preventing the structural damage of the polymer network caused by protonation42. The hydrogel coatings with a high LogP value reduced the exposure of the catalytic site of enzymes. Therefore, LogP contributes most significantly to the stability performance of hydrogel coatings in strong base, strong acid and mixed enzyme environments.

Summary of the SHAP values of main molecular feature descriptors related to the stability behavior under a strong base, b strong acid and c mixed enzyme environments via a bee swarm diagram. The stability of typical coatings outside the original dataset under d strong base, e strong acid and f mixed enzyme conditions. The three numbers in M118, M028 and M136 represented the proportions of M2, M8 and M9 in the copolymer coatings. For example, M118 represented a hydrogel coating consisting of 10% M2, 10% M8 and 80% M9.
Subsequently, the 3D molecular shape descriptors, Eccentricity and Asphericity, were found to be highly relevant to the stability of hydrogel coatings. Coatings with high Eccentricity values exhibited superior stability under base and acid conditions, while those with high Asphericity values were observed high stability in enzyme solution. Both Eccentricity and Asphericity values serve as quantitative metrics for evaluating the extent of deviation of the object between an actual geometry and its idealized theoretical form43. Take the calculation of Eccentricity as an example, the atom (except the hydrogen atom) and the chemical bonds in molecule graphs were treated as vertices and edges, respectively. The Eccentricity value of a molecule was defined as the mean topological distance (i.e., the number of edges on the shortest path) between all pairs of vertices. Therefore, the magnitude of the eccentricity descriptor mainly reflects the extensibility or topological diameter of the molecule. Generally speaking, molecules with the following structural features tend to exhibit high eccentricity values: (i) long-chain alkanes, (ii) highly branched molecules with long arms, (iii) rigid rod-like molecules and (iv) macrocyclic molecules. The polymerization of these molecules tends to form hydrogel networks with topological chain entanglement and domains of directional alignment44. These polymer networks exhibited high elastic properties and have an ability to delay the penetration of corrosive media, thus exhibiting a strong impact on the stability of hydrogel coatings45. This finding agrees with the other work, which emphasizes the importance of polymer chain entanglement and asymmetrical network topology in determining the solvent resistance and increasing mechanical properties of hydrogels via NeTHE model46.
Moreover, HBD was also discovered to contribute a lot to coating stability. HBD is used to quantify the number of atoms or groups in a molecule that can act as hydrogen bond donors. In this work, HBD was identified to have a negative correlation with the stability of hydrogel coatings, i.e., the coating had fewer hydrogen bond donors in the polymer network exhibited better stability in base and acid environments. In general, high hydrogen bond density contributes to improved mechanical properties and stability of hydrogel coatings47. However, in base environments, the presence of abundant H⁺ ions led to the protonation of hydrogen bond donor sites, resulting in the dissociation of polymer network. Similarly, the OH– of base environment could destroy the hydrogen bond donors via deprotonation effect, leading to an increase in the swelling rate and a decrease in the mechanical properties of the hydrogel coating. It is worth noting that, HBD molecular descriptor does not exhibit a dominant effect on the stability of the hydrogel coating after immersing in the mixed enzyme solution. This result was due to that the mixed enzyme solution exhibited a neutral pH, which would not cause the protonation or deprotonation of the HBD hydrogen bond donor groups. The degradation mechanism of the enzyme solution to the hydrogel coating is considered to be the catalysis of specific substrates and is irrelevant to HBD48.
To verify the rationality of the feature identification results, the machine learning models built in this work were used to predict the stability of a large number of ternary hydrogels prepared from the combination of monomers M1 to M9. Similarly, the feature descriptors for the ternary hydrogel coatings were obtained by calculating the weighted average of monomer descriptor values according to the specific coating composition. According to the prediction results, we designed three new hydrogel coatings outside the original dataset and two of them were ternary. These coatings were composed of M2, M8, and M9 in different proportions, and were predicted with relatively high Stab value in all three environments. According to the specific compositions, these coatings were named M118, M028 and M136, respectively. The details of these coating compositions were reported in Table S2. For instance, M118 denotes a hydrogel coating composed of 10% M2, 10% M8, and 80% M9 by molar ratios. Fig. 8d–f showed the stability of M118, M028, M136 and M9 (the typical single-component coating in the original dataset) hydrogel coating under strong base, strong acid and mixed enzyme conditions. Among these, M118 and M028 samples exhibited superior stability to M9 during 168 h. Especially, the weight loss of M118 under all three conditions was less than 5%. Benefiting from the elaborate screening of main features, the advanced strategy successfully designed a ternary hydrogel coating with high stability in harsh environments by utilizing the labeled data obtained from single and binary coatings. The filtered list of feature descriptor candidates that significantly influenced the stability of hydrogel coatings can provide effective guidance for the design of long-term usable coatings. More interestingly, researchers have developed an advanced multi-objective property molecular optimization framework in a recent study. By leveraging the contribution of molecular feature descriptors to the target properties with scarce labeled data, they achieved the multi-objective optimization of molecular performance through structural modifications of molecules that have been optimized for a single target property with sufficiently labeled data49. Although the insufficient stability of hydrogel coating would severely hinder their service performance, current data and feature analyses regarding coating stability remain scarce. The revealed contribution of molecular features to hydrogel coating stability in this work provided a solid foundation for the knowledge transfer from single-attribute datasets to complex multi-objective optimization, thus playing a crucial role in the rational design of various functional hydrogel coatings with satisfying stability.
