Framework of AO
We summarize our key innovations as follows:
- A data-driven formula that leverages the number of visits and an ML model trained on a small initial dataset to effectively manage the exploration–exploitation trade-off. This markedly differs from the UCB formula used by MCTS, which relies on the average node value and the number of visits derived from numerous simulations.
- Local backpropagation that ensures a balanced exploration–exploitation trade-off for noncumulative reward problems.
- An adaptive exploration mechanism that favors exploration over exploitation under certain circumstances.
- A modified epsilon-greedy sampling technique that simultaneously samples the best-scored and the most-visited candidates.
While Fig. 1 provides a flowchart illustration of the AL loop, we provide here a mathematical formulation of the AL problem (using materials science as a demonstrator).
Specifically, let X denote the input space (representing, for example, materials such as chemical compositions, specific crystalline structures and so on). Let Y represent the output space, where y ∈ Y (y < +∞) denotes the specific property or property spectrum of interest (for example, mechanical strength or resistivity). The goal is to identify the optimal material x* ∈ X that maximizes or minimizes a property while minimizing the number of labeled data points required. The initial labeled dataset consists of \(D={\{({x}_{i},{y}_{i})\}}_{i=1}^{n}\), where n is the number of initial data points (n = 200 in this study), xi is an input vector and yi is the ground-truth label of xi. X is defined as the search space, typically \({{\mathbb{R}}}^{N}\), where N is the dimension. f is the deterministic function that maps an input x to its ground-truth label y. The surrogate model fθ learns the input–label relation from the dataset D.
The AL loop involves iteratively selecting samples from X based on a search algorithm Q(x; fθ) and retraining the surrogate model fθ. At each iteration t:
(1) Model training: train the model fθ using the current labeled dataset D:
$${\theta }^{t}={{\mathrm{argmin}}}_{\theta }\,{{\mathbb{E}}}_{(x,y)\sim D}[L(\theta ;x,y)],$$
where L is the loss function.
(2) Search and selection: select a subset of k samples xnew ∈ X based on fθ using a search algorithm Q(x; fθ) (k = 20 in both benchmark and real-world studies):
$${x}_{{\mathrm{new}}}={{\mathrm{argmax}}}_{x\in X}\,{f}_{\theta }(x).$$
(3) Labeling and updating: obtain the labels ynew for the selected samples xnew, and add them to the dataset:
$$D\leftarrow D\cup \{({x}_{{\mathrm{new}}},{y}_{{\mathrm{new}}})\}.$$
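The three-step loop above can be sketched in a few lines of Python. This is a minimal, self-contained toy, not the paper's implementation: the "surrogate" is a 1-nearest-neighbor lookup rather than a deep network, the ground-truth function is a negated sphere function, and the search step scores random candidates; all names and settings here are illustrative assumptions.

```python
import random

def f_true(x):
    # ground-truth objective (toy stand-in for an expensive experiment):
    # a negated sphere function whose maximum value is 0 at the origin
    return -sum(xi * xi for xi in x)

def fit_surrogate(dataset):
    # toy 1-nearest-neighbor "surrogate": predicts the label of the
    # closest labeled point (a stand-in for the deep learning model)
    def predict(x):
        nearest = min(dataset,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return nearest[1]
    return predict

def search_Q(surrogate, dim, n_candidates=200, k=20):
    # search step: score random candidates with the surrogate and
    # return the k highest-scoring ones
    pool = [tuple(random.uniform(-5, 5) for _ in range(dim))
            for _ in range(n_candidates)]
    return sorted(pool, key=surrogate, reverse=True)[:k]

def active_learning_loop(dim=3, n_init=20, rounds=5, k=20):
    random.seed(0)
    D = []
    for _ in range(n_init):
        x = tuple(random.uniform(-5, 5) for _ in range(dim))
        D.append((x, f_true(x)))
    for _ in range(rounds):
        f_theta = fit_surrogate(D)             # (1) model training
        x_new = search_Q(f_theta, dim, k=k)    # (2) search and selection
        D += [(x, f_true(x)) for x in x_new]   # (3) labeling and updating
    return D

D = active_learning_loop()
best = max(y for _, y in D)
```

The structure mirrors the loop exactly: each round fits the surrogate on all labels gathered so far, queries the search algorithm for a batch of k candidates, and labels them with the ground-truth function.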
RL is another commonly used method for identifying optimal solutions. The differences between AL and RL lie in three main aspects: (1) data accessibility, (2) the quantity of data needed and (3) the nature of rewards (noncumulative versus cumulative).
- Data accessibility: in typical RL settings, a policy network interacts with the environment, requiring easy access to reward functions. By contrast, AO, particularly in scientific contexts, often deals with limited access to reward functions. For instance, in materials science, it might take months to obtain just a few labeled data points.
- Quantity of data needed: RL training commonly demands large amounts of labeled data or observations to develop an effective policy network. AL, however, operates in a low-data regime, usually with fewer than 1,000 data points, and requires only a value-estimation network.
- Nature of rewards: RL algorithms are primarily used for trajectory planning and optimal control problems, involving sequential decisions and cumulative rewards. Conversely, AL typically focuses on maximizing the current reward functions.
Technical details of NTE
There are three modes of action for the stochastic expansion, each occurring with equal probability (that is, 1/3). (1) One-step move: this mode represents the smallest possible change at a single position of the feature vector. (2) Single mutation: in this mode, one position of the feature vector randomly mutates to any value within the allowed range. (3) Scaled random mutation: this mode involves a proportion of the feature vector randomly mutating to any allowed values. The number of leaf nodes equals the dimension of the feature vector.
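A minimal sketch of the three expansion modes, assuming a discrete search space on [−5, 5] with step 0.1; the bounds, step size and the 30% mutation proportion are illustrative assumptions, not the paper's exact settings.

```python
import random

def one_step_move(x, lo=-5.0, hi=5.0, step=0.1):
    # mode 1: smallest possible change (one step up or down) at one position
    x = list(x)
    i = random.randrange(len(x))
    x[i] = min(hi, max(lo, x[i] + random.choice([-step, step])))
    return x

def single_mutation(x, lo=-5.0, hi=5.0, step=0.1):
    # mode 2: one random position mutates to any allowed grid value
    x = list(x)
    i = random.randrange(len(x))
    x[i] = lo + step * random.randrange(int(round((hi - lo) / step)) + 1)
    return x

def scaled_random_mutation(x, lo=-5.0, hi=5.0, step=0.1, fraction=0.3):
    # mode 3: a proportion of the positions mutate to any allowed values
    x = list(x)
    for i in random.sample(range(len(x)), max(1, int(fraction * len(x)))):
        x[i] = lo + step * random.randrange(int(round((hi - lo) / step)) + 1)
    return x

def stochastic_expansion(x):
    # each mode fires with equal probability (1/3)
    mode = random.choice([one_step_move, single_mutation, scaled_random_mutation])
    return mode(x)

def expand_leaves(x):
    # the number of leaf nodes equals the feature-vector dimension
    return [stochastic_expansion(x) for _ in range(len(x))]
```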
A real-world complex system can often be represented as a vector or a matrix. For example, in materials science, searching for high-performance CCAs can be formulated as optimizing the properties by tuning the alloy compositions36; in biology, protein design can be approached as improving biofunctionalities by optimizing a sequence consisting of 20 amino acids. We implement a convolutional neural network as the deep learning surrogate model. It consists of convolutional layers followed by pooling, dropout and normalization layers to prevent overfitting. The network parameters are optimized using the Adam optimizer, and the loss function is the mean-squared error or mean absolute percentage error. More detailed parameters can be found in the Supplementary Note and Supplementary Fig. 3.
Standard MCTS consists of four major steps: selection, expansion, simulation and backpropagation (Supplementary Fig. 12). We summarize key differences between DANTE and MCTS.
(1) The MCTS backpropagation mechanism uses the result of the rollout to update both the value and visitation of the nodes along the path, which affects all nodes (from root to end node) at a global level. Local backpropagation updates only the visitation information of the current root node and the subsequent leaf nodes. We do not update the value information because our optimization problem focuses solely on discovering a single optimal state and retains little ‘memory’ of previous states. Therefore, value information is not backpropagated, and visitation backpropagation is short-ranged, relying only on nearby visitation data to guide exploration.
(2) The MCTS selection step chooses the leaf node with the maximum UCB and proceeds to the next expansion with the selected leaf node, whereas the expansion of NTE is conditioned on an inequality of the DUCB: the expansion proceeds with the leaf node that has a higher DUCB than the root node; otherwise, it proceeds with the same root node.
(3) Conventional rollout uses the simulation step to reach the end state (for example, a win or loss of a game) and uses the average value as the current node value, whereas the stochastic rollout of NTE does not need a simulation step to obtain the node value; instead, it uses the surrogate model to estimate it.
Furthermore, Supplementary Fig. 13 shows the difference between UCB and DUCB.
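The DUCB-gated selection and local backpropagation described in points (1) and (2) can be sketched as follows. The `ducb` score below is a deliberate placeholder: the actual DANTE formula combines surrogate-predicted values with visit counts and is not reproduced in this excerpt, so only the control flow, not the scoring function, reflects the method. The `(state, surrogate_value)` pair layout is also a hypothetical convenience for this sketch.

```python
import math

def ducb(value, visits, c=1.0):
    # PLACEHOLDER score: value plus a visit-decaying exploration bonus;
    # the real DANTE DUCB formula is defined in the paper, not here
    return value + c / math.sqrt(1 + visits)

def conditional_step(root, leaves, visits):
    # root and each leaf are (state, surrogate_value) pairs;
    # visits maps state -> visit count (hypothetical data layout)
    root_score = ducb(root[1], visits.get(root[0], 0))
    best = max(leaves, key=lambda n: ducb(n[1], visits.get(n[0], 0)))
    best_score = ducb(best[1], visits.get(best[0], 0))
    # local backpropagation: only the current root and its leaves get
    # their visitation updated; node values are never backpropagated
    visits[root[0]] = visits.get(root[0], 0) + 1
    for state, _ in leaves:
        visits[state] = visits.get(state, 0) + 1
    # conditional selection: move to the leaf only if its DUCB
    # exceeds the root's; otherwise expansion continues from the root
    return best if best_score > root_score else root
```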
Top-visit sampling
The sampling technique for selecting top candidates is a critical component of the AO pipeline. An effective sampling method should identify the most informative candidates while preserving data diversity, ensuring that the surrogate model generalizes well to unseen data. A widely used sampling approach is the epsilon-greedy method, which combines greedy selection with random sampling. To enhance the generalization capability of the surrogate model, we extend the epsilon-greedy strategy by implementing ‘top-visit sampling’, which samples data that are frequently visited during rollouts. Figure 3b demonstrates that DANTE, when lacking top-visit sampling, exhibits a higher surrogate model loss and requires 30% more data points to achieve the global optimum (as detailed in the Methods, Supplementary Note and Supplementary Fig. 1).
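A hedged sketch of how such an extended epsilon-greedy batch could be assembled; the split between the greedy, top-visit and random shares is an illustrative assumption, not the paper's exact fractions.

```python
import random

def top_visit_sampling(candidates, scores, visits, k=20,
                       eps=0.1, top_visit_frac=0.25):
    # assumed batch split: mostly best-scored candidates, a share of
    # most-visited candidates, plus an epsilon share of random picks
    n_random = max(1, int(eps * k))
    n_visit = max(1, int(top_visit_frac * k))
    n_greedy = k - n_random - n_visit
    by_score = sorted(candidates, key=lambda c: scores[c], reverse=True)
    by_visit = sorted(candidates, key=lambda c: visits[c], reverse=True)
    chosen = []

    def take(pool, n):
        for c in pool:
            if n <= 0:
                break
            if c not in chosen:   # avoid duplicates across the shares
                chosen.append(c)
                n -= 1

    take(by_score, n_greedy)   # greedy: best surrogate scores
    take(by_visit, n_visit)    # top-visit: most visited during rollouts
    remaining = [c for c in candidates if c not in chosen]
    chosen += random.sample(remaining, min(n_random, len(remaining)))
    return chosen
```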
Ablation study
We conducted an ablation study on the Rosenbrock-100d function to analyze the impact of DANTE’s individual components on overall performance (Fig. 3b; additional results are provided in Supplementary Fig. 1). The results clearly show that conditional selection and local backpropagation are critical to DANTE’s effectiveness. Without conditional selection, the tree search suffers from the value deterioration problem and has a 0% convergence ratio, defined as the frequency with which the algorithm identifies the global optimum within a given number of data points. Without local backpropagation, DANTE becomes a greedy stochastic tree search, leading to poor performance and similarly a 0% convergence ratio. Moreover, omitting top-visit sampling and adaptive exploration, while still allowing for convergence, notably degrades performance. In such cases, the average best f(x) remains distant from the global optimum, and the convergence ratios drop to 60% and 30%, respectively.
We further assess the limits of DANTE by evaluating its performance in tackling high-dimensional problems with a limited number of data points. As shown in Extended Data Table 2, DANTE demonstrates exceptional performance, successfully converging across various synthetic functions ranging from 200 to 2,000 dimensions. By contrast, SOTA methods fail to converge to any functions beyond 100 dimensions. Notably, none of the baseline methods achieves global convergence on the Rosenbrock function in dimensions exceeding 10, while DANTE successfully converges in dimensions as high as 200 (Supplementary Table 3).
Synthetic functions
The synthetic functions are designed for evaluating and analyzing the computational optimization approaches. In total, six of them are selected on the basis of their physical properties and shapes. Results for the Ackley, Rosenbrock and Rastrigin functions are presented in the main text because they are widely studied and relevant results are extensively available in the literature. We also test three other synthetic functions (Griewank, Schwefel and Michalewicz), and the results are presented in Supplementary Fig. 10. The Ackley function can be written as
$$f(x)=-a\exp \left(-b\sqrt{\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}{x}_{i}^{2}}\right)-\exp \left(\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}\cos (c{x}_{i})\right)+a+\exp (1),$$
(2)
where a = 20, b = 0.2, c = 2π and d is the dimension.
The Rosenbrock function can be written as
$$f(x)=\mathop{\sum}\limits_{i=1}^{d-1}\left[100{({x}_{i+1}-{x}_{i}^{2})}^{2}+{({x}_{i}-1)}^{2}\right].$$
(3)
The Rastrigin function can be written as
$$f(x)=10d+\mathop{\sum }\limits_{i=1}^{d}\left[{x}_{i}^{2}-10\cos (2\uppi {x}_{i})\right].$$
(4)
The three functions are evaluated on the hypercube xi ∈ [−5, 5], for all i = 1, …, d with a discrete search space of a step size of 0.1; we also show that different step sizes (within a certain range) do not affect the general behavior of the algorithm (Supplementary Fig. 14). We sample 20 data points per round when using neural networks as surrogate models. More details and results can be found in the Supplementary Note.
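For reference, the three benchmark functions can be implemented directly (the Rastrigin sum runs over all d coordinates, per the standard definition); each has a global minimum of 0, at the origin for Ackley and Rastrigin and at the all-ones point for Rosenbrock.

```python
import math

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    # global minimum f = 0 at x = (0, ..., 0)
    d = len(x)
    s1 = sum(xi * xi for xi in x) / d
    s2 = sum(math.cos(c * xi) for xi in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e

def rosenbrock(x):
    # global minimum f = 0 at x = (1, ..., 1)
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    # global minimum f = 0 at x = (0, ..., 0)
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi)
                             for xi in x)
```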
Electron ptychography
Feature engineering
The feature vector consists of eight variables: beam energy, defocus, maximum number of iterations, number of iterations with identical slices, probe-forming semi-angle, update step size, slice thickness and number of slices. Detailed values and their bounds are listed in Supplementary Table 4.
Optimization target
The objective function NMSE is calculated between the positive square root of the measured diffraction pattern IM and the modulus of the Fourier-transformed simulated exit-wave Ψ, which can be formulated as
$${\mathrm{NMSE}}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{\left\vert \sqrt{{I}_{{\mathrm{M}}(i)}({\bf{u}})}-\left\vert {\mathscr{F}}[{\Psi }_{i}({\bf{r}})]\right\vert \right\vert }^{2},$$
(5)
where r and u denote the real- and reciprocal-space coordinate vectors, respectively, and N is the total number of the measured diffraction patterns.
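A minimal numerical sketch of equation (5), using a naive 1D discrete Fourier transform in place of the 2D FFT of a real reconstruction; the data layout, with one intensity array and one exit wave per pattern, is an assumption of this sketch.

```python
import cmath
import math

def dft(psi):
    # naive 1D discrete Fourier transform (stand-in for the 2D FFT of
    # the exit wave in a real ptychographic reconstruction)
    n = len(psi)
    return [sum(psi[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]

def nmse(measured_intensities, exit_waves):
    # equation (5): for each of the N patterns, compare the square root
    # of the measured intensity with the modulus of the Fourier-transformed
    # simulated exit wave, summed over pixels and averaged over patterns
    total = 0.0
    for I_m, psi in zip(measured_intensities, exit_waves):
        F = dft(psi)
        total += sum((math.sqrt(I) - abs(Fk)) ** 2 for I, Fk in zip(I_m, F))
    return total / len(exit_waves)
```

If the simulated exit wave reproduces the measurement exactly, the NMSE vanishes, which is the fixed point the parameter optimization drives towards.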
Correlation index
The degree of matching for a given template T by intensity function P is characterized by a correlation index, which can be defined by the following relation:
$$\frac{\mathop{\sum }\nolimits_{i = 1}^{m}P({x}_{i},{y}_{i})T({x}_{i},{y}_{i})}{\sqrt{\mathop{\sum }\nolimits_{i = 1}^{m}{P}^{2}({x}_{i},{y}_{i})}\sqrt{\mathop{\sum }\nolimits_{i = 1}^{m}{T}^{2}({x}_{i},{y}_{i})}},$$
(6)
where (xi, yi) is the coordinate of pixel i.
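Equation (6) is a normalized cross-correlation and can be computed directly; the m pixels are flattened into 1D lists here for simplicity.

```python
import math

def correlation_index(P, T):
    # equation (6): normalized cross-correlation between the intensity
    # function P and the template T over m shared pixels
    num = sum(p * t for p, t in zip(P, T))
    den = (math.sqrt(sum(p * p for p in P))
           * math.sqrt(sum(t * t for t in T)))
    return num / den
```

The index is 1 for a perfect (scale-invariant) match and 0 for orthogonal patterns.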
Dataset simulation
abTEM37, an open-source package, is used for the simulation of a TEM experiment. For this case study, we simulated a four-dimensional dataset of 18-nm-thick silicon along the [110] direction with Poisson noise.
Ptychographic reconstruction
The analysis is performed using py4DSTEM38, a versatile open-source package for different modes of STEM data analysis. See Supplementary Figs. 11 and 15 for more details about the reconstruction process.
Architected materials
Feature engineering
In this study, the objective for architected materials optimization is a Gyroid triply periodic minimal surface structure, which occurs naturally in butterfly wings and is renowned for its exceptional biological characteristics and mechanical performance. The Gyroid scaffold to be optimized comprises 27 subunits, each with dimensions of 2 × 2 × 2 mm, allowing its geometric features and mechanical properties to be tuned by adjusting each subunit’s density. The density of each subunit can take discrete values from 10% to 80%, with an increment of 10%. The base material of the scaffold is Ti6Al4V alloy. Three-dimensional convolutional neural networks are used to accurately and rapidly assess the impact of adjustments to each subunit’s density on the scaffold’s performance. Details about structure generation are presented in ref. 39.
Optimization target
To mechanically stimulate bone reconstruction in bone defects, it is well recognized that the elastic modulus of bone grafts should be equivalent to that of the replaced bone, which ranges from 0.03 to 3 GPa for cancellous bone and 3 to 30 GPa for cortical bone, while there are specific modulus demands for different anatomical locations40. Moreover, it requires the optimization of load-bearing capacity to prevent damage during implantation. Here, we establish the modulus requirement for the implanted site at 2.5 GPa. Consequently, the optimization target is to maximize the yield strength of the scaffold while ensuring the elastic modulus remains within a specified range (2,500 ± 200 MPa).
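One simple way to fold the modulus constraint into a single score is hard rejection: designs whose modulus leaves the 2,500 ± 200 MPa window receive an infinitely bad target. The paper does not specify its exact constraint handling, so this form is an illustrative assumption.

```python
def scaffold_target(yield_strength, elastic_modulus,
                    e_target=2500.0, tol=200.0):
    # maximize yield strength subject to the elastic modulus staying
    # within e_target +/- tol (all values in MPa)
    if abs(elastic_modulus - e_target) > tol:
        return float("-inf")   # infeasible design, never selected
    return yield_strength
```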
Finite element simulation
Finite element (FE) simulations of the compressive stress–strain curves of the scaffolds are conducted using ABAQUS 2018. The FE simulations use the same rigid-cylinder and deformable-implant-structure model. The material property is set to be homogeneous with a Poisson’s ratio of 0.25; the calibration protocol is detailed in ref. 39. Ductile damage is used to simulate plastic deformation up to the failure stage, with a fracture strain set at 0.03. The effects of triaxiality deviation and strain rate are disregarded. Displacement and force are extracted during postprocessing and subsequently converted to strain and stress, respectively. The FE simulations agree well with the experimental compression curves (Supplementary Fig. 16).
ML model
The initial dataset (100 density matrices) is consistent with our previous work39, and the corresponding elastic modulus and yield strength are calculated by FE simulations. Three-dimensional convolutional neural networks are used to predict the elastic modulus and yield strength of the scaffolds with varying density matrices. The model architecture comprises an input layer, convolutional layers, fully connected layers and an output layer (refer to the Supplementary Note and Supplementary Fig. 17 for detailed parameters). In the input layer, the scaffold structure is voxelized into 60 × 60 × 60 pixels, where each pixel denotes either the solid phase (1) or void phase (0) within the scaffold. The convolutional layers are designed with a series of three-dimensional convolution kernels to extract high-dimensional information about the scaffold, while the output layer delivers the final prediction.
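The voxelization step can be sketched as follows. The density-to-geometry rule below uses the standard Gyroid level-set function sin(x)cos(y) + sin(y)cos(z) + sin(z)cos(x) < t with an assumed linear density-to-threshold map; the actual structure generation follows ref. 39, so the mapping, the number of periods and the subunit layout are all assumptions of this sketch.

```python
import math

def voxelize_gyroid(density, n=60, cells=3, periods=3):
    # density: cells x cells x cells nested list of target subunit densities
    # returns an n x n x n binary grid (1 = solid phase, 0 = void phase)
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                si, sj, sk = i * cells // n, j * cells // n, k * cells // n
                # assumed linear map from density in (0, 1) to the gyroid
                # level-set threshold in (-1.5, 1.5); density 0.5 gives
                # t = 0, which fills roughly half the volume
                t = 1.5 * (2.0 * density[si][sj][sk] - 1.0)
                x, y, z = (2 * math.pi * periods * v / n for v in (i, j, k))
                g = (math.sin(x) * math.cos(y) + math.sin(y) * math.cos(z)
                     + math.sin(z) * math.cos(x))
                grid[i][j][k] = 1 if g < t else 0
    return grid
```

Raising a subunit's target density raises its threshold, so the solid voxel count increases monotonically with density, which is the property the 3D CNN input relies on.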
Compositionally complex alloys
Feature engineering
We adopt 27 elements: Fe, Co, Ni, Ta, Al, Ti, Nb, Ge, Au, Pd, Zn, Ga, Mo, Cu, Pt, Sn, Cr, Mn, Mg, Si, Ru, Rh, Hf, W, Re, Ir and Bi, to design six-element CCAs with either bcc or fcc structures. For Fe, Co and Ni, the atomic ratio ranges from 0 at.% to 100 at.%, while for other elements, it ranges from 0 at.% to 40 at.%, with 0.5 at.% intervals. In addition, the total atomic percentage of Fe, Co and Ni is designed to fall between 60 at.% and 80 at.%. For CCAs with a bcc crystal structure, the Fe/(Co + Ni) ratio is required to be greater than or equal to 1.5, whereas for fcc structures, it is required to be less than or equal to 1.5.
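The design rules can be expressed as a rejection-sampling generator, working in 0.5 at.% units so every composition lands on the grid; the fixed choice of the three minor elements and the requirement that all six elements be non-zero are assumptions of this sketch.

```python
import random

def random_cca(structure="bcc", max_tries=100000):
    others = ("Ta", "Al", "Ti")  # assumed minor elements (any 3 of the 24)
    for _ in range(max_tries):
        # work in 0.5 at.% units so every value lands on the design grid
        total_fcn = random.randint(120, 160)       # Fe+Co+Ni: 60-80 at.%
        fe = random.randint(1, total_fcn - 2)
        co = random.randint(1, total_fcn - fe - 1)
        ni = total_fcn - fe - co
        ratio = fe / (co + ni)
        if structure == "bcc" and ratio < 1.5:
            continue                               # bcc: Fe/(Co+Ni) >= 1.5
        if structure == "fcc" and ratio > 1.5:
            continue                               # fcc: Fe/(Co+Ni) <= 1.5
        rest = 200 - total_fcn                     # 20-40 at.% left over
        a = random.randint(1, min(80, rest - 2))   # each minor <= 40 at.%
        b = random.randint(1, min(80, rest - a - 1))
        c = rest - a - b                           # guaranteed 1..78 units
        comp = dict(zip(("Fe", "Co", "Ni") + others, (fe, co, ni, a, b, c)))
        return {el: units / 2.0 for el, units in comp.items()}  # at.%
    raise RuntimeError("no valid composition found")
```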
Optimization target for magnetic and electric properties
The optimization target is to maximize the following target:
$${\mathrm{Target}}=M\times \rho,$$
(7)
where M stands for magnetic moment and ρ for resistivity.
Optimization target for transport properties
The optimization target is to maximize the following target:
$${\mathrm{Target}}={\mathrm{AHC}}\times {\mathrm{AHA}},$$
(8)
while keeping the formation energy under the upper limit of 0.02.
Density functional calculation
The transport properties are described by the conductivity tensor σνμ (ν, μ = x, y, z). The anomalous Hall conductivity (AHC, σxy) and anomalous Hall angle (AHA, σxy/σxx) are determined within the framework of the Kubo–Bastin linear response formalism in the relativistic multiple-scattering Korringa–Kohn–Rostoker (KKR) Green’s function (GF) method41, as implemented in the MUNICH SPR-KKR package42. The Kubo–Bastin formalism includes both the Fermi-surface and Fermi-sea contributions on equal footing: the Fermi-surface term contains only contributions from states at the Fermi energy (EF), whereas the Fermi-sea term involves all occupied states (with energy E) below the Fermi energy, that is,
$${\sigma }_{\mu \nu }={\sigma }_{\mu \nu }^{I}+{\sigma }_{\mu \nu }^{II}$$
(9)
$${\sigma }_{\mu \nu }^{I}=\frac{\hslash }{2\uppi \varOmega }{\mathrm{Tr}}\left\langle {\hat{j}}_{\mu }\left({\hat{G}}^{+}-{\hat{G}}^{-}\right){\hat{j}}_{\nu }{\hat{G}}^{-}-{\hat{j}}_{\mu }{\hat{G}}^{+}{\hat{j}}_{\nu }\left({\hat{G}}^{+}-{\hat{G}}^{-}\right)\right\rangle$$
(10)
$$\begin{array}{l}{\sigma }_{\mu \nu }^{II}=\frac{\hslash }{2\uppi \varOmega }\mathop{\int}\nolimits_{-\infty }^{{E}_{{\mathrm{F}}}}{\mathrm{d}}E\,{\mathrm{Tr}}\left\langle {\hat{j}}_{\mu }{\hat{G}}^{+}{\hat{j}}_{\nu }\frac{{\mathrm{d}}{\hat{G}}^{+}}{{\mathrm{d}}E}-{\hat{j}}_{\mu }\frac{{\mathrm{d}}{\hat{G}}^{+}}{{\mathrm{d}}E}{\hat{j}}_{\nu }{\hat{G}}^{+}\right.\\\left.-\left({\hat{j}}_{\mu }{\hat{G}}^{-}{\hat{j}}_{\nu }\frac{{\mathrm{d}}{\hat{G}}^{-}}{{\mathrm{d}}E}-{\hat{j}}_{\mu }\frac{{\mathrm{d}}{\hat{G}}^{-}}{{\mathrm{d}}E}{\hat{j}}_{\nu }{\hat{G}}^{-}\right)\right\rangle .\end{array}$$
(11)
The electric current operator is given by \({\hat{j}}_{\mu (\nu )}=-| e| c{\alpha }_{\mu (\nu )}\), with e > 0 being the elementary charge. \({\hat{G}}^{+}\) and \({\hat{G}}^{-}\) denote the retarded and advanced GFs, respectively. The representation of the GFs for the first-principles treatment of equations (10) and (11) leads to a product expression containing matrix elements of the current operators with the basis functions and k-space integrals over scattering path operators. In this averaging procedure, the chemical disorder and vertex corrections are treated by means of the coherent potential approximation43. For both the Fermi-surface and Fermi-sea terms, the conductivity tensor partitions into an on-site term σ0 involving regular and irregular solutions and an off-site term σ1 containing only regular solutions. This formalism has been validated to provide residual and anomalous Hall resistivities consistent with experiments41; more details can be found in the Supplementary Note.
ML model
An initial set of 200 CCAs is randomly generated following the previously described design rules, and their corresponding AHA, AHC and formation energy are calculated by density functional theory (DFT). For CCAs with bcc grain structures, 154 configurations ultimately converge in the DFT calculations, whereas for fcc structures, 178 converge. We train one-dimensional convolutional neural networks to predict the AHA, AHC and formation energy of the CCAs. The model architecture includes an input layer, convolutional layers, fully connected layers and an output layer (see the Supplementary Note and Supplementary Fig. 18 for detailed parameters).
Cyclic peptide binder
Feature engineering
We represent the cyclic peptide as a sequence of integers that range from 0 to 19, with each number corresponding to a distinct type of canonical amino acid. The leaf node within the DANTE framework is obtained through stochastic expansion. In this process, two complementary strategies are used: one that introduces random mutations in existing sequences and another that generates entirely new sequences, ensuring a comprehensive exploration of the sequence space.
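The two complementary expansion strategies can be sketched directly on the integer encoding; the 50/50 split between them is an illustrative assumption.

```python
import random

AA_ALPHABET = 20  # canonical amino acids, encoded as integers 0-19

def mutate(seq, n_mut=1):
    # strategy 1: random point mutations in an existing sequence
    seq = list(seq)
    for i in random.sample(range(len(seq)), n_mut):
        seq[i] = random.randrange(AA_ALPHABET)
    return seq

def fresh_sequence(length):
    # strategy 2: an entirely new sequence of the same length
    return [random.randrange(AA_ALPHABET) for _ in range(length)]

def stochastic_expansion(seq, p_fresh=0.5):
    # leaf nodes come from one of the two complementary strategies;
    # the 50/50 split is an illustrative assumption
    return fresh_sequence(len(seq)) if random.random() < p_fresh else mutate(seq)
```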
Optimization target
The optimization target of the cyclic peptide binder is defined as follows:
$${\mathrm{Target}}={\mathrm{SC}}\times {\mathrm{dSASA}}/100,$$
(12)
where SC stands for shape complementarity, and dSASA represents the change in solvent-accessible surface area before and after interface formation. The SC value ranges from 0 to 1, referring to how well the surfaces of two proteins fit geometrically together at their interface; dSASA measures the size of the interface (in units of Å2). Both metrics are essential to assess the quality of the interface. Therefore, we multiply these two metrics to formulate a multiobjective optimization problem, which is used to evaluate the performance of DANTE.
Dataset
Fourteen unique protein and canonical cyclic peptide complexes are sourced from the Protein Data Bank, with peptide lengths ranging from 7 to 14 amino acids. We perform three different optimization tasks using DANTE, gradient descent (GD) and Markov chain Monte Carlo (MCMC). The tasks start from a random initial sequence. The structure with the highest target value is selected as the best structure. For each task, we performed three independent tests.
Alphafold2 settings
The structure of protein and cyclic peptide binder complex is predicted by Alphafold2-multimer implemented in ColabDesign. A modified offset matrix for the relative positional encoding of a target protein and cyclic peptide complex is adapted to give the structure with high accuracy44. For designing a cyclic peptide binder, the binder hallucination protocol is utilized for both GD and MCMC methods. In this study, we maintain the length of the cyclic binder and the interaction site hotspots consistent with those found in nature. For GD, the method ‘design_pssm_semigreedy()’ is used, setting soft_iter to 120 and hard_iter to 32. The loss function is a weighted sum of pLDDT (predicted local distance difference test) and interface contact loss, with other parameters left at their default settings. For the MCMC method, a total of 1,000 steps are executed to find the sequence achieving the highest pLDDT. More details can be found in the Supplementary Note.
Rosetta interface analyzer
The SC and dSASA values for the predicted structure of the protein and cyclic peptide complex are computed using the Rosetta Interface Analyzer. Initially, the Rosetta minimize protocol is applied to obtain the structure with minimum energy proximal to the initial conformation. To ensure that cyclic peptides within the complex retain their cyclic nature and do not become linear, the options ‘-use_truncated_termini’ and ‘-relax:bb_move false’ are used. Subsequently, the minimized complex serves as the input for the interface analyzer.
