Framework of AO
We summarize our key innovations as follows:
- A data-driven formula that leverages the number of visits and an ML model trained on a small initial dataset to effectively manage the exploration–exploitation trade-off. This markedly differs from the UCB formula used by MCTS, which relies on the average node value and the number of visits derived from numerous simulations.
- Local backpropagation that ensures a balanced exploration–exploitation trade-off for noncumulative reward problems.
- An adaptive exploration mechanism that favors exploration over exploitation under certain circumstances.
- A modified epsilon-greedy sampling technique that simultaneously samples the best-scored and the most-visited candidates.
While Fig. 1 provides a flowchart illustration of the AL loop, we provide here a mathematical formulation of the AL problem (using materials science as a demonstrator).
Specifically, let X denote the input space (representing, for example, materials such as chemical compositions, specific crystalline structures and so on). Let Y represent the output space, where y ∈ Y (y < +∞) denotes the specific property or property spectrum of interest (for example, mechanical strength or resistivity). The goal is to identify the optimal material x* ∈ X that maximizes or minimizes a property while minimizing the number of labeled data points required. The initial labeled dataset consists of \(D={\{({x}_{i},{y}_{i})\}}_{i=1}^{n}\), where n is the number of initial data points (n = 200 in this study), xi is an input vector and yi is the ground-truth label of xi. X is defined as the search space, typically \({{\mathbb{R}}}^{N}\), where N is the dimension. f is the deterministic function that maps an input x to its ground-truth label y. The surrogate model fθ learns the input–label relation from the dataset D.
The AL loop involves iteratively selecting samples from X based on a search algorithm Q(x; fθ) and retraining the surrogate model fθ. At each iteration t:
(1) Model training: train the model fθ using the current labeled dataset D:
$${\theta }^{t}={{\mathrm{argmin}}}_{\theta }\,{{\mathbb{E}}}_{(x,y)\sim D}[L(\theta ;x,y)],$$
where L is the loss function.
(2) Search and selection: select a subset of k samples xnew ∈ X based on fθ using a search algorithm Q(x; fθ) (k = 20 in both benchmark and real-world studies):
$${x}_{{\mathrm{new}}}={{\mathrm{argmax}}}_{x\in X}\,{f}_{\theta }(x).$$
(3) Labeling and updating: obtain the labels ynew for the selected samples xnew, and add them to the dataset:
$$D\leftarrow D\cup \{({x}_{{\mathrm{new}}},{y}_{{\mathrm{new}}})\}.$$
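The three-step loop above can be sketched in a few lines of Python. This is a minimal, self-contained toy, not the paper's implementation: the "surrogate" is a 1-nearest-neighbor lookup rather than a deep network, the ground-truth function is a negated sphere function, and the search step scores random candidates; all names and settings here are illustrative assumptions.

```python
import random

def f_true(x):
    # ground-truth objective (toy stand-in for an expensive experiment):
    # a negated sphere function whose maximum value is 0 at the origin
    return -sum(xi * xi for xi in x)

def fit_surrogate(dataset):
    # toy 1-nearest-neighbor "surrogate": predicts the label of the
    # closest labeled point (a stand-in for the deep learning model)
    def predict(x):
        nearest = min(dataset,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return nearest[1]
    return predict

def search_Q(surrogate, dim, n_candidates=200, k=20):
    # search step: score random candidates with the surrogate and
    # return the k highest-scoring ones
    pool = [tuple(random.uniform(-5, 5) for _ in range(dim))
            for _ in range(n_candidates)]
    return sorted(pool, key=surrogate, reverse=True)[:k]

def active_learning_loop(dim=3, n_init=20, rounds=5, k=20):
    random.seed(0)
    D = []
    for _ in range(n_init):
        x = tuple(random.uniform(-5, 5) for _ in range(dim))
        D.append((x, f_true(x)))
    for _ in range(rounds):
        f_theta = fit_surrogate(D)             # (1) model training
        x_new = search_Q(f_theta, dim, k=k)    # (2) search and selection
        D += [(x, f_true(x)) for x in x_new]   # (3) labeling and updating
    return D

D = active_learning_loop()
best = max(y for _, y in D)
```

The structure mirrors the loop exactly: each round fits the surrogate on all labels gathered so far, queries the search algorithm for a batch of k candidates, and labels them with the ground-truth function.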
RL is another commonly used method for identifying optimal solutions. The differences between AL and RL lie in three main aspects: (1) data accessibility, (2) the quantity of data needed and (3) the nature of rewards (noncumulative versus cumulative).
- Data accessibility: in typical RL settings, a policy network interacts with the environment, requiring easy access to reward functions. By contrast, AO, particularly in scientific contexts, often deals with limited access to reward functions. For instance, in materials science, it might take months to obtain just a few labeled data points.
- Quantity of data needed: RL training commonly demands large amounts of labeled data or observations to develop an effective policy network. AL, however, operates in a low-data regime, usually with fewer than 1,000 data points, and requires only a value-estimation network.
- Nature of rewards: RL algorithms are primarily used for trajectory planning and optimal control problems, involving sequential decisions and cumulative rewards. Conversely, AL typically focuses on maximizing the current reward functions.
Technical details of NTE
There are three modes of action for the stochastic expansion, each occurring with equal probability (that is, 1/3). (1) One-step move: this mode represents the smallest possible change at a single position of the feature vector. (2) Single mutation: in this mode, one position of the feature vector randomly mutates to any value within the allowed range. (3) Scaled random mutation: this mode involves a proportion of the feature vector randomly mutating to any allowed values. The number of leaf nodes equals the dimension of the feature vector.
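A minimal sketch of the three expansion modes, assuming a discrete search space on [−5, 5] with step 0.1; the bounds, step size and the 30% mutation proportion are illustrative assumptions, not the paper's exact settings.

```python
import random

def one_step_move(x, lo=-5.0, hi=5.0, step=0.1):
    # mode 1: smallest possible change (one step up or down) at one position
    x = list(x)
    i = random.randrange(len(x))
    x[i] = min(hi, max(lo, x[i] + random.choice([-step, step])))
    return x

def single_mutation(x, lo=-5.0, hi=5.0, step=0.1):
    # mode 2: one random position mutates to any allowed grid value
    x = list(x)
    i = random.randrange(len(x))
    x[i] = lo + step * random.randrange(int(round((hi - lo) / step)) + 1)
    return x

def scaled_random_mutation(x, lo=-5.0, hi=5.0, step=0.1, fraction=0.3):
    # mode 3: a proportion of the positions mutate to any allowed values
    x = list(x)
    for i in random.sample(range(len(x)), max(1, int(fraction * len(x)))):
        x[i] = lo + step * random.randrange(int(round((hi - lo) / step)) + 1)
    return x

def stochastic_expansion(x):
    # each mode fires with equal probability (1/3)
    mode = random.choice([one_step_move, single_mutation, scaled_random_mutation])
    return mode(x)

def expand_leaves(x):
    # the number of leaf nodes equals the feature-vector dimension
    return [stochastic_expansion(x) for _ in range(len(x))]
```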
A real-world complex system can often be represented as a vector or a matrix. For example, in materials science, searching for high-performance CCAs can be formulated as optimizing the properties by tuning the alloy compositions36; in biology, protein design can be approached as improving biofunctionalities by optimizing a sequence consisting of 20 amino acids. We implement a convolutional neural network as the deep learning surrogate model. It consists of convolutional layers followed by pooling, dropout and normalization layers to prevent overfitting. The network parameters are optimized using the Adam optimizer, and the loss function is the mean-squared error or mean absolute percentage error. More detailed parameters can be found in the Supplementary Note and Supplementary Fig. 3.
Standard MCTS consists of four major steps: selection, expansion, simulation and backpropagation (Supplementary Fig. 12). We summarize key differences between DANTE and MCTS.
(1) The MCTS backpropagation mechanism uses the result of the rollout to update both the value and visitation of the nodes along the path, which affects all nodes (from root to end node) at a global level. Local backpropagation updates only the visitation information of the current root node and the subsequent leaf nodes. We do not update the value information because our optimization problem focuses solely on discovering a single optimal state and retains little ‘memory’ of previous states. Therefore, value information is not backpropagated, and visitation backpropagation is short-ranged, relying only on nearby visitation data to guide exploration.
(2) The MCTS selection step chooses the leaf node with the maximum UCB and proceeds to the next expansion with the selected leaf node, whereas the expansion of NTE is conditioned on an inequality of the DUCB: the expansion proceeds with the leaf node that has a higher DUCB than the root node; otherwise, it proceeds with the same root node.
(3) Conventional rollout uses the simulation step to reach the end state (for example, a win or loss of a game) and uses the average value as the current node value, whereas the stochastic rollout of NTE does not need a simulation step to obtain the node value; instead, it uses the surrogate model to estimate it.
Furthermore, Supplementary Fig. 13 shows the difference between UCB and DUCB.
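The DUCB-gated selection and local backpropagation described in points (1) and (2) can be sketched as follows. The `ducb` score below is a deliberate placeholder: the actual DANTE formula combines surrogate-predicted values with visit counts and is not reproduced in this excerpt, so only the control flow, not the scoring function, reflects the method. The `(state, surrogate_value)` pair layout is also a hypothetical convenience for this sketch.

```python
import math

def ducb(value, visits, c=1.0):
    # PLACEHOLDER score: value plus a visit-decaying exploration bonus;
    # the real DANTE DUCB formula is defined in the paper, not here
    return value + c / math.sqrt(1 + visits)

def conditional_step(root, leaves, visits):
    # root and each leaf are (state, surrogate_value) pairs;
    # visits maps state -> visit count (hypothetical data layout)
    root_score = ducb(root[1], visits.get(root[0], 0))
    best = max(leaves, key=lambda n: ducb(n[1], visits.get(n[0], 0)))
    best_score = ducb(best[1], visits.get(best[0], 0))
    # local backpropagation: only the current root and its leaves get
    # their visitation updated; node values are never backpropagated
    visits[root[0]] = visits.get(root[0], 0) + 1
    for state, _ in leaves:
        visits[state] = visits.get(state, 0) + 1
    # conditional selection: move to the leaf only if its DUCB
    # exceeds the root's; otherwise expansion continues from the root
    return best if best_score > root_score else root
```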
Top-visit sampling
The sampling technique for selecting top candidates is a critical component of the AO pipeline. An effective sampling method should identify the most informative candidates while preserving data diversity, ensuring that the surrogate model generalizes well to unseen data. A widely used sampling approach is the epsilon-greedy method, which combines greedy selection with random sampling. To enhance the generalization capability of the surrogate model, we extend the epsilon-greedy strategy by implementing ‘top-visit sampling’, which samples data that are frequently visited during rollouts. Figure 3b demonstrates that DANTE, when lacking top-visit sampling, exhibits a higher surrogate model loss and requires 30% more data points to achieve the global optimum (as detailed in the Methods, Supplementary Note and Supplementary Fig. 1).
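A hedged sketch of how such an extended epsilon-greedy batch could be assembled; the split between the greedy, top-visit and random shares is an illustrative assumption, not the paper's exact fractions.

```python
import random

def top_visit_sampling(candidates, scores, visits, k=20,
                       eps=0.1, top_visit_frac=0.25):
    # assumed batch split: mostly best-scored candidates, a share of
    # most-visited candidates, plus an epsilon share of random picks
    n_random = max(1, int(eps * k))
    n_visit = max(1, int(top_visit_frac * k))
    n_greedy = k - n_random - n_visit
    by_score = sorted(candidates, key=lambda c: scores[c], reverse=True)
    by_visit = sorted(candidates, key=lambda c: visits[c], reverse=True)
    chosen = []

    def take(pool, n):
        for c in pool:
            if n <= 0:
                break
            if c not in chosen:   # avoid duplicates across the shares
                chosen.append(c)
                n -= 1

    take(by_score, n_greedy)   # greedy: best surrogate scores
    take(by_visit, n_visit)    # top-visit: most visited during rollouts
    remaining = [c for c in candidates if c not in chosen]
    chosen += random.sample(remaining, min(n_random, len(remaining)))
    return chosen
```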
Ablation study
We conducted an ablation study on the Rosenbrock-100d function to analyze the impact of DANTE’s individual components on overall performance (Fig. 3b; additional results are provided in Supplementary Fig. 1). The results clearly show that conditional selection and local backpropagation are critical to DANTE’s effectiveness. Without conditional selection, the tree search suffers from the value deterioration problem and has a 0% convergence ratio, defined as the frequency with which the algorithm identifies the global optimum within a given number of data points. Without local backpropagation, DANTE becomes a greedy stochastic tree search, leading to poor performance and similarly a 0% convergence ratio. Moreover, omitting top-visit sampling and adaptive exploration, while still allowing for convergence, notably degrades performance. In such cases, the average best f(x) remains distant from the global optimum, and the convergence ratios drop to 60% and 30%, respectively.
We further assess the limits of DANTE by evaluating its performance in tackling high-dimensional problems with a limited number of data points. As shown in Extended Data Table 2, DANTE demonstrates exceptional performance, successfully converging across various synthetic functions ranging from 200 to 2,000 dimensions. By contrast, SOTA methods fail to converge to any functions beyond 100 dimensions. Notably, none of the baseline methods achieves global convergence on the Rosenbrock function in dimensions exceeding 10, while DANTE successfully converges in dimensions as high as 200 (Supplementary Table 3).
Synthetic functions
The synthetic functions are designed for evaluating and analyzing the computational optimization approaches. In total, six of them are selected on the basis of their physical properties and shapes. Results for the Ackley, Rosenbrock and Rastrigin functions are presented in the main text because they are widely studied and relevant results are extensively available in the literature. We also test three other synthetic functions (Griewank, Schwefel and Michalewicz), and the results are presented in Supplementary Fig. 10. The Ackley function can be written as
$$f(x)=-a\exp \left(-b\sqrt{\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}{x}_{i}^{2}}\right)-\exp \left(\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}\cos (c{x}_{i})\right)+a+\exp (1),$$
(2)
where a = 20, b = 0.2, c = 2π and d is the dimension.
The Rosenbrock function can be written as
$$f(x)=\mathop{\sum}\limits_{i=1}^{d-1}\left[100{({x}_{i+1}-{x}_{i}^{2})}^{2}+{({x}_{i}-1)}^{2}\right].$$
(3)
The Rastrigin function can be written as
$$f(x)=10d+\mathop{\sum }\limits_{i=1}^{d}\left[{x}_{i}^{2}-10\cos (2\uppi {x}_{i})\right].$$
(4)
The three functions are evaluated on the hypercube xi ∈ [−5, 5], for all i = 1, …, d with a discrete search space of a step size of 0.1; we also show that different step sizes (within a certain range) do not affect the general behavior of the algorithm (Supplementary Fig. 14). We sample 20 data points per round when using neural networks as surrogate models. More details and results can be found in the Supplementary Note.
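For reference, the three benchmark functions can be implemented directly (the Rastrigin sum runs over all d coordinates, per the standard definition); each has a global minimum of 0, at the origin for Ackley and Rastrigin and at the all-ones point for Rosenbrock.

```python
import math

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    # global minimum f = 0 at x = (0, ..., 0)
    d = len(x)
    s1 = sum(xi * xi for xi in x) / d
    s2 = sum(math.cos(c * xi) for xi in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e

def rosenbrock(x):
    # global minimum f = 0 at x = (1, ..., 1)
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    # global minimum f = 0 at x = (0, ..., 0)
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi)
                             for xi in x)
```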
Electron ptychography
Feature engineering
The feature vector consists of eight variables: beam energy, defocus, maximum number of iterations, number of iterations with identical slices, probe-forming semi-angle, update step size, slice thickness and number of slices. Detailed values and their bounds are listed in Supplementary Table 4.
Optimization target
The objective function NMSE is calculated between the positive square root of the measured diffraction pattern IM and the modulus of the Fourier-transformed simulated exit-wave Ψ, which can be formulated as
$${\mathrm{NMSE}}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{\left\vert \sqrt{{I}_{{\mathrm{M}}(i)}({\bf{u}})}-\left\vert {\mathscr{F}}[{\Psi }_{i}({\bf{r}})]\right\vert \right\vert }^{2},$$
(5)
where r and u denote the real- and reciprocal-space coordinate vectors, respectively, and N is the total number of the measured diffraction patterns.
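A minimal numerical sketch of equation (5), using a naive 1D discrete Fourier transform in place of the 2D FFT of a real reconstruction; the data layout, with one intensity array and one exit wave per pattern, is an assumption of this sketch.

```python
import cmath
import math

def dft(psi):
    # naive 1D discrete Fourier transform (stand-in for the 2D FFT of
    # the exit wave in a real ptychographic reconstruction)
    n = len(psi)
    return [sum(psi[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]

def nmse(measured_intensities, exit_waves):
    # equation (5): for each of the N patterns, compare the square root
    # of the measured intensity with the modulus of the Fourier-transformed
    # simulated exit wave, summed over pixels and averaged over patterns
    total = 0.0
    for I_m, psi in zip(measured_intensities, exit_waves):
        F = dft(psi)
        total += sum((math.sqrt(I) - abs(Fk)) ** 2 for I, Fk in zip(I_m, F))
    return total / len(exit_waves)
```

If the simulated exit wave reproduces the measurement exactly, the NMSE vanishes, which is the fixed point the parameter optimization drives towards.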
Correlation index
The degree of matching for a given template T by intensity function P is characterized by a correlation index, which can be defined by the following relation:
$$\frac{\mathop{\sum }\nolimits_{i = 1}^{m}P({x}_{i},{y}_{i})T({x}_{i},{y}_{i})}{\sqrt{\mathop{\sum }\nolimits_{i = 1}^{m}{P}^{2}({x}_{i},{y}_{i})}\sqrt{\mathop{\sum }\nolimits_{i = 1}^{m}{T}^{2}({x}_{i},{y}_{i})}},$$
(6)
where (xi, yi) is the coordinate of pixel i.
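Equation (6) is a normalized cross-correlation and can be computed directly; the m pixels are flattened into 1D lists here for simplicity.

```python
import math

def correlation_index(P, T):
    # equation (6): normalized cross-correlation between the intensity
    # function P and the template T over m shared pixels
    num = sum(p * t for p, t in zip(P, T))
    den = (math.sqrt(sum(p * p for p in P))
           * math.sqrt(sum(t * t for t in T)))
    return num / den
```

The index is 1 for a perfect (scale-invariant) match and 0 for orthogonal patterns.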
Dataset simulation
abTEM37, an open-source package, is used for the simulation of a TEM experiment. For this case study, we simulated a four-dimensional dataset of 18-nm-thick silicon along the [110] direction with Poisson noise.
Ptychographic reconstruction
The analysis is performed using py4DSTEM38, a versatile open-source package for different modes of STEM data analysis. See Supplementary Figs. 11 and 15 for more details about the reconstruction process.
Architected materials
Feature engineering
In this study, the objective for architected materials optimization is a Gyroid triply periodic minimal surface structure, which occurs naturally in butterfly wings and is renowned for its exceptional biological characteristics and mechanical performance. The Gyroid scaffold to be optimized comprises 27 subunits, each with dimensions of 2 × 2 × 2 mm, allowing its geometric features and mechanical properties to be tuned by adjusting each subunit’s density. The density of each subunit can take discrete values from 10% to 80%, with an increment of 10%. The base material of the scaffold is Ti6Al4V alloy. Three-dimensional convolutional neural networks are used to accurately and rapidly assess the impact of adjustments to each subunit’s density on the scaffold’s performance. Details about structure generation are presented in ref. 39.
Optimization target
To mechanically stimulate bone reconstruction in bone defects, it is well recognized that the elastic modulus of bone grafts should be equivalent to that of the replaced bone, which ranges from 0.03 to 3 GPa for cancellous bone and 3 to 30 GPa for cortical bone, while there are specific modulus demands for different anatomical locations40. Moreover, it requires the optimization of load-bearing capacity to prevent damage during implantation. Here, we establish the modulus requirement for the implanted site at 2.5 GPa. Consequently, the optimization target is to maximize the yield strength of the scaffold while ensuring the elastic modulus remains within a specified range (2,500 ± 200 MPa).
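One simple way to fold the modulus constraint into a single score is hard rejection: designs whose modulus leaves the 2,500 ± 200 MPa window receive an infinitely bad target. The paper does not specify its exact constraint handling, so this form is an illustrative assumption.

```python
def scaffold_target(yield_strength, elastic_modulus,
                    e_target=2500.0, tol=200.0):
    # maximize yield strength subject to the elastic modulus staying
    # within e_target +/- tol (all values in MPa)
    if abs(elastic_modulus - e_target) > tol:
        return float("-inf")   # infeasible design, never selected
    return yield_strength
```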
Finite element simulation
Finite element (FE) simulations of the compressive stress–strain curves of the scaffolds are conducted using ABAQUS 2018. The FE simulations use the same rigid-cylinder and deformable-implant-structure model. The material property is set to be homogeneous with a Poisson’s ratio of 0.25; the calibration protocol is detailed in ref. 39. Ductile damage is used to simulate plastic deformation up to the failure stage, with a fracture strain set at 0.03. The effects of triaxiality deviation and strain rate are disregarded. Displacement and force are extracted during postprocessing and subsequently converted to strain and stress, respectively. The FE simulations agree well with the experimental compression curves (Supplementary Fig. 16).
ML model
The initial dataset (100 density matrices) is consistent with our previous work39, and the corresponding elastic modulus and yield strength are calculated by FE simulations. Three-dimensional convolutional neural networks are used to predict the elastic modulus and yield strength of the scaffolds with varying density matrices. The model architecture comprises an input layer, convolutional layers, fully connected layers and an output layer (refer to the Supplementary Note and Supplementary Fig. 17 for detailed parameters). In the input layer, the scaffold structure is voxelized into 60 × 60 × 60 pixels, where each pixel denotes either the solid phase (1) or void phase (0) within the scaffold. The convolutional layers are designed with a series of three-dimensional convolution kernels to extract high-dimensional information about the scaffold, while the output layer delivers the final prediction.
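The voxelization step can be sketched as follows. The density-to-geometry rule below uses the standard Gyroid level-set function sin(x)cos(y) + sin(y)cos(z) + sin(z)cos(x) < t with an assumed linear density-to-threshold map; the actual structure generation follows ref. 39, so the mapping, the number of periods and the subunit layout are all assumptions of this sketch.

```python
import math

def voxelize_gyroid(density, n=60, cells=3, periods=3):
    # density: cells x cells x cells nested list of target subunit densities
    # returns an n x n x n binary grid (1 = solid phase, 0 = void phase)
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                si, sj, sk = i * cells // n, j * cells // n, k * cells // n
                # assumed linear map from density in (0, 1) to the gyroid
                # level-set threshold in (-1.5, 1.5); density 0.5 gives
                # t = 0, which fills roughly half the volume
                t = 1.5 * (2.0 * density[si][sj][sk] - 1.0)
                x, y, z = (2 * math.pi * periods * v / n for v in (i, j, k))
                g = (math.sin(x) * math.cos(y) + math.sin(y) * math.cos(z)
                     + math.sin(z) * math.cos(x))
                grid[i][j][k] = 1 if g < t else 0
    return grid
```

Raising a subunit's target density raises its threshold, so the solid voxel count increases monotonically with density, which is the property the 3D CNN input relies on.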
Compositionally complex alloys
Feature engineering
We adopt 27 elements: Fe, Co, Ni, Ta, Al, Ti, Nb, Ge, Au, Pd, Zn, Ga, Mo, Cu, Pt, Sn, Cr, Mn, Mg, Si, Ru, Rh, Hf, W, Re, Ir and Bi, to design six-element CCAs with either bcc or fcc structures. For Fe, Co and Ni, the atomic ratio ranges from 0 at.% to 100 at.%, while for other elements, it ranges from 0 at.% to 40 at.%, with 0.5 at.% intervals. In addition, the total atomic percentage of Fe, Co and Ni is designed to fall between 60 at.% and 80 at.%. For CCAs with a bcc crystal structure, the Fe/(Co + Ni) ratio is required to be greater than or equal to 1.5, whereas for fcc structures, it is required to be less than or equal to 1.5.
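The design rules can be expressed as a rejection-sampling generator, working in 0.5 at.% units so every composition lands on the grid; the fixed choice of the three minor elements and the requirement that all six elements be non-zero are assumptions of this sketch.

```python
import random

def random_cca(structure="bcc", max_tries=100000):
    others = ("Ta", "Al", "Ti")  # assumed minor elements (any 3 of the 24)
    for _ in range(max_tries):
        # work in 0.5 at.% units so every value lands on the design grid
        total_fcn = random.randint(120, 160)       # Fe+Co+Ni: 60-80 at.%
        fe = random.randint(1, total_fcn - 2)
        co = random.randint(1, total_fcn - fe - 1)
        ni = total_fcn - fe - co
        ratio = fe / (co + ni)
        if structure == "bcc" and ratio < 1.5:
            continue                               # bcc: Fe/(Co+Ni) >= 1.5
        if structure == "fcc" and ratio > 1.5:
            continue                               # fcc: Fe/(Co+Ni) <= 1.5
        rest = 200 - total_fcn                     # 20-40 at.% left over
        a = random.randint(1, min(80, rest - 2))   # each minor <= 40 at.%
        b = random.randint(1, min(80, rest - a - 1))
        c = rest - a - b                           # guaranteed 1..78 units
        comp = dict(zip(("Fe", "Co", "Ni") + others, (fe, co, ni, a, b, c)))
        return {el: units / 2.0 for el, units in comp.items()}  # at.%
    raise RuntimeError("no valid composition found")
```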
Optimization target for magnetic and electric properties
The optimization target is to maximize the following target:
$${\mathrm{Target}}=M\times \rho,$$
(7)
where M stands for magnetic moment and ρ for resistivity.
Optimization target for transport properties
The optimization target is to maximize the following target:
$${\mathrm{Target}}={\mathrm{AHC}}\times {\mathrm{AHA}},$$
(8)
while keeping the formation energy under the upper limit of 0.02.
Density functional calculation
The transport properties are described by the conductivity tensor σνμ (ν, μ = x, y, z). The anomalous Hall conductivity (AHC, σxy) and anomalous Hall angle (AHA, σxy/σxx) are determined within the framework of the Kubo–Bastin linear response formalism in the relativistic multiple-scattering Korringa–Kohn–Rostoker (KKR) Green’s function (GF) method41, as implemented in the MUNICH SPR-KKR package42. The Kubo–Bastin formalism includes both the Fermi-surface and Fermi-sea contributions on equal footing: the Fermi-surface term contains only contributions from states at the Fermi energy (EF), whereas the Fermi-sea term involves all occupied states (with energy E) below the Fermi energy, that is,
$${\sigma }_{\mu \nu }={\sigma }_{\mu \nu }^{I}+{\sigma }_{\mu \nu }^{II}$$
(9)
$${\sigma }_{\mu \nu }^{I}=\frac{\hslash }{2\uppi \varOmega }{\mathrm{Tr}}\left\langle {\hat{j}}_{\mu }\left({\hat{G}}^{+}-{\hat{G}}^{-}\right){\hat{j}}_{\nu }{\hat{G}}^{-}-{\hat{j}}_{\mu }{\hat{G}}^{+}{\hat{j}}_{\nu }\left({\hat{G}}^{+}-{\hat{G}}^{-}\right)\right\rangle$$
(10)
$$\begin{array}{l}{\sigma }_{\mu \nu }^{II}=\frac{\hslash }{2\uppi \varOmega }\mathop{\int}\nolimits_{-\infty }^{{E}_{{\mathrm{F}}}}{\mathrm{d}}E\,{\mathrm{Tr}}\left\langle {\hat{j}}_{\mu }{\hat{G}}^{+}{\hat{j}}_{\nu }\frac{{\mathrm{d}}{\hat{G}}^{+}}{{\mathrm{d}}E}-{\hat{j}}_{\mu }\frac{{\mathrm{d}}{\hat{G}}^{+}}{{\mathrm{d}}E}{\hat{j}}_{\nu }{\hat{G}}^{+}\right.\\\left.-\left({\hat{j}}_{\mu }{\hat{G}}^{-}{\hat{j}}_{\nu }\frac{{\mathrm{d}}{\hat{G}}^{-}}{{\mathrm{d}}E}-{\hat{j}}_{\mu }\frac{{\mathrm{d}}{\hat{G}}^{-}}{{\mathrm{d}}E}{\hat{j}}_{\nu }{\hat{G}}^{-}\right)\right\rangle .\end{array}$$
(11)
The electric current operator is given by \({\hat{j}}_{\mu (\nu )}=-| e| c{\alpha }_{\mu (\nu )}\), with e > 0 being the elementary charge. \({\hat{G}}^{+}\) and \({\hat{G}}^{-}\) denote the retarded and advanced GFs, respectively. The representation of the GFs for the first-principles treatment of equations (10) and (11) leads to a product expression containing matrix elements of the current operators with the basis functions and k-space integrals over scattering path operators. In this averaging procedure, the chemical disorder and vertex corrections are treated by means of the coherent potential approximation43. For both the Fermi-surface and Fermi-sea terms, the conductivity tensor partitions into an on-site term σ0 involving regular and irregular solutions and an off-site term σ1 containing only regular solutions. This formalism has been validated to provide residual and anomalous Hall resistivities consistent with experiments41; more details can be found in the Supplementary Note.
ML model
An initial set of 200 CCAs is randomly generated following the previously described design rules, and their corresponding AHA, AHC and formation energy are calculated by density functional theory (DFT). For CCAs with bcc grain structures, 154 configurations ultimately converge in the DFT calculations, whereas for fcc structures, 178 converge. We train one-dimensional convolutional neural networks to predict the AHA, AHC and formation energy of the CCAs. The model architecture includes an input layer, convolutional layers, fully connected layers and an output layer (see the Supplementary Note and Supplementary Fig. 18 for detailed parameters).
Cyclic peptide binder
Feature engineering
We represent the cyclic peptide as a sequence of integers that range from 0 to 19, with each number corresponding to a distinct type of canonical amino acid. The leaf node within the DANTE framework is obtained through stochastic expansion. In this process, two complementary strategies are used: one that introduces random mutations in existing sequences and another that generates entirely new sequences, ensuring a comprehensive exploration of the sequence space.
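The two complementary expansion strategies can be sketched directly on the integer encoding; the 50/50 split between them is an illustrative assumption.

```python
import random

AA_ALPHABET = 20  # canonical amino acids, encoded as integers 0-19

def mutate(seq, n_mut=1):
    # strategy 1: random point mutations in an existing sequence
    seq = list(seq)
    for i in random.sample(range(len(seq)), n_mut):
        seq[i] = random.randrange(AA_ALPHABET)
    return seq

def fresh_sequence(length):
    # strategy 2: an entirely new sequence of the same length
    return [random.randrange(AA_ALPHABET) for _ in range(length)]

def stochastic_expansion(seq, p_fresh=0.5):
    # leaf nodes come from one of the two complementary strategies;
    # the 50/50 split is an illustrative assumption
    return fresh_sequence(len(seq)) if random.random() < p_fresh else mutate(seq)
```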
Optimization target
The optimization target of the cyclic peptide binder is defined as follows:
$${\mathrm{Target}}={\mathrm{SC}}\times {\mathrm{dSASA}}/100,$$
(12)
where SC stands for shape complementarity, and dSASA represents the change in solvent-accessible surface area before and after interface formation. The SC value ranges from 0 to 1, referring to how well the surfaces of two proteins fit geometrically together at their interface; dSASA measures the size of the interface (in units of Å2). Both metrics are essential to assess the quality of the interface. Therefore, we multiply these two metrics to formulate a multiobjective optimization problem, which is used to evaluate the performance of DANTE.
Dataset
Fourteen unique protein and canonical cyclic peptide complexes are sourced from the Protein Data Bank, with peptide lengths ranging from 7 to 14 amino acids. We perform three different optimization tasks using DANTE, gradient descent (GD) and Markov chain Monte Carlo (MCMC). The tasks start from a random initial sequence. The structure with the highest target value is selected as the best structure. For each task, we performed three independent tests.
Alphafold2 settings
The structure of protein and cyclic peptide binder complex is predicted by Alphafold2-multimer implemented in ColabDesign. A modified offset matrix for the relative positional encoding of a target protein and cyclic peptide complex is adapted to give the structure with high accuracy44. For designing a cyclic peptide binder, the binder hallucination protocol is utilized for both GD and MCMC methods. In this study, we maintain the length of the cyclic binder and the interaction site hotspots consistent with those found in nature. For GD, the method ‘design_pssm_semigreedy()’ is used, setting soft_iter to 120 and hard_iter to 32. The loss function is a weighted sum of pLDDT (predicted local distance difference test) and interface contact loss, with other parameters left at their default settings. For the MCMC method, a total of 1,000 steps are executed to find the sequence achieving the highest pLDDT. More details can be found in the Supplementary Note.
Rosetta interface analyzer
The SC and dSASA values for the predicted structure of the protein and cyclic peptide complex are computed using the Rosetta Interface Analyzer. Initially, the Rosetta minimize protocol is applied to obtain the structure with minimum energy proximal to the initial conformation. To ensure that cyclic peptides within the complex retain their cyclic nature and do not become linear, the options ‘-use_truncated_termini’ and ‘-relax:bb_move false’ are used. Subsequently, the minimized complex serves as the input for the interface analyzer.
