To discover potential new catalysts for converting CO2 into methanol, we present the workflow depicted in Fig. 1. The key steps are summarized here; detailed implementation procedures and configurations are available in the Methods section.

The AED catalyst database was created through a series of steps, including the choice of metals, bulk optimization, selection of relevant surface geometries, preparation of adsorbate geometries, validation and compilation of AEDs, as illustrated in the figure.
Search space selection
To effectively reduce the search space for potential catalyst materials for CO2 thermal conversion, we first isolated the metallic elements that have undergone prior experimentation for this process, as documented by Bahri et al.36. To maintain prediction accuracy, these elements also had to be part of the Open Catalyst 2020 (OC20) database33. The elements shortlisted are the following: K, V, Mn, Fe, Co, Ni, Cu, Zn, Ga, Y, Ru, Rh, Pd, Ag, In, Ir, Pt, and Au. We then proceeded to search through the Materials Project database37 for stable and experimentally observed crystal structures associated with these metals and their bimetallic alloys. We compiled 216 stable phase forms involving both single metals and bimetallic alloys corresponding to our set of 18 elements. A detailed listing of these materials is provided in Supplementary Tables S1 and S2 of the Supplementary Information. We performed bulk DFT optimization at the RPBE38 level to align with OC20 for the obtained materials. Optimization of 22 materials was not successful, and therefore, they were excluded from the materials list, as detailed in Supplementary Table S2 in the Supplementary Information.
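As a rough illustration of the size of this search space, the following sketch enumerates the unary and binary chemical systems spanned by the 18 shortlisted elements. The function name `chemical_systems` and the "A-B" string format are assumptions made for illustration; the actual database query used in this work is described in the Methods section and is not reproduced here.

```python
from itertools import combinations

# The 18 elements shortlisted in the text.
ELEMENTS = ["K", "V", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga",
            "Y", "Ru", "Rh", "Pd", "Ag", "In", "Ir", "Pt", "Au"]


def chemical_systems(elements):
    """Return unary ('A') and binary ('A-B') chemical systems (hypothetical helper)."""
    unary = list(elements)
    # Sort each pair alphabetically so that every binary system has one canonical label.
    binary = ["-".join(sorted(pair)) for pair in combinations(elements, 2)]
    return unary + binary


systems = chemical_systems(ELEMENTS)
print(len(systems))  # 18 unary + 153 binary = 171
```

Each of these 171 systems may contain zero, one, or several stable phases, which is why the number of compiled materials (216) differs from the number of systems.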
To identify the most crucial adsorbates for the AED calculations, we surveyed the existing literature. An experimental investigation by Amman et al.30 highlighted the presence of surface-bound radicals such as *H (hydrogen atom), *OH (hydroxy group), *OCHO (formate), and *OCH3 (methoxy) as essential reaction intermediates in the thermocatalytic reduction of CO2 to methanol. Based on these findings, we selected these adsorbates for our AED calculations. Please note that the notation for formate can vary across the literature, e.g., *HCOO30,39 and HCOO*40. With the help of the fairchem repository tools by OCP41, we created surfaces with Miller indices ∈ {−2, −1, ..., 2} and calculated their total energy using the OCP MLFF. If we encountered multiple cuts for the same facet, we selected the one with the lowest energy for further calculations. Then we engineered surface-adsorbate configurations for the most stable surface terminations across all facets within our defined Miller index range for the materials, as described in the Methods section, and optimized these configurations using the OCP MLFF. During this process, we discovered that seven materials exhibited surface-adsorbate supercells so large that their calculations were infeasible on available GPU resources, even with the efficient OCP MLFF. Consequently, they were excluded from our study.
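The facet enumeration and termination selection described above can be sketched in a few lines. This is a minimal stand-in that does not use the fairchem API: the helpers `miller_indices` and `lowest_energy_cut` are hypothetical, the enumeration is not reduced by crystal symmetry, and the energies passed to `lowest_energy_cut` are assumed to come from a separate MLFF evaluation.

```python
from itertools import product
from math import gcd


def miller_indices(max_index=2):
    """Enumerate primitive Miller indices (h, k, l) with |h|, |k|, |l| <= max_index."""
    indices = []
    for hkl in product(range(-max_index, max_index + 1), repeat=3):
        if hkl == (0, 0, 0):
            continue  # (0, 0, 0) does not define a plane
        # Skip multiples such as (2, 2, 2), which describe the same planes as (1, 1, 1).
        if gcd(gcd(abs(hkl[0]), abs(hkl[1])), abs(hkl[2])) != 1:
            continue
        indices.append(hkl)
    return indices


def lowest_energy_cut(cuts):
    """Pick the termination with the lowest energy from (label, energy) pairs."""
    return min(cuts, key=lambda cut: cut[1])


facets = miller_indices()
print(len(facets))  # 98 primitive index triples before any symmetry reduction
```

In practice, many of these triples are equivalent by the symmetry of the bulk crystal, so the number of distinct facets per material is typically much smaller.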
Validation and data cleaning
In our work, we have employed the OCP equiformer_V2 MLFF. Its reported accuracy for the adsorption energy of small molecular fragments is 0.23 eV42. However, *OCHO was not included in the OC20 database used for training the equiformer_V2, raising concerns about the accuracy of our adsorption energy predictions in this work. To benchmark equiformer_V2 for our use case, we chose Pt, Zn, and NiZn and performed explicit DFT calculations (see Methods section for details). The comparison between predicted and DFT-calculated adsorption energies can be found in Fig. 2 and Table 1. The predictions for Pt are precise, whereas the NiZn results show some outliers, and there is a noticeable degree of scatter for Zn. Despite this, the overall mean absolute error (MAE) for the adsorption energies of the selected materials is 0.16 eV, which is impressive and falls within the reported accuracy of the employed MLFF.

a–c Comparison of adsorption energies predicted by the OCP equiformer_V2 MLFF against single-point DFT calculations, for a Pt, b Zn and c NiZn. d Histogram of EMAEs for materials calculated with the OCP MLFF, with a 0.25 eV cut-off line, showing that the majority of materials have their EMAEs between 0.05 and 0.10 eV. Out of a total of 187 OCP-calculated materials, 17 are not shown here as their EMAEs exceed 0.5 eV.
To confirm the reliability of our predicted AEDs across a broader range of materials while keeping the computational cost practical, we integrated a validation step into our analysis workflow. We sampled the minimum, maximum, and median adsorption energies for each adsorbate-material pair from the predicted AEDs. We performed single-point DFT calculations on these selected systems and compared them with the adsorption energy predictions of the OCP MLFF. The difference is compiled in an ‘estimated MAE’ (EMAE). Comparisons between the EMAE and the MAE for our complete test set are presented in Table 1. While the EMAE may differ from the actual MAE by up to a factor of three for specific adsorbates, it generally remains close to the actual MAE, thus serving as a reliable gauge of data quality.
The validation step feeds into the final data cleaning, in which we exclude any material with an EMAE exceeding the threshold of 0.25 eV. Consequently, 29 materials were removed from our dataset, leaving 158 materials. Most materials flagged for large EMAEs exhibit magnetic properties, for example MnCo, MnGa, or FeCo. Magnetism presents significant challenges for the non-spin-polarized DFT calculations used in OC20 and in this work. A complete list of estimated MAEs for the remaining 158 materials is accessible in ref. 43.
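The sampling and filtering logic of the validation step can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the function names, the `dft_lookup` callable, the choice of the upper median for even-sized samples, and the pooling of errors over all adsorbates of a material are all hypothetical.

```python
import numpy as np

EMAE_THRESHOLD = 0.25  # eV, threshold stated in the text


def sample_min_median_max(energies):
    """Return the indices of the minimum, median, and maximum energies."""
    order = np.argsort(energies)
    # Assumption: for an even number of sites, take the upper median.
    return [int(order[0]), int(order[len(order) // 2]), int(order[-1])]


def estimated_mae(predicted, dft_lookup):
    """Mean absolute MLFF-vs-DFT error over the sampled sites of one material.

    predicted:  dict mapping adsorbate -> sequence of MLFF adsorption energies (eV)
    dft_lookup: callable (adsorbate, site_index) -> DFT adsorption energy (eV)
    """
    errors = []
    for adsorbate, energies in predicted.items():
        energies = np.asarray(energies, dtype=float)
        for i in sample_min_median_max(energies):
            errors.append(abs(energies[i] - dft_lookup(adsorbate, i)))
    return float(np.mean(errors))


def keep_material(emae, threshold=EMAE_THRESHOLD):
    """A material is retained unless its EMAE exceeds the threshold."""
    return emae <= threshold


# Toy example: the mock DFT values are shifted by 0.1 eV from the predictions.
predicted = {"*H": [-0.5, -0.3, -0.1], "*OH": [-1.0, -0.8, 0.2]}
emae = estimated_mae(predicted, lambda ads, i: predicted[ads][i] + 0.1)
print(round(emae, 3), keep_material(emae))
```

Sampling only three sites per adsorbate keeps the number of DFT single points per material small while still probing both tails of each distribution.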
Adsorption energy distributions
Lastly, to compile the AEDs, we examined the relaxed configurations. When several distinct initial configurations of the same adsorbate, material, and facet converged to the same final structure, only one of them was included in the AED. In our final compilation, we transformed all AEDs into histograms that depict the probability distribution of adsorption sites falling within 0.1 eV energy intervals. Each AED was normalized, ensuring that the aggregate probability of adsorption sites per adsorbate and material equaled one. This standardization facilitates direct comparisons across materials with different numbers of adsorption sites, which can range from several tens to nearly 10,000 for a single material, depending upon the complexity and symmetry of its bulk structure. For illustrative purposes, Fig. 3 displays examples of AEDs for selected materials. The AEDs for all investigated materials are shown in Supplementary Fig. S1 in the Supplementary Information.
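The binning and normalization described above amount to a normalized histogram. A minimal sketch with NumPy is given below; the function name `aed_histogram` and the explicit energy window arguments are assumptions, and the removal of duplicate relaxed structures is assumed to have happened beforehand.

```python
import numpy as np

BIN_WIDTH = 0.1  # eV, bin width stated in the text


def aed_histogram(energies, e_min, e_max, bin_width=BIN_WIDTH):
    """Normalized histogram of adsorption energies for one adsorbate and material.

    Energies outside [e_min, e_max] are ignored by np.histogram.
    """
    n_bins = int(round((e_max - e_min) / bin_width))
    edges = e_min + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(energies, bins=edges)
    total = counts.sum()
    # Normalize so that the probabilities over all bins sum to one.
    probs = counts / total if total > 0 else counts.astype(float)
    return probs, edges


# Toy example with four adsorption sites on a shared window from -2 eV to 1 eV.
probs, edges = aed_histogram([-1.05, -1.02, -0.55, 0.25], e_min=-2.0, e_max=1.0)
print(len(probs), probs.sum())
```

Using one shared energy window for all materials keeps the bins aligned, which is what makes the histograms of different materials directly comparable.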

a AEDs for 25 materials with the amplitude of the distribution encoded in the intensity of the corresponding color; detailed AEDs for b K, c Y3In5, d Ni, e CuZn, f NiZn, g ZnRh. The numbers on the y-axis of panel (a) indicate the cluster numbers defined in Fig. 4.
Inspection of Fig. 3 and Supplementary Fig. S1 reveals that adsorption energies span a wide interval from −7.42 eV to 2.40 eV. We included energies above zero, although positive adsorption energies are typically indicative of molecular desorption, for two reasons. First, the adsorption energies reported in this work do not include entropy and pressure terms, which could shift the energies to more negative values. Second, the adsorption energy of radicals is somewhat ill-defined if different desorption channels are conceivable. Since our objective is to achieve a qualitative comparison across materials, the precise energy zero is of no relevance, as long as it is chosen consistently.
The AEDs exhibit varying dispersion and forms, indicating fluctuations in adsorption energy and related activity levels across the material space. The adsorption energies of *OCH3 (Eads) are generally the lowest, followed by those of *OCHO, which are approximately 0.5–1 eV higher. However, certain materials, such as K (illustrated in Fig. 3b), show unique distribution overlaps for *OCHO and *OCH3. Meanwhile, *H and *OH have comparatively higher Eads values, although their order is inconsistent. For instance, in some cases, *H has the highest Eads, particularly for K and Y3In5, whereas the opposite trend is observed for other materials like Ni. Single metal distributions are generally narrower and higher, as seen in the examples of K and Ni. Similarly, alloys composed of elements with high symmetry, such as CuZn, also exhibit narrow AEDs.
If the AEDs of a material predominantly align around the adsorption energy linked to maximal activity according to the Sabatier principle, the material is a strong candidate for a good catalyst. Conversely, complex alloys with low symmetry, such as Y3In5 (shown in the lower section of Fig. 3a and in Fig. 3c), display broad AED spreads. Extremely low adsorption energies can lead to catalyst poisoning, while excessively high energies can significantly reduce catalytic activity. Therefore, broad distributions are less desirable, as only a small portion of the material’s surface contributes effectively to catalytic processes.
Unsupervised learning: catalyst discovery
Although the ideal AEDs for the four adsorbates remain unknown, it is feasible to approximate their reactivity using AEDs based on their resemblance to previously identified, efficacious catalytic materials. In this context, our AEDs can be conceptualized as four-dimensional probability distributions. To quantify similarities across AEDs of different materials, we employ the Wasserstein distance as the metric35. By computing Wasserstein distances for all possible material combinations, we construct a distance matrix. To interpret the distance matrix, we apply hierarchical agglomerative clustering with Ward linkage44, which facilitates the identification of materials with similar AEDs. The outcomes of this clustering analysis are depicted in Fig. 4.
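The distance matrix and clustering steps can be sketched with SciPy. The sketch below rests on several assumptions that are not taken from this work: the four-dimensional distance is approximated by the sum of the one-dimensional Wasserstein distances of the individual adsorbates, the function names and data layout are hypothetical, and the cut threshold is left as a free parameter because Ward merge heights are not identical to raw pairwise distances. Note also that Ward linkage is formally defined for Euclidean distances, so applying it to a Wasserstein distance matrix is a pragmatic choice.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import wasserstein_distance


def aed_distance(aed_a, aed_b, centers):
    """Sum of per-adsorbate 1D Wasserstein distances (simplifying assumption).

    aed_a, aed_b: dict mapping adsorbate -> normalized histogram on shared bins
    centers:      bin centers (eV) shared by all histograms
    """
    return sum(
        wasserstein_distance(centers, centers,
                             u_weights=aed_a[ads], v_weights=aed_b[ads])
        for ads in aed_a
    )


def cluster_materials(aeds, centers, threshold):
    """Ward clustering on the pairwise AED distance matrix."""
    names = list(aeds)
    n = len(names)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = aed_distance(aeds[names[i]], aeds[names[j]], centers)
            dist[i, j] = dist[j, i] = d
    # linkage expects the condensed (upper-triangular) form of the matrix.
    merges = linkage(squareform(dist), method="ward")
    labels = fcluster(merges, t=threshold, criterion="distance")
    return dict(zip(names, labels))


# Toy example: A and B have nearly identical AEDs, C is very different.
centers = np.array([0.0, 0.1, 0.2])
aeds = {
    "A": {"*H": [1.0, 0.0, 0.0]},
    "B": {"*H": [0.9, 0.1, 0.0]},
    "C": {"*H": [0.0, 0.0, 1.0]},
}
labels = cluster_materials(aeds, centers, threshold=0.05)
print(labels)
```

In this toy case, A and B end up in one cluster and C in another, mirroring how materials with similar AEDs are grouped together.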

The graph shows that all 158 materials were assembled into 19 clusters, based on the similarity of their AEDs, with the exception of potassium, which is dissimilar to nearby materials and forms a single, non-numbered cluster. This can be seen in detail in the insets, where materials in clusters 8, 9 and 10 are shown. Cluster 10 contains several alloys that are part of known high-yield catalysts as well as new potentially active materials.
For a clustering threshold corresponding to a Wasserstein distance of 2.5 × 10−3, we arrive at a total of 19 distinct clusters, with potassium (K) forming its own, isolated, unnumbered cluster. The separation between clusters 11–19 and clusters 1–10 is considerable. The distinguishing feature is the broadness of the AEDs. The distributions in clusters 11–19 are noticeably broader than in clusters 1–10. Representative examples are depicted in Fig. 3a. The four materials at the bottom of the figure (Y3Zn11, V6Ga5, Y3In5, V4Zn5) pertain to cluster 18, whereas the rest belong to clusters 8–10. Further details are available in Supplementary Fig. S1 of the Supplementary Information, presenting the clustering of all considered materials. AEDs exhibit variability across distinct clusters (1–10) but show remarkable similarity within each individual cluster. For example, the AEDs for Ni, CuZn, NiZn, and ZnRh illustrated in Fig. 3d–g belong to the same cluster.
Clusters 8 through 10 aggregate into a larger cluster, hereafter denoted the macro-cluster, with relatively homogeneous AEDs. It encompasses materials such as Cu, a notably active component within known Cu/ZnO/Al2O3 catalysts30,36. The clusters also contain non-Cu materials such as Zn-Pd, Pd-In, Pt-In, and Ni-Zn in different compositions that have been reported as catalytic converters of CO2 to methanol36,45. Different compositions of the bimetallic alloys Ga-Ag, In-Ag, K-In, Zn-Rh, and Zn-Pt, which, to our knowledge, have not been tested for CO2 to methanol conversion, are also grouped with these materials. While most of the materials in the macro-cluster have either shown good catalytic performance or have not yet been tested, potassium (K) (the lone non-numbered cluster), as a pure metal, is likely to undergo rapid oxidation under reaction conditions. Therefore, we anticipate that the macro-cluster, consisting of clusters 8–10, is likely too diverse to pinpoint only catalytically active materials.
Upon closer inspection, ZnRh and ZnPt3 stand out as new candidates. They are part of cluster 10, which also includes Ga2Cu, NiZn, InPt3, Ni and, most notably, CuZn, but have not been tested for CO2 to methanol conversion. While the exact material composition of the Cu and ZnO-based catalysts during the reaction is still debated46,47,48, some studies suggest that the formation of a Cu-Zn alloy, which is part of cluster 10, enhances the activity30,47,49. Similarly, NiZn has also been identified as an effective CO2 catalyst45. Catalysts such as Cu/Ga/ZnO50, Cu/Ga/SiO251 and Pt/In2O352, known for their high methanol yield, may include Ga2Cu and InPt3 alloys, respectively. Finally, Ni is often part of catalysts for CO2 transformation to methane53,54. The strong catalytic activity of Ga2Cu, NiZn, InPt3, Ni, and, most notably, CuZn in this cluster suggests that ZnRh and ZnPt3 should also have high activity.
To conclude this section, we reiterate that our approach groups catalyst materials according to their computed AED similarity. To ascribe meaning to certain similarities we observe, we currently rely on experimental reports of catalytically active materials. Our proposals for interesting candidates are based on the assumption that AED similarity with a known good catalyst is a meaningful indicator for promising catalytic activity. Since the catalyst composition and microstructure are often not reported or not known, “active sites” or details of the catalytic mechanisms also remain opaque30,46,48,55. In this context, our AED descriptor remains an attempt to find proxies for complex processes. It goes beyond the common practice of focusing on single adsorption energies in “active sites”, but could certainly be extended in future work and in collaboration with more detailed experimental investigations.
Statistical analysis and discussion
AEDs could serve as a descriptor of activity; however, the vast number of parameters (at least 388 bins in the distribution) makes it challenging to analyze them manually. To further our insight into the generated data, we conducted a statistical analysis of AEDs (SAAEDs) that facilitates comparison with previous adsorption energy-based studies. An example can be seen in Table 2, where we present the minimum adsorption energies for a subset of materials featured in Fig. 3a.
Our SAAED analysis revisits individual binding energies and connects to the Sabatier principle. For instance, the results for *OH, *OCHO, and *OCH3 can be compared to the volcano plot in Studt et al.49, which relates the catalytic activity of the studied materials to the oxygen adsorption energy. In line with our approach, their work compares potential catalyst materials to Cu, although their focus lies on single-facet surfaces. Following previous findings that the Cu(211) facet is more active than the close-packed Cu(111) surface47, Studt et al. use Cu(211) as their reference. The catalytic activity of Cu is further enhanced when Zn is added to the Cu(211) surface (referred to as Cu+Zn in the article). The oxygen adsorption energy decreases upon Zn addition, which indicates that the optimal oxygen adsorption energy should be lower than its minimal adsorption energy on the Cu(211) surface. Our data are consistent with those findings for the majority of our promising candidate materials. The minimal adsorption energy (\(E_{\text{ads}}^{\min}\)) for all the oxygen-containing adsorbates on the majority of the materials in cluster 10 (highlighted in Table 2), including ZnRh, lies below that of Cu (our Cu data also covers the (211) surface) and is closely aligned across the materials. The exceptions are InPt3 and ZnPt3, in which the minima lie slightly above those of Cu, while both materials exhibit similar \(E_{\text{ads}}^{\min}\) for all other adsorbates. This difference suggests that InPt3 and ZnPt3 may feature slightly different CO2 conversion mechanisms.
Using minimum adsorption energies derived from ML models is comparable to previously studied methods for identifying global minima13,21. Although the techniques by Lan et al.21 and Chen et al.13 might be more appropriate for the straightforward application of the Sabatier principle, our approach excels in providing more comprehensive information on various facets of catalytic materials. We have compiled this information for selected materials in Table S1 in the Supplementary Information. For example, the AED spread across energies, which can be deduced from the standard deviation \(E_{\text{ads}}^{\text{std}}\), provides information about the percentage of the surface area usable for catalytic conversion.
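As an illustration of how such summary statistics could be collected, the sketch below computes a few quantities per adsorbate and concatenates them into a fixed-length vector. Only the minimum and the standard deviation are named in the text; the other statistics, the function names, and the vector layout are assumptions made for this example.

```python
import numpy as np

ADSORBATES = ["*H", "*OH", "*OCHO", "*OCH3"]  # the four adsorbates used in the text


def saaed(energies):
    """Summary statistics of one adsorption energy distribution (eV)."""
    e = np.asarray(energies, dtype=float)
    return {
        "min": float(e.min()),
        "max": float(e.max()),
        "mean": float(e.mean()),
        "median": float(np.median(e)),
        "std": float(e.std()),  # population standard deviation
    }


def fingerprint(material_energies, adsorbates=ADSORBATES):
    """Concatenate per-adsorbate statistics into one descriptor vector."""
    vector = []
    for ads in adsorbates:
        stats = saaed(material_energies[ads])
        vector.extend(stats[key] for key in ("min", "max", "mean", "median", "std"))
    return np.array(vector)


# Toy example with three sites per adsorbate.
toy = {ads: [-1.0, 0.0, 1.0] for ads in ADSORBATES}
print(saaed(toy["*H"]))
print(fingerprint(toy).shape)  # (20,)
```

A compact vector of this kind is far easier to inspect or to feed into a downstream model than a histogram with several hundred bins.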
Ultimately, both AEDs and SAAEDs, available on Zenodo43, can serve as material fingerprints. The SAAED acts like a materials descriptor, similar to the Magpie descriptor16, but can be adapted to specific reactions through the choice of adsorbates, offering more detailed and relevant material information. Optionally, specific descriptors (AED, SAAED) and general descriptors (like Magpie16) may be combined to enhance the information that might be lacking in ML models from theoretical calculations.
Both catalyst descriptors are tailored to perform extensive searches for catalytically active candidates. They do not, however, include effects of the support, additives, preparation procedures and operando states that could change the morphology of the catalyst (e.g., nanoparticle sizes or areas of different facets). Our proposed AED descriptor does not take the facet area into account and is therefore insensitive to morphology changes of the catalysts under reaction conditions. In principle, a Wulff construction could introduce better facet information. However, it also cannot account for support effects, additives or preparation conditions.
Our methodology facilitates high-throughput screening of metallic catalyst candidates. At present, the effects of co-operating oxides such as ZnO, In2O3, and ZrO2 that have been observed experimentally are not considered. Such oxides affect the electronic structure and adsorption energy landscape at the metal-oxide interface46,55,56,57,58 and should be included in future versions of our descriptor. For instance, incorporating general descriptors (e.g., Magpie) for co-catalysts and support materials could provide additional information to decide which active material-support combination should be investigated further in experimental testing.
Additionally, the choice of adsorbates can also influence the effectiveness of the AED descriptor. Our study focuses on the four most relevant intermediates observed on the Cu(211) surface30. Studies on different materials, such as Ni-ZrO258,59, suggest that other intermediates or by-products like CO could play an important role in the hydrogenation mechanism. Thus, further investigations could extend the set of adsorbates to better capture various reaction paths and, therefore, material-specific activity.
Our workflow clusters materials with high CO2 conversion efficiency, but the materials can vary in their selectivity towards methanol or methane36,45,50,52,60. As the reaction conditions, preparation procedures, or the interaction with support materials seem to affect the selectivity45,60, our proposed catalyst candidates should thus be tested under various conditions to investigate their optimal selectivity towards methanol.
To conclude the analysis of our results, we note that the similarity of ZnRh and ZnPt3 to good catalysts in the literature, in terms of both SAAED and Wasserstein distance, suggests that they could be potential catalyst candidates. As Cu-based catalysts are known for their vulnerability to degradation5, it is reasonable to also pre-examine these materials in terms of stability. Given the harsh reaction conditions, mainly temperatures around 800 K36, the melting temperature of the catalyst is directly related to the stability of the catalyst. The melting temperature of both ZnRh and ZnPt3 is higher than that of pure copper or CuZn37, suggesting that our new candidates could also be more durable.
Summary
In summary, we have established a fast and reliable computational approach for discovering new catalyst candidates for the conversion of CO2 to methanol utilizing data-driven methodologies such as MLFFs and hierarchical clustering. Beginning with a list of potential metallic elements, we extracted experimentally verified materials from the Materials Project database. By integrating tools from fairchem, mainly OCP MLFFs, we created an extensive database of adsorption energies for a wide range of material facets and possible adsorption sites. We compiled this information to obtain a novel material descriptor, AED, which offers a more effective representation of the complex nature of heterocatalysts compared to standard methods. By carefully choosing the adsorbates, the descriptor can be tailored to provide the most information for any heterocatalytic reaction under study. Through efficient sampling for validation, we were able to quantify the quality of our workflow with a minimal number of DFT calculations while ensuring the high quality of our database. We grouped the materials by their AED similarity using statistical methods and clustering. This allowed us to pinpoint promising new candidates, namely ZnRh and ZnPt3, based on their resemblance to known effective catalysts. Our results indicate that AEDs, together with statistical analysis, can serve as material fingerprints, aiding in the prediction of catalyst activity and accelerating the discovery process.
