Machine Learning the Roman Universe

Wide Field Science – Regular
Shirley Ho, New York University, PI

The Roman Space Telescope (Roman), with its Large Scale Structure (LSS) survey, will provide an unprecedented amount of data to answer fundamental questions about the universe, such as its origin, content, and future. Advances on any of these fronts could represent breakthrough discoveries in physical cosmology. However, nonlinear gravitational evolution makes it difficult to extract the relevant information with traditional methods: the methods currently adopted in LSS surveys cannot extract the full information content of the survey data.

To address this challenge, we propose to develop three machine learning (ML)-based methods that learn the information in the data and determine the cosmological parameters and initial conditions of the universe. The proposed methods have the potential to extract information from Roman LSS data in a theoretically optimal way. The first method is based on a Bayesian statistical inference framework, which first reconstructs the initial conditions and then uses that information to learn the likelihood of the data. The second method is based on unsupervised learning and learns the likelihood of the data as a function of the cosmological parameters via a normalizing flow. The third method is based on diffusion models and uses score-based generative models to draw posterior samples of the initial conditions and properties of the universe from the nonlinear large-scale structure.
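As an illustration of the flow-based likelihood idea in the second method, the following is a minimal sketch, not the proposal's implementation. It assumes a toy one-layer conditional affine flow acting on a single scalar summary statistic `x`, with one parameter `theta`. The function name `flow_log_likelihood` and the weights `w_mu` and `w_log_sigma` are hypothetical, and the weights are fixed by hand here, whereas in practice they would be trained on simulations.

```python
import numpy as np

def flow_log_likelihood(x, theta, w_mu, w_log_sigma):
    """log p(x | theta) for a toy one-layer conditional affine flow.

    The flow maps data x to a latent z = (x - mu(theta)) / sigma(theta),
    with z ~ N(0, 1). By the change-of-variables formula:
        log p(x | theta) = log N(z; 0, 1) - log sigma(theta)
    """
    mu = w_mu[0] + w_mu[1] * theta                    # shift, linear in theta
    log_sigma = w_log_sigma[0] + w_log_sigma[1] * theta  # log-scale, linear in theta
    z = (x - mu) * np.exp(-log_sigma)
    log_base = -0.5 * (z**2 + np.log(2.0 * np.pi))    # standard normal log-density
    return log_base - log_sigma                       # subtract log |dx/dz|

# Hypothetical "trained" weights (assumed here, not learned).
w_mu = np.array([0.0, 2.0])
w_log_sigma = np.array([-1.0, 0.5])

# Toy mock data drawn at a known parameter value.
rng = np.random.default_rng(0)
theta_true = 0.8
sigma_true = np.exp(w_log_sigma[0] + w_log_sigma[1] * theta_true)
x_obs = rng.normal(w_mu[0] + w_mu[1] * theta_true, sigma_true, size=2000)

# Evaluate the summed log-likelihood on a parameter grid and take the maximum.
theta_grid = np.linspace(0.0, 2.0, 201)
total_ll = np.array(
    [flow_log_likelihood(x_obs, t, w_mu, w_log_sigma).sum() for t in theta_grid]
)
theta_best = theta_grid[np.argmax(total_ll)]
```

Because the flow gives an explicit, normalized density for every parameter value, the same function can be combined with a prior to form a posterior over `theta`. A realistic flow would stack many invertible layers and act on high-dimensional data.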

We will marginalize over astrophysical nuisance parameters, take advantage of scale-separation information, and pay special attention to the robustness of all methods to systematic errors and astrophysical effects. An important contribution of this proposal is the generation of mock survey datasets through deep-learning-accelerated simulations of galaxy surveys. These will also serve as testbeds for the ML methods. As an example use case, we will apply these tools to the problem of extracting information about the initial conditions of the universe, via primordial non-Gaussianity, from space-based galaxy survey data.

The second major goal of this proposal is to develop a community framework for testing and comparing different ML techniques. We will create deep-learning-accelerated simulation datasets with survey realism that can be used for benchmarking and blind analysis of the various methods. We will promote open-access ML tools by releasing both the software and the simulated datasets into the public domain and by providing community support for these products. We will also encourage community engagement through data challenges.

The results of this study will provide new ML methods that significantly increase the information extracted relative to existing LSS analysis methods. This could expand the potential of space-based LSS missions, not just Roman, to uncover the fundamental physics of the universe.
