Searching for “oddballs” with machine learning: Detecting anomalous exoplanets using low-dimensional representations of transit spectra deep learned by autoencoders

Looking for the “oddball” (image: Grok via Astrobiology.com)

In this study, we explore the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation.

We use the Atmospheric Big Challenge (ABC) Database, a publicly available dataset containing over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class.
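As a rough illustration of how such a scenario could be set up, the sketch below splits a catalogue into a normal class and an anomalous class by thresholding a CO2 abundance label. The array of log10 CO2 mixing ratios, its range, and the threshold value of -4.0 are all hypothetical; the abstract does not state the cut actually used.

```python
import numpy as np

def label_co2_anomalies(log10_co2, threshold=-4.0):
    """Return a boolean mask that is True for CO2-rich (anomalous) atmospheres.

    The threshold is a hypothetical value chosen for illustration only.
    """
    log10_co2 = np.asarray(log10_co2, dtype=float)
    return log10_co2 > threshold

# Hypothetical stand-in for the abundance labels of a simulated catalogue.
rng = np.random.default_rng(0)
log10_co2 = rng.uniform(-9.0, -3.0, size=1000)

is_anomaly = label_co2_anomalies(log10_co2)
normal_idx = np.flatnonzero(~is_anomaly)   # CO2-poor: the normal class
anomaly_idx = np.flatnonzero(is_anomaly)   # CO2-rich: the anomalies
```

The two index arrays can then be used to select the corresponding spectra for fitting and evaluation.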

We benchmarked four different anomaly detection strategies: autoencoder reconstruction loss, one-class support vector machine (one-class SVM), K-means clustering, and local outlier factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder latent space using receiver operating characteristic (ROC) curves and area under the curve (AUC) metrics.
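Whatever the detector, the comparison comes down to turning a per-spectrum anomaly score into an AUC. The following is a minimal sketch of that step, assuming higher scores mean "more anomalous". It computes the AUC through the equivalent rank statistic (the probability that a randomly chosen anomaly scores higher than a randomly chosen normal sample) rather than by tracing the ROC curve. The function name and the toy scores are illustrative, not taken from the paper.

```python
import numpy as np

def roc_auc(scores, is_anomaly):
    """Area under the ROC curve for anomaly scores (higher = more anomalous).

    Uses the pairwise (Mann-Whitney) formulation; ties count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    pos = scores[is_anomaly]    # scores of true anomalies
    neg = scores[~is_anomaly]   # scores of normal samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy scores: two normal samples followed by two anomalies.
auc = roc_auc([0.1, 0.2, 0.8, 0.9], [False, False, True, True])
```

The same function can be applied unchanged to reconstruction losses, one-class SVM decision values (negated so that larger means more anomalous), distances to K-means centroids, or LOF scores, which puts the four strategies on a common scale.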

To test the performance of the different methods under realistic conditions, we introduced Gaussian noise at levels ranging from 10 to 50 ppm. Our results show that anomaly detection is consistently more effective when performed in the latent space, across all noise levels. In particular, K-means clustering in the latent space emerged as a stable and high-performing technique.
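The noise experiment and the latent-space K-means detector can be sketched end to end on toy data. This is an illustration under several assumptions, none of which come from the paper: spectra are fractional transit depths (so 1 ppm = 1e-6), the toy spectra and their "CO2-like" bump are invented, a linear PCA projection stands in for the trained autoencoder's encoder, and K-means is fit on normal-class latent points with distance to the nearest centroid used as the anomaly score.

```python
import numpy as np

PPM = 1e-6  # assumes spectra are fractional transit depths

def add_noise_ppm(spectra, sigma_ppm, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma_ppm."""
    return spectra + rng.normal(0.0, sigma_ppm * PPM, size=spectra.shape)

def fit_linear_encoder(train, n_latent):
    """PCA projection used as a stand-in for an autoencoder's encoder."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_latent]
    return lambda x: (x - mean) @ components.T

def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns the k centroids."""
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def nearest_centroid_distance(points, centroids):
    """Anomaly score: distance from each point to its closest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1)

def toy_spectra(n, co2_rich, rng, n_bins=40):
    """Invented spectra: a random slope, plus a bump for the 'CO2-rich' class."""
    wl = np.linspace(0.0, 1.0, n_bins)
    spectra = 0.01 + rng.uniform(-1.0, 1.0, size=(n, 1)) * 100 * PPM * wl
    if co2_rich:
        bump = np.exp(-((wl - 0.6) / 0.05) ** 2)
        spectra = spectra + rng.uniform(0.5, 1.0, size=(n, 1)) * 300 * PPM * bump
    return spectra

def run_demo(sigma_ppm=30.0, seed=0):
    rng = np.random.default_rng(seed)
    train_normal = add_noise_ppm(toy_spectra(400, False, rng), sigma_ppm, rng)
    train_rich = add_noise_ppm(toy_spectra(100, True, rng), sigma_ppm, rng)
    # The encoder sees an unlabeled mixture; K-means sees only the normal class.
    encode = fit_linear_encoder(np.vstack([train_normal, train_rich]), n_latent=2)
    centroids = kmeans(encode(train_normal), k=3, rng=rng)
    test_normal = add_noise_ppm(toy_spectra(200, False, rng), sigma_ppm, rng)
    test_rich = add_noise_ppm(toy_spectra(200, True, rng), sigma_ppm, rng)
    return (nearest_centroid_distance(encode(test_normal), centroids),
            nearest_centroid_distance(encode(test_rich), centroids))

normal_scores, rich_scores = run_demo(sigma_ppm=30.0)
print(np.median(normal_scores), np.median(rich_scores))
```

Calling `run_demo` with `sigma_ppm` set to 10, 30, and 50 gives a feel for how the separation between the two score distributions changes with noise in this toy setting. It says nothing quantitative about the paper's results, which rely on a nonlinear autoencoder and the ABC spectra.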

We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. In contrast, the performance of anomaly detection methods applied directly in the raw spectral space degrades significantly as the noise level increases.

This suggests that autoencoder-driven dimensionality reduction provides a robust methodology for flagging chemically unusual targets in large-scale surveys where exhaustive searches are computationally prohibitive.

Alexander Roman, Emilie Panek, Roy T. Forestano, Eyup B. Unlu, Katia Matcheva, Konstantin T. Matchev

Comments: 14 pages, 12 figures
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Cite as: arXiv:2601.02324 [astro-ph.EP] (or arXiv:2601.02324v1 [astro-ph.EP] for this version)
https://doi.org/10.48550/arXiv.2601.02324
Submission history
From: Emilie Panek
[v1] Monday, January 5, 2026 18:15:53 UTC (2,734 KB)
https://arxiv.org/abs/2601.02324