A research team at the University of Warwick, UK, has developed a stacked ensemble learning framework that predicts key asteroseismic parameters of δ Scuti stars directly from TESS light curves. On a sample of 643 stars, the method performed well: for the 60 stars held out of training, the coefficient of determination R² was at least 0.77 for every target parameter, indicating good generalization. The predictions agree well with traditional asteroseismic analysis.
Asteroseismology is one of the most penetrating research methods in modern stellar physics: it analyzes a star's natural oscillation signals to infer the star's internal structure and evolutionary state. Among its many targets, δ Scuti stars (approximately 1.5 to 2.5 times the mass of the Sun), with their rich pulsation modes and dense oscillation spectra, are an important testing ground. The pulsations in these stars are driven primarily by the opacity (κ) mechanism in the helium ionization zone, and the convective core further induces complex processes such as convective overshoot, chemical mixing, and angular momentum redistribution. At the same time, relatively fast rotation couples the oscillation modes and splits their frequencies, which greatly increases the difficulty of mode identification and parameter extraction.
In asteroseismic analysis, three parameters are of particular interest: the frequency of the highest peak in the power spectrum, ν(Aₘₐₓ); the frequency of maximum oscillation power, νₘₐₓ; and the large frequency separation, Δν. Of these, Δν is highly sensitive to a star's mean density and is a central indicator of its overall structure. In δ Scuti stars, however, rapid rotation and the overlapping of many modes disrupt the otherwise regular frequency spacing, which makes Δν hard to measure with traditional methods.
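The link between Δν and mean density can be made concrete with the standard first-order scaling relation Δν ≈ Δν⊙ · √(ρ̄/ρ̄⊙). This relation is not quoted in the article and is shown here only as an illustration. The solar reference value of about 135 μHz and the example mass and radius are assumptions, and for rapidly rotating δ Scuti stars the relation holds only approximately.

```python
import math

DELTA_NU_SUN_MUHZ = 135.0  # assumed solar large separation, in microhertz


def mean_density_solar(mass_msun: float, radius_rsun: float) -> float:
    """Mean density in solar units, using rho proportional to M / R^3."""
    return mass_msun / radius_rsun ** 3


def large_separation_muhz(mass_msun: float, radius_rsun: float) -> float:
    """First-order scaling: delta_nu ~ delta_nu_sun * sqrt(rho / rho_sun)."""
    return DELTA_NU_SUN_MUHZ * math.sqrt(mean_density_solar(mass_msun, radius_rsun))


# Hypothetical star: 1.8 solar masses and 2.0 solar radii.
dnu = large_separation_muhz(1.8, 2.0)
# 1 microhertz corresponds to 0.0864 cycles per day.
print(f"delta_nu ~ {dnu:.1f} microhertz ({dnu * 0.0864:.2f} cycles/day)")
```

Because the density enters under a square root, a star that expands at fixed mass shows a steadily shrinking Δν, which is why the parameter tracks overall structure.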
In recent years, the large volume of high-precision light curves acquired by the TESS satellite has greatly expanded the sample of such stars available for study. However, the data processing pipeline is still computationally intensive and dependent on expert experience, and high-accuracy parameter extraction remains difficult. Against this background, machine learning offers a new technical path. Compared with traditional methods, ensemble learning can combine the predictions of multiple models and achieve higher accuracy and stability on complex data. Techniques such as random forests, gradient boosting, and ridge regression have shown great potential in astronomical data analysis in recent years.
Building on this idea, a research team at the University of Warwick in the UK has developed a stacked ensemble learning framework that predicts key asteroseismic parameters of δ Scuti stars directly from TESS light curves. On a sample of 643 stars, the method performed well: for the 60 stars held out of training, the coefficient of determination R² was at least 0.77 for every target parameter, indicating good generalization. The predictions agree well with traditional asteroseismic analysis.
The results were published in The Astronomical Journal under the title “An ensemble machine learning approach to estimate the asteroseismic index of δ Scuti Stars observed by TESS.”
Research highlights:
* A machine learning framework is proposed that estimates key asteroseismic parameters directly from light curves, overcoming limitations of traditional methods and significantly improving the efficiency of parameter extraction.
* Optimized feature selection and model architecture deliver highly accurate predictions, whose reliability is verified on independent samples.
* Asteroseismic indices are determined for 251 δ Scuti stars, yielding a new catalog that enriches the associated stellar parameter database and provides important data support for future large-sample statistical analyses and stellar evolution studies.
Paper URL: https://beta.iopscience.iop.org/article/10.3847/1538-3881/ae4bd8
Dataset: TESS light curve screening and construction of asteroseismic samples
The core dataset used in this study comprises TESS light curves for 643 δ Scuti stars and three important asteroseismic indices: ν(Aₘₐₓ), νₘₐₓ, and Δν. The initial sample contained 677 δ Scuti stars, of which 643 were retained after several rounds of screening. The screening criteria were:
* The star must have a TESS 2-minute short-cadence light curve (from the MAST archive).
* Each observed sector must contain at least 7,000 data points.
* The light curve must be the PDCSAP (Pre-search Data Conditioning Simple Aperture Photometry) product.
* All three asteroseismic parameters must be available.
In addition, the researchers selected a further 251 δ Scuti stars as a supplementary sample. These stars also have high-quality light curves, but their asteroseismic parameters have not been published. The selection criteria were coverage of at least three observed sectors, with at least 7,000 data points in each. This part of the sample is used mainly to apply the model to real predictions.
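The screening rules for the two samples can be sketched as simple filters. This is a minimal illustration, not the authors' code: the `Star` record and its field names are hypothetical, and treating "each observed field" (a TESS sector) as a condition on every sector of a star is one reading of the criteria.

```python
from dataclasses import dataclass
from typing import List

MIN_POINTS_PER_SECTOR = 7000  # threshold quoted in the article


@dataclass
class Star:
    # All field names are hypothetical, chosen for this illustration.
    name: str
    points_per_sector: List[int]  # 2-minute cadence points in each observed sector
    has_pdcsap: bool              # PDCSAP light curve available
    has_indices: bool             # all three published indices available


def sectors_ok(star: Star, min_sectors: int) -> bool:
    """Enough sectors, and every sector meets the data-point threshold."""
    sectors = star.points_per_sector
    return len(sectors) >= min_sectors and all(
        n >= MIN_POINTS_PER_SECTOR for n in sectors
    )


def in_core_sample(star: Star) -> bool:
    """Labeled stars used for training and testing."""
    return sectors_ok(star, 1) and star.has_pdcsap and star.has_indices


def in_supplementary_sample(star: Star) -> bool:
    """Unlabeled stars with at least three well-sampled sectors."""
    return sectors_ok(star, 3) and not star.has_indices


stars = [
    Star("A", [12000, 13000], True, True),
    Star("B", [6500], True, True),
    Star("C", [9000, 9500, 11000], True, False),
]
core = [s.name for s in stars if in_core_sample(s)]
supplementary = [s.name for s in stars if in_supplementary_sample(s)]
```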
Frequency histogram of the 643 δ Scuti stars
Model: Ensemble regression framework that stacks multiple base models
The model in this study estimates a star's asteroseismic parameters from features of its light curve. The overall workflow includes feature extraction, data preprocessing, ensemble modeling, and hyperparameter optimization.
For feature construction, two types of features are used. The first is statistical features (mean, standard deviation, median, etc.) that describe the basic characteristics of the photometric distribution. The second is frequency-domain features obtained with principal component analysis (PCA), the autocorrelation function (ACF), the fast Fourier transform (FFT), and the discrete wavelet transform (DWT), which capture periodic and multiscale structure in the oscillation signal.
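A few of these features can be sketched with NumPy alone. The function below is an illustrative assumption rather than the study's feature set: the feature names, the number of ACF lags, and the synthetic light curve are invented for the example, the FFT step assumes evenly spaced samples, and PCA and DWT are omitted to keep the sketch short.

```python
import numpy as np


def extract_features(time_days, flux, n_acf_lags=5):
    """Return a dict of simple statistical, ACF, and FFT features."""
    time_days = np.asarray(time_days, dtype=float)
    flux = np.asarray(flux, dtype=float)
    centered = flux - flux.mean()
    std = flux.std()

    features = {
        "mean": flux.mean(),
        "std": std,
        "median": np.median(flux),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4 - 3.0,
    }

    # Autocorrelation at the first few lags, normalized so that lag 0 equals 1.
    acf = np.correlate(centered, centered, mode="full")[flux.size - 1:]
    acf = acf / acf[0]
    for lag in range(1, n_acf_lags + 1):
        features[f"acf_lag_{lag}"] = acf[lag]

    # FFT amplitude spectrum (assumes evenly spaced samples).
    cadence = np.median(np.diff(time_days))
    freqs = np.fft.rfftfreq(flux.size, d=cadence)  # cycles per day
    amplitude = np.abs(np.fft.rfft(centered))
    peak = np.argmax(amplitude[1:]) + 1            # skip the zero-frequency bin
    features["fft_peak_freq"] = freqs[peak]
    features["fft_peak_amp"] = 2.0 * amplitude[peak] / flux.size
    return features


# Synthetic 2-day light curve at 2-minute cadence with a 15 cycles/day pulsation.
rng = np.random.default_rng(0)
t = np.arange(1440) * (2.0 / 1440.0)
f = 1.0 + 0.01 * np.sin(2 * np.pi * 15.0 * t) + 0.001 * rng.normal(size=t.size)
feats = extract_features(t, f)
```

Each star's light curve is reduced to one fixed-length feature vector in this way, which is what allows standard regression models to be trained on the sample.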
In the data preprocessing stage, samples with missing values are first removed and the features are normalized. In addition, to address the uneven distribution of some features, a resampling method based on statistical distributions is introduced to generate synthetic data and reduce bias, which improves the stability of model training.
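The first two preprocessing steps can be sketched as follows. This is a minimal illustration with assumed choices: z-score scaling stands in for the unspecified normalization, and the resampling step is left out because the article does not describe it in enough detail to reproduce.

```python
import numpy as np


def preprocess(X, y):
    """Drop samples with missing values, then standardize each feature column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only rows where every feature and the target are present.
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, y = X[keep], y[keep]

    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std, y, mean, std


# Toy feature matrix with one incomplete sample (values are made up).
X_raw = [[1.0, 10.0], [2.0, float("nan")], [3.0, 30.0], [5.0, 50.0]]
y_raw = [0.1, 0.2, 0.3, 0.4]
X_scaled, y_clean, feat_mean, feat_std = preprocess(X_raw, y_raw)
```

In a real workflow the mean and standard deviation would be computed on the training split only and then reused for the test data, so that no information leaks from the test set.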
For the model itself, a stacked ensemble regression framework uses random forest, gradient boosting regression, and ridge regression as base models. The first two improve predictive performance by reducing variance and bias, respectively, while ridge regression handles collinearity between features through regularization. The outputs of the base models then serve as inputs to a meta-regressor that fuses them, improving overall generalization and reducing prediction error.
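A stacked regressor of this shape can be sketched with scikit-learn's `StackingRegressor`. Whether the authors used this class is not stated. The choice of ridge regression as the meta-regressor, all hyperparameter values, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(random_state=0):
    """Three base models whose predictions feed one meta-regressor."""
    base_models = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=random_state)),
        ("gbr", GradientBoostingRegressor(random_state=random_state)),
        ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
    ]
    # The meta-regressor is fitted on out-of-fold predictions of the base models.
    return StackingRegressor(
        estimators=base_models,
        final_estimator=Ridge(alpha=1.0),  # assumed meta-regressor
        cv=5,
    )


# Synthetic stand-in for (light-curve features, one asteroseismic index).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

model = build_stacked_model()
model.fit(X[:150], y[:150])
test_r2 = model.score(X[150:], y[150:])  # coefficient of determination
```

One such model would be fitted per target parameter, or the regressor wrapped for multi-output use; the article does not say which approach was taken.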
During training, the researchers combined random search with cross-validation to tune key hyperparameters (such as the number of trees, maximum depth, and learning rate) and obtain a stable, high-performing configuration.
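Random search with cross-validation is available in scikit-learn as `RandomizedSearchCV`. The sketch below tunes a single gradient boosting model on synthetic data. The search ranges, the number of iterations, and the fold count are hypothetical, since the article names only the kinds of hyperparameters that were tuned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space covering the hyperparameters named in the text.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=120)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of random configurations to try
    cv=3,              # 3-fold cross-validation for each configuration
    scoring="r2",
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

Random search samples a fixed number of configurations instead of enumerating the full grid, which keeps the cost manageable when several models are tuned at once.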
Generalization test with 60 independent stars: R² of at least 0.77 for all asteroseismic indices
Experimental validation includes three parts: training the model, evaluating its generalization ability, and predicting new samples.
During the training phase, the researchers randomly selected 583 of the 643 stars to build the model, split them into training and test sets in an 8:2 ratio, and repeated the split 100 times to reduce the effect of randomness. The remaining 60 stars served as an independent test set for evaluating generalization. In addition, the 251 unlabeled stars were reserved for the final prediction.
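The data partition described here can be sketched with index arrays. The seed and the rounding of the 8:2 split (117 test stars out of 583) are assumptions, as the article does not give those details.

```python
import numpy as np

N_STARS, N_HOLDOUT, N_REPEATS, TEST_FRACTION = 643, 60, 100, 0.2

rng = np.random.default_rng(42)      # arbitrary seed, for reproducibility only
indices = rng.permutation(N_STARS)
holdout_idx = indices[:N_HOLDOUT]    # 60 stars never used to build the model
modeling_idx = indices[N_HOLDOUT:]   # 583 stars used for training and testing

n_test = round(TEST_FRACTION * modeling_idx.size)  # assumed rounding: 117

# Repeat the random 8:2 split 100 times to average out split-to-split noise.
splits = []
for _ in range(N_REPEATS):
    shuffled = rng.permutation(modeling_idx)
    splits.append((shuffled[n_test:], shuffled[:n_test]))  # (train, test)
```

Keeping the 60 held-out stars outside every repetition is what makes them an independent check: no split ever exposes them to the model.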
Comparison of measured and predicted values, relative errors, and error distributions for 583 stars.
For the training and test samples, the R² values of the model predictions for ν(Aₘₐₓ), νₘₐₓ, and Δν are 0.95, 0.93, and 0.87, respectively, and the relative errors for most samples are below 0.2. Feature importance analysis shows that the autocorrelation function (ACF) contributes the most, followed by FFT and DWT, and that some statistical features (such as skewness and kurtosis) also play a role. The learning curve shows that the model converges stably and that the hyperparameter optimization is effective.
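The two metrics quoted here follow their standard definitions, sketched below. The measured and predicted values are made up for illustration and are not taken from the study.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def relative_error(y_true, y_pred):
    """Absolute prediction error as a fraction of the measured value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_pred - y_true) / np.abs(y_true)


# Hypothetical measured and predicted values of one index.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]

r2 = r_squared(measured, predicted)
rel_err = relative_error(measured, predicted)
fraction_below_02 = np.mean(rel_err < 0.2)  # share of stars under the 0.2 level
```

R² summarizes how much of the variance across the sample the model explains, while the relative error shows how far off each individual star is, so the two are best read together.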
Model learning curve
The model maintains good performance on the independent test set: the R² values for the three parameters are 0.91, 0.87, and 0.77, respectively, and the predictions agree well with the observed values. Results vary little across the repeated experiments, indicating that the model is stable and robust. Finally, the researchers applied the model to the 251 unlabeled stars to obtain predicted asteroseismic parameters. The results generally fall within a reasonable range for δ Scuti stars.
Conclusion
Overall, this study is a targeted supplement to traditional asteroseismic techniques rather than a replacement for them. As large-scale observational data accumulate rapidly, data-driven methods can deliver efficient parameter estimates, which can then be combined with detailed physical modeling for in-depth analysis. This approach is particularly relevant for targets such as δ Scuti stars, whose complex oscillation modes are difficult to analyze in a standardized way.
This article is from the WeChat official account “HyperAI Super Neural”. Author: Tian Xiaoyao. Republished by 36Kr with permission.
