The classification of B-type and hot subdwarf stars presents several technical challenges. Firstly, the spectra of these stars can have overlapping features, making accurate differentiation difficult. Effective baseline correction is crucial; therefore, we used Asymmetric Least Squares (ALS) to remove the baseline and enhance signal quality. Identifying the most relevant features from the spectra is another significant challenge. We addressed this by employing the PanCore concept to identify 500 unique patterns essential for classification.
Traditional machine learning methods have been explored extensively. In this paper, we implement the PanCore concept, based on K-means, for training data acquisition and a Support Vector Machine (SVM) for classification of the star data. The PanCore concept uses K-means to identify and select representative samples from the available training data, aiming to construct a robust classification model.
Model selection and parameter tuning significantly affect classification performance. We evaluated three SVM kernels (linear, polynomial, and radial basis) and tuned their parameters by cross-validation. Because the choice of kernel function plays a crucial role in capturing and separating the underlying patterns in the data, we also examine how the different kernels affect the accuracy and performance of star classification.
SVMs offer several advantages over other methods such as decision trees, ensemble learning, and neural networks/deep learning for this spectral classification problem. SVMs are less prone to overfitting compared to decision trees and perform well in high-dimensional spaces, which is crucial for spectral data. They maximize the margin between classes, aiding in the distinction between overlapping spectral features of B-type and hot subdwarf stars. The use of kernel functions allows SVMs to handle nonlinear relationships effectively. Additionally, SVMs are computationally more efficient and require less data than deep learning models, with simpler model interpretation and fewer hyperparameters to tune. These characteristics make SVMs particularly well-suited for our classification task.
Additionally, the data imbalance between the more numerous B-type stars and the fewer hot subdwarf stars can bias the model. We mitigated this by ensuring balanced training through appropriate sampling techniques. The model can effectively classify new star data using SVM based on the learned patterns from the training samples.
The authors present a flow chart in Fig. 2 to visually represent the adopted methodology. This flow chart outlines the sequential steps and procedures involved in acquiring the training data, training the model using K-means and SVM, and ultimately classifying the star data. By implementing this approach and analyzing the impact of the kernel function, the study aims to enhance the accuracy and efficiency of star classification, contributing to a deeper understanding of celestial objects and their characteristics.
Preprocessing
In star spectroscopy, the spectra of stars are typically composed of absorption or emission lines superimposed on a continuum of emission. These spectral features arise from various physical processes occurring within the star, providing crucial information about its composition, temperature, and other fundamental properties. The deviation from the expected smoothness at zero intensity is a consequence of the presence of these absorption or emission lines and their interactions with the continuum emission.
To address these distortions and accurately interpret the spectral features, baseline correction procedures are utilized. These procedures aim to mitigate systematic variations in the intensity baseline, thereby improving the clarity of the spectral information. However, the effectiveness of baseline correction procedures depends on tuning parameters that need to be carefully selected.
In this study, instead of relying on subjective approaches, we adopted an objective procedure for choosing the baseline correction method^{37}. This method outlines an optimal and systematic approach to selecting the most suitable baseline correction technique for the given star spectroscopy data.
By employing this objective procedure, our goal is to eliminate potential biases and ensure the selection of a baseline correction method that aligns best with the specific characteristics of the star spectra being analyzed. This objective approach enhances the reliability and reproducibility of the baseline correction process, leading to a more accurate and meaningful interpretation of the star spectroscopic data. It also provides a standardized methodology that can be applied consistently across different datasets, improving the overall quality and comparability of the results obtained.
The procedure is as follows:
1. For each baseline correction algorithm, determine the appropriate levels at which all parameters will be tested.
2. Correct the baseline with the corresponding algorithm at each parameter level.
3. Use the baseline-corrected spectral data to model responses related to the physical characteristics of the stars, employing Partial Least Squares (PLS) regression.
4. Validate the model’s prediction capability to assess its accuracy in forecasting the relevant spectral features.
5. Determine the optimal parameter levels for each baseline correction algorithm.
6. Select the baseline correction algorithm with the best prediction capability as the optimal choice among all the algorithms considered.
The evaluation of prediction capability typically involves assessing cross-validated accuracy. This process includes performing cross-validation, where the data is divided into subsets for both training and testing the model. This division allows for an estimation of the model’s predictive performance.
The Asymmetric Least Squares (ALS) method is briefly explained below.
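The selection procedure above can be sketched as a plain grid search. This is only an illustration under stated assumptions: `select_baseline_method`, `methods`, and `cv_score` are hypothetical names, and `cv_score` stands in for the PLS regression plus cross-validation step, which is not implemented here.

```python
from itertools import product

def select_baseline_method(spectra, targets, methods, cv_score):
    """Return (best_score, method_name, params) over all methods and parameter levels.

    methods:  {name: (correct_fn, {param_name: [levels, ...]})}  (hypothetical layout)
    cv_score: callable(corrected_spectra, targets) -> float, higher is better;
              a placeholder for the cross-validated PLS prediction capability.
    """
    best = None
    for name, (correct, grid) in methods.items():
        keys = sorted(grid)
        # Steps 1-2: correct the baseline at every combination of parameter levels.
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            corrected = [correct(s, **params) for s in spectra]
            # Steps 3-4: model and validate (delegated to cv_score).
            score = cv_score(corrected, targets)
            # Steps 5-6: keep the best parameters and the best algorithm overall.
            if best is None or score > best[0]:
                best = (score, name, params)
    return best

# Toy usage: one "method" that subtracts a constant offset, scored by how
# close the corrected spectra are to zero.
toy_methods = {"offset": (lambda s, shift: [v - shift for v in s], {"shift": [0, 1, 3]})}
toy_score = lambda corrected, targets: -sum(abs(v) for s in corrected for v in s)
result = select_baseline_method([[3, 3], [3, 3]], None, toy_methods, toy_score)
```

In practice each entry of `methods` would wrap a real baseline correction algorithm such as ALS, with its own parameter grid.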
Asymmetric least squares (ALS)
The Asymmetric Least Squares (ALS)^{23,24} method is a powerful approach to data analysis that uses the least squares method to effectively handle predictor variables with significant errors. By assigning appropriate weights, the ALS method downplays the influence of variables with substantial errors while still considering their impact on the analysis.
To achieve a smooth and accurate representation of the data, the ALS method incorporates a second-derivative constraint within its smoothing process. This constraint helps balance the trade-off between achieving smoothness and preserving the relevant features present in the dataset.
The ALS method is mathematically expressed as:
$$\begin{aligned} S = \sum _i w_i (x_i - b_i)^2 + \lambda \sum _i (\Delta ^2 b_i)^2 \end{aligned}$$
(1)
Here, \(x_i\) represents the original spectrum, \(b_i\) denotes the estimated baseline, \(w_i\) corresponds to the asymmetric residual weights, and \(\Delta ^2\) represents the second-order difference operator (a discrete second derivative) applied to the estimated baseline. ALS aims to minimize the value of the expression \(S\) by adjusting the baseline estimates.
To fine-tune the ALS algorithm, there are two adjustable parameters: \(\lambda\), the smoothing parameter, which controls the degree of smoothness applied to the estimated baseline; and \(w\), the weight assigned to the asymmetric residual, which allows for flexibility in handling different degrees of errors in predictor variables.
By appropriately adjusting these parameters, we customize the behavior of the ALS method according to the specific characteristics of our data. This flexibility enhances the ALS algorithm’s adaptability and improves its performance in accurately estimating baselines and revealing meaningful patterns in various analytical scenarios.
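A minimal sketch of an ALS baseline estimate is given below, following the commonly used iteratively reweighted scheme. It is an illustration, not the implementation used in this study. The asymmetry parameter `p`, the iteration count `n_iter`, the default values, and the synthetic spectrum are all assumptions introduced for the example: points above the current baseline receive weight `p` and points below receive `1 - p`.

```python
import numpy as np

def als_baseline(x, lam=1e5, p=0.01, n_iter=20):
    """Estimate a baseline b by minimizing sum w_i (x_i - b_i)^2 + lam * sum (D2 b)_i^2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Second-order difference matrix, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Weighted penalized least squares: (W + lam * D^T D) b = W x
        b = np.linalg.solve(np.diag(w) + penalty, w * x)
        # Asymmetric weights: small weight above the baseline, large weight below.
        w = np.where(x > b, p, 1.0 - p)
    return b

# Synthetic example (assumed data): a linear baseline plus one narrow peak.
grid = np.linspace(0.0, 1.0, 200)
true_baseline = 2.0 + 3.0 * grid
spectrum = true_baseline + 5.0 * np.exp(-((grid - 0.5) / 0.02) ** 2)
baseline = als_baseline(spectrum)
corrected = spectrum - baseline
```

With a small `p` the baseline hugs the lower envelope of the signal. For spectra dominated by absorption lines, a value of `p` close to 1 would presumably be chosen instead, so that the baseline follows the continuum.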
Pancore spectrum training data acquisition
The study involves a substantial dataset of hot subdwarf and B-type star spectra. To overcome the challenge of training a model on such a vast dataset, we employed the pancore concept, originally developed in genomics^{38}, as the basis for training data acquisition. The pancore concept involves the following steps:

1. Utilizing K-means clustering with a large value of K within each class.
2. Employing the nearest neighborhood method to extract \(s\) samples that are closest to each centroid obtained from K-means clustering.
Through the implementation of these steps, we curated a comprehensive set of \(K \times s\) samples for each class, guaranteeing the inclusion of various spectral representations within our training dataset. Note that the input for K-means clustering consisted of preprocessed flux spectra, with the input dimension explicitly defined. This approach efficiently encapsulates the intrinsic characteristics of star spectra, facilitating the model’s learning process with a manageable yet informative subset of samples.
The integration of the pancore concept in star spectra analysis significantly reduces the size of the training data while retaining pivotal features and ensuring a comprehensive portrayal of spectral diversity within each class, as depicted in Fig. 3. This allows our model to learn from a representative subset of samples, improving its accuracy and generalization in spectral classification tasks.
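The two pancore steps can be sketched as follows for a single class. This is a simplified illustration under stated assumptions: `kmeans` is a basic Lloyd-style loop written for the example, `pancore_select` is a hypothetical helper name, and the random matrix merely stands in for preprocessed flux spectra.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every centroid, shape (n, k).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def pancore_select(X, k, s, seed=0):
    """Step 1: K-means within one class. Step 2: the s samples nearest each centroid."""
    centroids = kmeans(X, k, seed=seed)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=0)[:s]   # shape (s, k): row indices per centroid
    return np.unique(nearest.ravel())     # at most k * s distinct sample indices

# Assumed stand-in for one class of preprocessed spectra: 200 spectra, 30 flux bins.
X_class = np.random.default_rng(1).normal(size=(200, 30))
selected = pancore_select(X_class, k=5, s=4)
training_subset = X_class[selected]
```

Running the same selection separately for each class and concatenating the subsets would give a training set with the same number of candidate samples per class.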
Support vector machines
Support vector machine (SVM) is a powerful supervised machine learning algorithm initially introduced by Cortes and Vapnik^{30}. It is widely utilized for both classification and regression tasks due to its ability to handle various types of data through the use of different kernels. SVM offers flexibility in choosing kernels such as linear (LSVM), polynomial (PSVM), and radial basis (RSVM)^{31,32,33}, allowing for effective modeling of complex relationships within the data.
SVMs are well-suited for spectral classification problems due to several characteristics. They are effective in high-dimensional spaces, which is important given the complexity and dimensionality of spectral data. They are robust to overfitting, especially in cases where the number of features exceeds the number of samples, as often seen in spectral datasets. SVMs also perform well with clear margin separation, which helps distinguish between B-type and hot subdwarf stars with overlapping spectral features. Additionally, SVMs can utilize different kernel functions to handle nonlinear relationships in the data, enhancing classification accuracy. Cross-validation for parameter tuning ensures optimal model performance, making SVMs a reliable choice for this classification task.
Several variables are introduced to elucidate the workings of Support Vector Machines (SVM) in classifying star spectra data. These variables include \(J\), representing the total number of training samples; \(x_j\) and \(y_j\), denoting the features and labels of each sample, respectively; \(x\) and \(y\), representing the feature space and class labels, with \(y\) taking values of either \(-1\) or \(1\); \(w\), signifying the coefficient vector; \(b\), representing the bias term; \(\alpha _j\), indicating the Lagrange multipliers associated with each training sample; \(\theta (x)\), denoting the feature mapping function; \(S(w, x)\), signifying the inner product between \(w\) and \(x\); and \(C\), representing the regularization parameter in soft-margin SVM. Each variable plays a crucial role in the formulation and optimization of the SVM algorithm, contributing to its effectiveness in accurately classifying star spectra data.
In SVM, the training samples are \((x_j,y_j)\) with \(x_j \in R^d\) and \(y_j \in \{-1,1\}\); that is, \(x\) belongs to the \(d\)-dimensional space and \(y\) takes values of either \(-1\) or \(1\). The Lagrange multipliers satisfy the constraint \(\sum _{j=1}^{J} y_j \alpha _j = 0\). The aim of SVM is to find a linear classifier in a possibly infinite-dimensional feature space, given by:
$$\begin{aligned} f(x) = \text{sign}(w \cdot \theta (x) + b) \end{aligned}$$
(2)
Here, \(w \cdot \theta (x) = S(w, x)\) denotes the inner product between the coefficient vector \(w\) and the mapped input sample \(\theta (x)\).
SVM’s strength lies in its ability to separate data points by defining a decision boundary while maximizing the margin between different classes. The choice of a kernel function determines the transformation of the input data into a higher-dimensional space, enabling effective separation of classes that may not be linearly separable in the original feature space.
By utilizing SVM with different kernels, we explore diverse strategies to classify the star spectra data effectively. The linearity of LSVM, the flexibility of PSVM, and the radial basis function of RSVM provide distinct approaches for capturing the underlying patterns and relationships within the data. This versatility allows for a comprehensive analysis of the star spectra and enhances the model’s capability to make accurate classifications.
In our study, the soft-margin Support Vector Machine (SVM)^{39} formulation is essential for effectively classifying star spectra data. The key variables involved in the soft-margin SVM include \(\theta ^*_\text{soft}(w)\), representing the optimized parameter; C, signifying the regularization parameter controlling the trade-off between achieving a smooth decision boundary and accurately classifying training data points; \(\xi _j\), indicating the slack variables that allow for misclassifications in the optimization process; \(L_\text{soft}(w, b, \alpha , \xi )\), denoting the soft-margin SVM objective function; and \(W_\text{soft}(\alpha )\), representing the dual cost function for soft-margin SVM. By fine-tuning the C parameter, we aim to strike the right balance between maximizing the margin between classes and minimizing misclassifications, ensuring optimal classification performance for star spectra data analysis.
For soft-margin SVM, optimization is given by:
$$\begin{aligned} \theta ^*_\text{soft}(w) = \text{argmin}_{w, \xi } \frac{1}{2} \Vert w\Vert ^2 + C\sum _{j=1}^{J} \xi _j \end{aligned}$$
(3)
such that,
$$\begin{aligned} y_j (w \cdot \theta (x_j) + b) \ge 1 - \xi _j \end{aligned}$$
(4)
$$\begin{aligned} \xi _j \ge 0 \end{aligned}$$
$$\begin{aligned} \begin{array}{rl} L_\text{soft}(w, b, \alpha , \xi ) =&\frac{1}{2} \Vert w\Vert ^2 + C \sum _{j=1}^{J} \xi _j - \sum _{j=1}^{J} \alpha _j \left( y_j (w \cdot \theta (x_j) + b) - 1 + \xi _j\right) \end{array} \end{aligned}$$
(5)
The objective of the soft-margin SVM optimization is to minimize the function in Eq. (3), whose minimizer is denoted by \(\theta ^*_\text{soft}(w)\), with respect to the coefficients \(w\) and slack variables \(\xi _j\) subject to the constraints in Eq. (4), where \(C\) controls the trade-off between margin maximization and error minimization.
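As a small numerical illustration of the primal objective, the sketch below evaluates \(\frac{1}{2}\Vert w\Vert ^2 + C\sum _j \xi _j\) for a fixed \(w\) and \(b\), taking each slack as the smallest value that satisfies the margin constraint, \(\xi _j = \max (0, 1 - y_j (w \cdot x_j + b))\). It assumes the identity feature map \(\theta (x) = x\); the function name and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Return (objective value, slack variables) for a linear soft-margin SVM."""
    slacks = []
    for x, y in zip(samples, labels):
        # Functional margin y_j (w . x_j + b); theta(x) = x is assumed here.
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Smallest slack satisfying y_j (w . x_j + b) >= 1 - xi_j and xi_j >= 0.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks

# Toy data: the second sample lies inside the margin and needs slack 0.5.
samples = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, samples, labels, C=2.0)
```

Increasing `C` makes the slack term dominate, which is the sense in which \(C\) trades margin width against training errors.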
The stationary conditions are,
$$\begin{aligned} \frac{\partial L_\text{soft}}{\partial w}= & {} w - \sum _{j=1}^{J} y_j \alpha _j \theta (x_j) = 0 \end{aligned}$$
(6)
$$\begin{aligned} \frac{\partial L_\text{soft}}{\partial b}= & {} \sum _{j=1}^{J} y_j \alpha _j = 0 \end{aligned}$$
(7)
$$\begin{aligned} \frac{\partial L_\text{soft}}{\partial \xi _j}= & {} C - \alpha _j = 0 \end{aligned}$$
(8)
$$\begin{aligned} \alpha _j (y_j (w \cdot \theta (x_j) + b) - 1 + \xi _j)= & {} 0 \end{aligned}$$
(9)
These stationary conditions define the critical points of the Lagrangian \(L_\text{soft}\), where the partial derivatives with respect to the parameters \(w\), \(b\), and \(\xi _j\) are equated to zero.
So the weight vector is a linear combination of the data points:
$$\begin{aligned} w = \sum _{j=1}^{J} y_j (\alpha _j - \alpha _j^*) \theta (x_j) \end{aligned}$$
(10)
The weight vector \(w\) is expressed as a linear combination of the support vectors \(x_j\), weighted by the corresponding Lagrange multipliers \(\alpha _j - \alpha _j^*\).
Then the classifier is:
$$\begin{aligned} f_\text{soft}(x)= & {} \text{sign}\left( \sum _{j=1}^{J} y_j (\alpha _j - \alpha _j^*) \theta (x_j) \cdot \theta (x) + b\right) \end{aligned}$$
(11)
$$\begin{aligned}= & {} \text{sign}\left( \sum _{j=1}^{J} y_j(\alpha _j - \alpha _j^*) S(x_j,x) + b\right) \end{aligned}$$
(12)
The soft-margin classifier \(f_\text{soft}(x)\) is determined by the sign of the inner product of the support vectors \(x_j\) with the input sample \(x\), weighted by the differences in Lagrange multipliers \(\alpha _j - \alpha _j^*\), and added to a bias term \(b\).
Substituting into the Lagrangian gives the dual cost function for soft-margin SVM:
$$\begin{aligned} W_\text{soft}(\alpha ) = \sum _{j=1}^{J} \alpha _j - \frac{1}{2} \sum _{j,i} y_j y_i (\alpha _j - \alpha _j^*) (\alpha _i - \alpha _i^*) S(x_j,x_i) \end{aligned}$$
(13)
The dual cost function \(W_\text{soft}\) captures the trade-off between maximizing the margin and minimizing classification errors, where \(\alpha _j\) are the Lagrange multipliers associated with each support vector.
The optimization for soft-margin SVM is now:
$$\begin{aligned} {\hat{\alpha }}_\text{soft} = \arg \max _\alpha W_\text{soft}(\alpha ) \end{aligned}$$
(14)
such that,
$$\begin{aligned} 0 \le \alpha _j \le C \end{aligned}$$
(15)
The optimal Lagrange multipliers \({\hat{\alpha }}_\text{soft}\) are obtained by maximizing the dual cost function \(W_\text{soft}\) subject to the constraints \(0 \le \alpha _j \le C\), ensuring that the Lagrange multipliers are within a feasible range.
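To make the dual problem concrete, the sketch below evaluates a dual cost of the form \(\sum _j \alpha _j - \frac{1}{2}\sum _{j,i} y_j y_i \alpha _j \alpha _i S(x_j, x_i)\) and checks the constraints \(0 \le \alpha _j \le C\) and \(\sum _j y_j \alpha _j = 0\). It assumes the standard classification form in which the \(\alpha _j^*\) terms are taken as zero; the function names and the two-point Gram matrix are hypothetical.

```python
def dual_objective(alpha, labels, gram):
    """W(alpha) = sum_j alpha_j - 0.5 * sum_{j,i} y_j y_i alpha_j alpha_i S(x_j, x_i)."""
    n = len(alpha)
    quad = 0.0
    for j in range(n):
        for i in range(n):
            quad += labels[j] * labels[i] * alpha[j] * alpha[i] * gram[j][i]
    return sum(alpha) - 0.5 * quad

def is_feasible(alpha, labels, C, tol=1e-9):
    """Check the box constraint 0 <= alpha_j <= C and the equality sum_j y_j alpha_j = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(y * a for y, a in zip(labels, alpha))) <= tol
    return in_box and balanced

# Two 1-D samples x = +1 (label +1) and x = -1 (label -1) with a linear kernel.
gram = [[1.0, -1.0], [-1.0, 1.0]]
labels = [1, -1]
value = dual_objective([0.5, 0.5], labels, gram)
```

A solver would search over feasible `alpha` for the largest `value`; here the functions only evaluate a given candidate.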
$$\begin{aligned} f_\text{soft}(x)= & {} \text{sign}\left( \sum _{j=1}^{J}y_j (\alpha _j - \alpha _j^*) S(x_j,x) + b\right) \end{aligned}$$
(16)
$$\begin{aligned}= & {} \text{sign}\left( \sum _{j: y_j=1} (\alpha _j - \alpha _j^*) S(x_j,x) - \sum _{i: y_i=-1} (\alpha _i - \alpha _i^*) S(x_i,x) + b\right) \end{aligned}$$
(17)
$$\begin{aligned} f_\text{soft}(x)= & {} \text{sign}\left( h_+(x) - h_-(x) + b\right) \end{aligned}$$
(18)
The final soft-margin classifier \(f_\text{soft}(x)\) predicts the class label of an input sample \(x\) based on the sign of the decision function \(h_+(x) - h_-(x) + b\), where \(h_+(x)\) and \(h_-(x)\) are the contributions from positive and negative support vectors, respectively, to the decision function.
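The split of the decision function into contributions from the two classes can be illustrated with a short sketch. The names are hypothetical, `coeffs` stands for the multiplier terms attached to each support vector, and the dot product is used as the kernel \(S\) purely for the example.

```python
def dot(u, v):
    """Linear kernel used as S(x_j, x) in this example."""
    return sum(a * b for a, b in zip(u, v))

def soft_margin_predict(x, support_vectors, labels, coeffs, b, kernel=dot):
    """Return (predicted label, h_plus, h_minus) for one input sample x."""
    # Contribution of support vectors with label +1.
    h_plus = sum(a * kernel(sv, x)
                 for sv, y, a in zip(support_vectors, labels, coeffs) if y == 1)
    # Contribution of support vectors with label -1.
    h_minus = sum(a * kernel(sv, x)
                  for sv, y, a in zip(support_vectors, labels, coeffs) if y == -1)
    score = h_plus - h_minus + b
    return (1 if score >= 0 else -1), h_plus, h_minus

# Toy model: one support vector per class.
svs = [[1.0, 1.0], [-1.0, -1.0]]
sv_labels = [1, -1]
sv_coeffs = [0.5, 0.5]
label_a, hp_a, hm_a = soft_margin_predict([2.0, 0.0], svs, sv_labels, sv_coeffs, b=0.0)
label_b, _, _ = soft_margin_predict([-3.0, 0.0], svs, sv_labels, sv_coeffs, b=0.0)
```

Mapping a score of exactly zero to the positive class is a convention chosen for this sketch.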
In support vector machines (SVM), the parameter \(C\) plays a crucial role as the regularization parameter, influencing the balance between achieving a smooth decision boundary and accurately classifying training data points. A smaller value of \(C\) promotes a broader margin, allowing for a more generalizable model but potentially compromising on fitting the training data precisely. Conversely, a larger \(C\) value results in a narrower margin, potentially fitting the training data more closely but risking overfitting and reduced generalization to unseen data. Fine-tuning the \(C\) parameter is essential to find the right balance for SVM, ensuring effective classification while avoiding underfitting or overfitting issues in various applications, including our star spectra data analysis.
Linear kernel SVM (LSVM)
The Linear kernel is the most fundamental kernel function and is designed for data that are linearly separable, or nearly so, in the original feature space. It corresponds to the identity feature map \(F(x) = x\), so no transformation into a higher-dimensional space is involved.
The mathematical formula for the Linear kernel is given by:
$$\begin{aligned} F(x_j) \cdot F(x_k) = x_j \cdot x_k \end{aligned}$$
(19)
This equation represents the inner product of the feature vectors \(F(x_j)\) and \(F(x_k)\), which for the Linear kernel is simply the dot product of the original data points \(x_j\) and \(x_k\).
With an optional constant offset, the Linear kernel can be written as:
$$\begin{aligned} F(x_j, x_k)= x_j \cdot x_k + c \end{aligned}$$
(20)
Here, \(c\) represents a constant term. This formulation enables the calculation of the dot product between the input vectors \(x_j\) and \(x_k\), with the addition of the constant term \(c\).
While the Linear kernel in Support Vector Machines (SVM) offers simplicity and computational efficiency, it is crucial to delve into its inherent characteristics for effective utilization. The Linear kernel is particularly adept at handling linearly separable data by defining a decision boundary in the original feature space. Unlike its counterparts, such as the Polynomial or Radial Basis Function (RBF) kernels, the Linear kernel does not involve complex transformations into higher-dimensional spaces. This simplicity not only contributes to computational efficiency but also provides transparency in understanding the decision-making process. Additionally, the absence of kernel-specific parameters in the Linear SVM simplifies the tuning process, making it more straightforward for practitioners. Despite its simplicity, the Linear kernel remains a powerful tool, especially when dealing with large-scale datasets, where its efficiency and interpretability become advantageous in various applications.
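For illustration, the Linear kernel of Eq. (20) and the corresponding Gram matrix can be written in a few lines. The function names are hypothetical and plain Python lists stand in for spectra.

```python
def linear_kernel(x_j, x_k, c=0.0):
    """Linear kernel x_j . x_k + c, with c an optional constant offset."""
    return sum(a * b for a, b in zip(x_j, x_k)) + c

def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of samples."""
    return [[kernel(u, v) for v in samples] for u in samples]

k_plain = linear_kernel([1.0, 2.0], [3.0, 4.0])           # 1*3 + 2*4
k_offset = linear_kernel([1.0, 2.0], [3.0, 4.0], c=1.0)   # same, plus the offset
G = gram_matrix([[1.0, 0.0], [0.0, 2.0]], linear_kernel)
```

Because the kernel is just a dot product, the Gram matrix is symmetric and its diagonal holds the squared norms of the samples.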
Polynomial kernel SVM (PSVM)
The Polynomial kernel is a non-stationary kernel that can be applied to both hard-margin and soft-margin classification scenarios. It is particularly well-suited for problems where all the training data has been normalized, ensuring consistency across the dataset.
The mathematical representation of the Polynomial kernel is as follows:
$$\begin{aligned} \ F(x_j, x_k) = (\alpha x_j \cdot x_k + c)^d \ \end{aligned}$$
(21)
In this equation, \(F(x_j, x_k)\) is the kernel value, obtained by scaling the dot product of the input vectors \(x_j\) and \(x_k\) by \(\alpha\), adding the offset \(c\), and raising the result to the power of the polynomial degree \(d\). The parameters \(\alpha\), \(c\), and \(d\) are adjustable and play significant roles in shaping the behavior and performance of the Polynomial kernel. Note that \(\alpha\) here is a kernel scale factor, distinct from the Lagrange multipliers \(\alpha _j\) used earlier.
By adjusting these parameters, we can control the complexity and flexibility of the kernel function, allowing it to adapt to different types of data and classification problems. The parameter \(\alpha\) determines the influence of the dot product term, \(c\) represents a constant offset, and \(d\) determines the degree of the polynomial transformation. Fine-tuning these parameters is essential to achieve optimal performance and generalization in Polynomial kernel-based SVM models.
The flexibility of the Polynomial kernel makes it a valuable tool for handling data sets with complex relationships and nonlinear decision boundaries. By leveraging the adjustable parameters, researchers can effectively explore the tradeoff between model complexity and generalization, ensuring that the Polynomial kernel captures the underlying patterns in the data accurately and provides robust classification results.
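A direct transcription of Eq. (21) is sketched below to show how the three parameters enter. The function name, default values, and sample vectors are assumptions made for the example.

```python
def polynomial_kernel(x_j, x_k, alpha=1.0, c=0.0, d=2):
    """Polynomial kernel (alpha * x_j . x_k + c) ** d."""
    dot = sum(a * b for a, b in zip(x_j, x_k))
    return (alpha * dot + c) ** d

u, v = [1.0, 2.0], [3.0, 4.0]                             # dot product is 11
k_quadratic = polynomial_kernel(u, v, alpha=0.5, c=1.0, d=2)   # (5.5 + 1) ** 2
k_degree_one = polynomial_kernel(u, v, alpha=1.0, c=0.0, d=1)  # reduces to the dot product
```

With \(d = 1\), \(\alpha = 1\), and \(c = 0\) the kernel reduces to the plain dot product, which is one way to see the Linear kernel as a special case.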
Radial kernel SVM (RSVM)
In cases where prior knowledge about the data is lacking, the radial basis function (RBF) kernel is commonly employed to transform the data. The RBF kernel introduces two critical parameters, namely C and \(\gamma\), which require careful consideration. The C parameter, commonly referred to as the regularization parameter, is shared among all SVM kernels and influences their behavior. A lower value for C promotes a smoother decision surface, while a higher value aims to classify all training samples accurately.
The \(\gamma\) parameter, also known as the kernel coefficient, determines the influence of each training example on the decision boundary. It is tied to the \(\sigma\) parameter, the standard deviation in the RBF kernel, conventionally through \(\gamma = 1/(2\sigma ^2)\), so \(\sigma\) controls the width of the kernel and the smoothness of the decision boundary. Choosing appropriate values for C and \(\gamma\) (equivalently \(\sigma\)) is crucial, as they significantly impact the performance of the SVM model, and these parameters must be tuned carefully to achieve optimal results.
The mathematical expression for the RBF kernel is as follows:
$$\begin{aligned} F(x_j, x_k) = \frac{1}{\sigma \sqrt{2\pi }} \exp \left( -\frac{1}{2}\left( \frac{\Vert x_j - x_k\Vert }{\sigma }\right) ^2 \right) \end{aligned}$$
(22)
This equation gives the kernel value as the exponential of the negative half of the squared Euclidean distance between the input vectors \(x_j\) and \(x_k\), divided by the square of \(\sigma\). The term \(\frac{1}{\sigma \sqrt{2\pi }}\) serves as a normalization factor.
Choosing suitable values for the \(C\), \(\gamma\), and \(\sigma\) parameters is critical in achieving optimal SVM performance. Careful parameter tuning enables the RBF kernel to capture complex relationships and nonlinear patterns in the data, ultimately leading to improved classification results and better generalization.
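The RBF kernel can be sketched as below, assuming the usual Gaussian form with a negative exponent and the Euclidean distance between samples. The function name and the `normalized` flag are hypothetical; the flag only toggles the \(\frac{1}{\sigma \sqrt{2\pi }}\) factor, and \(\gamma = 1/(2\sigma ^2)\) is the conventional link between the two width parameters.

```python
import math

def rbf_kernel(x_j, x_k, sigma=1.0, normalized=False):
    """Gaussian RBF kernel exp(-||x_j - x_k||^2 / (2 sigma^2)), optionally normalized."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_j, x_k))
    gamma = 1.0 / (2.0 * sigma ** 2)   # conventional gamma-sigma relation
    value = math.exp(-gamma * sq_dist)
    if normalized:
        # Normalization factor 1 / (sigma * sqrt(2 pi)).
        value /= sigma * math.sqrt(2.0 * math.pi)
    return value

k_same = rbf_kernel([0.0, 0.0], [0.0, 0.0])               # identical samples
k_unit = rbf_kernel([1.0, 0.0], [0.0, 0.0])               # unit distance, sigma = 1
k_narrow = rbf_kernel([1.0, 0.0], [0.0, 0.0], sigma=0.5)  # narrower kernel
k_wide = rbf_kernel([1.0, 0.0], [0.0, 0.0], sigma=2.0)    # wider kernel
```

A smaller \(\sigma\) (larger \(\gamma\)) makes the kernel value fall off faster with distance, so each training sample influences a smaller neighborhood.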