The goal of this section is to assess the performance and interpretability of five widely used classical ML algorithms (support vector machine, random forest, logistic regression, decision tree, and k-nearest neighbour) and three quantum ML algorithms (quantum SVM, quantum kNN, and variational quantum classifier) under various levels of complexity, i.e., noise handling and resampled data. The steps of the methodology are illustrated in Fig. 3.

Key Python modules and libraries used include:
- Scikit-learn: For data preprocessing, feature selection, classical model training, and performance evaluation.
- Imbalanced-learn: Specifically, SMOTE and ADASYN were used to address class imbalance in the training data.
- PennyLane: For constructing and simulating quantum circuits and implementing the quantum ML models.
- SHAP and LIME: For model explainability and visualisation, aiding in the interpretation of feature contributions to model decisions.
- Matplotlib and Seaborn: For plotting ROC curves and explanation visualisations.
- NumPy and Pandas: For efficient numerical computation and data manipulation.
Each step is described in detail below.
Here is the pseudo-code for the methodology, for both the quantum and classical models:
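Below is a condensed sketch of that pipeline; the helper names (load_and_impute, apply_variant, record_metrics, and so on) are illustrative placeholders rather than actual library calls.

```python
# Condensed pipeline sketch; helper names are placeholders, not library calls.
for dataset in datasets:                          # e.g. Wine Quality, Breast Cancer
    X, y = load_and_impute(dataset)               # median/mode imputation
    X = standardize(X)                            # z-score normalisation
    for variant in ("baseline", "smote", "adasyn", "noise", "select_k_best"):
        X_v, y_v = apply_variant(X, y, variant)   # resampling / noise / SelectKBest
        X_tr, X_te, y_tr, y_te = split(X_v, y_v)
        for model in classical_models + quantum_models:
            model.fit(X_tr, y_tr)                 # classical fit or variational training
            y_pred = model.predict(X_te)
            record_metrics(model, variant, y_te, y_pred)  # accuracy, F1, ROC
    explain(best_model, X_te)                     # SHAP / LIME / kernel distribution
```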

Data preparation, noise handling, feature selection, and model evaluation
Datasets
Different benchmark datasets were selected to assess the performance of the QML and ML classifiers. These datasets were retrieved from the UCI and Kaggle repositories and are tabulated in Table 2, which lists, for each dataset, the number of instances, features, and classes, along with any null values present.
Data preprocessing
Data preprocessing ensures the datasets are ready for model training and testing. The steps followed are:
Missing values
Missing values (if any) are imputed using the median (for numerical features) and mode (for categorical features).
Feature scaling
Standardization (z-score normalisation) is applied to all continuous variables to ensure uniformity.
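A minimal sketch of these two preprocessing steps, assuming the data sit in a pandas DataFrame (the toy columns here are hypothetical stand-ins for any of the benchmark datasets):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for any of the benchmark datasets (hypothetical values)
df = pd.DataFrame({"age": [25, None, 40, 31], "sex": ["M", "F", None, "F"]})

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

df[num_cols] = df[num_cols].fillna(df[num_cols].median())    # median for numeric
for c in cat_cols:
    df[c] = df[c].fillna(df[c].mode()[0])                    # mode for categorical

df[num_cols] = StandardScaler().fit_transform(df[num_cols])  # z-score normalisation
```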
Class imbalance simulation
SMOTE (Synthetic Minority Over-sampling Technique): Used for over-sampling the minority class in imbalanced datasets like Wine Quality, Breast Cancer, and Credit Card Fraud.
ADASYN (adaptive synthetic sampling): Applied over-sampling to the minority class in specific experiments, especially for highly imbalanced datasets. It generates synthetic samples adaptively, focusing more on minority class instances that are harder to learn, thereby improving classifiers’ performance.
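A brief example of both resamplers from imbalanced-learn on synthetic data (the 90/10 class ratio is an illustrative assumption, not one of the paper's datasets):

```python
from collections import Counter
from imblearn.over_sampling import ADASYN, SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
print(Counter(y))                        # imbalanced: roughly 450 vs 50

X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
X_ad, y_ad = ADASYN(random_state=0).fit_resample(X, y)
print(Counter(y_sm), Counter(y_ad))      # roughly balanced after resampling
```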
Noise injection
Gaussian noise is added to the features to simulate sensor or data measurement errors. The noise level is controlled (e.g., adding 1% noise by randomly sampling from a Gaussian distribution).
$$X_{noisy} = X + N\left( {0,\sigma^{2} } \right)$$
(1)
where \(N\left( {0,\sigma^{2} } \right)\) represents Gaussian noise with mean zero and standard deviation \(\sigma\).
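A small sketch of the injection step. Scaling \(\sigma\) to each feature's standard deviation, so that noise_level=0.01 corresponds to roughly 1% noise, is our assumption about the implementation:

```python
import numpy as np

def add_gaussian_noise(X, noise_level=0.01, seed=0):
    """Apply Eq. (1): X_noisy = X + N(0, sigma^2). Scaling sigma to each
    feature's standard deviation (so 0.01 ~ 1% noise) is an assumption."""
    rng = np.random.default_rng(seed)
    sigma = noise_level * X.std(axis=0)
    return X + rng.normal(0.0, sigma, size=X.shape)
```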
Feature selection
SelectKBest: A univariate feature selection method based on ANOVA F-test scores is used to reduce the feature set and evaluate model performance under reduced dimensionality.
Dimensionality Reduction: The top k features (using SelectKBest) are selected to simulate a scenario with reduced information.
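For illustration, selecting the top k = 10 features on the Breast Cancer dataset (k = 10 is an assumed value):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=10)  # ANOVA F-test, keep top 10
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (569, 10): reduced dimensionality for the experiments
```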
Supervised and quantum machine learning classifiers
Supervised learning involves learning a function that maps inputs to outputs from provided examples of input-output pairs. The procedure derives a mathematical function from training data labelled with certain categories or classes and comprising a collection of training instances. The input dataset is partitioned into training and testing sets, as illustrated in Fig. 4. The training dataset includes a target variable that is to be forecast or categorised. Algorithms extract patterns from the training dataset and employ them to make predictions or categorise the test dataset18.

Quantum ML has recently emerged as a promising extension of classical ML, leveraging the principles of quantum computing to enhance the capacity of learning models. Quantum models follow the same paradigm of learning from labelled data, but they encode classical data into quantum states using feature maps and process them with quantum circuits. Figure 4 depicts the machine learning pipeline.
Decision tree (DT)
Constructing a decision tree classifier involves a set of internal and leaf nodes: the internal nodes represent decision criteria, while predictions are represented by the leaf nodes. Let \(N\) be the number of physiological characteristics and \(M\) the number of diagnosis predictions. Let \(X\) be a set of physiological characteristic vectors, denoted as \(\left\{ {X_{1} ,X_{2} , \ldots ,X_{N} } \right\}\). Let \(W\) be the set of \(N\) thresholds, denoted as \(\left\{ {W_{1} ,W_{2} , \ldots ,W_{N} } \right\}\), and let \(C\) be the set of diagnosis predictions, denoted as \(\left\{ {C_{1} ,C_{2} , \ldots ,C_{M} } \right\}\). Classification commences at the root node and proceeds towards a leaf node; consequently, the classification outcome depends on the choices made along the path from the root to the leaf. A decision tree classifier can be understood as a collection of IF-THEN statements obtained by systematically visiting each decision route; hence a DT classifier can be transformed into a rule base consisting of the IF-THEN rules derived from it19. Each node is labelled with the attribute it tests, and its branches are labelled with the corresponding values, as illustrated in Fig. 5.

Fig. 5: Instance of a decision tree.
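The IF-THEN view can be made concrete with scikit-learn's export_text, which prints a fitted tree as nested rules (a minimal sketch on the Iris data, not the paper's exact configuration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
dt = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(dt, feature_names=data.feature_names))  # the tree as IF-THEN rules
```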
Support vector machine
SVM, a supervised classification method, begins with a pre-existing training set. During training, an SVM learns the correlation between each data point and its label in the given training dataset8. It is specifically designed for binary classification of new testing vectors and addresses a type of problem with more equations than unknowns, known as an over-determined system of equations, as represented in Fig. 6. The algorithm produces a hyperplane defined by the equation \(\vec{w} \cdot \vec{x} + b = 0\). This hyperplane ensures that a training point \(\vec{x}_{n}\) in the positive class satisfies \(\vec{w} \cdot \vec{x}_{n} + b \ge 1\), and a training point \(\vec{x}_{n}\) in the negative class satisfies \(\vec{w} \cdot \vec{x}_{n} + b \le - 1\). The algorithm strives to maximise the margin between the two classes during training. This is logical, since a clear separation between the two classes yields a more accurate classification result for new data samples such as \(\vec{x}_{o}\). Mathematically, SVM finds the hyperplane that maximises the distance between two parallel hyperplanes, subject to the restriction that the product of the label \(y_{i}\) and the linear combination of the weights \(\vec{w}\) and the input vector \(\vec{x}_{i}\) is at least one, i.e. \(y_{i} \left( {\vec{w} \cdot \vec{x}_{i} + b} \right) \ge 1\)20. Figure 6 below represents the categorisation of classes using a hyperplane.

Fig. 6: SVM classification hyperplane.
K–Nearest Neighbor (KNN)
KNN is an enduring approach to classification. Its decision-making concept is based on a straightforward principle: the sample to be evaluated is assigned to the same category as its closest matching sample. The outcome of the nearest neighbour rule is definitively established for all instances to be evaluated, assuming the distance metric and training set remain constant. In set E, for every sample instance, if y is the closest neighbouring instance to x, then the class of x is set to the class of y by the nearest neighbour rule. Assume X is a sample from an unknown category. The decision process is given in Eq. (2):
$$g_{j} \left( X \right) = \mathop {\min }\limits_{i} g_{i} \left( X \right),\quad i = 1,2, \ldots ,c,$$
(2)
where \(c\) is the number of classes; the decision outcome is then \(X \in W_{j}\).
In this context, the nearest neighbour rule is analysed with a focus on two key aspects: convergence and generalisation error. The nearest neighbour of a given point \(x\) varies across training sets drawn with different samples. Given that the classification outcome is contingent on the category label of the closest neighbouring data point \(x^{\prime}\), the conditional error rate \(P\left( {e|x,x^{\prime } } \right)\) is determined by both \(x\) and \(x^{\prime}\)21.
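In practice the rule is applied with k > 1 neighbours; a minimal scikit-learn sketch (k = 5 and the dataset choice are assumed values):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)   # Euclidean metric by default
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))            # held-out accuracy
```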
Random forest (RF)
This model is an ensemble learning technique. The algorithm constructs a series of decision trees during training; for classification tasks the predicted class is the mode (the most frequent class) across the trees, while for regression tasks the mean prediction is used. The node splitting criterion was configured to require a minimum of two samples, and each leaf node must contain at least one sample. The procedural steps of the Random Forest algorithm are:
- Generate a bootstrap sample from the given data.
- For each bootstrap sample, construct a tree with one modification: at each node, randomly select a subset of the predictors and determine the optimal split among those variables.
- Predict new data by aggregating the forecasts of the \(n_{t}\) trees (majority vote for classification, averaging for regression).
Because each tree is built on a random selection of observations, with or without replacement, approximately 36.8% of the observations are not used by any one tree. These observations are considered "out of the bag" (OOB) for that specific tree. The predictive accuracy of a random forest can be assessed using the out-of-bag (OOB) data:
$$\text{OOB-MSE} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i}^{OOB} } \right)^{2}$$
(3)
where \(\hat{y}_{i}^{OOB}\) denotes the average forecast for the \(i\)-th observation over all trees for which that observation is out-of-bag (OOB).
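The OOB estimate is available directly in scikit-learn; a sketch using the node-splitting settings stated above (the dataset choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=100,
    min_samples_split=2,   # nodes split only with at least two samples
    min_samples_leaf=1,    # every leaf keeps at least one sample
    oob_score=True,        # score on the ~36.8% out-of-bag observations
    random_state=0,
).fit(X, y)
print(rf.oob_score_)       # OOB estimate of predictive accuracy
```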
Logistic regression (LR)
Regression is a method employed to analyse the association between variables. It is commonly employed for forecasting and prediction and shares significant similarities with machine learning. Crucially, this method models the relationship between a dependent variable and a predetermined set of other variables in a dataset. In its simple linear form the model is \(y = \beta_{0} + \beta_{1} x + \varepsilon\); for binary classification, logistic regression passes this linear combination through the logistic (sigmoid) function to model the probability of class membership. Simple regression isolates the impact of each independent variable on the dependent variable.
Quantum support vector machine
A Quantum Support Vector Machine (QSVM) is a quantum-enhanced version of the classical Support Vector Machine (SVM) algorithm, designed to perform classification tasks. QSVM leverages quantum computing, particularly quantum kernel methods, to potentially outperform classical algorithms in certain scenarios, especially when dealing with high-dimensional data or complex patterns that are difficult to capture classically.
Quantum SVM replaces the classical kernel function with a quantum kernel, which is computed by a quantum computer. This quantum kernel is based on the inner product of quantum states, which allows encoding classical data into quantum feature space.
Quantum kernel estimation
The key idea is to map a classical data point \(x\) to a quantum state \(\left| {\phi \left( x \right)} \right\rangle\) and then compute a kernel matrix \(K\left( {x,x^{\prime } } \right) = \left| {\left\langle {\phi \left( x \right)|\phi \left( {x^{\prime } } \right)} \right\rangle } \right|^{2}\). This is usually done by:
- Encoding data into quantum circuits using a feature map \(U_{\phi } \left( x \right)\).
- Measuring the fidelity (overlap) between the resulting quantum states.
This allows nonlinear patterns to be captured efficiently, potentially with exponential speedups in certain settings.
Workflow of QSVM
Feature mapping: Encode classical data \(x\) into a quantum state using a parameterised unitary \(U_{\phi } \left( x \right)\).
Kernel evaluation: Compute \(K\left( {x,x^{\prime } } \right) = \left| {\left\langle {\phi \left( x \right)|\phi \left( {x^{\prime } } \right)} \right\rangle } \right|^{2}\).
Classical SVM solver: Use a classical algorithm (e.g., LIBSVM) to find the separating hyperplane using the quantum kernel matrix.
Prediction: New data points are mapped and evaluated using the quantum kernel.
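A minimal PennyLane sketch of this workflow, assuming angle encoding and the adjoint-overlap trick for the fidelity kernel (the two-qubit device and toy data are assumptions, not the paper's exact circuit):

```python
import numpy as np
import pennylane as qml
from sklearn.svm import SVC

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def overlap_circuit(x1, x2):
    # U_phi(x1) followed by U_phi(x2)^dagger: the probability of |00>
    # equals the fidelity |<phi(x1)|phi(x2)>|^2.
    qml.AngleEmbedding(x1, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x1, x2):
    return overlap_circuit(x1, x2)[0]

# Toy data (assumption); build the Gram matrix and hand it to a classical SVM
X = np.random.default_rng(0).uniform(0, np.pi, size=(8, n_qubits))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
K = np.array([[quantum_kernel(a, b) for b in X] for a in X])
clf = SVC(kernel="precomputed").fit(K, y)
# Prediction evaluates kernels between test and training points the same way.
```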
Quantum k nearest neighbor
The Quantum kNN algorithm adapts the classical k-nearest neighbour classifier to the quantum computing paradigm by representing classical vectors as quantum states and using quantum subroutines to estimate inter-sample similarities in superposition. QkNN aims to accelerate distance or similarity estimation and nearest-neighbour search under theoretical models that provide efficient quantum access to data22.
QkNN implementations usually estimate \(s\left( {x_{q} ,x_{i} } \right)\) using one of the following (a SWAP-test sketch follows the list):
- Fidelity / inner product: \(\left| {\left\langle {x_{q} |x_{i} } \right\rangle } \right|^{2}\), estimated via a SWAP test or Hadamard test.
- Euclidean distance via the inner-product identity \(\parallel x_{q} - x_{i} \parallel^{2} = \parallel x_{q} \parallel^{2} + \parallel x_{i} \parallel^{2} - 2\left\langle {x_{q} ,x_{i} } \right\rangle\), where the inner product is obtained from amplitude overlaps.
- Hamming or other discrete distances when features are binarised; specialised circuits exist for parallel Hamming distance estimation.
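As referenced above, a minimal SWAP-test sketch in PennyLane for estimating the fidelity \(\left| {\left\langle {a|b} \right\rangle } \right|^{2}\) between two single-qubit states (the amplitude-encoded toy vectors are assumptions):

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=3)  # wire 0 is the ancilla

@qml.qnode(dev)
def swap_test(a, b):
    # Amplitude-encode the two unit vectors to compare
    # (qml.StatePrep was called QubitStateVector in older PennyLane releases)
    qml.StatePrep(a, wires=1)
    qml.StatePrep(b, wires=2)
    # Standard SWAP-test interferometry on the ancilla
    qml.Hadamard(wires=0)
    qml.CSWAP(wires=[0, 1, 2])
    qml.Hadamard(wires=0)
    # <Z> on the ancilla equals 2*P(0) - 1 = |<a|b>|^2
    return qml.expval(qml.PauliZ(0))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0]) / np.sqrt(2)
print(swap_test(a, b))  # ~0.5, the fidelity of these two states
```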
Variational quantum classifier
The variational quantum classifier (VQC) is a hybrid quantum-classical supervised learning model that leverages parameterised quantum circuits (PQCs) optimised via classical algorithms. It belongs to the class of variational quantum algorithms (VQAs), specifically designed for noisy intermediate-scale quantum (NISQ) devices. VQCs combine quantum feature encoding and trainable quantum layers to learn complex, nonlinear decision boundaries, analogous to a neural network but implemented on qubits. Their advantage lies in exploiting quantum entanglement and superposition to represent data in exponentially large Hilbert spaces, offering potential expressivity beyond classical models23. Given a dataset D,
$$D = \{ \left( {x_{i} ,y_{i} } \right)\}_{i = 1}^{N} ,x_{i} \in {\mathbb{R}}^{d} ,y_{i} \in \left\{ { – 1, + 1} \right\},$$
(4)
the goal is to find a variational quantum circuit \(U\left( \theta \right)\) parameterised by a vector of tunable parameters \(\theta\) such that the measurement expectation value corresponds to the predicted class label.
This process includes the following steps:
- Encoding: map classical data \(x_{i}\) into a quantum state \(\left| {x_{i} } \right\rangle = U_{\phi } \left( {x_{i} } \right)\left| 0 \right\rangle^{ \otimes n}\), where \(U_{\phi }\) is a data-dependent unitary transformation.
- Parameterised evolution: apply a trainable unitary \(U\left( \theta \right)\) to form \(\left| {\psi \left( {x_{i} ,\theta } \right)} \right\rangle = U\left( \theta \right)U_{\phi } \left( {x_{i} } \right)\left| 0 \right\rangle^{ \otimes n}\).
- Measurement: measure an observable (e.g., Pauli-Z) to obtain the expectation value:
$$f_{\theta } \left( {x_{i} } \right) = \left\langle {\psi \left( {x_{i} ,\theta } \right)} \right|M\left| {\psi \left( {x_{i} ,\theta } \right)} \right\rangle ,$$
(5)
where \(M\) is the measurement operator. The sign of \(f_{\theta } \left( {x_{i} } \right)\) determines the class label.
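A compact PennyLane sketch of this loop, with angle encoding standing in for \(U_{\phi}\), StronglyEntanglingLayers as \(U\left( \theta \right)\), and a square loss; the layer count, optimiser, and toy data are assumptions:

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy

n_qubits, n_layers = 2, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(x, weights):
    qml.AngleEmbedding(x, wires=range(n_qubits))                  # encoding U_phi(x)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # trainable U(theta)
    return qml.expval(qml.PauliZ(0))                              # f_theta(x) in [-1, 1]

def cost(weights, X, Y):
    # Square loss between expectation values and labels in {-1, +1}
    return sum((vqc(x, weights) - y) ** 2 for x, y in zip(X, Y)) / len(Y)

shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = 0.1 * np.random.randn(*shape, requires_grad=True)

X = np.array([[0.1, 0.9], [0.9, 0.1]])   # toy data (assumption)
Y = np.array([-1.0, 1.0])
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(20):
    weights = opt.step(lambda w: cost(w, X, Y), weights)

print([np.sign(vqc(x, weights)) for x in X])  # predicted labels
```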
Evaluation parameters
Within the domain of data mining, several evaluation measures, including accuracy, F-measure, precision, and recall, are commonly employed to assess a system's performance. In this paper, we consider these measures to evaluate performance. TP denotes a correctly identified positive (true positive), FP a false positive, TN a true negative, and FN a false negative.
The accuracy rate is the ratio of correctly classified instances to the total cases tried, as stated in Eq. (6). The equation can be formulated in the following manner:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
(6)
The precision of a model is defined as the proportion of truly positive instances among all positive forecasts generated by the model.
$$\text{Precision} = \frac{TP}{TP + FP}$$
(7)
Recall is the percentage of positive scenarios correctly predicted out of the total number of positive scenarios, including the false negatives, and is occasionally called the true positive rate.
$$\text{Recall} = \frac{TP}{TP + FN}$$
(8)
The F1 score assesses the relationship between precision and recall by taking their harmonic mean. It is a valuable metric for achieving a compromise between high precision and strong recall, since it effectively penalises extreme values of either component. In the equation, \(Pr\) stands for precision and \(Re\) for recall:
$$\text{F1 score} = \frac{2 \times Pr \times Re}{Pr + Re}$$
(9)
ROC curve: The receiver operating characteristic (ROC) curve is a standard tool for evaluating binary classifiers, especially on imbalanced medical datasets. It illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity).
Confusion matrix: a tabular representation showing the counts of true positives, true negatives, false positives, and false negatives.
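All of these metrics map directly onto scikit-learn calls; a sketch on toy labels (the arrays are illustrative):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # toy ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # toy model predictions
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1])  # positive-class scores

print(accuracy_score(y_true, y_pred))          # Eq. (6)
print(precision_score(y_true, y_pred))         # Eq. (7)
print(recall_score(y_true, y_pred))            # Eq. (8)
print(f1_score(y_true, y_pred))                # Eq. (9)
print(confusion_matrix(y_true, y_pred))        # [[TN, FP], [FN, TP]]
print(roc_auc_score(y_true, y_prob))           # area under the ROC curve
```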
Explainability: SHAP, LIME and Quantum kernel distribution
To enhance model transparency, we use two popular Explainable AI (XAI) techniques: SHAP and LIME. These techniques provide insights into which features are driving the model’s predictions.
SHAP
We use SHAP (SHapley Additive exPlanations) to evaluate feature importance globally. SHAP values indicate the impact of each feature on the prediction for each instance.
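A minimal SHAP sketch for a tree-based model (the random forest and dataset are illustrative; newer SHAP versions return a single 3-D array instead of one array per class, which the sketch handles):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = shap.TreeExplainer(rf)        # exact SHAP values for tree ensembles
sv = explainer.shap_values(X_test)
# Older SHAP returns one array per class; newer versions return a 3-D array
sv_pos = sv[1] if isinstance(sv, list) else sv[:, :, 1]
shap.summary_plot(sv_pos, X_test)         # global feature-importance view
```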

LIME
For local interpretability, we use LIME to explain individual predictions by approximating the model with simpler interpretable models around each prediction.
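A corresponding LIME sketch for one test instance (the model, dataset, and num_features are illustrative choices):

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["malignant", "benign"],
    mode="classification",
)
exp = explainer.explain_instance(X_test.values[0], rf.predict_proba, num_features=5)
print(exp.as_list())  # top local feature weights for this single prediction
```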

Quantum kernel distribution
To analyse quantum-enhanced models like QSVM and QKNN, we examine the distribution of quantum kernel values. The quantum kernel distribution provides insights into how well different classes are separable in the high-dimensional feature space, highlighting regions where the quantum embedding improves classification performance.
Hardware availability and simulation justification
All quantum models in this study are implemented and executed using the PennyLane framework on Google Colab, leveraging its high-performance cloud runtime for reproducible quantum simulation. This design choice was guided by the limited public availability of current NISQ devices, whose execution remains constrained by qubit decoherence and noise and which are either not publicly accessible or available only at very high cost. Running on Google Colab ensures platform independence and reproducibility across multiple runs. The implemented circuits are intentionally designed with shallow depth and modest qubit requirements. Consequently, the reported results represent a reproducible and hardware-ready baseline for future near-term deployment.
