This section primarily covers the key methods employed in this research, the workings of the proposed model, dataset details, data preprocessing, and other relevant aspects.
Proposed advanced multiclass heart disease classification model
A Modified Multiclass Attention Mechanism (M2AM) is used in conjunction with a Deep Bidirectional Long Short-Term Memory (BiLSTM) network for heart disease classification from ECG signals. M2AM aims to enhance the model’s accuracy by learning class-specific features that are discriminative for diagnosing various heart diseases, a key challenge in multiclass classification. The procedure begins with pre-processing of the raw ECG signal using the Improved Adaptive Band Pass Filter (IABPF) to reduce unwanted noise and improve signal quality. In this filtering process, unnecessary frequencies are eliminated and only the elements of the ECG signal essential for accurate analysis are kept19. Wavelet transformations are then used to extract temporal-spectral information from the ECG. This approach generates a rich set of features that capture the intricate and dynamic characteristics of ECG morphology. The BiLSTM layer processes these input sequences in forward and backward passes to learn temporal dependencies, which is essential for understanding the sequential nature of the ECG signal.
M2AM further modifies the model to adjust attention weights for each heart disease class dynamically. This enables the model to focus on the most relevant features for each class, especially when some classes share common or overlapping features (e.g., arrhythmia and coronary artery disease). By focusing on class-specific features, M2AM enables the model to distinguish diseases more effectively while mitigating the risk of misclassification. Later layers are fully connected layers that refine the features, and an output SoftMax layer provides the class probabilities for heart disease classification. This architecture significantly enhances classification accuracy compared to its predecessors, making the detector insensitive to noise and allowing the model to perform better on complex real-world ECG data21.
The Modified Multiclass Attention Mechanism is a significant enhancement of this architecture, as it computes attention weights on a class-by-class basis. This adaptation enables the model to favor the features that are most important for distinguishing between different heart diseases. Whereas a conventional attention mechanism is class-agnostic, M2AM calculates class-wise attention weights. Such a class-specific learning scheme helps the model disentangle features shared across diseases, leading to higher classification accuracy and robustness. The processed data, after passing through the BiLSTM and M2AM, is fed into fully connected layers, followed by a SoftMax output layer that generates probabilities for the different heart disease classes. The class-specific attention of M2AM reduces misclassification and noise sensitivity, thereby further improving diagnostic performance.
Novelty of the M2AM
The M2AM is an adapted multiclass attention mechanism that has the following advantages compared to basic attention mechanisms:
a) Class-Specific Attention

i. Classical attention mechanisms (such as those used with BiLSTM and Transformer models) compute a single global attention weight per feature, shared across all classes. The drawback is that every class receives the same weight for a feature, which can lead to suboptimal classification in tasks with a skewed class distribution.

ii. M2AM overcomes this weakness by learning class-specific attention weights. The model computes a distinct attention score for every class \({y}_{j}\) with respect to each feature \({x}_{i}\). In this way the model can learn which parts of the input features are most relevant to each heart disease, increasing the precision of the classification.

b) Dynamic Attention Weighting

i. In classical attention mechanisms, the attention weight for each feature is calculated from a fixed collection of scores. This can cause problems when a feature has differing importance for several classes.

ii. M2AM adaptively determines the weight of each feature based on its relevance to the current class. This flexibility enables the model to learn that a feature may be significant for some classes but irrelevant to others.
Mathematical formulation

In a conventional attention mechanism, a single global attention weight is computed for each feature, as in Eq. 1:

$$Attention \;Weight_{i} = \frac{{\exp \left( {score\left( {x_{i} } \right)} \right)}}{{ \mathop \sum \nolimits_{j} \exp \left( {score\left( {x_{j} } \right)} \right)}}$$

(1)

where \(score\left({x}_{i}\right)\) is a score function applied to the ith feature \({x}_{i}\). In this way a single global score is computed for each feature across all classes.

In M2AM, attention is instead calculated per class, so attention weights are computed in a class-sensitive manner. M2AM can be mathematically expressed as in Eq. 2:
$$Attention \;Weight_{i,j} = \frac{{\exp \left( {score\left( {x_{i} ,y_{j} } \right)} \right)}}{{ \mathop \sum \nolimits_{k} \exp \left( {score\left( {x_{i} , y_{k} } \right)} \right)}}$$
(2)
where \({x}_{i}\) is the feature vector of the ith feature, \({y}_{j}\) is the class to be assigned (coronary artery disease, arrhythmia, etc.), and \(score\left({x}_{i},{y}_{j}\right)\) is the relevance score of feature \({x}_{i}\) for class \({y}_{j}\). The denominator normalizes the attention scores over all classes \({y}_{k}\).
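To make Eq. 2 concrete, the class-sensitive normalization can be sketched in a few lines of NumPy. The bilinear score \(score({x}_{i},{y}_{k})={x}_{i}\cdot {W}_{k}\), the random weights, and all dimensions below are illustrative assumptions; the text does not fix a particular score function at this point.

```python
import numpy as np

# Sketch of the class-specific attention weights of Eq. (2).
# Assumption: score(x_i, y_k) = x_i . W[k] with a random stand-in W.
rng = np.random.default_rng(0)
n_features, n_classes, d = 6, 4, 8

X = rng.normal(size=(n_features, d))   # feature vectors x_i
W = rng.normal(size=(n_classes, d))    # hypothetical per-class score weights

scores = X @ W.T                       # score(x_i, y_k), shape (6, 4)
# Softmax over classes k implements the Eq. (2) normalization.
exp_s = np.exp(scores - scores.max(axis=1, keepdims=True))
attn = exp_s / exp_s.sum(axis=1, keepdims=True)
# Each feature now carries one attention distribution over the classes.
```

Each row of `attn` sums to one, i.e., one attention distribution over classes per feature, which is exactly what lets a feature matter for one disease class but not another.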
By calculating class-specific attention weights, M2AM makes the model pay attention to the features that are informative for each individual disease type, which leads to a noticeable improvement in classification performance. A brief description of its architecture and operations is presented in Fig. 1. The complete working of the proposed model is as follows.

Fig. 1 Architecture of proposed hybrid model.
IABPF methods for data preprocessing
Before ECG signals are fed into the classification model, they must be preprocessed appropriately to enhance their accuracy and reliability. The Improved Adaptive Band Pass Filter (IABPF) is crucial in this phase, as it improves the quality and integrity of the signals in numerous essential ways23.
- Removal of Noise: Noise such as baseline drift, power line interference, and muscle artifacts can mask the underlying cardiac activity in ECG signals. The IABPF efficiently filters out these unwanted disturbances while preserving the main characteristics of the ECG waveform. By dynamically modifying the cutoff frequencies in response to the distinct features of the incoming signal, the IABPF ensures that only relevant frequency components are retained24. The filtering process is represented by Eq. (3), where \({f}_{output}\left(t\right)\) is the filtered output, \({fI}_{response}\) is the impulse response, and \({In}_{signal}(T)\) is the input signal.
$$f_{output} \left( t \right) = \mathop \smallint \limits_{ - \infty }^{\infty } \left[ {fI_{response} \left( {t - T} \right)In_{signal} \left( T \right)} \right]dT$$
(3)
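A minimal NumPy sketch of the Eq. (3) convolution is given below. Since the IABPF’s adaptive cutoff rule is not spelled out in closed form here, a fixed windowed-sinc band-pass stands in for it; the 0.5–40 Hz pass band and tap count are illustrative choices (the stated IABPF band is 0.5–100 Hz with adaptive adjustment), selected so that a 60 Hz power-line tone falls in the stopband.

```python
import numpy as np

# Discrete version of Eq. (3): filtering as convolution with an FIR
# impulse response. The band edges and tap count are illustrative only.
fs = 360.0                                   # MIT-BIH sampling rate (Hz)

def sinc_lowpass(fc, taps):
    """Hamming-windowed sinc low-pass impulse response, unity DC gain."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * fc / fs * n) * np.hamming(taps)
    return h / h.sum()

def bandpass(signal, f_lo=0.5, f_hi=40.0, taps=201):
    lp = sinc_lowpass(f_hi, taps)            # remove content above f_hi
    hp = -sinc_lowpass(f_lo, taps)           # spectral inversion -> high-pass
    hp[(taps - 1) // 2] += 1.0
    h = np.convolve(lp, hp)                  # band-pass impulse response
    return np.convolve(signal, h, mode="same")   # Eq. (3), discretized

t = np.arange(0, 2, 1 / fs)
ecg_like = np.sin(2 * np.pi * 10 * t)                # in-band component
noisy = ecg_like + 0.5 * np.sin(2 * np.pi * 60 * t)  # power-line noise
clean = bandpass(noisy)                              # 60 Hz tone suppressed
```

After filtering, the 10 Hz in-band component survives while the 60 Hz interference is strongly attenuated, mirroring the noise-removal role described above.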
- Improvement of Signal Quality: By eliminating noise more effectively, the IABPF enhances the signal-to-noise ratio (SNR) of the ECG data. A high SNR is crucial for accurate feature extraction and significantly impacts the precision of the classification model. Enhanced signal quality facilitates the identification of features such as the P, Q, R, S, and T waves, thereby assisting physicians in diagnosing cardiac issues, accurately monitoring cardiac rhythm, and performing precise and timely interventions23,25.
- Adaptive Filtering: The IABPF’s adaptive characteristics enable real-time adjustment of its filter coefficients, ensuring optimal performance across a diverse range of ECG recordings. This adaptability facilitates effective preprocessing irrespective of the noise attributes of the incoming data. The adaptive filter is characterized by Eq. 4, where \({d}_{coefficient}\) is the dynamic coefficient and T is the sampling interval.
$$fI_{response} \left( t \right) = \mathop \sum \limits_{n = 0}^{N} d_{coefficient} \times \delta \left( {t - nT} \right)$$
(4)
This dynamic adjustment enhances the filter’s efficacy in accommodating the diverse noise profiles of different ECG recordings.
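The dynamically adjusted coefficients \({d}_{coefficient}\) of Eq. 4 can be illustrated with the classic least-mean-squares (LMS) update, a standard adaptive-filtering rule. The text does not state the IABPF’s exact adaptation law, so LMS, the reference signal, and all sizes below are assumptions for demonstration.

```python
import numpy as np

# LMS adaptation of FIR coefficients (an illustrative stand-in for the
# IABPF's real-time coefficient adjustment described above).
rng = np.random.default_rng(1)
N = 8                                     # number of taps
d = np.zeros(N)                           # d_coefficient, adapted online
mu = 0.05                                 # LMS step size

x = rng.normal(size=2000)                 # input samples (white noise)
# Hypothetical reference: the input passed through a fixed unknown FIR.
true_h = np.array([0.4, -0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])
desired = np.convolve(x, true_h)[: len(x)]

for n in range(N, len(x)):
    window = x[n - N + 1 : n + 1][::-1]   # most recent N samples
    y = d @ window                        # filter output at time n
    e = desired[n] - y                    # error against the reference
    d += mu * e * window                  # LMS coefficient update
# After adaptation, d approximates the unknown response true_h.
```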
Wavelet transformations for feature extraction
Feature extraction is essential for analyzing ECG signals. In this step, raw data is converted into a format that facilitates analysis and classification. Wavelet transformations are particularly helpful for non-stationary signals such as the ECG because they allow the signal to be examined in both the time and frequency domains20,21,22,23,24,25,26. A wavelet transformation decomposes a signal into its essential frequency components while retaining information about time. This dual capability facilitates the identification of rapidly changing features in the ECG, which is necessary for a comprehensive analysis19,21. The continuous wavelet transform (CWT) of a signal is given by Eq. (5).
$$CWT\left( {x,y} \right) = \frac{1}{\sqrt x }\mathop \int \limits_{ - \infty }^{\infty } s\left( t \right)\psi \left( {\frac{t - y}{x}} \right)dt$$
(5)
where:

- \(x\) is the scale parameter, which controls the width of the wavelet;

- \(y\) is the translation parameter, which shifts the wavelet in time;

- \(s\left(t\right)\) is the input signal;

- \(\psi \left( {\frac{t - y}{x}} \right)\) is the scaled and translated wavelet function (e.g., a Morlet wavelet).
Selecting the primary wavelet
The initial step in applying wavelet transformations is selecting an appropriate primary (mother) wavelet that serves as the basis function. The Morlet wavelet and the Daubechies wavelet are widely used for ECG analysis; their differing properties make them suitable for examining different ECG signal components. The Morlet wavelet, which combines a complex exponential with a Gaussian envelope, is given by Eq. 6, where \({f}_{o}\) is the central frequency and α controls the wavelet width27.
$$\psi \left( t \right) = \frac{1}{{\sqrt {2\pi } }}e^{{2\pi if_{o} t}} e^{{ - \frac{{t^{2} }}{{2\alpha^{2} }}}}$$
(6)
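A direct, unoptimized discretization of Eqs. (5)–(6) shows how the scale parameter selects frequency; the sampling rate, scales, central frequency \({f}_{o}\), and width α below are illustrative choices.

```python
import numpy as np

fs = 360.0
t = np.arange(-1, 1, 1 / fs)

def morlet(u, f_o=5.0, alpha=0.25):
    """Complex Morlet wavelet of Eq. (6)."""
    return (1 / np.sqrt(2 * np.pi)) * np.exp(2j * np.pi * f_o * u) \
        * np.exp(-u ** 2 / (2 * alpha ** 2))

def cwt(signal, scales, shifts):
    """Naive CWT of Eq. (5): (1/sqrt(x)) * integral s(t) psi((t-y)/x) dt."""
    out = np.empty((len(scales), len(shifts)), dtype=complex)
    dt = 1 / fs
    for i, x in enumerate(scales):
        for j, y in enumerate(shifts):
            out[i, j] = np.sum(signal * morlet((t - y) / x)) * dt / np.sqrt(x)
    return out

s = np.sin(2 * np.pi * 10 * t)            # 10 Hz test tone
scales = np.array([0.25, 0.5, 1.0, 2.0])  # effective frequency f_o / x
coeffs = cwt(s, scales, shifts=np.array([0.0]))
# |CWT| is largest at the scale whose centre frequency f_o/x is nearest
# 10 Hz, i.e. x = 0.5 here.
```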
Wavelet decomposition
After selecting the primary wavelet, the next step is wavelet decomposition. To achieve this, the wavelet coefficients \(CWT\left(x,y\right)\) must be calculated across multiple scales \(x\) and translations \(y\)15,28.
Feature extraction
Following the decomposition procedure, the wavelet coefficients are analyzed to identify significant characteristics within the data. Several essential characteristics can be deduced, including the following:
- Signal Energy: The energy of the wavelet coefficients is helpful for quantifying the signal’s strength10,28. It is determined by Eq. 7.
$$Signal_{Energy} = \mathop \sum \limits_{i = 1}^{N} \left| {CWT\left( {x_{i} ,y_{j} } \right)} \right|^{2}$$
(7)
- Entropy: This measurement provides insight into the information content of a signal by evaluating its complexity. It is determined by Eq. 8, where \(Coff_{normalized}\) is the normalized coefficient29.
$$Entropy = - \mathop \sum \limits_{i = 1}^{N} Coff_{normalized} \log \left( {Coff_{normalized} } \right)$$
(8)
- Peak Detection: Obtaining an accurate diagnosis requires identifying key peaks associated with specific components of the ECG, such as the P, Q, R, S, and T waves.
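The three feature families above can be sketched as follows; the coefficient matrix and the spike train are synthetic stand-ins, and the simple threshold-plus-local-maximum rule is an illustrative substitute for a full R-wave detector.

```python
import numpy as np

# Wavelet-coefficient features of Eqs. (7)-(8) plus a toy peak count.
rng = np.random.default_rng(2)
coeffs = np.abs(rng.normal(size=(4, 256)))     # |CWT(x_i, y_j)| stand-in

# Eq. (7): signal energy = sum of squared coefficient magnitudes.
signal_energy = float(np.sum(coeffs ** 2))

# Eq. (8): entropy of the normalized squared coefficients.
p = coeffs.ravel() ** 2
p = p / p.sum()                                # Coff_normalized
entropy = float(-np.sum(p * np.log(p)))

# Peak detection: threshold + local-maximum test stands in for R-wave search.
fs = 360
beat = np.exp(-((np.arange(fs) - fs // 2) ** 2) / 50.0)  # one sharp "R wave"
ecg_like = np.tile(beat, 3)                              # three beats
peaks = [i for i in range(1, len(ecg_like) - 1)
         if ecg_like[i] > 0.5
         and ecg_like[i] >= ecg_like[i - 1]
         and ecg_like[i] > ecg_like[i + 1]]
```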
Signal reconstruction
In certain circumstances, reconstructing the signal from the wavelet coefficients can be advantageous to emphasize particular characteristics. The reconstruction is given by Eq. 9, where \(Nor_{factor}\) is a normalization factor.
$$s\left( t \right) = \frac{1}{{Nor_{factor} }}\mathop \sum \limits_{x} \mathop \sum \limits_{y} CWT\left( {x,y} \right)\psi \left( {\frac{t - y}{x}} \right)$$
(9)
Reduce overfitting with modified multiclass attention
Overfitting occurs when a model memorizes the training data instead of deriving general principles, leading to suboptimal performance on unfamiliar data. The Modified Multiclass Attention Mechanism addresses this problem by dynamically adjusting the model’s emphasis on relevant characteristics, thereby enhancing its capacity for generalization30. By integrating M2AM, the BiLSTM architecture, and the classification layer, the proposed model effectively addresses the complexities of ECG signal classification, enhancing accuracy and robustness against overfitting.
To tackle the overfitting problem, M2AM varies the attention given to features dynamically according to their importance for each class. Only the most informative features are emphasized, so the model does not overfit on irrelevant features. The attention-weighted output \({y}_{i}\) over the features \({x}_{j}\) is formulated as (Eq. 10):
$$y_{i} = \mathop \sum \limits_{j = 1}^{N} \left( {w_{j} \cdot x_{j} } \right)$$
(10)
where \({w}_{j}\) is the weight of the jth feature, \({x}_{j}\) is the jth feature vector, and \(N\) is the total number of features. Through class-based attention, M2AM encourages the model to generalize to unseen data. The essential functions of M2AM are as follows.
Dynamic feature selection
M2AM assigns attention weights to different input features based on their relevance for classification. This adaptive mechanism enables the model to prioritize the most informative features while minimizing the impact of those that may introduce noise or are less relevant31.
Effect of regularization
M2AM effectively simplifies the model, which helps prevent overfitting: only the most essential features are emphasized. As a result, the model performs better on validation datasets and generalizes more effectively.
Representation in mathematical model
The M2AM mathematical model can be represented by Eq. (11), where N is the feature count, \({w}_{j}\) the weight parameters, and \({x}_{j}\) the feature vector components32.
$$y_{i} = \mathop \sum \limits_{j = 1}^{N} \left( {w_{j} \times x_{j} } \right)$$
(11)
Algorithm for modified multiclass attention mechanism
Algorithm 1, the M2AM, outlines a comprehensive method for dynamically allocating class-specific attention weights to input features. The process begins with the computation of attention weights for each class, which are subsequently applied to the input features. The combined weighted features are subsequently input into a BiLSTM model, which captures both temporal dependencies and feature interrelations33,34,35. The SoftMax function is ultimately employed to categorize the input into the most likely heart disease classification. This mechanism enhances the model’s ability to prioritize relevant features for each class, thereby improving its classification accuracy in multiclass heart disease detection.

Algorithm 1 Modified multiclass attention mechanism (M2AM).
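A plain-NumPy forward pass through Algorithm 1’s attend-then-classify flow is sketched below. The BiLSTM stage is abstracted to a single nonlinear projection, and every weight is a random stand-in rather than the trained model.

```python
import numpy as np

# Forward-pass sketch of Algorithm 1: class-specific attention, feature
# weighting, a stand-in for the BiLSTM + dense layers, then SoftMax.
rng = np.random.default_rng(3)
n_feat, d, n_classes, hidden = 12, 16, 4, 32

x = rng.normal(size=(n_feat, d))          # extracted wavelet features
W_score = rng.normal(size=(n_classes, d)) # hypothetical score parameters

# Step 1: class-specific attention weights (Eq. 2), softmax over classes.
scores = x @ W_score.T
a = np.exp(scores - scores.max(axis=1, keepdims=True))
a = a / a.sum(axis=1, keepdims=True)      # shape (n_feat, n_classes)

# Step 2: weight the features per class and pool over features.
weighted = np.einsum("fk,fd->kd", a, x)   # one pooled vector per class

# Step 3: stand-in for the BiLSTM + fully connected layers.
W_h = rng.normal(size=(d, hidden)) / np.sqrt(d)
W_o = rng.normal(size=(hidden,)) / np.sqrt(hidden)
logits = np.tanh(weighted @ W_h) @ W_o    # one logit per class

# Step 4: SoftMax over classes -> class probabilities.
z = np.exp(logits - logits.max())
probs = z / z.sum()
```

The output `probs` is a valid probability distribution over the heart disease classes, matching the algorithm’s final SoftMax step.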
Deep bidirectional long short-term memory architecture
The BiLSTM architecture processes the ECG signals in both forward and backward directions, learning temporal dependencies from past and future information. This is essential for analysing the sequential ECG signal, since context from both sides is useful for classification. The LSTM cells regulate information via three gates: the input gate (\({i}_{t}\)), the forget gate (\({f}_{t}\)), and the output gate (\({o}_{t}\)). The cell state \({C}_{t}\) is updated using Eq. (12), where \({\widehat{C}}_{t}\) is the candidate cell state.
$$C_{t} = (f_{t} \times C_{t – 1} ) + (i_{t} \times \hat{C}_{t} )$$
(12)
Bidirectionality
A BiLSTM comprises two LSTM layers: one that processes the input sequence forward and another that processes it backward. This dual approach enables the model to capture context from both directions, which is especially beneficial for sequential data4,17.
Output calculation
The Deep BiLSTM enhances feature representation by leveraging both historical and prospective context, thereby increasing the system’s capability to perform classification tasks3. The complete output \(h_{t}\) is calculated using Eq. (18).
$$h_{t} = o_{t} \times \tanh \left( {c_{t} } \right)$$
(18)
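One LSTM time step, written out in NumPy, makes Eqs. (12) and (18) explicit; the stacked gate parameterization and the random weights are standard but illustrative.

```python
import numpy as np

# One LSTM step showing the cell-state update of Eq. (12) and the output
# of Eq. (18). Weights are random stand-ins, not trained parameters.
rng = np.random.default_rng(4)
d_in, d_h = 6, 5

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Each quarter of W/U/b parameterizes one of: i, f, o, candidate.
    z = W @ x_t + U @ h_prev + b
    i_t, f_t, o_t, g = np.split(z, 4)
    i_t, f_t, o_t = sigmoid(i_t), sigmoid(f_t), sigmoid(o_t)
    c_hat = np.tanh(g)                    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat      # Eq. (12)
    h_t = o_t * np.tanh(c_t)              # Eq. (18)
    return h_t, c_t

W = rng.normal(size=(4 * d_h, d_in)) * 0.1
U = rng.normal(size=(4 * d_h, d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
# A BiLSTM runs one such pass forward and another backward over the
# sequence, then concatenates the two hidden states.
```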
Classification using output layer
The classification process converts the feature representation generated by the BiLSTM into a final output consistent with the predicted class labels. In this last stage, the output of the BiLSTM layer is passed through a SoftMax function, as in Eq. (19).
$$P\left( {y = T|z} \right) = \frac{{e^{{Z_{T} }} }}{{\mathop \sum \nolimits_{j = 1}^{C} e^{{Z_{j} }} }}$$
(19)
where \(C\) is the total number of classes and \({Z}_{T}\) is the logit of the target class \(T\).
The training loss function, the cross-entropy loss, is presented by Eq. (20).
$$L_{fun} = - \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{C} y_{ij} \cdot \log \left( {P\left( {y = T|z_{i} } \right)} \right)$$
(20)
The classification layer involves the following key steps5.
Fully connected layer
Immediately after the BiLSTM layers have finished processing the input, their output is passed through a fully connected dense layer that applies a linear transformation. The transformation T of the output \({h}_{t}\) is calculated using Eq. (21), where b is the bias, W the weight matrix, and T the logits to which the activation function is subsequently applied7.
$$T = (h_{t} { } \times {\text{W}}) + {\text{b}}$$
(21)
Activation function
To obtain probabilities for each class, the Softmax function is applied to the logits, as determined by Eq. (22).
$$P\left( {y = T|z} \right) = \frac{{e^{{Z_{T} }} }}{{\mathop \sum \nolimits_{j = 1}^{C} e^{{Z_{j} }} }}$$
(22)
Loss function
The cross-entropy loss function is commonly used to evaluate the model’s performance during training. It can be calculated by using Eq. (23).
$$L_{fun} = - \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{C} y_{ij} \log \left( {P\left( {y = T|z_{i} } \right)} \right)$$
(23)
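The classification head of Eqs. (21)–(23) fits in a few lines of NumPy; the weights and the one-hot label below are toy stand-ins.

```python
import numpy as np

# Linear layer (Eq. 21), SoftMax (Eq. 22), cross-entropy (Eq. 23) for one
# sample, with random stand-in weights and an invented one-hot label.
rng = np.random.default_rng(5)
d_h, C = 8, 4

h_t = rng.normal(size=d_h)                # final BiLSTM output
W = rng.normal(size=(d_h, C)) * 0.1
b = np.zeros(C)

T = h_t @ W + b                           # Eq. (21): logits
p = np.exp(T - T.max())
p = p / p.sum()                           # Eq. (22): class probabilities

y = np.zeros(C)
y[2] = 1.0                                # one-hot ground truth
loss = float(-np.sum(y * np.log(p)))      # Eq. (23) for a single sample
```

With a one-hot label, the double sum of Eq. (23) collapses to the negative log-probability of the true class, which is what the training loop minimizes.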
Dataset description
This research utilizes two popular datasets, MIT-BIH and INCART. The complete details of the datasets are as follows.
MIT-BIH dataset
The MIT-BIH Arrhythmia Database is a prestigious dataset utilized as a standard for assessing algorithms in ECG signal analysis. This dataset, assembled by the Massachusetts Institute of Technology (MIT) and Beth Israel Hospital, consists of 48 thirty-minute segments of two-channel ECG recordings sampled at 360 Hz. The dataset encompasses a variety of arrhythmias, including premature ventricular contractions, atrial fibrillation, and normal sinus rhythm, enabling researchers to train and evaluate models on different cardiac conditions. Every record is labeled with beat classifications, facilitating the accurate evaluation of model efficacy in identifying arrhythmic occurrences32.
This dataset has 113,000 labelled beats from recordings. This detailed annotation enables researchers to test the model’s arrhythmia detection capabilities. The MIT-BIH database has revolutionized cardiovascular research by providing a foundation for machine learning and deep learning models to improve ECG signal analysis and diagnosis. Table 2 presents the details of the MIT-BIH dataset.
INCART dataset
Another valuable ECG analysis resource is the INCART (International ECG Database), which facilitates research on cardiac arrhythmias. Electrocardiogram recordings from 75 subjects indicate both cardiac abnormalities and normal conditions. Data is sampled at 250 Hz for each 10-s interval. The INCART database is valuable for creating algorithms that differentiate between normal and abnormal ECG patterns, encompassing healthy and arrhythmic subjects33. Like the MIT-BIH dataset, INCART’s comprehensive annotation provides detailed information about each beat. Researchers can meticulously apply and assess classification algorithms using this comprehensive annotation. The INCART dataset’s varied subjects and comprehensive annotations establish a robust foundation for training machine-learning models to enhance arrhythmia detection and classification. Table 3 presents the details of the INCART dataset.
Data pre-processing
Proper preprocessing of ECG recordings is critical for accurate analysis and modeling. The preprocessing stage of the proposed pipeline receives raw ECG traces from well-known databases such as INCART and MIT-BIH. These databases contain a wide range of ECG signals, which is essential for developing a generalized model for different types of heart disease.
Improved adaptive bandpass filter
To guarantee the reliability and accuracy of the input data, we adopt the IABPF to remove undesired noise, including power line interference, low-frequency baseline drift, and other high-frequency disturbances commonly present in ECG signals. The filter cutoff frequencies are determined according to the natural frequency content of the ECG, typically ranging from 0.5 Hz to 100 Hz. This band preserves the essential characteristics of the ECG waveform while removing irrelevant noise components. Moreover, the adaptable nature of the IABPF enables the filter to adjust its response to the signal characteristics, so it works efficiently on a wide variety of ECG signals. An additional essential reason for choosing the IABPF is that it decreases the noise level while preserving the significant wave components (P, Q, R, S, T), which is critical for diagnosing heart disease accurately. Using the IABPF ensures that only the target frequency components are preserved, which is essential for the subsequent feature extraction and classification tasks.
Segmentation and normalization of the signal
After filtering, the continuous ECG signal is divided into small overlapping segments to reflect the rhythmic changes in heart activity. Each segment corresponds to a cardiac cycle, which facilitates the identification of time-domain features. Segmentation ensures that the most essential properties of the signal are retained and analyzed separately. Normalization is then performed to standardize the amplitude of the ECG signals across different samples. Such normalization is necessary when the model is fed signals of different magnitudes. Its critical benefit is that all segments become equally important to the classifier, allowing it to learn the aspects that differ among segments. This fosters the learning of temporal and frequency properties rather than amplitude differences.
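The segmentation and normalization described above can be sketched as fixed-length overlapping windows with per-segment z-scoring; the 1 s window, 50% overlap, and test signal are illustrative choices, not the exact settings of the pipeline.

```python
import numpy as np

# Overlapping segmentation + per-segment z-score normalization sketch.
fs = 360
win, stride = fs, fs // 2                 # 1 s windows, 50 % overlap

sig = np.sin(2 * np.pi * 1.2 * np.arange(0, 5, 1 / fs)) * 3.0 + 0.7

segments = np.array([sig[s : s + win]
                     for s in range(0, len(sig) - win + 1, stride)])
# z-score each segment so amplitude differences do not dominate learning.
mu = segments.mean(axis=1, keepdims=True)
sd = segments.std(axis=1, keepdims=True)
normalized = (segments - mu) / (sd + 1e-8)
```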
Wavelet transformation for feature extraction
Due to the non-stationary properties of ECG signals, the wavelet transform is applied to obtain both time- and frequency-domain features. Wavelet transforms are well-suited for analyzing time-varying signals due to their multi-resolution analysis, which can represent both high-frequency transients (e.g., the QRS complex) and low-frequency components (e.g., P and T waves). This allows the model to capture subtle cardiac events and recognize patterns linked to different arrhythmias. The choice of wavelet, such as the Morlet or Daubechies wavelet, depends on its capacity for efficient time-frequency localization, which is necessary for preserving the fine details of ECG signals. The features derived from the wavelet coefficients are then used to train the model.
Data augmentation
To further increase the diversity of the training dataset, data augmentation methods are employed to produce synthetic ECG signals. This works by applying slight transformations to the original signals, such as scaling, shifting, and rotating the signal patterns, to promote model generalization. Data augmentation offers a significant advantage, especially when the dataset is unbalanced or when the model requires more training data.
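A minimal augmentation sketch, assuming amplitude scaling and small circular time shifts as the transformations; the parameter ranges are illustrative.

```python
import numpy as np

# Synthetic-variant generation: random amplitude scaling plus a small
# circular time shift of each segment.
rng = np.random.default_rng(6)

def augment(segment, rng):
    scaled = segment * rng.uniform(0.8, 1.2)       # amplitude scaling
    shift = int(rng.integers(-10, 11))             # small time shift
    return np.roll(scaled, shift)

seg = np.sin(2 * np.pi * np.linspace(0, 2, 360))   # toy "segment"
batch = np.stack([augment(seg, rng) for _ in range(8)])
```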
Class imbalance

In heart disease classification, class balance is challenging because certain heart conditions (e.g., normal sinus rhythm) occur far more frequently than others (e.g., arrhythmias). Left unaddressed, this imbalance would bias the model, so we apply several techniques during pre-processing and during training:
- Weight balancing: We draw samples from minority classes with a higher probability during training to focus the model more on underrepresented conditions.
- Resampling: Over- or under-sampling of the classes in the dataset can also be used to balance it.
- Evaluation metrics: We employ precision, recall, and the corresponding F-measure as evaluation criteria rather than accuracy alone, to ensure that performance on minority classes is adequately evaluated.
These measures prevent the model from biasing towards the majority class, enabling the detection of heart diseases across all classes. Integrating the IABPF with wavelet transformation, signal normalization, data augmentation, and class imbalance management greatly improves both the quality and robustness of the ECG data. Signal quality is enhanced, feature extraction is improved, and the loss of diagnostically relevant information is minimized. Thus, the preprocessing steps not only enhance the quality of the input data but also improve the learning process, yielding more precise and reliable heart disease detection.
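The minority-class sampling idea above can be sketched with inverse-frequency sampling weights; the skewed toy label distribution is invented for illustration.

```python
import numpy as np

# Draw training indices with per-class probability inversely proportional
# to class frequency, so rare classes are seen as often as common ones.
rng = np.random.default_rng(7)

labels = np.array([0] * 900 + [1] * 80 + [2] * 20)   # skewed toy labels
counts = np.bincount(labels)
weights = 1.0 / counts[labels]            # rarer class -> larger weight
weights = weights / weights.sum()

sampled = rng.choice(len(labels), size=5000, p=weights, replace=True)
balanced_counts = np.bincount(labels[sampled], minlength=3)
# Each class is now drawn roughly equally often despite the 900/80/20 split.
```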
Comparison parameters
The following key parameters were used for comparative analysis between the proposed and existing models30.
Accuracy
Accuracy is a key metric for model prediction. It is the percentage of true results, both positive and negative, relative to the total number of cases evaluated, providing a broad overview of model performance, as presented in Eq. 19. Here AC: Accuracy, TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.
$$AC = \left[ {\frac{{\left( {TP + TN} \right)}}{{\left( {TP + FP + TN + FN} \right)}}} \right]$$
(19)
Precision
Precision measures model accuracy in positive predictions. It shows how many positive predictions were correct. False positives can have serious consequences, making this metric crucial, as presented in Eq. 24. Here PR: Precision
$${\text{PR}} = \left[ {\frac{{\left( {TP} \right)}}{{\left( {TP + FP} \right)}}} \right]$$
(24)
Recall
Recall, also known as sensitivity or true positive rate, measures the model’s ability to identify all relevant positive cases. This demonstrates the model’s ability to accurately identify positive instances in the dataset. Medical applications require this metric because missing a positive case can be disastrous, as presented in Eq. 25. Here RC: Recall
$${\text{RC}} = \left[ {\frac{{\left( {TP} \right)}}{{\left( {TP + FN} \right)}}} \right]$$
(25)
F-measure (F1-score)
The F-Measure balances precision and recall. It evaluates the model’s performance by considering false positives and negatives. F-Measure offers a more nuanced perspective on model effectiveness, making it particularly useful for imbalanced datasets, as shown in Eq. 26. Here FM: F-measure
$${\text{FM}} = 2 \times \left[ {\frac{{\left( {PR \times RC} \right)}}{{\left( {PR + RC} \right)}}} \right]$$
(26)
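The four metrics can be checked end-to-end on toy predictions; the labels below are invented purely to exercise Eqs. (19) and (24)–(26).

```python
import numpy as np

# Count TP/TN/FP/FN explicitly, then apply the metric definitions.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

TP = int(np.sum((y_true == 1) & (y_pred == 1)))
TN = int(np.sum((y_true == 0) & (y_pred == 0)))
FP = int(np.sum((y_true == 0) & (y_pred == 1)))
FN = int(np.sum((y_true == 1) & (y_pred == 0)))

AC = (TP + TN) / (TP + FP + TN + FN)      # accuracy, Eq. (19)
PR = TP / (TP + FP)                       # precision, Eq. (24)
RC = TP / (TP + FN)                       # recall, Eq. (25)
FM = 2 * PR * RC / (PR + RC)              # F-measure, Eq. (26)
```

On this toy split (3 TP, 5 TN, 1 FP, 1 FN) accuracy is 0.8 while precision, recall, and F-measure are all 0.75, illustrating how accuracy alone can overstate performance when classes are imbalanced.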
