Database
The study was approved by the Institutional Review Board of Massachusetts General Hospital. Since this was a retrospective study, informed consent was waived by the ethics committee. This retrospective study was performed in accordance with the Declaration of Helsinki. Adhering to the institutional review board guidelines, we obtained de-identified data from the bed-side monitors of the intensive care units (ICUs) of Massachusetts General Hospital (MGH).
The data, ~2TB telemetry waveform data, consists of 4 channels of ECG, ABP and SpO2 waveforms recorded using two device manufacturers: GE Healthcare and Philips Healthcare patient monitoring systems with sampling rates of 240 Hz and 250 Hz, respectively. The ABP was collected using a gold-standard intra-arterial catheter approach, providing continuous systolic and diastolic BP. Since the devices collected data at different sampling rates, data was resampled to 250 Hz. After this step, the waveforms from all devices were treated equally. In the hybrid convolutional neural network (CNN) noise detector model, data was screened for noise across devices, therefore forcing the annotator to learn and perform well across devices. We used streaming data acquired from 282 ICU patients at MGH (~30 M beats, ~400 patient days).
To address memory constraints posed by the large dataset, the data was partitioned into smaller, manageable segments for preprocessing and training. Parallel processing techniques were employed to accelerate preprocessing and feature extraction, utilizing distributed computing. The preprocessed data and extracted features were stored in MATLAB’s .mat format. Key features from each segment were then concatenated into a single array for training, ensuring the model had access to essential data without overloading system memory.
Patient demographics
Of the 282 patients (see Table 2), 181 were male, 99 were female, and for 2, the gender information was unavailable. With respect to race, 8 were Asian, 11 were Black, 17 were Hispanic, 214 were White, and for 32 patients, the self-declared race was unavailable. The patient population had a mean age of (66.21 ± 14.39) years and a mean body mass index (BMI) value of (28.77 ± 7.22) (see Supplementary Table 11).
Hybrid CNN approach for noisy signal detection
We wanted to ensure that the physiological signals were noise-free. A hybrid-CNN approach was accordingly designed and trained for the purpose of detecting noise in ECG, SpO2, and ABP; details of the noise-detector have been provided in the Supplement (see Supplementary Fig. 17, Supplementary Table 12). Additionally, SpO2 and ABP were considered noisy if their values ranged outside 0–100% and 0–250 mmHg ranges, respectively.
A time-segment of physiological signals was considered noise-free only when the corresponding SpO2, ABP, and all 4 ECG leads were identified as noise-free, and appropriate for training/testing of our BP estimation method. This procedure of identifying noise-free segments of physiological signals was applied to the entire repository of telemetry data, and all noise-free segments were segregated for further processing. For the hybrid-CNN noise model, 80% of the data was allocated for training and validation, while the remaining 20% was reserved for testing. During testing we found that our trained hybrid-CNN model detected noise in the ECG with a sensitivity of 94.0% and specificity of 91.9%, the hybrid-CNN BP noise classifier provided a sensitivity and specificity of 88.6% and 90.9% respectively, and the hybrid-CNN SpO2 noise classifier provided a sensitivity and specificity of 98.5% and 94.9%, respectively60.
Supplementary Fig. 17 illustrates the hybrid-CNN approach for detecting noise in the ECG, SpO2 and ABP. Supplementary Fig. 18 displays an example of a noisy and a non-noisy ECG signal. The method was designed to process ECG, SpO2 and ABP signals of length 4-sec with 0.5-sec overlap, at a sampling rate of 250 Hz.
Features were extracted from the ECG, SpO2 and ABP using both hand-engineered signal processing techniques and convolutional layers (as used in a convolutional neural network), and finally with a fully connected network, the method determined whether the ECG signals were noisy or clean. The part containing convolutional layers had the following architecture: the first 1D-convolutional layer had 16 filters of length 500 followed by a rectified linear unit (ReLU) and 1-D max pooling layer of size 2, afterwards we had another 1D-convolutional layer with 32 filters of length 250, followed by a rectified linear unit (ReLU), and a 1-D max pooling layer of size 2. We optimized the classifier performance using the Tensorflow Keras Tuner Bayesian Optimization using the cross-entropy loss function, by varying the number of filters, filter length and pooling operations, to determine the effect of the convolution filter length as well as the network capacity, in terms of the number of trainable parameters.
Following 3 expert clinician annotation of ~80 K s of signals from different patients, as either clean or noisy, the noisy signals were found to be primarily of four types:
-
(i)
large artifacts caused by loose contact of leads
-
(ii)
huge baseline-wander due to patient movement
-
(iii)
high frequency noise/jitters
-
(iv)
flat segments and signal saturation
We designed a method to account for the above types of noise. Specifically, as a first step, the noise detector examined the presence of a flat segment or ECG, ABP and SpO2 saturation; if detected, the signal was ascertained as noisy. If the signal passed this step, we extracted the following hand-engineered features:
-
(1)
Periodicity measure61 measures the variability in R-R intervals. If we let vector \({\bf{I}}\) contain the R-R intervals, and \({{\rm{\sigma }}}_{I}\) and \(\bar{I}\) be the standard deviation and mean of the vector, then the periodicity measure is computed as in Eq. (1):
$${PM}=100-\frac{100\times {{\rm{\sigma }}}_{I}}{\bar{I}}\,$$
(1)
-
(2)
Correlation measure61 measures the mean correlation of successive QRS complexes of all beats. Clean signals are generally expected to exhibit a high correlation measure.
-
(3)
Maximum signal energy beyond 12 Hz60 measures the maximum energy of high frequency components in the signal. The feature is computed by taking the maximum of the spectral energy contained in frequency components above 12 Hz. A higher signal energy above 12 Hz indicates poorer signal quality.
-
(4)
Sharpness60 measures how steep does the signal change around the QRS. A well-defined QRS is expected, except for abnormal cases such as ventricular fibrillation or ventricular tachycardia. Sharpness is given by \(\left(\frac{2}{\pi }\right)* {\tan }^{-1}\bar{S}\); \(\bar{S}\) denotes the sharpness of each QRS mean.
-
(5)
δP Stability60 measures the consistency of waveform amplitudes by assessing the difference between maximum and minimum amplitudes. Higher Stability indicates better signal quality.
-
(6)
Peak height stability60 measures the consistency of the ECG R-peaks amplitude over time. A more stable peak height indicates better signal quality. The peak height stability is given by \(1-\frac{{S}_{{\boldsymbol{\delta }}{\bf{P}}}}{\overline{\overline{{\boldsymbol{\delta }}{\bf{P}}}}}\).
Signal pre-processing and delineation
The baseline wander of the ECG signals was removed using a median filter based method62,63,64. The median filter62 allowed the replacement of each data point with the median of neighboring points, effectively removing outliers and preserving edges in the signal, without significantly distorting the QRS complex of the ECG signal65. Its performance has been compared to other methods66,67, with the results showing that while wavelet-based techniques and FIR filters may offer superior performance in some metrics, the median filter remains a viable option due to its simplicity and effectiveness. Unlike other methods, like spline interpolation, it does not rely on detecting the PQ interval or the precise annotation of fiducial points such as the R-wave or the P-wave, making it less prone to errors from automatic signal processing software. The presence of pacing spikes is a common concern while identifying the R-wave peaks in ECG signals. For patients with a pacemaker, a 51-order FIR filter with a 15 Hz cut-off frequency68 was used on the ECG to filter out the pacing spikes before the delineation process (Supplementary Fig. 19). Then, the data was calibrated using scale factors provided by the device manufacturers to convert signals into having standard units. Then, a state-of-the-art wavelet-based method by Martinez et al., 69,70 was used for delineating the ECG signal.
To delineate the SpO2 and ABP signals, we employed a modified version of Zong’s method to identify valleys61. This approach builds on Zong’s original open-source algorithm but introduces a few modifications. First, the low-pass filtering step was omitted, preventing artifacts and residual effects that could distort the signal; instead, a second application of the slope sum function was used to enhance sharp slopes and improve the detection of critical features in the signal. Lastly, the adaptive thresholding mechanism in Zong’s method was replaced with Martínez’s algorithm69, commonly utilized for QRS detection. Once the valleys in the SpO2 and ABP signals were identified, the maximum and minimum amplitudes between consecutive valleys were recorded and stored for subsequent feature extraction.
Beat sequence identification and selection
We then divided the physiological signals into window segments (number of heart beats) of fixed length, wherein the input signal (ECG, SpO2) characteristics were extracted and used to estimate the corresponding median BP of that segment/window. With short windows, we can get more instantaneous BP estimates, but inaccuracies due to even slight disruption of the heart rhythm may affect the estimation; on the other hand, longer window segments would capture the overall signal characteristics, but the estimates would not be instantaneous. The length of the window is therefore an important parameter to optimize.
At the single beat level, the extent of a single ECG beat (in all 4 leads) was defined from the end of T-wave of prior beat to the end of T-wave of the current beat; for SpO2 and ABP, the corresponding beat was defined between two consecutive valley points. Figure 5 illustrates these extents.

A single beat in electrocardiographic (ECG, from all 4 leads) extends from the end of T-wave of last beat to the end of T-wave of current beat. For oxygen saturation of peripheral arterial blood (SpO2) and arterial blood pressure (ABP), the corresponding beat extends between two consecutive valley points.
We identified windows consisting of {1, 5, 10, 20, 30, and 50} consecutive beats from the entire noise-free data, and made sure that all sequences of beats formed contiguous segments of signals. In case the extent of a beat was not certain in all six physiological signals, the beat was dropped from the analysis, and the corresponding sequences containing such beats were also dropped. Finally, we ensured that the beat sequences used for the BP estimation were non-overlapping, i.e., no two sequences had overlapping data.
Feature selection and estimation
All features employed in this study are presented in Supplementary Table 13; these involve time-based, spectral, statistical, and non-linear features. In this study, we build upon prior studies71,72,73 by developing and incorporating features pertinent to the BP estimation, such as the T2SpO2, QT-interval duration, T-wave amplitude, Kaiser-Teager energy features, autoregressive coefficients, as well as subject-specific characteristics such as gender, body mass index and race.
Overall, for each window/sequence, we extracted 48 features from the SpO2 and the 4-lead ECG signals. These included the heart rate (HR), statistical measures obtained from each ECG lead (i.e., signal mobility, signal complexity, fractal dimension, entropy, and autocorrelation), duration of the corrected QT-interval and T-wave amplitude, the pulse arrival time (PAT), T2SpO2, entropy of the power spectral distribution of the SpO2, statistical measures over the Kaiser-Teager energy of the SpO2 signal, and coefficients of an autoregressive model fit of the SpO2 signal. The features were not normalized to maintain the interpretability of the physiological values.
Some features, such as the corrected QT-interval and the T-wave amplitude, were calculated on a beat-to-beat basis. For such features, if the sequence had a length greater than one beat, the median value of the feature across all beats in the sequence was used as the final value.
The minimum and maximum amplitude ABP values of each beat were identified as the gold-standard diastolic and systolic BP values of that beat. If the sequence had a length greater than one beat, the median systolic /diastolic values across all beats in the sequence were used as the gold-standard. Only those sequences that had a diastolic/systolic BP between (80–220) mmHg /(40–120) mmHg were used in our studies.
In this manner, from every sequence, one input feature vector and a corresponding gold standard estimate for systolic/diastolic BP were extracted; together, the resulting pair was termed as one data sequence from the perspective of ML.
Features characterizing ECG morphology
We have used 7 features to characterize the ECG morphology of each of the four leads, resulting in a total of 28 features. The 7 features are explained below.
-
(1)
Signal mobility, measures the level of variation in the signal. If \({\boldsymbol{x}}\) denotes the ECG signal vector of length \(N\), \({x}_{i}\) is the \({i}^{{th}}\) sample, and \({d}_{j}={x}_{j+1}-{x}_{j}\), is the first order variation in the signal, then \({S}_{0},\) \({S}_{1}\), and signal mobility are defined as in Eq. (2):
$${S}_{0}=\sqrt{\frac{{\sum }_{i=1}^{N}{x}_{i}^{2}}{N}},{S}_{1}=\sqrt{\frac{{\sum }_{j=2}^{N}{d}_{j}^{2}}{N-1}}$$
(2)
where, signal mobility, is defined as the ratio of \({S}_{1}\) to \({S}_{0}\).
-
(2)
Signal complexity, measures the second order variation in the signal (i.e. first order differences of \({\boldsymbol{x}}\), \({\boldsymbol{d}}\),). Following above notation, signal complexity is denoted as \({S}_{2}\), and is defined as in Eq. (3):
$${S}_{2}=\sqrt{\frac{{\sum }_{j=2}^{N-1}{({d}_{j+1}\,-\,{d}_{j})}^{2}}{N-2}}$$
(3)
-
(3)
Fractal dimension, together with multi-scale entropy, measures the self-similarity of the signal. It identifies the patterns hidden in the signal at different portions and different zoom levels, and then compares them23.
-
(4)
Entropy, measures the state of disorder or randomness in the ECG signal. To compute the entropy, the entire amplitude variation (scale) of the ECG signal is divided into \(M\) bins, and then the number of occurrences when the ECG sample value falls in each of the bin is computed. Dividing the number of occurrences to the total number of samples gives the probability of an ECG sample value to lie in one of those bins. Once the probability \({p}_{i}\) is computed for all bins, the entropy is computed as in Eq. (4):
$${\rm{E}}{\rm{n}}{\rm{t}}{\rm{r}}{\rm{o}}{\rm{p}}{\rm{y}}=\mathop{\sum }\limits_{i=1}^{M-1}{p}_{i}{\rm{l}}{\rm{o}}{\rm{g}}\left(\frac{1}{{{\rm{p}}}_{{\rm{i}}}}\right)$$
(4)
-
(5)
Autocorrelation, measures the similarity/correlation of the signal with a delayed copy of itself, and helps identify repeating patterns. For extracting a single feature here, we output the autocorrelation function value at the center.
-
(6)
QT Interval, measures the median QT interval duration across all beats in the sequence, from the corresponding ECG lead. The QT interval encompasses the duration of ejection/systole part of the cardiac cycle71; hence, we believe it will be useful for BP estimation.
-
(7)
T-wave amplitude, measures the median T-wave amplitude across all beats in the sequence, from the corresponding ECG lead.
Features characterizing SpO2 morphology
We extracted 10 features from the SpO2 signal. Of these, four are taken from the Kaiser-Teager energy vector, one from the power spectral distribution of the SpO2 signal, and five from an auto-regressive model which is made to fit over the SpO2 signal.
-
(1)
Kaiser-Teager energy (KTE), is a popular tool used in signal processing for finding the energy profiles of signals with periodic components, and for end-point detection. If \({\boldsymbol{s}}\) is the vector containing the SpO2 signal, \({s}_{i}\) denotes the \({i}^{{th}}\) sample of the SpO2 signal, and vector \({\boldsymbol{KTE}}\) is the Kaiser-Teager energy of the SpO2 signal, then KTE is computed as in Eq. (5):
$${KT}{E}_{i}={s}_{i}^{2}-{s}_{i+1}.{s}_{i-1}$$
(5)
The \({\boldsymbol{KTE}}\) vector has unique properties. If the SpO2 is a periodic signal (signal made of periodic components), the mean value of the \({\boldsymbol{KTE}}\) vector is high, and if the SpO2 signal is noisy, the mean value of the \({\boldsymbol{KTE}}\) vector is low. Four features are extracted from the \({\boldsymbol{KTE}}\) vector, namely the mean of the vector (\({KT}{E}^{\mu })\), the variance (\({KT}{E}^{\sigma })\), the inter-quartile range (\({KT}{E}^{{iqr}}\)), and skewness (\({KT}{E}^{{skew}}\)).
-
(2)
Spectral entropy, the power spectral distribution of SpO2 signal is computed using the Fast Fourier Transform. We then compute the entropy of the power spectral distribution and utilize it as one of our features; this feature is called spectral entropy. Clean signals are expected to have low spectral entropy, while noisy signals (that may include a flat or tilted power spectrum) are expected to have high spectral entropy value.
-
(3)
AR coefficients, the coefficients learned while training an autoregressive (AR) model over the SpO2 signal, can provide useful information on the propagation of the ECG signal through arteries, veins and capillaries72. The AR coefficients appear to model the spectral envelope of the SpO2 signal and also model the basic shape of the pulse. For our approach, we train a 5th-order AR model over the SpO2 signal, and the five coefficients learned from the signal are used as features.
Other features
-
(1)
Heart rate, the median heart rate of the patient as measured in the given window, was utilized as a feature. The heart rate was measured by identifying the median R-R interval across all 4-leads of the ECG and across all beats. For windows of length one beat, the instantaneous heart rate, as measured by the R-R interval between R-wave peak of current beat to that of the last beat, was used73.
-
(2)
Pulse Arrival Time (PAT), is expected to give an indication of the pulse propagation velocity within arteries. It involves the detection of blood pulse at different arterial sites, and then measures the respective time delays. There are numerous studies24,74 which have indicated a strong relationship between PAT and BP. Here, we measure PAT as the mean time delay between the ECG signal’s R-wave peak (onset of pulse) to the corresponding peak observed in the SpO2 signal (which is measured at fingers, ears etc). PAT has been depicted in Fig. 5.
-
(3)
T2SpO2 is measured by taking the duration between the peak of SpO2 and the end of T-wave of the corresponding beat. T2SpO2 is depicted in Fig. 5.
Patient characteristics
We also included patient specific details as features/inputs for BP estimation; these details were retrieved from their medical records. The features are:
-
(1)
Age of the patient75.
-
(2)
Body mass index (BMI) of the patient76.
-
(3)
Gender of the patient. This variable was treated as a categorical variable. Value of “1” was used to denote “female”, “2” was used to denote “male”, and “NaN” was used to denote the case where either the gender of the patient was unavailable or when it was different from male and female75.
-
(4)
Race of the patient. The self-declared race was also treated as a categorical variable. The races considered were: Asian, Black, Hispanic and White. To input this information to ML models, race was encoded as a feature vector of size four, in a way similar to one-hot vector encoding. If the race was “White”, the four sized feature vector would be [1, 0, 0, 0]; if race was “Hispanic” the vector would be [0, 1, 0, 0]; if race was “Asian” the vector would be [0, 0, 1, 0], and if race was “Black” the vector would be [0, 0, 0, 1].
Random forest (RF) based regression model and performance assessment
Once the input features and gold-standard output were extracted, we proceeded towards fitting an RF based regression model. For this study, we used the Treebagger function from MATLAB 2018a, which is MATLAB’s implementation of the RF algorithm77. For a small and simple search space, such as optimizing a single feature with discrete values, a grid search was preferred due to its simplicity and interpretability. The computational cost of the grid search in this context was manageable, and it allowed for a comprehensive understanding of how the number of trees—and thus model complexity—impacted performance.
We built separate RF regression models for estimating the systolic and the diastolic BP. Feature selection was performed using Random Forest’s Out-Of-Bag (OOB) samples to measure each feature’s importance based on how much the prediction error increased when a feature’s values were randomly permuted, while using curvature tests to select features for splitting nodes in the decision trees. Information on the RF structure and hyperparameters can be found in Supplementary Table 14.
To assess the BP estimation performance, we trained/tested the ML method using five-fold cross-validation. Since all sequences were non-overlapping, this ensured that no overlap existed in the train and test data of the five folds. A five-fold cross-validation approach was utilized to determine the random forest model parameters, including the optimal number of trees, window length (number of beats as input), feature selection, and the number of leads, as well the final evaluations for all models, including the patient-specific model, where for each test-set fold, a model is trained with the patient data also divided in five-folds. This method provided a balance between computational efficiency and robust evaluation by testing across multiple test sets, reducing the risk of overfitting to a single data partition and ensuring reliable performance estimation.
Statistical methods
Continuous data are presented as mean and standard deviation, median and inter quartile ranges are used in the box plots. We used the linear mixed model (LMM) to assess the significance of observed differences between groups, using a significance level of 0.01. To meet the normality assumption of an LMM, a log transformation was used to approximate the residuals to a normal distribution, and p-values were calculated to assess statistical significance.
We conducted all statistical analyses using MATLAB (Version: R2018a) and Python (Package: StatsModels; Version: 0.14.4).
