Supervised deep learning with Vision Transformers predicts delirium using limited guided EEG



This is the first prospective proof-of-concept pilot study to assess the presence or absence of delirium in critically ill patients using a rapid-response EEG device that provides data from all cerebral lobes, analyzed with a supervised deep learning method (Vision Transformer).

This study (UMCIRB 17-001900 MIND) was reviewed and approved by the East Carolina Vidant Medical Center Institutional Review Board (UMCIRB) on March 13, 2018. Written informed consent was obtained from legally authorized representatives of the participants prior to any research activity. All research procedures were conducted in accordance with the ethical standards of the UMCIRB and the 1975 Declaration of Helsinki.

Setting/Sample

The protocol for this study was previously published in RINAH.21 Briefly, from March 2019 to March 2020, patients in three intensive care units (cardiac, medical, and surgical ICUs) at a large rural academic medical center in North Carolina were screened against the inclusion and exclusion criteria, and those who met the criteria were recruited. Participants were over the age of 50, English-speaking, and required mechanical ventilation for more than 12 hours. Exclusion criteria included acute brain injury, seizures, or conditions that precluded delirium screening. Because participants were unable to self-consent, written informed consent was obtained from a legally authorized representative prior to enrollment.

Each day, patients were assessed for their ability to participate in delirium screening using the Richmond Agitation-Sedation Scale (RASS).22,23 The RASS has excellent inter-rater reliability (r = 0.956, lower 90% confidence limit = 0.948; κ = 0.73, 95% confidence interval 0.71–0.75).22,23 A RASS score of -2 or greater (able to open the eyes for 10 seconds or longer in response to voice) met eligibility.

Demographic and clinical characteristics were obtained from electronic medical records (EMR).

Measures

Bedside behavioral assessment for delirium

The Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) is a modification of the CAM developed to assess non-verbal ventilated patients in the ICU.24,25 The CAM-ICU is one of two delirium screening tools recommended for use by the Society of Critical Care Medicine (SCCM) and is based on the Diagnostic and Statistical Manual of Mental Disorders IV (DSM-IV), the gold standard for delirium identification.1,4,26 With the CAM-ICU, patients are screened for four cardinal features of delirium: (a) acute onset or fluctuation in mental status within the past 24 hours, (b) inattention, (c) altered level of consciousness [Richmond Agitation-Sedation Scale (RASS) ≠ 0], and (d) disorganized thinking.24,25 When used in research, the CAM-ICU has high sensitivity and specificity of 93% and 98%, respectively, and high inter-rater reliability (κ = 0.79).

Physiological assessment of delirium

A rapid-response EEG headband encircling the head was applied daily to each participant. Placement accuracy was based on the position of the headband clasp in the middle of the forehead; the electrodes were numbered 1–10, and the headband was connected to the recorder at the occipital hairline. EEG waveforms were immediately visualized on the EEG recorder, which uses a color-coded diagram (green = low impedance, red = high impedance) on the headband to indicate the quality of each connection (i.e., impedance). EEG monitoring was performed for 2 h nightly between 5:00 pm and 9:00 pm (17:00–21:00) for 4 days or until ICU discharge. After 1 hour of EEG monitoring, the research team assessed the participants for delirium using the CAM-ICU. To be considered delirium-positive on the CAM-ICU, participants must meet at least three of the four core criteria of delirium: an acute change in baseline mental status, manifestations of inattention, altered level of consciousness, or disorganized thinking.

Processing of EEG data

Prior to analysis, the EEG data were processed to remove artifacts such as facial muscle movements and interference from nearby equipment such as ventilators and heart monitors. To do this, filters were applied to remove high and low frequencies. The data were then re-referenced to estimate and remove physiological noise and divided into discrete time periods called epochs. After preprocessing, the EEG data were further "cleaned" using independent component analysis (ICA) to remove noise and generate the features needed for the machine learning algorithms. Independent component analysis is widely accepted as a method of cleaning data by separating artifacts from signals originating from cortical processes.14,27 An advantage of independent component analysis, which uses higher-order statistics, is that artifacts can be easily subtracted by directly examining the independent components of the data.
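The epoching step described above can be sketched as follows. This is a simplified stand-in, not the study's actual pipeline: the filtering, re-referencing, and ICA steps are not reproduced here, and a simple amplitude threshold (an illustrative 100 µV cutoff) stands in for artifact screening. All function names are hypothetical.

```python
# Minimal sketch: divide continuous 8-channel readings into fixed-length
# epochs and drop epochs whose peak amplitude suggests an artifact.
# The 100 uV threshold and all names are illustrative, not from the study.

def split_into_epochs(samples, epoch_len):
    """Divide a list of per-sample channel readings into fixed-length epochs."""
    return [samples[i:i + epoch_len]
            for i in range(0, len(samples) - epoch_len + 1, epoch_len)]

def reject_artifacts(epochs, max_abs_uv=100.0):
    """Drop epochs whose peak amplitude exceeds a simple artifact threshold."""
    return [ep for ep in epochs
            if max(abs(v) for row in ep for v in row) <= max_abs_uv]

# 250 Hz sampling (one reading every 4 ms), 8 channels, 2-second epochs
samples = [[0.0] * 8 for _ in range(1000)]
samples[700][3] = 250.0                           # simulated artifact spike
epochs = split_into_epochs(samples, epoch_len=500)
clean = reject_artifacts(epochs)
print(len(epochs), len(clean))                    # 2 1
```

In the real pipeline, ICA would replace the threshold step, separating artifact components from cortical signal rather than discarding whole epochs.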

Machine learning

EEG analysis techniques vary from study to study; therefore, three traditional machine learning methods were initially used to analyze these data: random forests (ensembles of decision trees), stepwise linear discriminant analysis (removal of variables that do not help classify the data, in this case delirium-/delirium+), and support vector machines (the computer constructs the model that provides the largest separation between the categories, in this case delirium-/delirium+). Due to the challenges of feature selection, a supervised deep learning technique, namely the Vision Transformer (ViT), was primarily used in this analysis. In many deep learning applications, deep neural networks can learn subtle features directly from the input data, often without the need for advanced data processing techniques or feature engineering. Therefore, two types of input data were studied: the first underwent preprocessing (removal of muscle movements and device interference) and ICA cleaning, while the second underwent only preprocessing without ICA cleaning. Data were sampled from the EEG device every 4 milliseconds, with each data sample containing eight sensor readings.

A number of contiguous data rows are organized into a data slice, forming an \(8 \times n\) array, where \(n\) represents the number of rows. These arrays are resized to \(224 \times 224\) using bilinear interpolation and processed as input images to the ViT model, as shown in Figure 2.
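The resizing step can be sketched in pure Python. In practice a library (e.g., PIL or OpenCV) would perform this; the function below is an illustrative implementation of bilinear interpolation from an \(8 \times n\) slice to \(224 \times 224\), with hypothetical names.

```python
# Sketch of bilinear resizing: stretch an 8 x n slice of sensor readings
# to a 224 x 224 "image" suitable as ViT input. Illustrative only.

def bilinear_resize(src, out_h=224, out_w=224):
    in_h, in_w = len(src), len(src[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        # map each output coordinate back into source coordinates
        y = r * (in_h - 1) / (out_h - 1)
        y0, dy = int(y), y - int(y)
        y1 = min(y0 + 1, in_h - 1)
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1)
            x0, dx = int(x), x - int(x)
            x1 = min(x0 + 1, in_w - 1)
            top = src[y0][x0] * (1 - dx) + src[y0][x1] * dx
            bot = src[y1][x0] * (1 - dx) + src[y1][x1] * dx
            out[r][c] = top * (1 - dy) + bot * dy
    return out

slice_8xn = [[float(r + c) for c in range(250)] for r in range(8)]  # 8 x 250
img = bilinear_resize(slice_8xn)
print(len(img), len(img[0]))  # 224 224
```

Corner values are preserved exactly, which is a quick sanity check on the interpolation.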

Figure 2

Experimental confusion matrix.

Our main aim was to identify the frequency ranges within the wave image that most affect the classification results. Our assumption is that if an image segment does not contain a complete cycle of a wave at a particular frequency, the effect of that frequency is unlikely to be significant in the classification results. Controlling the length of the data slice allows us to control the range of frequencies contained in the image. For example, if \(n = 25\), then the time span of the wave image is \(0.1\) s and the corresponding frequencies are \(f > 10\,\text{Hz}\). In short, \(n\) determines the lowest frequency fully contained in the image. To investigate the effect of partial frequencies at different phases and increase the data size, we split the waveform images into overlapping segments. Such processing can better reflect the relationship between waves of different frequencies.
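The frequency bound above can be computed directly: with one reading every 4 ms (250 Hz, per the text), a slice of \(n\) rows spans \(T = n/250\) s, and only waves completing at least one full cycle in \(T\) (i.e., \(f \ge 1/T\)) are fully represented. A small sketch, with an illustrative function name:

```python
# Sketch of the lowest fully-represented frequency for a slice of n rows,
# given the 4 ms sampling interval stated in the text.

SAMPLE_INTERVAL_S = 0.004  # one EEG reading every 4 ms (250 Hz)

def lowest_full_cycle_freq(n_rows):
    t = n_rows * SAMPLE_INTERVAL_S  # time span T of the slice, in seconds
    return 1.0 / t                  # f = 1/T

print(lowest_full_cycle_freq(25))    # 10.0 -> only f > 10 Hz complete a cycle
print(lowest_full_cycle_freq(1250))  # 0.2  -> captures down to the delta band
```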

To understand how the results relate to the size of the slice, five different slice lengths were selected, the largest being 1250 rows (5 seconds). Due to the small population size, the data were expanded using an overlapping window scheme, in which the starting row of the next data slice is placed somewhere within the current data slice rather than after its last row. An example of 1-second (250-row) data slices with 30% overlap is shown in Figure 3. Signals relevant to EEG studies include alpha (8–12 Hz), beta (15–30 Hz), delta (0.5–3 Hz), gamma (> 30 Hz), and theta (4–7 Hz) waves. When \(T\) denotes the time extent of each data slice, the lowest frequency that can be detected in the image is \(f = \frac{1}{T}\). In this study, the highest such minimum frequency is \(10\,\text{Hz}\) when \(T = 0.1\) s, and the lowest is \(0.2\,\text{Hz}\) when \(T = 5\) s. The data slices were split randomly into training and test sets, while ensuring that all data from one subject went into only one set (training or test). The positive and negative cases in both the training and test sets were relatively balanced, with a ratio close to 1.
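The overlapping window scheme described above can be sketched as follows: with overlap ratio \(p\), each new slice starts \((1-p) \cdot n\) rows after the previous one. The row counts follow the text (250 rows = 1 s); the function name is illustrative.

```python
# Sketch of the overlapping-window data augmentation: the next slice
# starts inside the current one rather than after its last row.

def overlapping_slices(rows, slice_len, overlap):
    step = max(1, int(slice_len * (1 - overlap)))  # rows to advance per slice
    return [rows[i:i + slice_len]
            for i in range(0, len(rows) - slice_len + 1, step)]

data = list(range(1000))                    # 4 s of rows at 250 Hz
plain = overlapping_slices(data, 250, 0.0)  # no overlap  -> 4 slices
dense = overlapping_slices(data, 250, 0.3)  # 30% overlap -> 5 slices
print(len(plain), len(dense))               # 4 5
```

Higher overlap ratios (75%, 90%, 95% in the study) multiply the number of training examples from the same recording, which matters given the small sample size.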

Figure 3

Pipeline of the model trained in the frequency domain in the comparative experiment.

In this study, the model’s default hyperparameters (batch size = 64, learning rate = 0.001, depth = 12, heads = 8) were used. Overlap ratios of 75%, 90%, and 95% were studied. A 90% window overlap ratio was used to report the results, as no significant differences were found among these overlap ratios. Using the ViT model, the EEG data converged very quickly: in most situations, training accuracy exceeded 99% in just 3 epochs. To avoid overfitting, we evaluated the test dataset with the trained model after 5 epochs.

The reason we used a transformer-based computer vision model, ViT, rather than a transformer-based language model, is that the data are a mixture of waves with different frequencies. Given that waves repeat periodically, an image segment that does not contain at least one complete cycle of a particular frequency may not provide enough information to analyze that frequency. In this study, we investigated image segments containing sets of waves of different frequencies. Because EEG data are best viewed in wave-image format, transformer-based NLP models are not the best choice for analyzing EEG wave images; we therefore chose ViT for the data analysis.

Also note that it is customary to transform time-series data into spectral images via transform techniques such as the fast Fourier transform (FFT). We argue that such transformations are not well suited for ViT models. This work therefore also investigated the effect of adding an FFT step to the process; the workflow is shown in Figure 3.
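The FFT-based alternative maps each time-domain slice to per-frequency magnitudes before imaging. A minimal sketch using a plain discrete Fourier transform (a real pipeline would use an FFT library; all names here are illustrative):

```python
# Sketch of the spectral alternative: transform a time-domain slice into
# frequency-bin magnitudes. Plain O(n^2) DFT for illustration only.
import cmath
import math

def dft_magnitudes(signal):
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]  # keep the non-redundant half

# 250 samples at 250 Hz: a pure 10 Hz wave should peak in bin k = 10
sig = [math.sin(2 * math.pi * 10 * t / 250) for t in range(250)]
mags = dft_magnitudes(sig)
print(max(range(len(mags)), key=mags.__getitem__))  # 10
```

In the spectral representation, the wave's phase and its relationship to waves at other frequencies are no longer visible as image structure, which is one intuition behind the argument that such transformations suit ViT models poorly.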

To better understand the value of ViT models in EEG data analysis, a public dataset28 was also used to perform a binary classification task. The data slice was set to 1250 rows and the overlap ratio to 90%. The model achieved a test accuracy of 86.33%, outperforming the state-of-the-art algorithm SleepEEGNet, which achieved 80.03% accuracy. These pilot results suggest that ViT is better suited for analyzing EEG data than existing algorithms.


