Machine learning-based predictive models for the occurrence of behavioral and psychological symptoms of dementia: model development and validation

Study design

This study used a prospective observational design with three waves of data collection to build predictive models for BPSD subsyndromes. The second wave followed the first and collected repeated measures from first-wave participants who agreed to continue. Detailed descriptions of the first and second waves are reported elsewhere²⁰. In the third wave, a validation dataset was collected from new participants, independent of the first- and second-wave data. The first- and second-wave data were used for model training (i.e., the training dataset) and the third-wave data for external validation (i.e., the test dataset).

We followed a standard data mining methodology comprising four steps: (1) data acquisition, (2) data preprocessing (e.g., data cleaning, class imbalance training, and dataset class optimization), (3) model learning, and (4) model evaluation²¹.

Recruitment and data collection

The first wave of data collection was conducted between June 2018 and June 2019. Eligible older adults with dementia living at home were recruited through on-site visits at the outpatient neurological clinics of two tertiary hospitals and at daycare centers in Seoul and the broader Gyeonggi region of Korea. The second wave, which involved first-wave participants who agreed to continue the study, was conducted between July 2019 and June 2020. For external validation, eligible participants were recruited between July 2020 and November 2020 from one of the outpatient neurological clinics at which the first and second waves had been conducted. The inclusion criteria, applied in all three waves, were (1) age of at least 65 years, (2) a diagnosis of dementia, and (3) a score of less than 24 on the Korean version of the Mini-Mental State Examination (K-MMSE)²².

Eligibility screening and data collection were performed by trained research staff.

After eligibility was established, trained research staff collected demographic and health data through interviews with family caregivers and the older adults with dementia. Chart reviews were also conducted, and standardized scales for physical, functional, and neuropsychological assessment were administered. Following the baseline assessment, participants wore an actigraphy device on the wrist continuously for two weeks, and primary caregivers recorded BPSD in a symptom diary daily for 14 consecutive days.

Ethical considerations

All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional or national research committee and with the Declaration of Helsinki (1964) and its later amendments or comparable ethical standards. Institutional review board approval was obtained from the Yonsei University Health System Severance Hospital (IRB 4-2018-0348, 4-2019-0314, 4-2020-0454) and Ilsan Hospital (IRB 2018-10-002-001, 2019-08-012-001). The legal representatives of all participants provided written informed consent before enrollment after receiving a full explanation of the study procedures. The participants also provided verbal assent, and their own written informed consent was obtained when possible.

Features

Demographic and health data

At baseline, demographic and health data comprised age, sex, marital status, education level, dementia diagnosis, and neurological and psychiatric medications.

Cognitive and functional status

Scores on the K-MMSE (range 0–30) were used to assess cognitive function, with lower scores indicating greater cognitive deficits²². For the K-MMSE, Cronbach's α was 0.91 in older Korean adults with dementia²³. Dementia severity was measured with the Korean version of the expanded Clinical Dementia Rating (CDR) scale, which assesses six functional domains: memory, orientation, judgment and problem-solving, community affairs, home and hobbies, and personal care²⁴. The summed score ranged from 0 (none) to 5 (terminal dementia), and good inter-rater reliability for overall CDR ratings has been confirmed in Korean patients with dementia (kappa range: 0.86–1.00)²⁵. Functional independence was evaluated with the Korean version of the Activities of Daily Living (K-ADL) scale, which consists of seven items rated on a 3-point Likert-type scale, with higher scores indicating greater dependency²⁶. The K-ADL has been validated in older Korean adults with dementia and shows good reliability (Cronbach's α = 0.94)²⁶.

Personality type

The premorbid personality traits of the older adults with dementia were rated by family caregivers, as informants, using the Korean version of the Big Five Inventory (BFI-K)²⁷. The BFI-K consists of 15 items rated on a 5-point Likert-type scale that measure five personality domains: openness, conscientiousness, neuroticism, extraversion, and agreeableness. The internal consistency of the BFI-K was acceptable to good, with Cronbach's α ranging from 0.67 to 0.82²⁷.

Actigraphy data: nighttime sleep and physical activity

Older adults with dementia were fitted with a wrist-worn actigraphy device (ActiGraph wGT3X-BT, ActiGraph Corporation, Pensacola, FL, USA), which they wore all day for 14 consecutive days. The participants were instructed to remove the device only when bathing or briefly as needed. Previous validation studies have demonstrated that wrist actigraphy is a reliable and suitable method for objectively measuring sleep–wake cycles in older adults with dementia^28,29. Raw acceleration data were collected along three axes. We used ActiLife software (version 6.13.3; ActiGraph Corporation, Pensacola, FL, USA) to export the data and convert the raw acceleration data into sleep and physical activity parameters using vector magnitude counts in 60-s epochs (i.e., counts per minute). The vector magnitude is the square root of the sum of the squared values of the three axes (x, y, z). The Cole-Kripke algorithm was applied to score each 1-min epoch as asleep or awake³⁰. The previous night's sleep parameters were used to predict BPSD on the following day, with nighttime sleep defined as the period between 20:00 (8:00 pm) and 08:00 (8:00 am). The following nighttime sleep parameters were generated: total sleep time; wake time after sleep onset; sleep efficiency, defined as the ratio of total sleep time to the assumed sleep period (total sleep time/[total sleep time + wake time after sleep onset] × 100); number of awakenings; and mean awake length (wake time after sleep onset/number of awakenings). The following physical activity parameters were also generated: energy expenditure (kcal per day), metabolic equivalents per day, total time spent in moderate-to-vigorous physical activity per day, percentage of time spent in moderate-to-vigorous physical activity per day, and number of steps per day.
By contrast, physical activity parameters were taken from the same day, reflecting the participant's physical condition on the day BPSD occurred.
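ActiLife performs these steps internally, so the following is only a minimal NumPy sketch of the logic, not the software's code. The Cole-Kripke weights are the commonly cited 1-min coefficients; the count scaling (divide by 100, cap at 300), zero padding at the recording edges, and the definition of the sleep period as running from the first to the last sleep epoch are assumptions. All function names are hypothetical.

```python
from datetime import timedelta

import numpy as np

# Cole-Kripke 1-min weights, ordered from 4 epochs before to 2 epochs after the scored epoch.
CK_WEIGHTS = np.array([106, 54, 58, 76, 230, 74, 67], dtype=float)
CK_SCALE = 0.001


def vector_magnitude(x, y, z):
    """Per-epoch vector magnitude: sqrt(x^2 + y^2 + z^2) of the three axis values."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    return np.sqrt(x**2 + y**2 + z**2)


def cole_kripke(counts):
    """Return a boolean array that is True where a 60-s epoch is scored as sleep."""
    # Assumed count scaling: divide by 100 and cap at 300 before weighting.
    a = np.minimum(np.asarray(counts, dtype=float) / 100.0, 300.0)
    # Assumption: epochs outside the recording are treated as zero activity.
    padded = np.concatenate([np.zeros(4), a, np.zeros(2)])
    # Sliding weighted sum over epochs i-4 .. i+2 for every epoch i.
    d = CK_SCALE * np.correlate(padded, CK_WEIGHTS, mode="valid")
    return d < 1.0


def bpsd_day_for_epoch(ts):
    """Diary day whose preceding night (20:00-08:00) contains ts, or None if ts is daytime."""
    if ts.hour >= 20:
        return ts.date() + timedelta(days=1)
    if ts.hour < 8:
        return ts.date()
    return None


def night_sleep_parameters(asleep):
    """Sleep parameters for one night of consecutive 1-min epochs (True = asleep)."""
    asleep = np.asarray(asleep, dtype=bool)
    sleep_idx = np.flatnonzero(asleep)
    if sleep_idx.size == 0:
        return None  # no sleep scored in this window
    # Assumption: the sleep period runs from the first to the last sleep epoch.
    period = asleep[sleep_idx[0]:sleep_idx[-1] + 1]
    tst = int(period.sum())        # total sleep time, minutes
    waso = int((~period).sum())    # wake time after sleep onset, minutes
    # An awakening starts wherever a sleep epoch is followed by a wake epoch.
    awakenings = int((period[:-1] & ~period[1:]).sum())
    return {
        "total_sleep_time": tst,
        "waso": waso,
        "sleep_efficiency": 100.0 * tst / (tst + waso),
        "awakenings": awakenings,
        "mean_awake_length": waso / awakenings if awakenings else 0.0,
    }
```

For example, a night scored as sleep, sleep, wake, wake, sleep, sleep, sleep, wake, sleep gives 6 min of total sleep time, 3 min of wake after sleep onset, 2 awakenings, a mean awake length of 1.5 min, and a sleep efficiency of about 66.7%. Keying each night by the following diary day is one way to pair the previous night's sleep with the next day's BPSD.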

Symptom diary data: BPSD and caregiver-perceived symptom triggers

A symptom diary comprising a structured, easy-to-use checklist modeled on the Neuropsychiatric Inventory (NPI) was developed to assess the daily presence and severity of BPSD (i.e., delusions, hallucinations, agitation/aggression, depression/dysphoria, anxiety, elation/euphoria, apathy/indifference, disinhibition, irritability/lability, aberrant motor behaviors, sleep and nighttime behaviors, and appetite and eating disorders)³¹. It also included a checklist of caregiver-perceived BPSD triggers: hunger/thirst, urination/bowel movement, pain/discomfort, sleep disturbance, noise, light, and temperature; interpersonal triggers (i.e., factors related to the person(s) present); and changes in the environment. If a perceived trigger could not be categorized under any listed option, caregivers were instructed to check "other causes" and specify the factor. Family caregivers were instructed to check all options perceived as triggers of BPSD on the same day the symptoms occurred. The symptom diary was designed to reduce recall bias (e.g., the NPI relies on the caregiver's retrospective rating over two weeks), enable daily monitoring of BPSD occurrence, and link symptoms to triggers each day³².

Recent studies have established that clustering highly correlated, co-occurring individual BPSD enhances the clinical utility of BPSD assessment, allows more meaningful interpretation of study findings, and increases statistical power, because more participants endorse a symptom cluster than any individual symptom^2,33,34. Based on previous NPI factor analysis studies, we grouped individual symptoms into three subsyndromes: psychotic symptoms (hallucinations and delusions), affective symptoms (depression, anxiety, and apathy), and hyperactivity symptoms (agitation/aggression, disinhibition, and irritability)^34,35,36,37. Because prior studies have shown that euphoria/elation, aberrant motor behaviors, sleep and nighttime behaviors, and appetite and eating disorders do not load onto any cluster^33,34,36,37,38, we analyzed each of them as a separate single-symptom subsyndrome, giving seven subsyndromes in total.
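The grouping into seven subsyndromes can be expressed as a simple lookup. This is a sketch under an assumed "any-of" rule: a subsyndrome is coded as present on a diary day if the caregiver checked at least one of its member symptoms. The symptom keys and function name are hypothetical, and severity ratings are ignored.

```python
# Hypothetical keys for the 12 NPI-based diary items, grouped into seven subsyndromes.
SUBSYNDROMES = {
    "psychotic": {"delusions", "hallucinations"},
    "affective": {"depression", "anxiety", "apathy"},
    "hyperactivity": {"agitation_aggression", "disinhibition", "irritability"},
    "euphoria_elation": {"euphoria_elation"},
    "aberrant_motor": {"aberrant_motor"},
    "sleep_nighttime": {"sleep_nighttime"},
    "appetite_eating": {"appetite_eating"},
}
ALL_SYMPTOMS = set().union(*SUBSYNDROMES.values())


def daily_subsyndromes(checked):
    """Return 0/1 occurrence of each subsyndrome for one diary day.

    `checked` is the set of symptom keys the caregiver marked that day; a
    subsyndrome is coded 1 if any of its member symptoms was marked.
    """
    checked = set(checked)
    unknown = checked - ALL_SYMPTOMS
    if unknown:
        raise ValueError(f"unrecognized symptoms: {sorted(unknown)}")
    return {name: int(bool(members & checked)) for name, members in SUBSYNDROMES.items()}
```

For a day on which only anxiety and hallucinations were checked, the affective and psychotic subsyndromes are coded 1 and the other five are coded 0, yielding one binary target per subsyndrome.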

Data preprocessing

Missing actigraphy data arose for two main reasons: improper wearing of the device and lack of participant compliance (e.g., repeatedly removing or not wearing the device). Actigraphy data were missing for 81 of 225 participants (36%), with a mean of 0.9 days of missing data per person. BPSD occurrence rates were similar regardless of whether actigraphy data were missing (Supplementary Table 1); therefore, multivariate imputation by chained equations³⁹ was applied to the missing actigraphy data. Before training the models, we applied min–max normalization to continuous features. For categorical features, target encoding was used instead of one-hot encoding. Target encoding reduces feature dimensionality by replacing each category with a numerical value derived from the target variable, under the assumption that the categorical feature is related to the outcome⁴⁰. The outcome classes of the BPSD subsyndromes were imbalanced: whereas 26.8% of participants exhibited affective symptoms, only 4.4% and 4.7% exhibited aberrant motor behaviors and euphoria/elation, respectively (Table 2). Researchers in various disciplines have drawn attention to the class imbalance problem and proposed strategies for handling imbalanced datasets^41,42,43. This study applied the synthetic minority oversampling technique (SMOTE) to address outcome class imbalance⁴⁴.
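The normalization, encoding, and oversampling steps can be sketched in plain NumPy to show what each does. This is an illustration under stated assumptions, not the study's pipeline: the smoothing term in the target encoder is an assumed choice, the SMOTE function is a bare-bones version of the interpolation idea, and all function names are hypothetical. In practice, libraries such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) and scikit-learn (`IterativeImputer` for chained-equation imputation, `MinMaxScaler`) provide tested implementations. We also assume that scalers and encoders are fit on the training data only and that oversampling is applied only to training data.

```python
import numpy as np


def min_max_fit(X):
    """Column minima and ranges from the training data (constant columns get range 1)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)


def min_max_transform(X, lo, span):
    """Rescale columns to [0, 1] relative to the training range."""
    return (np.asarray(X, dtype=float) - lo) / span


def target_encode_fit(categories, y, smoothing=10.0):
    """Map each category to a smoothed mean of the binary outcome (fit on training data)."""
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    prior = float(y.mean())
    mapping = {}
    for c in np.unique(categories):
        mask = categories == c
        # Shrink small categories toward the overall outcome rate (assumed smoothing).
        mapping[c.item()] = float((y[mask].sum() + smoothing * prior) / (mask.sum() + smoothing))
    return mapping, prior


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority-class rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # needs at least two minority samples
    # k nearest minority neighbours of each minority sample (Euclidean distance).
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform position along the segment to the neighbour
    return X_min[base] + gap * (X_min[partner] - X_min[base])
```

Each synthetic row lies on the line segment between a minority sample and one of its minority-class neighbours, so SMOTE adds new minority examples rather than duplicating existing ones. Unseen categories at prediction time can fall back to the returned `prior`.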

Predictive modeling

Four machine learning approaches were selected for this study: logistic regression, random forest⁴⁵, gradient boosting machine⁴⁶, and support vector machine⁴⁷. We evaluated the performance of each method and selected the best-performing model for each BPSD subsyndrome¹⁸. Logistic regression, the most common and well-established binary classifier⁴⁸, served as the baseline model, and we evaluated how much the other machine learning models improved performance over this baseline.

To avoid overfitting, hyperparameters were tuned for each machine learning method by random search⁴⁹ with five-fold cross-validation, using binary cross-entropy as the evaluation criterion. For the random forest and gradient boosting machine models, hyperparameters controlling tree complexity were tuned. Because the gradient boosting machine was trained iteratively with stochastic gradient boosting to minimize the loss function, its learning rate and number of trees were also tuned. Support vector machines can use various kernel functions, such as linear, polynomial, and radial basis function kernels⁵⁰. We used a linear kernel rather than a nonlinear kernel such as the radial basis function kernel to limit overfitting in a small dataset and to allow feature importance to be calculated. For the support vector machine models, only the regularization hyperparameter was tuned to determine the optimal model. All selected hyperparameters are listed in Supplementary Table 2. Feature importance analysis was performed to examine the contribution of each feature to predicting the seven BPSD subsyndromes and to rank the top 10 most influential features.
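A scikit-learn sketch of this tuning setup, assuming hypothetical search ranges (the study's actual hyperparameters are in its Supplementary Table 2). `RandomizedSearchCV` with `scoring="neg_log_loss"` performs random search scored by binary cross-entropy over five folds; `subsample < 1.0` makes `GradientBoostingClassifier` use stochastic gradient boosting; and `SVC(kernel="linear", probability=True)` gives a linear-kernel SVM whose coefficients can be ranked and whose probabilities the log-loss scorer needs. Ranking linear models by absolute coefficient and trees by impurity-based importance is one assumed way to obtain the top 10 features, not necessarily the authors' method.

```python
import numpy as np
from scipy.stats import loguniform, randint
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Hypothetical search spaces; logistic regression is the baseline.
CANDIDATES = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"C": loguniform(1e-3, 1e2)}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"max_depth": randint(2, 10), "min_samples_leaf": randint(1, 20)}),
    "gradient_boosting": (GradientBoostingClassifier(subsample=0.8, random_state=0),
                          {"learning_rate": loguniform(1e-3, 0.3),
                           "n_estimators": randint(50, 500),
                           "max_depth": randint(1, 5)}),
    "linear_svm": (SVC(kernel="linear", probability=True, random_state=0),
                   {"C": loguniform(1e-3, 1e2)}),
}


def tune(X, y, n_iter=20, seed=0):
    """Random search with 5-fold CV for every candidate, scored by log loss."""
    fitted = {}
    for name, (model, space) in CANDIDATES.items():
        search = RandomizedSearchCV(model, space, n_iter=n_iter, cv=5,
                                    scoring="neg_log_loss", random_state=seed)
        search.fit(X, y)
        fitted[name] = search
    return fitted


def top_features(estimator, feature_names, k=10):
    """Rank features: impurity importance for trees, |coefficient| for linear models."""
    if hasattr(estimator, "feature_importances_"):
        scores = estimator.feature_importances_
    else:
        scores = np.abs(estimator.coef_).ravel()
    order = np.argsort(scores)[::-1][:k]
    return [(feature_names[i], float(scores[i])) for i in order]
```

Each entry's `best_estimator_` is refit on the full training data, so its test-set performance can be compared directly with the `logistic_regression` baseline. If oversampling is combined with cross-validation, placing it inside each fold (for example, with an imbalanced-learn `Pipeline`) keeps synthetic rows out of the validation folds.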

Statistical analysis

Categorical variables are summarized as numbers of participants (percentages) and continuous variables as means (standard deviations). Two-sample independent t-tests and Fisher's exact tests were used to compare continuous and categorical variables, respectively, between the training and test datasets. The performance of the prediction models was evaluated and compared using accuracy, precision, sensitivity (recall), specificity, F1 score, and area under the receiver operating characteristic curve (AUC).
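For reference, these indices follow directly from the confusion matrix and predicted scores. The dependency-free sketch below uses hypothetical function names; its AUC is the rank-based (Mann-Whitney) form, the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which matches `sklearn.metrics.roc_auc_score` for binary labels.

```python
def _ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, sensitivity, specificity, F1, and AUC for a 0/1 outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # recall
    # AUC = probability that a random positive scores above a random negative (ties = 1/2).
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "auc": _ratio(wins, len(pos) * len(neg)),
    }
```

With labels [1, 1, 0, 0], predictions [1, 0, 0, 1], and scores [0.9, 0.4, 0.3, 0.6], every threshold-based index equals 0.5, while the AUC is 0.75 because three of the four positive-negative pairs are ranked correctly. This illustrates why the AUC is reported alongside the threshold-dependent indices, especially for imbalanced subsyndromes.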

Statistical significance was set at p < 0.05, and all analyses were performed using R, version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) and Python, version 3.7 (Python Software Foundation, Wilmington, USA).
