Early antidepressant treatment response prediction in major depression using clinical and TPH2 DNA methylation features based on a machine learning approach. BMC Psychiatry

Machine Learning


This study was conducted and reported in accordance with standards and guidelines for machine learning in psychiatry [20].

Participants and clinical evaluation

The study included 291 inpatients at a tertiary hospital who were diagnosed with major depressive disorder (MDD). Patient eligibility was determined according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Blood samples were taken before antidepressant treatment.

All patients met the following criteria: Han Chinese; aged 18-65 years; a baseline 17-item Hamilton Depression Rating Scale (HAMD-17) [21] score of 17 or greater; and depressive symptoms lasting at least 2 weeks. All patients were newly diagnosed or had recently relapsed and had been off medication for at least 2 weeks prior to enrollment. All diagnoses were made independently by two psychiatrists with a professional or higher professional background and confirmed by a third psychiatrist. Participants had no other DSM-IV Axis I diagnosis (including substance use disorder, schizophrenia, affective disorder, bipolar disorder, generalized anxiety disorder, panic disorder, or obsessive-compulsive disorder) and had never been diagnosed with a personality disorder or mental retardation. Patients with a history of organic brain syndromes, endocrine disorders, primary organic disorders, or other medical conditions that interfered with psychiatric evaluation were excluded from the study. Other exclusion criteria were hematologic, cardiac, hepatic, or renal disorders; electroconvulsive therapy in the last 6 months; or a manic episode in the last 12 months. Pregnant and lactating women were also excluded from participation.

All participants provided written informed consent, which was approved by the Zhongda Hospital Ethics Committee (2016ZDSYLL100-P01), in accordance with the Declaration of Helsinki.

Demographic and clinical data

Response was defined as a ≥50% reduction in HAMD-17 score from baseline to 2 weeks [22]. Accordingly, after 2 weeks of treatment, participants were divided into two groups: responders and non-responders.
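The response rule above can be sketched in a few lines of Python. This is only an illustration of the ≥50% reduction criterion, not the authors' code; the function name `is_responder` and its parameters are hypothetical.

```python
def is_responder(baseline_hamd17: float, week2_hamd17: float,
                 threshold: float = 0.5) -> bool:
    """Return True when the HAMD-17 score fell by at least `threshold`
    (50% by default) between baseline and week 2."""
    if baseline_hamd17 <= 0:
        raise ValueError("baseline HAMD-17 score must be positive")
    # Relative reduction from baseline, e.g. 24 -> 12 gives 0.5.
    reduction = (baseline_hamd17 - week2_hamd17) / baseline_hamd17
    return reduction >= threshold
```

For example, a patient whose score drops from 24 to 12 counts as a responder, while a drop from 24 to 13 does not.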

Two retrospective self-report questionnaires, the Childhood Trauma Questionnaire (28-item short form, CTQ-SF) and the Life Events Scale (LES), were used to assess childhood adversity and recent stress exposure, respectively. Both scales were administered by the same nurse using consistent scripted language. The LES is a 48-item self-report questionnaire that covers both positive and negative life events experienced in the past year; its items can be divided into positive life events and negative life events (NLES). CTQ-SF scores were dichotomized for use in gene-environment interaction analysis.

The 12 demographic and clinical characteristics considered were age, sex, years of education, marital status, family history, whether the current episode was a first onset, age at onset, number of episodes, duration of illness, and baseline HAMD-17, NLES, and CTQ-SF scores (Supplementary Materials Table 1).

Genetic information

Primers had previously been designed to cover 100 bp upstream and 100 bp downstream of the TPH2 SNPs that showed a significant association with antidepressant response, as well as sequences with a GC content of >20% and CpGs whose methylation showed sex differences [11, 12]. Of the total 24 TPH2 SNPs, only 11 (rs7305115, rs2129575, rs11179002, rs11178998, rs7954758, rs1386494, rs1487278, rs17110563, rs34115267, rs10784941, rs17110489) had sequences that satisfied the criteria for detecting DNA methylation status (Supplementary Material Table 2). The methylation levels of 38 TPH2 CpGs were calculated and expressed as the ratio of the number of methylated cytosines to the total number of cytosines.
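As a small illustration of the ratio described above, the methylation level of one CpG site could be computed as follows. The function name and the choice to return `None` for a site with no cytosine counts are assumptions made for this sketch, not details from the study.

```python
from typing import Optional


def methylation_level(methylated_cytosines: int,
                      total_cytosines: int) -> Optional[float]:
    """Ratio of methylated cytosines to all cytosines at one CpG site.

    Returns None when no cytosines were counted, so that the site can be
    treated as a missing value later on (an assumption of this sketch).
    """
    if methylated_cytosines < 0 or methylated_cytosines > total_cytosines:
        raise ValueError("methylated count must lie between 0 and the total")
    if total_cytosines == 0:
        return None
    return methylated_cytosines / total_cytosines
```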

Handling missing values

The data set, containing 291 observations of 51 variables (12 demographic and clinical features, 38 CpG methylation levels, and 1 response variable), was missing 6% of its entries (see Figure 1). Among the CpG methylation levels, three CpGs (TPH2-7-99, TPH2-7-142, TPH2-7-170) were excluded because more than 45% of their values were missing. The DNA methylation data were assumed to be missing completely at random (MCAR) or missing at random (MAR), given the randomness of experimental/technical errors and the interrelationship of variables, so the missing data could be handled by mean imputation [23, 24]. Missing values in the other features were imputed with the mode for categorical features and the mean for numeric features.
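The missing-value handling described above can be sketched with the Python standard library. This is a minimal illustration rather than the authors' implementation (the study used R); the table layout (a dict of columns with `None` for missing entries), the function names, and the example column names are all hypothetical.

```python
from statistics import mean, mode


def missing_fraction(column):
    """Share of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def impute_column(column, categorical=False):
    """Fill missing entries with the mode (categorical) or mean (numeric)."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mode(observed) if categorical else mean(observed)
    return [fill if value is None else value for value in column]


def prepare(table, cpg_columns=(), categorical_columns=(), max_missing=0.45):
    """Drop CpG columns with more than `max_missing` missing values,
    then impute whatever is still missing in the remaining columns."""
    cleaned = {}
    for name, column in table.items():
        if name in cpg_columns and missing_fraction(column) > max_missing:
            continue  # too sparse to impute, excluded from the analysis
        cleaned[name] = impute_column(column, name in categorical_columns)
    return cleaned
```

A call such as `prepare(table, cpg_columns={"cpg_a", "cpg_b"}, categorical_columns={"sex"})` would return the table without the overly sparse CpG columns and with every remaining gap filled.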

Figure 1. Missing-data patterns in the DNA methylation dataset

Classification modeling using machine learning algorithms

Normalization (a linear transformation) was used to improve the model's numerical stability and reduce training time [25]. To avoid overfitting while making use of the maximum amount of data, we reported predictive performance from cross-validation (CV) on the entire sample. CV was 5-fold, and the mean predictive measures reported were the area under the receiver operating characteristic curve (AUC), F-measure, G-mean, precision, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Hyperparameters were tuned on AUC by random search using the default tuning settings in caret. A wrapper method (recursive feature elimination with random forest, RFE-RF) [26] with 5-fold CV was used to select the features that contributed most to predicting early antidepressant response in patients with MDD. Variable importance was also estimated using random forest. The 5-fold CV procedure was repeated 10 times for better reproducibility.
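Several of the reported measures follow directly from a confusion matrix, as the sketch below shows. It is not the authors' code, and two definitions are assumptions because the text does not spell them out: the F-measure is taken to be F1 (the harmonic mean of PPV and sensitivity), and the G-mean is taken to be the geometric mean of sensitivity and specificity. The function name and the example counts are hypothetical.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-based metrics from confusion-matrix counts, with
    responders as the positive class. Assumes every denominator is
    non-zero, i.e. both classes and both predictions occur."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)  # F1 (assumed)
    g_mean = sqrt(sensitivity * specificity)                 # assumed definition
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }
```

In a repeated CV setting, these values would be computed on each held-out fold and then averaged.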

ML methods were implemented in a standardized and reproducible manner through the open-source R package 'caret'. Five supervised ML algorithms were used to develop predictive models: logistic regression, classification and regression trees (CART), support vector machines with a radial basis function kernel (SVM-RBF), a boosting method (LogitBoost), and random forests (RF). All analyses were performed in R statistical software (version 4.0.4). Through the caret package, we used the rpart, caTools, e1071, and randomForest packages for CART, LogitBoost, SVM-RBF, and RF, respectively.
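The resampling scheme under which these models were evaluated (5-fold CV repeated 10 times) can be sketched independently of caret. The generator below is a plain-Python illustration with hypothetical names; it shuffles indices without stratifying by outcome, which is a simplification and may differ from how caret builds its folds.

```python
import random


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 10, seed: int = 0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for fold_id, test_idx in enumerate(folds):
            train_idx = [
                idx
                for other_id, fold in enumerate(folds)
                if other_id != fold_id
                for idx in fold
            ]
            yield repeat, fold_id, train_idx, test_idx
```

With 291 samples this produces 50 train/test splits, each holding out 58 or 59 patients.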


