Participants
Participants were deceased individuals who underwent autopsy at the São Paulo Autopsy Service (SVOC) from 2004 to 2019 and whose brains were donated to the Biobank for Aging Research at the University of São Paulo (BAS-USP) [5, 10, 11]. SVOC is a community-based autopsy service for natural (non-traumatic) deaths and does not perform autopsies in forensic cases. In Brazil, autopsies are mandatory when the cause of a non-traumatic death is unclear because of a lack of medical assistance or insufficient information available before death, and they are free of charge to the family. All autopsies were performed by a pathologist assisted by a nationally certified technician [10].
Cases were randomly selected between 7 am and 5 pm on weekdays, and families were invited to consent to participation in the study while the autopsy was being completed. After consent was given, clinical and functional interviews were conducted in private.
Inclusion criteria for the BAS-USP were age ≥50 years at the time of death and the availability of a knowledgeable informant, defined as a next of kin who had at least weekly contact with the deceased. Exclusion criteria were: (i) brain tissue unsuitable for neuropathological analysis (e.g., cerebrospinal fluid (CSF) pH <6.5 or significant acute brain pathology such as hemorrhage or tumor) and (ii) inconsistent or conflicting clinical data provided by informants during the clinical interview. All protocols, informed consent forms, and procedures of the BAS-USP complied with international and Brazilian regulations on research involving human subjects [5, 10, 11] and were approved by local and federal research committees.
Full body autopsy report
A pathologist performed a full body autopsy according to established protocols to identify the cause of death (CoD) [8, 12]. First, an external examination of the body was performed, followed by internal examinations of the cranial, thoracic, and abdominal cavities. The pathologist measured the volume of fluids and blood, examined the integrity and margins of the anatomical structures (appearance and position of the organs), and looked for adhesions and obstructions of cavities, lesions, and hemorrhages according to the general principles of pathological autopsy. Samples of abnormal parts of organs such as kidneys, spleen, lungs, liver, heart, and brain were collected for anatomo-pathological analysis. Descriptions of the circumstances leading up to death and of previous medical conditions were also compiled from next of kin. The pathologist was blinded to study group allocation (MDD or non-MDD).
Because death may be a multifactorial event, all autopsy reports were structured hierarchically according to the sequence of events that led to death, recording the CoD, defined as the last event that led to the person's death, and up to four related causes of death (CrD). In our analysis, we considered the CoD and at least three CrD documented in the autopsy report, classified according to the World Health Organization's International Statistical Classification of Diseases, 10th Revision (ICD-10) [8, 13].
We grouped the CoD and CrD from the autopsy reports into two sets of variables. One set contained diseases grouped according to body systems or conditions, generating variables that correspond to ICD-10 chapters. Examples include tumors, blood cell disorders, endocrine and nutritional disorders, cardiovascular disorders, respiratory disorders, digestive disorders, and genitourinary disorders. The other set consisted of diseases grouped by ICD-10 subcategories, for example respiratory tumors and digestive tumors.
Clinical evaluation
Date and time of death, age, sex, ethnicity (white or non-white), and education (illiterate, 1–4 years, or ≥5 years) were collected from the full autopsy report. Other information was obtained, after consent from next of kin, through a validated semi-structured clinical and functional interview applied by a trained geriatrician [14]. The interview covered demographic characteristics, neuropsychiatric symptoms, cognitive ability, and clinical history, including the deceased's lifetime history of MDD and their clinical and functional status 3 months before death.
A diagnosis of lifetime MDD was made using the informant portion of Axis I of the Structured Clinical Interview for DSM-IV Disorders (SCID) [15] and ascertained according to DSM-5 criteria. Depression was diagnosed based on the symptoms present during the most severe episode in life. Participants were classified as having late-life depression (LLD) if their first MDD episode occurred after age 60 and as having recurrent depression (RD) if they had had at least two depressive episodes.
We used the informant portion of the Clinical Dementia Rating scale (CDR) [16], validated for postmortem use [14], to assess cognitive impairment. A CDR >0.5 was considered to indicate cognitive impairment [16, 17].
Dataset
Data obtained from autopsy reports and clinical evaluations were integrated into a unified tabular format. New variables were introduced to represent each individual's cause of death (CoD) and related causes of death (CrD): 7 binary numeric variables representing diseases classified according to ICD-10 chapters and 25 binary numeric variables representing diseases grouped by ICD-10 subcategories, as shown in Supplementary Table 1. Each input variable indicates the presence (value = 1) or absence (value = 0) of a disease related to the individual's CoD or CrD, according to the ICD-10 code of the disease in the autopsy report.
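As an illustration of how such binary chapter variables can be derived from ICD-10 codes, the following minimal Python sketch maps a list of CoD and CrD codes to seven 0/1 indicators. The chapter ranges are those of the WHO ICD-10; the variable names and the helper functions are hypothetical and are not taken from the study, whose actual coding is given in Supplementary Table 1.

```python
import re

# WHO ICD-10 chapter ranges for the seven disease categories named in the text.
# Variable names are hypothetical, chosen for this illustration only.
CHAPTER_RANGES = {
    "neoplasms": ("C00", "D48"),
    "blood": ("D50", "D89"),
    "endocrine_nutritional": ("E00", "E90"),
    "circulatory": ("I00", "I99"),
    "respiratory": ("J00", "J99"),
    "digestive": ("K00", "K93"),
    "genitourinary": ("N00", "N99"),
}


def category_key(code):
    """Reduce an ICD-10 code such as 'I21.9' to its three-character category 'I21'."""
    match = re.match(r"^([A-Z])(\d{2})", code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10 code: {code!r}")
    return match.group(1) + match.group(2)


def chapter_indicators(codes):
    """Return one 0/1 variable per chapter: 1 if any code falls in its range."""
    keys = [category_key(c) for c in codes]
    return {
        name: int(any(low <= key <= high for key in keys))
        for name, (low, high) in CHAPTER_RANGES.items()
    }


# Hypothetical individual: one CoD code followed by two CrD codes.
row = chapter_indicators(["I21.9", "E11.9", "J18.9"])
print(row)
```

Comparing the three-character keys as strings works here because every key has the same shape (one letter followed by two digits), so lexicographic order matches the order of the classification.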
Participants were classified by the presence (value = 1) or absence (value = 0) of MDD according to the criteria presented in section 2.3.
ML algorithms generally benefit from balanced data in classification tasks. To achieve this balance, we used a matching procedure that produced equal numbers of participants in the MDD and control groups, so that the ML algorithms had equally representative examples of both classes. Matching used only the following variables: age, sex, cognitive impairment, education, and ethnicity.
Each MDD patient was paired with one control without MDD by a computer-run algorithm that applied the following criteria hierarchically: (1) age at death, (2) age at death ± 4 years, (3) sex, (4) cognitive impairment, (5) education, and (6) ethnicity (Figure 1). For each participant with MDD, the algorithm first attempted to find a control with the same age at death, sex, cognitive impairment, education, and ethnicity. If no such match was identified, the algorithm searched for a control whose age at death differed by no more than 4 years while all other criteria remained the same. After this iterative process, all 232 depressed patients were successfully matched with a control.

Figure 1. Flowchart of the algorithm used to pair MDD patients with controls. Matching criteria were: (1) age at death, (2) age at death ± 4 years, (3) sex, (4) cognitive impairment, (5) education, and (6) ethnicity.
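The two-pass matching described above can be sketched in Python as follows. This is an illustrative reconstruction, not the study's code: the field names, the rule that each control is used at most once, and the preference for the closest age in the relaxed pass are all assumptions made for the example.

```python
def match_controls(cases, controls, age_window=4):
    """Pair each case with one unused control.

    Pass 1 looks for an identical age at death; pass 2 relaxes age to
    +/- age_window years. Sex, cognitive impairment, education, and
    ethnicity must be identical in both passes.
    """
    exact_keys = ("sex", "cognitive_impairment", "education", "ethnicity")
    available = list(controls)
    pairs, unmatched = [], []

    for case in cases:
        same_profile = [
            c for c in available if all(c[k] == case[k] for k in exact_keys)
        ]
        # Pass 1: same age at death.
        candidates = [c for c in same_profile if c["age"] == case["age"]]
        if not candidates:
            # Pass 2: age at death within the tolerance window.
            candidates = [
                c for c in same_profile
                if abs(c["age"] - case["age"]) <= age_window
            ]
            # Assumption: prefer the closest age among relaxed candidates.
            candidates.sort(key=lambda c: abs(c["age"] - case["age"]))
        if candidates:
            chosen = candidates[0]
            available.remove(chosen)  # assumption: controls are not reused
            pairs.append((case["id"], chosen["id"]))
        else:
            unmatched.append(case["id"])
    return pairs, unmatched


# Hypothetical records, for illustration only.
profile_f = {"sex": "F", "cognitive_impairment": 0, "education": "1-4", "ethnicity": "white"}
profile_m = {"sex": "M", "cognitive_impairment": 1, "education": ">=5", "ethnicity": "non-white"}
cases = [dict(id="M1", age=70, **profile_f), dict(id="M2", age=82, **profile_m)]
controls = [
    dict(id="C1", age=73, **profile_f),
    dict(id="C2", age=70, **profile_f),
    dict(id="C3", age=80, **profile_m),
]
pairs, unmatched = match_controls(cases, controls)
print(pairs, unmatched)
```

In this toy input, M1 finds an exact-age control (C2) in the first pass, while M2 is matched only after the age criterion is relaxed (C3, two years younger).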
Data analysis
To compare participants with depression and the control group, paired-samples t-tests were used for continuous variables and McNemar's test for categorical variables. The significance level was set at 0.05 for two-tailed tests, with Bonferroni correction for multiple testing. Statistical analyses were performed using the Statistical Package for Social Sciences (SPSS), version 20.0.
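For readers unfamiliar with McNemar's test on matched pairs, the sketch below computes its exact (binomial) two-sided p-value from the two discordant-pair counts and compares it with a Bonferroni-adjusted threshold. The analyses in this study were run in SPSS; the function names, the choice of the exact form of the test, and the counts in the example are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    b: pairs in which only the case has the condition
    c: pairs in which only the control has the condition
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: 15 pairs where only the case is affected, 5 the reverse.
p_value = mcnemar_exact(15, 5)
threshold = bonferroni_threshold(0.05, 7)  # e.g. seven variables tested
print(p_value, threshold, p_value < threshold)
```

With these made-up counts the unadjusted p-value falls below 0.05 but not below the Bonferroni-adjusted threshold, which shows how the correction tightens the criterion when several variables are tested.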
We evaluated 11 established ML algorithms [18,19,20] to differentiate individuals with MDD, LLD, and RD from controls based on CoD and CrD. The advantage of ML over traditional inferential statistics is its ability to look for patterns in heterogeneous, multivariate data independently of the specific data distribution. ML methods make few formal assumptions and let the data speak for themselves, mining structured knowledge from extensive data [21]. They focus on prediction, using general-purpose learning algorithms to discover patterns [22], and multivariate analysis improves sensitivity and generalizability even with diverse data. The ML algorithms took the ICD-10 disease category and subcategory variables as input and created models that differentiated participants with depression (MDD and the LLD and RD subgroups) from controls, capturing complex multivariate nonlinear relationships between these variables.
The algorithms applied in this study were logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), multilayer perceptron (MLP), AdaBoost (AD), gradient boosting (GB), extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), and naive Bayes (NB). All were run from Python libraries (Python version 3.7.10) with default parameters.
To estimate the performance of each algorithm, we applied stratified 10-fold cross-validation, a method that evaluates a predictive model by splitting the dataset into a training set for building the model and a test set for evaluating it. The data are randomly split into 10 subsamples (folds), each containing the same proportion of observations from each class (MDD, LLD, or RD and their respective controls). The process is repeated 10 times, each time using a different fold as the test set and the remaining folds as the training set. We report the average accuracy over all test sets for each ML algorithm.
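The stratified 10-fold procedure can be sketched in plain Python as below. This is a simplified illustration rather than the study's pipeline: the functions `stratified_folds` and `cross_validated_accuracy`, the toy data, and the one-rule stand-in classifier are all hypothetical, and a real analysis would plug in one of the 11 algorithms listed above.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def cross_validated_accuracy(features, labels, fit, predict, n_folds=10, seed=0):
    """Mean test accuracy (in percent) over all folds."""
    scores = []
    for test_idx in stratified_folds(labels, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        scores.append(100.0 * correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy balanced data: 20 cases (label 1) and 20 controls (label 0),
# each described by two binary disease indicators.
labels = [1] * 20 + [0] * 20
features = [[1, 0]] * 20 + [[0, 1]] * 20


def fit(train_x, train_y):
    return None  # stand-in classifier with nothing to learn


def predict(model, x):
    return x[0]  # predicts "case" when the first indicator is present


folds = stratified_folds(labels)
accuracy = cross_validated_accuracy(features, labels, fit, predict)
print(accuracy)
```

Because the toy data are perfectly separable by the first indicator, the stand-in rule classifies every held-out sample correctly; the point of the example is the fold construction, in which every fold receives the same number of cases and controls.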
Accuracy is a standard, easily interpreted metric for evaluating classification algorithms. It is calculated by dividing the number of correct predictions by the total number of predictions and multiplying by 100 to obtain a percentage. Because each depression group (MDD, LLD, and RD) has the same number of participants as its control group, accuracy is a reliable indicator [23].
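Written as a formula, with the symbols $N_{\text{correct}}$ and $N_{\text{total}}$ introduced here only for illustration:

```latex
\[
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%
\]
```

Because each depression group and its control group are the same size, a classifier that always predicts the same class would score 50%, which is the baseline against which the reported accuracies can be read.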
We conducted two separate studies applying the ML algorithms to MDD, LLD, and RD. In the first study, the disease category variables were the input to the ML algorithms. In the second, the subcategory variables were the input. Each study therefore produced a separate set of results for each depression type (MDD, LLD, RD), giving a total of six sets of results: MDD, LLD, and RD for both category and subcategory variables.
