Data collection
The SCI datasets GSE5296, GSE47681, GSE151371, and GSE45550 were downloaded from the Gene Expression Omnibus (GEO) database12 (https://www.ncbi.nlm.nih.gov/geo/) via the R package GEOquery13. Detailed information is shown in Table 1.
Glucocorticoid receptor-related genes (GRRGs) were retrieved from the GeneCards database14 (https://www.genecards.org/) using the search term “Glucocorticoid receptor”, filtering for “protein coding” genes with a relevance score >2, which resulted in 352 GRRGs. These genes were converted to their mouse counterparts using the R package homologene, generating 335 mouse GRRGs. An additional 492 mouse GRRGs were sourced from the UniProt database15 (https://www.uniprot.org/). After merging the two sets and removing duplicates, a final set of 761 GRRGs was compiled, as summarized in Table S1.
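The merge-and-deduplicate step can be illustrated with a minimal sketch. The gene names below are hypothetical placeholders, not the actual GeneCards or UniProt lists, and the helper name merge_gene_sets is introduced here only for illustration.

```python
def merge_gene_sets(*gene_lists):
    """Merge gene lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for genes in gene_lists:
        for gene in genes:
            symbol = gene.strip()
            if symbol and symbol not in seen:
                seen.add(symbol)
                merged.append(symbol)
    return merged


# Hypothetical toy lists standing in for the two mouse GRRG sources.
genecards_mouse = ["GeneA", "GeneB", "GeneC"]
uniprot_mouse = ["GeneB", "GeneA", "GeneD"]

merged = merge_gene_sets(genecards_mouse, uniprot_mouse)
print(merged)  # ['GeneA', 'GeneB', 'GeneC', 'GeneD']
```

With the counts reported above, merging 335 and 492 genes into 761 unique entries would imply 66 genes shared by the two sources.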
Key regulatory genes of 13 programmed cell death (PCD) pathways were collected from published studies16, resulting in a total of 884 PCD-related genes. The final list of genes is available in Table S2.
To eliminate batch effects, we combined the GSE5296 and GSE47681 datasets using the R package sva17. The resulting integrated dataset consisted of 79 SCI samples and 51 control samples. Standardization was performed with the R package limma18, followed by probe annotation and normalization. Principal component analysis (PCA)19 was carried out to verify the effectiveness of batch effect removal, as shown in Figure S1.
Differentially expressed genes associated with glucocorticoid receptor (GR) in SCI (GRRDEG)
Samples of the integrated GEO dataset were categorized into SCI and control groups. Differential expression analysis was performed using the R package limma, with thresholds of |logFC| >0.5 and p-value <0.05 to identify differentially expressed genes (DEGs). The Benjamini–Hochberg (BH) method was used for p-value correction. Volcano plots showing the differential expression results were generated with the R package ggplot2.
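The thresholding logic can be sketched as follows. This is an illustration rather than the limma workflow itself: it assumes that the 0.05 cutoff is applied to the BH-adjusted p-values (the text does not state this explicitly), and the input tuples are hypothetical toy values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=0.5, alpha=0.05):
    """results: list of (gene, logFC, raw p-value) tuples."""
    adjusted = bh_adjust([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, adjusted)
        if abs(lfc) > lfc_cutoff and q < alpha
    ]


# Hypothetical toy results, not taken from the study.
toy_results = [
    ("GeneA", 1.2, 0.001),
    ("GeneB", -0.8, 0.01),
    ("GeneC", 0.3, 0.002),   # fails the |logFC| cutoff
    ("GeneD", 0.9, 0.2),     # fails the p-value cutoff
]
degs = select_degs(toy_results)
print(degs)  # ['GeneA', 'GeneB']
```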
Glucocorticoid receptor-related differentially expressed genes (GRRDEGs) were identified by intersecting the GRRGs and DEGs. Venn diagrams were created to show the overlap, and GRRDEG expression was visualized in a heatmap generated using the R package pheatmap.
Functional analysis of GRRDEGs in SCI
A genomic map featuring 24 chromosomes was created using the RCircos package. The functional roles of the GRRDEGs were assessed through Kyoto Encyclopedia of Genes and Genomes (KEGG)20,21,22 and Gene Ontology (GO) analyses, the latter covering biological processes (BP), cellular components (CC), and molecular functions (MF). The key genes were further analyzed using the Metascape platform. The protein-protein interaction (PPI) network of key genes was constructed via the STRING database23 (https://string-db.org/) with an interaction score >0.400. The network was visualized with Cytoscape (version 3.9.0), and potential hub genes were identified using five cytoHubba algorithms24,25: Maximal Clique Centrality (MCC), Degree, Stress, Edge Percolated Component (EPC), and Closeness26. Based on the scores, the top 20 GRRDEGs were selected, and Venn diagrams were generated to highlight overlapping genes for further analysis.
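As a rough illustration of the hub-gene step, the sketch below filters a PPI edge list by interaction score, ranks genes by the simplest of the five measures (Degree), and intersects several top-N lists. It does not reproduce cytoHubba; the edge list, the function names, and the toy rankings are hypothetical, and each gene pair is assumed to appear only once.

```python
from collections import Counter


def top_hubs_by_degree(edges, min_score=0.4, top_n=20):
    """Rank genes by degree after keeping edges with score > min_score."""
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score > min_score and gene_a != gene_b:
            degree[gene_a] += 1
            degree[gene_b] += 1
    # Sort by descending degree, breaking ties alphabetically.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]


def overlapping_hubs(rankings):
    """Genes present in every ranking (the centre of the Venn diagram)."""
    gene_sets = [set(r) for r in rankings]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical toy network: (gene, gene, combined interaction score).
toy_edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.7),
    ("A", "D", 0.5),
    ("B", "C", 0.45),
    ("C", "D", 0.3),  # dropped: score not above 0.4
]
hubs = top_hubs_by_degree(toy_edges, top_n=2)
print(hubs)  # ['A', 'B']

shared = overlapping_hubs([["A", "B"], ["A", "C"], ["A", "B", "C"]])
print(shared)  # {'A'}
```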
Immune cell infiltration analysis
CIBERSORT27, a method based on linear support vector regression, was applied to deconvolute the transcriptome expression matrix and estimate the composition and abundance of immune cells in mixed cell populations. Immune cell infiltration scores were calculated with the CIBERSORT algorithm using a mouse-specific gene signature matrix. The percentages of immune cells were visualized in bar plots. Differences in the proportions of infiltrating immune cells between SCI and control samples were visualized via the R package ggplot2, with only results with a p-value <0.05 being retained. Immune cells with significant differences in proportion were selected for subsequent analysis. Spearman's correlation analysis was performed to assess the relationships between the proportions of immune cells, and a heatmap of these correlations was created via the R package pheatmap. The correlation between hub gene expression levels and the proportions of infiltrating immune cells was calculated and visualized in the same way.
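For reference, Spearman's correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The following is a minimal standard-library sketch; the two input vectors are hypothetical stand-ins for a hub gene's expression values and an immune cell proportion.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values.
cell_fraction = [0.10, 0.25, 0.05, 0.40]
gene_expression = [2.1, 3.0, 1.5, 2.8]
rho = spearman(cell_fraction, gene_expression)
print(round(rho, 3))  # 0.8
```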
Unsupervised clustering of SCI samples
Unsupervised clustering of the 79 mouse SCI samples was performed on the GRRDEG expression profiles using the R package ConsensusClusterPlus28, applying the partitioning around medoids (PAM) algorithm. The optimal number of clusters was determined based on the consensus score (>0.9), the cumulative distribution function (CDF) curve, and assessment of the consensus matrix, with up to six subtypes being considered.
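The consensus matrix underlying this assessment records, for each pair of samples, the fraction of resampling runs in which both samples were drawn and assigned to the same cluster. The sketch below computes it from precomputed cluster labels; it does not implement PAM or the resampling itself, and the sample names and labels are hypothetical.

```python
from itertools import combinations


def consensus_matrix(samples, runs):
    """runs: list of {sample: cluster label} dicts, one per resampling run.

    Returns {(sample_a, sample_b): consensus index} for every pair that
    was drawn together in at least one run.
    """
    pairs = list(combinations(samples, 2))
    drawn_together = {pair: 0 for pair in pairs}
    clustered_together = {pair: 0 for pair in pairs}
    for labels in runs:
        for a, b in pairs:
            if a in labels and b in labels:
                drawn_together[(a, b)] += 1
                if labels[a] == labels[b]:
                    clustered_together[(a, b)] += 1
    return {
        pair: clustered_together[pair] / drawn_together[pair]
        for pair in pairs
        if drawn_together[pair] > 0
    }


# Hypothetical toy runs; a sample missing from a dict was not drawn in that run.
toy_samples = ["s1", "s2", "s3"]
toy_runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 1, "s2": 1},
    {"s1": 0, "s2": 1, "s3": 0},
]
consensus = consensus_matrix(toy_samples, toy_runs)
print(consensus[("s1", "s3")])  # 0.5
```

Values close to 0 or 1 indicate stable assignments, which is the property that the CDF curve of the consensus matrix summarizes for each candidate number of clusters.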
GSVA of the 13 cell death pathways
Gene set variation analysis (GSVA)29 was performed on the expression matrix via the R package GSVA. This analysis calculates an enrichment score for each gene set, reflecting the activity of the different cell death pathways across samples. Default parameters, including Gaussian kernel density estimation, were used. A GSVA score matrix for all samples was obtained, with rows representing the 13 cell death pathways and columns representing samples.
WGCNA
The WGCNA algorithm30 was employed to identify potentially important genes. An appropriate threshold was determined to construct the weighted adjacency matrix and the topological overlap matrix (TOM). Gene modules were analyzed based on their correlation with the target phenotype. Module membership (MM) represents the correlation between a gene and its module, whereas gene significance (GS) reflects the correlation between a gene and the target phenotype. The WGCNA results were intersected to identify important genes within the co-expression modules for subsequent validation and analysis.
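To make the two matrices concrete, the sketch below computes an unsigned weighted adjacency (absolute correlation raised to a power) and the corresponding topological overlap for a small correlation matrix. The power beta = 2 and the correlation values are assumptions chosen for illustration; the power used in the study is not stated here, and the WGCNA package additionally offers signed variants that are not shown.

```python
def adjacency(cor, beta):
    """Unsigned weighted adjacency: a_ij = |cor_ij| ** beta, diagonal set to 1."""
    n = len(cor)
    return [
        [1.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij sums a_iu * a_uj over the other genes u, and k_i is the
    connectivity of gene i (sum of its adjacencies, excluding itself).
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Hypothetical 3-gene correlation matrix.
toy_cor = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
toy_adj = adjacency(toy_cor, beta=2)
toy_tom = topological_overlap(toy_adj)
print(round(toy_tom[0][1], 3))  # 0.584
```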
Bayesian pathway enrichment analysis
To infer pathway interaction networks, the CBNplot package31 was used. Interaction direction was calculated using the sequencing data of the 79 SCI mice from the combined dataset as background. The bngeneplot function was used to generate a graphical representation, with the filtering threshold set to 0.90.
ML model construction
To construct diagnostic models, 12 machine learning (ML) algorithms were employed, including Lasso, Ridge, Stepglm, XGBoost, random forest (RF), elastic net (Enet), partial least-squares regression for generalized linear models (plsRglm), gradient boosting machine (GBM), naive Bayes, discriminant analysis, and support vector machine (SVM). A total of 113 combinations of these algorithms were tested for variable selection and model development using the training dataset (the integrated GEO dataset) with 10-fold cross-validation. To reduce the risk of overfitting, Lasso regularization was applied during feature selection, reducing model complexity by penalizing uninformative variables. A meta-cohort (GSE151371 and GSE45550) served as an external test set to validate model performance. The model that achieved the highest average area under the receiver operating characteristic (ROC) curve (AUC) across the training and test sets was selected as the best-performing model32.
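The selection criterion can be sketched as follows: compute the AUC of each candidate model in each cohort, average across cohorts, and keep the model with the highest mean. The AUC here is computed from its pairwise definition (the probability that a case scores higher than a control, with ties counted as one half). Model names, labels, and scores are hypothetical; model fitting and the 10-fold cross-validation are not shown.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_model(auc_by_model):
    """auc_by_model: {model name: [AUC in each cohort]}."""
    return max(
        auc_by_model,
        key=lambda name: sum(auc_by_model[name]) / len(auc_by_model[name]),
    )


# Hypothetical toy predictions for one cohort (1 = SCI, 0 = control).
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(toy_auc)  # 0.75

# Hypothetical AUCs as [training cohort, external test cohort].
toy_aucs = {
    "model_1": [0.95, 0.80],
    "model_2": [0.90, 0.88],
    "model_3": [0.99, 0.70],
}
print(best_model(toy_aucs))  # model_2
```

Averaging across cohorts favors models that generalize: in the toy example, model_3 has the best training AUC but the lowest mean.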
Building a predictive model
The nomogram model was built with the R package rms33 to assess the development of SCI using four important predictive genes. “Total Points” refers to the sum of the individual points assigned to each predictor variable. Decision curve analysis (DCA) and calibration plots were used to assess the predictive performance of the model. Group comparison plots based on hub gene expression levels were created to further explore the differences between SCI and control samples in the integrated GEO dataset.
Animal research
Healthy female skin mice aged 7-8 weeks were housed under the standard conditions provided by Slike Jingda Laboratory Animal Co. Ltd. All animal procedures were carried out in accordance with the guidelines of the International Association for the Study of Pain (IASP) and approved by the Ethics Committee of Gannan Medical University. The SCI and control groups consisted of six mice each.
Establishment of SCI mouse model
Anesthesia was induced with 1% pentobarbital solution (40 mg/kg) via intraperitoneal injection. The mice were kept on a heating pad during the procedure. After the skin incision was made, the soft tissue at the T8-T10 thoracic levels was separated. Steel clips were inserted under the transverse processes and a laminectomy was performed at T9. The dorsal spinal cord was cut to a depth of approximately 2 mm. Hemostasis was achieved using a gelatin sponge. Sham-operated mice underwent the same procedure except for the spinal cord transection. Tissue samples for further experiments were collected 1 week after surgery from 8-10 mm segments surrounding the injury site. Basso Mouse Scale (BMS) locomotor scores and mechanical pain thresholds of the mice over 7 days are shown in Figure S2.
RNA isolation and quantitative real-time polymerase chain reaction (qRT-PCR)
RNA was extracted using TRIzol reagent (TransGen Biotech, Beijing, China). OPN cDNA was synthesized from total RNA using a fast RT kit (Tiangen, Shanghai, China). qRT-PCR was performed with SYBR Premix Ex Taq (Takara) according to the manufacturer's instructions. GAPDH served as the internal control, and primer sequences are listed in Table S3. Relative gene expression levels were calculated using the 2^-ΔΔCt method.
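The 2^-ΔΔCt calculation can be written out as a short sketch. The Ct values are hypothetical; in practice the control ΔCt would typically be the mean across control animals.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method.

    dCt  = Ct(target gene) - Ct(reference gene, here GAPDH)
    ddCt = dCt(sample) - dCt(control)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# (relative to GAPDH) in the SCI sample than in the control.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```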
Statistical analysis
All data processing and analysis were performed using R software (version 4.3.0). Unless otherwise specified, the statistical significance of differences between two groups of normally distributed variables was assessed using the independent Student's t-test. Non-normally distributed variables were compared using the Mann-Whitney U test (Wilcoxon rank-sum test). Comparisons involving three or more groups were analyzed with the Kruskal-Wallis test. Spearman correlation analysis was performed to calculate the correlation coefficients between gene expression levels. Unless otherwise stated, all statistical tests were two-sided, with p-values <0.05 considered statistically significant.
Software and package versions. All calculations were performed in R (R Foundation for Statistical Computing, v4.3.0, https://www.r-project.org). Gene expression matrices were obtained with GEOquery (Gene Expression Omnibus query, v2.69., https://bioconductor.org/packages/geoquery). Differential analysis used limma (Linear Models for Microarray and RNA-seq Data, v3.58.1, https://bioconductor.org/limma), and co-expression networks were constructed using WGCNA (Weighted Gene Co-expression Network Analysis, v1.72-3, https://cran.r-project.org/package=wgcna). Consensus clustering used ConsensusClusterPlus (https://bioconductor.org/packages/ConsensusClusterPlus). Machine learning models were trained with randomForest (Breiman and Cutler's Random Forests, v4.7-1.1), glmnet (Lasso and Elastic-Net GLMs, v4.1-9), xgboost (Extreme Gradient Boosting, v1.6), gbm (Generalized Boosted Regression Models, v2.1-8), mboost (Model-Based Boosting, v2.9-7), plsRglm (Partial Least Squares Regression for GLMs, v1.2.4), and e1071 (TU Wien, v1.7-16). ROC analysis used pROC (v1.18.5), heatmaps were drawn with pheatmap (v1.0.12), circular plots with RCircos (v1.2.1), and general graphics with ggplot2 (Grammar of Graphics, v3.5.0). Bayesian network inference used CBNplot (https://bioconductor.org/packages/cbnplot), and networks were visualized in Cytoscape (https://cytoscape.org). Immune cell composition was inferred with CIBERSORTx (v1.0, http://ciberSortx.stanford.edu), and functional enrichment of gene lists was performed in Metascape (v3.5, https://metascape.org).
