Data Sources and Preprocessing
This study included gene expression information from 1147 LUAD patients in six published cohorts, which served as training and validation cohorts (TCGA-LUAD cohort, n = 594; GSE7670 cohort, n = 57; GSE3141 cohort, n = 58; GSE31210 cohort, n = 226; GSE30219 cohort, n = 85; GSE50081 cohort, n = 127). mRNA expression data, clinical information, and genomic mutation data for TCGA-LUAD patients were downloaded from https://cancergenome.nih.gov/. Genomic information and clinical data for GEO-LUAD patients (GSE7670, GSE3141, GSE31210, GSE30219, and GSE50081) were collected from https://www.ncbi.nlm.nih.gov/geo/. TP53-, KRAS-, LKB1-, and EGFR-mutant subgroups of TCGA-LUAD tumor samples were identified based on the mutation status obtained from the cBioPortal database (https://www.cbioportal.org).
Additionally, 29 LUAD tissue samples from Shandong Provincial Hospital were used to verify the relationship between DDX56 expression and patient prognosis. This study was reviewed and approved by the Biomedical Research Ethics Committee of Shandong Province (ethical approval number: SWYX2024-464). The Ethics Committee approved the exemption from informed consent for the following reasons: the project only involves research on past medical records, with no intervention in or handling of subjects during the process. During follow-up, subjects were informed that their prognostic information could be used in clinical studies. Furthermore, the project is a retrospective study and does not affect participants' rights or health. This study did not disclose participants' personal privacy or identity information.
Identification and prognostic analysis of differentially expressed RBPs in LUAD
In total, 2334 RBPs were obtained from previous studies6,7. The Supplementary data present a list of these RBPs. Differential expression analysis of mRNA levels in TCGA-LUAD and GSE7670-LUAD patients was performed using the “limma” package in R, with thresholds of |logFC| > 5 and adjusted p-value < 0.001. The differentially expressed genes were then intersected with the RBP list to obtain differentially expressed RBPs. Prognosis-related RBPs were identified through univariate Cox regression analysis (p < 0.05).
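The filtering and intersection step described above can be sketched as follows. This is only an illustration in Python, not the R/limma workflow used in the study: the helper `select_de_rbps`, the gene names, and the values are hypothetical, and the input table is assumed to already hold the log fold changes and adjusted p-values produced by the differential expression analysis.

```python
from typing import Dict, List, Set, Tuple


def select_de_rbps(
    de_table: Dict[str, Tuple[float, float]],
    rbp_list: Set[str],
    logfc_cutoff: float,
    adj_p_cutoff: float,
) -> List[str]:
    """Keep genes with |logFC| > cutoff and adjusted p < cutoff,
    then intersect them with the RBP list."""
    de_genes = {
        gene
        for gene, (logfc, adj_p) in de_table.items()
        if abs(logfc) > logfc_cutoff and adj_p < adj_p_cutoff
    }
    return sorted(de_genes & rbp_list)


# Hypothetical toy input: gene -> (logFC, adjusted p-value)
toy_table = {
    "GENE_A": (5.6, 1e-5),   # passes both thresholds, is an RBP
    "GENE_B": (-6.1, 2e-4),  # passes both thresholds (down-regulated), is an RBP
    "GENE_C": (5.2, 0.01),   # fails the adjusted p-value threshold
    "GENE_D": (1.0, 1e-6),   # fails the |logFC| threshold
    "GENE_E": (7.0, 1e-8),   # passes both thresholds but is not an RBP
}
toy_rbps = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

# Cutoffs as stated in the text: |logFC| > 5 and adjusted p < 0.001
de_rbps = select_de_rbps(toy_table, toy_rbps, logfc_cutoff=5.0, adj_p_cutoff=0.001)
print(de_rbps)
```

Taking the absolute value of logFC keeps both up- and down-regulated genes, and intersecting after filtering means a gene must satisfy both criteria and appear in the RBP list.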
Construction of prognostic risk models based on machine learning ensemble algorithms
Ten machine learning algorithms were integrated to identify the most important RBPs and to establish a more accurate and stable prognostic risk model34. In total, 72 algorithm combinations were implemented. The algorithms were Lasso, supervised principal components (SuperPC), survival support vector machine, random survival forest, stepwise Cox, partial least squares regression for Cox, elastic net, ridge, CoxBoost, and generalized boosted regression modeling. Within a leave-one-out cross-validation framework, the 72 algorithm combinations were applied to the prognosis-related RBPs to obtain hub RBPs. The TCGA-LUAD cohort was used as the training cohort, and all models were evaluated in four GEO validation cohorts (GSE3141, GSE31210, GSE30219, and GSE50081). For each model, Harrell's concordance index (C-index) was calculated in the training and validation cohorts, and the model with the highest average C-index was considered optimal. Multivariate Cox regression analysis was then performed on the hub RBPs using the “survival” package to construct a prognostic risk model. The risk score for each patient was calculated as follows:
$$\mathrm{risk\ score} = \sum_{i=1}^{n} \mathrm{coef}(i) \times \mathrm{exp}(i)$$
(1)
where n represents the number of genes in the model, \(\mathrm{coef}(i)\) is the regression coefficient of gene i in the multivariable Cox regression model, and \(\mathrm{exp}(i)\) is the expression level of gene i.
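As a worked illustration of Eq. (1), the following minimal Python sketch computes a risk score for a single patient. The gene names, coefficients, and expression values are hypothetical placeholders, not the fitted model from this study.

```python
from typing import Dict


def risk_score(coefs: Dict[str, float], expression: Dict[str, float]) -> float:
    """Eq. (1): sum of coef(i) * exp(i) over the genes in the model."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


# Hypothetical Cox coefficients for a two-gene model
toy_coefs = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression profile; genes outside the model are ignored
toy_patient = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 9.0}

score = risk_score(toy_coefs, toy_patient)  # 0.5 * 4.0 + (-0.25) * 2.0
print(score)
```

A positive coefficient raises the score as expression increases, whereas a negative coefficient lowers it.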
Verification of hub RBP-based prognostic risk models
Based on the median risk score, patients in the training and validation cohorts were divided into high- and low-risk groups. The “survival” and “survminer” packages were used to generate Kaplan-Meier (K-M) curves. The “timeROC” package was used to generate receiver operating characteristic (ROC) curves and to calculate the area under the curve at 1, 3, and 5 years. Additionally, the “survcomp” package was used to calculate the C-index of the risk model, age, gender, TNM stage, smoking status, and genetic alterations in two datasets.
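The median split and the C-index can be illustrated with a short Python sketch. It is a simplified stand-in for the R packages named above, under stated assumptions: patients whose score equals the median are placed in the low-risk group (the text does not specify this), pairs with tied survival times are skipped, and all values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label each patient high- or low-risk relative to the median score.
    Assumption: scores equal to the median are labelled 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], scores: Sequence[float]
) -> float:
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient with the shorter observed survival has the higher risk score.
    Tied scores count as 0.5; pairs with tied times are not comparable here."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if patient i had an observed event before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event flags (1 = death, 0 = censored), scores
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [2.0, 0.3, 0.4, 0.9]

groups = split_by_median(toy_scores)
c_index = harrell_c_index(toy_times, toy_events, toy_scores)
print(groups, c_index)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.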
Genomic mutation analysis
Mutation data for LUAD patients were downloaded from the TCGA database and processed using the “maftools” package. We analyzed the mutation status of the prognostic RBPs and of the 20 most frequently mutated genes in the risk subtypes.
Analysis of treatment response
The “oncoPredict” package was used to calculate IC50 values for 198 drugs in TCGA-LUAD patients and to analyze differences in drug sensitivity between the high- and low-risk groups35. The immunophenoscore (IPS) of TCGA-LUAD patients, obtained from The Cancer Immunome Atlas (https://tcia.at/), was used to predict the response to immunotherapy. Data on TIDE scores, T cell dysfunction, T cell exclusion, and myeloid-derived suppressor cells (MDSCs) were obtained from the TIDE website (https://tide.dfci.harvard.edu/).
TME Analysis
The stromal, immune, and ESTIMATE scores of the tumor microenvironment (TME) in LUAD patients were assessed using the ESTIMATE algorithm. Based on the risk score, the xCell36, TIMER37, quanTIseq38, MCPcounter39, EPIC40, CIBERSORT-ABS, and CIBERSORT41 algorithms were employed to further analyze tumor-infiltrating immune cells and stromal cells.
Cell culture
Human PC9 and A549 cell lines were purchased from Procell Life Science & Technology Co., Ltd. and cultured in RPMI 1640 and F12K medium (Gibco), respectively, supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. Both cell lines were maintained at 37 °C in a humidified incubator containing 5% CO2.
Small interfering RNA transfection
DDX56 small interfering RNA (siRNA) and control siRNA were purchased from Keyybio (Shanghai, China). jetPRIME® transfection reagent (Polyplus Transfection, Illkirch, France) was used for siRNA transfection. Supplementary Table 1 lists the DDX56 siRNA sequences.
RNA extraction and quantitative real-time PCR
Total RNA was extracted from LUAD cells using AG RNAex Pro Reagent (Accurate Biotechnology (Hunan) Co., Ltd., China). A reverse transcription kit (Hunan Co., Ltd.) was used for cDNA synthesis. Quantitative real-time PCR (qRT-PCR) was performed using the SYBR Premix Ex Taq kit (Hunan Co., Ltd.). RNA quality was assessed, and expression levels were normalized to human 18S rRNA. Supplementary Table 2 lists the primer sequences used.
Western blotting
Total proteins were extracted from cells using RIPA buffer. Protein concentrations were determined using BCA reagents (Beyotime, Shanghai, China). Equal amounts of total protein extracts were then separated on a 10% SDS-PAGE gel and transferred to a polyvinylidene difluoride membrane. The membranes were blocked with 5% BSA solution for 1 h at room temperature and then incubated overnight with primary antibody at 4 °C. The next day, the membranes were incubated with secondary antibodies (Beyotime; 1:1000) for 1 h at room temperature. Proteins were visualized by enhanced chemiluminescence. The primary antibodies were as follows: p65 and p-p65 (Ser536) (Cell Signaling Technology, USA; 1:1000); DDX56 and GAPDH (Santa Cruz, USA; 1:1000); BCL-2 (Proteintech Group; 1:1000).
Cell proliferation assay
After seeding, the cells were fixed for at least 24 hours using cold 10% trichloroacetic acid. Cells were then stained with sulforhodamine B sodium salt (Sigma, USA) for 20 min and washed with 1% (vol/vol) acetic acid. After the cells had dried, 150 μL of 10 mmol/L Tris was added to each well. Absorbance at 562 nm was measured using a microplate reader (Thermo Fisher, USA). To analyze cell proliferation, cells were seeded in 96-well plates at a density of 3000 cells/well and fixed with trichloroacetic acid every 12 hours. All data were normalized to day 1 data and expressed as mean ± SD. All experiments were performed with 10 repeats.
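The normalization to day 1 and the mean ± SD summary can be sketched as below. This is an illustrative Python example, not the analysis script used in the study: it assumes that each well is divided by the mean day 1 absorbance and that SD is the sample standard deviation, and the absorbance readings and time point labels are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_growth(
    absorbance: Dict[str, List[float]], baseline_key: str
) -> Dict[str, Tuple[float, float]]:
    """Normalize replicate absorbance readings to the baseline mean and
    return (mean, sample SD) for each time point."""
    baseline_mean = mean(absorbance[baseline_key])
    summary = {}
    for timepoint, wells in absorbance.items():
        normalized = [w / baseline_mean for w in wells]
        summary[timepoint] = (mean(normalized), stdev(normalized))
    return summary


# Hypothetical absorbance readings at 562 nm (three replicate wells each)
toy_readings = {
    "day1": [0.25, 0.25, 0.25],
    "day2": [0.50, 0.75, 1.00],
}

growth = summarize_growth(toy_readings, baseline_key="day1")
for timepoint, (avg, sd) in growth.items():
    print(f"{timepoint}: {avg:.2f} ± {sd:.2f}")
```

Dividing by the day 1 mean expresses every later time point as a fold change relative to the starting cell mass, which makes curves from different conditions directly comparable.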
Transwell assay
Transwell experiments were performed in Transwell plates. First, 200 μL of FBS-free cell suspension (\(4 \times 10^{4}\) cells) was added to the upper chamber, and 600 μL of medium containing 20% FBS was added to the lower chamber. After incubation for 24 hours, cells in the lower chamber were fixed and stained with crystal violet. After imaging, ImageJ software was used for cell counting.
EdU Incorporation Assay
Cells were seeded in 12-well plates. Once the cells reached 50% confluence, the rate of EdU incorporation was determined using a BeyoClick™ EdU Cell Proliferation Kit with Alexa Fluor 594 (Beyotime).
Wound Healing Assay
Cells were seeded in 12-well plates. When the cells reached 95% confluence, a pipette tip was used to scratch the cell monolayer. Cells were then photographed every 24 hours. Changes in scratch area were calculated using ImageJ software.
Cell apoptosis analysis
Adherent cells were digested with trypsin, collected by centrifugation (300 × g, 5 min, 4 °C), and washed twice with PBS pre-cooled to 4 °C (centrifuged each time under the same conditions). An Annexin V-fluorescein isothiocyanate/propidium iodide (Annexin V-FITC/PI) apoptosis kit was used for cell staining. Cells were resuspended in 1× binding buffer and then stained with Annexin V-FITC/PI. Flow cytometry (FCM) was performed to determine apoptosis, and the results were analyzed using FlowJo V10.
Bioinformatic analysis of DDX56
Single-sample gene set enrichment analysis (ssGSEA) was performed to calculate enrichment scores of immune functions, which were compared between the DDX56 high- and low-expression groups. Pan-cancer differential expression analysis was performed using data from the TIMER database (https://timer.comp-genomics.org/timer/). TIDE score difference analysis, K–M survival curve analysis, ROC curve analysis, independent prognostic analysis, and clinical correlation analysis were performed in R. Single-cell RNA sequencing data from tumors and paired normal lung tissues of LUAD patients were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The R package “Seurat” was used to determine expression patterns at the single-cell level, and the “SingleR” package was used for cell annotation.
Statistical analysis
Statistical analyses were performed using R and GraphPad Prism 9.0 software. The t-test was used to analyze the statistical significance of quantitative data. The Wilcoxon rank-sum test was applied to non-normally distributed variables. Comparisons among more than two groups were performed using the Kruskal–Wallis test or one-way analysis of variance. The R package “survminer” was used to identify differences in K-M survival analysis. p < 0.05 was considered the threshold of statistical significance.
