Collection of databases
Transcriptomic data were systematically collected from multiple public repositories. Specifically, RNA-sequencing data comprising 19 normal bladder tissues and 412 BCa specimens were acquired from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). Furthermore, we obtained the gene expression profile dataset GSE13507 containing 256 BCa samples from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). For exosome-related gene identification, we extracted 1143 candidate genes from GeneCards (https://www.genecards.org/) using the following selection criteria: (1) search term “exosome” and (2) relevance score threshold >2. The complete list of 1143 initially identified exosome-related genes is provided in Supplementary Table 1.
Identification and enrichment analysis of the prognosis-related genes
Differential gene expression analysis was performed using the caret R package (v4.3.1) and Strawberry Perl, focusing on RNA-seq profiles from the TCGA cohort (412 bladder cancer specimens and 19 normal bladder tissues) to identify 132 differentially expressed genes (DEGs). Additionally, we conducted GO and KEGG enrichment analyses to explore the fundamental biological characteristics of these DEGs, employing the “org.Hs.eg.db” package and “clusterProfiler” package for these analyses.
Methodological framework for prognostic model development
Subsequent univariate Cox regression analysis of these DEGs revealed 15 prognostically significant genes for risk model construction. To ensure cross-sample comparability, gene expression values were standardized using z-score normalization. We implemented an ensemble machine learning framework integrating 10 distinct algorithms: random survival forest (RSF), Ridge regression, Lasso regression, generalized boosted regression modeling (GBM), supervised principal components (SuperPC), CoxBoost, partial least squares regression for Cox (plsRcox), elastic network (Enet), Stepwise Cox, and survival support vector machine (Survival-SVM). Through stochastic 10-fold cross-validation, 101 algorithm combinations were systematically generated to enhance predictive accuracy [14,15,16], with methodological details consistent with prior work by Hu et al. [15, 17]. The TCGA-BLCA cohort served as the training set for model construction, while the GSE13507 dataset was utilized for independent validation. Initial algorithms performed variable selection from the 15 prognostic genes, with final models derived from optimized variable combinations. Model performance was rigorously evaluated using the concordance index (C-index), with the algorithm combination demonstrating the highest average C-index across both cohorts selected as the optimal predictive model.
Validation of the MLDP
To assess the autonomous predictive capacity of clinical parameters regarding overall survival (OS), both univariate and multivariate Cox proportional hazards regression analyses were performed. The predictive accuracy of the BCa prognostic model and conventional clinicopathological parameters was evaluated through time-dependent receiver operating characteristic (ROC) curve analyzed using the “timeROC” R package. Additionally, we quantitatively compared the C-index of our machine learning-derived prognostic model (MLDPM) with those of established clinical predictors to demonstrate its superior discriminative capability. The C-index, which quantifies the model’s concordance between predicted and observed outcomes, was interpreted such that higher values reflect enhanced prognostic performance.
Developing and evaluating a prognostic nomogram
We developed a prognostic nomogram using the rms package in R to estimate 1-, 3-, and 5-year OS probabilities. The nomogram visually represents the predictive model by quantifying individual variable contributions through a point-scale system, where cumulative points translate to specific outcome probabilities. Model performance was assessed through calibration analysis, where agreement between nomogram-predicted probabilities and actual observed outcomes was graphically evaluated using calibration curves. These analyses confirmed the model’s predictive accuracy by demonstrating strong concordance between estimated and empirical survival outcomes.
Enrichment and immune infiltration analysis
To elucidate distinct underlying molecular mechanisms between high- and low-risk groups, we performed comparative Gene Set Enrichment Analysis (GSEA). Following DEGs analysis, all genes were ranked according to their log2 fold-change values. To elucidate the underlying biological mechanisms and associated signaling cascades, KEGG pathway enrichment analysis was conducted using the “clusterProfiler” package. The five most statistically enriched biological pathways were identified for graphical display.
We systematically characterized immune cell infiltration patterns between risk groups using seven established computational methods: TIMER, XCELL, CIBERSORT, EPIC, MCPcounter, QUANTISEQ, and CIBERSORT-ABS. This multi-algorithm approach enabled comprehensive quantification of tumor microenvironment composition in BCa patients, with TCGA-derived infiltration estimates serving as the primary data source. Inter-group differences in immune cell subsets were statistically evaluated using the Wilcoxon rank-sum test (non-parametric, two-tailed). Resultant data were visualized through bubble plots generated with R packages including scales, limma, ggtext, and ggplot2. Parallel assessment of immune checkpoint molecule expression and Tumor Immune Microenvironment (TME) scores was performed using ggpubr, with boxplot visualizations to highlight differential activation patterns between risk strata.
Molecular docking
The three-dimensional molecular structures of bioactive compounds were acquired from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/). Corresponding receptor proteins for key target genes were downloaded from the Protein Data Bank (PDB, http://www.rcsb.org/). Both ligand and protein structures were subsequently submitted to the DockThor web server (https://dockthor.lncc.br/v2/) for computational docking analysis. According to DockThor scoring metrics, stronger molecular interactions were characterized by lower binding energy values, reflecting greater complex stability. Resulting docking poses were visualized using the CB-DOCK2 platform (https://cadd.labshare.cn/cb-dock2/).
Drug sensitivity analysis
We utilized the ‘pRRophetic’ R package to estimate half-maximal inhibitory concentrations (IC50) of multiple chemotherapeutic agents between risk-stratified groups [18]. This computational approach enabled comparative assessment of differential drug sensitivity patterns. Statistical thresholds were established a priori, with P values below 0.05 considered significant, and the comparative pharmacogenomic data were visualized using boxplot representations.
Database analysis of the THBS1
Single-cell RNA sequencing data were obtained from the Tumor Immune Single-cell Hub (TISCH) database (http://tisch.comp-genomics.org), a specialized resource for TME investigations [19]. TISCH facilitates comprehensive analysis of TME heterogeneity across diverse datasets and cell types. For validation purposes, the Kaplan-Meier (KM) Plotter resource was employed to evaluate the predictive performance of our risk model, while protein-level expression of prognostic markers was verified using the Human Protein Atlas (HPA) (https://www.proteinatlas.org/) [20].
Verification of THBS1 expression in BCa specimens
Paired BCa tissues and adjacent normal tissues were obtained from 10 patients undergoing treatment at Affiliated Hengyang Hospital of Hunan Normal University & Hengyang Central Hospital (Hengyang, China). Total RNA was extracted and subjected to quantitative real-time PCR (qRT-PCR) analysis using SYBR Green master mix (Roche) with β-actin serving as the endogenous control. All reactions were performed in technical triplicates to ensure reproducibility. The following primer sequences were used for THBS1 amplification:
Forward: 5′- TGGAAATGTGGTGCTTGTCC-3′.
Reverse: 5′- CCATTGTGGTTGAAGCAGGC-3′.
Isolation and identification of T24-derived exosomes
To isolate exosomes, a series of centrifugation steps are first performed to remove dead cells and debris. The supernatant is then processed using a 100 kDa ultrafiltration tube, followed by ultracentrifugation at 1,000,000 × g. The supernatant is discarded, and the resulting pellet is resuspended in phosphate-buffered saline (PBS), filtered through a 0.22 μm filter, and stored at −80 °C for subsequent use. The protein content of the concentrated Exo suspension is quantified using a BCA Protein Assay Kit (23,227, Thermo Fisher Scientific, USA). Western Blot analysis is conducted to confirm the presence of the exosomal surface markers CD63 and Calnexin.
Western blotting
The cells were subjected to centrifugation and lysed on ice for 30 min using RIPA buffer containing protease and phosphatase inhibitors. Following this, the cell lysate was processed with an ultrasonic homogenizer for 10 s and centrifuged at 12,000 × g for 20 min. Finally, protein samples were resolved using gradient SDS-PAGE and then transferred onto a PVDF membrane (Cytiva, catalog number: 10600021).
Cell culture, reagents and small interfering RNA transfection
The T24 and BIU cell lines were obtained from the China Center for Type Culture Collection (CCTCC, Shanghai, China). T24 cells were maintained in DMEM medium supplemented with 10% fetal bovine serum (FBS), 100 U/mL penicillin, and 100 µg/mL streptomycin at 37 °C under 5% CO₂ atmosphere. BIU cells were cultured in complete RPMI-1640 medium (Corning). For gene silencing experiments, THBS1-specific small interfering RNAs (siRNAs) were synthesized by GenePharma Biotechnology (Shanghai, China). The siRNA sequences were: si1: 5′-AGACATCTTCCAAGCATATAA-3′; si2: 5′-TGACATCAGTGAGACCGATTT-3′. Cell transfections were performed using Lipofectamine 2000 reagent (Invitrogen, USA) according to the manufacturer’s protocol to inhibit THBS1 expression.
Migration and invasion
For wound healing assays, confluent monolayers of T24 and BIU cells in six-well plates were mechanically scratched using sterile pipette tips. Migration progression was quantitatively assessed by measuring wound closure distances at specified time intervals using phase-contrast microscopy. Transwell assays were performed using 24-well plates equipped with 8 μm polycarbonate membrane filters (Corning Inc., Corning, NY), either coated with or without Matrigel. T24 and BIU cells (20,000 per well) were seeded in serum-free DMEM or 1640 media in the upper chamber, while 700 µl of medium supplemented with 10% FBS was added to the lower chamber as a chemoattractant. After incubation for 24–48 h, cells were fixed with 4% paraformaldehyde for 15 min and stained with 0.1% crystal violet for 10 min. Cells that had migrated to the underside of the filter were visualized and imaged under a light microscope. Cell counts were taken from five random fields per insert, with each data point representing the mean from three wells.
Cell proliferation assays
Clonogenic assays were performed by plating 700 cells/well in 6-well plates. Cells were cultured until microscopic observation revealed approximately 50-cell colonies (typically 7–14 days). Colonies were then fixed with 4% paraformaldehyde (20 min, RT), stained with 0.1% crystal violet (800 µL/well, 15 min), gently washed with PBS, air-dried, and imaged using a standardized documentation system. After 48 h of cell treatment, T24 and BIU cells were seeded into 96-well plates, and cell proliferation was evaluated using a CCK-8 kit (Beyotime, Shanghai, China). At different time points, 10 µL of CCK-8 reagent was dispensed into each well. After incubation under light-protected conditions, the optical density was measured at 450 nm using a microplate reader.
BALB/c nude mice lung metastasis model
Four-week-old BALB/c nude mice were obtained from Charles River Laboratories (Beijing, China). All animals in this study were handled humanely, in compliance with applicable animal welfare regulations, policies, and guidelines. To establish a tumor metastasis model, luciferase-transduced T24 cells—either shControl- or shTHBS1-expressing, at a density of 1 × 10⁶ cells per mouse—were administered to the mice via tail vein injection, with 3 mice assigned to each group. Lung metastasis monitoring was initiated on the 15th day following cell injection: each mouse was injected with D-Luciferin (YEASEN, 40902ES02) at a dose of 5 mg/kg, and 5 min later, in vivo imaging was performed using a LAGO imager equipped with Aura imaging software (Spectral Instruments Imaging) at the Animal Imaging Facility. Two weeks after the initial imaging, the mice were humanely euthanized. Subsequently, researchers with relevant expertise quantified the number of pulmonary metastatic foci through visual inspection.
Statistical analysis
All statistical analyses were conducted using R software (v4.3.1; R Foundation for Statistical Computing) and GraphPad Prism 8 (GraphPad Software). A two-tailed p-value threshold of < 0.05 was established a priori to determine statistical significance. To ensure experimental reproducibility, all assays included three independent biological replicates.
