Establishment of a predictive machine learning model for drug responses in patient-derived cell cultures.

Machine Learning


To develop an efficient methodology for predicting drug responses in patient-derived cell lines, several initial exploratory experiments were performed using a dedicated validation set from the GDSC1 dataset (Fig. 2). These provided the basis for establishing the parameters needed to build a viable prototype “recommender system” that could predict drug responses in unseen patient cell lines.

Figure 2: Overview of the research workflow, from data collection to model evaluation.

Four datasets (GDSC1, GDSC2, PRISM, RX) were preprocessed to remove or impute missing values as needed. Data were randomly split into training (80%), validation (10%), and test (10%) sets, with RX instead using leave-one-out cross-validation. Machine learning models were trained on historical drug response data to predict drug sensitivity in new patient-derived cell lines. Model performance was assessed using Pearson correlation, Spearman correlation, RMSE, accuracy, and hit rate metrics.
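A minimal sketch of the 80/10/10 random split described in the workflow. It assumes the split is made at the level of whole cell lines, so that test cell lines are never seen during training; the function name, the `CL…` identifiers, and the fixed seed are illustrative and not taken from the study.

```python
import random

def split_cell_lines(cell_line_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split cell line IDs into train/validation/test lists."""
    ids = list(cell_line_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(round(train_frac * len(ids)))
    n_val = int(round(val_frac * len(ids)))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, test

# Hypothetical identifiers, for illustration only.
train, val, test = split_cell_lines([f"CL{i}" for i in range(100)])
print(len(train), len(val), len(test))
```

Splitting by cell line rather than by individual measurement matches the stated goal of predicting responses in new patient-derived cell lines.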

Based on the results of these experiments, outlined in the supplemental material, TML was used to fill in missing values in the training dataset (Table S2 and Figure S1), and a random forest of 50 trees (Table S3) was used together with a “probing” drug panel of the first 30 selected drugs (Table S6). These choices form the basis of the prototype system.

Predictive performance on the GDSC1 dataset

The prototype described above was applied to a dedicated test set containing 81 patient-derived cell lines. Performance was assessed using a variety of metrics: in addition to Pearson r, Spearman r, and RMSE, four accuracy-based metrics were included.
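The three general-purpose metrics can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' code; Spearman's coefficient is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Spearman's coefficient depends only on the ordering of the drugs, which makes it the natural measure when the aim is to rank drugs for a given cell line rather than to predict exact activity values.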

We reported the proportion of accurate predictions within the top 10, 20 and 30 drugs as an indication of prediction accuracy. For example, if 7 out of 10 predictions matched the actual top 10 drug responses, we reported a value of 0.7. We also reported the hit rate among the top 10 predictions, which gives direct insight into how many of these recommendations were actual hits.
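The two kinds of top-k metric can be sketched as below. The sketch assumes that a higher score means a more active drug and that the set of hits is supplied separately, since the activity threshold defining a hit is not given in this section; the function names and the toy values are illustrative.

```python
def top_k(scores, k):
    """Return the k drug names with the highest scores (higher = more active)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def top_k_accuracy(predicted, actual, k):
    """Fraction of the predicted top-k drugs that are in the actual top k."""
    return len(set(top_k(predicted, k)) & set(top_k(actual, k))) / k

def top_k_hit_rate(predicted, hits, k):
    """Fraction of the predicted top-k drugs that are hits, whatever their true rank."""
    return sum(1 for drug in top_k(predicted, k) if drug in hits) / k

# Toy values for one cell line (hypothetical).
actual = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.2, "E": 0.1}
predicted = {"A": 0.85, "B": 0.3, "C": 0.75, "D": 0.6, "E": 0.05}
hits = {"A", "B", "C", "D"}

accuracy = top_k_accuracy(predicted, actual, k=3)  # A and C overlap: 2 of 3
hit_rate = top_k_hit_rate(predicted, hits, k=3)    # A, C and D are all hits: 3 of 3
```

The toy case shows why the hit rate can exceed the top-k accuracy: a recommended drug can be a genuine hit without being among the k most active drugs.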

All results are reported as the mean of five experiments, with standard deviations (Table 1). For comparison, the average hit rate across the entire set of drugs in the test set was 17.8% for all drugs and 5.0% for “selective” drugs (those active in fewer than 20% of cell lines).

Table 1. Predictive performance of the prototype drug recommendation system on the GDSC1 dataset

The prototype recommendation system showed excellent performance, with high correlations between predicted and actual drug activity both for the entire drug library and for the selective drug subset (Table 1). Considering the entire library, we found that the recommender system performed very well. On average, 6.6 of the top 10 predictions were correctly identified, with 15.26 and 22.65 accurate predictions for the top 20 and top 30, respectively. Even when the exact ranking of drugs was disregarded and the aim was simply to recommend 10 hits, the system consistently predicted hits almost exclusively.

In the more challenging task of predicting selective drugs, the results remained strong in terms of overall bioactivity ranking (Pearson r = 0.781, Spearman r = 0.791). When the aim was to identify the top 10, 20 and 30 drugs, the system averaged 3.6, 10.5 and 17.6 accurate predictions, respectively. The number of hits among the top 10 drugs was slightly higher, averaging 4.3. It is important to note that for selective drugs, 50% of all cell lines have a total of 12 or fewer hits among the 236 drugs available for prediction, so a near-perfect system would be required to select them. If only the 41 cell lines with 12 or more hits were considered, the number of hits among the top 10 would increase to 6.1.

Two additional experiments (Tables S9 and S10) were performed to further evaluate the accuracy of the top predictions. First, we evaluated whether at least one of the actual top three drugs appeared among the system's top three predictions, and assessed how many of the top three predicted drugs were real hits, regardless of their true rank. In both the all-drug and selective scenarios, the system captured the best-performing drugs with high reliability (Table S9). Second, we extended the evaluation to the top 15 predictions, a realistic number for real-world application, examining how frequently the actual top three drugs were identified and whether the best-performing drug was predicted correctly. Again, the system demonstrated strong predictive performance, even for the more challenging selective drug subset (Table S10).

A closer look at the performance for individual cell lines in terms of the Spearman r coefficient, which indicates how well drugs are ranked by activity, showed a minimum score of 0.76 for all drugs and a minimum score of 0.39 for selective drugs (Figure S2). In particular, for the more difficult-to-predict selective drugs, only two cell lines scored below 0.7 and only 8 cell lines scored below 0.65. These results show the overall strong performance of the prototype recommendation system, while highlighting the increased complexity of predicting selective drug responses. Additional experiments compared our approach with the use of standard molecular fingerprints, and the results show that our approach performs much better on this dataset (Table S11).

Predictive performance on the GDSC2 dataset

The parameters selected for the GDSC1 dataset were applied to the GDSC2 dataset and showed consistently high performance across all drugs. In the GDSC2 test set, the baseline hit rate was 13.2% for all drugs and 2.5% for selective drugs. These values are considerably lower than those in the GDSC1 dataset. Considering all drugs, the hit rate among the top 10 recommendations was high, with an average of 9 out of 10 drugs being active (Table 2). However, the hit rate for selective drugs in the top 10 predictions was 0.193, considerably lower than for GDSC1 (Table 1). This reduction can again be attributed to the small number of hits available in the dataset, with an average of 3.38 across all cell lines. This caps system performance at an average of 3.38 hits per 10 recommendations. With this limitation in mind, the adjusted performance of the system is 57% of the theoretical maximum. Furthermore, 21 cell lines had no hits at all. If these are removed, the average number of hits among the top 10 rises to 2.72, which still corresponds to 57% of the maximum available (then an average of 4.77). Only eight cell lines had more than 10 hits, with an average of 13.25, and the average hit rate for these cell lines was 6.6/10.
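The adjustment to the theoretical maximum can be illustrated with a short calculation. The sketch below rests on two interpretive assumptions that are not stated explicitly in the text: the top-10 hit rate of 0.193 is read as 1.93 hits per 10 recommendations, and the maximum for a cell line is taken as the smaller of k and its number of available hits. The function name is hypothetical.

```python
def adjusted_hit_rate(hits_in_top_k, available_hits, k=10):
    """Hits found across cell lines divided by the most that could be found.

    hits_in_top_k: hits among the top-k recommendations, one entry per cell line
    available_hits: total hits that exist for each cell line
    """
    found = sum(hits_in_top_k)
    max_findable = sum(min(k, n) for n in available_hits)  # cannot exceed k per line
    return found / max_findable if max_findable else float("nan")

# Check against the averages quoted in the text.
all_lines = round(1.93 / 3.38, 2)      # all cell lines
without_zero = round(2.72 / 4.77, 2)   # cell lines with no hits removed
print(all_lines, without_zero)         # both print as 0.57
```

Under these assumptions both ratios come out at about 57%, matching the adjusted figure given for the system.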

Table 2. Predictive performance of the prototype drug recommendation system on the GDSC2 dataset

Predictive performance on an FDA-approved drug library

The PRISM dataset is very different from the previous datasets, as it contains a larger library of FDA-approved drugs. Because of the unique and diverse nature of this dataset, we once again investigated how the size of the probing drug panel and the number of cell lines included in training affect model performance. We excluded two compounds, “MG-132” and “bortezomib”, from our study, as the authors of the PRISM study indicated that they were used as positive controls.

First, we investigated the effect of the number of patients used to train the model. The experiments were conducted using a drug panel containing 90 drugs (approximately 2% of the dataset). Surprisingly, we found that reducing the number of patients from 418 (all patients) to 30 still gave effective predictions that retained a substantial amount of information (Table S7). Interestingly, even when working with 10 patients, the correlations remained relatively close to those of the larger patient subsets. However, this minimal cohort showed significant decreases in accuracy among the top-performing drugs and in the top-45 hit rate for selective drugs. Nevertheless, our study demonstrates that reducing the number of patients to about 10% of the overall dataset still results in strong predictions and a substantial number of hits. Based on these findings, we chose to proceed with 100 patients, as performance was roughly the same as with 200 and 418 patients.

Next, we investigated the effect of drug panel size on performance. The experiments were performed with drug panels consisting of 23, 45, and 90 drugs, representing approximately 0.5%, 1%, and 2% of the dataset, respectively. A significant improvement was observed with each increase in size. However, considering that testing 90 drugs for each new patient can be very demanding, we recommend using a panel of 45 drugs (see below). This smaller panel still gave competitive results and a considerable number of compounds active against the cell lines, with ~30 hits out of 45 recommendations (Table S8).

The calibration experiments above were performed using the validation set. The same panel sizes were then tested using an independent test set of 52 patients, and very similar performance was observed (Table 3). The hit rates among the top 45 predicted drugs corresponded to an increase of ~1000-1600% compared with library-wide screening (5.2% hit rate). This results in 45 recommendations containing, on average, approximately 25-38 active compounds, depending on the panel size selected.
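As a rough check of the scale of this enrichment, the relative increase over the library-wide hit rate can be computed from the figures quoted above. This is an illustrative calculation using the rounded values of 25 and 38 active compounds, so it only approximates the range reported from the full results; the function name is hypothetical.

```python
def percent_increase(observed_rate, baseline_rate):
    """Relative increase of observed_rate over baseline_rate, in percent."""
    return (observed_rate / baseline_rate - 1) * 100

baseline = 0.052        # library-wide hit rate (5.2%)
n_recommended = 45

# Rounded end points of the "25-38 active compounds" range.
low = percent_increase(25 / n_recommended, baseline)
high = percent_increase(38 / n_recommended, baseline)
print(f"{low:.0f}% to {high:.0f}% increase over library-wide screening")
```

With these rounded inputs the increase is on the order of a thousand percent or more, in line with the magnitude reported.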

Table 3. Effects of different drug panel sizes on predictive performance on the PRISM dataset, using a test set of 52 PDCs

Predictive performance on the RX dataset

The RX dataset was obtained from a variety of tumor types. Drug screening was performed on cultured biopsies from cancer patients shortly after their arrival in the laboratory. Because strict inclusion criteria limited the number of drugs and cells in the dataset (see Methods), we employed leave-one-out cross-validation on this dataset. This involved predicting the survival of patient-derived cells treated with a given drug, based on their response to the remaining drugs in the dataset and on other patients' responses to the drugs. In this scenario, the unknown activity shown in Figure 1 involves a single drug, with the remaining drugs forming the drug panel.
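The leave-one-out scheme can be sketched as follows. Each drug of a patient is held out in turn, the patient's remaining drugs form the panel, and the other patients' data are available for training. The predictor shown is only a placeholder (the mean survival for that drug in other patients), not the random forest used in the study, and all names and values are hypothetical.

```python
def leave_one_drug_out(survival, patient):
    """Yield (held_out_drug, panel, others) for one patient.

    survival: dict mapping patient -> {drug: % survival}
    panel: the patient's responses to all drugs except the held-out one
    others: the full response data of every other patient
    """
    others = {p: row for p, row in survival.items() if p != patient}
    for held_out in survival[patient]:
        panel = {d: v for d, v in survival[patient].items() if d != held_out}
        yield held_out, panel, others

def mean_of_others(held_out, panel, others):
    """Placeholder predictor: average survival for the drug in other patients."""
    values = [row[held_out] for row in others.values() if held_out in row]
    return sum(values) / len(values)

# Toy data, for illustration only.
survival = {
    "P1": {"drugA": 20.0, "drugB": 90.0, "drugC": 60.0},
    "P2": {"drugA": 30.0, "drugB": 80.0, "drugC": 70.0},
    "P3": {"drugA": 40.0, "drugB": 100.0, "drugC": 50.0},
}
predictions = {
    drug: mean_of_others(drug, panel, others)
    for drug, panel, others in leave_one_drug_out(survival, "P1")
}
```

A real model would also use the `panel` values as input features; the placeholder ignores them so that the fold structure stays easy to follow.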

Approximately 30 different drugs were tested for each cell line. The mean survival across cell lines for this library was 87.55% (Table 4). In contrast, selecting the five drugs predicted to be most effective for each cell line reduced the average cancer cell survival to 33.45%. Using a viability threshold of <30% as the criterion for a hit, the average number of hits per cell line was 4.36. On average, the methodology recommended 4.87 drugs for testing, of which 3.60 were identified as hits. For three of the cell lines, no drug reduced survival to <30%, and these were excluded from the hit analysis. The average hit rate across the remaining cell lines was 69.40%, and our approach identified an average of 68.78% of all hits available in each cell line. For three cell lines, the resulting hit rate was 0%; in each case, the total number of hits among 30 or more compounds was low, at 1 or 2. However, as the methodology was consistently able to rank the drugs in the library, the top five predicted drugs resulted in significantly lower survival than the library average. The complete results are listed in Table S12.

Table 4. Performance on the RX dataset
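The per-cell-line quantities discussed here (mean survival under the recommended drugs, hit rate, and the fraction of available hits recovered) can be sketched as follows. The <30% threshold comes from the text; the function name, the fixed number of recommendations, and the toy values are assumptions for illustration.

```python
def evaluate_recommendations(predicted, actual, n_recommend=5, hit_threshold=30.0):
    """Evaluate the drugs with the lowest predicted survival for one cell line.

    predicted, actual: dicts mapping drug -> % cell survival
    """
    # Lower predicted survival means a more effective drug, so sort ascending.
    ranked = sorted(predicted, key=predicted.get)[:n_recommend]
    all_hits = {d for d, s in actual.items() if s < hit_threshold}
    found = [d for d in ranked if d in all_hits]
    return {
        "recommended": ranked,
        "mean_survival_recommended": sum(actual[d] for d in ranked) / len(ranked),
        "hit_rate": len(found) / len(ranked),
        # None marks cell lines with no hits, which are excluded from hit analysis.
        "fraction_of_hits_found": len(found) / len(all_hits) if all_hits else None,
    }

# Toy values for one cell line (hypothetical).
actual = {"d1": 10.0, "d2": 25.0, "d3": 80.0, "d4": 95.0, "d5": 28.0, "d6": 90.0}
predicted = {"d1": 15.0, "d2": 40.0, "d3": 35.0, "d4": 90.0, "d5": 85.0, "d6": 88.0}
result = evaluate_recommendations(predicted, actual, n_recommend=3)
```

In the toy case two of the three recommended drugs are hits and two of the three available hits are recovered, so the two rates coincide; in general they differ whenever the number of recommendations and the number of available hits are not equal.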


