Data
We evaluated popPK model search using a range of clinical datasets as detailed in Table 1. The synthetic ribociclib data was generated using the parameter values from a published model19 excluding the covariate effects. Data was simulated for 96 individuals using 4 doses of 100, 200, 400 and 600 mg and sampling mimicking the Phase 1 study with intensive PK samples collected before treatment and 0.5, 1, 2, 4, 6, 8, and 24 h after dosing on day 1 and day 21 for multiple doses.
The clinical datasets were a compilation of multiple phase 1 trials for the drugs osimertinib20,21, olaparib22,23,24,25, Tezepelumab26,27,28, and, camizestrant29. The details of the design and results of the studies have been previously reported, NCT study codes are listed in Table 1. The protocols were approved by the independent ethics committee at each participating site, and all participants provided written informed consent. This study used existing fully de-identified data and the investigators or participants could not be identified, directly or through identifiers linked to patients.
For this study, only data covering oral or subcutaneous administration was included, tezepelumab intravenous (IV) administration was not considered. For the osimertinib data, which included measurements of both parent drug and metabolite concentrations, only the parent drug data was used as metabolite characterization is not part of our current parameter space. With olaparib, which featured both tablet and capsule formulations, only the tablet data was used. The number of participants after filtering is provided in Table 1. The datasets were chosen as the published popPK model structures for osimertinib30, ribociclib19, olaparib31, tezepelumab32 and camizestrant33 displayed a range of pharmacokinetic behaviour with structures featuring 1 and 2 compartment models, a range of absorption mechanisms and residual error models in addition to both linear and non-linear PK. Although olaparib popPK model included an autoinhibition mechanism which is not present in the current model space, the evaluation of this asset still provided very useful insights to assess the performance of the automated search in cases where final models are not fully contained in the defined search space.
Automated structure search
The model search process runs following three steps as shown in Fig. 1. Step 1: The candidate popPK model structures are generated from the model space. For the 1st generation these models are randomly selected. Following this, model structures are selected using either the global optimiser or the local search strategy. Step 2: The candidate popPK model parameters are estimated using NONMEM5. Step 3: A fitness score is generated for each popPK model as the sum of the NONMEM objective function value (OFV) and a penalty function. These scores are used to update the surrogate model of the global optimiser or inform the next step of the local search. Steps 1-3 are repeated until completion where the system returns a ranked list of model structures ordered by suitability. Our approach uses Bayesian optimisation for the global step with a random forest surrogate model and an exhaustive local search for the local step, both run using the pyDarwin13 package.
Extravascular popPK model search space
To generate candidate popPK model structures, we developed a generic search space designed for models which describe the time course of drugs with extravascular administration. Each model was described by a set of 17 features which detailed different NMLE model elements such as number of compartments, the residual error model and a binary switch determining whether to estimate inter-individual variability (IIV) for different parameters5. An overview of the models which could be generated is given in Table 2 with a detailed description of the 17 features given in Supplementary Table 1. These 17 features can be thought of spanning a search space where each model is represented by a 17-dimensional feature vector. During the search, this feature vector could be randomly chosen or selected using an optimisation algorithm. The model space contains over 880k possible feature vectors which is calculated by taking all unique combinations of the feature values detailed in Supplementary Table 1 (3 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 3 × 4 × 2 × 2 × 2 × 2 × 2 × 3). Due to the hierarchy of the features, (e.g., the feature indicating whether to estimate IIV for a peripheral compartment is not used in a one-compartment model) the 880k possible sets of features resulted in >12k unique model structures. Each model structure could be configured for log-transformed or normal scale data by adapting the residual error model. The unique model calculation is detailed in Supplementary Methods.
PopPK models in the search space were coded as NONMEM control files5 which could be fit to data once generated. The model space was defined using the pyDarwin13 nomenclature; a text file detailed the fixed elements of the model and JSON file listed the variational model elements13. The model space files are available on the Github repository listed in the Code Availability section34. NONMEM control files describing a popPK model could be generated from the space by using the feature vector to select values from the JSON file and adding them into the fixed elements text.
The NONMEM control files generated from the model space were coded using the ADVAN5 subroutine in NONMEM 7.5. We updated the default pyDarwin library to enable more advanced Expectation-Maximization (EM) parameter estimation algorithms within NONMEM by adding support for MU referencing nomenclature35. All popPK model parameters were estimated using NONMEM with a sequential combination of Stochastic Approximation Expectation Maximization (SAEM) followed by OFV assessment using Importance Sampling (IMP). The SAEM algorithm used the following NONMEM parameters, ISAMPLE = 2 NBURN = 4000 NITER = 800 RANMETHOD = 3S2P CTYPE = 3 CITER = 10 CALPHA = 0.05 CINTERVAL = 10 INTERACTION. The IMP algorithm used the NONMEM parameters, ISAMPLE = 3000 EONLY = 1 NITER = 10, RANMETHOD = 3S2P NOABORT INTERACTION PRINT = 1 MAPITER = 0 SIGDIGITS = 3 NSIG = 3 SIGL = 9. The SAEM algorithm provides more robust and unbiased estimates in complex scenarios compared to gradient-based approaches such as First-Order Conditional Estimation with Interaction (FOCEI) which are more prone to local minima and require well-tuned initial parameter values for optimal performance36. Additionally, gradient-based approaches often do not converge when fitting complex models37, which would prevent calculation of the penalty function which relies on values such as the relative standard error and condition number. Therefore, the modification of the default pyDarwin version, which was not able to use EM methods, is another contribution of this work.
For the proposed extravascular model space, we aimed to minimise the number of features where possible to avoid creating a prohibitively large search space and refrain from including complexities that do not substantially contribute to the description of the data. To this effect, we chose to not evaluate more than 4 transit compartments. Additionally, OMEGA correlations were not estimated5 which potentially could be a source of bias given highly correlated random effects38. In the future, this could be addressed by adding an exhaustive search for the OMEGA covariance matrix once the top model structure has been selected13,39 or after finalisation of the covariate analysis.
Penalty function
As with the model search space, we designed a fitness function which could be applied generically across datasets to rank potential structures during a model search which was composed of the NONMEM OFV (two times the negative log likelihood [LL]) and two penalty functions.
$${{{\rm{Fitness}}}}=-2\,\ast \,{{{\rm{LL}}}}+{{{\rm{parameter}}}}\,{{{\rm{count}}}}\,{{{\rm{penalty}}}}+{{{\rm{parameter}}}}\,{{{\rm{estimate}}}}\,{{{\rm{penalty}}}}$$
The parameter count penalty was an AIC term that applied a penalty of 10 for each estimated fixed effect parameter (THETA) and random-effect parameter (OMEGA and SIGMA)7,11,15 in addition to each transient compartment. The parameter estimate penalty was designed to prevent models with undesirable values from being chosen and was calculated using the estimated values given by NONMEM. A parameter estimate penalty was calculated for each parameter value by linearly interpolating between set of thresholds and weights detailed in Table 3. The total estimated value penalty was the sum of the individual penalty scores.
The weights and thresholds for the estimated value penalty were initially chosen based on domain knowledge and were further optimised using initial results for model searches using four diverse datasets: ribociclib, osimertinib, olaparib and camizestrant. Weights were tuned manually to drive the search away from implausible structures and ensure the search generated models which matched expert model builder opinion. During this process, new weights and thresholds were added to avoid specific irregularities such as abnormal relative standard errors and estimated OMEGA values. In this context, piecewise linear functions were preferred to simple binary thresholds in order to avoid adding excessive relevance to the numeric values being chosen as thresholds. A search using the tezepelumab data was conducted to assess the adequacy of the final penalty weights and show general applicability of the approach. Additional sensitivity analysis to further optimise these parameters with additional data was outside the scope of this study.
Optimisation algorithms
The proposed model search utilises Bayesian optimisation with a random forest surrogate model40 to identify approximately good area of the model space combined with an exhaustive local search to determine the optimal solution in the local area. Bayesian optimisation is a machine learning framework designed for derivative-free black box systems which are expensive to evaluate41. The surrogate model predicts optimal areas of the model space where the fitness score is minimised using the 17 features which describe each candidate popPK model. The random forest Bayesian optimizer was implemented in pyDarwin using Scikit-Opt40 with the default hyperparameters. A negative expected improvement acquisition algorithm was used to select new candidate structures to evaluate at each stage. On completion of the global optimisation step, an exhaustive local search was used to explore the immediate neighbourhood of the current best model structure. In this study, neighbours are defined as models located either 1 or 2 bit flips away from the binary representation of the model features13. The local search starts by evaluating the neighbourhood of the current best model structure, if an improvement in fitness is observed after evaluation the search will move to a new neighbourhood and halt if not. The automated model search was evaluated using the same extravascular model space and penalty function for each dataset. Three stages of candidate selection were run sequentially. Step 1: The global search generated 40 candidate model structures for 4 generations. Step 2: A local search was performed from the top model by fitness and a second structure which was the best ranking within the subgroup those 2-bit flips away from the top structure. The two local neighbourhoods were explored exhaustively using a 1-bit flip neighbourhood function. The search runs from both neighbourhoods until no improvement is observed compared to the previous step. Step 3: Finally, a second local search was run using a single top model from step 2 using a 2-bit flip neighbourhood function.
Each model search was run with 40 models evaluated in parallel (one per 2.1 GHz CPU-core) in a computational environment with 40 cores and 40 GB of memory. Searches were run via a Slurm job queue system on a high-performance computer (HPC) cluster running GNU/Linux. Search results were analysed using the Python packages Pandas (1.2.4), Numpy (1.20.3) and Seaborn (0.12.2). Visulisations of the model search in 2D were generated by performing dimension reduction on feature vectors of the models explored during the search. Principal component analysis (PCA) and t-distributed Stochastic Neighbour Embedding (t-SNE) algorithms were implemented in Scikit-learn(1.1.3)42.
Statistics and reproducibility
Each model search was repeated 5 times with a different seed for the optimiser to test reproducibility of the search strategy. For each set of searches, we found fewer than 5 unique top models across the 5 repeat searches. The overall top model for each dataset was defined as the model which had the lowest fitness across every repeat search. The performance metrics of fitness score, penalty and OFV were taken are given for the top performing automatically discovered model. Non-functional metrics on search time and the number of popPK models evaluated was given as a mean average across the 5 repeat model runs. For each search, the same model space was used with small dataset-specific configurations for feature names, unit transforms and scaling. NONMEM parameter estimation was run with a fixed seed for all candidate model evaluations.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
