COMET details
This section describes the model architecture and training algorithms of COMET. Pseudocode for inference is provided in Algorithm S1.
COMET model architecture
Lipid molecular structures are encoded into high-dimensional vectors (molecular embeddings), while scalar compositional features are encoded using a Gaussian-based encoder53. Continuous formulation-wide parameters (for example, N/P ratio and volumetric mix ratio) are encoded with Gaussian layers; categorical inputs use one-hot embeddings.
The transformer uses a [CLS] token to aggregate input features across multiple attention layers. For multitask learning, each cell type is assigned a separate [CLS] token and prediction head, enabling task-specific outputs while sharing LNP-level representation learning.
Molecular encoder
COMET is compatible with various molecular encoders; here we use Uni-Mol11, pretrained to recover masked atom types and corrupted three-dimensional coordinates. It offers strong property prediction performance and is used with default hyperparameters (from https://github.com/dptech-corp/Uni-Mol/tree/main/unimol). Pretrained weights are frozen during COMET training. Each compound is encoded into a 512-dimensional vector using atom types and coordinates.
Lipid molar percentages are encoded into 128-dimensional vectors using a shared Gaussian layer. Each component is further assigned a 128-dimensional one-hot embedding (\({z}_{k}^{{\rm{type}}}\)) to distinguish lipid classes. These are concatenated and projected through a two-layer MLP into a 256-dimensional component representation.
N/P ratio and volumetric ratio
N/P ratio is encoded using a separate 256-dimensional Gaussian layer (zN/P). Aqueous/organic ratios, treated as categorical variables, are one-hot encoded (zphase) with 256 dimensions.
CLS token and prediction head
Each cell type uses a learned [CLS] token (zCLS) of dimension 256. These aggregate component and formulation-wide token representations across Nblock transformer layers via attention54. Final predictions are made by passing the [CLS] token through a two-layer MLP (MLPpredict).
Transformer blocks
Each block follows a Pre-LayerNorm structure55 composed of layernorm → self-attention → MLP with residual connections.
Training details
The model is trained with a binary ranking objective56 where, given a pair of LNP samples, the model learns to predict a larger efficacy score for the LNP that has a higher efficacy label value from the other LNP:
$${{\mathcal{L}}}_{\mathrm{ranking}}=-\log \left(\sigma (\;{f}_{\theta }({x}_{\mathrm{h}})-{f}_{\theta }({x}_{\mathrm{l}}))\right.$$
(1)
where xh and xl are high- and low-efficacy LNPs and fθ is COMET’s scoring function. Training uses a batch size of 64 (2,016 pairwise comparisons per batch).
Conflict-averse gradient descent
Conflict-averse gradient descent (CAGrad)33 mitigates conflicting gradients in multitask settings. We apply CAGrad with a coefficient of 0.2 to stabilize training across tasks.
Noise augmentation
To address noise in the experimental data, especially in the fluid handling process, we augment the molar percentage with Gaussian noise proportionate to its value where the standard deviation of the noise is 10% of actual molar percentage.
Label margin
From the label values, we can tell not only which LNP is better than another but also by how much. To train the model to learn this additional knowledge, we include a margin term57 in the binary ranking objective:
$${{\mathcal{L}}}_{\mathrm{ranking}}=-\log \left({\rm{Sigmoid}}(\;{f}_{\theta }({x}_{\mathrm{h}})-{f}_{\theta }({x}_{\mathrm{l}})-{\lambda }_{\mathrm{margin}}(\;{y}_{\mathrm{h}}-{y}_{\mathrm{l}}))\right.$$
(2)
where yh and yl are the (efficacy) label values of the more efficacious and less efficacious LNP, respectively, and λmargin controls how much this objective dominates the training. We use λmargin = 0.01 in our experiments.
Ensembling
For in silico evaluation (Fig. 3e–l), the ensemble is formed by Nmodel models trained with the same hyperparameters and dataset (train/valid/test split) but weights initialized with different random seeds. For the ensemble deployed to infer virtual LNPs, 5 different train (80%)/valid (20%) splits are made in a fivefold manner and each model in the ensemble is trained on a different fold. To ensure that ensembled scores are not biased towards models with high variance, the predicted scores from each model are normalized by making their scores for the LANCE LNPs fit a normal distribution with mean 0 and standard deviation 1 before ensembling. More specifically, for each model, this is done by inferring the predicted scores on all the LANCE LNPs and using the mean (meani) and standard deviation (stdi) of LANCE LNPs’ scores to compute the normalized scores \({y}_{i}^{{\prime} \mathrm{normalized}}\) through
$${y}_{i}^{{\prime} \mathrm{normalized}}=\frac{{y}_{i}^{{\prime} }-{\mathrm{mean}}_{i}}{\mathrm{std}_{i}},\quad i \sim \{1,{.}{.}{.},{N}_{{\rm{model}}}\}$$
(3)
The final ensemble score is the mean of all models’ normalized scores:
$${y}^{{\prime} \mathrm{ensemble}}=\frac{1}{{N}_{{\rm{model}}}}\mathop{\sum }\limits_{i}^{{N}_{{\rm{model}}}}{y}_{i}^{{\prime} \mathrm{normalized}}$$
(4)
COMET is implemented in PyTorch and trained with NVIDIA V100 GPUs.
k-Nearest neighbours and random forest model details
The k-nearest neighbours and random forest models are implemented with the scikit-learn (https://scikit-learn.org/) package, with default hyperparameters. More specifically, the k-nearest neighbours model uses n = 5 nearest neighbours while the random forest model uses n = 100 estimators (trees).
LANCE dataset details
LANCE comprises four parts spanning orthogonal LNP design dimensions: lipid component identities, molar percentages, synthesis parameters (for example, N/P and aqueous/organic volumetric ratios) and high-resolution molar sweeps.
Seven ionizable lipids, three sterols, two helper lipids and two PEG lipids were used (Supplementary Table 14), reflecting the focus of current research12,42. To study molar % effects, we designed 13 lipid ratios by varying one lipid class at a time from a reference BASE ratio (Fig. 1d), based on ref. 20. For instance, ratios I1–I4 modify ionizable lipid %, C1–C3 adjust cholesterol (compensated by helper lipid), and P1–P3 alter PEG lipid %, while the remaining modify multiple components (Supplementary Table 13).
Part 1 (lipid choice)
To examine lipid identity effects, we generated 84 combinations from all permutations of 7 ionizable lipids, 3 sterols, 2 helper lipids and 2 PEG lipids. Paired with 13 molar ratios, this results in 1,092 possible LNPs; 1,066 were tested. After removing 91 overlapping with part 2, this part yielded 975 unique LNPs.
Part 2 (ionizable lipid synergy)
Following studies suggesting synergy from dual-ionizable lipid formulations58, we created LNPs with 60:40 molar splits across all ionizable lipid pairs, distributed across 13 lipid ratios. This yielded 637 additional LNPs.
Part 3 (key synthesis parameters)
To explore synthesis effects, we introduced variation in ionizable lipid/RNA weight ratios (10:1, 15:1 and 20:1) and aqueous/organic phase ratios (1:1 and 3:1). Weight ratios were adjusted by molar mass to maintain equivalent molar %. These parameters were later converted to N/P ratios for model input. This part includes 924 LNPs.
Part 4 (molar percentage sweeps)
To study finer-grained molar % effects, we created 24 evenly spaced intervals from 10% to 80% for ionizable lipid, cholesterol and helper lipid, generating 492 LNPs across 3 focused sweeps.
Formulation ratios
Single-ionizable LNPs span 18 unique N/P ratios, derived from 3 ionizable lipid/RNA weight ratios and 7 ionizable lipids. Dual-ionizable formulations add 63 more, totalling 81 N/P ratios. In molar terms, 13 base lipid ratios and 72 sweep ratios (24 per lipid class) result in 85 total molar compositions.
LNP synthesis
LNPs were synthesized by mixing lipid–ethanol and mRNA–citrate buffer phases, incubated at 4 °C for 10 min. Automated handling was performed on the Tecan Fluent platform. For animal studies, LNPs were mixed, incubated on ice for 10 min and dialysed overnight at 4 °C in PBS (Slide-A-Lyzer, ThermoFisher).
Materials
FLuc mRNA (L-7202, Trilink); lipids (Cayman Chemicals, Avanti); luciferase assay (Steady-Glo, E2550) and Agilent BioTek plate reader for readout. alamarBlue was used for viability assays.
Data processing
Each 96-well plate included a ‘standard’ LNP. Raw luminescence values were normalized to the standard and averaged across four replicates (two biological, two technical). Mean values were log-transformed and min–max normalized to [0, 1].
We have represented several key features of the LANCE dataset in Fig. 2. Below, we explain how these key features were extracted from LANCE. For Fig. 2a, part 1 formulations were selected. For the four ionizable lipids (ALC-0315, DLin-MC3-DMA, C12-200 and CKK-E12), we had 156 formulations containing 2 helper lipids, 3 sterol lipids and 2 PEG lipids (that is, 2 × 2 × 3 = 12 combinations) at 13 molar ratios (12 × 13 = 156 formulations).
For Fig. 2b,c, part 3 formulations containing one ionizable lipid, cholesterol and C14-PEG were selected. Two molar ratios of the lipid components (which are shown in the figure) were studied. The ionizable lipid to mRNA molar ratio was 10,162. The aqueous to organic volume ratio was varied. For Fig. 2d, part 3 formulations containing one ionizable lipid, DOPE, cholesterol and C14-PEG were selected. Two molar ratios of the lipid components (which are shown in the figure) were studied. The organic to aqueous volume ratio was held at 1:3.
Figure 2e was generated from part 2 data. Only formulations containing DOPE, cholesterol and C14-PEG were used for the graph. The name of the first ionizable lipid was listed as the title of graph and the second ionizable lipid name was the row name. The total molar content of the ionizable lipids was the column name. The molar ratio of ionizable lipid 1/ionizable lipid 2 is 1.5. The molar ratio of DOPE/cholesterol was 0.34. The molar % of C14-PEG was 2.5%. The molar ratio of ionizable lipid/mRNA was 10,162. The entire library was used to construct Fig. 2f. We calculated the normalized transfection efficacy for the 30th and 70th percentile formulations in B16-F10 and DC2.4 cells. These values were as follows: 70th percentile, B16-F10 = 0.43887; 30th percentile, B16-F10 = 0.24315; 70th percentile DC2.4 = 0.64623; 30th percentile DC2.4 = 0.30946. Formulations above and below these values in the respective cell lines were selected and are plotted in Fig. 2f.
In vitro validation details
The LNPs are named according to the groups to which they belong. A summary of the prefixes used here is given in Supplementary Table 16.
Clinically approved LNP baselines
The recipes for the 3 clinical LNP baselines are based on the literature39 and synthesized in an aqueous/organic volumetric of 3:1 following what is typically used in previous work.
Top LANCE LNP hits baselines
To find strong and reliable LNP baselines from LANCE, we randomly select 10 LNP formulations from the 90th percentile for each cell line to again screen them with the respective cell line to check for reproducibility. Among these ten formulations, three LNPs with their normalized efficacy value closest to their original LANCE efficacy label values were selected as LANCE baseline LNPs.
Exploratory LNP library
To span a vast formulation space, the virtual library was generated by enumerating through possible LNP features such as lipid choices, their molar percentages and key synthesis parameters such as N/P ratios and aqueous/organic volumetric ratios, according to Supplementary Table 15. To find LNPs that are different from the hits in the LANCE dataset, formulations within a 10% L1 distance lipid molar percentage neighbourhood of any top 10% most efficacious LANCE hits were excluded. After this step, the exploratory library has 27,354,600 and 34,539,960 formulations for DC2.4 and B16-F10, respectively. An ensemble of five COMET models predicted efficacy in both cell lines. The top 0.1% highest-scoring LNPs were selected (34,529 B16-F10 and 27,354 DC2.4).
The next step removes formulations based on uncertainty in COMET prediction. We capture the level of uncertainty by first computing the standard deviation (σ) between the models’ prediction (\({y}_{i}^{{\prime} \mathrm{normalized}}\) in equation (3)) within the ensemble. We then scale the standard deviation by division with a non-negative predicted efficacy term to get a relative uncertainty value (urel):
$${u}_{{\rm{rel}}}=\frac{\sigma }{{\hat{y}}^{\rm{ensemble}}},\quad {\hat{y}}^{\rm{ensemble}}={y}^{{\prime} \rm{ensemble}}-{y}^{{\prime} \rm{ensemble,min,LANCE}}$$
(5)
where \({y}^{{\prime} \mathrm{ensemble,min,LANCE}}\) is the minimum ensemble score among the LANCE LNPs. Any formulations with negative \({\hat{y}}^{\mathrm{ensemble}}\) term were dropped. Supplementary Fig. 13 shows the distribution of this relative uncertainty value. Formulations with largest 50% relative uncertainty values were removed, leaving 17,269 B16-F10 and 13,677 DC2.4 formulations.
To promote chemical diversity, K-means clustering (on 14-dimensional vectors encoding lipid molar percentages) grouped these candidates into 10 clusters. Clustering was repeated 1,000 times to stabilize assignments. The highest-scoring formulation in each cluster was selected, resulting in ten diverse in silico hits per cell line (Supplementary Tables 17 and 18).
Lead optimization LNP library
For each cell type, three top LANCE hits (from ‘Top LANCE LNP hits baselines’ section) were used as starting points. Around each, virtual candidates were generated by (1) exploring within a 20% L1 molar percentage distance, (2) substituting at least one lipid (6 ionizable lipids, 2 cholesterols, 1 helper and 1 PEG) and (3) altering the N/P ratio.
To generate three diverse candidates per lead, we segmented the neighbourhood into three zones: (1) molar % segment (within 20% L1, no lipid changes), (2) substitute-lipid segment (within 20% L1, but with at least one different lipid) and (3) N/P ratio segment (differing N/P ratio). From each zone, the top predicted LNP was selected (Fig. 3d, right). This yielded three optimized LNPs per lead. The virtual library size ranged from 1.5 million (single-ionizable lipid) to 9 million (dual-ionizable lipid) candidates. The sixfold increase in dual-ionizable lipid cases arises from combinatorial enumeration: each minor ionizable lipid was paired with six major ones. By contrast, single-ionizable lipid compositions require no pairing. The final selected formulations for validation are listed in Supplementary Tables 19 and 20.
PBAE synthesis
The compositions and molar ratios of amines, diacrylates and branching agents are listed in Supplementary Table 21. To synthesize PBAE polymers, the combination of the amines, diacrylates and branching agents were used. In brief, in a 20 ml glass vial, the entire weight of diacrylate and branching agent was added. Then, the solvent (dimethylformamide) was added to the reaction mixture. Later, the reaction vials were placed on a hotplate at 90 °C. After 24 h, the vials were removed from the hotplate and cooled to room temperature. The amines were added to the reaction vial and placed back on the hotplate at 90 °C and the reaction was allowed to proceed for 48 h. Finally, the vials were removed from the hotplate and allowed to cool to room temperature. Then, the reaction mixture was added (drop-by-drop) into a beaker containing ~150 ml ice-cold diethyl ether (~10× excess volume). The collected samples were transferred to 50 ml tubes and centrifuged at 1,000 × g for 3 min to pellet the polymer. Later, the supernatant was removed and dissolved in the minimal possible volume of dimethylformamide. This purification step was repeated three times. Final polymers were dried under vacuum and solubility tested in ethanol.
Representing PBAEs in COMET
PBAEs were represented as a combination of their diacrylate–amine repeating unit and branching agent, each with unique component-type embeddings. The repeating unit was treated as a fifth molar component type alongside lipids, with its molar concentration estimated from polymer weight and molecular weight. Total molar percentages of PBAE and lipids sum to 100%. Inference proceeds as in lipid-only LNPs (‘COMET details’ section).
COMET PBAE LNP lead optimization hits
Two top-performing PBAE LNPs per cell type were used as starting points. Around each, virtual candidates were generated by (1) exploring within a 20% L1 molar percentage neighbourhood, (2) substituting lipids (6 ionizable lipids, 2 sterols, 1 helper and 1 PEG) and (3) converting to dual-ionizable compositions. To select three diverse candidates, we defined three non-overlapping segments: one within the 20% L1 distance but must have the same lipid choices, one with at least one different lipid compound and one with a dual-ionizable lipid configuration. The top predicted LNP from each segment was chosen (Fig. 3d, right). Final hits are detailed in Supplementary Tables 22 and 23.
Human IL-15 screening
The IL-15 mRNA is synthesized via in vitro transcription with a HiScribe T7 mRNA kit with CleanCap Reagent AG (E2080S) from New England Biolabs, with 5-methoxy-UTP (N-1093) from Trilink. The LNP transfection is done at an mRNA concentration of 0.25 µg ml−1 in the 96-well plate format. The Human IL-15 expression level is measured with Human IL-15 Uncoated ELISA (88-7620) procured from Invitrogen, after 16 h of incubation of HepG2 cells with LNP. Raw efficacy data are normalized, similar to bioluminescence data mentioned above, before used as dataset for machine learning experiments. This dataset (20%) is randomly split into test set, while the rest is used as the train and validation sets.
Lyophilization of LNPs
The LNPs are synthesized in a tris buffer (5 mM tris buffer, pH 8). After synthesis, the LNP formulations are frozen at −80 °C for 2 h before undergoing the following lyophilization process: equilibrate at −40 °C for 2 h, in atmosphere → −40 °C for 21 h, in vacuum → 25 °C for 2 h, in vacuum. Labconco FreeZone 6 l with a Stoppering Tray Dryer was used for lyophilization.
Degradation in the efficacy
Post-lyophilization efficacy values were computed and normalized similarly to the LANCE label values (‘Data processing’ section). The degradation of efficacy owing to lyophilization was calculated by subtracting the post-lyophilization efficacy values score from the LANCE B16-F10 values.
Animal experiments
Animal experiments for this study were approved by the Massachusetts Institute of Technology Institutional Animal Care and Use Committee and were consistent with local, state and federal regulations as applicable. Female C57BL/6J mice (000664, The Jackson Laboratory) were used in the experiments. For imagining, d-luciferin (LUCK-1G, Gold Biotechnology) solubilized in PBS was administered via intraperitoneal injection and the mice were imaged using an IVIS imaging system (PerkinElmer).
t-SNE visualization
We selected the COMET model most correlated (Spearman) with ensemble scores across a random virtual LNP subset. LNP features for t-SNE were the final [CLS] token representations. To ensure even distribution across ionizable types, dual-ionizable lipid LNPs were treated as a distinct class, and 1,250 LNPs per class (8 total) were randomly sampled (10,000 total).
Integrated gradients implementation
To execute integrated gradients (IG) with COMET’s multimodal inputs, we adapted the Captum library. IG computes attribution by integrating gradients along a path from reference to input. Feature attributions were computed per LNP, baseline-subtracted and averaged across each group. Non-PBAE LANCE LNPs were used as the baseline. Attribution scores were normalized (max = 1) and averaged across ensemble models.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
