AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery

Machine Learning


Details of the chemical synthesis for the compound libraries can be found in the Supplementary Information file.

Data preparation

Virtual library

We utilized the Ugi combinatorial chemistry method to design diverse head groups, connecting groups, and two distinct alkyl chains, using the Markush Editor in the ChemAxon Marvin Suite (Marvin 23.4.0, ChemAxon, https://www.chemaxon.com). The resulting virtual library contained ~60,000 lipid structures, which were then exported as SMILES strings. This virtual library comprises multiple carbon chain lengths, from C6 to C26. In addition, the presence or absence of ester bonds and their position in the carbon chain are varied to improve the chemical diversity of the virtual library. The surface charge of an LNP is usually determined by the lipids’ head groups, and the head group is critical for mRNA binding. Amine groups, especially tertiary amines, are commonly used as lipid head groups to form hydrogen bonds with mRNA.

Experimental library

Our experimental library contains 20 head groups, 12 carbon chains with ester bonds, and 5 carbon chains with isocyanide groups. We selected 1200 lipids for chemical synthesis and in vitro mTP experiments in HeLa and RAW 264.7 cell lines, and labeled each of the 1200 lipids with its corresponding in-cell mTP. These data are generated by the ChemAxon Marvin Suite as SMILES files (SMILES files in SI). The dataset is split based on Murcko scaffolds62 to ensure a robust validation of our model. This is achieved by extracting the core scaffold of each molecule in the dataset using the Murcko scaffold method in RDKit, which can optionally include chiral centers.

Candidate library

The final library used for model prediction is a filtered subset of the virtual library. The filtering consists of three steps based on availability and rationality. First, we retained the lipids containing tertiary amine structures. Second, we removed tail chains that were too long (>C18) or too short (<C10) based on expert knowledge of plausible ionizable lipid design15. Last, we selected only those reagents commercially available for further validation of the model. Upon completion of the filtering process, the final candidate library comprises ~12,000 lipids (SMILES files in SI), with 22 unique head groups (Supplementary Fig. S29), 9 unique types of Tail 1, and 2 unique types of Tail 2 in this arrangement (Supplementary Fig. S34). In the prediction step of the platform, the model selects the most promising lipids based on the ranking of compounds in the candidate library.

Molecular graph construction

Molecular structures can be naturally represented as graphs where atoms are nodes and bonds are edges. For each molecule, the SMILES representation is converted into a molecular graph using RDKit63, and later input to the neural network model in the platform. This representation captures the topological structure and properties of a molecule effectively. An ionizable lipid molecule graph \(G\) is defined as \(G=\left(V,{E}\right)\), where nodes \(V\) represent the atoms and edges \(E\) represent chemical bonds. The atom node features include the atom type (as on the periodic table) and a flag indicating whether the whole molecule it belongs to is chiral. For a node \(v\), the features are constructed in a two-dimensional vector, \({h}_{v}\in \,{{N}}^{2}\). Edge features are constructed based on respective chemical bond types (i.e., single, double, triple, or aromatic bonds) and the stereochemical directionality (i.e., the rdchem.BondDir) in RDKit. Similarly, the edge features form another two-dimensional vector for each bond between atom \(v\) and \(u\), \({\epsilon }_{v,u}\in \,{{N}}^{2}\).
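For illustration, the feature construction above can be sketched in pure Python with a hand-built example in place of RDKit's parsing. The integer codes below are illustrative stand-ins, not RDKit's actual feature encodings:

```python
# Hypothetical sketch: each node carries (atom_type, chirality_flag) and
# each edge carries (bond_type, bond_direction), all as integers.
ATOM = {"C": 6, "N": 7, "O": 8}                       # atomic numbers
BOND = {"single": 0, "double": 1, "triple": 2, "aromatic": 3}

def build_graph(atoms, bonds, chiral=False):
    """atoms: list of element symbols; bonds: list of (u, v, bond_type).
    Returns node features h_v in N^2 and edge features eps_{v,u} in N^2."""
    flag = 1 if chiral else 0
    node_feats = {i: (ATOM[a], flag) for i, a in enumerate(atoms)}
    edge_feats = {}
    for u, v, t in bonds:
        # molecular graphs are undirected: store both directions
        edge_feats[(u, v)] = (BOND[t], 0)  # 0 = no stereo direction
        edge_feats[(v, u)] = (BOND[t], 0)
    return node_feats, edge_feats

# Ethanol-like fragment: C-C-O
nodes, edges = build_graph(["C", "C", "O"],
                           [(0, 1, "single"), (1, 2, "single")])
```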

The model architecture

The deep learning model in AGILE comprises three major components: (1) the embedding layers to project node and edge features into learnable vectors, (2) the graph encoder for modeling molecular structures, and (3) the descriptor encoder for modeling molecular properties.

Embedding layers

The embedding layers project the integer features in \({h}_{v}\) and \({\epsilon }_{v,u}\) to learnable feature vectors \({h}_{v}^{\left(0\right)}\,{{{\rm{and}}}}\,{\epsilon }_{v,u}^{\left(0\right)},\) which can be optimized later during the training of the whole neural network. Here, both \({h}_{v}^{\left(0\right)}\,{{{\rm{and}}}}\,{\epsilon }_{v,u}^{\left(0\right)}\) are \({R}^{d}\) vectors, and \(d\) is a predefined size of embedding dimensions. To be specific, we first obtained the embedding vectors for both atom type and chirality features in \({h}_{v}\), and added the two vectors elementwise to output the \({h}_{v}^{\left(0\right)}\):

$${h}_{v}^{\left(0\right)}={{Emb}}_{h,0}^{\left(0\right)}\left({h}_{v}\left[0\right]\right)+\,{{Emb}}_{h,1}^{\left(0\right)}\left({h}_{v}\left[1\right]\right),$$

(1)

where [i] denotes the i-th element of the vector. \({{{\rm{Emb}}}}\) is the embedding layer projection. In this work, we use the PyTorch embedding layers (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). Similarly, the \({\epsilon }_{v,u}^{\left(0\right)}\) is computed as:

$${\epsilon }_{v,u}^{\left(0\right)}={{Emb}}_{\epsilon,0}^{\left(0\right)}\left({\epsilon }_{v,u}\left[0\right]\right)+\,{{Emb}}_{\epsilon,1}^{\left(0\right)}\left({\epsilon }_{v,u}\left[1\right]\right).$$

(2)
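Eqs. (1) and (2) amount to summing two learnable lookup tables. A minimal pure-Python stand-in for torch.nn.Embedding (random tables in place of trained parameters) might look like:

```python
import random

random.seed(0)
D = 4  # embedding dimension d (illustrative; the real model uses more)

def make_embedding(vocab_size, dim=D):
    """Stand-in for a learnable embedding table (cf. torch.nn.Embedding)."""
    return [[random.uniform(-1, 1) for _ in range(dim)]
            for _ in range(vocab_size)]

emb_atom = make_embedding(119)  # one row per atom type on the periodic table
emb_chiral = make_embedding(2)  # chirality flag: 0 or 1

def embed_node(h_v):
    """Eq. (1): h_v^(0) = Emb_{h,0}(h_v[0]) + Emb_{h,1}(h_v[1]), elementwise."""
    a, c = emb_atom[h_v[0]], emb_chiral[h_v[1]]
    return [x + y for x, y in zip(a, c)]

h0 = embed_node((6, 1))  # a carbon atom in a chiral-flagged molecule
```

Edge features follow the same pattern (Eq. (2)), with tables indexed by bond type and bond direction instead.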

Graph encoder

We used graph isomorphism network (GIN)64, a type of GNN, to operate on the input molecule graphs and to learn a representation vector for each ionizable lipid molecule. GIN can directly propagate messages among nodes and edges on a graph structure and thus is suitable for processing molecular graphs. Additionally, the advantage of GIN over other GNNs is its provable ability to distinguish non-isomorphic graph structures, matching the discriminative power of the Weisfeiler–Lehman test. This makes GIN more expressive than many other GNNs and a suitable tool for tasks involving molecular graph data. It is worth noting that the implemented GIN model follows similar structures used in MolCLR, so that we can benefit from the general pre-trained molecular model of MolCLR as a warm start for the platform (section “Methods”). The update rule of GIN for a node representation on the \({k}{{{\rm{th}}}}\) layer is given as:

$${h}_{v}^{\left(k\right)}={{MLP}}^{\left(k\right)}\left(\left(1+{\varepsilon }^{\left(k\right)}\right)\cdot \,{h}_{v}^{\left(k-1\right)}+\,{\sum }_{u\in N\left(v\right)}{m}_{u}^{\left(k-1\right)}\right),$$

(3)

where \({h}_{v}^{\left(k\right)}\) is the representation of node \(v\) at the \({k}{{{\rm{th}}}}\) layer, \(N\left(v\right)\) denotes the set of neighbors of node \(v\), and \(\varepsilon\) is a learnable parameter. MLP denotes the stacked fully connected neural network layers. The \({m}_{u}^{\left(k-1\right)}\) is the message propagated from a neighbor \(u\) to the current node. It is computed as the sum of node and edge contributions:

$$\begin{array}{c}{m}_{u}^{\left(k-1\right)}={h}_{u}^{\left(k-1\right)}+\,{\epsilon }_{v,u}^{\left(k-1\right)},\\ {\epsilon }_{v,u}^{\left(k-1\right)}=\,{{Emb}}_{\epsilon,0}^{\left(k-1\right)}\left({\epsilon }_{v,u}\left[0\right]\right)+\,{{Emb}}_{\epsilon,1}^{\left(k-1\right)}\left({\epsilon }_{v,u}\left[1\right]\right).\end{array}$$

(4)

Notably, we use \({h}_{v}^{\left(0\right)}\) and \({\epsilon }_{v,u}^{\left(0\right)}\) from Eq. (1) and Eq. (2) for the first GIN layer.

We stack a total of K GIN layers for the entire graph encoder. To extract the feature of the whole molecular graph \({h}_{G}\), we implemented the mean pooling operation on the final layer to integrate all the node features:

$${h}_{G}={{Mean}}\left(\left\{{h}_{v}^{\left(K\right)}:v\,\in G\right\}\right).$$

(5)

Another fully connected layer is used to transform \({h}_{G}\) to the final lipid representation \({z}_{G}\):

$${z}_{G}={{MLP}}\left({h}_{G}\right).$$

(6)
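A toy pure-Python version of the GIN update (Eqs. (3)–(4)) and the mean pooling in Eq. (5), using scalar features and an identity MLP so the arithmetic is easy to trace; the real model applies multi-layer perceptrons over d-dimensional vectors:

```python
def gin_layer(h, edge_feats, neighbors, eps=0.0, mlp=lambda x: x):
    """One GIN update (Eq. 3) with messages m_u = h_u + eps_{v,u} (Eq. 4).
    h: {node: scalar feature}; edge_feats: {(v, u): scalar};
    neighbors: {node: list of neighbor ids}."""
    return {v: mlp((1 + eps) * h[v]
                   + sum(h[u] + edge_feats[(v, u)] for u in neighbors[v]))
            for v in h}

def mean_pool(h):
    """Eq. (5): whole-graph feature as the mean of final node features."""
    return sum(h.values()) / len(h)

# Path graph 0-1-2 with unit node features and zero edge features
h = {0: 1.0, 1: 1.0, 2: 1.0}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
e = {k: 0.0 for k in [(0, 1), (1, 0), (1, 2), (2, 1)]}
h1 = gin_layer(h, e, nbrs)   # the centre node aggregates two neighbors
h_graph = mean_pool(h1)
```

Stacking K such layers and passing the pooled feature through a final MLP (Eq. (6)) yields the lipid representation \({z}_{G}\).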

Molecular descriptor encoder

In addition to the structure features encoded by the GIN, the platform utilizes another descriptor encoder to explicitly model molecular properties. In our experiment, we found this contributes to a more stabilized training optimization. We hypothesize that this benefit comes from the straightforward utilization of computed properties during the optimization, which relieves the model from learning all information from the structure alone. In the implementation of the platform, the molecular descriptors derived from Mordred32 calculations were used, which contain over 1000 common descriptors for each molecule, including the number of atoms, bonds, etc. These features are encoded by fully connected layers into a representation for these properties, \({z}_{p}\in {R}^{{d}_{p}}\):

$${z}_{p}={{MLP}\, \left({descriptors}\right)}.$$

(7)

The final representation of the molecule is the concatenation of the structure and property representations:

$$z=\left[{z}_{G},\, {z}_{p}\right],$$

(8)

where [,] denotes the concatenation of two vectors.

Model pre-training

The model pre-training aims to learn generalizable lipid representation that can benefit the downstream mTP prediction task. Before our lipid-oriented pre-training, we first initialized the model parameters with the general pre-trained model from MolCLR, which has been trained on over ten million distinct small molecules. The rationale for this initialization is to provide a warm start to a model that already has been trained to capture molecular structures. Next, we perform continuous pre-training on the 60,000 lipids in the virtual library (section “Methods”) using contrastive learning to optimize the model’s performance within the lipid domain.

Contrastive learning objective

Our pre-training objective is to learn ionizable lipid representation through contrasting positive data pairs against negative pairs. The model is trained to minimize the following loss:

$$\begin{array}{c}{L}_{i,j}=-\log \frac{\exp \left(\frac{{sim}\left({z}_{i},\, {z}_{j}\right)}{\tau }\right)}{{\sum }_{k=1}^{2N}{\mathbb{1}}\left\{k\ne i\right\}\exp \left(\frac{{sim}\left({z}_{i},\, {z}_{k}\right)}{\tau }\right)},\\ {sim}\left({z}_{i},\, {z}_{j}\right)=\frac{{z}_{i}\cdot {z}_{j}}{{{{\rm{||}}}}{z}_{i}{{{{\rm{||}}}}}_{2}{{{\rm{||}}}}{z}_{j}{{{{\rm{||}}}}}_{2}},\end{array}$$

(9)

where \({z}_{i}\) and \({z}_{j}\) are the learned lipid representation vectors extracted from a positive data pair, \(N\) is the batch size, and \(\tau\) is the temperature parameter set manually. In this pre-training step, we omitted the descriptor encoder, so the lipid representation only contains the graph structure representation \({z}_{G}\) as in Eq. (6). To construct the positive data pair, each input lipid molecule graph is transformed into two different but correlated molecule graphs using graph augmentation. The molecule graphs augmented from the same molecule are denoted as a positive pair, and those from different molecules are denoted as negative pairs within each batch. During training, the model learns to maximize the agreement of positive pairs while minimizing the agreement of negative ones.
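The loss in Eq. (9) can be sketched directly. The snippet below is a per-pair, pure-Python version for illustration; actual training computes this over batched tensors:

```python
import math

def cosine(a, b):
    """Cosine similarity sim(z_i, z_j) from Eq. (9)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nt_xent(z, i, j, tau=0.1):
    """Eq. (9): contrastive loss for the positive pair (i, j) within a
    batch of 2N augmented views z."""
    num = math.exp(cosine(z[i], z[j]) / tau)
    den = sum(math.exp(cosine(z[i], z[k]) / tau)
              for k in range(len(z)) if k != i)
    return -math.log(num / den)

# Two molecules -> 2N = 4 augmented views; views 0 and 1 come from the
# same molecule (a positive pair), views 2 and 3 from another molecule.
z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pos_loss = nt_xent(z, 0, 1)
neg_loss = nt_xent(z, 0, 2)  # treating a negative as positive costs more
```

Minimizing this loss pulls the two views of the same lipid together in representation space while pushing views of different lipids apart.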

Data augmentation

We used two augmentation strategies inherited from the MolCLR pre-training workflow, at the atom and bond levels; both are consistently employed during the continuous pre-training on lipid molecules. (1) Atom masking: within the lipid molecular graph, atoms are randomly masked according to a specified ratio. This process compels the model to assimilate chemical information, such as atom types and corresponding chemical bond varieties within lipid molecules. (2) Bond deletion: chemical bonds interconnecting atoms are randomly removed in accordance with a designated ratio. As the formation and dissociation of chemical bonds dictate the behavior of ionizable lipid molecules during chemical reactions, bond deletion encourages the model to learn correlations among the substructures of ionizable lipid molecules.
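A hedged sketch of the two augmentation strategies on a simple chain molecule, assuming integer node features and undirected bond lists; the MASK token and the 25% ratios are illustrative choices, not the exact MolCLR settings:

```python
import random

MASK = -1  # hypothetical "masked atom" token (not MolCLR's exact encoding)

def atom_mask(node_feats, ratio, rng):
    """Replace a random fraction of atom features with the mask token."""
    k = max(1, int(len(node_feats) * ratio))
    masked = dict(node_feats)
    for v in rng.sample(sorted(masked), k):
        masked[v] = MASK
    return masked

def bond_delete(bonds, ratio, rng):
    """Drop a random fraction of (undirected) bonds."""
    k = int(len(bonds) * ratio)
    dropped = set(rng.sample(bonds, k))
    return [b for b in bonds if b not in dropped]

rng = random.Random(0)
atoms = {i: 6 for i in range(10)}        # a ten-carbon chain
bonds = [(i, i + 1) for i in range(9)]
view = (atom_mask(atoms, 0.25, rng), bond_delete(bonds, 0.25, rng))
```

Applying this twice to the same molecule yields the two correlated views that form a positive pair for the contrastive objective.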

Model fine-tuning

The lipid-oriented pre-trained model (section “Methods”) serves as the starting point of the fine-tuning stage. During the fine-tuning, we included the molecular descriptor encoder and used the combined output \(z\) in Eq. (8) as the molecule representation. For the property descriptor input, a series of preprocessing procedures are executed, aiming to isolate pertinent features. Initially, descriptors with a standard deviation of zero are eliminated, followed by the selection of descriptors exhibiting correlation with the experimentally determined mTP in both HeLa and RAW 264.7 cells (score of R2 > 0.006), resulting in the identification of 813 salient descriptors (Supplementary Fig. S35). Subsequently, log transformation is applied to descriptors possessing extensive data ranges, with normalization conducted accordingly. The preprocessing steps enacted on the fine-tuning dataset are documented and replicated for the 12,000 lipids in the candidate library in anticipation of the model prediction phase (section “Methods”).
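The three preprocessing steps (zero-variance removal, correlation filtering, log transformation of wide-range descriptors) could be sketched as follows; the column names, the log_range cutoff, and the toy data are hypothetical:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def preprocess(columns, target, r2_min=0.006, log_range=1000.0):
    """Drop zero-variance descriptors, keep those correlated with the
    measured mTP (R^2 > r2_min), and log-transform wide-range columns."""
    kept = {}
    for name, col in columns.items():
        if max(col) == min(col):
            continue  # zero standard deviation: no signal
        if pearson(col, target) ** 2 <= r2_min:
            continue
        if min(col) > 0 and max(col) / min(col) > log_range:
            col = [math.log(x) for x in col]  # tame extensive data ranges
        kept[name] = col
    return kept

# Hypothetical descriptor columns and mTP targets
cols = {"const": [1, 1, 1, 1],
        "good": [1, 2, 3, 4],
        "wide": [1.0, 10.0, 1000.0, 100000.0]}
out = preprocess(cols, target=[1.1, 2.0, 2.9, 4.2])
```

As the text notes, the fitted transforms are recorded and replayed on the candidate library so that predictions see descriptors on the same scale.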

The model is fine-tuned utilizing the 1200 lipids of the experimental library to perform regression on mTP. The mean squared loss between the predicted and ground-truth potency is used to optimize the model parameters:

$${L}_{{mse}}=\frac{1}{n}{\sum }_{i=1}^{n}{\left({Pred}\left({z}_{i}\right)-{y}_{i}\right)}^{2},$$

(10)

where \({Pred}\left(\cdot \right)\) denotes the fully connected layers that perform the mTP prediction, and \({y}_{i}\) is the actual mTP recorded in vitro.

A scaffold-based 80%–10%–10% train–valid–test split is performed on the experimental library. We fine-tune the model on the training set only and evaluate the performance on the validation set using root mean squared error (RMSE) and Pearson correlation with the ground-truth mTP.
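Assuming Murcko scaffolds have already been computed for every molecule (e.g., with RDKit's MurckoScaffold), the scaffold-grouped 80%–10%–10% split can be sketched as below; whole scaffold groups are assigned to a single split so that no scaffold spans both train and test:

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac=(0.8, 0.1, 0.1)):
    """scaffolds: one scaffold SMILES per molecule, precomputed elsewhere.
    Returns (train, valid, test) index lists."""
    groups = defaultdict(list)
    for idx, s in enumerate(scaffolds):
        groups[s].append(idx)
    # fill splits with the largest scaffold groups first
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(scaffolds)
    cap_train, cap_valid = frac[0] * n, frac[1] * n
    train, valid, test = [], [], []
    for g in ordered:
        if len(train) + len(g) <= cap_train:
            train += g
        elif len(valid) + len(g) <= cap_valid:
            valid += g
        else:
            test += g
    return train, valid, test

# Toy library: one dominant scaffold "A" plus two singleton scaffolds
splits = scaffold_split(["A"] * 8 + ["B", "C"])
```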

Model ensemble prediction and candidate ranking

To enhance the model’s robustness and generalizability, the fine-tuning process is carried out ten times, from which the top five models are selected based on RMSE and Pearson correlation performance on the testing set. These five models are subsequently employed for ensemble prediction on the 12,000-member candidate set. We first get the mTP predictions from each model and calculate the average and standard deviation of the five predicted values for each candidate molecule. The standard deviation is then subtracted from the mean predicted value, and the resulting predicted score is used to rank the candidates. We observed that the predicted potencies exhibit distinct stratification based on combinations of headgroups and tails, and the predicted mTP differences between molecules with the same headgroups and similar tails are relatively minor (Supplementary Fig. S36). To increase the diversity of selected candidates, we implement a ranking scheme that sorts candidate lipids by headgroup and tail combinations (Supplementary Fig. S37). Given the predicted values, candidates are first organized by headgroups and subsequently ranked in descending order. Candidates within each head group are then ranked by tail combinations following the same schema. Ultimately, we select the top five head groups and the top three tail combinations from each headgroup, resulting in a final candidate set of 15 lipids.
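The mean-minus-standard-deviation scoring can be sketched in a few lines; the candidate names and per-model prediction values below are hypothetical:

```python
import statistics

def ensemble_score(preds):
    """Score = mean of the model predictions minus their (sample)
    standard deviation, penalizing candidates the ensemble disagrees on."""
    return statistics.mean(preds) - statistics.stdev(preds)

# Hypothetical per-model mTP predictions for three candidate lipids
candidates = {
    "lipid_a": [5.0, 5.1, 4.9, 5.0, 5.0],  # consistent across models
    "lipid_b": [6.5, 3.0, 7.0, 2.5, 6.0],  # same mean, but unstable
    "lipid_c": [4.0, 4.1, 3.9, 4.0, 4.0],
}
ranked = sorted(candidates, key=lambda k: ensemble_score(candidates[k]),
                reverse=True)
```

Note how the unstable candidate drops below a lower-mean but consistent one, which is the intended effect of the penalty.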

Implementation details

The graph encoder in the model consists of a five-layer GIN with ReLU activation. To extract a 512-dimensional lipid representation, an average pooling layer is applied to each lipid molecular graph. A single hidden layer MLP is then employed to map the representation into a 256-dimensional latent space. During model pre-training, the contrastive loss is optimized using the Adam optimizer65, with a weight decay of 10−5, and the temperature is set to 0.1. The pre-training process involves a batch size of 512 for 100 epochs.

For model fine-tuning, an additional MLP with one hidden layer is introduced to map the molecular descriptors into 100-dimensional latent vectors. These vectors are concatenated with the 256-dimensional lipid representation obtained from the GNN encoder. Subsequently, a two-layer MLP is utilized to derive the final prediction value from the concatenated vector. The fine-tuning process employs the Adam optimizer with a weight decay of 10−6 to optimize the loss (Eq. (10)). Each fine-tuned model is trained using a batch size of 128 for 30 epochs.

Comparison to other methods

To assess the precision and reliability of predicted mTP, AGILE was benchmarked against traditional ML algorithms, including Ridge regression38, Lasso39, Gradient Boosting40, and SVM41. To ensure a fair comparison, all models were trained and tested using the mTP results derived from the wet-lab experiment involving the 1200 LNPs in HeLa cells. Specifically, we allocated 80% of the data for model training, 10% for optimal hyperparameter selection, and the remaining 10% for result evaluation. Notably, we only used molecular feature descriptors as the input for the above-mentioned traditional ML algorithms, since they are unable to process molecular structure data. We used R2 and PCC as metrics to evaluate the models’ robustness and accuracy. These evaluation results are included in Supplementary Table 5. AGILE outperforms the other methods with the highest R2 and PCC scores of 0.249 and 0.573, respectively. In contrast, Ridge struggles with an R2 of −1.035 and shows a PCC of 0.514. SVM has an R2 of 0.07 and PCC of 0.409. Gradient boosting has an R2 of 0.09 and PCC of 0.308. Lasso performs with a lower R2 of −0.01 and PCC of 0.04. AGILE’s advantage can be attributed to the combined representation of both lipid structures and molecular descriptors, making it more generalizable and robust for this intricate task of mTP prediction. The overall modest scores across models underscore the inherent challenges of the task, yet AGILE demonstrates great potential given its superior performance.

Model interpretation

Salient molecular descriptors calculation

In our study, we employed the Integrated Gradients66 methodology featured in the Captum67 Python package to interpret the significance of molecular descriptors. The process involves approximating the integral of molecular descriptor gradients in relation to their respective predicted mTP for each ionizable lipid within the candidate library. A molecular descriptor’s prominence is proportionate to the absolute value of its integrated gradient. We performed these computations across all five ensemble models for each target cell line. To calculate an overall significance for each feature, we initially averaged the computed gradients across all input samples on each model, subsequently normalizing these importance scores. The final step involved computing the mean of these importance scores across all five models. The top 20 critical features were selected and visualized based on the calculated importance scores. When assessing feature significance in the context of headgroups, we averaged the integrated gradients for each headgroup and then proceeded to normalization. Following this, we averaged the results across the five models for each respective headgroup. The top two significant features for each headgroup were then selected, and their scores were visualized across all headgroups.
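For intuition, the Riemann-sum approximation behind Integrated Gradients can be written in a few lines; the snippet below applies it to a toy analytic function rather than the AGILE model, and checks the completeness property (attributions summing to the output difference):

```python
def integrated_gradients(grad_f, x, baseline, steps=200):
    """Riemann approximation of Integrated Gradients:
    IG_i ≈ (x_i - x0_i) * mean over alpha in (0, 1] of df/dx_i
    evaluated at x0 + alpha * (x - x0)."""
    sums = [0.0] * len(x)
    for s in range(1, steps + 1):
        alpha = s / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(len(x)):
            sums[i] += g[i]
    return [(xi - b) * gi / steps for xi, b, gi in zip(x, baseline, sums)]

# Sanity check on f(x, y) = x^2 + 3y, whose gradient is (2x, 3)
grad = lambda p: [2 * p[0], 3.0]
ig = integrated_gradients(grad, x=[2.0, 1.0], baseline=[0.0, 0.0])
# By completeness, the attributions approximately sum to f(x) - f(0, 0) = 7
```

In practice, Captum performs this path integral with respect to the model's descriptor inputs; the absolute values of the resulting attributions give the importance scores described above.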

Construction of the similarity network on the selected candidates

We constructed a similarity network for the 15 selected candidates respective to each target cell line, with the aim of elucidating the similarities among the candidates. Utilizing the vector representations provided by the corresponding fine-tuned model, we computed the cosine similarities for each candidate pair and chose the four most similar neighbors for each. This generated similarity network was then visualized, with the node sizes representing the relative luciferase units.
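A sketch of the network construction, assuming each candidate's representation vector from the fine-tuned model is already available; with k = 4 each node links to its four most cosine-similar neighbors (k = 2 below for brevity, and the vectors are hypothetical):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

def knn_edges(reps, k=4):
    """Link each candidate to its k most cosine-similar neighbors."""
    edges = {}
    for name, v in reps.items():
        sims = sorted(((cosine(v, u), other)
                       for other, u in reps.items() if other != name),
                      reverse=True)
        edges[name] = [other for _, other in sims[:k]]
    return edges

# Hypothetical 2-D representations for five candidates
reps = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
        "d": [0.1, 0.9], "e": [1.0, 1.0]}
net = knn_edges(reps, k=2)
```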

Molecular structure interpretation

To ascertain the critical areas within the lipid structure that contribute significantly to the model’s predictions, we engaged the Model Agnostic Counterfactual Compounds Generation feature present in the ExMol Python package68. This is accomplished by generating molecular counterfactuals and investigating the alterations required in the lipid molecule to modify its predicted mTP (Supplementary Fig. S38). The molecular counterfactuals produced are designed to retain as much similarity to the input lipid molecule as feasible. If modifications in particular regions result in either an increase or decrease in the mTP, such areas are deemed essential regions. The critical areas identified through this process were visualized for both H9 and R6.

Materials and lipid library synthesis

All materials were prepared and processed without nucleases throughout the synthesis and formulation steps. mFLuc (Translate), Cre recombinase mRNA (TriLink BioTechnologies), mOVA (TriLink BioTechnologies), and EGFP-mRNA (TriLink BioTechnologies) were directly purchased from vendors. PrestoBlue™ Cell Viability Reagent and Quant-it™ RiboGreen RNA Assay Kit were purchased from Thermo Fisher Scientific Inc. All mRNAs were stored at −80 °C and were allowed to thaw on ice before use. DLin-MC3-DMA and ALC-0315 were purchased from Echelon Biosciences. Amine headgroups and starting compounds were purchased from Sigma-Aldrich and TCI America to synthesize ionizable lipids. The tails were purified through flash column chromatography, and their final structures were confirmed using 1H NMR spectroscopy (400 MHz) in CDCl3 with tetramethylsilane as a standard at the UHN Nuclear Magnetic Resonance Core Facility. To synthesize compounds in each well of a 96-well plate with glass inserts, we added 10 μL from a stock solution containing amines, tails, and catalyst that had been mixed and pre-stirred overnight. Specifically, the 350 μM stock solution was prepared by combining the amine and tail components in a 1:1:1 ratio and dissolving them in a 2:1 mixture of methanol and 0.2 equivalents of the phenyl hypophosphoric acid catalyst. This stock solution was then added to each well of the 96-well plate. The covered plates were placed on a shaker and stirred overnight to allow the reactions to proceed. To further analyze our materials, we obtained high-resolution mass spectra using an LC–mass spectrometer at the Centre for Pharmaceutical Oncology of the University of Toronto.

LNP formulation and characterization

Before preparing LNPs, we estimated the average final concentration of ionizable lipids after the final reaction was completed and used a molar ratio of ionizable lipids/DOPE/Chol/C14-PEG2000 of 35/16/46.5/2.5 to formulate LNPs, based on previous work69.

In HTS, LNPs were created by employing an automated liquid handler (OT-2) to mix an aqueous phase with an ethanol phase at a volume ratio of 3:1. The ethanol phase incorporated a crude mixture of ionizable lipids, DOPE (Avanti), cholesterol (Chol, Sigma-Aldrich), and C14-PEG2000 (Avanti), dissolved in ethanol at a pre-established molar ratio. Concurrently, the aqueous phase was formulated in a 10 mM citrate buffer including mFluc. At the high-throughput screening stage, the synthesized ionizable lipids were not purified before LNP formulation and the concentrations of crude lipids were estimated based on average reaction yields obtained from preliminary studies15,16,45,56.

For additional in vitro and in vivo HTS, LNPs were generated by manual pipetting to mix an aqueous phase with an ethanol phase, keeping the same 3:1 volume ratio. The aqueous phase was composed in a 10 mM citrate buffer containing the relevant mRNA. The ethanol phase involved a mixture of ionizable lipid and helper lipids (DOTAP, DOPE, cholesterol, and C14-PEG2000), dissolved at pre-determined molar ratios maintaining an ionizable lipid/mRNA weight ratio of 10:1. MC3-LNP and ALC-0315-LNP were formulated at molar ratios of 50:10:38.5:1.5 (MC3:DSPC:cholesterol:DMG-PEG2000) and 46.3:9.4:42.7:1.6 (ALC-0315:DSPC:cholesterol:ALC0159 [Echelon Biosciences]), respectively.

For in vitro and in vivo studies, excluding the high-throughput screening phase, LNPs were dialyzed against 1× PBS in a 20,000 MWCO cassette (Thermo Fisher) at 4 °C for 6 h before testing in cells and animals. During high-throughput screening, LNPs prepared using an automated liquid handler (OT-2) were directly subjected to cellular assays without additional dialysis steps. The optimization of H9 and R6 LNP formulations for subsequent experiments was achieved using a DoE approach. This process involved the use of JMP 16 statistical software (SAS Institute) to analyze the experimental data. The application of a Box-Behnken design facilitated the development of second-order models across 17 preparation runs. This approach is widely acknowledged as an effective experimental design for the identification of key factors. The design encompassed five factors: lipid/mRNA weight ratio, ionizable lipid molar ratio, helper lipid or cationic lipid molar ratio, PEG molar ratio, and cholesterol molar ratio, each of which was examined at low, middle, and high levels. Before formulation optimization, H9 LNP and R6 LNP were formulated with the same ratio of 35/16/46.5/2.5 (ionizable lipid:DOTAP:cholesterol:DMG-PEG2000). The top-performing H9 LNP and R6 LNP were formulated at molar ratios of 50:10:38.5:1.5 (H9:DOPE:cholesterol:DMG-PEG2000) and 60:15:42.7:1.6 (R6:DOTAP:cholesterol:DMG-PEG2000), respectively.

The size, polydispersity index (PDI), and zeta potential of the LNPs were measured using a Zetasizer Nano ZS (Malvern Instruments). mRNA encapsulation efficiency (EE) was measured by RiboGreen assay as previously described70. The apparent pKa of each LNP was determined from a TNS fluorescence titration. Briefly, 100 mM stock solutions of citric acid, sodium monobasic phosphate, and sodium bicarbonate were prepared. Using a 1 M sodium hydroxide stock solution, each buffer stock was adjusted and aliquoted to create a total of 16 individual buffers with pH values ranging from 2 to 11: citrate buffers from pH 2 to 6, sodium phosphate buffers from pH 6 to 8, and bicarbonate buffers from pH 8 to 11. Separately, a stock solution of TNS in water was prepared at 600 µM, and LNPs were prepared at a concentration of 0.1 mg/ml mRNA. In a black 96-well plate, 100 µl of buffer, 10 µl of LNP, and 2 µl of TNS stock were added to each well. The fluorescence of each well was measured with excitation and emission wavelengths of 325 and 435 nm, respectively. The half-maximal point of the resulting fluorescence-vs-pH plot was taken as the LNP pKa.
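The half-maximal-point calculation at the end of the protocol can be sketched as a linear interpolation on the fluorescence-vs-pH curve; the readings below are hypothetical:

```python
def pka_from_tns(ph, fluo):
    """pH at the half-maximal point of the TNS fluorescence-vs-pH curve,
    found by linear interpolation between the bracketing measurements."""
    half = (max(fluo) + min(fluo)) / 2
    points = list(zip(ph, fluo))
    for (p0, f0), (p1, f1) in zip(points, points[1:]):
        if (f0 - half) * (f1 - half) <= 0:  # half-max crossed here
            return p0 + (half - f0) * (p1 - p0) / (f1 - f0)
    raise ValueError("half-maximal point not bracketed by the pH series")

# Hypothetical plate readings: TNS fluorescence falls as pH rises
pka = pka_from_tns(ph=[4, 5, 6, 7, 8], fluo=[100, 95, 60, 20, 10])
```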

LNP cytotoxicity and stability assay

To evaluate the cytotoxicity of the two post-DoE LNP formulations, H9 LNP and R6 LNP (0.1 µg) were tested in HeLa cells. For the toxicity assay, HeLa cells were seeded in 96-well plates at a density of 10,000 cells/well and incubated overnight to allow attachment. After 6, 12, 24, and 48 h, 20 μl of cell culture supernatant was mixed with 180 μl of QUANTI-Blue™ Solution in each well. Plates were then incubated at 37 °C for 15 min to allow metabolic reduction of resazurin by viable cells. The optical density at 600 nm was immediately read on a Cytation microplate reader to evaluate the relative cytotoxicity induced by each empty LNP.

To study post-DoE LNP stability, the size, PDI, and EE were monitored for 1 week during storage in PBS at −20 °C. Mirroring the storage condition of the lipid formulation used in the Pfizer-BioNTech Comirnaty COVID-19 vaccine, 10% sucrose was added to the H9 LNP solution before the stability test.

In vitro high-throughput screening

Freshly prepared LNPs containing 0.1 μg of mFLuc were added to pre-seeded HeLa and RAW 264.7 cells in 96-well plates. Following overnight incubation, the transfection of mFLuc was measured using the One-Glo Luciferase Assay System (Promega), following the manufacturer’s instructions. The luminescence was quantified using a Cytation imaging reader (BioTek). The mTP value measures how effectively the mRNA transfects cells: it is calculated as the base-2 logarithm of the ratio of mean luminescence intensity between transfected cells and untreated cells at 24 h post-treatment. Specifically, the mTP value is defined as:

$${{mTP}}={{Log}}_{2}\left(\frac{{{Mean}}\; {{luminescence}}\; {{intensity}}\; {{of}}\; {{transfected}}\; {{cells}}}{{{Mean}}\; {{luminescence}}\; {{intensity}}\; {{of}}\; {{untreated}}\; {{cells}}}\right)$$

(11)

To summarize, all LNPs used in the HTS were prepared using liquid handling systems without any purification, while the remaining LNPs were purified by dialysis. Finally, the resulting bioluminescence values were assigned to each SMILES string.
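Eq. (11) reduces to a one-line computation; the luminescence readings below are hypothetical plate values:

```python
import math

def mtp(transfected, untreated):
    """Eq. (11): log2 ratio of mean luminescence, transfected vs untreated."""
    mean = lambda xs: sum(xs) / len(xs)
    return math.log2(mean(transfected) / mean(untreated))

# Hypothetical luminescence readings from one plate (arbitrary units)
score = mtp(transfected=[8000, 7500, 8500], untreated=[1000, 900, 1100])
# An 8-fold mean signal over background gives an mTP of 3
```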

Animal experiments

All animal studies were approved and conducted in compliance with the University Health Network Animal Resources Centre guidelines (AUP#: 6842). Female and male C57BL/6 and ROSAmT/mG Cre reporter mice (4–8 weeks) were purchased from the Jackson Laboratory. The mice were maintained in a controlled environment with a 12-h light/dark cycle. The ambient temperature was maintained at 22–24 °C, and the humidity level was kept at 40–60%. Each cage housed a maximum of five mice to maintain appropriate social interaction and minimize stress levels among the animals.

In vivo luciferase mRNA for bioluminescence

At 6 h after the IM administration of the mRNA LNPs, mice were injected intraperitoneally with 0.2 ml d-luciferin (10 mg/ml in PBS). The mice were anesthetized in a ventilated anesthesia chamber with 1.5% isoflurane in oxygen and imaged 10 min after the injection with an in vivo imaging system (IVIS, PerkinElmer). Luminescence and fluorescence images were quantified using the Living Image software (PerkinElmer). For the bioluminescence assay, the settings were: exposure time, 10 s; binning, medium; f/stop, 1; emission filter, 560 nm (no excitation filter). For the fluorescence imaging assay, the settings were: exposure time, 30 s; binning, medium; f/stop, 1; with excitation/emission filters of 488/505 nm for GFP, 568/580 nm for tdTomato, and 647/660 nm for Cy5. C57BL/6 mice (n = 3/group, 4–8 weeks, female) were purchased from the Jackson Laboratories.

ROSAmT/mG Cre reporter mice transfection analysis

For Cre recombinase mRNA delivery, LNPs co-formulated with Cre mRNA (0.5 mg kg−1) were IM injected into ROSAmT/mG Cre reporter mice (n = 3/group, 4–8 weeks, female, from the Jackson Laboratory). After 7 days, mice were killed, and major organs were collected and imaged using an IVIS imaging system (PerkinElmer). For direct fluorescence imaging, organs and muscle tissues were fixed in 4% buffered paraformaldehyde overnight at 4 °C, then equilibrated in 30% sucrose overnight at 4 °C before freezing in OCT. Three nonconsecutive sections from each organ sample were cut at a depth of 10 μm, mounted with DAPI to visualize nuclei, and imaged for DAPI, tdTomato, and GFP using a fluorescence microscope (Zeiss AXIO Observer 7 Inverted LED Fluorescence Motorized Microscope).

Transfection test in RAW 264.7 cells

LNPs containing 500 ng of GFP-mRNA were added to 24-well plates pre-seeded with RAW 264.7 macrophages and incubated for 48 h at 37 °C. A fluorescence microscope (Zeiss AXIO Observer 7 Inverted LED Fluorescence Motorized Microscope) and a flow cytometer (CytoFLEX S) were used to evaluate GFP expression.

Statistical analysis

The data were subjected to statistical analyses using GraphPad Prism 9 (GraphPad Software). A two-tailed unpaired Student’s t-test was conducted to assess the significance of the comparisons as indicated. Data are expressed as mean ± s.d. P values < 0.05 (*), P < 0.01 (**), P < 0.001 (***), and P < 0.0001 (****) were considered statistically significant.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


