Prospective de novo drug design with deep interactome learning

Machine Learning


Neural network architecture

The DRAGONFLY method employs a graph neural network architecture73,74,75. A graph transformer neural network (GTNN) encodes the input molecular graph, which is a 2D graph for ligands and a 3D graph for protein binding sites, into a condensed one-dimensional (1D) feature vector. A chemical language model (CLM) based on an RNN-LSTM32,76 architecture then decodes this feature vector into a molecular string, thereby generating the molecule.

Graph transformer neural network

Message passing: The atomic features were embedded and transformed using a multilayer perceptron (MLP) to obtain atomic feature vectors \(\mathbf{h}_{i}^{0}\). Message passing, as suggested by Satorras et al.77 and used in other 3D-based prediction tasks78,79, was then applied iteratively to all atomic representations over L = 3 layers, starting from \(\mathbf{h}_{i}^{0}\). Edges were defined differently in the 2D and 3D graph representations. In the 2D graph, edges were established between atoms connected by covalent bonds. In the 3D graph, edges were formed between all atoms situated within a radius of 4 Å of each other. The 2D graph thus encodes covalent connectivity, whereas the 3D graph encodes spatial proximity between atoms. In each iteration of the message-passing layer, the atomic representations underwent a transformation as described by Equation (1)

$${{{{{{{{\bf{h}}}}}}}}}_{i}^{l+1}=\phi \left({{{{{{{{\bf{h}}}}}}}}}_{i}^{l},\mathop{\sum}\limits_{j\in {{{{{{{\mathcal{N}}}}}}}}(i)}\psi \left({{{{{{{{\bf{h}}}}}}}}}_{i}^{l},{{{{{{{{\bf{h}}}}}}}}}_{j}^{l}\right)\right),$$

(1)

for 2D graph structures, and Equation (2)

$${{{{{{{{\bf{h}}}}}}}}}_{i}^{l+1}=\phi \left({{{{{{{{\bf{h}}}}}}}}}_{i}^{l},\mathop{\sum}\limits_{j\in {{{{{{{\mathcal{N}}}}}}}}(i)}\psi \left({{{{{{{{\bf{h}}}}}}}}}_{i}^{l},{{{{{{{{\bf{h}}}}}}}}}_{j}^{l},{{{{{{{{\bf{r}}}}}}}}}_{i,j}\right)\right),$$

(2)

for 3D graph structures.

In Equations (1) and (2), \(\mathbf{h}_{i}^{l}\) is the atomic representation of the i-th atom at the l-th layer; \(\mathcal{N}(i)\) is the set of neighboring nodes of atom i connected via edges; \(\mathbf{r}_{i,j}\) is the inter-atomic distance represented in terms of Fourier features, using a sine- and cosine-based encoding; ψ is an MLP transforming node features into message features \(\mathbf{m}_{ij}\), with \(\mathbf{m}_{ij}=\psi(\mathbf{h}_{i}^{l},\mathbf{h}_{j}^{l},\mathbf{r}_{i,j})\) for 3D graphs and \(\mathbf{m}_{ij}=\psi(\mathbf{h}_{i}^{l},\mathbf{h}_{j}^{l})\) for 2D graphs; ∑ denotes the permutation-invariant pooling operator (i.e., sum) transforming \(\mathbf{m}_{ij}\) into \(\mathbf{m}_{i}\), with \(\mathbf{m}_{i}=\sum_{j\in\mathcal{N}(i)}\mathbf{m}_{ij}\); and ϕ is an MLP transforming \(\mathbf{h}_{i}^{l}\) and \(\mathbf{m}_{i}\) into \(\mathbf{h}_{i}^{l+1}\). The atomic features from all layers \([\mathbf{h}_{i}^{l=1},\mathbf{h}_{i}^{l=2},\mathbf{h}_{i}^{l=3}]\) were concatenated and transformed via an MLP, resulting in final atomic features \(\mathbf{H}_{i}\). The features \(\mathbf{H}_{i}\) were subsequently pooled into a molecular representation via a graph multiset transformer (GMT) and further transformed via two MLPs into the two 1D latent space representations \(\mathbf{l}_{t=0}^{1}\) and \(\mathbf{l}_{t=0}^{2}\). A detailed description of the GMT module can be found elsewhere30.
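As a minimal sketch of the update in Equations (1) and (2), the following plain-Python example performs one message-passing step on a toy graph. It is an illustration under stated assumptions, not the DRAGONFLY implementation: ψ and ϕ are stood in for by single random linear maps with a tanh activation rather than trained MLPs, and the names `linear`, `random_matrix`, and `message_passing_layer` are hypothetical.

```python
import math
import random

def linear(weights, vec):
    """Matrix-vector product followed by tanh (a stand-in for an MLP)."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]

def random_matrix(n_out, n_in, rng):
    """Random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def message_passing_layer(h, neighbors, psi_w, phi_w, edge_feats=None):
    """One update h_i^{l+1} = phi(h_i^l, sum_j psi(h_i^l, h_j^l[, r_ij])).

    h          : list of node feature vectors (one list of floats per atom)
    neighbors  : dict mapping atom index -> list of neighbor indices
    edge_feats : optional dict (i, j) -> distance encoding, for 3D graphs only;
                 psi_w must then accept the longer concatenated input
    """
    dim = len(h[0])
    h_new = []
    for i, h_i in enumerate(h):
        m_i = [0.0] * dim  # sum pooling: independent of neighbor order
        for j in neighbors.get(i, []):
            inp = h_i + h[j]  # list concatenation of the two node vectors
            if edge_feats is not None:
                inp = inp + edge_feats[(i, j)]
            m_ij = linear(psi_w, inp)
            m_i = [a + b for a, b in zip(m_i, m_ij)]
        h_new.append(linear(phi_w, h_i + m_i))
    return h_new

rng = random.Random(0)
dim = 4
h0 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
bonds = {0: [1], 1: [0, 2], 2: [1]}        # toy 2D graph: covalent bonds only
psi_w = random_matrix(dim, 2 * dim, rng)   # psi takes (h_i, h_j)
phi_w = random_matrix(dim, 2 * dim, rng)   # phi takes (h_i, m_i)
h1 = message_passing_layer(h0, bonds, psi_w, phi_w)
```

Because the messages are summed, reordering the neighbor list of an atom leaves its updated representation unchanged, which is the permutation invariance referred to above.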

Long-short-term memory neural network

LSTM neural networks are a class of recurrent neural networks designed to model and generate sequences, such as strings of characters. Their ability to capture dependencies across a sequence makes them suitable for de novo drug design applications. Here, the LSTM architecture converts the latent representations produced by the GTNN (i.e., \(\mathbf{l}_{t=0}^{1}\) and \(\mathbf{l}_{t=0}^{2}\)) into a molecule represented in string form (SMILES or SELFIES). \(\mathbf{l}_{t=0}^{1}\) and \(\mathbf{l}_{t=0}^{2}\) are used as the initial hidden states of the LSTM architecture. At each time step t, the next character of the sequence \(\omega_{t+1}\) is predicted given the two hidden states \(\mathbf{l}_{t}^{1}\) and \(\mathbf{l}_{t}^{2}\), the two memory cell states \(\mathbf{c}_{t}^{1}\) and \(\mathbf{c}_{t}^{2}\), and the embedding \(\mathbf{k}_{t}\) of the previous character in the sequence \(\omega_{t}\). This transformation is conducted using four non-linear transformations via Equation (3):

$$\begin{aligned}\mathbf{g}_{i}&=\sigma(\mathbf{W}_{ix}\mathbf{k}_{t}+b_{ix}+\mathbf{W}_{il}\mathbf{l}_{t-1}+b_{il})\\ \mathbf{g}_{f}&=\sigma(\mathbf{W}_{fx}\mathbf{k}_{t}+b_{fx}+\mathbf{W}_{fl}\mathbf{l}_{t-1}+b_{fl})\\ \mathbf{g}_{o}&=\sigma(\mathbf{W}_{ox}\mathbf{k}_{t}+b_{ox}+\mathbf{W}_{ol}\mathbf{l}_{t-1}+b_{ol})\\ \widetilde{\mathbf{c}}_{t}&=\tanh(\mathbf{W}_{cx}\mathbf{k}_{t}+b_{cx}+\mathbf{W}_{cl}\mathbf{l}_{t-1}+b_{cl})\\ \mathbf{c}_{t}&=\mathbf{g}_{f}\odot\mathbf{c}_{t-1}+\mathbf{g}_{i}\odot\widetilde{\mathbf{c}}_{t}\\ \mathbf{l}_{t}&=\mathbf{g}_{o}\odot\mathbf{c}_{t}\end{aligned}$$

(3)

where \(\mathbf{l}_{t}\) and \(\mathbf{c}_{t}\) represent the hidden state and the memory cell state at time t, respectively. \(\mathbf{g}_{i}\), \(\mathbf{g}_{f}\), and \(\mathbf{g}_{o}\) represent the input, forget, and output gates, respectively. σ and ⊙ indicate the sigmoid activation function and the Hadamard product80, respectively. \(\widetilde{\mathbf{c}}_{t}\) represents the candidate memory cell state, which is used to update the previous memory cell state \(\mathbf{c}_{t-1}\). W and b are the weights and biases used for the corresponding linear transformations. The resulting updated hidden state \(\mathbf{l}_{t}\) is then transformed using a softmax activation function to obtain a logit vector \(\hat{\mathbf{y}}_{t}\) (i.e., a vector with the dimension of the alphabet Ω) via Equation (4):

$${\hat{{{{{{{{\bf{y}}}}}}}}}}_{t}={{{{{{{\rm{softmax}}}}}}}}({{{{{{{{\bf{W}}}}}}}}}_{yl}{{{{{{{{\bf{l}}}}}}}}}_{t}+{b}_{yl})$$

(4)

Throughout the training phase, the cross-entropy loss was computed from \(\hat{\mathbf{y}}_{t}\) and the ground truth \(\mathbf{y}_{t}\). The ground truth vector \(\mathbf{y}_{t}\) was one-hot encoded: for each prediction in the sequence, it contained zeros in all positions except that of the expected character, which was assigned a value of 1. This loss was then backpropagated through the LSTM and GTNN networks in an end-to-end manner. Training used teacher forcing, as described by Lamb et al.81.
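The following plain-Python sketch walks through a single decoding step: the gate updates of Equation (3) as printed, the softmax of Equation (4), and the cross-entropy against a one-hot target. It rests on several simplifying assumptions: only one LSTM layer is shown (the text describes two hidden states), the two bias terms of each gate are merged into one vector, and all weights are random. The names `affine`, `lstm_step`, `softmax`, and `cross_entropy` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W_x, W_l, b, k, l_prev):
    """W_x k + W_l l_prev + b, with the two bias terms of Eq. (3) merged into b."""
    return [
        sum(wx * kv for wx, kv in zip(row_x, k))
        + sum(wl * lv for wl, lv in zip(row_l, l_prev))
        + bias
        for row_x, row_l, bias in zip(W_x, W_l, b)
    ]

def lstm_step(params, k_t, l_prev, c_prev):
    """One LSTM step following Equation (3) as printed."""
    g_i = [sigmoid(v) for v in affine(*params["i"], k_t, l_prev)]
    g_f = [sigmoid(v) for v in affine(*params["f"], k_t, l_prev)]
    g_o = [sigmoid(v) for v in affine(*params["o"], k_t, l_prev)]
    c_tilde = [math.tanh(v) for v in affine(*params["c"], k_t, l_prev)]
    c_t = [f * c + i * ct for f, c, i, ct in zip(g_f, c_prev, g_i, c_tilde)]
    l_t = [o * c for o, c in zip(g_o, c_t)]
    return l_t, c_t

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(y_hat, target_index):
    """Loss against a one-hot ground truth with its 1 at target_index."""
    return -math.log(y_hat[target_index])

rng = random.Random(0)
emb, hid, vocab = 3, 4, 5  # toy sizes, not the published hyperparameters

def rand(n_out, n_in):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

params = {gate: (rand(hid, emb), rand(hid, hid), [0.0] * hid) for gate in "ifoc"}
W_y = rand(vocab, hid)

# In the model described above, the initial hidden state would come from the GTNN.
l_t, c_t = lstm_step(params, [0.1, -0.2, 0.3], [0.0] * hid, [0.0] * hid)
y_hat = softmax([sum(w * v for w, v in zip(row, l_t)) for row in W_y])
loss = cross_entropy(y_hat, target_index=2)
```

With teacher forcing, the embedding fed to the next step would be that of the ground-truth character rather than of the character the model just predicted.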

Molecule sampling

Temperature sampling was employed as a mechanism to facilitate the generation of a diverse array of output molecules using a trained DRAGONFLY model7, achieved through Equation (5):

$$P({\hat{{{{{{{{\bf{y}}}}}}}}}}_{t+1}=\omega | {\hat{{{{{{{{\bf{y}}}}}}}}}}_{t=0},…,{\hat{{{{{{{{\bf{y}}}}}}}}}}_{t})=\frac{\exp ({\hat{{{{{{{{\bf{y}}}}}}}}}}_{t}^{\omega }/T)}{\mathop{\sum }\nolimits_{\omega }^{\Omega }\exp ({\hat{{{{{{{{\bf{y}}}}}}}}}}_{t}^{\omega }/T)}$$

(5)

where T is the temperature value, and P the probability of the output representation \(\hat{\mathbf{y}}_{t+1}\) being the character ω given all previous outputs. The character sampling process was regulated by the temperature parameter T. When T is set to a high value (T → ∞), character probabilities tend to equalize across all characters. Conversely, as T decreases towards 0, the highest likelihood predicted by \(\hat{\mathbf{y}}_{t+1}\) approaches 1. In the context of DRAGONFLY applications, four distinct temperature values (0.2, 0.5, 0.8, 1.1) were investigated. A value of T = 0.5 was found to strike the most favorable balance between novelty, diversity, the prediction of active compounds, and synthesizability, as indicated by the outcomes presented in Figs. S9–S10.
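The effect of T in Equation (5) can be illustrated with a short plain-Python sketch. The logits and the three-character alphabet are toy values chosen for illustration, and the function names `temperature_probabilities` and `sample_character` are hypothetical.

```python
import math
import random

def temperature_probabilities(logits, T):
    """Equation (5): softmax over logits divided by the temperature T."""
    scaled = [v / T for v in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in scaled]
    s = sum(e)
    return [v / s for v in e]

def sample_character(logits, alphabet, T, rng):
    """Draw one character according to the temperature-scaled probabilities."""
    probs = temperature_probabilities(logits, T)
    return rng.choices(alphabet, weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]       # toy model outputs for three characters
alphabet = ["C", "N", "O"]

p_cold = temperature_probabilities(logits, 0.2)  # sharply peaked
p_hot = temperature_probabilities(logits, 1.1)   # flatter distribution
token = sample_character(logits, alphabet, 0.5, random.Random(0))
```

At T = 0.2 almost all probability mass sits on the top-scoring character, whereas at T = 1.1 the alternatives keep a substantial share, which is the diversity trade-off described above.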

Atom featurization

Small molecules: The atomic properties of small-molecule ligands were encoded via the following embeddings: 10 atom types [H, C, N, O, F, P, S, Cl, Br, I], two ring types [True, False], two aromaticity types [True, False], and four hybridization types [sp3, sp2, sp, s].

Proteins: The protein binding site was defined by all protein atoms within a 5 Å radius of a ligand atom. The atomic properties of the respective protein binding sites were encoded using the following four features: (i) an embedding of the atom types using 22 different embeddings, (ii) an embedding of the combination of amino acid and atom types covering 225 different embeddings, (iii) the distance to the closest atom of the bound small-molecule ligand, and (iv) the calculated B factor, aiming to quantify protein flexibility and intrinsic disorder at the corresponding atom (Section S3).

Bond types: Edges were represented by inter-atomic distance in terms of Fourier features, using a sine- and cosine-based encoding for 3D graphs82. No edge features were used for 2D graphs. Edges were introduced between covalently bound atoms for the 2D graphs, and between all atoms within a 4 Å radius from each other for the 3D graphs.
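A minimal sketch of the 3D edge construction and distance encoding could look as follows. The 4 Å cut-off comes from the text; the choice of frequencies \(2^{k}\), the number of frequencies, and the inclusive comparison at the cut-off are assumptions made for illustration, and the names `radius_edges` and `fourier_features` are hypothetical.

```python
import math

def radius_edges(coords, cutoff=4.0):
    """Directed edges (i, j) between all atom pairs within `cutoff` angstrom."""
    edges = {}
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j:
                d = math.dist(a, b)
                if d <= cutoff:  # inclusive boundary is an assumption
                    edges[(i, j)] = d
    return edges

def fourier_features(distance, num_freqs=4):
    """Sine/cosine encoding of a distance; the frequencies 2^k are an assumption."""
    feats = []
    for k in range(num_freqs):
        feats.append(math.sin(distance * 2 ** k))
        feats.append(math.cos(distance * 2 ** k))
    return feats

# Three toy atoms on a line: only the first two lie within 4 angstrom of each other.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
edges = radius_edges(coords)
edge_feats = {pair: fourier_features(d) for pair, d in edges.items()}
```

The quadratic loop over atom pairs is adequate for a binding site of a few hundred atoms; larger systems would typically use a spatial index instead.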

Hyperparameters

The selected hyperparameters for the neural network led to a combined count of trainable parameters amounting to 6.94 million (3.49 million for the GTNN encoder and 3.45 million for the LSTM decoder) for the ligand-based design DRAGONFLY model. Similarly, the structure-based design DRAGONFLY model encompassed 7.01 million trainable parameters (3.56 million for the GTNN encoder and 3.45 million for the LSTM decoder).

Scoring

Quantitative structure-activity relationship

Kernel ridge regression (KRR) was employed to establish QSAR models based on descriptors and fingerprints. Kernel-based machine learning, rooted in the work of Krige83, resides within the realm of supervised learning techniques and has found application across a spectrum of machine learning investigations84,85,86,87. The assessment of similarity between two molecules i and j was carried out utilizing the Laplacian Kernel (Eq. (6)):

$$k({{{{{{{{\bf{x}}}}}}}}}_{i},{{{{{{{{\bf{x}}}}}}}}}_{j})=\exp (-\frac{| | {{{{{{{{\bf{x}}}}}}}}}_{i}-{{{{{{{{\bf{x}}}}}}}}}_{j}| {| }_{1}}{\sigma })$$

(6)

where \(\mathbf{x}_{i}\) is the molecular descriptor or fingerprint of molecule i and σ is the length scale hyperparameter. Herein, σ was set to 51.2 after screening the values \(0.1\cdot 2^{i}\) for i in range(1, 20). Three different molecular descriptors were applied in this study, namely, extended-connectivity fingerprints (ECFP, radius = 2, dimension = 512)36, chemically advanced template search (CATS) with absolute feature frequencies67,88, and ultrafast shape recognition with pharmacophoric constraints (USRCAT)38. Once the kernel matrix \(\mathbf{K}\) with entries \(K_{ij}=k(\mathbf{x}_{i},\mathbf{x}_{j})\) was calculated, the fitting coefficients α were computed from the inverse of the regularized kernel matrix via Equation (7):

$${{{{{{{\boldsymbol{\alpha }}}}}}}}={({{{{{{{\bf{K}}}}}}}}+\lambda {{{{{{{\bf{I}}}}}}}})}^{-1}{{{{{{{\bf{y}}}}}}}}$$

(7)

where λ denotes the regularization strength (herein, optimized to \(10^{-7}\)), I the identity matrix, and y the labels of the molecules (herein, bioactivity toward the investigated target). Given a labeled data set with N molecule-label pairs \(\{(\mathbf{x}_{i},y_{i})\}_{i=1}^{N}\), a function was obtained that maps the molecular descriptor of a novel molecule \(\mathbf{x}_{q}\) to its predicted bioactivity \(\hat{y}_{q}\) via Equation (8):

$${\hat{y}}_{q}({{{{{{{{\bf{x}}}}}}}}}_{q})=\mathop{\sum }\limits_{i}^{N}{{{{{{{{\boldsymbol{\alpha }}}}}}}}}_{i}\cdot k({{{{{{{{\bf{x}}}}}}}}}_{i},{{{{{{{{\bf{x}}}}}}}}}_{q})$$

(8)
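Equations (6)–(8) can be sketched end to end in plain Python. This is an illustrative example on toy fingerprints and labels, not the published pipeline: the coefficients are obtained with a small Gaussian-elimination solver instead of an explicit matrix inverse (mathematically equivalent, numerically preferable), and the names `laplacian_kernel`, `solve`, `fit_krr`, and `predict_krr` are hypothetical.

```python
import math

def laplacian_kernel(x_i, x_j, sigma):
    """Equation (6): exp(-||x_i - x_j||_1 / sigma)."""
    l1 = sum(abs(a - b) for a, b in zip(x_i, x_j))
    return math.exp(-l1 / sigma)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_krr(X, y, sigma, lam):
    """Equation (7): alpha = (K + lambda I)^-1 y, computed via a linear solve."""
    n = len(X)
    K = [[laplacian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict_krr(X_train, alpha, x_q, sigma):
    """Equation (8): weighted sum of kernel similarities to the training molecules."""
    return sum(a * laplacian_kernel(x_i, x_q, sigma) for a, x_i in zip(alpha, X_train))

X = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]  # toy fingerprints
y = [6.5, 7.2, 5.9]                              # toy bioactivity labels
alpha = fit_krr(X, y, sigma=51.2, lam=1e-7)
y_hat = predict_krr(X, alpha, X[0], sigma=51.2)
```

With a regularization strength as small as \(10^{-7}\), the model nearly interpolates the training labels, so predicting on a training molecule returns a value very close to its label.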

Molecular novelty

The novelty of the generated molecules was assessed through two distinct metrics: the structural novelty score (\(S_{\rm{ECFP}}\)) and the scaffold novelty score (\(S_{\rm{scaffold}}\)). The structural novelty score was defined as the Jaccard distance (1 minus Tanimoto similarity89) to the most similar molecule within the training data set, using ECFP36 descriptors. The Jaccard distance attains a value of 1 between two molecules when they possess no common structural attributes as identified by ECFP (bits within the ECFP vector). Conversely, it reaches a value of 0 when two distinct molecules share identical structural features (identical ECFP vectors). The scaffold novelty score gauges the novelty of both the atom scaffold (commonly referred to as the Murcko scaffold90) and the carbon scaffold (also known as the skeleton scaffold91) present in a generated molecule. Atom scaffolds were determined by considering the rings and branches of a specific template molecule. In this process, substituents were eliminated, while the identity of atoms and bonds remained unaltered (as detailed in SI2.4). Carbon scaffolds were identified by the carbon framework of a molecule, wherein all non-hydrogen atoms were transformed into carbon atoms and all bonds were replaced by single bonds (illustrated in Fig. S7). The scaffold novelty score combines the atom and carbon scaffold scores. Each of these scores indicates whether the corresponding scaffold is present in any molecule within the training set, as defined by Equations (9)–(11).

$${S}_{{{{{{{{\rm{atom}}}}}}}}}=\left\{\begin{array}{ll}0,\quad &\,{{\mbox{if atom scaffold in training set}}}\,\\ 0.1,\quad &\,{{\mbox{otherwise}}}\,\end{array}\right.$$

(9)

$${S}_{{{{{{{{\rm{carbon}}}}}}}}}=\left\{\begin{array}{ll}0,\quad &\,{{\mbox{if carbon scaffold in training set}}}\,\\ 0.1,\quad &\,{{\mbox{otherwise}}}\,\end{array}\right.$$

(10)

$${S}_{{{{{{{{\rm{scaffold}}}}}}}}}={S}_{{{{{{{{\rm{atom}}}}}}}}}+{S}_{{{{{{{{\rm{carbon}}}}}}}}}$$

(11)

Both structural and scaffold novelty contribute to the overall novelty score, i.e., Equation (12), ranging from 0 (for molecules very close to molecules in the training set) to 1.2 (for molecules with no ECFP overlap with the training set and no shared scaffolds).

$${S}_{{{{{{{{\rm{novelty}}}}}}}}}={S}_{{{{{{{{\rm{ECFP}}}}}}}}}+{S}_{{{{{{{{\rm{scaffold}}}}}}}}}$$

(12)
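Equations (9)–(12) translate directly into a few lines of Python. In this sketch, fingerprints are represented as sets of on-bit indices and scaffolds as plain strings, both supplied by the caller; how they are computed (for example, with a cheminformatics toolkit) is outside the example. The names `jaccard_distance` and `novelty_score` are hypothetical.

```python
def jaccard_distance(bits_a, bits_b):
    """1 - Tanimoto similarity between two sets of fingerprint on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return 1.0 - len(bits_a & bits_b) / len(union)

def novelty_score(query_bits, atom_scaffold, carbon_scaffold,
                  train_bits, train_atom_scaffolds, train_carbon_scaffolds):
    """Equation (12): S_novelty = S_ECFP + S_atom + S_carbon."""
    # S_ECFP: distance to the most similar training molecule
    s_ecfp = min(jaccard_distance(query_bits, t) for t in train_bits)
    # Equations (9) and (10): 0.1 for each scaffold not seen in training
    s_atom = 0.0 if atom_scaffold in train_atom_scaffolds else 0.1
    s_carbon = 0.0 if carbon_scaffold in train_carbon_scaffolds else 0.1
    return s_ecfp + s_atom + s_carbon

# Toy training set: two fingerprints and one known scaffold of each kind.
train_bits = [{1, 2, 3}, {4, 5}]
train_atom = {"c1ccccc1"}
train_carbon = {"C1CCCCC1"}

known = novelty_score({1, 2, 3}, "c1ccccc1", "C1CCCCC1",
                      train_bits, train_atom, train_carbon)
novel = novelty_score({7, 8}, "c1ccncc1", "C1CCC1",
                      train_bits, train_atom, train_carbon)
```

Here `known` sits at the lower bound of the score and `novel` at its upper bound, matching the 0 to 1.2 range stated above.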

Molecular property analysis

Molecular data sets were generated using a DRAGONFLY model, which was trained on a comprehensive data set excluding proteins and ligands associated with 20 specified targets. These targets are listed in Tables S2 and S3. For each target 2000 random molecules were selected. The physicochemical properties of these molecules were computed and subsequently used as input for the DRAGONFLY model. The properties of the generated molecules were visualized in a scatter plot (Fig. 2a) and summarized in Table 3. The scatter plot illustrates the relationship between the actual and predicted properties of the molecules. The mean absolute errors (MAEs) and Pearson correlation coefficients (r) were calculated to assess the predictive performance of the DRAGONFLY model. These statistical measures were derived by comparing the extracted properties of the generated molecules against the properties of the original data set.
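The two statistics named above can be computed as in the following plain-Python sketch. The property values are toy numbers for illustration only, and the function names are hypothetical.

```python
import math

def mean_absolute_error(target, generated):
    """Average absolute deviation between desired and observed property values."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

desired = [300.0, 350.0, 400.0, 450.0]   # toy input property, e.g. molecular weight
observed = [310.0, 340.0, 410.0, 445.0]  # toy property of the generated molecules
mae = mean_absolute_error(desired, observed)
r = pearson_r(desired, observed)
```

A low MAE together with r close to 1 would indicate that the generated molecules track the requested property values closely.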

Drug-target interactome preprocessing

The data necessary for constructing the drug-target graph, referred to as the “interactome,” was sourced from two distinct databases: ChEMBL28 (Version 29) and PDBBind92 (Version 2020).

Preprocessing ChEMBL data

To acquire the necessary interactome data, the ChEMBL29 database28 was queried. Similar to prior studies93, this data extraction process was divided into two stages: In the initial step, a compilation of biological targets was obtained. Subsequently, compounds were extracted for which specific activities against any of these targets were annotated. Single-protein targets that possessed assay information for a minimum of 10 compounds with unique internal identifiers were retrieved from the ChEMBL database. A series of activity and annotation filters were then applied to these compounds. The molecules underwent neutralization, and any salts and solvents were eliminated. For compounds comprising multiple distinct fragments following this “washing” procedure, all but the fragment with the highest number of heavy atoms were discarded. Furthermore, molecules containing <3 or >100 heavy atoms, as well as radical species, were excluded from the data set. This procedure yielded a data set of 742 k unique SMILES-strings with annotated biological affinity. Applying a binding-affinity cut-off of 200 nM, removing duplicates, restricting the ligand SMILES-string length to at most 97 (taking the longest of five randomly sampled SMILES-strings per ligand), and requiring a minimum of five ligands per target resulted in a drug-target graph consisting of 501 k unique binding affinities for 360 k unique ligands and 2989 unique target-IDs.
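The final filtering stage can be sketched as a simple pass over activity records. This is a simplified illustration, not the published pipeline: the record layout `(smiles, target_id, affinity_nm, heavy_atoms)` and the function name `build_interactome` are hypothetical, heavy-atom counts are assumed to be precomputed, and the length check uses the given SMILES string rather than the longest of five randomized ones.

```python
from collections import defaultdict

def build_interactome(records, max_affinity_nm=200.0, max_smiles_len=97,
                      min_ligands_per_target=5):
    """Filter (smiles, target_id, affinity_nm, heavy_atoms) records into a graph."""
    edges = {}
    for smiles, target, affinity, heavy_atoms in records:
        if not 3 <= heavy_atoms <= 100:
            continue  # size filter on heavy atoms
        if affinity > max_affinity_nm or len(smiles) > max_smiles_len:
            continue  # affinity cut-off and SMILES length limit
        edges[(smiles, target)] = affinity  # dict keys remove duplicate pairs
    ligands_by_target = defaultdict(set)
    for smiles, target in edges:
        ligands_by_target[target].add(smiles)
    # keep only targets with enough distinct ligands
    return {t: ligs for t, ligs in ligands_by_target.items()
            if len(ligs) >= min_ligands_per_target}

records = [
    ("CCO", "T1", 50.0, 3),
    ("CCO", "T1", 50.0, 3),     # duplicate entry
    ("CCN", "T1", 150.0, 3),
    ("CC", "T1", 10.0, 2),      # too few heavy atoms
    ("CCCO", "T2", 900.0, 4),   # affinity above the cut-off
]
# The threshold is lowered to 2 here only because the toy data set is tiny.
graph = build_interactome(records, min_ligands_per_target=2)
```

The result maps each retained target to its set of ligands, which corresponds to the adjacency structure of the drug-target graph.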

Preprocessing PDBbind

The PDBbind database (Version 2020) was downloaded from http://www.pdbbind.org.cn/download.php, which yielded 19,443 protein-ligand structures. After filtering out structures annotated with “incomplete ligand structure” or “covalent complex,” a total of 19,000 entries remained. A further filtering step excluded structures with ligand molecular weights outside the range of 100–1200 g mol⁻¹ and binding affinities >10 μM, yielding 17,824 structures. This curated list of entries was then cross-referenced with the target-IDs present in the drug-target graph used for ligand-based design, which contained 501,000 unique binding affinities for around 360,000 unique molecules and 2989 unique target-IDs. This mapping identified 8351 distinct protein structures associated with 744 unique target-IDs. After restricting the drug-target graph to target-IDs with annotated PDB structures, the modified graph comprised around 263,000 unique binding affinities for around 208,000 unique molecules and 744 unique target-IDs. PDB-IDs and ChEMBL target-IDs were linked through UNIPROT-IDs, which both databases provide for individual proteins.

Numerous drug targets exhibit multiple binding sites, including orthosteric sites and various allosteric sites94. Although such details were not present in the ChEMBL database, recognizing these distinct binding sites was deemed essential for effective drug-target interactome learning. Molecules known for their allosteric modulation were extracted from Ref. 95. Subsequently, the drug-target graph was modified such that target-IDs encompassing both allosteric and orthosteric ligands were treated as distinct target-IDs.

Chemical alphabet

DRAGONFLY models underwent training using two distinct chemical alphabets: SMILES strings3 and SELFIES40. To discern the distinct character types in both types of strings, 10 randomly generated SMILES strings were created for each molecule within the data set. For SMILES strings, all observed characters surrounded by brackets ([]), as well as some frequently occurring functional groups (e.g., sulfoxide, nitro, ketone, nitrile) were encoded as a single token (SI5). In both string types, three supplementary characters were introduced to serve as markers for the beginning, end, and padding of the strings: x, y, and z for SMILES-strings, and [\\X], [\\Y], and [\\Z] for SELFIES. Following this procedure, a SMILES-string alphabet ΩSMILES was established, comprising a total of 57 characters. A SELFIES alphabet ΩSELFIES was constructed, encompassing a total of 85 characters (as detailed in Table S1).
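A minimal sketch of the SMILES tokenization could look as follows. The bracket rule and the x, y, z markers follow the description above; treating the two-letter halogens Cl and Br as single tokens is an assumption, the functional-group tokens mentioned in the text are omitted, and the names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Bracketed atoms first, then two-letter halogens (an assumption),
# then any remaining single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|.")
START, END, PAD = "x", "y", "z"

def tokenize(smiles, max_len):
    """Split a SMILES string into tokens and pad to a fixed length."""
    tokens = [START] + TOKEN_PATTERN.findall(smiles) + [END]
    return tokens + [PAD] * (max_len - len(tokens))

tokens = tokenize("C[N+](=O)[O-]", max_len=10)
# tokens == ['x', 'C', '[N+]', '(', '=', 'O', ')', '[O-]', 'y', 'z']
```

Collecting the set of tokens observed over all randomized strings of the data set would then yield the alphabet.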

Absolute free binding energy calculations

Molecules 1 and 2 as well as different ligands from ChEMBL with known PPARγ activity (ChEMBL IDs: ChEMBLl391987, ChEMBL241472, ChEMBL241299, ChEMBL213355, ChEMBL212591) were modeled into the PPARγ-aleglitazar crystal structure (PDB ID: 3G9E)41. The chosen reference molecules from the ChEMBL database were selected based on their structural similarity to compounds 1 and 2 (i.e., possessing (i) a carboxylic acid as head group, (ii) an alkyl or polyethylene glycol linker, and (iii) an aromatic tail), and their comparable binding affinity (i.e., EC50 values ≤5 μM and ≥ 100 nM). After structure preparation, ABFEP simulations were carried out with Schrödinger software (release 2023-4) using default settings and a simulation time of 5 ns for both complex and solvent96. The lowest calculated free energies were obtained for the co-crystallized ligand aleglitazar (EC50  = 21 nM) and ChEMBL241472 (EC50  = 140 nM) (Fig. S17).

Cytotoxicity assay on HEK293T cells

HEK293T cells were seeded at the indicated number per well in DMEM-high glucose, supplemented with GlutaMAX, pen-strep, and 10% FBS, in a total of 40 μl of medium. The cells were incubated overnight at 37 °C. Compounds were added to the cells at the indicated concentrations, resulting in a final dimethyl sulfoxide (DMSO) concentration of 0.2%. The compounds were incubated on the cells for either 16 h or 24 h. At the specified time point, the medium was carefully removed from the vessel, leaving only 2 μl in the wells. CellTiter-Glo (CTG) reagent (G7572, Promega) was prepared according to the manufacturer’s instructions. Plates with cells were equilibrated at room temperature for 30 min. Subsequently, 25 μl of CTG reagent was added to the cells. The plates were then shaken for 2 min and incubated for an additional 15 min at room temperature. Luminescence was then read on a BMG PHERAstar plate reader.

Biological characterization

Compounds 1–3 were characterized in a hybrid reporter gene assay for their agonistic effect on the human nuclear receptors PPARα/γ/δ, RXRα, FXRα, and RARα in HEK293T cells. Compound 1 was tested in an isothermal titration calorimetry (ITC) assay to measure direct binding affinity to the ligand-binding domain of PPARγ. ADME properties were measured in standardized assays at Roche.

Hybrid reporter gene assays

PPAR activation was determined in uniform Gal4-hybrid reporter gene assays for the PPARα, PPARγ and PPARδ isoforms in HEK293T cells (German Collection of Microorganisms and Cell Culture GmbH, DSMZ), which were transiently transfected with pFR-Luc (Stratagene, La Jolla, CA, USA; reporter), pRL-SV40 (Promega, Madison, WI, USA; internal control), and one pFA-CMV-hPPAR-LBD97 clone, coding for the hinge region and ligand binding domain of the canonical isoform of human PPARα, PPARγ, or PPARδ, respectively. Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM), high glucose, supplemented with 10% fetal calf serum (FCS), sodium pyruvate (1 mM), penicillin (100 U ml⁻¹), and streptomycin (100 μg ml⁻¹) at 37 °C and 5% CO2 and seeded in 96-well plates (3 × 10⁴ cells per well). After 24 h, the medium was changed to Opti-MEM without supplements and cells were transiently transfected using Lipofectamine LTX reagent (Invitrogen) according to the manufacturer’s protocol. Five hours after transfection, cells were incubated with the test compounds in Opti-MEM supplemented with penicillin (100 U ml⁻¹), streptomycin (100 μg ml⁻¹) and 0.1% DMSO for 16 h before luciferase activity was measured using the Dual-Glo Luciferase Assay System (Promega) according to the manufacturer’s protocol on a Tecan Spark luminometer (Tecan Deutschland GmbH, Germany). Firefly luminescence was divided by Renilla luminescence and multiplied by 1000, resulting in relative light units (RLU), to normalize for transfection efficiency and cell growth. Fold activation was obtained by dividing the mean RLU of a test compound by the mean RLU of the untreated control. All samples were tested in at least three biologically independent experiments in duplicates. For dose-response curve fitting and calculation of EC50 values, the equation “[Agonist] versus response (variable slope – four parameters)” was used in GraphPad Prism (version 7.00, GraphPad Software, La Jolla, CA, USA) with fold activation data.
The reference agonists GW7647 (PPARα)98,99, pioglitazone (PPARγ)100,101 and L165,041 (PPARδ)102,103 were used to validate the assays and to monitor assay performance. Nuclear receptor selectivity profiling was performed with corresponding pFA-CMV-hNR-LBD clones and suitable reference agonists on RARα (pFA-CMV-hRARα-LBD104, 1 μM tretinoin), LXRα (pFA-CMV-hLXRα-LBD104, 1 μM TO901317) and RXRα (pFA-CMV-hRXRα-LBD105, 1 μM Bexarotene).
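The normalization steps described above amount to a few arithmetic operations, sketched here in Python with toy luminescence readings. The dose-response function is the standard four-parameter logistic form; it is shown as an assumption about the general shape of such a model and is not taken from the GraphPad Prism documentation. All function names are hypothetical.

```python
from statistics import mean

def relative_light_units(firefly, renilla):
    """Firefly / Renilla x 1000, normalizing for transfection efficiency."""
    return firefly / renilla * 1000.0

def fold_activation(test_rlus, control_rlus):
    """Mean RLU of a test compound divided by mean RLU of the untreated control."""
    return mean(test_rlus) / mean(control_rlus)

def four_parameter_logistic(conc, bottom, top, ec50, hill):
    """Variable-slope dose-response curve as a function of agonist concentration."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)

# Toy duplicate readings (firefly, Renilla) for one compound and the control.
test = [relative_light_units(5000.0, 2000.0), relative_light_units(5200.0, 2000.0)]
control = [relative_light_units(1000.0, 2000.0), relative_light_units(1100.0, 2200.0)]
fold = fold_activation(test, control)

# At a concentration equal to the EC50, the response is halfway between bottom and top.
half_max = four_parameter_logistic(1.0, bottom=1.0, top=11.0, ec50=1.0, hill=1.5)
```

Fitting the four parameters to measured fold-activation data would require a non-linear least-squares routine, which is omitted here.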

Isothermal Titration Calorimetry (ITC)

ITC experiments were conducted on an Affinity ITC instrument (TA Instruments, New Castle, DE) at 25 °C with a stirring rate of 75 rpm. PPARγ LBD protein (30 μM, prepared as described previously106) in buffer (20 mM Tris pH 7.5, 150 mM NaCl, 5% glycerol) containing 5% DMSO was titrated with the test compound (1) (100 μM in the same buffer containing 5% DMSO) in 21 injections (1 × 1 μl and 20 × 5 μl) with an injection interval of 120 s. As control experiments, the test compound was titrated into buffer, and the buffer was titrated into the PPARγ LBD protein under otherwise identical conditions. The ITC results were analyzed using NanoAnalyze software (TA Instruments, New Castle, DE) with an independent binding model.

Protein-ligand co-crystallization

The following construct was used for expression and co-crystallization: PPARγ (L204-Y477) (UniProt ID: P37231-2): MGSS-6His-SG-TEV-(L204-Y477). Molecular weight: 33465 Da. Large-scale expression of human PPARγ was conducted in E. coli BL-21 (DE3) cells (SI10). Subsequently, co-crystals of PPARγ were grown using 6 mg ml⁻¹ protein in buffer (20 mM Tris-HCl pH 8.0, 1 mM TCEP, 0.5 mM EDTA, and 1 mM design 1) mixed with equal amounts of reservoir solution (0.1 M Tris-HCl pH 7.5 and 1.6 M ammonium sulfate) (Fig. S14). Structure determination and refinement yielded the co-crystal structure at a resolution of 1.85 Å, as depicted in Fig. 5 (Table S13 and Fig. S15).

Off-target screening

To test the specificity of compounds 1 and 2, both were subjected to a panel screen against 50 safety-relevant off-targets107. Both compounds showed a clear profile, not reaching ≥50% inhibition or binding at a concentration of 10 μM for any target except PPARγ (Tables S9–S12).

Chemical synthesis

Compounds 1–3 were synthesized starting from commercial building blocks. The synthesis and the full analytical characterization of the final compounds and intermediates are described in SI13.

Co-crystallization

Compound 1 was co-crystallized with the ligand binding domain of human PPARγ (Leu204–Tyr477) (UniProt ID: P37231-2). The crystallographic structure is accessible from the Protein Data Bank108 (PDB ID: 8PBO). Details about construct design, protein expression and purification, crystallization, data collection, and structure determination and refinement can be found in SI10.


