Pathway overview
The base pathway (Fig. 1a) is an engineered mevalonate (MVA) pathway with a promiscuous mevalonate decarboxylase (PMD*) from S. cerevisiae capable of converting mevalonate monophosphate into isopentenyl monophosphate, thereby bypassing isopentenyl diphosphate (IPP-Bypass)51,52. Pathway protein expression was monitored across all strains (Supplementary Fig. 15).
Construction of placeholder sgRNA library
Cycle libraries were constructed with an automated, combinatorial Golden Gate-based strategy, the Biopart Assembly Standard for Idempotent Cloning (BASIC)36,53. Briefly, sgRNA sequences for heuristic or FluxRETAP34 targets were designed using CRISPOR43 and selected for the absence of off-target sites, high predicted efficiency, binding to the non-template (sense) strand, and proximity to the transcription start site. Oligonucleotides (Supplementary Data 2) encoding sgRNAs and flanked by type IIS BsaI recognition sites (Integrated DNA Technologies, Redwood City, CA) were digested with BsaI HF v2 (New England Biolabs (NEB), Ipswich, MA) and purified with magnetic beads (AxyPrep Mag PCR Clean-Up, Axygen Inc., Union City, CA). A placeholder plasmid carrying sfGFP flanked by PaqCI recognition sites was digested with PaqCI (NEB) to yield cohesive ends complementary to those of the digested oligonucleotides; the two were ligated with T4 DNA ligase (NEB) and transformed into XL-1 Blue chemically competent E. coli cells prepared by the UC Berkeley QB3 Core facility (Berkeley, CA). Transformations were plated on LB (Luria Bertani) agar (10 g/L tryptone, 5 g/L yeast extract, and 10 g/L NaCl; Millipore Sigma, Burlington, MA) with 100 mg/L carbenicillin and, following overnight growth, screened for loss of sfGFP fluorescence. A single non-fluorescent colony was inoculated into liquid medium for plasmid extraction (QIAprep Spin Miniprep Kit, Qiagen) and verified by Sanger sequencing (Azenta Life Sciences, Burlington, MA). Each placeholder plasmid thus carried a “gRNA unit” composed of the J23119 constitutive Anderson promoter, a designed sgRNA, the Cas9 sgRNA handle, and the Cas9 terminator sequence, again flanked by BsaI recognition sites for subsequent assemblies.
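The four guide-selection criteria can be expressed as a filter followed by a ranking. The sketch below is illustrative only: `GuideCandidate`, its field names, and the `min_efficiency` cutoff of 50 are hypothetical stand-ins, not CRISPOR's actual output columns or the thresholds used in this study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GuideCandidate:
    # Hypothetical record for one candidate guide; field names are illustrative,
    # not CRISPOR's actual output columns.
    spacer: str
    off_target_count: int        # predicted off-target sites
    efficiency: float            # predicted on-target efficiency score
    on_nontemplate_strand: bool  # True if the guide binds the non-template (sense) strand
    distance_to_tss: int         # bp from the transcription start site

def pick_guide(candidates, min_efficiency=50.0) -> Optional[GuideCandidate]:
    """Apply the selection criteria and return the best remaining guide, if any."""
    eligible = [
        c for c in candidates
        if c.off_target_count == 0
        and c.efficiency >= min_efficiency
        and c.on_nontemplate_strand
    ]
    if not eligible:
        return None
    # Closest to the TSS wins; ties are broken by higher predicted efficiency.
    return min(eligible, key=lambda c: (abs(c.distance_to_tss), -c.efficiency))
```

Ranking by TSS proximity first reflects the expectation that dCas9 blocks transcription most effectively near initiation. The ordering of criteria here is our assumption.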
Automated construction of dCas9-gRNA plasmids
A 96-well plasmid extraction kit (BioBasic, Markham, ON, Canada) compatible with the Hamilton VANTAGE (Hamilton, Reno, NV) was used to purify the plasmid libraries. Each prefix and suffix linker consists of an annealed adapter oligo and linker oligo. In accordance with BASIC, individual oligos were first phosphorylated53 by mixing 5 μL T4 ligase buffer, 100 μM oligo, and 1 μL T4 polynucleotide kinase (NEB) in a 50 μL reaction, followed by incubation at 37 °C. Phosphorylated oligos were then mixed in 400 μL of annealing buffer, heated to 95 °C for 10 min, and cooled to room temperature. Annealed prefix and suffix linkers were stored at −20 °C. Each prefix and suffix has position-dependent overhangs that enable the construction of multiplexed combinatorial arrays. Multiplexed sgRNA arrays therefore required position-specific linkers for each sgRNA, and the same sgRNA unit often occupied different positions in different recommendations (e.g., the PP_0812 sgRNA unit in position one of one array and in position two of another within a given cycle). A Jupyter notebook was written to rapidly translate recommendations from the Automated Recommendation Tool (ART) into ECHO dispense lists.
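The translation from recommendations to an ECHO dispense list can be sketched as follows. This is a hypothetical reconstruction, not the authors' notebook: it assumes one reaction well per unique (sgRNA, position) pair, linker reagents named `prefix_<n>`/`suffix_<n>`, a caller-supplied source-plate layout, and pick-list column headers of the kind commonly used for Echo transfers (check them against your own Echo protocol). Enzymes, buffer, and water are assumed to be added separately.

```python
import csv
import io
from string import ascii_uppercase

def wells_384():
    """Yield 384-well plate well names in row-major order: A1, A2, ..., P24."""
    for row in ascii_uppercase[:16]:
        for col in range(1, 25):
            yield f"{row}{col}"

def build_pick_list(recommendations, source_wells, volumes_nl):
    """Turn sgRNA recommendations into one reaction per (sgRNA, position).

    recommendations: iterable of ordered sgRNA tuples, e.g. ("PP_0812", "PP_1769")
    source_wells: reagent name -> well on the 384PP source plate (hypothetical layout)
    volumes_nl: "plasmid" / "prefix" / "suffix" -> transfer volume in nL
    """
    # Identical (sgRNA, position) pairs are shared across recommendations.
    reactions = sorted({(g, pos) for rec in recommendations for pos, g in enumerate(rec, start=1)})
    if len(reactions) > 384:
        raise ValueError("more reactions than wells on one 384-well plate")
    rows = []
    for (guide, pos), dest in zip(reactions, wells_384()):
        for role, reagent in (("plasmid", guide), ("prefix", f"prefix_{pos}"), ("suffix", f"suffix_{pos}")):
            rows.append({
                "Source Plate Name": "Source[1]",
                "Source Well": source_wells[reagent],
                "Destination Plate Name": "Destination[1]",
                "Destination Well": dest,
                "Transfer Volume": volumes_nl[role],
            })
    return reactions, rows

def to_csv(rows):
    """Serialize transfer rows as a CSV pick list."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Deduplicating (sgRNA, position) pairs is what lets a unit such as PP_0812 appear at several positions without redundant reactions.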
An ECHO 550 acoustic liquid handler (Beckman Coulter, Brea, CA) was used to dispense 30 μL reaction mixtures for simultaneous digestion and linker ligation into a 384-well PCR destination plate (Beckman Coulter). Reagents were stored in an ECHO-compatible 384-well polypropylene source plate (384PP; Beckman Coulter). For each positional sgRNA arrangement, the reaction mixture included 4 μL placeholder plasmid (diluted to 400 ng/μL on the plate), 0.5 μL BsaI HF v2, 0.5 μL T4 ligase, 5 μL each of the specified prefix and suffix linkers, 3 μL ligase buffer, and 12 μL deionized (DI) water. The dCas9-harboring vectors pIY989 or pDBTL3-Template also expressed sfGFP flanked by BsaI recognition sites, enabling screening of successful transformants by loss of fluorescence. Whole plasmid sequencing (Primordium Labs, Monrovia, CA; Plasmidsaurus, Eugene, OR) was performed on the dCas9 template vector and on pIY670 before every cycle to confirm sequence fidelity.
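A quick bookkeeping check confirms that the listed components sum to the 30 μL reaction and that each transfer is a whole number of ECHO droplets. The sketch assumes 5 μL of each linker, a 10× ligase buffer, and the 2.5 nL droplet size of the ECHO 550; these are our assumptions, not values stated in the protocol.

```python
from fractions import Fraction

# Per-reaction volumes (μL) from the text, reading "5 μL" as 5 μL of each linker.
REACTION_UL = {
    "placeholder plasmid (400 ng/μL)": 4.0,
    "BsaI HF v2": 0.5,
    "T4 ligase": 0.5,
    "prefix linker": 5.0,
    "suffix linker": 5.0,
    "T4 ligase buffer (assumed 10x)": 3.0,
    "DI water": 12.0,
}
ECHO_DROPLET_NL = Fraction(5, 2)  # assumed ECHO 550 droplet volume: 2.5 nL

total_ul = sum(REACTION_UL.values())
buffer_strength = 10 * REACTION_UL["T4 ligase buffer (assumed 10x)"] / total_ul  # final x
plasmid_ng = 400 * REACTION_UL["placeholder plasmid (400 ng/μL)"]

def droplets(volume_ul):
    """Number of 2.5 nL droplets for a transfer; raises if not a whole number of droplets."""
    n = Fraction(str(volume_ul)) * 1000 / ECHO_DROPLET_NL
    if n.denominator != 1:
        raise ValueError(f"{volume_ul} μL is not a multiple of 2.5 nL")
    return int(n)
```

With these inputs the reaction totals 30 μL at 1× buffer and contains 1.6 μg of placeholder plasmid.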
As with the placeholder plasmids, the dCas9 vectors were dispensed and ligated with linkers chosen according to the number of guides in the array. For context, generating the full set of linked sgRNA units and destination vectors for DBTL1 required 597 liquid handling steps. After dispensing, the 384-well plates were sealed and transferred to a 384-well C1000 Touch Thermal Cycler (Bio-Rad, Hercules, CA). Assemblies were cycled 20 times between 37 °C for 2 min and 20 °C for 1 min, then heat-inactivated at 65 °C for 20 min before cooling to 4 °C.
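The thermal program can be written out explicitly to check its length. This minimal sketch ignores ramp times and the open-ended 4 °C hold, so the true run time is somewhat longer than the computed total.

```python
# Thermal program from the text as (temperature in °C, minutes) steps.
CYCLE = [(37, 2), (20, 1)]   # BsaI digestion / T4 ligation cycle
N_CYCLES = 20
FINISH = [(65, 20)]          # heat inactivation; the 4 °C hold is open-ended and omitted

def expand_program(cycle, n_cycles, finish):
    """Flatten the cycled program into an explicit list of steps."""
    return cycle * n_cycles + finish

def program_minutes(steps):
    """Total hold time in minutes, ignoring ramp times between temperatures."""
    return sum(minutes for _temp, minutes in steps)

steps = expand_program(CYCLE, N_CYCLES, FINISH)
```

The 41 programmed steps add up to 80 min of hold time: 60 min of cycling and 20 min of inactivation.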
Reactions were purified with magnetic beads (Axygen Inc.) on a KingFisher Apex (Thermo Fisher Scientific, Waltham, MA). Purified linked sgRNA and vector units were then transferred to another ECHO-compatible 384PP source plate and dispensed into a 384-well PCR destination plate for assembly. Because the prefixes and suffixes carry complementary linker overhangs by design, assembly required only dispensing 0.5 μL of each DNA component with 0.5 μL CutSmart buffer (NEB) in a 5 μL reaction and incubating at 50 °C for 60 min to anneal the linker overhangs. After DBTL0, each DBTL cycle comprised 60 strains, cultured across four consecutive BioLector (Beckman Coulter) runs with on-plate controls.
Each reaction mixture was then cooled and, as before, the new assemblies were transformed into XL-1 Blue chemically competent E. coli cells. Here, 15 μL of chilled competent cells were dispensed into the reaction mixtures using a MANTIS liquid handler (FORMULATRIX, Dubai, United Arab Emirates). The transformation mixtures were incubated on ice for 20 min, heat shocked at 42 °C for 1 min, and returned to ice for 5 min. The transformations were then resuspended in 1 mL of SOC (20 g/L tryptone (Gibco Bacto, Fisher Scientific), 5 g/L yeast extract (Gibco Bacto, Fisher Scientific), 5 mM MgSO4, 20 mM glucose, 10 mM NaCl, and 2.5 mM KCl) in a 96-well deep well plate (DWP) and outgrown for 1 h. Either the Biomek FX (Beckman Coulter) or the Hamilton VANTAGE was used to plate outgrowth cultures onto Q-Trays for overnight growth. A QPix 460 colony picker (Molecular Devices, San Jose, CA) was then used to pick non-fluorescent colonies into a 96-well plate containing LB with 30 mg/L gentamicin for overnight growth. An aliquot of each overnight culture was cryostocked, and the remainder was used for plasmid DNA extraction. Sanger sequencing confirmed high assembly fidelity, typically above 90% (55/60 transformations), although longer sgRNA arrays failed more often. Alignments were performed with MAFFT (multiple alignment using fast Fourier transform) in Benchling (Benchling, San Francisco, CA). If a clone failed sequence verification, new non-fluorescent colonies were picked.
Automated transformation of P. putida
The selected chassis strain, P. putida IY1449b, carries the in-frame deletions ΔphaABC, ΔmvaB, ΔhbdH, and 4,538,575Δ86,812 (Δzwf, ΔglZ, and ΔliuC) for improved isoprenol titers31. In each cycle, the strain was streaked onto an LB agar plate, and a single colony was inoculated into LB medium, grown overnight, made electrocompetent, transformed with pIY670 (harboring the IPP-Bypass MVA pathway), and selected with 50 μg/mL kanamycin sulfate to generate IY1452b31. Electrocompetent IY1452b cells were then prepared for transformation with the cycle-specific dCas9-gRNA libraries.
A high-throughput electroporation platform was used to electroporate the dCas9-gRNA plasmids into our P. putida production strain IY1452b32. The apparatus provides 384 individually addressable wells (0.1 mm gap width) in a format compatible with liquid handlers. The custom 384-well electroporation plates were washed with isopropyl alcohol, 70% ethanol, and MilliQ water and then baked at 50 °C for 30 min. The plates were then irradiated with 103 μJ/cm² UV light for 30 s (UV Crosslinker, Fisher Scientific) and sealed with an AeraSeal gas-permeable membrane (Excel Scientific, Victorville, CA). The ECHO first dispensed 200 nL of each plasmid from a 384PP plasmid library plate and then dispensed 2 μL of electrocompetent P. putida IY1452b cells into each well. During DBTL0, transformations were performed in triplicate (Supplementary Fig. 3), whereas the 60 sgRNA arrays of each subsequent DBTL cycle were transformed without replicates. Electroporations were performed at 250 V with a time constant of 5 ms, after which cells were resuspended in SOC medium with the Biomek FX (Beckman Coulter). Transformations were transferred to 96-well DWPs for recovery over 3 h at 30 °C with shaking at 1000 RPM (INFORS HT Multitron, Bottmingen, Switzerland) and finally plated onto Q-Trays containing LB agar with 50 μg/mL kanamycin sulfate and 30 μg/mL gentamicin sulfate. Transformations were imaged using the QPix.
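Two numbers from this protocol are worth making explicit: the nominal field strength across the 0.1 mm electrode gap, and how many ECHO droplets each transfer takes. The sketch assumes an ideal parallel-electrode geometry (E = V/d) and a 2.5 nL droplet size; both are simplifying assumptions.

```python
def field_strength_kv_per_cm(voltage_v, gap_mm):
    """Nominal field across a parallel-electrode gap: E = V / d, returned in kV/cm."""
    gap_cm = gap_mm / 10
    return voltage_v / gap_cm / 1000

def echo_droplets(volume_nl, droplet_nl=2.5):
    """Number of ECHO droplets needed for a transfer volume (assumed 2.5 nL droplets)."""
    return round(volume_nl / droplet_nl)

e_field = field_strength_kv_per_cm(250, 0.1)   # 0.1 mm gap, 250 V
dna_droplets = echo_droplets(200)              # 200 nL plasmid
cell_droplets = echo_droplets(2000)            # 2 μL cells
```

Because the gap is so narrow, 250 V corresponds to a nominal field of about 25 kV/cm. Each well receives 80 droplets of plasmid and 800 droplets of cells.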
Passaging and culturing of P. putida strains
All strains were adapted to growth on minimal glucose medium. Briefly, single colonies were inoculated into 96-well DWPs containing 1 mL LB medium (50 μg/mL kanamycin sulfate, 10 μg/mL gentamicin sulfate), and 40 μL of culture was then passaged twice into 1 mL of M9-NREL medium with the appropriate antibiotics. The DWPs were again sealed with a gas-permeable membrane. M9-NREL medium was selected because it is widely used as a baseline P. putida production medium. The medium contained 20 g/L glucose, 0.5 g/L NaCl, 6.8 g/L Na2HPO4, 3 g/L KH2PO4, 100 μM CaCl2, 2 mM MgSO4, 10 mM (NH4)2SO4, and 500 μL of a trace metal solution (Teknova Cat no. T1001; Teknova, Hollister, CA). The gentamicin concentration was reduced during passaging and production because of the considerable growth burden imposed by dual antibiotic selection. Following adaptation, strains were inoculated in triplicate into 1.5 mL of M9-NREL medium in a 48-well BioLector flower plate without optodes, sealed with gas-permeable sealing foil (Beckman Coulter Life Sciences) to reduce evaporation. The production medium was always prepared fresh just before inoculation. A BioLector Pro (Beckman Coulter Life Sciences) was selected for strain culturing owing to its reported repeatability and scalability33,54,55. Batch cultures were grown at 24 °C with shaking at 1000 RPM without humidity control. Isoprenol pathway genes were induced after 8 h by adding L-arabinose to a final concentration of 2 g/L. After 48 h of production, cultures were transferred from the 48-well BioLector plate into a 96-well DWP and pelleted by centrifugation at 3000 rpm in a benchtop centrifuge. The supernatant was transferred to a fresh deep well plate, and 10 μL of the pelleted cells was transferred to a 96-well PCR plate and frozen at −80 °C for proteomic analysis. All P. putida strains used in this study are listed in Supplementary Table 2.
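The molar components of M9-NREL can be converted to weighable masses, and the L-arabinose addition can be sized so that the culture reaches 2 g/L after the spike. The hydrate forms of the salts and the 200 g/L arabinose stock are hypothetical choices; substitute the molecular weight and stock concentration actually used.

```python
# Molar components of M9-NREL from the text: (final mM, molecular weight in g/mol).
# The hydrate forms (MgSO4·7H2O, CaCl2·2H2O) are assumptions; adjust MW to the salt on hand.
MOLAR_COMPONENTS = {
    "MgSO4·7H2O": (2.0, 246.47),
    "(NH4)2SO4": (10.0, 132.14),
    "CaCl2·2H2O": (0.1, 147.01),
}

def grams_per_liter(mm, mw):
    """Convert a millimolar target into grams of salt per liter."""
    return mm * mw / 1000

def spike_volume_ul(final_g_per_l, culture_ml, stock_g_per_l):
    """Stock volume (μL) reaching the final concentration, counting the added volume:
    final * (V + v) = stock * v  =>  v = final * V / (stock - final)."""
    if stock_g_per_l <= final_g_per_l:
        raise ValueError("stock must be more concentrated than the target")
    return 1000 * final_g_per_l * culture_ml / (stock_g_per_l - final_g_per_l)

recipe = {name: grams_per_liter(mm, mw) for name, (mm, mw) in MOLAR_COMPONENTS.items()}
arabinose_ul = spike_volume_ul(2.0, 1.5, 200.0)  # hypothetical 200 g/L L-arabinose stock
```

For a 1.5 mL culture this gives about 15.2 μL of stock. The calculation ignores any volume lost to evaporation before induction.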
Quantification of isoprenol using GC-FID
Isoprenol from BioLector experiments was quantified by gas chromatography with flame ionization detection (GC-FID; Agilent Technologies, Santa Clara, CA). A 400 µL aliquot of centrifuged culture supernatant was mixed with 400 µL of ethyl acetate containing 30 mg/L 1-butanol as an internal standard. The extractions were vortexed for 10 min at 3000 RPM and phase-separated by centrifugation at 18,000 × g for 5 min. After separation, 200 µL of each ethyl acetate extract was transferred into amber GC vials with glass vial inserts (Agilent Technologies). Samples were run on an Agilent 8890 GC System equipped with dual lines, an Agilent 7693A Autosampler, FID, and DB-Wax columns (15 m × 320 μm × 0.25 μm, Agilent J&W). The inlet and detector were held at 250 °C and 300 °C, respectively. All samples were run with splitless injection and a He flow rate of 2.2 mL/min. The oven program began with a 1 min hold at 40 °C, ramped to 100 °C at 15 °C/min and then to 230 °C at 30 °C/min, and finished with a 1 min hold. All GC-FID data were quantified using OpenLab Chromatography Data Systems (Agilent Technologies).
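The oven program defines the run time per injection, and the 1-butanol internal standard enables ratio-based quantification. In the sketch below, the calibration slope and intercept are hypothetical parameters. Quantification was actually performed in OpenLab, so this shows only the general internal-standard principle.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total oven time. ramps: list of (rate °C/min, target °C, hold min) segments."""
    total, temp = initial_hold, initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold
        temp = target
    return total

def isoprenol_mg_per_l(area_isoprenol, area_istd, slope, intercept=0.0):
    """Internal-standard quantification: invert a calibration line of
    (isoprenol area / 1-butanol area) versus isoprenol concentration (mg/L)."""
    ratio = area_isoprenol / area_istd
    return (ratio - intercept) / slope

run_min = oven_program_minutes(40, 1, [(15, 100, 0), (30, 230, 1)])
```

The oven program runs for about 10.3 min (1 + 4 + 4.33 + 1 min), excluding cool-down between injections.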
Proteomics analysis
Protein was extracted from 10 µL cell pellets, and tryptic peptides were prepared following an established proteomic sample preparation protocol56. Cell pellets were resuspended in Qiagen P2 Lysis Buffer (Qiagen, Germany) to promote cell lysis. Proteins were precipitated by adding 1 mM NaCl and 4× volume of acetone, followed by two additional washes with 80% acetone in water. The recovered protein pellet was homogenized by pipette mixing with 100 mM ammonium bicarbonate in 20% methanol. Protein concentration was determined by the DC protein assay (Bio-Rad, USA). Proteins were reduced with 5 mM tris(2-carboxyethyl)phosphine (TCEP) for 30 min at room temperature and alkylated with 10 mM iodoacetamide (IAM; final concentration) for 30 min at room temperature in the dark. Proteins were digested overnight with trypsin at a 1:50 trypsin:total protein ratio. The resulting peptide samples were analyzed on an Agilent 1290 UHPLC system coupled to a Thermo Scientific Orbitrap Exploris 480 mass spectrometer for discovery proteomics57. Briefly, peptide samples were loaded onto an Ascentis® ES-C18 Column (Sigma–Aldrich, USA) and eluted with a 10 min gradient from 98% solvent A (0.1% formic acid (FA) in H2O) and 2% solvent B (0.1% FA in acetonitrile (ACN)) to 65% solvent A and 35% solvent B. Eluting peptides were introduced into the mass spectrometer operating in positive-ion mode and measured in data-independent acquisition (DIA) mode with a duty cycle of 3 survey scans from m/z 380 to m/z 985 and 45 tandem mass spectrometry (MS2) scans with a precursor isolation width of 13.5 m/z to cover the mass range. DIA raw data files were analyzed with the integrated software suite DIA-NN58. The database used in the DIA-NN search (library-free mode) included the latest P. putida KT2440 UniProt proteome FASTA sequences in addition to the protein sequences of heterologous proteins and common proteomic contaminants.
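The acquisition parameters can be checked for coverage: 45 windows of 13.5 m/z span 607.5 m/z, slightly more than the 605 m/z survey range. The layout below is one simple, evenly spaced scheme with small overlaps, assumed for illustration; the instrument method may place windows differently.

```python
def dia_windows(mz_start, mz_end, n_windows, width):
    """Evenly spaced precursor isolation windows spanning [mz_start, mz_end].

    Returns (lower, upper) bounds; adjacent windows overlap slightly when
    n_windows * width exceeds the range.
    """
    step = (mz_end - mz_start - width) / (n_windows - 1)
    return [(mz_start + i * step, mz_start + i * step + width) for i in range(n_windows)]

windows = dia_windows(380, 985, 45, 13.5)
overlap = windows[0][1] - windows[1][0]  # about 0.06 m/z between neighbours
```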
DIA-NN determined optimal mass accuracies automatically from a first-pass analysis of the samples. The retention time extraction window was likewise determined individually for each MS run by DIA-NN's automated optimization procedure. Protein inference was enabled, and the quantification strategy was set to Robust LC (high accuracy). The main DIA-NN output reports were filtered at a global false discovery rate of 0.01 (FDR ≤ 0.01) at both the precursor and protein group levels. Targeted protein quantities were determined with the Top3 method, which averages the MS signal response of the three most intense tryptic peptides of each identified protein59,60.
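The Top3 calculation described above can be sketched in a few lines. The input format is a hypothetical flattened view of a DIA-NN report, one `(protein_group, peptide, intensity, q_value)` tuple per peptide, and is not DIA-NN's actual column layout. Proteins with fewer than three peptides are averaged over those available, which is an assumption.

```python
import heapq
from collections import defaultdict
from statistics import mean

def top3_abundance(peptide_rows, fdr_cutoff=0.01):
    """Top3 protein quantity: mean intensity of the three most intense peptides per protein.

    peptide_rows: iterable of (protein_group, peptide, intensity, q_value) tuples,
    a hypothetical flattened view of a DIA-NN report.
    """
    by_protein = defaultdict(list)
    for protein, _peptide, intensity, q in peptide_rows:
        if q <= fdr_cutoff:  # keep peptides passing the 1% FDR filter
            by_protein[protein].append(intensity)
    # Proteins with fewer than three peptides are averaged over what is available.
    return {p: mean(heapq.nlargest(3, xs)) for p, xs in by_protein.items()}
```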
Computational proteomics preprocessing
Initial CRISPRi target interference validation
In DBTL0, we used proteomics to determine whether each sgRNA downregulated its target gene. We retained the sgRNAs of strains that satisfied all three of the following constraints: (1) target protein abundance below 90% of the library mean for that target; (2) target protein abundance in the bottom quartile of that target's expression across the library; and (3) isoprenol titer greater than 66.6 mg/L (approximately 40% of the control isoprenol titer). Filtering yielded 67 validated sgRNA targets.
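The three DBTL0 criteria translate directly into a filter. The sketch assumes replicate-averaged abundances for every strain and protein, and uses Python's default `statistics.quantiles` method for the bottom-quartile cutoff. The quartile method actually used is not stated, so this is an assumption.

```python
from statistics import mean, quantiles

def validate_sgrnas(abundance, target_of, titer, min_titer=66.6):
    """Return the sgRNA strains that meet all three DBTL0 criteria.

    abundance: strain -> {protein: replicate-averaged abundance} for every library strain
    target_of: sgRNA strain -> protein targeted by its sgRNA
    titer: sgRNA strain -> isoprenol titer in mg/L
    """
    validated = set()
    for strain, target in target_of.items():
        library = [abundance[s][target] for s in abundance]  # target level across the library
        level = abundance[strain][target]
        below_mean = level < 0.9 * mean(library)               # criterion 1
        bottom_quartile = level <= quantiles(library, n=4)[0]  # criterion 2
        productive = titer[strain] > min_titer                 # criterion 3
        if below_mean and bottom_quartile and productive:
            validated.add(strain)
    return validated
```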
Pseudorandom combinations for DBTL1
We generated pseudorandom combinations of the validated targets from DBTL0 as recommendations for DBTL1, biased toward targets associated with higher titer. Each target was assigned a weight using Equation 1:
$$\mathrm{weight}=\mathrm{round}\left(5+10\times\text{normalized titer}\right)$$
Here, titer was normalized to the control mean. We generated 30 two-sgRNA arrays and 30 three-sgRNA arrays by randomly selecting targets according to their weighted probabilities.
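A minimal sketch of the weighted pseudorandom design follows. It assumes that normalized titer means titer divided by the control mean, that targets within an array are drawn without replacement, and that duplicate arrays are discarded; the last two are our assumptions. Note that Python's `round()` rounds exact halves to the nearest even integer.

```python
import math
import random

def target_weights(normalized_titers):
    """Equation 1 for each validated target (normalized titer = titer / control mean)."""
    return {t: round(5 + 10 * x) for t, x in normalized_titers.items()}

def weighted_draw(weights, k, rng):
    """Draw k distinct targets; each draw is proportional to the remaining targets' weights."""
    pool = dict(weights)
    picked = []
    for _ in range(k):
        names = list(pool)
        choice = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        picked.append(choice)
        del pool[choice]
    return picked

def make_arrays(weights, n_per_size=30, sizes=(2, 3), seed=0):
    """Generate n_per_size distinct arrays of each size (distinctness is our assumption)."""
    rng = random.Random(seed)
    arrays = []
    for size in sizes:
        if n_per_size > math.comb(len(weights), size):
            raise ValueError("not enough distinct combinations")
        seen = set()
        while len(seen) < n_per_size:
            seen.add(frozenset(weighted_draw(weights, size, rng)))
        arrays.extend(sorted(tuple(sorted(a)) for a in seen))
    return arrays
```

The constant 5 in Equation 1 gives every validated target a nonzero baseline weight, so low-titer targets are sampled less often but are never excluded.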
Multiguide CRISPRi validation
In DBTL1-6, we used proteomic abundance measurements normalized to the control to identify CRISPRi combinations in which every individual sgRNA successfully interfered with its target. We first calculated the control average for all proteins targeted in a given DBTL cycle, along with the control average for dCas9. For each strain, we then calculated the abundance of each targeted protein relative to the control. To ensure that strains used for modeling expressed dCas9, we discarded strains whose dCas9 expression was below 25% of the control in any replicate. To ensure that strains used for modeling showed target downregulation, we also discarded strains in which any protein of interest (POI) exceeded 50% of its control level in any replicate. If a target was below the proteomics limit of detection, we assumed that it was successfully downregulated. This proteomics filter was applied to the data collected in each DBTL cycle.
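The multiguide filter can be sketched as two steps: normalize each replicate to the control mean, then apply the dCas9 and target thresholds. Here `None` marks a value below the limit of detection. Treating an undetected dCas9 as a failure is our assumption, since the text specifies the below-LOD rule only for targets.

```python
from statistics import mean

def relative_to_control(strain_reps, control_reps, proteins):
    """Divide each replicate's abundance by the control mean; None marks values below the LOD."""
    ctrl_mean = {p: mean(r[p] for r in control_reps) for p in proteins}
    return [
        {p: None if rep.get(p) is None else rep[p] / ctrl_mean[p] for p in proteins}
        for rep in strain_reps
    ]

def passes_proteomics_filter(rel_reps, targets, min_dcas9=0.25, max_target=0.5):
    """Keep a strain only if dCas9 >= 25% and every target <= 50% of control in all replicates."""
    for rep in rel_reps:
        dcas9 = rep.get("dCas9")
        if dcas9 is None or dcas9 < min_dcas9:
            return False  # dCas9 too low (treating undetected dCas9 as absent is our assumption)
        for t in targets:
            level = rep.get(t)
            if level is not None and level > max_target:
                return False  # target not knocked down; None (below LOD) counts as downregulated
    return True
```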
Active learning using the automated recommendation tool
We used ART to model the relationship between sgRNA combinations and isoprenol titer and to recommend new sgRNA combinations for the next DBTL cycle. ART is a flexible, user-friendly active learning package that has been applied to several other metabolic engineering optimization problems, including promoter selection and media optimization21,26. We represented sgRNA inputs as binary vectors of length 67 (the number of sgRNAs validated in DBTL0), where 1 indicates the presence and 0 the absence of a given sgRNA. We chose this binary representation over a position-specific one-hot encoding because we assumed that the position of an sgRNA in the plasmid would not affect the outcome, and to reduce the sparsity of the dataset.
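The difference between the two encodings is easiest to see side by side. The positional variant assumes at most four guides per array, matching the largest designs considered. It is shown only for contrast and was not used for modeling.

```python
def encode_binary(array, validated):
    """Order-free encoding used for ART: one bit per validated sgRNA, 1 if present."""
    index = {g: i for i, g in enumerate(validated)}
    x = [0] * len(validated)
    for g in array:
        x[index[g]] = 1
    return x

def encode_positional(array, validated, max_positions=4):
    """Position-specific one-hot alternative (not used): one block of bits per array position."""
    index = {g: i for i, g in enumerate(validated)}
    x = [0] * (len(validated) * max_positions)
    for pos, g in enumerate(array):
        x[pos * len(validated) + index[g]] = 1
    return x
```

With 67 validated sgRNAs, the binary vector has 67 entries and maps ("g1", "g5") and ("g5", "g1") to the same input. A four-position one-hot scheme would need 268 mostly-zero entries and would treat these two arrays as different designs.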
ART is an ensemble modeling tool. Our ensemble included seven models from the scikit-learn Python package (neural network regressor, random forest regressor, support vector regressor, kernel ridge regressor, k-nearest neighbors regressor, Gaussian process regressor, and gradient boosting regressor) together with one TPOT regressor model61. Model parameters are listed in Supplementary Table 3. We first evaluated predictive performance with 5-fold cross-validation, randomly splitting the data while keeping all replicates of the same strain within the same fold. Finally, we trained a model on the full dataset to make recommendations.
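Keeping replicates together is a grouped k-fold split. It prevents a strain's replicates from appearing in both training and test folds, which would inflate apparent accuracy. The dependency-free sketch below illustrates the idea. It is not ART's internal implementation; scikit-learn's `GroupKFold` in `sklearn.model_selection` provides the same grouping guarantee.

```python
import random

def grouped_kfold(strain_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) splits in which all replicates of a strain share a fold.

    strain_ids: one strain label per row of the training data (replicates repeat the label).
    """
    strains = sorted(set(strain_ids))
    random.Random(seed).shuffle(strains)                 # random assignment of strains to folds
    fold_of = {s: i % k for i, s in enumerate(strains)}  # round-robin keeps folds balanced
    for fold in range(k):
        test = [i for i, s in enumerate(strain_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(strain_ids) if fold_of[s] != fold]
        yield train, test
```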
For DBTL2, we trained ART on the proteomics-filtered data from DBTL0 and DBTL1. For DBTL3, we did not use ART, because we remade the designs from DBTL2 with an improved dCas9 plasmid. For DBTL4, we trained an ART model only on the data from DBTL3. Because DBTL3 included only 16 different sgRNA targets, this effectively reduced our target search space from DBTL4 onward. In DBTL5 and DBTL6, we trained on data from DBTL3 and all subsequent cycles.
To generate recommendations, we passed all possible combinations of two to four sgRNAs into ART (816,596 combinations for 67 targets, given by the sum of binomial coefficients \(\sum_{k=2}^{4}\binom{67}{k}\)). In later cycles (DBTL4-5), we generated recommendations only from sgRNAs included in DBTL3, since ART had no information about the contribution of sgRNAs absent from DBTL3 (18 targets, 3060 total recommendations). We used the trained ART model to calculate the mean and standard deviation of the posterior predictive distribution of titers for each sgRNA combination. To select recommendations for experimental testing, we sorted the list by mean predicted titer and selected target combinations from the top of the list, subject to a limit on the number of times a single target could appear in the final set of experimental combinations. Specific settings for each cycle are detailed in Supplementary Table 4.
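The enumeration and the capped selection step can be sketched independently of ART. Here `predictions` stands in for ART's posterior-mean titers, and the greedy top-down pick with a per-target cap is our reading of the procedure. The actual per-cycle settings are given in Supplementary Table 4.

```python
from collections import Counter
from itertools import combinations
from math import comb

def n_designs(n_targets, k_min=2, k_max=4):
    """Number of distinct sgRNA combinations with k_min..k_max guides."""
    return sum(comb(n_targets, k) for k in range(k_min, k_max + 1))

def enumerate_designs(targets, k_min=2, k_max=4):
    """Yield every order-free combination to be scored by the trained model."""
    for k in range(k_min, k_max + 1):
        yield from combinations(targets, k)

def select_recommendations(predictions, n_select, max_per_target):
    """Greedy pick down the list sorted by predicted mean titer, capping target reuse.

    predictions: list of (design tuple, predicted mean titer) pairs.
    """
    counts = Counter()
    chosen = []
    for design, _mean in sorted(predictions, key=lambda p: p[1], reverse=True):
        if all(counts[t] < max_per_target for t in design):
            chosen.append(design)
            counts.update(design)
            if len(chosen) == n_select:
                break
    return chosen
```

`n_designs(67)` reproduces the 816,596 combinations quoted above. The cap keeps a single strong target from dominating the 60 experimental slots.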
Sparse proteomics characterization using Stabl
We used the Stabl algorithm48 to identify sparse linear regression models that could predict isoprenol titer. Briefly, Stabl is a feature selection algorithm previously validated on high-dimensional simulated and biochemical data. Stabl provides false discovery rate control by augmenting the input data with artificial ‘knockoff’ features62, which are generated from the input data and are known to be uninformative. Sparse classification (logistic regression) or regression (LASSO, linear regression with L1 regularization) models are then fit repeatedly on random subsamples of the data, and the frequency with which each feature is selected is recorded. Because the knockoff features are known to be uninformative, their selection frequency provides an estimate of the false discovery proportion at any selection-frequency threshold, allowing Stabl to choose the threshold that minimizes this estimate. We used Stabl in two ways: (1) to identify a minimal model to predict isoprenol titer; and (2) to classify strains by the presence or absence of an sgRNA. We used standardized proteomics measurements for all endogenous proteins present in every cycle as features. After using Stabl to identify protein features predictive of isoprenol titer, we selected upregulation targets based on their correlation with isoprenol titer. If a single target was part of a complex or operon, we included the whole complex or operon for upregulation. When predicting the presence or absence of an sgRNA in a strain, we additionally removed proteins encoded in the targeted operon, as well as exogenous pathway proteins, from the feature set.
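The threshold-selection step can be sketched once selection frequencies for real and knockoff features are available. The function below implements our understanding of Stabl's FDP+ criterion, (1 + number of knockoffs selected) / max(1, number of real features selected). It is a simplified illustration, not the Stabl package's API, and the 0.01-step threshold grid is an assumption.

```python
def fdp_plus(threshold, real_freqs, artificial_freqs):
    """Estimated false discovery proportion at a selection-frequency threshold."""
    n_art = sum(f >= threshold for f in artificial_freqs)
    n_real = sum(f >= threshold for f in real_freqs)
    return (1 + n_art) / max(1, n_real)

def stabl_threshold(real_freqs, artificial_freqs, grid=None):
    """Pick the threshold minimizing FDP+ and return it with the selected real features."""
    grid = grid or [i / 100 for i in range(1, 101)]
    best = min(grid, key=lambda t: fdp_plus(t, real_freqs, artificial_freqs))
    selected = [i for i, f in enumerate(real_freqs) if f >= best]
    return best, selected
```

Because knockoffs are uninformative by construction, any threshold that admits many of them is likely admitting noise among the real proteins too. Minimizing FDP+ balances that risk against discarding informative features.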
Overexpression of candidate genes
Plasmids were constructed by amplifying native P. putida genes that Stabl associated with high isoprenol titer. These operons, PP_2208-PP_2209 and PP_2791-PP_2794, were amplified along with the DBTL3 Control Vector using Q5 DNA polymerase (NEB) and oligos with 20 bp 5′ overhangs. The vector amplicons, designed for salicylic acid-inducible expression of the inserted genes, were digested with DpnI (Thermo Fisher Scientific), assembled (NEBuilder HiFi Assembly Cloning Kit, NEB), and, as before, transformed into XL-1 Blue competent cells. Plasmid sequences were verified by whole plasmid sequencing (Primordium Labs) and ultimately transformed into IY1449b harboring pIY670 to evaluate the effect of titrated induction on isoprenol titer. Strains harboring the Stabl-informed genes were adapted to M9 medium and cultured for over 48 h in a BioLector Pro alongside an RFP control (JBx_266188) before GC-FID analysis.
Knockout of candidate genes
Stable gene knockouts were generated in the parent strain IY1449b by Cpf1-assisted recombineering63. Briefly, plasmid pTE452, harboring an RBS-tuned recT and mutL (E36K) from Pseudomonas aeruginosa under a 3-methylbenzoate (3MB)-inducible promoter, was transformed into competent P. putida cells63,64. A second plasmid, pTE433β, harbored cpf1 from Francisella novicida along with a cassette for constitutive expression of a designed sgRNA. pTE433β was first digested with PmeI and AgeI (NEB), purified, and then assembled (NEBuilder HiFi Assembly Cloning Kit, NEB) with a 70-mer ssDNA oligo encoding the sgRNA flanked by 25 bp of backbone homology. A library of plasmids targeting the 16 most abundant genes in later DBTL cycles was generated. These genes included PP_1769 in addition to those in Fig. 4d.
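The 70-mer sgRNA oligo design can be checked programmatically. This sketch assumes the oligo consists only of the two 25 bp homology arms flanking a 20 nt spacer (25 + 20 + 25 = 70). That split, and the arm sequences used in any example, are hypothetical and not taken from pTE433β.

```python
def sgrna_insert(spacer, left_arm, right_arm, expected_len=70):
    """Assemble a hypothetical HiFi insert: 25 nt homology + spacer + 25 nt homology."""
    if len(left_arm) != 25 or len(right_arm) != 25:
        raise ValueError("homology arms must be 25 nt")
    oligo = (left_arm + spacer + right_arm).upper()
    if set(oligo) - set("ACGT"):
        raise ValueError("non-ACGT characters in oligo")
    if len(oligo) != expected_len:
        raise ValueError(f"oligo is {len(oligo)} nt, expected {expected_len}")
    return oligo
```

Under these assumptions, a length check catches spacers of the wrong size before oligos are ordered.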
pTE452 was transformed into IY1449b, and transformants were grown overnight. Overnight cultures were diluted 10-fold into 25 mL LB with gentamicin (30 mg/L) in a 250 mL baffled flask and grown for 1 h at 30 °C and 200 RPM. 3MB was then added to 1 mM to induce expression of the recT and mutL (E36K) genes on pTE452, and cultures were grown for another hour. The cells were then made electrocompetent as before and transformed with 1 μL of 100 μM repair oligonucleotide and 50 ng of the sgRNA-harboring plasmid. Transformations were recovered for 4 h in SOC medium before plating on LB kanamycin (50 mg/L) plates. Colonies typically appeared after 48 h at 30 °C. Deletion success rates varied substantially between knockout targets (Supplementary Table 5).
Statistics & reproducibility
All strains were cultured as biological triplicates (n = 3). Experimental groups were defined by the machine learning recommendations and limited to 60 per cycle to fit four sequential BioLector runs per DBTL cycle. Each experimental group was assembled, cultured, and analyzed under identical conditions alongside experimental controls. Data used to train the active learning model were filtered as described above; otherwise, no data were excluded from the analysis. Where applicable, statistical significance was determined using a paired Student’s t-test with a threshold of p < 0.05. Spearman’s rank correlation coefficients (Supplementary Fig. 14) were calculated using the SciPy package in Python. With the exception of the box-and-whisker plot in Fig. 4d, all error bars represent standard deviation.
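The limit of 60 recommendations follows from plate arithmetic. Four runs of 48-well plates give 192 wells, and 60 strains in triplicate occupy 180 of them. This sketch assumes three control wells per plate, which is our assumption about how the on-plate controls were laid out. Under that assumption, 60 is exactly the largest number of triplicate strains that fits.

```python
PLATE_WELLS = 48      # 48-well BioLector flower plate
RUNS_PER_CYCLE = 4    # four consecutive BioLector runs per DBTL cycle
REPLICATES = 3        # biological triplicates

def spare_wells(n_strains, runs=RUNS_PER_CYCLE, wells=PLATE_WELLS, reps=REPLICATES):
    """Wells left over (e.g., for on-plate controls) after placing every strain in triplicate."""
    used, capacity = n_strains * reps, runs * wells
    if used > capacity:
        raise ValueError("design does not fit in the available runs")
    return capacity - used

def max_strains(controls_per_plate, runs=RUNS_PER_CYCLE, wells=PLATE_WELLS, reps=REPLICATES):
    """Largest number of triplicate strains that fit once control wells are reserved on each plate."""
    return runs * (wells - controls_per_plate) // reps
```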
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
