DNA2.0 combines state-of-the-art gene synthesis with cutting-edge machine learning and ‘big data’ to explore and exploit sequence-function relationships, identifying protein variants with significantly improved characteristics. A typical project assays 96-400 variants over 3-5 rounds.
2. Synthesize Infologs: Use proprietary GeneGPS® coding algorithms and gene synthesis technology to build the variants, distributing amino acid changes so that they achieve maximum information content.
3. Survey the Sequence Space: Test Infologs in a commercially relevant assay and quantify the relative contribution of each variable. The results allow us to deconvolute how substitutions within a protein sequence modify its function.
4. Map the Sequence Space: Build a sequence-function model from the assay results and cross-validate it. Models are assessed on their predictive value.
Repeat: Explore new sequence space, using the model to design a new set of systematic variants.
Results: Select the best functionally improved variant(s) after multiple rounds of the above.
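Step 4 calls for cross-validating the sequence-function model. As a minimal sketch, assuming a purely additive model in which each variant is a 0/1 list marking the substitutions it carries (an illustrative choice, not DNA2.0's proprietary algorithm), leave-one-out cross-validation refits the model with each variant held out and scores how well the held-out activity is predicted. The helper names `solve`, `fit`, `predict` and `loo_q2` are hypothetical.

```python
# Sketch: leave-one-out (LOO) cross-validation of a purely additive
# sequence-function model. Each variant is a 0/1 list marking which
# substitutions it carries; this model form is an assumption.


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y, ridge=1e-6):
    """Least-squares fit with a tiny ridge penalty; returns [intercept, effect_1, ...]."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, x):
    """Predicted activity: intercept plus the effects of the substitutions present."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def loo_q2(X, y):
    """Q^2 = 1 - PRESS / TSS, where PRESS sums squared held-out prediction errors."""
    press = 0.0
    for i in range(len(y)):
        w = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without variant i
        press += (y[i] - predict(w, X[i])) ** 2
    mean = sum(y) / len(y)
    tss = sum((t - mean) ** 2 for t in y)
    return 1.0 - press / tss
```

A Q² near 1 means the model predicts variants it has not seen; a Q² near or below 0 means it does not, which suggests revisiting the model or the design before the next round.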
- Optimize directly for function in the final application
- Save years of time and millions of dollars
- No high-throughput (HTP) screens
- Screen small numbers of variants (50-200) directly for the desired function, ideal for enzyme engineering
- Don’t waste time pursuing false positives: variants identified by HTP screens that do not retain activity in the ‘real’ assay
- No false negatives lost to screening error or to poor correlation between the HTP screen and the ‘real’ assay
- No biodiversity collections required, everything is synthesized as needed
- Sequence-function relationships provide the basis for strong composition-of-matter patent claims.
Typical protein engineering methods screen a large number of gene variants (10^6 to 10^12 or more), using a surrogate high-throughput (HTP) screen to identify initial hits with improved activity. Unfortunately, you get what you screen for: a “hit” from the HTP screen often shows very little real activity in a lower-throughput assay that better reflects the function for which the protein is being developed.
ProteinGPS™ instead identifies key amino acid substitutions through bioinformatics-based mining of available sequence space and combines them in an information-maximized variant dataset (usually fewer than 100 unique gene variants). At that scale, the commercially relevant activity can readily be measured in an indicative assay. DNA2.0 then uses advanced machine learning algorithms to deconvolute the relative contribution of each substitution, mapping the high-dimensional sequence space that determines the desired protein activity. We routinely see functional improvements of orders of magnitude while measuring no more than 100-300 samples.
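To make “deconvolute the relative contribution of each substitution” concrete, here is a sketch that assumes activity (for example, log fold change over the parent) is the sum of independent per-substitution effects and estimates those effects by least squares. The substitution labels and function names are hypothetical, and since DNA2.0's own modeling draws on nonlinear methods, treat this as the simplest possible case rather than their algorithm.

```python
# Sketch: estimate one additive effect per substitution by least squares.
# Assumes activities are on an additive scale (e.g. log fold over the parent).


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def substitution_effects(variants, activities, ridge=1e-6):
    """Return (baseline, {label: effect}) ordered from most to least beneficial.

    variants   -- one set of substitution labels per variant, e.g. {"A23V", "K51R"}
    activities -- measured activity of each variant, in the same order
    """
    labels = sorted(set().union(*variants))
    # Design matrix: intercept column, then 1.0 where a variant carries a substitution.
    rows = [[1.0] + [1.0 if s in v else 0.0 for s in labels] for v in variants]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, activities)) for i in range(p)]
    w = solve(xtx, xty)
    ranked = sorted(zip(labels, w[1:]), key=lambda kv: kv[1], reverse=True)
    return w[0], dict(ranked)


if __name__ == "__main__":
    variants = [set(), {"A23V"}, {"K51R"}, {"A23V", "K51R"}, {"G7S"}, {"A23V", "G7S"}]
    activities = [0.0, 1.0, -0.5, 0.5, 0.25, 1.25]
    print(substitution_effects(variants, activities))
```

The model only separates effects that the design actually varies independently, which is why how substitutions are distributed across variants matters as much as how many variants are measured.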
- Design characteristics: number of varying positions, substitutions per variant, number of variants
- Protein characteristics: deleterious mutations, average effect of substitutions
- Assay characteristics: assay noise, surrogate correlation
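The design, protein and assay characteristics listed above all affect how well a small variant set can reveal substitution effects. The toy simulation below is built entirely on assumptions: one substitution per varying position, normally distributed effects, a fixed penalty of -5 for deleterious substitutions, Gaussian assay noise, and an additive model fit without an intercept because activity is taken relative to the parent. It reports the correlation between true and estimated effects and is a sketch for experimentation, not a reproduction of any DNA2.0 tool; surrogate correlation is not modeled.

```python
import random

# Toy simulation: how design, protein and assay factors affect recovery of
# substitution effects. Every distribution and constant here is an assumption.


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def simulate(n_positions=10, subs_per_variant=3, n_variants=40,
             frac_deleterious=0.1, mean_effect=0.2, noise_sd=0.5, seed=0):
    """Return the correlation between true and estimated substitution effects."""
    rng = random.Random(seed)
    # Protein characteristics: most effects are modest, a fraction strongly deleterious.
    true = [-5.0 if rng.random() < frac_deleterious else rng.gauss(mean_effect, 1.0)
            for _ in range(n_positions)]
    # Design characteristics: each variant carries subs_per_variant random substitutions.
    X, y = [], []
    for _ in range(n_variants):
        chosen = set(rng.sample(range(n_positions), subs_per_variant))
        row = [1.0 if j in chosen else 0.0 for j in range(n_positions)]
        X.append(row)
        # Assay characteristics: additive activity plus Gaussian measurement noise.
        y.append(sum(e * x for e, x in zip(true, row)) + rng.gauss(0.0, noise_sd))
    # No intercept: with a fixed number of substitutions per variant, an intercept
    # would be confounded with the sum of the substitution effects.
    ridge = 1e-6
    xtx = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(n_positions)] for i in range(n_positions)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n_positions)]
    return pearson(true, solve(xtx, xty))
```

Varying `noise_sd`, `n_variants`, `subs_per_variant` or `frac_deleterious` (all hypothetical parameter names) shows how each factor moves the correlation.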
Applying Modern Engineering Principles
The bioengineering technology developed by DNA2.0 is based on mathematical nonlinear systems modeling and optimization algorithms routinely used in such diverse areas as small molecule QSAR, process control design for manufacturing, website optimization, and logistics. These problems all require methods that can analyze systems with high complexity and large numbers of independent impactful variables. Over the past seventy years, mathematicians and engineers have developed algorithms for identifying optimal solutions from data sets that are very small relative to the total potential information space being interrogated. Today, these principles are used in the development of numerous products, from the design of jet engines to the optimization of gasoline formulations to credit card fraud detection. Methods for multidimensional optimization that are now routinely employed in other engineering disciplines contrast starkly with both structure-based protein design and directed evolution, which have no real parallels in other engineering areas.
Developing Algorithms for Engineering Proteins
At DNA2.0 we have modified the standard algorithms for engineering complex systems to work with biological systems. The resulting process enables us to deconvolute how substitutions within a protein sequence modify its function. We have combined these algorithms with an integrated query and ranking mechanism to identify appropriate sequence substitutions.
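One simple way to mine available sequence space for candidate substitutions, assumed here for illustration rather than taken from DNA2.0's query and ranking mechanism, is to count how often each residue that differs from the parent appears at each position of an alignment of homologs. The sketch assumes the sequences are already aligned to a common length; `candidate_substitutions` and `min_count` are hypothetical names.

```python
from collections import Counter


def candidate_substitutions(reference, aligned_homologs, min_count=2):
    """Rank substitutions seen in aligned homologs, most frequent first.

    reference        -- aligned parent sequence; '-' marks a gap
    aligned_homologs -- homolog sequences aligned to the same length
    Returns (label, count) pairs such as ("A2S", 3), numbered along the parent.
    """
    ranked = []
    ref_pos = 0
    for col, ref_res in enumerate(reference):
        if ref_res == "-":
            continue  # insertion relative to the parent: nothing to substitute
        ref_pos += 1
        # Count residues that differ from the parent, ignoring gaps.
        counts = Counter(seq[col] for seq in aligned_homologs
                         if seq[col] not in ("-", ref_res))
        for alt, n in counts.items():
            if n >= min_count:
                ranked.append((f"{ref_res}{ref_pos}{alt}", n))
    ranked.sort(key=lambda item: (-item[1], item[0]))  # by count, then label
    return ranked
```

Frequency in nature is only one possible ranking signal; a real pipeline would likely weight sequences and combine several criteria.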
From Predicted Sequences to Testable Genes
The conversion of computationally predicted DNA sequences to physically testable genes is powered by our gene synthesis pipeline. Until recently, the synthesis of individually designed genes was prohibitively expensive. As a result, the only practical way to obtain combinatorially modified proteins was to make recombinant libraries, which in turn necessitated high-throughput screens. By instead synthesizing individually designed gene variants, DNA2.0 ensures that amino acid changes are distributed to achieve maximum information content. This in turn obviates the need for high-throughput screening, allowing us instead to focus on measuring protein properties that are important for the final application.
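The idea of distributing substitutions across individually synthesized variants can be illustrated with a simple balancing heuristic. This sketch is not GeneGPS and does not maximize information content in any formal sense; it only guarantees that every substitution appears in almost the same number of variants, and breaks ties at random so that the same substitutions are not systematically paired. The function names and parameters are hypothetical.

```python
import random


def balanced_design(n_substitutions, n_variants, subs_per_variant, seed=0):
    """Assign substitutions (by index) to variants so usage stays balanced.

    Each new variant takes the subs_per_variant least-used substitutions so far,
    breaking ties at random. Usage counts never differ by more than one.
    """
    if subs_per_variant > n_substitutions:
        raise ValueError("a variant cannot carry more substitutions than exist")
    rng = random.Random(seed)
    usage = [0] * n_substitutions
    design = []
    for _ in range(n_variants):
        order = sorted(range(n_substitutions), key=lambda j: (usage[j], rng.random()))
        chosen = sorted(order[:subs_per_variant])
        for j in chosen:
            usage[j] += 1
        design.append(chosen)
    return design, usage


def max_pair_count(design):
    """Largest number of variants in which any one pair of substitutions co-occurs."""
    pairs = {}
    for variant in design:
        for idx, a in enumerate(variant):
            for b in variant[idx + 1:]:
                pairs[(a, b)] = pairs.get((a, b), 0) + 1
    return max(pairs.values(), default=0)
```

For 60 substitutions, 96 variants and 8 substitutions per variant, every substitution lands in 12 or 13 variants; `max_pair_count` gives a rough check on how often any two substitutions travel together.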
Using independently designed synthetic genes in which substitutions are systematically incorporated (Infologs™) yields uniform sampling, systematic variance and unrestricted, information-rich results. A wheat glutathione transferase (GST) able to detoxify a panel of common herbicides was designed using this patented DNA2.0 bioengineering method. The relative functional contributions of 60 amino acid substitutions against 14 herbicides were quantified using only 96 Infologs, and activity was dramatically improved with a small set (16) of second-generation Infologs. Check out the full “Using Infologs to Engineer Biological Systems” presentation or the ACS Synthetic Biology publication.
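A back-of-the-envelope count puts the wheat GST numbers in perspective. Assume, for illustration only, that each of the 60 substitutions could be switched on or off independently (substitutions at the same position are mutually exclusive, so the real space is smaller) and that each herbicide response is fit with a purely additive model with one intercept:

```latex
\[
2^{60} \approx 1.15 \times 10^{18} \ \text{combinations}, \qquad
\frac{96}{2^{60}} \approx 8.3 \times 10^{-17}
\]
\[
p = 1 + 60 = 61 \ \text{parameters per herbicide} \;<\; 96 \ \text{measurements}
\;\Rightarrow\; 96 - 61 = 35 \ \text{residual degrees of freedom}
\]
```

Under these assumptions the 96 Infologs sample a vanishing fraction of the combinations, yet still outnumber the parameters of an additive model, which is what makes the per-substitution contributions estimable.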
Researchers at Pfizer and DNA2.0 published the engineering of an aminotransferase for the biocatalytic synthesis of a key chiral intermediate of imagabalin, an advanced anxiolytic drug candidate. The starting wild-type protein, Vfat (Vibrio fluvialis aminotransferase), is an ω-amino acid:pyruvate transaminase with very weak but detectable catalytic activity toward aliphatic amines. Designing and testing fewer than 450 Vfat variants synthesized by DNA2.0 yielded an aminotransferase whose substrate selectivity and reaction velocity were sufficient for the commercial biocatalysis goal.
ProteinGPS Engineering Overview
Webinar – ProteinGPS Engineering via Systematic Exploration of Space
Learn more about Protein Engineering and Infologs. ProteinGPS relies on identifying key amino acid substitutions through bioinformatics-based mining of available sequence space and combining such substitutions in information-maximized Infologs – synthetic gene variants designed to be systematically varied across the searched space.
Using a small set of variants to explore sequence space systematically helps us understand how substitutions affect protein activities, and in turn helps us decide how to explore the sequence space further. This is achieved by applying machine learning techniques to data from a small number of systematically designed protein variants, usually on the order of 100. We can address questions about additivity and the multidimensional effects of substitutions on the various properties and activities that can be measured accurately under commercially relevant conditions.
View a PDF of the webinar slides.
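Whether two substitutions act additively can be checked with a double-mutant cycle, a standard calculation that needs the parent, both single mutants and the double mutant. On a log scale, where independent multiplicative effects add, the epistasis term below is zero for perfectly additive substitutions. The function names are hypothetical, and this is one common additivity test rather than the specific analysis presented in the webinar.

```python
import math


def pairwise_epistasis(y_wt, y_a, y_b, y_ab):
    """Double-mutant-cycle epistasis on an additive (e.g. log) scale.

    Returns (y_ab - y_wt) - [(y_a - y_wt) + (y_b - y_wt)]: zero when the double
    mutant equals the parent plus both single-mutant effects.
    """
    return (y_ab - y_wt) - ((y_a - y_wt) + (y_b - y_wt))


def epistasis_from_activities(act_wt, act_a, act_b, act_ab):
    """Same test on raw activities, assumed to combine multiplicatively."""
    return pairwise_epistasis(*(math.log(v) for v in (act_wt, act_a, act_b, act_ab)))
```

A positive value means the pair performs better together than their individual effects predict; a negative value means worse.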
Webinar – Using ProteinGPS and Infologs to Engineer Biological Systems
ProteinGPS™ relies on identifying key amino acid substitutions through bioinformatics-based mining of available sequence space and combining such substitutions in information-maximized Infologs – synthetic gene variants designed to be systematically varied across the searched space. The presentation includes recent case studies. View a PDF of the webinar slides.
DNA2.0 Presentations from Scientific Conferences
- Protein Engineering Solutions Brochure
- PEGS, 2015: Quantitative Biology – Tools to Build Better Biology
- PepTalk, 2015: Comprehensive Engineering of Biological Systems
- PEGS, 2014: Systematic Optimization of Therapeutic Protein Production
- PepTalk, 2014: Leveraging Gene Synthesis for Systematic Optimization of Protein Production
- GRC Biocatalysis, 2014: Systematic Exploration of Sequence Space for Protein Engineering Poster
- RAFT X 2013: Multidimensional Optimization of Biological Systems to Fit Industrial Applications
- PepTalk, 2013: Infologs™ for Multi-dimensional Gene and Protein Optimization
- Society for Industrial Microbiology and Biotechnology, 2012: Design of Functional Bio Constructs
- Next Generation Protein Therapeutics, 2012: Use of Infologs™ to Accelerate Bioengineering
- Autodesk Ideas Conference, 2012: Gene Synthesis + Machine Learning = BioDesign
- Enzyme Engineering XXI, 2011: Using Infologs to Engineer Biological Systems
- DNA2.0 Pfenex Boehringer Ingelheim Collaboration Case Study/White Paper
Search the DNA2.0 Literature Database, containing over 1,000 scientific publications using DNA2.0 technology for references relevant to your research.
Highlight References:
ACS Synth Biol 2014. Mapping of Amino Acid Substitutions Conferring Herbicide Resistance in Wheat Glutathione Transferase. Govindarajan et al.
J Am Chem Soc 2013. Improved biocatalysts from a synthetic circular permutation library of the flavin-dependent oxidoreductase Old Yellow Enzyme. Daugherty et al.
Protein Eng Des Sel 2013 26(1):25-33. Redesigning and characterizing the substrate specificity and activity of Vibrio fluvialis aminotransferase for the synthesis of imagabalin. Midelfort, KS. et al.
IBC Antibody Engineering and Therapeutics Poster December, 2011: Strategies for Maximizing Information Content in Protein Libraries
PNAS 2010 107(5):1948-53. Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Chen, F. et al.
J Biol Chem 2009 284(39):26229-33. SCHEMA recombination of a fungal cellulase uncovers a single mutation that contributes markedly to stability. Heinzelman, P. et al.
PNAS 2009 106(14):5610-5. A family of thermostable fungal cellulases created by structure-guided recombination. Heinzelman, P. et al.
Protein Eng Des Sel 2008 21:699-707. Protein engineering of improved prolyl endopeptidases for celiac sprue therapy. Ehren, Govindarajan, Morón, Minshull, Khosla.
BMC Biotechnol 2007 7:16. Engineering proteinase K using machine learning and synthetic genes. Liao, Warmuth, Govindarajan, Ness, Wang, Gustafsson, Minshull.
Curr Opin Chem Biol 2005 9:202-9. Predicting enzyme function from protein sequence. Minshull, Ness, Gustafsson, Govindarajan.
Curr Opin Biotechnol 2003 14:366-70. Putting engineering back into protein engineering. Bioinformatic approaches to catalyst design. Gustafsson, Govindarajan, Minshull.
GenomeGPS™ and PathwayGPS™
The bioengineering technology developed by DNA2.0 can also be used to develop completely novel genomes or to optimize pathways.
PathwayGPS™ and GenomeGPS™ build on DNA2.0’s other GPS systems to explore higher order combinations of multiple genes into functionally improved metabolic pathways. Our capability for low-cost, high-capacity gene synthesis enables synthesis of multi-component multi-gene pathways up to several hundred kilobases in size.
Systematic non-correlated variation of control elements such as operators, promoters, and terminators across complex metabolic pathways while simultaneously varying individual genes to cover a range of expression levels, specificity and activity allows for sampling of vast areas of metabolic space. Application of advanced machine learning algorithms then enables a determination of each element’s contribution to pathway efficacy within a multitude of complex, interacting enzyme activities. The elements are then engineered to drive the system to its optimal performance using a minimum number of assays.
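Systematic, non-correlated variation of control elements can be illustrated with a textbook orthogonal array. In this sketch each column is one control element (say, the promoter of one gene in a hypothetical pathway) set to one of q levels, and each row is one construct. The arithmetic-modulo-a-prime construction is a standard design-of-experiments technique, offered as an example; it is not necessarily what PathwayGPS uses.

```python
def orthogonal_array(q):
    """Strength-2 orthogonal array OA(q*q, q+1, q, 2) for prime q.

    Rows are constructs; columns are control elements, each at one of q levels.
    In any two columns every pair of levels appears exactly once, so the elements
    vary without correlation in q*q constructs instead of q**(q+1).
    """
    if q < 2 or any(q % d == 0 for d in range(2, int(q ** 0.5) + 1)):
        raise ValueError("q must be prime for this construction")
    # Column 0 is j; column t+1 is (i + t*j) mod q.
    return [[j] + [(i + t * j) % q for t in range(q)]
            for i in range(q) for j in range(q)]
```

With q = 3 promoter strengths across four genes, nine constructs cover every pair of levels for every pair of genes exactly once, compared with 3^4 = 81 for the full factorial.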
Modular Libraries allow researchers to build new genetic modules or pathways from basic DNA parts including regulatory elements (promoters, untranslated regions, signal peptides, terminators) and/or coding sequences. Multiple genetic elements can be combined in various predefined arrangements to create new transcriptional units, biochemical pathways, genetic circuits, or develop strains displaying new traits.
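Combining parts in a predefined arrangement amounts to taking one part per slot, which a Cartesian product enumerates. The slot order and part names below are hypothetical placeholders, not entries from an actual DNA2.0 library.

```python
from itertools import product

# Hypothetical arrangement: the order of part slots in one transcriptional unit.
ARRANGEMENT = ("promoter", "5'UTR", "CDS", "terminator")


def build_units(library, arrangement=ARRANGEMENT):
    """Return every unit that fills each slot of `arrangement` with one part.

    library -- dict mapping each slot name to a list of part names
    """
    slots = [library[slot] for slot in arrangement]
    return [dict(zip(arrangement, parts)) for parts in product(*slots)]
```

The number of units is the product of the part counts per slot, so even modest libraries grow quickly; this is where a systematic design that samples the combinations becomes useful.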
This technology is partially covered by US patents 8005620, 8412461, and 8635029 issued to DNA2.0.