All posts by tjbrunette

What is protein design

Proteins are the major functional macromolecules in living cells. Structurally, a protein is a long, flexible chain of amino acids, in which the particular sequence of amino acids is determined by the DNA of the gene that encodes the protein in question. However, an unstructured and flexible chain of amino acids would be nonfunctional. In order to perform a specific function, a protein must “fold”, or adopt a specific three-dimensional conformation. Since it is now possible to synthesize artificial genes and to insert these into bacteria, it should be possible to create proteins with any desired amino acid sequence and, by extension, the desired function. In practice, however, the sequence-structure relationship is a complicated one, meaning that it is very difficult to predict structure from sequence, or to design sequences that fold into a desired structure.

De novo protein design

A) Protein design begins by laying out a backbone that has the desired shape.
B) Rosetta design is used to identify amino acid sequences with low Rosetta score, good packing, and accurate secondary structure predictions.
C) Designs which have much lower energy than any other conformations sampled in a de-novo folding trajectory are selected.
D) From the best folding designs we order DNA, grow E. coli transformed with the DNA, and collect our proteins.
E) After purification, the quality of the protein design is assessed using CD melts (GuHCl and temperature), SEC-MALS, mass spectrometry, SAXS, and crystallography. Shown is a successful SAXS measurement.
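
Step C can be illustrated with a small filter. This is a minimal sketch, not the Rosetta protocol itself. It assumes each design comes with the score of its design model plus a list of (RMSD to design model, score) pairs sampled in folding simulations. The names `Design`, `energy_gap`, and `select_designs`, the 2 Å cutoff used to define "other conformations", and the gap threshold of 10 score units are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Design:
    name: str
    design_score: float                 # score of the design model (lower is better)
    decoys: List[Tuple[float, float]]   # (RMSD to design model in Å, score) per sampled conformation


def energy_gap(design: Design, rmsd_cutoff: float = 2.0) -> float:
    """Lowest score among conformations far from the design model, minus the design score.

    A large positive gap means no alternative conformation competes with the design model.
    """
    far_scores = [score for rmsd, score in design.decoys if rmsd > rmsd_cutoff]
    if not far_scores:
        # Assumption: if nothing was sampled away from the design model,
        # treat the gap as unbounded rather than undefined.
        return float("inf")
    return min(far_scores) - design.design_score


def select_designs(designs: List[Design], min_gap: float = 10.0) -> List[Design]:
    """Keep designs whose design model is well separated from all alternatives."""
    return [d for d in designs if energy_gap(d) >= min_gap]


# Toy data: "funnel" has a clear gap, "competitor" has a distant low-scoring decoy.
funnel = Design("funnel", -300.0, [(0.8, -295.0), (6.0, -270.0)])
competitor = Design("competitor", -300.0, [(1.0, -298.0), (7.5, -299.0)])
selected = select_designs([funnel, competitor])
```

In practice the cutoff and threshold would have to be tuned against designs with known experimental outcomes; the values here are placeholders.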

De-novo design of helical repeat proteins – My work

A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and play critical roles in molecular recognition, signaling, and other essential biological processes. Naturally occurring repeat proteins have been reengineered for molecular recognition and modular scaffolding applications. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix-loop-helix-loop structural motif. 83 designs with sequences unrelated to known repeat proteins were experimentally characterized. 53 were monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra closely consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models, with RMSDs ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.

[Figure: the helical repeat protein universe]

In this figure we show the helical repeat protein universe. a, The geometry of a repeat protein can be described by helical parameters. Axial displacement (z), radius of the helix (r), and angular displacement or twist (ω) between repeat units are depicted. b, Designed repeat proteins (grey) cover radius and twist spaces not found in native repeat protein families (shown in color). Positive ω values indicate designs forming right-handed helices; negative values indicate left-handed helices. Native families are: ANK, ankyrin; ARM, armadillo; TPR, tetratricopeptide repeat; HAT, half TPR; PPR, pentatricopeptide repeat; HEAT, HEAT repeat; PUM, pumilio homology domain; mTERF, mitochondrial termination factor; TAL, transcription activator-like effector; OTHER, alpha helical repeat proteins not in the other families. Designs structurally validated by small angle X-ray scattering (SAXS) (black) or crystallography (black with red circle) are distributed throughout the space. On top, representative experimentally validated designs of a variety of shapes are shown.
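
The helical parameters in panel a can be made concrete with a short sketch. It places the centers of successive repeat units on an ideal helix around the z axis, given r, z, and ω, and computes the distance between neighboring units. The function names and the choice of the z axis as the helical axis are assumptions made for illustration; the sketch also assumes a positive axial displacement, so that a positive ω corresponds to a right-handed helix.

```python
import math


def repeat_unit_positions(radius, rise, twist_deg, n_units):
    """Place n_units repeat-unit centers on an ideal helix around the z axis.

    radius    : r, distance of each unit from the helical axis
    rise      : z, axial displacement between consecutive units (assumed > 0)
    twist_deg : omega, angular displacement between consecutive units in degrees;
                positive is right-handed, negative is left-handed
    """
    omega = math.radians(twist_deg)
    return [
        (radius * math.cos(k * omega), radius * math.sin(k * omega), k * rise)
        for k in range(n_units)
    ]


def neighbor_distance(radius, rise, twist_deg):
    """Distance between consecutive unit centers: sqrt(z^2 + (2 r sin(omega/2))^2)."""
    chord = 2.0 * radius * math.sin(math.radians(twist_deg) / 2.0)
    return math.hypot(rise, chord)


def units_per_turn(twist_deg):
    """Number of repeat units needed to complete one full turn of the helix."""
    return 360.0 / abs(twist_deg)


# Toy values: radius 10, rise 5, twist 90 degrees.
centers = repeat_unit_positions(10.0, 5.0, 90.0, 3)
spacing = neighbor_distance(10.0, 5.0, 90.0)
```

A twist near zero with a large radius gives a nearly flat, gently curving protein, while a large twist with a small radius gives a tightly wound one, which is the range the figure spans.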

Functional repeat protein design – My ongoing projects

[Figure: Ongoing_work]
The ability to design large repeat proteins with precise geometry has led to numerous projects.

a. The ability to precisely adjust the shape of a protein scaffold from building blocks will make it much easier to quickly design de-novo protein-protein interfaces. In this project I have designed junction modules to connect the de-novo repeats. Design of junctions in naturally occurring LRR proteins has recently been done by Keunwan[  ]; however, LRRs are limited to beta-sheet interface interactions and do not offer the same degree of control as the designed helical repeat proteins.

b. Design of junctions between repeat proteins and parametrically designed bundles will allow us to make large protein cages and crystals. I am working with Una Nattermann and Yang Hsia on this project.

c. Controlled buildup of interdigitated proteins on mica or graphene has the potential to build exotic materials with precisely controlled shape. I am working with Harley Pyles and Professor Jim De Yoreo (Pacific Northwest National Laboratory – Material Science) on this project.

d. BAR proteins bend bilayers and recruit interaction partners through poorly understood mechanisms[  ]. In this project I have designed synthetic BAR proteins based on repeat proteins. My collaborator Jihong Bai (Fred Hutchinson) will be using these synthetic BAR proteins to investigate how the BAR domain works.

e. I am developing de-novo protein tubes. One potential use of these tubes would be to control how carbon nanotubes assemble.

f. Embedding heme, iron, or copper in repeats should make electrically conductive protein wires. I am working with Anindya Roy on this.

Responsive cells – My proposed lab’s experimental aims

Nanorobotics

Engineering responsive cells will require design of novel proteins for localization, sensing, control, products, and delivery mechanisms. The possible applications for this technology are enormous. Digestive health could be improved by a Lactococcus lactis or Lactobacillus that can respond to C. difficile or digest allergy triggers. Alzheimer's could be cured by a leukocyte modified to cross the blood-brain barrier and, once across, release a protein that binds A-beta. Salmonella designed to bind selectively to tumor cells and release a localized toxin could treat cancer. In plants, a responsive cell could cure currently incurable viral diseases or mitigate the anti-GMO movement by producing protein toxins on demand.

My first two aims are cellular localization and a sensing system that works both in a cell and across a membrane.

Localization:

Bio-sensor:

Improved protein-protein interface design and de-novo protein design – Computational aims

Protein-protein interface design has two challenging aspects: binding site identification and design of a de novo protein to scaffold the binding site interface. Currently very few, if any, protein-protein interfaces have been designed without first having an antibody structure (see related work below). I intend to change that with the following strategy.

Binding site identification: Binding site (stub) identification is challenging because it requires design of polar protein-protein interfaces. If we look to nature for inspiration, we see that many protein-protein interfaces are in loops [Antibodies]. However, loop design in Rosetta is difficult. I intend to overcome this in four ways. First, I will design hydrogen bond networks in helical and loop stubs using a new protocol called hb-net developed by Scott Boyken. Second, I will develop a new technique that designs loops that are either structurally conserved in the PDB or whose direction can be forced by forming kinks early in the loop. Allowing loops in stub placement will dramatically increase the possible orientations for the polar residues, thereby increasing the chance that we will find favorable hydrogen bond networks. Third, since the first two steps are computationally expensive, I will use model-based search on high-performance compute architectures to direct exploration toward the best stub locations. Fourth, since loops are still likely to be designed poorly, I will produce multiple loop stubs from the same N and C stub termini. Testing multiple loops for the same stub termini in a high-throughput chip assay will increase the chances of finding good loops without having to re-order entire proteins.

De novo protein design: I intend to pursue two strategies to improve the design of functional de-novo proteins. The goal of both strategies is to design a topology that is able to scaffold a large binding site and also pass structure prediction filters.

Strategy 1. Repeat protein assembly: I propose to develop a protein design strategy that assembles proteins by connecting previously designed repeats with repeat junctions. Because all subunits will already have been experimentally tested, it is very likely that the new proteins will be experimentally viable. (See ongoing project a.)
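
The combinatorial side of Strategy 1 can be sketched as a search over a library of modules. The example below is a toy illustration under stated assumptions, not the proposed design method: repeats are represented only by hypothetical names, a junction is represented as an ordered pair of repeats it can connect, and `enumerate_assemblies` is a made-up helper that lists every chain in which each consecutive pair has a junction.

```python
def enumerate_assemblies(repeats, junctions, n_blocks):
    """Yield every chain of n_blocks repeat names in which each consecutive pair has a junction.

    repeats   : list of repeat-module names (hypothetical identifiers)
    junctions : set of (upstream, downstream) pairs for which a junction module exists
    n_blocks  : number of repeat modules per assembled protein (must be >= 1)
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")

    def extend(chain):
        # A chain of the requested length is a complete assembly.
        if len(chain) == n_blocks:
            yield tuple(chain)
            return
        # Otherwise try every repeat that a junction can attach to the last module.
        for nxt in repeats:
            if (chain[-1], nxt) in junctions:
                yield from extend(chain + [nxt])

    for first in repeats:
        yield from extend([first])


# Toy library: three repeats and three available junctions.
library = ["R1", "R2", "R3"]
available = {("R1", "R2"), ("R2", "R3"), ("R2", "R1")}
assemblies = list(enumerate_assemblies(library, available, 3))
```

Each resulting chain would still need to be built and checked in three dimensions; the enumeration only shows how quickly a small validated library expands into many candidate scaffolds.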

Strategy 2. Broken chain assembly: I propose to develop a broken-chain assembly strategy that can connect multiple stubs together with a stabilizing backbone. Backbone assembly will iterate between backbone design and identification of poorly designed regions, using a fast computational method to assess whether a protein folds.
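
The iteration described in Strategy 2 has a simple control flow, sketched below. This is only an outline under stated assumptions: `find_weak_regions` and `redesign_region` are hypothetical callables standing in for the fast foldability assessment and the backbone design step, and the toy backbone is just a list of per-segment quality scores.

```python
def iterative_assembly(backbone, find_weak_regions, redesign_region, max_rounds=10):
    """Alternate between assessing a backbone and redesigning its weak regions.

    Returns the final backbone and the number of redesign rounds that were run.
    Stops early once no weak regions are reported.
    """
    for round_index in range(max_rounds):
        weak = find_weak_regions(backbone)
        if not weak:
            return backbone, round_index
        for region in weak:
            backbone = redesign_region(backbone, region)
    return backbone, max_rounds


# Toy stand-ins (hypothetical): segments scoring below 5 count as poorly designed,
# and each redesign raises a segment's score by 3.
def toy_find_weak_regions(scores):
    return [i for i, s in enumerate(scores) if s < 5]


def toy_redesign_region(scores, index):
    updated = list(scores)
    updated[index] += 3
    return updated


final_backbone, rounds_used = iterative_assembly(
    [9, 1, 4], toy_find_weak_regions, toy_redesign_region
)
```

The cap on rounds matters because a real redesign step is not guaranteed to improve a region, so the loop needs a stopping condition other than convergence.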

Both strategies offer a way to quickly design large functional de-novo proteins that pass computational foldability tests.

Publications

In 2013 I switched to protein design. Based on my experience with protein structure prediction, I was able to develop a fast, robust, and completely computational method for protein design. My work has transformed protein design from a state where <1% of designs pass structure prediction (a key check of protein viability) to a state where approximately 50% pass. This improved design strategy has led to a very high success rate, with 44 out of 83 ordered proteins having the expected structure [ ]. This general method will serve as the basis for design of functional de-novo proteins in the near future.

From 2011-2013 I worked on homology modeling. In homology modeling, hybridizing homologs outperforms a wider search of conformation space near the homologs [unpublished work & ]. During these years I implemented the homolog identification and alignment system on http://robetta.bakerlab.org/ and studied how to explore a wide region of conformation space in the vicinity of homologs.

From 2005-2011 I researched de-novo structure prediction and new methods for optimization [ ,, ]. Much of this work remains unpublished due to negligible RMSD improvements (even though energy was significantly improved). As of today only 8% of monomeric proteins < 120 amino acids can be predicted to < 3 Å RMSD [unpublished work].