What is protein design

Proteins are the major functional macromolecules in living cells. Structurally, a protein is a long, flexible chain of amino acids, in which the particular sequence of amino acids is determined by the DNA of the gene that encodes the protein in question. However, an unstructured and flexible chain of amino acids would be nonfunctional. To perform a specific function, a protein must “fold”, or adopt a specific three-dimensional conformation. Since it is now possible to synthesize artificial genes and to insert these into bacteria, it should be possible to create proteins with any desired amino acid sequence and, by extension, the desired function. In practice, however, the sequence-structure relationship is a complicated one: it is very difficult to predict structure from sequence, or to design sequences that fold into desired structures.


De novo protein design


A) Protein design begins by laying out a backbone that has the desired shape.
B) Rosetta design is used to identify amino acid sequences with low Rosetta score, good packing, and accurate secondary structure predictions.
C) Designs whose energy is much lower than that of any other conformation sampled in de-novo folding trajectories are selected.
D) For the best-folding designs we order DNA, grow E. coli transformed with that DNA, and collect the expressed proteins.
E) After purification, the quality of the protein design is assessed using CD melts (GuHCl and temperature), SEC-MALS, mass spectrometry, SAXS, and crystallography. Shown is a successful SAXS measurement.
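
The selection in step C can be illustrated with a minimal sketch. It assumes each folding trajectory has already been reduced to a list of (RMSD to the design model, score) pairs. The function names, the RMSD cutoffs, and the minimum gap are hypothetical values chosen for illustration; they are not taken from Rosetta or from the protocol used here.

```python
from typing import Dict, List, Optional, Tuple

# One sampled conformation: (RMSD to the design model in Å, score in energy units).
Decoy = Tuple[float, float]


def energy_gap(decoys: List[Decoy],
               near_cutoff: float = 2.0,
               far_cutoff: float = 4.0) -> Optional[float]:
    """Gap between the best far-from-design score and the best near-design score.

    A large positive gap means the designed conformation is much lower in
    energy than the alternatives that were sampled. The cutoffs are
    illustrative assumptions. Returns None if either group is empty.
    """
    near = [score for rmsd, score in decoys if rmsd <= near_cutoff]
    far = [score for rmsd, score in decoys if rmsd >= far_cutoff]
    if not near or not far:
        return None
    return min(far) - min(near)


def select_designs(trajectories: Dict[str, List[Decoy]],
                   min_gap: float = 5.0) -> List[str]:
    """Keep designs whose energy gap is at least min_gap (hypothetical threshold)."""
    selected = []
    for name, decoys in trajectories.items():
        gap = energy_gap(decoys)
        if gap is not None and gap >= min_gap:
            selected.append(name)
    return selected


# Toy data: "funnel" has its lowest scores near the design, "flat" does not.
funnel = [(0.8, -250.0), (1.5, -245.0), (6.0, -230.0), (9.0, -225.0)]
flat = [(1.0, -240.0), (7.0, -241.0)]
chosen = select_designs({"funnel": funnel, "flat": flat})
```

With this toy data only the design labelled "funnel" passes, because its near-design conformations score well below everything sampled far from the design.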


De-novo design of helical repeat proteins – My work

A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and play critical roles in molecular recognition, signaling, and other essential biological processes. Naturally occurring repeat proteins have been reengineered for molecular recognition and modular scaffolding applications. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix-loop-helix-loop structural motif. 83 designs with sequences unrelated to known repeat proteins were experimentally characterized. 53 were monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra closely consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models, with RMSDs ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.


In this figure we show the helical repeat protein universe. a, The geometry of a repeat protein can be described by helical parameters. Axial displacement (z), radius of the helix (r), and angular displacement or twist (ω) between repeat units are depicted. b, Designed repeat proteins (grey) cover regions of radius and twist space not occupied by native repeat protein families (in color). Positive ω values indicate designs forming right-handed helices; negative values indicate left-handed helices. Native families are: ANK, ankyrin; ARM, armadillo; TPR, tetratricopeptide repeat; HAT, half TPR; PPR, pentatricopeptide repeat; HEAT, HEAT repeat; PUM, pumilio homology domain; mTERF, mitochondrial termination factor; TAL, transcription activator-like effector; OTHER, alpha helical repeat proteins not in the other families. Designs structurally validated by small-angle X-ray scattering (SAXS) (black) or crystallography (black with red circle) are distributed throughout the space. On top, representative experimentally validated designs of a variety of shapes.
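
As a rough illustration of how z, r, and ω relate to the rigid-body transform between consecutive repeat units, here is a minimal sketch. It assumes the transform is given as a rotation matrix R and a translation t, expressed in a frame whose origin is the centroid of the first repeat unit, so that r is the distance from that centroid to the helical axis. The function name is hypothetical. Reporting z as positive and putting the handedness in the sign of ω is one way to match the convention in the caption; it is not necessarily how the values in the figure were computed.

```python
import math


def helical_parameters(R, t):
    """Helical parameters of the screw motion x -> R x + t.

    R: 3x3 rotation matrix (nested lists); t: translation (length-3 list),
    in a frame whose origin is the first repeat unit's centroid (assumption).
    Returns (z, r, omega): rise along the axis, radius, and twist in radians.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_omega = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    omega = math.acos(cos_omega)
    if omega < 1e-8 or math.pi - omega < 1e-8:
        raise ValueError("helical axis is ill-defined for twists near 0 or 180 degrees")

    # Unit rotation axis from the antisymmetric part of R.
    s = 2.0 * math.sin(omega)
    axis = [(R[2][1] - R[1][2]) / s,
            (R[0][2] - R[2][0]) / s,
            (R[1][0] - R[0][1]) / s]

    # Rise is the component of the translation along the axis.
    rise = sum(a * c for a, c in zip(axis, t))

    # The remaining translation is a chord of a circle of radius r.
    perp = [c - rise * a for a, c in zip(axis, t)]
    chord = math.sqrt(sum(p * p for p in perp))
    radius = chord / (2.0 * math.sin(omega / 2.0))

    # Report a positive rise; a negative twist then marks a left-handed helix.
    if rise < 0:
        rise, omega = -rise, -omega
    return rise, radius, omega


# Toy example: a unit at radius 10 Å, advanced by 30 degrees and 5 Å per repeat.
theta = math.radians(30.0)
Rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta), math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
t = [10.0 * math.cos(theta) - 10.0, 10.0 * math.sin(theta), 5.0]
z, r, omega = helical_parameters(Rz, t)  # about z = 5, r = 10, omega = 30 degrees
```

Flipping the sign of the axial component of t in the toy example gives the mirror-image case, which this sketch reports as ω of about -30 degrees with the same z and r.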


Functional repeat protein design – My ongoing projects

The ability to design large repeat proteins with precise geometry has led to numerous projects.

a. The ability to precisely adjust the shape of a protein scaffold from building blocks will make it much easier to quickly design de-novo protein-protein interfaces. In this project I have designed junction modules to connect the de-novo repeats. Junctions in naturally occurring leucine-rich repeat (LRR) proteins were recently designed by Keunwan Park and colleagues [Park et al., 2015]; however, LRRs are limited to beta-sheet interface interactions and do not offer the same degree of control as the designed helical repeat proteins.

b. Design of junctions between repeat proteins and parametrically designed bundles will allow us to make large protein cages and crystals. I am working with Una Nattermann and Yang Hsia on this project.

c. Controlled buildup of interdigitated proteins on mica or graphene has the potential to yield exotic materials with precisely controlled shape. I am working with Harley Pyles and Professor Jim De Yoreo (Pacific Northwest National Laboratory – Material Science) on this project.

d. BAR proteins bend bilayers and recruit interaction partners through poorly understood mechanisms [Mim et al., 2012]. In this project I have designed synthetic BAR proteins based on repeat proteins. My collaborator Jihong Bai (Fred Hutchinson) will be using these synthetic BAR proteins to investigate how the BAR domain works.

e. I am developing de-novo protein tubes. One potential application of these tubes is to control how carbon nanotubes assemble.

f. Embedding heme, iron, or copper in the repeats should yield electrically conductive protein wires. I am working with Anindya Roy on this.

Park, K., Shen, B. W., Parmeggiani, F., Huang, P.-S., Stoddard, B. L., & Baker, D. (2015). Control of repeat-protein curvature by computational protein design. Nature Structural & Molecular Biology, 22(2), 167–174.
Mim, C., Cui, H., Gawronski-Salerno, J. A., Frost, A., Lyman, E., Voth, G. A., & Unger, V. M. (2012). Structural basis of membrane bending by the N-BAR protein endophilin. Cell, 149(1), 137–145.

Improved protein-protein interface design and de-novo protein design – Computational aims

Protein-protein interface design has two challenging aspects: binding site identification and design of a de novo protein to scaffold the binding site interface. Currently very few, if any, protein-protein interfaces have been designed without first having an antibody structure (see related work below). I intend to change that with the following strategy.

Binding site identification: Binding site (stub) identification is challenging because it requires design of polar protein-protein interfaces. If we look to nature for inspiration, we see that many protein-protein interfaces are in loops [Antibodies]. However, loop design in Rosetta is difficult. I intend to overcome this in four ways. First, I will design hydrogen bond networks in helical and loop stubs using a new protocol called hb-net, developed by Scott Boyken. Second, I will develop a new technique that designs loops that are either structurally conserved in the PDB or whose direction can be forced by forming kinks early in the loop. Allowing loops in stub placement will dramatically increase the possible orientations for the polar residues, thereby increasing the chance that we will find favorable hydrogen bond networks. Third, since the first two steps are computationally expensive, I will use model-based search on high-performance compute architectures to direct exploration toward the best stub locations. Fourth, since loops are still likely to be designed poorly, I will produce multiple loop stubs from the same N and C stub termini. Combined with a high-throughput chip assay, having multiple loops for the same stub termini will increase the chances of finding good loops without having to re-order entire proteins.

De novo protein design: I intend to pursue two strategies to improve the design of functional de-novo proteins. The goal of both strategies is to design a topology that is able to scaffold a large binding site and also pass structure prediction filters.

Strategy 1. Repeat protein assembly: I propose to develop a protein design strategy that assembles proteins by connecting previously designed repeats with repeat junctions. Because all subunits will already have been experimentally tested, the new proteins are very likely to be experimentally viable. (See ongoing project a above.)

Strategy 2. Broken chain assembly: I propose to develop a broken-chain assembly strategy that can connect multiple stubs together with a stabilizing backbone. Backbone assembly will iterate between backbone design and identification of poorly designed regions, using a fast computational method to assess whether a protein folds.

Both strategies offer a way to quickly design large functional de-novo proteins that pass computational foldability tests.


Protein-protein interface design (circa 2011) – [Related work done by others]

Functional protein-protein interaction design was first accomplished in 2011 by Fleishman et al. [Fleishman et al., 2011]. The method begins by choosing where on the surface of the target protein to bind, which is done by copying an antibody binding site. Hotspots are then extracted from the antibody, and protein scaffolds whose shape can accommodate the hotspots are selected from the PDB. Additional interactions are designed on the protein scaffold. The protein is then experimentally refined using directed evolution and verified using X-ray crystallography.

Fleishman, S. J., Whitehead, T. A., Ekiert, D. C., Dreyfus, C., Corn, J. E., Strauch, E.-M., Wilson, I. A., & Baker, D. (2011). Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science (New York, N.Y.), 332(6031), 816–821.



High throughput protein-protein interface design (circa 2015) – [Related work done by others]


High throughput design of protein-protein interactions is now a reality thanks to work by Aaron Chevalier, Chris Bahl and others. Design begins with an antibody-identified binding site. Hotspots or binding site motifs are then copied onto a library of de-novo designed protein scaffolds. The library of designs is expressed on a chip and tested via high-throughput screening. Using this technique it is possible to design therapeutic protein-binding proteins quickly. However, the vast majority of the designed proteins are non-functional, so ongoing work seeks to improve reliability.