Our group uses modern non-parametric statistical methods to determine probability distributions and regressions of structural parameters in very high-resolution structures, such as side-chain rotamer distributions, backbone dihedral angles, and beta turns. We and others use these distributions in protein structure prediction algorithms, such as our SCWRL4 program and Rosetta. Current work focuses on deep learning neural networks for the analysis and prediction of protein structures, and on new methods for deriving independent training, validation, and testing sets for deep learning, which will be implemented on our PISCES server.
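To make the idea of a non-parametric distribution over a dihedral angle concrete, here is a minimal sketch of a kernel density estimate on the circle. It is an illustration under stated assumptions, not the procedure used in our published libraries: the von Mises kernel, the fixed concentration `kappa`, the function names, and the sample chi1 angles are all hypothetical choices made for this example.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I0(x), summed from its power series."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_kde(angles_deg, kappa=20.0):
    """Return a density function (per radian) for angles given in degrees.

    Each observation contributes one von Mises kernel, so the estimate is
    periodic: angles near +180 and -180 degrees are treated as neighbors.
    `kappa` (an assumed, fixed smoothing parameter) plays the role of an
    inverse bandwidth; larger values give sharper peaks.
    """
    angles = [math.radians(a) for a in angles_deg]
    norm = 1.0 / (2.0 * math.pi * bessel_i0(kappa) * len(angles))

    def density(query_deg):
        q = math.radians(query_deg)
        return norm * sum(math.exp(kappa * math.cos(q - a)) for a in angles)

    return density


# Hypothetical chi1 angles (degrees), made up for illustration only.
sample_chi1 = [-65.0, -62.0, -58.0, 178.0, -179.0, 61.0]
chi1_density = von_mises_kde(sample_chi1, kappa=20.0)
```

Calling `chi1_density(-60.0)` returns the estimated density near the most populated region of this toy sample. A periodic kernel is used here because an ordinary Gaussian kernel would split the population near 180 degrees into two artificial halves.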
Proteins perform their functions through interactions with other molecules, including other proteins in homo- or hetero-oligomeric assemblies as well as peptides, small ligands, and nucleic acids. Our ProtCID database provides access to clustered interactions of protein domains based on Pfam assignments to every PDB chain (PDBfam). ProtCID provides clusters of protein-protein interactions across all structures in the PDB, as well as clusters of domain-peptide, domain-ligand, and domain-nucleic-acid interactions. Current work focuses on clustering full protein assemblies across crystallographic, cryo-EM, and NMR structures of proteins in the PDB.
Antibodies use protein loops on their surface (“complementarity determining regions” or CDRs) to bind antigen. In 2011, we published a clustering of these CDR structures for all antibodies in the PDB; our nomenclature is now widely used to classify new antibody structures and is provided on our PyIgClassify server and database. We incorporated our clustering of antibody CDRs into the Rosetta program and developed an application within Rosetta called RosettaAntibodyDesign (RAbD), which can improve the affinity of antibodies by 10- to 100-fold. We are currently updating the clustering and nomenclature with new unsupervised clustering methods, and applying these methods to T-cell receptor CDRs. We are also applying deep learning to the antibody and T-cell receptor design problems.
Typical protein kinase domains exhibit a wide range of conformational states, usually grouped as “DFGin” and “DFGout” states, which describe the position of the active-site Asp residue of the DFG motif at the N-terminal end of the activation loop. We have clustered the structures of the DFG motif into eight conformational states. Six of these are “DFGin” states: the active state, two common inactive states, and three uncommon inactive states. A seventh state is the most frequent conformation of DFGout structures, and the remaining state is a “DFGintermediate” structure. We present a database and webserver (Kincore) that provides these classifications, as well as a classification of ligand types, for all kinases across the PDB. We are currently performing a similar analysis of the loop structures of RAS proteins in the PDB and will extend it to other protein families.