Smooth Backbone-Dependent Rotamer Library 2010

Dataset

We provide the original dataset that we prepared to compile the 2010 Rotamer Library. We put considerable effort into preparing reliable data for the library computation, using a series of 10 steps. The preparation process is briefly described below. For more details, please refer to the publication indicated below, specifically the "Dataset preparation" section of its Supplemental Experimental Procedures, which describes all 10 steps.

The dataset is stored in a single self-descriptive file. It has a simple text format that is easy to parse. Its header contains brief instructions, definitions and descriptions. The file has more than 600 thousand record lines; each line describes one amino acid residue.

A few extracts of the dataset are shown below. To download the full version, please apply for a license.


Preparation in a few words

We first determined the full list of protein-containing PDB entries for which we could obtain electron densities from the Uppsala Electron Density Server (EDS). We have shown previously that side chains with nonrotameric dihedral angles about sp3-sp3 hybridized bonds, that is, angles far from the typical mean values (60°, 180°, 300°), have much lower electron density than average.
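
As a rough illustration of what "nonrotameric" means here, the sketch below measures how far a dihedral angle lies from the nearest typical sp3-sp3 value (60°, 180°, 300°). The function name and the idea of reporting a single deviation in degrees are illustrative assumptions; this is not the procedure used in the publication.

```python
# Typical mean values for sp3-sp3 dihedral angles, in degrees (from the text).
CANONICAL_CENTERS = (60.0, 180.0, 300.0)


def deviation_from_rotamer(chi_deg: float) -> float:
    """Return the smallest circular distance (degrees) to a canonical center.

    Hypothetical helper for illustration only. Angles may be given in either
    the 0..360 or the -180..180 convention.
    """
    def circular_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(circular_distance(chi_deg, c) for c in CANONICAL_CENTERS)


if __name__ == "__main__":
    for chi in (171.866, -60.0, 120.0):
        print(chi, deviation_from_rotamer(chi))
```

The largest possible deviation is 60°, reached exactly halfway between two centers (for example at 120°).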

This list was then filtered by the PISCES server and run through the SIOCS program to flip Asn, Gln, and His terminal dihedral angles to account for hydrogen bonding. We obtained a list of 3,985 protein chains from 3,845 entries with resolution better than or equal to 1.8 Å, an R-factor cutoff of 0.22, and mutual sequence identity of the chains of 50% or less.

We calculated the electron density at the atom coordinates of the 3,985 chains and computed the geometric mean of the electron density at the atomic positions in each residue. This value served as a quality filter to remove disordered residues, namely those with electron densities below the 25th percentile for each residue type. For the rotamer library calculations, the resulting number of unique residues totaled 581,128. We also accounted for incorrectly modeled leucine residues, and we analyzed trans and cis prolines separately, as well as disulfide-bonded and nondisulfide-bonded cysteines.
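
The quality filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (a residue type plus a list of per-atom densities), the assumption that all densities are strictly positive, and the simple rank-based percentile convention are all assumptions made for the example.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of strictly positive values (assumed positive here)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def drop_low_density(residues, cutoff=25.0):
    """Keep residues at or above `cutoff` percentile within their residue type.

    residues: list of (res_type, [per-atom electron densities]).
    The percentile of a residue is taken as the share of same-type residues
    with a strictly lower geometric-mean density (an assumed convention).
    """
    scores_by_type = defaultdict(list)
    for res_type, densities in residues:
        scores_by_type[res_type].append(geometric_mean(densities))

    kept = []
    for res_type, densities in residues:
        score = geometric_mean(densities)
        peers = scores_by_type[res_type]
        percentile = 100.0 * sum(s < score for s in peers) / len(peers)
        if percentile >= cutoff:
            kept.append((res_type, densities))
    return kept


if __name__ == "__main__":
    toy = [("SER", [1.0, 1.0]), ("SER", [2.0, 2.0]),
           ("SER", [3.0, 3.0]), ("SER", [4.0, 4.0])]
    print(drop_low_density(toy))
```

Percentiles are computed separately for each residue type, so a small residue with intrinsically few electrons is compared only against residues of the same type.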


Extracts of the Dataset file

DatasetForBBDepRL2010.txt

             # ALL COMMENT LINES START WITH "# ", PLEASE IGNORE THESE LINES WHEN PARSING
              #
              # Dataset used in computing of '2010 Backbone-dependent Rotamer Library'
              # Copyright (c) 2007-2012
              # Maxim V. Shapovalov and Roland L. Dunbrack Jr.
              # Fox Chase Cancer Center
              # Philadelphia, PA, USA
              #
              # File was generated in February, 2012
              #
              # ===============================================================================
              # Please cite this paper when publishing results based on our dataset or library:
              # ===============================================================================
              # Shapovalov, M.V., and Dunbrack, R.L., Jr. (2011).
              # "A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions." Structure, 19, 844-858.
              #
              # Column title abbreviations used:
              # --------------------------------
              #
              # * RES - three-character code for the i-th amino acid residue type
              #
              # * PDB_ID   - four-character PDB entry name
              # * CHAIN_ID - one-character chain ID
              # * RES_ID   - residue ID from ATOM-record lines
              #              ***** WARNING: It includes insertion code which may be not numerical; treat this column as a string
              # * ALTER_ID - one-character alternative conformation ID
              #              ***** WARNING: Please treat `!` alternative conformations as a single conformation.
              #                             A space or no character in this field is replaced with `!`. This is done for easier parsing.
              #
              # * OMEGA - omega torsion angle for i-th residue.
              #           A traditional definition is used, i.e. omega(i) precedes i-th residue not follows it:
              #           CA(i-1) - C(i-1) - N(i) - CA(i).  For example, cis proline means the cis peptide bond is before the proline.
              #
              # * CHI1 - chi1 torsion angle
              # * CHI2 - chi2 torsion angle (for residue types with no chi2 angle, NaN is provided instead, i.e. NaN = Not A Number)
              # * CHI3 - chi3 torsion angle (for residue types with no chi3 angle, NaN is provided instead)
              # * CHI4 - chi4 torsion angle (for residue types with no chi4 angle, NaN is provided instead)
              #
              # * BBPERC - electron density percentile for a group of the backbone atoms
              # * SCPERC - electron density percentile for a group of the side-chain atoms
              # * RSPERC - electron density percentile for a group of all residue atoms
              #
              # * FLP_STATE  - flip state for Asn, Gln and His concluded by Siocs software
              #                Possible values:
              #                                `Kept`    - original conformation is kept by Siocs
              #                                `Flipped` - original conformation is flipped by Siocs
              #                                `UNDEF`   - no information is provided by Siocs for some reason
              #                                `N./A.`   - non-applicable value for the rest of residue types
              #
              # * FLP_CONFID - `Flipped` or `Kept` confidence code for Asn, Gln and His by Siocs software
              #                Possible values:
              #                                `clear`    - high level of confidence for either `Flipped` or `Kept` by Siocs software
              #                                `probable` - middle level of confidence by Siocs software
              #                                `unsure`   - low level of confidence by Siocs software
              #                                `UNDEF`   - no information is provided by Siocs for some reason
              #                                `N./A.`   - non-applicable value for the rest of residue types
              #
              # * SS - one-letter secondary-structure codes for (i-1)-th, i-th and (i+1)th residues
              #
              # For details on electron density percentiles, please refer to the following paper:
              #
              # Shapovalov MV, Dunbrack RL Jr.
              # "Statistical and conformational analysis of the electron density of protein side chains." Proteins, 2007 Feb 1;66(2):279-303.
              #
              # The following residue types are provided: ARG ASN ASP CPR CYD CYH CYS GLN GLU HIS ILE LEU LYS MET PHE PRO SER THR TPR TRP TYR VAL
              # -----------------------------------------
              #
              # TPR are trans prolines
              # CPR are cis prolines
              # PRO include both trans and cis prolines
              #
              # CYH are nondisulfide-bonded cysteines
              # CYD are disulfide-bonded cysteines
              # CYS include both nondisulfide-bonded and disulfide-bonded cysteines
              #
              # All columns are tab-delimited.
              # ------------------------------
              #
             # RES PDB_ID  CHAIN_ID  RES_ID  ALTER_ID  OMEGA      PHI        PSI        CHI1      CHI2      CHI3      CHI4      BBPERC   SCPERC   RSPERC   SS    FLP_STATE  FLP_CONFID
              #
             ARG   1a2p    A         69      !         175.764    -97.348    125.795    171.866   189.419   286.267   148.456   7.484    59.358   40.532   TTC   N./A.      N./A.
              ARG   1a2p    A         72      !         173.119    -134.179   161.280    294.586   180.232   176.276   280.520   51.134   40.316   42.979   EEE   N./A.      N./A.
              ...
              SER   1a2p    A         38      A         -173.147   -70.162    -11.738    288.835   NaN       NaN       NaN       78.319   52.440   68.080   GGG   N./A.      N./A.
              SER   1a2p    A         50      !         172.341    -126.085   152.719    290.465   NaN       NaN       NaN       80.896   91.812   87.794   TEE   N./A.      N./A.
              ...
              GLN   1a2p    A         15      !         -174.003   -71.334    -15.915    288.411   198.670   -48.133   NaN       8.143    79.352   49.233   HHH   Kept       clear
              GLN   1a2p    A         31      A         178.544    -57.875    -41.798    288.200   170.764   -35.176   NaN       63.021   37.101   44.161   HHH   Kept       clear
              GLN   1a2p    A         104     !         179.566    -79.879    -41.775    293.901   181.327   95.711    NaN       20.989   40.133   33.230   TTT   Kept       clear
              GLN   1a3a    A         31      !         177.202    -74.527    -29.147    297.233   309.943   -45.855   NaN       86.350   69.375   76.526   HHH   Flipped    clear
              ...
              PRO   1a2p    A         21      !         172.859    -62.606    165.053    -3.922    5.290     -4.354    NaN       24.435   63.777   38.465   CTT   N./A.      N./A.
              PRO   1a2p    A         47      !         -174.356   -54.393    131.720    -16.341   16.766    -10.288   NaN       48.102   56.398   51.823   TTT   N./A.      N./A.
              ...
              CPR   1a4i    A         102     !         0.047      -82.903    156.917    29.614    -34.140   24.199    NaN       41.514   32.754   37.018   TTT   N./A.      N./A.
              CPR   1a4i    A         272     !         0.843      -78.813    -172.686   25.221    -23.577   12.734    NaN       29.712   21.141   25.246   TTT   N./A.      N./A.
              ...
              TPR   1a2p    A         21      !         172.859    -62.606    165.053   -3.922     5.290     -4.354    NaN       24.435   63.777   38.465   CTT   N./A.      N./A.
              TPR   1a2p    A         47      !         -174.356   -54.393    131.720   -16.341    16.766    -10.288   NaN       48.102   56.398   51.823   TTT   N./A.      N./A.
              ...
              ASN   1a2p    A         5       !         -175.640   -141.896   23.545    72.171     -16.270   NaN       NaN       39.409   46.826   44.700   CCC   Kept       clear
              ...
              ASN   1a3a    A         120     !         179.248    -74.424    -29.777   296.868    -20.719   NaN       NaN       20.311   37.946   30.339   HHH   Flipped    clear
              ASN   1a4i    A         8       !         -175.298   -83.178    93.920    190.943    -23.282   NaN       NaN       36.811   35.545   36.161   CCH   Kept       clear
              ...
              CYH   1a3a    A         82      !         -175.486   -116.448   122.085   306.796    NaN       NaN       NaN       94.560   87.639   94.289   EEE   N./A.      N./A.
              CYH   1a4i    A         147     !         -170.428   -67.142    -32.160   306.345    NaN       NaN       NaN       69.943   28.254   50.404   CHH   N./A.      N./A.
              ...
              CYD   1a7s    A         26      !         179.191    -156.003   165.151   271.016    NaN       NaN       NaN       85.909   41.553   70.490   EEE   N./A.      N./A.
              CYD   1a7s    A         123     !         -179.581   -136.108   177.296   306.932    NaN       NaN       NaN       58.133   34.026   47.400   EEE   N./A.      N./A.
              ...
              CYS   1a3a    A         82      !         -175.486   -116.448   122.085   306.796    NaN       NaN       NaN       94.560   87.639   94.289   EEE   N./A.     N./A.
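
Based on the header above, the file could be read with a short parser along the following lines. This is a minimal sketch: the function name `parse_dataset` is hypothetical, and the column order is assumed to match the column-title line of the extract. Comment lines are skipped, fields are split on tabs, numeric columns are converted to floats (`NaN` becomes a floating-point NaN), and `RES_ID` is kept as a string because it may carry an insertion code.

```python
import io
import math

# Column order assumed from the column-title line of the file header.
COLUMNS = ["RES", "PDB_ID", "CHAIN_ID", "RES_ID", "ALTER_ID",
           "OMEGA", "PHI", "PSI", "CHI1", "CHI2", "CHI3", "CHI4",
           "BBPERC", "SCPERC", "RSPERC", "SS", "FLP_STATE", "FLP_CONFID"]
FLOAT_COLUMNS = ("OMEGA", "PHI", "PSI", "CHI1", "CHI2", "CHI3", "CHI4",
                 "BBPERC", "SCPERC", "RSPERC")


def parse_dataset(handle):
    """Yield one dict per record line; comment and blank lines are skipped."""
    for line in handle:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}")
        record = dict(zip(COLUMNS, fields))
        for name in FLOAT_COLUMNS:
            record[name] = float(record[name])  # float("NaN") yields nan
        yield record


# Small in-memory sample built from one record line of the extract.
sample = "# a comment line\n" + "\t".join([
    "SER", "1a2p", "A", "50", "!", "172.341", "-126.085", "152.719",
    "290.465", "NaN", "NaN", "NaN", "80.896", "91.812", "87.794",
    "TEE", "N./A.", "N./A."]) + "\n"
records = list(parse_dataset(io.StringIO(sample)))

if __name__ == "__main__":
    first = records[0]
    print(first["RES"], first["RES_ID"], first["CHI1"],
          math.isnan(first["CHI2"]))
```

For the real file, the same function can be applied to an opened file object, for example `open("DatasetForBBDepRL2010.txt")`.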

2010 BBDep Rotamer Library Developers

Maxim Shapovalov and Roland Dunbrack

Article

A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Shapovalov, M.V., and Dunbrack, R.L., Jr., Structure 2011, 19, 844-858.