Culling the PDB by Resolution and Sequence Identity
(Last update: November 18, 2009)
Pre-compiled CulledPDB Lists from PISCES
pdbaa: Sequence Files Representing the Whole PDB
Downloadable standalone PISCES package

Pre-compiled CulledPDB Lists from PISCES
From this page, you can download lists that we have already compiled for various parameter
sets (resolution, sequence identity, etc.).
The current list of PISCES CulledPDB sets is shown below. You may request
other lists by using our Protein Sequence Culling Server.
In the list below, the percent sequence identity, resolution, and R-factor cutoffs are
given in each filename. For example, for cullpdb_pc20_res1.8_R0.25_d091114_chains2417,
the percent identity cutoff is 20%, the resolution cutoff is 1.8
angstroms, and the R-factor cutoff is 0.25. The date code d091114 (yymmdd) shows that the list
was generated on November 14, 2009, and the number of chains in the list is 2417. Files
with "inclNOTXRAY" also include sequences from non-X-ray structures (mostly NMR,
but also electron diffraction, FTIR, fiber diffraction, etc.). Files
with "inclCA" also include sequences of structures that contain only
backbone CA coordinates.
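The filename fields described above can be decoded mechanically. The following Python sketch splits a list name on underscores and reads each field by its prefix. The function and field names are hypothetical, and it assumes that the "d" code is a yymmdd date and that the "incl" flags may appear in any order.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class CullParams:
    identity_cutoff_pct: float  # "pc20"        -> 20% sequence identity
    resolution_cutoff: float    # "res1.8"      -> 1.8 angstroms
    r_factor_cutoff: float      # "R0.25"       -> R-factor 0.25
    generated: date             # "d091114"     -> date code, assumed yymmdd
    n_chains: int               # "chains2417"  -> chains in the list
    incl_not_xray: bool         # "inclNOTXRAY" -> non-X-ray structures included
    incl_ca_only: bool          # "inclCA"      -> CA-only structures included


def parse_cullpdb_name(name: str) -> CullParams:
    """Decode the cutoffs encoded in a CulledPDB list filename (a sketch)."""
    tokens = name.split("_")
    if tokens[0] != "cullpdb":
        raise ValueError(f"not a CulledPDB list name: {name!r}")
    values = {}
    flags = set()
    for tok in tokens[1:]:
        if tok.startswith("incl"):
            flags.add(tok)
        elif tok.startswith("chains"):
            # Drop any extension (e.g. ".fasta") after the chain count.
            values["chains"] = int(tok[len("chains"):].split(".")[0])
        elif tok.startswith("pc"):
            values["pc"] = float(tok[len("pc"):])
        elif tok.startswith("res"):
            values["res"] = float(tok[len("res"):])
        elif tok.startswith("R"):
            values["R"] = float(tok[len("R"):])
        elif tok.startswith("d"):
            values["d"] = datetime.strptime(tok[len("d"):], "%y%m%d").date()
    return CullParams(
        identity_cutoff_pct=values["pc"],
        resolution_cutoff=values["res"],
        r_factor_cutoff=values["R"],
        generated=values["d"],
        n_chains=values["chains"],
        incl_not_xray="inclNOTXRAY" in flags,
        incl_ca_only="inclCA" in flags,
    )
```

For example, parse_cullpdb_name("cullpdb_pc20_res1.8_R0.25_d091114_chains2417") gives a 20% identity cutoff, a 1.8 angstrom resolution cutoff, an R-factor cutoff of 0.25, a generation date of 2009-11-14, and 2417 chains. Reading fields by prefix rather than by position keeps the parser working whether or not the optional "incl" flags are present.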
Each file gives the PDB entry (four-letter code), the chain code ("0"
if there is only one chain in the entry), the experimental method (XRAY, NMR, etc.),
the number of residues in
the chain, the resolution, the R-value, and the free R-value (if
available; otherwise NA). The directory also includes a FASTA
sequence file for each list.
Lists currently available
The Culled PDB ftp directory
cull_d091114.tar.gz: A gzipped tar file with all of the lists.
pdbaa: Sequence Files Representing the Whole PDB
Three gzipped FASTA-format files of all PDB sequences are also available for
download:
pdbaa.gz:
every protein chain in every PDB file has its own entry in pdbaa.gz.
For example, 1A01B and 1A01D are two chains from PDB entry 1A01; they have separate
entries in pdbaa.gz even though their sequences are identical:
>1A01B 146 XRAY 1.80 0.169 0.223 HEMOGLOBIN (BETA CHAIN)
MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPATQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>1A01D 146 XRAY 1.80 0.169 0.223 HEMOGLOBIN (BETA CHAIN)
MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPATQRFFESFGDLST
PDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDP
ENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
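The header line of each pdbaa entry appears to carry the same fields as the CulledPDB lists: chain ID (PDB code plus chain), number of residues, method, resolution, R-value, free R-value, and the title. A minimal Python reader is sketched below, assuming whitespace-separated fields in that order and "NA" for missing values. read_pdbaa and PdbaaRecord are hypothetical names, not part of PISCES.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class PdbaaRecord:
    chain_id: str                # PDB code + chain, e.g. "1A01B"
    length: int                  # number of residues
    method: str                  # XRAY, NMR, ...
    resolution: Optional[float]  # None if the field is "NA" (assumed)
    r_value: Optional[float]
    free_r: Optional[float]
    title: str                   # free text, e.g. "HEMOGLOBIN (BETA CHAIN)"
    sequence: str


def _num(tok: str) -> Optional[float]:
    # Assumes missing values are written "NA", as in the CulledPDB lists.
    return None if tok == "NA" else float(tok)


def _make_record(header: str, seq_lines: List[str]) -> PdbaaRecord:
    # The first six fields are single tokens; everything after is the title.
    chain_id, length, method, res, r, free_r, title = header.split(maxsplit=6)
    return PdbaaRecord(chain_id, int(length), method,
                       _num(res), _num(r), _num(free_r), title,
                       "".join(seq_lines))


def read_pdbaa(lines: Iterable[str]) -> Iterator[PdbaaRecord]:
    """Yield one record per '>' header in a pdbaa-style FASTA stream."""
    header: Optional[str] = None
    seq_lines: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield _make_record(header, seq_lines)
```

A gzipped file can be read without unpacking it first by passing gzip.open("pdbaa.gz", "rt") to read_pdbaa. For the two hemoglobin chains above, the parsed length (146) matches the length of the joined sequence.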
pdbaaent.gz:
within each PDB file, only one chain per distinct sequence has an entry in pdbaaent.gz,
and the IDs of the other chains in that file with the identical sequence are listed at the end of the title of the
representative chain entry. For example, 1A01D does not have an entry in
pdbaaent.gz because it is identical to 1A01B, so the title of 1A01B is
changed to:
>1A01B 146 XRAY 1.80 0.169 0.223 HEMOGLOBIN (BETA CHAIN) || 1A01D
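The pdbaaent.gz collapse can be sketched as grouping chains by (PDB code, sequence). This Python sketch assumes the PDB code is the first four characters of the chain ID and that the first chain in file order becomes the representative; the page does not state how the representative is chosen within an entry, and collapse_within_entry is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def collapse_within_entry(
    chains: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Collapse identical chains within each PDB entry, pdbaaent-style.

    chains: (chain_id, header, sequence) tuples in file order, where header
    is the FASTA title line without the leading '>'.
    Returns (header, sequence) pairs, one per distinct sequence per entry.
    """
    # Key on (4-letter PDB code, sequence). Dicts keep insertion order, so
    # the first chain seen for each key becomes the representative
    # (an assumption: the within-entry rule is not stated on this page).
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}
    for chain_id, header, seq in chains:
        groups.setdefault((chain_id[:4], seq), []).append((chain_id, header))

    collapsed = []
    for (_, seq), members in groups.items():
        rep_header = members[0][1]
        redundant = [cid for cid, _ in members[1:]]
        if redundant:
            # Append the redundant chain IDs after "||", as in the example.
            rep_header += " || " + " ".join(redundant)
        collapsed.append((rep_header, seq))
    return collapsed
```

Because the grouping key includes the PDB code, identical chains from different entries stay separate here; merging across entries is what distinguishes pdbaanr.gz.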
pdbaanr.gz:
across the whole PDB, only one chain per distinct sequence has an entry in pdbaanr.gz,
and the IDs of all other chains with that identical sequence, from the same and other PDB files, are added at the end of the title of the
representative chain entry. Representative chains are selected first by the highest
resolution available and then by the best (lowest) R-value. Non-X-ray structures are considered
only after X-ray structures.
For example, 1A01D, 1A0WB, and 1A0WD do not have
entries in pdbaanr.gz because they are identical in sequence to 1A01B. The title of
1A01B is changed to:
>1A01B 146 XRAY 1.80 0.169 0.223 HEMOGLOBIN (BETA CHAIN) || 1A01D 1A0WB 1A0WD
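The selection rule for pdbaanr.gz representatives amounts to ranking identical chains by a composite key: X-ray before non-X-ray, then resolution, then R-value. The Python sketch below encodes that rule. Treating missing values as worst and breaking any remaining ties by input order are assumptions, and ChainInfo and pick_representative are hypothetical names.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class ChainInfo(NamedTuple):
    chain_id: str
    method: str                  # "XRAY", "NMR", ...
    resolution: Optional[float]  # None if not available
    r_value: Optional[float]     # None if not available


def _rank(c: ChainInfo) -> Tuple[bool, float, float]:
    # Smaller tuples rank better: X-ray before non-X-ray (False < True),
    # then lower (better) resolution, then lower R-value.
    # Missing values are treated as worst (an assumption).
    return (
        c.method != "XRAY",
        c.resolution if c.resolution is not None else math.inf,
        c.r_value if c.r_value is not None else math.inf,
    )


def pick_representative(identical: List[ChainInfo]) -> Tuple[ChainInfo, List[str]]:
    """Choose a representative among chains with identical sequences.

    Returns the representative and the IDs of the chains it stands for.
    Ties on the full key go to the earliest chain (min keeps the first).
    """
    best = min(range(len(identical)), key=lambda i: _rank(identical[i]))
    others = [c.chain_id for i, c in enumerate(identical) if i != best]
    return identical[best], others
```

The IDs returned alongside the representative are the ones that would be appended after "||" in its title, as in the 1A01B example above. Putting the X-ray flag first in the key means any X-ray chain, however poor its resolution, is preferred over a non-X-ray chain.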
Downloadable standalone PISCES package
PISCES.tar.gz:
the complete package for running PISCES locally in standalone mode.
BLASTDB.tar.gz:
pdbaa-related database files for PISCES. If you have already downloaded PISCES.tar.gz,
you only need to download this file to update it. Put all of the unpacked files into
PISCES/BLASTDB to complete the update.
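Unpacking BLASTDB.tar.gz into PISCES/BLASTDB can also be scripted. This Python sketch uses the standard tarfile module; it assumes the archive contains plain files (possibly under a single top-level directory), flattens any directory structure, and overwrites existing files of the same name. update_blastdb is a hypothetical helper, not part of the PISCES package.

```python
import shutil
import tarfile
from pathlib import Path


def update_blastdb(archive: str = "BLASTDB.tar.gz",
                   pisces_dir: str = "PISCES") -> Path:
    """Unpack a gzipped tar so that every file lands in PISCES/BLASTDB.

    Assumes the archive holds plain files; directory structure inside the
    archive is flattened, and same-named files in the target are replaced.
    """
    dest = Path(pisces_dir) / "BLASTDB"
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Keep only the base name: this flattens paths and never writes
            # outside dest, even if a member name contains "..".
            target = dest / Path(member.name).name
            src = tar.extractfile(member)
            if src is None:
                continue
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
    return dest
```

Copying each member by base name, instead of calling extractall, is a deliberate choice here: it matches the instruction to put all unpacked files directly into PISCES/BLASTDB regardless of how the archive is laid out.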
pdbaa.tar.gz:
the complete set of BLAST-searchable database files for pdbaa, which can be downloaded for use with MolIDE.
Contact us:
E-mail:
GL_Wang@fccc.edu
RL_Dunbrack@fccc.edu
Last modified November 18, 2009
by Guoli Wang & Roland L. Dunbrack, Jr.
Institute for Cancer Research
Fox Chase Cancer Center
333 Cottman Avenue
Philadelphia PA 19111