A Protein Sequence Culling Server
PISCES can produce several kinds of sequence subsets from larger sets:
- subsets of sequences culled from the entire PDB according to structure
quality and maximum mutual sequence identity
- subsets of sequences culled from a list of PDB chains or PDB entries input
by the user, according to structure quality and maximum mutual sequence identity
- subsets of sequences from user-input lists of GenBank accession numbers,
according to maximum mutual sequence identity
- subsets of sequences from user-input BLAST or PSI-BLAST output; the hits are
culled according to mutual sequence identity
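To illustrate what culling by maximum mutual sequence identity means, here is a minimal Python sketch. It is a simple greedy filter, not necessarily the procedure PISCES itself uses. The function name `cull`, the `identity` lookup table, and the convention that candidates arrive sorted from most to least preferred (for example, best resolution first) are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List


def cull(ids: List[str],
         identity: Dict[FrozenSet[str], float],
         max_identity: float) -> List[str]:
    """Greedy culling sketch (hypothetical helper, not PISCES code).

    ids          -- candidate identifiers, assumed sorted from most to
                    least preferred
    identity     -- pairwise identities keyed by frozenset({id_a, id_b});
                    pairs missing from the table are treated as unrelated
    max_identity -- highest identity allowed between any two kept entries
    """
    kept: List[str] = []
    for cand in ids:
        # Keep the candidate only if it is at or below the cutoff
        # against every entry that has already been kept.
        if all(identity.get(frozenset((cand, k)), 0.0) <= max_identity
               for k in kept):
            kept.append(cand)
    return kept


# Hypothetical chain identifiers and identities, for illustration only.
pairs = {
    frozenset(("chainA", "chainB")): 0.90,
    frozenset(("chainA", "chainC")): 0.10,
    frozenset(("chainB", "chainC")): 0.20,
}
result = cull(["chainA", "chainB", "chainC"], pairs, max_identity=0.25)
```

With these made-up values, `chainB` is dropped because it is 90% identical to the already kept `chainA`, while `chainC` stays below the 25% cutoff against every kept entry.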
Important features of this service:
- sequence identities for PDB sequences are determined by a combination of CE structural alignment
and PSI-BLAST alignment. We use a Z-score of 3.5 as the threshold for accepting a possible evolutionary
relationship, and sequence identity is calculated as Nc/Nt rather than Nc/(Nt+Ng), where
Nc = number of identical aligned pairs, Nt = total number of aligned pairs, and
Ng = number of gaps.
- PISCES' alignments are therefore local, so that two proteins that share a
common domain with sequence identity above the threshold will not both be included
in the output lists. Some other available servers use global (Needleman-Wunsch-type)
alignment methods that may provide meaningless sequence identities for multidomain proteins.
- PISCES can therefore also provide meaningful results at low sequence identity (15-30%),
unlike servers that rely only on pairwise sequence alignments.
- PDB sequences, experiment types (X-ray, NMR, etc.), resolutions, and R-factors are obtained
from the PDB's Data Uniformity Site.
These files have been curated by the RCSB to establish a uniform representation of all
structure data from the thousands of legacy files from the Brookhaven PDB.
- non-PDB sequences are culled with sequence identities from PSI-BLAST. We do not search
the non-redundant sequence database, but rather use the user's input sequences as the
database. This server will usually be used to cull a related set of sequences, for instance
those from a PSI-BLAST search.
- the server provides output lists of accession IDs and files of the sequences in FASTA
format on a webpage created for the user. The address is e-mailed to the user upon
completion of the calculation, and will be stored for one week.
- PISCES' PDB sequences are updated weekly from the Uniformity PDB files.
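The difference between the two identity definitions mentioned above, Nc/Nt and Nc/(Nt+Ng), can be made concrete with a short Python sketch. The function name `sequence_identity`, the `count_gaps` flag, and the representation of an alignment as two equal-length strings with `-` for gaps are assumptions made for this example, not part of PISCES.

```python
def sequence_identity(aligned_a: str, aligned_b: str,
                      count_gaps: bool = False) -> float:
    """Sequence identity of a pairwise alignment (illustrative helper).

    aligned_a, aligned_b -- equal-length aligned sequences, '-' marks a gap
    count_gaps           -- False: Nc / Nt (the definition stated above)
                            True:  Nc / (Nt + Ng)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    nc = nt = ng = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue          # column with no residues at all, ignore
        if a == "-" or b == "-":
            ng += 1           # gap position
        else:
            nt += 1           # aligned residue pair
            if a == b:
                nc += 1       # identical aligned pair
    denom = nt + ng if count_gaps else nt
    return nc / denom if denom else 0.0


# Toy alignment: 6 aligned pairs, 5 of them identical, 1 gap position.
without_gaps = sequence_identity("ACDE-FG", "ACDQWFG")                # 5/6
with_gaps = sequence_identity("ACDE-FG", "ACDQWFG", count_gaps=True)  # 5/7
```

Because gap positions are left out of the denominator, Nc/Nt is never lower than Nc/(Nt+Ng) for the same alignment.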
Access the server to create your own lists
or
Download precompiled CulledPDB lists and standalone PISCES and databases
Please cite the following in any work that uses lists provided by PISCES
G. Wang and R. L. Dunbrack, Jr. PISCES: a protein sequence culling server. Bioinformatics, 19:1589-1591, 2003.
A preprint of this paper is available, containing more details of the calculations in PISCES.
Contact the authors:
Institute for Cancer Research
Fox Chase Cancer Center
333 Cottman Avenue
Philadelphia PA 19111
(215) 728-2434
Last modified: July 22, 2004