Protein Biological Unit Database (ProtBuD)
ProtBuD is a comprehensive database of asymmetric units (ASUs) and biological units (BUs) from the PDB and PQS. It uses SCOP and PSI-BLAST to provide the ASUs and BUs for all PDB protein structures in a particular superfamily or family. The database provides information on the molecular content of each entry (protein names and species, nucleic acids, small molecules, ions, etc.) and indicates whether the PDB and PQS biological units of an entry are the same.
A user can search ProtBuD for a particular SCOP designation (e.g., "b.2.5.2") or a particular PDB entry (with or without chain ID) to obtain the asymmetric and biological unit content of related proteins in the PDB as identified by PSI-BLAST.
The database can be updated weekly with a single click. The update automatically adds new entries, changes modified entries, and removes obsolete entries.
The software currently runs on Windows 2000 and XP, uses Microsoft .NET 1.1, and requires Windows Installer 2.0. Please perform the
To uninstall ProtBuD, go to Control Panel -> Add Or Remove Programs -> ProtBuD. Click Remove.
Windows Installer 2.0 for Win95, 98, Me, NT4 and 2000
Minimum Memory: 512MB
Minimum CPU: 1.8GHz
Minimum Disk Space: 800MB
The ProtBuD tool is designed to find biological units of all PDB entries in a specific SCOP classification or a homologous group as identified by PSI-BLAST. A user inputs a PDB entry with or without a chain ID, and ProtBuD first returns all SCOP domains and their SCOP codes for that PDB entry and chain ID, if specified. The user then clicks one of the SCOP families or superfamilies returned, and the database returns the PDB and PQS biological units as well as the asymmetric units of all PDB entries in that SCOP Family or Superfamily. Entries not in SCOP are added to the list if they are identified by PSI-BLAST as homologous to one of the entries in the SCOP family or superfamily.
To browse the contents of the PDB entries in the family or superfamily, the user can click a PDB code cell and use the UP or DOWN keys to navigate through the PDB-code list. In this way, the user can identify entries with desired content, such as specific proteins, nucleic acids, ligands, ions, etc.
For those queries consisting of a PDB entry not in SCOP, we use PSI-BLAST to find homologous structures. For a PDB query, the database returns a list of related sequences in the PDB and their corresponding SCOP designations, if these exist. A user can then click on the SCOP designation of one of the homologous proteins returned by the query search to analyze the biological units of all PDB entries in that specific SCOP category.
If there are no PSI-BLAST hits in the database, the program returns the biological units and asymmetric units of the query PDB.
The user can choose to input directly a known SCOP Family or Superfamily (e.g., "b.2.5.2") in order to retrieve the biological units of all PDB entries in that SCOP classification.
Examples of each type of query are shown below.
1. Enter one or two SCOP codes at the Fold, Superfamily, or Family level, for example b.2.5.2 (Family: p53 DNA-binding domain-like). The database returns a list of biological units of all PDB entries in the designated SCOP classification. If two SCOP codes are entered, all PDB entries that contain both SCOP domains are returned (examples: a.1.1.1 and a.1.1.1; c.5.1.1 and c.59.1.1).
2. Click any cell in the "PDB" column to display the entities (molecule types) and protein chains in the asymmetric unit ("Asymmetric chains"). Use the UP and DOWN keys to navigate through the list.
Table 1. Possible values for
3. Downloading ASU/BU files from the PDB and PQS ftp servers: A user can either download individual files by right-clicking the ASU or BU cell (see Figure 4) or download a list of ASU/BU files by selecting multiple rows.
4. Checking the "Interfaces" check box in the upper right activates the interfaces window for each entry. The symmetry operators and interface contacts can be displayed by left-clicking each entry of interest, or by browsing with the UP and DOWN arrow keys. Clicking an interface ID in the "Biological Unit Interfaces" window for an entry displays the symmetry operators and contacts for that interface. Files with coordinates of the proteins with that interface can be downloaded by right-clicking either a PDB interface ID or a PQS interface ID. To download multiple interface files at once, select several rows, then right-click the selected rows (see Figure 3).
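The two-code query described in step 1 can be pictured as a filter over the domain content of each entry. The following is a minimal sketch, not ProtBuD's actual implementation: the entry names and domain assignments are made up, and it assumes that entering the same code twice (as in "a.1.1.1 and a.1.1.1") asks for entries with at least two domains of that classification.

```python
from collections import Counter
from typing import Dict, List


def matches(query: str, domain_code: str) -> bool:
    """True if a full domain code falls under the query code at any SCOP level."""
    q = query.split(".")
    # Compare dot-separated fields so that "c.5" does not match "c.59.1.1".
    return domain_code.split(".")[: len(q)] == q


def entries_with_codes(queries: List[str], entry_domains: Dict[str, List[str]]) -> List[str]:
    """Return entries that contain a domain for every query code.

    Assumption: a code given n times requires at least n matching domains.
    """
    needed = Counter(queries)
    hits = []
    for entry, domains in entry_domains.items():
        if all(sum(matches(q, d) for d in domains) >= n for q, n in needed.items()):
            hits.append(entry)
    return sorted(hits)


# Hypothetical toy data: entry name -> SCOP codes of its domains.
toy = {
    "ent1": ["c.5.1.1", "c.59.1.1"],
    "ent2": ["c.5.1.1"],
    "ent3": ["a.1.1.1", "a.1.1.1"],
    "ent4": ["a.1.1.1"],
}

print(entries_with_codes(["c.5.1.1", "c.59.1.1"], toy))
print(entries_with_codes(["a.1.1.1", "a.1.1.1"], toy))
```

Splitting on the dots rather than comparing string prefixes keeps a Fold-level or Superfamily-level query from matching an unrelated code that merely starts with the same characters.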
1. Entering a PDB code with or without chain ID (or entity ID) will result in ProtBuD returning the SCOP domains and SCOP codes of that PDB entry/chain. If the input entry does not have a SCOP definition, a list of PSI-BLAST hits with their corresponding SCOP codes is returned. (Example entries: 1xb3, 1pk1, 1xmm.) If there are no PSI-BLAST hits for the input entry, ProtBuD returns only the biological units and asymmetric units for that entry. (Example entries: 1erf, 1g3x, 2ax3.)
2. Clicking on a SCOP code Family, Superfamily or Fold will return the biological units of all PDB entries at that SCOP level and the homologous PDB entries not present in SCOP, whose alignments with the query have E-values less than 0.001.
Clicking a PSI-BLAST hit entry will display the identity, E-value and alignment range between the query and hit sequences, instead of a SCOP code.
3. If the query is not in SCOP, ProtBuD will return a list of PSI-BLAST hits. For example, Figure 8 shows a list of PSI-BLAST hits for PDB entry 1pk1. In this table, the user can:
(a). Click on a hit entry to display the asymmetric units and biological units of the chosen PDB entry.
(b). Click a SCOP family, superfamily or fold to view the asymmetric units and biological units of all PDB entries in that SCOP category.
(c). Right click the table and choose one of three menu items:
(i). "Show ASU/BU of Query" to display ASU/BU of the query entry.
(ii). "Show ASU/BU of All" to display ASU/BU of all PDB entries in the table.
(iii). "Show ASU/BU of Non-SCOP hits" to display ASU/BU of hit entries and query entry.
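The E-value cutoff mentioned in step 2 (homologous entries not present in SCOP are included when their alignment with the query has an E-value below 0.001) amounts to a simple filter over the hit list. The sketch below is illustrative only: the `Hit` record, its field names, and the sample values are assumptions, not ProtBuD's data model.

```python
from dataclasses import dataclass
from typing import List, Optional

E_VALUE_CUTOFF = 0.001  # threshold stated in the text


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit."""
    pdb_id: str
    e_value: float
    identity: float            # percent identity between query and hit
    scop_code: Optional[str]   # None when the hit is not classified in SCOP


def non_scop_homologs(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep hits that lack a SCOP code and whose E-value is below the cutoff."""
    return [h for h in hits if h.scop_code is None and h.e_value < cutoff]


# Made-up hits for illustration.
hits = [
    Hit("hit1", 1e-40, 62.0, "b.2.5.2"),
    Hit("hit2", 3e-12, 35.5, None),
    Hit("hit3", 0.5, 21.0, None),
]

for h in non_scop_homologs(hits):
    print(h.pdb_id, h.e_value)
```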
1. Entering keywords (optionally preceded by the Boolean operators AND or OR) will return a list of PDB entries that contain those keywords in the PDB file. An example query is "tumor and suppressor".
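One way to read such a keyword query is as a left-to-right Boolean test against the text of each PDB file. The sketch below is an assumption about the semantics, not a description of ProtBuD's parser: it matches case-insensitive substrings, treats a missing operator as AND, and applies no operator precedence.

```python
def matches_query(query: str, text: str) -> bool:
    """Evaluate a keyword query such as "tumor and suppressor" against text.

    Assumptions: case-insensitive substring matching, left-to-right
    evaluation, and an implicit AND between keywords with no operator.
    """
    text_lc = text.lower()
    result = None
    op = "and"
    for token in query.lower().split():
        if token in ("and", "or"):
            op = token
            continue
        found = token in text_lc
        if result is None:
            result = found
        elif op == "and":
            result = result and found
        else:
            result = result or found
        op = "and"  # fall back to the default operator for the next keyword
    return bool(result)


header = "TUMOR SUPPRESSOR P53 COMPLEXED WITH DNA"  # made-up header text
print(matches_query("tumor and suppressor", header))
print(matches_query("kinase or tumor", header))
```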
The "Update" function downloads the latest database from our web server and automatically decompresses and installs the database in the default or a user-specified directory.
The "Advanced" operations menu contains functions for file download, database rebuilding, automatic database update, manual database update, and user-defined SQL query.
In the download-settings window, a user can set the paths for the ftp and http servers. The user must choose a local directory for the downloaded files, either by using the Browse(...) button or by typing the path. Clicking the Default button will reset to the original settings.
Clicking the "Download" menu under Operations -> Advanced will cause ProtBuD to download the selected file types. The program always checks first if there are any new or modified files. After comparing the local and remote files, ProtBuD will download only the new or modified files.
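The comparison of local and remote files can be sketched as follows. How ProtBuD actually detects a modified file is not stated here; this example assumes that each file is described by its size and modification time, and the file names are made up.

```python
from typing import Dict, List, Tuple

FileInfo = Tuple[int, float]  # (size in bytes, modification timestamp); an assumed signature


def files_to_download(remote: Dict[str, FileInfo], local: Dict[str, FileInfo]) -> List[str]:
    """Return names of remote files that are new or differ from the local copy."""
    return sorted(name for name, info in remote.items() if local.get(name) != info)


# Hypothetical listings of the remote server and the local data directory.
remote = {"a.xml": (100, 1.0), "b.xml": (200, 2.0), "c.xml": (300, 3.0)}
local = {"a.xml": (100, 1.0), "b.xml": (150, 1.5)}

print(files_to_download(remote, local))
```

Here `a.xml` is unchanged and skipped, `b.xml` differs from the local copy, and `c.xml` is missing locally, so only the last two would be fetched.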
The "Automatic Update" function downloads new and updated SCOP files, PDB files, PQS files and PDB PSI-BLAST Hit files. Then the newly downloaded files are parsed and the new information is inserted into the database. All steps are done with a single click.
"Manual Update" updates the data tables selected in the Data Table Select Dialog. The data files must be downloaded before this operation is run, and the data file directories must be set correctly in the download settings window.
Figure 12 shows the progress information for updating the database.
This operation rebuilds the whole database. A user can choose which types of tables to delete and rebuild. For instance, to rebuild only the tables generated from PDB XML files, the user checks the PDB checkbox, and the software rebuilds only the PDB tables. Keep in mind that the data files must first be downloaded using the "Download" function (Figure 7), and the local data directories must be set correctly in the download settings window (Figure 6).
This function allows a user to further analyze the database. For a given SQL select statement, the query results are displayed in an output window. Refer to the Database Structure and Database Description for column definitions and table names.
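As an illustration of the kind of select statement such a window might accept, the sketch below runs a query against a small in-memory SQLite table. Everything here is hypothetical: the table `demo_units`, its columns, and its values are invented for the example, SQLite stands in for whatever engine ProtBuD uses, and the real table and column names must be taken from the Database Structure and Database Description pages.

```python
import sqlite3
from typing import List, Sequence, Tuple


def run_select(conn: sqlite3.Connection, statement: str,
               params: Sequence = ()) -> Tuple[List[str], list]:
    """Run a read-only query and return (column names, rows)."""
    if not statement.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted")
    cur = conn.execute(statement, params)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Hypothetical table: one row per entry with its PDB and PQS biological units.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_units (pdb_id TEXT, pdb_bu TEXT, pqs_bu TEXT)")
conn.executemany(
    "INSERT INTO demo_units VALUES (?, ?, ?)",
    [("ent1", "A2", "A2"), ("ent2", "A", "A2")],
)

# Entries whose PDB and PQS biological units differ.
columns, rows = run_select(
    conn, "SELECT pdb_id FROM demo_units WHERE pdb_bu <> pqs_bu"
)
print(columns, rows)
```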
Checking or unchecking the checkboxes located at the top of the result window (see Figure 2) turns the corresponding columns in the biological unit table on or off. For instance, checking the "Asymmetric" checkbox will display the asymmetric ID format for the selected asymmetric and biological units. The default format is the "ABC" format.
A user can change the width of a table column and move the delimiters to change the table height. The table style can be reset by right-clicking and choosing the "Reset" menu item.
Right-clicking an Entity or Asymmetric-chain table displays a menu containing five options. "Polypeptide" displays only the polypeptide entities and/or asymmetric chains. "All" displays all entities and all polymer asymmetric chains. "SCOP code" displays only the entities with at least one SCOP code in the SCOP-code list. "Close" closes the table. "Reset" resets the table style.
An instant-help window shows up when a user places the mouse over a biological unit or interface table. For instance, positioning the mouse over a column header of PDBID in biolUnit form will bring up "Left click to browse entities and asymmetric chains. After selecting a PDBID cell, then use UP and DOWN keys to quickly browse all entries." See figure 17.
1. SCOP Structural Classification. SCOP provides four hierarchical levels that reflect the evolutionary and structural relationships of proteins with known structures. At the top level, SCOP classifies proteins as "a-helical", "all-b", "a/b", etc. The second level defines the folds, that is, the basic arrangement of secondary structure units. The third level, the "superfamily" level, includes proteins or domains with at least putative evolutionary relationships. These are broken down at the fourth level into "families", which generally include proteins with very similar functions and high sequence identity. For instance, SCOP code a.1.1.1 refers to the "all alpha proteins" class, the "Globin-like" fold, the "Globin-like" superfamily and the "Truncated hemoglobin" family. For details about the SCOP hierarchical classification, see http://scop.mrc-lmb.cam.ac.uk/scop/.
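The four levels map directly onto the dot-separated fields of a SCOP code. As a minimal sketch (the function name and the validation rules are assumptions made for this example), a code can be expanded into the code of each enclosing level like this:

```python
from typing import Dict

LEVELS = ("class", "fold", "superfamily", "family")


def scop_levels(code: str) -> Dict[str, str]:
    """Split a SCOP code into the codes of its enclosing levels.

    Accepts codes given at any level, e.g. "b", "b.2", "b.2.5" or "b.2.5.2".
    """
    parts = code.strip().split(".")
    valid = (
        1 <= len(parts) <= len(LEVELS)
        and parts[0].isalpha()
        and all(p.isdigit() for p in parts[1:])
    )
    if not valid:
        raise ValueError(f"not a SCOP code: {code!r}")
    return {LEVELS[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


print(scop_levels("a.1.1.1"))
print(scop_levels("b.2.5"))
```

For a.1.1.1 this yields class a, fold a.1, superfamily a.1.1 and family a.1.1.1; a Superfamily-level code such as b.2.5 simply has no family field.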
The format of modified residues in the Asymmetric Chain table is PDB sequence number (Author sequence number): standard residue -> modified residue (modification details). (e.g. 18(19): ALA -> AIB (ALPHA-AMINOISOBUTYRIC)). There are 4 other cases:
Covalent attachments to an asymmetric chain comprise a list of asymmetric chains that are covalently attached. For instance, if "A" is a polypeptide and "K" is a sugar, then each one will be included in the covalent attachments list of the other one.
Residue coordinates for that specific residue are missing from the PDB file.
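The modified-residue notation of the Asymmetric Chain table, for example "18(19): ALA -> AIB (ALPHA-AMINOISOBUTYRIC)", is regular enough to parse with a short pattern. The sketch below covers only that one documented form; the field names are invented for this example, the author sequence number is kept as text on the assumption that it may carry an insertion code, and strings in any other form return None.

```python
import re
from typing import Dict, Optional

# PDB sequence number (author sequence number): standard -> modified (details)
MODRES_PATTERN = re.compile(
    r"^\s*(-?\d+)\((-?\d+[A-Za-z]?)\):\s*(\w+)\s*->\s*(\w+)\s*\((.*)\)\s*$"
)


def parse_modified_residue(text: str) -> Optional[Dict[str, object]]:
    """Parse one modified-residue string; return None if it has another form."""
    m = MODRES_PATTERN.match(text)
    if m is None:
        return None
    return {
        "pdb_seq": int(m.group(1)),
        "author_seq": m.group(2),   # kept as text (assumed possible insertion code)
        "standard": m.group(3),
        "modified": m.group(4),
        "details": m.group(5),
    }


print(parse_modified_residue("18(19): ALA -> AIB (ALPHA-AMINOISOBUTYRIC)"))
```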