Protein Biological Unit Database (ProtBuD)

Copyright © 2006 Qifang Xu, Adrian A. Canutescu and Roland L. Dunbrack, Jr.
Fox Chase Cancer Center
Philadelphia, PA, USA

    System Requirements

    Database Description

    Other Features

    Term Definitions


ProtBuD is a comprehensive database of asymmetric units (ASUs) and biological units (BUs) from the PDB and PQS. It uses SCOP and PSI-BLAST to provide the ASUs and BUs for all PDB structures of proteins in a particular superfamily or family. The database provides information on the molecular content of each entry (protein names and species, nucleic acids, small molecules, ions, etc.) and on whether the PDB and PQS biological units of an entry are the same.

A user can search ProtBuD for a particular SCOP designation (e.g., "b.2.5.2") or a particular PDB entry (with or without chain ID) to obtain the asymmetric and biological unit content of related proteins in the PDB as identified by PSI-BLAST.

The database can be updated weekly with a single click. The update automatically adds new entries, changes modified entries, and removes obsolete entries.

back to index


The software currently runs on Windows 2000 and XP, uses Microsoft .NET 1.1, and requires Windows Installer 2.0. To install it, perform the following steps:

1 [required] - After filling out the license form and receiving an e-mail from us, download the file ProtBuD.msi from the link given in the e-mail and launch it, then follow the step-by-step installation instructions.

2 [optional] - If you cannot start the ProtBuD.msi installer by double-clicking on it, you may need to install Windows Installer 2.0 first. You can download this program from here.

3 [optional] - If .NET 1.1 is not installed on your machine, you will be shown an error message. If that happens, close the ProtBuD installer, download the dotnetfx.exe file from here, and install it by double-clicking on it. After installing .NET 1.1, run the ProtBuD installer again (Step 1).

To uninstall ProtBuD, go to Control Panel -> Add Or Remove Programs -> ProtBuD. Click Remove.

Minimum Requirements

Windows XP/2000

Windows .Net Framework Version 1.1 Redistributable Package

Windows Installer 2.0 for Win95, 98, Me, NT4 and 2000

Minimum Memory: 512MB

Minimum CPU: 1.8GHz

Minimum Disk Space: 800MB

back to index



The ProtBuD tool is designed to find biological units of all PDB entries in a specific SCOP classification or a homologous group as identified by PSI-BLAST. A user inputs a PDB entry with or without a chain ID, and ProtBuD first returns all SCOP domains and their SCOP codes for that PDB entry and chain ID, if specified. The user then clicks one of the SCOP families or superfamilies returned, and the database returns the PDB and PQS biological units as well as the asymmetric units of all PDB entries in that SCOP Family or Superfamily. Entries not in SCOP are added to the list if they are identified by PSI-BLAST as homologous to one of the entries in the SCOP family or superfamily.

To browse the contents of the PDB entries in the family or superfamily, the user can click a PDB code cell and use the UP or DOWN keys to navigate through the PDB-code list. In this way, the user can identify entries with desired content, such as specific proteins, nucleic acids, ligands, ions, etc.

For those queries consisting of a PDB entry not in SCOP, we use PSI-BLAST to find homologous structures. For a PDB query, the database returns a list of related sequences in the PDB and their corresponding SCOP designations, if these exist. A user can then click on the SCOP designation of one of the homologous proteins returned by the query search to analyze the biological units of all PDB entries in that specific SCOP category.

If there are no PSI-BLAST hits in the database, the program returns the biological units and asymmetric units of the query PDB.

The user can choose to input directly a known SCOP Family or Superfamily (e.g., "b.2.5.2") in order to retrieve the biological units of all PDB entries in that SCOP classification.

Examples of each type of query are shown below.

1. Enter one or two SCOP codes at the Fold, Superfamily or Family level, for example b.2.5.2 (Family: p53 DNA-binding domain-like). The database returns a list of the biological units of all PDB entries in the designated SCOP classification. If two SCOP codes are entered, all PDB entries that contain both SCOP domains are returned (examples: a.1.1.1 and a.1.1.1; c.5.1.1 and c.59.1.1).

Figure 1
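The one-code and two-code SCOP queries described in step 1 can be sketched in a few lines of Python. This is only an illustration, not ProtBuD's implementation: the entry names and domain assignments in ENTRY_DOMAINS are hypothetical, and the sketch assumes that a repeated code (such as a.1.1.1 and a.1.1.1) asks for two separate domains in the same entry.

```python
# Hypothetical data: entry name -> SCOP codes of its domains.
# The assignments are illustrative only and are not taken from ProtBuD.
ENTRY_DOMAINS = {
    "entA": ["b.2.5.2"],
    "entB": ["c.5.1.1", "c.59.1.1"],
    "entC": ["c.5.1.1"],
    "entD": ["a.1.1.1", "a.1.1.1"],
}

LEVELS = {2: "fold", 3: "superfamily", 4: "family"}


def scop_level(code):
    """Return the SCOP level named by a dotted code such as 'b.2.5.2'."""
    parts = code.split(".")
    well_formed = parts[0].isalpha() and all(p.isdigit() for p in parts[1:])
    if len(parts) not in LEVELS or not well_formed:
        raise ValueError(f"not a fold, superfamily or family code: {code!r}")
    return LEVELS[len(parts)]


def under(domain_code, query_code):
    """True if a domain's code falls under the (possibly shorter) query code."""
    d, q = domain_code.split("."), query_code.split(".")
    return d[:len(q)] == q


def entries_with(query_codes, entry_domains):
    """Entries containing a domain for each of the one or two query codes."""
    if len(query_codes) not in (1, 2):
        raise ValueError("enter one or two SCOP codes")
    for q in query_codes:
        scop_level(q)  # raises ValueError on malformed codes
    hits = []
    for entry, domains in entry_domains.items():
        if len(query_codes) == 1:
            ok = any(under(d, query_codes[0]) for d in domains)
        else:
            q1, q2 = query_codes
            # Assumption: the two codes must be satisfied by different domains.
            ok = any(
                under(d1, q1) and under(d2, q2)
                for i, d1 in enumerate(domains)
                for j, d2 in enumerate(domains)
                if i != j
            )
        if ok:
            hits.append(entry)
    return sorted(hits)
```

Comparing the dotted components one by one, rather than comparing string prefixes, keeps c.59.1.1 from being mistaken for a member of fold c.5.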

2. Click any cell in the "PDB" column to display the entities (molecule types) and protein chains in the asymmetric unit ("Asymmetric chains"). Use the UP and DOWN keys to navigate through the list.

Figure 2

Table 1. Possible values for the "SameBUs" column. This column indicates whether the PQS and PDB biological units are the same or different.

Same entity contents, same orientation
(PDBBU-Entity: (1.1)(2.1), PQSBU-Entity: (1.1)(2.1))
The interfaces in the PDB and PQS biological units are the same.

Same entity contents, different number of interfaces
(PDBBU-Entity: (1.3), PQSBU-Entity: (1.3))
1tui.pdb1 has 2 interfaces; 1tui_1.mmol has 3 interfaces.

Same entity contents, same number of interfaces, different orientation
(PDBBU-Entity: (1.3), PQSBU-Entity: (1.3))
Both biological units have 1 interface, but the orientations are different.

Entity content of one BU is a subset of the other. Interfaces in the smaller biological unit are contained in the larger biological unit
(PDBBU-Entity: (1.2), PQSBU-Entity: (1.4))
The PDB dimer is contained within the PQS tetramer.

Different entity contents, and one structure is not a substructure of the other
(PDBBU-Entity: (1.2), PQSBU-Entity: (1.4))
The PDB biological unit is neither the same size as the PQS biological unit nor a substructure of it.



(PDBBU-Entity: (1.1), PQSBU-Entity: (1.2)).
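The entity-content part of the comparison in Table 1 can be sketched as follows. This is a rough illustration only: the function names and the three returned labels are hypothetical and are not the values ProtBuD stores in the "SameBUs" column. As the table shows, the same pair of entity contents, (1.2) and (1.4), can belong to either the substructure case or the different case, so distinguishing those requires comparing interfaces and orientations, which this sketch does not attempt.

```python
import re
from collections import Counter


def parse_entity_format(text):
    """Turn an Entity-format string such as '(1.1)(2.2)' into entity -> copies."""
    pairs = re.findall(r"\((\d+)\.(\d+)\)", text)
    if not pairs:
        raise ValueError(f"no (entity.copies) groups found in {text!r}")
    return Counter({entity: int(copies) for entity, copies in pairs})


def compare_entity_contents(pdb_bu, pqs_bu):
    """Classify two biological units by entity content alone (hypothetical labels)."""
    a, b = parse_entity_format(pdb_bu), parse_entity_format(pqs_bu)
    if a == b:
        return "same-content"
    a_in_b = all(a[e] <= b[e] for e in a)  # a missing key counts as 0 copies
    b_in_a = all(b[e] <= a[e] for e in b)
    if a_in_b or b_in_a:
        # Only a candidate: the interfaces decide whether it is a true substructure.
        return "possible-substructure"
    return "different-content"
```

For example, compare_entity_contents("(1.2)", "(1.4)") reports a possible substructure, matching the dimer and tetramer rows of the table, both of which would need an interface comparison to be told apart.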

3. Downloading ASU/BU files from the PDB and PQS FTP servers: a user can download an individual file by right-clicking the ASU or BU cell (see Figure 4), or download several ASU/BU files at once by selecting multiple rows.

Figure 3.

4. Checking the "Interfaces" check box in the upper right activates the interfaces window for each entry. The symmetry operators and interface contacts can be displayed by left-clicking on each entry of interest, or by browsing with the UP and DOWN arrow keys. Clicking on an interface ID in the "Biological Unit Interfaces" window for an entry displays the symmetry operators and contacts for that interface. Files with the coordinates of the proteins in that interface can be downloaded by right-clicking on either a PDB interface ID or a PQS interface ID. To download multiple interface files at once, select several rows, then right-click on the selected rows (see Figure 3).

Figure 4.

1. Entering a PDB code with or without a chain ID (or entity ID) returns the SCOP domains and SCOP codes of that PDB entry/chain. If the input entry does not have a SCOP definition, a list of PSI-BLAST hits with their corresponding SCOP codes is returned (example entries: 1xb3, 1pk1, 1xmm). If there are no PSI-BLAST hits for the input entry, ProtBuD returns only the biological units and asymmetric units for that entry (example entries: 1erf, 1g3x, 2ax3).

Figure 5

2. Clicking on a SCOP code Family, Superfamily or Fold will return the biological units of all PDB entries at that SCOP level and the homologous PDB entries not present in SCOP, whose alignments with the query have E-values less than 0.001.

Figure 6
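The selection of homologous entries described in step 2 amounts to a threshold filter on the PSI-BLAST E-value. A minimal sketch follows, assuming a hypothetical Hit record; only the 0.001 cutoff comes from the text, while the field names and function name are made up for illustration.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # cutoff stated in the text


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit against the query."""
    pdb_id: str
    e_value: float
    in_scop: bool


def homologs_outside_scop(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits that are absent from SCOP and have an E-value below the cutoff."""
    return [h.pdb_id for h in hits if not h.in_scop and h.e_value < cutoff]
```

Entries that are already in SCOP are skipped here because they are reached through their SCOP classification rather than through the homology search.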

Clicking a PSI-BLAST hit entry will display the identity, E-value and alignment range between the query and hit sequences, instead of a SCOP code.

Figure 7

3. If the query is not in SCOP, ProtBuD returns a list of PSI-BLAST hits. For example, Figure 8 shows the list of PSI-BLAST hits for PDB entry 1pk1. In this table, the user can:

(a). Click on a hit entry to display the asymmetric units and biological units of the chosen PDB entry.

(b). Click a SCOP family, superfamily or fold to view the asymmetric units and biological units of all PDB entries in that SCOP category.

(c). Right click the table and choose one of three menu items:

(i). "Show ASU/BU of Query" to display the ASU/BU of the query entry.

(ii). "Show ASU/BU of All" to display the ASU/BU of all PDB entries in the table.

(iii). "Show ASU/BU of Non-SCOP hits" to display the ASU/BU of the hit entries and the query entry.

Figure 8

1. Entering keywords (optionally combined with the Boolean operators AND and OR) returns a list of PDB entries that contain those keywords in the PDB file. An example query is "tumor and suppressor".

Figure 9

Figure 10
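A keyword query with AND and OR can be pictured with the short sketch below. How ProtBuD actually parses and evaluates such queries is not described here, so the rules in this sketch are assumptions: operators are case-insensitive, keywords without an operator between them are joined with AND, and evaluation runs strictly left to right without operator precedence.

```python
import re


def matches_query(query, text):
    """Evaluate a flat AND/OR keyword query against a block of text."""
    words = set(re.findall(r"\w+", text.lower()))
    result = None
    operator = "and"  # assumed default when no operator is given
    for token in query.lower().split():
        if token in ("and", "or"):
            operator = token
            continue
        found = token in words
        if result is None:
            result = found
        elif operator == "and":
            result = result and found
        else:
            result = result or found
        operator = "and"  # reset to the default for the next keyword
    return bool(result)
```

With these rules, the example query "tumor and suppressor" matches only text that contains both words.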

back to index


The "Update" function downloads the latest database from our web server and automatically decompresses and installs the database in the default or a user-specified directory.

back to index


The "Advanced" operations menu contains functions for file download, database rebuilding, automatic database update, manual database update, and user-defined SQL query.


In the download-settings window, a user can set the paths for the ftp and http servers. The user must choose a local directory for the downloaded files, either by using the Browse(...) button or by typing the path. Clicking the Default button will reset to the original settings.

Figure 11.

Download

Clicking the "Download" menu under Operations -> Advanced will cause ProtBuD to download the selected file types. The program always checks first if there are any new or modified files. After comparing the local and remote files, ProtBuD will download only the new or modified files.

Figure 12.
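The comparison of local and remote files can be sketched as below. The text does not say which file attributes ProtBuD compares, so this sketch assumes a size and a modification timestamp per file; the function name and file names are hypothetical.

```python
def plan_download(remote, local):
    """Compare two listings of filename -> (size_bytes, modified_timestamp).

    Returns the files that are new on the server, the files that changed,
    and the local files that no longer exist on the server.
    """
    new = sorted(name for name in remote if name not in local)
    modified = sorted(
        name for name in remote
        if name in local and remote[name] != local[name]
    )
    obsolete = sorted(name for name in local if name not in remote)
    return new, modified, obsolete


# Hypothetical listings used only to illustrate the comparison.
remote_listing = {"a.xml.gz": (100, 2000), "b.xml.gz": (250, 2100), "c.xml.gz": (80, 1900)}
local_listing = {"a.xml.gz": (100, 2000), "b.xml.gz": (240, 1500), "d.xml.gz": (60, 1000)}
new_files, modified_files, obsolete_files = plan_download(remote_listing, local_listing)
```

Only the files in the first two lists would need to be fetched; the third list corresponds to entries that an update would remove.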

Automatic Update

The "Automatic Update" function downloads new and updated SCOP files, PDB files, PQS files and PDB PSI-BLAST Hit files. Then the newly downloaded files are parsed and the new information is inserted into the database. All steps are done with a single click.

Manual Update

"Manual Update" updates the data tables selected in the Data Table Select dialog. The data files must be downloaded before this operation is run, and the data file directories must be set correctly in the download-settings window.

Figure 13.

Figure 14 shows the progress information displayed while the database is being updated.

Figure 14.

Rebuild Database

This operation rebuilds the whole database. A user can choose which types of tables to delete and rebuild. For instance, to rebuild only the tables generated from PDB XML files, the user checks only the PDB checkbox, and the software rebuilds only the PDB tables. Keep in mind that the data files must first be downloaded using the "Download" function (Figure 12), and that the local data directories must be set correctly in the download-settings window (Figure 11).

Figure 15.

User Query

This function allows a user to further analyze the database. For a given SQL select statement, the query results are displayed in an output window. Refer to the Database Structure and Database Description for column definitions and table names.

Figure 16.
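To give a feel for the kind of SQL select statement a user might write, the sketch below runs one against a small in-memory SQLite table. Everything in it is a stand-in: the table name bu_summary, its columns, and its rows are hypothetical, and SQLite is used only so the example is self-contained. The real table and column names are the ones given in the Database Structure and Database Description.

```python
import sqlite3

# Hypothetical table and rows; not the ProtBuD schema or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bu_summary (pdb_id TEXT, pdb_bu_entity TEXT, pqs_bu_entity TEXT)"
)
conn.executemany(
    "INSERT INTO bu_summary VALUES (?, ?, ?)",
    [
        ("ent1", "(1.2)", "(1.4)"),
        ("ent2", "(1.1)(2.1)", "(1.1)(2.1)"),
        ("ent3", "(1.1)", "(1.2)"),
    ],
)


def entries_with_differing_units(connection):
    """Select the entries whose PDB and PQS entity contents differ."""
    cursor = connection.execute(
        "SELECT pdb_id FROM bu_summary "
        "WHERE pdb_bu_entity <> pqs_bu_entity "
        "ORDER BY pdb_id"
    )
    return [row[0] for row in cursor]
```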

back to index



Other Features

1. Choosing Biological Unit Formats

Checking or unchecking the checkboxes at the top of the result window (see Figure 2) turns the corresponding columns in the biological unit table on or off. For instance, checking the "Asymmetric" checkbox displays the asymmetric ID format for the selected asymmetric and biological units. The default format is the "ABC" format.

2. Changing The Data Table Style

A user can change the width of a table column and move the delimiters to change the table height. The table style can be reset with the "Reset" item in the right-click menu.

3. Filtering for Entity and AsymID

Right-clicking on an Entity or Asymmetric-chain table displays a menu containing five options. "Polypeptide" displays only the polypeptide entities and/or asymmetric chains. "All" displays all entities and all polymer asymmetric chains. "SCOP code" displays only the entities with at least one SCOP code in the SCOP-code list. "Close" closes the table. "Reset" resets the table style.

4. Instant Help When Placing the Mouse Over Window Elements

An instant-help window appears when a user places the mouse over a biological unit or interface table. For instance, positioning the mouse over the PDBID column header in the biolUnit form brings up "Left click to browse entities and asymmetric chains. After selecting a PDBID cell, then use UP and DOWN keys to quickly browse all entries." See Figure 17.

Figure 17

Term Definitions

1. SCOP Structural Classification. SCOP provides four hierarchical levels that reflect the evolutionary and structural relationships of proteins with known structures. At the top level, SCOP classifies proteins as "α-helical", "all-β", "α/β", etc. The second level defines the folds, which are characterized by the basic arrangement of secondary structure units. The third level, the "superfamily" level, includes proteins or domains with at least putative evolutionary relationships. These are broken down at the fourth level into "families", which generally include proteins with very similar functions and high sequence identity. For instance, SCOP code a.1.1.1 refers to the "all alpha proteins" class, the "Globin-like" fold, the "Globin-like" superfamily and the "Truncated hemoglobin" family. For details about the SCOP hierarchical classification, see

2. Asymmetric Unit: see PDB Biological Unit Tutorial

3. Biological Unit: see PDB Biological Unit Tutorial

4. Biological Unit Formatting

Format               Description                                 Example (1gzh)
Asymmetric Format    Unique PDB chains with number of copies     (A)(B, D)(C)
Author Chain Format  Author-named chains with number of copies   (A)(B, D)(C)
Entity Format        Entity ID with number of copies             (1.1)(2.2)(3.1)
ABC Format           ABC chains with number of copies*           A2BC

* The ABC format uses only the letters A, B, C, ...; the letter A is assigned to the entity with the maximum number of copies.
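The relationship between the Entity format and the ABC format in the 1gzh example can be sketched as a small conversion. This is an illustration under assumptions, not ProtBuD's code: the function name is hypothetical, and the sketch assumes that letters are handed out in order of decreasing copy number and that a count of 1 is left implicit.

```python
import re
from string import ascii_uppercase


def entity_to_abc(entity_format):
    """Convert an Entity-format string such as '(1.1)(2.2)(3.1)' to ABC format."""
    copies = [int(n) for _, n in re.findall(r"\((\d+)\.(\d+)\)", entity_format)]
    if not copies:
        raise ValueError(f"no (entity.copies) groups found in {entity_format!r}")
    if len(copies) > len(ascii_uppercase):
        raise ValueError("more entities than available letters")
    copies.sort(reverse=True)  # the largest copy number receives the letter A
    return "".join(
        letter + (str(n) if n > 1 else "")
        for letter, n in zip(ascii_uppercase, copies)
    )
```

Because the letters depend only on copy numbers, two biological units with different entity IDs but the same stoichiometry produce the same ABC string.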

5. Modified Residue Formatting

The format of modified residues in the Asymmetric Chain table is: PDB sequence number (author sequence number): standard residue -> modified residue (modification details), e.g. 18(19): ALA -> AIB (ALPHA-AMINOISOBUTYRIC). There are four other cases:

  • If the author sequence number is the same as the PDB sequence number, only the PDB sequence number is listed, e.g. 18: ALA -> AIB (ALPHA-AMINOISOBUTYRIC).
  • If the modified residue type is the same as a standard residue type, only the standard residue type is listed, e.g. 18: ASN (GLYCOSYLATION).
  • If there is only a non-standard residue name, that name is listed on its own, e.g. 18: MLY (N-DIMETHYL-LYSINE).
  • If there is no PDB residue number but there is an author residue number, only the author residue number is listed, in parentheses, e.g. (19): ASM (2-AMINO-4-OXO-4(1H-PYRROL-1-YL)BUTANOIC ACID).
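The general format and its four special cases can be collected into one small formatting function. The sketch below is only an illustration of the rules as stated; the function and parameter names are hypothetical, and None is used to mean that a value is not available.

```python
def format_modified_residue(pdb_seq, auth_seq, standard, modified, details):
    """Build the modified-residue string used in the Asymmetric Chain table."""
    if pdb_seq is None:
        position = f"({auth_seq})"           # author number only, in parentheses
    elif auth_seq is None or auth_seq == pdb_seq:
        position = str(pdb_seq)              # identical numbers: list only one
    else:
        position = f"{pdb_seq}({auth_seq})"  # general case
    if standard is None or standard == modified:
        residue = modified                   # a single residue name is enough
    else:
        residue = f"{standard} -> {modified}"
    return f"{position}: {residue} ({details})"
```

For example, format_modified_residue(18, 19, "ALA", "AIB", "ALPHA-AMINOISOBUTYRIC") reproduces the general-case string shown in the text.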

6. Covalent Attachments

Covalent attachments to an asymmetric chain comprise a list of asymmetric chains that are covalently attached. For instance, if "A" is a polypeptide and "K" is a sugar, then each one will be included in the covalent attachments list of the other one.

7. Missing Residues In Coordinates

The coordinates for the specified residue are missing from the PDB file.

back to index