Below are the detailed steps of a complete homology modeling cycle. All these steps are done through the graphical interface and are prompted in Beginner Mode.
Step 1: Downloading Databases and Staying Up-to-date
The first time the program is used, the user must download our compressed (250 MB) Protein Relational Database, PDBfam (Main Menu -> Tools -> Download/Update Databases). It contains Pfam domain assignments for all PDB sequences, biologically active assemblies, descriptors and statistics. BAM automatically uncompresses it (to 1.6 GB). BAM's Urgent Message System notifies the user when a new PDBfam release is available.
The user can update or reinstall BAM itself at any time (Tools -> Reinstall/Update BAM). Whenever a new version of BAM is available, the user is notified automatically through the GUI. New versions of BAM are expected to be released from time to time to fix newly detected bugs. In addition, BAM relies on a set of online third-party services such as RCSB PDB and Sanger Pfam. Problems may occur when these providers change their formats or services. BAM updates will patch such format inconsistencies as they arise.
By default, the PDB sequence database, PDBAA, is included with the BAM installation. Optionally, the user can download a much larger, more comprehensive sequence database such as UniRef50 (1.8 GB compressed) or UniRef90 (4.0 GB compressed) via Main Menu -> Tools -> Download/Update Databases. A larger database leads to more accurate results for targets that are only remotely related to PDB structures, but it takes more computational time in the PSI-BLAST alignment step. We also recommend updating these databases every few months.
Open, paste, type in or edit a FASTA-formatted sequence file with up to 6 query protein sequences. The BAM package includes 5 sample sequence files that can be used to learn BAM and produce 5 different modeling projects (File -> Open -> Samples). When a new sequence file is requested, BAM suggests a template file to start from.
Amino acid residues are represented with a single letter. The user can modify the title tag of each target sequence to tell the sequences apart easily. BAM automatically checks the input for discrepancies. We recommend short sequence titles, since titles are truncated to 9 characters to fit the GUI elements. A button in the upper-right corner of the sequence form opens a Wikipedia page that explains the FASTA format in more detail.
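The input rules above (at most 6 sequences, one-letter residue codes, titles cut to 9 characters) can be illustrated with a minimal Python sketch. This is not BAM's own checker: the function names are hypothetical, and the accepted alphabet (the 20 standard amino acids) is an assumption, since the exact set of letters BAM accepts is not stated here.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet: 20 standard amino acids
MAX_SEQUENCES = 6       # limit stated in the text
MAX_TITLE_LENGTH = 9    # titles are truncated to fit the GUI


def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' title line")
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def check_records(records):
    """Validate the records and return them with truncated titles."""
    if not records:
        raise ValueError("no sequences found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences are allowed")
    checked = []
    for title, sequence in records:
        if not sequence:
            raise ValueError(f"sequence '{title}' is empty")
        unknown = sorted(set(sequence) - VALID_RESIDUES)
        if unknown:
            raise ValueError(f"sequence '{title}' has unexpected letters: {unknown}")
        checked.append((title[:MAX_TITLE_LENGTH], sequence))
    return checked
```

For example, a title such as "hemoglobin_alpha" would come back as "hemoglobi", which is why short, distinct titles are easier to tell apart in the GUI.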
BAM relies on a version of PSI-BLAST that we modified. It produces a sequence profile for each input target sequence. By default, 2 rounds of PSI-BLAST search are run against PDBAA. The PSI-BLAST sequence database, the number of rounds and other parameters can be modified in Main Menu -> Tools -> Settings -> View or Edit. During the final round of PSI-BLAST alignment, a sequence profile is saved for each target sequence. These profiles are used in secondary structure (SS) prediction (Step 4) and in profile-profile sequence alignment of the target and template sequences (Step 6).
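For readers who want a feel for the equivalent command-line run, the sketch below builds an argument list for the stock NCBI BLAST+ `psiblast` program. This is an approximation only: BAM uses its own modified PSI-BLAST, whose options are not documented here. The function name and the default file names are hypothetical; the flags are standard BLAST+ options.

```python
def psiblast_command(query_fasta, database="pdbaa", rounds=2, pssm_out="target.pssm"):
    """Build an argument list for stock NCBI BLAST+ psiblast (not BAM's modified build).

    query_fasta must hold a single target sequence, so one command is
    needed per target sequence in the project.
    """
    if rounds < 1:
        raise ValueError("at least one PSI-BLAST round is required")
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,                  # PDBAA by default, as in the text
        "-num_iterations", str(rounds),   # default of 2 rounds, as in the text
        "-out_ascii_pssm", pssm_out,      # profile written as a text PSSM
        "-save_pssm_after_last_round",    # keep the profile from the final round
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and a formatted database are installed.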
Step 4: Secondary Structure Prediction for Target Sequences
BAM runs the PsiPred software to predict the Target secondary structure (SS). The prediction can be viewed right away for each PSI-BLAST round or for all rounds together. It is also shown in the Target-Template alignment (Step 9).
Viewing the SS predictions for each round may help to determine how many PSI-BLAST rounds are needed in Step 3 above. The default of 2 rounds can be adjusted. Changing the number of rounds may produce a more accurate sequence profile for a more distant target sequence with a weaker homology signal.
Each target sequence is searched for family domains using the Pfam web service, and Pfam domains are assigned to each of the Target sequences. The number, order, locations and names of these Pfam domains define the Target sequence domain architecture. Any domain can be clicked to open its Pfam domain Wikipedia page. Domain assignment not only helps the user learn more about the Target functional domains but also allows a more sensitive search for templates, as discussed in Step 6.
The drawback of this domain-based approach is that no templates can be found when no Pfam domains are detected in the Target, even if a homologous structure exists in the PDB. We plan to add an option to run the alignment search against the whole PDB when no domains are assigned to the Target. For now, the only workaround in that case is to use our MolIDE1.
Step 6: Search for Templates with Similar Domain Architecture
Our PDBfam database, which can be downloaded or updated (Step 1), stores assigned Pfam domains for the entire database of PDB structures. First, a very quick search of the Target sequence domain architecture is performed against the PDBfam database. This search generates a list of template domain architectures, with those most similar to the Target at the top. The number of PDB structures belonging to each template architecture is shown next to it. If not all target domains are found, the most similar architectures are still shown. The user can browse through the list to identify templates with additional protein subunits of interest. This approach, referred to as an intermediate search in the literature, allows for greater sensitivity and also saves time by not performing calculations against the whole PDB.
Second, the user selects one or more template architectures from the list and clicks the Compute button. For each template from the selected architectures, BAM performs profile-profile sequence alignments based on our own algorithm (Wang and Dunbrack, 2004). Precomputed template profiles are downloaded from our server on the fly. At any time the user can refresh the table of alignment statistics to start analyzing the results without waiting for the whole process to finish.
Once the profile-profile alignment calculations are done, the default table sorting puts structures with the most similar domain architecture at the top. Within the same architecture, structures are further sorted by sequence identity to the target sequence(s), with the highest identity at the very top.
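The ranking of template architectures by similarity to the Target can be sketched as follows. The scoring used here (number of shared Pfam domains first, then fewest extra domains) is an assumption made for illustration, not the PDBfam search itself, and it ignores domain order and location. The function names and the example architecture labels are hypothetical.

```python
from collections import Counter


def architecture_score(target_domains, template_domains):
    """Return (shared, -extra): more shared domains and fewer extras rank higher."""
    target, template = Counter(target_domains), Counter(template_domains)
    shared = sum((target & template).values())   # domains present in both
    extra = sum((template - target).values())    # template domains the Target lacks
    return shared, -extra


def rank_architectures(target_domains, architectures):
    """Sort architecture names, most similar to the Target first.

    architectures maps a label to its list of Pfam domain names.
    """
    return sorted(
        architectures,
        key=lambda name: architecture_score(target_domains, architectures[name]),
        reverse=True,
    )


target = ["SH3_1", "SH2", "Pkinase"]
architectures = {
    "A": ["SH2"],
    "B": ["SH3_1", "SH2", "Pkinase"],
    "C": ["SH3_1", "SH2", "Pkinase", "C1_1"],
    "D": ["Pkinase", "SH2"],
}
ranked = rank_architectures(target, architectures)
```

Partial matches such as "D" and "A" stay in the list, mirroring the behavior described above when not all target domains are found.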
BAM models biologically active assemblies rather than asymmetric units, which do not necessarily correspond to a biologically active complex. For example, an asymmetric unit of hemoglobin can be a hetero-dimer with only two subunits: alpha and beta. The biological assembly of hemoglobin is a hetero-tetramer: two alpha subunits and two beta subunits. BAM shows the stoichiometry of template biological assemblies in terms of 1) the input target chains, 2) any template complex chains and 3) domains assigned to any chain (when the table is in Architecture mode).
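The hemoglobin example can be expressed as a small stoichiometry calculation. The sketch below is illustrative only: the "A2B2" notation and the function name are assumptions, not necessarily the labels BAM displays in its table.

```python
from collections import Counter


def stoichiometry(chain_entities):
    """Summarize an assembly as e.g. 'A2B2' from one entity label per chain.

    Distinct entities are lettered A, B, C, ... in order of first appearance,
    so this sketch supports at most 26 distinct entities.
    """
    counts = Counter(chain_entities)
    letters = {}
    for entity in chain_entities:
        if entity not in letters:
            letters[entity] = chr(ord("A") + len(letters))
    return "".join(f"{letters[entity]}{counts[entity]}" for entity in letters)


asymmetric_unit = stoichiometry(["alpha", "beta"])
biological_assembly = stoichiometry(["alpha", "beta", "alpha", "beta"])
```

Here the asymmetric unit comes out as a hetero-dimer and the biological assembly as a hetero-tetramer, which is the distinction the table's stoichiometry columns are meant to expose.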
The table is sortable by any column such as resolution, sequence assembly architecture, chain assembly architecture, and for each sequence: sequence alignment identity, alignment gap percentage, alignment length, alignment start/end positions etc. Additional columns include 4-character PDB code, PDB title, PDB description, keywords, etc.
The default sorting can be restored at any time by clicking the 'Best Template Sort' button.
If several template sequences share the same Pfam domain(s), BAM assigns a target sequence with the same Pfam domain(s) to the template sequence with the highest sequence alignment identity.
Conversely, when several target sequences share the same Pfam domain(s), a template sequence with the same Pfam domain(s) is assigned to the target sequence with the highest identity.
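One way to read the two assignment rules above is as a greedy pairing by descending sequence identity. The sketch below follows that reading; it is an assumption about the procedure, not BAM's actual code, and the function name and the identity values are made up for illustration.

```python
def assign_targets(identities):
    """Pair target and template sequences greedily by descending identity.

    identities maps (target_id, template_id) to percent identity, and should
    list only pairs that share the same Pfam domain(s). Each target and each
    template sequence is used at most once.
    """
    assignment = {}
    used_templates = set()
    ranked_pairs = sorted(identities.items(), key=lambda item: item[1], reverse=True)
    for (target, template), _identity in ranked_pairs:
        if target in assignment or template in used_templates:
            continue  # one side of this pair is already taken
        assignment[target] = template
        used_templates.add(template)
    return assignment


# Hypothetical identities for two targets and two template sequences.
example = assign_targets({
    ("T1", "A"): 45.0,
    ("T1", "B"): 62.0,
    ("T2", "A"): 38.0,
    ("T2", "B"): 55.0,
})
```

In this example T1 takes template B (the highest identity overall), so T2 falls back to template A.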
Step 8: Reviewing Domain Architecture of Template Assembly
Each row in the table corresponds to a separate template. Double-clicking any row opens a form summarizing the selected template domain architecture. The user can try several templates from the table until a satisfactory template structure is found.
BAM models a biologically active assembly by applying symmetry operators retrieved from the RCSB PDB. In turn, the PDB collects this information from the authors who submitted the structure. The authors make their assembly decision based on a set of experiments or by making their best guess. When no author-defined assembly is available, the PDB assigns an assembly from the assembly prediction software PISA. This approach does not guarantee against mistakes, but it is more likely to be correct than using the asymmetric unit instead. Users should inspect several biological assemblies for a given template architecture. If they are inconsistent, the issue should be explored further by consulting the relevant papers on the experimental structures.
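Applying symmetry operators means transforming the asymmetric-unit coordinates with a rotation matrix and a translation vector, x' = R x + t, once per operator. The following is a minimal sketch of that idea in plain Python; the function names and the chain-naming scheme are hypothetical, and reading the operators from a PDB file is left out.

```python
def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to a list of (x, y, z) coordinates."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return transformed


def build_assembly(asymmetric_unit, operators):
    """Generate one copy of every chain per operator.

    asymmetric_unit maps chain IDs to coordinate lists; operators is a list
    of (rotation, translation) pairs. Copies are named '<chain>_<operator number>'.
    """
    assembly = {}
    for number, (rotation, translation) in enumerate(operators, start=1):
        for chain_id, coords in asymmetric_unit.items():
            assembly[f"{chain_id}_{number}"] = apply_operator(coords, rotation, translation)
    return assembly


identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0, 0, 0))
twofold_z = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (0, 0, 0))  # 180-degree turn about z

dimer = {"A": [(1.0, 2.0, 3.0)], "B": [(4.0, 5.0, 6.0)]}
tetramer = build_assembly(dimer, [identity, twofold_z])
```

With an identity operator and one two-fold axis, a two-chain asymmetric unit becomes a four-chain assembly, as in the hemoglobin case.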
Step 9: Target-Template Alignment Editor + 3D Viewer
(1) Study the produced profile-profile alignments and check how well the experimental and predicted secondary structures match each other. For difficult alignments, other servers such as HHpred may be consulted. It is generally recommended to move gaps to the center of a coil SS region, so that alpha helices and beta sheets are not disrupted. If needed, adjust the alignment by moving, inserting and deleting gaps in both the Target and Template sequence(s). Save the alignment changes when prompted by clicking the 'Save Alignment Changes' button.
(2) In the top-left panel the user checks/unchecks inclusion of the specific template chains into the final target model.
At any time the user may want to click 'Hide Legend / Show Legend' button for quick reference on the designations and how to manipulate Alignments and 3D Viewer discussed below.
BAM automatically downloads a PDB XML file of the template selected in Step 8 and constructs its biological assembly. The top-left panel holds all input Target sequences with matching chains from the template assembly. The top panel holds a rotatable 3D structure of the template assembly. The default view is the Ca-Ca backbone, where the nodes are Ca atoms connected with sticks. The aligned template pieces are colored from blue (N-terminus) to red (C-terminus) for each chain. The following parts of the template chains are colored in gray:
not aligned pieces,
the whole protein chains specifically excluded by the user from the target model,
deletions (residues removed from the template sequence(s) to align Target),
the starting and ending residues manually excluded by the user (details below) from the final model.
The bottom panel shows the Target-Template alignment for each input target sequence. Scroll down to see all Target-Template alignment pairs. The upper sequence, in red, is always a target sequence. The sequence beneath it, in blue, is always the aligned template sequence. The target sequence numbering always starts with 1. The template sequence numbering follows its original PDB structure. Matching residue types are printed with their one-letter code in white. Similar types are marked with a white '+'. Mismatched types keep the original color of their sequence.
The SS of both Target (predicted by PsiPred) and Template (experimental, taken from the PDB) is drawn as small color-filled squares. Helix is in red, sheet is in green and coil is in gray. The higher the color intensity of the Target SS, the higher the confidence of the PsiPred prediction. A template residue with missing coordinates has a '?' character inside its secondary structure box.
There may be two types of gaps in sequence alignment of Target and Template: gaps in Target and gaps in Template. It is convenient to define them relative to Template, i.e. required modifications applied to Template to achieve Target.
Alignment gaps in Target are deletions from the Template sequence, i.e. a few template residues need to be deleted from the Template in order to achieve a model of the Target sequence. Such deletions from Template are drawn with small red circles inside the SS boxes in the alignment panel and with small red spheres positioned on the Ca atoms of the template Ca-Ca backbone in the 3D viewer panel. The way to remember: Red = STOP = Delete from Template.
Alignment gaps in Template are insertions into Template sequence, i.e. there are a few target residues to be inserted into Template in order to achieve a Target sequence model. Such insertions into Template are drawn with small yellow circles inside SS boxes in the alignment panel and small yellow spheres positioned on the stick between two template neighboring Ca atoms in the Ca-Ca backbone view.
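The two gap types can be told apart mechanically from a pair of aligned strings. The sketch below is illustrative (the function name and the '-' gap character are assumptions); it reports alignment columns using the Template-relative definitions given above.

```python
def classify_gaps(target_aln, template_aln, gap="-"):
    """Return (deletions, insertions) as lists of 0-based alignment columns.

    Deletions: gap in Target, so a template residue must be removed.
    Insertions: gap in Template, so a target residue must be added.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    deletions, insertions = [], []
    for column, (target_char, template_char) in enumerate(zip(target_aln, template_aln)):
        if target_char == gap and template_char == gap:
            continue  # empty column, nothing to model
        if target_char == gap:
            deletions.append(column)
        elif template_char == gap:
            insertions.append(column)
    return deletions, insertions


# Columns 3-4 are deletions from Template (red); column 2 is an insertion (yellow).
deletions, insertions = classify_gaps("MKT--LV", "MK-AGLV")
```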
Actions in Alignment Panel:
Scroll with a horizontal bar left and right to browse the whole alignment when it doesn't fit. Scroll to the left to find identifiers of Target and Template sequences.
Change the font size of the alignment with Font size control to zoom in/out the alignment. Hint: to get the whole picture, zoom out to see the whole alignment even though individual residues are too small to read.
Click Left Button (Cursor: Pointing Finger) on any aligned residue pair to see what target chains it belongs to, target sequence number (starting from 1), template sequence number (as in the template PDB file) and 3-letter code for each residue type of the pair. Clicking also shows backbone and side-chain atoms of the template residue in the 3D viewer. Click elsewhere to remove the selection.
Drag (Cursor: Double-Ended Horizontal Arrow) the starting and ending positions of the target model for each aligned sequence. Press and hold the left mouse button on the left or right golden arrow, then move it to the desired position. The excluded residues will not be included in the final model.
To move a gap, move the cursor over the gap: either one-letter code or SS box (Cursor: Vertical Bar with a Double-ended Horizontal Arrow). Drag the gap to a new location.
To insert a gap into Target or Template, hold Left Shift + move the cursor over the desired location (Cursor: Insert Symbol) + Click Left Button. Repeat as many times as needed.
To delete a gap from Target or Template, hold Left Control + move the cursor over the gap (Cursor: X Symbol) + Click Left Button.
Change the color scheme for Alignment and 3D Viewer Panels with 'Colors' drop-down list.
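In terms of the underlying aligned strings, dragging a gap amounts to removing the gap character at one column and re-inserting it at another, which shifts the residues in between by one column while the partner sequence stays fixed. A minimal sketch, with a hypothetical function name and '-' assumed as the gap character:

```python
def move_gap(aligned, source, destination, gap="-"):
    """Move one gap character from column `source` to column `destination`.

    The aligned string keeps its length, so the partner sequence of the
    alignment does not need to change.
    """
    if aligned[source] != gap:
        raise ValueError(f"column {source} does not hold a gap")
    residues = aligned[:source] + aligned[source + 1:]
    return residues[:destination] + gap + residues[destination:]


# Move the gap two columns to the right, e.g. out of a helix and into a coil.
shifted = move_gap("MK-LVE", 2, 4)
```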
Actions in 3D Viewer Panel:
To change Structure View to:
Backbone atom mode: press the 'b' (backbone) key or Main Menu -> Draw -> Backbone Atoms.
Spacefill mode: press the 'f' (full) key or Main Menu -> Draw -> Spacefill.
Ca-Ca backbone mode: press the 'w' (wire) key or Main Menu -> Draw -> Ca-Ca backbone.
To rotate the structure, hold the left mouse button and move the mouse, or use 4 (left), 6 (right), 8 (up) and 2 (down) on the keypad. To reset to the original orientation, press 5.
To zoom in/out, hold the right mouse button and move the mouse up/down. Or use +, - keys on the keypad.
To shift the assembly, hold the middle mouse button and move the mouse.
Step 10: Copy Backbone from Template to Target and Preserve the Conservative Side Chains
(3) Hitting 'Copy Backbone' button on the alignment form collects all information from the alignment form and generates 3 files with 'ProjectName_basedOnABCD' filename root, where 'ProjectName' is the name of the project (the way the user named the original FASTA sequence file) and 'ABCD' is a 4-character code for the template PDB structure. There will be 3 files with extensions: '.backbone.pdb', '.coorseq' and '.allseq'.
'.backbone.pdb' is a PDB structure file for a model of the target assembly based on the template structure. The backbone of the aligned template residues is copied over to the target residues. The target residues of the same amino acid type as in Template (conserved) will have side-chain atoms copied over too. The target residue numbering is according to the original numbering of the target starting with 1 for the first residue in the FASTA sequence file. The following target residues will have no ATOM records in the PDB file:
any target residues specifically excluded by the user through the golden start and stop arrows,
any non-aligned target residues,
any target residues aligned to the template ones with missing coordinates,
any target residues treated as insertions.
'.coorseq' stores the amino acid sequence for residues with coordinates in the '.backbone.pdb' PDB file. The amino acid sequence is in the one-letter code. A lower-case letter designates a conserved residue type, i.e. the aligned amino acid pair of Target and Template share the same amino acid type. An upper-case letter corresponds to a non-conservative residue type. There is one line for each chain of the target assembly. The chains are in the same order as they are listed in the PDB file. This file, along with the '.backbone.pdb' PDB file, serves as input for our side-chain modeling software in the next step.
'.allseq' is the same as above but stores amino acid sequence for all target residues, not only the ones with the modeled backbone coordinates.
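The upper/lower-case convention of the two sequence files can be sketched for a single chain as follows. This is an illustration under stated assumptions, not BAM's file writer: the function names are hypothetical, user-excluded residues are folded into the per-column coordinate flag, and residues without coordinates are assumed to be written in upper case in '.allseq'.

```python
def encode_chain(target_aln, template_aln, has_coords, gap="-"):
    """Return (coorseq_line, allseq_line) for one aligned chain.

    has_coords holds one boolean per alignment column: True when the template
    residue in that column has coordinates and is kept in the model.
    """
    if not (len(target_aln) == len(template_aln) == len(has_coords)):
        raise ValueError("alignment strings and flags must have the same length")
    coorseq, allseq = [], []
    for target_char, template_char, kept in zip(target_aln, template_aln, has_coords):
        if target_char == gap:
            continue  # deletion from Template: no target residue in this column
        modeled = template_char != gap and kept
        conserved = modeled and target_char.upper() == template_char.upper()
        letter = target_char.lower() if conserved else target_char.upper()
        allseq.append(letter)
        if modeled:
            coorseq.append(letter)  # only residues with backbone coordinates
    return "".join(coorseq), "".join(allseq)


def output_names(project, pdb_code):
    """File names built from the 'ProjectName_basedOnABCD' root described above."""
    root = f"{project}_basedOn{pdb_code}"
    return [root + extension for extension in (".backbone.pdb", ".coorseq", ".allseq")]


coorseq_line, allseq_line = encode_chain(
    "MKT--LV", "MK-AGLI", [True, False, True, True, True, True, True]
)
```

In this example the second residue (K) is aligned to a template residue with missing coordinates and the third (T) is an insertion, so both appear only in the '.allseq' line.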
Press the 'Build Side Chains' button. BAM will run our side-chain prediction software, Scwrl4, with ProjectName_basedOnABCD.backbone.pdb and ProjectName_basedOnABCD.coorseq as input. It will only model the non-conservative side chains, encoded with an upper-case one-letter code in the '.coorseq' file. The conserved side chains keep their conformations but are adjusted slightly to match exactly a rotamer from our rotamer library: the one closest to the conformation the side chain has in the input '.backbone.pdb' PDB file.
Scwrl4 will output '.sidechain.pdb' and '.sidechain.log' files. The former is a target model of the biological assembly in which deletions and insertions have not been closed with loop modeling software. The latter is a log file of the side-chain predictions. This step finalizes the BAM modeling cycle. The user may want to study this model in software such as PyMOL or Chimera. In our experience this model may already answer many of the user's biological questions. For a more complete model, the gaps need to be closed with loops outside of the current BAM release.
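Outside the GUI, a comparable Scwrl4 run could be assembled roughly as shown below. This is a sketch that assumes the standalone Scwrl4 command-line options -i (input PDB), -s (sequence file) and -o (output PDB); the function name and the executable name are hypothetical, and how BAM itself invokes Scwrl4 or writes the log file is not documented here.

```python
def scwrl4_command(root, executable="Scwrl4"):
    """Build a Scwrl4 argument list from a 'ProjectName_basedOnABCD' file root."""
    return [
        executable,
        "-i", f"{root}.backbone.pdb",    # backbone plus conserved side chains
        "-s", f"{root}.coorseq",         # upper case = side chains to be modeled
        "-o", f"{root}.sidechain.pdb",   # model with all side chains built
    ]


command = scwrl4_command("hemo_basedOn1ABC")
```

The list can be passed to `subprocess.run`, with standard output redirected to a file if a log similar to '.sidechain.log' is wanted.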
Step 12: Closing Deletions and Insertions with Loop Modeling
If the user needs to model loops to close insertions and deletions, any third-party software (e.g. Rosetta or Modeller) can be used. As discussed above, it is desirable that deletions and insertions fall in the middle of coil SS regions. Refer to the '.allseq' file produced by BAM in Step 10 for the complete Target sequence(s). Please note that the '.coorseq' file does not include target residues inserted into Template, since no coordinates were available for them from the Template structure. After modeling loops with another program, it may be useful to re-run Scwrl4 to repack the side chains of the model.