MolIDE v1.7
Copyright ©2008, Adrian A. Canutescu, Qiang Wang and Roland L. Dunbrack Jr.
Fox Chase Cancer Center
Philadelphia, PA, USA

MolIDE is a program for comparative modeling of protein structures. It acts as a graphical user interface to the common tasks involved in predicting protein structures based on known homologous structures.

CREDITS

MolIDE uses several auxiliary programs. All are included with MolIDE except SCWRL, which must be licensed separately.


INSTALLATION

PREREQUISITES:

Linux: Red Hat 8 or a more recent version (Fedora), or an equivalent distribution, is required to install the Linux distribution of MolIDE.
Windows: Microsoft Windows XP/2000/98

Make sure the path where you will install MolIDE does not contain spaces. This limitation is imposed by some of the third-party programs that cannot cope with spaces in the file names and paths.
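
The no-spaces rule can be checked before installing. The following is a minimal Python sketch, not part of MolIDE; the function name has_whitespace is hypothetical.

```python
def has_whitespace(path: str) -> bool:
    """Return True if any character in the path is whitespace."""
    return any(ch.isspace() for ch in path)


# The default Windows directory named in this README passes the check.
print(has_whitespace(r"C:\FCCC\MolIDE"))           # False
print(has_whitespace(r"C:\Program Files\MolIDE"))  # True
```

The same check applies to the project directories and file names described under TYPICAL ORDER OF OPERATIONS.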

!!! WARNING !!!
If you already have the NCBI package installed on your machine, make backup copies of .ncbirc in your home directory (Linux) or $WINDIR\ncbi.ini (Windows). These files will be overwritten by the installer.

Windows Installation:
  1. Double click on the installer program to start the installation.
  2. Follow the instructions to continue with the installation. We strongly suggest keeping the default directory (C:\FCCC\MolIDE); this will save you a lot of configuration effort.
  3. If you choose a different installation directory, you need to edit the configuration files of two third-party programs: '$WINDIR\ncbi.ini' and '$Installation_Directory\bin_aux\loopy\bin\jackal.dir'. In both files, replace 'C:\FCCC\MolIDE' with your installation path. If you keep the default directory, skip this step.
  4. Double click 'molide.exe' to run MolIDE, or use the shortcut created on the Desktop.
MolIDE is installed and running. See below for configuring MolIDE settings and getting SCWRL3.


Linux Installation:
  1. In a terminal window, type "gzip -d molide_distro_lin.tar.gz"
  2. Then "tar -xvf molide_distro_lin.tar"
  3. MolIDE uses the wxWindows cross-platform library, so you have to have it installed. For convenience we have included in the Linux distribution the rpm files for version 2.4.2-1, located in the subdirectory "wx". To install the wxWindows library on Linux, log in as root, cd to "wx" directory and type:
    "rpm -U *.rpm"
  4. cd to the directory "molide_distro_lin" and type: "./setup"
  5. Type "./molide" to run MolIDE.
MolIDE is installed and running. See below for configuring MolIDE settings and getting SCWRL3.


!!! SETTINGS !!!
Start MolIDE (the last step of the Windows or Linux installation above) and set up the file names, paths, and parameters for the programs used in MolIDE. The settings are accessed through the Tools->Options menu. PSI-BLAST, FormatDB, PSIPRED, GZip, and Loopy are distributed with MolIDE, and the paths for these programs are already entered in the Options menu as defaults. Change the settings only if you chose an installation directory other than 'C:\FCCC\MolIDE'. After modifying the parameters for a program, click the "Save" button in the corresponding settings dialog window. Please check each category in the Tools->Options menu before running MolIDE.

SCWRL3
Make sure you download SCWRL3 for building side chains. SCWRL3 installation is simple, and it can be installed anywhere on your computer. After installing SCWRL3, set the path to it in MolIDE under Tools->Options->Scwrl.

 

!!! DATABASES !!!

MolIDE uses two databases: the non-redundant protein sequence database (nr) and PDBAA. Please remember to update both on a regular basis.

Make sure you have set up the directory where the databases are located on your local machine by checking Tools->Options->Psiblast in the MolIDE menus. You also need to set the location from which the databases are downloaded, under the Tools->Options->Servers menu item.
For your convenience, the installer includes a recent version of the PDBAA database. You have to download the nr database after the installation, either manually or through Tools->Update DB.



TYPICAL ORDER OF OPERATIONS


Make sure the file names and paths in which you store the project files for a certain modeling project do not include spaces. This limitation is imposed by some of the third-party programs that cannot cope with spaces in the file names and paths.

1. OPEN SEQUENCE

A sequence file should be FASTA-formatted (by including ">Name" as the first line in the file with the sequence following starting on the next line) and have a ".seq" file extension. Open a sequence file via File->Open->Sequence menu.
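
As an illustration of the layout described above, here is a minimal Python sketch that produces and reads single-record FASTA text. The function names and the 60-character line width are assumptions made for this example; MolIDE itself may accept other line widths.

```python
def format_seq_file(name: str, sequence: str, width: int = 60) -> str:
    """Return FASTA text: a '>Name' header, then the sequence wrapped at `width`."""
    # Drop any whitespace inside the sequence and normalize to upper case.
    residues = "".join(sequence.split()).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def parse_seq_text(text: str) -> tuple[str, str]:
    """Return (name, sequence) from single-record FASTA text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line must be '>Name'")
    return lines[0][1:], "".join(lines[1:])
```

Save the text returned by format_seq_file under a name ending in ".seq", in a path without spaces, before opening it in MolIDE.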


2. RUN PSI-BLAST

While a sequence file is open, run a multiple-round PSI-BLAST using Tools->PSIBLAST menu.

PSI-BLAST is run first against the non-redundant protein sequence database (or a database of the user's choosing, which can be set under Tools->Options->Dbases) with a customized version of PSI-BLAST that comes with MolIDE. This version of PSI-BLAST outputs a profile checkpoint file after every round, each with a unique name. Once the non-redundant sequence database search is completed, the PDB is searched with each of the profiles and a separate PDB alignment file is created for each. You can change the parameters used during the PSI-BLAST runs using Options->PSIBLAST.


3. RUN PSIPRED

PSIPRED uses the output PSI-BLAST sequence profiles from the nr database search. PSIPRED should therefore be run only after the PSI-BLAST run in step 2 is completed.


4. OPEN SECONDARY STRUCTURE PREDICTION

Go to File->Open->Sec Struct Pred. After choosing a ".psipred" file, you are given the option to display all of the predictions based on the matrices generated by PSI-BLAST after each round. These are displayed in a single window with each prediction on a separate line.

A predicted sheet is colored green and a predicted helix is colored red. An unstructured region (loop) is depicted in gray.
The intensity of the color is proportional to the prediction confidence; the darker the color, the higher the prediction confidence.

This view allows you to see if the secondary structure prediction changes as more remotely related sequences are added to the profiles. For proteins with few close relatives, the predictions may be more accurate in later rounds as distantly related sequences provide information on likely secondary structure patterns. However, for proteins with a good number of close relatives, the addition of distantly related sequences with potentially large structural changes (additional secondary structure or missing secondary structure) may degrade the secondary structure prediction.

5. OPEN LIST OF PDB HITS

Open the PSI-BLAST file containing the alignments of your query sequence with sequences of proteins from PDB. These results are displayed in condensed form as a table. The table can be sorted by each column by clicking on the column header. Clicking again on the same column header will reverse the sorting order for that column.

Double clicking an item in the first column extracts that alignment, saves it in the project directory (the directory where the sequence file resides), and opens it automatically in MolIDE (see step 6 below for the commands available when viewing an alignment with a template).


6. ALIGNMENT EDITING/TEMPLATE VIEW

This view integrates the sequence alignment, secondary structure prediction, secondary structure of the template, and PDB structure of the template.

If MolIDE cannot find the S2C or PDB files (in the directories specified in Options->Databases) associated with the currently used template, it will automatically try to download from RCSB the template XML file and generate the S2C and PDB files.

a) TEMPLATE VIEW

If MolIDE was able to locate or extract the PDB template file, the template structure will be displayed in the upper portion of the alignment window. The default view of the structures is as Backbone. The whole template protein is displayed in gray, while the part of the structure used in the alignment is displayed in green.

INSERTIONS in the target (target longer than template) are marked by 2 adjacent yellow spheres on the template structure CA atoms surrounding the insertion point. DELETIONS from the template (target shorter than template) are represented by red spheres on the CA atom of that particular residue in the template structure.

Manipulating the structure view:

Left_Button_Drag            rotates the structure
Right_Button_Drag_Up/Down   zoom out/in
Middle_Button_Drag          move in XY plane
Double_Left_Click           on an atom, identifies the template residue and displays it in the 3rd column of the status bar

In the View Menu, the options are:

b) ALIGNMENT EDITING

Generally it is a good idea to edit the target-template sequence alignment manually. Deletions from the structure are least disruptive if the N- and C-terminal endpoints of the deletion are near each other in space. Insertions are best placed in the middle of loop regions, not immediately next to regular secondary structure. The correspondence between the predicted secondary structure of the target and the experimental secondary structure of the template can be used to guide the alignment. PSI-BLAST often fails to align some regions correctly, so if other information is available, on conserved residues for instance, the alignment can be edited accordingly.

Moving the mouse over the alignment will display in the status bar the sequence numbers for query and template sequences, as well as the corresponding PDB coordinate residue number in the template PDB.
The color coding scheme for the secondary structure of the template is the same one used for the secondary structure prediction (helix=red; sheet=green). The secondary structure comes from the S2C file (either the downloaded version or that created by XML2PDB).

The third column of the status bar displays the number of identities in the alignment.

Both deletions and insertions are updated at the same time as the sequence alignment. To move a gap over several residues, delete it first, then move to the place of insertion and insert the appropriate number of gap characters as follows:

Shift + Left_Click   insert gap
Ctrl + Left_Click    delete gap
Left_Click           on a residue in the alignment, displays the corresponding residue in the template structure in spacefill mode
Middle_Click         on a residue in the alignment, adds the residue under the cursor to the list of residues displayed as spacefill
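
The delete-then-insert procedure for moving a gap can be pictured on plain alignment strings. This is a minimal Python sketch, not MolIDE's internal code; the function names and the use of "-" as the gap character are assumptions made for the example.

```python
GAP = "-"


def insert_gap(row: str, pos: int) -> str:
    """Insert one gap character before index `pos` of an alignment row."""
    return row[:pos] + GAP + row[pos:]


def delete_gap(row: str, pos: int) -> str:
    """Remove the gap character at index `pos`; refuse to delete a residue."""
    if row[pos] != GAP:
        raise ValueError("position %d is not a gap" % pos)
    return row[:pos] + row[pos + 1:]


def count_identities(query: str, template: str) -> int:
    """Count aligned columns in which both rows hold the same residue."""
    return sum(1 for q, t in zip(query, template) if q == t and q != GAP)


# Move the gap in "AC-DEF" two residues to the right: delete it, then re-insert it.
moved = insert_gap(delete_gap("AC-DEF", 2), 4)
print(moved)  # AC-DEF becomes ACDE-F
```

Recomputing count_identities after each edit gives a quick way to see whether moving a gap gained or lost identical positions.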



7. MODEL BUILDING

After the optional fine tuning of the sequence alignment is done, the typical steps that follow are:

a) COPY THE CONSERVED BACKBONE AND SIDE CHAINS

Copy the backbone and the conserved side chains using the Tools->Copy Backbone menu.

b) BUILD/SUBSTITUTE SIDE CHAINS

Tools->Build Side Chains (SCWRL). The conserved side chains are left in the original conformation from the template crystal structure. This option can be changed with Options->Scwrl.

c) BUILD LOOPS

Loop building is done by first selecting residues for the left and right anchors. These are residues that will be kept fixed while the intervening sequence is modeled using the Loopy program. It is usually a good idea to allow at least 2-3 residues on either side of the insertion or deletion to move during the loop-building process. One option is to make the left and right anchors the last and first residues of the flanking secondary structures respectively. However, if part of a long loop is well conserved, it may be better to select a smaller region that contains less conserved segments. Loopy will sometimes be unable to build a loop if the loop length is too short and the distance to be spanned by the predicted loop is too large. In this case the anchors should be moved apart and Loopy should be run again.
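
A rough geometric check can flag anchor choices that leave too few residues for the gap. The Python sketch below is a rule of thumb, not Loopy's own criterion: it assumes consecutive CA atoms are about 3.8 angstroms apart, so a loop of n residues between two anchors cannot bridge much more than 3.8 * (n + 1) angstroms even when fully extended. The function names are hypothetical.

```python
import math

CA_CA_DISTANCE = 3.8  # approximate spacing, in angstroms, between consecutive CA atoms


def max_span(n_loop_residues: int) -> float:
    """Upper bound on the anchor-to-anchor CA distance a loop could bridge."""
    # n residues between the anchors means n + 1 CA-to-CA steps.
    return CA_CA_DISTANCE * (n_loop_residues + 1)


def can_span(anchor_left, anchor_right, n_loop_residues: int) -> bool:
    """Return True if a fully extended loop would be long enough to reach."""
    return math.dist(anchor_left, anchor_right) <= max_span(n_loop_residues)
```

Passing the check does not guarantee that a loop can be built; failing it suggests moving the anchors apart so that more residues are rebuilt.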

Also note that if residues are missing from the structure due to poor electron density, these are marked with blue squares below the template sequence. These regions can also be built in with Loopy.

Right_Click on a Query residue in the sequence alignment will display a pop-up menu:

After choosing the loop's anchor residues, proceed with "Build Loop".

Proceed repeatedly with loop building until all the insertions/deletions/missing residues are modeled.



8. PDB VISUALIZATION

Open a PDB file via File->Open->PDB. The available controls are:

Left_Button_Drag            rotates the structure
Right_Button_Drag_Up/Down   zoom out/in
Middle_Button_Drag          move in XY plane
Double_Left_Click           identifies the template residue and displays it in the 3rd column of the status bar

In the View Menu, the options are:

Use Alt + F4 to exit full screen mode.

 

9. DATABASE UPDATING

With the ever-increasing number of protein sequences, the nr database is updated frequently, while the PDBAA database is updated weekly. To build more accurate homology models, you are encouraged to update your local copies of the databases regularly. You can do this manually, but it is more convenient to do it automatically through the Tools->Update DB menu.


 


FILE TYPES

1. SEQUENCE FILE (*.seq)

It is a plain-text FASTA formatted file:

>ProteinName
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Example:
>ptp1b
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH
QEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEKGSLK
...
WKPFLVNMCVATVLTAGAYLCYRFLFNSNT





2. SECONDARY STRUCTURE PREDICTION FILE (*.psipred)

This file is generated by PSIPRED and is plain-text formatted.

Name Convention: ProteinName_x.psipred, where x is the round number of PSI-BLAST run on which the secondary structure prediction is based.

Example:
# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)

Conf: 974588987444787068899987507887502212544467676666672532257870
Pred: CCHHHHHHHHCCCCCHHHHHHHHHHCCCCCCCHHHCCCCCCCCCCCCCCCCCCCEEEEEE
  AA: MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH
              10        20        30        40        50        60
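
For readers who want to use the prediction outside MolIDE, the following minimal Python sketch reads the three labeled rows of the format shown above. The function name is hypothetical, and the sketch assumes that every data row starts with one of the labels Conf, Pred, or AA followed by a colon, as in the example.

```python
def parse_psipred_horiz(text: str) -> tuple[str, str, str]:
    """Return (sequence, prediction, confidence) joined across all blocks."""
    parts = {"AA": [], "Pred": [], "Conf": []}
    for line in text.splitlines():
        # Header, ruler, and blank lines carry no recognized label and are skipped.
        label, sep, value = line.strip().partition(":")
        if sep and label in parts:
            parts[label].append(value.strip())
    seq, pred, conf = ("".join(parts[key]) for key in ("AA", "Pred", "Conf"))
    if not (len(seq) == len(pred) == len(conf)):
        raise ValueError("AA, Pred and Conf rows differ in length")
    return seq, pred, conf
```

In the Pred row, H marks helix, E marks strand, and C marks coil; each Conf digit (0 to 9) is the confidence for the residue in the same column.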





3. PDB HITS ALIGNMENT FILE (*.pdbout)

It is generated by PSI-BLAST and contains the sequence alignments with PDB templates after a certain round.

Name Convention: ProteinName_x.pdbout, where x is the round number of PSI-BLAST run.

Example:
BLASTP 2.2.9 [May-01-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= ptp1b
         (435 letters)

Database: f:\acworks\MolIDE_distro_win\db\pdbaa\pdbaa 
           55,990 sequences; 13,839,724 total letters



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

1A5Y0 330 XRAY 2.50 0.205 0.281 PROTEIN TYROSINE PHOSPHATASE 1B ...   621   e-178
1PYNA 321 XRAY 2.20 0.212 0.224 Protein-tyrosine phosphatase, no...   609   e-174

...

1LARA 575 XRAY 2.00 0.222 0.274 LAR  [HOMO SAPIENS]   438   e-123
1LARB 575 XRAY 2.00 0.222 0.274 LAR  [HOMO SAPIENS]   438   e-123
1GWZ0 299 XRAY 2.50 0.209 0.293 SHP-1  [HOMO SAP...   369   e-102

...

>1LARA 575 XRAY 2.00 0.222 0.274 LAR  [HOMO SAPIENS]
          Length = 575

 Score =  438 bits (1128), Expect = e-123
 Identities = 111/310 (35%), Positives = 160/310 (51%), Gaps = 27/310 (8%)

Query: 1   MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH 60
           ++  +E+E ID                  F    + L  NK +NRY +V  +DHSR+ L 
Sbjct: 18  LKFSQEYESIDP--------------GQQFTWENSNLEVNKPKNRYANVIAYDHSRVILT 63

Query: 61  QED----NDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEK 116
             D    +DYINA+ I     Q +YI TQGPLP T G FW MVWEQ++  VVM+ R+ EK
Sbjct: 64  SIDGVPGSDYINANYIDGYRKQNAYIATQGPLPETMGDFWRMVWEQRTATVVMMTRLEEK 123

...
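
The summary line of each alignment can be read with a short regular expression. This is a minimal Python sketch under the assumption that the line follows the "Name = count/total" pattern shown in the example; the function name is hypothetical.

```python
import re

# Matches e.g. "Identities = 111/310"; the percentage in parentheses is ignored.
STATS_RE = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")


def parse_alignment_stats(line: str) -> dict[str, tuple[int, int]]:
    """Map each statistic name to (count, aligned_length)."""
    return {name: (int(n), int(total)) for name, n, total in STATS_RE.findall(line)}
```

Because the percentage is ignored, the same function also reads the corresponding line in an *.alnonet file.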





4. ALIGNMENT WITH ONE TEMPLATE FILE (*.alnonet)

Name Convention: ProteinName_x_TemplatePDBChain_y.alnonet where x is the round number of PSI-BLAST run and y is the fragment number of the query sequence, that is aligned with that particular template PDB. y starts from 0.

Example:
#PSIPRED ptp1b_1.psipred
>1LARA 575 XRAY 2.00 0.222 0.274 LAR  [HOMO SAPIENS]
          Length = 575

 Score =  438 bits (1128), Expect = e-123
 Identities = 111/310 (35), Positives = 160/310 (51), Gaps = 27/310 (8)

Query: 1   MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH 60
           ++  +E+E ID                  F    + L  NK +NRY +V  +DHSR+ L 
Sbjct: 18  LKFSQEYESIDP--------------GQQFTWENSNLEVNKPKNRYANVIAYDHSRVILT 63

Query: 61  QED----NDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEK 116
             D    +DYINA+ I     Q +YI TQGPLP T G FW MVWEQ++  VVM+ R+ EK
Sbjct: 64  SIDGVPGSDYINANYIDGYRKQNAYIATQGPLPETMGDFWRMVWEQRTATVVMMTRLEEK 123

...
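
The ProteinName_x_TemplatePDBChain_y naming convention can be taken apart with a regular expression. The Python sketch below is illustrative only: the function name is hypothetical, the pattern assumes the template identifier contains no underscore, and the file name in the usage line is composed from the convention rather than taken from a real project.

```python
import re

NAME_RE = re.compile(
    r"^(?P<protein>.+)_(?P<round>\d+)_(?P<template>[^_]+)_(?P<fragment>\d+)"
    r"\.(?P<ext>\w+)$"
)


def parse_project_file_name(file_name: str) -> dict:
    """Split a project file name into its convention fields."""
    match = NAME_RE.match(file_name)
    if match is None:
        raise ValueError("name does not follow the convention: " + file_name)
    info = match.groupdict()
    info["round"] = int(info["round"])        # PSI-BLAST round number (x)
    info["fragment"] = int(info["fragment"])  # query fragment number (y), from 0
    return info


print(parse_project_file_name("ptp1b_1_1LARA_0.alnonet"))
```

The same pattern applies to the other file types in this section that share the convention, since only the extension differs.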





5. CONSERVED COORDINATES FILE (*.model)

It is a PDB formatted file that contains the conserved coordinates of the template protein.

Name Convention: ProteinName_x_TemplatePDBChain_y.model where x is the round number of PSI-BLAST run and y is the fragment number of the query sequence, that is aligned with that particular template PDB.

Example:
ATOM      1  N   MET     1      24.015  59.084 111.076  1.00  0.00
ATOM      2  CA  MET     1      23.043  59.548 112.054  1.00  0.00
ATOM      3  C   MET     1      21.628  59.002 111.900  1.00  0.00
ATOM      4  O   MET     1      21.083  58.412 112.836  1.00  0.00
ATOM      5  N   GLU     2      21.028  59.211 110.733  1.00  0.00
ATOM      6  CA  GLU     2      19.661  58.756 110.498  1.00  0.00
ATOM      7  C   GLU     2      19.453  57.241 110.563  1.00  0.00
ATOM      8  O   GLU     2      18.344  56.782 110.824  1.00  0.00
ATOM      9  N   MET     3      20.504  56.466 110.316  1.00  0.00
ATOM     10  CA  MET     3      20.385  55.013 110.378  1.00  0.00
ATOM     11  C   MET     3      20.157  54.675 111.844  1.00  0.00
ATOM     12  O   MET     3      19.237  53.934 112.192  1.00  0.00
...
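
ATOM records such as those above use the fixed columns of the PDB format, so they are best read by column position rather than by splitting on spaces. The following minimal Python sketch extracts the CA atoms; the function names are hypothetical, and the sketch assumes the column spacing of the file is intact.

```python
def parse_atom_line(line: str) -> dict:
    """Slice one ATOM record using the fixed columns of the PDB format."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "res_seq": int(line[22:26]),      # columns 23-26
        "xyz": (
            float(line[30:38]),           # columns 31-38
            float(line[38:46]),           # columns 39-46
            float(line[46:54]),           # columns 47-54
        ),
    }


def ca_trace(pdb_text: str) -> list:
    """Return the parsed CA atoms of all ATOM records, in file order."""
    atoms = [
        parse_atom_line(ln) for ln in pdb_text.splitlines() if ln.startswith("ATOM")
    ]
    return [atom for atom in atoms if atom["name"] == "CA"]
```

A CA trace like this is enough to measure distances between candidate loop anchors before running loop building.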





6. SCWRL3 SEQUENCE FILE (*.s3seq)

Name Convention: ProteinName_x_TemplatePDBChain_y.s3seq where x is the round number of PSI-BLAST run and y is the fragment number of the query sequence, that is aligned with that particular template PDB.

Example:

QvqlvQsgTEVKKpgAsVKVscKasgYtFTSFDLNwvrqapgQglewMGWMNpNSgKtGya

QKFQgrVtMTRNtsIRtayMELSGlrSedtavyFcarAAIYHYyGmdVwgqgtTvNvssas

...



 


7. SCWRL3 FULL SEQUENCE FILE (*.s3allseq)

Name Convention: ProteinName_x_TemplatePDBChain_y.s3seqall where x is the round number of PSI-BLAST run and y is the fragment number of the query sequence, that is aligned with that particular template PDB. Unlike the *.s3seq file, this file contains the whole sequence that is aligned to the template, including insertions where the query is longer than the hit and residues on the template with missing coordinate information. It can be used on the command line to add a ligand.

Example:

QvqlvQsgTEVKKpgAsVKVscKasgYtFTSFDLNwvrqapgQglewMGWMNpNSgKtGya

QKFQgrVtMTRNtsIRtayMELSGlrSedtavyFcarNADNVEMAAIYHYyGmdVwgqgtT

...






8. MODEL COORDINATES FILE (*.pdb)



It is a PDB formatted file.

Name Convention: ProteinName_x_TemplatePDBChain_y.pdb where x is the round number of PSI-BLAST run and y is the fragment number of the query sequence, that is aligned with that particular template PDB.

This file is first generated after the side chains are built with SCWRL3. It is subsequently overwritten by the Loopy output after each loop is built. Once all loops are built, this file contains the final homology model.

