!!! DATABASES !!!
MolIDE uses two databases, nr and PDBAA. Please remember to update them on a regular basis.
Make sure you have set up the directory where the databases are located on
your local machine by checking Tools->Options->Psiblast in the MolIDE
menus. You also need to set up where to download the databases from by
clicking the Tools->Options->Servers menu item.
For your convenience, the installer includes a recent version of the PDBAA
database. You have to download the nr database after the installation,
however, either manually or through Tools->Update DB.
TYPICAL ORDER OF OPERATIONS
Make sure the file names and paths in which you store the project files for a certain modeling project do not include spaces. This limitation is imposed by some of the third-party programs that cannot cope with spaces in the file names and paths.
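A project path can be checked for spaces before any files are saved there. The following is a minimal sketch, not part of MolIDE; the function names and the example paths are hypothetical.

```python
def path_has_whitespace(path):
    """Return True if any character of the path is whitespace."""
    return any(ch.isspace() for ch in str(path))


def check_project_path(path):
    """Return the path unchanged, or raise ValueError if it contains spaces."""
    if path_has_whitespace(path):
        raise ValueError(f"path contains spaces: {path!r}")
    return path


if __name__ == "__main__":
    # Hypothetical example paths, used only for illustration.
    for candidate in ("C:/projects/ptp1b/ptp1b.seq", "C:/My Projects/ptp1b.seq"):
        status = "has spaces" if path_has_whitespace(candidate) else "ok"
        print(candidate, "->", status)
```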
1. OPEN SEQUENCE
A sequence file should be FASTA-formatted (by including ">Name" as the first line in the file with the sequence following starting on the next line) and have a ".seq" file extension. Open a sequence file via File->Open->Sequence menu.
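To confirm that a file follows this layout before opening it, a small reader such as the one below can be used. It is an illustrative sketch only (the function name is hypothetical, and MolIDE's own reader may behave differently); it reads the first record and joins the sequence lines.

```python
def parse_fasta(text):
    """Return (name, sequence) for the first FASTA record in text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must start with '>'")
    name = lines[0][1:].strip()
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # a second record starts here; stop reading
            break
        seq_lines.append(ln)
    sequence = "".join(seq_lines)
    if not sequence:
        raise ValueError("no sequence lines found after the name line")
    return name, sequence


if __name__ == "__main__":
    name, seq = parse_fasta(">ptp1b\nMEMEKEFEQIDKSGSW\nAAIYQDIRHEASDF\n")
    print(name, len(seq))
```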
2. RUN PSI-BLAST
While a sequence file is open, run a multiple-round PSI-BLAST using
Tools->PSIBLAST menu.
PSI-BLAST is run first against the
non-redundant protein sequence database (or a database of the user's
choosing that can be set under Tools->Options->Dbases) with a
customized version of PSI-BLAST that comes with MolIDE. This version
of PSI-BLAST outputs a profile checkpoint file after every round, each with
a unique name. Once the non-redundant sequence database search is
completed, the PDB is searched with each of the profiles and a
separate PDB alignment file is created. You can change the parameters
used during the PSI-BLAST runs using Options->PSIBLAST.
3. RUN PSIPRED
PSIPRED uses the output PSI-BLAST sequence profiles from the nr database search.
PSIPRED should therefore be run only after the PSI-BLAST run in step 2 is completed.
4. OPEN SECONDARY STRUCTURE PREDICTION
Go to File->Open->Sec Struct Pred. After choosing a ".psipred" file,
you are given the option to display all of the predictions based on
the matrices generated by PSI-BLAST after each round. These are
displayed in a single window with each prediction on a separate
line.
A predicted sheet is colored green and a predicted helix is colored red. An unstructured region (loop) is depicted in gray.
The intensity of the color is proportional to the prediction confidence: the darker the color, the higher the confidence.
This view allows you to see if the secondary structure
prediction changes as more remotely related sequences are added to the
profiles. For proteins with few close relatives, the predictions may
be more accurate in later rounds as distantly related sequences
provide information on likely secondary structure patterns. However,
for proteins with a good number of close relatives, the addition of
distantly related sequences with potentially large structural changes
(additional secondary structure or missing secondary structure) may
degrade the secondary structure prediction.
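The color scheme described above (green sheet, red helix, gray loop, darker for higher confidence) can be sketched as a mapping from a PSIPRED state and confidence digit to an RGB value. The base RGB values and the linear blend toward white are assumptions made for illustration; they are not taken from MolIDE.

```python
# Assumed base colours (full confidence); MolIDE's actual values are not documented here.
BASE_COLOURS = {
    "H": (255, 0, 0),      # helix: red
    "E": (0, 160, 0),      # sheet (strand): green
    "C": (128, 128, 128),  # loop / unstructured: gray
}


def shade(pred, conf):
    """Return an RGB tuple that gets darker (closer to the base colour)
    as the confidence digit rises from 0 to 9."""
    if pred not in BASE_COLOURS:
        raise ValueError(f"unknown state: {pred!r}")
    if not 0 <= conf <= 9:
        raise ValueError("confidence must be a digit 0-9")
    # Integer blend: conf 9 gives the base colour, conf 0 is nearly white.
    return tuple(255 - (255 - c) * (conf + 1) // 10 for c in BASE_COLOURS[pred])


def colour_track(pred_line, conf_line):
    """Colour every residue of one prediction line."""
    if len(pred_line) != len(conf_line):
        raise ValueError("Pred and Conf strings differ in length")
    return [shade(p, int(c)) for p, c in zip(pred_line, conf_line)]


if __name__ == "__main__":
    print(colour_track("CCHHHE", "974588"))
```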
5. OPEN LIST OF PDB HITS
Open the PSI-BLAST file containing the alignments of your
query sequence with sequences of proteins from PDB. These results are
displayed in condensed form as a table. The table can be sorted by
each column by clicking on the column header. Clicking again on the
same column header will reverse the sorting order for that column.
By double clicking on a certain item in the first column, that
specific alignment will be extracted and saved in the project
directory (the directory where the sequence file resides) and
automatically opened by MolIDE (see step 6 below for the commands
available for visualizing an alignment with a given template).
6. ALIGNMENT EDITING/TEMPLATE VIEW
This view integrates the sequence alignment, secondary structure prediction, secondary structure of the template, and PDB structure of the template.
If MolIDE cannot find the S2C or PDB files (in the directories
specified in Options->Databases) associated with the currently used
template, it will automatically try to download the template XML file
from RCSB and generate the S2C and PDB files from it.
a) TEMPLATE VIEW
If MolIDE was able to locate or extract the PDB template file, the
template structure will be displayed in the upper portion of the
alignment window. By default, the structure is displayed as
Backbone. The whole template protein is displayed in gray, while the
part of the structure used in the alignment is displayed in
green.
INSERTIONS in the target (target longer than template) are marked by 2
adjacent yellow spheres on the template structure CA atoms surrounding the
insertion point. DELETIONS from the template (target shorter than
template) are represented by red spheres on the CA atom of that
particular residue in the template structure.
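The two kinds of markers can be derived from the gap characters in an aligned query/template pair. The sketch below assumes the BLAST convention of '-' for gaps and numbers template residues by their position in the hit sequence (which may differ from PDB residue numbering); the function name is hypothetical and this is not MolIDE's implementation.

```python
def classify_gaps(query, sbjct, sbjct_start):
    """Return (insertions, deletions) in template sequence numbering.

    insertions: (left, right) template positions flanking each run of gaps
                in the template row (target longer than template).
    deletions:  template positions aligned to a gap in the query row
                (target shorter than template).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    insertions, deletions = [], []
    pos = sbjct_start - 1  # number of the last template residue seen so far
    in_gap = False
    for q, s in zip(query, sbjct):
        if s == "-":
            if not in_gap:  # record each gap run once, by its flanking residues
                insertions.append((pos, pos + 1))
                in_gap = True
            continue
        in_gap = False
        pos += 1
        if q == "-":
            deletions.append(pos)
    return insertions, deletions


if __name__ == "__main__":
    # Fragment of the ptp1b / 1LARA alignment shown under FILE TYPES.
    print(classify_gaps("QED----NDYINAS", "SIDGVPGSDYINAN", 64))
```

For the fragment in the usage example, the four template positions aligned to the query gap are reported as deletions, matching the red-sphere case described above.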
Manipulating the structure view:
| Left_Button_Drag | rotates the structure |
| Right_Button_Drag_Up/Down | zooms out/in |
| Middle_Button_Drag | moves the structure in the XY plane |
| Double_Left_Click | on an atom identifies the template residue and displays it in the 3rd column of the status bar |
| Shift + Left_Click | insert gap |
| Ctrl + Left_Click | delete gap |
| Left_Click | on a residue in the alignment displays the corresponding residue in the template structure in spacefill mode |
| Middle_Click | on a residue in the alignment adds the residue under the cursor to the list of residues displayed as spacefill |
9. DATABASE UPDATING
With the ever-increasing number of protein sequences, the nr database is
updated frequently, while the PDBAA database is updated weekly. To build
more accurate homology models, you are encouraged to update your local copies
of the databases regularly. You can do this manually, but it is more convenient
to do it automatically through the Tools->Update DB menu.
FILE TYPES
1. SEQUENCE FILE (*.seq)
It is a plain-text FASTA-formatted file:
>ProteinName
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Example:
>ptp1b
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH
QEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEKGSLK
...
WKPFLVNMCVATVLTAGAYLCYRFLFNSNT
# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)
Conf: 974588987444787068899987507887502212544467676666672532257870
Pred: CCHHHHHHHHCCCCCHHHHHHHHHHCCCCCCCHHHCCCCCCCCCCCCCCCCCCCEEEEEE
  AA: MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH
              10        20        30        40        50        60
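The Conf, Pred and AA lines of a prediction like the one above can be combined into one record per residue. This is a minimal sketch that relies only on the three line labels visible in the example; the function name is hypothetical, and other details of the PSIPRED format are not assumed.

```python
def parse_horiz(text):
    """Return a list of (residue, state, confidence) tuples from
    PSIPRED horizontal-format text, joining all blocks in order."""
    conf, pred, aa = [], [], []
    for line in text.splitlines():
        line = line.strip()
        for tag, store in (("Conf:", conf), ("Pred:", pred), ("AA:", aa)):
            if line.startswith(tag):
                store.append(line[len(tag):].strip())
    conf_s, pred_s, aa_s = "".join(conf), "".join(pred), "".join(aa)
    if not (len(conf_s) == len(pred_s) == len(aa_s)):
        raise ValueError("Conf, Pred and AA lines differ in length")
    return [(a, p, int(c)) for a, p, c in zip(aa_s, pred_s, conf_s)]


if __name__ == "__main__":
    sample = """# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)
Conf: 9745889874
Pred: CCHHHHHHHH
  AA: MEMEKEFEQI
              10
"""
    for residue, state, confidence in parse_horiz(sample):
        print(residue, state, confidence)
```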
BLASTP 2.2.9 [May-01-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= ptp1b
(435 letters)
Database: f:\acworks\MolIDE_distro_win\db\pdbaa\pdbaa
55,990 sequences; 13,839,724 total letters
Score E
Sequences producing significant alignments: (bits) Value
1A5Y0 330 XRAY 2.50 0.205 0.281 PROTEIN TYROSINE PHOSPHATASE 1B ... 621 e-178
1PYNA 321 XRAY 2.20 0.212 0.224 Protein-tyrosine phosphatase, no... 609 e-174
...
1LARA 575 XRAY 2.00 0.222 0.274 LAR [HOMO SAPIENS] 438 e-123
1LARB 575 XRAY 2.00 0.222 0.274 LAR [HOMO SAPIENS] 438 e-123
1GWZ0 299 XRAY 2.50 0.209 0.293 SHP-1 [HOMO SAP... 369 e-102
...
>1LARA 575 XRAY 2.00 0.222 0.274 LAR [HOMO SAPIENS]
Length = 575
Score = 438 bits (1128), Expect = e-123
Identities = 111/310 (35%), Positives = 160/310 (51%), Gaps = 27/310 (8%)
Query: 1 MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH 60
++ +E+E ID F + L NK +NRY +V +DHSR+ L
Sbjct: 18 LKFSQEYESIDP--------------GQQFTWENSNLEVNKPKNRYANVIAYDHSRVILT 63
Query: 61 QED----NDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEK 116
D +DYINA+ I Q +YI TQGPLP T G FW MVWEQ++ VVM+ R+ EK
Sbjct: 64 SIDGVPGSDYINANYIDGYRKQNAYIATQGPLPETMGDFWRMVWEQRTATVVMMTRLEEK 123
...
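The per-hit statistics in output like the example above can be pulled out with a couple of small helpers. The sketch below handles only the "Identities/Positives/Gaps" line and the abbreviated E-value form (for example "e-123", which BLAST prints in place of "1e-123"); the function names are hypothetical and this is not how MolIDE itself reads the file.

```python
import re

# Matches e.g. "Identities = 111/310" and captures the label and both counts.
STAT_RE = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")


def parse_stats(line):
    """Return {'identities': (n, total), 'positives': ..., 'gaps': ...}."""
    return {name.lower(): (int(n), int(total))
            for name, n, total in STAT_RE.findall(line)}


def parse_evalue(token):
    """Convert a BLAST E-value token to float, accepting the 'e-123' form."""
    token = token.strip()
    if token.startswith("e"):
        token = "1" + token
    return float(token)


if __name__ == "__main__":
    stats = parse_stats(
        "Identities = 111/310 (35%), Positives = 160/310 (51%), Gaps = 27/310 (8%)"
    )
    n, total = stats["identities"]
    print(f"identity: {100 * n / total:.1f}%  E-value: {parse_evalue('e-123')}")
```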
#PSIPRED ptp1b_1.psipred
>1LARA 575 XRAY 2.00 0.222 0.274 LAR[HOMO SAPIENS]
Length = 575
Score = 438 bits (1128), Expect = e-123
Identities = 111/310 (35%), Positives = 160/310 (51%), Gaps = 27/310 (8%)
Query: 1 MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIKLH 60
++ +E+E ID F + L NK +NRY +V +DHSR+ L
Sbjct: 18 LKFSQEYESIDP--------------GQQFTWENSNLEVNKPKNRYANVIAYDHSRVILT 63
Query: 61 QED----NDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRVMEK 116
D +DYINA+ I Q +YI TQGPLP T G FW MVWEQ++ VVM+ R+ EK
Sbjct: 64 SIDGVPGSDYINANYIDGYRKQNAYIATQGPLPETMGDFWRMVWEQRTATVVMMTRLEEK 123
...
ATOM      1  N   MET     1      24.015  59.084 111.076  1.00  0.00
ATOM      2  CA  MET     1      23.043  59.548 112.054  1.00  0.00
ATOM      3  C   MET     1      21.628  59.002 111.900  1.00  0.00
ATOM      4  O   MET     1      21.083  58.412 112.836  1.00  0.00
ATOM      5  N   GLU     2      21.028  59.211 110.733  1.00  0.00
ATOM      6  CA  GLU     2      19.661  58.756 110.498  1.00  0.00
ATOM      7  C   GLU     2      19.453  57.241 110.563  1.00  0.00
ATOM      8  O   GLU     2      18.344  56.782 110.824  1.00  0.00
ATOM      9  N   MET     3      20.504  56.466 110.316  1.00  0.00
ATOM     10  CA  MET     3      20.385  55.013 110.378  1.00  0.00
ATOM     11  C   MET     3      20.157  54.675 111.844  1.00  0.00
ATOM     12  O   MET     3      19.237  53.934 112.192  1.00  0.00
...
QvqlvQsgTEVKKpgAsVKVscKasgYtFTSFDLNwvrqapgQglewMGWMNpNSgKtGya
QKFQgrVtMTRNtsIRtayMELSGlrSedtavyFcarAAIYHYyGmdVwgqgtTvNvssas
...
7. SCWRL3 FULL SEQUENCE FILE (*.s3allseq)
Name Convention: ProteinName_x_TemplatePDBChain_y.s3seqall
where x is the round number of the PSI-BLAST run and y is the fragment number of the query sequence that is aligned with that particular template PDB.
Unlike the *.s2seq file, this file contains the whole sequence that is aligned to
the template, including insertions where the query is longer than the hit and
template residues with missing coordinate information. It can be used on
the command line to add a ligand.
Example:
QvqlvQsgTEVKKpgAsVKVscKasgYtFTSFDLNwvrqapgQglewMGWMNpNSgKtGya
QKFQgrVtMTRNtsIRtayMELSGlrSedtavyFcarNADNVEMAAIYHYyGmdVwgqgtT
...
8. MODEL COORDINATES FILE (*.pdb)
It is a PDB-formatted file.
Name Convention: ProteinName_x_TemplatePDBChain_y.pdb
where x is the round number of the PSI-BLAST run and y is the fragment number of the query sequence that is aligned with that particular template PDB.
This file is first generated after the side chains are built with SCWRL3. It is subsequently overwritten by the loopy output after each loop is built. Once all loops are built, this file contains the final homology model.
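The name convention above can be composed and taken apart programmatically, for example when collecting the models of a project directory. The helper names and the example file name below are hypothetical; the sketch assumes only the ProteinName_x_TemplatePDBChain_y.ext pattern and splits from the right so that underscores inside the protein name are preserved.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    protein: str
    psiblast_round: int
    template: str
    fragment: int


def build_name(protein, psiblast_round, template, fragment, ext="pdb"):
    """Compose a file name following ProteinName_x_TemplatePDBChain_y.ext."""
    return f"{protein}_{psiblast_round}_{template}_{fragment}.{ext}"


def parse_name(filename):
    """Split a file name into (ModelName, extension).

    Raises ValueError if the name does not have four '_'-separated parts
    or if the round/fragment fields are not integers.
    """
    stem, _, ext = filename.rpartition(".")
    protein, rnd, template, fragment = stem.rsplit("_", 3)
    return ModelName(protein, int(rnd), template, int(fragment)), ext


if __name__ == "__main__":
    name = build_name("ptp1b", 1, "1LARA", 1)  # hypothetical example
    print(name, parse_name(name))
```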