The software is distributed under a BSD-type license. Make sure you read the license (included in the distribution as the LICENSE.txt file) before downloading and using XML2PDB.
XML2PDB is a command-line program designed to extract the structure information (.pdb), the sequence information (.pdbaa, .pdbnt), and the correspondence between coordinate and sequence residue numbering (.sc) from XML structure files. Source code and makefiles for each operating system are included in the distribution for users who may wish to extend the output of XML2PDB.
XML2PDB was initially developed as an auxiliary tool for MolIDE.
Currently, the XML structure files can be obtained from
XML2PDB is provided in binary form for the following operating systems:
and the installation kits for each operating system have the following names, respectively:
After you download the installation kit appropriate for your operating system, uncompress it:
On Windows, extract the archive with WinZip.
On Linux, decompress the installation kit with gzip and then unpack it with tar:
gzip -d xml2pdb_lin.tar.gz
tar -xf xml2pdb_lin.tar
The previous step creates the directory xml2pdb_win (Windows) or xml2pdb_lin (Linux), which contains the executable, the source code, and an example.
XML2PDB provides a stripped-down PDB file for homology modeling purposes. This file contains the title, SEQRES records, and coordinates. Users who want additional records in their PDB files are encouraged to edit the source files and recompile the program.
The .sc files provide the correspondence between the residue numbering implicit in the sequence (1, 2, 3, ...) and the numbering used in the coordinates. The coordinate residue numbering may not start with 1, may skip some residue numbers, and may include insertion codes, so that a residue may be numbered 62A. This correspondence is not provided in the legacy PDB format, but it is contained in the mmCIF and XML file formats now provided by RCSB. The .sc files can be used to supply this information to programs that use the legacy PDB format. As the examples show, each line starts with the record tag SEQCRD and the chain identifier, followed by these fields:
One letter residue code
SEQRES three letter residue code
ATOM three letter residue code
SEQRES residue number
ATOM residue number
PDB secondary structure
Example 1 (from 1o0d.sc)
SEQCRD L T THR --- 1 - -
SEQCRD L F PHE --- 2 - -
SEQCRD L G GLY GLY 3 1F C
SEQCRD L S SER SER 4 1E C
SEQCRD L G GLY GLY 5 1D C
SEQCRD L E GLU GLU 6 1C C
SEQCRD L A ALA ALA 7 1B C
SEQCRD L D ASP ASP 8 1A C
SEQCRD L C CYS CYS 9 1 C
SEQCRD L G GLY GLY 10 2 C
Example 2 (from 1o07.sc)
SEQCRD A A ALA ALA 1 4 C
SEQCRD A P PRO PRO 2 5 H
SEQCRD A Q GLN GLN 3 6 H
SEQCRD A Q GLN GLN 4 7 H
SEQCRD A I ILE ILE 5 8 H
SEQCRD A N ASN ASN 6 9 H
SEQCRD A D ASP ASP 7 10 H
SEQCRD A I ILE ILE 8 11 H
SEQCRD A V VAL VAL 9 12 H
SEQCRD A H HIS HIS 10 13 H
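The examples above can be read programmatically. The following Python code is a minimal sketch, not part of the XML2PDB distribution. It assumes that the fields are separated by whitespace, that every line has exactly the eight fields shown in the examples, and that "---" in the ATOM residue column marks a residue with no coordinates. The names SeqCrd, parse_seqcrd and seq_to_coord are hypothetical.

```python
from typing import NamedTuple, Optional


class SeqCrd(NamedTuple):
    """One SEQCRD line, as laid out in the examples (hypothetical type)."""
    chain: str
    one_letter: str
    seqres_name: str
    atom_name: Optional[str]   # None when the residue has no coordinates
    seqres_num: int
    atom_num: Optional[str]    # kept as text: may carry an insertion code ("1F")
    sec_struct: Optional[str]


def parse_seqcrd(line: str) -> SeqCrd:
    # Assumption: whitespace-separated fields, eight per line.
    fields = line.split()
    if len(fields) != 8 or fields[0] != "SEQCRD":
        raise ValueError(f"not a SEQCRD line: {line!r}")
    _, chain, one, seqres_name, atom_name, seqres_num, atom_num, ss = fields
    # Assumption: "---" in the ATOM column means the residue is missing
    # from the coordinates, as in the first two lines of Example 1.
    missing = atom_name == "---"
    return SeqCrd(
        chain,
        one,
        seqres_name,
        None if missing else atom_name,
        int(seqres_num),
        None if missing else atom_num,
        None if missing else ss,
    )


def seq_to_coord(lines):
    """Map (chain, SEQRES number) to the ATOM residue number, or None."""
    mapping = {}
    for line in lines:
        if line.startswith("SEQCRD"):
            rec = parse_seqcrd(line)
            mapping[(rec.chain, rec.seqres_num)] = rec.atom_num
    return mapping


# A few lines taken from Example 1 (1o0d.sc).
example = """\
SEQCRD L T THR --- 1 - -
SEQCRD L F PHE --- 2 - -
SEQCRD L G GLY GLY 3 1F C
SEQCRD L C CYS CYS 9 1 C
"""
mapping = seq_to_coord(example.splitlines())
print(mapping[("L", 3)])  # 1F
```

The ATOM residue number is stored as a string because it can contain an insertion code, so converting it to an integer would lose information.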
The .pdbaa file contains the sequence of each peptide chain in FASTA format.
The header has the following structure:
>StructName_And_Chain ChainLength Method Resolution RFactor FreeRFactor Descr <DBCode> [Organism]
PDB chains without chain IDs are specified with an underscore character. The protein name is obtained from the SwissProt and GenBank records (listed in DBCode) in the XML file. The organism name is obtained from the scientific name given in the XML file.
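A header with this structure can be split into its fields with a regular expression. The Python code below is a rough sketch under several assumptions: the first six fields contain no spaces, the description may contain spaces, and the DBCode and organism are enclosed in angle and square brackets exactly as shown. The function name parse_header and the sample header are invented for illustration and are not taken from a real .pdbaa file.

```python
import re

# Assumption: six space-free fields, then a free-text description,
# then <DBCode> and [Organism] at the end of the line.
HEADER_RE = re.compile(
    r"^>(?P<name>\S+)\s+(?P<length>\S+)\s+(?P<method>\S+)\s+"
    r"(?P<resolution>\S+)\s+(?P<rfactor>\S+)\s+(?P<free_rfactor>\S+)\s+"
    r"(?P<descr>.*?)\s*<(?P<dbcode>[^>]*)>\s*\[(?P<organism>[^\]]*)\]\s*$"
)


def parse_header(line: str) -> dict:
    """Return the header fields as strings (hypothetical helper)."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    return match.groupdict()


# Invented header, for illustration only.
sample = ">1XYZA 120 XRAY 1.80 0.19 0.23 EXAMPLE PROTEIN <P00000> [Homo sapiens]"
fields = parse_header(sample)
print(fields["name"], fields["descr"], fields["organism"])
```

All values are kept as strings, since the exact notation XML2PDB uses for missing values (for example, a resolution for a structure that has none) is not described here.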
The .pdbnt file contains the nucleotide sequence of each nucleic acid chain. The header has the same structure as the one in the .pdbaa file.