ProteinShop User's Manual - Protein Creation

Protein Creation

In ProteinShop, a protein is created by loading an input file. The current version supports two input file formats: protein files in PDB format and prediction files in FASTA format.

PDB Files

A PDB file can be read to create a protein from an existing structure. ProteinShop builds the protein model by creating one atom for each ATOM line it reads, setting that atom's position to the 3D Cartesian coordinates given on the line, and creating bonds to all existing atoms that are close enough. The chain of amino acid residues is created according to the residue type abbreviations given for each atom. ProteinShop uses STRIDE (see the note below) to determine the secondary structure from the protein's 3D coordinates; it assigns a structure type to each residue and thereby creates a secondary structure chain.

Figure 1: Protein created by loading example PDB file.
Note: STRIDE is described in Frishman, D. and Argos, P. (1995), "Knowledge-based protein secondary structure assignment," Proteins: Structure, Function, and Genetics, 23, 566-579. For more information about STRIDE, see http://www.embl-heidelberg.de/argos/stride/stride_info.html.
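The atom and bond construction described above can be illustrated with a short Python sketch. It is not ProteinShop's own code: the function names are hypothetical, and the bonding cutoff of 1.9 angstroms is an assumed placeholder, since this manual does not state the distance ProteinShop actually uses. The column positions follow the fixed-width layout of ATOM records in the PDB format.

```python
import math

# Assumed placeholder; the threshold ProteinShop uses is not given in this manual.
BOND_CUTOFF = 1.9  # angstroms


def parse_atoms(pdb_text):
    """Create one atom record for every ATOM line in the PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER and all other record types
        atoms.append({
            "name": line[12:16].strip(),      # atom name, columns 13-16
            "residue": line[17:20].strip(),   # residue type, columns 18-20
            "res_seq": int(line[22:26]),      # residue number, columns 23-26
            "pos": (float(line[30:38]),       # x, columns 31-38
                    float(line[38:46]),       # y, columns 39-46
                    float(line[46:54])),      # z, columns 47-54
        })
    return atoms


def find_bonds(atoms, cutoff=BOND_CUTOFF):
    """Bond each atom to every previously created atom within the cutoff."""
    bonds = []
    for j, new_atom in enumerate(atoms):
        for i in range(j):
            if math.dist(atoms[i]["pos"], new_atom["pos"]) <= cutoff:
                bonds.append((i, j))
    return bonds
```

Calling find_bonds(parse_atoms(text)) on the contents of a PDB file returns pairs of atom indices. The pairwise loop is quadratic in the number of atoms, which is acceptable for a sketch but would usually be replaced by a spatial grid for large structures.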

FASTA Prediction Files

A FASTA prediction file can be read to create a protein "from scratch." When reading a prediction file, ProteinShop creates an amino acid residue chain according to the residue type identifiers provided in the file, and a secondary structure chain according to the secondary structure type predictions provided for each residue. Currently, ProteinShop ignores the prediction confidence values.
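As a rough illustration of this reading step, the following Python sketch assumes a prediction file made up of lines labeled "Conf:", "Pred:", and "AA:". The function name parse_prediction is hypothetical, and the handling of files split into several blocks (by concatenating repeated labels) is an assumption rather than documented ProteinShop behavior. Like ProteinShop, the sketch discards the confidence values.

```python
def parse_prediction(text):
    """Return a list of (residue_type, structure_type) pairs.

    Lines are expected to look like 'Pred: CCEEHH...' and 'AA: ELTPAV...'.
    Repeated labels are concatenated, so files split into blocks also work.
    """
    fields = {"Conf": "", "Pred": "", "AA": ""}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in fields:
            fields[label] += value.strip()

    residues = fields["AA"]
    structure = fields["Pred"]
    if len(residues) != len(structure):
        raise ValueError("AA and Pred lines must describe the same number of residues")

    # The confidence string in fields["Conf"] is read but deliberately unused.
    return list(zip(residues, structure))
```

Each returned pair holds a one-letter residue type and a one-letter structure type, which is the information needed to build the residue chain and the secondary structure chain.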

To create a protein graph and to assign 3D Cartesian coordinates to all created atoms, ProteinShop "simulates" the work of a ribosome: it builds the molecule by processing the provided residue type string one residue at a time. ProteinShop has a set of residue template files, one for each residue type, that contain the atom positions needed to build an instance of that residue type in a local coordinate system.

While concatenating residues, ProteinShop keeps track of an "end-of-chain transformation," which defines how to translate local coordinates to protein coordinates. When a residue is added to the current partial protein, its dihedral angles are set according to its secondary structure prediction, and the end-of-chain transformation is updated to point to the end of the elongated chain. This process is repeated until all residues are processed. As a result, ProteinShop creates a protein with secondary structures fully formed and intact, but with no regard for tertiary structure.

Here is a FASTA prediction file for the same protein structure contained in the above PDB file:

Conf: 9999999999999999999999999999999999999999999999999999999999999999999999
Pred: CCCCCCEEEEEEEECCCCEEEEEEECCCHHHHHHHHHHHHHHCCCEEEEEEECCCCEEEEEEECCCCCCC
AA: ELTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTEMVTEVPVA
The result of loading this prediction file can be seen in Figure 2.
Figure 2: Protein created by loading a FASTA prediction file describing a protein identical to the one described by the PDB file used in Figure 1.
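The residue-by-residue construction with an end-of-chain transformation can be sketched in Python as follows. This is a deliberately simplified illustration, not ProteinShop's algorithm: it works in two dimensions, uses one made-up template for every residue type, and lets a single turn angle per structure type stand in for the dihedral angles. TEMPLATE, LINK_LENGTH, TURN, and the function names are all hypothetical.

```python
import math

# Made-up template: atom names with coordinates in the residue's local frame.
TEMPLATE = [("N", (0.0, 0.0)), ("CA", (1.0, 0.0)), ("C", (2.0, 0.0))]
# Made-up local x offset at which the next residue attaches.
LINK_LENGTH = 3.0
# Made-up turn angles per structure type, standing in for dihedral angles.
TURN = {"H": math.radians(30.0), "E": 0.0, "C": math.radians(-15.0)}


def to_protein(angle, origin, local):
    """Apply the end-of-chain transformation (rotate, then translate)."""
    c, s = math.cos(angle), math.sin(angle)
    return (origin[0] + c * local[0] - s * local[1],
            origin[1] + s * local[0] + c * local[1])


def build_chain(residues):
    """Place atoms for a sequence of (residue_type, structure_type) pairs."""
    angle, origin = 0.0, (0.0, 0.0)  # end-of-chain transformation
    atoms = []
    for res_type, structure in residues:
        # Orient the new residue according to its predicted structure type.
        angle += TURN[structure]
        # Map the template's local coordinates into protein coordinates.
        for name, local in TEMPLATE:
            atoms.append((res_type, name, to_protein(angle, origin, local)))
        # Move the transformation to the end of the elongated chain.
        origin = to_protein(angle, origin, (LINK_LENGTH, 0.0))
    return atoms
```

For example, build_chain([("E", "E"), ("L", "E")]) places six atoms along a straight line, because the assumed turn angle for structure type E is zero. A real implementation would use per-residue templates and full 3D rigid transformations driven by the phi and psi dihedral angles.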