Work-alone Exercises (Bioinformatics 201):

Hints, Tips, Tricks and Answers

 

1.     Access the Protein Data Bank (PDB) at Rutgers (www.rcsb.org/pdb/) and perform the following operations:

A.     Enter “1CRN” for the PDB ID → Download/Display File (1CRN = crambin)

Go to the Search sidebar (right side) → Searchlite: Enter 1CRN → Explore

B.     Go to: File Format → PDB Text (complete with coordinates) and click on it.

C.     Save full entry to disk (1CRN.pdb). You will need this file for later questions.

 

 

2.      Crambin consists of 46 residues, and this exercise is aimed at determining some of its characteristics from its primary sequence. You will need to convert the three-letter code of the protein sequence (1CRN) to single-letter form (FASTA format) for these analyses. This facility is provided by the ExPASy Server (http://www.expasy.ch/tools/):

The algorithm for converting three-letter to single-letter protein code format seems to have been relocated or has vaporized! As an alternative, you may enter the Swiss-Prot accession number for crambin in the text box (P01542; note that the “0” is a zero and not the letter “O”) or key in the sequence below. The latter may also be accomplished by copy/paste.

 

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
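If the conversion tool cannot be found, the three-letter to single-letter step can also be done locally. The following is a minimal sketch, assuming the downloaded 1CRN.pdb carries its sequence in standard SEQRES records (residue names starting at column 20); the function name and the optional chain argument are illustrative and not part of any ExPASy tool.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(pdb_lines, chain=None):
    """Build a one-letter sequence from the SEQRES records of a PDB file.

    chain=None accepts every chain (assumption: a single-chain entry).
    """
    residues = []
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        if chain is not None and line[11] != chain:  # column 12 = chain ID
            continue
        residues.extend(line[19:70].split())  # columns 20-70 = residue names
    return "".join(THREE_TO_ONE[name] for name in residues)


# Usage (file name as saved in exercise 1C):
# with open("1CRN.pdb") as handle:
#     print(">1CRN")
#     print(seqres_to_one_letter(handle))
```

A non-standard residue name would raise a KeyError here, which is deliberate: it is better to notice an unusual residue than to silently skip it.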

 

For the exercises below, access the ExPASy server (http://www.expasy.ch/tools/) and perform the analyses in black, bold type; the section headings for these are denoted in square brackets.

 

A.     Provide the theoretical isoelectric point (pI) and molecular mass (MW) for 1CRN (Compute pI/Mw). [Primary sequence analysis]

Perform these analyses for the entire sequence. The following values should be obtained: pI = 5.73 and Mw = 4736.46.
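The Mw value can be checked by hand: it is the sum of the average residue masses plus one water molecule for the free termini. The sketch below is only an illustration of that arithmetic; the mass table holds commonly tabulated average residue masses, the function name is hypothetical, and the pI calculation (which needs a set of pK values) is not attempted.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH


def molecular_mass(sequence):
    """Average mass of an unmodified, linear polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


crambin = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
print(round(molecular_mass(crambin), 2))
```

For the crambin sequence this should agree with the Mw quoted above to within rounding. Note that the calculation ignores disulfide bonds and any other modification, which is the point of the warning that follows.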

WARNING: These theoretical calculations are only reliable for proteins that are not post-translationally modified, e.g., glycosylation is capable of significantly changing these values. Many investigators have wasted time in devising separation techniques, e.g., ion-exchange chromatography for proteins deduced from DNA sequencing, only to find that they don’t work.

 

B.     Using the theoretical pI and MW values for 1CRN, as well as only four residues constituting the N-terminus (TTCC), determine how these parameters function to identify related proteins and how a very small amount of sequence data is extremely advantageous (TagIdent). [Protein identification and characterization]

Enter the theoretical pI ± 0.25 and Mw 4736 ± 20% (default ranges) without any N-terminal sequence information and then repeat the calculation with “TTCC” as the N-terminus. From the thousands of sequences in Swiss-Prot, you should obtain about 81 potential identities without any sequence data; with sequence data, only crambin is selected.
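The logic of this search can be sketched in a few lines: a candidate survives only if its pI and Mw fall inside the chosen windows and, when a tag is given, its sequence begins with that tag. This is an illustration of the filtering idea only, not TagIdent's actual implementation; the two "decoy" entries are invented for the example, and treating the tag strictly as a prefix is an assumption.

```python
def window_filter(candidates, pi, mw, pi_tol=0.25, mw_frac=0.20, n_term=""):
    """Return names of candidates inside the pI and Mw windows.

    candidates: iterable of (name, pI, Mw, sequence) tuples.
    n_term: optional N-terminal tag; an empty string disables the check.
    """
    hits = []
    for name, cand_pi, cand_mw, sequence in candidates:
        if abs(cand_pi - pi) > pi_tol:
            continue  # outside pI +/- pi_tol
        if abs(cand_mw - mw) > mw_frac * mw:
            continue  # outside Mw +/- mw_frac (as a fraction of Mw)
        if n_term and not sequence.startswith(n_term):
            continue  # tag does not match the N-terminus
        hits.append(name)
    return hits


# Crambin values are from exercise 2A; the decoys are hypothetical.
database = [
    ("crambin", 5.73, 4736.46, "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"),
    ("decoy_1", 5.80, 5100.00, "MKTAYIAKQR"),
    ("decoy_2", 9.10, 4700.00, "TTCCGGSA"),
]
print(window_filter(database, pi=5.73, mw=4736.46))
print(window_filter(database, pi=5.73, mw=4736.46, n_term="TTCC"))
```

The first call keeps every entry that matches on pI and Mw alone; the second shows how four residues of sequence remove the remaining false positive.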

 

C.     Two-dimensional PAGE and MS analyses form the basis of Proteomics. To demonstrate this point, theoretically cleave 1CRN with trypsin and display all peptide fragments; leave all other parameters as default (PeptideMass). [Protein identification and characterization]

Accept all default settings and perform digestion with trypsin. You should obtain three fragments (residues 1-10, 11-17 and 18-46) consistent with cleavage at the carboxyl termini of basic residues [Lys (K) or Arg (R)]. This is a convenient approach preliminary to MS or Edman degradation sequencing, especially if the N-terminus of the intact protein is blocked.
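The cleavage rule itself is simple enough to reproduce. The sketch below cuts after every Lys or Arg unless the next residue is Pro, which is a common textbook simplification of trypsin specificity; whether PeptideMass applies exactly this rule by default is an assumption, and the function name is illustrative.

```python
def tryptic_fragments(sequence):
    """Split a sequence C-terminal to K or R (but not before P).

    Returns a list of (first_residue, last_residue, peptide) tuples
    using 1-based residue numbers.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            fragments.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    # Whatever remains after the final cut is the C-terminal peptide.
    fragments.append((start + 1, len(sequence), sequence[start:]))
    return fragments


crambin = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
for first, last, peptide in tryptic_fragments(crambin):
    print(first, last, peptide)
```

Crambin has no lysine and only two arginines (R10 and R17), so the three residue ranges listed above follow directly.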

 

D.     Theoretically determine the secondary structure of 1CRN with nnPredict, as well as with Garnier’s GOR IV. [Secondary structure Prediction] Provide output files for both analyses and compare with the x-ray crystallographic data (1CRN PDB file).

The GOR IV website has been difficult to access of late, so if you have a similar problem, go to http://molbiol.soton.ac.uk/compute/GOR.html. A comparison of the secondary structure predictions with the x-ray data (PDB file: rows designated as HELIX, SHEET and TURN) underscores the low degree of accuracy obtained by the former, especially for this small protein. For example, both nnPredict and GOR fail to identify two prominent helical (H) regions and overestimate the content of sheet (E) structures.
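To make the comparison less subjective, the HELIX and SHEET records can be turned into a per-residue string (H, E or -) and lined up against a prediction. This is a rough sketch, assuming the standard fixed-column PDB layout, a single chain numbered from 1, and a prediction supplied as a string of the same length; TURN records are ignored and the function names are hypothetical.

```python
def observed_states(pdb_lines, n_residues):
    """Per-residue H/E/- string from HELIX and SHEET records."""
    states = ["-"] * n_residues
    for line in pdb_lines:
        if line.startswith("HELIX "):
            # HELIX: start residue in columns 22-25, end in columns 34-37
            first, last, code = int(line[21:25]), int(line[33:37]), "H"
        elif line.startswith("SHEET "):
            # SHEET: start residue in columns 23-26, end in columns 34-37
            first, last, code = int(line[22:26]), int(line[33:37]), "E"
        else:
            continue
        for position in range(first, last + 1):
            if 1 <= position <= n_residues:
                states[position - 1] = code
    return "".join(states)


def agreement(predicted, observed):
    """Fraction of residues given the same state in both strings."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Usage: paste the nnPredict or GOR output as a 46-character string.
# with open("1CRN.pdb") as handle:
#     observed = observed_states(handle, 46)
# print(agreement(predicted, observed))
```

The prediction programs may use other symbols for coil (for example "C" or a blank), so the predicted string should be translated to the same three symbols before comparing.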

 

Swiss-Prot is famed for its excellent annotation of protein sequences. As an example, access the database (ca.expasy.org/sprot) and enter “crambin” (Quick Search).

 

3.      Download the latest version of Swiss PDB Viewer (www.expasy.ch/spdbv/) according to the procedure outlined in slide #72.

Be sure to choose the correct platform; four choices are available. Downloading the tutorial/user guide is strongly recommended. Given a choice between self-extracting (*.exe) vs. *.zip files, the former are usually easier to manipulate. The major test of your patience will be to make sure that the loop database is in the “_stuff” directory. Otherwise, it will not function!

 

 

4.      Start the Swiss PdbViewer and open the file 1CRN.pdb. If necessary, size and position the molecule so that it fills the main screen. [Ignore the screen that warns that some residues have unrealistic B-values.]

A.     Select: Secondary Structure: non-Trans amino acids and accept Torsion cutoff of 175°. From the Control Panel, provide a rough approximation of amino acids having a cis configuration. What does this tell us in view of the fact that trans is more energetically favorable?

When this selection operation is performed, six of the residues in the Control Panel will be displayed in red; these include T30, G31, C32, I34, T39 and Y44; therefore, only 6 of the 46 residues of crambin are in the cis configuration. Please note that this is not always the case, e.g., a beta-barrel protein like bovine pancreatic trypsin has a surprisingly high percentage of cis residues!
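For those who want to see what the selection is measuring: the peptide-bond torsion (omega) is the dihedral angle defined by CA(i), C(i), N(i+1) and CA(i+1), with 180° being perfectly trans and 0° perfectly cis. The sketch below computes a dihedral from four coordinates and applies a cutoff test; interpreting the 175° cutoff as "flag anything with |omega| below 175°" is an assumption about how the viewer applies it, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_non_trans(omega_degrees, cutoff=175.0):
    """True when omega deviates from 180 degrees by more than 180 - cutoff."""
    return abs(omega_degrees) < cutoff


# Idealized test geometry (not taken from 1CRN):
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(trans, is_non_trans(trans))
```

Under this reading, a residue flagged as non-trans is one whose peptide bond is twisted by more than 5° from planarity, which is a broader set than residues with a true cis bond (omega near 0°).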

 

B.     Select: Secondary Structure: Helices, and Window: Ramachandran Plot and save these data (File: Save: Ramachandran Plot Values) as a text file, e.g., CRNHEL.TXT.

When this operation is performed, residues that occur in helices are selected and displayed in red in the Control Panel. Also, a new window is displayed in which the distribution of these residues is shown in the four quadrants of a Ramachandran Plot. Each point in the plot can be identified by residue by placing the cursor over the point (do not click). In this instance, all of the points are distributed in the phi (-), psi (-) quadrant; there are no outliers.

 

NOTE: Residues may be selected by various criteria, e.g., chemical or physical characteristics, associated structural domain, accessibilities, etc. Any operation performed after selection (display, color etc.) only applies to selected residues, which facilitates emphasizing certain residues in the context of all remaining residues.

 

C.     Select: Secondary Structure: Strands, and Window: Ramachandran Plot and save these data (File: Save: Ramachandran Plot Values) as a text file, e.g., CRNSTR.TXT.

When this operation is performed, residues that occur in strands (sheets) are selected and displayed in red in the Control Panel. In this instance, all points are distributed in the phi (-), psi (+) quadrant; there are no outliers.

 

D.     Select: Secondary Structure: Coils, and Window: Ramachandran Plot and save these data (File: Save: Ramachandran Plot Values) as a text file, e.g., CRNCOIL.TXT.

When this operation is performed, residues that occur in coils are selected and displayed in red in the Control Panel. The term “coil” is outmoded, but is still used to indicate any protein structure that is not regular, i.e., helix or sheet in which hydrogen bonding is a dominant, repeating characteristic. In this instance, the points are scattered in several quadrants, with most occurring in the phi (-), psi (+) quadrant. There are a few outliers and these are glycine (Gly) residues. As discussed in class and the lecture notes, Gly has more freedom of rotation because it does not have an appended R-group.

 

E.      Provide a printout of the ω, φ and ψ torsion angles for each of these structures. In what quadrants do most of the φ and ψ torsion angles lie for each of these structure types?

See answers above.
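If the saved text files are long, the quadrant tally can be automated. The sketch below assumes the phi and psi values have already been read into a list of (phi, psi) pairs in degrees; the layout of the saved file is not assumed, and the example angles are invented for illustration.

```python
from collections import Counter


def quadrant(phi, psi):
    """Label the Ramachandran quadrant by the signs of phi and psi."""
    phi_sign = "-" if phi < 0 else "+"
    psi_sign = "-" if psi < 0 else "+"
    return "phi (%s), psi (%s)" % (phi_sign, psi_sign)


def quadrant_counts(angles):
    """Count how many (phi, psi) pairs fall in each quadrant."""
    return Counter(quadrant(phi, psi) for phi, psi in angles)


# Hypothetical values: two helix-like, one strand-like, one Gly-like outlier.
example = [(-60.0, -45.0), (-65.0, -40.0), (-120.0, 130.0), (80.0, 10.0)]
for label, count in quadrant_counts(example).most_common():
    print(label, count)
```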

 

 

5.      Compare the structures of 1CRN (“wild type”) to the model of the “mutated” crambin (computer lab exercise).

 

Background: As explained during the presentation, the primary sequence of crambin (1CRN) was randomly mutated by tossing four coins and counting the number of heads for determining the positions to be mutated. These residues were then substituted using the BLOSUM62 matrix (Bioinformatics 101); the third best substitution was selected in each case. The resulting “mutated crambin” is 56.5% identical to “wild type” crambin and was submitted during the computer lab exercises for modeling by the “first approach mode.” The sequence of mutated crambin:

 

NTCCASIMARNNFNSCQLPSTPEVLCTTNASCLIIPSANCNSDIAE
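The identity figure quoted above can be verified directly, since the two sequences are the same length and no gaps are involved. This is a minimal sketch of a position-by-position comparison; it is not a substitute for a real alignment when lengths differ.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * identical / len(seq_a)


wild_type = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
mutated = "NTCCASIMARNNFNSCQLPSTPEVLCTTNASCLIIPSANCNSDIAE"
print(round(percent_identity(wild_type, mutated), 1))
```

Twenty-six of the 46 positions are unchanged, which gives the 56.5% stated in the background.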

 

Responses to modeling request: You should have received four email responses. The first two are trivial and acknowledge the fact that you submitted a request and alert you to news concerning SwissModel. The third email (SwissModel_Tracelog) provides a record of how the modeling was performed from selection of templates of known 3D-structures to energy minimization. The fourth email (SwissModel_Model) contains the 3D-coordinates of the model (the “target”), as well as the coordinates of all the templates, as an attachment. The latter should always be saved as a text file to preserve the original formatting, but with the pdb file extension. Molecular viewers recognize a limited set of file extensions and “expect” to find XYZ coordinate data in strict column positions, as emphasized in the presentation. If the file is reformatted as a *.doc file, for example, the data will likely be corrupted.
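The point about strict column positions is easiest to appreciate by seeing how a coordinate line is read. The sketch below slices an ATOM record at the columns defined by the PDB format (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, X/Y/Z in 31-38, 39-46 and 47-54); the function name and the returned dictionary are illustrative, not how any particular viewer is implemented.

```python
def parse_atom_line(line):
    """Read one ATOM/HETATM record by fixed column positions."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not a coordinate record")
    return {
        "atom": line[12:16].strip(),      # columns 13-16
        "residue": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                # column 22
        "number": int(line[22:26]),       # columns 23-26
        "xyz": (float(line[30:38]),       # columns 31-38
                float(line[38:46]),       # columns 39-46
                float(line[46:54])),      # columns 47-54
    }


# Usage:
# with open("model.pdb") as handle:   # hypothetical file name
#     atoms = [parse_atom_line(l) for l in handle
#              if l.startswith(("ATOM  ", "HETATM"))]
```

Because nothing but position tells the reader where X ends and Y begins, a word processor that inserts a tab, collapses spaces or wraps a line will shift every field and make the file unreadable.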

 

Visualizing the model: When the model pdb file is visualized with Swiss-PDB Viewer, a total of four windows should open. The large window contains the structural model itself, and the Control Panel is shown along the right side. Two smaller windows include an alignment of the target to the templates (lower left) and a Layer Infos window (upper right). The model looks “fuzzy” when first opened because the default is to visualize the target and all templates simultaneously (Layer Infos). By clicking on/off, any combination of target and templates may be visualized.

 

Quick comparisons of target (model) and templates: First comparisons are most easily made by initially comparing backbone structures only and assigning a specific color to each. To do this, place a checkmark opposite the target and each template in the CA (alpha carbon) column of the Layer Infos window, uncheck the “side” and “ribn” columns in the Control Panel, and select Color: by layer from the Main Window. This results in superposed backbone structures for the target (yellow), 1CNN (blue), 1CNR (green), 1CCM (red), 1CRN (gray) and 1AB1 (violet). It should be reasonably obvious from inspection of these structures that all are generally coincident, but diverge most noticeably in the loop region connecting the two alpha helices (residues 19-22). Please note that the target is most like 1CCN, which probably reflects that this template was given more weight in modeling the loop. A more important point is that even the template structures, determined by x-ray and NMR, differ quite markedly among themselves in this region, and no model can be better than the determined structures!

 

Root Mean Square (RMS) Determinations: More concrete data concerning the previous points can be derived from RMS determinations. To do this, go to the Layer Infos window and click on the target and select all residues in the Control Panel except OXT46, so that “46” shows in the Sel column (Layer Infos). Do this for 1CRN and 1CCN, as well. [All must have “46” in the Sel column, because RMS can only be determined with paired values (slide #64).] Then, go to Tools (Main Window): Calculate RMS: CA (carbon alpha) only and select target and 1CRN: OK. This results in a value of 0.66 Å for the overall structures of target vs. 1CRN; a value of 0.53 Å is obtained by comparing target vs. 1CCN. For RMS determined for backbone atoms only, the corresponding values are 0.62 Å (target vs. 1CRN) and 0.59 Å (target vs. 1CCN). In summary, these data are entirely consistent with the Chothia/Lesk premise (slide #61).
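The requirement for paired values is clear from the calculation itself: each atom in one structure is matched to one atom in the other, the squared distances are averaged, and the square root is taken. The sketch below shows that arithmetic only, assuming the two structures are already superposed (as they are in the model file); it performs no fitting, and the function name is illustrative.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS distance between paired (x, y, z) coordinates, no superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("RMS needs the same number of atoms in both sets")
    if not coords_a:
        raise ValueError("no atoms selected")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two CA atoms, each displaced by exactly 1 along z (made-up coordinates).
print(rms_deviation([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

Restricting the coordinate lists to a subset of residues (a single helix, or the loop) gives the regional values in the same way.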

 

Divergence within the loop regions (residues 19-22) is equally obvious by repeating the operations for these residues only. In this case, a comparison of target to 1CRN provides a value of 2.02 Å (CA only) and 1.85 Å (backbone atoms only). At the other end of the spectrum, the RMS deviation for the first helix (residues 7-18) of the target and 1CRN is 0.23 Å (CA only) and 0.27 Å (backbone atoms only). Therefore, the assumptions made from visual inspection are verified by real calculations.

 

The Rotamer Library: The extensive changes in the 1CRN sequence for the previous exercises were meant to illustrate the modeling of a similar protein from another plant species. Many times, however, e.g., in protein engineering, only a few specific changes are desired, in which case the rotamer library is very important. To illustrate this point, load the 1CRN.pdb file in order to substitute only one residue in the loop structure (residues 19-22), i.e., PGTP to PSTP or glycine to serine. In this case, the Layer Infos panel contains only one structure. Checkmark CA in this window, and uncheck all “side” column positions except for residues 19-22 (Control Panel). Finally, checkmark the “label” column for residues 19-22 and position the structure so that you have a clear view of GLY20. Click on the MUTATE button (Main Window), then on the CA atom of GLY20, and select SER from the menu. In this case, you have a choice of 7 rotamers (click on the small arrowheads), so select one that has no clashes (absence of dashed pink line), and accept it by clicking MUTATE again. Note: The resulting structure would then be submitted to Swiss-Model for energy minimization, as the final step.

 

The Loop Database: As emphasized in the presentation and the exercises above, modeling-by-homology progresses from structurally conserved regions (SCRs) and, then, these are connected by “loops.” A loop database, therefore, is an important resource for finalizing a model. To illustrate this point, load the 1CRN.pdb file and perform all steps as above, but checkmark the “label” column (Control Panel) for residues 18-23. There are two ways to proceed, either separately or in tandem. In the Main Window, click on Build: Build Loop or Build: Scan Loop Database. For both of these procedures, you will be asked to pick the first anchor point and then the second anchor point, which is equivalent to picking the residue immediately before and after the loop sequence, respectively, i.e., LEU18 and GLU23. In both cases, you will be presented with a number of choices, in no particular order, but they can be ordered by various criteria, e.g., force field (FF) by clicking on “FF.” When a selection is made, make sure you have a clear view of the loop to see how it changes. Obviously, your selection will be based on lowest FF, PP, number of clashes, etc. Liberal use of the tutorials and practice is really needed to make these points clear. Finally, the loop-modified model needs to be submitted to Swiss-Model for energy minimization, as in the previous exercise.
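The selection step amounts to ranking the candidate loops by more than one criterion at once. The sketch below orders candidates by number of clashes first and force-field energy second; this ordering is only one reasonable choice, the field names are hypothetical, and the candidate values are invented for illustration rather than taken from the viewer.

```python
def rank_loops(candidates):
    """Sort loop candidates: fewest clashes first, then lowest FF energy."""
    return sorted(candidates, key=lambda loop: (loop["clashes"], loop["ff"]))


# Hypothetical candidates; "ff" is a force-field energy in arbitrary units.
loops = [
    {"id": "loop_a", "clashes": 2, "ff": -35.0},
    {"id": "loop_b", "clashes": 0, "ff": -12.0},
    {"id": "loop_c", "clashes": 0, "ff": -20.0},
]
for loop in rank_loops(loops):
    print(loop["id"], loop["clashes"], loop["ff"])
```

Here loop_a has the lowest energy but is ranked last because of its clashes, which illustrates why sorting on a single column in the viewer can be misleading.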