CCP4 Tutorial: Molecular Replacement

We will solve the hypF structure by molecular replacement, using several programs and approaches. Other MR examples can be found at the end of this tutorial, and at:

CCP4 tutorial (available in CCP4 suite). This uses a cardiotoxin 1tgx.

Molrep tutorial (Alexei Vagin).

Phaser tutorial (Cambridge).

Setting up the CCP4I project

In the CCP4I Directories & Project Dir window, set up a new project 1_intro with a corresponding working directory.

Checking the data

The target is the acylphosphatase-like domain of the hydrogenase maturation factor HypF from E. coli; see Rosano et al., JMB, 321, 785 (2002). The HypF-ACP sulphate and phosphate complexes are deposited as 1gxt and 1gxu respectively.

This protein has a Hg derivative. You may have processed this data in a preceding tutorial. We have prepared a reflection file for you including the data from 1gxu, 1gxt, the Hg derivative, and some experimental phases based on the Hg sites.

There is native data in H32 to 1.3 A resolution. The target has 91 residues and a Matthews calculation strongly suggests only one molecule in the asymmetric unit.
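
As a rough illustration of the Matthews calculation mentioned above, the sketch below computes the Matthews coefficient Vm (cell volume per dalton of protein) and the corresponding solvent fraction for one and two copies in the asymmetric unit. The cell dimensions are hypothetical placeholders, not the real hypF cell (read the true values from the MTZ header). The molecular weight is estimated at about 110 Da per residue, and 18 is the number of symmetry operators of H32 in the hexagonal setting.

```python
import math

def hexagonal_cell_volume(a, c):
    """Volume (A^3) of a cell with a = b, alpha = beta = 90 and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))

def matthews_coefficient(cell_volume, n_symops, n_copies, mol_weight):
    """Vm in A^3 per dalton for n_copies molecules per asymmetric unit."""
    return cell_volume / (n_symops * n_copies * mol_weight)

def solvent_fraction(vm):
    """Approximate solvent fraction, taking 1.23 A^3/Da as the protein volume."""
    return 1.0 - 1.23 / vm

# ASSUMPTION: placeholder cell edges in angstroms, NOT the real hypF cell.
A_EDGE, C_EDGE = 60.0, 140.0
N_SYMOPS_H32 = 18          # H32, hexagonal setting
MOL_WEIGHT = 91 * 110.0    # 91 residues at roughly 110 Da each

volume = hexagonal_cell_volume(A_EDGE, C_EDGE)
for n_copies in (1, 2):
    vm = matthews_coefficient(volume, N_SYMOPS_H32, n_copies, MOL_WEIGHT)
    print(f"copies = {n_copies}: Vm = {vm:.2f} A^3/Da, solvent = {solvent_fraction(vm):.0%}")
```

With these placeholder numbers a second copy would leave no room for solvent, which is the kind of result that lets the calculation rule out higher copy numbers.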

We first use Sfcheck to check a few things about the data:

  1. Select Data Reduction and Analysis > Check Data Quality > Analysis with sfcheck to open the sfcheck task window.
  2. Enter a title.
  3. Make sure that Run Rampage to analyse structure geometry and Run Procheck to analyse structure geometry are unselected (we do not yet have any coordinates), and that Run Sfcheck to analyse experimental data only is selected.
  4. In the line MTZ in select the file hypF-1gxu-1gxt-HG_scaleit1.mtz
  5. Select the labels F FP1gxu, SIGF SIGFP1gxu and Free FREE
  6. Check that a suitable filename has been generated for Sfcheck Output PS
  7. Keep all defaults, and click Run -> Run Now.

Sfcheck produces a PostScript file summarising the data quality (see under View Files from Job).

Also check the log file (View Files from Job, then View Log File).

Choice of search models

The target is an acylphosphatase-like domain. A search of the PDB reveals two acylphosphatases with a sequence identity to the target of about 31% - 1v3z and 1w2i. Each has two chains in the asymmetric unit, either of which could be used as the basis of a search model.

Normally you would use something like Chainsaw at this point to prepare a search model from the template. As an exercise, we are going to try MR straightaway. We will return to Chainsaw later before running Phaser.

Notes on Sequence Alignment

There are many ways of aligning the template and target sequences, and different tools will give slightly different assessments. The sequence identity depends on the definitions used (i.e. treatment of gaps and alignment length), the specific alignment technique, and whether parts of the model have been removed.
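
To make the dependence on definitions concrete, here is a minimal sketch that computes the identity of one pairwise alignment with three different denominators. The two aligned strings are a made-up toy example, not the hypF alignment, and the denominator names are labels chosen for this illustration only.

```python
def sequence_identity(aligned_a, aligned_b, denominator="aligned"):
    """Fraction of identical residues between two aligned sequences ('-' is a gap).

    denominator selects what the number of matches is divided by:
      "aligned" : columns where neither sequence has a gap
      "columns" : all alignment columns, gaps included
      "shorter" : length of the shorter ungapped sequence
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    aligned = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue
        aligned += 1
        if res_a == res_b:
            matches += 1
    if denominator == "aligned":
        total = aligned
    elif denominator == "columns":
        total = len(aligned_a)
    elif denominator == "shorter":
        total = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return matches / total

# Hypothetical toy alignment (not taken from the tutorial files).
TEMPLATE = "MKV-LAAG"
TARGET = "MRVTL-AG"
for choice in ("aligned", "columns", "shorter"):
    print(choice, round(sequence_identity(TEMPLATE, TARGET, choice), 3))
```

The same five matching residues give three different identities, which is why values quoted by different tools for the same pair of sequences need not agree.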

Molrep

Run 1

We will use chain B of 1v3z as the search model.

  1. Select the Molecular Replacement module and open the Run Molrep - auto MR task window.
  2. Enter a title.
  3. Do molecular replacement performing rotation and translation function should already be selected.
  4. For MTZ in select the file hypF-1gxu-1gxt-HG_scaleit1.mtz
  5. Select the labels FP FP1gxu and SIGFP SIGFP1gxu
  6. For Model in select the file 1v3z_B.pdb
  7. (Optional) To speed up the calculation, you can set an upper resolution cut-off of 3 A in the folder Experimental Data.
  8. Keep all defaults, and click Run -> Run Now.

When the job has finished, look at the log file (View Files from Job -> View Log File).

Run 2

In fact, we can make use of our knowledge of the target, and this will often improve the solution. The search model has a moderately low sequence identity with the target and therefore the majority of the side chains are incorrect. Molrep can make use of the target sequence to improve the search model.

  1. Select the previous job, and click ReRun Job
  2. Most of the parameters should be set correctly, but you should change the title, and the name of the Coords out file, so that it is different from the first job.
  3. This time, select Use sequence in the protocol section. A folder Seq in will open below where you can specify the target sequence file hypF_Ndom.seq
  4. Click Run -> Run Now

Look at the log file of this job.

Checking the solution

The top MR solution is applied to the input coordinates, and the positioned PDB file is written out as 1v3z_B_molrep2.pdb. The contrast indicates that this is probably a correct solution, but this should be checked!

The positioned model can be submitted for a few cycles of automated refinement, then checked manually against 2mFo-DFc and mFo-DFc maps, using a graphics program such as Coot. Since we have a good resolution dataset, the model can also be passed to ARP/wARP for rebuilding. Refinement, validation and model re-building are covered in other tutorials.

In fact, the Molrep solution is related to the deposited structure 1gxu by the symmetry operation -Y+2/3, X-Y+1/3, Z+1/3. Comparison of the structures in CCP4mg shows that the beta sheet and one of the two helices are well matched, but there are significant differences elsewhere.

In general, if we want to compare an MR solution to the deposited structure, then we need to take into account possible symmetry operations and possible changes of origin. Two solutions may be identical, even if it is not obvious from a quick look in a graphics program. This can be checked with the csymmatch utility:

  1. Select the Symmetry match models task in module Coordinate Utilities.
  2. Enter the MR solution PDB file as the Work PDB in, and the deposited structure (1gxu) as Reference PDB in.
  3. Select Apply origin shift and hand correction and run.

The log file reports the symmetry operator and change of origin that give the best match, together with a normalised score for the match. The output PDB file has this transformation applied, and can be compared to the reference PDB file. Of course, we usually don't have a deposited structure to compare with, but the same process is useful for comparing different MR solutions.
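
A symmetry operator such as the -Y+2/3, X-Y+1/3, Z+1/3 quoted above acts on fractional coordinates. The sketch below applies that operator to a single point and checks that three applications return the point shifted by a whole cell. The test point is hypothetical, and the conversion between orthogonal PDB coordinates and fractional coordinates (which needs the cell) is left out. This is not how csymmatch is implemented, only an illustration of what the reported operator means.

```python
from fractions import Fraction

def apply_symop(frac_xyz):
    """Apply -Y+2/3, X-Y+1/3, Z+1/3 to one point in fractional coordinates."""
    x, y, z = frac_xyz
    return (-y + Fraction(2, 3), x - y + Fraction(1, 3), z + Fraction(1, 3))

def wrap_to_cell(frac_xyz):
    """Bring each fractional coordinate back into the range [0, 1)."""
    return tuple(value % 1 for value in frac_xyz)

# Hypothetical point in fractional coordinates.
point = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))

moved = apply_symop(point)
print([str(value) for value in moved])

# The rotational part is a three-fold axis, so three applications differ
# from the starting point only by a lattice translation.
thrice = apply_symop(apply_symop(apply_symop(point)))
print(wrap_to_cell(thrice) == wrap_to_cell(point))
```

Exact fractions are used so that the comparison after wrapping is not affected by floating-point rounding.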

Chainsaw

Search models can also be prepared using Chainsaw. Chainsaw takes an external sequence alignment, which can be generated by many bioinformatics tools and/or manually adjusted. In this job, we will create a model based on chain B of 1v3z, using a previously prepared alignment to the target.

  1. Select the Molecular Replacement module and open the Create Search Model task window in the Model Generation folder.
  2. Enter a title.
  3. Leave Create search model using Chainsaw unchanged.
  4. Leave Prune non-conserved residues to gamma atom unchanged.
  5. For PDB in select the file 1v3z_B.pdb
  6. Use the sequence alignment format PIR and for Alignment in select the file 1v3z_B_to_target.pir
  7. Click Run -> Run Now

Chainsaw produces a coordinate file 1v3z_B_chainsaw1.pdb which is an edited version of the input PDB file. 6 residues that do not align to the target sequence have been deleted. Of the rest, 34 have been left unchanged and 50 have had their side chains cut back to the gamma atom. The output PDB file uses the naming and numbering of the target sequence.
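
The counts above (6 deleted, 34 unchanged, 50 pruned) come from walking along the alignment column by column. The following is a simplified sketch of that bookkeeping under the assumption that identical residues are kept and all other aligned residues are pruned. It is not Chainsaw's implementation, and the two aligned strings are a hypothetical toy example rather than the contents of 1v3z_B_to_target.pir.

```python
def classify_columns(template_aln, target_aln):
    """Count the fate of each template residue in a Chainsaw-style edit."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    counts = {"deleted": 0, "unchanged": 0, "pruned": 0}
    for template_res, target_res in zip(template_aln, target_aln):
        if template_res == "-":
            continue                    # no template coordinates for this column
        if target_res == "-":
            counts["deleted"] += 1      # template residue has no partner in the target
        elif template_res == target_res:
            counts["unchanged"] += 1    # conserved residue, side chain kept
        else:
            counts["pruned"] += 1       # non-conserved residue, side chain cut back
    return counts

# Hypothetical toy alignment.
TEMPLATE = "ACDEFG-H"
TARGET = "ACN-FGKH"
print(classify_columns(TEMPLATE, TARGET))
```

For the real job, the unchanged and pruned counts add up to the 84 residues retained in the output model.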

Have a look at the log file.

Now repeat this exercise using the other search model, based on chain A of 1w2i. We can overlap the two models and use the ensemble as input to Phaser (in place of individual search models).

  • For PDB in select the file 1w2i_A.pdb
  • Use the sequence alignment format PIR and for Alignment in select the file 1w2i_A_to_target.pir

Aligning the models

These models can be aligned and the overlapped structures used as input to Phaser.

  1. Select the Coordinate Utilities module and open the Superpose Molecules task window.
  2. Enter a title.
  3. Change mode to Superpose using gesamt.
  4. Enter Moving 1w2i_A_chainsaw1.pdb
  5. Enter Fixed 1v3z_B_chainsaw1.pdb
  6. Enter PDB out 1w2i_A_to_1v3z_B_chainsaw1.pdb
  7. Click Run -> Run Now

1w2i_A_chainsaw1.pdb has been moved to overlap 1v3z_B_chainsaw1.pdb. The log file shows the transformation used, and gives an RMSD of 0.305 A between 84 C-alpha atoms of the superposed structures.
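
The RMSD quoted in the log is the root-mean-square distance between paired C-alpha atoms after superposition. A minimal sketch of that final calculation is given below. It assumes the pairing and the superposition have already been done (the part that gesamt actually solves), and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical paired C-alpha positions in angstroms, already superposed.
FIXED = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
MOVING = [(0.0, 0.0, 0.3), (3.8, 0.4, 0.0), (7.6, 0.0, 0.0)]
print(f"RMSD = {rmsd(FIXED, MOVING):.3f} A over {len(FIXED)} atoms")
```

Because the squared distances are averaged, a few badly matched atoms can dominate the value, so the number of atoms paired matters as much as the RMSD itself.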

Phaser

Using the superposed search models generated by Chainsaw, we will now use Phaser to solve hypF. Phaser is designed to use ensembles of models to improve the signal.

  1. Select the Molecular Replacement module and open the Run Phaser task window.
  2. Enter a title.
  3. Leave Mode for molecular replacement automated search unchanged.
  4. For MTZ in select the file hypF-1gxu-1gxt-HG_scaleit1.mtz, and select the labels FP FP1gxu and SIGFP SIGFP1gxu
  5. In the folder Define ensembles ..., enter the PDB #1 1v3z_B_chainsaw1.pdb. Set the similarity to be sequence identity 0.38
  6. To add another model, click Add superimposed PDB file to the ensemble and enter the PDB #2 1w2i_A_to_1v3z_B_chainsaw1.pdb. Set the similarity to be sequence identity 0.38
  7. In the folder Define composition of the asymmetric unit, select Total scattering determined by components in asymmetric unit, select the file hypF_Ndom.seq as the SEQ file, and leave Number in asymmetric unit 1 unchanged.
  8. In the folder Search parameters, select Perform search using ensemble1
  9. Click Run -> Run Now

Have a look at the log file, then check the solution in the same way as for the Molrep solution (see Checking the solution above).

MrBUMP

You have now prepared three search models based on 1v3z, and used Molrep and Phaser to do the molecular replacement. These steps, and the initial discovery of 1v3z and other related proteins, are automated in the program MrBUMP.

Depending on what you want to do, MrBUMP can make use of web-based services. The following tutorial deliberately does not make use of the web, so that it can be run anywhere. At the end of the tutorial, there are suggestions for web-based options. The use of a few local PDB template files also means that the tutorial is fairly quick. Beware that a full run of MrBUMP might take longer than is reasonable for a tutorial.

  1. Select the Molecular Replacement module and open the Run MrBUMP task window.
  2. Enter a title.
  3. Leave Program Mode Model search and Molecular Replacement unchanged.
  4. For SEQ in select the file hypF_Ndom.seq
  5. For MTZ in select the file hypF-1gxu-1gxt-HG_scaleit1.mtz, and select the labels F FP1gxu, SIGF SIGFP1gxu and Free FREE
  6. Leave the rest of the files folder unchanged, and move to the Template Search Options folder.
  7. Un-check Do a FASTA search for possible template models. Instead we are going to use some known local templates.
  8. Un-check Update local copies of search databases
  9. Select Multiple alignment program Mafft if available
  10. Un-check all Additional search methods, i.e. SCOP, PQS and SSM
  11. The folder User specified search models will have opened. Because we have switched off all search options, we are required to use local files. Click on Add PDB file 3 times to add 3 local PDB files. The first file is 1w2i_A.pdb with Chain identifier A. The second file is 1v3z_B.pdb with Chain identifier B. The third file is 2acy.pdb with Chain identifier A.
  12. In the folder Search Model Preparation Options, keep the default which is to use Molrep, Chainsaw and Sculptor. This means there will be 9 search models in total (3 templates x 3 preparation methods). Turn one or two methods off to make the job quicker.
  13. In the folder Molecular Replacement and Refinement Options, keep Molrep and switch off Phaser. If you want, you can use Phaser instead of Molrep, or both.
  14. In the folder Model Building and Phase Improvement, select the model building programs to try after MR and refinement. By default Buccaneer is set but, depending on your installation, you may be able to try ARP/wARP and c-alpha tracing with SHELXE as well. Model building can help determine if MR has been successful.
  15. Click Run -> Run Now
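
The figure of 9 search models in step 12 is simply every template combined with every preparation method. A small sketch of that enumeration follows. The labels it prints are illustrative only and are not the names MrBUMP gives to its models.

```python
from itertools import product

TEMPLATES = ["1w2i_A", "1v3z_B", "2acy_A"]
PREPARATIONS = ["Molrep", "Chainsaw", "Sculptor"]

def search_models(templates, preparations):
    """List every (template, preparation) combination as a readable label."""
    return [f"{template} prepared with {method}"
            for template, method in product(templates, preparations)]

models = search_models(TEMPLATES, PREPARATIONS)
print(len(models))
for label in models:
    print(label)
```

Switching off one preparation method removes one model per template, so the job shrinks from 9 to 6 search models.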

After a few minutes, have a look at the MrBUMP log file (do not wait for the job to finish).

By default, it will finish when it finds a solution. For example, it may finish with model loc1_B_MOLREP, which corresponds to template 1v3z_B.pdb with a search model created with the Molrep editing features. The Rfree drops from 0.549 to 0.436 (precise numbers may vary!) indicating that the MR solution is refinable, and likely to be correct. If you want to try all search models in MR (a good idea unless you are in a rush), select Finish when all of the search models have been tried in MR in the folder Molecular Replacement and Refinement Options.

If there are no problems accessing web-based services, then you can search for templates rather than use local PDB files. Run as above, with the following differences:

  1. In the folder Template Search Options, check Do a FASTA search for possible template models.
  2. Check Run the FASTA search locally. This refers just to the search step - the PDB files are still downloaded from the web.
  3. Check all of the Additional search methods, i.e. SCOP, PQS and SSM
  4. Do not enter anything into the folder User specified search models.

For comparison, here are some example results from MrBUMP (you may not get exactly the same):

PDB chain  Sequence identity  Source / release date                    MrBUMP Rfree (chainsaw)  MrBUMP Rfree (molrep)
1w2i_B     0.310              OCA - released Apr 2005                  0.447                    0.442
1w2i_A     0.310              OCA                                      0.471                    0.527
1v3z_B     0.310              OCA - released Mar 2005                  0.430                    0.453
1v3z_A     0.310              OCA                                      0.474                    0.470
2bje_G     0.287              OCA - released Nov 2005                  0.458                    0.442
2bje_E     0.287              OCA                                      0.468                    0.486
2bje_C     0.287              OCA                                      0.491                    0.481
2bje_A     0.287              OCA                                      0.448                    0.443
2bjd_B     0.287              OCA - released Nov 2005                  0.468                    0.529
2bjd_A     0.287              OCA                                      0.544                    0.466
1y9o_A     0.275              OCA - released Jan 2006 (NMR)            (not tried)              (not tried)
1ulr_A     0.286              OCA - released Nov 2004                  0.476                    0.471
2acy_A     0.264              SSM - released Nov 1997 (authors tried)  0.539                    0.564

Advanced tutorial (OPTIONAL)

Other search models for hypF

Another possible search model is chain A of 1w2i. This is a different structure of the same protein as 1v3z. Try repeating the above steps using 1w2i_A.pdb as the search model.

You should find that this is more difficult! Modifying the search model using the target sequence is now necessary. Adjusting the resolution limits also helps.

Check your solutions against those produced from 1v3z_B.

Other structure solutions

See the separate document for three more example MR problems.