Welcome to our first tutorial!
Topics covered
- Short introduction to computational biophysics.
- Installing VMD
- Quick detour: amino-acids.
- Visualizing a protein with VMD and performing simple analyses.
This tutorial is adapted from Using VMD
A high-level overview of simulations
What do we gain from simulations?
Image adapt. from Allen & Tildesley Computer simulations of liquids, Oxford Science Publ.
Exact Mathematical models of complex systems are unattainable due to the very high number of degrees of freedom. One can hope to build an approximate physical theory. This poses some practical downsides though.
- The time required to build and validate a proper physical theory might be well beyond what is available to a project.
- Validating the theory requires a large amount of data from experiments -> more time, more money. It is often more convenient to simulate starting from first principles (e.g. Newton’s laws, Q.M., Statistical Mechanics)
First proper computer simulations (~1940-1950)
- Manhattan project.
- Meteorology/climate.
- equations of state of liquids.
- Evolutionary problems (cellular automata).
Nowadays
- Insight into otherwise intractable problems.
- In-silico experiments.
- Industrial workhorses.
Adapted from P. Carloni
The machinery for simulating a biomolecule
Practically, the machinery we are going to use can be exemplified in the following way.
In a nutshell, the ingredients we need are:
- the coordinates of the system;
- the structure of the protein, related to the force field we use;
- the force field parameters for the interactions;
- a configuration file with the appropriate settings of the used algorithms;
- a software to integrate the Newton’s equations;
- the trajectory: a collection of coordinates that follow the evolution of the system.
We will rapidly see some of the files in this lesson, but keep in mind this scheme, since it will be useful later!
Software
Let’s start from the basic. For this tutorial we will use VMD, a visualisation program
that will allow us to render pretty images and analyse our simulations as well.
The computers provided to you already have VMD installed.
If you want the software on your own laptop, you can download it from download VMD
with a simple registration. Choose the version 1.9.3
suitable for your operating system.
Bash
The Bourne Again SHell (bash) is a command-line interpreter that allows us to execute commands via text instructions. Let’s have a quick overview of the basic of bash
in the meanwhile.
On Ubuntu type ctrl-alt-T
in order to open a terminal. For macOS, you should find the Terminal in the Applications/ folder.
For Windows users, use the lab computers.
To check what bash
you are using, type which bash
. You should see something like
/bin/bash
/usr/bin/bash
Let’s have a walk into the computer filesystem. First of all, type pwd
(print working directory). The output should be something like:
/home/your_username
What is inside your folder? Type ls
(list) and read the output.
What changes if you type “ls -lh”? (without “”)
Let’s create now a new directory for the course.
$ mkdir QCB_course
Bash does not like white spaces, so use the underscore _
.
With pwd
we can see that we are still in the /home
directory. Try!
Let’s go inside
QCB_course
.
$ cd QCB_course
To create an empty file, you can use touch dummy_file.sh
. We will see in a minute
what the extension .sh
means.
We have no interest in the dummy_file.sh
, so we can remove it with:
$ rm dummy_file.sh
Of course, we are all smart, but in order to avoid unwanted deletion, let’s add a
small command to your ~/.bashrc
(the .
at the beginning of a filename makes it “invisible”).
First, let’s make a backup copy of the file:
$ cd #to go into the home folder
$ cp .bashrc .OLD_bashrc
Guess what cp
does.
Finally we can add our line:
$ echo “alias rm=’rm -i’” » .bashrc
If we go into ~/QCB_course
folder, create a new file and remove it, we should be
asked for confirm. Right? (Do the same on a new terminal or type source .bashrc
)
With bash
we can perform some basic arithmetics.
Create a new file called my_script.sh
and open it with a text editor.
Try to use an editor like ViM, Emacs or nano (without a GUI)
Write the following text into the file and try to understand what it does:
#!/bin/bash
i=5
for j in {1..4}
do
echo "Adding 1 to $j: $((j + 1))"
done
echo "Math (should) work! 5*4 = $((i*j))"
To launch this script:
$ bash my_script.sh
You can also launch the commands with ./my_script.sh
, if your file is an executable.
List the content of your folder and then do:
$ chmod 700 my_script.sh
If you “list” again, did somethin change?
bash is not the only shell you have on your computer!
To further learn about bash
, feel free to search it on Google.
Installing VMD (optional)
Extract the folder, run the configuration, go into the src/
folder and install VMD.
cd Downloads/
tar xvzf VMD_something.tar.gz
cd VMD_something
./config
cd src
make install
Now you should be ready to go.
Getting started
Download the source file. Move the .tar.gz
archive into QCB_course
and
untar it with
$ tar xvzf tutorial1.tar.gz
$ rm -f tutorial1.tar.gz
Now launch VMD:
$ vmd
and you should have the following:
VMD graphical interface.
If everything works, close it and let’s define a common playground.
Quick and Easy recap
There are a lot of amino acids out there.
But only 20 of them are coded into our DNA. We will deal mainly with these residues, but it is not uncommon to have acetylated residues especially in the termini.
“What is a terminus or a residue or an amino acid?” I hear you ask.
Amino acids are the building block of proteins. If isolated, an amino acid has an amine (-NH2)1 and a carboxyl (-COOH) group. If linked with others to form a polypeptide, they lose a hydrogen from the N-terminal (amine group) and an OH from the C-terminal (carboxyl group) and we have a residue. We will usually use residue without this distinction. Ça va sans dire, the first and the last residue have to be capped with the proper terminal end since they are not linked from both sides.
All amino acids have a Carbon, called carbon-alpha, where a functional group is attached. This group is specific to each amino acid and it is called sidechain, while the other (heavy) atoms, present in each residue, constitute the backbone.
Below we have an Alanine with different segments highlighted that exemplifies the nomenclature just introduced.
a) Alanine amino acid in Licorice with the C-alpha shown. b) Residue of the same amino acid.
c) Backbone (bottom) and sidechain (top).
Further information will be provided as we dig deeper in the course.
In media re with VMD
VMD is a software for visualising and analysing molecular systems like proteins.
We will see a small subset of the available functionality of the program.
In this lesson we will focus on the visualisation of a protein and on the basics
of a scripting language called Tcl
(read: tickle).
Let’s go back to the shell. In your working folder you should have the following files:
1ubq.pdb equilibration.dcd pulling.dcd ubiquitin.psf
Open VMD, typing vmd
.
We will have three windows:
- VMD Main
- VMD Display
- the Terminal
In the VMD Display
our system will be draw.
VMD Main
is a graphical interface for all the functionality we need.
File
: to load molecules, render them.
Molecule
: to perform actions on the loaded molecules.
Graphics
: to modify representation of the loaded structures.
Display
: to set option for the VMD Display
.
Mouse
: to perform actions using the mouse.
Extension
: contains collection of modelling and analysis tools.
Help
: (really?).
Let’s load a file containing the coordinate of a protein with the extension .pdb
.
We will load a structure of the ubiquitin, a small protein that labels proteins that have to be degraded.
So File -> New Molecule and browse 1ubq.pdb
and load it.
You will have something like this:
Now rotate the protein. Click on the VMD Display
and press r
and move the mouse with the left-button pressed.
What if you press the right-button?
You can change the center of rotation by pressing c
and selecting an atom as pivot.
With t
you can translate the molecule in order to better place it in the display, while with s
you can zoom in and out (or use the scroll button). Keep on play with it!
If you lose your molecule, press =
or go to Display -> Reset View.
For now we saw the protein as a bunch of lines with some red dots overthere.
What are these dots?
Using the nomenclature defined above, we can modify what is displayed. Go to Graphics -> Representation and a new window appears.
The representation style should be highlighted; if not, click on the line Lines Name all
.
Then go to the Selected Atoms and write protein and backbone
.
You should have this.
Modify the Drawing Method from Lines
to Licorice
.
More keyword for the selections can be found if click on the Selection tab.
Most of the keywords will be clearer in the next lessons, so don’t be scared!
As you can see, you can combine keywords with and
, or
, not
.
What happens if you select not protein
?
The red dots are water molecules whose hydrogen are not resolved by the X-rays crystallography. This is why you do not have hydrogen in your protein (the colour code for H is white).
If you change your selection into water
, nothing should happen.
Visualise the protein backbone and the water.
Does your selection work? If not, why?
You can also use more complex selections! Let’s make an example.
Click on the VMD Display
, press 1
and click on a oxygen atom of water. In the Terminal
few lines should appear. Among them
look for index: XXX
(where XXX
is an integer).
Go back to the Graphical Representation window and use index XXX
as Selected Atoms. Create a new representation (Create Rep
button) and use the following selection:
same residue as within 20 of (index XXX)
.
What happens if you don’t write same residue as
?
Usually proteins have a secondary structure: -helices, -sheets etc… It’s hard to find them if we use only sticks.
Delete all the representations you have and create a new one with
the selection all
.
Now go to Drawing Method and select New Cartoon.
In addition, change also the Coloring Method from Name into Secondary Structure (we are bored by the cyan colour!). The result:
The secondary structure is inferred by the 3D arrangement of the residues and the presence of h-bonds, a non covalent bond between a donor (a heavy electronegative atom with a Hydrogen, usually O-H
, N-H
, F-H
) and an acceptor (same kind of atoms not covalently bonded to the Hydrogen) and it is established upon geometric criteria.
Here the colouring scheme for the secondary structure.
- T = hydrogen bonded turn (3, 4 or 5 turn)
- E = extended strand in parallel and/or anti-parallel β-sheet conformation.
- B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
- H = 4-turn helix (α helix).
- G = 3-turn helix (310 helix)
- I = 5-turn helix (π helix)
- C = coil (residues which are not in any of the above conformations).
Now keep on playing with the colouring methods and representations. But before go to Extension -> Visualization -> ViewMaster. Click on Create New
, so you have a representation to go back if you change too many things and you
are not pleased with it.
The choice of the settings depends on what you want to convey with your image. Once you are happy we can (and will) do two things:
- save our work, so we can reload the representation for future use;
- render and save a picture.
To save the representation, click on File -> Save Visualization State, write a name for the file (usually with a .vmd
extension) and save it.
Now quit VMD and reload the representation with File -> Load Visualization State.
Let’s now render.
First of all, we need to change the backgound colour since black is not suitable for printing.
So go to Graphics -> Colors.. and change the Display Backgound in white
.
Then remove the axis: Display -> Axes -> off.
Probably your image is a bit pale. This is due to the Depth Cueing
. Go to Display -> Depth Cueing -> off.
If now it is too dark, turn the cuing on again and modify the parameter in Display -> Display Settings… and change the Cueing Mode in Linear
. You can also play with the related parameters.
Now if you go to File -> Render -> Snapshot and Start Render, you create an image. Depending on your system, something will happen.
But, in general, the image has a pretty low quality. Let’s improve it!
First, in Graphics -> Representation you can increase the resolution of each representation you have. Then, change the Material in AOChalky. Finally go back to the rendering window and change the rendering engine in Tachyon.
Tcl scripting
A feature of VMD is the presence of a Tcl/Tk console
.
Every thing that you can do with the graphical interface can be
done with the command line.
To open the console go to Extension -> Tcl/tk console and a new window pops up.
Just to see if my previous statement was correct, let’s type:
% color Display Background black
Did something happen?
Let’s move to the tcl basics:
- to define a variable
set a 4
; - to print the variable value
puts $a
.
Note that to access the variable value you have to use the dollar sign $
.
You can also print strings:
% puts “Hello, World! >.<”
Of course you can do simple arithmetics as below:
% expr 3 * 10
% expr $a + 5
To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ]
.
% set a [expr $a * 13.4]
Common statements you will use are:
for
loop:
% for {set i 0} {$i < 10} {incr i} {
puts “$i plus 1:\t[expr $i + 1]”
}
if
clause:
% if { 10 > 4 } {
puts –nonewline “True “
puts “statement!”
}
Let’s check some VMD-specific commands. First let’s resize the display.
% display resize 600 600
This is not the only We saw already how to resize the display. For further info the display
command, type:
% display
The help will be prompted out in red. You don’t need to memorise all the commands!
Read the output/errors! They are really useful to understand that happens.
Check what molinfo
and atomselect
do.
Suppose we want to compute the number of residue in our protein.
First, we need to make a selection of the protein.
Can you think about one atom present in all the residues?
% set selection [atomselect top “ Selection-here “]
% $selection num
Let’s finally clear the workspace.
% mol delete all
Trajectories
The output of an MD simulation is a trajectory. Let’s see what VMD can do with it.
Equilibration
Load the ubiquitin.psf
(a Protein Structure File), then right-click on the brand new molecule and select Load Data Into Molecule
. Here you have to select the equilibration.dcd
(a trajectory file in binary) and load all at once
.
In the VMD Main you can play with the trajectory.
The ubiquitin wobbles a bit around. So let align all the frames in order to have the protein almost in the same position. How many frames do we have?
molinfo top get numframes
Let’s comment the following script.
% set nf [molinfo top get numframes]
% set all [atomselect top “all”]
% set ref_0 [atomselect top “protein and name CA” frame 0]
% set ref [atomselect top “protein and name CA”]
% for {set f 0 } {$f < $nf} {incr f} {
$ref frame $f
$all frame $f
set M [measure fit $ref $ref_0]
$all move $M
}
Now we should have the protein almost in the same position for the whole trajectory. But still, the movie is not smooth.
In Graphics -> Representation there is a tab we previously ignored: Trajectory.
Here we can play with the Trajectory Smoothing Window Size. VMD will perform a rolling average of the positions of the atoms (therefore we would see something weird if we use licorice, such as wrong methyl group arrangement.) The smoothing is applied to the representation is set. So if you have multiple reprs, check this option carefully.
Measuring quantities
We used the command measure
few lines ago. To have an idea of what you can actually measure, just type measure
in the Tcl/tk console
.
As an example, let’s compute the radius of gyration of our protein:
It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.
The command we look for is measure rgyr
.
Type:
% set calpha [atomselect top “alpha”]
% measure rgyr $calpha
Now compute with a script the radius of gyration for all the frames and save it into a file my_first_script.tcl
Solution not available ;)
Let’s improve our script by printing the results into a file. We will need to add a command to open a writeable file:
set output_file [open ‘eq_rgyr.dat’ w]
and modify the puts
command with:
puts $output_file ‘$f\t$rgyr’
and, finally, we have to close the file:
close $output_file
We can plot the data using gnuplot
(type it in a new shell), and then:
p “eq_rgyr.dat” w l
To quit from gnuplot
, type q
and Enter
.
Bonus: What does pressing 2
, 3
, 4
in the VMD Display
do?
Pulling
Let’s clear the workspace again mol delete all
and load the pulling.dcd
by using:
mol new ubiquitin.psf
mol addfile pulling.dcd waitfor all
In this simulation, one end of the protein is pulled, while another atom is kept fixed. It can be a way to unfold a protein.
Play the trajectory to see what happens.
You can visualise multiple frames at ones. To do so,
switch the representation to New Cartoon
and in the Trajectory tab modify the Draw Multiple Frames from now
to
0:10:99
. Modify now the colouring method in Trajectory -> Timestep
.
Now, go to frame 0
and create two representation to
visualise water molecules around the protein.
For the protein: colour the secondary structure and use New Cartoon
For the water: select water residues within 3 of protein with a VDW
Draw Style.
Play the trajectory now.
Do your water molecule have three atoms?
Is it the result you expect?
Go in the Trajectory tab and select Update Selection Every Frame and replay the trajectory.
Did something change?
Finally, we can modify my_first_script.tcl
in order to apply it to this new trajectory.
- Change the output file name;
- launch the script with
source my_first_script.tcl
Now plot the two radius of gyration.
Exercise
Downlod a structure from the Protein Data Bank and download a molecule you like (or ask for hints).
Create your custom representation of the protein, highlighting a particular residue in Licorice
, ligands in Licorice
or VDW
.
Bonus
If you manage to complete the exercise, ask for a really small task.
Further readings
Notes
-
Under biological conditions, the N-group is protonated (-NH3+) ↩