The three-dimensional structure of complex biomolecules is responsible for their function, such as the catalytic activity of enzymes, the specificity of antibodies, or the recognition performed by particular receptors. Proteins, for example, are synthesized as linear chains of monomers (amino acids), and how the chain folds, first locally and then globally, directly affects their biological function. For this reason, activity is often lost after denaturation, which unfolds the protein chain.

The conformation of these linear polymers is stabilized by interactions between monomers that may be near or far apart in the sequence; these links can be weak, like H-bonds or hydrophobic interactions, or very strong, like covalent bonds.

Creating an atomic model of the structure with computational tools helps us understand first which bonds are involved in stabilizing the protein conformation, then the possible interactions with other molecules (proteins, nucleic acids, substrates, inhibitors), and finally allows us to design new proteins or to modify preexisting ones.

X-ray crystallography

These days I’m trying to learn how to use the ChimeraX program, which is used to visualize and study the 3D structure of molecules and to create nice images and animations. With this program it is possible to visualize the atomic model of the molecule, its surface, and other physicochemical characteristics, like charge distribution, hydrophobicity, mobility of chains… The model can be derived from different types of analysis, for example X-ray crystallography, and is then deposited in dedicated databases.
X-ray crystallography is one of the most used techniques to identify the molecular structure of complex proteins. The analysis starts by producing the protein of interest, usually with recombinant systems, to obtain a highly pure and concentrated solution. From this solution, crystals are grown, recovered, and subjected to analysis.
A single regular crystal is mounted on a goniometer, a device that orients the crystal at precise angles. The crystal is then bombarded with a focused X-ray beam and, depending on the molecular structure of the sample, the beam is modified in intensity and direction. The detector behind the sample then captures an image of the diffracted X-rays.
The measurement is repeated several times with the crystal in different orientations. The information from the different exposures is combined with previous knowledge of the protein sequence to come up with a molecular model that fits the experimental density map.

But how exactly can we derive the picture of a molecule starting from a physical sample? And why are crystals and X-rays used?

First, we need to know that X-rays interact with the electrons that surround the nucleus of atoms, and that their wavelength (around 1 Å) is comparable to the distances between atoms. In the scattering that matters for diffraction, the X-ray does not move an electron to a different energy level; it is re-emitted with the same wavelength, so no energy is lost (elastic scattering). The X-rays are emitted in different directions from the scatterer (atom), creating spherical waves that propagate through space. A regular array of scatterers produces a regular array of spherical waves that interfere with each other, sometimes canceling each other out (destructive interference), other times creating waves with higher intensity (constructive interference).
This is why crystals are used: they provide regularly spaced molecules, all in the same orientation. The intensity and pattern of the radiation that hits the detector depend on the position of the scatterers. To reconstruct the structure of the crystal that caused the diffraction of the incident X-rays, we need to dig a bit into Bragg’s law of diffraction.


Bragg’s law, nλ = 2d sin θ, relates three variables: the angle between the incident beam and the crystal planes (θ), the wavelength of the radiation (λ), and the distance between adjacent planes of the crystal lattice (d). When these parameters satisfy the equation, the diffracted X-rays coming from the sample create a clear peak due to the above-mentioned constructive interference. The “n” in the equation is a positive integer called the order of diffraction: constructive interference occurs only if the waves scattered by successive planes are delayed, with respect to each other, by a whole number of wavelengths.
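Bragg's condition, nλ = 2d sin θ, can be checked numerically. The following is only a minimal sketch: the function names, the 3.0 Å plane spacing, the number of planes, and the choice of the Cu K-alpha wavelength are assumptions made for illustration. It computes the first-order angle and then sums the waves reflected by a stack of planes, to show that the intensity is high at that angle and collapses a little away from it.

```python
import cmath
import math


def bragg_angle_deg(d, wavelength, n=1):
    """Return the angle theta (degrees) that satisfies n*lambda = 2*d*sin(theta)."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("no diffraction: n*lambda exceeds 2*d")
    return math.degrees(math.asin(s))


def scattered_intensity(theta_deg, d, wavelength, planes=50):
    """Sum the waves reflected by `planes` parallel planes and return the intensity.

    Each successive plane adds a path difference of 2*d*sin(theta), which
    corresponds to a phase shift of 2*pi*path/wavelength.
    """
    path = 2.0 * d * math.sin(math.radians(theta_deg))
    phase = 2.0 * math.pi * path / wavelength
    total = sum(cmath.exp(1j * k * phase) for k in range(planes))
    return abs(total) ** 2


# Assumed example values (not from the article).
WAVELENGTH = 1.5418  # angstrom, Cu K-alpha radiation
SPACING = 3.0        # angstrom, hypothetical distance between planes

theta1 = bragg_angle_deg(SPACING, WAVELENGTH, n=1)
on_peak = scattered_intensity(theta1, SPACING, WAVELENGTH)
off_peak = scattered_intensity(theta1 + 1.0, SPACING, WAVELENGTH)

print(f"first-order Bragg angle: {theta1:.2f} degrees")
print(f"intensity at the Bragg angle: {on_peak:.0f}")
print(f"intensity 1 degree away:      {off_peak:.1f}")
```

With more planes in the stack the peak becomes sharper, which is one way to see why a well-ordered crystal gives clean diffraction spots.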


From images to the density map

Even when the diffraction patterns are collected correctly, creating a 3D model from a series of 2D images is not trivial. To derive the coordinates of the atoms in the structure, both the amplitude and the phase of each diffracted wave need to be known. The detector records only intensities, from which the amplitudes can be calculated, while the phases are lost and have to be recovered in other ways (the so-called phase problem).
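Why both amplitude and phase are needed can be illustrated with a toy calculation. This is a sketch under strong simplifying assumptions: a one-dimensional "unit cell" with two made-up Gaussian atoms, and a plain discrete Fourier transform standing in for the real crystallographic calculation. The density is rebuilt once with the correct phases and once with amplitudes only.

```python
import cmath
import math

N = 32  # grid points along a one-dimensional "unit cell"


def toy_density(n=N):
    """Hypothetical 1D electron density: two Gaussian 'atoms' of different weight."""
    atoms = [(8.0, 6.0), (20.0, 8.0)]  # (position, weight), made up for illustration
    return [sum(w * math.exp(-0.5 * (x - p) ** 2) for p, w in atoms) for x in range(n)]


def structure_factors(rho):
    """Forward transform: F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N)."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def density_from(factors):
    """Inverse transform: rho(x) = (1/N) * sum_h F(h) * exp(-2*pi*i*h*x/N)."""
    n = len(factors)
    return [(sum(factors[h] * cmath.exp(-2j * math.pi * h * x / n)
                 for h in range(n)) / n).real
            for x in range(n)]


rho = toy_density()
F = structure_factors(rho)

# What the measured intensities give access to: amplitudes only.
amplitudes = [abs(f) for f in F]

with_phases = density_from(F)              # amplitudes + correct phases
without_phases = density_from(amplitudes)  # amplitudes with every phase set to zero

err_with = max(abs(a - b) for a, b in zip(rho, with_phases))
err_without = max(abs(a - b) for a, b in zip(rho, without_phases))

print(f"max error with phases:    {err_with:.2e}")
print(f"max error without phases: {err_without:.2f}")
```

With the phases, the original density is recovered to within rounding error; without them, the map puts its main peak in the wrong place even though every amplitude is correct.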

Diffraction pattern of crystallized buffalo (Bubalus bubalis) hemoglobin.

Often some types of “intruders” are added to the crystal to help solve the 3D structure, thanks to the distinctive signal they produce when interacting with X-rays. For this reason, it is common to find heavy metal ions or selenium (Se) in the crystal. Se can be introduced into the protein itself during expression: due to its similarity with sulfur (S), it can be supplied in organic form, as the amino acid analogue selenomethionine, to a microorganism incapable of producing methionine (an S-containing amino acid), which then incorporates it during protein synthesis. Another possibility is to fuse a well-known structure to the protein of interest as an additional domain (chimeric protein) to help solve the structure.
In any case, components added to help the structural analysis should not modify the native protein structure.
The result of the diffraction pattern analysis is a 3D density map that provides information on the electron distribution in the molecule. The theoretical model of the protein is then fitted into this density map, and the final selected structure is the one in which the amino acid side chains are correctly oriented.
Depending on the resolution, it is possible to find in the map the volume occupied by each atom. H atoms are often not resolved in the density map and are added computationally, using possible H-bonds as a guide for the correct orientation.


Mesh representation of buffalo (Bubalus bubalis) hemoglobin (PDB:3CY5) X-ray crystallographic density map, with fitted molecular model. Analyzed with ChimeraX

Problems in crystal formation

The procedure that leads to the molecular model of a particular protein is linear: it starts with the expression of large quantities of protein, followed by purification, crystallization, and the collection and interpretation of the data. One of the bottlenecks of the procedure is obtaining high-quality, diffraction-grade protein crystals.
In classical X-ray crystallography, the recovered crystal needs to be a single crystal with one continuous lattice (it should grow from an initial aggregate of a few molecules) and needs to be of good size. With small organic molecules, the crystallization process is often simple: researchers start with a saturated solution of the chemical and, by slowly removing the solvent or by reducing the solubility of the compound, let nucleation of crystals occur. Small seed crystals are then selected and allowed to grow in fresh solutions.
With proteins and other large biomolecules, we face problems due to the large number of intra- and intermolecular interactions. Some of the parameters that can affect the formation of crystals are the pH, the presence of salts and other organic molecules, the temperature, and the protein concentration. Because of this large number of variables, large quantities of protein are often necessary to cover all the trials, and in practice it is not possible to test every condition to find the best crystallization procedure. So more attention has been put into high-throughput procedures that allow researchers, with the aid of robotic and automated systems, to obtain suitable crystals in a short time while minimizing resources.
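How quickly the number of trials grows can be seen by enumerating a screening grid. In this rough sketch every value (pH steps, salt and precipitant levels, temperatures, protein concentrations, and the 1 μl of protein solution per trial) is a hypothetical choice, not a recommended screen.

```python
from itertools import product

# Hypothetical screening grid: every value below is an assumption for illustration.
ph_values = [4.5, 5.5, 6.5, 7.5, 8.5]
salt_molar = [0.1, 0.2, 0.5, 1.0]
precipitant_percent = [5, 10, 15, 20, 25, 30]
temperature_c = [4, 20]
protein_mg_per_ml = [5, 10, 20]

# One trial for every combination of the five parameters.
conditions = list(product(ph_values, salt_molar, precipitant_percent,
                          temperature_c, protein_mg_per_ml))

DROP_VOLUME_UL = 1.0  # protein solution used per trial, assumed

# Protein consumed per trial = concentration * volume; mg/ml * ul gives micrograms.
protein_used_ug = sum(conc * DROP_VOLUME_UL for *_, conc in conditions)

print(f"number of conditions: {len(conditions)}")
print(f"protein needed: {protein_used_ug / 1000:.2f} mg")
```

Even this coarse grid of five parameters gives 720 conditions, and adding a single new parameter with a few levels multiplies that number again, which is why an exhaustive search is not practical.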

Scheme of the most used crystallization procedures for protein analysis

Vapor diffusion is one of the classic techniques. It is performed in multi-well plates, in which a drop (1-2 μl) of the diluted protein solution is placed in a small closed chamber together with another solution that acts as crystallization buffer. The plate is then incubated in specific conditions to allow the drop to equilibrate with the buffer in the chamber: the water from the protein drop slowly evaporates, and the concentration of protein and precipitant increases until crystals form.
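The concentration effect of vapor diffusion can be estimated with a simple mass balance. The sketch below is an idealization resting on assumptions not stated in the text: only water leaves the drop, the buffer reservoir is large enough that its concentration stays constant, and equilibrium is reached when the precipitant concentration in the drop matches the reservoir. The function name and the example numbers are hypothetical.

```python
def equilibrated_drop(protein_conc, drop_precipitant, reservoir_precipitant, drop_volume_ul):
    """Idealized end point of a vapor-diffusion experiment.

    Assumes that only water leaves the drop, that the reservoir concentration
    does not change, and that the drop stops shrinking when its precipitant
    concentration equals that of the reservoir.
    """
    if not 0 < drop_precipitant <= reservoir_precipitant:
        raise ValueError("drop must start at or below the reservoir concentration")
    shrink = drop_precipitant / reservoir_precipitant  # final volume / initial volume
    final_volume = drop_volume_ul * shrink
    final_protein = protein_conc / shrink  # same amount of protein in less water
    return final_volume, final_protein


# Assumed example: a 2 ul drop made of 1 ul protein (20 mg/ml) + 1 ul reservoir (2.0 M salt),
# so the drop starts at 10 mg/ml protein and 1.0 M salt.
volume, protein = equilibrated_drop(protein_conc=10.0, drop_precipitant=1.0,
                                    reservoir_precipitant=2.0, drop_volume_ul=2.0)

print(f"final drop volume: {volume:.1f} ul, protein: {protein:.1f} mg/ml")
```

In this idealized case the drop loses half of its volume and the protein concentration doubles; a real experiment approaches this end point slowly, and crystals may appear before it is reached.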

Another common technique (microbatch crystallization) can work with even smaller volumes of protein solution and uses different types of oil to slow down the evaporation process, which at those volumes (nl) would otherwise occur too quickly. It requires multi-well plates and robotic deposition of the protein droplet under oil. In this way the evaporation, and therefore the formation of crystals, can take weeks.

Large-scale screening technologies able to test thousands of conditions per day, and the development of crystallographic technologies able to use microcrystals, are reducing the time and resources necessary to create accurate, high-resolution molecular models of complex biomolecules.