We all know that DNA carries information across generations: it is passed from parents to children, and it influences your hair color, your height, how many arms you have, whether you will like cilantro, and whether you are predisposed to diabetes.

All this information is carried in the form of genes, sequences of nucleotides that are used by our cells for the synthesis of proteins, for their regulation, for the organization of tissues, and for many other things. These sequences are not spread randomly across our genome; they are organized in packages so that they are easily accessible when needed.

It is fascinating how this complex molecule can do all of that, how it evolved throughout billions of years, and what it can reveal about us and the other organisms with whom we share this planet.

General structure

I mentioned nucleotides earlier; to understand what they are, we need to dig a bit further into the chemical structure of this powerful molecule.
We can imagine DNA as a string of monomers, called nucleotides, that are chemically very similar: they share the same backbone, composed of a sugar and a phosphate group, and differ in only one piece, the nitrogenous base (which is why nucleotides are often simply called bases). DNA has 4 different bases: two purines, Adenine (A) and Guanine (G), and two pyrimidines, Thymine (T) and Cytosine (C). A single DNA filament is a sequence of nucleotides linked together through their common backbone, which leaves the bases exposed. This is important because in nature DNA is, in the majority of cases, found not as a single-stranded molecule but as a double filament. This is possible because each base can pair with an exposed base on another strand through hydrogen bonds, and the pairing is guided by the slightly different chemical structures of the bases: thymine forms 2 hydrogen bonds with adenine, while cytosine and guanine form 3 (a stronger pairing).
This specificity in the pairing is the key to understanding the complementarity between the 2 DNA filaments: each double-stranded DNA (dsDNA) molecule carries its information on two complementary filaments, one called forward and the other reverse. The fact that the same information is carried by 2 filaments helps us understand how the information is copied, transferred across the cell, and used for protein synthesis. The only thing that I haven’t told you yet is that dsDNA molecules are not straight: due to the arrangement of atoms and the angles of the bonds, they form a right-handed double helix with around 10 base pairs per turn, a structure first described by Watson and Crick in 1953, thanks to the X-ray diffraction analysis performed by Rosalind Franklin.
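The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up, and the code assumes the usual convention that the two filaments run in opposite directions, so the reverse filament is obtained by complementing each base and reading the result backwards.

```python
# Pairing rules described above: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen bonds contributed by each base pair: 2 for A-T, 3 for C-G.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(forward: str) -> str:
    """Return the reverse filament, written in its own reading direction."""
    return "".join(COMPLEMENT[base] for base in reversed(forward.upper()))


def hydrogen_bonds(forward: str) -> int:
    """Count the hydrogen bonds holding the two filaments together."""
    return sum(H_BONDS[base] for base in forward.upper())


forward = "ATGCGT"  # made-up example sequence
print(reverse_complement(forward))  # ACGCAT
print(hydrogen_bonds(forward))      # 15
```

Applying `reverse_complement` twice gives back the original sequence, which is another way of saying that each filament fully determines the other.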

How DNA is managed

DNA molecules can reach huge dimensions. Genomic DNA in particular, which carries all the information for a particular organism, ranges from the 12 million base pairs (bp) of S. cerevisiae, to the 17 billion bp of common wheat, to an outstanding 43 billion bp in the Australian lungfish (among the largest animal genomes sequenced to date). In this sea of information, some sequences have been characterized as proper genes, responsible for the production of specific proteins; other sequences code for RNAs, others have a regulatory function, and some of them (the majority) don’t seem to have a particular function.
Molecules this large require dedicated systems of packing and organization and cannot be left loose inside the cell. Prokaryotes typically have a genome composed of just 1 circular DNA helix. They do not have nuclei, so they arrange the DNA in a specific region of the cytosol called the nucleoid, which can be observed under a microscope as an irregularly shaped globule. It is possible to observe the genomic DNA because the cell organizes it in a supercoiled structure: the filament is tightly packed with proteins, and usually only small regions of it are unwound at a time, to carry out the necessary cellular functions.
Things become more complicated with the even larger genomes of eukaryotic cells, in which the DNA is stored as a chromatin complex: proteins keep the DNA compact inside the nucleus and regulate its activity. The protein-DNA aggregate is very complex and requires hierarchical systems of packing, creating first small, repetitive structures (10 nm beads) and then more complex arrangements in filaments and supercoils.
For lucky organisms that reproduce sexually, like us, the genome composition is even more complex, because cells store multiple copies of the same set of chromosomes: 2 in our case, 6 in the case of common wheat. In our case the two copies come from the two parents, and the actual number of copies (ploidy) depends on the evolutionary history, the type of cell, and the reproduction system adopted by the organism. The complete picture of genome composition and gene regulation is even more complex due to the continuous mutation, shuffling, copying, and rearrangement of sequences during the evolution of the organism.


DNA and RNA: different but similar

DNA is an acronym, as many of you may know; it stands for DeoxyriboNucleic Acid:

  • Deoxyribo-: it refers to the sugar present in the common structure of nucleotides, called 2-deoxyribose, a version of ribose that is lacking the hydroxyl group in position 2.
  • Nucleic Acid: it is an acid because the many phosphate groups in the DNA backbone give up their protons, leaving the backbone with a high number of negative charges. These charged phosphate groups also help keep DNA soluble in aqueous solutions.

Another molecule that is very similar to DNA is RNA. This less stable relative of DNA is involved primarily in the transfer of information from the nucleus (where DNA is kept) to other regions of the cell, for protein synthesis. But it has other roles too: over the years many different types of RNA have been found, each with different regulatory functions in cells. RNA molecules are smaller than genomic DNA and are synthesized through the transcription (copying) of a small region of DNA[4].

The differences between DNA and RNA offer an insight into the relationship between the chemical structure and the function of a biomolecule, and also give hints about the evolutionary processes that gave birth to DNA, and life with it.

The sugar present in the backbone of RNA is ribose. The extra hydroxyl group in position 2 of this sugar causes problems for the stability of the RNA filament: it is bulky, and its oxygen can react with the neighboring phosphate group of the backbone and break the chain. The result is a less stable, more flexible filament that is easily degraded.

The bases are not the same either: in RNA, T is substituted with another pyrimidine, Uracil (U). The T-A pairing is therefore replaced by U-A. The new base is only slightly different, but the difference matters for the long-term preservation of the information in DNA. So the question is: why is U not present in DNA?

To understand that, we need to know that C, present in both RNA and DNA, is subject to spontaneous chemical degradation (deamination) that can turn it into a U base. In living cells, a particular enzyme is able to correct this error by replacing the U with the correct C. This is important because if the base mismatch (G-U) is left in place, the mutation will be kept in the next cycle of replication. So, if U were a normal component of DNA, this “correction enzyme” wouldn’t be able to distinguish between a U present in the original filament and a U coming from the degradation of C. It is also possible that U was initially present in DNA, and that cells adopted methylation as a mechanism to tag the original U bases and distinguish them from the degradation-derived ones, so that over time the methylated U (aka T) took over and became a standard component of DNA.
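The ambiguity described above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how the real enzyme works: the functions `deaminate` and `repair` and the example sequence are hypothetical, and the "repair" simply treats every U it finds as a damaged C.

```python
def deaminate(strand: str, position: int) -> str:
    """Simulate spontaneous deamination: the C at `position` becomes a U."""
    if strand[position] != "C":
        raise ValueError("only cytosine is deaminated in this sketch")
    return strand[:position] + "U" + strand[position + 1:]


def repair(strand: str) -> str:
    """Toy 'correction enzyme': every U is assumed to be a damaged C."""
    return strand.replace("U", "C")


# Case 1: DNA uses T, so any U must be damage and the repair is safe.
dna = "ATGCCGTA"  # made-up example sequence
print(repair(deaminate(dna, 3)) == dna)  # True

# Case 2: a hypothetical DNA that uses U where real DNA uses T.
u_dna = dna.replace("T", "U")  # "AUGCCGUA"
damaged = deaminate(u_dna, 3)  # "AUGUCGUA"
print(repair(damaged))           # ACGCCGCA
print(repair(damaged) == u_dna)  # False: the legitimate U bases were lost
```

In the second case the repair step cannot tell the original U bases from the damaged C, so it corrupts the sequence it was meant to protect.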

Finally, RNA molecules are usually found as single-stranded filaments, because they carry the information taken from 1 strand of DNA, used as a reference, and bring it to the ribosomes, where it is read to synthesize proteins. In other types of RNA, specific self-complementary sequences allow the molecule to fold back on itself, forming loops and particular three-dimensional structures that enable unique RNA-protein interactions.
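As a rough illustration of how an RNA copy is derived from one DNA strand, here is a minimal Python sketch. The function name `transcribe` and the sequences are made up for the example, and the code assumes that the template strand is written in the direction opposite to the one in which the RNA is read, so that the bases can be paired one by one from left to right.

```python
# Pairing used when copying DNA into RNA: U takes the place of T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Build the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


template = "TACGGCATT"  # made-up template strand
rna = transcribe(template)
print(rna)  # AUGCCGUAA

# The RNA carries the same sequence as the other DNA strand, with U for T.
other_strand = "ATGCCGTAA"
print(rna == other_strand.replace("T", "U"))  # True
```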

How it started

The complexity of life on Earth is well explained by the modern theory of evolution: give an organism the capacity to replicate by making imperfect copies of itself and, over countless generations, it will give rise to an entire ecosystem of organisms specialized in particular tasks, shaped by environmental pressure.

However, the starting point of life is still unknown: at some point in history (around 4 billion years ago) a living organism had to arise from an abiotic environment. The challenge of explaining the origin of life can be simplified by the “snowball up the hill” effect: it is necessary to explain the formation of some sort of genetic code that, thanks to its ability both to replicate and to catalyze chemical reactions, could over time lead to the formation of some sort of bacteria-like cell.

Experiments have shown that simple organic molecules could form spontaneously on the prebiotic Earth, for example in the atmosphere thanks to lightning, or in underwater vents. Another possible explanation for the presence at that time of organic compounds such as amino acids, organic acids, and nitrogenous bases is the meteorite bombardment to which Earth was subjected. These hypotheses are difficult to prove, and it is also possible that more than one process contributed to the formation of an organic-rich environment in the prebiotic era.

So, did DNA arise from spontaneous reactions on Earth? Probably not: its structure is complex and far too specialized for carrying information. The genetic polymer that we are looking for must be less complex, and RNA, the simpler cousin of DNA, is a possible candidate. The origin of the genetic code also faces another problem: without pre-existing genetic information, where did the enzymes necessary for replication come from? A possible explanation is given by RNA itself, since it has been observed that RNA can have catalytic activity and work as an enzyme. The hypothesis that RNA was in fact the first genetic polymer still needs to explain how monomers that are so complex, and that require the formation of regiospecific bonds between 3 different chemical moieties, could be present at that time. Supporters of this hypothesis link the formation of long RNA molecules from activated nucleotides in the environment to geological events and to interactions with clays that could act as catalysts of this process[2].


DNA and potentially informational oligonucleotide analogs. (a) DNA. (b) Pyranosyl analog of RNA. (c) Peptide nucleic acid[2].

But RNA can also be considered a pretty complex molecule, and it is possible that the original genetic code was not a nucleic acid at all: some scientists speculate that other types of polymers could work as a genetic code while being simpler and more probable in a prebiotic environment. Amino acids and peptide bonds seem more plausible as components of the first genetic code, so peptide nucleic acid (PNA) could be a candidate ancestor of DNA. From this genetic code, information could then be passed to a molecule of RNA (as has been observed), and RNA would then act as the direct ancestor of DNA[1].
The search for the original genetic code is far from complete. Prebiotic chemists are continuously evaluating different scenarios, taking into consideration the environmental conditions on Earth, the probability of obtaining particular building blocks, and how they could be assembled using different kinds of bonds. The number of possibilities is very high and the speculations that can be made are exciting, so I think that I will explore the subject of the origin of life more in depth in the future.
I find this kind of research thrilling because we are moving into unexplored territory in which all kinds of origin stories are possible. The complexity of this question lies in the difficulty of establishing the specific environmental conditions on Earth when life started, and in the huge number of possible molecules, monomers, and chemical bonds that could have arisen in the prebiotic era.