CRISPR has revolutionized gene editing in the last decade: its simplicity and low cost have led to a much broader application of DNA manipulation technologies. It quickly replaced earlier tools such as TALENs and zinc-finger nucleases, which were used to modify specific sequences in only a few model organisms.

Derived from a form of adaptive immunity against phages that evolved in a wide range of prokaryotic species, CRISPR-Cas systems rely on the ability of Cas proteins to use a guide RNA molecule to target specific sequences in the DNA. A nuclease activity then introduces double-strand breaks (DSBs) or staggered “sticky-end” cuts. The breaks are repaired by the cell's own machinery, which either introduces small modifications such as indels or inserts new sequences, leading to gene knock-out or silencing.

In the early 2000s, specific repetitive sequences were found in the genomes of different bacteria, such as Streptococcus thermophilus (used in the production of yogurt), interspersed with sequences that matched viral DNA. It took a few years to develop a model of how these sequences evolved, but it was eventually proven that in S. thermophilus they are involved in immunity against viruses. Further studies characterized the structure of CRISPR arrays, which work as libraries of viral sequences available for the production of guide RNAs. Other aspects, such as DNA acquisition into CRISPR arrays and the different families of Cas proteins, are still under study because of their high variability across species and biological systems. The system was then engineered for laboratory use, and guide RNAs were designed to modify eukaryotic cells for the first time in 2013, opening the CRISPR gene editing era.


3D structure of the Cas9 protein from Streptococcus pyogenes (PDB: 6o0x, visualized in ChimeraX). The protein surface is shown in blue, the sgRNA molecule in red, and the target strand of the original dsDNA molecule in green. The bottom image shows the opposite side of the protein, where the tracrRNA portion, arranged in loop structures, is visible.

Mechanism of immunity of Class 2 CRISPR systems

In this post, I will focus mainly on the Class 2 CRISPR system, because it was one of the first to be studied and engineered for gene editing and because its activity derives from a single multi-domain protein. The Cas9 protein is composed of a recognition lobe, which uses the guide RNA to target specific DNA sequences, and a nuclease domain, responsible for introducing DSBs. The overall mechanism of immunity consists of 4 main steps.

The DNA sequences in the organism’s genome that are needed for the synthesis of guide RNAs are called spacers (the matching sequences in the invading DNA are called protospacers). In a CRISPR array, multiple spacers are separated by repetitive sequences, hence the acronym CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats. From this library, the cell can transcribe multiple guide RNAs that are used to silence viral genes during an infection. The process of (i) acquisition of protospacers by the cell is still not completely understood and depends on the type of CRISPR system. In Class 2 systems that use Cas9, for example, the Cas9 recognition domain is responsible for selecting the protospacer from a non-genomic DNA strand (viral DNA). Once a protospacer has been selected, its integration into the CRISPR array is believed to rely on the recruitment of other Cas proteins (Cas1 and Cas2). The selection is not random: Cas9 uses short 3-nt sequences called PAMs (protospacer adjacent motifs) to identify candidate protospacers. The PAM also plays an important role in guiding the nuclease activity and in avoiding self-targeting by the CRISPR system. The selected protospacer is added to the CRISPR array and the information is passed on to the next generation, so that, over the evolution of a particular species, a specific CRISPR array develops as a record of its ancestors’ infections.
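The layout of a CRISPR array described above (stored viral sequences separated by copies of the same repeat) can be sketched in a few lines of Python. This is only a toy model: the repeat, the stored sequences, the leader and the trailer are all made-up strings, and the function name `extract_spacers` is hypothetical. Real arrays have imperfect repeats and are usually analysed with dedicated tools.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences that sit between consecutive copies of `repeat`."""
    pieces = array.split(repeat)
    # pieces[0] is whatever precedes the first repeat (the leader) and
    # pieces[-1] is whatever follows the last repeat; neither is a stored sequence.
    return [p for p in pieces[1:-1] if p]


# Toy data: every sequence below is invented for illustration only.
REPEAT = "GTTCCAATAAGC"
STORED = [
    "ACGGTCATTGCAAGCTTAGC",
    "TTGACCGTAGGCTAACGATC",
    "CATGCGTTAACCGGTATCGA",
]
LEADER, TRAILER = "ATATAT", "TTTT"

# leader - repeat - seq1 - repeat - seq2 - repeat - seq3 - repeat - trailer
ARRAY = LEADER + REPEAT + REPEAT.join(STORED) + REPEAT + TRAILER

if __name__ == "__main__":
    for i, spacer in enumerate(extract_spacers(ARRAY, REPEAT), start=1):
        print(i, spacer)
```

Splitting on the exact repeat string is the simplest possible choice; it works here only because the toy repeat never occurs inside the stored sequences.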

The CRISPR array is located downstream of the Cas9 gene, so that the two are co-expressed when necessary. Once a spacer is transcribed, it undergoes a process of (ii) maturation before it can be used by Cas9. In the natural immune system, Cas9 uses 2 RNA molecules:

·        crRNA: a short RNA molecule with a 20-nt guide segment that directs the recognition of DNA regions. It derives from the maturation of the spacers transcribed from the CRISPR array. The complementary recognition is performed by the 5’ end of the molecule.

·        tracrRNA: short for trans-activating CRISPR RNA, it is complementary to the 3’ end of the crRNA and forms loop structures necessary for Cas9 activation and other functions. In artificial systems, the 2 RNA molecules are fused through a linker to create a single molecule called single guide RNA (sgRNA).

The maturation of the crRNA depends on the CRISPR family: in some cases specific endonucleases are involved, while in type II systems, to which Cas9 belongs, both the tracrRNA and the Cas9 protein play a role.
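The fusion of crRNA and tracrRNA into a single artificial guide, mentioned in the list above, can be illustrated with a small string model. This is a sketch under loud assumptions: the repeat-derived segment and the linker are placeholder strings, not a validated scaffold, the tracrRNA is reduced to the stretch that pairs with the crRNA, and all function names are hypothetical. The guide is assumed to have the same sequence as the chosen 20-nt DNA target, written as RNA.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(dna: str) -> str:
    """Write a DNA sequence as RNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Return the RNA strand that would base-pair with `rna`."""
    return rna.translate(COMPLEMENT)[::-1]


def build_sgrna(target_dna: str, repeat_rna: str, linker_rna: str) -> str:
    """Join guide, crRNA repeat segment, linker and pairing tracrRNA segment."""
    guide = transcribe(target_dna)
    # Toy tracrRNA: only the part complementary to the crRNA repeat segment.
    tracr_part = reverse_complement_rna(repeat_rna)
    return guide + repeat_rna + linker_rna + tracr_part


# Placeholder sequences, invented for illustration only.
TARGET_DNA = "ACGTACGTACGTACGTACGT"
REPEAT_RNA = "GUUUUAGA"
LINKER_RNA = "GAAA"

SGRNA = build_sgrna(TARGET_DNA, REPEAT_RNA, LINKER_RNA)

if __name__ == "__main__":
    print(SGRNA)
```

In this toy molecule the repeat segment and the tracrRNA segment are reverse complements of each other, so they can fold back across the linker into a hairpin, which is the idea behind joining the two natural RNAs into one.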


Scheme of the interaction of the Cas9 and Cas12a proteins with DNA.


After the 2 RNA molecules form a duplex, or the artificial sgRNA is expressed, the Cas9 protein can start working and perform the (iii) targeting of the DNA. First, an R-loop is formed in the region targeted by the crRNA: the guide pairs with the target strand and displaces the other one. This facilitates Cas9 activity because the normal double-helix structure of DNA is not very accessible to nucleases.
The DNA is opened using the PAM sequence as a reference; the PAM can lie either upstream or downstream of the target sequence, depending on the Cas type. For Cas9 the PAM is a G-rich 3-nt sequence (5’-NGG-3’ for S. pyogenes Cas9). Finally, the DSB is produced, and the repair machinery of the cell may introduce a mutation. Cas9 cuts close to the PAM, so the mutation may destroy the PAM or the adjacent target sequence, avoiding a second cycle of Cas activity (which could be a problem in terms of efficiency in artificial systems).
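The PAM-based targeting described above can be turned into a simple search for candidate Cas9 sites. The sketch below assumes the commonly cited 5'-NGG-3' PAM of S. pyogenes Cas9 placed immediately after a 20-nt target, and it assumes the cut falls 3 nucleotides before the PAM. It scans only the strand it is given, and the names `Target` and `find_cas9_targets` are hypothetical.

```python
import re
from typing import NamedTuple


class Target(NamedTuple):
    start: int        # 0-based index of the first target nucleotide
    protospacer: str  # sequence matched by the guide
    pam: str          # the 3 nucleotides right after the target
    cut_index: int    # assumed cut position: 3 nt before the PAM


def find_cas9_targets(dna: str, guide_len: int = 20) -> list[Target]:
    """List every `guide_len`-nt site followed by an NGG PAM on this strand."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [
        Target(m.start(), m.group(1), m.group(2), m.start() + guide_len - 3)
        for m in pattern.finditer(dna)
    ]


# Toy sequence, invented for illustration only.
TOY_DNA = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"

if __name__ == "__main__":
    for hit in find_cas9_targets(TOY_DNA):
        print(hit)
```

To cover both strands, the same function could be run on the reverse complement of the sequence. A real guide design would also score each candidate for off-target matches, which this sketch does not attempt.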

The (iv) introduction of mutations can be guided by 2 possible mechanisms:

1.      Non-Homologous End Joining (NHEJ): this is one of the most probable events. It causes the insertion of new nucleotides or the deletion of existing ones. These mutations, generally referred to as indels, vary in size and their introduction is semi-random. This mechanism disrupts a gene sequence by shifting the reading frame or by introducing stop codons (gene knock-out). Although NHEJ seems to be the most common repair mechanism, it often repairs the break accurately, which lowers the efficiency of indel introduction by Cas9.

2.      Homology-Directed Repair (HDR): it occurs when small DNA oligos with regions homologous to the target are present near the DSB. They can be used as templates or be inserted directly, resulting in the addition of new sequences (in the case of dsDNA). The mutation is difficult to guide precisely because of the complexity of the mechanisms involved, which results in very low efficiency (around 5% in the case of Cas9). Higher efficiency could be achieved with systems that introduce staggered cuts with sticky ends rather than blunt DSBs, like Cas12a.
 
The possible applications of CRISPR are nearly limitless. It is now a standard procedure for gene editing and manipulation and, even if it was initially developed for gene therapy, it can be applied in all fields of biotechnology research: the production of recombinant organisms, gene therapies, genomic studies, and industrial production. The technology is not perfect yet and there is room for improvement, in particular in reducing off-target effects and raising the low efficiency of mutations. These issues could be addressed by new sgRNA designs, by developing new Cas proteins, and by shedding light on the still unknown mechanisms involved in CRISPR genetic manipulation.