DNA, often referred to as the blueprint of life, encodes the diversity and function of all living organisms. Determining the DNA sequences of living organisms not only gives an overview of their genetic makeup but also provides information about their function.
Nonetheless, it was not easy to determine the genome sequences of all the diversity around us, especially with the technologies available before 2010.
Therefore, sequencing was prioritized for humans and only a few other organisms. The pioneering DNA sequencing methods developed by Maxam and Gilbert and by Sanger were powerful and popular, but they were neither high throughput nor economical. It was therefore necessary to develop new economical, high-throughput methods that could sequence this biodiversity and consequently provide better insight into its possible functions.
New methods were developed and commercialized by Roche Life Sciences, Thermo Fisher Scientific, Illumina, and Applied Biosystems. These methods, generally referred to as next-generation sequencing (NGS) methods, have revolutionized DNA sequencing.
Many NGS-based sequencing platforms have been developed, including pyrosequencing, Ion Torrent technology, the Illumina/Solexa platform, and SOLiD (Sequencing by Oligonucleotide Ligation and Detection). Further optimization has led to innovative third- and fourth-generation platforms such as single-molecule real-time (SMRT) sequencing by PacBio and nanopore sequencing. As a consequence, the number of published genomes and other genome-based studies has increased sharply since 2012.
This has made it feasible even to consider sequencing the genomes of individuals. Furthermore, scientists are now looking for third-generation sequencers that may differ significantly from the sequencers that are currently available.
Since Frederick Griffith's discovery of bacterial transformation in 1928 and the subsequent identification of DNA as the genetic material (Avery et al. 1944), mankind has continued to advance DNA-based technology and to unravel the blueprint of life. One of the most important advances in understanding the blueprint of life has been the improvement of sequencing technologies.
The initial method developed by Maxam and Gilbert could sequence only a few nucleotides; it was followed by the chain-termination method of Sanger. During the Human Genome Project (HGP), completed in 2003 (National Human Genome Research Institute (NHGRI), NIH 2003), it was realized that high-throughput and economical methods were required to complete the HGP and other future projects.
Later, next-generation sequencing methods were developed that made sequencing high throughput and generated data faster than ever. Since the development of these techniques, the genomes of a large number of organisms have been published. These approaches have also provided better insight into complex microbial environments such as the gut microbiome. Scientists are still working to make these methods more economical and to increase their throughput.
The Sanger method was essentially the only sequencing method in use for three decades, despite its high cost and long turnaround time. Next-generation sequencing (NGS) technologies are emerging as among the most economical, rapid, and high-throughput methods of DNA sequencing. A number of platforms based on these technologies are currently used for sequencing, such as:
- Roche/454 sequencing
- Ion Torrent: Proton/PGM sequencing
- Illumina (Solexa) sequencing
- SOLiD sequencing
Whereas the Sanger technique is considered the first-generation method, these platforms are recognized as second-generation tools. The technology was first reported in 2005 with Roche’s 454 platform and was commercialized as a technology capable of generating high-throughput sequence data at a much lower cost than first-generation sequencing technologies (Qiang-long et al. 2014). NGS offers many benefits over traditional sequencing methods as well as microarray expression profiling.
The basic advantages of NGS technology are (1) high throughput (the generation of many short reads in parallel), (2) speed, (3) low cost, (4) a wide range of detection, and (5) discreteness (the results are generated without noise and signal saturation).
The sequence data produced by second-generation sequencing comprise billions of short DNA sequences (reads) that range from 50 to 300 nt in length. These sequences require de novo assembly before analysis. Short-read sequencing methods fall into two broad categories: (1) sequencing by ligation (SBL) and (2) sequencing by synthesis (SBS) (Goodwin et al. 2016; Myllykangas et al. 2012).
Sequencing by ligation (SBL) exploits the mismatch sensitivity of DNA ligase to determine the underlying sequence of nucleotides in a given DNA molecule. Sequencing by synthesis (SBS) uses DNA polymerase or ligase enzymes to extend many DNA strands in parallel. Nucleotides or short oligonucleotides are either provided one at a time or modified with identifying tags, so that the base type of the incorporated nucleotide or oligonucleotide can be recognized as the extension proceeds.
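The "one nucleotide at a time" variant of SBS can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not any vendor's chemistry or base-calling software. The function names `flow_sequence` and `decode_flows` and the flow order `TACG` are hypothetical choices made for illustration. The signal is assumed to be ideal and exactly proportional to the number of bases incorporated. The template is written in the order in which the polymerase reads it, so strand orientation is ignored.

```python
# Toy model of flow-based sequencing by synthesis (illustrative only).
# Assumptions: ideal, noise-free signal; hypothetical flow order "TACG";
# the template is given in the order the polymerase reads it.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flow_sequence(template, flow_order="TACG", max_flows=400):
    """Simulate nucleotide flows and return (nucleotide, count) per flow.

    In each flow a single nucleotide type is offered. The polymerase
    incorporates it as many times as it is complementary to the next
    template bases, so a homopolymer run yields a count above 1.
    """
    # The growing strand is complementary to the template, base by base.
    target = [COMPLEMENT[base] for base in template]
    position = 0
    flows = []
    flow_index = 0
    while position < len(target) and flow_index < max_flows:
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        while position < len(target) and target[position] == nucleotide:
            incorporated += 1
            position += 1
        flows.append((nucleotide, incorporated))
        flow_index += 1
    return flows


def decode_flows(flows):
    """Rebuild the synthesized strand from the per-flow counts."""
    return "".join(nucleotide * count for nucleotide, count in flows)


if __name__ == "__main__":
    flows = flow_sequence("AAGCT")
    print(flows)
    print(decode_flows(flows))
```

For the template `AAGCT`, the first flow (T) records two incorporations because of the `AA` run, the following A flow records none, and decoding the flows gives the complementary strand `TTCGA`. Real instruments measure noisy signal intensities, so the step from signal to homopolymer length is far less clean than in this sketch.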