The breakthrough technological leap in DNA sequencing
In this series of posts on precision medicine, we have repeatedly mentioned DNA, DNA sequencing, genome sequencing, and massively parallel sequencing (NGS), but what do these terms really mean?

What is DNA sequencing?

Simply put, sequencing is the technique that converts a biological sample (such as a biopsy, blood, or saliva) into data that a computer can analyze. We go from having a piece of tissue to obtaining a text file that contains the "instruction manual" of that person's cells, that is, the exact sequence of their DNA. This sequence, which we can think of as a plain text file, is the input for the genomic testing methods and procedures for precision medicine that we discussed in our previous article. This time we will look at the technical differences between the ways of sequencing that genomic material, so that it can then be analyzed for patterns or variants that help us advance precision medicine.

DNA, or deoxyribonucleic acid, is the molecule that contains the genetic information of all living beings. It consists of four nucleotides, each represented by a letter: A (adenine), T (thymine), C (cytosine), and G (guanine). These letters are organized in a double helix structure, where each nucleotide on one strand pairs with its complementary nucleotide on the other: A with T and C with G, forming the so-called base pairs. Deciphering the sequence of these letters allows us to understand how genes work, how they are regulated, and how they influence health and disease.

The ability to sequence DNA has led to great advances in our knowledge of genome organization and function. Thanks to these techniques, scientists have identified genes responsible for hereditary diseases, developed targeted therapies in precision medicine, and reconstructed the evolution of species over time.
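The pairing rule described above (A with T, C with G) is easy to express in a few lines of code. The following is a minimal Python sketch, not taken from any sequencing tool; the names `PAIRS`, `complement`, and `reverse_complement` are chosen here for illustration only.

```python
# Base-pairing rule from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base that pairs with each position of the strand."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand):
    """Return the opposite strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]

example = "ATGCGT"
print(complement(example))          # TACGCA
print(reverse_complement(example))  # ACGCAT
```

Because the two strands of the helix run in opposite directions, the opposite strand is usually written as the reverse complement rather than the plain complement.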
Not all sequencing technologies are the same, however: depending on the technique used, longer or shorter DNA fragments can be read, which influences the accuracy and usefulness of the analysis. Below, we explore the different generations of sequencing techniques.

First-generation sequencing: the Sanger method

Historically, the most widely used method for sequencing DNA has been the one developed by Sanger and his team. In this procedure, the DNA molecule whose sequence is to be determined is converted into single strands, which are used as templates to synthesize a series of complementary strands. Each of these strands stops at a random position, but always at a specific, known nucleotide. This produces a series of DNA fragments that are separated electrophoretically by size, and whose analysis reveals the DNA sequence.

In the first step of this reaction, the DNA is heated to denature it, forming single strands. The single-stranded DNA is mixed with primers that hybridize to the 3' end of this DNA. The single-stranded DNA sample bound to the primer is distributed among four tubes. In the next step, the four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) are added to each tube. In addition, each tube receives a small amount of one modified deoxyribonucleotide, called a dideoxynucleotide (ddATP, ddCTP, ddGTP, or ddTTP, a different one in each tube). Dideoxynucleotides have a 3'-H group instead of a 3'-OH group. To make the fragments detectable, one of the deoxyribonucleotides or the primer is radioactively labeled.

DNA polymerase is then added to each tube, and the primer is elongated in the 5'-3' direction, forming a strand complementary to the template. During DNA synthesis, the polymerase occasionally inserts a dideoxynucleotide instead of a deoxyribonucleotide into the growing DNA chain. Since the dideoxynucleotide lacks a 3'-OH group, it cannot form a phosphodiester bond with the next nucleotide, and DNA synthesis stops.
🧬 For example, in the tube to which ddATP has been added, the polymerase occasionally inserts ddATP instead of dATP, causing chain elongation to stop. In the other tubes, the reactions end in a C, a G, or a T respectively.

The DNA fragments from each reaction tube (one for each dideoxynucleotide) are separated in adjacent lanes by gel electrophoresis. The result is a series of bands that form a ladder pattern, which is visualized by exposing X-ray film to the gel. The nucleotide sequence is read directly from the bottom to the top of the gel, which corresponds to the 5'-3' sequence of the DNA strand complementary to the template.

DNA sequencing in large-scale genome sequencing projects has been automated and uses machines that can sequence several hundred thousand nucleotides each day. In this procedure, each of the four dideoxynucleotide analogs is labeled with a fluorescent dye of a different color, so that the chains ending in adenine are labeled with one color, those ending in cytosine with another, and so on. All four labeled dideoxynucleotides are added to the same tube. After primer extension by DNA polymerase, the reaction products are loaded into a single lane of a gel. The gel is scanned by a laser, causing each band to emit fluorescence of a different color. The sequencing machine has a detector that reads the color of each band and determines whether it represents an A, a T, a C, or a G. This data is represented as colored peaks, each corresponding to a nucleotide in the sequence.

Image: the Sanger sequencing workflow.

🧬 This is the simplest and most universal sequencing method. In second- and third-generation sequencers, the sequencing chemistry differs from one manufacturer to another, so we won't go into detail here (you can breathe easy).
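The chain-termination logic of the Sanger method can be sketched as a small, idealized Python simulation. It assumes that termination happens at least once at every possible position and ignores the primer length; the function names and the toy template are hypothetical, chosen only for illustration.

```python
# Idealized Sanger simulation: one tube per dideoxynucleotide (ddNTP).
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesized_strand(template_3_to_5):
    """Strand the polymerase builds 5'->3' against a template written 3'->5'."""
    return "".join(PAIRS[base] for base in template_3_to_5)

def sanger_tubes(new_strand):
    """For each ddNTP tube, list the lengths of the terminated fragments.

    Assumption: every position where the tube's base occurs yields
    at least one terminated fragment.
    """
    tubes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes

def read_gel(tubes):
    """Read bands from shortest (bottom of the gel) to longest (top)."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

template = "TACGGTCA"                      # toy template, written 3'->5'
new_strand = synthesized_strand(template)  # ATGCCAGT
tubes = sanger_tubes(new_strand)
print(tubes)            # {'A': [1, 6], 'C': [4, 5], 'G': [3, 7], 'T': [2, 8]}
print(read_gel(tubes))  # ATGCCAGT
```

Sorting the fragments by length plays the role of electrophoresis: the shortest fragment migrates furthest, so reading the terminating base of each band from bottom to top recovers the newly synthesized strand in the 5'-3' direction.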
From traditional sequencing to second-generation sequencing (NGS): the great technological leap

Before the advent of Next-Generation Sequencing (NGS), DNA sequencing was performed using the aforementioned Sanger method, a slow and expensive process that reads relatively short DNA fragments one at a time. The arrival of the second generation of sequencers marked a before and after, as it introduced the possibility of sequencing millions of DNA fragments simultaneously (massively parallel sequencing). This allowed the DNA of entire organisms to be deciphered in record time, accelerating the large-scale genome projects that followed the Human Genome Project and making genetic sequencing more accessible.

The defining feature of NGS is that it fragments DNA into small pieces, which are copied many times to amplify the material and then read as so-called short reads, typically between 50 and 600 base pairs (bp) long. Subsequently, these reads are computationally reassembled to obtain the complete sequence, in what we call pipelines or bioinformatics workflows.

🧬 This works very well for identifying point mutations and small changes in DNA and giving them meaning, for example, variants that can influence hereditary diseases. However, when it comes to analyzing complex regions of DNA, such as highly repetitive ones or those containing large rearrangements, reconstructing the complete sequence can be complicated and prone to errors.

Third-generation sequencing: longer reads and real-time analysis

While NGS remains the most widely used technology today, third-generation sequencers have brought significant improvements. Instead of splitting DNA into small fragments, these new technologies can read much longer fragments, ranging from 10,000 to 100,000 base pairs (long reads), which makes genome assembly easier and helps identify complex structural changes.
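The difficulty that short reads have with repetitive regions, and the way longer reads resolve it, can be illustrated with a toy Python sketch. The reference sequence, the read sequences, and the `find_all` helper are all made up for this example, and exact string matching is a deliberate simplification of what real read aligners do.

```python
def find_all(reference, read):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions

# Toy reference: unique flank + "CAG" repeated 5 times + unique flank.
reference = "TTGACC" + "CAG" * 5 + "GGATCT"

short_read = "CAGCAG"                     # lies entirely inside the repeat
long_read = "ACC" + "CAG" * 5 + "GGA"     # spans the repeat and both flanks

print(find_all(reference, short_read))  # [6, 9, 12, 15]
print(find_all(reference, long_read))   # [3]
```

The short read fits in four different places, so its true origin (and the number of repeat copies) cannot be determined from that read alone. The long read reaches the unique sequence on both sides of the repeat and therefore has a single possible position.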
Another key point is that third-generation sequencing is performed in real time and does not require prior amplification of the DNA, which reduces the errors introduced when the material is copied. Additionally, these techniques can detect epigenetic modifications, such as DNA methylation, without the need for additional steps in the process.

🧬 This has significant implications for studies of diseases like cancer, where the regulation of gene expression, that is, which genes are "activated" and when, plays a fundamental role.

Short reads or long reads? Which is better?

The choice between second- and third-generation technologies depends on the specific application. While short reads (NGS) are extremely accurate and allow large volumes of data to be analyzed at low cost, long reads offer a greater capacity to detect structural variants and to assemble complex genomes without complicated computational reconstructions.

Short reads (NGS, second generation)
✅ High accuracy for detecting point mutations and small variations.
✅ Lower cost per sequenced base.
✅ Ideal for studies of genetic diseases, cancer, and transcriptomics.
❌ Difficulties in assembling complete genomes due to fragmentation.
❌ Limited ability to detect large structural variants.

Long reads (third generation)
✅ Long reads that make it easier to assemble complete genomes.
✅ Ability to detect structural variants and repeats in DNA.
✅ No amplification required, avoiding biases in the process.
❌ Higher error rate compared to NGS (although it can be corrected with additional coverage).
❌ Higher cost per sequenced base.

Trends in the use of these technologies for DNA sequencing

In recent years, third-generation sequencing has gained ground, especially in studies where it is essential to have a complete, uninterrupted view of the genome. For example, long-read sequencing has been key to identifying structural variants in diseases such as autism and certain forms of epilepsy (Chaisson et al., 2019).
Additionally, in 2022, the Telomere-to-Telomere (T2T) Consortium managed to sequence the entire human genome without gaps for the first time, thanks to a combination of third-generation technologies (Nurk et al., 2022).

However, NGS remains the most widely used option in hospitals and diagnostic laboratories, due to its low cost and high accuracy in detecting individual mutations. In many research settings, the current trend is to combine both technologies in hybrid studies, where short reads provide accuracy and long reads resolve complex regions of DNA. In the clinical setting, third-generation sequencing is still met with some skepticism, which makes sense if we consider that a sequencing error can lead to a diagnostic error for a patient, and therefore directly affect their health.

Sequencing technologies are constantly evolving, and it is likely that in the coming years we will see even greater integration between second- and third-generation techniques. It is foreseeable that third-generation sequencing will become widespread both in biomedical research and in the clinic as costs decrease and accuracy improves.

Regardless of the technology used, the impact of DNA sequencing on personalized medicine, the identification of rare diseases, and cancer research will continue to grow, bringing us ever closer to treatments specifically designed for each person's profile.
March 31, 2025