De novo sequencing refers to the sequencing and construction of a new genome or transcriptome. New sequences can provide novel insight into an organism’s biology and can also be used in comparative genomic studies. Assembling the reads from these projects is complicated because there is no reference sequence to which the reads can be aligned.
To ease the bioinformatic process of assembly in de novo sequencing projects, overlapping sequences are built up into as few large contigs as possible. This is aided by using technologies that provide long read lengths as well as paired-end and mate-pair reads. All of these, along with high-coverage read data, increase the amount of overlapping sequence and therefore increase confidence in the sequence assembly.
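To illustrate how overlapping reads are built up into contigs, here is a minimal sketch of greedy overlap merging. It is a toy under stated assumptions, not how production assemblers work: reads are error-free, all on the same strand, and the function names (`overlap`, `greedy_assemble`) and the `min_overlap` threshold are hypothetical choices made for this example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no such overlap is at least `min_len` long."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap
    until no pair overlaps by at least `min_overlap` bases."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the pair, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that tile one short sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Reads that share no sufficiently long overlap stay as separate contigs, which is why longer reads and higher coverage tend to reduce the number of contigs.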
Short reads by themselves are insufficient for de novo projects because they are not long enough to encapsulate long blocks of repetitive sequence, but long reads also suffer from the relatively low coverage they provide for an uncharacterized genome or transcriptome. Because next-generation sequencing platforms tend to optimize for either long reads or large numbers of shorter, paired-end reads, multiple technologies are routinely used in de novo sequencing projects. The longer read lengths (still sometimes generated through Sanger sequencing-based methods) provide a scaffold to which shorter reads can be aligned. These shorter reads, produced at a scale that provides high sequence coverage, can then be analyzed for annotation of rare variants such as SNPs, or effects of RNA editing.
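The coverage trade-off between long and short reads can be made concrete with a small calculation. The sketch below uses the standard mean-coverage estimate (total sequenced bases divided by genome size) and the Poisson approximation for the fraction of bases left uncovered. The genome size and read counts are hypothetical numbers chosen only for illustration, not figures for any particular platform.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases with zero reads, assuming reads are
    placed uniformly at random (Poisson approximation)."""
    return math.exp(-coverage)


GENOME_SIZE = 40_000_000  # hypothetical 40 Mb genome

# Hypothetical long-read run: fewer, longer reads.
long_cov = mean_coverage(500_000, 400, GENOME_SIZE)
# Hypothetical short-read run: many 2 x 100 bp paired-end reads.
short_cov = mean_coverage(20_000_000, 200, GENOME_SIZE)

for label, cov in [("long reads", long_cov), ("short reads", short_cov)]:
    print(f"{label}: {cov:.0f}x coverage, "
          f"expected uncovered fraction {expected_uncovered_fraction(cov):.2e}")
```

Under these assumed numbers the long-read run gives 5x coverage and the short-read run 100x, which is the kind of gap that motivates combining the two data types.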
Comparative genomics: Bonasio, R. et al. (2010) Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science 329(5995):1068-71.
De novo transcriptome assembly: Martin, J.A. and Wang, Z. (2011) Next-generation transcriptome assembly. Nature Reviews Genetics 12(10):671-82.
Multi-platform approach: DiGuistini, S. et al. (2009) De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biology 10:R94.
Key Platform Characteristics
|Characteristic|Importance|Rationale|
|---|---|---|
|Number of reads|Very Important|High sequence coverage is crucial to ensure confidence in SNP calls and small variants, as subsequent resequencing projects will rely on these annotations.|
|Read length|Very Important|Long read lengths ease the analysis involved in sequence scaffold assembly.|
|Error rate|Important|Errors in a de novo project that are propagated into subsequent analyses are serious, but the high read coverage of de novo projects can mitigate incorrect base calls.|
|Paired-end reads|Critical|Reading both ends of each fragment provides additional sequence and an approximately known spacing between the two reads, which helps place each read.|
|Mate-pair reads|Critical|Provide a reference point that allows bioinformaticians to properly align reads that may fall within a repetitive element.|
|Multiplexing|Nice to have|Because de novo projects require a large amount of data, there is normally little room left on a single run for additional samples, so multiplexing is not normally used in these experiments.|
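The interaction between error rate and read coverage noted in the table can be sketched with a simple binomial model. The example assumes that sequencing errors are independent across reads and that a consensus base is wrong only if at least half of the reads covering it are erroneous. Both are simplifying assumptions for illustration, and the function name `p_wrong_consensus` and the 1% error rate are hypothetical.

```python
from math import comb


def p_wrong_consensus(depth: int, error_rate: float) -> float:
    """Probability that at least half of `depth` reads covering a base
    are erroneous, under an independent-error binomial model."""
    threshold = (depth + 1) // 2  # smallest count that is at least depth / 2
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


# With an assumed 1% per-base error rate, the bound shrinks quickly with depth.
for depth in (1, 5, 15, 30):
    print(f"depth {depth:>2}: {p_wrong_consensus(depth, 0.01):.3e}")
```

Real errors are not fully independent (some platforms have systematic, context-dependent errors), so this should be read as an optimistic illustration of why coverage helps rather than as a quality guarantee.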