Targeted resequencing is a variation of resequencing in which only a small subset of the genome is sequenced, such as the exome, a particular chromosome, a set of genes or a region of interest. Focusing on a subset of the genome is generally done to reduce costs, although there may be certain cases, such as in a clinical setting, where it is important to sequence only particular regions. Also, by focusing all of the sequencing on a small region of the genome, it is possible to detect low levels of variation that might otherwise have been missed. Some researchers are starting to use targeted resequencing instead of arrays for genome-wide association studies (GWAS), as it is better suited to measuring rare alleles.
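The depth gained by concentrating reads on a small target can be made concrete with a rough calculation. The sketch below is a back-of-the-envelope estimate only: it assumes perfectly uniform coverage, and the function name, the read count, the 50 Mb target size and the 60% on-target fraction are illustrative assumptions rather than figures from any particular kit or platform.

```python
def mean_target_depth(total_reads, read_length_bp, target_size_bp,
                      on_target_fraction=1.0):
    """Expected mean depth over a target, assuming uniform coverage.

    on_target_fraction is the share of reads that land on the target
    (an assumed value; real capture efficiency varies by method).
    """
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    on_target_bases = total_reads * read_length_bp * on_target_fraction
    return on_target_bases / target_size_bp


# Illustrative, assumed inputs: the same sequencing output spent two ways.
reads = 40_000_000
read_len = 100

# Whole genome of roughly 3 Gb, every read counted as on target.
whole_genome = mean_target_depth(reads, read_len, 3_000_000_000)

# Assumed 50 Mb capture target with 60% of reads on target.
exome = mean_target_depth(reads, read_len, 50_000_000,
                          on_target_fraction=0.6)

print(f"whole genome: ~{whole_genome:.1f}x, targeted: ~{exome:.0f}x")
```

Under these assumptions the same run yields a little over 1x across the whole genome but roughly 48x across the target, which is why low-level variation becomes detectable.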
Targeted resequencing involves alternative methods of sample preparation that produce libraries representing a desired subset of the genome. Often, this subset is the exome, which is functionally important and is therefore a prime target for medical and gene-related research. By targeting the exome of an individual, it is possible to identify known genetic variants that could promote a disease phenotype. Additionally, by targeting the exomes of multiple patients, rare variants can be found and the functional consequences of each mutation can be analyzed further. Exome sequencing typically uses either a ‘solution-based capture’ or a ‘microarray capture’ method. Solution-based capture is highly scalable and is generally cheaper than array-based capture when a large number of samples is involved. Some researchers also feel that it outperforms the array-based method in terms of coverage uniformity. The array-based method is often used when the target design will be applied to only a small number of samples (up to 20 or so), as it is easier to make small batches. Studies that focus on even smaller regions of the genome may also employ PCR-based approaches.
After the genome is fragmented, the desired target fragments are captured by hybridizing the sample to bait probes, which can then be separated from the rest of the sample. Separation is achieved by binding the probes to a bead substrate (often through a magnetic or antigen-antibody interaction), followed by a wash step to remove unbound, non-targeted fragments. The resulting DNA is then used to prepare a standard library for next generation sequencing.
The in-solution method provides a user-friendly way of recovering almost all sequences that are targeted by the probe set. Because the probes are mobile along with the targeted sequences in the solution-based method, the probability of probe-target hybridization is high. The method is portable to automated workstations and is therefore easily scalable. Because of this, many life science companies (e.g., Agilent, Illumina and Life Technologies) now offer consumable kits that follow this format.
In the array-based method, probes that are fixed to a chip are hybridized to fragmented genomic DNA, immobilizing complementary target sequences. After removal of unbound fragments, the targeted DNA sequences can be eluted and used as a sample for library preparation. Microarrays are well suited to recovering all of the targeted sequences, but they typically require large amounts of input DNA to be successful and do not scale as easily as in-solution capture. Agilent and Roche/NimbleGen offer DNA microarray chips for the capture of targeted regions suitable for library preparation.
The use of multiple primer sets to amplify multiple targets on a genomic template was an early form of targeting sequences for downstream sequencing. By designing primers that flank targeted regions of interest, researchers can use PCR to amplify sequences that can be isolated after gel electrophoresis. This process of targeting genes can simultaneously be used to prepare libraries for next generation sequencing. Primer sets that target regions of interest can be designed to include adapter sequences, so the resulting amplicons are sequencing-competent and do not require the traditional library preparation protocol. Furthermore, sample multiplexing can be achieved by incorporating barcode sequences into the primer sets.
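The primer layout described above (adapter, optional barcode, then the target-specific sequence at the 3' end) can be sketched in a few lines. This is an illustration of the concatenation only: all sequences below are invented placeholders, not real platform adapters, vendor barcodes or validated primers, and the function names are hypothetical.

```python
def check_dna(name, seq, allow_empty=False):
    """Return seq in upper case, or raise ValueError if it is not A/C/G/T."""
    if not seq and not allow_empty:
        raise ValueError(f"{name} must not be empty")
    if set(seq.upper()) - set("ACGT"):
        raise ValueError(f"{name} contains non-ACGT characters")
    return seq.upper()


def build_fusion_primer(adapter, barcode, target_specific):
    """Return one oligo laid out 5'-adapter + barcode + target-specific-3'."""
    return (
        check_dna("adapter", adapter)
        + check_dna("barcode", barcode, allow_empty=True)
        + check_dna("target_specific", target_specific)
    )


# Placeholder sequences, invented for illustration only.
ADAPTER_FWD = "AAAACCCCGGGGTTTT"
TARGET_FWD = "GATTACAGATTACAGATTAC"
BARCODES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}

# One forward primer per sample: same adapter and target, different barcode.
primers = {
    sample: build_fusion_primer(ADAPTER_FWD, barcode, TARGET_FWD)
    for sample, barcode in BARCODES.items()
}

for sample, primer in primers.items():
    print(sample, primer)
```

The target-specific portion is kept at the 3' end because that is the end the polymerase extends from; the adapter and barcode ride along as a 5' tail and become part of the amplicon during PCR.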
The capacity offered by next generation sequencing has revolutionized amplicon sequencing to the point of spurring new technologies. Companies such as RainDance and Fluidigm offer platforms that generate libraries which are sequencing-competent and composed purely of targeted sequences. By enabling high-throughput, miniaturized PCR setup, these technologies are ideal for preparing amplicon libraries. When barcoded primer sets are used, this capability can be exploited for comparative studies. One drawback of PCR-based approaches is the limit on amplicon length, which is imposed by PCR itself. However, this problem can be circumvented by targeting overlapping regions.
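Covering a region longer than a single amplicon with overlapping amplicons amounts to a simple tiling problem. The sketch below shows only that coordinate arithmetic, under stated assumptions: the function name, the 400 bp length limit and the 100 bp overlap are illustrative, and real designs must also account for primer placement, melting temperature and sequence content, which this ignores.

```python
def tile_amplicons(region_start, region_end, max_amplicon_len, overlap):
    """Tile [region_start, region_end) with overlapping amplicons.

    Returns half-open (start, end) intervals. Each interval is at most
    max_amplicon_len long and shares `overlap` bases with the previous
    one; the final interval may be shorter than the rest.
    """
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    if max_amplicon_len <= 0:
        raise ValueError("max_amplicon_len must be positive")
    if not 0 <= overlap < max_amplicon_len:
        raise ValueError("overlap must be >= 0 and < max_amplicon_len")

    step = max_amplicon_len - overlap
    tiles = []
    start = region_start
    while True:
        end = min(start + max_amplicon_len, region_end)
        tiles.append((start, end))
        if end == region_end:
            break
        start += step
    return tiles


# Assumed example: a 1,000 bp region, 400 bp amplicons, 100 bp overlap.
for start, end in tile_amplicons(0, 1000, 400, 100):
    print(start, end)
```

With these assumed values the region is covered by three amplicons: 0 to 400, 300 to 700 and 600 to 1000.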
Falling sequencing costs are making exome-seq competitive with arrays for GWAS.
Benchtop systems (e.g., Ion Torrent and MiSeq) are driving use of these methods.
Microarray vs. in-solution capture: Kiialainen, A. et al. (2011) Performance of Microarray and Liquid Based Capture Methods for Target Enrichment for Massively Parallel Sequencing and SNP Discovery. PLoS ONE 6(2):e16486.
Key Platform Characteristics
| Characteristic | Importance | Rationale |
|---|---|---|
| # of reads | Important | Typical resequencing experiments will target a relatively small portion of the genome, normally covering a defined number of genes. It is important to attain high coverage of these targets but because their numbers are constrained this is not a critical factor. |
| Read length | Nice to have | Longer read lengths afford more data per sequencing reaction, but this characteristic is not uniquely important to targeted resequencing. |
| Error rate | Critical | Targeted resequencing often aims to discover potential disease contributing mutations. These can include SNPs, indels and other variations. To ensure false positives and false negatives do not confound the discovery process, it is important that error rates are negligible. This is especially important for systematic errors, which cannot be overcome with oversampling. |
| Paired-end reads | Nice to have | Whether reads are single or paired end in nature, the resulting data is aligned to a reference sequence. For this reason, paired end reads are easily accommodated in targeted resequencing experiments, but single end reads can be appropriate as well. The importance increases when discovering or measuring indels. |
| Mate-pair reads | Irrelevant | While mate-pair reads would be useful for detecting large structural rearrangements of the genome (an important feature of cancer genomes), the targeting methods generally don’t pull down fragments large enough to work with this method. |
| Multiplexing | Critical | Targeted resequencing permits analysis of important genomic regions by excluding most of the genome. Since the sequencing capacity of the higher throughput platforms can exceed the needed data for analysis of a single sample, it is critical to be able to barcode multiple samples and include them in a sequencing run to justify the expense. |
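The multiplexing argument in the table can be illustrated with a rough capacity estimate: how many barcoded samples fit in one run at a desired depth. This is a planning sketch only. The function names and every number in the example (a 500 kb panel, 200x depth, 150 bp reads, 50% of reads on target, 25 million reads per run) are assumptions chosen for illustration, not specifications of any instrument or kit.

```python
import math


def reads_needed_per_sample(target_size_bp, desired_depth,
                            read_length_bp, on_target_fraction):
    """Reads required to reach desired_depth, assuming uniform coverage."""
    if not 0.0 < on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be in (0, 1]")
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    bases_needed = target_size_bp * desired_depth
    usable_bases_per_read = read_length_bp * on_target_fraction
    return math.ceil(bases_needed / usable_bases_per_read)


def samples_per_run(run_reads, reads_per_sample):
    """Whole number of samples that fit in a run of run_reads reads."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return run_reads // reads_per_sample


# Assumed example values (illustrative only).
per_sample = reads_needed_per_sample(
    target_size_bp=500_000,
    desired_depth=200,
    read_length_bp=150,
    on_target_fraction=0.5,
)
capacity = samples_per_run(25_000_000, per_sample)

print(f"{per_sample} reads per sample, {capacity} samples per run")
```

Under these assumptions each sample needs about 1.3 million reads, so 18 barcoded samples fit in a single run; sequencing one sample alone would leave most of the run's capacity unused.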