Complete Genomics is unique in the NGS market in that, instead of selling their proprietary instrumentation, they use it to offer NGS services. They have chosen to focus on a single application: human whole-genome resequencing. At the moment they don’t offer any other applications (e.g., exome, RNA-Seq, ChIP-Seq). Because they operate solely as a service, it is less meaningful to compare the standard attributes (read length, Gb/run, reads/run, etc.) with those of the other vendors. It is more instructive to focus on what they deliver under a standard sequencing service contract: a minimum of 40x coverage, a SNP call rate of >90%, and a reference consensus accuracy of >99.999% (1 base-call error per 100 kb). As part of the standard analysis package, they offer variant files for SNVs, InDels, CNVs, junctions, and mobile element insertions, as well as variant annotations from RefSeq, dbSNP, COSMIC, miRBase, Pfam, and DGV. While raw read data is available as an output, Complete Genomics focuses on a ‘genome variant file’, potentially simplifying follow-up data analysis. Their guaranteed (maximum) turnaround time is 90-120 days, although according to their Q1 2011 statement their median time is down to 70 days.
Complete Genomics’ process can be broken down into four main steps: 1) Library Construction, 2) Self-assembling DNA Nanoarrays, 3) cPAL Sequencing, and 4) Assembly and Analysis. Researchers will not have to perform any of these steps, as the process is offered only as a service, but knowing them helps in understanding the nature of the data that is generated.
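The accuracy figure in the service contract can be checked with a little arithmetic. The sketch below converts a consensus accuracy percentage into an error rate and the number of bases per expected error. The function names are illustrative, and the Phred-style quality score is a common sequencing convention added here for orientation; it is not a figure quoted by Complete Genomics.

```python
from decimal import Decimal
import math


def error_rate(accuracy_percent):
    """Fraction of consensus bases expected to be wrong."""
    return 1 - Decimal(accuracy_percent) / 100


def bases_per_error(accuracy_percent):
    """Average number of bases per one expected base-call error."""
    return 1 / error_rate(accuracy_percent)


def phred_equivalent(accuracy_percent):
    """Phred-style score, Q = -10 * log10(error rate).

    Added for orientation only; the source quotes accuracy, not a Q score.
    """
    return -10 * math.log10(float(error_rate(accuracy_percent)))


contract_accuracy = "99.999"  # lower bound quoted for the standard contract
print(error_rate(contract_accuracy))
print(bases_per_error(contract_accuracy))
print(phred_equivalent(contract_accuracy))
```

Decimal arithmetic is used so that the percentage converts to an exact error rate rather than a rounded floating-point value.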
- Library Construction
The process starts with 7.5 µg of unamplified human genomic DNA, which is sheared into 500 bp fragments. Through an iterative process, four separate artificial adapter sequences are added to each fragment using type IIS restriction endonucleases. The modified fragments are then amplified roughly 200-fold in solution through the creation of DNA nano-balls (DNBs) via rolling circle amplification. The creation of DNBs is one of two main differentiating features of Complete Genomics’ technology.
- Self-assembling DNA Nanoarrays
The DNBs are then arrayed on a 1” x 3” patterned substrate that contains up to 2.85 billion spots. Arraying occurs via self-assembly, and each activated spot can hold one (and only one) DNB. In practice, approximately 90% of the spots are filled during the assembly process.
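To get a feel for these numbers, the following sketch estimates how many DNBs a fully patterned array holds and the raw base yield that implies. The spot count and fill rate come from the paragraph above, and the 70 bases per DNB from the cPAL section of this page. The genome size is an assumption, and so is the idea of devoting one array to one genome, which the text does not state. The coverage figure is therefore only a rough upper bound, before any mapping or filtering losses.

```python
TOTAL_SPOTS = 2_850_000_000          # "up to 2.85 billion spots" (from the text)
FILL_FRACTION = 0.90                 # "approximately 90%" (from the text)
BASES_PER_DNB = 70                   # bases read per fragment (cPAL section)
ASSUMED_GENOME_SIZE = 3_100_000_000  # assumption: approximate human genome size


def filled_spots(total_spots, fill_fraction):
    """Number of spots expected to hold a DNB."""
    return round(total_spots * fill_fraction)


def raw_bases(dnb_count, bases_per_dnb):
    """Raw bases read if every arrayed DNB yields a full read."""
    return dnb_count * bases_per_dnb


def raw_fold_coverage(base_count, genome_size):
    """Raw yield divided by genome size (not mapped coverage)."""
    return base_count / genome_size


dnbs = filled_spots(TOTAL_SPOTS, FILL_FRACTION)
bases = raw_bases(dnbs, BASES_PER_DNB)
coverage = raw_fold_coverage(bases, ASSUMED_GENOME_SIZE)
print(f"{dnbs:,} DNBs, {bases:,} raw bases, ~{coverage:.0f}x raw coverage")
```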
- cPAL Sequencing
The Combinatorial Probe-Anchor Ligation (cPAL) sequencing process is the second main differentiating feature of Complete Genomics’ technology. It combines sequencing by hybridization (SBH) and sequencing by ligation (SBL). Up to 10 bases are sequenced on either side of each adapter sequence through the hybridization and ligation of pools of probes. Each probe consists of an anchor sequence (which is complementary to the adapter) and an additional nine bases that are degenerate at all but one position, the one being interrogated. The base at the interrogated position is identified by one of four dyes (one for each base). After hybridization and ligation, the excess probes are washed off and the nanoarray is imaged. The anchor-probe complex is then washed away and the entire process is repeated with probes that interrogate a different position. After 5 contiguous bases are read, the process is repeated using anchors with five additional degenerate bases, which allows a total of 10 bases to be sequenced on each side of the adapter. A total of 70 bases from the original 500 bp fragment are sequenced, 35 bases on each end. Due to the spacing of the adapters, the 35 bases are not contiguous: they contain a two-base gap and a five-base gap. The spacing of the adapters is also what limits the number of sequenced bases to 70 instead of the theoretical 80 (four adapters, two sides each, 10 bases per side).
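The probe logic described above can be illustrated with a toy model. The sketch below builds the pool of four degenerate probes for one interrogated position, picks the dye that would be seen for a given template, and repeats this for five positions. It deliberately ignores strand orientation, ligation chemistry, and imaging noise. The dye labels and function names are hypothetical, and this is not Complete Genomics’ actual base-calling software.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye labels: one dye per base at the probe's fixed position.
DYE_FOR_PROBE_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_PROBE_BASE.items()}
PROBE_LENGTH = 9  # nine bases beyond the anchor, per the text


def probe_pool(position, length=PROBE_LENGTH):
    """Four probes, degenerate ('N') everywhere except `position`."""
    if not 0 <= position < length:
        raise ValueError("position lies outside the probe")
    return {
        base: "N" * position + base + "N" * (length - position - 1)
        for base in "ACGT"
    }


def image_cycle(template, position):
    """Dye seen after one hybridize/ligate/wash/image cycle.

    Only the probe whose fixed base pairs with the template base at
    `position` is treated as ligating (orientation is ignored).
    """
    for probe_base in probe_pool(position, len(template)):
        if COMPLEMENT[probe_base] == template[position]:
            return DYE_FOR_PROBE_BASE[probe_base]
    raise ValueError("template contains a base other than A, C, G, T")


def call_bases(template, n_positions=5):
    """Read `n_positions` contiguous template bases, one cycle each."""
    calls = []
    for position in range(n_positions):
        dye = image_cycle(template, position)
        calls.append(COMPLEMENT[BASE_FOR_DYE[dye]])
    return "".join(calls)


# Read-length arithmetic from the text: 4 adapters, 2 sides, 10 bases per side.
ADAPTERS, SIDES_PER_ADAPTER, BASES_PER_SIDE = 4, 2, 10
theoretical_bases = ADAPTERS * SIDES_PER_ADAPTER * BASES_PER_SIDE
sequenced_bases = 2 * 35  # 35 bases from each end of the fragment

print(probe_pool(2))
print(call_bases("ACGTTGCAA"))
print(theoretical_bases, sequenced_bases)
```

Because each position is interrogated by its own probe pool, a miscall at one position does not carry over to the next. That independence is what the title of the Drmanac et al. paper refers to as "unchained" base reads.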
- Assembly and Analysis
The format of the sequencing data is somewhat different from that of other platforms (because of the gaps within the 35 sequenced bases), necessitating atypical alignment algorithms that can handle the gaps. However, Complete Genomics runs an optimized analysis pipeline as part of the standard service that they offer. They perform a local de novo assembly to look for variations and produce indel calls of up to 50 bases, structural variation calls, mobile element insertion calls, and copy number calls. Starting in August 2012, customers will also have six months of access to the Ingenuity Variant Analysis application, which offers SNP annotation and access to the Scripps Health Wellderly dataset.
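To show why gapped reads need special handling, here is a deliberately naive sketch that places a three-segment read on a reference by exact matching, skipping a fixed number of bases between segments. The two-base and five-base gaps come from the text. The segment lengths (10, 10, 15), the helper names, and the random reference are assumptions for illustration only. A real aligner would also tolerate mismatches and variable gap sizes, and this is not Complete Genomics’ algorithm.

```python
import random


def extract_gapped_read(reference, start, segment_lengths, gaps):
    """Cut read segments out of `reference`, skipping `gaps[i]` bases
    after segment i (simulates a gapped read from a known position)."""
    segments, pos = [], start
    for i, length in enumerate(segment_lengths):
        segments.append(reference[pos:pos + length])
        pos += length
        if i < len(gaps):
            pos += gaps[i]
    return segments


def gapped_match_positions(reference, segments, gaps):
    """Start positions where all segments match exactly, in order,
    with exactly `gaps[i]` unread bases after segment i."""
    if len(gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap between adjacent segments")
    span = sum(len(s) for s in segments) + sum(gaps)
    hits = []
    for start in range(len(reference) - span + 1):
        pos, matched = start, True
        for i, segment in enumerate(segments):
            if reference[pos:pos + len(segment)] != segment:
                matched = False
                break
            pos += len(segment)
            if i < len(gaps):
                pos += gaps[i]
        if matched:
            hits.append(start)
    return hits


# Toy data: a random 200-base reference and one 35-base gapped read.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))
SEGMENT_LENGTHS = (10, 10, 15)  # assumption: the text gives only the 35-base total
GAPS = (2, 5)                   # two-base and five-base gaps, per the text

read = extract_gapped_read(reference, 40, SEGMENT_LENGTHS, GAPS)
print(read)
print(gapped_match_positions(reference, read, GAPS))
```

A standard short-read aligner that expects contiguous reads would treat each gap as a deletion and penalize it, which is one reason a gap-aware approach is needed for this data.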
Key Papers: Technology description: Drmanac, R et al. (2010) Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 327, 78-81