Take advantage of this unique opportunity to explore HiSeq X Ten data.
The Garvan Institute of Medical Research, DNAnexus and AllSeq have teamed up to give the genomics community open access to the first publicly available test data sets generated on Illumina's HiSeq X Ten, a high-throughput sequencing platform. Our goal is to provide sample data that helps you understand what this technological advance means for your work today and in the future.
Why is this data available and why is it free?
The Kinghorn Centre for Clinical Genomics at Australia's Garvan Institute was one of the first three organizations in the world to acquire the Illumina HiSeq X Ten sequencing system. To let the scientific community assess the quality of HiSeq X Ten data generated by an independent laboratory, the Garvan Institute has made reference data sets available through a world-first HiSeq X Ten data-sharing project.
DNAnexus, a platform provider for genomic analysis and data management, has sponsored storage and download bandwidth for the data. It has also run analyses to produce metrics that help the scientific community interpret the results of the "$1000 genome". The links below let you download all of the data directly and explore one data set in a web-based genome browser.
AllSeq arranged this data-sharing effort as part of its Sequencing Marketplace, which aims to educate scientists about different sequencing technologies and match them with providers that offer those technologies. Its neutral sequencing experts can also answer questions about the HiSeq X Ten technology.
Your access and what to expect.
To develop this sample data set, scientists at the Garvan Institute sequenced the widely used NA12878 reference sample from the Coriell Cell Repository, which has been extensively characterized by the Genome in a Bottle Consortium.
Two high-quality data sets are provided (NA12878D and NA12878J). Each library was prepared with Illumina's TruSeq Nano kit using 350 bp inserts and sequenced on a single lane of an Illumina HiSeq X patterned flow cell. Each lane yielded over 120 Gb, with more than 87% of bases at or above Q30, in just 2.8 days. The two data sets are of similar quality, and both are provided so you can assess the reproducibility of the technology. Each data set substantially exceeds the minimum coverage and quality guaranteed by Illumina and indicates the potential of the Illumina HiSeq X Ten sequencing system.
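The yield and quality figures above can be checked against the downloaded FASTQ files. The Python sketch below is a minimal illustration, not the pipeline used by Garvan or DNAnexus. It assumes standard four-line FASTQ records with Phred+33 quality encoding (the encoding used by current Illumina instruments). It estimates raw fold coverage by dividing total sequenced bases by an assumed human genome length of about 3.1 Gb, before any duplicate removal or alignment filtering. All function names are hypothetical. For paired-end data, sum the counts from the read 1 and read 2 files.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# Assumption: approximate haploid human genome length used for a rough depth estimate.
ASSUMED_GENOME_LENGTH = 3_100_000_000


def read_fastq_qualities(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (the 4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def q30_stats(qualities: Iterable[str], offset: int = 33) -> Tuple[int, int]:
    """Return (total_bases, bases_at_or_above_Q30), assuming Phred+33 encoding."""
    total = 0
    q30 = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            # Phred score = ASCII code minus the encoding offset.
            if ord(ch) - offset >= 30:
                q30 += 1
    return total, q30


def raw_fold_coverage(total_bases: int,
                      genome_length: int = ASSUMED_GENOME_LENGTH) -> float:
    """Rough depth: sequenced bases divided by the assumed genome length."""
    return total_bases / genome_length


def summarize_fastq(path: str) -> str:
    """Summarize one FASTQ file (plain or .gz): base count, %>=Q30, raw depth."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        total, q30 = q30_stats(read_fastq_qualities(fh))
    pct = 100.0 * q30 / total if total else 0.0
    depth = raw_fold_coverage(total)
    return f"{total} bases, {pct:.1f}% at or above Q30, ~{depth:.1f}x raw depth"
```

For example, raw_fold_coverage(120_000_000_000) returns about 38.7, so 120 Gb of yield corresponds to roughly 39x raw depth under the 3.1 Gb assumption. The coverage reported by Picard CollectWgsMetrics in the supplied metrics is computed after alignment and filtering, so it is typically lower than this raw figure.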
To access both data sets, click the "Access Test Data" button. It takes you to the original FASTQ files, the analysis results (BAM and VCF files), and quality metrics calculated with off-the-shelf tools such as FastQC and Picard (MarkDuplicates, CollectInsertSizeMetrics and CollectWgsMetrics). These data sets are provided openly through a collaboration between the Garvan Institute, AllSeq and DNAnexus and are shared under the CC BY 4.0 license.
The "Genome Browser" button opens a web-based genome browser for visualizing one data set (NA12878D). All of this data is available until September 30, 2014.
Garvan NA12878 HiSeqX datasets by The Garvan Institute of Medical Research, DNAnexus and AllSeq is licensed under a Creative Commons Attribution 4.0 International License