A reference human genome dataset of the BGISEQ-500 sequencer

doi:10.1093/GIGASCIENCE/GIX024

Jie Huang, +20 more

- 01 May 2017 -

GigaScience

- Vol. 6, Iss: 5, pp 1-9

TL;DR: The first human whole-genome sequencing dataset of BGISEQ-500, generated by sequencing the widely used cell line HG001, can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.

Abstract:

Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated their accuracy, and compared it with that of variations identified from similar amounts of publicly available HiSeq2500 data. Single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) was similar to that for the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%) and better than that for the BGISEQ-500 PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, accuracy was lower for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50, respectively; sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
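The abstract reports accuracy as a false positive rate and a sensitivity but does not say how either was computed. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), FPR = FP / (FP + TN)), the two metrics could be derived from comparison counts as follows. The function names and the counts in the example are hypothetical illustrations, not values or code from the paper.

```python
# Sketch only: assumes the standard textbook definitions of sensitivity and
# false positive rate. The paper's exact procedure is not given in the abstract.


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of truth-set variants that were called: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("truth set contains no variants")
    return true_positives / total


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of non-variant sites wrongly called variant: FP / (FP + TN)."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("no non-variant sites were evaluated")
    return false_positives / total


def as_percent(fraction: float, digits: int = 2) -> str:
    """Format a fraction as a percentage string with a fixed number of decimals."""
    return f"{fraction * 100:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the arithmetic.
    sens = sensitivity(true_positives=9620, false_negatives=380)
    fpr = false_positive_rate(false_positives=2, true_negatives=999_998)
    print("sensitivity:", as_percent(sens))
    print("FPR:", as_percent(fpr, digits=5))
```

Because non-variant sites vastly outnumber variant sites in a human genome, the FPR under this definition is naturally very small, which is consistent with the orders of magnitude quoted in the abstract.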

Citations

SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data.

Chen Yuxin, +14 more

- 01 Jan 2018 -

GigaScience

TL;DR: SOAPnuke is demonstrated as a tool with abundant functions for a “QC-Preprocess-QC” workflow and MapReduce acceleration framework that enables large scalability to distribute all the processing works to an entire compute cluster.

Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

Sarah Siu Tze Mak, +24 more

- 01 Aug 2017 -

GigaScience

TL;DR: The observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.

Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly.

Ou Wang, +38 more

- 02 Apr 2019 -

Genome Research

TL;DR: StLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.

Using machine learning approaches for multi-omics data analysis: A review

Parminder Singh Reel, +4 more

- 29 Mar 2021 -

Biotechnology Advances

TL;DR: In this article, the authors explore different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease.

References

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, +1 more

- 01 Jul 2009 -

Bioinformatics

TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna, +10 more

- 01 Sep 2010 -

Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo, +22 more

- 01 May 2011 -

Nature Genetics

TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.

Sequencing technologies-the next generation

Michael L. Metzker

- 01 Jan 2010 -

Nature Reviews Genetics

TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
