A reference human genome dataset of the BGISEQ-500 sequencer
Jie Huang, Xinming Liang, Yuankai Xuan, Chunyu Geng, Yuxiang Li, Haorong Lu, Shoufang Qu, Xianglin Mei, Hongbo Chen, Ting Yu, Nan Sun, Junhua Rao, Jiahao Wang, Wenwei Zhang, Ying Chen, Sha Liao, Hui Jiang, Xin Liu, Zhaopeng Yang, Feng Mu, Shangxian Gao
TLDR
The first human whole-genome sequencing dataset of BGISEQ-500, generated by sequencing the widely used cell line HG001, can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
Abstract:
Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated their accuracy, and compared it with that of variations identified from a similar amount of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than for the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50, respectively; sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
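The abstract reports accuracy as a false positive rate (FPR) and a sensitivity but does not spell out the formulas. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), FPR = FP / (FP + TN), with true negatives taken to be confidently non-variant positions), the two metrics could be computed as below. The `CallCounts` container, the helper names, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Counts from comparing called variants against a truth set (hypothetical container)."""
    true_positives: int   # called variants that are present in the truth set
    false_positives: int  # called variants that are absent from the truth set
    false_negatives: int  # truth-set variants that were not called
    true_negatives: int   # non-variant positions with no call (assumed denominator)


def sensitivity(c: CallCounts) -> float:
    """Fraction of truth-set variants that were recovered: TP / (TP + FN)."""
    truth_total = c.true_positives + c.false_negatives
    if truth_total == 0:
        raise ValueError("truth set contains no variants")
    return c.true_positives / truth_total


def false_positive_rate(c: CallCounts) -> float:
    """Fraction of non-variant positions wrongly called: FP / (FP + TN)."""
    negatives = c.false_positives + c.true_negatives
    if negatives == 0:
        raise ValueError("no non-variant positions to evaluate")
    return c.false_positives / negatives


def as_percent(x: float, digits: int) -> str:
    """Format a fraction as a percentage string with the given number of decimals."""
    return f"{x * 100:.{digits}f}%"


# Made-up counts, chosen only to keep the arithmetic easy to follow.
example = CallCounts(true_positives=950, false_positives=2,
                     false_negatives=50, true_negatives=999_998)
print("sensitivity:", as_percent(sensitivity(example), 2))
print("FPR:", as_percent(false_positive_rate(example), 5))
```

Under these assumed definitions, the very small FPR values quoted in the abstract would follow from a denominator dominated by the large number of non-variant positions in the genome.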
Citations
Journal Article
SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data.
Chen Yuxin, Yongsheng Chen, Chunmei Shi, Huang Zhibo, Zhang Yong, Li Shengkang, Yan Li, Jia Ye, Chang Yu, Zhuo Li, Xiuqing Zhang, Jian Wang, Huanming Yang, Fang Lin, Qiang Chen
TL;DR: SOAPnuke is demonstrated as a tool with abundant functions for a “QC-Preprocess-QC” workflow, with a MapReduce acceleration framework that scales by distributing the processing work across an entire compute cluster.
Journal Article
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing
Chao Fang, Huanzi Zhong, Yuxiang Lin, Bing Chen, Mo Han, Huahui Ren, Haorong Lu, Jacob M. Luber, Min Xia, Wangsheng Li, Shayna Stein, Xun Xu, Wenwei Zhang, Radoje Drmanac, Jian Wang, Huanming Yang, Lennart Hammarström, Aleksandar Kostic, Karsten Kristiansen, Junhua Li
TL;DR: The high accuracy and technical reproducibility confirm the applicability of the new high-throughput sequencing platform BGISEQ-500 for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
Journal Article
Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing
Sarah Siu Tze Mak, Shyam Gopalakrishnan, Christian Carøe, Chunyu Geng, Shanlin Liu, Mikkel-Holger S. Sinding, Lukas F. K. Kuderna, Wenwei Zhang, Fu Shujin, Filipe G. Vieira, Mietje Germonpré, Hervé Bocherens, Sergey Fedorov, Bent O. Petersen, Thomas Sicheritz-Pontén, Tomas Marques-Bonet, Guojie Zhang, Hui Jiang, M. Thomas P. Gilbert
TL;DR: The observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.
Journal Article
Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly.
Ou Wang, Robert Chin, Xiaofang Cheng, M. Wu, Qing Mao, Jingbo Tang, Yuhui Sun, Ellis Anderson, Han K. Lam, Dan Chen, Yujun Zhou, Linying Wang, Fei Fan, Yan Zou, Yinlong Xie, Rebecca Yu Zhang, Snezana Drmanac, Darlene Nguyen, Chongjun Xu, Christian Villarosa, Scott Gablenz, Nina Barua, Staci Nguyen, Wenlan Tian, Jia Sophie Liu, Jingwan Wang, Xiao Liu, Xiaojuan Qi, Ao Chen, He Wang, Dong Yuliang, Wenwei Zhang, Andrei Alexeev, Huanming Yang, Jing Wang, Karsten Kristiansen, Xun Xu, Radoje Drmanac, Brock A. Peters
TL;DR: StLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
Journal Article
Using machine learning approaches for multi-omics data analysis: A review
TL;DR: In this article, the authors explore different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease.
References
Journal Article
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li, Richard Durbin
TL;DR: The Burrows–Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Journal Article
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Journal Article
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Mark A. DePristo, Eric Banks, Ryan Poplin, Kiran V. Garimella, Jared Maguire, Christopher Hartl, Anthony A. Philippakis, Guillermo del Angel, Manuel A. Rivas, Matt Hanna, Aaron McKenna, Timothy Fennell, Andrew Kernytsky, Andrey Sivachenko, Kristian Cibulskis, Stacey Gabriel, David Altshuler, Mark J. Daly
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Journal Article
Sequencing technologies-the next generation
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Journal Article
From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline
Géraldine A. Van der Auwera, Mauricio O. Carneiro, Christopher Hartl, Ryan Poplin, Guillermo del Angel, Ami Levy-Moonshine, Tadeusz Jordan, Khalid Shakir, David Roazen, Joel Thibault, Eric Banks, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark A. DePristo
TL;DR: This unit describes how to use BWA and the Genome Analysis Toolkit to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses.