Author

Juanbin Zhang

Other affiliations: Beijing Institute of Genomics

Bio: Juanbin Zhang is an academic researcher from Beijing Genomics Institute. The author has contributed to research on the topics Genome and Genomics. The author has an h-index of 5 and has co-authored 5 publications receiving 6985 citations. Previous affiliations of Juanbin Zhang include Beijing Institute of Genomics.

Topics: Genome, Genomics, Sequence assembly, Reference genome, Cancer genome sequencing
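
For readers unfamiliar with the h-index quoted in the bio, the following is a minimal sketch of the standard definition (the largest h such that h papers each have at least h citations). The function name `h_index` is illustrative, and the citation counts are simply the per-paper figures shown on this page; they need not sum to the total quoted in the bio, which may have been computed at a different time.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Per-paper citation counts as listed on this page (illustrative input).
page_counts = [4184, 1289, 1109, 963, 18]
print(h_index(page_counts))  # 5
```

With only five papers, the h-index cannot exceed 5, so even the least-cited entry (18 citations) is enough to reach the maximum.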

Papers

Journal Article•DOI•

Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases

Xi Chen¹, Yi Ba², Lijia Ma³, Lijia Ma⁴, Xing Cai¹, Yuan Yin¹, Kehui Wang¹, Jigang Guo¹, Yujing Zhang¹, Jiangning Chen¹, Xing Guo¹, Qibin Li⁴, Qibin Li³, Xiaoying Li⁵, Wenjing Wang⁶, Yan Zhang¹, Jin Wang¹, Xueyuan Jiang¹, Yang Xiang¹, Chen Xu¹, Pingping Zheng⁴, Juanbin Zhang⁴, Ruiqiang Li⁴, Hongjie Zhang¹, Xiaobin Shang⁷, Ting Gong⁷, Guang Ning⁵, Jun Wang⁴, Jun Wang³, Ke Zen¹, Junfeng Zhang¹, Chen-Yu Zhang¹

Nanjing University¹, Tianjin Medical University Cancer Institute and Hospital², Beijing Institute of Genomics³, Beijing Genomics Institute⁴, Shanghai Jiao Tong University⁵, Centers for Disease Control and Prevention⁶, Tianjin Medical University⁷

01 Oct 2008-Cell Research

TL;DR: It is demonstrated that miRNAs are present in the serum and plasma of humans and other animals such as mice, rats, bovine fetuses, calves, and horses, and can serve as potential biomarkers for the detection of various cancers and other diseases.

Abstract: Dysregulated expression of microRNAs (miRNAs) in various tissues has been associated with a variety of diseases, including cancers. Here we demonstrate that miRNAs are present in the serum and plasma of humans and other animals such as mice, rats, bovine fetuses, calves, and horses. The levels of miRNAs in serum are stable, reproducible, and consistent among individuals of the same species. Employing Solexa, we sequenced all serum miRNAs of healthy Chinese subjects and found over 100 and 91 serum miRNAs in male and female subjects, respectively. We also identified specific expression patterns of serum miRNAs for lung cancer, colorectal cancer, and diabetes, providing evidence that serum miRNAs contain fingerprints for various diseases. Two non-small cell lung cancer-specific serum miRNAs obtained by Solexa were further validated in an independent trial of 75 healthy donors and 152 cancer patients, using quantitative reverse transcription polymerase chain reaction assays. Through these analyses, we conclude that serum miRNAs can serve as potential biomarkers for the detection of various cancers and other diseases.

4,184 citations

Journal Article•DOI•

The genome of the cucumber, Cucumis sativus L.

Sanwen Huang, Ruiqiang Li¹, Zhonghua Zhang, Li Li, Xingfang Gu, Wei Fan, William J. Lucas², Xiaowu Wang, Bingyan Xie, Peixiang Ni, Yuanyuan Ren, Hongmei Zhu, Jun Li, Kui Lin³, Weiwei Jin⁴, Zhangjun Fei⁵, Guangcun Li, Jack E. Staub⁶, Andrzej Kilian, Edwin A. G. van der Vossen⁷, Yang Wu³, Jie Guo³, Jun He, Zhiqi Jia, Yi Ren, Geng Tian, Yao Lu, Jue Ruan⁸, Wubin Qian, Mingwei Wang, Quanfei Huang, Bo Li, Zhaoling Xuan, Jianjun Cao, Asan, Zhigang Wu, Juanbin Zhang, Qingle Cai, Yinqi Bai, Bowen Zhao⁹, Yonghua Han⁴, Ying Li, Xuefeng Li, Shenhao Wang, Qiuxiang Shi, Shiqiang Liu, Won Kyong Cho¹⁰, Jae-Yean Kim¹⁰, Yong Xu, Katarzyna Heller-Uszynska, Han Miao, Zhouchao Cheng, Shengping Zhang, Jian Wu, Yuhong Yang, Houxiang Kang, Man Li, Huiqing Liang, Xiaoli Ren, Zhongbin Shi, Ming Wen, Min Jian, Hailong Yang, Guojie Zhang⁸, Zhentao Yang, Rui Chen, Shifang Liu, Jianwen Li, Lijia Ma⁸, Hui Liu, Yan Zhou, Jing Zhao, Xiaodong Fang, Guoqing Li, Lin Fang, Yingrui Li⁸, Dongyuan Liu, Hongkun Zheng¹, Yong Zhang, Nan Qin, Zhuo Li, Guohua Yang, Shuang Yang, Lars Bolund¹¹, Karsten Kristiansen¹², Hancheng Zheng¹³, Shaochuan Li¹³, Xiuqing Zhang, Huanming Yang, Jing Wang, Rifei Sun, Zhang Baoxi, Shuzhi Jiang, Jun Wang¹², Yongchen Du, Songgang Li

University of Southern Denmark¹, University of Minnesota², Beijing Normal University³, China Agricultural University⁴, Boyce Thompson Institute for Plant Research⁵, University of Wisconsin-Madison⁶, Wageningen University and Research Centre⁷, Chinese Academy of Sciences⁸, Renmin University of China⁹, Gyeongsang National University¹⁰, Aarhus University¹¹, University of Copenhagen¹², South China University of Technology¹³

01 Dec 2009-Nature Genetics

TL;DR: This study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo, and identifies 686 gene clusters related to phloem function.

Abstract: Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage. The absence of recent whole-genome duplication, along with the presence of few tandem duplications, explains the small number of genes in the cucumber. Our study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo. The sequenced cucumber genome affords insight into traits such as its sex expression, disease resistance, biosynthesis of cucurbitacin and 'fresh green' odor. We also identify 686 gene clusters related to phloem function. The cucumber genome provides a valuable resource for developing elite cultivars and for studying the evolution and function of the plant vascular system.

1,289 citations

Journal Article•DOI•

The sequence and de novo assembly of the giant panda genome

Ruiqiang Li, Wei Fan, Geng Tian¹, Hongmei Zhu, Lin He², Lin He³, Jing Cai⁴, Jing Cai¹, Quanfei Huang, Qingle Cai⁵, Bo Li, Yinqi Bai, Zhihe Zhang⁶, Ya-Ping Zhang⁴, Wen Wang⁴, Jun Li, Fuwen Wei¹, Heng Li⁷, Min Jian, Jianwen Li, Zhaolei Zhang⁸, Rasmus Nielsen⁹, Dawei Li, Wanjun Gu¹⁰, Zhentao Yang, Zhaoling Xuan, Oliver A. Ryder, Frederick C. Leung¹¹, Yan Zhou, Jianjun Cao, Xiao Sun¹⁰, Yonggui Fu¹², Xiaodong Fang, Xiaosen Guo, Bo Wang, Rong Hou⁶, Fujun Shen⁶, Bo Mu, Peixiang Ni, Runmao Lin, Wubin Qian, Guo-Dong Wang¹, Guo-Dong Wang⁴, Chang Yu, Wenhui Nie⁴, Jinhuan Wang⁴, Zhigang Wu, Huiqing Liang, Jiumeng Min⁵, Qi Wu¹, Shifeng Cheng⁵, Jue Ruan¹, Mingwei Wang, Zhongbin Shi, Ming Wen, Binghang Liu, Xiaoli Ren, Huisong Zheng, Dong Dong⁸, Kathleen Cook⁸, Gao Shan, Hao Zhang, Carolin Kosiol¹³, Xueying Xie¹⁰, Zuhong Lu¹⁰, Hancheng Zheng, Yingrui Li¹, Cynthia C. Steiner, Tommy Tsan-Yuk Lam¹¹, Siyuan Lin, Qinghui Zhang, Guoqing Li, Jing Tian, Timing Gong, Hongde Liu¹⁰, Dejin Zhang¹⁰, Lin Fang, Chen Ye, Juanbin Zhang, Wenbo Hu¹², Anlong Xu¹², Yuanyuan Ren, Guojie Zhang⁴, Guojie Zhang¹, Michael William Bruford¹⁴, Qibin Li¹, Lijia Ma¹, Yiran Guo¹, Na An, Yujie Hu¹, Yang Zheng¹, Yongyong Shi³, Zhiqiang Li³, Qing Liu, Yanling Chen, Jing Zhao, Ning Qu⁵, Shancen Zhao, Feng Tian, Xiaoling Wang, Haiyin Wang, Lizhi Xu, Xiao Liu, Tomas Vinar¹⁵, Yajun Wang¹⁶, Tak-Wah Lam¹¹, Siu-Ming Yiu¹¹, Shiping Liu¹⁷, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang, Guohua Yang, Zhi Jiang, Junyi Wang, Nan Qin, Li Li, Jingxiang Li, Lars Bolund, Karsten Kristiansen¹⁸, Gane Ka-Shu Wong¹⁹, Maynard V. Olson²⁰, Xiuqing Zhang, Songgang Li, Huanming Yang, Jing Wang, Jun Wang¹⁸

Chinese Academy of Sciences¹, Fudan University², Shanghai Jiao Tong University³, Kunming Institute of Zoology⁴, Shenzhen University⁵, Chengdu Research Base of Giant Panda Breeding⁶, Wellcome Trust⁷, University of Toronto⁸, University of California, Berkeley⁹, Southeast University¹⁰, University of Hong Kong¹¹, Sun Yat-sen University¹², University of Vienna¹³, Cardiff University¹⁴, Comenius University in Bratislava¹⁵, Sichuan University¹⁶, South China University of Technology¹⁷, University of Copenhagen¹⁸, University of Alberta¹⁹, University of Washington²⁰

21 Jan 2010-Nature

TL;DR: Using next-generation sequencing technology alone, a draft sequence of the giant panda genome is generated and assembled; an assessment of panda genes indicates that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition.

Abstract: Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

1,109 citations

Journal Article•DOI•

The diploid genome sequence of an Asian individual.

Jun Wang, Wei Wang¹, Ruiqiang Li¹, Ruiqiang Li², Yingrui Li¹, Yingrui Li³, Yingrui Li⁴, Geng Tian⁵, Geng Tian¹, Laurie Goodman¹, Wei Fan¹, Junqing Zhang¹, Jun Li¹, Juanbin Zhang¹, Yiran Guo¹, Yiran Guo⁵, Binxiao Feng¹, Heng Li⁶, Heng Li¹, Yao Lu¹, Xiaodong Fang¹, Huiqing Liang¹, Zhenglin Du¹, Dong Li¹, Yiqing Zhao¹, Yiqing Zhao⁵, Yujie Hu⁵, Yujie Hu¹, Zhenzhen Yang¹, Hancheng Zheng¹, Ines Hellmann⁷, Michael Inouye⁶, John E. Pool⁷, Xin Yi¹, Xin Yi⁵, Jing Zhao¹, Jinjie Duan¹, Yan Zhou¹, Junjie Qin⁵, Junjie Qin¹, Lijia Ma⁵, Lijia Ma¹, Guoqing Li¹, Zhentao Yang¹, Guojie Zhang¹, Guojie Zhang⁵, Bin Yang¹, Chang Yu¹, Fang Liang¹, Fang Liang⁵, Wenjie Li¹, Shaochuan Li¹, Dawei Li¹, Peixiang Ni¹, Jue Ruan¹, Jue Ruan⁵, Qibin Li¹, Qibin Li⁵, Hongmei Zhu¹, Dongyuan Liu¹, Zhike Lu¹, Ning Li¹, Ning Li⁵, Guangwu Guo¹, Guangwu Guo⁵, Jianguo Zhang¹, Jia Ye¹, Lin Fang¹, Qin Hao¹, Qin Hao⁵, Quan Chen¹, Quan Chen⁴, Yu Liang¹, Yu Liang⁵, Yeyang Su¹, Yeyang Su⁵, A. san¹, A. san⁵, Cuo Ping⁵, Cuo Ping¹, Shuang Yang¹, Fang Chen¹, Fang Chen⁵, Li Li¹, Ke Zhou¹, Hongkun Zheng¹, Hongkun Zheng², Yuanyuan Ren¹, Ling Yang¹, Yang Gao³, Yang Gao¹, Guohua Yang¹, Guohua Yang⁸, Zhuo Li¹, Xiaoli Feng¹, Karsten Kristiansen², Gane Ka-Shu Wong⁹, Gane Ka-Shu Wong¹, Rasmus Nielsen⁷, Richard Durbin⁶, Lars Bolund¹, Lars Bolund¹⁰, Xiuqing Zhang¹, Xiuqing Zhang³, Songgang Li⁸, Songgang Li⁴, Songgang Li¹, Huanming Yang⁸, Huanming Yang¹, Jian Wang⁸, Jian Wang¹

Beijing Genomics Institute¹, University of Southern Denmark², Beijing Institute of Genomics³, Peking University⁴, Chinese Academy of Sciences⁵, Wellcome Trust Sanger Institute⁶, University of California, Berkeley⁷, Shenzhen University⁸, University of Alberta⁹, Aarhus University¹⁰

06 Nov 2008-Nature

TL;DR: The first diploid genome sequence of an Asian individual is presented; genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of the assembly and the potential usefulness of next-generation sequencing technologies for personal genomics.

Abstract: Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

963 citations

Journal Article•DOI•

The sequence and de novo assembly of the giant panda genome [Correction]

Ruiqiang Li¹, Ruiqiang Li², Wei Fan², Geng Tian², Geng Tian³, Zhu Hongmei², Lin He⁴, Lin He⁵, Jing Cai⁶, Jing Cai³, Quanfei Huang², Qingle Cai², Qingle Cai⁷, Bo Li², Yinqi Bai², Zhihe Zhang⁸, Ya-Ping Zhang⁶, Wen Wang⁶, Jun Li², Fuwen Wei, Heng Li⁹, Min Jian², Jianwen Li², Zhaolei Zhang¹⁰, Rasmus Nielsen¹¹, Dawei Li², Wanjun Gu¹², Zhentao Yang², Zhaoling Xuan², Oliver A. Ryder, Frederick C. Leung¹³, Yan Zhou², Jianjun Cao², Xiao Sun¹², Yonggui Fu¹⁴, Xiaodong Fang², Xiaosen Guo², Bo Wang², Rong Hou⁸, Fujun Shen⁸, Bo Mu², Peixiang Ni², Runmao Lin², Wubin Qian², Guo-Dong Wang⁶, Guo-Dong Wang³, Chang Yu², Wenhui Nie⁶, Jinhuan Wang⁶, Zhigang Wu², Huiqing Liang², Jiumeng Min⁷, Jiumeng Min², Qi Wu, Shifeng Cheng⁷, Shifeng Cheng², Jue Ruan³, Jue Ruan², Mingwei Wang², Zhongbin Shi², Ming Wen², Binghang Liu², Xiaoli Ren², Huisong Zheng², Dong Dong¹⁰, Kathleen Cook¹⁰, Gao Shan², Hao Zhang², Carolin Kosiol¹⁵, Xueying Xie¹², Zuhong Lu¹², Hancheng Zheng², Yingrui Li², Yingrui Li³, Cynthia C. Steiner, Tommy Tsan-Yuk Lam¹³, Siyuan Lin², Qinghui Zhang², Guoqing Li², Jing Tian², Timing Gong², Hongde Liu¹², Dejin Zhang¹², Lin Fang², Chen Ye², Juanbin Zhang², Wenbo Hu¹⁴, Anlong Xu¹⁴, Yuanyuan Ren², Guojie Zhang², Guojie Zhang³, Guojie Zhang⁶, Michael William Bruford¹⁶, Qibin Li³, Qibin Li², Lijia Ma³, Lijia Ma², Yiran Guo³, Yiran Guo², Na An², Yujie Hu², Yujie Hu³, Yang Zheng², Yang Zheng³, Yongyong Shi⁴, Zhiqiang Li⁴, Qing Liu², Yanling Chen², Jing Zhao², Ning Qu⁷, Ning Qu², Shancen Zhao², Feng Tian², Xiaoling Wang², Haiyin Wang², Lizhi Xu², Xiao Liu², Tomas Vinar¹⁷, Yajun Wang¹⁸, Tak-Wah Lam¹³, Siu-Ming Yiu¹³, Shiping Liu¹⁹, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang², Guohua Yang², Zhi Jiang², Junyi Wang², Nan Qin², Li Li², Jingxiang Li², Lars Bolund², Karsten Kristiansen², Karsten Kristiansen¹, Gane Ka-Shu Wong², Gane Ka-Shu Wong²⁰, Maynard V. Olson²¹, Xiuqing Zhang², Songgang Li², Huanming Yang², Jian Wang², Jun Wang¹, Jun Wang²

University of Copenhagen¹, Beijing Institute of Genomics², Chinese Academy of Sciences³, Shanghai Jiao Tong University⁴, Fudan University⁵, Kunming Institute of Zoology⁶, Shenzhen University⁷, Chengdu Research Base of Giant Panda Breeding⁸, Wellcome Trust⁹, University of Toronto¹⁰, University of California, Berkeley¹¹, Southeast University¹², University of Hong Kong¹³, Sun Yat-sen University¹⁴, University of Veterinary Medicine Vienna¹⁵, Cardiff University¹⁶, Comenius University in Bratislava¹⁷, Sichuan University¹⁸, South China University of Technology¹⁹, University of Alberta²⁰, University of Washington²¹

25 Feb 2010-Nature

TL;DR: This correction fixes the Latin species name of the giant panda, which the original article gave incorrectly as Ailuropoda melanoleura; the correct name is Ailuropoda melanoleuca.

Abstract: Nature 463, 311–317 (2010) In this Article, the Latin species name of the giant panda was written incorrectly as Ailuropoda melanoleura. The correct name is Ailuropoda melanoleuca.

18 citations

Cited by

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations

Journal Article•DOI•

A Map of Human Genome Variation From Population-Scale Sequencing

Gonçalo R. Abecasis¹, David Altshuler², David Altshuler³, Adam Auton⁴, Lisa D Brooks⁵, Richard Durbin⁶, Richard A. Gibbs⁷, Matthew E. Hurles⁶, Gil McVean⁴

University of Michigan¹, Harvard University², Broad Institute³, University of Oxford⁴, Johns Hopkins University⁵, Wellcome Trust Sanger Institute⁶, Baylor College of Medicine⁷

28 Oct 2010-Nature

TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype; this paper presents the results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal Article•DOI•

Sequencing technologies-the next generation

Michael L. Metzker¹

Baylor College of Medicine¹

01 Jan 2010-Nature Reviews Genetics

TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.

Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations

Journal Article•DOI•

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Ruibang Luo¹, Binghang Liu¹, Yinlong Xie², Yinlong Xie¹, Zhenyu Li¹, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung¹, Siu-Ming Yiu¹, Shaoliang Peng³, Zhu Xiao-qian³, Guangming Liu³, Xiangke Liao³, Yingrui Li¹, Huanming Yang, Jian Wang, Tak-Wah Lam¹, Jun Wang

University of Hong Kong¹, South China University of Technology², National University of Defense Technology³

27 Dec 2012-GigaScience

TL;DR: This work provides an updated assembly version of the 2008 Asian genome using SOAPdenovo2, a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome.

Abstract: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome. Benchmark using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive to other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower during the point of largest memory consumption.

4,284 citations
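
The SOAPdenovo2 abstract above reports contig and scaffold N50 values. As a rough illustration only, the sketch below computes N50 under its common definition: the length of the sequence at which the running total, taken over sequences sorted longest first, reaches at least half of the total assembly length. The function name `n50` and the toy lengths are assumptions for illustration; this is not code from SOAPdenovo2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted longest first; N50 is the length of the sequence
    at which the cumulative sum first reaches half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Toy example (hypothetical lengths, not data from the paper).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 8
```

A higher N50 means that more of the assembly sits in long sequences, which is why the abstract uses it to compare the updated YH assembly with the first published version.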
