Author

Jing Zhao

Other affiliations: Beijing Genomics Institute, Beijing Institute of Genomics

Bio: Jing Zhao is an academic researcher at the University of Copenhagen who has contributed to research on topics including genomes and genomics. The author has an h-index of 11 and has co-authored 18 publications that have received 6,047 citations. Previous affiliations include the Beijing Genomics Institute and the Beijing Institute of Genomics.

Topics: Genome, Genomics, Phylogenomics, Sequence assembly, Reference genome, ...

Papers

Journal Article

Phylogenomics resolves the timing and pattern of insect evolution

Bernhard Misof, Shanlin Liu, Karen Meusemann¹, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen², Jessica L. Ware², Tomas Flouri³, Rolf G. Beutel⁴, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco³, Torsten Wappler⁵, Jes Rust⁵, Andre J. Aberer³, Ulrike Aspöck⁶, Ulrike Aspöck⁷, Horst Aspöck⁶, Daniela Bartel⁶, Alexander Blanke⁸, Simon Berger³, Alexander Böhm⁶, Thomas R. Buckley⁹, Brett Calcott¹⁰, Junqing Chen, Frank Friedrich¹¹, Makiko Fukui¹², Mari Fujita⁸, Carola Greve, Peter Grobe, Shengchang Gu, Ying Huang, Lars S. Jermiin¹, Akito Y. Kawahara¹³, Lars Krogmann¹⁴, Martin Kubiak¹¹, Robert Lanfear¹⁵, Robert Lanfear¹⁶, Robert Lanfear¹⁷, Harald Letsch⁶, Yiyuan Li, Zhenyu Li, Jiguang Li, Haorong Lu, Ryuichiro Machida⁸, Yuta Mashimo⁸, Pashalia Kapli³, Pashalia Kapli¹⁸, Duane D. McKenna¹⁹, Guanliang Meng, Yasutaka Nakagaki⁸, José Luis Navarrete-Heredia²⁰, Michael Ott²¹, Yanxiang Ou, Günther Pass⁶, Lars Podsiadlowski⁵, Hans Pohl⁴, Björn M. von Reumont²², Kai Schütte¹¹, Kaoru Sekiya⁸, Shota Shimizu⁸, Adam Slipinski¹, Alexandros Stamatakis²³, Alexandros Stamatakis³, Wenhui Song, Xu Su, Nikolaus U. Szucsich⁶, Meihua Tan, Xuemei Tan, Min Tang, Jingbo Tang, Gerald Timelthaler⁶, Shigekazu Tomizuka⁸, Michelle D. Trautwein²⁴, Xiaoli Tong²⁵, Toshiki Uchifune⁸, Manfred Walzl⁶, Brian M. Wiegmann²⁶, Jeanne Wilbrandt, Benjamin Wipfler⁴, Thomas K. F. Wong¹, Qiong Wu, Gengxiong Wu, Yinlong Xie, Shenzhou Yang, Qing Yang, David K. Yeates¹, Kazunori Yoshizawa²⁷, Qing Zhang, Rui Zhang, Wenwei Zhang, Yunhui Zhang, Jing Zhao, Chengran Zhou, Lili Zhou, Tanja Ziesmann, Shijie Zou, Yingrui Li, Xun Xu, Yong Zhang, Huanming Yang, Jian Wang, Jun Wang, Karl M. Kjer², Xin Zhou

Commonwealth Scientific and Industrial Research Organisation¹, Rutgers University², Heidelberg Institute for Theoretical Studies³, University of Jena⁴, University of Bonn⁵, University of Vienna⁶, Naturhistorisches Museum⁷, University of Tsukuba⁸, Landcare Research⁹, Johns Hopkins University¹⁰, University of Hamburg¹¹, Ehime University¹², Florida Museum of Natural History¹³, Staatliches Museum für Naturkunde Stuttgart¹⁴, National Evolutionary Synthesis Center¹⁵, Australian National University¹⁶, Macquarie University¹⁷, American Museum of Natural History¹⁸, University of Memphis¹⁹, University of Guadalajara²⁰, Bavarian Academy of Sciences and Humanities²¹, Natural History Museum²², Karlsruhe Institute of Technology²³, California Academy of Sciences²⁴, South China Agricultural University²⁵, North Carolina State University²⁶, Hokkaido University²⁷

07 Nov 2014-Science

TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

1,998 citations

Journal Article

The genome of the cucumber, Cucumis sativus L.

Sanwen Huang, Ruiqiang Li¹, Zhonghua Zhang, Li Li, Xingfang Gu, Wei Fan, William J. Lucas², Xiaowu Wang, Bingyan Xie, Peixiang Ni, Yuanyuan Ren, Hongmei Zhu, Jun Li, Kui Lin³, Weiwei Jin⁴, Zhangjun Fei⁵, Guangcun Li, Jack E. Staub⁶, Andrzej Kilian, Edwin A. G. van der Vossen⁷, Yang Wu³, Jie Guo³, Jun He, Zhiqi Jia, Yi Ren, Geng Tian, Yao Lu, Jue Ruan⁸, Wubin Qian, Mingwei Wang, Quanfei Huang, Bo Li, Zhaoling Xuan, Jianjun Cao, Asan, Zhigang Wu, Juanbin Zhang, Qingle Cai, Yinqi Bai, Bowen Zhao⁹, Yonghua Han⁴, Ying Li, Xuefeng Li, Shenhao Wang, Qiuxiang Shi, Shiqiang Liu, Won Kyong Cho¹⁰, Jae-Yean Kim¹⁰, Yong Xu, Katarzyna Heller-Uszynska, Han Miao, Zhouchao Cheng, Shengping Zhang, Jian Wu, Yuhong Yang, Houxiang Kang, Man Li, Huiqing Liang, Xiaoli Ren, Zhongbin Shi, Ming Wen, Min Jian, Hailong Yang, Guojie Zhang⁸, Zhentao Yang, Rui Chen, Shifang Liu, Jianwen Li, Lijia Ma⁸, Hui Liu, Yan Zhou, Jing Zhao, Xiaodong Fang, Guoqing Li, Lin Fang, Yingrui Li⁸, Dongyuan Liu, Hongkun Zheng¹, Yong Zhang, Nan Qin, Zhuo Li, Guohua Yang, Shuang Yang, Lars Bolund¹¹, Karsten Kristiansen¹², Hancheng Zheng¹³, Shaochuan Li¹³, Xiuqing Zhang, Huanming Yang, Jing Wang, Rifei Sun, Zhang Baoxi, Shuzhi Jiang, Jun Wang¹², Yongchen Du, Songgang Li

University of Southern Denmark¹, University of Minnesota², Beijing Normal University³, China Agricultural University⁴, Boyce Thompson Institute for Plant Research⁵, University of Wisconsin-Madison⁶, Wageningen University and Research Centre⁷, Chinese Academy of Sciences⁸, Renmin University of China⁹, Gyeongsang National University¹⁰, Aarhus University¹¹, University of Copenhagen¹², South China University of Technology¹³

01 Dec 2009-Nature Genetics

TL;DR: This study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo, and identifies 686 gene clusters related to phloem function.

Abstract: Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage. The absence of recent whole-genome duplication, along with the presence of few tandem duplications, explains the small number of genes in the cucumber. Our study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo. The sequenced cucumber genome affords insight into traits such as its sex expression, disease resistance, biosynthesis of cucurbitacin and 'fresh green' odor. We also identify 686 gene clusters related to phloem function. The cucumber genome provides a valuable resource for developing elite cultivars and for studying the evolution and function of the plant vascular system.

1,289 citations

Journal Article

The sequence and de novo assembly of the giant panda genome

Ruiqiang Li, Wei Fan, Geng Tian¹, Hongmei Zhu, Lin He², Lin He³, Jing Cai⁴, Jing Cai¹, Quanfei Huang, Qingle Cai⁵, Bo Li, Yinqi Bai, Zhihe Zhang⁶, Ya-Ping Zhang⁴, Wen Wang⁴, Jun Li, Fuwen Wei¹, Heng Li⁷, Min Jian, Jianwen Li, Zhaolei Zhang⁸, Rasmus Nielsen⁹, Dawei Li, Wanjun Gu¹⁰, Zhentao Yang, Zhaoling Xuan, Oliver A. Ryder, Frederick C. Leung¹¹, Yan Zhou, Jianjun Cao, Xiao Sun¹⁰, Yonggui Fu¹², Xiaodong Fang, Xiaosen Guo, Bo Wang, Rong Hou⁶, Fujun Shen⁶, Bo Mu, Peixiang Ni, Runmao Lin, Wubin Qian, Guo-Dong Wang⁴, Guo-Dong Wang¹, Chang Yu, Wenhui Nie⁴, Jinhuan Wang⁴, Zhigang Wu, Huiqing Liang, Jiumeng Min⁵, Qi Wu¹, Shifeng Cheng⁵, Jue Ruan¹, Mingwei Wang, Zhongbin Shi, Ming Wen, Binghang Liu, Xiaoli Ren, Huisong Zheng, Dong Dong⁸, Kathleen Cook⁸, Gao Shan, Hao Zhang, Carolin Kosiol¹³, Xueying Xie¹⁰, Zuhong Lu¹⁰, Hancheng Zheng, Yingrui Li¹, Cynthia C. Steiner, Tommy Tsan-Yuk Lam¹¹, Siyuan Lin, Qinghui Zhang, Guoqing Li, Jing Tian, Timing Gong, Hongde Liu¹⁰, Dejin Zhang¹⁰, Lin Fang, Chen Ye, Juanbin Zhang, Wenbo Hu¹², Anlong Xu¹², Yuanyuan Ren, Guojie Zhang⁴, Guojie Zhang¹, Michael William Bruford¹⁴, Qibin Li¹, Lijia Ma¹, Yiran Guo¹, Na An, Yujie Hu¹, Yang Zheng¹, Yongyong Shi², Zhiqiang Li², Qing Liu, Yanling Chen, Jing Zhao, Ning Qu⁵, Shancen Zhao, Feng Tian, Xiaoling Wang, Haiyin Wang, Lizhi Xu, Xiao Liu, Tomas Vinar¹⁵, Yajun Wang¹⁶, Tak-Wah Lam¹¹, Siu-Ming Yiu¹¹, Shiping Liu¹⁷, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang, Guohua Yang, Zhi Jiang, Junyi Wang, Nan Qin, Li Li, Jingxiang Li, Lars Bolund, Karsten Kristiansen¹⁸, Gane Ka-Shu Wong¹⁹, Maynard V. Olson²⁰, Xiuqing Zhang, Songgang Li, Huanming Yang, Jing Wang, Jun Wang¹⁸

Chinese Academy of Sciences¹, Shanghai Jiao Tong University², Fudan University³, Kunming Institute of Zoology⁴, Shenzhen University⁵, Chengdu Research Base of Giant Panda Breeding⁶, Wellcome Trust⁷, University of Toronto⁸, University of California, Berkeley⁹, Southeast University¹⁰, University of Hong Kong¹¹, Sun Yat-sen University¹², University of Vienna¹³, Cardiff University¹⁴, Comenius University in Bratislava¹⁵, Sichuan University¹⁶, South China University of Technology¹⁷, University of Copenhagen¹⁸, University of Alberta¹⁹, University of Washington²⁰

21 Jan 2010-Nature

TL;DR: Using next-generation sequencing technology alone, the authors generated and assembled a draft sequence of the giant panda genome; their analysis of panda genes suggests that its bamboo diet might depend more on its gut microbiome than on its own genetic composition.

Abstract: Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

1,109 citations

Journal Article

The diploid genome sequence of an Asian individual.

Jun Wang, Wei Wang¹, Ruiqiang Li¹, Ruiqiang Li², Yingrui Li³, Yingrui Li⁴, Yingrui Li¹, Geng Tian⁵, Geng Tian¹, Laurie Goodman¹, Wei Fan¹, Junqing Zhang¹, Jun Li¹, Juanbin Zhang¹, Yiran Guo⁵, Yiran Guo¹, Binxiao Feng¹, Heng Li⁶, Heng Li¹, Yao Lu¹, Xiaodong Fang¹, Huiqing Liang¹, Zhenglin Du¹, Dong Li¹, Yiqing Zhao⁵, Yiqing Zhao¹, Yujie Hu¹, Yujie Hu⁵, Zhenzhen Yang¹, Hancheng Zheng¹, Ines Hellmann⁷, Michael Inouye⁶, John E. Pool⁷, Xin Yi⁵, Xin Yi¹, Jing Zhao¹, Jinjie Duan¹, Yan Zhou¹, Junjie Qin⁵, Junjie Qin¹, Lijia Ma⁵, Lijia Ma¹, Guoqing Li¹, Zhentao Yang¹, Guojie Zhang¹, Guojie Zhang⁵, Bin Yang¹, Chang Yu¹, Fang Liang⁵, Fang Liang¹, Wenjie Li¹, Shaochuan Li¹, Dawei Li¹, Peixiang Ni¹, Jue Ruan⁵, Jue Ruan¹, Qibin Li¹, Qibin Li⁵, Hongmei Zhu¹, Dongyuan Liu¹, Zhike Lu¹, Ning Li⁵, Ning Li¹, Guangwu Guo¹, Guangwu Guo⁵, Jianguo Zhang¹, Jia Ye¹, Lin Fang¹, Qin Hao⁵, Qin Hao¹, Quan Chen¹, Quan Chen³, Yu Liang⁵, Yu Liang¹, Yeyang Su⁵, Yeyang Su¹, Asan⁵, Asan¹, Cuo Ping¹, Cuo Ping⁵, Shuang Yang¹, Fang Chen¹, Fang Chen⁵, Li Li¹, Ke Zhou¹, Hongkun Zheng¹, Hongkun Zheng², Yuanyuan Ren¹, Ling Yang¹, Yang Gao⁴, Yang Gao¹, Guohua Yang¹, Guohua Yang⁸, Zhuo Li¹, Xiaoli Feng¹, Karsten Kristiansen², Gane Ka-Shu Wong¹, Gane Ka-Shu Wong⁹, Rasmus Nielsen⁷, Richard Durbin⁶, Lars Bolund¹, Lars Bolund¹⁰, Xiuqing Zhang¹, Xiuqing Zhang⁴, Songgang Li³, Songgang Li⁸, Songgang Li¹, Huanming Yang¹, Huanming Yang⁸, Jian Wang⁸, Jian Wang¹

Beijing Genomics Institute¹, University of Southern Denmark², Peking University³, Beijing Institute of Genomics⁴, Chinese Academy of Sciences⁵, Wellcome Trust Sanger Institute⁶, University of California, Berkeley⁷, Shenzhen University⁸, University of Alberta⁹, Aarhus University¹⁰

06 Nov 2008-Nature

TL;DR: Genotyping analysis showed that SNP identification was highly accurate and consistent, indicating the high sequence quality of this assembly and demonstrating the potential usefulness of next-generation sequencing technologies for personal genomics.

Abstract: Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
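The "36-fold average coverage" above is the total number of sequenced bases divided by the genome size. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the 100 bp read length and 3 Gb genome size in the example are illustrative assumptions, not figures from the paper.

```python
import math


def mean_fold_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth c = N * L / G: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Invert c = N * L / G to get the read count N for a target depth, rounded up."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Illustrative assumptions only: 100 bp reads and a 3 Gb genome.
n = reads_needed(36, 100, 3_000_000_000)
print(n)                                          # 1080000000
print(mean_fold_coverage(n, 100, 3_000_000_000))  # 36.0
```

This is only the mean depth; real coverage is uneven, so some regions end up well below the average.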

963 citations

Journal Article

Ancient human genome sequence of an extinct Palaeo-Eskimo

Morten Rasmussen¹, Yingrui Li¹, Stinus Lindgreen¹, Jakob Skou Pedersen¹, Anders Albrechtsen¹, Ida Moltke¹, Mait Metspalu², Ene Metspalu², Toomas Kivisild³, Toomas Kivisild², Ramneek Gupta⁴, Marcelo Bertalan⁴, Kasper Nielsen⁴, M. Thomas P. Gilbert¹, Yong Wang⁵, Maanasa Raghavan¹, Maanasa Raghavan⁶, Paula F. Campos¹, Hanne Munkholm Kamp¹, Andrew Wilson⁷, Andrew Gledhill⁷, Silvana R. Tridico⁸, Silvana R. Tridico⁹, Michael Bunce⁹, Eline D. Lorenzen¹, Jonas Binladen¹, Xiaosen Guo¹, Jing Zhao¹, Xiuqing Zhang¹, Hao Zhang¹, Zhuo Li¹, Minfeng Chen¹, Ludovic Orlando¹⁰, Karsten Kristiansen¹, Mads Bak¹, Niels Tommerup¹, Christian Bendixen¹¹, Tracey Pierre³, Bjarne Grønnow, Morten Meldgaard¹, Claus Andreasen, S. A. Fedorova¹², S. A. Fedorova², Ludmila P. Osipova¹³, Thomas Higham⁶, Christopher Bronk Ramsey⁷, Thomas Hansen¹, Finn Cilius Nielsen¹, Michael H. Crawford¹⁴, Søren Brunak⁴, Søren Brunak¹, Thomas Sicheritz-Pontén⁴, Richard Villems², Rasmus Nielsen¹, Rasmus Nielsen⁵, Anders Krogh¹, Jun Wang¹, Eske Willerslev¹

University of Copenhagen¹, Estonian Biocentre², University of Cambridge³, Technical University of Denmark⁴, University of California, Berkeley⁵, University of Oxford⁶, University of Bradford⁷, Australian Federal Police⁸, Murdoch University⁹, École normale supérieure de Lyon¹⁰, Aarhus University¹¹, Russian Academy¹², Russian Academy of Sciences¹³, University of Kansas¹⁴

11 Feb 2010-Nature

TL;DR: This genome sequence of an ancient human obtained from ∼4,000-year-old permafrost-preserved hair provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.

Abstract: We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.

749 citations

Cited by

Journal Article

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
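The MapReduce idea described in this abstract can be illustrated with a toy coverage calculator: a map step turns each aligned read into per-locus counts, and a reduce step merges those counts into a depth table. This is a minimal Python sketch under that assumption; it does not use the GATK's actual Java API, and the read tuples and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical read representation: (contig, start, end), 0-based, end-exclusive.
Read = tuple[str, int, int]


def map_read(read: Read) -> Counter:
    """Map step: emit a count of 1 for every reference locus the read covers."""
    contig, start, end = read
    return Counter((contig, pos) for pos in range(start, end))


def merge_counts(acc: Counter, part: Counter) -> Counter:
    """Reduce step: fold one read's counts into the running per-locus depth table."""
    acc.update(part)
    return acc


def depth_of_coverage(reads: list[Read]) -> Counter:
    """Combine map and reduce; loci that no read covers read back as depth 0."""
    return reduce(merge_counts, map(map_read, reads), Counter())


reads = [("chr1", 0, 5), ("chr1", 3, 8), ("chr1", 4, 6)]
depth = depth_of_coverage(reads)
print(depth[("chr1", 4)])  # 3
```

Because each map call looks at a single read and merging counts is associative, the work could in principle be split across cores or machines and the partial tables merged afterwards.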

20,557 citations

Journal Article

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.
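Burrows-Wheeler indexing lets an aligner count exact occurrences of a read by walking the pattern backwards through the transformed text (FM-index backward search). The Python sketch below illustrates only that exact-matching core on a toy string; it is not Bowtie's implementation and omits the quality-aware backtracking that Bowtie uses to tolerate mismatches. Occurrence counts are computed naively here, whereas practical indexes precompute sampled tables; the function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy inputs only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"  # '$' sorts before A, C, G, T, so rotation order equals suffix order
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_exact_matches(text: str, pattern: str) -> int:
    """Count exact occurrences of pattern in text with FM-index backward search."""
    last = bwt(text)
    # first_row[c]: number of characters in text + '$' that sort strictly before c,
    # i.e. the index of the first sorted row that starts with c.
    first_row: dict[str, int] = {}
    total = 0
    for c in sorted(set(last)):
        first_row[c] = total
        total += last.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in last[:i], computed naively.
        return last[:i].count(c)

    lo, hi = 0, len(last)  # half-open range of sorted rows matching the suffix so far
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                         # annb$aa
print(count_exact_matches("banana", "ana"))  # 2
```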

20,335 citations

Journal Article

A Map of Human Genome Variation From Population-Scale Sequencing

Gonçalo R. Abecasis¹, David Altshuler², David Altshuler³, Adam Auton⁴, Lisa D Brooks⁵, Richard Durbin⁶, Richard A. Gibbs⁷, Matthew E. Hurles⁶, Gil McVean⁴

University of Michigan¹, Harvard University², Broad Institute³, University of Oxford⁴, Johns Hopkins University⁵, Wellcome Trust Sanger Institute⁶, Baylor College of Medicine⁷

28 Oct 2010-Nature

TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype; this paper reports the results of the project's pilot phase, which was designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.

Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
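To put the estimated rate of about 10⁻⁸ substitutions per base pair per generation in perspective, it can be multiplied by the number of base pairs a child inherits to get an expected count of new substitutions per person. The sketch below assumes a haploid human genome of roughly 3.1 × 10⁹ bp (a figure not stated in the abstract) and treats the rate as uniform across the genome; the function name is hypothetical.

```python
def expected_de_novo_substitutions(rate_per_bp: float, haploid_genome_bp: float) -> float:
    """Expected new base substitutions in one child: the per-bp rate applies to
    both transmitted haploid genomes (one from each parent)."""
    return 2 * haploid_genome_bp * rate_per_bp


# Assumptions: ~3.1e9 bp haploid genome; rate of ~1e-8 per bp per generation.
print(round(expected_de_novo_substitutions(1e-8, 3.1e9)))  # 62
```

The result is only an order-of-magnitude illustration (tens of new substitutions per child), not a figure reported in the abstract.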

7,538 citations

Journal Article

Sequencing technologies-the next generation

Michael L. Metzker¹

Baylor College of Medicine¹

01 Jan 2010-Nature Reviews Genetics

TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.

Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations

Journal Article

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Ruibang Luo¹, Binghang Liu¹, Yinlong Xie², Yinlong Xie¹, Zhenyu Li¹, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W. Cheung¹, Siu-Ming Yiu¹, Shaoliang Peng³, Zhu Xiao-qian³, Guangming Liu³, Xiangke Liao³, Yingrui Li¹, Huanming Yang, Jian Wang, Tak-Wah Lam¹, Jun Wang

University of Hong Kong¹, South China University of Technology², National University of Defense Technology³

27 Dec 2012-GigaScience

TL;DR: This work introduces SOAPdenovo2, whose new algorithm design reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing and is optimized for large genomes, and uses it to provide an updated assembly of the 2008 Asian genome.

Abstract: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and optimizes for large genome. Benchmark using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive to other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower during the point of largest memory consumption.
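Contig and scaffold N50, as quoted for the YH genome in this abstract, summarize assembly contiguity. Under one common convention, N50 is the largest length L such that contigs (or scaffolds) of length at least L together contain at least half of all assembled bases. A minimal Python sketch with toy lengths (not data from the paper); the function name is hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """N50: the largest length L such that sequences of length >= L hold
    at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig or scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly (integer-safe check)
            return length
    raise AssertionError("unreachable for non-negative lengths")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the three longest sequences (10 + 9 + 8 = 27) already hold half of the 54 assembled bases, so N50 is 8.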

4,284 citations
