Home
/
Authors
/
Qibin Li

Author

Qibin Li

Other affiliations: Chinese Academy of Sciences, Beijing Institute of Genomics

Bio: Qibin Li is an academic researcher from Beijing Genomics Institute. The author has contributed to research in topics: Genome & Exome sequencing. The author has an hindex of 11, co-authored 15 publications receiving 18347 citations. Previous affiliations of Qibin Li include Chinese Academy of Sciences & Beijing Institute of Genomics.

Topics: Genome, Exome sequencing, Genomics, Cancer, Whole genome sequencing ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

A global reference for human genetic variation.

[...]

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.

...read moreread less

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

...read moreread less

12,661 citations

Journal Article•DOI•

Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases

[...]

Xi Chen¹, Yi Ba², Lijia Ma³, Lijia Ma⁴, Xing Cai¹, Yuan Yin¹, Kehui Wang¹, Jigang Guo¹, Yujing Zhang¹, Jiangning Chen¹, Xing Guo¹, Qibin Li³, Qibin Li⁴, Xiaoying Li⁵, Wenjing Wang⁶, Yan Zhang¹, Jin Wang¹, Xueyuan Jiang¹, Yang Xiang¹, Chen Xu¹, Pingping Zheng⁴, Juanbin Zhang⁴, Ruiqiang Li⁴, Hongjie Zhang¹, Xiaobin Shang⁷, Ting Gong⁷, Guang Ning⁵, Jun Wang⁴, Jun Wang³, Ke Zen¹, Junfeng Zhang¹, Chen-Yu Zhang¹ - Show less +28 more•Institutions (7)

Nanjing University¹, Tianjin Medical University Cancer Institute and Hospital², Beijing Institute of Genomics³, Beijing Genomics Institute⁴, Shanghai Jiao Tong University⁵, Centers for Disease Control and Prevention⁶, Tianjin Medical University⁷

01 Oct 2008-Cell Research

TL;DR: It is demonstrated that miRNAs are present in the serum and plasma of humans and other animals such as mice, rats, bovine fetuses, calves, and horses, and can serve as potential biomarkers for the detection of various cancers and other diseases.

...read moreread less

Abstract: Dysregulated expression of microRNAs (miRNAs) in various tissues has been associated with a variety of diseases, including cancers. Here we demonstrate that miRNAs are present in the serum and plasma of humans and other animals such as mice, rats, bovine fetuses, calves, and horses. The levels of miRNAs in serum are stable, reproducible, and consistent among individuals of the same species. Employing Solexa, we sequenced all serum miRNAs of healthy Chinese subjects and found over 100 and 91 serum miRNAs in male and female subjects, respectively. We also identified specific expression patterns of serum miRNAs for lung cancer, colorectal cancer, and diabetes, providing evidence that serum miRNAs contain fingerprints for various diseases. Two non-small cell lung cancer-specific serum miRNAs obtained by Solexa were further validated in an independent trial of 75 healthy donors and 152 cancer patients, using quantitative reverse transcription polymerase chain reaction assays. Through these analyses, we conclude that serum miRNAs can serve as potential biomarkers for the detection of various cancers and other diseases.

...read moreread less

4,184 citations

Journal Article•DOI•

International network of cancer genome projects

[...]

Thomas J. Hudson¹, Thomas J. Hudson², Warwick Anderson³, Axel Aretz⁴ +270 more•Institutions (92)

15 Apr 2010

TL;DR: Systematic studies of more than 25,000 cancer genomes will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

...read moreread less

Abstract: The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

...read moreread less

2,041 citations

Journal Article•DOI•

The sequence and de novo assembly of the giant panda genome

[...]

Ruiqiang Li, Wei Fan, Geng Tian¹, Hongmei Zhu, Lin He², Lin He³, Jing Cai¹, Jing Cai⁴, Quanfei Huang, Qingle Cai⁵, Bo Li, Yinqi Bai, Zhihe Zhang⁶, Ya-Ping Zhang⁴, Wen Wang⁴, Jun Li, Fuwen Wei¹, Heng Li⁷, Min Jian, Jianwen Li, Zhaolei Zhang⁸, Rasmus Nielsen⁹, Dawei Li, Wanjun Gu¹⁰, Zhentao Yang, Zhaoling Xuan, Oliver A. Ryder, Frederick C. Leung¹¹, Yan Zhou, Jianjun Cao, Xiao Sun¹⁰, Yonggui Fu¹², Xiaodong Fang, Xiaosen Guo, Bo Wang, Rong Hou⁶, Fujun Shen⁶, Bo Mu, Peixiang Ni, Runmao Lin, Wubin Qian, Guo-Dong Wang¹, Guo-Dong Wang⁴, Chang Yu, Wenhui Nie⁴, Jinhuan Wang⁴, Zhigang Wu, Huiqing Liang, Jiumeng Min⁵, Qi Wu¹, Shifeng Cheng⁵, Jue Ruan¹, Mingwei Wang, Zhongbin Shi, Ming Wen, Binghang Liu, Xiaoli Ren, Huisong Zheng, Dong Dong⁸, Kathleen Cook⁸, Gao Shan, Hao Zhang, Carolin Kosiol¹³, Xueying Xie¹⁰, Zuhong Lu¹⁰, Hancheng Zheng, Yingrui Li¹, Cynthia C. Steiner, Tommy Tsan-Yuk Lam¹¹, Siyuan Lin, Qinghui Zhang, Guoqing Li, Jing Tian, Timing Gong, Hongde Liu¹⁰, Dejin Zhang¹⁰, Lin Fang, Chen Ye, Juanbin Zhang, Wenbo Hu¹², Anlong Xu¹², Yuanyuan Ren, Guojie Zhang¹, Guojie Zhang⁴, Michael William Bruford¹⁴, Qibin Li¹, Lijia Ma¹, Yiran Guo¹, Na An, Yujie Hu¹, Yang Zheng¹, Yongyong Shi³, Zhiqiang Li³, Qing Liu, Yanling Chen, Jing Zhao, Ning Qu⁵, Shancen Zhao, Feng Tian, Xiaoling Wang, Haiyin Wang, Lizhi Xu, Xiao Liu, Tomas Vinar¹⁵, Yajun Wang¹⁶, Tak-Wah Lam¹¹, Siu-Ming Yiu¹¹, Shiping Liu¹⁷, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang, Guohua Yang, Zhi Jiang, Junyi Wang, Nan Qin, Li Li, Jingxiang Li, Lars Bolund, Karsten Kristiansen¹⁸, Gane Ka-Shu Wong¹⁹, Maynard V. Olson²⁰, Xiuqing Zhang, Songgang Li, Huanming Yang, Jing Wang, Jun Wang¹⁸ - Show less +123 more•Institutions (20)

Chinese Academy of Sciences¹, Fudan University², Shanghai Jiao Tong University³, Kunming Institute of Zoology⁴, Shenzhen University⁵, Chengdu Research Base of Giant Panda Breeding⁶, Wellcome Trust⁷, University of Toronto⁸, University of California, Berkeley⁹, Southeast University¹⁰, University of Hong Kong¹¹, Sun Yat-sen University¹², University of Vienna¹³, Cardiff University¹⁴, Comenius University in Bratislava¹⁵, Sichuan University¹⁶, South China University of Technology¹⁷, University of Copenhagen¹⁸, University of Alberta¹⁹, University of Washington²⁰

21 Jan 2010-Nature

TL;DR: Using next-generation sequencing technology alone, a draft sequence of the giant panda genome is generated and assembled, indicating that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition.

...read moreread less

Abstract: Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

...read moreread less

1,109 citations

Journal Article•DOI•

The diploid genome sequence of an Asian individual.

[...]

Jun Wang, Wei Wang¹, Ruiqiang Li¹, Ruiqiang Li², Yingrui Li³, Yingrui Li¹, Yingrui Li⁴, Geng Tian⁵, Geng Tian¹, Laurie Goodman¹, Wei Fan¹, Junqing Zhang¹, Jun Li¹, Juanbin Zhang¹, Yiran Guo¹, Yiran Guo⁵, Binxiao Feng¹, Heng Li⁶, Heng Li¹, Yao Lu¹, Xiaodong Fang¹, Huiqing Liang¹, Zhenglin Du¹, Dong Li¹, Yiqing Zhao¹, Yiqing Zhao⁵, Yujie Hu¹, Yujie Hu⁵, Zhenzhen Yang¹, Hancheng Zheng¹, Ines Hellmann⁷, Michael Inouye⁶, John E. Pool⁷, Xin Yi¹, Xin Yi⁵, Jing Zhao¹, Jinjie Duan¹, Yan Zhou¹, Junjie Qin⁵, Junjie Qin¹, Lijia Ma¹, Lijia Ma⁵, Guoqing Li¹, Zhentao Yang¹, Guojie Zhang¹, Guojie Zhang⁵, Bin Yang¹, Chang Yu¹, Fang Liang¹, Fang Liang⁵, Wenjie Li¹, Shaochuan Li¹, Dawei Li¹, Peixiang Ni¹, Jue Ruan⁵, Jue Ruan¹, Qibin Li⁵, Qibin Li¹, Hongmei Zhu¹, Dongyuan Liu¹, Zhike Lu¹, Ning Li¹, Ning Li⁵, Guangwu Guo¹, Guangwu Guo⁵, Jianguo Zhang¹, Jia Ye¹, Lin Fang¹, Qin Hao¹, Qin Hao⁵, Quan Chen¹, Quan Chen³, Yu Liang⁵, Yu Liang¹, Yeyang Su¹, Yeyang Su⁵, A. san¹, A. san⁵, Cuo Ping⁵, Cuo Ping¹, Shuang Yang¹, Fang Chen¹, Fang Chen⁵, Li Li¹, Ke Zhou¹, Hongkun Zheng², Hongkun Zheng¹, Yuanyuan Ren¹, Ling Yang¹, Yang Gao⁴, Yang Gao¹, Guohua Yang⁸, Guohua Yang¹, Zhuo Li¹, Xiaoli Feng¹, Karsten Kristiansen², Gane Ka-Shu Wong¹, Gane Ka-Shu Wong⁹, Rasmus Nielsen⁷, Richard Durbin⁶, Lars Bolund¹⁰, Lars Bolund¹, Xiuqing Zhang⁴, Xiuqing Zhang¹, Songgang Li⁸, Songgang Li¹, Songgang Li³, Huanming Yang¹, Huanming Yang⁸, Jian Wang¹, Jian Wang⁸ - Show less +107 more•Institutions (10)

Beijing Genomics Institute¹, University of Southern Denmark², Peking University³, Beijing Institute of Genomics⁴, Chinese Academy of Sciences⁵, Wellcome Trust Sanger Institute⁶, University of California, Berkeley⁷, Shenzhen University⁸, University of Alberta⁹, Aarhus University¹⁰

06 Nov 2008-Nature

TL;DR: Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly, and the potential usefulness of next-generation sequencing technologies for personal genomics.

...read moreread less

Abstract: Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

...read moreread less

963 citations

1
2
3
4
…

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

[...]

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo - Show less +7 more•Institutions (1)

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

...read moreread less

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

...read moreread less

20,557 citations

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

[...]

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹•Institutions (1)

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.

...read moreread less

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

...read moreread less

20,335 citations

Journal Article•DOI•

The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data

[...]

Ethan Cerami¹, Jianjiong Gao, Ugur Dogrusoz, Benjamin Gross, Selcuk Onur Sumer, Bulent Arman Aksoy, Anders Jacobsen, Caitlin Byrne, Michael Heuer, Erik G. Larsson, Yevgeniy Antipin, Boris Reva, Arthur P. Goldberg, Chris Sander, Nikolaus Schultz - Show less +11 more•Institutions (1)

Memorial Sloan Kettering Cancer Center¹

01 May 2012-Cancer Discovery

TL;DR: The cBio Cancer Genomics Portal significantly lowers the barriers between complex genomic data and cancer researchers who want rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects and empowers researchers to translate these rich data sets into biologic insights and clinical applications.

...read moreread less

Abstract: The cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access resource for interactive exploration of multidimensional cancer genomics data sets, currently providing access to data from more than 5,000 tumor samples from 20 cancer studies. The cBio Cancer Genomics Portal significantly lowers the barriers between complex genomic data and cancer researchers who want rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects and empowers researchers to translate these rich data sets into biologic insights and clinical applications.

...read moreread less

11,912 citations

Journal Article•DOI•

Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal

[...]

Jianjiong Gao¹, Bulent Arman Aksoy¹, Ugur Dogrusoz², Gideon Dresdner¹, Benjamin Gross¹, S. Onur Sumer¹, Yichao Sun¹, Anders Jacobsen¹, Rileen Sinha¹, Erik Larsson³, Ethan Cerami¹, Chris Sander¹, Nikolaus Schultz¹ - Show less +9 more•Institutions (3)

Memorial Sloan Kettering Cancer Center¹, Bilkent University², University of Gothenburg³

02 Apr 2013-Science Signaling

TL;DR: A practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics, which makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries.

...read moreread less

Abstract: The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics.

...read moreread less

10,947 citations

Journal Article•DOI•

Analysis of protein-coding genetic variation in 60,706 humans

[...]

Monkol Lek, Konrad J. Karczewski¹, Konrad J. Karczewski², Eric Vallabh Minikel¹, Eric Vallabh Minikel², Kaitlin E. Samocha, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria¹, Anne H. O’Donnell-Luria², Anne H. O’Donnell-Luria³, James S. Ware, Andrew J. Hill¹, Andrew J. Hill⁴, Andrew J. Hill², Beryl B. Cummings², Beryl B. Cummings¹, Taru Tukiainen², Taru Tukiainen¹, Daniel P. Birnbaum², Jack A. Kosmicki, Laramie E. Duncan¹, Laramie E. Duncan², Karol Estrada², Karol Estrada¹, Fengmei Zhao², Fengmei Zhao¹, James Zou², Emma Pierce-Hoffman², Emma Pierce-Hoffman¹, Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo², Ron Do, Jason Flannick¹, Jason Flannick², Menachem Fromer, Laura D. Gauthier², Jackie Goldstein², Jackie Goldstein¹, Namrata Gupta², Daniel P. Howrigan², Daniel P. Howrigan¹, Adam Kiezun², Mitja I. Kurki¹, Mitja I. Kurki², Ami Levy Moonshine², Pradeep Natarajan, Lorena Orozco, Gina M. Peloso¹, Gina M. Peloso², Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Samuel A. Rose², Douglas M. Ruderfer⁸, Khalid Shakir², Peter D. Stenson⁶, Christine Stevens², Brett Thomas², Brett Thomas¹, Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won⁹, Dongmei Yu, David Altshuler¹⁰, David Altshuler², Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly², Roberto Elosua, Jose C. Florez², Jose C. Florez¹, Stacey Gabriel², Gad Getz¹, Gad Getz², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll², Steven A. McCarroll¹, Mark I. McCarthy¹⁶, Mark I. McCarthy¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale¹, Benjamin M. Neale², Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan²¹, Patrick F. Sullivan¹⁴, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁷, Hugh Watkins¹⁶, James G. Wilson²⁴, Mark J. Daly², Mark J. Daly¹, Daniel G. MacArthur¹, Daniel G. MacArthur² - Show less +103 more•Institutions (24)

Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, University of Oxford¹⁶, Wellcome Trust Centre for Human Genetics¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

...read moreread less

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

...read moreread less

8,758 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse