Home
/
Authors
/
Yinlong Xie

Author

Yinlong Xie

Other affiliations: South China University of Technology

Bio: Yinlong Xie is an academic researcher from Beijing Institute of Genomics. The author has contributed to research in topics: Contig & Genome. The author has an hindex of 2, co-authored 7 publications receiving 9897 citations. Previous affiliations of Yinlong Xie include South China University of Technology.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

A global reference for human genetic variation.

[...]

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.

...read moreread less

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

...read moreread less

12,661 citations

Journal Article•DOI•

De novo assembly of a haplotype-resolved human genome

[...]

Hongzhi Cao¹, Honglong Wu, Ruibang Luo, Shujia Huang², Yuhui Sun², Xin Tong, Yinlong Xie², Binghang Liu, Hailong Yang, Hancheng Zheng¹, Jian Li¹, Bo Li, Yu Wang², Fang Yang, Peng Sun, Siyang Liu¹, Peng Gao, Haodong Huang², Sun Jing, Dan Chen, Guangzhu He, Weihua Huang, Zheng Huang, Li Yue, Laurent C. A. M. Tellier¹, Xiao Liu¹, Qiang Feng¹, Xun Xu, Xiuqing Zhang, Lars Bolund³, Anders Krogh¹, Karsten Kristiansen¹, Radoje Drmanac, Snezana Drmanac, Rasmus Nielsen⁴, Songgang Li, Jian Wang, Huanming Yang⁵, Yingrui Li⁶, Gane Ka-Shu Wong⁷, Jun Wang⁸ - Show less +37 more•Institutions (8)

University of Copenhagen¹, South China University of Technology², The Breast Cancer Research Foundation³, University of California, Berkeley⁴, King Abdulaziz University⁵, University of Queensland⁶, University of Alberta⁷, Macau University of Science and Technology⁸

01 Jun 2015-Nature Biotechnology

TL;DR: This haplotype-resolved diploid genome represents the most complete de novo human genome assembly to date and should aid in translating genotypes to phenotypes for the development of personalized medicine.

...read moreread less

Abstract: The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.

...read moreread less

79 citations

Patent•

Transcriptome assembly method and system

[...]

Gengxiong Wu, Weihua Huang, Yinlong Xie, Jingbo Tang, Jun Wang, Jian Wang, Huanming Yang - Show less +3 more

13 Apr 2012

TL;DR: In this paper, a transcriptome assembly method based on the de Brujin graph is presented, which consists of the following steps: constructing a sequencing sample transcriptome read into a de-Brujin graph, forming continuous contigs, obtaining association among the contigs and filtering association data, performing linearization processing on a continuous sequence without bifurcation, outputting a contig sequence, and comparing the read and an end pairing read with the output contig sequences, so as to obtain information between the input read and the contig.

...read moreread less

Abstract: Provided is a transcriptome assembly method, comprising the following steps of: constructing a sequencing sample transcriptome read into a de Brujin graph; performing filtering and linearization processing on the de Brujin graph, so as to form continuous contigs; obtaining association among the contigs, and filtering association data; performing linearization processing on a continuous sequence without bifurcation; outputting a contig sequence; comparing the read and an end pairing read with the output contig sequence, so as to obtain information between the read and the contig; establishing connections among the contigs, so as to construct a graph with the contigs as points and the connections as edges; pre-processing and dividing the obtained graph, so as to obtain independent sub-graphs; and outputting a transcript according to the sub-graphs. Further provided is a transcriptome assembly system based on the method.

...read moreread less

2 citations

Posted Content•DOI•

Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly

[...]

Ou Wang¹, Robert Chin, Xiaofang Cheng, Ka Wu M, Qing Mao, Jingbo Tang, Yuhui Sun, Ellis Anderson, Han K. Lam, Dan Chen, Yujun Zhou, Linying Wang, Fei Fan, Yan Zou, Yinlong Xie, Yu Zhang R, Snezana Drmanac, Darlene Nguyen, Chongjun Xu, Christian Villarosa, Scott Gablenz, Nina Barua, Staci Nguyen, Wenlan Tian, Sophie Liu J, Jingwan Wang, Xin Liu, Xiaojuan Qi, Ao Chen, He Wang, Dong Yuliang, Wenwei Zhang, Andrei Alexeev, Huanming Yang, Karsten Kristiansen¹, Xun Xu, Radoje Drmanac, Brock A. Peters - Show less +34 more•Institutions (1)

University of Copenhagen¹

17 May 2018-bioRxiv

TL;DR: StLFR represents an easily automatable solution that enables high quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.

...read moreread less

Abstract: Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this a low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments. Analysis of the genome of the human genome NA12878 with stLFR demonstrated high quality variant calling and phasing into contigs up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries and their construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.

...read moreread less

2 citations

An integrated map of structural variation in 2,504 human genomes_supplement

[...]

Sudmant Ph, Tobias Rausch, Gardner Ej, Handsaker Re +491 more

01 Jan 2016

TL;DR: An integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which are constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations are described.

...read moreread less

1 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Analysis of protein-coding genetic variation in 60,706 humans

[...]

Monkol Lek, Konrad J. Karczewski¹, Konrad J. Karczewski², Eric Vallabh Minikel², Eric Vallabh Minikel¹, Kaitlin E. Samocha, Eric Banks², Timothy Fennell², Anne H. O’Donnell-Luria², Anne H. O’Donnell-Luria³, Anne H. O’Donnell-Luria¹, James S. Ware, Andrew J. Hill¹, Andrew J. Hill², Andrew J. Hill⁴, Beryl B. Cummings², Beryl B. Cummings¹, Taru Tukiainen¹, Taru Tukiainen², Daniel P. Birnbaum², Jack A. Kosmicki, Laramie E. Duncan², Laramie E. Duncan¹, Karol Estrada¹, Karol Estrada², Fengmei Zhao¹, Fengmei Zhao², James Zou², Emma Pierce-Hoffman², Emma Pierce-Hoffman¹, Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo², Ron Do, Jason Flannick², Jason Flannick¹, Menachem Fromer, Laura D. Gauthier², Jackie Goldstein², Jackie Goldstein¹, Namrata Gupta², Daniel P. Howrigan², Daniel P. Howrigan¹, Adam Kiezun², Mitja I. Kurki², Mitja I. Kurki¹, Ami Levy Moonshine², Pradeep Natarajan, Lorena Orozco, Gina M. Peloso², Gina M. Peloso¹, Ryan Poplin², Manuel A. Rivas², Valentin Ruano-Rubio², Samuel A. Rose², Douglas M. Ruderfer⁸, Khalid Shakir², Peter D. Stenson⁶, Christine Stevens², Brett Thomas¹, Brett Thomas², Grace Tiao², María Teresa Tusié-Luna, Ben Weisburd², Hong-Hee Won⁹, Dongmei Yu, David Altshuler², David Altshuler¹⁰, Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly², Roberto Elosua, Jose C. Florez¹, Jose C. Florez², Stacey Gabriel², Gad Getz¹, Gad Getz², Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll², Steven A. McCarroll¹, Mark I. McCarthy¹⁶, Mark I. McCarthy¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale², Benjamin M. Neale¹, Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan²¹, Patrick F. Sullivan¹⁴, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁶, Hugh Watkins¹⁷, James G. Wilson²⁴, Mark J. Daly¹, Mark J. Daly², Daniel G. MacArthur¹, Daniel G. MacArthur² - Show less +103 more•Institutions (24)

Harvard University¹, Broad Institute², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, University of Oxford¹⁶, Wellcome Trust Centre for Human Genetics¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

...read moreread less

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

...read moreread less

8,758 citations

Journal Article•DOI•

The UK Biobank resource with deep phenotyping and genomic data

[...]

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova², Desislava Petkova¹, Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴, Olivier Delaneau⁵, Jared O'Connell⁶, Adrian Cortes⁷, Adrian Cortes¹, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹ - Show less +18 more•Institutions (7)

University of Oxford¹, Procter & Gamble², University of Melbourne³, University of Geneva⁴, Swiss Institute of Bioinformatics⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.

...read moreread less

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

...read moreread less

4,489 citations

Journal Article•DOI•

Genetic effects on gene expression across human tissues.

[...]

Enhancing GTEx (eGTEx) groups¹, Nih Common Fund², Nhgri, Biospecimen Core Resource—VARI, Elsi study, Genome Browser Data Integration Visualization—EBI, Lead analysts, Alexis Battle³, Christopher D. Brown⁴, Barbara E. Engelhardt¹, Stephen B. Montgomery² - Show less +7 more•Institutions (4)

Princeton University¹, Stanford University², Johns Hopkins University³, University of Pennsylvania⁴

12 Oct 2017-Nature

TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

...read moreread less

Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

...read moreread less

3,289 citations

Journal Article•DOI•

Coming of age: ten years of next-generation sequencing technologies

[...]

Sara Goodwin¹, John Douglas Mcpherson², W. Richard McCombie¹•Institutions (2)

Cold Spring Harbor Laboratory¹, University of California, Davis²

01 Jun 2016-Nature Reviews Genetics

TL;DR: These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

...read moreread less

Abstract: Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

...read moreread less

3,096 citations

Journal Article•DOI•

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

[...]

Annalisa Buniello¹, Jacqueline A. L. MacArthur¹, Maria Cerezo¹, Laura W. Harris¹, James D. Hayhurst¹, Cinzia Malangone¹, Aoife McMahon¹, Joannella Morales¹, Edward Mountjoy², Edward Mountjoy³, Elliot Sollis¹, Daniel Suveges¹, Olga Vrousgou¹, Patricia L. Whetzel¹, M. Ridwan Amode¹, Jose A. Guillen¹, Harpreet Singh Riat¹, Stephen J. Trevanion¹, Peggy Hall⁴, Heather Junkins⁴, Paul Flicek¹, Tony Burdett¹, Lucia A. Hindorff⁴, Fiona Cunningham¹, Helen Parkinson¹ - Show less +21 more•Institutions (4)

European Bioinformatics Institute¹, University of Oxford², Wellcome Trust Sanger Institute³, National Institutes of Health⁴

08 Jan 2019-Nucleic Acids Research

TL;DR: Improved data access is improved with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database.

...read moreread less

Abstract: The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 109 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

...read moreread less

2,878 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse