Author
Jiayou Chu
Bio: Jiayou Chu is an academic researcher from Peking Union Medical College. The author has contributed to research on the topics of Population and Haplotype. The author has an h-index of 20 and has co-authored 68 publications receiving 14,039 citations.
Papers published on a yearly basis
Papers
••
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
12,661 citations
01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
3,247 citations
••
Council on Education for Public Health, Cancer Research UK, French Institute of Health and Medical Research, Peking Union Medical College, Chinese Academy of Sciences, University of Geneva, National Cancer Research Institute, Temple University, University of Los Andes, Florida International University, Harbin Medical University, Yale University, Stanford University, University of Turin, Marshfield Clinic
TL;DR: A resource of 1064 cultured lymphoblastoid cell lines from individuals in different world populations and corresponding milligram quantities of DNA is deposited at the Foundation Jean Dausset (CEPH) in Paris.
Abstract: A resource of 1064 cultured lymphoblastoid cell lines (LCLs) (1) from individuals in different world populations and corresponding milligram quantities of DNA is deposited at the Foundation Jean Dausset (CEPH) (2) in Paris. LCLs were collected from various laboratories by the Human Genome
1,002 citations
••
University of Malaya, Central Food Technological Research Institute, Mahidol University, Thailand National Science and Technology Development Agency, Korea Research Institute of Bioscience and Biotechnology, University of the Philippines, Academia Sinica, Agency for Science, Technology and Research, Peking Union Medical College, National Institute of Advanced Industrial Science and Technology, Universiti Sains Malaysia, Chinese National Human Genome Center, Tokai University, Fudan University, Chiang Mai University, Thermo Fisher Scientific, Soongsil University, Eulji University, University of Tokyo, National University of Singapore, Indian Statistical Institute, Eijkman Institute for Molecular Biology, Nanyang Technological University, University of the Ryukyus, Health Sciences University of Hokkaido, Monash University Malaysia Campus, National Institutes of Health
TL;DR: The results suggest that there may have been a single major migration of people into Asia and a subsequent south-to-north migration across the continent, and that genetic ancestry is strongly correlated with linguistic affiliations as well as geography.
Abstract: Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.
545 citations
••
TL;DR: This pattern indicates that the first settlement of modern humans in eastern Asia occurred in mainland Southeast Asia during the last Ice Age, coinciding with the absence of human fossils in eastern Asia, 50,000–100,000 years ago.
Abstract: The timing and nature of the arrival and the subsequent expansion of modern humans into eastern Asia remains controversial. Using Y-chromosome biallelic markers, we investigated the ancient human-migration patterns in eastern Asia. Our data indicate that southern populations in eastern Asia are much more polymorphic than northern populations, which have only a subset of the southern haplotypes. This pattern indicates that the first settlement of modern humans in eastern Asia occurred in mainland Southeast Asia during the last Ice Age, coinciding with the absence of human fossils in eastern Asia, 50,000–100,000 years ago. After the initial peopling, a great northward migration extended into northern China and Siberia.
404 citations
Cited by
•
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in a population.
11,521 citations
••
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
10,056 citations
••
Broad Institute, Harvard University, Boston Children's Hospital, University of Washington, University of Arizona, Cardiff University, Google, Icahn School of Medicine at Mount Sinai, Samsung Medical Center, Vertex Pharmaceuticals, University of Michigan, University of Cambridge, State University of New York Upstate Medical University, Karolinska Institutet, University of Eastern Finland, University of Oxford, Wellcome Trust Centre for Human Genetics, Cedars-Sinai Medical Center, University of Ottawa, University of Pennsylvania, University of North Carolina at Chapel Hill, University of Helsinki, University of California, San Diego, University of Mississippi Medical Center
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
8,758 citations
••
TL;DR: Deep phenotype and genome-wide genetic data from approximately 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort, and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
4,489 citations
••
TL;DR: Viewing the microbiota from an ecological perspective could provide insight into how to promote health by targeting this microbial community in clinical treatments.
Abstract: Trillions of microbes inhabit the human intestine, forming a complex ecological community that influences normal physiology and susceptibility to disease through its collective metabolic activities and host interactions. Understanding the factors that underlie changes in the composition and function of the gut microbiota will aid in the design of therapies that target it. This goal is formidable. The gut microbiota is immensely diverse, varies between individuals and can fluctuate over time — especially during disease and early development. Viewing the microbiota from an ecological perspective could provide insight into how to promote health by targeting this microbial community in clinical treatments.
3,890 citations