Author
Jonathan Baker
Bio: Jonathan Baker is an academic researcher from Xi'an Jiaotong University. The author has contributed to research on the topics Speleothem and Holocene. The author has an h-index of 7 and has co-authored 12 publications receiving 5,962 citations.
Topics: Speleothem, Holocene, Glacial period, East Asian Monsoon, Younger Dryas
Papers
TL;DR: The dbSNP database is a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, and is integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data.
Abstract: In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T. Sherry, M. Ward and K. Sirotkin (1999) Genome Res., 9, 677–679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at http://www.ncbi.nlm.nih.gov/SNP. They can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.
6,449 citations
TL;DR: The history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery, and the inferred times of collapse and recovery are Upper Paleolithic, in agreement with archaeological evidence of the initial modern human colonization of Europe.
Abstract: Single-nucleotide polymorphisms (SNPs) constitute the great majority of variations in the human genome, and as heritable variable landmarks they are useful markers for disease mapping and resolving population structure. Redundant coverage in overlaps of large-insert genomic clones, sequenced as part of the Human Genome Project, comprises a quarter of the genome, and it is representative in terms of base compositional and functional sequence features. We mined these regions to produce 500,000 high-confidence SNP candidates as a uniform resource for describing nucleotide diversity and its regional variation within the genome. Distributions of marker density observed at different overlap length scales under a model of recombination and population size change show that the history of the population represented by the public genome sequence is one of collapse followed by a recent phase of mild size recovery. The inferred times of collapse and recovery are Upper Paleolithic, in agreement with archaeological evidence of the initial modern human colonization of Europe.
124 citations
TL;DR: It is shown that the Younger Dryas event occurred first at high northern latitudes and then propagated southward into the tropical monsoon belt through both atmospheric and oceanic processes, ultimately reaching Antarctica before reversing course toward its eventual termination.
Abstract: The Younger Dryas (YD), arguably the most widely studied millennial-scale extreme climate event, was characterized by diverse hydroclimate shifts globally and severe cooling at high northern latitudes that abruptly punctuated the warming trend from the last glacial to the present interglacial. To date, a precise understanding of its trigger, propagation, and termination remains elusive. Here, we present speleothem oxygen-isotope data that, in concert with other proxy records, allow us to quantify the timing of the YD onset and termination at an unprecedented subcentennial temporal precision across the North Atlantic, Asian Monsoon-Westerlies, and South American Monsoon regions. Our analysis suggests that the onsets of YD in the North Atlantic (12,870 ± 30 B.P.) and the Asian Monsoon-Westerlies region are essentially synchronous within a few decades and lead the onset in Antarctica, implying a north-to-south climate signal propagation via both atmospheric (decadal-time scale) and oceanic (centennial-time scale) processes, similar to the Dansgaard-Oeschger events during the last glacial period. In contrast, the YD termination may have started first in Antarctica at ∼11,900 B.P., or perhaps even earlier in the western tropical Pacific, followed by the North Atlantic between ∼11,700 ± 40 and 11,610 ± 40 B.P. These observations suggest that the initial YD termination might have originated in the Southern Hemisphere and/or the tropical Pacific, indicating a Southern Hemisphere/tropics to North Atlantic-Asian Monsoon-Westerlies directionality of climatic recovery.
96 citations
23 Jul 2019
TL;DR: In this article, the authors use the Speleothem Isotopes Synthesis and Analysis database (SISAL_v1) to present an overview of hydro-climate variability related to the ASM during three periods: the late Pleistocene, the Holocene, and the last two millennia.
Abstract: Asian summer monsoon (ASM) variability significantly affects hydro-climate, and thus socio-economics, in the East Asian region, where nearly one-third of the global population resides. Over the last two decades, speleothem δ18O records from China have been utilized to reconstruct ASM variability and its underlying forcing mechanisms on orbital to seasonal timescales. Here, we use the Speleothem Isotopes Synthesis and Analysis database (SISAL_v1) to present an overview of hydro-climate variability related to the ASM during three periods: the late Pleistocene, the Holocene, and the last two millennia. We highlight the possible global teleconnections and forcing mechanisms of the ASM on different timescales. The longest composite stalagmite δ18O record over the past 640 kyr BP from the region demonstrates that ASM variability on orbital timescales is dominated by the 23 kyr precessional cycles, which are in phase with Northern Hemisphere summer insolation (NHSI). During the last glacial, millennial changes in the intensity of the ASM appear to be controlled by North Atlantic climate and oceanic feedbacks. During the Holocene, changes in ASM intensity were primarily controlled by NHSI. However, the spatio-temporal distribution of monsoon rain belts may vary with changes in ASM intensity on decadal to millennial timescales.
85 citations
TL;DR: In this article, the authors present a composite speleothem δ18O record of the last ∼14 kyr from Shennong Cave in southeastern China and model-simulated data of rainfall and meteoric δ18O over eastern China.
64 citations
Cited by
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
20,557 citations
TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website.
Abstract: In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
9,604 citations
Wellcome Trust Sanger Institute, Cambridge University Hospitals NHS Foundation Trust, Wellcome Trust, University of British Columbia, University of Cambridge, Oslo University Hospital, The Breast Cancer Research Foundation, University of Oslo, University of Münster, Université libre de Bruxelles, German Cancer Research Center, University of Iceland, Erasmus University Rotterdam, Paris Descartes University, French Institute of Health and Medical Research, University of Paris, Broad Institute, University of Bergen, University of Queensland, University of Oviedo, University of Glasgow, Harvard University, United States Department of Veterans Affairs, Netherlands Cancer Institute, University of Kiel, Radboud University Nijmegen, King's College London, Curie Institute, University of New South Wales, Bankstown Lidcombe Hospital, University of Barcelona
TL;DR: It is shown that hypermutation localized to small genomic regions, ‘kataegis’, is found in many cancer types, and these results reveal the diversity of mutational processes underlying the development of cancer.
Abstract: All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.
7,904 citations
TL;DR: It is shown that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites.
Abstract: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
7,710 citations
Baylor College of Medicine, Chinese Academy of Sciences, Chinese National Human Genome Center, University of Hong Kong, The Chinese University of Hong Kong, Hong Kong University of Science and Technology, Illumina, McGill University, Washington University in St. Louis, University of California, San Francisco, Wellcome Trust Sanger Institute, Beijing Normal University, Health Sciences University of Hokkaido, Shinshu University, University of Tsukuba, Howard University, University of Ibadan, Case Western Reserve University, University of Utah, Cold Spring Harbor Laboratory, Johns Hopkins University, University of Oxford, North Carolina State University, National Institutes of Health, Massachusetts Institute of Technology, Chinese Academy of Social Sciences, Kyoto University, Nagasaki University, Wellcome Trust, Genome Canada, Foundation for the National Institutes of Health, University of Maryland, Baltimore, Vanderbilt University, Stanford University, New York University, University of California, Berkeley, University of Oklahoma, University of New Mexico, Université de Montréal, University of California, Los Angeles, University of Michigan, University of Wisconsin-Madison, London School of Economics and Political Science, Genetic Alliance, GlaxoSmithKline, University of Washington, Harvard University, University of Chicago, Fred Hutchinson Cancer Research Center, University of Tokyo
TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.
Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.
5,926 citations