scispace - formally typeset
Search or ask a question
Author

Maria Cerezo

Other affiliations: Wellcome Trust Sanger Institute
Bio: Maria Cerezo is an academic researcher from European Bioinformatics Institute. The author has contributed to research in topics: 1000 Genomes Project & Genome-wide association study. The author has an hindex of 8, co-authored 14 publications receiving 14295 citations. Previous affiliations of Maria Cerezo include Wellcome Trust Sanger Institute.

Papers
More filters
Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-generation sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

01 Oct 2015
TL;DR: The 1000 Genomes Project as mentioned in this paper provided a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reported the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole genome sequencing, deep exome sequencing and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations

Journal ArticleDOI
TL;DR: Improved data access is improved with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database.
Abstract: The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 109 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

2,878 citations

Journal ArticleDOI
TL;DR: Improvements to the NHGRI-EBI GWAS Catalog improved the data release frequency by increasing automation of curation and providing scaling improvements, allowing the Catalog to adapt to the needs of evolving study design, genotyping technologies and user needs in the future.
Abstract: The NHGRI-EBI GWAS Catalog has provided data from published genome-wide association studies since 2008. In 2015, the database was redesigned and relocated to EMBL-EBI. The new infrastructure includes a new graphical user interface (www.ebi.ac.uk/gwas/), ontology supported search functionality and an improved curation interface. These developments have improved the data release frequency by increasing automation of curation and providing scaling improvements. The range of available Catalog data has also been extended with structured ancestry and recruitment information added for all studies. The infrastructure improvements also support scaling for larger arrays, exome and sequencing studies, allowing the Catalog to adapt to the needs of evolving study design, genotyping technologies and user needs in the future.

1,903 citations

Journal ArticleDOI
TL;DR: A calibrated phylogenetic tree is constructed on the basis of binary single-nucleotide variants and the more complex variants onto it, estimating the number of mutations for each class and shows bursts of extreme expansion in male numbers that have occurred independently among the five continental superpopulations examined.
Abstract: We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.

280 citations


Cited by
More filters
Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks1, Timothy Fennell1, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria3, James S. Ware, Andrew J. Hill4, Andrew J. Hill2, Andrew J. Hill1, Beryl B. Cummings2, Beryl B. Cummings1, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum1, Jack A. Kosmicki, Laramie E. Duncan2, Laramie E. Duncan1, Karol Estrada2, Karol Estrada1, Fengmei Zhao2, Fengmei Zhao1, James Zou1, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo1, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier1, Jackie Goldstein1, Jackie Goldstein2, Namrata Gupta1, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun1, Mitja I. Kurki1, Mitja I. Kurki2, Ami Levy Moonshine1, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin1, Manuel A. Rivas1, Valentin Ruano-Rubio1, Samuel A. Rose1, Douglas M. Ruderfer8, Khalid Shakir1, Peter D. Stenson6, Christine Stevens1, Brett Thomas1, Brett Thomas2, Grace Tiao1, María Teresa Tusié-Luna, Ben Weisburd1, Hong-Hee Won9, Dongmei Yu, David Altshuler10, David Altshuler1, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly1, Roberto Elosua, Jose C. Florez1, Jose C. Florez2, Stacey Gabriel1, Gad Getz2, Gad Getz1, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll1, Steven A. McCarroll2, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale2, Benjamin M. Neale1, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins16, Hugh Watkins17, James G. Wilson24, Mark J. Daly2, Mark J. Daly1, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal ArticleDOI
TL;DR: This year’s update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years and significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions.
Abstract: DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoprotoemics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education.

4,797 citations

Journal ArticleDOI
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals from the UK Biobank is described, describing population structure and relatedness in the cohort, and imputation to increase the number of testable variants to 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal ArticleDOI
12 Oct 2017-Nature
TL;DR: It is found that local genetic variation affects gene expression levels for the majority of genes, and inter-chromosomal genetic effects for 93 genes and 112 loci are identified, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Abstract: Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

3,289 citations

Journal ArticleDOI
TL;DR: These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
Abstract: Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

3,096 citations