Author
Russell J. Grocock
Other affiliations: University of Nottingham, Wellcome Trust Sanger Institute, GlaxoSmithKline
Bio: Russell J. Grocock is an academic researcher from Illumina. The author has contributed to research on topics including codon usage bias and cancer. The author has an h-index of 23 and has co-authored 28 publications receiving 24,455 citations. Previous affiliations of Russell J. Grocock include University of Nottingham & Wellcome Trust Sanger Institute.
Topics: Codon usage bias, Cancer, Chromosome 19, Chromosome 21, Human genome
Papers
••
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
12,661 citations
••
TL;DR: The miRBase database aims to provide integrated interfaces to comprehensive microRNA sequence data, annotation and predicted gene targets, and acts as an independent arbiter of microRNA gene nomenclature.
Abstract: The miRBase database aims to provide integrated interfaces to comprehensive microRNA sequence data, annotation and predicted gene targets. miRBase takes over functionality from the microRNA Registry and fulfils three main roles: the miRBase Registry acts as an independent arbiter of microRNA gene nomenclature, assigning names prior to publication of novel miRNA sequences. miRBase Sequences is the primary online repository for miRNA sequence data and annotation. miRBase Targets is a comprehensive new database of predicted miRNA target genes. miRBase is available at http://microrna.sanger.ac.uk/.
4,629 citations
01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. This paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
3,247 citations
••
TL;DR: Mice deficient for bic/microRNA-155 are shown to be immunodeficient and to display increased lung airway remodeling, suggesting that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.
Abstract: MicroRNAs are a class of small RNAs that are increasingly being recognized as important regulators of gene expression. Although hundreds of microRNAs are present in the mammalian genome, genetic studies addressing their physiological roles are at an early stage. We have shown that mice deficient for bic/microRNA-155 are immunodeficient and display increased lung airway remodeling. We demonstrate a requirement of bic/microRNA-155 for the function of B and T lymphocytes and dendritic cells. Transcriptome analysis of bic/microRNA-155–deficient CD4+ T cells identified a wide spectrum of microRNA-155–regulated genes, including cytokines, chemokines, and transcription factors. Our work suggests that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.
1,880 citations
••
TL;DR: The genomes of a malignant melanoma and a lymphoblastoid cell line from the same person are sequenced, providing the first comprehensive catalogue of somatic mutations from an individual cancer.
Abstract: All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.
1,651 citations
Cited by
••
TL;DR: The current understanding of miRNA target recognition in animals is outlined and the widespread impact of miRNAs on both the expression and evolution of protein-coding genes is discussed.
18,036 citations
••
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
10,056 citations
••
Harvard University, Broad Institute, Boston Children's Hospital, University of Washington, University of Arizona, Cardiff University, Google, Icahn School of Medicine at Mount Sinai, Samsung Medical Center, Vertex Pharmaceuticals, University of Michigan, University of Cambridge, State University of New York Upstate Medical University, Karolinska Institutet, University of Eastern Finland, University of Oxford, Wellcome Trust Centre for Human Genetics, Cedars-Sinai Medical Center, University of Ottawa, University of Pennsylvania, University of North Carolina at Chapel Hill, University of Helsinki, University of California, San Diego, University of Mississippi Medical Center
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
8,758 citations
••
TL;DR: This work overhauled its tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3'UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites.
Abstract: MicroRNAs (miRNAs) are small endogenous RNAs that pair to sites in mRNAs to direct post-transcriptional repression. Many sites that match the miRNA seed (nucleotides 2–7), particularly those in 3′ untranslated regions (3′UTRs), are preferentially conserved. Here, we overhauled our tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3′UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites. The new tool more efficiently incorporates new genomes and more completely controls for background conservation by accounting for mutational biases, dinucleotide conservation rates, and the conservation rates of individual UTRs. The improved background model enabled preferential conservation of a new site type, the “offset 6mer,” to be detected. In total, >45,000 miRNA target sites within human 3′UTRs are conserved above background levels, and >60% of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs. Mammalian-specific miRNAs have far fewer conserved targets than do the more broadly conserved miRNAs, even when considering only more recently emerged targets. Although pairing to the 3′ end of miRNAs can compensate for seed mismatches, this class of sites constitutes less than 2% of all preferentially conserved sites detected. The new tool enables statistically powerful analysis of individual miRNA target sites, with the probability of preferentially conserved targeting (PCT) correlating with experimental measurements of repression. Our expanded set of target predictions (including conserved 3′-compensatory sites) is available at the TargetScan website, which displays the PCT for each site and each predicted target.
7,744 citations
••
TL;DR: MiRNA-expression profiling of human tumours has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment and has been exploited to identify miRNA genes that might represent downstream targets of activated oncogenic pathways, or that target protein-coding genes involved in cancer.
Abstract: MicroRNA (miRNA) alterations are involved in the initiation and progression of human cancer. The causes of the widespread differential expression of miRNA genes in malignant compared with normal cells can be explained by the location of these genes in cancer-associated genomic regions, by epigenetic mechanisms and by alterations in the miRNA processing machinery. MiRNA-expression profiling of human tumours has identified signatures associated with diagnosis, staging, progression, prognosis and response to treatment. In addition, profiling has been exploited to identify miRNA genes that might represent downstream targets of activated oncogenic pathways, or that target protein-coding genes involved in cancer.
6,345 citations