Author

Robert Gentleman

Bio: Robert Gentleman is an academic researcher from Genentech. The author has contributed to research in topics: Bioconductor & Gene expression profiling. The author has an h-index of 52 and has co-authored 139 publications receiving 48,510 citations. Previous affiliations of Robert Gentleman include Harvard University & Brigham and Women's Hospital.


Papers
Posted ContentDOI
16 Jun 2021-medRxiv
TL;DR: In this paper, a large-scale online collection of self-reported diagnosis data is used for discovery and replication of genetic associations for rare diseases, including Duane retraction syndrome, vestibular schwannoma, and spontaneous pneumothorax.
Abstract: A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.

4 citations

Posted Content
TL;DR: The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. It aims to foster collaborative development and widespread use of innovative software, reduce barriers to entry into interdisciplinary scientific research, and promote the achievement of remote reproducibility of research results.
Abstract: The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

4 citations

01 Jan 2008
TL;DR: Using single gene deletion data, the problem of linking a phenotype to underlying functional roles in the organism is addressed and a sound computational and statistical paradigm is provided that can be extended to address more complex experimental settings such as multiple deletions.
Abstract: Understanding regulatory mechanisms in complex biological systems is an important challenge, in particular for understanding disease mechanisms and for discovering new therapies and drugs. In this paper, we consider the important question of cellular regulation of phenotype. Using single gene deletion data, we address the problem of linking a phenotype to underlying functional roles in the organism and provide a sound computational and statistical paradigm that can be extended to address more complex experimental settings such as multiple deletions. We apply the proposed approaches to publicly available data sets to demonstrate strong evidence for the involvement of multi-protein complexes in the phenotypes studied.
Title: Assessing the role of multi-protein complexes in determining phenotype
Authors: Nolwenn Le Meur and Robert Gentleman, Fred Hutchinson Cancer Research Center, Program in Computational Biology, 1100 Fairview Avenue North M2-B876, P.O. Box 19024, Seattle, Washington, USA 98109-1024. Corresponding author: Nolwenn Le Meur

3 citations

Journal Article
TL;DR: In this paper, the author argues for an increased emphasis on computing in the training of statisticians and in their professional practice, describes some of the current technological challenges, and demonstrates the importance for statisticians of becoming more active in computational aspects of their work, specifically in producing software for carrying out statistical procedures.
Abstract: The author argues for an increased emphasis on computing in the training of statisticians and in their professional practice. He describes some of the current technological challenges and demonstrates the importance for statisticians of becoming more active in computational aspects of their work and specifically in producing software for carrying out statistical procedures. Such a reorientation will require substantial changes in thinking, pedagogy and infrastructure; the author mentions some of the conditions required to achieve these goals. French title: Quelques points de vue sur le calcul statistique.

3 citations


Cited by
Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
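
As a rough illustration of what shrinking fold changes means, the sketch below applies a generic normal-prior shrinkage to a log2 fold change: the estimate is pulled toward zero more strongly when its standard error is large. This is not the DESeq2 procedure; the helper name `shrink_lfc`, the zero-centered normal prior, and the fixed `prior_var` are assumptions made purely for illustration.

```python
def shrink_lfc(lfc, se, prior_var=1.0):
    """Posterior mean of a log fold change under a Normal(0, prior_var) prior.

    Hypothetical helper for illustration only; not part of DESeq2.
    """
    if se < 0 or prior_var <= 0:
        raise ValueError("se must be non-negative and prior_var positive")
    # Weight on the observed estimate: close to 1 for a small standard
    # error, close to 0 for a large one.
    weight = prior_var / (prior_var + se ** 2)
    return weight * lfc


# Same raw estimate, different precision: the noisy one is shrunk harder.
precise = shrink_lfc(2.0, se=0.1)
noisy = shrink_lfc(2.0, se=2.0)
```

Ranking genes by the shrunken value rather than the raw one favors effects that are both large and well estimated, which is the sense in which shrinkage shifts attention to the strength of differential expression.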

47,038 citations

Journal ArticleDOI
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).
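
To make "overdispersed Poisson" concrete, the sketch below assumes the common negative binomial form, in which the variance of a count with mean mu is mu + phi * mu^2, and estimates the dispersion phi for one transcript by the method of moments. The abstract does not specify this parameterization, and the moment estimator and the name `moment_dispersion` are illustrative assumptions, not edgeR's estimation method.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion for replicate counts of one transcript.

    Assumes variance = mu + phi * mu**2 (negative binomial form).
    Hypothetical helper; needs at least two replicates.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # unbiased sample variance
    # A pure Poisson sample has var close to mu, giving phi near 0;
    # negative estimates are truncated at 0.
    return max(0.0, (var - mu) / mu ** 2)


phi_flat = moment_dispersion([10, 10, 10, 10])
phi_noisy = moment_dispersion([5, 15, 10, 30])
```

With only a handful of replicates such per-transcript estimates are very unstable, which is the motivation the abstract gives for moderating them across transcripts.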

29,413 citations

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
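
The MapReduce idea behind a coverage calculator can be sketched in a few lines: a map step turns each read into per-position counts, and a reduce step sums the partial tables. This is a toy illustration and not GATK code; the `(start, length)` read representation and the names `map_read`, `merge`, and `coverage` are assumptions.

```python
from collections import Counter
from functools import reduce


def map_read(read):
    """Map step: one read -> count of 1 at every position it covers."""
    start, length = read
    return Counter(range(start, start + length))


def merge(acc, partial):
    """Reduce step: add a partial coverage table into the accumulator."""
    acc.update(partial)  # Counter.update adds counts rather than replacing
    return acc


def coverage(reads):
    """Per-position depth for an iterable of (start, length) reads."""
    return reduce(merge, map(map_read, reads), Counter())


reads = [(0, 3), (1, 3), (2, 2)]
depth = coverage(reads)
```

Because the merge is associative and the map step treats each read independently, reads can be split across workers and the partial tables combined afterwards, which is what makes the pattern suitable for parallelization.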

20,557 citations

Posted ContentDOI
17 Nov 2014-bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations