scispace - formally typeset
Author

Robert Gentleman

Bio: Robert Gentleman is an academic researcher from Genentech. The author has contributed to research in topics: Bioconductor & Gene expression profiling. The author has an h-index of 52 and has co-authored 139 publications receiving 48,510 citations. Previous affiliations of Robert Gentleman include Harvard University & Brigham and Women's Hospital.


Papers
Posted ContentDOI
17 Sep 2020-bioRxiv
TL;DR: Crystallized and fluid cognitive abilities have correlated but distinct genetic architectures that relate to those of psychiatric disorders, and their relationships to psychiatric disorder risk can inform the understanding of disease biology, nosology, and etiology.
Abstract: Group-level cognitive performance differences are found in psychiatric disorders ranging from depression to autism to schizophrenia. To investigate the genetics of individual differences in fluid and crystallized cognitive abilities and their associations with psychiatric disorder risk, we conducted genome-wide association studies (GWAS) of a total of 335 227 consented 23andMe customers of European descent between the ages of 50 and 85, who completed at least one online test of crystallized cognitive ability (vocabulary knowledge, N=188 434) and/or fluid cognitive ability (visual change detection, N=158 888; digit-symbol substitution, N=132 807). All cognitive measures were significantly heritable (h²=0.10-0.16), and GWAS identified 25 novel genome-wide significant loci. Genetic correlation analyses highlight variable profiles of genetic relationships across tasks and disorders. While schizophrenia had moderate negative genetic correlations with tests of fluid cognition (visual change detection rg=-0.27, p 0.005). Crystallized and fluid cognitive abilities thus have correlated but distinct genetic architectures that relate to those of psychiatric disorders. Understanding the genetic underpinnings of specific cognitive abilities, and their relationships to psychiatric disorder risk, can inform the understanding of disease biology, nosology, and etiology.

7 citations

Journal ArticleDOI
TL;DR: In this article, statistical quality control procedures designed to monitor test performance of the microplates being processed are developed and the logarithm of measured optical density is shown to be a more appropriate scale on which to work.
Abstract: Ensuring the quality and performance of human immunodeficiency virus, type 1 (HIV-1) enzyme-linked immunosorbent assay (ELISA) testing in routine laboratory situations is an important practical concern. In this article we develop statistical quality control procedures designed to monitor test performance of the microplates being processed. First, the logarithm of measured optical density is shown to be a more appropriate scale on which to work. Shewhart control charts are considered but are not entirely satisfactory because of between-plate variation. Correct classification depends more on within-plate variation than on between-plate variation. Range charts are useful, but they do not directly indicate whether the results from any particular microplate are reliable. Consequently, a new statistical control chart—the separation chart—is proposed, and the assumptions on which it is based are empirically verified. Retrospective analysis of nearly 1,300 microplates, using separation and range charts, ...

6 citations
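
The abstract above mentions Shewhart control charts applied to the logarithm of measured optical density. As a rough illustration only, the following sketch computes 3-sigma limits on the log scale and flags readings outside them. The function names and all optical density values are hypothetical, and this is a generic Shewhart individuals chart, not the separation chart proposed in the article.

```python
import math
import statistics


def shewhart_limits(baseline_od, k=3.0):
    """Return (lower, center, upper) control limits on the log-OD scale.

    baseline_od: in-control optical density readings (hypothetical data).
    k: width of the limits in standard deviations (3 is the usual choice).
    """
    logs = [math.log(x) for x in baseline_od]
    center = statistics.mean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation
    return center - k * sigma, center, center + k * sigma


def out_of_control(od_values, limits):
    """Return the indices of readings whose log-OD falls outside the limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(od_values)
            if not (lower <= math.log(x) <= upper)]


# Hypothetical baseline readings from plates assumed to be in control.
baseline = [1.10, 1.25, 0.95, 1.05, 1.18, 1.02, 1.30, 0.98]
limits = shewhart_limits(baseline)

# New readings to monitor; the last two are deliberately extreme.
flagged = out_of_control([1.08, 1.15, 3.50, 0.20], limits)
print(flagged)
```

Because the limits here are estimated from readings pooled across plates, between-plate variation widens them, which is the limitation the abstract raises for this kind of chart.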

Book ChapterDOI
01 Jan 2005
TL;DR: This chapter describes software tools for creating, manipulating, and visualizing graphs in the Bioconductor project and gives the rationale for the design decisions and brief outlines of how to make use of these tools.
Abstract: We describe software tools for creating, manipulating, and visualizing graphs in the Bioconductor project. We give the rationale for our design decisions and provide brief outlines of how to make use of these tools. The discussion mirrors that of Chapter 20 where the different mathematical constructs were described. It is worth differentiating between packages that are mainly infrastructure (sets of tools that can be used to create other pieces of software) and packages that are designed to provide an end-user application. The packages graph, RBGL, and Rgraphviz are infrastructure packages. Software developers may use these packages to construct tools aimed at specific applications areas, such as the GOstats package.

6 citations

Journal ArticleDOI
TL;DR: A large-scale cross-sectional analysis of self-reported dietary intake data derived from the web-based National Health and Nutrition Examination Survey 2009–2010 dietary screener showed fruit, vegetables and milk intake frequency declined, while total dairy remained stable and added sugars increased.
Abstract: Objective: To characterise dietary habits, their temporal and spatial patterns and associations with BMI in the 23andMe study population. Design: We present a large-scale cross-sectional analysis of self-reported dietary intake data derived from the web-based National Health and Nutrition Examination Survey 2009–2010 dietary screener. Survey-weighted estimates for each food item were characterised by age, sex, race/ethnicity, education and BMI. Temporal patterns were plotted over a 2-year time period, and average consumption for select food items was mapped by state. Finally, dietary intake variables were tested for association with BMI. Setting: US-based adults 20–85 years of age participating in the 23andMe research programme. Participants: Participants were 23andMe customers who consented to participate in research (n 526 774) and completed web-based surveys on demographic and dietary habits. Results: Survey-weighted estimates show very few participants met federal recommendations for fruit: 2·6 %, vegetables: 5·9 % and dairy intake: 2·8 %. Between 2017 and 2019, fruit, vegetables and milk intake frequency declined, while total dairy remained stable and added sugars increased. Seasonal patterns in reporting were most pronounced for ice cream, chocolate, fruits and vegetables. Dietary habits varied across the USA, with higher intake of sugar and energy dense foods characterising areas with higher average BMI. In multivariate-adjusted models, BMI was directly associated with the intake of processed meat, red meat, dairy and inversely associated with consumption of fruit, vegetables and whole grains. Conclusions: 23andMe research participants have created an opportunity for rapid, large-scale, real-time nutritional data collection, informing demographic, seasonal and spatial patterns with broad geographical coverage across the USA.

5 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A diverse set of genes is differentially expressed between these cell lines, but only a fraction can be attributed to changes in DNA copy number or methylation; this set included the ABC transporter ABCC4, implicated in drug resistance, and the metastasis-associated MET oncogene.
Abstract: Cancer cells derived from different stages of tumor progression may exhibit distinct biological properties, as exemplified by the paired lung cancer cell lines H1993 and H2073. While H1993 was derived from a chemo-naive metastasized tumor, H2073 originated from the chemo-resistant primary tumor from the same patient and exhibits a strikingly different drug response profile. To understand the underlying genetic and epigenetic bases for their biological properties, we investigated these cells using a wide range of large-scale methods including whole genome sequencing, RNA sequencing, SNP array, DNA methylation array, and de novo genome assembly. We conducted an integrative analysis of both cell lines to distinguish between potential driver and passenger alterations. Although many genes are mutated in these cell lines, the combination of DNA- and RNA-based variant information strongly implicates a small number of genes including TP53 and STK11 as likely drivers. Likewise, we found a diverse set of genes differentially expressed between these cell lines, but only a fraction can be attributed to changes in DNA copy number or methylation. This set included the ABC transporter ABCC4, implicated in drug resistance, and the metastasis-associated MET oncogene. While the rich data content allowed us to reduce the space of hypotheses that could explain most of the observed biological properties, we also caution that there is a lack of statistical power and inherent limitations in such single patient case studies.

5 citations


Cited by
Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal ArticleDOI
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

29,413 citations
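
To make the "overdispersed Poisson model" in the abstract above concrete, here is a minimal sketch that assumes the negative binomial parameterisation, in which the variance is mu + phi * mu^2 and phi = 0 recovers the Poisson case. The function names and the simple method-of-moments estimator are illustrative assumptions; this is not the estimator implemented in edgeR, which the abstract describes as moderating overdispersion across transcripts with empirical Bayes methods.

```python
import statistics


def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi.

    With phi = 0 this reduces to the Poisson variance (equal to the mean).
    """
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Crude method-of-moments dispersion estimate for one transcript.

    Solves sample_variance = mean + phi * mean**2 for phi and truncates at
    zero. Illustrative only; needs at least two replicate counts.
    """
    m = statistics.mean(counts)
    v = statistics.variance(counts)  # sample variance across replicates
    if m == 0:
        return 0.0
    return max(0.0, (v - m) / m ** 2)


# Hypothetical replicate counts for a single transcript.
print(moment_dispersion([5, 15]))
print(nb_variance(100, 0.1))
```

With only a handful of replicates, a per-transcript estimate like this is very noisy, which is the motivation the abstract gives for sharing information across transcripts.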

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Posted ContentDOI
17 Nov 2014-bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations