Showing papers by "Robert Gentleman" published in 2010


Journal Article
TL;DR: In an application to microarray data, it was found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%, and it was shown that this particular statistic pair induces a lower bound on fold-change among the set of discoveries.
Abstract: With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering—using filter/test pairs that are independent under the null hypothesis but correlated under the alternative—is a general approach that can substantially increase the efficiency of experiments.
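
The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration on simulated data, not the authors' implementation: the function names, the toy simulation, and the default thresholds are all hypothetical, and the p-values use a normal approximation so that the block needs only the Python standard library. That approximation is anti-conservative for small samples; a real analysis would use the t distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def t_statistic(x, y):
    """Equal-variance two-sample t statistic for groups x and y."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def approx_p_value(t):
    """Two-sided p-value from a normal approximation (assumption: a real
    analysis would use the t distribution with nx + ny - 2 degrees of freedom)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Positions rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            n_reject = rank
    return sorted(order[:n_reject])


def filter_then_test(rows, n_first, keep_fraction=0.5, alpha=0.05):
    """Stage 1: keep the rows with the largest overall variance, computed
    without using group labels. Stage 2: test and adjust only those rows."""
    n_keep = max(1, int(len(rows) * keep_fraction))
    by_variance = sorted(
        range(len(rows)), key=lambda g: variance(rows[g]), reverse=True
    )
    kept = sorted(by_variance[:n_keep])
    pvals = [
        approx_p_value(t_statistic(rows[g][:n_first], rows[g][n_first:]))
        for g in kept
    ]
    return [kept[j] for j in benjamini_hochberg(pvals, alpha)]


def simulate(n_genes=200, n_signal=10, n_per_group=5, shift=8.0, seed=1):
    """Toy data: the first n_signal rows have a mean shift in the second group."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_genes):
        first = [rng.gauss(0, 1) for _ in range(n_per_group)]
        mu = shift if g < n_signal else 0.0
        second = [rng.gauss(mu, 1) for _ in range(n_per_group)]
        rows.append(first + second)
    return rows


rows = simulate()
unfiltered = filter_then_test(rows, n_first=5, keep_fraction=1.0)
filtered = filter_then_test(rows, n_first=5, keep_fraction=0.5)
print(len(unfiltered), len(filtered))
```

The filter statistic here is the variance across all samples, so it never looks at the group labels, while the multiple-testing adjustment in stage 2 is applied only to the rows that survive. The toy data use a very large effect and are meant only to show the mechanics; they do not reproduce the 50% gain reported for the microarray application.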

693 citations


Journal Article
27 May 2010 - Nature
TL;DR: A comprehensive view of somatic alterations in a single lung tumour is presented, and the first evidence, to the authors' knowledge, of distinct selective pressures present within the tumour environment is provided.
Abstract: Complete genome sequencing has already provided insights into the mutation spectra of a number of cancer types, including lung cancer. The latest sequencing technologies mean that it is possible to provide a genome-wide view of mutation differences, and this has now been done for lung cancer, comparing the complete sequences of a primary lung tumour (an adenocarcinoma in a male who reported smoking an average of 25 cigarettes a day for 15 years) and adjacent normal tissue. The comparison revealed more than 50,000 point mutations, of which 530 were validated, 392 of them in coding regions, including previously known variations such as KRAS proto-oncogene mutation and amplification. The data suggest that genetically complex tumours may contain many partially redundant mutations, and that identifying recurrent cancer-causing driver mutations will require the sequencing of many more samples. Complete genome sequencing has already provided insights into the mutation spectra of several cancer types. Here, the first complete sequences are provided of a primary lung tumour and adjacent normal tissue. Comparison of the two reveals a variety of somatic mutations in the cancer genome, including changes in the KRAS proto-oncogene. The results reveal a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes, and selection against mutations in promoter regions. Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease [1,2]. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum [3-8].
Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines [9-13]. Here we present the complete sequences of a primary lung tumour (60× coverage) and adjacent normal tissue (46×). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.

541 citations


Journal Article
TL;DR: MyoD was found to be constitutively bound to thousands of additional sites in both myoblasts and myotubes, and the genome-wide binding of MyoD was associated with regional histone acetylation.

459 citations


Journal Article
TL;DR: Gene expression profiles between benign epithelia of patients with and without prostate cancer are very similar; however, these tissues exhibit differences in the expression levels of several genes previously associated with prostate cancer development or progression.
Abstract: Background: Several malignancies are known to exhibit a “field effect,” whereby regions beyond tumor boundaries harbor histologic or molecular changes that are associated with cancer. We sought to determine if histologically benign prostate epithelium collected from men with prostate cancer exhibits features indicative of premalignancy or field effect. Experimental Design: Prostate needle biopsies from 15 men with high-grade (Gleason 8-10) prostate cancer and 15 age- and body mass index–matched controls were identified from a biospecimen repository. Benign epithelia from each patient were isolated by laser capture microdissection. RNA was isolated, amplified, and used for microarray hybridization. Quantitative PCR was used to determine the expression of specific genes of interest. Alterations in protein expression were analyzed through immunohistochemistry. Results: Overall patterns of gene expression in microdissected benign prostate-associated benign epithelium (BABE) and cancer-associated benign epithelium (CABE) were similar. Two genes previously associated with prostate cancer, PSMA and SSTR1, were significantly upregulated in the CABE group (false discovery rate […]). […] ERG, HOXC4, HOXC5, and MME were also increased in CABE by quantitative reverse transcription-PCR, although other genes commonly altered in prostate cancer were not different between the BABE and CABE samples. The expression of MME and PSMA proteins on immunohistochemistry coincided with their mRNA alterations. Conclusion: Gene expression profiles between benign epithelia of patients with and without prostate cancer are very similar. However, these tissues exhibit differences in the expression levels of several genes previously associated with prostate cancer development or progression. These differences may comprise a field effect and represent early events in carcinogenesis. Clin Cancer Res; 16(22); 5414–23. ©2010 AACR.

43 citations


Journal Article
TL;DR: In the example reported by Talloen et al. (1), the detection power achieved by the more general—and thus, more broadly applicable—overall variance filter was as good as or better than that of a technology-specific criterion at all filtering thresholds.
Abstract: Talloen et al. (1) point out an interesting special case of a filtering criterion that is specifically constructed for Affymetrix GeneChip technology. In general, methods that are adapted to a particular data generation technique are likely to outperform more general criteria. It is striking, however, that, in the example reported by Talloen et al. (1), the detection power achieved by the more general—and thus, more broadly applicable—overall variance filter was as good as or better than that of a technology-specific criterion at all filtering thresholds (figure 1 in ref. 1).

20 citations


Journal Article
05 Jan 2010 - Database
TL;DR: DATABASE: the Journal of Biological Databases and Curation invites the submission of novel strategies for the efficient and accurate curation of biological data, including systems to support ongoing curation by both individual researchers and research communities in order to ensure long-term availability and reusability of these data.
Abstract: Most computational tools for biologists work best with large amounts of data. The larger the quantity of data, the more rigorous statistical analyses can support the discovery of new hypotheses for testing in a laboratory. A variety of technological developments during the past two decades have accelerated the rate of deposition of data into databases. Currently there are many public databases where data from, for example, DNA and protein sequences or 3D protein structures, and more complex information types, like ontologies, networks and pathways, are deposited, maintained, annotated, curated and stored. Indeed, more recent efforts to store, for example, phenotype (in addition to genotypes) and clinical trials signify a new tendency to gather more complex data types. The data collected in these large public repositories represent valuable and significant resources for ongoing knowledge extraction. Mining of this data using computational tools is an increasingly indispensable part of modern research, and the organized storage of the data in databases is obligatory. Indeed, such approaches are likely to have serious impact on the reproducibility of results. Resourceful tools for the establishment, interrogation, rearrangement, display and interpretation of new and large databases are frequently minor points in a publication and are relegated to brief statements in methods sections or in figure legends when the final work is published. However, there are often original and creative computational methods which resulted in these discoveries but which are not communicated in the scientific literature because the description of a database and the tools to interact with it are not deemed essential to the communication.
Accepting that the archiving, curation, analysis and understanding of all of this data is a challenge, DATABASE: the Journal of Biological Databases and Curation will publish articles which describe the construction of novel databases and the software tools designed to interact with these databases. All submissions should describe worthy resources for the scientific research endeavor. We also plan to invite reviews and tutorials that will make the databases described in these pages more user friendly and easier to match with the tasks that need to be accomplished. In addition, manuscripts that describe collections of data and associated tools where a biologically relevant discovery or example is presented will be reviewed more favorably. We would also be prepared to review opinions, discussions and/or demonstrations of how new technologies, new data models (or data exchange models) can be used to address complexities presented by the new large datasets and/or personal identification challenges the new initiatives are presenting. The journal will also accept update reports which describe new features and content of existing databases. The maintenance and longevity (when appropriate) of databases is an ongoing point of discussion, and we welcome opinion pieces and the presentation of how such problems could best be addressed. Scalability and federation of a number of databases, the Web 2.0 and 3.0 integration and the semantic web are also pertinent discussions for the biological database community, and we hope that DATABASE: the Journal of Biological Databases and Curation becomes the place where some of these ideas are discussed and deliberated. We will provide online commenting and discussion tools on the journal's website to encourage this. Extensive and ongoing curation of the biological data being stored in public databases ensures that these data can be discovered and used optimally, and facilitates the integration of information from multiple sources. 
Structured collection of metadata, using standard terminology, will foster more complex and relevant analyses. DATABASE: the Journal of Biological Databases and Curation invites the submission of novel strategies for the efficient and accurate curation of biological data, including systems to support ongoing curation by both individual researchers and research communities in order to ensure long-term availability and reusability of these data. In support of the new open access policies of many funding agencies as well as the open source software movement which started in the 1980s, DATABASE: the Journal of Biological Databases and Curation will be a fully open access journal from launch. In addition, it will be a condition of publication that all databases and software described in DATABASE articles are made publicly available. The journal will be online-only, providing fast access of its full content to scientists worldwide. Submissions to DATABASE: the Journal of Biological Databases and Curation are welcomed via the journal's web site at www.database.oxfordjournals.org. We also welcome suggestions for how this new forum can best serve the needs of the increasingly important field it represents.

16 citations