Open access · Journal Article · DOI: 10.1073/PNAS.0506580102

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

25 Oct 2005 · Proceedings of the National Academy of Sciences of the United States of America (National Academy of Sciences) · Vol. 102, Iss. 43, pp. 15545-15550
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
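The abstract does not state the statistic itself. GSEA's enrichment score is a weighted running sum taken down a ranked gene list. At each gene in the set, the sum rises by |r_j|^p / N_R, where N_R is the total of |r_j|^p over the set's members. At each gene outside the set, it falls by 1/(N - N_H). The enrichment score is the signed maximum deviation from zero.

The following is a minimal sketch under stated assumptions:
- Genes arrive already sorted by a correlation score r_j.
- The function name `enrichment_score` is hypothetical and not the GSEA software's API.
- The permutation test and normalization steps are omitted.

With p = 0, every hit step is equal and the score reduces to a Kolmogorov-Smirnov-style statistic.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], scores: Sequence[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Signed maximum deviation of a weighted running sum (GSEA-style sketch).

    ranked_genes: genes sorted from most positively to most negatively
    correlated with the phenotype; scores: the matching values r_j.
    """
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list but not cover it")
    # N_R: total weight |r_j|^p carried by the genes in the set.
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("all set members have zero weight")
    miss_step = 1.0 / (n - n_hit)  # equal penalty for every gene outside the set
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** p) / n_r if h else -miss_step
        if abs(running) > abs(es):  # keep the largest excursion, with its sign
            es = running
    return es
```

For the ranked list A to E with scores 2, 1, 0, -1, -2, the set {A, B} concentrates at the top. Its running sum peaks at +1, whereas {D, E} gives a score of -1.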


Journal Article · DOI: 10.1038/NPROT.2008.211
01 Jan 2009 · Nature Protocols
Abstract: The DAVID bioinformatics resources consist of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers. This is followed by analysis using one or more text- and pathway-mining tools, such as gene functional classification, the functional annotation chart or clustering, and the functional annotation table. By following this protocol, investigators can gain an in-depth understanding of the biological themes enriched in lists of genes from genome-scale studies.

27,356 Citations
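DAVID's functional annotation chart scores each annotation term by how overrepresented it is in the uploaded list relative to a background population. DAVID reports a modified Fisher exact p-value (the EASE score) for this.

The sketch below computes only the plain, unmodified one-sided test, the upper tail of the hypergeometric distribution. It is an illustration of the general idea and not DAVID's exact scoring. The function name `overrepresentation_p` is hypothetical.

```python
from math import comb


def overrepresentation_p(list_hits: int, list_size: int,
                         pop_hits: int, pop_size: int) -> float:
    """P(X >= list_hits) for X ~ Hypergeometric(pop_size, pop_hits, list_size).

    list_hits: list genes annotated with the term; list_size: genes in the list;
    pop_hits: background genes annotated with the term; pop_size: background size.
    """
    if not (0 <= list_hits <= min(list_size, pop_hits)
            and list_size <= pop_size and pop_hits <= pop_size):
        raise ValueError("inconsistent counts")
    total = comb(pop_size, list_size)
    upper = min(list_size, pop_hits)
    # Sum the probabilities of drawing list_hits or more annotated genes.
    tail = sum(comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
               for k in range(list_hits, upper + 1))
    return tail / total  # exact integer arithmetic until this final division
```

For example, 3 annotated genes in a 300-gene list, against 40 of 30,000 in the background (about 0.4 expected by chance), yields a p-value below 0.01.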

Open access · Journal Article · DOI: 10.1093/NAR/GKV007
Matthew E. Ritchie, Belinda Phipson, Di Wu, Yifang Hu and 4 more (5 institutions)
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Topics: Microarray databases (61%), Bioconductor (51%)

13,819 Citations
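The "information borrowing" mentioned in the limma abstract is empirical Bayes moderation of the gene-wise variances. Each gene's residual variance s² on d degrees of freedom is shrunk toward a prior value s0² carried by d0 prior degrees of freedom:
- s̃² = (d0·s0² + d·s²) / (d0 + d)
- t̃ = β̂ / (s̃·√v), on d0 + d degrees of freedom

The sketch below covers a single gene in a two-group comparison, under these assumptions:
- d0 and s0² are passed in by hand, whereas limma estimates them from all genes at once.
- The function `moderated_t` is hypothetical and not part of the limma API.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def moderated_t(group_a: Sequence[float], group_b: Sequence[float],
                d0: float, s0_sq: float) -> Tuple[float, float]:
    """Two-group moderated t-statistic for one gene; returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    d = n_a + n_b - 2  # residual degrees of freedom
    s_sq = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrink toward prior s0^2
    beta = mean(group_b) - mean(group_a)  # estimated log-fold change
    v = 1 / n_a + 1 / n_b                 # unscaled variance of beta
    return beta / sqrt(s_tilde_sq * v), d0 + d
```

With d0 = 0, the function returns the ordinary pooled two-sample t. A large prior variance s0² pulls small, noisy gene-wise variances upward and tempers the resulting statistic, while the extra d0 degrees of freedom reflect the borrowed information.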

Open access · Journal Article · DOI: 10.1093/NAR/GKN923
Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. Gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood that investigators will identify the biological processes most pertinent to their study. This survey collects approximately 68 bioinformatics enrichment tools currently available in the community and categorizes them into three major classes according to their underlying enrichment algorithms. Together, the collection, the tool classification and the associated questions and issues give an up-to-date view of advantages, pitfalls and recent trends at the simpler level of tool classes rather than tool by tool. The survey should therefore help tool designers, developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories and tools, enabling them to make the best choices for their research interests.

11,360 Citations
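Enrichment tools that score each annotation term independently run one test per term, often thousands at once. Their raw p-values are therefore usually adjusted for multiple testing. The survey abstract does not name a procedure.

The sketch below assumes the Benjamini-Hochberg false discovery rate adjustment, one common choice. The function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the raw values 0.01, 0.04, 0.03 and 0.2, the adjusted values are 0.04, 0.0533, 0.0533 and 0.2. The third term inherits the smaller adjusted value of the term ranked just above it.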

Open access
01 Aug 2000
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor. more

4,833 Citations

Open access · Posted Content
Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.

Topics: Feature learning (52%)

4,132 Citations
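GraphSAGE's "sampling and aggregating" step can be illustrated with its mean aggregator. Each node's features are concatenated with the mean of a sampled set of neighbors' features. The result is passed through a linear map and a nonlinearity, then scaled to unit length.

The following is a minimal, untrained, single-layer sketch under these assumptions:
- Plain Python lists stand in for tensors.
- ReLU is used as the nonlinearity.
- Neighbors are sampled uniformly without replacement, capped at `num_samples`.
- The function `sage_layer` is hypothetical.

```python
import math
import random
from typing import Dict, List


def sage_layer(features: Dict[str, List[float]],
               adjacency: Dict[str, List[str]],
               weight: List[List[float]],
               num_samples: int,
               rng: random.Random) -> Dict[str, List[float]]:
    """One GraphSAGE-style layer with a mean aggregator; weight rows have length 2*d."""
    out = {}
    for v, h_v in features.items():
        neighbors = adjacency.get(v, [])
        # Sample a fixed-size neighborhood (uniform, without replacement).
        if len(neighbors) > num_samples:
            neighbors = rng.sample(neighbors, num_samples)
        dim = len(h_v)
        if neighbors:
            h_n = [sum(features[u][i] for u in neighbors) / len(neighbors)
                   for i in range(dim)]
        else:
            h_n = [0.0] * dim
        concat = h_v + h_n  # CONCAT(self features, mean of sampled neighbors)
        # Linear map followed by ReLU.
        z = [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight]
        norm = math.sqrt(sum(x * x for x in z))
        out[v] = [x / norm for x in z] if norm > 0 else z  # unit L2 norm
    return out
```

The weights are shared across nodes, and the embedding depends only on features and neighbors. This is what makes the approach inductive: a node added after training can be embedded by the same function without learning a new per-node vector.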


Journal Article · DOI: 10.1126/SCIENCE.270.5235.467
20 Oct 1995 · Science
Abstract: A high-capacity system was developed to monitor the expression of many genes in parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on glass were used for quantitative expression measurements of the corresponding genes. Because of the small format and high density of the arrays, hybridization volumes of 2 microliters could be used, enabling detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA. Differential expression measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color fluorescence hybridization.

Topics: DNA Microarray Analysis (58%), DNA microarray (57%), Serial analysis of gene expression (56%)

10,128 Citations
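In a two-color hybridization, each spot yields one fluorescence intensity per channel. Differential expression is then usually expressed as a log ratio of the two channels.

The sketch below assumes:
- Intensities are already background-corrected and positive.
- Global median centering is used as the normalization, so that a typical spot reads as unchanged. This choice is not taken from the abstract.
- The function `normalized_log_ratios` is hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def normalized_log_ratios(channel_1: Sequence[float],
                          channel_2: Sequence[float]) -> List[float]:
    """Per-spot log2(channel_1 / channel_2), shifted so the median log ratio is 0."""
    if len(channel_1) != len(channel_2):
        raise ValueError("channels must have one intensity per spot")
    if any(x <= 0 for x in channel_1) or any(x <= 0 for x in channel_2):
        raise ValueError("intensities must be positive (background-corrected)")
    raw = [math.log2(a / b) for a, b in zip(channel_1, channel_2)]
    center = median(raw)  # global median centering (assumed normalization)
    return [x - center for x in raw]
```

On the log2 scale, +1 and -1 correspond to a twofold difference in either direction, which keeps up- and down-regulation symmetric around 0.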

Open access · Book
01 Mar 1973
Abstract: This Second Edition of Myles Hollander and Douglas A. Wolfe's successful Nonparametric Statistical Methods meets the needs of a new generation of users, with completely up-to-date coverage of this important statistical area. Like its predecessor, the revised edition, along with its companion ftp site, aims to equip readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for a given situation. An extensive array of examples drawn from actual experiments illustrates clearly how to use nonparametric approaches to handle one- or two-sample location and dispersion problems, dichotomous data, and one-way and two-way layout problems. An ideal text for an upper-level undergraduate or first-year graduate course, Nonparametric Statistical Methods, Second Edition is also an invaluable source for professionals who want to keep abreast of the latest developments within this dynamic branch of modern statistics.

7,240 Citations
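A standard nonparametric tool for the two-sample location problem mentioned in this abstract is the Wilcoxon rank-sum statistic. It is computed as follows:
1. Pool both samples and rank the pooled values, giving tied values the average of the ranks they occupy (midranks).
2. Sum the ranks that belong to the first sample.

The sketch computes only the statistic W; it does not compute an exact or approximate p-value. The function names `midranks` and `rank_sum` are hypothetical.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(x: Sequence[float], y: Sequence[float]) -> float:
    """Wilcoxon rank-sum statistic W: sum of the pooled midranks of sample x."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])
```

Under the null hypothesis of no location shift, W has mean n(N + 1)/2, where n is the size of x and N the pooled sample size. Values far from that mean suggest a shift between the two samples.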

Journal Article · DOI: 10.1038/NG1180
01 Jul 2003 · Nature Genetics
Abstract: DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1α and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.

Topics: Gene expression profiling (58%), DNA microarray (54%), Gene expression (53%)

6,521 Citations
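Detecting "modest but coordinate changes" means asking whether a whole set drifts toward one end of a ranked gene list more than chance allows. Chance is gauged by repeating the analysis with shuffled sample labels.

The following is a minimal sketch of that idea under these assumptions:
- Genes are ranked by the difference of class means.
- An unweighted Kolmogorov-Smirnov-style running sum is used as the set statistic.
- Permuted scores are compared by absolute value, a two-sided simplification.

None of these choices is taken from the abstract. The function names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple


def ks_enrichment(ranked: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score (signed maximum deviation)."""
    n = len(ranked)
    n_hit = sum(g in gene_set for g in ranked)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    up, down = 1.0 / n_hit, 1.0 / (n - n_hit)
    running = es = 0.0
    for g in ranked:
        running += up if g in gene_set else -down
        if abs(running) > abs(es):
            es = running
    return es


def rank_by_mean_difference(expr: Dict[str, List[float]],
                            labels: Sequence[int]) -> List[str]:
    """Genes sorted by mean(class 1) - mean(class 0), largest first (assumed metric)."""
    def diff(values: List[float]) -> float:
        ones = [v for v, lab in zip(values, labels) if lab == 1]
        zeros = [v for v, lab in zip(values, labels) if lab == 0]
        return mean(ones) - mean(zeros)
    return sorted(expr, key=lambda g: diff(expr[g]), reverse=True)


def permutation_p(expr: Dict[str, List[float]], labels: Sequence[int],
                  gene_set: Set[str], n_perm: int = 1000,
                  seed: int = 0) -> Tuple[float, float]:
    """Observed enrichment score and a label-permutation p-value."""
    rng = random.Random(seed)
    observed = ks_enrichment(rank_by_mean_difference(expr, labels), gene_set)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotype labels, class sizes unchanged
        null_es = ks_enrichment(rank_by_mean_difference(expr, shuffled), gene_set)
        if abs(null_es) >= abs(observed):  # two-sided comparison (simplification)
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)  # add-one keeps p > 0
```

Permuting labels rather than genes keeps the correlation structure among genes intact. This matters because the genes in a pathway tend to move together.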

Patent · DOI: 10.1038/NBT1296-1675
David J. Lockhart, Eugene L. Brown, Gordon G. Wong, Mark S. Chee and 1 more (1 institution)
Abstract: This invention provides methods of monitoring the expression levels of a multiplicity of genes. The methods involve hybridizing a nucleic acid sample to a high-density array of oligonucleotide probes, where the array contains oligonucleotide probes complementary to subsequences of target nucleic acids in the sample. In one embodiment, the method involves the following steps:
1. Provide a pool of target nucleic acids comprising RNA transcripts of one or more target genes, or nucleic acids derived from the RNA transcripts.
2. Hybridize said pool to an array of oligonucleotide probes immobilized on a surface. The array comprises more than 100 different oligonucleotides, and each different oligonucleotide is localized in a predetermined region of the surface. The density of the different oligonucleotides is greater than about 60 per 1 cm², and the probes are complementary to the RNA transcripts or the nucleic acids derived from them.
3. Quantify the hybridized nucleic acids in the array.

Topics: Molecular beacon (67%), Nucleic acid (62%), Oligonucleotide (61%)

4,382 Citations
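The patent's probes are "complementary to subsequences of target nucleic acids." Such probes can be generated by tiling windows along a transcript and taking the reverse complement of each window. The reversal is needed because hybridization pairs strands antiparallel.

The sketch below assumes:
- Probes are written in the DNA alphabet.
- The default length is 25 nucleotides.
- The window step is 1.
- The function `tile_probes` is hypothetical.

None of these parameters comes from the abstract.

```python
from typing import List, Tuple

# Watson-Crick complement of each RNA base, written in the DNA alphabet.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def tile_probes(rna: str, length: int = 25, step: int = 1) -> List[Tuple[int, str]]:
    """(start, probe) pairs; each probe is complementary to rna[start:start+length]."""
    rna = rna.upper()
    probes = []
    for start in range(0, len(rna) - length + 1, step):
        window = rna[start:start + length]
        # Reverse so the probe reads 5'->3' while pairing antiparallel with the target.
        probe = "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(window))
        probes.append((start, probe))
    return probes
```

A transcript of L bases yields L - length + 1 overlapping probes at step 1. A larger step trades coverage for fewer distinct oligonucleotides on the surface.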

Journal Article · DOI: 10.2307/2344557
01 Mar 1974
Topics: Nonparametric statistics (57%)

3,841 Citations
