Author

Hervé Pagès

Bio: Hervé Pagès is an academic researcher at Fred Hutchinson Cancer Research Center. The author has contributed to research on Bioconductor and matrix representation, has an h-index of 9, and has co-authored 14 publications that have received 5,823 citations.

Topics: Bioconductor, Matrix representation, Heterochromatin, Nucleolus, Biological data

Papers

Journal Article•DOI•

Software for computing and annotating genomic ranges.

Michael F. Lawrence¹, Wolfgang Huber², Hervé Pagès³, Patrick Aboyoun³, Marc R. J. Carlson³, Robert Gentleman¹, Martin Morgan³, Vincent J. Carey⁴

Genentech¹, European Bioinformatics Institute², Fred Hutchinson Cancer Research Center³, Brigham and Women's Hospital⁴

08 Aug 2013-PLOS Computational Biology

TL;DR: This work describes Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions, including those for sequence analysis, differential expression analysis and visualization.

Abstract: We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.

3,005 citations

Journal Article•DOI•

Orchestrating high-throughput genomic analysis with Bioconductor

Wolfgang Huber, Vincent J. Carey¹, Robert Gentleman², Simon Anders, Marc R. J. Carlson³, Benilton S. Carvalho⁴, Héctor Corrada Bravo⁵, Sean Davis⁶, Laurent Gatto⁷, Thomas Girke⁸, Raphael Gottardo³, Florian Hahne⁹, Kasper D. Hansen¹⁰, Rafael A. Irizarry¹, Michael S. Lawrence², Michael I. Love¹, James W. MacDonald¹¹, Valerie Obenchain³, Andrzej K. Oleś, Hervé Pagès³, Alejandro Reyes, Paul Shannon³, Gordon K. Smyth¹², Dan Tenenbaum³, Levi Waldron¹³, Martin Morgan³

Harvard University¹, Genentech², Fred Hutchinson Cancer Research Center³, State University of Campinas⁴, University of Maryland, College Park⁵, National Institutes of Health⁶, University of Cambridge⁷, University of California, Riverside⁸, Novartis⁹, Johns Hopkins University¹⁰, University of Washington¹¹, Walter and Eliza Hall Institute of Medical Research¹², City University of New York¹³

01 Feb 2015-Nature Methods

TL;DR: An overview of Bioconductor, an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology, which comprises 934 interoperable packages contributed by a large, diverse community of scientists.

Abstract: Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

2,818 citations

Journal Article•DOI•

ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

Lihua Julie Zhu¹, Claude Gazin², Nathan D. Lawson¹, Hervé Pagès³, Simon Lin⁴, David S. Lapointe¹, Michael R. Green¹

University of Massachusetts Medical School¹, Centre national de la recherche scientifique², Fred Hutchinson Cancer Research Center³, Northwestern University⁴

11 May 2010-BMC Bioinformatics

TL;DR: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R.

Abstract: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data, such as a different chromatin immunoprecipitation (ChIP) preparation or a dataset from the literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.

911 citations

Journal Article•DOI•

ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data

Martin Morgan¹, Simon Anders¹, Michael V. Lawrence¹, Patrick Aboyoun¹, Hervé Pagès¹, Robert Gentleman¹

Fred Hutchinson Cancer Research Center¹

01 Oct 2009-Bioinformatics

TL;DR: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data, provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources.

Abstract: Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org

475 citations

Journal Article•DOI•

Orchestrating Single-Cell Analysis with Bioconductor

Robert A. Amezquita¹, Aaron T. L. Lun²,³, Etienne Becht¹, Vincent J. Carey⁴, Lindsay N. Carpp¹, Ludwig Geistlinger⁵, Federico Marini, Kevin Rue-Albrecht⁶, Davide Risso⁷,⁸, Charlotte Soneson⁹,¹⁰, Levi Waldron⁵, Hervé Pagès¹, Mike L. Smith, Wolfgang Huber, Martin Morgan¹¹, Raphael Gottardo¹, Stephanie C. Hicks¹²

Fred Hutchinson Cancer Research Center¹, Genentech², University of Cambridge³, Brigham and Women's Hospital⁴, City University of New York⁵, University of Oxford⁶, Cornell University⁷, University of Padua⁸, Swiss Institute of Bioinformatics⁹, Friedrich Miescher Institute for Biomedical Research¹⁰, Roswell Park Cancer Institute¹¹, Johns Hopkins University¹²

01 Feb 2020-Nature Methods

TL;DR: This Perspective highlights open-source software for single-cell analysis released as part of the Bioconductor project, providing an overview for users and developers.

Abstract: Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.

332 citations

Cited by

Journal Article•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹,², Wolfgang Huber, Simon Anders

Harvard University¹, Max Planck Society²

05 Dec 2014-Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal Article•DOI•

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna¹, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark J. Daly, Mark A. DePristo

Broad Institute¹

01 Sep 2010-Genome Research

TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Posted Content•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹, Wolfgang Huber, Simon Anders

Harvard University¹

17 Nov 2014-bioRxiv

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

Journal Article•DOI•

HTSeq—a Python framework to work with high-throughput sequencing data

Simon Anders, Paul Theodor Pyl, Wolfgang Huber

15 Jan 2015-Bioinformatics

TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de

15,744 citations

Journal Article•DOI•

Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

Evan Bolyen¹, Jai Ram Rideout¹, Matthew R. Dillon¹, Nicholas A. Bokulich¹, Christian C. Abnet², Gabriel A. Al-Ghalith³, Harriet Alexander⁴,⁵, Eric J. Alm⁶, Manimozhiyan Arumugam⁷, Francesco Asnicar⁸, Yang Bai⁹, Jordan E. Bisanz¹⁰, Kyle Bittinger¹¹, Asker Daniel Brejnrod⁷, Colin J. Brislawn¹², C. Titus Brown⁴, Benjamin J. Callahan¹³, Andrés Mauricio Caraballo-Rodríguez¹⁴, John Chase¹, Emily K. Cope¹, Ricardo Silva¹⁴, Christian Diener¹⁵, Pieter C. Dorrestein¹⁴, Gavin M. Douglas¹⁶, Daniel M. Durall¹⁷, Claire Duvallet⁶, Christian F. Edwardson, Madeleine Ernst¹⁴,¹⁸, Mehrbod Estaki¹⁷, Jennifer Fouquier¹⁹, Julia M. Gauglitz¹⁴, Sean M. Gibbons¹⁵,²⁰, Deanna L. Gibson¹⁷, Antonio Gonzalez¹⁴, Kestrel Gorlick¹, Jiarong Guo²¹, Benjamin Hillmann³, Susan Holmes²², Hannes Holste¹⁴, Curtis Huttenhower²³,²⁴, Gavin A. Huttley²⁵, Stefan Janssen²⁶, Alan K. Jarmusch¹⁴, Lingjing Jiang¹⁴, Benjamin D. Kaehler²⁷,²⁵, Kyo Bin Kang¹⁴,²⁸, Christopher R. Keefe¹, Paul Keim¹, Scott T. Kelley²⁹, Dan Knights³, Irina Koester¹⁴, Tomasz Kosciolek¹⁴, Jorden Kreps¹, Morgan G. I. Langille¹⁶, Joslynn S. Lee³⁰, Ruth E. Ley³¹,³², Yong-Xin Liu, Erikka Loftfield², Catherine A. Lozupone¹⁹, Massoud Maher¹⁴, Clarisse Marotz¹⁴, Bryan D Martin²⁰, Daniel McDonald¹⁴, Lauren J. McIver²⁴,²³, Alexey V. Melnik¹⁴, Jessica L. Metcalf³³, Sydney C. Morgan¹⁷, Jamie Morton¹⁴, Ahmad Turan Naimey¹, Jose A. Navas-Molina³⁴,¹⁴, Louis-Félix Nothias¹⁴, Stephanie B. Orchanian, Talima Pearson¹, Samuel L. Peoples³⁵,²⁰, Daniel Petras¹⁴, Mary L. Preuss³⁶, Elmar Pruesse¹⁹, Lasse Buur Rasmussen⁷, Adam R. Rivers³⁷, Michael S. Robeson³⁸, Patrick Rosenthal³⁶, Nicola Segata⁸, Michael Shaffer¹⁹, Arron Shiffer¹, Rashmi Sinha², Se Jin Song¹⁴, John R. Spear³⁹, Austin D. Swafford, Luke R. Thompson⁴⁰,⁴¹, Pedro J. Torres²⁹, Pauline Trinh²⁰, Anupriya Tripathi¹⁴, Peter J. Turnbaugh¹⁰, Sabah Ul-Hasan⁴², Justin J. J. van der Hooft⁴³, Fernando Vargas, Yoshiki Vázquez-Baeza¹⁴, Emily Vogtmann², Max von Hippel⁴⁴, William A. Walters³², Yunhu Wan², Mingxun Wang¹⁴, Jonathan Warren⁴⁵, Kyle C. Weber³⁷,⁴⁶, Charles H. D. Williamson¹, Amy D. Willis²⁰, Zhenjiang Zech Xu¹⁴, Jesse R. Zaneveld²⁰, Yilong Zhang⁴⁷, Qiyun Zhu¹⁴, Rob Knight¹⁴, J. Gregory Caporaso¹

Northern Arizona University¹, National Institutes of Health², University of Minnesota³, University of California, Davis⁴, Woods Hole Oceanographic Institution⁵, Massachusetts Institute of Technology⁶, University of Copenhagen⁷, University of Trento⁸, Chinese Academy of Sciences⁹, University of California, San Francisco¹⁰, University of Pennsylvania¹¹, Pacific Northwest National Laboratory¹², North Carolina State University¹³, University of California, San Diego¹⁴, Institute for Systems Biology¹⁵, Dalhousie University¹⁶, University of British Columbia¹⁷, Statens Serum Institut¹⁸, Anschutz Medical Campus¹⁹, University of Washington²⁰, Michigan State University²¹, Stanford University²², Broad Institute²³, Harvard University²⁴, Australian National University²⁵, University of Düsseldorf²⁶, University of New South Wales²⁷, Sookmyung Women's University²⁸, San Diego State University²⁹, Howard Hughes Medical Institute³⁰, Cornell University³¹, Max Planck Society³², Colorado State University³³, Google³⁴, Syracuse University³⁵, Webster University³⁶, United States Department of Agriculture³⁷, University of Arkansas for Medical Sciences³⁸, Colorado School of Mines³⁹, University of Southern Mississippi⁴⁰, National Oceanic and Atmospheric Administration⁴¹, University of California, Merced⁴², Wageningen University and Research Centre⁴³, University of Arizona⁴⁴, Environment Agency⁴⁵, University of Florida⁴⁶, Merck & Co.⁴⁷

01 Aug 2019-Nature Biotechnology

TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K.; partial support was also provided by NIH grants U54CA143925 and U54MD012388.

Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.

8,821 citations
