Author

Paul Theodor Pyl

Other affiliations: Science for Life Laboratory, European Bioinformatics Institute, University of Copenhagen

Bio: Paul Theodor Pyl is an academic researcher from Lund University. The author has contributed to research in topics: Python (programming language) & Proteogenomics. The author has an h-index of 6 and has co-authored 15 publications receiving 13,163 citations. Previous affiliations of Paul Theodor Pyl include Science for Life Laboratory & European Bioinformatics Institute.

Papers

Journal Article•DOI•

HTSeq—a Python framework to work with high-throughput sequencing data

Simon Anders, Paul Theodor Pyl, Wolfgang Huber

15 Jan 2015-Bioinformatics

TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and presents htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
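
To illustrate the idea of counting the overlap of reads with genes, here is a minimal sketch in plain Python. It does not use the HTSeq API. The function name `count_reads_per_gene`, the half-open interval convention, and the `no_feature` and `ambiguous` labels are assumptions made for this illustration, and the linear scan over genes stands in for the coordinate-indexed data structures the abstract mentions.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """Count reads that overlap exactly one gene (illustrative sketch).

    genes: dict mapping gene name -> (chrom, start, end), half-open [start, end)
    reads: iterable of (chrom, start, end) tuples, half-open [start, end)
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1  # hypothetical label: read hits no gene
        else:
            counts["ambiguous"] += 1   # hypothetical label: read hits several genes
    return dict(counts)


# Toy data: two overlapping genes on chr1 and four reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
reads = [
    ("chr1", 110, 140),  # inside geneA only
    ("chr1", 160, 190),  # inside both genes
    ("chr1", 250, 280),  # inside geneB only
    ("chr2", 10, 50),    # no gene on this chromosome
]
counts = count_reads_per_gene(genes, reads)
print(counts)
```

A per-gene count table of this kind is the usual input to count-based differential expression methods.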

15,744 citations

Journal Article•DOI•

Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer

Jakob Wirbel, Paul Theodor Pyl¹,², Ece Kartal³, Konrad Zych, Alireza Kashani¹, Alessio Milanese, Jonas S. Fleck, Anita Y. Voigt, Albert Pallejà¹, Ruby Ponnudurai, Shinichi Sunagawa⁴, Luis Pedro Coelho⁵, Petra Schrotz-King⁶, Emily Vogtmann, Nina Habermann, Emma Niméus², Andrew Maltez Thomas⁷,⁸, Paolo Manghi⁷, Sara Gandini⁹, Davide Serrano⁹, Sayaka Mizutani¹⁰,¹¹, Hirotsugu Shiroma¹⁰, Satoshi Shiba, Tatsuhiro Shibata¹², Shinichi Yachida¹³, Takuji Yamada¹⁰,¹⁴, Levi Waldron¹⁵, Alessio Naccarati, Nicola Segata⁷, Rashmi Sinha¹⁶, Cornelia M. Ulrich¹⁷, Hermann Brenner⁶, Manimozhiyan Arumugam¹,¹⁸, Peer Bork, Georg Zeller•Institutions (18)

University of Copenhagen¹, Lund University², Molecular Medicine Partnership Unit³, ETH Zurich⁴, Fudan University⁵, German Cancer Research Center⁶, University of Trento⁷, University of São Paulo⁸, European Institute of Oncology⁹, Tokyo Institute of Technology¹⁰, Japan Society for the Promotion of Science¹¹, University of Tokyo¹², Osaka University¹³, National Presto Industries¹⁴, City University of New York¹⁵, National Institutes of Health¹⁶, Huntsman Cancer Institute¹⁷, University of Southern Denmark¹⁸

01 Apr 2019-Nature Medicine

TL;DR: A meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer identified a core set of 29 species significantly enriched in CRC metagenomes, establishing globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics.

Abstract: Association studies have linked microbiome alterations with many human diseases. However, they have not always reported consistent results, thereby necessitating cross-study comparisons. Here, a meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer (CRC, n = 768), which was controlled for several confounders, identified a core set of 29 species significantly enriched in CRC metagenomes (false discovery rate (FDR) < 1 × 10⁻⁵). CRC signatures derived from single studies maintained their accuracy in other studies. By training on multiple studies, we improved detection accuracy and disease specificity for CRC. Functional analysis of CRC metagenomes revealed enriched protein and mucin catabolism genes and depleted carbohydrate degradation genes. Moreover, we inferred elevated production of secondary bile acids from CRC metagenomes, suggesting a metabolic link between cancer-associated gut microbes and a fat- and meat-rich diet. Through extensive validations, this meta-analysis firmly establishes globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics. Cross-study analysis defines fecal microbial species associated with colorectal cancer.

615 citations

Posted Content•DOI•

HTSeq - A Python framework to work with high-throughput sequencing data

Simon Anders, Paul Theodor Pyl, Wolfgang Huber

19 Aug 2014-bioRxiv

TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and presents htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq

433 citations

Journal Article•DOI•

Recovery of gut microbiota of healthy adults following antibiotic exposure

Albert Pallejà¹, Kristian Hallundbæk Mikkelsen¹, Sofia K. Forslund, Alireza Kashani¹, Kristine H. Allin¹,², Trine Nielsen¹, Tue H. Hansen¹, Suisha Liang, Qiang Feng, Chenchen Zhang, Paul Theodor Pyl¹, Luis Pedro Coelho, Huanming Yang, Jian Wang, Athanasios Typas, Morten Frost Munk Nielsen¹, Henrik Nielsen, Peer Bork, Jun Wang, Tina Vilsbøll¹, Torben Hansen¹,³, Filip K. Knop¹, Manimozhiyan Arumugam¹, Oluf Pedersen¹•Institutions (3)

University of Copenhagen¹, Frederiksberg Hospital², University of Southern Denmark³

22 Oct 2018-Nature Microbiology

TL;DR: It is shown that the human gut microbiome can recover after a clinically relevant, broad-spectrum antibiotic treatment and characterization of the resistome indicates that antibiotic resistance genes can impact the recovery process.

Abstract: To minimize the impact of antibiotics, gut microorganisms harbour and exchange antibiotics resistance genes, collectively called their resistome. Using shotgun sequencing-based metagenomics, we analysed the partial eradication and subsequent regrowth of the gut microbiota in 12 healthy men over a 6-month period following a 4-day intervention with a cocktail of 3 last-resort antibiotics: meropenem, gentamicin and vancomycin. Initial changes included blooms of enterobacteria and other pathobionts, such as Enterococcus faecalis and Fusobacterium nucleatum, and the depletion of Bifidobacterium species and butyrate producers. The gut microbiota of the subjects recovered to near-baseline composition within 1.5 months, although 9 common species, which were present in all subjects before the treatment, remained undetectable in most of the subjects after 180 days. Species that harbour β-lactam resistance genes were positively selected for during and after the intervention. Harbouring glycopeptide or aminoglycoside resistance genes increased the odds of de novo colonization, however, the former also decreased the odds of survival. Compositional changes under antibiotic intervention in vivo matched results from in vitro susceptibility tests. Despite a mild yet long-lasting imprint following antibiotics exposure, the gut microbiota of healthy young adults are resilient to a short-term broad-spectrum antibiotics intervention and their antibiotics resistance gene carriage modulates their recovery processes.

413 citations

Journal Article•DOI•

The Genomic and Transcriptomic Landscape of a HeLa Cell Line

Jonathan J M Landry, Paul Theodor Pyl, Tobias Rausch, Thomas Zichner, Manu M. Tekkedil, Adrian M. Stütz, Anna Jauch¹, Raeka S. Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O. Korbel, Wolfgang Huber, Lars M. Steinmetz•Institutions (1)

University Hospital Heidelberg¹

01 Aug 2013-G3: Genes, Genomes, Genetics

TL;DR: This study performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile, providing the first detailed account of genomic variants in the HeLa genome.

Abstract: HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.

403 citations

Cited by

Journal Article•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹,², Wolfgang Huber, Simon Anders•Institutions (2)

Harvard University¹, Max Planck Society²

05 Dec 2014-Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal Article•DOI•

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹,⁵•Institutions (5)

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Posted Content•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹, Wolfgang Huber, Simon Anders•Institutions (1)

Harvard University¹

17 Nov 2014-bioRxiv

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

Journal Article•DOI•

Near-optimal probabilistic RNA-seq quantification

Nicolas Bray¹, Harold Pimentel¹, Páll Melsted², Lior Pachter¹•Institutions (2)

University of California, Berkeley¹, University of Iceland²

01 May 2016-Nature Biotechnology

TL;DR: Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases, which removes a major computational bottleneck in RNA-seq analysis.

Abstract: We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
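
The pseudoalignment idea described above, finding the transcripts compatible with a read without aligning individual bases, can be sketched as set intersection over k-mers. This is a toy illustration only. It assumes a plain dictionary from each k-mer to the transcripts containing it, and a rule that any unseen k-mer yields an empty result. The names `build_kmer_index` and `compatible_transcripts` are hypothetical, and the sketch makes no claim about kallisto's actual data structures or speed.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all k-mers in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            # Assumed toy rule: an unseen k-mer means no compatible transcript.
            return set()
        compatible = set(hits) if compatible is None else compatible & hits
    # A read shorter than k has no k-mers, so nothing is compatible.
    return compatible or set()


# Two toy transcripts that share a prefix and then diverge.
transcripts = {"tx1": "ACGTACGGA", "tx2": "ACGTACCTT"}
k = 4
index = build_kmer_index(transcripts, k)

shared = compatible_transcripts("ACGTAC", index, k)   # lies in the shared prefix
unique = compatible_transcripts("GTACGG", index, k)   # extends into tx1 only
absent = compatible_transcripts("TTTTTT", index, k)   # k-mers not in the index
print(shared, unique, absent)
```

The result for each read is a set of transcript names rather than a base-level alignment, which is the kind of compatibility list the abstract describes.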

6,468 citations

Journal Article•DOI•

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Charlotte Soneson¹,², Michael I. Love³, Mark D. Robinson¹,²•Institutions (3)

Swiss Institute of Bioinformatics¹, University of Zurich², Harvard University³

30 Dec 2015-F1000Research

TL;DR: It is illustrated that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets.

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

2,420 citations
