HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls. It also provides data structures that can be queried by genomic coordinates. In addition, we present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
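The idea of "counting the overlap of reads with genes" can be illustrated with a minimal sketch in plain Python. This is not HTSeq's code or API: the toy annotation, the read coordinates, the function names and the `no_feature` / `ambiguous` labels are all assumptions made for illustration, and a simple linear scan stands in for the coordinate-indexed data structures the abstract mentions.

```python
from collections import Counter

# Hypothetical toy annotation: (chromosome, start, end, gene_id).
# Coordinates are 0-based and half-open, i.e. [start, end).
GENES = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 300, "geneB"),
    ("chr2", 0, 50, "geneC"),
]

# Hypothetical aligned reads: (chromosome, start, end).
READS = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 10, 40),    # overlaps geneC only
    ("chr2", 60, 90),    # overlaps no gene
]


def genes_overlapping(chrom, start, end, genes):
    """Return the ids of all genes sharing at least one base with [start, end)."""
    return {
        gene_id
        for g_chrom, g_start, g_end, gene_id in genes
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_reads(reads, genes):
    """Assign each read to a gene only if it overlaps exactly one gene."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = genes_overlapping(chrom, start, end, genes)
        if not hits:
            counts["no_feature"] += 1   # illustrative label for unassigned reads
        elif len(hits) > 1:
            counts["ambiguous"] += 1    # illustrative label for multi-gene reads
        else:
            counts[hits.pop()] += 1
    return counts


counts = count_reads(READS, GENES)
for name in sorted(counts):
    print(name, counts[name])
```

Counting a read only when it maps to exactly one gene keeps the per-gene totals conservative, which is one reasonable choice when the counts feed a differential expression analysis. A real script would replace the linear scan with an indexed lookup so that each read is not compared against every gene.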