HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
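The read-counting idea behind htseq-count can be illustrated with a minimal sketch in plain Python. This is not HTSeq's API or htseq-count's actual implementation: the `Interval` class, the `count_reads` function and the toy coordinates are hypothetical, and the overlap rule is a simplification in which a read is counted for a gene only if it overlaps exactly one gene. The `__no_feature` and `__ambiguous` labels are borrowed from the special counters that htseq-count reports.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start, end-exclusive."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(genes: dict, reads: list) -> Counter:
    """Count reads per gene; a read is assigned only if it hits exactly one gene."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = {name for name, iv in genes.items() if overlaps(iv, read)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1   # read overlaps no gene
        else:
            counts["__ambiguous"] += 1    # read overlaps several genes
    return counts


# Toy data (assumed for illustration only).
genes = {
    "geneA": Interval("chr1", 100, 200),
    "geneB": Interval("chr1", 180, 300),
}
reads = [
    Interval("chr1", 110, 160),  # inside geneA only
    Interval("chr1", 190, 240),  # overlaps geneA and geneB
    Interval("chr1", 400, 450),  # overlaps nothing
    Interval("chr1", 250, 290),  # inside geneB only
]
counts = count_reads(genes, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan over all genes for every read keeps the sketch short; a real tool would use a data structure indexed by genomic coordinates, which is the kind of structure the abstract describes HTSeq as providing.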
Citations
Journal Article
A comprehensive analysis of breast cancer microbiota and host gene expression.
Kevin J. Thompson, James N. Ingle, Xiaojia Tang, Nicholas Chia, Patricio Jeraldo, Marina Walther-Antonio, Karunya K. Kandimalla, Stephen Johnson, Janet Yao, Sean C. Harrington, Vera J. Suman, Liewei Wang, Richard L. Weinshilboum, Judy C. Boughey, Jean Pierre A. Kocher, Heidi Nelson, Matthew P. Goetz, Krishna R. Kalari
TL;DR: The objective was to characterize the microbiota of 668 breast tumor tissues and 72 non-cancerous adjacent tissues and to associate it with the tumor expression profiles; further unraveling this complicated interplay should enable better diagnosis and treatment of breast cancer patients.
Journal Article
Sexual Dimorphism and the Evolution of Sex-Biased Gene Expression in the Brown Alga Ectocarpus
Agnieszka P. Lipinska, Alexandre Cormier, Remy Luthringer, Akira F. Peters, Erwan Corre, Claire M. M. Gachon, J. Mark Cock, Susana M. Coelho
TL;DR: Gene duplication appears to have played a significant role in the generation of sex-biased genes in Ectocarpus, expanding previous models that propose this mechanism for the resolution of sexual antagonism in diploid systems.
Journal Article
Dorsal tegmental dopamine neurons gate associative learning of fear
Florian Groessl, Thomas Munsch, Susanne Meis, Johannes Griessner, Joanna Kaczanowska, Pinelopi Pliota, Dominic Kargl, Sylvia Badurek, Klaus Kraitsy, Arash Rassoulpour, Johannes Zuber, Volkmar Lessmann, Wulf Haubensak
TL;DR: This work identifies a circuit that reciprocally connects the ventral periaqueductal gray and dorsal raphe region with the central amygdala, encodes a positive prediction error in response to unpredicted shocks, and may reshape intra-amygdala connectivity via a dopamine-dependent form of long-term potentiation.
Journal Article
Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
Galen F. Gao, Joel S. Parker, Sheila Reynolds, Tiago C. Silva, Liang-Bo Wang, Wanding Zhou, Rehan Akbani, Matthew H. Bailey, Saianand Balu, Benjamin P. Berman, Denise Brooks, Hu Chen, Andrew D. Cherniack, John A. Demchok, Li Ding, Ina Felau, Sharon Gaheen, Daniela S. Gerhard, David I. Heiman, Kyle M. Hernandez, Katherine A. Hoadley, Reyka G Jayasinghe, Anab Kemal, Theo A. Knijnenburg, Peter W. Laird, Michael K A Mensah, Andrew J. Mungall, A. Gordon Robertson, Hui Shen, Roy Tarnuzzer, Zhining Wang, Matthew A. Wyczalkowski, Liming Yang, Jean C. Zenklusen, Z. Zhang, Han Liang, Michael S. Noble
TL;DR: The results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.
Journal Article
Determinants of RNA metabolism in the Schizosaccharomyces pombe genome
Philipp Eser, Leonhard Wachutka, Kerstin C. Maier, Carina Demel, Mariana Boroni, Srignanakshi Iyer, Patrick Cramer, Julien Gagneur
TL;DR: The approach reveals distinct kinetics of mRNA and ncRNA metabolism, separates antisense regulation by transcription interference from RNA interference, and provides a general tool for studying the regulatory code of genomes.