HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
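The idea of counting the overlap of reads with genes can be illustrated with a minimal sketch in plain Python. This is not HTSeq's API and not the htseq-count implementation: the toy annotation `GENES`, the helper names `overlapping_genes` and `count_reads`, the half-open coordinate convention and the two special bins are all assumptions made for illustration. A read is credited to a gene only if it overlaps exactly one gene; otherwise it goes to a "no feature" or "ambiguous" bin.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end).
# Intervals are assumed to be half-open, i.e. [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the names of all genes overlapping [start, end) on chrom."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting no gene or several genes
    are tallied in illustrative special bins instead."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical aligned reads as (chromosome, start, end) tuples.
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 90),    # overlaps no gene
]
counts = count_reads(reads)
for name in sorted(counts):
    print(name, counts[name])
```

The linear scan over all genes keeps the sketch short. A real tool would use an index keyed by genomic coordinates, which is the kind of data structure the abstract says HTSeq provides.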
Citations
Ribonuclease selection for ribosome profiling.
TL;DR: This study provides a guide for ribonuclease selection in ribosome profiling experiments across most common model systems, and shows that they yield comparable estimations of gene expression when ribosomal integrity is not compromised.
Oncogenic Kras drives invasion and maintains metastases in colorectal cancer
Adam T. Boutin, Wen Ting Liao, Melody Wang, Soyoon Sarah Hwang, Tatiana Karpinets, Hannah Cheung, Gerald C. Chu, Shan Jiang, Jian Hu, Kyle Chang, Eduardo Vilar, Xingzhi Song, Jianhua Zhang, Scott Kopetz, Andrew Futreal, Y. Alan Wang, Lawrence N. Kwong, Ronald A. DePinho
TL;DR: A mouse model of metastatic CRC that harbors an inducible oncogenic Kras allele (Krasmut) and conditional null alleles of Apc and Trp53 (iKAP) provides genetic evidence that Krasmut drives CRC invasion and maintenance of metastases.
Expression of SARS-CoV-2 Entry Molecules ACE2 and TMPRSS2 in the Gut of Patients With IBD.
Juan Burgueño, Adrian Reich, Hajar Hazime, Maria A. Quintero, Irina Fernandez, Julia Fritsch, Ana M. Santander, Nivis Brito, Oriana M. Damas, Amar R. Deshpande, David H. Kerman, Lanyu Zhang, Zhen Gao, Yuguang Ban, Lily Wang, Judith Pignac-Kobinger, Maria T. Abreu
TL;DR: The viral entry molecules ACE2 and TMPRSS2 are expressed in the ileum and colon and had high expression in intestinal epithelial cells in animal models of IBD, providing reassurance for patients with IBD.
Lifestyle and horizontal gene transfer-mediated evolution of Mucispirillum schaedleri, a core member of the murine gut microbiota
Alexander Loy, Carina Pfann, Michaela Steinberger, Buck Hanson, Simone Herp, Sandrine Brugiroux, João Carlos Gomes Neto, Mark V. Boekschoten, Clarissa Schwab, Tim Urich, Amanda E. Ramer-Tait, Thomas Rattei, Bärbel Stecher, David Berry
TL;DR: The gut bacterium Mucispirillum schaedleri, which is associated with inflammation in widely used mouse models, is found to have specialized systems to handle oxidative stress during inflammation, suggesting that M. schaedleri undergoes intimate interactions with its host and may play a role in inflammation.
Mesenchymal Niche-Specific Expression of Cxcl12 Controls Quiescence of Treatment-Resistant Leukemia Stem Cells.
Puneet Agarwal, Stephan Isringhausen, Hui Li, Andrew J. Paterson, Jianbo He, Alvaro Gomariz, Takashi Nagasawa, César Nombela-Arrieta, Ravi Bhatia
TL;DR: It is shown that targeted deletion of Cxcl12 from mesenchymal stromal cells (MSCs) reduces normal HSC numbers but promotes LSC expansion by increasing self-renewing cell divisions, possibly through enhanced Ezh2 activity.