HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
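To make "counting the overlap of reads with genes" concrete, here is a minimal pure-Python sketch. It does not use the HTSeq API and is not htseq-count's implementation: the toy annotation, the read coordinates, the function names and the `__no_feature` / `__ambiguous` labels are all assumptions chosen for illustration. A read is assigned to a gene only if it overlaps exactly one gene; otherwise it is tallied as unassigned or ambiguous.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# using 0-based, half-open intervals. Not taken from any real genome.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}


def overlapping_genes(chrom, start, end, genes):
    """Return the set of gene names whose interval overlaps [start, end)."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes):
    """Count reads per gene; reads hitting zero or several genes are tallied separately."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1   # illustrative label
        else:
            counts["__ambiguous"] += 1    # illustrative label
    return counts


# Hypothetical aligned reads as (chromosome, start, end).
READS = [
    ("chr1", 110, 160),  # inside geneA only
    ("chr1", 170, 190),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # inside geneB only
    ("chr2", 10, 40),    # overlaps no gene
    ("chr2", 140, 180),  # overlaps the end of geneC
]

counts = count_reads(READS, GENES)
for name in sorted(counts):
    print(name, counts[name])
```

The linear scan over all genes keeps the sketch short; the abstract's "data structures that allow for querying via genomic coordinates" serve the same lookup, so a real script would query such an index instead of looping over every gene for each read.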
Citations
Journal ArticleDOI
An asthma-associated IL4R variant exacerbates airway inflammation by promoting conversion of regulatory T cells to TH17-like cells
Amir Massoud, Louis-Marie Charbonnier, David Lopez, Matteo Pellegrini, Wanda Phipatanakul, Talal A. Chatila
TL;DR: A severe asthma-associated polymorphism in the gene encoding the interleukin (IL)-4 receptor alpha chain (Il4raR576) promotes conversion of induced Treg cells toward a T helper 17 (TH17) cell fate, identifying a previously unknown mechanism for the development of mixed TH2–TH17 cell inflammation in genetically prone individuals.
Journal ArticleDOI
Transcriptional and proteomic insights into the host response in fatal COVID-19 cases.
Meng Wu, Yaobing Chen, Han Xia, Changli Wang, Chin Yee Tan, Xunhui Cai, Yufeng Liu, Fenghu Ji, Peng Xiong, Ran Liu, Yuanlin Guan, Yaqi Duan, Dong Kuang, Sanpeng Xu, Hanghang Cai, Qin Xia, Dehua Yang, Ming-Wei Wang, Isaac M. Chiu, Chao Cheng, Philip P. Ahern, Liang Liu, Guoping Wang, Neeraj K. Surana, Tian Xia, Dennis L. Kasper
TL;DR: It is shown that pathways related to neutrophil activation and pulmonary fibrosis are among the major up-regulated transcriptional signatures in lung tissue obtained from patients who died of COVID-19 in Wuhan, China, which suggests that the patient deaths may be related to the host response rather than an active fulminant infection.
Journal ArticleDOI
Distinct amyloid-β and tau-associated microglia profiles in Alzheimer’s disease
Emma Gerrits, Nieske Brouwer, Susanne M. Kooistra, Maya E. Woodbury, Yannick Vermeiren, Mirjam Lambourne, Jan Mulder, Markus P. Kummer, Thomas Möller, Knut Biber, Wilfred F. A. den Dunnen, Peter Paul De Deyn, Bart J. L. Eggen, Erik Boddeke
TL;DR: In this paper, the authors performed snRNAseq on 482,472 nuclei from non-demented control and Alzheimer's disease (AD) brains containing only amyloid-β plaques or both amyloid-β and tau pathology.
Journal ArticleDOI
Integrated multi-omics framework of the plant response to jasmonic acid
Mark Zander, Mathew G. Lewsey, Natalie M. Clark, Lingling Yin, Anna Bartlett, J. Paola Saldierna Guzmán, Elizabeth Hann, Amber E. Langford, Bruce Jow, Aaron Wise, Joseph R. Nery, Huaming Chen, Ziv Bar-Joseph, Justin W. Walley, Roberto Solano, Joseph R. Ecker
TL;DR: This work investigated the signalling pathway of the hormone jasmonic acid (JA), which controls a plethora of critically important processes in plants and is orchestrated by the transcription factor MYC2 and its closest relatives in Arabidopsis thaliana, and generated an integrated framework of the response to JA.
Journal ArticleDOI
The non-coding RNA landscape of human hematopoiesis and leukemia
Adrian Schwarzer, Stephan Emmrich, Franziska Schmidt, Dominik Beck, Michelle Ng, Christina Reimer, Felix F. Adams, Sarah Grasedieck, Damian Witte, Sebastian Käbler, Jason W. H. Wong, Anushi Shah, Yizhou Huang, Razan Jammal, Aliaksandra Maroz, Mojca Jongen-Lavrencic, Axel Schambach, Florian Kuchenbauer, John E. Pimanda, Dirk Reinhardt, Dirk Heckl, Jan-Henning Klusmann
TL;DR: A comprehensive resource defining the non-coding RNA landscape of the human hematopoietic system, identifying unique fingerprint non-coding RNAs, such as LINC00173 in granulocytes, and assigning these to critical regulatory circuits involved in blood homeostasis.
References
Journal ArticleDOI
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal ArticleDOI
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Journal ArticleDOI
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Journal ArticleDOI
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal ArticleDOI
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.