HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
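The idea of counting the overlap of reads with genes can be illustrated with a minimal, plain-Python sketch. It deliberately does not use the HTSeq API and is not the htseq-count implementation: the `Interval` tuple, the `overlaps` and `count_reads` helpers, the `no_feature` and `ambiguous` labels, and the rule for reads that touch several genes are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a genomic interval: 0-based, half-open [start, end).
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def overlaps(a, b):
    """Return True if two intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(genes, reads):
    """Count reads per gene.

    genes: dict mapping gene name -> list of exon Intervals
    reads: iterable of aligned-read Intervals
    Assumed policy: a read is counted for a gene only if it overlaps exons of
    exactly one gene; other reads go to 'no_feature' or 'ambiguous'.
    """
    counts = {name: 0 for name in genes}
    counts["no_feature"] = 0
    counts["ambiguous"] = 0
    for read in reads:
        hits = {
            name
            for name, exons in genes.items()
            if any(overlaps(read, exon) for exon in exons)
        }
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


# Toy data: two genes whose exons overlap between positions 180 and 200.
genes = {
    "geneA": [Interval("chr1", 100, 200)],
    "geneB": [Interval("chr1", 180, 300)],
}
reads = [
    Interval("chr1", 110, 150),  # inside geneA only
    Interval("chr1", 170, 190),  # touches geneA and geneB
    Interval("chr1", 250, 290),  # inside geneB only
    Interval("chr2", 100, 150),  # different chromosome, no gene
]
counts = count_reads(genes, reads)
print(counts)
```

This sketch scans every gene for every read, which is only reasonable for toy inputs. The data structures for querying via genomic coordinates that the abstract mentions are what make such lookups practical on real data.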
Citations
Journal Article
Transcriptional regulation of nitrogen-associated metabolism and growth
Allison Gaudinier, Joel Rodriguez-Medina, Lifang Zhang, Andrew Olson, Christophe Liseron-Monfils, Anne-Maarit Bågman, Jessica Foret, Shane E. Abbitt, Michelle Tang, Baohua Li, Daniel E. Runcie, Daniel J. Kliebenstein, Bo Shen, Mary J. Frank, Doreen Ware, Siobhan M. Brady
TL;DR: The yeast one-hybrid network for nitrogen-associated metabolism in Arabidopsis reveals the transcription factors that regulate the architecture of root and shoot systems under conditions of changing nitrogen availability.
Journal Article
ENL links histone acetylation to oncogenic gene expression in acute myeloid leukaemia
Liling Wan, Hong Wen, Yuanyuan Li, Jie Lyu, Yuanxin Xi, Takayuki Hoshii, Julia K. Joseph, Xiaolu Wang, Yong-Hwee E. Loh, Michael A. Erb, Amanda Souza, James E. Bradner, Li Shen, Wei Li, Haitao Li, C. David Allis, Scott A. Armstrong, Xiaobing Shi
TL;DR: The YEATS domain-containing protein ENL is identified as a histone acetylation reader that regulates oncogenic transcriptional programs in acute myeloid leukaemia, and displacement of ENL from chromatin may be a promising epigenetic therapy, alone or in combination with BET inhibitors, for aggressive leukaemia.
Journal Article
RNA 5-Methylcytosine Facilitates the Maternal-to-Zygotic Transition by Preventing Maternal mRNA Decay.
Ying Yang, Lu Wang, Xiao Han, Wen-Lan Yang, Mengmeng Zhang, Hai-Li Ma, Bao-Fa Sun, Ang Li, Jun Xia, Jing Chen, Jian Heng, Baixing Wu, Yu-Sheng Chen, Jia-Wei Xu, Xin Yang, Huan Yao, Jiawei Sun, Cong Lyu, Hailin Wang, Ying Huang, Ying-Pu Sun, Yongliang Zhao, Anming Meng, Jinbiao Ma, Feng Liu, Yun-Gui Yang
TL;DR: This study discovered that Y-box binding protein 1 (Ybx1) preferentially recognizes m5C-modified mRNAs through π-π interactions with a key residue, Trp45, in Ybx1's cold shock domain (CSD), which plays essential roles in maternal mRNA stability and early embryogenesis of zebrafish.
Journal Article
A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells
Ailone Tichon, Noa Gil, Yoav Lubelsky, Tal Havkin Solomon, Doron Lemze, Shalev Itzkovitz, Noam Stern-Ginossar, Igor Ulitsky
TL;DR: It is shown that most of the sequence of NORAD is comprised of repetitive units that together contain at least 17 functional binding sites for the two mammalian Pumilio homologues, suggesting that some cytoplasmic lncRNAs function by modulating the activities of RNA-binding proteins, an activity which positions them at key junctions of cellular signalling pathways.
Journal Article
The Arabidopsis CERK1‐associated kinase PBL27 connects chitin perception to MAPK activation
Kenta Yamada, Koji Yamaguchi, Tomomi Shirakawa, Hirofumi Nakagami, Akira Mine, Kazuya Ishikawa, Masayuki Fujiwara, Mari Narusaka, Yoshihiro Narusaka, Kazuya Ichimura, Yuka Kobayashi, Hidenori Matsui, Yuko Nomura, Mika Nomoto, Yasuomi Tada, Yoichiro Fukao, Tamo Fukamizo, Kenichi Tsuda, Ken Shirasu, Naoto Shibuya, Tsutomu Kawasaki
TL;DR: A complete phospho‐signaling transduction pathway from PRR‐mediated pathogen recognition to MAPK activation in plants is identified, and genetic evidence suggests that phosphorylation of MAPKKK5 by PBL27 is essential for chitin‐induced MAPK activation in plants.
References
Journal Article
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as an indexer, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Journal Article
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.