HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool built with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A wide range of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library that facilitates the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and data structures that allow querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public License and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
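The two central ideas in the abstract, a data structure that can be queried by genomic coordinates and counting the overlap of reads with genes, can be illustrated with a small pure-Python sketch. It does not use HTSeq's API: the names `StepSetArray` and `count_union` are hypothetical, and the sketch only approximates the idea behind HTSeq's `GenomicArrayOfSets` (a step-wise array whose values are sets of feature IDs). The counting rule is loosely modeled on htseq-count's default "union" mode: the gene sets hit by all aligned blocks of a read are merged; a read is assigned to a gene only if exactly one gene is hit, otherwise it goes to the `__no_feature` or `__ambiguous` counters. Coordinates are assumed to be 0-based and half-open, and strand is ignored for simplicity.

```python
from bisect import bisect_right
from collections import Counter


class StepSetArray:
    """Hypothetical step-wise array for ONE chromosome: each step [bounds[i], bounds[i+1])
    maps to the set of gene IDs covering it (an approximation of a GenomicArrayOfSets)."""

    def __init__(self):
        self.intervals = []  # (start, end, gene_id), 0-based half-open [start, end)
        self._built = False

    def add(self, start, end, gene_id):
        self.intervals.append((start, end, gene_id))
        self._built = False  # steps must be recomputed after any change

    def _build(self):
        # Every interval start/end becomes a step boundary; each interval then
        # either covers a whole step or does not touch it at all.
        self.bounds = sorted({p for s, e, _ in self.intervals for p in (s, e)})
        self.sets = [
            frozenset(g for s, e, g in self.intervals if s <= left and right <= e)
            for left, right in zip(self.bounds, self.bounds[1:])
        ]
        self._built = True

    def genes_in(self, start, end):
        """Return the union of gene sets over all steps overlapping [start, end)."""
        if not self._built:
            self._build()
        found = set()
        # Start at the last step whose left boundary is <= start.
        i = max(bisect_right(self.bounds, start) - 1, 0)
        while i < len(self.sets) and self.bounds[i] < end:
            if self.bounds[i + 1] > start:
                found |= self.sets[i]
            i += 1
        return found


def count_union(gene_models, reads):
    """Count reads per gene with a union-style rule.

    gene_models: dict chrom -> StepSetArray
    reads: iterable of (chrom, [(start, end), ...]); a spliced read has several blocks.
    """
    counts = Counter()
    for chrom, blocks in reads:
        array = gene_models.get(chrom)
        genes = set()
        if array is not None:
            for start, end in blocks:
                genes |= array.genes_in(start, end)
        if not genes:
            counts["__no_feature"] += 1
        elif len(genes) == 1:
            counts[next(iter(genes))] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Toy gene model (invented for illustration): geneA has two exons, geneB overlaps geneA.
chr1 = StepSetArray()
chr1.add(100, 200, "geneA")
chr1.add(150, 300, "geneB")
chr1.add(500, 600, "geneA")
models = {"chr1": chr1}

reads = [
    ("chr1", [(110, 140)]),              # geneA only
    ("chr1", [(160, 190)]),              # geneA and geneB -> __ambiguous
    ("chr1", [(120, 140), (510, 540)]),  # spliced read, both blocks in geneA
    ("chr1", [(350, 380)]),              # intergenic -> __no_feature
    ("chr2", [(10, 40)]),                # chromosome with no genes -> __no_feature
    ("chr1", [(250, 280)]),              # geneB only
]
counts = count_union(models, reads)
# counts: geneA=2, geneB=1, __ambiguous=1, __no_feature=2
```

Precomputing steps trades a slower build for cheap lookups, which suits the typical workload of loading a gene model once and then querying it for millions of reads; a production tool would also need strand handling and an efficient build.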