HTSeq—a Python framework to work with high-throughput sequencing data
TL;DR
This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
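The idea of counting the overlap of reads with genes can be illustrated with a minimal, pure-Python sketch. It does not use HTSeq and is not the htseq-count implementation: the toy annotation, the function names, the half-open coordinate convention, and the `__no_feature` / `__ambiguous` labels are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# using 0-based, half-open intervals [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of genes whose interval overlaps [start, end) on chrom."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene.

    A read is assigned to a gene only if it overlaps exactly one gene.
    Reads overlapping no gene or more than one gene are tallied under
    placeholder labels instead of being assigned.
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical aligned reads as (chromosome, start, end) tuples.
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 10, 40),    # overlaps geneC only
    ("chr3", 5, 30),     # overlaps nothing
]
counts = count_reads(reads)
```

The linear scan over all genes keeps the sketch short. A real tool would typically use an indexed structure queried by genomic coordinates, which is the kind of data structure the abstract says HTSeq provides.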
Citations
Journal Article
Root transcriptional dynamics induced by beneficial rhizobacteria and microbial immune elicitors reveal signatures of adaptation to mutualists.
Ioannis A. Stringlis, Silvia Proietti, Richard Hickman, Marcel C. Van Verk, Christos Zamioudis, Corné M. J. Pieterse
TL;DR: Using auxin response mutant tir1afb2afb3, a dual role for auxin signaling in finely balancing growth‐promoting and defense‐eliciting activities of beneficial microbes in plant roots is demonstrated.
Journal Article
Lineage-Determining Transcription Factor TCF-1 Initiates the Epigenetic Identity of T Cells
John L. Johnson, Georgios Georgakilas, Jelena Petrovic, Makoto Kurachi, Stanley Cai, Christelle Harly, Warren S. Pear, Avinash Bhandoola, E. John Wherry, Golnaz Vahedi
TL;DR: The results indicate that a mechanism by which TCF‐1 controls T cell fate is through its widespread ability to target silent chromatin and establish the epigenetic identity of T cells.
Journal Article
A genomic and epigenomic atlas of prostate cancer in Asian populations.
Jing Li, Chuanliang Xu, Hyung Joo Lee, Shancheng Ren, Xiaoyuan Zi, Zhiming Zhang, Haifeng Wang, Yongwei Yu, Chenghua Yang, Xiaofeng Gao, Jianguo Hou, Linhui Wang, Bo Yang, Qing Yang, Huamao Ye, Tie Zhou, Xin Lu, Yan Wang, Min Qu, Qingsong Yang, Wenhui Zhang, Nakul M. Shah, Erica C. Pehrsson, Shuo Wang, Zengjun Wang, Jun Jiang, Yan Zhu, Rui Chen, Huan Chen, Feng Zhu, Bijun Lian, Xiaoyun Li, Yun Zhang, Chao Wang, Yue Wang, Guangan Xiao, Junfeng Jiang, Yue Yang, Chaozhao Liang, Jian-quan Hou, Conghui Han, Ming Chen, Ning Jiang, Dahong Zhang, Song Wu, Jinjian Yang, Tao Wang, Yongliang Chen, Jiantong Cai, Wenzeng Yang, Jun Xu, Shaogang Wang, Xu Gao, Ting Wang, Yinghao Sun
TL;DR: Genomic, transcriptomic and DNA methylation data from tissue samples from 208 Chinese patients with prostate cancer define the landscape of alterations in this population, and comparison with data from Western cohorts suggests that the disease may stratify into different molecular subtypes.
Journal Article
Analysis of non-coding transcriptome in rice and maize uncovers roles of conserved lncRNAs associated with agriculture traits
TL;DR: In this paper, the authors performed both non-directional and strand-specific RNA-sequencing experiments to profile non-coding transcriptomes of various rice and maize organs at different developmental stages.
Journal Article
XACT Noncoding RNA Competes with XIST in the Control of X Chromosome Activity during Human Early Development
Céline Vallot, Catherine Patrat, Amanda J. Collier, Christophe Huret, Miguel Casanova, Tharvesh Moideen Liyakat Ali, Matteo Tosolini, N. Frydman, Edith Heard, Peter J. Rugg-Gunn, Claire Rougeulle
TL;DR: A mechanism involving antagonistic activity of XIST and XACT in controlling X chromosome activity in early human embryos is suggested, and the contribution of rapidly evolving lncRNAs to species-specific developmental mechanisms is highlighted.