HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
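The idea of counting the overlap of reads with genes can be illustrated with a small standalone sketch. It deliberately does not use the HTSeq API; the gene table, the half-open 0-based coordinates, the function names and the rule that a read overlapping more than one gene is set aside as ambiguous are all assumptions made for illustration, not a description of how htseq-count is implemented.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end).
# Coordinates are assumed to be 0-based and half-open, i.e. [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of gene names whose interval overlaps [start, end) on chrom."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting zero or several genes are tallied separately."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1   # illustrative label
        else:
            counts["__ambiguous"] += 1    # illustrative label
    return counts


# Hypothetical aligned reads as (chromosome, start, end).
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 90),    # overlaps no gene
    ("chr2", 10, 40),    # overlaps geneC only
]
counts = count_reads(reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan over all genes keeps the sketch short; for real data, an interval-indexed structure that can be queried by genomic coordinates, of the kind the abstract describes, would avoid checking every gene for every read.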
Citations
Journal Article
A census of human RNA-binding proteins.
TL;DR: This work presents a census of 1,542 manually curated RBPs that are analysed for their interactions with different classes of RNA, their evolutionary conservation, their abundance and their tissue-specific expression, a critical step towards the comprehensive characterization of proteins involved in human RNA metabolism.
Journal Article
Altering the Intestinal Microbiota during a Critical Developmental Window Has Lasting Metabolic Consequences
Laura M. Cox, Shingo Yamanishi, Jiho Sohn, Alexander V. Alekseyenko, Jacqueline M. Leung, Ilseung Cho, Sungheon Kim, Huilin Li, Zhan Gao, Douglas Mahana, Jorge G. Zárate Rodriguez, Arlin B. Rogers, Nicolas Robine, P'ng Loke, Martin J. Blaser
TL;DR: It is shown that low-dose penicillin (LDP), delivered from birth, induces metabolic alterations and affects ileal expression of genes involved in immunity, indicating that microbiota interactions in infancy may be critical determinants of long-term host metabolic effects.
Journal Article
The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads.
TL;DR: Rsubread is presented, a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads that integrates read mapping and quantification in a single package and has no software dependencies other than R itself.
Journal Article
Recognition of RNA N6-methyladenosine by IGF2BP proteins enhances mRNA stability and translation
Huilin Huang, Hengyou Weng, Wen-Ju Sun, Xi Qin, Hailing Shi, Huizhe Wu, Boxuan Simen Zhao, Ana Mesquita, Chang Liu, Celvie L. Yuan, Yueh-Chiang Hu, Stefan Hüttelmaier, Jennifer R. Skibbe, Rui Su, Xiaolan Deng, Lei Dong, Miao Sun, Chenying Li, Sigrid Nachtergaele, Yungui Wang, Chao Hu, Kyle Ferchen, Kenneth D. Greis, Xi Jiang, Minjie Wei, Liang-Hu Qu, Jun-Lin Guan, Chuan He, Jian-Hua Yang, Jianjun Chen
TL;DR: This work reports the insulin-like growth factor 2 mRNA-binding proteins (IGF2BPs) as a distinct family of N6-methyladenosine (m6A) reader proteins that target thousands of mRNA transcripts by recognizing the consensus GG(m6A)C sequence.
Journal Article
Single-cell reconstruction of the early maternal–fetal interface in humans
Roser Vento-Tormo, Mirjana Efremova, Rachel A. Botting, Margherita Y. Turco, Miquel Vento-Tormo, Kerstin B. Meyer, Jong-Eun Park, Emily Stephenson, Krzysztof Polanski, Angela Goncalves, Lucy Gardner, Staffan Holmqvist, Johan Henriksson, Angela Zou, Andrew M. Sharkey, Ben Millar, Barbara A. Innes, Laura Wood, Anna Wilbrey-Clark, Rebecca Payne, Martin A. Ivarsson, Steve Lisgo, Andrew Filby, David H. Rowitch, Judith N. Bulmer, Gavin J. Wright, Michael J. T. Stubbington, Muzlifah Haniffa, Ashley Moffett, Sarah A. Teichmann
TL;DR: A single-cell atlas of the maternal–fetal interface reveals the cellular organization of the decidua and placenta and the interactions that are critical for placentation and reproductive success; the work also develops a repository of ligand–receptor complexes and a statistical tool to predict cell–cell communication via these molecular interactions.
References
Journal Article
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Journal Article
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: This work presents a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.