HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
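The idea of counting the overlap of reads with genes can be illustrated with a minimal pure-Python sketch. It does not use HTSeq and is not htseq-count's implementation: the `Interval` type, the `count_reads` function, the example coordinates and the `__no_feature` / `__ambiguous` labels are assumptions made for illustration, and a simple linear scan stands in for a data structure that can be queried by genomic coordinates. A read is counted for a gene only if it overlaps exactly one gene.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; start is inclusive, end is exclusive (0-based)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(genes: Dict[str, Interval], reads: List[Interval]) -> Counter:
    """Count reads per gene; reads hitting zero or several genes are set aside."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        # Linear scan over all genes (a simplification for illustration).
        hits = [name for name, iv in genes.items() if overlaps(read, iv)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical toy annotation and aligned reads.
genes = {
    "geneA": Interval("chr1", 100, 200),
    "geneB": Interval("chr1", 180, 300),
}
reads = [
    Interval("chr1", 110, 150),  # overlaps geneA only
    Interval("chr1", 190, 220),  # overlaps geneA and geneB
    Interval("chr1", 400, 450),  # overlaps no gene
    Interval("chr2", 110, 150),  # different chromosome, overlaps no gene
]
counts = count_reads(genes, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The resulting per-gene table is the kind of input that differential expression tools expect. For real data, coordinate-indexed structures avoid scanning every gene for every read.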
Citations
Journal Article
m6A mRNA methylation facilitates resolution of naïve pluripotency toward differentiation
Shay Geula, Sharon Moshitch-Moshkovitz, Dan Dominissini, Abed AlFatah Mansour, Nitzan Kol, Mali Salmon-Divon, Vera Hershkovitz, Eyal Peer, Nofar Mor, Yair S. Manor, Moshe Shay Ben-Haim, Eran Eyal, Sharon Yunger, Yishay Pinto, Diego Jaitin, Sergey Viukov, Yoach Rais, Vladislav Krupalnik, Elad Chomsky, Mirie Zerbib, Itay Maza, Yoav Rechavi, Rada Massarwa, Suhair Hanna, Ido Amit, Erez Y. Levanon, Ninette Amariglio, Noam Stern-Ginossar, Noa Novershtern, Gideon Rechavi, Jacob H. Hanna
TL;DR: It is shown that N6-methyladenosine (m6A), a messenger RNA (mRNA) modification present on transcripts of pluripotency factors, drives this transition from the pluripotent to the differentiated state.
Journal Article
Fusobacterium nucleatum Promotes Chemoresistance to Colorectal Cancer by Modulating Autophagy.
Ta Chung Yu, Fangfang Guo, Ya-Nan Yu, Tian-Tian Sun, Dan Ma, Ji-Xuan Han, Yun Qian, Ilona Kryczek, Danfeng Sun, Nisha Nagarsheth, Ying-Xuan Chen, Haoyan Chen, Jie Hong, Weiping Zou, Jing-Yuan Fang
TL;DR: Fusobacterium (F.) nucleatum was found to be abundant in colorectal cancer tissues of patients with recurrence after chemotherapy and was associated with patient clinicopathological characteristics; bioinformatic and functional studies demonstrated that F. nucleatum promoted colorectal cancer resistance to chemotherapy.
Journal Article
Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.
TL;DR: The R/Bioconductor package scater is developed to facilitate rigorous pre‐processing, quality control, normalization and visualization of scRNA‐seq data and provides a convenient, flexible workflow to process raw sequencing reads into a high‐quality expression dataset ready for downstream analysis.
Journal Article
A Living Biobank of Breast Cancer Organoids Captures Disease Heterogeneity
Norman Sachs, Joep de Ligt, Oded Kopper, Ewa Gogola, Gergana Bounova, Fleur Weeber, Anjali Vanita Balgobind, Karin Wind, Ana Gracanin, Harry Begthel, Jeroen Korving, Ruben van Boxtel, Alexandra A. Duarte, Daphne Lelieveld, Arne Van Hoeck, Robert F Ernst, Francis Blokzijl, Isaac J. Nijman, Marlous Hoogstraat, Marieke van der Ven, David A. Egan, Vittoria Zinzalla, Jürgen Moll, Sylvia F. Boj, Emile E. Voest, Lodewyk F. A. Wessels, Paul J. van Diest, Sven Rottenberg, Robert G.J. Vries, Edwin Cuppen, Hans Clevers
TL;DR: This study describes a representative collection of well-characterized BC organoids available for cancer research and drug development, as well as a strategy to assess in vitro drug response in a personalized fashion.
Journal Article
Computational and analytical challenges in single-cell transcriptomics
TL;DR: The development of high-throughput RNA sequencing at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression.
References
Journal Article
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Journal Article
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: BEDTools is a software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.