HTSeq—a Python framework to work with high-throughput sequencing data

doi:10.1093/BIOINFORMATICS/BTU638

Open Access · Journal Article

Simon Anders, +2 more

- 15 Jan 2015 -

Bioinformatics

- Vol. 31, Iss: 2, pp 166-169

TLDR

This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, together with htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
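
To make the counting step concrete, here is a minimal pure-Python sketch of read-to-gene overlap counting in the spirit of htseq-count's default "union" mode: the genes overlapping any base of a read are pooled, a read hitting exactly one gene is counted for it, and reads hitting none or several are tallied as `__no_feature` or `__ambiguous`. It does not use HTSeq. The gene model layout, the tuple formats and the helper names `genes_at` and `count_union` are illustrative assumptions, and the sketch ignores strandedness, spliced alignments and multi-mapping reads.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical formats: half-open, 0-based intervals [start, end).
Exon = Tuple[str, int, int]  # (chrom, start, end)
Read = Tuple[str, int, int]  # (chrom, start, end) of an unspliced alignment


def genes_at(chrom: str, pos: int, genes: Dict[str, List[Exon]]) -> Set[str]:
    """Return the IDs of all genes with an exon covering base `pos` on `chrom`."""
    return {
        gene_id
        for gene_id, exons in genes.items()
        for (c, s, e) in exons
        if c == chrom and s <= pos < e
    }


def count_union(reads: List[Read], genes: Dict[str, List[Exon]]) -> Counter:
    """Count reads per gene with a simplified 'union' rule.

    The sets of genes overlapping each base of the read are united;
    one gene -> count it, no gene -> __no_feature, several -> __ambiguous.
    """
    counts: Counter = Counter()
    for chrom, start, end in reads:
        hit: Set[str] = set()
        for pos in range(start, end):
            hit |= genes_at(chrom, pos, genes)
        if len(hit) == 1:
            counts[next(iter(hit))] += 1
        elif not hit:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Toy example with made-up coordinates.
genes = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 180, 260)],
}
reads = [
    ("chr1", 110, 150),  # inside geneA only
    ("chr1", 320, 360),  # inside geneA's second exon
    ("chr1", 190, 230),  # overlaps geneA and geneB
    ("chr1", 500, 550),  # overlaps no gene
    ("chr1", 210, 250),  # inside geneB only
]
counts = count_union(reads, genes)
print(counts)
```

The per-base scan keeps the rule easy to read but is slow for real data; HTSeq itself provides a `GenomicArrayOfSets` class so that overlapping features can be looked up by genomic position instead of by scanning every gene.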

Abstract:

Motivation: A wide choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.

Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.

Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.

Contact: sanders@fs.tum.de
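
The idea of a data structure queried via genomic coordinates can be illustrated with a step-wise array: values are stored only where they change, so a chromosome-long track with long constant runs stays small, and any interval can be read back as a list of constant steps. HTSeq's documentation describes a `GenomicArray` class for this role; the sketch below neither uses nor reproduces it. The class name `StepArray`, its methods and the dictionary keyed by chromosome name are hypothetical choices made for illustration.

```python
import bisect
from typing import Iterator, List, Tuple


class StepArray:
    """Hypothetical piecewise-constant array over one chromosome.

    Step i starts at starts[i], holds values[i] and runs until
    starts[i + 1] (or the chromosome end). Intervals are half-open.
    """

    def __init__(self, length: int, default: object = 0):
        self.length = length
        self.starts: List[int] = [0]
        self.values: List[object] = [default]

    def _check(self, start: int, end: int) -> None:
        if not 0 <= start < end <= self.length:
            raise ValueError(f"invalid interval [{start}, {end})")

    def _split(self, pos: int) -> int:
        """Ensure a step boundary exists at `pos`; return that step's index."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.values.insert(i + 1, self.values[i])
            i += 1
        return i

    def set(self, start: int, end: int, value: object) -> None:
        """Assign `value` to every position in [start, end)."""
        self._check(start, end)
        i = self._split(start)
        # No boundary is needed at the chromosome end itself.
        j = self._split(end) if end < self.length else len(self.starts)
        # Collapse all steps inside [start, end) into a single step.
        self.starts[i:j] = [start]
        self.values[i:j] = [value]

    def steps(self, start: int, end: int) -> Iterator[Tuple[int, int, object]]:
        """Yield (step_start, step_end, value), clipped to [start, end)."""
        self._check(start, end)
        i = bisect.bisect_right(self.starts, start) - 1
        while i < len(self.starts) and self.starts[i] < end:
            step_end = self.starts[i + 1] if i + 1 < len(self.starts) else self.length
            yield max(self.starts[i], start), min(step_end, end), self.values[i]
            i += 1


# One array per chromosome, queried by (chrom, start, end).
cov = {"chr1": StepArray(1000)}
cov["chr1"].set(100, 200, 5)
cov["chr1"].set(150, 250, 7)
for step in cov["chr1"].steps(120, 300):
    print(step)
```

Adjacent steps that happen to receive equal values are not merged here; a fuller implementation would coalesce them to keep the step list minimal.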

Citations

Journal Article

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love, +3 more

- 05 Dec 2014 -

Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Journal Article

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie, +7 more

- 20 Apr 2015 -

Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Posted Content

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love, +2 more

- 17 Nov 2014 -

bioRxiv

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Journal Article

Near-optimal probabilistic RNA-seq quantification

Nicolas Bray, +3 more

- 01 May 2016 -

Nature Biotechnology

TL;DR: Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases, which removes a major computational bottleneck in RNA-seq analysis.

Journal Article

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Charlotte Soneson, +4 more

- 30 Dec 2015 -

F1000Research

TL;DR: It is illustrated that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets.