HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
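The read-counting idea described for htseq-count can be illustrated with a minimal sketch in plain Python. It deliberately does not use the HTSeq API: the toy annotation `GENES`, the helper names `genes_overlapping` and `count_reads`, the half-open 0-based coordinates, and the labels `__no_feature` and `__ambiguous` are all assumptions made for this illustration. The rule shown (a read counts for a gene only if it overlaps exactly one gene) is one simple way to resolve overlaps and is not claimed to reproduce the tool's behaviour in every case.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# using 0-based, half-open intervals [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def genes_overlapping(chrom, start, end, genes=GENES):
    """Return the set of gene names whose interval overlaps [start, end) on chrom."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting zero or several genes get special labels."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = genes_overlapping(chrom, start, end, genes)
        if not hits:
            counts["__no_feature"] += 1   # read overlaps no annotated gene
        elif len(hits) > 1:
            counts["__ambiguous"] += 1    # read overlaps more than one gene
        else:
            counts[next(iter(hits))] += 1  # read overlaps exactly one gene
    return counts


# Hypothetical aligned reads as (chromosome, start, end) tuples.
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 90),    # overlaps no gene
    ("chr2", 10, 40),    # overlaps geneC only
]
counts = count_reads(reads)
```

The linear scan over all genes is adequate for a toy example only; the abstract's point about data structures that allow querying via genomic coordinates is precisely that real annotations need an indexed lookup instead.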
Citations
Journal Article
MEF2C regulates cortical inhibitory and excitatory synapses and behaviors relevant to neurodevelopmental disorders.
Adam J Harrington, Aram J. Raissi, Kacey Rajkovich, Stefano Berto, Jaswinder Kumar, Gemma Molinaro, Jonathan Raduazzo, Yuhong Guo, Kris Loerwald, Genevieve Konopka, Kimberly M. Huber, Christopher W. Cowan
TL;DR: It is shown that conditional embryonic deletion of Mef2c in cortical and hippocampal excitatory neurons (Emx1-lineage) produces a dramatic reduction in cortical network activity in vivo, and that MEF2C regulates E/I synapse density predominantly as a cell-autonomous, transcriptional repressor.
Journal Article
Meningeal lymphatic dysfunction exacerbates traumatic brain injury pathogenesis.
Ashley C. Bolte, Arun B. Dutta, Mariah E. Hurt, Igor Smirnov, Michael A. Kovacs, Celia A. McKee, Hannah E. Ennerfelt, Daniel Shapiro, Bao H. Nguyen, Elizabeth L. Frost, Catherine R. Lammert, Jonathan Kipnis, John R. Lukens
TL;DR: It is demonstrated in an experimental mouse model of TBI that mild forms of brain trauma cause severe deficits in meningeal lymphatic drainage that begin within hours and last out to at least one month post-injury.
Journal Article
Disrupted alternative splicing for genes implicated in splicing and ciliogenesis causes PRPF31 retinitis pigmentosa.
Adriana Buskin, Lili Zhu, Valeria Chichagova, Basudha Basu, Sina Mozaffari-Jovin, David Dolan, Alastair Droop, Joseph Collin, Revital Bronstein, Sudeep Mehrotra, Michael H. Farkas, Gerrit Hilgen, Kathryn White, Kuan-Ting Pan, Achim Treumann, Dean Hallam, Katarzyna Bialas, Git Chung, Carla B. Mellough, Yuchun Ding, Natalio Krasnogor, Stefan Przyborski, Simon Zwolinski, Jumana Y. Al-Aama, Sameer E. Al-Harthi, Yaobo Xu, Gabrielle Wheway, Katarzyna Szymanska, Martin McKibbin, Chris F. Inglehearn, David J. Elliott, Susan Lindsay, Robin R. Ali, David H. W. Steel, Lyle Armstrong, Evelyne Sernagor, Henning Urlaub, Eric A. Pierce, Reinhard Lührmann, Sushma Nagaraja Grellscheid, Colin A. Johnson, Majlinda Lako
TL;DR: In situ gene editing of a pathogenic mutation rescued protein expression and key cellular phenotypes in RPE and photoreceptors, providing proof of concept for future therapeutic strategies.
Journal Article
Evolutionary Trajectories of IDHWT Glioblastomas Reveal a Common Path of Early Tumorigenesis Instigated Years ahead of Initial Diagnosis
Verena Körber, Jing Yang, Pankaj Barah, Yonghe Wu, Damian Stichel, Zuguang Gu, Michael N. C. Fletcher, David T.W. Jones, Bettina Hentschel, Katrin Lamszus, Jörg C. Tonn, Gabriele Schackert, Michael Sabel, Jörg Felsberg, Angela Zacher, Kerstin Kaulich, Daniel Hübschmann, Christel Herold-Mende, Andreas von Deimling, Michael Weller, Bernhard Radlwimmer, Matthias Schlesner, Guido Reifenberger, Thomas Höfer, Peter Lichter
TL;DR: This analysis suggests both a distant origin of de novo glioblastoma, up to 7 years before diagnosis, and a common path of early tumorigenesis, with one or more of chromosome 7 gain, 9p loss, or 10 loss, at tumor initiation.
Journal Article
Errors in RNA-Seq quantification affect genes of relevance to human disease
Christelle Robert, Mick Watson
TL;DR: It is shown that it is possible to use data that may otherwise have been discarded to measure group-level expression, and that such data contains biologically relevant information.