Journal ArticleDOI

HTSeq—a Python framework to work with high-throughput sequencing data

15 Jan 2015-Bioinformatics (Oxford University Press)-Vol. 31, Iss: 2, pp 166-169
TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de


Citations
Journal ArticleDOI
14 Apr 2016-Nature
TL;DR: RNA sequencing revealed that astrocytes and non-astrocyte cells in SCI lesions express multiple axon-growth-supporting molecules, showing that, contrary to the prevailing dogma, astrocyte scar formation aids rather than prevents central nervous system axon regeneration.
Abstract: Transected axons fail to regrow in the mature central nervous system. Astrocytic scars are widely regarded as causal in this failure. Here, using three genetically targeted loss-of-function manipulations in adult mice, we show that preventing astrocyte scar formation, attenuating scar-forming astrocytes, or ablating chronic astrocytic scars all failed to result in spontaneous regrowth of transected corticospinal, sensory or serotonergic axons through severe spinal cord injury (SCI) lesions. By contrast, sustained local delivery via hydrogel depots of required axon-specific growth factors not present in SCI lesions, plus growth-activating priming injuries, stimulated robust, laminin-dependent sensory axon regrowth past scar-forming astrocytes and inhibitory molecules in SCI lesions. Preventing astrocytic scar formation significantly reduced this stimulated axon regrowth. RNA sequencing revealed that astrocytes and non-astrocyte cells in SCI lesions express multiple axon-growth-supporting molecules. Our findings show that contrary to the prevailing dogma, astrocyte scar formation aids rather than prevents central nervous system axon regeneration.

1,292 citations

Journal ArticleDOI
TL;DR: This study provides strong evidence for translation of circRNAs, revealing the existence of an unexplored layer of gene activity.

1,254 citations


Additional excerpts

  • ...…https://ccb.jhu.edu/software/tophat/index.shtml Bowtie2 Langmead and Salzberg, 2012 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml HTSeq Anders et al., 2015 http://www-huber.embl.de/HTSeq DESeq Anders and Huber, 2010 http://bioconductor.org/packages/release/bioc/html/DESeq.html DAVID…...

    [...]

  • ...REAGENT or RESOURCE SOURCE IDENTIFIER Cdi V5 This paper N/A Pde8 V5 This paper N/A Camk1-mbl-cherry-camk1 This paper N/A circMbl IRES This paper N/A circPde8 IRES This paper N/A circCdi IRES This paper N/A circTai IRES This paper N/A circMbl IRES reverse This paper N/A circPde8 IRES reverse This paper N/A circCdi IRES reverse This paper N/A circTai IRES reverse This paper N/A UAS-circMbl OE This paper N/A Software and Algorithms Find_circ.py Memczak et al. 2013 http://circbase.org/cgi-bin/downloads.cgi cORF_prediction_pipeline.py (prediction of ORFs from circRNA sequences) This paper https://github.com/kadenerlab/cORF_pipeline SRCP.py (short read circRNA pipeline for detection of back-splice reads in RFP RNA-seq) This paper To be published, can be provided upon request TopHat2 Kim et al., 2013 https://ccb.jhu.edu/software/tophat/index.shtml Bowtie2 Langmead and Salzberg, 2012 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml HTSeq Anders et al., 2015 http://www-huber.embl.de/HTSeq DESeq Anders and Huber, 2010 http://bioconductor.org/packages/release/bioc/html/DESeq.html DAVID Huang da et al., 2009 https://david.ncifcrf.gov/home.jsp...

    [...]

  • ...HTSeq-a Python framework to work with high-throughput sequencing data....

    [...]

Journal ArticleDOI
06 Oct 2016-Cell
TL;DR: It is demonstrated that upon lipopolysaccharide stimulation, macrophages shift from producing ATP by oxidative phosphorylation to glycolysis while also increasing succinate levels, and repurpose mitochondria from ATP synthesis to ROS production in order to promote a pro-inflammatory state.

1,249 citations


Cites methods from "HTSeq—a Python framework to work wi..."

  • ...HTSeq–a Python framework to work with high-throughput sequencing data....

    [...]

  • ...HTSeq-count (Anders et al., 2015) was used to count the transcripts associated with each gene, and a counts matrix containing the number of counts for each gene across different samples and stimulations was obtained....

    [...]

  • ...N/A Mouse: SDHB-deficient: ROSA26-CreERT2/ SDHBfloxed/floxed Laboratory of Eyal Gottlieb N/A Mouse: SDHB-proficient: ROSA26-CreERT2/ SDHBwildtype/wildtype Laboratory of Eyal Gottlieb N/A Sequence-Based Reagents HIF-1a TaqMan assay Applied Biosystems Cat# 4331182; Assay ID: Mm00468869_m1 EGLN3 TaqMan assay Applied Biosystems Cat# 4331182; Assay ID: Mm00472200_m1 Rps18 FAM TaqMan assay Applied Biosystems Cat# 4331182; Assay ID: Mm_02601777_g1 Primers for mouse IL-1b See Table S7 N/A Primers for mouse IL-10 See Table S7 N/A Primers for mouse TNF-a See Table S7 N/A Primers for mouse Rps18 See Table S7 N/A Primers for mouse IL-1ra See Table S7 N/A Primers for mouse PHD3 See Table S7 N/A Primers for mouse cMyc See Table S7 N/A Primers for mouse CD71 See Table S7 N/A Software and Algorithms GraphPad Prism GraphPad Software http://www.graphpad.com/scientific-software/prism/ FlowJo FlowJo http://www.flowjo.com/ Bowtie Langmead et al., 2009 http://bowtie-bio.sourceforge.net/index.shtml TopHat Trapnell et al., 2009 http://ccb.jhu.edu/software/tophat/index.shtml HTSeq Anders et al., 2015 https://pypi.python.org/pypi/HTSeq...

    [...]

  • ...HTSeq-count was used to count the transcripts associated with each gene in the transcriptomics analysis....

    [...]

  • ...…FlowJo http://www.flowjo.com/ Bowtie Langmead et al., 2009 http://bowtie-bio.sourceforge.net/index.shtml TopHat Trapnell et al., 2009 http://ccb.jhu.edu/software/tophat/index.shtml HTSeq Anders et al., 2015 https://pypi.python.org/pypi/HTSeq...

    [...]
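
The excerpts above describe assembling per-sample htseq-count results into a counts matrix (genes by samples). A minimal sketch of that step follows, assuming each sample's output is a two-column, tab-separated table of gene identifier and count, with summary rows whose identifiers start with `__` (such as `__no_feature`). The function names `read_counts` and `build_matrix` are hypothetical and are not part of HTSeq.

```python
import csv
import io


def read_counts(handle):
    """Parse one two-column, tab-separated count table into a dict.

    Rows whose identifier starts with "__" are treated as summary
    counters and skipped (an assumption about the input layout).
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gene_id, value = row
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def build_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, rows).

    rows[i][j] is the count of genes[i] in samples[j]; genes missing
    from a sample are filled with 0.
    """
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


# Toy input standing in for two per-sample output files.
sample_a = io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n")
sample_b = io.StringIO("geneA\t7\ngeneC\t3\n__ambiguous\t2\n")

genes, samples, rows = build_matrix(
    {"ctrl": read_counts(sample_a), "lps": read_counts(sample_b)}
)
```

With real data, the `io.StringIO` objects would be replaced by opened files, one per sample.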

Journal ArticleDOI
TL;DR: The findings provide direct evidence that the m(6)A reader YTHDC1 regulates mRNA splicing by recruiting and modulating pre-mRNA splicing factors for their access to the binding regions of targeted mRNAs.

1,244 citations


Cites methods from "HTSeq—a Python framework to work wi..."

  • ...The number of reads mapped to each Ensembl gene (release 72) was counted using the HTSeq Python package (Anders et al., 2015), with the ‘union’ overlap resolution mode and --stranded=no, and the expressions of transcripts were quantified as Reads Per Kilobase of exon model per...

    [...]
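
The 'union' overlap resolution mode mentioned above can be illustrated with a small sketch. It assumes a simplified model in which each gene is a single half-open interval and strand is ignored (as with --stranded=no); htseq-count itself works on exon features grouped by gene. The annotation `GENES` and the functions `genes_overlapping` and `count_union` are hypothetical names for illustration, not HTSeq API.

```python
from collections import Counter

# Hypothetical toy annotation: gene id -> (chromosome, start, end),
# with half-open coordinates [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}


def genes_overlapping(chrom, start, end, genes=GENES):
    """Return the set of genes sharing at least one base with the read."""
    return {
        gene_id
        for gene_id, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_union(reads, genes=GENES):
    """Assign each read using a union-style rule.

    The union of genes overlapping any position of the read is taken:
    exactly one gene -> counted for that gene; none -> "__no_feature";
    more than one -> "__ambiguous".
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = genes_overlapping(chrom, start, end, genes)
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[next(iter(hits))] += 1
    return counts


reads = [
    ("chr1", 110, 150),  # inside geneA only
    ("chr1", 190, 195),  # inside both geneA and geneB
    ("chr1", 400, 450),  # overlaps nothing
    ("chr2", 60, 100),   # inside geneC only
]
result = count_union(reads)
```

The linear scan over all genes keeps the sketch short; a real implementation would use an indexed structure for the overlap lookup.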

Journal ArticleDOI
23 Jan 2020-Nature
TL;DR: B cell markers were the most differentially expressed genes in the tumours of responders versus non-responders and insights are provided into the potential role of B cells and tertiary lymphoid structures in the response to ICB treatment, with implications for the development of biomarkers and therapeutic targets.
Abstract: Treatment with immune checkpoint blockade (ICB) has revolutionized cancer therapy. Until now, predictive biomarkers [1-10] and strategies to augment clinical response have largely focused on the T cell compartment. However, other immune subsets may also contribute to anti-tumour immunity [11-15], although these have been less well-studied in ICB treatment [16]. A previously conducted neoadjuvant ICB trial in patients with melanoma showed via targeted expression profiling [17] that B cell signatures were enriched in the tumours of patients who respond to treatment versus non-responding patients. To build on this, here we performed bulk RNA sequencing and found that B cell markers were the most differentially expressed genes in the tumours of responders versus non-responders. Our findings were corroborated using a computational method (MCP-counter [18]) to estimate the immune and stromal composition in this and two other ICB-treated cohorts (patients with melanoma and renal cell carcinoma). Histological evaluation highlighted the localization of B cells within tertiary lymphoid structures. We assessed the potential functional contributions of B cells via bulk and single-cell RNA sequencing, which demonstrate clonal expansion and unique functional states of B cells in responders. Mass cytometry showed that switched memory B cells were enriched in the tumours of responders. Together, these data provide insights into the potential role of B cells and tertiary lymphoid structures in the response to ICB treatment, with implications for the development of biomarkers and therapeutic targets.

1,206 citations

References
Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations


"HTSeq—a Python framework to work wi..." refers background in this paper

  • ...…is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. functionality from PySam…...

    [...]

Journal ArticleDOI
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations

Journal ArticleDOI
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

29,413 citations


"HTSeq—a Python framework to work wi..." refers methods in this paper

  • ...These counts can then be used for gene-level differential expression analyses using methods such as DESeq2 (Love et al., 2014) or edgeR (Robinson et al., 2010)....

    [...]

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations


"HTSeq—a Python framework to work wi..." refers background in this paper

  • ...Interval queries are a recurring task in HTS analysis problems, and several libraries now offer solutions for different programming languages, including BEDtools (Quinlan and Hall, 2010; Dale et al., 2011) and IRanges/GenomicRanges (Lawrence et al., 2013)....

    [...]
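
To make the idea of an interval query concrete, here is a minimal sketch of an overlap lookup over half-open intervals on a single sequence. It is an illustration only and does not reflect how BEDtools, IRanges/GenomicRanges or HTSeq implement their queries; the class name `IntervalIndex` and its method are hypothetical.

```python
import bisect


class IntervalIndex:
    """Toy index over half-open intervals (start, end, label)."""

    def __init__(self, intervals):
        # Sort by start so that candidates can be bounded with bisect.
        self._items = sorted(intervals)
        self._starts = [item[0] for item in self._items]

    def overlapping(self, start, end):
        """Return labels of intervals sharing a base with [start, end)."""
        # Only intervals that begin before the query ends can overlap it.
        hi = bisect.bisect_left(self._starts, end)
        # Of those, keep the ones that end after the query starts.
        return [label for s, e, label in self._items[:hi] if e > start]


index = IntervalIndex([
    (400, 500, "c"),
    (100, 200, "a"),
    (180, 300, "b"),
])
hits = index.overlapping(190, 195)
```

The binary search bounds the candidates by start position, but the remaining filter is still linear in the worst case; production libraries typically rely on interval trees or binned indexes instead.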