HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
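The idea of counting the overlap of reads with genes can be illustrated with a minimal sketch in plain Python. It deliberately does not use the HTSeq API; the gene names, coordinates, helper functions and the handling of reads that hit zero or several genes are all assumptions made for illustration, not a description of how htseq-count is implemented.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end).
# Intervals are 0-based and half-open, i.e. [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of gene names whose interval overlaps [start, end)."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads hitting no gene or several genes
    are tallied separately (an assumption of this sketch)."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical aligned reads as (chromosome, start, end).
reads = [
    ("chr1", 110, 140),  # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 90),    # overlaps no gene
    ("chr2", 10, 40),    # overlaps geneC only
]
counts = count_reads(reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan over all genes keeps the sketch short; a real analysis would use a structure indexed by genomic coordinates, which is the kind of data structure the abstract says HTSeq provides.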
Citations
Journal Article
The transcriptional response to tumorigenic polarity loss in Drosophila
TL;DR: Bunker et al. used RNA sequencing to study gene expression in tumor cells that have mutations in the Scribble module, and found that deleting the layer of epithelial cells activated two other signaling pathways that work together to switch on the upd3 gene when cell polarity is lost.
Journal Article
Myocardial Gene Expression Signatures in Human Heart Failure With Preserved Ejection Fraction.
Virginia S. Hahn, Hildur Knutsdottir, Xin Luo, Kenneth Bedi, Kenneth B. Margulies, Saptarsi M. Haldar, Marina Stolina, Jun Yin, Aarif Y. Khakoo, Joban Vaishnav, Joel S. Bader, David A. Kass, Kavita Sharma
TL;DR: RNA sequencing on right ventricular septal endomyocardial biopsies prospectively obtained from patients with consensus criteria for HFpEF revealed new signaling targets to consider for precision therapeutics and confirmed group separation.
Journal Article
Genome sequencing of the staple food crop white Guinea yam enables the development of a molecular marker for sex determination
Muluneh Tamiru, Satoshi Natsume, Hiroki Takagi, Benjamen White, Hiroki Yaegashi, Motoki Shimizu, Kentaro Yoshida, Aiko Uemura, Kaori Oikawa, Akira Abe, Naoya Urasaki, Hideo Matsumura, Pachakkil Babil, Shinsuke Yamanaka, Ryo Matsumoto, Satoru Muranaka, Gezahegn Girma, Antonio Lopez-Montes, Melaku Gedil, Ranjana Bhattacharjee, Michael Abberton, P. Lava Kumar, Ismail Y. Rabbi, Mai Tsujimura, Toru Terachi, Wilfried Haerty, Manuel Corpas, Sophien Kamoun, Günter Kahl, Hiroko Takagi, Robert Asiedu, Ryohei Terauchi
TL;DR: The genome analyses and sex-linked marker development performed in this study should greatly accelerate marker-assisted breeding of Guinea yam and can be utilized in genetic studies of other outcrossing crops and organisms with highly heterozygous genomes.
Journal Article
Tissue-specific transcriptomics reveals an important role of the unfolded protein response in maintaining fertility upon heat stress in Arabidopsis.
Shuang Shuang Zhang, Hongxing Yang, Lan Ding, Ze Ting Song, Hong Ma, Fang Chang, Jian-Xiang Liu
TL;DR: This work reveals both the tissue-specific heat responsiveness of Arabidopsis at the reproductive stage and the downstream genes of the UPR regulators that are important in maintaining fertility upon heat stress, and demonstrates the protective roles of the UPR in maintaining fertility upon heat stress.
Journal Article
Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases.
Mineto Ota, Yasuo Nagafuchi, Hiroaki Hatano, Kazuyoshi Ishigaki, Chikashi Terao, Yusuke Takeshima, Haruyuki Yanaoka, Satomi Kobayashi, Mai Okubo, Harumi Shirai, Yusuke Sugimori, Junko Maeda, Masahiro Nakano, Saeko Yamada, Ryochi Yoshida, Haruka Tsuchiya, Yumi Tsuchida, Shuji Akizuki, Hajime Yoshifuji, Koichiro Ohmura, Tsuneyo Mimori, Ken Yoshida, Daitaro Kurosaka, Masato Okada, Keigo Setoguchi, Hiroshi Kaneko, Nobuhiro Ban, Nami Yabuki, Kosuke Matsuki, Hironori Mutoh, Sohei Oyama, Makoto Okazaki, Hiroyuki Tsunoda, Yukiko Iwasaki, Shuji Sumitomo, Hirofumi Shoda, Yuta Kochi, Yukinori Okada, Kazuhiko Yamamoto, Tomohisa Okamura, Keishi Fujio
TL;DR: In this article, a large-scale immune cell gene-expression analysis, together with whole-genome sequence analysis, was performed to understand the function of these variants, especially under disease-associated conditions.
References
Journal Article
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Journal Article
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic was developed as a more flexible and efficient preprocessing tool that can correctly handle paired-end data, and it is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.