HTSeq—a Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
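To make the idea of counting read-to-gene overlaps concrete, the following is a minimal pure-Python sketch. It does not use HTSeq and is not the htseq-count implementation: the `Interval` type, the `overlaps` and `count_reads` helpers, the toy coordinates, and the `__no_feature` and `__ambiguous` labels are all assumptions made for illustration. Intervals are assumed to be 0-based and half-open, and a read is assigned to a gene only when it overlaps exons of exactly one gene.

```python
from collections import Counter, namedtuple

# Hypothetical minimal interval type for illustration (not an HTSeq class).
# Coordinates are assumed to be 0-based and half-open: [start, end).
Interval = namedtuple("Interval", ["chrom", "start", "end", "strand"])


def overlaps(a, b):
    """Return True if a and b share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def count_reads(genes, reads):
    """Count reads per gene.

    genes maps a gene ID to a list of exon Intervals; reads is an iterable
    of Intervals. Reads overlapping no gene, or more than one gene, are
    tallied under separate illustrative labels instead of being assigned.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for read in reads:
        # Collect every gene that has at least one exon overlapping the read.
        hits = {gene_id
                for gene_id, exons in genes.items()
                if any(overlaps(read, exon) for exon in exons)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Toy data: two genes on chr1 whose exons overlap between positions 380 and 400.
genes = {
    "geneA": [Interval("chr1", 100, 200, "+"), Interval("chr1", 300, 400, "+")],
    "geneB": [Interval("chr1", 380, 500, "+")],
}
reads = [
    Interval("chr1", 150, 186, "+"),  # inside geneA exon 1
    Interval("chr1", 310, 346, "+"),  # inside geneA exon 2
    Interval("chr1", 385, 421, "+"),  # overlaps both geneA and geneB
    Interval("chr1", 450, 486, "+"),  # inside geneB
    Interval("chr1", 600, 636, "+"),  # intergenic
    Interval("chr2", 150, 186, "+"),  # different chromosome
]
counts = count_reads(genes, reads)
```

The linear scan over all genes for every read keeps the sketch short but scales poorly; a practical tool would rely on a data structure indexed by genomic coordinate, which is the kind of query support the abstract describes.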
Citations
Differential analysis of count data - the DESeq2 package
TL;DR: The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions.
Journal ArticleDOI
Single-Cell RNA-Seq Analysis Maps Development of Human Germline Cells and Gonadal Niche Interactions
Li Li, Ji Dong, Liying Yan, Jun Yong, Xixi Liu, Yuqiong Hu, Xiaoying Fan, Xinglong Wu, Hongshan Guo, Xiaoye Wang, Xiaohui Zhu, Rong Li, Jie Yan, Yuan Wei, Yangyu Zhao, Wei Wang, Yixin Ren, Peng Yuan, Zhiqiang Yan, Boqiang Hu, Fan Guo, Lu Wen, Fuchou Tang, Jie Qiao
TL;DR: This work performs single-cell RNA-seq analysis of over 2,000 fetal germ cells (FGCs) and their gonadal niche cells in female and male human embryos spanning several developmental stages, providing key insights into the crucial features of human FGCs during their highly ordered mitotic, meiotic, and gametogenetic processes in vivo.
Journal ArticleDOI
Haematopoietic stem and progenitor cells from human pluripotent stem cells
Ryohichi Sugimura, Deepak Kumar Jha, Areum Han, Clara Soria-Valles, Edroaldo Lummertz da Rocha, Yi Fen Lu, Jeremy A. Goettel, Erik Serrao, R. Grant Rowe, Mohan Malleshaiah, Irene Wong, Patricia Sousa, Ted N. Zhu, Andrea Ditadi, Gordon Keller, Alan Engelman, Scott B. Snapper, Sergei Doulatov, George Q. Daley
TL;DR: The combined approach of morphogen-driven differentiation and transcription-factor-mediated cell fate conversion produces haematopoietic stem and progenitor cells from pluripotent stem cells and holds promise for modelling haematopoietic disease in humanized mice and for therapeutic strategies in genetic blood disorders.
Journal ArticleDOI
MHC proteins confer differential sensitivity to CTLA-4 and PD-1 blockade in untreated metastatic melanoma
Scott J. Rodig, Daniel Gusenleitner, Donald G. Jackson, Evisa Gjini, Anita Giobbie-Hurder, Chelsea Jin, Han Chang, Scott B. Lovitch, Christine Horak, Jeffrey S. Weber, Jason L. Weirather, Jedd D. Wolchok, Michael A. Postow, Anna C. Pavlick, Jason Chesney, F. Stephen Hodi
TL;DR: It is shown that MHC class I expression in advanced melanoma predicted resistance to anti-CTLA-4, but not anti-PD-1, treatment; anti-PD-1 treatment may instead need MHC class II to be effective. This explains why patients on combined therapy do better on average, with one drug overcoming the limitations of the other.
Journal ArticleDOI
Computational assignment of cell-cycle stage from single-cell transcriptome data.
Antonio Scialdone, Kedar Nath Natarajan, Luis R. Saraiva, Valentina Proserpio, Sarah A. Teichmann, Oliver Stegle, John C. Marioni, Florian Buettner
TL;DR: Five established supervised machine learning methods and a custom-built predictor for allocating cells to their cell-cycle stage on the basis of their transcriptome are described and compared; a PCA-based approach and the custom predictor performed best.
References
Journal ArticleDOI
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal ArticleDOI
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Journal ArticleDOI
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that can correctly handle paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Journal ArticleDOI
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal ArticleDOI
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: This work presents a new software suite for the comparison, manipulation and annotation of genomic features in the Browser Extensible Data (BED) and General Feature Format (GFF) formats, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.