HTSeq - A Python framework to work with high-throughput sequencing data
TLDR
This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract:
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq
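The idea of counting the overlap of reads with genes can be illustrated with a minimal pure-Python sketch. It does not use the HTSeq API and is not htseq-count's implementation; the toy annotation GENES, the helper names and the simplified counting rule (a read counts for a gene only if it overlaps exactly one gene) are assumptions made for illustration. The special labels __no_feature and __ambiguous mirror the names htseq-count uses in its output.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# with 0-based, half-open coordinates. Not taken from the paper.
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of genes whose interval overlaps [start, end) on chrom."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start < g_end and g_start < end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads are (chromosome, start, end) tuples.

    Simplified rule (an assumption of this sketch): a read is assigned to a
    gene only if it overlaps exactly one gene. Otherwise it is tallied as
    __no_feature (no overlap) or __ambiguous (several genes overlapped).
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[next(iter(hits))] += 1
    return counts


if __name__ == "__main__":
    example_reads = [
        ("chr1", 110, 140),  # inside geneA only
        ("chr1", 160, 190),  # inside both geneA and geneB
        ("chr1", 250, 280),  # inside geneB only
        ("chr2", 10, 40),    # inside geneC
        ("chr3", 0, 10),     # no annotated gene
    ]
    for name, n in sorted(count_reads(example_reads).items()):
        print(name, n)
```

The linear scan over all genes keeps the sketch short. A real tool would use an indexed structure that can be queried by genomic coordinates, which is the role of the data structures mentioned in the abstract.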
Citations
Journal Article
Altering the Intestinal Microbiota during a Critical Developmental Window Has Lasting Metabolic Consequences
Laura M. Cox, Shingo Yamanishi, Jiho Sohn, Alexander V. Alekseyenko, Jacqueline M. Leung, Ilseung Cho, Sungheon Kim, Huilin Li, Zhan Gao, Douglas Mahana, Jorge G. Zárate Rodriguez, Arlin B. Rogers, Nicolas Robine, P'ng Loke, Martin J. Blaser
TL;DR: It is shown that low-dose penicillin (LDP), delivered from birth, induces metabolic alterations and affects ileal expression of genes involved in immunity, indicating that microbiota interactions in infancy may be critical determinants of long-term host metabolic effects.
Journal Article
ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases
TL;DR: ngs.plot is a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data and is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
Journal Article
Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data
Franck Rapaport, Raya Khanin, Yupu Liang, Mono Pirun, Azra Krek, Paul Zumbo, Christopher E. Mason, Nicholas D. Socci, Doron Betel
TL;DR: Many computational methods have been developed for analyzing differential gene expression in RNA-seq data; this work presents a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data.
Journal Article
Human Tumor-Associated Macrophage and Monocyte Transcriptional Landscapes Reveal Cancer-Specific Reprogramming, Biomarkers, and Therapeutic Targets.
Luca Cassetta, Stamatina Fragkogianni, Andrew H. Sims, Agnieszka Swierczak, Lesley M. Forrester, Hui Zhang, Daniel Y.H. Soong, Tiziana Cotechini, Pavana Anur, Elaine Y. Lin, Antonella Fidanza, Martha Lopez-Yrigoyen, Michael Millar, Alexandra Urman, Zhichao Ai, Paul T. Spellman, E. Shelley Hwang, J Michael Dixon, Lisa Wiechmann, Lisa M. Coussens, Harriet O. Smith, Jeffrey W. Pollard
TL;DR: It is shown that monocyte subpopulation distribution and transcriptomes are significantly altered by the presence of endometrial and breast cancer and an auto-regulatory loop between TAMs and cancer cells driven by tumor necrosis factor alpha involving SIGLEC1 and CCL8 is identified.
Journal Article
Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes
Daniel E. Neafsey, Robert M. Waterhouse, Mohammad Reza Abai, Sergey Aganezov, Max A. Alekseyev, James E. Allen, James Amon, Bruno Arcà, Peter Arensburger, Gleb N. Artemov, Lauren A. Assour, Hamidreza Basseri, Aaron M. Berlin, Bruce W. Birren, Stéphanie Blandin, Andrew I. Brockman, Thomas R. Burkot, Austin Burt, Clara S. Chan, Cedric Chauve, Joanna C. Chiu, Mikkel B. Christensen, Carlo Costantini, Victoria L.M. Davidson, Elena Deligianni, Tania Dottorini, Vicky Dritsou, Stacey Gabriel, Wamdaogo M. Guelbeogo, Andrew Brantley Hall, Mira V. Han, Thaung Hlaing, Daniel S.T. Hughes, Adam M. Jenkins, Xiaofang Jiang, Irwin Jungreis, Evdoxia G. Kakani, Maryam Kamali, Petri Kemppainen, Ryan C. Kennedy, Ioannis K. Kirmitzoglou, Lizette L. Koekemoer, Njoroge Laban, Nicholas Langridge, Mara K. N. Lawniczak, Manolis Lirakis, Neil F. Lobo, Ernesto Lowy, Robert M. MacCallum, Chunhong Mao, Gareth Maslen, Charles Mbogo, Jenny McCarthy, Kristin Michel, Sara N. Mitchell, Wendy Moore, Katherine A. Murphy, Anastasia N. Naumenko, Tony Nolan, Eva Maria Novoa, Samantha M. O’Loughlin, Chioma Oringanje, Mohammad Ali Oshaghi, Nazzy Pakpour, Philippos Aris Papathanos, Ashley Peery, Michael Povelones, Anil Prakash, David P. Price, Ashok Rajaraman, Lisa J. Reimer, David C. Rinker, Antonis Rokas, Tanya L. Russell, N’Fale Sagnon, Maria V. Sharakhova, Terrance Shea, Felipe A. Simão, Frédéric Simard, Michel A. Slotman, Pradya Somboon, V. N. Stegniy, Claudio J. Struchiner, Gregg W.C. Thomas, Marta Tojo, Pantelis Topalis, Jose M. C. Tubio, Maria F. Unger, John Vontas, Catherine Walton, Craig S. Wilding, Judith H. Willis, Yi-Chieh Wu, Guiyun Yan, Evgeny M. Zdobnov, Xiaofan Zhou, Flaminia Catteruccia, George K. Christophides, Frank H. Collins, Robert S. Cornman, Andrea Crisanti, Martin J. Donnelly, Scott J. Emrich, Michael C. Fontaine, William M. Gelbart, Matthew W. Hahn, Immo A. Hansen, Paul I. Howell, Fotis C. Kafatos, Manolis Kellis, Daniel Lawson, Christos Louis, Shirley Luckhart, Marc A. T. Muskavitch, José M. C. Ribeiro, Michael A. Riehle, Igor V. Sharakhov, Zhijian Tu, Laurence J. Zwiebel, Nora J. Besansky
TL;DR: To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, the authors sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila.
References
Journal Article
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal Article
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Journal Article
Trimmomatic: a flexible trimmer for Illumina sequence data
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.