scispace - formally typeset
Author

Michael I. Love

Bio: Michael I. Love is a academic researcher at University of North Carolina at Chapel Hill who has co-authored 112 publication(s) receiving 44018 citation(s). The author has an hindex of 31. Previous affiliations of Michael I. Love include University of California, San Francisco & Harvard University. The author has done significant research in the topic(s): Bioconductor & Expression quantitative trait loci.

...read more

Papers
  More

Open accessJournal ArticleDOI: 10.1186/S13059-014-0550-8
05 Dec 2014-Genome Biology
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

...read more

Topics: MRNA Sequencing (54%), Integrator complex (51%), Count data (50%) ...read more

29,675 Citations


Open accessJournal ArticleDOI: 10.1038/NMETH.4197
Rob Patro1, Geet Duggal, Michael I. Love2, Rafael A. Irizarry2  +1 moreInstitutions (3)
01 Apr 2017-Nature Methods
Abstract: We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.

...read more

3,535 Citations


Open accessPosted ContentDOI: 10.1101/002832
17 Nov 2014-bioRxiv
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

...read more

  • Figure 2 Effect of shrinkage on logarithmic fold change estimates. Plots of the (A)MLE (i.e., no shrinkage) and (B)MAP estimate (i.e., with shrinkage) for the LFCs attributable to mouse strain, over the average expression strength for a ten vs eleven sample comparison of the Bottomly et al. [16] dataset. Small triangles at the top and bottom of the plots indicate points that would fall outside of the plotting window. Two genes with similar mean count and MLE logarithmic fold change are highlighted with green and purple circles. (C) The counts (normalized by size factors sj) for these genes reveal low dispersion for the gene in green and high dispersion for the gene in purple. (D) Density plots of the likelihoods (solid lines, scaled to integrate to 1) and the posteriors (dashed lines) for the green and purple genes and of the prior (solid black line): due to the higher
    Figure 2 Effect of shrinkage on logarithmic fold change estimates. Plots of the (A)MLE (i.e., no shrinkage) and (B)MAP estimate (i.e., with shrinkage) for the LFCs attributable to mouse strain, over the average expression strength for a ten vs eleven sample comparison of the Bottomly et al. [16] dataset. Small triangles at the top and bottom of the plots indicate points that would fall outside of the plotting window. Two genes with similar mean count and MLE logarithmic fold change are highlighted with green and purple circles. (C) The counts (normalized by size factors sj) for these genes reveal low dispersion for the gene in green and high dispersion for the gene in purple. (D) Density plots of the likelihoods (solid lines, scaled to integrate to 1) and the posteriors (dashed lines) for the green and purple genes and of the prior (solid black line): due to the higher
  • Figure 3 Stability of logarithmic fold changes. DESeq2 is run on equally split halves of the data of Bottomly et al. [16], and the LFCs from the halves are plotted against each other. (A)MLEs, i.e., without LFC shrinkage. (B)MAP estimates, i.e., with shrinkage. Points in the top left and bottom right quadrants indicate genes with a change of sign of LFC. Red points indicate genes with adjusted P value < 0.1. The legend displays the root-mean-square error of the estimates in group I compared to those in group II. LFC, logarithmic fold change; MAP, maximum a posteriori; MLE,
    Figure 3 Stability of logarithmic fold changes. DESeq2 is run on equally split halves of the data of Bottomly et al. [16], and the LFCs from the halves are plotted against each other. (A)MLEs, i.e., without LFC shrinkage. (B)MAP estimates, i.e., with shrinkage. Points in the top left and bottom right quadrants indicate genes with a change of sign of LFC. Red points indicate genes with adjusted P value < 0.1. The legend displays the root-mean-square error of the estimates in group I compared to those in group II. LFC, logarithmic fold change; MAP, maximum a posteriori; MLE,
  • Figure 8 Sensitivity estimated from experimental reproducibility. Each algorithm’s sensitivity in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
    Figure 8 Sensitivity estimated from experimental reproducibility. Each algorithm’s sensitivity in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
  • Figure 9 Precision estimated from experimental reproducibility. Each algorithm’s precision in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
    Figure 9 Precision estimated from experimental reproducibility. Each algorithm’s precision in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
Topics: Count data (53%), Bioconductor (53%), Fold change (51%)

2,229 Citations


Open accessJournal ArticleDOI: 10.1038/NMETH.3252
Wolfgang Huber, Vincent J. Carey1, Robert Gentleman2, Simon Anders  +22 moreInstitutions (13)
01 Feb 2015-Nature Methods
Abstract: Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

...read more

Topics: Bioconductor (71%)

2,202 Citations


Open accessJournal ArticleDOI: 10.12688/F1000RESEARCH.7563.1
30 Dec 2015-F1000Research
Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package ( tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

...read more

Topics: RNA-Seq (52%)

1,587 Citations


Cited by
  More

Open accessJournal ArticleDOI: 10.1186/S13059-014-0550-8
05 Dec 2014-Genome Biology
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

...read more

Topics: MRNA Sequencing (54%), Integrator complex (51%), Count data (50%) ...read more

29,675 Citations


Open accessJournal ArticleDOI: 10.1093/BIOINFORMATICS/BTU638
15 Jan 2015-Bioinformatics
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as an opensource software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de

...read more

11,833 Citations


Open accessJournal ArticleDOI: 10.1093/BIOINFORMATICS/BTT656
Yang Liao1, Gordon K. Smyth1, Wei Shi1Institutions (1)
01 Apr 2014-Bioinformatics
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

...read more

8,495 Citations


Open accessJournal ArticleDOI: 10.1016/J.CELL.2019.05.031
13 Jun 2019-Cell
Abstract: Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function. Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations. Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns. Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

...read more

3,853 Citations


Open access
01 Feb 2015-
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

...read more

3,734 Citations


Performance
Metrics

Author's H-index: 31

No. of papers from the Author in previous years
YearPapers
202125
202023
201915
201811
20177
20169

Top Attributes

Show by:

Author's top 5 most impactful journals

bioRxiv

35 papers, 2.4K citations

Genome Biology

13 papers, 31K citations

F1000Research

7 papers, 1.8K citations

Bioinformatics

5 papers, 485 citations

Nucleic Acids Research

4 papers, 111 citations

Network Information
Related Authors (5)
Arjun Bhattacharya

23 papers, 134 citations

84% related
Stefan A. Haas

86 papers, 11.7K citations

83% related
Hugues Richard

36 papers, 2.9K citations

83% related
Hao Hu

39 papers, 2.3K citations

81% related
C Jensen

11 papers, 1.2K citations

79% related