Journal•ISSN: 2692-8205

bioRxiv

About: bioRxiv is an academic journal that publishes mainly in the areas of Biology and Population. Its ISSN is 2692-8205. Over its lifetime, it has published 200,629 publications, which have received 992,562 citations. The journal is also known as: bioRxiv.org : the preprint server for biology & bioRxivorg.

Topics: Biology, Population, Medicine, Gene, Cell biology

Papers

Posted Content•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹, Wolfgang Huber, Simon Anders•Institutions (1)

Harvard University¹

17 Nov 2014-bioRxiv

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

Posted Content•DOI•

fastp: an ultra-fast all-in-one FASTQ preprocessor

Shifu Chen¹, Yanqing Zhou, Yaru Chen, Jia Gu¹•Institutions (1)

Chinese Academy of Sciences¹

01 Mar 2018-bioRxiv

TL;DR: fastp is an ultra-fast FASTQ preprocessor that performs quality control, adapter trimming, quality filtering, per-read quality cutting, and many other operations in a single scan of the FASTQ data.

Abstract: Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming, and quality filtering. These tools are often insufficiently fast as most are developed using high level programming languages (e.g., Python and Java) and provide limited multithreading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per read quality cutting, and many other operations with a single scan of the FASTQ data. It also supports unique molecular identifier preprocessing, poly tail trimming, output splitting, and base correction for paired-end data. It can automatically detect adapters for single-end and paired-end FASTQ data. This tool is developed in C++ and has multithreading support. Based on our evaluation, fastp is 2 to 5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and Implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp

4,300 citations

Posted Content•DOI•

Integrated analysis of multimodal single-cell data

Yuhan Hao¹, Stephanie Hao², Erica Andersen-Nissen³, William M. Mauck¹, Shiwei Zheng¹, Andrew Butler¹, Maddie Jane Lee⁴, Aaron J. Wilk⁴, Charlotte A. Darby¹, Michael Zagar³, Paul Hoffman¹, Marlon Stoeckius², Efthymia Papalexi¹, Eleni P. Mimitou², Jaison Jain¹, Avi Srivastava¹, Tim Stuart¹, Lamar Ballweber Fleming³, Bertrand Z. Yeung, Angela J. Rogers⁴, Juliana M. McElrath³, Catherine A. Blish⁴, Raphael Gottardo³, Peter Smibert², Rahul Satija¹•Institutions (4)

New York University¹, Harvard University², Fred Hutchinson Cancer Research Center³, Stanford University⁴

12 Oct 2020-bioRxiv

TL;DR: This work introduces ‘weighted-nearest neighbor’ analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities.

Abstract: The simultaneous measurement of multiple modalities, known as multimodal analysis, represents an exciting frontier for single-cell genomics and necessitates new computational methods that can define cellular states based on multiple data types. Here, we introduce ‘weighted-nearest neighbor’ analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities. We apply our procedure to a CITE-seq dataset of hundreds of thousands of human white blood cells alongside a panel of 228 antibodies to construct a multimodal reference atlas of the circulating immune system. We demonstrate that integrative analysis substantially improves our ability to resolve cell states and validate the presence of previously unreported lymphoid subpopulations. Moreover, we demonstrate how to leverage this reference to rapidly map new datasets, and to interpret immune responses to vaccination and COVID-19. Our approach represents a broadly applicable strategy to analyze single-cell multimodal datasets, including paired measurements of RNA and chromatin state, and to look beyond the transcriptome towards a unified and multimodal definition of cellular identity. Availability: Installation instructions, documentation, tutorials, and CITE-seq datasets are available at http://www.satijalab.org/seurat

2,924 citations

Posted Content•DOI•

Comprehensive integration of single cell data

Tim Stuart, Andrew Butler¹, Paul J. Hoffman, Christoph Hafemeister, Efthymia Papalexi¹, William M. Mauck¹, Marlon Stoeckius², Peter Smibert², Rahul Satija¹•Institutions (2)

New York University¹, Harvard University²

02 Nov 2018-bioRxiv

TL;DR: This work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets, and demonstrates how anchoring can harmonize in-situ gene expression and scRNA-seq datasets.

Abstract: Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

2,037 citations

Posted Content•DOI•

Unicycler: resolving bacterial genome assemblies from short and long sequencing reads

Ryan R. Wick¹, Louise M. Judd¹, Claire L. Gorrie¹, Kathryn E. Holt¹•Institutions (1)

University of Melbourne¹

22 Dec 2016-bioRxiv

TL;DR: Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long read depth and accuracy are low.

Abstract: The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce more complete genome assemblies, but the sequencing is more expensive and error prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler utilises a novel semi-global aligner, which is used to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.

1,750 citations

Performance Metrics

Papers: 200,629

Citations: 992,562

No. of papers from the Journal in previous years
Year	Papers
2023	21,617
2022	24,698
2021	37,102
2020	42,689
2019	31,470
2018	22,828