Journal

bioRxiv 

About: bioRxiv is a preprint server for biology, listed here as a journal. It publishes mainly in the areas of Population and Gene. Over its lifetime, it has published 154,314 publications, which have received 439,493 citations. It is also known as bioRxiv.org: the preprint server for biology, and as bioRxivorg.
Topics: Population, Gene, Genome, Chromatin, RNA


Papers
Posted Content
17 Nov 2014-bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

2,229 citations

Posted Content
30 Oct 2015-bioRxiv
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.

1,552 citations

Posted Content
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved model of human mutation, this work classifies human protein-coding genes along a spectrum representing intolerance to inactivation, validates this classification using data from model organisms and engineered human cells, and shows that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,037 citations

Posted Content
20 Jun 2016-bioRxiv
TL;DR: It is shown that hundreds of thousands of permutations can be made in a few minutes, yielding very accurate p-values; this in turn allows applying standard FDR correction procedures, which are more accurate than the ones currently used.
Abstract: Gene set enrichment analysis is a widely used tool for analyzing gene expression data. However, current implementations are slow due to a large number of required samples for the analysis to have a good statistical power. In this paper we present a novel algorithm, that efficiently reuses one sample multiple times and thus speeds up the analysis. We show that it is possible to make hundreds of thousands permutations in a few minutes, which leads to very accurate p-values. This, in turn, allows applying standard FDR correction procedures, which are more accurate than the ones currently used. The method is implemented in a form of an R package and is freely available at https://github.com/ctlab/fgsea.

788 citations

Posted Content
11 Feb 2020-bioRxiv
TL;DR: The Coronavirus Study Group (CSG) of the International Committee on Taxonomy of Viruses assessed the novelty of the human pathogen tentatively named 2019-nCoV; it formally recognizes this virus as a sister to severe acute respiratory syndrome coronaviruses (SARS-CoVs) and designates it as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Abstract: The present outbreak of lower respiratory tract infections, including respiratory distress syndrome, is the third spillover, in only two decades, of an animal coronavirus to humans resulting in a major epidemic. Here, the Coronavirus Study Group (CSG) of the International Committee on Taxonomy of Viruses, which is responsible for developing the official classification of viruses and taxa naming (taxonomy) of the Coronaviridae family, assessed the novelty of the human pathogen tentatively named 2019-nCoV. Based on phylogeny, taxonomy and established practice, the CSG formally recognizes this virus as a sister to severe acute respiratory syndrome coronaviruses (SARS-CoVs) of the species Severe acute respiratory syndrome-related coronavirus and designates it as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To facilitate communication, the CSG further proposes to use the following naming convention for individual isolates: SARS-CoV-2/Isolate/Host/Date/Location. The spectrum of clinical manifestations associated with SARS-CoV-2 infections in humans remains to be determined. The independent zoonotic transmission of SARS-CoV and SARS-CoV-2 highlights the need for studying the entire (virus) species to complement research focused on individual pathogenic viruses of immediate significance. This research will improve our understanding of virus-host interactions in an ever-changing environment and enhance our preparedness for future outbreaks.

781 citations

Network Information
Related Journals (5)
Current Biology: 18K papers, 1.2M citations, 88% related
Nature Communications: 41.3K papers, 2.5M citations, 87% related
Genome Research: 5.3K papers, 834K citations, 87% related
PLOS ONE: 252.9K papers, 7.2M citations, 86% related
Bioinformatics: 16.1K papers, 1.7M citations, 86% related
Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2021    37,102
2020    42,689
2019    31,470
2018    22,828
2017    12,258
2016    5,058