Open Access | Posted Content

De novo Identification of DNA Modifications Enabled by Genome-Guided Nanopore Signal Processing

Abstract
Advances in nanopore sequencing technology have enabled investigation of the full catalogue of covalent DNA modifications. We present the first algorithm for the identification of modified nucleotides without the need for prior training data, along with the open-source software implementation, nanoraw. Nanoraw accurately assigns contiguous raw nanopore signal to genomic positions, enabling novel data visualization and increasing power and accuracy for the discovery of covalently modified bases in native DNA. Ground-truth case studies utilizing synthetically methylated DNA show the capacity to identify three distinct methylation marks, 4mC, 5mC, and 6mA, in seven distinct sequence contexts without any changes to the algorithm. We demonstrate quantitative reproducibility by simultaneously identifying 5mC and 6mA in native E. coli across biological replicates processed in different labs. Finally, we propose a pipeline for the comprehensive discovery of DNA modifications in any genome without a priori knowledge of their chemical identities.

Citations
Journal Article

The Architecture of SARS-CoV-2 Transcriptome.

TL;DR: Functional investigation of the unknown transcripts and RNA modifications discovered in this study will open new directions to the understanding of the life cycle and pathogenicity of SARS-CoV-2.
Journal Article

Opportunities and challenges in long-read sequencing data analysis.

TL;DR: The current landscape of available tools is reviewed, the principles of error correction, base modification detection, and long-read transcriptomics analysis are focused on, and the challenges that remain are highlighted.
Journal Article

From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy.

TL;DR: Computational approaches determining the nanopore sequencing error rate are reviewed, and strategies for translation of raw sequencing data into base calls for detection of base modifications and for obtaining consensus sequences are outlined.
Journal Article

Long-read human genome sequencing and its applications.

TL;DR: The currently available platforms, how the technologies are being applied to assemble and phase human genomes, and their impact on improving the authors' understanding of human genetic variation are discussed.
Journal Article

The RNA modification landscape in human disease.

TL;DR: This work summarizes the state of knowledge and provides a catalog of RNA modifications and their links to neurological disorders, cancers, and other diseases, expecting that this catalog will help prioritize those RNA modifications for transcriptome-wide maps.