Topic

2 base encoding

About: 2 base encoding is a research topic. Over its lifetime, 210 publications have been published within this topic, receiving 81,608 citations.


Papers
Proceedings ArticleDOI
01 Feb 2017
TL;DR: This paper proposes an approach that takes a DNA sequencing dataset as input and splits it across the machines of a cluster using Hadoop's MapReduce implementation, making search efficient for large-scale genome sequencing applications.
Abstract: Genome sequencing is the technique that allows researchers to read and convert the genetic information found in the DNA of any organism. It involves determining the order of the nucleotide subunits found in DNA, which is read as fragments of a small number of bases called short reads. The human genome is approximately 3 billion bases in length, and processing it on a single machine would take months or years. Such sequencing therefore yields large numbers of short reads. In these cases, the first step in the data analysis pipeline is the short read mapping problem. Speed is becoming increasingly important and challenging due to the huge volume of data. In this paper, we propose an approach that takes a DNA sequencing dataset as input and splits it across the machines of a cluster using Hadoop's MapReduce implementation, making search efficient for large-scale genome sequencing applications.
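The map and reduce roles described in this abstract can be illustrated with a small in-process sketch. It is not the paper's implementation and does not use Hadoop: the function names (`map_read`, `shuffle`, `reduce_position`, `map_reads`) are hypothetical, the reference and reads are toy strings, and exact substring matching stands in for whatever alignment method the authors use.

```python
from collections import defaultdict


def map_read(read_id, read, reference):
    """Map step: emit (position, read_id) for every exact match of the read."""
    pairs = []
    start = reference.find(read)
    while start != -1:
        pairs.append((start, read_id))
        start = reference.find(read, start + 1)
    return pairs


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_position(position, read_ids):
    """Reduce step: collect the reads that map to one reference position."""
    return position, sorted(read_ids)


def map_reads(reads, reference):
    """Run map, shuffle, and reduce in a single process (assumed toy driver)."""
    emitted = []
    for read_id, read in reads.items():
        emitted.extend(map_read(read_id, read, reference))
    return dict(reduce_position(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    reads = {"r1": "ACGT", "r2": "GTT", "r3": "CCC"}
    print(map_reads(reads, reference))
```

Because each call to `map_read` depends only on one read and the reference, the map step could in principle be distributed across cluster nodes, which is the property a MapReduce framework exploits.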
Book ChapterDOI
01 Jan 2015
TL;DR: While these fourth-generation technologies are years away from widespread clinical use, they provide a glimpse into the ever more sophisticated utilization of synthetic materials and advanced electronics that will continue to make DNA sequence analysis even faster and less costly.
Abstract: The DNA sequencing platforms that are currently in widespread use to perform massively parallel sequencing, which as a group are currently referred to as next-generation sequencing (NGS) platforms, have enabled the genomic revolution in science and medicine. However, current NGS platforms do not represent the final stage of development of DNA sequencing technologies. A number of so-called third-generation approaches, which are already available commercially, make it possible to sequence individual DNA molecules without the need for library amplification steps. These approaches offer a number of advantages over current NGS methods including avoidance of the artifactual DNA mutations and strand biases introduced by even limited cycles of PCR; higher throughput and faster turnaround times; longer read lengths (by some platforms) that enhance de novo contig and genome assembly; higher consensus accuracy; and analysis of smaller quantities of nucleic acids which is a clear advantage in clinical settings. However, the third-generation approaches are themselves transitional to fourth-generation techniques that, while largely still in developmental phases, rely on entirely different principles of chemistry and physics to produce DNA sequence. While these fourth-generation technologies are years away from widespread clinical use, they provide a glimpse into the ever more sophisticated utilization of synthetic materials and advanced electronics that will continue to make DNA sequence analysis even faster and less costly.
Book ChapterDOI
Simon Dear
24 Feb 1998
TL;DR: The process of reconstruction of the sequence of a clone from the sequences of its many subclones is described in this chapter in the context of large-scale sequencing of the human genome and the key informatics problems with reference to the software that has been developed to address them.
Abstract: The accurate determination of the sequence of nucleotide bases in a genomic region involves many steps, the last of which is DNA sequencing. Current sequencing methods rely on the electrophoretic separation of their reaction products, the resolution of which is considerably less than the typical size of the clone being sequenced. The process of reconstruction of the sequence of a clone from the sequences of its many subclones is described in this chapter in the context of large-scale sequencing of the human genome. It also discusses the key informatics problems with reference to the software that has been developed to address them. Several Unix-based packages and components have been developed for gel image processing, mostly by large genome centers with the aim of streamlining their operations and improving the quality of their sequence data. The first step in the reconstruction of the sequence of the clone is to identify and remove all sequencing vector, because it is a product of subcloning. Several approaches to automated vector clipping have been developed, such as the program VECTOR_CLIP. PREGAP, a part of the Staden package, can take a batch of data from a variety of sequencing machines, gather information required for processing the reading, identify the good quality data, and mark sequencing and cloning vector and Alu repeats. Cloning vector will not be present in the regions that are already identified as sequencing vector, and, if it is present at all, it will comprise one end or all of the reading. Both VECTOR_CLIP and CROSS_MATCH perform this function well.
Posted ContentDOI
28 Dec 2015 - bioRxiv
TL;DR: The method presented in this manuscript constructs copies of a nucleic acid molecule that are consecutively connected to that molecule. These copies can be sequenced by a nanopore device, enabling replicate reads and thus improving overall sequencing accuracy.
Abstract: Sequencing at single-nucleotide resolution using nanopore devices is performed with reported error rates of 10.5-20.7% (Ip et al., 2015). Since errors occur randomly during sequencing, repeating the sequencing procedure for the same DNA strands several times can generate sequencing results based on consensus derived from replicate readings, thus reducing overall error rates. The method presented in this manuscript constructs copies of a nucleic acid molecule that are consecutively connected to the nucleic acid molecule. Such copies are useful because they can be sequenced by a nanopore device, enabling replicate reads, thus improving overall sequencing accuracy.
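The idea of deriving a consensus from replicate readings can be sketched as a per-position majority vote. This is a simplified illustration, not the manuscript's method: the function name `consensus` is hypothetical, and the replicates are assumed to be already aligned and of equal length, which real nanopore reads with insertions and deletions would not be.

```python
from collections import Counter


def consensus(replicates):
    """Return the per-position majority base across pre-aligned replicate reads.

    Ties are resolved in favor of the base seen first in that column.
    """
    if not replicates:
        return ""
    length = len(replicates[0])
    if any(len(read) != length for read in replicates):
        raise ValueError("replicate reads must be pre-aligned to equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*replicates)
    )


if __name__ == "__main__":
    # Each replicate carries at most one random error; the vote recovers ACGT.
    print(consensus(["ACGT", "ACCT", "AGGT"]))
```

With independent random errors, the chance that the same wrong base wins the vote at a given position falls as more replicates are added, which is the reasoning behind replicate reads.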
Book ChapterDOI
15 Jul 2005
TL;DR: Overall, single molecule array (SMA)-based sequencing will enable analysis of individual whole human genomes in a single experiment to achieve unprecedented levels of cost reduction and increased throughput.
Abstract: Single molecule array-based sequencing permits simultaneous analysis of potentially hundreds of millions of single molecules of DNA per cm² in an array-based format. In this massively parallel approach to DNA sequence acquisition, reagents are utilized in an extremely efficient manner to achieve unprecedented levels of cost reduction and increased throughput. Ultimately, single molecule array (SMA)-based sequencing will enable analysis of individual whole human genomes in a single experiment.
Keywords: sequencing; resequencing; whole-genome sequencing; single molecule array; clustered arrays; sequencing chemistry; fluorescence detection; SNP detection; genotyping; genome variation

Network Information
Related Topics (5)
Genome
74.2K papers, 3.8M citations
76% related
Gene
211.7K papers, 10.3M citations
73% related
Regulation of gene expression
85.4K papers, 5.8M citations
72% related
RNA
111.6K papers, 5.4M citations
72% related
DNA
107.1K papers, 4.7M citations
71% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2018    1
2017    9
2016    18
2015    22
2014    7
2013    25