Topic

Contig

About: Contig is a research topic. Over its lifetime, 3,106 publications have been published within this topic, receiving 146,470 citations.


Papers
Journal Article
TL;DR: This work presents an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads and demonstrates that it can provide data that are at least as good as that generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
Abstract: Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to the sequencing of millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as that generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.

5,417 citations
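
The quality score-based correction of discrepancies between overlapping reads mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the two reads have already been aligned over their overlap and that the reverse read has been reverse-complemented. The function name `merge_overlap` and the `min_delta` threshold are hypothetical choices for illustration, not parameters taken from the paper.

```python
def merge_overlap(bases_fwd, quals_fwd, bases_rev, quals_rev, min_delta=6):
    """Build a consensus over an aligned overlap of two reads.

    bases_* are equal-length strings; quals_* are Phred scores (ints).
    min_delta is an assumed, illustrative quality-difference threshold.
    """
    if not (len(bases_fwd) == len(quals_fwd) == len(bases_rev) == len(quals_rev)):
        raise ValueError("bases and qualities must all have the same length")

    consensus = []
    for bf, qf, br, qr in zip(bases_fwd, quals_fwd, bases_rev, quals_rev):
        if bf == br:
            # Both reads agree, so keep the shared base.
            consensus.append(bf)
        elif abs(qf - qr) >= min_delta:
            # Reads disagree: trust the clearly higher-quality call.
            consensus.append(bf if qf > qr else br)
        else:
            # Reads disagree with similar quality: mark as ambiguous.
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    print(merge_overlap("ACGT", [30, 30, 30, 30], "ACTT", [30, 30, 10, 30]))
```

In this sketch a mismatch is resolved only when one base call is clearly more reliable than the other; otherwise the position is left ambiguous rather than guessed.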

Journal Article
TL;DR: The third generation of the CAP sequence assembly program is described, which has a capability to clip 5' and 3' low-quality regions of reads and uses forward-reverse constraints to correct assembly errors and link contigs.
Abstract: The shotgun sequencing strategy has been used widely in genome sequencing projects. A major phase in this strategy is to assemble short reads into long sequences. A number of DNA sequence assembly programs have been developed (Staden 1980; Peltola et al. 1984; Huang 1992; Smith et al. 1993; Gleizes and Henaut 1994; Lawrence et al. 1994; Kececioglu and Myers 1995; Sutton et al. 1995; Green 1996). The FAKII program provides a library of routines for each phase of the assembly process (Larson et al. 1996). The GAP4 program has a number of useful interactive features (Bonfield et al. 1995). The PHRAP program clips 5′ and 3′ low-quality regions of reads and uses base quality values in evaluation of overlaps and generation of contig sequences (Green 1996). TIGR Assembler has been used in a number of megabase microbial genome projects (Sutton et al. 1995). Continued development and improvement of sequence assembly programs are required to meet the challenges of the human, mouse, and maize genome projects. We have developed the third generation of the CAP sequence assembly program (Huang 1992). The CAP3 program includes a number of improvements and new features. A capability to clip 5′ and 3′ low-quality regions of reads is included in the CAP3 program. Base quality values produced by PHRED (Ewing et al. 1998) are used in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. Efficient algorithms are employed to identify and compute overlaps between reads. Forward–reverse constraints are used to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. 
It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward–reverse constraints. An unusual feature of CAP3 is the use of forward–reverse constraints in the construction of contigs. A forward–reverse constraint is often produced by sequencing of both ends of a subclone. A forward–reverse constraint specifies that the two reads should be on the opposite strands of the DNA molecule within a specified range of distance. By sequencing both ends of each subclone, a large number of forward–reverse constraints are produced for a cosmid or BAC data set. A difficulty with use of forward–reverse constraints in assembly is that some of the forward–reverse constraints are incorrect because of errors in lane tracking and cloning. Our strategy for dealing with this difficulty is based on the observation that a majority of the constraints are correct and wrong constraints usually occur randomly. Thus, a few unsatisfied constraints in a contig may not be sufficient to indicate an assembly error in the contig. However, if a sufficient number of constraints are all inconsistent with a join in a contig and all support an alternative join, it is likely that the current join is an error, and the alternative join should be made.

5,074 citations
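
The forward-reverse constraint described in the abstract above (two reads on opposite strands within a specified distance range) can be sketched as a simple check. This is a simplified Python illustration, not CAP3's implementation: the `Placement` record, the function names, and the way the distance is measured (from the start of the forward read to the end of the reverse read) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a read was placed in a contig."""
    start: int   # leftmost contig coordinate covered by the read
    end: int     # rightmost contig coordinate covered by the read
    strand: str  # "+" or "-"


def satisfies(a, b, min_dist, max_dist):
    """Return True if placements a and b meet a forward-reverse constraint."""
    if a.strand == b.strand:
        # The two reads must lie on opposite strands.
        return False
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    # Assumed distance measure: span from forward start to reverse end.
    span = rev.end - fwd.start
    return min_dist <= span <= max_dist


def count_unsatisfied(constraints, placements):
    """Count constraints (read_a, read_b, min_dist, max_dist) that fail."""
    bad = 0
    for read_a, read_b, min_dist, max_dist in constraints:
        if not satisfies(placements[read_a], placements[read_b], min_dist, max_dist):
            bad += 1
    return bad
```

Following the reasoning in the abstract, a caller would not treat a single failed constraint as an assembly error; the count of unsatisfied constraints would be compared against some threshold before a join is reconsidered.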

Journal Article
TL;DR: The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Abstract: Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

2,760 citations
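
The N50 contig size reported in the abstract above is a standard assembly statistic: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch follows; the function name `n50` is chosen for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where half the total is covered.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

The same function applies to scaffold lengths, which is how a scaffold N50 such as the one quoted above would be obtained.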

Journal Article
TL;DR: Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy.
Abstract: Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that sequencing depth of different regions of a genome or genomes from different species are highly uneven. Most existing genome assemblers usually have an assumption that sequencing depths are even. These assemblers fail to construct correct long contigs. Results: We introduce the IDBA-UD algorithm that is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confident contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets, shows that IDBA-UD can reconstruct longer contigs with higher accuracy. Availability: The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud Contact: chin@cs.hku.hk

2,494 citations
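
The idea of a depth-relative threshold, as opposed to a single global cutoff, can be illustrated with a toy Python sketch. This is a deliberate simplification and not the IDBA-UD algorithm: it compares each k-mer's count with the highest count among its de Bruijn graph neighbors (k-mers overlapping it by k-1 bases) and ignores reverse complements. The function names and the `ratio` parameter are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbors(kmer):
    """Yield possible successors and predecessors in the de Bruijn graph."""
    for base in "ACGT":
        yield kmer[1:] + base   # successor: shift left, append a base
        yield base + kmer[:-1]  # predecessor: prepend a base, shift right


def filter_relative(counts, ratio=0.1):
    """Keep k-mers whose depth is at least `ratio` times the local depth.

    Local depth is taken here as the highest count among neighboring
    k-mers; `ratio` is an assumed, illustrative value.
    """
    kept = {}
    for kmer, depth in counts.items():
        local = max(counts.get(n, 0) for n in neighbors(kmer))
        if depth >= ratio * local:
            kept[kmer] = depth
    return kept
```

With this kind of rule, a rare k-mer branching off a high-depth path is discarded as a likely error, while k-mers from a uniformly low-depth region survive because their depth is judged against their own neighborhood.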

Journal Article
09 Aug 1991 - Science
TL;DR: The APC gene was identified in a contig initiated from the MCC gene and was found to encode an unusually large protein, and these two closely spaced genes encode proteins predicted to contain coiled-coil regions, which were also expressed in a wide variety of tissues.
Abstract: Recent studies suggest that one or more genes on chromosome 5q21 are important for the development of colorectal cancers, particularly those associated with familial adenomatous polyposis (FAP). To facilitate the identification of genes from this locus, a portion of the region that is tightly linked to FAP was cloned. Six contiguous stretches of sequence (contigs) containing approximately 5.5 Mb of DNA were isolated. Subclones from these contigs were used to identify and position six genes, all of which were expressed in normal colonic mucosa. Two of these genes (APC and MCC) are likely to contribute to colorectal tumorigenesis. The MCC gene had previously been identified by virtue of its mutation in human colorectal tumors. The APC gene was identified in a contig initiated from the MCC gene and was found to encode an unusually large protein. These two closely spaced genes encode proteins predicted to contain coiled-coil regions. Both genes were also expressed in a wide variety of tissues. Further studies of MCC and APC and their potential interaction should prove useful for understanding colorectal neoplasia.

2,364 citations


Network Information
Related Topics (5)
Genome: 74.2K papers, 3.8M citations, 94% related
Gene: 211.7K papers, 10.3M citations, 91% related
Gene expression: 113.3K papers, 5.5M citations, 86% related
Regulation of gene expression: 85.4K papers, 5.8M citations, 85% related
Complementary DNA: 55.3K papers, 2.7M citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    224
2022    450
2021    89
2020    100
2019    108
2018    78