Showing papers by "Yingrui Li" published in 2009


Journal Article
TL;DR: SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate and is compatible with both single- and paired-end reads.
Abstract: SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows-Wheeler Transform (BWT) compression index in place of the seed strategy for indexing the reference sequence in main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed 20- to 30-fold. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignment of short reads on a reference genome.
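
To make the idea of a BWT index concrete, here is a minimal Python sketch of exact read matching by backward search over a Burrows-Wheeler transform. It is an illustration only, not SOAP2's implementation: the function names are hypothetical, the suffix array is built by naive sorting (impractical for a human genome), the occurrence table is stored uncompressed, and mismatches, gaps, and paired-end logic are not handled.

```python
def bwt_with_suffix_array(text):
    """Return (bwt, suffix_array) for text + '$', built by naive suffix sorting."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for a suffix is the character preceding it (wrapping around).
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Build the two tables used by backward search.

    C[c]      = number of characters in the text that sort before c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_exact(reference, read):
    """Return sorted 0-based positions where read occurs exactly in reference."""
    bwt, sa = bwt_with_suffix_array(reference)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(read, C, occ, len(bwt))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    print(find_exact("ACGTACGTTACG", "ACG"))
```

Each step of the search costs two table lookups regardless of reference length, which is why an index of this kind can replace a seed hash table; production aligners additionally sample and compress the tables to keep memory low.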

3,502 citations


Journal Article
TL;DR: This study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo, and identifies 686 gene clusters related to phloem function.
Abstract: Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage. The absence of recent whole-genome duplication, along with the presence of few tandem duplications, explains the small number of genes in the cucumber. Our study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo. The sequenced cucumber genome affords insight into traits such as its sex expression, disease resistance, biosynthesis of cucurbitacin and 'fresh green' odor. We also identify 686 gene clusters related to phloem function. The cucumber genome provides a valuable resource for developing elite cultivars and for studying the evolution and function of the plant vascular system.

1,289 citations


Journal Article
TL;DR: A consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology that has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
Abstract: Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36× coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
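
As a rough illustration of how a single Bayesian quality score per base can be derived, the sketch below computes posterior probabilities for the ten diploid genotypes at one site from observed bases and their Phred qualities. It is a simplified model under stated assumptions, not the published method: the error model uses base quality only (no alignment or experimental error terms), the default prior is flat, and all function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, AT, CC, ...
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def base_likelihood(observed, allele, error):
    """P(observed base | true allele), assuming errors are uniform over the other 3 bases."""
    return 1.0 - error if observed == allele else error / 3.0


def genotype_posteriors(observations, prior=None):
    """Posterior over genotypes given a list of (base, phred_quality) observations.

    prior maps each genotype to a strictly positive probability; flat if omitted.
    """
    if prior is None:
        prior = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
    log_post = {}
    for g in GENOTYPES:
        lp = math.log(prior[g])
        for base, q in observations:
            # Phred quality to error probability, capped at 0.75 (a random base).
            e = min(10 ** (-q / 10), 0.75)
            # Each read samples one of the two alleles with equal probability.
            lik = 0.5 * base_likelihood(base, g[0], e) + 0.5 * base_likelihood(base, g[1], e)
            lp += math.log(lik)
        log_post[g] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    unnorm = {g: math.exp(v - m) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def call_consensus(observations, prior=None):
    """Return (best_genotype, phred_scaled_quality) for one site."""
    post = genotype_posteriors(observations, prior)
    best = max(post, key=post.get)
    err = max(1.0 - post[best], 1e-10)  # floor avoids log10(0)
    return best, round(-10 * math.log10(err))


if __name__ == "__main__":
    print(call_consensus([("A", 30)] * 5 + [("G", 30)] * 5))
```

Passing a non-flat `prior` (for example, one that up-weights a known dbSNP allele) is where prior information of the kind mentioned for low-depth data would enter this simplified model.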

968 citations


Journal Article
16 Oct 2009 - Science
TL;DR: It is found that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals.
Abstract: A single-base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ∼16 million single-nucleotide polymorphisms, many indels, and structural variations. We find that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals. We also identified signals of selection at 354 candidate genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms as efficient bioreactors.

337 citations


Journal Article
TL;DR: The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome, and is currently one of three personal genome databases, organizing the original data and analysis results in a user-friendly interface.
Abstract: The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, the MapView displays the genome sequences, SNPs, and sequencing reads. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol, and disease name. A BLAST web service is also provided for aligning a query sequence against the YH genome consensus. The YH database is currently one of three personal genome databases; it organizes the original data and analysis results in a user-friendly interface, as a step toward the fundamental goals of personal medicine. The database is available at http://yh.genomics.org.cn.

40 citations


Journal Article
TL;DR: Advances in parallelization allow a human genome to be sequenced using single-molecule technology.
Abstract: Advances in parallelization allow a human genome to be sequenced using single-molecule technology.

20 citations


Journal Article
TL;DR: In this paper, a new energy optimization design has been proposed to decrease the auxiliary system energy consumption and improve the product quality in the auxiliary riser fluid catalytic cracking (ARFCC) process.
Abstract: In view of the high energy consumption inherent in the auxiliary riser fluid catalytic cracking (ARFCC) process, a new energy optimization design is proposed in this paper to decrease its auxiliary system energy cost and improve its product quality. The heat distribution of the auxiliary fractional system was optimized, and its surplus heat was used to heat crude gasoline, vaporizing the low-temperature liquid crude gasoline before it was fed into the auxiliary reactor. Application in an ARFCC unit reprocessing 75 t/h of crude gasoline showed that, after the energy optimization design, when the crude gasoline feed was heated from 40°C to 219°C, the contact temperature difference between the feed and the regenerated catalyst fell from 650°C to 322°C, process exergy loss decreased by 77.8%, and less dry gas and coke were formed in the auxiliary reactor. At the same time, the energy-use optimization of the auxiliary fractional system increased its exergy recovery efficie...

1 citation