Showing papers in "BMC Genomics in 2010"

Journal Article•DOI•

Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery

Thomas L. Parchman¹, Katherine S Geist², Johan A. Grahnen, Craig W. Benkman¹, C. Alex Buerkle¹

University of Wyoming¹, Beloit College²

16 Mar 2010-BMC Genomics

TL;DR: This sequencing study of expressed genes from Lodgepole pine, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies illustrate the utility of next generation sequencing as a basis for marker development and population genomics in non-model species.

Abstract: Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole genome sequencing efforts are currently cost and time prohibitive. The large and complex genomes of pines (Pinus spp.) have hindered the development of genomic resources, despite the ecological and economical importance of the group. While most genomic studies have focused on a single species (P. taeda), genomic level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta) is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies. We obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs). A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that they are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats and single nucleotide polymorphisms as potential molecular markers in our assembled and annotated sequences. 
High quality PCR primers were designed for a substantial number of the SSR loci, and a large number of these were amplified successfully in initial screening. This sequence collection represents a major genomic resource for P. contorta, and the large number of genetic markers characterized should contribute to future research in this and other pines. Our results illustrate the utility of next generation sequencing as a basis for marker development and population genomics in non-model species.

420 citations

Journal Article•DOI•

De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweetpotato (Ipomoea batatas)

Wang Zhangying¹, Boping Fang¹, Chen Jingyi¹, Zhang Xiongjian¹, Zhongxia Luo¹, Lifei Huang¹, Chen Xinliang¹, Yujun Li¹

Crops Research Institute¹

24 Dec 2010-BMC Genomics

TL;DR: A substantial fraction of sweetpotato transcript sequences were generated, which can be used to discover novel genes associated with tuberous root formation and development and will also make it possible to construct high density microarrays for further characterization of gene expression profiles during these processes.

Abstract: The tuberous root of sweetpotato is an important agricultural and biological organ. There are not sufficient transcriptomic and genomic data in public databases for understanding of the molecular mechanism underlying the tuberous root formation and development. Thus, high throughput transcriptome sequencing is needed to generate enormous transcript sequences from sweetpotato root for gene discovery and molecular marker development. In this study, more than 59 million sequencing reads were generated using Illumina paired-end sequencing technology. De novo assembly yielded 56,516 unigenes with an average length of 581 bp. Based on sequence similarity search with known proteins, a total of 35,051 (62.02%) genes were identified. Out of these annotated unigenes, 5,046 and 11,983 unigenes were assigned to gene ontology and clusters of orthologous group, respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 17,598 (31.14%) unigenes were mapped to 124 KEGG pathways, and 11,056 were assigned to metabolic pathways, which were well represented by carbohydrate metabolism and biosynthesis of secondary metabolite. In addition, 4,114 cDNA SSRs (cSSRs) were identified as potential molecular markers in our unigenes. One hundred pairs of PCR primers were designed and used for validation of the amplification and assessment of the polymorphism in genomic DNA pools. The result revealed that 92 primer pairs were successfully amplified in initial screening tests. This study generated a substantial fraction of sweetpotato transcript sequences, which can be used to discover novel genes associated with tuberous root formation and development and will also make it possible to construct high density microarrays for further characterization of gene expression profiles during these processes. 
Thousands of cSSR markers identified in the present study can enrich molecular markers and will facilitate marker-assisted selection in sweetpotato breeding. Overall, these sequences and markers will provide valuable resources for the sweetpotato community. Additionally, these results also suggested that transcriptome analysis based on Illumina paired-end sequencing is a powerful tool for gene discovery and molecular marker development for non-model species, especially those with large and complex genome.

403 citations

Journal Article•DOI•

De novo characterization of a whitefly transcriptome and analysis of its gene expression during development.

Xiao-Wei Wang¹, Jun-Bo Luan¹, Junmin Li¹, Yan-Yuan Bao¹, Chuan-Xi Zhang¹, Shu-Sheng Liu¹

Institute of Insect Sciences, Zhejiang University¹

24 Jun 2010-BMC Genomics

TL;DR: The data provides the most comprehensive sequence resource available for whitefly study and demonstrates that the Illumina sequencing allows de novo transcriptome assembly and gene expression analysis in a species lacking genome information.

Abstract: Whitefly (Bemisia tabaci) causes extensive crop damage throughout the world by feeding directly on plants and by vectoring hundreds of species of begomoviruses. Yet little is understood about its genes involved in development, insecticide resistance, host range plasticity and virus transmission. To facilitate research on whitefly, we present a method for de novo assembly of whitefly transcriptome using short read sequencing technology (Illumina). In a single run, we produced more than 43 million sequencing reads. These reads were assembled into 168,900 unique sequences (mean size = 266 bp) which represent more than 10-fold of all the whitefly sequences deposited in the GenBank (as of March 2010). Based on similarity search with known proteins, these analyses identified 27,290 sequences with a cut-off E-value above 10⁻⁵. Assembled sequences were annotated with gene descriptions, gene ontology and clusters of orthologous group terms. In addition, we investigated the transcriptome changes during whitefly development using a tag-based digital gene expression (DGE) system. We obtained a sequencing depth of over 2.5 million tags per sample and identified a large number of genes associated with specific developmental stages and insecticide resistance. Our data provides the most comprehensive sequence resource available for whitefly study and demonstrates that the Illumina sequencing allows de novo transcriptome assembly and gene expression analysis in a species lacking genome information. We anticipate that next generation sequencing technologies hold great potential for the study of the transcriptome in other non-model organisms.

380 citations

Journal Article•DOI•

An In silico approach for the evaluation of DNA barcodes

Gentile Francesco Ficetola¹,², Eric Coissac¹, Stéphanie Zundel¹, Tiayyba Riaz¹, Wasim Shehzad¹, Julien Bessière¹, Pierre Taberlet¹, François Pompanon¹

Joseph Fourier University¹, University of Milan²

16 Jul 2010-BMC Genomics

TL;DR: In this paper, a standard method for evaluating barcode quality, based on the use of a new bioinformatic tool that performs in silico PCR over large databases, is presented.

Abstract: DNA barcoding is a key tool for assessing biodiversity in both taxonomic and environmental studies. Essential features of barcodes include their applicability to a wide spectrum of taxa and their ability to identify even closely related species. Several DNA regions have been proposed as barcodes and the region selected strongly influences the output of a study. However, formal comparisons between barcodes remained limited until now. Here we present a standard method for evaluating barcode quality, based on the use of a new bioinformatic tool that performs in silico PCR over large databases. We illustrate this approach by comparing the taxonomic coverage and the resolution of several DNA regions already proposed for the barcoding of vertebrates. To assess the relationship between in silico and in vitro PCR, we also developed specific primers amplifying different species of Felidae, and we tested them using both kinds of PCR. Tests on specific primers confirmed the correspondence between in silico and in vitro PCR. Nevertheless, results of in silico and in vitro PCRs can be somehow different, also because tuning PCR conditions can increase the performance of primers with limited taxonomic coverage. The in silico evaluation of DNA barcodes showed a strong variation of taxonomic coverage (i.e., universality): barcodes based on highly degenerated primers and those corresponding to the conserved region of the Cyt-b showed the highest coverage. As expected, longer barcodes had a better resolution than shorter ones, which are however more convenient for ecological studies analysing environmental samples. In silico PCR could be used to improve the performance of a study, by allowing the preliminary comparison of several DNA regions in order to identify the most appropriate barcode depending on the study aims.

356 citations
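
The in silico PCR idea in the abstract above can be illustrated with a minimal Python sketch. This is not the tool described in the paper: the function names, the mismatch model (simple substitutions only, no degenerate bases) and the amplicon length limits are all assumptions chosen for illustration.

```python
# A simplified, hypothetical model of in silico PCR on a single template.
# Assumptions: DNA uses only A/C/G/T, primers may differ from the template
# by up to `max_mismatches` substitutions, and indels are not modelled.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_sites(template, primer, max_mismatches):
    """Return start positions where the primer matches within the mismatch limit."""
    n = len(primer)
    return [
        i
        for i in range(len(template) - n + 1)
        if mismatches(template[i:i + n], primer) <= max_mismatches
    ]


def in_silico_pcr(template, forward, reverse, max_mismatches=0,
                  min_len=10, max_len=1000):
    """Return predicted amplicons (primers included) on the forward strand."""
    # The reverse primer anneals to the opposite strand, so on the forward
    # strand we search for its reverse complement.
    rev_site = reverse_complement(reverse)
    starts = find_sites(template, forward, max_mismatches)
    ends = find_sites(template, rev_site, max_mismatches)
    amplicons = []
    for s in starts:
        for e in ends:
            stop = e + len(rev_site)
            # Require the reverse site to lie downstream of the forward primer.
            if e >= s + len(forward) and min_len <= stop - s <= max_len:
                amplicons.append(template[s:stop])
    return amplicons
```

Running such a function over every sequence in a reference database and counting the taxa that yield an amplicon would give a rough analogue of the taxonomic coverage the abstract discusses; raising `max_mismatches` mimics more permissive primer binding.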

Journal Article•DOI•

Identification and developmental expression of the full complement of Cytochrome P450 genes in Zebrafish

Jared V. Goldstone¹, Andrew G. McArthur, Akira Kubota¹, Juliano Zanette¹,², Thiago E.M. Parente¹,³, Maria Jonsson¹,⁴, David R. Nelson⁵, John J. Stegeman¹

Woods Hole Oceanographic Institution¹, Universidade Federal do Rio Grande do Sul², Federal University of Rio de Janeiro³, Uppsala University⁴, University of Tennessee Health Science Center⁵

18 Nov 2010-BMC Genomics

TL;DR: It is revealed that the majority of zebrafish CYP genes are expressed in embryos, with waves of expression of different sets of genes over the course of development, which provides a foundation for the use of zebrafish as a model in toxicological, pharmacological and chemical disease research.

Abstract: Increasing use of zebrafish in drug discovery and mechanistic toxicology demands knowledge of cytochrome P450 (CYP) gene regulation and function. CYP enzymes catalyze oxidative transformation leading to activation or inactivation of many endogenous and exogenous chemicals, with consequences for normal physiology and disease processes. Many CYPs potentially have roles in developmental specification, and many chemicals that cause developmental abnormalities are substrates for CYPs. Here we identify and annotate the full suite of CYP genes in zebrafish, compare these to the human CYP gene complement, and determine the expression of CYP genes during normal development. Zebrafish have a total of 94 CYP genes, distributed among 18 gene families found also in mammals. There are 32 genes in CYP families 5 to 51, most of which are direct orthologs of human CYPs that are involved in endogenous functions including synthesis or inactivation of regulatory molecules. The high degree of sequence similarity suggests conservation of enzyme activities for these CYPs, confirmed in reports for some steroidogenic enzymes (e.g. CYP19, aromatase; CYP11A, P450scc; CYP17, steroid 17α-hydroxylase), and the CYP26 retinoic acid hydroxylases. Complexity is much greater in gene families 1, 2, and 3, which include CYPs prominent in metabolism of drugs and pollutants, as well as of endogenous substrates. There are orthologous relationships for some CYP1s and some CYP3s between zebrafish and human. In contrast, zebrafish have 47 CYP2 genes, compared to 16 in human, with only two (CYP2R1 and CYP2U1) recognized as orthologous based on sequence. Analysis of shared synteny identified CYP2 gene clusters evolutionarily related to mammalian CYP2s, as well as unique clusters. Transcript profiling by microarray and quantitative PCR revealed that the majority of zebrafish CYP genes are expressed in embryos, with waves of expression of different sets of genes over the course of development. 
Transcripts of some CYP occur also in oocytes. The results provide a foundation for the use of zebrafish as a model in toxicological, pharmacological and chemical disease research.

349 citations

Journal Article•DOI•

Computational approaches for detecting protein complexes from protein interaction networks: a survey

Xiaoli Li¹, Min Wu², Chee Keong Kwoh², See-Kiong Ng¹

Institute for Infocomm Research Singapore¹, Nanyang Technological University²

10 Feb 2010-BMC Genomics

TL;DR: The state-of-the-art techniques for computational detection of protein complexes are reviewed, some promising research directions in this field are discussed, and experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes.

Abstract: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially of those involving more than two protein partners, are relatively limited in the current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years. 
Further improvements could be achieved if the quality of the underlying protein interaction data can be considered adequately to minimize the undesirable effects from the irrelevant and noisy sources, and the various biological evidences can be better incorporated into the detection process to maximize the exploitation of the increasing wealth of biological knowledge available.

338 citations
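
The graph formulation in the abstract above (proteins as nodes, physical interactions as links, complexes as densely connected subgraphs) can be sketched in a few lines of Python. The example below enumerates maximal cliques with the basic Bron–Kerbosch recursion. It is only one simple stand-in for the family of graph clustering techniques the survey reviews, and the function names and the `min_size` cut-off are assumptions for illustration.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph, min_size=3):
    """Enumerate maximal cliques (fully connected groups) of at least min_size.

    Uses the basic Bron-Kerbosch recursion without pivoting, which is fine
    for small illustrative networks but slow on genome-scale data.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= min_size:
                cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques
```

Requiring full connectivity is deliberately strict. Because interaction data are noisy and incomplete, as the abstract notes, practical methods usually relax the clique requirement to dense but not complete subgraphs.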

Journal Article•DOI•

Molecular analysis of the diversity of vaginal microbiota associated with bacterial vaginosis

Zongxin Ling¹, Jianming Kong¹,², Fang Liu¹, Haibin Zhu¹, Xiaoyi Chen¹, Yuezhu Wang³, Lanjuan Li¹, Karen E. Nelson⁴, Yaxian Xia¹, Charlie Xiang¹,⁴

Zhejiang University¹, Zhejiang California International NanoSystems Institute², Chinese National Human Genome Center³, J. Craig Venter Institute⁴

07 Sep 2010-BMC Genomics

TL;DR: The data presented here have clearly profiled the overall structure of vaginal communities and clearly demonstrated that BV is associated with a dramatic increase in the taxonomic richness and diversity of vaginal microbiota.

Abstract: Bacterial vaginosis (BV) is an ecological disorder of the vaginal microbiota that affects millions of women annually, and is associated with numerous adverse health outcomes including pre-term birth and the acquisition of sexually transmitted infections. However, little is known about the overall structure and composition of vaginal microbial communities; most of the earlier studies focused on predominant vaginal bacteria in the process of BV. In the present study, the diversity and richness of vaginal microbiota in 50 BV positive and 50 healthy women from China were investigated using culture-independent PCR-denaturing gradient gel electrophoresis (DGGE) and barcoded 454 pyrosequencing methods, and validated by quantitative PCR. Our data demonstrated that there was a profound shift in the absolute and relative abundances of bacterial species present in the vagina when comparing populations associated with healthy and diseased conditions. In spite of significant interpersonal variations, the diversity of vaginal microbiota in the two groups could be clearly divided into two clusters. A total of 246,359 high quality pyrosequencing reads was obtained for evaluating bacterial diversity and 24,298 unique sequences represented all phylotypes. The most predominant phyla of bacteria identified in the vagina belonged to Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria. The higher number of phylotypes in BV positive women over healthy is consistent with the results of previous studies and a large number of low-abundance taxa which were missed in previous studies were revealed. Although no single bacterium could be identified as a specific marker for healthy over diseased conditions, three phyla - Bacteroidetes, Actinobacteria and Fusobacteria, and eight genera including Gardnerella, Atopobium, Megasphaera, Eggerthella, Aerococcus, Leptotrichia/Sneathia, Prevotella and Papillibacter were strongly associated with BV (p < 0.05). 
These genera are potentially excellent markers and could be used as targets for clinical BV diagnosis by molecular approaches. The data presented here have clearly profiled the overall structure of vaginal communities and clearly demonstrated that BV is associated with a dramatic increase in the taxonomic richness and diversity of vaginal microbiota. The study also provides the most comprehensive picture of the vaginal community structure and the bacterial ecosystem, and significantly contributes to the current understanding of the etiology of BV.

314 citations

Journal Article•DOI•

Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.)

Pablo Federico Cavagnaro¹,², Douglas Senalik¹, Luming Yang¹, Philipp W. Simon¹, Timothy T. Harkins³, Chinnappa D. Kodira, Sanwen Huang, Yiqun Weng¹

University of Wisconsin-Madison¹, National Scientific and Technical Research Council², Hoffmann-La Roche³

15 Oct 2010-BMC Genomics

TL;DR: The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively; the level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite.

Abstract: Cucumber, Cucumis sativus L. is an important vegetable crop worldwide. Until very recently, cucumber genetic and genomic resources, especially molecular markers, have been very limited, impeding progress of cucumber breeding efforts. Microsatellites are short tandemly repeated DNA sequences, which are frequently favored as genetic markers due to their high level of polymorphism and codominant inheritance. Data from previously characterized genomes has shown that these repeats vary in frequency, motif sequence, and genomic location across taxa. During the last year, the genomes of two cucumber genotypes were sequenced including the Chinese fresh market type inbred line '9930' and the North American pickling type inbred line 'Gy14'. These sequences provide a powerful tool for developing markers in a large scale. In this study, we surveyed and characterized the distribution and frequency of perfect microsatellites in 203 Mbp assembled Gy14 DNA sequences, representing 55% of its nuclear genome, and in cucumber EST sequences. Similar analyses were performed in genomic and EST data from seven other plant species, and the results were compared with those of cucumber. A total of 112,073 perfect repeats were detected in the Gy14 cucumber genome sequence, accounting for 0.9% of the assembled Gy14 genome, with an overall density of 551.9 SSRs/Mbp. While tetranucleotides were the most frequent microsatellites in genomic DNA sequence, dinucleotide repeats, which had more repeat units than any other SSR type, had the highest cumulative sequence length. Coding regions (ESTs) of the cucumber genome had fewer microsatellites compared to its genomic sequence, with trinucleotides predominating in EST sequences. AAG was the most frequent repeat in cucumber ESTs. Overall, AT-rich motifs prevailed in both genomic and EST data. 
Compared to the other species examined, cucumber genomic sequence had the highest density of SSRs (although comparable to the density of poplar, grapevine and rice), and was richest in AT dinucleotides. Using an electronic PCR strategy, we investigated the polymorphism between 9930 and Gy14 at 1,006 SSR loci, and found unexpectedly high degree of polymorphism (48.3%) between the two genotypes. The level of polymorphism seems to be positively associated with the number of repeat units in the microsatellite. The in silico PCR results were validated empirically in 660 of the 1,006 SSR loci. In addition, primer sequences for more than 83,000 newly-discovered cucumber microsatellites, and their exact positions in the Gy14 genome assembly were made publicly available. The cucumber genome is rich in microsatellites; AT and AAG are the most abundant repeat motifs in genomic and EST sequences of cucumber, respectively. Considering all the species investigated, some commonalities were noted, especially within the monocot and dicot groups, although the distribution of motifs and the frequency of certain repeats were characteristic of the species examined. The large number of SSR markers developed from this study should be a significant contribution to the cucurbit research community.

304 citations
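
A survey of perfect microsatellites like the one described in the abstract above can be approximated with a regular-expression scan. The sketch below is not the authors' pipeline: the minimum repeat counts in `MIN_REPEATS`, the restriction to di-, tri- and tetranucleotide motifs, and the function names are assumptions chosen for illustration.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1)
        # exact copies of itself, expressed with a backreference.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAA, not counted here
            if motif_len == 4 and motif[:2] == motif[2:]:
                continue  # e.g. ATAT is really an AT dinucleotide repeat
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def ssr_density(n_ssrs, assembly_bp):
    """SSRs per megabase of assembled sequence."""
    return n_ssrs / (assembly_bp / 1e6)
```

With the rounded figures quoted in the abstract, `ssr_density(112_073, 203_000_000)` gives about 552 SSRs/Mbp, close to the reported 551.9; the small gap presumably reflects the rounding of the assembly size to 203 Mbp.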

Journal Article•DOI•

Comparing de novo assemblers for 454 transcriptome data

Sujai Kumar¹, Mark Blaxter¹

University of Edinburgh¹

16 Oct 2010-BMC Genomics

TL;DR: A systematic comparison of five assemblers to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number of contigs.

Abstract: Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. 
Combining differently optimal assemblies from different programs however gave a more credible final product, and this strategy is recommended.

301 citations
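
The abstract above compares assemblies by the number, size and total length of their contigs. A minimal sketch of such summary statistics follows. The N50 metric is a common assembly summary but is an addition here rather than something the abstract names, and the function and key names are hypothetical.

```python
def assembly_stats(contig_lengths):
    """Summarise an assembly from a list of contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    # N50: the length L such that contigs of length >= L together
    # contain at least half of the total assembled bases.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "n_contigs": len(lengths),
        "total_length": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }
```

Computing the same dictionary for each assembler's output, and for a merged assembly, gives a quick side-by-side view before the more informative comparison against reference sequences that the authors describe.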

Journal Article•DOI•

Accounting for multiple comparisons in a genome-wide association study (GWAS)

Randall C. Johnson¹,², George W. Nelson¹, Jennifer L. Troyer¹, James A. Lautenberger, Bailey Kessing¹, Cheryl A. Winkler¹, Stephen J. O'Brien

Science Applications International Corporation¹, Conservatoire national des arts et métiers²

22 Dec 2010-BMC Genomics

TL;DR: Correcting for the number of LD blocks resulted in an anti-conservative Bonferroni adjustment, and SLIDE and simpleℳ are particularly useful when using a statistical test not handled in optimized permutation testing packages, and genome-wide corrected p-values using SLIDE, are much easier to interpret for consumers of GWAS studies.

Abstract: Background: As we enter an era when testing millions of SNPs in a single gene association study will become the standard, consideration of multiple comparisons is an essential part of determining statistical significance. Bonferroni adjustments can be made but are conservative due to the preponderance of linkage disequilibrium (LD) between genetic markers, and permutation testing is not always a viable option. Three major classes of corrections have been proposed to correct the dependent nature of genetic data in Bonferroni adjustments: permutation testing and related alternatives, principal components analysis (PCA), and analysis of blocks of LD across the genome. We consider seven implementations of these commonly used methods using data from 1514 European American participants genotyped for 700,078 SNPs in a GWAS for AIDS. Results: A Bonferroni correction using the number of LD blocks found by the three algorithms implemented by Haploview resulted in an insufficiently conservative threshold, corresponding to a genome-wide significance level of α = 0.15 - 0.20. We observed a moderate increase in power when using PRESTO, SLIDE, and simpleℳ when compared with traditional Bonferroni methods for population data genotyped on the Affymetrix 6.0 platform in European Americans (α = 0.05 thresholds between 1 × 10⁻⁷ and 7 × 10⁻⁸). Conclusions: Correcting for the number of LD blocks resulted in an anti-conservative Bonferroni adjustment. SLIDE and simpleℳ are particularly useful when using a statistical test not handled in optimized permutation testing packages, and genome-wide corrected p-values using SLIDE, are much easier to interpret for consumers of GWAS studies.

290 citations
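
The arithmetic behind the thresholds quoted in the abstract above can be made concrete with a short Python sketch. It covers only the plain Bonferroni calculation, not PRESTO, SLIDE or simpleℳ, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_effective_tests(alpha, per_test_threshold):
    """Number of independent tests that a per-test threshold corresponds to."""
    if per_test_threshold <= 0:
        raise ValueError("per_test_threshold must be positive")
    return alpha / per_test_threshold
```

For the 700,078 SNPs in the study, `bonferroni_threshold(0.05, 700_078)` is about 7.1 × 10⁻⁸, near the stricter end of the reported range. Conversely, a threshold of 1 × 10⁻⁷ corresponds to roughly 500,000 effectively independent tests, which shows how accounting for LD between markers relaxes the correction.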

Journal Article•DOI•

High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence

David L. Hyten¹, Steven B. Cannon¹, Qijian Song¹,², Nathan T. Weeks¹, Edward W. Fickus¹, Randy C. Shoemaker¹, James E. Specht³, Andrew Farmer⁴, Gregory D. May⁴, Perry B. Cregan¹

United States Department of Agriculture¹, University of Maryland, College Park², University of Nebraska–Lincoln³, National Center for Genome Resources⁴

15 Jan 2010-BMC Genomics

TL;DR: Next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs and those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.

Abstract: The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds. A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%. We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.

Journal Article•DOI•

High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

Matthew W. Gilmour¹,², Morag R. Graham¹,², Gary Van Domselaar¹, Shaun Tyler¹, Heather Kent¹, Keri M. Trout-Yakel¹, Oscar E. Larios², Vanessa Allen, Barbara Lee³, Celine Nadon¹,²

Public Health Agency of Canada¹, University of Manitoba², Canadian Food Inspection Agency³

18 Feb 2010-BMC Genomics

TL;DR: This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

Abstract: A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct Asc I PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in Asc I PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

Journal Article•DOI•

The proteolytic system of lactic acid bacteria revisited: a genomic comparison

Mengjin Liu¹,², Jumamurat R. Bayjanov², Bernadet Renckens², Arjen Nauta¹, Roland J. Siezen²

FrieslandCampina¹, Radboud University Nijmegen²

15 Jan 2010-BMC Genomics

TL;DR: The improved functional annotation of the proteolytic system components provides an excellent framework for future experimental validations of predicted enzymatic activities and can be used to tune the strain selection process in food fermentations.

Abstract: Lactic acid bacteria (LAB) are a group of gram-positive, lactic acid producing Firmicutes. They have been extensively used in food fermentations, including the production of various dairy products. The proteolytic system of LAB converts proteins to peptides and then to amino acids, which is essential for bacterial growth and also contributes significantly to flavor compounds as end-products. Recent developments in high-throughput genome sequencing and comparative genomics hybridization arrays provide us with opportunities to explore the diversity of the proteolytic system in various LAB strains. We performed a genome-wide comparative genomics analysis of proteolytic system components, including cell-wall bound proteinase, peptide transporters and peptidases, in 22 sequenced LAB strains. The peptidase families PepP/PepQ/PepM, PepD and PepI/PepR/PepL are described as examples of our in silico approach to refine the distinction of subfamilies with different enzymatic activities. Comparison of protein 3D structures of proline peptidases PepI/PepR/PepL and esterase A allowed identification of a conserved core structure, which was then used to improve phylogenetic analysis and functional annotation within this protein superfamily. The diversity of proteolytic system components in 39 Lactococcus lactis strains was explored using pangenome comparative genome hybridization analysis. Variations were observed in the proteinase PrtP and its maturation protein PrtM, in one of the Opp transport systems and in several peptidases between strains from different Lactococcus subspecies or from different origin. The improved functional annotation of the proteolytic system components provides an excellent framework for future experimental validations of predicted enzymatic activities. The genome sequence data can be coupled to other "omics" data e.g. transcriptomics and metabolomics for prediction of proteolytic and flavor-forming potential of LAB strains. 
Such an integrated approach can be used to tune the strain selection process in food fermentations.

Journal Article•DOI•

Adaptation of Hansenula polymorpha to methanol: a transcriptome analysis

Tim van Zutphen¹, Richard J.S. Baerends¹, Kim A. Susanna¹, Anne de Jong¹, Oscar P. Kuipers¹, Marten Veenhuis¹, Ida J. van der Klei¹

University of Groningen¹

04 Jan 2010-BMC Genomics

TL;DR: Transcriptional profiling of H. polymorpha cells shifted from glucose to methanol showed the expected downregulation of glycolytic genes together with upregulation of the methanol utilisation pathway; ATG genes were also strongly upregulated, and the associated autophagy processes may be responsible for the enhanced peroxisomal β-oxidation.

Abstract: Background: Methylotrophic yeast species (e.g. Hansenula polymorpha, Pichia pastoris) can grow on methanol as sole source of carbon and energy. These organisms are important cell factories for the production of recombinant proteins, but are also used in fundamental research as model organisms to study peroxisome biology. During exponential growth on glucose, cells of H. polymorpha typically contain a single, small peroxisome that is redundant for growth, while on methanol multiple, enlarged peroxisomes are present. These organelles are crucial to support growth on methanol, as they contain key enzymes of methanol metabolism. In this study, changes in the transcriptional profiles during adaptation of H. polymorpha cells from glucose- to methanol-containing media were investigated using DNA-microarray analyses. Results: Two hours after the shift of cells from glucose to methanol nearly 20% (1184 genes) of the approximately 6000 annotated H. polymorpha genes were significantly upregulated with at least a two-fold differential expression. Highest upregulation (> 300-fold) was observed for the genes encoding the transcription factor Mpp1 and formate dehydrogenase, an enzyme of the methanol dissimilation pathway. Upregulated genes also included genes encoding other enzymes of methanol metabolism as well as of peroxisomal β-oxidation. A moderate increase in transcriptional levels (up to 4-fold) was observed for several PEX genes, which are involved in peroxisome biogenesis. Only PEX11 and PEX32 were more strongly upregulated. In addition, an increase was observed in expression of several ATG genes, which encode proteins involved in autophagy and autophagy processes. The strongest upregulation was observed for ATG8 and ATG11. Approximately 20% (1246 genes) of the genes were downregulated. These included glycolytic genes as well as genes involved in transcription and translation. Conclusion: Transcriptional profiling of H. polymorpha cells shifted from glucose to methanol showed the expected downregulation of glycolytic genes together with upregulation of the methanol utilisation pathway. This serves as a confirmation and validation of the array data obtained. Consistent with this, various PEX genes were also upregulated. The strong upregulation of ATG genes is possibly due to induction of autophagy processes related to remodeling of the cell architecture required to support growth on methanol. These processes may also be responsible for the enhanced peroxisomal β-oxidation, as autophagy leads to recycling of membrane lipids. The prominent downregulation of transcription and translation may be explained by the reduced growth rate on methanol (doubling time td: 1 h on glucose vs 4.5 h on methanol).
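The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only, not the authors' analysis: it assumes simple exponential growth, so that the specific growth rate is ln(2) divided by the doubling time, and it treats the "approximately 6000" annotated genes as exactly 6000. All function and variable names are hypothetical.

```python
import math


def growth_rate_per_hour(doubling_time_h: float) -> float:
    """Specific growth rate mu (1/h) under exponential growth: mu = ln(2) / t_d."""
    return math.log(2) / doubling_time_h


# Doubling times quoted in the abstract.
mu_glucose = growth_rate_per_hour(1.0)
mu_methanol = growth_rate_per_hour(4.5)

# Ratio of growth rates; algebraically this equals the ratio of doubling times.
slowdown = mu_glucose / mu_methanol

# Share of annotated genes changing at least two-fold.
ANNOTATED_GENES = 6000  # assumption: the abstract only says "approximately 6000"
up_pct = 100 * 1184 / ANNOTATED_GENES
down_pct = 100 * 1246 / ANNOTATED_GENES

print(f"growth is about {slowdown:.1f}x slower on methanol")
print(f"upregulated: {up_pct:.1f}%  downregulated: {down_pct:.1f}%")
```

Under these assumptions both gene sets come out close to the "nearly 20%" and "approximately 20%" stated in the abstract.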

Journal Article•DOI•

Genomic and transcriptomic analysis of the AP2/ERF superfamily in Vitis vinifera

Francesco Licausi¹, Federico M. Giorgi², Sara Zenoni³, Fabio Osti, Mario Pezzotti³, Pierdomenico Perata¹

Sant'Anna School of Advanced Studies¹, Max Planck Society², University of Verona³

20 Dec 2010-BMC Genomics

TL;DR: The presented analysis of AP2/ERF genes in grapevine provides the bases for studying the molecular regulation of berry development and the ripening process and introduces possible new roles for members of some ERF groups during fruit ripening.

Abstract: Background: The AP2/ERF protein family contains transcription factors that play a crucial role in plant growth and development and in response to biotic and abiotic stress conditions in plants. Grapevine (Vitis vinifera) is the only woody crop whose genome has been fully sequenced. So far, no detailed expression profile of AP2/ERF-like genes is available for grapevine.

Journal Article•DOI•

De Novo Sequencing and Analysis of the American Ginseng Root Transcriptome Using a GS FLX Titanium Platform to Discover Putative Genes Involved in Ginsenoside Biosynthesis

Chao Sun¹, Ying Li¹, Qiong Wu¹, Hongmei Luo¹, Yongzhen Sun¹, Jingyuan Song¹, Edmund M.K. Lui², Shilin Chen¹

Peking Union Medical College¹, University of Western Ontario²

24 Apr 2010-BMC Genomics

TL;DR: It is demonstrated that transcriptome analysis based on 454 pyrosequencing is a powerful tool for determining the genes encoding enzymes responsible for the biosynthesis of secondary metabolites in non-model plants.

Abstract: American ginseng (Panax quinquefolius L.) is one of the most widely used herbal remedies in the world. Its major bioactive constituents are the triterpene saponins known as ginsenosides. However, little is known about ginsenoside biosynthesis in American ginseng, especially the late steps of the pathway. In this study, a one-quarter 454 sequencing run produced 209,747 high-quality reads with an average sequence length of 427 bases. De novo assembly generated 31,088 unique sequences containing 16,592 contigs and 14,496 singletons. About 93.1% of the high-quality reads were assembled into contigs with an average 8-fold coverage. A total of 21,684 (69.8%) unique sequences were annotated by a BLAST similarity search against four public sequence databases, and 4,097 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Based on the bioinformatic analysis described above, we found all of the known enzymes involved in ginsenoside backbone synthesis, starting from acetyl-CoA via the isoprenoid pathway. Additionally, a total of 150 cytochrome P450 (CYP450) and 235 glycosyltransferase unique sequences were found in the 454 cDNA library, some of which encode enzymes responsible for the conversion of the ginsenoside backbone into the various ginsenosides. Finally, one CYP450 and four UDP-glycosyltransferases were selected as the candidates most likely to be involved in ginsenoside biosynthesis through a methyl jasmonate (MeJA) inducibility experiment and tissue-specific expression pattern analysis based on a real-time PCR assay. We demonstrated, with the assistance of the MeJA inducibility experiment and tissue-specific expression pattern analysis, that transcriptome analysis based on 454 pyrosequencing is a powerful tool for determining the genes encoding enzymes responsible for the biosynthesis of secondary metabolites in non-model plants. 
Additionally, the expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community that is interested in the molecular genetics and functional genomics of American ginseng.
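The assembly statistics in this abstract are internally consistent, which a short check makes visible. This is a sketch rather than anything from the paper: it assumes that each singleton corresponds to exactly one high-quality read, and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
high_quality_reads = 209_747
contigs = 16_592
singletons = 14_496
annotated = 21_684

# Unique sequences are the contigs plus the reads left unassembled.
unique_sequences = contigs + singletons

# Share of unique sequences with a BLAST-based annotation.
annotated_pct = 100 * annotated / unique_sequences

# Assumption: one singleton equals one read, so the remaining reads sit in contigs.
reads_in_contigs_pct = 100 * (high_quality_reads - singletons) / high_quality_reads

print(unique_sequences)
print(f"annotated: {annotated_pct:.1f}%")
print(f"reads in contigs: {reads_in_contigs_pct:.1f}%")
```

With that assumption the three derived values match the 31,088 unique sequences, 69.8% annotated, and 93.1% of reads assembled that the abstract reports.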

Journal Article•DOI•

dbDEMC: a database of differentially expressed miRNAs in human cancers.

Zhen Yang¹,²,³, Fei Ren⁴, Changning Liu³, Shunmin He³, Gang Sun², Qian Gao², Lei Yao², Yangde Zhang⁴, Ruoyu Miao¹, Ying Cao⁵, Yi Zhao³, Yang Zhong², Haitao Zhao¹

Peking Union Medical College Hospital¹, Fudan University², Chinese Academy of Sciences³, Central South University⁴, Graduate University for Advanced Studies⁵

02 Dec 2010-BMC Genomics

TL;DR: This database is expected to be a valuable source for identification of cancer-related miRNAs, thereby helping with the improvement of classification, diagnosis and treatment of human cancers.

Abstract: Background: MicroRNAs (miRNAs) are small noncoding RNAs about 22 nt long that negatively regulate gene expression at the post-transcriptional level. Their key effects on various biological processes, e.g., embryonic development, cell division, differentiation and apoptosis, are widely recognized. Evidence suggests that aberrant expression of miRNAs may contribute to many types of human diseases, including cancer. Here we present a database of differentially expressed miRNAs in human cancers (dbDEMC), to explore aberrantly expressed miRNAs among different cancers.

Journal Article•DOI•

Population- and genome-specific patterns of linkage disequilibrium and SNP variation in spring and winter wheat (Triticum aestivum L.)

Shiaoman Chao¹, Jorge Dubcovsky², Jan Dvorak², Ming-Cheng Luo², Stephen Baenziger³, Rustam Matnyazov⁴, Dale R. Clark, Luther E. Talbert⁵, James A. Anderson⁶, Susanne Dreisigacker⁷, Karl D. Glover⁸, Jianli Chen⁹, Kim Garland Campbell¹⁰, Phil L. Bruckner, Jackie C. Rudd¹¹, Scott D. Haley¹², Brett F. Carver¹³, Sid Perry, Mark E. Sorrells¹⁴, Eduard Akhunov⁴

Agricultural Research Service¹, University of California, Davis², University of Nebraska–Lincoln³, Kansas State University⁴, Montana State University⁵, University of Minnesota⁶, International Maize and Wheat Improvement Center⁷, South Dakota State University⁸, University of Idaho⁹, Washington State University¹⁰, Texas AgriLife Research¹¹, Colorado State University¹², Oklahoma State University–Stillwater¹³, Cornell University¹⁴

29 Dec 2010-BMC Genomics

TL;DR: This study demonstrated that the estimates of population structure between spring and winter wheat lines can identify genomic regions harboring candidate genes involved in the regulation of growth habit, and suggests that breeding and selection had a different impact on each wheat genome both within and among populations.

Abstract: Background: Single nucleotide polymorphisms (SNPs) are ideally suited for the construction of high-resolution genetic maps, studying population evolutionary history and performing genome-wide association mapping experiments. Here, we used a genome-wide set of 1536 SNPs to study linkage disequilibrium (LD) and population structure in a panel of 478 spring and winter wheat cultivars (Triticum aestivum) from 17 populations across the United States and Mexico. Results: Most of the wheat oligo pool assay (OPA) SNPs that were polymorphic within the complete set of 478 cultivars were also polymorphic in all subpopulations. Higher levels of genetic differentiation were observed among wheat lines within populations than among populations. A total of nine genetically distinct clusters were identified, suggesting that some of the pre-defined populations shared significant proportion of genetic ancestry. Estimates of population structure (FST) at individual loci showed a high level of heterogeneity across the genome. In addition, seven genomic regions with elevated FST were detected between the spring and winter wheat populations. Some of these regions overlapped with previously mapped flowering time QTL. Across all populations, the highest extent of significant LD was observed in the wheat D-genome, followed by lower LD in the A- and B-genomes. The differences in the extent of LD among populations and genomes were mostly driven by differences in long-range LD ( > 10 cM). Conclusions: Genome- and population-specific patterns of genetic differentiation and LD were discovered in the populations of wheat cultivars from different geographic regions. Our study demonstrated that the estimates of population structure between spring and winter wheat lines can identify genomic regions harboring candidate genes involved in the regulation of growth habit. 
Variation in LD suggests that breeding and selection had a different impact on each wheat genome both within and among populations. The higher extent of LD in the wheat D-genome versus the A- and B-genomes likely reflects the episodes of recent introgression and population bottleneck accompanying the origin of hexaploid wheat. The assessment of LD and population structure in this assembled panel of diverse lines provides critical information for the development of genetic resources for genome-wide association mapping of agronomically important traits in wheat.
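To make the per-locus FST estimates mentioned above concrete, here is a minimal sketch of one common textbook definition for a biallelic SNP: FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The abstract does not say which estimator the authors used, so this formula, the equal weighting of subpopulations, and all names below are assumptions made for illustration.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs: Sequence[float]) -> float:
    """FST = (HT - HS) / HT from per-subpopulation allele frequencies.

    Subpopulations are weighted equally (an assumption of this sketch).
    Returns 0.0 for a locus that is monomorphic in the pooled population.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation frequency")
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at one SNP in a spring and a winter population.
example_fst = fst_biallelic([0.2, 0.8])
print(f"FST = {example_fst:.2f}")
```

Scanning such values along a genetic map is one way regions of elevated differentiation, like the seven reported between spring and winter wheat, could be flagged; the paper's own procedure may differ.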

Journal Article•DOI•

Comprehensive expression analysis suggests overlapping and specific roles of rice glutathione S-transferase genes during development and stress responses.

Mukesh K. Jain, Challa Ghanashyam, Annapurna Bhattacharjee

29 Jan 2010-BMC Genomics

TL;DR: This study provides evidence for the role of GSTs in mediating crosstalk between various stress and hormone response pathways and represents a very useful resource for functional analysis of selected members of this family in rice.

Abstract: Glutathione S-transferases (GSTs) are the ubiquitous enzymes that play a key role in cellular detoxification. Although several GSTs have been identified and characterized in various plant species, the knowledge about their role in developmental processes and response to various stimuli is still very limited. In this study, we report genome-wide identification, characterization and comprehensive expression analysis of members of GST gene family in crop plant rice, to reveal their function(s). A systematic analysis revealed the presence of at least 79 GST genes in the rice genome. Phylogenetic analysis grouped GST proteins into seven classes. Sequence analysis together with the organization of putative motifs indicated the potential diverse functions of GST gene family members in rice. The tandem gene duplications have contributed a major role in expansion of this gene family. Microarray data analysis revealed tissue-/organ- and developmental stage-specific expression patterns of several rice GST genes. At least 31 GST genes showed response to plant hormones auxin and cytokinin. Furthermore, expression analysis showed the differential expression of quite a large number of GST genes during various abiotic stress (20), arsenate stress (32) and biotic stress (48) conditions. Many of the GST genes were commonly regulated by developmental processes, hormones, abiotic and biotic stresses. The transcript profiling suggests overlapping and specific role(s) of GSTs during various stages of development in rice. Further, the study provides evidence for the role of GSTs in mediating crosstalk between various stress and hormone response pathways and represents a very useful resource for functional analysis of selected members of this family in rice.

Journal Article•DOI•

Study of inter- and intra-individual variations in the salivary microbiota.

Vladimir Lazarevic¹, Katrine Whiteson¹, David Hernandez¹, Patrice Francois¹, Jacques Schrenzel¹

Geneva College¹

28 Sep 2010-BMC Genomics

TL;DR: The salivary microbial community appeared to be stable over at least 5 days, allowing for subject-specific grouping using UniFrac, and the results point to the persistence of subject-specific taxa whose frequency fluctuates between the time points.

Abstract: Oral bacterial communities contain species that promote health and others that have been implicated in oral and/or systemic diseases. Culture-independent approaches provide the best means to assess the diversity of oral bacteria because most of them remain uncultivable. The salivary microbiota from five adults was analyzed at three time-points by means of the 454 pyrosequencing technology. The V1-V3 region of the bacterial 16S rRNA genes was amplified by PCR using saliva lysates and broad-range primers. The bar-coded PCR products were pooled and sequenced unidirectionally to cover the V3 hypervariable region. Of 50,708 obtained sequences, 31,860 passed the quality control. Non-bacterial sequences (2.2%) were removed leaving 31,170 reads. Samples were dominated by seven major phyla: members of Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes and candidate division TM7 were identified in all samples; Fusobacteria and Spirochaetes were identified in all individuals, but not at all time-points. The dataset was represented by 3,011 distinct sequences (100%-ID phylotypes) of ~215 nucleotides and 583 phylotypes defined at ≥97% identity (97%-ID phylotypes). We compared saliva samples from different individuals in terms of the phylogeny of their microbial communities. Based on the presence and absence of phylotypes defined at 100% or 97% identity thresholds, samples from each subject formed separate clusters. Among individual taxa, phylum Bacteroidetes and order Clostridiales (Firmicutes) were the best indicators of intraindividual similarity of the salivary flora over time. Fifteen out of 81 genera constituted 73 to 94% of the total sequences present in different samples. Of these, 8 were shared by all time points of all individuals, while 15-25 genera were present in all three time-points of different individuals. Representatives of the class Sphingobacteria, order Sphingobacteriales and family Clostridiaceae were found only in one subject. 
The salivary microbial community appeared to be stable over at least 5 days, allowing for subject-specific grouping using UniFrac. Inclusion of all available samples from more distant time points (up to 29 days) confirmed this observation. Samples taken at closer time intervals were not necessarily more similar than samples obtained across longer sampling times. These results point to the persistence of subject-specific taxa whose frequency fluctuates between the time points. Genus Gemella, identified in all time-points of all individuals, was not defined as a core-microbiome genus in previous studies of salivary bacterial communities. Human oral microbiome studies are still in their infancy and larger-scale projects are required to better define individual and universal oral microbiome core.

Journal Article•DOI•

Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms.

Toshio Yamamoto, Hideki Nagasaki, Jun-ichi Yonemaru, Kaworu Ebana, Maiko Nakajima, Taeko Shibaya, Masahiro Yano

27 Apr 2010-BMC Genomics

TL;DR: Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties and clarified the dynamics of chromosome recombination during the historical rice breeding process.

Abstract: To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7× the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67 051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity. Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. 
We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
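The coverage and SNP figures in this abstract imply a reference genome size and an average SNP spacing that the text does not state directly. The sketch below derives both as rough estimates. It assumes that "15.7×" means total sequenced bases divided by genome length and that the 67 051 SNPs fall within the 80.1% of the Nipponbare sequence that was covered; variable names are hypothetical.

```python
# Figures quoted in the abstract.
total_sequence_bp = 5.89e9   # 5.89 Gb of Koshihikari sequence
fold_coverage = 15.7         # stated depth relative to the rice genome
covered_fraction = 0.801     # share of the Nipponbare sequence covered
snps = 67_051

# Assumption: coverage = total bases / genome length.
implied_genome_bp = total_sequence_bp / fold_coverage

# Assumption: all SNPs lie in the covered part of the reference.
covered_bp = implied_genome_bp * covered_fraction
bp_per_snp = covered_bp / snps

print(f"implied genome size: {implied_genome_bp / 1e6:.0f} Mb")
print(f"about one SNP every {bp_per_snp / 1e3:.1f} kb between the two cultivars")
```

An average spacing of a few kilobases between two cultivars is consistent with the abstract's point that large SNP sets are needed to tell closely related varieties apart.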

Journal Article•DOI•

Transcriptome and proteome analysis of Pinctada margaritifera calcifying mantle and shell: focus on biomineralization

Caroline Joubert¹, David Piquemal, Benjamin Marie², Laurent Manchon, Fabien Pierrat, Isabelle Zanella-Cléon³, Nathalie Cochennec-Laureau¹, Yannick Gueguen¹, Caroline Montagnani¹

IFREMER¹, University of Burgundy², Independent Bank³

01 Nov 2010-BMC Genomics

TL;DR: This EST study made on the calcifying tissue of P. margaritifera is the first description of pyrosequencing on a pearl-producing bivalve species, and represents a major breakthrough in the field of molluskan biomineralization.

Abstract: The shell of the pearl-producing bivalve Pinctada margaritifera is composed of an organic cell-free matrix that plays a key role in the dynamic process of biologically-controlled biomineralization. In order to increase genomic resources and identify shell matrix proteins implicated in biomineralization in P. margaritifera, high-throughput Expressed Sequence Tag (EST) pyrosequencing was undertaken on the calcifying mantle, combined with a proteomic analysis of the shell. We report the functional analysis of 276 738 sequences, leading to the constitution of an unprecedented catalog of 82 P. margaritifera biomineralization-related mantle protein sequences. Components of the current "chitin-silk fibroin gel-acidic macromolecule" model of biomineralization processes were found, in particular a homolog of a biomineralization protein (Pif-177) recently discovered in P. fucata. Among these sequences, we could show the localization of two other biomineralization protein transcripts, pmarg-aspein and pmarg-pearlin, in two distinct areas of the outer mantle epithelium, suggesting their implication in calcite and aragonite formation. Finally, by combining the EST approach with a proteomic mass spectrometry analysis of proteins isolated from the P. margaritifera shell organic matrix, we demonstrated the presence of 30 sequences containing almost all of the shell proteins that have been previously described from shell matrix protein analyses of the Pinctada genus. The integration of these two methods allowed the global composition of biomineralizing tissue and calcified structures to be examined in tandem for the first time. This EST study made on the calcifying tissue of P. margaritifera is the first description of pyrosequencing on a pearl-producing bivalve species. Our results provide direct evidence that our EST data set covers most of the diversity of the matrix protein of P. margaritifera shell, but also that the mantle transcripts encode proteins present in P. margaritifera shell, hence demonstrating their implication in shell formation. Combining transcriptomic and proteomic approaches is therefore a powerful way to identify proteins involved in biomineralization. Data generated in this study supply the most comprehensive list of biomineralization-related sequences presently available among protostomian species, and represent a major breakthrough in the field of molluskan biomineralization.

Journal Article•DOI•

Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

Willem van Schaik¹, Janetta Top¹, David R. Riley², Jos Boekhorst¹, Joyce E. P. Vrijenhoek¹, Claudia M. E. Schapendonk¹, Antoni P. A. Hendrickx¹,³, Isaac J. Nijman¹, Marc J. M. Bonten¹, Hervé Tettelin², Rob J. L. Willems¹

Utrecht University¹, University of Maryland, Baltimore², University of Chicago³

14 Apr 2010-BMC Genomics

TL;DR: Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium, which will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

Abstract: The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

Journal Article•DOI•

Ensembl variation resources.

Yuan Chen¹, Fiona Cunningham¹, Daniel Rios¹, William M. McLaren¹, James Smith², Bethan Pritchard², Giulietta Spudich¹, Simon Brent², Eugene Kulesha¹, Pablo Marin-Garcia², Damian Smedley¹, Ewan Birney¹, Paul Flicek¹,²

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute²

11 May 2010-BMC Genomics

TL;DR: This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases, and explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically.

Abstract: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.

Journal Article•DOI•

Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim.

Shaohua Zeng¹, Gong Xiao¹, Juan Guo¹, Zhangjun Fei²,³, Yanqin Xu¹, Bruce A. Roe⁴, Ying Wang¹

Chinese Academy of Sciences¹, Boyce Thompson Institute for Plant Research², Ithaca College³, University of Oklahoma⁴

08 Feb 2010-BMC Genomics

TL;DR: A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium.

Abstract: Background: Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as a genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium are less studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum.
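
As an illustration of what identifying SSRs in EST sequences can involve, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the authors' pipeline: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, and real studies pick their own thresholds and tools.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, n in sorted(min_repeats.items()):
        # Group 2 captures the motif; \2{n-1,} requires at least n copies in total.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are just a shorter unit repeated (e.g. "AAA", "ATAT").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    toy_est = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"  # made-up sequence
    print(find_ssrs(toy_est))
```

Start positions are 0-based. Mononucleotide runs are deliberately ignored here; whether to report them is a design choice that varies between studies.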

Journal Article•DOI•

A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

Maxime Galan¹, Emmanuel Guivier¹, Gilles Caraux¹,², Nathalie Charbonnel¹, Jean-François Cosson¹

SupAgro¹, University of Montpellier²

11 May 2010-BMC Genomics

TL;DR: This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination, less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied.

Abstract: Background: High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method.
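
The first challenge named above, attributing each read to its original sample, can be sketched as a lookup on a short tag at the start of each read. This is a simplified illustration, not the tagging scheme used in the paper: the function `demultiplex`, the single leading tag, the exact-match rule, and the tag sequences are all assumptions made for the example.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len):
    """Assign each read to a sample by an exact match on its leading tag.

    Returns (reads grouped by sample with the tag trimmed off,
             list of reads whose tag was not recognised).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            # Unknown or error-containing tag: keep the read aside rather than guess.
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


if __name__ == "__main__":
    tags = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical tags
    reads = ["ACGTGGGG", "TGCACCCC", "ACGTTTTT", "NNNNAAAA"]
    print(demultiplex(reads, tags, tag_len=4))
```

Setting unrecognised reads aside instead of assigning them to the nearest tag is the conservative choice; it trades some yield for fewer misattributed reads.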

Journal Article•DOI•

Deep sequencing discovery of novel and conserved microRNAs in trifoliate orange ( Citrus trifoliata )

Changnian Song¹, Chen Wang¹, Changqing Zhang, Nicholas Kibet Korir¹, Huaping Yu¹, Zhengqiang Ma¹, Jinggui Fang¹

Nanjing Agricultural University¹

13 Jul 2010-BMC Genomics

TL;DR: In this article, the authors used Solexa sequencing to discover new microRNAs in trifoliate orange (Citrus trifoliata) which is an important rootstock of citrus.

Abstract: MicroRNAs (miRNAs) play a critical role in post-transcriptional gene regulation and have been shown to control many genes involved in various biological and metabolic processes. There have been extensive studies to discover miRNAs and analyze their functions in model plant species, such as Arabidopsis and rice. Deep sequencing technologies have facilitated identification of species-specific or lowly expressed as well as conserved or highly expressed miRNAs in plants. In this research, we used Solexa sequencing to discover new microRNAs in trifoliate orange (Citrus trifoliata), which is an important rootstock of citrus. A total of 13,106,753 reads representing 4,876,395 distinct sequences were obtained from a short RNA library generated from small RNA extracted from C. trifoliata flower and fruit tissues. Based on sequence similarity and hairpin structure prediction, we found that 156,639 reads representing 63 sequences from 42 highly conserved miRNA families have perfect matches to known miRNAs. We also identified 10 novel miRNA candidates whose precursors were all potentially generated from citrus ESTs. In addition, five miRNA* sequences were also sequenced. These sequences had not previously been described in other plant species, and accumulation of the 10 novel miRNAs was confirmed by qRT-PCR analysis. Potential target genes were predicted for most conserved and novel miRNAs. Moreover, four target genes, including one encoding IRX12 copper ion binding/oxidoreductase and three genes encoding NB-LRR disease resistance proteins, have been experimentally verified by detection of the miRNA-mediated mRNA cleavage in C. trifoliata. Deep sequencing of short RNAs from C. trifoliata flowers and fruits identified 10 new potential miRNAs and 42 highly conserved miRNA families, indicating that specific miRNAs exist in C. trifoliata. These results show that regulatory miRNAs exist in agronomically important trifoliate orange and may play an important role in citrus growth, development, and response to disease.
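
The abstract distinguishes total reads from distinct sequences and then reports reads with perfect matches to known miRNAs. A minimal sketch of that bookkeeping follows; it is not the authors' workflow, the helper names `collapse_reads` and `perfect_matches` are hypothetical, and the sequences are toy placeholders rather than real miRNAs.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into distinct sequences with read counts."""
    return Counter(reads)


def perfect_matches(collapsed, known):
    """Map each distinct sequence that exactly equals a known sequence
    to (read_count, sorted names of the known entries it matches)."""
    names_by_seq = {}
    for name, seq in known.items():
        names_by_seq.setdefault(seq, []).append(name)
    return {
        seq: (count, sorted(names_by_seq[seq]))
        for seq, count in collapsed.items()
        if seq in names_by_seq
    }


if __name__ == "__main__":
    reads = ["AAAC", "AAAC", "GGGT", "TTTA"]              # toy reads
    known = {"toy-miR-1": "AAAC", "toy-miR-2": "CCCC"}    # toy reference set
    collapsed = collapse_reads(reads)
    print(sum(collapsed.values()), "reads,", len(collapsed), "distinct sequences")
    print(perfect_matches(collapsed, known))
```

Exact string equality is the strictest possible matching rule; studies that allow mismatches or length variants would need an alignment step instead.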

Journal Article•DOI•

Large-scale analysis of full-length cDNAs from the tomato ( Solanum lycopersicum ) cultivar Micro-Tom, a reference system for the Solanaceae genomics

Koh Aoki, Kentaro Yano¹, Ayako Suzuki¹, Shingo Kawamura¹, Nozomu Sakurai, Kunihiro Suda, Atsushi Kurabayashi, Tatsuya Suzuki, Taneaki Tsugane, Manabu Watanabe, Kazuhide Ooga, Maiko Torii, Takanori Narita², Tadasu Shin-I², Yuji Kohara², Naoki Yamamoto¹, Hideki Takahashi³, Yuichiro Watanabe⁴, Mayumi Egusa⁵, Motoichiro Kodama⁵, Yuki Ichinose⁶, Mari Kikuchi⁷, Sumire Fukushima⁷, Akiko Okabe⁷, Tsutomu Arie⁷, Yuko Sato⁸, Katsumi Yazawa⁸, Shinobu Satoh⁸, Toshikazu Omura⁸, Hiroshi Ezura⁸, Daisuke Shibata

Meiji University¹, National Institute of Genetics², Tohoku University³, University of Tokyo⁴, Tottori University⁵, Okayama University⁶, Tokyo University of Agriculture and Technology⁷, University of Tsukuba⁸

30 Mar 2010-BMC Genomics

TL;DR: The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies and aid in tomato functional genomics and molecular breeding.

Abstract: The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp .

Journal Article•DOI•

A deep investigation into the adipogenesis mechanism: Profile of microRNAs regulating adipogenesis by modulating the canonical Wnt/β-catenin signaling pathway

Limei Qin¹, Yaosheng Chen¹, Yuna Niu¹, Weiquan Chen¹, Qiwei Wang¹, Shuqi Xiao¹, Anning Li¹, Ying Xie¹, Jing Li¹, Xiao Zhao¹, Zuyong He¹, Delin Mo¹

Sun Yat-sen University¹

23 May 2010-BMC Genomics

TL;DR: This study represents the first attempt to unveil the profile of miRNAs involved in adipogenesis by modulating the WNT signaling pathway, which contributes to deeper investigation of the mechanism of adipogenesis.

Abstract: MicroRNAs (miRNAs) are a large class of tiny non-coding RNAs (~22-24 nt) that regulate diverse biological processes at the posttranscriptional level by controlling mRNA stability or translation. As a molecular switch, the canonical Wnt/β-catenin signaling pathway should be suppressed during adipogenesis, whereas activation of this pathway leads to the inhibition of lipid depot formation. The aim of our studies was to identify miRNAs that might be involved in adipogenesis by modulating the WNT signaling pathway. Here we established two types of cell model, activation and repression of WNT signaling, and investigated the expression profile of microRNAs using a microarray assay. The high-throughput microarray data revealed 18 miRNAs that might promote adipogenesis by repressing WNT signaling, including miR-210, miR-148a, miR-194 and miR-322. We also identified 29 miRNAs that might have a negative effect on adipogenesis by activating WNT signaling, including miR-344, miR-27 and miR-181. The targets of these miRNAs were also analysed by bioinformatics. To validate the predicted targets and the potential functions of these identified miRNAs, mimics of miR-210 were transfected into 3T3-L1 cells, and enlarged cells with distinct lipid droplets were observed; meanwhile, transfection with the inhibitor of miR-210 markedly decreased differentiation-specific factors at the transcription level, which suggested a specific role for miR-210 in promoting adipogenesis. Tcf7l2, the predicted target of miR-210 and a transcription factor that triggers the downstream responsive genes of WNT signaling, was blocked at the transcription level. Furthermore, the activity of a luciferase reporter bearing the Tcf7l2 mRNA 3' UTR was decreased after co-transfection with miR-210 in HEK-293FT cells. Finally, the protein expression level of β-catenin was increased in lithium (LiCl)-treated 3T3-L1 cells after transfection with miR-210. These findings suggested that miR-210 could promote adipogenesis by repressing WNT signaling through targeting Tcf7l2. The results suggest the presence of miRNAs in two cell models, providing insights into WNT pathway-specific miRNAs that can be further characterized for their potential roles in adipogenesis. To our knowledge, the present study represents the first attempt to unveil the profile of miRNAs involved in adipogenesis by modulating the WNT signaling pathway, which contributes to deeper investigation of the mechanism of adipogenesis.

Journal Article•DOI•

Whole genome analysis of a livestock-associated methicillin-resistant Staphylococcus aureus ST398 isolate from a case of human endocarditis.

Maarten J Schijffelen¹, C H Edwin Boel¹, Jos A. G. van Strijp¹, Ad C. Fluit¹

University Medical Center Utrecht¹

14 Jun 2010-BMC Genomics

TL;DR: The proposed enhanced ability of these isolates to acquire mobile elements may lead to the rapid acquisition of determinants which contribute to virulence in human infections.

Abstract: Recently, a new livestock-associated methicillin-resistant Staphylococcus aureus (MRSA) Sequence Type 398 (ST398) isolate has emerged worldwide. Although there have been reports of invasive disease in humans, MRSA ST398 colonization is much more common in livestock and demonstrates especially high prevalence rates in pigs and calves. The aim of this study was to compare the genome sequence of an ST398 MRSA isolate with other S. aureus genomes in order to identify genetic traits that may explain the success of this particular lineage. Therefore, we determined the whole genome sequence of S0385, an MRSA ST398 isolate from a human case of endocarditis. The entire genome sequence of S0385 demonstrated considerable accessory genome content differences relative to other S. aureus genomes. Several mobile genetic elements that confer antibiotic resistance were identified, including a novel composite of a type V (5C2&5) Staphylococcal Chromosome Cassette mec (SCCmec) with distinct joining (J) regions. The presence of multiple integrative conjugative elements, combined with the absence of a type I restriction and modification system on one of the two νSa islands, could enhance horizontal gene transfer in this strain. The ST398 MRSA isolate carries a unique pathogenicity island which encodes homologues of two excreted virulence factors: staphylococcal complement inhibitor (SCIN) and von Willebrand factor-binding protein (vWbp). However, several virulence factors such as enterotoxins and phage-encoded toxins, including Panton-Valentine leukocidin (PVL), were not identified in this isolate. Until now MRSA ST398 isolates did not cause frequent invasive disease in humans, which may be due to the absence of several common virulence factors. However, the proposed enhanced ability of these isolates to acquire mobile elements may lead to the rapid acquisition of determinants which contribute to virulence in human infections.
