Author
Mi-Kyung Lee
Other affiliations: International Rice Research Institute
Bio: Mi-Kyung Lee is an academic researcher from Texas A&M University. The author has contributed to research in topics: Genome and Contig. The author has an h-index of 17 and has co-authored 19 publications receiving 1,870 citations. Previous affiliations of Mi-Kyung Lee include International Rice Research Institute.
Topics: Genome, Contig, Gene, Bacterial artificial chromosome, Contig Mapping
Papers
••
Virginia Tech1, Joint Genome Institute2, Lawrence Berkeley National Laboratory3, Wageningen University and Research Centre4, University of Warwick5, Imperial College London6, University of California, Berkeley7, Cornell University8, Ohio Agricultural Research and Development Center9, Agriculture and Agri-Food Canada10, Agricultural Research Service11, Lawrence Livermore National Laboratory12, North Carolina State University13, University of Tennessee14, Oak Ridge National Laboratory15, University of California, Merced16, University of Queensland17, Wilkes University18, Bowling Green State University19, Hokkaido University20
TL;DR: Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oömycete avirulence genes.
Abstract: Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oomycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oomycete avirulence genes.
1,016 citations
••
Virginia Tech1, United States Department of Agriculture2, University of Maryland, College Park3, Wageningen University and Research Centre4, European Bioinformatics Institute5, Roche Applied Science6, University of Edinburgh7, Virginia Bioinformatics Institute8, Utah State University9, National Institutes of Health10, University of California, Davis11, Michigan State University12, Texas A&M University13, Leipzig University14, Children's Hospital Oakland Research Institute15, Institute for Animal Health16, Seoul National University17, University of Marburg18, Wellcome Trust Sanger Institute19, University of Delaware20, University of Vienna21, University of Minnesota22
TL;DR: The combined application of next-generation sequencing platforms has provided an economical approach to unlocking the potential of the turkey genome.
Abstract: A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
415 citations
••
TL;DR: This map represents the first genome-wide, BAC-based physical map of the chicken genome and provides a powerful platform for many areas of chicken genomics, including targeted marker development, fine mapping of genes and QTL alleles, positional cloning, analysis of avian genome organization and evolution, chicken-mammalian comparative genomics and large-scale genome sequencing.
Abstract: A genome-wide physical map constructed with bacterial artificial chromosomes (BACs) is an essential component in linking phenotypic traits to the responsible genetic variation in the genomes of plants and animals. We have constructed a physical map of the chicken genome from 57,091 BACs (7.9-fold haploid genome coverage) by restriction fingerprint analysis using high-resolution polyacrylamide gel electrophoresis. The physical map consists of 2331 overlapping BAC contigs and is estimated to span 1510 Mb in physical length. BAC contigs were verified manually and by screening the BACs with 367 DNA markers. A total of 361 of the contigs have been anchored to the existing chicken genetic map. This map represents the first genome-wide, BAC-based physical map of the chicken genome. It provides a powerful platform for many areas of chicken genomics, including targeted marker development, fine mapping of genes and QTL alleles, positional cloning, analysis of avian genome organization and evolution, chicken-mammalian comparative genomics, and large-scale genome sequencing.
84 citations
••
TL;DR: It is found that the size variations of both gene families are associated with organisms’ phylogeny, suggesting their roles in speciation and evolution.
Abstract: Many genes exist in the form of families; however, little is known about their size variation, evolution and biology. Here, we present the size variation and evolution of the nucleotide-binding site (NBS)-encoding gene family and receptor-like kinase (RLK) gene family in Oryza, Glycine and Gossypium. The sizes of both families vary severalfold, not only among species but, surprisingly, also within a species. The size variations of the gene families are shown to correlate with each other, indicating their interactions, and to be driven by natural selection, artificial selection and genome size variation, but likely not by polyploidization. The numbers of genes in the families in a polyploid species are similar to those of one of its diploid donors, suggesting that polyploidization plays little role in the expansion of the gene families and that organisms tend not to maintain their ‘surplus’ genes in the course of evolution. Furthermore, it is found that the size variations of both gene families are associated with organisms’ phylogeny, suggesting their roles in speciation and evolution. Since both selection and speciation act on an organism’s morphological, physiological and biological variation, our results indicate that the variation of gene family size provides a source of genetic variation and evolution.
63 citations
••
TL;DR: Chicken BAC libraries constructed with three different restriction enzyme-generated inserts (HindIII, BamHI and EcoRI) should provide nearly full coverage of the chicken genome, suitable for high-resolution physical mapping and sequence analysis.
Abstract: Source/description: Large-insert BAC libraries have been essential components in the physical mapping and sequencing of the human genome and those of other species. Crooijmans et al. constructed a chicken BAC library with a 5.5-fold representation of the chicken genome using HindIII partial digest fragments. However, this is unlikely to provide full coverage by itself, as the use of only a single enzyme will bias against regions of the genome with unusually high or low densities of restriction sites. Kato et al. recently described a HindIII-based library of similar size, but this is not publicly available. This paper describes chicken BAC libraries constructed with three different restriction enzyme-generated inserts (HindIII, BamHI and EcoRI). Together, the three libraries should provide nearly full coverage of the chicken genome, suitable for high-resolution physical mapping and sequence analysis. All three libraries are publicly available, distributed as duplicated libraries, individual BACs and high-density colony filters. A female of the Red Jungle Fowl (Gallus gallus gallus) line UCD001 was the DNA source. A single bird from an inbred line was chosen to minimize heterozygosity that could impede eventual BAC contig assembly using fingerprint analysis. In addition, comparison of the UCD001 sequence to numerous available White Leghorn accessions should generate dense single-nucleotide polymorphism coverage of the genome...
54 citations
Cited by
••
TL;DR: This work proposes a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient, based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length.
Abstract: Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.
Supplementary information: Supplementary data are available at Bioinformatics online.
2,779 citations
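The k-mer counting problem described in the abstract above can be illustrated with a minimal sketch. This is not the Jellyfish algorithm: it is a plain single-threaded, dictionary-based counter using Python's `collections.Counter`, with none of the lock-free hashing or memory optimizations the abstract describes. The function name `count_kmers` and the example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of `sequence`.

    Hypothetical helper for illustration only; a naive baseline,
    not the multithreaded lock-free hash table used by Jellyfish.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the string. A string of length n
    # yields n - k + 1 k-mers, and none at all when n < k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example: the 3-mers of a short, made-up sequence.
counts = count_kmers("ACGTACGTAC", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A naive counter like this stores each k-mer as a separate string, so its memory use grows quickly with sequencing depth, which is the kind of cost the abstract says motivated a more memory-efficient tool.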
••
TL;DR: The sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy and transcriptomes of development and stress response and the proteome of the shell are reported, showing that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes.
Abstract: The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.
1,806 citations
••
Duke University1, University of Texas at Austin2, Heidelberg Institute for Theoretical Studies3, Beijing Genomics Institute4, Xi'an Jiaotong University5, American Museum of Natural History6, New Mexico State University7, University of Sydney8, University of California9, Uppsala University10, University of Copenhagen11, Okinawa Institute of Science and Technology12, University of Georgia13, Griffith University14, Catalan Institution for Research and Advanced Studies15, Joint Institute for Nuclear Research16, Oak Ridge National Laboratory17, Aarhus University18, Washington University in St. Louis19, University of California, Santa Cruz20, Cardiff University21, Kunming Institute of Zoology22, China Agricultural University23, Tulane University24, Louisiana State University25, Copenhagen Zoo26, Oregon Health & Science University27, Federal University of Pará28, Technical University of Denmark29, Canterbury Museum30, Curtin University31, Novosibirsk State University32, Smithsonian Institution33, National University of Singapore34, National Museum of Natural History35, Nova Southeastern University36, Occidental College37, University of Edinburgh38, Harvard University39, University of California, San Francisco40, University of Florida41, University of Illinois at Urbana–Champaign42
TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.
Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
1,624 citations
••
École Normale Supérieure1, J. Craig Venter Institute2, Joint Genome Institute3, Alfred Wegener Institute for Polar and Marine Research4, University of Konstanz5, University of Wisconsin–Milwaukee6, University of Melbourne7, University of Washington8, University of Nantes9, University of Wisconsin-Madison10, Ghent University11, University of Rhode Island12, Sewanee: The University of the South13, University of Arizona14, Hebrew University of Jerusalem15, Georgia Institute of Technology16, Leibniz Institute for Neurobiology17, Stazione Zoologica Anton Dohrn18, University of British Columbia19, Stanford University20, Scottish Association for Marine Science21, University of North Carolina at Wilmington22
TL;DR: Analysis of molecular divergence compared with yeasts and metazoans reveals rapid rates of gene diversification in diatoms, and documents the presence of hundreds of genes from bacteria, likely to provide novel possibilities for metabolite management and for perception of environmental signals.
Abstract: Diatoms are photosynthetic secondary endosymbionts found throughout marine and freshwater environments, and are believed to be responsible for around one-fifth of the primary productivity on Earth(1,2). The genome sequence of the marine centric diatom Thalassiosira pseudonana was recently reported, revealing a wealth of information about diatom biology(3-5). Here we report the complete genome sequence of the pennate diatom Phaeodactylum tricornutum and compare it with that of T. pseudonana to clarify evolutionary origins, functional significance and ubiquity of these features throughout diatoms. In spite of the fact that the pennate and centric lineages have only been diverging for 90 million years, their genome structures are dramatically different and a substantial fraction of genes (∼40%) are not shared by these representatives of the two lineages. Analysis of molecular divergence compared with yeasts and metazoans reveals rapid rates of gene diversification in diatoms. Contributing factors include selective gene family expansions, differential losses and gains of genes and introns, and differential mobilization of transposable elements. Most significantly, we document the presence of hundreds of genes from bacteria. More than 300 of these gene transfers are found in both diatoms, attesting to their ancient origins, and many are likely to provide novel possibilities for metabolite management and for perception of environmental signals. These findings go a long way towards explaining the incredible diversity and success of the diatoms in contemporary oceans.
1,500 citations
••
Broad Institute1, Ohio Agricultural Research and Development Center2, Sainsbury Laboratory3, Uppsala University4, Wageningen University and Research Centre5, Virginia Bioinformatics Institute6, University of California, Riverside7, University of Aberdeen8, Scottish Crop Research Institute9, University of Warwick10, Agricultural Research Service11, Royal Institute of Technology12, Cornell University13, Oregon State University14, Lafayette College15, University of Glasgow16, Harvard University17, Delaware Biotechnology Institute18, North Carolina State University19, University of Delaware20, University of Tennessee21, University of Maryland, Baltimore22, Vanderbilt University23, College of Wooster24, Bowling Green State University25, Edinburgh Cancer Research Centre26, J. Craig Venter Institute27, Tel Aviv University28, University of Wisconsin-Madison29, University of Hohenheim30, University of Dundee31
TL;DR: The sequence of the P. infestans genome is reported, which at ∼240 megabases (Mb) is by far the largest and most complex genome sequenced so far in the chromalveolates and probably plays a crucial part in the rapid adaptability of the pathogen to host plants and underpins its evolutionary potential.
Abstract: Phytophthora infestans is the most destructive pathogen of potato and a model organism for the oomycetes, a distinct lineage of fungus-like eukaryotes that are related to organisms such as brown algae and diatoms. As the agent of the Irish potato famine in the mid-nineteenth century, P. infestans has had a tremendous effect on human history, resulting in famine and population displacement(1). To this day, it affects world agriculture by causing the most destructive disease of potato, the fourth largest food crop and a critical alternative to the major cereal crops for feeding the world's population(1). Current annual worldwide potato crop losses due to late blight are conservatively estimated at $6.7 billion(2). Management of this devastating pathogen is challenged by its remarkable speed of adaptation to control strategies such as genetically resistant cultivars(3,4). Here we report the sequence of the P. infestans genome, which at ∼240 megabases (Mb) is by far the largest and most complex genome sequenced so far in the chromalveolates. Its expansion results from a proliferation of repetitive DNA accounting for ∼74% of the genome. Comparison with two other Phytophthora genomes showed rapid turnover and extensive expansion of specific families of secreted disease effector proteins, including many genes that are induced during infection or are predicted to have activities that alter host physiology. These fast-evolving effector genes are localized to highly dynamic and expanded regions of the P. infestans genome. This probably plays a crucial part in the rapid adaptability of the pathogen to host plants and underpins its evolutionary potential.
1,341 citations