Author

Alexander Kozik

Bio: Alexander Kozik is an academic researcher from University of California, Davis. The author has contributed to research in topics: Genome & Population. The author has an h-index of 32 and has co-authored 52 publications receiving 5003 citations. Previous affiliations of Alexander Kozik include University of California, Berkeley & University of California.


Papers
Journal Article
TL;DR: The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.
Abstract: The Arabidopsis genome contains ∼200 genes that encode proteins with similarity to the nucleotide binding site and other domains characteristic of plant resistance proteins. Through a reiterative process of sequence analysis and reannotation, we identified 149 NBS-LRR–encoding genes in the Arabidopsis (ecotype Columbia) genomic sequence. Fifty-six of these genes were corrected from earlier annotations. At least 12 are predicted to be pseudogenes. As described previously, two distinct groups of sequences were identified: those that encoded an N-terminal domain with Toll/Interleukin-1 Receptor homology (TIR-NBS-LRR, or TNL), and those that encoded an N-terminal coiled-coil motif (CC-NBS-LRR, or CNL). The encoded proteins are distinct from the 58 predicted adapter proteins in the previously described TIR-X, TIR-NBS, and CC-NBS groups. Classification based on protein domains, intron positions, sequence conservation, and genome distribution defined four subgroups of CNL proteins, eight subgroups of TNL proteins, and a pair of divergent NL proteins that lack a defined N-terminal motif. CNL proteins generally were encoded in single exons, although two subclasses were identified that contained introns in unique positions. TNL proteins were encoded in modular exons, with conserved intron positions separating distinct protein domains. Conserved motifs were identified in the LRRs of both CNL and TNL proteins. In contrast to CNL proteins, TNL proteins contained large and variable C-terminal domains. The extant distribution and diversity of the NBS-LRR sequences has been generated by extensive duplication and ectopic rearrangements that involved segmental duplications as well as microscale events. The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.

1,503 citations

Journal Article
TL;DR: The genome sequences of the diploid ancestors of cultivated peanut (Arachis duranensis and Arachis ipaensis) are reported. They are shown to be similar to cultivated peanut's A and B subgenomes and are used to identify candidate disease resistance genes, to guide tetraploid transcript assemblies and to detect genetic exchange between cultivated peanut's subgenomes.
Abstract: Cultivated peanut (Arachis hypogaea) is an allotetraploid with closely related subgenomes of a total size of ∼2.7 Gb. This makes the assembly of chromosomal pseudomolecules very challenging. As a foundation to understanding the genome of cultivated peanut, we report the genome sequences of its diploid ancestors (Arachis duranensis and Arachis ipaensis). We show that these genomes are similar to cultivated peanut's A and B subgenomes and use them to identify candidate disease resistance genes, to guide tetraploid transcript assemblies and to detect genetic exchange between cultivated peanut's subgenomes. On the basis of remarkably high DNA identity of the A. ipaensis genome and the B subgenome of cultivated peanut and biogeographic evidence, we conclude that A. ipaensis may be a direct descendant of the same population that contributed the B subgenome to cultivated peanut.

643 citations

Journal Article
TL;DR: It is suggested that paleopolyploidy can yield strikingly consistent signatures of gene retention in plant genomes despite extensive lineage radiations and recurrent genome duplications but that these patterns vary substantially among higher taxonomic categories.
Abstract: Of the approximately 250,000 species of flowering plants, nearly one in ten are members of the Compositae (Asteraceae), a diverse family found in almost every habitat on all continents except Antarctica. With an origin in the mid Eocene, the Compositae is also a relatively young family with remarkable diversifications during the last 40 My. Previous cytologic and systematic investigations suggested that paleopolyploidy may have occurred in at least one Compositae lineage, but a recent analysis of genomic data was equivocal. We tested for evidence of paleopolyploidy in the evolutionary history of the family using recently available expressed sequence tag (EST) data from the Compositae Genome Project. Combined with data available on GenBank, we analyzed nearly 1 million ESTs from 18 species representing seven genera and four tribes. Our analyses revealed at least three ancient whole-genome duplications in the Compositae: a paleopolyploidization shared by all analyzed taxa and placed near the origin of the family just prior to the rapid radiation of its tribes, and independent genome duplications near the base of the tribes Mutisieae and Heliantheae. These results are consistent with previous research implicating paleopolyploidy in the evolution and diversification of the Heliantheae. Further, we observed parallel retention of duplicate genes from the basal Compositae genome duplication across all tribes, despite divergence times of 33-38 My among these lineages. This pattern of retention was also repeated for the paleologs from the Heliantheae duplication. Intriguingly, the categories of genes retained in duplicate were substantially different from those in Arabidopsis. In particular, we found that genes annotated to structural components or cellular organization Gene Ontology categories were significantly enriched among paleologs, whereas genes associated with transcription and other regulatory functions were significantly underrepresented. Our results suggest that paleopolyploidy can yield strikingly consistent signatures of gene retention in plant genomes despite extensive lineage radiations and recurrent genome duplications but that these patterns vary substantially among higher taxonomic categories.

331 citations

Journal Article
TL;DR: This work identifies several genomic features that may have contributed to the success of the Compositae family of flowering plants, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome.
Abstract: Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence.

281 citations

Journal Article
31 Aug 2011 - PLOS ONE
TL;DR: Cantu et al. used Illumina sequencing to rapidly access the genomic sequence of the highly virulent race 130 (PST-130) of the wheat stripe rust fungus, Puccinia striiformis f. sp. tritici (PST); the reads were assembled into 29,178 contigs (64.8 Mb).
Authors: Cantu, Dario; Govindarajulu, Manjula; Kozik, Alex; Wang, Meinan; Chen, Xianming; Kojima, Kenji K; Jurka, Jerzy; Michelmore, Richard W; Dubcovsky, Jorge
Abstract: Background: The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately, Next Generation Sequencing (NGS) has radically improved sequencing speed and efficiency with a great reduction in costs compared to traditional sequencing technologies. We used Illumina sequencing to rapidly access the genomic sequence of the highly virulent PST race 130 (PST-130). Methodology/principal findings: We obtained nearly 80 million high quality paired-end reads (>50x coverage) that were assembled into 29,178 contigs (64.8 Mb), which provide an estimated coverage of at least 88% of the PST genes and are available through GenBank. Extensive micro-synteny with the Puccinia graminis f. sp. tritici (PGTG) genome and high sequence similarity with annotated PGTG genes support the quality of the PST-130 contigs. We characterized the transposable elements present in the PST-130 contigs and, using an ab initio gene prediction program, we identified and tentatively annotated 22,815 putative coding sequences. We provide examples of the use of comparative approaches to improve gene annotation for both PST and PGTG and to identify candidate effectors. Finally, the assembled contigs provided an inventory of PST repetitive elements, which were annotated and deposited in Repbase. Conclusions/significance: The assembly of the PST-130 genome and the predicted proteins provide useful resources to rapidly identify and clone PST genes and their regulatory regions. Although the automatic gene prediction has limitations, we show that a comparative genomics approach using multiple rust species can greatly improve the quality of gene annotation in these species. The PST-130 sequence will also be useful for comparative studies within PST as more races are sequenced. This study illustrates the power of NGS for rapid and efficient access to genomic sequence in non-model organisms.

181 citations


Cited by
Journal Article
TL;DR: Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements.
Abstract: We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.

8,315 citations

Journal Article
TL;DR: The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses.
Abstract: MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/.

3,388 citations

Journal Article
29 Jan 2009 - Nature
TL;DR: An initial analysis of the ∼730-megabase Sorghum bicolor (L.) Moench genome is presented, placing ∼98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information.
Abstract: Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.

2,809 citations

Journal Article
TL;DR: A short resumé of each fungus in the Top 10 list and its importance is presented, with the intent of initiating discussion and debate amongst the plant mycology community, as well as laying down a bench-mark.
Abstract: The aim of this review was to survey all fungal pathologists with an association with the journal Molecular Plant Pathology and ask them to nominate which fungal pathogens they would place in a 'Top 10' based on scientific/economic importance. The survey generated 495 votes from the international community, and resulted in the generation of a Top 10 fungal plant pathogen list for Molecular Plant Pathology. The Top 10 list includes, in rank order, (1) Magnaporthe oryzae; (2) Botrytis cinerea; (3) Puccinia spp.; (4) Fusarium graminearum; (5) Fusarium oxysporum; (6) Blumeria graminis; (7) Mycosphaerella graminicola; (8) Colletotrichum spp.; (9) Ustilago maydis; (10) Melampsora lini, with honourable mentions for fungi just missing out on the Top 10, including Phakopsora pachyrhizi and Rhizoctonia solani. This article presents a short resume of each fungus in the Top 10 list and its importance, with the intent of initiating discussion and debate amongst the plant mycology community, as well as laying down a bench-mark. It will be interesting to see in future years how perceptions change and what fungi will comprise any future Top 10.

2,807 citations

Journal Article
TL;DR: Current evidence indicates that MAMPs, DAMPs, and effectors are all perceived as danger signals and induce a stereotypic defense response, and the importance of MAMP/PRR signaling for plant immunity is highlighted.
Abstract: Microbe-associated molecular patterns (MAMPs) are molecular signatures typical of whole classes of microbes, and their recognition plays a key role in innate immunity. Endogenous elicitors are similarly recognized as damage-associated molecular patterns (DAMPs). This review focuses on the diversity of MAMPs/DAMPs and on progress to identify the corresponding pattern recognition receptors (PRRs) in plants. The two best-characterized MAMP/PRR pairs, flagellin/FLS2 and EF-Tu/EFR, are discussed in detail and put into a phylogenetic perspective. Both FLS2 and EFR are leucine-rich repeat receptor kinases (LRR-RKs). Upon treatment with flagellin, FLS2 forms a heteromeric complex with BAK1, an LRR-RK that also acts as coreceptor for the brassinolide receptor BRI1. The importance of MAMP/PRR signaling for plant immunity is highlighted by the finding that plant pathogens use effectors to inhibit PRR complexes or downstream signaling events. Current evidence indicates that MAMPs, DAMPs, and effectors are all perceived as danger signals and induce a stereotypic defense response.

2,801 citations