Author

Andrey Zharkikh

Other affiliations: Agency for Science, Technology and Research

Bio: Andrey Zharkikh is an academic researcher from Myriad Genetics who has contributed to research on the topics Gene and Genome. The author has an h-index of 17 and has co-authored 27 publications receiving 9,229 citations. Previous affiliations of Andrey Zharkikh include the Agency for Science, Technology and Research.

Topics: Gene, Genome, Shotgun sequencing, DNA sequencing, Concerted evolution ...

Papers

Journal Article

A draft sequence of the rice genome (Oryza sativa L. ssp indica)

Stephen A. Goff¹, Darrell O. Ricke¹, Tien-Hung Lan¹, Gernot G. Presting¹, Ronglin Wang¹, Molly Dunn¹, Jane Glazebrook¹, Allen Sessions¹, Paul Oeller¹, Hemant Varma¹, David Hadley¹, Don Hutchison¹, Christopher M. Martin¹, Fumiaki Katagiri¹, B. Markus Lange¹, Todd Moughamer¹, Yu Xia¹, Paul Budworth¹, Jingping Zhong¹, Trini Miguel¹, Uta Paszkowski¹, Shiping Zhang¹, Michelle Colbert¹, Wei-lin Sun¹, Lili Chen¹, Bret Cooper¹, Sylvia Park¹, Todd Charles Wood², Long Mao³, Peter H. Quail⁴, Rod A. Wing⁵, Ralph A. Dean⁵, Yeisoo Yu⁵, Andrey Zharkikh⁶, Richard Shen⁶, Sudhir Sahasrabudhe⁶, Alun Thomas⁶, Rob Cannings⁶, Alexander Gutin⁶, Dmitry Pruss⁶, Julia Reid⁶, Sean V. Tavtigian⁶, J.T. Mitchell⁶, Glenn Eldredge⁶, Terri Scholl⁶, Rose Mary Miller⁶, Satish Bhatnagar⁶, Nils Adey⁶, Todd Rubano⁶, Nadeem Tusneem⁶, Rosann Robinson⁶, Jane Feldhaus⁶, Teresita Macalma⁶, Arnold R. Oliphant⁶, Steven P. Briggs¹

Syngenta¹, Bryan College², Northern Illinois University³, University of California, Berkeley⁴, Clemson University⁵, Myriad Genetics⁶

05 Apr 2002-Science

TL;DR: A draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, is produced by whole-genome shotgun sequencing; the large proportion of rice genes with no recognizable homologs is attributed to a gradient in the GC content of rice coding sequences.

Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.

4,064 citations

Journal Article

The genome of the domesticated apple (Malus × domestica Borkh.)

Riccardo Velasco, Andrey Zharkikh¹, Jason P. Affourtit², Amit Dhingra³, Alessandro Cestaro, Ananth Kalyanaraman³, Paolo Fontana, Satish Bhatnagar¹, Michela Troggio, Dmitry Pruss¹, Silvio Salvi⁴, Massimo Pindo, Paolo Baldi, Sara Castelletti, Marina Cavaiuolo, G. Coppola, Fabrizio Costa, V. Cova, Antonio Dal Ri, Vadim V. Goremykin, M. Komjanc, Sara Longhi, P. Magnago, Giulia Malacarne, Mickael Malnoy, Diego Micheletti, Marco Moretto, Michele Perazzolli, Azeddine Si-Ammour, Silvia Vezzulli, E. Zini, Glenn Eldredge¹, Lisa M. Fitzgerald¹, N. Gutin¹, Jerry S. Lanchbury¹, Teresita Macalma¹, J.T. Mitchell¹, Julia Reid¹, Bryan Wardell¹, Chinnappa D. Kodira², Zhoutao Chen², Brian Desany², Faheem Niazi², Melinda Palmer², Tyson Koepke³, Derick Jiwan³, Scott Schaeffer³, Vandhana Krishnan³, Changjun Wu³, Vu T. Chu⁵, Stephen T. King⁵, Jessica Vick⁵, Quanzhou Tao, Amy Mraz, Aimee Stormo, Keith E. Stormo, Robert Bogden, Davide Ederle⁶, Alessandra Stella⁶, Alberto Vecchietti⁶, Martin M. Kater⁷, Simona Masiero⁷, Pauline Lasserre, Yves Lespinasse, Andrew C. Allan⁸, Vincent G. M. Bus⁸, David Chagné⁸, Ross N. Crowhurst⁸, Andrew P. Gleave⁸, Enrico Lavezzo⁹, Jeffrey A. Fawcett¹⁰, Jeffrey A. Fawcett¹¹, Sebastian Proost¹⁰, Sebastian Proost¹¹, Pierre Rouzé¹¹, Pierre Rouzé¹⁰, Lieven Sterck¹⁰, Lieven Sterck¹¹, Stefano Toppo⁹, Barbara Lazzari⁶, Roger P. Hellens⁸, Charles-Eric Durel, Alexander Gutin¹, Roger E. Bumgarner⁵, Susan E. Gardiner⁸, Mark H. Skolnick¹, Michael Egholm², Yves Van de Peer¹⁰, Yves Van de Peer¹¹, Francesco Salamini⁶, Roberto Viola

Myriad Genetics¹, Hoffmann-La Roche², Washington State University³, University of Bologna⁴, University of Washington⁵, Parco Tecnologico Padano⁶, University of Milan⁷, Plant & Food Research⁸, University of Padua⁹, Flanders Institute for Biotechnology¹⁰, Ghent University¹¹

01 Oct 2010-Nature Genetics

TL;DR: It is shown that a relatively recent (>50 million years ago) genome-wide duplication has resulted in the transition from nine ancestral chromosomes to 17 chromosomes in the Pyreae, and that traces of older genome-wide duplications partly support the monophyly of the ancestral paleohexaploidy of eudicots.

Abstract: We report a high-quality draft genome sequence of the domesticated apple (Malus × domestica). We show that a relatively recent (>50 million years ago) genome-wide duplication (GWD) has resulted in the transition from nine ancestral chromosomes to 17 chromosomes in the Pyreae. Traces of older GWDs partly support the monophyly of the ancestral paleohexaploidy of eudicots. Phylogenetic reconstruction of Pyreae and the genus Malus, relative to major Rosaceae taxa, identified the progenitor of the cultivated apple as M. sieversii. Expansion of gene families reported to be involved in fruit development may explain formation of the pome, a Pyreae-specific false fruit that develops by proliferation of the basal part of the sepals, the receptacle. In apple, a subclade of MADS-box genes, normally involved in flower and fruit development, is expanded to include 15 members, as are other gene families involved in Rosaceae-specific metabolism, such as transport and assimilation of sorbitol.

1,718 citations

Journal Article

Whole-Genome Shotgun Assembly and Analysis of the Genome of Fugu rubripes

Samuel Aparicio¹, Jarrod Chapman¹, Elia Stupka¹, Nik Putnam¹, Jer Ming Chia¹, Paramvir S. Dehal¹, Alan Christoffels¹, Sam Rash¹, Shawn Hoon¹, Arian F.A. Smit¹, Maarten D. Sollewijn Gelpke¹, Jared C. Roach¹, Tania Oh¹, Isaac Ho¹, Marie Wong¹, Chris Detter¹, Frans Verhoef¹, Paul Predki¹, Alice Tay¹, Susan Lucas¹, Paul G. Richardson¹, Sarah Smith¹, Melody S. Clark¹, Yvonne J. K. Edwards¹, Norman A. Doggett¹, Andrey Zharkikh¹, Sean V. Tavtigian¹, Dmitry Pruss¹, Mary Barnstead¹, Cheryl Evans¹, Holly Baden¹, Justin Powell¹, Gustavo Glusman¹, Lee Rowen¹, Leroy Hood¹, Y. H. Tan¹, Greg Elgar¹, Trevor Hawkins¹, Byrappa Venkatesh¹, Daniel S. Rokhsar¹, Sydney Brenner¹

Agency for Science, Technology and Research¹

23 Aug 2002-Science

TL;DR: The compact Fugu rubripes genome has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds.

Abstract: The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some “giant” genes were observed that had average coding sequence sizes but were spread over genomic lengths significantly larger than those of their human orthologs. Although three-quarters of predicted human proteins have a strong match to Fugu, approximately a quarter of the human proteins had highly diverged from or had no pufferfish homologs, highlighting the extent of protein evolution in the 450 million years since teleosts and mammals diverged. Conserved linkages between Fugu and human genes indicate the preservation of chromosomal segments from the common vertebrate ancestor, but with considerable scrambling of gene order.

1,446 citations

Journal Article

A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

Riccardo Velasco, Andrey Zharkikh¹, Michela Troggio, Dustin Cartwright¹, Alessandro Cestaro, Dmitry Pruss¹, Massimo Pindo, Lisa M. Fitzgerald¹, Silvia Vezzulli, Julia Reid¹, Giulia Malacarne, Diana Iliev¹, G. Coppola, Bryan Wardell¹, Diego Micheletti, Teresita Macalma¹, Marco Facci, J.T. Mitchell¹, Michele Perazzolli, Glenn Eldredge¹, Pamela Gatto, Rozan Oyzerski¹, Marco Moretto, N. Gutin¹, Marco Stefanini, Yang Chen¹, C. Segala, Christine Davenport¹, Lorenzo Dematte, Amy Mraz, Juri Battilana, Keith E. Stormo, Fabrizio Costa, Quanzhou Tao, Azeddine Si-Ammour, Tim Harkins², Angie Lackey², Clotilde Perbost, Bruce E Taillon, Alessandra Stella, Victor V. Solovyev³, Jeffrey A. Fawcett⁴, Lieven Sterck⁴, Klaas Vandepoele⁴, Stella M. Grando, Stefano Toppo, Claudio Moser, Jerry S. Lanchbury¹, Robert Bogden, Mark H. Skolnick¹, Vittorio Sgaramella, Satish Bhatnagar¹, Paolo Fontana, Alexander Gutin¹, Yves Van de Peer⁴, Francesco Salamini, Roberto Viola

Myriad Genetics¹, Roche Applied Science², Royal Holloway, University of London³, Ghent University⁴

19 Dec 2007-PLOS ONE

TL;DR: A high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens.

Abstract: Background. Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings. We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions. Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape.

1,005 citations

Journal Article

Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral

Sean V. Tavtigian¹, Amie M. Deffenbaugh², Luo Yin¹, Thaddeus Judkins², Thomas Scholl², Paul B. Samollow³, Deepika de Silva¹, Andrey Zharkikh², Alun Thomas

International Agency for Research on Cancer¹, Myriad Genetics², Texas Biomedical Research Institute³

13 Jul 2005-Journal of Medical Genetics

TL;DR: Odds ratios estimated for sets of substitutions grouped by A-GVGD scores are consistent with the hypothesis that most unclassified substitutions that are within the cross-species range of variation at their position in BRCA1 are also neutral.

Abstract: Background: Genetic testing for hereditary cancer syndromes contributes to the medical management of patients who may be at increased risk of one or more cancers. BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer is one such widely used test. However, clinical testing methods with high sensitivity for deleterious mutations in these genes also detect many unclassified variants, primarily missense substitutions. Methods: We developed an extension of the Grantham difference, called A-GVGD, to score missense substitutions against the range of variation present at their position in a multiple sequence alignment. Combining two methods, co-occurrence of unclassified variants with clearly deleterious mutations and A-GVGD, we analysed most of the missense substitutions observed in BRCA1. Results: A-GVGD was able to resolve known neutral and deleterious missense substitutions into distinct sets. Additionally, eight previously unclassified BRCA1 missense substitutions observed in trans with one or more deleterious mutations, and within the cross-species range of variation observed at their position in the protein, are now classified as neutral. Discussion: The methods combined here can classify as neutral about 50% of missense substitutions that have been observed with two or more clearly deleterious mutations. Furthermore, odds ratios estimated for sets of substitutions grouped by A-GVGD scores are consistent with the hypothesis that most unclassified substitutions that are within the cross-species range of variation at their position in BRCA1 are also neutral. For most of these, clinical reclassification will require integrated application of other methods such as pooled family histories, segregation analysis, or validated functional assay.

615 citations

Cited by

Journal Article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Arabidopsis Genome Initiative¹

J. Craig Venter Institute¹

14 Dec 2000-Nature

TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, and 219 more authors (26 institutions)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal Article

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Ewan Birney, John A. Stamatoyannopoulos, Anindya Dutta, Roderic Guigó, and 317 more authors (44 institutions)

14 Jun 2007-Nature

TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.

Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations

Journal Article

The COG database: an updated version includes eukaryotes

Roman L. Tatusov¹, Natalie D. Fedorova¹, John D. Jackson¹, Aviva R. Jacobs¹, Boris Kiryutin¹, Eugene V. Koonin¹, Dmitri M. Krylov¹, Raja Mazumder², Sergei L. Mekhedov¹, Anastasia N. Nikolskaya², B Sridhar Rao¹, Sergei Smirnov¹, Alexander V. Sverdlov¹, Sona Vasudevan¹, Yuri I. Wolf¹, Jodie J. Yin¹, Darren A. Natale²

National Institutes of Health¹, Georgetown University Medical Center²

11 Sep 2003-BMC Bioinformatics

TL;DR: A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described; the updated collection is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.

Abstract: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

4,167 citations

Journal Article

Evolution of Protein Molecules

S. Jeffery

01 Apr 1979-Biochemical Society Transactions

3,734 citations
