
Author

Jessica Vamathevan

Other affiliations: TigerLogic, University College London, GlaxoSmithKline ...

Bio: Jessica Vamathevan is an academic researcher at the European Bioinformatics Institute. The author has contributed to research in the topics Genome and Gene, has an h-index of 21, and has co-authored 35 publications that have received 10,301 citations. Previous affiliations include TigerLogic and University College London.

Topics: Genome, Gene, Replicon, Plasmid, Hepatitis C virus ...

Papers


Journal Article

DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae


John F. Heidelberg, Jonathan A. Eisen, William C. Nelson, Rebecca A. Clayton, Michelle L. Gwinn, Robert J. Dodson, Daniel H. Haft, Erin Hickey, Jeremy Peterson, Lowell Umayam, Steven R. Gill, Karen E. Nelson, Timothy D. Read, Hervé Tettelin, Delwood Richardson, Maria D. Ermolaeva, Jessica Vamathevan, Steven Bass, Haiying Qin, Ioana Dragoi, Patrick Sellers, Lisa McDonald, Teresa Utterback, Robert D. Fleischmann, William C. Nierman, Owen White, Steven L. Salzberg, Hamilton O. Smith¹, Rita R. Colwell²³, John J. Mekalanos⁴, J. Craig Venter¹, Claire M. Fraser

Celera Corporation¹, University of Maryland Biotechnology Institute², University of Maryland, College Park³, Harvard University⁴

03 Aug 2000-Nature

TL;DR: The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.


Abstract: Here we determine the complete genomic sequence of the Gram-negative γ-Proteobacterium Vibrio cholerae El Tor N16961 to be 4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that together encode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication, transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) are located on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genes compared with the large chromosome (42%), and also contains many more genes that appear to have origins other than the γ-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host ‘addiction’ genes that are typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by an ancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.


1,785 citations

Journal Article

Complete Genome Sequence of Neisseria meningitidis Serogroup B Strain MC58


Hervé Tettelin¹, Nigel J. Saunders², John F. Heidelberg¹, Alex C. Jeffries², Karen E. Nelson¹, Jonathan A. Eisen¹, Karen A. Ketchum¹, Derek W. Hood², John F. Peden², Robert J. Dodson¹, William C. Nelson¹, Michelle L. Gwinn¹, Robert T. DeBoy¹, Jeremy Peterson¹, Erin Hickey¹, Daniel H. Haft¹, Steven L. Salzberg¹, Owen White¹, Robert D. Fleischmann¹, Brian Dougherty¹, Tanya Mason¹, Anne Ciecko¹, Debbie S. Parksey¹, Eric Blair¹, Henry Cittone¹, Emily B. Clark¹, Matthew D. Cotton¹, T. Utterback¹, Hoda Khouri¹, Haiying Qin¹, Jessica Vamathevan¹, John Gill¹, Vincenzo Scarlato, Vega Masignani, Mariagrazia Pizza, Guido Grandi, Li Sun², Hamilton O. Smith¹, Claire M. Fraser¹, E. Richard Moxon², Rino Rappuoli, J. Craig Venter¹

J. Craig Venter Institute¹, University of Oxford²

10 Mar 2000-Science

TL;DR: Neisseria meningitidis contains more genes that undergo phase variation than any pathogen studied to date, a mechanism that controls their expression and contributes to the evasion of the host immune system.


Abstract: The 2,272,351-base pair genome of Neisseria meningitidis strain MC58 (serogroup B), a causative agent of meningitis and septicemia, contains 2158 predicted coding regions, 1158 (53.7%) of which were assigned a biological role. Three major islands of horizontal DNA transfer were identified; two of these contain genes encoding proteins involved in pathogenicity, and the third island contains coding sequences only for hypothetical proteins. Insights into the commensal and virulence behavior of N. meningitidis can be gleaned from the genome, in which sequences for structural proteins of the pilus are clustered and several coding regions unique to serogroup B capsular polysaccharide synthesis can be identified. Finally, N. meningitidis contains more genes that undergo phase variation than any pathogen studied to date, a mechanism that controls their expression and contributes to the evasion of the host immune system.


1,197 citations

Journal Article

Applications of machine learning in drug discovery and development.


Jessica Vamathevan¹, Dominic Clark¹, Paul Czodrowski², Ian Dunham¹, Edgardo Ferran¹, George Lee³, Bin Li⁴, Anant Madabhushi⁵⁶, Parantu K. Shah⁷, Michaela Spitzer¹, Shanrong Zhao⁸

European Bioinformatics Institute¹, Technical University of Dortmund², Bristol-Myers Squibb³, Takeda Pharmaceutical Company⁴, Case Western Reserve University⁵, Veterans Health Administration⁶, Merck Serono⁷, Pfizer⁸

01 Jun 2019-Nature Reviews Drug Discovery

TL;DR: The most useful techniques and how machine learning can promote data-driven decision making in drug discovery and development are discussed and major hurdles in the field are highlighted.


Abstract: Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development. Machine learning has been applied to numerous stages in the drug discovery pipeline. Here, Vamathevan and colleagues discuss the most useful techniques and how machine learning can promote data-driven decision making in drug discovery and development. They highlight major hurdles in the field, such as the required data characteristics for applying machine learning, which will need to be solved as machine learning matures.


1,159 citations

Journal Article

Insights on Evolution of Virulence and Resistance from the Complete Genome Analysis of an Early Methicillin-Resistant Staphylococcus aureus Strain and a Biofilm-Producing Methicillin-Resistant Staphylococcus epidermidis Strain


Steven R. Gill¹, Derrick E. Fouts¹, Gordon L. Archer², Emmanuel F. Mongodin¹, Robert T. DeBoy¹, Jacques Ravel¹, Ian T. Paulsen¹, James F. Kolonay¹, Lauren M. Brinkac¹, Mauren Beanan¹, Robert J. Dodson¹, Sean C. Daugherty¹, R. Madupu¹, Samuel V. Angiuoli¹, A. Scott Durkin¹, Daniel H. Haft¹, Jessica Vamathevan¹, H. Khouri¹, T. Utterback¹, Chris Lee¹, George Dimitrov¹, Lingxia Jiang¹, Haiying Qin¹, Jan Weidman¹, Kevin Tran¹, Kathy Kang¹, Ioana R. Hance¹, Karen E. Nelson¹, Claire M. Fraser¹

TigerLogic¹, Virginia Commonwealth University²

01 Apr 2005-Journal of Bacteriology

TL;DR: Gene transfer between staphylococci and low-GC-content gram-positive bacteria appears to have shaped their virulence and resistance profiles, and overall differences in pathogenicity can be attributed to genome islands in S. aureus and S. epidermidis.


Abstract: Staphylococcus aureus is an opportunistic pathogen and the major causative agent of numerous hospital- and community-acquired infections. Staphylococcus epidermidis has emerged as a causative agent of infections often associated with implanted medical devices. We have sequenced the ∼2.8-Mb genome of S. aureus COL, an early methicillin-resistant isolate, and the ∼2.6-Mb genome of S. epidermidis RP62a, a methicillin-resistant biofilm isolate. Comparative analysis of these and other staphylococcal genomes was used to explore the evolution of virulence and resistance between these two species. The S. aureus and S. epidermidis genomes are syntenic throughout their lengths and share a core set of 1,681 open reading frames. Genome islands in nonsyntenic regions are the primary source of variations in pathogenicity and resistance. Gene transfer between staphylococci and low-GC-content gram-positive bacteria appears to have shaped their virulence and resistance profiles. Integrated plasmids in S. epidermidis carry genes encoding resistance to cadmium and species-specific LPXTG surface proteins. A novel genome island encodes multiple phenol-soluble modulins, a potential S. epidermidis virulence factor. S. epidermidis contains the cap operon, encoding the polyglutamate capsule, a major virulence factor in Bacillus anthracis. Additional phenotypic differences are likely the result of single nucleotide polymorphisms, which are most numerous in cell envelope proteins. Overall differences in pathogenicity can be attributed to genome islands in S. aureus which encode enterotoxins, exotoxins, leukocidins, and leukotoxins not found in S. epidermidis.


1,075 citations

Journal Article

Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1.


Owen White, Jonathan A. Eisen, John F. Heidelberg, Erin Hickey, Jeremy Peterson, Robert J. Dodson, Daniel H. Haft, Michelle L. Gwinn, William C. Nelson, Delwood Richardson, Kelly Moffat, Haiying Qin, Lingxia Jiang, W. Pamphile, M. Crosby, Mian Shen, Jessica Vamathevan, P. Lam, Lisa McDonald, T. Utterback, C. Zalewski, Kira S. Makarova¹, L. Aravind¹, Michael J. Daly², Kenneth W. Minton², Robert D. Fleischmann, K. A. Ketchum, Karen E. Nelson, Steven L. Salzberg, Hamilton O. Smith, J. C. Venter³⁴, Claire M. Fraser

National Institutes of Health¹, Uniformed Services University of the Health Sciences², J. Craig Venter Institute³, Celera Corporation⁴

19 Nov 1999-Science

TL;DR: Deinococcus radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.


Abstract: The complete genome sequence of the radiation-resistant bacterium Deinococcus radiodurans R1 is composed of two chromosomes (2,648,638 and 412,348 base pairs), a megaplasmid (177,466 base pairs), and a small plasmid (45,704 base pairs), yielding a total genome of 3,284,156 base pairs. Multiple components distributed on the chromosomes and megaplasmid that contribute to the ability of D. radiodurans to survive under conditions of starvation, oxidative stress, and high amounts of DNA damage were identified. Deinococcus radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.


931 citations


Cited by


Variation in var gene switching rates leads to antigenic variation in the malaria parasite [original in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A


宁北芳, 朱淮民

28 Jul 2005

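TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.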


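Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.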


18,940 citations

Journal Article

The sequence of the human genome.


J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ and 269 more (12 institutions)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.


Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.


12,098 citations

Journal Article

The PRIDE database and related tools and resources in 2019: improving support for quantification data.


Yasset Perez-Riverol¹, Attila Csordas¹, Jingwen Bai¹, Manuel Bernal-Llinares¹, Suresh Hewapathirana¹, Deepti J. Kundu¹, Avinash Inuganti¹, Johannes Griss¹², Gerhard Mayer³, Martin Eisenacher³, Enrique Perez¹, Julian Uszkoreit³, Julianus Pfeuffer⁴, Timo Sachsenberg⁴, Şule Yılmaz⁵, Shivani Tiwary⁵, Juergen Cox⁵, Enrique Audain, Mathias Walzer¹, Andrew F. Jarnuczak¹, Tobias Ternent¹, Alvis Brazma¹, Juan Antonio Vizcaíno¹

European Bioinformatics Institute¹, Medical University of Vienna², Ruhr University Bochum³, University of Tübingen⁴, Max Planck Society⁵

08 Jan 2019-Nucleic Acids Research

TL;DR: Key statistics on the current data contents and volume of downloads are outlined, along with how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.


Abstract: The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.


5,735 citations

Journal Article

A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species


Robert J. Elshire¹, Jeffrey C. Glaubitz¹, Qi-ying Sun¹, Jesse Poland², Ken Kawamoto¹, Edward S. Buckler¹², Sharon E. Mitchell¹

Cornell University¹, United States Department of Agriculture²

04 May 2011-PLOS ONE

TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.


Abstract: Advances in next-generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two- to threefold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.


5,163 citations

Journal Article

Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.


Charles K. Stover, X. Q. Pham¹, A. L. Erwin, S. D. Mizoguchi, Paul Warrener, Mark J. Hickey, Fiona S. L. Brinkman², W. O. Hufnagle, D. J. Kowalik, M. J. Lagrou, R. L. Garber, L. Goltry, E. Tolentino, S. Westbrock-Wadman, Ying Yuan, L. L. Brody, S. N. Coulter, K. R. Folger, Arnold Kas¹, K. Larbig³, R. Lim¹, Kelly D. Smith¹, David H. Spencer¹, Gane Ka-Shu Wong¹, Z. Wu¹, Ian T. Paulsen⁴⁵, Jonathan Reizer⁴, Milton H. Saier⁴, Robert E. W. Hancock², Stephen Lory¹, Maynard V. Olson¹

University of Washington¹, University of British Columbia², Hochschule Hannover³, University of California, San Diego⁴, Research Medical Center⁵

31 Aug 2000-Nature

TL;DR: It is proposed that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.


Abstract: Pseudomonas aeruginosa is a ubiquitous environmental bacterium that is one of the top three causes of opportunistic human infections. A major factor in its prominence as a pathogen is its intrinsic resistance to antibiotics and disinfectants. Here we report the complete sequence of P. aeruginosa strain PAO1. At 6.3 million base pairs, this is the largest bacterial genome sequenced, and the sequence provides insights into the basis of the versatility and intrinsic drug resistance of P. aeruginosa. Consistent with its larger genome size and environmental adaptability, P. aeruginosa contains the highest proportion of regulatory genes observed for a bacterial genome and a large number of genes involved in the catabolism, transport and efflux of organic compounds as well as four potential chemotaxis systems. We propose that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.


4,220 citations
