Author

John Lopez

Other affiliations: European Bioinformatics Institute

Bio: John Lopez is an academic researcher from Celera Corporation. The author has contributed to research on the topics Genome and Gene density, has an h-index of 6, and has co-authored 6 publications receiving 16,785 citations. Previous affiliations of John Lopez include the European Bioinformatics Institute.

Topics: Genome, Gene density, Genome evolution, Reference genome, Genome project

Papers

Journal Article

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

A global reference for human genetic variation

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin +476 more

01 Oct 2015

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. This paper reports the completion of the project, which reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations

Journal Article

The Genome Sequence of the Malaria Mosquito Anopheles gambiae

Robert A. Holt¹, G. Mani Subramanian¹, Aaron L. Halpern¹, Granger G. Sutton¹, Rosane Charlab¹, Deborah R. Nusskern¹, Patrick Wincker², Andrew G. Clark³, José M. C. Ribeiro⁴, Ron Wides⁵, Steven L. Salzberg⁶, Brendan J. Loftus⁶, Mark Yandell¹, William H. Majoros¹, William H. Majoros⁶, Douglas B. Rusch¹, Zhongwu Lai¹, Cheryl L. Kraft¹, Josep F. Abril, Véronique Anthouard², Peter Arensburger⁷, Peter W. Atkinson⁷, Holly Baden¹, Véronique de Berardinis², Danita Baldwin¹, Vladimir Benes, Jim Biedler⁸, Claudia Blass, Randall Bolanos¹, Didier Boscus², Mary Barnstead¹, Shuang Cai¹, Kabir Chatuverdi¹, George K. Christophides, Mathew A. Chrystal⁹, Michele Clamp¹⁰, Anibal Cravchik¹, Val Curwen¹⁰, Ali N. Dana⁹, Arthur L. Delcher¹, Ian M. Dew¹, Cheryl A. Evans¹, Michael Flanigan¹, Anne Grundschober-Freimoser¹¹, Lisa Friedli⁷, Zhiping Gu¹, Ping Guan¹, Roderic Guigó, Maureen E. Hillenmeyer⁹, Susanne L. Hladun¹, James R. Hogan⁹, Young S. Hong⁹, Jeffrey Hoover¹, Olivier Jaillon², Zhaoxi Ke⁹, Zhaoxi Ke¹, Chinnappa D. Kodira¹, Kokoza Eb, Anastasios C. Koutsos¹², Ivica Letunic, Alex Levitsky¹, Yong Liang¹, Jhy-Jhu Lin¹, Jhy-Jhu Lin⁶, Neil F. Lobo⁹, John Lopez¹, Joel A. Malek⁶, Tina C. McIntosh¹, Stephan Meister, Jason R. Miller¹, Clark M. Mobarry¹, Emmanuel Mongin¹³, Sean D. Murphy¹, David A. O'Brochta¹¹, Cynthia Pfannkoch¹, Rong Qi¹, Megan A. Regier¹, Karin A. Remington¹, Hongguang Shao⁸, Maria V. Sharakhova⁹, Cynthia Sitter¹, Jyoti Shetty⁶, Thomas J. Smith¹, Renee Strong¹, Jingtao Sun¹, Dana Thomasova, Lucas Q. Ton⁹, Pantelis Topalis¹², Zhijian Tu⁸, Maria F. Unger⁹, Brian P. Walenz¹, Aihui Wang¹, Jian Wang¹, Mei Wang¹, X. Wang⁹, Kerry J. Woodford¹, Jennifer R. Wortman⁶, Jennifer R. Wortman¹, Martin Wu⁶, Alison Yao¹, Evgeny M. Zdobnov, Hongyu Zhang¹, Qi Zhao¹, Shaying Zhao⁶, Shiaoping C. Zhu¹, Igor F. Zhimulev, Mario Coluzzi¹⁴, Alessandra della Torre¹⁴, Charles Roth¹⁵, Christos Louis¹², Francis Kalush¹, Richard J. Mural¹, Eugene W. Myers¹, Mark Raymond Adams¹, Hamilton O. Smith¹, Samuel Broder¹, Malcolm J. Gardner⁶, Claire M. Fraser⁶, Ewan Birney¹³, Peer Bork, Paul T. Brey¹⁵, J. Craig Venter⁶, J. Craig Venter¹, Jean Weissenbach², Fotis C. Kafatos, Frank H. Collins⁹, Stephen L. Hoffman¹ +123 more•Institutions (15)

Celera Corporation¹, Centre national de la recherche scientifique², Cornell University³, National Institutes of Health⁴, Bar-Ilan University⁵, The Institute for Genomic Research⁶, University of California, Riverside⁷, Virginia Tech⁸, University of Notre Dame⁹, Wellcome Trust Sanger Institute¹⁰, University of Maryland Biotechnology Institute¹¹, University of Crete¹², European Bioinformatics Institute¹³, Sapienza University of Rome¹⁴, Pasteur Institute¹⁵

04 Oct 2002-Science

TL;DR: Analysis of the PEST strain of A. gambiae revealed strong evidence for about 14,000 protein-encoding transcripts, and prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted.

Abstract: Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

2,033 citations

Journal Article

The Sequence of the Human Genome

J. Craig Venter¹, Mark Raymond Adams, Eugene W. Myers, Peter W. Li +269 more•Institutions (1)

01 Sep 2015-Clinical Chemistry

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

1,674 citations

Journal Article

A Comparison of Whole-Genome Shotgun-Derived Mouse Chromosome 16 and the Human Genome

Richard J. Mural¹, Mark Raymond Adams¹, Eugene W. Myers¹, Hamilton O. Smith¹ +171 more•Institutions (3)

31 May 2002-Science

TL;DR: Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22.

Abstract: The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted genes on Mmu 16, 509 align with orthologs on the corresponding portions of the human genome, 44 are likely paralogous to these genes, and 164 genes have homologs elsewhere in the human genome; there are 14 genes for which we could find no human counterpart.

389 citations

Cited by

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino¹, Ewan Birney¹•Institutions (1)

European Bioinformatics Institute¹

01 May 2008-Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies. On real Solexa data sets without read pairs, its contig lengths were in close agreement with simulated results without read-pair information.

Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
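As a rough illustration of two terms used in this abstract, the sketch below builds a textbook de Bruijn graph from k-mers and computes an N50 contig length. It is a minimal sketch under stated assumptions, not Velvet's own data structure or code: the function names `de_bruijn_edges` and `n50` are hypothetical, reverse complements and sequencing errors are ignored, and the reads are toy strings.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Textbook de Bruijn graph (an assumption, not Velvet's internal layout).

    Each k-mer in a read contributes one edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Repeated k-mers yield repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downward, the running total first reaches half of the summed
    length of all contigs. Returns 0 for an empty list.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG"]  # hypothetical toy reads
    graph = de_bruijn_edges(toy_reads, k=4)
    for prefix, suffixes in sorted(graph.items()):
        print(prefix, "->", suffixes)
    print("N50:", n50([5, 4, 3, 2, 1]))  # prints "N50: 4"
```

In this form, contigs correspond to unbranched paths through the graph, and N50 summarizes how much of the assembly sits in long contigs; the 50-kb and 3-kb figures quoted above are N50 values in that sense.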

9,389 citations

Journal Article

The Protein Kinase Complement of the Human Genome

Gerard Manning¹, David Whyte¹, Ricardo Martinez¹, Tony Hunter², Sucha Sudarsanam³, Sucha Sudarsanam¹ +2 more•Institutions (3)

Pfizer¹, Salk Institute for Biological Studies², Pharmacia³

06 Dec 2002-Science

TL;DR: The protein kinase complement of the human genome is catalogued using public and proprietary genomic, complementary DNA, and expressed sequence tag sequences to provide a starting point for comprehensive analysis of protein phosphorylation in normal and disease states and a detailed view of the current state of human genome analysis through a focus on one large gene family.

Abstract: We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.

7,486 citations

Journal Article

DNA methylation patterns and epigenetic memory

Adrian Bird¹•Institutions (1)

University of Edinburgh¹

01 Jan 2002-Genes & Development

TL;DR: The heritability of methylation states and the secondary nature of the decision to invite or exclude methylation support the idea that DNA methylation is adapted for a specific cellular memory function in development.

Abstract: The character of a cell is defined by its constituent proteins, which are the result of specific patterns of gene expression. Crucial determinants of gene expression patterns are DNA-binding transcription factors that choose genes for transcriptional activation or repression by recognizing the sequence of DNA bases in their promoter regions. Interaction of these factors with their cognate sequences triggers a chain of events, often involving changes in the structure of chromatin, that leads to the assembly of an active transcription complex (e.g., Cosma et al. 1999). But the types of transcription factors present in a cell are not alone sufficient to define its spectrum of gene activity, as the transcriptional potential of a genome can become restricted in a stable manner during development. The constraints imposed by developmental history probably account for the very low efficiency of cloning animals from the nuclei of differentiated cells (Rideout et al. 2001; Wakayama and Yanagimachi 2001). A “transcription factors only” model would predict that the gene expression pattern of a differentiated nucleus would be completely reversible upon exposure to a new spectrum of factors. Although many aspects of expression can be reprogrammed in this way (Gurdon 1999), some marks of differentiation are evidently so stable that immersion in an alien cytoplasm cannot erase the memory. The genomic sequence of a differentiated cell is thought to be identical in most cases to that of the zygote from which it is descended (mammalian B and T cells being an obvious exception). This means that the marks of developmental history are unlikely to be caused by widespread somatic mutation. Processes less irrevocable than mutation fall under the umbrella term “epigenetic” mechanisms. A current definition of epigenetics is: “The study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence” (Russo et al. 1996). 
There are two epigenetic systems that affect animal development and fulfill the criterion of heritability: DNA methylation and the Polycomb-trithorax group (Pc-G/trx) protein complexes. (Histone modification has some attributes of an epigenetic process, but the issue of heritability has yet to be resolved.) This review concerns DNA methylation, focusing on the generation, inheritance, and biological significance of genomic methylation patterns in the development of mammals. Data will be discussed favoring the notion that DNA methylation may only affect genes that are already silenced by other mechanisms in the embryo. Embryonic transcription, on the other hand, may cause the exclusion of the DNA methylation machinery. The heritability of methylation states and the secondary nature of the decision to invite or exclude methylation support the idea that DNA methylation is adapted for a specific cellular memory function in development. Indeed, the possibility will be discussed that DNA methylation and Pc-G/trx may represent alternative systems of epigenetic memory that have been interchanged over evolutionary time. Animal DNA methylation has been the subject of several recent reviews (Bird and Wolffe 1999; Bestor 2000; Hsieh 2000; Costello and Plass 2001; Jones and Takai 2001). For recent reviews of plant and fungal DNA methylation, see Finnegan et al. (2000), Martienssen and Colot (2001), and Matzke et al. (2001).

6,691 citations

Journal Article

Initial sequencing and comparative analysis of the mouse genome.

Robert H. Waterston¹, Kerstin Lindblad-Toh², Ewan Birney, Jane Rogers³ +219 more•Institutions (26)

05 Dec 2002-Nature

TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.

Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations
