Author

Mary Barnstead

Other affiliations: J. Craig Venter Institute

Bio: Mary Barnstead is an academic researcher from Celera Corporation. The author has contributed to research on the topics Genome and Gene. The author has an h-index of 6 and has co-authored 6 publications receiving 14,907 citations. Previous affiliations of Mary Barnstead include J. Craig Venter Institute.

Topics: Genome, Gene, Gene density, Reference genome, Genome project

Papers

Journal Article

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article

The Sequence of the Human Genome

J. Craig Venter¹, Mark Raymond Adams, Eugene W. Myers, Peter W. Li +269 more•Institutions (1)

01 Sep 2015-Clinical Chemistry

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

1,674 citations

Journal Article

Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana

Xiaoying Lin, Samir Kaul, Steve Rounsley, Terrance Shea, Maria-Ines Benito, Christopher D. Town, Claire Fujii, Tanya Mason, Cheryl Bowman, Mary Barnstead, Tamara Feldblyum, C. Robin Buell, Karen A. Ketchum, John Lee, Catherine M. Ronning, Hean L. Koo, Kelly Moffat, Lisa A. Cronin, Mian Shen, Grace Pai, Susan Van Aken, Lowell Umayam, Luke J. Tallon, John Gill, Mark Raymond Adams¹, Ana J. Carrera, Todd Creasy, Howard M. Goodman², Chris Somerville³, Gregory P. Copenhaver⁴, Daphne Preuss⁴, William C. Nierman, Owen White, Jonathan A. Eisen, Steven L. Salzberg, Claire M. Fraser, J. Craig Venter¹•Institutions (4)

Celera Corporation¹, Harvard University², Carnegie Institution for Science³, University of Chicago⁴

16 Dec 1999-Nature

TL;DR: The sequence of chromosome 2 from the Columbia ecotype is reported in two gap-free assemblies (contigs) of 3.6 and 16 megabases, which represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date.

Abstract: Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.

792 citations

Journal Article

The genome of Nanoarchaeum equitans: Insights into early archaeal evolution and derived parasitism

Elizabeth M. Waters, Michael J. Hohn, Ivan Ahel, David E. Graham, Mark Raymond Adams, Mary Barnstead, Karen Beeson, Lisa Bibbs, Randall Bolanos, Martin Keller, Keith A. Kretz, Xiaoying Lin, Eric J. Mathur, Jingwei Ni, Mircea Podar, Toby Richardson, Granger G. Sutton, Melvin I. Simon, Dieter Söll, Karl O. Stetter, Jay M. Short, Michiel Noordewier

28 Oct 2003-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus; it represents a basal archaeal lineage and has a highly reduced genome.

Abstract: The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.

506 citations

Journal Article

Genome duplications and other features in 12 Mb of DNA sequence from human chromosome 16p and 16q.

Brendan J. Loftus¹, Ung Jin Kim², Victoria P. Sneddon³, Francis Kalush¹, Rhonda Brandon¹, Joyce Fuhrmann¹, Tanya Mason¹, Marie L. Crosby¹, Mary Barnstead¹, Lisa A. Cronin¹, Anne Deslattes Mays¹, Yicheng Cao², Robert X. Xu², Hyung Lyun Kang², Steve Mitchell², Evan E. Eichler⁴, Peter C. Harris³, J. Craig Venter¹, Mark Raymond Adams¹•Institutions (4)

J. Craig Venter Institute¹, California Institute of Technology², John Radcliffe Hospital³, Case Western Reserve University⁴

15 Sep 1999-Genomics

TL;DR: The apparent gene density varies throughout the region, but the number of genes predicted suggests that this is a gene-poor region, and this result may also suggest that the total number of human genes is likely to be at the lower end of published estimates.

188 citations

Cited by

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

12,098 citations

Journal Article

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino¹, Ewan Birney¹•Institutions (1)

European Bioinformatics Institute¹

01 May 2008-Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; on real Solexa data sets without read pairs, the contig lengths it produced were in close agreement with simulated results without read-pair information.

Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

9,389 citations

Journal Article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Arabidopsis Genome Initiative¹•Institutions (1)

J. Craig Venter Institute¹

14 Dec 2000-Nature

TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article

The Protein Kinase Complement of the Human Genome

Gerard Manning¹, David Whyte¹, Ricardo Martinez¹, Tony Hunter², Sucha Sudarsanam¹,³•Institutions (3)

Pfizer¹, Salk Institute for Biological Studies², Pharmacia³

06 Dec 2002-Science

TL;DR: The protein kinase complement of the human genome is catalogued using public and proprietary genomic, complementary DNA, and expressed sequence tag sequences to provide a starting point for comprehensive analysis of protein phosphorylation in normal and disease states and a detailed view of the current state of human genome analysis through a focus on one large gene family.

Abstract: We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.

7,486 citations
