Author

Mark Gerstein

Other affiliations: Rutgers University, Structural Genomics Consortium, University of Antwerp

Bio: Mark Gerstein is an academic researcher from Yale University. The author has contributed to research on topics including Genome and Gene. The author has an h-index of 168 and has co-authored 751 publications receiving 149,578 citations. Previous affiliations of Mark Gerstein include Rutgers University and the Structural Genomics Consortium.

Topics: Genome, Gene, Human genome, Genomics, Pseudogene

Papers published on a yearly basis (chart covering 1991 to 2023; per-year counts not recoverable)

Papers

Journal Article

Landscape and variation of novel retroduplications in 26 human populations.

Yan Zhang¹, Shantao Li¹, Alexej Abyzov², Mark Gerstein¹

Yale University¹, Mayo Clinic²

29 Jun 2017-PLOS Computational Biology

TL;DR: This investigation provides insight into the functional impact of retroduplications and their association with genomic elements, and the authors expect the approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.

Abstract: Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.

26 citations

Journal Article

Extending gene ontology in the context of extracellular RNA and vesicle communication.

Kei-Hoi Cheung¹, Shivakumar Keerthikumar², Paola Roncaglia³, Sai Lakshmi Subramanian⁴, Matthew E. Roth⁴, Monisha Samuel², Sushma Anand², Lahiru Gangoda², Stephen Gould⁵, Roger P. Alexander⁶, David J. Galas⁶, Mark Gerstein¹, Andrew F. Hill², Robert R. Kitchen¹, Jan Lötvall⁷, Tushar Patel⁸, Dena Procaccini⁹, Peter J. Quesenberry, Joel Rozowsky¹, Robert L. Raffai¹⁰, Aleksandra Shypitsyna³, Andrew I. Su¹¹, Clotilde Théry¹², Kasey C. Vickers¹³, Marca H. M. Wauben¹⁴, Suresh Mathivanan², Aleksandar Milosavljevic⁴, Louise C. Laurent¹⁵

Yale University¹, La Trobe University², European Bioinformatics Institute³, Baylor College of Medicine⁴, Johns Hopkins University School of Medicine⁵, Pacific Northwest Diabetes Research Institute⁶, University of Gothenburg⁷, Mayo Clinic⁸, National Institute on Drug Abuse⁹, University of California, San Francisco¹⁰, Scripps Research Institute¹¹, PSL Research University¹², Vanderbilt University¹³, Utrecht University¹⁴, University of California, San Diego¹⁵

12 Apr 2016-Journal of Biomedical Semantics

TL;DR: This work has added 7 new terms and modified 9 existing terms (along with their synonyms and relationships) to GO and incorporated some of the GO terms into annotations of samples from the exRNA Atlas and implemented a faceted search interface based on such annotations.

Abstract: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA data/metadata will be more easily annotated and queried because it will be based on a shared set of terms and relationships relevant to extracellular research. By following a consensus-building process, we have worked with several academic societies/consortia, including ERCC, ISEV, and ASEMV, to identify and approve a set of exRNA and extracellular vesicle-related terms and relationships that have been incorporated into GO. In addition, we have initiated an ongoing process of extractions of gene product annotations associated with these terms from Vesiclepedia and ExoCarta, conversion of the extracted annotations to Gene Association File (GAF) format for batch submission to GO, and curation of the submitted annotations by the GO Consortium. As a use case, we have incorporated some of the GO terms into annotations of samples from the exRNA Atlas and implemented a faceted search interface based on such annotations. We have added 7 new terms and modified 9 existing terms (along with their synonyms and relationships) to GO. Additionally, 18,695 unique coding gene products (mRNAs and proteins) and 963 unique non-coding gene products (ncRNAs) which are associated with the terms: “extracellular vesicle”, “extracellular exosome”, “apoptotic body”, and “microvesicle” were extracted from ExoCarta and Vesiclepedia. These annotations are currently being processed for submission to GO. As an inter-community effort, we have made a substantial update to GO in the exRNA context. We have also demonstrated the utility of some of the new GO terms for sample annotation and metadata search.

25 citations

Journal Article

Child development and structural variation in the human genome.

Ying Zhang¹, Rajini R Haraksingh², Fabian Grubert², Alexej Abyzov¹, Mark Gerstein¹, Sherman M. Weissman¹, Alexander E. Urban²

Yale University¹, Stanford University²

01 Jan 2013-Child Development

TL;DR: An overview of the phenomenon of structural variation in the human genome sequence is provided, describing the novel genomics technologies that are revolutionizing the way structural variation is studied and giving examples of genomic structural variations that affect child development.

Abstract: Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects of structural variation on normal child development, but such effects could be of considerable significance. This review provides an overview of the phenomenon of structural variation in the human genome sequence, describing the novel genomics technologies that are revolutionizing the way structural variation is studied and giving examples of genomic structural variations that affect child development.

24 citations

Journal Article

Artificial Transmembrane Oncoproteins Smaller than the Bovine Papillomavirus E5 Protein Redefine Sequence Requirements for Activation of the Platelet-Derived Growth Factor β Receptor

Kristina Talbert-Slagle¹, Sara A. Marlatt, Francisco N. Barrera, Ekta Khurana, Joanne Oates², Mark Gerstein, Donald M. Engelman, Ann M. Dixon², Daniel DiMaio¹

Yale University¹, University of Warwick²

01 Oct 2009-Journal of Virology

TL;DR: Small artificial proteins that bear little resemblance to a viral oncoprotein can nevertheless productively interact with the same cellular target and dimerize noncovalently and transform cells efficiently.

Abstract: The bovine papillomavirus E5 protein (BPV E5) is a 44-amino-acid homodimeric transmembrane protein that binds directly to the transmembrane domain of the platelet-derived growth factor (PDGF) β receptor and induces ligand-independent receptor activation. Three specific features of BPV E5 are considered important for its ability to activate the PDGF β receptor and transform mouse fibroblasts: a pair of C-terminal cysteines, a transmembrane glutamine, and a juxtamembrane aspartic acid. By using a new genetic technique to screen libraries expressing artificial transmembrane proteins for activators of the PDGF β receptor, we isolated much smaller proteins, from 32 to 36 residues, that lack all three of these features yet still dimerize noncovalently, specifically activate the PDGF β receptor via its transmembrane domain, and transform cells efficiently. The primary amino acid sequence of BPV E5 is virtually unrecognizable in some of these proteins, which share as few as seven consecutive amino acids with the viral protein. Thus, small artificial proteins that bear little resemblance to a viral oncoprotein can nevertheless productively interact with the same cellular target. We speculate that similar cellular proteins may exist but have been overlooked due to their small size and hydrophobicity.

24 citations

Journal Article

Using a measure of structural variation to define a core for the globins

Mark Gerstein¹, Russ B. Altman¹

Stanford University¹

01 Dec 1995-Bioinformatics

TL;DR: The variability measure implicit in the core structures is compared with measures of sequence variability, using a procedure for measuring sequence variability that helps correct for the biased sampling in the databanks and finds, somewhat surprisingly, that sequence variation does not appear to correlate with structural variation.

Abstract: As the database of three-dimensional protein structures expands, it becomes possible to classify related structures into families. Some of these families, such as the globins, have enough members to allow statistical analysis of conserved features. Previously, we have shown that a probabilistic representation based on means and variances can be useful for defining structural cores for large families. These cores contain the subset of atoms that are in essentially the same relative positions in all members of the family. In addition to defining a core, our method creates an ordered list of atoms, ranked by their structural variation. In applying our core-finding procedure to the globins, we find that helices A, B, G and H form a structural core with low variance. These helices fold early in the folding pathway, and superimpose well with helices in the helix-turn-helix repressor protein family. The non-core helices (F and the parts of other helices that interact with it) are associated with the functional differences among the globins, and are encoded within a separate exon. We have also compared the variability measure implicit in our core structures with measures of sequence variability, using a procedure for measuring sequence variability that helps correct for the biased sampling in the databanks. We find, somewhat surprisingly, that sequence variation does not appear to correlate with structural variation.

24 citations

Cited by

Journal Article

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul¹, Thomas L. Madden, Alejandro A. Schäffer¹, Jinghui Zhang, Zheng Zhang², Webb Miller², David J. Lipman

National Institutes of Health¹, Pennsylvania State University²

01 Sep 1997-Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article

The Protein Data Bank

Helen M. Berman¹, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne

Rutgers University¹

01 Jan 2000-Nucleic Acids Research

TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource.

Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

Journal Article

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin¹, Carrie A. Davis¹, Felix Schlesinger¹, Jorg Drenkow¹, Chris Zaleski¹, Sonali Jha¹, Philippe Batut¹, Mark Chaisson¹, Thomas R. Gingeras¹

Cold Spring Harbor Laboratory¹

01 Jan 2013-Bioinformatics

TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.

Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal Article

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source and available at http://bowtie.cbcb.umd.edu.

20,335 citations

Changes in the switching rate of Plasmodium var genes lead to antigenic variation [English] / Paul H, Robert P, Christodoulou Z, et al // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

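TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.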

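Abstract: Antigenic variation allows many pathogenic microorganisms to readily evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.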

18,940 citations
