Author

Mark Gerstein

Other affiliations: Rutgers University, Structural Genomics Consortium, University of Antwerp ...

Bio: Mark Gerstein is an academic researcher at Yale University who has contributed to research on topics including Genome and Gene. The author has an h-index of 168 and has co-authored 751 publications, which have received 149,578 citations. Previous affiliations of Mark Gerstein include Rutgers University and the Structural Genomics Consortium.

Topics: Genome, Gene, Human genome, Genomics, Pseudogene ...

Papers published on a yearly basis (chart; years 1991 to 2023)

Papers

Journal Article

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool

Haiyuan Yu¹, Katherine T. Nguyen¹, Thomas Royce¹, Jiang Qian¹, Kenneth Nelson¹, Michael Snyder¹, Mark Gerstein¹

Yale University¹

07 Dec 2006-Nucleic Acids Research

TL;DR: Genes that are close together on microarray chips tend to have more highly correlated expression profiles; the authors develop COP (COrrelations by Positional artifacts), an automated web tool that detects such positional artifacts in microarray experiments.

Abstract: Microarray technology is currently one of the most widely used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able to show that there are two types of positional artifacts in microarray data introducing spurious correlations between genes. First, we find that genes that are close on the microarray chips tend to have higher correlations between their expression profiles. We call this the ‘chip artifact’. Our calculations suggest that the carry-over during the printing process is one of the major sources of this type of artifact, which is later confirmed by our experiments. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully-hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Secondly, we, for the first time, show that genes that are close on the microtiter plates in microarray experiments also tend to have higher correlations. We call this the ‘plate artifact’. Both types of artifacts exist with different severity in all cDNA microarray experiments that we analyzed. Therefore, we develop an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool, ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common noise in microarray data.
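
The "chip artifact" described in this abstract can be illustrated with a minimal sketch. This is not COP's implementation: the function names, the use of Chebyshev distance between spot positions, and the toy data are all assumptions made for illustration. The sketch groups spot pairs by their distance on the chip and averages the Pearson correlation of their expression profiles; a positional artifact would show up as higher average correlation at small distances.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_correlation_by_distance(spots):
    """Average pairwise profile correlation, grouped by chip distance.

    `spots` maps a (row, column) chip position to an expression profile.
    Distance is the Chebyshev distance between positions (an assumption
    of this sketch, not something stated in the abstract).
    """
    sums, counts = {}, {}
    for (pos_a, prof_a), (pos_b, prof_b) in combinations(spots.items(), 2):
        dist = max(abs(pos_a[0] - pos_b[0]), abs(pos_a[1] - pos_b[1]))
        sums[dist] = sums.get(dist, 0.0) + pearson(prof_a, prof_b)
        counts[dist] = counts.get(dist, 0) + 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Hypothetical toy data: (0, 0) and (0, 1) are neighbors with similar profiles.
toy_spots = {
    (0, 0): [1.0, 2.0, 3.0, 4.0],
    (0, 1): [1.1, 2.1, 2.9, 4.2],
    (5, 5): [4.0, 1.0, 3.0, 2.0],
}
by_distance = mean_correlation_by_distance(toy_spots)
```

In this toy input the neighboring pair is strongly correlated while the distant pairs are not; on real data one would compare such averages against a shuffled-position baseline before concluding that an artifact is present.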

13 citations

Journal Article

Structured digital tables on the Semantic Web: toward a structured digital literature

Kei-Hoi Cheung, Matthias Samwald¹,², Raymond K. Auerbach³, Mark Gerstein³

National University of Ireland, Galway¹, Konrad Lorenz Institute for Evolution and Cognition Research², Yale University³

01 Jan 2010-Molecular Systems Biology

TL;DR: This work identifies three canonical types of tables (conveying information about properties, networks, and concept hierarchies), shows how more complex tables can be built from these basic types, and presents examples for converting representative tables into triples.

Abstract: In parallel to the growth in bioscience databases, biomedical publications have increased exponentially in the past decade. However, the extraction of high-quality information from the corpus of scientific literature has been hampered by the lack of machine-interpretable content, despite text-mining advances. To address this, we propose creating a structured digital table as part of an overall effort in developing machine-readable, structured digital literature. In particular, we envision transforming publication tables into standardized triples using Semantic Web approaches. We identify three canonical types of tables (conveying information about properties, networks, and concept hierarchies) and show how more complex tables can be built from these basic types. We envision that authors would create tables initially using the structured triples for canonical types and then have them visually rendered for publication, and we present examples for converting representative tables into triples. Finally, we discuss how ‘stub’ versions of structured digital tables could be a useful bridge for connecting together the literature with databases, allowing the former to more precisely document the latter.
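
The idea of turning a publication table into triples can be sketched as follows for the simplest of the three canonical types, a property table. This is an illustrative sketch only, not the paper's method: the function name, the convention that the first column holds the row subject, and the example gene table are all hypothetical.

```python
def property_table_to_triples(header, rows):
    """Turn a property table into (subject, predicate, object) triples.

    The first column is treated as the row subject and each remaining
    column header as a predicate (an assumption of this sketch).
    """
    _subject_column, *predicates = header
    triples = []
    for row in rows:
        subject, *values = row
        for predicate, value in zip(predicates, values):
            if value is not None:  # skip empty cells
                triples.append((subject, predicate, value))
    return triples


# Hypothetical table: one row per gene, one column per property.
header = ["gene", "chromosome", "length_bp"]
rows = [
    ["geneA", "chr1", 1200],
    ["geneB", "chr2", None],
]
triples = property_table_to_triples(header, rows)
```

Each non-empty cell becomes one triple, so the table can be rebuilt for display from the triples while remaining machine-readable. A real Semantic Web encoding would also map subjects and predicates to URIs, which this sketch leaves out.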

13 citations

Posted Content

Multiple laboratory mouse reference genomes define strain specific haplotypes and novel functional loci

Jingtao Lilue¹,², Anthony G. Doran¹,², Ian T. Fiddes³, Monica Abrudan², Joel Armstrong³, Ruth Bennett¹, William Chow², Joanna Collins², Stephan C. Collins⁴, Anne Czechanski, Petr Danecek², Mark Diekhans³, Dirk-Dominic Dolle², Matthew Dunn², Richard Durbin², Dent Earl³, Anne C. Ferguson-Smith⁵, Paul Flicek¹, Jonathan Flint⁶, Adam Frankish¹,², Beiyuan Fu², Mark Gerstein⁷, James G. R. Gilbert², Leo Goodstadt⁸, Jen Harrow², Kerstin Howe², Mikhail Kolmogorov⁹, Köenig S¹⁰, Lelliott C², Jane E. Loveland¹,², Clayton E. Mathews¹¹, Richard Mott¹², Paul R. Muir⁷, Fabio C. P. Navarro⁷, Duncan T. Odom⁵, Naomi R Park², Sarah Pelan², Phan Sk¹³, Michael A. Quail², Laura G. Reinholdt, Lars Romoth¹⁰, Lesley Shirley², Cristina Sisu⁷, Marcela K. Sjoberg-Herrera¹⁴, Mario Stanke¹⁰, Charles A. Steward², Mark G. Thomas², Glen Threadgold², David Thybert¹⁵, James Torrance², Kim Wong², Jonathan Wood², Binnaz Yalcin⁴, Fengtang Yang², David J. Adams², Benedict Paten³, Thomas M. Keane¹,²

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², University of California, Santa Cruz³, French Institute of Health and Medical Research⁴, University of Cambridge⁵, University of California, Los Angeles⁶, Yale University⁷, Wellcome Trust Centre for Human Genetics⁸, University of California, Berkeley⁹, University of Greifswald¹⁰, University of Florida¹¹, University College London¹², Salk Institute for Biological Studies¹³, Pontifical Catholic University of Chile¹⁴, Norwich Research Park¹⁵

12 Feb 2018-bioRxiv

TL;DR: A high-quality collection of mouse genomes revealed a previously unannotated gene (Efcab3-like) encoding 5,874 amino acids, one of the largest known in the rodent lineage; Efcab3-like-/- mice exhibit severe size anomalies in four regions of the brain, suggesting that Efcab3-like regulates brain development.

Abstract: The most commonly employed mammalian model organism is the laboratory mouse. A wide variety of genetically diverse inbred mouse strains, representing distinct physiological states, disease susceptibilities, and biological mechanisms have been developed over the last century. We report full length draft de novo genome assemblies for 16 of the most widely used inbred strains and reveal for the first time extensive strain-specific haplotype variation. We identify and characterise 2,567 regions on the current Genome Reference Consortium mouse reference genome exhibiting the greatest sequence diversity between strains. These regions are enriched for genes involved in defence and immunity, and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. Several immune related loci, some in previously identified QTLs for disease response have novel haplotypes not present in the reference that may explain the phenotype. We used these genomes to improve the mouse reference genome resulting in the completion of 10 new gene structures, and 62 new coding loci were added to the reference genome annotation. Notably this high quality collection of genomes revealed a previously unannotated gene (Efcab3-like) encoding 5,874 amino acids, one of the largest known in the rodent lineage. Interestingly, Efcab3-like-/- mice exhibit severe size anomalies in four regions of the brain suggesting a mechanism of Efcab3-like regulating brain development.

13 citations

Posted Content

Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

David Thybert¹,², Maša Roller¹, Fabio C. P. Navarro³, Ian T. Fiddes⁴, Ian Streeter¹, Christine Feig⁵, David Martín-Gálvez¹, Mikhail Kolmogorov⁶, Václav Janoušek⁷, Wasiu Akanni¹, Bronwen Aken¹, Sarah Aldridge⁵, Varshith Chakrapani¹, William Chow⁸, Laura Clarke¹, Carla Cummins¹, Anthony G. Doran⁸, Matthew Dunn⁸, Leo Goodstadt⁹, Kerstin Howe⁸, Matthew Howell¹, Ambre Aurore Josselin¹, Robert C. Karn¹⁰, Christina M. Laukaitis¹⁰, Jingtao Lilue⁸, Fergal J. Martin¹, Matthieu Muffato¹, Michael A. Quail⁸, Cristina Sisu³, Mario Stanke¹¹, Klara Stefflova⁵, Cock van Oosterhout¹², Frédéric Veyrunes¹³, Ben J. Ward², Fengtang Yang⁸, Golbahar Yazdanifar¹⁰, Amonida Zadissa¹, David J. Adams⁸, Alvis Brazma¹, Mark Gerstein³, Benedict Paten⁴, Son Pham, Thomas M. Keane¹, Duncan T. Odom⁵, Paul Flicek¹

European Bioinformatics Institute¹, Norwich Research Park², Yale University³, University of California, Santa Cruz⁴, University of Cambridge⁵, University of California, San Diego⁶, Charles University in Prague⁷, Wellcome Trust Sanger Institute⁸, University of Oxford⁹, University of Arizona¹⁰, University of Greifswald¹¹, University of East Anglia¹², University of Montpellier¹³

02 Jul 2017-bioRxiv

TL;DR: A system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species and demonstrated that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

Abstract: Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 MYA, but that are absent in the Hominidae. In fact, Hominidae show between four- and seven-fold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. For example, recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli. This process resulted in thousands of novel, species-specific CTCF binding sites. Our results demonstrate that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

13 citations

Journal Article

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins

Anne E. Counterman Burba¹, Ursula Lehnert¹, Eric Z. Yu¹, Mark Gerstein¹

Yale University¹

15 Oct 2006-Bioinformatics

TL;DR: A comprehensive package of tools is developed for analyzing helix-helix packing in proteins, including quantitative measures of the helix interaction surface area and helix crossing angle, as well as several methods for visualizing the helical interaction.

Abstract: Motivation: In many proteins, helix-helix interactions can be critical to establishing protein conformation (folding) and dynamics, as well as determining associations between protein units. However, the determination of a set of rules that guide helix-helix interaction has been elusive. In order to gain further insight into the helix-helix interface, we have developed a comprehensive package of tools for analyzing helix-helix packing in proteins. These tools are available at http://helix.gersteinlab.org. They include quantitative measures of the helix interaction surface area and helix crossing angle, as well as several methods for visualizing the helical interaction. These methods can be used for analysis of individual protein conformations or to gain insight into dynamic changes in helix interactions. For the latter purpose, a direct interface from entries in the Molecular Motions Database to the HIT site has been provided. Contact: Mark.Gerstein@yale.edu

13 citations

Cited by

Journal Article

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul¹, Thomas L. Madden, Alejandro A. Schäffer¹, Jinghui Zhang, Zheng Zhang², Webb Miller², David J. Lipman

National Institutes of Health¹, Pennsylvania State University²

01 Sep 1997-Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article

The Protein Data Bank

Helen M. Berman¹, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne

Rutgers University¹

01 Jan 2000-Nucleic Acids Research

TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

Journal Article

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin¹, Carrie A. Davis¹, Felix Schlesinger¹, Jorg Drenkow¹, Chris Zaleski¹, Sonali Jha¹, Philippe Batut¹, Mark Chaisson¹, Thomas R. Gingeras¹

Cold Spring Harbor Laboratory¹

01 Jan 2013-Bioinformatics

TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure, outperforms other aligners by a factor of >50 in mapping speed.

Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal Article

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations

Changes in the switching rate of Plasmodium var genes lead to antigenic variation [in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

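TL;DR: Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family in each haploid genome encodes about 60 members, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.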

18,940 citations
