Author

Mark Gerstein

Bio: Mark Gerstein is an academic researcher at Yale University. The author has contributed to research on the topics Genome and Gene, has an h-index of 168, and has co-authored 751 publications receiving 149,578 citations. Previous affiliations of Mark Gerstein include Rutgers University and the Structural Genomics Consortium.
Topics: Genome, Gene, Human genome, Genomics, Pseudogene


Papers
Journal Article
TL;DR: It is shown that a residue located in the kinase activation segment, which is termed the “DFG+1” residue, acts as a major determinant for serine-threonine phosphorylation site specificity.

96 citations

Journal Article
TL;DR: This study performs genome‐wide profiling of 49 primary prostate cancers and identifies 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses, and demonstrates that high‐resolution tiling arrays can be used to pin‐point breakpoints leading to fusion events.
Abstract: Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pinpoint breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements. © 2009 Wiley-Liss, Inc.

96 citations

Journal Article
TL;DR: SomaticSeq is a somatic mutation detection pipeline that uses a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions, achieving better overall accuracy than any individual tool it incorporates.
Abstract: SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.

95 citations

Journal Article
Marta Paczkowska, Jonathan Barenboim, N Sintupisut, Natalie S. Fox, Helen He Zhu, Diala Abd-Rabbo, Miles W Mee, Paul C. Boutros, Federico Abascal, Samirkumar B. Amin, Gary D. Bader, Rameen Beroukhim, Johanna Bertl, Keith A. Boroevich, Søren Brunak, Peter J. Campbell, Joana Carlevaro-Fita, Dimple Chakravarty, Calvin Wing Yiu Chan, Ken Chen, Jung Kyoon Choi, Jordi Deu-Pons, Priyanka Dhingra, Klev Diamanti, Lars Feuerbach, J Fink, Nuno A. Fonseca, Joan Frigola, C Gambacorti Passerini, Dale W. Garsed, Mark Gerstein, Gad Getz, Abel Gonzalez-Perez, Qianyun Guo, Ivo Gut, David Haan, Mark P. Hamilton, Nicholas J. Haradhvala, Arif Harmanci, Mohamed Helmy, Carl Herrmann, Julian M. Hess, Asger Hobolth, Ermin Hodzic, Chen Hong, Henrik Hornshøj, Keren Isaev, Jose M. G. Izarzugaza, Rory Johnson, Toby Johnson, Malene Juul, Randi Istrup Juul, André Kahles, Abdullah Kahraman, Manolis Kellis, Ekta Khurana, Jong Kyoung Kim, Young-Wook Kim, Jan Komorowski, Jan O. Korbel, Swathi Ashok Kumar, Andrés Lanzós, Mitchell G. Lawrence, Darlene Lee, Kjong-Van Lehmann, Shantao Li, Xiaotong Li, Z Lin, Eric Minwei Liu, Lucas Lochovsky, Shaoke Lou, Tobias Madsen, Kathleen Marchal, Inigo Martincorena, Alexander Martinez-Fundichely, Yosef E. Maruvka, Patrick McGillivray, William Meyerson, Ferran Muiños, Loris Mularoni, Hidewaki Nakagawa, Morten Nielsen, Kiejung Park, Jakob Skou Pedersen, Oriol Pich, Tirso Pons, Sergio Pulido-Tamayo, Benjamin J. Raphael, I Reyes-Salazar, Matthew A. Reyna, Ester Rheinbay, Mark A. Rubin, Carlota Rubio-Perez, Radhakrishnan Sabarinathan, Suleyman Cenk Sahinalp, Gordon Saksena, Leonidas Salichos, Cindy Sander, Steve Schumacher, Mark Shackleton, Ofer Shapira, Ciyue Shen, Raunak Shrestha, Shimin Shuai, Nikos Sidiropoulos, Lina Sieverling, Nicholas A Sinnott-Armstrong, Lincoln Stein, Joshua M. Stuart, David Tamborero, Grace Tiao, Tatsuhiko Tsunoda, Husen M. Umer, Liis Uusküla-Reimand, Alfonso Valencia, Miguel Vazquez, Lieven Verbeke, Claes Wadelius, Lina Wadi, Jian Wang, Jonathan Warrell, Sebastian M. Waszak, Joachim Weischenfeldt, D Wheeler, Guanming Wu, Jun Yu, Jiashan Zhang, Xiuqing Zhang, Yan Zhang, Zhongming Zhao, Lihua Zou, C. Von Mering, Jüri Reimand
01 Jan 2020-bioRxiv
TL;DR: The authors develop ActivePathways, an integrative method that uses statistical data fusion to discover significantly enriched pathways across multiple omics datasets and to highlight candidate genes.
Abstract: Multi-omics datasets quantify complementary aspects of molecular biology and thus pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple omics datasets using a statistical data fusion approach, rationalizes contributing evidence and highlights associated genes. We demonstrate its utility by analyzing coding and non-coding mutations from 2,583 whole cancer genomes, revealing frequently mutated hallmark pathways and a long tail of known and putative cancer driver genes. We also studied prognostic molecular pathways in breast cancer subtypes by integrating genomic and transcriptomic features of tumors and tumor-adjacent cells and found significant associations with immune response processes and anti-apoptotic signaling pathways. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.

95 citations
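
The "statistical data fusion" step described in the ActivePathways abstract above can be illustrated with a small sketch. The abstract does not specify the fusion statistic, so the example below assumes Fisher's combined probability test, a standard way to merge independent p-values for one gene across several datasets. The function name and the gene labels are hypothetical, and this is not the ActivePathways implementation.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Merge independent p-values with Fisher's method (an assumed stand-in,
    not necessarily the statistic ActivePathways uses)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i), chi-square with 2k degrees
    # of freedom under the null. We keep X / 2 because the formula needs it.
    half_stat = -sum(math.log(p) for p in pvalues)
    # For an even number of degrees of freedom the chi-square tail is exact:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-gene p-values from three omics datasets.
gene_pvalues = {
    "GENE_A": [0.01, 0.03, 0.20],
    "GENE_B": [0.60, 0.45, 0.90],
}
merged = {gene: fisher_combined_pvalue(ps) for gene, ps in gene_pvalues.items()}
```

A gene with consistent moderate evidence in every dataset ends up with a smaller merged p-value than any single dataset would suggest, which is the point of fusing evidence before running pathway enrichment. Fisher's method assumes the datasets are independent; correlated evidence would call for a correction that this sketch does not attempt.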

Journal Article
01 Jun 2003-Proteins
TL;DR: This work tested the hypothesis that more efficient crystallization strategies could be formulated by extracting useful patterns and correlations from the large data sets of crystallization trials created in structural proteomics projects, and identified the conditions that crystallize the most proteins.
Abstract: Protein crystallization is a major bottleneck in protein X-ray crystallography, the workhorse of most structural proteomics projects. Because the principles that govern protein crystallization are too poorly understood to allow them to be used in a strongly predictive sense, the most common crystallization strategy entails screening a wide variety of solution conditions to identify the small subset that will support crystal nucleation and growth. We tested the hypothesis that more efficient crystallization strategies could be formulated by extracting useful patterns and correlations from the large data sets of crystallization trials created in structural proteomics projects. A database of crystallization conditions was constructed for 755 different proteins purified and crystallized under uniform conditions. Forty-five percent of the proteins formed crystals. Data mining identified the conditions that crystallize the most proteins, revealed that many conditions are highly correlated in their behavior, and showed that the crystallization success rate is markedly dependent on the organism from which proteins derive. Of the proteins that crystallized in a 48-condition experiment, 60% could be crystallized in as few as 6 conditions and 94% in 24 conditions. Consideration of the full range of information coming from crystal screening trials allows one to design screens that are maximally productive while consuming minimal resources, and also suggests further useful conditions for extending existing screens.

94 citations
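
The crystallization abstract above reports that a small subset of conditions can recover most of the proteins that crystallize in a full 48-condition screen. One way to look for such a subset is a greedy coverage heuristic, sketched below. The abstract does not say which procedure the authors used, so the algorithm, the function name, and the toy condition and protein labels are all assumptions made for illustration.

```python
def greedy_screen(crystallized_by, max_conditions):
    """Greedily pick conditions, each time taking the one that crystallizes
    the most proteins not yet covered by earlier picks.

    crystallized_by maps a condition name to the set of proteins that
    formed crystals under it (hypothetical data layout).
    """
    covered = set()
    chosen = []
    remaining = dict(crystallized_by)
    while remaining and len(chosen) < max_conditions:
        # sorted() makes ties resolve alphabetically, so runs are repeatable.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining condition adds a new protein
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Toy trial results: four conditions, five proteins.
trials = {
    "cond_01": {"P1", "P2", "P3"},
    "cond_02": {"P3", "P4"},
    "cond_03": {"P5"},
    "cond_04": {"P1", "P2"},
}
chosen, covered = greedy_screen(trials, max_conditions=2)
```

In this toy data two conditions already cover four of the five proteins, and a third covers all of them, while `cond_04` is never needed because it is redundant with `cond_01`. Greedy selection is not guaranteed to find the smallest possible screen, but it is a common and simple approximation for coverage problems of this kind.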


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

Journal Article
TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure, outperforms other aligners by a factor of >50 in mapping speed.
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and it can use multiple processor cores simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations
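
The Burrows-Wheeler indexing mentioned in the Bowtie abstract above can be illustrated with a toy exact-match search. The sketch below builds a naive index over a short string and finds a read by backward search, the textbook technique that Burrows-Wheeler aligners build on. It is not Bowtie's code: it handles exact matches only (no quality-aware backtracking or mismatches), stores uncompressed tables, and all function names are hypothetical.

```python
def bwt_index(text):
    """Build a toy Burrows-Wheeler index for a short DNA string."""
    # "$" is a terminator assumed absent from the text; it sorts before A/C/G/T.
    text += "$"
    # Naive suffix array: fine for a toy string, far too slow for a genome.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wrapping at 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first_row_start[c] = number of characters in text strictly smaller than c.
    first_row_start, total = {}, 0
    for c in alphabet:
        first_row_start[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return suffix_array, first_row_start, occ


def find_exact(pattern, index):
    """Return sorted start positions of exact matches, via backward search."""
    suffix_array, first_row_start, occ = index
    lo, hi = 0, len(suffix_array)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first_row_start:
            return []
        lo = first_row_start[c] + occ[c][lo]
        hi = first_row_start[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = bwt_index("ACGTACGTGA")
hits = find_exact("ACG", index)  # start positions 0 and 4
```

Each step of the search narrows a range of suffixes using only table lookups, so the cost grows with the read length rather than the genome length. Real aligners get their small memory footprint by sampling the suffix array and occurrence tables instead of storing them in full as this sketch does.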

28 Jul 2005
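TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), which is expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. In each haploid genome, the var gene family encodes about 60 members, and switching on transcription of different var gene variants provides the molecular basis for antigenic variation.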

18,940 citations