scispace - formally typeset
Author

Amir Ben-Dor

Bio: Amir Ben-Dor is an academic researcher from Agilent Technologies. The author has contributed to research in the topics of comparative genomic hybridization and gene expression profiling. The author has an h-index of 35 and has co-authored 86 publications receiving 11,588 citations. Previous affiliations of Amir Ben-Dor include the University of Washington and Stanford University.


Papers
Journal Article
03 Aug 2000-Nature
TL;DR: Many genes underlying the classification of this subset of melanomas are differentially regulated in invasive melanomas that form primitive tubular networks in vitro, a feature of some highly aggressive metastatic melanomas.
Abstract: The most common human cancers are malignant neoplasms of the skin. Incidence of cutaneous melanoma is rising especially steeply, with minimal progress in non-surgical treatment of advanced disease. Despite significant effort to identify independent predictors of melanoma outcome, no accepted histopathological, molecular or immunohistochemical marker defines subsets of this neoplasm. Accordingly, though melanoma is thought to present with different 'taxonomic' forms, these are considered part of a continuous spectrum rather than discrete entities. Here we report the discovery of a subset of melanomas identified by mathematical analysis of gene expression in a series of samples. Remarkably, many genes underlying the classification of this subset are differentially regulated in invasive melanomas that form primitive tubular networks in vitro, a feature of some highly aggressive metastatic melanomas. Global transcript analysis can identify unrecognized subtypes of cutaneous melanoma and predict experimentally verifiable phenotypic characteristics that may be of importance to disease progression.

2,058 citations

Journal Article
TL;DR: Significantly different groups of genes are expressed by breast cancers with BRCA1 mutations and breast cancers with BRCA2 mutations; the results suggest that a heritable mutation influences the gene-expression profile of the cancer.
Abstract: Background: Many cases of hereditary breast cancer are due to mutations in either the BRCA1 or the BRCA2 gene. The histopathological changes in these cancers are often characteristic of the mutant gene. We hypothesized that the genes expressed by these two types of tumors are also distinctive, perhaps allowing us to identify cases of hereditary breast cancer on the basis of gene-expression profiles. Methods: RNA from samples of primary tumors from seven carriers of the BRCA1 mutation, seven carriers of the BRCA2 mutation, and seven patients with sporadic cases of breast cancer was compared with a microarray of 6512 complementary DNA clones of 5361 genes. Statistical analyses were used to identify a set of genes that could distinguish the BRCA1 genotype from the BRCA2 genotype. Results: Permutation analysis of multivariate classification functions established that the gene-expression profiles of tumors with BRCA1 mutations, tumors with BRCA2 mutations, and sporadic tumors differed significantly from each othe...

1,638 citations
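The BRCA1/BRCA2 abstract reports a permutation analysis of multivariate classification functions but gives no procedure. As a much simpler univariate illustration (an assumption, not the authors' method), the sketch below estimates a label-permutation p-value for the difference in mean expression of one gene between two groups, such as 7 BRCA1 and 7 BRCA2 tumors. The helper `permutation_p_value` and its parameters `n_perm` and `seed` are hypothetical.

```python
import numpy as np


def permutation_p_value(values, labels, n_perm: int = 2000, seed: int = 0) -> float:
    """Hypothetical helper: two-sided label-permutation test for a single gene.

    values: expression level of the gene in each sample.
    labels: boolean group membership (e.g. True = BRCA1, False = BRCA2).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)

    def statistic(group: np.ndarray) -> float:
        # Absolute difference between the two group means.
        return abs(values[group].mean() - values[~group].mean())

    observed = statistic(labels)
    # Count shuffled labelings whose statistic is at least as extreme as the observed one.
    hits = sum(statistic(rng.permutation(labels)) >= observed for _ in range(n_perm))
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the labels rather than the values preserves the group sizes, so each permutation is a draw from the null hypothesis that group membership carries no information about expression.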

Journal Article
TL;DR: This paper defines an appropriate stochastic error model on the input, and proves that under the conditions of the model, the algorithm recovers the cluster structure with high probability, and presents a practical heuristic based on the same algorithmic ideas.
Abstract: Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2 (\log n)^c)$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.

1,241 citations
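The clustering abstract describes its practical heuristic only in outline. The sketch below is a simplified affinity-threshold procedure in the spirit of that heuristic (often referred to as CAST), written under stated assumptions: a symmetric similarity matrix with entries in [0, 1] and ones on the diagonal, and a single affinity threshold `t`. The function name `cast_clusters` and the `max_steps` oscillation guard are hypothetical additions, not the published pseudocode.

```python
import numpy as np


def cast_clusters(similarity: np.ndarray, t: float, max_steps: int = 10_000) -> list[list[int]]:
    """Greedy affinity-threshold clustering in the spirit of CAST (simplified sketch).

    similarity: symmetric n x n matrix, entries in [0, 1], ones on the diagonal (assumed).
    t: an element fits an open cluster when its average similarity to the members is >= t.
    """
    n = similarity.shape[0]
    unassigned = set(range(n))
    clusters: list[list[int]] = []
    while unassigned:
        cluster: set[int] = set()
        affinity = np.zeros(n)  # affinity[x] = sum of similarity[x, y] over y in cluster
        for _ in range(max_steps):  # hypothetical guard against add/remove oscillation
            candidates = sorted(unassigned - cluster)
            if candidates:
                # ADD: the free element with the highest affinity joins if it clears the threshold.
                best = max(candidates, key=lambda x: affinity[x])
                if affinity[best] >= t * len(cluster):
                    cluster.add(best)
                    affinity += similarity[best]
                    continue
            if len(cluster) > 1:
                # REMOVE: the member with the lowest affinity leaves if it falls below the threshold.
                worst = min(cluster, key=lambda x: affinity[x])
                if affinity[worst] < t * len(cluster):
                    cluster.remove(worst)
                    affinity -= similarity[worst]
                    continue
            break  # neither ADD nor REMOVE applies: close this cluster
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters
```

Because the first ADD on an empty cluster always succeeds, every pass closes a non-empty cluster, so the outer loop terminates; the threshold `t` then controls how tight the resulting clusters are.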

Journal Article
TL;DR: This work examines three sets of gene expression data measured across sets of tumor(s) and normal clinical samples, and presents results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing nearest neighbor classifier, SVM, AdaBoost and a novel clustering-based classification technique.
Abstract: Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor(s) and normal clinical samples: The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximately 100,000 clones, measured in 32 ovarian samples (unpublished extension of data set described in Schummer et al. (1999)). The third set consists of approximately 7,100 genes, measured in 72 bone marrow and peripheral blood samples (Golub et al., 1999). We examine the use of scoring methods, measuring separation of tissue type (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997) and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with, as well as without, cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.

789 citations
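The tissue-classification abstract evaluates its classifiers by leave-one-out cross validation (LOOCV). A minimal sketch of LOOCV for the simplest of the listed classifiers, a 1-nearest-neighbor rule, follows. The Euclidean distance and the function name `loocv_nearest_neighbor` are assumptions, since the abstract does not specify the distance measure.

```python
import numpy as np


def loocv_nearest_neighbor(X: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical helper: leave-one-out accuracy of a 1-nearest-neighbor classifier.

    X: samples x genes expression matrix.
    y: class label per sample (e.g. 0 = normal, 1 = tumor).
    """
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i, train on all others
        # Euclidean distance from the held-out sample to every training sample (assumed metric).
        dists = np.linalg.norm(X[train] - X[i], axis=1)
        predicted = y[train][np.argmin(dists)]
        correct += int(predicted == y[i])
    return correct / n
```

Each sample is classified exactly once by a model that never saw it, which is why LOOCV is a common choice when, as in these data sets, only a few dozen samples are available.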

Journal Article
TL;DR: The results identify matrilysin as a mediator of pulmonary fibrosis and a potential therapeutic target and illustrate the power of global gene expression analysis of human tissue samples to identify molecular pathways involved in clinical disease.
Abstract: Pulmonary fibrosis is a progressive and largely untreatable group of disorders that affects up to 100,000 people on any given day in the United States. To elucidate the molecular mechanisms that lead to end-stage human pulmonary fibrosis we analyzed samples from patients with histologically proven pulmonary fibrosis (usual interstitial pneumonia) by using oligonucleotide microarrays. Gene expression patterns clearly distinguished normal from fibrotic lungs. Many of the genes that were significantly increased in fibrotic lungs encoded proteins associated with extracellular matrix formation and degradation and proteins expressed in smooth muscle. Using a combined set of scoring systems we determined that matrilysin (matrix metalloproteinase 7), a metalloprotease not previously associated with pulmonary fibrosis, was the most informative increased gene in our data set. Immunohistochemistry demonstrated increased expression of matrilysin protein in fibrotic lungs. Furthermore, matrilysin knockout mice were dramatically protected from pulmonary fibrosis in response to intratracheal bleomycin. Our results identify matrilysin as a mediator of pulmonary fibrosis and a potential therapeutic target. They also illustrate the power of global gene expression analysis of human tissue samples to identify molecular pathways involved in clinical disease.

597 citations


Cited by
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article
TL;DR: Survival analyses on a subcohort of patients with locally advanced breast cancer uniformly treated in a prospective study showed significantly different outcomes for the patients belonging to the various groups, including a poor prognosis for the basal-like subtype and a significant difference in outcome for the two estrogen receptor-positive groups.
Abstract: The purpose of this study was to classify breast carcinomas based on variations in gene expression patterns derived from cDNA microarrays and to correlate tumor characteristics to clinical outcome. A total of 85 cDNA microarray experiments representing 78 cancers, three fibroadenomas, and four normal breast tissues were analyzed by hierarchical clustering. As reported previously, the cancers could be classified into a basal epithelial-like group, an ERBB2-overexpressing group and a normal breast-like group based on variations in gene expression. A novel finding was that the previously characterized luminal epithelial/estrogen receptor-positive group could be divided into at least two subgroups, each with a distinctive expression profile. These subtypes proved to be reasonably robust by clustering using two different gene sets: first, a set of 456 cDNA clones previously selected to reflect intrinsic properties of the tumors and, second, a gene set that highly correlated with patient outcome. Survival analyses on a subcohort of patients with locally advanced breast cancer uniformly treated in a prospective study showed significantly different outcomes for the patients belonging to the various groups, including a poor prognosis for the basal-like subtype and a significant difference in outcome for the two estrogen receptor-positive groups.

10,791 citations

Journal Article
31 Jan 2002-Nature
TL;DR: DNA microarray analysis on primary breast tumours of 117 young patients is used and supervised classification is applied to identify a gene expression signature strongly predictive of a short interval to distant metastases (‘poor prognosis’ signature) in patients without tumour cells in local lymph nodes at diagnosis, providing a strategy to select patients who would benefit from adjuvant therapy.
Abstract: Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

9,664 citations

Journal Article
TL;DR: This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.
Abstract: With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.

9,239 citations
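The q value described in the false-discovery-rate abstract can be estimated from a vector of p values. The sketch below follows a commonly used recipe: estimate the proportion of true null hypotheses (pi0) from the p values above a tuning parameter lambda, then scale and monotonize the sorted p values. Fixing lambda at 0.5 is an assumption (the published approach chooses it more carefully), and the function name `q_values` is hypothetical.

```python
import numpy as np


def q_values(p_values, lam: float = 0.5) -> np.ndarray:
    """Hypothetical helper: q values from p values using a fixed-lambda pi0 estimate."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    # Estimated proportion of true nulls: p values above lam are assumed to be mostly null.
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    ranked = p[order]
    # Estimated FDR if the significance threshold were set at each sorted p value.
    q = pi0 * m * ranked / np.arange(1, m + 1)
    # A feature's q value is the smallest estimated FDR among thresholds that still call it.
    q = np.minimum.accumulate(q[::-1])[::-1]
    q = np.minimum(q, 1.0)
    result = np.empty(m)
    result[order] = q  # restore the caller's ordering
    return result
```

When pi0 is estimated as 1 this reduces to Benjamini-Hochberg adjusted p values; a smaller pi0 estimate yields proportionally smaller q values, which is the "more liberal criterion" the abstract refers to.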

Journal Article
01 Nov 2001-Nature
TL;DR: Stem cell biology has come of age: Unequivocal proof that stem cells exist in the haematopoietic system has given way to the prospective isolation of several tissue-specific stem and progenitor cells, the initial delineation of their properties and expressed genetic programmes, and the beginnings of their utility in regenerative medicine.
Abstract: Stem cell biology has come of age. Unequivocal proof that stem cells exist in the haematopoietic system has given way to the prospective isolation of several tissue-specific stem and progenitor cells, the initial delineation of their properties and expressed genetic programmes, and the beginnings of their utility in regenerative medicine. Perhaps the most important and useful property of stem cells is that of self-renewal. Through this property, striking parallels can be found between stem cells and cancer cells: tumours may often originate from the transformation of normal stem cells, similar signalling pathways may regulate self-renewal in stem cells and cancer cells, and cancer cells may include 'cancer stem cells' - rare cells with indefinite potential for self-renewal that drive tumorigenesis.

8,999 citations