scispace - formally typeset
Search or ask a question
Author

Arlindo L. Oliveira

Bio: Arlindo L. Oliveira is an academic researcher from University of Lisbon. The author has contributed to research in topics: Compressed suffix array & Sequential logic. The author has an hindex of 34, co-authored 154 publications receiving 6991 citations. Previous affiliations of Arlindo L. Oliveira include Technical University of Lisbon & University of California, Berkeley.


Papers
More filters
Journal ArticleDOI
TL;DR: In this comprehensive survey, a large number of existing approaches to biclustering are analyzed, and they are classified in accordance with the type of biclusters they can find, the patterns of bIClusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Abstract: A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix has been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred in the literature as coclustering and direct clustering, among others names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

2,123 citations

Posted ContentDOI
Spyridon Bakas1, Mauricio Reyes, Andras Jakab2, Stefan Bauer3  +435 moreInstitutions (111)
TL;DR: This study assesses the state-of-the-art machine learning methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018, and investigates the challenge of identifying the best ML algorithms for each of these tasks.
Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumoris a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses thestate-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross tota lresection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

1,165 citations

Journal ArticleDOI
TL;DR: The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT; ) database is a repository of 12 346 regulatory associations between transcription factors and target genes, based on experimental evidence which was spread throughout 861 bibliographic references.
Abstract: We present the YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT; www.yeastract.com) database, a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. This database is a repository of 12 346 regulatory associations between transcription factors and target genes, based on experimental evidence which was spread throughout 861 bibliographic references. It also includes 257 specific DNA-binding sites for more than a hundred characterized transcription factors. Further information about each yeast gene included in the database was obtained from Saccharomyces Genome Database (SGD), Regulatory Sequences Analysis Tools and Gene Ontology (GO) Consortium. Computational tools are also provided to facilitate the exploitation of the gathered data when solving a number of biological questions as exemplified in the Tutorial also available on the system. YEASTRACT allows the identification of documented or potential transcription regulators of a given gene and of documented or potential regulons for each transcription factor. It also renders possible the comparison between DNA motifs, such as those found to be over-represented in the promoter regions of co-regulated genes, and the transcription factor-binding sites described in the literature. The system also provides an useful mechanism for grouping a list of genes (for instance a set of genes with similar expression profiles as revealed by microarray analysis) based on their regulatory associations with known transcription factors.

525 citations

01 Jan 2001
TL;DR: A survey on the most significant techniques developed in the past ten years to deal with temporal sequences is provided.
Abstract: One of the main unresolved problems that arise during the data mining process is treating data that contains temporal information. In this case, a complete understanding of the entire phenomenon requires that the data should be viewed as a sequence of events. Temporal sequences appear in a vast range of domains, from engineering, to medicine and finance, and the ability to model and extract information from them is crucial for the advance of the information society. This paper provides a survey on the most significant techniques developed in the past ten years to deal with temporal sequences.

275 citations

Journal ArticleDOI
TL;DR: The YEASTRACT information system was revisited and new information was added on the experimental conditions in which those associations take place and on whether the TF is acting on its target genes as activator or repressor, allowing the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect.
Abstract: The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200,000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 113 TFs. All regulatory associations stored in YEASTRACT were revisited and new information was added on the experimental conditions in which those associations take place and on whether the TF is acting on its target genes as activator or repressor. Based on this information, new queries were developed allowing the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. This release further offers tools to rank the TFs controlling a gene or genome-wide response by their relative importance, based on (i) the percentage of target genes in the data set; (ii) the enrichment of the TF regulon in the data set when compared with the genome; or (iii) the score computed using the TFRank system, which selects and prioritizes the relevant TFs by walking through the yeast regulatory network. We expect that with the new data and services made available, the system will continue to be instrumental for yeast biologists and systems biology researchers.

248 citations


Cited by
More filters
Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of highto low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated muddominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debrisflow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations