Author

Sašo Džeroski

Other affiliations: The Turing Institute
Bio: Sašo Džeroski is an academic researcher at the Jožef Stefan Institute. His research focuses on topics including cluster analysis and random forests. He has an h-index of 49 and has co-authored 268 publications receiving 9,183 citations.


Papers
Journal ArticleDOI
Predrag Radivojac, Wyatt T. Clark, Tal Ronnen Oron, Alexandra M. Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher S. Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M. Yunes, Ameet Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W. A. Buchan, Kevin Bryson, David T. Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas Martin Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E. Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N. Wass, Michael J.E. Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A. I. Kourmpetis, Aalt D. J. van Dijk, Cajo J. F. ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Marc Bairoch, Michal Linial, Patricia C. Babbitt, Steven E. Brenner, Christine A. Orengo, Burkhard Rost, Sean D. Mooney, Iddo Friedberg
TL;DR: Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; even so, there is considerable need for improvement in the currently available tools.
Abstract: Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
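
Assessments of this kind compare, for each target protein, a method's predicted function terms at a score threshold against the experimentally determined annotations. Below is a minimal sketch of one common protocol, a threshold-sweeping Fmax measure; the data structures are illustrative and this is not necessarily the paper's exact scoring procedure.

```python
# Hedged sketch: scoring predicted protein function terms against
# experimental annotations with a threshold sweep (Fmax-style).
# Data structures are illustrative, not the CAFA submission format.

def fmax(predictions, truth):
    """predictions: {protein: {term: score}}; truth: {protein: set of terms}."""
    best = 0.0
    for t in [i / 100 for i in range(1, 100)]:
        precisions, recalls = [], []
        for protein, annotated in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items()
                         if s >= t}
            if predicted:  # precision is averaged over proteins with predictions
                precisions.append(len(predicted & annotated) / len(predicted))
            recalls.append(len(predicted & annotated) / len(annotated))
        if precisions:
            p = sum(precisions) / len(precisions)
            r = sum(recalls) / len(recalls)
            if p + r > 0:
                best = max(best, 2 * p * r / (p + r))
    return best
```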

859 citations

Journal ArticleDOI
TL;DR: This work empirically evaluates several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking, shows that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation, and proposes two extensions: one using an extended set of meta-level features and one using multi-response model trees to learn at the meta-level.
Abstract: We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
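
The basic recipe is easy to sketch, assuming scikit-learn: the meta-level features are out-of-fold class-probability distributions from each heterogeneous base classifier. Plain logistic regression stands in here for the paper's multi-response linear regression (and its model-tree extension), and the select-best-by-cross-validation baseline is shown for comparison.

```python
# Hedged sketch of stacking with class-probability meta-features.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
base_learners = [DecisionTreeClassifier(random_state=0), GaussianNB(),
                 LogisticRegression(max_iter=1000)]

# Meta-level features: out-of-fold class-probability distributions of
# each heterogeneous base classifier, concatenated column-wise.
meta_X = np.hstack([cross_val_predict(clf, X, y, cv=5, method="predict_proba")
                    for clf in base_learners])
meta_learner = LogisticRegression(max_iter=1000).fit(meta_X, y)

# Baseline the paper compares against: select the single best base
# classifier by cross validation.
best = max(base_learners, key=lambda c: cross_val_score(c, X, y, cv=5).mean())
```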

768 citations

Journal ArticleDOI
TL;DR: HMC trees outperform HSC and SC trees along three dimensions, namely predictive accuracy, model size, and induction time, and should be considered for HMC tasks where interpretable models are desired.
Abstract: Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS's FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
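
The key data-representation step can be shown in isolation: a single HMC tree predicts one target vector over all classes, and membership in a class implies membership in all of its ancestors, whether the hierarchy is a tree or a DAG. A minimal sketch with invented class names; it illustrates only the label encoding, not the tree-induction algorithm itself.

```python
# Hedged sketch: building HMC target vectors that respect the hierarchy.

def ancestors(cls, parents):
    """All ancestors of cls in a DAG, given a {class: [parent, ...]} map."""
    seen, stack = set(), list(parents.get(cls, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def hmc_label_vector(instance_classes, all_classes, parents):
    # Expand labels upward: belonging to a class implies its ancestors.
    expanded = set(instance_classes)
    for c in instance_classes:
        expanded |= ancestors(c, parents)
    return [1 if c in expanded else 0 for c in sorted(all_classes)]

# Invented FunCat-style example with a single-parent hierarchy; a
# GO-style DAG would simply list several parents per class.
parents = {"glycolysis": ["metabolism"], "metabolism": []}
print(hmc_label_vector({"glycolysis"}, {"glycolysis", "metabolism"}, parents))
# -> [1, 1]: the hierarchy constraint holds in the target vector itself.
```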

616 citations

Journal ArticleDOI
TL;DR: Relational reinforcement learning (RRL) combines reinforcement learning with relational learning or inductive logic programming; its more expressive representation language for states, actions, and Q-functions means it can potentially be applied to a new range of learning tasks.
Abstract: Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can be potentially applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.
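
To give a flavor of the setup, here is a hedged sketch of Q-learning over relational state descriptions in a toy blocks world: states are sets of ground facts rather than opaque identifiers, so structurally identical configurations share Q-values. This hints only at the representational idea; the paper's actual Q-function generalizes further via logical regression, which is not shown.

```python
# Hedged sketch: tabular Q-learning keyed on relational blocks-world states.
from collections import defaultdict

Q = defaultdict(float)               # (state, action) -> estimated value
alpha, gamma = 0.5, 0.9

def q_update(state, action, reward, next_state, next_actions):
    """One Q-learning backup; state is a frozenset of ground facts."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])

# Example: moving block a from b to the table.
s = frozenset({("on", "a", "b"), ("clear", "a"), ("ontable", "b")})
s2 = frozenset({("ontable", "a"), ("clear", "a"),
                ("clear", "b"), ("ontable", "b")})
q_update(s, ("move", "a", "table"), 1.0, s2, [("move", "a", "b")])
```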

395 citations

Journal ArticleDOI
TL;DR: This article provides a brief introduction to MRDM; the remainder of this special issue treats advanced research topics at the frontiers of MRDM in detail.
Abstract: Data mining algorithms look for patterns in data. While most existing data mining approaches look for patterns in a single data table, multi-relational data mining (MRDM) approaches look for patterns that involve multiple tables (relations) from a relational database. In recent years, the most common types of patterns and approaches considered in data mining have been extended to the multi-relational case and MRDM now encompasses multi-relational (MR) association rule discovery, MR decision trees and MR distance-based methods, among others. MRDM approaches have been successfully applied to a number of problems in a variety of areas, most notably in the area of bioinformatics. This article provides a brief introduction to MRDM, while the remainder of this special issue treats in detail advanced research topics at the frontiers of MRDM.
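
One concrete instance of a multi-relational pattern is easy to sketch: a feature of a target table defined by joining and aggregating a related one-to-many table. A minimal sketch with invented tables in the bioinformatics spirit mentioned above; real MRDM systems search for such cross-table patterns automatically rather than having them hand-written.

```python
# Hedged sketch: a pattern spanning two relations, expressed as a
# join-and-aggregate over a toy in-memory database.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gene(gene_id TEXT PRIMARY KEY, essential INTEGER);
    CREATE TABLE interaction(gene_id TEXT, partner TEXT, strength REAL);
    INSERT INTO gene VALUES ('g1', 1), ('g2', 0);
    INSERT INTO interaction VALUES
        ('g1', 'g2', 0.9), ('g1', 'g3', 0.4), ('g2', 'g3', 0.2);
""")
# Each gene described by the count and mean strength of its interactions:
# a multi-table pattern no single-table learner could express directly.
rows = con.execute("""
    SELECT g.gene_id, g.essential, COUNT(i.partner), AVG(i.strength)
    FROM gene g LEFT JOIN interaction i ON g.gene_id = i.gene_id
    GROUP BY g.gene_id
""").fetchall()
print(rows)
```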

292 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up to date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
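
The fourth category's mail-filtering example translates directly into a few lines of code. A hedged sketch, assuming scikit-learn and an invented toy corpus: the filter is learned from messages the user has kept or rejected, exactly as the passage describes.

```python
# Hedged sketch: learning a per-user mail filter from past decisions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["win a free prize now", "meeting moved to 3pm",
            "cheap pills online", "lunch tomorrow?"]
rejected = [1, 0, 1, 0]   # 1 = the user rejected this message

filter_model = make_pipeline(CountVectorizer(), MultinomialNB())
filter_model.fit(messages, rejected)
print(filter_model.predict(["free pills, win now"]))  # likely [1]
```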

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This book surveys probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and methods for combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings ArticleDOI
13 Aug 2016
TL;DR: node2vec learns a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes, using a biased random walk procedure to explore diverse neighborhoods efficiently.
Abstract: Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
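
The heart of node2vec, the biased second-order random walk, fits in a few lines. A minimal sketch with a toy adjacency map: the return parameter p and in-out parameter q reweight each step given the previous node, and the resulting walks would then be fed to a skip-gram model (not shown here).

```python
# Hedged sketch of node2vec's biased random walk: from node v with
# previous node t, a neighbor x is weighted 1/p if x == t (return),
# 1 if x is also a neighbor of t (stay close), and 1/q otherwise (move on).
import random

def biased_walk(adj, start, length, p=1.0, q=1.0):
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        neighbors = list(adj[v])
        if not neighbors:
            break
        if len(walk) == 1:               # first step: uniform choice
            walk.append(random.choice(neighbors))
            continue
        t = walk[-2]
        weights = [1.0 / p if x == t else (1.0 if x in adj[t] else 1.0 / q)
                   for x in neighbors]
        walk.append(random.choices(neighbors, weights=weights)[0])
    return walk

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
# Low p favors backtracking, high q discourages moving outward: BFS-like.
print(biased_walk(adj, "a", 10, p=0.25, q=4.0))
```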

7,072 citations
