Showing papers in "Machine Learning in 2021"

PDF

Open Access

Journal Article•DOI•

Aleatoric and epistemic uncertainty in machine learning : an introduction to concepts and methods

[...]

Eyke Hüllermeier¹, Willem Waegeman²•Institutions (2)

University of Paderborn¹, Ghent University²

01 Mar 2021-Machine Learning

TL;DR: The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology as mentioned in this paper, and this includes the importance of distinguishing between aleatoric and epistemic uncertainty.

...read moreread less

Abstract: The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.

...read moreread less

321 citations

Journal Article•DOI•

Regularisation of neural networks by enforcing Lipschitz continuity

[...]

Henry Gouk¹, Eibe Frank², Bernhard Pfahringer², Michael J. Cree²•Institutions (2)

University of Edinburgh¹, University of Waikato²

01 Feb 2021-Machine Learning

TL;DR: The technique is used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods and shows that the performance of the resulting models exceeds that of models trained with other common regularisers.

...read moreread less

Abstract: We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant—for multiple p-norms—of a feed forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.

...read moreread less

196 citations

Journal Article•DOI•

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

[...]

Gabriel Dulac-Arnold¹, Nir Levine, Daniel J. Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester - Show less +3 more•Institutions (1)

Google¹

22 Apr 2021-Machine Learning

TL;DR: This work identifies and formalizes a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems and proposes an as an open-source benchmark.

...read moreread less

Abstract: Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose an as an open-source benchmark.

...read moreread less

195 citations

Journal Article•DOI•

Conditional variance penalties and domain shift robustness

[...]

Christina Heinze-Deml¹, Nicolai Meinshausen¹•Institutions (1)

ETH Zurich¹

01 Feb 2021-Machine Learning

TL;DR: Using a causal framework, this conditional variance regularization (CoRe) is shown to protect asymptotically against shifts in the distribution of the style variables and improves predictive accuracy substantially in settings where domain changes occur in terms of image quality, brightness and color.

...read moreread less

Abstract: When training a deep neural network for image classification, one can broadly distinguish between two types of latent features of images that will drive the classification. We can divide latent features into (i) ‘core’ or ‘conditionally invariant’ features $$C$$ whose distribution $$C\vert Y$$ , conditional on the class Y, does not change substantially across domains and (ii) ‘style’ features $$S$$ whose distribution $$S\vert Y$$ can change substantially across domains. Examples for style features include position, rotation, image quality or brightness but also more complex ones like hair color, image quality or posture for images of persons. Our goal is to minimize a loss that is robust under changes in the distribution of these style features. In contrast to previous work, we assume that the domain itself is not observed and hence a latent variable. We do assume that we can sometimes observe a typically discrete identifier or “ $$\mathrm {ID}$$ variable”. In some applications we know, for example, that two images show the same person, and $$\mathrm {ID}$$ then refers to the identity of the person. The proposed method requires only a small fraction of images to have $$\mathrm {ID}$$ information. We group observations if they share the same class and identifier $$(Y,\mathrm {ID})=(y,\mathrm {id})$$ and penalize the conditional variance of the prediction or the loss if we condition on $$(Y,\mathrm {ID})$$ . Using a causal framework, this conditional variance regularization (CoRe) is shown to protect asymptotically against shifts in the distribution of the style variables in a partially linear structural equation model. Empirically, we show that the CoRe penalty improves predictive accuracy substantially in settings where domain changes occur in terms of image quality, brightness and color while we also look at more complex changes such as changes in movement and posture.

...read moreread less

100 citations

Journal Article•DOI•

How artificial intelligence and machine learning can help healthcare systems respond to COVID-19.

[...]

Mihaela van der Schaar¹, Mihaela van der Schaar², Ahmed M. Alaa², R. Andres Floto¹, Alexander Gimson³, Stefan Scholtes¹, Angela M. Wood¹, Eoin F. McKinney¹, Daniel Jarrett¹, Pietro Liò¹, Ari Ercole³, Ari Ercole¹ - Show less +8 more•Institutions (3)

University of Cambridge¹, University of California, Los Angeles², Cambridge University Hospitals NHS Foundation Trust³

01 Jan 2021-Machine Learning

TL;DR: It is argued that the integration of these techniques into local, national, and international healthcare systems will save lives, and proposed specific methods by which implementation can happen swiftly and efficiently are proposed.

...read moreread less

Abstract: The COVID-19 global pandemic is a threat not only to the health of millions of individuals, but also to the stability of infrastructure and economies around the world. The disease will inevitably place an overwhelming burden on healthcare systems that cannot be effectively dealt with by existing facilities or responses based on conventional approaches. We believe that a rigorous clinical and societal response can only be mounted by using intelligence derived from a variety of data sources to better utilize scarce healthcare resources, provide personalized patient management plans, inform policy, and expedite clinical trials. In this paper, we introduce five of the most important challenges in responding to COVID-19 and show how each of them can be addressed by recent developments in machine learning (ML) and artificial intelligence (AI). We argue that the integration of these techniques into local, national, and international healthcare systems will save lives, and propose specific methods by which implementation can happen swiftly and efficiently. We offer to extend these resources and knowledge to assist policymakers seeking to implement these techniques.

...read moreread less

99 citations

Journal Article•DOI•

LoRAS: an oversampling approach for imbalanced datasets

[...]

Saptarshi Bej¹, Narek Davtyan¹, Markus Wolfien¹, Mariam Nassar¹, Olaf Wolkenhauer¹ - Show less +1 more•Institutions (1)

University of Rostock¹

01 Feb 2021-Machine Learning

TL;DR: To explain the success of the algorithm, a mathematical framework is constructed to prove that LoRAS oversampling technique provides a better estimate for the mean of the underlying local data distribution of the minority class data space.

...read moreread less

Abstract: The Synthetic Minority Oversampling TEchnique (SMOTE) is widely-used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class, and effecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our algorithm with 14 publicly available imbalanced datasets using three different Machine Learning (ML) algorithms and compared the performance of LoRAS, SMOTE and several SMOTE extensions that share the concept of using convex combinations of minority class data points for oversampling with LoRAS. We observed that LoRAS, on average generates better ML models in terms of F1-Score and Balanced accuracy. Another key observation is that while most of the extensions of SMOTE we have tested, improve the F1-Score with respect to SMOTE on an average, they compromise on the Balanced accuracy of a classification model. LoRAS on the contrary, improves both F1 Score and the Balanced accuracy thus produces better classification models. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that LoRAS oversampling technique provides a better estimate for the mean of the underlying local data distribution of the minority class data space.

...read moreread less

67 citations

Journal Article•DOI•

Interpretable clustering: an optimization approach

[...]

Dimitris Bertsimas¹, Agni Orfanoudaki¹, Holly Wiberg¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2021-Machine Learning

TL;DR: A new unsupervised learning method is presented that leverages Mixed Integer Optimization techniques to generate interpretable tree-based clustering models and achieves comparable or superior performance to other clustering methods on both synthetic and real-world datasets while offering significantly higher interpretability.

...read moreread less

Abstract: State-of-the-art clustering algorithms provide little insight into the rationale for cluster membership, limiting their interpretability. In complex real-world applications, the latter poses a barrier to machine learning adoption when experts are asked to provide detailed explanations of their algorithms’ recommendations. We present a new unsupervised learning method that leverages Mixed Integer Optimization techniques to generate interpretable tree-based clustering models. Utilizing a flexible optimization-driven framework, our algorithm approximates the globally optimal solution leading to high quality partitions of the feature space. We propose a novel method which can optimize for various clustering internal validation metrics and naturally determines the optimal number of clusters. It successfully addresses the challenge of mixed numerical and categorical data and achieves comparable or superior performance to other clustering methods on both synthetic and real-world datasets while offering significantly higher interpretability.

...read moreread less

46 citations

Journal Article•DOI•

HIVE-COTE 2.0: a new meta ensemble for time series classification

[...]

Matthew Middlehurst¹, James Large¹, Michael Flynn¹, Jason Lines¹, Aaron Bostrom¹, Anthony J. Bagnall¹ - Show less +2 more•Institutions (1)

University of East Anglia¹

24 Sep 2021-Machine Learning

TL;DR: The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification as discussed by the authors, which forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phasedependent intervals.

...read moreread less

Abstract: The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble and Diverse Representation Canonical Interval Forest, which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate on average than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets.

...read moreread less

46 citations

Journal Article•DOI•

The voice of optimization

[...]

Dimitris Bertsimas¹, Bartolomeo Stellato¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Feb 2021-Machine Learning

TL;DR: In this paper, the authors introduce the idea of using OCTs and OCT-Hs, interpretable machine learning algorithms developed by Bertsimas and Dunn, to obtain insight on the strategy behind the optimal solution in continuous and mixed-integer convex optimization problems.

...read moreread less

Abstract: We introduce the idea that using optimal classification trees (OCTs) and optimal classification trees with-hyperplanes (OCT-Hs), interpretable machine learning algorithms developed by Bertsimas and Dunn (Mach Learn 106(7):1039–1082, 2017), we are able to obtain insight on the strategy behind the optimal solution in continuous and mixed-integer convex optimization problem as a function of key parameters that affect the problem. In this way, optimization is not a black box anymore. Instead, we redefine optimization as a multiclass classification problem where the predictor gives insights on the logic behind the optimal solution. In other words, OCTs and OCT-Hs give optimization a voice. We show on several realistic examples that the accuracy behind our method is in the 90–100% range, while even when the predictions are not correct, the degree of suboptimality or infeasibility is very low. We compare optimal strategy predictions of OCTs and OCT-Hs and feedforward neural networks (NNs) and conclude that the performance of OCT-Hs and NNs is comparable. OCTs are somewhat weaker but often competitive. Therefore, our approach provides a novel insightful understanding of optimal strategies to solve a broad class of continuous and mixed-integer optimization problems.

...read moreread less

43 citations

Journal Article•DOI•

OWL2Vec*: Embedding of OWL Ontologies

[...]

Jiaoyan Chen¹, Pan Hu¹, Ernesto Jiménez-Ruiz², Ernesto Jiménez-Ruiz³, Ole Magnus Holter², Denvar Antonyrajah⁴, Ian Horrocks¹ - Show less +3 more•Institutions (4)

University of Oxford¹, University of Oslo², City University London³, Samsung⁴

16 Jun 2021-Machine Learning

TL;DR: In this paper, a random walk and word embedding based ontology embedding method named OWL2Vec*, which encodes the semantics of an OWL ontology by taking into account its graph structure, lexical information and logical constructors.

...read moreread less

Abstract: Semantic embedding of knowledge graphs has been widely studied and used for prediction and statistical analysis tasks across various domains such as Natural Language Processing and the Semantic Web. However, less attention has been paid to developing robust methods for embedding OWL (Web Ontology Language) ontologies, which contain richer semantic information than plain knowledge graphs, and have been widely adopted in domains such as bioinformatics. In this paper, we propose a random walk and word embedding based ontology embedding method named OWL2Vec*, which encodes the semantics of an OWL ontology by taking into account its graph structure, lexical information and logical constructors. Our empirical evaluation with three real world datasets suggests that OWL2Vec* benefits from these three different aspects of an ontology in class membership prediction and class subsumption prediction tasks. Furthermore, OWL2Vec* often significantly outperforms the state-of-the-art methods in our experiments.

...read moreread less

40 citations

Journal Article•DOI•

F*: an interpretable transformation of the F-measure

[...]

David J. Hand¹, Peter Christen², Nishadi Kirielle²•Institutions (2)

Imperial College London¹, Australian National University²

15 Mar 2021-Machine Learning

TL;DR: This paper proposed a simple transformation of the F-measure, which they call $$F^*$$ (F-star), which has an immediate practical interpretation, and showed that combining two aspects of performance as conceptually distinct as precision and recall is the best way to combine them.

...read moreread less

Abstract: The F-measure, also known as the F1-score, is widely used to assess the performance of classification algorithms. However, some researchers find it lacking in intuitive interpretation, questioning the appropriateness of combining two aspects of performance as conceptually distinct as precision and recall, and also questioning whether the harmonic mean is the best way to combine them. To ease this concern, we describe a simple transformation of the F-measure, which we call $$F^*$$ (F-star), which has an immediate practical interpretation.

...read moreread less

Journal Article•DOI•

Learning programs by learning from failures

[...]

Andrew Cropper¹, Rolf Morel¹•Institutions (1)

University of Oxford¹

19 Feb 2021-Machine Learning

TL;DR: Popper is introduced, an ILP system that implements this approach by combining answer set programming and Prolog, and shows that constraints drastically improve learning performance, and Popper can outperform existing ILP systems, both in terms of predictive accuracies and learning times.

...read moreread less

Abstract: We describe an inductive logic programming (ILP) approach called learning from failures. In this approach, an ILP system (the learner) decomposes the learning problem into three separate stages: generate, test, and constrain. In the generate stage, the learner generates a hypothesis (a logic program) that satisfies a set of hypothesis constraints (constraints on the syntactic form of hypotheses). In the test stage, the learner tests the hypothesis against training examples. A hypothesis fails when it does not entail all the positive examples or entails a negative example. If a hypothesis fails, then, in the constrain stage, the learner learns constraints from the failed hypothesis to prune the hypothesis space, i.e. to constrain subsequent hypothesis generation. For instance, if a hypothesis is too general (entails a negative example), the constraints prune generalisations of the hypothesis. If a hypothesis is too specific (does not entail all the positive examples), the constraints prune specialisations of the hypothesis. This loop repeats until either (i) the learner finds a hypothesis that entails all the positive and none of the negative examples, or (ii) there are no more hypotheses to test. We introduce Popper, an ILP system that implements this approach by combining answer set programming and Prolog. Popper supports infinite problem domains, reasoning about lists and numbers, learning textually minimal programs, and learning recursive programs. Our experimental results on three domains (toy game problems, robot strategies, and list transformations) show that (i) constraints drastically improve learning performance, and (ii) Popper can outperform existing ILP systems, both in terms of predictive accuracies and learning times.

...read moreread less

Journal Article•DOI•

Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics

[...]

Felix Berkenkamp¹, Andreas Krause¹, Angela P. Schoellig²•Institutions (2)

ETH Zurich¹, University of Toronto²

24 Jun 2021-Machine Learning

TL;DR: In this paper, the authors present a generalized algorithm that allows for multiple safety constraints separate from the objective, which can be used to safely transfer knowledge to new situations and tasks, and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.

...read moreread less

Abstract: Selecting the right tuning parameters for algorithms is a pravelent problem in machine learning that can significantly affect the performance of algorithms. Data-efficient optimization algorithms, such as Bayesian optimization, have been used to automate this process. During experiments on real-world systems such as robotic platforms these methods can evaluate unsafe parameters that lead to safety-critical system failures and can destroy the system. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is often not desirable in practice, since they are often opposing objectives. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy safety for all constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions in terms of a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.

...read moreread less

Journal Article•DOI•

CPAS: the UK's national machine learning-based hospital capacity planning system for COVID-19.

[...]

Zhaozhi Qian¹, Ahmed M. Alaa², Mihaela van der Schaar², Mihaela van der Schaar³, Mihaela van der Schaar¹ - Show less +1 more•Institutions (3)

University of Cambridge¹, University of California, Los Angeles², The Turing Institute³

01 Jan 2021-Machine Learning

TL;DR: In this paper, the authors developed the COVID-19 Capacity Planning and Analysis System (CPAS), a machine learning-based system for hospital resource planning that has successfully deployed at individual hospitals and across regions in the UK in coordination with NHS Digital.

...read moreread less

Abstract: The coronavirus disease 2019 (COVID-19) global pandemic poses the threat of overwhelming healthcare systems with unprecedented demands for intensive care resources. Managing these demands cannot be effectively conducted without a nationwide collective effort that relies on data to forecast hospital demands on the national, regional, hospital and individual levels. To this end, we developed the COVID-19 Capacity Planning and Analysis System (CPAS)-a machine learning-based system for hospital resource planning that we have successfully deployed at individual hospitals and across regions in the UK in coordination with NHS Digital. In this paper, we discuss the main challenges of deploying a machine learning-based decision support system at national scale, and explain how CPAS addresses these challenges by (1) defining the appropriate learning problem, (2) combining bottom-up and top-down analytical approaches, (3) using state-of-the-art machine learning algorithms, (4) integrating heterogeneous data sources, and (5) presenting the result with an interactive and transparent interface. CPAS is one of the first machine learning-based systems to be deployed in hospitals on a national scale to address the COVID-19 pandemic-we conclude the paper with a summary of the lessons learned from this experience.

...read moreread less

Journal Article•DOI•

Boosting Poisson regression models with telematics car driving data

[...]

Guangyuan Gao¹, He Wang², Mario V. Wüthrich³•Institutions (3)

Renmin University of China¹, Southern University of Science and Technology², ETH Zurich³

21 Mar 2021-Machine Learning

TL;DR: It is concluded from the numerical analysis that both classical actuarial risk factors and telematics car driving data are necessary to receive the best predictive models and that these two sources of information interact and complement each other.

...read moreread less

Abstract: With the emergence of telematics car driving data, insurance companies have started to boost classical actuarial regression models for claim frequency prediction with telematics car driving information. In this paper, we propose two data-driven neural network approaches that process telematics car driving data to complement classical actuarial pricing with a driving behavior risk factor from telematics data. Our neural networks simultaneously accommodate feature engineering and regression modeling which allows us to integrate telematics car driving data in a one-step approach into the claim frequency regression models. We conclude from our numerical analysis that both classical actuarial risk factors and telematics car driving data are necessary to receive the best predictive models. This emphasizes that these two sources of information interact and complement each other.

...read moreread less

Journal Article•DOI•

Spatial dependence between training and test sets: another pitfall of classification accuracy assessment in remote sensing

[...]

N. Karasiak¹, Jean-François Dejoux¹, Claude Monteil¹, David Sheeren¹•Institutions (1)

University of Toulouse¹

26 Apr 2021-Machine Learning

TL;DR: Experiments based on Sentinel-2 data for mapping two simple forest classes show that spatial leave-one-out cross-validation is the better strategy to provide unbiased estimates of predictive error, contrary to traditional non-spatial cross- validation that overestimates accuracy.

...read moreread less

Abstract: Spatial autocorrelation is inherent to remotely sensed data. Nearby pixels are more similar than distant ones. This property can help to improve the classification performance, by adding spatial or contextual features into the model. However, it can also lead to overestimation of generalisation capabilities, if the spatial dependence between training and test sets is ignored. In this paper, we review existing approaches that deal with spatial autocorrelation for image classification in remote sensing and demonstrate the importance of bias in accuracy metrics when spatial independence between the training and test sets is not respected. We compare three spatial and non-spatial cross-validation strategies at pixel and object levels and study how performances vary at different sample sizes. Experiments based on Sentinel-2 data for mapping two simple forest classes show that spatial leave-one-out cross-validation is the better strategy to provide unbiased estimates of predictive error. Its performance metrics are consistent with the real quality of the resulting map contrary to traditional non-spatial cross-validation that overestimates accuracy. This highlight the need to change practices in classification accuracy assessment. To encourage it we developped Museo ToolBox, an open-source python library that makes spatial cross-validation possible.

...read moreread less

Journal Article•DOI•

Transforming variables to central normality

[...]

Jakob Raymaekers¹, Peter J. Rousseeuw¹•Institutions (1)

Katholieke Universiteit Leuven¹

21 Mar 2021-Machine Learning

TL;DR: This work proposes a modification of the Box–Cox and Yeo–Johnson transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it.

...read moreread less

Abstract: Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.

...read moreread less

Journal Article•DOI•

How to measure uncertainty in uncertainty sampling for active learning

[...]

Vu-Linh Nguyen¹, Mohammad Hossein Shaker², Eyke Hüllermeier²•Institutions (2)

Eindhoven University of Technology¹, University of Paderborn²

18 Jun 2021-Machine Learning

TL;DR: In this article, the authors compare different uncertainty measures for active learning and compare their performance in an experimental study with different sampling strategies, and analyze the properties of the sampling strategies and compare them in an empirical study.

...read moreread less

Abstract: Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, alongside with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.

...read moreread less

Journal Article•DOI•

Large-scale pinball twin support vector machines

[...]

Muhammad Tanveer¹, Aruna Tiwari¹, Rahul Choudhary¹, M.A. Ganaie¹•Institutions (1)

Indian Institute of Technology Indore¹

04 Oct 2021-Machine Learning

TL;DR: In this paper, a pinball twin support vector machine (LPTWSVM) is proposed to solve the problem of noise sensitivity in TWSVM by utilizing the pinball loss function.

...read moreread less

Abstract: Twin support vector machines (TWSVMs) have been shown to be effective classifiers for a range of pattern classification tasks. However, the TWSVM formulation suffers from a range of shortcomings: (i) TWSVM uses hinge loss function which renders it sensitive to dataset outliers (noise sensitivity). (ii) It requires a matrix inversion calculation in the Wolfe-dual formulation which is intractable for datasets with large numbers of features/samples. (iii) TWSVM minimizes the empirical risk instead of the structural risk in its formulation with the consequent risk of overfitting. This paper proposes a novel large scale pinball twin support vector machines (LPTWSVM) to address these shortcomings. The proposed LPTWSVM model firstly utilizes the pinball loss function to achieve a high level of noise insensitivity, especially in relation to data with substantial feature noise. Secondly, and most significantly, the proposed LPTWSVM formulation eliminates the need to calculate inverse matrices in the dual problem (which apart from being very computationally demanding may not be possible due to matrix singularity). Further, LPTWSVM does not employ kernel-generated surfaces for the non-linear case, instead using the kernel trick directly; this ensures that the proposed LPTWSVM is a fully modular kernel approach in contrast to the original TWSVM. Lastly, structural risk is explicitly minimized in LPTWSVM with consequent improvement in classification accuracy (we explicitly analyze the properties of classification accuracy and noise insensitivity of the proposed LPTWSVM). Experiments on benchmark datasets show that the proposed LPTWSVM model may be effectively deployed on large datasets and that it exhibits similar or better performance on most datasets in comparison to relevant baseline methods.

...read moreread less

Journal Article•DOI•

Beneficial and harmful explanatory machine learning

[...]

Lun Ai¹, Stephen Muggleton¹, Céline Hocquette¹, Mark Gromowski², Ute Schmid² - Show less +1 more•Institutions (2)

Imperial College London¹, University of Bamberg²

11 Mar 2021-Machine Learning

TL;DR: Investigating the explanatory effects of a machine learned theory in the context of simple two person games and proposing a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature indicate that human learning aided by a symbolic machine learning theory which satisfies a cognitive window has achieved significantly higher performance than human self learning.

...read moreread less

Abstract: Given the recent successes of Deep Learning in AI there has been increased interest in the role and need for explanations in machine learned theories. A distinct notion in this context is that of Michie’s definition of ultra-strong machine learning (USML). USML is demonstrated by a measurable increase in human performance of a task following provision to the human of a symbolic machine learned theory for task performance. A recent paper demonstrates the beneficial effect of a machine learned logic theory for a classification task, yet no existing work to our knowledge has examined the potential harmfulness of machine’s involvement for human comprehension during learning. This paper investigates the explanatory effects of a machine learned theory in the context of simple two person games and proposes a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature. The approach involves a cognitive window consisting of two quantifiable bounds and it is supported by empirical evidence collected from human trials. Our quantitative and qualitative results indicate that human learning aided by a symbolic machine learned theory which satisfies a cognitive window has achieved significantly higher performance than human self learning. Results also demonstrate that human learning aided by a symbolic machine learned theory that fails to satisfy this window leads to significantly worse performance than unaided human learning.

...read moreread less

Journal Article•DOI•

Density-based weighting for imbalanced regression

[...]

Michael Steininger¹, Konstantin Kobs¹, Padraig Davidson¹, Anna Krause¹, Andreas Hotho¹ - Show less +1 more•Institutions (1)

University of Würzburg¹

07 Jul 2021-Machine Learning

TL;DR: In this paper, a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data based on the weighting scheme is proposed.

...read moreread less

Abstract: In many real world settings, imbalanced data impedes model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point’s influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.

...read moreread less

Journal Article•DOI•

Embed2Detect: temporally clustered embedded words for event detection in social media

[...]

Hansi Hettiarachchi¹, Mariam Adedoyin-Olowe¹, Jagdev Bhogal¹, Mohamed Medhat Gaber¹•Institutions (1)

Birmingham City University¹

24 May 2021-Machine Learning

TL;DR: In this article, a novel method termed Embed2Detect is proposed for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering.

...read moreread less

Abstract: Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes the ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually and therefore, automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection have focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We experimented our method on two recent real social media data sets which represent the sports and political domain and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and it outperforms the recent event detection methods. For the sports data set, Embed2Detect achieved 27% higher F-measure than the best-performed baseline and for the political data set, it was an increase of 29%.

...read moreread less

Journal Article•DOI•

Grounded action transformation for sim-to-real reinforcement learning

[...]

Josiah P. Hanna¹, Siddharth Desai², Haresh Karnan², Garrett Warnell², Peter Stone² - Show less +1 more•Institutions (2)

University of Edinburgh¹, University of Texas at Austin²

13 May 2021-Machine Learning

TL;DR: This article introduces a new algorithm for gsl—Grounded Action Transformation (GAT)—and applies it to learning control policies for a humanoid robot and empirically shows that sgat leads to successful real world transfer in situations where gat may fail to find a good policy.

...read moreread less

Abstract: Reinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied on the target, physical system. Grounded simulation learning (gsl) is a general framework that promises to address this issue by altering the simulator to better match the real world (Farchy et al. 2013 in Proceedings of the 12th international conference on autonomous agents and multiagent systems (AAMAS)). This article introduces a new algorithm for gsl—Grounded Action Transformation (GAT)—and applies it to learning control policies for a humanoid robot. We evaluate our algorithm in controlled experiments where we show it to allow policies learned in simulation to transfer to the real world. We then apply our algorithm to learning a fast bipedal walk on a humanoid robot and demonstrate a 43.27% improvement in forward walk velocity compared to a state-of-the art hand-coded walk. This striking empirical success notwithstanding, further empirical analysis shows that gat may struggle when the real world has stochastic state transitions. To address this limitation we generalize gat to the stochastic gat (sgat) algorithm and empirically show that sgat leads to successful real world transfer in situations where gat may fail to find a good policy. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for applying reinforcement learning to learn robot control policies entirely in simulation.

...read moreread less

Journal Article•DOI•

Loss aware post-training quantization

[...]

Yury Nahshan¹, Brian Chmiel¹, Brian Chmiel², Chaim Baskin², Evgenii Zheltonozhskii², Ron Banner¹, Alexander M. Bronstein², Avi Mendelson² - Show less +4 more•Institutions (2)

Intel¹, Technion – Israel Institute of Technology²

01 Oct 2021-Machine Learning

TL;DR: In this paper, the effect of quantization on the structure of the loss landscape is studied and the authors design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods.

...read moreread less

Abstract: Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq .

...read moreread less

Journal Article•DOI•

Bandit algorithms to personalize educational chatbots

[...]

William Cai¹, Josh Grossman¹, Zhiyuan Jerry Lin¹, Hao Sheng¹, Johnny Tian-Zheng Wei², Joseph Jay Williams³, Sharad Goel¹ - Show less +3 more•Institutions (3)

Stanford University¹, University of Massachusetts Amherst², University of Toronto³

25 May 2021-Machine Learning

TL;DR: The findings suggest that personalized conversational agents are promising tools to complement existing online resources for math education, and that data-driven approaches such as contextual bandits are valuable tools for learning effective personalization.

...read moreread less

Abstract: To emulate the interactivity of in-person math instruction, we developed MathBot, a rule-based chatbot that explains math concepts, provides practice questions, and offers tailored feedback. We evaluated MathBot through three Amazon Mechanical Turk studies in which participants learned about arithmetic sequences. In the first study, we found that more than 40% of our participants indicated a preference for learning with MathBot over videos and written tutorials from Khan Academy. The second study measured learning gains, and found that MathBot produced comparable gains to Khan Academy videos and tutorials. We solicited feedback from users in those two studies to emulate a real-world development cycle, with some users finding the lesson too slow and others finding it too fast. We addressed these concerns in the third and main study by integrating a contextual bandit algorithm into MathBot to personalize the pace of the conversation, allowing the bandit to either insert extra practice problems or skip explanations. We randomized participants between two conditions in which actions were chosen uniformly at random (i.e., a randomized A/B experiment) or by the contextual bandit. We found that the bandit learned a similarly effective pedagogical policy to that learned by the randomized A/B experiment while incurring a lower cost of experimentation. Our findings suggest that personalized conversational agents are promising tools to complement existing online resources for math education, and that data-driven approaches such as contextual bandits are valuable tools for learning effective personalization.

...read moreread less

Journal Article•DOI•

Semi-supervised semantic segmentation in Earth Observation: the MiniFrance suite, dataset analysis and multi-task network study

[...]

Javiera Castillo-Navarro¹, Javiera Castillo-Navarro², Bertrand Le Saux³, Alexandre Boulch⁴, Nicolas Audebert⁵, Sébastien Lefèvre² - Show less +2 more•Institutions (5)

Université Paris-Saclay¹, University of Southern Brittany², European Space Agency³, Valeo⁴, Conservatoire national des arts et métiers⁵

14 Apr 2021-Machine Learning

TL;DR: This work introduces a novel large-scale dataset for semi-supervised semantic segmentation in Earth Observation, the MiniFrance suite, and presents tools for data representativeness analysis in terms of appearance similarity and a thorough study of MiniFrance data, demonstrating that it is suitable for learning and generalizes well in a semi- supervised setting.

...read moreread less

Abstract: The development of semi-supervised learning techniques is essential to enhance the generalization capacities of machine learning algorithms. Indeed, raw image data are abundant while labels are scarce, therefore it is crucial to leverage unlabeled inputs to build better models. The availability of large databases have been key for the development of learning algorithms with high level performance. Despite the major role of machine learning in Earth Observation to derive products such as land cover maps, datasets in the field are still limited, either because of modest surface coverage, lack of variety of scenes or restricted classes to identify. We introduce a novel large-scale dataset for semi-supervised semantic segmentation in Earth Observation, the MiniFrance suite. MiniFrance has several unprecedented properties: it is large-scale, containing over 2000 very high resolution aerial images, accounting for more than 200 billions samples (pixels); it is varied, covering 16 conurbations in France, with various climates, different landscapes, and urban as well as countryside scenes; and it is challenging, considering land use classes with high-level semantics. Nevertheless, the most distinctive quality of MiniFrance is being the only dataset in the field especially designed for semi-supervised learning: it contains labeled and unlabeled images in its training partition, which reproduces a life-like scenario. Along with this dataset, we present tools for data representativeness analysis in terms of appearance similarity and a thorough study of MiniFrance data, demonstrating that it is suitable for learning and generalizes well in a semi-supervised setting. Finally, we present semi-supervised deep architectures based on multi-task learning and the first experiments on MiniFrance. These results will serve as baselines for future work on semi-supervised learning over the MiniFrance dataset. The Minifrance suite and related semi-supervised networks will be publicly available to promote semi-supervised works in Earth Observation.

...read moreread less

Journal Article•DOI•

A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

[...]

Ioannis Boukas¹, Damien Ernst¹, Thibaut Théate¹, Adrien Bolland¹, Alexandre Huynen, Martin Buchwald, Christelle Wynants, Bertrand Cornélusse¹ - Show less +4 more•Institutions (1)

University of Liège¹

12 Jul 2021-Machine Learning

TL;DR: In this paper, a Markov Decision Process (MDP) is used to solve the sequential decision-making problem of trading in the European continuous intraday market where exchanges occur through a centralized order book, where storage device operator is the maximization of the profits received over the entire trading horizon, while taking into account the operational constraints of the unit.

...read moreread less

Abstract: The large integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets and in particular the intraday market are considered a suitable trading floor for these exchanges to occur. A key component for the successful renewable energy sources integration is the usage of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market where exchanges occur through a centralized order book. The goal of the storage device operator is the maximization of the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of the existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used for the generation of a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a number of benchmark strategies. Finally, the impact of the storage characteristics on the total revenues collected in the intraday market is evaluated.

...read moreread less

Journal Article•DOI•

Global optimization based on active preference learning with radial basis functions

[...]

Alberto Bemporad¹, Dario Piga²•Institutions (2)

IMT Institute for Advanced Studies Lucca¹, Dalle Molle Institute for Artificial Intelligence Research²

01 Feb 2021-Machine Learning

TL;DR: In this paper, a radial-basis function surrogate is fit via linear or quadratic programming, satisfying if possible the preferences expressed by the decision maker on existing samples, and the surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse weighting distance function to balance between exploitation of surrogate and exploration of the space, or maximize a function related to the probability that the new candidate will be preferred.

...read moreread less

Abstract: This paper proposes a method for solving optimization problems in which the decision-maker cannot evaluate the objective function, but rather can only express a preference such as “this is better than that” between two candidate decision vectors. The algorithm described in this paper aims at reaching the global optimizer by iteratively proposing the decision maker a new comparison to make, based on actively learning a surrogate of the latent (unknown and perhaps unquantifiable) objective function from past sampled decision vectors and pairwise preferences. A radial-basis function surrogate is fit via linear or quadratic programming, satisfying if possible the preferences expressed by the decision maker on existing samples. The surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse weighting distance function to balance between exploitation of the surrogate and exploration of the decision space, or maximize a function related to the probability that the new candidate will be preferred. Compared to active preference learning based on Bayesian optimization, we show that our approach is competitive in that, within the same number of comparisons, it usually approaches the global optimum more closely and is computationally lighter. Applications of the proposed algorithm to solve a set of benchmark global optimization problems, for multi-objective optimization, and for optimal tuning of a cost-sensitive neural network classifier for object recognition from images are described in the paper. MATLAB and a Python implementations of the algorithms described in the paper are available at http://cse.lab.imtlucca.it/~bemporad/glis .

...read moreread less

Journal Article•DOI•

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

[...]

Amarildo Likmeta¹, Amarildo Likmeta², Alberto Maria Metelli¹, Giorgia Ramponi¹, Andrea Tirinzoni¹, Matteo Giuliani¹, Marcello Restelli¹ - Show less +3 more•Institutions (2)

Polytechnic University of Milan¹, University of Bologna²

14 Mar 2021-Machine Learning

TL;DR: This paper discusses how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications and resorts to a class of truly batch model-free IRL algorithms.

...read moreread less

Abstract: In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

...read moreread less

Journal Article•DOI•

A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping

[...]

Benjamin Lucas¹, Charlotte Pelletier², Daniel F. Schmidt¹, Geoffrey I. Webb¹, François Petitjean¹ - Show less +1 more•Institutions (2)

Monash University¹, University of Southern Brittany²

04 Mar 2021-Machine Learning

TL;DR: Sourcerer, a Bayesian-inspired, deep learning-based, semi-supervised DA technique for producing land cover maps from SITS data, is presented and it is shown that on two different source-target domain pairings Sourcerer outperforms all other methods for any quantity of labelled target data available.

...read moreread less

Abstract: Land cover maps are a vital input variable to many types of environmental research and management. While they can be produced automatically by machine learning techniques, these techniques require substantial training data to achieve high levels of accuracy, which are not always available. One technique researchers use when labelled training data are scarce is domain adaptation (DA)—where data from an alternate region, known as the source domain, are used to train a classifier and this model is adapted to map the study region, or target domain. The scenario we address in this paper is known as semi-supervised DA, where some labelled samples are available in the target domain. In this paper we present Sourcerer, a Bayesian-inspired, deep learning-based, semi-supervised DA technique for producing land cover maps from satellite image time series (SITS) data. The technique takes a convolutional neural network trained on a source domain and then trains further on the available target domain with a novel regularizer applied to the model weights. The regularizer adjusts the degree to which the model is modified to fit the target data, limiting the degree of change when the target data are few in number and increasing it as target data quantity increases. Our experiments on Sentinel-2 time series images compare Sourcerer with two state-of-the-art semi-supervised domain adaptation techniques and four baseline models. We show that on two different source-target domain pairings Sourcerer outperforms all other methods for any quantity of labelled target data available. In fact, the results on the more difficult target domain show that the starting accuracy of Sourcerer (when no labelled target data are available), 74.2%, is greater than the next-best state-of-the-art method trained on 20,000 labelled target instances.

...read moreread less

Collapse