Journal Article

A survey of multi-view machine learning

17 Feb 2013, Neural Computing and Applications (Springer London), Vol. 23, Iss. 7, pp. 2031-2038
TL;DR: This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the fashions in which multiple views are exploited.
Abstract: Multi-view learning, or learning with multiple distinct feature sets, is a rapidly growing direction in machine learning with solid theoretical underpinnings and great practical success. This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the ways in which multiple views are exploited. This survey aims to provide an insightful organization of current developments in the field of multi-view learning, identify their limitations, and give suggestions for further research. One feature of this survey is that we attempt to point out specific open problems that we hope will help promote research on multi-view machine learning.

Citations
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article
TL;DR: In this paper, a taxonomy of recent contributions related to explainability of different machine learning models, including those aimed at explaining Deep Learning methods, is presented, and a second dedicated taxonomy is built and examined in detail.

2,827 citations

Journal Article
01 Jan 2021
TL;DR: Transfer learning aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains; in this way, the dependence on large amounts of target-domain data for constructing target learners can be reduced.
Abstract: Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target-domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research studies, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey article reviews more than 40 representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over 20 representative transfer learning models are used for experiments. The models are run on three different data sets, namely Amazon Reviews, Reuters-21578, and Office-31, and the experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.

2,433 citations

Posted Content
TL;DR: Previous efforts to define explainability in Machine Learning are summarized, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought, and a taxonomy of recent contributions related to the explainability of different Machine Learning models is proposed.
Abstract: In recent years, Artificial Intelligence (AI) has gained notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that was not present in the last hype of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods, for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to XAI with reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.

1,602 citations


Cites background from "A survey of multi-view machine lear..."

  • ...• Finally, Multiview Learning [417] constructs different views of the object as per the information contained in the different data sources ( Fig....

  • ...• Finally, Multiview Learning [396] constructs different views of the object as per the information contained in the different data sources (Figure 13....

  • ...Multiview Learning devises strategies to jointly optimize ML models learned from the aforementioned views to enhance the generalization performance, specially in those applications with weak data supervision and hence, prone to model overfitting....

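The snippets above describe jointly optimizing models learned from different views. One way to do this is co-regularization, the style category the survey refers to. Below is a minimal sketch of it. It assumes a one-dimensional linear predictor per view, squared loss on the labeled examples, and a squared disagreement penalty on the unlabeled examples weighted by a hypothetical parameter `lam`. With these assumptions the joint objective has a closed-form solution. The function name and objective are illustrative choices, not the specific method of the cited works.

```python
def co_regularized_fit(x1_lab, x2_lab, y, x1_unl, x2_unl, lam):
    """Jointly fit f1(x1) = a * x1 and f2(x2) = b * x2 (hypothetical sketch).

    Minimizes
        sum_L (a*x1 - y)**2 + sum_L (b*x2 - y)**2 + lam * sum_U (a*x1 - b*x2)**2
    where L is the labeled set and U the unlabeled set. Setting both partial
    derivatives to zero yields a 2x2 linear system, solved by Cramer's rule.
    """
    s11 = sum(u * u for u in x1_lab)
    s22 = sum(v * v for v in x2_lab)
    s1y = sum(u * t for u, t in zip(x1_lab, y))
    s2y = sum(v * t for v, t in zip(x2_lab, y))
    u11 = sum(u * u for u in x1_unl)
    u22 = sum(v * v for v in x2_unl)
    u12 = sum(u * v for u, v in zip(x1_unl, x2_unl))

    # Normal equations:
    #   (s11 + lam*u11) * a - lam*u12 * b = s1y
    #   -lam*u12 * a + (s22 + lam*u22) * b = s2y
    m11, m12 = s11 + lam * u11, -lam * u12
    m21, m22 = -lam * u12, s22 + lam * u22
    det = m11 * m22 - m12 * m21
    if det == 0:
        raise ValueError("singular system: need non-degenerate labeled data")
    a = (s1y * m22 - m12 * s2y) / det
    b = (m11 * s2y - m21 * s1y) / det
    return a, b


# One labeled point on which both views agree, and one unlabeled point on
# which the independently fitted models (lam = 0) disagree.
for lam in (0.0, 1.0):
    a, b = co_regularized_fit([1.0], [1.0], [1.0], [1.0], [2.0], lam)
    print(lam, a, b, "disagreement:", a * 1.0 - b * 2.0)
```

With `lam = 0` the two views are fitted independently. A positive `lam` gives up a little fit on the labeled data in exchange for agreement on the unlabeled point, so the printed disagreement shrinks.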

Journal Article
TL;DR: This overview reviews theoretical underpinnings of multi-view learning and attempts to identify promising avenues and point out some specific challenges which can hopefully promote further research in this rapidly developing field.

679 citations

References
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

13,246 citations


"A survey of multi-view machine lear..." refers methods in this paper

  • ...Loosely speaking, a concept class C is PAC-learnable by a learner L using a hypothesis space H if, for any target concept in C, L will with probability at least $1 - \delta$ output a hypothesis whose error is less than or equal to $\epsilon$, after training with a reasonable number of examples and performing a reasonable amount of computation [27]....

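The PAC definition above leaves "a reasonable number of examples" informal. Let $\epsilon$ bound the error and let $1 - \delta$ be the required confidence. Take the simplest textbook case: a learner that always outputs a hypothesis consistent with the training data, a finite hypothesis space H, and a target concept contained in H. In that case a standard sufficient sample size is $m \ge (\ln|H| + \ln(1/\delta))/\epsilon$. The sketch below simply evaluates that bound under this finite, realizable assumption. It is not necessarily the analysis used in the survey or in [27], and the function name is hypothetical.

```python
import math


def pac_sample_size(h_size: int, epsilon: float, delta: float) -> int:
    """Sufficient number of examples for a consistent learner over a finite
    hypothesis space H, assuming the target concept lies in H:
        m >= (ln|H| + ln(1/delta)) / epsilon
    """
    if h_size < 1:
        raise ValueError("the hypothesis space must be non-empty")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)


# 2**10 hypotheses; error at most 0.1 with probability at least 0.95.
print(pac_sample_size(2 ** 10, epsilon=0.1, delta=0.05))  # 100
```

The bound grows only logarithmically in |H| and 1/δ but linearly in 1/ε. This is why asking for a smaller error is usually the expensive part.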

Book Chapter
TL;DR: Concepts of correlation and regression may be applied not only to ordinary one-dimensional variates but also to variates of two or more dimensions; for example, where usually only the correlation of the horizontal components of marksmen's deviations is discussed, the complex of horizontal and vertical deviations may be even more interesting.
Abstract: Concepts of correlation and regression may be applied not only to ordinary one-dimensional variates but also to variates of two or more dimensions. Marksmen side by side firing simultaneous shots at targets, so that the deviations are in part due to independent individual errors and in part to common causes such as wind, provide a familiar introduction to the theory of correlation; but only the correlation of the horizontal components is ordinarily discussed, whereas the complex consisting of horizontal and vertical deviations may be even more interesting. The wind at two places may be compared, using both components of the velocity in each place. A fluctuating vector is thus matched at each moment with another fluctuating vector. The study of individual differences in mental and physical traits calls for a detailed study of the relations between sets of correlated variates. For example the scores on a number of mental tests may be compared with physical measurements on the same persons. The questions then arise of determining the number and nature of the independent relations of mind and body shown by these data to exist, and of extracting from the multiplicity of correlations in the system suitable characterizations of these independent relations. As another example, the inheritance of intelligence in rats might be studied by applying not one but s different mental tests to N mothers and to a daughter of each

6,122 citations


"A survey of multi-view machine lear..." refers background or methods in this paper

  • ...Canonical correlation analysis, first proposed by Hotelling [21], works on a paired data set (e....

  • ...Note that, CCA [21] and Bayesian co-training [54] also belong to the co-regularization style category....

  • ...Canonical correlation analysis is an early and classical method for multi-view dimensionality reduction by learning subspaces jointly from different views [21]....

  • ...Canonical correlation analysis (CCA) [21] and co-training [8] are two representative techniques in early studies of multi-view learning....

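The snippets above describe CCA as learning subspaces jointly from two paired views. Below is a minimal pure-Python sketch of the first canonical pair. It assumes the standard formulation: $C_{xx}$ and $C_{yy}$ are the within-view covariance matrices, $C_{xy} = C_{yx}^\top$ is the cross-covariance, and the squared first canonical correlation $\rho^2$ is the largest eigenvalue of $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}$. The small ridge term `reg`, the power-iteration solver, and all function names are illustrative choices. They are not details taken from Hotelling [21] or from the survey.

```python
import math


def _center(rows):
    # Subtract column means so covariances can be formed directly.
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def _cov(a, b):
    # Sample cross-covariance between the columns of a and the columns of b.
    n = len(a)
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) / (n - 1)
             for j in range(len(b[0]))] for i in range(len(a[0]))]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def _inverse(a):
    # Gauss-Jordan elimination with partial pivoting (small dense matrices).
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if p == 0:
            raise ValueError("singular covariance matrix; increase reg")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def first_canonical_pair(x_view, y_view, reg=1e-8, iters=500):
    """Return (rho, w_x, w_y) for paired views given as lists of rows."""
    x, y = _center(x_view), _center(y_view)
    cxx, cyy, cxy = _cov(x, x), _cov(y, y), _cov(x, y)
    for c in (cxx, cyy):
        for i in range(len(c)):
            c[i][i] += reg  # ridge term keeps the inverses well defined
    cyx = [list(col) for col in zip(*cxy)]
    cyy_inv_cyx = _matmul(_inverse(cyy), cyx)
    m = _matmul(_matmul(_inverse(cxx), cxy), cyy_inv_cyx)
    # Power iteration for the dominant eigenvector of M (eigenvalue rho^2).
    w = [[1.0] for _ in range(len(m))]
    for _ in range(iters):
        z = _matmul(m, w)
        norm = math.sqrt(sum(r[0] ** 2 for r in z))
        if norm == 0:
            raise ValueError("views appear uncorrelated")
        w = [[r[0] / norm] for r in z]
    w_x = [r[0] for r in w]
    w_y = [r[0] for r in _matmul(cyy_inv_cyx, w)]  # w_y ~ Cyy^-1 Cyx w_x
    proj_x = [sum(a * b for a, b in zip(row, w_x)) for row in x]
    proj_y = [sum(a * b for a, b in zip(row, w_y)) for row in y]
    return _pearson(proj_x, proj_y), w_x, w_y


x_view = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 5.0]]
y_view = [[r[0] + 2.0 * r[1]] for r in x_view]  # second view: linear in the first
rho, w_x, w_y = first_canonical_pair(x_view, y_view)
print(round(rho, 6))
```

For anything beyond toy sizes one would normally call a linear-algebra library instead of hand-rolled Gauss-Jordan elimination. The pure-Python version is kept here only so the sketch has no dependencies.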

Book
01 Jan 2004
TL;DR: This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.
Abstract: Kernel methods provide a powerful and unified framework for pattern discovery, motivating algorithms that can act on general types of data (e.g. strings, vectors or text) and look for general types of relations (e.g. rankings, classifications, regressions, clusters). The application areas range from neural networks and pattern recognition to machine learning and data mining. This book, developed from lectures and tutorials, fulfils two major roles: firstly it provides practitioners with a large toolkit of algorithms, kernels and solutions ready to use for standard pattern discovery problems in fields such as bioinformatics, text analysis, image analysis. Secondly it provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

6,050 citations

Proceedings Article
24 Jul 1998
TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Abstract: We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm’s predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice.
This research was supported in part by the DARPA HPKB program under contract F30602-97-1-0215 and by NSF National Young Investigator grant CCR-9357793. COLT 98, Madison, WI, USA. Copyright ACM 1998. Tom Mitchell, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891, mitchell+@cs.cmu.edu

5,840 citations


"A survey of multi-view machine lear..." refers background or methods in this paper

  • ...The co-training style algorithms are inspired by the co-training algorithm [8], which essentially involve an iterative procedure to exploit different views....

  • ...2.2 Effectiveness of co-training The original co-training algorithm was introduced by Blum and Mitchell [8] for semi-supervised classification that combines both labeled and unlabeled data under a two-view setting....

  • ...The original co-training algorithm was introduced by Blum and Mitchell [8] for semi-supervised classification that combines both labeled and unlabeled data under a two-view setting....

  • ...To justify the effectiveness of co-training, Blum and Mitchell [8] gave a PAC-style analysis....

  • ...Canonical correlation analysis (CCA) [21] and co-training [8] are two representative techniques in early studies of multi-view learning....

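The co-training procedure in the Blum and Mitchell abstract trains one learner per view. Each learner's confident predictions on unlabeled examples then enlarge the training data used by the other. The sketch below illustrates this loop with several hypothetical simplifications: a nearest-centroid base learner on a single numeric feature per view, a margin-based confidence score, and the `rounds`/`per_round` parameters. The original algorithm is stated for general base learners and adds a fixed number of self-labeled positive and negative examples per round.

```python
def fit_centroids(xs, ys):
    """Hypothetical base learner: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def predict(centroids, x):
    """Return (label, confidence); confidence is the gap between the distances
    to the nearest and second-nearest class centroids."""
    dists = sorted((abs(x - m), c) for c, m in centroids.items())
    conf = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], conf


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((x1, x2), label); unlabeled: list of (x1, x2).

    In each round, the learner for each view labels its most confident
    unlabeled examples and adds them to the shared labeled set, so the
    other view's learner is retrained on them.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            ranked = sorted(pool, key=lambda x: predict(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                labeled.append((x, predict(model, x[view])[0]))
                pool.remove(x)
    models = tuple(
        fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
        for v in (0, 1))
    return models, labeled


seed = [((0.0, 10.0), 0), ((10.0, 0.0), 1)]
unlabeled = [(1.0, 9.0), (9.0, 1.0), (2.0, 8.0), (8.0, 2.0)]
models, grown = co_train(seed, unlabeled)
print(grown)
```

The shared labeled set mirrors the abstract's description. Examples labeled confidently by the view-1 learner become training data for the view-2 learner, and vice versa.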

Book Chapter
01 Mar 2003
TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.
Abstract: We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.

2,535 citations
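
To make the data-dependent complexity in the abstract above concrete, consider a finite function class F evaluated on a sample $x_1, \dots, x_n$. In one common convention, its empirical Rademacher complexity is $\hat{R}_S(F) = \mathbb{E}_{\sigma}\big[\sup_{f \in F} \tfrac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\big]$, with independent, uniformly random signs $\sigma_i \in \{-1, +1\}$. Conventions differ: some authors add a factor of 2 and take an absolute value inside the supremum. Treat the sketch below as illustrative only. It computes the expectation exactly by enumerating all $2^n$ sign vectors, which is feasible only for small n. The function name and input format are hypothetical.

```python
import itertools


def empirical_rademacher(function_values):
    """Exact empirical Rademacher complexity of a finite function class.

    function_values: one sequence per function f, holding (f(x_1), ..., f(x_n)).
    Averages sup_f (1/n) * sum_i sigma_i * f(x_i) over all 2**n sign vectors.
    """
    n = len(function_values[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        total += max(sum(s * v for s, v in zip(sigma, fv))
                     for fv in function_values) / n
    return total / 2 ** n


# A single fixed function cannot correlate with random signs on average.
print(empirical_rademacher([(1.0, 2.0)]))  # 0.0
# {f, -f} with f = (1, 1): the supremum becomes |sigma_1 + sigma_2| / 2.
print(empirical_rademacher([(1.0, 1.0), (-1.0, -1.0)]))  # 0.5
```

A class rich enough to match every sign pattern reaches the maximum value of 1, which is the sense in which this quantity measures how well a class can fit random noise on the given sample.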