
Showing papers on "Generalization published in 2015"


Proceedings ArticleDOI
28 Feb 2015
TL;DR: The authors introduced the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies, which outperformed all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
Abstract: A Long Short-Term Memory (LSTM) network is a type of recurrent neural network architecture which has recently obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
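
The Child-Sum variant of the Tree-LSTM sums the children's hidden states in place of a chain LSTM's single previous state, while keeping one forget gate per child. A minimal numpy sketch of one cell update, with the dimension handling and parameter layout as illustrative assumptions rather than the authors' released code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_tree_lstm_cell(x, child_h, child_c, W, U, b):
    """One Child-Sum Tree-LSTM update.

    x        : (d_in,) input vector at this node
    child_h  : (k, d)  hidden states of the k children (k may be 0)
    child_c  : (k, d)  memory cells of the k children
    W, U, b  : dicts of parameters for the gates 'i', 'f', 'o', 'u'
    """
    h_tilde = child_h.sum(axis=0) if len(child_h) else np.zeros(b['i'].shape)

    i = sigmoid(W['i'] @ x + U['i'] @ h_tilde + b['i'])   # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_tilde + b['o'])   # output gate
    u = np.tanh(W['u'] @ x + U['u'] @ h_tilde + b['u'])   # candidate update

    # one forget gate per child, conditioned on that child's own hidden state
    f = np.array([sigmoid(W['f'] @ x + U['f'] @ h_k + b['f'])
                  for h_k in child_h])

    c = i * u + (f * child_c).sum(axis=0) if len(child_c) else i * u
    h = o * np.tanh(c)
    return h, c
```

With zero children this reduces to a leaf update, and with exactly one child per node it recovers the standard chain LSTM.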

2,702 citations


Proceedings ArticleDOI
25 Jun 2015
TL;DR: It is argued that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
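
For orientation, the information bottleneck tradeoff referenced here seeks, for a representation $T$ of the input $X$ with label $Y$, a minimizer of the IB functional (standard formulation from the IB literature, not quoted from this paper):

$$\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y),$$

where the Lagrange multiplier $\beta$ sets the balance between compressing $X$ and preserving information about $Y$; the bifurcation points mentioned in the abstract occur as $\beta$ varies along the information curve.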

1,187 citations


Journal ArticleDOI
06 Jul 2015
TL;DR: Experimental results show that the proposed approach allows learning class-specific shape descriptors significantly outperforming recent state-of-the-art methods on standard benchmarks.
Abstract: In this paper, we propose a generalization of convolutional neural networks (CNN) to non-Euclidean domains for the analysis of deformable shapes. Our construction is based on localized frequency analysis (a generalization of the windowed Fourier transform to manifolds) that is used to extract the local behavior of some dense intrinsic descriptor, roughly acting as an analogy to patches in images. The resulting local frequency representations are then passed through a bank of filters whose coefficients are determined by a learning procedure minimizing a task-specific cost. Our approach generalizes several previous methods such as HKS, WKS, spectral CNN, and GPS embeddings. Experimental results show that the proposed approach allows learning class-specific shape descriptors significantly outperforming recent state-of-the-art methods on standard benchmarks.

244 citations


Journal ArticleDOI
TL;DR: A comprehensive feasibility analysis of ELM is conducted, revealing that there also exist some activation functions that make the corresponding ELM degrade the generalization capability.
Abstract: An extreme learning machine (ELM) is a feedforward neural network (FNN)-like learning system whose connections with output neurons are adjustable, while the connections with and within hidden neurons are randomly fixed. Numerous applications have demonstrated the feasibility and high efficiency of ELM-like systems. It has, however, remained open whether this is true for general applications. In this two-part paper, we conduct a comprehensive feasibility analysis of ELM. In Part I, we provide an answer to the question by theoretically justifying the following: 1) for some suitable activation functions, such as polynomials, Nadaraya-Watson and sigmoid functions, the ELM-like systems can attain the theoretical generalization bound of the FNNs with all connections adjusted, i.e., they do not degrade the generalization capability of the FNNs even when the connections with and within hidden neurons are randomly fixed; 2) the number of hidden neurons needed for an ELM-like system to achieve the theoretical bound can be estimated; and 3) whenever the activation function is taken as a polynomial, the deduced hidden layer output matrix is of full column rank, therefore the generalized inverse technique can be efficiently applied to yield the solution of an ELM-like system, and, furthermore, for the nonpolynomial case, Tikhonov regularization can be applied to guarantee weak regularity while not sacrificing the generalization capability. In Part II, however, we reveal a different aspect of the feasibility of ELM: there also exist some activation functions that make the corresponding ELM degrade the generalization capability. The obtained results underlie the feasibility and efficiency of ELM-like systems, and yield various generalizations and improvements of the systems as well.
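
The system being analyzed is simple to write down: hidden-layer weights are drawn at random and frozen, and only the output weights are fitted, via the generalized (pseudo-) inverse in the full-column-rank case or Tikhonov regularization otherwise. A minimal numpy sketch, with the sigmoid activation and all names as illustrative assumptions:

```python
import numpy as np

def elm_fit(X, y, n_hidden, ridge=0.0, seed=0):
    """Fit an ELM-like system: random fixed hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random, then frozen
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden layer output matrix

    if ridge == 0.0:
        beta = np.linalg.pinv(H) @ y                  # generalized inverse
    else:                                             # Tikhonov regularization
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Only `beta` is learned; the paper's question is when this restriction does, or does not, cost generalization performance relative to fully trained FNNs.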

185 citations


Book ChapterDOI
07 Oct 2015
TL;DR: In this paper, the authors propose to verify the potential of the DeCAF features when facing the dataset bias problem and conduct a series of analyses looking at how existing datasets differ among each other and verifying the performance of existing debiasing methods under different representations.
Abstract: The presence of a bias in each image data collection has recently attracted a lot of attention in the computer vision community showing the limits in generalization of any learning method trained on a specific dataset. At the same time, with the rapid development of deep learning architectures, the activation values of Convolutional Neural Networks (CNN) are emerging as reliable and robust image descriptors. In this paper we propose to verify the potential of the DeCAF features when facing the dataset bias problem. We conduct a series of analyses looking at how existing datasets differ among each other and verifying the performance of existing debiasing methods under different representations. We learn important lessons on which part of the dataset bias problem can be considered solved and which open questions still need to be tackled.

185 citations


Journal ArticleDOI
TL;DR: A generalization of the Hukuhara difference for closed intervals on the real line is used to develop a theory of the fractional calculus for interval-valued functions.
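
In the interval-valued calculus literature, the generalization referred to here is usually the generalized Hukuhara (gH) difference; stated for orientation from the standard interval-analysis definition, not from this paper's text: for closed intervals $A = [a^-, a^+]$ and $B = [b^-, b^+]$,

$$A \ominus_{gH} B = \left[\min(a^- - b^-,\; a^+ - b^+),\; \max(a^- - b^-,\; a^+ - b^+)\right],$$

which, unlike the classical Hukuhara difference, exists for every pair of closed intervals.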

166 citations


Posted Content
24 Mar 2015
TL;DR: In this article, a new analogue of Bernstein operators, called (p, q)-Bernstein operators, is introduced as a generalization of the q-Bernstein operators; approximation properties based on Korovkin's type approximation theorem are also studied.
Abstract: In this paper, we introduce a new analogue of Bernstein operators, which we call (p, q)-Bernstein operators, as a generalization of the q-Bernstein operators. We also study approximation properties of (p, q)-Bernstein operators based on Korovkin's type approximation theorem and establish some direct theorems. Furthermore, we show comparisons and some illustrative graphics for the convergence of the operators to a function.
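
For orientation, the classical Bernstein operator that both the q- and (p, q)-variants extend is (standard definition, not quoted from this paper):

$$B_n(f; x) = \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} f\!\left(\frac{k}{n}\right), \qquad x \in [0, 1];$$

the (p, q)-operators replace the binomial coefficients and powers by their (p, q)-integer analogues, recovering the q-Bernstein operators at $p = 1$ and the classical operator at $p = q = 1$.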

163 citations


Journal ArticleDOI
TL;DR: These interpretations of the driven process generalize and unify many previous results on maximum entropy approaches to nonequilibrium systems, spectral characterizations of positive operators, and control approaches to large deviation theory and lead to new methods for analytically or numerically approximating large deviation functions.
Abstract: We have shown recently that a Markov process conditioned on rare events involving time-integrated random variables can be described in the long-time limit by an effective Markov process, called the driven process, which is given mathematically by a generalization of Doob's $h$-transform. We show here that this driven process can be represented in two other ways: first, as a process satisfying various variational principles involving large deviation functions and relative entropies and, second, as an optimal stochastic control process minimizing a cost function also related to large deviation functions. These interpretations of the driven process generalize and unify many previous results on maximum entropy approaches to nonequilibrium systems, spectral characterizations of positive operators, and control approaches to large deviation theory. They also lead, as briefly discussed, to new methods for analytically or numerically approximating large deviation functions.

157 citations


Journal ArticleDOI
TL;DR: The goal of this paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference.
Abstract: Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Here we focus on the family of Gibbs-type priors, a recent elegant generalization of the Dirichlet and the Pitman–Yor process priors. These random probability measures share properties that are appealing both from a theoretical and an applied point of view: (i) they admit an intuitive predictive characterization justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman–Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference. We deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependent versions. Applications, mainly concerning mixture models and species sampling, serve to convey the main ideas. The intuition inherent to this class of priors and the neat results they lead to make one wonder whether it actually represents the most natural generalization of the Dirichlet process.
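
The predictive characterization in (i) is most familiar in the Pitman–Yor special case; for orientation (standard result, not quoted from this paper), after $n$ observations containing $k$ distinct values, where the $j$-th distinct value has multiplicity $n_j$, and with parameters $0 \le \sigma < 1$, $\theta > -\sigma$:

$$P(\text{new value}) = \frac{\theta + k\sigma}{\theta + n}, \qquad P(\text{$j$-th observed value}) = \frac{n_j - \sigma}{\theta + n}.$$

Gibbs-type priors retain the structural feature that the probability of sampling a new value depends on the data only through $n$ and $k$.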

150 citations


Posted Content
TL;DR: In this paper, a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set is presented.
Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in (Dwork et al., 2014), where we focused on the problem of estimating expectations of adaptively chosen functions. In this paper, we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set. Reusing a holdout set adaptively multiple times can easily lead to overfitting to the holdout set itself. We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting. We illustrate the advantages of our algorithm over the standard use of the holdout set via a simple synthetic experiment. We also formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in (Dwork et al., 2014) is applicable much more broadly to adaptive data analysis. We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce.
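
The method, called Thresholdout in the paper, answers each query with the training-set estimate unless it disagrees noticeably with the holdout estimate, and only then releases a noised holdout value. A simplified sketch; the threshold and noise scales below are illustrative placeholders rather than the paper's calibrated parameters, and the full algorithm additionally tracks a budget of overfitting events:

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, rng=None):
    """Answer one statistical query phi given its per-record values.

    train_vals, holdout_vals : arrays of phi evaluated on each data point.
    Returns an estimate of E[phi] intended to stay valid under adaptive reuse.
    """
    rng = rng or np.random.default_rng()
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))
    # noisy comparison: consult the holdout only when train and holdout disagree
    if abs(train_mean - holdout_mean) > threshold + rng.laplace(0.0, sigma):
        return holdout_mean + rng.laplace(0.0, sigma)  # noised holdout answer
    return train_mean                                   # holdout left untouched
```

Because most queries are answered from the training set alone, the holdout leaks little information per query, which is what allows many adaptively chosen hypotheses to be validated against it.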

148 citations


Proceedings Article
06 Jul 2015
TL;DR: This work poses causal inference as the problem of learning to classify probability distributions, and extends the ideas to infer causal relationships between more than two variables.
Abstract: We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i, l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data, we build a causal inference rule in two steps. First, we featurize each $S_i$ using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables.
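
One way to realize the two-step rule is to approximate the kernel mean embedding with random Fourier features and train a standard classifier on the resulting vectors. A sketch under these assumptions (random Fourier features for a Gaussian kernel, scikit-learn logistic regression standing in for the paper's classifier; all names illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mean_embedding(S, W, b):
    """Empirical mean embedding of a sample S of shape (m, 2) under
    random Fourier features approximating a Gaussian kernel."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(S @ W + b).mean(axis=0)

def fit_causal_classifier(samples, labels, n_features=200, gamma=1.0, seed=0):
    """samples: list of (m_i, 2) arrays of (x, y) pairs;
    labels: 1 if X -> Y, 0 if X <- Y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(2, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.stack([mean_embedding(S, W, b) for S in samples])
    clf = LogisticRegression(max_iter=1000).fit(Phi, labels)
    return clf, (W, b)
```

At test time an unlabeled sample is embedded with the same (W, b) and the classifier's prediction is read off as the inferred causal direction.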

Journal ArticleDOI
TL;DR: Multi-marginal optimal transport, a generalization of the classical two-marginal optimal transport problem of Monge and Kantorovich, is surveyed in this paper; it has attracted considerable attention over the past five years due to a wide variety of emerging applications.
Abstract: Over the past five years, multi-marginal optimal transport, a generalization of the well-known optimal transport problem of Monge and Kantorovich, has begun to attract considerable attention, due in part to a wide variety of emerging applications. Here, we survey this problem, addressing fundamental theoretical questions including the uniqueness and structure of solutions. The answers to these questions uncover a surprising divergence from the classical two marginal setting, and reflect a delicate dependence on the cost function, which we then illustrate with a series of examples. We go on to describe some applications of the multi-marginal optimal transport problem, focusing primarily on matching in economics and density functional theory in physics.
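
In symbols, given marginals $\mu_1, \dots, \mu_k$ and a cost function $c$, the multi-marginal Kantorovich problem minimizes over couplings $\gamma$ having those marginals (standard formulation, included for orientation):

$$\inf_{\gamma \in \Pi(\mu_1, \dots, \mu_k)} \int_{X_1 \times \cdots \times X_k} c(x_1, \dots, x_k)\, d\gamma(x_1, \dots, x_k),$$

which recovers the classical two-marginal Monge-Kantorovich problem when $k = 2$.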

Posted Content
TL;DR: In this paper, it is shown that any DNN can be quantified by the mutual information between the layers and the input and output variables, and this representation is used to calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds.
Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.

Journal ArticleDOI
TL;DR: The directional monotonicity of piecewise linear fusion functions is completely characterized; the results cover, among others, weighted arithmetic means, OWA operators, and the Choquet, Sugeno and Shilkret integrals.
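
For reference, directional monotonicity is usually defined as follows (standard definition from the fusion-function literature, not quoted from this paper): a fusion function $F : [0,1]^n \to [0,1]$ is $\vec{r}$-increasing for a direction $\vec{r} \neq \vec{0}$ if

$$F(\mathbf{x} + c\,\vec{r}) \ge F(\mathbf{x}) \quad \text{for all } \mathbf{x} \text{ and all } c > 0 \text{ with } \mathbf{x} + c\,\vec{r} \in [0,1]^n.$$

Standard monotonicity corresponds to being $\vec{e}_i$-increasing in every coordinate direction $\vec{e}_i$.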

Proceedings Article
07 Dec 2015
TL;DR: A simple and practical method for reusing a holdout set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set and it is shown that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings.
Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in [7], where we focused on the problem of estimating expectations of adaptively chosen functions. In this paper, we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set. Reusing a holdout set adaptively multiple times can easily lead to overfitting to the holdout set itself. We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting. We illustrate the advantages of our algorithm over the standard use of the holdout set via a simple synthetic experiment. We also formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in [7] is applicable much more broadly to adaptive data analysis. We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce. This, in particular, allows the preservation of statistical validity guarantees even when an analyst adaptively composes algorithms which have guarantees based on either of the two approaches.

Journal ArticleDOI
TL;DR: In this paper, a parametric extension of the h-principle for overtwisted contact structures on manifolds of all dimensions was established, which implies that any closed manifold admits a contact structure in any given homotopy class of almost contact structures.
Abstract: We establish a parametric extension h-principle for overtwisted contact structures on manifolds of all dimensions, which is the direct generalization of the 3-dimensional result from [12]. It implies, in particular, that any closed manifold admits a contact structure in any given homotopy class of almost contact structures.

Journal ArticleDOI
TL;DR: In this paper, the existence of large families of stable and unstable quasi-periodic solutions of the NLS with any number of independent frequencies was proved via a KAM algorithm.

Journal ArticleDOI
Shan Ye, Jun Ye
TL;DR: The concept of a single valued neutrosophic multiset (SVNM) is introduced as a generalization of an intuitionistic fuzzy multiset (IFM), together with some basic operational relations of SVNMs; the Dice similarity measure is proposed and applied to a medical diagnosis problem with SVNM information.
Abstract: This paper introduces the concept of a single valued neutrosophic multiset (SVNM) as a generalization of an intuitionistic fuzzy multiset (IFM) and some basic operational relations of SVNMs, and then proposes the Dice similarity measure and the weighted Dice similarity measure for SVNMs and investigates their properties. Finally, the Dice similarity measure is applied to a medical diagnosis problem with SVNM information. This diagnosis method can deal with the medical diagnosis problem with indeterminate and inconsistent information which cannot be handled by the diagnosis method based on IFMs.
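
A single valued neutrosophic element carries a truth, an indeterminacy, and a falsity degree. A sketch of a Dice-style similarity between two sequences of such elements, shown for illustration only; the paper's multiset version additionally handles repeated membership sequences, and its exact weighting may differ:

```python
def dice_similarity_svns(A, B):
    """Averaged Dice similarity between two single valued neutrosophic sets.

    A, B : equal-length lists of (truth, indeterminacy, falsity) triples,
           each component in [0, 1].
    """
    total = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(A, B):
        num = 2.0 * (ta * tb + ia * ib + fa * fb)
        den = (ta ** 2 + ia ** 2 + fa ** 2) + (tb ** 2 + ib ** 2 + fb ** 2)
        total += num / den if den else 1.0
    return total / len(A)

# e.g. comparing a patient's symptom profile against a disease prototype
patient = [(0.8, 0.2, 0.1), (0.5, 0.4, 0.3)]
disease = [(0.7, 0.1, 0.0), (0.6, 0.3, 0.2)]
print(dice_similarity_svns(patient, disease))
```

In the diagnosis setting, the disease prototype with the highest similarity to the patient's profile is the suggested diagnosis.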

Journal ArticleDOI
TL;DR: The experimental results showed that the SVM-FA approach achieves an improvement in predictive accuracy and generalization capability in comparison to the GP and ANN models for 1-day-ahead lake level forecasting.

Journal ArticleDOI
TL;DR: It is shown that the pairwise model and the analytic results can be generalized to an arbitrary distribution of the infectious times, using integro-differential equations, and this leads to a general expression for the final epidemic size.
Abstract: In this Letter, a generalization of pairwise models to non-Markovian epidemics on networks is presented. For the case of infectious periods of fixed length, the resulting pairwise model is a system of delay differential equations, which shows excellent agreement with results based on stochastic simulations. Furthermore, we analytically compute a new $R_0$-like threshold quantity and an analytical relation between this and the final epidemic size. Additionally, we show that the pairwise model and the analytic results can be generalized to an arbitrary distribution of the infectious times, using integro-differential equations, and this leads to a general expression for the final epidemic size. By showing the rigorous link between non-Markovian dynamics and pairwise delay differential equations, we provide the framework for a more systematic understanding of non-Markovian dynamics.

Journal ArticleDOI
TL;DR: It is shown how hierarchical Bayesian inference can be used to solve the reinforcement learning problem, and an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals is described.
Abstract: In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.

Proceedings Article
07 Dec 2015
TL;DR: This work considers a generalization of the submodular cover problem based on the concept of the diminishing return property on the integer lattice and devises a bicriteria approximation algorithm that is guaranteed to output a log-factor approximate solution satisfying the constraints with the desired accuracy.
Abstract: We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly O(n log(nr) log r), where n is the size of the ground set and r is the maximum value of a coordinate. The dependency on r is exponentially better than that of the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.
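
For orientation, the classical set-function special case of submodular cover is solved by Wolsey's greedy rule: repeatedly add the element with the largest marginal gain until the required value is reached. A sketch of that baseline (illustrative names; the paper's algorithm works on the integer lattice instead):

```python
def greedy_submodular_cover(ground_set, f, target):
    """Classical greedy for submodular cover: find a small S with f(S) >= target.

    f must be a monotone submodular set function with f(set()) == 0.
    """
    S = set()
    while f(S) < target:
        # pick the element with the largest marginal gain f(S + e) - f(S)
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) == f(S):      # no further progress possible
            break
        S.add(best)
    return S

# e.g. set cover: f counts how many universe items the chosen sets cover
sets = {'a': {1, 2}, 'b': {2, 3}, 'c': {3, 4, 5}}

def f(S):
    covered = set()
    for e in S:
        covered |= sets[e]
    return len(covered)

print(greedy_submodular_cover(set(sets), f, target=5))  # e.g. {'c', 'a'}
```

On the integer lattice, elements can be picked multiple times up to a per-coordinate cap r, which is where the paper's improved dependency on r comes in.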

Journal ArticleDOI
TL;DR: In this paper, an uncertain random process is proposed as a generalization of both the stochastic process and the uncertain process, and some special types of uncertain random processes such as the stationary increment process and the renewal process are discussed.
Abstract: To deal with a system with both randomness and uncertainty, chance theory has been built and an uncertain random variable has been proposed as a generalization of random variable and uncertain variable. Correspondingly, as a generalization of both the stochastic process and the uncertain process, this paper will propose an uncertain random process. In addition, some special types of uncertain random processes such as stationary increment process and renewal process will also be discussed.

Journal ArticleDOI
06 Jan 2015
TL;DR: A generalization of the graph model for conflict resolution is presented, introducing the possibility of decision makers (DMs) expressing their preferences among the possible scenarios through probabilistic preferences.
Abstract: We present a generalization of the graph model for conflict resolution, introducing the possibility of decision makers (DMs) expressing their preferences among the possible scenarios through probabilistic preferences. In this new scenario, four stability definitions (solution concepts) are proposed: 1) $\boldsymbol{\alpha}$-Nash stability; 2) $(\boldsymbol{\alpha}, \boldsymbol{\beta})$-metarationality; 3) $(\boldsymbol{\alpha}, \boldsymbol{\beta})$-symmetric metarationality; and 4) $(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma})$-sequential stability. We deal with conflicts that involve two or more DMs. Relationships between these definitions are demonstrated, and an analysis of how the values of the parameters $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$ influence the set of stable states is provided. Applications of the proposed model to conflicts involving two and three DMs are presented. The analysis of these applications highlights the advantages gained by allowing individuals to express their preferences probabilistically.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper builds upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples in the source domain each time, and introduces a new regularizer to minimize the mismatch between any two representation matrices on different views.
Abstract: In this paper, we propose a new multi-view domain generalization (MVDG) approach for visual recognition, in which we aim to use the source domain samples with multiple types of features (i.e., multi-view features) to learn robust classifiers that can generalize well to any unseen target domain. Considering the recent works show the domain generalization capability can be enhanced by fusing multiple SVM classifiers, we build upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples in the source domain each time. When the source domain samples come from multiple latent domains, we expect the weight vectors of exemplar SVM classifiers can be organized into multiple hidden clusters. To exploit such cluster structure, we organize the weight vectors learnt on each view as a weight matrix and seek the low-rank representation by reconstructing this weight matrix using itself as the dictionary. To enforce the consistency of inherent cluster structures discovered from the weight matrices learnt on different views, we introduce a new regularizer to minimize the mismatch between any two representation matrices on different views. We also develop an efficient alternating optimization algorithm and further extend our MVDG approach for domain adaptation by exploiting the manifold structure of unlabeled target domain samples. Comprehensive experiments for visual recognition clearly demonstrate the effectiveness of our approaches for domain generalization and domain adaptation.

Proceedings Article
07 Dec 2015
TL;DR: A PAC-Bayesian theorem is proved that can be seen as a direct generalization of the analogous previous result for the i.i.d. case, and it is proposed to learn an inductive bias in the form of a transfer procedure.
Abstract: In this work we aim at extending the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that learning tasks are sampled i.i.d. from a task environment or limited to strongly constrained data distributions. Instead, we study two scenarios in which lifelong learning is possible, even though the observed tasks do not form an i.i.d. sample: first, when they are sampled from the same environment, but possibly with dependencies, and second, when the task environment is allowed to change over time in a consistent way. In the first case we prove a PAC-Bayesian theorem that can be seen as a direct generalization of the analogous previous result for the i.i.d. case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.

Journal ArticleDOI
TL;DR: This work proposes a new family of message passing techniques for MAP estimation in graphical models whose derivation is simpler than the original derivation of TRW-S and does not involve a decomposition into trees, which allows easy generalizations.
Abstract: We propose a new family of message passing techniques for MAP estimation in graphical models which we call Sequential Reweighted Message Passing (SRMP). Special cases include well-known techniques such as Min-Sum Diffusion (MSD) and a faster Sequential Tree-Reweighted Message Passing (TRW-S). Importantly, our derivation is simpler than the original derivation of TRW-S, and does not involve a decomposition into trees. This allows easy generalizations. The new family of algorithms can be viewed as a generalization of TRW-S from pairwise to higher-order graphical models. We test SRMP on several real-world problems with promising results.


Journal ArticleDOI
TL;DR: This paper presents a new approach to account for reliability of information within the framework of LP, using models with interval, fuzzy, generalized fuzzy, and random numbers.
Abstract: Linear programming (LP) is an operations research technique frequently used in the fields of science, economics, business, management science, and engineering. Although it has been investigated and applied for more than six decades, and LP models with different levels of generalization of information about parameters, including models with interval, fuzzy, generalized fuzzy, and random numbers, have been considered, until now there has been no approach to account for reliability of information within the framework of LP.

Journal ArticleDOI
10 Apr 2015-Entropy
TL;DR: Deep belief network (DBN)-based approaches for link prediction are proposed, including an unsupervised link prediction model, a feature representation method, and a DBN-based link prediction method, which can predict the values of the links with high performance and have a good generalization ability across datasets.
Abstract: In some online social network services (SNSs), the members are allowed to label their relationships with others, and such relationships can be represented as links with signed values (positive or negative). The networks containing such relations are named signed social networks (SSNs), and some real-world complex systems can also be modeled with SSNs. Given the information of the observed structure of an SSN, link prediction aims to estimate the values of the unobserved links. Most of the previous approaches for link prediction are based on the members' similarity and supervised learning; however, research on the hidden principles that drive the behaviors of social members is rarely conducted. In this paper, deep belief network (DBN)-based approaches for link prediction are proposed, including an unsupervised link prediction model, a feature representation method, and a DBN-based link prediction method. The experiments are done on datasets from three SNSs in different domains, and the results show that our methods can predict the values of the links with high performance and have a good generalization ability across these datasets.