
Showing papers on "Generalization published in 2015"


Proceedings ArticleDOI
28 Feb 2015
TL;DR: The authors introduced the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies, which outperformed all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
Abstract: A Long Short-Term Memory (LSTM) network is a type of recurrent neural network architecture which has recently obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
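
The Child-Sum variant of the Tree-LSTM sums the children's hidden states in place of a chain LSTM's single previous state, while keeping one forget gate per child. A minimal numpy sketch of one cell update, with the dimension handling and parameter layout as illustrative assumptions rather than the authors' released code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_tree_lstm_cell(x, child_h, child_c, W, U, b):
    """One Child-Sum Tree-LSTM update.

    x        : (d_in,) input vector at this node
    child_h  : (k, d)  hidden states of the k children (k may be 0)
    child_c  : (k, d)  memory cells of the k children
    W, U, b  : dicts of parameters for the gates 'i', 'f', 'o', 'u'
    """
    h_tilde = child_h.sum(axis=0) if len(child_h) else np.zeros(b['i'].shape)

    i = sigmoid(W['i'] @ x + U['i'] @ h_tilde + b['i'])   # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_tilde + b['o'])   # output gate
    u = np.tanh(W['u'] @ x + U['u'] @ h_tilde + b['u'])   # candidate update

    # one forget gate per child, conditioned on that child's own hidden state
    f = np.array([sigmoid(W['f'] @ x + U['f'] @ h_k + b['f'])
                  for h_k in child_h])

    c = i * u + (f * child_c).sum(axis=0) if len(child_c) else i * u
    h = o * np.tanh(c)
    return h, c
```

With zero children this reduces to a leaf update, and with exactly one child per node it recovers the standard chain LSTM.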

2,702 citations


Proceedings ArticleDOI
25 Jun 2015
TL;DR: It is argued that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
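
For orientation, the information bottleneck tradeoff referenced here seeks, for a representation $T$ of the input $X$ with label $Y$, a minimizer of the IB functional (standard formulation from the IB literature, not quoted from this paper):

$$\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y),$$

where the Lagrange multiplier $\beta$ sets the balance between compressing $X$ and preserving information about $Y$; the bifurcation points mentioned in the abstract occur as $\beta$ varies along the information curve.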

1,187 citations


Journal ArticleDOI
06 Jul 2015
TL;DR: Experimental results show that the proposed approach allows learning class-specific shape descriptors significantly outperforming recent state-of-the-art methods on standard benchmarks.
Abstract: In this paper, we propose a generalization of convolutional neural networks (CNN) to non-Euclidean domains for the analysis of deformable shapes. Our construction is based on localized frequency analysis (a generalization of the windowed Fourier transform to manifolds) that is used to extract the local behavior of some dense intrinsic descriptor, roughly acting as an analogy to patches in images. The resulting local frequency representations are then passed through a bank of filters whose coefficients are determined by a learning procedure minimizing a task-specific cost. Our approach generalizes several previous methods such as HKS, WKS, spectral CNN, and GPS embeddings. Experimental results show that the proposed approach allows learning class-specific shape descriptors significantly outperforming recent state-of-the-art methods on standard benchmarks.

244 citations


Journal ArticleDOI
TL;DR: A comprehensive feasibility analysis of ELM is conducted, revealing that there also exist some activation functions that make the corresponding ELM degrade the generalization capability.
Abstract: An extreme learning machine (ELM) is a feedforward neural network (FNN)-like learning system whose connections with output neurons are adjustable, while the connections with and within hidden neurons are randomly fixed. Numerous applications have demonstrated the feasibility and high efficiency of ELM-like systems. It has, however, remained open whether this is true for general applications. In this two-part paper, we conduct a comprehensive feasibility analysis of ELM. In Part I, we provide an answer to the question by theoretically justifying the following: 1) for some suitable activation functions, such as polynomials, Nadaraya-Watson and sigmoid functions, the ELM-like systems can attain the theoretical generalization bound of the FNNs with all connections adjusted, i.e., they do not degrade the generalization capability of the FNNs even when the connections with and within hidden neurons are randomly fixed; 2) the number of hidden neurons needed for an ELM-like system to achieve the theoretical bound can be estimated; and 3) whenever the activation function is taken as a polynomial, the deduced hidden layer output matrix is of full column rank, therefore the generalized inverse technique can be efficiently applied to yield the solution of an ELM-like system, and, furthermore, for the nonpolynomial case, Tikhonov regularization can be applied to guarantee weak regularity while not sacrificing the generalization capability. In Part II, however, we reveal a different aspect of the feasibility of ELM: there also exist some activation functions that make the corresponding ELM degrade the generalization capability. The obtained results underlie the feasibility and efficiency of ELM-like systems, and yield various generalizations and improvements of the systems as well.
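
The system being analyzed is simple to write down: hidden-layer weights are drawn at random and frozen, and only the output weights are fitted, via the generalized (pseudo-) inverse in the full-column-rank case or Tikhonov regularization otherwise. A minimal numpy sketch, with the sigmoid activation and all names as illustrative assumptions:

```python
import numpy as np

def elm_fit(X, y, n_hidden, ridge=0.0, seed=0):
    """Fit an ELM-like system: random fixed hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random, then frozen
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden layer output matrix

    if ridge == 0.0:
        beta = np.linalg.pinv(H) @ y                  # generalized inverse
    else:                                             # Tikhonov regularization
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Only `beta` is learned; the paper's question is when this restriction does, or does not, cost generalization performance relative to fully trained FNNs.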

185 citations


Book ChapterDOI
07 Oct 2015
TL;DR: In this paper, the authors propose to verify the potential of the DeCAF features when facing the dataset bias problem and conduct a series of analyses looking at how existing datasets differ among each other and verifying the performance of existing debiasing methods under different representations.
Abstract: The presence of a bias in each image data collection has recently attracted a lot of attention in the computer vision community showing the limits in generalization of any learning method trained on a specific dataset. At the same time, with the rapid development of deep learning architectures, the activation values of Convolutional Neural Networks (CNN) are emerging as reliable and robust image descriptors. In this paper we propose to verify the potential of the DeCAF features when facing the dataset bias problem. We conduct a series of analyses looking at how existing datasets differ among each other and verifying the performance of existing debiasing methods under different representations. We learn important lessons on which part of the dataset bias problem can be considered solved and which open questions still need to be tackled.

185 citations


Journal ArticleDOI
TL;DR: A generalization of the Hukuhara difference for closed intervals on the real line is used to develop a theory of the fractional calculus for interval-valued functions.
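
In the interval-valued calculus literature, the generalization referred to here is usually the generalized Hukuhara (gH) difference; stated for orientation from the standard interval-analysis definition, not from this paper's text: for closed intervals $A = [a^-, a^+]$ and $B = [b^-, b^+]$,

$$A \ominus_{gH} B = \left[\min(a^- - b^-,\; a^+ - b^+),\; \max(a^- - b^-,\; a^+ - b^+)\right],$$

which, unlike the classical Hukuhara difference, exists for every pair of closed intervals.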

166 citations


Posted Content
24 Mar 2015
TL;DR: In this article, a new analogue of Bernstein operators, called (p, q)-Bernstein operators, is introduced as a generalization of the q-Bernstein operators; approximation properties based on Korovkin's type approximation theorem are also studied.
Abstract: In this paper, we introduce a new analogue of Bernstein operators, which we call (p, q)-Bernstein operators, as a generalization of the q-Bernstein operators. We also study approximation properties of (p, q)-Bernstein operators based on Korovkin's type approximation theorem and establish some direct theorems. Furthermore, we show comparisons and some illustrative graphics for the convergence of the operators to a function.
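
For orientation, the classical Bernstein operator that both the q- and (p, q)-variants extend is (standard definition, not quoted from this paper):

$$B_n(f; x) = \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} f\!\left(\frac{k}{n}\right), \qquad x \in [0, 1];$$

the (p, q)-operators replace the binomial coefficients and powers by their (p, q)-integer analogues, recovering the q-Bernstein operators at $p = 1$ and the classical operator at $p = q = 1$.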

163 citations


Journal ArticleDOI
TL;DR: These interpretations of the driven process generalize and unify many previous results on maximum entropy approaches to nonequilibrium systems, spectral characterizations of positive operators, and control approaches to large deviation theory and lead to new methods for analytically or numerically approximating large deviation functions.
Abstract: We have shown recently that a Markov process conditioned on rare events involving time-integrated random variables can be described in the long-time limit by an effective Markov process, called the driven process, which is given mathematically by a generalization of Doob's $h$-transform. We show here that this driven process can be represented in two other ways: first, as a process satisfying various variational principles involving large deviation functions and relative entropies and, second, as an optimal stochastic control process minimizing a cost function also related to large deviation functions. These interpretations of the driven process generalize and unify many previous results on maximum entropy approaches to nonequilibrium systems, spectral characterizations of positive operators, and control approaches to large deviation theory. They also lead, as briefly discussed, to new methods for analytically or numerically approximating large deviation functions.

157 citations


Journal ArticleDOI
TL;DR: The goal of this paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference.
Abstract: Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Here we focus on the family of Gibbs-type priors, a recent elegant generalization of the Dirichlet and the Pitman–Yor process priors. These random probability measures share properties that are appealing both from a theoretical and an applied point of view: (i) they admit an intuitive predictive characterization justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman–Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference. We deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependent versions. Applications, mainly concerning mixture models and species sampling, serve to convey the main ideas. The intuition inherent to this class of priors and the neat results they lead to make one wonder whether it actually represents the most natural generalization of the Dirichlet process.
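
The predictive characterization in (i) is most familiar in the Pitman–Yor special case; for orientation (standard result, not quoted from this paper), after $n$ observations containing $k$ distinct values, where the $j$-th distinct value has multiplicity $n_j$, and with parameters $0 \le \sigma < 1$, $\theta > -\sigma$:

$$P(\text{new value}) = \frac{\theta + k\sigma}{\theta + n}, \qquad P(\text{$j$-th observed value}) = \frac{n_j - \sigma}{\theta + n}.$$

Gibbs-type priors retain the structural feature that the probability of sampling a new value depends on the data only through $n$ and $k$.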

150 citations


Posted Content
TL;DR: In this paper, a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set is presented.
Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in (Dwork et al., 2014), where we focused on the problem of estimating expectations of adaptively chosen functions. In this paper, we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set. Reusing a holdout set adaptively multiple times can easily lead to overfitting to the holdout set itself. We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting. We illustrate the advantages of our algorithm over the standard use of the holdout set via a simple synthetic experiment. We also formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in (Dwork et al., 2014) is applicable much more broadly to adaptive data analysis. We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce.
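
The method, called Thresholdout in the paper, answers each query with the training-set estimate unless it disagrees noticeably with the holdout estimate, and only then releases a noised holdout value. A simplified sketch; the threshold and noise scales below are illustrative placeholders rather than the paper's calibrated parameters, and the full algorithm additionally tracks a budget of overfitting events:

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, rng=None):
    """Answer one statistical query phi given its per-record values.

    train_vals, holdout_vals : arrays of phi evaluated on each data point.
    Returns an estimate of E[phi] intended to stay valid under adaptive reuse.
    """
    rng = rng or np.random.default_rng()
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))
    # noisy comparison: consult the holdout only when train and holdout disagree
    if abs(train_mean - holdout_mean) > threshold + rng.laplace(0.0, sigma):
        return holdout_mean + rng.laplace(0.0, sigma)  # noised holdout answer
    return train_mean                                   # holdout left untouched
```

Because most queries are answered from the training set alone, the holdout leaks little information per query, which is what allows many adaptively chosen hypotheses to be validated against it.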

148 citations


Proceedings Article
06 Jul 2015
TL;DR: This work poses causal inference as the problem of learning to classify probability distributions, and extends the ideas to infer causal relationships between more than two variables.
Abstract: We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i, l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data, we build a causal inference rule in two steps. First, we featurize each $S_i$ using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables.
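
One way to realize the two-step rule is to approximate the kernel mean embedding with random Fourier features and train a standard classifier on the resulting vectors. A sketch under these assumptions (random Fourier features for a Gaussian kernel, scikit-learn logistic regression standing in for the paper's classifier; all names illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mean_embedding(S, W, b):
    """Empirical mean embedding of a sample S of shape (m, 2) under
    random Fourier features approximating a Gaussian kernel."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(S @ W + b).mean(axis=0)

def fit_causal_classifier(samples, labels, n_features=200, gamma=1.0, seed=0):
    """samples: list of (m_i, 2) arrays of (x, y) pairs;
    labels: 1 if X -> Y, 0 if X <- Y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(2, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.stack([mean_embedding(S, W, b) for S in samples])
    clf = LogisticRegression(max_iter=1000).fit(Phi, labels)
    return clf, (W, b)
```

At test time an unlabeled sample is embedded with the same (W, b) and the classifier's prediction is read off as the inferred causal direction.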

Journal ArticleDOI
TL;DR: Multi-marginal optimal transport, a generalization of the classical two-marginal optimal transport problem of Monge and Kantorovich, is surveyed in this paper; it has attracted considerable attention over the past five years due to a wide variety of emerging applications.
Abstract: Over the past five years, multi-marginal optimal transport, a generalization of the well-known optimal transport problem of Monge and Kantorovich, has begun to attract considerable attention, due in part to a wide variety of emerging applications. Here, we survey this problem, addressing fundamental theoretical questions including the uniqueness and structure of solutions. The answers to these questions uncover a surprising divergence from the classical two marginal setting, and reflect a delicate dependence on the cost function, which we then illustrate with a series of examples. We go on to describe some applications of the multi-marginal optimal transport problem, focusing primarily on matching in economics and density functional theory in physics.
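
In symbols, given marginals $\mu_1, \dots, \mu_k$ and a cost function $c$, the multi-marginal Kantorovich problem minimizes over couplings $\gamma$ having those marginals (standard formulation, included for orientation):

$$\inf_{\gamma \in \Pi(\mu_1, \dots, \mu_k)} \int_{X_1 \times \cdots \times X_k} c(x_1, \dots, x_k)\, d\gamma(x_1, \dots, x_k),$$

which recovers the classical two-marginal Monge-Kantorovich problem when $k = 2$.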

Posted Content
TL;DR: In this paper, it is shown that any DNN can be quantified by the mutual information between the layers and the input and output variables, and this representation is used to calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds.
Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.

Journal ArticleDOI
TL;DR: The directional monotonicity of piecewise linear fusion functions is completely characterized; the results cover, among others, weighted arithmetic means, OWA operators, and the Choquet, Sugeno and Shilkret integrals.
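
For reference, directional monotonicity is usually defined as follows (standard definition from the fusion-function literature, not quoted from this paper): a fusion function $F : [0,1]^n \to [0,1]$ is $\vec{r}$-increasing for a direction $\vec{r} \neq \vec{0}$ if

$$F(\mathbf{x} + c\,\vec{r}) \ge F(\mathbf{x}) \quad \text{for all } \mathbf{x} \text{ and all } c > 0 \text{ with } \mathbf{x} + c\,\vec{r} \in [0,1]^n.$$

Standard monotonicity corresponds to being $\vec{e}_i$-increasing in every coordinate direction $\vec{e}_i$.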

Proceedings Article
07 Dec 2015
TL;DR: A simple and practical method for reusing a holdout set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set and it is shown that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings.
Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused. An investigation of this gap has recently been initiated by the authors in [7], where we focused on the problem of estimating expectations of adaptively chosen functions. In this paper, we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced by a learning algorithm operating on a training set. Reusing a holdout set adaptively multiple times can easily lead to overfitting to the holdout set itself. We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting. We illustrate the advantages of our algorithm over the standard use of the holdout set via a simple synthetic experiment. We also formalize and address the general problem of data reuse in adaptive data analysis. We show how the differential-privacy based approach given in [7] is applicable much more broadly to adaptive data analysis. We then show that a simple approach based on description length can also be used to give guarantees of statistical validity in adaptive settings. Finally, we demonstrate that these incomparable approaches can be unified via the notion of approximate max-information that we introduce. This, in particular, allows the preservation of statistical validity guarantees even when an analyst adaptively composes algorithms which have guarantees based on either of the two approaches.

Journal ArticleDOI
TL;DR: In this paper, a parametric extension of the h-principle for overtwisted contact structures on manifolds of all dimensions was established, which implies that any closed manifold admits a contact structure in any given homotopy class of almost contact structures.
Abstract: We establish a parametric extension h-principle for overtwisted contact structures on manifolds of all dimensions, which is the direct generalization of the 3-dimensional result from [12]. It implies, in particular, that any closed manifold admits a contact structure in any given homotopy class of almost contact structures.

Journal ArticleDOI
TL;DR: In this paper, the existence of large families of stable and unstable quasi-periodic solutions of the NLS with any number of independent frequencies was proved via a KAM algorithm.

Journal ArticleDOI
Shan Ye, Jun Ye
TL;DR: The concept of a single valued neutrosophic multiset (SVNM) is introduced as a generalization of an intuitionistic fuzzy multiset (IFM), together with some basic operational relations of SVNMs; the Dice similarity measure is proposed and applied to a medical diagnosis problem with SVNM information.
Abstract: This paper introduces the concept of a single valued neutrosophic multiset (SVNM) as a generalization of an intuitionistic fuzzy multiset (IFM) and some basic operational relations of SVNMs, and then proposes the Dice similarity measure and the weighted Dice similarity measure for SVNMs and investigates their properties. Finally, the Dice similarity measure is applied to a medical diagnosis problem with SVNM information. This diagnosis method can deal with the medical diagnosis problem with indeterminate and inconsistent information which cannot be handled by the diagnosis method based on IFMs.
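
A single valued neutrosophic element carries a truth, an indeterminacy, and a falsity degree. A sketch of a Dice-style similarity between two sequences of such elements, shown for illustration only; the paper's multiset version additionally handles repeated membership sequences, and its exact weighting may differ:

```python
def dice_similarity_svns(A, B):
    """Averaged Dice similarity between two single valued neutrosophic sets.

    A, B : equal-length lists of (truth, indeterminacy, falsity) triples,
           each component in [0, 1].
    """
    total = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(A, B):
        num = 2.0 * (ta * tb + ia * ib + fa * fb)
        den = (ta ** 2 + ia ** 2 + fa ** 2) + (tb ** 2 + ib ** 2 + fb ** 2)
        total += num / den if den else 1.0
    return total / len(A)

# e.g. comparing a patient's symptom profile against a disease prototype
patient = [(0.8, 0.2, 0.1), (0.5, 0.4, 0.3)]
disease = [(0.7, 0.1, 0.0), (0.6, 0.3, 0.2)]
print(dice_similarity_svns(patient, disease))
```

In the diagnosis setting, the disease prototype with the highest similarity to the patient's profile is the suggested diagnosis.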

Journal ArticleDOI
TL;DR: The experimental results showed that the SVM-FA approach achieves an improvement in predictive accuracy and generalization capability in comparison to the GP and ANN models for 1-day-ahead lake level forecasting.

Journal ArticleDOI
TL;DR: It is shown that the pairwise model and the analytic results can be generalized to an arbitrary distribution of the infectious times, using integro-differential equations, and this leads to a general expression for the final epidemic size.
Abstract: In this Letter, a generalization of pairwise models to non-Markovian epidemics on networks is presented. For the case of infectious periods of fixed length, the resulting pairwise model is a system of delay differential equations, which shows excellent agreement with results based on stochastic simulations. Furthermore, we analytically compute a new $R_0$-like threshold quantity and an analytical relation between this and the final epidemic size. Additionally, we show that the pairwise model and the analytic results can be generalized to an arbitrary distribution of the infectious times, using integro-differential equations, and this leads to a general expression for the final epidemic size. By showing the rigorous link between non-Markovian dynamics and pairwise delay differential equations, we provide the framework for a more systematic understanding of non-Markovian dynamics.

Journal ArticleDOI
TL;DR: It is shown how hierarchical Bayesian inference can be used to solve the reinforcement learning problem, and an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals is described.
Abstract: In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.

Proceedings Article
07 Dec 2015
TL;DR: This work considers a generalization of the submodular cover problem based on the concept of the diminishing return property on the integer lattice and devises a bicriteria approximation algorithm that is guaranteed to output a log-factor approximate solution satisfying the constraints with the desired accuracy.
Abstract: We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly O(n log(nr) log r), where n is the size of the ground set and r is the maximum value of a coordinate. The dependency on r is exponentially better than that of the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.
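
For orientation, the classical set-function special case of submodular cover is solved by Wolsey's greedy rule: repeatedly add the element with the largest marginal gain until the required value is reached. A sketch of that baseline (illustrative names; the paper's algorithm works on the integer lattice instead):

```python
def greedy_submodular_cover(ground_set, f, target):
    """Classical greedy for submodular cover: find a small S with f(S) >= target.

    f must be a monotone submodular set function with f(set()) == 0.
    """
    S = set()
    while f(S) < target:
        # pick the element with the largest marginal gain f(S + e) - f(S)
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) == f(S):      # no further progress possible
            break
        S.add(best)
    return S

# e.g. set cover: f counts how many universe items the chosen sets cover
sets = {'a': {1, 2}, 'b': {2, 3}, 'c': {3, 4, 5}}

def f(S):
    covered = set()
    for e in S:
        covered |= sets[e]
    return len(covered)

print(greedy_submodular_cover(set(sets), f, target=5))  # e.g. {'c', 'a'}
```

On the integer lattice, elements can be picked multiple times up to a per-coordinate cap r, which is where the paper's improved dependency on r comes in.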

Journal ArticleDOI
TL;DR: In this paper, an uncertain random process is proposed as a generalization of both the stochastic process and the uncertain process, and some special types of uncertain random processes such as the stationary increment process and the renewal process are discussed.
Abstract: To deal with a system with both randomness and uncertainty, chance theory has been built and an uncertain random variable has been proposed as a generalization of random variable and uncertain variable. Correspondingly, as a generalization of both the stochastic process and the uncertain process, this paper will propose an uncertain random process. In addition, some special types of uncertain random processes such as stationary increment process and renewal process will also be discussed.

Journal ArticleDOI
06 Jan 2015
TL;DR: A generalization of the graph model for conflict resolution is presented, introducing the possibility of decision makers (DMs) expressing their preferences among the possible scenarios through probabilistic preferences.
Abstract: We present a generalization of the graph model for conflict resolution, introducing the possibility of decision makers (DMs) expressing their preferences among the possible scenarios through probabilistic preferences. In this new scenario, four stability definitions (solution concepts) are proposed: 1) $\boldsymbol{\alpha}$-Nash stability; 2) $(\boldsymbol{\alpha}, \boldsymbol{\beta})$-metarationality; 3) $(\boldsymbol{\alpha}, \boldsymbol{\beta})$-symmetric metarationality; and 4) $(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma})$-sequential stability. We deal with conflicts that involve two or more DMs. Relationships between these definitions are demonstrated, and an analysis of how the values of the parameters $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$ influence the set of stable states is provided. Applications of the proposed model to conflicts involving two and three DMs are presented. The analysis of these applications highlights the advantages gained by allowing individuals to express their preferences probabilistically.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper builds upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples in the source domain each time, and introduces a new regularizer to minimize the mismatch between any two representation matrices on different views.
Abstract: In this paper, we propose a new multi-view domain generalization (MVDG) approach for visual recognition, in which we aim to use the source domain samples with multiple types of features (i.e., multi-view features) to learn robust classifiers that can generalize well to any unseen target domain. Considering the recent works show the domain generalization capability can be enhanced by fusing multiple SVM classifiers, we build upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples in the source domain each time. When the source domain samples come from multiple latent domains, we expect the weight vectors of exemplar SVM classifiers can be organized into multiple hidden clusters. To exploit such cluster structure, we organize the weight vectors learnt on each view as a weight matrix and seek the low-rank representation by reconstructing this weight matrix using itself as the dictionary. To enforce the consistency of inherent cluster structures discovered from the weight matrices learnt on different views, we introduce a new regularizer to minimize the mismatch between any two representation matrices on different views. We also develop an efficient alternating optimization algorithm and further extend our MVDG approach for domain adaptation by exploiting the manifold structure of unlabeled target domain samples. Comprehensive experiments for visual recognition clearly demonstrate the effectiveness of our approaches for domain generalization and domain adaptation.

Proceedings Article
07 Dec 2015
TL;DR: A PAC-Bayesian theorem is proved that can be seen as a direct generalization of the analogous previous result for the i.i.d. case, and it is proposed to learn an inductive bias in the form of a transfer procedure.
Abstract: In this work we aim at extending the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that learning tasks are sampled i.i.d. from a task environment or limited to strongly constrained data distributions. Instead, we study two scenarios in which lifelong learning is possible, even though the observed tasks do not form an i.i.d. sample: first, when they are sampled from the same environment, but possibly with dependencies, and second, when the task environment is allowed to change over time in a consistent way. In the first case we prove a PAC-Bayesian theorem that can be seen as a direct generalization of the analogous previous result for the i.i.d. case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.

Journal ArticleDOI
TL;DR: This work proposes a new family of message passing techniques for MAP estimation in graphical models whose derivation is simpler than the original derivation of TRW-S and does not involve a decomposition into trees, which allows easy generalizations.
Abstract: We propose a new family of message passing techniques for MAP estimation in graphical models which we call Sequential Reweighted Message Passing (SRMP). Special cases include well-known techniques such as Min-Sum Diffusion (MSD) and a faster Sequential Tree-Reweighted Message Passing (TRW-S). Importantly, our derivation is simpler than the original derivation of TRW-S, and does not involve a decomposition into trees. This allows easy generalizations. The new family of algorithms can be viewed as a generalization of TRW-S from pairwise to higher-order graphical models. We test SRMP on several real-world problems with promising results.


Journal ArticleDOI
TL;DR: This paper presents a new approach to account for reliability of information within the framework of LP, using models with interval, fuzzy, generalized fuzzy, and random numbers.
Abstract: Linear programming (LP) is an operations research technique frequently used in the fields of science, economics, business, management science, and engineering. Although it has been investigated and applied for more than six decades, and LP models with different levels of generalization of information about parameters, including models with interval, fuzzy, generalized fuzzy, and random numbers, have been considered, until now there has been no approach to account for reliability of information within the framework of LP.

Journal ArticleDOI
10 Apr 2015-Entropy
TL;DR: Deep belief network (DBN)-based approaches for link prediction are proposed, including an unsupervised link prediction model, a feature representation method, and a DBN-based link prediction method, which can predict the values of the links with high performance and have a good generalization ability across datasets.
Abstract: In some online social network services (SNSs), the members are allowed to label their relationships with others, and such relationships can be represented as links with signed values (positive or negative). The networks containing such relations are named signed social networks (SSNs), and some real-world complex systems can also be modeled with SSNs. Given the information of the observed structure of an SSN, link prediction aims to estimate the values of the unobserved links. Most of the previous approaches for link prediction are based on the members' similarity and supervised learning; however, research on the hidden principles that drive the behaviors of social members is rarely conducted. In this paper, deep belief network (DBN)-based approaches for link prediction are proposed, including an unsupervised link prediction model, a feature representation method, and a DBN-based link prediction method. The experiments are done on datasets from three SNSs in different domains, and the results show that our methods can predict the values of the links with high performance and have a good generalization ability across these datasets.