
Showing papers on "Generalization" published in 2016


Posted Content
TL;DR: Wide & Deep learning jointly trains wide linear models and deep neural networks to combine the benefits of memorization and generalization for recommender systems; the deep component generalizes to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features.
Abstract: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.

1,242 citations
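
As a rough illustration of the joint wide-plus-deep prediction described in the abstract above, the sketch below sums the logit of a linear model over binary (cross-product) features with the logit of a small embedding-based network and passes the sum through a sigmoid. It is a minimal sketch, not the authors' TensorFlow implementation: the helper names (`make_params`, `predict`), the single ReLU hidden layer, and all sizes and parameter values are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def make_params(rng, n_wide, n_ids, emb_dim, n_slots, n_hidden, scale=0.5):
    """Random illustrative parameters (hypothetical); scale=0 gives an all-zero model."""
    def r():
        return scale * rng.uniform(-1.0, 1.0)
    return {
        "w_wide": [r() for _ in range(n_wide)],
        "embeddings": [[r() for _ in range(emb_dim)] for _ in range(n_ids)],
        "W_hidden": [[r() for _ in range(emb_dim * n_slots)] for _ in range(n_hidden)],
        "b_hidden": [r() for _ in range(n_hidden)],
        "w_deep": [r() for _ in range(n_hidden)],
        "bias": r(),
    }


def predict(x_wide, sparse_ids, params) -> float:
    # Wide part: a linear model over binary (e.g. cross-product) features.
    wide_logit = dot(params["w_wide"], x_wide)
    # Deep part: look up one dense embedding per sparse id and concatenate them.
    dense = [v for i in sparse_ids for v in params["embeddings"][i]]
    # One ReLU hidden layer (an assumption; a real model would stack several).
    hidden = [max(0.0, dot(row, dense) + b)
              for row, b in zip(params["W_hidden"], params["b_hidden"])]
    deep_logit = dot(params["w_deep"], hidden)
    # Joint model: both logits feed a single sigmoid output.
    return sigmoid(wide_logit + deep_logit + params["bias"])


rng = random.Random(0)
params = make_params(rng, n_wide=3, n_ids=5, emb_dim=2, n_slots=2, n_hidden=4)
x_wide = [1.0, 0.0, 1.0]   # hypothetical cross-product indicators
sparse_ids = [1, 4]        # hypothetical categorical ids, one per feature slot
p = predict(x_wide, sparse_ids, params)
print(f"predicted probability: {p:.3f}")
```

Because both parts contribute to one output, a single log-loss gradient would update the wide weights and the deep weights together, which is what distinguishes joint training from ensembling separately trained models.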


Proceedings Article
01 Jan 2016
TL;DR: Introduces the "exponential linear unit" (ELU), which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
Abstract: We introduce the "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity for positive values. However, ELUs have improved learning characteristics compared to the units with other activation functions. In contrast to ReLUs, ELUs have negative values which allows them to push mean unit activations closer to zero like batch normalization but with lower computational complexity. Mean shifts toward zero speed up learning by bringing the normal gradient closer to the unit natural gradient because of a reduced bias shift effect. While LReLUs and PReLUs have negative values, too, they do not ensure a noise-robust deactivation state. ELUs saturate to a negative value with smaller inputs and thereby decrease the forward propagated variation and information. Therefore, ELUs code the degree of presence of particular phenomena in the input, while they do not quantitatively model the degree of their absence. In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers. On CIFAR-100 ELU networks significantly outperform ReLU networks with batch normalization while batch normalization does not improve ELU networks. ELU networks are among the top 10 reported CIFAR-10 results and yield the best published result on CIFAR-100, without resorting to multi-view evaluation or model averaging. On ImageNet, ELU networks considerably speed up learning compared to a ReLU network with the same architecture, obtaining less than 10% classification error for a single crop, single model network.

1,180 citations
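
The activation described in the abstract above (identity for positive inputs, saturation to a negative value for small inputs) can be sketched as follows. The formula `alpha * (exp(x) - 1)` for non-positive inputs and the default `alpha = 1.0` are stated here as assumptions, since the abstract does not spell them out; the sample inputs are arbitrary.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Identity for x > 0; alpha * (exp(x) - 1) otherwise, saturating at -alpha."""
    return x if x > 0 else alpha * math.expm1(x)


def relu(x: float) -> float:
    return max(0.0, x)


inputs = [-3.0, -1.0, -0.1, 0.0, 0.5, 2.0]   # arbitrary sample pre-activations
elu_mean = sum(elu(v) for v in inputs) / len(inputs)
relu_mean = sum(relu(v) for v in inputs) / len(inputs)
print(f"mean ELU activation:  {elu_mean:.3f}")
print(f"mean ReLU activation: {relu_mean:.3f}")
```

On these inputs the negative ELU outputs pull the mean activation closer to zero than ReLU does, which is the effect the abstract credits for faster learning.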


Posted Content
TL;DR: In this paper, the authors investigate the cause of the generalization drop in the large batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minima of the training and testing functions.
Abstract: The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say $32$-$512$ data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize. We investigate the cause for this generalization drop in the large-batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions - and as is well known, sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We discuss several strategies to attempt to help large-batch methods eliminate this generalization gap.

925 citations


Proceedings Article
15 Sep 2016
TL;DR: In this article, the authors investigate the cause of the generalization drop in the large batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minima of the training and testing functions.
Abstract: The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say $32$-$512$ data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize. We investigate the cause for this generalization drop in the large-batch regime and present numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions - and as is well known, sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We discuss several strategies to attempt to help large-batch methods eliminate this generalization gap.

845 citations



Proceedings Article
05 Dec 2016
TL;DR: In this article, a recurrent neural network (RNN) is used to perform probabilistic inference in structured image models that explicitly reason about objects, i.e., counting, locating and classifying the elements of a scene.
Abstract: We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects - counting, locating and classifying the elements of a scene - without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network at unprecedented speed. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.

311 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the use of simplicial complexes, a structure developed in the field of mathematics known as algebraic topology, of increasing applicability to real data due to a rapidly growing computational toolset.
Abstract: The language of graph theory, or network science, has proven to be an exceptional tool for addressing myriad problems in neuroscience. Yet, the use of networks is predicated on a critical simplifying assumption: that the quintessential unit of interest in a brain is a dyad --- two nodes (neurons or brain regions) connected by an edge. While rarely mentioned, this fundamental assumption inherently limits the types of neural structure and function that graphs can be used to model. Here, we describe a generalization of graphs that overcomes these limitations, thereby offering a broad range of new possibilities in terms of modeling and measuring neural phenomena. Specifically, we explore the use of simplicial complexes: a structure developed in the field of mathematics known as algebraic topology, of increasing applicability to real data due to a rapidly growing computational toolset. We review the underlying mathematical formalism as well as the budding literature applying simplicial complexes to neural data, from electrophysiological recordings in animal models to hemodynamic fluctuations in humans. Based on the exceptional flexibility of the tools and recent ground-breaking insights into neural function, we posit that this framework has the potential to eclipse graph theory in unraveling the fundamental mysteries of cognition.

283 citations


Posted Content
TL;DR: A novel, simple and intuitive generalization-error bound is given showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation.
Abstract: There is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a "balanced" representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art.

247 citations
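
The two terms of the bound in the abstract above (a prediction error plus a distance between the treated and control representation distributions) suggest an objective of the following shape. This is a hedged sketch only: it uses the squared MMD with a linear kernel, which reduces to the squared distance between the group means, and the function names and the weight `alpha` are hypothetical rather than taken from the paper.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd_sq(treated, control):
    """Squared MMD under a linear kernel: squared distance between group means."""
    mu_t = mean_vector(treated)
    mu_c = mean_vector(control)
    return sum((a - b) ** 2 for a, b in zip(mu_t, mu_c))


def balanced_objective(factual_loss, treated, control, alpha=1.0):
    """Prediction loss plus a weighted imbalance penalty (alpha is hypothetical)."""
    return factual_loss + alpha * linear_mmd_sq(treated, control)


# Hypothetical 2-d representations of treated and control units.
treated = [[1.0, 2.0], [3.0, 4.0]]
control = [[0.0, 1.0], [2.0, 3.0]]
print(linear_mmd_sq(treated, control))
print(balanced_objective(0.5, treated, control, alpha=2.0))
```

A representation that makes the two groups look alike drives the penalty toward zero, so minimizing the sum trades predictive accuracy against balance.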


Journal ArticleDOI
TL;DR: Two generalizations of the network model are presented, establishing network modeling as part of the more general framework of structural equation modeling (SEM), and it is shown that, within this framework, identifiable models can be obtained in which local independence is structurally violated.
Abstract: We introduce the network model as a formal psychometric model, conceptualizing the covariance between psychometric indicators as resulting from pairwise interactions between observable variables in a network structure. This contrasts with standard psychometric models, in which the covariance between test items arises from the influence of one or more common latent variables. Here, we present two generalizations of the network model that encompass latent variable structures, establishing network modeling as part of the more general framework of Structural Equation Modeling (SEM). In the first generalization, we model the covariance structure of latent variables as a network. We term this framework Latent Network Modeling (LNM) and show that, with LNM, a unique structure of conditional independence relationships between latent variables can be obtained in an explorative manner. In the second generalization, the residual variance-covariance structure of indicators is modeled as a network. We term this generalization Residual Network Modeling (RNM) and show that, within this framework, identifiable models can be obtained in which local independence is structurally violated. These generalizations allow for a general modeling framework that can be used to fit, and compare, SEM models, network models, and the RNM and LNM generalizations. This methodology has been implemented in the free-to-use software package lvnet, which contains confirmatory model testing as well as two exploratory search algorithms: stepwise search algorithms for low-dimensional datasets and penalized maximum likelihood estimation for larger datasets. We show in simulation studies that these search algorithms perform adequately in identifying the structure of the relevant residual or latent networks. We further demonstrate the utility of these generalizations in an empirical example on a personality inventory dataset.

233 citations


Posted Content
Danilo Jimenez Rezende1, Shakir Mohamed1, Ivo Danihelka1, Karol Gregor1, Daan Wierstra1 
TL;DR: New deep generative models are developed, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning, and are able to generate compelling and diverse samples, providing an important class of general-purpose models for one-shot machine learning.
Abstract: Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state-of-the-art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples---having seen new examples just once---providing an important class of general-purpose models for one-shot machine learning.

226 citations


Proceedings ArticleDOI
19 Jun 2016
TL;DR: The first upper bounds on the number of samples required to answer more general families of queries, including arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries), are proved.
Abstract: Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution P. How many samples n must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: We give upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries (alternatively, risk minimization queries). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error.
We also show that weaker stability guarantees such as bounded KL divergence and total variation distance lead to correspondingly weaker generalization guarantees.

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work provides a novel perspective to attribute detection and proposes to gear the techniques in multi-source domain generalization for the purpose of learning cross-category generalizable attribute detectors.
Abstract: Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval. Whereas the existing work mainly pursues utilizing attributes for various computer vision problems, we contend that the most basic problem—how to accurately and robustly detect attributes from images—has been left under explored. Especially, the existing work rarely explicitly tackles the need that attribute detectors should generalize well across different categories, including those previously unseen. Noting that this is analogous to the objective of multi-source domain generalization, if we treat each category as a domain, we provide a novel perspective to attribute detection and propose to gear the techniques in multi-source domain generalization for the purpose of learning cross-category generalizable attribute detectors. We validate our understanding and approach with extensive experiments on four challenging datasets and three different problems.

Posted Content
01 Feb 2016-viXra
TL;DR: In this article, the concept of single valued neutrosophic sets is applied to graphs: certain types of single valued neutrosophic graphs (SVNG) are introduced and some of their properties are investigated, with proofs and examples.
Abstract: The notion of single valued neutrosophic sets is a generalization of fuzzy sets and intuitionistic fuzzy sets. We apply the concept of single valued neutrosophic sets, an instance of neutrosophic sets, to graphs. We introduce certain types of single valued neutrosophic graphs (SVNG) and investigate some of their properties with proofs and examples.

Journal ArticleDOI
TL;DR: In this article, the authors study how the combinatorial behavior of a category C affects the algebraic behavior of representations of C, and give combinatorial criteria for noetherianity of representations and for rationality of their Hilbert series.
Abstract: Given a category C of a combinatorial nature, we study the following fundamental question: how does the combinatorial behavior of C affect the algebraic behavior of representations of C? We prove two general results. The first gives a combinatorial criterion for representations of C to admit a theory of Gröbner bases. From this, we obtain a criterion for noetherianity of representations. The second gives a combinatorial criterion for a general “rationality” result for Hilbert series of representations of C. This criterion connects to the theory of formal languages, and makes essential use of results on the generating functions of languages, such as the transfer-matrix method and the Chomsky–Schützenberger theorem. Our work is motivated by recent work in the literature on representations of various specific categories. Our general criteria recover many of the results on these categories that had been proved by ad hoc means, and often yield cleaner proofs and stronger statements. For example: we give a new, more robust, proof that FI-modules (originally introduced by Church–Ellenberg–Farb), and a family of natural generalizations, are noetherian; we give an easy proof of a generalization of the Lannes–Schwartz artinian conjecture from the study of generic representation theory of finite fields; we significantly improve the theory of Δ-modules, introduced by Snowden in connection to syzygies of Segre embeddings; and we establish fundamental properties of twisted commutative algebras in positive characteristic.

Proceedings Article
01 Jan 2016
TL;DR: ZKBoo is a proposal for practically efficient zero-knowledge arguments tailored for Boolean circuits, based on the “MPC-in-the-head” approach of Ishai et al. (IKOS) and accompanied by a proof-of-concept implementation.
Abstract: In this paper we describe ZKBoo, a proposal for practically efficient zero-knowledge arguments especially tailored for Boolean circuits and report on a proof-of-concept implementation. As a highlight, we can generate (resp. verify) a non-interactive proof for the SHA-1 circuit in approximately 13ms (resp. 5ms), with a proof size of 444KB. Our techniques are based on the “MPC-in-the-head” approach to zero-knowledge of Ishai et al. (IKOS), which has been successfully used to achieve significant asymptotic improvements. Our contributions include: (1) a thorough analysis of the different variants of IKOS, which highlights their pros and cons for practically relevant soundness parameters; (2) a generalization and simplification of their approach, which leads to faster Σ-protocols (that can be made non-interactive using the Fiat-Shamir heuristic) for statements of the form “I know x such that y = φ(x)” (where φ is a circuit and y a public value); (3) a case study, where we provide explicit protocols, implementations and benchmarking of zero-knowledge protocols for the SHA-1 and SHA-256 circuits.

Posted Content
TL;DR: In this article, the generalization properties of ridge regression with random features in the statistical learning framework were studied and it was shown for the first time that $O(1/\sqrt{n})$ learning bounds can be achieved with only $O(\sqrt{n}\log n)$ random features rather than $O(n)$ as suggested by previous results.
Abstract: We study the generalization properties of ridge regression with random features in the statistical learning framework. We show for the first time that $O(1/\sqrt{n})$ learning bounds can be achieved with only $O(\sqrt{n}\log n)$ random features rather than $O({n})$ as suggested by previous results. Further, we prove faster learning rates and show that they might require more random features, unless they are sampled according to a possibly problem dependent distribution. Our results shed light on the statistical computational trade-offs in large scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.
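
To make the setting of the abstract above concrete, the sketch below fits ridge regression on random features with the number of features set to roughly $\sqrt{n}\log n$. It is illustrative only: the choice of random Fourier features for a Gaussian kernel, the frequency scale, the regularization value, the synthetic target, and the helper names are all assumptions, and the small linear solver exists only to keep the example free of external libraries.

```python
import math
import random


def rff_map(xs, omegas, phases):
    """Map scalar inputs to random Fourier features sqrt(2/M) * cos(w*x + b)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [[scale * math.cos(w * x + b) for w, b in zip(omegas, phases)] for x in xs]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ridge(Z, y, lam):
    """Minimise (1/n) * ||Z w - y||^2 + lam * ||w||^2 via the normal equations."""
    n, m = len(Z), len(Z[0])
    A = [[sum(Z[k][i] * Z[k][j] for k in range(n)) / n + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(Z[k][i] * y[k] for k in range(n)) / n for i in range(m)]
    return solve(A, b)


def mse(Z, w, y):
    return sum((sum(zi * wi for zi, wi in zip(z, w)) - t) ** 2
               for z, t in zip(Z, y)) / len(y)


rng = random.Random(0)
n = 200
m = math.ceil(math.sqrt(n) * math.log(n))          # about sqrt(n) * log(n) features
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [math.sin(3.0 * x) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]   # synthetic target
omegas = [rng.gauss(0.0, 2.0) for _ in range(m)]   # assumed Gaussian-kernel frequencies
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
Z = rff_map(xs, omegas, phases)
w = fit_ridge(Z, ys, lam=1e-3)
train_mse = mse(Z, w, ys)
baseline = sum(t * t for t in ys) / n              # error of the all-zero predictor
print(f"features: {m}, train MSE: {train_mse:.4f}, zero-predictor MSE: {baseline:.4f}")
```

The linear system solved here has size M by M rather than n by n, which is where the computational saving over exact kernel ridge regression comes from when M is much smaller than n.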

Proceedings Article
Scott Reed1, Nando de Freitas1
01 Jan 2016
TL;DR: The Neural Programmer-Interpreter (NPI) as discussed by the authors is a recurrent and compositional neural network that learns to represent and execute programs by combining a task-agnostic recurrent core, a persistent key-value program memory, and domain specific encoders.
Abstract: We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.

Proceedings Article
19 Jun 2016
TL;DR: Randomized least-squares value iteration (RLSVI) as discussed by the authors is a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions, and it has been shown that randomized value functions offer a promising approach to synthesizing efficient exploration and effective generalization.
Abstract: We propose randomized least-squares value iteration (RLSVI) - a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or ε-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context. More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

01 May 2016
TL;DR: This paper shows that it is possible to train large and deep convolutional neural networks for JPEG compression artifacts reduction, and that such networks can provide significantly better reconstruction quality compared to previously used smaller networks as well as to any other state-of-the-art methods.
Abstract: This paper shows that it is possible to train large and deep convolutional neural networks (CNN) for JPEG compression artifacts reduction, and that such networks can provide significantly better reconstruction quality compared to previously used smaller networks as well as to any other state-of-the-art methods. We were able to train networks with 8 layers in a single step and in relatively short time by combining residual learning, skip architecture, and symmetric weight initialization. We provide further insights into convolution networks for JPEG artifact reduction by evaluating three different objectives, generalization with respect to training dataset size, and generalization with respect to JPEG quality level.

Journal ArticleDOI
TL;DR: Estimation and prediction results of the ELM models were compared with genetic programming (GP) and artificial neural network (ANN) models; the results indicate that, on the whole, the new algorithm achieves good generalization performance.
Abstract: Evaluation of the parameters affecting the shear strength and ductility of steel–concrete composite beams is the goal of this study. This study focuses on predicting the future output of a beam's strength and ductility based on relative inputs using a soft computing scheme, extreme learning machine (ELM). Estimation and prediction results of the ELM models were compared with genetic programming (GP) and artificial neural network (ANN) models. Referring to the experimental results, as opposed to the GP and ANN methods, the ELM approach enhanced generalization ability and predictive accuracy. Moreover, the achieved results indicated that the developed ELM models can be used with confidence for further work on formulating a novel model predictive strategy for the shear strength and ductility of steel–concrete composite beams. Furthermore, the experimental results indicate that, on the whole, the new algorithm achieves good generalization performance. In comparison to other widely used conventional learning algorithms, the ELM has a much faster learning ability.

Journal ArticleDOI
TL;DR: In this article, short-term, multistep-ahead predictive models of the heat load of consumers attached to a district heating system were created using a novel method based on the Extreme Learning Machine (ELM).

01 Mar 2016
TL;DR: Certain types of single valued neutrosophic graphs (SVNG) are introduced and some of their properties are investigated with proofs and examples.
Abstract: The notion of single valued neutrosophic sets is a generalization of fuzzy sets and intuitionistic fuzzy sets. We apply the concept of single valued neutrosophic sets, an instance of neutrosophic sets, to graphs. We introduce certain types of single valued neutrosophic graphs (SVNG) and investigate some of their properties with proofs and examples.

01 Jan 2016
TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, which are applicable to certain problems in which one must multiply an N-vector by an N X N matrix which can be factored into m sparse matrices.
Abstract: An efficient method for the calculation of the interactions of a $2^m$ factorial experiment was introduced by Yates and is widely known by his name. The generalization to $3^m$ was given by Box et al. [1]. Good [2] generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than $N^2$. These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with $N = 2^m$ and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients.
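
For the binary case where N is a power of two, the N log N idea in the abstract above can be sketched with the familiar recursive radix-2 decomposition, checked against a direct sum that costs on the order of N squared operations. This is a minimal sketch under stated assumptions: the negative-exponent sign convention is a choice made here, and the recursive form allocates new lists rather than working in place within the N storage locations as the abstract describes.

```python
import cmath


def fft(a):
    """Radix-2 decimation-in-time FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])   # transform of the even-indexed samples
    odd = fft(a[1::2])    # transform of the odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def dft(a):
    """Direct evaluation of the same transform, for comparison only."""
    n = len(a)
    return [sum(a[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, -1, -2, -3]   # arbitrary test data
fast = fft(signal)
slow = dft(signal)
print(max(abs(x - y) for x, y in zip(fast, slow)))   # discrepancy between the two
```

Each level of recursion does work proportional to N, and there are log2 N levels, which gives the N log N operation count.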

Proceedings Article
05 Dec 2016
TL;DR: For the negative log-likelihood loss function, it is shown that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.
Abstract: We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks.


Journal ArticleDOI
15 Jan 2016
TL;DR: In this paper, a projection formula for tropical multiplicities and a generalization of the Sturmfels–Tevelev multiplicity formula in tropical elimination theory to the case of a nontrivial valuation were developed.
Abstract: We develop a number of general techniques for comparing analytifications and tropicalizations of algebraic varieties. Our basic results include a projection formula for tropical multiplicities and a generalization of the Sturmfels–Tevelev multiplicity formula in tropical elimination theory to the case of a nontrivial valuation. For curves, we explore in detail the relationship between skeletal metrics and lattice lengths on tropicalizations and show that the maps from the analytification of a curve to the tropicalizations of its toric embeddings stabilize to isometries on finite subgraphs. Other applications include generalizations of Speyer’s well-spacedness condition and the Katz–Markwig–Markwig results on tropical j-invariants.

Journal ArticleDOI
07 May 2016-Filomat
TL;DR: In this paper, a generalization of the Bleimann-Butzer-Hahn operators based on (p, q)-integers was introduced and the convergence of these operators was computed by using the modulus of continuity.
Abstract: In this paper, we introduce a generalization of the Bleimann-Butzer-Hahn operators based on (p,q)-integers and obtain a Korovkin-type approximation theorem for these operators. Furthermore, we compute the convergence of these operators by using the modulus of continuity.
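
The abstract does not define (p,q)-integers. A minimal sketch, assuming the standard definition [n]_{p,q} = (p^n - q^n)/(p - q), written as the equivalent finite sum so that p = q needs no special case. The function name `pq_integer` is hypothetical.

```python
def pq_integer(n, p, q):
    """(p,q)-analogue of the non-negative integer n.

    Uses [n]_{p,q} = p^(n-1) + p^(n-2)*q + ... + q^(n-1),
    which equals (p^n - q^n)/(p - q) whenever p != q.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return sum(p ** (n - 1 - k) * q ** k for k in range(n))
```

Setting p = 1 recovers the ordinary q-integer, and p = q = 1 recovers n itself.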

Posted Content
01 Feb 2016-viXra
TL;DR: This work applies for the first time the concept of interval valued neutrosophic sets, an instance of neutrosophic sets, to graph theory.
Abstract: The notion of interval valued neutrosophic sets is a generalization of fuzzy sets, intuitionistic fuzzy sets, interval valued fuzzy sets, interval valued intuitionistic fuzzy sets and single valued neutrosophic sets. We apply for the first time the concept of interval valued neutrosophic sets, an instance of neutrosophic sets, to graph theory. We introduce certain types of interval valued neutrosophic graphs (IVNG) and investigate some of their properties with proofs and examples.

Posted Content
01 Aug 2016-viXra
TL;DR: In this article, a new generalized proportional conflict redistribution rule is proposed and compared with other combination rules in terms of decision on a didactic example and on generated data, for combination in belief function theory.
Abstract: In this chapter, we present and discuss a new generalized proportional conflict redistribution rule. The Dezert-Smarandache extension of the Dempster-Shafer theory has relaunched studies on combination rules, especially for the management of conflict. Many combination rules have been proposed in the last few years. We study here different combination rules and compare them in terms of decision on a didactic example and on generated data. Indeed, in real applications, we need a reliable decision, and it is the final result that matters. This chapter shows that a fine proportional conflict redistribution rule must be preferred for combination in belief function theory.

Proceedings Article
19 Jun 2016
TL;DR: The authors showed that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what they refer to as the "combinator function".
Abstract: The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the 'combinator function'. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.