
Showing papers on "Function (mathematics)" published in 2016


Posted Content
TL;DR: In this article, the authors address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Because the ground-truth shape for an input image may be ambiguous, they design a novel and effective architecture, loss function and learning paradigm.
Abstract: Generation of 3D data by deep neural network has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collection of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straight-forward form of output -- point cloud coordinates. Along with this problem arises a unique and interesting issue, that the groundtruth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in groundtruth, we design architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments not only can our system outperform state-of-the-art methods on single image based 3d reconstruction benchmarks; but it also shows a strong performance for 3d shape completion and promising ability in making multiple plausible predictions.

1,194 citations
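The set-to-set losses this line of work relies on can be made concrete with a small example. Below is a minimal NumPy sketch of a symmetric Chamfer distance between two point clouds, one common choice for comparing unordered point sets; it is given as an illustration of the idea, not as the paper's exact loss.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3):
    for each point in one set, the squared distance to its nearest neighbour
    in the other set, averaged over both directions."""
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # pairwise distances, (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy usage on two random clouds of different sizes.
rng = np.random.default_rng(0)
print(chamfer_distance(rng.normal(size=(128, 3)), rng.normal(size=(100, 3))))
```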


Posted Content
TL;DR: A gating function is proposed that compares mid-level features across pairs of images in order to selectively emphasize fine common local patterns that may be essential to distinguish positive pairs from hard negative pairs.
Abstract: Matching pedestrians across multiple camera views, known as human re-identification, is a challenging research problem that has numerous applications in visual surveillance. With the resurgence of Convolutional Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have been proposed for human re-identification with the objective of projecting the images of similar pairs (i.e. same identity) to be closer to each other and those of dissimilar pairs to be distant from each other. However, current networks extract fixed representations for each image regardless of other images which are paired with it and the comparison with other images is done only at the final level. In this setting, the network is at risk of failing to extract finer local patterns that may be essential to distinguish positive pairs from hard negative pairs. In this paper, we propose a gating function to selectively emphasize such fine common local patterns by comparing the mid-level features across pairs of images. This produces flexible representations for the same image according to the images they are paired with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and demonstrate improved performance compared to a baseline Siamese CNN architecture.

661 citations


Posted Content
TL;DR: The third-degree stochastic dominance condition was introduced in this article, where the authors show that the set of probability distributions that can be ordered by means of second-degree stochastic dominance is, in general, larger than the set that can be ordered by first-degree stochastic dominance.
Abstract: Here F(x) and G(x) are less-than cumulative probability distributions, where x is a continuous or discrete random variable representing the outcome of a prospect. The closed interval [a, b] is the sample space of both prospects. The integral shown in Rule 2 and those shown throughout the paper are Stieltjes integrals. Recall that the Stieltjes integral $\int_a^b f(x)\,dg(x)$ exists if one of the functions f and g is continuous and the other has finite variation in [a, b]. Let $D_1$, $D_2$, and $D_3$ be three sets of utility functions $\phi(x)$. $D_1$ is the set containing all utility functions with $\phi(x)$ and $\phi_1(x)$ continuous, and $\phi_1(x)>0$ for all $x\in[a,b]$. $D_2$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$ continuous, and $\phi_1(x)>0$, $\phi_2(x)\le 0$ for all $x\in[a,b]$. $D_3$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$, $\phi_3(x)$ continuous, and $\phi_1(x)>0$, $\phi_2(x)\le 0$, $\phi_3(x)\ge 0$ for all $x\in[a,b]$. Here $\phi_i(x)$ denotes the ith derivative of $\phi(x)$. Hadar and Russell proved that Rule 1 is valid for all $\phi\in D_1$ and Rule 2 is valid for all $\phi\in D_2$. The authors point out that the set of probability distributions that can be ordered by means of second-degree stochastic dominance is, in general, larger than that which can be ordered by means of first-degree stochastic dominance. Note that in Rule 2, they assume that $\phi(x)$ is not only an increasing function of x but also exhibits weak global risk aversion, a condition guaranteed by requiring the second derivative of $\phi(x)$ to be nonpositive. In this paper, a condition which will be called third-degree stochastic dominance is considered. It is based on the following assumption about the form of the utility function $\phi(x)$. From a normative point of view, one expects the risk premium associated with an uncertain prospect to become smaller the greater the individual's wealth is. The plausibility and implications of this assumption have been explored by John Pratt, as well as others. The risk premium of an uncertain prospect is the amount by which the certainty equivalent of the prospect differs from its expected value. In mathematical terms, given the prospect F(x) with expected value A, the corresponding risk premium is obtained by solving the following equation.

537 citations
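For reference, the three dominance rules referred to in this abstract can be written out explicitly (these are the standard definitions; strict inequality is required somewhere in each case for strict dominance). Prospect F dominates prospect G in the

$$\text{first degree (Rule 1):}\quad G(x) \ge F(x) \quad \text{for all } x \in [a,b],$$

$$\text{second degree (Rule 2):}\quad \int_a^x \left[G(t)-F(t)\right] dt \ge 0 \quad \text{for all } x \in [a,b],$$

$$\text{third degree:}\quad \int_a^x\!\int_a^y \left[G(t)-F(t)\right] dt\, dy \ge 0 \quad \text{for all } x \in [a,b], \quad \text{and} \quad \mu_F \ge \mu_G,$$

where $\mu_F$ and $\mu_G$ denote the expected values of the two prospects.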


Posted Content
TL;DR: Domain Transfer Network (DTN) as discussed by the authors employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves.
Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

460 citations
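The compound loss named in the abstract (a multiclass GAN loss, an f-constancy term, and an identity regularizer on target samples) can be sketched as follows. This is a hedged PyTorch illustration: the three-way discriminator classes, the MSE form of the distance terms, and the weights alpha and beta are assumptions made for illustration, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def dtn_generator_loss(d_logits_gen_s, d_logits_gen_t, f_src, f_gen_src, x_t, g_x_t,
                       alpha=1.0, beta=1.0):
    """Sketch of a DTN-style compound generator loss.

    d_logits_gen_*: discriminator logits for generated samples, assuming a
        3-class discriminator {G(source), G(target), real target}; the
        generator wants both kinds of generated samples judged as real target
        (class index 2).
    f_src, f_gen_src: f(x) and f(G(x)) for source samples (f-constancy term).
    x_t, g_x_t: target samples and G applied to them (identity regularizer).
    alpha, beta: assumed trade-off weights.
    """
    tgt_s = torch.full((d_logits_gen_s.size(0),), 2, dtype=torch.long)
    tgt_t = torch.full((d_logits_gen_t.size(0),), 2, dtype=torch.long)
    l_gan = F.cross_entropy(d_logits_gen_s, tgt_s) + F.cross_entropy(d_logits_gen_t, tgt_t)
    l_const = F.mse_loss(f_gen_src, f_src)   # f should be invariant under G
    l_tid = F.mse_loss(g_x_t, x_t)           # G should map target samples to themselves
    return l_gan + alpha * l_const + beta * l_tid

# Toy shapes only, to show how the pieces fit together.
loss = dtn_generator_loss(torch.randn(4, 3), torch.randn(4, 3),
                          torch.randn(4, 128), torch.randn(4, 128),
                          torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32))
print(loss)
```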


Proceedings ArticleDOI
19 Apr 2016
TL;DR: The authors compare bi-LSTMs with word, character, and Unicode byte embeddings for POS tagging and show that bi-LSTMs are less sensitive to training data size and label corruptions than previously assumed.
Abstract: Bidirectional long short-term memory (biLSTM) networks have recently proven successful for various NLP sequence modeling tasks, but little is known about their reliance on input representations, target languages, data set size, and label noise. We address these issues and evaluate bi-LSTMs with word, character, and unicode byte embeddings for POS tagging. We compare bi-LSTMs to traditional POS taggers across languages and data sizes. We also present a novel biLSTM model, which combines the POS tagging loss function with an auxiliary loss function that accounts for rare words. The model obtains state-of-the-art performance across 22 languages, and works especially well for morphologically complex languages. Our analysis suggests that biLSTMs are less sensitive to training data size and label corruptions (at small noise levels) than previously assumed.

448 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of deriving an explicit approximate solution of the nonlinear power equations that describe a balanced power distribution network and propose an approximation that is linear in the active and reactive power demands of the PQ buses.
Abstract: We consider the problem of deriving an explicit approximate solution of the nonlinear power equations that describe a balanced power distribution network. We give sufficient conditions for the existence of a practical solution to the power flow equations, and we propose an approximation that is linear in the active and reactive power demands of the PQ buses. For this approximation, which is valid for generic power line impedances and grid topology, we derive a bound on the approximation error as a function of the grid parameters. We illustrate the quality of the approximation via simulations, we show how it can also model the presence of voltage controlled (PV) buses, and we discuss how it generalizes the DC power flow model to lossy networks.

407 citations


Proceedings Article
01 Jan 2016
TL;DR: This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
Abstract: Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(lambda). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.

356 citations
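The exponentially weighted advantage estimator described above has a simple backward recursion. A minimal NumPy sketch, assuming the caller supplies one extra bootstrap value for the final state:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Exponentially weighted advantage estimator, GAE(lambda):
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t),  A_t = sum_l (gamma*lam)^l * delta_{t+l}.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_T), i.e. one extra bootstrap value.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# Toy usage on a 5-step episode (final bootstrap value 0 for a terminal state).
print(gae_advantages([1.0, 0.0, 0.0, 1.0, 0.0], [0.5, 0.4, 0.3, 0.6, 0.2, 0.0]))
```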


Journal ArticleDOI
TL;DR: In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and it is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws.
Abstract: In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

324 citations
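The core recursion behind the value iteration ADP scheme, V_{i+1}(x) = min_u [ U(x,u) + V_i(F(x,u)) ], can be illustrated on a toy discretized scalar system; lookup tables stand in for the neural-network approximators used in the paper, and the dynamics, stage cost and termination threshold below are assumptions for illustration only.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 101)                    # state grid
us = np.linspace(-0.5, 0.5, 21)                     # control grid
F = lambda x, u: np.clip(0.9 * x + u, -1.0, 1.0)    # assumed toy dynamics
U = lambda x, u: x ** 2 + u ** 2                    # stage cost (utility)

V = np.zeros_like(xs)                               # arbitrary positive semi-definite start
for sweep in range(200):
    # Q(x, u) = U(x, u) + V_i(F(x, u)), with V_i interpolated on the grid.
    Q = np.array([[U(x, u) + np.interp(F(x, u), xs, V) for u in us] for x in xs])
    V_new = Q.min(axis=1)                           # V_{i+1}(x) = min_u Q(x, u)
    if np.max(np.abs(V_new - V)) < 1e-9:            # simple termination criterion
        V = V_new
        break
    V = V_new
policy = us[Q.argmin(axis=1)]                       # greedy iterative control law
print(V[50], policy[50])                            # value and control at x = 0
```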


Proceedings Article
02 May 2016
TL;DR: The proposed pooling operations provide a boost in invariance properties relative to conventional pooling and set the state of the art on several widely adopted benchmark datasets; they are also easy to implement, and can be applied within various deep neural network architectures.
Abstract: We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures. We pursue a careful exploration of approaches to allow pooling to learn and to adapt to complex and variable patterns. The two primary directions lie in (1) learning a pooling function via (two strategies of) combining of max and average pooling, and (2) learning a pooling function in the form of a tree-structured fusion of pooling filters that are themselves learned. In our experiments every generalized pooling operation we explore improves performance when used in place of average or max pooling. We experimentally demonstrate that the proposed pooling operations provide a boost in invariance properties relative to conventional pooling and set the state of the art on several widely adopted benchmark datasets; they are also easy to implement, and can be applied within various deep neural network architectures. These benefits come with only a light increase in computational overhead during training and a very modest increase in the number of model parameters.

313 citations
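One of the two strategies mentioned above, mixing max and average pooling with a learned combination weight, is easy to write down. A minimal NumPy sketch; the mixing weight is a fixed scalar here rather than a learned parameter:

```python
import numpy as np

def mixed_pool(x, a, k=2):
    """Mixed max-average pooling: out = a * max_pool(x) + (1 - a) * avg_pool(x).

    x: feature map of shape (H, W) with H and W divisible by the pool size k.
    """
    H, W = x.shape
    blocks = x.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3).reshape(H // k, W // k, k * k)
    return a * blocks.max(axis=-1) + (1 - a) * blocks.mean(axis=-1)

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(mixed_pool(fmap, a=0.7))
```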


Proceedings Article
04 Nov 2016
TL;DR: The Domain Transfer Network (DTN) is presented, which employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves.
Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domains, would remain unchanged. Other than the function f, the training data is unsupervised and consist of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

301 citations


Journal ArticleDOI
TL;DR: It is proved that as long as b is below a certain threshold, any predefined accuracy can be reached with less overall work than without mini-batching, and that the mini-batching scheme admits a simple parallel implementation, making it suitable for further acceleration by parallelization.
Abstract: We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions, and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times with the last iterate becoming the new starting point. The novelty of our method is in introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction based on this. We analyze the complexity of the method and show that it benefits from two speedup effects. First, we prove that as long as $b$ is below a certain threshold, we can reach any predefined accuracy with less overall work than without mini-batching. Second, our mini-batching scheme admits a simple parallel implementation, and hence is suitable for further acceleration by parallelization.
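The loop structure described in the abstract (a full-gradient computation at the starting point, followed by a number of mini-batched variance-reduced stochastic steps, with the last iterate becoming the new starting point) can be sketched as follows. Variable names, the distribution of the inner-loop length, and the step size are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def ms2gd(grad_i, prox, x0, n, m=50, b=8, step=0.1, epochs=10, rng=None):
    """Sketch of a mini-batched semi-stochastic gradient (mS2GD-style) loop.

    grad_i(w, i): gradient of the i-th smooth component at w.
    prox(w, step): proximal operator of the nonsmooth regularizer.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.copy()
    for _ in range(epochs):
        mu = np.mean([grad_i(x, i) for i in range(n)], axis=0)  # deterministic full gradient
        y = x.copy()
        t = rng.integers(1, m + 1)          # random inner-loop length
        for _ in range(t):
            batch = rng.choice(n, size=b, replace=False)
            v = np.mean([grad_i(y, i) - grad_i(x, i) for i in batch], axis=0) + mu
            y = prox(y - step * v, step)    # stochastic proximal step
        x = y                               # last iterate becomes the new starting point
    return x

# Toy usage: least squares split into n = 100 components with an l1 prox.
rng0 = np.random.default_rng(1)
A, b_vec = rng0.normal(size=(100, 5)), rng0.normal(size=100)
g = lambda w, i: (A[i] @ w - b_vec[i]) * A[i]
soft = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - 0.01 * t, 0.0)
print(ms2gd(g, soft, np.zeros(5), n=100))
```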

Proceedings ArticleDOI
Guannan Qu, Na Li
01 Dec 2016
TL;DR: This paper proposes a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converge to the optimum at a rate of O(1/t) when the objective function is convex and smooth, and at a linear rate when it is also strongly convex.
Abstract: There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. Literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate O(log t/√t) as the centralized (sub)gradient methods (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, under the framework of DGD, it is unclear how to harness the smoothness to obtain a faster convergence rate comparable to CGD's convergence rate. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converge to the optimum with a rate of O(1/t). If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rate of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve fast and accurate estimation of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information even if the objective function is strongly convex and smooth.
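The "gradient estimation scheme that uses history information" can be illustrated with a standard gradient-tracking iteration, which keeps a running estimate of the average gradient alongside the DGD-style mixing step. A minimal NumPy sketch; the specific update below is one common form of such a scheme, given as an illustration rather than the paper's exact algorithm.

```python
import numpy as np

def gradient_tracking(W, grads, x0, step=0.05, iters=500):
    """Gradient-tracking sketch:
        x^{k+1} = W x^k - eta * s^k
        s^{k+1} = W s^k + grad(x^{k+1}) - grad(x^k),   s^0 = grad(x^0)

    W: doubly stochastic mixing matrix (n x n); grads(X): row i is agent i's
    local gradient evaluated at its own iterate X[i].
    """
    X = x0.copy()
    G = grads(X)
    S = G.copy()                 # initialized to the local gradients
    for _ in range(iters):
        X_new = W @ X - step * S
        G_new = grads(X_new)
        S = W @ S + G_new - G    # track the average gradient using history
        X, G = X_new, G_new
    return X

# Toy usage: 3 agents minimizing sum_i (x - i)^2 over a small network.
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
grads = lambda X: 2 * (X - np.array([[0.0], [1.0], [2.0]]))
print(gradient_tracking(W, grads, np.zeros((3, 1))))   # all rows approach 1.0
```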

Journal ArticleDOI
TL;DR: An improved accuracy function under the IVPFS environment has been developed by taking into account the unknown hesitation degree, and has been applied to decision-making problems to show the validity, practicality and effectiveness of the new approach.
Abstract: The objective of the present work is twofold. Firstly, an interval-valued Pythagorean fuzzy set (IVPFS) has been introduced along with two aggregation operators, namely, interval-valued Pythagorean fuzzy weighted average and weighted geometric operators for different IVPFSs. Secondly, an improved accuracy function under the IVPFS environment has been developed by taking into account the unknown hesitation degree. The proposed function has been applied to decision-making problems to show the validity, practicality and effectiveness of the new approach. A systematic comparison between the existing work and the proposed work has also been given.

Journal ArticleDOI
TL;DR: In this article, a unified performance analysis of a single-link free-space optical (FSO) link that accounts for pointing errors and both types of detection techniques is presented.
Abstract: In this work, we present a unified performance analysis of a free-space optical (FSO) link that accounts for pointing errors and both types of detection techniques [i.e., intensity modulation/direct detection (IM/DD) and heterodyne detection]. More specifically, we present unified exact closed-form expressions for the cumulative distribution function, the probability density function, the moment generating function, and the moments of the end-to-end signal-to-noise ratio (SNR) of a single link FSO transmission system, all in terms of the Meijer’s G function except for the moments that is in terms of simple elementary functions. We then capitalize on these unified results to offer unified exact closed-form expressions for various performance metrics of FSO link transmission systems, such as the outage probability, the scintillation index (SI), the average error rate for binary and $M$ -ary modulation schemes, and the ergodic capacity (except for IM/DD technique, where we present closed-form lower bound results), all in terms of Meijer’s G functions except for the SI that is in terms of simple elementary functions. Additionally, we derive the asymptotic results for all the expressions derived earlier in terms of Meijer’s G function in the high SNR regime in terms of simple elementary functions via an asymptotic expansion of the Meijer’s G function. We also derive new asymptotic expressions for the ergodic capacity in the low as well as high SNR regimes in terms of simple elementary functions via utilizing moments. All the presented results are verified via computer-based Monte-Carlo simulations.

Journal ArticleDOI
TL;DR: It is shown that Steinmann relations dramatically simplify the function space for the hexagon function bootstrap in planar maximally supersymmetric Yang-Mills theory, and the complete five-loop six-particle amplitude is obtained.
Abstract: The analytic structure of scattering amplitudes is restricted by Steinmann relations, which enforce the vanishing of certain discontinuities of discontinuities. We show that these relations dramatically simplify the function space for the hexagon function bootstrap in planar maximally supersymmetric Yang-Mills theory. Armed with this simplification, along with the constraints of dual conformal symmetry and Regge exponentiation, we obtain the complete five-loop six-particle amplitude.

Journal ArticleDOI
TL;DR: It is proved that the rate of convergence of a slight variant of Nesterov's accelerated forward-backward method, which produces convergent sequences, is actually $o(k^{-2})$, rather than $\mathcal O(k^{-2})$.
Abstract: The forward-backward algorithm is a powerful tool for solving optimization problems with an additively separable and smooth plus nonsmooth structure. In the convex setting, a simple but ingenious acceleration scheme developed by Nesterov improves the theoretical rate of convergence for the function values from the standard $\mathcal O(k^{-1})$ down to $\mathcal O(k^{-2})$. In this short paper, we prove that the rate of convergence of a slight variant of Nesterov's accelerated forward-backward method, which produces convergent sequences, is actually $o(k^{-2})$, rather than $\mathcal O(k^{-2})$. Our arguments rely on the connection between this algorithm and a second-order differential inclusion with vanishing damping.
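The algorithm in question is a forward-backward splitting with a Nesterov-style inertial step whose coefficient is slightly over-relaxed so that the iterates converge. A minimal NumPy sketch applied to a lasso problem; the exact inertial parametrization and the value of alpha below are assumptions for illustration.

```python
import numpy as np

def accelerated_fb(grad_f, prox_g, x0, step, alpha=4.0, iters=500):
    """Accelerated forward-backward sketch with inertial coefficient k/(k + alpha),
    alpha > 3, a parametrization of the kind studied in this line of work."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, iters + 1):
        y = x + (k / (k + alpha)) * (x - x_prev)        # inertial (extrapolation) step
        x_prev = x
        x = prox_g(y - step * grad_f(y), step)          # forward-backward step
    return x

# Toy usage: lasso  min_x 0.5*||Ax - b||^2 + lam*||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(20, 10)), rng.normal(size=20), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
step = 1.0 / np.linalg.norm(A, 2) ** 2                  # 1/L with L = ||A||_2^2
print(accelerated_fb(grad_f, prox_g, np.zeros(10), step))
```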

Journal ArticleDOI
20 Jan 2016 - Neuron
TL;DR: A simple holographic method to simultaneously perform two-photon calcium imaging of neuronal populations across multiple areas and layers of mouse cortex in vivo using prior knowledge of neuronal locations, activity sparsity, and a constrained nonnegative matrix factorization algorithm is presented.

Proceedings ArticleDOI
24 Oct 2016
TL;DR: In this article, a tensoring operation was introduced to obtain a conceptually simpler derivation of previous constructions and present new constructions for m-party FSS schemes, which are useful for applications that involve private reading from or writing to distributed databases while minimizing the amount of communication.
Abstract: Function Secret Sharing (FSS), introduced by Boyle et al. (Eurocrypt 2015), provides a way for additively secret-sharing a function from a given function family F. More concretely, an m-party FSS scheme splits a function f : {0,1}^n -> G, for some abelian group G, into functions f1,...,fm, described by keys k1,...,km, such that f = f1 + ... + fm and every strict subset of the keys hides f. A Distributed Point Function (DPF) is a special case where F is the family of point functions, namely functions f_{a,b} that evaluate to b on the input a and to 0 on all other inputs. FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication. These include different flavors of private information retrieval (PIR), as well as a recent application of DPF for large-scale anonymous messaging. We improve and extend previous results in several ways: * Simplified FSS constructions. We introduce a tensoring operation for FSS which is used to obtain a conceptually simpler derivation of previous constructions and present our new constructions. * Improved 2-party DPF. We reduce the key size of the PRG-based DPF scheme of Boyle et al. roughly by a factor of 4 and optimize its computational cost. The optimized DPF significantly improves the concrete costs of 2-server PIR and related primitives. * FSS for new function families. We present an efficient PRG-based 2-party FSS scheme for the family of decision trees, leaking only the topology of the tree and the internal node labels. We apply this towards FSS for multi-dimensional intervals. We also present a general technique for extending FSS schemes by increasing the number of parties. * Verifiable FSS. We present efficient protocols for verifying that keys (k*_1,...,k*_m), obtained from a potentially malicious user, are consistent with some f in F. Such a verification may be critical for applications that involve private writing or voting by many users.
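To make the additive-sharing requirement concrete, here is a deliberately naive 2-party DPF in which each key is a full XOR share of the point function's truth table, so the key size is exponential in n. It illustrates only the correctness and single-key privacy properties; the paper's contribution is precisely to replace such tables with short PRG-based keys.

```python
import secrets

def naive_dpf_gen(n, a, b_val):
    """Split the point function f_{a,b} (with outputs in the XOR group of bytes)
    into two keys. Each key is a length-2^n table; k1 XOR k2 equals the truth
    table of f_{a,b}, and either key alone is a uniformly random table, hence
    hides (a, b)."""
    k1 = [secrets.randbits(8) for _ in range(2 ** n)]
    k2 = list(k1)
    k2[a] ^= b_val                      # only position a differs between the shares
    return k1, k2

def naive_dpf_eval(key, x):
    return key[x]

k1, k2 = naive_dpf_gen(n=4, a=9, b_val=7)
for x in (3, 9):
    # Shares XOR-combine to f(x): 0 everywhere except f(9) = 7.
    print(x, naive_dpf_eval(k1, x) ^ naive_dpf_eval(k2, x))
```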

Journal ArticleDOI
TL;DR: In this paper, a combined model of generalized bilinear Kadomtsev-Petviashvili and Boussinesq equation (gbKPB for short) in terms of the function f is proposed, which involves four arbitrary coefficients.
Abstract: Associated with the prime number $$p=3$$ , a combined model of generalized bilinear Kadomtsev–Petviashvili and Boussinesq equation (gbKPB for short) in terms of the function f is proposed, which involves four arbitrary coefficients. To guarantee the existence of lump solutions, a constraint among these four coefficients is presented firstly, and then, the lump solutions are constructed and classified via searching for positive quadratic function solutions to the gbKPB equation. Different conditions posed on lump parameters are investigated to keep the analyticity and rational localization of the resulting solutions. Finally, 3-dimensional plots, density plots and 2-dimensional curves with particular choices of the involved parameters are given to show the profile characteristics of the presented lump solutions for the potential function $$u=2(\mathrm{{ln}}f)_x$$ .

Journal ArticleDOI
TL;DR: This work describes convolutional neural network scoring functions that take as input a comprehensive three-dimensional representation of a protein-ligand interaction and finds that the CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
Abstract: Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive 3D representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and non-binders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
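A CNN scoring function of the kind described above takes a voxelized 3D grid of the protein-ligand complex and outputs a score. A minimal PyTorch sketch; the grid size, channel count and layer widths are assumptions for illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ConvScorer(nn.Module):
    """Toy 3D-CNN scoring function over a voxelized protein-ligand grid."""
    def __init__(self, in_channels=16, grid=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.Linear(64 * (grid // 4) ** 3, 2)  # pose correct / incorrect

    def forward(self, x):               # x: (batch, channels, grid, grid, grid)
        h = self.features(x).flatten(1)
        return self.classifier(h)       # logits to be trained with cross-entropy

scores = ConvScorer()(torch.randn(2, 16, 24, 24, 24))
print(scores.shape)                     # torch.Size([2, 2])
```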

Proceedings Article
01 Jan 2016
TL;DR: The parallel knowledge gradient method as discussed by the authors provides the one-step Bayes-optimal batch of points to sample, and it finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms.
Abstract: In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.

Journal ArticleDOI
TL;DR: In this article, the authors used the hexagon function bootstrap to compute the ratio function which characterizes the next-to-maximally-helicity-violating (NMHV) six-point amplitude in planar N=4 super-Yang-Mills theory at four loops.
Abstract: We use the hexagon function bootstrap to compute the ratio function which characterizes the next-to-maximally-helicity-violating (NMHV) six-point amplitude in planar N=4 super-Yang-Mills theory at four loops. A powerful constraint comes from dual superconformal invariance, in the form of a Q differential equation, which heavily constrains the first derivatives of the transcendental functions entering the ratio function. At four loops, it leaves only a 34-parameter space of functions. Constraints from the collinear limits, and from the multi-Regge limit at the leading-logarithmic (LL) and next-to-leading-logarithmic (NLL) order, suffice to fix these parameters and obtain a unique result. We test the result against multi-Regge predictions at NNLL and N^3LL, and against predictions from the operator product expansion involving one and two flux-tube excitations; all cross-checks are satisfied. We study the analytical and numerical behavior of the parity-even and parity-odd parts on various lines and surfaces traversing the three-dimensional space of cross ratios. As part of this program, we characterize all irreducible hexagon functions through weight eight in terms of their coproduct. We also provide representations of the ratio function in particular kinematic regions in terms of multiple polylogarithms.

Proceedings ArticleDOI
TL;DR: This paper enhances an Autoencoder-based Recommender System architecture by using a loss function adapted to input data with missing values and by incorporating side information, demonstrating that while side information only slightly improves the test error averaged over all users/items, it has more impact on cold users/items.
Abstract: A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings. In the last decades, few attempts were made to handle that objective with Neural Networks, but recently an architecture based on Autoencoders proved to be a promising approach. In the current paper, we enhance that architecture (i) by using a loss function adapted to input data with missing values, and (ii) by incorporating side information. The experiments demonstrate that while side information only slightly improves the test error averaged over all users/items, it has more impact on cold users/items.
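The first enhancement, a loss adapted to input data with missing values, amounts to evaluating the reconstruction error only on observed ratings. A minimal NumPy sketch:

```python
import numpy as np

def masked_mse(pred, target, observed):
    """Reconstruction loss computed only over observed entries of the
    user/item rating matrix (observed is a 0/1 mask)."""
    return np.sum(observed * (pred - target) ** 2) / np.maximum(observed.sum(), 1)

ratings = np.array([[5.0, 0.0, 3.0], [0.0, 4.0, 0.0]])
mask = (ratings > 0).astype(float)           # 1 where a rating is known
recon = np.array([[4.5, 2.0, 3.5], [1.0, 4.0, 2.0]])
print(masked_mse(recon, ratings, mask))      # averages over the 3 known entries
```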

Journal ArticleDOI
TL;DR: Even with a very simple dynamical model it can be shown that functional relations between nodes of a realistic anatomical network display clear patterns if the system is studied near the critical transition, especially for disconnected nodes and in combination with an effective connectivity measure.

Journal ArticleDOI
TL;DR: This paper proposes three different methods to build preaggregation functions and experimentally shows that, in fuzzy rule-based classification systems, the method based on the Choquet integral with the product replaced by another aggregation function improves on the results obtained with two classical averaging operators, the maximum and the Choquet integral.
Abstract: In this paper, we introduce the notion of preaggregation function. Such a function satisfies the same boundary conditions as an aggregation function, but, instead of requiring monotonicity, only monotonicity along some fixed direction (directional monotonicity) is required. We present some examples of such functions. We propose three different methods to build preaggregation functions. We experimentally show that in fuzzy rule-based classification systems, when we use one of these methods, namely, the one based on the use of the Choquet integral replacing the product by other aggregation functions, if we consider the minimum or the Hamacher product t-norms for such construction, we improve the results obtained when applying the fuzzy reasoning methods obtained using two classical averaging operators such as the maximum and the Choquet integral.
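The construction singled out above, taking the discrete Choquet integral and replacing the product between increments and measure values by a t-norm such as the minimum or the Hamacher product, can be sketched as follows; the minimum t-norm and the cardinality-based fuzzy measure below are chosen only for illustration.

```python
import numpy as np

def choquet_like(x, measure, tnorm=min):
    """Choquet-integral-based construction: the product between the increment
    x_(i) - x_(i-1) and the measure of the set A_(i) of indices with the
    largest values is replaced by a t-norm, yielding a directionally monotone
    preaggregation function rather than an aggregation function.

    x: values in [0, 1]; measure(S): fuzzy measure of a subset S of indices,
    with measure(empty) = 0 and measure(all indices) = 1.
    """
    order = np.argsort(x)
    prev, total = 0.0, 0.0
    for i, idx in enumerate(order):
        a_i = frozenset(order[i:])                  # indices with the largest values
        total += tnorm(x[idx] - prev, measure(a_i))
        prev = x[idx]
    return total

# Toy usage with the cardinality-based measure m(S) = |S| / n.
x = np.array([0.2, 0.9, 0.5])
print(choquet_like(x, lambda s: len(s) / len(x)))
```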

Journal ArticleDOI
TL;DR: In this paper, the authors show that any scoring function that is consistent for a quantile or an expectile functional can be represented as a mixture of elementary or extremal scoring functions that form a linearly parameterized family.
Abstract: In the practice of point prediction, it is desirable that forecasters receive a directive in the form of a statistical functional. For example, forecasters might be asked to report the mean or a quantile of their predictive distributions. When evaluating and comparing competing forecasts, it is then critical that the scoring function used for these purposes be consistent for the functional at hand, in the sense that the expected score is minimized when following the directive. We show that any scoring function that is consistent for a quantile or an expectile functional can be represented as a mixture of elementary or extremal scoring functions that form a linearly parameterized family. Scoring functions for the mean value and probability forecasts of binary events constitute important examples. The extremal scoring functions admit appealing economic interpretations of quantiles and expectiles in the context of betting and investment problems. The Choquet-type mixture representations give rise to simple checks of whether a forecast dominates another in the sense that it is preferable under any consistent scoring function. In empirical settings it suffices to compare the average scores for only a finite number of extremal elements. Plots of the average scores with respect to the extremal scoring functions, which we call Murphy diagrams, permit detailed comparisons of the relative merits of competing forecasts.
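The mixture representation suggests a practical recipe: evaluate the average elementary score over a grid of thresholds and plot it against the threshold (a Murphy diagram). A minimal NumPy sketch for the quantile case, using one standard form of the elementary quantile score; the exact tie-handling convention is an assumption.

```python
import numpy as np

def elementary_quantile_score(x, y, alpha, theta):
    """One form of the elementary (extremal) scoring function for the
    alpha-quantile at threshold theta; it is nonzero only when theta
    separates the forecast x from the observation y."""
    return ((y < x).astype(float) - alpha) * ((theta < x).astype(float) - (theta < y).astype(float))

def murphy_curve(forecast, obs, alpha, thetas):
    """Average elementary score at each threshold; plotting this against
    theta gives a Murphy diagram for the forecast."""
    return np.array([elementary_quantile_score(forecast, obs, alpha, t).mean() for t in thetas])

rng = np.random.default_rng(1)
obs = rng.normal(size=1000)
forecast = np.full(1000, 1.2816)            # the 0.9-quantile of a standard normal
print(murphy_curve(forecast, obs, 0.9, np.linspace(-3, 3, 13)))
```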

Journal ArticleDOI
TL;DR: In this paper, a parameter extraction technique for the five-parameter solar-cell model is presented, which only requires prior knowledge of three load points: the open circuit, the short circuit, and the maximum power points.
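For context, the five-parameter (single-diode) model referred to here is the standard implicit I-V relation, stated for reference; the extraction technique itself is the paper's contribution:

$$I = I_{ph} - I_0\left[\exp\!\left(\frac{V + I R_s}{a V_T}\right) - 1\right] - \frac{V + I R_s}{R_{sh}},$$

with the five parameters being the photocurrent $I_{ph}$, the diode saturation current $I_0$, the ideality factor $a$, and the series and shunt resistances $R_s$ and $R_{sh}$ ($V_T$ is the thermal voltage). The open-circuit, short-circuit, and maximum-power points mentioned above provide the conditions used to identify these parameters.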

Proceedings ArticleDOI
25 Jul 2016
TL;DR: This paper presents an iterative distributed algorithm that achieves optimal fault-tolerance and ensures that at least |N|-f agents have weights that are bounded away from 0 (in particular, lower bounded by 1/(2(|N|-f))).
Abstract: This paper addresses the problem of distributed multi-agent optimization in which each agent i has a local cost function hi(x), and the goal is to optimize a global cost function that aggregates the local cost functions. Such optimization problems are of interest in many contexts, including distributed machine learning, distributed resource allocation, and distributed robotics. We consider the distributed optimization problem in the presence of faulty agents. We focus primarily on Byzantine failures, but also briefly discuss some results for crash failures. For the Byzantine fault-tolerant optimization problem, the ideal goal is to optimize the average of local cost functions of the non-faulty agents. However, this goal also cannot be achieved. Therefore, we consider a relaxed version of the fault-tolerant optimization problem. The goal for the relaxed problem is to generate an output that is an optimum of a global cost function formed as a convex combination of local cost functions of the non-faulty agents. More precisely, there must exist weights αi for i ∈ N such that αi ≥ 0 and ∑_{i∈N} αi = 1, and the output is an optimum of the cost function ∑_{i∈N} αi hi(x). Ideally, we would like αi = 1/|N| for all i ∈ N; however, this cannot be guaranteed due to the presence of faulty agents. In fact, the maximum number of nonzero weights (αi's) that can be guaranteed is |N|-f, where f is the maximum number of Byzantine faulty agents. We present an iterative distributed algorithm that achieves optimal fault-tolerance. Specifically, it ensures that at least |N|-f agents have weights that are bounded away from 0 (in particular, lower bounded by 1/(2(|N|-f))). The proposed distributed algorithm has a simple iterative structure, with each agent maintaining only a small amount of local state. We show that the iterative algorithm ensures two properties as time goes to ∞: consensus (i.e., the output of non-faulty agents becomes identical in the time limit), and optimality (in the sense that the output is the optimum of a suitably defined global cost function).

Journal ArticleDOI
TL;DR: In this article, the authors consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework and show that the averaged unregularized least-mean-square algorithm attains optimal rates of convergence for a variety of regimes for the smoothness of the optimal prediction function and the functions in $\mathcal{H}$.
Abstract: We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS $\mathcal{H}$, even if the optimal predictor (i.e., the conditional expectation) is not in $\mathcal{H}$. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-square algorithm (a form of stochastic gradient), given a sufficient large step-size, attains optimal rates of convergence for a variety of regimes for the smoothnesses of the optimal prediction function and the functions in $\mathcal{H}$.
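The estimator studied here, an unregularized least-mean-square recursion in the RKHS followed by averaging of the iterates, can be sketched with an explicit kernel expansion. A minimal NumPy sketch; the kernel, bandwidth and step size below are assumptions for illustration.

```python
import numpy as np

def averaged_kernel_lms(xs, ys, kernel, step):
    """Unregularized kernel LMS with Polyak-Ruppert averaging: after each
    observation (x_n, y_n), take a stochastic-gradient step on the squared
    error in the RKHS and keep a running average of the iterates. The
    estimator is represented by its kernel-expansion coefficients."""
    n = len(xs)
    coef = np.zeros(n)          # coefficients of the current iterate
    avg = np.zeros(n)           # coefficients of the averaged estimator
    for k in range(n):
        K_row = np.array([kernel(xs[k], xs[j]) for j in range(k)])
        pred = K_row @ coef[:k] if k else 0.0
        coef[k] = step * (ys[k] - pred)      # f_k = f_{k-1} + step*(y_k - f_{k-1}(x_k)) K(x_k, .)
        avg = avg + (coef - avg) / (k + 1)   # running average of the iterates
    return avg                               # predict with sum_j avg[j] * kernel(x, xs[j])

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=200)
ys = np.sin(3 * xs) + 0.1 * rng.normal(size=200)
gauss = lambda a, b: np.exp(-(a - b) ** 2 / 0.1)
coef = averaged_kernel_lms(xs, ys, gauss, step=0.5)
print(sum(c * gauss(0.3, x) for c, x in zip(coef, xs)))  # prediction at x = 0.3; the regression function there is sin(0.9)
```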