
Showing papers by "Ali H. Sayed published in 2016"


Journal ArticleDOI
TL;DR: Adaptive filters are at the core of many signal processing applications, ranging from acoustic noise suppression to echo cancellation to array beamforming.
Abstract: Adaptive filters are at the core of many signal processing applications, ranging from acoustic noise suppression to echo cancellation [1], array beamforming [2], and channel equalization [3], to more recent sensor network applications in surveillance, target localization, and tracking. A trending approach in this direction is to resort to in-network distributed processing, in which individual nodes implement adaptation rules and diffuse their estimates across the network [4], [5].
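The adaptation rule at the heart of such filters is typically an LMS-type update. As a minimal illustration — a hypothetical stand-alone (non-distributed) setup, not the networked algorithms surveyed in the paper — a single LMS filter identifying an unknown FIR response might look like:

```python
import numpy as np

# Minimal single-agent LMS sketch (illustrative setup): identify an unknown
# FIR system w_true from a stream of input samples, noise-free for clarity.
rng = np.random.default_rng(0)
M = 4                          # filter length
w_true = rng.standard_normal(M)
w = np.zeros(M)                # adaptive weight estimate
mu = 0.05                      # step-size

x = rng.standard_normal(2000)  # streaming input
for n in range(M, len(x)):
    u = x[n - M:n][::-1]       # regressor (most recent sample first)
    d = w_true @ u             # desired response
    e = d - w @ u              # a priori estimation error
    w = w + mu * e * u         # LMS update

err = np.linalg.norm(w - w_true)
```

With white input and a small step-size, the weight-error norm `err` decays to a negligible value; distributed (diffusion) variants interleave such updates with combination steps across neighbors.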

115 citations


Journal ArticleDOI
TL;DR: This work considers multitask learning problems where clusters of nodes are interested in estimating their own parameter vector and proposes a fully distributed algorithm that relies on minimizing a global mean-square error criterion regularized by nondifferentiable terms to promote cooperation among neighboring clusters.
Abstract: In this work, we consider multitask learning problems where clusters of nodes are interested in estimating their own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters have a good number of similar entries. We propose a fully distributed algorithm for solving this problem. The approach relies on minimizing a global mean-square error criterion regularized by nondifferentiable terms to promote cooperation among neighboring clusters. A general diffusion forward–backward splitting strategy is introduced. Then, it is specialized to the case of sparsity promoting regularizers. A closed-form expression for the proximal operator of a weighted sum of $\ell_1$-norms is derived to achieve higher efficiency. We also provide conditions on the step-sizes that ensure convergence of the algorithm in the mean and mean-square error sense. Simulations are conducted to illustrate the effectiveness of the strategy.
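For the sparsity-promoting case, the proximal operator of a single weighted ℓ1 term has the well-known closed form of entrywise soft-thresholding. A minimal sketch of that building block (illustrative only; the paper derives the operator for a weighted sum of ℓ1-norms):

```python
import numpy as np

def prox_weighted_l1(v, weights, step):
    """Proximal operator of x -> sum_i weights[i] * |x[i]|:
    entrywise soft-thresholding with per-entry thresholds step * weights[i].
    This is the standard closed form for a single weighted l1 term."""
    t = step * np.asarray(weights)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([3.0, -0.2, 1.0, -2.5])
w = np.array([1.0, 1.0, 2.0, 0.5])
p = prox_weighted_l1(v, w, step=0.5)   # -> [2.5, 0.0, 0.0, -2.25]
```

In a forward-backward splitting scheme, a gradient (forward) step on the smooth mean-square-error term is followed by this proximal (backward) step on the nondifferentiable regularizer.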

74 citations


Posted Content
TL;DR: In this paper, the authors proposed an asynchronous, decentralized algorithm for consensus optimization, where each agent can compute and communicate independently at different times, for different durations, with the information it has even if the latest information from its neighbors is not yet available.
Abstract: We propose an asynchronous, decentralized algorithm for consensus optimization. The algorithm runs over a network in which the agents communicate with their neighbors and perform local computation. In the proposed algorithm, each agent can compute and communicate independently at different times, for different durations, with the information it has even if the latest information from its neighbors is not yet available. Such an asynchronous algorithm reduces the time that agents would otherwise waste idle because of communication delays or because their neighbors are slower. It also eliminates the need for a global clock for synchronization. Mathematically, the algorithm involves both primal and dual variables, uses fixed step-size parameters, and provably converges to the exact solution under a bounded delay assumption and a random agent assumption. When running synchronously, the algorithm performs just as well as existing competitive synchronous algorithms such as PG-EXTRA, which diverges without synchronization. Numerical experiments confirm the theoretical findings and illustrate the performance of the proposed algorithm.

59 citations


Journal ArticleDOI
TL;DR: In this article, a model for the solution of multitask problems over asynchronous networks is described and a detailed mean and mean-square error analysis is carried out, which shows that sufficiently small step-sizes can still ensure both stability and performance.
Abstract: The multitask diffusion LMS is an efficient strategy to simultaneously infer, in a collaborative manner, multiple parameter vectors. Existing works on multitask problems assume that all agents respond to data synchronously. In several applications, agents may not be able to act synchronously because networks can be subject to several sources of uncertainties such as changing topology, random link failures, or agents turning on and off for energy conservation. In this paper, we describe a model for the solution of multitask problems over asynchronous networks and carry out a detailed mean and mean-square error analysis. Results show that sufficiently small step-sizes can still ensure both stability and performance. Simulations and illustrative examples are provided to verify the theoretical findings.

52 citations


Journal ArticleDOI
TL;DR: In this paper, a robust adaptive filtering algorithm is developed that effectively learns and tracks the output error distribution to improve estimation performance.
Abstract: The popular least-mean-squares (LMS) algorithm for adaptive filtering is nonrobust against impulsive noise in the measurements. The presence of this type of noise degrades the transient and steady-state performance of the algorithm. Since the distribution of the impulsive noise is generally unknown, a robust semi-parametric approach to adaptive filtering is warranted, where the output error nonlinearity is adapted jointly with the parameter of interest. In this paper, a robust adaptive filtering algorithm is developed that effectively learns and tracks the output error distribution to improve estimation performance. The performance of the algorithm is analyzed mathematically and validated experimentally.
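A classical fixed error nonlinearity such as the sign function already illustrates why bounding the error term confers robustness to impulsive noise. The sketch below uses sign-error LMS with an assumed contamination model — it is a hypothetical baseline, not the paper's jointly adaptive nonlinearity:

```python
import numpy as np

# Sign-error LMS sketch: clipping the error bounds each update, so rare
# large-amplitude impulses cannot throw the weight estimate far off course.
rng = np.random.default_rng(1)
M, mu = 4, 0.01
w_true = rng.standard_normal(M)
w = np.zeros(M)

for n in range(5000):
    u = rng.standard_normal(M)
    noise = 0.01 * rng.standard_normal()
    if rng.random() < 0.05:                # 5% impulsive outliers
        noise += 100.0 * rng.standard_normal()
    d = w_true @ u + noise
    e = d - w @ u
    w = w + mu * np.sign(e) * u            # update is bounded regardless of |e|

err = np.linalg.norm(w - w_true)
```

Plain LMS would scale each update by the raw error `e`, so a single impulse of magnitude 100 perturbs the weights a thousand times more strongly than here; the adaptive semi-parametric approach in the paper learns a better nonlinearity than the fixed sign function.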

43 citations


Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work analyzes the theoretical performance of the single-task diffusion LMS when it is run, intentionally or unintentionally, in a multitask environment in the presence of noisy links and introduces an improved strategy that allows the agents to promote or reduce exchanges of information with their neighbors.
Abstract: Diffusion LMS is an efficient strategy for solving distributed optimization problems with cooperating agents. In some applications, the optimum parameter vectors may not be the same for all agents. Moreover, agents usually exchange information through noisy communication links. In this work, we analyze the theoretical performance of the single-task diffusion LMS when it is run, intentionally or unintentionally, in a multitask environment in the presence of noisy links. To reduce the impact of these nuisance factors, we introduce an improved strategy that allows the agents to promote or reduce exchanges of information with their neighbors.

40 citations


Journal ArticleDOI
TL;DR: In this article, a scaling law for the steady-state probabilities of miss detection and false alarm in the slow adaptation regime was established for distributed detection schemes over fully decentralized networks, where the agents interact with each other according to distributed strategies that employ small constant step-sizes.
Abstract: This paper examines the close interplay between cooperation and adaptation for distributed detection schemes over fully decentralized networks. The combined attributes of cooperation and adaptation are necessary to enable networks of detectors to continually learn from streaming data and to continually track drifts in the state of nature when deciding in favor of one hypothesis or another. The results in this paper establish a fundamental scaling law for the steady-state probabilities of miss-detection and false alarm in the slow adaptation regime, when the agents interact with each other according to distributed strategies that employ small constant step-sizes. The latter are critical to enable continuous adaptation and learning. This paper establishes three key results. First, it is shown that the output of the collaborative process at each agent has a steady-state distribution. Second, it is shown that this distribution is asymptotically Gaussian in the slow adaptation regime of small step-sizes. Third, by carrying out a detailed large deviations analysis, closed-form expressions are derived for the decaying rates of the false-alarm and miss-detection probabilities. Interesting insights are gained from these expressions. In particular, it is verified that as the step-size $\mu$ decreases, the error probabilities are driven to zero exponentially fast as functions of $1/\mu$, and that the exponents governing the decay increase linearly in the number of agents. It is also verified that the scaling laws governing the errors of detection and the errors of estimation over the network behave very differently, with the former having exponential decay proportional to $1/\mu$, while the latter scales linearly with decay proportional to $\mu$. Moreover, and interestingly, it is shown that the cooperative strategy allows each agent to reach the same detection performance, in terms of detection error exponents, of a centralized stochastic-gradient solution. The results of this paper are illustrated by applying them to canonical distributed detection problems.

36 citations


Journal ArticleDOI
26 Sep 2016
TL;DR: In this paper, the authors consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions, and employ diffusion algorithms with constant step-size, analyzed through the framework of exact asymptotics.
Abstract: We consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions. The requirement of adaptation allows the network of detectors to track drifts in the underlying hypothesis. The requirement of cooperation allows each agent to deliver a performance superior to what would be obtained if it were acting individually. The simultaneous requirements of adaptation and cooperation are achieved by employing diffusion algorithms with constant step-size $\mu$. By conducting a refined asymptotic analysis based on the mathematical framework of exact asymptotics, we arrive at a revealing understanding of the universal behavior of distributed detection over adaptive networks: as functions of $1/\mu$, the error (log-)probability curves corresponding to different agents stay nearly-parallel to each other (as already discovered in [1] and [2]); however, these curves are ordered following a criterion reflecting the degree of connectivity of each agent. Depending on the combination weights, the more connected an agent is, the lower its error probability curve will be. The analysis provides explicit analytical formulas for the detection error probabilities and these expressions are also verified by means of extensive simulations. We further enlarge the reference setting from the case of doubly-stochastic combination matrices considered in [1] and [2] to the more general and demanding setting of right-stochastic combination matrices; this extension poses new and interesting questions in terms of the interplay between the network topology, the combination weights, and the inference performance. The potential of the proposed methods is illustrated by application of the results to canonical detection problems, to typical network topologies, for both doubly-stochastic and right-stochastic combination matrices. Interesting and somehow unexpected behaviors emerge, and the lesson learned is that connectivity matters.

35 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the learning ability of consensus and diffusion learners from continuous streams of data arising from different but related statistical distributions and derived closed-form expressions for the evolution of their excess-risk under a diminishing step-size rule.
Abstract: This paper studies the learning ability of consensus and diffusion distributed learners from continuous streams of data arising from different but related statistical distributions. Four distinctive features for diffusion learners are revealed in relation to other decentralized schemes even under left-stochastic combination policies. First, closed-form expressions for the evolution of their excess-risk are derived for strongly convex risk functions under a diminishing step-size rule. Second, using these results, it is shown that the diffusion strategy improves the asymptotic convergence rate of the excess-risk relative to non-cooperative schemes. Third, it is shown that when the in-network cooperation rules are designed optimally, the performance of the diffusion implementation can outperform that of naive centralized processing. Finally, the arguments further show that diffusion outperforms consensus strategies asymptotically and that the asymptotic excess-risk expression is invariant to the particular network topology. The framework adopted in this paper studies convergence in the stronger mean-square-error sense, rather than in distribution, and develops tools that enable a close examination of the differences between distributed strategies in terms of asymptotic behavior, as well as in terms of convergence rates.

31 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the learning mechanism of adaptive agents over weakly connected graphs and reveal an interesting behavior on how information flows through such topologies, and explain why strong-connectivity of the network topology, adaptation of the combination weights, and clustering of agents are important ingredients to equalize the learning abilities of all agents against such disturbances.
Abstract: This paper examines the learning mechanism of adaptive agents over weakly connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower relationship develops with the performance of some agents being fully determined by the performance of other agents that are outside their domain of influence. This scenario can arise, for example, due to intruder attacks by malicious agents or as the result of failures by some critical links. The findings in this paper help explain why strong-connectivity of the network topology, adaptation of the combination weights, and clustering of agents are important ingredients to equalize the learning abilities of all agents against such disturbances. The results also clarify how weak-connectivity can be helpful in reducing the effect of outlier data on learning performance.

29 citations


Proceedings ArticleDOI
01 Sep 2016
TL;DR: This work draws from recent results in the field of online adaptation to derive new tight performance expressions for empirical implementations of stochastic gradient descent, mini-batchgradient descent, and importance sampling, and proposes an optimal importance sampling algorithm to optimize performance.
Abstract: The minimization of empirical risks over finite sample sizes is an important problem in large-scale machine learning. A variety of algorithms has been proposed in the literature to alleviate the computational burden per iteration at the expense of convergence speed and accuracy. Many of these approaches can be interpreted as stochastic gradient descent algorithms, where data is sampled from particular empirical distributions. In this work, we leverage this interpretation and draw from recent results in the field of online adaptation to derive new tight performance expressions for empirical implementations of stochastic gradient descent, mini-batch gradient descent, and importance sampling. The expressions are exact to first order in the step-size parameter and are tighter than existing bounds. We further quantify the performance gained from employing mini-batch solutions, and propose an optimal importance sampling algorithm to optimize performance.
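The unbiasedness requirement behind importance sampling is that a loss drawn with probability $p_i$ must have its gradient re-weighted by $1/(N p_i)$. A minimal sketch on an illustrative least-squares risk, with sampling proportional to row energy (an assumed choice, not necessarily the paper's optimal distribution):

```python
import numpy as np

# SGD with importance sampling over a finite sample of N least-squares
# losses; the 1/(N * p[i]) factor keeps the gradient estimate unbiased
# for the empirical risk (1/N) * sum_i 0.5 * (A[i] @ x - b[i])**2.
rng = np.random.default_rng(2)
N, M = 200, 3
A = rng.standard_normal((N, M))
x_true = rng.standard_normal(M)
b = A @ x_true                        # consistent (noise-free) targets

row_energy = np.linalg.norm(A, axis=1) ** 2
p = row_energy / row_energy.sum()     # sample proportional to row energy

x, mu = np.zeros(M), 0.01
for _ in range(5000):
    i = rng.choice(N, p=p)
    g = (A[i] @ x - b[i]) * A[i]      # gradient of the i-th loss
    x = x - mu * g / (N * p[i])       # importance-sampling re-weighting

err = np.linalg.norm(x - x_true)
```

Because the targets are consistent here, every individual loss is minimized at `x_true` and the constant-step iteration converges to it; with noisy targets the iterate would instead hover within a step-size-dependent neighborhood, which is what the tight performance expressions in the paper quantify.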

Proceedings ArticleDOI
20 Mar 2016
TL;DR: The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value, and suggests a method to enhance performance in the Stochastic setting by tuning the momentum parameter over time.
Abstract: This paper examines the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the stochastic setting when gradient noise is present and continuous adaptation is necessary. The analysis suggests a method to enhance performance in the stochastic setting by tuning the momentum parameter over time.
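The claimed equivalence can be checked numerically on a scalar quadratic risk: heavy-ball SGD with parameters $(\mu, \beta)$ should deliver essentially the same steady-state mean-square deviation as plain SGD run with the re-scaled step $\mu/(1-\beta)$. The setup below is an illustrative sketch, not the paper's experiment:

```python
import numpy as np

# Compare the steady-state MSD of momentum (heavy-ball) SGD against plain
# SGD with the re-scaled step-size mu/(1-beta), on the noisy quadratic
# risk 0.5*(w - 1)^2 with unit-variance gradient noise.
rng = np.random.default_rng(3)
mu, beta, sigma = 0.002, 0.5, 1.0
T, burn = 200000, 20000

def run(momentum):
    w, v, acc, cnt = 0.0, 0.0, 0.0, 0
    step = mu if momentum else mu / (1 - beta)
    for t in range(T):
        g = (w - 1.0) + sigma * rng.standard_normal()  # noisy gradient
        if momentum:
            v = beta * v + g
            w -= step * v                              # heavy-ball update
        else:
            w -= step * g                              # plain SGD update
        if t >= burn:
            acc += (w - 1.0) ** 2
            cnt += 1
    return acc / cnt

msd_momentum = run(True)
msd_plain = run(False)
ratio = msd_momentum / msd_plain   # close to 1 for small step-sizes
```

To first order in the step-size, both steady-state MSDs equal $\mu\sigma^2/(2(1-\beta))$, so the measured ratio stays near one — the momentum parameter bought a larger effective step, not a better error floor.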

Proceedings ArticleDOI
20 Mar 2016
TL;DR: It is shown how the regularizers can be smoothed and how the Pareto solution can be sought by appealing to a multi-agent diffusion strategy under conditions that are weaker than assumed earlier in the literature.
Abstract: We develop an effective distributed strategy for seeking the Pareto solution of an aggregate cost consisting of regularized risks. The focus is on stochastic optimization problems where each risk function is expressed as the expectation of some loss function and the probability distribution of the data is unknown. We assume each risk function is regularized and allow the regularizer to be non-smooth. Under conditions that are weaker than assumed earlier in the literature and, hence, applicable to a broader class of adaptation and learning problems, we show how the regularizers can be smoothed and how the Pareto solution can be sought by appealing to a multi-agent diffusion strategy. The formulation is general enough and includes, for example, a multi-agent proximal strategy as a special case.

Journal ArticleDOI
TL;DR: In this article, the authors study the performance of diffusion least-mean squares algorithms for distributed parameter estimation in multi-agent networks when nodes exchange information over wireless communication links and show that by properly monitoring the CSI over the network and choosing sufficiently small adaptation step-sizes, diffusion strategies are able to deliver satisfactory performance in the presence of fading and path loss.
Abstract: We study the performance of diffusion least-mean squares algorithms for distributed parameter estimation in multi-agent networks when nodes exchange information over wireless communication links. Wireless channel impairments, such as fading and path-loss, adversely affect the exchanged data and cause instability and performance degradation if left unattended. To mitigate these effects, we incorporate equalization coefficients into the diffusion combination step and update the combination weights dynamically in the face of randomly changing neighborhoods due to fading conditions. When channel state information (CSI) is unavailable, we determine the equalization factors from pilot-aided channel coefficient estimates. The analysis reveals that by properly monitoring the CSI over the network and choosing sufficiently small adaptation step-sizes, the diffusion strategies are able to deliver satisfactory performance in the presence of fading and path loss.
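The equalization idea amounts to dividing each received neighbor estimate by a pilot-based estimate of the fading coefficient before it enters the combination step. A toy single-link sketch, where the fading model and noise levels are illustrative assumptions:

```python
import numpy as np

# One noisy wireless link: the neighbor's estimate arrives scaled by a
# fading coefficient h plus noise. A pilot symbol gives an estimate of h,
# and dividing by it (equalization) recovers the estimate before combining.
rng = np.random.default_rng(5)
w_neighbor = np.array([1.0, -2.0, 0.5])      # neighbor's local estimate

h = 0.7 + 0.1 * rng.standard_normal()        # true fading coefficient
pilot = 1.0
received_pilot = h * pilot + 0.01 * rng.standard_normal()
h_hat = received_pilot / pilot               # pilot-aided channel estimate

received = h * w_neighbor + 0.01 * rng.standard_normal(3)
equalized = received / h_hat                 # equalize before the combine step

err = np.linalg.norm(equalized - w_neighbor)
```

Without the division by `h_hat`, the combination step would average estimates attenuated by unknown factors, which is exactly the instability mechanism the paper mitigates.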

Journal Article
TL;DR: In this article, the convergence rate and mean square error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime was examined in the adaptive online setting.
Abstract: The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for nondifferentiable and non-convex problems.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: It is shown that the diffusion LMS algorithm for distributed inference over networks can be extended to deal with structured criteria built upon groups of variables, leading to a flexible framework that can encode various structures in the parameters to estimate.
Abstract: Considering groups of variables, rather than variables individually, can be beneficial for estimation accuracy if structural relationships between variables exist (e.g., spatial, hierarchical or related to the physics of the problem). Group-sparsity inducing estimators are typical examples that benefit from such type of prior knowledge. Building on this principle, we show that the diffusion LMS algorithm for distributed inference over networks can be extended to deal with structured criteria built upon groups of variables, leading to a flexible framework that can encode various structures in the parameters to estimate. We also propose an unsupervised online strategy to differentially promote or inhibit collaborations between nodes depending on the group of variables at hand.
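A standard group-sparsity operator consistent with this framework is block soft-thresholding, the proximal operator of the group-lasso penalty. A minimal sketch of that operator (illustrative building block, not the paper's diffusion recursion):

```python
import numpy as np

def group_soft_threshold(v, groups, lam):
    """Block soft-thresholding: proximal operator of the group-lasso
    penalty lam * sum_g ||v_g||_2. Groups whose norm falls below lam are
    zeroed as a block; surviving groups are shrunk toward zero."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * v[g]
    return out

v = np.array([3.0, 4.0, 0.1, -0.1])
groups = [[0, 1], [2, 3]]
p = group_soft_threshold(v, groups, lam=1.0)   # -> [2.4, 3.2, 0.0, 0.0]
```

Treating entries group-by-group is what lets prior structure (spatial, hierarchical, or physical) enter the estimator: the whole second block is discarded at once rather than entry by entry.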

Posted Content
TL;DR: In this article, the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime were analyzed for strongly convex and smooth risk functions, and not limited to quadratic risks.
Abstract: The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.

Journal ArticleDOI
19 May 2016
TL;DR: In this article, the authors examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data.
Abstract: In many fields, and especially in the medical and social sciences and in recommender systems, data are gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to some questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values. In this work, assuming missing positions are replaced by noisy values, we examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data. We explain how to adjust the distributed diffusion strategy through (de)regularization in order to eliminate the bias introduced by the incomplete model. We also propose a technique to recursively estimate the (de)regularization parameter and examine the performance of the resulting strategy. We illustrate the results by considering two applications: one dealing with a mental health survey and the other dealing with a household consumption survey.
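The (de)regularization idea can be illustrated in a batch setting: when regressors are observed with additive noise of known variance, the plain least-squares estimate is biased toward zero, and subtracting that variance from the sample covariance removes the bias. The sketch below is a hypothetical batch analogue of the adaptive strategy, not the paper's recursive algorithm:

```python
import numpy as np

# Errors-in-variables bias and its removal: regressors are observed as
# U + noise (variance sigma2 per entry), so the sample covariance is
# inflated by sigma2*I; subtracting it (de-regularization) de-biases.
rng = np.random.default_rng(6)
N, M, sigma2 = 50000, 3, 0.5
w_true = np.array([1.0, -0.5, 2.0])

U = rng.standard_normal((N, M))               # clean regressors
d = U @ w_true                                # noise-free measurements
U_noisy = U + np.sqrt(sigma2) * rng.standard_normal((N, M))

R = U_noisy.T @ U_noisy / N                   # inflated sample covariance
r = U_noisy.T @ d / N

w_biased = np.linalg.solve(R, r)              # attenuated toward zero
w_fixed = np.linalg.solve(R - sigma2 * np.eye(M), r)  # de-biased estimate
```

In the paper's adaptive setting the same correction is applied recursively inside the diffusion strategy, with the (de)regularization parameter itself estimated online.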

Posted Content
TL;DR: In this paper, the authors consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions, and the simultaneous requirements of adaptation and cooperation are achieved by employing diffusion algorithms with constant step-size.
Abstract: We consider distributed detection problems over adaptive networks, where dispersed agents learn continually from streaming data by means of local interactions. The simultaneous requirements of adaptation and cooperation are achieved by employing diffusion algorithms with constant step-size $\mu$. In [1], [2] some main features of adaptive distributed detection were revealed. By resorting to large deviations analysis, it was established that the Type-I and Type-II error probabilities of all agents vanish exponentially as functions of $1/\mu$, and that all agents share the same Type-I and Type-II error exponents. However, numerical evidence presented in [1], [2] showed that the theory of large deviations does not capture the fundamental impact of network connectivity on performance, and that additional tools and efforts are required to obtain accurate predictions for the error probabilities. This work addresses these open issues and extends the results of [1], [2] in several directions. By conducting a refined asymptotic analysis based on the mathematical framework of exact asymptotics, we arrive at a revealing and powerful understanding of the universal behavior of distributed detection over adaptive networks: as functions of $1/\mu$, the error (log-)probability curves corresponding to different agents stay nearly-parallel to each other (as already discovered in [1], [2]); however, these curves are ordered following a criterion reflecting the degree of connectivity of each agent. Depending on the combination weights, the more connected an agent is, the lower its error probability curve will be. Interesting and somehow unexpected behaviors emerge, in terms of the interplay between the network topology, the combination weights, and the inference performance. The lesson learned is that connectivity matters.

Proceedings ArticleDOI
01 Nov 2016
TL;DR: An asynchronous, decentralized algorithm for consensus optimization that involves both primal and dual variables, uses fixed step-size parameters, and provably converges to the exact solution under a random agent assumption and both bounded and unbounded delay assumptions.
Abstract: We propose an asynchronous, decentralized algorithm for consensus optimization. The algorithm runs over a network of agents, where the agents perform local computation and communicate with neighbors. We design the algorithm so that the agents can compute and communicate independently at different times and for different durations. This reduces the waiting time for the slowest agent or longest communication delay and also eliminates the need for a global clock. Mathematically, the algorithm involves both primal and dual variables, uses fixed step-size parameters, and provably converges to the exact solution under a bounded delay assumption and a random agent assumption. When running synchronously, the algorithm performs just as well as existing competitive synchronous algorithms such as PG-EXTRA, which diverges without synchronization. Numerical experiments confirm the theoretical findings and illustrate the performance of the proposed algorithm.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: It is shown that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations, and useful closed-form expressions are derived which can be used to motivate design problems to control it.
Abstract: In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents ends up shaping the beliefs of non-influential agents. We derive useful closed-form expressions that characterize this influence, and which can be used to motivate design problems to control it. We provide simulation examples to illustrate the results.

Posted Content
TL;DR: In this paper, the authors study diffusion social learning over weakly-connected graphs and show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations.
Abstract: In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents ends up shaping the beliefs of non-influential agents. We derive useful closed-form expressions that characterize this influence, and which can be used to motivate design problems to control it. We provide simulation examples to illustrate the results.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: The analysis establishes that sub-gradient strategies can attain exponential convergence rates, as opposed to sub-linear rates, and that they can approach the optimal solution within O(p), for sufficiently small step-sizes, p.
Abstract: This work examines the performance of stochastic sub-gradient learning strategies, for both cases of stand-alone and networked agents, under weaker conditions than usually considered in the literature. It is shown that these conditions are automatically satisfied by several important cases of interest, including support-vector machines and sparsity-inducing learning solutions. The analysis establishes that sub-gradient strategies can attain exponential convergence rates, as opposed to sub-linear rates, and that they can approach the optimal solution within O(p), for sufficiently small step-sizes, p. A realizable exponential-weighting procedure is proposed to smooth the intermediate iterates and to guarantee these desirable performance properties.
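The exponential-weighting procedure amounts to maintaining a geometrically weighted running average of the iterates alongside the raw sub-gradient recursion. The sketch below applies this to an illustrative regularized hinge-loss (SVM) problem; all parameter values and the data model are assumptions, not the paper's setup:

```python
import numpy as np

# Stochastic sub-gradient descent on a regularized hinge loss, with an
# exponentially weighted running average w_avg smoothing the iterates.
rng = np.random.default_rng(4)
M, mu, rho = 2, 0.01, 0.1          # dimension, step-size, l2-regularization
w = np.zeros(M)
w_avg = np.zeros(M)
beta = 1 - mu                       # exponential-weighting factor

for t in range(20000):
    x = rng.standard_normal(M)
    y = 1.0 if x[0] + x[1] > 0 else -1.0        # linearly separable labels
    margin = y * (w @ x)
    sub = rho * w - (y * x if margin < 1 else 0)  # sub-gradient of the loss
    w = w - mu * sub
    w_avg = beta * w_avg + (1 - beta) * w        # smoothed iterate

# the averaged iterate should align with the true separating direction [1, 1]
cos = (w_avg @ np.ones(M)) / (np.linalg.norm(w_avg) * np.sqrt(M))
```

The raw iterate `w` keeps jittering because the hinge sub-gradient is discontinuous; the averaged iterate `w_avg` is what settles near the optimal direction, which is the role the realizable exponential-weighting procedure plays in the analysis.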

Proceedings ArticleDOI
01 Aug 2016
TL;DR: The analysis provides insight into the interplay between the network topology, the combination weights, and the inference performance, revealing the universal behavior of diffusion-based detectors over adaptive networks.
Abstract: Exploiting recent progress [1]-[4] in the characterization of the detection performance of diffusion strategies over adaptive multi-agent networks: i) we present two theoretical approximations, one based on asymptotic normality and the other based on the theory of exact asymptotics; and ii) we develop an efficient simulation method by tailoring the importance sampling technique to diffusion adaptation. We show that these theoretical and experimental tools complement each other well, with their combination offering a substantial advance for a reliable quantitative detection-performance assessment. The analysis provides insight into the interplay between the network topology, the combination weights, and the inference performance, revealing the universal behavior of diffusion-based detectors over adaptive networks.

Proceedings ArticleDOI
01 Nov 2016
TL;DR: A projection-based diffusion LMS approach is derived and studied for distributed adaptive learning over multitask mean-square-error networks, where each agent estimates its own parameter vector subject to a set of linear equality constraints that couple the tasks of neighboring agents.
Abstract: In this work, we consider distributed adaptive learning over multitask mean-square-error (MSE) networks where each agent is interested in estimating its own parameter vector, also called a task, and where the tasks at neighboring agents are related according to a set of linear equality constraints. We assume that each agent knows its own cost function and the set of constraints involving its own vector. In order to solve the multitask problem and to optimize the individual costs subject to all constraints, a projection-based diffusion LMS approach is derived and studied. Simulation results illustrate the efficiency of the strategy.
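A minimal sketch of the idea — each agent runs an LMS adaptation step on its own task, followed by a projection that enforces the linear equality constraints linking the tasks — is shown below for two agents. The particular constraint, dimensions, and step-size are illustrative assumptions, not the algorithm from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two agents, each estimating its own 2-dim task; the tasks are linked
# by the linear equality constraint w1 - w2 = b for a known offset b.
d = 2
b = np.array([1.0, -0.5])
w2_true = rng.standard_normal(d)
w1_true = w2_true + b          # the constraint holds for the true tasks

mu = 0.02
w = np.zeros(2 * d)            # stacked estimate [w1; w2]

# Constraint in stacked form: D @ w = b, with D = [I, -I].
D = np.hstack([np.eye(d), -np.eye(d)])
# Projection onto the affine set {w : D w = b}: w -> P w + f
P = np.eye(2 * d) - D.T @ np.linalg.inv(D @ D.T) @ D
f = D.T @ np.linalg.inv(D @ D.T) @ b

for k in range(4000):
    for a, wt in enumerate([w1_true, w2_true]):
        u = rng.standard_normal(d)                 # regressor at agent a
        dk = u @ wt + 0.01 * rng.standard_normal() # streaming measurement
        sl = slice(a * d, (a + 1) * d)
        # LMS adaptation step on the agent's own block
        w[sl] = w[sl] + mu * u * (dk - u @ w[sl])
    # Projection step enforcing the linear equality constraint
    w = P @ w + f

err = np.linalg.norm(w[:d] - w1_true) + np.linalg.norm(w[d:] - w2_true)
print(err)
```

Because the true tasks satisfy the constraint, the adapt-then-project recursion converges toward both individual tasks while keeping the constraint satisfied at every iteration.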

Proceedings ArticleDOI
20 Mar 2016
TL;DR: Three stochastic gradient strategies are developed by relying on a penalty-based approach where the constrained GNEP formulation is replaced by a penalized unconstrained formulation that is able to approach the Nash equilibrium in a stable manner to within O(p) for small step-size values p.
Abstract: This work examines a stochastic formulation of the generalized Nash equilibrium problem (GNEP) where agents are subject to randomness in the environment of unknown statistical distribution. Three stochastic gradient strategies are developed by relying on a penalty-based approach where the constrained GNEP formulation is replaced by a penalized unconstrained formulation. It is shown that this penalty solution is able to approach the Nash equilibrium in a stable manner to within O(p) for small step-size values p. The operation of the algorithms is illustrated by considering the Cournot competition problem.
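To make the penalty idea concrete, here is a hedged sketch of a penalized stochastic gradient recursion for a two-firm Cournot game with a shared capacity constraint. It resembles only the basic penalized scheme (the paper develops three strategies), and the demand model, penalty weight, and step-size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 2              # number of firms
a, b = 10.0, 1.0   # inverse demand p = a - b * sum(q), with random shocks
c = 1.0            # marginal cost (identical firms)
cap = 10.0         # shared capacity constraint: sum(q) <= cap
rho = 5.0          # penalty parameter
mu = 0.005         # small step-size

q = np.ones(N)     # production quantities

for k in range(20000):
    a_k = a + 0.5 * rng.standard_normal()   # random demand realization
    total = q.sum()
    for i in range(N):
        # Stochastic gradient of firm i's profit q_i*(a_k - b*total) - c*q_i
        grad = a_k - b * total - b * q[i] - c
        # Gradient of the quadratic penalty replacing the shared constraint
        pen = 2.0 * rho * max(0.0, total - cap)
        q[i] = max(0.0, q[i] + mu * (grad - pen))

print(q)   # should hover near the Nash equilibrium
```

For these symmetric parameters the unconstrained Nash equilibrium is q_i = (a - c) / (b (N + 1)) = 3, and since the capacity constraint is inactive there, the penalized iterates fluctuate around that point with an error on the order of the step-size.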

Proceedings ArticleDOI
24 Feb 2016
TL;DR: In this paper, the authors develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data.
Abstract: The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. In this work, we develop an online dual coordinate-ascent (O-DCA) algorithm that is able to respond to streaming data and does not need to revisit the past data. This feature infuses the resulting construction with continuous adaptation, learning, and tracking abilities, which are particularly useful for online learning scenarios.
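The flavor of a dual coordinate-ascent update on streaming data can be sketched with a standard single-pass SDCA-style recursion for l2-regularized least squares: each arriving sample receives one closed-form update of its own dual variable and is never revisited. This is a generic illustration, not the paper's O-DCA recursion, and the regularization and data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 2000, 5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.05 * rng.standard_normal(n)

lam = 0.01
w = np.zeros(d)
alpha = np.zeros(n)    # one dual variable per streamed sample

# Single pass over the stream: each sample is visited exactly once and
# its dual variable receives one closed-form coordinate-ascent update.
for i in range(n):
    xi, yi = X[i], y[i]
    delta = (yi - xi @ w - alpha[i]) / (1.0 + (xi @ xi) / (lam * n))
    alpha[i] += delta
    # Primal-dual relation: w = (1 / (lam * n)) * sum_i alpha[i] * x_i
    w += (delta / (lam * n)) * xi

mse = np.mean((X @ w - y) ** 2)
print(mse)
```

Even with a single visit per sample, the closed-form dual updates drive the primal iterate close to the regularized least-squares solution, which is the kind of streaming behavior the O-DCA construction is designed to deliver.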

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work proposes a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real-time the underlying correlation structure among feature subspaces.
Abstract: Complexity is a double-edged sword for learning algorithms when the number of available samples for training in relation to the dimension of the feature space is small. This is because simple models do not sufficiently capture the nuances of the data set, while complex models overfit. While remedies such as regularization and dimensionality reduction exist, they themselves can suffer from overfitting or introduce bias. To address the issue of overfitting, the incorporation of prior structural knowledge is generally of paramount importance. In this work, we propose a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real-time the underlying correlation structure among feature subspaces. In this way, the algorithm is able to identify salient subspaces and their correlations, while simultaneously dampening the effect of irrelevant features. This effect is particularly useful for high-dimensional feature spaces.
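The "graphical layer that tracks the correlation structure among feature subspaces in real time" can be illustrated, in heavily simplified form, by an exponentially weighted running estimate of the feature correlation matrix. This is not the BRAIN algorithm itself; the data model, forgetting factor, and subspace split are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

d = 6
beta = 0.99              # forgetting factor for real-time tracking
R = np.zeros((d, d))     # running estimate of the feature correlation
norm = 0.0

# Stream of feature vectors whose first three and last three coordinates
# form two internally correlated subspaces.
for k in range(5000):
    z = rng.standard_normal(2)
    x = np.concatenate([z[0] + 0.1 * rng.standard_normal(3),
                        z[1] + 0.1 * rng.standard_normal(3)])
    # Exponentially weighted update of the correlation estimate
    norm = beta * norm + 1.0
    R = R + (1.0 / norm) * (np.outer(x, x) - R)

# Within-subspace correlations dominate cross-subspace ones,
# revealing the two salient subspaces from the stream alone.
within = np.abs(R[0, 1])
across = np.abs(R[0, 4])
print(within, across)
```

A learner with access to such a tracked correlation structure can emphasize the salient subspaces and dampen irrelevant features, which is the effect the abstract describes.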
