
Showing papers by "Michael I. Jordan published in 2023"


Journal ArticleDOI
25 May 2023-JAMA
TL;DR: This article developed a definition of postacute sequelae of SARS-CoV-2 infection (PASC) using self-reported symptoms and described PASC frequencies across cohorts, vaccination status, and number of infections.
Abstract: Importance SARS-CoV-2 infection is associated with persistent, relapsing, or new symptoms or other health effects occurring after acute infection, termed postacute sequelae of SARS-CoV-2 infection (PASC), also known as long COVID. Characterizing PASC requires analysis of prospectively and uniformly collected data from diverse uninfected and infected individuals. Objective To develop a definition of PASC using self-reported symptoms and describe PASC frequencies across cohorts, vaccination status, and number of infections. Design, Setting, and Participants Prospective observational cohort study of adults with and without SARS-CoV-2 infection at 85 enrolling sites (hospitals, health centers, community organizations) located in 33 states plus Washington, DC, and Puerto Rico. Participants who were enrolled in the RECOVER adult cohort before April 10, 2023, completed a symptom survey 6 months or more after acute symptom onset or test date. Selection included population-based, volunteer, and convenience sampling. Exposure SARS-CoV-2 infection. Main Outcomes and Measures PASC and 44 participant-reported symptoms (with severity thresholds). Results A total of 9764 participants (89% SARS-CoV-2 infected; 71% female; 16% Hispanic/Latino; 15% non-Hispanic Black; median age, 47 years [IQR, 35-60]) met selection criteria. Adjusted odds ratios were 1.5 or greater (infected vs uninfected participants) for 37 symptoms. Symptoms contributing to PASC score included postexertional malaise, fatigue, brain fog, dizziness, gastrointestinal symptoms, palpitations, changes in sexual desire or capacity, loss of or change in smell or taste, thirst, chronic cough, chest pain, and abnormal movements. Among 2231 participants first infected on or after December 1, 2021, and enrolled within 30 days of infection, 224 (10% [95% CI, 8.8%-11%]) were PASC positive at 6 months. Conclusions and Relevance A definition of PASC was developed based on symptoms in a prospective cohort study. 
As a first step to providing a framework for other investigations, iterative refinement that further incorporates other clinical features is needed to support actionable definitions of PASC.

18 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide a theoretical framework for reinforcement learning with human feedback, and show that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model.
Abstract: We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). Our analysis shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. However, we show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions. Additionally, we demonstrate that under the PL model, the true MLE and an alternative MLE that splits the $K$-wise comparison into pairwise comparisons both converge. Moreover, the true MLE is asymptotically more efficient. Our results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. Furthermore, our results unify the problem of RLHF and max-entropy Inverse Reinforcement Learning (IRL), and provide the first sample complexity bound for max-entropy IRL.
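The convergence of the MLE under the BTL model can be illustrated with a small simulation. This is a hedged sketch, not the paper's code: the data, dimensions, and gradient-ascent settings below are assumptions chosen for illustration.

```python
# Sketch: fitting a linear reward model by maximum likelihood under the
# Bradley-Terry-Luce (BTL) pairwise comparison model. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 2000
theta_true = np.array([1.0, -2.0, 0.5])      # true linear reward parameter
X = rng.normal(size=(n, 2, d))               # n pairs of item features

# BTL: P(item 0 preferred) = sigmoid(r(x0) - r(x1)), with r(x) = theta^T x
diff = X[:, 0, :] - X[:, 1, :]
p = 1.0 / (1.0 + np.exp(-diff @ theta_true))
y = (rng.random(n) < p).astype(float)        # observed pairwise preferences

# The BTL MLE is a logistic regression on feature differences; plain
# gradient ascent on the average log-likelihood suffices here.
theta = np.zeros(d)
for _ in range(2000):
    pred = 1.0 / (1.0 + np.exp(-diff @ theta))
    theta += 0.2 * diff.T @ (y - pred) / n   # gradient of avg log-likelihood

print(theta)  # approaches theta_true as n grows
```

The estimate concentrates around `theta_true` at the usual parametric rate, consistent with the convergence result summarized above.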

12 citations



Proceedings ArticleDOI
24 Feb 2023
TL;DR: In this article, the authors study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market, and propose a data-driven reinforcement learning framework that finds the regularized competitive equilibrium of the model.
Abstract: We study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market. Each household earns income and engages in consumption at each time step while aiming to maximize a concave utility subject to the underlying market conditions. The households aim to find the optimal saving strategy that maximizes their discounted cumulative utility given the market condition, while the firms determine the market conditions through maximizing corporate profit based on the household population behavior. The model captures a wide range of applications in macroeconomic studies, and we propose a data-driven reinforcement learning framework that finds the regularized competitive equilibrium of the model. The proposed algorithm enjoys theoretical guarantees in converging to the equilibrium of the market at a sub-linear rate.

3 citations


Journal ArticleDOI
TL;DR: In this paper, the complexity of deterministically smoothing nonconvex Lipschitz functions was studied, and it was shown that randomization is necessary to obtain a dimension-free rate.
Abstract: We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
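The randomized smoothings referenced above can be sketched in a few lines. This is an illustrative assumption, not the paper's algorithm: smooth a nonsmooth Lipschitz function by averaging over a random perturbation, f_delta(x) = E_u[f(x + delta*u)].

```python
# Sketch: Monte Carlo randomized smoothing of a nonsmooth Lipschitz function.
import numpy as np

def smoothed(f, x, delta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E_u[f(x + delta*u)], u ~ Uniform[-1, 1]."""
    u = np.random.default_rng(seed).uniform(-1.0, 1.0, size=n_samples)
    return np.mean(f(x + delta * u))

f = np.abs                        # 1-Lipschitz, nonsmooth at 0
est = smoothed(f, 0.0, delta=0.5)
# Exact value at the kink: E|0.5*u| = 0.5 * E|u| = 0.5 * 0.5 = 0.25
print(est)
```

The averaged function is differentiable at the kink, which is exactly the kind of behavior a deterministic black-box procedure cannot replicate, per the impossibility result above.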

2 citations


Journal ArticleDOI
TL;DR: In this article, a framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a machine-learning system is introduced, without making any assumptions on the machine-learning algorithm that supplies the predictions.
Abstract: We introduce prediction-powered inference -- a framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a machine-learning system. Our framework yields provably valid conclusions without making any assumptions on the machine-learning algorithm that supplies the predictions. Higher accuracy of the predictions translates to smaller confidence intervals, permitting more powerful inference. Prediction-powered inference yields simple algorithms for computing valid confidence intervals for statistical objects such as means, quantiles, and linear and logistic regression coefficients. We demonstrate the benefits of prediction-powered inference with data sets from proteomics, genomics, electronic voting, remote sensing, census analysis, and ecology.
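For the mean, the idea can be sketched compactly. This is a simplified illustration under assumptions (synthetic data, a deliberately biased black-box predictor, normal-approximation interval), not the paper's exact estimator: predictions on unlabeled data are debiased by a correction term measured on the small labeled set.

```python
# Sketch: prediction-powered estimate of a mean E[Y].
import numpy as np

rng = np.random.default_rng(1)
n, N = 300, 30_000                        # labeled / unlabeled sample sizes
f = lambda x: 2.0 * x + 0.3               # black-box predictor (slightly biased)

X_lab = rng.normal(size=n)
Y_lab = 2.0 * X_lab + rng.normal(size=n)  # true E[Y] = 0
X_unl = rng.normal(size=N)                # unlabeled: only predictions available

# Bias correction estimated on the labeled data
rectifier = np.mean(Y_lab - f(X_lab))
# Prediction-powered point estimate: predictions on unlabeled data, debiased
pp_est = np.mean(f(X_unl)) + rectifier

# Normal-approximation CI; its width shrinks as the predictor improves
se = np.sqrt(np.var(Y_lab - f(X_lab)) / n + np.var(f(X_unl)) / N)
ci = (pp_est - 1.96 * se, pp_est + 1.96 * se)
print(pp_est, ci)
```

Note the interval width is driven by the variance of the prediction *errors* on the labeled set, not of Y itself, which is the source of the power gain.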

1 citation


Journal ArticleDOI
TL;DR: In this article, the creator economy is modeled as a three-party game between the users, the platform, and the content creators: the platform contracts with the content creators under a principal-agent model to encourage better content, and interacts with the users to recommend new content, receive evaluations, and ultimately profit from the content.
Abstract: The creator economy has revolutionized the way individuals can profit through online platforms. In this paper, we initiate the study of online learning in the creator economy by modeling the creator economy as a three-party game between the users, platform, and content creators, with the platform interacting with the content creator under a principal-agent model through contracts to encourage better content. Additionally, the platform interacts with the users to recommend new content, receive an evaluation, and ultimately profit from the content, which can be modeled as a recommender system. Our study aims to explore how the platform can jointly optimize the contract and recommender system to maximize the utility in an online learning fashion. We primarily analyze and compare two families of contracts: return-based contracts and feature-based contracts. Return-based contracts pay the content creator a fraction of the reward the platform gains. In contrast, feature-based contracts pay the content creator based on the quality or features of the content, regardless of the reward the platform receives. We show that under smoothness assumptions, the joint optimization of return-based contracts and recommendation policy provides a regret $\Theta(T^{2/3})$. For the feature-based contract, we introduce a definition of intrinsic dimension $d$ to characterize the hardness of learning the contract and provide an upper bound on the regret $\mathcal{O}(T^{(d+1)/(d+2)})$. The upper bound is tight for the linear family.

1 citation


Journal ArticleDOI
TL;DR: In this article, a nonlinear nowcasting model for extreme precipitation is presented, which unifies physical-evolution schemes and conditional-learning methods into a neural-network framework with end-to-end forecast error optimization.
Abstract: Extreme precipitation is a considerable contributor to meteorological disasters and there is a great need to mitigate its socioeconomic effects through skilful nowcasting that has high resolution, long lead times and local details1–3. Current methods are subject to blur, dissipation, intensity or location errors, with physics-based numerical methods struggling to capture pivotal chaotic dynamics such as convective initiation4 and data-driven learning methods failing to obey intrinsic physical laws such as advective conservation5. We present NowcastNet, a nonlinear nowcasting model for extreme precipitation that unifies physical-evolution schemes and conditional-learning methods into a neural-network framework with end-to-end forecast error optimization. On the basis of radar observations from the USA and China, our model produces physically plausible precipitation nowcasts with sharp multiscale patterns over regions of 2,048 km × 2,048 km and with lead times of up to 3 h. In a systematic evaluation by 62 professional meteorologists from across China, our model ranks first in 71% of cases against the leading methods. NowcastNet provides skilful forecasts at light-to-heavy rain rates, particularly for extreme-precipitation events accompanied by advective or convective processes that were previously considered intractable.

1 citation


Journal ArticleDOI
TL;DR: In this article, a unifying framework for the design and analysis of multi-calibrated and moment-multi-calibrated predictors is proposed by placing the multi-calibration problem in the general setting of multi-objective learning, where learning guarantees must hold simultaneously over a set of distributions and loss functions.
Abstract: We provide a unifying framework for the design and analysis of multi-calibrated and moment-multi-calibrated predictors. Placing the multi-calibration problem in the general setting of \emph{multi-objective learning} -- where learning guarantees must hold simultaneously over a set of distributions and loss functions -- we exploit connections to game dynamics to obtain state-of-the-art guarantees for a diverse set of multi-calibration learning problems. In addition to shedding light on existing multi-calibration guarantees, and greatly simplifying their analysis, our approach yields a $1/\epsilon^2$ improvement in the number of oracle calls compared to the state-of-the-art algorithm of Jung et al. 2021 for learning deterministic moment-calibrated predictors and an exponential improvement in $k$ compared to the state-of-the-art algorithm of Gopalan et al. 2022 for learning a $k$-class multi-calibrated predictor. Beyond multi-calibration, we use these game dynamics to address existing and emerging considerations in the study of group fairness and multi-distribution learning.

1 citation


Journal ArticleDOI
28 May 2023-PLOS ONE
TL;DR: The Researching COVID to Enhance Recovery (RECOVER) Multi-site Observational Study of PASC in Adults (RECOVER-Adult), described in this article, is the first national, prospective, longitudinal cohort study of PASC among US adults.
Abstract: Importance: SARS-CoV-2 infection can result in ongoing, relapsing, or new symptoms or other health effects after the acute phase of infection, termed post-acute sequelae of SARS-CoV-2 infection (PASC), or long COVID. The characteristics, prevalence, trajectory and mechanisms of PASC are ill-defined. The objectives of the Researching COVID to Enhance Recovery (RECOVER) Multi-site Observational Study of PASC in Adults (RECOVER-Adult) are to: (1) characterize PASC prevalence; (2) characterize the symptoms, organ dysfunction, natural history, and distinct phenotypes of PASC; (3) identify demographic, social and clinical risk factors for PASC onset and recovery; and (4) define the biological mechanisms underlying PASC pathogenesis. Methods: RECOVER-Adult is a combined prospective/retrospective cohort currently planned to enroll 14,880 adults aged ≥18 years. Eligible participants either must meet WHO criteria for suspected, probable, or confirmed infection; or must have evidence of no prior infection. Recruitment occurs at 86 sites in 33 U.S. states, Washington, DC and Puerto Rico, via facility- and community-based outreach. Participants complete quarterly questionnaires about symptoms, social determinants, vaccination status, and interim SARS-CoV-2 infections. In addition, participants contribute biospecimens and undergo physical and laboratory examinations at approximately 0, 90 and 180 days from infection or negative test date, and yearly thereafter. Some participants undergo additional testing based on specific criteria or random sampling. Patient representatives provide input on all study processes. The primary study outcome is onset of PASC, measured by signs and symptoms. A paradigm for identifying PASC cases will be defined and updated using supervised and unsupervised learning approaches with cross-validation.
Logistic regression and proportional hazards regression will be conducted to investigate associations between risk factors, onset, and resolution of PASC symptoms. Discussion: RECOVER-Adult is the first national, prospective, longitudinal cohort of PASC among US adults. Results of this study are intended to inform public health, spur clinical trials, and expand treatment options.

Journal ArticleDOI
TL;DR: In this paper, the authors studied the problem of online learning in a two-player decentralized cooperative Stackelberg game, where the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
Abstract: We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader's actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games. This poses unique challenges for the learning process of the leader and the subsequent regret analysis.

Journal ArticleDOI
TL;DR: In this paper, a cold atmospheric plasma (CAP) apparatus was designed and developed for SARS-CoV-2 killing, as evaluated by pseudotyped viral infectivity assays.
Abstract: A Cold Atmospheric Plasma (CAP) apparatus was designed and developed for SARS-CoV-2 killing as evaluated by pseudotyped viral infectivity assays. The reactive species generated by the plasma system were fully characterized by using Optical Emission Spectroscopy (OES) measurement under given conditions such as plasma power, flow rate, and treatment time. A variety of reactive oxygen species (ROS) and reactive nitrogen species (RNS) were identified from the plasma plume with energies of 15–72 eV in the wavelength range of 500–1000 nm. Systematic virus killing experiments were carried out, and the efficacy of CAP treatment in reducing SARS-CoV-2 viral infectivity was significant following treatment for 8 s, with further enhancement of killing upon longer exposures of 15–120 s. We correlated killing efficacy with the reactive species in terms of type, intensity, energy, and frequency. These experimental results demonstrate effective cold plasma virus killing via ROS and RNS under ambient conditions.

Journal ArticleDOI
TL;DR: Clustered conformal prediction, as described in this paper, clusters together classes that have similar conformal scores and then performs conformal prediction at the cluster level, yielding a stronger guarantee: for test points of a specific class, the prediction set contains the true label with the same user-chosen probability.
Abstract: Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-chosen probability. In many classification problems, we would like to obtain a stronger guarantee -- that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. Existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction, which clusters together classes that have "similar" conformal scores and then performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
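The standard split conformal procedure that the clustered method builds on can be sketched as follows. This is an illustration on toy softmax scores, not the paper's code; the score function and data are assumptions.

```python
# Sketch: split conformal prediction for classification (marginal coverage).
import numpy as np

rng = np.random.default_rng(2)
n_cal, K, alpha = 1000, 5, 0.1

# Toy calibration data: random softmax scores and true labels
logits = rng.normal(size=(n_cal, K))
labels = rng.integers(0, K, size=n_cal)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Conformal score: 1 - probability assigned to the true label
scores = 1.0 - probs[np.arange(n_cal), labels]
# Finite-sample-corrected (1 - alpha) quantile of calibration scores
qhat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

def prediction_set(test_probs):
    """All labels whose conformal score is within the calibrated threshold."""
    return np.where(1.0 - test_probs <= qhat)[0]

print(prediction_set(np.array([0.5, 0.2, 0.1, 0.1, 0.1])))
```

Clustered conformal prediction applies this same calibration per *cluster* of classes rather than per class, so each quantile is estimated from enough data.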

Journal ArticleDOI
TL;DR: In this paper, the authors formulate the problem of federated learning (FL) as a game between the principal and multiple agents, and focus on the linear experiment design problem to formally study their interaction, showing that the statistical criterion used to quantify the diversity of the data, as well as the choice of the FL algorithm used, has a significant effect on the resulting equilibrium.
Abstract: For a federated learning model to perform well, it is crucial to have a diverse and representative dataset. However, the data contributors may only be concerned with the performance on a specific subset of the population, which may not reflect the diversity of the wider population. This creates a tension between the principal (the FL platform designer) who cares about global performance and the agents (the data collectors) who care about local performance. In this work, we formulate this tension as a game between the principal and multiple agents, and focus on the linear experiment design problem to formally study their interaction. We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium. We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population, thereby maximizing global performance.

Journal ArticleDOI
TL;DR: In this article, the authors study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value.
Abstract: Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.
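The closing guideline is a one-line calculation. The dollar figures below are hypothetical, chosen only to make the arithmetic concrete.

```python
# Sketch of the significance-threshold guideline stated above:
# the type-I error level should be strictly below (trial cost) / (profit if
# the trial is successful). Figures are hypothetical.
trial_cost = 5_000_000
profit_if_approved = 120_000_000

alpha_max = trial_cost / profit_if_approved
print(alpha_max)  # the chosen alpha must be strictly below this value
```

Intuitively, a cheap trial with a huge payoff invites speculative filings, so the threshold must be correspondingly strict.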

Journal ArticleDOI
TL;DR: In this article, the authors define a model of competition for classification tasks and use data representations as a lens for studying the impact of increases in scale on the performance of machine learning models.
Abstract: As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy. However, these trends take the perspective of a single model-provider in isolation, while in reality providers often compete with each other for users. In this work, we demonstrate that competition can fundamentally alter the behavior of these scaling trends, even causing overall predictive accuracy across users to be non-monotonic or decreasing with scale. We define a model of competition for classification tasks, and use data representations as a lens for studying the impact of increases in scale. We find many settings where improving data representation quality (as measured by Bayes risk) decreases the overall predictive accuracy across users (i.e., social welfare) for a marketplace of competing model-providers. Our examples range from closed-form formulas in simple settings to simulations with pretrained representations on CIFAR-10. At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare in marketplaces with multiple model providers.

Journal ArticleDOI
TL;DR: In this paper, the authors present a method for solving general nonconvex-strongly-convex bilevel optimization problems, which achieves the best-known theoretical guarantees for finding stationary points.
Abstract: We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
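The hypergradient machinery underlying such methods can be sketched on a toy problem. This is an assumption-laden illustration of plain hypergradient descent, not RAHGD itself: the inner problem is strongly convex, so the implicit function theorem gives the outer gradient.

```python
# Sketch: hypergradient descent on a toy bilevel problem
#   min_x f(x, y*(x))   s.t.   y*(x) = argmin_y g(x, y),
# with g(x, y) = 0.5*(y - x)^2 (strongly convex, y*(x) = x)
# and  f(x, y) = 0.5*x^2 + 0.5*(y - 1)^2.

def lower_solve(x, steps=100, lr=0.5):
    """Inner gradient descent on g(x, y); converges to y*(x) = x."""
    y = 0.0
    for _ in range(steps):
        y -= lr * (y - x)
    return y

def hypergradient(x):
    """Implicit-function-theorem gradient of F(x) = f(x, y*(x)).
    Here dy*/dx = 1, so dF/dx = x + (y*(x) - 1)."""
    y = lower_solve(x)
    return x + (y - 1.0)

x = 3.0
for _ in range(100):
    x -= 0.3 * hypergradient(x)   # outer (hyper)gradient descent

print(x)  # F(x) = 0.5*x^2 + 0.5*(x - 1)^2 is minimized at x = 0.5
```

RAHGD layers acceleration and restarting on top of this basic loop to obtain the improved oracle complexity quoted above.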

Journal ArticleDOI
TL;DR: The Federated Conformal Prediction (FCP) framework described in this paper extends conformal prediction to the federated learning setting, where data heterogeneity across the clients violates the fundamental tenet of exchangeability required for conformal prediction.
Abstract: Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models. In this paper, we extend conformal prediction to the federated learning setting. The main challenge we face is data heterogeneity across the clients - this violates the fundamental tenet of exchangeability required for conformal prediction. We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees and excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. We provide code used in our experiments https://github.com/clu5/federated-conformal.

Journal ArticleDOI
TL;DR: This article proposes the Advantage-Induced Policy Alignment (APA) algorithm, which leverages a squared-error loss function based on estimated advantages to align large language models to human preferences.
Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be alleviated by a novel algorithm that we refer to as Advantage-Induced Policy Alignment (APA), which leverages a squared error loss function based on the estimated advantages. We demonstrate empirically that APA consistently outperforms PPO in language tasks by a large margin, when a separate reward model is employed as the evaluator. In addition, compared with PPO, APA offers a more stable form of control over the deviation from the model's initial policy, ensuring that the model improves its performance without collapsing to deterministic output. In addition to empirical results, we also provide a theoretical justification supporting the design of our loss function.

Journal ArticleDOI
TL;DR: In this article, the authors directly study the incentive misalignments that arise from such average treated outcome metrics, and show that the incentives driving treatment decisions would align with maximizing total patient welfare if the metrics (i) accounted for counterfactual untreated outcomes and (ii) considered total welfare instead of average welfare among treated patients.
Abstract: From the social sciences to machine learning, it has been well documented that metrics to be optimized are not always aligned with social welfare. In healthcare, Dranove et al. [12] showed that publishing surgery mortality metrics actually harmed the welfare of sicker patients by increasing provider selection behavior. Using a principal-agent model, we directly study the incentive misalignments that arise from such average treated outcome metrics, and show that the incentives driving treatment decisions would align with maximizing total patient welfare if the metrics (i) accounted for counterfactual untreated outcomes and (ii) considered total welfare instead of average welfare among treated patients. Operationalizing this, we show how counterfactual metrics can be modified to satisfy desirable properties when used for ranking. Extending to realistic settings when the providers observe more about patients than the regulatory agencies do, we bound the decay in performance by the degree of information asymmetry between the principal and the agent. In doing so, our model connects principal-agent information asymmetry with unobserved heterogeneity in causal inference.

Journal ArticleDOI
TL;DR: In this article, the authors revisited the original scheme of Riemannian gradient descent and analyzed it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case.
Abstract: Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remains opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold's curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.

Journal ArticleDOI
TL;DR: In this paper, the authors study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium, and design a different learning algorithm based on punishing producers who create low-quality content.
Abstract: For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.

Journal ArticleDOI
01 Feb 2023-Viruses
TL;DR: The first nationally representative cross-sectional HIV drug resistance survey was conducted in Uruguay in 2018-2019 among adults diagnosed with HIV and initiating or reinitiating antiretroviral therapy (ART).
Abstract: The first nationally representative cross-sectional HIV drug resistance (HIVDR) survey was conducted in Uruguay in 2018–2019 among adults diagnosed with HIV and initiating or reinitiating antiretroviral therapy (ART). Protease, reverse transcriptase, and integrase genes of HIV-1 were sequenced. A total of 206 participants were enrolled in the survey; 63.2% were men, 85.7% were >25 years of age, and 35.6% reported previous exposure to antiretroviral (ARV) drugs. The prevalence of HIVDR to efavirenz or nevirapine was significantly higher (OR: 1.82, p < 0.001) in adults with previous ARV drug exposure (20.3%, 95% CI: 18.7–22.0%) compared to adults without previous ARV drug exposure (12.3%, 11.0–13.8%). HIVDR to any nucleoside reverse transcriptase inhibitors was 10.3% (9.4–11.2%). HIVDR to ritonavir-boosted protease inhibitors was 1.5% (1.1–2.1%); resistance to ritonavir-boosted darunavir was 0.9% (0.4–2.1%) among adults without previous ARV drug exposure and it was not observed among adults with previous ARV drug exposure. Resistance to integrase inhibitors was 12.7% (11.7–13.8%), yet HIVDR to dolutegravir, bictegravir, and cabotegravir was not observed. The high level (>10%) of HIVDR to efavirenz highlights the need to accelerate the transition to the WHO-recommended dolutegravir-based ART. Access to dolutegravir-based ART should be prioritised for people reporting previous ARV drug exposure.

Journal ArticleDOI
TL;DR: In this paper, the authors study two approaches for mitigating resource consumption and latency challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing.
Abstract: Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing. Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings. By combining a caching algorithm, namely Greedy Dual Size with Frequency (GDSF) or Least Expected Cost (LEC), with a model multiplexer, we achieve optimal rates in both offline and online settings. Empirically, simulations show that the combination of our caching and model multiplexing algorithms greatly improves over the baselines, with up to $50\times$ improvement over the baseline when the ratio between the maximum cost and minimum cost is $100$. Experiments on real datasets show a $4.3\times$ improvement in FLOPs over the baseline when the ratio for FLOPs is $10$, and a $1.8\times$ improvement in latency when the ratio for average latency is $1.85$.
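GDSF is a known cache-replacement policy that the abstract names as one building block; the sketch below is a generic illustration of GDSF priorities (an inflation clock plus frequency times cost over size), not the paper's joint caching-and-multiplexing algorithm, and the cost/size values are assumptions.

```python
class GDSFCache:
    """Greedy Dual Size with Frequency (GDSF) cache sketch.

    Priority of an item: H = L + frequency * cost / size, where L is a
    running "inflation" clock set to the priority of the last eviction.
    High-cost, frequently used, small items are kept longest.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0.0
        self.items = {}   # key -> (priority, freq, cost, size)
        self.used = 0

    def get(self, key):
        """Return True on a cache hit, refreshing the item's priority."""
        if key in self.items:
            _, freq, cost, size = self.items[key]
            freq += 1
            self.items[key] = (self.L + freq * cost / size, freq, cost, size)
            return True
        return False

    def put(self, key, cost, size):
        """Insert an item, evicting lowest-priority items as needed."""
        while self.used + size > self.capacity and self.items:
            victim = min(self.items, key=lambda k: self.items[k][0])
            self.L = self.items[victim][0]          # advance the clock
            self.used -= self.items[victim][3]
            del self.items[victim]
        if self.used + size <= self.capacity:
            self.items[key] = (self.L + cost / size, 1, cost, size)
            self.used += size

# With capacity 2, inserting a third item evicts the cheapest one ("b").
cache = GDSFCache(2)
cache.put("a", cost=10, size=1)
cache.put("b", cost=1, size=1)
cache.put("c", cost=5, size=1)
```

In the paper's setting, "cost" would correspond to the inference cost of recomputing a query's answer, which is why cost-aware policies like GDSF and LEC are natural fits.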

Journal Article
TL;DR: In this article, the authors consider general-sum Markov games with myopically rational players and develop sample-efficient RL algorithms for solving for a Stackelberg-Nash equilibrium.
Abstract: We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers. In particular, we focus on the class of games where the followers are myopically rational; i.e., they aim to maximize their instantaneous rewards. For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair (π∗, ν∗) such that: (i) π∗ is the optimal policy for the leader when the followers always play their best response, and (ii) ν∗ is the best response policy of the followers, which is a Nash equilibrium of the followers’ game induced by π∗. We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings. Our algorithms are optimistic and pessimistic variants of least-squares value iteration, and they are readily able to incorporate function approximation tools in the setting of large state spaces. Furthermore, for the case with linear function approximation, we prove that our algorithms achieve sublinear regret and suboptimality under online and offline setups respectively. To the best of our knowledge, we establish the first provably efficient RL algorithms for solving for SNEs in general-sum Markov games with myopically rational followers.

Journal ArticleDOI
TL;DR: In this paper, the authors exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
Abstract: We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms for constrained optimization. Unlike Frank-Wolfe or projected gradients, these algorithms avoid optimization over the entire feasible set at each iteration. We prove convergence to stationary points even in a nonconvex setting and we derive rates for the convex setting. An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions, which naturally leads to sparse, local and convex approximations of the feasible set (even if the feasible set is nonconvex). Thus, the complexity tends to grow mildly in the number of decision variables and in the number of constraints, which makes the algorithms suitable for machine learning applications. We apply our algorithms to a compressed sensing and a sparse regression problem, showing that we can treat nonconvex $\ell^p$ constraints ($p<1$) efficiently, while recovering state-of-the-art performance for $p=1$.

Journal ArticleDOI
TL;DR: In this paper, the authors formalize personalized federated learning as a stochastic optimization problem where the stochastic gradients on a client may correspond to one of $K$ distributions.
Abstract: Clustering clients with similar objectives and learning a model per cluster is an intuitive and interpretable approach to personalization in federated learning. However, doing so with provable and optimal guarantees has remained an open challenge. In this work, we formalize personalized federated learning as a stochastic optimization problem where the stochastic gradients on a client may correspond to one of $K$ distributions. In such a setting, we show that using i) a simple thresholding-based clustering algorithm, and ii) local client gradients obtains optimal convergence guarantees. In fact, our rates asymptotically match those obtained if we knew the true underlying clustering of the clients. Furthermore, our algorithms are provably robust in the Byzantine setting where some fraction of the gradients are corrupted.
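To illustrate the flavor of thresholding-based clustering of client gradients, here is a toy greedy sketch; the abstract does not fully specify the paper's algorithm, so the grouping rule, threshold, and example gradients below are assumptions for illustration only.

```python
import numpy as np

def threshold_cluster(grads, tau):
    """Greedily group client gradients by pairwise distance thresholding.

    Clients i and j join the same cluster when their gradients are within
    distance tau of the cluster's first member; each cluster then averages
    its members' gradients to form a per-cluster update.
    """
    n = len(grads)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, n):
            if labels[j] == -1 and np.linalg.norm(grads[i] - grads[j]) <= tau:
                labels[j] = next_label
        next_label += 1
    centers = [np.mean([g for g, l in zip(grads, labels) if l == c], axis=0)
               for c in range(next_label)]
    return labels, centers

# Four clients whose gradients come from two underlying distributions.
grads = [np.array([1.0, 0.0]), np.array([1.1, 0.0]),
         np.array([-1.0, 0.0]), np.array([-0.9, 0.1])]
labels, centers = threshold_cluster(grads, tau=0.5)
```

Averaging only within a cluster is what lets the rates match those of an oracle that knows the true clustering, and also limits how far corrupted (Byzantine) gradients in one cluster can contaminate another.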