Institution
Alibaba Group
Company • Hangzhou, China
About: Alibaba Group is a company based in Hangzhou, China. It is known for research contributions in the topics of Computer science and Terminal (electronics). The organization has 6810 authors who have published 7389 publications receiving 55653 citations. The organization is also known as Alibaba Group Holding Limited and Alibaba Group (Cayman Islands).
Topics: Computer science, Terminal (electronics), Graph (abstract data type), Node (networking), Deep learning
Papers published on a yearly basis
Papers
11 Jul 2021
TL;DR: In this article, a dynamic weighted graph based RL method is proposed to learn a policy that selects the action at each conversation turn, either asking an attribute or recommending items; two action selection strategies are also proposed to reduce the candidate action space according to preference and entropy information.
Abstract: Conversational recommender systems (CRS) enable the traditional recommender systems to explicitly acquire user preferences towards items and attributes through interactive conversations. Reinforcement learning (RL) is widely adopted to learn conversational recommendation policies to decide what attributes to ask, which items to recommend, and when to ask or recommend, at each conversation turn. However, existing methods mainly target at solving one or two of these three decision-making problems in CRS with separated conversation and recommendation components, which restrict the scalability and generality of CRS and fall short of preserving a stable training procedure. In the light of these challenges, we propose to formulate these three decision-making problems in CRS as a unified policy learning task. In order to systematically integrate conversation and recommendation components, we develop a dynamic weighted graph based RL method to learn a policy to select the action at each conversation turn, either asking an attribute or recommending items. Further, to deal with the sample efficiency issue, we propose two action selection strategies for reducing the candidate action space according to the preference and entropy information. Experimental results on two benchmark CRS datasets and a real-world E-Commerce application show that the proposed method not only significantly outperforms state-of-the-art methods but also enhances the scalability and stability of CRS.
60 citations
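The entropy-based action-space reduction described in the abstract above can be illustrated with a small sketch: attributes whose value distribution over the remaining candidate items has high entropy split the candidates most evenly, so asking about them is most informative. This is an illustrative approximation only, not the paper's implementation; the item-dict representation and the names `attribute_entropy` and `top_k_attributes` are hypothetical.

```python
import math
from collections import Counter

def attribute_entropy(candidate_items, attribute):
    """Shannon entropy (bits) of an attribute's value distribution over candidates."""
    values = [item[attribute] for item in candidate_items if attribute in item]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def top_k_attributes(candidate_items, attributes, k):
    """Keep only the k attributes whose answers would split the candidates most evenly."""
    ranked = sorted(attributes,
                    key=lambda a: attribute_entropy(candidate_items, a),
                    reverse=True)
    return ranked[:k]
```

An attribute shared by all candidates has zero entropy and is never worth asking about, while an attribute that splits the candidates in half has maximal entropy; pruning the action space to the top-k attributes keeps the RL policy's choices informative.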
13 May 2019
TL;DR: This paper proposes the hierarchical neural encoder based on adaptive recurrent networks to learn the semantic representation of meeting conversation with adaptive conversation segmentation and develops the reinforced decoder network to generate the high-quality summaries for abstractive meeting summarization.
Abstract: Abstractive meeting summarization is a challenging problem in natural language understanding, which automatically generates the condensed summary covering the important points in the meeting conversation. However, the existing abstractive summarization works mainly focus on the structured text documents, which may be ineffectively applied to the meeting summarization task due to the lack of modeling the unstructured long-form conversational contents. In this paper, we consider the problem of abstractive meeting summarization from the viewpoint of hierarchical adaptive segmental encoder-decoder network learning. We propose the hierarchical neural encoder based on adaptive recurrent networks to learn the semantic representation of meeting conversation with adaptive conversation segmentation. We then develop the reinforced decoder network to generate the high-quality summaries for abstractive meeting summarization. We conduct the extensive experiments on the well-known AMI meeting conversation dataset to validate the effectiveness of our proposed method.
60 citations
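The adaptive conversation segmentation mentioned in the abstract above can be sketched at a high level: start a new segment whenever adjacent utterances become dissimilar. This is a heavily simplified illustration, not the paper's learned segmentation; the threshold rule and the name `segment_conversation` are assumptions, and utterances are represented as plain embedding vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def segment_conversation(utterance_vecs, threshold=0.5):
    """Group consecutive utterance indices; break when similarity drops below threshold."""
    segments, current = [], [0]
    for i in range(1, len(utterance_vecs)):
        if cosine(utterance_vecs[i - 1], utterance_vecs[i]) < threshold:
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments
```

In the paper the segmentation is adaptive and learned jointly with the encoder; the fixed-threshold rule here only conveys the idea of splitting a long conversation into topically coherent blocks before hierarchical encoding.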
12 Oct 2020
TL;DR: HiFaceGAN, a multi-stage framework containing several nested CSR units that progressively replenish facial details based on the hierarchical semantic guidance extracted from the front-end content-adaptive suppression modules, is presented.
Abstract: Existing face restoration researches typically rely on either the image degradation prior or explicit guidance labels for training, which often lead to limited generalization ability over real-world images with heterogeneous degradation and rich background contents. In this paper, we investigate a more challenging and practical "dual-blind" version of the problem by lifting the requirements on both types of prior, termed as "Face Renovation"(FR). Specifically, we formulate FR as a semantic-guided generation problem and tackle it with a collaborative suppression and replenishment (CSR) approach. This leads to HiFaceGAN, a multi-stage framework containing several nested CSR units that progressively replenish facial details based on the hierarchical semantic guidance extracted from the front-end content-adaptive suppression modules. Extensive experiments on both synthetic and real face images have verified the superior performance of our HiFaceGAN over a wide range of challenging restoration subtasks, demonstrating its versatility, robustness and generalization ability towards real-world face processing applications. Code is available at https://github.com/Lotayou/Face-Renovation.
60 citations
10 Apr 2018
TL;DR: This paper proposes a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG) which has a communication component for passing messages, several private actors (agents) for making actions for ranking, and a centralized critic for evaluating the overall performance of the co-working actors.
Abstract: Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, is rather untouched. Separately optimizing each individual strategy has two limitations. The first one is lack of collaboration between scenarios meaning that each strategy maximizes its own objective but ignores the goals of other strategies, leading to a sub-optimal overall performance. The second limitation is the inability of modeling the correlation between scenarios meaning that independent optimization in one scenario only uses its own user data but ignores the context in other scenarios. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG) which has a communication component for passing messages, several private actors (agents) for making actions for ranking, and a centralized critic for evaluating the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate with each other by sharing a global action-value function (the critic) and passing messages that encodes historical information across scenarios. The model is evaluated with online settings on a large E-commerce platform. Results show that the proposed model exhibits significant improvements against baselines in terms of the overall performance.
60 citations
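The MA-RDPG structure described above (private actors, a shared message channel, and a centralized critic) can be sketched minimally as follows. This is a toy illustration of the architecture's data flow only, with random linear weights and no training loop; the class and function names are hypothetical, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """One scenario's private actor: maps (local observation, shared message) to an action."""
    def __init__(self, obs_dim, msg_dim, act_dim):
        self.W = rng.normal(scale=0.1, size=(obs_dim + msg_dim, act_dim))

    def act(self, obs, msg):
        return np.tanh(np.concatenate([obs, msg]) @ self.W)

class CentralizedCritic:
    """Scores the joint action of all actors against the global state."""
    def __init__(self, state_dim, joint_act_dim):
        self.W = rng.normal(scale=0.1, size=(state_dim + joint_act_dim,))

    def q_value(self, state, joint_action):
        return float(np.concatenate([state, joint_action]) @ self.W)

def step(actors, critic, state, obs_list, msg):
    """One turn: every actor acts, the critic scores the joint action,
    and the shared message is updated recurrently to carry history forward."""
    actions = [a.act(o, msg) for a, o in zip(actors, obs_list)]
    joint = np.concatenate(actions)
    q = critic.q_value(state, joint)
    next_msg = np.tanh(msg + joint[: len(msg)])  # recurrent message update
    return actions, q, next_msg
```

The key design point the sketch mirrors is that actors only see local observations plus the message, while the critic sees everything, which is what makes the scenarios cooperate rather than optimize in isolation.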
01 Sep 2017
TL;DR: A coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem.
Abstract: The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervisions. Particularly, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve the state-of-the-art performance.
60 citations
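The coarse-to-fine idea above, with each stage refining the previous stage's output under an intermediate loss, can be shown with a deliberately simple numeric sketch. This is only an analogy for the staged-refinement and intermediate-supervision structure, not the paper's caption decoder; `run_stages` and the toy loss are assumed names.

```python
def run_stages(x0, stages, target):
    """Run decoders in sequence; record an intermediate loss per stage so that,
    in a real model, every stage would receive a direct supervision signal
    (mitigating vanishing gradients across stages)."""
    outputs, losses = [], []
    x = x0
    for stage in stages:
        x = stage(x)          # each stage refines the previous stage's output
        outputs.append(x)
        losses.append(abs(target - x))
    return outputs, losses
```

With a stage that halves the remaining error, the per-stage losses shrink monotonically, which is the behavior the framework's intermediate supervision is meant to encourage in each decoder.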
Authors
Showing all 6829 results
| Name | H-index | Papers | Citations |
|---|---|---|---|
| Philip S. Yu | 148 | 1914 | 107374 |
| Lei Zhang | 130 | 2312 | 86950 |
| Jian Xu | 94 | 1366 | 52057 |
| Wei Chu | 80 | 670 | 28771 |
| Le Song | 76 | 345 | 21382 |
| Yuan Xie | 76 | 739 | 24155 |
| Narendra Ahuja | 76 | 474 | 29517 |
| Rong Jin | 75 | 449 | 19456 |
| Beng Chin Ooi | 73 | 408 | 19174 |
| Wotao Yin | 72 | 303 | 27233 |
| Deng Cai | 70 | 326 | 24524 |
| Xiaofei He | 70 | 260 | 28215 |
| Irwin King | 67 | 476 | 19056 |
| Gang Wang | 65 | 373 | 21579 |
| Xiaodan Liang | 61 | 318 | 14121 |