
Papers presented at the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)


Proceedings Article
25 Jul 2020
TL;DR: LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding.
Abstract: Graph Convolution Network (GCN) has become the new state of the art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses on GCN, which was originally designed for graph classification tasks and equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs -- feature transformation and nonlinear activation -- contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation -- for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) -- a state-of-the-art GCN-based recommender model -- under exactly the same experimental setting. Further analyses are provided to justify the rationality of the simple LightGCN from both analytical and empirical perspectives.

962 citations
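To make the LightGCN propagation rule concrete, here is a minimal pure-Python sketch. It assumes the symmetric normalization 1 / sqrt(|N_u| * |N_i|) on each user-item edge and uniform layer weights alpha_k = 1 / (K + 1); the function name `lightgcn_propagate` is hypothetical, and training of the layer-0 embeddings is omitted. It illustrates the idea and is not the authors' implementation.

```python
import math

def lightgcn_propagate(interactions, user_emb, item_emb, num_layers=2):
    """Hypothetical sketch of LightGCN-style linear propagation.

    interactions: list of (user, item) index pairs forming the bipartite graph.
    user_emb / item_emb: layer-0 embeddings as lists of float lists.
    Returns final embeddings as the uniform mean over layers 0..K.
    """
    dim = len(user_emb[0])
    deg_u = [0] * len(user_emb)
    deg_i = [0] * len(item_emb)
    for u, i in interactions:
        deg_u[u] += 1
        deg_i[i] += 1

    layers_u, layers_i = [user_emb], [item_emb]
    cur_u, cur_i = user_emb, item_emb
    for _ in range(num_layers):
        nxt_u = [[0.0] * dim for _ in user_emb]
        nxt_i = [[0.0] * dim for _ in item_emb]
        for u, i in interactions:
            # Symmetric normalization; no weight matrix and no nonlinearity.
            norm = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for d in range(dim):
                nxt_u[u][d] += norm * cur_i[i][d]
                nxt_i[i][d] += norm * cur_u[u][d]
        layers_u.append(nxt_u)
        layers_i.append(nxt_i)
        cur_u, cur_i = nxt_u, nxt_i

    def layer_mean(layers):
        # Weighted sum with alpha_k = 1 / (K + 1) for every layer k.
        count = len(layers)
        return [[sum(layer[n][d] for layer in layers) / count for d in range(dim)]
                for n in range(len(layers[0]))]

    return layer_mean(layers_u), layer_mean(layers_i)
```

A preference score for a user-item pair would then be the inner product of the two final vectors. Each layer only rescales and sums neighbor vectors, which is the simplification the abstract argues for.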


Proceedings Article
25 Jul 2020
TL;DR: ColBERT is a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. Its effectiveness is competitive with existing BERT-based models (and it outperforms every non-BERT baseline), and it enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents.
Abstract: Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Crucially, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring up to four orders-of-magnitude fewer FLOPs per query.

658 citations
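The late interaction step in ColBERT can be sketched as a sum, over query tokens, of each token's maximum similarity to any document token (often called MaxSim). This sketch assumes the BERT encoders have already produced per-token vectors and that similarity is the dot product of L2-normalized vectors; `maxsim_score` and `rank_documents` are illustrative names, not ColBERT's API.

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: best-matching document token per query token, summed."""
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)

def rank_documents(query_vecs, doc_index):
    """doc_index maps doc_id -> per-token vectors encoded offline, without the query."""
    return sorted(doc_index,
                  key=lambda doc_id: maxsim_score(query_vecs, doc_index[doc_id]),
                  reverse=True)
```

Because document vectors do not depend on the query, they can be computed and stored ahead of time; only the cheap max-and-sum step runs at query time.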


Proceedings Article
25 Jul 2020
TL;DR: GCE-GNN is a novel approach that exploits item transitions over all sessions in a more subtle manner to better infer the user preference of the current session; it consistently outperforms state-of-the-art methods.
Abstract: Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Almost all existing solutions for SBR model user preference based only on the current session without exploiting the other sessions, which may contain both relevant and irrelevant item-transitions to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN), to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from session graph and global graph, respectively: (i) Session graph, which is used to learn the session-level item embedding by modeling pairwise item-transitions within the current session; and (ii) Global graph, which is used to learn the global-level item embedding by modeling pairwise item-transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbors' embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learnt item representations in the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN outperforms the state-of-the-art methods consistently.

243 citations
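The GCE-GNN global graph links items through transitions observed across all sessions. A minimal sketch, under the assumption that two items are connected when they appear within `eps` positions of each other in any session and that the edge weight counts such co-occurrences; the paper's exact neighbor definition and any neighbor sampling may differ, and `build_global_graph` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def build_global_graph(sessions, eps=2, top_n=None):
    """Hypothetical sketch: weighted, undirected item graph built from all sessions.

    Two items are linked when they occur within `eps` positions of each other in a
    session; the edge weight counts how often that happens across all sessions.
    """
    weights = Counter()
    for session in sessions:
        for pos, item in enumerate(session):
            for other in session[pos + 1:pos + 1 + eps]:
                if other != item:
                    weights[frozenset((item, other))] += 1
    neighbors = defaultdict(dict)
    for pair, weight in weights.items():
        a, b = tuple(pair)
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    if top_n is not None:
        # Keep only each item's top_n most frequent neighbors.
        return {item: dict(sorted(nbrs.items(), key=lambda kv: -kv[1])[:top_n])
                for item, nbrs in neighbors.items()}
    return dict(neighbors)
```

The session graph, by contrast, would be built from the current session alone, so the two graphs give the two levels of item context the abstract describes.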


Proceedings Article
25 Jul 2020
TL;DR: Wang et al. propose Disentangled Graph Collaborative Filtering (DGCF), which disentangles the factors behind user-item interactions and yields disentangled representations by modeling a distribution over intents for each user-item interaction.
Abstract: Learning informative representations of users and items from the interaction data is of crucial importance to collaborative filtering (CF). Present embedding functions exploit user-item relationships to enrich the representations, evolving from a single user-item instance to the holistic interaction graph. Nevertheless, they largely model the relationships in a uniform manner, while neglecting the diversity of user intents in adopting the items, which could be to pass time, for interest, or shopping for others like families. Such a uniform approach to modeling user interests easily results in suboptimal representations, failing to model diverse relationships and disentangle user intents in representations. In this work, we pay special attention to user-item relationships at the finer granularity of user intents. We hence devise a new model, Disentangled Graph Collaborative Filtering (DGCF), to disentangle these factors and yield disentangled representations. Specifically, by modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations. Meanwhile, we encourage independence of different intents. This leads to disentangled representations, effectively distilling information pertinent to each intent. We conduct extensive experiments on three benchmark datasets, and DGCF achieves significant improvements over several state-of-the-art models like NGCF, DisenGCN, and MacridVAE. Further analyses offer insights into the advantages of DGCF on the disentanglement of user intents and interpretability of representations. Our code is available at https://github.com/xiangwang1223/disentangled_graph_collaborative_filtering.

177 citations
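The per-interaction intent distribution in DGCF can be illustrated with a routing-style update. In this sketch each user and item embedding is assumed to be split into K chunks, one per intent; each interaction keeps K scores, a softmax turns them into a distribution over intents, and an intent's score rises when the user's chunk agrees with tanh of the item's chunk. The update rule and the function names are assumptions loosely modeled on the description, not the authors' code.

```python
import math

def intent_distribution(scores):
    """Softmax over one interaction's K intent scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def update_intent_scores(scores, user_chunks, item_chunks):
    """One hypothetical refinement step for a single user-item interaction.

    user_chunks[k] / item_chunks[k] are the k-th intent chunks of the user and
    item embeddings; intent k gains score when the two chunks agree.
    """
    return [s + sum(u * math.tanh(i) for u, i in zip(uk, ik))
            for s, uk, ik in zip(scores, user_chunks, item_chunks)]
```

Alternating this update with propagation over the K intent-weighted graphs gives an iterative refinement loop; the independence penalty across intents that the abstract mentions would be added during training and is not shown.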


Proceedings Article
25 Jul 2020
TL;DR: This work constructs a unified graph to represent multi-behavior data and proposes a new model named MBGCN (short for Multi-Behavior Graph Convolutional Network), which addresses the limitations of existing works.
Abstract: Traditional recommendation models that usually utilize only one type of user-item interaction are faced with serious data sparsity or cold start issues. Multi-behavior recommendation, which makes use of multiple types of user-item interactions such as clicks and favorites, can serve as an effective solution. Early efforts towards multi-behavior recommendation fail to capture behaviors' different influence strength on the target behavior. They also ignore behaviors' semantics, which are implied in multi-behavior data. Both limitations keep the data from being fully exploited for improving the recommendation performance on the target behavior. In this work, we approach this problem by innovatively constructing a unified graph to represent multi-behavior data and proposing a new model named MBGCN (short for Multi-Behavior Graph Convolutional Network). Learning behavior strength by a user-item propagation layer and capturing behavior semantics by an item-item propagation layer, MBGCN can well address the limitations of existing works. Empirical results on two real-world datasets verify the effectiveness of our model in exploiting multi-behavior data. Our model outperforms the best baseline by 25.02% and 6.51% on average on the two datasets. Further studies on cold-start users confirm the practicability of our proposed model.

169 citations


Proceedings Article
25 Jul 2020
TL;DR: The proposed model can significantly outperform the state-of-the-art in predicting the next interesting item for each user and is equipped with a fusion layer to incorporate both the dynamic item embedding and short-term user intent to the representation of each interaction.
Abstract: There is increasing attention on next-item recommendation systems that infer dynamic user preferences from sequential user interactions. While the semantics of an item can change over time and across users, the item correlations defined by user interactions in the short term can be distilled to capture such change, and help in uncovering the dynamic user preferences. Thus, we are motivated to develop a novel next-item recommendation framework empowered by sequential hypergraphs. Specifically, the framework: (i) adopts a hypergraph to represent the short-term item correlations and applies multiple convolutional layers to capture multi-order connections in the hypergraph; (ii) models the connections between different time periods with a residual gating layer; and (iii) is equipped with a fusion layer to incorporate both the dynamic item embedding and short-term user intent to the representation of each interaction before feeding it into the self-attention layer for dynamic user modeling. Through experiments on datasets from the e-commerce sites Amazon and Etsy and the information sharing platform Goodreads, the proposed model can significantly outperform the state-of-the-art in predicting the next interesting item for each user.

151 citations
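The abstract does not spell out the convolution used on the sequential hypergraphs. For illustration, here is one step of a common HGNN-style hypergraph convolution, D_v^-1/2 H D_e^-1 H^T D_v^-1/2 X, with unit hyperedge weights and no learned projection or nonlinearity. Treating each hyperedge as the set of items one user interacted with in one time period is an assumption, and `hypergraph_conv` is a hypothetical name.

```python
import math

def hypergraph_conv(num_nodes, hyperedges, features):
    """One HGNN-style step with unit hyperedge weights.

    hyperedges: list of lists of distinct node indices.
    features: per-node feature vectors (lists of floats).
    """
    dim = len(features[0])
    node_deg = [0] * num_nodes
    for edge in hyperedges:
        for v in edge:
            node_deg[v] += 1
    out = [[0.0] * dim for _ in range(num_nodes)]
    for edge in hyperedges:
        # Node -> hyperedge: degree-normalized average of the member nodes.
        msg = [0.0] * dim
        for u in edge:
            scale = 1.0 / (math.sqrt(node_deg[u]) * len(edge))
            for d in range(dim):
                msg[d] += scale * features[u][d]
        # Hyperedge -> node: scatter the message back to every member node.
        for v in edge:
            scale = 1.0 / math.sqrt(node_deg[v])
            for d in range(dim):
                out[v][d] += scale * msg[d]
    return out
```

Stacking several such steps mixes information between items that are connected only through chains of hyperedges, which is one way to read "multi-order connections".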


Proceedings Article
Ze Wang, Guangyan Lin, Huobin Tan, Qinghong Chen, Xiyang Liu
25 Jul 2020
TL;DR: This paper proposes a novel method named Collaborative Knowledge-aware Attentive Network (CKAN), which explicitly encodes the collaborative signals by collaboration propagation and proposes a natural way of combining collaborative signals with knowledge associations.
Abstract: Since it can effectively address the sparsity and cold-start problems of collaborative filtering, knowledge graph (KG) is widely studied and employed as side information in the field of recommender systems. However, most existing KG-based recommendation methods mainly focus on how to effectively encode the knowledge associations in KG, without highlighting the crucial collaborative signals which are latent in user-item interactions. As such, the learned embeddings underutilize the two kinds of pivotal information and are insufficient to effectively represent the latent semantics of users and items in vector space. In this paper, we propose a novel method named Collaborative Knowledge-aware Attentive Network (CKAN), which explicitly encodes the collaborative signals by collaboration propagation and proposes a natural way of combining collaborative signals with knowledge associations. Specifically, CKAN employs a heterogeneous propagation strategy to explicitly encode both kinds of information, and applies a knowledge-aware attention mechanism to discriminate the contribution of different knowledge-based neighbors. Compared with other KG-based methods, CKAN offers a new way of combining collaborative information with knowledge information. We apply the proposed model on four real-world datasets, and the empirical results demonstrate that CKAN significantly outperforms several compelling state-of-the-art baselines.

134 citations


Proceedings Article
25 Jul 2020
TL;DR: This paper analyzes different groups of users according to their level of activity, and finds that bias exists in recommendation performance between different groups, and proposes a fairness constrained approach via heuristic re-ranking to mitigate this unfairness problem in the context of explainable recommendation over knowledge graphs.
Abstract: There has been growing attention on fairness considerations recently, especially in the context of intelligent decision making systems. For example, explainable recommendation systems may suffer from both explanation bias and performance disparity. We show that inactive users may be more susceptible to receiving unsatisfactory recommendations due to their insufficient training data, and that their recommendations may be biased by the training records of active users due to the nature of collaborative filtering, which leads to unfair treatment by the system. In this paper, we analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups. Empirically, we find that such performance gap is caused by the disparity of data distribution, specifically the knowledge graph path distribution in this work. We propose a fairness constrained approach via heuristic re-ranking to mitigate this unfairness problem in the context of explainable recommendation over knowledge graphs. We experiment on several real-world datasets with state-of-the-art knowledge graph-based explainable recommendation algorithms. The promising results show that our algorithm is not only able to provide high-quality explainable recommendations, but also reduces the recommendation unfairness in several aspects.

124 citations


Proceedings Article
25 Jul 2020
TL;DR: DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency.
Abstract: Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever-growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine. These results represent a 2× to 5× speedup over the best competing approaches. DGL-KE is available at https://github.com/awslabs/dgl-ke.

122 citations


Proceedings Article
25 Jul 2020
TL;DR: The authors propose a target attentive graph neural network (TAGNN) model for session-based recommendation, which adaptively activates different user interests with respect to varied target items.
Abstract: Session-based recommendation nowadays plays a vital role in many websites, which aims to predict users' actions based on anonymous sessions. There have emerged many studies that model a session as a sequence or a graph via investigating temporal transitions of items in a session. However, these methods compress a session into one fixed representation vector without considering the target items to be predicted. The fixed vector will restrict the representation ability of the recommender model, considering the diversity of target items and users' interests. In this paper, we propose a novel target attentive graph neural network (TAGNN) model for session-based recommendation. In TAGNN, target-aware attention adaptively activates different user interests with respect to varied target items. The learned interest representation vector varies with different target items, greatly improving the expressiveness of the model. Moreover, TAGNN harnesses the power of graph neural networks to capture rich item transitions in sessions. Comprehensive experiments conducted on real-world datasets demonstrate its superiority over state-of-the-art methods.

121 citations
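Target-aware attention can be sketched as scoring every item in the session against the candidate target, normalizing the scores with a softmax, and taking the weighted sum of the session item vectors. The bilinear form target^T W s_i and the helper name `target_attentive_interest` are assumptions for illustration; in a trained model W would be learned and the session item vectors would come from the GNN.

```python
import math

def target_attentive_interest(session_vecs, target_vec, W):
    """Hypothetical sketch: beta_i = softmax_i(target^T W s_i); returns sum_i beta_i * s_i."""
    def matvec(matrix, vec):
        return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

    logits = [sum(t * x for t, x in zip(target_vec, matvec(W, s))) for s in session_vecs]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    betas = [e / total for e in exps]
    dim = len(session_vecs[0])
    return [sum(b * s[d] for b, s in zip(betas, session_vecs)) for d in range(dim)]
```

Calling the function with two different targets on the same session returns two different interest vectors, which is the property the abstract contrasts with a single fixed session representation.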


Proceedings Article
25 Jul 2020
TL;DR: This paper proposes a general knowledge distillation framework for counterfactual recommendation that enables uniform data modeling through four approaches that achieve better performance over the baseline models in terms of AUC and NLL.
Abstract: Recommender systems are feedback loop systems, which often face bias problems such as popularity bias, previous model bias and position bias. In this paper, we focus on solving the bias problems in a recommender system via uniform data. Through empirical studies in online and offline settings, we observe that simple modeling with uniform data can alleviate the bias problems and improve the performance. However, uniform data is always scarce and expensive to collect in a real product. In order to use the valuable uniform data more effectively, we propose a general knowledge distillation framework for counterfactual recommendation that enables uniform data modeling through four approaches: (1) label-based distillation focuses on using the imputed labels as a carrier to provide useful de-biasing guidance; (2) feature-based distillation aims to filter out the representative causal and stable features; (3) sample-based distillation considers mutual learning and alignment of the information of the uniform and non-uniform data; and (4) model structure-based distillation constrains the training of the models from the perspective of embedded representation. We conduct extensive experiments on both public and product datasets, demonstrating that the proposed four methods achieve better performance over the baseline models in terms of AUC and NLL. Moreover, we discuss the relation between the proposed methods and the previous works. We emphasize that counterfactual modeling with uniform data is a rich research area, and list some interesting and promising research topics worthy of further exploration. Note that the source code is available at https://github.com/dgliu/SIGIR20_KDCRec.

Proceedings Article
25 Jul 2020
TL;DR: The authors introduce an open-retrieval conversational question answering (ORConvQA) setting, in which a system learns to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems.
Abstract: Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.

Proceedings Article
25 Jul 2020
TL;DR: This paper proposes two frameworks namely Self-supervised Q-learning and Self-Supervised Actor-Critic and integrates the proposed frameworks with four state-of-the-art recommendation models, demonstrating the effectiveness of the approach on real-world datasets.
Abstract: In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement and multiple types of user-item interactions such as clicks and purchases. The current state-of-the-art supervised approaches fail to model them appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an online fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer to drive the supervised layer to focus on specific rewards (e.g., recommending items which may lead to purchases rather than clicks) while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
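The two-output-layer design can be sketched as a joint loss for a single logged transition: softmax cross-entropy on the self-supervised head plus a squared one-step Q-learning error on the RL head. Both heads are assumed to sit on the same sequence encoder; the reward values, the discount `gamma`, and the function name `sqn_loss` are hypothetical, and the next-state Q-values are treated as a fixed target (as with a target network).

```python
import math

def sqn_loss(ce_logits, q_values, next_q_values, action, reward, gamma=0.5, done=False):
    """Hypothetical joint loss for one logged transition (state s -> next state s').

    ce_logits: self-supervised head scores over all items in state s.
    q_values / next_q_values: RL head estimates Q(s, .) and Q(s', .), one per item.
    action: index of the item the user actually interacted with.
    """
    # Self-supervised head: softmax cross-entropy on the observed item (stable log-sum-exp).
    peak = max(ce_logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in ce_logits))
    ce_loss = log_norm - ce_logits[action]
    # RL head: squared one-step TD error; next_q_values act as a constant target.
    target = reward if done else reward + gamma * max(next_q_values)
    td_loss = (target - q_values[action]) ** 2
    return ce_loss + td_loss
```

Giving a purchase a larger `reward` than a click is how the RL head can steer the shared encoder toward purchase-oriented recommendations, while the cross-entropy term keeps supplying dense gradients.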

Proceedings Article
25 Jul 2020
TL;DR: The proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks making these networks more practical to use in a real-time ranking scenario.
Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expense makes them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95%, and it can be applied without a substantial degradation in ranking performance.

Proceedings Article
Dehong Gao, Linbo Jin, Ben Chen, Minghui Qiu, Peng Li, Yi Wei, Yi Hu, Hao Wang
25 Jul 2020
TL;DR: Because fashion matching must pay much more attention to the fine-grained information in fashion images and texts, FashionBERT is proposed: it leverages image patches as image features and learns high-level representations of texts and images.
Abstract: In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (i.e., RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in fashion images, while fashion texts are prone to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in the FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On the public dataset, experiments demonstrate that FashionBERT achieves significant performance improvements over the baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide a detailed analysis of matching performance and inference efficiency.
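Using patches instead of detected RoIs means the image is simply cut into a regular grid, yielding a sequence of patch "tokens" that can sit alongside the text tokens. A minimal sketch, assuming an 8 x 8 grid by default and an image given as nested lists of pixel values; the helper `split_into_patches` is hypothetical, and the image encoder that would turn each patch into a feature vector is not shown.

```python
def split_into_patches(image, grid=8):
    """Cut an H x W image (nested lists) into grid x grid patches, in row-major order.

    Assumes H and W are at least `grid`; leftover rows/columns that do not divide
    evenly are dropped.
    """
    height, width = len(image), len(image[0])
    ph, pw = height // grid, width // grid
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * ph:(r + 1) * ph]
            patches.append([row[c * pw:(c + 1) * pw] for row in rows])
    return patches
```

Because every patch covers a fixed region, small details such as a collar or a print stay inside some patch even when an object detector would have grouped them into one coarse region.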

Proceedings Article
25 Jul 2020
TL;DR: A novel framework NIA-GCN is proposed, which can explicitly model the relational information between neighbor nodes and exploit the heterogeneous nature of the user-item bipartite graph, and generalize to a commercial App store recommendation scenario.
Abstract: Personalized recommendation plays an important role in many online services. Substantial research has been dedicated to learning embeddings of users and items to predict a user's preference for an item based on the similarity of the representations. In many settings, there is abundant relationship information, including user-item interaction history, user-user and item-item similarities. In an attempt to exploit these relationships to learn better embeddings, researchers have turned to the emerging field of Graph Convolutional Neural Networks (GCNs), and applied GCNs for recommendation. Although these prior works have demonstrated promising performance, directly applying GCNs to process the user-item bipartite graph is suboptimal because GCNs do not consider the intrinsic differences between user nodes and item nodes. Additionally, existing large-scale graph neural networks use aggregation functions such as sum/mean/max pooling operations to generate a node embedding that considers the nodes' neighborhood (i.e., the adjacent nodes in the graph), and these simple aggregation strategies fail to preserve the relational information in the neighborhood. To resolve the above limitations, in this paper, we propose a novel framework NIA-GCN, which can explicitly model the relational information between neighbor nodes and exploit the heterogeneous nature of the user-item bipartite graph. We conduct empirical studies on four public benchmarks, demonstrating a significant improvement over state-of-the-art approaches. Furthermore, we generalize our framework to a commercial App store recommendation scenario. We observe significant improvement on a large-scale commercial dataset, demonstrating the practical potential for our proposed solution as a key component of a large scale commercial recommender system.
Furthermore, online experiments are conducted to demonstrate that NIA-GCN outperforms the baseline by 10.19% and 9.95% on average in terms of CTR and CVR during a ten-day A/B test in a mainstream App store.
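Sum, mean, and max pooling treat each neighbor independently. One way to keep relational information between neighbors is to aggregate pairwise element-wise products of neighbor embeddings; whether this matches NIA-GCN's aggregator in every detail is an assumption here. The sketch shows the naive pairwise sum and an equivalent linear-time form based on the identity sum_{i<j} e_i * e_j = ((sum e)^2 - sum e^2) / 2, applied element-wise; both function names are hypothetical.

```python
from itertools import combinations

def pairwise_interaction_naive(neighbors):
    """Sum of element-wise products over all unordered neighbor pairs: O(n^2 * dim)."""
    dim = len(neighbors[0])
    out = [0.0] * dim
    for a, b in combinations(neighbors, 2):
        for d in range(dim):
            out[d] += a[d] * b[d]
    return out

def pairwise_interaction_fast(neighbors):
    """Same result in O(n * dim) via ((sum e)^2 - sum e^2) / 2, element-wise."""
    dim = len(neighbors[0])
    total = [sum(e[d] for e in neighbors) for d in range(dim)]
    squares = [sum(e[d] ** 2 for e in neighbors) for d in range(dim)]
    return [0.5 * (t * t - s) for t, s in zip(total, squares)]
```

The fast form matters on large graphs, where a popular item can have many thousands of neighbors and enumerating every pair would be prohibitive.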

Proceedings Article
Chenyang Wang, Min Zhang, Weizhi Ma, Yiqun Liu, Shaoping Ma
25 Jul 2020
TL;DR: A novel method Chorus is proposed to take both item relations and corresponding temporal dynamics into consideration, which gains significant improvements compared to state-of-the-art baseline methods and can strengthen the explainability of recommendation.
Abstract: Traditional recommender systems mainly aim to model inherent and long-term user preference, while dynamic user demands are also of great importance. Typically, a historical consumption will have impacts on the user demands for its relational items. For instance, users tend to buy complementary items together (iPhone and AirPods) but not substitutive items (Powerbeats and AirPods), although substitutes of the bought one still cater to his/her preference. To better model the effects of history sequence, previous studies introduce the semantics of item relations to capture user demands for recommendation. However, we argue that the temporal evolution of the effects caused by different relations cannot be neglected. In the example above, user demands for headphones can be promoted after a long period when a new one is needed. To model dynamic meanings of an item in different sequence contexts, a novel method Chorus is proposed to take both item relations and corresponding temporal dynamics into consideration. Chorus aims to derive the embedding of the target item in a knowledge-aware and time-aware way, where each item will get its basic representation and relation-related ones. Then, we devise temporal kernel functions to combine these representations dynamically, according to whether there are relational items in history sequence as well as the elapsed time. The enhanced target item embedding is flexible to work with various algorithms to calculate the ranking score and generate recommendations. According to extensive experiments on three real-world datasets, Chorus gains significant improvements compared to state-of-the-art baseline methods. Furthermore, the time-related parameters are highly interpretable and hence can strengthen the explainability of recommendation.

Proceedings Article
25 Jul 2020
TL;DR: The REL system is presented. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API, and the paper reports on an experimental comparison against both well-established systems and the current state of the art on standard entity linking benchmarks.
Abstract: Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most important of all, has state-of-the-art performance. The REL system presented in this paper aims to fill that gap. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API. We also report on an experimental comparison against both well-established systems and the current state-of-the-art on standard entity linking benchmarks.

Proceedings ArticleDOI
25 Jul 2020
TL;DR: In this article, a learning-to-rank approach for explicitly enforcing merit-based fairness guarantees to groups of items (e.g. articles by the same publisher, tracks of the same artist) is presented.
Abstract: Rankings are the primary interface through which many online platforms match users to items (e.g. news, products, music, video). In these two-sided markets, not only do the users draw utility from the rankings, but the rankings also determine the utility (e.g. exposure, revenue) for the item providers (e.g. publishers, sellers, artists, studios). It has already been noted that myopically optimizing utility to the users -- as done by virtually all learning-to-rank algorithms -- can be unfair to the item providers. We, therefore, present a learning-to-rank approach for explicitly enforcing merit-based fairness guarantees to groups of items (e.g. articles by the same publisher, tracks by the same artist). In particular, we propose a learning algorithm that ensures notions of amortized group fairness, while simultaneously learning the ranking function from implicit feedback data. The algorithm takes the form of a controller that integrates unbiased estimators for both fairness and utility, dynamically adapting both as more data becomes available. In addition to its rigorous theoretical foundation and convergence guarantees, we find empirically that the algorithm is highly practical and robust.
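As an illustration of what "a controller" for amortized group fairness can look like, the sketch below assumes a simple proportional controller: each item's estimated relevance is boosted in proportion to how far its group's exposure-per-merit lags behind the most-exposed group. The disparity measure, the gain `lam`, and the functions `controller_scores` and `rank` are assumptions for illustration, not the paper's exact estimator or update rule.

```python
def controller_scores(items, relevance, group_of, exposure, merit, lam):
    """Hypothetical proportional-controller scores for one ranking request.

    relevance: dict item -> estimated relevance (e.g., from an unbiased estimator)
    group_of: dict item -> group id (e.g., publisher or artist)
    exposure: dict group -> exposure accumulated over past rankings
    merit: dict group -> accumulated estimated merit (must be > 0)
    lam: controller gain; lam = 0 recovers pure relevance ranking
    """
    ratio = {g: exposure[g] / merit[g] for g in exposure}
    most_exposed = max(ratio.values())
    # Items from groups that received less exposure than their merit warrants get a boost.
    return {d: relevance[d] + lam * (most_exposed - ratio[group_of[d]]) for d in items}


def rank(items, relevance, group_of, exposure, merit, lam):
    """Sort items by controller score, highest first."""
    scores = controller_scores(items, relevance, group_of, exposure, merit, lam)
    return sorted(items, key=lambda d: scores[d], reverse=True)
```

Because exposure and merit are running totals, the boost shrinks as the disparity is corrected over successive rankings, which is what makes the fairness amortized rather than enforced within every single ranking.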

Proceedings ArticleDOI
25 Jul 2020
TL;DR: A Deep Contextualized Term Weighting framework (DeepCT) is proposed that maps the contextualized term representations from BERT into context-aware term weights for passage retrieval.
Abstract: Term frequency is a common method for identifying the importance of a term in a document. But term frequency ignores how a term interacts with its text context, which is key to estimating document-specific term weights. This paper proposes a Deep Contextualized Term Weighting framework (DeepCT) that maps the contextualized term representations from BERT into context-aware term weights for passage retrieval. The new, deep term weights can be stored in an ordinary inverted index for efficient retrieval. Experiments on two datasets demonstrate that DeepCT greatly improves the accuracy of first-stage passage retrieval algorithms.
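The claim that learned weights "can be stored in an ordinary inverted index" can be illustrated by turning each predicted weight into an integer pseudo term frequency. This is a minimal sketch, assuming weights in [0, 1] and a hypothetical scale factor of 100; the toy scorer simply sums pseudo-TFs and stands in for a real ranking function such as BM25 computed over those TFs.

```python
from collections import defaultdict


def to_pseudo_tf(term_weights, scale=100):
    """Convert predicted weights (assumed in [0, 1]) to integer pseudo term frequencies.

    Terms whose weight rounds to 0 are dropped, so unimportant terms leave the index.
    """
    tfs = {}
    for term, w in term_weights.items():
        tf = round(w * scale)
        if tf > 0:
            tfs[term] = tf
    return tfs


def build_index(docs):
    """docs: dict doc_id -> dict term -> predicted weight. Returns term -> postings list."""
    index = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, tf in to_pseudo_tf(weights).items():
            index[term].append((doc_id, tf))
    return index


def score(index, query_terms):
    """Toy scorer: sum of pseudo-TFs of matching query terms (a stand-in for BM25)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Since the postings only hold integers, any standard inverted-index engine could serve them without modification; the contextual model runs once at indexing time, not at query time.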

Proceedings ArticleDOI
25 Jul 2020
TL;DR: A novel deep recommendation model named Elaborated Entire Space Supervised Multi-task Model (ESM2) is devised, which employs multi-task learning to predict some decomposed sub-targets in parallel and compose them sequentially to formulate the final CVR.
Abstract: Recommender systems, as an essential part of modern e-commerce, consist of two fundamental modules, namely Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. While CVR has a direct impact on the purchasing volume, its prediction is notoriously challenging due to the Sample Selection Bias (SSB) and Data Sparsity (DS) issues. Although existing methods, typically built on the user sequential behavior path "impression->click->purchase", are effective for dealing with the SSB issue, they still struggle to address the DS issue due to rare purchase training samples. Observing that users always take several purchase-related actions after clicking, we propose a novel idea of post-click behavior decomposition. Specifically, disjoint purchase-related Deterministic Action (DAction) and Other Action (OAction) are inserted between click and purchase in parallel, forming a novel user sequential behavior graph "impression->click->D(O)Action->purchase". Defining the model on this graph makes it possible to leverage all the impression samples over the entire space and the extra, abundant supervised signals from D(O)Action, which effectively addresses the SSB and DS issues together. To this end, we devise a novel deep recommendation model named Elaborated Entire Space Supervised Multi-task Model (ESM2). According to the conditional probability rule defined on the graph, it employs multi-task learning to predict several decomposed sub-targets in parallel and composes them sequentially to formulate the final CVR. Extensive experiments in both offline and online environments demonstrate the superiority of ESM2 over state-of-the-art models. The source code and dataset will be released.
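One way to read "compose them sequentially" is through the law of total probability over the two disjoint post-click branches of the graph. The sketch below assumes four predicted sub-targets, p(click | impression), p(DAction | click), p(buy | DAction) and p(buy | OAction); the function and argument names are hypothetical and the per-task networks that would produce these probabilities are omitted.

```python
def compose_cvr(p_daction, p_buy_given_d, p_buy_given_o):
    """p(buy | click): DAction and OAction are disjoint, so sum over the two branches."""
    return p_daction * p_buy_given_d + (1.0 - p_daction) * p_buy_given_o


def compose_ctcvr(p_click, p_daction, p_buy_given_d, p_buy_given_o):
    """p(click and buy | impression), defined over the entire impression space."""
    return p_click * compose_cvr(p_daction, p_buy_given_d, p_buy_given_o)
```

Because the composed probabilities are defined per impression, they can be supervised with labels on every impression rather than only on clicked samples, which is how the entire-space formulation sidesteps sample selection bias.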

Proceedings ArticleDOI
25 Jul 2020
TL;DR: The key insight is that the satisfied recommendations triggered by the exploration recommendation can be viewed as the exploration bonus (delayed reward) for its contribution to improving the quality of the user profile.
Abstract: In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommending for cold-start users or warm-start users whose tastes drift. Existing approaches either rely on overly pessimistic linear exploration strategies or adopt meta-learning based algorithms in a full-exploitation way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and learn it directly from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender systems. The key insight is that the satisfied recommendations triggered by the exploration recommendation can be viewed as the exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile and making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
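The "delayed reward" insight can be seen in plain discounted returns: an exploratory recommendation with zero immediate reward still receives credit from the satisfied recommendations it enables later. This is a generic reinforcement-learning sketch, assuming a reward of 1 per satisfied recommendation and a discount factor `gamma`; it does not reproduce the paper's self-attention policy network, and the function names are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def q_learning_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)
```

For rewards [0, 1, 1] and gamma = 0.5, the first (exploratory) step has a return of 0.75 even though its own reward is 0, which is the exploration bonus the abstract describes.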

Proceedings ArticleDOI
25 Jul 2020
TL;DR: This paper develops two methods, based on rules and self-supervised learning, to generate weak supervision data using large amounts of ad hoc search sessions, and to fine-tune GPT-2 to rewrite conversational queries.
Abstract: Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems. This paper presents a few-shot generative approach to conversational query rewriting. We develop two methods, based on rules and self-supervised learning, to generate weak supervision data using large amounts of ad hoc search sessions, and to fine-tune GPT-2 to rewrite conversational queries. On the TREC Conversational Assistance Track, our weakly supervised GPT-2 rewriter improves the state-of-the-art ranking accuracy by 12%, only using very limited amounts of manual query rewrites. In the zero-shot learning setting, the rewriter still gives a comparable result to previous state-of-the-art systems. Our analyses reveal that GPT-2 effectively picks up the task syntax and learns to capture context dependencies, even for hard cases that involve group references and long-turn dependencies.

Proceedings ArticleDOI
25 Jul 2020
TL;DR: This work formalizes the sequential recommendation task as a Markov Decision Process (MDP) and makes three major technical extensions in this framework, covering state representation, reward function and learning algorithm; according to the authors, it is the first work to explicitly discuss and utilize knowledge information in RL-based sequential recommenders, especially for the exploration process.
Abstract: For sequential recommendation, it is essential to capture and predict future or long-term user preference for generating accurate recommendation over time. To improve the predictive capacity, we adopt reinforcement learning (RL) for developing effective sequential recommenders. However, user-item interaction data is likely to be sparse, complicated and time-varying. It is not easy to directly apply RL techniques to improve the performance of sequential recommendation. Inspired by the availability of knowledge graph (KG), we propose a novel Knowledge-guidEd Reinforcement Learning model (KERL for short) for fusing KG information into an RL framework for sequential recommendation. Specifically, we formalize the sequential recommendation task as a Markov Decision Process (MDP), and make three major technical extensions in this framework, including state representation, reward function and learning algorithm. First, we propose to enhance the state representations with KG information considering both exploitation and exploration. Second, we carefully design a composite reward function that is able to compute both sequence- and knowledge-level rewards. Third, we propose a new algorithm for more effectively learning the proposed model. To our knowledge, it is the first time that knowledge information has been explicitly discussed and utilized in RL-based sequential recommenders, especially for the exploration process. Extensive experimental results on both next-item and next-session recommendation tasks show that our model can significantly outperform the baselines on four real-world datasets.

Proceedings ArticleDOI
25 Jul 2020
TL;DR: This paper designs a demonstration-based knowledge graph reasoning framework for explainable recommendation and proposes an ADversarial Actor-Critic model for the demonstration-guided path finding.
Abstract: Knowledge graphs have been widely adopted to improve recommendation accuracy. The multi-hop user-item connections on knowledge graphs also enable reasoning about why an item is recommended. However, reasoning on paths is a complex combinatorial optimization problem. Traditional recommendation methods usually adopt brute-force methods to find feasible paths, which results in issues related to convergence and explainability. In this paper, we address these issues by better supervising the path finding process. The key idea is to extract imperfect path demonstrations with minimum labeling efforts and effectively leverage these demonstrations to guide path finding. In particular, we design a demonstration-based knowledge graph reasoning framework for explainable recommendation. We also propose an ADversarial Actor-Critic (ADAC) model for the demonstration-guided path finding. Experiments on three real-world benchmarks show that our method converges more quickly than the state-of-the-art baseline and achieves better recommendation accuracy and explainability.

Proceedings ArticleDOI
25 Jul 2020
TL;DR: This work investigates the potential of leveraging a knowledge graph (KG), which provides rich side information for recommendation decision making, to deal with the issues of RL methods for IRS; it uses the prior knowledge of item correlations learned from the KG to guide candidate selection for better candidate item retrieval.
Abstract: Interactive recommender systems (IRS) have drawn huge attention because of their flexible recommendation strategies and their consideration of optimal long-term user experiences. To deal with dynamic user preferences and optimize accumulative utilities, researchers have introduced reinforcement learning (RL) into IRS. However, RL methods share a common issue of sample efficiency, i.e., a huge amount of interaction data is required to train an effective recommendation policy, which is caused by the sparse user responses and the large action space consisting of a large number of candidate items. Moreover, it is infeasible to collect much data with explorative policies in online environments, which will probably harm user experience. In this work, we investigate the potential of leveraging a knowledge graph (KG), which provides rich side information for recommendation decision making, in dealing with these issues of RL methods for IRS. Instead of learning RL policies from scratch, we make use of the prior knowledge of the item correlation learned from the KG to (i) guide the candidate selection for better candidate item retrieval, (ii) enrich the representation of items and user states, and (iii) propagate user preferences among the correlated items over the KG to deal with the sparsity of user feedback. Comprehensive experiments have been conducted on two real-world datasets, which demonstrate the superiority of our approach with significant improvements over state-of-the-art methods.
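Point (i), using the KG to guide candidate selection, can be illustrated with a bounded breadth-first search: instead of scoring the whole catalogue, the policy only considers items reachable within a few hops of the user's recent interactions. This is a minimal sketch under that assumption; the function `kg_candidates`, the adjacency-dict representation, and the `max_hops` cutoff are hypothetical, not the paper's retrieval procedure.

```python
from collections import deque


def kg_candidates(graph, seeds, items, max_hops=2):
    """Collect items reachable within max_hops of the user's recently clicked items.

    graph: dict node -> iterable of neighbor nodes (undirected KG adjacency)
    seeds: items the user recently interacted with (excluded from the result)
    items: set of nodes that are recommendable items (other nodes are KG entities)
    """
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    candidates = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
                if nb in items:
                    candidates.add(nb)
    return candidates
```

Shrinking the action space this way is one concrete route to the sample-efficiency gains the abstract targets, since the policy no longer has to learn to rule out unrelated items.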

Proceedings ArticleDOI
25 Jul 2020
TL;DR: A novel hyperbolic metric embedding (HME) model is proposed, which projects the check-in data into a hyperbolic space that can effectively capture the underlying hierarchical structures implied by the power-law distributions of user movements.
Abstract: With the increasing popularity of location-aware social media services, next-Point-of-Interest (POI) recommendation has gained significant research interest. The key challenge of next-POI recommendation is to precisely learn users' sequential movements from sparse check-in data. To this end, various embedding methods have been proposed to learn the representations of check-in data in the Euclidean space. However, their ability to learn complex patterns, especially hierarchical structures, is limited by the dimensionality of the Euclidean space. Instead, we propose a new research direction that aims to learn the representations of check-in activities in a hyperbolic space, which yields two advantages. First, it can effectively capture the underlying hierarchical structures, which are implied by the power-law distributions of user movements. Second, it provides high representative strength and enables the check-in data to be effectively represented in a low-dimensional space. Specifically, to solve the next-POI recommendation task, we propose a novel hyperbolic metric embedding (HME) model, which projects the check-in data into a hyperbolic space. HME jointly captures sequential transition, user preference, category and region information in a unified approach by learning embeddings in a shared hyperbolic space. To the best of our knowledge, this is the first study to explore a non-Euclidean embedding model for next-POI recommendation. We conduct extensive experiments on three check-in datasets to demonstrate the superiority of our hyperbolic embedding approach over the state-of-the-art next-POI recommendation algorithms. Moreover, we conduct experiments on another four online transaction datasets for next-item recommendation to further demonstrate the generality of our proposed model.
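The "high representative strength" of hyperbolic space comes from how distances grow near the boundary. As a sketch, assuming the Poincaré ball model (a common choice for hyperbolic embeddings; the abstract does not name the model), the geodesic distance is d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The function name below is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincare ball (norms must be < 1)."""

    def sq_norm(x):
        return sum(c * c for c in x)

    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))
```

From the origin, a point at radius 0.5 is at distance ln 3 (about 1.10), while a point at radius 0.9 is at ln 19 (about 2.94): the available "room" grows exponentially toward the boundary, which suits tree-like, power-law data where a few hubs sit near the centre and many leaves sit near the edge.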

Proceedings ArticleDOI
Yao Ma, Ziyi Guo, Zhaocun Ren, Jiliang Tang, Dawei Yin
25 Jul 2020
TL;DR: In this paper, the authors propose DyGNN, a dynamic graph neural network model that can model dynamic information as the graph evolves by continuously updating node information, coherently capturing the sequential information of edges (interactions), the time intervals between edges and information propagation.
Abstract: Graphs are used to model pairwise relations between entities in many real-world scenarios such as social networks. Graph Neural Networks (GNNs) have shown superior ability in learning representations for graph-structured data, which leads to performance improvements in many graph-related tasks such as link prediction, node classification and graph classification. Most existing graph neural network models are designed for static graphs, while many real-world graphs are inherently dynamic, with new nodes and edges constantly emerging. Existing graph neural network models cannot utilize the dynamic information, which has been shown to enhance the performance of many graph analytic tasks such as community detection. Hence, in this paper, we propose DyGNN, a Dynamic Graph Neural Network model, which can model the dynamic information as the graph evolves. In particular, the proposed framework keeps updating node information by coherently capturing the sequential information of edges (interactions), the time intervals between edges and information propagation. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.
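To show how the time interval between edges can enter a streaming node update, here is a minimal sketch, assuming each node keeps a state vector that decays exponentially since its last interaction and then absorbs a message from the new neighbor. The class `DecayingNodeMemory`, the decay form and `tau` are hypothetical; DyGNN's actual update and propagation units are learned and richer, and this sketch omits propagation to further neighbors entirely.

```python
import math


class DecayingNodeMemory:
    """Hypothetical streaming node-state update for a dynamic graph."""

    def __init__(self, dim, tau=1.0):
        self.dim = dim
        self.tau = tau          # time scale of the decay
        self.state = {}         # node -> state vector
        self.last_seen = {}     # node -> timestamp of its last interaction

    def _decayed(self, node, t):
        """Node state shrunk according to the time elapsed since its last interaction."""
        s = self.state.get(node, [0.0] * self.dim)
        if node not in self.last_seen:
            return s
        w = math.exp(-(t - self.last_seen[node]) / self.tau)
        return [w * x for x in s]

    def add_edge(self, u, v, t, feat_u, feat_v):
        """Process interaction (u, v) at time t; feat_x is the message sent by node x."""
        su, sv = self._decayed(u, t), self._decayed(v, t)
        self.state[u] = [a + b for a, b in zip(su, feat_v)]
        self.state[v] = [a + b for a, b in zip(sv, feat_u)]
        self.last_seen[u] = self.last_seen[v] = t
```

Edges are processed in timestamp order, so the state of a node reflects both the order of its interactions and how long ago each one happened.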

Proceedings ArticleDOI
25 Jul 2020
TL;DR: A local self-attention is proposed that considers a moving window over the document terms and, for each term, attends only to other terms in the same window; it increases the retrieval of longer documents at a moderate increase in compute and memory costs.
Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however, result in a biased system that under-retrieves longer documents. In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window. This local attention incurs a fraction of the compute and memory cost of attention over the whole document. The windowed approach also leads to more compact packing of padded documents in minibatches, resulting in additional savings. We also employ a learned saturation function and a two-staged pooling strategy to identify relevant regions of the document. The Transformer-Kernel pooling model with these changes can efficiently elicit relevance information from documents with thousands of tokens. We benchmark our proposed modifications on the document ranking task from the TREC 2019 Deep Learning track and observe significant improvements in retrieval quality as well as increased retrieval of longer documents at a moderate increase in compute and memory costs.
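The cost argument for local attention is easy to see in code: each position only computes scores against positions inside its window, so work grows as n times the window size rather than n squared. This single-head sketch assumes a sliding window of `radius` terms on each side of the current term, one common way to realize local attention; the paper's exact windowing scheme, projections and multi-head setup are not reproduced, and `local_self_attention` is a hypothetical name.

```python
import math


def local_self_attention(q, k, v, radius):
    """Scaled dot-product attention where position i attends only to j with |i - j| <= radius.

    q, k, v: equal-length lists of vectors (lists of floats); q and k share a dimension.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = range(lo, hi)
        logits = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in window]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, window)) / z
                    for c in range(len(v[0]))])
    return out
```

With all-zero queries and keys the weights inside each window are uniform, so each output is simply the mean of the values within `radius` positions; no position ever reads values outside its window.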

Proceedings ArticleDOI
Jiarui Qin, Weinan Zhang, Xin Wu, Jiarui Jin, Yuchen Fang, Yong Yu
25 Jul 2020
TL;DR: In UBR4CTR, the most relevant and appropriate user behaviors are first retrieved from the entire user history sequence using a learnable search method and then fed into a deep model to make the final prediction, instead of simply using the most recent ones.
Abstract: Click-through rate (CTR) prediction plays a key role in modern online personalization services. In practice, it is necessary to capture users' drifting interests by modeling sequential user behaviors to build an accurate CTR prediction model. However, as users accumulate more and more behavioral data on the platforms, it becomes non-trivial for sequential models to make use of the whole behavior history of each user. First, directly feeding in the long behavior sequence makes online inference time and system load infeasible. Second, such long histories contain much noise, which can cause sequential model learning to fail. Current industrial solutions mainly truncate the sequences and feed only recent behaviors to the prediction model, which is problematic because sequential patterns such as periodicity or long-term dependencies are often embedded not in the most recent behaviors but far back in the history. To tackle these issues, in this paper we approach the problem from the data perspective instead of just designing more sophisticated models, and propose the User Behavior Retrieval for CTR prediction (UBR4CTR) framework. In UBR4CTR, the most relevant and appropriate user behaviors are first retrieved from the entire user history sequence using a learnable search method. These retrieved behaviors are then fed into a deep model to make the final prediction instead of simply using the most recent ones. UBR4CTR is highly feasible to deploy into industrial model pipelines at low cost. Experiments on three real-world large-scale datasets demonstrate the superiority and efficacy of our proposed framework and models.
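The contrast between retrieval and truncation can be sketched with a non-learned stand-in for the search step: score every past behavior by how many features it shares with the target item and keep the top k, breaking ties by recency. Feature overlap replaces the paper's learnable search method here, and `retrieve_behaviors` and the feature-string format are hypothetical.

```python
def retrieve_behaviors(history, target_features, k):
    """Pick up to k past behaviors most relevant to the target item, not the k most recent.

    history: list of (timestamp, set_of_features) in chronological order
    target_features: set of feature values of the candidate item (e.g., category, brand)
    Behaviors sharing no feature with the target are never returned.
    """
    scored = [(len(feats & target_features), ts, feats) for ts, feats in history]
    # Sort by overlap, then by recency; the key avoids comparing the sets themselves.
    scored.sort(key=lambda x: (x[0], x[1]), reverse=True)
    return [(ts, feats) for overlap, ts, feats in scored[:k] if overlap > 0]
```

For a history of [(1, {shoes, brand A}), (2, {phone}), (3, {case}), (4, {shoes})] and a target shoe from brand A, retrieval with k = 2 returns the behaviors at timestamps 1 and 4, whereas keeping only the two most recent behaviors would return 3 and 4 and miss the older, more relevant purchase.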