Author

Xikun Zhang

Other affiliations: Stanford University
Bio: Xikun Zhang is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research on topics including Physics and Embedding, has an h-index of 7, and has co-authored 9 publications receiving 307 citations. Previous affiliations of Xikun Zhang include Stanford University.
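For context, an h-index of 7 means the author has 7 papers with at least 7 citations each. The snippet below is a generic illustration of how the metric is computed from a list of citation counts, not SciSpace's own code.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Example: six papers with these citation counts give an h-index of 4.
print(h_index([10, 8, 5, 4, 2, 1]))  # -> 4
```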

Papers
Proceedings ArticleDOI
TL;DR: JODIE is a coupled recurrent neural network model that uses two RNNs to update the embeddings of a user and an item at every interaction and projects these embeddings forward in time to predict future user-item interactions.
Abstract: Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space. Here we propose JODIE, a coupled recurrent neural network model that learns the embedding trajectories of users and items. JODIE employs two recurrent neural networks to update the embedding of a user and an item at every interaction. Crucially, JODIE also models the future embedding trajectory of a user/item. To this end, it introduces a novel projection operator that learns to estimate the embedding of the user at any time in the future. These estimated embeddings are then used to predict future user-item interactions. To make the method scalable, we develop a t-Batch algorithm that creates time-consistent batches and leads to 9x faster training. We conduct six experiments to validate JODIE on two prediction tasks---future interaction prediction and state change prediction---using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by at least 20% in predicting future interactions and 12% in state change prediction.

297 citations
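To make the projection idea from the abstract concrete, here is a minimal sketch assuming the future embedding is obtained by scaling the current embedding element-wise with a learned function of the elapsed time; the class name, shapes, and usage are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class ProjectionOperator(nn.Module):
    """Sketch of a JODIE-style projection (illustrative, not the reference code):
    the current embedding is scaled element-wise by a learned linear
    function of the time elapsed since the last interaction."""

    def __init__(self, embedding_dim):
        super().__init__()
        self.time_to_drift = nn.Linear(1, embedding_dim)  # scalar gap -> per-dim drift

    def forward(self, embedding, delta_t):
        # embedding: (batch, embedding_dim); delta_t: (batch, 1) elapsed time
        drift = self.time_to_drift(delta_t)
        return (1.0 + drift) * embedding  # estimated future embedding

op = ProjectionOperator(embedding_dim=128)
future = op(torch.randn(4, 128), torch.rand(4, 1))  # usage example
```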

Posted Content
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

76 citations

Journal ArticleDOI
TL;DR: In this article, a hole-based double quantum dot in a germanium hut wire (GHW) is shown to achieve a Rabi frequency exceeding 540 MHz at a magnetic field of 100 mT, setting a record for ultrafast spin qubit control in semiconductor systems.
Abstract: Operation speed and coherence time are two core measures for the viability of a qubit. Strong spin-orbit interaction (SOI) and relatively weak hyperfine interaction make holes in germanium (Ge) intriguing candidates for spin qubits with rapid, all-electrical coherent control. Here we report ultrafast single-spin manipulation in a hole-based double quantum dot in a germanium hut wire (GHW). Mediated by the strong SOI, a Rabi frequency exceeding 540 MHz is observed at a magnetic field of 100 mT, setting a record for ultrafast spin qubit control in semiconductor systems. We demonstrate that the strong SOI of heavy holes (HHs) in our GHW, characterized by a very short spin-orbit length of 1.5 nm, enables the rapid gate operations we accomplish. Our results demonstrate the potential of ultrafast coherent control of hole spin qubits to meet the requirement of DiVincenzo’s criteria for a scalable quantum information processor.

39 citations
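For a rough sense of scale (a back-of-the-envelope estimate, not a number reported in the paper), a Rabi frequency of 540 MHz corresponds to a spin-flip (π-rotation) time of about a nanosecond, since a π rotation takes half a Rabi period:

```latex
t_{\pi} = \frac{1}{2 f_{\mathrm{Rabi}}} = \frac{1}{2 \times 540\,\mathrm{MHz}} \approx 0.93\,\mathrm{ns}
```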

Posted Content
TL;DR: This work identifies contextual information in pre-training and numeracy as two key factors affecting how well pretrained language models capture scalar magnitudes, and shows that a simple method of canonicalizing numbers can have a significant effect on the results.
Abstract: Pretrained Language Models (LMs) have been shown to possess significant linguistic, common sense, and factual knowledge. One form of knowledge that has not been studied yet in this context is information about the scalar magnitudes of objects. We show that pretrained language models capture a significant amount of this information but are short of the capability required for general common-sense reasoning. We identify contextual information in pre-training and numeracy as two key factors affecting their performance and show that a simple method of canonicalizing numbers can have a significant effect on the results.

34 citations
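The abstract does not spell out the canonicalization scheme; as one plausible illustration, the sketch below rewrites numerals in text into scientific-notation tokens so that magnitudes surface explicitly. The exact format used in the paper may differ.

```python
import re

def canonicalize_numbers(text):
    """Illustrative canonicalization (assumed scheme, not necessarily the
    paper's): replace each numeral with a scientific-notation token that
    exposes its order of magnitude, e.g. '7500' -> '7.5e+03'."""
    def repl(match):
        value = float(match.group(0))
        return f"{value:.1e}"
    return re.sub(r"\d+(?:\.\d+)?", repl, text)

print(canonicalize_numbers("An elephant weighs about 7500 kg."))
# -> "An elephant weighs about 7.5e+03 kg."
```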


Cited by
Journal ArticleDOI
TL;DR: This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG by providing a broad overview of the research progress and challenges in the hallucination problem in NLG.
Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions, and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.

314 citations

Journal ArticleDOI
30 Jan 2020
TL;DR: The authors examined and analyzed the linguistic and psychological features of political discourse using a computer-based Linguistic Inquiry and Word Count (LIWC) content analysis program to explore the relationship between political discourse and the personality of politicians.
Abstract: The article examines and analyzes the linguistic and psychological features of political discourse using a computer-based Linguistic Inquiry and Word Count (LIWC) content analysis program to explore the relationship between political discourse and the personality of politicians. As for political discourse, it is perhaps the communicator, the linguistic personality, who plays the most important role in the communication. The linguistic personality of a politician is of particular interest in political discourse content analysis, since it has the greatest influence on the public consciousness via mass media. Using text as a source of psychological and cognitive information has been gaining popularity. Researchers use a variety of methods to analyze texts, but Linguistic Inquiry and Word Count (LIWC) has proved to be the most common technique. The analysis of linguistic patterns of political discourse shows that in the context of political speech events such as media interviews, politicians make a unique choice of lexical units, which can be interpreted as a manifestation of certain personality traits. However, despite the significance of the results, there are clear limitations to the use of computerized methodologies for political discourse content analysis, such as the limited interpretive capacity of software to understand pragmatic and contextual use of lexical units.

286 citations
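LIWC itself is a proprietary program, but the core technique it implements is dictionary-based word counting per psychological category. The sketch below uses a made-up mini-dictionary (the categories and word lists are toy examples, not LIWC's) to show the kind of per-category percentages such a content analysis produces.

```python
# Minimal dictionary-based word counting in the spirit of LIWC.
# Categories and lexicons here are illustrative toys, not LIWC's own.
CATEGORIES = {
    "positive_emotion": {"good", "great", "hope", "proud"},
    "negative_emotion": {"bad", "fear", "crisis", "failed"},
    "first_person": {"i", "me", "my", "we", "our"},
}

def analyze(text):
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = len(words) or 1
    # Percentage of words in the text that fall into each category.
    return {cat: 100.0 * sum(w in lexicon for w in words) / total
            for cat, lexicon in CATEGORIES.items()}

print(analyze("We have great hope, and we will not let fear win."))
```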

Posted Content
TL;DR: This paper presents Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events that significantly outperform previous approaches being at the same time more computationally efficient.
Abstract: Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g., evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches while being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.

238 citations
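As a rough sketch of the memory-module idea (simplified, not the reference TGN implementation), each node keeps a memory vector that a recurrent cell updates whenever the node takes part in a timed interaction event; the class, dimensions, and message construction below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NodeMemory(nn.Module):
    """Simplified TGN-style memory module (illustrative only): each node keeps
    a memory vector that a GRU cell updates from a message describing the
    interaction event the node just took part in."""

    def __init__(self, num_nodes, memory_dim, message_dim):
        super().__init__()
        self.register_buffer("memory", torch.zeros(num_nodes, memory_dim))
        self.updater = nn.GRUCell(message_dim, memory_dim)

    @torch.no_grad()
    def update(self, nodes, messages):
        # nodes: (batch,) node indices; messages: (batch, message_dim)
        # A full implementation would also encode event times into the message.
        self.memory[nodes] = self.updater(messages, self.memory[nodes])

mem = NodeMemory(num_nodes=1000, memory_dim=64, message_dim=32)
mem.update(torch.tensor([3, 7]), torch.randn(2, 32))  # two interaction events
```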

Posted Content
TL;DR: This paper provides a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances.
Abstract: Time is an important feature in many applications involving events that occur synchronously and/or asynchronously. To effectively consume time information, recent studies have focused on designing new architectures. In this paper, we take an orthogonal but complementary approach by providing a model-agnostic vector representation for time, called Time2Vec, that can be easily imported into many existing and future architectures and improve their performances. We show on a range of models and problems that replacing the notion of time with its Time2Vec representation improves the performance of the final model.

147 citations
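The construction, as commonly summarized, pairs one linear component with k periodic components of the scalar time (sine being the periodic function reported to work well). The NumPy sketch below is an illustrative rendering of that idea with made-up parameter values; in practice the frequencies and phases are learned.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec-style encoding of a scalar time tau: index 0 is a linear
    term, the remaining indices are periodic (sine). Parameters omega and
    phi are learned in practice; here they are random placeholders."""
    linear = omega[0] * tau + phi[0]
    periodic = np.sin(omega[1:] * tau + phi[1:])
    return np.concatenate([[linear], periodic])

rng = np.random.default_rng(0)
k = 8  # number of periodic components
print(time2vec(3.5, rng.standard_normal(k + 1), rng.standard_normal(k + 1)))
```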

Journal ArticleDOI
TL;DR: This work establishes a foundation of dynamic networks with consistent, detailed terminology and notation and presents a comprehensive survey of dynamic graph neural network models using the proposed terminology.
Abstract: Dynamic networks are used in a wide range of fields, including social network analysis, recommender systems and epidemiology. Representing complex networks as structures changing over time allows network models to leverage not only structural but also temporal patterns. However, as dynamic network literature stems from diverse fields and makes use of inconsistent terminology, it is challenging to navigate. Meanwhile, graph neural networks (GNNs) have gained a lot of attention in recent years for their ability to perform well on a range of network science tasks, such as link prediction and node classification. Despite the popularity of graph neural networks and the proven benefits of dynamic network models, there has been little focus on graph neural networks for dynamic networks. To address the challenges resulting from the fact that this research crosses diverse fields as well as to survey dynamic graph neural networks, this work is split into two main parts. First, to address the ambiguity of the dynamic network terminology we establish a foundation of dynamic networks with consistent, detailed terminology and notation. Second, we present a comprehensive survey of dynamic graph neural network models using the proposed terminology.

144 citations
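Two representations that surveys of this area typically distinguish are discrete-time snapshot sequences and continuous-time event streams. The toy structures below illustrate the difference; the names and fields are illustrative, not the survey's own notation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Discrete-time view: the dynamic graph is a sequence of snapshots.
Snapshot = List[Tuple[int, int]]          # edge list at one time step
snapshots: List[Snapshot] = [
    [(0, 1)],                             # t = 0
    [(0, 1), (1, 2)],                     # t = 1
]

# Continuous-time view: the dynamic graph is a stream of timestamped events.
@dataclass
class Event:
    src: int
    dst: int
    timestamp: float

events = [Event(0, 1, 0.3), Event(1, 2, 1.7), Event(0, 2, 2.4)]
```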