Author

Weixin Liang

Bio: Weixin Liang is an academic researcher at Stanford University. The author has contributed to research on topics including Computer science and Dialog box. The author has an h-index of 7 and has co-authored 19 publications that have received 138 citations. Previous affiliations of Weixin Liang include Zhejiang University.

Topics: Computer science, Dialog box, Natural language, Dialog system, Cache

Papers

Journal Article

Advances, challenges and opportunities in creating data for trustworthy AI

Weixin Liang, Girmaw Abebe Tadesse, Daniel Ho, Li Fei-Fei, Matei Zaharia, Ce Zhang, James Zou

01 Aug 2022 - Nature Machine Intelligence

TL;DR: This Perspective discusses key considerations for each stage of the data-for-AI pipeline—starting from data design to data sculpting (for example, cleaning, valuation and annotation) and data evaluation—to make AI more reliable.

65 citations

Proceedings Article

Improving Out-of-Distribution Robustness via Selective Augmentation

Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn

02 Jan 2022

TL;DR: LISA is evaluated empirically and shown to consistently outperform other state-of-the-art methods and to yield more invariant predictors; an analysis of a linear setting shows theoretically how LISA leads to a smaller worst-group error.

Abstract: Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of subpopulation shifts (e.g., imbalanced data) and domain shifts. While prior works often seek to explicitly regularize internal representations or predictors of the model to be domain invariant, we instead aim to learn invariant predictors without restricting the model's internal representations or predictors. This leads to a simple mixup-based technique which learns invariant predictors via selective augmentation called LISA. LISA selectively interpolates samples either with the same labels but different domains or with the same domain but different labels. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and we find that LISA consistently outperforms other state-of-the-art methods and leads to more invariant predictors. We further analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error.
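
The selective interpolation described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 50/50 choice between the two strategies (`p_same_label`), and drawing the mixing weight from Beta(alpha, alpha) as in standard mixup are all assumptions made for illustration.

```python
import random


def pick_partner(i, labels, domains, same_label, rng):
    """Return the index of a partner for sample i, or None if none qualifies."""
    if same_label:
        # Same label, different domain.
        candidates = [j for j in range(len(labels))
                      if labels[j] == labels[i] and domains[j] != domains[i]]
    else:
        # Same domain, different label.
        candidates = [j for j in range(len(labels))
                      if domains[j] == domains[i] and labels[j] != labels[i]]
    return rng.choice(candidates) if candidates else None


def interpolate(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def one_hot(label, num_classes):
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def selective_mixup(i, features, labels, domains, num_classes,
                    p_same_label=0.5, alpha=2.0, rng=None):
    """Mix sample i with a selectively chosen partner (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)
    same_label = rng.random() < p_same_label  # assumed strategy choice
    j = pick_partner(i, labels, domains, same_label, rng)
    if j is None:
        # No valid partner: fall back to the unmixed sample.
        return list(features[i]), one_hot(labels[i], num_classes)
    lam = rng.betavariate(alpha, alpha)  # assumed, as in standard mixup
    x = interpolate(features[i], features[j], lam)
    y = interpolate(one_hot(labels[i], num_classes),
                    one_hot(labels[j], num_classes), lam)
    return x, y
```

When the partner shares the label, the mixed target stays one-hot and only the domain-specific part of the input changes; when the partner shares the domain, the target becomes a soft label between the two classes.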

60 citations

Proceedings Article

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, James Zou

03 Mar 2022

TL;DR: Modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models, is presented, and it is demonstrated that varying the modality gap distance has a significant impact in improving the model’s downstream zero-shot classification performance and fairness.

Abstract: We present modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models. Specifically, we show that different data modalities (e.g. images and text) are embedded at arm's length in their shared representation in multi-modal models such as CLIP. Our systematic analysis demonstrates that this gap is caused by a combination of model initialization and contrastive learning optimization. In model initialization, we show empirically and theoretically that the representation of a common deep neural network is restricted to a narrow cone. As a consequence, in a multi-modal model with two encoders, the representations of the two modalities are clearly apart when the model is initialized. During optimization, contrastive learning keeps the different modalities separate by a certain distance, which is influenced by the temperature parameter in the loss function. Our experiments further demonstrate that varying the modality gap distance has a significant impact in improving the model's downstream zero-shot classification performance and fairness. Our code and data are available at https://modalitygap.readthedocs.io/
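
To make the idea of two modalities sitting "at arm's length" concrete, the sketch below measures one possible gap statistic: the Euclidean distance between the centroids of the L2-normalized embeddings of each modality. The abstract does not spell out a formula, so this choice of statistic and the function names are assumptions for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between the centroids of the normalized embeddings (assumed statistic)."""
    img_center = centroid([l2_normalize(v) for v in image_embs])
    txt_center = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(img_center, txt_center)
```

A value near zero would indicate that the two sets of embeddings are centered on the same region of the shared space, while a larger value indicates that the modalities occupy separate regions.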

59 citations

Journal Article

MOSS: End-to-End Dialog System Framework with Modular Supervision

Weixin Liang¹, Youzhi Tian¹, Chengcai Chen, Zhou Yu²•Institutions (2)

Zhejiang University¹, University of California, Davis²

03 Apr 2020

TL;DR: In this paper, an encoder-decoder training framework that could incorporate supervision from various intermediate dialog system modules including natural language understanding, dialog state tracking, dialog policy learning and natural language generation was proposed.

Abstract: A major bottleneck in training end-to-end task-oriented dialog systems is the lack of data. To utilize limited training data more efficiently, we propose Modular Supervision Network (MOSS), an encoder-decoder training framework that could incorporate supervision from various intermediate dialog system modules including natural language understanding, dialog state tracking, dialog policy learning and natural language generation. With only 60% of the training data, MOSS-all (i.e., MOSS with supervision from all four dialog modules) outperforms state-of-the-art models on CamRest676. Moreover, introducing modular supervision has even bigger benefits when the dialog task has a more complex dialog state and action space. With only 40% of the training data, MOSS-all outperforms the state-of-the-art model on a complex laptop network troubleshooting dataset, LaptopNetwork, that we introduced. LaptopNetwork consists of conversations between real customers and customer service agents in Chinese. Moreover, the MOSS framework can accommodate dialogs that have supervision from different dialog modules at both the framework level and the model level. Therefore, MOSS is extremely flexible to update in real-world deployment.

54 citations

Proceedings Article

DeepStore: In-Storage Acceleration for Intelligent Queries

Vikram Sharma Mailthody¹, Zaid Qureshi¹, Weixin Liang², Ziyan Feng¹, Simon Garcia de Gonzalo¹, Youjie Li¹, Hubertus Franke³, Jinjun Xiong³, Jian Huang¹, Wen-mei W. Hwu¹•Institutions (3)

University of Illinois at Urbana–Champaign¹, Stanford University², IBM³

12 Oct 2019

TL;DR: DeepStore, an in-storage accelerator architecture for intelligent queries, is presented; its accelerators are designed specifically to support DNN-based intelligent queries under the resource constraints of modern SSD controllers, and it exploits SSD parallelism with design space exploration to maximize the energy efficiency of in-storage accelerators.

Abstract: Recent advancements in deep learning techniques facilitate intelligent-query support in diverse applications, such as content-based image retrieval and audio texturing. Unlike conventional key-based queries, these intelligent queries lack efficient indexing and require complex compute operations for feature matching. To achieve high-performance intelligent querying against massive datasets, modern computing systems employ GPUs in conjunction with solid-state drives (SSDs) for fast data access and parallel data processing. However, our characterization with various intelligent-query workloads developed with deep neural networks (DNNs) shows that the storage I/O bandwidth is still the major bottleneck, contributing 56%-90% of the query execution time. To this end, we present DeepStore, an in-storage accelerator architecture for intelligent queries. It consists of (1) energy-efficient in-storage accelerators designed specifically for supporting DNN-based intelligent queries, under the resource constraints in modern SSD controllers; (2) a similarity-based in-storage query cache to exploit the temporal locality of user queries for further performance improvement; and (3) a lightweight in-storage runtime system working as the query engine, which provides a simple software abstraction to support different types of intelligent queries. DeepStore exploits SSD parallelisms with design space exploration for achieving the maximal energy efficiency for in-storage accelerators. We validate the DeepStore design with an SSD simulator, and evaluate it with a variety of vision-, text-, and audio-based intelligent queries. Compared with the state-of-the-art GPU+SSD approach, DeepStore improves the query performance by up to 17.7×, and energy-efficiency by up to 78.6×.
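
The similarity-based query cache in item (2) of the abstract above can be pictured with a small software sketch. It is only a conceptual illustration, not the DeepStore design: the class name, the cosine-similarity test, the 0.95 threshold, and the least-recently-used eviction policy are all assumptions.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimilarityQueryCache:
    """Hypothetical cache that reuses results of sufficiently similar queries."""

    def __init__(self, capacity=4, threshold=0.95):
        self.capacity = capacity
        self.threshold = threshold
        self._entries = OrderedDict()  # query feature tuple -> cached result

    def get(self, query):
        """Return the result of the most similar cached query, or None on a miss."""
        best_key, best_sim = None, self.threshold
        for key in self._entries:
            sim = cosine(query, key)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None
        self._entries.move_to_end(best_key)  # mark as recently used
        return self._entries[best_key]

    def put(self, query, result):
        """Store a result, evicting the least recently used entry when full."""
        key = tuple(query)
        self._entries[key] = result
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

Because repeated or near-duplicate queries tend to arrive close together in time, a hit in such a cache avoids re-running feature matching over the stored dataset.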

45 citations

Cited by

Posted Content

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford¹, Jong Wook Kim¹, Chris Hallacy¹, Aditya Ramesh¹, Gabriel Goh¹, Sandhini Agarwal¹, Girish Sastry¹, Amanda Askell, Pamela Mishkin¹, Jack Clark¹, Gretchen Krueger¹, Ilya Sutskever¹•Institutions (1)

OpenAI¹

26 Feb 2021 - arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a pre-training task of predicting which caption goes with which image is used to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.

Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at this https URL.

403 citations

Posted Content

A Simple Language Model for Task-Oriented Dialogue

Ehsan Hosseini-Asl¹, Bryan McCann¹, Chien-Sheng Wu¹, Semih Yavuz¹, Richard Socher¹•Institutions (1)

Salesforce.com¹

02 May 2020 - arXiv: Computation and Language

TL;DR: SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem, which allows it to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2.

Abstract: Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art by 0.49 points in joint goal accuracy for dialogue state tracking. More impressively, SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting for task-oriented dialog systems: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.

313 citations

Posted Content

A Survey of Evaluation Metrics Used for NLG Systems

Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra

27 Aug 2020 - arXiv: Computation and Language

TL;DR: This survey of automatic evaluation metrics for evaluating Natural Language Generation (NLG) systems highlights the challenges, proposes a coherent taxonomy for organising existing evaluation metrics, and briefly describes different existing metrics.

Abstract: The success of Deep Learning has created a surge in interest in a wide range of Natural Language Generation (NLG) tasks. Deep Learning has not only pushed the state of the art in several existing NLG tasks but has also enabled researchers to explore various newer NLG tasks such as image captioning. Such rapid progress in NLG has necessitated the development of accurate automatic evaluation metrics that would allow us to track the progress in the field of NLG. However, unlike classification tasks, automatically evaluating NLG systems is in itself a huge challenge. Several works have shown that early heuristic-based metrics such as BLEU and ROUGE are inadequate for capturing the nuances in the different NLG tasks. The expanding number of NLG models and the shortcomings of the current metrics have led to a rapid surge in the number of evaluation metrics proposed since 2014. Moreover, various evaluation metrics have shifted from using pre-determined heuristic-based formulae to trained transformer models. This rapid change in a relatively short time has led to the need for a survey of the existing NLG metrics to help existing and new researchers quickly come up to speed with the developments that have happened in NLG evaluation in the last few years. Through this survey, we first wish to highlight the challenges and difficulties in automatically evaluating NLG systems. Then, we provide a coherent taxonomy of the evaluation metrics to organize the existing metrics and to better understand the developments in the field. We also describe the different metrics in detail and highlight their key contributions. Later, we discuss the main shortcomings identified in the existing metrics and describe the methodology used to evaluate evaluation metrics. Finally, we discuss our suggestions and recommendations on the next steps forward to improve the automatic evaluation metrics.

96 citations

Posted Content

UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

Yunyi Yang, Yunhao Li¹, Xiaojun Quan¹•Institutions (1)

Sun Yat-sen University¹

07 Dec 2020 - arXiv: Computation and Language

TL;DR: Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life.

Abstract: This paper presents our task-oriented dialog system UBAR which models task-oriented dialogs on a dialog session level. Specifically, UBAR is acquired by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session which is composed of user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to user utterances and all content it generated such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performances in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine the transfer ability of UBAR to new domains with limited data and provide visualization and a case study to illustrate the advantages of UBAR in modeling on a dialog session level.
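
The session-level sequence described in the abstract above, in which every turn contributes its user utterance, belief state, database result, system act, and system response, can be sketched as a simple flattening step. The tag format (`<user>`, `</user>`, and so on), the dictionary keys, and the function names are hypothetical; the abstract does not specify how the segments are delimited.

```python
# Order of segments within one turn, following the abstract's description.
SEGMENT_ORDER = ("user", "belief", "db", "act", "response")


def flatten_turn(turn):
    """Serialize one dialog turn as tagged segments in a fixed order."""
    return " ".join(f"<{name}> {turn[name]} </{name}>" for name in SEGMENT_ORDER)


def flatten_session(turns):
    """Concatenate all turns of a session into one training sequence."""
    return " ".join(flatten_turn(turn) for turn in turns)


# Toy two-turn session with made-up content.
session = [
    {"user": "i need a cheap hotel", "belief": "hotel price=cheap",
     "db": "3 matches", "act": "request area",
     "response": "which area do you prefer?"},
    {"user": "the north please", "belief": "hotel price=cheap area=north",
     "db": "1 match", "act": "inform name",
     "response": "i found one hotel in the north."},
]
sequence = flatten_session(session)
```

A language model fine-tuned on such sequences sees the whole session as one string, so later turns are conditioned on the earlier belief states, acts, and responses rather than on user utterances alone.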

85 citations

Proceedings Article

DSAGEN: synthesizing programmable spatial accelerators

Jian Weng¹, Sihao Liu¹, Vidushi Dadu¹, Zhengrong Wang¹, Preyas Shah, Tony Nowatzki¹•Institutions (1)

University of California, Los Angeles¹

30 May 2020

TL;DR: The insight is that many prior accelerator architectures can be approximated by composing a small number of hardware primitives, specifically those from spatial architectures; this insight is used to develop the DSAGEN framework, which automates the hardware/software co-design process for reconfigurable accelerators.

Abstract: Domain-specific hardware accelerators can provide orders of magnitude speedup and energy efficiency over general purpose processors. However, they require extensive manual effort in hardware design and software stack development. Automated ASIC generation (e.g., HLS) can be insufficient, because the hardware becomes inflexible. An ideal accelerator generation framework would be automatable, enable deep specialization to the domain, and maintain a uniform programming interface. Our insight is that many prior accelerator architectures can be approximated by composing a small number of hardware primitives, specifically those from spatial architectures. With careful design, a compiler can understand how to use available primitives, with modular and composable transformations, to take advantage of the features of a given program. This suggests a paradigm where accelerators can be generated by searching within such a rich accelerator design space, guided by the affinity of input programs for hardware primitives and their interactions. We use this approach to develop the DSAGEN framework, which automates the hardware/software co-design process for reconfigurable accelerators. For several existing accelerators, our evaluation demonstrates that the compiler can achieve 89% of the performance of manually tuned versions. For automated design space exploration, we target multiple sets of workloads which prior accelerators are designed for; the generated hardware has a mean of 1.3× perf²/mm² over prior programmable accelerators.

73 citations
