Top 43 papers published by Kai-Wei Chang from University of California, Los Angeles in 2020

Proceedings Article•DOI•

A Transformer-based Approach for Source Code Summarization

[...]

Wasi Uddin Ahmad¹, Saikat Chakraborty², Baishakhi Ray², Kai-Wei Chang¹•Institutions (2)

University of California, Los Angeles¹, Columbia University²

01 Jul 2020

TL;DR: This work explores the Transformer model that uses a self-attention mechanism and has shown to be effective in capturing long-range dependencies in source code summarization, and shows that despite the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.

...read moreread less

Abstract: Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model that uses a self-attention mechanism and has shown to be effective in capturing long-range dependencies. In this work, we show that despite the approach is simple, it outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., the absolute encoding of source code tokens' position hinders, while relative encoding significantly improves the summarization performance. We have made our code publicly available (https://github.com/wasiahmad/NeuralCodeSum) to facilitate future research.

...read moreread less

229 citations

Posted Content•

GPT-GNN: Generative Pre-Training of Graph Neural Networks

[...]

Ziniu Hu¹, Yuxiao Dong², Kuansan Wang², Kai-Wei Chang¹, Yizhou Sun¹ - Show less +1 more•Institutions (2)

University of California, Los Angeles¹, Microsoft²

27 Jun 2020-arXiv: Learning

TL;DR: The GPT-GNN framework to initialize GNNs by generative pre-training introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph.

...read moreread less

Abstract: Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data However, training GNNs usually requires abundant task-specific labeled data, which is often arduously expensive to obtain One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabeled data with self-supervision and then transfer the learned model to downstream tasks with only a few labels In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph We factorize the likelihood of the graph generation into two components: 1) Attribute Generation and 2) Edge Generation By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process Comprehensive experiments on the billion-scale Open Academic Graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 91% across various downstream tasks

...read moreread less

183 citations

Proceedings Article•DOI•

GPT-GNN: Generative Pre-Training of Graph Neural Networks

[...]

Ziniu Hu¹, Yuxiao Dong², Kuansan Wang², Kai-Wei Chang¹, Yizhou Sun¹ - Show less +1 more•Institutions (2)

University of California, Los Angeles¹, Microsoft²

23 Aug 2020

TL;DR: GPT-GNN as discussed by the authors introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph.

...read moreread less

Abstract: Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision and then transfer the learned model to downstream tasks with only a few labels. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation and 2) edge generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale open academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks?

...read moreread less

153 citations

Proceedings Article•

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

[...]

Kaidi Xu¹, Zhouxing Shi², Huan Zhang³, Yihan Wang², Kai-Wei Chang³, Minlie Huang², Bhavya Kailkhura⁴, Xue Lin¹, Cho-Jui Hsieh³ - Show less +5 more•Institutions (4)

Northeastern University¹, Tsinghua University², University of California, Los Angeles³, ETH Zurich⁴

01 Jan 2020

TL;DR: This work develops an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs and yields an open-source library for the community to applyLiRPA to areas beyond certified defense without much LiR PA expertise.

...read moreread less

Abstract: Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods focus on simple feed-forward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior works. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet where previous approaches cannot scale to due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a probably flat optimization landscape by applying LiRPA to network parameters. Our opensource library is available at this https URL.

...read moreread less

125 citations

Proceedings Article•DOI•

What Does BERT with Vision Look At

[...]

Liunian Harold Li¹, Mark Yatskar², Da Yin¹, Cho-Jui Hsieh³, Kai-Wei Chang³ - Show less +1 more•Institutions (3)

Peking University¹, Allen Institute for Artificial Intelligence², University of California, Los Angeles³

01 Jul 2020

TL;DR: It is demonstrated that certain attention heads of a visually grounded language model actively ground elements of language to image regions, performing the task known as entity grounding.

...read moreread less

Abstract: Pre-trained visually grounded language models such as ViLBERT, LXMERT, and UNITER have achieved significant performance improvement on vision-and-language tasks but what they learn during pre-training remains unclear. In this work, we demonstrate that certain attention heads of a visually grounded language model actively ground elements of language to image regions. Specifically, some heads can map entities to image regions, performing the task known as entity grounding. Some heads can even detect the syntactic relations between non-entity words and image regions, tracking, for example, associations between verbs and regions corresponding to their arguments. We denote this ability as syntactic grounding. We verify grounding both quantitatively and qualitatively, using Flickr30K Entities as a testbed.

...read moreread less

93 citations

Posted Content•

Towards Controllable Biases in Language Generation

[...]

Emily Sheng¹, Kai-Wei Chang¹, Premkumar Natarajan², Nanyun Peng²•Institutions (2)

Information Sciences Institute¹, University of California, Los Angeles²

01 May 2020-arXiv: Computation and Language

TL;DR: This paper developed a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups, and analyzed two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.

...read moreread less

Abstract: We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.

...read moreread less

76 citations

Proceedings Article•DOI•

SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

[...]

Da Yin¹, Tao Meng², Kai-Wei Chang²•Institutions (2)

Peking University¹, University of California, Los Angeles²

08 May 2020

TL;DR: SentiBERT is better than baseline approaches in capturing negation and the contrastive relation and model the compositional sentiment semantics, and can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification tasks.

...read moreread less

Abstract: We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representation with binary constituency parse tree to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification tasks. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches in capturing negation and the contrastive relation and model the compositional sentiment semantics.

...read moreread less

70 citations

Posted Content•

Robustness Verification for Transformers

[...]

Zhouxing Shi¹, Huan Zhang², Kai-Wei Chang², Minlie Huang¹, Cho-Jui Hsieh² - Show less +1 more•Institutions (2)

Tsinghua University¹, University of California, Los Angeles²

16 Feb 2020-arXiv: Learning

TL;DR: In this article, the authors consider the robustness verification problem for Transformers and develop the first robustness algorithm for Transformers, which is significantly tighter than those by naive Interval Bound Propagation.

...read moreread less

Abstract: Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only handle neural networks with relatively simple architectures. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous works. We resolve these challenges and develop the first robustness verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the importance of different words in sentiment analysis.

...read moreread less

56 citations

Proceedings Article•DOI•

Towards Controllable Biases in Language Generation

[...]

Emily Sheng¹, Kai-Wei Chang¹, Premkumar Natarajan², Nanyun Peng²•Institutions (2)

Information Sciences Institute¹, University of California, Los Angeles²

01 May 2020

TL;DR: The effectiveness of the approach at facilitating bias analysis is shown by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics.

...read moreread less

Abstract: We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups. We then analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics. The former scenario enables us to detect the types of biases present in the model. Specifically, we show the effectiveness of our approach at facilitating bias analysis by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics. The second scenario is useful for mitigating biases in downstream applications such as dialogue generation. In our experiments, the mitigation technique proves to be effective at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.

...read moreread less

47 citations

Posted Content•

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

[...]

Jieyu Zhao¹, Subhabrata Mukherjee², Saghar Hosseini², Kai-Wei Chang³, Ahmed Hassan Awadallah² - Show less +1 more•Institutions (3)

University of California, Los Angeles¹, Microsoft², Information Sciences Institute³

02 May 2020-arXiv: Computation and Language

TL;DR: This paper creates a multilingual dataset for bias analysis and proposes several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives, and shows that the magnitude of bias in the mult bilingual representations changes differently when the authors align the embeddings to different target spaces.

...read moreread less

Abstract: Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks.

...read moreread less

42 citations

Posted Content•

Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble.

[...]

Yi Zhou¹, Xiaoqing Zheng¹, Cho-Jui Hsieh², Kai-Wei Chang, Xuanjing Huang¹ - Show less +1 more•Institutions (2)

Fudan University¹, University of California, Los Angeles²

20 Jun 2020-arXiv: Computation and Language

TL;DR: Dirichlet Neighborhood Ensemble is proposed, a randomized smoothing method for training a robust model to defense substitution-based attacks that consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.

...read moreread less

Abstract: Despite neural networks have achieved prominent performance on many natural language processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we propose Dirichlet Neighborhood Ensemble (DNE), a randomized smoothing method for training a robust model to defense substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models for NLP applications. We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.

...read moreread less

Proceedings Article•

Robustness Verification for Transformers

[...]

Zhouxing Shi¹, Huan Zhang², Kai-Wei Chang², Minlie Huang¹, Cho-Jui Hsieh² - Show less +1 more•Institutions (2)

Tsinghua University¹, University of California, Los Angeles²

30 Apr 2020

TL;DR: The certified robustness bounds computed by the first verification algorithm for Transformers are significantly tighter than those by naive Interval Bound Propagation, which sheds light on interpreting Transformers as they consistently reflect the importance of words in sentiment analysis.

...read moreread less

Abstract: Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding the behavior of a given model and for obtaining safety guarantees. However, previous methods are usually limited to relatively simple neural networks. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous work. We resolve these challenges and develop the first verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the importance of words in sentiment analysis.

...read moreread less

Posted Content•

GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction

[...]

Wasi Uddin Ahmad¹, Nanyun Peng¹, Kai-Wei Chang¹•Institutions (1)

University of California, Los Angeles¹

06 Oct 2020-arXiv: Computation and Language

TL;DR: This work introduces GATE, a self-attention mechanism where it explicitly fuse structural information to learn the dependencies between words at different syntactic distances, and test its cross-lingual transferability on relation and event extraction tasks.

...read moreread less

Abstract: Recent progress in cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations such that models trained on one language can be applied to other languages. However, GCNs struggle to model words with long-range dependencies or are not directly connected in the dependency tree. To address these challenges, we propose to utilize the self-attention mechanism where we explicitly fuse structural information to learn the dependencies between words with different syntactic distances. We introduce GATE, a {\bf G}raph {\bf A}ttention {\bf T}ransformer {\bf E}ncoder, and test its cross-lingual transferability on relation and event extraction tasks. We perform experiments on the ACE05 dataset that includes three typologically different languages: English, Chinese, and Arabic. The evaluation results show that GATE outperforms three recently proposed methods by a large margin. Our detailed analysis reveals that due to the reliance on syntactic dependencies, GATE produces robust representations that facilitate transfer across languages.

...read moreread less

Proceedings Article•DOI•

Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

[...]

Shengyu Jia, Tao Meng¹, Jieyu Zhao¹, Kai-Wei Chang¹•Institutions (1)

University of California, Los Angeles¹

01 Jul 2020

TL;DR: This paper investigates the gender bias amplification issue from the distribution perspective and demonstrates that the bias is amplified in the view of predicted probability distribution over labels, and proposes a bias mitigation approach based on posterior regularization.

...read moreread less

Abstract: Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., (CITATION) show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models’ top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the view of predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method can almost remove the bias amplification in the distribution. Our study sheds the light on understanding the bias amplification.

...read moreread less

Proceedings Article•DOI•

PolicyQA: A Reading Comprehension Dataset for Privacy Policies

[...]

Wasi Uddin Ahmad¹, Jianfeng Chi², Yuan Tian², Kai-Wei Chang³•Institutions (3)

University of California, Los Angeles¹, University of Virginia², Information Sciences Institute³

01 Nov 2020

TL;DR: This paper argues that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment, and evaluates two existing neural QA models and performs rigorous analysis to reveal the advantages and challenges offered by PolicyQA.

...read moreread less

Abstract: Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. On the contrary, we argue that providing users with a short text span from policy documents reduces the burden of searching the target information from a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.

...read moreread less

Proceedings Article•DOI•

On the Robustness of Language Encoders against Grammatical Errors

[...]

Fan Yin¹, Quanyu Long², Tao Meng³, Kai-Wei Chang³•Institutions (3)

Peking University¹, Shanghai Jiao Tong University², University of California, Los Angeles³

12 May 2020

TL;DR: This study collects real grammatical errors from non-native speakers and conducts adversarial attacks to simulate these errors on clean text data to facilitate debugging models on downstream applications and finds that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions.

...read moreread less

Abstract: We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.

...read moreread less

Proceedings Article•DOI•

Towards Understanding Gender Bias in Relation Extraction

[...]

Andrew Gaut, Tony Sun¹, Shirlyn Tang, Yuxin Huang, Jing Qian¹, Mai ElSherief¹, Jieyu Zhao², Diba Mirza¹, Elizabeth Belding¹, Kai-Wei Chang², William Yang Wang³ - Show less +7 more•Institutions (3)

University of California, Santa Barbara¹, University of California, Los Angeles², Google³

01 Jul 2020

TL;DR: In this paper, the authors created WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems.

...read moreread less

Abstract: Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse-of and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birthDate or birthPlace. We also analyze how existing bias mitigation techniques, such as name anonymization, word embedding debiasing, and data augmentation affect the NRE system in terms of maintaining the test performance and reducing biases. Unfortunately, due to NRE models rely heavily on surface level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays groundwork for future quantifying and mitigating bias in NRE.

...read moreread less

Posted Content•

Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference

[...]

Yichao Zhou¹, Yu Yan, Rujun Han², J. Harry Caufield³, Kai-Wei Chang¹, Yizhou Sun¹, Peipei Ping¹, Wei Wang⁴ - Show less +4 more•Institutions (4)

University of California, Los Angeles¹, Information Sciences Institute², University of California, Berkeley³, Beihang University⁴

16 Dec 2020-arXiv: Computation and Language

TL;DR: A novel method, Clinical Temporal ReLation Exaction with Probabilistic Soft Logic Regularization and Global Inference (CTRL-PG) to tackle the problem at the document level, and significantly outperforms baseline methods for temporal relation extraction.

...read moreread less

Abstract: There has been a steady need in the medical community to precisely extract the temporal relations between clinical events. In particular, temporal information can facilitate a variety of downstream applications such as case report retrieval and medical question answering. Existing methods either require expensive feature engineering or are incapable of modeling the global relational dependencies among the events. In this paper, we propose a novel method, Clinical Temporal ReLation Exaction with Probabilistic Soft Logic Regularization and Global Inference (CTRL-PG) to tackle the problem at the document level. Extensive experiments on two benchmark datasets, I2B2-2012 and TB-Dense, demonstrate that CTRL-PG significantly outperforms baseline methods for temporal relation extraction.

...read moreread less

Proceedings Article•DOI•

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

[...]

Jieyu Zhao¹, Subhabrata Mukherjee², Saghar Hosseini², Kai-Wei Chang³, Ahmed Hassan Awadallah² - Show less +1 more•Institutions (3)

University of California, Los Angeles¹, Microsoft², Information Sciences Institute³

02 May 2020

TL;DR: This article studied gender bias in multilingual embeddings and how it affects transfer learning for NLP applications and provided recommendations for using the multilingual word representations for downstream tasks, and proposed several ways for quantifying bias in multi-language representations from both the intrinsic and extrinsic perspectives.

...read moreread less

Abstract: Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks.

...read moreread less

Posted Content•

A Transformer-based Approach for Source Code Summarization

[...]

Wasi Uddin Ahmad¹, Saikat Chakraborty², Baishakhi Ray², Kai-Wei Chang¹•Institutions (2)

University of California, Los Angeles¹, Columbia University²

01 May 2020-arXiv: Software Engineering

TL;DR: In this paper, the Transformer model is used to learn code representation for source code summarization, which has shown to be effective in capturing long-range dependencies and outperforms the state-of-the-art techniques by a significant margin.

...read moreread less

Abstract: Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, learning code representation by modeling the pairwise relationship between code tokens to capture their long-range dependencies is crucial. To learn code representation for summarization, we explore the Transformer model that uses a self-attention mechanism and has shown to be effective in capturing long-range dependencies. In this work, we show that despite the approach is simple, it outperforms the state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., the absolute encoding of source code tokens' position hinders, while relative encoding significantly improves the summarization performance. We have made our code publicly available to facilitate future research.

...read moreread less

Posted Content•

SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

[...]

Da Yin¹, Tao Meng², Kai-Wei Chang²•Institutions (2)

Peking University¹, University of California, Los Angeles²

08 May 2020-arXiv: Computation and Language

TL;DR: This article proposed SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics, which incorporates contextualized representation with binary constituency parse tree to capture semantic composition and achieved competitive performance on phrase-level sentiment classification.

...read moreread less

Abstract: We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representation with binary constituency parse tree to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification. We further demonstrate that the sentiment composition learned from the phrase-level annotations on SST can be transferred to other sentiment analysis tasks as well as related tasks, such as emotion classification tasks. Moreover, we conduct ablation studies and design visualization methods to understand SentiBERT. We show that SentiBERT is better than baseline approaches in capturing negation and the contrastive relation and model the compositional sentiment semantics.

...read moreread less

Journal Article•DOI•

Distributed block-diagonal approximation methods for regularized empirical risk minimization

[...]

Ching-pei Lee¹, Kai-Wei Chang²•Institutions (2)

National University of Singapore¹, University of California, Los Angeles²

01 Apr 2020-Machine Learning

TL;DR: In this article, the authors proposed a flexible framework for distributed empirical risk minimization (ERM) training through solving the dual problem, which provides a unified description and comparison of existing methods.

...read moreread less

Abstract: In recent years, there is a growing need to train machine learning models on a huge volume of data. Therefore, designing efficient distributed optimization algorithms for empirical risk minimization (ERM) has become an active and challenging research topic. In this paper, we propose a flexible framework for distributed ERM training through solving the dual problem, which provides a unified description and comparison of existing methods. Our approach requires only approximate solutions of the sub-problems involved in the optimization process, and is versatile to be applied on many large-scale machine learning problems including classification, regression, and structured prediction. We show that our framework enjoys global linear convergence for a broad class of non-strongly-convex problems, and some specific choices of the sub-problems can even achieve much faster convergence than existing approaches by a refined analysis. This improved convergence rate is also reflected in the superior empirical performance of our method.

...read moreread less

Proceedings Article•DOI•

LOGAN: Local Group Bias Detection by Clustering

[...]

Jieyu Zhao¹, Kai-Wei Chang²•Institutions (2)

University of California, Los Angeles¹, Information Sciences Institute²

01 Nov 2020

TL;DR: LOGAN, a new bias detection technique based on clustering, is proposed and experiments show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.

...read moreread less

Abstract: Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.

...read moreread less

Posted Content•

Automatic Perturbation Analysis on General Computational Graphs.

[...]

Kaidi Xu, Zhouxing Shi, Huan Zhang, Minlie Huang, Kai-Wei Chang, Bhavya Kailkhura, Xue Lin, Cho-Jui Hsieh - Show less +4 more

28 Feb 2020-arXiv: Learning

TL;DR: The main idea is to express a network as a computational graph and then generalize linear relaxation algorithms such as CROWN as a graph algorithm, and its computation can be done automatically in a similar manner as the back-propagation algorithm for gradient computation.

...read moreread less

Abstract: Linear relaxation based perturbation analysis for neural networks, which aims to compute tight linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. However, the majority of linear relaxation based methods only consider feed-forward ReLU networks. While several works extended them to relatively complicated networks, they often need tedious manual derivations and implementation which are arduous and error-prone. Their limited flexibility makes it difficult to handle more complicated tasks. In this paper, we take a significant leap by developing an automatic perturbation analysis algorithm to enable perturbation analysis on any neural network structure, and its computation can be done automatically in a similar manner as the back-propagation algorithm for gradient computation. The main idea is to express a network as a computational graph and then generalize linear relaxation algorithms such as CROWN as a graph algorithm. Our algorithm itself is differentiable and integrated with PyTorch, which allows to optimize network parameters to reshape bounds into desired specifications, enabling automatic robustness verification and certified defense. In particular, we demonstrate a few tasks that are not easily achievable without an automatic framework. We first perform certified robust training and robustness verification for complex natural language models which could be challenging with manual derivation and implementation. We further show that our algorithm can be used for tasks beyond certified defense - we create a neural network with a provably flat optimization landscape and study its generalization capability, and we show that this network can preserve accuracy better after aggressive weight quantization. Code is available at this https URL.

...read moreread less

Posted Content•

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

[...]

Kaidi Xu¹, Zhouxing Shi², Huan Zhang³, Yihan Wang², Kai-Wei Chang³, Minlie Huang², Bhavya Kailkhura⁴, Xue Lin¹, Cho-Jui Hsieh³ - Show less +5 more•Institutions (4)

Northeastern University¹, Tsinghua University², University of California, Los Angeles³, ETH Zurich⁴

28 Feb 2020-arXiv: Learning

TL;DR: LiRPA-based certified defense as mentioned in this paper uses linear relaxation based perturbation analysis for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbations.

...read moreread less

Abstract: Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods focus on simple feed-forward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior works. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet where previous approaches cannot scale to due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a probably flat optimization landscape by applying LiRPA to network parameters. Our opensource library is available at this https URL.

...read moreread less

Posted Content•DOI•

Annotating Gene Ontology terms for protein sequences with the Transformer model

[...]

Dat Duong¹, Lisa Gai¹, Ankith Uppunda¹, Don Le¹, Eleazar Eskin¹, Jingyi Jessica Li¹, Kai-Wei Chang¹ - Show less +3 more•Institutions (1)

University of California, Los Angeles¹

02 Feb 2020-bioRxiv

TL;DR: A novel GO annotation model based on the Transformer neural network, which can capture more relevant information from the protein sequences and yields higher classification accuracy when compared to the recent CNN-based method DeepGO.

...read moreread less

Abstract: Predicting functions for novel amino acid sequences is a long-standing research problem. The Uniprot database which contains protein sequences annotated with Gene Ontology (GO) terms, is one commonly used training dataset for this problem. Predicting protein functions can then be viewed as a multi-label classification problem where the input is an amino acid sequence and the output is a set of GO terms. Recently, deep convolutional neural network (CNN) models have been introduced to annotate GO terms for protein sequences. However, the CNN architecture can only model close-range interactions between amino acids in a sequence. In this paper, first, we build a novel GO annotation model based on the Transformer neural network. Unlike the CNN architecture, the Transformer models all pairwise interactions for the amino acids within a sequence, and so can capture more relevant information from the sequences. Indeed, we show that our adaptation of Transformer yields higher classification accuracy when compared to the recent CNN-based method DeepGO. Second, we modify our model to take motifs in the protein sequences found by BLAST as additional input features. Our strategy is different from other ensemble approaches that average the outcomes of BLAST-based and machine learning predictors. Third, we integrate into our Transformer the metadata about the protein sequences such as 3D structure and protein-protein interaction (PPI) data. We show that such information can greatly improve the prediction accuracy, especially for rare GO labels.

...read moreread less

Proceedings Article•

Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization.

[...]

Kuan-Hao Huang¹, Chen Li², Kai-Wei Chang¹•Institutions (2)

University of California, Los Angeles¹, Tencent²

01 Dec 2020

TL;DR: Wang et al. as mentioned in this paper presented SportsSum, a Chinese sports game summarization dataset which contains 5,428 soccer games of live commentaries and the corresponding news articles and proposed a two-step summarization model consisting of a selector and a rewriter for SportsSum.

...read moreread less

Abstract: Sports game summarization focuses on generating news articles from live commentaries. Unlike traditional summarization tasks, the source documents and the target summaries for sports game summarization tasks are written in quite different writing styles. In addition, live commentaries usually contain many named entities, which makes summarizing sports games precisely very challenging. To deeply study this task, we present SportsSum, a Chinese sports game summarization dataset which contains 5,428 soccer games of live commentaries and the corresponding news articles. Additionally, we propose a two-step summarization model consisting of a selector and a rewriter for SportsSum. To evaluate the correctness of generated sports summaries, we design two novel score metrics: name matching score and event matching score. Experimental results show that our model performs better than other summarization baselines on ROUGE scores as well as the two designed scores.

...read moreread less

Posted Content•

Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions

[...]

Liunian Harold Li¹, Haoxuan You², Zhecan Wang², Alireza Zareian², Shih-Fu Chang², Kai-Wei Chang¹ - Show less +2 more•Institutions (2)

University of California, Los Angeles¹, Columbia University²

24 Oct 2020-arXiv: Computation and Language

TL;DR: This work proposes Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora, and introduces the object tags detected by an object recognition model as anchor points to bridge two modalities.

...read moreread less

Abstract: Pre-trained contextual vision-and-language (V&L) models have brought impressive performance improvement on various benchmarks. However, the paired text-image data required for pre-training are hard to collect and scale up. We investigate if a strong V&L representation model can be learned without text-image pairs. We propose Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora. Additionally, we introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. Evaluation on four V&L benchmarks shows that Weakly-supervised VisualBERT achieves similar performance with a model pre-trained with paired data. Besides, pre-training on more image-only data further improves a model that already has access to aligned data, suggesting the possibility of utilizing billions of raw images available to enhance V&L models.

...read moreread less

Posted Content•

"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition

[...]

Yichao Zhou¹, Jyun-Yu Jiang¹, Jieyu Zhao¹, Kai-Wei Chang¹, Wei Wang¹ - Show less +1 more•Institutions (1)

University of California, Los Angeles¹

29 Apr 2020-arXiv: Computation and Language

TL;DR: Pronunciation-attentive Contextualized Pun Recognition is proposed to perceive human humor, detect if a sentence contains puns and locate them in the sentence, and significantly outperforms the state-of-the-art methods in pun detection and location tasks.

...read moreread less

Abstract: Humor plays an important role in human languages and it is essential to model humor when building intelligence systems. Among different forms of humor, puns perform wordplay for humorous effects by employing words with double entendre and high phonetic similarity. However, identifying and modeling puns are challenging as puns usually involved implicit semantic or phonological tricks. In this paper, we propose Pronunciation-attentive Contextualized Pun Recognition (PCPR) to perceive human humor, detect if a sentence contains puns and locate them in the sentence. PCPR derives contextualized representation for each word in a sentence by capturing the association between the surrounding context and its corresponding phonetic symbols. Extensive experiments are conducted on two benchmark datasets. Results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods in pun detection and location tasks. In-depth analyses verify the effectiveness and robustness of PCPR.

...read moreread less

Posted Content•

Select, Extract and Generate: Neural Keyphrase Generation with Syntactic Guidance.

[...]

Wasi Uddin Ahmad, Xiao Bai, Soomin Lee, Kai-Wei Chang

04 Aug 2020

TL;DR: SEG-Net is proposed, a neural keyphrase generation model that is composed of two major components, a selector that selects the salient sentences in a document, and an extractor-generator that jointly extracts and generates keyphrases from the selected sentences.

...read moreread less

Abstract: In recent years, deep neural sequence-to-sequence framework has demonstrated promising results in keyphrase generation. However, processing long documents using such deep neural networks requires high computational resources. To reduce the computational cost, the documents are typically truncated before given as inputs. As a result, the models may miss essential points conveyed in a document. Moreover, most of the existing methods are either extractive (identify important phrases from the document) or generative (generate phrases word by word), and hence they do not benefit from the advantages of both modeling techniques. To address these challenges, we propose \emph{SEG-Net}, a neural keyphrase generation model that is composed of two major components, (1) a selector that selects the salient sentences in a document, and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses a self-attentive architecture, known as, \emph{Transformer} as the building block with a couple of uniqueness. First, SEG-Net incorporates a novel \emph{layer-wise} coverage attention to summarize most of the points discussed in the target document. Second, it uses an \emph{informed} copy attention mechanism to encourage focusing on different segments of the document during keyphrase extraction and generation. Besides, SEG-Net jointly learns keyphrase generation and their part-of-speech tag prediction, where the later provides syntactic supervision to the former. The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin in both domains.

...read moreread less

Showing papers by "Kai-Wei Chang published in 2020"