
Showing papers on "Phrase published in 2020"


Posted Content
TL;DR: This paper presents a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence of structured phrases, and shows that GSMN outperforms state-of-the-art methods on benchmarks.
Abstract: Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of the structured phrase. This is achieved by node-level matching and structure-level matching. The node-level matching associates each node with its relevant nodes from another modality, where the node can be an object, relation or attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations at structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN.
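As a rough illustration of the two matching stages the abstract describes, the sketch below matches each textual graph node to visual nodes via attention (node-level matching) and then fuses the matches over the textual graph's neighborhoods (structure-level matching). This is not the authors' released code; tensor names and shapes are assumptions.

```python
# Illustrative sketch, assuming precomputed node features for the textual graph
# (t_nodes: [n_t, d]), the visual graph (v_nodes: [n_v, d]), and a textual
# adjacency matrix (t_adj: [n_t, n_t]).
import torch
import torch.nn.functional as F

def node_level_matching(t_nodes, v_nodes):
    """Associate each textual node with its relevant visual nodes."""
    sim = F.normalize(t_nodes, dim=-1) @ F.normalize(v_nodes, dim=-1).T  # [n_t, n_v]
    attn = F.softmax(sim, dim=-1)            # relevance of visual nodes per textual node
    return attn @ v_nodes                    # attended visual representation, [n_t, d]

def structure_level_matching(matched, t_adj):
    """Fuse neighborhood associations along the textual graph structure."""
    deg = t_adj.sum(-1, keepdim=True).clamp(min=1)
    return (t_adj @ matched) / deg           # mean over graph neighbors, [n_t, d]
```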

87 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: GSMN, as mentioned in this paper, explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of the structured phrase.
Abstract: Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of the structured phrase. This is achieved by node-level matching and structure-level matching. The node-level matching associates each node with its relevant nodes from another modality, where the node can be an object, relation or attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations at structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN.

83 citations


Journal ArticleDOI
03 Apr 2020
TL;DR: A novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors is presented.
Abstract: Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: type representation, object representation, and concept representation. We use the type representation to identify the question type and the possible answer set (yes/no or specific concepts such as colors or numbers), and the object representation to focus on the relevant region of an image. The concept representation is verified with the attended region to infer the final answer. The proposed method decouples the language-based concept discovery and vision-based concept verification in the process of answer inference to prevent language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.

65 citations


Posted Content
TL;DR: Quantitative and qualitative results show that, using this framework, a GPT-2 based model trained on a conversation-like Reddit dataset outperforms strong generation baselines.
Abstract: Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process, often resulting in uninteresting responses. Attempts to boost informativeness alone come at the expense of factual accuracy, as attested by pretrained language models' propensity to "hallucinate" facts. While this may be mitigated by access to background knowledge, there is scant guarantee of relevance and informativeness in generated responses. We propose a framework that we call controllable grounded response generation (CGRG), in which lexical control phrases are either provided by a user or automatically extracted by a control phrase predictor from dialogue context and grounding knowledge. Quantitative and qualitative results show that, using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.

63 citations


Journal ArticleDOI
TL;DR: This paper proposes a multi-phrase ranked search over encrypted cloud data, which also supports dynamic update operations, such as adding or deleting files, and uses an inverted index to record the locations of keywords and to judge whether a phrase appears.
Abstract: As cloud computing becomes prevalent, more and more data owners are likely to outsource their data to a cloud server. However, to ensure privacy, the data should be encrypted before outsourcing. Symmetric searchable encryption allows users to retrieve keywords over encrypted data without decrypting the data. Many existing schemes that are based on symmetric searchable encryption only support single keyword search, conjunctive keywords search, multiple keywords search, or single phrase search. Moreover, some schemes, i.e., static schemes, only search one phrase per query request. In this paper, we propose a multi-phrase ranked search over encrypted cloud data, which also supports dynamic update operations, such as adding or deleting files. We use an inverted index to record the locations of keywords and to judge whether a phrase appears. This index can search for keywords efficiently. In order to rank the results and protect the privacy of relevance scores, a relevance score evaluation model is used in the search process on the client side. Also, the special construction of the index makes the scheme dynamic. The data owner can update the cloud data at very little cost. Security analyses and extensive experiments were conducted to demonstrate the safety and efficiency of the proposed scheme.
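The inverted-index idea above can be illustrated with a small plaintext sketch: record, per keyword, the positions at which it occurs in each document, and declare a phrase match when the keywords appear at consecutive positions. Encryption, ranking, and dynamic updates are omitted; all identifiers below are illustrative, not taken from the paper.

```python
# Minimal plaintext sketch of positional inverted-index phrase search.
from collections import defaultdict

def build_index(docs):
    """Map each keyword to the positions where it occurs in each document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """A phrase matches when its keywords occur at consecutive positions."""
    words = phrase.lower().split()
    if not words or words[0] not in index:
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            if all(start + i in index[w].get(doc_id, []) for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

docs = {1: "ranked search over encrypted cloud data", 2: "cloud data search"}
idx = build_index(docs)
print(phrase_search(idx, "cloud data"))  # {1, 2}
```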

60 citations


Posted Content
TL;DR: It is shown that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words.
Abstract: Phrase grounding, the problem of associating image regions to caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language model guided word substitutions. Training with our negatives yields a ~10% absolute gain in accuracy over randomly-sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of 5.7% to achieve 76.7% accuracy on the Flickr30K Entities benchmark.

59 citations


Proceedings ArticleDOI
27 Jun 2020
TL;DR: This article proposes structure-invariant testing (SIT), a novel metamorphic testing approach for validating machine translation software that generates similar source sentences by substituting one word in a given sentence with semantically similar, syntactically equivalent words.
Abstract: In recent years, machine translation software has increasingly been integrated into our daily lives. People routinely use machine translation for various applications, such as describing symptoms to a foreign doctor and reading political news in a foreign language. However, the complexity and intractability of neural machine translation (NMT) models that power modern machine translation make the robustness of these systems difficult to even assess, much less guarantee. Machine translation systems can return inferior results that lead to misunderstanding, medical misdiagnoses, threats to personal safety, or political conflicts. Despite its apparent importance, validating the robustness of machine translation systems is very difficult and has, therefore, been much under-explored. To tackle this challenge, we introduce structure-invariant testing (SIT), a novel metamorphic testing approach for validating machine translation software. Our key insight is that the translation results of "similar" source sentences should typically exhibit similar sentence structures. Specifically, SIT (1) generates similar source sentences by substituting one word in a given sentence with semantically similar, syntactically equivalent words; (2) represents sentence structure by syntax parse trees (obtained via constituency or dependency parsing); (3) reports sentence pairs whose structures differ quantitatively by more than some threshold. To evaluate SIT, we use it to test Google Translate and Bing Microsoft Translator with 200 source sentences as input, which led to 64 and 70 buggy issues with 69.5% and 70% top-1 accuracy, respectively. The translation errors are diverse, including under-translation, over-translation, incorrect modification, word/phrase mistranslation, and unclear logic.
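A schematic of the SIT workflow described above, under stated assumptions: `translate` and `parse` are user-supplied callables (e.g., a translation API client and a parser returning a sequence of constituent tags), and the distance measure here is a simple sequence-similarity stand-in for the paper's tree comparison.

```python
# Hedged sketch of structure-invariant testing; `translate` and `parse` are
# placeholders to be supplied by the user, not real APIs.
import difflib

def structural_distance(tags_a, tags_b):
    """Simple edit-style distance between two constituent-tag sequences."""
    return 1.0 - difflib.SequenceMatcher(a=tags_a, b=tags_b).ratio()

def sit_check(source, variants, translate, parse, threshold=0.2):
    """Report variant sentences whose translations differ structurally."""
    base = parse(translate(source))
    suspicious = []
    for v in variants:                      # one-word substitutions of `source`
        cand = parse(translate(v))
        if structural_distance(base, cand) > threshold:
            suspicious.append((v, translate(v)))
    return suspicious
```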

51 citations


Book ChapterDOI
23 Aug 2020
TL;DR: Gupta et al., as discussed by the authors, proposed a phrase grounding model that optimizes word-region attention to maximize a lower bound on mutual information between images and caption words, achieving good performance on the Flickr30K Entities benchmark.
Abstract: Phrase grounding, the problem of associating image regions to caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language model guided word substitutions. Training with our negatives yields a \(\sim 10\%\) absolute gain in accuracy over randomly-sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of \(5.7\%\) to achieve \(76.7\%\) accuracy on Flickr30K Entities benchmark. Our code and project material will be available at http://tanmaygupta.info/info-ground.
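A minimal sketch of the contrastive objective this abstract describes, assuming precomputed word embeddings for captions and region features for the image (this is not the authors' released implementation at the URL above): attention pools regions per word, and an InfoNCE-style loss prefers the true caption over language-model-generated negative captions.

```python
# Illustrative sketch of word-region attention with an InfoNCE-style lower bound.
import torch
import torch.nn.functional as F

def grounding_score(word_emb, region_feats):
    """Compatibility of a caption with attention-weighted image regions."""
    attn = F.softmax(word_emb @ region_feats.T, dim=-1)   # [n_words, n_regions]
    attended = attn @ region_feats                         # [n_words, d]
    return F.cosine_similarity(word_emb, attended, dim=-1).sum()

def info_nce_loss(pos_caption, neg_captions, region_feats):
    """True caption should outscore the constructed negative captions."""
    scores = torch.stack([grounding_score(pos_caption, region_feats)] +
                         [grounding_score(c, region_feats) for c in neg_captions])
    return F.cross_entropy(scores.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```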

49 citations


Journal ArticleDOI
TL;DR: This work proposes a new context-specific heterogeneous graph convolutional network (CsHGCN) framework that combines all context representations into a complete context, reflecting the information in documents more comprehensively.
Abstract: Sentiment analysis has attracted considerable attention in recent years. In particular, implicit sentiment analysis is a more challenging problem due to the lack of sentiment words. It requires us to combine contextual information and precisely understand the emotion changing process. Graph convolutional network (GCN) techniques have been widely applied for sentiment analysis since they are capable of learning from complex structures and preserving global information. However, these models either focus only on extracting features from a single sentence and ignore the contextual semantic background, or consider only the textual information and overlook phrase dependency when constructing the graph. To address these problems, we propose a new context-specific heterogeneous graph convolutional network (CsHGCN) framework that combines all context representations. It builds a complete context that reflects the information in documents more comprehensively, and a dependency structure that captures token-to-token semantics more accurately. The experimental results on a Chinese implicit sentiment dataset show that our proposed model can effectively identify the target sentiment of sentences, and visualization of the attention layers further demonstrates that the model selects qualitatively informative tokens and sentences.

43 citations


Posted Content
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge
TL;DR: This work creates PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator, and decouples the language and action data by annotating action phrase spans in How-To instructions and synthesizing grounded descriptions of actions for mobile user interfaces.
Abstract: We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.

39 citations


Journal ArticleDOI
TL;DR: A new second-level Mel frequency cepstral coefficient-based feature named MFCC-2, which handles the large and uneven dimensionality of MFCC, has been used to characterize languages including English, Bangla and Hindi.
Abstract: Developing an automatic speech recognition system for multilingual countries like India is a challenging task due to the fact that people are inured to using multiple languages while talking. This makes language identification from speech an important and essential task prior to recognition of the same. In this paper a system is proposed for language identification from multilingual speech signals. A new second-level Mel frequency cepstral coefficient-based feature named MFCC-2, which handles the large and uneven dimensionality of MFCC, has been used to characterize languages including English, Bangla and Hindi. The system has been tested with recordings of as many as 12,000 utterances of numerals and 41,884 clips extracted from YouTube videos, considering background music, data from multiple environments, avoidance of noise suppression and use of keywords from different languages in a single phrase. The highest and average accuracies (for the Top-3 classifiers from a pool of nine classifiers) of 98.09% and 95.54%, respectively, were achieved for YouTube data.

Proceedings ArticleDOI
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge
07 May 2020
TL;DR: PixelHelp, as mentioned in this paper, is a corpus that pairs English instructions with actions performed by people on a mobile UI emulator; training data is scaled by annotating action phrase spans in How-To instructions and synthesizing grounded descriptions of actions for mobile user interfaces.
Abstract: We present a new problem: grounding natural language instructions to mobile user interface actions, and contribute three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.
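To make the content-plus-position grounding step concrete, here is a hedged sketch (not the paper's architecture; dimensions and the simple dot-product scorer are assumptions) of scoring UI objects against an extracted action phrase.

```python
# Illustrative sketch of grounding an action phrase to a UI object by combining
# object content features with screen-position features.
import torch
import torch.nn as nn

class UIObjectGrounder(nn.Module):
    def __init__(self, d_text=128, d_pos=4, d_model=128):
        super().__init__()
        self.obj_proj = nn.Linear(d_text + d_pos, d_model)   # content + (x, y, w, h)
        self.phrase_proj = nn.Linear(d_text, d_model)

    def forward(self, obj_text_feats, obj_positions, phrase_feat):
        objs = self.obj_proj(torch.cat([obj_text_feats, obj_positions], dim=-1))
        phrase = self.phrase_proj(phrase_feat)                # [d_model]
        scores = objs @ phrase                                # one score per UI object
        return scores.softmax(dim=-1)                         # distribution over objects
```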

Journal ArticleDOI
TL;DR: This article tracked the development of phrasal vocabulary in essays produced at two different points in time and found that higher proficiency and greater exposure to the L2 did not result in more idiomatic and target-like output, and may, in fact, result in greater reliance on low-frequency combinations whose constituent words are non-associated or mutually attracted.
Abstract: In the present study, we sought to advance the field of learner corpus research by tracking the development of phrasal vocabulary in essays produced at two different points in time. To this aim, we employed a large pool of second language (L2) learners (N = 175) from three proficiency levels (beginner, elementary, and intermediate) and focused on an underrepresented L2 (Italian). Employing mixed-effects models, a flexible and powerful tool for corpus data analysis, we analyzed learner combinations in terms of five different measures: phrase frequency, mutual information, lexical gravity, delta P forward, and delta P backward. Our findings suggest a complex picture, in which higher proficiency and greater exposure to the L2 do not result in more idiomatic and target-like output, and may, in fact, result in greater reliance on low-frequency combinations whose constituent words are non-associated or mutually attracted.
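For reference, two of the association measures named above have standard corpus-linguistic definitions; the formulas below follow the usual formulations for a word pair (w1, w2) with joint frequency O11 in a corpus of N bigrams, and the paper's exact operationalization may differ.

```latex
% Standard definitions of mutual information and delta P (directional association).
\[
\mathrm{MI}(w_1, w_2) = \log_2 \frac{O_{11}\cdot N}{f(w_1)\, f(w_2)},
\qquad
\Delta P_{\text{forward}} = P(w_2 \mid w_1) - P(w_2 \mid \neg w_1),
\qquad
\Delta P_{\text{backward}} = P(w_1 \mid w_2) - P(w_1 \mid \neg w_2).
\]
```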

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This study shows that adversarial examples also exist in dependency parsing; it proposes two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at the sentence and phrase levels, and designs algorithms to construct such examples in both black-box and white-box settings.
Abstract: Despite achieving prominent performance on many important tasks, it has been reported that neural networks are vulnerable to adversarial examples. Previous studies along this line mainly focused on semantic tasks such as sentiment analysis, question answering and reading comprehension. In this study, we show that adversarial examples also exist in dependency parsing: we propose two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at sentence and phrase levels, and design algorithms to construct such examples in both the black-box and white-box settings. Our experiments with one of the state-of-the-art parsers on the English Penn Treebank (PTB) show that up to 77% of input examples admit adversarial perturbations, and we also show that the robustness of parsing models can be improved by crafting high-quality adversaries and including them in the training stage, while suffering little to no performance drop on the clean input data.

Book ChapterDOI
23 Aug 2020
TL;DR: The proposed multimodal phrase+click approach achieves new state-of-the-art performance on interactive segmentation by employing phrase expressions as another interaction input to infer the attributes of the target object.
Abstract: Existing interactive object segmentation methods mainly take spatial interactions such as bounding boxes or clicks as input. However, these interactions do not contain information about explicit attributes of the target-of-interest and thus cannot quickly specify what the selected object exactly is, especially when there are diverse scales of candidate objects or the target-of-interest contains multiple objects. Therefore, excessive user interactions are often required to reach desirable results. On the other hand, in existing approaches, attribute information of objects is often not well utilized in interactive segmentation. We propose to employ phrase expressions as another interaction input to infer the attributes of the target object. In this way, we can 1) leverage spatial clicks to locate the target object and 2) utilize semantic phrases to qualify the attributes of the target object. Specifically, the phrase expressions focus on “what” the target object is and the spatial clicks are in charge of “where” the target object is, which together help to accurately segment the target-of-interest with a smaller number of interactions. Moreover, the proposed approach is flexible in terms of interaction modes and can efficiently handle complex scenarios by leveraging the strengths of each type of input. Our multi-modal phrase+click approach achieves new state-of-the-art performance on interactive segmentation. To the best of our knowledge, this is the first work to leverage both clicks and phrases for interactive segmentation.

Journal ArticleDOI
TL;DR: Experiments show that the introduced Phrase2Vec outperforms state-of-the-art phrase embedding models in the similarity task and the analogical reasoning task on the Enwiki, DBLP, and Yelp datasets.

Posted Content
TL;DR: This work leverages a generic object detector at training time, and proposes a contrastive learning framework that accounts for both region-phrase and image-sentence matching, which achieves state-of-the-art results on visual phrase grounding, surpassing previous methods that require expensive object detectors at test time.
Abstract: Weakly supervised phrase grounding aims at learning region-phrase correspondences using only image-sentence pairs. A major challenge thus lies in the missing links between image regions and sentence phrases during training. To address this challenge, we leverage a generic object detector at training time, and propose a contrastive learning framework that accounts for both region-phrase and image-sentence matching. Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed. Importantly, our region-phrase score function is learned by distilling from soft matching scores between the detected object class names and candidate phrases within an image-sentence pair, while the image-sentence score function is supervised by ground-truth image-sentence pairs. The design of such score functions removes the need of object detection at test time, thereby significantly reducing the inference cost. Without bells and whistles, our approach achieves state-of-the-art results on the task of visual phrase grounding, surpassing previous methods that require expensive object detectors at test time.

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This paper proposes a joint model of syntactic and semantic parsing on both span and dependency representations, which incorporates syntactic information effectively in the encoder of the neural network and benefits from the two representation formalisms in a uniform way.
Abstract: Both syntactic and semantic structures are key linguistic contextual clues, and parsing the latter has been well shown to benefit from parsing the former. However, few works have ever made an attempt to let semantic parsing help syntactic parsing. As linguistic representation formalisms, both syntax and semantics may be represented in either span (constituent/phrase) or dependency form, and joint learning on both has also seldom been explored. In this paper, we propose a novel joint model of syntactic and semantic parsing on both span and dependency representations, which incorporates syntactic information effectively in the encoder of the neural network and benefits from the two representation formalisms in a uniform way. The experiments show that semantics and syntax can benefit each other by optimizing joint objectives. Our single model achieves new state-of-the-art or competitive results on both span and dependency semantic parsing on PropBank benchmarks and both dependency and constituent syntactic parsing on the Penn Treebank.

Proceedings ArticleDOI
TL;DR: The approach is based on the idea that summarization is important for retrieval; it adopts a summarization-based model called encoded summarization, which encodes a given document into a continuous vector space that embeds the summary properties of the document.
Abstract: We present our method for tackling the legal case retrieval task of the Competition on Legal Information Extraction/Entailment 2019. Our approach is based on the idea that summarization is important for retrieval. On one hand, we adopt a summarization-based model called encoded summarization, which encodes a given document into a continuous vector space that embeds the summary properties of the document. We utilize the COLIEE 2018 resources, on which we train the document representation model. On the other hand, we extract lexical features on different parts of a given query and its candidates. We observe that by comparing different parts of the query and its candidates, we can achieve better performance. Furthermore, combining the lexical features with the latent features from the summarization-based method achieves even better performance. We have achieved the state-of-the-art result for the task on the benchmark of the competition.

Posted Content
TL;DR: It is found that phrase representation in state-of-the-art pre-trained transformers relies heavily on word content, with little evidence of nuanced composition.
Abstract: Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs, such as phrases. However, we have limited understanding of how these models handle representation of phrases, and whether this reflects sophisticated composition of phrase meaning like that done by humans. In this paper, we present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests leveraging human judgments of phrase similarity and meaning shift, and compare results before and after control of word overlap, to tease apart lexical effects versus composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variations in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for usage of representations from these models.

Posted Content
TL;DR: This work considers the problem of segmenting image regions given a natural language phrase, and studies it on a novel dataset of 77,262 images and 345,486 phrase-region pairs, collected on top of the Visual Genome dataset.
Abstract: We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. Our dataset is collected on top of the Visual Genome dataset and uses the existing annotations to generate a challenging set of referring phrases for which the corresponding regions are manually annotated. Phrases in our dataset correspond to multiple regions and describe a large number of object and stuff categories as well as their attributes such as color, shape, parts, and relationships with other entities in the image. Our experiments show that the scale and diversity of concepts in our dataset poses significant challenges to the existing state-of-the-art. We systematically handle the long-tail nature of these concepts and present a modular approach to combine category, attribute, and relationship cues that outperforms existing approaches.

Proceedings ArticleDOI
08 Oct 2020
TL;DR: The authors found that phrase representation in pre-trained transformers relies heavily on word content, with little evidence of nuanced composition, and identified variations in phrase representation quality across models, layers, and representation types, and made corresponding recommendations for usage of representations from these models.
Abstract: Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs, such as phrases. However, we have limited understanding of how these models handle representation of phrases, and whether this reflects sophisticated composition of phrase meaning like that done by humans. In this paper, we present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests leveraging human judgments of phrase similarity and meaning shift, and compare results before and after control of word overlap, to tease apart lexical effects versus composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variations in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for usage of representations from these models.
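An illustrative probe in the spirit of the analysis described above (model choice, layer, and mean pooling are assumptions, not the paper's exact setup): pool a phrase's token representations from a pre-trained transformer and compare phrase pairs by cosine similarity, e.g. a paraphrase pair versus a high-word-overlap pair.

```python
# Hedged sketch of a phrase-representation probe using Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def phrase_embedding(phrase, layer=-1):
    """Mean-pool the hidden states of a phrase at a chosen layer."""
    inputs = tok(phrase, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[layer]
    return hidden[0, 1:-1].mean(dim=0)        # drop [CLS]/[SEP], average tokens

def phrase_similarity(a, b, layer=-1):
    ea, eb = phrase_embedding(a, layer), phrase_embedding(b, layer)
    return torch.cosine_similarity(ea, eb, dim=0).item()

print(phrase_similarity("heavy rain", "torrential downpour"))
print(phrase_similarity("heavy rain", "heavy metal"))  # high word overlap, different meaning
```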

Posted Content
TL;DR: The pre-trained deep bidirectional network BERT is used to build a model for named entity recognition in Persian, which achieved second place in the NSURL-2019 Task 7 competition on NER for the Persian language.
Abstract: Named entity recognition is a natural language processing task to recognize and extract spans of text associated with named entities and classify them into semantic categories. Google BERT is a deep bidirectional language model, pre-trained on large corpora, that can be fine-tuned to solve many NLP tasks such as question answering, named entity recognition, and part-of-speech tagging. In this paper, we use the pre-trained deep bidirectional network, BERT, to build a model for named entity recognition in Persian. We also compare the results of our model with the previous state-of-the-art results achieved on Persian NER. Our evaluation metric is the CoNLL 2003 score at the word and phrase levels. This model achieved second place in the NSURL-2019 Task 7 competition, which addressed NER for the Persian language. Our results in this competition are 83.5 and 88.4 F1 (CoNLL score) in phrase-level and word-level evaluation, respectively.
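A generic sketch of the fine-tuning setup this abstract describes: a pre-trained BERT encoder with a token-classification head. The checkpoint name and label set below are placeholders (the paper uses a Persian-capable BERT and its own tag set), shown only to illustrate the architecture.

```python
# Hedged sketch of BERT-based NER fine-tuning; names are placeholders.
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]  # example tag set
model_name = "bert-base-multilingual-cased"                            # placeholder checkpoint

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

# Each sub-word token receives a label logit from the classification head;
# training minimizes cross-entropy against the gold BIO tags (training loop not shown).
```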

Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper formulates visual grounding as a graph matching problem to find node correspondences between a visual scene graph and a language scene graph, and learns unified contextual node representations of the two graphs by using a cross-modal graph convolutional network to reduce their discrepancy.
Abstract: Visual Grounding is the task of associating entities in a natural language sentence with objects in an image. In this paper, we formulate visual grounding as a graph matching problem to find node correspondences between a visual scene graph and a language scene graph. These two graphs are heterogeneous, representing structure layouts of the sentence and image, respectively. We learn unified contextual node representations of the two graphs by using a cross-modal graph convolutional network to reduce their discrepancy. The graph matching is thus relaxed as a linear assignment problem because the learned node representations characterize both node information and structure information. A permutation loss and a semantic cycle-consistency loss are further introduced to solve the linear assignment problem with or without ground-truth correspondences. Experimental results on two visual grounding tasks, i.e., referring expression comprehension and phrase localization, demonstrate the effectiveness of our method.
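The relaxation to a linear assignment problem mentioned above can be illustrated in a few lines, assuming the cross-modal GCN has already produced node representations for the language and visual graphs (the array names below are hypothetical).

```python
# Minimal sketch: build a similarity matrix from learned node representations
# and solve for the optimal one-to-one correspondence.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_nodes(lang_nodes, vis_nodes):
    """Return (language node, visual node) index pairs maximizing total similarity."""
    lang = lang_nodes / np.linalg.norm(lang_nodes, axis=1, keepdims=True)
    vis = vis_nodes / np.linalg.norm(vis_nodes, axis=1, keepdims=True)
    sim = lang @ vis.T                            # cosine similarity matrix
    rows, cols = linear_sum_assignment(-sim)      # negate to maximize similarity
    return list(zip(rows.tolist(), cols.tolist()))
```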

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This paper aims to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (Sparc) and shows 4%+ improvement in CuratedTREC and SQuAD-Open.
Abstract: Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect huge scalability and speed benefit but often suffer from low accuracy due to the limitation of existing phrase representation models. In this paper, we aim to improve the quality of each phrase embedding by augmenting it with a contextualized sparse representation (Sparc). Unlike previous sparse vectors that are term-frequency-based (e.g., tf-idf) or directly learned (only few thousand dimensions), we leverage rectified self-attention to indirectly learn sparse vectors in n-gram vocabulary space. By augmenting the previous phrase retrieval model (Seo et al., 2019) with Sparc, we show 4%+ improvement in CuratedTREC and SQuAD-Open. Our CuratedTREC score is even better than the best known retrieve & read model with at least 45x faster inference speed.

Proceedings ArticleDOI
08 Nov 2020
TL;DR: PreMA, an API method recommendation approach based on explicit matching of functionality verb phrases in functionality descriptions and user queries, is proposed; it can accurately recognize the functionality categories and phrase patterns of functionality description sentences and helps participants complete their tasks more accurately and with fewer retries.
Abstract: Due to the lexical gap between functionality descriptions and user queries, documentation-based API retrieval often produces poor results. Verb phrases and their phrase patterns are essential in both describing API functionalities and interpreting user queries. Thus we hypothesize that API retrieval can be facilitated by explicitly recognizing and matching between the fine-grained structures of functionality descriptions and user queries. To verify this hypothesis, we conducted a large-scale empirical study on the functionality descriptions of 14,733 JDK and Android API methods. We identified 356 different functionality verbs from the descriptions, which were grouped into 87 functionality categories, and we extracted 523 phrase patterns from the verb phrases of the descriptions. Building on these findings, we propose an API method recommendation approach based on explicit matching of functionality verb phrases in functionality descriptions and user queries, called PreMA. Our evaluation shows that PreMA can accurately recognize the functionality categories (92.8%) and phrase patterns (90.4%) of functionality description sentences; and when used for API retrieval tasks, PreMA can help participants complete their tasks more accurately and with fewer retries compared to a baseline approach.

Posted Content
TL;DR: The Modality-Agnostic Attention Fusion (MAAF) model combines image and text features and outperforms existing approaches on two visual search with modifying phrase datasets, Fashion IQ and CSS, and performs competitively on a dataset with only single-word modifications, Fashion200k.
Abstract: Image retrieval with natural language feedback offers the promise of catalog search based on fine-grained visual features that go beyond objects and binary attributes, facilitating real-world applications such as e-commerce. Our Modality-Agnostic Attention Fusion (MAAF) model combines image and text features and outperforms existing approaches on two visual search with modifying phrase datasets, Fashion IQ and CSS, and performs competitively on a dataset with only single-word modifications, Fashion200k. We also introduce two new challenging benchmarks adapted from Birds-to-Words and Spot-the-Diff, which provide new settings with rich language inputs, and we show that our approach without modification outperforms strong baselines. To better understand our model, we conduct detailed ablations on Fashion IQ and provide visualizations of the surprising phenomenon of words avoiding "attending" to the image region they refer to.
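A rough sketch of the modality-agnostic fusion idea described above, under stated assumptions: token dimensions, a shared off-the-shelf transformer encoder, and mean pooling are illustrative choices, not the paper's exact design.

```python
# Illustrative sketch: image region tokens and text tokens are concatenated and
# processed by a shared transformer encoder, then pooled into a query embedding.
import torch
import torch.nn as nn

class ModalityAgnosticFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, image_tokens, text_tokens):
        tokens = torch.cat([image_tokens, text_tokens], dim=1)  # [B, n_img + n_txt, d]
        fused = self.encoder(tokens)
        return fused.mean(dim=1)                                # pooled query embedding
```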

Journal ArticleDOI
20 Nov 2020
TL;DR: It is argued that a core part of what is traditionally referred to as ‘information structure’ can be deconstructed into genuine morphosyntactic features that are visible to syntactic operations, contribute to discourse-related expressive meanings, and just happen to be spelled out prosodically in Standard American and British English.
Abstract: The paper argues that a core part of what is traditionally referred to as ‘information structure’ can be deconstructed into genuine morphosyntactic features that are visible to syntactic operations, contribute to discourse-related expressive meanings, and just happen to be spelled out prosodically in Standard American and British English. We motivate two features, [FoC] and [G], and we track the fate of those features at and beyond the syntax-semantics and the syntax-phonology interfaces. [FoC] and [G] are responsible for two distinct obligatory strategies for establishing discourse coherence. A [G]-marked constituent signals a match with a discourse referent, whereas a [FoC]-marked constituent invokes alternatives and thereby signals a contrast. In Standard American and British English [FoC] aims for highest prosodic prominence in the intonational phrase, whereas [G] lacks phrase-level prosodic properties. There is no grammatical marking of newness: The apparent prosodic effects of newness are the result of default prosody.

Journal ArticleDOI
TL;DR: As argued in this paper, in the complex operations of the Indian media economy the phrase "media markets" requires careful consideration as an analytical concept; as a noun, it is typically used to refer to a...
Abstract: In the complex operations of the Indian media economy, the phrase ‘media markets’ requires careful consideration as an analytical concept. As a noun, ‘media markets’ is typically used to refer to a...

Book ChapterDOI
23 Aug 2020
TL;DR: A linguistic structure guided propagation network for one-stage phrase grounding that explicitly explores the linguistic structure of the sentence and performs relational propagation among noun phrases under the guidance of the linguistic relations between them.
Abstract: Phrase level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence. Its challenge comes not only from large variations in visual contents and unrestricted phrase descriptions but also from unambiguous referrals derived from phrase relational reasoning. In this paper, we propose a linguistic structure guided propagation network for one-stage phrase grounding. It explicitly explores the linguistic structure of the sentence and performs relational propagation among noun phrases under the guidance of the linguistic relations between them. Specifically, we first construct a linguistic graph parsed from the sentence and then capture multimodal feature maps for all the phrasal nodes independently. The node features are then propagated over the edges with a tailor-designed relational propagation module and ultimately integrated for final prediction. Experiments on Flickr30K Entities dataset show that our model outperforms state-of-the-art methods and demonstrate the effectiveness of propagating among phrases with linguistic relations (Source code will be available at https://github.com/sibeiyang/lspn.).