Showing papers on "Phrase" published in 2018

Open Access

Proceedings Article•DOI•

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu¹, Zhe Lin², Xiaohui Shen², Jimei Yang², Xin Lu², Mohit Bansal¹, Tamara L. Berg¹•Institutions (2)

University of North Carolina at Chapel Hill¹, Adobe Systems²

01 Jun 2018

TL;DR: The authors decompose expressions into three modular components related to subject appearance, location, and relationship to other objects, which allows the end-to-end framework to adapt flexibly to expressions containing different types of information.

Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.
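
The way module weights might combine the three module scores can be illustrated with a minimal Python sketch. It is not the authors' implementation: the softmax normalization, the weighted sum, the function names, and all numeric values are assumptions chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def overall_score(module_logits, module_scores):
    """Weighted sum of per-module matching scores.

    module_logits: unnormalized, expression-dependent weights for the
                   (subject, location, relationship) modules (hypothetical).
    module_scores: matching score of one candidate region under each module.
    """
    weights = softmax(module_logits)
    return sum(w * s for w, s in zip(weights, module_scores))


# Hypothetical numbers: the expression mostly describes subject appearance.
logits = [2.0, 0.5, 0.1]
candidates = {
    "region_a": [0.9, 0.2, 0.1],  # strong subject match
    "region_b": [0.3, 0.8, 0.7],  # strong location/relationship match
}
best = max(candidates, key=lambda r: overall_score(logits, candidates[r]))
print(best)
```

Because the weights depend on the expression, the same candidate scores could rank differently for an expression that emphasizes location or relationships.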

626 citations

Proceedings Article•DOI•

Phrase-Based & Neural Unsupervised Machine Translation

Guillaume Lample¹, Myle Ott¹, Alexis Conneau¹, Ludovic Denoyer², Marc'Aurelio Ranzato¹•Institutions (2)

Facebook¹, University of Paris²

20 Apr 2018

TL;DR: The authors proposed two model variants, a neural and a phrase-based model, which leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation.

Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT’14 English-French and WMT’16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.

461 citations

Journal Article•DOI•

Automated Phrase Mining from Massive Text Corpora

Jingbo Shang¹, Jialu Liu², Meng Jiang³, Xiang Ren⁴, Clare R. Voss⁵, Jiawei Han¹•Institutions (5)

University of Illinois at Urbana–Champaign¹, Google², University of Notre Dame³, University of Southern California⁴, United States Army Research Laboratory⁵

01 Oct 2018-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes a framework for automated phrase mining, AutoPhrase, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger.

Abstract: As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus and has various downstream applications including information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and genres without extra but expensive adaptation. None of the state-of-the-art models, even data-driven models, is fully automated because they require human experts for designing rules or labeling phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, AutoPhrase has shown significant improvements in both effectiveness and efficiency on five real-world datasets across different domains and languages. Besides, AutoPhrase can be extended to model single-word quality phrases.

286 citations

Journal Article•DOI•

The neural oscillations of speech processing and language comprehension: State of the art and emerging mechanisms

Lars Meyer¹•Institutions (1)

Max Planck Society¹

01 Oct 2018-European Journal of Neuroscience

TL;DR: An accessible and extensive review of the functional mechanisms that neural oscillations subserve in speech processing and language comprehension and synthesises a mapping from each linguistic processing domain to a unique set of subserving oscillatory mechanisms.

Abstract: Neural oscillations subserve a broad range of functions in speech processing and language comprehension. On the one hand, speech contains (somewhat) repetitive trains of air pressure bursts that occur at three dominant amplitude modulation frequencies, physically marking the linguistically meaningful progressions of phonemes, syllables and intonational phrase boundaries. To these acoustic events, neural oscillations of isomorphous operating frequencies are thought to synchronise, presumably resulting in an implicit temporal alignment of periods of neural excitability to linguistically meaningful spectral information on the three low-level linguistic description levels. On the other hand, speech is a carrier signal that codes for high-level linguistic meaning, such as syntactic structure and semantic information, which cannot be read from stimulus acoustics, but must be acquired during language acquisition and decoded for language comprehension. Neural oscillations subserve the processing of both syntactic structure and semantic information. Here, I synthesise a mapping from each linguistic processing domain to a unique set of subserving oscillatory mechanisms; the mapping is plausible given the role ascribed to different oscillatory mechanisms in different subfunctions of cortical information processing and faithful to the underlying electrophysiology. In sum, the present article provides an accessible and extensive review of the functional mechanisms that neural oscillations subserve in speech processing and language comprehension.

223 citations

Proceedings Article•DOI•

Visual Grounding via Accumulated Attention

Chaorui Deng¹, Qi Wu², Qingyao Wu¹, Fuyuan Hu³, Fan Lyu³, Mingkui Tan¹•Institutions (3)

South China University of Technology¹, University of Adelaide², Suzhou University of Science and Technology³

18 Jun 2018

TL;DR: The A-ATT mechanism can circularly accumulate the attention for useful information in image, query, and objects, while the noise is gradually ignored, and the experimental results show the superiority of the proposed method in terms of accuracy.

Abstract: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence or even a multi-round dialogue. There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object. Most existing methods combine all the information curtly, which may suffer from the problem of information redundancy (i.e. ambiguous query, complicated image and a large number of objects). In this paper, we formulate these challenges as three attention problems and propose an accumulated attention (A-ATT) mechanism to reason among them jointly. Our A-ATT mechanism can circularly accumulate the attention for useful information in image, query, and objects, while the noise is gradually ignored. We evaluate the performance of A-ATT on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and GuessWhat?!), and the experimental results show the superiority of the proposed method in terms of accuracy.

197 citations

Book Chapter•DOI•

Grounding Visual Explanations

Lisa Anne Hendricks¹, Ronghang Hu¹, Trevor Darrell¹, Zeynep Akata²•Institutions (2)

University of California, Berkeley¹, University of Amsterdam²

08 Sep 2018

TL;DR: This paper proposes a phrase-critic model to refine generated candidate explanations augmented with flipped phrases, which are used as negative examples during training. The model improves the textual explanation quality of fine-grained classification decisions by mentioning phrases that are grounded in the image.

Abstract: Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations augmented with flipped phrases which we use as negative examples while training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counter arguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models.

173 citations

Proceedings Article•

Learning Structured Representation for Text Classification via Reinforcement Learning

Tianyang Zhang¹, Minlie Huang¹, Li Zhao²•Institutions (2)

Tsinghua University¹, Microsoft²

26 Apr 2018

TL;DR: Results show that the proposed reinforcement learning method can learn task-friendly representations by identifying important words or task-relevant structures without explicit structure annotations, and thus yields competitive performance.

Abstract: Representation learning is a fundamental problem in natural language processing. This paper studies how to learn a structured representation for text classification. Unlike most existing representation models that either use no structure or rely on pre-specified structures, we propose a reinforcement learning (RL) method to learn sentence representation by discovering optimized structures automatically. We demonstrate two attempts to build structured representation: Information Distilled LSTM (ID-LSTM) and Hierarchically Structured LSTM (HS-LSTM). ID-LSTM selects only important, task-relevant words, and HS-LSTM discovers phrase structures in a sentence. Structure discovery in the two representation models is formulated as a sequential decision problem: current decision of structure discovery affects following decisions, which can be addressed by policy gradient RL. Results show that our method can learn task-friendly representations by identifying important words or task-relevant structures without explicit structure annotations, and thus yields competitive performance.

142 citations

Book Chapter•DOI•

Visual Coreference Resolution in Visual Dialog Using Neural Module Networks

Satwik Kottur¹, Jose M. F. Moura², Devi Parikh¹, Dhruv Batra¹, Marcus Rohrbach¹•Institutions (2)

Facebook¹, Carnegie Mellon University²

08 Sep 2018

TL;DR: This paper proposes a neural module network architecture for visual dialog by introducing two novel modules, Refer and Exclude, that perform explicit, grounded coreference resolution at a finer word level, and demonstrates the effectiveness of the model on MNIST Dialog.

Abstract: Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determining which words, typically noun phrases and pronouns, co-refer to the same entity/object instance in an image. This is crucial, especially for pronouns (e.g., ‘it’), as the dialog agent must first link it to a previous coreference (e.g., ‘boat’), and only then can rely on the visual grounding of the coreference ‘boat’ to reason about the pronoun ‘it’. Prior work (in visual dialog) models visual coreference resolution either (a) implicitly via a memory network over history, or (b) at a coarse level for the entire question; and not explicitly at a phrase level of granularity. In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level. We demonstrate the effectiveness of our model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy, and on VisDial, a large and challenging visual dialog dataset on real images, where our model outperforms other approaches, and is more interpretable, grounded, and consistent qualitatively.

134 citations

Posted Content•

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Satwik Kottur¹, Jose M. F. Moura², Devi Parikh¹, Dhruv Batra¹, Marcus Rohrbach¹•Institutions (2)

Facebook¹, Carnegie Mellon University²

06 Sep 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level, and demonstrates the effectiveness of the model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy.

Abstract: Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determining which words, typically noun phrases and pronouns, co-refer to the same entity/object instance in an image. This is crucial, especially for pronouns (e.g., 'it'), as the dialog agent must first link it to a previous coreference (e.g., 'boat'), and only then can rely on the visual grounding of the coreference 'boat' to reason about the pronoun 'it'. Prior work (in visual dialog) models visual coreference resolution either (a) implicitly via a memory network over history, or (b) at a coarse level for the entire question; and not explicitly at a phrase level of granularity. In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules - Refer and Exclude - that perform explicit, grounded, coreference resolution at a finer word level. We demonstrate the effectiveness of our model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy, and on VisDial, a large and challenging visual dialog dataset on real images, where our model outperforms other approaches, and is more interpretable, grounded, and consistent qualitatively.

107 citations

Book Chapter•DOI•

Conditional Image-Text Embedding Networks

Bryan A. Plummer¹, Paige Kordas¹, M. Hadi Kiapour², Shuai Zheng², Robinson Piramuthu², Svetlana Lazebnik¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, eBay²

08 Sep 2018

TL;DR: This article proposes a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments, allowing underrepresented concepts to take advantage of the shared representations before they are fed into concept-specific layers.

Abstract: This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline (Code: https://github.com/BryanPlummer/cite).

107 citations

Proceedings Article•DOI•

Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings

Debanjan Mahata¹, John Kuriakose, Rajiv Ratn Shah², Roger Zimmermann³•Institutions (3)

Delhi Technological University¹, Indraprastha Institute of Information Technology², National University of Singapore³

01 Jun 2018

TL;DR: An effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank is proposed.

Abstract: Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.
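
The idea of ranking candidate phrases with a theme-weighted PageRank can be sketched as a generic personalized PageRank in which the teleport distribution is proportional to each phrase's thematic weight. This is only an illustration under stated assumptions, not Key2Vec's exact formulation: the graph, the edge weights, the theme weights, and the function name `theme_weighted_pagerank` are all hypothetical.

```python
def theme_weighted_pagerank(edges, theme_weight, damping=0.85, iters=50):
    """Personalized PageRank on an undirected weighted phrase graph.

    edges:        {(phrase_u, phrase_v): co-occurrence weight}
    theme_weight: {phrase: non-negative thematic weight}; used as the
                  teleport distribution after normalization.
    Assumes every phrase has at least one neighbor.
    """
    nodes = sorted(theme_weight)
    total = sum(theme_weight.values())
    teleport = {n: theme_weight[n] / total for n in nodes}

    # Build a symmetric adjacency map.
    nbrs = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        nbrs[u][v] = w
        nbrs[v][u] = w

    rank = dict(teleport)
    for _ in range(iters):
        new_rank = {}
        for n in nodes:
            # Each neighbor m passes on its rank in proportion to edge weight.
            incoming = sum(
                rank[m] * w / sum(nbrs[m].values()) for m, w in nbrs[n].items()
            )
            new_rank[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank


# Hypothetical candidates; theme weights could come from embedding similarity.
edges = {
    ("neural network", "deep learning"): 1.0,
    ("deep learning", "keyphrase extraction"): 1.0,
    ("neural network", "keyphrase extraction"): 1.0,
}
theme_weight = {
    "neural network": 0.2,
    "deep learning": 0.3,
    "keyphrase extraction": 0.9,
}
ranks = theme_weighted_pagerank(edges, theme_weight)
ranked = sorted(ranks, key=ranks.get, reverse=True)
print(ranked)
```

With equal edge weights, as in this toy graph, the final ordering follows the theme weights; unequal co-occurrence weights would let graph structure shift the ranking as well.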

Proceedings Article•DOI•

Improving Lexical Choice in Neural Machine Translation

Toan Q. Nguyen¹, David Chiang²•Institutions (2)

Amazon.com¹, University of Notre Dame²

01 Jun 2018

TL;DR: This article proposed to fix the norms of both vectors to a constant value and integrate a simple lexical module which is jointly trained with the rest of the model, which achieved improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.

Abstract: We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
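
The effect of fixing the norms of the context vector and the output word embeddings can be illustrated with a small Python sketch. It shows one way such a fix could work (rescaling both vectors to a constant norm `r` before the inner product), not the authors' code; the vectors, the value of `r`, and the function names are hypothetical, and zero vectors are not handled.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rescale(v, r):
    """Rescale a non-zero vector v to have Euclidean norm r."""
    n = math.sqrt(dot(v, v))
    return [r * x / n for x in v]


def raw_logits(context, embeddings):
    """Standard output layer: plain inner product with every word embedding."""
    return {w: dot(context, e) for w, e in embeddings.items()}


def fixed_norm_logits(context, embeddings, r=3.0):
    """Inner product after fixing both vectors to norm r (r is hypothetical)."""
    c = rescale(context, r)
    return {w: dot(c, rescale(e, r)) for w, e in embeddings.items()}


# Hypothetical 2-d vectors: the "rare" embedding points exactly along the
# context direction but has a small norm; "frequent" has a large norm.
context = [1.0, 0.2]
embeddings = {"frequent": [3.0, 3.0], "rare": [0.5, 0.1]}

raw = raw_logits(context, embeddings)
fixed = fixed_norm_logits(context, embeddings)
print(max(raw, key=raw.get), max(fixed, key=fixed.get))
```

In this toy case the plain inner product favors the large-norm embedding, while the fixed-norm score depends only on direction, so the better-aligned word wins.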

Posted Content•

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Nicholas Carlini¹, David Wagner¹•Institutions (1)

University of California, Berkeley¹

05 Jan 2018-arXiv: Learning

TL;DR: In this article, a white-box iterative optimization-based attack was applied to Mozilla's DeepSpeech end-to-end speech recognition system, achieving a 100% success rate.

Abstract: We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.

Posted Content•

Phrase-Based & Neural Unsupervised Machine Translation

Guillaume Lample¹, Myle Ott¹, Alexis Conneau¹, Ludovic Denoyer², Marc'Aurelio Ranzato¹•Institutions (2)

Facebook¹, University of Paris²

20 Apr 2018-arXiv: Computation and Language

TL;DR: This work investigates how to learn to translate when having access to only large monolingual corpora in each language, and proposes two model variants, a neural and a phrase-based model, which are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters.

Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT'14 English-French and WMT'16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.

Proceedings Article•DOI•

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Shikhar Vashishth¹, Prince Jain², Partha Pratim Talukdar¹•Institutions (2)

Indian Institute of Science¹, Microsoft²

10 Apr 2018

TL;DR: Canonicalization using Embeddings and Side Information (CESI) is proposed -- a novel approach which performs canonicalization over learned embeddings of Open KBs by incorporating relevant NP and relation phrase side information in a principled manner.

Abstract: Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually-defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) -- a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.

Proceedings Article•DOI•

Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval

Shuhui Wang¹, Yangyu Chen¹, Junbao Zhuo¹, Qingming Huang¹, Qi Tian²•Institutions (2)

Chinese Academy of Sciences¹, University of Texas at San Antonio²

15 Oct 2018

TL;DR: The authors design a novel softmax-like bi-directional ranking loss to learn the co-attentive representation for image-sentence similarity computation; it is capable of discovering the correlative components and rectifying inappropriate component-level correlation to produce more accurate sentence-level ranking results.

Abstract: In image-sentence retrieval task, correlated images and sentences involve different levels of semantic relevance. However, existing multi-modal representation learning paradigms fail to capture the meaningful component relation on word and phrase level, while the attention-based methods still suffer from component-level mismatching and huge computation burden. We propose a Joint Global and Co-Attentive Representation learning method (JGCAR) for image-sentence retrieval. We formulate a global representation learning task which utilizes both intra-modal and inter-modal relative similarity to optimize the semantic consistency of the visual/textual component representations. We further develop a co-attention learning procedure to fully exploit different levels of visual-linguistic relations. We design a novel softmax-like bi-directional ranking loss to learn the co-attentive representation for image-sentence similarity computation. It is capable of discovering the correlative components and rectifying inappropriate component-level correlation to produce more accurate sentence-level ranking results. By joint global and co-attentive representation learning, the latter benefits from the former by producing more semantically consistent component representation, and the former also benefits from the latter by back-propagating the contextual information. Image-sentence retrieval is performed as a two-step process in the testing stage, inheriting advantages on both effectiveness and efficiency. Experiments show that JGCAR outperforms existing methods on MSCOCO and Flickr30K image-sentence retrieval tasks.

Proceedings Article•DOI•

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

Kan Chen¹, Jiyang Gao², Ram Nevatia²•Institutions (2)

Adobe Systems¹, University of Southern California²

01 Jun 2018

TL;DR: A novel Knowledge Aided Consistency Network (KAC Net) is proposed which is optimized by reconstructing input query and proposal's information, and introduced a Knowledge Based Pooling (KBP) gate to focus on query-related proposals.

Abstract: Given a natural language query, a phrase grounding system aims to localize mentioned objects in an image. In weakly supervised scenario, mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system via learning to reconstruct language information contained in input queries from predicted proposals. However, the optimization is solely guided by the reconstruction loss from the language modality, and ignores rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input query and proposal's information. To leverage complementary knowledge contained in the visual features, we introduce a Knowledge Based Pooling (KBP) gate to focus on query-related proposals. Experiments show that KAC Net provides a significant improvement on two popular datasets.

Proceedings Article•DOI•

Who Am I? Personality Detection Based on Deep Learning for Texts

Xiangguo Sun¹, Bo Liu¹, Jiuxin Cao¹, Junzhou Luo¹, Xiaojun Shen²•Institutions (2)

Southeast University¹, University of Missouri–Kansas City²

20 May 2018

TL;DR: This paper proposes a model named 2CLSTM, a bidirectional LSTM (Long Short-Term Memory network) concatenated with a CNN (Convolutional Neural Network), which detects a user's personality from the structure of texts, showing that text structure can also be an important feature in personality detection from texts.

Abstract: Recently, personality detection based on texts from online social networks has attracted more and more attention. However, most related models are based on letters, words or phrases, which is not sufficient to get good results. In this paper, we present our preliminary but interesting and useful research results to show that the structure of texts can also be an important feature in the study of personality detection from texts. We propose a model named 2CLSTM, which is a bidirectional LSTM (Long Short-Term Memory network) concatenated with a CNN (Convolutional Neural Network), to detect a user's personality using structures of texts. Besides, a concept, Latent Sentence Group (LSG), is put forward to express the abstract feature combination based on closely connected sentences and we use our model to capture it. To the best of our knowledge, most related works only conducted their experiments on one data set, which may not well explain the versatility of their models. We implement our evaluations on two different kinds of datasets, containing long texts and short texts. Evaluations on both datasets have achieved better results, which demonstrate that our model can efficiently learn valid text structure features to accomplish the task.

Journal Article•DOI•

A Neural Approach to Source Dependence Based Context Model for Statistical Machine Translation

Kehai Chen¹, Tiejun Zhao², Muyun Yang², Lemao Liu², Akihiro Tamura³, Rui Wang², Masao Utiyama², Eiichiro Sumita²•Institutions (3)

Harbin Institute of Technology¹, National Institute of Information and Communications Technology², Ehime University³

01 Feb 2018-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A novel neural approach to source dependence-based context representation for translation prediction capable of not only encoding source long-distance dependencies but also capturing functional similarities to better predict translations.

Abstract: In statistical machine translation, translation prediction considers not only the aligned source word itself but also its source contextual information. Learning context representation is a promising method for improving translation results, particularly through neural networks. Most of the existing methods process context words sequentially and neglect source long-distance dependencies. In this paper, we propose a novel neural approach to source dependence-based context representation for translation prediction. The proposed model is capable of not only encoding source long-distance dependencies but also capturing functional similarities to better predict translations (i.e., word form translations and ambiguous word translations). To verify our method, the proposed model is incorporated into phrase-based and hierarchical phrase-based translation models, respectively. Experiments on large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves significant improvement over the baseline systems and outperforms several existing context-enhanced methods.

Posted Content•

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

Kan Chen¹, Jiyang Gao², Ram Nevatia²•Institutions (2)

Adobe Systems¹, University of Southern California²

11 Mar 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a knowledge-aided consistency network (KAC Net) is proposed to leverage complementary knowledge contained in the visual features, which is optimized by reconstructing input query and proposal's information.

Abstract: Given a natural language query, a phrase grounding system aims to localize mentioned objects in an image. In weakly supervised scenario, mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system via learning to reconstruct language information contained in input queries from predicted proposals. However, the optimization is solely guided by the reconstruction loss from the language modality, and ignores rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input query and proposal's information. To leverage complementary knowledge contained in the visual features, we introduce a Knowledge Based Pooling (KBP) gate to focus on query-related proposals. Experiments show that KAC Net provides a significant improvement on two popular datasets.

Proceedings Article•DOI•

Phrase-level Self-Attention Networks for Universal Sentence Encoding.

Wei Wu¹, Houfeng Wang¹, Tianyu Liu¹, Shuming Ma¹•Institutions (1)

Peking University¹

01 Jan 2018

TL;DR: Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase are proposed.

Abstract: Universal sentence encoding is a hot topic in recent NLP research. Attention mechanism has been an integral part in many sentence encoding models, allowing the models to capture context dependencies regardless of the distance between the elements in the sequence. Fully attention-based models have recently attracted enormous interest due to their highly parallelizable computation and significantly less training time. However, the memory consumption of their models grows quadratically with the sentence length, and the syntactic information is neglected. To this end, we propose Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase. As a result, the memory consumption can be reduced because the self-attention is performed at the phrase level instead of the sentence level. At the same time, syntactic information can be easily integrated in the model. Experiment results show that PSAN can achieve the state-of-the-art performance across a plethora of NLP tasks including binary and multi-class classification, natural language inference and sentence similarity.
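
The memory argument in this abstract can be illustrated with a small count of attention-matrix entries. This is only a sketch under stated assumptions: the function name `attention_entries`, the 12-word sentence, and the phrase split `[3, 4, 5]` are hypothetical, and the count ignores the gated memory updating and the hierarchical levels that PSAN also uses.

```python
def attention_entries(span_lengths):
    """Count pairwise attention scores when self-attention is run
    independently inside each span (one n x n matrix per span)."""
    return sum(n * n for n in span_lengths)

# Hypothetical 12-word sentence, split into three phrases of 3, 4 and 5 words.
sentence_length = 12
phrase_lengths = [3, 4, 5]

# Sentence-level self-attention: a single 12 x 12 score matrix.
sentence_level = attention_entries([sentence_length])

# Phrase-level self-attention: one small matrix per phrase.
phrase_level = attention_entries(phrase_lengths)

print(sentence_level, phrase_level)
```

Because the per-span cost is quadratic, splitting a sentence into phrases always yields a total no larger than the sentence-level count, which is the intuition behind the reduced memory consumption claimed above.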

Journal Article•DOI•

Sentence entailment in compositional distributional semantics

Mehrnoosh Sadrzadeh¹, Dimitri Kartsaklis¹, Esma Balkir¹•Institutions (1)

Queen Mary University of London¹

15 Feb 2018-Annals of Mathematics and Artificial Intelligence

TL;DR: In this article, the authors show that entropy-based distances of vectors and density matrices provide a good candidate to measure word-level entailment, and prove that these distances extend compositionally from words to phrases and sentences.

Abstract: Distributional semantic models provide vector representations for words by gathering co-occurrence frequencies from corpora of text. Compositional distributional models extend these from words to phrases and sentences. In categorical compositional distributional semantics, phrase and sentence representations are functions of their grammatical structure and representations of the words therein. In this setting, grammatical structures are formalised by morphisms of a compact closed category and meanings of words are formalised by objects of the same category. These can be instantiated in the form of vectors or density matrices. This paper concerns the applications of this model to phrase and sentence level entailment. We argue that entropy-based distances of vectors and density matrices provide a good candidate to measure word-level entailment, show the advantage of density matrices over vectors for word level entailments, and prove that these distances extend compositionally from words to phrases and sentences. We exemplify our theoretical constructions on real data and a toy entailment dataset and provide preliminary experimental evidence.

Proceedings Article•DOI•

A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification

Mounica Maddela, Wei Xu¹•Institutions (1)

Ohio State University¹

01 Jan 2018

TL;DR: This work creates a human-rated word-complexity lexicon of 15,000 English words and proposes a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase.

Abstract: Current lexical simplification approaches rely heavily on heuristics and corpus level features that do not always align with human judgment. We create a human-rated word-complexity lexicon of 15,000 English words and propose a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase. Our model performs better than the state-of-the-art systems for different lexical simplification tasks and evaluation datasets. Additionally, we also produce SimplePPDB++, a lexical resource of over 10 million simplifying paraphrase rules, by applying our model to the Paraphrase Database (PPDB).

Posted Content•

Secure Phrase Search for Intelligent Processing of Encrypted Data in Cloud-Based IoT

Meng Shen¹, Baoli Ma¹, Liehuang Zhu¹, Xiaojiang Du², Ke Xu³•Institutions (3)

Beijing Institute of Technology¹, Temple University², Tsinghua University³

21 Sep 2018-arXiv: Cryptography and Security

TL;DR: This paper proposes P3, an efficient privacy-preserving phrase search scheme for intelligent encrypted data processing in cloud-based IoT that exploits the homomorphic encryption and bilinear map to determine the location relationship of multiple queried keywords over encrypted data.

Abstract: Phrase search allows retrieval of documents containing an exact phrase, which plays an important role in many machine learning applications for cloud-based IoT, such as intelligent medical data analytics. In order to protect sensitive information from being leaked by service providers, documents (e.g., clinic records) are usually encrypted by data owners before being outsourced to the cloud. This, however, makes the search operation an extremely challenging task. Existing searchable encryption schemes for multi-keyword search operations fail to perform phrase search, as they are unable to determine the location relationship of multiple keywords in a queried phrase over encrypted data on the cloud server side. In this paper, we propose P3, an efficient privacy-preserving phrase search scheme for intelligent encrypted data processing in cloud-based IoT. Our scheme exploits homomorphic encryption and a bilinear map to determine the location relationship of multiple queried keywords over encrypted data. It also utilizes a probabilistic trapdoor generation algorithm to protect users' search patterns. Thorough security analysis demonstrates the security guarantees achieved by P3. We implement a prototype and conduct extensive experiments on real-world datasets. The evaluation results show that, compared with existing multi-keyword search schemes, P3 can greatly improve the search accuracy with moderate overheads.

Book Chapter•DOI•

"One size fits all": an idea whose time has come and gone

Michael Stonebraker¹, Uĝur Çetintemel²•Institutions (2)

Massachusetts Institute of Technology¹, Brown University²

01 Dec 2018

TL;DR: It is argued that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser, and that the classical DBMS architecture is no longer applicable to the database market.

Abstract: The last 25 years of commercial DBMS development can be summed up in a single phrase: "One size fits all". This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser. We use examples from the stream-processing market and the data-warehouse market to bolster our claims. We also briefly discuss other markets for which the traditional architecture is a poor fit and argue for a critical rethinking of the current factoring of systems services into products.

Posted Content•

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu¹, Zhe Lin², Xiaohui Shen², Jimei Yang², Xin Lu², Mohit Bansal¹, Tamara L. Berg¹•Institutions (2)

University of North Carolina at Chapel Hill¹, Adobe Systems²

24 Jan 2018-arXiv: Computer Vision and Pattern Recognition

Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.
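
The step where module weights combine the three module scores into an overall score can be sketched as a softmax-weighted sum. This is a minimal illustration, not the authors' implementation: the softmax normalization, the function names `softmax` and `overall_score`, and all numeric values are assumptions made for the example.

```python
import math

def softmax(logits):
    """Normalize raw logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def overall_score(module_logits, module_scores):
    """Weighted sum of the (subject, location, relationship) scores.

    module_logits: hypothetical language-derived logits, one per module.
    module_scores: hypothetical matching scores of one candidate region.
    """
    weights = softmax(module_logits)
    return sum(w * s for w, s in zip(weights, module_scores))

# An expression that mostly describes appearance gets a high subject logit.
logits = [2.0, 0.5, 0.1]

# Two candidate regions with made-up per-module scores.
region_a = [0.9, 0.2, 0.1]  # matches the described appearance well
region_b = [0.3, 0.8, 0.7]  # matches location and relationship better

candidates = {"a": region_a, "b": region_b}
best = max(candidates, key=lambda name: overall_score(logits, candidates[name]))
print(best)
```

Because the weights depend on the expression, the same candidate scores can rank differently for an expression that emphasizes location or relationships instead of appearance.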

Journal Article•DOI•

Sentence Selection and Weighting for Neural Machine Translation Domain Adaptation

Rui Wang¹, Masao Utiyama¹, Andrew Finch¹, Lemao Liu², Kehai Chen¹, Eiichiro Sumita¹•Institutions (2)

National Institute of Information and Communications Technology¹, Tencent²

01 Oct 2018-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: Empirical results show that the sentence selection and weighting methods can significantly improve the NMT performance, outperforming the existing baselines.

Abstract: Neural machine translation (NMT) has been prominent in many machine translation tasks. However, in some domain-specific tasks, only the corpora from similar domains can improve translation performance. If out-of-domain corpora are directly added into the in-domain corpus, the translation performance may even degrade. Therefore, domain adaptation techniques are essential to solve the NMT domain problem. Most existing methods for domain adaptation are designed for the conventional phrase-based machine translation. For NMT domain adaptation, there have been only a few studies on topics such as fine tuning, domain tags, and domain features. In this paper, we have four goals for sentence-level NMT domain adaptation. First, the NMT's internal sentence embedding is exploited and the sentence embedding similarity is used to select out-of-domain sentences that are close to the in-domain corpus. Second, we propose three sentence weighting methods, i.e., sentence weighting, domain weighting, and batch weighting, to balance the data distribution during NMT training. Third, we propose dynamic training methods to adjust the sentence selection and weighting during NMT training. Fourth, to solve the multidomain problem in a real-world NMT scenario where the domain distributions of training and testing data often mismatch, we propose a multidomain sentence weighting method to balance the domain distributions of training data and match the domain distributions of training and testing data. The proposed methods are evaluated on the International Workshop on Spoken Language Translation (IWSLT) English-to-French/German tasks and a multidomain English-to-French task. Empirical results show that the sentence selection and weighting methods can significantly improve the NMT performance, outperforming the existing baselines.

Posted Content•

Grounding Visual Explanations

Lisa Anne Hendricks¹, Ronghang Hu¹, Trevor Darrell¹, Zeynep Akata²•Institutions (2)

University of California, Berkeley¹, University of Amsterdam²

25 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A phrase-critic model to refine generated candidate explanations augmented with flipped phrases to improve the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

Proceedings Article•

Using Syntax to Ground Referring Expressions in Natural Images.

Volkan Cirik¹, Taylor Berg-Kirkpatrick², Louis-Philippe Morency³•Institutions (3)

Koç University¹, Microsoft², Carnegie Mellon University³

26 May 2018

TL;DR: GroundNet localizes the object referred to by a natural language expression, using a syntactic analysis of the input referring expression to inform the structure of the computation graph.

Abstract: We introduce GroundNet, a neural network for referring expression recognition -- the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.

Proceedings Article•DOI•

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs

Dimitri Kartsaklis¹, Mohammad Taher Pilehvar², Nigel Collier²•Institutions (2)

Queen Mary University of London¹, University of Cambridge²

01 Jan 2018

TL;DR: This paper proposes a multi-sense LSTM with a dynamic disambiguation mechanism on the input word embeddings to address polysemy issues in text-to-entity mapping.

Abstract: This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state of the art results.
