
Showing papers on "Phrase published in 2018"


Proceedings ArticleDOI
01 Jun 2018
TL;DR: The authors decompose expressions into three modular components related to subject appearance, location, and relationship to other objects in an end-to-end framework, which allows the model to adapt flexibly to expressions containing different types of information.
Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.

626 citations
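The MAttNet abstract above says that module weights combine the scores of the subject, location, and relationship modules into one overall score. A minimal sketch of that combination step follows. It assumes the weights come from a softmax over three language-derived logits; the abstract does not state this, and the function names and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def overall_score(module_logits, module_scores):
    """Weighted sum of per-module matching scores for one candidate region.

    module_logits: assumed to be predicted from the expression (hypothetical).
    module_scores: subject, location, relationship scores for the region.
    """
    weights = softmax(module_logits)
    return sum(w * s for w, s in zip(weights, module_scores))


# Hypothetical values: the expression mostly describes appearance, so the
# subject module receives the largest weight.
logits = [2.0, 0.5, -1.0]   # subject, location, relationship
scores = [0.9, 0.4, 0.1]
print(overall_score(logits, scores))
```

Because the weights depend on the expression, a phrase such as "the cup on the left" could shift weight toward the location module without changing the modules themselves.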


Proceedings ArticleDOI
20 Apr 2018
TL;DR: The authors proposed two model variants, a neural and a phrase-based model, which leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation.
Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT’14 English-French and WMT’16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.

461 citations


Journal ArticleDOI
TL;DR: This paper proposed a framework for automated phrase mining, $\mathsf{AutoPhrase}$, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger.
Abstract: As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus and has various downstream applications including information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and genres without extra but expensive adaptation. None of the state-of-the-art models, even data-driven models, is fully automated because they require human experts for designing rules or labeling phrases. In this paper, we propose a novel framework for automated phrase mining, $\mathsf{AutoPhrase}$, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, $\mathsf{AutoPhrase}$ has shown significant improvements in both effectiveness and efficiency on five real-world datasets across different domains and languages. Besides, $\mathsf{AutoPhrase}$ can be extended to model single-word quality phrases.

286 citations


Journal ArticleDOI
Lars Meyer
TL;DR: An accessible and extensive review of the functional mechanisms that neural oscillations subserve in speech processing and language comprehension and synthesises a mapping from each linguistic processing domain to a unique set of subserving oscillatory mechanisms.
Abstract: Neural oscillations subserve a broad range of functions in speech processing and language comprehension. On the one hand, speech contains (somewhat) repetitive trains of air pressure bursts that occur at three dominant amplitude modulation frequencies, physically marking the linguistically meaningful progressions of phonemes, syllables and intonational phrase boundaries. To these acoustic events, neural oscillations of isomorphous operating frequencies are thought to synchronise, presumably resulting in an implicit temporal alignment of periods of neural excitability to linguistically meaningful spectral information on the three low-level linguistic description levels. On the other hand, speech is a carrier signal that codes for high-level linguistic meaning, such as syntactic structure and semantic information, which cannot be read from stimulus acoustics, but must be acquired during language acquisition and decoded for language comprehension. Neural oscillations subserve the processing of both syntactic structure and semantic information. Here, I synthesise a mapping from each linguistic processing domain to a unique set of subserving oscillatory mechanisms; the mapping is plausible given the role ascribed to different oscillatory mechanisms in different subfunctions of cortical information processing and faithful to the underlying electrophysiology. In sum, the present article provides an accessible and extensive review of the functional mechanisms that neural oscillations subserve in speech processing and language comprehension.

223 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: The A-ATT mechanism can circularly accumulate the attention for useful information in image, query, and objects, while noise is gradually ignored, and the experimental results show the superiority of the proposed method in terms of accuracy.
Abstract: Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence or even a multi-round dialogue. There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object. Most existing methods combine all the information curtly, which may suffer from the problem of information redundancy (i.e. ambiguous query, complicated image and a large number of objects). In this paper, we formulate these challenges as three attention problems and propose an accumulated attention (A-ATT) mechanism to reason among them jointly. Our A-ATT mechanism can circularly accumulate the attention for useful information in image, query, and objects, while noise is gradually ignored. We evaluate the performance of A-ATT on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and Guesswhat?!), and the experimental results show the superiority of the proposed method in terms of accuracy.

197 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes a phrase-critic model to refine generated candidate explanations augmented with flipped phrases, which are used as negative examples during training; the model improves the textual explanation quality of fine-grained classification decisions by mentioning phrases that are grounded in the image.
Abstract: Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations augmented with flipped phrases which we use as negative examples while training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counter arguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models.

173 citations


Proceedings Article
26 Apr 2018
TL;DR: Results show that the proposed reinforcement learning method can learn task-friendly representations by identifying important words or task-relevant structures without explicit structure annotations, and thus yields competitive performance.
Abstract: Representation learning is a fundamental problem in natural language processing. This paper studies how to learn a structured representation for text classification. Unlike most existing representation models that either use no structure or rely on pre-specified structures, we propose a reinforcement learning (RL) method to learn sentence representation by discovering optimized structures automatically. We demonstrate two attempts to build structured representation: Information Distilled LSTM (ID-LSTM) and Hierarchically Structured LSTM (HS-LSTM). ID-LSTM selects only important, task-relevant words, and HS-LSTM discovers phrase structures in a sentence. Structure discovery in the two representation models is formulated as a sequential decision problem: current decision of structure discovery affects following decisions, which can be addressed by policy gradient RL. Results show that our method can learn task-friendly representations by identifying important words or task-relevant structures without explicit structure annotations, and thus yields competitive performance.

142 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes a neural module network architecture for visual dialog by introducing two novel modules, Refer and Exclude, that perform explicit, grounded, coreference resolution at a finer word level, and demonstrates the effectiveness of the model on MNIST Dialog.
Abstract: Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determining which words, typically noun phrases and pronouns, co-refer to the same entity/object instance in an image. This is crucial, especially for pronouns (e.g., ‘it’), as the dialog agent must first link it to a previous coreference (e.g., ‘boat’), and only then can rely on the visual grounding of the coreference ‘boat’ to reason about the pronoun ‘it’. Prior work (in visual dialog) models visual coreference resolution either (a) implicitly via a memory network over history, or (b) at a coarse level for the entire question; and not explicitly at a phrase level of granularity. In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level. We demonstrate the effectiveness of our model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy, and on VisDial, a large and challenging visual dialog dataset on real images, where our model outperforms other approaches, and is more interpretable, grounded, and consistent qualitatively.

134 citations


Posted Content
TL;DR: This work proposes a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level, and demonstrates the effectiveness of the model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy.
Abstract: Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determining which words, typically noun phrases and pronouns, co-refer to the same entity/object instance in an image. This is crucial, especially for pronouns (e.g., `it'), as the dialog agent must first link it to a previous coreference (e.g., `boat'), and only then can rely on the visual grounding of the coreference `boat' to reason about the pronoun `it'. Prior work (in visual dialog) models visual coreference resolution either (a) implicitly via a memory network over history, or (b) at a coarse level for the entire question; and not explicitly at a phrase level of granularity. In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules - Refer and Exclude - that perform explicit, grounded, coreference resolution at a finer word level. We demonstrate the effectiveness of our model on MNIST Dialog, a visually simple yet coreference-wise complex dataset, by achieving near perfect accuracy, and on VisDial, a large and challenging visual dialog dataset on real images, where our model outperforms other approaches, and is more interpretable, grounded, and consistent qualitatively.

107 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This article proposes a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments, allowing the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers.
Abstract: This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain a (resp.) 4%, 3%, and 4% improvement in grounding performance over a strong region-phrase embedding baseline (Code: https://github.com/BryanPlummer/cite).

107 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: An effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank is proposed.
Abstract: Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.
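The Key2Vec abstract above ranks candidate keyphrases with a theme-weighted PageRank. One way to picture this is a personalized PageRank whose random-jump distribution is biased toward candidates that are close to the document theme. The sketch below makes that assumption explicit; the abstract does not give the exact weighting, and the graph, theme scores, and function name are hypothetical.

```python
def theme_weighted_pagerank(edges, theme_weight, damping=0.85, iters=50):
    """Personalized PageRank over an undirected candidate-phrase graph.

    edges: pairs of co-occurring candidate phrases (hypothetical input).
    theme_weight: non-negative similarity of each candidate to the theme.
    Assumes every candidate appears in at least one edge.
    """
    nodes = sorted(theme_weight)
    total = sum(theme_weight.values())
    jump = {n: theme_weight[n] / total for n in nodes}  # biased teleport

    neighbors = {n: [] for n in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    rank = dict(jump)
    for _ in range(iters):
        updated = {}
        for n in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            updated[n] = (1 - damping) * jump[n] + damping * incoming
        rank = updated
    return rank


edges = [("phrase embeddings", "keyphrase extraction"),
         ("keyphrase extraction", "benchmark datasets"),
         ("phrase embeddings", "benchmark datasets")]
theme = {"phrase embeddings": 0.8,
         "keyphrase extraction": 0.1,
         "benchmark datasets": 0.1}
ranks = theme_weighted_pagerank(edges, theme)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph all three candidates are equally connected, so any difference in the final ranking comes only from the theme weights.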

Proceedings ArticleDOI
01 Jun 2018
TL;DR: This article proposed to fix the norms of both vectors to a constant value and integrate a simple lexical module which is jointly trained with the rest of the model, which achieved improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
Abstract: We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
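The first solution in the abstract above fixes the norms of the context vector and the output word embeddings to a constant. A minimal sketch of the effect on output logits follows; the radius `r`, the toy vectors, and the function names are assumptions for illustration, and the vectors are assumed to be non-zero.

```python
import math


def rescale(vector, r):
    """Rescale a non-zero vector so that its Euclidean norm equals r."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [r * x / norm for x in vector]


def fixed_norm_logits(context, embeddings, r=3.0):
    """Inner products after both sides are rescaled to norm r.

    The logit becomes r*r times the cosine similarity, so a word can no
    longer score higher merely because its embedding has a larger norm.
    """
    c = rescale(context, r)
    return {
        word: sum(a * b for a, b in zip(c, rescale(emb, r)))
        for word, emb in embeddings.items()
    }


# Hypothetical embeddings: same direction, very different magnitudes.
embeddings = {"frequent": [10.0, 0.0], "rare": [0.1, 0.0]}
print(fixed_norm_logits([1.0, 0.0], embeddings))
```

With a plain inner product the large-norm embedding would dominate; after rescaling, both words receive the same logit because they point in the same direction.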

Posted Content
TL;DR: In this article, a white-box iterative optimization-based attack was applied to Mozilla's DeepSpeech end-to-end speech recognition system, achieving a 100% success rate.
Abstract: We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.

Posted Content
TL;DR: This work investigates how to learn to translate when having access to only large monolingual corpora in each language, and proposes two model variants, a neural and a phrase-based model, which are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters.
Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT'14 English-French and WMT'16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.

Proceedings ArticleDOI
10 Apr 2018
TL;DR: Canonicalization using Embeddings and Side Information (CESI) is proposed -- a novel approach which performs canonicalization over learned embeddings of Open KBs by incorporating relevant NP and relation phrase side information in a principled manner.
Abstract: Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually-defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) -- a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.

Proceedings ArticleDOI
15 Oct 2018
TL;DR: This work proposes a Joint Global and Co-Attentive Representation learning method (JGCAR) for image-sentence retrieval, with a novel softmax-like bi-directional ranking loss that learns the co-attentive representation for image-sentence similarity computation; the method can discover correlative components and rectify inappropriate component-level correlation to produce more accurate sentence-level ranking results.
Abstract: In the image-sentence retrieval task, correlated images and sentences involve different levels of semantic relevance. However, existing multi-modal representation learning paradigms fail to capture the meaningful component relation on word and phrase level, while the attention-based methods still suffer from component-level mismatching and a huge computational burden. We propose a Joint Global and Co-Attentive Representation learning method (JGCAR) for image-sentence retrieval. We formulate a global representation learning task which utilizes both intra-modal and inter-modal relative similarity to optimize the semantic consistency of the visual/textual component representations. We further develop a co-attention learning procedure to fully exploit different levels of visual-linguistic relations. We design a novel softmax-like bi-directional ranking loss to learn the co-attentive representation for image-sentence similarity computation. It is capable of discovering the correlative components and rectifying inappropriate component-level correlation to produce more accurate sentence-level ranking results. By joint global and co-attentive representation learning, the latter benefits from the former by producing more semantically consistent component representation, and the former also benefits from the latter by back-propagating the contextual information. Image-sentence retrieval is performed as a two-step process in the testing stage, inheriting advantages on both effectiveness and efficiency. Experiments show that JGCAR outperforms existing methods on MSCOCO and Flickr30K image-sentence retrieval tasks.

Proceedings ArticleDOI
01 Jun 2018
TL;DR: A novel Knowledge Aided Consistency Network (KAC Net) is proposed which is optimized by reconstructing input query and proposal's information, and introduced a Knowledge Based Pooling (KBP) gate to focus on query-related proposals.
Abstract: Given a natural language query, a phrase grounding system aims to localize mentioned objects in an image. In the weakly supervised scenario, mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system via learning to reconstruct language information contained in input queries from predicted proposals. However, the optimization is solely guided by the reconstruction loss from the language modality, and ignores rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input query and proposal's information. To leverage complementary knowledge contained in the visual features, we introduce a Knowledge Based Pooling (KBP) gate to focus on query-related proposals. Experiments show that KAC Net provides a significant improvement on two popular datasets.

Proceedings ArticleDOI
20 May 2018
TL;DR: This paper proposes a model named 2CLSTM, a bidirectional LSTM (Long Short-Term Memory network) concatenated with a CNN (Convolutional Neural Network), to detect a user's personality from the structure of texts, showing that text structure can also be an important feature in personality detection from texts.
Abstract: Recently, personality detection based on texts from online social networks has attracted more and more attention. However, most related models are based on letters, words or phrases, which is not sufficient to get good results. In this paper, we present our preliminary but interesting and useful research results to show that the structure of texts can also be an important feature in the study of personality detection from texts. We propose a model named 2CLSTM, which is a bidirectional LSTM (Long Short-Term Memory network) concatenated with a CNN (Convolutional Neural Network), to detect a user's personality using structures of texts. Besides, a concept, Latent Sentence Group (LSG), is put forward to express the abstract feature combination based on closely connected sentences, and we use our model to capture it. To the best of our knowledge, most related works only conducted their experiments on one data set, which may not well explain the versatility of their models. We implement our evaluations on two different kinds of datasets, containing long texts and short texts. Evaluations on both datasets have achieved better results, which demonstrates that our model can efficiently learn valid text structure features to accomplish the task.

Journal ArticleDOI
TL;DR: A novel neural approach to source dependence-based context representation for translation prediction capable of not only encoding source long-distance dependencies but also capturing functional similarities to better predict translations.
Abstract: In statistical machine translation, translation prediction considers not only the aligned source word itself but also its source contextual information. Learning context representation is a promising method for improving translation results, particularly through neural networks. Most of the existing methods process context words sequentially and neglect source long-distance dependencies. In this paper, we propose a novel neural approach to source dependence-based context representation for translation prediction. The proposed model is capable of not only encoding source long-distance dependencies but also capturing functional similarities to better predict translations (i.e., word form translations and ambiguous word translations). To verify our method, the proposed model is incorporated into phrase-based and hierarchical phrase-based translation models, respectively. Experiments on large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves significant improvement over the baseline systems and outperforms several existing context-enhanced methods.

Posted Content
TL;DR: In this paper, a knowledge-aided consistency network (KAC Net) is proposed to leverage complementary knowledge contained in the visual features, which is optimized by reconstructing input query and proposal's information.
Abstract: Given a natural language query, a phrase grounding system aims to localize mentioned objects in an image. In weakly supervised scenario, mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system via learning to reconstruct language information contained in input queries from predicted proposals. However, the optimization is solely guided by the reconstruction loss from the language modality, and ignores rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net) which is optimized by reconstructing input query and proposal's information. To leverage complementary knowledge contained in the visual features, we introduce a Knowledge Based Pooling (KBP) gate to focus on query-related proposals. Experiments show that KAC Net provides a significant improvement on two popular datasets.

Proceedings ArticleDOI
01 Jan 2018
TL;DR: Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase are proposed.
Abstract: Universal sentence encoding is a hot topic in recent NLP research. Attention mechanism has been an integral part in many sentence encoding models, allowing the models to capture context dependencies regardless of the distance between the elements in the sequence. Fully attention-based models have recently attracted enormous interest due to their highly parallelizable computation and significantly less training time. However, the memory consumption of their models grows quadratically with the sentence length, and the syntactic information is neglected. To this end, we propose Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase. As a result, the memory consumption can be reduced because the self-attention is performed at the phrase level instead of the sentence level. At the same time, syntactic information can be easily integrated in the model. Experiment results show that PSAN can achieve the state-of-the-art performance across a plethora of NLP tasks including binary and multi-class classification, natural language inference and sentence similarity.

Journal ArticleDOI
TL;DR: In this article, the authors show that entropy-based distances of vectors and density matrices provide a good candidate to measure word-level entailment, and prove that these distances extend compositionally from words to phrases and sentences.
Abstract: Distributional semantic models provide vector representations for words by gathering co-occurrence frequencies from corpora of text. Compositional distributional models extend these from words to phrases and sentences. In categorical compositional distributional semantics, phrase and sentence representations are functions of their grammatical structure and representations of the words therein. In this setting, grammatical structures are formalised by morphisms of a compact closed category and meanings of words are formalised by objects of the same category. These can be instantiated in the form of vectors or density matrices. This paper concerns the applications of this model to phrase and sentence level entailment. We argue that entropy-based distances of vectors and density matrices provide a good candidate to measure word-level entailment, show the advantage of density matrices over vectors for word level entailments, and prove that these distances extend compositionally from words to phrases and sentences. We exemplify our theoretical constructions on real data and a toy entailment dataset and provide preliminary experimental evidence.
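The abstract above argues that entropy-based distances are a good candidate for measuring word-level entailment. The sketch below uses KL divergence (relative entropy) between two word distributions as one familiar entropy-based measure. The abstract does not name the distance, so this choice is an assumption, and the context distributions are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) for discrete distributions of equal length.

    Returns infinity when p puts mass where q has none, which is what makes
    the measure asymmetric and therefore usable as a directional signal.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf   # p has support outside q
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions over three contexts: "dog" occurs in a subset
# of the contexts in which the broader word "animal" occurs.
dog = [0.7, 0.3, 0.0]
animal = [0.4, 0.3, 0.3]

print(kl_divergence(dog, animal))   # finite
print(kl_divergence(animal, dog))   # infinite
```

The asymmetry mirrors the direction of entailment in this toy case: the narrower word's distribution sits inside the broader one, but not the other way round.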

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This work creates a human-rated word-complexity lexicon of 15,000 English words and proposes a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase.
Abstract: Current lexical simplification approaches rely heavily on heuristics and corpus level features that do not always align with human judgment. We create a human-rated word-complexity lexicon of 15,000 English words and propose a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase. Our model performs better than the state-of-the-art systems for different lexical simplification tasks and evaluation datasets. Additionally, we also produce SimplePPDB++, a lexical resource of over 10 million simplifying paraphrase rules, by applying our model to the Paraphrase Database (PPDB).
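
The abstract does not spell out the Gaussian-based feature vectorization layer. One plausible reading, sketched below under stated assumptions, maps a scalar feature onto evenly spaced bins with Gaussian activations, so that nearby values get similar vectors. The bin layout, the `gamma` width parameter and the normalization are all hypothetical choices for illustration.

```python
import math


def gaussian_vectorize(value, low, high, n_bins, gamma=0.2):
    """Map a scalar in [low, high] to n_bins soft bin activations that sum to 1.

    Assumptions: bins are evenly spaced, each bin is a Gaussian centred on the
    bin midpoint, and the standard deviation is gamma times the bin width.
    """
    value = min(max(value, low), high)  # clamp so some bin is always close
    width = (high - low) / n_bins
    sigma = gamma * width
    centers = [low + (j + 0.5) * width for j in range(n_bins)]
    acts = [math.exp(-((value - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]


# A feature value of 0.5 on a 0..1 scale, spread over 5 bins.
vec = gaussian_vectorize(0.5, 0.0, 1.0, 5)
print([round(v, 3) for v in vec])
```

Compared with hard one-hot binning, the soft activations keep some information about how close a value sits to a bin boundary.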

Posted Content
TL;DR: This paper proposes P3, an efficient privacy-preserving phrase search scheme for intelligent encrypted data processing in cloud-based IoT that exploits the homomorphic encryption and bilinear map to determine the location relationship of multiple queried keywords over encrypted data.
Abstract: Phrase search allows retrieval of documents containing an exact phrase, which plays an important role in many machine learning applications for cloud-based IoT, such as intelligent medical data analytics. In order to protect sensitive information from being leaked by service providers, documents (e.g., clinic records) are usually encrypted by data owners before being outsourced to the cloud. This, however, makes the search operation an extremely challenging task. Existing searchable encryption schemes for multi-keyword search operations fail to perform phrase search, as they are unable to determine the location relationship of multiple keywords in a queried phrase over encrypted data on the cloud server side. In this paper, we propose P3, an efficient privacy-preserving phrase search scheme for intelligent encrypted data processing in cloud-based IoT. Our scheme exploits the homomorphic encryption and bilinear map to determine the location relationship of multiple queried keywords over encrypted data. It also utilizes a probabilistic trapdoor generation algorithm to protect users' search patterns. Thorough security analysis demonstrates the security guarantees achieved by P3. We implement a prototype and conduct extensive experiments on real-world datasets. The evaluation results show that compared with existing multi-keyword search schemes, P3 can greatly improve the search accuracy with moderate overheads.

Book ChapterDOI
01 Dec 2018
TL;DR: It is argued that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser, and that the classical DBMS architecture is no longer applicable to the database market.
Abstract: The last 25 years of commercial DBMS development can be summed up in a single phrase: "One size fits all". This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser. We use examples from the stream-processing market and the data-warehouse market to bolster our claims. We also briefly discuss other markets for which the traditional architecture is a poor fit and argue for a critical rethinking of the current factoring of systems services into products.

Posted Content
TL;DR: The authors decompose expressions into three modular components related to subject appearance, location, and relationship to other objects in an end-to-end framework, which allows the model to adapt flexibly to expressions containing different types of information.
Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.

Journal ArticleDOI
TL;DR: Empirical results show that the sentence selection and weighting methods can significantly improve the NMT performance, outperforming the existing baselines.
Abstract: Neural machine translation (NMT) has been prominent in many machine translation tasks. However, in some domain-specific tasks, only the corpora from similar domains can improve translation performance. If out-of-domain corpora are directly added into the in-domain corpus, the translation performance may even degrade. Therefore, domain adaptation techniques are essential to solve the NMT domain problem. Most existing methods for domain adaptation are designed for the conventional phrase-based machine translation. For NMT domain adaptation, there have been only a few studies on topics such as fine tuning, domain tags, and domain features. In this paper, we have four goals for sentence level NMT domain adaptation. First, the NMT's internal sentence embedding is exploited and the sentence embedding similarity is used to select out-of-domain sentences that are close to the in-domain corpus. Second, we propose three sentence weighting methods, i.e., sentence weighting, domain weighting, and batch weighting, to balance the data distribution during NMT training. Third, we propose dynamic training methods to adjust the sentence selection and weighting during NMT training. Fourth, to solve the multidomain problem in a real-world NMT scenario where the domain distributions of training and testing data often mismatch, we propose a multidomain sentence weighting method to balance the domain distributions of training data and match the domain distributions of training and testing data. The proposed methods are evaluated on the International Workshop on Spoken Language Translation (IWSLT) English-to-French/German tasks and a multidomain English-to-French task. Empirical results show that the sentence selection and weighting methods can significantly improve the NMT performance, outperforming the existing baselines.
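
The first goal above, selecting out-of-domain sentences whose embeddings are close to the in-domain corpus, can be sketched as follows. The sketch assumes sentence embeddings are already available as plain lists of floats and ranks candidates by cosine similarity to the in-domain centroid; the exact similarity measure used in the paper is not given in the abstract, and all names and vectors here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_similar(in_domain, out_domain, top_k):
    """Return ids of the top_k out-of-domain sentences closest to the in-domain centroid."""
    center = centroid(in_domain)
    ranked = sorted(out_domain.items(),
                    key=lambda item: cosine(item[1], center),
                    reverse=True)
    return [sentence_id for sentence_id, _ in ranked[:top_k]]


# Toy 2-dimensional embeddings (hypothetical data).
in_domain = [[1.0, 0.0], [0.9, 0.1]]
out_domain = {"a": [1.0, 0.05], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(select_similar(in_domain, out_domain, top_k=2))  # ['a', 'c']
```

The same similarity values could also serve as per-sentence weights instead of a hard cut-off, which is closer in spirit to the weighting methods the abstract goes on to describe.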

Posted Content
TL;DR: This work proposes a phrase-critic model that refines generated candidate explanations, using flipped phrases as negative examples during training, and improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.
Abstract: Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations augmented with flipped phrases which we use as negative examples while training. At inference time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image. Our explainable AI agent is capable of providing counter arguments for an alternative prediction, i.e. counterfactuals, along with explanations that justify the correct classification decisions. Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image. Moreover, on the FOIL tasks, our agent detects when there is a mistake in the sentence, grounds the incorrect phrase and corrects it significantly better than other models.

Proceedings Article
26 May 2018
TL;DR: GroundNet uses a syntactic analysis of the input referring expression to inform the structure of its computation graph, and localizes the object referred to by a natural language expression.
Abstract: We introduce GroundNet, a neural network for referring expression recognition -- the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper proposes a multi-sense LSTM with a dynamic disambiguation mechanism on the input word embeddings to address polysemy issues in text-to-entity mapping.
Abstract: This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state of the art results.