
Showing papers on "Phrase" published in 2017


Proceedings ArticleDOI
01 Aug 2017
TL;DR: The authors explore six challenges for NMT: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search, and show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
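
Of the six challenges, beam search is the most mechanical, and the paper's observation that larger beams can hurt quality is worth making concrete. Below is a minimal, illustrative beam-search decoder, not the authors' implementation; the `step` function standing in for one decoder step of an NMT model is hypothetical, as is every other name.

```python
def beam_search(step, bos, eos, beam_size=5, max_len=50):
    """Minimal beam-search sketch for a seq2seq decoder.

    `step(prefix)` is a hypothetical stand-in for one decoder step:
    it returns a list of (token, log_prob) continuations of `prefix`.
    """
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:           # hypothesis already complete
                finished.append((seq, score))
                continue
            for token, logp in step(seq):
                candidates.append((seq + [token], score + logp))
        if not candidates:               # every hypothesis has finished
            break
        # keep only the best `beam_size` partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(b for b in beams if b[0][-1] != eos)
    # normalize by length: the paper observes that without a correction like
    # this, larger beams favor too-short outputs and can hurt quality
    return max(finished, key=lambda c: c[1] / len(c[0]))
```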

840 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: In ViP-CNN, a Phrase-guided Message Passing Structure (PMPS) is presented to establish the connection among relationship components and help the model consider the three problems jointly, and experimental results show that ViP-CNN outperforms the state-of-the-art method in both speed and accuracy.
Abstract: As the intermediate-level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It detects the objects and captures their pair-wise interactions with a subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each visual relationship is considered as a phrase with three components. We formulate visual relationship detection as three inter-connected recognition problems and propose a Visual Phrase guided Convolutional Neural Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a Phrase-guided Message Passing Structure (PMPS) to establish the connection among relationship components and help the model consider the three problems jointly. A corresponding non-maximum suppression method and model training strategy are also proposed. Experimental results show that our ViP-CNN outperforms the state-of-the-art method in both speed and accuracy. We further pretrain ViP-CNN on our cleansed Visual Genome Relationship dataset, which is found to perform better than pretraining on ImageNet for this task.

248 citations


Proceedings ArticleDOI
07 Apr 2017
TL;DR: This paper revisits bilingual pivoting in the context of neural machine translation and presents a paraphrasing model based purely on neural networks, which represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input.
Abstract: Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by “pivoting” over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches.
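
The bilingual pivoting that the paper revisits is conventionally formalized, following Bannard and Callison-Burch (2005), by marginalizing over shared foreign-language translations f of two same-language phrases e1 and e2 (this is the standard formulation, not an equation quoted from the paper):

$$ p(e_2 \mid e_1) \approx \sum_{f} p(e_2 \mid f)\, p(f \mid e_1) $$

The neural model replaces these count-based phrase-table estimates with scores computed from continuous representations, which is what allows it to compare text segments of arbitrary length.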

246 citations


Journal ArticleDOI
TL;DR: The results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.
Abstract: Although sentences unfold sequentially, one word at a time, most linguistic theories propose that their underlying syntactic structure involves a tree of nested phrases rather than a linear sequence of words. Whether and how the brain builds such structures, however, remains largely unknown. Here, we used human intracranial recordings and visual word-by-word presentation of sentences and word lists to investigate how left-hemispheric brain activity varies during the formation of phrase structures. In a broad set of language-related areas, comprising multiple superior temporal and inferior frontal sites, high-gamma power increased with each successive word in a sentence but decreased suddenly whenever words could be merged into a phrase. Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity. More superficial models of language, based solely on sequential transition probability over lexical and syntactic categories, only captured activity in the posterior middle temporal gyrus. Formal model comparison indicated that the model of multiword phrase construction provided a better fit than probability-based models at most sites in superior temporal and inferior frontal cortices. Activity in those regions was consistent with a neural implementation of a bottom-up or left-corner parser of the incoming language stream. Our results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.

219 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: The authors used a large collection of linguistic and visual cues, such as appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions.
Abstract: This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body part mentions, as they are useful for distinguishing individuals. We automatically learn weights for combining these cues and, at test time, perform joint inference over all phrases in a caption. The resulting system produces state-of-the-art performance on phrase localization on the Flickr30k Entities dataset [33] and visual relationship detection on the Stanford VRD dataset [27].
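
As a rough sketch of the cue-combination idea — a learned weighted combination of heterogeneous cue scores, not the authors' actual model or feature set — scoring a candidate box for a phrase might look like this, with all names and values hypothetical:

```python
# Hypothetical cue scores for one (phrase, candidate box) pair; the cue
# types follow the abstract, but the values and weights are made up.
CUES = ["appearance", "size", "position", "adjective", "relationship"]

def box_score(cue_scores, weights):
    """Linear combination of cue scores; weights are learned on training data."""
    return sum(weights[c] * cue_scores[c] for c in CUES)

cue_scores = {"appearance": 0.8, "size": 0.4, "position": 0.6,
              "adjective": 0.7, "relationship": 0.5}
weights = {"appearance": 1.2, "size": 0.3, "position": 0.5,
           "adjective": 0.9, "relationship": 0.8}

print(box_score(cue_scores, weights))  # at test time, take the argmax over boxes
```

Note that the paper performs joint inference over all phrases in a caption at test time rather than scoring each phrase independently as above.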

164 citations


Patent
29 Sep 2017
TL;DR: In this patent, various systems and methods of selecting a display language for a surgical instrument are disclosed. The surgical instrument includes a handle assembly and a module, such as a shaft assembly, that is interchangeably connectable to the handle assembly.
Abstract: Various systems and methods of selecting a display language for a surgical instrument are disclosed. The surgical instrument includes a handle assembly and a module, such as a shaft assembly, that is interchangeably connectable to the handle assembly. The handle assembly includes a first memory configured to store a language parameter and a control circuit that is operably coupled to the memory. The module includes a second memory configured to store a textual phrase in a plurality of languages. When the module and the handle assembly are connected together, the control circuit is configured to retrieve, from the second memory, the textual phrase in a language corresponding to the language parameter.
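
Functionally, the claimed retrieval step is a keyed lookup: the handle's memory stores a language parameter and the module's memory stores the phrase in several languages. A toy sketch of that behavior, with all identifiers and strings hypothetical:

```python
# Toy model of the patent's two memories (all names and strings made up).
handle_memory = {"language_parameter": "fr"}   # first memory, on the handle

module_memory = {                              # second memory, on the module:
    "en": "Reload cartridge",                  # the textual phrase stored in
    "fr": "Recharger la cartouche",            # a plurality of languages
    "de": "Kartusche nachladen",
}

def display_phrase(handle, module):
    """Control-circuit behavior: fetch the phrase matching the handle's language."""
    return module[handle["language_parameter"]]

print(display_phrase(handle_memory, module_memory))  # -> "Recharger la cartouche"
```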

151 citations


Journal ArticleDOI
TL;DR: The Flickr30k Entities dataset as mentioned in this paper augments the 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image and associating them with 276k manually annotated bounding boxes.
Abstract: The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals more complex state-of-the-art models in accuracy, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.

150 citations


Proceedings ArticleDOI
04 Aug 2017
TL;DR: QRC Net adopts a spatial regression method to break the performance limit and introduces reinforcement learning techniques to further leverage semantic context information, jointly learning a Proposal Generation Network (PGN), a Query-guided Regression Network (QRN), and a Context Policy Network (CPN).
Abstract: Given a textual description of an image, phrase grounding localizes objects in the image referred to by query phrases in the description. State-of-the-art methods address the problem by ranking a set of proposals based on their relevance to each query, but they are limited by the performance of independent proposal generation systems and ignore useful cues from context in the description. In this paper, we adopt a spatial regression method to break the performance limit, and introduce reinforcement learning techniques to further leverage semantic context information. We propose a novel Query-guided Regression network with Context policy (QRC Net) which jointly learns a Proposal Generation Network (PGN), a Query-guided Regression Network (QRN) and a Context Policy Network (CPN). Experiments show QRC Net provides a significant improvement in accuracy on two popular datasets, Flickr30K Entities and Referit Game, with increases of 14.25% and 17.14% over the state of the art, respectively.

139 citations


Journal ArticleDOI
01 Mar 2017-Cortex
TL;DR: This article showed that the posterior superior temporal sulcus (pSTS), the anterior temporal lobe (ATL), and the left inferior frontal gyrus (IFG) underlie top-down syntactic predictions but are not necessary for building syntactic structure.

126 citations


Posted Content
TL;DR: This paper investigates two-branch neural networks for learning the similarity between images and text, covering both image-sentence matching and region-phrase matching, and proposes two network structures that produce different output representations.
Abstract: Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
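
A minimal sketch of the embedding-network variant: two branches projected into a shared space and trained with a max-margin triplet loss. The layer sizes below are made up, and the neighborhood sampling and constraints described in the abstract are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchEmbedding(nn.Module):
    """Illustrative two-branch embedding network; all dimensions are made up."""
    def __init__(self, img_dim=4096, txt_dim=6000, embed_dim=512):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Linear(img_dim, 2048), nn.ReLU(), nn.Linear(2048, embed_dim))
        self.txt_branch = nn.Sequential(
            nn.Linear(txt_dim, 2048), nn.ReLU(), nn.Linear(2048, embed_dim))

    def forward(self, img_feat, txt_feat):
        # L2-normalize so dot products in the shared space are cosine similarities
        x = F.normalize(self.img_branch(img_feat), dim=-1)
        y = F.normalize(self.txt_branch(txt_feat), dim=-1)
        return x, y

def margin_ranking_loss(x, y_pos, y_neg, margin=0.1):
    """Max-margin triplet loss: a matching pair must outscore a mismatched one."""
    pos = (x * y_pos).sum(dim=-1)   # similarity of image to matching text
    neg = (x * y_neg).sum(dim=-1)   # similarity of image to non-matching text
    return F.relu(margin + neg - pos).mean()
```

The similarity-network variant instead fuses the two branches with an element-wise product and trains with a regression loss to predict the similarity score directly.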

122 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is found that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems.
Abstract: We aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation paradigm. To that end, we conduct a multifaceted evaluation in which we compare outputs produced by state-of-the-art neural machine translation and phrase-based machine translation systems for 9 language directions across a number of dimensions. Specifically, we measure the similarity of the outputs, their fluency and amount of reordering, the effect of sentence length, and performance across different error categories. We find that translations produced by neural machine translation systems are considerably different, more fluent, and more accurate in terms of word order compared to those produced by phrase-based systems. Neural machine translation systems are also more accurate at producing inflected forms, but they perform poorly when translating very long sentences.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The NMT system's internal embedding of the source sentence is exploited, and sentence embedding similarity is used to select sentences that are close to in-domain data, substantially improving NMT performance.
Abstract: Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.
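
The selection step is easy to sketch: embed candidate sentences with the NMT encoder, then keep those most similar to in-domain data. Below, similarity to the in-domain centroid is used as one simple criterion; the embedding source and the exact criterion are stand-ins, not necessarily the paper's.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def select_in_domain(candidate_vecs, in_domain_vecs, top_k):
    """Rank candidate sentences by similarity to the in-domain centroid.

    `candidate_vecs` / `in_domain_vecs` are sentence embeddings taken from
    the NMT encoder (stand-in: any fixed-size vectors). Comparing against
    the centroid is one simple choice among several.
    """
    centroid = in_domain_vecs.mean(axis=0)
    scores = [cosine(v, centroid) for v in candidate_vecs]
    order = np.argsort(scores)[::-1]   # highest similarity first
    return order[:top_k]               # indices of the selected sentences
```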

Journal ArticleDOI
TL;DR: This article proposes to learn tag-specific composition functions and tag embeddings in recursive neural networks, and to utilize POS tags to control the gates of tree-structured LSTM networks.
Abstract: Phrase/Sentence representation is one of the most important problems in natural language processing. Many neural network models such as Convolutional Neural Network (CNN), Recursive Neural Network (RNN), and Long Short-Term Memory (LSTM) have been proposed to learn representations of phrase/sentence, however, rich syntactic knowledge has not been fully explored when composing a longer text from its shorter constituent words. In most traditional models, only word embeddings are utilized to compose phrase/sentence representations, while the syntactic information of words is yet to be explored. In this article, we discover that encoding syntactic knowledge (part-of-speech tag) in neural networks can enhance sentence/phrase representation. Specifically, we propose to learn tag-specific composition functions and tag embeddings in recursive neural networks, and propose to utilize POS tags to control the gates of tree-structured LSTM networks. We evaluate these models on two benchmark datasets for sentiment classification, and demonstrate that improvements can be obtained with such syntactic knowledge encoded.
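
Schematically, a tag-specific recursive composition replaces the single composition matrix of a standard recursive network with one selected by the children's POS tags, optionally concatenating learned tag embeddings; a generic form (notation mine, not the article's) is:

$$ h_p = \tanh\big( W_{(t_l, t_r)} [\, h_l ;\, h_r ;\, e_{t_l} ;\, e_{t_r} \,] + b_{(t_l, t_r)} \big) $$

where h_l and h_r are the child representations, t_l and t_r their POS tags, and e_t a learned tag embedding. In the tree-structured LSTM variant, the tags instead modulate the gate activations.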

Journal ArticleDOI
TL;DR: This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a single pure neural network model under an end-to-end framework, generating character-level sequence representations as input.
Abstract: This paper presents a character-level sequence-to-sequence learning method, RNNembed. This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a pure single neural network model under an end-to-end framework. Specifically, we embed a recurrent neural network into an encoder–decoder framework and generate character-level sequence representations as input. The dimension of the input feature space can be significantly reduced, and the need to handle unknown or rare words in sequences is avoided. In the language model, we improve the basic structure of a gated recurrent unit by adding an output gate, which is used for filtering out unimportant information involved in the attention scheme of the alignment model. Our proposed method was examined on a large-scale dataset for an English-to-Chinese translation task. Experimental results demonstrate that the proposed approach achieves a translation performance comparable, or close, to conventional word-based and phrase-based systems.
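
To make the modification concrete, recall the standard GRU update and add the output gate the paper describes; the notation below is mine, and the paper's exact formulation may differ:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),\\
\tilde{h}_t &= \tanh\big(W x_t + U(r_t \odot h_{t-1})\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,\\
o_t &= \sigma(W_o x_t + U_o h_{t-1}), \qquad \mathrm{output}_t = o_t \odot h_t,
\end{aligned}
$$

with o_t acting as the filter on what the attention scheme of the alignment model sees.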

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper develops visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image, and demonstrates the merits of the models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
Abstract: Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user comments, without using manually labeled images. In particular, we develop visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image. Our visual n-gram models are feed-forward convolutional networks trained using new loss functions that are inspired by n-gram models commonly used in language modeling. We demonstrate the merits of our models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
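
The n-gram flavor of the loss can be conveyed with the standard factorization, here conditioned on the image I (a schematic rendering, not the paper's exact smoothed objective):

$$ p(w_1, \ldots, w_m \mid I) \approx \prod_{i=1}^{m} p\big(w_i \mid w_{i-n+1}, \ldots, w_{i-1}, I\big) $$

where each conditional is produced by the convolutional network; phrases relevant to an image are those scoring high under this model, which is what enables phrase prediction and phrase-based retrieval.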

Proceedings ArticleDOI
Swarnadeep Saha, Harinder Pal
01 Jul 2017
TL;DR: BONIE is designed and released, the first open numerical relation extractor, for extracting Open IE tuples where one of the arguments is a number or a quantity-unit phrase.
Abstract: We design and release BONIE, the first open numerical relation extractor, for extracting Open IE tuples where one of the arguments is a number or a quantity-unit phrase. BONIE uses bootstrapping to learn the specific dependency patterns that express numerical relations in a sentence. BONIE's novelty lies in task-specific customizations, such as inferring implicit relations that are clear from context such as units (e.g., 'square kilometers' suggests area, even if the word 'area' is missing from the sentence). BONIE obtains 1.5x yield and a 15-point precision gain on numerical facts over a state-of-the-art Open IE system.
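
To make the tuple format concrete: a hypothetical extraction in BONIE's style (my example, not one from the paper) would map the sentence 'Finland covers 338,000 square kilometers' to the Open IE tuple (Finland; has area; 338,000 square kilometers), with the relation inferred from the unit 'square kilometers' even though the word 'area' never appears in the sentence.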

Journal ArticleDOI
01 Feb 2017
TL;DR: This study addresses key questions related to the explosion of interest in how to extract insight from unstructured data and how to determine whether such insight provides any hints concerning the trends of financial markets, and proposes a sentiment analysis engine (SAE) that takes advantage of linguistic analyses based on grammars.
Abstract: The growth of financial texts in the wake of big data has challenged most organizations and brought escalating demands for analysis tools. In general, text streams are more challenging to handle than numeric data streams. Text streams are unstructured by nature, but they represent collective expressions that are of value in any financial decision. It can be both daunting and necessary to make sense of unstructured textual data. In this study, we address key questions related to the explosion of interest in how to extract insight from unstructured data and how to determine if such insight provides any hints concerning the trends of financial markets. A sentiment analysis engine (SAE) is proposed which takes advantage of linguistic analyses based on grammars. This engine extends sentiment analysis not only at the word token level, but also at the phrase level within each sentence. An assessment heuristic is applied to extract the collective expressions shown in the texts. Also, three evaluations are presented to assess the performance of the engine. First, several standard parsing evaluation metrics are applied on two treebanks. Second, a benchmark evaluation using a dataset of English movie reviews is conducted. Results show our SAE outperforms the traditional bag-of-words approach. Third, a financial text stream with twelve million words that aligns with a stock market index is examined. The evaluation results and their statistical significance provide strong evidence of a long persistence in the mood time series generated by the engine. In addition, our approach establishes grounds for belief that the sentiments expressed through text streams are helpful for analyzing the trends in a stock market index, although such sentiments and market indices are normally considered to be completely uncorrelated. Highlights: to explain a classifier-based sentiment parser for financial texts; to demonstrate how to assign the polarity of phrases using an assessment heuristic; and to provide statistical tests using twelve million words to attest its significance.
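
As a toy contrast between bag-of-words and phrase-level sentiment — illustrative only; the SAE's actual heuristic operates over full grammatical parses — consider how a negator must flip polarity within its phrase:

```python
# Toy phrase-level polarity composition (illustrative only; the SAE's
# actual assessment heuristic works over grammar-based parses).
LEXICON = {"gain": 1, "strong": 1, "loss": -1, "weak": -1}
NEGATORS = {"not", "no", "never"}

def phrase_polarity(tokens):
    """Sum word polarities, flipping the sign of the word after a negator."""
    score, flip = 0, 1
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1   # apply negation to the next sentiment word only
    return score

print(phrase_polarity("not a strong quarter".split()))  # -> -1
print(phrase_polarity("strong quarter".split()))        # -> 1
```

A bag-of-words scorer would rate both strings positively; composing at the phrase level is what lets an engine get the negated case right.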

Journal ArticleDOI
TL;DR: Compared with RAE and some supervised methods such as support vector machines (SVM) and naive Bayes on English and Chinese datasets, the experimental results show that CHL-PRAE provides the best performance for sentence-level sentiment analysis.

Journal ArticleDOI
01 Nov 2017-Cortex
TL;DR: The findings suggest that syntactic and semantic contributions to phrasal formation can already be differentiated at a very basic level, with each of these two processes recruiting non-overlapping areas of the cerebral cortex.

Journal ArticleDOI
TL;DR: An analysis of the strengths and weaknesses of several Machine Translation engines implementing the three most widely used paradigms finds that the successful translations of neural MT systems sometimes bear resemblance to the translations of a rule-based MT system.
Abstract: In this paper, we report an analysis of the strengths and weaknesses of several Machine Translation (MT) engines implementing the three most widely used paradigms. The analysis is based on a manually built test suite that comprises a large range of linguistic phenomena. Two main observations are, on the one hand, the striking improvement of a commercial online system when turning from a phrase-based to a neural engine, and on the other hand, that the successful translations of neural MT systems sometimes bear resemblance to the translations of a rule-based MT system.

Journal ArticleDOI
Joel L. Fagan
02 Aug 2017
TL;DR: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented.
Abstract: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented. Problems related to this non-syntactic phrase construction method are discussed, and some possible solutions are proposed that make use of information about the syntactic structure of document and query texts.

Posted Content
TL;DR: This work explores six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search, and shows both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.

Posted Content
TL;DR: The proposed Deep Relational Network is a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships, and achieves substantial improvements over the state of the art on two large datasets.
Abstract: Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. "ride") or each distinct visual phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with significant difficulties caused by the high diversity of visual appearance for each kind of relationships or the large number of distinct visual phrases. We propose an integrated framework to tackle this problem. At the heart of this framework is the Deep Relational Network, a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships. On two large datasets, the proposed method achieves substantial improvement over state-of-the-art.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work considers the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b), and presents several developments that together produce the opposite conclusion.
Abstract: We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.

Proceedings ArticleDOI
Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar Aleksic
01 Dec 2017
TL;DR: A system that uses server-side contextual ASR and trigger phrase non-terminals to improve overall KWS accuracy; it also significantly improves ASR quality, reducing Word Error Rate (WER) by 10% to 50% relative, and allows the user to speak seamlessly, without pausing between the trigger phrase and the voice command.
Abstract: We present a novel keyword spotting (KWS) system that uses contextual automatic speech recognition (ASR). For voice-activated devices, it is common that a KWS system is run on the device in order to quickly detect a trigger phrase (e.g. “Ok Google”). After the trigger phrase is detected, the audio corresponding to the voice command that follows is streamed to the server. The audio is transcribed by the server-side ASR system and semantically processed to generate a response which is sent back to the device. Due to limited resources on the device, the device KWS system might introduce false accepts (FA) and false rejects (FR) that can cause an unsatisfactory user experience. We describe a system that uses server-side contextual ASR and trigger phrase non-terminals to improve overall KWS accuracy. We show that this approach can significantly reduce the FA rate (by 89%) while minimally increasing the FR rate (by 0.2%). Furthermore, we show that this system significantly improves the ASR quality, reducing Word Error Rate (WER) (by 10% to 50% relative), and allows the user to speak seamlessly, without pausing between the trigger phrase and the voice command.

Journal ArticleDOI
TL;DR: This work proposes a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification, and presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
Abstract: The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for the channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
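
The scoring described in the abstract can be written out: a cosine similarity between enrollment and test i-vectors w_e and w_t, followed by symmetric score normalization (s-norm) computed with phrase-dependent impostor cohorts (these are the standard formulas, not equations quoted from the paper):

$$ s(\mathbf{w}_e, \mathbf{w}_t) = \frac{\mathbf{w}_e^{\top} \mathbf{w}_t}{\lVert \mathbf{w}_e \rVert\, \lVert \mathbf{w}_t \rVert}, \qquad s' = \frac{1}{2}\left( \frac{s - \mu_e}{\sigma_e} + \frac{s - \mu_t}{\sigma_t} \right) $$

where (μ_e, σ_e) and (μ_t, σ_t) are the mean and standard deviation of the enrollment and test i-vectors' scores against a cohort of impostor i-vectors for the same phrase.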

Posted Content
TL;DR: This article proposes the Gated Recurrent Averaging Network (GRAN), which is inspired by averaging and LSTMs while outperforming them both in both transfer learning and supervised settings.
Abstract: We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.

Journal ArticleDOI
TL;DR: The visual world paradigm was employed to examine the extent to which gender-marked Spanish determiners facilitate upcoming target nouns in a group of Spanish-English bilingual code-switchers, revealing an asymmetric gender effect in processing.
Abstract: Using code-switching as a tool to illustrate how language experience modulates comprehension, the visual world paradigm was employed to examine the extent to which gender-marked Spanish determiners facilitate upcoming target nouns in a group of Spanish-English bilingual code-switchers. The first experiment tested target Spanish nouns embedded in a carrier phrase ( Experiment 1b ) and included a control Spanish monolingual group ( Experiment 1a ). The second set of experiments included critical trials in which participants heard code-switches from Spanish determiners into English nouns (e.g., la house) either in a fixed carrier phrase ( Experiment 2a ) or in variable and complex sentences ( Experiment 2b ). Across the experiments, bilinguals revealed an asymmetric gender effect in processing, showing facilitation only for feminine target items. These results reflect the asymmetric use of gender in the production of code-switched speech. The extension of the asymmetric effect into Spanish ( Experiment 1b ) underscores the permeability between language modes in bilingual code-switchers.

Journal ArticleDOI
13 Jul 2017
TL;DR: This article evaluates the use of deep learning advances, namely Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs, and proposes the creation of the first Arabic Sentiment Treebank (ArSenTB), which is morphologically and orthographically enriched.
Abstract: Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRL). In this article, we evaluate the use of deep learning advances, namely the Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs. While Arabic may not be considered the only representative of all MRLs, the challenges faced and proposed solutions in Arabic are common to many other MRLs. We identify, illustrate, and address MRL-related challenges and show how RNTN is affected by the morphological richness and orthographic ambiguity of the Arabic language. To address the challenges with sentiment extraction from text in MRL, we propose to explore different orthographic features as well as different morphological features at multiple levels of abstraction ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank; a collection of syntactic parse trees annotated for sentiment at all levels of constituency and that currently only exists in English. Therefore, our contribution also includes the creation of the first Arabic Sentiment Treebank (ArSenTB) that is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory by 7.6%, 3.2%, and 1.6% absolute respectively, all models being trained with similar morphological considerations.

Posted Content
23 Feb 2017
TL;DR: In ViP-CNN, the visual relationship is considered as a phrase with three components and a Visual Phrase Reasoning Structure (VPRS) is presented to set up the connection among the relationship components and help the model consider the three problems jointly.
Abstract: As the intermediate-level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It localizes the objects and captures their interactions with a subject-predicate-object triplet, e.g. 〈person-ride-horse〉. In this paper, the visual relationship is considered as a phrase with three components. So we formulate the visual relationship detection as three inter-connected recognition problems and propose a Visual Phrase reasoning Convolutional Neural Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a Visual Phrase Reasoning Structure (VPRS) to set up the connection among the relationship components and help the model consider the three problems jointly. A corresponding non-maximum suppression method and model training strategy are also proposed. Experimental results show that our ViP-CNN outperforms the state-of-the-art method in both speed and accuracy. We further pretrain our model on our cleansed Visual Genome Relationship dataset, which is found to perform better than pretraining on ImageNet for this task.