
Showing papers on "Phrase" published in 2017


Proceedings ArticleDOI
01 Aug 2017
TL;DR: The authors explore six challenges for NMT: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search, and show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
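
Of the six challenges, beam search is the most mechanical, and the paper's observation that larger beams can hurt quality is worth making concrete. Below is a minimal, illustrative beam-search decoder, not the authors' implementation; the `step` function standing in for one decoder step of an NMT model is hypothetical, as is every other name.

```python
def beam_search(step, bos, eos, beam_size=5, max_len=50):
    """Minimal beam-search sketch for a seq2seq decoder.

    `step(prefix)` is a hypothetical stand-in for one decoder step:
    it returns a list of (token, log_prob) continuations of `prefix`.
    """
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:           # hypothesis already complete
                finished.append((seq, score))
                continue
            for token, logp in step(seq):
                candidates.append((seq + [token], score + logp))
        if not candidates:               # every hypothesis has finished
            break
        # keep only the best `beam_size` partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(b for b in beams if b[0][-1] != eos)
    # normalize by length: the paper observes that without a correction like
    # this, larger beams favor too-short outputs and can hurt quality
    return max(finished, key=lambda c: c[1] / len(c[0]))
```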

840 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: In ViP-CNN, a Phrase-guided Message Passing Structure (PMPS) is presented to establish the connection among relationship components and help the model consider the three problems jointly, and experimental results show that ViP-CNN outperforms the state-of-the-art method in both speed and accuracy.
Abstract: As the intermediate-level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It detects the objects and captures their pair-wise interactions with a subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each visual relationship is considered as a phrase with three components. We formulate visual relationship detection as three inter-connected recognition problems and propose a Visual Phrase guided Convolutional Neural Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a Phrase-guided Message Passing Structure (PMPS) to establish the connection among relationship components and help the model consider the three problems jointly. A corresponding non-maximum suppression method and model training strategy are also proposed. Experimental results show that our ViP-CNN outperforms the state-of-the-art method in both speed and accuracy. We further pretrain ViP-CNN on our cleansed Visual Genome Relationship dataset, which is found to perform better than pretraining on ImageNet for this task.

248 citations


Proceedings ArticleDOI
07 Apr 2017
TL;DR: This paper revisits bilingual pivoting in the context of neural machine translation and presents a paraphrasing model based purely on neural networks, which represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input.
Abstract: Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by “pivoting” over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches.
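
The bilingual pivoting that the paper revisits is conventionally formalized, following Bannard and Callison-Burch (2005), by marginalizing over shared foreign-language translations f of two same-language phrases e1 and e2 (this is the standard formulation, not an equation quoted from the paper):

$$ p(e_2 \mid e_1) \approx \sum_{f} p(e_2 \mid f)\, p(f \mid e_1) $$

The neural model replaces these count-based phrase-table estimates with scores computed from continuous representations, which is what allows it to compare text segments of arbitrary length.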

246 citations


Journal ArticleDOI
TL;DR: The results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.
Abstract: Although sentences unfold sequentially, one word at a time, most linguistic theories propose that their underlying syntactic structure involves a tree of nested phrases rather than a linear sequence of words. Whether and how the brain builds such structures, however, remains largely unknown. Here, we used human intracranial recordings and visual word-by-word presentation of sentences and word lists to investigate how left-hemispheric brain activity varies during the formation of phrase structures. In a broad set of language-related areas, comprising multiple superior temporal and inferior frontal sites, high-gamma power increased with each successive word in a sentence but decreased suddenly whenever words could be merged into a phrase. Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity. More superficial models of language, based solely on sequential transition probability over lexical and syntactic categories, only captured activity in the posterior middle temporal gyrus. Formal model comparison indicated that the model of multiword phrase construction provided a better fit than probability-based models at most sites in superior temporal and inferior frontal cortices. Activity in those regions was consistent with a neural implementation of a bottom-up or left-corner parser of the incoming language stream. Our results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.

219 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: The authors used a large collection of linguistic and visual cues, such as appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions.
Abstract: This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body part mentions, as they are useful for distinguishing individuals. We automatically learn weights for combining these cues and, at test time, perform joint inference over all phrases in a caption. The resulting system produces state-of-the-art performance on phrase localization on the Flickr30k Entities dataset [33] and visual relationship detection on the Stanford VRD dataset [27].
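
As a rough sketch of the cue-combination idea — a learned weighted combination of heterogeneous cue scores, not the authors' actual model or feature set — scoring a candidate box for a phrase might look like this, with all names and values hypothetical:

```python
# Hypothetical cue scores for one (phrase, candidate box) pair; the cue
# types follow the abstract, but the values and weights are made up.
CUES = ["appearance", "size", "position", "adjective", "relationship"]

def box_score(cue_scores, weights):
    """Linear combination of cue scores; weights are learned on training data."""
    return sum(weights[c] * cue_scores[c] for c in CUES)

cue_scores = {"appearance": 0.8, "size": 0.4, "position": 0.6,
              "adjective": 0.7, "relationship": 0.5}
weights = {"appearance": 1.2, "size": 0.3, "position": 0.5,
           "adjective": 0.9, "relationship": 0.8}

print(box_score(cue_scores, weights))  # at test time, take the argmax over boxes
```

Note that the paper performs joint inference over all phrases in a caption at test time rather than scoring each phrase independently as above.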

164 citations


Patent
29 Sep 2017
TL;DR: In this patent, various systems and methods of selecting a display language for a surgical instrument are disclosed. The surgical instrument includes a handle assembly and a module, such as a shaft assembly, that is interchangeably connectable to the handle assembly.
Abstract: Various systems and methods of selecting a display language for a surgical instrument are disclosed. The surgical instrument includes a handle assembly and a module, such as a shaft assembly, that is interchangeably connectable to the handle assembly. The handle assembly includes a first memory configured to store a language parameter and a control circuit that is operably coupled to the memory. The module includes a second memory configured to store a textual phrase in a plurality of languages. When the module and the handle assembly are connected together, the control circuit is configured to retrieve, from the second memory, the textual phrase in a language corresponding to the language parameter.
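
Functionally, the claimed retrieval step is a keyed lookup: the handle's memory stores a language parameter and the module's memory stores the phrase in several languages. A toy sketch of that behavior, with all identifiers and strings hypothetical:

```python
# Toy model of the patent's two memories (all names and strings made up).
handle_memory = {"language_parameter": "fr"}   # first memory, on the handle

module_memory = {                              # second memory, on the module:
    "en": "Reload cartridge",                  # the textual phrase stored in
    "fr": "Recharger la cartouche",            # a plurality of languages
    "de": "Kartusche nachladen",
}

def display_phrase(handle, module):
    """Control-circuit behavior: fetch the phrase matching the handle's language."""
    return module[handle["language_parameter"]]

print(display_phrase(handle_memory, module_memory))  # -> "Recharger la cartouche"
```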

151 citations


Journal ArticleDOI
TL;DR: The Flickr30k Entities dataset as mentioned in this paper augments the 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image and associating them with 276k manually annotated bounding boxes.
Abstract: The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals more complex state-of-the-art models in accuracy, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.

150 citations


Proceedings ArticleDOI
04 Aug 2017
TL;DR: QRC Net adopts a spatial regression method to break the performance limit and introduces reinforcement learning techniques to further leverage semantic context information, jointly learning a Proposal Generation Network (PGN), a Query-guided Regression Network (QRN), and a Context Policy Network (CPN).
Abstract: Given a textual description of an image, phrase grounding localizes objects in the image referred to by query phrases in the description. State-of-the-art methods address the problem by ranking a set of proposals based on their relevance to each query, but they are limited by the performance of independent proposal generation systems and ignore useful cues from context in the description. In this paper, we adopt a spatial regression method to break the performance limit, and introduce reinforcement learning techniques to further leverage semantic context information. We propose a novel Query-guided Regression network with Context policy (QRC Net) which jointly learns a Proposal Generation Network (PGN), a Query-guided Regression Network (QRN) and a Context Policy Network (CPN). Experiments show QRC Net provides a significant improvement in accuracy on two popular datasets, Flickr30K Entities and Referit Game, with increases of 14.25% and 17.14% over the state of the art, respectively.

139 citations


Journal ArticleDOI
01 Mar 2017-Cortex
TL;DR: This article showed that the posterior superior temporal sulcus (pSTS), the anterior temporal lobe (ATL), and the left inferior frontal gyrus (IFG) underlie top-down syntactic predictions but are not necessary for building syntactic structure.

126 citations


Posted Content
TL;DR: This paper investigates two-branch neural networks for learning the similarity between images and text, covering both image-sentence matching and region-phrase matching, and proposes two network structures that produce different output representations.
Abstract: Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
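
A minimal sketch of the embedding-network variant: two branches projected into a shared space and trained with a max-margin triplet loss. The layer sizes below are made up, and the neighborhood sampling and constraints described in the abstract are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchEmbedding(nn.Module):
    """Illustrative two-branch embedding network; all dimensions are made up."""
    def __init__(self, img_dim=4096, txt_dim=6000, embed_dim=512):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Linear(img_dim, 2048), nn.ReLU(), nn.Linear(2048, embed_dim))
        self.txt_branch = nn.Sequential(
            nn.Linear(txt_dim, 2048), nn.ReLU(), nn.Linear(2048, embed_dim))

    def forward(self, img_feat, txt_feat):
        # L2-normalize so dot products in the shared space are cosine similarities
        x = F.normalize(self.img_branch(img_feat), dim=-1)
        y = F.normalize(self.txt_branch(txt_feat), dim=-1)
        return x, y

def margin_ranking_loss(x, y_pos, y_neg, margin=0.1):
    """Max-margin triplet loss: a matching pair must outscore a mismatched one."""
    pos = (x * y_pos).sum(dim=-1)   # similarity of image to matching text
    neg = (x * y_neg).sum(dim=-1)   # similarity of image to non-matching text
    return F.relu(margin + neg - pos).mean()
```

The similarity-network variant instead fuses the two branches with an element-wise product and trains with a regression loss to predict the similarity score directly.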

122 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is found that translations produced by neural machine translation systems are considerably different, more fluent and more accurate in terms of word order compared to those produced by phrase-based systems.
Abstract: We aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation paradigm. To that end, we conduct a multifaceted evaluation in which we compare outputs produced by state-of-the-art neural machine translation and phrase-based machine translation systems for 9 language directions across a number of dimensions. Specifically, we measure the similarity of the outputs, their fluency and amount of reordering, the effect of sentence length, and performance across different error categories. We find that translations produced by neural machine translation systems are considerably different, more fluent, and more accurate in terms of word order compared to those produced by phrase-based systems. Neural machine translation systems are also more accurate at producing inflected forms, but they perform poorly when translating very long sentences.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The NMT system's internal embedding of the source sentence is exploited, and sentence embedding similarity is used to select sentences that are close to in-domain data, substantially improving NMT performance.
Abstract: Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.
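
The selection step is easy to sketch: embed candidate sentences with the NMT encoder, then keep those most similar to in-domain data. Below, similarity to the in-domain centroid is used as one simple criterion; the embedding source and the exact criterion are stand-ins, not necessarily the paper's.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def select_in_domain(candidate_vecs, in_domain_vecs, top_k):
    """Rank candidate sentences by similarity to the in-domain centroid.

    `candidate_vecs` / `in_domain_vecs` are sentence embeddings taken from
    the NMT encoder (stand-in: any fixed-size vectors). Comparing against
    the centroid is one simple choice among several.
    """
    centroid = in_domain_vecs.mean(axis=0)
    scores = [cosine(v, centroid) for v in candidate_vecs]
    order = np.argsort(scores)[::-1]   # highest similarity first
    return order[:top_k]               # indices of the selected sentences
```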

Journal ArticleDOI
TL;DR: This article proposes to learn tag-specific composition functions and tag embeddings in recursive neural networks, and to utilize POS tags to control the gates of tree-structured LSTM networks.
Abstract: Phrase/Sentence representation is one of the most important problems in natural language processing. Many neural network models such as Convolutional Neural Network (CNN), Recursive Neural Network (RNN), and Long Short-Term Memory (LSTM) have been proposed to learn representations of phrase/sentence, however, rich syntactic knowledge has not been fully explored when composing a longer text from its shorter constituent words. In most traditional models, only word embeddings are utilized to compose phrase/sentence representations, while the syntactic information of words is yet to be explored. In this article, we discover that encoding syntactic knowledge (part-of-speech tag) in neural networks can enhance sentence/phrase representation. Specifically, we propose to learn tag-specific composition functions and tag embeddings in recursive neural networks, and propose to utilize POS tags to control the gates of tree-structured LSTM networks. We evaluate these models on two benchmark datasets for sentiment classification, and demonstrate that improvements can be obtained with such syntactic knowledge encoded.
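
Schematically, a tag-specific recursive composition replaces the single composition matrix of a standard recursive network with one selected by the children's POS tags, optionally concatenating learned tag embeddings; a generic form (notation mine, not the article's) is:

$$ h_p = \tanh\big( W_{(t_l, t_r)} [\, h_l ;\, h_r ;\, e_{t_l} ;\, e_{t_r} \,] + b_{(t_l, t_r)} \big) $$

where h_l and h_r are the child representations, t_l and t_r their POS tags, and e_t a learned tag embedding. In the tree-structured LSTM variant, the tags instead modulate the gate activations.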

Journal ArticleDOI
TL;DR: This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a single pure neural network model under an end-to-end framework, generating character-level sequence representations as input.
Abstract: This paper presents a character-level sequence-to-sequence learning method, RNNembed. This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a pure single neural network model under an end-to-end framework. Specifically, we embed a recurrent neural network into an encoder–decoder framework and generate character-level sequence representations as input. The dimension of the input feature space can be significantly reduced, and the need to handle unknown or rare words in sequences is avoided. In the language model, we improve the basic structure of a gated recurrent unit by adding an output gate, which is used for filtering out unimportant information involved in the attention scheme of the alignment model. Our proposed method was examined on a large-scale dataset for an English-to-Chinese translation task. Experimental results demonstrate that the proposed approach achieves a translation performance comparable, or close, to conventional word-based and phrase-based systems.
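
To make the modification concrete, recall the standard GRU update and add the output gate the paper describes; the notation below is mine, and the paper's exact formulation may differ:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),\\
\tilde{h}_t &= \tanh\big(W x_t + U(r_t \odot h_{t-1})\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,\\
o_t &= \sigma(W_o x_t + U_o h_{t-1}), \qquad \mathrm{output}_t = o_t \odot h_t,
\end{aligned}
$$

with o_t acting as the filter on what the attention scheme of the alignment model sees.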

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper develops visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image, and demonstrates the merits of the models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
Abstract: Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user comments, without using manually labeled images. In particular, we develop visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image. Our visual n-gram models are feed-forward convolutional networks trained using new loss functions that are inspired by n-gram models commonly used in language modeling. We demonstrate the merits of our models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
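
The n-gram flavor of the loss can be conveyed with the standard factorization, here conditioned on the image I (a schematic rendering, not the paper's exact smoothed objective):

$$ p(w_1, \ldots, w_m \mid I) \approx \prod_{i=1}^{m} p\big(w_i \mid w_{i-n+1}, \ldots, w_{i-1}, I\big) $$

where each conditional is produced by the convolutional network; phrases relevant to an image are those scoring high under this model, which is what enables phrase prediction and phrase-based retrieval.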

Proceedings ArticleDOI
Swarnadeep Saha, Harinder Pal
01 Jul 2017
TL;DR: BONIE is designed and released, the first open numerical relation extractor, for extracting Open IE tuples where one of the arguments is a number or a quantity-unit phrase.
Abstract: We design and release BONIE, the first open numerical relation extractor, for extracting Open IE tuples where one of the arguments is a number or a quantity-unit phrase. BONIE uses bootstrapping to learn the specific dependency patterns that express numerical relations in a sentence. BONIE's novelty lies in task-specific customizations, such as inferring implicit relations that are clear from context such as units (e.g., 'square kilometers' suggests area, even if the word 'area' is missing from the sentence). BONIE obtains 1.5x yield and a 15-point precision gain on numerical facts over a state-of-the-art Open IE system.
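
To make the tuple format concrete: a hypothetical extraction in BONIE's style (my example, not one from the paper) would map the sentence 'Finland covers 338,000 square kilometers' to the Open IE tuple (Finland; has area; 338,000 square kilometers), with the relation inferred from the unit 'square kilometers' even though the word 'area' never appears in the sentence.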

Journal ArticleDOI
01 Feb 2017
TL;DR: This study addresses key questions related to the explosion of interest in how to extract insight from unstructured data and how to determine whether such insight provides any hints concerning the trends of financial markets, and proposes a sentiment analysis engine (SAE) that takes advantage of linguistic analyses based on grammars.
Abstract: The growth of financial texts in the wake of big data has challenged most organizations and brought escalating demands for analysis tools. In general, text streams are more challenging to handle than numeric data streams. Text streams are unstructured by nature, but they represent collective expressions that are of value in any financial decision. It can be both daunting and necessary to make sense of unstructured textual data. In this study, we address key questions related to the explosion of interest in how to extract insight from unstructured data and how to determine if such insight provides any hints concerning the trends of financial markets. A sentiment analysis engine (SAE) is proposed which takes advantage of linguistic analyses based on grammars. This engine extends sentiment analysis not only at the word token level, but also at the phrase level within each sentence. An assessment heuristic is applied to extract the collective expressions shown in the texts. Also, three evaluations are presented to assess the performance of the engine. First, several standard parsing evaluation metrics are applied on two treebanks. Second, a benchmark evaluation using a dataset of English movie reviews is conducted. Results show our SAE outperforms the traditional bag-of-words approach. Third, a financial text stream with twelve million words that aligns with a stock market index is examined. The evaluation results and their statistical significance provide strong evidence of a long persistence in the mood time series generated by the engine. In addition, our approach establishes grounds for belief that the sentiments expressed through text streams are helpful for analyzing the trends in a stock market index, although such sentiments and market indices are normally considered to be completely uncorrelated. Highlights: to explain a classifier-based sentiment parser for financial texts; to demonstrate how to assign the polarity of phrases using an assessment heuristic; and to provide statistical tests using twelve million words to attest its significance.
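
As a toy contrast between bag-of-words and phrase-level sentiment — illustrative only; the SAE's actual heuristic operates over full grammatical parses — consider how a negator must flip polarity within its phrase:

```python
# Toy phrase-level polarity composition (illustrative only; the SAE's
# actual assessment heuristic works over grammar-based parses).
LEXICON = {"gain": 1, "strong": 1, "loss": -1, "weak": -1}
NEGATORS = {"not", "no", "never"}

def phrase_polarity(tokens):
    """Sum word polarities, flipping the sign of the word after a negator."""
    score, flip = 0, 1
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1   # apply negation to the next sentiment word only
    return score

print(phrase_polarity("not a strong quarter".split()))  # -> -1
print(phrase_polarity("strong quarter".split()))        # -> 1
```

A bag-of-words scorer would rate both strings positively; composing at the phrase level is what lets an engine get the negated case right.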

Journal ArticleDOI
TL;DR: Compared with RAE and some supervised methods such as support vector machines (SVM) and naive Bayes on English and Chinese datasets, the experimental results show that CHL-PRAE provides the best performance for sentence-level sentiment analysis.

Journal ArticleDOI
01 Nov 2017-Cortex
TL;DR: The findings suggest that syntactic and semantic contributions to phrasal formation can already be differentiated at a very basic level, with each of these two processes recruiting non-overlapping areas of the cerebral cortex.

Journal ArticleDOI
TL;DR: An analysis of the strengths and weaknesses of several Machine Translation engines implementing the three most widely used paradigms finds that the successful translations of neural MT systems sometimes bear resemblance to the translations of a rule-based MT system.
Abstract: In this paper, we report an analysis of the strengths and weaknesses of several Machine Translation (MT) engines implementing the three most widely used paradigms. The analysis is based on a manually built test suite that comprises a large range of linguistic phenomena. Two main observations are, on the one hand, the striking improvement of a commercial online system when turning from a phrase-based to a neural engine, and on the other hand, that the successful translations of neural MT systems sometimes bear resemblance to the translations of a rule-based MT system.

Journal ArticleDOI
Joel L. Fagan
02 Aug 2017
TL;DR: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented.
Abstract: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented. Problems related to this non-syntactic phrase construction method are discussed, and some possible solutions are proposed that make use of information about the syntactic structure of document and query texts.

Posted Content
TL;DR: This work explores six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search, and shows both deficiencies and improvements over the quality of phrase-based statistical machine translation.
Abstract: We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.

Posted Content
TL;DR: The proposed Deep Relational Network is a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships, and achieves substantial improvements over the state of the art on two large datasets.
Abstract: Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. "ride") or each distinct visual phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with significant difficulties caused by the high diversity of visual appearance for each kind of relationships or the large number of distinct visual phrases. We propose an integrated framework to tackle this problem. At the heart of this framework is the Deep Relational Network, a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships. On two large datasets, the proposed method achieves substantial improvement over state-of-the-art.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work considers the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b), and presents several developments that together produce the opposite conclusion.
Abstract: We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.

Proceedings ArticleDOI
Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar Aleksic
01 Dec 2017
TL;DR: A system that uses server-side contextual ASR and trigger phrase non-terminals to improve overall KWS accuracy; it also significantly improves ASR quality, reducing Word Error Rate (WER) by 10% to 50% relative, and allows the user to speak seamlessly, without pausing between the trigger phrase and the voice command.
Abstract: We present a novel keyword spotting (KWS) system that uses contextual automatic speech recognition (ASR). For voice-activated devices, it is common that a KWS system is run on the device in order to quickly detect a trigger phrase (e.g. “Ok Google”). After the trigger phrase is detected, the audio corresponding to the voice command that follows is streamed to the server. The audio is transcribed by the server-side ASR system and semantically processed to generate a response which is sent back to the device. Due to limited resources on the device, the device KWS system might introduce false accepts (FA) and false rejects (FR) that can cause an unsatisfactory user experience. We describe a system that uses server-side contextual ASR and trigger phrase non-terminals to improve overall KWS accuracy. We show that this approach can significantly reduce the FA rate (by 89%) while minimally increasing the FR rate (by 0.2%). Furthermore, we show that this system significantly improves the ASR quality, reducing Word Error Rate (WER) (by 10% to 50% relative), and allows the user to speak seamlessly, without pausing between the trigger phrase and the voice command.

Journal ArticleDOI
TL;DR: This work proposes a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification, and presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
Abstract: The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for the channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
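
The scoring described in the abstract can be written out: a cosine similarity between enrollment and test i-vectors w_e and w_t, followed by symmetric score normalization (s-norm) computed with phrase-dependent impostor cohorts (these are the standard formulas, not equations quoted from the paper):

$$ s(\mathbf{w}_e, \mathbf{w}_t) = \frac{\mathbf{w}_e^{\top} \mathbf{w}_t}{\lVert \mathbf{w}_e \rVert\, \lVert \mathbf{w}_t \rVert}, \qquad s' = \frac{1}{2}\left( \frac{s - \mu_e}{\sigma_e} + \frac{s - \mu_t}{\sigma_t} \right) $$

where (μ_e, σ_e) and (μ_t, σ_t) are the mean and standard deviation of the enrollment and test i-vectors' scores against a cohort of impostor i-vectors for the same phrase.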

Posted Content
TL;DR: This article proposes the Gated Recurrent Averaging Network (GRAN), which is inspired by averaging and LSTMs while outperforming them both in both transfer learning and supervised settings.
Abstract: We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.

Journal ArticleDOI
TL;DR: The visual world paradigm was employed to examine the extent to which gender-marked Spanish determiners facilitate upcoming target nouns in a group of Spanish-English bilingual code-switchers, revealing an asymmetric gender effect in processing.
Abstract: Using code-switching as a tool to illustrate how language experience modulates comprehension, the visual world paradigm was employed to examine the extent to which gender-marked Spanish determiners facilitate upcoming target nouns in a group of Spanish-English bilingual code-switchers. The first experiment tested target Spanish nouns embedded in a carrier phrase ( Experiment 1b ) and included a control Spanish monolingual group ( Experiment 1a ). The second set of experiments included critical trials in which participants heard code-switches from Spanish determiners into English nouns (e.g., la house) either in a fixed carrier phrase ( Experiment 2a ) or in variable and complex sentences ( Experiment 2b ). Across the experiments, bilinguals revealed an asymmetric gender effect in processing, showing facilitation only for feminine target items. These results reflect the asymmetric use of gender in the production of code-switched speech. The extension of the asymmetric effect into Spanish ( Experiment 1b ) underscores the permeability between language modes in bilingual code-switchers.

Journal ArticleDOI
13 Jul 2017
TL;DR: This article evaluates the use of deep learning advances, namely Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs, and proposes the creation of the first Arabic Sentiment Treebank (ArSenTB), which is morphologically and orthographically enriched.
Abstract: Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRL). In this article, we evaluate the use of deep learning advances, namely the Recursive Neural Tensor Networks (RNTN), for sentiment analysis in Arabic as a case study of MRLs. While Arabic may not be considered the only representative of all MRLs, the challenges faced and proposed solutions in Arabic are common to many other MRLs. We identify, illustrate, and address MRL-related challenges and show how RNTN is affected by the morphological richness and orthographic ambiguity of the Arabic language. To address the challenges with sentiment extraction from text in MRL, we propose to explore different orthographic features as well as different morphological features at multiple levels of abstraction ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank; a collection of syntactic parse trees annotated for sentiment at all levels of constituency and that currently only exists in English. Therefore, our contribution also includes the creation of the first Arabic Sentiment Treebank (ArSenTB) that is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory by 7.6%, 3.2%, and 1.6% absolute respectively, all models being trained with similar morphological considerations.

Posted Content
23 Feb 2017
TL;DR: In ViP-CNN, the visual relationship is considered as a phrase with three components and a Visual Phrase Reasoning Structure (VPRS) is presented to set up the connection among the relationship components and help the model consider the three problems jointly.
Abstract: As the intermediate-level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It localizes the objects and captures their interactions with a subject-predicate-object triplet, e.g. 〈person-ride-horse〉. In this paper, the visual relationship is considered as a phrase with three components. So we formulate the visual relationship detection as three inter-connected recognition problems and propose a Visual Phrase reasoning Convolutional Neural Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a Visual Phrase Reasoning Structure (VPRS) to set up the connection among the relationship components and help the model consider the three problems jointly. A corresponding non-maximum suppression method and model training strategy are also proposed. Experimental results show that our ViP-CNN outperforms the state-of-the-art method in both speed and accuracy. We further pretrain our model on our cleansed Visual Genome Relationship dataset, which is found to perform better than pretraining on ImageNet for this task.