
Showing papers on "Rule-based machine translation published in 2019"


Proceedings ArticleDOI
01 Jun 2019
TL;DR: A new universal human parsing agent, named "Graphonomy", is proposed, which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information.
Abstract: Prior highly-tuned human parsing models tend to fit towards each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g. discovering underlying semantic structures among different label granularity, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks. To address these challenges, we propose a new universal human parsing agent, named "Graphonomy", which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity. Experimental results show Graphonomy effectively achieves the state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.
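
To make the graph-reasoning idea concrete, here is a generic sketch (not Graphonomy's implementation) of a single graph-propagation step over label nodes, the kind of operation Intra-Graph Reasoning builds on; the toy label graph, feature sizes, and weights below are invented for illustration.

```python
# Generic sketch of one graph-reasoning step over label nodes:
# each label's feature vector is updated from its neighbours through a
# normalised adjacency matrix, H' = relu(A_hat @ H @ W).
import numpy as np

rng = np.random.default_rng(0)
num_labels, dim = 7, 16                      # e.g. 7 body-part labels (toy)
H = rng.normal(size=(num_labels, dim))       # label node features
A = (rng.random((num_labels, num_labels)) > 0.6).astype(float)  # toy label graph
A = np.maximum(A, A.T)                       # make the graph undirected
np.fill_diagonal(A, 1.0)                     # add self-loops

D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt          # symmetric normalisation
W = rng.normal(size=(dim, dim))              # learnable projection (random here)

H_next = np.maximum(A_hat @ H @ W, 0.0)      # one propagation step (ReLU)
print(H_next.shape)                          # (7, 16)
```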

161 citations


DOI
01 Jan 2019
TL;DR: The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP) as discussed by the authors is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.
Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
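
As a concrete illustration of the evaluation protocol (a minimal sketch, not the BLiMP release code), a language model can be scored on a minimal pair by checking whether it assigns higher total log-probability to the acceptable sentence; the example pair below is invented, and the Hugging Face transformers library with a GPT-2 checkpoint is assumed.

```python
# Judge a minimal pair with GPT-2 by comparing total log-probabilities.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the LM assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood per token
    return -loss.item() * (ids.shape[1] - 1)

good = "The cats annoy Tim."    # acceptable member of an (invented) pair
bad = "The cats annoys Tim."    # minimally different, unacceptable
print("LM prefers the acceptable sentence:",
      sentence_logprob(good) > sentence_logprob(bad))
```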

133 citations


Journal ArticleDOI
TL;DR: Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets, is introduced; it produces visualizations comparable to manually created ones in a fraction of the time, with the potential to learn more complex visualization strategies at scale.
Abstract: Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper we introduce Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based encoder–decoder network with long short-term memory (LSTM) units on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean), and how to use common data selection patterns that occur within data visualizations. We introduce two metrics for evaluating the task of automated visualization generation (language syntax validity, visualization grammar syntax validity) and demonstrate the efficacy of bidirectional models with attention mechanisms for this task. Data2Vis generates visualizations that are comparable to manually created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.
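
As a rough sketch of what the paper's two validity metrics measure (my approximation, not the authors' evaluation code), the first checks that a generated string is syntactically valid in the target language (JSON, since Vega-Lite specs are JSON), and the second checks that it resembles a usable Vega-Lite specification; a proper check would validate against the full Vega-Lite JSON schema.

```python
# Approximate the two validity metrics for a generated Vega-Lite string.
import json

def language_syntax_valid(generated: str) -> bool:
    """Does the generated string parse as JSON at all?"""
    try:
        json.loads(generated)
        return True
    except json.JSONDecodeError:
        return False

def grammar_syntax_valid(generated: str) -> bool:
    """Crude stand-in for Vega-Lite validity: required top-level keys only."""
    if not language_syntax_valid(generated):
        return False
    spec = json.loads(generated)
    return "mark" in spec and "encoding" in spec

candidate = '{"mark": "bar", "encoding": {"x": {"field": "a", "type": "nominal"}}}'
print(language_syntax_valid(candidate), grammar_syntax_valid(candidate))  # True True
```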

121 citations


Proceedings ArticleDOI
07 Apr 2019
TL;DR: To apply amortized variational inference to the unsupervised learning of RNNGs, an inference network parameterized as a neural CRF constituency parser is developed to maximize the evidence lower bound.
Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
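
For reference, the training objective sketched in this abstract is the standard evidence lower bound over latent parse trees; the following is a generic statement of it (notation mine, not copied from the paper):

$$\log p_\theta(x) \;=\; \log \sum_{z \in \mathcal{T}(x)} p_\theta(x, z) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z)\big] + \mathbb{H}\big[q_\phi(z \mid x)\big],$$

where $x$ is the sentence, $z$ ranges over latent constituency trees $\mathcal{T}(x)$, $p_\theta$ is the generative RNNG, and $q_\phi$ is the amortized inference network (here the neural CRF constituency parser); the bound is maximized jointly in $\theta$ and $\phi$.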

115 citations


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter provides a connection between the model based on the use of a linguistic hierarchy and the numerical scale model, and shows that the numerical scale model can provide a unified framework to connect different linguistic symbolic computational models.
Abstract: The 2-tuple linguistic representation model is widely used as a basis for linguistic symbolic computational models in linguistic decision making problems. In this chapter we provide a connection between the model based on the use of a linguistic hierarchy and the numerical scale model, and then show that the numerical scale model can provide a unified framework [13] to connect different linguistic symbolic computational models. Further, a novel computing with words (CWW) methodology [13] where hesitant fuzzy linguistic term sets (HFLTSs) can be constructed based on unbalanced linguistic term sets (ULTSs) using a numerical scale is proposed. In the proposed CWW methodology, several novel possibility degree formulas for comparing HFLTSs are presented, and novel operators based on a mixed 0–1 linear programming model to aggregate hesitant unbalanced linguistic information are defined.
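
To ground the terminology, here is a minimal sketch (illustrative only, not the chapter's model) of the classical 2-tuple linguistic representation that the numerical scale framework generalizes: a numerical value is mapped to a pair of a linguistic term and a symbolic translation, so linguistic assessments can be aggregated without loss of information. The term set and assessments below are invented.

```python
# Classical 2-tuple linguistic representation: Delta and its inverse.
from typing import Tuple

TERMS = ["none", "very_low", "low", "medium", "high", "very_high", "perfect"]

def to_two_tuple(beta: float) -> Tuple[str, float]:
    """Delta: map beta in [0, len(TERMS)-1] to (term, alpha) with alpha in [-0.5, 0.5)."""
    i = int(round(beta))
    return TERMS[i], round(beta - i, 3)

def from_two_tuple(term: str, alpha: float) -> float:
    """Delta^{-1}: recover the numerical value from a 2-tuple."""
    return TERMS.index(term) + alpha

# Aggregate assessments by averaging their numerical values, then map back.
assessments = [("high", 0.2), ("medium", -0.1), ("very_high", 0.0)]
mean_beta = sum(from_two_tuple(t, a) for t, a in assessments) / len(assessments)
print(to_two_tuple(mean_beta))   # ('high', 0.033)
```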

97 citations


Posted Content
Jingwen Wang, Lin Ma, Wenhao Jiang
TL;DR: This work proposes an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information, and outperforms its competitors by a clear margin on three public datasets.
Abstract: The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence). It requires models to simultaneously perform visual and linguistic understanding. Previous work predominantly ignores the precision of segment localization. Sliding window based methods use predefined search window sizes, which suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets. All code is available at this https URL.

80 citations


Proceedings ArticleDOI
22 Feb 2019
TL;DR: This paper used Singular Vector Canonical Correlation Analysis (SVCCA) to compare learned representations across time and across models, without the need to evaluate directly on annotated data.
Abstract: Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained. We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models, without the need to evaluate directly on annotated data. We probe the evolution of syntactic, semantic, and topic representations, finding, for example, that part-of-speech is learned earlier than topic; that recurrent layers become more similar to those of a tagger during training; and embedding layers less similar. Our results and methods could inform better learning algorithms for NLP models, possibly to incorporate linguistic information more effectively.
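
A minimal sketch of the SVCCA similarity used for such comparisons (my own simplification of the original SVCCA recipe, assuming numpy and scikit-learn, not the paper's code): each set of activations is reduced with SVD, and the reduced views are then compared with canonical correlation analysis.

```python
# SVCCA-style similarity between two sets of representations.
import numpy as np
from sklearn.cross_decomposition import CCA

def svd_reduce(X, var_kept=0.99):
    """Keep the top singular directions explaining `var_kept` of the variance."""
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    keep = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_kept)) + 1
    return U[:, :keep] * S[:keep]

def svcca_similarity(X, Y, n_components=5):
    """Mean correlation of the canonical components of the SVD-reduced views."""
    Xr, Yr = svd_reduce(X), svd_reduce(Y)
    k = min(n_components, Xr.shape[1], Yr.shape[1])
    Xc, Yc = CCA(n_components=k).fit_transform(Xr, Yr)
    return float(np.mean([np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(k)]))

# Example: activations of the same 500 tokens from two (random) model checkpoints.
rng = np.random.default_rng(0)
acts_early, acts_late = rng.normal(size=(500, 30)), rng.normal(size=(500, 30))
print(svcca_similarity(acts_early, acts_late))   # a baseline value for unrelated activations
```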

70 citations


Journal ArticleDOI
TL;DR: To characterize the unbalanced distribution of the semantics of second-hierarchy linguistic terms, three linguistic scale functions with cognitive bias parameters are proposed, and a non-linear fitting method is presented to determine these parameters.

65 citations


Posted Content
TL;DR: This paper proposes unsupervised learning of recurrent neural network grammars (RNNGs) using amortized variational inference to maximize the evidence lower bound.
Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.

55 citations


Posted Content
TL;DR: The authors construct a schema-dependent grammar with minimal over-generation, apply it to the text-to-SQL datasets ATIS and Spider, and demonstrate that it yields 14-18% relative reductions in error.
Abstract: The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar. Grammar-based decoding has shown significant improvements for other semantic parsing tasks, but SQL and other general programming languages have complexities not present in logical formalisms that make writing hierarchical grammars difficult. We introduce techniques to handle these complexities, showing how to construct a schema-dependent grammar with minimal over-generation. We analyze these techniques on ATIS and Spider, two challenging text-to-SQL datasets, demonstrating that they yield 14-18% relative reductions in error.
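
As a toy illustration of what a schema-dependent grammar looks like (a sketch under invented assumptions, far simpler and looser than the paper's grammar), the productions below are generated from a hypothetical two-table schema so that a grammar-constrained decoder can only emit table and column names that actually exist in the database:

```python
# Build toy schema-dependent productions from a hypothetical database schema.
SCHEMA = {
    "flights": ["origin", "destination", "departure_time"],
    "airports": ["code", "city"],
}

def schema_grammar(schema):
    return {
        "query": [["SELECT", "column", "FROM", "table", "where_clause"]],
        "where_clause": [["WHERE", "column", "=", "value"], []],
        "table": [[t] for t in schema],
        # A tighter grammar would use one column nonterminal per table to avoid
        # over-generating columns from the wrong table; one nonterminal is used
        # here for brevity.
        "column": [[c] for cols in schema.values() for c in cols],
        "value": [["'?'"]],
    }

for lhs, rhss in schema_grammar(SCHEMA).items():
    for rhs in rhss:
        print(f"{lhs} -> {' '.join(rhs) if rhs else 'ε'}")
```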

47 citations


Posted Content
TL;DR: LIMIT-BERT outperforms the strong Whole Word Masking BERT baseline on dependency and constituent syntactic/semantic parsing, the GLUE benchmark, and the SNLI task, and releases a single well-pre-trained model that serves multiple natural language processing tasks.
Abstract: In this paper, we present a Linguistic Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks by Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tags, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). In addition, LIMIT-BERT adopts a linguistic masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Different from recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), our LIMIT-BERT is linguistically motivated and is trained in a semi-supervised manner, which provides amounts of linguistic-task data comparable to the BERT training corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and from linguistic information that leads to more general representations, helping the model adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on Propbank benchmarks and both dependency and constituent syntactic parsing on Penn Treebank.
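
A minimal sketch (not the authors' code) of the phrase-masking idea described above: given phrase spans from a syntactic or semantic analysis, every token inside a selected span is masked, rather than masking isolated tokens. The sentence, spans, and masking budget below are invented.

```python
# Mask whole syntactic/semantic phrase spans instead of isolated tokens.
import random

def phrase_mask(tokens, spans, mask_token="[MASK]", mask_ratio=0.15):
    """`spans` are (start, end) indices of phrases, end exclusive."""
    tokens = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    masked = 0
    for start, end in random.sample(spans, k=len(spans)):  # shuffle the spans
        if masked >= budget:
            break
        for i in range(start, end):
            tokens[i] = mask_token
        masked += end - start
    return tokens

sentence = "the quick brown fox jumps over the lazy dog".split()
phrases = [(0, 4), (5, 9)]   # e.g. NP spans from a constituency parse
print(phrase_mask(sentence, phrases))
```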

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work frames TN as a machine translation task and tackles it with sequence-to-sequence (seq2seq) models, finding that subword models with additional linguistic features yield the best performance.
Abstract: Text normalization (TN) is an important step in conversational systems. It converts written text to its spoken form to facilitate speech recognition, natural language understanding and text-to-speech synthesis. Finite state transducers (FSTs) are commonly used to build grammars that handle text normalization. However, translating linguistic knowledge into grammars requires extensive effort. In this paper, we frame TN as a machine translation task and tackle it with sequence-to-sequence (seq2seq) models. Previous research focuses on normalizing a word (or phrase) with the help of limited word-level context, while our approach directly normalizes full sentences. We find subword models with additional linguistic features yield the best performance (with a word error rate of 0.17%).

Proceedings ArticleDOI
01 Jul 2019
TL;DR: A method of analysing the content of sentence embeddings based on universal probing tasks is introduced, along with classification datasets for two contrasting languages, to answer the question of whether linguistic information is retained in vector representations of sentences.
Abstract: The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. We introduce a method of analysing the content of sentence embeddings based on universal probing tasks, along with the classification datasets for two contrasting languages. We perform a series of probing and downstream experiments with different types of sentence embeddings, followed by a thorough analysis of the experimental results. Aside from dependency parser-based embeddings, linguistic information is retained best in the recently proposed LASER sentence embeddings.
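
For concreteness, a probing experiment of the general kind described here can be sketched as a linear classifier trained on frozen sentence embeddings to predict a linguistic label; the random placeholder data below stands in for real embeddings and labels, and is not the paper's datasets or encoders.

```python
# Linear probe on frozen sentence embeddings (placeholder random data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 512))   # stand-in for real sentence embeddings
labels = rng.integers(0, 3, size=2000)      # stand-in for a linguistic property

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                          test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probing accuracy:", probe.score(X_te, y_te))   # near chance on random data
```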

Journal ArticleDOI
TL;DR: This work proposes a rule-based machine translation system to translate Arabic text into ArSL and develops a parallel corpus in the health domain, consisting of 600 sentences, which will be freely available to researchers.
Abstract: Arabic sign language (ArSL) is a full natural language that is used by the deaf in Arab countries to communicate in their community. Unfamiliarity with this language increases the isolation of deaf people from society. This language has a different structure, word order, and lexicon than Arabic. The translation between ArSL and Arabic is a complete machine translation challenge, because the two languages have different structures and grammars. In this work, we propose a rule-based machine translation system to translate Arabic text into ArSL. The proposed system performs a morphological, syntactic, and semantic analysis on an Arabic sentence to translate it into a sentence with the grammar and structure of ArSL. To transcribe ArSL, we propose a gloss system that can be used to represent ArSL. In addition, we develop a parallel corpus in the health domain, which consists of 600 sentences, and will be freely available for researchers. We evaluate our translation system on this corpus and find that our translation system provides an accurate translation for more than 80% of the translated sentences.
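
As a schematic illustration of the rule-based transfer step (the rules, glosses, and word order below are invented for illustration and are not the paper's ArSL grammar): after morphological and syntactic analysis, content words are mapped to glosses, function words are dropped, and constituents are reordered to follow the target grammar.

```python
# Toy rule-based transfer: lexical substitution, deletion, and reordering.
LEXICON = {"doctor": "DOCTOR", "visits": "VISIT", "the": None, "patient": "PATIENT"}

def translate(tokens):
    # `tokens` are (word, role) pairs assumed to come from prior analysis.
    glosses = [(LEXICON.get(word.lower()), role) for word, role in tokens]
    glosses = [(g, role) for g, role in glosses if g is not None]  # drop function words
    order = {"SUBJ": 0, "OBJ": 1, "VERB": 2}       # invented target word order
    glosses.sort(key=lambda pair: order.get(pair[1], 3))
    return " ".join(g for g, _ in glosses)

print(translate([("The", "DET"), ("doctor", "SUBJ"), ("visits", "VERB"),
                 ("the", "DET"), ("patient", "OBJ")]))   # DOCTOR PATIENT VISIT
```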

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work proposes the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information and achieves state-of-the-art results.
Abstract: Generating high-quality paraphrases is a fundamental yet challenging natural language processing task. Despite the effectiveness of previous work based on generative models, there remain problems with exposure bias in recurrent neural networks, and often a failure to generate realistic sentences. To overcome these challenges, we propose the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information. Extensive experiments on four public datasets demonstrate the proposed method achieves state-of-the-art results, outperforming previous generative architectures on both automatic metrics (BLEU, METEOR, and TER) and human evaluations.

Journal ArticleDOI
TL;DR: A model of random languages, defined by weighted context-free grammars, is considered; as the distribution of grammar weights broadens, a transition is found from a random phase, in which sentences are indistinguishable from noise, to an organized phase in which nontrivial information is carried.
Abstract: Many complex generative systems use languages to create structured objects. We consider a model of random languages, defined by weighted context-free grammars. As the distribution of grammar weights broadens, a transition is found from a random phase, in which sentences are indistinguishable from noise, to an organized phase in which nontrivial information is carried. This marks the emergence of deep structure in the language, and can be understood by a competition between energy and entropy.
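
To make the setup concrete, a weighted context-free grammar assigns a weight to each production, and sentences can be generated by expanding nonterminals with probability proportional to those weights; below is a small illustrative sampler over an invented toy grammar, not the paper's ensemble of random grammars.

```python
# Sample sentences from a toy weighted context-free grammar.
import random

GRAMMAR = {  # nonterminal -> list of (weight, right-hand side)
    "S": [(3.0, ["NP", "VP"])],
    "NP": [(2.0, ["the", "N"]), (1.0, ["NP", "PP"])],
    "VP": [(2.0, ["V", "NP"]), (1.0, ["V"])],
    "PP": [(1.0, ["near", "NP"])],
    "N": [(1.0, ["grammar"]), (1.0, ["sentence"])],
    "V": [(1.0, ["generates"]), (1.0, ["parses"])],
}

def sample(symbol="S", max_depth=20):
    """Expand `symbol` recursively; symbols with no rules are terminals.
    If the depth limit is hit, the unexpanded symbol is emitted as-is."""
    if symbol not in GRAMMAR or max_depth == 0:
        return [symbol]
    weights, rhss = zip(*GRAMMAR[symbol])
    rhs = random.choices(rhss, weights=weights, k=1)[0]
    return [tok for sym in rhs for tok in sample(sym, max_depth - 1)]

print(" ".join(sample()))   # e.g. "the grammar generates the sentence"
```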

Journal ArticleDOI
TL;DR: An efficient hardware relay is presented that is capable of detecting faults in power system waveforms within microseconds (μs) after reading each peak of the examined signal, aiming to reduce or even entirely prevent safety problems and economic losses.
Abstract: In this paper, an efficient hardware relay is presented that is implemented based on syntactic pattern recognition techniques. The proposed system is capable of detecting faults in power system waveforms within microseconds (μs) after reading each peak of the examined signal, aiming to reduce or even entirely prevent safety problems and economic losses. In order for syntactic pattern recognition methods to be utilized as a recognition tool for waveforms on transmission lines, the tasks of selecting appropriate primitive patterns, determining the linguistic representation, and forming a suitable grammar must be carried out. In this study, attribute grammars have been selected to model the examined signals due to their power to describe syntactic and semantic knowledge. The hardware implementation of the suggested relay, which is based on Earley's parsing algorithm, is developed using the Verilog hardware description language, downloaded onto a Virtex 7 XILINX FPGA board and evaluated through real waveforms and data received from IPTO. The obtained results have shown that the presented system could be an efficient alternative tool in the field of transmission line fault detection.

Journal ArticleDOI
TL;DR: It is demonstrated that P3 and P600 share neural patterns to a substantial degree, calling into question the interpretation of P600 as a language-specific brain response and instead strengthening its association with the P3.

Journal ArticleDOI
14 Aug 2019
TL;DR: This paper uses Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features and uses global language mapping based on web-crawled and social media datasets to determine the selection of national varieties.
Abstract: The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale.

Proceedings ArticleDOI
01 May 2019
TL;DR: This work describes a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.
Abstract: We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.

Journal ArticleDOI
TL;DR: The results suggest that although different types of edits were needed to outputs from NMT, RBMT and SMT systems, the difference is not necessarily reflected in process-based effort indicators.
Abstract: This paper presents a comparison of post-editing (PE) changes performed on English-to-Finnish neural (NMT), rule-based (RBMT) and statistical machine translation (SMT) output, combining a product-based and a process-based approach. A total of 33 translation students acted as participants in a PE experiment providing both post-edited texts and edit process data. Our product-based analysis of the post-edited texts shows statistically significant differences in the distribution of edit types between machine translation systems. Deletions were the most common edit type for the RBMT, insertions for the SMT, and word form changes as well as word substitutions for the NMT system. The results also show significant differences in the correctness and necessity of the edits, particularly in the form of a large number of unnecessary edits in the RBMT output. Problems related to certain verb forms and ambiguity were observed for NMT and SMT, while RBMT was more likely to handle them correctly. Process-based comparison of effort indicators shows a slight increase of keystrokes per word for NMT output, and a slight decrease in average pause length for NMT compared to RBMT and SMT in specific text blocks. A statistically significant difference was observed in the number of visits per sub-segment, which is lower for NMT than for RBMT and SMT. The results suggest that although different types of edits were needed to outputs from NMT, RBMT and SMT systems, the difference is not necessarily reflected in process-based effort indicators.

Posted Content
TL;DR: It is shown that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information.
Abstract: Despite the recent success of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. We analyze the representations learned by neural machine translation models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word-structure captured within the learned representations, an important aspect in translating morphologically-rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models learn a non-trivial amount of linguistic information. Notable findings include: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers; (iii) Representations learned using characters are more informed about word morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.

Journal ArticleDOI
17 Jul 2019
TL;DR: To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, a learning task for inducing the annotations of an ASG is proposed and an algorithm for solving it is presented.
Abstract: In this paper we introduce an extension of context-free grammars called answer set grammars (ASGs). These grammars allow annotations on production rules, written in the language of Answer Set Programming (ASP), which can express context-sensitive constraints. We investigate the complexity of various classes of ASG with respect to two decision problems: deciding whether a given string belongs to the language of an ASG and deciding whether the language of an ASG is non-empty. Specifically, we show that the complexity of these decision problems can be lowered by restricting the subset of the ASP language used in the annotations. To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, we propose a learning task for inducing the annotations of an ASG. We characterise the complexity of this task and present an algorithm for solving it. An evaluation of a (prototype) implementation is also discussed.

Proceedings ArticleDOI
12 Aug 2019
TL;DR: REINAM is able to synthesize a grammar covering the entire valid input space for some benchmarks without decreasing the accuracy of the grammar, and fuzz testing based on REINAM substantially increases the coverage of the space of valid inputs.
Abstract: Program input grammars (i.e., grammars encoding the language of valid program inputs) facilitate a wide range of applications in software engineering such as symbolic execution and delta debugging. Grammars synthesized by existing approaches can cover only a small part of the valid input space mainly due to unanalyzable code (e.g., native code) in programs and lacking high-quality and high-variety seed inputs. To address these challenges, we present REINAM, a reinforcement-learning approach for synthesizing probabilistic context-free program input grammars without any seed inputs. REINAM uses an industrial symbolic execution engine to generate an initial set of inputs for the given target program, and then uses an iterative process of grammar generalization to proactively generate additional inputs to infer grammars generalized from these initial seed inputs. To efficiently search for target generalizations in a huge search space of candidate generalization operators, REINAM includes a novel formulation of the search problem as a reinforcement learning problem. Our evaluation on eleven real-world benchmarks shows that REINAM outperforms an existing state-of-the-art approach on precision and recall of synthesized grammars, and fuzz testing based on REINAM substantially increases the coverage of the space of valid inputs. REINAM is able to synthesize a grammar covering the entire valid input space for some benchmarks without decreasing the accuracy of the grammar.

Proceedings ArticleDOI
04 Nov 2019
TL;DR: The authors propose recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction, avoiding the ill-formed outputs that can result from predicting linearized graphs in semantic parsing.
Abstract: Semantic parses are directed acyclic graphs (DAGs), so semantic parsing should be modeled as graph prediction. But predicting graphs presents difficult technical challenges, so it is simpler and more common to predict the *linearized* graphs found in semantic parsing datasets using well-understood sequence models. The cost of this simplicity is that the predicted strings may not be well-formed graphs. We present recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction. We test our model on the Parallel Meaning Bank—a multilingual semantic graphbank. Our approach yields competitive results in English and establishes the first results for German, Italian and Dutch.

Posted Content
02 Dec 2019
TL;DR: The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP) as mentioned in this paper is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.
Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: It is discovered that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.
Abstract: This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

Proceedings ArticleDOI
01 May 2019
TL;DR: It is shown that (i) linguistic features can be beneficial for neural semantic parsing and (ii) the best method of adding these features is by using multiple encoders.
Abstract: Recently, sequence-to-sequence models have achieved impressive performance on a number of semantic parsing tasks. However, they often do not exploit available linguistic resources, while these, when employed correctly, are likely to increase performance even further. Research in neural machine translation has shown that employing this information has a lot of potential, especially when using a multi-encoder setup. We employ a range of semantic and syntactic resources to improve performance for the task of Discourse Representation Structure Parsing. We show that (i) linguistic features can be beneficial for neural semantic parsing and (ii) the best method of adding these features is by using multiple encoders.

Journal ArticleDOI
TL;DR: Results suggest that most linguistic predictions are graded in nature, activating components of the existing language system, including the anterior temporal lobe and the inferior posterior temporal cortex.

Posted Content
TL;DR: Recurrent neural network DAG grammars is presented, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction.
Abstract: Semantic parses are directed acyclic graphs (DAGs), so semantic parsing should be modeled as graph prediction. But predicting graphs presents difficult technical challenges, so it is simpler and more common to predict the linearized graphs found in semantic parsing datasets using well-understood sequence models. The cost of this simplicity is that the predicted strings may not be well-formed graphs. We present recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction. We test our model on the Parallel Meaning Bank, a multilingual semantic graphbank. Our approach yields competitive results in English and establishes the first results for German, Italian and Dutch.