
Showing papers on "Rule-based machine translation published in 2019"


Proceedings ArticleDOI
01 Jun 2019
TL;DR: A new universal human parsing agent, named "Graphonomy", is proposed, which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information.
Abstract: Prior highly-tuned human parsing models tend to fit towards each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g. discovering underlying semantic structures among different label granularity, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks. To address these challenges, we propose a new universal human parsing agent, named "Graphonomy", which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity. Experimental results show Graphonomy effectively achieves the state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.
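
To make the graph-reasoning idea concrete, here is a generic sketch (not Graphonomy's implementation) of a single graph-propagation step over label nodes, the kind of operation Intra-Graph Reasoning builds on; the toy label graph, feature sizes, and weights below are invented for illustration.

```python
# Generic sketch of one graph-reasoning step over label nodes:
# each label's feature vector is updated from its neighbours through a
# normalised adjacency matrix, H' = relu(A_hat @ H @ W).
import numpy as np

rng = np.random.default_rng(0)
num_labels, dim = 7, 16                      # e.g. 7 body-part labels (toy)
H = rng.normal(size=(num_labels, dim))       # label node features
A = (rng.random((num_labels, num_labels)) > 0.6).astype(float)  # toy label graph
A = np.maximum(A, A.T)                       # make the graph undirected
np.fill_diagonal(A, 1.0)                     # add self-loops

D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt          # symmetric normalisation
W = rng.normal(size=(dim, dim))              # learnable projection (random here)

H_next = np.maximum(A_hat @ H @ W, 0.0)      # one propagation step (ReLU)
print(H_next.shape)                          # (7, 16)
```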

161 citations


DOI
01 Jan 2019
TL;DR: The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP) as discussed by the authors is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.
Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
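
As a concrete illustration of the evaluation protocol (a minimal sketch, not the BLiMP release code), a language model can be scored on a minimal pair by checking whether it assigns higher total log-probability to the acceptable sentence; the example pair below is invented, and the Hugging Face transformers library with a GPT-2 checkpoint is assumed.

```python
# Judge a minimal pair with GPT-2 by comparing total log-probabilities.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the LM assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood per token
    return -loss.item() * (ids.shape[1] - 1)

good = "The cats annoy Tim."    # acceptable member of an (invented) pair
bad = "The cats annoys Tim."    # minimally different, unacceptable
print("LM prefers the acceptable sentence:",
      sentence_logprob(good) > sentence_logprob(bad))
```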

133 citations


Journal ArticleDOI
TL;DR: Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets, is introduced; it produces visualizations comparable to manually created ones in a fraction of the time, with the potential to learn more complex visualization strategies at scale.
Abstract: Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper we introduce Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based encoder–decoder network with long short-term memory (LSTM) units on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean), and how to use common data selection patterns that occur within data visualizations. We introduce two metrics for evaluating the task of automated visualization generation (language syntax validity, visualization grammar syntax validity) and demonstrate the efficacy of bidirectional models with attention mechanisms for this task. Data2Vis generates visualizations that are comparable to manually created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.
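
As a rough sketch of what the paper's two validity metrics measure (my approximation, not the authors' evaluation code), the first checks that a generated string is syntactically valid in the target language (JSON, since Vega-Lite specs are JSON), and the second checks that it resembles a usable Vega-Lite specification; a proper check would validate against the full Vega-Lite JSON schema.

```python
# Approximate the two validity metrics for a generated Vega-Lite string.
import json

def language_syntax_valid(generated: str) -> bool:
    """Does the generated string parse as JSON at all?"""
    try:
        json.loads(generated)
        return True
    except json.JSONDecodeError:
        return False

def grammar_syntax_valid(generated: str) -> bool:
    """Crude stand-in for Vega-Lite validity: required top-level keys only."""
    if not language_syntax_valid(generated):
        return False
    spec = json.loads(generated)
    return "mark" in spec and "encoding" in spec

candidate = '{"mark": "bar", "encoding": {"x": {"field": "a", "type": "nominal"}}}'
print(language_syntax_valid(candidate), grammar_syntax_valid(candidate))  # True True
```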

121 citations


Proceedings ArticleDOI
07 Apr 2019
TL;DR: To apply amortized variational inference to the unsupervised learning of RNNGs, an inference network parameterized as a neural CRF constituency parser is developed to maximize the evidence lower bound.
Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
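
For reference, the training objective sketched in this abstract is the standard evidence lower bound over latent parse trees; the following is a generic statement of it (notation mine, not copied from the paper):

$$\log p_\theta(x) \;=\; \log \sum_{z \in \mathcal{T}(x)} p_\theta(x, z) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z)\big] + \mathbb{H}\big[q_\phi(z \mid x)\big],$$

where $x$ is the sentence, $z$ ranges over latent constituency trees $\mathcal{T}(x)$, $p_\theta$ is the generative RNNG, and $q_\phi$ is the amortized inference network (here the neural CRF constituency parser); the bound is maximized jointly in $\theta$ and $\phi$.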

115 citations


Book ChapterDOI
01 Jan 2019
TL;DR: This chapter provides a connection between the model based on the use of a linguistic hierarchy and the numerical scale model, and shows that the numerical scale model can provide a unified framework to connect different linguistic symbolic computational models.
Abstract: The 2-tuple linguistic representation model is widely used as a basis for linguistic symbolic computational models in linguistic decision making problems. In this chapter we provide a connection between the model based on the use of a linguistic hierarchy and the numerical scale model, and then show that the numerical scale model can provide a unified framework [13] to connect different linguistic symbolic computational models. Further, a novel computing with words (CWW) methodology [13] where hesitant fuzzy linguistic term sets (HFLTSs) can be constructed based on unbalanced linguistic term sets (ULTSs) using a numerical scale is proposed. In the proposed CWW methodology, several novel possibility degree formulas for comparing HFLTSs are presented, and novel operators based on a mixed 0–1 linear programming model to aggregate hesitant unbalanced linguistic information are defined.
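
To ground the terminology, here is a minimal sketch (illustrative only, not the chapter's model) of the classical 2-tuple linguistic representation that the numerical scale framework generalizes: a numerical value is mapped to a pair of a linguistic term and a symbolic translation, so linguistic assessments can be aggregated without loss of information. The term set and assessments below are invented.

```python
# Classical 2-tuple linguistic representation: Delta and its inverse.
from typing import Tuple

TERMS = ["none", "very_low", "low", "medium", "high", "very_high", "perfect"]

def to_two_tuple(beta: float) -> Tuple[str, float]:
    """Delta: map beta in [0, len(TERMS)-1] to (term, alpha) with alpha in [-0.5, 0.5)."""
    i = int(round(beta))
    return TERMS[i], round(beta - i, 3)

def from_two_tuple(term: str, alpha: float) -> float:
    """Delta^{-1}: recover the numerical value from a 2-tuple."""
    return TERMS.index(term) + alpha

# Aggregate assessments by averaging their numerical values, then map back.
assessments = [("high", 0.2), ("medium", -0.1), ("very_high", 0.0)]
mean_beta = sum(from_two_tuple(t, a) for t, a in assessments) / len(assessments)
print(to_two_tuple(mean_beta))   # ('high', 0.033)
```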

97 citations


Posted Content
Jingwen Wang, Lin Ma, Wenhao Jiang
TL;DR: This work proposes an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information, and outperforms its competitors by a clear margin on three public datasets.
Abstract: The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence). It requires models to simultaneously perform visual and linguistic understanding. Previous work predominantly ignores the precision of segment localization. Sliding window based methods use predefined search window sizes, which suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets. All code is available at this https URL.

80 citations


Proceedings ArticleDOI
22 Feb 2019
TL;DR: This paper used Singular Vector Canonical Correlation Analysis (SVCCA) to compare learned representations across time and across models, without the need to evaluate directly on annotated data.
Abstract: Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained. We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models, without the need to evaluate directly on annotated data. We probe the evolution of syntactic, semantic, and topic representations, finding, for example, that part-of-speech is learned earlier than topic; that recurrent layers become more similar to those of a tagger during training; and embedding layers less similar. Our results and methods could inform better learning algorithms for NLP models, possibly to incorporate linguistic information more effectively.
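
A minimal sketch of the SVCCA similarity used for such comparisons (my own simplification of the original SVCCA recipe, assuming numpy and scikit-learn, not the paper's code): each set of activations is reduced with SVD, and the reduced views are then compared with canonical correlation analysis.

```python
# SVCCA-style similarity between two sets of representations.
import numpy as np
from sklearn.cross_decomposition import CCA

def svd_reduce(X, var_kept=0.99):
    """Keep the top singular directions explaining `var_kept` of the variance."""
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    keep = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_kept)) + 1
    return U[:, :keep] * S[:keep]

def svcca_similarity(X, Y, n_components=5):
    """Mean correlation of the canonical components of the SVD-reduced views."""
    Xr, Yr = svd_reduce(X), svd_reduce(Y)
    k = min(n_components, Xr.shape[1], Yr.shape[1])
    Xc, Yc = CCA(n_components=k).fit_transform(Xr, Yr)
    return float(np.mean([np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(k)]))

# Example: activations of the same 500 tokens from two (random) model checkpoints.
rng = np.random.default_rng(0)
acts_early, acts_late = rng.normal(size=(500, 30)), rng.normal(size=(500, 30))
print(svcca_similarity(acts_early, acts_late))   # a baseline value for unrelated activations
```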

70 citations


Journal ArticleDOI
TL;DR: To characterize the unbalanced distribution of the semantics of second-hierarchy linguistic terms, three linguistic scale functions with cognitive bias parameters are proposed, and a non-linear fitting method is presented to determine these parameters.

65 citations


Posted Content
TL;DR: This paper proposes unsupervised learning of recurrent neural network grammars (RNNGs) using amortized variational inference to maximize the evidence lower bound.
Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.

55 citations


Posted Content
TL;DR: The authors construct a schema-dependent grammar with minimal over-generation, apply it to the text-to-SQL datasets ATIS and Spider, and demonstrate that it yields 14-18% relative reductions in error.
Abstract: The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar. Grammar-based decoding has shown significant improvements for other semantic parsing tasks, but SQL and other general programming languages have complexities not present in logical formalisms that make writing hierarchical grammars difficult. We introduce techniques to handle these complexities, showing how to construct a schema-dependent grammar with minimal over-generation. We analyze these techniques on ATIS and Spider, two challenging text-to-SQL datasets, demonstrating that they yield 14-18% relative reductions in error.
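
As a toy illustration of what a schema-dependent grammar looks like (a sketch under invented assumptions, far simpler and looser than the paper's grammar), the productions below are generated from a hypothetical two-table schema so that a grammar-constrained decoder can only emit table and column names that actually exist in the database:

```python
# Build toy schema-dependent productions from a hypothetical database schema.
SCHEMA = {
    "flights": ["origin", "destination", "departure_time"],
    "airports": ["code", "city"],
}

def schema_grammar(schema):
    return {
        "query": [["SELECT", "column", "FROM", "table", "where_clause"]],
        "where_clause": [["WHERE", "column", "=", "value"], []],
        "table": [[t] for t in schema],
        # A tighter grammar would use one column nonterminal per table to avoid
        # over-generating columns from the wrong table; one nonterminal is used
        # here for brevity.
        "column": [[c] for cols in schema.values() for c in cols],
        "value": [["'?'"]],
    }

for lhs, rhss in schema_grammar(SCHEMA).items():
    for rhs in rhss:
        print(f"{lhs} -> {' '.join(rhs) if rhs else 'ε'}")
```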

47 citations


Posted Content
TL;DR: LIMIT-BERT outperforms the strong Whole Word Masking BERT baseline on dependency and constituent syntactic/semantic parsing, the GLUE benchmark, and the SNLI task, and releases a single well-pre-trained model that serves multiple natural language processing tasks.
Abstract: In this paper, we present a Linguistic Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks by Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tags, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). In addition, LIMIT-BERT adopts a linguistic masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Different from recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), our LIMIT-BERT is linguistically motivated and is trained in a semi-supervised manner, which provides amounts of linguistic-task data comparable to the BERT training corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and from linguistic information that leads to more general representations, helping the model adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on Propbank benchmarks and both dependency and constituent syntactic parsing on Penn Treebank.
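
A minimal sketch (not the authors' code) of the phrase-masking idea described above: given phrase spans from a syntactic or semantic analysis, every token inside a selected span is masked, rather than masking isolated tokens. The sentence, spans, and masking budget below are invented.

```python
# Mask whole syntactic/semantic phrase spans instead of isolated tokens.
import random

def phrase_mask(tokens, spans, mask_token="[MASK]", mask_ratio=0.15):
    """`spans` are (start, end) indices of phrases, end exclusive."""
    tokens = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    masked = 0
    for start, end in random.sample(spans, k=len(spans)):  # shuffle the spans
        if masked >= budget:
            break
        for i in range(start, end):
            tokens[i] = mask_token
        masked += end - start
    return tokens

sentence = "the quick brown fox jumps over the lazy dog".split()
phrases = [(0, 4), (5, 9)]   # e.g. NP spans from a constituency parse
print(phrase_mask(sentence, phrases))
```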

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work frames TN as a machine translation task and tackles it with sequence-to-sequence (seq2seq) models, finding that subword models with additional linguistic features yield the best performance.
Abstract: Text normalization (TN) is an important step in conversational systems. It converts written text to its spoken form to facilitate speech recognition, natural language understanding and text-to-speech synthesis. Finite state transducers (FSTs) are commonly used to build grammars that handle text normalization. However, translating linguistic knowledge into grammars requires extensive effort. In this paper, we frame TN as a machine translation task and tackle it with sequence-to-sequence (seq2seq) models. Previous research focuses on normalizing a word (or phrase) with the help of limited word-level context, while our approach directly normalizes full sentences. We find subword models with additional linguistic features yield the best performance (with a word error rate of 0.17%).

Proceedings ArticleDOI
01 Jul 2019
TL;DR: A method of analysing the content of sentence embeddings based on universal probing tasks is introduced, along with classification datasets for two contrasting languages, to answer the question of whether linguistic information is retained in vector representations of sentences.
Abstract: The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. We introduce a method of analysing the content of sentence embeddings based on universal probing tasks, along with the classification datasets for two contrasting languages. We perform a series of probing and downstream experiments with different types of sentence embeddings, followed by a thorough analysis of the experimental results. Aside from dependency parser-based embeddings, linguistic information is retained best in the recently proposed LASER sentence embeddings.
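
For concreteness, a probing experiment of the general kind described here can be sketched as a linear classifier trained on frozen sentence embeddings to predict a linguistic label; the random placeholder data below stands in for real embeddings and labels, and is not the paper's datasets or encoders.

```python
# Linear probe on frozen sentence embeddings (placeholder random data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 512))   # stand-in for real sentence embeddings
labels = rng.integers(0, 3, size=2000)      # stand-in for a linguistic property

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                          test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probing accuracy:", probe.score(X_te, y_te))   # near chance on random data
```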

Journal ArticleDOI
TL;DR: This work proposes a rule-based machine translation system to translate Arabic text into ArSL and develops a parallel corpus in the health domain, consisting of 600 sentences, which will be freely available to researchers.
Abstract: Arabic sign language (ArSL) is a full natural language that is used by the deaf in Arab countries to communicate in their community. Unfamiliarity with this language increases the isolation of deaf people from society. This language has a different structure, word order, and lexicon than Arabic. The translation between ArSL and Arabic is a complete machine translation challenge, because the two languages have different structures and grammars. In this work, we propose a rule-based machine translation system to translate Arabic text into ArSL. The proposed system performs a morphological, syntactic, and semantic analysis on an Arabic sentence to translate it into a sentence with the grammar and structure of ArSL. To transcribe ArSL, we propose a gloss system that can be used to represent ArSL. In addition, we develop a parallel corpus in the health domain, which consists of 600 sentences, and will be freely available for researchers. We evaluate our translation system on this corpus and find that our translation system provides an accurate translation for more than 80% of the translated sentences.
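
As a schematic illustration of the rule-based transfer step (the rules, glosses, and word order below are invented for illustration and are not the paper's ArSL grammar): after morphological and syntactic analysis, content words are mapped to glosses, function words are dropped, and constituents are reordered to follow the target grammar.

```python
# Toy rule-based transfer: lexical substitution, deletion, and reordering.
LEXICON = {"doctor": "DOCTOR", "visits": "VISIT", "the": None, "patient": "PATIENT"}

def translate(tokens):
    # `tokens` are (word, role) pairs assumed to come from prior analysis.
    glosses = [(LEXICON.get(word.lower()), role) for word, role in tokens]
    glosses = [(g, role) for g, role in glosses if g is not None]  # drop function words
    order = {"SUBJ": 0, "OBJ": 1, "VERB": 2}       # invented target word order
    glosses.sort(key=lambda pair: order.get(pair[1], 3))
    return " ".join(g for g, _ in glosses)

print(translate([("The", "DET"), ("doctor", "SUBJ"), ("visits", "VERB"),
                 ("the", "DET"), ("patient", "OBJ")]))   # DOCTOR PATIENT VISIT
```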

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work proposes the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information and achieves state-of-the-art results.
Abstract: Generating high-quality paraphrases is a fundamental yet challenging natural language processing task. Despite the effectiveness of previous work based on generative models, there remain problems with exposure bias in recurrent neural networks, and often a failure to generate realistic sentences. To overcome these challenges, we propose the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information. Extensive experiments on four public datasets demonstrate the proposed method achieves state-of-the-art results, outperforming previous generative architectures on both automatic metrics (BLEU, METEOR, and TER) and human evaluations.

Journal ArticleDOI
TL;DR: A model of random languages, defined by weighted context-free grammars, is considered; as the distribution of grammar weights broadens, a transition is found from a random phase, in which sentences are indistinguishable from noise, to an organized phase in which nontrivial information is carried.
Abstract: Many complex generative systems use languages to create structured objects. We consider a model of random languages, defined by weighted context-free grammars. As the distribution of grammar weights broadens, a transition is found from a random phase, in which sentences are indistinguishable from noise, to an organized phase in which nontrivial information is carried. This marks the emergence of deep structure in the language, and can be understood by a competition between energy and entropy.
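
To make the setup concrete, a weighted context-free grammar assigns a weight to each production, and sentences can be generated by expanding nonterminals with probability proportional to those weights; below is a small illustrative sampler over an invented toy grammar, not the paper's ensemble of random grammars.

```python
# Sample sentences from a toy weighted context-free grammar.
import random

GRAMMAR = {  # nonterminal -> list of (weight, right-hand side)
    "S": [(3.0, ["NP", "VP"])],
    "NP": [(2.0, ["the", "N"]), (1.0, ["NP", "PP"])],
    "VP": [(2.0, ["V", "NP"]), (1.0, ["V"])],
    "PP": [(1.0, ["near", "NP"])],
    "N": [(1.0, ["grammar"]), (1.0, ["sentence"])],
    "V": [(1.0, ["generates"]), (1.0, ["parses"])],
}

def sample(symbol="S", max_depth=20):
    """Expand `symbol` recursively; symbols with no rules are terminals.
    If the depth limit is hit, the unexpanded symbol is emitted as-is."""
    if symbol not in GRAMMAR or max_depth == 0:
        return [symbol]
    weights, rhss = zip(*GRAMMAR[symbol])
    rhs = random.choices(rhss, weights=weights, k=1)[0]
    return [tok for sym in rhs for tok in sample(sym, max_depth - 1)]

print(" ".join(sample()))   # e.g. "the grammar generates the sentence"
```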

Journal ArticleDOI
TL;DR: An efficient hardware relay is presented that is capable of detecting faults in power system waveforms within microseconds (μs) after reading each peak of the examined signal, aiming to reduce or even entirely prevent safety problems and economic losses.
Abstract: In this paper, an efficient hardware relay is presented that is implemented based on syntactic pattern recognition techniques. The proposed system is capable of detecting faults in power system waveforms within microseconds (μs) after reading each peak of the examined signal, aiming to reduce or even entirely prevent safety problems and economic losses. In order for syntactic pattern recognition methods to be utilized as a recognition tool for waveforms on transmission lines, the tasks of selecting appropriate primitive patterns, determining the linguistic representation, and forming a suitable grammar must be carried out. In this study, attribute grammars have been selected to model the examined signals due to their power to describe syntactic and semantic knowledge. The hardware implementation of the suggested relay, which is based on Earley's parsing algorithm, is developed using the Verilog hardware description language, downloaded onto a Virtex 7 XILINX FPGA board and evaluated through real waveforms and data received from IPTO. The obtained results have shown that the presented system could be an efficient alternative tool in the field of transmission line fault detection.

Journal ArticleDOI
TL;DR: It is demonstrated that P3 and P600 share neural patterns to a substantial degree, calling into question the interpretation of P600 as a language-specific brain response and instead strengthening its association with the P3.

Journal ArticleDOI
14 Aug 2019
TL;DR: This paper uses Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features and uses global language mapping based on web-crawled and social media datasets to determine the selection of national varieties.
Abstract: The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale.

Proceedings ArticleDOI
01 May 2019
TL;DR: This work describes a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.
Abstract: We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.

Journal ArticleDOI
TL;DR: The results suggest that although different types of edits were needed to outputs from NMT, RBMT and SMT systems, the difference is not necessarily reflected in process-based effort indicators.
Abstract: This paper presents a comparison of post-editing (PE) changes performed on English-to-Finnish neural (NMT), rule-based (RBMT) and statistical machine translation (SMT) output, combining a product-based and a process-based approach. A total of 33 translation students acted as participants in a PE experiment providing both post-edited texts and edit process data. Our product-based analysis of the post-edited texts shows statistically significant differences in the distribution of edit types between machine translation systems. Deletions were the most common edit type for the RBMT, insertions for the SMT, and word form changes as well as word substitutions for the NMT system. The results also show significant differences in the correctness and necessity of the edits, particularly in the form of a large number of unnecessary edits in the RBMT output. Problems related to certain verb forms and ambiguity were observed for NMT and SMT, while RBMT was more likely to handle them correctly. Process-based comparison of effort indicators shows a slight increase of keystrokes per word for NMT output, and a slight decrease in average pause length for NMT compared to RBMT and SMT in specific text blocks. A statistically significant difference was observed in the number of visits per sub-segment, which is lower for NMT than for RBMT and SMT. The results suggest that although different types of edits were needed to outputs from NMT, RBMT and SMT systems, the difference is not necessarily reflected in process-based effort indicators.

Posted Content
TL;DR: It is shown that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information.
Abstract: Despite the recent success of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. We analyze the representations learned by neural machine translation models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word-structure captured within the learned representations, an important aspect in translating morphologically-rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models learn a non-trivial amount of linguistic information. Notable findings include: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers; (iii) Representations learned using characters are more informed about word morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.

Journal ArticleDOI
17 Jul 2019
TL;DR: To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, a learning task for inducing the annotations of an ASG is proposed and an algorithm for solving it is presented.
Abstract: In this paper we introduce an extension of context-free grammars called answer set grammars (ASGs). These grammars allow annotations on production rules, written in the language of Answer Set Programming (ASP), which can express context-sensitive constraints. We investigate the complexity of various classes of ASG with respect to two decision problems: deciding whether a given string belongs to the language of an ASG and deciding whether the language of an ASG is non-empty. Specifically, we show that the complexity of these decision problems can be lowered by restricting the subset of the ASP language used in the annotations. To aid the applicability of these grammars to computational problems that require context-sensitive parsers for partially known languages, we propose a learning task for inducing the annotations of an ASG. We characterise the complexity of this task and present an algorithm for solving it. An evaluation of a (prototype) implementation is also discussed.

Proceedings ArticleDOI
12 Aug 2019
TL;DR: REINAM is able to synthesize a grammar covering the entire valid input space for some benchmarks without decreasing the accuracy of the grammar, and fuzz testing based on REINAM substantially increases the coverage of the space of valid inputs.
Abstract: Program input grammars (i.e., grammars encoding the language of valid program inputs) facilitate a wide range of applications in software engineering such as symbolic execution and delta debugging. Grammars synthesized by existing approaches can cover only a small part of the valid input space mainly due to unanalyzable code (e.g., native code) in programs and lacking high-quality and high-variety seed inputs. To address these challenges, we present REINAM, a reinforcement-learning approach for synthesizing probabilistic context-free program input grammars without any seed inputs. REINAM uses an industrial symbolic execution engine to generate an initial set of inputs for the given target program, and then uses an iterative process of grammar generalization to proactively generate additional inputs to infer grammars generalized from these initial seed inputs. To efficiently search for target generalizations in a huge search space of candidate generalization operators, REINAM includes a novel formulation of the search problem as a reinforcement learning problem. Our evaluation on eleven real-world benchmarks shows that REINAM outperforms an existing state-of-the-art approach on precision and recall of synthesized grammars, and fuzz testing based on REINAM substantially increases the coverage of the space of valid inputs. REINAM is able to synthesize a grammar covering the entire valid input space for some benchmarks without decreasing the accuracy of the grammar.

Proceedings ArticleDOI
04 Nov 2019
TL;DR: The authors propose recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction, avoiding the ill-formed outputs that can result from predicting linearized graphs in semantic parsing.
Abstract: Semantic parses are directed acyclic graphs (DAGs), so semantic parsing should be modeled as graph prediction. But predicting graphs presents difficult technical challenges, so it is simpler and more common to predict the *linearized* graphs found in semantic parsing datasets using well-understood sequence models. The cost of this simplicity is that the predicted strings may not be well-formed graphs. We present recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction. We test our model on the Parallel Meaning Bank—a multilingual semantic graphbank. Our approach yields competitive results in English and establishes the first results for German, Italian and Dutch.

Posted Content
02 Dec 2019
TL;DR: The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP) as mentioned in this paper is a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.
Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: It is discovered that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.
Abstract: This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

Proceedings ArticleDOI
01 May 2019
TL;DR: It is shown that (i) linguistic features can be beneficial for neural semantic parsing and (ii) the best method of adding these features is by using multiple encoders.
Abstract: Recently, sequence-to-sequence models have achieved impressive performance on a number of semantic parsing tasks. However, they often do not exploit available linguistic resources, while these, when employed correctly, are likely to increase performance even further. Research in neural machine translation has shown that employing this information has a lot of potential, especially when using a multi-encoder setup. We employ a range of semantic and syntactic resources to improve performance for the task of Discourse Representation Structure Parsing. We show that (i) linguistic features can be beneficial for neural semantic parsing and (ii) the best method of adding these features is by using multiple encoders.

Journal ArticleDOI
TL;DR: Results suggest that most linguistic predictions are graded in nature, activating components of the existing language system, including the anterior temporal lobe and the inferior posterior temporal cortex.

Posted Content
TL;DR: Recurrent neural network DAG grammars is presented, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction.
Abstract: Semantic parses are directed acyclic graphs (DAGs), so semantic parsing should be modeled as graph prediction. But predicting graphs presents difficult technical challenges, so it is simpler and more common to predict the linearized graphs found in semantic parsing datasets using well-understood sequence models. The cost of this simplicity is that the predicted strings may not be well-formed graphs. We present recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction. We test our model on the Parallel Meaning Bank, a multilingual semantic graphbank. Our approach yields competitive results in English and establishes the first results for German, Italian and Dutch.