Showing papers on "Rule-based machine translation" published in 2020


Proceedings ArticleDOI
01 Jul 2020
TL;DR: An information-theoretic operationalization of probing as estimating mutual information that contradicts received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation.
Abstract: The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research, plus English, totalling eleven languages. Our implementation is available at https://github.com/rycolab/info-theoretic-probing.

162 citations
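
The argument in the entry above (a better probe gives a tighter estimate) can be illustrated with a minimal sketch. It assumes the usual decomposition I(T; R) = H(T) - H(T | R), where the probe's cross-entropy upper-bounds H(T | R), and it estimates H(T) from label frequencies. The function names and the toy data are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Plug-in estimate of H(T) in bits from label frequencies."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def probe_cross_entropy(labels, probe_probs):
    """Average negative log2-probability the probe assigns to the gold label.

    probe_probs[i] is a dict mapping each label to the probe's probability
    for example i. This quantity upper-bounds H(T | R).
    """
    total = -sum(math.log2(p[y]) for y, p in zip(labels, probe_probs))
    return total / len(labels)

def mi_estimate(labels, probe_probs):
    """Estimate of I(T; R): H(T) minus the probe's cross-entropy."""
    return label_entropy(labels) - probe_cross_entropy(labels, probe_probs)

# Toy data (hypothetical): two POS labels, a weak probe and a stronger one.
labels = ["N", "V", "N", "V"]
weak = [{"N": 0.5, "V": 0.5} for _ in labels]
strong = [{y: 0.9, ("V" if y == "N" else "N"): 0.1} for y in labels]

print(mi_estimate(labels, weak))    # uninformative probe: estimate of 0 bits
print(mi_estimate(labels, strong))  # lower cross-entropy, larger estimate
```

Because a lower cross-entropy can only raise the estimate, the stronger probe yields the tighter one, which is the point the abstract makes.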


Journal ArticleDOI
Jingwen Wang, Lin Ma, Wenhao Jiang
03 Apr 2020
TL;DR: The authors propose an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information, and the most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage.
Abstract: The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence). It requires certain models to simultaneously perform visual and linguistic understandings. Previous work predominantly ignores the precision of segment localization. Sliding window based methods use predefined search window sizes, which suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors with a clear margin on three public datasets.

94 citations


Journal ArticleDOI
TL;DR: Computational and corpus evidence is reported for the hypothesis that a prominent subset of these universal properties, those related to word order, result from a process of optimization for efficient communication among humans, trading off the need to reduce complexity with the need to reduce ambiguity.
Abstract: The universal properties of human languages have been the subject of intense study across the language sciences. We report computational and corpus evidence for the hypothesis that a prominent subset of these universal properties-those related to word order-result from a process of optimization for efficient communication among humans, trading off the need to reduce complexity with the need to reduce ambiguity. We formalize these two pressures with information-theoretic and neural-network models of complexity and ambiguity and simulate grammars with optimized word-order parameters on large-scale data from 51 languages. Evolution of grammars toward efficiency results in word-order patterns that predict a large subset of the major word-order correlations across languages.

60 citations


Proceedings ArticleDOI
01 Jan 2020
TL;DR: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language; however, most systems are developed using data from just one language such as English.
Abstract: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems’ ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.

54 citations


Journal ArticleDOI
TL;DR: A new fuzzy linguistic representation model for comparative linguistic expressions is presented that takes advantage of the strengths of the 2-tuple linguistic representation model and improves the interpretability and accuracy of the results in computing with words processes, resulting in the so-called extended comparative linguistic expressions with symbolic translation.
Abstract: Many real-world decision making (DM) problems present changing contexts in which uncertainty or vagueness appear. Such uncertainty has been often modeled based on the linguistic information by using single linguistic terms. Dealing with linguistic information in DM demands processes of computing with words whose main characteristic is to emulate human beings reasoning processes to obtain linguistic outputs from linguistic inputs. However, often single linguistic terms are limited or do not express properly the expert's knowledge, being necessary to elaborate richer linguistic expressions easy to understand and able to express greater amount of knowledge, as it is the case of the comparative linguistic expressions based on hesitant fuzzy linguistic terms sets. Nevertheless, current computational models for comparative linguistic expressions present limitations both from understandability and precision points of view. The 2-tuple linguistic representation model stands out in these aspects because of its accuracy and interpretability dealing with linguistic terms, both related to the use of the symbolic translation, although 2-tuple linguistic values are still limited by the use of single linguistic terms. Therefore, the aim of this article is to present a new fuzzy linguistic representation model for comparative linguistic expressions that takes advantage of the goodness of the 2-tuple linguistic representation model and improve the interpretability and accuracy of the results in computing with words processes, resulting the so-called extended comparative linguistic expressions with symbolic translation. Taking into account the proposed model, a new computing with words approach is presented and then applied to a DM case study to show its performance and advantages in a real case by comparing with other linguistic decision approaches.

52 citations


Posted Content
TL;DR: The authors propose an information-theoretic operationalization of probing as estimating mutual information that contradicts received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation.
Abstract: The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research, plus English, totalling eleven languages.

52 citations


Proceedings ArticleDOI
01 Jul 2020
TL;DR: CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, is introduced; it includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars the authors develop.
Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.

50 citations


Proceedings ArticleDOI
01 May 2020
TL;DR: To explore whether syntactic probes would do better to make use of existing techniques, this work compares the structural probe to a more traditional parser with an identical lightweight parameterisation.
Abstract: Measuring what linguistic information is encoded in neural models of language has become popular in NLP. Researchers approach this enterprise by training “probes”—supervised models designed to extract linguistic structure from another model’s output. One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations. The structural probe has a novel design, unattested in the parsing literature, the precise benefit of which is not immediately obvious. To explore whether syntactic probes would do better to make use of existing techniques, we compare the structural probe to a more traditional parser with an identical lightweight parameterisation. The parser outperforms structural probe on UUAS in seven of nine analysed languages, often by a substantial amount (e.g. by 11.1 points in English). Under a second less common metric, however, there is the opposite trend—the structural probe outperforms the parser. This begs the question: which metric should we prefer?

49 citations
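
For readers unfamiliar with the UUAS metric mentioned in the entry above, here is a minimal sketch. It assumes the common definition of the undirected unlabeled attachment score: the fraction of gold dependency edges that also appear in the predicted tree, ignoring direction and labels. The function name and the toy edges are illustrative and do not come from the paper.

```python
def uuas(gold_edges, predicted_edges):
    """Undirected unlabeled attachment score for one sentence.

    Each edge is a pair of token indices; direction is ignored by
    storing every edge as a frozenset.
    """
    gold = {frozenset(edge) for edge in gold_edges}
    pred = {frozenset(edge) for edge in predicted_edges}
    if not gold:
        raise ValueError("gold tree has no edges")
    return len(gold & pred) / len(gold)

# Toy example (hypothetical): a four-token sentence.
gold = [(0, 1), (1, 2), (1, 3)]
pred = [(1, 0), (2, 3), (3, 1)]  # (1, 0) matches (0, 1) once direction is dropped
print(uuas(gold, pred))          # two of the three gold edges are recovered
```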


Proceedings ArticleDOI
01 Jul 2020
TL;DR: A novel reading comprehension paradigm for solving the token-level metaphor detection task which provides an innovative type of solution for this task and proposes an end-to-end deep metaphor detection model named DeepMet based on this paradigm.
Abstract: Machine metaphor understanding is one of the major topics in NLP. Most of the recent attempts consider it as classification or sequence tagging task. However, few types of research introduce the rich linguistic information into the field of computational metaphor by leveraging powerful pre-training language models. We focus a novel reading comprehension paradigm for solving the token-level metaphor detection task which provides an innovative type of solution for this task. We propose an end-to-end deep metaphor detection model named DeepMet based on this paradigm. The proposed approach encodes the global text context (whole sentence), local text context (sentence fragments), and question (query word) information as well as incorporating two types of part-of-speech (POS) features by making use of the advanced pre-training language model. The experimental results by using several metaphor datasets show that our model achieves competitive results in the second shared task on metaphor detection.

43 citations


Proceedings ArticleDOI
01 Dec 2020
TL;DR: This paper investigates the linguistic knowledge learned by a Neural Language Model before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems, and finds that BERT’s capacity to encode different kinds of linguistic properties has a positive influence on its predictions.
Abstract: In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT’s capacity to encode different kind of linguistic properties has a positive influence on its predictions: the more it stores readable linguistic information of a sentence, the higher will be its capacity of predicting the expected label assigned to that sentence.

35 citations


Proceedings ArticleDOI
08 Nov 2020
TL;DR: A general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program, and works entirely without program specific heuristics.
Abstract: One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing access of input characters at different locations of the input parser. This works on all stack based recursive descent input parsers, including parser combinators, and works entirely without program specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.

Journal ArticleDOI
TL;DR: An interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets indicates that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.
Abstract: Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) In the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining process discovery methods aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. Those sequence models are generative, i.e., they are able to predict what elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.

Proceedings ArticleDOI
01 Nov 2020
TL;DR: A neural semantic parsing system that learns new high-level abstractions through decomposition is introduced, demonstrating the flexibility of modern neural systems, as well as the one-shot reliable generalization of grammar-based methods.
Abstract: Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings. We introduce a neural semantic parsing system that learns new high-level abstractions through decomposition: users interactively teach the system by breaking down high-level utterances describing novel behavior into low-level steps that it can understand. Unfortunately, existing methods either rely on grammars which parse sentences with limited flexibility, or neural sequence-to-sequence models that do not learn efficiently or reliably from individual examples. Our approach bridges this gap, demonstrating the flexibility of modern neural systems, as well as the one-shot reliable generalization of grammar-based methods. Our crowdsourced interactive experiments suggest that over time, users complete complex tasks more efficiently while using our system by leveraging what they just taught. At the same time, getting users to trust the system enough to be incentivized to teach high-level utterances is still an ongoing challenge. We end with a discussion of some of the obstacles we need to overcome to fully realize the potential of the interactive paradigm.

Proceedings ArticleDOI
01 Jul 2020
TL;DR: A language-aware interlingua is incorporated into the Encoder-Decoder architecture that enables the model to learn a language-independent representation from the semantic spaces of different languages, while still allowing for language-specific specialization of a particular language-pair.
Abstract: Multilingual neural machine translation (NMT) has led to impressive accuracy improvements in low-resource scenarios by sharing common linguistic information across languages. However, the traditional multilingual model fails to capture the diversity and specificity of different languages, resulting in inferior performance compared with individual models that are sufficiently trained. In this paper, we incorporate a language-aware interlingua into the Encoder-Decoder architecture. The interlingual network enables the model to learn a language-independent representation from the semantic spaces of different languages, while still allowing for language-specific specialization of a particular language-pair. Experiments show that our proposed method achieves remarkable improvements over state-of-the-art multilingual NMT baselines and produces comparable performance with strong individual models.

Proceedings ArticleDOI
01 Jul 2020
TL;DR: IlliniMet is a system to automatically detect metaphorical words that combines the strengths of the contextualized representation by the widely used RoBERTa model and the rich linguistic information from external resources such as WordNet.
Abstract: Metaphors are rhetorical use of words based on the conceptual mapping as opposed to their literal use. Metaphor detection, an important task in language understanding, aims to identify metaphors in word level from given sentences. We present IlliniMet, a system to automatically detect metaphorical words. Our model combines the strengths of the contextualized representation by the widely used RoBERTa model and the rich linguistic information from external resources such as WordNet. The proposed approach is shown to outperform strong baselines on a benchmark dataset. Our best model achieves F1 scores of 73.0% on VUA ALLPOS, 77.1% on VUA VERB, 70.3% on TOEFL ALLPOS and 71.9% on TOEFL VERB.

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This work proposes HeterTFV, a graph-based reasoning approach that learns to combine linguistic information and symbolic information effectively by reasoning over multiple types of nodes in a heterogeneous graph.
Abstract: Table-based fact verification is expected to perform both linguistic reasoning and symbolic reasoning. Existing methods lack attention to take advantage of the combination of linguistic information and symbolic information. In this work, we propose HeterTFV, a graph-based reasoning approach, that learns to combine linguistic information and symbolic information effectively. We first construct a program graph to encode programs, a kind of LISP-like logical form, to learn the semantic compositionality of the programs. Then we construct a heterogeneous graph to incorporate both linguistic information and symbolic information by introducing program nodes into the heterogeneous graph. Finally, we propose a graph-based reasoning approach to reason over the multiple types of nodes to make an effective combination of both types of information. Experimental results on a large-scale benchmark dataset TABFACT illustrate the effect of our approach.

Proceedings ArticleDOI
01 Jan 2020
TL;DR: This paper introduces TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog and explores the model’s potential in not only detecting, but also projecting, turn-completions.
Abstract: Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way. In this paper, we introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog. The model has been trained and evaluated on a variety of written and spoken dialog datasets. We show that the model outperforms two baselines used in prior work. We also report on an ablation study, as well as attention and gradient analyses, which show that the model is able to utilize the dialog context and pragmatic completeness for turn-taking prediction. Finally, we explore the model’s potential in not only detecting, but also projecting, turn-completions.

Proceedings ArticleDOI
Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba
25 Oct 2020
TL;DR: This paper proposes a novel embedding selection approach which exploits linguistic information, leveraging the speech variability present in the training dataset, to improve the prosody and naturalness for complex utterances as well as in Long Form Reading (LFR).
Abstract: Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities when considering isolated sentences. But something which is still lacking in order to achieve human-like communication is the dynamic variations and adaptability of human speech. This work attempts to solve the problem of achieving a more dynamic and natural intonation in TTS systems, particularly for stylistic speech such as the newscaster speaking style. We propose a novel embedding selection approach which exploits linguistic information, leveraging the speech variability present in the training dataset. We analyze the contribution of both semantic and syntactic features. Our results show that the approach improves the prosody and naturalness for complex utterances as well as in Long Form Reading (LFR).

Journal ArticleDOI
TL;DR: The extensive experiments demonstrate that incorporating fine-grained local linguistic information with cross-modal correlation can greatly improve the performance of text-to-image synthesis, even when generating high-resolution images.
Abstract: The task of text-to-image synthesis is to generate photographic images conditioned on given textual descriptions. This challenging task has recently attracted considerable attention from the multimedia community due to its potential applications. Most of the up-to-date approaches are built based on generative adversarial network (GAN) models, and they synthesize images conditioned on the global linguistic representation. However, the sparsity of the global representation results in training difficulties on GANs and a shortage of fine-grained information in the generated images. To address this problem, we propose cross-modal global and local linguistic representations-based generative adversarial networks (CGL-GAN) by incorporating the local linguistic representation into the GAN. In our CGL-GAN, we construct a generator to synthesize the target images and a discriminator to judge whether the generated images conform with the text description. In the discriminator, we construct the cross-modal correlation by projecting the image representations at high and low levels onto the global and local linguistic representations, respectively. We design the hinge loss function to train our CGL-GAN model. We evaluate the proposed CGL-GAN on two publicly available datasets, the CUB and the MS-COCO. The extensive experiments demonstrate that incorporating fine-grained local linguistic information with cross-modal correlation can greatly improve the performance of text-to-image synthesis, even when generating high-resolution images.
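
The abstract above mentions a hinge loss but does not give its form. As a rough illustration only, the sketch below uses the standard hinge formulation for GAN training (an assumption, not necessarily the CGL-GAN objective): the discriminator is penalized when real scores fall below +1 or fake scores rise above -1, and the generator maximizes the discriminator's score on generated images. All names and scores are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def discriminator_hinge_loss(real_scores, fake_scores):
    """Standard hinge loss for the discriminator (assumed form)."""
    real_term = mean([max(0.0, 1.0 - s) for s in real_scores])
    fake_term = mean([max(0.0, 1.0 + s) for s in fake_scores])
    return real_term + fake_term

def generator_hinge_loss(fake_scores):
    """Standard generator loss paired with the hinge discriminator loss."""
    return -mean(fake_scores)

# Toy discriminator outputs (hypothetical raw scores, not probabilities).
real_scores = [2.0, 0.5]
fake_scores = [-2.0, 0.0]
print(discriminator_hinge_loss(real_scores, fake_scores))  # 0.25 + 0.5 = 0.75
print(generator_hinge_loss(fake_scores))                   # 1.0
```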

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This paper presents a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, where two auxiliary losses are proposed and joint training of bi-directional transfer and auto-encoding is adopted.
Abstract: Conventional approaches for formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training. However, the dataset for formality style transfer is considerably smaller than translation corpora. Moreover, we observe that informal and formal sentences closely resemble each other, which is different from the translation task where two languages have different vocabularies and grammars. In this paper, we present a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, where we propose two auxiliary losses and adopt joint training of bi-directional transfer and auto-encoding. Experimental results show that S2S-SLS (with either RNN or Transformer architectures) consistently outperforms baselines in various settings, especially when we have limited data.

Journal ArticleDOI
14 Feb 2020 - Cortex
TL;DR: An initial picture of the rapid spatio-temporal dynamics of the syntactic and semantic composition network in sentence processing is drawn, with a unique demonstration of the relevance of posterior temporal cortex for syntactic processing in natural language.

Proceedings Article
01 May 2020
TL;DR: This paper presents and releases MorphAGram, a publicly available framework for unsupervised morphological segmentation that uses Adaptor Grammars (AG) and is based on the work presented by Eskander et al. (2016).
Abstract: Computational morphological segmentation has been an active research topic for decades as it is beneficial for many natural language processing tasks. With the high cost of manually labeling data for morphology and the increasing interest in low-resource languages, unsupervised morphological segmentation has become essential for processing a typologically diverse set of languages, whether high-resource or low-resource. In this paper, we present and release MorphAGram, a publicly available framework for unsupervised morphological segmentation that uses Adaptor Grammars (AG) and is based on the work presented by Eskander et al. (2016). We conduct an extensive quantitative and qualitative evaluation of this framework on 12 languages and show that the framework achieves state-of-the-art results across languages of different typologies (from fusional to polysynthetic and from high-resource to low-resource).

Posted Content
TL;DR: The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource.
Abstract: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.

Journal ArticleDOI
Li Zou1, Kuo Pang1, Xiaoying Song1, Ning Kang1, Xin Liu1 
TL;DR: This work focuses on formal concept analysis (FCA) under uncertainty, where attributes are described with linguistic terms or attribute descriptions are incomplete, and proposes a new algorithm that completes an incomplete linguistic concept formal context based on the closeness degree between fuzzy objects.

Posted Content
TL;DR: This work studies the abilities of real-time counter machines as formal grammars, focusing on formal properties that are relevant for NLP models and makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
Abstract: Counter machines have achieved a newfound relevance to the field of natural language processing (NLP): recent work suggests some strong-performing recurrent neural networks utilize their memory as counters. Thus, one potential way to understand the success of these networks is to revisit the theory of counter computation. Therefore, we study the abilities of real-time counter machines as formal grammars, focusing on formal properties that are relevant for NLP models. We first show that several variants of the counter machine converge to express the same class of formal languages. We also prove that counter languages are closed under complement, union, intersection, and many other common set operations. Next, we show that counter machines cannot evaluate boolean expressions, even though they can weakly validate their syntax. This has implications for the interpretability and evaluation of neural network systems: successfully matching syntactic patterns does not guarantee that counter memory accurately encodes compositional semantics. Finally, we consider whether counter languages are semilinear. This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
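To make the notion of a real-time counter machine concrete, the following is a minimal Python sketch of a single-counter recognizer for the language a^n b^n. It is an illustration only and is not taken from the paper: the function name `accepts_anbn`, the two state labels, and the choice of language are assumptions made for this example. The machine reads exactly one symbol per step (the real-time restriction), changes its counter by at most one, and may test the counter against zero.

```python
def accepts_anbn(s):
    """Hypothetical one-counter, real-time recognizer for a^n b^n (n >= 0)."""
    counter = 0   # the single integer counter
    state = "A"   # "A": still reading a's, "B": reading b's
    for ch in s:  # one input symbol consumed per step
        if state == "A" and ch == "a":
            counter += 1          # count each leading 'a'
        elif ch == "b" and counter > 0:
            state = "B"           # after the first 'b', no more a's allowed
            counter -= 1          # match one 'b' against one earlier 'a'
        else:
            return False          # 'a' after 'b', unmatched 'b', or other symbol
    return counter == 0           # accept only if every 'a' was matched


if __name__ == "__main__":
    for word in ["", "ab", "aabb", "aab", "abab", "ba"]:
        print(repr(word), accepts_anbn(word))
```

The sketch shows the pattern-matching side of counter memory: the counter tracks how many symbols remain to be matched, but it stores nothing about what those symbols mean.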

Posted Content
TL;DR: Experiments demonstrate that the proposed ASGK is able to generate a robust and accurate report, and moreover outperforms state-of-the-art methods on both medical terminology classification and paragraph generation metrics.
Abstract: Beyond the common difficulties faced in natural image captioning, medical report generation specifically requires the model to describe a medical image with a fine-grained and semantically coherent paragraph that should satisfy both medical commonsense and logic. Previous works generally extract the global image features and attempt to generate a paragraph that is similar to referenced reports; however, this approach has two limitations. Firstly, the regions of primary interest to radiologists are usually located in a small area of the global image, meaning that the remaining parts of the image could be considered as irrelevant noise in the training procedure. Secondly, there are many similar sentences used in each medical report to describe the normal regions of the image, which causes serious data bias. This deviation is likely to teach models to generate these inessential sentences on a regular basis. To address these problems, we propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns. In more detail, ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning. The core structure of ASGK consists of a medical graph encoder and a natural language decoder, inspired by advanced Generative Pre-Training (GPT). Experiments on the CX-CHR dataset and our COVID-19 CT Report dataset demonstrate that our proposed ASGK is able to generate a robust and accurate report, and moreover outperforms state-of-the-art methods on both medical terminology classification and paragraph generation metrics.

Journal ArticleDOI
TL;DR: A rule-based machine translation system that translates Tunisian Dialect text to Modern Standard Arabic (MSA); its output can be used to build a training dataset (parallel corpus), which allows for hybridization with the statistical approach or with the more recent neural-network methods.

Journal ArticleDOI
Thomas Icard1
TL;DR: Unlike the classical case, where the “semi-linear” languages all collapse into the regular languages, the probabilistic hierarchy is shown, using analytic tools adapted from the classical setting, not to collapse: more distributions become definable at each level.

Journal ArticleDOI
03 Apr 2020
TL;DR: This article proposes bilinear pooling to model pairwise multiplicative interactions among individual neurons, with a low-rank approximation to make the model computationally feasible.
Abstract: Recent NLP studies reveal that substantial linguistic information can be attributed to single neurons, i.e., individual dimensions of the representation vectors. We hypothesize that modeling strong interactions among neurons helps to better capture complex information by composing the linguistic properties embedded in individual neurons. Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors. Specifically, we leverage bilinear pooling to model pairwise multiplicative interactions among individual neurons, and a low-rank approximation to make the model computationally feasible. We further propose extended bilinear pooling to incorporate first-order representations. Experiments on WMT14 English⇒German and English⇒French translation tasks show that our model consistently improves performances over the SOTA Transformer baseline. Further analyses demonstrate that our approach indeed captures more syntactic and semantic information as expected.
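The idea of low-rank bilinear pooling can be sketched in a few lines of plain Python. This is a generic illustration, not the authors' implementation: the factor matrices `U`, `V`, `P`, the helper names, and the dimensions are all assumptions chosen for the example. A full bilinear map computes each output as z_k = x^T W_k y, which needs one n-by-m matrix per output. The low-rank form instead factorizes W_k[i][j] = sum_r P[r][k] * U[i][r] * V[j][r], so the same output is obtained as z = P^T ((U^T x) * (V^T y)) with an elementwise product.

```python
import random


def matvec_t(M, x):
    """Return M^T x, where M is a list of len(x) rows."""
    cols = len(M[0])
    return [sum(M[i][j] * x[i] for i in range(len(x))) for j in range(cols)]


def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear pooling: z = P^T ((U^T x) * (V^T y))."""
    a = matvec_t(U, x)                      # project x to rank-r space
    b = matvec_t(V, y)                      # project y to rank-r space
    h = [ai * bi for ai, bi in zip(a, b)]   # pairwise multiplicative interaction
    return matvec_t(P, h)                   # project to the k outputs


def full_bilinear(x, y, U, V, P):
    """Same map computed through the explicit n-by-m weights W_k."""
    n, m, r, k = len(x), len(y), len(P), len(P[0])
    out = []
    for kk in range(k):
        total = 0.0
        for i in range(n):
            for j in range(m):
                w = sum(P[rr][kk] * U[i][rr] * V[j][rr] for rr in range(r))
                total += x[i] * w * y[j]
        out.append(total)
    return out


def random_matrix(rows, cols, rng):
    """Helper for the demo: a rows-by-cols matrix of uniform random floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, r, k = 4, 5, 3, 2                 # illustrative sizes only
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    U, V, P = random_matrix(n, r, rng), random_matrix(m, r, rng), random_matrix(r, k, rng)
    print(low_rank_bilinear(x, y, U, V, P))
    print(full_bilinear(x, y, U, V, P))
```

In this sketch the factorized form stores r(n + m + k) parameters instead of n·m·k for the explicit weights, which is the sense in which a low-rank approximation keeps bilinear pooling computationally feasible.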

Journal ArticleDOI
22 May 2020
TL;DR: This paper illustrates what is considered the current state of the art of computer-assisted language comparison by presenting a workflow that starts with raw data and leads up to a stage where sound correspondence patterns across multiple languages have been identified and can be readily presented, inspected, and discussed.
Abstract: Historical language comparison opens windows onto a human past, long before the availability of written records. Since traditional language comparison within the framework of the comparative method is largely based on manual data comparison, requiring the meticulous sifting through dictionaries, word lists, and grammars, the framework is difficult to apply, especially in times where more and more data have become available in digital form. Unfortunately, it is not possible to simply automate the process of historical language comparison, not only because computational solutions lag behind human judgments in historical linguistics, but also because they lack the flexibility that would allow them to integrate various types of information from various kinds of sources. A more promising approach is to integrate computational and classical approaches within a computer-assisted framework, “neither completely computer-driven nor ignorant of the assistance computers afford” [1, p. 4]. In this paper, we will illustrate what we consider the current state of the art of computer-assisted language comparison by presenting a workflow that starts with raw data and leads up to a stage where sound correspondence patterns across multiple languages have been identified and can be readily presented, inspected, and discussed. We illustrate this workflow with the help of a newly prepared dataset on Hmong-Mien languages. Our illustration is accompanied by Python code and instructions on how to use additional web-based tools we developed so that users can apply our workflow for their own purposes.