Showing papers on "Natural language understanding" published in 2022


Journal ArticleDOI
TL;DR: Wang et al. give a comprehensive survey of recent advances in Chinese named entity recognition (CNER), covering the common datasets, tag schemes, evaluation metrics, and difficulties of CNER, with a focus on deep-learning-based CNER.

30 citations


Journal ArticleDOI
TL;DR: The authors proposed a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain, which consists of two sub-models, namely intent classifier and argument similarity.
Abstract: This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain. The proposed framework consists of two sub-models, namely intent classifier and argument similarity. The intent classifier model stacks a BiLSTM with an attention mechanism on top of the pre-trained BERT model and fine-tunes the model to recognize the user intent, whereas the argument similarity model employs BERT+BiLSTM for identifying system arguments the user refers to in his or her natural language utterances. Our model is evaluated in an argumentative dialogue system that engages the user to inform him-/herself about a controversial topic by exploring pro and con arguments and build his/her opinion towards the topic. In order to evaluate the proposed approach, we collect user utterances for the interaction with the respective system labeling intent and referenced argument in an extensive online study. The data collection includes multiple topics and two different user types (native English speakers from the UK and non-native English speakers from China). Additionally, we evaluate the proposed intent classifier and argument similarity models separately on the publicly available Banking77 and STS benchmark datasets. The evaluation indicates a clear advantage of the utilized techniques over baseline approaches on several datasets, as well as the robustness of the proposed approach against new topics and different language proficiency as well as the cultural background of the user. Furthermore, results show that our intent classifier model outperforms DIET, DistilBERT, and BERT fine-tuned models in few-shot setups (i.e., with 10, 20, or 30 labeled examples per intent) and full data setup.

13 citations


Journal ArticleDOI
TL;DR: This survey covers joint models for intent classification and slot filling in natural language understanding up to 2021, including the issues addressed in the joint task, datasets, evaluation metrics, experiment design, and a summary of reported performance on the standard datasets.
Abstract: Intent classification, to identify the speaker’s intention, and slot filling, to label each token with a semantic type, are critical tasks in natural language understanding. Traditionally the two tasks have been addressed independently. More recently joint models that address the two tasks together have achieved state-of-the-art performance for each task and have shown there exists a strong relationship between the two. In this survey, we bring the coverage of methods up to 2021 including the many applications of deep learning in the field. As well as a technological survey, we look at issues addressed in the joint task and the approaches designed to address these issues. We cover datasets, evaluation metrics, and experiment design and supply a summary of reported performance on the standard datasets.

13 citations
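
To make the two tasks in the survey entry above concrete, here is a minimal sketch of what a jointly labeled utterance can look like: one intent label for the whole utterance and one slot tag per token. The BIO tagging scheme, the example utterance, the label names, and the helper `extract_slots` are illustrative assumptions, not taken from the survey.

```python
from typing import List, Tuple


def extract_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (slot_type, text) spans.

    A "B-x" tag opens a span of type x, a matching "I-x" tag extends it,
    and any other tag (including a mismatched "I-y") closes the open span.
    """
    slots: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type is not None:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type is not None:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        slots.append((current_type, " ".join(current_tokens)))
    return slots


# Hypothetical training example: one utterance-level intent, one tag per token.
example = {
    "tokens": ["book", "a", "flight", "to", "new", "york", "tomorrow"],
    "slot_tags": ["O", "O", "O", "O", "B-destination", "I-destination", "B-date"],
    "intent": "book_flight",
}

slots = extract_slots(example["tokens"], example["slot_tags"])
print(example["intent"], slots)
```

A joint model predicts both the `intent` field and the `slot_tags` sequence from the same encoded utterance, which is why the two outputs can inform each other.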


Journal ArticleDOI
TL;DR: This article identified, investigated, and analyzed various language models used in NLU and NLP to find directions for future research and proposed building steps for a conceptual framework to achieve goals of enhancing the performance of language models in the field of NLU.
Abstract: Learning human languages is a difficult task for a computer. However, Deep Learning (DL) techniques have enhanced performance significantly for almost all natural language processing (NLP) tasks. Unfortunately, these models cannot be generalized for all the NLP tasks with similar performance. NLU (Natural Language Understanding) is a subset of NLP including tasks, like machine translation, dialogue-based systems, natural language inference, text entailment, sentiment analysis, etc. The advancement in the field of NLU is the collective performance enhancement in all these tasks. Even though MTL (Multi-task Learning) was introduced before Deep Learning, it has gained significant attention in the past years. This paper aims to identify, investigate, and analyze various language models used in NLU and NLP to find directions for future research. The Systematic Literature Review (SLR) is prepared using the literature search guidelines proposed by Kitchenham and Charters on various language models between 2011 and 2021. This SLR points out that the unsupervised learning method-based language models show potential performance improvement. However, they face the challenge of designing the general-purpose framework for the language model, which will improve the performance of multi-task NLU and the generalized representation of knowledge. Combining these approaches may result in a more efficient and robust multi-task NLU. This SLR proposes building steps for a conceptual framework to achieve goals of enhancing the performance of language models in the field of NLU.

13 citations


Proceedings ArticleDOI
15 Jun 2022
TL;DR: Results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system are presented.
Abstract: We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates when compared to the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.

12 citations
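
The abstract above reports its gains as relative error-rate improvements. As a minimal sketch of the usual reading of that phrase (the reduction in error rate divided by the baseline error rate), the snippet below computes it. The absolute error rates are made up for illustration, since the abstract gives only the relative figures, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error-rate reduction in percent.

    Both arguments are error rates in the same unit (e.g. percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical absolute error rates (not from the paper), chosen so the
# relative reduction lands near the 3.86% quoted for intent classification.
baseline = 10.0   # assumed Stage 1 error rate, in percent
improved = 9.614  # assumed Stage 2 error rate, in percent
reduction = relative_error_reduction(baseline, improved)
print(f"{reduction:.2f}% relative")
```

Under these assumed numbers the absolute change is only 0.386 percentage points, which is why relative and absolute improvements should not be compared directly.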


Proceedings ArticleDOI
01 May 2022
TL;DR: This paper develops an accurate and practical automated approach for handling anaphoric ambiguity in requirements, addressing both ambiguity detection and anaphora interpretation, and observes that for ambiguity detection supervised ML outperforms both a large-scale language model, SpanBERT, and a solution assembled from off-the-shelf NLP coreference resolvers.
Abstract: Ambiguity is a pervasive issue in natural-language requirements. A common source of ambiguity in requirements is when a pronoun is anaphoric. In requirements engineering, anaphoric ambiguity occurs when a pronoun can plausibly refer to different entities and thus be interpreted differently by different readers. In this paper, we develop an accurate and practical automated approach for handling anaphoric ambiguity in requirements, addressing both ambiguity detection and anaphora interpretation. In view of the multiple competing natural language processing (NLP) and machine learning (ML) technologies that one can utilize, we simultaneously pursue six alternative solutions, empirically assessing each using a collection of ≈1,350 industrial requirements. The alternative solution strategies that we consider are natural choices induced by the existing technologies; these choices frequently arise in other automation tasks involving natural-language requirements. A side-by-side empirical examination of these choices helps develop insights about the usefulness of different state-of-the-art NLP and ML technologies for addressing requirements engineering problems. For the ambiguity detection task, we observe that supervised ML outperforms both a large-scale language model, SpanBERT (a variant of BERT), as well as a solution assembled from off-the-shelf NLP coreference resolvers. In contrast, for anaphora interpretation, SpanBERT yields the most accurate solution. In our evaluation, (1) the best solution for anaphoric ambiguity detection has an average precision of ≈60% and a recall of 100%, and (2) the best solution for anaphora interpretation (resolution) has an average success rate of ≈98%.

10 citations
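
The detection result quoted above (precision of about 60% at 100% recall) means that every truly ambiguous requirement is flagged, at the cost of some false alarms. The sketch below shows how the two metrics are computed from confusion counts. The counts are hypothetical, picked only to reproduce those two figures, and the function name is an assumption.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall from true/false positives and false negatives.

    Returns 0.0 for a metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (not from the paper): 60 ambiguous requirements flagged
# correctly, 40 unambiguous ones flagged by mistake, and none missed.
precision, recall = precision_recall(tp=60, fp=40, fn=0)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```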


Proceedings ArticleDOI
01 Jan 2022
TL;DR: The authors introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances (input-output pairs); the instructions are obtained from crowdsourcing instructions used to create existing NLP datasets and mapped to a unified schema.
Abstract: Humans (e.g., crowdworkers) have a remarkable ability in solving different tasks, by simply reading textual instructions that define them and looking at a few examples. Despite the success of the conventional supervised learning on individual datasets, such models often struggle with generalization across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing challenge in AI is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, we introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances (input-output pairs). The instructions are obtained from crowdsourcing instructions used to create existing NLP datasets and mapped to a unified schema. Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and measuring generalization to the remaining unseen ones. We adopt generative pre-trained language models to encode task-specific instructions along with input and generate task output. Our results indicate that models benefit from instructions when evaluated in terms of generalization to unseen tasks (19% better for models utilizing instructions). These models, however, are far behind an estimated performance upper bound, indicating significant room for more progress in this direction.

9 citations


Journal ArticleDOI
TL;DR: This paper shows that there is an unobserved confounder for the natural language utterances and their respective classes, leading to spurious correlations from training data, and provides a new perspective with causal inference to find out the bias.
Abstract: Recent studies have shown that strong Natural Language Understanding (NLU) models are prone to relying on annotation biases of the datasets as a shortcut, which goes against the underlying mechanisms of the task of interest. To reduce such biases, several recent works introduce debiasing methods to regularize the training process of targeted NLU models. In this paper, we provide a new perspective with causal inference to find out the bias. On one hand, we show that there is an unobserved confounder for the natural language utterances and their respective classes, leading to spurious correlations from training data. To remove such confounder, the backdoor adjustment with causal intervention is utilized to find the true causal effect, which makes the training process fundamentally different from the traditional likelihood estimation. On the other hand, in inference process, we formulate the bias as the direct causal effect and remove it by pursuing the indirect causal effect with counterfactual reasoning. We conduct experiments on large-scale natural language inference and fact verification benchmarks, evaluating on bias sensitive datasets that are specifically designed to assess the robustness of models against known biases in the training data. Experimental results show that our proposed debiasing framework outperforms previous state-of-the-art debiasing methods while maintaining the original in-distribution performance.

9 citations
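
For readers unfamiliar with the backdoor adjustment mentioned in the abstract above, the generic textbook form is sketched below. Here $X$ is the input utterance, $Y$ the class label, and $Z$ a confounder satisfying the backdoor criterion; these symbols are assumptions for illustration, and how the paper approximates the unobserved confounder is not stated in the abstract.

```latex
% Ordinary likelihood: the confounder is weighted by P(z | x),
% so spurious correlations between X and Z leak into the estimate.
P(Y \mid X = x) = \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z \mid X = x)

% Backdoor adjustment: intervening on X cuts the link from Z to X,
% so the confounder is weighted by its prior P(z) instead.
P(Y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)
```

The only difference between the two expressions is the weight on each value of $Z$, which is what makes training under the intervention differ from conventional likelihood estimation.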


Proceedings ArticleDOI
01 Jan 2022
TL;DR: This work introduces BioBART, a generative language model that adapts BART to the biomedical domain, and collates various biomedical language generation tasks including dialogue, summarization, entity linking, and named entity recognition.
Abstract: Pretrained language models have served as important backbones for natural language processing. Recently, in-domain pretraining has been shown to benefit various domain-specific downstream tasks. In the biomedical domain, natural language generation (NLG) tasks are of critical importance, while understudied. Approaching natural language understanding (NLU) tasks as NLG achieves satisfying performance in the general domain through constrained language generation or language prompting. We emphasize the lack of in-domain generative language models and the unsystematic generative downstream benchmarks in the biomedical domain, hindering the development of the research community. In this work, we introduce the generative language model BioBART that adapts BART to the biomedical domain. We collate various biomedical language generation tasks including dialogue, summarization, entity linking, and named entity recognition. BioBART pretrained on PubMed abstracts has enhanced performance compared to BART and set strong baselines on several tasks. Furthermore, we conduct ablation studies on the pretraining tasks for BioBART and find that sentence permutation has negative effects on downstream tasks.

8 citations


Journal ArticleDOI
TL;DR: This paper puts forward a fully semantic meaning representation that tackles the key limits of existing formalisms by fully abstracting text into meaning and introducing language-independent concepts and semantic relations, in order to obtain an interlingual representation that overcomes the language barrier.
Abstract: Conceptual representations of meaning have long been the general focus of Artificial Intelligence (AI) towards the fundamental goal of machine understanding, with innumerable efforts made in Knowledge Representation, Speech and Natural Language Processing, Computer Vision, inter alia. Even today, at the core of Natural Language Understanding lies the task of Semantic Parsing, the objective of which is to convert natural sentences into machine-readable representations. Through this paper, we aim to revamp the historical dream of AI, by putting forward a novel, all-embracing, fully semantic meaning representation, that goes beyond the many existing formalisms. Indeed, we tackle their key limits by fully abstracting text into meaning and introducing language-independent concepts and semantic relations, in order to obtain an interlingual representation. Our proposal aims to overcome the language barrier, and connect not only texts across languages, but also images, videos, speech and sound, and logical formulas, across many fields of AI.

6 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: The authors propose an integration method for end-to-end (E2E) Spoken Language Understanding (SLU) networks based on a Continuous Token Interface (CTI), a junctional representation of the ASR and NLU networks when both are pre-trained with the same vocabulary.
Abstract: Most End-to-End (E2E) Spoken Language Understanding (SLU) networks leverage the pre-trained Automatic Speech Recognition (ASR) networks but still lack the capability to understand the semantics of utterances, crucial for the SLU task. To solve this, recently proposed studies use pre-trained Natural Language Understanding (NLU) networks. However, it is not trivial to fully utilize both pre-trained networks; many solutions were proposed, such as Knowledge Distillation (KD), cross-modal shared embedding and network integration with Interface. We propose a simple and robust integration method for the E2E SLU network with a novel Interface, Continuous Token Interface (CTI). CTI is a junctional representation of the ASR and NLU networks when both networks are pre-trained with the same vocabulary. Thus, we can train our SLU network in an E2E manner without additional modules, such as Gumbel-Softmax. We evaluate our model using SLURP, a challenging SLU dataset and achieve state-of-the-art scores on intent classification and slot filling tasks. We also verify that the NLU network, pre-trained with Masked Language Model (MLM), can utilize a noisy textual representation of CTI. Moreover, we train our model with extra data, SLURP-Synth, and get better results.

Journal ArticleDOI
TL;DR: The authors provide an overview of the evolution of visually grounded models of spoken language over the last 20 years, intended as an introduction for practitioners across the contributing fields, and cover the main modeling architectures, evaluation metrics, and analysis techniques.
Abstract: This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques.

Journal ArticleDOI
TL;DR: This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language.
Abstract: Recent years have seen a growing number of publications that analyse Natural Language Understanding (NLU) datasets for superficial cues, whether they undermine the complexity of the tasks underlying those datasets and how they impact those models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language. We summarise and discuss the findings and conclude with a set of recommendations for possible future research directions. We hope that it will be a useful resource for researchers who propose new datasets to assess the suitability and quality of their data to evaluate various phenomena of interest, as well as those who propose novel NLU approaches, to further understand the implications of their improvements with respect to their model’s acquired capabilities.

Journal ArticleDOI
TL;DR: It is found that the BERT-based model pretrained on the most recent Korean corpus performed the best in terms of Korean-based multiclass text classification, suggesting the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
Abstract: Recently, transformer-based pretrained language models have demonstrated stellar performance in natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) have achieved outstanding performance through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may only be effective for English-based NLU tasks, whereas its effectiveness for other languages such as Korean is limited. Thus, the applicability of BERT-based language models pretrained in languages other than English to NLU tasks based on those languages must be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and their expected applicability to Korean NLU tasks. We used the climate technology dataset, which is a Korean-based large text classification dataset, in research proposals involving 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed the best in terms of Korean-based multiclass text classification. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.

Book ChapterDOI
TL;DR: The authors present the architecture adopted to deploy a conversational AI agent, named Ainume, using Google Dialogflow on the Google Cloud Platform (GCP); Ainume identifies symptoms of common and chronic diseases and suggests nutraceutical solutions to reduce them.
Abstract: Innovative conversational artificial intelligence (AI) powered systems have been gaining momentum in the healthcare industry in recent years. Automated artificial intelligence programs are built with the purpose of allowing effective communication by providing an interface between the computer and the user. Conversational AI is making a significant impact on the healthcare industry for both medical health providers and patients. Several natural language processing (NLP) platforms, in particular using natural language understanding (NLU), such as Google Dialogflow, IBM Watson and Rasa are used in conversational AI. This paper intends to present an architecture adopted to deploy a successful conversational AI agent, named Ainume, using Google Dialogflow on the Google Cloud Platform (GCP). Ainume identifies symptoms of common and chronic diseases and accordingly suggests nutraceutical solutions to reduce the symptoms of these diseases. The focus of this paper is on one aspect that Ainume is equipped to deal with, that is, cardiovascular diseases.

Journal ArticleDOI
TL;DR: A real case study in implementing a chatbot, which answers frequently asked questions from learners on an Italian e-learning platform that provides workplace safety courses to several business customers, and its results on the original users’ requests are presented.
Abstract: During the COVID-19 pandemic, the corporate online training sector has increased exponentially and online course providers had to implement innovative solutions to be more efficient and provide a satisfactory service. This paper considers a real case study in implementing a chatbot, which answers frequently asked questions from learners on an Italian e-learning platform that provides workplace safety courses to several business customers. Having to respond quickly to the increase in the courses activated, the company decided to develop a chatbot using a cloud-based service currently available on the market. These services are based on Natural Language Understanding (NLU) engines, which deal with identifying information such as entities and intentions from the sentences provided as input. To integrate a chatbot in an e-learning platform, we studied the performance of the intent recognition task of the major NLU platforms available on the market with an in-depth comparison, using an Italian dataset provided by the owner of the e-learning platform. We focused on intent recognition, carried out several experiments and evaluated performance in terms of F-score, error rate, response time, and robustness of all the services selected. The chatbot is currently in production, therefore we present a description of the system implemented and its results on the original users’ requests.

Journal ArticleDOI
Peng Zhao
TL;DR: End-to-end SLU (E2E SLU) based on deep neural networks has gained momentum because it benefits from the joint optimization of the ASR and NLU parts, which limits the error cascade of the pipeline architecture.

Proceedings ArticleDOI
01 Jan 2022-Findings
TL;DR: This paper proposed a self-training method which augments available few-shot training data with similar (automatically labeled) in-domain sentences from large monolingual Web-scale corpora.
Abstract: Scaling dialogue systems to a multitude of domains, tasks and languages relies on costly and time-consuming data annotation for different domain-task-language configurations. The annotation efforts might be substantially reduced by the methods that generalise well in zero- and few-shot scenarios, and also effectively leverage external unannotated data sources (e.g., Web-scale corpora). We propose two methods to this aim, offering improved dialogue natural language understanding (NLU) across multiple languages: 1) Multi-SentAugment, and 2) LayerAgg. Multi-SentAugment is a self-training method which augments available (typically few-shot) training data with similar (automatically labelled) in-domain sentences from large monolingual Web-scale corpora. LayerAgg learns to select and combine useful semantic information scattered across different layers of a Transformer model (e.g., mBERT); it is especially suited for zero-shot scenarios as semantically richer representations should strengthen the model’s cross-lingual capabilities. Applying the two methods with state-of-the-art NLU models obtains consistent improvements across two standard multilingual NLU datasets covering 16 diverse languages. The gains are observed in zero-shot, few-shot, and even in full-data scenarios. The results also suggest that the two methods achieve a synergistic effect: the best overall performance in few-shot setups is attained when the methods are used together.

Book ChapterDOI
TL;DR: The authors propose a rule-based technique that leverages both natural language features extracted using existing NLP and ML techniques and contextual knowledge to capture the different classes of complex intents.
Abstract: Task-oriented dialogue systems employ third-party APIs to serve end-users via natural language interactions. While existing advances in Natural Language Processing (NLP) and Machine Learning (ML) techniques have produced promising and useful results to recognize user intents, the synthesis of API calls to support a broad range of potentially complex user intents is still largely a manual and costly process. In this paper, we propose a new approach to recognize and realize complex user intents. Our approach relies on a new rule-based technique that leverages both (i) natural language features extracted using existing NLP and ML techniques and (ii) contextual knowledge to capture the different classes of complex intents. We devise a context knowledge service to capture the requisite contextual knowledge.

Journal ArticleDOI
TL;DR: This paper provides a practical approach to enabling conversational agents over military scenarios based on natural language understanding (NLU) and natural language generation (NLG); the proposed system can be trained on other datasets for future application domains.
Abstract: With the rise of artificial intelligence, conversational agents (CA) have found use in various applications in the commerce and service industries. In recent years, many conversational datasets have become publicly available, most relating to open-domain social conversations. However, it is difficult to obtain domain-specific or language-specific conversational datasets. This work focused on developing conversational systems based on the Chinese corpus over military scenarios. The soldier will need information regarding their surroundings and orders to carry out their mission in an unfamiliar environment. Additionally, using a conversational military agent will help soldiers obtain immediate and relevant responses while reducing labor and cost requirements when performing repetitive tasks. This paper proposes a system architecture for conversational military agents based on natural language understanding (NLU) and natural language generation (NLG). The NLU phase comprises two tasks: intent detection and slot filling. Detecting intent and filling slots involves predicting the user’s intent and extracting related entities. The goal of the NLG phase, in contrast, is to provide answers or ask questions to clarify the user’s needs. In this study, the military training task was when soldiers sought information via a conversational agent during the mission. In summary, we provide a practical approach to enabling conversational agents over military scenarios. Additionally, the proposed conversational system can be trained by other datasets for future application domains.

Proceedings ArticleDOI
23 May 2022
TL;DR: The authors propose a Self-Distillation Joint NLU model (SDJN) for multiple intent detection and slot filling in natural language understanding (NLU), in which the output of each decoder serves as auxiliary information for the next decoder and the auxiliary loop is completed via self-distillation.
Abstract: Intent detection and slot filling are two main tasks in natural language understanding (NLU). These two tasks are highly related and often trained jointly. However, most previous works assume an utterance only corresponds to one intent, ignoring that it can include multiple intents. In this paper, we propose a novel Self-Distillation Joint NLU model (SDJN) for multi-intent NLU. Specifically, we adopt three orderly connected decoders and a self-distillation approach to form an auxiliary loop that establishes interrelated connections between multiple intents and slots. The output of each decoder serves as auxiliary information for the next decoder, and the auxiliary loop completes via the self-distillation. Furthermore, we formulate multiple intent detection as a weakly supervised task and handle it with multiple instance learning (MIL), which exploits token-level intent information to predict multiple intents and guide slot decoder. Experimental results indicate that our model achieves competitive performance compared to others.

Journal ArticleDOI
TL;DR: Xiao-Shih is the first intelligent question answering bot on Chinese-based massive open online courses (MOOCs); it integrates many novel natural language processing and machine learning approaches to achieve state-of-the-art performance.
Abstract: This article introduces Xiao-Shih, the first intelligent question answering bot on Chinese-based massive open online courses (MOOCs). Question answering is critical for solving individual problems. However, instructors on MOOCs must respond to many questions, and learners must wait a long time for answers. To address this issue, Xiao-Shih integrates many novel natural language processing and machine learning approaches to achieve state-of-the-art performance. Furthermore, Xiao-Shih has a built-in self-enriched mechanism for expanding the knowledge base through open community-based question answering. This article proposes a novel approach, known as spreading question similarity (SQS), which iterates similar keywords on our keyword networks to find duplicate questions. Compared with BERT, an advanced neural language model, the results showed that SQS outperforms BERT on recall and accuracy above a prediction probability threshold of 0.8. After training, Xiao-Shih achieved a perfect correct rate. Furthermore, Xiao-Shih outperforms Jill Watson 1.0, which is a noted question answering bot, on answer rate with the self-enriched mechanism.

Journal ArticleDOI
TL;DR: Benchmarking evaluates the impacts of different components of, or options for, the vision-and-language learning model and shows the effectiveness of pretraining strategies, while manipulation experiments assess the robustness of the framework to novel scenarios.
Abstract: This paper investigates human instruction following for robotic manipulation via a hybrid, modular system with symbolic and connectionist elements. Symbolic methods build modular systems with semantic parsing and task planning modules for producing sequences of actions from natural language requests. Modern connectionist methods employ deep neural networks that learn visual and linguistic features for mapping inputs to a sequence of low-level actions, in an end-to-end fashion. The hybrid, modular system blends these two approaches to create a modular framework: it formulates instruction following as symbolic goal learning via deep neural networks followed by task planning via symbolic planners. Connectionist and symbolic modules are bridged with Planning Domain Definition Language. The vision-and-language learning network predicts its goal representation, which is sent to a planner for producing a task-completing action sequence. For improving the flexibility of natural language, we further incorporate implicit human intents with explicit human instructions. To learn generic features for vision and language, we propose to separately pretrain vision and language encoders on scene graph parsing and semantic textual similarity tasks. Benchmarking evaluates the impacts of different components of, or options for, the vision-and-language learning model and shows the effectiveness of pretraining strategies. Manipulation experiments conducted in the simulator AI2THOR show the robustness of the framework to novel scenarios.

Proceedings ArticleDOI
06 Jul 2022
TL;DR: The authors propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing.
Abstract: In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity inconsistency caused by the removal of delexicalized tokens, as well as the catastrophic forgetting problem of the pre-trained model during fine-tuning, leading to unsatisfactory performance. To alleviate these problems, we design a novel GPT-Adapter-CopyNet network, which incorporates the lightweight adapter and CopyNet modules into GPT-2 to achieve better performance on transfer learning and dialogue entity generation. Experiments conducted on the DSTC8 Track 1 benchmark and MultiWOZ dataset demonstrate that our proposed approach significantly outperforms baseline models, with remarkable performance on automatic and human evaluations.


Journal ArticleDOI
TL;DR: In this paper, the Dual Intent and Entity Transformer (DIET) architecture was fed with pre-trained word embeddings, surpassing other recent proposals in the sentiment analysis field.
Abstract: The Rasa open-source toolkit provides a valuable Natural Language Understanding (NLU) infrastructure to assist the development of conversational agents. In this paper, we show that this infrastructure can seamlessly and effectively be used for other different NLU-related text classification tasks, such as sentiment analysis. The approach is evaluated on three widely used datasets containing movie reviews, namely IMDb, Movie Review (MR) and the Stanford Sentiment Treebank (SST2). The results are consistent across the three databases, and show that even simple configurations of the NLU pipeline lead to accuracy rates that are comparable to those obtained with other state-of-the-art architectures. The best results were obtained when the Dual Intent and Entity Transformer (DIET) architecture was fed with pre-trained word embeddings, surpassing other recent proposals in the sentiment analysis field. In particular, accuracy rates of 0.907, 0.816 and 0.858 were obtained for the IMDb, MR and SST2 datasets, respectively.

Journal ArticleDOI
TL;DR: This article proposes Computational Linguistics with Deep-Learning-Based Intent Detection and Classification (CL-DLBIDC) for natural language understanding and makes use of the deep learning modified neural network (DLMNN) model for intent detection and classification.
Abstract: Computational linguistics explores how human language is interpreted automatically and then processed. Research in this area takes the logical and mathematical features of natural language and advances methods and statistical procedures for automated language processing. Slot filling and intent detection are significant modules in task-based dialogue systems. Intent detection is a critical task in any natural language understanding (NLU) system and constitutes the base of a task-based dialogue system. In order to build high-quality, real-time conversational solutions for edge gadgets, there is a demand for deploying intent-detection methods on devices. This mandates an accurate, lightweight, and fast method that operates effectively in a resource-limited environment. Earlier works have explored the usage of several machine-learning (ML) techniques for detecting intent in user queries. In this article, we propose Computational Linguistics with Deep-Learning-Based Intent Detection and Classification (CL-DLBIDC) for natural language understanding. The presented CL-DLBIDC technique receives word embeddings as input and learns meaningful features to determine the probable intention of the user query. The technique uses the GloVe approach for word embedding and the deep learning modified neural network (DLMNN) model for intent detection and classification. For the hyperparameter tuning process, the mayfly optimization (MFO) algorithm was used in this study. The experimental analysis of the CL-DLBIDC method took place under a set of simulations, and the results were scrutinized for distinct aspects. The simulation outcomes demonstrate the significant performance of the CL-DLBIDC algorithm over other DL models.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a natural language inference method for relation extraction. Given a premise and a hypothesis, the natural language inference task refers to predicting whether the facts in the premise necessarily imply the facts of the hypothesis; the model then infers whether these hypotheses can be concluded from the premise.

Proceedings ArticleDOI
18 Sep 2022
TL;DR: In this article, the authors evaluate ASR output hypotheses quality with SemDist that can measure semantic correctness by using the distance between the semantic vectors of the reference and hypothesis extracted from a pre-trained language model.
Abstract: Measuring automatic speech recognition (ASR) system quality is critical for creating user-satisfying voice-driven applications. Word Error Rate (WER) has been traditionally used to evaluate ASR system quality; however, it sometimes correlates poorly with user perception/judgement of transcription quality. This is because WER weighs every word equally and does not consider semantic correctness which has a higher impact on user perception. In this work, we propose evaluating ASR output hypotheses quality with SemDist that can measure semantic correctness by using the distance between the semantic vectors of the reference and hypothesis extracted from a pre-trained language model. Our experimental results of 71K and 36K user annotated ASR output quality show that SemDist achieves higher correlation with user perception than WER. We also show that SemDist has higher correlation with downstream Natural Language Understanding (NLU) tasks than WER.
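
The contrast between WER and a semantic-distance score can be sketched as follows. WER is computed here as word-level edit distance divided by reference length, which is its standard definition. The semantic side is shown only as a cosine distance between two vectors; in the paper those vectors come from a pre-trained language model, whereas this sketch leaves the encoder out and takes the vectors as given. Cosine distance is an assumed choice of metric, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two sentence vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# One substituted word out of four reference words.
print(wer("turn on the light", "turn of the light"))
```

WER assigns the same penalty to any single-word substitution, regardless of whether it changes the meaning. A distance over sentence vectors can instead stay small when the hypothesis preserves the meaning and grow when it does not, which is the property the abstract attributes to SemDist.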

Proceedings ArticleDOI
23 May 2022
TL;DR: A novel framework, ADVIN, is proposed, to automatically discover novel domains and intents from large volumes of unlabeled text and form a hierarchical intent-domain taxonomy by linking mutually related novel intents into novel domains.
Abstract: Recognizing the intents and domains of users’ spoken and written language is a key component of Natural Language Understanding (NLU) systems. Real applications, however, encounter dynamic, rapidly evolving environments with newly emerging intents and domains, for which no labeled data or prior information is available. For such a setting, we propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled text. We first employ an open classification model to identify all utterances that potentially contain a novel intent. Next, we train a deep learning model with a pairwise margin loss function and knowledge transfer to discover multiple latent intent categories in an unsupervised manner. Finally, we form a hierarchical intent-domain taxonomy by linking mutually related novel intents into novel domains. ADVIN significantly outperforms strong baselines on four benchmark datasets and on data from a real-world voice agent.