
Showing papers on "Natural language published in 2022"


Proceedings ArticleDOI
26 Feb 2022
TL;DR: PolyCoder as mentioned in this paper is a large open-source model trained on a multi-lingual corpus of code, with 2.7B parameters based on the GPT-2 architecture, that was trained on 249GB of code across 12 programming languages on a single machine.
Abstract: Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, even though they are targeted mainly at natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models, including Codex. Our trained models are open-source and publicly available at https://github.com/VHellendoorn/Code-LMs, which enables future research and application in this area. We have an online appendix at https://arxiv.org/abs/2202.13169.

126 citations


Proceedings ArticleDOI
22 Mar 2022
TL;DR: This work built Wordcraft, a text editor in which users collaborate with a generative language model to write a story, and shows that large language models enable novel co-writing experiences.
Abstract: The latest generation of large neural language models such as GPT-3 have achieved new levels of performance on benchmarks for language understanding and generation. These models have even demonstrated an ability to perform arbitrary tasks without explicit training. In this work, we sought to learn how people might use such models in the process of creative writing. We built Wordcraft, a text editor in which users collaborate with a generative language model to write a story. We evaluated Wordcraft with a user study in which participants wrote short stories with and without the tool. Our results show that large language models enable novel co-writing experiences. For example, the language model is able to engage in open-ended conversation about the story, respond to writers' custom requests expressed in natural language (such as "rewrite this text to be more Dickensian"), and generate suggestions that serve to unblock writers in the creative process. Based on these results, we discuss design implications for future human-AI co-writing systems.

63 citations


Journal ArticleDOI
TL;DR: Two complementary reviews of computational linguistics and organizational text mining research are conducted to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted, the research question under investigation, and the data set’s characteristics.
Abstract: Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often...

60 citations


Proceedings ArticleDOI
11 Feb 2022
TL;DR: It is argued that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models, and it is concluded that language models should be trained on text data which was explicitly produced for public use.
Abstract: Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. An adversary can exploit this tendency to extract training data. Depending on the nature of the content and the context in which this data was collected, this could violate expectations of privacy. Thus, there is a growing interest in techniques for training language models that preserve privacy. In this paper, we discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy), and the broadness of natural language and of privacy as a social norm. We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models. We conclude that language models should be trained on text data which was explicitly produced for public use.

52 citations
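The data-protection techniques this paper critiques can be made concrete with a toy sketch of the differential-privacy mechanism: bound each individual's contribution by clipping, then add Gaussian noise to the aggregate, as in DP-SGD-style training. The function name and parameters below are illustrative assumptions, not anything from the paper:

```python
import random

def dp_noisy_sum(values, clip=1.0, noise_scale=1.0, seed=0):
    """Toy differential-privacy mechanism (illustrative, not the paper's):
    clip each contribution to [-clip, clip], then add Gaussian noise
    to the aggregate so no single record dominates the output."""
    rng = random.Random(seed)
    clipped = [max(-clip, min(clip, v)) for v in values]
    return sum(clipped) + rng.gauss(0.0, noise_scale)
```

The paper's point is that this record-level guarantee does not align with privacy as a social norm in natural language, where sensitive information spans many records and contexts.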


Book ChapterDOI
TL;DR: Zhang et al. as mentioned in this paper proposed an end-to-end Transformer-based open-vocabulary detector, which can detect any object given its class name or an exemplar image.
Abstract: Open-vocabulary object detection, which is concerned with the problem of detecting novel objects guided by natural language, has gained increasing attention from the community. Ideally, we would like to extend an open-vocabulary detector such that it can produce bounding box predictions based on user inputs in the form of either natural language or an exemplar image. This offers great flexibility and user experience for human-computer interaction. To this end, we propose a novel open-vocabulary detector based on DETR -- hence the name OV-DETR -- which, once trained, can detect any object given its class name or an exemplar image. The biggest challenge of turning DETR into an open-vocabulary detector is that it is impossible to calculate the classification cost matrix for novel classes without access to their labeled images. To overcome this challenge, we formulate the learning objective as a binary matching one between input queries (class name or exemplar image) and the corresponding objects, which learns useful correspondence to generalize to unseen queries during testing. For training, we choose to condition the Transformer decoder on the input embeddings obtained from a pre-trained vision-language model like CLIP, in order to enable matching for both text and image queries. With extensive experiments on the LVIS and COCO datasets, we demonstrate that our OV-DETR -- the first end-to-end Transformer-based open-vocabulary detector -- achieves non-trivial improvements over the current state of the art.

40 citations
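The binary matching idea in the abstract above can be sketched minimally: score each predicted object embedding against the conditioning query embedding (a CLIP text or image embedding) and emit a match/no-match label. The function names, cosine scoring, and threshold here are illustrative assumptions, not the paper's actual cost formulation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def binary_match(query_emb, pred_embs, threshold=0.5):
    """For each predicted object embedding, emit 1 if it matches the
    conditioning query (class name or exemplar image), else 0."""
    return [1 if cosine(query_emb, p) >= threshold else 0 for p in pred_embs]
```

Because the target is a match/no-match label against the query rather than a fixed class index, the same objective applies to class names the detector never saw labeled boxes for.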


Proceedings ArticleDOI
05 Jan 2022
TL;DR: Three pre-training tasks are introduced that are specifically designed to enable SPT-Code to learn knowledge of source code, the corresponding code structure, as well as a natural language description of the code without relying on any bilingual corpus, and eventually exploit these three sources of information when it is applied to downstream tasks.
Abstract: Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application to SE tasks. First, the majority of the pre-trained models focus on pre-training only the encoder of the Transformer. For generation tasks that are addressed using models with the encoder-decoder architecture, however, there is no reason why the decoder should be left out during pre-training. Second, many existing pre-trained models, including state-of-the-art models such as T5-learning, simply reuse the pre-training tasks designed for natural languages. Moreover, to learn the natural language description of source code needed eventually for code-related tasks such as code summarization, existing pre-training tasks require a bilingual corpus composed of source code and the associated natural language description, which severely limits the amount of data for pre-training. To this end, we propose SPT-Code, a sequence-to-sequence pre-trained model for source code. In order to pre-train SPT-Code in a sequence-to-sequence manner and address the aforementioned weaknesses associated with existing pre-training tasks, we introduce three pre-training tasks that are specifically designed to enable SPT-Code to learn knowledge of source code, the corresponding code structure, as well as a natural language description of the code without relying on any bilingual corpus, and eventually exploit these three sources of information when it is applied to downstream tasks. Experimental results demonstrate that SPT-Code achieves state-of-the-art performance on five code-related downstream tasks after fine-tuning.

37 citations
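One way to obtain a natural language description of code without a bilingual corpus, as the abstract requires, is to derive it from the code's own identifiers. The sketch below is a guess at that idea rather than the paper's exact pre-training task: it splits a camelCase or snake_case method name into a crude description:

```python
import re

def name_to_description(method_name):
    """Derive a rough natural-language description of a function from
    its name alone, avoiding any parallel code/NL corpus. Illustrative
    sketch; not SPT-Code's actual pre-training task definition."""
    spaced = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', method_name)
    return spaced.replace('_', ' ').lower()
```

For example, `name_to_description("getUserName")` yields a usable pseudo-description without any human-written summary being available.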


Proceedings ArticleDOI
27 Feb 2022-Findings
TL;DR: A novel lightweight framework for controllable GPT2 generation, which utilizes a set of small attribute-specific vectors, called prefixes, to steer natural language generation, and shows that its methods can guide generation towards the desired attributes while keeping high linguistic quality.
Abstract: To guide the generation of large pretrained language models (LM), previous work has focused on directly fine-tuning the language model or utilizing an attribute discriminator. In this work, we propose a novel lightweight framework for controllable GPT2 generation, which utilizes a set of small attribute-specific vectors, called prefixes (Li and Liang, 2021), to steer natural language generation. Different from Li and Liang (2021), where each prefix is trained independently, we take the relationship among prefixes into consideration and train multiple prefixes simultaneously. We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control. Experimental results on both single-aspect and multi-aspect control show that our methods can guide generation towards the desired attributes while keeping high linguistic quality.

36 citations
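The prefix mechanism described above can be illustrated with toy single-head attention: trainable attribute-specific key/value vectors are prepended to the frozen model's keys and values, steering the output without updating any LM weights. Dimensions and function names are illustrative assumptions:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head dot-product attention with scalar values."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    return sum(w * v for w, v in zip(weights, values))

def attend_with_prefix(query, keys, values, prefix_k, prefix_v):
    """Prefix-tuning sketch: prepend trainable prefix key/value pairs;
    the attention computation (the frozen LM) is unchanged, yet the
    output shifts toward what the prefix encodes."""
    return attend(query, prefix_k + keys, prefix_v + values)
```

Training only the small prefix vectors, rather than the whole LM, is what makes the framework lightweight, and training several prefixes jointly is the paper's extension for multi-aspect control.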


Proceedings ArticleDOI
15 Jun 2022
TL;DR: This paper proposes a new pre-training objective, “Naturalizing” of source code, exploiting code’s bimodal, dual-channel (formal & natural channels) nature; it introduces six classes of semantic-preserving transformations to create unnatural forms of code, and forces the model to recover the more natural original programs written by developers.
Abstract: Pre-trained generative language models (e.g., PLBART, CodeT5, SPT-Code) for source code yielded strong results on several tasks in the past few years, including code generation and translation. These models have adopted varying pre-training objectives to learn statistics of code construction from very large-scale corpora in a self-supervised fashion; the success of pre-trained models largely hinges on these pre-training objectives. This paper proposes a new pre-training objective, “Naturalizing” of source code, exploiting code’s bimodal, dual-channel (formal & natural channels) nature. Unlike natural language, code’s bimodal, dual-channel nature allows us to generate semantically equivalent code at scale. We introduce six classes of semantic-preserving transformations to create unnatural forms of code, and then force our model to produce the more natural original programs written by developers. Learning to generate equivalent, but more natural, code at scale, over large corpora of open-source code, without explicit manual supervision, helps the model learn to both ingest & generate code. We fine-tune our model on three generative software engineering tasks: code generation, code translation, and code refinement with limited human-curated labeled data, and achieve state-of-the-art performance rivaling CodeT5. We show that our pre-trained model is especially competitive at zero-shot and few-shot learning, and better at learning code properties (e.g., syntax, data flow).

34 citations
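The abstract does not enumerate the six transformation classes, but two plausible examples of semantic-preserving "de-naturalizing" transforms, operand swapping and variable renaming, can be sketched as regex-level rewrites on raw source strings. Both are illustrative guesses at the transform classes, not the paper's implementations:

```python
import re

def swap_operands(src):
    """Flip `a < b` into the semantically equivalent but less
    conventional `b > a` (illustrative transform, regex-level only)."""
    return re.sub(r'\b(\w+)\s*<\s*(\w+)\b', r'\2 > \1', src)

def rename_variables(src, mapping):
    """Replace meaningful identifiers with opaque ones, making the
    code less natural while preserving its semantics."""
    for old, new in mapping.items():
        src = re.sub(rf'\b{re.escape(old)}\b', new, src)
    return src
```

The pre-training signal then comes for free: the transformed code is the input, and the original developer-written code is the target the model must recover.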


Proceedings ArticleDOI
22 Mar 2022
Abstract: A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we also highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.

30 citations


Journal ArticleDOI
TL;DR: This paper proposes a Context-Aware Visual Policy Network (CAVP) for fine-grained image-to-language generation, which explicitly considers the previous visual attentions as context and decides whether the context is used for the current word/sentence generation given the current visual attention.
Abstract: With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning. In particular, we are interested in generating longer, richer and more fine-grained sentences and paragraphs as image descriptions. Image captioning can be translated to the task of sequential language prediction given visual content, where the output sequence forms natural language description with plausible grammar. However, existing image captioning methods focus only on the language policy and not the visual policy, and thus fail to capture the visual context that is crucial for compositional reasoning such as object relationships (e.g., "man riding horse") and visual comparisons (e.g., "small(er) cat"). This issue is especially severe when generating longer sequences such as a paragraph. To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for fine-grained image-to-language generation: image sentence captioning and image paragraph captioning. During captioning, CAVP explicitly considers the previous visual attentions as context, and decides whether the context is used for the current word/sentence generation given the current visual attention. Compared against the traditional visual attention mechanism that only fixes a single visual region at each step, CAVP can attend to complex visual compositions over time. The whole image captioning model, CAVP and its subsequent language policy network, can be efficiently optimized end-to-end by using an actor-critic policy gradient method. We have demonstrated the effectiveness of CAVP with state-of-the-art performance on the MS-COCO and Stanford captioning datasets, using various metrics and sensible visualizations of qualitative visual context.

29 citations


Journal ArticleDOI
TL;DR: The authors proposed a recursive grounding tree (RvG-Tree) model, which is inspired by the intuition that any language expression can be recursively decomposed into two constituent parts, and the grounding confidence score can be accumulated by combining the grounding scores returned by the two sub-trees.
Abstract: Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained language compositions. However, existing solutions merely rely on the association between the holistic language features and visual features, while neglecting the nature of composite reasoning implied in the language. In this paper, we propose a natural language grounding model that can automatically compose a binary tree structure for parsing the language and then perform visual reasoning along the tree in a bottom-up fashion. We call our model RvG-Tree: Recursive Grounding Tree, which is inspired by the intuition that any language expression can be recursively decomposed into two constituent parts, and the grounding confidence score can be recursively accumulated by calculating the grounding scores returned by the two sub-trees. RvG-Tree can be trained end-to-end by using the Straight-Through Gumbel-Softmax estimator, which allows the gradients from the continuous score functions to pass through the discrete tree construction. Experiments on several benchmarks show that our model achieves state-of-the-art performance with more explainable reasoning.
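The recursive accumulation of grounding scores can be sketched over a binary tree of phrases. The leaf scorer and the combination by multiplication below are illustrative assumptions, not the paper's exact scoring rule:

```python
def ground_score(node, leaf_score):
    """Recursive grounding sketch: a leaf phrase is scored directly
    against the image by `leaf_score`; an internal node combines its
    two sub-trees' scores (multiplication is illustrative only)."""
    if isinstance(node, str):      # leaf: a language fragment
        return leaf_score(node)
    left, right = node             # internal node: binary decomposition
    return ground_score(left, leaf_score) * ground_score(right, leaf_score)
```

For instance, "the black dog on the left of the tree" might decompose into ("the black dog", ("left of", "the tree")), so the confidence of the full expression accumulates bottom-up from the fragments, mirroring the compositional reasoning the paper argues holistic-feature methods miss.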

Posted ContentDOI
22 Dec 2022-bioRxiv
TL;DR: This paper showed that language models can learn a deep grammar that enables the design of protein structure, extending beyond natural proteins, and showed that such a grammar can be used to generate de novo proteins.
Abstract: Learning the design patterns of proteins from sequences across evolution may have promise toward generative protein design. However, it is unknown whether language models, trained on sequences of natural proteins, will be capable of more than memorization of existing protein families. Here we show that language models generalize beyond natural proteins to generate de novo proteins. We focus on two protein design tasks: fixed backbone design where the structure is specified, and unconstrained generation where the structure is sampled from the model. Remarkably, although the models are trained only on sequences, we find that they are capable of designing structure. A total of 228 generated proteins are evaluated experimentally with high overall success rates (152/228 or 67%) in producing a soluble and monomeric species by size exclusion chromatography. Out of 152 experimentally successful designs, 35 have no significant sequence match to known natural proteins. Of the remaining 117, sequence identity to the nearest sequence match is at median 27%, below 20% for 6 designs, and as low as 18% for 3 designs. For fixed backbone design, the language model generates successful designs for each of eight experimentally evaluated artificially created fixed backbone targets. For unconstrained generation, sampled proteins cover diverse topologies and secondary structure compositions, and have a high experimental success rate (71/129 or 55%). The designs reflect deep patterns linking sequence and structure, including motifs that occur in related natural structures, and motifs that are not observed in similar structural contexts in known protein families. The results show that language models, though only trained on sequences, learn a deep grammar that enables the design of protein structure, extending beyond natural proteins.

Book ChapterDOI
TL;DR: The language model's capabilities for causal association among events expressed in natural language text are investigated using sentence context combined with event information, and by leveraging masked event context with in-domain and out-of-domain data distribution.
Abstract: Causality understanding between events is a critical natural language processing task that is helpful in many areas, including health care, business risk management, and finance. On close examination, one can find a huge amount of textual content, both in the form of formal documents and in content arising from social media like Twitter, dedicated to communicating and exploring various types of causality in the real world. Recognizing these “cause-effect” relationships between natural language events continues to remain a challenge simply because they are often expressed implicitly. Implicit causality is hard to detect through most of the techniques employed in the literature and can also, at times, be perceived as ambiguous or vague. Also, although well-known datasets do exist for this problem, the examples in them are limited in the range and complexity of the causal relationships they depict, especially when related to implicit relationships. Most of the contemporary methods are either based on lexico-semantic pattern matching or are feature-driven supervised algorithms. Therefore, these methods are more geared towards handling explicit causal relationships, leading to limited coverage for implicit relationships, and are hard to generalize. In this paper, we investigate the language model’s capabilities for causal association among events expressed in natural language text using sentence context combined with event information, and by leveraging masked event context with in-domain and out-of-domain data distribution. Our proposed methods achieve state-of-the-art performance in three different data distributions and can be leveraged for extraction of a causal diagram and/or building a chain of events from unstructured text.

Proceedings ArticleDOI
11 Feb 2022
TL;DR: This paper proposes a conversational User Simulator, called USi, for automatic evaluation of conversational search systems, capable of automatically answering clarifying questions about the topic throughout the search session, and shows that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
Abstract: Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search systems. However, evaluation of such systems through answering prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In this paper, we propose a conversational User Simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers. Moreover, we make the first steps towards multi-turn interactions, where conversational search systems ask multiple questions to the (simulated) user with the goal of clarifying the user need. To this end, we expand on currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing a crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT2-based model is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting and discuss the capabilities of our model in the multi-turn setting. We provide the code, data, and the pre-trained model to be used for further research on the topic.

Proceedings ArticleDOI
29 Apr 2022
TL;DR: A natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts that create or change code is presented, indicating that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges.
Abstract: In this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a large generative language model and 2) a set of task-specific prompts that create or change code. To understand the user experience of natural language code synthesis with these new types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s “syntax,” despite their input being natural language. Participants also struggled to form an accurate mental model of the types of requests the model can reliably translate and developed a set of strategies to debug model input. From these findings, we discuss design implications for future natural language code synthesis tools built using large generative language models.

Proceedings ArticleDOI
10 Aug 2022
TL;DR: A novel pretraining objective is proposed which explicitly models edits and is used to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments.
Abstract: Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
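The idea of explicitly modeling edits, rather than regenerating whole sequences, can be sketched with a token-level diff. The `<del>`/`<add>` tags and the `difflib`-based extraction below are illustrative; they are not CoditT5's actual pretraining format:

```python
import difflib

def edit_plan(old_tokens, new_tokens):
    """Represent a revision as explicit delete/add operations instead
    of generating the revised sequence from scratch. Illustrative
    sketch of edit modeling, not CoditT5's output representation."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            ops.append(("<del>", old_tokens[i1:i2]))
        if tag in ("replace", "insert"):
            ops.append(("<add>", new_tokens[j1:j2]))
    return ops
```

A model trained on such targets must reason about what changed and where, which is the suitability-for-editing property the abstract argues plain generation models lack.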

Journal ArticleDOI
TL;DR: This state-of-the-art report investigates the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies.
Abstract: Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. Especially, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions.

Journal ArticleDOI
TL;DR: In this article, a conceptual model for the semantic content conveyed by natural language descriptions of visualizations is introduced, which spans four levels of semantic content: enumerating visualization construction properties, reporting statistical concepts and relations, identifying perceptual and cognitive phenomena, and elucidating domain-specific insights.
Abstract: Natural language descriptions sometimes accompany visualizations to better communicate and contextualize their insights, and to improve their accessibility for readers with disabilities. However, it is difficult to evaluate the usefulness of these descriptions, and how effectively they improve access to meaningful information, because we have little understanding of the semantic content they convey, and how different readers receive this content. In response, we introduce a conceptual model for the semantic content conveyed by natural language descriptions of visualizations. Developed through a grounded theory analysis of 2,147 sentences, our model spans four levels of semantic content: enumerating visualization construction properties (e.g., marks and encodings); reporting statistical concepts and relations (e.g., extrema and correlations); identifying perceptual and cognitive phenomena (e.g., complex trends and patterns); and elucidating domain-specific insights (e.g., social and political context). To demonstrate how our model can be applied to evaluate the effectiveness of visualization descriptions, we conduct a mixed-methods evaluation with 30 blind and 90 sighted readers, and find that these reader groups differ significantly on which semantic content they rank as most useful. Together, our model and findings suggest that access to meaningful information is strongly reader-specific, and that research in automatic visualization captioning should orient toward descriptions that more richly communicate overall trends and statistics, sensitive to reader preferences. Our work further opens a space of research on natural language as a data interface coequal with visualization.

Journal ArticleDOI
TL;DR: A recent survey as mentioned in this paper summarizes the current progress of adversarial techniques in the natural language processing field and categorizes textual adversarial attacks according to the combination of semantic granularity and example generation strategy.

Journal ArticleDOI
TL;DR: Results obtained show that a hybrid approach exploiting both natural language dialogue and block-based interaction can help make the programming task easy and efficient for non-technical users.
Abstract: The research reported in this paper proposes a new approach to collaborative robots that aims at improving the simplicity and efficiency of the programming task for non-technical users. It is grounded on three standpoints: (i) an elementary and disciplined paradigm for robot programming, called the simple programming journey, (ii) a hybrid interaction mode where robot tasks can be programmed using a natural language chat and, if necessary, can be completed and finalized through a block-based interface, and (iii) a robust cognitive match between the mental models of the user and the programming interface. The proposed approach has been implemented and tested through the development of a prototype programming environment called CAPIRCI, which can be tailored to different application domains through the definition of objects, locations, and actions. CAPIRCI has been tested by real users with a COBOTTA robot by DENSO WAVE Ltd. Two experimental tests have been carried out in order to validate the novel approach proposed and to assess its impact on end-user programming. The results obtained show that a hybrid approach exploiting both natural language dialogue and block-based interaction can help make the programming task easy and efficient for non-technical users.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method to generate natural and relevant questions from various input formats, e.g., natural language, text, and images, using a question generation model.
Abstract: Question generation is an important yet challenging problem in Artificial Intelligence (AI), which aims to generate natural and relevant questions from various input formats, e.g., natural language...

Journal ArticleDOI
TL;DR: This research aims to analyze the possibilities and challenges of implementing a chatbot in the Kumaon language with end-to-end encryption so that the service user has good security.
Abstract: Natural language processing is one of the essential activities of artificial intelligence. There is a pressing need for chatbots that can be integrated into artificial intelligence applications, and supporting a local language makes that process easier. Our research aims to analyze the possibilities and challenges of implementing a chatbot in the Kumaon language. We also provide a detailed survey of the Kumaon language and map it to other languages to make it easier to process for industrial use. This chatbot can help with various needs and services in the Kumaon language. The method used in this research is an analytical study of the Kumaon language, motivated by the threat of language extinction. The novelty of this research is a chatbot in the Kumaon language with end-to-end encryption, so that service users have good security.

Journal ArticleDOI
TL;DR: The authors showed that a domain general learning setup, originally developed in cognitive psychology to model rule learning, is able to acquire key pieces of natural language from relatively few examples of sentences, highlighting some features of language and language acquisition that may arise from general cognitive processes.
Abstract: Significance: It has long been hypothesized that language acquisition may be impossible without innate knowledge of the structures that occur in natural language. Here, we show that a domain-general learning setup, originally developed in cognitive psychology to model rule learning, is able to acquire key pieces of natural language from relatively few examples of sentences. This develops a new approach to formalizing linguistic learning and highlights some features of language and language acquisition that may arise from general cognitive processes.

Proceedings ArticleDOI
01 Jan 2022
TL;DR: Through an audit of recent work on computational approaches for predicting morality, this work examines the broader issues that arise from such efforts and offers a critique of such NLP methods for automating ethical decision-making.
Abstract: Ethics is one of the longest standing intellectual endeavors of humanity. In recent years, the fields of AI and NLP have attempted to address issues of harmful outcomes in machine learning systems that are made to interface with humans. One recent approach in this vein is the construction of NLP morality models that can take in arbitrary text and output a moral judgment about the situation described. In this work, we offer a critique of such NLP methods for automating ethical decision-making. Through an audit of recent work on computational approaches for predicting morality, we examine the broader issues that arise from such efforts. We conclude with a discussion of how machine ethics could usefully proceed in NLP, by focusing on current and near-future uses of technology, in a way that centers around transparency, democratic values, and allows for straightforward accountability.

Journal ArticleDOI
TL;DR: The authors proposed a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain, which consists of two sub-models, namely intent classifier and argument similarity.
Abstract: This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion-building domain. The proposed framework consists of two sub-models, namely an intent classifier and an argument-similarity model. The intent classifier stacks a BiLSTM with an attention mechanism on top of a pre-trained BERT model and fine-tunes the model to recognize the user intent, whereas the argument-similarity model employs BERT+BiLSTM to identify the system arguments the user refers to in his or her natural language utterances. Our model is evaluated in an argumentative dialogue system that engages the user in informing him- or herself about a controversial topic by exploring pro and con arguments and building an opinion on the topic. In order to evaluate the proposed approach, we collect user utterances for the interaction with the respective system, labeling intent and referenced argument, in an extensive online study. The data collection includes multiple topics and two different user types (native English speakers from the UK and non-native English speakers from China). Additionally, we evaluate the proposed intent classifier and argument-similarity models separately on the publicly available Banking77 and STS benchmark datasets. The evaluation indicates a clear advantage of the utilized techniques over baseline approaches on several datasets, as well as the robustness of the proposed approach to new topics and to the language proficiency and cultural background of the user. Furthermore, results show that our intent classifier model outperforms DIET, DistilBERT, and BERT fine-tuned models in few-shot setups (i.e., with 10, 20, or 30 labeled examples per intent) and in the full-data setup.
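The attention mechanism placed on top of the BiLSTM can be sketched in a few lines: token states are scored against a learned context vector, softmax-normalized, and summed into a single utterance vector that a linear layer maps to intent logits. Random weights below stand in for the pre-trained BERT + BiLSTM stack, which is omitted for brevity; shapes and names are our own illustration.

```python
import numpy as np

# Attention pooling over per-token hidden states, as used in the intent
# classifier. Random tensors replace the trained BERT/BiLSTM encoder.

rng = np.random.default_rng(0)
T, H, n_intents = 6, 8, 4            # tokens, hidden size, number of intents

states = rng.normal(size=(T, H))     # stand-in for BiLSTM outputs per token
context = rng.normal(size=H)         # learned attention context vector
W = rng.normal(size=(H, n_intents))  # classification head weights
b = np.zeros(n_intents)

scores = states @ context                     # (T,) unnormalized token scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                      # softmax attention weights
utterance = weights @ states                  # (H,) attention-pooled vector
logits = utterance @ W + b                    # (n_intents,) intent logits
intent = int(np.argmax(logits))
```

In training, `context`, `W`, and `b` would be learned jointly with the BiLSTM while the underlying BERT encoder is fine-tuned.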

Book ChapterDOI
TL;DR: In this paper, the authors introduce a new dataset, Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.
Abstract: Vision-language navigation (VLN), in which an agent follows language instruction in a visual environment, has been studied under the premise that the input command is fully feasible in the environment. Yet in practice, a request may not be possible due to language ambiguity or environment changes. To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app. Mobile apps provide a scalable domain to study real downstream uses of VLN methods. Moreover, mobile app commands provide instruction for interactive navigation, as they result in action sequences with state changes via clicking, typing, or swiping. MoTIF is the first to include feasibility annotations, containing both binary feasibility labels and fine-grained labels for why tasks are unsatisfiable. We further collect follow-up questions for ambiguous queries to enable research on task uncertainty resolution. Equipped with our dataset, we propose the new problem of feasibility prediction, in which a natural language instruction and multimodal app environment are used to predict command feasibility. MoTIF provides a more realistic app dataset as it contains many diverse environments, high-level goals, and longer action sequences than prior work. We evaluate interactive VLN methods using MoTIF, quantify the generalization ability of current approaches to new app environments, and measure the effect of task feasibility on navigation performance.
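The feasibility-prediction task MoTIF introduces can be reduced to a toy baseline: given a command and the set of UI elements visible in the app, predict whether the command can be completed. The rule below (some command word names a visible element) is a deliberately crude stand-in for the multimodal models evaluated in the paper; the screen contents are invented.

```python
# Toy feasibility predictor: a command is deemed feasible if it mentions
# an element that is actually visible in the current app screen.
# This is an illustrative baseline, not the paper's model.

def is_feasible(command, visible_elements):
    words = set(command.lower().split())
    return any(element.lower() in words for element in visible_elements)

app_screen = ["search", "cart", "settings"]
assert is_feasible("tap the cart icon", app_screen)
assert not is_feasible("open the coupons page", app_screen)
```

The hard cases, of course, are the ones this baseline misses: commands whose targets are paraphrased, ambiguous, or become infeasible only after several navigation steps.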

Journal ArticleDOI
TL;DR: The authors describe the nature and capabilities of GPT-3 and offer some suggestions for how instructors might meet the challenges posed by the systems' rapid improvement and their availability to students.
Abstract: AI-based natural language production systems are currently able to produce unique text with minimal human intervention. Because such systems are improving at a very fast pace, teachers who expect students to produce their own writing—engaging in the complex processes of generating and organizing ideas, researching topics, drafting coherent prose, and using feedback to make principled revisions that both improve the quality of the text and help them to develop as writers—will confront the prospect that students can use the systems to produce human-looking text without engaging in these processes. In this article, we first describe the nature and capabilities of AI-based natural language production systems such as GPT-3, then offer some suggestions for how instructors might meet the challenges of the increasing improvement of the systems and their availability to students.

Proceedings ArticleDOI
25 Jul 2022
TL;DR: A novel model named MHST is developed that takes into account the information in multi-modalities to intelligently address different types of questions with corresponding strategies, i.e., extraction or reasoning.
Abstract: Document Visual Question Answering (VQA) aims to answer questions over visually rich documents. In this work, we introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages comprising semi-structured table(s) and unstructured text as well as 16,558 question-answer pairs. The documents are sampled from financial reports and contain lots of numbers, which means discrete reasoning capability is demanded to answer the questions. Based on TAT-DQA, we further develop a novel model named MHST that takes into account the information in multiple modalities to intelligently address different types of questions with corresponding strategies, i.e., extraction or reasoning. The experiments show that the MHST model significantly outperforms the baseline methods, demonstrating its effectiveness. However, the performance still lags far behind that of expert humans. We expect that our TAT-DQA dataset will facilitate research on the understanding of visually rich documents, especially for scenarios that require discrete reasoning. We also hope the proposed model will inspire researchers to design more advanced Document VQA models in the future.
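The extract-vs-reason routing idea can be made concrete with a toy router: questions that call for arithmetic over table numbers go to a reasoning branch, the rest to a span-extraction branch. The keyword heuristic below is our own illustration and is far simpler than the learned classifier in the paper.

```python
# Toy question router in the spirit of MHST's strategy selection:
# arithmetic cue words send a question to the "reasoning" branch,
# everything else to the "extraction" branch. Cue list is hypothetical.

REASONING_CUES = {"difference", "sum", "total", "change", "percentage", "average"}

def route(question):
    tokens = set(question.lower().replace("?", "").split())
    return "reasoning" if tokens & REASONING_CUES else "extraction"

assert route("What was the total revenue in 2020?") == "reasoning"
assert route("Who audited the financial statements?") == "extraction"
```

The real model conditions this decision on the multimodal document representation, not just on surface cue words.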

Proceedings ArticleDOI
05 Mar 2022
TL;DR: In this article, an approach based on Neural Machine Translation (NMT) was proposed to automatically generate shellcode snippets from descriptions in natural language, and an empirical study using a novel dataset (Shellcode_IA32) was conducted to evaluate the accuracy of NMT at generating shellcodes.
Abstract: Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, writing shellcodes is especially time-consuming and technically challenging, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that in many cases it can generate entire shellcodes with no errors.
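One way to make an accuracy metric for generated assembly concrete is a snippet-level match after light normalization: a predicted snippet counts as correct if it equals the reference once whitespace and case differences are removed. The normalization rules below are our own illustration, not the paper's proposed metrics.

```python
# Sketch of a snippet-level accuracy metric for NMT-generated assembly:
# normalize case and whitespace, then count exact matches against references.
# Example snippets are generic x86 instructions, not real shellcode.

def normalize(snippet):
    return " ".join(snippet.lower().split())

def snippet_accuracy(predictions, references):
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["XOR EAX, EAX", "mov ebx,  0", "push eax"]
refs = ["xor eax, eax", "mov ebx, 0", "pop eax"]
print(snippet_accuracy(preds, refs))  # two of three snippets match
```

Semantically aware metrics (e.g., tolerating register renaming or equivalent instruction sequences) would be stricter to define but closer to what "correct shellcode" actually means.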

Journal ArticleDOI
TL;DR: In this paper, an explainable classifier based on a data-driven predictive model is proposed; it accounts for the imprecision related to model outputs by summarizing them into simple linguistic statements and by including additional domain knowledge in the form of middle-layer labels.