
Showing papers on "Natural language understanding published in 2018"


Proceedings ArticleDOI
01 Nov 2018
TL;DR: GLUE (gluebenchmark.com) is a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models.
Abstract: Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understanding beyond the detection of superficial correspondences between inputs and outputs, then it is critical to develop a unified model that can execute a range of linguistic tasks across different domains. To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. For some benchmark tasks, training data is plentiful, but for others it is limited or does not match the genre of the test set. GLUE thus favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks. While none of the datasets in GLUE were created from scratch for the benchmark, four of them feature privately-held test data, which is used to ensure that the benchmark is used fairly. We evaluate baselines that use ELMo (Peters et al., 2018), a powerful transfer learning technique, as well as state-of-the-art sentence representation models. The best models still achieve fairly low absolute scores. Analysis with our diagnostic dataset yields similarly weak performance over all phenomena tested, with some exceptions.

3,225 citations
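The GLUE leaderboard condenses the nine tasks into a single score by macro-averaging per-task metrics. A minimal sketch of that aggregation (the task names follow GLUE, but the scores below are illustrative, not actual leaderboard values):

```python
def glue_macro_average(task_scores):
    """Macro-average per-task scores into a single benchmark score.

    Tasks with multiple metrics (e.g. accuracy and F1) are first
    averaged internally, then each task contributes equally.
    """
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)

# Illustrative scores for three of the nine tasks (not real results).
scores = {
    "CoLA": [0.35],          # Matthews correlation
    "MRPC": [0.80, 0.85],    # accuracy, F1
    "STS-B": [0.70, 0.71],   # Pearson, Spearman
}
print(round(glue_macro_average(scores), 4))
```

Averaging within a task first keeps two-metric tasks like MRPC from counting double against single-metric tasks like CoLA.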


Proceedings Article
20 Apr 2018
TL;DR: A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Abstract: Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understanding beyond the detection of superficial correspondences between inputs and outputs, then it is critical to develop a unified model that can execute a range of linguistic tasks across different domains. To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. For some benchmark tasks, training data is plentiful, but for others it is limited or does not match the genre of the test set. GLUE thus favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks. While none of the datasets in GLUE were created from scratch for the benchmark, four of them feature privately-held test data, which is used to ensure that the benchmark is used fairly. We evaluate baselines that use ELMo (Peters et al., 2018), a powerful transfer learning technique, as well as state-of-the-art sentence representation models. The best models still achieve fairly low absolute scores. Analysis with our diagnostic dataset yields similarly weak performance over all phenomena tested, with some exceptions.

2,167 citations


Posted Content
TL;DR: AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily and provides a flexible data API that handles intelligent batching and padding, and a modular and extensible experiment framework that makes doing good science easy.
Abstract: This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is built on top of PyTorch, allowing for dynamic computation graphs, and provides (1) a flexible data API that handles intelligent batching and padding, (2) high-level abstractions for common operations in working with text, and (3) a modular and extensible experiment framework that makes doing good science easy. It also includes reference implementations of high quality approaches for both core semantic problems (e.g. semantic role labeling (Palmer et al., 2005)) and language understanding applications (e.g. machine comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing open-source effort maintained by engineers and researchers at the Allen Institute for Artificial Intelligence.

767 citations
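The "intelligent batching and padding" the abstract credits to AllenNLP's data API can be pictured library-independently: group sequences of similar length so little padding is wasted. A stdlib sketch of that idea with hypothetical token-id sequences (this is not the AllenNLP API itself):

```python
def pad_batch(token_id_seqs, pad_id=0):
    """Pad variable-length sequences to the longest in the batch."""
    max_len = max(len(s) for s in token_id_seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in token_id_seqs]

def bucket_batches(seqs, batch_size):
    """Sort by length so each batch wastes little padding."""
    ordered = sorted(seqs, key=len)
    return [pad_batch(ordered[i:i + batch_size])
            for i in range(0, len(ordered), batch_size)]

batches = bucket_batches([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]], 2)
print(batches)
```

Without the length sort, the length-1 and length-4 sequences could land in the same batch and triple its padding.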


Posted Content
TL;DR: The machine learning architecture of the Snips Voice Platform is presented, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices that is fast and accurate while enforcing privacy by design, as no personal user data is ever collected.
Abstract: This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices. The embedded inference is fast and accurate while enforcing privacy by design, as no personal user data is ever collected. Focusing on Automatic Speech Recognition and Natural Language Understanding, we detail our approach to training high-performance Machine Learning models that are small enough to run in real-time on small devices. Additionally, we describe a data generation procedure that provides sufficient, high-quality training data without compromising user privacy.

566 citations
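The NLU half of such a platform maps an utterance to an intent plus slot values. A toy regex-based sketch of that input/output contract (the patterns and intent names are invented for illustration; the trained Snips engine generalizes far beyond fixed patterns):

```python
import re

# Toy grammar: illustrative patterns, not the Snips training pipeline.
INTENT_PATTERNS = {
    "SetTimer": re.compile(r"set a timer for (?P<duration>\d+ (?:minutes|seconds))"),
    "GetWeather": re.compile(r"weather in (?P<city>[a-z ]+)"),
}

def parse(utterance):
    """Return intent name and slot values for a lowercase utterance."""
    for intent, pattern in INTENT_PATTERNS.items():
        m = pattern.search(utterance)
        if m:
            return {"intent": intent, "slots": m.groupdict()}
    return {"intent": None, "slots": {}}

print(parse("please set a timer for 10 minutes"))
```

The point of the sketch is the output shape: downstream IoT code consumes a small structured frame, never the raw audio or text.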


Proceedings Article
25 Apr 2018
TL;DR: This work couples sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis.
Abstract: With the recent development of deep learning, research in AI has gained new vigor and prominence. While machine learning has succeeded in revitalizing many research fields, such as computer vision, speech recognition, and medical diagnosis, we are yet to witness impressive progress in natural language understanding. One of the reasons behind this unmatched expectation is that, while a bottom-up approach is feasible for pattern recognition, reasoning and understanding often require a top-down approach. In this work, we couple sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis. In particular, we employ recurrent neural networks to infer primitives by lexical substitution and use them for grounding common and commonsense knowledge by means of multi-dimensional scaling.

340 citations


Posted Content
TL;DR: The General Language Understanding Evaluation benchmark (GLUE) is a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks; it incentivizes sharing knowledge across tasks because some tasks have very limited training data.
Abstract: For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.

261 citations


Proceedings Article
01 Aug 2018
TL;DR: The authors proposed an evaluation methodology consisting of automatically constructed "stress tests" that allow them to examine whether systems have the ability to make real inferential decisions, and evaluated six sentence-encoder models on these stress tests.
Abstract: Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well at standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed “stress tests” that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.

208 citations
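One family of such stress tests appends a distracting tautology to the hypothesis, which should leave the gold label unchanged; models that rely on shallow word overlap often flip their prediction anyway. A minimal sketch with an invented premise/hypothesis pair:

```python
def word_overlap_stress(premise, hypothesis, label):
    """Append a tautology to the hypothesis; the gold label must not change.

    A model making real inferential decisions should be unaffected,
    since 'and true is true' adds no semantic content.
    """
    return premise, hypothesis.rstrip(".") + " and true is true.", label

p, h, y = word_overlap_stress(
    "A man is playing a guitar.",
    "A man is making music.",
    "entailment",
)
print(h)
```

Because the transformation is automatic, a stress set of any size can be generated from an existing NLI dataset.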


Proceedings ArticleDOI
Dmitriy Serdyuk1, Yongqiang Wang1, Christian Fuegen1, Anuj Kumar1, Baiyang Liu1, Yoshua Bengio1 
15 Apr 2018
TL;DR: This study showed that the trained model can achieve reasonably good results and demonstrated that the model can capture the semantic meaning directly from the audio features.
Abstract: Spoken language understanding systems are traditionally designed as a pipeline of components. First, the audio signal is processed by an automatic speech recognizer to produce a transcription or n-best hypotheses. From the recognition results, a natural language understanding system classifies the text into structured data, such as domain, intent, and slots, for downstream consumers such as dialog systems and hands-free applications. These components are usually developed and optimized independently. In this paper, we present our study of an end-to-end learning system for spoken language understanding. With this unified approach, we can infer the semantic meaning directly from audio features, without the intermediate text representation. This study showed that the trained model can achieve reasonably good results and demonstrated that the model can capture the semantic meaning directly from the audio features.

189 citations


Journal ArticleDOI
TL;DR: The unsupervised Atom2Vec model learns the basic properties of atoms by itself from an extensive database of known compounds and materials; the learned properties are represented as high-dimensional vectors, and clustering the atoms in vector space classifies them into meaningful groups consistent with human knowledge.
Abstract: Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques have brought human-competitive performance to tasks in various fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, an AI player has already beaten human world champions convincingly, both with and without learning from humans. In this work, we show that our unsupervised machine (Atom2Vec) can learn the basic properties of atoms by itself from an extensive database of known compounds and materials. These learned properties are represented as high-dimensional vectors, and clustering the atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which demonstrate significant accuracy.

173 citations
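The word2vec analogy can be sketched with plain co-occurrence counts: treat each known compound as a "sentence" of atoms and describe each atom by its neighbors. The toy formulas below are illustrative, and Atom2Vec itself learns dense vectors rather than raw counts:

```python
from collections import Counter, defaultdict

def atom_vectors(compounds):
    """Build a count vector for each atom from the other atoms it
    co-occurs with in known compounds (a bag-of-environments sketch)."""
    counts = defaultdict(Counter)
    for formula in compounds:
        for atom in formula:
            for other in formula:
                if other != atom:
                    counts[atom][other] += 1
    return counts

# Toy "database": compounds as sets of element symbols (illustrative).
vecs = atom_vectors([{"Na", "Cl"}, {"K", "Cl"}, {"Na", "Br"}, {"K", "Br"}])
print(vecs["Na"], vecs["K"])
```

Even in this tiny example, Na and K end up with identical environment vectors, mirroring the paper's finding that atoms cluster into chemically meaningful groups such as the alkali metals.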


Proceedings ArticleDOI
24 Sep 2018
TL;DR: This paper formulates audio-to-semantic understanding as a sequence-to-sequence problem, and proposes and compares various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner.
Abstract: Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules are typically optimized independently. In this paper, we formulate audio to semantic understanding as a sequence-to-sequence problem [1]. We propose and compare various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner. Evaluations on a real-world task show that 1) having an intermediate text representation is crucial for the quality of the predicted semantics, especially the intent arguments and 2) jointly optimizing the full system improves overall accuracy of prediction. Compared to independently trained models, our best jointly trained model achieves similar domain and intent prediction F1 scores, but improves argument word error rate by 18% relative.

129 citations
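For a sequence-to-sequence decoder to emit domains, intents, and arguments, the semantic frame must first be flattened into a token sequence. A sketch of one plausible serialization (the marker tokens and the frame below are illustrative assumptions, not the paper's exact target format):

```python
def serialize_semantics(domain, intent, args):
    """Flatten a semantic frame into a decoder target sequence.

    Marker tokens (<domain>, <intent>, <arg>) delimit the fields so
    the flat sequence can be parsed back into a structured frame.
    """
    tokens = ["<domain>", domain, "<intent>", intent]
    for slot, value in args:
        tokens += ["<arg>", slot] + value.split()
    return tokens

target = serialize_semantics(
    "music", "play_song", [("artist", "the beatles"), ("song", "yesterday")]
)
print(" ".join(target))
```

Training then reduces to ordinary sequence prediction over audio (or text) inputs, which is what lets both modules be optimized jointly.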


Posted Content
TL;DR: The authors proposed an evaluation methodology consisting of automatically constructed "stress tests" that allow us to examine whether systems have the ability to make real inferential decisions, revealing strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.
Abstract: Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well at standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed "stress tests" that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.

Book
16 May 2018
TL;DR: Artificial Intelligence: With an Introduction to Machine Learning, Second Edition retains the accessible, student-friendly, problem-solving approach of the first edition while providing new material and methods.
Abstract: The first edition of this popular textbook, Contemporary Artificial Intelligence, provided an accessible and student-friendly introduction to AI. This fully revised and expanded update, Artificial Intelligence: With an Introduction to Machine Learning, Second Edition, retains the same accessibility and problem-solving approach, while providing new material and methods. The book is divided into five sections that focus on the most useful techniques that have emerged from AI. The first section of the book covers logic-based methods, while the second section focuses on probability-based methods. Emergent intelligence is featured in the third section, which explores evolutionary computation and methods based on swarm intelligence. The newest section comes next and provides a detailed overview of neural networks and deep learning. The final section of the book focuses on natural language understanding. Suitable for undergraduate and beginning graduate students, this class-tested textbook provides students and other readers with key AI methods and algorithms for solving challenging problems involving systems that behave intelligently in specialized domains such as medical and software diagnostics, financial decision making, speech and text recognition, genetic analysis, and more.

Proceedings Article
14 Mar 2018
TL;DR: A large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge, and shows that the mode of data collection via crowdsourcing results in a substantial amount of inference questions.
Abstract: We introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial amount of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.

Posted Content
TL;DR: This paper proposes and compares various encoder-decoder based approaches that optimize the speech recognition and language understanding modules jointly, in an end-to-end manner; evaluations on a real-world task show that an intermediate text representation is crucial for the quality of the predicted semantics, especially the intent arguments.
Abstract: Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules are typically optimized independently. In this paper, we formulate audio to semantic understanding as a sequence-to-sequence problem [1]. We propose and compare various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner. Evaluations on a real-world task show that 1) having an intermediate text representation is crucial for the quality of the predicted semantics, especially the intent arguments and 2) jointly optimizing the full system improves overall accuracy of prediction. Compared to independently trained models, our best jointly trained model achieves similar domain and intent prediction F1 scores, but improves argument word error rate by 18% relative.

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper looks at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do the authors expect a machine should be able to understand and what are the key dimensions that require the attention of researchers to make this dream come true?
Abstract: In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine should be able to understand? and what are the key dimensions that require the attention of researchers to make this dream come true?

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper introduces CogCompTime, a system that provides both functionalities: understanding explicit time expressions and understanding temporal information conveyed implicitly via relations. It incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available at http://cogcomp.org/page/publication_view/844.
Abstract: Automatic extraction of temporal information is important for natural language understanding. It involves two basic tasks: (1) Understanding time expressions that are mentioned explicitly in text (e.g., February 27, 1998 or tomorrow), and (2) Understanding temporal information that is conveyed implicitly via relations. This paper introduces CogCompTime, a system that has these two important functionalities. It incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available at http://cogcomp.org/page/publication_view/844.
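The first of the two tasks, spotting explicit time expressions, can be illustrated with a toy pattern matcher. This regex sketch is only illustrative of the task's input and output, not of how CogCompTime is actually implemented:

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates like "February 27, 1998" plus a couple of relative words.
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b|\btomorrow\b|\byesterday\b")

def find_time_expressions(text):
    """Return explicit time expressions mentioned in the text."""
    return DATE_RE.findall(text)

print(find_time_expressions("The treaty was signed on February 27, 1998, not tomorrow."))
```

The second task, implicit temporal relations (e.g. which of two events happened first), has no such surface pattern, which is why it requires learned models rather than rules.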

Posted Content
TL;DR: The proposed neural math solver is based on an encoder-decoder framework, where the encoder is designed to understand the semantics of problems, and the decoder focuses on tracking semantic meanings of the generated symbols and then deciding which symbol to generate next.
Abstract: Solving math word problems is a challenging task that requires accurate natural language understanding to bridge natural language texts and math expressions. Motivated by the intuition about how humans generate equations from problem texts, this paper presents a neural approach that automatically solves math word problems by operating on symbols according to their semantic meanings in the text. The paper views the process of generating an equation as a bridge between the semantic world and the symbolic world, where the proposed neural math solver is based on an encoder-decoder framework. In the proposed model, the encoder is designed to understand the semantics of the problem, while the decoder focuses on tracking the semantic meanings of the generated symbols and then deciding which symbol to generate next. Preliminary experiments on the Math23K dataset show that our model significantly outperforms both the state-of-the-art single model and the best non-retrieval-based model by about 10% accuracy, demonstrating the effectiveness of bridging the symbolic and semantic worlds of math word problems.
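The decoder's output can be read as an equation over the quantities extracted from the problem text. A small sketch of evaluating such a decoded sequence, assuming a postfix output format and placeholder tokens n0, n1, ... for the extracted numbers (both are illustrative conventions, not necessarily the paper's):

```python
def eval_postfix(tokens, numbers):
    """Evaluate a decoded postfix equation; 'n0', 'n1', ... index
    the quantities extracted from the problem text."""
    stack = []
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(numbers[int(tok[1:])])
    return stack[0]

# "Tom has 3 bags with 5 apples each" -> decoder emits n0 n1 *
print(eval_postfix(["n0", "n1", "*"], [3, 5]))
```

Using number placeholders instead of literal values keeps the decoder's symbol vocabulary small and lets one learned pattern generalize across problems with different quantities.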

Proceedings Article
08 May 2018
TL;DR: This work presents a semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the semantics of negation, modals, quantification, and presupposition triggers.
Abstract: Semantic parsing offers many opportunities to improve natural language understanding. We present a semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the semantics of negation, modals, quantification, and presupposition triggers. The semantic formalism is based on Discourse Representation Theory, but concepts are represented by WordNet synsets and thematic roles by VerbNet relations. Translating scoped meaning representations to sets of clauses enables us to compare them for the purpose of semantic parser evaluation and checking translations. This is done by computing precision and recall on matching clauses, in a similar way as is done for Abstract Meaning Representations. We show that our matching tool for evaluating scoped meaning representations is both accurate and efficient. Applying this matching tool to three baseline semantic parsers yields F-scores between 43% and 54%. A pilot study is performed to automatically find changes in meaning by comparing meaning representations of translations. This comparison turns out to be an additional way of (i) finding annotation mistakes and (ii) finding instances where our semantic analysis needs to be improved.
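Comparing two meaning representations as sets of clauses reduces to precision and recall over matching clauses. A minimal sketch with invented clause triples; note that the real matching tool, like the comparable evaluation for Abstract Meaning Representations, must also handle variable renaming between the two representations, which this toy version omits:

```python
def clause_f1(predicted, gold):
    """Precision/recall/F1 over exactly matching clauses (order-insensitive)."""
    matched = len(set(predicted) & set(gold))
    p = matched / len(predicted) if predicted else 0.0
    r = matched / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Invented clauses in (head, relation, dependent) form.
gold = {("b1", "REF", "x1"), ("x1", "Agent", "e1"), ("e1", "play", "v.01")}
pred = {("b1", "REF", "x1"), ("e1", "play", "v.02")}
p, r, f1 = clause_f1(pred, gold)
print(round(p, 2), round(r, 2), round(f1, 2))
```

The same score can serve two purposes named in the abstract: grading a semantic parser against gold annotations, and flagging translation pairs whose meaning representations diverge.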

Posted Content
TL;DR: The authors introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge, or more specifically, script knowledge.
Abstract: We introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial amount of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.

Proceedings Article
18 Feb 2018
TL;DR: This paper compares and analyses the main cloud-based NLU platforms, both from a descriptive and a performance-based point of view, highlighting strengths and weaknesses of the different NLU tools.
Abstract: In the last 10 years, various cloud platforms have enabled developers to easily create applications able to understand, with some limitations, natural language. Nowadays, such cloud platforms for natural language understanding (NLU) are widely used, thanks to the rise of chat services and conversational assistants on our mobile devices. This paper compares and analyses the main cloud-based NLU platforms, both from a descriptive and from a performance-based point of view. For the descriptive analysis, a taxonomy is proposed and six cloud platforms are analyzed. The performance evaluation, instead, compares three of these platforms, highlighting strengths and weaknesses of the different NLU tools.

Proceedings ArticleDOI
01 Jul 2018
TL;DR: A lightweight yet effective approach to mining cross-cultural differences of named entities and finding similar terms for slang across languages that could be useful for machine translation applications and research in computational social science is presented.
Abstract: Cross-cultural differences and similarities are common in cross-lingual natural language understanding, especially for research in social media. For instance, people of distinct cultures often hold different opinions on a single named entity. Also, understanding slang terms across languages requires knowledge of cross-cultural similarities. In this paper, we study the problem of computing such cross-cultural differences and similarities. We present a lightweight yet effective approach, and evaluate it on two novel tasks: 1) mining cross-cultural differences of named entities and 2) finding similar terms for slang across languages. Experimental results show that our framework substantially outperforms a number of baseline methods on both tasks. The framework could be useful for machine translation applications and research in computational social science.
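Deciding whether a slang term and a candidate cross-lingual counterpart are used similarly typically reduces to comparing their embeddings in a shared space. A minimal cosine-similarity sketch with invented 3-d vectors (the words and values are illustrative, not the paper's data or method):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d embeddings in a shared cross-lingual space (illustrative values).
vec_en = {"awesome": [0.9, 0.1, 0.0]}
vec_ja = {"yabai": [0.8, 0.2, 0.1]}
print(round(cosine(vec_en["awesome"], vec_ja["yabai"]), 3))
```

A high score suggests the two terms occupy similar contexts in their respective languages; a low score for the same named entity across languages would instead signal a cross-cultural difference in how it is discussed.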

Posted Content
TL;DR: This paper makes the first attempt to let SRL enhance text comprehension and inference through specifying verbal predicates and their corresponding semantic roles, and shows that the salient labels can be conveniently added to existing models and significantly improve deep learning models in challenging text comprehension tasks.
Abstract: Who did what to whom is a major focus of natural language understanding, and precisely the aim of the semantic role labeling (SRL) task. Despite sharing many processing characteristics and even a common purpose, these two related tasks have surprisingly never been jointly considered in previous work. This paper therefore makes the first attempt to let SRL enhance text comprehension and inference by specifying verbal predicates and their corresponding semantic roles. In terms of deep learning models, our embeddings are enhanced with explicit contextual semantic role labels for more fine-grained semantics. We show that these salient labels can be conveniently added to existing models and significantly improve deep learning models on challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art over strong baselines that have been enhanced by well-pretrained language models from the latest progress.
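Enhancing embeddings with "explicit contextual semantic role labels" can be pictured as concatenating each word vector with a label embedding. A minimal sketch using one-hot label vectors (the label set, tags, and embedding values are illustrative assumptions; real models would learn dense label embeddings):

```python
SRL_LABELS = ["O", "ARG0", "ARG1", "V"]

def one_hot(label):
    """One-hot embedding for an SRL label."""
    vec = [0.0] * len(SRL_LABELS)
    vec[SRL_LABELS.index(label)] = 1.0
    return vec

def augment(token_vecs, srl_tags):
    """Concatenate each word embedding with its SRL label embedding."""
    return [vec + one_hot(tag) for vec, tag in zip(token_vecs, srl_tags)]

# Toy 2-d word embeddings for "cats chase mice" (illustrative values).
embs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
tags = ["ARG0", "V", "ARG1"]
print(augment(embs, tags)[0])
```

Because the augmentation only widens the input vectors, it can be bolted onto an existing model without changing its architecture, which is the convenience the abstract claims.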

Proceedings ArticleDOI
01 Jun 2018
TL;DR: This paper proposes efficient deep neural network architectures that maximally re-use available resources through transfer learning, significantly increasing accuracy in low-resource settings and enabling rapid development of accurate models with less data.
Abstract: Fast expansion of natural language functionality of intelligent virtual agents is critical for achieving engaging and informative interactions. However, developing accurate models for new natural language domains is a time and data intensive process. We propose efficient deep neural network architectures that maximally re-use available resources through transfer learning. Our methods are applied for expanding the understanding capabilities of a popular commercial agent and are evaluated on hundreds of new domains, designed by internal or external developers. We demonstrate that our proposed methods significantly increase accuracy in low resource settings and enable rapid development of accurate models with less data.

Proceedings ArticleDOI
26 Feb 2018
TL;DR: A Chatbot framework is established, and the analysis shows that a framework based on theoretical designs can be practically implemented to satisfy the capabilities required in industry.
Abstract: Artificial Intelligence continues to grow in popularity on various industrial platforms, becoming especially prominent in Chatbot technology. A great deal of recent research has focused on social and assistive Chatbots, and the technology has been evolving over time. Following this direction, a Chatbot framework is established in this paper, with a discussion of the relevant technologies. First, the development of Artificial Intelligence is introduced; in particular, we present a timeline of Chatbot technology. We then describe the capabilities of a Chatbot and discuss the supporting technologies. The entire Chatbot framework is presented afterwards, along with its supporting set of modules. Our analysis of this framework shows that a framework based on theoretical designs can be practically implemented to satisfy the capabilities required in industry, and that these capabilities are feasible.

Posted Content
TL;DR: This paper introduces Information Extraction technology and its various sub-tasks, highlights state-of-the-art research in each, and discusses current challenges and future research directions.
Abstract: With the rise of the digital age, there is an explosion of information in the form of news, articles, social media, and so on. Much of this data lies in unstructured form, and manually managing and effectively making use of it is tedious and labor-intensive. This explosion of information, and the need for more sophisticated and efficient information handling tools, gives rise to Information Extraction (IE) and Information Retrieval (IR) technology. Information Extraction systems take natural language text as input and produce structured information, specified by certain criteria, that is relevant to a particular application. Various sub-tasks of IE, such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction, and Knowledge Base reasoning, form the building blocks of high-end Natural Language Processing (NLP) tasks such as Machine Translation, Question-Answering Systems, Natural Language Understanding, Text Summarization, and digital assistants like Siri, Cortana, and Google Now. This paper introduces Information Extraction technology and its various sub-tasks, highlights state-of-the-art research in each, and discusses current challenges and future research directions.

DissertationDOI
23 Apr 2018
TL;DR: This work presents an end-to-end pipeline for translating natural language commands to discrete robot actions, and uses clarification dialogs to jointly improve language parsing and concept grounding.
Abstract: Natural language understanding for robotics can require substantial domain- and platform-specific engineering. For example, for mobile robots to pick-and-place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands, and connect concept words like "red can" to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically, continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and we use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.
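As a rough illustration of grounding a command to a discrete action (not the dissertation's actual learned parser, which acquires its vocabulary through dialog), a toy keyword-based parser might look like this; the color and object lexicons and the action schema are hypothetical:

```python
# Hypothetical concept lexicons; the real system learns these groundings.
COLORS = {"red", "blue", "green"}
OBJECTS = {"can", "block", "mug"}

def parse_command(command):
    """Map e.g. 'pick up the red can' to a discrete action tuple."""
    tokens = command.lower().split()
    color = next((t for t in tokens if t in COLORS), None)
    obj = next((t for t in tokens if t in OBJECTS), None)
    if tokens[:2] == ["pick", "up"] and color and obj:
        return ("pick", color, obj)
    raise ValueError("could not ground command: " + command)

print(parse_command("pick up the red can"))  # ('pick', 'red', 'can')
```

A fixed lexicon like this is exactly the domain-specific engineering the thesis aims to avoid, which motivates learning new words and concepts through clarification dialogs instead.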

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper evaluates architectures, models and deployment issues related to the usage of deep learning techniques in the automotive manufacturing domain, and developed several deep learning models that help to improve the quality and efficiency of these processes.
Abstract: Artificial Intelligence (AI) and deep learning have been steadily gaining importance due to their potential for a broad set of science and industry applications. Deep learning techniques have found success in many domains, e.g., computer vision and natural language understanding. Developing AI applications is a complex task with many challenges related to data collection, model training, and deployment. In this paper, we evaluate architectures, models, and deployment issues related to the use of deep learning techniques in the automotive manufacturing domain. In particular, we focus on different computer vision problems in automotive manufacturing processes, e.g., in logistics processes. We developed several deep learning models that help to improve the quality and efficiency of these processes. Finally, we provide an analysis of the architecture, datasets, and models used, and report performance metrics for each of the different models.

Book ChapterDOI
08 Sep 2018
TL;DR: It is demonstrated that multi-hop FiLM generation significantly outperforms the prior state of the art on the GuessWhat?! visual dialogue task and matches the state of the art on the ReferIt object retrieval task; additional qualitative analysis is provided.
Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation significantly outperforms the prior state of the art on the GuessWhat?! visual dialogue task and matches the state of the art on the ReferIt object retrieval task, and we provide additional qualitative analysis.
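The per-channel scaling and shifting that a FiLM layer applies can be sketched directly as a minimal NumPy example. The shapes and values below are illustrative; in the actual model, gamma and beta are predicted from the language input by a (here, multi-hop) conditioning network.

```python
import numpy as np

def film(feature_map, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale and shift."""
    # feature_map: (C, H, W); gamma, beta: (C,), broadcast over H and W.
    return gamma[:, None, None] * feature_map + beta[:, None, None]

# Illustrative values: 2 channels over a 2x2 spatial map.
x = np.ones((2, 2, 2))
gamma = np.array([2.0, 0.5])   # per-channel scale (predicted from language)
beta = np.array([1.0, -1.0])   # per-channel shift (predicted from language)
y = film(x, gamma, beta)
print(y[0, 0, 0], y[1, 0, 0])  # -> 3.0 -0.5
```

In a real network this modulation is applied to intermediate convolutional feature maps, letting the language input rescale and shift each channel's activations.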

Proceedings Article
01 May 2018
TL;DR: The b5 corpus is described: a collection of controlled and free (non-topic-specific) texts produced in different communicative tasks and accompanied by personality inventories of their authors and additional demographics, aiming to support a wide range of NLP studies based on personality information.
Abstract: The computational treatment of human personality, both for recognizing personality traits from text and for generating text that reflects a particular set of traits, is central to the development of NLP applications. As a basic resource for studies of this kind, this article describes the b5 corpus, a collection of controlled and free (non-topic-specific) texts produced in different (e.g., referential or descriptive) communicative tasks, accompanied by personality inventories of their authors and additional demographics. The present discussion focuses mainly on the various corpus components and on the data collection task itself, but preliminary results of personality recognition from text are presented to illustrate how the corpus data may be reused. The b5 corpus aims to support a wide range of NLP studies based on personality information and is, to the best of our knowledge, the largest resource of this kind made available for research purposes in Brazilian Portuguese.

Proceedings ArticleDOI
01 Jun 2018
TL;DR: This work describes Bag of Experts (BoE) architectures for model reuse in both LSTM- and CRF-based slot tagging models and shows that they outperform baseline models by a statistically significant average margin of 5.06% in absolute F1-score.
Abstract: Slot tagging, the task of detecting entities in input user utterances, is a key component of natural language understanding systems for personal digital assistants. Since each new domain requires a different set of slots, the annotation cost of labeling training data for slot tagging models grows rapidly as the number of domains increases. To tackle this, we describe Bag of Experts (BoE) architectures for model reuse in both LSTM- and CRF-based models. Extensive experimentation over a dataset of 10 domains, drawn from data relevant to our commercial personal digital assistant, shows that our BoE models outperform the baseline models by a statistically significant average margin of 5.06% in absolute F1-score when training with 2000 instances per domain, and achieve an even larger improvement of 12.16% when only 25% of the training data is used.
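For illustration, the BIO-style output that slot taggers typically produce can be mimicked with a toy gazetteer lookup. The slot names and lexicon below are hypothetical; real systems, including the LSTM- and CRF-based models above, learn the tagging from annotated data rather than from a fixed dictionary.

```python
# Hypothetical slot lexicon; real taggers learn these mappings from data.
SLOT_LEXICON = {
    ("new", "york"): "city",
    ("tomorrow",): "date",
}

def bio_tag(tokens):
    """Assign BIO slot labels to tokens via longest-match gazetteer lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for span in range(len(tokens) - i, 0, -1):  # try longest match first
            phrase = tuple(t.lower() for t in tokens[i:i + span])
            if phrase in SLOT_LEXICON:
                slot = SLOT_LEXICON[phrase]
                tags[i] = "B-" + slot                 # begin slot
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + slot             # inside slot
                i += span
                break
        else:
            i += 1  # no slot starts here
    return tags

tokens = "book a flight to New York tomorrow".split()
print(bio_tag(tokens))
# ['O', 'O', 'O', 'O', 'B-city', 'I-city', 'B-date']
```

The costly part is producing such labels by hand for every new domain's training data, which is the annotation burden the BoE architectures aim to reduce.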