
Showing papers on "Natural language published in 2014"


Proceedings Article
08 Dec 2014
TL;DR: Convolutional neural network models for matching two sentences are proposed by adapting the convolutional strategy in vision and speech; the models nicely represent the hierarchical structures of sentences through their layer-by-layer composition and pooling.
Abstract: Semantic matching is of central importance to many natural language tasks [2,28]. A successful matching algorithm needs to adequately model the internal structures of language objects and the interaction between them. As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech. The proposed models not only nicely represent the hierarchical structures of sentences with their layer-by-layer composition and pooling, but also capture the rich matching patterns at different levels. Our models are rather generic, requiring no prior knowledge of language, and can hence be applied to matching tasks of different nature and in different languages. An empirical study on a variety of matching tasks demonstrates the efficacy of the proposed models and their superiority to competitor models.
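Where the abstract describes the architecture only in prose, a toy sketch may help. The following is a minimal, untrained illustration of an ARC-I-style matcher in the spirit of the paper; the dimensions, random weights, and scoring layer are my assumptions, not the authors' implementation:

```python
# Toy sketch of convolutional sentence matching (assumed sizes, untrained
# weights): embed each sentence, convolve windows of word vectors, max-pool
# over time, then score the concatenated sentence vectors.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, FILTERS, WIDTH = 1000, 32, 64, 3
E = rng.normal(size=(VOCAB, DIM))            # word embeddings (random here)
W = rng.normal(size=(FILTERS, WIDTH * DIM))  # convolution filters
M = rng.normal(size=(2 * FILTERS,))          # final scoring layer

def encode(token_ids):
    """Convolve each window of WIDTH word vectors, then max-pool over time."""
    x = E[token_ids]                                    # (T, DIM)
    windows = [x[i:i + WIDTH].ravel() for i in range(len(x) - WIDTH + 1)]
    h = np.maximum(0.0, np.array(windows) @ W.T)        # ReLU feature maps
    return h.max(axis=0)                                # (FILTERS,)

def match_score(sent_a, sent_b):
    return float(M @ np.concatenate([encode(sent_a), encode(sent_b)]))

print(match_score([1, 4, 7, 2, 9], [3, 4, 7, 5, 8]))
```

A trained model would learn E, W, and M from matching and non-matching sentence pairs; the paper's second variant additionally lets the two sentences interact inside the convolution layers rather than only at the final scoring step.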

1,041 citations


Posted Content
TL;DR: In this article, a structured objective aligns the two modalities through a multimodal embedding, and the inferred alignments are used by a multimodal recurrent neural network (M-RNN) to learn to generate novel descriptions of image regions.
Abstract: We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate that our alignment model produces state of the art results in retrieval experiments on Flickr8K, Flickr30K and MSCOCO datasets. We then show that the generated descriptions significantly outperform retrieval baselines on both full images and on a new dataset of region-level annotations.
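The core of the alignment objective can be illustrated in a few lines. This is my simplification, not the released implementation; the shared 64-dimensional space and random vectors are placeholders:

```python
# Sketch of the image-sentence alignment score: each word aligns to its
# best-matching image region, and the pair score sums those similarities.
import numpy as np

def image_sentence_score(region_vecs, word_vecs):
    """region_vecs: (R, D) CNN region embeddings; word_vecs: (W, D)
    bidirectional-RNN word embeddings, assumed mapped to a shared space."""
    sim = word_vecs @ region_vecs.T       # (W, R) inner-product similarities
    return float(sim.max(axis=1).sum())   # each word picks its best region

rng = np.random.default_rng(1)
print(image_sentence_score(rng.normal(size=(5, 64)),
                           rng.normal(size=(7, 64))))
```

Training pushes the score of matching image-sentence pairs above that of mismatched pairs, which is what makes the inferred word-region alignments meaningful.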

946 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new two-player game to crowd-source natural language referring expressions, which both collects and verifies referring expressions directly within the game, together with an in-depth analysis of the resulting dataset.
Abstract: In this paper we introduce a new game to crowd-source natural language referring expressions. By designing a two-player game, we can both collect and verify referring expressions directly within the game. To date, the game has produced a dataset containing 130,525 expressions, referring to 96,654 distinct objects, in 19,894 photographs of natural scenes. This dataset is larger and more varied than previous REG datasets and allows us to study referring expressions in real-world scenes. We provide an in-depth analysis of the resulting dataset. Based on our findings, we design a new optimization-based model for generating referring expressions and perform experimental evaluations on 3 test sets.

842 citations


Proceedings Article
21 Jun 2014
TL;DR: This work introduces two multimodal neural language models, models of natural language that can be conditioned on other modalities, and shows that image-text modelling can generate sentence descriptions for images without the use of templates, structured prediction, or syntactic trees.
Abstract: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, as well as generate text conditioned on images. We show that in the case of image-text modelling we can jointly learn word representations and image features by training our models together with a convolutional network. Unlike many of the existing methods, our approach can generate sentence descriptions for images without the use of templates, structured prediction, and/or syntactic trees. While we focus on image-text modelling, our algorithms can be easily applied to other modalities such as audio.

693 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: This paper presents two simple paraphrase models, an association model and a vector space model, and trains them jointly from question-answer pairs, improving state-of-the-art accuracies on two recently released question-answering datasets.
Abstract: A central challenge in semantic parsing is handling the myriad ways in which knowledge base predicates can be expressed. Traditionally, semantic parsers are trained primarily from text paired with knowledge base information. Our goal is to exploit the much larger amounts of raw text not tied to any knowledge base. In this paper, we turn semantic parsing on its head. Given an input utterance, we first use a simple method to deterministically generate a set of candidate logical forms with a canonical realization in natural language for each. Then, we use a paraphrase model to choose the realization that best paraphrases the input, and output the corresponding logical form. We present two simple paraphrase models, an association model and a vector space model, and train them jointly from question-answer pairs. Our system PARASEMPRE improves state-of-the-art accuracies on two recently released question-answering datasets.
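A toy version of the paraphrase-selection step may make the pipeline concrete. Everything below (the vocabulary, vectors, and candidate logical forms) is invented for illustration; the paper's association model and trained parameters are not reproduced:

```python
# Sketch: each candidate logical form carries a canonical English
# realization; pick the one whose averaged word vector best matches the
# input utterance (a bare-bones vector space paraphrase model).
import numpy as np

rng = np.random.default_rng(2)
VECS = {w: rng.normal(size=16) for w in
        "what city was obama born in place of birth for people".split()}

def avg_vec(text):
    return np.mean([VECS[w] for w in text.split() if w in VECS], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = {                      # logical form -> canonical realization
    "PlaceOfBirth(Obama)": "place of birth for obama",
    "PeopleBornHere(City)": "people born in city",
}
utterance = "what city was obama born in"
best = max(candidates, key=lambda lf: cosine(avg_vec(utterance),
                                             avg_vec(candidates[lf])))
print(best)   # highest-scoring logical form (toy vectors, illustrative only)
```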

593 citations


Journal ArticleDOI
01 Feb 2014
TL;DR: In this article, a new neural network architecture is proposed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced; the network is trained to encode the semantics of these graphs so as to assign high probabilities to plausible components.
Abstract: Large-scale relational learning becomes crucial for handling the huge amounts of structured data generated daily in many application domains, ranging from computational biology and information retrieval to natural language processing. In this paper, we present a new neural network architecture designed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced. The network is trained to encode the semantics of these graphs in order to assign high probabilities to plausible components. We empirically show that it reaches competitive performance in link prediction on standard datasets from the literature as well as on data from a real-world knowledge base (WordNet). In addition, we show how our method can be applied to perform word-sense disambiguation in a context of open-text semantic parsing, where the goal is to learn to assign a structured meaning representation to almost any sentence of free text, demonstrating that it can scale up to tens of thousands of nodes and thousands of types of relation.
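The idea of scoring graph components can be sketched compactly. The bilinear form below is a generic stand-in, not the paper's exact energy function, and the embeddings are random rather than trained:

```python
# Sketch: score a (subject, relation, object) component by combining entity
# embeddings through a relation-specific bilinear map; training would push
# plausible triples toward higher scores than corrupted ones.
import numpy as np

rng = np.random.default_rng(3)
DIM = 16
entity = {"cat": rng.normal(size=DIM), "mammal": rng.normal(size=DIM)}
relation = {"is_a": rng.normal(size=(DIM, DIM))}   # relation as a matrix

def triple_score(s, r, o):
    return float(entity[s] @ relation[r] @ entity[o])

print(triple_score("cat", "is_a", "mammal"))
```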

511 citations


Journal ArticleDOI
01 Sep 2014
TL;DR: The architecture of an interactive natural language query interface for relational databases that can correctly interpret complex natural language queries in a generic manner across a range of domains, and that is good enough to be usable in practice.
Abstract: Natural language has been the holy grail of query interface designers, but has generally been considered too hard to work with, except in limited specific circumstances. In this paper, we describe the architecture of an interactive natural language query interface for relational databases. Through a carefully limited interaction with the user, we are able to correctly interpret complex natural language queries, in a generic manner across a range of domains. By these means, a logically complex English language sentence is correctly translated into a SQL query, which may include aggregation, nesting, and various types of joins, among other things, and can be evaluated against an RDBMS. We have constructed a system, NaLIR (Natural Language Interface for Relational databases), embodying these ideas. Our experimental assessment, through user studies, demonstrates that NaLIR is good enough to be usable in practice: even naive users are able to specify quite complex ad-hoc queries.
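The end product of such an interface is easiest to see in a tiny example. The rule below is a hypothetical, hand-made illustration of the NL-to-SQL target, not NaLIR's parse-tree mapping or its interaction loop; the table and column names are assumptions:

```python
# Sketch: translate one fixed question shape into SQL. NaLIR generalizes
# this via linguistic parse trees plus limited user interaction.
def to_sql(question):
    words = question.lower().split()
    # "how many papers did <author> publish after <year>"
    if words[:4] == ["how", "many", "papers", "did"]:
        author, year = words[4], int(words[-1])
        return ("SELECT COUNT(*) FROM papers "
                f"WHERE author = '{author}' AND year > {year}")
    raise ValueError("unsupported question form")

print(to_sql("how many papers did smith publish after 2010"))
```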

411 citations


Journal ArticleDOI
TL;DR: The authors propose a concept-level sentiment-analysis paradigm that merges linguistics, common-sense computing, and machine learning to improve the accuracy of tasks such as polarity detection: by allowing sentiments to flow from concept to concept based on the dependency relations of the input sentence, it achieves a better understanding of the contextual role of each concept within the sentence and, hence, yields a polarity detector that outperforms state-of-the-art statistical methods.
Abstract: The Web is evolving through an era where the opinions of users are getting increasingly important and valuable. The distillation of knowledge from the huge amount of unstructured information on the Web can be a key factor for tasks such as social media marketing, branding, product positioning, and corporate reputation management. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions involves a deep understanding of natural language text by machines, from which we are still very far. To this end, concept-level sentiment analysis aims to go beyond a mere word-level analysis of text and provide novel approaches to opinion mining and sentiment analysis that enable a more efficient passage from (unstructured) textual information to (structured) machine-processable data. A recent knowledge-based technology in this context is sentic computing, which relies on the ensemble application of common-sense computing and the psychology of emotions to infer the conceptual and affective information associated with natural language. Sentic computing, however, is limited by the richness of the knowledge base and by the fact that the bag-of-concepts model, despite being more sophisticated than bag-of-words, misses out on important discourse structure information that is key for properly detecting the polarity conveyed by natural language opinions. In this work, we introduce a novel paradigm for concept-level sentiment analysis that merges linguistics, common-sense computing, and machine learning to improve the accuracy of tasks such as polarity detection. By allowing sentiments to flow from concept to concept based on the dependency relations of the input sentence, in particular, we achieve a better understanding of the contextual role of each concept within the sentence and, hence, obtain a polarity detection engine that outperforms state-of-the-art statistical methods.
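The sentiment-flow idea lends itself to a small sketch. The rules and polarity values below are invented for illustration and are far simpler than the paper's engine:

```python
# Sketch: a concept's prior polarity is transformed by the dependency
# relations that govern it (negation flips, intensifiers amplify) before
# sentence-level aggregation.
prior = {"good": 0.8, "movie": 0.0}

def propagate(word, deps):
    """deps: (relation, dependent) edges attached to `word`."""
    p = prior.get(word, 0.0)
    for rel, child in deps:
        if rel == "advmod" and child == "very":
            p *= 1.5               # intensifier amplifies polarity
        elif rel == "neg":
            p = -p                 # negation flips polarity
    return p

# "not a very good movie": 'good' is intensified, then negated
print(propagate("good", [("advmod", "very"), ("neg", "not")]))  # ≈ -1.2
```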

325 citations


Journal ArticleDOI
Tobias Kuhn
TL;DR: A comprehensive survey of existing English-based controlled natural languages (CNLs) is presented in this article, which aims to provide a common terminology and a common model for CNL, to contribute to the understanding of their general nature, to provide a starting point for researchers interested in the area, and to help developers make design decisions.
Abstract: What is here called controlled natural language (CNL) has traditionally been given many different names. Especially during the last four decades, a wide variety of such languages have been designed. They are applied to improve communication among humans, to improve translation, or to provide natural and intuitive representations for formal notations. Despite the apparent differences, it seems sensible to put all these languages under the same umbrella. To bring order to the variety of languages, a general classification scheme is presented here. A comprehensive survey of existing English-based CNLs is given, listing and describing 100 languages from 1930 until today. Classification of these languages reveals that they form a single scattered cloud filling the conceptual space between natural languages such as English on the one end and formal languages such as propositional logic on the other. The goal of this article is to provide a common terminology and a common model for CNL, to contribute to the understanding of their general nature, to provide a starting point for researchers interested in the area, and to help developers make design decisions.

308 citations


Proceedings Article
08 Dec 2014
TL;DR: The results show that deep RNNs outperform associated shallow counterparts that employ the same number of parameters, and that the approach outperforms previous baselines on the sentiment analysis task, including a multiplicative RNN variant as well as the recently introduced paragraph vectors.
Abstract: Recursive neural networks comprise a class of architectures that can operate on structured input. They have been previously successfully applied to model compositionality in natural language using parse-tree-based structural representations. Even though these architectures are deep in structure, they lack the capacity for hierarchical representation that exists in conventional deep feed-forward networks as well as in recently investigated deep recurrent neural networks. In this work we introduce a new architecture, a deep recursive neural network (deep RNN), constructed by stacking multiple recursive layers. We evaluate the proposed model on the task of fine-grained sentiment classification. Our results show that deep RNNs outperform associated shallow counterparts that employ the same number of parameters. Furthermore, our approach outperforms previous baselines on the sentiment analysis task, including a multiplicative RNN variant as well as the recently introduced paragraph vectors, achieving new state-of-the-art results. We provide exploratory analyses of the effect of multiple layers and show that they capture different aspects of compositionality in language.
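A stripped-down sketch of the stacking scheme follows; the tree encoding, dimensions, and untrained weights are all assumptions, and the real model is trained end to end on sentiment labels:

```python
# Sketch of a deep recursive network: at layer l, each tree node combines
# its children's layer-l vectors with its own layer-(l-1) vector.
import numpy as np

rng = np.random.default_rng(4)
DIM, LAYERS = 8, 2
emb = {w: rng.normal(size=DIM) for w in ("not", "very", "good")}
W = [rng.normal(size=(DIM, 2 * DIM)) * 0.1 for _ in range(LAYERS)]  # children
V = [rng.normal(size=(DIM, DIM)) * 0.1 for _ in range(LAYERS)]      # below

def rep(tree, layer):
    """Vector for `tree` (a word or a (left, right) pair) at `layer`."""
    if layer == 0:
        below = emb[tree] if isinstance(tree, str) else np.zeros(DIM)
    else:
        below = rep(tree, layer - 1)      # same node, previous layer
    if isinstance(tree, str):             # leaf: lift the signal from below
        return np.tanh(V[layer] @ below)
    kids = np.concatenate([rep(tree[0], layer), rep(tree[1], layer)])
    return np.tanh(W[layer] @ kids + V[layer] @ below)

print(rep(("not", ("very", "good")), LAYERS - 1))   # top node, top layer
```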

289 citations


Proceedings ArticleDOI
18 Jun 2014
TL;DR: A semantic query graph is proposed to model the query intention of the natural language question in a structural way; based on it, RDF Q/A is reduced to a subgraph matching problem, and the ambiguity of natural language questions is resolved at the time when matches of the query are found.
Abstract: RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a natural language question, the existing work takes a two-stage approach: question understanding and query evaluation. Its focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is joint disambiguation, which has an exponential search space. In this paper, we propose a systematic framework to answer natural language questions over an RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic query graph to model the query intention in the natural language question in a structural way, based on which RDF Q/A is reduced to a subgraph matching problem. More importantly, we resolve the ambiguity of natural language questions at the time when matches of the query are found. The cost of disambiguation is saved if no matches are found. We compare our method with some state-of-the-art RDF Q/A systems on benchmark datasets. Extensive experiments confirm that our method not only improves the precision but also greatly speeds up query performance.
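The graph-data-driven trick, delaying disambiguation until match time, fits in a short sketch. The RDF facts, phrase candidates, and query below are toy assumptions:

```python
# Sketch: each ambiguous phrase keeps all its candidate mappings, and a
# brute-force subgraph match over the RDF triples resolves the ambiguity
# only when a concrete match is found.
from itertools import product

rdf = {("Paul_Anderson_(director)", "directed", "Resident_Evil"),
       ("Paul_Anderson_(boxer)", "fought_in", "Olympics")}

# "Who directed Resident Evil?" -> one query edge, relation phrase ambiguous
query_edges = [("?who", {"directed", "fought_in"}, "Resident_Evil")]
candidates = {"?who": {"Paul_Anderson_(director)", "Paul_Anderson_(boxer)"},
              "Resident_Evil": {"Resident_Evil"}}

for s, rels, o in query_edges:
    for subj, rel, obj in product(candidates[s], rels, candidates[o]):
        if (subj, rel, obj) in rdf:        # disambiguation happens here
            print(s, "=", subj, "via", rel)
```

If no combination matches, no disambiguation work is wasted, which is the cost saving the abstract refers to.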


Posted Content
TL;DR: This paper proposes to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure, to create sentence descriptions of open-domain videos with large vocabularies.
Abstract: Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
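The pipeline shape, pool frame features and then decode words, can be sketched in a few lines; dimensions and weights are placeholders, and the real system uses trained CNN features and an LSTM:

```python
# Sketch: mean-pool per-frame CNN features into one video vector, then let
# a simple recurrent decoder emit a fixed number of word ids.
import numpy as np

rng = np.random.default_rng(5)
VOCAB, DIM = 20, 16
Wh = rng.normal(size=(DIM, DIM)) * 0.1    # recurrent weights
Wv = rng.normal(size=(DIM, DIM)) * 0.1    # video conditioning
Wo = rng.normal(size=(VOCAB, DIM)) * 0.1  # output projection

def describe(frame_feats, steps=5):
    v = frame_feats.mean(axis=0)          # mean-pool over frames
    h = np.zeros(DIM)
    words = []
    for _ in range(steps):
        h = np.tanh(Wh @ h + Wv @ v)
        words.append(int(np.argmax(Wo @ h)))   # greedy word choice
    return words

print(describe(rng.normal(size=(30, DIM))))    # 30 frames of CNN features
```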

Journal ArticleDOI
TL;DR: This paper proposes an alternative framework in which iconicity in face-to-face communication is a powerful vehicle for bridging between language and human sensori-motor experience, and, as such, iconicity provides a key to understanding language evolution, development and processing.
Abstract: Iconicity, a resemblance between properties of linguistic form (both in spoken and signed languages) and meaning, has traditionally been considered to be a marginal, irrelevant phenomenon for our understanding of language processing, development and evolution. Rather, the arbitrary and symbolic nature of language has long been taken as a design feature of the human linguistic system. In this paper, we propose an alternative framework in which iconicity in face-to-face communication (spoken and signed) is a powerful vehicle for bridging between language and human sensori-motor experience, and, as such, iconicity provides a key to understanding language evolution, development and processing. In language evolution, iconicity might have played a key role in establishing displacement (the ability of language to refer beyond what is immediately present), which is core to what language does; in ontogenesis, iconicity might play a critical role in supporting referentiality (learning to map linguistic labels to objects, events, etc., in the world), which is core to vocabulary development. Finally, in language processing, iconicity could provide a mechanism to account for how language comes to be embodied (grounded in our sensory and motor systems), which is core to meaningful communication.

Journal ArticleDOI
TL;DR: It is argued there is no empirical evidence to support the continued use of the term SLI and limited evidence that it has provided any real benefits for children and their families, and an international consensus panel is proposed to develop an agreed definition and set of criteria for language impairment.
Abstract: Background: The term 'specific language impairment' (SLI), in use since the 1980s, describes children with language impairment whose cognitive skills are within normal limits where there is no identifiable reason for the language impairment. SLI is determined by applying exclusionary criteria, so that it is defined by what it is not rather than by what it is. The recent decision to not include SLI in DSM-5 provoked much debate and concern from researchers and clinicians. Aims: To explore how the term 'specific language impairment' emerged, to consider how disorders, including SLI, are generally defined and to explore how societal changes might impact on use of the term. Methods & Procedures: We reviewed the literature to explore the origins of the term 'specific language impairment' and present published evidence, as well as new analyses of population data, to explore the validity of continuing to use the term. Outcomes & Results and Conclusions & Implications: We support the decision to exclude the term 'specific language impairment' from DSM-5 and conclude that the term has been a convenient label for researchers, but that the current classification is unacceptably arbitrary. Furthermore, we argue there is no empirical evidence to support the continued use of the term SLI and limited evidence that it has provided any real benefits for children and their families. In fact, the term may be disadvantageous to some due to the use of exclusionary criteria to determine eligibility for and access to speech pathology services. We propose the following recommendations. First, that the word 'specific' be removed and the label 'language impairment' be used. Second, that the exclusionary criteria be relaxed and in their place inclusionary criteria be adopted that take into account the fluid nature of language development particularly in the preschool period. Building on the goodwill and collaborations between the clinical and research communities we propose the establishment of an international consensus panel to develop an agreed definition and set of criteria for language impairment. Given the rich data now available in population studies it is possible to test the validity of these definitions and criteria. Consultation with service users and policy-makers should be incorporated into the decision-making process.

Journal ArticleDOI
TL;DR: This paper introduces a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs and converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision.
Abstract: In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Proceedings Article
23 Aug 2014
TL;DR: This paper proposes a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics, and uses state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video.
Abstract: This paper integrates techniques in natural language processing and computer vision to improve recognition and description of entities and activities in real-world videos. We propose a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics. We use state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video. Our factor graph model combines these detection confidences with probabilistic knowledge mined from text corpora to estimate the most likely subject, verb, object, and place. Results on YouTube videos show that our approach improves both the joint detection of these latent, diverse sentence components and the detection of some individual components when compared to using the vision system alone, as well as over a previous n-gram language-modeling approach. The joint detection allows us to automatically generate more accurate, richer sentential descriptions of videos with a wide array of possible content.
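The combination step is easy to miniaturize. The detection confidences and corpus statistics below are made-up numbers, and the real system runs inference in a factor graph rather than brute force:

```python
# Sketch: pick the (subject, verb, object) triple that maximizes vision
# confidence weighted by language statistics mined from text corpora.
from itertools import product

vision = {"subject": {"person": 0.7, "dog": 0.3},
          "verb":    {"ride": 0.5, "walk": 0.5},
          "object":  {"horse": 0.6, "bike": 0.4}}
lang = {("person", "ride", "horse"): 0.30,   # toy corpus probabilities
        ("person", "ride", "bike"): 0.40,
        ("person", "walk", "dog"): 0.25}

def score(s, v, o):
    vis = vision["subject"][s] * vision["verb"][v] * vision["object"][o]
    return vis * lang.get((s, v, o), 1e-4)

best = max(product(vision["subject"], vision["verb"], vision["object"]),
           key=lambda t: score(*t))
print(best)   # language statistics break ties vision alone cannot
```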

Proceedings Article
27 Jul 2014
TL;DR: The beginnings of an automatic statistician, focusing on regression problems: the system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural language text.
Abstract: This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g. smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state-of-the-art extrapolation performance evaluated over 13 real time series data sets from various domains.
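The kernel-composition search at the system's core can be approximated with scikit-learn's GP kernels. This is a loose reconstruction under simple assumptions (greedy growth, marginal likelihood as the score), not the paper's grammar or model-selection criterion:

```python
# Sketch: greedily grow a GP kernel by adding or multiplying base kernels,
# keeping whichever composition best explains the data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct

X = np.linspace(0, 10, 60)[:, None]
y = np.sin(2.0 * X.ravel()) + 0.05 * X.ravel()      # periodic + trend

def fit_score(kernel):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    return gp.log_marginal_likelihood_value_

base = [RBF(), ExpSineSquared(), DotProduct()]
best = max(base, key=fit_score)
for _ in range(2):                                  # two greedy rounds
    grown = [best + k for k in base] + [best * k for k in base]
    candidate = max(grown, key=fit_score)
    if fit_score(candidate) > fit_score(best):
        best = candidate
print(best)   # e.g. a periodic kernel combined with a linear trend
```

The structure of the winning kernel (its sum and product parts) is what the automatic statistician then verbalizes as high-level properties such as smoothness, trends, and periodicity.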

Proceedings ArticleDOI
12 Jul 2014
TL;DR: In this article, the authors demonstrate an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language, such as "Please give me the white table leg that is on the black table".
Abstract: Robots inevitably fail, often without the ability to recover autonomously. We demonstrate an approach for enabling a robot to recover from failures by communicating its need for specific help to a human partner using natural language. Our approach automatically detects failures, then generates targeted spoken-language requests for help such as “Please give me the white table leg that is on the black table.” Once the human partner has repaired the failure condition, the system resumes full autonomy. We present a novel inverse semantics algorithm for generating effective help requests. In contrast to forward semantic models that interpret natural language in terms of robot actions and perception, our inverse semantics algorithm generates requests by emulating the human’s ability to interpret a request using the Generalized Grounding Graph (G³) framework. To assess the effectiveness of our approach, we present a corpus-based online evaluation, as well as an end-to-end user study, demonstrating that our approach increases the effectiveness of human interventions compared to static requests for help.
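The inverse-semantics principle, scoring a request by how a human would resolve it, reduces to a rational-listener toy model (my simplification, not the G³ implementation; objects and properties are invented):

```python
# Sketch: among candidate help requests, choose the one a simulated
# listener is most likely to resolve to the object the robot needs.
objects = {"leg1": {"white", "on_black_table"},
           "leg2": {"white", "on_floor"},
           "leg3": {"black", "on_black_table"}}
requests = {"the white table leg": {"white"},
            "the white table leg on the black table":
                {"white", "on_black_table"}}

def listener_prob(request_props, target):
    """Listener picks uniformly among objects consistent with the request."""
    consistent = [o for o, props in objects.items() if request_props <= props]
    return 1.0 / len(consistent) if target in consistent else 0.0

need = "leg1"
print(max(requests, key=lambda r: listener_prob(requests[r], need)))
```

The more specific request wins because it uniquely identifies leg1, which is exactly the kind of effectiveness gain the user study measures.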

Journal ArticleDOI
TL;DR: The focus of recent research on bilingualism has been to understand the relations among discoveries and their implications for language, cognition, and the brain across the life span.
Abstract: A series of discoveries in the past two decades has changed the way we think about bilingualism and its implications for language and cognition. One is that both of the bilingual’s languages are always active. The parallel activation of the two languages is thought to give rise to competition that imposes demands on the bilingual to control the language not in use to achieve fluency in the target language. The second is that there are consequences of bilingualism that affect the native as well as the second language: The native language changes in response to second-language use. The third is that the consequences of bilingualism are not limited to language but appear to reflect a reorganization of brain networks that hold implications for the ways in which bilinguals negotiate cognitive competition more generally. The focus of recent research on bilingualism has been to understand the relations among these discoveries and their implications for language, cognition, and the brain across the life span.

Journal ArticleDOI
TL;DR: The motivation for taking a multi-modal approach to the study of language learning, processing and evolution is provided, and the broad implications of shifting current dominant approaches and assumptions to encompass multimodal expression in both signed and spoken languages are discussed.
Abstract: Our understanding of the cognitive and neural underpinnings of language has traditionally been firmly based on spoken Indo-European languages and on language studied as speech or text. However, in face-to-face communication, language is multimodal: speech signals are invariably accompanied by visual information on the face and in manual gestures, and sign languages deploy multiple channels (hands, face and body) in utterance construction. Moreover, the narrow focus on spoken Indo-European languages has entrenched the assumption that language is comprised wholly by an arbitrary system of symbols and rules. However, iconicity (i.e. resemblance between aspects of communicative form and meaning) is also present: speakers use iconic gestures when they speak; many non-Indo-European spoken languages exhibit a substantial amount of iconicity in word forms and, finally, iconicity is the norm, rather than the exception in sign languages. This introduction provides the motivation for taking a multimodal approach to the study of language learning, processing and evolution, and discusses the broad implications of shifting our current dominant approaches and assumptions to encompass multimodal expression in both signed and spoken languages.

Journal ArticleDOI
TL;DR: EmoSenticSpace, a new framework for affective common-sense reasoning that extends WordNet-Affect and SenticNet by providing both emotion labels and polarity scores for a large set of natural language concepts, is proposed.
Abstract: Emotions play a key role in natural language understanding and sensemaking. Pure machine learning usually fails to recognize and interpret emotions in text accurately. The need for knowledge bases that give access to semantics and sentics (the conceptual and affective information) associated with natural language is growing exponentially in the context of big social data analysis. To this end, this paper proposes EmoSenticSpace, a new framework for affective common-sense reasoning that extends WordNet-Affect and SenticNet by providing both emotion labels and polarity scores for a large set of natural language concepts. The framework is built by means of fuzzy c-means clustering and support-vector-machine classification, and takes into account a number of similarity measures, including point-wise mutual information and emotional affinity. EmoSenticSpace was tested on three emotion-related natural language processing tasks, namely sentiment analysis, emotion recognition, and personality detection. In all cases, the proposed framework outperforms the state-of-the-art. In particular, the direct evaluation of EmoSenticSpace against psychological features provided in the benchmark ISEAR dataset shows a 92.15% agreement.
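One of the similarity measures mentioned, point-wise mutual information, is worth a worked example; the counts are toy numbers for illustration:

```python
# Sketch: PMI between a concept and an emotion label from co-occurrence
# counts; positive values mean they co-occur more often than chance.
import math

N = 10_000                                    # total observations
count = {"celebrate": 120, "joy": 400, ("celebrate", "joy"): 48}

def pmi(x, y):
    p_xy = count[(x, y)] / N
    return math.log2(p_xy / ((count[x] / N) * (count[y] / N)))

print(pmi("celebrate", "joy"))   # log2(10) ≈ 3.32
```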

Journal ArticleDOI
TL;DR: The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible and the criteria of sound text mining adhere to those in qualitative research in terms of consistency and replicability.
Abstract: The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated procedure, the text miner might add, delete, and revise the initial categories in an iterative fashion. Second, text mining is similar to content analysis, which also aims to extract common themes and threads by counting words. Although both of them utilize computer algorithms, text mining is characterized by its capability of processing natural languages. Last, the criteria of sound text mining adhere to those in qualitative research in terms of consistency and replicability. Key Words: Text Mining, Content Analysis, Exploratory Data Analysis, Natural Language Processing, Computational Linguistics, Grounded Theory, Reliability, and Validity

Proceedings ArticleDOI
01 Oct 2014
TL;DR: The main innovation of this work is to show how to augment explicit constraints with learned spatial knowledge to infer missing objects and likely layouts for the objects in the scene.
Abstract: We address the grounding of natural language to concrete spatial constraints, and inference of implicit pragmatics in 3D environments. We apply our approach to the task of text-to-3D scene generation. We present a representation for common sense spatial knowledge and an approach to extract it from 3D scene data. In text-to-3D scene generation, a user provides as input natural language text from which we extract explicit constraints on the objects that should appear in the scene. The main innovation of this work is to show how to augment these explicit constraints with learned spatial knowledge to infer missing objects and likely layouts for the objects in the scene. We demonstrate that spatial knowledge is useful for interpreting natural language and show examples of learned knowledge and generated 3D scenes.
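The augmentation step, adding likely-but-unmentioned objects, can be sketched with a co-occurrence prior; the probabilities below are invented, standing in for statistics extracted from 3D scene data:

```python
# Sketch: keep the explicitly mentioned objects, then add objects whose
# learned co-occurrence probability with a mentioned object is high.
prior = {"desk": {"chair": 0.8, "monitor": 0.7, "bed": 0.1},
         "bed": {"nightstand": 0.75, "desk": 0.2}}

def infer_scene_objects(mentioned, threshold=0.5):
    objects = set(mentioned)                    # explicit constraints
    for m in mentioned:
        for other, p in prior.get(m, {}).items():
            if p >= threshold:
                objects.add(other)              # likely implicit object
    return objects

print(infer_scene_objects({"desk"}))   # adds chair and monitor implicitly
```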

Proceedings ArticleDOI
01 Jun 2014
TL;DR: A translation-based approach to solve knowledge-based question answering tasks in one unified framework based on CYK parsing, which achieves better results than previous methods.
Abstract: A typical knowledge-based question answering (KB-QA) system faces two challenges: one is to transform natural language questions into their meaning representations (MRs); the other is to retrieve answers from knowledge bases (KBs) using generated MRs. Unlike previous methods which treat them in a cascaded manner, we present a translation-based approach to solve these two tasks in one unified framework. We translate questions to answers based on CYK parsing. Answers as translations of the span covered by each CYK cell are obtained by a question translation method, which first generates formal triple queries as MRs for the span based on question patterns and relation expressions, and then retrieves answers from a given KB based on triple queries generated. A linear model is defined over derivations, and minimum error rate training is used to tune feature weights based on a set of question-answer pairs. Compared to a KB-QA system using a state-of-the-art semantic parser, our method achieves better results.
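A drastically reduced sketch of the question-translation step follows; the real system parses with CYK, generates triple queries from learned question patterns and relation expressions, and tunes feature weights, none of which is reproduced here:

```python
# Sketch: a question pattern yields a formal triple query, which is then
# executed against a toy knowledge base.
kb = {("Obama", "place_of_birth", "Honolulu")}
patterns = [("where was X born", "place_of_birth")]

def answer(question):
    for pattern, relation in patterns:
        prefix = pattern.split(" X ")[0]              # "where was"
        if question.startswith(prefix):
            entity = question[len(prefix):].split()[0]
            return [o for s, r, o in kb if s == entity and r == relation]
    return []

print(answer("where was Obama born"))   # ['Honolulu']
```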

Posted Content
TL;DR: A knowledge engine that learns and shares knowledge representations for robots to carry out a variety of tasks is introduced, and its use is demonstrated in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks.
Abstract: In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, robot trajectories, visual features and many others. The knowledge stored in the engine comes from multiple sources including physical interactions that robots have while performing tasks (perception, planning and control), knowledge bases from the Internet and learned representations from several robotics research groups. We discuss various technical aspects and associated challenges such as modeling the correctness of knowledge, inferring latent information and formulating different robotic tasks as queries to the knowledge engine. We describe the system architecture and how it supports different mechanisms for users and robots to interact with the engine. Finally, we demonstrate its use in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks. This knowledge engine is a collaborative effort and we call it RoboBrain.

Journal ArticleDOI
TL;DR: The results support the use of highly variable input in a therapeutic context to facilitate grammatical morpheme learning in preschool children with language impairment.
Abstract: Purpose Artificial language learning studies have demonstrated that learners exposed to many different nonword combinations representing a grammatical form demonstrate rapid learning of that form w...

Journal ArticleDOI
TL;DR: This article surveys case study research on language learners, providing an overview of some traditional areas of coverage and then newer foci in terms of methodology, thematic areas, and findings, pertaining especially to learners in transnational, multilingual, and diaspora contexts.
Abstract: Case study research has played a very important role in applied linguistics since the field was established, particularly in studies of language teaching, learning, and use. The case in such studies generally has been a person (e.g., a teacher, learner, speaker, writer, or interlocutor) or a small number of individuals on their own or in a group (e.g., a family, a class, a work team, or a community of practice). The cases are normally studied in depth in order to provide an understanding of individuals’ experiences, issues, insights, developmental pathways, or performance within a particular linguistic, social, or educational context. Rather than discuss constructs, hypotheses, and findings in terms of statistical patterns or trends derived from a larger sample or survey of a population of language learners, as in some quantitative research, a qualitative case study of a person presents a contextualized human profile. Case study has contributed substantially to theory development, generating new perspectives or offering a refutation or refinement of earlier theories in applied linguistics by analyzing linguistic, cultural, and social phenomena associated with children, adolescents, young adults, and older adults.In recent years, the purview of case studies in applied linguistics has expanded to include many previously underrepresented topics, linguistic situations, theoretical perspectives, and populations. This article provides an overview of some traditional areas of coverage and then newer foci in terms of methodology, thematic areas, and findings pertaining to language learners in transnational, multilingual, and diaspora contexts especially.

Patent
Soumitri N. Kolavennu
20 Aug 2014
TL;DR: An HVAC controller may be controlled in response to a natural language message by translating the message into a command it can recognize; voice recognition software creates a natural language text based message from a recorded voice message, and the controller may respond with a natural language message sent back to the user.
Abstract: An HVAC controller may be controlled in response to a natural language message that is not recognizable by the HVAC controller as a command, where the natural language message is translated into a command recognizable by the HVAC controller. Voice recognition software may be used to create a natural language text based message from a recorded voice message, where the natural language text based message is translated into the command recognizable by the HVAC controller. In response to the command, the HVAC controller may perform an action and/or respond with a natural language text based message. Where the HVAC controller responds to the natural language text based message, the HVAC controller may send a natural language text based message back to the user. In some cases, a user may communicate with the thermostat via an on-line social network.
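The translation step the claims describe can be illustrated with a single rule; the command name and phrasing are hypothetical:

```python
# Sketch: map a natural language text message (e.g. from voice recognition)
# to a thermostat command and a natural language confirmation.
import re

def translate(message):
    m = re.search(r"set .* to (\d+)", message.lower())
    if m:
        command = ("SET_TEMP", int(m.group(1)))      # assumed command form
        reply = f"Okay, setting the temperature to {m.group(1)} degrees."
        return command, reply
    return None, "Sorry, I did not understand that."

print(translate("Please set the living room to 72"))
```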

Proceedings ArticleDOI
29 Sep 2014
TL;DR: This paper presents a new model called the Distributed Correspondence Graph (DCG) to infer the most likely set of planning constraints from natural language instructions; comparative experiments demonstrate improvements in the efficiency of natural language understanding without loss of accuracy.
Abstract: Natural language interfaces for robot control aspire to find the best sequence of actions that reflects the behavior intended by the instruction. This is difficult because of the diversity of language, variety of environments, and heterogeneity of tasks. Previous work has demonstrated that probabilistic graphical models constructed from the parse structure of natural language can be used to identify motions that most closely resemble verb phrases. Such approaches, however, quickly succumb to computational bottlenecks imposed by construction and search of the space of possible actions. Planning constraints, which define goal regions and separate the admissible and inadmissible states in an environment model, provide an interesting alternative to represent the meaning of verb phrases. In this paper we present a new model called the Distributed Correspondence Graph (DCG) to infer the most likely set of planning constraints from natural language instructions. A trajectory planner then uses these planning constraints to find a sequence of actions that resembles the instruction. Separating the problem of identifying the action encoded by the language into individual steps of planning constraint inference and motion planning enables us to avoid computational costs associated with the generation and evaluation of many trajectories. We present results from comparative experiments that demonstrate improvements in efficiency in natural language understanding without loss of accuracy.
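The separation the paper argues for, infer constraints first and plan second, can be caricatured in a few lines. This toy phrase lexicon and argument heuristic are mine and bear no resemblance to DCG's probabilistic inference:

```python
# Sketch: map verb phrases to planning constraints, then hand the
# constraints to a planner instead of scoring whole trajectories.
constraint_lexicon = {"go to": "reach", "avoid": "keep_out"}

def infer_constraints(instruction):
    constraints = []
    for phrase, ctype in constraint_lexicon.items():
        if phrase in instruction:
            # toy heuristic: argument is the word after the determiner
            arg = instruction.split(phrase, 1)[1].split()[1]
            constraints.append((ctype, arg))
    return constraints

print(infer_constraints("go to the crate and avoid the pallet"))
# -> [('reach', 'crate'), ('keep_out', 'pallet')]
```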