
Showing papers on "Natural language published in 2013"


Journal ArticleDOI
TL;DR: The proposed system to automatically generate natural language descriptions from images is very effective at producing relevant sentences for images and generates descriptions that are notably more true to the specific image content than previous work.
Abstract: We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second step, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human generated reference descriptions. We also collect forced choice human evaluations between descriptions from the proposed generation system and descriptions from competing approaches. The proposed system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.

791 citations
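
The two-step pipeline described above (content planning followed by surface realization) can be illustrated with a toy sketch; the detector scores, co-occurrence statistics, and templates below are hypothetical placeholders, not the paper's actual model.

```python
# Minimal sketch of a two-stage description pipeline (content planning +
# surface realization). All detections, statistics, and templates are toy values.

# Step 1: content planning -- re-rank noisy detector outputs with
# co-occurrence statistics mined from visually descriptive text.
detections = {"dog": 0.9, "frisbee": 0.6, "cat": 0.3}            # vision scores
text_cooccurrence = {("dog", "frisbee"): 0.8, ("dog", "cat"): 0.1}

def plan_content(detections, cooccurrence, top_k=2):
    """Pick the top_k content words, boosting pairs that co-occur in text."""
    scores = dict(detections)
    for (a, b), strength in cooccurrence.items():
        if a in scores and b in scores:
            scores[a] += 0.5 * strength
            scores[b] += 0.5 * strength
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Step 2: surface realization -- turn the chosen content words into a sentence.
def realize(content_words):
    if len(content_words) == 2:
        return f"A {content_words[0]} with a {content_words[1]}."
    return f"A {content_words[0]}."

print(realize(plan_content(detections, text_cooccurrence)))
# -> "A dog with a frisbee."
```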


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper presents a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object, and uses a Web-scale language model to ``fill in'' novel verbs.
Abstract: Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities ``in-the-wild''. We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help choose an appropriate level of generalization, and priors learned from Web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects. We also use a Web-scale language model to ``fill in'' novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate that it is able to generate short sentence descriptions of video clips better than baseline approaches.

555 citations
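
The back-off idea in this abstract, choosing a more general label when the specific one cannot be predicted confidently, can be sketched as a walk up a semantic hierarchy; the hierarchy and confidence scores below are invented for illustration.

```python
# Toy sketch of "back off to a more general label": if the most specific action
# cannot be predicted confidently, walk up a semantic hierarchy until a
# sufficiently confident ancestor is found. Hierarchy and scores are hypothetical.
hierarchy = {"slicing": "cutting", "cutting": "preparing food",
             "preparing food": "activity"}
scores = {"slicing": 0.2, "cutting": 0.45, "preparing food": 0.8,
          "activity": 0.99}

def generalize(label, threshold=0.6):
    """Return the most specific label on the path whose score passes the threshold."""
    while label in hierarchy and scores.get(label, 0.0) < threshold:
        label = hierarchy[label]
    return label

print(generalize("slicing"))   # -> "preparing food"
```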


Proceedings ArticleDOI
13 May 2013
TL;DR: ClausIE is a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text using a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data.
Abstract: We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of ``useful'' pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, on both high-quality text and noisy text such as that found on the web.

537 citations
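
A rough sketch of the clause-typing idea: map the grammatical functions found in a clause to one of the basic English clause types and emit an extraction. The hand-written "parses" below stand in for real dependency-parser output and are not ClausIE's actual implementation.

```python
# Toy clause typing: classify a clause by the grammatical functions it contains,
# then emit a (subject, verb, argument) extraction. Inputs are hand-written
# stand-ins for dependency-parser output.

def clause_type(functions):
    """Map the set of non-core grammatical functions to a basic clause type."""
    if "dobj" in functions and "iobj" in functions:
        return "SVOO"          # subject - verb - indirect object - direct object
    if "dobj" in functions:
        return "SVO"
    if "complement" in functions:
        return "SVC"
    return "SV"

def extract(clause):
    ctype = clause_type(set(clause) - {"subj", "verb"})
    triple = (clause["subj"], clause["verb"],
              clause.get("dobj") or clause.get("complement") or "")
    return ctype, triple

clause = {"subj": "Albert Einstein", "verb": "died"}
print(extract(clause))   # -> ('SV', ('Albert Einstein', 'died', ''))

clause = {"subj": "Albert Einstein", "verb": "won", "dobj": "the Nobel Prize"}
print(extract(clause))   # -> ('SVO', ('Albert Einstein', 'won', 'the Nobel Prize'))
```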


Journal ArticleDOI
TL;DR: This paper shows that context can be used within a grounded CCG semantic parsing approach that learns a joint model of meaning and context for interpreting and executing natural language instructions, using various types of weak supervision.
Abstract: The context in which language is used provides a strong signal for learning to recover its meaning. In this paper, we show it can be used within a grounded CCG semantic parsing approach that learns a joint model of meaning and context for interpreting and executing natural language instructions, using various types of weak supervision. The joint nature provides crucial benefits by allowing situated cues, such as the set of visible objects, to directly influence learning. It also enables algorithms that learn while executing instructions, for example by trying to replicate human actions. Experiments on a benchmark navigational dataset demonstrate strong performance under differing forms of supervision, including correctly executing 60% more instruction sets relative to the previous state of the art.

530 citations


Proceedings ArticleDOI
04 Sep 2013
TL;DR: This paper discusses some implementation and data processing challenges encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure, and compares the solution to the previous system.
Abstract: There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.

529 citations
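
For readers who want to try entity recognition and disambiguation in this style, a hedged example of calling a DBpedia Spotlight HTTP endpoint follows; the endpoint URL, parameters, and response fields reflect commonly documented usage and should be verified against the current deployment.

```python
# Hedged sketch of calling a DBpedia Spotlight annotation endpoint over HTTP.
# The URL, parameter names, and response fields below follow commonly documented
# usage (assumptions, not taken from the paper); check the current Spotlight docs.
import json
import urllib.parse
import urllib.request

def annotate(text, lang="en", confidence=0.5):
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    url = f"https://api.dbpedia-spotlight.org/{lang}/annotate?{params}"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Each resource carries the matched surface form and the linked DBpedia URI.
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]

if __name__ == "__main__":
    print(annotate("Berlin is the capital of Germany."))
```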


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper generates a rich semantic representation of the visual content including e.g. object and activity labels and proposes to formulate the generation of natural language as a machine translation problem using the semantic representation as source language and the generated sentences as target language.
Abstract: Humans use rich natural language to describe and communicate visual perceptions. In order to provide natural language descriptions for visual content, this paper combines two important ingredients. First, we generate a rich semantic representation of the visual content including e.g. object and activity labels. To predict the semantic representation we learn a CRF to model the relationships between different components of the visual input. And second, we propose to formulate the generation of natural language as a machine translation problem using the semantic representation as source language and the generated sentences as target language. For this we exploit the power of a parallel corpus of videos and textual descriptions and adapt statistical machine translation to translate between our two languages. We evaluate our video descriptions on the TACoS dataset, which contains video snippets aligned with sentence descriptions. Using automatic evaluation and human judgments we show significant improvements over several baseline approaches, motivated by prior work. Our translation approach also shows improvements over related work on an image description task.

438 citations


Patent
28 Feb 2013
TL;DR: In this patent, the authors introduce Z-webs, including Z-factors and Z-nodes, for the understanding of relationships between objects, subjects, abstract ideas, concepts, or the like, including face, car, images, people, emotions, mood, text, natural language, voice, music, video, locations, formulas, facts, historical data, landmarks, personalities, ownership, family, friends, love, happiness, social behavior, voting behavior, and the like.
Abstract: Here, we introduce Z-webs, including Z-factors and Z-nodes, for the understanding of relationships between objects, subjects, abstract ideas, concepts, or the like, including face, car, images, people, emotions, mood, text, natural language, voice, music, video, locations, formulas, facts, historical data, landmarks, personalities, ownership, family, friends, love, happiness, social behavior, voting behavior, and the like, to be used for many applications in our life, including on the search engine, analytics, Big Data processing, natural language processing, economy forecasting, face recognition, dealing with reliability and certainty, medical diagnosis, pattern recognition, object recognition, biometrics, security analysis, risk analysis, fraud detection, satellite image analysis, machine generated data analysis, machine learning, training samples, extracting data or patterns (from the video, images, and the like), editing video or images, and the like. Z-factors include reliability factor, confidence factor, expertise factor, bias factor, and the like, which is associated with each Z-node in the Z-web.

398 citations


Book ChapterDOI
01 Jan 2013
TL;DR: This work discusses the problem of parsing natural language commands to actions and control structures that can be readily implemented in a robot execution system, and learns a parser based on example pairs of English commands and corresponding control language expressions.
Abstract: As robots become more ubiquitous and capable of performing complex tasks, the importance of enabling untrained users to interact with them has increased. In response, unconstrained natural-language interaction with robots has emerged as a significant research area. We discuss the problem of parsing natural language commands to actions and control structures that can be readily implemented in a robot execution system. Our approach learns a parser based on example pairs of English commands and corresponding control language expressions. We evaluate this approach in the context of following route instructions through an indoor environment, and demonstrate that our system can learn to translate English commands into sequences of desired actions, while correctly capturing the semantic intent of statements involving complex control structures. The procedural nature of our formal representation allows a robot to interpret route instructions online while moving through a previously unknown environment.

385 citations
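
A toy sketch of mapping route instructions into a small control language follows; the phrasebook, action names, and "and then" splitting are hypothetical stand-ins, whereas the chapter's system learns the mapping from paired commands and control expressions.

```python
# Toy mapping from route instructions to a small control language. The grammar,
# action names, and phrasebook are hypothetical; the real system learns this
# mapping from example command/expression pairs rather than hand-written rules.
PHRASEBOOK = {
    "turn left": "Turn(left)",
    "turn right": "Turn(right)",
    "go forward": "Move(forward)",
    "go to the kitchen": "GoTo(kitchen)",
}

def parse_command(command):
    """Split on 'and then' and translate each step, yielding an action sequence."""
    steps = [s.strip() for s in command.lower().split("and then")]
    translated = [PHRASEBOOK.get(s, f"Unknown({s!r})") for s in steps]
    return "Sequence(" + ", ".join(translated) + ")"

print(parse_command("Go to the kitchen and then turn left"))
# -> Sequence(GoTo(kitchen), Turn(left))
```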


Proceedings Article
01 Aug 2013
TL;DR: An analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.
Abstract: Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Until now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank and the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

381 citations


Proceedings Article
01 Aug 2013
TL;DR: The dialog state tracking challenge seeks to address this by providing a heterogeneous corpus of 15K human-computer dialogs in a standard format, along with a suite of 11 evaluation metrics, and shows that the suite of performance metrics cluster into 4 natural groups.
Abstract: In a spoken dialog system, dialog state tracking deduces information about the user's goal as the dialog progresses, synthesizing evidence such as dialog acts over multiple turns with external data sources. Recent approaches have been shown to overcome ASR and SLU errors in some applications. However, there are currently no common testbeds or evaluation measures for this task, hampering progress. The dialog state tracking challenge seeks to address this by providing a heterogeneous corpus of 15K human-computer dialogs in a standard format, along with a suite of 11 evaluation metrics. The challenge received a total of 27 entries from 9 research groups. The results show that the suite of performance metrics cluster into 4 natural groups. Moreover, the dialog systems that benefit most from dialog state tracking are those with less discriminative speech recognition confidence scores. Finally, generalization is a key problem: in 2 of the 4 test sets, fewer than half of the entries out-performed simple baselines.

1 Overview and motivation: Spoken dialog systems interact with users via natural language to help them achieve a goal. As the interaction progresses, the dialog manager maintains a representation of the state of the dialog in a process called dialog state tracking (DST). For example, in a bus schedule information system, the dialog state might indicate the user's desired bus route, origin, and destination. Dialog state tracking is difficult because automatic speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user's needs. At the same time, state tracking is crucial because the system relies on the estimated dialog state to choose actions, for example, which bus schedule information to present to the user. Most commercial systems use hand-crafted heuristics for state tracking, selecting the SLU result with the highest confidence score and discarding alternatives. In contrast, statistical approaches compute scores for many hypotheses for the dialog state. By exploiting correlations between turns and information from external data sources, such as maps, bus timetables, or models of past dialogs, statistical approaches can overcome some SLU errors. Numerous techniques for dialog state tracking have been proposed, including heuristic scores (Higashinaka et al., 2003), Bayesian networks (Paek and Horvitz, 2000; Williams and Young, 2007), kernel density estimators (Ma et al., 2012), and discriminative models (Bohus and Rudnicky, 2006). Techniques have been fielded which scale to realistically sized dialog problems and operate in real time (Young et al., 2010; Thomson and Young, 2010; Williams, 2010; Mehta et al., 2010). In end-to-end dialog systems, dialog state tracking has been shown to improve overall system performance (Young et al., 2010; Thomson and Young, 2010). Despite this progress, direct comparisons between methods have not been possible because past studies use different domains and system components for speech recognition, spoken language understanding, dialog control, etc. Moreover, there is little agreement on how to evaluate dialog state tracking. Together these issues limit progress in this research area. The Dialog State Tracking Challenge (DSTC) provides a first common testbed and evaluation framework for this task.

379 citations
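
The core idea of statistical dialog state tracking, maintaining scores over many dialog-state hypotheses and updating them each turn, can be sketched minimally; the slot values, scores, and additive update below are illustrative, not one of the challenge entries.

```python
# Minimal sketch of dialog state tracking over SLU n-best lists: keep a
# distribution over slot values and fold in each turn's (noisy) SLU scores.
# Values and scores are hypothetical; real trackers use learned models.
from collections import defaultdict

def update_belief(belief, slu_hypotheses, weight=1.0):
    """Fold one turn's SLU hypotheses (value -> score) into the belief and renormalize."""
    for value, score in slu_hypotheses.items():
        belief[value] += weight * score
    total = sum(belief.values())
    return defaultdict(float, {v: s / total for v, s in belief.items()})

belief = defaultdict(float)
belief = update_belief(belief, {"route 61C": 0.6, "route 61B": 0.3})   # turn 1
belief = update_belief(belief, {"route 61C": 0.5, "route 64": 0.2})    # turn 2
print(max(belief, key=belief.get))   # -> "route 61C"
```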


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper proposes a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions that captures the most relevant contents of a video in a natural language description.
Abstract: The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have either focused on a top-down approach of generating language through combinations of object detections and language models or bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that final system output has greater agreement with the human descriptions than any single level.

Proceedings Article
Zhengdong Lu, Hang Li
05 Dec 2013
TL;DR: This paper proposes a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains and applies this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question.
Abstract: Many machine learning problems can be interpreted as learning for matching two types of objects (e.g., images and captions, users and products, queries and documents, etc.). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to the natural language problems, and therefore greatly improves upon the state-of-the-art models.
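
A generic illustration of deep text matching (word-by-word interactions pooled into a score) is sketched below with numpy; the embeddings and weights are random placeholders and the architecture is deliberately simplified, not the paper's specific model.

```python
# Minimal numpy sketch of scoring the match between two short texts via
# word-by-word interactions followed by a small nonlinear layer. Embeddings
# and weights are random placeholders; this is a generic illustration only.
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate("how do i reset my password account".split())}
E = rng.normal(size=(len(vocab), 8))           # toy word embeddings

def embed(text):
    return np.stack([E[vocab[w]] for w in text.split() if w in vocab])

def match_score(x, y, W, w_out):
    inter = embed(x) @ embed(y).T              # word-by-word interaction matrix
    pooled = np.array([inter.max(), inter.mean()])
    hidden = np.tanh(W @ pooled)               # small nonlinear layer
    return float(w_out @ hidden)

W, w_out = rng.normal(size=(4, 2)), rng.normal(size=4)
print(match_score("how do i reset my password", "reset password", W, w_out))
```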

Proceedings ArticleDOI
18 May 2013
TL;DR: A novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks is proposed.
Abstract: Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable as compared to the natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is ableto identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.
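
The LDA-GA idea, searching LDA configurations with a genetic algorithm, can be sketched with a toy fitness function; in the paper the fitness is derived from clustering quality on the SE task, whereas the optimum below is invented for illustration.

```python
# Compact sketch of the LDA-GA idea: search LDA configurations (number of
# topics, alpha, beta) with a simple genetic algorithm. The fitness function
# is a placeholder with an invented optimum; the paper uses clustering quality.
import random

random.seed(42)

def fitness(cfg):
    """Placeholder: plug in e.g. the silhouette of document clusters under cfg."""
    k, alpha, beta = cfg
    return -abs(k - 25) - abs(alpha - 0.1) - abs(beta - 0.01)   # toy optimum

def mutate(cfg):
    k, alpha, beta = cfg
    return (max(2, k + random.choice([-5, 0, 5])),
            max(0.01, alpha * random.uniform(0.5, 1.5)),
            max(0.001, beta * random.uniform(0.5, 1.5)))

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

population = [(random.randint(5, 100), random.uniform(0.01, 1.0), random.uniform(0.001, 0.1))
              for _ in range(20)]
for _ in range(30):                            # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                            for _ in range(10)]

print("near-optimal LDA configuration:", max(population, key=fitness))
```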

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon, which addresses both approaches to SA for the Arabic language.
Abstract: The emergence of the Web 2.0 technology generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the web. Processing this raw data to extract useful information can be a very challenging task. An example of important information that can be automatically extracted from users' posts and comments is their opinions on different issues, events, services, products, etc. This problem of Sentiment Analysis (SA) has been studied well for the English language, and two main approaches have been devised: corpus-based and lexicon-based. This paper addresses both approaches to SA for the Arabic language. Since there are only a limited number of publicly available Arabic datasets and Arabic lexicons for SA, this paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon. Experiments are conducted throughout the different stages of this process to observe the improvements gained in the accuracy of the system and to compare them to the corpus-based approach.
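
The lexicon-based side of sentiment analysis can be sketched in a few lines; the tiny lexicon below is a placeholder, while the paper's contribution is the manually built Arabic dataset and the much larger lexicon.

```python
# Minimal sketch of lexicon-based sentiment scoring: sum the polarities of
# known words in a post. The three-entry lexicon is only a placeholder.
LEXICON = {"جميل": +1, "رائع": +1, "سيء": -1}   # beautiful, great, bad

def score(text):
    total = sum(LEXICON.get(w, 0) for w in text.split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(score("الفيلم جميل"))   # "the film is beautiful" -> positive
```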

Book ChapterDOI
01 Jan 2013
TL;DR: The latest iteration of ConceptNet is presented, ConceptNet 5, with a focus on its fundamental design decisions and ways to interoperate with it.
Abstract: ConceptNet is a knowledge representation project, providing a large semantic graph that describes general human knowledge and how it is expressed in natural language. Here we present the latest iteration, ConceptNet 5, with a focus on its fundamental design decisions and ways to interoperate with it.
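
A hedged example of interoperating with ConceptNet over its public Web API follows; the endpoint and JSON field names match the current api.conceptnet.io documentation, which postdates the ConceptNet 5 release described here, so treat them as assumptions.

```python
# Hedged sketch of querying ConceptNet over its public Web API. The endpoint
# and JSON field names follow the current api.conceptnet.io documentation
# (a later deployment than ConceptNet 5); verify before relying on them.
import json
import urllib.request

def edges_for(concept, lang="en", limit=5):
    url = f"https://api.conceptnet.io/c/{lang}/{concept}?limit={limit}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Each edge links a start node to an end node via a labeled relation.
    return [(e["start"]["label"], e["rel"]["label"], e["end"]["label"])
            for e in data.get("edges", [])]

if __name__ == "__main__":
    for triple in edges_for("coffee"):
        print(triple)
```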

Journal ArticleDOI
TL;DR: This paper presents an information granulation of the linguistic information used in group decision making problems defined in heterogeneous contexts, i.e., where the experts have associated importance degrees reflecting their ability to handle the problem.

Patent
Ming Zhou, Furu Wei, Xiaohua Liu, Hong Sun, Yajuan Duan, Chengjie Sun, Heung-Yeung Shum
02 Jul 2013
TL;DR: In this article, a natural language question is analyzed to extract query units and to determine a question type, answer type, and/or lexical answer type using rules-based heuristics and machine learning trained classifiers.
Abstract: Techniques described enable answering a natural language question using machine learning-based methods to gather and analyze evidence from web searches. A received natural language question is analyzed to extract query units and to determine a question type, answer type, and/or lexical answer type using rules-based heuristics and/or machine learning trained classifiers. Query generation templates are employed to generate a plurality of ranked queries to be used to gather evidence to determine the answer to the natural language question. Candidate answers are extracted from the results based on the answer type and/or lexical answer type, and ranked using a ranker previously trained offline. Confidence levels are calculated for the candidate answers and top answer(s) may be provided to the user if the confidence levels of the top answer(s) surpass a threshold.

Patent
08 Jan 2013
TL;DR: In this article, a phrase-based modeling of generic structures of verbal interaction is proposed for the purpose of automating part of the design of grammar networks, which can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces.
Abstract: The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.

Patent
17 May 2013
TL;DR: In this article, a system and method for representing, storing and retrieving real-world knowledge on a computer or network of computers is disclosed, where knowledge is broken down into permanent atomic "facts" which can be stored in a standard relational database and processed very efficiently.
Abstract: A system and method for representing, storing and retrieving real-world knowledge on a computer or network of computers is disclosed. Knowledge is broken down into permanent atomic “facts” which can be stored in a standard relational database and processed very efficiently. It also provides for the efficient querying of a knowledge base, efficient inference of new knowledge and translation into and out of natural language. Queries can also be processed with full natural language explanations of where the answers came from. The method can also be used in a distributed fashion enabling the system to be a large network of computers and the technology can be integrated into a web browser adding to the browser's functionality.
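
The "atomic facts in a relational database" idea can be sketched with SQLite; the table layout and facts below are illustrative only and do not reproduce the patented system.

```python
# Minimal sketch of storing atomic subject-relation-object facts in a
# relational database and answering a simple query. Contents are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("paris", "is_capital_of", "france"),
    ("france", "is_a", "country"),
])

def query(subject, relation):
    rows = conn.execute(
        "SELECT object FROM facts WHERE subject = ? AND relation = ?",
        (subject, relation)).fetchall()
    return [r[0] for r in rows]

print(query("paris", "is_capital_of"))   # -> ['france']
```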

Journal ArticleDOI
TL;DR: This paper overviews the relationship between decision making and CWW, and focuses on symbolic linguistic computing models that have been widely used in linguistic decision making to analyse if all of them can be considered inside of the CWW paradigm.
Abstract: It is common that experts involved in complex real-world decision problems use natural language for expressing their knowledge in uncertain frameworks. The language is inherent vague, hence probabilistic decision models are not very suitable in such cases. Therefore, other tools such as fuzzy logic and fuzzy linguistic approaches have been successfully used to model and manage such vagueness. The use of linguistic information implies to operate with such a type of information, i.e. processes of computing with words (CWW). Different schemes have been proposed to deal with those processes, and diverse symbolic linguistic computing models have been introduced to accomplish the linguistic computations. In this paper, we overview the relationship between decision making and CWW, and focus on symbolic linguistic computing models that have been widely used in linguistic decision making to analyse if all of them can be considered inside of the CWW paradigm.
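
As a concrete example of a symbolic linguistic computing model used in CWW, the widely cited 2-tuple representation encodes an aggregated value without loss of information; the term set and expert opinions below are illustrative.

```python
# Worked example of the 2-tuple linguistic representation: an aggregated value
# beta in [0, g] is encoded as (s_i, alpha) with i = round(beta) and
# alpha = beta - i, so no information is lost. The term set is illustrative.
S = ["none", "very_low", "low", "medium", "high", "very_high", "perfect"]

def to_two_tuple(beta):
    i = round(beta)
    return S[i], beta - i

# Aggregate three expert opinions given as term indices, then re-encode.
opinions = [3, 4, 4]                     # medium, high, high
beta = sum(opinions) / len(opinions)     # 3.666...
print(to_two_tuple(beta))                # -> ('high', -0.333...)
```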

Journal ArticleDOI
TL;DR: This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment and finds that LSP outperforms existing, less expressive models that cannot represent relational language.
Abstract: This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment. For example, given an image, LSP can map the statement “blue mug on the table” to the set of image segments showing blue mugs on tables. LSP learns physical representations for both categorical (“blue,” “mug”) and relational (“on”) language, and also learns to compose these representations to produce the referents of entire statements. We further introduce a weakly supervised training procedure that estimates LSP’s parameters using annotated referents for entire statements, without annotated referents for individual words or the parse structure of the statement. We perform experiments on two applications: scene understanding and geographical question answering. We find that LSP outperforms existing, less expressive models that cannot represent relational language. We further find that weakly supervised training is competitive with fully supervised training while requiring significantly less annotation effort.

Journal ArticleDOI
TL;DR: The results showed that language ability predicts gains in data analysis/probability and geometry, but not in arithmetic or algebra, after controlling for visual-spatial working memory, reading ability, and sex, which suggests that early language experiences are important for later mathematical development regardless of language background.

Journal ArticleDOI
01 Jan 2013-Database
TL;DR: A simple extensible mark-up language format to share text documents and annotations is proposed, which allows a large number of different annotations to be represented, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities.
Abstract: A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/
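
A hedged sketch of reading a small BioC-style document with the standard library follows; the element and attribute names are taken from published BioC examples and should be checked against the DTD at http://bioc.sourceforge.net/.

```python
# Hedged sketch of parsing a small BioC-style XML document with the standard
# library. Element and attribute names follow published BioC examples
# (collection/document/passage/annotation with infon/location); check the DTD.
import xml.etree.ElementTree as ET

BIOC_XML = """
<collection>
  <source>example</source>
  <document>
    <id>12345</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 is associated with breast cancer.</text>
      <annotation id="T1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>
"""

root = ET.fromstring(BIOC_XML)
for ann in root.iter("annotation"):
    ann_type = ann.find("infon[@key='type']").text
    print(ann.get("id"), ann_type, ann.find("text").text)
# -> T1 gene BRCA1
```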

Patent
25 Jun 2013
TL;DR: In this article, an electronic device with one or more processors and memory provides a digital photograph of a real-world scene and provides a natural language text string corresponding to a speech input associated with the digital photograph.
Abstract: The electronic device with one or more processors and memory provides a digital photograph of a real-world scene. The electronic device provides a natural language text string corresponding to a speech input associated with the digital photograph. The electronic device performs natural language processing on the text string to identify one or more terms associated with an entity, an activity, or a location. The electronic device tags the digital photograph with the one or more terms and their associated entity, activity, or location.

Journal ArticleDOI
01 Jan 2013
TL;DR: The negative association between the number of second language speakers and nominal case complexity generalizes to different language areas and families, supporting the idea that morphosyntactic complexity is reduced by a high degree of language contact involving adult learners.
Abstract: In this paper, we provide quantitative evidence showing that languages spoken by many second language speakers tend to have relatively small nominal case systems or no nominal case at all. In our sample, all languages with more than 50% second language speakers had no nominal case. The negative association between the number of second language speakers and nominal case complexity generalizes to different language areas and families. As there are many studies attesting to the difficulty of acquiring morphological case in second language acquisition, this result supports the idea that languages adapt to the cognitive constraints of their speakers, as well as to the sociolinguistic niches of their speaking communities. We discuss our results with respect to sociolinguistic typology and the Linguistic Niche Hypothesis, as well as with respect to qualitative data from historical linguistics. All in all, multiple lines of evidence converge on the idea that morphosyntactic complexity is reduced by a high degree of language contact involving adult learners.

Journal ArticleDOI
22 Jul 2013
TL;DR: A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only syntagmatic relations are annotated in the text, but also paradigmatic relations are made explicit by generating lexical expansions.
Abstract: A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only syntagmatic relations are annotated in the text, but also paradigmatic relations are made explicit by generating lexical expansions. We operationalize distributional similarity in a general framework for large corpora, and describe a new method to generate similar terms in context. Our evaluation shows that distributional similarity is able to produce high-quality lexical resources in an unsupervised and knowledge-free way, and that our highly scalable similarity measure yields better scores in a WordNet-based evaluation than previous measures for very large corpora. Evaluating on a lexical substitution task, we find that our contextualization method improves over a non-contextualized baseline across all parts of speech, and we show how the metaphor can be applied successfully to part-of-speech tagging. A number of ways to extend and improve the contextualization method within our framework are discussed. As opposed to comparable approaches, our framework defines a model of lexical expansions in context that can generate the expansions as opposed to ranking a given list, and thus does not require existing lexical-semantic resources.
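
The distributional-similarity backbone of the framework can be sketched with toy context counts and cosine similarity; real counts would come from a large corpus, and the paper's contribution is the scalable, contextualized version of this computation.

```python
# Small sketch of distributional similarity: represent each term by its
# context counts and compare terms with cosine similarity. The toy counts
# stand in for counts collected from a large corpus.
import math

contexts = {
    "coffee": {"drink": 4, "hot": 3, "cup": 5},
    "tea":    {"drink": 3, "hot": 4, "cup": 4},
    "car":    {"drive": 5, "road": 4, "fast": 2},
}

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

print(cosine(contexts["coffee"], contexts["tea"]))   # high similarity
print(cosine(contexts["coffee"], contexts["car"]))   # 0.0
```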


Proceedings Article
01 Aug 2013
TL;DR: A sophisticated template-based approach is introduced that incorporates semantic role labels into a system that automatically generates natural language questions to support online learning; preliminary evaluation indicates it is a promising approach for supporting learning.
Abstract: When instructors prepare learning materials for students, they frequently develop accompanying questions to guide learning. Natural language processing technology can be used to automatically generate such questions but techniques used have not fully leveraged semantic information contained in the learning materials or the full context in which the question generation task occurs. We introduce a sophisticated template-based approach that incorporates semantic role labels into a system that automatically generates natural language questions to support online learning. While we have not yet incorporated the full learning context into our approach, our preliminary evaluation and evaluation methodology indicate our approach is a promising one for supporting learning.
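
A toy sketch of template-based question generation over semantic role labels follows; the SRL frame and templates are hand-written placeholders rather than the system's actual templates.

```python
# Toy template-based question generation over semantic role labels: given a
# predicate and its arguments, fill a matching template. The frame and
# templates are hand-written placeholders; a real system obtains frames from
# an SRL parser and selects templates more carefully.
TEMPLATES = {
    ("ARG0", "ARG1"): "What did {ARG0} {predicate}?",
    ("ARG0",):        "Who {predicate}?",
}

def generate_question(frame):
    args = tuple(sorted(k for k in frame if k.startswith("ARG")))
    template = TEMPLATES.get(args)
    return template.format(**frame) if template else None

frame = {"predicate": "discover", "ARG0": "Marie Curie", "ARG1": "radium"}
print(generate_question(frame))   # -> "What did Marie Curie discover?"
```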

Journal ArticleDOI
22 Oct 2013-PLOS ONE
TL;DR: It is argued that this consensus figure vastly underestimates the danger of digital language death, in that less than 5% of all languages can still ascend to the digital realm.
Abstract: Of the approximately 7,000 languages spoken today, some 2,500 are generally considered endangered. Here we argue that this consensus figure vastly underestimates the danger of digital language death, in that less than 5% of all languages can still ascend to the digital realm. We present evidence of a massive die-off caused by the digital divide.

Book ChapterDOI
01 Jan 2013
TL;DR: Permission statements contain occurrences of a certain one-place existential operator which are often concealed in the surface structure; this operator is represented by the letter 'F' and called the 'focus operator', as its function is to move the sub-formula that stands within its scope into focus and thereby subject it to the particular semantic-pragmatic operation associated with the mode, or pragmatic function.
Abstract: Permission statements contain occurrences of a certain one-place existential operator which are often concealed in the surface structure. We represent this operator by the letter 'F' and call F the 'focus operator', as its function is to 'move' the sub-formula that stands within its scope into focus and thereby subject it to the particular semantic-pragmatic operation associated with the mode, or pragmatic function. It has been a tacit assumption behind most formal analyses of natural language that at least all assertions have the same pragmatic function. The semantic analysis of natural language is of fundamentally greater complexity than is imagined by those who believe that the semantics of at least the declarative parts of natural languages can be described by theories based on no other concepts than those of satisfaction and truth. Keywords: language; permission; pragmatic; semantics