
Showing papers on "Natural language published in 2012"


Proceedings Article
12 Jul 2012
TL;DR: A recursive neural network model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length and can learn the meaning of operators in propositional logic and natural language is introduced.
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state-of-the-art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews; and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
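
To make the vector-and-matrix idea concrete, here is a minimal numpy sketch of one composition step over the two children of a parse-tree node. The dimensionality, the tanh nonlinearity, and the random parameters are illustrative assumptions, not the trained model; the composition forms follow the abstract's description (each child's matrix transforms the other child's vector).

```python
import numpy as np

n = 4  # embedding dimensionality (illustrative)

def compose(a, A, b, B, W, W_M):
    """One matrix-vector RNN composition step (sketch).
    Assumed forms: p = g(W [B a; A b]) and P = W_M [A; B]."""
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # parent vector, shape (n,)
    P = W_M @ np.vstack([A, B])                      # parent matrix, shape (n, n)
    return p, P

# Toy, untrained parameters.
rng = np.random.default_rng(0)
a, b = rng.normal(size=n), rng.normal(size=n)            # child vectors
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))  # child matrices
W = rng.normal(size=(n, 2 * n))
W_M = rng.normal(size=(n, 2 * n))
p, P = compose(a, A, b, B, W, W_M)
print(p.shape, P.shape)  # (4,) (4, 4)
```

The resulting (p, P) pair feeds the next composition up the tree, which is what lets operators like "not" or "very" act on their arguments through their matrices.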

1,400 citations


01 Jan 2012
TL;DR: In this article, the authors investigate whether there is any relationship between language and culture and, if so, what that relationship is; the results of their study indicate that language and culture are very closely related.
Abstract: Language, the most commonplace of all human possessions, is possibly the most complex and the most interesting. It is an instrument for humans' communication with each other, for the growth and development of their talents, for creativity, innovation, and novelty, for exchanging and transferring their experiences, and, on the whole, for the formation of societies. Concern with language is not new: from the earliest recorded history, there is evidence that people investigated language, and many of the assumptions, theories, and goals of modern linguistics find their origin in past centuries. This study aims to investigate whether there is any relationship between language and culture and, if so, what that relationship is. To achieve this aim, some of the main theories related to the goal of the paper are introduced and explained, followed by a detailed discussion. The results indicate that there is a very close relationship between language and culture: culture has a direct effect on language, and the two are closely correlated.

846 citations


Posted Content
TL;DR: In this paper, a learning algorithm that takes as input a training set of sentences labeled with expressions in the lambda calculus is presented, along with a log-linear model that represents a distribution over syntactic and semantic analyses conditioned on the input sentence.
Abstract: This paper addresses the problem of mapping natural language sentences to lambda-calculus encodings of their meaning. We describe a learning algorithm that takes as input a training set of sentences labeled with expressions in the lambda calculus. The algorithm induces a grammar for the problem, along with a log-linear model that represents a distribution over syntactic and semantic analyses conditioned on the input sentence. We apply the method to the task of learning natural language interfaces to databases and show that the learned parsers outperform previous methods in two benchmark database domains.
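
A hedged sketch of the log-linear piece described above: the model defines a distribution over candidate syntactic/semantic analyses conditioned on the sentence, P(d | x) proportional to exp(theta . phi(d, x)). The candidate lambda-calculus parses and feature names below are hypothetical.

```python
import math

def loglinear_probs(candidates, theta):
    """P(derivation | sentence) proportional to exp(theta . phi).
    `candidates` maps each candidate analysis to its feature counts."""
    scores = {d: sum(theta.get(f, 0.0) * v for f, v in phi.items())
              for d, phi in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {d: math.exp(s) / z for d, s in scores.items()}

# Hypothetical analyses of "flights from Boston".
candidates = {
    "lambda x. flight(x) & from(x, boston)": {"lex:flights": 1, "lex:from": 1},
    "lambda x. from(x, boston)":             {"lex:from": 1},
}
theta = {"lex:flights": 1.2, "lex:from": 0.8}
print(loglinear_probs(candidates, theta))
```

Training would then adjust theta so that the labeled lambda-calculus expression receives most of the probability mass for its sentence.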

662 citations


Proceedings ArticleDOI
02 Jun 2012
TL;DR: Conjectures that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations, and thus, like natural language, is likely to be repetitive and predictable.
Abstract: Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations — and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's built-in completion capability. We conclude the paper by laying out a vision for future research in this area.
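
As a rough illustration of the paper's modeling choice, the sketch below trains a tiny trigram model over code tokens and predicts the next token, which is essentially what an n-gram completion engine does (the toy corpus and smoothing-free counts are simplifying assumptions).

```python
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Count next-token frequencies for every two-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - 2):
        counts[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return counts

def suggest(counts, context, k=3):
    """Most frequent continuations of a two-token context."""
    return [tok for tok, _ in counts[context].most_common(k)]

# Toy corpus of Java-like tokens; real models train on large codebases.
tokens = ("for ( int i = 0 ; i < n ; i ++ ) { } "
          "for ( int j = 0 ; j < n ; j ++ ) { }").split()
model = train_trigram(tokens)
print(suggest(model, ("(", "int")))  # e.g. ['i', 'j']
```

The paper's observation is that such models have far lower entropy on code than on English text, which is why even this simple scheme can improve completion.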

642 citations


Patent
28 Sep 2012
TL;DR: In this article, a virtual assistant uses context information to supplement natural language or gestural input from a user; context helps to clarify the user's intent, reduces the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input.
Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.
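
One way to picture this idea is a scoring pass that prefers interpretations consistent with the available context signals. Everything below (the signal names, candidate structures, and scoring rule) is a hypothetical sketch, not the patented method.

```python
def rank_interpretations(candidates, context):
    """Prefer candidate interpretations consistent with context signals;
    unspecified slots are treated as compatible."""
    def score(interp):
        return sum(1 for key, value in context.items()
                   if interp.get(key) in (None, value))
    return sorted(candidates, key=score, reverse=True)

# "Call him back": context supplies the likely referent of "him".
context = {"last_caller": "John Appleseed", "app": "phone"}
candidates = [
    {"intent": "call", "contact": "John Appleseed", "last_caller": "John Appleseed"},
    {"intent": "call", "contact": "Jim", "last_caller": "Jim"},
]
print(rank_interpretations(candidates, context)[0]["contact"])  # John Appleseed
```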

593 citations


Proceedings Article
01 May 2012
TL;DR: ConceptNet 5, the latest iteration of the ConceptNet project, is presented, including its fundamental design decisions, ways to use it, and evaluations of its coverage and accuracy.
Abstract: ConceptNet is a knowledge representation project, providing a large semantic graph that describes general human knowledge and how it is expressed in natural language. This paper presents the latest iteration, ConceptNet 5, including its fundamental design decisions, ways to use it, and evaluations of its coverage and accuracy.
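
The semantic graph can be pictured as weighted (start, relation, end) assertions. The toy edges and lookup below are illustrative only; ConceptNet's actual data model and API carry more structure (sources, licenses, surface text).

```python
# Toy assertions in the spirit of ConceptNet's weighted edges.
edges = [
    ("cat", "IsA", "animal", 2.0),
    ("cat", "CapableOf", "purr", 1.0),
    ("animal", "AtLocation", "zoo", 0.5),
]

def related(concept):
    """All assertions mentioning a concept, strongest first."""
    hits = [e for e in edges if concept in (e[0], e[2])]
    return sorted(hits, key=lambda e: e[3], reverse=True)

print(related("cat"))  # [('cat', 'IsA', 'animal', 2.0), ('cat', 'CapableOf', 'purr', 1.0)]
```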

475 citations


Proceedings Article
01 Dec 2012
TL;DR: This work induces distributed representations for a pair of languages jointly and shows that these representations are informative by using them for crosslingual document classification, where classifiers trained on these representations substantially outperform strong baselines when applied to a new language.
Abstract: Distributed representations of words have proven extremely useful in numerous natural language processing tasks. Their appeal is that they can help alleviate data sparsity problems common to supervised learning. Methods for inducing these representations require only unlabeled language data, which are plentiful for many natural languages. In this work, we induce distributed representations for a pair of languages jointly. We treat it as a multitask learning problem where each task corresponds to a single word, and task relatedness is derived from co-occurrence statistics in bilingual parallel data. These representations can be used for a number of crosslingual learning tasks, where a learner can be trained on annotations present in one language and applied to test data in another. We show that our representations are informative by using them for crosslingual document classification, where classifiers trained on these representations substantially outperform strong baselines (e.g. machine translation) when applied to a new language.
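
The practical payoff of a shared bilingual space is that a classifier trained on one language transfers directly to the other. The sketch below fakes jointly induced vectors with toy data purely to show the transfer mechanics; real representations come from the multitask training the abstract describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Pretend these were induced jointly: translation pairs get nearby vectors.
v_finance, v_money = rng.normal(size=8), rng.normal(size=8)
en_vecs = {"bank": v_finance, "money": v_money}
de_vecs = {"bank": v_finance + 0.05, "geld": v_money + 0.05}

def doc_vec(tokens, vecs):
    """Average the vectors of known tokens to embed a document."""
    return np.mean([vecs[t] for t in tokens if t in vecs], axis=0)

# Train on English labels only, then apply to a German document.
X = np.stack([doc_vec(["bank", "money"], en_vecs), doc_vec(["money"], en_vecs)])
clf = LogisticRegression().fit(X, [1, 0])
print(clf.predict([doc_vec(["bank", "geld"], de_vecs)]))
```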

432 citations


Proceedings Article
08 Jul 2012
TL;DR: A holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web to generate novel descriptions for query images.
Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.
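
The retrieval half of this pipeline can be sketched in a few lines: find visually similar database images and harvest their human-written phrases as candidates for composition. The feature vectors and phrases below are toy stand-ins; the paper's actual contribution is the constraint-optimization step that selects and orders such phrases.

```python
import numpy as np

def candidate_phrases(query_feat, database, k=2):
    """Phrases attached to the k visually nearest database images."""
    ranked = sorted(database, key=lambda rec: np.linalg.norm(query_feat - rec[0]))
    return [p for _, phrases in ranked[:k] for p in phrases]

database = [
    (np.array([0.9, 0.1]), ["a dog runs", "on the beach"]),
    (np.array([0.8, 0.2]), ["a brown dog", "in the sand"]),
    (np.array([0.1, 0.9]), ["a city skyline", "at night"]),
]
print(candidate_phrases(np.array([0.85, 0.15]), database))
```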

353 citations


Proceedings Article
21 Mar 2012
TL;DR: This work proposes a method that learns to assign MRs to a wide range of text thanks to a training scheme that combines learning from knowledge bases with learning from raw text.
Abstract: Open-text semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR – a formal representation of its sense). Unfortunately, large scale systems cannot be easily machine-learned due to a lack of directly supervised data. We propose a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70,000 words mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. The model jointly learns representations of words, entities and MRs via a multi-task training process operating on these diverse sources of data. Hence, the system ends up providing methods for knowledge acquisition and wordsense disambiguation within the context of semantic parsing in a single elegant framework. Experiments on these various tasks indicate the promise of the approach.
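
A rough picture of scoring candidate meaning representations in a shared embedding space, under loudly flagged assumptions: the additive energy below (a TransE-style form) is a stand-in, not necessarily the energy function this paper uses, and all vectors are random toys.

```python
import numpy as np

rng = np.random.default_rng(2)
emb = {s: rng.normal(size=6)
       for s in ("_is_a", "cat.n.01", "pet.n.01", "animal.n.01")}

def energy(lhs, rel, rhs):
    """Lower energy = more plausible triple (assumed additive form)."""
    return np.linalg.norm(emb[lhs] + emb[rel] - emb[rhs])

# Word-sense disambiguation as MR selection: pick the sense whose
# triple with the rest of the MR has the lowest energy.
best = min(("cat.n.01", "pet.n.01"),
           key=lambda sense: energy(sense, "_is_a", "animal.n.01"))
print(best)
```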

350 citations


PatentDOI
TL;DR: A natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries; the system includes a Web-enabled device with a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device.
Abstract: The disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.
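
The module chain in the abstract is essentially a pipeline from recognized text to a formal database query. The sketch below is a hypothetical, heavily simplified rendering of the later stages (speech conversion is assumed to have happened upstream); the table and column names are made up.

```python
def to_searchable(text):
    """Natural-language text -> searchable terms (toy NL processing)."""
    stop = {"what", "is", "the", "a", "near", "me", "of"}
    return [w for w in text.lower().rstrip("?").split() if w not in stop]

def to_db_query(terms, location=None):
    """Searchable terms + location/proximity signal -> formal query."""
    where = " AND ".join(f"keywords LIKE '%{t}%'" for t in terms)
    if location:
        where += f" AND location = '{location}'"
    return f"SELECT * FROM places WHERE {where};"

terms = to_searchable("What is the nearest pharmacy near me?")
print(to_db_query(terms, location="gate B12"))
```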

303 citations


Proceedings ArticleDOI
16 Apr 2012
TL;DR: The authors showed that conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U. S. Supreme Court, and proposed an analysis framework based on linguistic coordination that can be used to shed light on power relationships.
Abstract: Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language --- either spoken or written --- and one could reasonably suppose that signals manifested in language might also provide information about roles, status, and other aspects of the group's dynamics. To date, however, finding domain-independent language-based signals has been a challenge. Here, we show that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. Starting from this observation, we propose an analysis framework based on linguistic coordination that can be used to shed light on power relationships and that works consistently across multiple types of power --- including a more "static" form of power based on status differences, and a more "situational" form of power in which one individual experiences a type of dependence on another. Using this framework, we study how conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U. S. Supreme Court.
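
The coordination measure at the heart of this framework can be written down compactly: for a linguistic marker m (say, quantifiers), B's coordination toward A is how much more likely B is to use m in a reply when A's preceding utterance used it, relative to B's baseline rate. A minimal sketch, assuming a simple paired-exchange representation:

```python
def coordination(exchanges):
    """exchanges: list of (a_used_marker, b_used_marker) booleans per
    reply pair. Returns P(b | a) - P(b), positive when B coordinates
    toward A on this marker."""
    p_b = sum(b for _, b in exchanges) / len(exchanges)
    replies_to_marked = [b for a, b in exchanges if a]
    p_b_given_a = sum(replies_to_marked) / len(replies_to_marked)
    return p_b_given_a - p_b

exchanges = [(True, True), (True, True), (False, False),
             (False, True), (True, False), (False, False)]
print(round(coordination(exchanges), 3))  # 0.167: B echoes A's marker
```

Aggregating such per-marker scores across participants is what lets the paper compare coordination toward high- versus low-status members.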

Posted Content
TL;DR: This paper presents an approach for joint learning of language and perception models for grounded attribute induction, using a probabilistic categorial grammar that enables the construction of rich, compositional meaning representations.
Abstract: As robots become more ubiquitous and capable, it becomes ever more important to enable untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to perception and actuation in the physical world. In this paper, we present an approach for joint learning of language and perception models for grounded attribute induction. Our perception model includes attribute classifiers, for example to detect object color and shape, and the language model is based on a probabilistic categorial grammar that enables the construction of rich, compositional meaning representations. The approach is evaluated on the task of interpreting sentences that describe sets of objects in a physical workspace. We demonstrate accurate task performance and effective latent-variable concept induction in physical grounded scenes.

Journal ArticleDOI
TL;DR: The authors investigated the relationship between statistical learning and language using a within-subject design embedded in an individual-differences framework and found that performance on the two statistical learning tasks was the only predictor for comprehending relevant types of natural language sentences.
Abstract: Although statistical learning and language have been assumed to be intertwined, this theoretical presupposition has rarely been tested empirically. The present study investigates the relationship between statistical learning and language using a within-subject design embedded in an individual-differences framework. Participants were administered separate statistical learning tasks involving adjacent and nonadjacent dependencies, along with a language comprehension task and a battery of other measures assessing verbal working memory, short-term memory, vocabulary, reading experience, cognitive motivation, and fluid intelligence. Strong interrelationships were found among statistical learning, verbal working memory, and language comprehension. However, when the effects of all other factors were controlled for, performance on the two statistical learning tasks was the only predictor for comprehending relevant types of natural language sentences.

Proceedings Article
12 Jul 2012
TL;DR: This paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources, based on an integer linear program to solve several disambiguation tasks jointly.
Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the question translation and the resulting query answering.
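
For concreteness, here is the kind of input/output pair the method targets, wrapped in Python only to keep one language across these sketches. The question, IRIs, and triple patterns are invented for illustration and are not taken from the paper's experiments.

```python
question = "Which films were directed by Sofia Coppola?"

# A plausible target query: the ILP jointly chooses the segmentation
# ("films" / "directed by" / "Sofia Coppola"), the mapping of each
# phrase to a class, relation, or entity, and these triple patterns.
sparql = """SELECT ?film WHERE {
  ?film rdf:type dbo:Film .
  ?film dbo:director dbr:Sofia_Coppola .
}"""
print(sparql)
```

The type system enters as constraints: a relation like dbo:director expects a film-like subject and a person-like object, which is what steers the semantic-coherence objective away from incoherent mappings.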

Journal ArticleDOI
TL;DR: An alternative to speech-exclusive approaches to language acquisition exists in the use of sign languages such as ASL, where acquiring a sign language is subject to the same time constraints as spoken language development.
Abstract: Children acquire language without instruction as long as they are regularly and meaningfully engaged with an accessible human language. Today, 80% of children born deaf in the developed world are implanted with cochlear devices that allow some of them access to sound in their early years, which helps them to develop speech. However, because of brain plasticity changes during early childhood, children who have not acquired a first language in the early years might never be completely fluent in any language. If they miss this critical period for exposure to a natural language, their subsequent development of the cognitive activities that rely on a solid first language might be underdeveloped, such as literacy, memory organization, and number manipulation. An alternative to speech-exclusive approaches to language acquisition exists in the use of sign languages such as American Sign Language (ASL), where acquiring a sign language is subject to the same time constraints as spoken language development. Unfortunately, so far, these alternatives are caught up in an "either-or" dilemma, leading to a highly polarized conflict about which system families should choose for their children, with little tolerance for alternatives by either side of the debate and widespread misinformation about the evidence and implications for or against either approach. The success rate with cochlear implants is highly variable. This issue is still debated, and as far as we know, there are no reliable predictors for success with implants. Yet families are often advised not to expose their child to sign language. Here absolute positions based on ideology create pressures for parents that might jeopardize the real developmental needs of deaf children. What we do know is that cochlear implants do not offer accessible language to many deaf children. By the time it is clear that the deaf child is not acquiring spoken language with cochlear devices, it might already be past the critical period, and the child runs the risk of becoming linguistically deprived. Linguistic deprivation constitutes multiple personal harms as well as harms to society (in terms of costs to our medical systems and in loss of potential productive societal participation).

Book
12 Jul 2012
TL;DR: Computing with Words (CWW), as discussed by the authors, is a system of computation in which the objects of computation are predominantly words, phrases and propositions drawn from a natural language; CWW is based on fuzzy logic.
Abstract: In essence, Computing with Words (CWW) is a system of computation in which the objects of computation are predominantly words, phrases and propositions drawn from a natural language. CWW is based on fuzzy logic. In science there is a deep-seated tradition of according much more respect to numbers than to words; in a fundamental way, CWW is a challenge to this tradition. What is not widely recognized is that, today, words are used in place of numbers in a wide variety of applications ranging from digital cameras and household appliances to fraud detection systems, biomedical instrumentation and subway trains. CWW offers a unique capability: the capability to precisiate natural language. Unprecisiated (raw) natural language cannot be computed with. A key concept which underlies precisiation of meaning is that of the meaning postulate: a proposition, p, is a restriction on the values which a variable, X (a variable which is implicit in p), is allowed to take. CWW has an important ramification for mathematics. Addition of the formalism of CWW to mathematics empowers mathematics to construct mathematical solutions of computational problems which are stated in a natural language. Traditional mathematics does not have this capability.
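
The meaning postulate admits a compact worked example. The rendering below follows Zadeh's standard CWW illustration; the notation is a common textbook form rather than a quotation from this book.

```latex
% Meaning postulate: a proposition p restricts an implicit variable X:
%   p \longrightarrow X \ \text{is}\ R
% Worked example:
%   p: "Most Swedes are tall"
%   X: the proportion of tall Swedes among all n Swedes
%   R: the fuzzy quantifier "most"
\[
  \text{Most Swedes are tall} \;\longrightarrow\;
  \frac{1}{n}\sum_{i=1}^{n} \mu_{\text{tall}}(h_i) \ \text{is}\ \text{most}
\]
```

Here the membership degree of each height in "tall" replaces a crisp count, so the restricted variable is a fuzzy proportion, which is exactly what precisiation buys: something a machine can compute with.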

Patent
Sameer Badaskar1
19 Nov 2012
TL;DR: In this article, methods and systems for searching for media items using a voice-based digital assistant are described, where text strings correspond to speech inputs provided by a user to an electronic device.
Abstract: Methods and systems for searching for media items using a voice-based digital assistant are described. Natural language text strings corresponding to search queries are provided. The search queries include query terms. The text strings may correspond to speech inputs input by a user into an electronic device. At least one information source is searched to identify at least one parameter associated with at least one of the query terms. The parameters include at least one of a time parameter, a date parameter, or a geo-code parameter. The parameters are compared to tags of media items to identify matches. In some implementations, media items whose tags match the parameter are presented to the user.
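
A toy rendering of the matching step, to make the parameter-to-tag comparison concrete; the tag names and the exact-match rule are assumptions for illustration.

```python
def match_media(media_items, params):
    """Keep media whose tags agree with every extracted parameter
    (time / date / geo-code)."""
    return [m for m in media_items
            if all(m["tags"].get(k) == v for k, v in params.items())]

media = [
    {"name": "IMG_001", "tags": {"date": "2012-07-04", "geo": "SF"}},
    {"name": "IMG_002", "tags": {"date": "2012-12-25", "geo": "NYC"}},
]
# Parameters extracted from, e.g., "photos from the Fourth of July in SF".
print(match_media(media, {"date": "2012-07-04", "geo": "SF"}))
```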

Proceedings ArticleDOI
26 Jun 2012
TL;DR: This work presents an approach for joint learning of language and perception models for grounded attribute induction, which includes a language model based on a probabilistic categorial grammar that enables the construction of compositional meaning representations.
Abstract: As robots become more ubiquitous and capable, it becomes ever more important for untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to the physical world. We present an approach for joint learning of language and perception models for grounded attribute induction. The perception model includes classifiers for physical characteristics and a language model based on a probabilistic categorial grammar that enables the construction of compositional meaning representations. We evaluate on the task of interpreting sentences that describe sets of objects in a physical workspace, and demonstrate accurate task performance and effective latent-variable concept induction in physical grounded scenes.

Journal ArticleDOI
TL;DR: It is found that learners restructure such languages in ways that facilitate efficient information transfer compared with the input language, supporting the hypothesis that some of the structural similarities found in natural languages are shaped by biases toward communicatively efficient linguistic systems.
Abstract: Languages of the world display many structural similarities. We test the hypothesis that some of these structural properties may arise from biases operating during language acquisition that shape languages over time. Specifically, we investigate whether language learners are biased toward linguistic systems that strike an efficient balance between robust information transfer, on the one hand, and effort or resource demands, on the other hand, thereby increasing the communicative utility of the acquired language. In two experiments, we expose learners to miniature artificial languages designed in such a way that they do not use their formal devices (case marking) efficiently to facilitate robust information transfer. We find that learners restructure such languages in ways that facilitate efficient information transfer compared with the input language. These systematic changes introduced by the learners follow typologically frequent patterns, supporting the hypothesis that some of the structural similarities found in natural languages are shaped by biases toward communicatively efficient linguistic systems.

Book
11 Oct 2012
TL;DR: This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process.
Abstract: Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process. Systems exist for analyzing existing corpora, but making a new corpus can be extremely complex. To help you build a foundation for your own machine learning goals, this easy-to-use guide includes case studies that demonstrate four different annotation tasks in detail. You'll also learn how to use a lightweight software package for annotating texts and adjudicating the annotations. This book is a perfect companion to O'Reilly's Natural Language Processing with Python, which describes how to use existing corpora with the Natural Language Toolkit.

Proceedings ArticleDOI
02 Jun 2012
TL;DR: This work proposes a novel approach to infer formal specifications from natural language text of API documents; the approach achieves an average accuracy of 83% in inferring specifications from over 1,600 sentences describing code contracts.
Abstract: Application Programming Interface (API) documents are a typical way of describing legal usage of reusable software libraries, thus facilitating software reuse. However, even with such documents, developers often overlook some documents and build software systems that are inconsistent with the legal usage of those libraries. Existing software verification tools require formal specifications (such as code contracts), and therefore cannot directly verify the legal usage described in natural language text of API documents against the code using that library. However, in practice, most libraries do not come with formal specifications, thus hindering tool-based verification. To address this issue, we propose a novel approach to infer formal specifications from natural language text of API documents. Our evaluation results show that our approach achieves an average of 92% precision and 93% recall in identifying sentences that describe code contracts from more than 2500 sentences of API documents. Furthermore, our results show that our approach has an average 83% accuracy in inferring specifications from over 1600 sentences describing code contracts.
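
In miniature, the inference step maps contract-describing sentences onto specification templates. The regex patterns and contract syntax below are hypothetical simplifications; the paper's approach uses richer NLP analysis than raw pattern matching.

```python
import re

# Hypothetical sentence patterns paired with code-contract templates.
PATTERNS = [
    (re.compile(r"(\w+) must not be null", re.I),
     r"requires \1 != null"),
    (re.compile(r"returns null if (\w+) is empty", re.I),
     r"ensures \1.isEmpty() ==> result == null"),
]

def infer_contracts(sentence):
    """Map an API-document sentence to zero or more formal specs."""
    return [m.expand(tmpl)
            for pat, tmpl in PATTERNS
            if (m := pat.search(sentence))]

print(infer_contracts("The parameter name must not be null."))
# ['requires name != null']
```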

Proceedings Article
07 Jun 2012
TL;DR: A novel, optimal semantic similarity approach based on word-to-word similarity metrics to solve the important task of assessing natural language student input in dialogue-based intelligent tutoring systems.
Abstract: We present in this paper a novel, optimal semantic similarity approach based on word-to-word similarity metrics to solve the important task of assessing natural language student input in dialogue-based intelligent tutoring systems. The optimal matching is guaranteed using the sailor assignment problem, also known as the job assignment problem, a well-known combinatorial optimization problem. We compare the optimal matching method with a greedy method as well as with a baseline method on data sets from two intelligent tutoring systems, AutoTutor and iSTART.
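
The optimal matching the abstract refers to is the classic assignment problem over a word-to-word similarity matrix. A minimal sketch using scipy's Hungarian-algorithm solver; the similarity values and the normalization by the longer text's length are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_match_score(sim):
    """sim[i, j]: similarity of student word i and reference word j.
    Negate because linear_sum_assignment minimizes cost."""
    rows, cols = linear_sum_assignment(-sim)
    score = sim[rows, cols].sum() / max(sim.shape)
    return score, list(zip(rows.tolist(), cols.tolist()))

sim = np.array([[0.9, 0.1, 0.0],    # toy word-to-word similarities
                [0.2, 0.8, 0.3],
                [0.0, 0.4, 0.7]])
score, alignment = optimal_match_score(sim)
print(round(score, 3), alignment)  # 0.8 [(0, 0), (1, 1), (2, 2)]
```

Unlike greedy matching, the solver may give up the best pair for one word to raise the total similarity, which is precisely the optimality guarantee the paper exploits.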


Reference EntryDOI
05 Nov 2012
TL;DR: The phrase crosslinguistic influence (CLI) is roughly synonymous with other terms, most notably language transfer and interference, in that all refer to the influence of one language upon another, most typically in cases of second language acquisition (SLA).
Abstract: The phrase crosslinguistic influence (CLI) is roughly synonymous with other terms, most notably language transfer and interference, in that all refer to the influence of one language upon another, most typically in cases of second language acquisition (SLA). These terms continue to be used widely, but in each case the expression is really a cover term for a wide range of phenomena. For that reason, secondary terms are often used, such as positive transfer to refer to the facilitating effects of one language in acquiring another (e.g., of Spanish vocabulary in acquiring French) and negative transfer to refer to divergences due to some differences between the target language and a source language (most typically the native language of the learner). Still another cover term often used is substrate influence, but this is found mainly in historical or sociolinguistic studies of language contact, such as work on the influence of certain African languages on the development of creoles in Surinam (Migge, 2003). Keywords: constraint; inference; transfer


Journal ArticleDOI
TL;DR: This paper summarises the key developmental milestones of language development in the preschool years, providing a backdrop for understanding difficulties with language learning.
Abstract: Most young children make significant progress in learning language during the first 4 years of life. Delays or differences in patterns of language acquisition are sensitive indicators of developmental problems. The dynamic, complex nature of language and the variability in the timing of its acquisition pose a number of challenges for the assessment of young children. This paper summarises the key developmental milestones of language development in the preschool years, providing a backdrop for understanding difficulties with language learning. Children with specific language impairment (SLI) are characterised, illustrating the types of language difficulties they exhibit. Genetic evidence for language impairment suggests complex interactions among multiple genes of small effect. There are few consistent neurobiological abnormalities, and currently there is no identified neurobiological signature for language difficulties. The assessment of young children’s language skills thus focuses on the evaluation of their performances in comparison to typically developing peers. Assessment of language abilities in preschool children should involve an evaluation of both expressive and receptive skills and should include an evaluation of more than one dimension of language. The use of a single measure of a language component, such as vocabulary, is considered inadequate for determining whether preschool children have typical language or language impairment. Available evidence supports the inclusion of measures of phonological short-term memory in the assessment of the language abilities of preschool children. Further study of genetic, neurobiological and early behavioural correlates of language impairments in preschool children is needed.

Patent
06 Dec 2012
TL;DR: A natural language authoring system that organizes technical, financial, legal, and market information into point-of-view-specific analytical, visual, and narrative decision-support content.
Abstract: A natural language authoring system that organizes technical, financial, legal and market information into Point of View specific analytical, visual and narrative decision-support content. The expert system transforms a user's point of view into a tailored narrative and/or visualization report. Expert rules embed interactive advertising, such as affiliate URL links, into analytical, visual and narrative and statistical content. The rules may be modified by one or more users, thereby capturing knowledge as the rules are utilized by users of the system.

Journal ArticleDOI
TL;DR: It is shown that processing the syntax of language elicits the known substrate of linguistic competence, whereas algebraic operations recruit bilateral parietal brain regions previously implicated in the representation of magnitude, arguing against the view that language provides the structure of thought across all cognitive domains.
Abstract: A central question in cognitive science is whether natural language provides combinatorial operations that are essential to diverse domains of thought. In the study reported here, we addressed this issue by examining the role of linguistic mechanisms in forging the hierarchical structures of algebra. In a 3-T functional MRI experiment, we showed that processing of the syntax-like operations of algebra does not rely on the neural mechanisms of natural language. Our findings indicate that processing the syntax of language elicits the known substrate of linguistic competence, whereas algebraic operations recruit bilateral parietal brain regions previously implicated in the representation of magnitude. This double dissociation argues against the view that language provides the structure of thought across all cognitive domains.

Journal ArticleDOI
TL;DR: Disciplinary literacy is defined as the ability to engage in social, semiotic, and cognitive practices consistent with those of content experts, and literacy development is characterized as the braiding of three language strands: everyday, abstract, and metaphoric language.
Abstract: Disciplinary literacy is defined here as the ability to engage in social, semiotic, and cognitive practices consistent with those of content experts. Characterizing literacy development as a process of braiding 3 language strands of everyday language, abstract language, and metaphoric language, this
