scispace - formally typeset

Showing papers on "Natural language published in 2003"


01 Jan 2003
TL;DR: 'Without symbolism the life of man would be like that of the prisoners in the cave of Plato's simile; it could find no access to the "ideal world" which is opened to him from different sides by religion, art, philosophy, science.'
Abstract: 'Without symbolism the life of man would be like that of the prisoners in the cave of Plato's simile…confined within the limits of his biological needs and practical interests; it could find no access to the "ideal world" which is opened to him from different sides by religion, art, philosophy, science.' Ernst Cassirer. I begin by outlining some of the positions that have been taken by those who have reflected upon the nature of language. In his early work Wittgenstein asserts that language becomes meaningful when we tacitly adhere to the rules of logic. In his later work he claims that languages become meaningful when they are situated within forms of life. Polanyi describes language as a toolbox for deploying our tacit awareness. A meaning is generated when a point of view attends from a subsidiary to a focal awareness. Languages re-present these meanings. Although all languages rely upon rules, what it is to be a meaning is not reducible to rules. Nor is there a universal grammar. Because it renders abstract reflection possible, language renders minds possible. A mind is not the product of an innate language of thought; it is a consequence of indwelling within a natural language. Indwelling within languages enables us to access new realities. Languages however do not supply us with the boundaries of the world. Not only do we know more than we can say, we can also say more than we know. The ultimate context of our linguistic meanings is not our social practices; it is our embodied awareness of the world. A representationalist account is in accordance with the view that minds are Turing machines. But the symbols processed by a Turing machine derive their meaning from the agents that use them to achieve their purposes. Only if the processing of symbolic representations is related to the tacit context within which they become meaningful does a semantic engine become possible.

4,598 citations


Journal ArticleDOI
01 Jan 2003

1,739 citations


Proceedings ArticleDOI
31 May 2003
TL;DR: This work has shown that conditionally-trained models, such as conditional maximum entropy models, handle the overlapping, inter-dependent features used in greedy sequence modeling for NLP well.
Abstract: Models for many natural language tasks benefit from the flexibility to use overlapping, non-independent features. For example, the need for labeled data can be drastically reduced by taking advantage of domain knowledge in the form of word lists, part-of-speech tags, character n-grams, and capitalization patterns. While it is difficult to capture such inter-dependent features with a generative probabilistic model, conditionally-trained models, such as conditional maximum entropy models, handle them well. There has been significant work with such models for greedy sequence modeling in NLP (Ratnaparkhi, 1996; Borthwick et al., 1998).
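The overlapping-feature idea can be sketched as a tiny conditional maximum-entropy (multinomial logistic) classifier trained by gradient ascent. The task, feature templates, and training loop below are illustrative stand-ins, not the system described in the paper:

```python
import math

def extract_features(word):
    """Overlapping, non-independent features of the kind the paper
    describes: word identity, word shape, and a character n-gram."""
    return {
        "word=" + word.lower(): 1.0,
        "shape=" + ("Xx" if word[0].isupper() else "x"): 1.0,
        "suffix2=" + word[-2:].lower(): 1.0,
    }

def train_maxent(data, labels, epochs=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood;
    p(y | x) is proportional to exp(w_y . f(x))."""
    weights = {y: {} for y in labels}
    for _ in range(epochs):
        for feats, gold in data:
            scores = {y: sum(weights[y].get(f, 0.0) * v for f, v in feats.items())
                      for y in labels}
            z = sum(math.exp(s) for s in scores.values())
            for y in labels:
                p = math.exp(scores[y]) / z
                target = 1.0 if y == gold else 0.0
                for f, v in feats.items():
                    weights[y][f] = weights[y].get(f, 0.0) + lr * (target - p) * v
    return weights

def classify(weights, feats):
    return max(weights, key=lambda y: sum(weights[y].get(f, 0.0) * v
                                          for f, v in feats.items()))

# Toy named-entity-style task: person names vs other tokens.
train = [(extract_features(w), "PER") for w in ["Alice", "Bob", "Carol"]] + \
        [(extract_features(w), "O") for w in ["ran", "quickly", "home"]]
model = train_maxent(train, ["PER", "O"])
```

An unseen capitalized word like "Dave" is still classified correctly because the shared shape feature fires, even though the word-identity feature was never observed: this is the generalization overlapping features buy.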

1,306 citations


Journal ArticleDOI
29 Jan 2003
TL;DR: The improvements, difficulties, and successes that have occurred with the synchronous languages since then are discussed.
Abstract: Twelve years ago, Proceedings of the IEEE devoted a special section to the synchronous languages. This paper discusses the improvements, difficulties, and successes that have occurred with the synchronous languages since then. Today, synchronous languages have been established as a technology of choice for modeling, specifying, validating, and implementing real-time embedded applications. The paradigm of synchrony has emerged as an engineer-friendly design method based on mathematically sound tools.

927 citations


Book
13 Mar 2003
TL;DR: This book uses American Sign Language to examine the grammatical and conceptual purposes served by directional signs, demonstrating the integration of grammar, gesture, and meaning; it includes an index of illustrated signs.
Abstract: In sign languages of the deaf some signs can meaningfully point toward things or can be meaningfully placed in the space ahead of the signer. This obligatory part of fluent grammatical signing has no parallel in vocally produced languages. This book focuses on American Sign Language to examine the grammatical and conceptual purposes served by these directional signs. It guides the reader through ASL grammar, the different categories of directional signs, the types of spatial representations signs are directed toward, how such spatial conceptions can be represented in mental space theory, and the conceptual purposes served by these signs. The book demonstrates a remarkable integration of grammar and gesture in the service of constructing meaning. These results also suggest that our concept of 'language' has been much too narrow and that a more comprehensive look at vocally produced languages will reveal the same integration of gestural, gradient, and symbolic elements.

647 citations


Proceedings ArticleDOI
04 Jun 2003
TL;DR: A policy language designed for pervasive computing applications that is based on deontic concepts and grounded in a semantic language that demonstrates the feasibility of the policy language in pervasive environments through a prototype used as part of a secure pervasive system.
Abstract: We describe a policy language designed for pervasive computing applications that is based on deontic concepts and grounded in a semantic language. The pervasive computing environments under consideration are those in which people and devices are mobile and use various wireless networking technologies to discover and access services and devices in their vicinity. Such pervasive environments lend themselves to policy-based security due to their extremely dynamic nature. Using policies allows the security functionality to be modified without changing the implementation of the entities involved. However, along with being extremely dynamic, these environments also tend to span several domains and be made up of entities of varied capabilities. A policy language for environments of this sort needs to be very expressive but lightweight and easily extensible. We demonstrate the feasibility of our policy language in pervasive environments through a prototype used as part of a secure pervasive system.

602 citations


Proceedings ArticleDOI
12 Jan 2003
TL;DR: This paper demonstrates a new approach, using large-scale real-world knowledge about the inherent affective nature of everyday situations to classify sentences into "basic" emotion categories, and suggests that the approach is robust enough to enable plausible affective text user interfaces.
Abstract: This paper presents a novel way for assessing the affective qualities of natural language and a scenario for its use. Previous approaches to textual affect sensing have employed keyword spotting, lexical affinity, statistical methods, and hand-crafted models. This paper demonstrates a new approach, using large-scale real-world knowledge about the inherent affective nature of everyday situations (such as "getting into a car accident") to classify sentences into "basic" emotion categories. This commonsense approach has new robustness implications. Open Mind Commonsense was used as a real-world corpus of 400,000 facts about the everyday world. Four linguistic models are combined for robustness as a society of commonsense-based affect recognition. These models cooperate and compete to classify the affect of text. Such a system that analyzes affective qualities sentence by sentence is of practical value when people want to evaluate the text they are writing. As such, the system is tested in an email writing application. The results suggest that the approach is robust enough to enable plausible affective text user interfaces.

585 citations


Proceedings ArticleDOI
12 Jan 2003
TL;DR: The Precise NLI is introduced, which reduces the semantic interpretation challenge in NLIs to a graph matching problem and shows that Precise has high coverage and accuracy over common English questions.
Abstract: The need for Natural Language Interfaces (NLIs) to databases has become increasingly acute as more nontechnical people access information through their web browsers, PDAs and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. We introduce the Precise NLI [2], which reduces the semantic interpretation challenge in NLIs to a graph matching problem. Precise uses the max-flow algorithm to efficiently solve this problem. Each max-flow solution corresponds to a possible semantic interpretation of the sentence. Precise collects max-flow solutions, discards the solutions that do not obey syntactic constraints and retains the rest as the basis for generating SQL queries corresponding to the question q. The syntactic information is extracted from the parse tree corresponding to the given question, which is computed by a statistical parser [1]. For a broad, well-defined class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query. Semantically tractable questions correspond to a natural, domain-independent subset of English that can be efficiently and accurately interpreted as nonrecursive Datalog clauses. Precise is transportable to arbitrary databases, such as the Restaurants, Jobs and Geography databases used in our implementation. Examples of semantically tractable questions include: "What Chinese restaurants with a 3.5 rating are in Seattle?", "What are the areas of US states with large populations?", "What jobs require 4 years of experience and desire a B.S.CS degree?". Given a question which is not semantically tractable, Precise recognizes it as such and informs the user that it cannot answer it. Given a semantically tractable question, Precise computes the set of non-equivalent SQL interpretations corresponding to the question.
If a unique such SQL interpretation exists, Precise outputs it together with the corresponding result set obtained by querying the current database. If the set contains more than one SQL interpretation, the natural language question is ambiguous in the context of the current database. In this case, Precise asks for the user's help in determining which interpretation is the correct one. Our experiments have shown that Precise has high coverage and accuracy over common English questions. In future work, we plan to explore increasingly broad classes of questions and include Precise as a module in a full-fledged dialog system. An important direction for future work is helping users understand the types of questions Precise cannot handle via dialog, enabling them to build an accurate mental model of the system and its capabilities. Also, our own group's work on the EXACT natural language interface [3] builds on Precise and on the underlying theoretical framework. EXACT composes an extended version of Precise with a sound and complete planner to develop a powerful and provably reliable interface to household appliances.
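On a unit-capacity bipartite graph, the max-flow reduction amounts to maximum bipartite matching, which the following augmenting-path sketch illustrates. The token-to-column compatibility lexicon here is invented for the example and is not Precise's actual lexicon:

```python
def bipartite_match(compat):
    """Maximum bipartite matching via augmenting paths.
    (Max-flow on a unit-capacity bipartite graph reduces to this.)"""
    match = {}  # database element -> question token

    def augment(tok, seen):
        for elem in compat[tok]:
            if elem in seen:
                continue
            seen.add(elem)
            # Take a free element, or re-route the token holding it.
            if elem not in match or augment(match[elem], seen):
                match[elem] = tok
                return True
        return False

    for tok in compat:
        augment(tok, set())
    return match

# Hypothetical lexicon: each question token lists the database
# columns/values it could refer to (all names made up).
compat = {
    "chinese": ["Restaurant.cuisine"],
    "restaurants": ["Restaurant.name", "Restaurant.cuisine"],
    "seattle": ["Restaurant.city"],
}
matching = bipartite_match(compat)
```

A matching that covers every question token corresponds to one candidate semantic interpretation; multiple distinct complete matchings would signal the kind of ambiguity the abstract describes.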

552 citations


Journal ArticleDOI
TL;DR: This paper showed that infants can use statistical properties of linguistic input to discover structure, including sound patterns, words, and the beginnings of grammar, and their abilities appear to be both powerful and constrained, such that some statistical patterns are more readily detected and used than others.
Abstract: What types of mechanisms underlie the acquisition of human language? Recent evidence suggests that learners, including infants, can use statistical properties of linguistic input to discover structure, including sound patterns, words, and the beginnings of grammar. These abilities appear to be both powerful and constrained, such that some statistical patterns are more readily detected and used than others. Implications for the structure of human languages are discussed.

543 citations


Journal ArticleDOI
TL;DR: Increase of activation over time in Broca's area was specific for 'real' language acquisition only, independent of the kind of language.
Abstract: Language acquisition in humans relies on abilities like abstraction and use of syntactic rules, which are absent in other animals. The neural correlate of acquiring new linguistic competence was investigated with two functional magnetic resonance imaging (fMRI) studies. German native speakers learned a sample of 'real' grammatical rules of different languages (Italian or Japanese), which, although parametrically different, follow the universal principles of grammar (UG). Activity during this task was compared with that during a task that involved learning 'unreal' rules of language. 'Unreal' rules were obtained manipulating the original two languages; they used the same lexicon as Italian or Japanese, but were linguistically illegal, as they violated the principles of UG. Increase of activation over time in Broca's area was specific for 'real' language acquisition only, independent of the kind of language. Thus, in Broca's area, biological constraints and language experience interact to enable linguistic competence for a new language.

327 citations



Journal ArticleDOI
TL;DR: Drawing on observations and research from Chinese and Korean, this work examines the universal and writing-specific aspects of reading and considers the implications of the universal language constraint for learning to read.
Abstract: Reading has universal properties that can be seen across the world's writing systems. The most important one is the universal language constraint: All writing systems represent spoken languages, a universal with consequences for reading processes. These consequences are seen most clearly at the broad principle level: the principle that reading universally requires the reader to make links to language at the phonological and morphemic levels. At the same time, the nature of the writing system and the various orthographies that instantiate it do make a difference for important details of the reading process. Drawing on observations and research from Chinese and Korean, I examine these universal and writing-specific aspects of reading. I also consider the implications of the universal language constraint for learning to read.

Book
27 Mar 2003
TL;DR: In this article, Aikhenvald uses the example of Arawak and Tucanoan languages spoken in the large area of the Vaupes river basin in northwest Amazonia, which spans Colombia and Brazil.
Abstract: This book considers how and why forms and meanings of different languages at different times may resemble one another. The author explains the relationship between areal diffusion and the genetic development of languages, and reveals the means of distinguishing what may cause one language to share the characteristics of another. Professor Aikhenvald uses the example of Arawak and Tucanoan languages spoken in the large area of the Vaupes river basin in northwest Amazonia, which spans Colombia and Brazil. In this region language is seen as a badge of identity: language mixing, interaction, and influence are resisted for ideological reasons. The book considers which grammatical categories are most and least likely to be borrowed in a situation of prolonged language contact where lexical borrowing is reduced to a minimum. The author provides a genetic analysis of the languages of the region and considers their historical relationships with languages of the same family outside it. She also examines changes brought about by recent contact with European languages and culture, and the linguistic and cultural effects of being part of a group that is aware of the threat to its language and identity. The book is presented in relatively nontechnical language and will interest linguists and anthropologists.

Patent
24 Nov 2003
TL;DR: In this paper, an autonomous response engine and method that can more successfully mimic a human conversational exchange is presented, which has a statement-response database that is autonomously updated, thus enabling a database of significant size to be easily created and maintained with current information.
Abstract: The present invention is an autonomous response engine and method that can more successfully mimic a human conversational exchange. In an exemplary, preferred embodiment of the invention, the response engine has a statement-response database that is autonomously updated, thus enabling a database of significant size to be easily created and maintained with current information. The response engine autonomously generates natural language responses to natural language queries by following one of several conversation strategies, by choosing at least one context element from a context database and by searching the updated statement-response database for appropriate matches to the queries.

Book ChapterDOI
16 Feb 2003
TL;DR: This paper provides an introduction to NSP while raising some general issues in Ngram analysis, and summarizes several applications where NSP has been successfully employed.
Abstract: The Ngram Statistics Package (NSP) is a flexible and easy-to-use software tool that supports the identification and analysis of Ngrams, sequences of N tokens in online text. We have designed and implemented NSP to be easy to customize to particular problems and yet remain general enough to serve a broad range of needs. This paper provides an introduction to NSP while raising some general issues in Ngram analysis, and summarizes several applications where NSP has been successfully employed. NSP is written in Perl and is freely available under the GNU Public License.
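A minimal stand-in for what NSP automates: counting Ngrams in a token stream and scoring them with an association measure. Pointwise mutual information is used here purely for illustration; NSP itself (written in Perl) ships many measures and far more options:

```python
import math
from collections import Counter

def bigram_scores(tokens):
    """Count token bigrams and score each with pointwise mutual
    information: log2( p(a,b) / (p(a) * p(b)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (a, b), c in bigrams.items():
        p_ab = c / (n - 1)               # bigram relative frequency
        p_a, p_b = unigrams[a] / n, unigrams[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

text = "new york is big and new york is busy".split()
scores = bigram_scores(text)
```

High-scoring pairs such as ("new", "york") surface as candidate collocations; a real run would rank all bigrams and apply a frequency cutoff.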

Journal ArticleDOI
TL;DR: A general biomedical domain-oriented NLP engine called MedScan is presented that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence.
Abstract: Motivation: The importance of extracting biomedical information from scientific publications is well recognized. A number of information extraction systems for the biomedical domain have been reported, but none of them have become widely used in practical applications. Most proposals to date make rather simplistic assumptions about the syntactic aspect of natural language. There is an urgent need for a system that has broad coverage and performs well in real-text applications. Results: We present a general biomedical domain-oriented NLP engine called MedScan that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence. The engine utilizes a specially developed context-free grammar and lexicon. Preliminary evaluation of the system’s performance, accuracy, and coverage exhibited encouraging results. Further approaches for increasing the coverage and reducing parsing ambiguity of the engine, as well as its application for information extraction are discussed. Availability: MedScan is available for commercial licensing from Ariadne Genomics, Inc.

01 Jan 2003
TL;DR: This handbook describes the ambiguity phenomenon from several points of view, including linguistics, software engineering, and the law, and several strategies for avoiding and detecting ambiguities are presented.
Abstract: This handbook is about writing software requirements specifications and legal contracts, two kinds of documents with similar needs for completeness, consistency, and precision. Particularly when these are written, as they usually are, in natural language, ambiguity—by any definition—is a major cause of their not specifying what they should. Simple misuse of the language in which the document is written is one source of these ambiguities. This handbook describes the ambiguity phenomenon from several points of view, including linguistics, software engineering, and the law. Several strategies for avoiding and detecting ambiguities are presented. Strong emphasis is placed on the problems arising from the use of heavily used and seemingly unambiguous words and phrases such as “all”, “each”, and “every” in defining or referencing sets; positioning of “only”, “also”, and “even”; precedences of “and” and “or”; “a”, “all”, “any”, “each”, “one”, “some”, and “the” used as quantifiers; “or” and “and/or”; “that” vs. “which”; parallelism; pronouns referring to an idea; multiple adjectives; etc. Many examples from requirements documents and legal documents are examined. While no guide can overcome the careless or indifferent writer, this handbook is offered as a guide both for writing better requirements or contracts and for inspecting them for potential ambiguities.
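One of the detection strategies such a handbook motivates can be mechanized as a simple lexical check over each requirement sentence. The risky-word list and diagnostics below are a tiny illustrative sample, not the handbook's catalogue:

```python
import re

# A few of the trouble spots discussed in the abstract; both the list
# and the diagnostic messages are illustrative samples.
RISKY = {
    "and/or": "ambiguous between inclusive and exclusive readings",
    "only": "meaning shifts with its position in the sentence",
    "all": "may quantify distributively or collectively",
    "that": "check restrictive vs non-restrictive use ('that' vs 'which')",
}

def flag_ambiguities(sentence):
    """Return (word, diagnostic) pairs for risky words in a requirement."""
    hits = []
    for word, why in RISKY.items():
        # Lookarounds keep 'all' from matching inside 'shall'.
        if re.search(r"(?<!\w)" + re.escape(word) + r"(?!\w)", sentence.lower()):
            hits.append((word, why))
    return hits

req = "The system shall log all errors and/or warnings."
hits = flag_ambiguities(req)
```

Such a checker only flags candidates for human review; deciding whether a flagged sentence is actually ambiguous still requires the judgment the handbook aims to train.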

Journal ArticleDOI
TL;DR: This article used auditory phonetics as a means of classifying, diagnosing, and predicting problems of lexical segmentation in second-language listening, and used simple practice exercises to identify how and why learners find speech input difficult to process.
Abstract: This article calls for greater attention to the perceptual processes involved in second language listening—and particularly to the part they play in breakdowns of understanding. It suggests employing basic auditory phonetics as a means of classifying, diagnosing, and predicting problems of lexical segmentation. Recognition of how and why learners find speech input difficult to process can provide a programme of simple practice exercises which anticipate or rectify listening problems.

Journal ArticleDOI
TL;DR: Functional magnetic resonance imaging results indicate a learning-related change in brain circuitry underlying relational processes of language learning, with a transition from a similarity-based learning system in the medial temporal lobes to a language-related processing system in the left prefrontal cortex.

Proceedings ArticleDOI
26 Oct 2003
TL;DR: A general picture of HowNet is shown, and its theory and peculiarities are discussed, which have been applied in various areas of NLP, related to both Chinese and English.
Abstract: HowNet is an online common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoted in Chinese and English bilingual lexicons. Since it was released in 1999, HowNet has become more and more popular and has been applied in various areas of NLP, related to both Chinese and English. We show a general picture of HowNet and discuss its theory and peculiarities.


Journal ArticleDOI
TL;DR: This work describes a system for extracting PGSM interactions from unstructured text using a lexical analyzer and context free grammar, and demonstrates that efficient parsers can be constructed for extracting these relationships from natural language with high rates of recall and precision.
Abstract: Motivation: As research into disease pathology and cellular function continues to generate vast amounts of data pertaining to protein, gene and small molecule (PGSM) interactions, there exists a critical need to capture these results in structured formats allowing for computational analysis. Although many efforts have been made to create databases that store this information in computer readable form, populating these sources largely requires a manual process of interpreting and extracting interaction relationships from the biological research literature. Being able to efficiently and accurately automate the extraction of interactions from unstructured text, would greatly improve the content of these databases and provide a method for managing the continued growth of new literature being published. Results: In this paper, we describe a system for extracting PGSM interactions from unstructured text. By utilizing a lexical analyzer and context free grammar (CFG), we demonstrate that efficient parsers can be constructed for extracting these relationships from natural language with high rates of recall and precision. Our results show that this technique achieved a recall rate of 83.5% and a precision rate of 93.1% for recognizing PGSM names and a recall rate of 63.9% and a precision rate of 70.2% for extracting interactions between these entities. In contrast to other published techniques, the use of a CFG significantly reduces the complexities of natural language processing by focusing on domain specific structure as opposed to analyzing the semantics of a given language. Additionally, our approach provides a level of abstraction for adding new rules for extracting other types of biological relationships beyond PGSM relationships. Availability: The program and corpus are available by request from the authors. Contact: gilder@research.ge.com; jtemkin1@comcast.net

Book
01 Jan 2003
TL;DR: This book develops the mathematical foundations of present-day linguistics, starting with ideas already contained in Montague's work, and equips the reader with the background necessary to understand and evaluate theories as diverse as Montague Grammar, Categorial Grammar, HPSG and GB.
Abstract: This book studies language(s) and linguistic theories from a mathematical point of view. Starting with ideas already contained in Montague's work, it develops the mathematical foundations of present day linguistics. It equips the reader with all the background necessary to understand and evaluate theories as diverse as Montague Grammar, Categorial Grammar, HPSG and GB. The mathematical tools are mainly from universal algebra and logic, but no particular knowledge is presupposed beyond a certain mathematical sophistication that is in any case needed in order to fruitfully work within these theories. The presentation focuses on abstract mathematical structures and their computational properties, but plenty of examples from different natural languages are provided to illustrate the main concepts and results. In contrast to books devoted to so-called formal language theory, languages are seen here as semiotic systems, that is, as systems of signs. A language sign correlates form with meaning. Using the principle of compositionality it is possible to gain substantial insight into the interaction between form and meaning in natural languages.

Patent
Peter F. Garst1
18 Feb 2003
TL;DR: In this article, a new source of information, linguistic models, to improve the accuracy of mathematical recognition is presented. But it is not an extension of linguistic methods to the mathematical domain thereby providing recognition of the artificial language of mathematics in a way analogous to natural language recognition.
Abstract: The present invention provides a new source of information, linguistic models, to improve the accuracy of mathematical recognition. Specifically, the present invention is an extension of linguistic methods to the mathematical domain thereby providing recognition of the artificial language of mathematics in a way analogous to natural language recognition. Parse trees are the basic units of the mathematical language, and a linguistic model for mathematics is a method for assigning a linguistic score to each parse tree. The models are generally created by taking a large body of known text and counting the occurrence of various linguistic events such as word bigrams in that body. The raw counts are modified by smoothing and other algorithms before taking their place as probabilities in the model.

Journal Article
TL;DR: Three merging rules for combining probability distributions are examined: the well known mixture rule, the logarithmic rule, and a novel product rule that were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics: synonym questions and analogy questions.
Abstract: Existing statistical approaches to natural language problems are very coarse approximations to the true complexity of language processing. As such, no single technique will be best for all problem instances. Many researchers are examining ensemble methods that combine the output of successful, separately developed modules to create more accurate solutions. This paper examines three merging rules for combining probability distributions: the well known mixture rule, the logarithmic rule, and a novel product rule. These rules were applied with state-of-the-art results to two problems commonly used to assess human mastery of lexical semantics: synonym questions and analogy questions. All three merging rules result in ensembles that are more accurate than any of their component modules. The differences among the three rules are not statistically significant, but it is suggestive that the popular mixture rule is not the best rule for either of the two problems.
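The flavor of the three merging rules can be sketched over discrete distributions. These are simplified readings for illustration; the paper's exact formulations (for instance, how weights and module priors enter the product rule) differ:

```python
import math

def normalize(p):
    z = sum(p)
    return [x / z for x in p]

def mixture_rule(dists, weights):
    """Weighted arithmetic mean of the distributions."""
    return normalize([sum(w * d[i] for w, d in zip(weights, dists))
                      for i in range(len(dists[0]))])

def logarithmic_rule(dists, weights):
    """Weighted geometric mean: exp of the weighted sum of logs."""
    return normalize([math.exp(sum(w * math.log(d[i])
                                   for w, d in zip(weights, dists)))
                      for i in range(len(dists[0]))])

def product_rule(dists):
    """Plain product of the distributions, renormalized (simplified:
    no prior correction)."""
    return normalize([math.prod(d[i] for d in dists)
                      for i in range(len(dists[0]))])

# Two modules' answer distributions over four synonym candidates.
a = [0.7, 0.1, 0.1, 0.1]
b = [0.4, 0.4, 0.1, 0.1]
merged = mixture_rule([a, b], [0.5, 0.5])
```

Note the qualitative difference: the mixture rule averages, so one confident module is diluted, while the logarithmic and product rules multiply, so any module assigning near-zero probability to a candidate effectively vetoes it.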

Proceedings ArticleDOI
30 Nov 2003
TL;DR: It is conventional wisdom in the speech community that better speech recognition accuracy is a good indicator for better spoken language understanding accuracy, but the findings in this work reveal that this is not always the case.
Abstract: It is conventional wisdom in the speech community that better speech recognition accuracy is a good indicator for better spoken language understanding accuracy, given a fixed understanding component. The findings in this work reveal that this is not always the case. More important than word error rate reduction, the language model for recognition should be trained to match the optimization objective for understanding. In this work, we applied a spoken language understanding model as the language model in speech recognition. The model was obtained with an example-based learning algorithm that optimized the understanding accuracy. Although the speech recognition word error rate is 46% higher than that of the trigram model, the overall slot understanding error can be reduced by as much as 17%.

Patent
31 Jan 2003
TL;DR: This article proposed a method to improve the recognition of proper nouns by augmenting the pronunciation of each proper noun or name in the natural language of the speech recognition system with at least one "native" pronunciation in another natural language.
Abstract: Recognition of proper nouns by an automated speech recognition system is improved by augmenting the pronunciation of each proper noun or name in the natural language of the speech recognition system with at least one “native” pronunciation in another natural language. To maximize recognition, preferably the pronunciations are predicted based on information not available to the speech recognition system. Prediction of pronunciation may be based on a location derived from a telephone number or postal address associated with the name and the language or dialect spoken in the country or region of that location. The “native” pronunciation(s) may be added to a dictionary of the speech recognition system or directly to the grammar used for recognizing speech.


Patent
30 May 2003
TL;DR: In this article, one or more statistical classifiers are used in conjunction with a rule-based classifier to perform task classification on natural language inputs, such as search queries and natural language queries.
Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification. In one application, a statistical classifier is used in order to ascertain if an input is a search query or a natural-language input.
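A toy version of the query-vs-natural-language decision the patent describes. Hand-set feature weights stand in for a trained statistical classifier, and the feature set is invented for illustration:

```python
def features(text):
    """A few shallow cues separating keyword queries from questions."""
    toks = text.lower().split()
    return {
        "starts_wh": toks[0] in {"what", "who", "where", "when", "how", "why"},
        "has_question_mark": text.strip().endswith("?"),
        "long": len(toks) > 4,
    }

def classify_input(text):
    """Rule-style scoring with hand-set weights; a real system would
    learn these weights from labeled inputs."""
    f = features(text)
    score = 2 * f["starts_wh"] + 2 * f["has_question_mark"] + f["long"]
    return "natural_language" if score >= 2 else "search_query"

label = classify_input("restaurants seattle chinese")
```

Short keyword strings with no wh-word or question mark score low and route to search; full questions score high and route to the natural-language pipeline.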

Journal ArticleDOI
Randi C. Martin1
TL;DR: Much work remains to be done in developing precise theoretical accounts of sentence processing that can accommodate the observed patterns of breakdown, and theoretical developments may provide a means of accommodating the seemingly contradictory findings regarding the neural organization of sentenceprocessing.
Abstract: Earlier formulations of the relation of language and the brain provided oversimplified accounts of the nature of language disorders, classifying patients into syndromes characterized by the disruption of sensory or motor word representations or by the disruption of syntax or semantics. More recent neuropsychological findings, drawn mainly from case studies, provide evidence regarding the various levels of representations and processes involved in single-word and sentence processing. Lesion data and neuroimaging findings are converging to some extent in providing localization of these components of language processing, particularly at the single-word level. Much work remains to be done in developing precise theoretical accounts of sentence processing that can accommodate the observed patterns of breakdown. Such theoretical developments may provide a means of accommodating the seemingly contradictory findings regarding the neural organization of sentence processing.