
Showing papers on "Natural language published in 2000"


Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, applying statistical and other machine-learning algorithms to large corpora, and demonstrates how the same algorithm can be used for both speech recognition and word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter, and each chapter is built around one or more worked examples that demonstrate its main idea. The book covers the fundamental algorithms of the various subfields, whether originally proposed for spoken or written language, to show how the same algorithm can be used for both speech recognition and word-sense disambiguation. Emphasis is placed on web and other practical applications and on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations


Book
01 Jan 2000
TL;DR: The book covers topics in formal linguistics, intonational phonology, computational linguistics and experimental psycholinguistics, presenting them as an integrated theory of the language faculty in a form accessible to readers from any of those fields.
Abstract: In this book Mark Steedman argues that the surface syntax of natural languages maps spoken and written forms directly to a compositional semantic representation that includes predicate-argument structure, quantification and information structure without constructing any intervening structural representation. His purpose is to construct a principled theory of natural grammar that is directly compatible with both explanatory linguistic accounts of a number of problematic syntactic phenomena and a straightforward computational account of the way sentences are mapped onto representations of meaning. The radical nature of Steedman's proposal stems from his claim that much of the apparent complexity of syntax, prosody and processing follows from the lexical specification of the grammar and from the involvement of a small number of universal rule-types for combining predicates and arguments. These syntactic operations are related to the combinators of Combinatory Logic, engendering a much freer definition of derivational constituency than is traditionally assumed. This property allows Combinatory Categorial Grammar to capture elegantly the structure and interpretation of coordination and intonation contour in English as well as some well-known interactions between word order, coordination and relativization across a number of other languages. It also allows more direct compatibility with incremental semantic interpretation during parsing. The book covers topics in formal linguistics, intonational phonology, computational linguistics and experimental psycholinguistics, presenting them as an integrated theory of the language faculty in a form accessible to readers from any of those fields.

1,489 citations


Book
01 Jan 2000
TL;DR: Chomsky argues that knowledge of language is internal to the human mind and that a proper study of language must deal with this mental construct; human language is therefore a 'biological object' and should be analyzed using the methodology of the sciences.
Abstract: This book is an outstanding contribution to the philosophical study of language and mind, by one of the most influential thinkers of our time. In a series of penetrating essays, Chomsky cuts through the confusion and prejudice which has infected the study of language and mind, bringing new solutions to traditional philosophical puzzles and fresh perspectives on issues of general interest, ranging from the mind-body problem to the unification of science. Using a range of imaginative and deceptively simple linguistic analyses, Chomsky defends the view that knowledge of language is internal to the human mind. He argues that a proper study of language must deal with this mental construct. According to Chomsky, therefore, human language is a 'biological object' and should be analyzed using the methodology of the sciences. His examples and analyses come together in this book to give a unique and compelling perspective on language and the mind.

977 citations


Book
01 Jan 2000
TL;DR: Develops an evolutionary model of language change, situating earlier theories of language change in an evolutionary framework; covers a theory of language meaning in use, form-function re-analysis, interference, intraference and grammaticalization, the selection (propagation) of innovations in language change, and the descent of languages, moving towards an evolutionary linguistics.
Abstract: Contents: an evolutionary model of language change; some theories of language change in an evolutionary framework; a theory of language meaning in use; form-function re-analysis; interference, intraference and grammaticalization; selection (propagation) of innovations in language change; the descent of languages; towards an evolutionary linguistics.

820 citations



Journal ArticleDOI
01 Aug 2000
TL;DR: Reviews statistical language models, which estimate the distribution of various natural language phenomena for speech recognition and other language technologies, and argues for a Bayesian approach to integrating linguistic theories with data.
Abstract: Statistical language models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.

734 citations
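The n-gram approach this survey reviews, and the Bayesian integration of linguistic priors with data that it argues for, can be illustrated with a minimal bigram model. This is an illustrative sketch, not the paper's method: additive (Dirichlet-prior) smoothing stands in for the more sophisticated Bayesian schemes the survey discusses, and all names below are invented for the example.

```python
from collections import defaultdict

def train_bigram_lm(sentences, alpha=1.0):
    """Bigram model with additive smoothing, i.e. the posterior-mean estimate
    under a symmetric Dirichlet(alpha) prior over next-word distributions."""
    unigrams, bigrams, vocab = defaultdict(int), defaultdict(int), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    V = len(vocab)
    def prob(prev, cur):
        # Prior pseudo-counts keep unseen bigrams at nonzero probability.
        return (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * V)
    return prob

prob = train_bigram_lm(["the cat sat", "the dog sat"])
print(prob("the", "cat"))  # -> 0.25 (count 1 + prior 1, over 2 + 6 vocab pseudo-counts)
```

The prior here is the crudest possible encoding of linguistic knowledge ("every continuation is a priori possible"); richer priors are exactly where the paper's proposed theory-data integration would enter.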


Patent
15 Sep 2000
TL;DR: A natural language information querying system includes an indexing facility configured to automatically generate indices of updated textual sources based on one or more predefined grammars, and a database coupled to the indexing facility to store the indices for subsequent searching.
Abstract: A natural language information querying system includes an indexing facility configured to automatically generate indices of updated textual sources based on one or more predefined grammars and a database coupled to the indexing facility to store the indices for subsequent searching.

586 citations
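The index-and-search loop such a querying system rests on can be sketched with a plain inverted index. This is a hypothetical simplification: whitespace tokenization stands in for the patent's grammar-driven analysis, and every name below is invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index over textual sources: token -> ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Ids of documents containing every token of the query."""
    hit_sets = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*hit_sets) if hit_sets else set()

index = build_index({1: "natural language querying system",
                     2: "database indexing facility"})
print(search(index, "natural language"))  # -> {1}
```

Re-running `build_index` as sources are updated corresponds to the patent's automatic regeneration of indices; the stored `index` plays the role of the coupled database.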


Patent
27 May 2000
TL;DR: In this article, a phrase-based modeling of generic structures of verbal interaction is proposed for the purpose of automating part of the design of grammar networks, which can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces.
Abstract: The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.

540 citations



Journal ArticleDOI
TL;DR: It is hypothesized that the neural tissue involved in language processing may not be prespecified exclusively by sensory modality but may entail polymodal neural tissue that has evolved unique sensitivity to aspects of the patterning of natural language.
Abstract: For more than a century we have understood that our brain's left hemisphere is the primary site for processing language, yet why this is so has remained more elusive. Using positron emission tomography, we report cerebral blood flow activity in profoundly deaf signers processing specific aspects of sign language in key brain sites widely assumed to be unimodal speech or sound processing areas: the left inferior frontal cortex when signers produced meaningful signs, and the planum temporale bilaterally when they viewed signs or meaningless parts of signs (sign-phonetic and syllabic units). Contrary to prevailing wisdom, the planum temporale may not be exclusively dedicated to processing speech sounds, but may be specialized for processing more abstract properties essential to language that can engage multiple modalities. We hypothesize that the neural tissue involved in language processing may not be prespecified exclusively by sensory modality (such as sound) but may entail polymodal neural tissue that has evolved unique sensitivity to aspects of the patterning of natural language. Such neural specialization for aspects of language patterning appears to be neurally unmodifiable in so far as languages with radically different sensory modalities such as speech and sign are processed at similar brain sites, while, at the same time, the neural pathways for expressing and perceiving natural language appear to be neurally highly modifiable.

463 citations


Journal ArticleDOI
TL;DR: Reviews the research carried out over approximately the past five years on second language learning as a mediated process; this work has explored some of the original topics in greater depth and, importantly, has moved into new areas not previously studied.
Abstract: This article reviews the research carried out over approximately the past five years on second language learning as a mediated process. Lantolf and Pavlenko (1995) surveyed twenty-five studies carried out from the early 1980s to the mid 1990s on mediated second language (L2) learning. Since the publication of their article only five years ago, more than forty new studies have appeared on mediated learning. As this research has become more robust, it has explored some of the original topics in greater depth and, importantly, it has moved into new areas not previously studied. For example, current work continues to seek to better understand how L2 learning is mediated in the Zone of Proximal Development, a topic of earlier work, but it is now looking more closely at peer rather than expert-novice scaffolding in the ZPD. Research is also studying how experts scaffold novices in concrete classroom situations where concern is not with the ZPD itself, as was the case with the original work, but with how individuals, unaware of such a construct, go about providing and appropriating help in order to learn. While some of the earlier research focused on the role of private speech in carrying out tasks in a second language, more recent research is concerned with the role of private speech in appropriating a second language. A new area of interest that has opened up within the past two or three years deals with the processes through which language mediates the formation of new identities among L2 learners.


Journal ArticleDOI
TL;DR: It will be argued that this framework captures the essence of the Gricean maxims and gives a precise explication of Atlas & Levinson's (1981) idea of balancing between informativeness and efficiency in natural language processing.
Abstract: In a series of papers, Petra Hendriks, Helen de Hoop, and Henriette de Swart have applied optimality theory (OT) to semantics. These authors argue that there is a fundamental difference between the form of OT as used in syntax on the one hand and its form as used in semantics on the other hand. Whereas in the first case OT takes the point of view of the speaker, in the second case the point of view of the hearer is taken. The aim of this paper is to argue that the proper treatment of OT in natural language interpretation has to take both perspectives at the same time. A conceptual framework is established that realizes the integration of both perspectives. It will be argued that this framework captures the essence of the Gricean maxims and gives a precise explication of Atlas & Levinson's (1981) idea of balancing between informativeness and efficiency in natural language processing. The ideas are then applied to resolve some puzzles in natural language interpretation.

Journal ArticleDOI
TL;DR: Results are consistent with the view that language switching is a part of a general executive attentional system and that languages are represented in overlapping areas of the brain in early bilinguals.

Patent
19 Oct 2000
TL;DR: In this article, a natural language interface control system for operating a plurality of devices (114) comprises a first microphone array (108), a feature extraction module (202) coupled to the first microphone array, and a speech recognition module (204) coupled to the feature extraction module, wherein the speech recognition module utilizes hidden Markov models.
Abstract: A natural language interface control system (206) for operating a plurality of devices (114) consists of a first microphone array (108), a feature extraction module (202) coupled to the first microphone array, and a speech recognition module (204) coupled to the feature extraction module, wherein the speech recognition module utilizes hidden Markov models. The system also comprises a natural language interface module (222) coupled to the speech recognition module (204) and a device interface (210) coupled to the natural language interface module (222), wherein the natural language interface module is for operating a plurality of devices coupled to the device interface based upon non-prompted, open-ended natural language requests from a user.
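The hidden Markov models named in the abstract are typically decoded with the Viterbi algorithm, which recovers the most likely hidden-state sequence for a stream of acoustic observations. The sketch below is illustrative only; the two-state parameters are toy values invented for the example, not anything from the patent.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for an observation sequence under an HMM.
    pi: initial state probs (N,); A: transitions (N, N); B: emissions (N, M)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):  # allow log(0) -> -inf for impossible events
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, N))            # best log-prob of any path ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: come from i, land in j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy two-state example (parameters invented for illustration).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))  # -> [0, 0, 1]
```

In a real recognizer the observations would be quantized feature vectors from the feature extraction module rather than integer symbols.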

Journal ArticleDOI
TL;DR: An attempt at using syntactic structure in natural language to improve language models for speech recognition, based on an original probabilistic parameterization of a shift-reduce parser.

Patent
02 Feb 2000
TL;DR: In this paper, the authors describe a system, method, and program product that are used in interactive natural language dialogs, where one or more presentation managers operating on a computer system present information from the computer system to users over network interfaces (e.g., speech, typed in text, pointing devices).
Abstract: This patent describes a novel system, method, and program product that are used in interactive natural language dialog. One or more presentation managers operating on a computer system present information from the computer system to one or more users over network interface(s) and accept queries from the users using one or more known input/output modalities (e.g., speech, typed text, pointing devices). A natural language parser parses one or more natural language phrases received over one or more of the network interfaces by one or more of the presentation managers into one or more logical forms (parsed user input), each logical form having a grammatical and structural organization. A dialog manager module maintains and directs interactive sessions between each of the users and the computer system. The dialog manager receives logical forms from one or more of the presentation managers and sends these to a taxonomical mapping process, which matches the items of interest to the user against the content organization in the content database to match business categories and sends modified logical forms back to the dialog manager.

Patent
10 Nov 2000
TL;DR: A real-time system (100) incorporating speech recognition and linguistic processing for recognizing a user's spoken query, distributed between a client (150) and a server (180), is disclosed.
Abstract: A real-time system (100) incorporating speech recognition and linguistic processing for recognizing a spoken query by a user, distributed between a client (150) and a server (180), is disclosed. The system (100) accepts the user's queries in the form of speech at the client (150), where minimal processing extracts a sufficient number of acoustic speech vectors representing the utterance. These vectors are sent via a communications channel (160A) to the server (180), where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs), and appropriate grammars and dictionaries conditioned by the selections made by the user, the speech representing the user's query is fully decoded into text (or some other suitable form) at the server (180). The text corresponding to the user's query is then simultaneously sent to a natural language engine (190) and a database processor (186), where optimized SQL statements are constructed for a full-text search from a database (188) for a record set of several stored questions that best matches the user's query. Further processing in the natural language engine (190) narrows the search to a single stored question. The answer corresponding to this single stored question is next retrieved from the file path and sent to the client (150) in compressed form. At the client (150), the answer to the user's query is articulated to the user using a text-to-speech engine (159) in his or her native natural language. The system (100) requires no training and can operate in several natural languages.

Book
22 Dec 2000
TL;DR: Works towards a syntactic model of interpretation, asking whether natural language can be treated as a formal language.
Abstract: Contents:
1. Towards a Syntactic Model of Interpretation: Natural Language as a Formal Language?; Underspecification in Language Processing; The Representational Theory of Mind; Pronominal Anaphora: Semantic Problems; The Problem of Multiple Ambiguity; The Problem of Uniqueness; The Problem of Indirect Reference; Quantification; Syntactic Processes of Anaphora; The Anaphora Solution: Towards a Representational Account.
2. The General Framework: A Preliminary Sketch; The Data Structures of the Parsing Model; Atomic Formulae; Tree Modalities; Basic Tree Structures; Partial Tree Structures; Requirements; Descriptions of Tree Structures.
3. The Dynamics of Tree Building: The Parsing Process (A Sketch); A Basic Example; A Left-Dislocation Example; Verb-final Languages and the Grammar-Parser Problem; The Parsing Process Defined; Computational Rules; Lexical Transitions; Pragmatic Actions and Lexical Constraints; Summary.
4. Linked Tree Structures: Relative Clauses (Preliminaries); The LINK Relation; The Data Reviewed; The Analysis (A Sketch for English); Defining Linked Tree Structures; Relativisers Annotating Unfixed Nodes; Relatives: Towards a Dynamic Typology; Relativisers Projecting a Requirement; Variation in Locality; Topic Structures and Relatives; Variation in Order (Head-Final Relatives); Head-Internal Relatives; The Potential for Lexical Variation; Genitive Constructions as LINK Structures; Summary.
5. Wh Questions: A General Perspective: Introduction; The Semantic Diversity of wh Questions; Scopal Properties of wh Expressions; Wh-initial vs Wh-in-situ Structures; Wh-in-situ Structures; Wh-in-situ from a Dynamic Perspective; Expletive wh Structures; Partial Movement; Partial Movement as a Reflex of a Requirement; Wh Expressions and Scope Effects.
6. Crossover Phenomena: Crossover (The Problem); Crossover (The Dynamic Account); Crossover in Relatives; Crossover Phenomena in Questions; Summary.
7. Quantification Preliminaries: Introduction; Scope Effects and Indefinites; Quantification; Quantified NPs; Scope; Term Reconstructions; Applications (E-type Anaphora).
8. Reflections on Language Design: The Overall Perspective; Underspecification and the Formal Language Metaphor; English Is Not a Formal Language; Wellformedness and Availability of Interpretations; Universals and Language Variation; On Knowledge of Language.
9. Appendix: The Formal Framework: Introduction; Declarative Structure; Feature-Decorated Tree Construction; Goal-Directedness; The Structure of Goal-Directed Pointed Partial Tree Models; Tree Descriptions; Procedural Structure; Actions over Goal-Directed Partial Tree Models; Natural Languages; Axioms; Finite Binary Trees; Partial Trees; Requirements; Actions; Partial Order; Logical Forms; Computational Rules; Update Actions; Pragmatic Actions.
General Index. Symbol Index.

Journal ArticleDOI
TL;DR: Argues that nearly every text uses two modes of communication, language as writing and image, yet linguists and applied linguists continue to act as though language fully represented the meanings to be communicated, treating these other features as someone else's business to look after.
Abstract: Nearly every text that I look at uses two modes of communication: (a) language as writing and (b) image. Yet TESOL professionals continue to act as though language fully represented the meanings they wish to encode and communicate. Yes, they admit that other features are important, but if pressed, the linguist and the applied linguist (the language teacher, let us say) would maintain that their business was language, after all, and these other things were someone else's to look after.

Journal ArticleDOI
J. R. Bellegarda
TL;DR: The objective of this work is to characterize the behavior of such multispan modeling in actual recognition and to discuss intrinsic multispan tradeoffs, such as the influence of training data selection on the resulting performance.
Abstract: Multispan language modeling refers to the integration of various constraints, both local and global, present in the language. It was recently proposed to capture global constraints through the use of latent semantic analysis, while taking local constraints into account via the usual n-gram approach. This has led to several families of data-driven, multispan language models for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the multispan performance, as measured by perplexity, has been shown to compare favorably with the corresponding n-gram performance. The objective of this work is to characterize the behavior of such multispan modeling in actual recognition. Major implementation issues are addressed, including search integration and context scope selection. Experiments are conducted on a subset of the Wall Street Journal (WSJ) speaker-independent, 20,000-word vocabulary, continuous speech task. Results show that, compared to standard n-gram, the multispan framework can lead to a reduction in average word error rate of over 20%. The paper concludes with a discussion of intrinsic multispan tradeoffs, such as the influence of training data selection on the resulting performance.
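The combination of a local n-gram term with a global latent-semantic term can be sketched roughly as a linear interpolation. This is a schematic reconstruction, not Bellegarda's formulation: the toy latent vectors, the exponentiated-cosine normalization, and all names below are assumptions made for the example.

```python
import numpy as np

def multispan_prob(word, history, ngram_prob, word_vecs, lam=0.7):
    """Interpolate a local n-gram probability with a global latent-semantic score.
    word_vecs maps each vocabulary word to a latent-space vector (e.g. from an
    SVD/LSA decomposition of a training corpus)."""
    p_local = ngram_prob(word, history[-1:])        # short-span n-gram term
    h = sum(word_vecs[w] for w in history)          # pseudo-document for the history
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    # Long-span term: exponentiated cosine, renormalised over the vocabulary
    # so it behaves like a probability distribution.
    sims = {w: np.exp(cos(v, h)) for w, v in word_vecs.items()}
    p_global = sims[word] / sum(sims.values())
    return lam * p_local + (1 - lam) * p_global

# Toy latent vectors: "bank" is close to "river" in this made-up space.
vecs = {"bank": np.array([1.0, 0.0]),
        "river": np.array([0.9, 0.1]),
        "loan": np.array([0.0, 1.0])}
uniform = lambda w, h: 1 / 3                        # stand-in n-gram model
p = multispan_prob("bank", ["river"], uniform, vecs)
```

Even with a uniform local model, the global term raises the score of words semantically close to the history, which is the complementarity the paper measures via perplexity.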

Journal Article
TL;DR: A computer assisted word acquisition programme (CAVOCA) is described which operationalises current theoretical thinking about word acquisition, and its contents are based on a systematic inventory of the vocabulary relevant for the target group.
Abstract: During the initial stages of instructed L2 acquisition students learn a couple thousand, mainly high frequency words. Functional language proficiency, however, requires mastery of a considerably larger number of words. It is therefore necessary at the intermediate and advanced stages of language acquisition to learn a large vocabulary in a short period of time. There is not enough time to copy the natural (largely incidental) L1 word acquisition process. Incidental acquisition of the words is only possible up to a point, because, on account of their low frequency, they do not occur often enough in the L2 learning material. Acquisition of new words from authentic L2 reading texts by means of strategies such as contextual deduction is also not a solution for a number of reasons. There appears to be no alternative to intentional learning of a great many new words in a relatively short period of time. The words to be learned may be presented in isolation or in context. Presentation in bilingual word lists seems an attractive shortcut because it takes less time than contextual presentation and yields excellent short term results. Long term retention, however, is often disappointing so contextual presentation seems advisable. Any suggestions how to implement this in pedagogic contexts should be based on a systematic analysis of the two most important aspects of the L2 word learning problem, that is to say, selecting the relevant vocabulary (which and how many words) and creating optimal conditions for the acquisition process. This article sets out to describe a computer assisted word acquisition programme (CAVOCA) which tries to do precisely this: the programme operationalises current theoretical thinking about word acquisition, and its contents are based on a systematic inventory of the vocabulary relevant for the target group. 
To establish its efficiency, the programme was contrasted in a number of experimental settings with a paired associates method of learning new words. The experimental results suggest that an approach combining the two methods is most advisable.

Patent
19 Jan 2000
TL;DR: In this paper, a natural language-based interface to data presentation systems, for example information visualization system interfaces, is realized by employing so-called open-ended natural language inquiries that the interface translates into database queries and a set of information to be provided to a user.
Abstract: A natural language-based interface to data presentation systems, for example information visualization system interfaces, is realized by employing so-called open-ended natural language inquiries that the interface translates into database queries and a set of information to be provided to a user. More specifically, a natural language inquiry is translated to database queries by determining if any complete database queries can be formulated based on the natural language inquiry and, if so, specifying which complete database queries are to be made. In accordance with one aspect of the invention, knowledge of the information visualization presentation is advantageously employed in the interface to guide a user in response to the user's natural language inquiries. In accordance with another aspect of the invention, knowledge of the database and knowledge of the information visualization presentation are advantageously employed in the interface to guide a user in response to the user's natural language inquiries. In accordance with still another aspect of the invention, knowledge of the database, knowledge of the information visualization presentation and context information about the query dialogue are advantageously employed in the interface to guide a user in response to the user's natural language inquiries. In one or more first prescribed embodiments of the invention, the set of data presentation information can be in audio, visual, or both audio-visual form. In one or more other prescribed embodiments of the invention, the inquiry and data presentation information delivery process can be interactive between the user and the interface. In one or more still other prescribed embodiments of the invention, one or more modes of user-interface interaction can be utilized. These modes of interaction can include text, speech, point and click, or the like.


Book ChapterDOI
14 Aug 2000
TL;DR: This article shows how the fundamental semiotic primitives are represented in semantically equivalent notations for logic, including controlled natural languages and various computer languages.
Abstract: The Internet is a giant semiotic system. It is a massive collection of Peirce’s three kinds of signs: icons, which show the form of something; indices, which point to something; and symbols, which represent something according to some convention. But current proposals for ontologies and metadata have overlooked some of the most important features of signs. A sign has three aspects: it is (1) an entity that represents (2) another entity to (3) an agent. By looking only at the signs themselves, some metadata proposals have lost sight of the entities they represent and the agents – human, animal, or robot – which interpret them. With its three branches of syntax, semantics, and pragmatics, semiotics provides guidelines for organizing and using signs to represent something to someone for some purpose. Besides representation, semiotics also supports methods for translating patterns of signs intended for one purpose to other patterns intended for different but related purposes. This article shows how the fundamental semiotic primitives are represented in semantically equivalent notations for logic, including controlled natural languages and various computer languages.

Patent
David Elworthy
23 Feb 2000
TL;DR: A method and apparatus for performing a search for information containing natural language is disclosed which uses a natural language query: the query is input in the form of units of the natural language and matched with units in the natural language of the data.
Abstract: A method and apparatus for performing a search for information containing natural language is disclosed which uses a natural language query. The query is input in the form of units of the natural language and this is matched with units in the natural language of the data. Where there are unmatched units in the query and/or the data, context data in the form of one or more unmatched units of the query and/or the data is generated. Each unmatched unit has a predefined linguistic relationship to a matched unit. Output data is formed from the matched units together with any respective context data.
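The matched/unmatched-unit scheme in the abstract can be sketched as a small set operation. The pair-based `relations` encoding below is a hypothetical stand-in for the patent's "predefined linguistic relationship", and all names are invented for the example.

```python
def match_with_context(query_units, data_units, relations):
    """Match query units against data units; an unmatched query unit that stands
    in a predefined relationship to some matched unit becomes context data."""
    matched = [u for u in query_units if u in data_units]
    context = [u for u in query_units
               if u not in data_units
               and any((u, m) in relations for m in matched)]
    return matched, context

# Hypothetical example: "red" modifies "car", so it survives as context data.
matched, context = match_with_context(["red", "car"], {"car", "road"}, {("red", "car")})
print(matched, context)  # -> ['car'] ['red']
```

Unmatched units with no relationship to any matched unit are simply dropped, mirroring the patent's restriction of context data to linguistically related units.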


Book
01 Jan 2000
TL;DR: The book discusses plagiarism and referencing, the difficulties and successes students have in developing authorial voice when writing from multiple sources, and a pedagogy for plagiarism and referencing.
Abstract: Real Language Series. General Editors: Jennifer Coates, Jenny Cheshire and Euan Reid. This is a sociolinguistic series about the relationships between language, society and social change. Books in the series draw on natural language data from a wide range of social contexts. The series takes a critical approach to the subject, challenging current orthodoxies, and dealing with familiar topics in new ways. The topic of plagiarism is a highly contentious issue and one that is of growing interest and importance in higher education across the world. Stolen Language? Plagiarism in Writing uncovers the reasons why students plagiarize, and explains what can be done about it. It challenges the concepts of original authorship of language, tracing the notion of plagiarism to the introduction of copyright laws in the eighteenth century. The analysis presented in this book explores plagiarism as complex and contested, and suggests that in student academic writing it may be the surface manifestation of learning difficulties related to the educational environment, the nature of academic discourse and the nature of language. Underlying the concept of plagiarism is the premise that meaning is made by the individual, using the system of language at his or her disposal. The words and ideas then belong to the individual who first thought of them, or who first used these words in a particular way. New understandings, that language and cognition are fundamentally social and cultural, contest the idea of 'original thought' or 'original language'. In addition, what constitutes plagiarism differs depending on the genre and context of writing. Stolen Language? shows that there is in any good writing an authorial presence, an authorial voice which is particularly difficult for the novice writer to control when constructing an essay based on multiple texts.
Written in a unique and accessible way, the book also looks at the particular difficulties experienced by writers of English as an additional language and provides a practical framework for academics and teachers of writing on how to develop authorial voice and critical thinking in the student writer.

Journal ArticleDOI
01 Dec 2000 - Language
TL;DR: Presents a book series that studies general problems of linguistics from the perspective of individual languages, language families, language groups, or language samples, with special emphasis on little-known languages whose analysis may shed new light on long-standing problems in general linguistics.
Abstract: The series is a platform for contributions of all kinds to this rapidly developing field. General problems are studied from the perspective of individual languages, language families, language groups, or language samples. Conclusions are the result of a deepened study of empirical data. Special emphasis is given to little-known languages, whose analysis may shed new light on long-standing problems in general linguistics.