Perspectives on the utility of linguistic knowledge in English word prediction

01 Jan 2005
TL;DR: The major findings suggest that the practical utility of linguistic knowledge in language technology should generally be evaluated from at least three larger perspectives: (1) language, (2) technology, and (3) the user of the application.
Abstract: The problem addressed in the present thesis is the utility of linguistic knowledge in one domain of language technology, word prediction. An important characteristic of any practical language technology application is its level of performance, and it is therefore essential to be able to measure this quantitatively. The main questions in the present thesis are the following: (1) how can a significant improvement in performance be obtained in practical language technology products, and (2) what is the cost of improved performance in terms of the sources of linguistic knowledge that should be incorporated in them? On a more general level, the major findings suggest that the practical utility of linguistic knowledge in language technology should generally be evaluated from at least three larger perspectives: (1) language, (2) technology, and (3) the user of the application. From these three perspectives, a variety of constraints can be identified which either increase or decrease the usefulness of linguistic knowledge in practical language technology applications. A statistical state-of-the-art word prediction system was developed and tested in the empirical part of this work, and testing the performance of a few prediction methods that utilise sources of linguistic knowledge showed that they can perform just as well as some existing state-of-the-art statistical prediction methods. When the syllable-initial characters of the words to be predicted were used, for example, the expected length of the search key in a running text with a prediction list of ten tokens was only 1.59 characters, while the use of information on the parts of speech of the word tokens to be predicted in a system with five lists representing five parts of speech resulted in only a three percent improvement in performance.
One of the practical implications of these results for the field of language technology is that a significant improvement in the performance of a word prediction system may be achieved only incrementally. The simultaneous use of several techniques may in turn dilute the real-time operation of the prediction system, so that it is unable to suggest candidate words quickly enough for the user. It can also affect some performance aspects such as the average percentage of keystrokes/characters saved.
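The two measures named in the abstract, the expected length of the search key and the percentage of characters saved, can be illustrated with a minimal sketch. It is not the system built in the thesis: it assumes a plain unigram frequency model with ordinary prefix matching (not the syllable-initial keys the thesis reports on), ignores the keystroke needed to pick a word from the list, and all function names are hypothetical.

```python
from collections import Counter


def build_model(training_words):
    """Unigram frequency model (an assumption): word -> count."""
    return Counter(training_words)


def predict(model, prefix, list_size):
    """Return up to list_size known words starting with prefix, most frequent first."""
    candidates = [w for w in model if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-model[w], w))  # ties broken alphabetically
    return candidates[:list_size]


def search_key_length(model, word, list_size):
    """Characters typed before word shows up in the prediction list.

    Falls back to len(word) when the word is never offered (e.g. unknown words).
    """
    for k in range(len(word) + 1):
        if word in predict(model, word[:k], list_size):
            return k
    return len(word)


def evaluate(model, test_words, list_size):
    """Return (mean search key length, fraction of characters saved)."""
    keys = [search_key_length(model, w, list_size) for w in test_words]
    typed = sum(keys)
    total = sum(len(w) for w in test_words)
    return typed / len(test_words), 1 - typed / total


if __name__ == "__main__":
    model = build_model("the cat sat on the mat the cat ran".split())
    mean_key, saved = evaluate(model, ["the", "cat", "mat", "sat"], list_size=2)
    print(f"mean search key length: {mean_key:.2f}, characters saved: {saved:.1%}")
```

A longer prediction list shortens the search key but gives the user more candidates to scan, which is one form of the trade-off between saved keystrokes and real-time usability raised above.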
Citations
Patent
08 Feb 2007
TL;DR: A software application uses words contained in an application document to provide context-based word prediction in the same or a related document, and the user may choose from the presented candidate words for automatic population into the document being edited.
Abstract: Context-based word prediction is provided. A software application utilizes words contained in an application document to provide context-based word prediction in the same or a related document. The software application creates an application defined data source and populates the data source with words occurring in a document. When the same or a related document is being edited via an input method, for example, typing, speech recognition, electronic handwriting, etc., a prediction engine presents candidate words from the application defined data source that match current text input, and the user may choose from the presented candidate words for automatic population into the document being edited. Information from the application defined data source may be transferred between computing devices, for example, between a mobile computing device and a desktop (non-mobile) computing device.

217 citations

Journal Article
TL;DR: One of the most common computer-assisted activities is the generation of text, which should be made as easy for the user as possible; the standard computer keyboard, however, may present significant access problems for some users.
Abstract: Computers and computer-based technology have become an integral part of the lives of many individuals with disabilities. One of the most common activities that can be computer assisted is the generation of text. People who cannot accurately control their extremities (due to disabilities such as cerebral palsy and spinal cord injury) use computers as writing tools. People whose physical disability restricts their spoken output may use a computer as a communication prosthesis. In both cases, the generation of text is a necessary activity that can be physically demanding. It should be made as easy for the user as possible. While the standard computer keyboard is an efficient interface for able-bodied people and some disabled people, it may present significant access problems for others. In these cases some alternative interface is necessary.

94 citations

Patent
08 Feb 2007
TL;DR: A number of textual candidates can be predicted based in part on user input and data stored in a store component, and the predicted candidates can be suggested to the user as a number of suggested textual candidates.
Abstract: Embodiments are provided to predict and suggest one or more candidates. Words, acronyms, compound words, phrases, and other textual and symbolic representations can be predicted and suggested to a user as part of an input process or other user operation. In an embodiment, a number of textual candidates can be predicted based in part on user input and data stored in a store component. The number of predicted textual candidates can be suggested to a user as a number of suggested textual candidates. Embodiments enable a user to select an appropriate textual candidate from the number of suggested textual candidates, while reducing a number of associated user operations.

55 citations

Proceedings ArticleDOI
15 Oct 2007
TL;DR: It is shown that training on a combination of in-domain data with out-of-domain data is often more beneficial than either data set alone and that advanced language modeling such as topic modeling is portable even when applied to very different text.
Abstract: Word prediction can be used to enhance the communication rate of people with disabilities who use Augmentative and Alternative Communication (AAC) devices. We use statistical methods in a word prediction system, which are trained on a corpus, and then measure the efficacy of the resulting system by calculating the theoretical keystroke savings on some held out data. Ideally training and testing should be done on a large corpus of AAC text covering a variety of topics, but no such corpus exists. We discuss training and testing on a wide variety of corpora meant to approximate text from AAC users. We show that training on a combination of in-domain data with out-of-domain data is often more beneficial than either data set alone and that advanced language modeling such as topic modeling is portable even when applied to very different text.

29 citations

Journal Article
TL;DR: This paper used a permutation test to detect the linguistic sources of the syntactic variation between two groups, the adults who had received their school education in Finland and the adolescents who were educated in Australia.
Abstract: The paper discusses an application of a technique to tag a corpus containing the English of Finnish Australians automatically and to analyse the frequency vectors of part-of-speech (POS) trigrams using a permutation test. Our goal is to detect the linguistic sources of the syntactic variation between two groups, the ‘Adults,’ who had received their school education in Finland, and the ‘Juveniles,’ who were educated in Australia. The idea of the technique is to utilise frequency profiles of trigrams of POS categories as indicators of syntactic distance between the groups and then examine potential effects of language contact and language (‘vernacular’) universals in SLA. The results show that some features we describe as ‘contaminating’ the interlanguage of the Adults can be best attributed to Finnish substratum transfer. However, there are other features in our data that may also be ascribed to more “universal” primitives or universal properties of the language faculty. As we have no evidence of potential contamination at the early stages of the Juveniles’ L2 acquisition, we cannot yet prove or refute our hypothesis about the strength of contact influence as opposed to that of the other factors.

16 citations
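
The permutation test on POS-trigram frequency vectors described in the entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the distance measure (sum of absolute differences between relative frequencies), the choice to permute whole sentences, and every function name are hypothetical.

```python
import random
from collections import Counter


def trigram_counts(sentences):
    """Count POS trigrams over a list of sentences, each a list of POS tags."""
    counts = Counter()
    for tags in sentences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts


def distance(group_a, group_b):
    """Sum of absolute differences between relative trigram frequencies (assumed measure)."""
    ca, cb = trigram_counts(group_a), trigram_counts(group_b)
    na = sum(ca.values()) or 1  # avoid division by zero for empty groups
    nb = sum(cb.values()) or 1
    return sum(abs(ca[t] / na - cb[t] / nb) for t in set(ca) | set(cb))


def permutation_test(group_a, group_b, rounds=1000, seed=0):
    """Return (observed distance, p-value) after shuffling sentences between groups."""
    rng = random.Random(seed)
    observed = distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        permuted = distance(pooled[:len(group_a)], pooled[len(group_a):])
        if permuted >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (rounds + 1)
```

A small p-value indicates that the observed difference between the two groups' trigram profiles is unlikely to arise from a random split of the pooled sentences.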

References
Journal ArticleDOI
01 Sep 2000-Language
TL;DR: This volume presents the WordNet lexical database (nouns, modifiers, a semantic network of English verbs, and the database and searching software) together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet, George A. Miller; modifiers in WordNet, Katherine J. Miller; a semantic network of English verbs, Christiane Fellbaum; design and implementation of the WordNet lexical database and searching software, Randee I. Tengi. Part 2: automated discovery of WordNet relations, Marti A. Hearst; representing verb alternations in WordNet, Karen T. Kohl et al.; the formalization of WordNet by methods of relational concept analysis, Uta E. Priss. Part 3, Applications of WordNet: building semantic concordances, Shari Landes et al.; performance and confidence in a semantic annotation task, Christiane Fellbaum et al.; WordNet and class-based probabilities, Philip Resnik; combining local context and WordNet similarity for word sense identification, Claudia Leacock and Martin Chodorow; using WordNet for text retrieval, Ellen M. Voorhees; lexical chains as representations of context for the detection and correction of malapropisms, Graeme Hirst and David St-Onge; temporal indexing through lexical chaining, Reem Al-Halimi and Rick Kazman; COLOR-X - using knowledge from WordNet for conceptual modelling, J.F.M. Burg and R.P. van de Riet; knowledge processing on an extended WordNet, Sanda M. Harabagiu and Dan I. Moldovan. Appendix: obtaining and using WordNet.

13,049 citations


"Perspectives on the utility of ling..." cites this work as background:

  • ...This is a large, hand-built computer-based lexicon that can also be regarded as an on-line lexical database (see Beckwith et al. 1991; Fellbaum 1998)....

Book
01 Jan 1985
TL;DR: The book treats the clause (constituency, towards a functional grammar, clause as message, clause as exchange, and clause as representation) and what lies above, below and beyond the clause: groups and phrases below it, and the clause complex above it.
Abstract: This third edition of An Introduction to Functional Grammar has been extensively revised. While retaining the organization and coverage of the earlier editions, it incorporates a considerable amount of new material. This includes strengthening the grammar through the use of data from a large-scale corpus, upgrading the description throughout, and giving greater emphasis to the systemic perspective, in which grammaticalization is understood in the context of an overall model of language. The approach taken in the book overcomes the distinction between theoretical and applied linguistics. The description of grammar is grounded in a comprehensive theory, but it is a theory which evolves in the process of being applied.

12,963 citations

Book
01 Jan 1948
TL;DR: The Mathematical Theory of Communication was originally published as a paper on communication theory more than fifty years ago; republished in book form, it has since gone through four hardcover and sixteen paperback printings.
Abstract: Scientific knowledge grows at a phenomenal pace--but few books have had as lasting an impact or played as important a role in our modern world as The Mathematical Theory of Communication, published originally as a paper on communication theory more than fifty years ago. Republished in book form shortly thereafter, it has since gone through four hardcover and sixteen paperback printings. It is a revolutionary work, astounding in its foresight and contemporaneity. The University of Illinois Press is pleased and honored to issue this commemorative reprinting of a classic.

10,215 citations

Book
01 Jan 1976
TL;DR: This book studies the cohesion that arises from semantic relations between sentences, considering reference from one sentence to another, repetition of word meanings, and the conjunctive force of but, so, then and the like.
Abstract: Cohesion in English is concerned with a relatively neglected part of the linguistic system: its resources for text construction, the range of meanings that are specifically associated with relating what is being spoken or written to its semantic environment. A principal component of these resources is 'cohesion'. This book studies the cohesion that arises from semantic relations between sentences. Reference from one to the other, repetition of word meanings, the conjunctive force of but, so, then and the like are considered. Further, it describes a method for analysing and coding sentences, which is applied to specimen texts.

7,006 citations