scispace - formally typeset

Showing papers on "Utterance published in 2006"


Patent
04 Aug 2006
TL;DR: In this article, a conversational human-machine interface is presented that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, apply domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message.
Abstract: A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.

430 citations


Proceedings Article
04 Dec 2006
TL;DR: This work demonstrates that the trend toward predictability-sensitive syntactic reduction (Jaeger, 2006) is robust in the face of a wide variety of control variables, and presents evidence that speakers use both surface and structural cues for predictability estimation.
Abstract: If language users are rational, they might choose to structure their utterances so as to optimize communicative properties. In particular, information-theoretic and psycholinguistic considerations suggest that this may include maximizing the uniformity of information density in an utterance. We investigate this possibility in the context of syntactic reduction, where the speaker has the option of either marking a higher-order unit (a phrase) with an extra word, or leaving it unmarked. We demonstrate that speakers are more likely to reduce less information-dense phrases. In a second step, we combine a stochastic model of structured utterance production with a logistic-regression model of syntactic reduction to study which types of cues speakers employ when estimating the predictability of upcoming elements. We demonstrate that the trend toward predictability-sensitive syntactic reduction (Jaeger, 2006) is robust in the face of a wide variety of control variables, and present evidence that speakers use both surface and structural cues for predictability estimation.
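The paper's logistic-regression approach to syntactic reduction can be illustrated with a toy model; the intercept and slope below are hypothetical stand-ins, not the paper's fitted coefficients:

```python
import math

def p_reduce(info_density_bits, intercept=2.0, slope=-1.5):
    """Toy logistic model of syntactic reduction: probability that the
    speaker omits an optional marker (e.g., the complementizer 'that')
    as a function of the information density of the phrase it would
    introduce. Coefficients are invented for illustration."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * info_density_bits)))

# Predictable (low-density) phrases are reduced more often than
# surprising (high-density) ones, matching the reported trend.
predictable = p_reduce(0.5)  # low information density
surprising = p_reduce(3.0)   # high information density
```

With these invented coefficients, the reduction probability falls from roughly 0.78 for a predictable phrase to under 0.1 for a surprising one, which is the qualitative pattern the paper reports.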

387 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore how the wealth of insights provided by the literature on the interpretation of prosody might be integrated into the relevance-theoretic framework (Sperber and Wilson, 1986/1995; Blakemore, 2002; Carston, 2002).

293 citations


Journal ArticleDOI
TL;DR: This article proposes an elicited imitation test that requires test takers to focus attention first on the meaning of an utterance before repeating it; some of the sentences test takers are presented with are grammatical and others are ungrammatical.
Abstract: A key issue in the field of second language acquisition has been the difficulty of specifying accurate measures of implicit language knowledge. This paper describes the development of an elicited imitation test. Its design differs from that of most other elicited imitation tests in that it (a) requires test takers to focus attention first on the meaning of the utterance before repeating it and (b) presents test takers with a mix of grammatical and ungrammatical sentences. Test takers are asked to repeat sentences in correct English. It is hypothesized that (a) requiring test takers to respond to the meaning of an utterance reduces the likelihood that they will explicitly focus on linguistic form and thus access explicit language knowledge, and that (b) spontaneous correction of incorrect sentences is a powerful indication of the constraints of participants' internal grammars (Munnich et al. 1994). The test is trialled on a baseline group of 20 native speakers and a sample of 95 second language learners. Evidence is presented which suggests that this test is a likely measure of implicit language knowledge.

282 citations


Journal ArticleDOI
01 Oct 2006-Lingua
TL;DR: The authors consider two post-Gricean attempts to provide an explanatory account of verbal irony: irony as echoic use of language, in which the speaker tacitly dissociates herself from an attributed utterance or thought, and irony as a type of pretence, in which the speaker "makes as if" to perform a certain speech act, expecting her audience to see through the pretence and recognise the mocking or critical attitude behind it.

269 citations


Journal ArticleDOI
TL;DR: In this paper, the author examines documents that are authorized by institutions (such as race-equality policies, which are often signed by, say, the vice-chancellor on behalf of an institution), make claims about the institution (for instance, by describing the institution as having certain qualities, such as being diverse), or point toward future action (by committing an institution to a course of action, such as diversity or equality, which in turn might involve the commitment of resources).
Abstract: In this paper, I reflect on institutional speech acts: those that make claims "about" or "on behalf of" an institution. Such speech acts involve acts of naming: the institution is named, and in being "given" a name, the institution is also "given" attributes, qualities, and even a character. By "speech acts" I include not just spoken words but writing and visual images: all the materials that give an institution interiority, as if it has a face, as well as feelings, thoughts, or judgments. They might say, for example, "the university regrets," or just simply, "we regret." More specifically, in this paper, I examine documents that are authorized by institutions (such as race-equality policies, which are often signed by, say, the vice-chancellor on behalf of an institution), make claims about the institution (for instance, by describing the institution as having certain qualities, such as being diverse), or point toward future action (by committing an institution to a course of action, such as diversity or equality, which in turn might involve the commitment of resources). Such speech acts do not do what they say: they do not, as it were, commit a person, organization, or state to an action. Instead, they are nonperformatives. They are speech acts that read as if they are performatives, and this "reading" generates its own effects. For John Langshaw Austin, a performative refers to a particular class of speech. An utterance is performative when it does what it says: "the issuing of the utterance is the performing of an action" (1975, 6). For Austin, conditions have to be in place to allow such words to act, or in his

231 citations


Journal ArticleDOI
TL;DR: The authors examined the interaction between utterance and scene processing by monitoring eye movements in depicted agent-action-patient events while participants listened to related utterances; the results suggest that comprehension preferentially relies on depicted events over stereotypical thematic knowledge for thematic interpretation.

220 citations


Patent
28 Nov 2006
TL;DR: In this article, a voice dialing method is described that includes receiving an utterance from a user, decoding the utterance to identify a recognition result, and communicating the recognition result to the user; if the user indicates that the communicated recognition result is incorrect, it is added to a rejection reference.
Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating the recognition result to the user. If an indication is received from the user that the communicated recognition result is incorrect, then it is added to a rejection reference. Then, when the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect recognition result as a potential subsequent recognition result. The method can be used for single or multiple digits or digit strings.
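The rejection-reference mechanism can be sketched as follows; the candidate lists, scores, and data structures are hypothetical, since the patent leaves the implementation open:

```python
def recognize_with_rejection(candidates, rejected):
    """Return the most likely candidate not present in the rejection
    reference; candidates are (text, score) pairs."""
    for cand, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if cand not in rejected:
            return cand
    return None

# First attempt: the recognizer mishears the dialed number.
candidates = [("555-1239", 0.81), ("555-1234", 0.78)]
rejected = set()
first = recognize_with_rejection(candidates, rejected)

# The user indicates the result is wrong, so it joins the rejection
# reference; when the utterance is repeated, it cannot win again.
rejected.add(first)
second = recognize_with_rejection(candidates, rejected)
```

Here the second attempt returns the runner-up candidate, which is the behavior the abstract describes: the previously rejected result is eliminated from consideration.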

184 citations


Proceedings Article
01 May 2006
TL;DR: A way to improve the discriminative quality of gender-dependent features is proposed: the emotion recognition system is preceded by an automatic gender detection that decides which of two gender-dependent emotion classifiers is used to classify an utterance.
Abstract: Feature extraction is still a disputed issue for the recognition of emotions from speech. Differences in features for male and female speakers are a well-known problem, and it is established that gender-dependent emotion recognizers perform better than gender-independent ones. We propose a way to improve the discriminative quality of gender-dependent features: the emotion recognition system is preceded by an automatic gender detection that decides which of two gender-dependent emotion classifiers is used to classify an utterance. This framework was tested on two different databases, one with emotional speech produced by actors and one with spontaneous emotional speech from a Wizard-of-Oz setting. Gender detection achieved an accuracy of about 90%, and the combined gender and emotion recognition system improved the overall recognition rate of a gender-independent emotion recognition system by 2-4%.
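The two-stage routing described above can be sketched like this; the pitch and energy thresholds are invented stand-ins for the paper's trained gender detector and emotion classifiers:

```python
def classify_emotion(features, detect_gender, classifiers):
    """First detect the speaker's gender, then apply the matching
    gender-dependent emotion classifier to the utterance features."""
    gender = detect_gender(features)
    return classifiers[gender](features)

# Toy stand-ins: mean pitch above ~165 Hz suggests a female speaker;
# each gender gets its own (here trivial) emotion decision rule.
detect_gender = lambda f: "female" if f["pitch_hz"] > 165 else "male"
classifiers = {
    "female": lambda f: "angry" if f["energy"] > 0.7 else "neutral",
    "male": lambda f: "angry" if f["energy"] > 0.6 else "neutral",
}
label = classify_emotion({"pitch_hz": 210, "energy": 0.8},
                         detect_gender, classifiers)
```

The design point is that each downstream classifier only ever sees utterances from one gender, so its features need not discriminate across genders.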

168 citations


Patent
15 Sep 2006
TL;DR: In this paper, an apparatus is provided for speech utterance verification that is configured to compare a first prosody component from recorded speech with a second prosody component from reference speech.
Abstract: An apparatus is provided for speech utterance verification. The apparatus is configured to compare a first prosody component from a recorded speech with a second prosody component for a reference speech. The apparatus determines a prosodic verification evaluation for the recorded speech utterance in dependence of the comparison.

167 citations


Patent
Hideki Hirakawa1, Tetsuro Chino1
15 Mar 2006
TL;DR: In this paper, an utterance relation determining unit is used to determine whether a second utterance, input after a first utterance, is a re-utterance of the whole of the first utterance or of a part of it.
Abstract: A speech recognition apparatus includes a generation unit configured to receive a speech utterance and to generate at least one recognition candidate associating to the speech utterance and a likelihood of the recognition candidate; a storing unit configured to store at least the one recognition candidate and the likelihood; a selecting unit configured to select one of at least the one recognition candidate as a recognition result of a first speech utterance based on the likelihood; an utterance relation determining unit configured to determine, when a first speech utterance and a second speech utterance are sequentially input, at least whether the second speech utterance which is input after the input of the first speech utterance is a speech re-utterance of a whole of the first speech utterance or a speech re-utterance of a part of the first speech utterance; a whole correcting unit configured to correct the recognition candidate of the whole of the first speech utterance based on the second speech utterance and to display the corrected recognition result when the utterance relation determining unit determines that the second speech utterance is the speech re-utterance of the whole of the first speech utterance; and a part correcting unit configured to correct the recognition candidate for the part of the first speech utterance, the part corresponding to the second speech utterance, based on the second speech utterance and to display the corrected recognition result when the utterance relation determining unit determines that the second speech utterance is the speech re-utterance of the part of the first speech utterance.
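One simple, hypothetical way to realize the utterance relation determining unit is to compare recognized word sequences: high whole-sequence similarity indicates a whole re-utterance, while high similarity to some contiguous window of the first utterance indicates a partial one. The threshold and the use of `difflib` are assumptions for illustration:

```python
from difflib import SequenceMatcher

def utterance_relation(first_words, second_words, threshold=0.8):
    """Classify the second utterance as a re-utterance of the whole
    first utterance, of a part of it, or as unrelated."""
    whole = SequenceMatcher(None, first_words, second_words).ratio()
    if whole >= threshold:
        return "whole"
    # Slide a window the size of the second utterance over the first.
    n = len(second_words)
    for i in range(len(first_words) - n + 1):
        window = first_words[i:i + n]
        if SequenceMatcher(None, window, second_words).ratio() >= threshold:
            return "part"
    return "unrelated"

rel = utterance_relation(["send", "mail", "to", "john"], ["to", "john"])
```

Once the relation is known, the apparatus corrects either the whole recognition candidate or just the matching part, as the abstract describes.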

Journal ArticleDOI
TL;DR: In the particular case where linguistic forms can become extinct, the presence of many speakers causes a two-stage relaxation: the first stage is a common marginal distribution that persists for a long time, as a consequence of ultimate extinction being due to rare fluctuations.
Abstract: We present a mathematical formulation of a theory of language change. The theory is evolutionary in nature and has close analogies with theories of population genetics. The mathematical structure we construct similarly has correspondences with the Fisher-Wright model of population genetics, but there are significant differences. The continuous time formulation of the model is expressed in terms of a Fokker-Planck equation. This equation is exactly soluble in the case of a single speaker and can be investigated analytically in the case of multiple speakers who communicate equally with all other speakers and give their utterances equal weight. Whilst the stationary properties of this system have much in common with the single-speaker case, time-dependent properties are richer. In the particular case where linguistic forms can become extinct, we find that the presence of many speakers causes a two-stage relaxation, the first being a common marginal distribution that persists for a long time as a consequence of ultimate extinction being due to rare fluctuations.
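For readers unfamiliar with the formalism, a Fisher-Wright-type Fokker-Planck equation for the probability density $P(x,t)$ of a variant's usage frequency $x$ takes the generic form below; the paper's exact drift and diffusion coefficients are not given in the abstract, so this is the standard template rather than the model's specific equation:

```latex
\frac{\partial P(x,t)}{\partial t}
  = -\frac{\partial}{\partial x}\bigl[a(x)\,P(x,t)\bigr]
    + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl[b(x)\,P(x,t)\bigr],
\qquad b(x) \propto x(1-x)
```

Here $a(x)$ is the drift from biased copying between speakers, and the diffusion coefficient $b(x)$ vanishes at $x = 0$ and $x = 1$, the absorbing boundaries at which a linguistic form goes extinct or becomes the sole variant, which is why extinction in the multi-speaker case is driven by rare fluctuations.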

Journal ArticleDOI
TL;DR: An empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor and one with a computer tutor, shows significant improvements in prediction accuracy over relevant baselines.

Journal ArticleDOI
TL;DR: Results obtained with Spanish and Italian listeners suggest that prosody is important in identifying Spanish-accented Italian and Italian-accented Spanish.
Abstract: The general goal of this study was to expand our understanding of what is meant by 'foreign accent'. More specifically, it deals with the role of prosody (timing and melody), which has rarely been examined. New technologies, including diphone speech synthesis (experiment 1) and speech manipulation (experiment 2), are used to study the relative importance of prosody in what is perceived as a foreign accent. The methodology we propose, based on the prosody transplantation paradigm, can be applied to different languages or language varieties. Here, it is applied to Spanish and Italian. We built up a dozen sentences which are spoken in almost the same way in both languages (e.g. ha visto la casa del presidente americano 'you/(s)he saw the American president's house'). Spanish/Italian monolinguals and bilinguals were recorded. We then studied what is perceived when the segmental specification of an utterance is combined with suprasegmental features belonging to a different language. Under these conditions, results obtained with Spanish and Italian listeners suggest that prosody is important in identifying Spanish-accented Italian and Italian-accented Spanish.

Patent
13 Jul 2006
TL;DR: In this paper, it is assumed that if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the incorrectly recognized portion in a subsequent utterance in order to correct the error.
Abstract: In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance.
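A minimal sketch of the evidence-combination idea, assuming (as a naive version would) that the two utterances are conditionally independent given the intended word, so per-candidate log-likelihoods simply add; the words and scores are invented:

```python
def combine_evidence(first_scores, repeat_scores):
    """Sum per-candidate log-likelihoods from the first utterance and
    its repetition, then pick the jointly best candidate."""
    combined = {w: first_scores[w] + repeat_scores[w] for w in first_scores}
    return max(combined, key=combined.get)

first_scores = {"Austin": -2.0, "Boston": -1.8}   # first pass favors "Boston"
repeat_scores = {"Austin": -1.0, "Boston": -1.9}  # repetition favors "Austin"
best = combine_evidence(first_scores, repeat_scores)
```

Either utterance alone could be misrecognized, but pooling the acoustic evidence ("Austin": -3.0 vs. "Boston": -3.7) recovers the intended word, which is the effect the method relies on.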

Journal ArticleDOI
TL;DR: This article showed that there is no systematic relationship between cognitive effort and cognitive effects in metaphor comprehension, and they concluded that relevance theory need not make any general predictions about the effort needed to comprehend metaphors.
Abstract: This paper explores the trade-off between cognitive effort and cognitive effects during immediate metaphor comprehension. We specifically evaluate the fundamental claim of relevance theory that metaphor understanding, like all utterance interpretation, is constrained by the presumption of optimal relevance (Sperber and Wilson, 1995, p. 270): the ostensive stimulus is relevant enough for it to be worth the addressee’s effort to process it, and the ostensive stimulus is the most relevant one compatible with the communicator’s abilities and preferences. One important implication of optimal relevance is that listeners follow a path of least effort and stop processing at the first interpretation that satisfies their expectation of relevance. They do this by trying to minimize cognitive effort while maximizing cognitive effects. Some relevance theory scholars suggest that metaphors should require additional cognitive effort to be understood, and that in return they yield more cognitive effects than does literal speech. Others claim that metaphors may be understood quickly, as soon as people infer enough effects for the speaker’s utterance to meet their expectation of optimal relevance. Our analysis of the experimental evidence suggests that there is no systematic relationship between cognitive effort and cognitive effects in metaphor comprehension. We conclude that relevance theory need not make any general predictions about the effort needed to comprehend metaphors. Nevertheless, relevance theory is consistent with many of the findings in psycholinguistics on metaphor understanding, and can account for aspects of metaphor understanding that no other theory can explain.


Journal ArticleDOI
TL;DR: The author considers four criteria for distinguishing what is said from what is merely meant, and argues that, when properly understood, they support the traditional classification of metaphor as merely meant rather than as 'what is said'.
Abstract: On a familiar and prima facie plausible view of metaphor, speakers who speak metaphorically say one thing in order to mean another. A variety of theorists have recently challenged this view; they offer criteria for distinguishing what is said from what is merely meant, and argue that these support classifying metaphor within 'what is said'. I consider four such criteria, and argue that when properly understood, they support the traditional classification instead. I conclude by sketching how we might extract a workable notion of 'what is said' from ordinary intuitions about saying. 1. Contextualism and 'What is Said'. Metaphor is a deeply context-sensitive linguistic phenomenon. In the right context, nearly any term or sentence can be used metaphorically, and can be used to express a wide variety of contents. The standard way to accommodate this broad variability is to treat metaphor as a form of speaker meaning, on which speakers intentionally say one thing in order to communicate something different (Grice, 1975; Searle, 1979; Martinich, 1984). On this view, by uttering: (1) Bill is a mouse, Alice knowingly says something which, if she meant it, would commit her to the claim that Bill is a small rodent. Normally, she won't intend to be taken as committing herself to such an absurdly false claim; her hearers realize this, and interpret her metaphorically instead. The particular assumptions she intends her hearers to employ in determining her metaphorical content can vary considerably across different conversational contexts, producing a wide variety of possible metaphorical meanings. This broadly Gricean model is both intuitively plausible and theoretically satisfying. It nicely subsumes metaphor within a larger theory of communication, on which speakers intentionally exploit shared conversational presuppositions in order to communicate efficiently.
And it accomplishes this while allowing us to retain an attractive view of the relation between speaker and sentence meaning. On Grice's own preferred way of thinking, 'what is said' by an utterance of a

Patent
08 Sep 2006
TL;DR: In this paper, a speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results, and the choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance.
Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.
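The threshold-selection component can be sketched as follows; the utterance feature and the numeric thresholds are hypothetical, since the patent leaves both open:

```python
def accept_result(confidence, utterance_features, thresholds):
    """Accept or reject a recognition result using a threshold chosen
    from features of the input utterance, not a single global value."""
    # Hypothetical rule: short utterances are easier to confuse, so
    # they must clear a stricter confidence bar.
    key = "short" if utterance_features["n_frames"] < 50 else "long"
    return confidence >= thresholds[key]

thresholds = {"short": 0.85, "long": 0.60}
long_ok = accept_result(0.72, {"n_frames": 120}, thresholds)
short_ok = accept_result(0.72, {"n_frames": 30}, thresholds)
```

The same confidence score (0.72) is accepted for the long utterance but rejected for the short one, which is the quality gain multiple thresholds provide over a single global cutoff.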

Journal ArticleDOI
TL;DR: In this paper, the authors present a detailed phonetic and pragmatic analysis of a particular kind of self-repetition, exemplified by phrases such as "have another go tomorrow" and "it might do, it might do", which are used to close sequences of talk.

Proceedings Article
01 Jan 2006
TL;DR: Measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied are presented.
Abstract: In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterance ...

Journal ArticleDOI
TL;DR: The authors studied how utterances opposing another position in an argument are constructed with a simultaneous orientation to the detailed structure of the prior utterance being opposed and the future trajectories of action projected by that utterance, which the current utterance attempts to counter and intercept.
Abstract: Analysis focuses on how utterances opposing another position in an argument are constructed with a simultaneous orientation to (a) the detailed structure of the prior utterance being opposed and (b) the future trajectories of action projected by that utterance, which the current utterance attempts to counter and intercept. Through such practices participants treat each other as cognitively complex, reflexive actors who are reshaping a contested, consequential social landscape through the choices they make as they build each next action. Data is drawn from a dispute between a father and his son who is just entering adolescence.

Book ChapterDOI
01 Jan 2006
TL;DR: Language production is logically divided into three major steps: deciding what to express (conceptualization), determining how to express it (formulation), and expressing it (articulation).
Abstract: Publisher Summary: Language production is logically divided into three major steps: deciding what to express (conceptualization), determining how to express it (formulation), and expressing it (articulation). Although achieving goals in conversation, structuring narratives, and modulating the ebb and flow of dialogue are inherently important to understanding how people speak, psycholinguistic studies of language production have primarily focused on the formulation of single, isolated utterances. An utterance consists of one or more words, spoken together under a single intonational contour or expressing a single idea. The simplest meaningful utterance consists of a single word. Generating a word begins with specifying its semantic and pragmatic properties—that is, a speaker decides upon an intention or some content to express (e.g., a desired outcome or an observation) and encodes the situational constraints on how the content may be expressed. The next major stage is formulation, which in turn is divided into a word selection stage and a sound processing stage. Sound processing, in contrast, involves constructing the phonological form of a selected word by retrieving its individual sounds, organizing them into stressed and unstressed syllables, and then specifying the motor programs to realize those syllables. The final process is articulation—that is, the execution of motor programs to pronounce the sounds of a word.

Proceedings Article
01 Apr 2006
TL;DR: The authors first investigate how well the addressee of a dialogue act can be predicted based on gaze, utterance, and conversational context features, and then whether information about meeting context can aid classifier performance.
Abstract: We present results on addressee identification in four-participants face-to-face meetings using Bayesian Network and Naive Bayes classifiers. First, we investigate how well the addressee of a dialogue act can be predicted based on gaze, utterance and conversational context features. Then, we explore whether information about meeting context can aid classifiers’ performances. Both classifiers perform the best when conversational context and utterance features are combined with speaker’s gaze information. The classifiers show little gain from information about meeting context.
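A Naive Bayes addressee classifier of the kind described can be sketched with invented probabilities; the real models are trained on meeting-corpus counts, so every number below is a hypothetical stand-in:

```python
import math

def naive_bayes_addressee(features, priors, likelihoods):
    """Score each candidate addressee by its log prior plus the summed
    log likelihoods of the observed feature values (Naive Bayes)."""
    scores = {}
    for who in priors:
        scores[who] = math.log(priors[who]) + sum(
            math.log(likelihoods[who][f][v]) for f, v in features.items())
    return max(scores, key=scores.get)

# Invented numbers: the speaker gazes at participant P2 while asking
# a question, which favors "P2" over "addressed to the group".
priors = {"P2": 0.3, "group": 0.7}
likelihoods = {
    "P2": {"gaze": {"P2": 0.8, "none": 0.2},
           "act": {"question": 0.6, "statement": 0.4}},
    "group": {"gaze": {"P2": 0.3, "none": 0.7},
              "act": {"question": 0.4, "statement": 0.6}},
}
addressee = naive_bayes_addressee({"gaze": "P2", "act": "question"},
                                  priors, likelihoods)
```

Here the gaze evidence overrides the prior toward group addressing, mirroring the paper's finding that classifiers perform best when context and utterance features are combined with the speaker's gaze.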

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter reviews the syntax in language production and describes how the expressive power of language is enhanced immeasurably by the ability to create meanings compositionally by putting words together.
Abstract: Publisher Summary This chapter reviews the syntax in language production. Syntax is an interface between meaning and sound. A word such as cat has a particular meaning, but the expressive power of language is enhanced immeasurably by the ability to create meanings compositionally by putting words together—for example, the ability to say not a cat or that's my cat. Models of production instantiate this basic architecture fairly and transparently. The process of speaking begins with a message-level representation that captures the idea that the speaker wishes to convey. This message becomes sound at the other end of the model at a stage called “phonological encoding.” Linking the message and phonological levels are two stages of syntactic processing (or grammatical encoding as it is called in the model): one called “functional processing” and the other called “positional processing.” The final component of syntactic processing takes place at the positional level that operates on the functional-level representation. At this point, serial order is imposed on the utterance.

Journal ArticleDOI
TL;DR: One view of how speakers plan and produce utterances is outlined, the literature on age-related changes in production is summarized, an overview of the published research on speakers' gaze during picture description is presented, and a study using eye movement monitoring to explore age- related changes in language production is recapped.

Journal ArticleDOI
01 Oct 2006-Lingua
TL;DR: The authors examined the difference between these approaches in the light of adverbial parenthetical clauses whose relationship with their hosts depends on pragmatically constrained inference, and showed how such examples underline two very different conceptions of the distinction between grammar and pragmatics.

Journal ArticleDOI
TL;DR: The authors argue for a more expansive engagement with multimodality, a view already signaled in the theories of Goffman, Clark, Hanks, and Irvine, and propose shifting the unit of analysis from linguistic or discourse representation to semiotic remediation practices, a notion that attends to the ways that humans' and nonhumans' semiotic performances (historical and imagined) are re-represented and reused across modes, media, and chains of activity.
Abstract: Discussions of reported speech have increasingly attended to mode, both the mode of the utterance represented and the mode of delivery. In this article, we argue for a more expansive engagement with multimodality, a view already signaled in the theories of Goffman, Clark, Hanks, and Irvine. We first propose shifting the unit of analysis from linguistic or discourse representation to semiotic remediation practices, a notion that attends to the diverse ways that humans' and nonhumans' semiotic performances (historical and imagined) are re-represented and reused across modes, media, and chains of activity. We then turn to three examples—a family pretend game, a college composition course task, and a comedy skit—that illustrate how semiotic remediation operates in concretely situated and culturally mediated practices. We conclude by suggesting that this notion of semiotic remediation will assist a fuller understanding of reported speech as discourse practice, that dialogic views of reported speech may in turn contribute to explorations of multimodality, and that attention to semiotic remediation is central to understanding the work of communication and culture.

Journal ArticleDOI
TL;DR: The problem of expressing paralinguistic information in conversational speech, the complexity of which may be beyond the capabilities of many current synthesis methods, may be solved by the use of phrase-sized utterance units taken intact from a large corpus.
Abstract: This paper reports progress in the synthesis of conversational speech, from the viewpoint of work carried out on the analysis of a very large corpus of expressive speech in normal everyday situations. With recent developments in concatenative techniques, speech synthesis has overcome the barrier of realistically portraying extra-linguistic information by using the actual voice of a recognizable person as a source for units, combined with minimal use of signal processing. However, the technology still faces the problem of expressing paralinguistic information, i.e., the variety in the types of speech and laughter that a person might use in everyday social interactions. Paralinguistic modification of an utterance portrays the speaker's affective states and shows his or her relationships with the speaker through variations in the manner of speaking, by means of prosody and voice quality. These inflections are carried on the propositional content of an utterance, and can perhaps be modeled by rule, but they are also expressed through nonverbal utterances, the complexity of which may be beyond the capabilities of many current synthesis methods. We suggest that this problem may be solved by the use of phrase-sized utterance units taken intact from a large corpus

01 Jan 2006
TL;DR: This work relates the theory of presupposition accommodation to a computational framework for reasoning in conversation, and fleshes out two key principles: that interpretation is a form of intention recognition; and that intentions are complex informational structures which specify commitments to conditions and to outcomes as well as to actions.
Abstract: We relate the theory of presupposition accommodation to a computational framework for reasoning in conversation. We understand presuppositions as private commitments the speaker makes in using an utterance but expects the listener to recognize based on mutual information. On this understanding, the conversation can move forward not just through the positive effects of interlocutors’ utterances but also from the retrospective insight interlocutors gain about one anothers’ mental states from observing what they do. Our title, ENLIGHTENED UPDATE, highlights such cases. Our approach fleshes out two key principles: that interpretation is a form of intention recognition; and that intentions are complex informational structures, which specify commitments to conditions and to outcomes as well as to actions. We present a formalization and implementation of these principles for a simple conversational agent, and draw on this case study to argue that pragmatic reasoning is holistic in character, continuous with common-sense reasoning about collaborative activities, and most effectively characterized by associating specific, reliable interpretive constraints directly with grammatical forms. In showing how to make such claims precise and to develop theories that respect them, we illustrate the general place of computation in the cognitive science of language.