
Showing papers on "Utterance published in 2009"


Proceedings ArticleDOI
02 Aug 2009
TL;DR: A generative model is presented that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state and generalizes across three domains of increasing difficulty.
Abstract: A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. We show that our model generalizes across three domains of increasing difficulty---Robocup sportscasting, weather forecasts (a new domain), and NFL recaps.
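The joint segmentation-and-alignment idea can be sketched as a small dynamic program. Everything below (the keyword-overlap score, the records, the token stream) is an invented toy stand-in, not the paper's actual generative model:

```python
# Toy sketch: jointly segment a token stream into utterances and assign each
# segment a grounded world-state record, Viterbi-style. The keyword-overlap
# score is a stand-in for a learned emission model.

def segment_and_align(tokens, records, max_len=4):
    """best[i] = (score, segmentation) for tokens[:i]."""
    def score(segment, record):
        # compatibility: overlap between segment words and the record's keywords
        return len(set(segment) & records[record])

    best = {0: (0, [])}
    for i in range(1, len(tokens) + 1):
        cands = []
        for j in range(max(0, i - max_len), i):
            seg = tokens[j:i]
            r, s = max(((r, score(seg, r)) for r in records), key=lambda x: x[1])
            cands.append((best[j][0] + s, best[j][1] + [(tuple(seg), r)]))
        best[i] = max(cands)
    return best[len(tokens)]

# invented sportscast-like records and text
records = {
    "pass(pink3, pink7)": {"pink3", "passes", "pink7"},
    "kick(purple5)":      {"purple5", "kicks"},
}
tokens = "pink3 passes to pink7 purple5 kicks".split()
score, segmentation = segment_and_align(tokens, records)
```

The DP considers every segment boundary and greedily picks the best record per segment; the real model instead learns the alignment distributions jointly with segmentation.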

348 citations


Proceedings ArticleDOI
12 May 2009
TL;DR: An integrated robotic architecture is described that achieves the above steps by translating natural language instructions incrementally and simultaneously into formal logical goal description and action languages, which can be used both to reason about the achievability of a goal and to generate new action scripts to pursue it.
Abstract: Robots that can be given instructions in spoken language need to be able to parse a natural language utterance quickly, determine its meaning, generate a goal representation from it, check whether the new goal conflicts with existing goals, and, if acceptable, produce an action sequence to achieve the new goal (ideally remaining sensitive to the existing goals). In this paper, we describe an integrated robotic architecture that achieves the above steps by translating natural language instructions incrementally and simultaneously into formal logical goal description and action languages, which can be used both to reason about the achievability of a goal and to generate new action scripts to pursue it. We demonstrate the implementation of our approach on a robot taking spoken natural language instructions in an office environment.

199 citations


Journal ArticleDOI
TL;DR: This paper analyzed chat logs and other artifacts of a virtual world, Quest Atlantis (QA), and proposed the concept of Negotiation for Action (NfA) to explain how interaction, specifically, avatar-embodied collaboration between native English speakers and nonnative English speakers, provided resources for English language acquisition.
Abstract: This study analyzes the user chat logs and other artifacts of a virtual world, Quest Atlantis (QA), and proposes the concept of Negotiation for Action (NfA) to explain how interaction, specifically, avatar-embodied collaboration between native English speakers and nonnative English speakers, provided resources for English language acquisition. Iterative multilayered analyses revealed several affordances of QA for language acquisition at both utterance and discourse levels. Through intercultural collaboration on solving content-based problems, participants successfully reached quest goals, during which emergent identity formation and meaning making took place. The study also demonstrates that it is in this intercultural interaction that pragmatics, syntax, semantics, and discourse practices arose and were enacted. The findings are consistent with our ecological psychology framework, in that meaning emerges when language is used to coordinate in-the-moment actions.

194 citations



Book
17 Sep 2009
TL;DR: The authors examined non-verbal behaviours from a pragmatic perspective, and provided the analytical basis to answer some important questions: How are non-verbal behaviours interpreted? What do they convey? How can they best be accommodated within a theory of utterance interpretation?
Abstract: The way we say the words we say helps us convey our intended meanings. Indeed, the tone of voice we use, the facial expressions and bodily gestures we adopt while we are talking, often add entirely new layers of meaning to those words. How the natural non-verbal properties of utterances interact with linguistic ones is a question that is often largely ignored. This book redresses the balance, providing a unique examination of non-verbal behaviours from a pragmatic perspective. It charts a point of contact between pragmatics, linguistics, philosophy, cognitive science, ethology and psychology, and provides the analytical basis to answer some important questions: How are non-verbal behaviours interpreted? What do they convey? How can they be best accommodated within a theory of utterance interpretation?

146 citations


Journal ArticleDOI
01 Dec 2009
TL;DR: This work presents a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input, suitable for animating characters from live human speech.
Abstract: Human communication involves not only speech, but also a wide variety of gestures and body motions. Interactions in virtual environments often lack this multi-modal aspect of communication. We present a method for automatically synthesizing body language animations directly from the participants' speech signals, without the need for additional input. Our system generates appropriate body language animations by selecting segments from motion capture data of real people in conversation. The synthesis can be performed progressively, with no advance knowledge of the utterance, making the system suitable for animating characters from live human speech. The selection is driven by a hidden Markov model and uses prosody-based features extracted from speech. The training phase is fully automatic and does not require hand-labeling of input data, and the synthesis phase is efficient enough to run in real time on live microphone input. User studies confirm that our method is able to produce realistic and compelling body language.
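The HMM-driven selection step can be illustrated with a toy two-state model. The states, transition and emission tables, and quantized prosody observations below are all invented for illustration; the actual system trains its model automatically from motion-capture and speech data:

```python
# Toy HMM: hidden states are motion-segment classes, observations are
# quantized prosody features. Viterbi decoding picks the motion sequence.
import math

states = ["rest", "beat"]
trans = {"rest": {"rest": 0.7, "beat": 0.3},
         "beat": {"rest": 0.4, "beat": 0.6}}
emit = {"rest": {"low": 0.8, "high": 0.2},
        "beat": {"low": 0.3, "high": 0.7}}
start = {"rest": 0.6, "beat": 0.4}

def viterbi(obs):
    """Return the most likely motion-state sequence for a prosody stream."""
    path = {s: ([s], math.log(start[s]) + math.log(emit[s][obs[0]]))
            for s in states}
    for o in obs[1:]:
        path = {s: max(((p + [s], lp + math.log(trans[p[-1]][s]) + math.log(emit[s][o]))
                        for p, lp in path.values()), key=lambda x: x[1])
                for s in states}
    return max(path.values(), key=lambda x: x[1])[0]

# high-energy prosody should pull the character into beat gestures
motion = viterbi(["low", "low", "high", "high", "low"])
```

In the real system the hidden states index actual motion-capture segments, so the decoded state sequence directly drives the animation.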

140 citations


Journal ArticleDOI
TL;DR: The data reveal the formulas, here operationalized as recurring multiword expressions, to be situated in recurring usage events, suggesting the need for a fine-tuning of the UBL theory for the purposes of SLA research towards a more locally contextualized theory of language acquisition and use.
Abstract: The general aim of this article is to discuss the application of Usage-Based Linguistics (UBL) to an investigation of developmental issues in second language acquisition (SLA). Particularly, the aim is to discuss the relevance for SLA of the UBL suggestion that language learning is item-based, going from formulas via low-scope patterns to fully abstract constructions. This paper examines how well this suggested path of acquisition serves ‘as a default in guiding the investigation of the ways in which exemplars and their type and token frequencies determine the second language acquisition of structure’ (N. Ellis 2002: 170). As such, it builds on and further discusses the findings in Bardovi-Harlig (2002) and Eskildsen and Cadierno (2007). The empirical point of departure is longitudinal oral second language classroom interaction and the focal point is the use of can by one student in the class in question. The data reveal the formulas, here operationalized as recurring multiword expressions, to be situated in recurring usage events, suggesting the need for a fine-tuning of the UBL theory for the purposes of SLA research towards a more locally contextualized theory of language acquisition and use. The data also suggest that semi-fixed linguistic patterns, here operationalized as utterance schemas, deserve a prominent place in L2 developmental research.

133 citations


Journal ArticleDOI
19 Nov 2009-Infancy
TL;DR: 7- and 8-month-old infants' long-term memory for words was assessed: word recognition over the long term was successful for words introduced in IDS, but not for those introduced in ADS, regardless of the register in which recognition stimuli were produced.
Abstract: When addressing infants, many adults adopt a particular type of speech, known as infant-directed speech (IDS). IDS is characterized by exaggerated intonation, as well as reduced speech rate, shorter utterance duration, and grammatical simplification. It is commonly asserted that IDS serves in part to facilitate language learning. Although intuitively appealing, direct empirical tests of this claim are surprisingly scarce. Additionally, studies that have examined associations between IDS and language learning have measured learning within a single laboratory session rather than the type of long-term storage of information necessary for word learning. In this study, 7- and 8-month-old infants' long-term memory for words was assessed when words were spoken in IDS and adult-directed speech (ADS). Word recognition over the long term was successful for words introduced in IDS, but not for those introduced in ADS, regardless of the register in which recognition stimuli were produced. Findings are discussed in the context of the influence of particular input styles on emergent word knowledge in prelexical infants.

123 citations


Book ChapterDOI
01 Apr 2009
TL;DR: Winsler, Fernyhough, McClaren, and Way explored the role of language in the development of children's executive function or self-regulation, and found that children's private speech provides an empirical window for exploring many interesting questions about mind, behavior, and language.
Abstract: Developmental relations between thought, language, and behavior have proved to be perennially interesting to psychologists, cognitive scientists, and philosophers (Nelson, 1996; Pinker, 1994; Vygotsky, 1934/1987). To what extent is language separate from thinking? How does language development influence cognitive development? To what extent is language development dependent upon cognitive growth? How is language used by children as a tool for guiding one's thinking, behavior, or problem solving? One phenomenon that falls at the intersection of many such discussions is children's private speech – children's overt and sometimes partially covert (whispered) self-talk while they are working on something or playing. Children's private speech provides an empirical window for exploring many interesting questions about mind, behavior, and language, especially those having to do with language serving a role in the development of children's executive function or self-regulation. Private speech is typically defined as overt, audible speech that is not addressed to another person (Winsler, Fernyhough, McClaren, & Way, 2004). Inner speech, on the other hand, refers to fully internal, silent verbal thought – that is, speech fully inside one's head. Research on children's private speech, largely that which originated from within the Vygotskian theoretical tradition, has been summarized and reviewed before on two occasions – first, in Zivin's (1979a) volume entitled The Development of Self-Regulation Through Private Speech (Zivin, 1979b), and then 13 years later in Diaz and Berk's (1992) volume, entitled Private Speech: From Social Interaction to Self-Regulation (Berk, 1992). Since then, however, research on private speech and self-talk has blossomed.

122 citations


Journal ArticleDOI
TL;DR: The authors argue that the basis of deixis is not the spatial contiguity of the referent, but rather the access (perceptual, cognitive, social) that participants have to the referent.

110 citations


Book ChapterDOI
30 Mar 2009
TL;DR: A new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning given noisy feedback from the current generation context, is presented and evaluated; the learned policy significantly outperforms all prior approaches.
Abstract: We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). We study its use in a standard NLG problem: how to present information (in this case a set of search results) to users, given the complex trade-offs between utterance length, amount of information conveyed, and cognitive load. We set these trade-offs by analysing existing match data. We then train an NLG policy using Reinforcement Learning (RL), which adapts its behaviour to noisy feedback from the current generation context. This policy is compared to several baselines derived from previous work in this area. The learned policy significantly outperforms all the prior approaches.
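The trade-off such a policy learns can be mimicked with a minimal bandit-style sketch. The paper trains a full RL policy over richer generation states; the action set, reward shape, and constants here are invented for illustration:

```python
# Toy sketch: learn how many search results to present, trading off
# information conveyed (grows like sqrt(n)) against cognitive load
# (grows linearly), from a noisy reward signal.
import random

random.seed(0)
actions = [1, 4, 9]            # number of results to mention
q = {a: 0.0 for a in actions}  # estimated value of each choice

def reward(n):
    # invented reward: sublinear information gain minus linear load, plus noise
    return 2.0 * n ** 0.5 - 0.5 * n + random.gauss(0, 0.1)

for step in range(2000):
    # epsilon-greedy action selection
    a = random.choice(actions) if random.random() < 0.2 else max(q, key=q.get)
    q[a] += 0.1 * (reward(a) - q[a])   # running-average update

best = max(q, key=q.get)   # the policy settles on a middle-length utterance
```

Under this invented reward, presenting 4 results beats both the terse and the exhaustive options, which is the qualitative shape of the trade-off the abstract describes.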

Proceedings Article
10 May 2009
TL;DR: A novel interaction technique based solely on emotional speech recognition is introduced, which allows the user to take part in dialogue with virtual actors without any constraints on style or expressivity, by mapping the recognised emotional categories to narrative situations and virtual characters' feelings.
Abstract: In most Interactive Storytelling systems, user interaction is based on natural language communication with virtual agents, either through isolated utterances or through dialogue. Natural language communication is also an essential element of interactive narratives in which the user is supposed to impersonate one of the story's characters. Whilst techniques for narrative generation and agent behaviour have made significant progress in recent years, natural language processing remains a bottleneck hampering the scalability of Interactive Storytelling systems. In this paper, we introduce a novel interaction technique based solely on emotional speech recognition. It allows the user to take part in dialogue with virtual actors without any constraints on style or expressivity, by mapping the recognised emotional categories to narrative situations and virtual characters' feelings. Our Interactive Storytelling system uses an emotional planner to drive characters' behaviours. The main feature of this approach is that characters' feelings are part of the planning domain and are at the heart of narrative representations. The emotional speech recogniser analyses the speech signal to produce a variety of features which can be used to define ad-hoc categories on which to train the system. The content of our interactive narrative is an adaptation of one chapter of the nineteenth-century classic novel Madame Bovary, which is well suited to a formalisation in terms of characters' feelings. At various stages of the narrative, the user can address the main character or respond to her, impersonating her lover. The emotional category extracted from the user utterance can be analysed in terms of the current narrative context, which includes characters' beliefs, feelings and expectations, to produce a specific influence on the target character, which will become visible through a change in its behaviour, achieving a high level of realism for the interaction.
A limited number of emotional categories is sufficient to drive the narrative across multiple courses of action, since the narrative comprises over thirty narrative functions. We report results from a fully implemented prototype, both in terms of proof of concept and of usability through a preliminary user study.

Journal ArticleDOI
TL;DR: It is suggested that the scalar inference is endorsed less often in face-threatening contexts, i.e., when X implies a loss of face for the listener, and that an underinformative utterance is also seen as a nice and polite thing to do when X threatens the face of the listener.

Patent
09 Nov 2009
TL;DR: In this paper, a method for presenting additional content for a word that is part of a message, and that is presented by a mobile communication device, includes the steps of: presenting the message, including emphasizing one or more words for which respective additional content is available for presenting by the mobile communication devices; receiving an utterance that includes an emphasized word for which additional content was available to be presented by the device.
Abstract: A method for presenting additional content for a word that is part of a message, and that is presented by a mobile communication device, includes the steps of: presenting the message, including emphasizing one or more words for which respective additional content is available for presenting by the mobile communication device; receiving an utterance that includes an emphasized word for which additional content is available for presenting by the mobile communication device; and presenting the additional content for the emphasized word included in the utterance received by the mobile communication device. These steps are performed by the mobile communication device.

Journal ArticleDOI
TL;DR: The authors proposed an alternative account of manipulation couched in the relevance-theoretic framework which treats manipulation as a two-step communicative attempt at misleading the context-selection process when interpreting a target utterance.
Abstract: Manipulative discourse has attracted a lot of attention in various adjacent domains of linguistic research, notably in rhetoric, argumentation theory, philosophy of language, discourse analysis, pragmatics, among others. We start with a review of the existing definitions provided in these fields and highlight some of the difficulties they encounter. In particular, we argue that there is still a need for an analytic model that makes predictions about manipulative discourse. We propose an alternative account of manipulation couched in the relevance-theoretic framework which treats manipulation as a two-step communicative attempt at misleading the context-selection process when interpreting a target utterance. We argue further that such attempts systematically exploit the inherent weaknesses or flaws of the human cognitive system that are amply discussed in cognitive psychology under the heading of “cognitive illusions”. We claim that such a model correctly captures classical instances of manipulative discourse which fall outside the scope of other accounts.

Patent
John Nicholas Gross1
15 Jun 2009
TL;DR: In this article, an audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice, based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.

Patent
24 Mar 2009
TL;DR: In this paper, an apparatus for post-processing conversation errors by using the multi-level verification in a voice conversation system and a method therefor are provided to recognize various conversation errors which can be generated in the conversation system, through the verification of multilevel type.
Abstract: An apparatus for post-processing conversation errors using multi-level verification in a voice conversation system, and a method therefor, are provided to recognize the various conversation errors that can arise in a conversation system through multi-level verification. A voice recognition part (50) extracts the feature vector of a voice signal and performs voice recognition. A language analysis part (120) linguistically analyzes the user's utterance and outputs the language analysis result. A conversation analysis part (130) grasps the detailed meaning of the user's utterance based on the previous utterance and outputs the conversation analysis result. A conversation analysis and management part (140) analyzes the meaning of the user's utterance by referring to the flow of the whole conversation and outputs the analyzed result.

Proceedings ArticleDOI
09 Mar 2009
TL;DR: A model is proposed for a robot that generates route directions by integrating three crucial elements: utterances, gestures, and timing; experiments demonstrated the effectiveness of the approach not only for task efficiency but also for perceived naturalness.
Abstract: Providing route directions is a complicated interaction. Utterances are combined with gestures and pronounced with appropriate timing. This study proposes a model for a robot that generates route directions by integrating three crucial elements: utterances, gestures, and timing. Two research questions must be answered in this modeling process. First, is it useful to let the robot perform gestures even though the information conveyed by the gestures is also given by the utterances? Second, is it useful to implement the timing at which humans speak? Many previous studies about the natural behavior of computers and robots have learned from human speakers, for example in modeling gestures and speech timing. However, our approach is different from such previous studies: we emphasized the listener's perspective. Gestures were designed based on their usefulness, although we were influenced by the basic structure of human gestures. Timing was not based on how humans speak, but modeled from how they listen. The experimental result demonstrated the effectiveness of our approach, not only for task efficiency but also for perceived naturalness.

Proceedings ArticleDOI
11 Sep 2009
TL;DR: A method for determining when a system has reached a point of maximal understanding of an ongoing user utterance and a prototype implementation that shows how systems can use this ability to strategically initiate system completions of user utterances are shown.
Abstract: We investigate novel approaches to responsive overlap behaviors in dialogue systems, opening possibilities for systems to interrupt, acknowledge or complete a user's utterance while it is still in progress. Our specific contributions are a method for determining when a system has reached a point of maximal understanding of an ongoing user utterance, and a prototype implementation that shows how systems can use this ability to strategically initiate system completions of user utterances. More broadly, this framework facilitates the implementation of a range of overlap behaviors that are common in human dialogue, but have been largely absent in dialogue systems.
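One invented heuristic in the spirit of detecting a "point of maximal understanding": track the interpreter's confidence as words arrive and report the point after which confidence stops improving. This is not the paper's actual method, just an illustrative sketch with made-up scores:

```python
# Hypothetical sketch: as an utterance unfolds word by word, declare the
# point of maximal understanding once the interpreter's confidence has
# plateaued for `patience` consecutive words.

def point_of_maximal_understanding(word_scores, patience=2):
    """word_scores[i] = interpreter confidence after hearing words[:i+1].
    Returns the index of the last word that improved confidence."""
    best, best_i, stale = 0.0, 0, 0
    for i, s in enumerate(word_scores):
        if s > best:
            best, best_i, stale = s, i, 0
        else:
            stale += 1
            if stale >= patience:      # confidence has plateaued
                return best_i
    return best_i

# invented example: confidence climbs during "book a flight to boston",
# then plateaus, so the system could already initiate a completion
scores = [0.2, 0.45, 0.7, 0.9, 0.9, 0.88]
point = point_of_maximal_understanding(scores)
```

A dialogue system could use such a plateau signal to decide when a responsive overlap (acknowledgment or completion) is safe to launch while the user is still speaking.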

Journal ArticleDOI
TL;DR: In this paper, it is argued that the analysis of the notion of intentionality and the nature of pragmatic intrusion will settle the question of the cancellability of explicatures, and that the pragmatically conveyed elements of explicatures are not cancellable.
Abstract: An explicature is a combination of linguistically encoded and contextually inferred conceptual features. The smaller the relative contribution of the contextual features, the more explicit the explicature will be, and inversely. (Sperber and Wilson 1986: 182). The aim of this paper is to reflect on the necessity of pragmatic development of propositional forms and arrive at a better understanding of the level of meaning which Sperber and Wilson and Carston call 'explicature'. It is also argued that the pragmatically conveyed elements of explicatures are not cancellable—unlike conversational implicatures. While Capone (2003) addressed the issue of the cancellability of explicatures from a merely empirical point of view, in this paper a number of important theoretical questions are raised and discussed. In particular it is proposed that the analysis of the notion of intentionality and the nature of pragmatic intrusion will settle the question of the cancellability of explicatures. An explicature can be considered a two-level entity. It consists of a logical form and a pragmatic increment that the logical form gives rise to in the context of an utterance. However, both the initial logical form and the pragmatic increment are the target of pragmatic processes. Consequently, we need a pragmatic process to promote the initial logical form to an intended interpretation and another pragmatic process to derive further increments starting from the initial logical form as promoted to an utterance interpretation.

01 Jan 2009
TL;DR: Roy et al. found significant correlations between input frequencies and age of acquisition for individual words; caregivers' utterance length, type-token ratio, and proportion of single-word utterances all show significant temporal relationships with the child's development, suggesting that caregivers tune their utterances to the linguistic ability of the child.
Abstract: Exploring Word Learning in a High-Density Longitudinal Corpus. Brandon C. Roy (bcroy@media.mit.edu) and Deb Roy (dkroy@media.mit.edu), The Media Laboratory, Massachusetts Institute of Technology; Michael C. Frank (mcfrank@mit.edu), Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology.

What is the role of the linguistic environment in children's early word learning? Here we provide a preliminary analysis of one child's linguistic development, using a portion of the high-density longitudinal data collected for the Human Speechome Project. We focus particularly on the development of the child's productive vocabulary from the age of 9 to 24 months and the relationship between the child's language development and the caregivers' speech. We find significant correlations between input frequencies and age of acquisition for individual words. In addition, caregivers' utterance length, type-token ratio, and proportion of single-word utterances all show significant temporal relationships with the child's development, suggesting that caregivers "tune" their utterances to the linguistic ability of the child. Keywords: Language acquisition; word learning; corpus data.

Language Development and the Environment

What is the role of the linguistic environment in a child's acquisition of language? In attempts to understand the nature of the mechanisms underlying language acquisition, input-uptake correlations have the potential to provide deep insight. If particular aspects of children's input are predictive of their later language development, such findings could powerfully illuminate the nature of children's language learning strategies and mechanisms and the relationship of linguistic knowledge and experience.

Systematic studies of child-directed speech (CDS) dating back to the late 1960s have established that CDS has special characteristics including shorter utterance lengths, exaggerated prosody, high redundancy, and referential content tied to immediate context (Snow & Ferguson, 1977). Initial investigations of the facilitative role of CDS focused primarily on development of syntax. Early findings were contradictory, however, and the overall picture remains mixed (Newport et al., 1977; Furrow et al., 1979; Pine, 1995). More recent studies of the role of the environment on lexical development proved to be clearer. For example, studies have shown that the total amount of CDS predicts children's vocabulary size and rate of growth (Huttenlocher et al., 1991; Hart & Risley, 1995) and the frequency of specific words within CDS predicts the age of acquisition of those words (Huttenlocher et al., 1991; Goodman et al., 2008).

Despite the quantity of work in this area, however, our overall understanding of the role of the environment in language development remains limited by the lack of appropriate observational data. Historically, most longitudinal studies of language development have relied on observations from just two or a few points in time, leading to difficulties in constructing a complete picture of the continuous developmental process. Driven by new technologies, the methodological landscape is now changing. Higher density longitudinal studies are emerging that provide valuable new perspectives on long-standing questions. For example, by analyzing 90-120 minute audio recordings of children's home environments every two weeks from 9-15 months of age, Brent & Siskind (2001) shed new light on the role of words spoken in isolation by showing that their presence in CDS predicted age of acquisition of those words. More recently, Lieven et al. (in press) recorded 28-30 hours of audio over a 6-week period in the homes of four toddlers, yielding 100,000+ word corpora of CDS and speech by children from each home. These data were used to trace the relationship between a child's utterances over time in support of a constructivist theory of grammar development.

Motivated by the goal of obtaining a more complete and naturalistic longitudinal record of child development—and establishing new tools and methods for replicating such efforts in the future—the Human Speechome Project (HSP) was launched with the aim of recording the first two to three years of one child's development at home in rich detail (Roy et al., 2006). This paper provides an overview of the HSP project and corpus, and the human-machine collaborative process for audio analysis. We then present an initial analysis on a subset of the audio portion of this corpus focusing on CDS and lexical development.

The Human Speechome Project

The goal of HSP is to study early language development through analysis of audio and video recordings of the first two to three years of one child's life. The home of the family of one of the authors (DR) with a newborn was outfitted with fourteen microphones and eleven omnidirectional cameras. Audio was recorded from ceiling-mounted boundary layer microphones at 16-bit resolution with a sampling rate of 48 kHz. Due to the unique acoustic properties of boundary layer microphones, most speech throughout the house, including very quiet speech, was captured with sufficient clarity to enable reliable transcription. Video was also recorded to capture non-linguistic context using high-resolution fisheye lens video cameras that provide a bird's-eye view of people, objects, and activity throughout the home. Recordings were made from birth to the child's third birthday with the highest density of recordings focused on the first
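The input-uptake correlation analysis described here can be sketched in a few lines: correlate each word's frequency in caregiver speech with its age of acquisition. The word list and numbers below are invented; the study itself uses the Speechome corpus:

```python
# Toy sketch of an input-uptake correlation: frequent words in caregiver
# speech should tend to have earlier (smaller) ages of acquisition,
# yielding a strong negative Pearson correlation. Data are invented.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# word: (caregiver frequency per 10k tokens, age of acquisition in months)
words = {"ball": (42, 13), "milk": (35, 14), "shoe": (18, 17),
         "moon": (9, 19), "tractor": (3, 22)}
freqs = [f for f, _ in words.values()]
aoa = [a for _, a in words.values()]
r = pearson(freqs, aoa)   # strongly negative on this toy data
```

A real analysis would also control for factors like word length and concreteness before interpreting such a correlation causally.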

Journal ArticleDOI
01 Jan 2009-Synthese
TL;DR: A view on indicative conditionals is set out and defended on which the proposition (semantically) expressed by an utterance of a conditional is a function of (among other things) the speaker's context and the assessor's context.
Abstract: I set out and defend a view on indicative conditionals that I call “indexical relativism”. The core of the view is that which proposition is (semantically) expressed by an utterance of a conditional is a function of (among other things) the speaker’s context and the assessor’s context. This implies a kind of relativism, namely that a single utterance may be correctly assessed as true by one assessor and false by another.

Book ChapterDOI
31 Jul 2009
TL;DR: This paper explored existing proposals regarding where in words speakers initiate repair (what they will call the "site" of initiation) using data from seven languages and presented and explained site of initiation data from those seven languages.
Abstract: Same-turn self-repair is the process by which speakers stop an utterance in progress and then abort, recast or redo that utterance. While same-turn self-repair has become a topic of great interest in the past decade, very little has been written on the question of where within a word speakers tend to initiate repair (the main exceptions being Schegloff 1979 and Jasperson 1998). And, to our knowledge, no work has been done on this question from a cross-linguistic perspective. The goal of this paper is twofold: first, we explore existing proposals regarding where in words speakers initiate repair (what we will call the "site" of initiation) using data from our seven languages; and, second, we present and explain site of initiation data from those seven languages. Our findings suggest that there is a great deal of cross-linguistic variation with respect to favored sites of initiation but that most of the variation can be accounted for by a few simple interactional factors. This paper is the first study we are aware of which considers word length in explanations of self-repair data. The current study is part of a larger project that has as its goal an understanding of the universal principles of self-repair and their language-specific manifestations. Prior studies have shown that the linguistic resources available to speakers of different languages shape the specific methods by which repair is accomplished (Fox et al. 1996; Wouk 2005; Fincke 1999; Egbert 2002; Sidnell 2007c; Karkainnen et al., 2007).

Proceedings Article
01 Jan 2009
TL;DR: A system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog, submitted to the Interspeech 2009 Emotion Challenge evaluation.
Abstract: This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children’s speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features as well as glottal excitation features are extracted automatically, reduced in dimensionality and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9% unweighted average recall on the training data and 67.6% on the test set. Index Terms: speech processing, meta-data extraction, emotion recognition, evaluation
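The late-fusion step described above can be sketched as a weighted average of the per-classifier posterior probabilities. The weights, threshold, and example scores below are illustrative placeholders, not the parameters of the submitted system:

```python
# Late fusion of per-utterance P(anger) scores from several subsystems
# (e.g. acoustic ANN, SVM, word-salience classifier), combined as a
# weighted average. Weights and threshold are hypothetical.

def late_fusion(scores, weights, threshold=0.5):
    """Combine per-classifier P(anger) scores into a single decision."""
    assert len(scores) == len(weights)
    fused = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return ("anger" if fused >= threshold else "non-anger", fused)

# Example: ANN says 0.8, SVM says 0.6, word-salience model says 0.3
label, score = late_fusion([0.8, 0.6, 0.3], weights=[0.4, 0.4, 0.2])
# label == "anger", score == 0.62
```

Soft fusion of this kind lets a confident acoustic classifier outvote a weak lexical cue while still using all available evidence.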

Journal ArticleDOI
TL;DR: This proposal is shown to account for a range of constraints on the felicitous use of yo, including its restriction to addressee-new and addRessee-relevant information in assertions, as well as its behaviour in imperatives and interrogatives.
Abstract: I provide an account of the Japanese sentence-final particle yo within a dynamic semantics framework. I argue that yo is used with one of two intonational morphemes, corresponding to sentence-final rising or falling tunes. These intonational morphemes modify a sentence's illocutionary force head, adding an addressee-directed update semantics to the utterance. The different intonational contours specify whether this update is monotonic or non-monotonic. The use of yo is then argued to contribute a pragmatic presupposition to the utterance saying that the post-update discourse context is one in which the addressee's contextual decision problem is resolved. This proposal is shown to account for a range of constraints on the felicitous use of yo, including its restriction to addressee-new and addressee-relevant information in assertions, as well as its behaviour in imperatives and interrogatives.

Book ChapterDOI
01 Jan 2009
TL;DR: The authors proposed a mesoscopic view of language, where linguistic entities such as the letters, words or phrases are the basic units and the grammar is an emergent property of the interactions among them.
Abstract: Human beings as a species are quite unique in the biological world, for they are the only organisms known to be capable of thinking, communicating and preserving a potentially infinite number of ideas that form the pillars of modern civilization. This unique ability is a consequence of the complex and powerful human languages characterized by their recursive syntax and compositional semantics [40]. It has been argued that language is a dynamic complex adaptive system that has evolved through the process of self-organization to serve the purpose of human communication needs [80]. The complexity of human languages has always attracted the attention of physicists, who have tried to explain several linguistic phenomena through models of physical systems (see e.g., [32, 42]). Like any physical system, a linguistic system (i.e., a language) can be viewed from three different perspectives [52]. On one extreme, a language is a collection of utterances that are produced by the speakers of a linguistic community during the course of their interactions with other speakers of the same community. This is analogous to the microscopic view of a thermodynamic system, where every utterance and its corresponding context contributes to the identity of the language, i.e., the grammar. On the other extreme, a language can be characterized by a set of grammar rules and a vocabulary. This is analogous to a macroscopic view. Sandwiched between these two extremes, one can also conceive of a mesoscopic view of language, where linguistic entities, such as the letters, words or phrases, are the basic units and the grammar is an emergent property of the interactions among them.

Journal ArticleDOI
TL;DR: The author proposes a more general procedural analysis of interjections, according to which the procedures they encode would not lead hearers to construct higher-level explicatures, but to access contextual material that is necessary for the correct interpretation of interjectional utterances.
Abstract: The current relevance-theoretic approach to interjections (Wharton 2000, 2001, 2003) analyses these as procedural elements that contribute to the recovery of the higher-level explicatures of utterances. This analysis seems to work satisfactorily for those emotive/expressive interjections accompanying another proposition or appended to another utterance. However, it does not seem to apply to interjections occurring alone, as independent utterances, and to the so-called type of conative/volitive interjections. For this reason, based on previous work on interjections and some relevance-theoretic postulates, this paper seeks to suggest a more general procedural analysis of interjections, according to which the procedures they encode would not lead hearers to construct higher-level explicatures, but to access contextual material that is necessary for the correct interpretation of interjectional utterances. As a result of such procedural meaning, interjections can be used with a rather precise informative intention, can be taken to communicate propositions and result in cognitive effects.

Patent
01 Oct 2009
TL;DR: In this paper, a conversational human-machine interface that includes conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message is presented.
Abstract: A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.
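As a rough illustration of how several context models might jointly disambiguate an utterance, one can imagine each model scoring candidate domains and the scores being summed. The model contents, keyword weights, and scoring scheme below are invented for illustration and are not taken from the patent:

```python
def infer_domain(keywords, models):
    """Sum keyword weights across several context models (e.g. general,
    environmental, personalized) and return the best-scoring domain
    for the utterance, or None if no model offers any evidence."""
    scores = {}
    for model in models:
        for domain, weights in model.items():
            scores[domain] = scores.get(domain, 0) + sum(
                weights.get(kw, 0) for kw in keywords
            )
    return max(scores, key=scores.get) if scores else None

# A personalized model can tip an ambiguous request toward the
# user's habitual domain (all weights here are hypothetical).
general = {"weather": {"front": 2}, "music": {"play": 1}}
personal = {"music": {"play": 3, "front": 1}}
infer_domain(["play", "front"], [general, personal])  # -> "music"
```

The point of the sketch is the architecture, not the numbers: prior, user-specific information is allowed to outweigh a generic interpretation of the same words.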

Patent
15 Jul 2009
TL;DR: In this article, a system and method of targeted tuning of a speech recognition system are disclosed, which includes detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold.
Abstract: A system and method of targeted tuning of a speech recognition system are disclosed. A particular method includes detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold. The method further includes tuning a speech recognition system with respect to the particular type of utterance.
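The detection step could look roughly like the following; the utterance-type labels and the relative-frequency threshold are made up for illustration:

```python
from collections import Counter

def find_tuning_candidates(utterance_types, threshold):
    """Return utterance types whose relative frequency of occurrence
    satisfies the threshold, flagging them for targeted tuning."""
    counts = Counter(utterance_types)
    total = len(utterance_types)
    return {t for t, n in counts.items() if n / total >= threshold}

# Example: "agent" requests dominate the recognizer's traffic
log = ["agent", "billing", "agent", "agent", "hours"]
find_tuning_candidates(log, threshold=0.5)  # -> {"agent"}
```

Once a frequent utterance type is identified this way, the recognizer's models for that type can be tuned selectively rather than retraining the whole system.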

Journal ArticleDOI
TL;DR: For instance, this paper argued that componential analysis can be used to reveal the meaning of a word in the native cognitive world, which is not necessarily a list of individual referent objects, nor connotational meaning (which, in an arbitrarily restricted domain, the semantic differential technique of Osgood and his associates aimed to reveal).
Abstract: What is the purpose of a componential analysis of a terminology? In initial formulations (see Goodenough 1956, and Lounsbury 1956) this purpose was stated clearly: it was a semantic analysis, an analysis of meaning; and, furthermore, not any and all kinds of meaning, but meaning of a particular kind. The kind of meaning which componential analysis aimed to expose was, first of all, intensional or definitional meaning: that is to say, the minimal information about the object to which a term referred, either sufficient to justify the utterance of the term in reference, or necessary to infer from its use. It did not aim to reveal extensional meaning (merely a listing of individual referent objects, such as particular people, episodes of illness, potatoes, etc.) nor connotational meaning (which, in an arbitrarily restricted domain, the semantic differential technique of Osgood and his associates aimed to reveal). Furthermore--and this restriction is central to the argument of this paper--the aim of componential analysis was to discover the intensional meaning of terms for their native users. In other words, it was supposed to make statements about concepts in the native's "cognitive world." The methodological promise of componential analysis, and a large part of the reason for its popularity, lay precisely in its claim to be a systematic, reliable technique for revealing what words mean to the people who use them, not merely in the domain of kinship, but in any other lexical domain with a taxonomic structure. In some recent papers (e.g. Burling 1964a, and Lounsbury 1964a) and in discussion among practitioners of the art, however, a drawing-in-of-horns has begun. The claim that a componential analysis represents a native speaker's cognitive world is now often avoided (see Burling 1964a, 1964b; Frake 1964; Hammel 1964; Hymes 1964; Lounsbury 1964a).
One reason for this shyness has been, I think, the traditional lack of interest of anthropologists in testing the validity of hypotheses by experimental research methods. One major intent of this paper is to demonstrate that the validity of a hypothesis about native cognitive worlds can be tested empirically. (For another attempt along these lines of empirical testing, see Romney and D'Andrade 1964.) Another reason for avoiding the psychological issue seems to be a reluctance to recognize the implications either of accepting or of rejecting an interest in native "cognition." Let us consider the implications of rejecting an interest in native cognition.