
Showing papers on "Utterance published in 1993"


Journal ArticleDOI
TL;DR: The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour.
Abstract: There has been considerable research into perceptible correlates of emotional state, but a very limited amount of the literature examines the acoustic correlates and other relevant aspects of emotion effects in human speech; in addition, the vocal emotion literature is almost totally separate from the main body of speech analysis literature. A discussion of the literature describing human vocal emotion, and its principal findings, are presented. The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour. These parameters are described both in general and in detail for a range of specific emotions. Current speech synthesizer technology is such that many of the parameters of human speech affected by emotion could be manipulated systematically in synthetic speech to produce a simulation of vocal emotion; application of the literature to construction of a system capable of producing synthetic speech with emotion is discussed.
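As a rough illustration of how the three parameter families identified here (voice quality, utterance timing, pitch contour) might be manipulated in a synthesizer, consider the sketch below. The parameter names and offset values are invented for illustration and are not taken from the survey.

```python
# Hypothetical synthesizer parameters covering the three families the survey
# identifies: timing (speech_rate), pitch contour (mean_f0_hz, f0_range),
# and voice quality (breathiness). Values are illustrative only.
BASE_PARAMS = {"speech_rate": 1.0, "mean_f0_hz": 120.0,
               "f0_range": 1.0, "breathiness": 0.1}

# Invented per-emotion offsets, not figures from the paper.
EMOTION_OFFSETS = {
    "anger":   {"speech_rate": 0.15, "mean_f0_hz": 20.0,
                "f0_range": 0.5, "breathiness": -0.05},
    "sadness": {"speech_rate": -0.20, "mean_f0_hz": -15.0,
                "f0_range": -0.4, "breathiness": 0.10},
}

def apply_emotion(base, emotion):
    """Return a copy of the synthesizer parameters shifted for an emotion.

    Unknown emotions fall back to the neutral (unshifted) parameters.
    """
    offsets = EMOTION_OFFSETS.get(emotion, {})
    return {k: v + offsets.get(k, 0.0) for k, v in base.items()}
```

A production system would of course shape the whole pitch contour over the utterance rather than shifting scalar parameters, but the dictionary-of-offsets structure captures the paper's claim that emotion effects can be expressed as systematic manipulations of synthesizer parameters.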

1,063 citations


Journal ArticleDOI
01 Jun 1993-Lingua
TL;DR: Wilson and Sperber as mentioned in this paper treat utterance interpretation as a two-phase process: a modular decoding phase is seen as providing input to a central inferential phase in which a linguistically encoded logical form is contextually enriched and used to construct a hypothesis about the speaker's informative intention.

601 citations


Journal ArticleDOI
TL;DR: Five reading-time experiments test predictions of Centering theory with respect to the conditions under which it is preferable to realize (refer to) an entity using a pronoun rather than a repeated definite description or name and provide evidence for the dissociation of the coherence processes of looking backward and looking forward.

580 citations


Journal ArticleDOI
TL;DR: It is shown that noisy feedback is unlikely to be necessary for language learning because if noisy feedback exists it is too weak, and internal mechanisms are necessary to account for the unlearning of ungrammatical utterances.

372 citations


Book
01 Jan 1993
TL;DR: This chapter discusses learner varieties and theoretical linguistics, covering the acquisition of temporality in second language acquisition and word-formation processes in learner varieties.
Abstract: Part I. Production: 1. Utterance structure 2. Word-formation processes 3. The acquisition of temporality 4. Reference to space in learner varieties Part II. 5. Ways of achieving understanding 6. Feedback in second language acquisition Part III. 7. Adult language acquisition 8. Conclusion: learner varieties and theoretical linguistics.

301 citations


PatentDOI
TL;DR: In this article, a speech recognition system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized, where each speech hypothesis is modeled with an acoustic model.
Abstract: A speech recognition system displays a source text of one or more words in a source language. The system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized. The utterance comprises a series of one or more words in a target language different from the source language. A set of one or more speech hypotheses, each comprising one or more words from the target language, are produced. Each speech hypothesis is modeled with an acoustic model. An acoustic match score for each speech hypothesis comprises an estimate of the closeness of a match between the acoustic model of the speech hypothesis and the sequence of coded representations of the utterance. A translation match score for each speech hypothesis comprises an estimate of the probability of occurrence of the speech hypothesis given the occurrence of the source text. A hypothesis score for each hypothesis comprises a combination of the acoustic match score and the translation match score. At least one word of one or more speech hypotheses having the best hypothesis scores is output as a recognition result.
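The scoring scheme described above can be sketched as follows. The additive log-domain combination and the tuple layout are assumptions for illustration, not the patent's exact formulation.

```python
def best_hypothesis(hypotheses):
    """Pick the recognition result from scored speech hypotheses.

    hypotheses: list of (words, acoustic_log_score, translation_log_score),
    where the acoustic score estimates the match between the hypothesis's
    acoustic model and the coded utterance, and the translation score
    estimates P(hypothesis | source text). The combined hypothesis score
    is their sum in the log domain; the best-scoring hypothesis wins.
    """
    return max(hypotheses, key=lambda h: h[1] + h[2])[0]
```

Note how the translation score lets the source text disambiguate acoustically similar target-language hypotheses, which is the patent's central idea.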

271 citations


PatentDOI
TL;DR: A children's speech training aid compares a child's speech with models of speech, stored as subword acoustic models, and a general speech model to give an indication of whether or not the child has spoken correctly.
Abstract: A children's speech training aid compares a child's speech with models of speech, stored as sub-word acoustic models, and a general speech model to give an indication of whether or not the child has spoken correctly. An indication of how well the word has been pronounced may also be given. An adult operator enters the word to be tested into the training aid, which then forms a model of that word from the stored sub-word speech models. The stored acoustic models are formed by first recording a plurality of words by a plurality of children from a given list of single words. These recordings are then processed off-line to give a basic acoustic model of an acceptable or correct sound for each phoneme in the context of the preceding and following phonemes. The acoustic models are Hidden Markov Models. The limits of acceptable pronunciation applied to different words and children may be adjusted by variable penalty values applied in association with the general speech acoustic model. The training aid generates accumulated word costs for each child's utterance and uses these costs to indicate correctness of pronunciation.
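A minimal sketch of the accept/reject decision implied above: the word model's accumulated cost is compared against the general speech model's cost plus an adjustable penalty. The function name and the cost convention (lower cost = better fit) are illustrative assumptions, not the patent's terminology.

```python
def pronounced_correctly(word_model_cost, general_model_cost, penalty=0.0):
    """Decide whether an utterance counts as a correct pronunciation.

    The utterance is accepted when the target-word HMM explains the speech
    at least as well as the general speech model, with `penalty` acting as
    the adjustable tolerance the patent associates with the general model:
    a larger penalty makes the aid more lenient.
    """
    return word_model_cost <= general_model_cost + penalty
```

Raising `penalty` for younger children, for example, would widen the band of pronunciations the aid accepts.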

82 citations


PatentDOI
TL;DR: A system and associated methods for recognizing (12) compound words from an utterance (22) containing a succession of one or more words from a predetermined vocabulary is described in this paper, where at least one of the words in the utterance is a compound word including at least two formatives in succession.
Abstract: A system and associated methods for recognizing (12) compound words from an utterance (22) containing a succession of one or more words from a predetermined vocabulary. At least one of the words in the utterance is a compound word including at least two formatives in succession, wherein those formatives are words in the vocabulary.

80 citations


Journal ArticleDOI
TL;DR: This article presented cross-cultural evidence of cohesive elements marked by repetitive gestures that maintain continuity with respect to their location in space, the hand with which they are produced, and their form.
Abstract: Discourse cohesion is viewed from the perspective of a speech/gesture synthesis. Based on narrative and conversational data, we present cross‐cultural evidence of cohesive elements marked by repetitive gestures that maintain continuity with respect to their location in space, the hand with which they are produced, and/or their form. The data show the joint contribution made by speech and gesture to the process of creating and maintaining discourse topics. It is claimed that an approach to discourse that focuses on events taking place at the moment of speaking, unlike approaches that assume the prior existence of planned discourse units, can account for the impact of speech and gesture on thought; for example, the execution of a gesture helps the speaker to track presupposed background information, and so provides a basis for the production of the communicatively dynamic part of an utterance. The proposed model of discourse production is a dialectic, in which gesture and speech provide interacting voices, ...

78 citations


Journal ArticleDOI
TL;DR: In this paper, the meaning of English modal auxiliaries, which are found in utterances conveying modal meanings such as ability, possibility, and permission, are discussed.
Abstract: Like so many others before it, this exposition is of the meaning of the English modal auxiliaries, which are found in utterances conveying modal meanings such as ability, possibility and permission. However, unlike the majority of its predecessors, the present rendering admits to being about more than semantics. With the five central modal auxiliaries, can, may, must, will and shall, the modals for short, as a point of departure, a framework will be formulated to shed light on some central aspects of the immense cotext and context sensitivity involved in the meaning of utterances of sentences containing a modal auxiliary.

70 citations


Proceedings Article
01 Aug 1993
TL;DR: To develop a solid theoretical basis for the design of an ALI system, a formal probabilistic framework has been developed and it was discovered that additional information, such as prosodic and acoustic information, can also be useful to supplement the phonotactic information.
Abstract: Automatic Language Identification (ALI) is the problem of automatically identifying the language of an utterance through the use of a computer. In 1977, House and Neuburg proposed an approach to ALI based on the observation that the phonotactic constraints of a language could be used effectively for language identification if an accurate phonetic representation of an utterance could be obtained from the acoustic signal. Our research utilizes House and Neuburg's ideas as the starting point for a new segment-based approach to ALI. To develop a solid theoretical basis for the design of an ALI system, a formal probabilistic framework has been developed. This framework uses House and Neuburg's ideas as its foundation but also utilizes additional information that may be useful for ALI. Specifically, phonotactic, acoustic, and prosodic information are all incorporated into the framework, which provides the structure for a segment-based system. To investigate the capabilities of the new segment-based approach, the system was trained and tested using the OGI Multi-Language Telephone Speech Corpus, which consists of utterances in 10 different languages. The entire system was able to identify the language of a test utterance 48.6% of the time. To investigate the system's performance in more detail, the entire system, as well as each component of the system, was evaluated as various test conditions were altered. Overall, the analyses of the system confirmed that the phonotactic constraints of languages can be used effectively for ALI. However, it was also discovered that additional information, such as prosodic and acoustic information, can be useful to supplement the phonotactic information.
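The probabilistic framework described above fuses several information sources per language. One common way to do this (an assumption here, not necessarily the thesis's exact formulation) is to sum per-source log-likelihoods and take the argmax over candidate languages:

```python
def identify_language(scores):
    """Identify the language of an utterance from per-source scores.

    scores: {language: {"phonotactic": ll, "acoustic": ll, "prosodic": ll}}
    where each value is a log-likelihood of the utterance under that
    language's model for that information source. Summing log-likelihoods
    treats the sources as independent; the best-scoring language wins.
    """
    return max(scores, key=lambda lang: sum(scores[lang].values()))
```

A weighted sum (learned interpolation weights per source) would be the natural refinement if, as the thesis found, phonotactic information dominates while acoustic and prosodic cues play a supplementary role.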

Book
02 Jan 1993
TL;DR: NLSoar is presented, a language comprehension system that integrates disparate knowledge sources automatically and applies all the relevant knowledge sources simultaneously to each word.
Abstract: Multiple types of knowledge (syntax, semantics, pragmatics, etc.) contribute to establishing the meaning of an utterance. Immediate application of these knowledge sources is necessary to satisfy the real-time constraint of 200 to 300 words per minute for adult comprehension, since delaying the use of a knowledge source introduces computational inefficiencies in the form of overgeneration. On the other hand, ensuring that all relevant knowledge is brought to bear as each word in the sentence is understood is a difficult design problem. As a solution to this problem, we present NLSoar, a language comprehension system that integrates disparate knowledge sources automatically. Through experience, the nature of the understanding process changes from deliberate, sequential problem solving to recognitional comprehension that applies all the relevant knowledge sources simultaneously to each word. The dynamic character of the system results directly from its implementation within the Soar architecture.

Book ChapterDOI
01 Nov 1993
TL;DR: The authors argued that metaphor can only be understood by close attention to the distinction between "sentence meaning" and "utterance meaning", and that metaphor must be considered a case of the latter, not the former.
Abstract: Introduction I am persuaded that the main points in Professor Searle's paper are correct, but there are a few arguments which leave me unsatisfied. I agree with his point that metaphor can only be understood by close attention to the distinction between “sentence meaning” and “utterance meaning,” and that metaphor must be considered a case of the latter, not the former. But I find some aspects of his proposal for a proper treatment of metaphor tantalizingly vague or incomplete. In particular, it seems to me that his discussion of “call to mind” casts the net too wide, capturing some things that are not in the same boat as clear cases of metaphor. Moreover, there are some difficult leaps in his three-step analysis of metaphor, and I think that his proposal avoids dealing with an important question about its nature. I have a few other quibbles that I shall mention along the way, though I do not think they are a serious threat to Searle's analysis, with which, as I have said, I generally agree. The proper domain of the analysis of metaphor The distinction Searle makes between sentence meaning and utterance meaning is a crucial one, not only for metaphor, but for the study of meaning in general. The mistake of overlooking this distinction has constantly plagued work on meaning by linguists, and I think recognition of the importance of the distinction is a real advance.

Book ChapterDOI
01 Mar 1993
TL;DR: This article examined the different utterance types that were used by adults and 4- to 10-year-old children when reporting dialogues in various situations and identified several discourse modes and variations in their uses are illustrated in two types of narrative patterns: (1) prototypical cases, namely when dialogues were reported entirely in one mode, and (2) mode mixtures, namely cases which involved more than one mode.
Abstract: Introduction Speech can be reported in a variety of more or less explicit ways. The most obvious types of utterances reporting speech contain verbs of saying that refer explicitly to speech events and present this speech in direct and indirect quotations. Such metalinguistic utterances all mark explicitly some boundary between the reported message and the narrator's message, although they differ in other ways. Other types of utterances can also be used to present speech events in less obvious ways that do not explicitly represent speech qua speech. By virtue of their form and content in isolation, these utterances do not, strictly speaking, quote speech. When they are embedded in discourse, however, they constitute nonexplicit ways of reporting speech, the uses of which are systematic from a functional point of view. The study presented below examines the different utterance types that were used by (English-speaking) adults and 4- to 10-year-old children when reporting dialogues in various situations. Several discourse modes are identified in the corpus and variations in their uses are illustrated in two types of narrative patterns: (1) prototypical cases, namely when dialogues were reported entirely in one mode, and (2) mode mixtures , namely cases which involved more than one mode.

11 Jan 1993
TL;DR: In this paper, a unified theory of speech act production, interpretation, and repair has been developed by using default reasoning to generate utterances and using abduction to characterize interpretation and repair.
Abstract: To decide how to respond to an utterance, a speaker must interpret what others have said and why they have said it. Speakers rely on their expectations to decide whether they have understood each other. Misunderstandings occur when speakers differ in their beliefs about what has been said or why. If a listener hears something that seems inconsistent, he may reinterpret an earlier utterance and respond to it anew. Otherwise, he assumes that the conversation is proceeding smoothly. Recognizing an inconsistency as a misunderstanding and generating a new reply together accomplish what is known as a fourth-position repair. To model the repair of misunderstandings, this thesis combines both intentional and social accounts of discourse, unifying theories of speech act production, interpretation, and repair. In intentional accounts, speakers use their beliefs, goals, and expectations to decide what to say; when they interpret an utterance, speakers identify goals that might account for it. In sociological accounts provided by Ethnomethodology, discourse interactions and the resolution of misunderstandings are normal activities guided by social conventions. The approach extends intentional accounts by using expectations deriving from social conventions in order to guide interpretation. As a result, it avoids the unconstrained inference of goals that has plagued many models of discourse. A unified theory has been developed by using default reasoning to generate utterances and using abduction to characterize interpretation and repair. The account has been expressed as a logical theory within the Prioritized Theorist Framework. The theory includes relations on linguistic acts and the Gricean attitudes that they express. It also contains an axiomatization of speakers' knowledge for generating socially appropriate utterances and for detecting and repairing misunderstandings. 
The generality of the approach is demonstrated by re-enacting real conversations using the theorem-proving capabilities of Prioritized Theorist.

Journal ArticleDOI
TL;DR: This paper investigated how listeners integrate spoken utterances into a discourse representation, using a cross-modal naming task where subjects hear a short discourse followed by a sentence fragment, and immediately at the offset of the fragment, a visual probe word is presented.

01 Jan 1993
TL;DR: The role of discourse markers is to signal speaker comment on the current utterance as mentioned in this paper, while absence of markers does not affect sentence grammaticality, it does remove a powerful clue about the speaker's perception of the relationship between prior and subsequent discourse.
Abstract: This paper discusses discourse markers (e.g., "and, so, anyway") and offers an overview of their characteristics and occurrence, using English for illustration. The role of discourse markers is to signal speaker comment on the current utterance. The discourse marker is not part of the sentence's propositional content. While absence of markers does not affect sentence grammaticality, it does remove a powerful clue about the speaker's perception of the relationship between prior and subsequent discourse. Each discourse marker may appear in a sentence-initial position; some may occur in sentence-medial or sentence-final position; however, in the latter cases, a change in marker scope occurs. Each discourse marker has an associated core meaning, part of which signals the type of sequential relationship (e.g., change of topic, parallelism, etc.) and part of which provides the starting point for interpretation of the commentary message in a given case. Three types of discourse markers are examined: those signalling reference to the discourse topic; those signalling that current discourse activity relates to the foregoing discourse; and those signalling the relationship of the basic current message to some prior message. Based on this conceptual framework, analysis of discourse markers in other languages in

Journal ArticleDOI
TL;DR: In this article, three groups of participants rated the authorial intent to endorse or reject the truth value, the degree of irony, or metaphoricity of target utterances contained in brief anecdotes.
Abstract: Three groups of participants rated the authorial intent to endorse or reject the truth-value, the degree of irony, or metaphoricity of target utterances contained in brief anecdotes. Target utterances could be interpreted either literally or figuratively; a figurative resemblance was suggested by having the utterance echo an earlier usage. The anecdotes manipulated (a) congruence of the utterance with observational fact and (b) audience type. Type of audience depended on whether all listeners (nonpolarized audience), or only some listeners (polarized audience) were aware that the target utterance echoed an earlier utterance. Results supported the hypothesis that authorial intent would affect the relative degree of irony and metaphoricity perceived by participants: Endorsements were perceived as conveying a metaphoric message, whereas rejections were perceived as ironic. Congruent usage was taken as endorsement of a truth-value and incongruent usage was taken as rejection of a truth-value. Incongruent utte...

Proceedings Article
01 Jan 1993
TL;DR: A unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods, which has been shown to be effective for text-independent, vocabulary-independent sex, speaker, and language identification and promising for a variety of applications.
Abstract: In this paper we have presented a unified approach for the identification of non-linguistic speech features from recorded signals using phone-based acoustic likelihoods. The inclusion of this technique in speech-based systems can broaden the scope of applications of speech technologies and lead to more user-friendly systems. The approach is based on training a set of large phone-based ergodic HMMs for each non-linguistic feature to be identified (language, gender, speaker, ...), and identifying the feature as that associated with the model having the highest acoustic likelihood of the set. The decoding procedure is efficiently implemented by processing all the models in parallel using a time-synchronous beam search strategy. This has been shown to be a powerful technique for sex, language, and speaker identification, and has other possible applications such as dialect identification (including foreign accents) or identification of speech disfluencies. Sex identification for BREF and WSJ was error-free, and 99% accurate for TIMIT with 2s of speech. Speaker identification accuracies of 98.8% on TIMIT (168 speakers) and 99.1% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% if 2 utterances were used for identification. This identification accuracy was obtained on the 168 test speakers of TIMIT without making use of the phonetic transcriptions during training, verifying that it is not necessary to have labeled adaptation data. Speaker-independent models can be used to provide the labels used in building the speaker-specific models.
Being independent of the spoken text, and requiring only a small amount of identification speech (on the order of 2.5s), this technique is promising for a variety of applications, particularly those for which continual, transparent verification is preferable. Tests of two-way language identification of read, laboratory speech show that with 2s of speech the language is correctly identified as English or French with over 99% accuracy. Simply porting the approach to the conditions of telephone speech, accuracy on French and English data in the OGI multi-language telephone speech corpus was about 76% with 2s of speech, and increased to 82% with 10s. The overall 10-language identification accuracy on the designated development test data of the OGI corpus is 59.7%. These results were obtained without the use of phone transcriptions for training, which were used for the experiments with laboratory speech. In conclusion, we propose a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique has been shown to be effective for text-independent, vocabulary-independent sex, speaker, and language identification. While phone labels have been used to train the speaker-independent seed models, these models can then be used to label unknown speech, thus avoiding the costly process of transcribing the speech data. The ability to accurately identify non-linguistic speech features can lead to more performant spoken language systems, enabling better and more friendly human-machine interaction.
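The parallel-decoding decision rule described above can be sketched as follows. Accumulating per-frame log-likelihoods and taking the argmax is the core idea; the real system obtains these scores from a time-synchronous beam search over ergodic phone-based HMMs, which this sketch abstracts away.

```python
def identify_feature(frame_scores):
    """Identify a non-linguistic feature (language, gender, speaker, ...).

    frame_scores: {label: [per-frame log-likelihoods under that label's
    phone-based model]}. All models are decoded in parallel over the same
    utterance; the identified value is the label whose model accumulates
    the highest total log-likelihood.
    """
    totals = {label: sum(frames) for label, frames in frame_scores.items()}
    return max(totals, key=totals.get)
```

Because the decision depends only on whole-utterance likelihoods, the same routine serves for language, sex, or speaker identification simply by swapping the set of models, which is the sense in which the approach is "unified."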

Patent
28 May 1993
TL;DR: This paper presented a speech training system that allows a student to enter any utterance to be learned and have the articulatory model movements required to produce the utterance displayed on a CRT screen.
Abstract: This invention includes a speech training system that allows a student to enter any utterance to be learned and have the articulatory model movements required to produce the utterance displayed on a CRT screen. The system accepts a typed utterance, breaking it down into a set of phonemes. The set of phonemes is sent to a synthesizer, which produces a set of parameters indicating the acoustic characteristics of the utterance. The acoustic parameters are converted into articulatory parameters emphasizing the frequency, nasality, and tongue-palate contact required to produce the typed utterance. The articulatory parameters are displayed on the CRT screen. The acoustic parameters are also sent to a formant synthesizer which converts the parameters into speech output. The system measures a student's production and then evaluates the student's production against the parameters of the typed utterance for its similarity. Feedback on the similarity is displayed on the CRT screen.
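The front half of the pipeline (typed word to phoneme string) and the final similarity comparison might be sketched as below. The lexicon, the flat parameter vectors, and the scoring formula are all hypothetical stand-ins for the synthesizer-derived acoustic and articulatory parameters the patent describes.

```python
def text_to_phonemes(text, lexicon):
    """Break a typed utterance into phonemes via a pronunciation lexicon.

    `lexicon` maps a lowercase word to its phoneme list; this lookup table
    is a hypothetical stand-in for the patent's phoneme decomposition step.
    """
    return [p for word in text.lower().split() for p in lexicon[word]]

def similarity(target_params, student_params):
    """Score the student's production against the target parameters.

    Both arguments are flat numeric parameter vectors (e.g. frequency,
    nasality, tongue-palate contact values). Returns a crude closeness
    score in [0, 1], 1.0 meaning a perfect match; illustrative only.
    """
    diffs = [abs(t - s) / max(abs(t), abs(s), 1e-9)
             for t, s in zip(target_params, student_params)]
    return 1.0 - sum(diffs) / len(diffs)
```

In the patented system the similarity result would drive the feedback shown on the CRT screen alongside the articulatory display.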

Proceedings ArticleDOI
21 Apr 1993
TL;DR: A unified theory of speech act production, interpretation, and the repair of misunderstandings has been developed by characterizing the generation of utterances as default reasoning and using abduction to characterize interpretation and repair.
Abstract: To respond to an utterance, a listener must interpret what others have said and why they have said it. Misunderstandings occur when agents differ in their beliefs about what has been said or why. Our work combines intentional and social accounts of discourse, unifying theories of speech act production, interpretation, and the repair of misunderstandings. A unified theory has been developed by characterizing the generation of utterances as default reasoning and using abduction to characterize interpretation and repair.

Book ChapterDOI
01 Nov 1993
TL;DR: The authors examined the question of whether the interpretation of a metaphorical expression is predictable on the basis of the linguistic properties of the utterance alone and found that if the metaphorical utterance is given out of context, do speakers agree on the most likely interpretation?
Abstract: Paivio concluded his chapter in the first edition of this collection with the lament that much of the psychological research on metaphor had not been directed at really fundamental problems in the area. In the last paragraph, he wrote: Such work might require the systematic development of a large pool of novel metaphors that vary in type, difficulty, concreteness, and whatever other dimensions may seem relevant. It may demand systematic extensions of some of the traditional paradigms that have been developed in verbal memory and language research. It would require detailed factual information about precisely how people respond to a novel metaphorical expression. The present chapter describes a step toward redressing this lack of relevant research. The particular issues I am addressing involve the last point raised by Paivio and Walsh, namely, the nature of the speaker's response to a novel metaphor. However, I want to go further than just determining the facts of what interpretation is provided by the native speaker to a novel metaphorical utterance and suggest that in order to evaluate such facts, we must examine the following general question: To what extent is the interpretation of a metaphorical expression (or at least the most probable interpretation) predictable on the basis of the linguistic properties of the utterance alone? There are several subquestions: If the metaphorical utterance is given out of context, do speakers agree on the most likely interpretation?

Proceedings Article
28 Aug 1993
TL;DR: A mechanism for representing and using typicality distributions of static spatial relations which is related to Herskovits' analytical framework is explained and allows us to construct dynamic mental images corresponding to the referents of objective sports reports.
Abstract: AI research concerning the connection between seeing and speaking mainly employs what is often called reference semantics. Applying this approach to the situation of a radio sports reporter, we have to coordinate the demand of referentially anchoring an utterance dealing with the visually perceived, and the demand for coherence of an utterance as part of a verbal interaction with somebody not situated in the same perceptual context. In consequence, we are led to the conception of a speaker anticipating the listeners' understanding by means of mental images which replace the percepts being described, and thus provide the referents for the audience. We present a system realizing this type of partner modeling, emphasizing mainly the reconstruction of the referents, i.e., of a mental image. Starting from the thesis that the audience expects the speaker to mean the most typical case of the described class of events or situations with respect to the communicated context, we explain a mechanism for representing and using typicality distributions of static spatial relations which is related to Herskovits' analytical framework. Extended to restrictions of speed and temporal duration, this mechanism also allows us to construct dynamic mental images corresponding to the referents of objective sports reports.

02 Jan 1993
TL;DR: An algorithm is presented that combines world, linguistic, and contextual knowledge to recognize complex discourse acts such as expressing doubt and that enables the system to model the structure of negotiation dialogues, a step towards a complete model of negotiation and thus a more robust model of understanding.
Abstract: Current natural language understanding systems are severely limited in their handling of many kinds of dialogues. They handle each utterance in relative isolation, with little or no use of the established dialogue context, of general knowledge of the world, of the system's beliefs about the user's beliefs, or of the linguistic information contained in the user's utterances. Consequently, current systems are unable to recognize when a user is conveying lack of belief in the system's claims or conveying implicit acceptance of previously communicated propositions. These shortcomings prevent these systems from being able to recognize complex communicative actions such as expressing doubt, and subsequent actions which implicitly indicate acceptance of communicated propositions. This thesis presents a tripartite model of dialogue that overcomes these limitations. The model distinguishes among domain, problem-solving, and discourse or communicative actions, thus capturing a wider range of goals than previous models. It is able to recognize the role of utterances even when that role cannot be identified from a single utterance alone. The thesis shows how people convey uncertain beliefs by the surface form of their utterances and argues that recognition of complex discourse acts requires a multi-strength belief model capable of representing such beliefs. It presents an algorithm that combines world, linguistic, and contextual knowledge to recognize complex discourse acts such as expressing doubt and that enables the system to model the structure of negotiation dialogues. As a result, this model is a step towards a complete model of negotiation and thus a more robust model of understanding.



Patent
Joseph Desimone1, Jian-Tu Hsieh1
22 Dec 1993
TL;DR: In this article, a speech signal derived from a user's utterance, and a bio-signal which is indicative of the user's emotional state, are provided to a speech recognition system.
Abstract: The recognition rate of a speech recognition system is improved by compensating for changes in the user's speech that result from factors such as emotion, anxiety or fatigue. A speech signal derived from a user's utterance, and a bio-signal, which is indicative of the user's emotional state, are provided to a speech recognition system. The bio-signal is used to provide a reference frequency that changes when the user's emotional state changes. An utterance is identified by examining the relative magnitudes of its frequency components and the position of the frequency components relative to the reference frequency.
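As a rough illustration of the idea (not the patented method; the bio-signal mapping, function names, and parameters below are invented for the sketch), a recognizer can express the position of an utterance's dominant frequency component relative to a reference frequency that tracks the bio-signal:

```python
import numpy as np

def reference_frequency(baseline_hz, bio_level, sensitivity=0.5):
    """Hypothetical mapping from a bio-signal reading to a reference
    frequency: the reference rises as the bio-signal departs from its
    calm baseline (baseline_hz and sensitivity are illustrative)."""
    return baseline_hz * (1.0 + sensitivity * bio_level)

def relative_peak(frame, sample_rate, ref_hz):
    """Position of the dominant frequency component of one utterance
    frame, expressed relative to the reference frequency so that a
    uniform emotional pitch shift leaves the feature unchanged."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs[np.argmax(mags)] / ref_hz
```

With this normalization, an utterance whose pitch rises under stress maps to the same feature value as its calm counterpart, provided the bio-signal raises the reference frequency by the same factor.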

Proceedings ArticleDOI
21 Mar 1993
TL;DR: A unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods is presented and is shown to be effective for text-independent language, sex, and speaker identification and can enable better and more friendly human-machine interaction.
Abstract: Over the last decade, technological advances have made it possible to envision real-world applications of speech technologies. One can foresee applications where a spoken query is to be recognized without prior knowledge of the language being spoken, for example at information centers in public places such as train stations and airports. Other applications may require accurate identification of the speaker for security reasons, including control of access to confidential information or telephone-based transactions. Ideally, the speaker's identity can be verified continually during the transaction, in a manner completely transparent to the user. With these applications in mind, this paper presents a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The technique is shown to be effective for text-independent language, sex, and speaker identification and can enable better and more friendly human-machine interaction. With 2 s of speech, the language can be identified with better than 99% accuracy. Error in sex identification is about 1% on a per-sentence basis, and speaker-identification accuracies of 98.5% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with two utterances, for both corpora. An experiment using unsupervised adaptation for speaker identification on the 168 TIMIT speakers yielded the same identification accuracies as supervised adaptation.
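The decision rule underlying this approach is simple: score the utterance's feature frames under an acoustic model for each candidate class (language, sex, or speaker) and pick the class with the highest likelihood. A minimal sketch using diagonal-covariance Gaussians as stand-ins for the paper's phone-based models (the model form is a simplification, not the paper's):

```python
import numpy as np

def gaussian_loglik(frames, mean, var):
    """Total log-likelihood of the feature frames under a
    diagonal-covariance Gaussian (a stand-in for phone-based models)."""
    d = frames - mean
    return -0.5 * np.sum(d * d / var + np.log(2.0 * np.pi * var))

def identify(frames, models):
    """Return the label whose model assigns the utterance the highest
    log-likelihood; models maps label -> (mean, var) arrays."""
    return max(models, key=lambda label: gaussian_loglik(frames, *models[label]))
```

Because the rule only compares likelihoods across models, it needs no transcription of what was said, which is what makes the identification text-independent.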

Book ChapterDOI
01 Jan 1993
TL;DR: This chapter shows that, as a first approximation, a sentence in the past tense locates the episode it describes before the utterance time, sentences in the future tense serve to describe episodes later than the time of utterance, and a present-tense sentence is typically used to present a condition as holding over some period which surrounds the utterance time.
Abstract: The theory we have developed in the preceding chapters ignores all questions of reference to time. In view of this one might have thought that it cannot possibly be right. For in natural languages such as English reference to time is ubiquitous. Virtually every English sentence involves an element of temporal reference because of the tense of its verb: as a first approximation, a sentence in the past tense locates the episode it describes before the utterance time, sentences in the future tense serve to describe episodes later than the time of utterance, and a present tense sentence is typically used to present a condition as holding over some period which surrounds the utterance time. Since our theory paid no heed to any of this, how could it possibly be correct?
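The first approximation described above can be stated as temporal constraints relating the time $e$ of the described episode to the utterance time $n$ (the notation is assumed here for illustration, not taken from the chapter):

```latex
\begin{align*}
\text{past tense:}    &\quad e \prec n \\
\text{future tense:}  &\quad n \prec e \\
\text{present tense:} &\quad n \subseteq e
\end{align*}
```

Here $\prec$ is temporal precedence, and the present-tense condition says the utterance time falls within the period over which the described condition holds.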

Journal Article
TL;DR: The lack of a consistent answer to the question of the generator's source has been at the heart of the problem of how to make research on generation intelligible and engaging for the rest of the computational linguistics community, and has complicated efforts to evaluate alternative treatments even for people in the field.
Abstract: David D. McDonald* Brandeis University The most vexing question in natural language generation is 'what is the source'-- what do speakers start from when they begin to compose an utterance? Theories of generation in the literature differ markedly in their assumptions. A few start with an unanalyzed body of numerical data (e.g. Bourbeau et al. 1990; Kukich 1988). Most start with the structured objects that are used by a particular reasoning system or simulator and are cast in that system's representational formalism (e.g. Hovy 1990; Meteer 1992; Rösner 1988). A growing number of systems, largely focused on problems in machine translation or grammatical theory, take their input to be logical formulae based on lexical predicates (e.g. Wedekind 1988; Shieber et al. 1990). The lack of a consistent answer to the question of the generator's source has been at the heart of the problem of how to make research on generation intelligible and engaging for the rest of the computational linguistics community, and has complicated efforts to evaluate alternative treatments even for people in the field. Nevertheless, a source cannot be imposed by fiat. Differences in what information is assumed to be available, its relative decomposition when compared to the "packaging" available in the words or syntactic constructions of the language