Author

Alan Cruttenden

Bio: Alan Cruttenden is an academic researcher from the University of Oxford. The author has contributed to research in topics: Speech synthesis & Motor theory of speech perception. The author has an h-index of 2, co-authored 2 publications receiving 646 citations.

Papers
Book
21 Jul 1994
TL;DR: Part I: Speech and language 1. Communication 2. The production of speech 3. The sounds of speech 4. The description and classification of speech sounds 5. Sounds in language 6. The historical background 7. Standard and regional accents 8. The English vowels 9. The English consonants 10. Words 11. Connected speech 12. Words in connected speech 13. Teaching the pronunciation of English
Abstract: PART I: Speech and language 1. Communication 2. The production of speech 3. The sounds of speech 4. The description and classification of speech sounds 5. Sounds in language PART II: The sounds of English 6. The historical background 7. Standard and regional accents 8. The English vowels 9. The English consonants PART III: Words and connected speech 10. Words 11. Connected speech 12. Words in connected speech 13. Teaching the pronunciation of English

659 citations

Journal ArticleDOI
TL;DR: In this paper, auditory and acoustic data were produced from recordings of a Glaswegian English speaker in conversational and reading modes, and clearly different intonational systems were used in the two modes.
Abstract: Auditory and acoustic data were produced from recordings of a Glaswegian English speaker in conversational and reading modes. Clearly different intonational systems were used in the two modes. The reading style used an intonation similar to that used in standard British intonation (the intonation of ‘Received Pronunciation’ (RPI)). The conversational style was an example of the type of intonation used in a number of cities in the north of the UK (Urban North British Intonation (UNBI)), characterised by a default intonation involving rising or rising-slumping nuclear pitch patterns. This speaker illustrates a clear-cut case of intonational diglossia with a falling default tune in the one mode and a rising(-falling) default tune in the other.

29 citations


Cited by
Journal ArticleDOI
TL;DR: The results suggest that children's representations of familiar words are phonetically well-specified, and that this specification may not be a consequence of the need to differentiate similar words in production.

429 citations

Book
John Field
23 Feb 2009
TL;DR: This book argues that a preoccupation with the notion of "comprehension" has led teachers to focus upon the product of listening, in the form of answers to questions, ignoring the listening process itself.
Abstract: This book challenges the orthodox approach to the teaching of second language listening, which is based upon the asking and answering of comprehension questions. The book's central argument is that a preoccupation with the notion of 'comprehension' has led teachers to focus upon the product of listening, in the form of answers to questions, ignoring the listening process itself. The author provides an informed account of the psychological processes which make up the skill of listening, and analyses the characteristics of the speech signal from which listeners have to construct a message. Drawing upon this information, the book proposes a radical alternative to the comprehension approach and provides for intensive small-scale practice in aspects of listening that are perceptually or cognitively demanding for the learner. Listening in the Language Classroom was winner of the Ben Warren International Trust House Prize in 2008.

348 citations

Posted Content
TL;DR: This work presents LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.
Abstract: Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end performs only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy in the sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).

295 citations

01 Jan 2004
TL;DR: The English are a lazy lot, and will not speak a word as it should be spoken when they can slide through it. Why be bothered to say extraordinary when you can get away with strawdiny? ... Many of the Oxford Cockneys are weaklings too languid or emasculated to speak their noble language with any vigor.
Abstract: The English are a lazy lot, and will not speak a word as it should be spoken when they can slide through it. Why be bothered to say extraordinary when you can get away with strawdiny? ... Many of the Oxford Cockneys are weaklings too languid or emasculated to speak their noble language with any vigor, but the majority are following a foolish fashion which had better be abandoned. Its ugliness alone should make it unpopular, but it has the additional effect of causing confusion. [Irish playwright St. John Ervine, quoted by H.L. Mencken (1948, p. 39)]

273 citations

Journal ArticleDOI
TL;DR: This article measured the formants of the eleven monophthong vowels of Standard Southern British pronunciation of English using linear-prediction-based formant tracks overlaid on digital spectrograms for an average of ten instances of each vowel for each speaker.
Abstract: The formants of the eleven monophthong vowels of Standard Southern British (SSB) pronunciation of English were measured for five male and five female BBC broadcasters whose speech was included in the MARSEC database. The measurements were made using linear-prediction-based formant tracks overlaid on digital spectrograms for an average of ten instances of each vowel for each speaker. These measurements were taken from connected speech, allowing comparison with previous formant values measured from citation words. It was found that the male vowels were significantly less peripheral in the measurements from connected speech than in measurements from citation words.

220 citations