An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

Intelligent Automated Assistant

How may I help you

https://hal.archives-ouvertes.fr/hal-00499180/document

Automatic speech recognition and speech variability: A review

Text-to-Speech Synthesis provides a complete, end-to-end account of the process of generating speech by computer. Giving an in-depth explanation of all aspects of current speech synthesis technology, it assumes no specialized prior knowledge. Introductory chapters on linguistics, phonetics, signal processing and speech signals lay the foundation, with subsequent material explaining how this knowledge is put to use in building practical systems that generate speech. Including coverage of the very latest techniques such as unit selection, hidden Markov model synthesis, and statistical text analysis, explanations of the more traditional techniques such as format synthesis and synthesis by rule are also provided. Weaving together the various strands of this multidisciplinary field, the book is designed for graduate students in electrical engineering, computer science, and linguistics. It is also an ideal reference for practitioners in the fields of human communication interaction and telephony.

/pdf/text-to-speech-synthesis-2y7hiwoykk.pdf

Text-to-Speech Synthesis

Festival Speech Synthesis System

Systems and processes for detecting and/or providing a whispered speech response are provided. In one example process, speech is received from a user, and based on the speech input, determined that a whispered speech response is to be provided. Upon determining that a whispered speech response is to be provided, the whispered speech response is generated and provided to the user.

Digital assistant providing whispered speech

A speech recognition method, medium, and system. The method includes detecting an energy change of each frame making up signals including speech and non-speech signals, and identifying a speech segment corresponding to frames that include only speech signals from among the frames based inclusive of the detected energy change.

Speech detection method, medium, and system

This paper describes Apple’s hybrid unit selection speech synthesis system, which provides the voices for Siri with the requirement of naturalness, personality and expressivity. It has been deployed into hundreds of millions of desktop and mobile devices (e.g. iPhone, iPad, Mac, etc.) via iOS and macOS in multiple languages. The system is following the classical unit selection framework with the advantage of using deep learning techniques to boost the performance. In particular, deep and recurrent mixture density networks are used to predict the target and concatenation reference distributions for respective costs during unit selection. In this paper, we present an overview of the run-time TTS engine and the voice building process. We also describe various techniques that enable on-device capability such as preselection optimization, caching for low latency, and unit pruning for low footprint, as well as techniques that improve the naturalness and expressivity of the voice such as the use of long units.

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System.

A system and method of speech recognition involving a mobile device. Speech input is received (202) on a mobile device (102) and converted (204) to a set of phonetic symbols. Data relating to the phonetic symbols is transferred (206) from the mobile device over a communications network (104) to a remote processing device (106) where it is used (208) to identify at least one matching data item from a set of data items (114). Data relating to the at least one matching data item is transferred (210) from the remote processing device to the mobile device and presented (214) thereon.

Speech recognition involving a mobile device

In this paper we exhibit a novel approach to the problems of topic and speaker identification that makes use of a large vocabulary continuous speech recognizer. We present a theoretical framework which formulates the two tasks as complementary problems, and describe the symmetric way in which we have implemented their solution. Results of trials of the message identification systems using the Switchboard corpus of telephone conversations are reported.

/pdf/topic-and-speaker-identification-via-large-vocabulary-484fbrz94i.pdf

Melvyn J. Hunt

Papers

Digital assistant providing whispered speech

Speech detection method, medium, and system

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System.

Speech recognition involving a mobile device

Topic and speaker identification via large vocabulary continuous speech recognition