Topic

Speech Recognition Grammar Specification

About: Speech Recognition Grammar Specification is a research topic. Over its lifetime, 9 publications have appeared within this topic, receiving 58 citations. The topic is also known as SRGS.

Papers

Proceedings Article

A pervasive wiki application based on VoiceXML

Constantinos Kolias¹, Vassilis Kolias², Ioannis Anagnostopoulos¹, Georgios Kambourakis¹, Eleftherios Kayafas²

University of the Aegean¹, National Technical University of Athens²

16 Jul 2008

TL;DR: Describes the design and implementation of an audio wiki application for educational purposes, accessible via the Public Switched Telephone Network and the Internet, which exploits mature World Wide Web Consortium standards such as VoiceXML, Speech Synthesis Markup Language (SSML) and Speech Recognition Grammar Specification (SRGS).

Abstract: In this paper, we describe the design and implementation of an audio wiki application accessible via the Public Switched Telephone Network (PSTN) and the Internet for educational purposes. The application exploits mature World Wide Web Consortium standards such as VoiceXML, Speech Synthesis Markup Language (SSML) and Speech Recognition Grammar Specification (SRGS). The purpose of such an application is to assist visually impaired, technologically uneducated, and underprivileged people in accessing information originally intended to be accessed visually via a Personal Computer. Users may access wiki content via wired or mobile phones, or via a Personal Computer using a Web Browser or a Voice over IP service. This feature promotes pervasiveness to educational material to an extremely large population, i.e. those who simply own a telephone line.

14 citations

Book

Speech processing for IP networks : Media Resource Control Protocol (MRCP)

David Burke

09 Apr 2007

TL;DR: A book on the Media Resource Control Protocol (MRCP) covering speech processing background, media and control sessions, data representation formats (SSML, SRGS, NLSML and PLS), the MRCP media resources, and programming speech applications with VoiceXML.

Abstract (table of contents):

PART I. BACKGROUND.
1. Introduction. 1.1 Introduction to Speech Applications. 1.2 The MRCP Value Proposition. 1.3 History of MRCP Standardisation. 1.3.1 Internet Engineering Task Force. 1.3.2 World Wide Web Consortium. 1.3.3 MRCP: From Humble Beginnings Toward IETF Standard. 1.4 Summary.
2. Basic Principles of Speech Processing. 2.1 Human Speech Production. 2.1.1 Speech Sounds: Phonemics and Phonetics. 2.2 Speech Recognition. 2.2.1 Endpoint Detection. 2.2.2 Mel-Cepstrum. 2.2.3 Hidden Markov Models. 2.2.4 Language Modelling. 2.3 Speaker Verification and Identification. 2.3.1 Feature Extraction. 2.3.2 Statistical Modelling. 2.4 Speech Synthesis. 2.4.1 Front-end Processing. 2.4.2 Back-end Synthesis. 2.5 Summary.
3. Overview of MRCP. 3.1 Architecture. 3.2 Media Resource Types. 3.3 Network Scenarios. 3.3.1 VoiceXML IVR Service Node. 3.3.2 IP PBX with Voicemail. 3.3.3 Advanced Media Gateway. 3.4 Protocol Operation. 3.4.1 Establishing Communication Channels. 3.4.2 Controlling a Media Resource. 3.4.3 Walkthrough Examples. 3.5 Security. 3.6 Summary.

PART II. MEDIA AND CONTROL SESSIONS.
4. Session Initiation Protocol. 4.1 Introduction. 4.2 Walkthrough Example. 4.3 SIP URIs. 4.4 Transport. 4.5 Media Negotiation. 4.5.1 Session Description Protocol. 4.5.2 Offer/Answer Model. 4.6 SIP Servers. 4.6.1 Registrars. 4.6.2 Proxy Servers. 4.6.3 Redirect Servers. 4.7 SIP Extensions. 4.7.1 Capability Discovery. 4.8 Security. 4.8.1 Transport and Network Layer Security. 4.8.2 Authentication. 4.8.3 S/MIME. 4.9 Summary.
5. Session Initiation in MRCP. 5.1 Introduction. 5.2 Initiating the Media Session. 5.3 Initiating the Control Session. 5.4 Session Initiation Examples. 5.4.1 Single Media Resource. 5.4.2 Adding and Removing Media Resources. 5.4.3 Distributed Media Source/Sink. 5.5 Locating Media Resource Servers. 5.5.1 Requesting Server Capabilities. 5.5.2 Media Resource Brokers. 5.6 Security. 5.7 Summary.
6. The Media Session. 6.1 Media Encoding. 6.1.1 Pulse Code Modulation (PCM). 6.1.2 Linear Predictive Coding (LPC). 6.2 Media Transport. 6.2.1 Real-Time Protocol (RTP). 6.2.2 DTMF. 6.3 Security. 6.4 Summary.
7. The Control Session. 7.1 Message Structure. 7.1.1 Request Message. 7.1.2 Response Message. 7.1.3 Event Message. 7.1.4 Message Bodies. 7.2 Generic Methods. 7.3 Generic Headers. 7.4 Security. 7.5 Summary.

PART III. DATA REPRESENTATION FORMATS.
8. Speech Synthesis Markup Language (SSML). 8.1 Introduction. 8.2 Document Structure. 8.3 Recorded Audio. 8.4 Pronunciation. 8.4.1 Phonemic/Phonetic Content. 8.4.2 Substitution. 8.4.3 Interpreting Text. 8.5 Prosody. 8.5.1 Prosodic Boundaries. 8.5.2 Emphasis. 8.5.3 Speaking Voice. 8.5.4 Prosodic Control. 8.6 Markers. 8.7 Metadata. 8.8 Summary.
9. Speech Recognition Grammar Specification (SRGS). 9.1 Introduction. 9.2 Document Structure. 9.3 Rules, Tokens, and Sequences. 9.4 Alternatives. 9.5 Rule References. 9.5.1 Special Rules. 9.6 Repeats. 9.7 DTMF Grammars. 9.8 Semantic Interpretation. 9.8.1 Semantic Literals. 9.8.2 Semantic Scripts. 9.9 Summary.
10. Natural Language Semantics Markup Language (NLSML). 10.1 Introduction. 10.2 Document Structure. 10.3 Speech Recognition Results. 10.3.1 Serialising Semantic Interpretation Results. 10.4 Voice Enrollment Results. 10.5 Speaker Verification Results. 10.6 Summary.
11. Pronunciation Lexicon Specification (PLS). 11.1 Introduction. 11.2 Document Structure. 11.3 Lexical Entries. 11.4 Abbreviations and Acronyms. 11.5 Multiple Orthographies. 11.6 Multiple Pronunciations. 11.7 Summary.

PART IV. MEDIA RESOURCES.
12. Speech Synthesiser Resource. 12.1 Overview. 12.2 Methods. 12.2.1 SPEAK. 12.2.2 PAUSE. 12.2.3 RESUME. 12.2.4 STOP. 12.2.5 BARGE-IN-OCCURRED. 12.2.6 CONTROL. 12.2.7 DEFINE-LEXICON. 12.3 Events. 12.3.1 SPEECH-MARKER. 12.3.2 SPEAK-COMPLETE. 12.4 Headers. 12.5 Summary.
13. Speech Recogniser Resource. 13.1 Overview. 13.2 Recognition Methods. 13.2.1 RECOGNIZE. 13.2.2 DEFINE-GRAMMAR. 13.2.3 START-INPUT-TIMERS. 13.2.4 GET-RESULT. 13.2.5 STOP. 13.2.6 INTERPRET. 13.3 Enrollment Methods. 13.3.1 START-PHRASE-ENROLLMENT. 13.3.2 ENROLLMENT-ROLLBACK. 13.3.3 END-PHRASE-ENROLLMENT. 13.3.4 MODIFY-PHRASE. 13.3.5 DELETE-PHRASE. 13.4 Events. 13.4.1 START-OF-INPUT. 13.4.2 RECOGNITION-COMPLETE. 13.4.3 INTERPRETATION-COMPLETE. 13.5 Recognition Headers. 13.6 Enrollment Headers. 13.7 Summary.
14. Recorder Resource. 14.1 Overview. 14.2 Methods. 14.2.1 RECORD. 14.2.2 START-INPUT-TIMERS. 14.2.3 STOP. 14.3 Events. 14.3.1 START-OF-INPUT. 14.3.2 RECORD-COMPLETE. 14.4 Headers. 14.5 Summary.
15. Speaker Verification Resource. 15.1 Overview. 15.2 Methods. 15.2.1 START-SESSION. 15.2.2 END-SESSION. 15.2.3 VERIFY. 15.2.4 VERIFY-FROM-BUFFER. 15.2.5 VERIFY-ROLLBACK. 15.2.6 START-INPUT-TIMERS. 15.2.7 GET-INTERMEDIATE-RESULT. 15.2.8 STOP. 15.2.9 CLEAR-BUFFER. 15.2.10 QUERY-VOICEPRINT. 15.2.11 DELETE-VOICEPRINT. 15.3 Events. 15.3.1 START-OF-INPUT. 15.3.2 VERIFICATION-COMPLETE. 15.4 Headers. 15.5 Summary.

PART V. PROGRAMMING SPEECH APPLICATIONS.
16. Voice eXtensible Markup Language (VoiceXML). 16.1 Introduction. 16.2 Document Structure. 16.2.1 Applications and Dialogs. 16.3 Dialogs. 16.3.1 Forms. 16.3.2 Menus. 16.3.3 Mixed Initiative Dialogs. 16.4 Media Playback. 16.5 Media Recording. 16.6 Speech and DTMF Recognition. 16.6.1 Specifying Grammars. 16.6.2 Grammar Scope and Activation. 16.6.3 Configuring Recognition Settings. 16.6.4 Processing Recognition Results. 16.7 Flow Control. 16.7.1 Executable Content. 16.7.2 Variables, Scopes, and Expressions. 16.7.3 Document and Dialog Transitions. 16.7.4 Event Handling. 16.8 Resource Fetching. 16.9 Call Transfer. 16.10 Summary.
17. VoiceXML and MRCP Interworking. 17.1 Introduction. 17.2 Interworking Fundamentals. 17.2.1 Play Prompts. 17.2.2 Play and Recognise. 17.2.3 Record. 17.3 Application Example. 17.3.1 VoiceXML Scripts. 17.3.2 MRCP Flows. 17.4 Summary.

Appendix A. MRCP Version 1. A.1 Overview. A.2 Session Management and Message Transport. A.3 General Protocol Details. A.4 Speech Synthesiser Resource. A.5 Speech Recogniser Resource.
Appendix B. XML Primer. B.1 Background. B.2 Basic Concepts. B.3 Namespaces. B.4 Document Schemas.
Appendix C. HTTP Primer. C.1 Background. C.2 Basic Concepts. C.2.1 GET Method. C.2.2 POST Method. C.3 Caching. C.4 Cookies. C.5 Security.

References. Index. Acronyms.

13 citations

Book Chapter

Speech Application Language Tags

J. Larson¹

Intel¹

01 Jan 2006

TL;DR: Speech Application Language Tags (SALT) is a small set of XML elements that can be embedded into host programming languages to speech-enable applications, and can be used to develop both telephony and multimodal applications.

Abstract: Enabling users to speak and listen to a computer will greatly enhance users' ability to access computers at any time from nearly any place. Speech Application Language Tags (SALT) is a small number of XML elements that may be embedded into host programming languages to speech-enable applications. SALT may be used to develop telephony (speech input and output only) applications and multimodal applications (speech input and output, as well as keyboard and mouse input and display output). SALT and the host programming language provide control structures not available in VoiceXML, the current standard language for developing speech applications.

10 citations

Proceedings Article

On the definition of patterns for semantic annotation

Mónica Marrero¹, Julián Urbano¹, Jorge Morato¹, Sonia Sánchez-Cuadrado¹

University Carlos III of Madrid¹

30 Oct 2010

TL;DR: The authors adopt the W3C's Speech Recognition Grammar Specification, initially intended for speech recognition on the Web, as a language for defining semantic annotation patterns, aiming to adapt it fully to information extraction processes by exploiting its recognition, reuse and flexibility capabilities.

Abstract: The semantic annotation of documents is an additional advantage for retrieval, as long as the annotations and their maintenance process scale well. Automatic or semi-automatic annotation tools help in this matter with the use of patterns. In this paper we analyze the advantages of creating these patterns with standard web languages, as well as the requirements they should meet. We adopt the Speech Recognition Grammar Specification, by the W3C, initially intended for speech recognition in the Web. Our objective is to achieve its full adaptation to the information extraction processes, exploiting its powerful recognition, reuse and flexibility capabilities.

9 citations

Patent

VoiceXML language extension for natively supporting voice enrolled grammars

Brien H. Muschett¹

IBM¹

02 Oct 2006

TL;DR: The invention extends the VoiceXML language model to natively support voice enrolled grammars, which, once created, can be used in normal speaker dependent speech recognition operations and referenced just like text enrolled grammars.

Abstract: The present invention extends the VoiceXML language model to natively support voice enrolled grammars. Specifically, three VoiceXML tags can be added to the language model to add, modify, and delete acoustically provided phrases to voice enrolled grammars. Once created, the voice enrolled grammars can be used in normal speaker dependent speech recognition operations. That is, the voice enrolled grammars can be referenced and utilized just like text enrolled grammars can be referenced and utilized. For example using the present invention, voice enrolled grammars can be referenced by standard text-based Speech Recognition Grammar Specification (SRGS) grammars to create more complex, usable grammars.

6 citations

Performance

No. of papers in the topic in previous years
Year	Papers
2017	2
2012	1
2010	1
2008	1
2007	2
2006	2
