Author

Toshiyuki Sakai

Bio: Toshiyuki Sakai is an academic researcher from Kyoto University. The author has contributed to research on speech processing and acoustic models. The author has an h-index of 3 and has co-authored 7 publications receiving 144 citations.

Papers

The Automatic Speech Recognition System for Conversational Sound

Toshiyuki Sakai¹, Shuji Doshita¹

Kyoto University¹

01 Jan 1964

TL;DR: In this article, a monosyllable recognition system was constructed in which the phoneme is used as the basic recognition unit, and the principle of the recognition is based on the mechanism of the articulation in our speech organ.

Abstract: This paper describes the method and the system investigated to solve the problem encountered in the automatic recognition of speech sound. From research in the automatic analyzer of speech sound, a monosyllable recognition system was constructed in which the phoneme is used as the basic recognition unit. Recently this system has been developed to accept the conversational speech sound with unlimited vocabulary. The mechanical recognition of conversational speech sound requires two basic operations. One is the segmentation of the continuous speech sound into several discrete intervals (or segments), each of which may be thought to correspond to a phoneme, and the other is the pattern recognition of such segments. For segmentation, by defining two criteria, ``stability'' and ``distance,'' the properties of the time pattern obtained by the analysis of input speech sound may be examined. The principle of the recognition is based on the mechanism of the articulation in our speech organ. Corresponding to this, the machine has the functions called phoneme classification, vowel analysis and consonant analysis. A conversational speech recognition system with the phonetic contextual approach is also applied to the vowel recognition where the time pattern of input speech is matched with the stored standard patterns in which the phonetic contextual effects are taken into consideration. The time pattern which has great variety may be effectively expressed by the new representation of ``sequential pattern'' and ``weighting pattern.''
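
The segmentation step described in this abstract can be illustrated with a minimal sketch. The abstract names the two criteria, "stability" and "distance," but does not define them, so everything below is an assumption: "distance" is taken to be the Euclidean distance between consecutive feature frames, and a run of frames is treated as "stable" while that distance stays below a threshold. The function names, the threshold, and the sample frames are hypothetical and are not taken from the paper.

```python
from math import sqrt


def frame_distance(a, b):
    """Euclidean distance between two feature frames (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment(frames, threshold):
    """Split a frame sequence into runs of mutually close consecutive frames.

    A new segment starts whenever the distance between two consecutive
    frames exceeds `threshold` (the assumed "stability" test).
    Returns a list of (start, end) index pairs, with `end` exclusive.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if frame_distance(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


# Hypothetical two-dimensional frames: three similar frames, then two others.
frames = [[1.0, 0.0], [1.1, 0.0], [1.0, 0.1], [5.0, 4.0], [5.1, 4.0]]
print(segment(frames, threshold=1.0))  # [(0, 3), (3, 5)]
```

Each resulting interval would then be passed to a separate pattern-recognition stage, which the abstract describes as phoneme classification followed by vowel and consonant analysis.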

47 citations

Proceedings Article

The Phonetic Typewriter.

Toshiyuki Sakai, Shuji Doshita

01 Jan 1962

46 citations

Journal Article

The Automatic Speech Recognition System for Conversational Sound

Toshiyuki Sakai¹, Shuji Doshita¹

Kyoto University¹

01 Dec 1963-IEEE Transactions on Electronic Computers

TL;DR: This paper describes a method and system for the automatic recognition of speech sound, built on a monosyllable recognition system in which the phoneme is used as the basic recognition unit.

Abstract: This paper describes the method and the system investigated to solve the problem encountered in the automatic recognition of speech sound. From research in the automatic analyzer of speech sound, a monosyllable recognition system was constructed in which the phoneme is used as the basic recognition unit. Recently this system has been developed to accept the conversational speech sound with unlimited vocabulary. The mechanical recognition of conversational speech sound requires two basic operations. One is the segmentation of the continuous speech sound into several discrete intervals (or segments), each of which may be thought to correspond to a phoneme, and the other is the pattern recognition of such segments. For segmentation, by defining two criteria, ``stability'' and ``distance,'' the properties of the time pattern obtained by the analysis of input speech sound may be examined. The principle of the recognition is based on the mechanism of the articulation in our speech organ. Corresponding to this, the machine has the functions called phoneme classification, vowel analysis and consonant analysis. A conversational speech recognition system with the phonetic contextual approach is also applied to the vowel recognition where the time pattern of input speech is matched with the stored standard patterns in which the phonetic contextual effects are taken into consideration. The time pattern which has great variety may be effectively expressed by the new representation of ``sequential pattern'' and ``weighting pattern.''

43 citations

On-Line, Real-Time Multiple Speech Output System and Its System Evaluation.

Toshiyuki Sakai, Kenji Ohtani, Shinji Tomita

01 Jan 1972

TL;DR: A new multiple speech output system based on a compilation method is described, in which speech elements, each the speech wave within one cycle at larynx frequency, are connected in succession and transformed into speech sounds by synthesizers.

Abstract: A new multiple speech output system is described. This system is based on a compilation method. Fundamental speech elements used for the synthesis are the speech waves within one cycle at larynx frequency. They are stored on a secondary memory, such as a magnetic drum or disk, in the form of digitized zero-crossing intervals. Speech elements are read out from the secondary memory and are connected in succession according to the connection rules by the computer program. The resulting sequences are transformed into the speech sounds by synthesizers. System evaluation is also performed in real-time mode to measure the load of the computer and the maximum number of multiplicity.
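
To make the storage format in this abstract more concrete, here is a minimal sketch of a zero-crossing-interval representation. It rests on assumptions: an interval is taken to be the number of samples between successive sign changes, samples greater than or equal to zero count as positive, and "connecting" elements is reduced to plain concatenation because the abstract does not state the connection rules. The function names and sample values are hypothetical.

```python
def zero_crossing_intervals(samples):
    """Return the sample counts between successive sign changes.

    The first interval is measured from the start of the waveform, and the
    run after the last sign change is not emitted.
    """
    intervals = []
    last = 0
    for i in range(1, len(samples)):
        if (samples[i - 1] >= 0) != (samples[i] >= 0):
            intervals.append(i - last)
            last = i
    return intervals


def connect(elements):
    """Join stored elements in order (a stand-in for the connection rules)."""
    return [interval for element in elements for interval in element]


# Hypothetical waveform fragment standing in for one stored speech element.
element = zero_crossing_intervals([1, 1, -1, -1, -1, 1, 1, -1])
print(element)                      # [2, 3, 2]
print(connect([element, element]))  # [2, 3, 2, 2, 3, 2]
```

A synthesizer stage would then turn the joined interval sequence back into a waveform; that step is outside this sketch.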

3 citations

Proceedings Article

On-Line, Real-Time, Multiple-Speech Output System.

Toshiyuki Sakai, Kenji Ohtani, Shinji Tomita

01 Jan 1971

3 citations

Cited by

Journal Article

A comprehensive review for industrial applicability of artificial neural networks

Magali Rezende Gouvêa Meireles¹, Paulo Eduardo Maciel de Almeida¹, Marcelo Godoy Simões

Colorado School of Mines¹

05 Jun 2003-IEEE Transactions on Industrial Electronics

TL;DR: An organized and normalized review of the industrial applications of artificial neural networks, in the last 12 years, is presented to help industrial managing and operational personnel decide which kind of ANN topology and training method would be adequate for their specific problems.

Abstract: This paper presents a comprehensive review of the industrial applications of artificial neural networks (ANNs), in the last 12 years. Common questions that arise to practitioners and control engineers while deciding how to use NNs for specific industrial tasks are answered. Workable issues regarding implementation details, training and performance evaluation of such algorithms are also discussed, based on a judiciously chronological organization of topologies and training methods effectively used in the past years. The most popular ANN topologies and training methods are listed and briefly discussed, as a reference to the application engineer. Finally, ANN industrial applications are grouped and tabulated by their main functions and what they actually performed on the referenced papers. The authors prepared this paper bearing in mind that an organized and normalized review would be suitable to help industrial managing and operational personnel decide which kind of ANN topology and training method would be adequate for their specific problems.

419 citations

Patent

Method and apparatus for speech synthesis based on prosodic analysis

Sandra E. Hutchins

23 Sep 1992-Journal of the Acoustical Society of America

TL;DR: A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags.

Abstract: A system for synthesizing a speech signal from strings of words, which are themselves strings of characters, includes a memory in which predetermined syntax tags are stored in association with entered words and phonetic transcriptions are stored in association with the syntax tags. A parser accesses the memory and groups the syntax tags of the entered words into phrases according to a first set of predetermined grammatical rules relating the syntax tags to one another. The parser also verifies the conformance of sequences of the phrases to a second set of predetermined grammatical rules relating the phrases to one another. The system retrieves the phonetic transcriptions associated with the syntax tags that were grouped into phrases conforming to the second set of rules, and also translates predetermined strings of characters into words. The system generates strings of phonetic transcriptions and prosody markers corresponding to respective strings of the words, and adds markers for rhythm and stress to the strings, which are then converted into data arrays having prosody information on a diphone-by-diphone basis. Predetermined diphone waveforms are retrieved from memory that correspond to the entered words, and these retrieved waveforms are adjusted based on the prosody information in the arrays. The adjusted diphone waveforms, which may also be adjusted for coarticulation, are then concatenated to form the speech signal. Methods in a digital computer are also disclosed.

318 citations

Journal Article

State of the art in pattern recognition

George Nagy¹

IBM¹

01 Jan 1968

TL;DR: This paper reviews statistical, adaptive, and heuristic techniques used in laboratory investigations of pattern recognition problems, including correlation methods, discriminant analysis, maximum likelihood decisions, minimax techniques, perceptron-like algorithms, feature extraction, preprocessing, clustering and nonsupervised learning.

Abstract: This paper reviews statistical, adaptive, and heuristic techniques used in laboratory investigations of pattern recognition problems. The discussion includes correlation methods, discriminant analysis, maximum likelihood decisions, minimax techniques, perceptron-like algorithms, feature extraction, preprocessing, clustering and nonsupervised learning. Two-dimensional distributions are used to illustrate the properties of the various procedures. Several experimental projects, representative of prospective applications, are also described.

317 citations

Automatic Speech Recognition - A Brief History of the Technology Development

Lawrence R. Rabiner

01 Jan 2004

TL;DR: Based on major advances in statistical modeling of speech in the 1980s, automatic speech recognition systems today find widespread application in tasks that require a human-machine interface, such as automatic call processing in the telephone network and query-based information systems that do things like provide updated travel information, stock price quotations, weather reports, etc.

Abstract: Designing a machine that mimics human behavior, particularly the capability of speaking naturally and responding properly to spoken language, has intrigued engineers and scientists for centuries. Since the 1930s, when Homer Dudley of Bell Laboratories proposed a system model for speech analysis and synthesis [1, 2], the problem of automatic speech recognition has been approached progressively, from a simple machine that responds to a small set of sounds to a sophisticated system that responds to fluently spoken natural language and takes into account the varying statistics of the language in which the speech is produced. Based on major advances in statistical modeling of speech in the 1980s, automatic speech recognition systems today find widespread application in tasks that require a human-machine interface, such as automatic call processing in the telephone network and query-based information systems that do things like provide updated travel information, stock price quotations, weather reports, etc. In this article, we review some major highlights in the research and development of automatic speech recognition during the last few decades so as to provide a technological perspective and an appreciation of the fundamental progress that has been made in this important area of information and communication technology.

270 citations

Fuzzy sets and decisionmaking approaches in vowel and speaker recognition

Sankar K. Pal, D. Dutta Majumder

01 Aug 1977

TL;DR: Two decision algorithmic methods using weighted-distance functions and property sets are developed and implemented with the optimum size of the training set on a large number of Telugu speech sounds with a recognition score of 82 percent for vowels and 97 percent for the speaker.

Abstract: Some applications based on the theory of fuzzy sets in problems of computer recognition of vowels and identifying the person from his spoken words using only the first three formants (F1, F2, and F3) of the unknown utterance are presented. Two decision algorithmic methods using weighted-distance functions and property sets are developed and implemented with the optimum size of the training set on a large number of Telugu (an important Indian language) speech sounds with a recognition score of 82 percent for vowels and 97 percent for the speaker.
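
The weighted-distance idea mentioned in this abstract can be sketched as a nearest-prototype classifier over the three formants. This is only an illustration under stated assumptions: the abstract does not give the distance function, the weights, or any formant values, and the fuzzy property-set method is not modeled here. The prototypes, weights, and test point below are made-up placeholders, not data from the paper.

```python
def weighted_distance(x, prototype, weights):
    """Weighted Euclidean distance between a formant vector and a prototype."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, prototype, weights)) ** 0.5


def classify(x, prototypes, weights):
    """Return the label of the prototype closest to x under the weighted distance."""
    return min(
        prototypes,
        key=lambda label: weighted_distance(x, prototypes[label], weights),
    )


# Hypothetical (F1, F2, F3) prototypes in Hz and hypothetical per-formant weights.
prototypes = {
    "a": (700.0, 1200.0, 2500.0),
    "i": (300.0, 2300.0, 3000.0),
}
weights = (1.0, 0.5, 0.1)

print(classify((320.0, 2200.0, 2900.0), prototypes, weights))  # i
```

In a trained system the prototypes and weights would be estimated from the training set rather than written by hand.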

187 citations
