A fuzzy acoustic-phonetic decoder for speech recognition
TL;DR: A general framework for acoustic-phonetic modelling is developed; context-sensitive rules are incorporated into a knowledge-based automatic speech recognition (ASR) system and assessed under a control strategy based on fuzzy decision-making.
Abstract: A general framework for acoustic-phonetic modelling is developed. Context-sensitive rules are incorporated into a knowledge-based automatic speech recognition (ASR) system and assessed under a control strategy based on fuzzy decision-making. A reliability measure is outlined, a test collection is run and a confusion matrix is built for each rule. During the recognition procedure, the fuzzy set of trained values related to the phonetic unit to be recognized is computed, and its membership function is drawn automatically. Tests were carried out on an isolated-word French speech database of 1000 utterances, using 33 rules. The results, obtained with a low training rate on a single speaker, are established via a two-step procedure: word recognition, then a word-rejection testbed with five speakers who were never involved in training.
Summary
2.1 Multi-Stage Decoding
- A bottom-up, rule-based acoustic-phonetic decoder first retrieves the segments and context-free features of isolated words.
- Then, a word recognizer provides a set of competing lexical hypotheses from the resulting phonetic lattice.
- To improve the recognition rate, a top-down decoder then focuses on phonetic transitions and verifies coarticulation cues.
- This environment lets the user program context-sensitive rules, since all phonetic hypotheses are available during the top-down stage.
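The data flow between the word recognizer and the top-down decoder can be illustrated with a toy Python sketch. It assumes the bottom-up decoder has already produced the lattice, represented here (hypothetically) as one dictionary of candidate phonemes and scores per segment. The mean-based lexical score, the `(left, phoneme, right)` rule signature, the averaging used to combine scores and `toy_voicing_rule` are illustrative assumptions, not the system's actual method.

```python
from typing import Callable, Optional

# Hypothetical lattice format: one dict per segment, candidate phoneme -> score in [0, 1].
Lattice = list[dict[str, float]]
# Hypothetical rule signature: (left context, phoneme, right context) -> score, or None if not applicable.
Rule = Callable[[Optional[str], str, Optional[str]], Optional[float]]


def lexical_hypotheses(lattice: Lattice, lexicon: dict[str, list[str]], k: int = 3) -> list[tuple[str, float]]:
    """Word recognizer: rank lexicon words whose length matches the lattice."""
    scored = []
    for word, phones in lexicon.items():
        if len(phones) != len(lattice):
            continue  # this toy ignores insertions and deletions
        # Mean lattice score of the word's phonemes (0.0 when a phoneme is absent).
        score = sum(seg.get(p, 0.0) for seg, p in zip(lattice, phones)) / len(phones)
        scored.append((word, score))
    scored.sort(key=lambda ws: ws[1], reverse=True)
    return scored[:k]


def top_down_verify(hyps: list[tuple[str, float]], lexicon: dict[str, list[str]],
                    rules: list[Rule]) -> list[tuple[str, float]]:
    """Top-down decoder: re-score each hypothesis with context-sensitive rules."""
    rescored = []
    for word, lex_score in hyps:
        phones = lexicon[word]
        rule_scores = []
        for i, p in enumerate(phones):
            left = phones[i - 1] if i > 0 else None
            right = phones[i + 1] if i + 1 < len(phones) else None
            for rule in rules:
                s = rule(left, p, right)
                if s is not None:
                    rule_scores.append(s)
        # Keep the lexical score unchanged when no rule applies.
        verified = sum(rule_scores) / len(rule_scores) if rule_scores else lex_score
        rescored.append((word, (lex_score + verified) / 2))
    rescored.sort(key=lambda ws: ws[1], reverse=True)
    return rescored


def toy_voicing_rule(left: Optional[str], phoneme: str, right: Optional[str]) -> Optional[float]:
    """Hypothetical rule: checks a /p/ vs /b/ cue before /i/."""
    if phoneme in ("p", "b") and right == "i":
        return 0.9 if phoneme == "b" else 0.2
    return None


# Toy example: the lattice slightly prefers /p/, but the top-down rule favours /b/.
lattice = [{"p": 0.7, "b": 0.6}, {"i": 0.9, "a": 0.3}]
lexicon = {"pi": ["p", "i"], "bi": ["b", "i"], "pa": ["p", "a"]}
hyps = lexical_hypotheses(lattice, lexicon)
ranked = top_down_verify(hyps, lexicon, [toy_voicing_rule])
print(hyps)
print(ranked)
```

In this toy, "pi" leads after the bottom-up stages, and the context-sensitive check reorders the hypotheses so that "bi" comes first. This illustrates why verifying coarticulation cues late, with all hypotheses available, can change the outcome.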
2.2 Contextual Rules
- The system combines three sets of recognition rules that analyse the spectral characteristics of the vocal tract to compute coarticulation features for French.
- The speaker references are obtained with a minimal training procedure (30 spoken words).
- For each frame, 24 mel-scaled LPC-based cepstral coefficients, the energy, the zero-crossing rate and the delta zero-crossing rate are computed.
- The frequency band in which a burst occurs depends on the right context (the following phoneme).
- For example, if the following phoneme is /i/, the band L extends from the first channel to channel F2 + 1, where F2 is extracted from the V-spectral reference.
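The per-frame quantities listed above that do not require LPC analysis can be sketched in plain Python, together with the context-dependent burst band. This is a minimal sketch under stated assumptions. The 24 mel-scaled LPC cepstral coefficients are omitted. The delta zero-crossing rate is taken as a simple first-order difference. Channels are numbered from 1, and F2 is assumed to be already expressed as a channel index. The full-band default for right contexts other than /i/ is hypothetical. All function names are illustrative.

```python
def frame_signal(x: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Split a signal into (possibly overlapping) frames; trailing samples are dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def frame_energy(frame: list[float]) -> float:
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: list[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (frames need >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def delta(values: list[float]) -> list[float]:
    """First-order difference (assumption); the first frame gets 0.0."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def burst_band(right_context: str, f2_channel: int, n_channels: int) -> tuple[int, int]:
    """Return the (first, last) channel of the burst band, 1-based and inclusive."""
    if right_context == "i":
        # Rule from the text: first channel up to channel F2 + 1.
        return 1, min(f2_channel + 1, n_channels)
    return 1, n_channels  # hypothetical default for other right contexts


def band_energy(channel_energies: list[float], band: tuple[int, int]) -> float:
    """Sum the filterbank energies over an inclusive, 1-based channel range."""
    first, last = band
    return sum(channel_energies[first - 1:last])


signal = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
frames = frame_signal(signal, frame_len=4, hop=4)
zcr = [zero_crossing_rate(f) for f in frames]
features = [(frame_energy(f), z, dz) for f, z, dz in zip(frames, zcr, delta(zcr))]
band = burst_band("i", f2_channel=5, n_channels=20)
print(features, band)
```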
3.2 A Reliability Measure
- Using fuzzy set theory, the platform provides a reliability measure that captures the ability of each rule and allows rational fusion operators to be applied to these degrees of uncertainty.
- The reliability measure is trained on an isolated-word speech database.
- During the recognition procedure, rule relevance can then be computed from the histograms of rule parameters gathered during training.
- The confusion matrix built for each rule records the phonemes Pj that have been detected in a signal portion corresponding to Pi within the lattice.
- The function L enforces a normalization constraint under which ignorance is given a high uncertainty (a score near 0.5).
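The abstract and the excerpt from a citing paper further down state that a rule's membership function is drawn automatically from one-speaker histograms of trained rule parameters. The sketch below is minimal and makes one assumption: it uses an equal-width histogram normalized so that its highest bin has membership 1. This is one common construction; the paper's own normalization L is not reproduced here. Function names and data are hypothetical.

```python
from typing import Callable


def histogram_membership(train_values: list[float], n_bins: int = 10) -> Callable[[float], float]:
    """Build a membership function from a histogram of trained rule parameters."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / n_bins if hi > lo else 1.0

    def bin_index(v: float) -> int:
        # Clamp so that the maximum value falls into the last bin.
        return min(int((v - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for v in train_values:
        counts[bin_index(v)] += 1
    peak = max(counts)

    def membership(x: float) -> float:
        if x < lo or x > hi:
            return 0.0  # outside the range seen in training
        return counts[bin_index(x)] / peak  # the modal bin gets degree 1

    return membership


# Hypothetical parameter values produced by one rule on training tokens of one phoneme.
mu = histogram_membership([1.0, 1.2, 1.4, 2.6, 3.0], n_bins=2)
print(mu(1.1), mu(2.9), mu(5.0))
```

At recognition time, the parameter a rule returns for a phonetic hypothesis would be passed to `mu` to obtain its degree of membership. The bin count is a design choice that trades smoothness against resolution, which matters when, as here, the training data come from a single speaker.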
3.3 Aggregation
- To compute a phonetic score from the reliability scores cij (see fusion1 in Table 1), the semantic interpretation of cij is used.
- Since an average reliability score indicates either ignorance or high uncertainty, the fusion1 operator trusts only the lowest and highest scores.
- The experimental weight function w pushes the aggregation toward the min function when one of the Sj has a low degree of certainty, and toward the arithmetic mean otherwise.
- The evaluation speech data were selected from the BDLEX database.
- The reliability measure was trained on limited data: a partial database collected from a single male speaker.
- The isolated-word recognition corpus consisted of 1000 words, preprocessed with a 20,000-word dictionary during bottom-up decoding. A group of five speakers (four male, one female), none of whom was involved in the training stage, were each presented with 200 words.
- The results therefore reflect the speaker-independent ability of the system.
- A total of 33 rules were applied during the top-down phase.
- In summary, the authors argue that fuzzy decision-making has several advantages over hierarchical control for rejecting lexical hypotheses. First, thresholds are delayed in the decision procedure.
- The multi-domain parameters produced by rules can be compared and rationally aggregated after the computation of the reliability measure.
- One direction for future work is the optimization of aggregation operators.
- In addition, the relevance measure could be used in other word-rejection settings: HMM-based speech recognition may benefit from a probability model estimated from reliability vectors, an approach currently being investigated in a speaker-independent vocal dictation system.
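The aggregation and rejection ideas above can be sketched as follows. `fusion1` follows the OWA interpretation given in a citing paper, in which only the lowest and highest scores receive non-zero weight. Choosing those weights as each score's distance from 0.5 is an assumption, and it makes the operator only OWA-style, since true OWA weights do not depend on the values. Three further pieces are hypothetical illustrations rather than the paper's definitions: the smooth min/mean blend that stands in for the experimental weight function w, its `threshold` parameter, and the two acceptance functions that contrast early (hierarchical) rejection with a delayed threshold.

```python
def fusion1(scores: list[float]) -> float:
    """OWA-style fusion: only the lowest and highest scores get non-zero weight."""
    ordered = sorted(scores)
    low, high = ordered[0], ordered[-1]
    # Assumed weights: a score counts more the further it is from 0.5 (ignorance).
    w_low, w_high = abs(low - 0.5), abs(high - 0.5)
    if w_low + w_high == 0.0:
        return 0.5  # both extremes sit at 0.5: pure ignorance
    # Normalized weights sum to 1, as OWA weights do.
    return (w_low * low + w_high * high) / (w_low + w_high)


def min_mean_aggregate(scores: list[float], threshold: float = 0.3) -> float:
    """Hypothetical stand-in for w: min-like when some score is low, mean-like otherwise."""
    m = min(scores)
    mean = sum(scores) / len(scores)
    alpha = max(0.0, 1.0 - m / threshold)  # 1.0 when m == 0, 0.0 once m >= threshold
    return alpha * m + (1.0 - alpha) * mean


def hierarchical_accept(scores: list[float], threshold: float = 0.5) -> bool:
    # Early decision: a single low score rejects the lexical hypothesis.
    return all(s >= threshold for s in scores)


def fuzzy_accept(scores: list[float], threshold: float = 0.5) -> bool:
    # Delayed decision: aggregate every score first, then apply one threshold.
    return min_mean_aggregate(scores) >= threshold


word_scores = [0.45, 0.9, 0.85]  # toy scores for one lexical hypothesis
print(fusion1([0.2, 0.5, 0.9]))
print(hierarchical_accept(word_scores), fuzzy_accept(word_scores))
```

With these toy scores, the hierarchical check rejects the word because of the single 0.45 score. The delayed threshold accepts it because the other scores are strong and no score is low enough to pull the blend toward the min.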
Methods cited from "A fuzzy acoustic-phonetic decoder for speech recognition"
...Applied to a phonetic hypothesis, each rule returns a numeric parameter related to a fuzzy number via a procedure described in [...]: a fuzzy set of rule parameters is built and the membership function CR(·) is drawn from one-speaker database histograms....
Background cited from "A fuzzy acoustic-phonetic decoder for speech recognition"
...If the N values of cij are ordered for a given phoneme j (c1j is the lowest and cNj the highest reliability score), fusion1 can be seen as an OWA operator [7] whose weight vector [wi], 1 ≤ i ≤ N, is null except for the first and last weights, expressing that the c1j and cNj scores weigh more the further they are from 0.5....