Showing papers on "Speech coding published in 1991"


Journal ArticleDOI
TL;DR: The influence of several variables on PRMA efficiency, defined as the number of conversations per channel, is examined and it is found that with 32-kb/s speech coding and 720-kb/s transmission (22.5 channels), PRMA supports up to 37 simultaneous conversations, or 1.64 conversations per channel.
Abstract: Packet-reservation multiple access (PRMA) is viewed as a merger of slotted ALOHA and time-division multiple access (TDMA). Dispersed terminals transmit packets of speech information to a central base station. When its speech activity detector indicates the beginning of a talkspurt, a terminal contends with other terminals for access to an available time slot. After the base station detects the first packet in the talkspurt, the terminal reserves future time slots for transmission of subsequent speech packets. The influence of several variables on PRMA efficiency, defined as the number of conversations per channel, is examined. The number of channels is the ratio of transmission rate to speech coding rate. It is found that with 32-kb/s speech coding and 720-kb/s transmission (22.5 channels), PRMA supports up to 37 simultaneous conversations, or 1.64 conversations per channel. The number of conversations per channel is at least 1.5 over a wide range of packet sizes (8 ms of speech per packet to 34 ms) and for all systems with 16 or more channels (transmission rate ≥512 kb/s, with 32-kb/s speech coding). Other factors studied are the sensitivity of the speech activity detector, the retransmission probability of the contention scheme, and the maximum time delay for the transmission of speech packets.

433 citations
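
The channel arithmetic quoted in the PRMA entry above can be checked directly. This is a minimal sketch in Python; the function name `prma_efficiency` is illustrative and does not come from the paper.

```python
def prma_efficiency(transmission_kbps, coding_kbps, conversations):
    """Return (channels, conversations per channel).

    The number of channels is the ratio of transmission rate to speech
    coding rate, as defined in the abstract.
    """
    channels = transmission_kbps / coding_kbps
    return channels, conversations / channels


channels, efficiency = prma_efficiency(720, 32, 37)
print(channels)              # 22.5
print(round(efficiency, 2))  # 1.64
```

The same ratio gives 16 channels at 512 kb/s, the threshold the abstract cites for sustaining at least 1.5 conversations per channel.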


Journal ArticleDOI
TL;DR: The algorithms are evaluated with respect to improving automatic recognition of speech in the presence of additive noise and shown to outperform other enhancement methods in this application.
Abstract: The basis of an improved form of iterative speech enhancement for single-channel inputs is sequential maximum a posteriori estimation of the speech waveform and its all-pole parameters, followed by imposition of constraints upon the sequence of speech spectra. The approaches impose intraframe and interframe constraints on the input speech signal. Properties of the line spectral pair representation of speech allow for an efficient and direct procedure for application of many of the constraint requirements. Substantial improvement over the unconstrained method is observed in a variety of domains. Informed listener quality evaluation tests and objective speech quality measures demonstrate the technique's effectiveness for additive white Gaussian noise. A consistent terminating point of the iterative technique is shown. The current systems result in substantially improved speech quality and linear predictive coding (LPC) parameter estimation with only a minor increase in computational requirements. The algorithms are evaluated with respect to improving automatic recognition of speech in the presence of additive noise and shown to outperform other enhancement methods in this application.

263 citations


Book
01 Sep 1991
TL;DR: In 25 original chapter-articles, leading authorities address various aspects of speech signal processing, stressing the advances during the past five to ten years.
Abstract: In 25 original chapter-articles, leading authorities address various aspects of speech signal processing, stressing the advances during the past five to ten years. The volume presents a wealth of material, in a variety of styles, and is divided into four sections: analysis and coding (nine chapters)

248 citations


Journal ArticleDOI
Y. Medan1, E. Yair1, D. Chazan1
TL;DR: Based on a new similarity model for the voice excitation process, a novel pitch determination procedure is derived that has infinite (super) resolution, better accuracy than the difference limen for F0, robustness to noise, reliability, and modest computational complexity.
Abstract: Based on a new similarity model for the voice excitation process, a novel pitch determination procedure is derived. The unique features of the proposed algorithm are infinite (super) resolution, better accuracy than the difference limen for F0, robustness to noise, reliability, and modest computational complexity. The algorithm is instrumental to speech processing applications which require pitch synchronous spectral analysis. The computational complexity of the proposed algorithm is well within the capacity of modern digital signal processing (DSP) technology and therefore can be implemented in real time.

232 citations


Patent
28 Oct 1991
TL;DR: In this paper, a CELP speech processor utilizes an organized, non-overlapping, algebraic codebook containing a predetermined number of vectors, uniformly distributed over a multi-dimensional sphere to generate a remaining speech residual.
Abstract: Apparatus and method for encoding speech using a codebook excited linear predictive (CELP) speech processor and an algebraic codebook for use therewith. The CELP speech processor receives a digital speech input representative of human speech and performs linear predictive code analysis and perceptual weighting filtering to produce short-term speech information and long-term speech information. The CELP speech processor utilizes an organized, non-overlapping, algebraic codebook containing a predetermined number of vectors, uniformly distributed over a multi-dimensional sphere, to generate a remaining speech residual. The short-term speech information, long-term speech information and remaining speech residual are combinable to form a quality reproduction of the digital speech input.

230 citations


PatentDOI
TL;DR: In this article, an enhanced wide area audio response network (AWAN) is proposed, which includes a central controller and a plurality of audio peripherals distributed over a wide area, each audio peripheral being connected to telephone lines for receiving and originating telephone calls, converting received analog audio signals into digital representations, recording and storing digital representations.
Abstract: Apparatus and method to provide enhanced wide area audio response services through an enhanced wide area audio response network which includes a central controller and a plurality of audio peripherals distributed over a wide area, each audio peripheral being connected to telephone lines for receiving and originating telephone calls, converting received analog audio signals into digital representations, recording and storing digital representations, converting stored digital representations into analog audio signals, playing audio signals over connected telephone lines, and communicating with, including receiving commands from, the central controller over a Packet Switched Public Data Network (PSPDN), which controller is a highly reliable general purpose controller which offers utility grade service to each audio peripheral and utilizes Dialed Number Identification Service (DNIS) tables for various applications, including voice messaging, audio text, remote information provider accessing, and testing to provide error notification.

192 citations


PatentDOI
Juin-Hwey Chen1
TL;DR: In this paper, a low-bitrate (typically 8 kbit/s or less), low-delay digital coder and decoder based on Code Excited Linear Prediction for speech and similar signals features backward adaptive adjustment for codebook gain and short-term synthesis filter parameters and forward adaptive adjustment of long-term (pitch) synthesis filter parameter.
Abstract: A low-bitrate (typically 8 kbit/s or less), low-delay digital coder and decoder based on Code Excited Linear Prediction for speech and similar signals features backward adaptive adjustment for codebook gain and short-term synthesis filter parameters and forward adaptive adjustment of long-term (pitch) synthesis filter parameters. A highly efficient, low delay pitch parameter derivation and quantization permits overall delay which is a fraction of prior coding delays for equivalent speech quality at low bitrates.

166 citations


Journal ArticleDOI
TL;DR: It is shown that there is a difference in error sensitivity of four orders of magnitude between the most and the least sensitive bits of the speech coder, and a family of rate-compatible punctured convolutional codes with flexible unequal error protection capabilities has been matched to the speech coder.
Abstract: The effects of digital transmission errors on a family of variable-rate embedded subband speech coders (SBC) are analyzed in detail. It is shown that there is a difference in error sensitivity of four orders of magnitude between the most and the least sensitive bits of the speech coder. As a result, a family of rate-compatible punctured convolutional codes with flexible unequal error protection capabilities has been matched to the speech coder. These codes are optimally decoded with the Viterbi algorithm. Among the results, analysis and informal listening tests show that with a 4-level unequal error protection scheme transmission of 12 kb/s speech is possible with very little degradation in quality over a 16 kb/s channel with an average bit error rate (BER) of 2×10^-2 at a vehicle speed of 60 m.p.h. and with interleaving over two 16 ms speech frames.

143 citations
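
The rate budget behind the unequal error protection result above can be illustrated with a little arithmetic: 12 kb/s speech over a 16 kb/s channel fixes the average code rate at 3/4. In the sketch below, the frame and rate figures come from the abstract, but the 4-level split (class sizes and per-class code rates) is entirely hypothetical, chosen only so that the numbers add up; the abstract does not give the actual allocation, and tail bits are ignored.

```python
from fractions import Fraction

FRAME_MS = 16        # speech frame length from the abstract
SPEECH_KBPS = 12     # source rate from the abstract
CHANNEL_KBPS = 16    # channel rate from the abstract

speech_bits = SPEECH_KBPS * FRAME_MS    # kb/s * ms = bits per frame
channel_bits = CHANNEL_KBPS * FRAME_MS
average_rate = Fraction(speech_bits, channel_bits)

# HYPOTHETICAL 4-level split: (information bits, code rate) per protection
# class, most sensitive bits first. These values are NOT from the paper.
levels = [
    (32, Fraction(1, 2)),
    (48, Fraction(2, 3)),
    (32, Fraction(4, 5)),
    (80, Fraction(1, 1)),   # least sensitive bits sent uncoded
]

# Coded bits per class = information bits / code rate.
coded_bits = sum(n / rate for n, rate in levels)

print(speech_bits, channel_bits, average_rate)  # 192 256 3/4
print(coded_bits)                               # 256
```

The point of the sketch is only that stronger codes on a few sensitive bits can be paid for by weaker or no coding on the rest while the frame still fits the channel.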


Journal ArticleDOI
TL;DR: Seven postlinguistically deaf adults implanted with the Nucleus Multi-Electrode Cochlear Implant participated in an evaluation of speech perception performance with three speech processors, finding an increase in their ability to communicate with other people using the Mini Speech Processor (Multi-Peak speech coding strategy) compared with the Wearable Speech Processor in everyday life.
Abstract: Seven postlinguistically deaf adults implanted with the Nucleus Multi-Electrode Cochlear Implant participated in an evaluation of speech perception performance with three speech processors: the Wearable Speech Processor (WSP III), a prototype of the Mini Speech Processor, and the Mini Speech Processor. The first experiment was performed with the prototype and Wearable Speech Processor both programmed using the F0F1F2 speech coding strategy. The second experiment compared performance with the Mini Speech Processor programmed with the Multi-Peak speech coding strategy and the Wearable Speech Processor programmed with the F0F1F2 speech coding strategy. Performance was evaluated in the sound-only condition using recorded speech tests presented in quiet and in noise. Questionnaires and informal reports provided information about use in everyday life. In experiment I, there was no significant difference in performance using the Wearable Speech Processor and prototype on any of the tests. Nevertheless, six out of seven subjects preferred the prototype for use in everyday life. In experiment II, performance on open-set tests in quiet and noise was significantly higher with the Mini Speech Processor (Multi-Peak speech coding strategy) than with the Wearable Speech Processor. Subjects reported an increase in their ability to communicate with other people using the Mini Speech Processor (Multi-Peak speech coding strategy) compared with the Wearable Speech Processor in everyday life.

136 citations


Book
30 Sep 1991
TL;DR: A text in channel coding, decoding algorithms, and compression of data and speech, designed for both classroom and research use.
Abstract: A text in channel coding, decoding algorithms, and compression of data and speech, designed for both classroom and research use. There is a special emphasis on the algorithms employed in the field.

129 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: A modular software TTS (text-to-speech) system for Greek with good intelligibility and quality of speech and the possibility of being further improved by extending its linguistic knowledge is presented.
Abstract: A modular software TTS (text-to-speech) system for Greek with good intelligibility and quality of speech and the possibility of being further improved by extending its linguistic knowledge is presented. The system has several peculiarities in comparison to most systems for other languages, combining the advantages of formant synthesis with those of diphone synthesis. In addition to the text normalizer (including numbers) and the sophisticated text preprocessor, the system uses composite speech segments besides phonemes which are concatenated, using a dynamically-adjusted-in-range, sigmoid function. The segments are coded in a novel scheme aiding the rules which manipulate voice onset and duration times. A declined line with its ending part dependent on the punctuation mark or function word, which fluctuates according to the stressed points and unvoiced (voiced) consonant and plosive locations, controls the intonation of the input text. The system is to be improved by incorporating elaborate prosodic rules resulting from syntactic analysis of the text.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: An efficient procedure for searching such a large codebook deploying a focused search strategy, where less than 0.1% of the codebook is searched with performance very close to that of a full search is described.
Abstract: The application of algebraic code excited linear prediction (ACELP) coding to wideband speech is presented. An algebraic codebook with a 20 bit address can be used without any storage requirements and, more importantly, with a very efficient search procedure which allows for real-time implementation. The authors describe an efficient procedure for searching such a large codebook deploying a focused search strategy, where less than 0.1% of the codebook is searched with performance very close to that of a full search. High-quality speech at a bit rate of 13 kbps was obtained.
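
To give a sense of scale for the focused search in the entry above, the sketch below works out how many codevectors a 20-bit address implies and what the "less than 0.1%" figure from the TL;DR means as an upper bound. The constant names are illustrative.

```python
ADDRESS_BITS = 20
SEARCH_FRACTION = 0.001  # "less than 0.1%", treated here as an upper bound

codebook_size = 2 ** ADDRESS_BITS
max_searched = int(codebook_size * SEARCH_FRACTION)

print(codebook_size)  # 1048576
print(max_searched)   # 1048
```

So the focused search examines on the order of a thousand candidates out of roughly a million.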

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The exploitation of left-right correlation in a subband code for stereophonic audio signals is investigated and preliminary results of a stereo codec are promising: at 192 kb/s good coding results have been obtained.
Abstract: The exploitation of left-right correlation in a subband code for stereophonic audio signals is investigated. A transform of left and right signals into decorrelated intensity and error signals is presented. Although this can be seen as the optimal exploitation of redundancy, it yields only marginal gain in bit rate. If the reduced phase-sensitivity of the human observer can be exploited by encoding only the intensity signal, a substantial gain can be obtained. Preliminary results of a stereo codec are promising: at 192 kb/s good coding results have been obtained.
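
One standard way to turn two correlated channels into a dominant signal plus an uncorrelated residual is a 2×2 rotation onto the principal axes of the left/right second moments. The sketch below illustrates that idea for the entry above; whether the paper uses exactly this rotation is an assumption, and the function name and sample values are made up for illustration.

```python
import math


def decorrelating_rotation(left, right):
    """Rotate (left, right) onto principal axes so the outputs are uncorrelated.

    Returns (intensity, error, theta). Uses raw second moments without mean
    removal, which is reasonable for zero-mean audio.
    """
    r_ll = sum(l * l for l in left)
    r_rr = sum(r * r for r in right)
    r_lr = sum(l * r for l, r in zip(left, right))
    # Principal-axis angle: tan(2*theta) = 2*r_lr / (r_ll - r_rr).
    theta = 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)
    c, s = math.cos(theta), math.sin(theta)
    intensity = [c * l + s * r for l, r in zip(left, right)]
    error = [-s * l + c * r for l, r in zip(left, right)]
    return intensity, error, theta


# Illustrative, nearly identical channels (not data from the paper).
left = [1.0, 2.0, -1.0, -2.0]
right = [0.9, 2.1, -1.1, -1.9]
intensity, error, theta = decorrelating_rotation(left, right)
cross = sum(i * e for i, e in zip(intensity, error))
```

For highly correlated inputs almost all of the energy lands in `intensity`, which is why discarding or coarsely coding `error` is where a bit-rate gain would have to come from.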

Book ChapterDOI
Ira A. Gerson1, Mark A. Jasiuk1
01 Jan 1991
TL;DR: The VSELP speech coder was designed to achieve the highest possible speech quality with reasonable computational complexity while providing robustness to channel errors.
Abstract: Vector Sum Excited Linear Prediction falls into the class of speech coders known as Code Excited Linear Prediction (CELP) (also called Vector Excited or Stochastically Excited) [1,4,5]. The VSELP speech coder was designed to achieve the highest possible speech quality with reasonable computational complexity while providing robustness to channel errors. These goals are essential for wide acceptance of low data rate (4.8-8 kbps) speech coding for telecommunications applications.

PatentDOI
TL;DR: A speech coder apparatus operates to compress speech signals to a low bit rate and includes a continuous speech recognizer (CSR) which has a memory for storing templates.
Abstract: A speech coder apparatus operates to compress speech signals to a low bit rate. The apparatus includes a continuous speech recognizer (CSR) which has a memory for storing templates. Input speech is processed by the CSR where information in the speech is compared against the templates to provide an output digital signal indicative of recognized words, which signal is transmitted along a first path. There is further included a front end processor which is also responsive to the input speech signal for providing output digitized speech samples during a given frame interval. A side information encoder circuit responds to the output from the front end processor to provide at the output of the encoder a parameter signal indicative of the value of the pitch and word duration for each word as recognized by the CSR unit. The output of the encoder is transmitted as a second signal. There is a receiver which includes a synthesizer responsive to the first and second transmitted signals for providing an output synthesized signal for each recognized word where the pitch, duration and amplitude of the synthesized signal is changed according to the parameter signal to preserve the quality of the synthesized speech.

PatentDOI
TL;DR: In this article, an adaptive filtering technique is applied to sequences of energy estimates in each of two signal channels, one channel containing speech and environmental noise and the other channel containing primarily the same environmental noise.
Abstract: A digital signal processing system applies an adaptive filtering technique to sequences of energy estimates in each of two signal channels, one channel containing speech and environmental noise and the other channel containing primarily the same environmental noise. From the channel containing primarily environmental noise, a prediction is made of the energy of that noise in the channel containing both the speech and that noise, so that the noise can be extracted from the mixture of speech and noise. The result is that the speech will be more easily recognizable by either human listeners or speech recognition systems.


Book
08 Jan 1991
TL;DR: This chapter discusses Digital Signal Processing methods, Information Theory and Probability Models, and some Useful Practical Classes of Random Processes.
Abstract: Preface; Acknowledgement; Symbols; Abbreviations.
Part I Basic Digital Signal Processing
1 Introduction: 1.1 Signals and Information; 1.2 Signal Processing Methods; 1.3 Applications of Digital Signal Processing; 1.4 Summary
2 Fourier Analysis and Synthesis: 2.1 Introduction; 2.2 Fourier Series: Representation of Periodic Signals; 2.3 Fourier Transform: Representation of Nonperiodic Signals; 2.4 Discrete Fourier Transform; 2.5 Short-Time Fourier Transform; 2.6 Fast Fourier Transform (FFT); 2.7 2-D Discrete Fourier Transform (2-D DFT); 2.8 Discrete Cosine Transform (DCT); 2.9 Some Applications of the Fourier Transform; 2.10 Summary
3 z-Transform: 3.1 Introduction; 3.2 Derivation of the z-Transform; 3.3 The z-Plane and the Unit Circle; 3.4 Properties of z-Transform; 3.5 z-Transfer Function, Poles (Resonance) and Zeros (Anti-resonance); 3.6 z-Transform of Analysis of Exponential Transient Signals; 3.7 Inverse z-Transform; 3.8 Summary
4 Digital Filters: 4.1 Introduction; 4.2 Linear Time-Invariant Digital Filters; 4.3 Recursive and Non-Recursive Filters; 4.4 Filtering Operation: Sum of Vector Products, A Comparison of Convolution and Correlation; 4.5 Filter Structures: Direct, Cascade and Parallel Forms; 4.6 Linear Phase FIR Filters; 4.7 Design of Digital FIR Filter-banks; 4.8 Quadrature Mirror Sub-band Filters; 4.9 Design of Infinite Impulse Response (IIR) Filters by Pole-zero Placements; 4.10 Issues in the Design and Implementation of a Digital Filter; 4.11 Summary
5 Sampling and Quantisation: 5.1 Introduction; 5.2 Sampling a Continuous-Time Signal; 5.3 Quantisation; 5.4 Sampling Rate Conversion: Interpolation and Decimation; 5.5 Summary
Part II Model-Based Signal Processing
6 Information Theory and Probability Models: 6.1 Introduction: Probability and Information Models; 6.2 Random Processes; 6.3 Probability Models of Random Signals; 6.4 Information Models; 6.5 Stationary and Non-Stationary Random Processes; 6.6 Statistics (Expected Values) of a Random Process; 6.7 Some Useful Practical Classes of Random Processes; 6.8 Transformation of a Random Process; 6.9 Search Engines: Citation Ranking; 6.10 Summary
7 Bayesian Inference: 7.1 Bayesian Estimation Theory: Basic Definitions; 7.2 Bayesian Estimation; 7.3 Expectation Maximisation Method; 7.4 Cramer-Rao Bound on the Minimum Estimator Variance; 7.5 Design of Gaussian Mixture Models (GMM); 7.6 Bayesian Classification; 7.7 Modelling the Space of a Random Process; 7.8 Summary
8 Least Square Error, Wiener-Kolmogorov Filters: 8.1 Least Square Error Estimation: Wiener-Kolmogorov Filter; 8.2 Block-Data Formulation of the Wiener Filter; 8.3 Interpretation of Wiener Filter as Projection in Vector Space; 8.4 Analysis of the Least Mean Square Error Signal; 8.5 Formulation of Wiener Filters in the Frequency Domain; 8.6 Some Applications of Wiener Filters; 8.7 Implementation of Wiener Filters; 8.8 Summary
9 Adaptive Filters: Kalman, RLS, LMS: 9.1 Introduction; 9.2 State-Space Kalman Filters; 9.3 Sample Adaptive Filters; 9.4 Recursive Least Square (RLS) Adaptive Filters; 9.5 The Steepest-Descent Method; 9.6 LMS Filter; 9.7 Summary
10 Linear Prediction Models: 10.1 Linear Prediction Coding; 10.2 Forward, Backward and Lattice Predictors; 10.3 Short-Term and Long-Term Predictors; 10.4 MAP Estimation of Predictor Coefficients; 10.5 Formant-Tracking LP Models; 10.6 Sub-Band Linear Prediction Model; 10.7 Signal Restoration Using Linear Prediction Models; 10.8 Summary
11 Hidden Markov Models: 11.1 Statistical Models for Non-Stationary Processes; 11.2 Hidden Markov Models; 11.3 Training Hidden Markov Models; 11.4 Decoding Signals Using Hidden Markov Models; 11.5 HMM in DNA and Protein Sequences; 11.6 HMMs for Modelling Speech and Noise; 11.7 Summary
12 Eigenvector Analysis, Principal Component Analysis and Independent Component Analysis: 12.1 Introduction - Linear Systems and Eigenanalysis; 12.2 Eigenvectors and Eigenvalues; 12.3 Principal Component Analysis (PCA); 12.4 Independent Component Analysis; 12.5 Summary
Part III Applications of Digital Signal Processing to Speech, Music and Telecommunications
13 Music Signal Processing and Auditory Perception: 13.1 Introduction; 13.2 Musical Notes, Intervals and Scales; 13.3 Musical Instruments; 13.4 Review of Basic Physics of Sounds; 13.5 Music Signal Features and Models; 13.6 Anatomy of the Ear and the Hearing Process; 13.7 Psychoacoustics of Hearing; 13.8 Music Coding (Compression); 13.9 High Quality Audio Coding: MPEG Audio Layer-3 (MP3); 13.10 Stereo Music Coding; 13.11 Summary
14 Speech Processing: 14.1 Speech Communication; 14.2 Acoustic Theory of Speech: The Source-filter Model; 14.3 Speech Models and Features; 14.4 Linear Prediction Models of Speech; 14.5 Harmonic Plus Noise Model of Speech; 14.6 Fundamental Frequency (Pitch) Information; 14.7 Speech Coding; 14.8 Speech Recognition; 14.9 Summary
15 Speech Enhancement: 15.1 Introduction; 15.2 Single-Input Speech Enhancement Methods; 15.3 Speech Bandwidth Extension - Spectral Extrapolation; 15.4 Interpolation of Lost Speech Segments - Packet Loss Concealment; 15.5 Multi-Input Speech Enhancement Methods; 15.6 Speech Distortion Measurements; 15.7 Summary
16 Echo Cancellation: 16.1 Introduction: Acoustic and Hybrid Echo; 16.2 Telephone Line Hybrid Echo; 16.3 Hybrid (Telephone Line) Echo Suppression; 16.4 Adaptive Echo Cancellation; 16.5 Acoustic Echo; 16.6 Sub-Band Acoustic Echo Cancellation; 16.7 Echo Cancellation with Linear Prediction Pre-whitening; 16.8 Multi-Input Multi-Output Echo Cancellation; 16.9 Summary
17 Channel Equalisation and Blind Deconvolution: 17.1 Introduction; 17.2 Blind Equalisation Using Channel Input Power Spectrum; 17.3 Equalisation Based on Linear Prediction Models; 17.4 Bayesian Blind Deconvolution and Equalisation; 17.5 Blind Equalisation for Digital Communication Channels; 17.6 Equalisation Based on Higher-Order Statistics; 17.7 Summary
18 Signal Processing in Mobile Communication: 18.1 Introduction to Cellular Communication; 18.2 Communication Signal Processing in Mobile Systems; 18.3 Capacity, Noise, and Spectral Efficiency; 18.4 Multi-path and Fading in Mobile Communication; 18.5 Smart Antennas - Space-Time Signal Processing; 18.6 Summary
Index

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Measurements were made of the correlation dimension of normally spoken speech from a single speaker, and the results reveal that most of the points in the state space of the signal lie very close to a manifold of a dimensionality of less than three, indicating that one should be able to construct a nonlinear predictor for speech that significantly outperforms linear predictors.
Abstract: Measurements were made of the correlation dimension of normally spoken speech from a single speaker, and the results reveal that most of the points in the state space of the signal lie very close to a manifold of a dimensionality of less than three. This result indicates that one should be able to construct a nonlinear predictor for speech that significantly outperforms linear predictors. To validate this conclusion, a nonparametric predictor was constructed which was able to produce a prediction gain approximately 3 dB better than an equivalent linear predictor. Similar improvements in signal-to-noise ratio were also observed when the nonlinear predictor was added to a simple speech coder.

Patent
02 May 1991
TL;DR: In this article, a supervisory circuit for use with an audio intrusion detection system is disclosed, in which the supervisory circuits periodically generate an audio test signal which is supplied to a sounder, which emits audio test sound.
Abstract: A supervisory circuit for use with an audio intrusion detection system is disclosed. The supervisory circuit periodically generates an audio test signal which is supplied to a sounder which emits an audio test sound. The audio test sound is directed into the same volume of space that the audio intrusion detection system is directed to monitor. The audio intrusion detection system detects the test sound and generates an audio test signal in response thereto. During the generation of the audio test sound, the comparing apparatus of the audio intrusion detection system is disabled. The audio test signal generated by the audio intrusion detection system is then compared to a test threshold signal. A test result signal is generated in response to the comparison, with the test result signal indicative of the operability of the audio intrusion detection system.

PatentDOI
TL;DR: A CELP type speech coding system is provided with an arithmetic processing unit which transforms a perceptually weighted input speech signal vector AX to a vector tAAX, a sparse adaptive codebook which stores a plurality of pitch prediction residual vectors P sparsed by a sparse unit, and a multiplying unit which multiplies the successively read out vectors P and the output tAAX from the arithmetic processing unit.
Abstract: A speech coding and decoding system operated under a known code-excited linear prediction (CELP) coding method. The CELP coding is achieved by selecting an optimum pitch vector P from an adaptive codebook and the corresponding first gain, and at the same time, selecting an optimum code vector from a stochastic codebook and the corresponding second gain. The system of the present invention features a weighted orthogonalization transforming unit. The perceptually weighted code vector AC is not used as is, as usual, but only after this unit has transformed it into a perceptually weighted code vector AC', which is made orthogonal to the optimum perceptually weighted pitch vector AP.

Journal ArticleDOI
TL;DR: A new method based on the assumption that, for voiced speech, a perceptually accurate speech signal can be reconstructed from a description of the waveform of a single, representative pitch cycle per interval of 20-30 ms is presented, which retains the natural quality of coders which encode the entire waveform, but requires a bit rate close to that of the parametric coders.

Proceedings ArticleDOI
Willem Bastiaan Kleijn1
14 Apr 1991
TL;DR: A novel method of coding voiced speech is introduced, which transmits an encoded prototype waveform at 20-30 ms intervals, and is quantized using analysis-by-synthesis methods, which results in excellent speech quality at rates between 3.0 and 4.0 kb/s.
Abstract: A major source of audible distortion in current low-bit-rate speech coding algorithms is an inaccurate degree of periodicity of the voiced speech signal. If the correlations between neighboring pitch cycles are accurately reproduced, these audible distortions can be reduced significantly. To this purpose, a novel method of coding voiced speech is introduced, which transmits an encoded prototype waveform at 20-30 ms intervals. The prototype waveform describes a pitch cycle representative for the interval, and is quantized using analysis-by-synthesis methods. The speech signal is reconstructed by concatenation of interpolated prototype waveforms. The short-term and the long-term correlations between pitch cycles can be controlled explicitly. Unquantized reconstructed speech is virtually indistinguishable from the original signal. The method results in excellent speech quality at rates between 3.0 and 4.0 kb/s.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Theoretical and practical aspects of the MUSICAM (masking pattern adapted universal subband integrated coding and multiplexing) system are presented and how it has been designed so as to meet the technical requirements of most applications.
Abstract: Theoretical and practical aspects of the MUSICAM (masking pattern adapted universal subband integrated coding and multiplexing) system are presented. The system is briefly described. It is one of the few codecs able to achieve high audio quality at bit-rates in the range of 64 to 192 kb/s per monophonic channel. It is shown how it has been designed so as to meet the technical requirements of most applications (low delay, low complexity, error robustness, short access units, etc.). Two examples of applications in the field of digital audio broadcasting and multimedia are given.

Proceedings ArticleDOI
Erik Ordentlich1, Yair Shoham1
14 Apr 1991
TL;DR: An enhanced noise weighting technique is proposed and its efficiency is demonstrated via subjective listening tests, in which the average rating for the proposed 32 kb/s LD-CELP was essentially equal to that of the 64 kb/s standard (G.722) CCITT wideband coder.
Abstract: The authors report on the use of the codebook-excited linear-predictive (CELP) algorithm for 32 kb/s low-delay (LD-CELP) coding of wideband speech. The main problem associated with wideband coding, namely, spectral noise weighting, is discussed. The authors propose an enhanced noise weighting technique and demonstrate its efficiency via subjective listening tests. In these tests, involving 20 listeners and 8 test sentences, the average rating for the proposed 32 kb/s LD-CELP was essentially equal to that of the 64 kb/s standard (G.722) CCITT wideband coder.

PatentDOI
TL;DR: A user-orientated telephone instrument including high-fidelity speech encoding, encryption capability and many functions presently provided by centralized call processing is described.
Abstract: A user-orientated telephone instrument including high-fidelity speech encoding, encryption capability and many functions presently provided by centralized call processing is described. Utilizing an ADPCM encoding process including forward error correction, a high quality relatively noise immune signal is provided. To the user, the instrument appears as a unique electronic telephone which offers many special services to facilitate communications with other people and with computers. Advanced features such as optional hands-off voice on/off-hook control, personal identification by voice print and remote chart presentation are provided.

PatentDOI
Joji Kane1, Akira Nohara1
TL;DR: In this article, a signal detection apparatus for detecting a noise-suppressed speech signal is described, in which a band division process including a Fourier transformation is performed for an inputted speech signal, thereby outputting spectrum signals of plural channels.
Abstract: There is disclosed a signal detection apparatus for detecting a noise-suppressed speech signal. In the signal detection apparatus, a band division process including a Fourier transformation is performed for an inputted speech signal, thereby outputting spectrum signals of plural channels. A cepstrum analysis process is performed for the spectrum signals, and a peak of the obtained cepstrum is detected in response to the cepstrum analysis result. Thereafter, a speech signal interval of the inputted noisy speech signal is detected in response to the detected peak, and a noise is predicted in the speech signal in response to the detected speech signal interval. Then, the predicted noise is canceled in the spectrum signals thereby outputting noise-suppressed spectrum signals. Finally, the noise-suppressed spectrum signals are combined and are inverse Fourier-transformed, thereby outputting a noise-suppressed speech signal.

PatentDOI
TL;DR: In this article, the quantizers designated for individual subbands are determined to minimize mean squared error distortion in the recreated signal while using no more than a predetermined number of quantization bits per window of speech.
Abstract: In an adaptive subband excited transform speech encoding system, a range of quantizers are available and are dynamically selected for each window of speech. The quantizers designated for individual subbands are determined to minimize mean squared error distortion in the recreated signal while using no more than a predetermined number of quantization bits per window of speech.
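
The idea of spending a fixed bit budget per window so as to minimise mean squared error can be sketched with a greedy allocation: each bit goes to the subband where it reduces distortion most. This is a generic textbook-style stand-in, not the patent's quantizer-selection procedure. The distortion model `variance * 2**(-2*bits)`, the function name, the per-band cap and the sample variances are all assumptions made for illustration.

```python
import heapq


def allocate_bits(variances, total_bits, max_bits_per_band=8):
    """Greedy allocation: give each bit to the band with the largest MSE drop.

    Assumed distortion model: D_i = variance_i * 2 ** (-2 * bits_i), so one
    extra bit removes three quarters of a band's current distortion.
    """
    bits = [0] * len(variances)
    # heapq is a min-heap, so store negated gains to pop the largest first.
    heap = [(-0.75 * v, i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            break  # every band has reached its cap
        _, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits_per_band:
            gain = 0.75 * variances[i] * 2 ** (-2 * bits[i])
            heapq.heappush(heap, (-gain, i))
    return bits


# Illustrative subband variances for one window (not from the patent).
allocation = allocate_bits([16.0, 4.0, 1.0, 0.25], total_bits=8)
print(allocation)
```

Under this model, bands with higher energy receive more bits and the total never exceeds the budget, which mirrors the constraint stated in the abstract.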

Book ChapterDOI
Peter Kroon1, Bishnu S. Atal1
01 Jan 1991
TL;DR: The pitch predictor removes the redundancy in a periodic speech signal by predicting the current signal from a linear combination of past versions of this signal.
Abstract: Pitch prediction [1] plays an important role in many speech coding systems such as multipulse [2] and vector or code-excited linear predictive coders [3]. The pitch predictor removes the redundancy in a periodic speech signal by predicting the current signal from a linear combination of past versions of this signal. The general form of an odd-order pitch predictor with delay M and predictor coefficients b(k) is given by Equation 1 (page 321 of the chapter; not reproduced in this listing).
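
For orientation, a commonly used way of writing an odd-order pitch predictor is shown below. The chapter's own equation is not available in this listing, so the order symbol p and the centring of the taps on the delay M are assumptions, not the authors' notation.

```latex
% Assumed form: p taps (p odd) centred on the pitch delay M
P(z) = 1 - \sum_{k=-(p-1)/2}^{(p-1)/2} b(k)\, z^{-(M+k)}
```

With p = 1 this reduces to the single-tap predictor 1 - b(0) z^{-M}; with p = 3 the taps sit at delays M-1, M and M+1.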

Patent
Tatsuya Yaguchi1
27 Aug 1991
TL;DR: In this article, a digital communication device of the present invention is provided with a modulation circuit for modulating a digital transmit signal, a first interpolator for converting the modulated signal in frequency, a coding circuit for coding the signal converted in frequency into an audio PCM transmission code with reference to a voice companding code table, a decoding circuit for decoding a coded PCM receive code into a digital signal, and a demodulation circuit for demodulating the converted signal, and digitally performs modem modulation/demodulation and voice codec processing.
Abstract: A digital communication device of the present invention is provided with a modulation circuit for modulating a digital transmit signal, a first interpolator for converting the modulated signal in frequency, a coding circuit for coding the signal converted in frequency into an audio PCM transmission code with reference to a voice companding code table, a decoding circuit for decoding a coded audio PCM receive code into a digital signal with reference to the voice companding code table, a second interpolator for converting the decoded digital signal in frequency, and a demodulation circuit for demodulating the converted signal, and digitally performs modem modulation/demodulation and voice codec processing.