
Showing papers on "Linear predictive coding" published in 1983


Proceedings ArticleDOI
Bishnu S. Atal
14 Apr 1983
TL;DR: The aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality.
Abstract: This paper describes a method for efficient coding of LPC log area parameters. It is now well recognized that sample-by-sample quantization of LPC parameters is not very efficient in minimizing the bit rate needed to code these parameters. Recent methods for reducing the bit rate have used vector and segment quantization methods. Much of the past work in this area has focussed on efficient coding of LPC parameters in the context of vocoders which put a ceiling on achievable speech quality. The results from these studies cannot be directly applied to synthesis of high quality speech. This paper describes a different approach to efficient coding of log area parameters. Our aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality. Speech events occur generally at non-uniformly spaced time intervals. Moreover, some speech events are slow while others are fast. Uniform sampling of speech parameters is thus not efficient. We describe a non-uniform sampling and interpolation procedure for efficient coding of log area parameters. A temporal decomposition technique is used to represent the continuous variation of these parameters as a linearly-weighted sum of a number of discrete elementary components. The location and length of each component is automatically adapted to speech events. We find that each elementary component can be coded as a very low information rate signal.

377 citations


Journal ArticleDOI
TL;DR: Recent results obtained in waveform coding of speech with vector quantization are reviewed; vector quantization appears to be a suitable technique for the dual requirement of coding both speech and voiceband data at 16 kb/s.
Abstract: Vector quantization (VQ), a new direction in source coding, has recently emerged as a powerful and widely applicable coding technique. It was first applied to analysis/synthesis of speech, and has allowed Linear Predictive Coding (LPC) rates to be dramatically reduced to 800 b/s with very slight reduction in quality, and further compressed to rates as low as 150 b/s while retaining intelligibility [1,2]. More recently, the technique has found its way to waveform coding [3-5], where its applicability and effectiveness are less obvious and not widely known. There is currently a great need for a low-complexity speech coder at the rate of 16 kb/s which attains essentially “toll” quality, roughly equivalent to that of standard 64-kb/s log PCM codecs. Adaptive DPCM schemes can attain this quality with low complexity for the proposed 32 kb/s CCITT standard, but at 16 kb/s the quality of ADPCM or adaptive delta modulation schemes is inadequate. More powerful methods, such as subband coding or transform coding, are capable of producing acceptable speech quality at 16 kb/s but have a much higher implementation complexity. The difficulty is further compounded by the need for a scheme that can handle both speech and voiceband data at the 16 kb/s rate. These two types of waveforms occupy the same bandwidth in the subscriber loop part of the telephone network, yet they have a widely different statistical character. Effective speech coding at this rate must be geared to the specific character of speech and must exploit our knowledge of human hearing. On the other hand, a waveform that carries data must be coded and later reconstructed so that a modem can still extract the data with an acceptably low error rate. This is purely a signal processing operation not involving human perception. Vector quantization appears to be a suitable coding technique which caters to this dual requirement.
VQ may become the key to 16 kb/s coding; it may also lead to improved quality waveform coding at 8 or 9.6 kb/s. In this paper, we review recent results obtained in waveform coding of speech with vector quantization and

198 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: It is demonstrated in this paper that this random quantizer used in the original vocoder is near-optimal by comparing it with quantizers that use clustering algorithms for quantizing speech segments.
Abstract: In this paper we investigate several methods for reducing the bit rate of a segment vocoder [1] by 35% to 150 b/s. In the original vocoder we used a random sample of vectors as a set of templates for vector quantization. We demonstrate in this paper that this random quantizer is near-optimal by comparing it with quantizers that use clustering algorithms for quantizing speech segments. The reduction of the bit rate of the segment vocoder was achieved primarily by using a segment network, i.e., not all segment templates are allowed to follow a given segment template. The spectral continuity of speech is used to determine the subset of templates that can be used to quantize an input segment. To achieve the low rate of 150 b/s, we also reduced the bit rate for coding pitch, gain, and segment duration. Finally, we present the bit allocation used for transmitting speech at 150 b/s as a single speaker segment vocoder.

178 citations


Journal ArticleDOI
TL;DR: This paper compares this algorithm to several alternative algorithms and studies the properties of the resulting code books to conclude that the various algorithms gave essentially identical code books.
Abstract: Vector quantization has been used in coding applications for several years. Recently, quantization of linear predictive coding (LPC) vectors has been used for speech coding and recognition. In these latter applications, the only method that has been used for deriving the vector quantizer code book from a set of training vectors is the one described by Linde, Buzo, and Gray. In this paper, we compare this algorithm to several alternative algorithms and also study the properties of the resulting code books. Our conclusion is that the various algorithms that we tried gave essentially identical code books.
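
The abstract above refers to code book training in the style of Linde, Buzo, and Gray. As a rough illustration only, the core iteration (nearest-codeword assignment followed by a centroid update) can be sketched as below. The function names are hypothetical, and squared Euclidean distance is an assumption made for simplicity; the abstract does not say which distortion measure was used for LPC vectors.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance, an assumption)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def lloyd_iterations(training, codebook, n_iter=20):
    """Refine a code book by alternating assignment and centroid update."""
    codebook = [list(c) for c in codebook]
    for _ in range(n_iter):
        # Assignment step: put each training vector in the cell of its nearest codeword.
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        # Update step: replace each codeword by the mean of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(lloyd_iterations(training, [(0.0, 0.0), (10.0, 10.0)]))
```

The full procedure usually also grows the code book by splitting codewords between rounds; that step is omitted here.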

169 citations


Journal ArticleDOI
TL;DR: In this article, a modification of LPC, called time-varying LPC, is proposed, which can be used to analyze nonstationary speech signals, where the coefficients of the linear combination of functions are obtained by the same least squares error technique used by the LPC.

155 citations


Proceedings ArticleDOI
14 Apr 1983
TL;DR: Subjective evaluation with the diagnostic rhyme test (DRT) finds the proposed techniques to be feasible for intelligible speech transmission at bit rates between 400 bits/sec and 200 bits/sec.
Abstract: Frame predictive vector quantization is developed to compress the bit rate for coding the LPC filter coefficients to under 250 bits/sec. An innovative LPC compression technique, matrix quantization, is also developed to compress the LPC filter coefficients to a rate under 150 bits/sec. Subjective evaluation with the diagnostic rhyme test (DRT) finds the proposed techniques to be feasible for intelligible speech transmission at bit rates between 400 bits/sec and 200 bits/sec.

146 citations


Journal ArticleDOI
A. Nadas
TL;DR: The currently used method of maximum likelihood, while heuristic, is shown to be superior under certain assumptions to another heuristic: the method of conditional maximum likelihood.
Abstract: The choice of method for training a speech recognizer is posed as an optimization problem. The currently used method of maximum likelihood, while heuristic, is shown to be superior under certain assumptions to another heuristic: the method of conditional maximum likelihood.

135 citations


Proceedings ArticleDOI
14 Apr 1983
TL;DR: This integrated pitch tracking algorithm is compared to three standard pitch tracking algorithms over a data base of 58 male and female speakers ranging from 6 to 87 years of age and is shown to exhibit superior performance.
Abstract: A pitch tracking algorithm is described which operates in the time domain from a conditioned linear prediction residual and applies dynamic programming to optimally determine both pitch and voicing. A set of candidate pitch values is derived from a correlation function applied to an LPC prediction residual which has been low pass filtered in voiced speech and high pass filtered in unvoiced speech by using a single pole filter based on the first reflection coefficient of LPC. A post processing technique using dynamic programming is used to obtain a smooth pitch contour. By incorporating the correlation values of the candidate pitch values, voicing state information and spectral change information into the penalty function of the dynamic programming, a voicing decision is obtained along with an optimum pitch value. This integrated pitch tracking algorithm is compared to three standard pitch tracking algorithms over a data base of 58 male and female speakers ranging from 6 to 87 years of age and is shown to exhibit superior performance.
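
The dynamic programming post-processing described above can be illustrated with a much reduced sketch. It assumes each frame supplies a list of (pitch, correlation score) candidates and uses a hypothetical transition penalty proportional to the absolute log pitch ratio. The voicing state and spectral change terms mentioned in the abstract are left out, so this is not the authors' penalty function.

```python
import math


def track_pitch(candidates, jump_penalty=1.0):
    """Pick one pitch per frame by minimizing cumulative cost over all paths.

    candidates[t] is a list of (pitch_hz, score) pairs; a higher score is better.
    jump_penalty is a hypothetical weight on pitch jumps between frames.
    """
    def jump(p, q):
        return jump_penalty * abs(math.log(p / q))

    cost = [[-s for _, s in candidates[0]]]  # local cost is the negated score
    back = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        row, ptr = [], []
        for p, s in candidates[t]:
            # Best predecessor for this candidate.
            j = min(range(len(prev)), key=lambda i: cost[-1][i] + jump(p, prev[i][0]))
            row.append(cost[-1][j] + jump(p, prev[j][0]) - s)
            ptr.append(j)
        cost.append(row)
        back.append(ptr)

    # Backtrack from the cheapest final candidate.
    i = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [candidates[t][k][0] for t, k in enumerate(path)]


frames = [
    [(100.0, 0.9), (200.0, 0.5)],
    [(100.0, 0.6), (200.0, 0.7)],  # locally prefers the octave error
    [(100.0, 0.9), (200.0, 0.4)],
]
print(track_pitch(frames))
```

In this toy input the middle frame scores 200 Hz slightly higher on its own, but the smoothness penalty keeps the contour at 100 Hz.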

120 citations


PatentDOI
TL;DR: In this paper, a segment is classified as "speech" if the energy of the signal is greater than an adaptively adjusted threshold defined as the maximum of scaled values of two separate envelope parameters, which both track the variation in energy over the sequence of frames of speech data.
Abstract: Silence suppression in speech synthesis systems is achieved by detecting and processing only segments of voice activity. A segment is classified as "speech" if the energy of the signal is greater than an adaptively adjusted threshold. The adaptively adjusted threshold is preferably defined as the maximum of scaled values of two separate envelope parameters, which both track the variation in energy over the sequence of frames of speech data. One contour is a slow-rising fast-falling value, which is updated only during unvoiced speech frames, and therefore tracks a lower envelope of the energy contour. This parameter in effect tracks an ambient noise level. The other parameter is a fast-rising slow-falling parameter, which is updated only during voiced speech frames, and thus tracks an upper envelope of the energy contour. (This in effect tracks the average speech level.) A nonsilent energy tracker and a silent energy tracker adjust corresponding energy values representing the energy contours.
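
A minimal sketch of the two-envelope threshold described above may help make the mechanism concrete. All constants (`noise_scale`, `speech_scale`, `slow`, `fast`) are hypothetical, and the sketch gates the envelope updates on its own speech/silence decision rather than on a separate voicing decision, which is a simplifying assumption.

```python
def detect_speech(energies, noise_scale=4.0, speech_scale=0.1, slow=0.05, fast=0.5):
    """Return one True (speech) / False (silence) decision per frame energy."""
    lower = upper = energies[0]  # both envelopes start at the first frame's energy
    decisions = []
    for e in energies:
        # Threshold is the maximum of the two scaled envelope values.
        threshold = max(noise_scale * lower, speech_scale * upper)
        is_speech = e > threshold
        decisions.append(is_speech)
        if is_speech:
            # Upper envelope: fast-rising, slow-falling.
            rate = fast if e > upper else slow
            upper += rate * (e - upper)
        else:
            # Lower envelope: slow-rising, fast-falling (ambient noise estimate).
            rate = slow if e > lower else fast
            lower += rate * (e - lower)
    return decisions


print(detect_speech([1.0, 1.0, 1.0, 100.0, 100.0, 1.0, 1.0]))
```

Taking the maximum of the two scaled envelopes means the threshold rises both in noisy surroundings and after loud speech, which is the behaviour the abstract describes.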

77 citations


PatentDOI
TL;DR: In this article, the residual signal derived from linear predictive coding (LPC) estimation is adaptively filtered, and then is used as the input to a conventional pitch estimation procedure, where the adaptive filtering step uses the first reflection coefficient ($k_1$) to realize a simple filter (e.g., $A(z) = (1 - k_1 z^{-1})^{-1}$).
Abstract: A voice messaging system, wherein linear predictive coding (LPC) parameters, pitch, and preferably other excitation information are derived from a human voice input, encoded, and transmitted and/or stored, to be called up later to provide a speech output which is nearly identical to the original speech input. The invention features adaptive filtering of the residual signal. The residual signal derived from LPC estimation is adaptively filtered, and then is used as the input to a conventional pitch estimation procedure. The adaptive filtering step uses the first reflection coefficient ($k_1$) to realize a simple filter (e.g., $A(z) = (1 - k_1 z^{-1})^{-1}$). This filter removes high frequency noise from the residual signal during voiced periods, but does not remove the high frequency energy which contains important information during the unvoiced periods of speech. Preferably the above preprocessing technique is also combined with a postprocessing technique, wherein dynamic programming is used to optimally track pitch and voicing information through successive frames.
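
Reading the filter in the abstract as the one-pole transfer function 1 / (1 - k1 z^-1), its difference equation is y[n] = x[n] + k1 * y[n-1]. The sketch below applies that recursion to a residual frame. The function name is hypothetical, and it assumes a sign convention in which k1 is positive for voiced frames (giving a low-pass response) and negative for unvoiced frames (giving a high-pass response); reflection coefficient sign conventions vary between texts.

```python
def one_pole_filter(residual, k1):
    """Filter a residual frame with y[n] = x[n] + k1 * y[n-1]."""
    if abs(k1) >= 1.0:
        raise ValueError("the filter is only stable for |k1| < 1")
    out = []
    y_prev = 0.0  # assumed zero initial state
    for x in residual:
        y_prev = x + k1 * y_prev
        out.append(y_prev)
    return out


# Impulse response for k1 = 0.5: a decaying geometric sequence.
print(one_pole_filter([1.0, 0.0, 0.0, 0.0], 0.5))
```

Because the single coefficient comes from the frame's own LPC analysis, the filter adapts per frame without any extra estimation step.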

57 citations


Proceedings ArticleDOI
14 Apr 1983
TL;DR: The Spectral Transform LP (STLP) method is proposed, which introduces amplitude transforms on the input spectrum and on the spectrum of the model to modify the error criterion and the model adopted in the standard LP analysis.
Abstract: The Linear Predictive (LP) method has been widely used in speech analysis, mainly because of the simple mathematical formulation of the model and the straightforward computation of its parameters. However, there still remain certain difficulties that cause errors in the result of the analysis. Often encountered are the errors due to harmonic structure of the excitation source. The Spectral Transform LP (STLP) method proposed in the present paper aims at reducing these errors. Amplitude transforms on the input spectrum and on the spectrum of the model are introduced to modify the error criterion and the model adopted in the standard LP analysis. We show by analyses of both synthetic and natural speech that the STLP method offers significant improvement over the standard LP method. A method of STLP speech synthesis using the standard LP model is proposed. A perceptual experiment confirms the superiority of the STLP method in analysis of speech.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: A model that allows accurate evaluation of the envelope of the reverberant speech, even when little prior information about the room characteristics is available, is proposed in the context of a multiband processing scheme, aiming at the enhancement of single microphone recorded reverberant speech signals.
Abstract: Acoustic environments can be treated as linear systems whose transmission properties are given by their impulse response functions. This basic model can be extended, under certain conditions, to describe the relationship between the envelopes of the input and output waveforms. Such a model is proposed in the context of a multiband processing scheme, aiming at the enhancement of single microphone recorded reverberant speech signals. The specific requirements of this model permit a simplified approach to the estimation of the envelope functions. The model allows accurate evaluation of the envelope of the reverberant speech, even when little prior information about the room characteristics is available. Speech enhancement can be then achieved after envelope deconvolution in each band, which recovers the envelope of the anechoic signal from the measured speech envelope, and final reconstruction of the speech waveform using the original phase function.

Proceedings ArticleDOI
Sharad Singhal, B. Atal
01 Apr 1983
TL;DR: The possibility that multi-pulse excitation can approximate the all-pole filter excitation sufficiently closely and obtain the optimum filter parameters for this excitation is examined.
Abstract: Present LPC analysis procedures assume that the input to the all-pole filter is white; the filter parameters are obtained by minimizing the mean-squared error between the filter output samples and their values obtained by linear prediction on the basis of past output samples. It is well known that these procedures often do not yield accurate filter parameters for periodic (or quasi-periodic) signals such as voiced speech. To compensate for the periodic nature of speech, an estimate of the excitation of the all-pole filter has to be made. Multi-pulse LPC obtains the best excitation for a specified bit rate by minimizing a weighted mean-squared criterion representing subjectively important differences between original and synthetic speech signals. In this paper we examine the possibility that multi-pulse excitation can approximate the all-pole filter excitation sufficiently closely and obtain the optimum filter parameters for this excitation.
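
For context, the conventional analysis that the abstract contrasts with multi-pulse LPC can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is an illustration of the baseline procedure only, not of the paper's multi-pulse method. The function names are hypothetical, and the predictor is written as x̂[n] = Σ a[k] x[n-k].

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, err): predictor coefficients a[0..order-1] for lags 1..order
    and the final mean-squared prediction error. Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

For this input the recursion recovers a first coefficient of 0.5 and a second of zero, as expected for a first-order process.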

Proceedings ArticleDOI
01 Apr 1983
TL;DR: This paper discusses several algorithms that can be used to reduce the transmission rate for LPC vocoded speech to around 300 to 400 b/s, with only a modest degradation in speech quality relative to that of fixed-rate 2400 b/s LPC vocoders.
Abstract: In this paper we discuss several algorithms that can be used to reduce the transmission rate for LPC vocoded speech to around 300 to 400 b/s, with only a modest degradation in speech quality relative to that of fixed-rate 2400 b/s LPC vocoders. We limit the discussion to vocoders that transmit information for single frames (as opposed to whole segments of speech). We start with vector quantization, which reduces the bit rate to around 800 b/s accompanied by a significant but tolerable loss in quality relative to a typical fixed-rate 2400 b/s vocoder. Then we reduce the frame rate using one of two techniques: Fixed-Rate Transmission with Variable Interpolation, or Optimal Variable-Frame-Rate Transmission. We also reduce the data rate necessary for the source parameters (pitch, voicing, gain) from 400 b/s to about 100 b/s by taking advantage of their statistical dependence on the spectrum and some perceptual factors. The final result at 300 b/s has a quality comparable to that of the fixed-rate 800 b/s vector quantization vocoder. At 400 b/s, the quality is, in many respects, better than that of the 800 b/s vocoder and comparable to the 2400 b/s LPC vocoder.

Proceedings ArticleDOI
C. Heron, R. Crochiere, R. Cox
01 Apr 1983
TL;DR: The results of informal listening tests indicate that the new designs offer performance comparable to existing ATC techniques while having complexities roughly three times that of existing 4 and 5 band sub-band coders.
Abstract: In this paper we report on a study of a technique for 32-band subband/transform coding at 16 kb/s. This approach occupies the middle range of algorithm complexities and frequency resolution between that of Sub-Band Coding (SBC) and Adaptive Transform Coding (ATC). Two designs for 16 kb/s 32-band coders have been simulated on a laboratory computer. The results of informal listening tests indicate that the new designs offer performance comparable to existing ATC techniques while having complexities roughly three times that of existing 4 and 5 band sub-band coders.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: The partial trigonometric moment problem is shown to provide a unifying framework for several speech modelling techniques, such as the classical LPC autoregressive model, the line spectral pairs and composite sinusoidal waves models, and the Toeplitz eigenvector model for formant extraction.
Abstract: The partial trigonometric moment problem is shown to provide a unifying framework for several speech modelling techniques, such as the classical LPC autoregressive model, the line spectral pairs and composite sinusoidal waves models, and the Toeplitz eigenvector model for formant extraction. From a mathematical viewpoint, this moment problem can be identified with an extension problem in the class of impedance functions or equivalently in the class of nonnegative definite Toeplitz matrices.

Journal ArticleDOI
TL;DR: Using these techniques, LPC encoded speech at 1200 bits/s is demonstrated to be of quality comparable to a constant rate LPC vocoder at 2400 bits/s.
Abstract: In LPC analysis, the speech signal is divided into frames each of which is represented by a vector of estimated vocal tract parameters, assumed to be constant throughout the frame. For many sounds, these parameters do not change significantly from one frame to the next, and some of them can often be adequately represented by previously transmitted values. In the LPC coding systems described in this paper, a number of alternative representations are considered for each frame. These representations (vectors) are combinations of PARCOR coefficients from the current frame and from previous frames. Several consecutive frames are analyzed at once, and all the possible sequences of PARCOR coefficient vectors are examined. The sequence which minimizes a preselected cost function is chosen for transmission, resulting in a reduced overall data rate. The examination of all the decision sequences is equivalent to a decision tree search, which is most efficiently accomplished through dynamic programming. Using these techniques, LPC encoded speech at 1200 bits/s is demonstrated to be of quality comparable to a constant rate LPC vocoder at 2400 bits/s.

Dissertation
01 Jan 1983

Journal ArticleDOI
TL;DR: A very small, flexible, high-quality, full-duplex 2.4-kbit/s linear predictive vocoder has been implemented with commercially available integrated circuits.
Abstract: A very small, flexible, high-quality, full-duplex 2.4-kbit/s linear predictive vocoder has been implemented with commercially available integrated circuits. This fully digital realization is based on a distributed signal processing architecture employing three Nippon Electric Company (NEC) µPD7720 signal processing interface (SPI) single-chip microcomputers. One SPI implements the LPC analyzer, a second implements the Gold pitch and voicing decision algorithm, while the third µPD7720 implements the excitation generator and synthesizer. An Intel 8085-based 8-bit microcomputer is used for data transfer, control and multiplexing functions, and communications with the host terminal. The LPC chip set achieves high flexibility by accepting run time initialization options from the Intel 8085. These parameters include choice of linear predictive model (≤ 15), analysis and synthesis frame size, and speech sampling frequency, as well as choice of speech input and output coding formats (linear or µ-255 law) and choice of analog or digital pre- and deemphasis. A total of 16 integrated circuits is used in the LPC vocoder with a power dissipation of 5.5 W and occupying 18 in² of circuit area.

Journal ArticleDOI
TL;DR: In this article, the authors examined the utility of linear predictive coding in reducing the amount of data storage required for signals gathered in ocean bottom seismology, and found that this scheme consistently introduced about 15 times (4 bits) less distortion than rounding the data, both in terms of the root-mean-square (rms) error and the maximum error, and that the rms distortion of the data was within a factor of 4 (2 bits) of the rate distortion bound on optimal encoding.
Abstract: This paper examines the utility of linear predictive coding in reducing the amount of data storage required for signals gathered in ocean bottom seismology. In this study, a set of 12 typical signals were repeatedly encoded with the storage allocated decreasing from an initial 12 bits per datum to 2. The error introduced was then compared to the performance achieved by simply rounding off the lowest bits of the data, and to estimates of the rate distortion limit. It was found that this scheme consistently introduced about 15 times (4 bits) less distortion than rounding the data, both in terms of the root-mean-square (rms) error and in terms of the maximum error. Moreover, the rms distortion of the data was within a factor of 4 (2 bits) of the rate distortion bound on optimal encoding. Thus, the scheme was seen to be an effective approach to the problem of data compression in the marine environment.
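
The bit equivalents quoted in parentheses appear to follow from taking the base-2 logarithm of the error ratio, on the assumption that each bit of resolution corresponds to a factor of two in amplitude error. Under that reading:

```latex
\[
\log_2 15 \approx 3.9 \approx 4 \ \text{bits}, \qquad \log_2 4 = 2 \ \text{bits}.
\]
```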

Proceedings ArticleDOI
01 Apr 1983
TL;DR: Results found are: (1) limited time sequence compression does not impose any negative effect on DP or its alternatives and (2) variable threshold scheme performs better than the fixed threshold scheme.
Abstract: This paper investigates the effect of LPC based time compression schemes on dynamic programming (DP) and its alternatives. Two compression schemes, one with fixed threshold and the other with variable threshold, both incorporated with two control factors, the rate of frame overlap and the step of interframe interval, are investigated. The test speech is a 40-word alpha-digit vocabulary pronounced by 10 males and 10 females. Results found are: (1) limited time sequence compression does not impose any negative effect on DP or its alternatives and (2) variable threshold scheme performs better than the fixed threshold scheme. More detailed discussion of the compression schemes and DP interaction is included.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: Although the modified adaptive predictor provided the best improvement in spectral error, results indicate the modified spectral subtraction method to be the most suitable for use with linear predictive coding systems.
Abstract: This paper presents a discussion and evaluation of several filtering techniques for suppressing narrowband background noise in speech signals. The methods discussed are a modified spectral subtraction technique, an inverse transform filter, an adaptive notch placement technique, an adaptive predictor, and a modification of the adaptive predictor. Performance of the filter methods is compared using a spectral error measurement and an area ratio parameter error measurement. Although the modified adaptive predictor provided the best improvement in spectral error, results indicate the modified spectral subtraction method to be the most suitable for use with linear predictive coding systems.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: This paper discusses the formulation of the problem, the techniques developed, and the results of a limited-scale intelligibility test, which indicate that no intelligibility improvement is obtained from the processing.
Abstract: The development and testing of an algorithm to enhance the intelligibility of speech degraded by an interfering talker are reported. This paper discusses the formulation of the problem, the techniques developed, and the results of a limited-scale intelligibility test. While the test results indicate that no intelligibility improvement is obtained from the processing, several promising new directions for this problem have been identified.

Journal ArticleDOI
TL;DR: A 2:1 compression and expansion system that has been used as part of a 9.6 kbit/s speech coder is discussed and it is shown that for all the compression/expansion ratios of interest the buffer size needed is twice the maximum pitch period.
Abstract: Time domain harmonic scaling (TDHS) has been realized in real time on the Bell Laboratories digital signal processing (DSP) integrated circuit. It is an algorithm that can expand or compress the bandwidth and sampling rate of speech by taking advantage of the pitch structure in the speech signal. As such it is useful in a variety of speech applications including speech coding, speech enhancement, and rate modification. A single DSP can perform compression and a second DSP can perform expansion. Both operations require pitch information to be supplied with the input speech. Included in the system is a real-time pitch/periodicity detector which has also been implemented on a single DSP. Its design is based on a novel modification of the autocorrelation function type pitch detector. This paper presents details of both the TDHS and pitch detector implementation and discusses their performances. In particular in this paper we discuss a 2:1 compression and expansion system that has been used as part of a 9.6 kbit/s speech coder. TDHS was previously thought to require a much larger buffer than the RAM memory available in the DSP. We show that for all the compression/expansion ratios of interest the buffer size needed is twice the maximum pitch period.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: Two filter bank based statistics, the new statistic and a symmetric variation of the distance measure proposed by Klatt [5], performed better than the linear prediction statistics in a word recognition task on continuous speech.
Abstract: With the current popularity of template matching for speech recognition systems, it is important to have the strongest possible spectral match statistic for scoring template frames against speech frames. This paper describes a phoneme based technique that has proven useful in studying match statistics and compares a new filter bank based statistic derived with this technique to several linear prediction and filter bank statistics. Two filter bank based statistics, the new statistic and a symmetric variation of the distance measure proposed by Klatt [5], performed better than the linear prediction statistics in a word recognition task on continuous speech.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: A methodology is described to obtain a set of segments and rules that represents adequately the speech performance of a given speaker and how such a segment data base can be used for speech coding at very low bit rate, synthesis from unrestricted text, and continuous speech recognition.
Abstract: A methodology is described to obtain a set of segments and rules that represents adequately the speech performance of a given speaker. This methodology proceeds from an initial set of diphones extracted from a neutral context and modifies this set with larger and/or smaller segments depending on the match with natural utterances. Each segment is stored as a sequence of frames coded using LPC coefficients. An estimate of the likelihood of timescale distortion is associated with each frame. It represents knowledge on temporal variability that can be used by synthesis rules and/or pattern matching algorithms. It is then shown how such a segment data base can be used for 1) speech coding at very low bit rate (∼400 bits/sec), 2) synthesis from unrestricted text, 3) continuous speech recognition.

Journal ArticleDOI
B.S. Babu
TL;DR: This paper describes a 2400 bit/s vocoder based on spectral envelope estimation, spectral coding to 48 bits, pitch extraction, and decreasing-chirp excitation for voiced synthesis that is robust in acoustic noise environments at a data rate of 2400 bits/s.
Abstract: This paper describes a 2400 bit/s vocoder based on spectral envelope estimation, spectral coding to 48 bits, pitch extraction, and decreasing-chirp excitation for voiced synthesis. Several spectral smoothing and coding schemes are described and intelligibility test results compared. This vocoder was implemented on the CSP-30 high speed digital processor at the RADC/EEV Speech Processing Research and Development Facility at Hanscom AFB, MA. This system yields high performance in a quiet environment and is robust in acoustic noise environments at a data rate of 2400 bits/s.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: The results indicate that arbitrarily-shaped image regions can be well identified and clustered using as features their 2-D LPC parameters.
Abstract: This paper is concerned with the use of 2-D linear prediction for image segmentation. It begins with a brief summary of the mathematics involved in 2-D linear predictive analysis of arbitrarily-shaped regions. Then, it introduces a 2-D LPC distance measure based on the error residual of 2-D linear prediction. Finally, it describes how the above results can be applied to image segmentation using a simple cluster seeking algorithm. The results indicate that arbitrarily-shaped image regions can be well identified and clustered using as features their 2-D LPC parameters.

Proceedings ArticleDOI
01 Apr 1983
TL;DR: This evaluation procedure is validated by computing the objective quality scores over a test bed of five mediumband and narrowband real-time speech coders and correlating the scores with subjective judgments generated by the Diagnostic Acceptability Measure test.
Abstract: We consider in this paper the problem of developing and testing a procedure for objective evaluation of the speech quality of real-time speech coders. We validate this evaluation procedure by computing the objective quality scores over a test bed of five mediumband and narrowband real-time speech coders and correlating the scores with subjective judgments generated by the Diagnostic Acceptability Measure test. We report on the performance of several published objective measures. Also, we describe and suggest solution approaches to two related problems: synchronizing in time a real-time coder's output with its input, and designing a database of input-speech sentences to be used in objective speech quality evaluation. We present various experimental results of this ongoing work.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: This paper examines a technique for reducing the effect of pitch-incongruence or harmonic offset in baseband speech coding of high pitched voices using a variable-width baseband coding scheme.
Abstract: This paper examines a technique for reducing the effect of pitch-incongruence or harmonic offset in baseband speech coding of high pitched voices. The proposed method employs a variable-width baseband coding scheme. A nominal baseband width can be specified. The actual width of the baseband is determined in every frame to be the value closest to the nominal width that contains an integer number of multiples of the fundamental pitch frequency. When the baseband is copied up to higher bands, the phase of the copied-up bands must be frequency-adjusted to form a consistent overall phase function for that frame. This ensures more appropriate addition of harmonics to generate pitch pulses at desired positions in the regenerated residual signal. A method for making such an adjustment to the phase of the copied-up bands is explored and its promises and limitations are discussed.
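
The per-frame width rule described above can be illustrated with a short sketch: round the nominal width to the nearest whole multiple of the fundamental frequency. The function name and the floor of one harmonic are assumptions added for the example; the abstract does not say how ties or very high pitches are handled.

```python
def baseband_width(nominal_hz, f0_hz):
    """Width nearest to nominal_hz that spans a whole number of pitch harmonics."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    # Assumed floor: keep at least one harmonic in the baseband.
    harmonics = max(1, round(nominal_hz / f0_hz))
    return harmonics * f0_hz


# A 1000 Hz nominal baseband with a 300 Hz fundamental keeps 3 harmonics.
print(baseband_width(1000.0, 300.0))
```

With a high-pitched voice the harmonic spacing is wide, so the adjusted width can differ from the nominal one by a noticeable fraction of the band.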