
Showing papers on "Speech coding published in 1983"


Proceedings ArticleDOI
Bishnu S. Atal
14 Apr 1983
TL;DR: The aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality.
Abstract: This paper describes a method for efficient coding of LPC log area parameters. It is now well recognized that sample-by-sample quantization of LPC parameters is not very efficient in minimizing the bit rate needed to code these parameters. Recent methods for reducing the bit rate have used vector and segment quantization methods. Much of the past work in this area has focussed on efficient coding of LPC parameters in the context of vocoders which put a ceiling on achievable speech quality. The results from these studies cannot be directly applied to synthesis of high quality speech. This paper describes a different approach to efficient coding of log area parameters. Our aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality. Speech events occur generally at non-uniformly spaced time intervals. Moreover, some speech events are slow while others are fast. Uniform sampling of speech parameters is thus not efficient. We describe a non-uniform sampling and interpolation procedure for efficient coding of log area parameters. A temporal decomposition technique is used to represent the continuous variation of these parameters as a linearly-weighted sum of a number of discrete elementary components. The location and length of each component is automatically adapted to speech events. We find that each elementary component can be coded as a very low information rate signal.
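As an illustration of the temporal-decomposition idea (not the paper's adaptive algorithm), the sketch below approximates a single log area parameter track by a linearly weighted sum of a few localized components. The raised-cosine component shapes, their fixed locations and widths, and the toy parameter track are all assumptions made for the example; the paper adapts the location and length of each component to speech events.

```python
# Sketch: approximate one log-area parameter track as a weighted sum of a few
# localized "event" components (raised-cosine bumps).  The bump shapes, centers
# and widths are illustrative assumptions, not the paper's adaptive procedure.
import numpy as np

def bump(n_frames, center, width):
    """Raised-cosine component localized around `center` (in frames)."""
    t = np.arange(n_frames)
    w = 0.5 * (1 + np.cos(np.pi * (t - center) / width))
    w[np.abs(t - center) > width] = 0.0
    return w

n_frames = 100
t = np.arange(n_frames)
track = 0.8 * np.tanh((t - 30) / 5.0) + 0.3 * np.sin(2 * np.pi * t / 70)  # toy log-area track

# Fixed component locations here; the paper adapts location/length to speech events.
components = np.stack([bump(n_frames, c, 15) for c in range(5, n_frames, 12)])
weights, *_ = np.linalg.lstsq(components.T, track, rcond=None)   # least-squares weights
approx = components.T @ weights

print("rms approximation error:", np.sqrt(np.mean((track - approx) ** 2)))
```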

377 citations


Journal ArticleDOI
TL;DR: Recent results obtained in waveform coding of speech with vector quantization are reviewed; vector quantization appears to be a coding technique well suited to the dual requirement of coding both speech and voiceband data at 16 kb/s.
Abstract: Vector quantization (VQ), a new direction in source coding, has recently emerged as a powerful and widely applicable coding technique. It was first applied to analysis/synthesis of speech, and has allowed Linear Predictive Coding (LPC) rates to be dramatically reduced to 800 b/s with very slight reduction in quality, and further compressed to rates as low as 150 b/s while retaining intelligibility [1,2]. More recently, the technique has found its way to waveform coding [3-5], where its applicability and effectiveness is less obvious and not widely known. There is currently a great need for a low-complexity speech coder at the rate of 16 kb/s which attains essentially “toll” quality, roughly equivalent to that of standard 64-kb/s log PCM codecs. Adaptive DPCM schemes can attain this quality with low complexity for the proposed 32 kb/s CCITT standard, but at 16 kb/s the quality of ADPCM or adaptive delta modulation schemes is inadequate. More powerful methods, such as subband coding or transform coding, are capable of producing acceptable speech quality at 16 kb/s but have a much higher implementation complexity. The difficulty is further compounded by the need for a scheme that can handle both speech and voiceband data at the 16 kb/s rate. These two types of waveforms occupy the same bandwidth in the subscriber loop part of the telephone network, yet they have a widely different statistical character. Effective speech coding at this rate must be geared to the specific character of speech and must exploit our knowledge of human hearing. On the other hand, a waveform that carries data must be coded and later reconstructed so that a modem can still extract the data with an acceptably low error rate. This is purely a signal processing operation not involving human perception. Vector quantization appears to be a suitable coding technique which caters to this dual requirement. VQ may become the key to 16 kb/s coding; it may also lead to improved quality waveform coding at 8 or 9.6 kb/s. In this paper, we review recent results obtained in waveform coding of speech with vector quantization.
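A minimal sketch of the basic vector-quantization operation the article reviews: the waveform is blocked into k-sample vectors and each block is replaced by the index of its nearest codebook entry. The random codebook and the 2 bits/sample rate below are illustrative assumptions; a real waveform coder would train the codebook on speech.

```python
# Minimal waveform VQ sketch: block the signal into k-sample vectors and send
# only the index of the nearest codebook vector.  A real coder would train the
# codebook (e.g., with the LBG algorithm) on speech; here it is random.
import numpy as np

rng = np.random.default_rng(0)
k, codebook_bits = 4, 8                      # 8 bits per 4 samples -> 2 bits/sample
codebook = rng.standard_normal((2 ** codebook_bits, k))

signal = rng.standard_normal(1024)           # stand-in for a speech waveform
blocks = signal.reshape(-1, k)

# Encoder: nearest-neighbour search (squared-error distortion).
dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
indices = dists.argmin(axis=1)               # this is all that is transmitted

# Decoder: table look-up.
reconstruction = codebook[indices].reshape(-1)
snr = 10 * np.log10(np.sum(signal ** 2) / np.sum((signal - reconstruction) ** 2))
print(f"rate = {codebook_bits / k} bits/sample, SNR = {snr:.1f} dB")
```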

198 citations


Journal ArticleDOI
TL;DR: The Linde-Buzo-Gray algorithm for vector quantizer code book design is compared with several alternative algorithms, and the properties of the resulting code books are studied; the various algorithms yield essentially identical code books.
Abstract: Vector quantization has been used in coding applications for several years. Recently, quantization of linear predictive coding (LPC) vectors has been used for speech coding and recognition. In these latter applications, the only method that has been used for deriving the vector quantizer code book from a set of training vectors is the one described by Linde, Buzo, and Gray. In this paper, we compare this algorithm to several alternative algorithms and also study the properties of the resulting code books. Our conclusion is that the various algorithms that we tried gave essentially identical code books.
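A sketch of a Linde-Buzo-Gray style design iteration of the kind compared in the paper: nearest-neighbour classification of the training vectors alternates with centroid updates until the codebook settles. Squared error replaces the LPC-specific spectral distortion measures, and the training data and codebook size are invented placeholders.

```python
# Sketch of an LBG-style codebook design iteration: alternate nearest-neighbour
# classification and centroid update.  Squared error is used here for
# simplicity; LPC applications use spectral distortion measures instead.
import numpy as np

def lbg(training, codebook_size, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), codebook_size, replace=False)]
    for _ in range(iters):
        d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for j in range(codebook_size):
            members = training[nearest == j]
            if len(members):                      # keep old centroid if cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

training = np.random.default_rng(1).standard_normal((2000, 10))  # stand-in LPC vectors
cb = lbg(training, codebook_size=64)
print(cb.shape)
```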

169 citations


Journal ArticleDOI
TL;DR: The results of a new method based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures for discrete utterance speech recognition are presented.
Abstract: The results of a new method are presented for discrete utterance speech recognition. The method is based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures. Separate vector quantization code books are designed from training sequences for each word in the recognition vocabulary. Inputs from outside the training sequence are classified by performing vector quantization and finding the code book that achieves the lowest average distortion per speech frame. The new method obviates time alignment. It achieves 99 percent accuracy for speaker-dependent recognition of a 20-word vocabulary that includes the ten digits, with higher accuracy for recognition of the digit subset. For speaker-independent recognition, the method achieves 88 percent accuracy for the 20-word vocabulary and 95 percent for the digit subset. Background of the method, detailed empirical results, and an analysis of computational requirements are presented.
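A sketch of the classification rule described: one codebook is designed per vocabulary word, and an input utterance is assigned to the codebook that achieves the lowest average distortion per frame, with no time alignment. Squared error stands in for the information-theoretic distortion measures, and the codebooks and feature frames below are random placeholders.

```python
# Sketch of the classification rule: one VQ codebook per vocabulary word; an
# utterance (a sequence of spectral frames) is labelled with the codebook that
# gives the lowest average per-frame distortion.  Squared error stands in for
# the information-theoretic distortion measures used in the paper.
import numpy as np

def average_distortion(frames, codebook):
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()               # best codeword per frame, averaged

def classify(frames, codebooks):
    """codebooks: dict mapping word -> (codebook_size, dim) array."""
    return min(codebooks, key=lambda w: average_distortion(frames, codebooks[w]))

rng = np.random.default_rng(0)
codebooks = {w: rng.standard_normal((32, 12)) for w in ["zero", "one", "two"]}
utterance = rng.standard_normal((50, 12))     # 50 frames of toy features
print(classify(utterance, codebooks))
```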

92 citations


Journal ArticleDOI
TL;DR: A novel model for voiced speech that allows for local non-stationarities not only in terms of pitch perturbations, but in terms of vocal tract variations as well, and supports new forms of spectral prediction, which can be put to advantage in speech coding applications.
Abstract: The main purpose of this paper is to present a novel model for voiced speech. The classical model, which is being used in many applications, assumes local stationarity, and consequently imposes a simple and well known line structure to the short-time spectrum of voiced speech. The model derived in this paper allows for local non-stationarities not only in terms of pitch perturbations, but in terms of vocal tract variations as well. The resulting structure of the short-time spectrum becomes more complex, but can still be interpreted in terms of generalized lines. The proposed model supports new forms of spectral prediction, which can be put to advantage in speech coding applications. Experimental results are presented supporting the validity of both the model itself and the prediction relationships. Finally, a new class of speech coders, denoted harmonic coders, based on the presented model, is proposed, and a specific implementation is presented.

84 citations


PatentDOI
TL;DR: In this article, a variable baseband width is adaptively varied in accordance with an integral multiple of the pitch frequency of the input signal, to provide a more appropriate harmonic match in the reconstituted excitation signal.
Abstract: An improved voice messaging system using LPC baseband speech coding. In standard LPC-based baseband speech coding techniques, LPC parameters plus a residual signal are transmitted. To save bandwidth, the residual signal is filtered so that only a fraction of its full bandwidth (e.g., the bottom 1 kHz) is transmitted. At the decoding station, this fraction of the residual signal (which is known as the baseband signal) is copied up or otherwise expanded to higher frequencies, to provide the excitation signal which is filtered according to the LPC parameters to provide the reconstituted speech output. However, this tends to produce perceptually significant ringing effects and high frequency distortion in the reconstituted signal. The present invention uses a variable baseband width, which is adaptively varied, in accordance with an integral multiple of the frequency of the pitch of the input signal, to provide a more appropriate harmonic match in the reconstituted excitation signal. This eliminates the noticeable ringing effect.
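A small sketch of the central idea as described: rather than a fixed cutoff (e.g., 1 kHz), the baseband width is chosen as an integral multiple of the pitch frequency so that the retained residual band ends near a harmonic. The nominal cutoff and the helper function below are assumptions for illustration; pitch estimation, the actual filtering, and the copy-up stage are not shown.

```python
# Sketch of the adaptive-baseband idea: pick the residual low-pass cutoff as an
# integral multiple of the estimated pitch frequency, near a nominal 1 kHz,
# instead of a fixed cutoff.  The nominal value is an illustrative assumption.
def adaptive_baseband_cutoff(pitch_hz, nominal_hz=1000.0):
    """Return a cutoff that is a whole number of pitch harmonics."""
    harmonics = max(1, round(nominal_hz / pitch_hz))
    return harmonics * pitch_hz

for f0 in (80.0, 120.0, 210.0):
    print(f"pitch {f0:5.1f} Hz -> baseband cutoff {adaptive_baseband_cutoff(f0):7.1f} Hz")
```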

50 citations


Journal ArticleDOI
TL;DR: The speech processing used with the Vienna auditory prosthesis is described and results are presented.
Abstract: A number of remarkably different speech-coding strategies exist that were developed or suggested for the electrical stimulation of the auditory nerve. The considerable differences among those strategies may reflect both (1) the relative importance attached by the various groups to the place and periodicity principle, and, to a certain degree, (2) the varying abilities and limitations of the different implants used. The second point may need some explanation: Without doubt, a percutaneous plug is the most advantageous from the point of view of coding, but many groups do not use it for good reason. The implanted receiver circuits necessary for the transcutaneous transmission are severely restricted in size, power consumption, and complexity. This limits the number and bandwidth of independent stimulation channels. Often some low-power digital circuits are used. Their incompatibility with analog stimulation waveforms may explain the preponderance of pulsatile stimulation schemes. Fortunately, speech seems to be so redundant that even a considerable degree of processing leaves some cues sufficiently intact to enable the listener to partly reconstruct the information. This may explain why widely differing coding schemes may achieve similar results. There is no agreement on the number of independent stimulation channels necessary or feasible. Existing devices feature from one to fifteen channels. The stimulation sites used include the scala tympani, the modiolus, the cochlear nucleus, the round window and the promontory, and sites along, but external to, the cochlea. The stimulation signals themselves span the range from fixed-frequency, variable-duration pulses, to modulated carriers, and to completely analog waveforms. There are schemes using varying amounts of feature extraction, from voiced/unvoiced coding only to heavily coded signals on several channels. In this paper the speech processing used with the Vienna auditory prosthesis is described and results are presented. In addition, some results of our ongoing investigations of further single- and multichannel strategies are given.

47 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: A model that allows accurate evaluation of the envelope of the reverberant speech, even when little prior information about the room characteristics is available, is proposed in the context of a multiband processing scheme, aiming at the enhancement of single-microphone recorded reverberant speech signals.
Abstract: Acoustic environments can be treated as linear systems whose transmission properties are given by their impulse response functions. This basic model can be extended, under certain conditions, to describe the relationship between the envelopes of the input and output waveforms. Such a model is proposed in the context of a multiband processing scheme, aiming at the enhancement of single-microphone recorded reverberant speech signals. The specific requirements of this model permit a simplified approach to the estimation of the envelope functions. The model allows accurate evaluation of the envelope of the reverberant speech, even when little prior information about the room characteristics is available. Speech enhancement can then be achieved after envelope deconvolution in each band, which recovers the envelope of the anechoic signal from the measured speech envelope, and final reconstruction of the speech waveform using the original phase function.

37 citations


Proceedings ArticleDOI
Sharad Singhal, B. Atal
01 Apr 1983
TL;DR: The possibility of approximating the all-pole filter excitation sufficiently closely with multi-pulse excitation, and of obtaining the optimum filter parameters for this excitation, is examined.
Abstract: Present LPC analysis procedures assume that the input to the all-pole filter is white; the filter parameters are obtained by minimizing the mean-squared error between the filter output samples and their values obtained by linear prediction on the basis of past output samples. It is well known that these procedures often do not yield accurate filter parameters for periodic (or quasi-periodic) signals such as voiced speech. To compensate for the periodic nature of speech, an estimate of the excitation of the all-pole filter has to be made. Multi-pulse LPC obtains the best excitation for a specified bit rate by minimizing a weighted mean-squared criterion representing subjectively important differences between original and synthetic speech signals. In this paper we examine the possibility that multi-pulse excitation can approximate the all-pole filter excitation sufficiently closely and obtain the optimum filter parameters for this excitation.
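A hedged sketch of the multi-pulse idea: excitation pulses are placed one at a time where they most reduce the squared error between the target signal and the output of the all-pole synthesis filter. The perceptual weighting filter and the joint re-optimisation of pulse amplitudes used in multi-pulse LPC are omitted, and the toy first-order filter and signal are assumptions.

```python
# Greedy multi-pulse sketch: place excitation pulses one at a time where they
# most reduce the squared error between the target and the synthesis filter
# output.  The perceptual weighting filter of multi-pulse LPC is omitted.
import numpy as np
from scipy.signal import lfilter

def multipulse(target, a, n_pulses):
    """a: coefficients of A(z); returns the sparse excitation for 1/A(z)."""
    n = len(target)
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = lfilter([1.0], a, impulse)                    # impulse response of 1/A(z)
    residual = target.copy()
    excitation = np.zeros(n)
    for _ in range(n_pulses):
        # correlation of the remaining error with shifted impulse responses
        corr = np.array([residual[p:] @ h[: n - p] for p in range(n)])
        energy = np.array([h[: n - p] @ h[: n - p] for p in range(n)])
        pos = int(np.argmax(corr ** 2 / energy))
        amp = corr[pos] / energy[pos]
        excitation[pos] += amp
        residual[pos:] -= amp * h[: n - pos]
        # a full implementation would re-optimise all amplitudes jointly here
    return excitation

a = [1.0, -0.9]                                       # toy first-order LPC filter
target = lfilter([1.0], a, np.random.default_rng(0).standard_normal(80))
exc = multipulse(target, a, n_pulses=8)
print("nonzero pulses:", np.count_nonzero(exc))
```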

31 citations


Journal ArticleDOI
TL;DR: This work has investigated the performance of an adjustable source/channel codec in a cellular mobile-radio environment and found that this approach offers an improved grade of service.
Abstract: The performance of an adjustable source/channel codec in a cellular mobile-radio environment is investigated. The speech transmission rate and the amount of forward error correction change in response to changing channel conditions. The channel rate is constant at 32 kb/s, and when the channel is good all of these bits are used for speech transmission. In intermediate and poor channels the speech rate is 24 or 16 kb/s, and the remaining channel symbols are used for forward error correction. Relative to conventional transmission this approach offers an improved grade of service. For example, the outage rate (the proportion of "poor or worse" communications) goes from nine percent with fixed-rate to three percent with variable-rate transmission. Alternatively, this improved grade of service can be exchanged for higher bandwidth efficiency. The fixed-rate system (with nine percent outage) has 23 users per cell. With 52 users per cell the outage of the variable-rate system is only six percent.
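A sketch of the rate split the paper describes: the constant 32 kb/s channel rate is divided between speech bits and forward-error-correction bits according to channel quality. The SNR thresholds below are assumptions made for illustration; the paper drives the adaptation from its own channel-state measurements.

```python
# Sketch of the rate split described in the paper: the 32 kb/s channel rate is
# divided between speech coding and forward error correction according to the
# channel quality.  The quality thresholds here are illustrative assumptions.
def split_rate(channel_snr_db):
    if channel_snr_db >= 20:        # good channel: all bits carry speech
        return {"speech_kbps": 32, "fec_kbps": 0}
    if channel_snr_db >= 10:        # intermediate channel
        return {"speech_kbps": 24, "fec_kbps": 8}
    return {"speech_kbps": 16, "fec_kbps": 16}   # poor channel

for snr in (25, 14, 5):
    print(snr, "dB ->", split_rate(snr))
```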

31 citations



PatentDOI
TL;DR: In this paper, the asymmetric bi-directional audio signal cross-feed is established between first and second audio signal processing channels, for example, where the cross-fed signal components are combined in an out-of-phase relationship with respect to related audio signals already passing through a given channel.
Abstract: Enhanced psychoacoustic imagery is achieved in an audio signal processing circuit for processing plural channels of related audio signals. Asymmetric bi-directional audio signal cross-feed is established between first and second audio signal processing channels, for example. The cross-fed signal components are combined in an out-of-phase relationship with respect to related audio signals already passing through a given channel. The asymmetry is designed so as to complement the asymmetry which is believed to be present in a listener's brain processing of perceived acoustic signals due to the naturally occurring left or right half brain dominance of the listener. In other embodiments, both symmetric and asymmetric, the cross-feeding is limited to signal components below a predetermined frequency.

Journal ArticleDOI
TL;DR: A new diversity technique is proposed to combat Rayleigh fading in digital mobile radio systems transmitting speech signals using μ-law PCM encoded speech signals, and a statistical error detection strategy is invoked to identify the erroneous samples.
Abstract: A new diversity technique is proposed to combat Rayleigh fading in digital mobile radio systems transmitting speech signals. The speech signals are μ-law PCM encoded (μ = 255, 8 kHz sampling, 8 bits/code word, 64 kbit/s data rate), and alternate data words are used to form two streams called "odd" and "even." The even stream is delayed by τ seconds and the streams are interleaved prior to radio transmission using two-level PSK modulation. At the receiver the odd data stream is delayed by τ and interleaved with the even stream. Consequently, if an error burst occurs, the effect of the reshuffling of the data stream is, in general, to place words with bit errors in juxtaposition to those correctly received. After μ-law PCM decoding of the words, a statistical error detection strategy is invoked to identify the erroneous samples. These samples are replaced by adjacent sample interpolation to give the recovered speech sequence. No recourse to channel protection coding is made. In our experiments a Rayleigh fading envelope was generated from a hardware simulator and stored in a computer, along with four sentences of speech. The system was then simulated and the recovered speech perceived. The objective performance measures were segmental SNR for the audio signal, and BER. Different error detection strategies were examined and restrictions on τ investigated. For a mobile speed of 30 mph, SNR values of 32, 21, and 16 dB were obtained for BER values of 0.1, 1, and 2 percent, corresponding to SNR gains over an uncorrected system of 3, 9, and 11 dB, respectively.
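A sketch of the receiver-side concealment step described: samples whose PCM words are flagged as erroneous are replaced by adjacent-sample interpolation. The odd/even word interleaving and the statistical error-detection strategy of the paper are not reproduced; the error mask below is simply given for illustration.

```python
# Sketch of the concealment step: samples whose PCM words are flagged as
# erroneous are replaced by the average of their neighbours (adjacent-sample
# interpolation).  The error mask is assumed; error detection is not modelled.
import numpy as np

def conceal(samples, error_mask):
    out = samples.astype(float).copy()
    for i in np.flatnonzero(error_mask):
        left = out[i - 1] if i > 0 else out[i + 1]
        right = out[i + 1] if i + 1 < len(out) else out[i - 1]
        out[i] = 0.5 * (left + right)
    return out

x = np.sin(2 * np.pi * 440 * np.arange(64) / 8000)   # toy 440 Hz tone at 8 kHz
mask = np.zeros(64, dtype=bool)
mask[[10, 31, 32]] = True                             # errors left after de-interleaving
print("max concealment error:", np.max(np.abs(conceal(x, mask) - x)))
```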

PatentDOI
TL;DR: In this paper, a method of and apparatus for processing audio signals is presented, in which a measure of the amplitude of the audio signals in a selected time period is obtained; the audio signals are delayed until the measure is obtained and are then normalized using it.
Abstract: A method of and apparatus for processing audio signals in which a measure of amplitude of audio signals in a selected time period is obtained. The audio signals (Fig. 1) for the selected time period are delayed (18) until the measure of amplitude (16) is obtained, and then the delayed audio signals are normalized (20) using the measure of amplitude. High frequency emphasis (14) may be employed prior to obtaining the measure of amplitude. Alternatively, a multi-channel system (Fig. 3) can be employed for processing audio signals in limited frequency bands (32, 34, 36). The method and apparatus are applicable in a variety of applications including hearing aids, audio storage media, broadcast and public address systems, and voice communications such as telephone systems.

Journal ArticleDOI
TL;DR: The spread-spectrum properties of the X-System for secret telephony developed by Bell Telephone Laboratories for use in World War II are examined and it is believed that this was the first practical example of digital speech transmission.
Abstract: The spread-spectrum properties of the X-System for secret telephony developed by Bell Telephone Laboratories for use in World War II are examined. In this system, the bandwidth of the speech signal was reduced by a vocoder, the vocoder signals were sampled and quantized to base six, and a random, never-reused, six-valued key stream was added modulo six to obtain a public message which was undecipherable without the key. It is believed that this was the first practical example of digital speech transmission. Examples of its effectiveness are described, and a number of human-interest anecdotes are related.
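A worked toy example of the key step as described: base-six quantized vocoder samples are added modulo six to a never-reused six-valued key stream, and the receiver subtracts the same key modulo six to recover them. The sample and key values below are invented.

```python
# Worked toy example of the X-System's key step as described: six-level vocoder
# channel samples plus a random, never-reused base-six key, modulo 6; the
# receiver subtracts the same key modulo 6.  Values here are invented.
import numpy as np

rng = np.random.default_rng(42)
vocoder_samples = rng.integers(0, 6, size=12)          # quantized vocoder channel signals
key = rng.integers(0, 6, size=12)                      # one-time six-valued key stream

ciphertext = (vocoder_samples + key) % 6               # transmitted; undecipherable without key
recovered = (ciphertext - key) % 6

assert np.array_equal(recovered, vocoder_samples)
print(vocoder_samples, ciphertext, recovered, sep="\n")
```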

Journal ArticleDOI
R. Crochiere, J. Flanagan
TL;DR: This paper briefly outlines some of the basic properties of speech and the coding techniques that use these properties, and points to areas of current research and areas of practical hardware implementation.
Abstract: The field of digital speech is rapidly coming to fruition as an area of opportunity for commercial application. In this paper we attempt to give a broad tutorial overview of some of the key aspects of this technology. We briefly outline some of the basic properties of speech and the techniques of coding that use these properties and point to areas of current research and areas of practical hardware implementation. We also attempt to outline some of the potential application areas for this new technology.


Journal ArticleDOI
TL;DR: In this article, the authors considered the possibility of introducing packetized voice traffic into a packet-switched network and proposed simplified protocols and priority rules for voice handling, which are compared by means of analytical tools and simulation experiments considering the presence of voice, interactive, and batch data packets.
Abstract: This paper considers the possibility of introducing packetized voice traffic into a packet-switched network. It is well known that the network must assure voice packets sufficient delay characteristics for conversational speech, i.e., low delay between speaker and listener and low delay jitter or variance. To reach these goals, simplified protocols and priority rules for voice handling are proposed and evaluated. A model of a packet switching node structure capable of handling both data and voice is derived for both analytical and simulation approaches. The use of low bit rate voice encoders is considered. The necessity of avoiding the transmission of silent intervals is discussed in relation to the behavior of packet voice receivers. Proposed strategies are compared by means of analytical tools and simulation experiments considering the presence of voice, interactive, and batch data packets.

Journal ArticleDOI
TL;DR: Using these techniques, LPC encoded speech at 1200 bits/s is demonstrated to be of quality comparable to a constant rate LPC vocoder at 2400 bits/s.
Abstract: In LPC analysis, the speech signal is divided into frames each of which is represented by a vector of estimated vocal tract parameters, assumed to be constant throughout the frame. For many sounds, these parameters do not change significantly from one frame to the next, and some of them can often be adequately represented by previously transmitted values. In the LPC coding systems described in this paper, a number of alternative representations are considered for each frame. These representations (vectors) are combinations of PARCOR coefficients from the current frame and from previous frames. Several consecutive frames are analyzed at once, and all the possible sequences of PARCOR coefficient vectors are examined. The sequence which minimizes a preselected cost function is chosen for transmission, resulting in a reduced overall data rate. The examination of all the decision sequences is equivalent to a decision tree search, which is most efficiently accomplished through dynamic programming. Using these techniques, LPC encoded speech at 1200 bits/s is demonstrated to be of quality comparable to a constant rate LPC vocoder at 2400 bits/s.
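A simplified sketch of the dynamic-programming search: for each frame, either a fresh parameter vector is transmitted (bit cost) or the last transmitted one is reused (distortion cost), and the sequence minimizing the total cost over the block is chosen. The paper searches richer combinations of PARCOR coefficients from current and previous frames; this transmit/repeat choice is only a special case, and the costs below are toy numbers.

```python
# Simplified DP sketch: for each frame either transmit a fresh parameter vector
# (pay a bit cost) or reuse the last transmitted one (pay a distortion cost),
# minimising the total cost over the block.  Costs and data are toy values.
import numpy as np

def best_transmission_plan(frames, bit_cost):
    n = len(frames)
    INF = float("inf")
    cost = {0: bit_cost}            # state = index of the last transmitted frame
    plan = {0: [0]}                 # frame 0 must be transmitted
    for t in range(1, n):
        new_cost, new_plan = {}, {}
        for last, c in cost.items():
            # option 1: repeat the last transmitted frame (distortion cost)
            d = c + np.sum((frames[t] - frames[last]) ** 2)
            if d < new_cost.get(last, INF):
                new_cost[last], new_plan[last] = d, plan[last]
            # option 2: transmit frame t (bit cost)
            d = c + bit_cost
            if d < new_cost.get(t, INF):
                new_cost[t], new_plan[t] = d, plan[last] + [t]
        cost, plan = new_cost, new_plan
    last = min(cost, key=cost.get)
    return plan[last], cost[last]

frames = np.random.default_rng(0).standard_normal((10, 8)).cumsum(axis=0) * 0.1
sent, total = best_transmission_plan(frames, bit_cost=0.5)
print("frames transmitted:", sent, "total cost:", round(total, 2))
```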

Dissertation
01 Jan 1983

Journal ArticleDOI
TL;DR: A variable bit rate speech coding system based on explicit coding of the reconstruction noise in ADPCM (differential pulse code modulation with adaptive quantization) is discussed, which can be regarded as a way of improving the performance of AD PCM coding at a single bit rate of R + Rn bits/sample.
Abstract: This paper discusses a variable bit rate speech coding system based on explicit coding of the reconstruction noise in ADPCM (differential pulse code modulation with adaptive quantization). If the ADPCM bit rate is R bits/sample, PCM coding of its noise using an average bit rate of Rn bits/sample provides the receiver with the possibility of operating at any bit rate in the range R to R + max{Rn}. Using R values in the range 2 to 5, and Rn values in the range 0 to 3, we compare the performance of the (R + Rn)-bit system with that of conventional (R + Rn)-bit ADPCM. If noise coding is based on instantaneous Rn-bit quantization of its samples with an optimized step size, the signal-to-noise ratio performance is comparable to that of conventional ADPCM for Rn = 1, but it deteriorates significantly for Rn > 1. With non-instantaneous noise coding, the performance can exceed that of conventional ADPCM for any Rn > 1, if R > 2. This is due to a variable bit allocation algorithm that quantizes noise samples with differing resolutions, while maintaining a constant total bit rate in every block of 4 ms. The algorithm does not require the transmission of any extra side information. It can also be regarded as a way of improving the performance of ADPCM coding at a single bit rate of R + Rn bits/sample.
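An illustrative sketch of a variable bit allocation over a 4 ms block: a fixed budget of noise-coding bits is distributed so that samples with larger expected noise magnitude receive more resolution. As an assumption for the example, the expected magnitude is taken from the ADPCM step size (which both ends know, so no side information would be needed); the greedy rule below is not the paper's allocation algorithm.

```python
# Illustrative sketch of variable bit allocation within a block: distribute a
# fixed budget of noise-coding bits, giving more resolution where the expected
# noise magnitude is larger.  The greedy rule and the use of the step size as
# the magnitude estimate are assumptions, not the paper's algorithm.
import numpy as np

def allocate_bits(expected_magnitude, total_bits, max_bits_per_sample=3):
    bits = np.zeros(len(expected_magnitude), dtype=int)
    benefit = expected_magnitude.astype(float).copy()   # error reduced by the next bit
    for _ in range(total_bits):
        i = int(np.argmax(benefit))
        bits[i] += 1
        benefit[i] /= 2.0                                # each extra bit roughly halves the error
        if bits[i] >= max_bits_per_sample:
            benefit[i] = -np.inf
    return bits

step_sizes = np.random.default_rng(0).uniform(0.1, 1.0, size=32)  # 32 samples = 4 ms at 8 kHz
print(allocate_bits(step_sizes, total_bits=32))                   # average Rn = 1 bit/sample
```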

Journal ArticleDOI
TL;DR: Results measured over 16 ms, a phoneme, and word durations indicate that the adaptive frequency mapping algorithm significantly enhances the recovered speech compared to telephonic speech.
Abstract: Telephone channels restrict the bandwidth of speech signals to approximately 0.3-3.3 kHz, with the consequence that the intelligibility of unvoiced sounds may be significantly impaired. To prevent this band limitation of unvoiced sounds while still confining the speech to the telephonic bandwidth, we propose a scheme which, on recognizing the presence of unvoiced sounds extending to 7.6 kHz, frequency maps them into the band 0.3-3.3 kHz. Four mapping laws are considered and the unvoiced speech is compressed using each law. Frequency demapping is employed, and the law that has the best spectral match to the speech spectrum is selected. Voiced speech is band limited from 0.3 to 3.3 kHz. Results measured over 16 ms, a phoneme, and word durations indicate that the adaptive frequency mapping algorithm significantly enhances the recovered speech compared to telephonic speech. Informal listening experiences support these findings.
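A sketch of the law-selection step: each candidate mapping law compresses the wideband unvoiced spectrum into the telephone band, the result is demapped again, and the law whose round trip best matches the original spectrum is selected. The two linear compression factors below are placeholders for the four mapping laws of the paper, and the toy spectrum is random.

```python
# Sketch of mapping-law selection: compress the wideband unvoiced spectrum onto
# the telephone band with each candidate law, demap it, and keep the law whose
# round trip best matches the original.  The two linear laws are placeholders.
import numpy as np

def compress(spectrum, factor):
    """Resample the spectrum onto a frequency axis compressed by `factor`."""
    n = len(spectrum)
    return np.interp(np.linspace(0, n - 1, int(n / factor)), np.arange(n), spectrum)

def expand(spectrum, n):
    """Demap: stretch the compressed spectrum back over n bins."""
    m = len(spectrum)
    return np.interp(np.linspace(0, m - 1, n), np.arange(m), spectrum)

rng = np.random.default_rng(0)
wideband = np.abs(rng.standard_normal(256))          # toy unvoiced magnitude spectrum (0-7.6 kHz)

laws = {"2.3:1": 7.6 / 3.3, "3:1": 3.0}              # candidate compression factors
errors = {name: np.mean((expand(compress(wideband, f), len(wideband)) - wideband) ** 2)
          for name, f in laws.items()}
best = min(errors, key=errors.get)
print("selected mapping law:", best)
```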

Proceedings ArticleDOI
01 Apr 1983
TL;DR: The results are that (1) limited time-sequence compression has no negative effect on DP or its alternatives, and (2) the variable-threshold scheme performs better than the fixed-threshold scheme.
Abstract: This paper investigates the effect of LPC-based time compression schemes on dynamic programming (DP) and its alternatives. Two compression schemes, one with a fixed threshold and the other with a variable threshold, both incorporating two control factors, the rate of frame overlap and the step of interframe interval, are investigated. The test speech is a 40-word alpha-digit vocabulary pronounced by 10 males and 10 females. The results are: (1) limited time-sequence compression does not impose any negative effect on DP or its alternatives, and (2) the variable-threshold scheme performs better than the fixed-threshold scheme. A more detailed discussion of the compression schemes and their interaction with DP is included.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: Although the modified adaptive predictor provided the best improvement in spectral error, results indicate the modified spectral subtraction method to be the most suitable for use with linear predictive coding systems.
Abstract: This paper presents a discussion and evaluation of several filtering techniques for suppressing narrowband background noise in speech signals. The methods discussed are a modified spectral subtraction technique, an inverse transform filter, an adaptive notch placement technique, an adaptive predictor, and a modification of the adaptive predictor. The performance of the filter methods is compared using a spectral error measurement and an area ratio parameter error measurement. Although the modified adaptive predictor provided the best improvement in spectral error, results indicate the modified spectral subtraction method to be the most suitable for use with linear predictive coding systems.
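A minimal spectral-subtraction sketch of the general kind evaluated (not the paper's specific modification): a noise magnitude spectrum estimated from a noise-only segment is subtracted from each frame's magnitude spectrum, and the frame is resynthesized with the noisy phase. The spectral floor, frame length, and toy signals are assumptions.

```python
# Minimal spectral-subtraction sketch for narrowband noise: subtract a noise
# magnitude estimate from each frame's magnitude spectrum and resynthesise with
# the noisy phase.  The paper's modifications and other filters are not shown.
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.01):
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))  # keep magnitudes positive
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(frame))

rng = np.random.default_rng(0)
n = 256
tone_noise = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(n) / 8000)   # narrowband interference
speech = rng.standard_normal(n)                                      # stand-in speech frame
noise_mag = np.abs(np.fft.rfft(tone_noise * np.hanning(n)))          # noise-only estimate
cleaned = spectral_subtract(speech + tone_noise, noise_mag)
print(cleaned.shape)
```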

Proceedings ArticleDOI
Claude Galand, D. Esteban
01 Apr 1983
TL;DR: A new TASI approach with embedded bit stream is proposed, taking advantage both of the sub-band coder architecture and of the dynamic bit allocation, and providing a full range of quality from communications quality to toll quality.
Abstract: Application of sub-band coders to Time Assignment Speech Interpolation (TASI) systems is discussed. After a short review of standard optimum allocation of bits for one single voice port, the extension to multiple ports is discussed and is shown to present significant drawbacks. A new TASI approach with an embedded bit stream is then proposed. The output of the multirate speech compressor has an embedded bit stream, i.e., it accepts bit deletion and insertion for dynamic rate conversion /1/. This property is of high importance in a digital communication network, since it allows the bit stream to be flagged at any overloaded node without tandeming or freeze-out. The embedded operation is obtained by taking advantage both of the sub-band coder architecture and of the dynamic bit allocation. To illustrate the method, a multirate version of a sub-band coder has been designed to operate at different rates (8, 16, 24, and 32 kbps), providing a full range of quality from communications quality to toll quality.

Proceedings ArticleDOI
14 Apr 1983
TL;DR: This paper discusses the formulation of the problem, the techniques developed, and the results of a limited-scale intelligibility test, which indicate that no intelligibility improvement is obtained from the processing.
Abstract: Development and tests on an algorithm to enhance the intelligibility of speech degraded by an interfering talker is reported. This paper discusses the formulation of the problem, the techniques developed, and the results of a limited-scale intelligibility test. While the test results indicate that no intelligibility improvement is obtained from the processing, several promising new directions for this problem have been identified.

Journal ArticleDOI
TL;DR: A 2:1 compression and expansion system that has been used as part of a 9.6 kbit/s speech coder is discussed and it is shown that for all the compression/expansion ratios of interest the buffer size needed is twice the maximum pitch period.
Abstract: Time domain harmonic scaling (TDHS) has been realized in real time on the Bell Laboratories digital signal processing (DSP) integrated circuit. It is an algorithm that can expand or compress the bandwidth and sampling rate of speech by taking advantage of the pitch structure in the speech signal. As such it is useful in a variety of speech applications including speech coding, speech enhancement, and rate modification. A single DSP can perform compression and a second DSP can perform expansion. Both operations require pitch information to be supplied with the input speech. Included in the system is a real-time pitch/periodicity detector which has also been implemented on a single DSP. Its design is based on a novel modification of the autocorrelation function type pitch detector. This paper presents details of both the TDHS and pitch detector implementation and discusses their performances. In particular in this paper we discuss a 2:1 compression and expansion system that has been used as part of a 9.6 kbit/s speech coder. TDHS was previously thought to require a much larger buffer than the RAM memory available in the DSP. We show that for all the compression/expansion ratios of interest the buffer size needed is twice the maximum pitch period.
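A sketch of pitch-synchronous 2:1 compression in the spirit of TDHS: each pair of consecutive pitch periods is cross-faded with triangular weights into a single period, halving the time scale. A constant, known pitch period is assumed here; the real-time system obtains the period from its pitch detector, and the DSP buffering details and the expansion stage are not shown.

```python
# Sketch of pitch-synchronous 2:1 compression: successive pairs of pitch
# periods are overlap-added with triangular weights into one period.  A fixed,
# known pitch period is assumed; the real system uses a pitch detector.
import numpy as np

def tdhs_compress_2to1(x, period):
    out = []
    i = 0
    w = np.linspace(0.0, 1.0, period, endpoint=False)    # triangular cross-fade weights
    while i + 2 * period <= len(x):
        a, b = x[i : i + period], x[i + period : i + 2 * period]
        out.append((1.0 - w) * a + w * b)                 # merge two periods into one
        i += 2 * period
    return np.concatenate(out) if out else np.array([])

fs, f0 = 8000, 100
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
period = fs // f0                                          # 80 samples per pitch period
compressed = tdhs_compress_2to1(signal, period)
print(len(signal), "->", len(compressed))
```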

Journal ArticleDOI
Yoshimune Hagiwara, Y. Kita, T. Miyamoto, Y. Toba, H. Hara, T. Akazawa
TL;DR: The HSP architecture, LSI design, and a speech analysis application are described; the processor makes it possible to construct a compact speech analysis circuit based on the LPC (PARCOR) method with two HSPs.
Abstract: A single chip high-performance digital signal processor (HSP) has been developed for speech, telecommunication, and other applications. The HSP uses 3 µm CMOS technology and its architecture features floating point arithmetic and pipeline structure. By adoption of floating point arithmetic, data covering a wide dynamic range (up to 32 bits) can be manipulated. The input clock frequency is 16 MHz, and the instruction cycle time is 250 ns. Efficient signal processing instructions and a large internal memory (program ROM: 512 words; data RAM: 200 words; data ROM: 128 words) make it possible to construct a compact speech analysis circuit by the LPC (PARCOR) method with two HSP's. This paper describes HSP architecture, LSI design, and a speech analysis application.