Author

P. Kroon

Bio: P. Kroon is an academic researcher who has contributed to research on speech coding and codecs. The author has an h-index of 1 and has co-authored 1 publication receiving 121 citations.

Papers
Journal ArticleDOI
R.V. Cox, P. Kroon
TL;DR: The attributes of speech coders such as bit rate, complexity, delay, and quality are described, which are applicable to low-bit-rate multimedia communications.
Abstract: The International Telecommunications Union (ITU) has standardized three speech coders which are applicable to low-bit-rate multimedia communications. ITU Rec. G.729 8 kb/s CS-ACELP has a 15 ms algorithmic codec delay and provides network-quality speech. It was originally designed for wireless applications, but is applicable to multimedia communications as well. Annex A of Rec. G.729 is a reduced-complexity version of the CS-ACELP coder. It was designed explicitly for simultaneous voice and data applications that are prevalent in low-bit-rate multimedia communications. These two coders use the same bitstream format and can interoperate. The ITU Rec. G.723.1 6.3 and 5.3 kb/s speech coder for multimedia communications was designed originally for low-bit-rate videophones. Its frame size of 30 ms and one-way algorithmic codec delay of 37.5 ms allow for a further reduction in bit rate compared to the G.729 coder. In applications where low delay is important, the delay of G.723.1 may be too large. However, if the delay is acceptable, G.723.1 provides a lower-complexity alternative to G.729 at the expense of a slight degradation in quality. This article describes the attributes of speech coders such as bit rate, complexity, delay, and quality. Then it discusses the basic concepts of the three new ITU coders by comparing their specific attributes. The second part of this article describes the standardization process for each of these coders.
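The bit rates, frame sizes, and delays quoted in this abstract are linked by simple arithmetic, which the following minimal sketch works through. It assumes a 10 ms frame for G.729 (a figure from the Recommendation, not stated in the abstract) and treats algorithmic delay as frame size plus look-ahead. The function names are hypothetical, and the results are nominal values derived from the quoted rates; the actual bitstream packing may differ slightly.

```python
from fractions import Fraction


def bits_per_frame(bit_rate_kbps, frame_ms):
    """Nominal payload bits per frame: kb/s multiplied by ms gives bits."""
    return Fraction(str(bit_rate_kbps)) * Fraction(str(frame_ms))


def lookahead_ms(algorithmic_delay_ms, frame_ms):
    """Look-ahead implied by algorithmic delay = frame size + look-ahead."""
    return Fraction(str(algorithmic_delay_ms)) - Fraction(str(frame_ms))


# G.729: 8 kb/s, 15 ms algorithmic delay (abstract); 10 ms frame (assumed).
g729_bits = bits_per_frame(8, 10)
g729_lookahead = lookahead_ms(15, 10)

# G.723.1: 6.3 and 5.3 kb/s, 30 ms frame, 37.5 ms algorithmic delay (abstract).
g7231_high_bits = bits_per_frame(6.3, 30)
g7231_low_bits = bits_per_frame(5.3, 30)
g7231_lookahead = lookahead_ms(37.5, 30)

print(g729_bits, g729_lookahead)
print(g7231_high_bits, g7231_low_bits, g7231_lookahead)
```

The longer G.723.1 frame spreads the encoded parameters over three times as much speech as a 10 ms frame, which is consistent with the abstract's remark that the larger frame size allows a lower bit rate at the cost of delay.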

123 citations


Cited by
Journal ArticleDOI
TL;DR: This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process, based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool.
Abstract: Discontinuous transmission based on speech/pause detection represents a valid solution to improve the spectral efficiency of new generation wireless communication systems. In this context, robust voice activity detection (VAD) algorithms are required, as traditional solutions present a high misclassification rate in the presence of the background noise typical of mobile environments. This paper presents a voice detection algorithm which is robust to noisy environments, thanks to a new methodology adopted for the matching process. More specifically, the VAD proposed is based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules, trained by means of a new hybrid learning tool. A series of objective tests performed on a large speech database, varying the signal-to-noise ratio (SNR), the types of background noise, and the input signal level, showed that, compared with the VAD standardized by ITU-T in Recommendation G.729 Annex B, the fuzzy VAD achieves, on average, a reduction of about 25% in the activity factor and of about 43% in the clipping introduced. Informal listening tests also confirm an improvement in the perceived speech quality.

141 citations

Proceedings ArticleDOI
12 May 2002
TL;DR: This work uses the Gilbert loss model to infer that changing the packet interval affects loss burstiness, which in turn influences forward error correction (FEC) performance, and performs subjective listening tests based on Mean Opinion Score to evaluate the effect of bursty loss on VoIP perceived quality.
Abstract: Packet loss degrades the perceived quality of voice over IP (VoIP). In addition, packet loss in the Internet tends to come in bursts, which may further degrade audio quality. Using the Gilbert loss model, we infer that changing the packet interval affects loss burstiness, which in turn influences forward error correction (FEC) performance. Next, we perform subjective listening tests based on Mean Opinion Score (MOS) to evaluate the effect of bursty loss on VoIP perceived quality. Then, we compare the perceived quality achieved by two major loss repair methods: FEC and low bit-rate redundancy (LBR). Our MOS test results show that FEC is much preferred over LBR. In addition, our MOS results reveal that, under bursty loss, FEC quality is much better with a moderately large packet interval. Finally, because FEC introduces an extra delay proportional to the packet interval, we present a method of optimizing the packet interval to maximize FEC MOS by considering the delay impairment in ITU's E-model standard.
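The Gilbert loss model named in this abstract is commonly described as a two-state Markov chain in which every packet sent in the "bad" state is lost. The sketch below assumes that simple form; the parameter names `p` and `q`, the function names, and the numeric values are illustrative and are not taken from the paper.

```python
import random


def gilbert_stats(p, q):
    """Stationary loss rate and mean burst length of a two-state Gilbert model.

    p: probability the next packet is lost given the previous one arrived.
    q: probability the next packet arrives given the previous one was lost.
    """
    loss_rate = p / (p + q)   # stationary probability of the loss state
    mean_burst = 1.0 / q      # loss bursts have geometric length, mean 1/q
    return loss_rate, mean_burst


def simulate_gilbert(p, q, n, seed=0):
    """Generate n loss indicators (True = lost) from the same chain."""
    rng = random.Random(seed)
    lost = False
    trace = []
    for _ in range(n):
        if lost:
            lost = rng.random() >= q  # remain in the loss state with prob 1 - q
        else:
            lost = rng.random() < p   # enter the loss state with prob p
        trace.append(lost)
    return trace


loss_rate, mean_burst = gilbert_stats(p=0.05, q=0.5)
trace = simulate_gilbert(p=0.05, q=0.5, n=200_000, seed=1)
print(loss_rate, mean_burst, sum(trace) / len(trace))
```

Under this parameterization, a smaller `q` means longer loss bursts at the same `p`, which is the kind of burstiness that matters for FEC because consecutive losses can defeat the redundancy.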

109 citations

Journal ArticleDOI
TL;DR: New rate-compatible convolutional (RCC) codes with high constraint lengths and a wide range of code rates are presented and are shown to provide good performance and rate-matching capabilities.
Abstract: New rate-compatible convolutional (RCC) codes with high constraint lengths and a wide range of code rates are presented. These new codes originate from rate 1/4 optimum distance spectrum (ODS) convolutional parent encoders with constraint lengths 7-10. Low rate encoders (rates 1/5 down to 1/10) are found by a nested search, and high rate encoders (rates above 1/4) are found by rate-compatible puncturing. The new codes form rate-compatible code families more powerful and flexible than those previously presented. It is shown that these codes are almost as good as the existing optimum convolutional codes of the same rates. The effects of varying the design parameters of the rate-compatible punctured convolutional (RCPC) codes, i.e., the parent encoder rate, the puncturing period, and the constraint length, are also examined. The new codes are then applied to a multicode direct-sequence code-division multiple-access (DS-CDMA) system and are shown to provide good performance and rate-matching capabilities. The results, which are evaluated in terms of the efficiency for Gaussian and Rayleigh fading channels, show that the system efficiency increases with decreasing code rate.
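Rate-compatible puncturing, as mentioned in this abstract, can be illustrated with a small sketch: bits of a rate-1/4 parent code are deleted according to a periodic pattern, and the patterns are nested so that every bit kept at a higher rate is also kept at each lower rate. The patterns, the puncturing period of 2, and the function names below are hypothetical choices for illustration; they are not the codes from the paper.

```python
from fractions import Fraction


def puncture(coded_bits, pattern):
    """Keep coded_bits[i] only where the periodic pattern has a 1."""
    return [b for i, b in enumerate(coded_bits) if pattern[i % len(pattern)]]


def punctured_rate(pattern, parent_outputs=4):
    """Code rate after puncturing a rate-1/parent_outputs parent code."""
    info_bits = Fraction(len(pattern), parent_outputs)  # info bits per period
    return info_bits / sum(pattern)                     # kept bits per period


def is_rate_compatible(high_rate_pattern, low_rate_pattern):
    """True if every bit kept at the higher rate is also kept at the lower rate."""
    return all(low >= high for high, low in zip(high_rate_pattern, low_rate_pattern))


# Illustrative patterns over one period (2 information bits, 8 parent output bits).
pattern_full = [1, 1, 1, 1, 1, 1, 1, 1]   # no puncturing: parent rate 1/4
pattern_third = [1, 1, 1, 0, 1, 0, 1, 1]  # keeps 6 of 8 bits
pattern_half = [1, 1, 0, 0, 1, 0, 1, 0]   # keeps 4 of 8 bits

print(punctured_rate(pattern_full), punctured_rate(pattern_third), punctured_rate(pattern_half))
print(puncture(list(range(8)), pattern_half))
print(is_rate_compatible(pattern_half, pattern_third))
```

Nesting the patterns this way lets a transmitter move to a lower rate by sending only the additional bits, which is what makes such a family useful for rate matching.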

99 citations

Journal ArticleDOI
TL;DR: A performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms was made using objective, psychoacoustic, and subjective parameters to evaluate the extent to which VADs depend on language, the signal-to-noise ratio, or the power level.
Abstract: The paper proposes a performance evaluation and comparison of G.729, AMR, and fuzzy voice activity detection (FVAD) algorithms. The comparison was made using objective, psychoacoustic, and subjective parameters. A highly varied speech database was also set up to evaluate the extent to which VADs depend on language, the signal-to-noise ratio (SNR), or the power level.

90 citations

Journal ArticleDOI
TL;DR: An adaptive source rate control (ASRC) scheme which can work together with the hybrid ARQ error control schemes to achieve efficient transmission of real-time video with low delay and high reliability is proposed.
Abstract: Hybrid ARQ schemes can yield much better throughput and reliability than static FEC schemes for the transmission of data over time-varying wireless channels. However, these schemes result in extra delay. They adapt to the varying channel conditions by retransmitting erroneous packets, which causes variable effective data rates for current PCS networks because the channel bandwidth is constant. Hybrid ARQ schemes are currently being proposed as the error control schemes for real-time video transmission. An important issue is how to ensure low delay while taking advantage of the high throughput and reliability that these schemes provide. In this paper we propose an adaptive source rate control (ASRC) scheme which can work together with the hybrid ARQ error control schemes to achieve efficient transmission of real-time video with low delay and high reliability. The ASRC scheme adjusts the source rate based on the channel conditions, the transport buffer occupancy, and the delay constraints. It achieves good video quality by dynamically changing both the number of forced update (intracoded) macroblocks and the quantization scale used in a frame. The number of forced update macroblocks used in a frame is first adjusted according to the allocated source rate. This reduces the fluctuation of the quantization scale as channel conditions change during encoding, so that the uniformity of the video quality is improved. The simulation results show that the proposed ASRC scheme performs very well for both slow fading and fast fading channels.

90 citations