Author

Nuggehally Sampath Jayant

Other affiliations: Alcatel-Lucent, AT&T
Bio: Nuggehally Sampath Jayant is an academic researcher from Bell Labs. The author has contributed to research in topics: Speech coding & Quantization (signal processing). The author has an h-index of 32 and has co-authored 68 publications receiving 5695 citations. Previous affiliations of Nuggehally Sampath Jayant include Alcatel-Lucent & AT&T.


Papers
Journal ArticleDOI
01 Oct 1993
TL;DR: It is proposed that fundamental limits in the science can be expressed by the semiquantitative concepts of perceptual entropy and the perceptual distortion-rate function, and current compression technology is examined in that framework.
Abstract: The notion of perceptual coding, which is based on the concept of distortion masking by the signal being compressed, is developed. Progress in this field as a result of advances in classical coding theory, modeling of human perception, and digital signal processing is described. It is proposed that fundamental limits in the science can be expressed by the semiquantitative concepts of perceptual entropy and the perceptual distortion-rate function, and current compression technology is examined in that framework. Problems and future research directions are summarized.

905 citations

Journal ArticleDOI
TL;DR: The results of video coding based on a three-dimensional spatio-temporal subband decomposition are competitive with traditional video coding techniques and provide the motivation for investigating the 3-D subband framework for different coding schemes and various applications.
Abstract: We describe and show the results of video coding based on a three-dimensional (3-D) spatio-temporal subband decomposition. The results include a 1-Mbps coder based on a new adaptive differential pulse code modulation (ADPCM) scheme and adaptive bit allocation. This rate is useful for video storage on CD-ROM. Coding results are also shown for a 384-kbps rate; these are based on ADPCM for the lowest frequency band and a new form of vector quantization, geometric vector quantization (GVQ), for the data in the higher frequency bands. GVQ takes advantage of the inherent structure and sparseness of the data in the higher bands. Results are also shown for a 128-kbps coder that is based on an unbalanced tree-structured vector quantizer (UTSVQ) for the lowest frequency band and GVQ for the higher frequency bands. The results are competitive with traditional video coding techniques and provide the motivation for investigating the 3-D subband framework for different coding schemes and various applications.

283 citations

Journal ArticleDOI
TL;DR: Perceptual considerations indicate that packet lengths most robust to losses are in the range 16-32 ms, irrespective of whether interpolation is used or not, whereas tolerable P_L values can be as high as 2 to 5 percent without interpolation and 5 to 10 percent with interpolation.
Abstract: We have studied the effects of random packet losses in digital speech systems based on 12-bit PCM and 4-bit adaptive DPCM coding. The effects are a function of packet length B and probability of packet loss P_L. We have also studied the benefits of an odd-even sample-interpolation procedure that mitigates these effects (at the cost of increased decoding delay). The procedure is based on arranging a 2B-block of codewords into two B-sample packets, an odd-sample packet and an even-sample packet. If one of these packets is lost, the odd (or even) samples of the 2B-block are estimated from the even (or odd) samples by means of adaptive interpolation. Perceptual considerations indicate that packet lengths most robust to losses are in the range 16-32 ms, irrespective of whether interpolation is used or not. With these packet lengths, tolerable P_L values, which are strictly input-speech-dependent, can be as high as 2 to 5 percent without interpolation and 5 to 10 percent with interpolation. These observations are based on a computer simulation with three sentence-length speech inputs, and on informal listening tests.

254 citations
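
The odd-even packetization described in the abstract above can be illustrated with a minimal sketch. It assumes plain neighbor averaging as a stand-in for the adaptive interpolation used in the paper, and the function names (`split_odd_even`, `recover_from_one_packet`) are hypothetical, not taken from the source.

```python
def split_odd_even(block):
    """Split a 2B-sample block into an even-index packet and an odd-index packet."""
    return block[0::2], block[1::2]


def recover_from_one_packet(known, n, known_offset):
    """Rebuild an n-sample block when only one packet arrived.

    known        -- samples of the surviving packet
    n            -- length of the original block (2B)
    known_offset -- 0 if the even packet survived, 1 if the odd packet survived

    Assumption: missing samples are estimated by averaging their two
    neighbors (simple linear interpolation), not by the adaptive
    interpolator of the paper.
    """
    out = [0.0] * n
    for k, value in enumerate(known):
        out[known_offset + 2 * k] = float(value)

    missing_offset = 1 - known_offset
    for i in range(missing_offset, n, 2):
        left = out[i - 1] if i - 1 >= 0 else None
        right = out[i + 1] if i + 1 < n else None
        if left is None:
            out[i] = right          # block edge: copy the only neighbor
        elif right is None:
            out[i] = left
        else:
            out[i] = 0.5 * (left + right)
    return out


block = [0, 1, 2, 3, 4, 5, 6, 7]            # a 2B-block with B = 4
even_packet, odd_packet = split_odd_even(block)
# Suppose the odd packet is lost in transit:
estimate = recover_from_one_packet(even_packet, len(block), known_offset=0)
```

Because each lost sample sits between two received samples, a single lost packet degrades the block smoothly instead of leaving a gap of B consecutive samples; the price is that the decoder must wait for both packets of a 2B-block.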

Proceedings ArticleDOI
05 Dec 1988
TL;DR: An algorithm that separates the pixels in the image into clusters based on both their intensity and spatial constraints is developed, which performs better than the K-means algorithm and its nonadaptive extensions that incorporate spatial constraints by the use of Gibbs random fields.
Abstract: A generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image is proposed. Spatial constraints are included by the use of a Gibbs random field model. Local intensity variations are accounted for in an iterative procedure involving averaging over a sliding window whose size decreases as the algorithm progresses. Results with an eight-neighbor Gibbs random field model applied to pictures of industrial objects and a variety of other images show that the algorithm performs better than the K-means algorithm and its nonadaptive extensions.

247 citations


Cited by
Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Journal ArticleDOI
TL;DR: It is proved the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks (discontinuity-preserving smoothing and image segmentation) are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

11,727 citations
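
The recursive mean shift procedure mentioned in the abstract above can be sketched in one dimension. This is an illustrative sketch only: the Gaussian kernel, the stopping tolerance, and the name `mean_shift_mode` are assumptions made here, not details from the source.

```python
import math


def mean_shift_mode(start, data, bandwidth, tol=1e-6, max_iter=500):
    """Move `start` uphill on a kernel density estimate of `data` (1-D).

    Each iteration replaces the current point with the kernel-weighted
    mean of the data; the iteration stops when the shift falls below `tol`.
    Assumption: a Gaussian kernel with a fixed, user-chosen bandwidth.
    """
    x = float(start)
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in data]
        new_x = sum(w * p for w, p in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated groups of points; each start converges to the nearby mode.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
low_mode = mean_shift_mode(0.5, samples, bandwidth=0.5)
high_mode = mean_shift_mode(5.5, samples, bandwidth=0.5)
```

The bandwidth plays the role of the "resolution of the analysis" in the abstract: a larger value merges nearby modes, a smaller one separates them.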

Journal ArticleDOI
TL;DR: It is argued that insertion of a watermark under this regime makes the watermark robust to signal processing operations and common geometric transformations provided that the original image is available and that it can be successfully registered against the transformed watermarked image.
Abstract: This paper presents a secure (tamper-resistant) algorithm for watermarking images, and a methodology for digital watermarking that may be generalized to audio, video, and multimedia data. We advocate that a watermark should be constructed as an independent and identically distributed (i.i.d.) Gaussian random vector that is imperceptibly inserted in a spread-spectrum-like fashion into the perceptually most significant spectral components of the data. We argue that insertion of a watermark under this regime makes the watermark robust to signal processing operations (such as lossy compression, filtering, digital-analog and analog-digital conversion, requantization, etc.), and common geometric transformations (such as cropping, scaling, translation, and rotation) provided that the original image is available and that it can be successfully registered against the transformed watermarked image. In these cases, the watermark detector unambiguously identifies the owner. Further, the use of Gaussian noise ensures strong resilience to multiple-document, or collusional, attacks. Experimental results are provided to support these claims, along with an exposition of pending open problems.

6,194 citations

Journal ArticleDOI
S. Boll
TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Abstract: A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

4,862 citations
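
The core subtraction step described in the abstract above can be sketched as follows. The sketch works directly on magnitude spectra and covers only the noise-bias estimate and the subtraction itself; the transform, the secondary residual-noise procedures, and waveform resynthesis are omitted. The function names are hypothetical, and clamping negative results to zero is an assumption made here.

```python
def estimate_noise_bias(noise_frames):
    """Average the magnitude in each frequency bin over non-speech frames."""
    n_frames = len(noise_frames)
    return [sum(bin_values) / n_frames for bin_values in zip(*noise_frames)]


def spectral_subtract(frame_magnitude, noise_bias):
    """Subtract the noise bias bin by bin, clamping negative results to zero."""
    return [max(m - b, 0.0) for m, b in zip(frame_magnitude, noise_bias)]


# Magnitude spectra (3 bins each) measured while nobody is speaking:
noise_frames = [[1.0, 2.0, 1.0],
                [3.0, 2.0, 1.0]]
bias = estimate_noise_bias(noise_frames)        # per-bin average noise level

# A noisy speech frame, cleaned by removing the estimated bias:
cleaned = spectral_subtract([5.0, 1.0, 1.5], bias)
```

In a full system the cleaned magnitudes would be recombined with the phase of the noisy frame and inverse-transformed to resynthesize the waveform.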

Book
01 Mar 1995
TL;DR: Wavelets and Subband Coding offered a unified view of the exciting field of wavelets and their discrete-time cousins, filter banks, or subband coding and developed the theory in both continuous and discrete time.
Abstract: First published in 1995, Wavelets and Subband Coding offered a unified view of the exciting field of wavelets and their discrete-time cousins, filter banks, or subband coding. The book developed the theory in both continuous and discrete time, and presented important applications. During the past decade, it filled a useful need in explaining a new view of signal processing based on flexible time-frequency analysis and its applications. Since 2007, the authors now retain the copyright and allow open access to the book.

2,793 citations