Conference

Multimedia Signal Processing 

About: Multimedia Signal Processing is an academic conference. It publishes mainly in the areas of data compression and feature extraction. Over its lifetime, the conference has published 2792 papers, which have received 33517 citations.


Papers
Proceedings Article
01 Apr 2011
TL;DR: A survey on steganography and steganalysis for digital images, mainly covering the fundamental concepts, the progress of steganographic methods for images in spatial representation and in JPEG format, and the development of the corresponding steganalytic schemes.
Abstract: Steganography and steganalysis are important topics in information hiding. Steganography refers to the technology of hiding data into digital media without drawing any suspicion, while steganalysis is the art of detecting the presence of steganography. This paper provides a survey on steganography and steganalysis for digital images, mainly covering the fundamental concepts, the progress of steganographic methods for images in spatial representation and in JPEG format, and the development of the corresponding steganalytic schemes. Some commonly used strategies for improving steganographic security and enhancing steganalytic capability are summarized and possible research trends are discussed.

417 citations

Proceedings Article
09 Dec 2002
TL;DR: Different techniques mapping functional parts to blocks of a unified framework for audio fingerprinting are reviewed, with a focus on pattern matching and robust hashing.
Abstract: An audio fingerprint is a content-based compact signature that summarizes an audio recording. Audio fingerprinting technologies have recently attracted attention since they allow the monitoring of audio independently of its format and without the need of meta-data or watermark embedding. The different approaches to fingerprinting are usually described with different rationales and terminology depending on the background: pattern matching, multimedia (music) information retrieval or cryptography (robust hashing). In this paper, we review different techniques mapping functional parts to blocks of a unified framework.

346 citations

Proceedings Article
03 Oct 2001
TL;DR: In this paper, a modified Karhunen-Loeve transform is proposed for 3D object retrieval, which takes into account not only vertices or polygon centroids from the 3D models but all points in the polygons of the objects.
Abstract: We present tools for 3D object retrieval in which a model, a polygonal mesh, serves as a query and similar objects are retrieved from a collection of 3D objects. Algorithms proceed first by a normalization step (pose estimation) in which models are transformed into a canonical coordinate frame. Second, feature vectors are extracted and compared with those derived from normalized models in the search space. Using a metric in the feature vector space nearest neighbors are computed and ranked. Objects thus retrieved are displayed for inspection, selection, and processing. For the pose estimation we introduce a modified Karhunen-Loeve transform that takes into account not only vertices or polygon centroids from the 3D models but all points in the polygons of the objects. Some feature vectors can be regarded as samples of functions on the 2-sphere. We use Fourier expansions of these functions as uniform representations allowing embedded multi-resolution feature vectors. Our implementation demonstrates and visualizes these tools.

345 citations
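
The pose estimation step in the abstract above can be illustrated with a small sketch. It assumes one plausible reading of "all points in the polygons": the centroid and covariance are integrated over the triangle surfaces using closed-form triangle moments, and the mesh is then rotated onto the principal axes. The function names (`surface_moments`, `jacobi_eigh3`, `normalize_pose`) are hypothetical, the Jacobi eigen-solver is a generic textbook routine, and nothing here is confirmed to match the authors' exact formulation (axis flipping and scaling, for instance, are not handled).

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def _transpose(x):
    return [[x[j][i] for j in range(3)] for i in range(3)]

def surface_moments(triangles):
    """Area-weighted centroid and covariance over all points of the triangles."""
    total = 0.0
    first = [0.0] * 3
    second = [[0.0] * 3 for _ in range(3)]
    for a, b, c in triangles:
        n = _cross(_sub(b, a), _sub(c, a))
        area = 0.5 * math.sqrt(sum(t * t for t in n))
        s = [a[i] + b[i] + c[i] for i in range(3)]
        total += area
        for i in range(3):
            first[i] += area * s[i] / 3.0
            for j in range(3):
                # Exact integral of p_i * p_j over one triangle.
                second[i][j] += area / 12.0 * (
                    a[i] * a[j] + b[i] * b[j] + c[i] * c[j] + s[i] * s[j])
    mean = [f / total for f in first]
    cov = [[second[i][j] / total - mean[i] * mean[j] for j in range(3)]
           for i in range(3)]
    return mean, cov

def jacobi_eigh3(m, sweeps=50):
    """Eigenvalues and eigenvectors (columns) of a symmetric 3x3 matrix."""
    a = [list(row) for row in m]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if abs(a[0][1]) + abs(a[0][2]) + abs(a[1][2]) < 1e-15:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            diff = a[q][q] - a[p][p]
            theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
            c, s = math.cos(theta), math.sin(theta)
            rot = [[float(i == j) for j in range(3)] for i in range(3)]
            rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
            a = _matmul(_transpose(rot), _matmul(a, rot))
            v = _matmul(v, rot)
    return [a[0][0], a[1][1], a[2][2]], v

def normalize_pose(triangles):
    """Translate to the surface centroid and rotate onto the principal axes."""
    mean, cov = surface_moments(triangles)
    vals, vecs = jacobi_eigh3(cov)
    order = sorted(range(3), key=lambda k: -vals[k])  # largest variance first
    axes = [[vecs[i][k] for i in range(3)] for k in order]

    def to_canonical(p):
        d = _sub(p, mean)
        return [sum(ax[i] * d[i] for i in range(3)) for ax in axes]

    return [[to_canonical(p) for p in tri] for tri in triangles]
```

Integrating over surfaces rather than averaging vertices makes the result independent of how finely a flat region happens to be tessellated, which is the motivation the abstract gives for considering all points in the polygons.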

Journal Article
01 Feb 2001
TL;DR: This work presents a class of embedding methods called quantization index modulation (QIM) that achieve provably good rate-distortion-robustness performance, and introduces a form of postprocessing the authors refer to as distortion compensation that, when combined with QIM, allows capacity to be achieved.
Abstract: Copyright notification and enforcement, authentication, covert communication, and hybrid transmission applications such as digital audio broadcasting are examples of emerging multimedia applications for digital watermarking and information embedding methods, methods for embedding one signal (e.g., the digital watermark) within another "host" signal to form a third, "composite" signal. The embedding is designed to achieve efficient trade-offs among the three conflicting goals of maximizing information-embedding rate, minimizing distortion between the host signal and composite signal, and maximizing the robustness of the embedding. We present a class of embedding methods called quantization index modulation (QIM) that achieve provably good rate-distortion-robustness performance. These methods, and low-complexity realizations of them called dither modulation, are provably better than both previously proposed linear methods of spread spectrum and nonlinear methods of low-bit(s) modulation against square-error distortion-constrained intentional attacks. We also derive information-embedding capacities for the case of a colored Gaussian host signal and additive colored Gaussian noise attacks. These results imply an information embedding capacity of about 1/3 b/s of embedded digital rate for every Hertz of host signal bandwidth and every dB drop in received host signal quality. We show that QIM methods achieve performance within 1.6 dB of capacity, and we introduce a form of postprocessing we refer to as distortion compensation that, when combined with QIM, allows capacity to be achieved. In addition, we show that distortion-compensated QIM is an optimal embedding strategy against some important classes of intentional attacks as well. Finally, we report simulation results that demonstrate the performance of dither modulation realizations that can be implemented with only a few adders and scalar quantizers.

277 citations
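
As a rough illustration of the quantization index modulation idea in the abstract above, the sketch below embeds one bit per host sample with a scalar binary dither modulation scheme: each bit value selects one of two uniform quantizers whose reconstruction points are offset by half a step. The function names, the step size `delta`, and the fixed dither values of plus or minus `delta / 4` are assumptions made for illustration; distortion compensation is not included, and this is not claimed to be the authors' exact construction.

```python
import math

def qim_embed(x, bit, delta=1.0):
    """Quantize host sample x with the quantizer selected by `bit`.

    The two quantizers share step `delta` but are shifted by -delta/4
    (bit 0) and +delta/4 (bit 1), so their points interleave.
    """
    d = -delta / 4.0 if bit == 0 else delta / 4.0
    return delta * math.floor((x - d) / delta + 0.5) + d

def qim_decode(y, delta=1.0):
    """Return the bit whose quantizer has a point closest to y."""
    err0 = abs(y - qim_embed(y, 0, delta))
    err1 = abs(y - qim_embed(y, 1, delta))
    return 0 if err0 <= err1 else 1

if __name__ == "__main__":
    host = [0.37, -1.2, 2.9, 5.05]    # hypothetical host samples
    bits = [1, 0, 1, 1]               # message to embed
    marked = [qim_embed(x, b) for x, b in zip(host, bits)]
    # Perturbations smaller than delta/4 leave the decoded bits unchanged.
    noisy = [m + n for m, n in zip(marked, [0.2, -0.2, 0.1, -0.15])]
    print([qim_decode(y) for y in noisy])
```

The step size controls the trade-off the abstract describes: a larger `delta` tolerates stronger perturbations (up to `delta / 4` per sample in this sketch) at the cost of more embedding distortion (up to `delta / 2` per sample).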

Journal Article
01 Oct 1998
TL;DR: A set of low-level audio features is proposed for characterizing the semantic content of short audio clips, and a neural net classifier using these features successfully separates five types of TV programs.
Abstract: Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features are proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we also can identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.

272 citations
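
The scene-break idea in the abstract above (evaluating the change between feature vectors of adjacent clips) can be sketched as follows. The abstract does not name the features or the change measure, so the Euclidean distance, the fixed `threshold`, and the toy feature values below are all assumptions for illustration only.

```python
import math

def scene_breaks(features, threshold):
    """Return indices i where clip i differs strongly from clip i - 1.

    `features` is a list of equal-length per-clip feature vectors; a break
    is reported when the Euclidean distance between adjacent vectors
    exceeds `threshold`.
    """
    breaks = []
    for i in range(1, len(features)):
        if math.dist(features[i - 1], features[i]) > threshold:
            breaks.append(i)
    return breaks

if __name__ == "__main__":
    # Hypothetical two-dimensional features for four consecutive clips.
    clips = [[0.10, 0.20], [0.12, 0.21], [0.90, 0.80], [0.88, 0.79]]
    print(scene_breaks(clips, threshold=0.5))
```

In practice the threshold would need tuning against labeled data, and features on different scales would need normalizing before distances are compared.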

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    1
2020    114
2019    78
2018    64
2017    148
2016    81