
Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 2003"


Journal Article•DOI•
TL;DR: An overview of the technical features of H.264/AVC is provided, profiles and applications for the standard are described, and the history of the standardization process is outlined.
Abstract: H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.

8,646 citations


Journal Article•DOI•
TL;DR: A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented.
Abstract: A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented. The performance of the various standards is compared by means of PSNR and subjective testing results. The results indicate that H.264/AVC compliant encoders typically achieve essentially the same reproduction quality as encoders that are compliant with the previous standards while typically requiring 60% or less of the bit rate.
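
For reference, PSNR, the objective metric used in the comparison, is straightforward to compute; a minimal sketch for 8-bit frames:

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak * peak / mse)
```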

3,312 citations


Journal Article•DOI•
TL;DR: The redundancy in digital images is explored to achieve very high embedding capacity, and keep the distortion low, in a novel reversible data-embedding method for digital images.
Abstract: Reversible data embedding has drawn lots of interest recently. Being reversible, the original digital content can be completely restored. We present a novel reversible data-embedding method for digital images. We explore the redundancy in digital images to achieve very high embedding capacity and keep the distortion low.
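
The embedding technique this paper introduced is widely known as difference expansion. A minimal sketch of that idea for a single 8-bit pixel pair (the paper additionally maintains a location map for pairs that cannot be expanded without overflow; that bookkeeping is omitted here):

```python
def embed_bit(x: int, y: int, b: int):
    """Difference-expansion embedding of one bit into an 8-bit pixel pair."""
    l = (x + y) // 2          # integer average, preserved by embedding
    h = x - y                 # difference, expanded to carry one bit
    h2 = 2 * h + b
    # Expandability check: the new pair must stay inside [0, 255].
    if abs(h2) > min(2 * (255 - l), 2 * l + 1):
        raise ValueError("pair not expandable; a location map handles this case")
    return l + (h2 + 1) // 2, l - h2 // 2

def extract_bit(x2: int, y2: int):
    """Recover the embedded bit and the original pair exactly."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    b, h = h2 & 1, h2 // 2    # Python floor division also handles negative h2
    return (l + (h + 1) // 2, l - h // 2), b

marked = embed_bit(100, 96, 1)      # -> (103, 94)
print(extract_bit(*marked))         # -> ((100, 96), 1): fully reversible
```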

2,739 citations


Journal Article•DOI•
TL;DR: Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented, and significantly outperforms the baseline entropy coding method of H.264/AVC.
Abstract: Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel low-complexity method for binary arithmetic coding and probability estimation that is well suited for efficient hardware and software implementations. CABAC significantly outperforms the baseline entropy coding method of H.264/AVC for the typical area of envisaged target applications. For a set of test sequences representing typical material used in broadcast applications and for a range of acceptable video quality of about 30 to 38 dB, average bit-rate savings of 9%-14% are achieved.
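
The benefit of context modeling can be illustrated without the standard's table-driven coder. The sketch below is an illustration, not CABAC itself: it measures the ideal arithmetic-code length (sum of -log2 p per bin) with and without context-conditioned adaptive probability estimates, on synthetic bins whose statistics depend on the previous bin:

```python
import math, random

def ideal_bits(bins, contexts):
    """Ideal arithmetic-code length with one adaptive
    Krichevsky-Trofimov probability estimate per context."""
    ones, total, bits = {}, {}, 0.0
    for b, ctx in zip(bins, contexts):
        n1, n = ones.get(ctx, 0), total.get(ctx, 0)
        p1 = (n1 + 0.5) / (n + 1.0)              # adaptive estimate for this context
        bits += -math.log2(p1 if b else 1.0 - p1)
        ones[ctx], total[ctx] = n1 + b, n + 1
    return bits

random.seed(1)
bins, prev = [], 0
for _ in range(20000):
    b = int(random.random() < (0.9 if prev else 0.1))  # strongly context-dependent bins
    bins.append(b); prev = b
ctx_prev = [0] + bins[:-1]                        # context = previous bin
print("single context:", ideal_bits(bins, [0] * len(bins)) / len(bins), "bits/bin")
print("two contexts  :", ideal_bits(bins, ctx_prev) / len(bins), "bits/bin")
```

Running this shows roughly 1 bit/bin without contexts versus about 0.5 bit/bin with them, which is the redundancy-reduction effect the abstract describes.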

1,702 citations


Journal Article•DOI•
P. List, A. Joch, Jani Lainema, G. Bjontegaard, Marta Karczewicz•
TL;DR: The adaptive deblocking filter used in the H.264/MPEG-4 AVC video coding standard performs simple operations to detect and analyze artifacts on coded block boundaries and attenuates those by applying a selected filter.
Abstract: This paper describes the adaptive deblocking filter used in the H.264/MPEG-4 AVC video coding standard. The filter performs simple operations to detect and analyze artifacts on coded block boundaries and attenuates those by applying a selected filter.
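
A simplified sketch of that per-edge analysis and filtering for one sample pair on either side of a block boundary ...p1 p0 | q0 q1... — the threshold values below are illustrative; the standard derives alpha, beta, and the clip bound tc from the quantization parameters and boundary strength:

```python
def filter_edge(p1, p0, q0, q1, alpha=20, beta=3, tc=4):
    """Deblock one boundary sample pair: filter only where the step looks
    like a coding artifact (small local gradients), not a real image edge."""
    if abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0                          # likely a true edge: leave it alone
    delta = (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3
    delta = max(-tc, min(tc, delta))           # clip to limit over-smoothing
    clip255 = lambda v: max(0, min(255, v))
    return clip255(p0 + delta), clip255(q0 - delta)

print(filter_edge(80, 82, 96, 95))   # a small blocky step gets attenuated
```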

884 citations


Journal Article•DOI•
TL;DR: The 4×4 transforms in H.264 can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems and minimizing computational complexity, especially for low-end processors.
Abstract: This paper presents an overview of the transform and quantization designs in H.264. Unlike the popular 8×8 discrete cosine transform used in previous standards, the 4×4 transforms in H.264 can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems. The new transforms can also be computed without multiplications, just additions and shifts, in 16-bit arithmetic, thus minimizing computational complexity, especially for low-end processors. By using short tables, the new quantization formulas use multiplications but avoid divisions.
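
As a concrete illustration, the 4×4 forward core transform Y = C X Cᵀ can be evaluated with a butterfly using only additions and shifts; a sketch (the scaling factors are folded into quantization in the standard and omitted here):

```python
import numpy as np

def h264_forward_4x4(X):
    """4x4 H.264 core transform Y = C X C^T with adds and shifts only,
    where C = [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]]."""
    X = np.asarray(X, dtype=np.int32)

    def transform_rows(M):
        # Apply C to every row of M (result = M @ C.T) via a butterfly.
        a, b, c, d = M[:, 0], M[:, 1], M[:, 2], M[:, 3]
        s0, s1 = a + d, b + c
        s2, s3 = b - c, a - d
        return np.stack([s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)], axis=1)

    # Rows, transpose, rows again, transpose back: yields C @ X @ C.T.
    return transform_rows(transform_rows(X).T).T

# Sanity check against the plain matrix product.
C = np.array([[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]])
X = np.random.randint(-128, 128, (4, 4))
assert np.array_equal(h264_forward_4x4(X), C @ X @ C.T)
```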

726 citations


Journal Article•DOI•
TL;DR: The paper describes the use of H.264 coded video over best-effort IP networks, using RTP as the real-time transport protocol.
Abstract: H.264 is the ITU-T's new, non-backward-compatible video compression Recommendation that significantly outperforms all previous video compression standards. It consists of a video coding layer (VCL), which performs all the classic signal processing tasks and generates bit strings containing coded macroblocks, and a network adaptation layer (NAL), which adapts those bit strings in a network-friendly way. The paper describes the use of H.264 coded video over best-effort IP networks, using RTP as the real-time transport protocol. After a description of the environment, the error-resilience tools of H.264 and the draft specification of the RTP payload format are introduced. Next, the performance of several possible VCL- and NAL-based error-resilience tools of H.264 is verified in simulations.

664 citations


Journal Article•DOI•
TL;DR: An overview of the tools likely to be used in wireless environments is provided, and the most challenging application, wireless conversational services, is discussed in greater detail.
Abstract: Video transmission in wireless environments is a challenging task calling for high compression efficiency as well as a network-friendly design. Both have been major goals of the H.264/AVC standardization effort addressing "conversational" (i.e., video telephony) and "nonconversational" (i.e., storage, broadcast, or streaming) applications. The video coding layer of H.264/AVC typically provides a significant improvement in compression performance. The network-friendly design goal of H.264/AVC is addressed via the network abstraction layer, which has been developed to transport the coded video data over any existing and future networks, including wireless systems. The main objective of this paper is to provide an overview of the tools likely to be used in wireless environments and to discuss the most challenging application, wireless conversational services, in greater detail. Appropriate justifications for the application of different tools, based on experimental results, are presented.

596 citations


Journal Article•DOI•
TL;DR: This work proposes a content-based soft annotation (CBSA) procedure for providing images with semantic labels, and experiments with two learning methods, support vector machines (SVMs) and Bayes point machines (BPMs), to select a base binary classifier for CBSA.
Abstract: We propose a content-based soft annotation (CBSA) procedure for providing images with semantic labels. The annotation procedure starts with labeling a small set of training images, each with one single semantic label (e.g., forest, animal, or sky). An ensemble of binary classifiers is then trained for predicting label membership for images. The trained ensemble is applied to each individual image to give the image multiple soft labels, and each label is associated with a label membership factor. To select a base binary classifier for CBSA, we experiment with two learning methods, support vector machines (SVMs) and Bayes point machines (BPMs), and compare their class-prediction accuracy. Our empirical study on a 116-category 25K-image set shows that the BPM-based ensemble provides better annotation quality than the SVM-based ensemble for supporting multimodal image retrieval.
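
A minimal sketch of the one-binary-classifier-per-label ensemble using scikit-learn; the synthetic feature vectors, class count, and RBF kernel below are stand-ins for the paper's real image features and tuned classifiers:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy stand-in for labeled training images: 200 feature vectors, 4 classes
# (think forest/animal/sky/water); real CBSA uses extracted image features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16)) + np.repeat(np.arange(4), 50)[:, None] * 0.8
y = np.repeat(np.arange(4), 50)

# One binary SVM per label; predict_proba yields a membership factor per
# label for each image, i.e., the "soft" multi-label annotation.
ensemble = OneVsRestClassifier(SVC(kernel="rbf", probability=True)).fit(X, y)
soft_labels = ensemble.predict_proba(X[:2])
print(np.round(soft_labels, 2))   # one membership factor per label per image
```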

479 citations


Journal Article•DOI•
TL;DR: This work studies and analyzes the computational complexity of a software-based H.264/AVC (advanced video coding) baseline profile decoder, determining the number of basic computational operations required by a decoder to perform the key decoding subfunctions and evaluating the dependence of the time complexity of each of the major decoder subfunctions on encoder characteristics, content, resolution, and bit rate.
Abstract: We study and analyze the computational complexity of a software-based H.264/AVC (advanced video coding) baseline profile decoder. Our analysis is based on determining the number of basic computational operations required by a decoder to perform the key decoding subfunctions. The frequency of use of each of the required decoding subfunctions is empirically derived using bitstreams generated from two different encoders for a variety of content, resolutions, and bit rates. Using the measured frequencies, estimates of the decoder time complexity for various hardware platforms can be determined. A detailed example is provided to assist in deriving time complexity estimates. We compare the resulting estimates to numbers measured for an optimized decoder on the Pentium III hardware platform. We then use those numbers to evaluate the dependence of the time complexity of each of the major decoder subfunctions on encoder characteristics, content, resolution, and bit rate. Finally, we compare an H.264/AVC-compliant baseline decoder to a decoder that is compliant with the H.263 standard, which is currently dominant in interactive video applications. Both C-only decoder implementations were compared on a Pentium III hardware platform. Our results indicate that an H.264/AVC baseline decoder is approximately 2.5 times more time complex than an H.263 baseline decoder.
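
The estimation methodology reduces to multiplying per-subfunction operation counts by their measured frequencies and dividing by processor throughput. A sketch with hypothetical numbers (these are illustrative placeholders, not the paper's measurements):

```python
# Hypothetical per-frame use frequencies and basic-op counts for a few
# decoder subfunctions (illustrative only).
subfunctions = {
    # name: (invocations per frame, basic ops per invocation)
    "entropy_decode":    (1584, 210),
    "inverse_transform": (1584, 80),
    "interpolation":     (1100, 350),
    "deblocking":        (1584, 120),
}

def decode_time_ms(clock_hz: float, ops_per_cycle: float = 1.0) -> float:
    """Map operation counts to a per-frame time estimate for a target CPU."""
    total_ops = sum(freq * ops for freq, ops in subfunctions.values())
    return 1e3 * total_ops / (clock_hz * ops_per_cycle)

print(f"{decode_time_ms(600e6):.2f} ms/frame on a 600 MHz, 1 op/cycle core")
```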

424 citations


Journal Article•DOI•
TL;DR: A blind discrete wavelet transform-discrete Fourier transform (DWT-DFT) composite image watermarking algorithm that is robust against both affine transformation and JPEG compression is proposed.
Abstract: Robustness is a crucially important issue in watermarking. Robustness against geometric distortion and JPEG compression at the same time with blind extraction remains especially challenging. A blind discrete wavelet transform-discrete Fourier transform (DWT-DFT) composite image watermarking algorithm that is robust against both affine transformation and JPEG compression is proposed. The algorithm improves robustness by using a new embedding strategy, watermark structure, 2D interleaving, and synchronization technique. A spread-spectrum-based informative watermark with a training sequence is embedded in the coefficients of the LL subband in the DWT domain while a template is embedded in the middle frequency components in the DFT domain. In watermark extraction, we first detect the template in a possibly corrupted watermarked image to obtain the parameters of an affine transform and convert the image back to its original shape. Then, we perform translation registration using the training sequence embedded in the DWT domain, and, finally, extract the informative watermark. Experimental work demonstrates that the proposed algorithm generates a more robust watermark than other reported watermarking algorithms. Specifically it is robust simultaneously against almost all affine transform related testing functions in StirMark 3.1 and JPEG compression with quality factor as low as 10. While the approach is presented for gray-level images, it can also be applied to color images and video sequences.

Journal Article•DOI•
TL;DR: This paper proposes an effective color filter array (CFA) interpolation method for digital still cameras (DSCs) using a simple image model that correlates the R, G, and B channels, and shows that the frequency response of the proposed method is better than that of conventional methods.
Abstract: We propose an effective color filter array (CFA) interpolation method for digital still cameras (DSCs) using a simple image model that correlates the R, G, and B channels. In this model, we define the constants K_R as green minus red and K_B as green minus blue. For real-world images, the contrasts of K_R and K_B are quite flat over a small region, and this property is suitable for interpolation. The main contribution of this paper is a low-complexity interpolation method that improves the image quality. We show that the frequency response of the proposed method is better than that of conventional methods. Simulation results also verify that the proposed method obtains superior image quality on typical images. The luminance channel of the proposed method outperforms the bilinear method by 6.34 dB in peak SNR, and the chrominance channels show a 7.69-dB peak signal-to-noise ratio improvement on average. Furthermore, the complexity of the proposed method is comparable to conventional bilinear interpolation: it requires only add and shift operations to implement.
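
A minimal sketch of the model: compute K_R at the measured red sites, spread it under the local-constancy assumption, and subtract it from green. The Bayer layout (red at even rows/columns), the pre-interpolated green plane, and the nearest-neighbour spreading below are simplifying assumptions; the paper interpolates K_R bilinearly, which is equally cheap:

```python
import numpy as np

def interpolate_red(G: np.ndarray, bayer: np.ndarray) -> np.ndarray:
    """Fill in the red plane using the model K_R = G - R, assumed locally
    constant. G is an already-interpolated green plane; bayer holds the raw
    CFA samples with red measured at even rows and columns."""
    K_sub = G[0::2, 0::2] - bayer[0::2, 0::2]        # K_R at measured red sites
    # Nearest-neighbour spread of K_R (bilinear in the paper).
    K_full = np.kron(K_sub, np.ones((2, 2)))[: G.shape[0], : G.shape[1]]
    return np.clip(G - K_full, 0, 255)               # R = G - K_R
```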

Journal Article•DOI•
Marta Karczewicz, R. Kurceren•
TL;DR: It is shown that SP-frames have significantly better coding efficiency than I-frames while providing similar functionalities.
Abstract: This paper discusses two new frame types, SP-frames and SI-frames, defined in the emerging video coding standard, known as ITU-T Rec. H.264 or ISO/IEC MPEG-4/Part 10-AVC. The main feature of SP-frames is that identical SP-frames can be reconstructed even when different reference frames are used for their prediction. This property allows them to replace I-frames in applications such as splicing, random access, and error recovery/resilience. We also include a description of SI-frames, which are used in conjunction with SP-frames. Finally, simulation results illustrating the coding efficiency of SP-frames are provided. It is shown that SP-frames have significantly better coding efficiency than I-frames while providing similar functionalities.

Journal Article•DOI•
Hyungshin Kim, Heung-Kyu Lee•
TL;DR: A robust image watermark based on an invariant image feature vector is introduced; normalized Zernike moments of an image are used as the vector, and the watermark is robust with respect to geometrical distortions and compression.
Abstract: The paper introduces a robust image watermark based on an invariant image feature vector. Normalized Zernike moments of an image are used as the vector. The watermark is generated by modifying the vector. The watermark signal is designed with Zernike moments. The signal is added to the cover image in the spatial domain after the reconstruction process. We extract the feature vector from the modified image and use it as the watermark. The watermark is detected by comparing the computed Zernike moments of the test image and the given watermark vector. Rotation invariance is achieved by taking the magnitude of the Zernike moments. An image normalization method is used for scale and translation invariance. The robustness of the proposed method is demonstrated and tested using StirMark 3.1. The test results show that our watermark is robust with respect to geometrical distortions and compression.

Journal Article•DOI•
TL;DR: The techniques for image-based rendering (IBR), which render novel views directly from input images rather than from known 3-D scene geometry, are surveyed, and the issues in trading off the use of images and geometry are explored by revisiting plenoptic-sampling analysis and the notions of view dependency and geometric proxies.
Abstract: We survey the techniques for image-based rendering (IBR) and for compressing image-based representations. Unlike traditional three-dimensional (3-D) computer graphics, in which the 3-D geometry of the scene is known, IBR techniques render novel views directly from input images. IBR techniques can be classified into three categories according to how much geometric information is used: rendering without geometry, rendering with implicit geometry (i.e., correspondence), and rendering with explicit geometry (either approximate or accurate). We discuss the characteristics of these categories and their representative techniques. IBR techniques demonstrate a surprisingly diverse range in their extent of use of images and geometry in representing 3-D scenes. We explore the issues in trading off the use of images and geometry by revisiting plenoptic-sampling analysis and the notions of view dependency and geometric proxies. Finally, we highlight compression techniques specifically designed for image-based representations. Such compression techniques are important in making IBR techniques practical.

Journal Article•DOI•
TL;DR: It is demonstrated how the quality of the B pictures should be reduced to improve the overall rate-distortion performance of the scalable representation and shown that the gains by multihypothesis prediction and arithmetic coding are additive.
Abstract: This paper reviews recent advances in using B pictures in the context of the draft H.264/AVC video-compression standard. We focus on reference picture selection and linearly combined motion-compensated prediction signals. We show that bidirectional prediction only partially exploits the efficiency of combined prediction signals, whereas multihypothesis prediction allows a more general form of B pictures. The general concept of linearly combined prediction signals chosen from an arbitrary set of reference pictures improves the H.264/AVC test model TML-9, which is used in the following. We outline H.264/AVC macroblock prediction modes for B pictures, classify them into four groups, and compare their efficiency in terms of rate-distortion performance. When investigating multihypothesis prediction, we show that bidirectional prediction is a special case of this concept. Multihypothesis prediction also allows two combined forward prediction signals. Experimental results show that this case is also advantageous in terms of compression efficiency. The draft H.264/AVC video-compression standard offers improved entropy coding by context-based adaptive binary arithmetic coding. Simulations show that the gains by multihypothesis prediction and arithmetic coding are additive. B pictures establish an enhancement layer and are predicted from reference pictures that are provided by the base layer. The quality of the base layer influences the rate-distortion trade-off for B pictures. We demonstrate how the quality of the B pictures should be reduced to improve the overall rate-distortion performance of the scalable representation.

Journal Article•DOI•
TL;DR: To reduce the bit rate of video signals, standardized coding techniques apply motion-compensated prediction in combination with transform coding of the prediction error; in H.264/AVC, a Wiener interpolation filter and 1/4-pel displacement vector resolution are applied.
Abstract: In order to reduce the bit rate of video signals, the standardized coding techniques apply motion-compensated prediction in combination with transform coding of the prediction error. By mathematical analysis, it is shown that aliasing components deteriorate the prediction efficiency. In order to compensate for the aliasing, two-dimensional (2-D) and three-dimensional interpolation filters are developed. As a result, motion- and aliasing-compensated prediction with 1/4-pel displacement vector resolution and a separable 2-D Wiener interpolation filter provides a coding gain of up to 2 dB when compared to the 1/2-pel displacement vector resolution used in H.263 or MPEG-2. An additional coding gain of 1 dB can be obtained with 1/8-pel displacement vector resolution when compared to 1/4-pel displacement vector resolution. As a consequence of the significantly improved coding efficiency, a Wiener interpolation filter and 1/4-pel displacement vector resolution are applied in H.264/AVC and in MPEG-4 (advanced simple profile).
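
The separable interpolation filter adopted in H.264/AVC for half-pel positions has taps (1, -5, 20, 20, -5, 1)/32; a one-dimensional sketch (in the standard, quarter-pel samples are then obtained by averaging integer- and half-pel neighbours):

```python
import numpy as np

# Six-tap Wiener interpolation filter for half-pel positions in H.264/AVC.
TAPS = np.array([1, -5, 20, 20, -5, 1], dtype=np.float64) / 32.0

def half_pel_row(row: np.ndarray) -> np.ndarray:
    """Half-pel samples between neighbouring pixels of a 1-D signal row."""
    padded = np.pad(row.astype(np.float64), (2, 3), mode="edge")
    half = np.convolve(padded, TAPS, mode="valid")   # TAPS is symmetric
    return np.clip(np.rint(half), 0, 255)

row = np.array([10, 12, 40, 90, 120, 118, 60])
print(half_pel_row(row))   # one half-pel sample per integer position
```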

Journal Article•DOI•
TL;DR: A triangle model of perceived motion energy (PME) is proposed to model motion patterns in video and a scheme to extract key frames based on this model is proposed and the extracted key frames are representative.
Abstract: The key frame is a simple yet effective form of summarizing a long video sequence. The number of key frames used to abstract a shot should reflect the visual content complexity within the shot, and the placement of key frames should represent the most salient visual content. Motion is the most salient feature in presenting actions or events in video and, thus, should be the feature used to determine key frames. We propose a triangle model of perceived motion energy (PME) to model motion patterns in video and a scheme to extract key frames based on this model. The frames at the turning point of motion acceleration and motion deceleration are selected as key frames. The key-frame selection process is threshold-free and fast, and the extracted key frames are representative.
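
A simplified stand-in for the triangle model: track a per-frame motion-energy signal and pick the frames where acceleration turns into deceleration. Mean absolute frame difference replaces the paper's motion-vector-based PME here, purely for illustration:

```python
import numpy as np

def key_frames(frames):
    """Return indices of local peaks of a motion-energy signal, a simplified
    stand-in for the paper's perceived-motion-energy triangles. e[i] is the
    energy between frames i and i+1 (the paper derives PME from MPEG
    motion vectors; frame differences are used here for brevity)."""
    e = np.array([np.mean(np.abs(frames[i + 1].astype(float) - frames[i]))
                  for i in range(len(frames) - 1)])
    # A turning point: energy stops accelerating and starts decelerating.
    return [i for i in range(1, len(e) - 1) if e[i - 1] < e[i] >= e[i + 1]]
```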

Journal Article•DOI•
TL;DR: This work experiments with using spectral methods to infer a semantic space from users' relevance feedback, so that the system will gradually improve its retrieval performance through accumulated user interactions.
Abstract: As current methods for content-based retrieval are incapable of capturing the semantics of images, we experiment with using spectral methods to infer a semantic space from users' relevance feedback, so that our system will gradually improve its retrieval performance through accumulated user interactions. In addition to the long-term learning process, we also model the traditional approaches to query refinement using relevance feedback as a short-term learning process. The proposed short- and long-term learning frameworks have been integrated into an image retrieval system. Experimental results on a large collection of images have shown the effectiveness and robustness of our proposed algorithms.

Journal Article•DOI•
Mathias Wien•
TL;DR: Simulation results reveal overall rate savings of up to 12% and gains of up to 0.9 dB in peak signal-to-noise ratio.
Abstract: A concept for variable block-size transform coding is presented. It is called adaptive block-size transforms (ABT) and was proposed for coding of high-resolution and interlaced video in the emerging video coding standard H.264/AVC. The basic idea of inter ABT is to align the block size used for transform coding of the prediction error with the block size used for motion compensation. Intra ABT employs variable block-size prediction and transforms for encoding. With ABT, the maximum feasible signal length is exploited for transform coding. Simulation results reveal overall rate savings of up to 12% and gains of up to 0.9 dB in peak signal-to-noise ratio.

Journal Article•DOI•
TL;DR: This work presents a detailed analysis and a dedicated hardware architecture of the block-coding engine to execute the EBCOT algorithm efficiently, and shows that about 60% of the processing time is saved compared with a straightforward sample-based implementation.
Abstract: Embedded block coding with optimized truncation (EBCOT) is the most important technology in the latest image-coding standard, JPEG 2000. The hardware design of the block-coding engine in EBCOT is critical because the operations are bit-level processing and occupy more than half of the computation time of the whole compression process. A general-purpose processor (GPP) is, therefore, very inefficient at processing these operations. We present a detailed analysis and a dedicated hardware architecture of the block-coding engine to execute the EBCOT algorithm efficiently. The context formation process in EBCOT is analyzed to gain insight into the characteristics of the operation. A column-based architecture and two speed-up methods, sample skipping (SS) and group-of-column skipping (GOCS), are then proposed for context generation. As for the arithmetic encoder design, pipeline and look-ahead techniques are used to speed up the processing. It is shown that about 60% of the processing time is saved compared with a straightforward sample-based implementation. A test chip is designed, and the simulation results show that it can process a 4.6-million-pixel image within 1 s, corresponding to a 2400 × 1800 image, or a CIF (352 × 288) 4:2:0 video sequence at 30 frames per second, at a 50-MHz working frequency.

Journal Article•DOI•
TL;DR: A class of randomized algorithms to estimate Voronoi video similarity (VVS) by first summarizing each video with a small set of its sampled frames, called the video signature (ViSig), and then calculating the distances between corresponding frames from the two ViSigs are proposed.
Abstract: The proliferation of video content on the Web makes similarity detection an indispensable tool in Web data management, searching, and navigation. We propose a number of algorithms to efficiently measure video similarity. We define video as a set of frames, which are represented as high dimensional vectors in a feature space. Our goal is to measure ideal video similarity (IVS), defined as the percentage of clusters of similar frames shared between two video sequences. Since IVS is too complex to be deployed in large database applications, we approximate it with Voronoi video similarity (VVS), defined as the volume of the intersection between Voronoi cells of similar clusters. We propose a class of randomized algorithms to estimate VVS by first summarizing each video with a small set of its sampled frames, called the video signature (ViSig), and then calculating the distances between corresponding frames from the two ViSigs. By generating samples with a probability distribution that describes the video statistics, and ranking them based upon their likelihood of making an error in the estimation, we show analytically that ViSig can provide an unbiased estimate of IVS. Experimental results on a large dataset of Web video and a set of MPEG-7 test sequences with artificially generated similar versions are provided to demonstrate the retrieval performance of our proposed techniques.
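
A minimal sketch of the signature idea: summarize each video by the frames closest to a shared set of random seed vectors, then compare corresponding signature frames. Feature extraction and the paper's likelihood-based ranking step are omitted here:

```python
import numpy as np

def visig(frame_features: np.ndarray, seeds: np.ndarray) -> np.ndarray:
    """Video signature: for each seed vector, keep the closest frame
    feature (a simplified form of the paper's ViSig sampling)."""
    d = np.linalg.norm(frame_features[:, None, :] - seeds[None, :, :], axis=2)
    return frame_features[np.argmin(d, axis=0)]

def visig_similarity(sig_a, sig_b, eps: float) -> float:
    """Fraction of corresponding signature frames closer than eps,
    an estimate of the videos' similarity."""
    return float(np.mean(np.linalg.norm(sig_a - sig_b, axis=1) < eps))

rng = np.random.default_rng(0)
seeds = rng.normal(size=(16, 8))            # seed vectors shared by all videos
video_a = rng.normal(size=(300, 8))         # toy per-frame feature vectors
video_b = video_a + rng.normal(scale=0.05, size=video_a.shape)  # near-duplicate
print(visig_similarity(visig(video_a, seeds), visig(video_b, seeds), eps=0.5))
```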

Journal Article•DOI•
TL;DR: Based on log-polar mapping (LPM) and phase correlation, the paper presents a novel digital image watermarking scheme that is invariant to rotation, scaling, and translation (RST).
Abstract: Based on log-polar mapping (LPM) and phase correlation, the paper presents a novel digital image watermarking scheme that is invariant to rotation, scaling, and translation (RST). We embed a watermark in the LPMs of the Fourier magnitude spectrum of an original image, and use the phase correlation between the LPM of the original image and the LPM of the watermarked image to calculate the displacement of watermark positions in the LPM domain. The scheme preserves the image quality by avoiding computing the inverse log-polar mapping (ILPM), and produces smaller correlation coefficients for unwatermarked images by using phase correlation to avoid exhaustive search. The evaluations demonstrate that the scheme is invariant to rotation and translation, invariant to scaling when the scale is in a reasonable range, and very robust to JPEG compression.
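
The registration step relies on standard phase correlation; a sketch of that building block, which the paper applies in the log-polar domain to recover the watermark positions:

```python
import numpy as np

def phase_correlation(a: np.ndarray, b: np.ndarray):
    """Translation between two images from the peak of the normalized
    cross-power spectrum."""
    F, G = np.fft.fft2(a), np.fft.fft2(b)
    cross = F * np.conj(G)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx    # shifts are modulo the image size

a = np.zeros((64, 64)); a[20:30, 12:22] = 1.0
b = np.roll(np.roll(a, 5, axis=0), 9, axis=1)
print(phase_correlation(b, a))   # -> (5, 9)
```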

Journal Article•DOI•
TL;DR: This paper presents a new HRD for H.264/AVC that is more general and flexible than those defined in prior standards and provides significant additional benefits.
Abstract: In video coding standards, a compliant bit stream must be decoded by a hypothetical decoder that is conceptually connected to the output of an encoder and consists of a decoder buffer, a decoder, and a display unit. This virtual decoder is known as the hypothetical reference decoder (HRD) in H.263 and the video buffering verifier in MPEG. The encoder must create a bit stream so that the hypothetical decoder buffer does not overflow or underflow. These previous decoder models assume that a given bit stream will be transmitted through a channel of a known bit rate and will be decoded (after a given buffering delay) by a device of some given buffer size. Therefore, these models are quite rigid and do not address the requirements of many of today's important video applications such as broadcasting video live or streaming pre-encoded video on demand over network paths with various peak bit rates to devices with various buffer sizes. In this paper, we present a new HRD for H.264/AVC that is more general and flexible than those defined in prior standards and provides significant additional benefits.
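
The underlying mechanism is a leaky-bucket buffer model. A simplified sketch that checks a single (rate, buffer size, initial fullness) operating point; the key generalization of the new HRD is that one bit stream may declare conformance to many such points:

```python
def check_buffer(frame_bits, rate_bps, fps, buffer_bits, initial_bits):
    """Leaky-bucket style check: the decoder buffer fills at the channel
    rate and drains by one coded frame per frame period; it must neither
    underflow nor overflow. (Simplified; timing details differ in H.264.)"""
    fullness = initial_bits
    per_frame = rate_bps / fps                # bits arriving per frame period
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            return f"underflow at frame {i}"
        fullness = min(fullness - bits + per_frame, buffer_bits)  # clip overflow
    return "conforms"

print(check_buffer([40_000, 8_000, 8_000, 60_000],
                   rate_bps=1_000_000, fps=30,
                   buffer_bits=120_000, initial_bits=50_000))
```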

Journal Article•DOI•
TL;DR: Experimental results show that embedded watermarks using the proposed techniques can give good image quality and are robust in varying degree to JPEG compression, low-pass filtering, noise contamination, and print-and-scan.
Abstract: Three novel blind watermarking techniques are proposed to embed watermarks into digital images for different purposes. The watermarks are designed to be decoded or detected without the original images. The first one, called single watermark embedding (SWE), is used to embed a watermark bit sequence into digital images using two secret keys. The second technique, called multiple watermark embedding (MWE), extends SWE to embed multiple watermarks simultaneously in the same watermark space while minimizing the watermark (distortion) energy. The third technique, called iterative watermark embedding (IWE), embeds watermarks into JPEG-compressed images. The iterative approach of IWE can prevent the potential removal of a watermark in the JPEG recompression process. Experimental results show that embedded watermarks using the proposed techniques can give good image quality and are robust in varying degree to JPEG compression, low-pass filtering, noise contamination, and print-and-scan.
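
For context, blind decoding in this family of schemes typically reduces to correlating the received image against a key-seeded pattern. A generic additive spread-spectrum sketch, not the paper's exact SWE construction (the embedding strength is exaggerated for this toy demo; real schemes shape it perceptually):

```python
import numpy as np

def embed(x: np.ndarray, bit: int, key: int, alpha: float = 5.0):
    """Blind additive spread-spectrum: add a key-seeded +/-1 pattern."""
    w = np.random.default_rng(key).choice([-1.0, 1.0], size=x.shape)
    return x + alpha * (1 if bit else -1) * w

def detect(y: np.ndarray, key: int) -> int:
    """Decode without the original image: correlate against the pattern."""
    w = np.random.default_rng(key).choice([-1.0, 1.0], size=y.shape)
    return int(np.mean(y * w) > 0)

img = np.random.default_rng(7).uniform(0, 255, size=(64, 64))
print(detect(embed(img, bit=1, key=1234), key=1234))   # -> 1
```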

Journal Article•DOI•
TL;DR: The proposed image-sharing method has several characteristics: fast transmission among branches; fault tolerance; a secure storage system; reduced chance of pirating of high-quality images; and, most importantly, the provision to each branch manager of an easy-to-manage environment.
Abstract: This study presents a user-friendly image-sharing method for easier management of the shadow images. The sharing of images among several branches (distributed disks) using the proposed method has several characteristics: 1) fast transmission among branches; 2) fault tolerance; 3) a secure storage system; 4) reduced chance of pirating of high-quality images; and 5) most importantly, the provision to each branch manager of an easy-to-manage environment (because each shadow image looks like a shrunken version of the original image). The current approach retains the small-size and channel-independent properties of our previous work; namely, the size of each shadow image is only 1/r of that of the original image, and any r shadow images can be used for restoration (the restored image is independent of which r shadow images are used).
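
A sketch of polynomial (r, n) sharing in the style of the authors' previous work referenced in the abstract: each block of r pixels becomes the coefficients of a degree-(r-1) polynomial over GF(251), so every shadow is 1/r the size of the original, and any r shadows reconstruct exactly. The "user-friendly" shrunken-preview arrangement that this paper adds is omitted:

```python
P = 251  # largest prime below 256; pixel values are clipped to 0..250

def make_shadows(pixels, r, n):
    """Shadow j stores f(j+1) for one polynomial per block of r pixels."""
    shadows = [[] for _ in range(n)]
    for i in range(0, len(pixels), r):
        coeffs = [min(v, P - 1) for v in pixels[i:i + r]]
        for j in range(n):
            shadows[j].append(sum(c * pow(j + 1, k, P)
                                  for k, c in enumerate(coeffs)) % P)
    return shadows

def reconstruct(xs, shadow_values, r):
    """Recover the pixels from any r shadows (xs are their ids j+1)."""
    out = []
    for ys in zip(*shadow_values):                   # one sample set per block
        A = [[pow(x, k, P) for k in range(r)] for x in xs]
        out.extend(solve_mod(A, list(ys)))           # coefficients = pixels
    return out

def solve_mod(A, b):
    """Gaussian elimination over GF(P)."""
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % P)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], P - 2, P)                 # Fermat inverse
        M[c] = [v * inv % P for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(u - f * w) % P for u, w in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

pixels = [10, 200, 37, 99, 250, 5]
sh = make_shadows(pixels, r=3, n=5)
print(reconstruct([2, 4, 5], [sh[1], sh[3], sh[4]], r=3) == pixels)  # True
```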

Journal Article•DOI•
Jane Hunter•
TL;DR: The ABC model's ability to mediate and integrate between multimedia metadata vocabularies is evaluated by illustrating how it can provide the foundation to facilitate semantic interoperability between MPEG-7, MPEG-21, and other domain-specific metadata vocabularies.
Abstract: A core ontology is one of the key building blocks necessary to enable the scalable assimilation of information from diverse multimedia sources. A complete and extensible ontology that expresses the basic concepts that are common across a variety of domains and media types and that can provide the basis for specialization into domain-specific concepts and vocabularies, is essential for well-defined mappings between domain-specific knowledge representations (i.e., metadata vocabularies) and the subsequent building of a variety of services such as cross-domain searching, tracking, browsing, data mining and knowledge acquisition. As more and more communities develop metadata application profiles which combine terms from multiple vocabularies (e.g., Dublin Core, MPEG-7, MPEG-21, CIDOC/CRM, FGDC, IMS), a core ontology will provide a common understanding of the basic entities and relationships, which is essential for semantic interoperability and the development of additional services based on deductive inferencing. In this paper, we first propose such a core ontology (the ABC model) which was developed in response to a need to integrate information from multiple genres of multimedia content within digital libraries and archives. Although the MPEG-21 RDD was influenced by the ABC model and is based on a model extremely similar to ABC, we believe that it is important to define a separate and domain-independent top-level extensible ontology for scenarios in which either MPEG-21 is irrelevant or to enable the attachment of ontologies from communities external to MPEG, for example, the museum domain (CIDOC/CRM) or the biomedical domain (ON9.3). We evaluate the ABC model's ability to mediate and integrate between multimedia metadata vocabularies by illustrating how it can provide the foundation to facilitate semantic interoperability between MPEG-7, MPEG-21 and other domain-specific metadata vocabularies. By expressing the semantics of both MPEG-7 and MPEG-21 metadata terms in RDF Schema/DAML+OIL [and eventually the Web Ontology Language (OWL)] and attaching the MPEG-7 and MPEG-21 class and property hierarchies to the appropriate top-level classes and properties of the ABC model, we have defined a single distributed machine-understandable ontology. The resulting ontology provides semantic knowledge which is nonexistent within declarative XML schemas or XML-encoded metadata descriptions. Finally, in order to illustrate how such an ontology will contribute to the interoperability of data and services across the entire multimedia content delivery chain, we describe a number of valuable services which have been developed or could potentially be developed using the resulting merged ontologies.

Journal Article•DOI•
TL;DR: A new scaling algorithm, winscale, is proposed, which performs the scale up/down transform using an area pixel model rather than a point pixel model; it is shown that winscale has good scaling properties with low complexity.
Abstract: We propose a new scaling algorithm, winscale, which performs the scale up/down transform using an area pixel model rather than a point pixel model. The proposed algorithm has low complexity: it uses a maximum of four pixels of the original image to calculate one pixel of the scaled image. Nevertheless, the algorithm has good characteristics such as fine-edge preservation and changeable smoothness. We implemented a hardware design of winscale using an FPGA and displayed some test scenes on a liquid crystal display panel using a digital visual interface. The hardware cost and the image quality were compared with those of conventional image scaling algorithms. It is shown that winscale has good scaling properties with low complexity. Winscale can be used in various digital display devices that need image scaling, especially in applications that require good image quality with low hardware cost.
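
A sketch of the area-pixel idea for the upscaling case: each output pixel's footprint in input coordinates overlaps at most four input pixels, which are averaged with overlap-area weights. This is a simplification of the paper's windowed formulation, with boundary handling reduced to clamping:

```python
import numpy as np

def winscale_up(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Area-pixel upscaling: average the (at most four) input pixels each
    output pixel's footprint overlaps, weighted by overlap area."""
    in_h, in_w = img.shape
    sy, sx = in_h / out_h, in_w / out_w          # footprint size (< 1 upscaling)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y0, x0 = i * sy, j * sx              # footprint corner in input space
            r, c = int(y0), int(x0)              # top-left input cell it touches
            fy, fx = y0 - r, x0 - c
            # Overlap of the sy-by-sx window with the four candidate cells.
            wy = [min(1 - fy, sy), max(0.0, fy + sy - 1)]
            wx = [min(1 - fx, sx), max(0.0, fx + sx - 1)]
            acc = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    if wy[dy] and wx[dx]:
                        acc += wy[dy] * wx[dx] * img[min(r + dy, in_h - 1),
                                                     min(c + dx, in_w - 1)]
            out[i, j] = acc / (sy * sx)          # normalize by footprint area
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
print(winscale_up(img, 8, 8).round(1))
```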

Journal Article•DOI•
TL;DR: A new method to parameterize the spectral analysis problem for IBR, applicable to general-purpose IBR spectral analysis, is presented; it is shown that, with the "truncating windows" analysis and some conclusions obtained with the SPF, the spectrum expansion caused by non-Lambertian reflections and occlusions can be quantitatively estimated, even when the scene geometry is not explicitly known.
Abstract: Image-based rendering (IBR) has become a very active research area in recent years. The spectral analysis problem for IBR has not been completely solved. In this paper, we present a new method to parameterize the problem, which is applicable for general-purpose IBR spectral analysis. We notice that any plenoptic function is generated by light rays emitted/reflected/refracted from the object surface. We introduce the surface plenoptic function (SPF), which represents the light rays starting from the object surface. Given that radiance along a light ray does not change unless the light ray is blocked, the SPF reduces the dimension of the original plenoptic function to 6D. We are then able to map or transform the SPF to IBR representations captured along any camera trajectory. Assuming some properties of the SPF, we can analyze the properties of IBR for generic scenes, such as scenes with Lambertian or non-Lambertian surfaces and scenes with or without occlusions, and for different sampling strategies such as lightfield/concentric mosaics. We find that, in most cases, even though the SPF may be band-limited, the frequency spectrum of IBR is not band-limited. We show that non-Lambertian reflections, depth variations, and occlusions can all broaden the spectrum, with the latter two being more significant. The SPF is defined for scenes with known geometry. When the geometry is unknown, spectral analysis is still possible. We show that, with the "truncating windows" analysis and some conclusions obtained with the SPF, the spectrum expansion caused by non-Lambertian reflections and occlusions can be quantitatively estimated, even when the scene geometry is not explicitly known. Given the spectrum of IBR, we also study how to sample IBR data more efficiently. Our analysis is based on generalized periodic sampling theory with arbitrary geometry. We show that the sampling efficiency can be up to twice that of rectangular sampling. The advantages and disadvantages of generalized periodic sampling for IBR are also discussed.