
Showing papers in "Signal Processing: Image Communication" in 2004


Journal ArticleDOI
TL;DR: A new philosophy in designing image and video quality metrics is followed, which uses structural distortion as an estimate of perceived visual distortion, as part of full-reference (FR) video quality assessment.
Abstract: Objective image and video quality measures play important roles in a variety of image and video processing applications, such as compression, communication, printing, analysis, registration, restoration, enhancement and watermarking. Most proposed quality assessment approaches in the literature are error sensitivity-based methods. In this paper, we follow a new philosophy in designing image and video quality metrics, which uses structural distortion as an estimate of perceived visual distortion. A computationally efficient approach is developed for full-reference (FR) video quality assessment. The algorithm is tested on the video quality experts group (VQEG) Phase I FR-TV test data set. Keywords: Image quality assessment, video quality assessment, human visual system, error sensitivity, structural distortion, video quality experts group (VQEG)
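
The structural-distortion philosophy is easiest to see in code. Below is a minimal sketch of an SSIM-style block comparison between a reference and a distorted frame; the window size and stabilizing constants are common textbook choices, not the exact VQEG-tested algorithm described above.

```python
# Minimal sketch of a structural-distortion comparison between two frames.
# Simplified SSIM-style statistic over non-overlapping windows; window size
# and the constants C1, C2 are conventional assumptions, not the paper's setup.
import numpy as np

def structural_distortion(ref, dist, win=8, C1=6.5025, C2=58.5225):
    """Mean structural similarity over non-overlapping win x win blocks."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    h, w = ref.shape
    scores = []
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            a = ref[y:y + win, x:x + win].ravel()
            b = dist[y:y + win, x:x + win].ravel()
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = np.mean((a - mu_a) * (b - mu_b))
            ssim = ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / \
                   ((mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))
            scores.append(ssim)
    return float(np.mean(scores)) if scores else 0.0
```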

1,083 citations


Journal ArticleDOI
TL;DR: A full- and no-reference blur metric as well as a full-reference ringing metric are presented, based on an analysis of the edges and adjacent regions in an image and have very low computational complexity.
Abstract: We present a full- and no-reference blur metric as well as a full-reference ringing metric. These metrics are based on an analysis of the edges and adjacent regions in an image and have very low computational complexity. As blur and ringing are typical artifacts of wavelet compression, the metrics are then applied to JPEG2000 coded images. Their perceptual significance is corroborated through a number of subjective experiments. The results show that the proposed metrics perform well over a wide range of image content and distortion levels. Potential applications include source coding optimization and network resource management.
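
To give a feel for the edge-based approach, here is a rough sketch of a no-reference blur estimate that measures the spread of intensity transitions around detected edges. The gradient threshold and the simple row-wise scan are illustrative assumptions; the paper's metrics analyze edges and their adjacent regions in a more refined way.

```python
# Rough no-reference blur sketch: average width of intensity ramps around
# row-wise edge pixels. Threshold and scanning strategy are assumptions.
import numpy as np

def edge_width_blur(img, grad_thresh=30):
    img = img.astype(np.float64)
    gx = np.abs(np.diff(img, axis=1))            # horizontal gradient magnitude
    widths = []
    for r in range(img.shape[0]):
        row = img[r]
        for c in np.where(gx[r] > grad_thresh)[0]:
            direction = row[c + 1] - row[c]      # sign of the edge transition
            left = c
            while left > 0 and (row[left] - row[left - 1]) * direction > 0:
                left -= 1                        # walk back along the ramp
            right = c + 1
            while right < len(row) - 1 and (row[right + 1] - row[right]) * direction > 0:
                right += 1                       # walk forward along the ramp
            widths.append(right - left)          # wider ramps indicate more blur
    return float(np.mean(widths)) if widths else 0.0
```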

526 citations


Journal ArticleDOI
TL;DR: An overview and analysis of the video coding tools that the standard supports and how these tools are organized into profiles, as well as a number of additional elements of the standard, such as tools that provide system support, details of the levels of profiles, and the issue of encoder and decoder complexity.
Abstract: H.264/MPEG-4 AVC is a recently completed video compression standard jointly developed by the ITU-T VCEG and the ISO/IEC MPEG standards committees. The standard promises much higher compression than that possible with earlier standards. It allows coding of non-interlaced and interlaced video very efficiently, and even at high bit rates provides more acceptable visual quality than earlier standards. Further, the standard supports flexibilities in coding as well as organization of coded data that can increase resilience to errors or losses. As might be expected, the increase in coding efficiency and coding flexibility comes at the expense of an increase in complexity with respect to earlier standards. In this paper, we first briefly introduce the video coding tools that the standard supports and how these tools are organized into profiles. As with earlier standards, the mechanism of profiles allows one to implement only a desired subset of the standard and still be interoperable with applications of interest. Next, we discuss how the various video coding tools of the standard work, as well as the related issue of how to perform encoding using these tools. We then evaluate the coding performance in terms of contribution to overall improvement offered by individual tools, options within these tools, and important combinations of tools, on a representative set of video test sequences and movie clips. Next, we discuss a number of additional elements of the standard, such as tools that provide system support, details of the levels of profiles, and the issue of encoder and decoder complexity. Finally, we summarize our overview and analysis of this standard by identifying, based on their performance, promising tools as well as options within various tools.

371 citations


Journal ArticleDOI
TL;DR: This paper surveys the various techniques developed for IBR, including representation, sampling and compression, and categorizes various IBR representations into two categories based on how the plenoptic function is simplified, namely restraining the viewing space and introducing source descriptions.
Abstract: Image-based rendering (IBR) has attracted a lot of research interest recently. In this paper, we survey the various techniques developed for IBR, including representation, sampling and compression. The goal is to provide an overview of research for IBR in a complete and systematic manner. We observe that essentially all the IBR representations are derived from the plenoptic function, which is seven dimensional and difficult to handle. We classify various IBR representations into two categories based on how the plenoptic function is simplified, namely restraining the viewing space and introducing source descriptions. In the former category, we summarize six common assumptions that were often made in various approaches and discuss how the dimension of the plenoptic function can be reduced based on these assumptions. In the latter category, we further categorize the methods based on what kind of source description was introduced, such as scene geometry, texture map or reflection model. Sampling and compression are also discussed respectively for both categories.
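
For reference, the 7D plenoptic function that the survey takes as its starting point, and its common 4D light-field simplification, can be written as follows (standard definitions, paraphrased rather than quoted from the paper):

```latex
% Radiance observed from viewpoint (V_x, V_y, V_z), in direction (\theta, \phi),
% at wavelength \lambda and time t: the full 7D plenoptic function.
P_7 = P(V_x, V_y, V_z, \theta, \phi, \lambda, t)

% Fixing wavelength and time (static scene) and assuming free space, so that
% radiance is constant along a ray, reduces it to the 4D light field / Lumigraph:
P_4 = L(u, v, s, t)
```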

273 citations


Journal ArticleDOI
TL;DR: A novel framework for fully scalable video coding that performs open-loop motion-compensated temporal filtering (MCTF) in the wavelet domain (in-band) is presented, and inspired by recent work on advanced prediction techniques, an algorithm for optimized multihypothesis temporal filtering is proposed.
Abstract: A novel framework for fully scalable video coding that performs open-loop motion-compensated temporal filtering (MCTF) in the wavelet domain (in-band) is presented in this paper. Unlike the conventional spatial-domain MCTF (SDMCTF) schemes, which apply MCTF on the original image data and then encode the residuals using the critically sampled discrete wavelet transform (DWT), the proposed framework applies the in-band MCTF (IBMCTF) after the DWT is performed in the spatial dimensions. To overcome the inefficiency of MCTF in the critically-sampled DWT, a complete-to-overcomplete DWT (CODWT) is performed. Recent theoretical findings on the CODWT are reviewed from the application perspective of fully-scalable IBMCTF, and constraints on the transform calculation that allow for fast and seamless resolution-scalable coding are established. Furthermore, inspired by recent work on advanced prediction techniques, an algorithm for optimized multihypothesis temporal filtering is proposed in this paper. The application of the proposed algorithm in MCTF-based video coding is demonstrated, and similar improvements as for the multihypothesis prediction algorithms employed in closed-loop video coding are experimentally observed. Experimental instantiations of the proposed IBMCTF and SDMCTF coders with multihypothesis prediction produce single embedded bitstreams, from which subsets are extracted to be compared against the current state-of-the-art in video coding.

193 citations


Journal ArticleDOI
TL;DR: A new approach for image fingerprinting uses the Radon transform to make the fingerprint robust against affine transformations, and other issues such as pairwise independence, database search efficiency and key dependence of the proposed method are also addressed.
Abstract: With the ever-increasing use of multimedia contents through electronic commerce and on-line services, the problems associated with the protection of intellectual property, management of large databases and indexation of content are becoming more prominent. Watermarking has been considered an efficient means of addressing these problems. Although watermarking is a powerful tool, its use raises some issues, such as the modification of the content and its security. With respect to this, identifying content itself based on its own features rather than watermarking can be an alternative solution to these problems. The aim of fingerprinting is to provide fast and reliable methods for content identification. In this paper, we present a new approach for image fingerprinting using the Radon transform to make the fingerprint robust against affine transformations. Since it is quite easy with modern computers to apply affine transformations to audio, image and video, there is an obvious necessity for affine transformation resilient fingerprinting. Experimental results show that the proposed fingerprints are highly robust against most signal processing transformations. Besides robustness, we also address other issues such as pairwise independence, database search efficiency and key dependence of the proposed method.
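
The snippet below conveys why the Radon domain helps with geometric robustness: projections at many angles are reduced to a short, coarsely quantized code. It is only an illustrative sketch built on scikit-image's radon transform; the feature extraction, quantization and key dependence used in the paper differ.

```python
# Illustrative Radon-based fingerprint: per-angle projection energy, normalized
# and coarsely quantized. Angle count and quantization depth are assumptions.
import numpy as np
from skimage.transform import radon

def radon_fingerprint(gray, n_angles=36, n_bits=2):
    gray = gray.astype(np.float64)
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(gray, theta=theta, circle=False)   # columns correspond to angles
    energy = (sinogram ** 2).sum(axis=0)                 # per-angle projection energy
    energy /= energy.sum()                               # rough scale invariance
    # coarse quantization into 2**n_bits levels per angle
    levels = np.quantile(energy, np.linspace(0, 1, 2 ** n_bits + 1)[1:-1])
    return np.digitize(energy, levels)                   # length-n_angles code
```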

185 citations


Journal ArticleDOI
TL;DR: Analysis, experimental results, and independent studies that demonstrate quality benefits of WMV-9 over a variety of codecs, including optimized implementations of MPEG-2, MPEG-4, and H.264/AVC are presented.
Abstract: Microsoft ® Windows Media 9 Series is a set of technologies that enables rich digital media experiences across many types of networks and devices. These technologies are widely used in the industry for media delivery over the internet and other media, and are also applied to broadcast, high definition DVDs, and digital projection in theaters. At the core of these technologies is a state-of-the-art video codec called Windows Media Video 9 (WMV-9), which provides highly competitive video quality for reasonable computational complexity. WMV-9 is currently under standardization by the Society of Motion Picture and Television Engineers (SMPTE) and the spec is at the CD (Committee Draft) stage. This paper includes a brief introduction to Windows Media technologies and their applications, with a focus on the compression algorithms used in WMV-9. We present analysis, experimental results, and independent studies that demonstrate quality benefits of WMV-9 over a variety of codecs, including optimized implementations of MPEG-2, MPEG-4, and H.264/AVC. We also discuss the complexity advantages of WMV-9 over H.264/AVC.

161 citations


Journal ArticleDOI
TL;DR: A new method for analysis and synthesis is proposed that allows, from a single photo, the facial expression on a given face to be cancelled and novel expressions to be synthesized artificially on the same face.
Abstract: This article addresses the issue of expressive face modelling using an active appearance model for facial expression recognition and synthesis. We consider the six universal emotional categories, namely joy, anger, fear, disgust, sadness and surprise. After a description of the active appearance model (computed with three PCAs or only one), we address the active appearance model's contribution to automatic facial expression recognition. Then we propose a new method for analysis and synthesis that allows, from a single photo, the facial expression on a given face to be cancelled and novel expressions to be synthesized artificially on the same face. In this last framework, we propose two facial expression modelling approaches.

150 citations


Journal ArticleDOI
TL;DR: A content independent, no-reference sharpness metric based on the local frequency spectrum around the image edges that can be used by itself as a control variable for high-quality image capture and display systems, high-quality sharpness enhancement algorithms, and as a key component of a more general overall quality metric.
Abstract: Sharpness metrics that use the whole frequency spectrum of the image cannot separate the sharpness information from the scene content. The sharpness metrics that use spatial gradients of the edges work only for comparisons among versions of the same image. We have developed a content independent, no-reference sharpness metric based on the local frequency spectrum around the image edges. In this approach, we create an edge profile by detecting edge pixels and assigning them to 8×8 pixel blocks. Then we compute sharpness using the average 2D kurtosis of the 8×8 DCT blocks. However, average kurtosis is highly sensitive to asymmetry in the DCT, e.g. different amounts of energy and edges in the x and y directions, therefore causing problems with different content and asymmetric sharpness enhancement. Thus we compensate the kurtosis using spatial edge extent information and the amount of vertical and horizontal energy in the DCT. The results show high correlation with subjective quality for sharpness-enhanced video and high potential to deal with asymmetric enhancement. For compressed, extremely sharpened and noisy video, the metric correlates with subjective scores up to the point where impairments become strongly noticeable in the subjective quality evaluation. The metric can be used by itself as a control variable for high-quality image capture and display systems, high-quality sharpness enhancement algorithms, and as a key component of a more general overall quality metric.
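
The core kurtosis-of-DCT idea can be sketched briefly: compute the 2D DCT of 8×8 blocks that lie on the edge profile and average the kurtosis of their coefficients. The edge detection, the asymmetry compensation and the exact pooling described above are omitted; the block size and the plain kurtosis estimator used here are assumptions for illustration.

```python
# Sketch of the kurtosis-of-DCT sharpness idea: 2D DCT of 8x8 edge blocks,
# averaged coefficient kurtosis. The paper's compensation steps are omitted.
import numpy as np
from scipy.fftpack import dct
from scipy.stats import kurtosis

def block_dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def dct_kurtosis_sharpness(gray, edge_mask, block=8):
    """gray: grayscale image; edge_mask: boolean map of detected edge pixels."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    values = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if edge_mask[y:y + block, x:x + block].any():     # block on the edge profile
                coeffs = block_dct2(gray[y:y + block, x:x + block]).ravel()
                values.append(kurtosis(coeffs, fisher=False)) # 2D kurtosis proxy
    return float(np.mean(values)) if values else 0.0
```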

148 citations


Journal ArticleDOI
TL;DR: The paper provides an explanation of MCTF methods and the resulting 3D wavelet representation, and shows results obtained in the context of encoding digital cinema (DC) materials.
Abstract: Scalability at the bitstream level is an important feature for encoded video that is to be transmitted and stored with a variety of target rates or to be replayed on devices with different capabilities and resolutions. This is attractive for digital cinema applications, where the same encoded source representation could seamlessly be used for purposes of archival and various distribution channels. Conventional high-performance video compression schemes are based on the method of motion-compensated prediction, using a recursive loop in the prediction process. Due to this recursion and the inherent drift in cases of deviation between encoder and decoder states, scalability is difficult to realize and typically incurs a penalty in compression performance for prediction-based coders. The method of interframe wavelet coding overcomes this limitation by replacing the prediction along the time axis with a wavelet filter, which can nevertheless be operated in combination with motion compensation. Recent advances in motion-compensated temporal filtering (MCTF) have proven that combination with arbitrary motion compensation methods is possible. Compression performance is achieved that is comparable with state-of-the-art single-layer coders targeting only one rate. The paper provides an explanation of MCTF methods and the resulting 3D wavelet representation, and shows results obtained in the context of encoding digital cinema (DC) materials.

127 citations


Journal ArticleDOI
TL;DR: Methods for using the statistical properties of intra coded video data to estimate the quantization error caused by compression without accessing either the original pictures or the bitstream are described.
Abstract: Many user-end applications require an estimate of the quality of coded video or images without having access to the original, i.e. a no-reference quality metric. Furthermore, in many such applications the compressed video bitstream is also not available. This paper describes methods for using the statistical properties of intra coded video data to estimate the quantization error caused by compression without accessing either the original pictures or the bitstream. We derive closed form expressions for the quantization error in coding schemes based on the discrete cosine transform and block based coding. A commonly used quality metric, the peak signal to noise ratio (PSNR) is subsequently computed from the estimated quantization error. Since quantization error is the most significant loss incurred during typical coding schemes, the estimated PSNR, or any PSNR-based quality metric may be used to gauge the overall quality of the pictures.
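
Once the quantization error has been estimated, the final PSNR computation is straightforward, as the short sketch below shows. The estimation of the error itself from the DCT coefficient statistics is the paper's contribution and is not reproduced here.

```python
# Final step only: PSNR from an estimated quantization mean squared error.
# The no-reference estimation of that MSE is the paper's method and not shown.
import numpy as np

def psnr_from_quantization_mse(est_mse, peak=255.0):
    """PSNR in dB, given an estimated quantization MSE and the signal peak."""
    return 10.0 * np.log10(peak ** 2 / max(est_mse, 1e-12))
```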

Journal ArticleDOI
TL;DR: A variation of the peak-and-valley filter based on a recursive minimum–maximum method, which replaces the noisy pixel with a value based on neighborhood information and preserves constant and edge areas even under high impulsive noise probability.
Abstract: Most image processing applications require noise elimination. For example, in applications where derivative operators are applied, any noise in the image can result in serious errors. Impulsive noise appears as a sprinkle of dark and bright spots. Transmission errors, corrupted pixel elements in the camera sensors, or faulty memory locations can cause impulsive noise. Linear filters fail to suppress impulsive noise. Thus, non-linear filters have been proposed. Windyga's peak-and-valley filter, introduced to remove impulsive noise, identifies noisy pixels and then replaces their values with the minimum or maximum value of their neighbors depending on the noise (dark or bright). Its main disadvantage is that it removes fine image details. In this work, a variation of the peak-and-valley filter is proposed to overcome this problem. It is based on a recursive minimum–maximum method, which replaces the noisy pixel with a value based on neighborhood information. This method preserves constant and edge areas even under high impulsive noise probability. Finally, a comparison study of the peak-and-valley filter, the median filter, and the proposed filter is carried out using different types of images. The proposed filter outperforms the other filters in noise reduction and image detail preservation. However, it operates slightly slower than the peak-and-valley filter.
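
For orientation, the baseline peak-and-valley behaviour that the paper modifies can be sketched as follows: a pixel strictly above (below) all of its neighbours is treated as a bright (dark) impulse and clipped to the neighbourhood maximum (minimum). The proposed recursive minimum-maximum variant refines the replacement value and is not reproduced here.

```python
# Baseline peak-and-valley sketch (3x3 neighbourhood): clip bright impulses to
# the neighbourhood maximum and dark impulses to the minimum. The recursive
# minimum-maximum variant proposed in the paper is not shown.
import numpy as np

def peak_and_valley(img):
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nb = out[y - 1:y + 2, x - 1:x + 2].copy()
            nb[1, 1] = np.nan                      # exclude the centre pixel
            lo, hi = np.nanmin(nb), np.nanmax(nb)
            if out[y, x] > hi:                     # bright impulse ("peak")
                out[y, x] = hi
            elif out[y, x] < lo:                   # dark impulse ("valley")
                out[y, x] = lo
    return out
```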

Journal ArticleDOI
TL;DR: Experiments on standard images show that the proposed scheme achieves the fastest fractal image coding speed reported to date while holding high reconstruction fidelity.
Abstract: A new no-search fractal image coding scheme is introduced that greatly improves the speed of fractal image compression. Every time-consuming part of fractal coding is redesigned and accelerated with new techniques. Compared with the most recent scheme of Tong and Wong, this method speeds up the encoding process by 22 times while maintaining the compression quality. Experiments on standard images show that the proposed scheme achieves the fastest fractal image coding speed reported to date and holds high reconstruction fidelity. For example, on a PII 450 MHz PC, the proposed scheme takes 0.515 s to compress Lena (512×512×8) with 36.04 dB PSNR decoding quality. On a Dell PIV 2.8 GHz PC, it takes only 0.078 s to finish the encoding process with the same 36.04 dB PSNR.

Journal ArticleDOI
TL;DR: The investigation shows that motion-compensated three-dimensional transform coding can outperform predictive coding with single-hypothesis motion compensation by up to 0.5 bits/sample.
Abstract: This article explores the efficiency of motion-compensated three-dimensional transform coding, a compression scheme that employs a motion-compensated transform for a group of pictures. We investigate this coding scheme experimentally and theoretically. The practical coding scheme employs, in the temporal direction, a wavelet decomposition with motion-compensated lifting steps. Further, we compare the experimental results to those of a predictive video codec with single-hypothesis motion compensation and comparable computational complexity. The experiments show that the 5/3 wavelet kernel outperforms both the Haar kernel and, in many cases, the reference scheme utilizing single-hypothesis motion-compensated predictive coding. The theoretical investigation models this motion-compensated subband coding scheme for a group of K pictures with a signal model for K motion-compensated pictures that are decorrelated by a linear transform. We utilize the Karhunen-Loeve Transform to obtain theoretical performance bounds at high bit-rates and compare them to both optimum intra-frame coding of individual motion-compensated pictures and single-hypothesis motion-compensated predictive coding. The investigation shows that motion-compensated three-dimensional transform coding can outperform predictive coding with single-hypothesis motion compensation by up to 0.5 bits/sample.
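
A minimal temporal lifting example helps to see the predict/update structure underlying the scheme. The sketch below uses the Haar kernel and no motion compensation, whereas the paper's codec applies motion-compensated lifting steps and also a 5/3 kernel; it is a simplification, not the authors' coder.

```python
# Haar temporal lifting over a group of pictures (no motion compensation):
# predict step forms the temporal detail band, update step the approximation.
import numpy as np

def haar_temporal_lift(frames):
    """frames: array of shape (2K, H, W). Returns (low, high) temporal bands."""
    even = frames[0::2].astype(np.float64)
    odd = frames[1::2].astype(np.float64)
    high = odd - even            # predict: temporal detail
    low = even + 0.5 * high      # update: temporal approximation
    return low, high

def haar_temporal_inverse(low, high):
    """Perfect reconstruction of the original frame stack."""
    even = low - 0.5 * high
    odd = high + even
    frames = np.empty((even.shape[0] * 2,) + even.shape[1:], dtype=np.float64)
    frames[0::2] = even
    frames[1::2] = odd
    return frames
```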

Journal ArticleDOI
TL;DR: An objective quality metric that generates continuous estimates of perceived quality for low bit rate video is introduced based on a multichannel model of the human visual system that exceeds the performance of a similar metric based on the Mean Squared Error.
Abstract: An objective quality metric that generates continuous estimates of perceived quality for low bit rate video is introduced. The metric is based on a multichannel model of the human visual system. The vision model is initially parameterized to threshold data and then further optimized using video frames containing severe distortions. The proposed metric also discards processing of the finest scales to reduce computational complexity, which also results in an improvement in the accuracy of prediction for the sequences under consideration. A temporal pooling method suited to modeling continuous time waveforms is also introduced. The metric is parameterized and evaluated using the results of a Single Stimulus Continuous Quality Evaluation test conducted for CIF video at rates from 100 to 800 kbps. The proposed metric exceeds the performance of a similar metric based on the Mean Squared Error.

Journal ArticleDOI
TL;DR: Experiments on various still images and videos show that the new quality measure is very efficient in terms of computational complexity and memory usage, and can produce consistent blocking artifacts measurement.
Abstract: Block transform coding is the most popular approach for image and video compression. The objective measurement of blocking artifacts plays an important role in the design, optimization, and assessment of image and video coding systems. This paper presents a new algorithm for measuring the image quality of block DCT (BDCT) coded images or videos. It exhibits unique and useful features: (1) it examines the blocks individually so that it can measure the severity of blocking artifacts locally; (2) it is a one-pass algorithm in the sense that the image needs to be accessed only once; (3) it takes into account the blocking artifacts for high bit rate images and the flatness for very low bit rate images; (4) the quality measure is well defined in the range of 0–10. Experiments on various still images and videos show that the new quality measure is very efficient in terms of computational complexity and memory usage, and can produce consistent blocking artifacts measurement.
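
A much simplified boundary-difference statistic illustrates the kind of measurement involved: differences across 8×8 block boundaries are compared with differences inside the blocks. The paper's algorithm additionally measures severity per block, handles flat very-low-bit-rate images and calibrates the result to a 0–10 scale, none of which is modelled here.

```python
# Simplified blockiness indicator: ratio of pixel differences across 8x8 block
# boundaries to differences inside blocks. Not the paper's calibrated measure.
import numpy as np

def blockiness(gray, block=8):
    gray = gray.astype(np.float64)
    dh = np.abs(np.diff(gray, axis=1))          # horizontal neighbour differences
    boundary = dh[:, block - 1::block].mean()   # differences straddling block edges
    mask = np.ones(dh.shape[1], dtype=bool)
    mask[block - 1::block] = False
    interior = dh[:, mask].mean()               # differences inside blocks
    return boundary / (interior + 1e-12)        # values well above 1 suggest blocking
```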

Journal ArticleDOI
TL;DR: The proposed multiview sequence CODEC with view scalability provides several viewers with realistic views, or a single viewer with motion parallax, whereby changes in the viewer's position result in changes in what is seen.
Abstract: A multiview sequence CODEC with flexibility, MPEG-2 compatibility and view scalability is proposed. We define a GGOP (Group of GOP) structure as a basic coding unit to efficiently code multiview sequences. Our proposed CODEC provides flexible GGOP structures based on the number of views and baseline distances among cameras. The encoder generates two types of bitstreams: a main bitstream and an auxiliary one. The main bitstream is the same as an MPEG-2 mono-sequence bitstream for MPEG-2 compatibility. The auxiliary bitstream contains information concerning the remaining multiview sequences except for the reference sequences. Our proposed CODEC with view scalability provides several viewers with realistic views, or a single viewer with motion parallax, whereby changes in the viewer's position result in changes in what is seen. The important point is that a number of viewpoints are selectively determined at the receiver according to the type of display mode. The viewers can choose an arbitrary number of views by checking the information so that only the views selected are decoded and displayed. The proposed multiview sequence CODEC is tested with several multiview sequences to determine its flexibility, compatibility and view scalability. In addition, we subjectively confirm that the decoded bitstreams with view scalability can be properly displayed by several types of display modes, including 3D monitors.

Journal ArticleDOI
TL;DR: The problem of multimedia customization is addressed by presenting the MPEG-7 multimedia content description standard and the MPEG-21 multimedia framework, classifying multimedia customization processing algorithms, discussing multimedia customization systems, and presenting some customization experiments.
Abstract: The multimedia content delivery chain poses many challenges today. The increasing terminal diversity, network heterogeneity and the pressure to satisfy user preferences are raising the need for content to be customized in order to provide the user the best possible experience. This paper addresses the problem of multimedia customization by (1) presenting the MPEG-7 multimedia content description standard and the MPEG-21 multimedia framework; (2) classifying multimedia customization processing algorithms; (3) discussing multimedia customization systems; and (4) presenting some customization experiments.

Journal ArticleDOI
TL;DR: This scheme provides fast text detection in images and videos at a low computational cost compared with traditional methods.
Abstract: Automatic character detection in video sequences is a complex task, due to the variety of sizes and colors as well as to the complexity of the background. In this paper we address this problem by proposing a localization/verification scheme. Candidate text regions are first localized by using a fast algorithm with a very low rejection rate, which enables character size normalization. Contrast independent features are then proposed for training machine learning tools in order to verify the text regions. Two kinds of machine learning tools, multilayer perceptrons and support vector machines, are compared based on four different features in the verification task. This scheme provides fast text detection in images and videos at a low computational cost compared with traditional methods.
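
As an illustration of the verification stage only, the sketch below trains an SVM on feature vectors extracted from candidate text regions (the paper also evaluates a multilayer perceptron). The feature vectors themselves are placeholders; the fast localization stage and the contrast-independent feature definitions are not shown.

```python
# Verification-stage sketch: classify candidate regions as text / non-text with
# an SVM. Feature extraction is assumed to have happened upstream.
import numpy as np
from sklearn.svm import SVC

def train_text_verifier(features, labels):
    """features: (N, D) region descriptors; labels: 1 = text, 0 = non-text."""
    clf = SVC(kernel='rbf', gamma='scale')
    clf.fit(features, labels)
    return clf

def verify_candidates(clf, candidate_features):
    # keep only the candidate regions the classifier accepts as text
    return clf.predict(candidate_features)
```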

Journal ArticleDOI
TL;DR: An advanced motion threading technique is devised, in which one set of motion vectors is generated for each temporal layer of wavelet coefficients for temporal scalability, in order to reduce the motion overhead information, especially at low bit rates.
Abstract: This paper presents an advanced motion threading technique for improved performance in 3D wavelet coding. First, we extend an original motion threading idea of ours to a lifting- based implementation. Methods for enabling fractional-pixel alignment in motion threading and for processing many-to-one pixel mapping and non-referred pixels are proposed to reduce the wavelet boundary effects. Second, we devise an advanced motion threading technique, in which one set of motion vectors is generated for each temporal layer of wavelet coefficients for temporal scalability. In order to reduce the motion overhead information, especially at low bit rates, several correlated motion prediction modes at the macroblock level are defined to exploit the intra/inter layer correlation in motion vector coding. Finally, rate-distortion optimization is utilized in motion estimation to select the best motion prediction mode for each macroblock. With the new motion threading technique, we are able to achieve 1.5–6.0 dB gain in average PSNR in 3D wavelet coding over our previous implementation of motion threading.

Journal ArticleDOI
TL;DR: This paper presents a scalable video codec based on a 5/3 adaptive temporal lifting decomposition, which uses a memory-constrained "on-the-fly" implementation and evaluates the temporal scalability properties of this video coding structure.
Abstract: Motion-compensated temporal filtering subband video codecs have recently attracted a lot of attention, due to their compression performance comparable with that of state-of-the-art hybrid codecs and due to their additional scalability features. In this paper, we present a scalable video codec based on a 5/3 adaptive temporal lifting decomposition. Different adaptation criteria for coping with the occluded areas are discussed and new criteria for optimizing the temporal prediction are introduced. For our simulations, we use a memory-constrained "on-the-fly" implementation. We also evaluate the temporal scalability properties of this video coding structure.

Journal ArticleDOI
TL;DR: The proposed method consists of three stages; the first applies a hierarchical block-based technique to detect and eliminate the moving regions from the background while estimating an initial guess of the global motion.
Abstract: This paper aims to construct mosaics from video sequences with moving objects. We propose to explicitly eliminate moving objects from the background. When dealing with only the retained background, we can simplify the following global motion estimation and exclude moving objects from the video mosaic. The proposed method consists of three stages. First, we apply a hierarchical block-based technique to detect and eliminate the moving regions from the background, and in the meantime we estimate the initial guess of the global motion. Next, we employ a hierarchical feature-based technique on the retained background regions to refine and derive the precise global motion. Last, we refine the segmentation results obtained at the first stage and warp all the retained background regions with respect to a reference coordinate system and integrate them into a video mosaic. Many experimental results are shown to demonstrate the effectiveness of the proposed work.
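
The final integration step can be sketched compactly: once a global motion has been estimated for each frame, the retained background is warped into the reference coordinate system and accumulated. The 3×3 homography model and the simple averaging below are assumptions for illustration; moving-region elimination and the hierarchical motion estimation are not shown.

```python
# Mosaic integration sketch: warp each frame's retained background into the
# reference coordinate system (assumed 3x3 homographies) and average overlaps.
import numpy as np
import cv2

def accumulate_mosaic(frames, homographies, mosaic_size):
    """frames: list of HxW grayscale images; homographies: 3x3 maps to the reference frame;
    mosaic_size: (height, width) of the output canvas."""
    acc = np.zeros(mosaic_size, dtype=np.float32)
    count = np.zeros(mosaic_size, dtype=np.float32)
    for frame, H in zip(frames, homographies):
        warped = cv2.warpPerspective(frame.astype(np.float32), H, mosaic_size[::-1])
        coverage = cv2.warpPerspective(np.ones_like(frame, dtype=np.float32), H,
                                       mosaic_size[::-1])
        acc += warped
        count += coverage
    return acc / np.maximum(count, 1e-6)    # average where frames overlap
```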

Journal ArticleDOI
TL;DR: A novel unequal error protection technique that enhances the video transmission quality over wireless networks, for which it is demonstrated that the gain in system performance can reach 1.5 dB without any significant increase in the transmission rate or the receiver complexity.
Abstract: In this paper, we present a novel unequal error protection technique that enhances the video transmission quality over wireless networks. The application considered is a UMTS/TDD transmission system for H.263 compressed and turbo-coded video sequences. The overall redundancy added to the compressed stream is non-uniformly distributed between the succeeding video frames in order to minimize the mean distortion over the transmitted sequence. The repartition of the redundancy over the video stream is optimized using an analytical approach which aims to alleviate error propagation along the sequence. Different puncturing patterns of the rate-1/3 turbo-coder were considered in our simulations. The results obtained here are compared to those with a classical equal error protection scheme. We demonstrate that the gain in system performance can reach 1.5 dB (in terms of the mean peak signal-to-noise ratio) without any significant increase in the transmission rate or the receiver complexity.
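
A toy allocation rule conveys the intuition behind the unequal protection: frames earlier in a predictively coded sequence contribute more to error propagation, so they receive a larger share of a fixed redundancy budget. The linear weighting here is purely illustrative; the paper derives the repartition analytically for the H.263/turbo-coded UMTS/TDD setting.

```python
# Toy unequal-error-protection allocation: weight each frame by the number of
# later frames that depend on it, then split a fixed redundancy budget.
# The linear dependency weighting is an assumption, not the paper's derivation.
import numpy as np

def allocate_redundancy(n_frames, total_redundancy_bits):
    weights = np.arange(n_frames, 0, -1, dtype=np.float64)   # earlier frames weigh more
    shares = weights / weights.sum()
    return np.round(shares * total_redundancy_bits).astype(int)

# Example: 10 frames, 9000 redundancy bits -> earlier frames get the most bits.
print(allocate_redundancy(10, 9000))
```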

Journal ArticleDOI
TL;DR: In this paper, the authors use the object motion history and first order statistical measurements of it to obtain information for the extraction of uncertainty regions, a kind of shape prior knowledge w.r.t. the allowed object deformations.
Abstract: Tracking moving objects in video sequences is a task that emerges in various fields of study: video analysis, computer vision, biomedical systems, etc. In the last decade, special attention has been drawn to problems concerning tracking in real-world environments, where moving objects do not obey any afore-known constraints about their nature and motion or the scenes they are moving in. Apart from the existence of noise and environmental changes, many problems also arise due to background texture, complicated object motion, and deformable and/or articulated objects that change their shape while moving along time. Another phenomenon in natural sequences is the appearance of occlusions between different objects, whose handling requires motion information and, in some cases, additional constraints. In this work, we revisit one of the best-known active contour models, the Snake, and we propose a motion-based utilization of it, aiming at successful handling of the previously mentioned problems. The use of the object motion history and first-order statistical measurements of it provides us with information for the extraction of uncertainty regions, a kind of shape prior knowledge w.r.t. the allowed object deformations. This constraining also makes the proposed method efficient, handling the trade-off between accuracy and computational complexity. The energy minimization is approximated by a force-based approach inside the extracted uncertainty regions, and the weights of the total snake energy function are automatically estimated as respective weights in the resulting evolution force. Finally, in order to handle background complexity and partial occlusion cases, we introduce two rules, according to which the moving object region is correctly separated from the background, whereas the occluded boundaries are estimated according to the object's expected shape. To verify the performance of the proposed method, some experimental results are included, concerning different cases of object tracking, indoors and outdoors, with rigid and deformable objects, noisy and textured backgrounds, as well as the appearance of occlusions.

Journal ArticleDOI
TL;DR: This paper proposes a novel query-based summary creation mechanism using a relevance metric and a constraints schema, implemented in the context of an automatic video summarization system based on MPEG-7 descriptions, and a human skin filter that allows summaries to be built based on the presence or absence of human skin.
Abstract: The ever-growing amount of audiovisual content available has raised the need to develop systems allowing each of us to consume the information considered 'essential', adapted to our tastes, our preferences, our time and also to our capacities of receiving, consuming and storing that information. In other words, there is an increasing need to develop systems able to automatically summarize audiovisual information. This paper proposes a novel query-based summary creation mechanism using a relevance metric and a constraints schema implemented in the context of an automatic video summarization system based on MPEG-7 descriptions. In the context of the same system, this paper also proposes a human skin filter that allows summaries to be built based on the presence or absence of human skin. This skin colour filter is based solely on the MPEG-7 Dominant Colour descriptor, which means that the content is skin filtered with a rather small amount of processing, without accessing and processing the video data.

Journal ArticleDOI
TL;DR: The contributions included in this issue reflect the current orientation towards issues relevant to the broadcast and multimedia industries, which have been especially fostered by the ITU Video Quality Experts Group (VQEG) and the Alliance for Telecommunications Industry Solutions (ATIS) standards committee T1A1.
Abstract: In bringing to the readers this special issue on objective image quality, it is our aim to present current work and directions on a challenging subject with deep historical roots that is vital to the development of present day image and multimedia technologies. The contributions included in this issue reflect the current orientation towards issues relevant to the broadcast and multimedia industries, which have been especially fostered by the ITU Video Quality Experts Group (VQEG) and the Alliance for Telecommunications Industry Solutions (ATIS) standards committee T1A1. Those issues include objective quality monitoring and control, and multimedia Quality of Service (QoS) management. The papers have been selected for their contributions in two categories: methodologies for using image quality metrics; algorithms for image quality metrics. In the methodology category, the paper by Brill et al. provides new methods for evaluating, comparing and cross-calibrating image quality metrics that for the first time take into account observer variability. Also in this category, the paper by Moore et al. emphasizes the impact of defects with respect to content, and the importance of dealing with content explicitly as a way to identify general assessment principles. The papers on image quality metrics have been selected for their innovative approaches to specific problems in measuring or estimating objective quality as well as their contribution and perspective on highly relevant issues such as quality vs. fidelity, supra-threshold impairment assessment, full- and no-reference metrics, continuous quality evaluation, and bottom-up vs. top-down quality assessment. The paper by Caviedes and Oberti presents an innovative algorithm to measure sharpness without a reference. It also addresses the general issue of measuring quality enhancement,

Journal ArticleDOI
TL;DR: This paper develops a different method to find the rate-distortion functions for JSCC of MPEG-2 video, and shows that the end-to-end distortion of the UEP method is smaller than that of the equal error protection method for the same total bit-rate.
Abstract: This paper proposes an unequal error protection (UEP) method for MPEG-2 video transmission. Since the source and channel coders are normally concatenated, if the channel is noisy, more bits are allocated to channel coding and fewer to source coding. The situation is reversed when the channel conditions are more benign. Most joint source channel coding (JSCC) methods assume that the video source is subband coded, so that the bit error sensitivity of the source code can be modeled and the bit allocations for the different subband channels can be calculated. The UEP applied to the different subbands is a rate-compatible punctured convolutional channel code. However, since MPEG-2 coding is not subband coding, the bit error sensitivity function for the coded video can no longer be applied. Here, we develop a different method to find the rate-distortion functions for JSCC of MPEG-2 video. In the experiments, we show that the end-to-end distortion of our UEP method is smaller than that of the equal error protection method for the same total bit-rate.

Journal ArticleDOI
TL;DR: An improved motion-compensated restoration method for color motion picture films deteriorated due to flashing blotches which consists of an improved multiresolution block matching with log-D search, a rank ordered differences-based blotch detection and 3D vector median filtering for interpolation of missing data.
Abstract: In this paper, we propose an improved motion-compensated restoration method for color motion picture films deteriorated by flashing blotches. The method consists of an improved multiresolution block matching with log-D search, rank ordered differences-based blotch detection and 3D vector median filtering for interpolation of missing data, and utilizes five consecutive frames. Performance of the method is tested on artificially corrupted image sequences and real motion picture films, and is compared to that of the three-frame-based method, which involves similar algorithms except for the improved motion estimation and blotch detection. The results show that the method works efficiently even in severely blotched and moving regions of image sequences.
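
The 3D vector median interpolation step can be sketched on its own: a blotched colour pixel is replaced by the vector median of candidate colour vectors gathered from a motion-compensated spatio-temporal neighbourhood. Neighbourhood gathering and motion compensation are assumed to have been done elsewhere; only the vector median itself is shown.

```python
# Vector median of a set of RGB candidates: the candidate minimizing the sum of
# Euclidean distances to all other candidates. Used here to illustrate the
# interpolation of a detected blotch pixel from its spatio-temporal neighbours.
import numpy as np

def vector_median(vectors):
    """vectors: (N, 3) array of candidate RGB values; returns the vector median."""
    vectors = vectors.astype(np.float64)
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1).sum(axis=1)
    return vectors[np.argmin(dists)]
```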

Journal ArticleDOI
TL;DR: A technique to identify patterns associated with different downsampling methods in order to select the appropriate upsampling mechanism is proposed, which has low complexity and achieves high accuracy over a wide range of images.
Abstract: Downsampling an image results in the loss of image information that cannot be recovered with upsampling. We demonstrate that the particular combination of downsampling and upsampling methods used can significantly impact the reconstructed image quality, and then we propose a technique to identify patterns associated with different downsampling methods in order to select the appropriate upsampling mechanism. The technique has low complexity and achieves high accuracy over a wide range of images.

Journal ArticleDOI
TL;DR: From preliminary results that exploit subpixel displacements between LR frames to attain superresolution, it is concluded that SGWs show promise and potential to be extremely fast, efficient and versatile for superresolution.
Abstract: Over the last 3 years or so, first-generation wavelets have been used to realize superresolution from a captured sequence of low-resolution (LR) degraded frames. Here, it is pointed out that second-generation wavelets (SGWs) are inherently more suited for image superresolution. From preliminary results that exploit subpixel displacements between LR frames to attain superresolution, it is concluded that SGWs show promise and potential to be extremely fast, efficient and versatile for superresolution.