Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 2002"


Journal ArticleDOI
TL;DR: Analysis shows that a speed improvement rate of the hexagon-based search (HEXBS) algorithm over the diamond search (DS) algorithm can be over 80% for locating some motion vectors in certain scenarios.
Abstract: In block motion estimation, the shape and size of the search pattern have a very important impact on search speed and distortion performance. A square-shaped search pattern is adopted in many popular fast algorithms. Recently, a diamond-shaped search pattern was introduced in fast block motion estimation and has exhibited a faster search speed. Based on an in-depth examination of the influence of the search pattern on speed performance, we propose a novel algorithm using a hexagon-based search pattern to achieve further improvement. The hexagon-based search pattern is investigated in comparison with the diamond search pattern and demonstrates a significant speedup over the diamond-based search. Analysis shows that the speed improvement rate of the hexagon-based search (HEXBS) algorithm over the diamond search (DS) algorithm can be over 80% for locating some motion vectors in certain scenarios. In short, the proposed HEXBS algorithm can find the same motion vector with fewer search points than the DS algorithm. Generally speaking, the larger the motion vector, the more search points the HEXBS algorithm can save, which is further justified by experimental results.

860 citations
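
The search procedure this abstract describes can be illustrated with a minimal sketch. The point layouts below (a six-point large hexagon followed by a four-point refinement) and the function name `hexagon_search` are assumptions made for illustration; the abstract does not list the exact patterns. The cost function stands in for any block-distortion measure such as SAD.

```python
# Sketch of a hexagon-based search over a generic block-matching cost.
# The point layouts are assumptions, not taken from the abstract.
LARGE_HEXAGON = [(2, 0), (-2, 0), (1, 2), (1, -2), (-1, 2), (-1, -2)]
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hexagon_search(cost, search_range=7):
    """Return (best_vector, number_of_cost_evaluations).

    cost(dx, dy) is any block-distortion measure for the candidate
    motion vector (dx, dy); lower is better.
    """
    cache = {}

    def evaluate(point):
        # Each candidate is evaluated at most once.
        if point not in cache:
            cache[point] = cost(*point)
        return cache[point]

    def in_range(point):
        return abs(point[0]) <= search_range and abs(point[1]) <= search_range

    # Coarse stage: move the large hexagon until its center is the best point.
    center = (0, 0)
    while True:
        best = center
        for dx, dy in LARGE_HEXAGON:
            candidate = (center[0] + dx, center[1] + dy)
            if in_range(candidate) and evaluate(candidate) < evaluate(best):
                best = candidate
        if best == center:
            break
        center = best

    # Fine stage: check the four nearest neighbours of the final center.
    best = center
    for dx, dy in SMALL_PATTERN:
        candidate = (center[0] + dx, center[1] + dy)
        if in_range(candidate) and evaluate(candidate) < evaluate(best):
            best = candidate
    return best, len(cache)
```

With a smooth, single-minimum cost the sketch reaches the minimum after evaluating far fewer candidates than the 225 points of a full search over a range of 7; on real video the cost surface is not guaranteed to be unimodal, so the result may differ from full search.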


Journal ArticleDOI
TL;DR: A conceptual solution to the shot-boundary detection problem is presented in the form of a statistical detector based on minimization of the average detection-error probability, and the performance of the detector is demonstrated on the two most widely used types of shot boundaries: hard cuts and dissolves.
Abstract: Partitioning a video sequence into shots is the first step toward video-content analysis and content-based video browsing and retrieval. A video shot is defined as a series of interrelated consecutive frames taken contiguously by a single camera and representing a continuous action in time and space. As such, shots are considered to be the primitives for higher level content analysis, indexing, and classification. The objective of this paper is twofold. First, we analyze the shot-boundary detection problem in detail and identify major issues that need to be considered in order to solve this problem successfully. Then, we present a conceptual solution to the shot-boundary detection problem in which all issues identified in the previous step are considered. This solution is provided in the form of a statistical detector that is based on minimization of the average detection-error probability. We model the required statistical functions using a robust metric for visual content discontinuities (based on motion compensation) and take into account all (a priori) knowledge that we found relevant to shot-boundary detection. This knowledge includes the shot-length distribution, visual discontinuity patterns at shot boundaries, and characteristic temporal changes of visual features around a boundary. Major advantages of the proposed detector are its robust and sequence-independent performance, together with the possibility of detecting different types of shot boundaries simultaneously. We demonstrate the performance of our detector on the two most widely used types of shot boundaries: hard cuts and dissolves.

513 citations
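
A detector that minimizes the average error probability amounts to comparing prior-weighted likelihoods. The sketch below illustrates that decision rule only; the Gaussian likelihoods, their parameters, and the names `gaussian_pdf` and `is_boundary` are assumptions for illustration and are not the statistical models used in the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as a stand-in likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def is_boundary(z, prior_boundary, within=(0.1, 0.05), across=(0.6, 0.2)):
    """Decide 'boundary' when its prior-weighted likelihood is larger.

    z              -- discontinuity value measured between two frames
    prior_boundary -- a priori boundary probability (for example, derived
                      from the time elapsed since the last boundary)
    within, across -- assumed (mean, std) of z inside a shot and across a
                      boundary; hypothetical values
    """
    p_boundary = gaussian_pdf(z, *across) * prior_boundary
    p_no_boundary = gaussian_pdf(z, *within) * (1.0 - prior_boundary)
    return p_boundary > p_no_boundary
```

The same discontinuity value can be classified differently depending on the prior, which is how knowledge such as the shot-length distribution would enter the decision.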


Journal ArticleDOI
Rainer Lienhart, A. Wernicke
TL;DR: This work proposes a novel method for localizing and segmenting text in complex images and videos that is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video.
Abstract: Many images, especially those used for page design on Web pages, as well as videos contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, serving as a starting point for candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on white background. For videos, temporal redundancy is exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.

478 citations


Journal ArticleDOI
TL;DR: An efficient moving object segmentation algorithm suitable for real-time content-based multimedia communication systems is proposed and a processing speed of 25 QCIF fps can be achieved on a personal computer with a 450-MHz Pentium III processor.
Abstract: An efficient moving object segmentation algorithm suitable for real-time content-based multimedia communication systems is proposed in this paper. First, a background registration technique is used to construct a reliable background image from the accumulated frame difference information. The moving object region is then separated from the background region by comparing the current frame with the constructed background image. Finally, a post-processing step is applied on the obtained object mask to remove noise regions and to smooth the object boundary. In situations where object shadows appear in the background region, a pre-processing gradient filter is applied on the input image to reduce the shadow effect. In order to meet the real-time requirement, no computationally intensive operation is included in this method. Moreover, the implementation is optimized using parallel processing and a processing speed of 25 QCIF fps can be achieved on a personal computer with a 450-MHz Pentium III processor. Good segmentation performance is demonstrated by the simulation results.

441 citations
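
The background registration step can be sketched as follows, under simplifying assumptions: frames are flat lists of grayscale values, a pixel is registered as background once it has stayed unchanged for a few consecutive frames, and the thresholds are hypothetical. The post-processing and shadow-reducing gradient filter mentioned in the abstract are omitted.

```python
def segment(frames, diff_threshold=10, stable_frames=3):
    """Return one object mask per frame after the first.

    frames is a list of equally sized flat lists of grayscale values.
    """
    n = len(frames[0])
    stationary = [0] * n       # consecutive frames without change, per pixel
    background = [None] * n    # registered background value, per pixel
    masks = []
    for prev, cur in zip(frames, frames[1:]):
        mask = []
        for i in range(n):
            changed = abs(cur[i] - prev[i]) >= diff_threshold
            stationary[i] = 0 if changed else stationary[i] + 1
            if stationary[i] >= stable_frames:
                background[i] = cur[i]  # pixel has been stable: register it
            if background[i] is None:
                moving = changed        # fall back to frame difference
            else:
                moving = abs(cur[i] - background[i]) >= diff_threshold
            mask.append(moving)
        masks.append(mask)
    return masks
```

Comparing against the registered background, rather than the previous frame, avoids marking the area an object has just left as moving.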


Journal ArticleDOI
TL;DR: The proposed cross-diamond search (CDS) algorithm employs the halfway-stop technique and finds small motion vectors with fewer search points than the DS algorithm while maintaining similar or even better search quality.
Abstract: In block motion estimation, search patterns with different shapes or sizes and the center-biased characteristics of motion-vector distribution have a large impact on the searching speed and quality of performance. We propose a novel algorithm using a cross-search pattern as the initial step and large/small diamond search (DS) patterns as the subsequent steps for fast block motion estimation. The initial cross-search pattern is designed to fit the cross-center-biased motion vector distribution characteristics of the real-world sequences by evaluating the nine relatively higher probable candidates located horizontally and vertically at the center of the search grid. The proposed cross-diamond search (CDS) algorithm employs the halfway-stop technique and finds small motion vectors with fewer search points than the DS algorithm while maintaining similar or even better search quality. The improvement of CDS over DS can be up to a 40% gain on speedup. Experimental results show that the CDS is much more robust, and provides faster searching speed and smaller distortions than other popular fast block-matching algorithms.

392 citations


Journal ArticleDOI
TL;DR: An analytic solution for adaptive intra mode selection and joint source-channel rate control under time-varying wireless channel conditions is derived and significantly improves the end-to-end video quality in wireless video coding and transmission.
Abstract: We first develop a rate-distortion (R-D) model for DCT-based video coding incorporating the macroblock (MB) intra refreshing rate. For any given bit rate and intra refreshing rate, this model is capable of estimating the corresponding coding distortion even before a video frame is coded. We then present a theoretical analysis of the picture distortion caused by channel errors and the subsequent inter-frame propagation. Based on this analysis, we develop a statistical model to estimate such channel errors induced distortion for different channel conditions and encoder settings. The proposed analytic model mathematically describes the complex behavior of channel errors in a video coding and transmission system. Unlike other experimental approaches for distortion estimation reported in the literature, this analytic model has very low computational complexity and implementation cost, which are highly desirable in wireless video applications. Simulation results show that this model is able to accurately estimate the channel errors induced distortion with a minimum delay in processing. Based on the proposed source coding R-D model and the analytic channel-distortion estimation, we derive an analytic solution for adaptive intra mode selection and joint source-channel rate control under time-varying wireless channel conditions. Extensive experimental results demonstrate that this scheme significantly improves the end-to-end video quality in wireless video coding and transmission.

390 citations


Journal ArticleDOI
TL;DR: A novel algorithm for segmentation of moving objects in video sequences and extraction of video object planes (VOPs) based on connected components analysis and smoothness of VO displacement in successive frames is proposed.
Abstract: The new video-coding standard MPEG-4 enables content-based functionality, as well as high coding efficiency, by taking into account shape information of moving objects. A novel algorithm for segmentation of moving objects in video sequences and extraction of video object planes (VOPs) is proposed. For the case of multiple video objects in a scene, the extraction of a specific single video object (VO) based on connected components analysis and smoothness of VO displacement in successive frames is also discussed. Our algorithm begins with a robust double-edge map derived from the difference between two successive frames. After removing edge points which belong to the previous frame, the remaining edge map, the moving edge (ME), is used to extract the VOP. The proposed algorithm is evaluated on an indoor sequence captured by a low-end camera as well as MPEG-4 test sequences and produces promising results.

333 citations


Journal ArticleDOI
TL;DR: This work explores the data reuse properties of full-search block-matching for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements, and a seven-type classification system is developed that can accommodate most published ME architectures.
Abstract: This work explores the data reuse properties of full-search block-matching (FSBM) for motion estimation (ME) and associated architecture designs, as well as memory bandwidth requirements. Memory bandwidth in high-quality video is a major bottleneck to designing an implementable architecture because of large frame size and search range. First, the memory bandwidth in ME is analyzed and the problem is solved by exploring data reuse. Four levels are defined according to the degree of data reuse for previous frame access. With the highest level of data reuse, one-access for frame pixels is achieved. A scheduling strategy is also applied to data reuse of the ME architecture designs and a seven-type classification system is developed that can accommodate most published ME architectures. This classification can simplify the work of designers in designing more cost-effective ME architectures, while simultaneously minimizing memory bandwidth. Finally, a FSBM architecture suitable for high quality HDTV video with a minimum memory bandwidth feature is proposed. Our architecture is able to achieve 100% hardware efficiency while preserving minimum I/O pin count, low local memory size, and bandwidth.

308 citations


Journal ArticleDOI
TL;DR: A new framework for rate-distortion (R-D) analysis is presented, where the coding rate R and distortion D are considered as functions of ρ, which is the percentage of zeros among the quantized transform coefficients.
Abstract: We present a new framework for rate-distortion (R-D) analysis, where the coding rate R and distortion D are considered as functions of ρ, which is the percentage of zeros among the quantized transform coefficients. Previously (see He, Z. et al., Int. Conf. Acoustics, Speech and Sig. Proc., 2001), we observed that, in transform coding of images and videos, the rate function R(ρ) is approximately linear. Based on this linear rate model, a simple and unified rate control algorithm was proposed for all standard video coding systems, such as MPEG-2, H.263, and MPEG-4. We further develop a distortion model and an optimum bit allocation scheme in the ρ domain. This bit allocation scheme is applied to MPEG-4 video coding to allocate the available bits among different video objects. The bits target of each object is then achieved by our ρ-domain rate control algorithm. When coupled with a macroblock classification scheme, the above bit allocation and rate control scheme can also be applied to other video coding systems, such as H.263, at the macroblock level. Our extensive experimental results show that the proposed algorithm controls the encoder bit rate very accurately and improves the video quality significantly (by up to 1.5 dB).

279 citations


Journal ArticleDOI
TL;DR: A significant enhancement of the method by means of a new neural approach, the random NN model, and its learning algorithm is reported, both of which offer better performance for the application.
Abstract: An important and unsolved problem today is that of automatic quantification of the quality of video flows transmitted over packet networks. In particular, the ability to perform this task in real time (typically for streams that are themselves sent in real time) is especially interesting. The problem is still unsolved because there are many parameters affecting video quality, and their combined effect is not well identified and understood. Among these parameters, we have the source bit rate, the encoded frame type, the frame rate at the source, the packet loss rate in the network, etc. Only subjective evaluations give good results but, by definition, they are not automatic. We have previously explored the possibility of using artificial neural networks (NNs) to automatically quantify the quality of video flows and we showed that they can give results well correlated with human perception. In this paper, our goal is twofold. First, we report on a significant enhancement of our method by means of a new neural approach, the random NN model, and its learning algorithm, both of which offer better performance for our application. Second, we follow our approach to study and analyze the behavior of video quality for wide range variations of a set of selected parameters. This may help in developing control mechanisms in order to deliver the best possible video quality given the current network situation, and in better understanding of QoS aspects in multimedia engineering.

265 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of real-time video streaming over wireless LANs for both unicast and multicast transmission by describing a novel hybrid Automatic Repeat reQuest (ARQ) algorithm that efficiently combines forward error control (FEC) coding with the ARQ protocol.
Abstract: We address the problem of real-time video streaming over wireless LANs for both unicast and multicast transmission. The wireless channel is modeled as a packet-erasure channel at the IP level. For the unicast scenario, we describe a novel hybrid Automatic Repeat reQuest (ARQ) algorithm that efficiently combines forward error control (FEC) coding with the ARQ protocol. For the multiple-users scenario, we formulate the problem of real-time video multicast as an optimization of a maximum regret cost function across the multicast user space. The proposed solution efficiently combines progressive source coding with FEC coding. We present a theoretical analysis of the unicast and multicast cases, as well as experimental results that demonstrate the performance advantages of the proposed algorithms over existing methods.
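
The FEC side of the packet-erasure model can be illustrated with a short calculation. The sketch assumes independent packet losses and an ideal erasure code in which any k of the n transmitted packets recover the k source packets; neither assumption is stated in the abstract, and the function names are hypothetical. The ARQ retransmission logic of the proposed hybrid scheme is not modeled.

```python
from math import comb


def block_recovery_probability(k, n, loss_rate):
    """P(at least k of n packets arrive) with independent packet losses."""
    return sum(
        comb(n, lost) * loss_rate ** lost * (1.0 - loss_rate) ** (n - lost)
        for lost in range(n - k + 1)
    )


def parity_needed(k, loss_rate, target, max_parity=64):
    """Smallest number of parity packets that meets the target, or None."""
    for parity in range(max_parity + 1):
        if block_recovery_probability(k, k + parity, loss_rate) >= target:
            return parity
    return None
```

A real-time sender could use such a calculation to decide how much redundancy to send up front and how much to leave to retransmission within the delay budget.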

Journal ArticleDOI
TL;DR: The experimental results show that the proposed method of measuring blocking artifacts is effective and stable across a wide variety of images, and the proposed blocking-artifact reduction method exhibits satisfactory performance as compared to other post-processing techniques.
Abstract: Blocking artifacts continue to be among the most serious defects that occur in images and video streams compressed to low bit rates using block discrete cosine transform (DCT)-based compression standards (e.g., JPEG, MPEG, and H.263). It is of interest to be able to numerically assess the degree of blocking artifact in a visual signal, for example, in order to objectively determine the efficacy of a compression method, or to discover the quality of video content being delivered by a web server. We propose new methods for efficiently assessing, and subsequently reducing, the severity of blocking artifacts in compressed image bitstreams. The method is blind, and operates only in the DCT domain. Hence, it can be applied to unknown visual signals, and it is efficient since the signal need not be compressed or decompressed. In the algorithm, blocking artifacts are modeled as 2-D step functions. A fast DCT-domain algorithm extracts all parameters needed to detect the presence of, and estimate the amplitude of blocking artifacts, by exploiting several properties of the human vision system. Using the estimate of blockiness, a novel DCT-domain method is then developed which adaptively reduces detected blocking artifacts. Our experimental results show that the proposed method of measuring blocking artifacts is effective and stable across a wide variety of images. Moreover, the proposed blocking-artifact reduction method exhibits satisfactory performance as compared to other post-processing techniques. The proposed technique has a low computational cost hence can be used for real-time image/video quality monitoring and control, especially in applications where it is desired that the image/video data be processed directly in the DCT-domain.

Journal ArticleDOI
TL;DR: A new approach for multiple description video coding employs a second-order predictor for motion compensation that predicts the current frame from two previously coded frames; the predictor and the mismatch signal quantizer can be varied to achieve a wide range of tradeoffs between coding efficiency and error resilience.
Abstract: A new approach for multiple description video coding is proposed. It employs a second-order predictor for motion-compensation, which predicts a current frame from two previously coded frames. The coder generates two descriptions, containing the coded even and odd frames, respectively. When only a single description (say, that containing even frames) is received, the decoder can only use previous even frames for prediction. The mismatch between the predicted frames at the encoder and decoder is explicitly coded to avoid error propagation in the ideal MD channels, where a description is either received intact or lost completely. By using the second-order predictor and coding the mismatch signal, one can also suppress error propagation in packet lossy networks where packets in either description can be lost. The predictor and the mismatch signal quantizer can be varied to achieve a wide range of tradeoffs between coding efficiency and error resilience.
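
The two building blocks named in this abstract, the even/odd split and the second-order predictor, can be sketched as below. Motion compensation and the coding of the mismatch signal are omitted, frames are flat lists of pixel values, and the single `weight` parameter is an assumed way of parameterizing the predictor.

```python
def split_descriptions(frames):
    """Return (even_description, odd_description) as lists of (index, frame)."""
    indexed = list(enumerate(frames))
    return indexed[0::2], indexed[1::2]


def second_order_prediction(prev1, prev2, weight=0.5):
    """Predict a frame as a weighted sum of the two previous frames.

    prev1 is frame n-1 and prev2 is frame n-2. With weight close to 0 the
    prediction relies mostly on frame n-2, which lies in the same
    description as frame n.
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(prev1, prev2)]
```

Varying the weight in such a predictor is one way to picture the tradeoff the abstract mentions: leaning on frame n-1 favors coding efficiency, while leaning on frame n-2 reduces the mismatch when only one description arrives.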

Journal ArticleDOI
TL;DR: New methods of performing selective encryption and spatial/frequency shuffling of compressed digital content that maintain syntax compliance after content has been secured are introduced.
Abstract: We introduce new methods of performing selective encryption and spatial/frequency shuffling of compressed digital content that maintain syntax compliance after content has been secured. The tools described have been proposed to the MPEG-4 Intellectual Property Management and Protection (IPMP) standardization group and have been adopted into the MPEG-4 IPMP Final Proposed Draft Amendment (FPDAM). We describe the application of the new methods to the protection of MPEG-4 video content in the wireless environment, and illustrate how they are used to leverage established encryption algorithms for the protection of only the information fields in the bitstream that are critical to the reconstructed video quality, while maintaining compliance with the syntax of MPEG-4 video. This reduces the amount of data to be encrypted and guarantees the inheritance of many of the good properties of the unprotected bitstreams that have been carefully studied and built, such as error resiliency and network friendliness. The encrypted content bitstream works with many existing random access, network bandwidth adaptation, and error control techniques that have been developed for standard-compliant compressed video, thus making it especially suitable for wireless multimedia applications. Standard compliance also allows subsequent signal processing techniques to be applied to the encrypted bitstream.

Journal ArticleDOI
TL;DR: A unified rate control algorithm based on a linear source model is proposed for various standard video coding systems, such as MPEG-2, H.263, and MPEG-4, and outperforms other algorithms reported in the literature by providing much more accurate and robust rate control.
Abstract: We show that, in any typical transform coding system, there is always a linear relationship between the coding bit rate R and the percentage of zeros among the quantized transform coefficients, denoted by ρ. Based on Shannon's source coding theorem, a theoretical justification is provided for this linear source model. The physical meaning of the model parameter is also discussed. We show that it is directly related to the image content and is a measure of picture complexity. In video coding, we propose an adaptive estimation scheme to estimate this model parameter. Based on the linear source model and the adaptive estimation scheme, a unified rate control algorithm is proposed for various standard video coding systems, such as MPEG-2, H.263, and MPEG-4. Our extensive simulation results show that the proposed rate control outperforms other algorithms reported in the literature by providing much more accurate and robust rate control.
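
A linear source model of this kind can be turned into a simple rate control loop. The sketch assumes the line passes through zero rate when every coefficient is zero, that is R = theta * (1 - rho), and it assumes a dead-zone quantizer in which a coefficient becomes zero when its magnitude is below the step size. Both assumptions, and all function names, are illustrative and not taken from the abstract.

```python
def zero_fraction(coefficients, step):
    """rho: fraction of coefficients that would quantize to zero."""
    zeros = sum(1 for c in coefficients if abs(c) < step)
    return zeros / len(coefficients)


def estimate_theta(bits_used, rho):
    """Slope of the assumed model R = theta * (1 - rho), from one coded frame."""
    return bits_used / (1.0 - rho)


def pick_step(coefficients, theta, target_bits, steps=range(1, 32)):
    """Smallest step whose predicted rate does not exceed the target.

    steps must be an ascending sequence of candidate step sizes.
    """
    for step in steps:
        predicted = theta * (1.0 - zero_fraction(coefficients, step))
        if predicted <= target_bits:
            return step
    return steps[-1]
```

In use, theta would be re-estimated after each coded frame from the bits actually spent, which is a simple form of the adaptive estimation the abstract refers to.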

Journal ArticleDOI
TL;DR: Two techniques are proposed, the generalized motion vector predictor and the adaptive threshold calculation, that can be used to significantly improve the performance of many existing fast ME algorithms and create two new algorithms, named advanced predictive diamond zonal search and predictive MV field adaptive search technique.
Abstract: Motion estimation (ME) is an important part of any video encoding system since it can significantly affect the output quality of an encoded sequence. Unfortunately, it consumes a significant part of the encoding time, especially when the straightforward full search (FS) algorithm is used. We propose two techniques, the generalized motion vector (MV) predictor and the adaptive threshold calculation, that can be used to significantly improve the performance of many existing fast ME algorithms. In particular, we apply them to create two new algorithms, named advanced predictive diamond zonal search and predictive MV field adaptive search technique, respectively, which can considerably reduce, if not essentially remove, the computational cost of ME at the encoder, while at the same time giving visual quality similar to, and in many cases better than, that of the brute-force full search algorithm. The proposed algorithms mainly rely upon very robust and reliable predictive techniques and early termination criteria with parameters adapted to the local characteristics, combined with zonal-based patterns. Our experiments verify the considerable superiority of the proposed algorithms over possibly all other known fast algorithms, and over FS.

Journal ArticleDOI
TL;DR: A new method for automatic segmentation of moving objects in image sequences for VOP extraction using a Markov random field, based on motion information, spatial information and the memory is presented.
Abstract: The emerging video coding standard MPEG-4 enables various content-based functionalities for multimedia applications. To support such functionalities, as well as to improve coding efficiency, MPEG-4 relies on a decomposition of each frame of an image sequence into video object planes (VOP). Each VOP corresponds to a single moving object in the scene. This paper presents a new method for automatic segmentation of moving objects in image sequences for VOP extraction. We formulate the problem as graph labeling over a region adjacency graph (RAG), based on motion information. The label field is modeled as a Markov random field (MRF). An initial spatial partition of each frame is obtained by a fast, floating-point based implementation of the watershed algorithm. The motion of each region is estimated by hierarchical region matching. To avoid inaccuracies in occlusion areas, a novel motion validation scheme is presented. A dynamic memory, based on object tracking, is incorporated into the segmentation process to maintain temporal coherence of the segmentation. Finally, a labeling is obtained by maximization of the a posteriori probability of the MRF using motion information, spatial information and the memory. The optimization is carried out by highest confidence first (HCF). Experimental results for several video sequences demonstrate the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: This work proposes multiple description (MD) video coders which use motion-compensated predictions and provides three different algorithms to control the mismatch between the prediction loops at the encoder and decoder.
Abstract: We propose multiple description (MD) video coders which use motion-compensated predictions. Our MD video coders utilize MD transform coding and three separate prediction paths at the encoder to mimic the three possible scenarios at the decoder: both descriptions received or either of the single descriptions received. We provide three different algorithms to control the mismatch between the prediction loops at the encoder and decoder. We present simulation results comparing the three approaches to two standards-based approaches to MD video coding. We show that when the main prediction loop at the encoder uses a two-channel reconstruction, it is important to have side prediction loops and transmit some redundancy information to control mismatch. We also examine the performance of our MD video coder with partial mismatch control in the presence of random packet loss, and demonstrate a significant improvement compared to more traditional approaches.

Journal ArticleDOI
TL;DR: A protocol architecture is described that addresses the need for high bandwidth and more robust end-to-end connections in a multihop mobile radio network and the performance of the MDC-MPT scheme is compared to a system using layered coding and asymmetrical paths for the base and enhancement layers.
Abstract: This paper examines the effectiveness of combining multiple description coding (MDC) and multiple path transport (MPT) for video and image transmission in a multihop mobile radio network. The video and image information is encoded nonhierarchically into multiple descriptions with the following objectives. The received picture quality should be acceptable, even if only one description is received and every additional received description contributes to enhanced picture quality. Typical applications will need a higher bandwidth/higher reliability connection than that provided by a single link in current mobile networks. To support these applications, a mobile node may need to set up and use multiple paths to the desired destination, either simply because of the lack of raw bandwidth on a single channel or because of its poor error characteristics, which reduce its effective throughput. The principal reason for considering such an architecture is to provide high bandwidth and more robust end-to-end connections. We describe a protocol architecture that addresses this need and, with the help of simulations, we demonstrate the feasibility of this system and compare the performance of the MDC-MPT scheme to a system using layered coding and asymmetrical paths for the base and enhancement layers.

Journal ArticleDOI
TL;DR: With this framework, it is shown how keyword-based query and semantic filtering is possible for a predetermined set of concepts and how detection performance can be significantly improved using the multinet to take inter-conceptual relationships into account.
Abstract: Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern-recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework which can enforce spatio-temporal constraints. Using probabilistic models for multijects, rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved using the multinet to take inter-conceptual relationships into account. Our experiments using a large video database consisting of clips from several movies and based on a set of five semantic concepts reveal a significant improvement in detection performance by over 22%. We also show how the multinet is extended to take temporal correlation into account. By constructing a dynamic multinet, we show that the detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering is possible for a predetermined set of concepts.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new framework for error concealment in block-based image coding systems, called sequential recovery, which reduces the complexity of statistical modeling from blockwise to pixelwise.
Abstract: This paper introduces a new framework for error concealment in block-based image coding systems: sequential recovery. Unlike previous approaches that simultaneously recover the pixels inside a missing block, we propose to recover them in a sequential fashion such that the previously-recovered pixels can be used in the recovery process afterwards. The principal advantage of the sequential approach is the improved capability of recovering important image features brought by the reduction in the complexity of statistical modeling, i.e., from blockwise to pixelwise. Under the framework of sequential recovery, we present an orientation adaptive interpolation scheme derived from the pixelwise statistical model. We also investigate the problem of error propagation with sequential recovery and propose a linear merge strategy to alleviate it. Extensive experimental results are used to demonstrate the improvement of the proposed sequential error-concealment technique over previous techniques in the literature.

Journal ArticleDOI
TL;DR: Experimental results show that simultaneously adjusting the source coding and transmission power is more energy efficient than considering these factors separately.
Abstract: We consider a situation where a video sequence is to be compressed and transmitted over a wireless channel. Our goal is to limit the amount of distortion in the received video sequence, while minimizing transmission energy. To accomplish this goal, we consider error resilience and concealment techniques at the source coding level, and transmission power management at the physical layer. We jointly consider these approaches in a novel framework. In this setting, we formulate and solve an optimization problem that corresponds to minimizing the energy required to transmit video under distortion and delay constraints. Experimental results show that simultaneously adjusting the source coding and transmission power is more energy efficient than considering these factors separately.

Journal ArticleDOI
TL;DR: An unsupervised fuzzy c-means algorithm is used to detect video-shot boundaries in order to segment a news video into video shots, and a graph-theoretical cluster analysis algorithm is implemented to classify the video shots into anchorperson shots and news footage shots.
Abstract: News story parsing is an important and challenging task in a news video library system. We address two important components in a news video story parsing system: shot boundary detection and anchorperson detection. First, an unsupervised fuzzy c-means algorithm is used to detect video-shot boundaries in order to segment a news video into video shots. Then, a graph-theoretical cluster analysis algorithm is implemented to classify the video shots into anchorperson shots and news footage shots. Because of their unsupervised nature, the algorithms require little human intervention. The efficacy of the proposed method is extensively tested on more than five hours of news programs.

Journal ArticleDOI
TL;DR: A new approach to threshold selection is proposed, which permits reducing the probability of missed detection to a minimum, while ensuring a given false detection probability, with respect to existing grayscale algorithms.
Abstract: In the field of image watermarking, research has been mainly focused on grayscale image watermarking, whereas the extension to the color case is usually accomplished by marking the image luminance, or by processing each color channel separately. A DCT domain watermarking technique expressly designed to exploit the peculiarities of color images is presented. The watermark is hidden within the data by modifying a subset of full-frame DCT coefficients of each color channel. Detection is based on a global correlation measure which is computed by taking into account the information conveyed by the three color channels as well as their interdependency. To ultimately decide whether or not the image contains the watermark, the correlation value is compared to a threshold. With respect to existing grayscale algorithms, a new approach to threshold selection is proposed, which permits reducing the probability of missed detection to a minimum, while ensuring a given false detection probability. Experimental results, as well as theoretical analysis, are presented to demonstrate the validity of the new approach with respect to algorithms operating on image luminance only.

Journal ArticleDOI
TL;DR: A motion-compensated, transform-domain super-resolution procedure for creating high-quality video or still images that directly incorporates the transform-domain quantization information by working with the compressed bit stream is proposed.
Abstract: There are a number of useful methods for creating high-quality video or still images from a lower quality video source. The best of these involve motion compensating a number of video frames to produce the desired video or still. These methods are formulated in the space domain and they require that the input be expressed in that format. More and more frequently, however, video sources are presented in a compressed format, such as MPEG, H.263, or DV. Ironically, there is important information in the compressed domain representation that is lost if the video is first decompressed and then used with a spatial-domain method. In particular, quantization information is lost once the video has been decompressed. Here, we propose a motion-compensated, transform-domain super-resolution procedure for creating high-quality video or still images that directly incorporates the transform-domain quantization information by working with the compressed bit stream. We apply this new formulation to MPEG-compressed video and demonstrate its effectiveness.

Journal ArticleDOI
TL;DR: A novel frequency-domain technique for image blocking artifact detection and reduction is presented and experimental results illustrating the performance of the proposed method are presented and evaluated.
Abstract: A novel frequency-domain technique for image blocking artifact detection and reduction is presented. The algorithm first detects the regions of the image which present visible blocking artifacts. This detection is performed in the frequency domain and uses the estimated relative quantization error calculated when the discrete cosine transform (DCT) coefficients are modeled by a Laplacian probability function. Then, for each block affected by blocking artifacts, its DC and AC coefficients are recalculated for artifact reduction. To achieve this, a closed-form representation of the optimal correction of the DCT coefficients is produced by minimizing a novel enhanced form of the mean squared difference of slope for every frequency separately. This correction of each DCT coefficient depends on the eight neighboring coefficients in the subband-like representation of the DCT transform and is constrained by the quantization upper and lower bound. Experimental results illustrating the performance of the proposed method are presented and evaluated.
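
The last constraint described in the abstract above, keeping each corrected DCT coefficient within its quantization bounds, can be illustrated with a minimal sketch. It assumes a uniform quantizer that rounds to the nearest level, so a coefficient stored as integer `level` with step size `step` originally lay in the interval from `(level - 0.5) * step` to `(level + 0.5) * step`. The function name and this quantizer model are assumptions made for illustration and are not taken from the paper.

```python
def constrain_to_quantization_interval(corrected, level, step):
    """Clamp a corrected DCT coefficient to its quantization interval.

    Assumption (not from the paper): uniform quantization with rounding,
    so the original coefficient lay within half a step of level * step.
    """
    lower = (level - 0.5) * step
    upper = (level + 0.5) * step
    # Keep the correction only as far as the bounds allow.
    return max(lower, min(upper, corrected))


if __name__ == "__main__":
    # Quantized level 2 with step 16 means the interval [24, 40].
    print(constrain_to_quantization_interval(37.0, 2, 16))  # inside the interval
    print(constrain_to_quantization_interval(45.0, 2, 16))  # clipped to the upper bound
    print(constrain_to_quantization_interval(10.0, 2, 16))  # clipped to the lower bound
```

Clamping in this way means that an artifact-reduction step, whatever its form, never produces a coefficient that the encoder could not have quantized to the stored level.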

Journal ArticleDOI
TL;DR: Though there is a marginal increase in the computation required in image-halving, the computation overhead of the proposed modification is higher compared to the Dugad-Ahuja algorithm in the case of doubling the images.
Abstract: Resizing of digital images is needed in various applications, such as transmission of images over communication channels varying widely in their bandwidths, display at different resolutions depending on the resolution of a display device, etc. In this work, we propose a modification of a recently proposed elegant image resizing algorithm by Dugad and Ahuja (2001). We have also extended their approach and our modified versions to color images and studied their performance at different levels of compression for an image. Our proposed modified algorithms, in general, perform better than the earlier method in most cases. Though there is a marginal increase in the computation required in image-halving, the computation overhead of the proposed modification is higher compared to the Dugad-Ahuja algorithm in the case of doubling the images.

Journal ArticleDOI
TL;DR: This work investigates the relations of rate, distortion and power consumption in video coding and proposes a power-minimized bit-allocation scheme considering the processing power, for source coding and channel coding, jointly with the transmission power.
Abstract: Video communication over wireless links using handheld devices is a challenging task due to the time-varying characteristics of the wireless channels and limited battery resources. Rate-distortion (RD) analysis plays a key role in video coding and communication systems, and usually the RD relation does not assume any power constraint. We investigate the relations of rate, distortion and power consumption. Based on those relations, we propose a power-minimized bit-allocation scheme considering the processing power, for source coding and channel coding, jointly with the transmission power. The total bits are allocated between source and channel coders, according to wireless channel conditions and video quality requirements, to minimize the total power consumption for a single user and a group of users in a cell, respectively. Simulation results show that our proposed joint power-control and bit-allocation scheme achieves high power savings compared to the conventional scheme.

Journal ArticleDOI
TL;DR: Experimental results on the proposed online processing scheme combined with efficient VOS show the proposed integrated scheme generates desirable summarizations of surveillance videos.
Abstract: Key frames are the subset of still images which best represent the content of a video sequence in an abstracted manner. In other words, video abstraction transforms an entire video clip to a small number of representative images. We present a scheme for object-based video abstraction facilitated by an efficient video-object segmentation (VOS) system. In such a framework, the concept of a "key frame" is replaced by that of a "key video-object plane (VOP)." In order to achieve an online object-based framework such as an object-based video surveillance system, it becomes essential that semantically meaningful video objects are directly accessed from video sequences. Moreover, the extraction of key VOPs needs to be automated and context dependent so that they maintain the important contents of the video while removing all redundancies. Once a VOP is extracted, the shape of the VOP needs to be well described. To this end, both region-based and contour-based shape descriptors are investigated, and the region-based descriptor is selected for the proposed system. The key VOPs are extracted in a sequential manner by successive comparison with the previously declared key VOP. Experimental results on the proposed online processing scheme combined with efficient VOS show the proposed integrated scheme generates desirable summarizations of surveillance videos.
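
The sequential extraction step described in the abstract above, where each candidate is compared with the previously declared key VOP, can be sketched as follows. This is a minimal illustration under stated assumptions: each VOP is represented by a plain tuple of numbers standing in for a shape descriptor, the dissimilarity is an L1 distance, and the first VOP is always declared key. The function name, the distance measure, and the threshold are hypothetical and are not taken from the paper.

```python
def select_key_items(descriptors, threshold):
    """Return indices of items declared 'key' by sequential comparison.

    Assumptions (not from the paper): descriptors are equal-length numeric
    tuples, dissimilarity is the L1 distance, and item 0 is always key.
    """
    if not descriptors:
        return []
    keys = [0]
    for i in range(1, len(descriptors)):
        last_key = descriptors[keys[-1]]
        # Compare only with the most recently declared key item.
        distance = sum(abs(a - b) for a, b in zip(descriptors[i], last_key))
        if distance > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    shapes = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
    print(select_key_items(shapes, threshold=0.5))
```

Because each item is compared only with the last declared key, the selection needs a single pass and no look-ahead, which fits the online processing setting the abstract describes.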

Journal ArticleDOI
TL;DR: Experimental results show that multihypothesis prediction significantly improves coding efficiency by utilizing variable block size and multiframe motion compensation, that variable block size and multihypothesis prediction provide gains for different scenarios, and that multiframe motion compensation enhances the multihypothesis gain.
Abstract: This paper investigates linearly combined motion-compensated signals for video compression. In particular, we discuss multiple motion-compensated signals that are jointly estimated for efficient prediction and video coding. First, we extend the wide-sense stationary theory of motion-compensated prediction (MCP) for the case of jointly estimated prediction signals. Our theory suggests that the gain by multihypothesis MCP is limited and that two jointly estimated hypotheses provide a major portion of this achievable gain. In addition, the analysis reveals a property of the displacement error of jointly estimated hypotheses. Second, we present a complete multihypothesis codec which is based on the ITU-T Recommendation H.263 with multiframe capability. Multiframe motion compensation chooses one prediction signal from a set of reference frames, whereas multihypothesis prediction chooses more than one for the linear combination. With our scheme, the time delay associated with B-frames is avoided by choosing more than one prediction signal from previously decoded pictures. Experimental results show that multihypothesis prediction significantly improves coding efficiency by utilizing variable block size and multiframe motion compensation. We show that variable block size and multihypothesis prediction provide gains for different scenarios and that multiframe motion compensation enhances the multihypothesis gain. For example, the presented multihypothesis codec with ten reference frames improves coding efficiency by up to 2.7 dB when compared to the reference codec with one reference frame for the set of investigated test sequences.
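
The core idea in the abstract above, predicting a block from a linear combination of several motion-compensated signals, can be illustrated with a small sketch. It assumes each hypothesis is a flat list of pixel values already fetched from a reference frame, and it uses equal weights by default. The function names, the toy pixel values, and the equal-weight choice are assumptions for illustration, not the codec described in the paper.

```python
def combine_hypotheses(hypotheses, weights=None):
    """Linearly combine several prediction signals of equal length.

    Assumption (not from the paper): equal weights unless given explicitly.
    """
    count = len(hypotheses)
    if weights is None:
        weights = [1.0 / count] * count
    length = len(hypotheses[0])
    return [
        sum(w * h[i] for w, h in zip(weights, hypotheses))
        for i in range(length)
    ]


def mean_squared_error(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    target = [10, 20, 30, 40]   # block to be predicted (toy values)
    hyp_a = [12, 18, 32, 38]    # hypothetical prediction from one reference
    hyp_b = [9, 23, 29, 41]     # hypothetical prediction from another
    combined = combine_hypotheses([hyp_a, hyp_b])
    print(mean_squared_error(hyp_a, target))
    print(mean_squared_error(hyp_b, target))
    print(mean_squared_error(combined, target))
```

In this toy case the errors of the two hypotheses partly cancel, so the averaged signal has a lower mean squared error than either one alone. Real gains depend on how the hypotheses are estimated, and the abstract notes that the achievable gain is limited.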