Patent

System and method for object based parametric video coding

TL;DR: A video compression framework based on parametric object and background compression, in which an embodiment detects objects and segments each frame into regions corresponding to the foreground object and the background.
Abstract: A video compression framework based on parametric object and background compression is proposed. At the encoder, an embodiment detects objects and segments frames into regions corresponding to the foreground object and the background. The object and the background are individually encoded using separate parametric coding techniques. The object is encoded using its projection coefficients onto the orthonormal basis of the learnt subspace (the same subspace used for appearance-based object tracking), while the background is characterized using an auto-regressive (AR) process model. An advantage of the proposed schemes is that the decoder structure allows the object and the background to be reconstructed simultaneously, which makes it well suited to multi-thread/multi-processor architectures.
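The two parametric coders named in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the flat-list patch layout, the noise-free AR recursion, and the use of a two-worker thread pool to stand in for "simultaneous reconstruction" are all assumptions made for illustration. The basis is assumed to be orthonormal, and the seed is assumed to hold at least as many samples as there are AR coefficients.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_object(patch, mean, basis):
    """Project a mean-centred patch onto each orthonormal basis vector."""
    centred = [p - m for p, m in zip(patch, mean)]
    return [sum(c * b for c, b in zip(centred, vec)) for vec in basis]


def decode_object(coeffs, mean, basis):
    """Rebuild the patch as mean + sum_k coeffs[k] * basis[k]."""
    out = list(mean)
    for c, vec in zip(coeffs, basis):
        for i, b in enumerate(vec):
            out[i] += c * b
    return out


def synthesize_background(seed, ar_coeffs, length):
    """Extend the seed with a noise-free AR recursion:
    x[n] = sum_k ar_coeffs[k] * x[n - 1 - k]."""
    x = list(seed)
    while len(x) < length:
        x.append(sum(a * x[-1 - k] for k, a in enumerate(ar_coeffs)))
    return x


def decode_frame(coeffs, mean, basis, seed, ar_coeffs, length):
    """Reconstruct object and background concurrently (hypothetical decoder)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        obj = pool.submit(decode_object, coeffs, mean, basis)
        bg = pool.submit(synthesize_background, seed, ar_coeffs, length)
        return obj.result(), bg.result()
```

Only the projection coefficients and the AR parameters would need to be transmitted in such a scheme, since the two decode paths share no state and can run on separate threads.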
Citations
Patent
04 Jan 2008
TL;DR: In this article, a method and apparatus for image data compression includes detecting a portion of an image signal that uses a disproportionate amount of bandwidth compared to other portions of the image signal.
Abstract: A method and apparatus for image data compression includes detecting a portion of an image signal that uses a disproportionate amount of bandwidth compared to other portions of the image signal. The detected portion of the image signal results in determined components of interest. Relative to a certain variance, the method and apparatus normalize the determined components of interest to generate an intermediate form of the components of interest. The intermediate form represents the components of interest reduced in complexity by the certain variance and enables a compressed form of the image signal where the determined components of interest maintain saliency. In one embodiment, the video signal is a sequence of video frames. The step of detecting includes any of: (i) analyzing image gradients across one or more frames where image gradient is a first derivative model and gradient flow is a second derivative, (ii) integrating finite differences of pels temporally or spatially to form a derivative model, (iii) analyzing an illumination field across one or more frames, and (iv) predictive analysis, to determine bandwidth consumption. The determined bandwidth consumption is then used to determine the components of interest.

50 citations

Patent
06 Oct 2009
TL;DR: A feature-based model for video compression is built from aggregated instances of a candidate feature; it includes models of deformation variation and appearance variation, and its compression efficiency is compared with that of conventional video compression.
Abstract: Systems and methods of processing video data are provided. Video data having a series of video frames is received and processed. One or more instances of a candidate feature are detected in the video frames. The previously decoded video frames are processed to identify potential matches of the candidate feature. When a substantial amount of portions of previously decoded video frames include instances of the candidate feature, the instances of the candidate feature are aggregated into a set. The candidate feature set is used to create a feature-based model. The feature-based model includes a model of deformation variation and a model of appearance variation of instances of the candidate feature. The feature-based model compression efficiency is compared with the conventional video compression efficiency.

43 citations

Patent
03 Sep 2015
TL;DR: A temporal contrast sensitivity function (TCSF) is computed from the encoder's motion vectors, and spatial complexity maps (SCMs), calculated from metrics such as block variance, block luminance, SSIM, and edge strength, are combined with the TCSF to obtain a unified importance map.
Abstract: Perceptual statistics may be used to compute importance maps that indicate which regions of a video frame are important to the human visual system. Importance maps may be applied to the video encoding process to enhance the quality of encoded bitstreams. The temporal contrast sensitivity function (TCSF) may be computed from the encoder's motion vectors. Motion vector quality metrics may be used to construct a true motion vector map (TMVM) that can be used to refine the TCSF. Spatial complexity maps (SCMs) can be calculated from metrics such as block variance, block luminance, SSIM, and edge strength, and the SCMs can be combined with the TCSF to obtain a unified importance map. Importance maps may be used to improve encoding by modifying the criterion for selecting optimum encoding solutions or by modifying the quantization for each target block to be encoded.

35 citations

Patent
20 Feb 2013
TL;DR: In this article, a model-based compression framework makes use of the preserved model data by detecting features in a new video to be encoded, relating those features to specific blocks of data, and accessing similar model information from the model library.
Abstract: Systems and methods of improving video encoding/decoding efficiency may be provided. A feature-based processing stream is applied to video data having a series of video frames. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks, and each track is given a representative, characteristic feature. Similar characteristic features are clustered and then stored in a model library, for reuse in the compression of other videos. A model-based compression framework makes use of the preserved model data by detecting features in a new video to be encoded, relating those features to specific blocks of data, and accessing similar model information from the model library. The formation of model libraries can be specialized to include personal, “smart” model libraries, differential libraries, and predictive libraries. Predictive model libraries can be modified to handle a variety of demand scenarios.

35 citations

Patent
04 Jan 2008
TL;DR: A photorealistic avatar representation of a video conference participant is created using a face detector and an object-based video compression algorithm. The algorithm uses machine learning face detection techniques and creates the avatar representation from parameters derived from density, structure, deformation, appearance and illumination models.
Abstract: Systems and methods for processing video are provided. Video compression schemes are provided to reduce the number of bits required to store and transmit digital media in video conferencing or videoblogging applications. A photorealistic avatar representation of a video conference participant is created. The avatar representation can be based on portions of a video stream that depict the conference participant. A face detector is used to identify, track and classify the face. Object models including density, structure, deformation, appearance and illumination models are created based on the detected face. An object based video compression algorithm, which uses machine learning face detection techniques, creates the photorealistic avatar representation from parameters derived from the density, structure, deformation, appearance and illumination models.

33 citations

References
Journal Article
TL;DR: The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set.
Abstract: The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.

5,804 citations
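The propagation of a weighted random set described in the Condensation abstract above can be sketched as one select, predict, measure cycle. This is a toy illustration under stated assumptions rather than the published algorithm: the state is a single scalar, the dynamical model is a fixed drift plus Gaussian diffusion instead of a learned model, the observation likelihood is Gaussian, and all function and parameter names are hypothetical.

```python
import math
import random


def condensation_step(particles, weights, observation, rng,
                      drift=0.0, process_std=1.0, obs_std=1.0):
    """One select/predict/measure cycle over a weighted sample set."""
    n = len(particles)
    # Select: factored sampling, draw n samples with probability
    # proportional to their current weights.
    selected = rng.choices(particles, weights=weights, k=n)
    # Predict: apply the (assumed) dynamics, deterministic drift
    # followed by random diffusion.
    predicted = [p + drift + rng.gauss(0.0, process_std) for p in selected]
    # Measure: reweight each sample by the (assumed Gaussian)
    # likelihood of the observation given that sample.
    raw = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
           for p in predicted]
    total = sum(raw)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # fall back to uniform weights
    else:
        new_weights = [w / total for w in raw]
    return predicted, new_weights


def estimate(particles, weights):
    """Weighted mean of the sample set."""
    return sum(p * w for p, w in zip(particles, weights))
```

Because the sample set is never collapsed to a single Gaussian, several separated clusters of particles can survive from step to step, which is how the method keeps simultaneous alternative hypotheses alive.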

Journal Article
TL;DR: A “subspace constancy assumption” is defined that allows techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image.
Abstract: This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second we define a “subspace constancy assumption” that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular we use this “EigenTracking” technique to track and recognize the gestures of a moving hand.

1,343 citations

Journal Article
TL;DR: A framework for learning robust, adaptive appearance models for motion-based tracking of natural objects. The models provide robustness to image outliers while adapting to natural changes in appearance, such as those due to facial expressions or variations in 3D pose.
Abstract: We propose a framework for learning robust, adaptive, appearance models to be used for motion-based tracking of natural objects. The model adapts to slowly changing appearance, and it maintains a natural measure of the stability of the observed image structure during tracking. By identifying stable properties of appearance, we can weight them more heavily for motion estimation, while less stable properties can be proportionately downweighted. The appearance model involves a mixture of stable image structure, learned over long time courses, along with two-frame motion information and an outlier process. An online EM-algorithm is used to adapt the appearance model parameters over time. An implementation of this approach is developed for an appearance model based on the filter responses from a steerable pyramid. This model is used in a motion-based tracking algorithm to provide robustness in the face of image outliers, such as those caused by occlusions, while adapting to natural changes in appearance such as those due to facial expressions or variations in 3D pose.

1,142 citations

Journal Article
TL;DR: A fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), computes the principal components of a sequence of samples incrementally without estimating the covariance matrix (hence covariance-free).
Abstract: Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed.

479 citations
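The covariance-free update described in the CCIPCA abstract above can be sketched for the first principal component only. The sketch makes several simplifying assumptions that go beyond the abstract: the samples are already zero-mean, no amnesic (forgetting) parameter is used, and higher-order components, which would require deflating each sample by the components already found, are omitted. The function name is hypothetical.

```python
import math


def ccipca_first_component(samples):
    """Estimate the first principal component one sample at a time.

    The running vector v is updated as
        v <- ((n - 1) / n) * v + (1 / n) * (u . v / ||v||) * u
    so no covariance matrix is ever formed. Returns (unit_vector, norm),
    where norm approximates the corresponding eigenvalue.
    """
    v = None
    for n, u in enumerate(samples, start=1):
        norm = 0.0 if v is None else math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            v = list(u)  # initialise (or restart) from the current sample
            continue
        # Projection of the sample onto the current direction estimate.
        proj = sum(a * b for a, b in zip(u, v)) / norm
        v = [((n - 1) / n) * vi + (1.0 / n) * proj * ui
             for vi, ui in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm
```

Each update costs time linear in the sample dimension and uses every sample exactly once, which matches the abstract's emphasis on real-time use without iterations.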