
Showing papers on "Human visual system model published in 2011"


Journal ArticleDOI
TL;DR: A classification scheme for full-reference and reduced-reference media-layer objective video quality assessment methods is introduced and it is found that the natural visual statistics based MultiScale-Structural SIMilarity index (MS-SSIM), the natural visual feature based Video Quality Metric (VQM), and the perceptual spatio-temporal frequency-domain based MOtion-based Video Integrity Evaluation (MOVIE) index give the best performance for the LIVE Video Quality Database.
Abstract: With the increasing demand for video-based applications, the reliable prediction of video quality has increased in importance. Numerous video quality assessment methods and metrics have been proposed over the past years with varying computational complexity and accuracy. In this paper, we introduce a classification scheme for full-reference and reduced-reference media-layer objective video quality assessment methods. Our classification scheme first classifies a method according to whether natural visual characteristics or perceptual (human visual system) characteristics are considered. We further subclassify natural visual characteristics methods into methods based on natural visual statistics or natural visual features. We subclassify perceptual characteristics methods into frequency or pixel-domain methods. According to our classification scheme, we comprehensively review and compare the media-layer objective video quality models for both standard resolution and high definition video. We find that the natural visual statistics based MultiScale-Structural SIMilarity index (MS-SSIM), the natural visual feature based Video Quality Metric (VQM), and the perceptual spatio-temporal frequency-domain based MOtion-based Video Integrity Evaluation (MOVIE) index give the best performance for the LIVE Video Quality Database.
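MS-SSIM, the best-performing statistics-based metric above, extends the single-scale SSIM index by combining luminance, contrast, and structure terms across a dyadic image pyramid. As a minimal illustration of the underlying similarity measure only (not the full multi-scale metric), the sketch below computes SSIM from global image statistics in pure Python; production implementations instead use an 11x11 sliding Gaussian window and per-scale weights.

```python
import math

def ssim_global(x, y, L=255, k1=0.01, k2=0.03):
    """Simplified SSIM: global image statistics instead of the usual
    local sliding window; x and y are flat lists of pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # stabilisers from the SSIM paper
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

ref = [52, 55, 61, 59, 79, 61, 76, 41]
dist = [50, 57, 60, 60, 77, 63, 74, 43]
print(round(ssim_global(ref, ref), 4))  # identical images -> 1.0
print(ssim_global(ref, dist) < 1.0)     # any distortion lowers the score
```

The score is 1.0 only for identical inputs and decreases as structure diverges, which is what makes it usable as a full-reference quality index.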

631 citations


Journal ArticleDOI
TL;DR: The main conclusion is that contour detection has reached high degree of sophistication, taking into account multimodal contour definition (by luminance, color or texture changes), mechanisms for reducing the contour masking influence of noise and texture, perceptual grouping, multiscale aspects and high-level vision information.

347 citations


Journal ArticleDOI
TL;DR: Whether and to what extent the addition of NSS is beneficial to objective quality prediction in general terms is evaluated, and some practical issues in the design of an attention-based metric are addressed.
Abstract: Since the human visual system (HVS) is the ultimate assessor of image quality, current research on the design of objective image quality metrics tends to include an important feature of the HVS, namely, visual attention. Different metrics for image quality prediction have been extended with a computational model of visual attention, but the resulting gain in reliability of the metrics so far was variable. To better understand the basic added value of including visual attention in the design of objective metrics, we used measured data of visual attention. To this end, we performed two eye-tracking experiments: one with a free-looking task and one with a quality assessment task. In the first experiment, 20 observers looked freely at 29 unimpaired original images, yielding so-called natural scene saliency (NSS). In the second experiment, 20 different observers assessed the quality of distorted versions of the original images. The resulting saliency maps showed some differences with the NSS, and therefore, we applied both types of saliency to four different objective metrics predicting the quality of JPEG compressed images. For both types of saliency, the performance of the metrics improved, but to a larger extent when adding the NSS. As a consequence, we further integrated NSS in several state-of-the-art quality metrics, including three full-reference metrics and two no-reference metrics, and evaluated their prediction performance for a larger set of distortions. By doing so, we evaluated whether and to what extent the addition of NSS is beneficial to objective quality prediction in general terms. In addition, we address some practical issues in the design of an attention-based metric. The eye-tracking data are made available to the research community.

254 citations


25 Oct 2011
TL;DR: The human visual system is the most complex pattern recognition device known as discussed by the authors, and the visual cortex arrives at a simple and unambiguous interpretation of data from the retinal image that is useful for the decisions and actions of everyday life.
Abstract: The human visual system is the most complex pattern recognition device known. In ways that are yet to be fully understood, the visual cortex arrives at a simple and unambiguous interpretation of data from the retinal image that is useful for the decisions and actions of everyday life. Recent advances in Bayesian models of computer vision and in the measurement and modeling of natural image statistics are providing the tools to test and constrain theories of human object perception. In turn, these theories are having an impact on the interpretation of cortical function.

247 citations


Journal ArticleDOI
TL;DR: This paper proposes two simple quality measures to correlate detail losses and additive impairments with visual quality, respectively and demonstrates that the proposed metric has a better or similar performance in matching subjective ratings when compared with the state-of-the-art image quality metrics.
Abstract: In the research field of image processing, mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are extensively adopted as the objective visual quality metrics, mainly because of their simplicity for calculation and optimization. However, it has been well recognized that these pixel-based difference measures correlate poorly with the human perception. Inspired by existing works, in this paper we propose a novel algorithm which separately evaluates detail losses and additive impairments for image quality assessment. The detail loss refers to the loss of useful visual information which affects the content visibility, and the additive impairment represents the redundant visual information whose appearance in the test image will distract the viewer's attention from the useful contents, causing an unpleasant viewing experience. To separate detail losses and additive impairments, a wavelet-domain decoupling algorithm is developed which can be used for a host of distortion types. Two HVS characteristics, i.e., the contrast sensitivity function and the contrast masking effect, are taken into account to approximate the HVS sensitivities. We propose two simple quality measures to correlate detail losses and additive impairments with visual quality, respectively. Based on the finding that observers judge low-quality images in terms of the ability to interpret the content, the outputs of the two quality measures are adaptively combined to yield the overall quality index. By conducting experiments based on five subjectively-rated image databases, we demonstrate that the proposed metric has a better or similar performance in matching subjective ratings when compared with the state-of-the-art image quality metrics.
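The decoupling above operates in the wavelet domain, where coarse structure and residual detail live in separate subbands. As an illustration of that domain only (not the paper's decoupling algorithm), the sketch below performs a one-level Haar transform that splits a signal into a smooth approximation band and a detail band, and reconstructs it exactly:

```python
def haar_1d(signal):
    """One-level Haar wavelet transform: pair averages carry the coarse
    structure, pair differences carry the detail band."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_1d_inverse(approx, detail):
    """Perfect reconstruction: each (a, d) pair restores two samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

sig = [9.0, 7.0, 3.0, 5.0, 6.0, 10.0, 2.0, 6.0]
a, d = haar_1d(sig)
print(a)                             # [8.0, 4.0, 8.0, 4.0]
print(haar_1d_inverse(a, d) == sig)  # True (perfect reconstruction)
```

Because the transform is invertible, any separation of the detail coefficients into "lost detail" and "added impairment" components, as the paper proposes, can be mapped back to the pixel domain without further loss.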

219 citations


Journal ArticleDOI
TL;DR: A total variation model for Retinex is presented, which assumes spatial smoothness of the illumination and piecewise continuity of the reflection, where the total variation term is employed in the model.
Abstract: Human vision has the ability to recognize color under varying illumination conditions. Retinex theory is introduced to explain how the human visual system perceives color. The main aim of this paper is to present a total variation model for Retinex. Different from the existing methods, we consider and study two important elements which include illumination and reflection. We assume spatial smoothness of the illumination and piecewise continuity of the reflection, where the total variation term is employed in the model. The existence of the solution of the model is shown in the paper. We employ a fast computation method to solve the proposed minimization problem. Numerical examples are presented to illustrate the effectiveness of the proposed model.
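The model minimises a data-fidelity term plus a total variation penalty on the estimated layers. As a toy 1-D analogue only (the paper treats a 2-D problem with a fast solver, not reproduced here), the sketch below minimises a quadratic fidelity term plus a TV term by subgradient descent and checks that the estimate is smoother than the input:

```python
def sign(v):
    return (v > 0) - (v < 0)

def tv_smooth(s, lam=1.0, step=0.02, iters=2000):
    """Toy 1-D analogue of TV-regularised estimation: minimise
    sum((L - S)^2) + lam * sum(|L[i+1] - L[i]|) by subgradient descent."""
    L = list(s)
    for _ in range(iters):
        g = [2 * (L[i] - s[i]) for i in range(len(L))]  # fidelity gradient
        for i in range(len(L) - 1):                      # TV subgradient
            d = sign(L[i + 1] - L[i])
            g[i] -= lam * d
            g[i + 1] += lam * d
        L = [L[i] - step * g[i] for i in range(len(L))]
    return L

def total_variation(x):
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))

noisy = [1.0, 1.4, 0.9, 1.3, 3.0, 3.5, 2.8, 3.2]
smooth = tv_smooth(noisy)
print(total_variation(smooth) < total_variation(noisy))  # True
```

The TV penalty flattens small oscillations while preserving the large jump between the two plateaus, which is why the paper pairs TV (piecewise continuity) for reflection with a smoothness term for illumination.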

215 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: The insights gained from analysis enable building a novel distance function between images assessing whether they are from the same basic-level category, which goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet.
Abstract: Many computer vision approaches take for granted positive answers to questions such as “Are semantic categories visually separable?” and “Is visual similarity correlated to semantic similarity?”. In this paper, we study experimentally whether these assumptions hold and show parallels to questions investigated in cognitive science about the human visual system. The insights gained from our analysis enable building a novel distance function between images assessing whether they are from the same basic-level category. This function goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet. We demonstrate experimentally that it outperforms purely visual distances.

168 citations


Journal ArticleDOI
Aljoscha Smolic1
TL;DR: The conclusion is that the necessary technology including standard media formats for 3D video and free viewpoint video is available or will be available in the future, and that there is a clear demand from industry and user for such advanced types of visual media.

146 citations


Journal ArticleDOI
TL;DR: This review paper on visual mismatch negativity (MMN), an event-related brain potential component, provides arguments in favor of its theoretical importance in visual cognitive sciences.
Abstract: This review paper on visual mismatch negativity (MMN), an event-related brain potential component, provides arguments in favor of its theoretical importance in visual cognitive sciences. We propose that (a) previous visual MMN findings can be regarded as ample evidence for the existence of unintentional prediction about the next state of a visual object in the immediate future on the basis of its temporal context ('unintentional temporal-context-based prediction in vision'); (b) such predictive processes may be qualitatively similar to those revealed by behavioral phenomena, such as representational momentum, flash-lag effect, and perceptual sequence learning; (c) such predictive processes may provide advantages for our adaptation to the visual environment at the computational, neural, and behavioral levels, and (d) in concert with such behavioral phenomena, visual MMN could be a unique and powerful tool for tapping into the predictive power of the human visual system.

130 citations


Journal ArticleDOI
TL;DR: This work states that a subset of the visual information is selected by shifting the focus of attention across the visual scene to the most relevant objects, and perceptual quality models inherently assume that all objects draw the attention of the viewer to the same degree.
Abstract: Perceptual quality metrics are widely deployed in image and video processing systems. These metrics aim to emulate the integral mechanisms of the human visual system (HVS) to correlate well with visual perception of quality. One integral property of the HVS is, however, often neglected: visual attention (VA) [1]. The essential mechanisms associated with VA consist mainly of higher cognitive processing, deployed to reduce the complexity of scene analysis. For this purpose, a subset of the visual information is selected by shifting the focus of attention across the visual scene to the most relevant objects. By neglecting VA, perceptual quality models inherently assume that all objects draw the attention of the viewer to the same degree.

128 citations


Journal ArticleDOI
TL;DR: A principled general framework for analyzing the psychophysics and neurophysiology of defocus estimation in species across the animal kingdom and for developing optimal image-based defocus and depth estimation algorithms for computational vision systems is provided.
Abstract: Defocus blur is nearly always present in natural images: Objects at only one distance can be perfectly focused. Images of objects at other distances are blurred by an amount depending on pupil diameter and lens properties. Despite the fact that defocus is of great behavioral, perceptual, and biological importance, it is unknown how biological systems estimate defocus. Given a set of natural scenes and the properties of the vision system, we show from first principles how to optimally estimate defocus at each location in any individual image. We show for the human visual system that high-precision, unbiased estimates are obtainable under natural viewing conditions for patches with detectable contrast. The high quality of the estimates is surprising given the heterogeneity of natural images. Additionally, we quantify the degree to which the sign ambiguity often attributed to defocus is resolved by monochromatic aberrations (other than defocus) and chromatic aberrations; chromatic aberrations fully resolve the sign ambiguity. Finally, we show that simple spatial and spatio-chromatic receptive fields extract the information optimally. The approach can be tailored to any environment–vision system pairing: natural or man-made, animal or machine. Thus, it provides a principled general framework for analyzing the psychophysics and neurophysiology of defocus estimation in species across the animal kingdom and for developing optimal image-based defocus and depth estimation algorithms for computational vision systems.

Proceedings ArticleDOI
25 Jul 2011
TL;DR: A perceptual model of disparity for computer graphics that is used to define a metric to compare a stereo image to an alternative stereo image and to estimate the magnitude of the perceived disparity change is introduced.
Abstract: Binocular disparity is an important cue for the human visual system to recognize spatial layout, both in reality and simulated virtual worlds. This paper introduces a perceptual model of disparity for computer graphics that is used to define a metric to compare a stereo image to an alternative stereo image and to estimate the magnitude of the perceived disparity change. Our model can be used to assess the effect of disparity to control the level of undesirable distortions or enhancements (introduced on purpose). A number of psycho-visual experiments are conducted to quantify the mutual effect of disparity magnitude and frequency to derive the model. Besides difference prediction, other applications include compression and re-targeting. We also present novel applications in the form of hybrid stereo images and backward-compatible stereo. The latter minimizes disparity in order to convey a stereo impression if special equipment is used but produces images that appear almost ordinary to the naked eye. The validity of our model and difference metric is again confirmed in a study.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A modified PSNR metric which is based on HVS characteristics and correlates well with the perceived image quality is presented, which takes into account the error sensitivity, structural distortion and edge distortion in the image.
Abstract: Objective assessment of image quality is of keen importance in numerous image processing applications. Various objective quality assessment indexes have been developed for this purpose, of which peak signal-to-noise ratio (PSNR) is one of the simplest and most commonly used. However, it sometimes fails to give results similar to those perceived by the Human Visual System (HVS). This paper presents a modified PSNR metric which is based on HVS characteristics and correlates well with the perceived image quality. It takes into account the error sensitivity, structural distortion, and edge distortion in the image. The proposed metric uses the RGB model for color images and empirically combines the effects of the above distortion types on each color plane. Simulation results illustrate the precision and efficiency of the proposed metric in assessing the quality of color images for different types of degradation and show better correlation with the known characteristics of the HVS in comparison to the conventional PSNR metric.
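Conventional PSNR is a fixed function of the mean squared error; HVS-based modifications like the one above reweight the error before taking the logarithm. The sketch below shows plain PSNR next to a weighted variant; the weight map is our own hypothetical example (emphasising an "edge" region), not the paper's actual sensitivity model:

```python
import math

def mse(ref, test, w=None):
    """Plain or weighted mean squared error over flat pixel lists."""
    w = w or [1] * len(ref)
    return sum(wi * (r - t) ** 2 for wi, r, t in zip(w, ref, test)) / sum(w)

def psnr(ref, test, w=None, peak=255):
    """PSNR in dB; an optional weight map emphasises perceptually
    important pixels (e.g. edges), mimicking HVS-based weighting."""
    e = mse(ref, test, w)
    return float('inf') if e == 0 else 10 * math.log10(peak ** 2 / e)

ref  = [120, 121, 119, 200, 201, 199]
dist = [121, 120, 120, 198, 203, 200]
print(round(psnr(ref, dist), 2))                        # 45.12 (plain PSNR)
# hypothetical weight map: last three pixels lie on an edge
print(round(psnr(ref, dist, w=[1, 1, 1, 3, 3, 3]), 2))  # 44.15
```

Up-weighting errors in the edge region lowers the score for the same pixel differences, reflecting the HVS's greater sensitivity to distortion there.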

Journal ArticleDOI
TL;DR: This paper reviews current research on stereo human vision and how it informs us about how best to create and present stereo 3D imagery and how temporal presentation protocols affect flicker, motion artifacts, and depth distortion.
Abstract: Stereoscopic displays have become important for many applications, including operation of remote devices, medical imaging, surgery, scientific visualization, and computer-assisted design. But the most significant and exciting development is the incorporation of stereo technology into entertainment: specifically, cinema, television, and video games. In these applications for stereo, three-dimensional (3D) imagery should create a faithful impression of the 3D structure of the scene being portrayed. In addition, the viewer should be comfortable and not leave the experience with eye fatigue or a headache. Finally, the presentation of the stereo images should not create temporal artifacts like flicker or motion judder. This paper reviews current research on stereo human vision and how it informs us about how best to create and present stereo 3D imagery. The paper is divided into four parts: (1) getting the geometry right, (2) depth cue interactions in stereo 3D media, (3) focusing and fixating on stereo images, and (4) how temporal presentation protocols affect flicker, motion artifacts, and depth distortion.

Journal ArticleDOI
TL;DR: This paper uses the structural similarity index as the quality metric for rate-distortion modeling and develops an optimum bit allocation and rate control scheme for video coding that achieves up to 25% bit-rate reduction over the JM reference software of H.264.
Abstract: The quality of video is ultimately judged by the human eye; however, mean squared error and the like, which have been used as quality metrics, are poorly correlated with human perception. Although the characteristics of the human visual system have been incorporated into perceptual-based rate control, most existing schemes do not take rate-distortion optimization into consideration. In this paper, we use the structural similarity index as the quality metric for rate-distortion modeling and develop an optimum bit allocation and rate control scheme for video coding. This scheme achieves up to 25% bit-rate reduction over the JM reference software of H.264. Under the rate-distortion optimization framework, the proposed scheme can be easily integrated with the perceptual-based mode decision scheme. The overall bit-rate reduction may reach as high as 32% over the JM reference software.

Journal ArticleDOI
TL;DR: This work introduces a basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape, and proposes a generative model of larger images using a field of such RBMs.
Abstract: Computer vision has grown tremendously in the past two decades. Despite all efforts, existing attempts at matching parts of the human visual system's extraordinary ability to understand visual scenes lack either scope or power. By combining the advantages of general low-level generative models and powerful layer-based and hierarchical models, this work aims at being a first step toward richer, more flexible models of images. After comparing various types of restricted Boltzmann machines (RBMs) able to model continuous-valued data, we introduce our basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape. We then propose a generative model of larger images using a field of such RBMs. Finally, we discuss how masked RBMs could be stacked to form a deep model able to generate more complicated structures and suitable for various tasks such as segmentation or object recognition.

Journal ArticleDOI
TL;DR: In this letter, an improved histogram modification based reversible data hiding technique is proposed in which, unlike conventional reversible techniques, the data embedding level is adaptively adjusted for each pixel with a consideration of the human visual system (HVS) characteristics.
Abstract: In this letter, we propose an improved histogram modification based reversible data hiding technique. In the proposed algorithm, unlike the conventional reversible techniques, a data embedding level is adaptively adjusted for each pixel with a consideration of the human visual system (HVS) characteristics. To this end, an edge and the just noticeable difference (JND) values are estimated for every pixel, and the estimated values are used to determine the embedding level. This pixel level adjustment can effectively reduce the distortion caused by data embedding. The experimental results and performance comparison with other reversible data hiding algorithms are presented to demonstrate the validity of the proposed algorithm.
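The paper adapts the embedding level per pixel using edge and JND estimates; the baseline it improves on is classic single-level histogram shifting, sketched below (our simplification: one fixed peak/zero bin pair known to both embedder and extractor, and a payload exactly matching the peak-bin capacity). The scheme is fully reversible: the original pixels are recovered bit-exactly.

```python
def hs_embed(pixels, bits, peak, zero):
    """Classic histogram-shift embedding: values in (peak, zero) shift up
    by 1 to free the bin peak+1, then each peak-valued pixel carries one
    payload bit (0 -> stay at peak, 1 -> move to peak+1)."""
    out, it = [], iter(bits)
    for v in pixels:
        if peak < v < zero:
            out.append(v + 1)
        elif v == peak:
            out.append(v + next(it, 0))
        else:
            out.append(v)
    return out

def hs_extract(stego, peak, zero):
    """Read the payload back and undo the shift, restoring the cover."""
    bits, rec = [], []
    for v in stego:
        if v == peak:
            bits.append(0); rec.append(peak)
        elif v == peak + 1:
            bits.append(1); rec.append(peak)
        elif peak + 1 < v <= zero:
            rec.append(v - 1)
        else:
            rec.append(v)
    return bits, rec

pixels = [5, 5, 6, 7, 5, 8]          # peak bin = 5, empty "zero" bin = 9
stego = hs_embed(pixels, [1, 0, 1], peak=5, zero=9)
bits, recovered = hs_extract(stego, peak=5, zero=9)
print(bits)                 # [1, 0, 1]
print(recovered == pixels)  # True: embedding is fully reversible
```

Every modification is at most one grey level, which is why the letter's refinement, choosing a larger embedding level only where the HVS tolerates it, can raise capacity without visible distortion.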

Proceedings ArticleDOI
22 May 2011
TL;DR: A reduced reference (RR) perceptual quality metric for color stereoscopic images is presented and experiments indicate that the objective scores obtained by the proposed metric agree well with the subjective assessment scores.
Abstract: In this work, a reduced reference (RR) perceptual quality metric for color stereoscopic images is presented. Given a reference stereo pair of images and their “distorted” version, we first compute the disparity map of both the reference and the distorted stereoscopic images. To this end, we define a method for color image disparity estimation based on the structure tensors properties and eigenvalues/eigenvectors analysis. Then, we compute the cyclopean images of both the reference and the distorted pairs. Thereafter, we apply a multispectral wavelet decomposition to the two cyclopean color images in order to describe the different channels in the human visual system (HVS). Then, contrast sensitivity function (CSF) filtering is performed to obtain the same visual sensitivity information within the original and the distorted cyclopean images. Thereafter, based on the properties of the human visual system (HVS), rational sensitivity thresholding is performed to obtain the sensitivity coefficients of the cyclopean images. Finally, RR stereo color image quality assessment (SCIQA) is performed by comparing the sensitivity coefficients of the cyclopean images and studying the coherence between the disparity maps of the reference and the distorted pairs. Experiments performed on color stereoscopic images indicate that the objective scores obtained by the proposed metric agree well with the subjective assessment scores.

Journal ArticleDOI
TL;DR: A novel paradigm for measuring the face inversion effect is developed, a standard marker of holistic face processing, that measures the minimum exposure time required to discriminate between two stimuli to demonstrate that holistic processing operates on whole upright faces, regardless of whether subjects are required to extract first- or second-level information.

Journal ArticleDOI
TL;DR: Context-preserving visual links were perceived as visually more attractive than traditional visual links that do not account for the context information and can be employed in a large number of visual data analysis scenarios in which the underlying content cannot or should not be altered.
Abstract: Evaluating, comparing, and interpreting related pieces of information are tasks that are commonly performed during visual data analysis and in many kinds of information-intensive work. Synchronized visual highlighting of related elements is a well-known technique used to assist this task. An alternative approach, which is more invasive but also more expressive, is visual linking, in which line connections are rendered between related elements. In this work, we present context-preserving visual links as a new method for generating visual links. The method specifically aims to fulfill the following two goals: first, visual links should minimize the occlusion of important information; second, links should visually stand out from surrounding information by minimizing visual interference. We employ an image-based analysis of visual saliency to determine the important regions in the original representation. A consequence of the image-based approach is that our technique is application-independent and can be employed in a large number of visual data analysis scenarios in which the underlying content cannot or should not be altered. We conducted a controlled experiment that indicates that users can find linked elements in complex visualizations more quickly and with greater subjective satisfaction than in complex visualizations in which plain highlighting is used. Context-preserving visual links were perceived as visually more attractive than traditional visual links that do not account for the context information.

Journal ArticleDOI
TL;DR: It is shown that below this just-noticeable asymmetry threshold, where subtle artifacts start to appear, symmetric coding performs better than asymmetric coding in terms of perceived 3D video quality, and that the choice between asymmetric vs. symmetrical coding depends on PSNR; hence, the available total bitrate.
Abstract: It is well known that the human visual system can perceive high frequencies in 3D stereo video, even if that information is present in only one of the views. Therefore, the best perceived 3D stereo video quality may be achieved by asymmetric coding where the reference and auxiliary (right and left) views are coded at unequal PSNR. However, the questions of what is the best level of asymmetry in order to maximize the perceived quality and whether asymmetry should be achieved by spatial resolution reduction or PSNR (quality) reduction have been open issues. We conducted extensive subjective tests, which indicate that if the reference view is encoded at sufficiently high quality and the auxiliary view is encoded at a lower quality but above a certain PSNR threshold, then the degradation in 3D video quality is unnoticeable. Since asymmetric coding by PSNR reduction gives finer control over achievable PSNR values over spatial resolution reduction, asymmetry by PSNR reduction allows us to encode at a point more close to this just-noticeable asymmetry PSNR threshold; hence will be preferred over the spatial resolution reduction method. Subjective tests also indicate that below this just-noticeable asymmetry threshold, where subtle artifacts start to appear, symmetric coding performs better than asymmetric coding in terms of perceived 3D video quality. Therefore, we show that the choice between asymmetric vs. symmetric coding depends on PSNR; hence, the available total bitrate. This paper also proposes a novel asymmetric scalable stereo video coding framework to enable adaptive stereoscopic video streaming taking full advantage of these observations and subjective test results.

Proceedings ArticleDOI
28 Nov 2011
TL;DR: This paper proposes a novel deep learning model called bilinear deep belief network (BDBN), which aims to provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception, and develops BDBN under a semi-supervised learning framework.
Abstract: Image classification is a well-known classical problem in multimedia content analysis. This paper proposes a novel deep learning model called bilinear deep belief network (BDBN) for image classification. Unlike previous image classification models, BDBN aims to provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception. Therefore, the multi-layer structure of the cortex and the propagation of information in the visual areas of the brain are realized faithfully. Unlike most existing deep models, BDBN utilizes a bilinear discriminant strategy to simulate the "initial guess" in human object recognition, and at the same time to avoid falling into a bad local optimum. To preserve the natural tensor structure of the image data, a novel deep architecture with greedy layer-wise reconstruction and global fine-tuning is proposed. To adapt real-world image classification tasks, we develop BDBN under a semi-supervised learning framework, which makes the deep model work well when labeled images are insufficient. Comparative experiments on three standard datasets show that the proposed algorithm outperforms both representative classification models and existing deep learning techniques. More interestingly, our demonstrations show that the proposed BDBN works consistently with the visual perception of humans.



Journal ArticleDOI
TL;DR: This work proposes a public digital watermarking technique for video copyright protection in the discrete wavelet transform domain, which achieves a good perceptual quality and high resistance to a large spectrum of attacks.
Abstract: The development of the information technology and computer networks facilitates easy duplication, manipulation, and distribution of digital data. Digital watermarking is one of the proposed solutions for effectively safeguarding the rightful ownership of digital images and video. We propose a public digital watermarking technique for video copyright protection in the discrete wavelet transform domain. The scheme uses binary images as watermarks. These are embedded in the detail wavelet coefficients of the middle wavelet subbands. The method is a combination of spread spectrum and quantization-based watermarking. Every bit of the watermark is spread over a number of wavelet coefficients with the use of a secret key by means of quantization. The selected wavelet detail coefficients from different subbands are quantized using an optimal quantization model, based on the characteristics of the human visual system (HVS). Our HVS-based scheme is compared to a non-HVS approach. The resilience of the watermarking algorithm is tested against a series of different spatial, temporal, and compression attacks. To improve the robustness of the algorithm, we use error correction codes and embed the watermark with spatial and temporal redundancy. The proposed method achieves a good perceptual quality and high resistance to a large spectrum of attacks.
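The scheme above embeds each watermark bit by quantizing selected wavelet detail coefficients with an HVS-optimized quantization model. As a generic illustration of quantization-based embedding (odd/even quantization index modulation, with a fixed step `delta` standing in for the paper's HVS-derived quantizer), consider:

```python
def qim_embed(coef, bit, delta=8.0):
    """Embed one bit by snapping the coefficient to the quantizer level
    whose index parity encodes the bit (odd/even QIM)."""
    q = round(coef / delta)
    if q % 2 != bit:
        q += 1
    return q * delta

def qim_extract(coef, delta=8.0):
    """Recover the bit from the parity of the nearest quantizer index."""
    return round(coef / delta) % 2

coefs = [13.2, -4.7, 22.1, 3.3]
bits = [1, 0, 1, 1]
marked = [qim_embed(c, b) for c, b in zip(coefs, bits)]
print([qim_extract(c) for c in marked])        # [1, 0, 1, 1]
# extraction also survives perturbations smaller than delta/2:
print([qim_extract(c + 2.0) for c in marked])  # still [1, 0, 1, 1]
```

Each coefficient moves by at most 1.5·delta, and the bit survives any attack that perturbs the coefficient by less than delta/2, which is the robustness/imperceptibility trade-off the HVS model tunes per subband.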

Journal ArticleDOI
TL;DR: In this paper, the authors extended the real fractional differential to the quaternion body, put forward a new concept, the quaternion fractional differential (QFD), and applied it to edge detection of colour images.
Abstract: Following the development of the real fractional differential and its applications in modern signal processing, the authors extend it to the quaternion body and put forward a new concept, the quaternion fractional differential (QFD), and apply it to edge detection of colour images. This method is called edge detection based on QFD. Simulation experiments indicate that this method has special advantages. Furthermore, the authors give an indicator to evaluate the effectiveness of different edge filters. Comparing QFD with the Sobel operator and with mixed edges obtained by applying the real fractional differential to each channel of the colour image, they discover that QFD has fewer false negatives in the textured regions and is also better at detecting edges that are partially defined by texture. This means that much better results can be obtained in the regions of interest using QFD, which is also more consistent with the characteristics of the human visual system.

Journal Article
TL;DR: An imperceptible and robust audio watermarking algorithm based on the discrete wavelet transform that has been evaluated extensively, and simulation results are presented to demonstrate the imperceptibility and robustness of the proposed algorithm.
Abstract: Many effective watermarking algorithms have been proposed and implemented for digital images and digital video; however, few algorithms have been proposed for audio watermarking. This is due to the fact that the human auditory system is far more complex and sensitive than the human visual system. In this paper, we describe an imperceptible and robust audio watermarking algorithm based on the discrete wavelet transform. Performance of the algorithm has been evaluated extensively, and simulation results are presented to demonstrate the imperceptibility and robustness of the proposed algorithm.

Journal ArticleDOI
TL;DR: A general probabilistic (k,n)-VSS scheme for grey-scale images and another scheme for color images, whose contrast is equivalent to that of existing deterministic VSS schemes.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: In this paper, an optimized bit-depth transformation and HVS model based wavelet transform denoising were proposed to compress the video to a manageable bitrate without compromising perceptual quality.
Abstract: High Dynamic Range (HDR) technology is able to offer high levels of immersion with a dynamic range comparable to the Human Visual System (HVS). A primary drawback of HDR is that its memory and bandwidth requirements are significantly higher than for conventional video. The challenge is thus to develop means for efficiently compressing the video to a manageable bitrate without compromising perceptual quality. In this paper, we propose an HDR compression method based on an optimized bit-depth transformation, and HVS model based wavelet transform denoising. Experimental results indicate that the proposed method outperforms previous approaches and operates in accordance with characteristics of the HVS, tested objectively using a Visible Difference Predictor (VDP).
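A bit-depth transformation maps the wide luminance range of HDR content onto a small integer code range before conventional coding. As a simple illustration (a plain log-domain mapping with illustrative constants, not the paper's optimized transform), the sketch below encodes luminance to 10-bit codes and checks the round-trip error:

```python
import math

def hdr_encode(lum, lmin=0.005, lmax=1e4, bits=10):
    """Map scene luminance (cd/m^2) to an integer code via a log-domain
    bit-depth transform; the range [lmin, lmax] is illustrative."""
    t = (math.log(lum) - math.log(lmin)) / (math.log(lmax) - math.log(lmin))
    return round(t * (2 ** bits - 1))

def hdr_decode(code, lmin=0.005, lmax=1e4, bits=10):
    """Invert the transform back to linear luminance."""
    t = code / (2 ** bits - 1)
    return math.exp(math.log(lmin) + t * (math.log(lmax) - math.log(lmin)))

lum = 250.0
code = hdr_encode(lum)
rec = hdr_decode(code)
print(code)                          # 10-bit code for 250 cd/m^2
print(abs(rec - lum) / lum < 0.01)   # relative error under 1% at 10 bits
```

Because quantization in the log domain bounds the *relative* error uniformly across the range, a 10-bit code can span six orders of magnitude of luminance, roughly matching the HVS's approximately logarithmic brightness sensitivity.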

Book ChapterDOI
01 Jan 2011
TL;DR: A significant application of video analysis and understanding is intelligent surveillance, which aims to interpret automatically human activity and detect unusual events that could pose a threat to public security and safety.
Abstract: Human eyes are highly efficient devices for scanning through a large quantity of low-level visual sensory data and delivering selective information to one’s brain for high-level semantic interpretation and gaining situational awareness. Over the last few decades, the computer vision community has endeavoured to bring about similar perceptual capabilities to artificial visual sensors. Substantial efforts have been made towards understanding static images of individual objects and the corresponding processes in the human visual system. This endeavour is intensified further by the need for understanding a massive quantity of video data, with the aim to comprehend multiple entities not only within a single image but also over time across multiple video frames for understanding their spatio-temporal relations. A significant application of video analysis and understanding is intelligent surveillance, which aims to interpret automatically human activity and detect unusual events that could pose a threat to public security and safety.