
Showing papers on "Human visual system model published in 1998"


Journal ArticleDOI
TL;DR: A new watermarking algorithm is presented: the method, which operates in the frequency domain, embeds a pseudo-random sequence of real numbers in a selected set of DCT coefficients, which is adapted to the image by exploiting the masking characteristics of the human visual system, thus ensuring the watermark invisibility.
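The embedding idea can be sketched in a few lines of numpy (a minimal illustration of spread-spectrum embedding in DCT coefficients, not the authors' exact algorithm; the coefficient-selection rule, the global strength `alpha`, and the function names are ours — the paper adapts the strength locally via HVS masking):

```python
import numpy as np

def dct2_matrix(n):
    # Orthonormal DCT-II basis matrix (C @ C.T == I)
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    C[0] /= np.sqrt(2)
    return C

def embed_watermark(img, key, n_coeffs=1000, alpha=0.1):
    """Embed a pseudo-random real sequence into the largest-magnitude
    AC DCT coefficients of a square image (sketch only)."""
    n = img.shape[0]
    C = dct2_matrix(n)
    D = C @ img @ C.T                       # 2-D DCT of the image
    flat = np.abs(D).ravel()
    flat[0] = 0                             # never touch the DC term
    idx = np.argsort(flat)[-n_coeffs:]      # selected coefficient set
    w = np.random.default_rng(key).standard_normal(n_coeffs)
    Dw = D.copy().ravel()
    Dw[idx] += alpha * np.abs(Dw[idx]) * w  # multiplicative embedding
    return C.T @ Dw.reshape(D.shape) @ C    # inverse DCT
```

Detection would correlate the same key-generated sequence against the selected coefficients of a test image.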

743 citations


Journal ArticleDOI
TL;DR: A copyright protection method based on hiding an ‘invisible’ signal, known as a digital watermark, in the image is presented, together with a variation that generates image-dependent watermarks and a method to handle geometrical distortions.

542 citations


Proceedings ArticleDOI
24 Jul 1998
TL;DR: The model is based on a multiscale representation of pattern, luminance, and color processing in the human visual system and can be usefully applied to image quality metrics, image compression methods, and perceptually-based image synthesis algorithms.
Abstract: In this paper we develop a computational model of adaptation and spatial vision for realistic tone reproduction. The model is based on a multiscale representation of pattern, luminance, and color processing in the human visual system. We incorporate the model into a tone reproduction operator that maps the vast ranges of radiances found in real and synthetic scenes into the small fixed ranges available on conventional display devices such as CRT’s and printers. The model allows the operator to address the two major problems in realistic tone reproduction: wide absolute range and high dynamic range scenes can be displayed; and the displayed images match our perceptions of the scenes at both threshold and suprathreshold levels to the degree possible given a particular display device. Although in this paper we apply our visual model to the tone reproduction problem, the model is general and can be usefully applied to image quality metrics, image compression methods, and perceptually-based image synthesis algorithms. CR Categories: I.3.0 [Computer Graphics]: General;
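Although the paper's operator is multiscale, the basic range-compression step it performs can be illustrated with a simple global operator (a log-average-based sketch assumed here purely for illustration; it is not the paper's adaptation model):

```python
import numpy as np

def tonemap_global(luminance, key=0.18, eps=1e-6):
    """Minimal global tone-mapping sketch: scale by the log-average
    luminance so the scene's 'key' maps to mid-grey, then compress
    with L/(1+L) into the small fixed display range [0, 1)."""
    L = np.asarray(luminance, dtype=float)
    log_avg = np.exp(np.mean(np.log(L + eps)))  # log-average luminance
    Ls = key * L / log_avg                      # scale to mid-grey
    return Ls / (1.0 + Ls)                      # compress to [0, 1)
```

A multiscale model like the paper's would instead adapt this compression locally, per pattern/luminance/color channel.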

458 citations


Proceedings ArticleDOI
17 Jul 1998
TL;DR: This work has developed a foveated multiresolution pyramid video coder/decoder which runs in real-time on a general purpose computer and includes zero-tree coding.
Abstract: Foveated imaging exploits the fact that the spatial resolution of the human visual system decreases dramatically away from the point of gaze. Because of this fact, large bandwidth savings are obtained by matching the resolution of the transmitted image to the fall-off in resolution of the human visual system. We have developed a foveated multiresolution pyramid (FMP) video coder/decoder which runs in real-time on a general purpose computer (i.e., a Pentium with the Windows 95/NT OS). The current system uses a foveated multiresolution pyramid to code each image into 5 or 6 regions of varying resolution. The user-controlled foveation point is obtained from a pointing device (e.g., a mouse or an eye tracker). Spatial edge artifacts between the regions created by the foveation are eliminated by raised-cosine blending across levels of the pyramid, and by "foveation point interpolation" within levels of the pyramid. Each level of the pyramid is then motion compensated, multiresolution pyramid coded, and thresholded/quantized based upon human contrast sensitivity as a function of spatial frequency and retinal eccentricity. The final lossless coding includes zero-tree coding. Optimal use of foveated imaging requires eye tracking; however, there are many useful applications which do not require eye tracking. Key words: foveation, foveated imaging, multiresolution pyramid, video, motion compensation, zero-tree coding, human vision, eye tracking, video compression
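The resolution fall-off that drives the bandwidth savings can be sketched as a relative-resolution map over the image (the e2/(e2 + e) falloff form and the viewing-geometry constants below are common illustrative assumptions, not values taken from the paper):

```python
import numpy as np

def foveation_resolution_map(h, w, fx, fy, e2=2.3, px_per_deg=32.0):
    """Relative spatial resolution (1.0 at the foveation point) as a
    function of retinal eccentricity, using the e2/(e2 + e) falloff.
    'e2' is the half-resolution eccentricity in degrees; 'px_per_deg'
    encodes an assumed viewing distance."""
    y, x = np.mgrid[0:h, 0:w]
    ecc_deg = np.hypot(x - fx, y - fy) / px_per_deg  # eccentricity in degrees
    return e2 / (e2 + ecc_deg)
```

A foveated pyramid coder would keep, at each pixel, only the pyramid levels whose resolution does not exceed this map's value.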

400 citations


Journal ArticleDOI
TL;DR: The robustness of a watermarking procedure that embeds copyright protection into digital video is demonstrated against several video degradations and distortions.
Abstract: We present a watermarking procedure to embed copyright protection into digital video. Our watermarking procedure is scene-based and video dependent. It directly exploits spatial masking, frequency masking, and temporal properties to embed an invisible and robust watermark. The watermark consists of static and dynamic temporal components that are generated from a temporal wavelet transform of the video scenes. The resulting wavelet coefficient frames are modified by a perceptually shaped pseudorandom sequence representing the author. The noise-like watermark is statistically undetectable to thwart unauthorized removal. Furthermore, the author representation resolves the deadlock problem. The multiresolution watermark may be detected on single frames without knowledge of the location of the frames in the video scene. We demonstrate the robustness of the watermarking procedure to several video degradations and distortions.

398 citations


Journal ArticleDOI
Chuang Gu1, Ming-Chieh Lee1
TL;DR: A novel semantic video object extraction system using mathematical morphology and a perspective motion model to solve the semantic video object extraction problem in two separate steps: supervised I-frame segmentation, and unsupervised P-frame tracking.
Abstract: This paper introduces a novel semantic video object extraction system using mathematical morphology and a perspective motion model. Inspired by the results from the study of the human visual system, we intend to solve the semantic video object extraction problem in two separate steps: supervised I-frame segmentation, and unsupervised P-frame tracking. First, the precise semantic video object boundary can be found using a combination of human assistance and a morphological segmentation tool. Second, the semantic video objects in the remaining frames are obtained using global perspective motion estimation and compensation of the previous semantic video object plus boundary refinement as used for I frames.

272 citations


Journal ArticleDOI
TL;DR: In this paper, a digital set of 29 hyperspectral images of natural scenes was acquired and its spatial frequency content analyzed in terms of chrominance and luminance defined according to existing models of the human cone responses and visual signal processing.
Abstract: The spatial filtering applied by the human visual system appears to be low pass for chromatic stimuli and band pass for luminance stimuli. Here we explore whether this observed difference in contrast sensitivity reflects a real difference in the components of chrominance and luminance in natural scenes. For this purpose a digital set of 29 hyperspectral images of natural scenes was acquired and its spatial frequency content analyzed in terms of chrominance and luminance defined according to existing models of the human cone responses and visual signal processing. The statistical 1/f amplitude spatial-frequency distribution is confirmed for a variety of chromatic conditions across the visible spectrum. Our analysis suggests that natural scenes are relatively rich in high-spatial-frequency chrominance information that does not appear to be transmitted by the human visual system. This result is unlikely to have arisen from errors in the original measurements. Several reasons may combine to explain a failure to transmit high-spatial-frequency chrominance: (a) its minor importance for primate visual tasks, (b) its removal by filtering applied to compensate for chromatic aberration of the eye's optics, and (c) a biological bottleneck blocking its transmission. In addition, we graphically compare the ratios of luminance to chrominance measured by our hyperspectral camera and those measured psychophysically over an equivalent spatial-frequency range.
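The 1/f amplitude law the paper confirms can be checked numerically by radially averaging the Fourier amplitude spectrum and fitting its log-log slope; a self-contained sketch (function names ours) on synthetic 1/f noise:

```python
import numpy as np

def radial_amplitude_slope(img):
    """Fit the log-log slope of the radially averaged Fourier amplitude
    spectrum; natural scenes typically give a slope near -1 (the 1/f law)."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    r = np.hypot(x - w // 2, y - h // 2).astype(int)
    amp = np.bincount(r.ravel(), F.ravel()) / np.bincount(r.ravel())
    f = np.arange(1, min(h, w) // 2)  # skip DC, stay inside Nyquist
    return np.polyfit(np.log(f), np.log(amp[f]), 1)[0]

def synth_one_over_f(n, seed=0):
    """Synthesize noise with a 1/f amplitude spectrum for testing."""
    rng = np.random.default_rng(seed)
    f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    f[0, 0] = 1.0  # avoid division by zero at DC
    spec = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / f
    return np.real(np.fft.ifft2(spec))
```

Applied to a hyperspectral image set, the same slope fit could be run per chromatic channel, as in the paper's analysis.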

181 citations


Patent
25 May 1998
TL;DR: A video indexing system analyzes the contents of source video and develops a visual table of contents from selected images; the visual index is stored for retrieval by a user, who may then display it and advance the source video to a selected frame.
Abstract: A video indexing system analyzes contents of source video and develops a visual table of contents using selected images. The source video is analyzed to detect video cuts from one scene to another, and static scenes. Keyframes are selected for each significant scene. A keyframe filtering process filters out less desired frames including, for example, unicolor frames, or those frames having a same object as a primary focus or one primary focuses. A visual index is created from those frames remaining after the keyframe filtering and stored for retrieval. The visual index may be retrieved by a user who may then display the visual index on a display. The user may select one of the frames displayed in the visual index and the source video may be manually (by the user) or automatically advanced to that frame of the source video. Additionally, a user may print the visual index.
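The cut-detection step can be sketched by thresholding histogram differences between consecutive frames (a deliberately minimal stand-in for the patent's analysis; the metric and threshold are illustrative):

```python
import numpy as np

def detect_cuts(frames, bins=16, threshold=0.5):
    """Flag a cut between consecutive frames when the L1 distance between
    their normalized grey-level histograms exceeds a threshold."""
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] / f.size
             for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)  # a cut occurs going into frame i
    return cuts
```

Keyframe selection and the keyframe filtering described above would then operate within each detected scene.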

138 citations


Journal ArticleDOI
TL;DR: Evidence for a dissociation between perception and action in neurologically intact individuals is reviewed: psychophysical judgements about the dimensions of objects in the far peripheral field bear little relation to the calibration of grasping movements directed at those objects.

137 citations


Proceedings ArticleDOI
04 Oct 1998
TL;DR: A perceptual model based on the texture-masking and luminance-masking properties of the human visual system is presented, and an adaptive JPEG coder built on it provides savings in bit-rate over baseline JPEG with no overall loss in perceptual quality according to a subjective test.
Abstract: A perceptual model based on the texture masking and luminance masking properties of the human visual system is presented in this paper. The model computes a local multiplier map for scaling of the JPEG quantization matrix. The result is that fewer bits are used to represent the perceptually less important areas of the image. The texture masking model is based on a block classification algorithm to differentiate between the plain, edge, and texture blocks. An adaptive luminance masking scheme is used to adjust the luminance masking strategy depending on the image's mean luminance value. An adaptive JPEG coder based on the perceptual model is implemented. Experimental results show that the adaptive coder provides savings in bit-rate over baseline JPEG, with no overall loss in perceptual quality according to a subjective test.
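The block-classification idea can be sketched as a per-block activity test that emits a multiplier map for the quantization matrix (the variance thresholds and multiplier values below are illustrative placeholders, not the paper's calibrated model):

```python
import numpy as np

def block_multiplier_map(img, block=8, t_plain=25.0, t_texture=400.0):
    """Classify each 8x8 block by variance into plain / edge / texture and
    return a per-block quantizer multiplier: textured blocks mask distortion
    and tolerate coarser quantization; plain blocks do not."""
    h, w = img.shape
    mult = np.ones((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            v = img[by*block:(by+1)*block, bx*block:(bx+1)*block].var()
            if v < t_plain:
                mult[by, bx] = 1.0   # plain: full fidelity
            elif v < t_texture:
                mult[by, bx] = 1.25  # edge-like: mild coarsening only
            else:
                mult[by, bx] = 2.0   # texture: strong masking
    return mult
```

A JPEG-style coder would scale its quantization matrix by this map block by block, spending fewer bits where masking hides the error.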

109 citations


Journal ArticleDOI
TL;DR: This model has been able to mimic quite accurately the temporally varying subjective picture quality of video sequences as recorded by the ITU-R SSCQE method.

Journal ArticleDOI
TL;DR: The results of psychophysical experiments suggest that the visual system relies on geometric properties of bounding contours such as closure and not on the texture of the two-dimensional regions they partition.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: Some solutions to the problem of building some perceptual masks for better hiding watermarks embedded in the full-frame DCT domain are presented and the results support the validity of the approach.
Abstract: The interest in image watermarking techniques has rapidly grown during the years. Two requirements needed to be satisfied to use watermarking techniques for copyright protection are: unperceivability and robustness against image processing algorithms and forgery attacks. In particular, it is widely accepted that the exploitation of the characteristics of the human visual system should greatly help in satisfying both these requirements. Some solutions to the problem of building some perceptual masks for better hiding watermarks embedded in the full-frame DCT domain are presented. The results support the validity of the approach.

Journal ArticleDOI
TL;DR: This so-called watermarking process is intended to be the basis of a complete copyright protection system and consists of constructing a band-limited image from binary sequences with good correlation properties and modulating some randomly selected carriers.
Abstract: In this paper, we wish to present a process enabling us to mark digital pictures with invisible and undetectable secret information. This so-called watermarking process is intended to be the basis of a complete copyright protection system. It consists of constructing a band-limited image from binary sequences with good correlation properties and modulating some randomly selected carriers. The security relies on the secrecy of these carrier frequencies, which are deduced from a unique secret key. Then the amplitude of the modulated images is modified according to a masking criterion based on a model of the Human Visual System. The adding of the modulated images to the original is supposed to be invisible. The resulting image fully identifies the copyright owner since he is the only one able to detect and prove the presence of the embedded watermark thanks to his secret key. This paper also contains an analysis of the robustness of the watermark against compression and image processing. (C) 1998 SPIE and IS&T. [S1017-9909(98)01603-1].

Proceedings ArticleDOI
04 Oct 1998
TL;DR: An objective image quality assessment technique which is based on the properties of the human visual system and consists of an early vision model and a visual attention model which indicates regions of interest in a scene through the use of importance maps.
Abstract: We present an objective image quality assessment technique which is based on the properties of the human visual system (HVS). It consists of two major components: an early vision model (multi-channel and designed specifically for complex natural images), and a visual attention model which indicates regions of interest in a scene through the use of importance maps. Visible errors are then weighted, depending on the perceptual importance of the region in which they occur. We show that this technique produces a high correlation with subjective test data (0.93), compared to only 0.65 for PSNR. This technique is particularly useful for images coded with spatially varying quality.
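The weighting step can be sketched by contrasting plain PSNR with an importance-weighted error (a pixel-level sketch only; the paper's metric weights the output of a multi-channel early-vision model, not raw pixel differences):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Conventional PSNR: every pixel error counts equally."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def importance_weighted_error(ref, test, importance):
    """Weight squared errors by a normalized importance map so that
    distortion in regions of interest dominates the score."""
    w = importance / importance.sum()
    return np.sum(w * (ref - test) ** 2)
```

Two images with identical PSNR can thus receive very different scores depending on where their errors fall, which is the point of the importance-map approach.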

Proceedings ArticleDOI
07 Dec 1998
TL;DR: A mathematical morphology based post-processing algorithm that uses binary morphological operators to isolate the regions of an image where the ringing artifact is most prominent to the human visual system (HVS) while preserving genuine edges and other (high-frequency) fine details present in the image.
Abstract: Ringing is an annoying artifact frequently encountered in low bit-rate transform and subband decomposition based compression of different media such as image, intra frame video and graphics. A mathematical morphology based post-processing algorithm is presented in this paper for image ringing artifact suppression. First, we use binary morphological operators to isolate the regions of an image where the ringing artifact is most prominent to the human visual system (HVS) while preserving genuine edges and other (high-frequency) fine details present in the image. Then, a gray-level morphological nonlinear smoothing filter is applied to the unmasked regions of the image under the filtering mask to eliminate ringing within this constraint region. To gauge the effectiveness of this approach, we propose an HVS compatible objective measure of the ringing artifact. Preliminary simulations indicate that the proposed method is capable of significantly reducing the ringing artifact on both subjective and objective basis.
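The masking step can be sketched with numpy-only binary morphology: detect strong edges, dilate them to get the ringing-prone halo, then exclude the edges themselves so genuine detail is preserved (the gradient threshold and halo width are illustrative, not the paper's calibrated choices):

```python
import numpy as np

def dilate(mask, it=1):
    """Binary dilation with a 3x3 structuring element (numpy-only)."""
    m = mask.copy()
    for _ in range(it):
        p = np.pad(m, 1)
        m = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2]
             | p[1:-1, 2:] | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return m

def ringing_mask(img, edge_thresh=50.0, halo=2):
    """Regions near strong edges, minus the edges themselves: where
    ringing is most visible to the HVS."""
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy) > edge_thresh
    return dilate(edges, it=halo) & ~edges
```

The gray-level smoothing filter would then be applied only where this mask is True, leaving edges and untouched flat areas intact.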

Proceedings ArticleDOI
04 Oct 1998
TL;DR: This work has developed a coding technique which exploits characteristics of the human visual system to allocate more bits to the region in which a viewer is most likely, and assumed, to be looking, and it encodes the video frames more efficiently than uniform quantization.
Abstract: We have developed a coding technique which exploits characteristics of the human visual system to allocate more bits to the region in which a viewer is most likely, and assumed, to be looking. We focus on applications, such as video phone and video teleconferencing, in which the regions of interest are those containing human faces. Our approach encodes the video frames more efficiently than uniform quantization and can be used to significantly reduce the encoding bit rate or to improve the perceived image quality for a given bit budget. The method consists of the following: face detection and tracking, local visual-sensitivity determination, and quantizer control.

Journal ArticleDOI
TL;DR: Results from perceptual judgment, delayed matching-to-sample, and long-term memory recall experiments, together with a computational model that accompanies the psychophysical data, indicate that the human visual system can support metrically veridical representations of similarities among 3D objects.

Journal ArticleDOI
TL;DR: The results demonstrate that position invariance, a widely acknowledged property of the human visual system, is limited to specific experimental conditions.
Abstract: Visual object recognition is considered to be largely translation invariant. An earlier study (Foster & Kahn, 1985), however, has indicated that recognition of complex novel stimuli is partially specific to location in the visual field: It is significantly easier to determine the identity of two briefly displayed random patterns if both stimuli are presented at the same, rather than at different, locations. In a series of same/different discrimination tasks, we characterize the processes underlying this “displacement effect”: Horizontal and vertical translations are equally effective in reducing performance. Making the task more difficult by increasing pattern similarity leads to even higher positional specificity. The displacement effect disappears after rotation or contrast reversal of the patterns, indicating that positional specificity depends on relatively low levels of processing. Control experiments rule out explanations that are independent of visual pattern memory, such as spatial attention, eye movements, or retinal afterimages. Positional specificity of recognition is found only for same trials. Our results demonstrate that position invariance, a widely acknowledged property of the human visual system, is limited to specific experimental conditions. Normalization models involving mental shifts of an early visual representation or of a window of attention cannot easily account for these findings.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: A novel technique that uses joint audio-visual analysis for scene identification and characterization is proposed; an outlook on a possible implementation strategy for the overall scene-identification task is suggested and validated through a series of experimental simulations on real audio-visual data.
Abstract: A novel technique, which uses a joint audio-visual analysis for scene identification and characterization, is proposed. The paper defines four different scene types: dialogues, stories, actions, and generic scenes. It then explains how any audio-visual material can be decomposed into a series of scenes obeying the previous classification, by properly analyzing and then combining the underlying audio and visual information. A rule-based procedure is defined for such purpose. Before such rule-based decision can take place, a series of low-level pre-processing tasks are suggested to adequately measure audio and visual correlations. As far as visual information is concerned, it is proposed to measure the similarities between non-consecutive shots using a learning vector quantization approach. An outlook on a possible implementation strategy for the overall scene identification task is suggested, and validated through a series of experimental simulations on real audio-visual data.

Journal ArticleDOI
TL;DR: This work describes how each of the subsystems of the packet video delivery network can be tuned to optimize the quality of the delivered signal, for a given available bit rate in the network.
Abstract: We focus on packet video delivery, with an emphasis on the quality of service perceived by the end user. A video signal passes through several subsystems, such as the source coder, the network (ATM or Internet), and the decoder. Each of these can impair the information, either by data loss or by introducing delay. We describe how each of the subsystems can be tuned to optimize the quality of the delivered signal, for a given available bit rate in the network. The assessment of end-user quality is not trivial. We present research results, which rely on a model of the human visual system.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: The model is shown to accurately fit psychophysical contrast sensitivity data as well as intra- and inter-channel contrast masking data from several different psychophysical experiments.
Abstract: This paper presents a comprehensive distortion metric for digital color images. It is based on a normalization model of the human visual system that incorporates color perception. The model is shown to accurately fit psychophysical contrast sensitivity data as well as intra- and inter-channel contrast masking data from several different psychophysical experiments. The output of the metric is compared with subjective data for natural images.

Journal ArticleDOI
01 Jun 1998
TL;DR: This paper provides the reader with a tutorial on major visual data-compression techniques and a list of references for further information on the details of each method.
Abstract: The compression of visual information in the framework of multimedia applications is discussed. To this end, major approaches to compress still as well as moving pictures are reviewed. The most important objective in any compression algorithm is that of compression efficiency. High-compression coding of still pictures can be split into three categories: waveform, second-generation, and fractal coding techniques. Each coding approach introduces a different artifact at the target bit rates. The primary objective of most ongoing research in this field is to mask these artifacts as much as possible to the human visual system. Video-compression techniques have to deal with data enriched by one more component, namely, the temporal coordinate. Either compression techniques developed for still images can be generalized for three-dimensional signals (space and time) or a hybrid approach can be defined based on motion compensation. The video compression techniques can then be classified into the following four classes: waveform, object-based, model-based, and fractal coding techniques. This paper provides the reader with a tutorial on major visual data-compression techniques and a list of references for further information on the details of each method.

Book ChapterDOI
01 Jan 1998
TL;DR: The colour spaces presented in this chapter are the most popular in the image processing community; equations describing transformations between different colour spaces and the reasons for using colour spaces other than RGB are presented.
Abstract: Colour spaces (other terms: colour coordinate systems, colour models) are three-dimensional arrangements of colour sensations. Colours are specified by points in these spaces. The colour spaces presented in this chapter are the most popular in the image processing community. Equations describing transformations between different colour spaces and the reasons for using colour spaces other than RGB are presented. Based on examples from the literature, the applicability of individual colour spaces in image processing systems is discussed. Spaces used in image processing are derived from visual system models (e.g. RGB, opponent colour space, IHS etc.); adopted from technical domains (e.g. colorimetry: XYZ, television: YUV, etc.) or developed especially for image processing (e.g. Ohta space, Kodak Photo YCC space etc.).
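As one concrete example of a space adopted from a technical domain, the television YUV space is a fixed linear transform of RGB; a sketch using the standard BT.601 coefficients:

```python
import numpy as np

# BT.601 coefficients: Y carries luminance, U/V carry chrominance
RGB_TO_YUV = np.array([[ 0.299,    0.587,    0.114],
                       [-0.14713, -0.28886,  0.436],
                       [ 0.615,   -0.51499, -0.10001]])

def rgb_to_yuv(rgb):
    """Convert an (..., 3) RGB array to YUV (BT.601)."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Inverse transform back to RGB."""
    return np.asarray(yuv, dtype=float) @ np.linalg.inv(RGB_TO_YUV).T
```

Separating luminance from chrominance this way is what lets image-processing systems treat the two with different fidelity, as several of the papers above exploit.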

Journal ArticleDOI
TL;DR: A spatio-temporal noise reduction scheme for interlaced video which makes use of some special properties of the human visual system and may even yield results which are better than the sum of pure spatial and temporal techniques.
Abstract: The reduction of Gaussian noise is still an important task in video systems. For this purpose a spatio-temporal noise reduction scheme for interlaced video is presented. It consists mainly of a subband based temporal recursive filter which makes use of some special properties of the human visual system. This temporal system is supported by a preceding detail preserving spatial filter with low hardware expense, which consists of an image analysing highpass filter bank and an adaptive lowpass FIR-filter for noise reduction. Both the spatial and temporal noise reduction have been evaluated with a large amount of simulations which result in a very good objective and subjective efficiency. Furthermore the chain of both temporal and spatial noise reduction may even yield results which are better than the sum of pure spatial and temporal techniques.
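The temporal stage's core idea, recursive averaging that backs off where frames disagree (to avoid motion blur), can be sketched as a first-order motion-adaptive IIR filter (a minimal stand-in for the paper's subband-based HVS-tuned scheme; the gain and threshold are illustrative):

```python
import numpy as np

def temporal_recursive_filter(frames, k_max=0.9, motion_thresh=20.0):
    """First-order temporal recursive filter: out = (1-k)*in + k*prev_out,
    with the recursion gain k dropped to 0 wherever the frame difference
    suggests motion, so static areas are denoised and moving areas pass
    through unblurred."""
    out = frames[0].astype(float)
    result = [out]
    for f in frames[1:]:
        f = f.astype(float)
        diff = np.abs(f - out)
        k = np.where(diff < motion_thresh, k_max, 0.0)  # recurse only if static
        out = (1 - k) * f + k * out
        result.append(out)
    return result
```

In steady state on static content, a gain of k reduces the noise variance by a factor of (1-k)²/(1-k²), which is why the temporal stage dominates the overall noise reduction.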

Journal ArticleDOI
TL;DR: Studies of the diseased human brain show that activity in separate processing-perceptual systems—especially those concerned with color and motion—can lead to the perception of the relevant attribute even when the other processing systems are inactive, which suggests that consciousness itself is a modular, distributed system.
Abstract: The primate visual brain is characterized by a set of parallel, multistage systems that are specialized to process different attributes of the visual scene. They occupy spatially distinct positions in the visual brain and do not project to a unique common area. These processing systems are also perceptual systems, because the result of activity in each leads to the perception of the relevant visual attribute. But the different processing-perceptual systems require different times to complete their tasks, thus leading to another characteristic of the visual brain, a temporal hierarchy for perception. Together, these two characteristics—of parallel processing and temporal hierarchy—suggest that each processing-perceptual system can act with fair autonomy. Studies of the diseased human brain show that activity in separate processing-perceptual systems—especially those concerned with color and motion—can lead to the perception of the relevant attribute even when the other processing systems are inactive and ...

Journal ArticleDOI
TL;DR: A new block reduction technique is proposed, based on a space-variant non-linear filtering operation of the blocking artifacts present in the image to be reconstructed, which provides a way to greatly reduce the artifacts without degrading high-frequency information of the original image.
Abstract: Blocking effect constitutes one of the main drawbacks of the actual DCT-based compression methods. We propose in this paper a new block reduction technique; it is based on a space-variant non-linear filtering operation of the blocking artifacts present in the image to be reconstructed. To account for the perceptual importance of the distortion, the amount of smoothing is adapted to the visibility of the blocking effect. A visibility parameter is computed for each artifact using the psychovisual properties of the human visual system (HVS). The postprocessing algorithm is in conformity with actual existing compression standards; it provides a way to greatly reduce the artifacts without degrading high-frequency information of the original image. First the proposed method is described and then experimental results are presented, showing the effectiveness of the correction.

Proceedings ArticleDOI
17 Jul 1998
TL;DR: A technique for controlling the adaptive quantization process in an MPEG encoder, which improves upon the commonly used TM5 rate controller, and indicates a subjective improvement in picture quality, in comparison to the TM5 method.
Abstract: We present a technique for controlling the adaptive quantization process in an MPEG encoder, which improves upon the commonly used TM5 rate controller. The method combines both a spatial masking model and a technique for automatically determining the visually important areas in a scene. The spatial masking model has been designed with consideration of the structure of compressed natural images. It takes into account the different levels of distortion that are tolerable by viewers in different parts of a picture by segmenting the scene into flat, edge, and textured regions and quantizing these regions differently. The visually important scene areas are calculated using Importance Maps. These maps are generated by combining factors known to influence human visual attention and eye movements. Lower quantization is assigned to visually important regions, while areas classified as being of low visual importance are more harshly quantized. Results indicate a subjective improvement in picture quality, in comparison to the TM5 method. Less ringing occurs at edges, and the visually important areas of a picture are more accurately coded. This is particularly noticeable at low bit rates. The technique is computationally efficient and flexible, and can easily be extended to specific applications.

Book ChapterDOI
01 Jan 1998
TL;DR: This paper investigated the psychological plausibility of this representation, looking at correlations with human perceptions of memorability and similarity, and showed that transformation of faces to an average shape prior to principal component analysis improves correlations of human ratings.
Abstract: A variety of experimental results indicate that the human visual system processes faces at least to some extent holistically, rather than by analysing individual features such as nose and eyes. Principal Components Analysis (PCA) of face images, which is widely used in engineering approaches to face identification, produces an inherently global representation. We investigate the psychological plausibility of this representation, looking at correlations with human perceptions of memorability and similarity. We show that transformation of faces to an average shape prior to PCA improves correlations with human ratings.
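The PCA representation the chapter builds on can be sketched as an SVD of mean-centred face vectors (an eigenface-style sketch; function name and shapes are ours):

```python
import numpy as np

def pca_faces(faces, n_components):
    """Eigenface-style PCA: SVD of mean-centred face vectors.
    Returns the mean face, the leading components, and each face's
    projection ('holistic' coefficients) onto those components."""
    X = np.asarray(faces, dtype=float)     # (n_faces, n_pixels)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]         # (n_components, n_pixels)
    coords = (X - mean) @ components.T     # per-face coefficients
    return mean, components, coords
```

The chapter's shape-normalization step would warp each face to the average shape before forming the rows of `faces`.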

Dissertation
01 Jan 1998
TL;DR: Inspired by the human visual system, a model of vision is developed, with special emphasis on visual attention, and programs based on that model are presented that extract a wide variety of spatial relations on demand and learn visuospatial patterns of activity from experience.
Abstract: The human visual system solves an amazing range of problems in the course of everyday activities. Without conscious effort, the human visual system finds a place on the table to put down a cup, selects the shortest checkout queue in a grocery store, looks for moving vehicles before we cross a road, and checks to see if the stoplight has turned green. Inspired by the human visual system, I have developed a model of vision, with special emphasis on visual attention. In this thesis, I explain that model and exhibit programs based on that model that: (1) Extract a wide variety of spatial relations on demand. (2) Learn visuospatial patterns of activity from experience. For example, one program determines what object a human is pointing to. Another learns a particular pattern of visual activity evoked whenever an object falls off a table. The program that extracts spatial relations on demand uses sequences of primitive operations called visual routines. The primitive operations in the visual routines fall into one of three families: operations for moving the focus of attention; operations for establishing certain properties at the focus of attention; and operations for selecting locations. The three families of primitive operations constitute a powerful language of attention. That language supports the construction of visual routines for a wide variety of visuospatial tasks. The program that learns visuospatial patterns of activity rests on the idea that visual routines can be viewed as repeating patterns of attentional state. I show how my language of attention enables learning by supporting the extraction, from experience, of such patterns of repeating attentional state. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)