
Showing papers on "Human visual system model published in 2004"


Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information; its performance is demonstrated through comparison with subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
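As a rough illustration of the index (not the authors' implementation, which uses a sliding Gaussian window), the SSIM formula can be sketched in Python/NumPy with statistics computed over the whole image; the constants K1 = 0.01 and K2 = 0.03 are the paper's defaults.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM sketch: luminance, contrast and structure terms
    combined as in the SSIM formula, but with means/variances taken over
    the whole image rather than a sliding window."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

An identical image pair scores exactly 1, and any distortion lowers the score, which is the intuition behind using the index as a quality measure.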

40,609 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: This work proposes an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself, and demonstrates the performance of the algorithm by testing it on a data set of 779 images.
Abstract: Measurement of image quality is crucial for many image-processing algorithms. Traditionally, image quality assessment algorithms predict visual quality by comparing a distorted image against a reference image, typically by modeling the human visual system (HVS), or by using arbitrary signal fidelity criteria. We adopt a new paradigm for image quality assessment. We propose an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself. We use natural scene statistics (NSS) modeling in concert with an image degradation model and an HVS model. We demonstrate the performance of our algorithm by testing it on a data set of 779 images, and show that our method is competitive with state of the art quality assessment methods, and outperforms them in our simulations.

1,349 citations


Journal ArticleDOI
TL;DR: A new philosophy in designing image and video quality metrics is followed, which uses structural distortion as an estimate of perceived visual distortion as part of full-reference (FR) video quality assessment.
Abstract: Objective image and video quality measures play important roles in a variety of image and video processing applications, such as compression, communication, printing, analysis, registration, restoration, enhancement and watermarking. Most proposed quality assessment approaches in the literature are error sensitivity-based methods. In this paper, we follow a new philosophy in designing image and video quality metrics, which uses structural distortion as an estimate of perceived visual distortion. A computationally efficient approach is developed for full-reference (FR) video quality assessment. The algorithm is tested on the video quality experts group (VQEG) Phase I FR-TV test data set.
Keywords: Image quality assessment, video quality assessment, human visual system, error sensitivity, structural distortion, video quality experts group (VQEG)

1,083 citations


Journal ArticleDOI
TL;DR: The frequency of white pixels is used to show the contrast of the recovered image; the scheme is non-expansible and can be easily implemented on the basis of a conventional visual secret sharing (VSS) scheme.

426 citations


Journal ArticleDOI
TL;DR: It is shown that the 'light-from-above' prior, used to extract information about shape from shading, is modified in response to active experience with the scene, demonstrating that priors are constantly adapted by interactive experience with the environment.
Abstract: To interpret complex and ambiguous input, the human visual system uses prior knowledge or assumptions about the world. We show that the 'light-from-above' prior, used to extract information about shape from shading, is modified in response to active experience with the scene. The resultant adaptation is not specific to the learned scene but generalizes to a different task, demonstrating that priors are constantly adapted by interactive experience with the environment.

352 citations


Journal ArticleDOI
01 Aug 2004
TL;DR: A new approach for inter-frame encoding of HDR video is proposed, which is embedded in the well-established MPEG-4 video compression standard; it requires only 10-11 bits to encode 12 orders of magnitude of visible luminance range and does not lead to perceivable contouring artifacts.
Abstract: Due to rapid technological progress in high dynamic range (HDR) video capture and display, the efficient storage and transmission of such data is crucial for the completeness of any HDR imaging pipeline. We propose a new approach for inter-frame encoding of HDR video, which is embedded in the well-established MPEG-4 video compression standard. The key component of our technique is luminance quantization that is optimized for the contrast threshold perception in the human visual system. The quantization scheme requires only 10-11 bits to encode 12 orders of magnitude of visible luminance range and does not lead to perceivable contouring artifacts. Besides video encoding, the proposed quantization provides perceptually-optimized luminance sampling for fast implementation of any global tone mapping operator using a lookup table. To improve the quality of synthetic video sequences, we introduce a coding scheme for discrete cosine transform (DCT) blocks with high contrast. We demonstrate the capabilities of HDR video in a player, which enables decoding, tone mapping, and applying post-processing effects in real-time. The tone mapping algorithm as well as its parameters can be changed interactively while the video is playing. We can simulate post-processing effects such as glare, night vision, and motion blur, which appear very realistic due to the usage of HDR data.
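The core idea of mapping a huge luminance range to a small integer code space can be illustrated with plain log-luminance quantization. This is only a crude stand-in for the paper's perceptually optimized curve (which is derived from contrast detection thresholds); the bit depth and luminance bounds below are illustrative assumptions.

```python
import numpy as np

def encode_luma(L, bits=11, l_min=1e-4, l_max=1e8):
    """Map luminance values spanning 12 orders of magnitude to integer
    codes by uniform quantization in log10 space (a crude stand-in for
    the paper's contrast-threshold-derived curve)."""
    n_codes = 2 ** bits
    log_lo, log_hi = np.log10(l_min), np.log10(l_max)
    t = (np.log10(np.clip(L, l_min, l_max)) - log_lo) / (log_hi - log_lo)
    return np.round(t * (n_codes - 1)).astype(np.int32)

def decode_luma(code, bits=11, l_min=1e-4, l_max=1e8):
    """Inverse mapping from integer codes back to luminance."""
    log_lo, log_hi = np.log10(l_min), np.log10(l_max)
    t = code / (2 ** bits - 1)
    return 10 ** (log_lo + t * (log_hi - log_lo))
```

Even this naive curve keeps the round-trip relative error under about 1% with 11 bits, which hints at why 10-11 bits can suffice for the visible luminance range when the quantization steps track relative (contrast) rather than absolute differences.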

253 citations


Journal ArticleDOI
TL;DR: A novel robust watermarking approach called FuseMark, based on the principles of image fusion, is presented for copy protection and robust tagging applications; its extraction approach decreases false negative detection without increasing the false positive detection rate.
Abstract: This paper presents a novel robust watermarking approach called FuseMark based on the principles of image fusion for copy protection or robust tagging applications. We consider the problem of logo watermarking in still images and employ multiresolution data fusion principles for watermark embedding and extraction. A human visual system model based on contrast sensitivity is incorporated to hide a higher energy hidden logo in salient image components. Watermark extraction involves both characterization of attacks and logo estimation using a rake-like receiver. Statistical analysis demonstrates how our extraction approach can be used for watermark detection applications to decrease the problem of false negative detection without increasing the false positive detection rate. Simulation results verify theoretical observations and demonstrate the practical performance of FuseMark.

219 citations


Journal ArticleDOI
TL;DR: It is shown that after controlling for subjects' expectations, there is no difference between 'featurally' and 'configurally' transformed faces in terms of the inversion effect, which reinforces the plausibility of simple hierarchical models of object representation and recognition in the cortex.
Abstract: Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence, a general class of recognition models has emerged, which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity. However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ 'featurally' are much easier to distinguish when inverted than those that differ 'configurally'; a finding that is difficult to reconcile with the physiological models. Here, we show that after controlling for subjects' expectations, there is no difference between 'featurally' and 'configurally' transformed faces in terms of inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in the cortex.

152 citations


Dissertation
01 Jan 2004
TL;DR: This dissertation argues that quality assessment algorithms need only deal with images and videos meant for human consumption, presents an information-theoretic approach to quality assessment based on natural scene statistics, and shows that the proposed methods outperform current state-of-the-art methods in the author's simulations.
Abstract: Measurement of image quality is crucial for designing image processing systems that could potentially degrade visual quality. Such measurements allow developers to optimize designs to deliver maximum quality while minimizing system cost. This dissertation is about automatic algorithms for quality assessment of digital images. Traditionally, researchers have equated image quality with image fidelity, or the closeness of a distorted image to a ‘reference’ image that is assumed to have perfect quality. This closeness is typically measured by modeling the human visual system, or by using different mathematical criteria for signal similarity. In this dissertation, I approach the problem from a novel direction. I claim that quality assessment algorithms deal only with images and videos that are meant for human consumption, and that these signals are almost exclusively images and videos of the visual environment. Image distortions make these so-called natural scenes look ‘unnatural’. I claim that this departure from ‘expected’ characteristics could be quantified for predicting visual quality. I present a novel information-theoretic approach to image quality assessment using statistical models for natural scenes. I approach the quality assessment problem as an information fidelity problem, in which the distortion process is viewed as a channel that limits the flow of information from a source of natural images to the receiver (the brain). I show that quality of a test image is strongly related to the amount of statistical information about the reference image that is present in the test image. I also explore image quality assessment in the absence of the reference, and present a novel method for blindly quantifying the quality of images compressed by wavelet based compression algorithms. 
I show that images are rendered unnatural by the quantization process during lossy compression, and that this unnaturalness could be quantified blindly for predicting visual quality. I test and validate the performance of the algorithms proposed in this dissertation through an extensive study in which ground truth data was obtained from many human subjects. I show that the methods presented can accurately predict visual quality, and that they outperform current state-of-the-art methods in my simulations.

146 citations


Journal ArticleDOI
TL;DR: A psychophysical experiment is described that measured the sensitivity of human observers to small differences of 3D shape over a wide variety of conditions and provides clear evidence that the presence of specular highlights or the motions of a surface relative to its light source do not pose an impediment to perception, but rather, provide powerful sources of information for the perceptual analysis of3D shape.
Abstract: There have been numerous computational models developed in an effort to explain how the human visual system analyzes three-dimensional (3D) surface shape from patterns of image shading, but they all share some important limitations. Models that are applicable to individual static images cannot correctly interpret regions that contain specular highlights, and those that are applicable to moving images have difficulties when a surface moves relative to its sources of illumination. Here we describe a psychophysical experiment that measured the sensitivity of human observers to small differences of 3D shape over a wide variety of conditions. The results provide clear evidence that the presence of specular highlights or the motions of a surface relative to its light source do not pose an impediment to perception, but rather, provide powerful sources of information for the perceptual analysis of 3D shape.

135 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A novel approach that uses both the visual and audio information from video clips to recognize human emotion is presented, and experiments show that it outperforms using visual or audio information alone.
Abstract: Emotion recognition is one of the latest challenges in intelligent human/computer communication. Most of the previous work on emotion recognition focused on extracting emotions from visual or audio information separately. A novel approach is presented in this paper that uses both the visual and audio information from video clips to recognize human emotion. Facial animation parameters (FAPs) compliant facial feature tracking based on an active appearance model is performed on the video to generate two vector streams, which represent the expression features and the visual speech features. Combined with the visual vectors, the audio vector is extracted in terms of low-level features. Then, a tripled hidden Markov model is introduced to perform the recognition, which allows the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. The experimental results show that this approach outperforms using only visual or audio information separately.

Journal ArticleDOI
TL;DR: The present results suggest that a change‐detection mechanism sensitive to unattended changes in motion direction may exist in the human visual system.
Abstract: The possibility that the visual system is able to register unattended changes is still debated in the literature. However, it is difficult to understand how a sensory system becomes aware of unexpected salient changes in the environment if attention is required for detecting them. The ability to automatically detect unusual changes in the sensory environment is an adaptive function which has been confirmed in other sensory modalities (i.e. audition). This deviance detector mechanism has proven to be based on a preattentive nonrefractory memory-comparison process. To investigate whether such automatic change detection mechanism exists in the human visual system, we recorded event-related potentials to sudden changes in a biologically important feature, motion direction. Unattended sinusoidal gratings varying in motion direction in the peripheral field were presented while subjects performed a central task with two levels of difficulty. We found a larger negative displacement in the electrophysiological response elicited by less frequent stimuli (deviant) at posterior scalp locations. Within the latency range of the visual evoked component N2, this differential response was elicited independently of the direction of motion and processing load. Moreover, the results showed that the negativity elicited by deviants was not related to a differential refractory state between the electrophysiological responses to frequent and infrequent directions of motion, and that it was restricted to scalp locations related to motion processing areas. The present results suggest that a change-detection mechanism sensitive to unattended changes in motion direction may exist in the human visual system.

Proceedings ArticleDOI
13 Jan 2004
TL;DR: The WeightMap is introduced, a bitmap representation of the visual weight of a presentation that is based on the concepts of visual weight and visual balance, which are fundamental to the visual arts.
Abstract: Layout refers to the process of determining the size and position of the visual objects in an information presentation. We introduce the WeightMap, a bitmap representation of the visual weight of a presentation. In addition, we present algorithms that use WeightMaps to allow an automated layout system to evaluate the effectiveness of its layouts. Our approach is based on the concepts of visual weight and visual balance, which are fundamental to the visual arts. The objects in the layout are each assigned a visual weight, and a WeightMap is created that encodes the visual weight of the layout. Image-processing techniques, including pyramids and edge detection, are then used to efficiently analyze the WeightMap for balance. In addition, derivatives of the sums of the rows and columns are used to generate suggestions for how to improve the layout.

Journal ArticleDOI
TL;DR: This work applies a radial-basis function (RBF) network for implementing an adaptive metric which progressively models the notion of image similarity through continual relevance feedback from users, and shows that the proposed methods not only outperform conventional CBIR systems in terms of both accuracy and robustness, but also previously proposed interactive systems.
Abstract: An important requirement for constructing effective content-based image retrieval (CBIR) systems is accurate characterization of visual information. Conventional nonadaptive models, which are usually adopted for this task in simple CBIR systems, do not adequately capture all aspects of the characteristics of the human visual system. An effective way of addressing this problem is to adopt a "human-computer" interactive approach, where the users directly teach the system about what they regard as being significant image features and their own notions of image similarity. We propose a machine learning approach for this task, which allows users to directly modify query characteristics by specifying their attributes in the form of training examples. Specifically, we apply a radial-basis function (RBF) network for implementing an adaptive metric which progressively models the notion of image similarity through continual relevance feedback from users. Experimental results show that the proposed methods not only outperform conventional CBIR systems in terms of both accuracy and robustness, but also previously proposed interactive systems.

Journal ArticleDOI
TL;DR: A novel high capacity data hiding method based on JPEG that can achieve an impressively high embedding capacity of around 20% of the compressed image size with little noticeable degradation of image quality is proposed.
Abstract: JPEG is the most popular file format for digital images. However, up to the present time, there seem to have been very few data hiding techniques taking the JPEG image into account. In this paper, we propose a novel high capacity data hiding method based on JPEG. The proposed method employs a capacity table to estimate the number of bits that can be hidden in each DCT component so that significant distortions in the stego-image can be avoided. The capacity table is derived from the JPEG default quantization table and the Human Visual System (HVS). Then, the adaptive least-significant bit (LSB) substitution technique is employed to process each quantized DCT coefficient. The proposed data hiding method enables us to control the level of embedding capacity by using a capacity factor. According to our experimental results, our new scheme can achieve an impressively high embedding capacity of around 20% of the compressed image size with little noticeable degradation of image quality.
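The adaptive LSB substitution step can be sketched on a single quantized DCT coefficient. The per-coefficient bit count k would come from the capacity table described in the abstract, which is not reproduced here; this is an illustrative sketch, not the authors' code.

```python
def embed_bits(coeff, bits, k):
    """Replace the k least significant bits of a quantized DCT
    coefficient with k message bits. The sign is preserved by working
    on the magnitude."""
    sign = -1 if coeff < 0 else 1
    mag = abs(coeff)
    mag = (mag >> k) << k  # clear the k LSBs
    value = 0
    for b in bits[:k]:     # pack message bits, MSB first
        value = (value << 1) | b
    return sign * (mag | value)

def extract_bits(coeff, k):
    """Recover the k embedded bits from a coefficient's magnitude."""
    mag = abs(coeff)
    return [(mag >> (k - 1 - i)) & 1 for i in range(k)]
```

For example, embedding three bits into a coefficient of 100 perturbs its magnitude by at most 2^3 - 1, which is why low-frequency, coarsely quantized components can hide more bits without visible damage.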

Journal ArticleDOI
TL;DR: The possibility that the brain makes use of the spatio-temporal structure of spike patterns to encode information is explored, together with how selective neural responses can be generated rapidly through spike-timing-dependent plasticity (STDP) and how these selectivities can be used for visual representation and recognition.
Abstract: Where neural information processing is concerned, there is no debate about the fact that spikes are the basic currency for transmitting information between neurons. How the brain actually uses them to encode information remains more controversial. It is commonly assumed that neuronal firing rate is the key variable, but the speed with which images can be analysed by the visual system poses a major challenge for rate-based approaches. We thus explore here the possibility that the brain makes use of the spatio-temporal structure of spike patterns to encode information. We then consider how such selective neural responses can be generated rapidly through spike-timing-dependent plasticity (STDP) and how these selectivities can be used for visual representation and recognition. Finally, we show how temporal codes and sparse representations may very well arise one from the other and explain some of the remarkable features of processing in the visual system.

Journal ArticleDOI
TL;DR: These results provide clear evidence against the notion of separate analysis of pattern and motion, contrary to the general belief that the human visual cortex initially analyses spatial patterns independently of their movements.

Journal ArticleDOI
TL;DR: An objective quality metric that generates continuous estimates of perceived quality for low bit rate video is introduced based on a multichannel model of the human visual system that exceeds the performance of a similar metric based on the Mean Squared Error.
Abstract: An objective quality metric that generates continuous estimates of perceived quality for low bit rate video is introduced. The metric is based on a multichannel model of the human visual system. The vision model is initially parameterized to threshold data and then further optimized using video frames containing severe distortions. The proposed metric also discards processing of the finest scales to reduce computational complexity, which also results in an improvement in the accuracy of prediction for the sequences under consideration. A temporal pooling method suited to modeling continuous time waveforms is also introduced. The metric is parameterized and evaluated using the results of a Single Stimulus Continuous Quality Evaluation test conducted for CIF video at rates from 100 to 800 kbps. The proposed metric exceeds the performance of a similar metric based on the Mean Squared Error.

Proceedings ArticleDOI
07 Aug 2004
TL;DR: In this paper, the authors introduce the concept of vision-realistic rendering, which is the computer generation of synthetic images that incorporate the characteristics of a particular individual's entire optical system.
Abstract: We introduce the concept of vision-realistic rendering: the computer generation of synthetic images that incorporate the characteristics of a particular individual's entire optical system. Specifically, this paper develops a method for simulating the scanned foveal image from wavefront data of actual human subjects, and demonstrates those methods on sample images. First, a subject's optical system is measured by a Shack-Hartmann wavefront aberrometry device. This device outputs a measured wavefront which is sampled to calculate an object space point spread function (OSPSF). The OSPSF is then used to blur input images. This blurring is accomplished by creating a set of depth images, convolving them with the OSPSF, and finally compositing to form a vision-realistic rendered image. Applications of vision-realistic rendering in computer graphics as well as in optometry and ophthalmology are discussed.

Journal Article
TL;DR: Four different HVS models, which exploit various properties of the human eye, are described, and a way of combining three of these basic models to obtain a better tradeoff between the conflicting requirements of digital watermarks is presented.
Abstract: In this paper some Human Visual System (HVS) models used in digital image watermarking are presented. Four different HVS models, which exploit various properties of the human eye, are described. Two of them operate in the transform domains of the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). The HVS model in the DCT domain consists of Just Noticeable Difference thresholds for the corresponding DCT basis functions, corrected by luminance sensitivity and self- or neighborhood contrast masking. The HVS model in the DWT domain is based on the different HVS sensitivity in the various DWT subbands. The third presented HVS model is composed of contrast thresholds as a function of spatial frequency and the eye's eccentricity. We also present a way of combining these three basic models to obtain a better tradeoff between the conflicting requirements of digital watermarks. The fourth HVS model is based on noise visibility in an image and is described by the so-called Noise Visibility Function (NVF). The possible ways of exploiting the described HVS models in digital image watermarking are also briefly discussed.

Proceedings Article
01 Jan 2004
TL;DR: This paper investigates to what level viewers fail to notice degradations in image quality between non-task-related areas and task-related areas, when quality parameters such as image resolution, edge anti-aliasing, and reflections and shadows are altered.
Abstract: The perception of a virtual environment depends on the user and the task the user is currently performing in that environment. Models of the human visual system can thus be exploited to significantly reduce computational time when rendering high fidelity images, without compromising the perceived visual quality. This paper considers how an image can be selectively rendered when a user is performing a visual task in an environment. In particular, we investigate to what level viewers fail to notice degradations in image quality between non-task-related areas and task-related areas, when quality parameters such as image resolution, edge anti-aliasing, and reflections and shadows are altered.

Proceedings ArticleDOI
20 Oct 2004
TL;DR: In this article, the gaze sequence of an observer when presented with a previously seen image on a personal computer screen is used to identify points of gaze fixation and extract a compact representation of the significant screen locations.
Abstract: This paper presents a simple PIN-like approach to user authentication, using the gaze sequence of an observer when presented with a previously seen image on a personal computer screen. The method relies on the principle that the human visual system requires the eye to rest motionless for short periods, to assimilate detail at a given location in a visual scene. By deliberately looking at certain features or objects in a scene following a pre-defined sequence specified by the observer, an independent signature for personal identification can be established. Points of gaze fixation can be identified using a simple eye-tracker based on images from a typical webcam, and processed to extract a compact representation of the significant screen locations from the gaze sequence. Experimental results demonstrate that these "signatures" can be reliably and rapidly computed, and offer the advantage that they are difficult to detect by covert means.
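The fixation-detection step can be approximated with a standard dispersion-threshold (I-DT style) grouping of gaze samples. The paper's actual processing is not specified in this abstract, so the thresholds below are illustrative assumptions.

```python
def detect_fixations(samples, max_dispersion=25.0, min_count=5):
    """Group consecutive (x, y) gaze samples into fixations: a fixation
    is a run of samples whose combined x- and y-extent stays within
    max_dispersion pixels for at least min_count samples. Returns the
    centroid of each detected fixation."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        j, xs, ys = i, [], []
        while j < n:
            xs.append(samples[j][0])
            ys.append(samples[j][1])
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                xs.pop()  # last sample left the dispersion window
                ys.pop()
                break
            j += 1
        if j - i >= min_count:
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        i = max(j, i + 1)
    return fixations
```

The resulting centroid sequence is exactly the kind of compact "signature" of significant screen locations the abstract describes.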

Proceedings ArticleDOI
06 Sep 2004
TL;DR: This work introduces a novel anisotropic diffusion partial differential equation (PDE) that is applied to the 2D image of the scene rendered with a pin-hole camera, and it is a good approximation of the optical phenomenon, without the visual artifacts and depth inconsistencies present in other approaches.
Abstract: Computer graphics cameras lack the finite depth of field (DOF) present in real world ones. This results in all objects being rendered sharp regardless of their depth, reducing the realism of the scene. On top of that, real-world DOF provides a depth cue, that helps the human visual system decode the elements of a scene. Several methods have been proposed to render images with finite DOF, but these have always implied an important trade-off between speed and accuracy. We introduce a novel anisotropic diffusion partial differential equation (PDE) that is applied to the 2D image of the scene rendered with a pin-hole camera. In this PDE, the amount of blurring on the 2D image depends on the depth information of the 3D scene, present in the Z-buffer. This equation is well posed, has existence and uniqueness results, and it is a good approximation of the optical phenomenon, without the visual artifacts and depth inconsistencies present in other approaches. Because both inputs to our algorithm are present at the graphics card at every moment, we can run the processing entirely in the GPU. This fact, coupled with the particular numerical scheme chosen for our PDE, allows for real-time rendering using a programmable graphics card.
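The idea of depth-driven diffusion can be illustrated in one dimension: an explicit diffusion step whose strength grows with distance from the focal depth, so out-of-focus pixels blur while in-focus pixels stay sharp. This is a didactic toy, not the authors' well-posed PDE or its GPU numerical scheme.

```python
import numpy as np

def dof_blur_1d(image_row, depth_row, focal_depth, steps=20, dt=0.2):
    """Toy 1D depth-driven diffusion: the per-pixel diffusion coefficient
    is proportional to |depth - focal_depth| (normalized to [0, 1]), and
    dt <= 0.5 keeps the explicit scheme stable."""
    u = image_row.astype(np.float64).copy()
    c = np.abs(depth_row.astype(np.float64) - focal_depth)
    c = c / (c.max() + 1e-12)  # 0 = in focus, 1 = maximally blurred
    for _ in range(steps):
        lap = np.zeros_like(u)
        lap[1:-1] = u[:-2] - 2 * u[1:-1] + u[2:]  # discrete Laplacian
        u += dt * c * lap
    return u
```

Note how the depth buffer plays the same role as the Z-buffer in the paper: it modulates how much each pixel participates in the diffusion.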

Book
01 Jan 2004
TL;DR: This book discusses visual problem solving in geospatial domains, an enterprise architecture for data mining and visual analysis of spatial data, and a multilevel analytical and visual decision framework for imagery conflation and registration.
Abstract: Preface, Color Plates
PART 1. Visual Problem Solving and Decision Making: 1. Decision process and its visual aspects, 2. Information visualization value stack model
PART 2. Visual and Heterogeneous Reasoning: 3. Visual reasoning and representation, 4. Representing visual decision making: a computational architecture for heterogeneous reasoning, 5. Algebraic visual symbolism for problem solving: iconic equations from Diophantus to our days, 6. Iconic reasoning architecture for analysis and decision making, 7. Toward visual reasoning and discovery: lessons from early history of mathematics
PART 3. Visual Correlation: 8. Visual correlation methods and models, 9. Iconic approach for data annotating, searching and correlating, 10. Bruegel iconic correlation system
PART 4. Visual and Spatial Data Mining: 11. Visualizing data streams, 12. SPIN! - an enterprise architecture for data mining and visual analysis of spatial data, 13. XML-based visualization and evaluation of data mining results, 14. Neural-network techniques for visual mining clinical electroencephalograms, 15. Visual data mining with simultaneous rescaling, 16. Visual data mining using monotone Boolean functions
PART 5. Visual and Spatial Problem Solving in Geospatial Domains: 17. Imagery integration as conflict resolution decision process: methods and approaches, 18. Multilevel analytical and visual decision framework for imagery conflation and registration, 19. Conflation of images with algebraic structures, 20. Algorithm development technology for conflation and area-based conflation algorithm, 21. Virtual experts for imagery registration and conflation

Proceedings ArticleDOI
05 Apr 2004
TL;DR: An entropy-based method is developed for selecting DWT coefficients, providing an adaptive way to determine the number of watermarked coefficients and the watermarking factor at each level of DWT decomposition.
Abstract: A new approach for non-blind watermarking of still gray-level images in the wavelet domain is presented in this paper. The method uses human visual system (HVS) characteristics and an innovative entropy-based approach to create an efficient watermarking scheme. It decomposes the original image into three hierarchical levels in the DWT domain and watermarks it with a logo image, which is scrambled through a well-known PN-sequence. An entropy-based method is developed for selecting DWT coefficients, providing an adaptive way to determine the number of watermarked coefficients and the watermarking factor at each level of DWT decomposition. This approach shows excellent resistance against almost all attacks known in the watermarking literature. The detection results reveal better resistance in comparison to existing methods. With simple modifications, the method can be used for color images and in real-time systems.

Proceedings ArticleDOI
07 Aug 2004
TL;DR: It is shown that second order image statistics are predominantly due to geometric modeling, while being largely unaffected by the choice of rendering parameters, and are therefore useful for modeling applications, as demonstrated in direct examples.
Abstract: The class of all natural images is an extremely small fraction of all possible images. Some of the structure of natural images can be modeled statistically, revealing striking regularities. Moreover, the human visual system appears to be optimized to view natural images. Images that do not behave statistically as natural images are harder for the human visual system to interpret. This paper reviews second order image statistics as well as their implications for computer graphics. We show that these statistics are predominantly due to geometric modeling, while being largely unaffected by the choice of rendering parameters. As a result, second order image statistics are useful for modeling applications, which we show in direct examples (recursive random displacement terrain modeling and solid texture synthesis). Finally, we present an image reconstruction filter based on second order image statistics.
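The best-known second order regularity is the roughly 1/f^2 falloff of the radially averaged power spectrum of natural images. A minimal sketch of measuring it, assuming a grayscale numpy image:

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    counts = np.bincount(r.ravel())
    sums = np.bincount(r.ravel(), weights=power.ravel())
    radial = sums / np.maximum(counts, 1)
    return radial[1:min(h, w) // 2]   # drop DC, stay below Nyquist

def spectral_slope(radial):
    """Log-log slope; natural images typically fall off with slope near -2."""
    freqs = np.arange(1, len(radial) + 1)
    slope, _ = np.polyfit(np.log(freqs), np.log(radial + 1e-12), 1)
    return slope
```

Feeding in a synthetic image whose spectrum amplitude scales as 1/f recovers a slope near -2, the statistic the paper exploits for terrain modeling and texture synthesis.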

Journal ArticleDOI
TL;DR: There is increasing awareness of the impact of post-processing algorithms, particularly filtering, in diagnostic software applications, so familiarity with these types of techniques is useful.
Abstract: Images from an ordinary consumer digital camera convey information at a wide range of spatial (and temporal) scales and enable the viewer to decompose the image into regions that are uniform in some way (colour, texture, ...), recognize familiar objects, determine spatial relationships between objects, and detect abnormalities (e.g. textural markings on a region expected to be plain). Though modern digital cameras are equipped with low-noise electronics and excellent lenses that minimize pincushion (and similar) distortions, images also contain noise and artefacts such as red-eye in flash images. Widely distributed software packages such as Photoshop provide a set of "filtering" operations which enable the user to improve the image in some way: from image smoothing (typically local averaging) that removes noise and high frequencies, sharpening that increases high-frequency content, contrast stretching, through to specialized algorithms, for example for red-eye reduction. Such image filtering is designed to improve the appearance of an image, relying on the human visual system to disregard any unwanted change of content of the image. Medical image analysis poses a far tougher challenge. First, there is an even greater need for image filtering, because medical images have a poorer signal-to-noise ratio than scenes taken with a digital camera, the spatial resolution is often frustratingly low, the contrast between anatomically distinct structures is often too low to be computed reliably using a standard image processing technique, and artefacts are common (e.g. motion and bias field in MRI). Second, changes to image content must be made in a highly controlled and reliable way that does not compromise clinical decision-making. For example, whereas it is generally acceptable to filter out local bright patches of noise, care must be taken in the case of mammography not to remove microcalcifications.
This paper briefly explores some of the key areas of development in filtering for medical imaging and how these techniques affect generally available software packages in routine diagnostic use. It is interesting to note that a great deal of image filtering takes place at what is usually regarded as a "preprocessing" stage in the formation of a medical image and is relatively invisible to a radiologist. However, there is increasing awareness of the impact of post-processing algorithms, particularly filtering, in diagnostic software applications, and familiarity with these types of techniques is useful.
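The two elementary filters the editorial mentions, smoothing by local averaging and sharpening by boosting high-frequency content, can be sketched in a few lines of numpy (kernel size and sharpening amount are illustrative defaults):

```python
import numpy as np

def box_blur(img, k=3):
    """Local-average smoothing with a k x k box kernel (reflect padding)."""
    pad = k // 2
    p = np.pad(np.asarray(img, dtype=float), pad, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_mask(img, amount=1.0, k=3):
    """Sharpening: add back the high-frequency residual (img - lowpass)."""
    return img + amount * (img - box_blur(img, k))
```

The blur suppresses noise (and, as the editorial warns, could equally suppress a microcalcification), while the unsharp mask amplifies exactly the residual the blur removed.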

Book ChapterDOI
11 May 2004
TL;DR: Experimental results indicate that the inclusion of visible differences information in fusion assessment yields metrics whose accuracy, with reference to subjective results, is superior to that obtained from the state of the art objective fusion performance measures.
Abstract: Multisensor signal-level image fusion has attracted considerable research attention recently. Whereas it is relatively straightforward to obtain a fused image, e.g. a simple but crude method is to average the input signals, assessing the performance of fusion algorithms is much harder in practice. This is particularly true in widespread “fusion for display” applications, where multisensor images are fused and the resulting image is presented to a human operator. As recent studies have shown, the most direct and reliable image fusion evaluation method, subjective tests with a representative sample of potential users, is expensive in terms of both time/effort and equipment required. This paper presents an investigation into the application of Visible Differences Prediction modelling to the objective evaluation of the performance of fusion algorithms. Thus, given a pair of input images and the resulting fused image, the Visible Differences Prediction process evaluates the probability that a signal difference between each of the inputs and the fused image can be detected by the human visual system. The resulting probability maps are used to form objective fusion performance metrics and are also integrated with more complex fusion performance measures. Experimental results indicate that the inclusion of visible-differences information in fusion assessment yields metrics whose accuracy, with reference to subjective results, is superior to that obtained from state-of-the-art objective fusion performance measures.
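A full Visible Differences Predictor models contrast sensitivity and masking; the core idea of turning input-vs-fused differences into per-pixel detection probabilities and then into a scalar metric can nonetheless be sketched with a simple psychometric function. The `threshold` and `beta` constants here are illustrative stand-ins, not values from the paper.

```python
import numpy as np

def detection_probability(ref, test, threshold=2.0, beta=3.5):
    """Per-pixel probability that a ref/test difference is visible,
    via a simple psychometric function (a crude stand-in for a full
    Visible Differences Predictor)."""
    d = np.abs(np.asarray(ref, dtype=float) - np.asarray(test, dtype=float))
    return 1.0 - np.exp(-(d / threshold) ** beta)

def fusion_score(in1, in2, fused):
    """Higher is better: visible differences against either input
    count against the fused image."""
    p = 0.5 * (detection_probability(in1, fused) +
               detection_probability(in2, fused))
    return 1.0 - p.mean()
```

A fused image identical to both inputs scores 1.0, and the score decays as the fused result drifts visibly away from either input, which is the behaviour the paper's probability-map metrics formalize.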

01 Jan 2004
TL;DR: Foveated image and video coding systems achieve increased compression efficiency by removing considerable high-frequency information redundancy from the regions away from the fixation point without significant loss of the reconstructed image or video quality.
Abstract: The human visual system (HVS) is highly space-variant in sampling, coding, processing, and understanding of visual information. The visual sensitivity is highest at the point of fixation and decreases dramatically with distance from the point of fixation. By taking advantage of this phenomenon, foveated image and video coding systems achieve increased compression efficiency by removing considerable high-frequency information redundancy from the regions away from the fixation point without significant loss of the reconstructed image or video quality.
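The space-variant removal of high frequencies described above can be sketched by blending the image with a heavily low-passed copy, weighting the blurred copy more as eccentricity from the fixation point grows. The falloff rate and low-pass cutoff below are illustrative constants, not parameters from the paper.

```python
import numpy as np

def foveate(img, fix_y, fix_x, half_life=0.15, cutoff=0.05):
    """Crude foveation: full resolution at the fixation point,
    progressively low-passed with eccentricity.
    half_life - eccentricity falloff, as a fraction of the image diagonal
    cutoff   - low-pass cutoff in cycles/pixel"""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # heavy low-pass: keep only spatial frequencies below the cutoff
    lowpass = np.real(np.fft.ifft2(np.fft.fft2(img) * (np.hypot(fy, fx) < cutoff)))
    y, x = np.indices((h, w))
    ecc = np.hypot(y - fix_y, x - fix_x) / np.hypot(h, w)
    alpha = np.exp(-ecc / half_life)      # 1 at fixation, -> 0 with distance
    return alpha * img + (1.0 - alpha) * lowpass
```

A coder can then quantize the foveated result more coarsely in the periphery, since the removed high frequencies were below the local visual sensitivity anyway.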

Proceedings ArticleDOI
23 Jan 2004
TL;DR: A Gabor function based filter bank is used to separate the text and the nontext areas of comparable size and the technique is shown to work efficiently on different kinds of scanned document images, camera captured document images and sometimes on scenic images.
Abstract: Extraction of text areas is a necessary first step when taking a complex document image through a character recognition task. In digital libraries, such OCR'ed text facilitates access to the document page image through keyword search. Gabor filters, known to simulate certain characteristics of the human visual system (HVS), have been employed for this task by a large number of researchers for scanned document images. Adapting such a scheme to camera-based document images is a relatively new approach. Moreover, designing appropriate filters to separate text areas, which are assumed to be rich in high-frequency components, from nontext areas is a difficult task. The difficulty increases if the clutter is also rich in high-frequency components. Other reported works on separating text from nontext areas have used geometrical/structural information such as the shape and size of regions in binarized document images. In this work, we have used a combination of the above-mentioned approaches. We use connected component analysis (CCA) in binarized images to segment nontext areas based on the size of the connected regions. A Gabor-function-based filter bank is used to separate text and nontext areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images, camera-captured document images, and sometimes on scenic images.
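The Gabor-bank stage of such a pipeline can be sketched as follows: build even Gabor kernels over a few frequencies and orientations, convolve, and sum the squared responses, so that high-frequency (text-like) regions light up. The frequencies, orientation count, and kernel size below are illustrative choices, not the paper's actual filter design.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=4.0, size=21):
    """Even (cosine) Gabor kernel at spatial frequency `freq` in
    cycles/pixel and orientation `theta` in radians."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * xr)

def gabor_energy(img, freqs=(0.15, 0.3), n_orient=4):
    """Summed squared responses of a small Gabor bank; text regions,
    rich in high frequencies, tend to respond strongly."""
    h, w = img.shape
    energy = np.zeros((h, w))
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, np.pi * k / n_orient)
            # frequency-domain (circular) convolution, kernel zero-padded
            padded = np.zeros((h, w))
            padded[:kern.shape[0], :kern.shape[1]] = kern
            resp = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))
            energy += resp ** 2
    return energy
```

Thresholding the energy map would give candidate text regions; in the paper's pipeline this is combined with CCA-based size filtering of the binarized image to reject nontext clutter of different scale.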