
Showing papers in "Signal Processing: Image Communication" in 2006


Journal ArticleDOI
TL;DR: A real-time FTV system is constructed, covering the complete chain from capture to display; a new algorithm is developed to generate free-viewpoint images, as a step towards ray-based image engineering through the development of FTV.
Abstract: We have been developing ray-based 3D information systems that consist of ray acquisition, ray processing, and ray display. Free viewpoint television (FTV) based on the ray-space method is a typical example. FTV will bring an epoch-making change in the history of television because it enables us to view a distant 3D world freely by changing our viewpoints as if we were there. We constructed a real-time FTV including the complete chain from capturing to display. A new algorithm was developed to generate free viewpoint images. In addition, a new user interface is presented for FTV to make full use of 3D information. FTV is not a pixel-based system but a ray-based system. We are creating ray-based image engineering through the development of FTV.

261 citations


Journal ArticleDOI
TL;DR: Three main types of relevance feedback algorithms are investigated: the Euclidean, the query point movement and the correlation-based approaches; a new objective criterion, called the average normalized similarity metric distance, is introduced, which exploits the difference between the actual and the ideal similarity measures over all best retrievals.
Abstract: Multimedia content modeling, i.e., the identification of semantically meaningful entities, is an arduous task, mainly because (a) humans perceive content using high-level concepts and (b) human perception is subjective, often interpreting the same content in different ways at different times. For this reason, an efficient content management system has to be adapted to the current user's information needs and preferences through an on-line learning strategy based on user interaction. One adaptive learning strategy is relevance feedback, originally developed in traditional text-based information retrieval systems. In this approach, the user interacts with the system to provide information about the relevance of the content, which is then fed back to the system to update its performance. In this paper, we evaluate and investigate three main types of relevance feedback algorithms: the Euclidean, the query point movement and the correlation-based approaches. In the first case, we examine heuristic and optimal techniques which are based either on the weighted or on the generalized Euclidean distance. In the second case, we survey single- and multipoint query movement schemes. As far as the third type is concerned, two different ways of parametrizing the normalized cross-correlation similarity metric are proposed. The first scales only the elements of the query feature vector and is called the query-scaling strategy, while the second scales both the query and the selected samples (the query-sample scaling strategy). All the examined algorithms are evaluated using both subjective and objective criteria. Subjective evaluation is performed by depicting the best retrieved images as the system's response to a user's query. Objective evaluation uses standard criteria, such as the precision–recall curve and the average normalized modified retrieval rank (ANMRR). Furthermore, a new objective criterion, called the average normalized similarity metric distance, is introduced, which exploits the difference between the actual and the ideal similarity measures over all best retrievals. Discussions and comparisons of all the aforementioned relevance feedback algorithms are presented.
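As a concrete illustration of the query point movement family surveyed above, the sketch below implements a single-point, Rocchio-style update that shifts the query feature vector toward user-marked relevant samples and away from irrelevant ones; the function name and the default weights are illustrative choices, not values from the paper.

```python
import numpy as np

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Single-point query movement: move the query toward the centroid of the
    relevant feedback samples and away from the centroid of the irrelevant ones."""
    query = np.asarray(query, dtype=float)
    new_query = alpha * query
    if len(relevant):
        new_query += beta * np.mean(np.atleast_2d(relevant), axis=0)
    if len(irrelevant):
        new_query -= gamma * np.mean(np.atleast_2d(irrelevant), axis=0)
    return new_query
```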

74 citations


Journal ArticleDOI
TL;DR: The proposed visual quality metric is based on an effective Human Visual System model and relies on the computation of three distortion factors: blockiness, edge errors and visual impairments, which take into account the typical artifacts introduced by several classes of coders.
Abstract: In this paper, a multi-factor full-reference image quality index is presented. The proposed visual quality metric is based on an effective Human Visual System model. Images are pre-processed in order to take into account luminance masking and contrast sensitivity effects. The proposed metric relies on the computation of three distortion factors: blockiness, edge errors and visual impairments, which take into account the typical artifacts introduced by several classes of coders. A pooling algorithm is used in order to obtain a single distortion index. Results show the effectiveness of the proposed approach and its consistency with subjective evaluations.
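A minimal sketch of the pooling step mentioned above: three per-image distortion factors are combined into a single index with a Minkowski-type sum. The weights and exponent are placeholders; the paper's actual factor computations and pooling parameters are not reproduced here.

```python
import numpy as np

def pooled_distortion(blockiness, edge_error, visual_impairment,
                      weights=(1.0, 1.0, 1.0), p=2.0):
    """Combine three distortion factors into one index via Minkowski pooling."""
    factors = np.array([blockiness, edge_error, visual_impairment], dtype=float)
    w = np.array(weights, dtype=float)
    return float(np.sum(w * factors ** p) ** (1.0 / p))
```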

64 citations


Journal ArticleDOI
TL;DR: A theoretical framework to analyze the rate-distortion performance of a light field coding and streaming system is proposed, revealing that the efficiency gains from more accurate geometry increase as the correlation between images increases.
Abstract: A theoretical framework to analyze the rate-distortion performance of a light field coding and streaming system is proposed. This framework takes into account the statistical properties of the light field images, the accuracy of the geometry information used in disparity compensation, and the prediction dependency structure or transform used to exploit correlation among views. Using this framework, the effect that various parameters have on compression efficiency is studied. The framework reveals that the efficiency gains from more accurate geometry increase as the correlation between images increases. The coding gains due to prediction suggested by the framework match those observed from experimental results. This framework is also used to study the performance of light field streaming by deriving a view-trajectory-dependent rate-distortion function. Simulation results show that the streaming results depend on both the prediction structure and the viewing trajectory. For instance, independent coding of images gives the best streaming performance for certain view trajectories. These and other trends described by the simulation results agree qualitatively with actual experimental streaming results.

41 citations


Journal ArticleDOI
TL;DR: A new image-based rendering method is presented that uses input from an array of cameras and synthesizes high-quality free-viewpoint images in real time; the focus measurement scheme is also discussed in both the spatial and frequency domains.
Abstract: This paper introduces a new image-based rendering method that uses input from an array of cameras and synthesizes high-quality free-viewpoint images in real-time. The input cameras can be roughly arranged, if they are calibrated in advance. Our method uses a set of depth layers to deal with scenes with large depth ranges, but does not require prior knowledge of the scene geometry. Instead, during the on-the-fly process, the optimal depth layer is automatically assigned to each pixel on the synthesized image by using our focus measurement scheme. We implemented the rendering method and achieved nearly interactive frame rates on a commodity PC. This paper also discusses the focus measurement scheme in both spatial and frequency domains. The discussion in the spatial domain is practical since it can be applied for arbitrary camera arrays. On the other hand, the frequency domain analysis is theoretically interesting since it proves that a signal-processing theory is applicable to the depth assignment problem.

41 citations


Journal ArticleDOI
TL;DR: An RD-optimized dynamic 3D mesh coder that includes different prediction modes as well as an RD cost computation that controls the mode selection across all possible spatial partitions of a mesh to find the clustering structure together with the associated prediction modes is presented.
Abstract: Compression of computer graphics data such as static and dynamic 3D meshes has received significant attention in recent years, since new applications require transmission over channels and storage on media with limited capacity. This includes pure graphics applications (virtual reality, games) as well as 3DTV and free viewpoint video. Efficient compression algorithms have been developed first for static 3D meshes, and later for dynamic 3D meshes and animations. Standard formats are available for instance in MPEG-4 3D mesh compression for static meshes, and Interpolator Compression for the animation part. For some important types of 3D objects, e.g. human head or body models, facial and body animation parameters have been introduced. Recent results for compression of general dynamic meshes have shown that the statistical dependencies within a mesh sequence can be exploited well by predictive coding approaches. Coders introduced so far use experimentally determined or heuristic thresholds for tuning the algorithms. In video coding, rate-distortion (RD) optimization is often used to avoid fixed thresholds and to select the optimum prediction mode. We applied these ideas and present here an RD-optimized dynamic 3D mesh coder. It includes different prediction modes as well as an RD cost computation that controls the mode selection across all possible spatial partitions of a mesh to find the clustering structure together with the associated prediction modes. The general coding structure is derived from statistical analysis of mesh sequences and exploits temporal as well as spatial mesh dependencies. To evaluate the coding efficiency of the developed coder, comparative coding experiments on mesh sequences at different resolutions were carried out.
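The RD cost computation mentioned above follows the standard Lagrangian form used in video coding; the tiny sketch below shows mode selection by minimizing J = D + λR over a set of candidate prediction modes. The candidate list and λ value are purely illustrative.

```python
def select_mode(candidates, lam):
    """Pick the prediction mode minimizing the Lagrangian cost J = D + lam * R.
    `candidates` is a list of (mode_name, distortion, rate_in_bits) tuples."""
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate in candidates:
        cost = distortion + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Example: three hypothetical prediction modes for one mesh cluster.
modes = [("static", 4.2, 120), ("delta", 1.8, 260), ("affine", 0.9, 540)]
print(select_mode(modes, lam=0.01))
```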

38 citations


Journal ArticleDOI
TL;DR: A continuous optimization approach to solve the MSSI problem and a very effective dynamic programming algorithm to measure the similarity between the attributed nodes are adapted for shape-based matching of multi-object images.
Abstract: We aim at developing a geometry-based retrieval system for multi-object images. We model both shape and topology of image objects including holes using a structured representation called curvature tree (CT); the hierarchy of the CT reflects the inclusion relationships between the objects and holes. To facilitate shape-based matching, triangle-area representation (TAR) of each object and hole is stored at the corresponding node in the CT. The similarity between two multi-object images is measured based on the maximum similarity subtree isomorphism (MSSI) between their CTs. For this purpose, we adapt a continuous optimization approach to solve the MSSI problem and a very effective dynamic programming algorithm to measure the similarity between the attributed nodes. Our matching scheme agrees with many recent findings in psychology about the human perception of multi-object images. Experiments on a database of 1500 logos and the MPEG-7 CE-1 database of 1400 shape images have shown the significance of the proposed method.
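The triangle-area representation (TAR) stored at each curvature-tree node can be sketched as follows: for every contour point, the signed area of the triangle formed with its two neighbours at a given separation is computed, which distinguishes convex, concave and straight boundary segments. This is the generic TAR formulation; any normalization or multi-scale handling used in the paper is omitted.

```python
import numpy as np

def triangle_area_representation(contour, ts):
    """Signed triangle areas for each point of a closed contour, using the
    neighbours at separation `ts`. `contour` is an (N, 2) array of points."""
    n = len(contour)
    tar = np.empty(n)
    for i in range(n):
        p1 = contour[(i - ts) % n]
        p2 = contour[i]
        p3 = contour[(i + ts) % n]
        # Signed area: positive/negative/zero for convex/concave/straight segments.
        tar[i] = 0.5 * (p1[0] * (p2[1] - p3[1])
                        + p2[0] * (p3[1] - p1[1])
                        + p3[0] * (p1[1] - p2[1]))
    return tar
```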

37 citations


Journal ArticleDOI
TL;DR: This paper first analyse the sign language viewer's eye-gaze, based on the results of an eye-tracking study that was conducted, as well as the video content involved in sign language person-to-person communication, and proposes a sign language video coding system using foveated processing.
Abstract: The ability to communicate remotely through the use of video, as promised by wireless networks and already practised over fixed networks, is as important for deaf people as voice telephony is for hearing people. Sign languages are visual–spatial languages and as such demand good image quality for interaction and understanding. In this paper, we first analyse the sign language viewer's eye-gaze, based on the results of an eye-tracking study that we conducted, as well as the video content involved in sign language person-to-person communication. Based on this analysis we propose a sign language video coding system using foveated processing, which can lead to bit rate savings without compromising the comprehension of the coded sequence or, equivalently, produce a coded sequence with higher comprehension value at the same bit rate. We support this claim with the results of an initial comprehension assessment trial of such coded sequences by deaf users. The proposed system constitutes a new paradigm for coding sign language image sequences at limited bit rates.

34 citations


Journal ArticleDOI
TL;DR: Experimental results show the ability of the system to detect tampering and to limit the peak error between the original and the processed images.
Abstract: A system is presented to jointly achieve image watermarking and compression. The watermark is a fragile one, being intended for authentication purposes. The watermarked and compressed images are fully compliant with the JPEG-LS standard, the only price to pay being a slight reduction of compression efficiency and an additional distortion that can anyway be tuned to grant a maximum preset error. Watermark detection is possible both in the compressed and in the pixel domain, thus increasing the flexibility and usability of the system. The system is expressly designed to be used in remote sensing and telemedicine applications, hence we designed it in such a way that the maximum compression and watermarking error can be strictly controlled (near-lossless compression and watermarking). Experimental results show the ability of the system to detect tampering and to limit the peak error between the original and the processed images.

34 citations


Journal ArticleDOI
TL;DR: This paper presents a new multiple image view synthesis algorithm for novel view creation that requires only implicit scene geometry information and identifies and selects only the best quality surface areas from available reference images, thereby reducing perceptual errors in virtual view reconstruction.
Abstract: Interactive audio-visual applications such as free viewpoint video (FVV) endeavour to provide unrestricted spatio-temporal navigation within a multiple camera environment. Current novel view creation approaches for scene navigation within FVV applications are either purely image-based, implying large information redundancy and dense sampling of the scene; or involve reconstructing complex 3-D models of the scene. In this paper we present a new multiple image view synthesis algorithm for novel view creation that requires only implicit scene geometry information. The multi-view synthesis approach can be used in any multiple camera environment and is scalable, as virtual views can be created given 1 to N of the available video inputs, providing a means to gracefully handle scenarios where camera inputs decrease or increase over time. The algorithm identifies and selects only the best quality surface areas from available reference images, thereby reducing perceptual errors in virtual view reconstruction. Experimental results are provided and verified using both objective (PSNR) and subjective comparisons and also the improvements over the traditional multiple image view synthesis approach of view-oriented weighting are presented.

32 citations


Journal ArticleDOI
TL;DR: Experimental results show that good shadow and object contours and light source locations are obtained with the proposed method even if the theoretical assumptions are not fully valid.
Abstract: This paper proposes a new method which allows a joint estimation of the light source projection on the image plane and the segmentation of moving cast shadows in natural video sequences. It improves the segmentation of moving objects by clearly separating cast shadows from the moving objects themselves. The method is based on a shadow model which mainly assumes that the cast shadows are projected onto planar, Lambertian surfaces, and that the light source is unique. The moving cast shadows, including the penumbra, are detected using a segmentation method based on a comparison between a reference image and the original one. The light source position is estimated using geometrical relations linking the light source, the object and its cast shadow on the 2-D image plane. This is obtained using a robust temporal filtering method. For each image, using the current estimation of the light source position and the video object contours, a cast shadow search area is defined. This reduces the risk of false detections during the segmentation process, and thus increases the detection rate while reducing the false-alarm rate. Experimental results show that good shadow and object contours and light source locations are obtained with the proposed method even if the theoretical assumptions are not fully valid.

Journal ArticleDOI
TL;DR: ICA provides an excellent tool for learning a coder for a specific image class, which can even be done using a single image from that class, and generalizes very well for a wide range of image classes.
Abstract: This paper addresses the use of independent component analysis (ICA) for image compression. Our goal is to study the adequacy (for lossy transform compression) of bases learned from data using ICA. Since these bases are, in general, non-orthogonal, two methods are considered to obtain image representations: matching pursuit type algorithms and orthogonalization of the ICA bases followed by standard orthogonal projection. Several coder architectures are evaluated and compared, using both the usual SNR and a perceptual quality measure called the picture quality scale. We consider four classes of images (natural, faces, fingerprints, and synthetic) to study the generalization and adaptation abilities of the data-dependent ICA bases. In this study, we have observed that bases learned from natural images generalize well to other classes of images, while bases learned from the other specific classes show good specialization. For example, for fingerprint images, our coders perform close to the special-purpose WSQ coder developed by the FBI. For some classes, the visual quality of the images obtained with our coders is similar to that obtained with JPEG2000, which is currently the state-of-the-art coder and much more sophisticated than a simple transform coder. We conclude that ICA provides an excellent tool for learning a coder for a specific image class, which can even be done using a single image from that class. This is an alternative to hand-tailoring a coder for a given class (as was done, for example, in the WSQ for fingerprint images). Another conclusion is that a coder learned from natural images acts like a universal coder, that is, it generalizes very well over a wide range of image classes.

Journal ArticleDOI
TL;DR: A fast block-matching algorithm based on search center prediction and early search termination, called the center-prediction and early-termination based motion search algorithm (CPETS), which achieves high performance, is well suited to efficient VLSI implementation and outperforms some popular fast algorithms.
Abstract: In this paper, we propose a fast block-matching algorithm based on search center prediction and early search termination, called the center-prediction and early-termination based motion search algorithm (CPETS). CPETS achieves high performance while remaining well suited to efficient VLSI implementation. It makes use of the spatial and temporal correlation in motion vector (MV) fields and of the features of all-zero blocks to accelerate the search process. This paper describes CPETS at three levels. At the coarsest level, which is used when center prediction fails, the search area is defined to enclose the entire original search range. At the middle level, the search area is a 7×7-pel square around the predicted center. At the finest level, a 5×5-pel search area around the predicted center is adopted. At each level, a uniformly allocated 9-point search pattern is used. The experimental results show that CPETS achieves a 95.67% reduction in encoding time on average compared with the full-search scheme, with negligible peak signal-to-noise ratio (PSNR) loss and bitrate increase. CPETS also clearly outperforms popular fast algorithms such as three-step search, new three-step search and four-step search. This paper also describes an efficient four-way pipelined VLSI architecture based on CPETS for H.264/AVC coding. The proposed architecture divides the current block and the search area into four sub-regions with 4:1 sub-sampling and processes them in parallel. Each sub-region is processed by a pipelined structure so that the nine candidate points are searched simultaneously. By adopting the early-termination strategy, the architecture can compute one MV for a 16×16 block in 81 clock cycles in the best case and 901 clock cycles in the worst case. The architecture has been designed and simulated in VHDL. Simulation results show that the proposed architecture achieves high performance for real-time motion estimation: only 47.3 K gates and 1624×8 bits of on-chip RAM are needed for a search range of (−15, +15) with three reference frames and four candidate block modes, using 36 processing elements.
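The three-level search strategy described above can be sketched as follows: the search starts at a center predicted from neighbouring motion vectors, terminates early when the matching cost is already small (as for all-zero blocks), and otherwise refines within a small window around the prediction. The SAD cost, the early-termination threshold and the window radius below are assumptions for illustration, not the CPETS parameters.

```python
import numpy as np

def sad(cur, ref, bx, by, mvx, mvy, bs=16):
    """Sum of absolute differences between the current block and the displaced
    block in the reference frame (inf if the candidate falls out of bounds)."""
    h, w = ref.shape
    x, y = bx + mvx, by + mvy
    if x < 0 or y < 0 or x + bs > w or y + bs > h:
        return float("inf")
    return float(np.abs(cur[by:by+bs, bx:bx+bs].astype(int)
                        - ref[y:y+bs, x:x+bs].astype(int)).sum())

def center_predicted_search(cur, ref, bx, by, neighbor_mvs,
                            bs=16, early_stop=512, radius=2):
    """Start at the median of the neighbouring MVs, stop early if the cost is
    already small, otherwise refine in a (2*radius+1)^2 window around it."""
    if neighbor_mvs:
        pred = tuple(int(v) for v in np.median(np.asarray(neighbor_mvs), axis=0))
    else:
        pred = (0, 0)
    best_mv = pred
    best_cost = sad(cur, ref, bx, by, pred[0], pred[1], bs)
    if best_cost <= early_stop:            # early termination
        return best_mv, best_cost
    for dy in range(-radius, radius + 1):  # fine refinement around the center
        for dx in range(-radius, radius + 1):
            mv = (pred[0] + dx, pred[1] + dy)
            cost = sad(cur, ref, bx, by, mv[0], mv[1], bs)
            if cost < best_cost:
                best_mv, best_cost = mv, cost
    return best_mv, best_cost
```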

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method can produce good scene models from a small set of widely separated images and synthesize novel views in good quality.
Abstract: In this paper, we present a method for modeling a complex scene from a small set of input images taken from widely separated viewpoints and then synthesizing novel views. First, we find sparse correspondences across multiple input images and calibrate these input images taken with unknown cameras. Then one of the input images is chosen as the reference image for modeling by match propagation. A sparse set of reliably matched pixels in the reference image is initially selected and then propagated to neighboring pixels based on both the clustering-based light invariant photoconsistency constraint and the data-driven depth smoothness constraint, which are integrated into a pixel matching quality function to efficiently deal with occlusions, light changes and depth discontinuity. Finally, a novel view rendering algorithm is developed to fast synthesize a novel view by match propagation again. Experimental results show that the proposed method can produce good scene models from a small set of widely separated images and synthesize novel views in good quality.

Journal ArticleDOI
TL;DR: A new marker-based segmentation algorithm relying on disjoint set union is proposed in this paper, which consists of three steps, namely: pixel sorting, set union, and pixel resolving.
Abstract: Marker-based image segmentation has been widely used in image analysis and understanding. The well-known Meyer's marker-based watershed algorithm by immersion is realized using hierarchical circular queues. A new marker-based segmentation algorithm relying on disjoint set union is proposed in this paper. It consists of three steps, namely: pixel sorting, set union, and pixel resolving. The memory requirement for the proposed algorithm is fixed at 2×N integers (N is the image size), whereas the memory requirement for Meyer's algorithm is image dependent. The advantage of the proposed algorithm lies in its regularity and simplicity in software/firmware/hardware implementation.
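A hedged sketch of the three-step (sort, union, resolve) organisation is given below, using a standard union-find structure: pixels are visited in increasing gradient order, merged with already-processed neighbours unless two different marker labels would collide, and finally resolved to the label of their representative. This illustrates the disjoint-set idea only; the paper's exact memory layout and tie-breaking rules are not reproduced.

```python
import numpy as np

def marker_segmentation(gradient, markers):
    """gradient: 2-D array; markers: same shape, 0 = unlabelled, >0 = marker label."""
    h, w = gradient.shape
    n = h * w
    labels = markers.astype(int).flatten()
    parent = np.arange(n)

    def find(i):                                 # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = np.argsort(gradient, axis=None, kind="stable")   # step 1: pixel sorting
    processed = np.zeros(n, dtype=bool)
    for p in order:                                          # step 2: set union
        y, x = divmod(p, w)
        for q in (p - 1 if x > 0 else -1, p + 1 if x < w - 1 else -1,
                  p - w if y > 0 else -1, p + w if y < h - 1 else -1):
            if q < 0 or not processed[q]:
                continue
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            lp, lq = labels[rp], labels[rq]
            if lp == 0 or lq == 0 or lp == lq:   # merge unless two markers collide
                parent[rp] = rq
                if labels[rq] == 0:
                    labels[rq] = lp
        processed[p] = True
    out = np.array([labels[find(p)] for p in range(n)])      # step 3: pixel resolving
    return out.reshape(h, w)
```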

Journal ArticleDOI
TL;DR: Simulation results show that, over a wide range of channel signal-to-noise ratio (SNR), the combined technique is superior to non-scalable transmission and outperforms UEP with turbo coding alone.
Abstract: This paper investigates the unequal error protected (UEP) transmission of scalable H.264 bitstreams with two priority layers, where differentiated turbo coding provides better protection for the high-priority (HP) base layer than for the low-priority (LP) enhancement layer. The drawback of such a method is the high overhead introduced by the channel coding, which results in a low source data rate for the HP layer, and hence lowers video quality. To overcome this problem, we introduce an efficient combination of turbo coding and hierarchical quadrature amplitude modulation (HQAM) to provide high protection for the HP layer while maintaining the requisite channel-coding redundancy. Simulation results show that, over a wide range of channel signal-to-noise ratio (SNR), our combined technique is superior to non-scalable transmission and outperforms UEP with turbo coding alone.
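The hierarchical modulation idea can be illustrated with a hierarchical 16-QAM mapper: the two high-priority bits select the quadrant and the two low-priority bits select the point inside it, so enlarging the quadrant spacing relative to the intra-quadrant spacing gives the HP bits a lower error rate. The bit ordering and distances below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def hqam16_map(hp_bits, lp_bits, d1=2.0, d2=0.5):
    """Map pairs of HP and LP bits to hierarchical 16-QAM symbols.
    hp_bits, lp_bits: flat arrays of 0/1 values, two bits per symbol each."""
    hp = np.asarray(hp_bits).reshape(-1, 2)
    lp = np.asarray(lp_bits).reshape(-1, 2)
    i = (1 - 2 * hp[:, 0]) * d1 + (1 - 2 * lp[:, 0]) * d2   # in-phase component
    q = (1 - 2 * hp[:, 1]) * d1 + (1 - 2 * lp[:, 1]) * d2   # quadrature component
    return i + 1j * q

# With d1 = 2.0 and d2 = 0.5 the HP bits see a larger minimum distance than the
# LP bits, i.e. the base layer is better protected on a noisy channel.
```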

Journal ArticleDOI
TL;DR: Experimental results show that the compression performance of the proposed coding algorithm is competitive to other wavelet-based image coding algorithms reported in the literature.
Abstract: In this paper, a new wavelet transform image coding algorithm is presented. The discrete wavelet transform (DWT) is applied to the original image. The DWT coefficients are firstly quantized with a uniform scalar dead zone quantizer. Then the quantized coefficients are decomposed into four symbol streams: a binary significance map symbol stream, a binary sign stream, a position of the most significant bit (PMSB) symbol stream and a residual bit stream. An adaptive arithmetic coder with different context models is employed for the entropy coding of these symbol streams. Experimental results show that the compression performance of the proposed coding algorithm is competitive to other wavelet-based image coding algorithms reported in the literature.
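The quantization and symbol decomposition steps can be sketched as below: a uniform scalar dead-zone quantizer produces indices whose magnitudes are then split into a significance map, sign bits, the position of the most significant bit (PMSB) and residual bits. The exact bit packing and context modelling of the paper are not reproduced; names here are illustrative.

```python
import numpy as np

def deadzone_quantize(coeffs, step):
    """Uniform scalar dead-zone quantizer: q = sign(c) * floor(|c| / step),
    which makes the zero bin twice as wide as the other bins."""
    return (np.sign(coeffs) * np.floor(np.abs(coeffs) / step)).astype(int)

def decompose_symbols(q):
    """Split quantized indices into the four streams named above: significance
    map, signs, PMSB positions and residual bits (packing details assumed)."""
    mag = np.abs(q).astype(int)
    significance = (mag > 0).astype(np.uint8)       # binary significance map
    signs = (q[mag > 0] < 0).astype(np.uint8)       # one sign bit per significant coeff
    sig_mag = mag[mag > 0]
    pmsb = np.floor(np.log2(sig_mag)).astype(int)   # position of the most significant bit
    residual = sig_mag - 2 ** pmsb                  # magnitude bits below the MSB
    return significance, signs, pmsb, residual
```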

Journal ArticleDOI
TL;DR: Experiments carried out on several multispectral images show that the resulting unsupervised coder has a fully acceptable complexity, and a rate-distortion performance which is superior to that of the original supervised coder and comparable to that of the best coders known in the literature.
Abstract: Compression of remote-sensing images can be necessary in various stages of the image life, and especially on-board a satellite before transmission to the ground station. Although on-board CPU power is quite limited, it is now possible to implement sophisticated real-time compression techniques, provided that complexity constraints are taken into account at design time. In this paper we consider the class-based multispectral image coder originally proposed in [Gelli and Poggi, Compression of multispectral images by spectral classification and transform coding, IEEE Trans. Image Process. (April 1999) 476-489 [5]] and modify it to allow its use in real time with limited hardware resources. Experiments carried out on several multispectral images show that the resulting unsupervised coder has a fully acceptable complexity, and a rate-distortion performance which is superior to that of the original supervised coder, and comparable to that of the best coders known in the literature.

Journal ArticleDOI
TL;DR: Integral images for the computation of histogram, mean and variance are proposed, with which the similarity measure can be evaluated at negligible computational cost; exhaustive search for object localization is then performed efficiently, guaranteeing that the global maximum is achieved.
Abstract: The paper presents a clustering-based color model and develops a fast algorithm for object tracking. The color model is built upon K-means clustering, by which the color space of the object can be partitioned adaptively and the histogram bins can be determined accordingly. In addition, in each bin the multi-channel gray level is modelled as a Gaussian distribution. Defined in this way, the color model can describe the color distribution accurately with very few bins. To evaluate the similarity between the reference model and the candidate model, a similarity measure based on the Bhattacharyya distance is introduced and its simplified form is derived under the assumption that, in each bin, the gray-level distributions in different channels are independent of each other. Motivated by the paper of Viola and Jones, integral images for the computation of histogram, mean and variance are proposed, with which the similarity measure can be evaluated at negligible computational cost. Thus, an exhaustive search for object localization can be carried out efficiently, which guarantees that the global maximum is achieved. Comparisons with the well-known mean shift algorithm demonstrate that the proposed algorithm has better performance at the same (or lower) computational cost.
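The integral-image trick for constant-time window histograms can be sketched as follows: one cumulative-sum image per colour bin allows the histogram of any candidate window to be read off with four array look-ups per bin. The per-pixel bin labels are assumed to come from the K-means clustering step; function names are illustrative.

```python
import numpy as np

def bin_integral_images(label_map, n_bins):
    """Per-bin integral images: ii[b, y, x] = number of pixels with bin label b
    inside the rectangle [0, y) x [0, x)."""
    h, w = label_map.shape
    ii = np.zeros((n_bins, h + 1, w + 1), dtype=np.int32)
    for b in range(n_bins):
        ii[b, 1:, 1:] = np.cumsum(np.cumsum(label_map == b, axis=0), axis=1)
    return ii

def window_histogram(ii, y, x, hh, ww):
    """Histogram of the window with top-left (y, x) and size hh x ww,
    computed with four look-ups per bin."""
    return (ii[:, y + hh, x + ww] - ii[:, y, x + ww]
            - ii[:, y + hh, x] + ii[:, y, x])
```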

Journal ArticleDOI
TL;DR: An interactive browsing environment for 3D scenes, which allows for the dynamic optimization of selected client views by distributing available transmission resources between geometry and texture components, is considered.
Abstract: We consider an interactive browsing environment for 3D scenes, which allows for the dynamic optimization of selected client views by distributing available transmission resources between geometry and texture components. Texture information is available at a server in the form of scalably compressed images, corresponding to a multitude of original image views. Surface geometry is also available at the server in the form of scalably compressed depth maps, again corresponding to a multitude of original views. Texture and depth components are both open to augmentation as more content becomes available. At any point in the interactive browsing experience, the server must decide how to allocate transmission resources between the delivery of new elements from the various original view bit-streams and new elements from the original geometry bit-streams. The proposed framework implicitly supports dynamic view sub-sampling, based on rate-distortion criteria, since the best server policy is not always to send the nearest original view image to the one which the client is rendering. In this paper, we particularly elaborate upon a novel strategy for distortion-sensitive synthesis of both geometry and rendered imagery at the client, based upon whatever data is provided by the server. We also outline how the JPIP standard for interactive communication of JPEG2000 images, can be leveraged for the 3D scene browsing application.

Journal ArticleDOI
TL;DR: BFlavor is an efficient and harmonized description tool for enabling XML-driven adaptation of media resources in a format-agnostic way, and it outperforms BSDL and XFlavor in terms of execution times, memory consumption, and file sizes.
Abstract: During recent years, several tools have been developed that allow the automatic generation of XML descriptions containing information about the syntax of binary media resources. Such a bitstream syntax description (BSD) can then be transformed to reflect a desired adaptation of a media resource, and can subsequently be used to create a tailored version of this resource. The main contribution of this paper is the introduction of BFlavor, a new tool for exposing the syntax of binary media resources as an XML description. Its development was inspired by two other technologies, i.e. MPEG-21 BSDL and XFlavor. Although created from a different point of view, both languages offer solutions for translating the syntax of a media resource into an XML representation for further processing. BFlavor (BSDL+XFlavor) harmonizes the two technologies by combining their strengths and eliminating their weaknesses. More precisely, the processing efficiency and expressive power of XFlavor on the one hand, and the ability to create high-level BSDs using MPEG-21 BSDL on the other hand, were our key motives for its development. To assess the expressive power and performance of a BFlavor-driven content adaptation chain, several experiments were conducted. These experiments test the automatic generation of BSDs for MPEG-1 Video and H.264/AVC, as well as the exploitation of multi-layered temporal scalability in H.264/AVC. Our results show that BFlavor is an efficient and harmonized description tool for enabling XML-driven adaptation of media resources in a format-agnostic way. BSDL and XFlavor are outperformed by BFlavor in terms of execution times, memory consumption, and file sizes.

Journal ArticleDOI
TL;DR: This paper presents an efficient variable block size motion estimation algorithm for use in real-time H.264 video encoder implementation that results in over 80% reduction in the encoding time over the full-search reference implementation and around 55% improvement over the fast motion estimation algorithm (FME) of the reference implementation.
Abstract: This paper presents an efficient variable block size motion estimation algorithm for use in real-time H.264 video encoder implementation. In this recursive motion estimation algorithm, results of variable block size modes and motion vectors previously obtained for neighboring macroblocks are used in determining the best mode and motion vectors for encoding the current macroblock. Considering only a limited number of well chosen candidates helps reduce the computational complexity drastically. An additional fine search stage to refine the initially selected motion vector enhances the motion estimator accuracy and SNR performance to a value close to that of full search algorithm. The proposed methods result in over 80% reduction in the encoding time over full search reference implementation and around 55% improvement in the encoding time over the fast motion estimation algorithm (FME) of the reference implementation. The average SNR and compression performance do not show significant difference from the reference implementation. Results based on a number of video sequences are presented to demonstrate the advantage of using the proposed motion estimation technique.

Journal ArticleDOI
TL;DR: In this paper, the human visual attention mechanism directs the viewer's eye movements around the image to provide a sequence of fixations, which are analyzed, clustered and classified into regions of interest (ROI).
Abstract: Current image coding systems such as JPEG are far away from the capability of the human perceptual system in that the encoding may not maximise the reconstruction quality of image contents. Humans are often concerned with the interpretability of the image and thus enhanced reconstruction quality in image contents would facilitate improved recognition performance. This paper addresses this issue by incorporating characteristics of the human perceptual system into an image coding system. This is achieved by analysing the spatial and temporal characteristics of the human visual attention system as recorded from an eye-tracking device at the encoding end. Human visual attention mechanisms direct the viewer's eye movements around the image to provide a sequence of fixations, which are analysed, clustered and classified into regions of interest (ROI). These ROIs are used to selectively encode and prioritise regions such that an improved image content recognition performance can be achieved.

Journal ArticleDOI
TL;DR: This paper proposes a spatial error concealment method that uses edge-related information in order not only to preserve existing edges but also to avoid introducing new strong ones by switching to a smooth approximation of missing information where necessary.
Abstract: Video transmission over error-prone networks can suffer from packet erasures which can greatly reduce the quality of the received video. Error concealment methods reduce the perceived quality degradation at the receiving end by masking the effects of such errors. They accomplish this by exploiting temporal and spatial correlations that exist in image sequences. Spatial error concealment approaches conceal errors by making use of spatial information only, which is necessary in cases where motion information is not available or reliable. The performance of such methods can be greatly increased if perceptual considerations are taken into account, e.g., the preservation of edge information. This paper proposes a spatial error concealment method that uses edge-related information in order not only to preserve existing edges but also to avoid introducing new strong ones, by switching to a smooth approximation of the missing information where necessary. A novel switching algorithm which uses the directional entropy of neighbouring edges chooses between two interpolation methods: a directional interpolation along detected edges or a bilinear interpolation using the nearest neighbouring pixels. Results show that the performance of the proposed method is better compared to both ‘single interpolation’ and edge strength-based switching methods.
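A hedged sketch of the switching rule follows: the entropy of the edge-direction histogram around the lost block decides between directional and bilinear concealment, and a simple bilinear interpolation from the four nearest received boundary pixels is shown as the fall-back. The entropy threshold and the weighting are illustrative values, not those of the paper.

```python
import numpy as np

def directional_entropy(angles, n_bins=8):
    """Entropy of the edge-direction histogram around a lost block
    (low entropy = one dominant edge direction)."""
    hist, _ = np.histogram(np.asarray(angles) % np.pi, bins=n_bins, range=(0, np.pi))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def choose_mode(neighbour_edge_angles, threshold=2.0):
    """Switch: directional interpolation if there is one clear dominant edge,
    bilinear interpolation otherwise."""
    return "directional" if directional_entropy(neighbour_edge_angles) < threshold else "bilinear"

def bilinear_conceal(top, bottom, left, right, wy, wx):
    """Bilinear estimate of a missing pixel from the four nearest received
    boundary pixels; wy, wx in [0, 1] are the normalized vertical/horizontal
    positions inside the lost block."""
    return 0.5 * ((1 - wy) * top + wy * bottom) + 0.5 * ((1 - wx) * left + wx * right)
```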

Journal ArticleDOI
TL;DR: Simulation results show the efficacy of the proposed transmission system using hybrid unequal-error-protection and selective-retransmission for 3D meshes which are encoded with multi-resolutions in reducing transmission latency and providing smooth performance for interactive applications.
Abstract: Three-dimensional (3D) meshes are used intensively in distributed graphics applications where model data are transmitted on demand to users’ terminals and rendered for interactive manipulation. For real-time rendering and high-resolution visualization, the transmission system should adapt to both data properties and transport link characteristics while providing scalability to accommodate terminals with disparate rendering capabilities. This paper presents a transmission system using hybrid unequal-error-protection and selective-retransmission for 3D meshes which are encoded with multiple resolutions. Based on the distortion-rate performance of the 3D data, the end-to-end channel statistics and the network parameters, transmission policies that maximize the service quality under a client-specific constraint are determined with linear computational complexity. A TCP-friendly protocol is utilized to further provide performance stability over time as well as bandwidth fairness for parallel flows in the network. Simulation results show the efficacy of the proposed transmission system in reducing transmission latency and providing smooth performance for interactive applications. For example, for a fixed rendering quality, the proposed system achieves a 20–30% reduction in transmission latency compared to the system based on 3TP, a recently presented 3D application protocol using hybrid TCP and UDP.

Journal ArticleDOI
TL;DR: A general two-stage multiple description coding (MDC) scheme using whitening transform is analyzed and identifies the importance of a good coarse approximation and explores different approaches for changing its resolution and coding it.
Abstract: A general two-stage multiple description coding (MDC) scheme using a whitening transform is analyzed. It represents the original image as a coarse image approximation and a residual image. The coarse approximation is subsequently duplicated and combined with the residual image, which is further split into two descriptions using a checkerboard rearrangement of the block transform coefficients. We identify the importance of a good coarse approximation and explore different approaches for changing its resolution and coding it. We also propose different approaches for coding the residual signal. The coding scheme is quite simple and yet achieves high performance, comparable with other MDC methods.
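One plausible reading of the checkerboard rearrangement is sketched below: on the block grid of the residual image's transform coefficients, blocks of one checkerboard colour go to one description and blocks of the other colour to the second, while the coarse approximation is copied into both. This is an interpretation for illustration only, not the paper's exact construction.

```python
import numpy as np

def checkerboard_descriptions(coarse, residual_blocks):
    """coarse: coarse image approximation (copied into both descriptions);
    residual_blocks: array of shape (rows, cols, bh, bw) of transform blocks."""
    rows, cols = residual_blocks.shape[:2]
    mask = (np.add.outer(np.arange(rows), np.arange(cols)) % 2 == 0)
    mask = mask[:, :, None, None]                  # broadcast over block contents
    desc0 = (coarse, np.where(mask, residual_blocks, 0))
    desc1 = (coarse, np.where(~mask, residual_blocks, 0))
    return desc0, desc1
```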

Journal ArticleDOI
TL;DR: A geometric representation is developed that can re-organize the original perspective images into a set of parallel projections with different oblique viewing angles that can be used as both an advanced video interface and a pre-processing step for 3D reconstruction.
Abstract: In this paper we address the problem of fusing images from many video cameras or a moving video camera. The captured images have obvious motion parallax, but they will be aligned and integrated into a few mosaics with a large field-of-view (FOV) that preserve 3D information. We have developed a compact geometric representation that can re-organize the original perspective images into a set of parallel projections with different oblique viewing angles. In addition to providing a wide field of view, mosaics with various oblique views well represent occlusion regions that cannot be seen in a usual nadir view. Stereo mosaic pairs can be formed from mosaics with different oblique viewing angles. This representation can be used as both an advanced interface for interactive 3D video and a pre-processing step for 3D reconstruction. A ray interpolation approach for generating the parallel-projection mosaics is presented, and efficient 3D scene/object rendering based on multiple parallel-projection mosaics is discussed. Several real-world examples are provided, with applications ranging from aerial video surveillance/environmental monitoring, ground mobile robot navigation, to under-vehicle inspection.

Journal ArticleDOI
TL;DR: A new adaptation method, called partial linear regression (PLR), is proposed and adopted in an audio-driven talking head application; it allows users to adapt part of the parameters from the available adaptation data while keeping the others unchanged.
Abstract: Avatars in many applications are constructed manually or by a single speech-driven model which needs a lot of training data and a long training time. It is essential to build up a user-dependent model more efficiently. In this paper, a new adaptation method, called partial linear regression (PLR), is proposed and adopted in an audio-driven talking head application. This method allows users to adapt part of the parameters from the available adaptation data while keeping the others unchanged. In our experiments, the PLR algorithm saves the hours that would otherwise be spent retraining a new user-dependent model, and adjusts the user-independent model into a more personalized one. The animated results with adapted models are 36% closer to the user-dependent model than those obtained with the pre-trained user-independent model.
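A heavily simplified sketch of the "adapt part, freeze the rest" idea follows: only selected columns of a linear audio-to-animation mapping are re-estimated from the user's adaptation data, while the remaining columns of the user-independent model are kept. The matrix shapes and the plain least-squares fit are assumptions for illustration; the paper's actual PLR equations are not reproduced here.

```python
import numpy as np

def partial_adaptation(X, Y, W_old, adapt_idx):
    """Re-estimate only the output dimensions listed in `adapt_idx` by least
    squares on the adaptation data (X: inputs, n x d_in; Y: targets, n x d_out),
    keeping the remaining columns of the mapping W_old (d_in x d_out) unchanged."""
    W_new = W_old.copy()
    # Least-squares solution for the selected outputs only.
    W_sel, *_ = np.linalg.lstsq(X, Y[:, adapt_idx], rcond=None)
    W_new[:, adapt_idx] = W_sel
    return W_new
```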

Journal ArticleDOI
TL;DR: The experimental results demonstrate that using the MINVAR based rate distortion tradeoff framework, the decoded picture quality is smoother than the traditional H.264 joint model (JM) rate control without sacrificing global quality such that a better subjective visual quality is guaranteed.
Abstract: In this paper, we review the rate distortion tradeoff issues in real-time video coding and introduce a minimum variation (MINVAR) distortion criterion based approach. The MINVAR based rate distortion tradeoff framework provides a local optimization strategy as a rate control mechanism in real-time video coding applications by minimizing the distortion variation while the corresponding bit rate fluctuation is limited by utilizing the encoder buffer. The proposed approach aims to achieve a smooth decoded picture quality for pleasing human visual experience. The performance of the proposed method is evaluated with H.264. The experimental results demonstrate that using the proposed approach, the decoded picture quality is smoother than the traditional H.264 joint model (JM) rate control without sacrificing global quality such that a better subjective visual quality is guaranteed.

Journal ArticleDOI
TL;DR: A system for image replica detection is presented: it adapts a detector to a specific reference image and is able to classify test images as replicas of that reference image or as unrelated images.
Abstract: This paper presents a system for image replica detection. The idea behind the proposed approach is to adapt a system for detecting the replicas of a specific reference image. The system is then able to classify test images as replicas of the reference image or as unrelated images. More precisely, the test procedure is as follows. A set of features is extracted from a test image, representing texture, colour and grey-level characteristics. These features are then fed into a preprocessing step, which is fine-tuned to the reference image. Finally, the resulting features are fed to a support vector classifier that determines whether the test image is a replica of the reference image. Experimental results show the effectiveness of the proposed system. Target applications include search for copyright infringement (e.g. variations of copyrighted images) and known illicit content (e.g. paedophile images known to the police).
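A minimal sketch of the classification stage: a feature vector (here just a grey-level histogram plus colour means, far simpler than the paper's texture/colour/grey-level features) is computed for each training image, and a two-class support vector classifier is trained to separate replicas of the reference image from unrelated images. The feature choice and SVM settings are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(image):
    """Toy feature vector: 32-bin grey-level histogram plus per-channel colour
    means, for an image stored as an (H, W, 3) array with values in 0..255."""
    grey = image.mean(axis=2)
    hist, _ = np.histogram(grey, bins=32, range=(0, 255), density=True)
    colour_means = image.reshape(-1, 3).mean(axis=0) / 255.0
    return np.concatenate([hist, colour_means])

def train_replica_detector(replicas, unrelated):
    """Train a two-class SVM on features of known replicas of the reference
    image (label 1) and unrelated images (label 0)."""
    X = np.array([extract_features(im) for im in replicas + unrelated])
    y = np.array([1] * len(replicas) + [0] * len(unrelated))
    return SVC(kernel="rbf", gamma="scale").fit(X, y)
```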