
Showing papers in "Journal of Electronic Imaging in 2014"


Journal ArticleDOI
TL;DR: This work presents a VQA algorithm that estimates quality via separate estimates of perceived degradation due to spatial distortion and joint spatial and temporal distortion, and demonstrates that this algorithm performs well in predicting video quality and is competitive with current state-of-the-art VQA algorithms.
Abstract: Algorithms for video quality assessment (VQA) aim to estimate the qualities of videos in a manner that agrees with human judgments of quality. Modern VQA algorithms often estimate video quality by comparing localized space-time regions or groups of frames from the reference and distorted videos, using comparisons based on visual features, statistics, and/or perceptual models. We present a VQA algorithm that estimates quality via separate estimates of perceived degradation due to (1) spatial distortion and (2) joint spatial and temporal distortion. The first stage of the algorithm estimates perceived quality degradation due to spatial distortion; this stage operates by adaptively applying to groups of spatial video frames the two strategies from the most apparent distortion algorithm with an extension to account for temporal masking. The second stage of the algorithm estimates perceived quality degradation due to joint spatial and temporal distortion; this stage operates by measuring the dissimilarity between the reference and distorted videos represented in terms of two-dimensional spatiotemporal slices. Finally, the estimates obtained from the two stages are combined to yield an overall estimate of perceived quality degradation. Testing on various video-quality databases demonstrates that our algorithm performs well in predicting video quality and is competitive with current state-of-the-art VQA algorithms.

188 citations
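
The final fusion step lends itself to a short illustration. The sketch below assumes a weighted geometric mean as the combination rule; the rule and the exponent alpha are hypothetical choices for illustration, not the paper's actual formula.

```python
def combine_quality_estimates(spatial_deg, spatiotemporal_deg, alpha=0.5):
    """Fuse the two stage outputs into one degradation score.

    A weighted geometric mean is one plausible fusion rule; the
    exponent alpha here is a hypothetical choice, not the paper's.
    """
    return (spatial_deg ** alpha) * (spatiotemporal_deg ** (1.0 - alpha))

# Toy usage with made-up per-video scores from the two stages.
s_stage = 0.42   # estimate from the spatial-distortion stage
st_stage = 0.31  # estimate from the joint spatiotemporal stage
print(combine_quality_estimates(s_stage, st_stage))
```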


Journal ArticleDOI
TL;DR: This work investigates the use of local texture descriptors, namely local binary patterns, local phase quantization, and binarized statistical image features for robust human identification from two-dimensional ear imaging for improved recognition performance under illumination variation, pose variation, and partial occlusion.
Abstract: Automated personal identification using the shape of the human ear is emerging as an appealing modality in biometric and forensic domains. This is mainly due to the fact that the ear pattern can provide rich and stable information to differentiate and recognize people. In the literature, there are many approaches and descriptors that achieve relatively good results in constrained environments. The recognition performance tends, however, to decrease significantly under illumination variation, pose variation, and partial occlusion. In this work, we investigate the use of local texture descriptors, namely local binary patterns, local phase quantization, and binarized statistical image features (BSIF), for robust human identification from two-dimensional ear imaging. In contrast to global image descriptors, which compute features directly from the entire image, local descriptors representing the features in small local image patches have proven to be more effective in real-world conditions. Our extensive experimental results on the benchmark IIT Delhi-1, IIT Delhi-2, and USTB ear databases show that local texture features in general, and BSIF in particular, provide a significant performance improvement over the state-of-the-art. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.5.053008)

76 citations
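
As an illustration of the local-texture pipeline, here is a minimal LBP-based matcher using scikit-image; the uniform-LBP settings, chi-square distance, and random toy images are illustrative choices, not the paper's exact protocol (which also covers LPQ and BSIF).

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(image, points=8, radius=1):
    """Uniform LBP histogram of a grayscale ear image (2-D uint8 array)."""
    codes = local_binary_pattern(image, points, radius, method="uniform")
    n_bins = points + 2  # uniform codes 0..points, plus one non-uniform bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance, a common choice for comparing LBP histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Toy usage with random images; real use would first detect/normalize the ear.
rng = np.random.default_rng(0)
probe, gallery = rng.integers(0, 256, (2, 64, 48), dtype=np.uint8)
print(chi2_distance(lbp_histogram(probe), lbp_histogram(gallery)))
```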


Journal ArticleDOI
Jun Wan1, Qiuqi Ruan1, Wei Li1, Gaoyun An1, Ruizhen Zhao1 
TL;DR: Experimental results show that the proposed feature outperforms other spatiotemporal features and is comparable to other state-of-the-art approaches, even though there is only one training sample for each class.
Abstract: Human activity recognition based on RGB-D data has received more attention in recent years. We propose a spatiotemporal feature named three-dimensional (3D) sparse motion scale-invariant feature transform (SIFT) from RGB-D data for activity recognition. First, we build pyramids as scale space for each RGB and depth frame, and then use the Shi-Tomasi corner detector and sparse optical flow to quickly detect and track robust keypoints around the motion pattern in the scale space. Subsequently, local patches around keypoints, which are extracted from RGB-D data, are used to build 3D gradient and motion spaces. Then SIFT-like descriptors are calculated on both 3D spaces, respectively. The proposed feature is invariant to scale, translation, and partial occlusions. More importantly, the running time of the proposed feature is low, so it is well suited for real-time applications. We have evaluated the proposed feature under a bag-of-words model on three public RGB-D datasets: the one-shot learning Chalearn Gesture Dataset, the Cornell Activity Dataset-60, and the MSR Daily Activity 3D dataset. Experimental results show that the proposed feature outperforms other spatiotemporal features and is comparable to other state-of-the-art approaches, even though there is only one training sample for each class.

47 citations
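
The detection-and-tracking step (Shi-Tomasi corners plus sparse optical flow) can be sketched with OpenCV as follows; the parameter values and toy frames are assumptions for illustration, and the subsequent 3D gradient/motion descriptor stage is omitted.

```python
import numpy as np
import cv2

def track_motion_keypoints(prev_gray, next_gray, max_corners=200):
    """Detect Shi-Tomasi corners and track them with pyramidal
    Lucas-Kanade sparse optical flow (the feature's first step)."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)

# Toy usage: a bright square shifted by (3, 2) pixels between frames.
f0 = np.zeros((120, 160), np.uint8); f0[40:80, 60:100] = 255
f1 = np.roll(np.roll(f0, 2, axis=0), 3, axis=1)
p0, p1 = track_motion_keypoints(f0, f1)
print(np.median(p1 - p0, axis=0))  # ~ [3, 2] (x, y displacement)
```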


Journal ArticleDOI
TL;DR: This work proposes an alternative way to explore local properties of Retinex, replacing random paths by traces of a specialized swarm of termites, and discusses differences in path exploration with other Retinex implementations.
Abstract: The original presentation of Retinex, a spatial color correction and image enhancement algorithm modeling the human vision system, as proposed by Land and McCann in 1964, uses paths to explore the image in search of a local reference white point. The interesting results of this algorithm have led to the development of many versions of Retinex. They follow the same principle but differ in the way they explore the image, with, for example, random paths, random samples, convolution masks, and variational formulations. We propose an alternative way to explore local properties of Retinex, replacing random paths by traces of a specialized swarm of termites. In presenting the spatial characteristics of the proposed method, we discuss differences in path exploration with other Retinex implementations. Experiments, results, and comparisons are presented to test the efficacy of the proposed Retinex implementation.

46 citations
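
To make the path principle concrete, the sketch below implements a plain random-walk variant of path-based Retinex on one channel; the termite-swarm path strategy that distinguishes the paper is deliberately replaced here by unbiased random walks, so this is only the baseline idea the paper builds on.

```python
import numpy as np

def path_retinex(channel, n_paths=10, path_len=50, rng=None):
    """Path-based Retinex sketch on one color channel (floats in (0, 1]).

    For every pixel, several random walks explore the neighborhood; the
    maximum value met along them acts as the local white reference, and
    the output is the ratio of the pixel to that reference (the basic
    Land-McCann idea).
    """
    rng = rng or np.random.default_rng(0)
    h, w = channel.shape
    out = np.empty_like(channel)
    for y in range(h):
        for x in range(w):
            white = channel[y, x]
            for _ in range(n_paths):
                cy, cx = y, x
                for _ in range(path_len):
                    cy = np.clip(cy + rng.integers(-1, 2), 0, h - 1)
                    cx = np.clip(cx + rng.integers(-1, 2), 0, w - 1)
                    white = max(white, channel[cy, cx])
            out[y, x] = channel[y, x] / white
    return out

# Toy usage on a small gradient image.
img = np.linspace(0.1, 1.0, 32 * 32).reshape(32, 32)
print(path_retinex(img, n_paths=2, path_len=20).mean())
```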


Journal ArticleDOI
TL;DR: New approaches to analyze laser-gated viewing data for nonline-of-sight vision with a frame-to-frame back-projection as well as feature selection algorithms are discussed, and it is demonstrated that the choice of the filter has an impact on the selectivity (i.e., multiple target detection) as well as on the localization precision.
Abstract: We discuss new approaches to analyze laser-gated viewing data for nonline-of-sight vision with a frame-to-frame back-projection as well as feature selection algorithms. Although first back-projection approaches use time transients for each pixel, our method has the ability to calculate the projection of imaging data on the voxel space for each frame. Further, different data analysis algorithms and their sequential application were studied with the aim of identifying and selecting signals from different target positions. A slight modification of commonly used filters leads to a powerful selection of local maximum values. It is demonstrated that the choice of the filter has an impact on the selectivity (i.e., multiple target detection) as well as on the localization precision. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.6.063003)

38 citations


Journal ArticleDOI
TL;DR: The proposed algorithm addresses all three types of artifacts that are prevalent in JPEG images: blocking and, around edges, blurring and aliasing. It enhances the quality of the image via two stages.
Abstract: Transform coding using the discrete cosine transform is one of the most popular techniques for image and video compression. However, at low bit rates, the coded images suffer from severe visual distortions. An innovative approach is proposed that deals with artifacts in JPEG compressed images. Our algorithm addresses all three types of artifacts that are prevalent in JPEG images: blocking and, around edges, blurring and aliasing. We enhance the quality of the image via two stages. First, we remove blocking artifacts via boundary smoothing and guided filtering. Then, we reduce blurring and aliasing around the edges via a local edge-regeneration stage. We compared the proposed algorithm with other modern JPEG artifact-removal algorithms. The results demonstrate that the proposed approach is competitive with, and can in many cases outperform, competing algorithms.

38 citations
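
A minimal sketch of the first stage's boundary-smoothing idea follows; the guided-filtering step and the edge-regeneration stage are omitted, and the 3x3 filter and one-pixel band are assumed values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def smooth_block_boundaries(img, block=8, band=1):
    """Low-pass filter only the pixels sitting on 8x8 JPEG block
    boundaries, leaving block interiors untouched."""
    img = img.astype(np.float64)
    smoothed = uniform_filter(img, size=3)
    mask = np.zeros_like(img, dtype=bool)
    for k in range(block, img.shape[0], block):
        mask[k - band:k + band, :] = True
    for k in range(block, img.shape[1], block):
        mask[:, k - band:k + band] = True
    return np.where(mask, smoothed, img)

# Toy usage: a piecewise-constant image with artificial blocking steps.
img = np.kron(np.arange(16).reshape(4, 4) * 10.0, np.ones((8, 8)))
before = np.abs(np.diff(img, axis=1))[:, 7].max()
after = np.abs(np.diff(smooth_block_boundaries(img), axis=1))[:, 7].max()
print(after < before)  # the step at the first block boundary is reduced
```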


Journal ArticleDOI
TL;DR: An approach of rapid hologram generation for the realistic three-dimensional (3-D) image reconstruction based on the angular tiling concept is proposed, using a new graphic rendering approach integrated with a previously developed layer-based method for hologram calculation.
Abstract: An approach of rapid hologram generation for the realistic three-dimensional (3-D) image reconstruction based on the angular tiling concept is proposed, using a new graphic rendering approach integrated with a previously developed layer-based method for hologram calculation. A 3-D object is simplified as layered cross-sectional images perpendicular to a chosen viewing direction, and our graphics rendering approach allows the incorporation of clear depth cues, occlusion, and shading in the generated holograms for angular tiling. The combination of these techniques together with parallel computing reduces the computation time of a single-view hologram for a 3-D image of extended graphics array resolution to 176 ms using a single consumer graphics processing unit card.

36 citations


Journal ArticleDOI
TL;DR: The results suggest that supplementing standard DXA BMD measurements with sophisticated femoral trabecular bone characterization and supervised learning techniques can significantly improve biomechanical strength prediction in proximal femur specimens.
Abstract: We investigate the use of different trabecular bone descriptors and advanced machine learning techniques to complement standard bone mineral density (BMD) measures derived from dual-energy x-ray absorptiometry (DXA) for improving clinical assessment of osteoporotic fracture risk. For this purpose, volumes of interest were extracted from the head, neck, and trochanter of 146 ex vivo proximal femur specimens on multidetector computed tomography. The trabecular bone captured was characterized with (1) statistical moments of the BMD distribution, (2) geometrical features derived from the scaling index method (SIM), and (3) morphometric parameters, such as bone fraction, trabecular thickness, etc. Feature sets comprising DXA BMD and such supplemental features were used to predict the failure load (FL) of the specimens, previously determined through biomechanical testing, with multiregression and support vector regression (SVR). Prediction performance was measured by the root mean square error (RMSE); correlation with measured FL was evaluated using the coefficient of determination R². The best prediction performance was achieved by a combination of DXA BMD and SIM-derived geometric features from the femoral head (RMSE: 0.869 ± 0.121, R²: 0.68 ± 0.079), which was significantly better than DXA BMD alone (RMSE: 0.948 ± 0.119, R²: 0.61 ± 0.101) (p < 10⁻⁴). For multivariate feature sets, SVR outperformed multiregression (p < 0.05). These results suggest that supplementing standard DXA BMD measurements with sophisticated femoral trabecular bone characterization and supervised learning techniques can significantly improve biomechanical strength prediction in proximal femur specimens.

35 citations
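
The prediction setup (feature sets plus SVR, evaluated by RMSE and R²) is easy to reproduce in outline with scikit-learn; the synthetic data below merely stands in for the DXA/SIM/morphometric features and measured failure loads.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data: rows = specimens, columns = DXA BMD plus
# supplemental trabecular features (SIM/morphometry in the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(146, 6))
failure_load = X @ rng.normal(size=6) + 0.3 * rng.normal(size=146)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
pred = cross_val_predict(model, X, failure_load, cv=5)
rmse = np.sqrt(mean_squared_error(failure_load, pred))
print(f"RMSE: {rmse:.3f}, R^2: {r2_score(failure_load, pred):.3f}")
```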


Journal ArticleDOI
TL;DR: The results showed that binocular disparity and relative size improved depth judgments over the distance range, indicating that for accurate depth judgments, additional depth cues should be used to facilitate stereoscopic perception within an individual’s action space.
Abstract: Depth perception is an important component of many augmented reality applications. It is, however, subject to multiple error sources. In this study, we investigated depth judgments with a stereoscopic video see-through head-mounted display for the purpose of designing depth cueing for systems that operate in an individual's action space. In the experiment, we studied the use of binocular disparity and relative size to improve relative depth judgments of augmented objects above the ground plane. The relative size cue was created by adding auxiliary augmentations to the scene according to constraints described in the section on the underlying theory. The results showed that binocular disparity and relative size improved depth judgments over the distance range. This indicates that for accurate depth judgments, additional depth cues should be used to facilitate stereoscopic perception within an individual's action space. © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. (DOI: 10.1117/1.JEI.23.1.011006)

34 citations


Journal ArticleDOI
TL;DR: The compressive sensing (CS) recovery algorithm is employed, which can utilize sparsity restriction to diminish the effect of the motion-induced error on image reconstruction, and simulations are designed to illustrate the three factors of the target-motion-induced error.
Abstract: Radar coincidence imaging (RCI) is a new instantaneous imaging technique that does not depend on Doppler frequency for resolution. Such an imaging method does not require target relative motion and has an imaging interval that is even shorter than a pulse width. The potential advantages in processing both the relatively stationary and maneuvering targets make RCI provide a supplementary imaging approach for the conventional range Doppler imaging methods. The simulation experiments have preliminarily demonstrated the feasibility of the RCI technique. However, further investigations show that the imaging error arises for moving targets, and moreover, it is particularly related to target scattering maps. The paper analyzes the target-motion-induced error and points out that three factors are involved: target velocity, target scattering map, and the time-space independence of detecting signals. The current image-reconstruction algorithms of RCI, which are based on the least-square (LS) principle, are found to be seriously sensitive to the motion-induced errors and will be limited in practical imaging scenarios. Accordingly, the compressive sensing (CS) recovery algorithm is employed, which can utilize sparsity restriction to diminish the effect of the motion-induced error on image reconstruction. Simulations are designed to illustrate the three factors of the target-motion-induced error. The imaging performance of the LS and the CS methods in RCI image recovery are compared as well. © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. (DOI: 10.1117/1.JEI.23.2.023014)

33 citations
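
The LS-versus-CS comparison can be reproduced in miniature: recover a sparse scattering map from an underdetermined linear system with plain least squares and with an l1-regularized (Lasso) solver as a stand-in for the CS recovery algorithm. The problem sizes and noise level below are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy RCI-style inverse problem: y = S x + noise, with a sparse
# scattering map x and a random (time-space varying) sensing matrix S.
rng = np.random.default_rng(1)
n_meas, n_cells = 80, 200
S = rng.normal(size=(n_meas, n_cells))
x_true = np.zeros(n_cells)
x_true[rng.choice(n_cells, 5, replace=False)] = rng.uniform(1, 3, 5)
y = S @ x_true + 0.05 * rng.normal(size=n_meas)  # proxy for model/motion error

x_ls, *_ = np.linalg.lstsq(S, y, rcond=None)               # least-square recovery
x_cs = Lasso(alpha=0.05, max_iter=10000).fit(S, y).coef_   # sparse (CS-style)

for name, xr in [("LS", x_ls), ("CS/Lasso", x_cs)]:
    print(name, "reconstruction error:", np.linalg.norm(xr - x_true))
```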


Journal ArticleDOI
TL;DR: To create a seamless viewing experience for multiple viewers, this technique smoothly interpolates the set of viewer heights and distances on a per-vertex basis across the array's field of view, reducing image distortion, cross talk, and artifacts from tracking errors.
Abstract: We present a technique for achieving tracked vertical parallax for multiple users using a variety of autostereoscopic projector array setups, including front- and rear-projection and curved display surfaces. This hybrid parallax approach allows for immediate horizontal parallax as viewers move left and right and tracked parallax as they move up and down, allowing cues such as three-dimensional (3-D) perspective and eye contact to be conveyed faithfully. We use a low-cost RGB-depth sensor to simultaneously track multiple viewer head positions in 3-D space, and we interactively update the imagery sent to the array so that imagery directed to each viewer appears from a consistent and correct vertical perspective. Unlike previous work, we do not assume that the imagery sent to each projector in the array is rendered from a single vertical perspective. This lets us apply hybrid parallax to displays where a single projector forms parts of multiple viewers’ imagery. Thus, each individual projected image is rendered with multiple centers of projection, and might show an object from above on the left and from below on the right. We demonstrate this technique using a dense horizontal array of pico-projectors aimed into an anisotropic vertical diffusion screen, yielding 1.5 deg angular resolution over 110 deg field of view. To create a seamless viewing experience for multiple viewers, we smoothly interpolate the set of viewer heights and distances on a per-vertex basis across the array’s field of view, reducing image distortion, cross talk, and artifacts from tracking errors.

Journal ArticleDOI
TL;DR: An ergodic method is proposed for tracing the ideal rotation matrix and the preferred locations of the optical centers to obtain the ideal projection matrix from which the image rectifying transform matrix (IRTM) is further derived.
Abstract: A rectification algorithm for multiview images captured by a parallel multicamera array is presented. An ergodic method is proposed for tracing the ideal rotation matrix and the preferred locations of the optical centers. Moreover, the cameras' intrinsic parameters are optimized in order to obtain the ideal projection matrix, from which the image rectifying transform matrix (IRTM) is further derived. Then, the multiview image is rectified based on the IRTM. Experimental results show that this proposed algorithm dramatically improves rectification accuracy. Moreover, the robustness of this algorithm is superior to that of the previous method. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.3.033001)

Journal ArticleDOI
TL;DR: An approach for fast computation of CTM, CKM, and CHM discrete orthogonal moments using the image block representation for binary images and image slice representation for grayscale images is presented.
Abstract: We propose a new set of bivariate discrete orthogonal polynomials, which are the product of Charlier's discrete orthogonal polynomials with one variable by Tchebichef, Krawtchouk, and Hahn discrete orthogonal polynomials with one variable. This set of bivariate discrete orthogonal polynomials is used to define several new types of discrete orthogonal moments such as Charlier-Tchebichef moments (CTM), Charlier-Krawtchouk moments (CKM), and Charlier-Hahn moments (CHM). We also present an approach for fast computation of CTM, CKM, and CHM discrete orthogonal moments using the image block representation for binary images and image slice representation for grayscale images. A set of Charlier-Tchebichef invariant moments, Charlier-Krawtchouk invariant moments, and Charlier-Hahn invariant moments is also presented. These invariant moments are derived algebraically from the geometric invariant moments, and their computation is accelerated using an image representation scheme. The presented algorithms are tested on several well-known computer vision datasets in terms of image reconstruction, computational time, moment invariability, and object classification. The performance of these invariant moments used as pattern features for pattern classification is compared with Hu, Legendre, Tchebichef-Krawtchouk, Tchebichef-Hahn, and Krawtchouk-Hahn invariant moments.

Journal ArticleDOI
TL;DR: An efficient reversible data hiding scheme for encrypted H.264/AVC videos is proposed that can perform data hiding in encrypted videos without decryption, which preserves the confidentiality of the content.
Abstract: Due to the security and privacy-preserving requirements for cloud data management, it is sometimes desired that video content is accessible in an encrypted form. Reversible data hiding in the encrypted domain is an emerging technology, as it can perform data hiding in encrypted videos without decryption, which preserves the confidentiality of the content. Furthermore, the original cover can be losslessly restored after decryption and data extraction. An efficient reversible data hiding scheme for encrypted H.264/AVC videos is proposed. During H.264/AVC encoding, the intraprediction mode, motion vector difference, and the sign bits of the residue coefficients are encrypted using a standard stream cipher. Then, the data-hider who does not know the original video content, may reversibly embed secret data into the encrypted H.264/AVC video by using a modified version of the histogram shifting technique. A scale factor is utilized for selecting the embedding zone, which is scalable for different capacity requirements. With an encrypted video containing hidden data, data extraction can be carried out either in the encrypted or decrypted domain. In addition, real reversibility is realized so that data extraction and video recovery are free of any error. Experimental results demonstrate the feasibility and efficiency of the proposed scheme.
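
The embedding core is a variant of histogram shifting. A generic (unmodified) histogram-shifting embed/extract pair on integer coefficients looks like the sketch below; the paper's scheme additionally operates on encrypted H.264/AVC residue coefficients and uses a scale factor to select the embedding zone.

```python
import numpy as np

def hs_embed(coeffs, bits, peak=0):
    """Classic histogram-shifting embedding on integer coefficients.

    Values > peak are shifted up by one to empty the bin peak+1; each
    coefficient equal to peak then carries one bit (it stays at peak
    for a 0 bit, becomes peak+1 for a 1 bit).
    """
    out = coeffs.copy()
    out[out > peak] += 1
    carriers = np.flatnonzero(coeffs == peak)
    assert len(bits) <= len(carriers), "not enough embedding capacity"
    out[carriers[:len(bits)]] += np.asarray(bits, dtype=out.dtype)
    return out

def hs_extract(marked, n_bits, peak=0):
    """Recover the bits, then losslessly restore the original coefficients."""
    carriers = np.flatnonzero((marked == peak) | (marked == peak + 1))
    bits = (marked[carriers[:n_bits]] == peak + 1).astype(int)
    restored = marked.copy()
    restored[restored > peak] -= 1
    return bits, restored

coeffs = np.array([0, 2, 0, -1, 0, 3, 1, 0], dtype=np.int64)
marked = hs_embed(coeffs, [1, 0, 1, 1])
bits, restored = hs_extract(marked, 4)
print(bits, np.array_equal(restored, coeffs))  # [1 0 1 1] True
```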

Journal ArticleDOI
TL;DR: A computational approach generates realistic DoF effects for mobile devices such as tablets by calibrating the rear-facing stereo cameras, rectifying the stereo image pairs through the FCam API, generating a low-resolution disparity map using graph cuts stereo matching, and subsequently upsampling it via joint bilateral upsampling.
Abstract: The depth of field (DoF) effect is a useful tool in photography and cinematography because of its aesthetic value. However, capturing and displaying dynamic DoF effects were until recently a capability unique to expensive and bulky movie cameras. A computational approach to generate realistic DoF effects for mobile devices such as tablets is proposed. We first calibrate the rear-facing stereo cameras and rectify the stereo image pairs through the FCam API, then generate a low-resolution disparity map using graph cuts stereo matching and subsequently upsample it via joint bilateral upsampling. Next, we generate a synthetic light field by warping the raw color image to nearby viewpoints, according to the corresponding values in the upsampled high-resolution disparity map. Finally, we render the dynamic DoF effect on the tablet screen with light field rendering. The user can easily capture and generate desired DoF effects with arbitrary aperture sizes or focal depths using the tablet only, with no additional hardware or software required. The system has been examined in a variety of environments with satisfactory results, according to the subjective evaluation tests.
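
The refocusing step can be illustrated with a shift-and-add sketch over a line of virtual viewpoints; this backward-warping simplification and all parameter values are assumptions, not the paper's forward-warping implementation.

```python
import numpy as np

def render_dof(color, disparity, focus_disp, n_views=9, baseline=3.0):
    """Shift-and-add refocusing over a line of virtual viewpoints.

    For viewpoint offset v, every pixel gathers the color shifted by
    v * (disparity - focus_disp); averaging the views keeps pixels at
    the in-focus disparity sharp and blurs the rest.
    """
    h, w = disparity.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    acc = np.zeros((h, w), dtype=np.float64)
    for v in np.linspace(-baseline, baseline, n_views):
        src_x = np.rint(xs - v * (disparity - focus_disp)).astype(int)
        acc += color[np.arange(h)[:, None], np.clip(src_x, 0, w - 1)]
    return acc / n_views

# Toy scene: vertical stripes; left half near (disparity 2), right half far (0).
img = np.tile((np.arange(64) % 8 < 4).astype(float), (64, 1))
disp = np.zeros((64, 64)); disp[:, :32] = 2.0
out = render_dof(img, disp, focus_disp=0.0)
print(out[32, 4:28].std() < out[32, 36:60].std())  # defocused half is smoother
```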

Journal ArticleDOI
Jing Hu1, Yupin Luo1
TL;DR: This work proposes a single-image SR method that learns the LR-HR relations from the given LR image itself instead of any external images, and provides photorealistic HR images with sharp edges.
Abstract: The challenge of learning-based superresolution (SR) is to predict the relationships between low-resolution (LR) patches and their corresponding high-resolution (HR) patches. By learning such relationships from external training images, the existing learning-based SR approaches are often affected by the relevance between the training data and the LR input image. Therefore, we propose a single-image SR method that learns the LR-HR relations from the given LR image itself instead of any external images. Both the local regression model and nonlocal patch redundancy are exploited in the proposed method. The local regression model is employed to derive the mapping functions between self-LR-HR example patches, and the nonlocal self-similarity gives rise to a high-order derivative estimation of the derived mapping function. Moreover, to fully exploit the multiscale similarities inside the LR input image, we accumulate the previous reconstruction results and their corresponding LR versions as additional example patches for the subsequent estimation process, and adopt a gradual magnification scheme to achieve the desired zooming size step by step. Extensive experiments on benchmark images have validated the effectiveness of the proposed method. Compared to other state-of-the-art SR approaches, the proposed method provides photorealistic HR images with sharp edges.

Journal ArticleDOI
TL;DR: A color-to-gray conversion algorithm that retains both the overall appearance and the discriminability of details of the input color image by qualitative and quantitative comparison with several state-of-the-art methods is presented.
Abstract: We present a color-to-gray conversion algorithm that retains both the overall appearance and the discriminability of details of the input color image. The algorithm employs a weighted pyramid image fusion scheme to blend the R, G, and B color channels of the input image into a single grayscale image. The use of simple visual quality metrics as weights in the fusion scheme serves to retain visual contrast from each of the input color channels. We demonstrate the effectiveness of the method by qualitative and quantitative comparison with several state-of-the-art methods. © 2014 SPIE and IS&T.
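
A minimal sketch of such a weighted pyramid fusion follows, with Laplacian magnitude standing in for the paper's visual quality metrics and a hypothetical input path; the pyramid depth and weight choice are assumptions.

```python
import numpy as np
import cv2

def _gauss_pyr(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def _lap_pyr(img, levels):
    g = _gauss_pyr(img, levels)
    return [g[i] - cv2.pyrUp(g[i + 1], dstsize=g[i].shape[::-1])
            for i in range(levels - 1)] + [g[-1]]

def color_to_gray(bgr, levels=4):
    """Blend the B, G, R channels into one grayscale image, weighting each
    channel per pixel by a simple local-contrast measure and fusing the
    weighted Laplacian pyramids."""
    chans = [c.astype(np.float32) / 255.0 for c in cv2.split(bgr)]
    weights = [np.abs(cv2.Laplacian(c, cv2.CV_32F)) + 1e-6 for c in chans]
    total = sum(weights)
    weights = [w / total for w in weights]
    fused = None
    for c, w in zip(chans, weights):
        contrib = [l * g for l, g in zip(_lap_pyr(c, levels),
                                         _gauss_pyr(w, levels))]
        fused = contrib if fused is None else [f + k for f, k in zip(fused, contrib)]
    gray = fused[-1]  # collapse the fused Laplacian pyramid
    for lvl in range(levels - 2, -1, -1):
        gray = cv2.pyrUp(gray, dstsize=fused[lvl].shape[::-1]) + fused[lvl]
    return np.clip(gray * 255.0, 0, 255).astype(np.uint8)

img = cv2.imread("input.png")  # hypothetical input path
if img is not None:
    cv2.imwrite("gray.png", color_to_gray(img))
```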

Journal ArticleDOI
TL;DR: Two stereoscopic approaches of the CELLmicrocosmos project will be introduced, which address the stereoscopic scaling problem, and it is shown that it is possible to generate statically rendered as well as interactive stereoscopic cell visualizations.
Abstract: Cell visualization is an important area of scientific and educational visualization. There are already a number of astonishing animations illustrating the structural and functional properties of biological cells available on the Internet. However, these visualizations usually do not take advantage of three-dimensional (3-D) stereoscopic techniques. The stereoscopic visualization of the microcosmos cell, invisible to the human eye, bears high potential for educational as well as scientific approaches. Using open source tools, it will be shown that it is possible to generate statically rendered as well as interactive stereoscopic cell visualizations. First, the 3-D modeling software Blender, in conjunction with Schneider's stereoscopic camera plug-in, will be used to generate a stereoscopic cell animation. While static renderings have the advantage that the stereoscopic effect can be optimized for spectators, interactive stereoscopic visualizations always have to adjust and optimize the stereoscopic effect for users who can freely navigate through space. Cell visualization is paradigmatic for this problem because the scale differences from the mesoscopic to the molecular level account for a factor of 100,000. Therefore, two stereoscopic approaches of the CELLmicrocosmos project will be introduced, which address the stereoscopic scaling problem. The stereoscopic quality was positively evaluated by 20 students.

Journal ArticleDOI
TL;DR: A parameterized model is developed and presented for a spatially varying PSF due to lens aberrations and defocus in an imaging system and is able to unify a set of hundreds of PSF observations across the image plane into a single 10-parameter model.
Abstract: Optical blur due to lens aberrations and defocus has been demonstrated in some cases to be spatially varying across the image plane. However, existing models in the literature for the point-spread function (PSF) corresponding to this blur are either parameterized and spatially invariant or spatially varying but ad-hoc and discretely defined. A parameterized model is developed and presented for a spatially varying PSF due to lens aberrations and defocus in an imaging system. The model is motivated from an established theoretical framework in physics and is demonstrated to be able to unify a set of hundreds of PSF observations across the image plane into a single 10-parameter model. The accuracy of the model is demonstrated with simulations and measurement data collected by two separate research groups.
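
The flavor of a low-dimensional, field-dependent PSF model can be conveyed with a toy parameterization; the elliptical Gaussian below with quadratic field dependence is illustrative only and is not the paper's physics-derived 10-parameter model.

```python
import numpy as np

def psf_at(x, y, params, size=15):
    """Evaluate a toy spatially varying PSF at normalized field position
    (x, y) in [-1, 1]^2: an elliptical Gaussian whose widths grow
    quadratically with field radius and whose axis follows the field
    direction (a crude astigmatism-like behavior)."""
    s0, s_r, e0, e_r = params            # on-axis width, field growth,
    r2 = x * x + y * y                   # base ellipticity, its field growth
    sig_t = s0 + s_r * r2                # tangential width
    sig_s = sig_t * (1 + e0 + e_r * r2)  # sagittal width
    theta = np.arctan2(y, x)             # ellipse aligned with field direction
    u = np.arange(size) - size // 2
    uu, vv = np.meshgrid(u, u)
    a = uu * np.cos(theta) + vv * np.sin(theta)
    b = -uu * np.sin(theta) + vv * np.cos(theta)
    psf = np.exp(-0.5 * ((a / sig_s) ** 2 + (b / sig_t) ** 2))
    return psf / psf.sum()

# PSFs are sharp on-axis and spread out toward the corner of the field.
print(psf_at(0.0, 0.0, (1.0, 2.0, 0.0, 1.5)).max(),
      psf_at(1.0, 1.0, (1.0, 2.0, 0.0, 1.5)).max())
```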

Journal ArticleDOI
TL;DR: This work introduces several important novelties, such as the capability to learn actions based on both positive and negative samples, the possibility of efficiently retraining the system in the presence of misclassified or unrecognized events, and the use of a parsing procedure that allows correct detection of the activities also when they are concatenated and/or nested one with each other.
Abstract: Automatic recognition of human activities and behaviors is still a challenging problem for many reasons, including limited accuracy of the data acquired by sensing devices, high variability of human behaviors, and the gap between visual appearance and scene semantics. Symbolic approaches can significantly simplify the analysis and turn raw data into chains of meaningful patterns. This allows getting rid of most of the clutter produced by low-level processing operations, embedding significant contextual information into the data, as well as using simple syntactic approaches to perform the matching between incoming sequences and models. We propose a symbolic approach to learn and detect complex activities through the sequences of atomic actions. Compared to previous methods based on context-free grammars, we introduce several important novelties, such as the capability to learn actions based on both positive and negative samples, the possibility of efficiently retraining the system in the presence of misclassified or unrecognized events, and the use of a parsing procedure that allows correct detection of the activities also when they are concatenated and/or nested one with each other. An experimental validation on three datasets with different characteristics demonstrates the robustness of the approach in classifying complex human behaviors.

Journal ArticleDOI
TL;DR: The comparative results suggest that the proposed finger multibiometric cryptosystem at feature-level fusion outperforms other approaches in terms of verification performance and template security.
Abstract: We address two critical issues in the design of a finger multibiometric system, i.e., fusion strategy and template security. First, three fusion strategies (feature-level, score-level, and decision-level fusions) with the corresponding template protection technique are proposed as the finger multibiometric cryptosystems to protect multiple finger biometric templates of fingerprint, finger vein, finger knuckle print, and finger shape modalities. Second, we theoretically analyze different fusion strategies for finger multibiometric cryptosystems with respect to their impact on security and recognition accuracy. Finally, the performance of finger multibiometric cryptosystems at different fusion levels is investigated on a merged finger multimodal biometric database. The comparative results suggest that the proposed finger multibiometric cryptosystem at feature-level fusion outperforms other approaches in terms of verification performance and template security.

Journal ArticleDOI
TL;DR: This work investigates the benefits of incorporating bottom-up video saliency maps (obtained using Itti's computational model) into video quality metrics and compares the performance of four full-reference video quality metrics with their modified versions, which had saliency maps incorporated into the algorithm.
Abstract: A recent development in the area of image and video quality consists of trying to incorporate aspects of visual attention in the design of visual quality metrics, mostly using the assumption that visual distortions appearing in less salient areas might be less visible and, therefore, less annoying. This research area is still in its infancy and results obtained by different groups are not yet conclusive. Among the works that have reported some improvements, most use subjective saliency maps, i.e., saliency maps generated from eye-tracking data obtained experimentally. Other works address the image quality problem, not focusing on how to incorporate visual attention into video signals. We investigate the benefits of incorporating bottom-up video saliency maps (obtained using Itti's computational model) into video quality metrics. In particular, we compare the performance of four full-reference video quality metrics with their modified versions, which had saliency maps incorporated into the algorithm. Results show that the addition of video saliency maps improves the performance of most quality metrics tested, but the highest gains were obtained for the metrics that only took into consideration spatial degradations. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.6.061107)
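
The incorporation scheme can be sketched for one representative metric: weight the per-pixel SSIM map by a normalized saliency map before pooling. SSIM and the toy center-saliency map below are stand-ins; the paper evaluates four full-reference video metrics with Itti-model saliency.

```python
import numpy as np
from skimage.metrics import structural_similarity

def saliency_weighted_ssim(ref, dist, saliency):
    """Pool the per-pixel SSIM map with saliency weights so that
    distortions in salient regions count more toward the score."""
    plain, ssim_map = structural_similarity(ref, dist, full=True,
                                            data_range=ref.max() - ref.min())
    w = saliency / (saliency.sum() + 1e-12)
    return float((ssim_map * w).sum()), plain  # weighted vs. plain SSIM

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
dist = ref + 0.1 * rng.random((64, 64))
sal = np.zeros((64, 64)); sal[16:48, 16:48] = 1.0  # toy center saliency
print(saliency_weighted_ssim(ref, dist, sal))
```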

Journal ArticleDOI
TL;DR: The experimental results demonstrate that, in addition to producing more powerful local Gabor features for each modality and better recognition performance through the fusion strategy, the architecture also outperforms some state-of-the-art individual methods and other fusion approaches for face-iris-fingerprint multimodal biometric systems.
Abstract: A multimodal biometric system has been considered a promising technique to overcome the defects of unimodal biometric systems. We have introduced a fusion scheme to gain a better understanding and fusion method for a face-iris-fingerprint multimodal biometric system. In our case, we use particle swarm optimization to train a set of adaptive Gabor filters in order to achieve the proper Gabor basic functions for each modality. For a closer analysis of texture information, two different local Gabor features for each modality are produced by the corresponding Gabor coefficients. Next, all matching scores of the two Gabor features for each modality are projected to a single scalar score via a trained support vector regression model for a final decision. A large-scale dataset is formed to validate the proposed scheme using the Facial Recognition Technology database-fafb and CASIA-V3-Interval together with FVC2004-DB2a datasets. The experimental results demonstrate that, in addition to producing more powerful local Gabor features for each modality and obtaining better recognition performance through the fusion strategy, our architecture also outperforms some state-of-the-art individual methods and other fusion approaches for face-iris-fingerprint multimodal biometric systems.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm can significantly reduce computational complexity of 3-D-HEVC while maintaining nearly the same rate distortion performance as the original encoder.
Abstract: In the test model of the high efficiency video coding (HEVC) standard-based three-dimensional (3-D) video coding (3-D-HEVC), variable-size motion estimation (ME) and disparity estimation (DE) are employed to select the best coding mode for each treeblock in the encoding process. This technique achieves the highest possible coding efficiency, but it brings extremely high computational complexity that keeps 3-D-HEVC from practical applications. An early SKIP mode decision algorithm based on spatial and interview correlations is proposed to reduce the computational complexity of the ME/DE procedures. The basic idea of the method is to utilize the spatial and interview properties of coding information in previously coded frames to predict the current treeblock prediction mode and skip unnecessary variable-size ME and DE early. Experimental results show that the proposed algorithm can significantly reduce the computational complexity of 3-D-HEVC while maintaining nearly the same rate distortion performance as the original encoder. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.5.053017)
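
The decision logic reduces to a small predicate over already coded blocks; the voting rule and threshold below are illustrative assumptions, not the exact criterion in the paper.

```python
def early_skip_decision(spatial_neighbors, interview_colocated, threshold=0.8):
    """Toy early-SKIP test: if the coded spatial neighbors and the
    co-located treeblock in the base view overwhelmingly chose SKIP,
    bypass variable-size ME/DE for the current treeblock."""
    modes = spatial_neighbors + [interview_colocated]
    skip_ratio = sum(m == "SKIP" for m in modes) / len(modes)
    return skip_ratio >= threshold

# Current treeblock: left/top/top-right neighbors plus the interview block.
print(early_skip_decision(["SKIP", "SKIP", "SKIP"], "SKIP"))   # True: skip ME/DE
print(early_skip_decision(["SKIP", "INTER", "SKIP"], "SKIP"))  # False: full search
```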

Journal ArticleDOI
TL;DR: The viewing zone of an autostereoscopic display with a directional backlight using a convex lens array is analyzed based on optical simulations, taking into account the defocusing and field curvature of the lens.
Abstract: When a directional backlight to each eye alternates synchronously with the alternation of left-eye and right-eye images on the display panel, the viewer can see a stereoscopic image without wearing special goggles. One way to realize a directional backlight is to place a convex lens array in front of dot matrix light sources to generate collimated light. To implement this method, however, defocusing and field curvature of the lens should be taken into account. The viewing zone of an autostereoscopic display with a directional backlight using a convex lens array is analyzed based on optical simulations.

Journal ArticleDOI
TL;DR: This work presents a multi-line-scan light-field image acquisition and processing system designed for 2.5/3-D inspection of fine surface structures in industrial environments and compares several approaches based on testing a set of slope hypotheses in the EPI domain.
Abstract: We present a multi-line-scan light-field image acquisition and processing system designed for 2.5/3-D inspection of fine surface structures in industrial environments. The acquired three-dimensional light field is composed of multiple observations of an object viewed from different angles. The acquisition system consists of an area-scan camera that allows for a small number of sensor lines to be extracted at high frame rates, and a mechanism for transporting an inspected object at a constant speed and direction. During acquisition, an object is moved orthogonally to the camera’s optical axis as well as the orientation of the sensor lines and a predefined subset of lines is read out from the sensor at each time step. This allows for the construction of so-called epipolar plane images (EPIs) and subsequent EPI-based depth estimation. We compare several approaches based on testing a set of slope hypotheses in the EPI domain. Hypotheses are derived from block matching, namely the sum of absolute differences, modified sum of absolute differences, normalized cross correlation, census transform, and modified census transform. Results for depth estimation and all-in-focus image generation are presented for synthetic and real data.
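
Slope-hypothesis testing in the EPI domain can be sketched as follows, using plain SAD as the matching cost (the paper also evaluates modified SAD, NCC, and census-based costs); the toy EPI and slope set are illustrative.

```python
import numpy as np

def epi_depth_sad(epi, slopes):
    """Pick, per pixel of the central view, the EPI slope with minimum SAD.

    epi has shape (n_views, width): one sensor line seen from several
    viewpoints. A slope hypothesis s predicts that the content at column
    x in the central row appears at column x + s*(v - center) in view v.
    """
    n, w = epi.shape
    c = n // 2
    costs = np.zeros((len(slopes), w))
    for i, s in enumerate(slopes):
        for v in range(n):
            shift = np.clip(np.arange(w) + int(round(s * (v - c))), 0, w - 1)
            costs[i] += np.abs(epi[v, shift] - epi[c, :])
    return np.asarray(slopes)[np.argmin(costs, axis=0)]

# Toy EPI: a pattern drifting 2 px per view; slope 2 should be recovered.
base = np.sin(np.arange(80) * 0.7)
epi = np.stack([np.roll(base, 2 * (v - 3)) for v in range(7)])
print(np.median(epi_depth_sad(epi, slopes=[0, 1, 2, 3])))
```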

Journal ArticleDOI
TL;DR: Experimental results show that the proposed framework can provide more accurate registration than existing methods and is able to incorporate global information in the evaluation of triplets of mappings.
Abstract: A multimodal image registration framework based on searching the best matched keypoints and the incorporation of global information is proposed. It comprises two key elements: keypoint detection and an iterative process. Keypoints are detected from both the reference and test images. For each test keypoint, a number of reference keypoints are chosen as mapping candidates. A triplet of keypoint mappings determine an affine transformation that is evaluated using a similarity metric between the reference image and the transformed test image by the determined transformation. An iterative process is conducted on triplets of keypoint mappings, keeping track of the best matched reference keypoint. Random sample consensus and mutual information are applied to eliminate outlier keypoint mappings. The similarity metric is defined to be the number of overlapped edge pixels over the entire images, allowing for global information to be incorporated in the evaluation of triplets of mappings. The performance of the framework is investigated with keypoints extracted by scale invariant feature transform and partial intensity invariant feature descriptor. Experimental results show that the proposed framework can provide more accurate registration than existing methods.

Journal ArticleDOI
TL;DR: Two improvement techniques for stereo matching algorithms using silicon retina sensors are presented, one of which is an adapted belief propagation approach optimizing the initial matching cost volume and the other an innovative two-stage postfilter for smoothing and outlier rejection.
Abstract: We present two improvement techniques for stereo matching algorithms using silicon retina sensors. We verify the results with ground truth data. In contrast to conventional monochrome/color cameras, silicon retina sensors deliver an asynchronous flow of events instead of common framed and discrete intensity or color images. While using this kind of sensor in a stereo setup enables new fields of applications, it also introduces new challenges in terms of stereo image analysis. Using this type of sensor, stereo matching algorithms have to deal with sparse event data, thus, less information. This affects the quality of the achievable disparity results and renders improving the stereo matching algorithms a necessary task. For this reason, we introduce two techniques for increasing the accuracy of silicon retina stereo results, in the sense that the average distance error is reduced. The first method is an adapted belief propagation approach optimizing the initial matching cost volume, and the second is an innovative two-stage postfilter for smoothing and outlier rejection. The evaluation shows that the proposed techniques increase the accuracy of the stereo matching and constitute a useful extension for using silicon retina sensors for depth estimation. © 2014 SPIE and IS&T (DOI: 10.1117/1.JEI.23.4.043011)

Journal ArticleDOI
Jing Dong1, Yang Xia1, Qifeng Yu1, Ang Su1, Wang Hou1 
TL;DR: Experiments show that the proposed instantaneous video stabilization method can stabilize various videos without the need for user interaction or costly 3-D reconstruction, and it works as an instant process for videos from an online source.
Abstract: Video stabilization is a critical step for improving the quality of videos captured by unmanned aerial vehicles. However, the complicated scenarios in the video and the need for instantaneously presenting a stabilized image pose significant challenges to the existing methods. In this work, an instantaneous video stabilization method for unmanned aerial vehicles is proposed. This new approach serves several purposes: smoothing the video motion in both two-dimensional and three-dimensional (3-D) scenes, decreasing the lags in response, and instantaneously providing the stabilized image to users. For each input frame, our approach regenerates four short motion trajectories by applying interframe transformations to the four corners of the image rectangle. An adaptive filter is then applied to smooth the motion trajectories and suppress the lags in response simultaneously. Finally, at the stage of image composition, the quality of the image is considered for selecting a visually plausible stabilized video. Experiments show that our approach can stabilize various videos without the need for user interaction or costly 3-D reconstruction, and it works as an instant process for videos from an online source.
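
The trajectory-smoothing idea can be sketched with a simple exponential filter; the paper's adaptive filter additionally suppresses response lag, so this is only a baseline illustration with an assumed smoothing constant.

```python
import numpy as np

def smooth_trajectory(traj, alpha=0.9):
    """Exponential smoothing of one corner trajectory. traj has shape
    (n_frames, 2): the accumulated interframe position of an image corner."""
    out = np.empty_like(traj, dtype=np.float64)
    out[0] = traj[0]
    for t in range(1, len(traj)):
        out[t] = alpha * out[t - 1] + (1 - alpha) * traj[t]
    return out

# Per-frame stabilizing correction = smoothed path minus observed path.
rng = np.random.default_rng(0)
jitter = np.cumsum(rng.normal(0, 1.0, size=(120, 2)), axis=0)
correction = smooth_trajectory(jitter) - jitter
print(np.abs(np.diff(smooth_trajectory(jitter), axis=0)).mean(),
      np.abs(np.diff(jitter, axis=0)).mean())  # smoothed path moves less
```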

Journal ArticleDOI
TL;DR: This saliency model is used to predict human eye fixations; it has been tested on three of the most widely used benchmark datasets and compared with eight state-of-the-art saliency models.
Abstract: Based on the fact that human attention is more likely to be attracted by different objects or statistical outliers of a scene, a bottom-up saliency detection model is proposed. Our model regards the saliency patterns of an image as the outliers in a dataset. For an input image, first, each image element is described as a feature vector. The whole image is considered as a dataset, and an image element is classified as a saliency pattern if its corresponding feature vector is an outlier among the dataset. Then, a binary label map can be built to indicate the salient and the nonsalient elements in the image. According to the Boolean map theory, we compute multiple binary maps as a set of Boolean maps which indicate the outliers at multiple levels. Finally, we linearly fuse them into the final saliency map. This saliency model is used to predict human eye fixations, and has been tested on three of the most widely used benchmark datasets and compared with eight state-of-the-art saliency models. In our experiments, we adopt the shuffled area under the curve (sAUC) metric to evaluate the accuracy of our model. The experimental results show that our model outperforms the state-of-the-art models on all three datasets.
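
A stripped-down version of the Boolean-map fusion can be sketched as follows; it thresholds a single feature channel at several levels and averages the binary maps, omitting the outlier classification and multichannel features of the full model.

```python
import numpy as np

def boolean_map_saliency(feature, n_levels=8):
    """Threshold a feature channel at several levels, treat each binary
    map as a Boolean map flagging potential outlier pixels, and linearly
    fuse the maps into one normalized saliency map."""
    lo, hi = feature.min(), feature.max()
    thresholds = np.linspace(lo, hi, n_levels + 2)[1:-1]
    sal = sum((feature > t).astype(np.float64) for t in thresholds) / n_levels
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

# Toy image: a bright blob (statistical outlier) on a dark background.
img = np.zeros((64, 64)); img[24:40, 24:40] = 1.0
img += 0.05 * np.random.default_rng(0).random((64, 64))
sal = boolean_map_saliency(img)
print(sal[32, 32] > sal[4, 4])  # the blob is more salient than the background
```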