Showing papers in "Eurasip Journal on Image and Video Processing in 2013"


Journal ArticleDOI
TL;DR: An automated species identification method for wildlife pictures captured by remote camera traps that uses improved sparse coding spatial pyramid matching (ScSPM), which extracts dense SIFT descriptor and cell-structured LBP as the local features and generates global feature via weighted sparse coding and max pooling using multi-scale pyramid kernel.
Abstract: Image sensors are increasingly being used in biodiversity monitoring, with each study generating many thousands or millions of pictures. Efficiently identifying the species captured by each image is a critical challenge for the advancement of this field. Here, we present an automated species identification method for wildlife pictures captured by remote camera traps. Our process starts with images that are cropped out of the background. We then use improved sparse coding spatial pyramid matching (ScSPM), which extracts dense SIFT descriptors and cell-structured LBP (cLBP) as the local features, generates a global feature via weighted sparse coding and max pooling using a multi-scale pyramid kernel, and classifies the images by a linear support vector machine algorithm. Weighted sparse coding is used to enforce both sparsity and locality of encoding in feature space. We tested the method on a dataset with over 7,000 camera trap images of 18 species from two different field sites, and achieved an average classification accuracy of 82%. Our analysis demonstrates that the combination of SIFT and cLBP can serve as a useful technique for animal species recognition in real, complex scenarios.
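
As a rough illustration of the pooling stage described above, the sketch below max-pools local sparse codes over a multi-scale spatial pyramid into one global feature vector. It is a minimal sketch only: the function name, grid levels, and inputs are assumptions, and the dictionary learning and the weighting of the sparse coding step are omitted. The resulting vector would then be fed to a linear SVM (e.g., sklearn.svm.LinearSVC).

```python
import numpy as np

def spatial_pyramid_max_pool(codes, positions, image_shape, levels=(1, 2, 4)):
    """Max-pool local sparse codes over a multi-scale spatial pyramid.

    codes:     (N, K) sparse codes of N local descriptors (dense SIFT + cLBP).
    positions: (N, 2) (row, col) location of each descriptor in the image.
    Returns one global feature vector of length K * sum(c * c for c in levels).
    """
    H, W = image_shape
    pooled = []
    for cells in levels:
        # assign every descriptor to a pyramid cell at this level
        row_idx = np.minimum(positions[:, 0] * cells // H, cells - 1).astype(int)
        col_idx = np.minimum(positions[:, 1] * cells // W, cells - 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                mask = (row_idx == i) & (col_idx == j)
                pooled.append(codes[mask].max(axis=0) if mask.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)
```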

184 citations


Journal ArticleDOI
TL;DR: A new fast feature extraction strategy that uses the 3D point cloud obtained from the frames in a gait cycle that improves the accuracy significantly, compared with state-of-the-art systems which do not use depth information.
Abstract: This article presents a new approach for gait-based gender recognition using depth cameras that can run in real time. The main contribution of this study is a new fast feature extraction strategy that uses the 3D point cloud obtained from the frames in a gait cycle. For each frame, these points are aligned according to their centroid and grouped. After that, they are projected onto their PCA plane, obtaining a representation of the cycle that is particularly robust against view changes. Then, final discriminative features are computed by first making a histogram of the projected points and then using linear discriminant analysis. To test the method, we have used the DGait database, which is currently the only publicly available database for gait analysis that includes depth information. We have performed experiments on manually labeled cycles and over whole video sequences, and the results show that our method improves the accuracy significantly compared with state-of-the-art systems which do not use depth information. Furthermore, our approach is insensitive to illumination changes, given that it discards the RGB information. That makes the method especially suitable for real applications, as illustrated in the last part of the experiments section.
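
The following sketch illustrates the kind of cycle representation described above: the 3D points gathered over a gait cycle are projected onto their PCA plane and summarized with a 2D histogram. It is a simplified reading of the method, not the authors' implementation; the bin count and the assumption that points are already centered per frame are ours, and the final step would apply linear discriminant analysis (e.g., sklearn's LinearDiscriminantAnalysis) to these histograms.

```python
import numpy as np

def cycle_descriptor(points, bins=16):
    """Project the 3D points accumulated over one gait cycle onto their
    principal (PCA) plane and summarize them with a 2D histogram.

    points: (N, 3) point cloud gathered from all frames of the cycle,
            assumed to be centered per frame on its centroid.
    """
    pts = points - points.mean(axis=0)               # global centering
    # PCA: the two leading right singular vectors span the projection plane
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    proj = pts @ vt[:2].T                            # (N, 2) projected points
    hist, _, _ = np.histogram2d(proj[:, 0], proj[:, 1], bins=bins, density=True)
    return hist.ravel()
```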

181 citations


Journal ArticleDOI
Xiaolin Shen1, Lu Yu1
TL;DR: This article proposes a CU splitting early termination algorithm to reduce the heavy computational burden on encoder and is modeled as a binary classification problem, on which a support vector machine (SVM) is applied.
Abstract: High efficiency video coding (HEVC) is the latest video coding standard that has been developed by JCT-VC. It employs plenty of efficient coding algorithms (e.g., highly flexible quad-tree coding block partitioning), and outperforms H.264/AVC by a 35–43% bitrate reduction. However, it imposes enormous computational complexity on the encoder due to the optimization processing in the efficient coding tools, especially the rate distortion optimization on coding unit (CU), prediction unit, and transform unit. In this article, we propose a CU splitting early termination algorithm to reduce the heavy computational burden on the encoder. CU splitting is modeled as a binary classification problem, on which a support vector machine (SVM) is applied. In order to reduce the impact of outliers as well as to maintain the RD performance when a misclassification occurs, the RD loss due to misclassification is introduced as a weight in SVM training. Efficient and representative features are extracted and optimized by a wrapper approach to eliminate dependency on video content as well as on encoding configurations. Experimental results show that the proposed algorithm can achieve about 44.7% complexity reduction on average with only 1.35% BD-rate increase under the “random access” configuration, and 41.9% time saving with 1.66% BD-rate increase under the “low delay” setting, compared with the HEVC reference software.
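
A minimal sketch of the weighted-SVM idea above, using scikit-learn's per-sample weights to stand in for the RD-loss weighting; the feature set, file names, and kernel are hypothetical, and the wrapper-based feature selection is not shown.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: one feature row per CU (e.g., current RD cost,
# neighbouring CU depths, residual statistics); label 1 = "split",
# label 0 = "do not split"; rd_loss = RD penalty incurred if this sample
# were misclassified, used as its training weight.
X = np.load("cu_features.npy")          # assumed file names
y = np.load("cu_split_labels.npy")
rd_loss = np.load("cu_rd_loss.npy")

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y, sample_weight=rd_loss)    # RD loss down-weights harmless errors

def early_terminate(cu_features):
    """At encoding time: skip further quad-tree recursion when the classifier
    predicts 'do not split'; otherwise fall back to the normal RDO search."""
    return clf.predict(cu_features.reshape(1, -1))[0] == 0
```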

165 citations


Journal ArticleDOI
TL;DR: The purpose of this survey is to give an overview of landmarking algorithms and their progress over the last decade, categorize them and show comparative performance statistics of the state of the art.
Abstract: Face landmarking, defined as the detection and localization of certain characteristic points on the face, is an important intermediary step for many subsequent face processing operations that range from biometric recognition to the understanding of mental states. Despite its conceptual simplicity, this computer vision problem has proven extremely challenging due to inherent face variability as well as the multitude of confounding factors such as pose, expression, illumination and occlusions. The purpose of this survey is to give an overview of landmarking algorithms and their progress over the last decade, categorize them and show comparative performance statistics of the state of the art. We discuss the main trends and indicate current shortcomings with the expectation that this survey will provide further impetus for the much needed high-performance, real-life face landmarking operating at video rates.

130 citations


Journal ArticleDOI
TL;DR: A novel integrated approach which exploits features of uniform robust scale invariant feature transform (UR-SIFT) and PIIFD and is robust against low content contrast of color images and large content, appearance, and scale changes between color and other retinal image modalities like the fluorescein angiography.
Abstract: Existing algorithms based on scale invariant feature transform (SIFT) and Harris corners, such as edge-driven dual-bootstrap iterative closest point and Harris-partial intensity invariant feature descriptor (PIIFD), respectively, have been shown to be robust in registering multimodal retinal images. However, they fail to register color retinal images with other modalities in the presence of large content or scale changes. Moreover, these approaches need preprocessing operations such as image resizing to do well. This restricts the application of image registration for further analysis such as change detection and image fusion. Motivated by the need for efficient registration of multimodal retinal image pairs, this paper introduces a novel integrated approach which exploits features of uniform robust scale invariant feature transform (UR-SIFT) and PIIFD. The approach is robust against low content contrast of color images and large content, appearance, and scale changes between color and other retinal image modalities like fluorescein angiography. Due to the low efficiency of the standard SIFT detector for multimodal images, the UR-SIFT algorithm extracts highly stable and distinctive features in the full distribution of location and scale in images. As a result, the extracted feature points are sufficient in number and repeatable. Moreover, the PIIFD descriptor is symmetric to contrast, which makes it suitable for robust multimodal image registration. After the UR-SIFT feature extraction and the PIIFD descriptor generation in images, an initial cross-matching process is performed, followed by a mismatch elimination algorithm. Our dataset consists of 120 pairs of multimodal retinal images. Experimental results show that UR-SIFT-PIIFD outperforms Harris-PIIFD and similar algorithms in terms of efficiency and positional accuracy.

97 citations


Journal ArticleDOI
TL;DR: The robustness to noise of the eight following LBP-based descriptors is evaluated: improved LBP, median binary patterns (MBP), local ternary patterns (LTP), improved LTP (ILTP), local quinary patterns, robust LBP, and fuzzy LBP (FLBP).
Abstract: Local binary pattern (LBP) operators have become commonly used texture descriptors in recent years. Several new LBP-based descriptors have been proposed, of which some aim at improving robustness to noise. To do this, the thresholding and encoding schemes used in the descriptors are modified. In this article, the robustness to noise of the eight following LBP-based descriptors is evaluated: improved LBP, median binary patterns (MBP), local ternary patterns (LTP), improved LTP (ILTP), local quinary patterns, robust LBP, and fuzzy LBP (FLBP). To put their performance into perspective, they are compared to three well-known reference descriptors: the classic LBP, Gabor filter banks (GF), and standard descriptors derived from gray-level co-occurrence matrices. In addition, a roughly five times faster implementation of the FLBP descriptor is presented, and a new descriptor which we call shift LBP is introduced as an even faster approximation to the FLBP. The texture descriptors are compared and evaluated on six texture datasets: Brodatz, KTH-TIPS2b, Kylberg, Mondial Marmi, UIUC, and a Virus texture dataset. After optimizing all parameters for each dataset, the descriptors are evaluated under increasing levels of additive Gaussian white noise. The discriminating power of the texture descriptors is assessed using tenfold cross-validation of a nearest neighbor classifier. The results show that several of the descriptors perform well at low levels of noise, while they all suffer, to different degrees, from higher levels of introduced noise. In our tests, ILTP and FLBP show an overall good performance on several datasets. The GF are often very noise robust compared to the LBP family under moderate to high levels of noise but not necessarily the best descriptor under low levels of added noise. In our tests, MBP is neither a good texture descriptor nor stable to noise.
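
To make the modified thresholding schemes concrete, here is a minimal local ternary pattern (LTP) sketch for the 8-neighborhood at radius 1; the tolerance t is an assumed parameter, and this is not the optimized implementation benchmarked in the article.

```python
import numpy as np

def ltp_histograms(img, t=5):
    """Local ternary patterns (LTP) on the 8-neighbourhood, radius 1.

    Neighbours within +/- t of the centre code to 0, which is what gives LTP
    its noise tolerance compared with plain LBP. The ternary code is split
    into an 'upper' and a 'lower' binary pattern, each summarised by a
    256-bin histogram.
    """
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]
    # 8 neighbours, clockwise from the top-left corner of each 3x3 window
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        upper |= ((n >= c + t).astype(np.int32) << bit)
        lower |= ((n <= c - t).astype(np.int32) << bit)
    h_up = np.bincount(upper.ravel(), minlength=256)
    h_lo = np.bincount(lower.ravel(), minlength=256)
    return np.concatenate([h_up, h_lo])
```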

86 citations


Journal ArticleDOI
TL;DR: An automated framework for photo identification of chimpanzees including face detection, face alignment, and face recognition is presented, which can be used by biologists, researchers, and gamekeepers to estimate population sizes faster and more precisely than the current frameworks.
Abstract: Due to the ongoing biodiversity crisis, many species including great apes like chimpanzees are on the brink of extinction. Consequently, there is an urgent need to protect the remaining populations of threatened species. To overcome the catastrophic decline of biodiversity, biologists and gamekeepers recently started to use remote cameras and recording devices for wildlife monitoring in order to estimate the size of remaining populations. However, the manual analysis of the resulting image and video material is extremely tedious, time consuming, and cost intensive. To overcome the burden of time-consuming routine work, we have recently started to develop computer vision algorithms for automated chimpanzee detection and identification of individuals. Based on the assumption that humans and great apes share similar properties of the face, we proposed to adapt and extend face detection and recognition algorithms, originally developed to recognize humans, for chimpanzee identification. In this paper, we not only summarize our earlier work in the field but also extend our previous approaches towards a more robust system which is less prone to difficult lighting situations, various poses and expressions, and partial occlusion by branches, leaves, or other individuals. To overcome the limitations of our previous work, we combine holistic global features and locally extracted descriptors using a decision fusion scheme. We present an automated framework for photo identification of chimpanzees including face detection, face alignment, and face recognition. We thoroughly evaluate our proposed algorithms on two datasets of captive and free-living chimpanzee individuals which were annotated by experts. In three experiments we show that the presented framework outperforms previous approaches in the field of great ape identification and achieves promising results. Therefore, our system can be used by biologists, researchers, and gamekeepers to estimate population sizes faster and more precisely than the current frameworks. Thus, the proposed framework for chimpanzee identification has the potential to open up new avenues in efficient wildlife monitoring and can help researchers to develop innovative protection schemes in the future.
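
The decision fusion step mentioned above can be illustrated with a simple score-level fusion of the holistic and local matchers; the min-max normalization and the weight alpha are assumptions, not the fusion scheme actually used in the paper.

```python
import numpy as np

def fuse_decisions(global_scores, local_scores, alpha=0.5):
    """Score-level fusion of a holistic (global) face matcher and a
    local-descriptor matcher. Scores are per-identity similarity vectors,
    min-max normalised before a weighted sum (alpha is an assumed weight).
    """
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    fused = alpha * norm(global_scores) + (1 - alpha) * norm(local_scores)
    return int(np.argmax(fused))        # index of the predicted identity
```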

69 citations


Journal ArticleDOI
TL;DR: This article explains how several single image defogging methods work using a color ellipsoid framework, which is extended with a Gaussian mixture model to give intuition in more complex observation windows, such as observations at depth discontinuities.
Abstract: The goal of this article is to explain how several single image defogging methods work using a color ellipsoid framework. The foundation of the framework is the atmospheric dichromatic model which is analogous to the reflectance dichromatic model. A key step in single image defogging is the ability to estimate relative depth. Therefore, properties of the color ellipsoids are tied to depth cues within an image. This framework is then extended using a Gaussian mixture model to account for multiple mixtures which gives intuition in more complex observation windows, such as observations at depth discontinuities which is a common problem in single image defogging. A few single image defogging methods are analyzed within this framework and surprisingly tied together with a common approach in using a dark prior. A new single image defogging method based on the color ellipsoid framework is introduced and compared to existing methods.
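
For readers unfamiliar with the "dark prior" the analysis ties these methods to, the following is a standard dark-channel-style transmission estimate (in the spirit of He et al.), shown only to make that prior concrete; it is not the color ellipsoid method of this paper, and the patch size, omega, and atmospheric-light heuristic are assumptions.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_transmission(img, patch=15, omega=0.95):
    """Dark-channel-style transmission estimate for a float RGB image in [0, 1].

    The dark channel (minimum over colour channels and a local patch) is low
    for fog-free regions, so 1 - omega * dark_channel approximates the
    transmission, i.e. a relative depth cue.
    """
    dark = minimum_filter(img.min(axis=2), size=patch)     # dark channel
    # crude atmospheric light: mean colour of the 100 brightest dark-channel pixels
    idx = np.unravel_index(np.argsort(dark, axis=None)[-100:], dark.shape)
    A = img[idx].mean(axis=0)
    norm_dark = minimum_filter((img / A).min(axis=2), size=patch)
    t = 1.0 - omega * norm_dark                            # transmission map
    return np.clip(t, 0.1, 1.0)
```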

62 citations


Journal ArticleDOI
TL;DR: A fully automated method for the detection and tracking of elephants in wildlife video which has been collected by biologists in the field is proposed and shows that both near- and far-distant elephants can be detected and tracked reliably.
Abstract: Biologists often have to investigate large amounts of video in behavioral studies of animals. These videos are usually not sufficiently indexed, which makes the finding of objects of interest a time-consuming task. We propose a fully automated method for the detection and tracking of elephants in wildlife video which has been collected by biologists in the field. The method dynamically learns a color model of elephants from a few training images. Based on the color model, we localize elephants in video sequences with different backgrounds and lighting conditions. We exploit temporal clues from the video to improve the robustness of the approach and to obtain spatially and temporally consistent detections. The proposed method detects elephants (and groups of elephants) of different sizes and poses performing different activities. The method is robust to occlusions (e.g., by vegetation) and correctly handles camera motion and different lighting conditions. Experiments show that both near- and far-distant elephants can be detected and tracked reliably. The proposed method gives biologists efficient and direct access to their video collections, which facilitates further behavioral and ecological studies. The method does not impose hard constraints tied specifically to elephants and is thus easily adaptable to other animal species.

60 citations


Journal ArticleDOI
TL;DR: This paper illustrates and exemplifies the good practices to be followed in using machine learning in modeling perceptual mechanisms and proves the ability of ML-based approaches to address visual quality assessment.
Abstract: Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms make it possible to tackle the quality assessment task from a different perspective, as the eventual goal is to mimic quality perception instead of designing an explicit model of the human visual system. Several studies have already proved the ability of ML-based approaches to address visual quality assessment; nevertheless, these paradigms are highly prone to overfitting, and their overall reliability may be questionable. In fact, a prerequisite for successfully using ML in modeling perceptual mechanisms is a profound understanding of the advantages and limitations that characterize learning machines. This paper illustrates and exemplifies the good practices to be followed when using ML in modeling perceptual mechanisms for visual quality assessment.

59 citations


Journal ArticleDOI
TL;DR: A novel face-based matcher composed of a multi-resolution hierarchy of patch-based feature descriptors for periocular recognition - recognition based on the soft tissue surrounding the eye orbit is developed.
Abstract: This work develops a novel face-based matcher composed of a multi-resolution hierarchy of patch-based feature descriptors for periocular recognition - recognition based on the soft tissue surrounding the eye orbit. The novel patch-based framework for periocular recognition is compared against other feature descriptors and a commercial full-face recognition system on a set of four uniquely challenging face corpora. The framework, hierarchical three-patch local binary pattern, is compared against the three-patch local binary pattern and the uniform local binary pattern on the soft tissue area around the eye orbit. Each challenge set was chosen for its particular non-ideal face representations that may be summarized as matching against pose, illumination, expression, aging, and occlusions. The MORPH corpus consists of two mug shot datasets labeled Album 1 and Album 2. The Album 1 corpus is the more challenging of the two due to its incorporation of print photographs (legacy) captured with a variety of cameras from the late 1960s to 1990s. The second challenge dataset is the FRGC still image set. Corpus three, the Georgia Tech face database, is a small corpus but one that contains faces under pose, illumination, expression, and eye region occlusions. The final challenge dataset chosen is the Notre Dame Twins database, which comprises 100 sets of identical twins and 1 set of triplets. The proposed framework reports top periocular performance against each dataset, as measured by rank-1 accuracy: (1) MORPH Album 1, 33.2%; (2) FRGC, 97.51%; (3) Georgia Tech, 92.4%; and (4) Notre Dame Twins, 98.03%. Furthermore, this work shows that the proposed periocular matcher (using only a small section of the face, about the eyes) compares favorably to a commercial full-face matcher.

Journal ArticleDOI
TL;DR: This article proposes an efficient coarse-to-fine dual scale technique for cavity detection in chest radiographs that outperforms other existing techniques with respect to true cavity detection rate and segmentation accuracy.
Abstract: Although many lung disease diagnostic procedures can benefit from computer-aided detection (CAD), current CAD systems are mainly designed for lung nodule detection. In this article, we focus on tuberculosis (TB) cavity detection because of its highly infectious nature. Infectious TB, such as adult-type pulmonary TB (APTB) and HIV-related TB, continues to be a public health problem of global proportion, especially in the developing countries. Cavities in the upper lung zone provide a useful cue to radiologists for potential infectious TB. However, the superimposed anatomical structures in the lung field hinder effective identification of these cavities. In order to address the deficiency of existing computer-aided TB cavity detection methods, we propose an efficient coarse-to-fine dual scale technique for cavity detection in chest radiographs. Gaussian-based matching, local binary pattern, and gradient orientation features are applied at the coarse scale, while circularity, gradient inverse coefficient of variation and Kullback–Leibler divergence measures are applied at the fine scale. Experimental results demonstrate that the proposed technique outperforms other existing techniques with respect to true cavity detection rate and segmentation accuracy.
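
As one example of the fine-scale measures listed above, a simple circularity score for a candidate cavity region might look like the sketch below; the boundary-pixel approximation of the perimeter is a simplification, and the Gaussian matching, gradient inverse coefficient of variation, and Kullback–Leibler measures are not shown.

```python
import numpy as np

def circularity(region_mask):
    """Circularity of a candidate cavity region: 4*pi*area / perimeter^2
    (1.0 for a perfect disc). The perimeter is approximated by counting
    boundary pixels of the binary mask."""
    mask = region_mask.astype(bool)
    area = mask.sum()
    padded = np.pad(mask, 1)
    # interior pixels: all four 4-neighbours lie inside the mask
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    return 4 * np.pi * area / (perimeter ** 2 + 1e-12)
```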

Journal ArticleDOI
TL;DR: Using the proposed methodology, a relatively high performance (up to 90%) of affect recognition is obtained and several fusion techniques are used to combine the information extracted from the audio and video contents of music video clips.
Abstract: Nowadays, tags play an important role in the search and retrieval process in multimedia content sharing social networks. As the amount of multimedia contents explosively increases, it is a challenging problem to find a content that will be appealing to the users. Furthermore, the retrieval of multimedia contents, which can match users’ current mood or affective state, can be of great interest. One approach to indexing multimedia contents is to determine the potential affective state, which they can induce in users. In this paper, multimedia content analysis is performed to extract affective audio and visual cues from different music video clips. Furthermore, several fusion techniques are used to combine the information extracted from the audio and video contents of music video clips. We show that using the proposed methodology, a relatively high performance (up to 90%) of affect recognition is obtained.

Journal ArticleDOI
TL;DR: The obtained experimental results show the relevance of the idea of combining the XBee (ZigBee or Wireless Fidelity) protocol, known for its high noise immunity, with hyperchaotic encryption to secure communications.
Abstract: In this paper, we propose and demonstrate experimentally a new wireless digital encryption hyperchaotic communication system based on radio frequency (RF) communication protocols for secure real-time data or image transmission. A reconfigurable hardware architecture is developed to ensure the interconnection between two field programmable gate array development platforms through XBee RF modules. To ensure the synchronization and encryption of data between the transmitter and the receiver, a feedback masking hyperchaotic synchronization technique based on a dynamic feedback modulation has been implemented to digitally synchronize the encrypter hyperchaotic systems. The obtained experimental results show the relevance of the idea of combining the XBee (ZigBee or Wireless Fidelity) protocol, known for its high noise immunity, with hyperchaotic encryption to secure communications. In fact, we have recovered the information data or image correctly after real-time encrypted data or image transmission tests at a maximum distance (indoor range) of more than 30 m and with a maximum digital modulation rate of 625,000 baud, allowing a wireless encrypted video transmission rate of 25 images per second at a spatial resolution of 128 × 128 pixels. The obtained performance of the communication system is suitable for secure data or image transmissions in wireless sensor networks.

Journal ArticleDOI
TL;DR: A high performance face recognition system based on local binary pattern (LBP) using the probability distribution functions (PDFs) of pixels in different mutually independent color channels, which is robust to frontal homogeneous illumination and planar rotation, is proposed.
Abstract: In this article, a high performance face recognition system based on local binary pattern (LBP) using the probability distribution functions (PDFs) of pixels in different mutually independent color channels, which is robust to frontal homogeneous illumination and planar rotation, is proposed. The illumination of faces is enhanced by using a state-of-the-art technique based on discrete wavelet transform and singular value decomposition. After equalization, face images are segmented by using local successive mean quantization transform followed by a skin color-based face detection system. The Kullback–Leibler distance between the concatenated PDFs of a given face obtained by LBP and the concatenated PDFs of each face in the database is used as a metric in the recognition process. Various decision fusion techniques have been used in order to improve the recognition rate. The proposed system has been tested on the FERET, HP, and Bosphorus face databases. The proposed system is compared with conventional and state-of-the-art techniques. The recognition rate obtained using the FVF approach for the FERET database is 99.78%, compared with 79.60% and 68.80% for conventional gray-scale LBP and principal component analysis-based face recognition techniques, respectively.
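
A minimal sketch of the matching metric described above: the Kullback–Leibler distance between a probe's concatenated LBP PDFs and each gallery face. The symmetrized form and the epsilon smoothing are assumptions; the paper may use the one-sided divergence.

```python
import numpy as np

def kl_distance(p, q, eps=1e-10):
    """Symmetrised Kullback-Leibler distance between two concatenated
    LBP histograms, normalised to PDFs before comparison."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def recognise(probe_pdf, gallery_pdfs):
    """Return the index of the gallery face with the smallest KL distance."""
    distances = [kl_distance(probe_pdf, g) for g in gallery_pdfs]
    return int(np.argmin(distances))
```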

Journal ArticleDOI
Yun Tie1, Ling Guan1
TL;DR: An efficient and robust method for facial landmark detection and tracking from video sequences using a kernel correlation analysis approach to find the detection likelihood by maximizing a similarity criterion between the target points and the candidate points.
Abstract: Facial landmarks are a set of salient points, usually located on the corners, tips or mid points of the facial components. Reliable facial landmarks and their associated detection and tracking algorithms can be widely used for representing the important visual features for face registration and expression recognition. In this paper we propose an efficient and robust method for facial landmark detection and tracking from video sequences. We select 26 landmark points on the facial region to facilitate the analysis of human facial expressions. They are detected in the first input frame by the scale invariant feature based detectors. Multiple Differential Evolution-Markov Chain (DE-MC) particle filters are applied for tracking these points through the video sequences. A kernel correlation analysis approach is proposed to find the detection likelihood by maximizing a similarity criterion between the target points and the candidate points. The detection likelihood is then integrated into the tracker’s observation likelihood. Sampling efficiency is improved and minimal amount of computation is achieved by using the intermediate results obtained in particle allocations. Three public databases are used for experiments and the results demonstrate the effectiveness of our method.

Journal ArticleDOI
TL;DR: A camera-based lane departure warning system implemented on a field programmable gate array (FPGA) device used as a driver assistance system, which effectively prevents accidents given that it is endowed with the advantages of FPGA technology, including high performance for digital image processing applications, compactness, and low cost.
Abstract: This paper presents a camera-based lane departure warning system implemented on a field programmable gate array (FPGA) device. The system is used as a driver assistance system, which effectively prevents accidents given that it is endowed with the advantages of FPGA technology, including high performance for digital image processing applications, compactness, and low cost. The main contributions of this work are threefold. (1) An improved vanishing point-based steerable filter is introduced and implemented on an FPGA device. Using the vanishing point to guide the orientation at each pixel, this algorithm works well in complex environments. (2) An improved vanishing point-based parallel Hough transform is proposed. Unlike the traditional Hough transform, our improved version moves the coordinate origin to the estimated vanishing point to reduce storage requirements and enhance detection capability. (3) A prototype based on the FPGA is developed. With improvements in the vanishing point-based steerable filter and vanishing point-based parallel Hough transform, the prototype can be used in complex weather and lighting conditions. Experiments conducted on an evaluation platform and on actual roads illustrate the effective performance of the proposed system.
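
Contribution (2) above can be illustrated with a software sketch of a Hough accumulator whose origin is moved to the estimated vanishing point, so that lane candidates keep small rho values and the accumulator stays compact; the bin counts and rho range are assumptions, and the actual FPGA implementation is of course very different.

```python
import numpy as np

def hough_lines(edge_points, vanishing_point, n_theta=180, n_rho=128, rho_max=200):
    """Hough accumulation with the coordinate origin at the vanishing point.

    Lane boundaries pass near the vanishing point, so their rho values stay
    small and the rho axis of the accumulator can be kept short.
    edge_points: iterable of (y, x) edge pixel coordinates.
    """
    vy, vx = vanishing_point
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for y, x in edge_points:
        rho = (x - vx) * cos_t + (y - vy) * sin_t          # one rho per theta
        rho_idx = np.round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)).astype(int)
        valid = (rho_idx >= 0) & (rho_idx < n_rho)
        acc[rho_idx[valid], np.nonzero(valid)[0]] += 1
    return acc, thetas
```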

Journal ArticleDOI
TL;DR: A semi-automatic selection process for content sets for subjective experiments will be proposed for three-dimensional testing, a newer field that requires new considerations for scene selection.
Abstract: This paper presents recommended techniques for choosing video sequences for subjective experiments. Subjective video quality assessment is a well-understood field, yet scene selection is often driven by convenience or content availability. Three-dimensional testing is a newer field that requires new considerations for scene selection. The impact of experiment design on best practices for scene selection will also be considered. A semi-automatic selection process for content sets for subjective experiments will be proposed.

Journal ArticleDOI
TL;DR: This work presents a stereo vision-based system that is able to detect bees at the beehive entrance and is sufficiently reliable for tracking, and proposes a detect-before-track approach that employs two innovating methods: hybrid segmentation using both intensity and depth images, and tuned 3D multi-target tracking based on the Kalman filter and Global Nearest Neighbor.
Abstract: In response to recent needs of biologists, we lay the foundations for a real-time stereo vision-based system for monitoring flying honeybees in three dimensions at the beehive entrance. Tracking bees is a challenging task as they are numerous, small, and fast-moving targets with chaotic motion. Contrary to current state-of-the-art approaches, we propose to tackle the problem in 3D space. We present a stereo vision-based system that is able to detect bees at the beehive entrance and is sufficiently reliable for tracking. Furthermore, we propose a detect-before-track approach that employs two innovative methods: hybrid segmentation using both intensity and depth images, and tuned 3D multi-target tracking based on the Kalman filter and Global Nearest Neighbor. Tests on robust ground truths for segmentation and tracking have shown that our segmentation and tracking methods clearly outperform standard 2D approaches.
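
A minimal software sketch of the tracking building blocks named above: a constant-velocity Kalman filter in 3D and a Global Nearest Neighbor assignment solved with the Hungarian algorithm. The noise covariances, frame rate, and gating threshold are assumptions, not the tuned values of the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Constant-velocity Kalman filter in 3D: state = [x, y, z, vx, vy, vz]
dt = 1.0 / 30.0                              # assumed frame rate
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)    # state transition
H = np.hstack([np.eye(3), np.zeros((3, 3))]) # only position is observed
Q = 1e-2 * np.eye(6)                         # process noise (assumed)
R = 1e-1 * np.eye(3)                         # measurement noise (assumed)

def kf_predict(x, P):
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

def gnn_assign(predicted_positions, detections, gate=0.05):
    """Global Nearest Neighbour: one-to-one track/detection assignment that
    minimises the total Euclidean distance, with an assumed gating threshold."""
    cost = np.linalg.norm(predicted_positions[:, None, :] -
                          detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```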

Journal ArticleDOI
TL;DR: It is shown that the developed video tracking system outperforms level set-based systems that do not use prior shape knowledge, working well even where these systems fail.
Abstract: Over the years, maritime surveillance has become increasingly important due to the recurrence of piracy. While surveillance has traditionally been a manual task using crew members in lookout positions on parts of the ship, much work is being done to automate this task using digital cameras coupled with a computer that uses image processing techniques to intelligently track objects in the maritime environment. One such technique is level set segmentation, which evolves a contour to objects of interest in a given image. This method works well but gives incorrect segmentation results when a target object is corrupted in the image. This paper explores the possibility of factoring prior knowledge of a ship’s shape into level set segmentation to improve results, a concept that is unaddressed in the maritime surveillance problem. It is shown that the developed video tracking system outperforms level set-based systems that do not use prior shape knowledge, working well even where these systems fail.

Journal ArticleDOI
TL;DR: In this article, a two-stage method for high density noise suppression while preserving the image details is proposed, where the first stage applies an iterative impulse detector, exploiting the image entropy, to identify the corrupted pixels and then employs an adaptive iterative mean filter to restore them.
Abstract: In this paper, we suggest a general model for the fixed-valued impulse noise and propose a two-stage method for high density noise suppression while preserving the image details. In the first stage, we apply an iterative impulse detector, exploiting the image entropy, to identify the corrupted pixels and then employ an Adaptive Iterative Mean filter to restore them. The filter is adaptive in terms of the number of iterations, which is different for each noisy pixel, according to the Euclidean distance from the nearest uncorrupted pixel. Experimental results show that the proposed filter is fast and outperforms the best existing techniques in both objective and subjective performance measures.
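
The two-stage idea can be sketched as below, with one deliberate simplification: instead of the entropy-based detector of the paper, pixels equal to the assumed impulse values are flagged as corrupted. The iterative mean filling still illustrates why pixels far from clean data need more iterations.

```python
import numpy as np

def restore_impulse_noise(img, noise_values=(0, 255)):
    """Simplified two-stage restoration: flag fixed-valued impulse pixels,
    then repeatedly replace each flagged pixel by the mean of its uncorrupted
    3x3 neighbours until every flagged pixel is filled. Pixels far from clean
    data are filled in later passes, mimicking the distance-adaptive
    iteration count of the adaptive iterative mean filter."""
    out = img.astype(float)
    noisy = np.isin(img, noise_values)
    while noisy.any():
        filled_any = False
        for y, x in zip(*np.nonzero(noisy)):
            y0, y1 = max(0, y - 1), min(img.shape[0], y + 2)
            x0, x1 = max(0, x - 1), min(img.shape[1], x + 2)
            window_ok = ~noisy[y0:y1, x0:x1]
            if window_ok.any():
                out[y, x] = out[y0:y1, x0:x1][window_ok].mean()
                noisy[y, x] = False
                filled_any = True
        if not filled_any:        # safety net for pathological inputs
            break
    return out.astype(img.dtype)
```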

Journal ArticleDOI
TL;DR: Comparative analysis reveals that LBP8,1 is the most suitable texture analysis operator for the proposed system due to its perfect classification performance along with the lowest degree of computational complexity.
Abstract: Fault diagnosis of induction motors in the practical industrial fields is always a challenging task due to the difficulty that lies in exact identification of fault signatures at various motor operating conditions in the presence of background noise produced by other mechanical subsystems. Several signal processing approaches have been adopted so far to mitigate the effect of this background noise in the acquired sensor signal so that fault-related features can be extracted effectively. Addressing this issue, this paper proposes a new approach for fault diagnosis of induction motors utilizing two-dimensional texture analysis based on local binary patterns (LBPs). Firstly, time domain vibration signals acquired from the operating motor are converted into two-dimensional gray-scale images. Then, discriminating texture features are extracted from these images employing LBP operator. These local feature descriptors are later utilized by multi-class support vector machine to identify faults of induction motors. The efficient texture analysis capability as well as the gray-scale invariance property of the LBP operators enables the proposed system to achieve impressive diagnostic performance even in the presence of high background noise. Comparative analysis reveals that LBP8,1 is the most suitable texture analysis operator for the proposed system due to its perfect classification performance along with the lowest degree of computational complexity.
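
A minimal sketch of the signal-to-image-to-LBP pipeline described above, using scikit-image's LBP and an SVM; the image width, LBP parameters, and data variables are assumptions rather than the exact configuration of the paper.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def signal_to_image(signal, width=64):
    """Reshape a 1D vibration signal into a 2D grey-scale image by stacking
    consecutive segments row by row and scaling to [0, 255]."""
    n = (len(signal) // width) * width
    img = np.asarray(signal[:n], dtype=float).reshape(-1, width)
    img = 255 * (img - img.min()) / (img.max() - img.min() + 1e-12)
    return img.astype(np.uint8)

def lbp_feature(img, P=8, R=1):
    """Uniform LBP(P, R) histogram of the vibration image."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    return np.bincount(codes.astype(int).ravel(), minlength=P + 2)

# Hypothetical training setup: 'signals' is a list of vibration recordings,
# 'labels' the corresponding fault classes.
# X = np.array([lbp_feature(signal_to_image(s)) for s in signals])
# clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, labels)
```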

Journal ArticleDOI
TL;DR: Experimental results on a database, including 480 retinal images obtained from 40 subjects of DRIVE dataset and 40 subjects from STARE dataset, demonstrated an average true recognition accuracy rate equal to 100% for the proposed method.
Abstract: This paper presents a new human recognition method based on features extracted from retinal images. The proposed method is composed of several steps, including feature extraction, a phase correlation technique, and feature matching for recognition. In the proposed method, the Harris corner detector is used for feature extraction. Then, the phase correlation technique is applied to estimate the rotation angle of head or eye movement in front of a retina fundus camera. Finally, a new similarity function is used to compute the similarity between features of different retina images. Experimental results on a database comprising 480 retinal images obtained from 40 subjects of the DRIVE dataset and 40 subjects of the STARE dataset demonstrated an average true recognition accuracy of 100% for the proposed method. The success rate and the number of images used show the effectiveness of the proposed method in comparison to counterpart methods.
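
For reference, basic phase correlation looks like the sketch below; to estimate a rotation angle as in the method above, the same operation is typically applied to images resampled into polar coordinates, where rotation becomes a shift along the angular axis. This is a generic sketch, not the paper's exact procedure.

```python
import numpy as np

def phase_correlation(a, b):
    """Peak of the inverse FFT of the normalised cross-power spectrum gives
    the integer shift between two images (sign depends on which image is
    taken as the reference)."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    cross = A * np.conj(B)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12))
    dy, dx = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    # wrap shifts larger than half the image size to negative values
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return dy, dx
```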

Journal ArticleDOI
TL;DR: There is a significant advantage in using SIFT classification, the classification-based attack is robust against different SIFT implementations, and the attack is able to impair a state-of-the-art SIFT-based copy-move detector in realistic cases.
Abstract: Copy-move forgeries are very common image manipulations that are often carried out with malicious intents. Among the techniques devised by the ‘Image Forensic’ community, those relying on scale invariant feature transform (SIFT) features are the most effective ones. In this paper, we approach the copy-move scenario from the perspective of an attacker whose goal is to remove such features. The attacks conceived so far against SIFT-based forensic techniques implicitly assume that all SIFT keypoints have similar properties. On the contrary, we base our attacking strategy on the observation that it is possible to classify them in different typologies. Also, one may devise attacks tailored to each specific SIFT class, thus improving the performance in terms of removal rate and visual quality. To validate our ideas, we propose to use a SIFT classification scheme based on the gray scale histogram of the neighborhood of SIFT keypoints. Once the classification is performed, we then attack the different classes by means of class-specific methods. Our experiments lead to three interesting results: (1) there is a significant advantage in using SIFT classification, (2) the classification-based attack is robust against different SIFT implementations, and (3) we are able to impair a state-of-the-art SIFT-based copy-move detector in realistic cases.

Journal ArticleDOI
TL;DR: A real-time single-pass CCA algorithm that adopts the pixel as a scan unit while the line as a labeling unit and manages the correspondence of labels between adjacent rows by designing a multi-layer-index structure is proposed.
Abstract: Due to the demand for real-time processing in real-time automatic target recognition (RTATR) systems, fast connected components analysis (CCA) is significant for RTATR performance improvement. Conventional single-pass CCA algorithms need horizontal blanking periods to resolve equivalences, which makes them difficult to apply when the streamed data is transmitted without horizontal blanking periods. In this paper, a real-time single-pass CCA algorithm is proposed. Unlike the conventional ones, we adopt the pixel as the scan unit and the line as the labeling unit, and manage the correspondence of labels between adjacent rows by designing a multi-layer-index structure. Equivalences are resolved while the image is being scanned, without extra processing time. The proposed algorithm is suitable for hardware acceleration, and the streamed image data can be processed during image transmission without horizontal blanking periods. Experimental results indicate that the hardware-accelerated algorithm achieves real-time CCA in an RTATR system.

Journal ArticleDOI
Ming Xi1, Lianghao Wang1, Qingqing Yang1, Dongxiao Li1, Ming Zhang1 
TL;DR: A depth-image-based rendering (DIBR) method with spatial and temporal texture synthesis is presented, which combines the temporally stationary scene information extracted from the input video and spatial texture in the current frame to fill the disoccluded areas in the virtual views.
Abstract: A depth-image-based rendering (DIBR) method with spatial and temporal texture synthesis is presented in this article. Theoretically, the DIBR algorithm can be used to generate arbitrary virtual views of the same scene in a three-dimensional television system. But the disoccluded area, which is occluded in the original views and becomes visible in the virtual views, makes it very difficult to obtain high image quality in the extrapolated views. The proposed view synthesis method combines the temporally stationary scene information extracted from the input video and spatial texture in the current frame to fill the disoccluded areas in the virtual views. Firstly, the current texture image and a stationary scene image, which is extracted from the input video, are warped to the same virtual perspective position by the DIBR method. Then, the two virtual images are merged together to reduce the hole regions and maintain the temporal consistency of these areas. Finally, an oriented exemplar-based inpainting method is utilized to eliminate the remaining holes. Experimental results are shown to demonstrate the performance and advantage of the proposed method compared with other view synthesis methods.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed scheme achieves much better performance than the existing lossy compression scheme for pixel-value encrypted images and also similar performance to the state-of-the-art lossy compression for pixel permutation-based encrypted images.
Abstract: Compression of encrypted data draws much attention in recent years due to the security concerns in a service-oriented environment such as cloud computing. We propose a scalable lossy compression scheme for images having their pixel value encrypted with a standard stream cipher. The encrypted data are simply compressed by transmitting a uniformly subsampled portion of the encrypted data and some bitplanes of another uniformly subsampled portion of the encrypted data. At the receiver side, a decoder performs content-adaptive interpolation based on the decrypted partial information, where the received bit plane information serves as the side information that reflects the image edge information, making the image reconstruction more precise. When more bit planes are transmitted, higher quality of the decompressed image can be achieved. The experimental results show that our proposed scheme achieves much better performance than the existing lossy compression scheme for pixel-value encrypted images and also similar performance as the state-of-the-art lossy compression for pixel permutation-based encrypted images. In addition, our proposed scheme has the following advantages: at the decoder side, no computationally intensive iteration and no additional public orthogonal matrix are needed. It works well for both smooth and texture-rich images.
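
A toy sketch of the encoder-side split described above; the interleaved subsampling pattern and the choice of most significant bit planes are assumptions, and the content-adaptive interpolation decoder is not shown.

```python
import numpy as np

def encoder_side(encrypted, keep_bitplanes=4):
    """Illustrative encoder split: one uniformly subsampled portion of the
    encrypted image is kept verbatim, and only the most significant bit
    planes of a second, interleaved portion are transmitted as side
    information for the decoder's interpolation."""
    part_a = encrypted[0::2, 0::2]                 # sent in full
    part_b = encrypted[1::2, 1::2]                 # only some bit planes sent
    planes = [(part_b >> k) & 1 for k in range(8 - keep_bitplanes, 8)]
    return part_a, np.stack(planes, axis=0)
```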

Journal ArticleDOI
TL;DR: Techniques to assess the objective quality for stereoscopic 3D video content related to motion and depth map features are proposed and guidelines are obtained after applying the algorithm to quantify the impact over viewer's experience when common cases happen.
Abstract: In this paper, we propose techniques to assess the objective quality for stereoscopic 3D video content, related to motion and depth map features. An analysis has been carried out in order to understand what causes the generation of visual discomfort in the viewer's eye when visualizing a 3D video. Motion is an important feature affecting 3D experience but is also often the cause of visual discomfort. Guidelines are obtained after applying the algorithm to quantify the impact over viewer's experience when common cases happen, such as high motion sequences, scene changes with abrupt parallax changes, or complete absence of stereoscopy.

Journal ArticleDOI
TL;DR: An effective background initialization and foreground segmentation approach for bootstrapping video sequences is proposed, in which a side-match measure is used to determine whether the background is exposed.
Abstract: In this study, an effective background initialization and foreground segmentation approach for bootstrapping video sequences is proposed. First, a modified block representation approach is used to classify each block of the current video frame into one of four categories, namely, “background,” “still object,” “illumination change,” and “moving object.” Then, a new background updating scheme is developed, in which a side-match measure is used to determine whether the background is exposed. Finally, using the edge information, an improved noise removal and shadow suppression procedure with two morphological operations is adopted to enhance the final segmented foreground. Based on the experimental results obtained in this study, as compared with three comparison approaches, the proposed approach produces better background initialization and foreground segmentation results.

Journal ArticleDOI
TL;DR: A novel methodology for automatic American College of Radiology Breast Imaging Reporting and Data System classification using local binary pattern variance descriptor that characterizes the local density in different types of breast tissue patterns information into the LBP histogram is presented.
Abstract: Mammogram tissue density has been found to be a strong indicator for breast cancer risk. Efforts in computer vision of breast parenchymal patterns have been made in order to improve the diagnostic accuracy of radiologists. Motivated by recent results in mammogram tissue density classification, a novel methodology for automatic American College of Radiology Breast Imaging Reporting and Data System classification using the local binary pattern variance descriptor is presented in this article. The proposed approach characterizes the local density information of different types of breast tissue patterns in the LBP histogram. The performance of the macro-calcification detection method is evaluated using the FARABI database. Performance results are given in terms of receiver operating characteristic curves. The area under the curve of the proposed approach has been found to be 79%.