
Showing papers in "EURASIP Journal on Image and Video Processing" (2017)


Journal ArticleDOI
TL;DR: Spectrogram image classification with a CNN works as well as the MFCC-SVM approach, and given a large amount of data, both CNN and SVM machine learning algorithms can accurately classify and pre-diagnose respiratory audio.
Abstract: In the field of medicine, with the introduction of computer systems that can collect and analyze massive amounts of data, many non-invasive diagnostic methods are being developed for a variety of conditions. In this study, our aim is to develop a non-invasive method of classifying respiratory sounds recorded by an electronic stethoscope and audio recording software, using various machine learning algorithms. In order to store respiratory sounds on a computer, we developed a cost-effective and easy-to-use electronic stethoscope that can be used with any device. Using this device, we recorded 17,930 lung sounds from 1630 subjects. We employed two types of machine learning algorithms: mel-frequency cepstral coefficient (MFCC) features in a support vector machine (SVM) and spectrogram images in a convolutional neural network (CNN). Since using MFCC features with an SVM is a generally accepted classification method for audio, we used its results to benchmark the CNN. We prepared four data sets for each of the CNN and SVM algorithms to classify respiratory audio: (1) healthy versus pathological classification; (2) rale, rhonchus, and normal sound classification; (3) singular respiratory sound type classification; and (4) audio type classification with all sound types. Accuracy results of the experiments were: (1) CNN 86%, SVM 86%; (2) CNN 76%, SVM 75%; (3) CNN 80%, SVM 80%; and (4) CNN 62%, SVM 62%. As a result, we found that spectrogram image classification with the CNN works as well as the SVM approach, and given the large amount of data, both CNN and SVM algorithms can accurately classify and pre-diagnose respiratory audio.
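As a rough, self-contained illustration of the MFCC-plus-SVM baseline described above (not the authors' implementation; the synthetic signals, sampling rate, and SVM parameters below are assumptions):

```python
# Minimal sketch of an MFCC + SVM audio classifier; synthetic signals stand in
# for the recorded lung sounds.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

SR = 4000  # assumed sampling rate

def mfcc_vector(signal, sr=SR, n_mfcc=13):
    """Summarise a recording by the mean and std of its MFCCs."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1)])

rng = np.random.default_rng(0)
# Stand-in signals: noisy low-frequency tones for "healthy", broadband noise for "pathological".
healthy = [np.sin(2 * np.pi * 150 * np.arange(SR * 2) / SR) + 0.1 * rng.normal(size=SR * 2)
           for _ in range(20)]
pathological = [rng.normal(size=SR * 2) for _ in range(20)]

X = np.array([mfcc_vector(s) for s in healthy + pathological])
y = np.array([0] * 20 + [1] * 20)  # 0 = healthy, 1 = pathological

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```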

165 citations


Journal ArticleDOI
TL;DR: This work obtained an average of 88% success in detecting cracks and 80% success in classifying the type of crack, and could be implemented in a vehicle traveling as fast as 130 km/h (81 mph).
Abstract: Each year, millions of dollars are invested in road maintenance and repair all over the world. In order to minimize costs, one of the main aspects is the early detection of such flaws. Different types of cracks require different types of repairs; therefore, not only crack detection is required but also crack type classification. Also, the earlier a crack is detected, the cheaper its repair is. Once the images are captured, several processes are applied in order to extract the main characteristics and emphasize the cracks (logarithmic transformation, bilateral filter, Canny algorithm, and a morphological filter). After image preprocessing, a decision tree heuristic algorithm is applied to finally classify the image. This work obtained an average of 88% success in detecting cracks and 80% success in classifying the type of crack. It could be implemented in a vehicle traveling as fast as 130 km/h (81 mph).
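The preprocessing chain named in the abstract (logarithmic transform, bilateral filter, Canny edges, morphological filter) can be approximated with OpenCV; the thresholds and kernel sizes below are illustrative guesses, not the paper's values:

```python
# Illustrative crack-emphasis preprocessing chain (parameter values are guesses).
import cv2
import numpy as np

def emphasize_cracks(gray):
    # Logarithmic transformation: expands the dark (crack) range.
    logimg = np.log1p(gray.astype(np.float32))
    logimg = cv2.normalize(logimg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Bilateral filter: smooths road texture while keeping crack edges.
    smooth = cv2.bilateralFilter(logimg, 9, 75, 75)
    # Canny edges followed by morphological closing to join broken edge fragments.
    edges = cv2.Canny(smooth, 50, 150)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    return cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

# Toy input: a dark diagonal "crack" on a bright background.
img = np.full((128, 128), 200, np.uint8)
cv2.line(img, (10, 10), (110, 120), 60, 2)
mask = emphasize_cracks(img)
print("edge pixels:", int(np.count_nonzero(mask)))
```

A decision tree would then be trained on features (e.g., orientation and extent of the connected edge components) extracted from this binary mask.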

121 citations


Journal ArticleDOI
TL;DR: This research proposes a hybrid strategy for efficient classification of human activities from a given video sequence by integrating four major steps: segmenting the moving objects by fusing novel uniform segmentation and expectation maximization, extracting a new set of fused features using local binary patterns with histogram of oriented gradients and Haralick features, feature selection by a novel Euclidean distance and joint entropy-PCA-based method, and feature classification using a multi-class support vector machine.
Abstract: Human activity monitoring in video sequences is an intriguing computer vision domain with numerous applications, e.g., surveillance systems, human-computer interaction, and traffic control systems. In this research, our primary focus is on proposing a hybrid strategy for efficient classification of human activities from a given video sequence. The proposed method integrates four major steps: (a) segmenting the moving objects by fusing novel uniform segmentation and expectation maximization, (b) extracting a new set of fused features using local binary patterns with histogram of oriented gradients and Haralick features, (c) feature selection by a novel Euclidean distance and joint entropy-PCA-based method, and (d) feature classification using a multi-class support vector machine. Three benchmark datasets (MIT, CAVIAR, and BMW-10) are used for training the classifier for human classification; for testing, we utilized multi-camera pedestrian videos along with the MSR Action, INRIA, and CASIA datasets. Additionally, the results are validated using a dataset recorded by our research group. For action recognition, four publicly available datasets are selected, namely Weizmann, KTH, UIUC, and MuHAVi, achieving recognition rates of 95.80, 99.30, 99, and 99.40%, respectively, which confirms the effectiveness of the proposed work. Promising results are achieved in terms of greater precision compared to existing techniques.
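A hedged sketch of the feature-fusion idea in step (b), using LBP and HOG from scikit-image (the paper additionally uses Haralick features and an entropy/PCA-based selection step; the synthetic frames and SVM settings are assumptions):

```python
# Sketch of LBP + HOG feature fusion feeding a multi-class SVM.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.svm import SVC

def fused_descriptor(gray):
    """Concatenate a uniform-LBP histogram with a HOG descriptor."""
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2), feature_vector=True)
    return np.hstack([lbp_hist, hog_vec])

rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(30, 64, 64)).astype(np.uint8)  # stand-in frames
labels = rng.integers(0, 4, size=30)                               # 4 hypothetical actions

X = np.array([fused_descriptor(f) for f in frames])
clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```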

105 citations


Journal ArticleDOI
TL;DR: In this paper, a feature set characterizing both facial and body properties of a student, including gaze point and body posture, is used to train classifiers that estimate time-varying attention levels of individual students.
Abstract: This paper proposes a novel approach to automatic estimation of the attention of students during lectures in the classroom. The approach uses 2D and 3D data obtained by the Kinect One sensor to build a feature set characterizing both facial and body properties of a student, including gaze point and body posture. Machine learning algorithms are used to train classifiers which estimate time-varying attention levels of individual students. Human observers’ estimation of attention level is used as a reference. The comparison of attention prediction accuracy of seven classifiers is done on a data set comprising 18 subjects. Our best person-independent three-level attention classifier achieved a moderate accuracy of 0.753, comparable to results of other studies in the field of student engagement. The results indicate that the Kinect-based attention monitoring system is able to predict both students’ attention over time and average attention levels, and could be applied as a tool for non-intrusive automated analytics of the learning process.
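The classifier comparison reported above can be reproduced in spirit with a few lines of scikit-learn; the synthetic features below stand in for the Kinect-derived gaze and posture descriptors, and the classifier choices are illustrative:

```python
# Sketch of comparing several classifiers for three-level attention estimation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 12))        # stand-in gaze point + posture features
y = rng.integers(0, 3, size=180)      # attention levels: low / medium / high

classifiers = {
    "SVM (RBF)": SVC(kernel="rbf", gamma="scale"),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=7),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    # Person-independent (leave-subjects-out) splits would be used in practice.
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```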

97 citations


Journal ArticleDOI
TL;DR: The use of state-of-the-art patch-based denoising methods for additive noise reduction is investigated, and fast patch similarity measurements produce fast patch-based image denoising methods.
Abstract: Digital images are captured using sensors during the data acquisition phase, where they are often contaminated by noise (an undesired random signal). Such noise can also be produced during transmission or by poor-quality lossy image compression. Reducing the noise and enhancing the images are considered central to all other digital image processing tasks. Improvement in the performance of image denoising methods would contribute greatly to the results of other image processing techniques. Patch-based denoising methods have recently emerged as the state-of-the-art denoising approach for various additive noise levels. In this work, the use of state-of-the-art patch-based denoising methods for additive noise reduction is investigated. Various types of image datasets are addressed to conduct this study. We first explain the types of noise in digital images and discuss various image denoising approaches, with a focus on patch-based denoising methods. Then, we experimentally evaluate the patch-based denoising methods both quantitatively and qualitatively. The patch-based image denoising methods are analyzed in terms of quality and computational time. Despite their varying sophistication, patch-based image denoising methods generally outperform other denoising approaches. Fast patch similarity measurements produce fast patch-based image denoising methods. Patch-based approaches can effectively reduce noise and enhance images, and they represent the current state of the art in image denoising.
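For a concrete taste of one representative of this family, non-local means as shipped in scikit-image can be run on a standard test image (the noise level and filter parameters below are illustrative, not taken from the survey):

```python
# Quick example of a patch-based denoiser (non-local means) on a test image.
import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_nl_means, estimate_sigma
from skimage.metrics import peak_signal_noise_ratio

clean = img_as_float(data.camera())
noisy = np.clip(clean + np.random.default_rng(0).normal(0, 0.08, clean.shape), 0, 1)

sigma_est = float(np.mean(estimate_sigma(noisy)))
denoised = denoise_nl_means(noisy, h=1.15 * sigma_est, sigma=sigma_est,
                            patch_size=5, patch_distance=6, fast_mode=True)

print("PSNR noisy   :", peak_signal_noise_ratio(clean, noisy, data_range=1.0))
print("PSNR denoised:", peak_signal_noise_ratio(clean, denoised, data_range=1.0))
```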

57 citations


Journal ArticleDOI
TL;DR: Compared to the current state-of-the-art method, the proposed data hiding method using a new coefficient selection technique achieves a higher peak signal-to-noise ratio (PSNR) and a smaller file size.
Abstract: Recently, reversible data hiding (RDH) techniques for JPEG images have become more extensively used to combine image and authentication information conveniently into one file. Although embedding data in a JPEG image degrades visual quality and increases file size, it has proven useful for data communication and image authentication. In this paper, a data hiding method for JPEG images using a new coefficient selection technique is proposed. The proposed scheme embeds data using the histogram shifting (HS) method. According to the number of zero AC coefficients, block ordering is used to embed data first in the blocks that cause less distortion. In order to further reduce the distortion, the positions of AC coefficients are selected carefully. Finally, AC coefficients valued +1 and −1 are used for embedding, and the remaining non-zero AC coefficients are shifted to the left or right according to their sign. Compared to the current state-of-the-art method, experimental results show that the proposed method achieves a higher peak signal-to-noise ratio (PSNR) and a smaller file size.
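A toy sketch of the general histogram-shifting idea on a vector of quantized AC coefficients (illustrative only; the block ordering and position selection steps of the paper are omitted):

```python
# Histogram-shifting embedding/extraction on AC coefficients valued +1/-1.
import numpy as np

def hs_embed(ac, bits):
    """Embed bits into coefficients valued +1/-1; shift other non-zero ACs outward."""
    ac, k = ac.copy(), 0
    for i, c in enumerate(ac):
        if c == 0:
            continue                                # zeros are left untouched
        if c in (1, -1) and k < len(bits):
            ac[i] = c + np.sign(c) * bits[k]        # 1 -> 2 or -1 -> -2 when bit == 1
            k += 1
        elif abs(c) > 1:
            ac[i] = c + np.sign(c)                  # shift to make room for embedded values
    return ac, k

def hs_extract(ac):
    """Recover the bits and restore the original coefficients."""
    bits, orig = [], ac.copy()
    for i, c in enumerate(ac):
        if c in (1, -1):
            bits.append(0)
        elif c in (2, -2):
            bits.append(1)
            orig[i] = np.sign(c)
        elif abs(c) > 2:
            orig[i] = c - np.sign(c)
    return bits, orig

coeffs = np.array([0, 1, -1, 3, 0, -2, 1, 0])
marked, n = hs_embed(coeffs, [1, 0, 1])
bits, restored = hs_extract(marked)
print(marked, bits[:n], np.array_equal(restored, coeffs))  # reversibility check
```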

45 citations


Journal ArticleDOI
TL;DR: An efficient approach obtains an image hash through DWT-SVD and a saliency detection technique using the spectral residual model; the receiver operating characteristics show that the proposed method is better than some state-of-the-art methods.
Abstract: In the last few decades, the discovery of various methods for generating secure image hashes has been revolutionary in the field of image hashing. This paper presents an efficient approach to obtaining an image hash through DWT-SVD and a saliency detection technique using the spectral residual model. The latest image hashing technique based on ring partition and invariant vector distance is rotation invariant for large angles at the cost of being insensitive to corner forgery. By contrast, due to the use of central orientation information, the proposed system is rotation invariant for arbitrary angles while remaining sensitive to corner changes. In addition, we have used the HSV color space, which gives desirable classification performance. The method provides satisfactory results against large-degree rotation, JPEG compression, brightness and contrast adjustment, watermarking, etc. It is also sensitive to malicious activities. Moreover, it locates the forged areas of a forged image. We have applied the proposed algorithm to a large collection of images from various databases. The receiver operating characteristics show that the proposed method is better than some state-of-the-art methods.
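A minimal sketch of a DWT-SVD style perceptual hash (not the paper's exact construction, which also uses HSV colour, central orientation information, and spectral-residual saliency; block size and binarisation rule are assumptions):

```python
# Hash = binarised leading singular values of DWT low-frequency blocks.
import numpy as np
import pywt

def dwt_svd_hash(gray, block=16, bits_per_block=4):
    cA, _ = pywt.dwt2(gray.astype(np.float64), "haar")   # low-frequency approximation
    h, w = (cA.shape[0] // block) * block, (cA.shape[1] // block) * block
    cA = cA[:h, :w]
    sv = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            s = np.linalg.svd(cA[i:i + block, j:j + block], compute_uv=False)
            sv.extend(s[:bits_per_block])
    sv = np.asarray(sv)
    return (sv > np.median(sv)).astype(np.uint8)          # binarise against the median

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)
brighter = np.clip(img.astype(np.int16) + 5, 0, 255).astype(np.uint8)  # mild brightness change
h1, h2 = dwt_svd_hash(img), dwt_svd_hash(brighter)
print("hash length:", h1.size, "Hamming distance:", int(np.sum(h1 != h2)))
```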

45 citations


Journal ArticleDOI
TL;DR: In this paper, a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion is presented.
Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed: an objective evaluation based on the saliency annotation of the database and an extensive qualitative human evaluation of the automatically produced summaries, in which we investigated what constitutes a high-quality movie summary; both evaluations verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database .

41 citations


Journal ArticleDOI
TL;DR: This work proposes an evolutionary classifier with a Bayes kernel (BYEC) that can be adjusted with a small sample set to better adapt the model to a new production line, and compares the performance of this method with various algorithms.
Abstract: Nowadays, surface defect detection systems for steel strip have replaced traditional manual inspection systems, and automatic defect detection systems offer good performance when the sample set is large and the model is stable. However, the trained model does not work well when a new production line is initiated with different equipment, processes, or detection devices. These variables make only tiny changes to the real-world model but have a significant impact on the classification result. To overcome these problems, we propose an evolutionary classifier with a Bayes kernel (BYEC) that can be adjusted with a small sample set to better adapt the model to a new production line. First, abundant features are introduced to cover detailed information about the defects. Second, we construct a series of support vector machines (SVMs) on random subspaces of the features. Then, a Bayes classifier is trained as an evolutionary kernel and fused with the results from the sub-SVMs to form an integrated classifier. Finally, we propose a method to adjust the Bayes evolutionary kernel with a small sample set. We compared the performance of this method with various algorithms; experimental results demonstrate that the proposed method can be adjusted with a small sample set to fit the changed model. Experimental evaluations were conducted to demonstrate the robustness, low sample requirement, and adaptiveness of the proposed method.
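A generic stacking illustration of the "random-subspace SVMs fused by a Bayes classifier" idea (not the paper's exact BYEC formulation; the synthetic data, subspace sizes, and use of a Gaussian naive Bayes fusion stage are assumptions):

```python
# Random-subspace SVM ensemble with a Bayes classifier as the fusion stage.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))              # synthetic defect feature vectors
y = rng.integers(0, 3, size=300)            # 3 hypothetical defect classes

n_svms, subspace = 5, 15
subspaces = [rng.choice(X.shape[1], subspace, replace=False) for _ in range(n_svms)]
svms = [SVC(kernel="rbf", gamma="scale").fit(X[:, idx], y) for idx in subspaces]

def stacked(Xin):
    """Stack the decision values of all sub-SVMs into one meta-feature vector."""
    return np.hstack([clf.decision_function(Xin[:, idx]) for clf, idx in zip(svms, subspaces)])

bayes = GaussianNB()
bayes.partial_fit(stacked(X), y, classes=np.unique(y))

# "Adjusting with a small sample set": refit only the lightweight Bayes stage
# on a handful of samples from the new production line (synthetic here).
X_new = rng.normal(loc=0.3, size=(20, 40))
y_new = rng.integers(0, 3, size=20)
bayes.partial_fit(stacked(X_new), y_new)
print("fused predictions:", bayes.predict(stacked(X_new[:5])))
```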

41 citations


Journal ArticleDOI
TL;DR: The proposed method was extensively tested on the CK+ and JAFFE datasets using a support vector machine (SVM) and shown to further improve the accuracy and efficiency of GLTP compared to other common and state-of-the-art methods in the literature.
Abstract: Automated human emotion detection is a topic of significant interest in the field of computer vision. Over the past decade, much emphasis has been placed on using facial expression recognition (FER) to extract emotion from facial expressions. Many popular appearance-based methods such as local binary pattern (LBP), local directional pattern (LDP) and local ternary pattern (LTP) have been proposed for this task and have proven both accurate and efficient. In recent years, much work has been undertaken to improve these methods. The gradient local ternary pattern (GLTP) is one such method aimed at increasing robustness to varying illumination and random noise in the environment. In this paper, GLTP is investigated in more detail and further improvements such as the use of enhanced pre-processing, a more accurate Scharr gradient operator, dimensionality reduction via principal component analysis (PCA) and facial component extraction are proposed. The proposed method was extensively tested on the CK+ and JAFFE datasets using a support vector machine (SVM) and shown to further improve the accuracy and efficiency of GLTP compared to other common and state-of-the-art methods in the literature.
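A rough sketch of a gradient local ternary pattern: Scharr gradient magnitude, then a ternary code around each pixel split into "positive" and "negative" patterns (the threshold t, neighbour ordering, and histogram layout here are illustrative choices, not the paper's):

```python
# Approximate GLTP descriptor: ternary codes over the Scharr gradient magnitude.
import cv2
import numpy as np

def gltp_histograms(gray, t=10):
    gx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Scharr(gray, cv2.CV_64F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2)

    center = mag[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    pos = np.zeros_like(center, dtype=np.int32)
    neg = np.zeros_like(center, dtype=np.int32)
    for b, (dy, dx) in enumerate(offsets):
        nb = mag[1 + dy:mag.shape[0] - 1 + dy, 1 + dx:mag.shape[1] - 1 + dx]
        pos |= ((nb > center + t).astype(np.int32) << b)   # upper ternary pattern
        neg |= ((nb < center - t).astype(np.int32) << b)   # lower ternary pattern
    hp, _ = np.histogram(pos, bins=256, range=(0, 256))
    hn, _ = np.histogram(neg, bins=256, range=(0, 256))
    return np.hstack([hp, hn]).astype(np.float64)

face = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.uint8)
print("descriptor length:", gltp_histograms(face).shape[0])   # 512-D vector, fed to an SVM
```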

40 citations


Journal ArticleDOI
TL;DR: A segmentation-free optical character recognition system for printed Urdu Nastaliq font using ligatures as units of recognition using Hidden Markov Models for classification is presented.
Abstract: This paper presents a segmentation-free optical character recognition system for printed Urdu Nastaliq font using ligatures as units of recognition. The proposed technique relies on statistical features and employs Hidden Markov Models for classification. A total of 1525 unique high-frequency Urdu ligatures from the standard Urdu Printed Text Images (UPTI) database are considered in our study. Ligatures extracted from text lines are first split into primary (main body) and secondary (dots and diacritics) ligatures and multiple instances of the same ligature are grouped into clusters using a sequential clustering algorithm. Hidden Markov Models are trained separately for each ligature using the examples in the respective cluster by sliding right-to-left the overlapped windows and extracting a set of statistical features. Given the query text, the primary and secondary ligatures are separately recognized and later associated together using a set of heuristics to recognize the complete ligature. The system evaluated on the standard UPTI Urdu database reported a ligature recognition rate of 92% on more than 6000 query ligatures.

Journal ArticleDOI
TL;DR: The characteristics of QR barcodes are explored to design a secret hiding mechanism for QR barcodes with a higher payload than past schemes, and the feasibility of the proposed scheme is demonstrated.
Abstract: The quick response (QR) code has become one of the more popular two-dimensional barcodes because of its greater data capacity and higher damage resistance. Barcode scanners can easily extract the information stored in a QR code while scanning the data modules. However, some sensitive data directly stored in QR codes are insecure in real-world QR applications, such as e-tickets and e-coupons. To protect the sensitive data, this paper explores the characteristics of QR barcodes to design a secret hiding mechanism for the QR barcode with a higher payload compared to past schemes. With a normal scanner or browser, only the official information can be revealed from the marked QR code. An authorized user/scanner can further reveal the sensitive data from the marked QR tag. The experiments demonstrate a satisfactory secret payload and the feasibility of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of existing deep learning-based hashing methods, which showcases their remarkable power of automatically learning highly robust and compact binary code representations for visual search.
Abstract: The proliferation of mobile devices is producing a new wave of mobile visual search applications that enable users to sense their surroundings with smart phones. Given the particular challenges of mobile visual search, achieving a high recognition bitrate has become the consistent target of existing related works. In this paper, we explore how to holistically exploit deep learning-based hashing methods for more robust and instant mobile visual search. Firstly, we present a comprehensive survey of existing deep learning-based hashing methods, which showcases their remarkable power of automatically learning highly robust and compact binary code representations for visual search. Furthermore, in order to implement deep learning hashing on computation- and memory-constrained mobile devices, we investigate deep learning optimization works that accelerate the computation and reduce the model size. We then demonstrate a case study of a deep learning hashing based mobile visual search system. The evaluations show that the proposed system can improve accuracy in MAP by 70% over traditional methods and needs less than one second of computation time on an ordinary mobile phone. Finally, with this comprehensive study, we discuss the open issues and future research directions of deep learning hashing for mobile visual search.

Journal ArticleDOI
Seonhee Park, Byeongho Moon, Seungyong Ko, Soohwan Yu, Joonki Paik
TL;DR: Experimental results show that the proposed method provides better restored results than existing methods, without unnatural artifacts such as noise amplification and halo effects near edges.
Abstract: This paper presents a low-light image restoration method based on the variational Retinex model using the bright channel prior (BCP) and total-variation minimization. The proposed method first estimates the bright channel to control the amount of brightness enhancement. Next, the variational Retinex-based energy function is iteratively minimized to estimate the improved illumination and reflectance using the BCP. The contrast of the estimated illumination is enhanced using gamma correction and histogram equalization to reduce color distortion and noise amplification. Experimental results show that the proposed method provides better restored results than existing methods, without unnatural artifacts such as noise amplification and halo effects near edges.
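An illustrative computation of the bright channel prior and a simple gamma-based illumination adjustment (the paper's variational Retinex optimisation and histogram equalization steps are omitted; patch size and gamma are assumptions):

```python
# Bright channel prior as a rough illumination estimate, then Retinex-style adjustment.
import numpy as np
from scipy.ndimage import maximum_filter

def bright_channel(rgb, patch=15):
    """Per-pixel maximum over colour channels, followed by a local maximum filter."""
    return maximum_filter(rgb.max(axis=2), size=patch)

rng = np.random.default_rng(0)
low_light = rng.random((120, 160, 3)) * 0.25            # synthetic dark image in [0, 0.25]

illum = np.clip(bright_channel(low_light), 1e-3, 1.0)   # rough illumination estimate
reflectance = low_light / illum[..., None]
enhanced = np.clip(reflectance * (illum ** 0.6)[..., None], 0, 1)  # gamma < 1 brightens
print("mean intensity before/after:", low_light.mean(), enhanced.mean())
```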

Journal ArticleDOI
TL;DR: An algorithm that is capable of clustering images taken by an unknown number of unknown digital cameras into groups, such that each contains only photographs taken by the same source camera, is presented.
Abstract: We present in this paper an algorithm that is capable of clustering images taken by an unknown number of unknown digital cameras into groups, such that each contains only images taken by the same source camera. It first extracts a sensor pattern noise (SPN) from each image, which serves as the fingerprint of the camera that has taken the image. The image clustering is performed based on the pairwise correlations between camera fingerprints extracted from images. During this process, each SPN is treated as a random variable and a Markov random field (MRF) approach is employed to iteratively assign a class label to each SPN (i.e., random variable). The clustering process requires no a priori knowledge about the dataset from the user. A concise yet effective cost function is formulated to allow different “neighbors” different voting power in determining the class label of the image in question depending on their similarities. Comparative experiments were carried out on the Dresden image database to demonstrate the advantages of the proposed clustering algorithm.
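A simplified illustration of sensor pattern noise (SPN) extraction and the pairwise correlation used for clustering (a Gaussian-blur residual stands in for the wavelet-based denoiser typically used, and the "fingerprint" is synthetic):

```python
# Noise-residual fingerprints and normalised correlation between them.
import numpy as np
import cv2

def extract_spn(gray):
    denoised = cv2.GaussianBlur(gray.astype(np.float64), (5, 5), 1.0)
    residual = gray.astype(np.float64) - denoised       # noise residual ~ camera fingerprint
    return (residual - residual.mean()) / (residual.std() + 1e-8)

def ncc(a, b):
    """Normalised cross-correlation between two fingerprints."""
    return float(np.mean(a * b))

rng = np.random.default_rng(0)
fingerprint = rng.normal(0, 5, size=(128, 128))          # synthetic PRNU-like pattern
cam_a = [np.clip(rng.normal(128, 10, (128, 128)) + fingerprint, 0, 255) for _ in range(2)]
cam_b = [np.clip(rng.normal(128, 10, (128, 128)), 0, 255) for _ in range(2)]

spns = [extract_spn(im) for im in cam_a + cam_b]
print("same camera     :", ncc(spns[0], spns[1]))
print("different camera:", ncc(spns[0], spns[2]))
```

The clustering step of the paper then treats each SPN as a node and uses an MRF labelling over the matrix of such pairwise correlations.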

Journal ArticleDOI
TL;DR: A comparative study between the proposed separable two-dimensional discrete orthogonal moments and the classical ones is presented, in terms of gray-level image reconstruction accuracy, under both noisy and noise-free conditions.
Abstract: In this paper, we propose three new separable two-dimensional discrete orthogonal moments, named RTM (Racah-Tchebichef moments), RKM (Racah-Krawtchouk moments), and RdHM (Racah-dual Hahn moments). We present a comparative study between the proposed separable two-dimensional discrete orthogonal moments and the classical ones, in terms of gray-level image reconstruction accuracy, under both noisy and noise-free conditions. Furthermore, the local feature extraction capabilities of the proposed moments are described. Finally, a new set of RST (rotation, scaling, and translation) invariants, based on the proposed separable moments, is introduced for the first time, and their description performance is thoroughly tested as pattern features for image classification in comparison with traditional moment invariants. The experimental results show that the new set of moments is potentially useful in the field of image analysis.
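For orientation, a separable 2D discrete orthogonal moment of order (p, q) factorizes the 2D kernel into two different 1D polynomial bases; for the Racah-Tchebichef case this reads (notation illustrative, with the weighted, normalized Racah and Tchebichef polynomials written as R and T):

```latex
M_{pq} = \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} \tilde{R}_p(x)\, \tilde{T}_q(y)\, f(x,y),
\qquad
f(x,y) = \sum_{p=0}^{N-1} \sum_{q=0}^{M-1} M_{pq}\, \tilde{R}_p(x)\, \tilde{T}_q(y),
```

where f(x, y) is an N-by-M gray-level image; the second relation (exact reconstruction) follows from the orthonormality of each 1D basis.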

Journal ArticleDOI
TL;DR: A new lossy compression method, denoted the PE-VQ method, is proposed, which employs prediction error and vector quantization (VQ) concepts; it is shown that higher PSNR values can be obtained by applying VQ to prediction errors rather than to the original image pixels.
Abstract: Lossy image compression has been gaining importance in recent years due to the enormous increase in the volume of image data employed for Internet and other applications. In lossy compression, it is essential to ensure that the compression process does not adversely affect image quality. The performance of a lossy compression algorithm is evaluated based on two conflicting parameters, namely compression ratio and image quality, the latter usually measured by PSNR. In this paper, a new lossy compression method denoted the PE-VQ method is proposed, which employs prediction error and vector quantization (VQ) concepts. An optimum codebook is generated using a combination of two algorithms, namely artificial bee colony and genetic algorithms. The performance of the proposed PE-VQ method is evaluated in terms of compression ratio (CR) and PSNR using three different types of databases, namely CLEF med 2009, Corel 1k and standard images (Lena, Barbara, etc.). Experiments are conducted for different codebook sizes and different CR values. The results show that for a given CR, the proposed PE-VQ technique yields higher PSNR values than existing algorithms. It is also shown that higher PSNR values can be obtained by applying VQ to prediction errors rather than to the original image pixels.
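A sketch of the prediction-error-plus-VQ idea, with a simple left-neighbour predictor and a k-means codebook standing in for the paper's ABC/GA-optimised codebook (block size, codebook size, and the compression-ratio estimate are illustrative):

```python
# Vector quantization applied to prediction errors rather than raw pixels.
import numpy as np
from sklearn.cluster import KMeans

def to_blocks(arr, k=4):
    h, w = (arr.shape[0] // k) * k, (arr.shape[1] // k) * k
    return arr[:h, :w].reshape(h // k, k, w // k, k).swapaxes(1, 2).reshape(-1, k * k)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128)).astype(np.float64)

# Prediction error: each pixel predicted by its left neighbour.
pred = np.zeros_like(img)
pred[:, 1:] = img[:, :-1]
error = img - pred

codebook = KMeans(n_clusters=64, n_init=4, random_state=0).fit(to_blocks(error))
indices = codebook.predict(to_blocks(error))            # symbols sent to the decoder
recon_err_blocks = codebook.cluster_centers_[indices]   # decoder's error blocks

bits_per_index = np.log2(64)
cr = img.size * 8 / (indices.size * bits_per_index)     # crude compression-ratio estimate
print("blocks:", indices.size, "approx. CR:", round(cr, 1))
```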

Journal ArticleDOI
TL;DR: A gradient-based estimation of texture complexity, used for the coding unit decision, is proposed; results show that the proposed algorithm achieves a 42.8% reduction in encoding time with an increase in BD rate of only 1.1%.
Abstract: In order to reach higher coding efficiency than its predecessor, the state-of-the-art video compression standard High Efficiency Video Coding (HEVC) has been designed to rely on many improved coding tools and sophisticated techniques. The new features achieve significant coding efficiency gains, but at the cost of huge implementation complexity. This complexity has increased the HEVC encoders’ need for fast algorithms and hardware-friendly implementations. In fact, encoders have to perform the different encoding decisions, meeting the real-time encoding constraint while taking care of coding efficiency. In this sense, in order to reduce the encoding complexity, HEVC encoders rely on look-ahead mechanisms and pre-processing solutions. In this context, we propose a gradient-based pre-processing stage. We investigate in particular the Prewitt operator used to generate the gradient, and we propose approaches that enhance the gradient’s ability to detect the HEVC intra modes. We also define different probability scenarios, based on the gradient information, in order to speed up the mode search process. Moreover, we propose a gradient-based estimation of the texture complexity that we use for the coding unit decision. Results show that the proposed algorithm achieves a 42.8% reduction in encoding time with an increase in BD rate of only 1.1%.
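A sketch of a Prewitt-based orientation histogram used as a hint for intra-mode pruning (the bucket count, top-k selection, and mapping to HEVC angular modes below are simplified assumptions, not the paper's exact scheme):

```python
# Prewitt gradient orientation histogram over a coding block.
import numpy as np
import cv2

PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], np.float32)
PREWITT_Y = PREWITT_X.T

def dominant_directions(block, n_buckets=8, top_k=3):
    gx = cv2.filter2D(block.astype(np.float32), cv2.CV_32F, PREWITT_X)
    gy = cv2.filter2D(block.astype(np.float32), cv2.CV_32F, PREWITT_Y)
    angle = (np.degrees(np.arctan2(gy, gx)) + 180.0) % 180.0   # orientation in [0, 180)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(angle, bins=n_buckets, range=(0, 180), weights=mag)
    return np.argsort(hist)[::-1][:top_k]        # most likely direction buckets

cu = np.random.default_rng(0).integers(0, 256, (32, 32)).astype(np.uint8)
print("candidate direction buckets:", dominant_directions(cu))
# A full encoder would map each bucket to a subset of the 33 angular intra modes
# and use the total gradient magnitude as a texture-complexity cue for CU splitting.
```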

Journal ArticleDOI
TL;DR: The experimental results of all variation cases demonstrated that the proposed method is the most effective at distinguishing identical twins among the approaches implemented in this study.
Abstract: Distinguishing identical twins using their face images is a challenge in biometrics. The goal of this study is to construct a biometric system that is able to give the correct matching decision for the recognition of identical twins. We propose a method that uses feature-level fusion, score-level fusion, and decision-level fusion with principal component analysis, histogram of oriented gradients, and local binary pattern feature extractors. In the experiments, face images of identical twins from the ND-TWINS-2009-2010 database were used. The results show that the proposed method is better than state-of-the-art methods for distinguishing identical twins. Variations in illumination, expression, gender, and age of the twins’ faces were also considered in this study. The experimental results of all variation cases demonstrated that the proposed method is the most effective at distinguishing identical twins among the approaches implemented in this study. The lowest equal error rates for identical twin recognition achieved using the proposed method are 2.07% for natural expression, 0.0% for smiling expression, and 2.2% for controlled illumination, compared to 4.5, 4.2, and 4.7% equal error rates of the best state-of-the-art algorithm under the same conditions. Additionally, the proposed method is compared with the other methods for non-twins using the same database and standard FERET subsets. The results achieved by the proposed method for non-twin identification are also better than all the other methods under expression, illumination, and aging variations.
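A minimal sketch of score-level fusion across the three matchers named above (PCA, HOG, LBP), using min-max normalisation and a weighted sum; the scores and weights here are synthetic placeholders:

```python
# Score-level fusion of three face matchers via min-max normalisation.
import numpy as np

def min_max(scores):
    s = np.asarray(scores, dtype=np.float64)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

rng = np.random.default_rng(0)
# Similarity scores of one probe face against 10 gallery identities, per matcher.
scores = {
    "pca": rng.random(10),
    "hog": rng.random(10),
    "lbp": rng.random(10),
}
weights = {"pca": 0.2, "hog": 0.4, "lbp": 0.4}   # hypothetical matcher weights

fused = sum(weights[m] * min_max(scores[m]) for m in scores)
print("best match identity:", int(np.argmax(fused)))
```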

Journal ArticleDOI
TL;DR: A computationally feasible discriminative ternary census transform histogram (DTCTH) for image representation uses dynamic thresholds to capture the key properties of a feature descriptor and generalizes well to different applications with reasonable accuracy.
Abstract: Despite considerable effort in designing feature descriptors, it is still challenging to find generalized feature descriptors, with acceptable discrimination ability, that are able to capture prominent features in various image processing applications. To address this issue, we propose a computationally feasible discriminative ternary census transform histogram (DTCTH) for image representation, which uses dynamic thresholds to capture the key properties of a feature descriptor. The code produced by DTCTH is more stable against intensity fluctuation, and it mainly captures the discriminative structural properties of an image by suppressing unnecessary background information. Thus, DTCTH generalizes well to different applications with reasonable accuracy. To validate the generalizability of DTCTH, we have conducted rigorous experiments on five different applications covering nine benchmark datasets. The experimental results demonstrate that DTCTH performs up to 28.08% better than existing state-of-the-art feature descriptors such as GIST, SIFT, HOG, LBP, CLBP, OC-LBP, LGP, LTP, LAID, and CENTRIST.

Journal ArticleDOI
TL;DR: This paper investigates how, in the manner of a natural associative cortex, a gating network can integrate expert networks to form a gating CNN scheme, and shows that with proper treatment the gating CNN scheme works well, indicating promising approaches to information integration for future activity recognition.
Abstract: Human activity recognition requires both visual and temporal cues, making it challenging to integrate these important modalities. The usual schemes for integration are averaging and fixing the weights of both features for all samples. However, how much weight is needed for each sample and modality is still an open question. A mixture of experts via a gating convolutional neural network (CNN) is one promising architecture for adaptively weighting every sample within a dataset. In this paper, rather than just averaging or using fixed weights, we investigate how, in the manner of a natural associative cortex, a gating network can integrate expert networks to form a gating CNN scheme. Starting from RGB values and optical flows, we show that with proper treatment the gating CNN scheme works well, indicating promising approaches to information integration for future activity recognition.

Journal ArticleDOI
TL;DR: An adaptive decision-based inverse distance weighted interpolation (DBIDWI) algorithm for the elimination of high-density salt-and-pepper noise in images is proposed, and it performs very well in restoring images corrupted by high-density salt-and-pepper noise while preserving fine image details.
Abstract: An adaptive decision-based inverse distance weighted interpolation (DBIDWI) algorithm for the elimination of high-density salt-and-pepper noise in images is proposed. Each pixel is initially checked for salt-and-pepper noise. If classified as noisy, it is replaced with an inverse distance weighted interpolation value. This interpolation estimates the value of a corrupted pixel using the distances and values of nearby non-noisy pixels, weighting the contribution of each non-noisy pixel to the interpolated value by its inverse distance. The window size is varied adaptively depending on the non-noisy content of the current processing window. The algorithm is tested on various images and found to exhibit good results both quantitatively (PSNR, MSE, SSIM, Pratt’s FOM) and qualitatively (visually) at high noise densities. The algorithm performs very well in restoring an image corrupted by high-density salt-and-pepper noise while preserving its fine details.
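A simplified sketch of inverse-distance-weighted restoration of salt-and-pepper pixels (a fixed 5x5 window is used here; the paper grows the window adaptively when too few clean neighbours are available):

```python
# Inverse-distance-weighted interpolation of detected salt-and-pepper pixels.
import numpy as np

def idw_restore(img, radius=2):
    out = img.astype(np.float64).copy()
    noisy = (img == 0) | (img == 255)                 # salt-and-pepper detection
    H, W = img.shape
    for y, x in zip(*np.nonzero(noisy)):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        win = img[y0:y1, x0:x1]
        clean = (win != 0) & (win != 255)
        if not clean.any():
            continue                                  # adaptive window growth omitted
        yy, xx = np.nonzero(clean)
        dist = np.hypot(yy + y0 - y, xx + x0 - x)
        w = 1.0 / dist                                # inverse-distance weights
        out[y, x] = np.sum(w * win[clean]) / np.sum(w)
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
clean = rng.integers(30, 220, (64, 64)).astype(np.uint8)
corrupted = clean.copy()
mask = rng.random(clean.shape) < 0.7                  # 70% noise density
corrupted[mask] = np.where(rng.random(clean.shape) < 0.5, 0, 255)[mask]
restored = idw_restore(corrupted)
print("MSE corrupted:", np.mean((clean - corrupted.astype(float)) ** 2))
print("MSE restored :", np.mean((clean - restored.astype(float)) ** 2))
```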

Journal ArticleDOI
TL;DR: This paper proposes a novel watermarking algorithm based on the non-subsampled contourlet transform (NSCT) for improving the security aspects of such images; it offers superior capability, better capture quality, and tampering resistance compared with existing watermarking approaches.
Abstract: At present, dealing with the piracy and tampering of images has become a notable challenge due to the prevalence of smart mobile gadgets. In this paper, we propose a novel watermarking algorithm based on the non-subsampled contourlet transform (NSCT) for improving the security aspects of such images. Moreover, the fusion of feature searching approaches with watermarking methods has gained prominence in recent years. The scale-invariant feature transform (SIFT) is a computer vision technique for detecting and describing local features in images. Notably, the SIFT algorithm can extract feature points with high invariance that are resilient to several issues such as rotation, compression, and scaling. The extracted feature points are embedded with the watermark using the NSCT approach. Subsequently, tree split, voting, rotation searching, and morphology techniques are employed to improve robustness against noise. The proposed watermarking algorithm offers superior capability, better capture quality, and tampering resistance when compared with existing watermarking approaches.

Journal ArticleDOI
TL;DR: A novel automatic method to extract the inner lips contour of CS speakers, based on a recent facial contour extraction model developed in computer vision, reduces the total error of CLNF significantly, achieving results comparable to the state of the art.
Abstract: In previous French Cued Speech (CS) studies, one widely used method has been to paint the speaker’s lips blue to make lip feature extraction easier. In this paper, in order to avoid this artifice, a novel automatic method to extract the inner lips contour of CS speakers is presented. This method is based on a recent facial contour extraction model developed in computer vision, called the Constrained Local Neural Field (CLNF), which provides eight characteristic landmarks describing the inner lips contour. However, directly applied to our CS data, CLNF fails in about 41.4% of cases. Therefore, we propose two methods to correct the B parameter (aperture of the inner lips) and the A parameter (width of the inner lips), respectively. For correcting the B parameter, a hybrid dynamic correlation template method (HD-CTM) using the first derivative of the smoothed luminance variation is proposed. HD-CTM is first applied to detect the outer lower lip position. Then, the inner lower lip position is obtained by subtracting the validated lower lip thickness (VLLT). For correcting the A parameter, a periodical spline interpolation with a geometrical deformation of six CLNF inner lips landmarks is explored. Combined with an automatic round-lips detector, this method efficiently corrects the A parameter for round lips (the third vowel viseme, made of French vowels with a small opening). HD-CTM is evaluated on 4800 images of three French speakers. It corrects about 95% of the CLNF errors in the B parameter, and a total RMSE of one pixel (i.e., 0.05 cm on average) is achieved. The periodical spline interpolation method is tested on 927 round-lips images. The total error of CLNF is reduced significantly, to a level comparable to the state of the art. Moreover, the third viseme is properly distributed in the A-B parameter plane after using this method.

Journal ArticleDOI
TL;DR: This paper proposes a novel recursive NLM (RNLM) algorithm for video processing that takes advantage of recursion for computational savings, compared with the direct 3D NLM, and is able to exploit both spatial and temporal redundancy for improved performance.
Abstract: In this paper, we propose a computationally efficient algorithm for video denoising that exploits temporal and spatial redundancy. The proposed method is based on non-local means (NLM). NLM methods have been applied successfully in various image denoising applications. In the single-frame NLM method, each output pixel is formed as a weighted sum of the center pixels of neighboring patches, within a given search window. The weights are based on the patch intensity vector distances. The process requires computing vector distances for all of the patches in the search window. Direct extension of this method from 2D to 3D, for video processing, can be computationally demanding. Note that the size of a 3D search window is the size of the 2D search window multiplied by the number of frames being used to form the output. Exploiting a large number of frames in this manner can be prohibitive for real-time video processing. Here, we propose a novel recursive NLM (RNLM) algorithm for video processing. Our RNLM method takes advantage of recursion for computational savings, compared with the direct 3D NLM. However, like the 3D NLM, our method is still able to exploit both spatial and temporal redundancy for improved performance, compared with 2D NLM. In our approach, the first frame is processed with single-frame NLM. Subsequent frames are estimated using a weighted sum of pixels from the current frame and a pixel from the previous frame estimate. Only the single best matching patch from the previous estimate is incorporated into the current estimate. Several experimental results are presented here to demonstrate the efficacy of our proposed method in terms of quantitative and subjective image quality.
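A simplified recursion in the spirit of RNLM, shown below: each frame is denoised spatially with single-frame NLM and then blended with the previous estimate, with a weight driven by the local frame difference. This is only an illustration of the recursive principle; the paper instead incorporates the single best matching patch from the previous estimate.

```python
# Recursive temporal blending of single-frame NLM estimates (simplified).
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def rnlm_like(frames, patch_size=5, patch_distance=6):
    prev_est = None
    for frame in frames:
        sigma = float(np.mean(estimate_sigma(frame)))
        spatial = denoise_nl_means(frame, h=1.15 * sigma, sigma=sigma,
                                   patch_size=patch_size, patch_distance=patch_distance,
                                   fast_mode=True)
        if prev_est is None:
            est = spatial                      # first frame: pure single-frame NLM
        else:
            diff = np.abs(spatial - prev_est)
            w = np.exp(-(diff ** 2) / (2 * sigma ** 2 + 1e-12))  # trust the past where static
            est = w * prev_est + (1.0 - w) * spatial
        prev_est = est
        yield est

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))
video = [np.clip(clean + rng.normal(0, 0.1, clean.shape), 0, 1) for _ in range(5)]
last = list(rnlm_like(video))[-1]
print("noise std before/after:", np.std(video[-1] - clean), np.std(last - clean))
```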

Journal ArticleDOI
TL;DR: This paper improves binarization performance by detecting non-text regions and processing only text regions, and improves textline detection by extracting the main text block and compensating for skew angle and writing style.
Abstract: This paper presents a textline detection method for degraded historical documents. Our method follows a conventional two-step procedure in which binarization is first performed and then the textlines are extracted from the binary image. In order to address the challenges in historical documents, such as document degradation, structure noise, and skew, we develop new methods for binarization and textline extraction. First, we improve the performance of binarization by detecting non-text regions and processing only text regions. We also improve the textline detection method by extracting the main text block and compensating for the skew angle and writing style. Experimental results show that the proposed method yields state-of-the-art performance on several datasets.

Journal ArticleDOI
TL;DR: This study presents methods to automatically predict emergent leadership and personality traits in the group meeting videos of the Emergent LEAdership corpus and demonstrates the presence of annotation bias as well as the benefit of transferring information from weakly similar domains.
Abstract: Automatic prediction of personalities from meeting videos is a classical machine learning problem. Psychologists define personality traits as uncorrelated long-term characteristics of human beings. However, human annotations of personality traits introduce cultural and cognitive bias. In this study, we present methods to automatically predict emergent leadership and personality traits in the group meeting videos of the Emergent LEAdership corpus. Prediction of extraversion has attracted the attention of psychologists as it is able to explain a wide range of behaviors, predict performance, and assess risk. Prediction of emergent leadership, on the other hand, is of great importance for the business community. Therefore, we focus on the prediction of extraversion and leadership since these traits are also strongly manifested in a meeting scenario through the extracted features. We use feature analysis and multi-task learning methods in conjunction with the non-verbal features and crowd-sourced annotations from the Video bLOG (VLOG) corpus to perform a multi-domain and multi-task prediction of personality traits. Our results indicate that multi-task learning methods using 10 personality annotations as tasks and with a transfer from two different datasets from different domains improve the overall recognition performance. Preventing negative transfer by using a forward task selection scheme yields the best recognition results with 74.5% accuracy in leadership and 81.3% accuracy in extraversion traits. These results demonstrate the presence of annotation bias as well as the benefit of transferring information from weakly similar domains.

Journal ArticleDOI
TL;DR: A new variant of the Hough Transform that provides an accurate detection of segment endpoints, even if they do not correspond to intersection points between line segments, and can be extended to detect predefined polygonal shapes.
Abstract: The Hough Transform (HT) is an effective and popular technique for detecting image features such as lines and curves. From its standard form, numerous variants have emerged with the objective, in many cases, of extending the kind of image features that could be detected. Particularly, corner and line segment detection using HT has been separately addressed by several approaches. To deal with the combined detection of both image features (corners and segments), this paper presents a new variant of the Hough Transform. The proposed method provides an accurate detection of segment endpoints, even if they do not correspond to intersection points between line segments. Segments are detected from their endpoints, producing not only a set of isolated segments but also a collection of polylines. This provides a direct representation of the polygonal contours of the image despite imperfections in the input data such as missing or noisy feature points. It is also shown how this proposal can be extended to detect predefined polygonal shapes. The paper describes in detail every stage of the proposed method and includes experimental results obtained from real images showing the benefits of the proposal in comparison with other approaches.
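For comparison, the standard probabilistic Hough Transform in OpenCV already returns line segments with endpoints; the proposed variant goes further by recovering endpoints that are not segment intersections and by chaining segments into polylines. A baseline run might look like this (thresholds and the synthetic shape are illustrative):

```python
# Baseline segment detection with OpenCV's probabilistic Hough Transform.
import numpy as np
import cv2

img = np.zeros((200, 200), np.uint8)
cv2.rectangle(img, (40, 40), (160, 120), 255, 2)          # synthetic polygonal contour
edges = cv2.Canny(img, 50, 150)

segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                           minLineLength=30, maxLineGap=5)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        print(f"segment ({x1},{y1}) -> ({x2},{y2})")
```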

Journal ArticleDOI
TL;DR: A multi-input structure in the final fully connected layer of the proposed NR-Network extracts a multi-scale and more discriminative feature from the input image, achieving better performance than two deep benchmark networks for face recognition under noise.
Abstract: Along with the developments of deep learning, many recent architectures have been proposed for face recognition and even come close to human performance. However, accurately recognizing an identity from seriously noisy face images still remains a challenge. In this paper, we propose a carefully designed deep neural network, coined the noise-resistant network (NR-Network), for face recognition under noise. We present a multi-input structure in the final fully connected layer of the proposed NR-Network to extract a multi-scale and more discriminative feature from the input image. Experimental results, such as the receiver operating characteristic (ROC) curves on the AR database injected with different noise types, show that the NR-Network is clearly superior to some state-of-the-art feature extraction algorithms and also achieves better performance than two deep benchmark networks for face recognition under noise.

Journal ArticleDOI
TL;DR: This study first analyzes the statistical relationship between the best mode and the costs calculated through the Rough Mode Decision (RMD) process and proposes an effective mode decision algorithm for the intra-mode prediction process that provides an average time reduction of 53% compared to the reference HM-16.12.
Abstract: High Efficiency Video Coding (HEVC, or H.265), the latest international video coding standard, achieves a 50% bit rate reduction with nearly equal quality but dramatically higher coding complexity compared with H.264. Unlike other fast algorithms, we propose an algorithm that combines the CU coding bits with the reduction of unnecessary intra-prediction modes to decrease computational complexity. In this study, we first analyzed the statistical relationship between the best mode and the costs calculated through the Rough Mode Decision (RMD) process and proposed an effective mode decision algorithm for the intra-mode prediction process. We alleviate the computational difficulty by carrying out the RMD process in two stages: reducing the 35 modes down to 11 modes in the first RMD stage, and adding modes adjacent to the most promising modes selected during the first stage into the second RMD stage. After these two stages, we have two or three modes ready to be used in the rate-distortion optimization (RDO) process instead of the three or eight in the original HEVC process, which significantly reduces the number of unnecessary candidate modes in the RDO process. We then use the coding bits of the current coding unit (CU) as the main basis for judging its complexity and propose an early termination method for CU partitioning based on the number of coding bits of the current CU. Experimental results show that the proposed fast algorithm provides an average time reduction of 53% compared to the reference HM-16.12, with only a 1.7% Bjontegaard delta rate increase, which is acceptable in terms of rate-distortion performance.
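A sketch of the two-stage rough mode decision idea: score a reduced subset of the 35 intra modes first, then refine only around the most promising ones. The cost function below is a random placeholder, not HM's actual RMD cost, and the subset/keep sizes are illustrative:

```python
# Two-stage candidate pruning before full rate-distortion optimization.
import numpy as np

def rough_cost(mode, rng):
    """Placeholder SATD-like cost for a given intra mode."""
    return float(rng.normal(loc=abs(mode - 17), scale=2.0))

def two_stage_rmd(rng, n_modes=35, stage1_step=3, keep1=11, keep2=3):
    # Stage 1: evaluate a coarse subset of the 35 intra modes.
    stage1 = list(range(0, n_modes, stage1_step))[:keep1]
    costs1 = {m: rough_cost(m, rng) for m in stage1}
    best1 = sorted(costs1, key=costs1.get)[:keep2]

    # Stage 2: add the neighbours of the most promising modes and re-rank.
    stage2 = set(best1)
    for m in best1:
        stage2.update(x for x in (m - 1, m + 1) if 0 <= x < n_modes)
    costs2 = {m: rough_cost(m, rng) for m in stage2}
    return sorted(costs2, key=costs2.get)[:keep2]         # candidates handed to full RDO

rng = np.random.default_rng(0)
print("RDO candidate modes:", two_stage_rmd(rng))
```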