
Showing papers in "IET Computer Vision" in 2018


Journal ArticleDOI
TL;DR: Major constraints on vision-based gesture recognition occurring in detection and pre-processing, representation and feature extraction, and recognition are surveyed.
Abstract: The ability of computers to recognise hand gestures visually is essential for progress in human-computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture recognition is extremely challenging not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations but also because of the complex non-rigid properties of the hand. This study surveys major constraints on vision-based gesture recognition occurring in detection and pre-processing, representation and feature extraction, and recognition. Current challenges are explored in detail.

138 citations


Journal ArticleDOI
TL;DR: This study proposes a novel two-step deep learning classifier to distinguish flowers of a wide range of species, modelled as a binary classifier in a fully convolutional network framework.
Abstract: Flower classification is a challenging task due to the wide range of flower species, many of which share a similar shape, appearance or surrounding objects such as leaves and grass. In this study, the authors propose a novel two-step deep learning classifier to distinguish flowers of a wide range of species. First, the flower region is automatically segmented to allow localisation of the minimum bounding box around it. The proposed flower segmentation approach is modelled as a binary classifier in a fully convolutional network framework. Second, they build a robust convolutional neural network classifier to distinguish the different flower types. They propose novel steps during the training stage to ensure robust, accurate and real-time classification. They evaluate their method on three well-known flower datasets. Their classification results exceed 97% on all datasets, which is better than the state of the art in this domain.

62 citations


Journal ArticleDOI
TL;DR: An automatic skin lesion segmentation method which can be used as a preliminary step for lesion classification, obtaining Dice coefficient values of 0.8236 and 0.9139 on the ISIC 2017 test dataset and the PH2 dataset, respectively.
Abstract: Skin cancer is the most common type of cancer in the world and the incidence of skin cancer has been rising over the past decade. Even with a dermoscopic imaging system, which magnifies the lesion region, detecting and classifying skin lesions by visual examination is laborious due to the complex structures of the lesions. This necessitates an automated skin lesion diagnosis system to enhance the diagnostic capability of dermatologists. In this study, the authors propose an automatic skin lesion segmentation method which can be used as a preliminary step for lesion classification. The proposed method comprises two major steps, namely preprocessing and segmentation. In the preprocessing step, artefacts such as uneven illumination, hair and ruler marks are removed using filtering techniques, and in the segmentation phase, skin lesions are segmented using the GrabCut segmentation algorithm. The k-means clustering algorithm is then used along with the colour features learnt from the training images to improve the boundaries of the segments. To evaluate the proposed method, the authors used the ISIC 2017 challenge dataset and the PH2 dataset, obtaining Dice coefficient values of 0.8236 and 0.9139 for the ISIC 2017 test dataset and the PH2 dataset, respectively.

61 citations
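The Dice coefficient reported above is a standard overlap measure between a predicted mask and the ground truth. A minimal sketch over flattened binary masks (the example mask values and the helper name are illustrative, not taken from the paper):

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat 0/1 lists:
    2|A ∩ B| / (|A| + |B|)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

# Two toy flattened segmentations: 2 of 3+3 foreground pixels agree
pred  = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 0, 0, 1]
print(round(dice_coefficient(pred, truth), 4))  # → 0.6667
```

A Dice value of 0.9139, as reported for the PH2 dataset, thus means the predicted and true lesion masks overlap almost completely.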


Journal ArticleDOI
TL;DR: The authors propose a new skeleton-based approach to describe the spatio-temporal aspects of a human activity sequence, using the Minkowski and cosine distances between the 3D joints.
Abstract: There is a significantly increasing demand for monitoring systems for elderly people in the health-care sector. As the ageing population increases, patient privacy violations and the cost of elderly assistance have driven the research community toward computer vision and image processing to design and deploy new systems for monitoring the elderly in society and turning their living houses into smart environments. By exploiting recent advances in, and the low cost of, three-dimensional (3D) depth sensors such as the Microsoft Kinect, the authors propose a new skeleton-based approach to describe the spatio-temporal aspects of a human activity sequence, using the Minkowski and cosine distances between the 3D joints. They trained and validated their approach on the Microsoft MSR 3D Action and MSR Daily Activity 3D datasets using the Extremely Randomised Trees algorithm. The results are very promising, demonstrating that the trained model can be used to build a monitoring system for the elderly using open-source libraries and a low-cost depth sensor.

47 citations
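A minimal sketch of the kind of skeleton descriptor described above: pairwise Minkowski and cosine distances over the 3D joints of one frame. The joint coordinates, pair ordering and function names are illustrative; the paper's exact joint set and feature layout are not specified here.

```python
import math

def minkowski(a, b, p=2):
    """Minkowski distance of order p between two 3-D points (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero 3-D joint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def frame_descriptor(joints, p=2):
    """Concatenate both distances over all unordered joint pairs of one frame."""
    feats = []
    for i in range(len(joints)):
        for j in range(i + 1, len(joints)):
            feats.append(minkowski(joints[i], joints[j], p))
            feats.append(cosine_distance(joints[i], joints[j]))
    return feats

# Toy skeleton with three joints
joints = [(0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (1.0, 0.0, 0.0)]
desc = frame_descriptor(joints)
print(len(desc))  # 3 pairs x 2 distances = 6 features
```

Stacking such per-frame vectors over time gives the spatio-temporal description that a classifier such as Extremely Randomised Trees can be trained on.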


Journal ArticleDOI
TL;DR: A novel multi-layer fused convolution neural network (MLF-CNN) that can detect pedestrians in different scales, even in adverse illumination conditions and can significantly reduce the detection miss rate is proposed.
Abstract: In this study, a novel multi-layer fused convolution neural network (MLF-CNN) is proposed for detecting pedestrians under adverse illumination conditions. Currently, most existing pedestrian detectors are very likely to fail under adverse illumination circumstances such as shadows, overexposure, or nighttime. To detect pedestrians under such conditions, the authors apply deep learning for effective fusion of the visible and thermal information in multispectral images. The MLF-CNN consists of a proposal generation stage and a detection stage. In the first stage, they design an MLF region proposal network and propose to use a summation fusion method for integration of the two convolutional layers. This combination can detect pedestrians at different scales, even in adverse illumination. Furthermore, instead of extracting features from a single layer, they extract features from three feature maps and match the scale using the fused ROI pooling layers. This new multiple-layer fusion technique can significantly reduce the detection miss rate. Extensive evaluations on several challenging datasets demonstrate that their approach achieves state-of-the-art performance. For example, their method performs 28.62% better than the baseline method and 11.35% better than the well-known faster R-CNN halfway fusion method in detection accuracy on the KAIST multispectral pedestrian dataset.

46 citations


Journal ArticleDOI
TL;DR: A fourth-order partial differential equations based trilateral filter (FPDETF) dehazing approach is proposed to enhance the coarse estimated atmospheric veil and is able to reduce halo and gradient reversal artefacts and preserve radiometric information of haze-free images.
Abstract: Remote sensing images taken in hazy situations are degraded by scattering of atmospheric particles, which greatly influences the efficiency of visual systems. Therefore, the visibility restoration of hazy images becomes a significant area of research. In this study, a fourth-order partial differential equations based trilateral filter (FPDETF) dehazing approach is proposed to enhance the coarse estimated atmospheric veil. FPDETF is able to reduce halo and gradient reversal artefacts. It also preserves the radiometric information of haze-free images. The visibility restoration phase is also refined to reduce the colour distortion of dehazed images. The proposed technique has been evaluated on ten well-known remote sensing images and also compared with seven well-known existing dehazing approaches. The experimental results reveal that the proposed technique outperforms others in terms of contrast gain and percentage of saturated pixels.

43 citations


Journal ArticleDOI
TL;DR: A fusion of structural and textural features from two descriptors is introduced, which outperforms the existing methods on the PH2 database.
Abstract: Melanoma has been one of the fastest-increasing cancers over the past decades. For accurate detection and classification, discriminative features are required to distinguish between benign and malignant cases. In this study, the authors introduce a fusion of structural and textural features from two descriptors. The structural features are extracted from wavelet and curvelet transforms, whereas the textural features are extracted from different variants of the local binary pattern operator. The proposed method is implemented on 200 images from the PH2 dermoscopy database, including 160 non-melanoma and 40 melanoma images, where a rigorous statistical analysis of the database is performed. Using a support vector machine (SVM) classifier with a random sampling cross-validation method between the three cases of skin lesions given in the database, the validated results showed a very encouraging performance with a sensitivity of 78.93%, a specificity of 93.25% and an accuracy of 86.07%. The proposed approach outperforms the existing methods on the PH2 database.

41 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed texture-based features give significantly better results in cervical cancer detection than state-of-the-art shape-based features in terms of accuracy.
Abstract: In India, cervical cancer is the second most common type of cancer in females. The Pap smear is a simple cytology test for the detection of cancer in its early stages. To obtain the best results from the Pap smear, expert pathologists are required. The availability of pathologists in India is far below the required numbers, especially in rural parts. In this paper, multiple texture-based features are introduced for the extraction of relevant and informative features from single-cell images. First-order histogram, GLCM, LBP, Laws, and DWT features are used for texture feature extraction. These methods help to recognise the contour of the nucleus and cytoplasm. ANN and SVM classifiers are used to classify the single-cell images as either normal or cancerous based on the trained features. ANN and SVM are used on every single feature as well as on the combination of all features. The best results are obtained with the combination of all features. The system is evaluated on the generated dataset MNITJ, containing 330 single cervical cell images, and also on the publicly available benchmark Herlev dataset. Experimental results show that the proposed texture-based features give significantly better results in cervical cancer detection than state-of-the-art shape-based features in terms of accuracy.

40 citations
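Of the texture descriptors listed above, the first-order histogram features are the simplest: statistics computed from the grey-level histogram of a region. A sketch of four common ones (mean, variance, energy, entropy); the pixel values and feature names are illustrative, not the paper's exact feature set:

```python
import math

def first_order_features(pixels, levels=256):
    """First-order (histogram) texture features of a grey-level region:
    mean, variance, energy and entropy of the normalised histogram."""
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    probs = [h / n for h in hist]
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(probs))
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return {"mean": mean, "variance": var, "energy": energy, "entropy": entropy}

# Toy single-cell region: dark nucleus pixels and bright cytoplasm pixels
region = [10, 10, 12, 200, 200, 200, 200, 12]
feats = first_order_features(region)
print(feats["mean"])  # → 105.5
```

GLCM, LBP, Laws and DWT features capture progressively richer spatial structure on top of such histogram statistics.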


Journal ArticleDOI
TL;DR: A deep convolutional neural network-based regularised discriminant learning framework which extracts low-dimensional discriminative features for melanoma detection is proposed and minimises the whole of within-class variance information and maximises the total class variance information.
Abstract: Of all the prevalent skin cancers, melanoma has the highest mortality rate. Melanoma becomes life threatening when it penetrates deep into the dermis layer; unless detected at an early stage, it becomes fatal owing to its tendency to migrate to other parts of the body. This study presents an automated non-invasive methodology to assist clinicians and dermatologists in the detection of melanoma. Unlike conventional computational methods, which require (expensive) domain expertise for segmentation and hand-crafted feature computation and/or selection, a deep convolutional neural network-based regularised discriminant learning framework which extracts low-dimensional discriminative features for melanoma detection is proposed. The approach minimises the within-class variance information and maximises the total class variance information. The importance of various subspaces arising in the within-class scatter matrix, followed by dimensionality reduction using total class variance information, is analysed for melanoma detection. Experimental results on the ISBI 2016, MED-NODE, PH2 and the recent ISBI 2017 databases show the efficacy of the proposed approach as compared with other state-of-the-art methodologies.

40 citations


Journal ArticleDOI
TL;DR: A new method is presented based on genetic algorithm to achieve adaptive SE for target detection in IR images and the contrast between the target and background clutter is greatly increased while maintaining a low false alarm rate.
Abstract: Automatic detection and tracking of small targets in infrared (IR) images are of great importance. The toggle operator (TO) is the newest class of non-linear morphological operators and has been widely used for detecting and tracking targets in IR images. The most important problem in improving the efficiency of the TO is to use structuring elements (SEs) in accordance with the signal-to-clutter ratio (SCR) of each image. Generally, the clutter and targets differ from image to image; therefore, for images with different SCRs, using SEs with fixed pixels and dimensions cannot lead to successful target detection. In this study, a new method based on a genetic algorithm is presented to achieve an adaptive SE for target detection in IR images. In this method, by designing the SE in accordance with the characteristics of each image, a large amount of background clutter and noise is suppressed and the contrast between target and background is increased. The results on a large set of real IR images including moving targets show that the proposed algorithm is effective for target detection. In the proposed method, the contrast between the target and background clutter is greatly increased while maintaining a low false alarm rate.

40 citations


Journal ArticleDOI
TL;DR: A new convolutional neural network structure is designed, fine-tuning Visual Geometry Group Network, up to 19 layers to achieve a 20-layer network, for palmprint gender classification, and experimental results show that the proposed structure could achieve good performance for gender classification.
Abstract: Palmprint gender classification can revolutionise the performance of authentication systems, reduce searching space and speed up matching rate. However, to the best of their knowledge, there is no literature addressing this issue. The authors design a new convolutional neural network (CNN) structure, fine-tuning Visual Geometry Group Network, up to 19 layers to achieve a 20-layer network, for palmprint gender classification. Experimental results show that the proposed structure could achieve good performance for gender classification. They also investigate palmprint images with 15 different kinds of spectra. They empirically find that a palmprint image acquired by the Blue spectrum could achieve 89.2% correct classification and could be considered as a suitable spectrum for gender classification. The neural network is able to classify a 224 × 224 × 3-pixel palmprint image in <23 ms, verifying that the proposed CNN is an effective real-time solution.

Journal ArticleDOI
TL;DR: Three Probabilistic convolutional neural networks built on top of deterministic ones with two probabilistic deep learning frameworks - DISCO networks and Bayesian SegNet are proposed and evaluated.
Abstract: The authors consider the problem of human pose estimation using probabilistic convolutional neural networks. They explore ways to improve human pose estimation accuracy on the standard pose estimation benchmarks, the MPII Human Pose and Leeds Sports Pose (LSP) datasets, using frameworks for probabilistic deep learning. Such frameworks transform a deterministic neural network into a probabilistic one and allow sampling of independent and equiprobable hypotheses (different outputs) for a given input. Overlapping body parts and body joints hidden under clothes or other obstacles make the problem of human pose estimation ambiguous. In this context, to obtain accurate estimates of joint positions, they use the uncertainty in the network's predictions, which is represented by the variance of the hypotheses provided by a probabilistic convolutional neural network, while confidence is characterised by their mean. Their work is based on current CNN cascades for pose estimation. They propose and evaluate three probabilistic convolutional neural networks built on top of deterministic ones with two probabilistic deep learning frameworks: DISCO networks and Bayesian SegNet. The authors evaluate their models on standard pose estimation benchmarks and show that the proposed probabilistic models outperform the base deterministic ones.

Journal ArticleDOI
TL;DR: Although the results show that the performances of the system are comparable with the state of the art, recognition improvements are obtained with the activities related to health-care environments, showing promise for applications in the AL realm.
Abstract: Human activity recognition is an important and active field of research with a wide range of applications in numerous fields, including ambient-assisted living (AL). Although most research focuses on the single user, the ability to recognise two-person interactions is perhaps more important for its social implications. This study presents a two-person activity recognition system that uses skeleton data extracted from a depth camera. The human actions are encoded using a set of a few basic postures obtained with an unsupervised clustering approach. Multiclass support vector machines are used to build models on the training set, whereas the X-means algorithm is employed to dynamically find the optimal number of clusters for each sample during the classification phase. The system is evaluated on the Institute of Systems and Robotics (ISR) - University of Lincoln (UoL) and Stony Brook University (SBU) datasets, reaching overall accuracies of 0.87 and 0.88, respectively. Although the results show that the performance of the system is comparable with the state of the art, recognition improvements are obtained on the activities related to health-care environments, showing promise for applications in the AL realm.

Journal ArticleDOI
TL;DR: A compact end-to-end neural network, which is trained in the framework of conditional generative adversarial networks, is proposed for the real-time pixel-level segmentation of insulators.
Abstract: The conventional inspection of fragile insulators is critical to grid operation and insulator segmentation is the basis of inspection. However, the segmentation of various insulators is still difficult because of the great differences in colour and shape, as well as the cluttered background. Traditional insulator segmentation algorithms need many artificial thresholds, thereby limiting the adaptability of algorithms. A compact end-to-end neural network, which is trained in the framework of conditional generative adversarial networks, is proposed for the real-time pixel-level segmentation of insulators. The input image is mapped to a visual saliency map, and various insulators with different poses are filtered out at the same time. The proposed two-stage training and empty samples are also used to improve the segmentation quality. Extensive experiments and comparisons are performed on many real-world images. The experimental results demonstrate superior segmentation and real-time performance. Meanwhile, the effectiveness of the proposed training strategies and the trade-off between performance and speed are analysed in detail.

Journal ArticleDOI
TL;DR: This study presents a novel approach for Arabic video text recognition based on recurrent neural networks that relies specifically on a multi-dimensional long short-term memory coupled with a connectionist temporal classification layer and brings robust performance with a low error rate.
Abstract: This study presents a novel approach for Arabic video text recognition based on recurrent neural networks. Embedded texts in videos represent a rich source of information for indexing and automatically annotating multimedia documents. However, video text recognition is a non-trivial task due to many challenges, such as the variability of text patterns and the complexity of backgrounds. In the case of Arabic, the presence of diacritic marks, the cursive nature of the script and the non-uniform intra/inter-word distances may introduce many additional challenges. The proposed system presents a segmentation-free method that relies specifically on a multi-dimensional long short-term memory coupled with a connectionist temporal classification layer. It is shown that using an efficient pre-processing step and a compact representation of Arabic character models brings robust performance and yields a lower error rate than other recently published methods. The authors' system is trained and evaluated using the public AcTiV-R dataset under different evaluation protocols. The obtained results are very promising. They also outperform current state-of-the-art approaches on the public ALIF dataset in terms of recognition rates at both the character and line levels.

Journal ArticleDOI
TL;DR: A pre-trained Levenberg–Marquardt neural network is used to perform ad-hoc clustering of skin lesion features in order to achieve an efficient nevus discrimination (benign against melanoma), as well as a numerical array to be used for follow-up rate definition and assessment.
Abstract: Traditional methods for early detection of melanoma rely on the visual analysis of the skin lesions performed by a dermatologist. The analysis is based on the so-called ABCDE (Asymmetry, Border irregularity, Colour variegation, Diameter, Evolution) criteria, although confirmation is obtained through biopsy performed by a pathologist. The proposed method exploits an automatic pipeline based on morphological analysis and evaluation of skin lesion dermoscopy images. Preliminary segmentation and pre-processing of dermoscopy image by SC-cellular neural networks is performed, in order to obtain ad-hoc grey-level skin lesion image that is further exploited to extract analytic innovative hand-crafted image features for oncological risks assessment. In the end, a pre-trained Levenberg–Marquardt neural network is used to perform ad-hoc clustering of such features in order to achieve an efficient nevus discrimination (benign against melanoma), as well as a numerical array to be used for follow-up rate definition and assessment. Moreover, the authors further evaluated a combination of stacked autoencoders in lieu of the Levenberg–Marquardt neural network for the clustering step.

Journal ArticleDOI
TL;DR: A fall detection method for indoor environments based on the Kinect sensor and analysis of three-dimensional skeleton joints information is proposed and can be used in real-time video surveillance because of its time efficiency and robustness.
Abstract: Falls sustained by subjects can have severe consequences, especially for elderly persons living alone. A fall detection method for indoor environments based on the Kinect sensor and analysis of three-dimensional skeleton joint information is proposed. Compared with state-of-the-art methods, the authors' method provides two major improvements. First, a possible fall activity is quantified and represented by a one-dimensional float array with only 32 items, followed by fall recognition using a support vector machine (SVM). Unlike typical deep learning methods, the input parameters of their method are dramatically reduced. Hence, videos are trained and recognised by an SVM at a low time cost. Second, the torso angle is used to detect the start key frame of a possible fall, which is much more efficient than using a sliding window. The approach is evaluated on the telecommunication systems team (TST) fall detection dataset v2. The results show that the approach achieves an accuracy of 92.05%, better than other typical methods. Given the characteristics of machine learning, when more samples are imported, the method is expected to achieve higher accuracy and a stronger capability of discriminating fall-like activities. It can be used in real-time video surveillance because of its time efficiency and robustness.
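The torso-angle cue used to detect the start key frame of a possible fall can be sketched as the angle between the torso vector (spine base to neck) and the vertical axis. The joint coordinates, joint names and the choice of y as the vertical axis are assumptions for illustration, not taken from the paper:

```python
import math

def torso_angle(neck, spine_base):
    """Angle in degrees between the torso vector (spine base -> neck) and the
    vertical axis (0, 1, 0); large angles suggest a possible fall onset."""
    vx = neck[0] - spine_base[0]
    vy = neck[1] - spine_base[1]   # y is assumed to be the vertical axis
    vz = neck[2] - spine_base[2]
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    cos_theta = vy / norm          # dot product with the unit vertical vector
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Upright subject: torso nearly parallel to the vertical axis
print(round(torso_angle((0.0, 1.6, 0.0), (0.0, 0.9, 0.0)), 1))  # → 0.0
# Lying subject: torso roughly horizontal
print(round(torso_angle((0.9, 0.3, 0.0), (0.2, 0.3, 0.0)), 1))  # → 90.0
```

Thresholding this angle frame by frame is cheaper than scanning every window of frames, which is why it can replace a sliding-window search for the fall's start.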

Journal ArticleDOI
TL;DR: The authors address the problem of remote sensing image classification by introducing a novel deep recurrent architecture that incorporates high-level feature descriptors to tackle this challenging problem based on the general encoder–decoder framework.
Abstract: Automatically classifying an image has been a central problem in computer vision for decades. A plethora of models has been proposed, from handcrafted feature solutions to more sophisticated approaches such as deep learning. The authors address the problem of remote sensing image classification, which is an important problem to many real world applications. They introduce a novel deep recurrent architecture that incorporates high-level feature descriptors to tackle this challenging problem. Their solution is based on the general encoder–decoder framework. To the best of the authors’ knowledge, this is the first study to use a recurrent network structure on this task. The experimental results show that the proposed framework outperforms the previous works in the three datasets widely used in the literature. They have achieved a state-of-the-art accuracy rate of 97.29% on the UC Merced dataset.

Journal ArticleDOI
TL;DR: Numerical results prove that the LTV-based model is fastest, and the CTV model is the best for denoising with edge-preserving, and it also leads to the best visually haze-free and noise-free images.
Abstract: Single image dehazing and denoising models can simultaneously remove haze and noise with high efficiency. Here, the authors propose three variational models combining the celebrated dark channel prior (DCP) and total variations (TV) models for image dehazing and denoising. The authors firstly estimate the transmission map associated with depth using DCP, then design three variational models for colour image dehazing and denoising based on this estimation and the layered total variation (LTV) regulariser, multichannel total variation (MTV) regulariser, and colour total variation (CTV) regulariser, respectively. In order to improve the computation efficiency of the three models, the authors design their fast split Bregman algorithms via introducing some auxiliary variables and the Bregman iterative parameters. Numerous experiments are presented to compare their denoising effects, edge-preserving properties, and computation efficiencies. To demonstrate the merits of the proposed models, the authors also conduct some comparisons with several existing state-of-the-art methods. Numerical results further prove that the LTV-based model is fastest, and the CTV model is the best for denoising with edge-preserving, and it also leads to the best visually haze-free and noise-free images.
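The dark channel prior (DCP) used above takes, at each pixel, the minimum over the colour channels and then a minimum filter over a local patch; haze-free outdoor regions tend to have a near-zero dark channel, which is what makes the transmission estimate possible. A minimal pure-Python sketch; the nested-list image layout and toy values are assumptions for illustration:

```python
def dark_channel(image, patch=3):
    """Dark channel of an RGB image given as H x W nested lists of (r, g, b):
    per-pixel channel minimum followed by a patch-wise minimum filter."""
    h, w = len(image), len(image[0])
    min_rgb = [[min(image[y][x]) for x in range(w)] for y in range(h)]
    r = patch // 2
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Minimum over the (clipped) patch centred at (x, y)
            dark[y][x] = min(min_rgb[yy][xx]
                             for yy in range(max(0, y - r), min(h, y + r + 1))
                             for xx in range(max(0, x - r), min(w, x + r + 1)))
    return dark

# 2x2 toy image: every pixel has at least one small colour component
img = [[(0.9, 0.1, 0.2), (0.8, 0.7, 0.6)],
       [(0.3, 0.4, 0.5), (0.05, 0.9, 0.9)]]
print(dark_channel(img, patch=3))  # every entry is 0.05 for this toy image
```

In DCP-based dehazing the transmission is then typically estimated as t = 1 - ω · dark(I / A), with A the atmospheric light; the variational TV models described above refine the result while removing noise.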

Journal ArticleDOI
TL;DR: The authors’ proposed method consists of three major stages: hand segmentation, hand shape sequence and body motion description, and sign classification, which is considered promising.
Abstract: With the increase in the number of deaf-mute people in the Arab world and the lack of Arabic sign language (ArSL) recognition benchmark data sets, there is a pressing need for publishing a large-volume and realistic ArSL data set. This study presents such a data set, which consists of 150 isolated ArSL signs. The data set is challenging due to the great similarity among hand shapes and motions in the collected signs. Along with the data set, a sign language recognition algorithm is presented. The authors’ proposed method consists of three major stages: hand segmentation, hand shape sequence and body motion description, and sign classification. The hand shape segmentation is based on the depth and position of the hand joints. Histograms of oriented gradients and principal component analysis are applied on the segmented hand shapes to obtain the hand shape sequence descriptor. The covariance of the three-dimensional joints of the upper half of the skeleton in addition to the hand states and face properties are adopted for motion sequence description. The canonical correlation analysis and random forest classifiers are used for classification. The achieved accuracy is 55.57% over 150 ArSL signs, which is considered promising.

Journal ArticleDOI
TL;DR: This study presents angled local directional pattern (ALDP), which is an improved version of LDP, for texture analysis, and experimental results show that ALDP substantially outperforms both LDP and LBP methods.
Abstract: The local binary pattern (LBP) is currently one of the most common feature extraction methods used for texture analysis. However, LBP suffers from random noise because it depends on image intensity. Recently, a more stable feature method was introduced: the local directional pattern (LDP), which uses the gradient space instead of pixel intensities. Typically, LDP generates a code based on the edge response values computed using Kirsch masks. Yet, despite the achievements of LDP, it has two drawbacks. The first is the static choice of the number of most significant bits used for LDP code generation. Second, the original LDP method uses the 8-neighbourhood to compute the LDP code, and the value of the centre pixel is ignored. This study presents the angled local directional pattern (ALDP), an improved version of LDP, for texture analysis. Experimental results on two different texture data sets, using six different classifiers, show that ALDP substantially outperforms both the LDP and LBP methods. ALDP has also been evaluated for recognising facial expression emotions, and the results indicate a very high recognition rate for the proposed method. An added advantage is that ALDP takes an adaptive approach to the selection of the number of significant bits, as opposed to LDP.
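For reference, the basic LBP code that LDP and ALDP improve upon thresholds each of the 8 neighbours against the centre pixel and packs the results into one byte. A minimal sketch; the clockwise bit ordering and sample patch are conventions chosen here for illustration, not mandated by the method:

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code of the centre pixel of a 3x3 patch:
    each neighbour >= centre contributes one bit, clockwise from top-left."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neighbours):
        if v >= c:
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # → 241
```

Because the code flips whenever a neighbour crosses the centre intensity, small random noise changes it easily; LDP's use of Kirsch edge responses instead of raw intensities is precisely what makes its codes more stable.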

Journal ArticleDOI
Shifei Ding, Xingyu Zhao, Hui Xu, Qiangbo Zhu, Yu Xue 
TL;DR: Compared with other multi-scale decompositions-based image fusion and other improved NSCT-PCNN algorithms, the algorithm presented in this study outperforms them in terms of objective criteria and visual appearance.
Abstract: The pulse coupled neural network (PCNN) is widely used in image processing because of its unique biological characteristics, which make it suitable for image fusion. When combined with the non-subsampled contourlet transform (NSCT) model, it is applied to overcome the difficulty of coefficient selection for the subbands of the NSCT model. However, in the original model, only the grey values of image pixels are used as input, without considering that the subjective vision of human eyes lacks sensitivity to local factors of the image. In this study, the improved pulse-coupled neural network model replaces the grey-scale value of the image as the model input with the weighted product of the gradient strength of the image and the local phase coherence. Finally, compared with other multi-scale decomposition-based image fusion algorithms and other improved NSCT-PCNN algorithms, the algorithm presented in this study outperforms them in terms of objective criteria and visual appearance.

Journal ArticleDOI
TL;DR: The authors propose an optimisation algorithm based on clustering and fitting (CF) for saliency detection which can effectively optimise co-saliency detection algorithms which already consider multiple similar images simultaneously to improve saliency of single images.
Abstract: In view of the observation that saliency maps generated by saliency detection algorithms usually show similarity imperfection against the ground truth, the authors propose an optimisation algorithm based on clustering and fitting (CF) for saliency detection. The algorithm uses a fitting model to represent the quantitative relationship between ground truth and algorithm-generated saliency maps. The authors use the K-means method to cluster the images into k clusters according to the similarities among images. Image similarity is measured in terms of scene and colour by using the GIST and colour histogram features, after which the fitting model for each cluster is calculated. The saliency map of a new image is optimised by using one of the fitting models which correspond to the cluster to which the image belongs. Experimental results show that their CF-based optimisation algorithm improves the performance of various single image saliency detection algorithms. Moreover, the improvement achieved by their algorithm when using both CF strategies is greater than the improvement achieved by the same algorithm when not using the clustering strategy. In addition, their proposed optimisation algorithm can also effectively optimise co-saliency detection algorithms which already consider multiple similar images simultaneously to improve saliency of single images.

Journal ArticleDOI
TL;DR: The authors show that the score-level fusion of CNN-extracted features and the appearance-based KFA method has a positive effect on classification accuracy, and the proposed method achieves a 95.31% classification rate on animal faces, significantly better than the other state-of-the-art methods.
Abstract: A real-world animal biometric system that detects and describes animal life in image and video data is an emerging subject in machine vision. Such systems develop computer vision approaches for the classification of animals. A novel method for animal face classification based on score-level fusion of the recently popular convolutional neural network (CNN) features and appearance-based descriptor features is presented. The method fuses two different approaches at the score level: one uses a CNN, which can automatically extract, learn and classify features; the other uses kernel Fisher analysis (KFA) in its feature extraction phase. The proposed method may also be used in other areas of image classification and object recognition. The experimental results show that automatic feature extraction with a CNN is better than other simple feature extraction techniques (both local- and appearance-based features) and, additionally, that an appropriate score-level combination of CNN and simple features can achieve even higher accuracy than applying the CNN alone. The authors show that the score-level fusion of CNN-extracted features and the appearance-based KFA method has a positive effect on classification accuracy. The proposed method achieves a 95.31% classification rate on animal faces, which is significantly better than the other state-of-the-art methods.
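The score-level fusion step can be sketched as below; min-max normalisation and the weighted sum with weight `w` are common choices assumed here, not details taken from the paper.

```python
def minmax_norm(scores):
    """Min-max normalise per-class scores so that the two classifiers
    are on a comparable scale before fusion."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def fuse_scores(cnn_scores, kfa_scores, w=0.5):
    """Weighted-sum score-level fusion of per-class scores from the
    CNN and KFA pipelines (`w` is a hypothetical tuning parameter)."""
    cnn, kfa = minmax_norm(cnn_scores), minmax_norm(kfa_scores)
    return [w * c + (1.0 - w) * k for c, k in zip(cnn, kfa)]

def predict(fused):
    """Predicted class index = argmax of the fused score vector."""
    return max(range(len(fused)), key=fused.__getitem__)
```

With `w = 0.5` both classifiers contribute equally; in practice the weight would be tuned on a validation set.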

Journal ArticleDOI
TL;DR: This study considers automatic image-based individual identification of the endangered Saimaa ringed seal and proposes a framework that starts with segmentation of the seal from the background and proceeds to various post-processing steps to make the pelage pattern more visible and the identification easier.
Abstract: In order to monitor an animal population and to track individual animals in a non-invasive way, identification of individual animals based on certain distinctive characteristics is necessary. In this study, automatic image-based individual identification of the endangered Saimaa ringed seal (Phoca hispida saimensis) is considered. Ringed seals have a distinctive permanent pelage pattern that is unique to each individual, which can be used as the basis for the identification process. The authors propose a framework that starts with segmentation of the seal from the background and proceeds through various post-processing steps that make the pelage pattern more visible and the identification easier. Finally, two existing species-independent individual identification methods are compared on a challenging data set of Saimaa ringed seal images. The results show that the segmentation and the proposed post-processing steps increase identification performance.

Journal ArticleDOI
TL;DR: The authors present efficient and effective algorithms for fall detection on the basis of sequences of depth maps and data from a wireless inertial sensor worn by a monitored person to permit distinguishing between accidental falls and activities of daily living.
Abstract: The authors present efficient and effective algorithms for fall detection based on sequences of depth maps and data from a wireless inertial sensor worn by the monitored person. A set of descriptors is discussed that permits distinguishing between accidental falls and activities of daily living. Experimental validation is carried out on a freely available dataset consisting of synchronised depth and accelerometric data. Extensive experiments are conducted in a scenario with a static camera facing the scene and an active camera observing the same scene from above. Several experiments comprising person detection, tracking and fall detection in real time are carried out to show the efficiency and reliability of the proposed solutions. The experimental results show that the developed fall detection algorithms have high sensitivity and specificity.
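A toy illustration of combining the two modalities, an accelerometric impact cue confirmed by a depth-derived height cue, might look like this. The thresholds, the height descriptor and the decision rule are illustrative assumptions, not the authors' algorithm.

```python
import math

def accel_magnitude(sample):
    """Total acceleration magnitude of a 3-axis sample (in g units)."""
    return math.sqrt(sum(a * a for a in sample))

def detect_fall(accel_seq, head_heights, impact_thresh=2.5, lying_thresh=0.4):
    """Two-cue fall test: an acceleration spike above `impact_thresh`
    followed by a low head height (taken from the depth map) below
    `lying_thresh`. Threshold values here are hypothetical."""
    for i, sample in enumerate(accel_seq):
        if accel_magnitude(sample) > impact_thresh:
            # confirm the impact with the depth cue at or after the spike
            if any(h < lying_thresh for h in head_heights[i:]):
                return True
    return False
```

Requiring both cues is what lets such systems reject activities of daily living (e.g. sitting down quickly produces a spike but no lying posture).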

Journal ArticleDOI
TL;DR: The method combines recent contour-based and region-based segmentation approaches and detects insects with higher accuracy than the most commonly used approaches.
Abstract: Insect detection is one of the most challenging problems in biometric image processing. This study develops a method to detect both individual insects and touching insects in trap images under extreme conditions. The method combines recent contour-based and region-based segmentation approaches. More precisely, the two contributions are an adaptive k-means clustering approach based on the contour's convex hull and a new region-merging algorithm. Quantitative evaluations show that the proposed method detects insects with higher accuracy than the most commonly used approaches.
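One plausible reading of the adaptive step, choosing the cluster count k for a blob of touching insects from its convex hull, can be sketched as follows; the area-ratio rule and `single_insect_area` are illustrative assumptions, not the paper's criterion.

```python
def estimate_k(hull_area, single_insect_area):
    """Guess how many touching insects a blob contains by comparing
    its convex-hull area with a typical single-insect area."""
    if single_insect_area <= 0:
        raise ValueError("single_insect_area must be positive")
    return max(1, round(hull_area / single_insect_area))
```

The estimated k would then seed k-means on the blob's pixels, splitting a merged region into individual insects before region merging cleans up over-segmentation.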

Journal ArticleDOI
TL;DR: Although age classification using gait is a very challenging task, experiments conducted on the OU-ISIR database show that the authors' proposed descriptor-fusion approach considerably enhances the recognition rate.
Abstract: Far from the camera, image resolution is significantly degraded and the person cannot cooperate with the acquisition equipment, so classical intrusive biometric approaches cannot be applied. As a non-intrusive biometric, gait analysis has gained the attention of the computer vision community for a number of potential applications, such as age estimation. Since gait is very sensitive to ageing, gait analysis is a suitable solution for age estimation at a great distance from the camera. Given the complexity of this task, the authors propose a new approach based on a descriptor cascade: a fusion of several efficient contour and silhouette descriptors. First, they introduce a descriptor based on a silhouette projection model (SM); then, the proposed descriptor is merged with the best existing descriptors in order to enhance classification performance. Although age classification using gait is a very challenging task, experiments conducted on the OU-ISIR database show that their descriptor-fusion approach considerably enhances the recognition rate.
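A minimal sketch of descriptor fusion by concatenating L2-normalised vectors; the paper's exact combination rule and descriptor dimensions are not given in this abstract, so this is only one plausible reading.

```python
import math

def l2_norm(vec):
    """Scale a descriptor to unit L2 length (zero vectors pass through)."""
    n = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / n for v in vec]

def fuse_descriptors(*descriptors):
    """Fuse descriptors (e.g. the SM silhouette-projection descriptor
    with contour descriptors) by concatenating their normalised
    vectors into one feature vector for the age classifier."""
    fused = []
    for d in descriptors:
        fused.extend(l2_norm(d))
    return fused
```

Normalising each descriptor before concatenation keeps any single descriptor from dominating the fused representation by sheer magnitude.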

Journal ArticleDOI
TL;DR: The authors' method adopts an iterative training process that improves transferred models by iterating among clustering, selection, exchange, and fine-tuning; it can enhance transferred CNN models by using more source datasets and is competitive with state-of-the-art methods.
Abstract: This study proposes progressive unsupervised co-learning for unsupervised person re-identification by introducing a co-training strategy into an iterative training process. The authors' method improves transferred models by iterating among clustering, selection, exchange, and fine-tuning. To solve the problem of transferring representations learned from multiple source datasets, the method utilises multiple convolutional neural network (CNN) models trained on different labelled source datasets, which feed each other soft labels obtained by clustering on the target dataset. The enhanced model can learn more discriminative person representations than a single model trained on multiple datasets. Experimental results on two large-scale benchmark datasets (i.e. DukeMTMC-reID and Market-1501) demonstrate that the method can enhance transferred CNN models by using more source datasets and is competitive with state-of-the-art methods.

Journal ArticleDOI
TL;DR: An innovative peg-free hand-geometry-based user identification system using spectral properties of a minimal edge-connected graph representation of the hand image is proposed, and a multiclass support vector machine is employed to identify the claimed user.
Abstract: In previously reported work, the user's hand was represented as a weighted, undirected, completely connected graph, and spectral properties of the graph were extracted and used as feature vectors. To reduce the complexity of representing the hand image as a completely connected graph and to achieve a higher identification rate, the hand image is here represented as a minimal edge-connected graph. Experiments are conducted separately for 16 empirically selected topologies of the minimal edge-connected graph to investigate the performance of the hand-geometry system, and the prominent edges of the hand-image graph are identified experimentally by computing the identification rate. In this study, an innovative peg-free hand-geometry-based user identification system using spectral properties of a minimal edge-connected graph representation of the hand image is proposed. A multiclass support vector machine is employed to identify the claimed user. The geometrical information embedded in the prominent edges contributes to a better identification rate. Experiments are carried out on two databases, namely the GPDS150 hand database and the hand images of the VTU-BEC-DB multimodal database. The minimal edge-connected graph with 30 prominent edges of the hand-image graph achieves better identification at a faster rate.
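Spectral features of such a hand graph can be illustrated by computing the Laplacian spectrum of a small weighted graph. A pure-Python classical Jacobi iteration is used here for the symmetric eigenvalue step; the paper's exact graph weights and choice of spectral features are not specified, so this is only a sketch.

```python
import math

def laplacian(n, edges):
    """Weighted graph Laplacian L = D - W for an undirected graph
    given as (i, j, weight) edges over n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def symmetric_eigenvalues(A, iterations=100):
    """Eigenvalues of a symmetric matrix via classical Jacobi
    rotations; sorted, they can serve as a spectral feature vector."""
    A = [row[:] for row in A]
    n = len(A)
    for _ in range(iterations):
        # locate the largest off-diagonal entry
        p, q, biggest = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(A[i][j]) > biggest:
                    biggest, p, q = abs(A[i][j]), i, j
        if biggest < 1e-12:
            break
        # rotation angle that zeroes A[p][q]
        theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # A <- A * J (update columns p, q)
            akp, akq = A[k][p], A[k][q]
            A[k][p] = c * akp - s * akq
            A[k][q] = s * akp + c * akq
        for k in range(n):  # A <- J^T * A (update rows p, q)
            apk, aqk = A[p][k], A[q][k]
            A[p][k] = c * apk - s * aqk
            A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))
```

The sorted eigenvalues form a permutation-invariant descriptor of the graph, which is what makes spectral properties attractive as hand-geometry features; the resulting vectors would then be fed to the multiclass SVM.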