
Showing papers in "The Visual Computer in 2019"


Journal ArticleDOI
TL;DR: This survey presents detailed attributes of CNNs, with special emphasis on the optimization methods utilized in CNN-based approaches, and introduces a taxonomy that summarizes important aspects of CNNs for approaching crowd behaviour analysis.
Abstract: Interest in automatic crowd behaviour analysis has grown considerably in the last few years. Crowd behaviour analysis has become integral worldwide to ensuring peaceful event organization and minimal casualties at places of public and religious interest. Traditionally, crowd analysis was performed using handcrafted features. However, real-world images and videos contain nonlinearities that must be exploited efficiently to gain accuracy in the results. As in many other computer vision areas, deep learning-based methods have taken giant strides toward state-of-the-art performance in crowd behaviour analysis. This paper presents a comprehensive survey of current convolutional neural network (CNN)-based methods for crowd behaviour analysis. We have also surveyed popular software tools for CNNs from recent years. This survey presents detailed attributes of CNNs with special emphasis on the optimization methods utilized in CNN-based approaches. It also reviews fundamental and innovative methodologies, both conventional and recent, reported in the last few years. We introduce a taxonomy that summarizes important aspects of the CNN for approaching crowd behaviour analysis. Details of the proposed architectures, crowd analysis needs and their respective datasets are reviewed. In addition, we summarize and discuss the main works proposed so far, with particular interest in how CNNs treat the temporal dimension of data, their distinguishing features, and the opportunities and challenges for future research. To the best of our knowledge, this is a unique survey of crowd behaviour analysis using CNNs. We hope that this survey will become a reference in this ever-evolving field of research.

122 citations


Journal ArticleDOI
TL;DR: In this survey, the image captioning approaches and improvements based on deep neural networks are introduced, including the characteristics of the specific techniques.
Abstract: Image captioning is a hot topic in image understanding, and it is composed of two natural parts ("look" and "language expression") which correspond to the two most important fields of artificial intelligence ("machine vision" and "natural language processing"). With the development of deep neural networks and better-labeled databases, image captioning techniques have developed quickly. In this survey, the image captioning approaches and improvements based on deep neural networks are introduced, including the characteristics of the specific techniques. The early image captioning approach based on deep neural networks is the retrieval-based method, which makes use of a search technique to find an appropriate image description. The template-based method separates the image captioning process into object detection and sentence generation. Recently, end-to-end learning-based image captioning methods have proven effective, and they can generate more flexible and fluent sentences. In this survey, the image captioning methods are reviewed in detail, and some remaining challenges are discussed.

64 citations


Journal ArticleDOI
TL;DR: This work presents an interactive visual analytics system and approach that enables users to visualize, understand and explore univariate or multivariate long time-series data in one image using a connected scatter plot and supports interactive analysis and exploration for pattern discovery and outlier detection.
Abstract: There is a need for solutions which assist users to understand long time-series data by observing its changes over time, finding repeated patterns, detecting outliers, and effectively labeling data instances. Although these tasks are quite distinct and are usually tackled separately, we present an interactive visual analytics system and approach that addresses them in a single system. It enables users to visualize, understand and explore univariate or multivariate long time-series data in one image using a connected scatter plot. It supports interactive analysis and exploration for pattern discovery and outlier detection. Different dimensionality reduction techniques are used and compared in our system. Because of its power in extracting features, deep learning is used for multivariate time series, along with 2D reduction techniques, for rapid and easy interpretation of and interaction with large amounts of time-series data. We deploy our system on different time-series datasets and report two real-world case studies that are used to evaluate our system.
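
The core interaction idea lends itself to a compact illustration. Below is a minimal sketch, not the paper's system: a long series is sliced into sliding windows, each window is reduced to 2D (PCA stands in here for the deep-learning and other reduction options the paper compares), and the points are drawn connected in time order, so repeated patterns revisit the same region and outliers stand apart. The window size, stride and synthetic signal are assumptions.

```python
# Connected scatter plot of sliding windows of a long time series.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = np.arange(5000)
series = np.sin(t / 30.0) + 0.05 * rng.normal(size=t.size)
series[3000:3040] += 3.0                      # injected outlier segment

win, stride = 100, 10
windows = np.stack([series[i:i + win] for i in range(0, len(series) - win, stride)])
xy = PCA(n_components=2).fit_transform(windows)   # one 2D point per window

# Points connected in time order: periodic patterns form loops,
# the outlier segment leaves the loop.
plt.plot(xy[:, 0], xy[:, 1], "-o", markersize=2, linewidth=0.5)
plt.title("connected scatter plot of sliding windows")
plt.show()
```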

63 citations


Journal ArticleDOI
TL;DR: A novel descriptor, named concatenation of local and global color features, extracts color-texture features via two information types and outperforms existing state-of-the-art methods.
Abstract: Several techniques have recently been proposed to extract the features of an image. Feature extraction is one of the most important steps in various image processing and computer vision applications such as image retrieval, image classification, matching and object recognition. A relevant feature (global or local) contains discriminating information and is able to distinguish one object from others. Global features describe the entire image, whereas local features describe image patches (small groups of pixels). In this paper, we present a novel descriptor to extract color-texture features via two information types. Our descriptor, named concatenation of local and global color features, is based on the fusion of global features using the wavelet transform and a modified version of the local ternary pattern, whereas for the local features the speeded-up robust features (SURF) descriptor and a bag-of-words model are used. All the features are extracted from the three color planes. To evaluate the effectiveness of our descriptor for image classification, we carried out experiments using the challenging datasets New-BarkTex, Outex-TC13, Outex-TC14, MIT scene, UIUC sports event, Caltech 101 and MIT indoor scene. Experimental results show that our descriptor outperforms existing state-of-the-art methods.
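
One building block of such descriptors can be shown concretely. The sketch below implements a basic local ternary pattern (the paper uses a modified version, combined with wavelet features, SURF and bag-of-words on each color plane): each neighbor is coded -1/0/+1 against the center pixel within a tolerance t, the ternary code is split into two binary codes, and their histograms serve as texture features. The tolerance and toy image are assumptions.

```python
# Basic local ternary pattern (LTP) histogram for one image plane.
import numpy as np

def ltp_codes(img, t=5):
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]                         # center pixels
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        upper |= ((n >= c + t).astype(np.int32) << bit)   # "+1" ternary values
        lower |= ((n <= c - t).astype(np.int32) << bit)   # "-1" ternary values
    return upper, lower

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
u, l = ltp_codes(img)
feats = np.concatenate([np.bincount(u.ravel(), minlength=256),
                        np.bincount(l.ravel(), minlength=256)])
print(feats.shape)   # (512,) texture histogram for one color plane
```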

61 citations


Journal ArticleDOI
TL;DR: Promising results are obtained when compared with similar state-of-the-art methods, demonstrating the robustness of the proposed hybrid feature vector to challenges such as illumination and view variations posed by the datasets.
Abstract: Understanding human action and activity from video data is a growing field that has gained importance due to surveillance, security, entertainment and personal logging. In this work, a new hybrid technique is proposed for describing human action and activity in video sequences. The unified framework yields a robust feature vector wrapping both global and local information, strengthening the discriminative depiction for action recognition. Initially, entropy-based texture segmentation is used for human silhouette extraction, followed by the construction of average energy silhouette images (AEIs). AEIs are 2D projections of the binary human silhouette frames of the video sequences, which reduces the time complexity of feature vector generation. Spatial distribution gradients are computed at different levels of resolution of sub-images of the AEI, capturing the overall shape variations of the human silhouette during the activity. Owing to the scale, rotation and translation invariance of spatiotemporal interest points (STIPs), a vocabulary of DoG-based STIPs is created using vector quantization, which is unique to each activity class. Extensive experiments are conducted to validate the performance of the proposed approach on four standard benchmarks, i.e., Weizmann, KTH, Ballet Movements and Multi-view IXMAS. Promising results are obtained when compared with similar state-of-the-art methods, demonstrating the robustness of the proposed hybrid feature vector to challenges such as illumination and view variations posed by the datasets.
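
The AEI construction reduces to averaging binary masks, which the following minimal sketch illustrates; the entropy-based silhouette extraction, the gradient pyramid levels and the STIP vocabulary are omitted, and the random silhouettes are placeholders.

```python
# Average energy silhouette image (AEI) from a stack of binary silhouettes.
import numpy as np

rng = np.random.default_rng(0)
silhouettes = rng.random((40, 128, 96)) > 0.7   # 40 binary masks (placeholder)

aei = silhouettes.mean(axis=0)                  # average energy image

# Spatial gradients at the finest pyramid level; coarser levels of the
# paper's pyramid would downsample the AEI first.
gy, gx = np.gradient(aei)
print(aei.shape, float(gx.std()), float(gy.std()))
```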

42 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel method for automatically classifying breast thermogram images using local energy features of wavelet sub-bands, obtaining an accuracy of 91%, sensitivity of 87.23% and specificity of 94.34% with a Gaussian-kernel SVM classifier on normalized breast thermograms.
Abstract: Breast thermography is a non-invasive imaging technique used for early detection of breast cancer based on temperatures. The temperature matrix of the breast captures minute variations in temperature, which are significant for early detection of breast cancer. The minimum and maximum temperatures and their range may differ for each breast thermogram, so normalization of the temperature matrices is essential to bring the different temperature ranges to a common scale. In this article, we demonstrate the importance of temperature matrix normalization of breast thermograms. This paper also proposes a novel method for automatically classifying breast thermogram images using local energy features of wavelet sub-bands. A significant subset of features is selected by random subset feature selection (RSFS) and a genetic algorithm. Features selected by the RSFS method are found to be relevant for detecting asymmetry between the right and left breasts. We obtained an accuracy of 91%, sensitivity of 87.23% and specificity of 94.34% using a Gaussian-kernel SVM classifier on normalized breast thermograms. The classification accuracy of a set of one hundred normalized breast thermograms is compared with that of the corresponding non-normalized set, showing an increase in accuracy of 16% for the normalized thermograms.
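
The two central ingredients, normalization to a common scale and wavelet sub-band energies, can be sketched compactly. This is a minimal illustration rather than the paper's pipeline; the wavelet family ('db4'), the decomposition level and the fake temperature matrix are assumptions.

```python
# Min-max normalization of a temperature matrix + wavelet sub-band energies.
import numpy as np
import pywt

def normalize_thermogram(t):
    """Bring a temperature matrix to a common [0, 1] scale."""
    t = t.astype(np.float64)
    return (t - t.min()) / (t.max() - t.min() + 1e-12)

def wavelet_energy_features(img, wavelet="db4", level=3):
    """Energy of each wavelet sub-band, stacked into a feature vector."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, level=level)
    feats = [np.sum(coeffs[0] ** 2)]            # approximation band
    for (cH, cV, cD) in coeffs[1:]:             # detail bands per level
        feats += [np.sum(cH ** 2), np.sum(cV ** 2), np.sum(cD ** 2)]
    return np.array(feats)

thermogram = np.random.uniform(28.0, 36.5, size=(128, 128))  # fake temperatures
x = wavelet_energy_features(normalize_thermogram(thermogram))
print(x.shape)   # (1 + 3*level,) features, e.g. fed to an RBF-kernel SVM
```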

40 citations


Journal ArticleDOI
TL;DR: This work proposes a convolutional hierarchical autoencoder model for motion prediction, with a novel encoder which incorporates 1D convolutional layers and a hierarchical topology, and shows that the model outperforms state-of-the-art methods in both short-term and long-term prediction.
Abstract: Human motion prediction is a challenging problem due to complicated human body constraints and high-dimensional dynamics. Recent deep learning approaches adopt RNNs, CNNs or fully connected networks to learn motion features, which do not fully exploit the hierarchical structure of human anatomy. To address this problem, we propose a convolutional hierarchical autoencoder model for motion prediction with a novel encoder which incorporates 1D convolutional layers and a hierarchical topology. The new network is more efficient than existing deep learning models with respect to size and speed. We train the generic model on the Human3.6M and CMU benchmarks and conduct extensive experiments. The qualitative and quantitative results show that our model outperforms state-of-the-art methods in both short-term and long-term prediction.
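
The encoder idea can be sketched as follows: 1D convolutions run along the time axis, one branch per body part, and the per-part codes are fused into a whole-body code. This is a minimal PyTorch sketch, not the authors' architecture; the four-limb grouping and all layer sizes are assumptions.

```python
# Hierarchical 1D-convolutional motion encoder (illustrative sketch).
import torch
import torch.nn as nn

class LimbEncoder(nn.Module):
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
    def forward(self, x):            # x: (batch, joints*3, frames)
        return self.net(x)

class HierarchicalEncoder(nn.Module):
    """Encode each limb separately, then fuse into a whole-body code."""
    def __init__(self, limb_channels, hidden=64):
        super().__init__()
        self.limbs = nn.ModuleList(LimbEncoder(c, hidden) for c in limb_channels)
        self.body = nn.Conv1d(hidden * len(limb_channels), 128, kernel_size=1)
    def forward(self, limb_inputs):
        parts = [enc(x) for enc, x in zip(self.limbs, limb_inputs)]
        return self.body(torch.cat(parts, dim=1))

# Toy usage: 4 limbs with 5 joints each (xyz coordinates), 50 frames.
enc = HierarchicalEncoder([15, 15, 15, 15])
code = enc([torch.randn(2, 15, 50) for _ in range(4)])
print(code.shape)   # torch.Size([2, 128, 50])
```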

39 citations


Journal ArticleDOI
TL;DR: A static software-based approach that uses quality features to detect liveness in a fingerprint image, extracting eight sensor-independent quality features from the detailed ridge–valley structure of a fingerprint at the local level to form a 13-dimensional feature vector.
Abstract: Fingerprint-based recognition is widely deployed in different domains. However, current recognition systems are vulnerable to presentation attacks, which utilize an artificial replica of a fingerprint to deceive the sensors. In such scenarios, fingerprint liveness detection is required to ensure the actual presence of a live fingerprint. In this paper, we propose a static software-based approach using quality features to detect liveness in a fingerprint image. The proposed method extracts eight sensor-independent quality features from the detailed ridge–valley structure of a fingerprint at the local level to form a 13-dimensional feature vector. Sequential forward floating selection and random forest feature selection are used to select the optimal feature set from the created feature vector. To classify fake and live fingerprints, we have used support vector machine, random forest and gradient boosted tree classifiers. The proposed method is tested on the publicly available database of the LivDet 2009 competition. The experimental results demonstrate that a least average classification error of 5.3% is achieved on the LivDet 2009 database, demonstrating the superiority of the proposed method over current state-of-the-art approaches. Additionally, we have analyzed the importance of individual features on the LivDet 2009 database, and the effectiveness of the best-performing features is evaluated on the LivDet 2011, 2013 and 2015 databases. The obtained results show that the proposed approach performs well irrespective of the different sensors and materials used in these databases. Further, the proposed method utilizes a single fingerprint image, which makes it more user-friendly, faster and less intrusive.
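
The classification stage maps naturally to standard tooling. The sketch below uses synthetic stand-ins for the 13-dimensional quality features and approximates sequential forward floating selection with scikit-learn's greedy SequentialFeatureSelector (which lacks the floating step); all hyper-parameters are assumptions rather than the paper's configuration.

```python
# Feature selection + random forest classification of live vs. fake prints.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 13))        # 13 quality features per fingerprint
y = rng.integers(0, 2, size=400)      # 0 = fake, 1 = live (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
sel = SequentialFeatureSelector(rf, n_features_to_select=8, direction="forward")
sel.fit(X_tr, y_tr)                   # greedy forward selection

rf.fit(sel.transform(X_tr), y_tr)
err = 1.0 - rf.score(sel.transform(X_te), y_te)
print(f"average classification error: {err:.3f}")
```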

36 citations


Journal ArticleDOI
TL;DR: A novel bag-of-poses framework for action recognition using 3D skeleton data, which assumes that any action can be represented by a set of predefined spatiotemporal poses and encodes actions with key-pose histograms.
Abstract: Over the last few decades, human action recognition has become one of the most challenging tasks in the field of computer vision. Effortless and accurate extraction of 3D skeleton information has recently been achieved by means of economical depth sensors and state-of-the-art deep learning approaches. In this study, we introduce a novel bag-of-poses framework for action recognition using 3D skeleton data. Our assumption is that any action can be represented by a set of predefined spatiotemporal poses. The pose descriptor is composed of three parts: the first part is the concatenation of the normalized coordinates of the skeleton joints; the second part consists of the temporal displacement of the joints computed with a predefined temporal offset; and the third part is the temporal displacement with respect to the previous frame in the sequence. In order to generate the key poses, we apply K-means clustering over all the training pose descriptors of the dataset. An SVM classifier is trained with the generated key poses to classify action poses. Accordingly, every action in the dataset is encoded with a key-pose histogram. An ELM classifier is used for action recognition due to its fast, accurate and reliable performance compared to other classifiers. The proposed framework is validated on five publicly available benchmark 3D action datasets, achieving state-of-the-art results on three of the datasets and competitive results on the other two compared to other methods.
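
The bag-of-poses encoding can be sketched in a few lines: per-frame descriptors are clustered into K key poses, and each sequence becomes a key-pose histogram. The sketch below builds a simplified descriptor from normalized joints and two temporal displacements; K, the offset and the toy data are assumptions, and the SVM/ELM classification stages are omitted.

```python
# Key-pose vocabulary via K-means, then histogram encoding of a sequence.
import numpy as np
from sklearn.cluster import KMeans

def pose_descriptors(seq, offset=5):
    """seq: (frames, joints, 3). Concatenate centered joints with the
    displacement over `offset` frames and over the previous frame."""
    seq = seq - seq.mean(axis=1, keepdims=True)        # center on the body
    d_off = np.roll(seq, -offset, axis=0) - seq
    d_prev = seq - np.roll(seq, 1, axis=0)
    return np.concatenate([seq, d_off, d_prev], axis=1).reshape(len(seq), -1)

rng = np.random.default_rng(0)
train = [rng.normal(size=(60, 20, 3)) for _ in range(10)]   # toy sequences

K = 32
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0)
kmeans.fit(np.vstack([pose_descriptors(s) for s in train]))

def encode(seq):
    """Normalized histogram of key-pose assignments (the action feature)."""
    labels = kmeans.predict(pose_descriptors(seq))
    h = np.bincount(labels, minlength=K).astype(float)
    return h / h.sum()

print(encode(train[0]).shape)   # (32,) histogram fed to the final classifier
```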

35 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel algorithm that introduces a sparse prior on the low-rank component, and an efficient solving method based on a two-stage iterative scheme is developed to address the resulting optimization problem.
Abstract: Recovering the low-rank and sparse components of a given matrix is a challenging problem with many real applications. This paper proposes a novel algorithm to address this problem by introducing a sparse prior on the low-rank component. Specifically, the low-rank component is assumed to be sparse in a transform domain, and a sparse regularizer formulated as an ℓ1-norm term is employed to promote the sparsity. The truncated nuclear norm is used to model the low-rank prior, rather than the nuclear norm used in most existing methods, to achieve a better approximation to the rank of the considered matrix. Furthermore, an efficient solving method based on a two-stage iterative scheme is developed to address the resulting optimization problem. The proposed algorithm is applied to synthetic data and real applications including face image shadow removal and video background subtraction, and the experimental results validate the effectiveness and accuracy of the proposed approach compared with other methods.
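
Schemes of this kind alternate between proximal steps for the two priors. The sketch below shows the two operators involved, soft thresholding for the ℓ1 term and a truncated singular value thresholding that leaves the r largest singular values untouched (the truncated nuclear norm idea); a full solver would iterate these with a multiplier update. The values of tau and r are illustrative assumptions.

```python
# Proximal operators used by low-rank + sparse decomposition schemes.
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||X||_1 (entrywise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def truncated_svt(m, tau, r):
    """Shrink all singular values except the r largest."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    s = s.copy()
    s[r:] = np.maximum(s[r:] - tau, 0.0)
    return (u * s) @ vt

rng = np.random.default_rng(0)
L = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))        # rank-3 part
S = (rng.random((50, 40)) < 0.05) * rng.normal(5.0, 1.0, (50, 40))
M = L + S

L_hat = truncated_svt(M, tau=1.0, r=3)        # low-rank update
S_hat = soft_threshold(M - L_hat, tau=0.5)    # sparse update
print(np.linalg.svd(L_hat, compute_uv=False)[:6].round(2), (S_hat != 0).mean())
```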

32 citations


Journal ArticleDOI
TL;DR: A deep neural network is designed and trained to calculate a dominant light direction from a single RGB-D image, enhancing visual coherence in augmented reality applications by providing accurate and temporally coherent estimates of real illumination.
Abstract: This paper presents a novel method for illumination estimation from RGB-D images. The main focus of the proposed method is to enhance visual coherence in augmented reality applications by providing accurate and temporally coherent estimates of real illumination. For this purpose, we designed and trained a deep neural network which calculates a dominant light direction from a single RGB-D image. Additionally, we propose a novel method for real-time outlier detection to achieve temporally coherent estimates. Our method for light source estimation in augmented reality was evaluated on a set of real scenes. Our results demonstrate that the neural network can successfully estimate light sources even in scenes which were not seen by the network during training. Moreover, we compared our results with illumination estimates calculated by a state-of-the-art illumination estimation method. Finally, we demonstrate the applicability of our method on numerous augmented reality scenes.
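
A regression network of this kind is easy to sketch. The following is a minimal PyTorch illustration, not the authors' architecture: a small CNN maps a 4-channel RGB-D input to a unit 3-vector for the dominant light direction, trained with a cosine-style angular loss. All layer sizes are assumptions.

```python
# Dominant-light-direction regression from RGB-D (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightDirNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 3)

    def forward(self, x):                    # x: (batch, 4, H, W) RGB-D
        d = self.head(self.features(x).flatten(1))
        return F.normalize(d, dim=1)         # unit light-direction vector

net = LightDirNet()
pred = net(torch.randn(8, 4, 120, 160))
target = F.normalize(torch.randn(8, 3), dim=1)
loss = (1.0 - F.cosine_similarity(pred, target)).mean()   # angular-style loss
print(pred.shape, float(loss))
```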

Journal ArticleDOI
TL;DR: This paper is the first work to propose multi-layer robust principal component analysis (multi-layer RPCA) and use it for camera-trap image segmentation; compared with state-of-the-art background subtraction algorithms, it outperformed these other methods.
Abstract: The segmentation of animals from camera-trap images is a difficult task, as these images pose various challenges due to environmental conditions and hardware limitations. We propose a multi-layer robust principal component analysis (multi-layer RPCA) approach for background subtraction. Our method computes sparse and low-rank images from a weighted sum of descriptors, using color and texture features as a case study for camera-trap image segmentation. The segmentation algorithm comprises histogram equalization or Gaussian filtering as pre-processing, and morphological filters with active contours as post-processing. The parameters of our multi-layer RPCA were optimized with an exhaustive search. The database consists of camera-trap images from the Colombian forest taken by the Instituto de Investigacion de Recursos Biologicos Alexander von Humboldt. We analyzed the performance of our method in the inherently challenging situations of camera-trap images. Furthermore, we compared our method with some state-of-the-art background subtraction algorithms, and our multi-layer RPCA outperformed these other methods, reaching 76.17% and 69.97% average fine-grained F-measure for color and infrared sequences, respectively. To the best of our knowledge, this paper is the first work proposing multi-layer RPCA and using it for camera-trap image segmentation.

Journal ArticleDOI
TL;DR: To address the shortcomings of traditional chaotic encryption and its low security, a double chaotic image encryption algorithm based on the fractional Fourier transform is proposed and achieves a better encryption effect.
Abstract: To address the shortcomings of traditional chaotic encryption and its problems of low security, a double chaotic image encryption algorithm based on the fractional Fourier transform is proposed. In this algorithm, an optimized cipher is first obtained with the help of Henon mapping and fractional Fourier transforms; the ciphertext image obtained by this optimization is then taken as input, and, according to the sequence of the optimal solution, logistic chaos is applied to produce the final ciphertext image. The algorithm combines chaotic systems and Fourier transforms, which allows the plaintext to be well hidden and achieves scrambling in both the spatial and frequency domains. Experimental results show that the improved encryption algorithm achieves a better encryption effect: it not only has strong sensitivity and a large key space, but can also resist attacks effectively. It has practical application value in image information security.
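
One ingredient of such schemes can be shown concretely: a chaotic keystream scrambling pixel values. The sketch below uses only a logistic map XORed with the image; the paper's full algorithm additionally involves Henon mapping and the fractional Fourier transform, and the parameters r and x0 here are illustrative, not the paper's keys.

```python
# Logistic-map keystream encryption of an image (one chaotic ingredient).
import numpy as np

def logistic_keystream(n, r=3.99, x0=0.61):
    """Iterate x <- r*x*(1-x) and quantize the orbit to bytes."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
ks = logistic_keystream(img.size).reshape(img.shape)

cipher = img ^ ks          # encryption: XOR with the chaotic keystream
plain = cipher ^ ks        # decryption with the same key parameters
assert np.array_equal(plain, img)
```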

Journal ArticleDOI
TL;DR: A new descriptor vector for expressive human motions is proposed, inspired by the Laban movement analysis (LMA) method, a descriptive language with an underlying semantics that qualifies human motion in its different aspects.
Abstract: The purpose of this paper is to describe human motions and emotions that appear in real video images with compact and informative representations. We aim to recognize expressive motions and analyze the relationship between human body features and emotions. We propose a new descriptor vector for expressive human motions inspired by the Laban movement analysis (LMA) method, a descriptive language with an underlying semantics that qualifies human motion in its different aspects. The proposed descriptor is fed into a machine learning framework including random decision forests, a multi-layer perceptron and two multiclass support vector machine methods. We evaluated our descriptor first for motion recognition and second for emotion recognition from the analysis of expressive body movements. Preliminary experiments with three public datasets, MSRC-12, MSR Action 3D and UTKinect, showed that our model performs better than many existing motion recognition methods. We also built a dataset composed of 10 control motions (move, turn left, turn right, stop, sit down, wave, dance, introduce yourself, increase velocity, decrease velocity), on which our descriptor vector achieved high recognition performance. In the second experimental part, we evaluated our descriptor on a dataset composed of expressive gestures performed with four basic emotions selected from Russell's circumplex model of affect (happy, angry, sad and calm). The same machine learning methods were used for recognizing human emotions from expressive motions. A 3D virtual avatar was introduced to reproduce human body motions, and three aspects were analyzed: (1) how the expressed emotions are classified by humans, (2) how the motion descriptor is evaluated by humans, and (3) the relationship between human emotions and motion features.

Journal ArticleDOI
TL;DR: A multiple feature subspaces analysis (MFSA) approach that takes advantage of facial symmetry; it is simple and efficient to implement, yet achieves better or competitive performance when recognizing face images taken in both constrained and unconstrained environments.
Abstract: Collecting samples is one of the main difficulties for face recognition; for example, in most real-world applications such as law enforcement, e-passports, and ID card identification, it is customary to collect a single sample per person (SSPP). Unfortunately, in such an SSPP scenario, many existing face recognition methods suffer a serious performance drop or fail due to their inability to learn the discriminative information of a person from a single sample. To address the SSPP problem, in this paper, we propose a multiple feature subspaces analysis (MFSA) approach, which takes advantage of facial symmetry. First, we divide each enrolled face into two halves about the bilateral symmetry axis and further partition every half into several local face patches. Second, we cluster all the patches into multiple groups according to their locations in the half face and formulate SSPP as an MFSA problem by learning a feature subspace for each group, so that the confusion between inter-class and intra-class variations of face patches is removed and more discriminative features can be extracted from each subspace. To recognize a target person, a k-NN classifier is employed in each subspace to predict the label of a face patch, and a majority voting strategy is used to identify the unlabeled subject. Compared with state-of-the-art methods, MFSA is simple and efficient to implement, yet achieves better or competitive performance when recognizing face images taken in both constrained and unconstrained environments.
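
The patch-and-vote structure can be sketched compactly: one subspace and nearest-neighbor classifier per patch location, with a majority vote over patch predictions. The sketch below uses plain PCA per location (the paper clusters locations into groups and exploits the symmetric halves); the patch size, subspace dimension and random gallery are assumptions.

```python
# Per-patch PCA subspaces + 1-NN + majority voting (single sample per person).
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_people, ph, pw = 5, 16, 16
gallery = rng.normal(size=(n_people, 64, 64))      # one sample per person

def patches(img):
    return [img[i:i+ph, j:j+pw].ravel()
            for i in range(0, 64, ph) for j in range(0, 64, pw)]

# One subspace + 1-NN classifier per patch location.
locs = list(zip(*[patches(g) for g in gallery]))   # locs[k] = location k patches
models = []
for loc in locs:
    X = np.stack(loc)
    pca = PCA(n_components=min(4, len(X)))
    knn = KNeighborsClassifier(n_neighbors=1).fit(pca.fit_transform(X),
                                                  np.arange(n_people))
    models.append((pca, knn))

probe = gallery[2] + 0.1 * rng.normal(size=(64, 64))   # noisy probe of person 2
votes = [int(knn.predict(pca.transform([p]))[0])
         for (pca, knn), p in zip(models, patches(probe))]
print(Counter(votes).most_common(1)[0][0])             # majority-voted identity
```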

Journal ArticleDOI
TL;DR: The aim of this paper has been to improve the efficiency of the original MPS method and to present more efficient strategies for the selection of minimal paths.
Abstract: In many countries, the common practice for monitoring road surface conditions consists in collecting pavement images at traffic speed with dedicated imaging devices. Among the surface distresses, cracking can serve as a condition indicator of the pavement structure. Image processing techniques have therefore been developed to computerize the survey of cracking in support of human visual inspection. Among the existing automatic crack detection methods, the minimal path selection (MPS) technique has shown better performance than other methods on simulated and field pavement images (Amhaz et al. in IEEE Trans Intell Transp Syst 17:2718–2729, 2016; Amhaz, in: Detection automatique de fissures dans des images de chaussee par selection de chemins minimaux, 2015). In return, MPS suffers from a large computing time. Within this scope, the aim of this paper is to improve the efficiency of the original MPS method and to present more efficient strategies for the selection of minimal paths. Among the five main steps of the original MPS version, the improvements address the first three steps, which enable the segmentation of the crack skeleton, and reduce the computing burden of the last two steps. Tests of the two improved MPS versions on image samples illustrate a large reduction in false positive paths without reducing the overall performance of the segmentation technique. Moreover, the computing time is divided by a factor of roughly sixty for the latest MPS version.
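
The primitive underlying MPS is a minimal path between two endpoints with darkness-based costs, which Dijkstra's algorithm computes directly, as the sketch below shows on a synthetic crack. MPS proper adds endpoint detection and the path selection strategies this paper improves; the image and endpoints here are placeholders.

```python
# Minimal (cheapest) path between two pixels with darkness-based costs.
import heapq
import numpy as np

rng = np.random.default_rng(0)
img = 0.8 + 0.05 * rng.random((60, 60))
img[30, 5:55] = 0.1                      # a dark horizontal "crack"

def minimal_path(cost, start, goal):
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    dist[start] = 0.0
    prev, pq = {}, [(0.0, start)]
    while pq:
        d, (y, x) = heapq.heappop(pq)
        if (y, x) == goal:
            break
        if d > dist[y, x]:
            continue                     # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and d + cost[ny, nx] < dist[ny, nx]:
                dist[ny, nx] = d + cost[ny, nx]
                prev[ny, nx] = (y, x)
                heapq.heappush(pq, (dist[ny, nx], (ny, nx)))
    node, path = goal, [goal]            # backtrack from goal to start
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

path = minimal_path(img, (30, 5), (30, 54))
print(len(path))    # the path hugs the dark crack pixels
```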

Journal ArticleDOI
TL;DR: A multi-scale porous structure-based lightweight framework is presented to reduce the weight of 3D-printed objects while meeting specified requirements.
Abstract: Lightweight modeling is one of the most important research subjects in modern fabrication manufacturing, providing not only a low-cost solution but also functional applications, especially for fabrication using 3D printing. This paper presents a multi-scale porous structure-based lightweight framework to reduce the weight of 3D-printed objects while meeting specified requirements. Specifically, the triply periodic minimal surface (TPMS) is exploited to design a multi-scale porous structure, which can achieve high mechanical performance with light weight. The multi-scale porous structure is constructed using compactly supported radial basis functions, and it inherits the good properties of TPMS, such as smoothness, full connectivity (no closed hollows) and quasi-self-supporting behavior (free of extra supports in most cases). The lightweight problem utilizing the porous structures is then formulated as a constrained optimization. Finally, a strength-to-weight optimization method is proposed to obtain the lightweight models. It is also worth noting that the proposed porous structures can be perfectly fabricated with common 3D printing technologies, because the leftover material, such as the liquid in SLA, can be removed through the fully connected void channels.
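
TPMS structures are defined by simple implicit functions, which makes the idea easy to sketch: sample the gyroid level set on a grid and threshold it, letting the threshold vary in space to grade the porosity. This is a crude stand-in for the paper's RBF-based multi-scale construction; the gyroid choice and grid size are assumptions.

```python
# Graded gyroid (TPMS) porous solid as a voxel model.
import numpy as np

def gyroid(x, y, z):
    """Classic gyroid level set: sin x cos y + sin y cos z + sin z cos x."""
    return (np.sin(x) * np.cos(y) + np.sin(y) * np.cos(z)
            + np.sin(z) * np.cos(x))

n = 64
t = np.linspace(0, 4 * np.pi, n)
X, Y, Z = np.meshgrid(t, t, t, indexing="ij")

c = 0.8 * (X / X.max() - 0.5)      # spatially varying threshold -> graded walls
solid = gyroid(X, Y, Z) < c        # boolean voxel model of the porous solid

print(f"solid volume fraction: {solid.mean():.2f}")
# A printable mesh could then be extracted with marching cubes, e.g.
# skimage.measure.marching_cubes(gyroid(X, Y, Z) - c, level=0.0).
```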

Journal ArticleDOI
TL;DR: An emotion-based diversity behavior model is presented, adopting the OCEAN personality model and the OCC emotion model and enriched with the CA-SIRS emotion contagion model; it provides more diverse individual behaviors than existing emotion-based escape simulation models.
Abstract: Computer simulations of crowd behaviors in public emergencies have become a hot research field, as they are able to provide rich, valuable data for public safety analysis and disaster-preparedness measures. In most existing crowd escape simulation systems, human behaviors are often limited to taking flight or running away. However, depending on the personality and the current emotion of each individual, people may behave in various ways, such as acting as a Samaritan or giving up on life. To simulate such diverse crowd behaviors, this paper presents an emotion-based diversity behavior model that adopts the OCEAN personality model and the OCC emotion model, enriched with the incorporation of the CA-SIRS emotion contagion model. We also consider the effects of the Yerkes–Dodson law on individuals' behaviors, which allows individuals to act authentically under stressful circumstances. The proposed model provides more diverse individual behaviors than existing emotion-based escape simulation models. The psychology-based individual modeling enables our model to be applied in various scenarios with slight parameter adjustments.
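
The contagion component can be illustrated with a plain SIRS-style cellular update of the kind CA-SIRS builds on: susceptible agents may catch panic from infected neighbors, infected agents recover, and recovered agents may become susceptible again. All rates, the grid layout and the neighborhood are assumptions, and the personality and behavior layers are omitted.

```python
# SIRS-style emotion contagion step on a grid of agents (illustrative sketch).
import numpy as np

S, I, R = 0, 1, 2
rng = np.random.default_rng(0)
state = np.full((30, 30), S)
state[15, 15] = I                          # one panicked agent

def step(state, beta=0.3, gamma=0.05, xi=0.01):
    nxt = state.copy()
    infected = (state == I)
    # count infected neighbors (von Neumann neighborhood, periodic for brevity)
    nbrs = sum(np.roll(infected, s, a) for s in (-1, 1) for a in (0, 1))
    # S -> I with probability 1 - (1-beta)^(#infected neighbors)
    catch = (state == S) & (rng.random(state.shape) < 1 - (1 - beta) ** nbrs)
    nxt[catch] = I
    nxt[infected & (rng.random(state.shape) < gamma)] = R   # I -> R
    nxt[(state == R) & (rng.random(state.shape) < xi)] = S  # R -> S
    return nxt

for _ in range(50):
    state = step(state)
print((state == I).mean())   # fraction of the crowd currently in panic
```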

Journal ArticleDOI
TL;DR: An interactive two-stage method for the segmentation of CBCT is presented, which produces accurate segmentation and is robust to changes in parameters; compared with two similar segmentation strategies, it produces more accurate segmentation.
Abstract: Cone beam computed tomography (CBCT) is a medical imaging technique employed for diagnosis and treatment of patients with cranio-maxillofacial deformities. CBCT 3D reconstruction and segmentation of bones such as the mandible or maxilla are essential procedures in surgical and orthodontic treatments. However, CBCT image processing may be impaired by features such as low contrast, inhomogeneity, noise and artifacts. Besides, values assigned to voxels are relative Hounsfield units, unlike traditional computed tomography (CT). Such drawbacks render CBCT segmentation a difficult and time-consuming task, usually performed manually with tools designed for medical image processing. We present an interactive two-stage method for the segmentation of CBCT: (i) we first perform an automatic segmentation of bone structures with super-voxels, allowing a compact graph representation of the 3D data; (ii) next, a user-placed seed process guides a graph partitioning algorithm, splitting the extracted bones into mandible and skull. We have evaluated our segmentation method in three different scenarios and compared the results with ground truth data of the mandible and the skull. Results show that our method produces accurate segmentation and is robust to changes in parameters. We also compared our method with two similar segmentation strategies and showed that it produces more accurate segmentation. Finally, we evaluated our method on CT data of patients with deformed or missing bones, and the segmentation was accurate for all data. The segmentation of a typical CBCT takes on average 5 min, which is faster than most techniques currently available.

Journal ArticleDOI
TL;DR: This approach is the first to apply a Gumbel-Softmax module in conditional Wasserstein GANs, as well as the first to explore the application of GAN-based models in the scene enhancement field.
Abstract: In this paper, we present stylistic scene enhancement GAN, SSE-GAN, a conditional Wasserstein GAN-based approach to automatic generation of mixed stylistic enhancements for 3D indoor scenes. An enhancement indicates factors that can influence the style of an indoor scene, such as furniture colors and the occurrence of small objects. To facilitate network training, we propose a novel enhancement feature encoding method, which represents an enhancement by a multi-one-hot vector and effectively accommodates different enhancement factors. A Gumbel-Softmax module is introduced in the generator network to enable the generation of high-fidelity enhancement features that can better confuse the discriminator. Experiments show that our approach is superior to the other baseline methods and successfully models the relationship between the style distribution and scene enhancements. Thus, although only trained with a dataset of room images in single styles, the trained generator can generate mixed stylistic enhancements by specifying multiple styles as the condition. Our approach is the first to apply a Gumbel-Softmax module in conditional Wasserstein GANs, as well as the first to explore the application of GAN-based models in the scene enhancement field.
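
The Gumbel-Softmax step itself is standard and compact. The sketch below uses PyTorch's built-in torch.nn.functional.gumbel_softmax to emit (near-)discrete multi-one-hot vectors while keeping the generator differentiable; the feature layout of 3 enhancement factors with 4 choices each is an assumption for illustration.

```python
# Differentiable (near-)discrete multi-one-hot sampling via Gumbel-Softmax.
import torch
import torch.nn.functional as F

logits = torch.randn(8, 3, 4, requires_grad=True)   # batch, factors, choices

# hard=True returns one-hot samples in the forward pass but uses the soft
# distribution in the backward pass (straight-through estimator).
sample = F.gumbel_softmax(logits, tau=0.5, hard=True, dim=-1)
multi_one_hot = sample.flatten(1)                   # (8, 12) enhancement vector

multi_one_hot.sum().backward()                      # gradients flow to logits
print(multi_one_hot[0], logits.grad is not None)
```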

Journal ArticleDOI
TL;DR: A novel saliency map segmentation strategy, named SSG, consists of superpixel region growing, superpixel Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering and iterated graph cuts, and is combined with saliency detection to extract salient objects.
Abstract: Saliency detection has recently become a popular topic in image processing. In this paper, we propose a simple, robust and fast salient object segmentation framework. First, we develop a novel saliency map segmentation strategy, named SSG, which consists of superpixel region growing, superpixel Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering and iterated graph cuts (GrabCut): DBSCAN clusters similar background regions as a whole, region growing groups similar regions together as much as possible, and GrabCut segments salient objects accurately. Then, the proposed SSG is combined with saliency detection to extract salient objects. Experimental results on three benchmark datasets demonstrate that the proposed method achieves more favorable performance than many recent state-of-the-art methods in terms of precision, recall, F-measure and execution time.
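
The three building blocks map to common library calls, as the minimal sketch below shows: SLIC superpixels, DBSCAN over mean superpixel colors, and GrabCut initialized from a thresholded saliency map. The saliency map here is a fake placeholder, the region-growing stage is omitted, and all parameters are assumptions.

```python
# Superpixels + DBSCAN grouping + GrabCut refinement (illustrative sketch).
import numpy as np
import cv2
from skimage.segmentation import slic
from sklearn.cluster import DBSCAN

img = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)
saliency = np.zeros((120, 160), np.float32)
saliency[40:80, 60:110] = 1.0                       # fake saliency map

# 1) superpixels and their mean colors
labels = slic(img, n_segments=200, compactness=10, start_label=0)
means = np.array([img[labels == k].mean(axis=0) for k in range(labels.max() + 1)])

# 2) DBSCAN groups similar superpixels (background regions cluster together)
groups = DBSCAN(eps=10.0, min_samples=2).fit_predict(means)

# 3) GrabCut refined from the thresholded saliency map
mask = np.where(saliency > 0.5, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, None, bgd, fgd, 3, cv2.GC_INIT_WITH_MASK)
segmented = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
print(groups.max() + 1, segmented.mean())
```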

Journal ArticleDOI
TL;DR: In this article, the authors propose a novel source term handling to improve the quality of the computed velocity field and accelerate the pressure computation in incompressible SPH (ISPH).
Abstract: Incompressible SPH (ISPH) is a promising concept for the pressure computation in SPH. It works with large timesteps and the underlying pressure Poisson equation (PPE) can be solved very efficiently. Still, various aspects of current ISPH formulations can be optimized. This paper discusses issues of the two standard source terms that are typically employed in PPEs, i.e., density invariance (DI) and velocity divergence (VD). We show that the DI source term suffers from significant artificial viscosity, while the VD source term suffers from particle disorder and volume loss. As a conclusion of these findings, we propose a novel source term handling. A first PPE is solved with the VD source term to compute a divergence-free velocity field with minimized artificial viscosity. To address the resulting volume error and particle disorder, a second PPE is solved to improve the sampling quality. The result of the second PPE is used for a particle shift (PS) only. The divergence-free velocity field—computed from the first PPE—is not changed, but only resampled at the updated particle positions. Thus, the proposed source term handling incorporates velocity divergence and particle shift (VD + PS). The proposed VD + PS variant does not only improve the quality of the computed velocity field, but also accelerates the performance of the ISPH pressure computation. This is illustrated for IISPH—a recent ISPH implementation—where a performance gain factor of 1.6 could be achieved.

Journal ArticleDOI
TL;DR: Evaluation of the proposed method on five publicly available facial kinship datasets shows its superiority over both state-of-the-art kinship verification methods and human decision-making.
Abstract: A recent challenge in computer vision is exploring the cardinality of relationships among multiple visual entities to answer questions like whether the subjects in a photograph have a kin relationship. This paper tackles kinship recognition from the aging viewpoint, in which the system can find the parent of a child even when the input image of the parent belongs to an age range lower than the child's. The technical contributions of this research are twofold. (1) An efficient discriminative feature space is constructed by proposing kernelized bi-directional PCA to form a topological cubic feature space. The cubic feature space, in conjunction with the introduced cubic norm, is used to solve the kinship problem. (2) To bridge the aging gap in finding a kin relation, a semi-supervised learning paradigm is proposed. To do this, first, the pooling layer of a convolutional neural network is modified to perform soft pooling. Then, the last pooling layer, as a rich feature vector, is fed into the density-based spatial clustering of applications with noise (DBSCAN) algorithm. This pre-classification phase is useful when there is no agreement on how many classes should be used in the age-group estimation task. Finally, by adding kernel computation to a sparse representation classifier, the age classification is done. Evaluation of the proposed method on five publicly available facial kinship datasets shows its superiority over both state-of-the-art kinship verification methods and human decision-making.
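
The non-kernel core of the feature construction, bi-directional 2DPCA, can be sketched directly: project each image matrix from both sides with the leading eigenvectors of the column- and row-covariance matrices. The gallery below is random and the subspace sizes are assumptions; the kernelization, cubic norm and CNN/DBSCAN stages are omitted.

```python
# Bi-directional 2DPCA: two-sided projection of image matrices.
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((100, 32, 32))                   # toy gallery of face images
mean = faces.mean(axis=0)
A = faces - mean

G_col = np.mean([a.T @ a for a in A], axis=0)       # column covariance
G_row = np.mean([a @ a.T for a in A], axis=0)       # row covariance

def top_eigvecs(G, k):
    w, v = np.linalg.eigh(G)
    return v[:, np.argsort(w)[::-1][:k]]            # k leading eigenvectors

X = top_eigvecs(G_col, 8)                           # right projection
Z = top_eigvecs(G_row, 8)                           # left projection

feat = Z.T @ (faces[0] - mean) @ X                  # 8x8 feature matrix
print(feat.shape)
```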

Journal ArticleDOI
TL;DR: A new focus + context distortion approach, termed PolarViz, manipulates the radial distribution of data points, deriving radial equalization to automatically spread out the frequency and radial specification to shape the distribution based on the user's requirements.
Abstract: Visual analytics tools are of paramount importance in handling high-dimensional datasets such as those in our turbine performance assessment. Conventional tools such as RadViz have been used in 2D exploratory data analysis. However, with the increase in dataset size and dimensionality, the clumping of projected data points toward the origin in RadViz causes low space utilization, which largely degrades the visibility of feature characteristics. In this study, to better evaluate the hidden patterns in the center region, we propose a new focus + context distortion approach, termed PolarViz, to manipulate the radial distribution of data points. We derive radial equalization to automatically spread out the frequency, and radial specification to shape the distribution based on the user's requirements. Computational experiments have been conducted on two datasets, a benchmark dataset and turbine performance simulation data. The performance of the proposed algorithm, as well as of other methods for solving the clumping problem in both data space and image space, is illustrated and compared, and the pros and cons are analyzed. Moreover, a user study was conducted to assess the performance of the proposed method.
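
Radial equalization has a natural reading as histogram equalization applied to point radii, which the sketch below implements: each radius is remapped through the empirical CDF of all radii so points spread uniformly from center to rim while angles are preserved. The clumped 2D input is a placeholder, and this is an interpretation of the stated idea rather than the paper's exact formulation.

```python
# Radial equalization of a clumped 2D projection (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(scale=0.15, size=(2000, 2))      # points clumped at the origin

r = np.linalg.norm(pts, axis=1)
theta = np.arctan2(pts[:, 1], pts[:, 0])

# empirical-CDF remap: the k-th smallest radius goes to (k+1)/n * r_max
ranks = np.argsort(np.argsort(r))
r_eq = (ranks + 1) / len(r) * r.max()

pts_eq = np.column_stack([r_eq * np.cos(theta), r_eq * np.sin(theta)])
print(pts_eq.shape)   # angles preserved, radii spread uniformly
```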

Journal ArticleDOI
TL;DR: The Visual Computer Journal is going well although, unfortunately, its impact factor has decreased from 1.468 to 1.036; one of the reasons might be that it received more regular submissions during 2018.
Abstract: The Visual Computer Journal is going well although, unfortunately, its impact factor has decreased from 1.468 to 1.036. It is difficult to know why the citations have dropped, but one of the reasons might be that we received more regular submissions during 2018. The Visual Computer received more than 800 submissions in 2018, 11.5% more than in 2017. This is a tremendous achievement by authors, associate editors and reviewers, and it may be more difficult to maintain the impact factor with so many papers. Each submitted paper is handled by an associate editor (AE) who finds at least 3 reviewers to evaluate it. I take the opportunity to warmly thank the associate editors as well as all reviewers listed at the end of the issue for their tremendous and invaluable work. Throughout 2018, some associate editors have left the editorial board of the Visual Computer. I would like to thank them for their great work and contribution. Here is the list of the associate editors who have left the editorial board:

Journal ArticleDOI
Long Chen, Ronggui Wang, Juan Yang, Lixia Xue, Min Hu
TL;DR: This paper proposes a novel multi-label image classification framework which is an improvement to the CNN–RNN design pattern and demonstrates that the model can effectively exploit the correlation between tags to improve the classification performance as well as better recognize the small targets.
Abstract: Recognizing multi-label images is a significant but challenging task toward high-level visual understanding. Remarkable success has been achieved by applying CNN–RNN design-based models to capture the underlying semantic dependencies of labels and predict the label distributions over the global-level features output by CNNs. However, such global-level features often fuse the information of multiple objects, leading to difficulty in recognizing small objects and capturing label correlations. To better solve this problem, in this paper, we propose a novel multi-label image classification framework which improves on the CNN–RNN design pattern. By introducing an attention network module into the CNN–RNN architecture, the object features of the attention map are separated by channel and further sent to the LSTM network to capture dependencies and predict labels sequentially. A category-wise max-pooling operation is then performed to integrate these labels into the final prediction. Experimental results on the PASCAL2007 and MS-COCO datasets demonstrate that our model can effectively exploit the correlations between tags to improve classification performance as well as better recognize small targets.
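
The final integration step is easy to pin down in code. The sketch below shows a category-wise max-pooling over per-step label scores, as one would place after the recurrent decoder; the shapes and decision threshold are illustrative assumptions.

```python
# Category-wise max-pooling over per-step label predictions.
import torch

steps, n_labels = 5, 20
scores = torch.randn(8, steps, n_labels)   # batch of per-step label scores

final = scores.max(dim=1).values           # per category: max over all steps
pred = torch.sigmoid(final) > 0.5          # multi-label decision
print(final.shape, pred.shape)             # torch.Size([8, 20]) each
```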

Journal ArticleDOI
TL;DR: Extensive experimental evaluations on two existing datasets, containing abundant colors and patterns, show that the proposed method outperforms the state-of-the-art methods quantitatively and qualitatively.
Abstract: This paper presents a novel semi-reference inspired color-to-gray conversion model for faithfully preserving the contrast details of the color image, which essentially differs from most no-reference and reference approaches. In the proposed model, on the basic assumption that a good gray conversion should maximize the conveyed gradient values (i.e., contrast), we present a projection maximum function to model the decolorization procedure. We further incorporate weights of the original gradients into the maximum function. A Gaussian weighted factor consisting of the gradients of each channel of the input color image is employed to better reflect the degree to which feature discriminability and color ordering are preserved in color-to-gray conversion. Projected gradient descent and discrete searching techniques are developed to solve the proposed model with and without the nonnegative constraint, respectively. Extensive experimental evaluations on two existing datasets, containing abundant colors and patterns, show that the proposed method outperforms state-of-the-art methods quantitatively and qualitatively.
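
The discrete-search flavor of such decolorization models can be sketched in a few lines: enumerate nonnegative channel weights summing to one on a coarse grid and keep the gray image whose gradients best preserve the color image's contrast. The energy below is a simple stand-in for the paper's weighted projection maximum function, and the step size and toy image are assumptions.

```python
# Discrete search over channel weights for contrast-preserving decolorization.
import itertools
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))

def grad_mag(x):
    gy, gx = np.gradient(x)
    return np.hypot(gx, gy)

target = sum(grad_mag(img[..., c]) for c in range(3)) / 3.0   # color contrast

best, best_e = None, np.inf
for wr, wg in itertools.product(np.arange(0, 1.01, 0.1), repeat=2):
    wb = 1.0 - wr - wg
    if wb < -1e-9:
        continue                                  # enforce nonnegative weights
    gray = wr * img[..., 0] + wg * img[..., 1] + wb * img[..., 2]
    e = np.abs(grad_mag(gray) - target).mean()    # contrast mismatch energy
    if e < best_e:
        best, best_e = (wr, wg, wb), e
print("best weights:", np.round(best, 2))
```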

Journal ArticleDOI
TL;DR: A novel shadow removal algorithm based on multi-scale image decomposition, which can recover the illumination for complex shadows with inconsistent illumination and different surface materials, is proposed.
Abstract: Shadow removal is a fundamental and challenging problem in image processing. Current approaches can only process shadows in simple scenes; for complex texture and illumination, their performance is less impressive. In this paper, we propose a novel shadow removal algorithm based on multi-scale image decomposition, which can recover the illumination for complex shadows with inconsistent illumination and different surface materials. Independent of shadow detection, our algorithm only requires a rough boundary distinguishing shadow regions from non-shadow regions. It first performs a multi-scale decomposition of the input image based on an illumination-sensitive smoothing process and then removes shadows in the basic layer using a local-to-global optimization strategy, which fuses all local shadow-free results in a global manner. Finally, we recover the texture details for the shadow-free basic layer and obtain the final shadow-free image. We validate the performance of the proposed method under various lighting and texture conditions and show consistent illumination between shadow and surrounding regions in the shadow removal results.
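
The decompose-correct-recombine pattern can be sketched minimally: split the image into a smooth base layer and a detail layer, brighten the base inside a given shadow mask, and add the details back. The mean-ratio gain below is a crude assumption standing in for the paper's local-to-global optimization, and the mask and image are synthetic.

```python
# Base/detail decomposition with base-layer shadow correction (sketch).
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
img = 0.6 + 0.1 * rng.random((100, 100))
mask = np.zeros((100, 100), bool)
mask[30:70, 30:70] = True
img[mask] *= 0.4                                  # synthetic shadow

base = gaussian_filter(img, sigma=5)              # smooth illumination layer
detail = img - base                               # texture details

gain = base[~mask].mean() / base[mask].mean()     # crude lit/shadow ratio
base_corr = base.copy()
base_corr[mask] *= gain                           # relight the shadowed base

result = np.clip(base_corr + detail, 0, 1)        # recombine with details
print(result[mask].mean(), result[~mask].mean())  # now similar brightness
```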

Journal ArticleDOI
TL;DR: A 3D color homography model is proposed which approximates a photo-realistic color transfer algorithm as a combination of a 3D perspective transform and a mean intensity mapping.
Abstract: Color transfer is an image editing process that naturally transfers the color theme of a source image to a target image. In this paper, we propose a 3D color homography model which approximates a photo-realistic color transfer algorithm as a combination of a 3D perspective transform and a mean intensity mapping. A key advantage of our approach is that the re-coded color transfer algorithm is simple and accurate. Our evaluation demonstrates that our 3D color homography model delivers leading color transfer re-coding performance. In addition, we show that our 3D color homography model can be applied to color transfer artifact fixing, complex color transfer acceleration, and color-robust image stitching.
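
The fitting step has a classic direct-linear-transform form: with homogeneous RGB 4-vectors, a 4x4 homography is estimated from matched color pairs by least squares and applied with a perspective division. The sketch below omits the paper's mean-intensity mapping stage, and the random color pairs are placeholders.

```python
# Fitting a 3D color homography between matched RGB pairs via the DLT.
import numpy as np

rng = np.random.default_rng(0)
src = rng.random((500, 3))                        # source image colors
H_true = np.eye(4) + 0.1 * rng.normal(size=(4, 4))

def apply_h(H, rgb):
    hom = np.hstack([rgb, np.ones((len(rgb), 1))]) @ H.T
    return hom[:, :3] / hom[:, 3:4]               # perspective division

dst = apply_h(H_true, src)                        # "transferred" colors

# Each color pair gives 3 homogeneous equations h_i.s - d_i (h_4.s) = 0
# in the 16 unknowns of H; the null vector of A is the estimate.
S_hom = np.hstack([src, np.ones((len(src), 1))])
rows = []
for s, d in zip(S_hom, dst):
    rows.append(np.concatenate([s, np.zeros(8), -d[0] * s]))
    rows.append(np.concatenate([np.zeros(4), s, np.zeros(4), -d[1] * s]))
    rows.append(np.concatenate([np.zeros(8), s, -d[2] * s]))
H_est = np.linalg.svd(np.asarray(rows))[2][-1].reshape(4, 4)

err = np.abs(apply_h(H_est, src) - dst).max()
print(f"max reconstruction error: {err:.2e}")
```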

Journal ArticleDOI
TL;DR: This paper addresses the example-based stylization of videos, extending the authors' patch-based "Split and Match" technique to video while ensuring that the solution is spatially and temporally consistent.
Abstract: This paper addresses the example-based stylization of videos. Style transfer aims at editing an image so that it matches the style of an example. This topic has recently been investigated intensively, both in industry and academia. The difficulty lies in how to capture the style of an image. For this work, we build on our previous work "Split and Match" for still pictures, based on adaptive patch synthesis. We address the issue of extending that technique to video, ensuring that the solution is spatially and temporally consistent. Results show that our video style transfer is visually plausible, while being very competitive in computation time and memory when compared to neural network approaches.