
Showing papers in "Eurasip Journal on Image and Video Processing in 2021"


Journal ArticleDOI
TL;DR: In this article, the state-of-the-art in the field of touchless 2D fingerprint recognition is summarized at each stage of the recognition process, and technical considerations and trade-offs of the presented methods are discussed along with open issues and challenges.
Abstract: Touchless fingerprint recognition represents a rapidly growing field of research which has been studied for more than a decade. Through a touchless acquisition process, many issues of touch-based systems are circumvented, e.g., the presence of latent fingerprints or distortions caused by pressing fingers on a sensor surface. However, touchless fingerprint recognition systems reveal new challenges. In particular, a reliable detection and focusing of a presented finger as well as an appropriate preprocessing of the acquired finger image represent the most crucial tasks. Also, further issues, e.g., interoperability between touchless and touch-based fingerprints or presentation attack detection, are currently investigated by different research groups. Many works have been proposed so far to put touchless fingerprint recognition into practice. Published approaches range from self-identification scenarios with commodity devices, e.g., smartphones, to high-performance on-the-move deployments paving the way for new fingerprint recognition application scenarios. This work summarizes the state-of-the-art in the field of touchless 2D fingerprint recognition at each stage of the recognition process. Additionally, technical considerations and trade-offs of the presented methods are discussed along with open issues and challenges. An overview of available research resources completes the work.

27 citations


Journal ArticleDOI
TL;DR: Li et al. adopted mathematical morphological operations with a disk-shaped structuring element, whose radius is computed by the minimum entropy-based stroke width transform (SWT), to estimate and compensate the document background, and performed Laplacian energy-based segmentation on the compensated document images.
Abstract: Binarization plays an important role in document analysis and recognition (DAR) systems. In this paper, we present our winning algorithm in the ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018), which is based on background estimation and energy minimization. First, we adopt mathematical morphological operations to estimate and compensate the document background. It uses a disk-shaped structuring element, whose radius is computed by the minimum entropy-based stroke width transform (SWT). Second, we perform Laplacian energy-based segmentation on the compensated document images. Finally, we implement post-processing to preserve text stroke connectivity and eliminate isolated noise. Experimental results indicate that the proposed method outperforms other state-of-the-art techniques on several publicly available benchmark datasets.
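The background-estimation step can be sketched as follows. This is a minimal plain-Python illustration, assuming a fixed radius and a square window in place of the disk-shaped element whose radius the paper derives from the minimum entropy-based SWT:

```python
def grey_close(img, r):
    """Grayscale closing (windowed max, then windowed min) with a
    (2r+1) x (2r+1) square window -- a simplification of the paper's
    disk-shaped structuring element."""
    h, w = len(img), len(img[0])

    def window(src, i, j, f):
        vals = [src[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))]
        return f(vals)

    dil = [[window(img, i, j, max) for j in range(w)] for i in range(h)]
    return [[window(dil, i, j, min) for j in range(w)] for i in range(h)]


def binarize(img, r=3, margin=20):
    """Estimate the background by closing (strokes darker than the page are
    filled in), then threshold the compensated image; margin is an assumed
    tuning value, not the paper's."""
    bg = grey_close(img, r)
    return [[1 if bg[i][j] - img[i][j] > margin else 0
             for j in range(len(img[0]))] for i in range(len(img))]
```

On a bright page with thin dark strokes, the closing removes the strokes, so the background-minus-image difference highlights exactly the text pixels.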

24 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a JPEG image steganography payload location method based on optimal estimation of cover co-frequency sub-images, which estimates the cover JPEG image based on the Markov model of co-frequency sub-images.
Abstract: The excellent cover estimation is very important to the payload location of JPEG image steganography. But it is still hard to exactly estimate the quantized DCT coefficients in cover JPEG image. Therefore, this paper proposes a JPEG image steganography payload location method based on optimal estimation of cover co-frequency sub-image, which estimates the cover JPEG image based on the Markov model of co-frequency sub-image. The proposed method combines the coefficients of the same position in each 8 × 8 block in the JPEG image to obtain 64 co-frequency sub-images and then uses the maximum a posterior (MAP) probability algorithm to find the optimal estimations of cover co-frequency sub-images by the Markov model. Then, the residual of each DCT coefficient is obtained by computing the absolute difference between it and the estimated cover version of it, and the average residual over coefficients in the same position of multiple stego images embedded along the same path is used to estimate the stego position. The experimental results show that the proposed payload location method can significantly improve the locating accuracy of the stego positions in low frequencies.
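The residual-averaging idea behind payload location can be sketched as follows. This is a minimal illustration that assumes the cover estimates are already available (the paper obtains them with the MAP algorithm over the Markov model of co-frequency sub-images):

```python
def locate_payload(stego, cover_est, top_k):
    """stego, cover_est: lists of equal-length coefficient sequences, one per
    image, all embedded along the same path. Positions with the largest mean
    absolute residual are flagged as likely stego positions."""
    n_img, n_pos = len(stego), len(stego[0])
    mean_res = [
        sum(abs(stego[i][j] - cover_est[i][j]) for i in range(n_img)) / n_img
        for j in range(n_pos)
    ]
    # Rank positions by descending average residual and keep the top_k.
    return sorted(range(n_pos), key=lambda j: -mean_res[j])[:top_k]
```

Averaging over many stego images is what makes the method work: embedding noise at the true payload positions accumulates, while estimation errors elsewhere average out.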

18 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel deep YOLO V3 approach to detect multiple objects, which looks at the entire frame during the training and test phases and follows a regression-based technique that uses a probabilistic model to locate objects.
Abstract: Computer vision is an interdisciplinary domain for object detection. Object detection plays a vital role in assisting surveillance, vehicle detection, and pose estimation. In this work, we propose a novel deep you only look once (deep YOLO V3) approach to detect multiple objects. This approach looks at the entire frame during the training and test phases. It follows a regression-based technique that uses a probabilistic model to locate objects. We construct 106 convolution layers followed by 2 fully connected layers, with an 812 × 812 × 3 input size, to detect drones of small size. We pre-train the convolution layers for classification at half the resolution and then double the resolution for detection. The number of filters in each layer is set to 16; the number of filters in the last scale layer is larger than 16 to improve small-object detection. This construction uses up-sampling techniques to rescale the features at specific locations, which helps detect small objects by effectively increasing the sampling rate. This YOLO architecture is preferred because it requires less memory and computation cost than architectures with more filters. The proposed system is designed and trained to recognize a single class, called drone, and object detection and tracking are performed with the embedded-system-based deep YOLO. The proposed YOLO approach predicts multiple bounding boxes per grid cell with better accuracy. The proposed model has been trained on a large number of small drones under different conditions, such as open fields and marine environments with complex backgrounds.
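Regression-based detectors such as YOLO score each predicted bounding box against ground truth with intersection-over-union (IoU); a minimal sketch of that standard computation:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

During training, the predicted box with the highest IoU against a ground-truth drone is the one made responsible for that object; at test time, IoU thresholds drive non-maximum suppression and the precision/recall evaluation.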

15 citations


Journal ArticleDOI
TL;DR: In this paper, an automated system is proposed for the identification and recognition of fruit diseases, which overcomes challenges such as convex edges, inconsistency between colors, irregularity, visibility, scale, and origin.
Abstract: Agriculture plays a critical role in the economy of several countries by providing the main sources of income, employment, and food to their rural populations. However, in recent years, it has been observed that plants and fruits are widely damaged by different diseases which cause huge losses to farmers, although these losses can be minimized by detecting plant diseases at an early stage using pattern recognition (PR) and machine learning (ML) techniques. In this article, an automated system is proposed for the identification and recognition of fruit diseases. Our approach is distinctive in that it overcomes challenges like convex edges, inconsistency between colors, irregularity, visibility, scale, and origin. The proposed approach incorporates five primary steps: preprocessing, disease identification through segmentation, feature extraction and fusion, feature selection, and classification. The infection regions are extracted using the proposed adaptive and quartile-deviation-based segmentation approach, and the resultant binary images are fused by employing the weighted coefficient of correlation (CoC). Then the most appropriate features are selected using a novel framework of entropy and rank-based correlation (EaRbC). Finally, the selected features are classified using a multi-class support vector machine (MC-SVM). The PlantVillage dataset is utilized for the evaluation of the proposed system, achieving average segmentation and classification accuracies of 93.74% and 97.7%, respectively. These statistical measures indicate that the proposed method outperforms existing methods with greater accuracy.
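The quartile-deviation idea behind the segmentation step can be illustrated as follows. This is a simplified single-channel sketch; the multiplier `k` and the plain quartile estimate are assumptions, not the paper's exact adaptive rule:

```python
def quartile_segment(values, k=1.5):
    """Flag values outside [Q1 - k*QD, Q3 + k*QD] as lesion candidates,
    where QD = (Q3 - Q1) / 2 is the quartile deviation."""
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # simple quartile estimates
    qd = (q3 - q1) / 2.0
    lo, hi = q1 - k * qd, q3 + k * qd
    return [v < lo or v > hi for v in values]
```

Pixels whose intensity falls outside the quartile band of the (mostly healthy) fruit surface are marked as candidate infection regions; the paper then fuses such binary maps with the weighted CoC.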

8 citations


Journal ArticleDOI
TL;DR: Both proposed plate detection and character recognition methods have significantly outperformed conventional approaches in terms of precision and recall for multiple plate recognition.
Abstract: Multiple-license plate recognition is gaining popularity in Intelligent Transport System (ITS) applications for security monitoring and surveillance. Advancements in acquisition devices have increased the availability of high-definition (HD) images, which can capture images of multiple vehicles. Since a license plate (LP) occupies a relatively small portion of an image, detection of an LP in an image is considered a challenging task. Moreover, the overall performance deteriorates when the aforementioned factor is combined with varying illumination conditions, such as night, dusk, and rain. As it is difficult to locate a small object in an entire image, this paper proposes a two-step approach for plate localization in challenging conditions. In the first step, the Faster Region-based Convolutional Neural Network algorithm (Faster R-CNN) is used to detect all the vehicles in an image, which results in scaled information to locate plates. In the second step, morphological operations are employed to reduce non-plate regions. Meanwhile, geometric properties are used to localize plates in the HSI color space. This approach increases accuracy and reduces processing time. For character recognition, a look-up table (LUT) classifier using adaptive boosting with the modified census transform (MCT) as a feature extractor is used. Both the proposed plate detection and character recognition methods have significantly outperformed conventional approaches in terms of precision and recall for multiple plate recognition.
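The geometric filtering in the second step can be sketched as follows; the aspect-ratio range and minimum area below are illustrative assumptions, not the paper's tuned values:

```python
def plate_candidates(regions, ar_range=(2.0, 6.0), min_area=400):
    """Keep candidate regions whose geometry looks plate-like.
    regions: list of (x, y, w, h) boxes produced by morphology inside
    a detected-vehicle crop."""
    keep = []
    for (x, y, w, h) in regions:
        ar = w / float(h)                       # plates are wide and short
        if ar_range[0] <= ar <= ar_range[1] and w * h >= min_area:
            keep.append((x, y, w, h))
    return keep
```

Because the filter runs only inside vehicle boxes from the first (Faster R-CNN) step, the size and ratio priors are much tighter than they could be on the full HD frame, which is what makes the two-step design fast and accurate.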

7 citations


Journal ArticleDOI
TL;DR: This work contends that SFP can be an effective approach for recognizing facial expressions under different head rotations, and proposes an algorithm, called profile salient facial patches (PSFP), to achieve this objective.
Abstract: Methods using salient facial patches (SFPs) play a significant role in research on facial expression recognition. However, most SFP methods use only frontal face images or videos for recognition, and they do not consider head position variations. We contend that SFP can be an effective approach for recognizing facial expressions under different head rotations. Accordingly, we propose an algorithm, called profile salient facial patches (PSFP), to achieve this objective. First, to detect facial landmarks and estimate head poses from profile face images, a tree-structured part model is used for pose-free landmark localization. Second, to obtain the salient facial patches from profile face images, the facial patches are selected using the detected facial landmarks while avoiding their overlap or the transcending of the actual face range. To analyze the PSFP recognition performance, three classical approaches for local feature extraction, specifically the histogram of oriented gradients (HOG), local binary pattern, and Gabor, were applied to extract profile facial expression features. Experimental results on the Radboud Faces Database show that PSFP with HOG features can achieve higher accuracies under most head rotations.

7 citations


Journal ArticleDOI
TL;DR: In this paper, a systematic literature review of fatigue driving monitoring technologies is presented, a basic framework of a fatigue driving monitoring system based on the electrooculogram (EOG) signal is summarized, and the advantages and disadvantages of existing technologies are discussed.
Abstract: Establishing a monitoring system that accurately identifies fatigued driving is one of the important guarantees of improving traffic safety and reducing traffic accidents. Among many research methods, the electrooculogram (EOG) signal has unique advantages. This paper presents a systematic literature review of these technologies and summarizes a basic framework of a fatigue driving monitoring system based on EOGs; the advantages and disadvantages of existing technologies are then summarized. In addition, 80 primary references published during the last decade were identified. The multi-feature fusion technique based on EOGs performs better than other traditional methods due to its low cost, low power consumption, and low intrusion, although its application is still limited and more effort is needed to obtain good, generalizable results. An overview of the literature on the technology is then given, providing an unbiased survey of the existing empirical research on classification techniques that have been applied to fatigue driving analysis. Finally, this paper adds value to the current literature by investigating the application of EOG signals in fatigued driving and the design of related systems; guidelines are provided to help practitioners and researchers grasp the major contributions and challenges in state-of-the-art research.

6 citations


Journal ArticleDOI
TL;DR: In this paper, a steganographic visual story generation model is proposed that enables users to automatically post stego status on social media without any direct user intervention, using mutual-perceived joint attention (MPJA) to maintain the imperceptibility of the stego text.
Abstract: Social media plays an increasingly important role in providing information and social support to users. Due to the easy dissemination of content, as well as the difficulty of tracking on the social network, we are motivated to study the way of concealing sensitive messages in this channel with high confidentiality. In this paper, we design a steganographic visual stories generation model that enables users to automatically post stego status on social media without any direct user intervention, and we use the mutual-perceived joint attention (MPJA) to maintain the imperceptibility of the stego text. We demonstrate our approach on the visual storytelling (VIST) dataset and show that it yields high-quality steganographic texts. Since the proposed work realizes steganography by auto-generating visual stories using deep learning, it enables us to move steganography to real-world online social networks with intelligent steganographic bots.

6 citations


Journal ArticleDOI
TL;DR: In this article, a hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is created from the hand silhouette to obtain a reliable and representative description of individual gestures, and an ensemble of one-vs-all support vector machines is independently trained on each of these learned feature representations to perform gesture classification.
Abstract: Robust vision-based hand pose estimation is highly sought but still remains a challenging task, due to its inherent difficulty partially caused by self-occlusion among hand fingers. In this paper, an innovative framework for real-time static hand gesture recognition is introduced, based on an optimized shape representation built from multiple shape cues. The framework incorporates a specific module for hand pose estimation based on depth map data, where the hand silhouette is first extracted from the extremely detailed and accurate depth map captured by a time-of-flight (ToF) depth sensor. A hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is created from the hand silhouette to obtain a reliable and representative description of individual gestures. Finally, an ensemble of one-vs.-all support vector machines (SVMs) is independently trained on each of these learned feature representations to perform gesture classification. When evaluated on a publicly available dataset incorporating a relatively large and diverse collection of egocentric hand gestures, the approach yields encouraging results that agree very favorably with those reported in the literature, while maintaining real-time operation.

5 citations


Journal ArticleDOI
TL;DR: Modified exploiting modification direction (EMD)-coded PU partition mode-based steganography is proposed in this paper, which can hide a secret digit in a (2n + x − 1)-ary notational system in a pair of PU partition modes, thus enlarging the capacity.
Abstract: As High Efficiency Video Coding (HEVC) is a worldwide popular video coding standard, the steganography of HEVC videos has gained more and more attention. Prediction unit (PU) is one of the most important innovative modules of HEVC; thus, PU partition mode-based steganography is becoming a novel branch of HEVC steganography. However, the embedding capacity of this kind of steganography is limited by the types of PU partition modes. To solve the problem, modified exploiting modification direction (EMD)-coded PU partition mode-based steganography is proposed in this paper, which can hide a secret digit in a (2n + x − 1)-ary notational system in a pair of PU partition modes and thus enlarging the capacity. Furthermore, two mapping patterns for PU partition modes are analyzed, and the one that performs the better is selected as the final mapping pattern. Firstly, 8 × 8- and 16 × 16-sized PU partition modes are recorded according to the optimal mapping pattern in the video encoding process. Then, PU partition modes are modified by using the proposed method to satisfy the requirement of secret information. Finally, the stego video can be obtained by re-encoding the video with the modified PU partition modes. Experimental results show that the embedding capacity can be significantly enlarged, and compared with the state-of-the-art work, the proposed method has much larger capacity while keeping high visual quality.
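The classic EMD scheme underlying the method can be sketched as follows: a group of n cover values hides one digit of a (2n + 1)-ary system by changing at most one value by ±1. The paper's variant instead operates on pairs of PU partition modes in a (2n + x − 1)-ary system; the sketch below is the standard scheme, not the paper's modification:

```python
def emd_extract(g, base):
    """Extraction function of classic EMD: f(g) = (sum (i+1)*g_i) mod base."""
    return sum((i + 1) * v for i, v in enumerate(g)) % base


def emd_embed(g, d, base):
    """Embed digit d (0 <= d < base) into group g by changing at most one
    element by +-1; classic EMD uses base = 2n + 1 for n elements."""
    g = list(g)
    diff = (d - emd_extract(g, base)) % base
    if diff == 0:
        return g                      # already carries the digit
    n = len(g)
    if diff <= n:
        g[diff - 1] += 1              # adding 1 to g_i raises f by (i+1)
    else:
        g[base - 1 - diff] -= 1       # subtracting 1 lowers f by (i+1)
    return g
```

For a pair (n = 2), one ±1 change to either element encodes any digit in base 5, which is the capacity gain over embedding a single bit per cover element.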

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a novel framework named weighted long short-term memory network (WLSTM) with saliency-aware motion enhancement (SME) for video activity prediction, where a boundary-prior based motion segmentation method is introduced to use shortest geodesic distance in an undirected weighted graph.
Abstract: In recent years, great progress has been made in recognizing human activities in complete image sequences. However, predicting human activity earlier in a video is still a challenging task. In this paper, a novel framework named weighted long short-term memory network (WLSTM) with saliency-aware motion enhancement (SME) is proposed for video activity prediction. First, a boundary-prior-based motion segmentation method is introduced, which uses the shortest geodesic distance in an undirected weighted graph. Next, a dynamic contrast segmentation strategy is proposed to segment the moving object in a complex environment. Then, the SME is constructed to enhance the moving object by suppressing irrelevant background in each frame. Moreover, an effective long-range attention mechanism is designed to further deal with the long-term dependency of complex non-periodic activities by automatically focusing more on the semantically critical frames instead of processing all sampled frames equally. Thus, the learned weights can highlight the discriminative frames and reduce the temporal redundancy. Finally, we evaluate our framework on the UT-Interaction and sub-JHMDB datasets. The experimental results show that WLSTM with SME statistically outperforms a number of state-of-the-art methods on both datasets.

Journal ArticleDOI
TL;DR: A fast ISP coding mode optimization algorithm based on CU texture complexity is proposed, which aims to determine in advance whether a CU needs to use the ISP coding mode by calculating CU texture complexity, so as to reduce the computational complexity of ISP.
Abstract: In the recently published video coding standard Versatile Video Coding (VVC/H.266), the intra sub-partitions (ISP) coding mode is introduced. It is efficient for frames with rich texture, but less efficient for frames that are very flat or constant. In this paper, by comparing and analyzing the rate-distortion cost (RD-cost) of coding units (CUs) with different texture features when using and not using the ISP (No-ISP) coding mode, it is observed that CUs with simple texture can skip the ISP coding mode. Based on this observation, a fast ISP coding mode optimization algorithm based on CU texture complexity is proposed, which determines in advance whether a CU needs to use the ISP coding mode by calculating its texture complexity, so as to reduce the computational complexity of ISP. The experimental results show that under the All Intra (AI) configuration, the coding time can be reduced by 7%, while the BD-rate only increases by 0.09%.
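The skip decision can be sketched with a simple gradient-based texture measure; both the measure and the threshold below are assumptions for illustration, not the paper's exact definition:

```python
def texture_complexity(cu):
    """Mean absolute horizontal and vertical gradient of a 2-D block of
    luma samples (a cheap proxy for texture richness)."""
    h, w = len(cu), len(cu[0])
    gx = sum(abs(cu[i][j + 1] - cu[i][j]) for i in range(h) for j in range(w - 1))
    gy = sum(abs(cu[i + 1][j] - cu[i][j]) for i in range(h - 1) for j in range(w))
    return (gx + gy) / (h * (w - 1) + (h - 1) * w)


def skip_isp(cu, threshold=4.0):
    """Skip the ISP RD search for flat CUs (threshold is an assumed value)."""
    return texture_complexity(cu) < threshold
```

A flat CU yields near-zero gradients and skips the ISP mode evaluation entirely, which is where the reported encoding-time saving comes from; textured CUs still go through the full RD-cost comparison.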

Journal ArticleDOI
TL;DR: Experimental results obtained using DL models with ResNet feature extractors, and multiple benchmark re-identification datasets, indicate that pruning can considerably reduce network complexity while maintaining a high level of accuracy.
Abstract: Recent years have witnessed a substantial increase in the deep learning (DL) architectures proposed for visual recognition tasks like person re-identification, where individuals must be recognized over multiple distributed cameras. Although these architectures have greatly improved the state-of-the-art accuracy, the computational complexity of the convolutional neural networks (CNNs) commonly used for feature extraction remains an issue, hindering their deployment on platforms with limited resources, or in applications with real-time constraints. There is an obvious advantage to accelerating and compressing DL models without significantly decreasing their accuracy. However, the source (pruning) domain differs from operational (target) domains, and the domain shift between image data captured with different non-overlapping camera viewpoints leads to lower recognition accuracy. In this paper, we investigate the prunability of these architectures under different design scenarios. This paper first revisits pruning techniques that are suitable for reducing the computational complexity of deep CNN networks applied to person re-identification. Then, these techniques are analyzed according to their pruning criteria and strategy, and according to different scenarios for exploiting pruning methods to fine-tune networks to target domains. Experimental results obtained using DL models with ResNet feature extractors, and multiple benchmark re-identification datasets, indicate that pruning can considerably reduce network complexity while maintaining a high level of accuracy. In scenarios where pruning is performed with large pretraining or fine-tuning datasets, the number of FLOPS required by ResNet architectures is reduced by half, while maintaining a comparable rank-1 accuracy (within 1% of the original model). Pruning while training a larger CNN can also provide significantly better performance than fine-tuning smaller ones.
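Magnitude-based filter pruning, one of the criteria such analyses revisit, can be sketched as follows; the L1 criterion and the `keep_ratio` parameter are illustrative assumptions rather than the paper's specific configuration:

```python
def prune_filters(filter_norms, keep_ratio):
    """Rank convolutional filters by their (precomputed) L1 norms and keep
    the top fraction; returns the sorted indices of the surviving filters.
    Pruned filters' output channels are then removed from the next layer."""
    order = sorted(range(len(filter_norms)), key=lambda i: -filter_norms[i])
    n_keep = max(1, round(keep_ratio * len(filter_norms)))
    return sorted(order[:n_keep])
```

Keeping half the filters roughly halves the layer's FLOPs, which mirrors the paper's observation that ResNet FLOPs can be cut in half with rank-1 accuracy within about 1% of the original model when enough pretraining or fine-tuning data is available.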

Journal ArticleDOI
TL;DR: In this paper, an intelligent video surveillance system based on embedded modules is proposed for intruder detection based on information learning, fire detection based on color and motion information, and detection of loitering and falls based on human body motion.
Abstract: With conventional surveillance systems for preventing accidents and incidents, 95% of events are not identified after 22 minutes when one person monitors multiple closed-circuit televisions (CCTVs). To address this issue, computer-based intelligent video surveillance systems that notify users of abnormal situations as they happen have been studied, but they are not commonly used in real environments because of the risk of personal information leaks and high power consumption. To address this, intelligent video surveillance systems based on small devices have been studied. This paper presents an intelligent video surveillance system based on embedded modules for intruder detection based on information learning, fire detection based on color and motion information, and loitering and fall detection based on human body motion. Moreover, an algorithm and an embedded module optimization method are applied for real-time processing. The implemented algorithm showed performance of 88.51% for intruder detection, 92.63% for fire detection, 80% for loitering detection, and 93.54% for fall detection. A comparison of the algorithm processing time before and after optimization showed a 50.53% decrease, implying that real-time operation of the intelligent video surveillance system based on embedded modules is feasible.

Journal ArticleDOI
TL;DR: In this paper, a new set of quaternion fractional-order generalized Laguerre orthogonal moments (QFr-GLMs) is proposed to extract both the global and local color features.
Abstract: Inspired by quaternion algebra and the idea of fractional-order transformation, we propose a new set of quaternion fractional-order generalized Laguerre orthogonal moments (QFr-GLMs) based on fractional-order generalized Laguerre polynomials. Firstly, the proposed QFr-GLMs are directly constructed in Cartesian coordinate space, avoiding the need for conversion between Cartesian and polar coordinates; therefore, they are better image descriptors than circularly orthogonal moments constructed in polar coordinates. Moreover, unlike the latest Zernike moments based on quaternion and fractional-order transformations, which extract only the global features from color images, our proposed QFr-GLMs can extract both the global and local color features. This paper also derives a new set of invariant color-image descriptors by QFr-GLMs, enabling geometric-invariant pattern recognition in color images. Finally, the performances of our proposed QFr-GLMs and moment invariants were evaluated in simulation experiments of correlated color images. Both theoretical analysis and experimental results demonstrate the value of the proposed QFr-GLMs and their geometric invariants in the representation and recognition of color images.
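The generalized Laguerre polynomials on which the moments are built satisfy the standard three-term recurrence, which can be evaluated directly:

```python
def gen_laguerre(n, alpha, x):
    """Generalized Laguerre polynomial L_n^{(alpha)}(x) via the standard
    three-term recurrence:
        k * L_k = (2k - 1 + alpha - x) * L_{k-1} - (k - 1 + alpha) * L_{k-2}
    with L_0 = 1 and L_1 = 1 + alpha - x."""
    if n == 0:
        return 1.0
    if n == 1:
        return 1.0 + alpha - x
    lm2, lm1 = 1.0, 1.0 + alpha - x
    for k in range(2, n + 1):
        lk = ((2 * k - 1 + alpha - x) * lm1 - (k - 1 + alpha) * lm2) / k
        lm2, lm1 = lm1, lk
    return lm1
```

The moment basis then weights these polynomials (with a fractional-order argument substitution in the paper's construction) directly on the Cartesian grid, which is what avoids the Cartesian-to-polar conversion needed by circularly orthogonal moments.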

Journal ArticleDOI
TL;DR: In this article, the parametric model of the matrix of discrete cosine transform (DCT), and using an exhaustive search of the parameters' space, seek for the best approximations of 8-point DCT at the given computational complexities by taking into account three different scenarios of practical usage.
Abstract: In this paper, based on the parametric model of the matrix of the discrete cosine transform (DCT), and using an exhaustive search of the parameters' space, we seek the best approximations of the 8-point DCT at given computational complexities, taking into account three different scenarios of practical usage. The possible parameter values are selected in such a way that the resulting transforms are multiplierless approximations, i.e., only additions and bit-shift operations are required. The considered usage scenarios include cases where the approximation of the DCT is used: (i) at the data compression stage, (ii) at the decompression stage, and (iii) both at the compression and decompression stages. The effectiveness of the generated approximations is compared with that of popular known approximations of the 8-point DCT of the same class (i.e., multiplierless approximations). In addition, we perform a series of experiments in lossy compression of natural images using the popular JPEG standard. The obtained results are presented and discussed. It should be noted that in the overwhelming number of cases the generated approximations are better than the known ones, e.g., in asymmetric scenarios even by more than 3 dB starting from an entropy of 2 bits per pixel. In the last part of the paper, we investigate the possibility of hardware implementation of the generated approximations in Field-Programmable Gate Array (FPGA) circuits. The results in the form of resource and energy consumption are presented and commented on. The experiment outcomes confirm the assumption that the considered class of transformations is characterized by low resource utilization.
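One well-known member of the multiplierless class is the signed DCT, which replaces each DCT-II entry by its sign so the forward transform needs only additions and subtractions. The sketch below illustrates that class; it is not one of the paper's generated approximations:

```python
import math

def signed_dct_matrix(N=8):
    """Signed DCT: take the sign of each DCT-II basis entry, giving a
    transform matrix with entries in {-1, 0, +1} (no multiplications)."""
    T = []
    for k in range(N):
        row = []
        for n in range(N):
            c = math.cos((2 * n + 1) * k * math.pi / (2 * N))
            row.append(0 if abs(c) < 1e-9 else (1 if c > 0 else -1))
        T.append(row)
    return T


def approx_dct(x):
    """Apply the multiplierless approximation: each output is a signed sum."""
    T = signed_dct_matrix(len(x))
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]
```

Like the paper's transforms, this one concentrates a constant signal entirely into the DC coefficient while every other output is a balanced sum of additions and subtractions; the diagonal normalization is absorbed into the quantization stage in codec use.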

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new methodology for HEp-2 segmentation from IIF images by maximum modified quantum entropy, using a new criterion with a flexible representation of quantum images (FRQI).
Abstract: Autoimmune disorders such as rheumatoid arthritis and scleroderma are connective tissue diseases (CTDs). Autoimmune diseases are generally diagnosed using the antinuclear antibody (ANA) blood test. This test uses indirect immunofluorescence (IIF) image analysis to detect the presence of antibodies in the blood, which are responsible for CTDs. Typically, human epithelial type 2 (HEp-2) cells are utilized as the substrate for the microscope slides. The various fluorescence antibody patterns on HEp-2 cells permit differential diagnosis. The segmentation of HEp-2 cells in IIF images is therefore a crucial step in the ANA test. However, not only is this task extremely challenging, but physicians also often have a considerable number of IIF images to examine. In this study, we propose a new methodology for HEp-2 segmentation from IIF images by maximum modified quantum entropy. Besides, we use a new criterion with a flexible representation of quantum images (FRQI). The proposed methodology determines the optimum threshold based on the quantum entropy measure, by maximizing the measure of class separability for the obtained classes over all the gray levels. We tested the suggested algorithm on all images of the MIVIA HEp-2 image dataset. To objectively assess the proposed methodology, segmentation accuracy (SA), Jaccard similarity (JS), the F1-measure, the Matthews correlation coefficient (MCC), and the peak signal-to-noise ratio (PSNR) were used to evaluate performance. We compared the proposed methodology with the quantum entropy, Kapur, and Otsu algorithms. The results show that the proposed algorithm is better than the quantum entropy and Kapur methods. In addition, it overcomes the limitations of the Otsu method on images which have positively skewed histograms. This study can contribute to creating a computer-aided decision (CAD) framework for the diagnosis of immune system diseases.
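For context, the classical Kapur method used as a baseline above picks the threshold that maximizes the sum of the two class entropies over all gray levels; a minimal sketch:

```python
import math

def kapur_threshold(hist):
    """Kapur's maximum-entropy threshold for a 256-bin grayscale histogram:
    maximize H(background) + H(foreground) over all candidate thresholds."""
    total = sum(hist)
    p = [h / total for h in hist]
    best_t, best_h = 0, float("-inf")
    for t in range(1, 256):
        w0 = sum(p[:t])
        w1 = 1.0 - w0
        if w0 <= 0 or w1 <= 0:
            continue                      # one class would be empty
        h0 = -sum(pi / w0 * math.log(pi / w0) for pi in p[:t] if pi > 0)
        h1 = -sum(pi / w1 * math.log(pi / w1) for pi in p[t:] if pi > 0)
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t
```

The proposed method keeps this exhaustive search over gray levels but replaces the Shannon class entropies with a modified quantum entropy measure built on the FRQI representation.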

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed an effective network for pulmonary nodule segmentation and classification at one time based on adversarial training scheme, which consists of a High-Resolution network with Multi-scale Progressive Fusion (HR-MPF) and a proposed Progressive Decoding Module (PDM) recovering final pixel-wise prediction results.
Abstract: Accurate segmentation and classification of pulmonary nodules are of great significance to early detection and diagnosis of lung diseases, which can reduce the risk of developing lung cancer and improve patient survival rate. In this paper, we propose an effective network for pulmonary nodule segmentation and classification at one time based on adversarial training scheme. The segmentation network consists of a High-Resolution network with Multi-scale Progressive Fusion (HR-MPF) and a proposed Progressive Decoding Module (PDM) recovering final pixel-wise prediction results. Specifically, the proposed HR-MPF firstly incorporates boosted module to High-Resolution Network (HRNet) in a progressive feature fusion manner. In this case, feature communication is augmented among all levels in this high-resolution network. Then, downstream classification module would identify benign and malignant pulmonary nodules based on feature map from PDM. In the adversarial training scheme, a discriminator is set to optimize HR-MPF and PDM through back propagation. Meanwhile, a reasonably designed multi-task loss function optimizes performance of segmentation and classification overall. To improve the accuracy of boundary prediction crucial to nodule segmentation, a boundary consistency constraint is designed and incorporated in the segmentation loss function. Experiments on publicly available LUNA16 dataset show that the framework outperforms relevant advanced methods in quantitative evaluation and visual perception.
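A common ingredient of such segmentation loss functions is the soft Dice term; a minimal sketch (the paper's full multi-task, adversarial, and boundary-consistency losses are not reproduced here):

```python
def dice_coeff(pred, target, eps=1e-6):
    """Soft Dice coefficient between a predicted probability map and a
    binary mask, both flattened to 1-D; 1 - Dice is a standard segmentation
    loss term. eps guards against empty masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)
```

Because Dice normalizes overlap by total mask size, it stays informative for small structures like nodules, where plain pixel-wise cross-entropy is dominated by the background class.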

Journal ArticleDOI
TL;DR: This paper used photo-realistic, synthesized facial images with varying parameters and corresponding ground-truth landmarks to enable comparison of alignment and landmark detection techniques relative to general performance, performance across focal length, and performance across viewing angle.
Abstract: Recent attention to facial alignment and landmark detection methods, particularly those applying deep convolutional neural networks, has yielded notable improvements. However, neither these neural-network methods nor more traditional ones have been tested directly for performance differences due to camera-lens focal length or camera viewing angle of subjects systematically across the viewing hemisphere. This work uses photo-realistic, synthesized facial images with varying parameters and corresponding ground-truth landmarks to enable comparison of alignment and landmark detection techniques with respect to general performance, performance across focal length, and performance across viewing angle. Recently published high-performing methods are compared with traditional techniques in these respects.

Journal ArticleDOI
TL;DR: In this article, a convolutional autoencoder is used to scale down the input dimension and typify image features with high exactness, which can preserve high spatial details and high spectral characteristics simultaneously.
Abstract: In this paper, we propose a pansharpening method based on a convolutional autoencoder. The convolutional autoencoder is a type of convolutional neural network (CNN) whose objective is to reduce the input dimension and characterize image features with high accuracy. First, the autoencoder network is trained to reduce the difference between degraded panchromatic image patches and the reconstructed original panchromatic image patches. The intensity component, obtained by the adaptive intensity-hue-saturation (AIHS) method, is then fed into the trained convolutional autoencoder network to generate an enhanced intensity component of the multi-spectral image. The pansharpening is accomplished by refining the panchromatic image with the enhanced intensity component using a multi-scale guided filter; the semantic detail is then injected into the upsampled multi-spectral image. Real and degraded datasets are used for the experiments, which show that the proposed technique preserves high spatial detail and high spectral characteristics simultaneously. Furthermore, the experimental results demonstrate that the proposed method achieves state-of-the-art results in terms of both subjective and objective assessments on remote sensing data.
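The intensity-substitution idea that the AIHS step builds on can be sketched as follows. This is a minimal classical IHS-style baseline with uniform band weights; AIHS estimates the weights adaptively, and the paper additionally enhances the intensity component with the trained autoencoder and injects detail through a multi-scale guided filter rather than directly.

```python
import numpy as np

def ihs_pansharpen(ms_up, pan, weights=None):
    """Intensity-substitution pansharpening sketch.
    ms_up  : (H, W, B) multispectral image upsampled to the PAN grid.
    pan    : (H, W) panchromatic image.
    weights: per-band weights forming the intensity component
             (uniform here; AIHS estimates them adaptively)."""
    bands = ms_up.shape[2]
    if weights is None:
        weights = np.full(bands, 1.0 / bands)
    intensity = np.tensordot(ms_up, weights, axes=([2], [0]))
    detail = pan - intensity            # spatial detail to inject
    return ms_up + detail[..., None]    # inject detail into every band
```

If the panchromatic image exactly equals the intensity component, no detail is injected and the multispectral input passes through unchanged.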

Journal ArticleDOI
TL;DR: The assistance system proposed in this article can provide visually effective quantitative data, assist the law in preventing malicious image plagiarism by unfair competitors, and reduce plagiarism caused by later trademark designers' similar design concepts.
Abstract: Trademarks are common graphic signs in human society, used to distinguish marks of representative significance such as individuals, organizations, countries, and groups. Used effectively, these graphic signs can bring maintenance and development resources and profits to the owner, and organizations that obtain such resources can further promote national and social progress. However, the benefits of these resources have also attracted unfair competitors, who imitate trademarks to divert resources from the original trademark. To prevent such acts, states have enacted laws to protect trademarks, and past research on similar-trademark search has assisted in trademark protection. Although original trademarks are protected by national laws, unfair competitors have recently exploited psychological principles to counterfeit them. Trademarks counterfeited in this way confuse consumers yet do not constitute infringement under the law, so the original trademark remains poorly protected. To effectively counter such psychologically informed counterfeiting, this article proposes new features based on trademark design and Gestalt psychology to assist legal judgment. These features correspond to parts of the human visual process that are not fully understood and quantify them. In the experiments, we analyzed the proposed assistance system using past court cases. Discussion based on past judgments shows that the system's quantitative results align with the plaintiff's claims or the court's stated reasons for finding plagiarism. This result shows that the proposed assistance system can provide visually effective quantitative data, assist the law in preventing malicious image plagiarism by unfair competitors, and reduce plagiarism caused by later trademark designers' similar design concepts.

Journal ArticleDOI
TL;DR: In this article, an Improved Deep Recursive Residual Network (IDRRN) super-resolution model is proposed to decrease the difficulty of network training. The deep recursive structure is configured to control the number of model parameters while increasing the network depth. At the same time, short-path recursive connections are used to alleviate gradient disappearance and enhance feature propagation.
Abstract: Single-frame image super-resolution (SISR) technology in remote sensing is improving fast from a performance point of view. Deep learning methods have been widely used in SISR to improve the details of rebuilt images and speed up network training. However, these supervised techniques usually tend to overfit quickly due to model complexity and the lack of training data. In this paper, an Improved Deep Recursive Residual Network (IDRRN) super-resolution model is proposed to decrease the difficulty of network training. The deep recursive structure is configured to control the number of model parameters while increasing the network depth. At the same time, short-path recursive connections are used to alleviate gradient disappearance and enhance feature propagation. Comprehensive experiments show that IDRRN achieves better results in both quantitative evaluation and visual perception.
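The parameter-sharing idea behind the deep recursive structure can be shown in toy form. Here `w` is a hypothetical scalar standing in for a full convolutional mapping; the point is that the same weights are reused at every recursion, so effective depth grows without adding parameters, while the identity shortcut keeps a short path that eases gradient flow.

```python
import numpy as np

def recursive_residual(x, w, recursions=4):
    """Toy recursive residual unit: the same weight `w` is applied at
    every recursion (parameter sharing), and the identity shortcut
    `x +` keeps a short path for gradients and feature propagation."""
    y = x
    for _ in range(recursions):
        y = x + np.tanh(w * y)  # shared-weight residual mapping
    return y
```

Increasing `recursions` deepens the computation without introducing any new parameters, which is the mechanism IDRRN uses to keep the parameter count under control.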

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a real-time target detection and tracking algorithm for embedded systems that combines the object detection model of single-shot multibox detection in deep convolution networks and the kernel correlation filters tracking algorithm using field-programmable gate arrays.
Abstract: With the increasing application of computer vision technology in autonomous driving, robotics, and other mobile devices, more and more attention has been paid to the implementation of target detection and tracking algorithms on embedded platforms. The real-time performance and robustness of such algorithms are two hot research topics and challenges in this field. To address the poor real-time tracking performance of embedded systems using convolutional neural networks and the low robustness of tracking algorithms in complex scenes, this paper proposes a fast and accurate real-time video detection and tracking algorithm suitable for embedded systems. The algorithm combines the single-shot multibox detection (SSD) object detection model in deep convolutional networks with the kernelized correlation filters (KCF) tracking algorithm; moreover, it accelerates the SSD model using field-programmable gate arrays, which satisfies the real-time requirements of the algorithm on the embedded platform. To solve the problem of model contamination after the KCF algorithm fails to track in complex scenes, an improved validity-detection mechanism for tracking results is proposed, which overcomes the traditional KCF algorithm's inability to track robustly over long periods. To address the high miss rate of the SSD model under motion blur or illumination variation, a strategy is proposed that effectively reduces missed detections. The experimental results on the embedded platform show that the algorithm can track the object in the video in real time and can automatically reposition the object to continue tracking after tracking fails.
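The validity-detection mechanism for tracking results is the paper's own improvement and is not detailed in the abstract. A common criterion for judging whether a correlation-filter track is still reliable, given here as an assumed example, is the peak-to-sidelobe ratio (PSR) of the response map; a low PSR would trigger re-detection with the detector.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """Peak-to-sidelobe ratio of a correlation-filter response map:
    the peak value compared against the mean and spread of the
    response outside a window around the peak."""
    peak_idx = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_idx]
    mask = np.ones_like(response, dtype=bool)
    r0 = max(peak_idx[0] - exclude, 0)
    c0 = max(peak_idx[1] - exclude, 0)
    mask[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def track_is_valid(response, threshold=5.0):
    """A low PSR (assumed threshold) signals an unreliable track."""
    return peak_to_sidelobe_ratio(response) > threshold
```

A sharp, isolated response peak yields a high PSR; a flat or noisy response map yields a low one, at which point control would hand back to the detector.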

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors developed an app that runs on the sellers' mobile phones and provides the operating environment and automated assistance capabilities of social network applications; the app can collect social information published by the sellers during the assistance process.
Abstract: Social e-commerce has been a hot topic in recent years, with the number of users increasing year by year and transaction volume exploding. Unlike traditional e-commerce, the main activities of social e-commerce take place on social network apps. To classify sellers by their merchandise, this article designs and implements a social network seller classification scheme. We develop an app that runs on the sellers' mobile phones and provides the operating environment and automated assistance capabilities of social network applications. The app collects social information published by the sellers during the assistance process and uploads it to the server for model training. We collected information on 38,970 sellers, extracted the text in the pictures with the help of OCR, and established a BERT-based deep learning model to classify the sellers' merchandise. In the final experiment, we achieve an accuracy of more than 90%, which shows that the model can accurately classify sellers on a social network.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a framework to solve the problem by training a stacked generative adversarial network with attention guidance, which can efficiently create a high-resolution, realistic-looking composite.
Abstract: Perfect image compositing can effectively harmonize the appearance of the foreground and background so that the composite result looks seamless and natural. However, traditional convolutional neural network (CNN)-based methods often fail to yield highly realistic composite results because they depend too heavily on scene parsing while ignoring the semantic and structural coherence between the foreground and background. In this paper, we propose a framework that solves this problem by training a stacked generative adversarial network with attention guidance, which can efficiently create a high-resolution, realistic-looking composite. To this end, we develop a diverse adversarial loss in addition to perceptual and guidance losses to train the proposed generative network. Moreover, we construct a multi-scenario dataset for high-resolution image compositing, which contains high-quality images with different styles and object masks. Experiments on synthesized and real images demonstrate the efficiency and effectiveness of our network in producing seamless, natural, and realistic results. Ablation studies show that our proposed network improves the visual quality of composite results compared with existing methods.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a stable enhanced superresolution generative adversarial network (SESRGAN) algorithm to address the low-resolution and blurred texture details in ancient murals.
Abstract: A stable enhanced superresolution generative adversarial network (SESRGAN) algorithm was proposed in this study to address the low resolution and blurred texture details of ancient murals. This algorithm improves on GANs by using dense residual blocks to extract image features. After two upsampling steps, the feature information of the image is mapped into the high-resolution (HR) image space to increase the resolution, and the reconstructed HR image is finally generated. The discriminator network uses VGG as its basic framework to judge the authenticity of the input image. This study further optimized the details of the network model. In addition, three loss models, i.e., the perceptual loss, content loss, and adversarial loss, were integrated into the proposed algorithm: the Wasserstein GAN gradient penalty (WGAN-GP) theory was used to optimize the adversarial loss of the model, and the perceptual loss was calculated using preactivation feature information. In addition, public datasets were used to pretrain the generative network model to achieve a high-quality initialization. The simulation results show that the proposed algorithm outperforms related superresolution algorithms in terms of both objective and subjective evaluation indicators. A subjective perception evaluation was also conducted, and the images reconstructed by our algorithm were more in line with the general public's visual perception than those produced by the compared algorithms.
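For reference, the WGAN-GP objective mentioned above augments the Wasserstein critic loss with a gradient penalty. In standard notation, where $\lambda$ is the penalty weight and $\hat{x}$ is sampled along straight lines between real and generated samples:

$$L_D = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]$$

The penalty term enforces the 1-Lipschitz constraint on the discriminator $D$, which is what stabilizes adversarial training relative to weight clipping.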

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a supervised retinal vessel extraction scheme using constraint-based nonnegative matrix factorization (NMF) and a three-dimensional (3D) modified attention U-Net architecture.
Abstract: Due to the complex morphology and characteristics of retinal vessels, it remains challenging for most existing algorithms to detect them accurately. This paper proposes a supervised retinal vessel extraction scheme using constraint-based nonnegative matrix factorization (NMF) and a three-dimensional (3D) modified attention U-Net architecture. The proposed method detects the retinal vessels in three major steps. First, we apply a Gaussian filter and gamma correction to the green channel of the retinal images to suppress background noise and adjust image contrast. Then, the study develops a new within-class and between-class constrained NMF algorithm to extract neighborhood feature information for every pixel and reduce the feature dimension. By using these constraints, the method can effectively gather similar features within each class and discriminate features between classes, improving the feature description of each pixel. Next, the study formulates the segmentation task as a classification problem and solves it with a more efficient 3D modified attention U-Net as a two-label classifier to reduce computational cost. The proposed network contains an upsampling step to raise image resolution before encoding and reverts the image to its original size with a downsampling step after three max-pooling layers. Besides, the attention gates (AG) set in these layers contribute to more accurate segmentation by maintaining details while suppressing noise. Finally, experimental results on three publicly available datasets, DRIVE, STARE, and HRF, demonstrate better performance than most existing methods.
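The first step of the pipeline (Gaussian filtering and gamma correction of the green channel) can be sketched as follows; the `sigma` and `gamma` values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_green(rgb, sigma=1.0, gamma=1.2):
    """Sketch of the described first step: take the green channel,
    suppress background noise with a Gaussian filter, and adjust
    contrast with gamma correction."""
    green = rgb[..., 1].astype(np.float64) / 255.0
    smoothed = gaussian_filter(green, sigma=sigma)
    corrected = np.power(np.clip(smoothed, 0.0, 1.0), gamma)
    return np.round(corrected * 255).astype(np.uint8)
```

The green channel is used because retinal vessels show the highest contrast there; the gamma value trades off brightening dark regions (gamma < 1) against suppressing them (gamma > 1).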

Journal ArticleDOI
TL;DR: In this article, a buffer starvation evaluation model based on deep learning and a video stream scheduling model were proposed to evaluate and improve the video service quality of 5G-powered UAV.
Abstract: For video streaming services over wireless networks, improving the quality of experience (QoE) has always been a challenging task. Especially since the arrival of the 5G era, more attention has been paid to analyzing the experience quality of video streaming in more complex network scenarios (such as 5G-powered drone video transmission). An insufficient buffer during video stream transmission will cause playback to freeze [1]. To cope with this defect, this paper proposes a buffer starvation evaluation model based on deep learning and a video stream scheduling model based on reinforcement learning. This approach uses machine learning to extract the correlation between the buffer starvation probability distribution and the traffic load, thereby obtaining explicit evaluation results for buffer starvation events and a series of resource allocation strategies that optimize long-term QoE. To deal with the noise caused by the random environment, the model introduces an intrinsic reward mechanism into the scheduling process so that the agent can fully explore the environment. Experiments show that our framework can effectively evaluate and improve the video service quality of 5G-powered UAVs.
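The buffer starvation event being evaluated can be illustrated with a toy fluid simulation of a playback buffer. This is a hand-rolled sketch, not the paper's deep learning model; the rates and prefill level are assumed parameters.

```python
import random

def count_starvations(arrival_mean, play_rate, steps, prefill=5.0, seed=0):
    """Toy playback-buffer model: each step receives a random amount of
    data (exponential, mean `arrival_mean`) and plays out `play_rate`.
    Each time the buffer runs dry mid-playback, playback freezes
    (a starvation event) and the buffer is rebuilt to `prefill`."""
    rng = random.Random(seed)
    buf = prefill
    starvations = 0
    for _ in range(steps):
        buf += rng.expovariate(1.0 / arrival_mean)  # data received
        buf -= play_rate                            # data played out
        if buf < 0:                                 # buffer ran dry
            starvations += 1
            buf = prefill                           # rebuffer, resume
    return starvations
```

When arrivals outpace playback, starvations stay rare; when the traffic load leaves a per-step deficit, they recur regularly, which is the dependence on traffic load that the paper's evaluation model learns.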

Journal ArticleDOI
TL;DR: In this article, a dictionary learning based scheme was proposed to recognize printed texture patterns, where dictionaries for all kinds of texture patterns were learned from print-and-scanned texture modules in the training stage.
Abstract: Quick Response (QR) codes are designed for information storage and high-speed reading applications. To store additional information, Two-Level QR (2LQR) codes replace black modules in standard QR codes with specific texture patterns. When the 2LQR code is printed, the texture patterns are blurred, and their sizes are smaller than $$0.5{\mathrm{cm}}^{2}$$ . Recognizing small, blurred texture patterns is challenging. In the original 2LQR literature, recognition of texture patterns is based on maximizing the correlation between print-and-scanned texture patterns and the original digital ones. When desktop printers with large pixel extensions and low-resolution capture devices are employed, the recognition accuracy of texture patterns drops greatly. To improve recognition accuracy in this situation, our work presents a dictionary-learning-based scheme to recognize printed texture patterns. To the best of our knowledge, this is the first attempt to use dictionary learning to improve the recognition accuracy of printed texture patterns. In our scheme, dictionaries for all kinds of texture patterns are learned from print-and-scanned texture modules in the training stage. These learned dictionaries are then employed to represent each texture module in the testing stage (the extraction process) to recognize its texture pattern. Experimental results show that our proposed algorithm significantly reduces the recognition error of small printed texture patterns.
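The recognition step (assigning each print-and-scanned module to the texture class whose learned dictionary represents it best) can be sketched as follows. For brevity this uses an unconstrained least-squares fit rather than a sparse coding solver, and the class labels and dictionaries are hypothetical.

```python
import numpy as np

def classify_by_dictionary(sample, dictionaries):
    """Assign `sample` (a flattened texture module) to the class whose
    dictionary reconstructs it with the smallest residual."""
    best_label, best_err = None, np.inf
    for label, D in dictionaries.items():
        # coefficients minimizing ||sample - D @ a||_2
        a, *_ = np.linalg.lstsq(D, sample, rcond=None)
        err = np.linalg.norm(sample - D @ a)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

Each class dictionary is learned from print-and-scanned modules of that texture pattern, so a degraded test module still reconstructs best under its own class's dictionary.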