
Showing papers in "Journal of Real-time Image Processing in 2021"


Journal ArticleDOI
TL;DR: This study focuses on applying a machine learning model to classify a tomato disease image dataset so that necessary steps can be taken proactively to combat such agricultural crises.
Abstract: The human population is growing at a very rapid scale. With this progressive growth, it is extremely important to ensure that healthy food is available for the survival of the inhabitants of this planet. Moreover, the economies of developing countries are highly dependent on agricultural production. The overall economic balance is affected by any variance between the demand for and supply of food or agricultural products. Plant diseases are a great threat to crop yield, causing famines and economic slowdown. Our present study focuses on applying a machine learning model to classify a tomato disease image dataset so that necessary steps can be taken proactively to combat such agricultural crises. In this work, the data are collected from the publicly available PlantVillage dataset. Significant features are extracted from the dataset using a hybrid principal component analysis–whale optimization algorithm (PCA–WOA). The extracted features are then fed into a deep neural network for classification of tomato diseases. The proposed model is then compared against classical machine learning techniques to establish its superiority in terms of accuracy and loss-rate metrics.
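The dimensionality-reduction half of the feature-extraction stage can be sketched with plain PCA via NumPy's SVD (a minimal illustration, not the authors' hybrid PCA–WOA code; the random matrix stands in for flattened leaf-image patches):

```python
import numpy as np

def pca_features(X, k):
    """Project samples (rows of X) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    # SVD of the centered data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # k-dimensional feature vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))               # 100 stand-in image patches
Z = pca_features(X, 8)
print(Z.shape)                               # (100, 8)
```

A metaheuristic such as WOA would then search over which components (or feature subsets) to keep before the deep-network classifier.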

182 citations


Journal ArticleDOI
TL;DR: A real-time cheating-immune secret sharing approach is introduced that effectively minimizes both the time and space complexity of secret sharing and generates meaningful shares without restricting the number of participants to a fixed value.
Abstract: To observe the Earth's surface and its atmospheric interactions, various advanced optical and radar sensors are utilized. These observations return a huge volume of multidimensional optical remote-sensing images that may be used in multidisciplinary fields. Processing these images in real time is a challenging task because of their high spatial resolution and complex data structure. At the same time, these images are quite confidential in various applications, such as the military and intelligence sectors. For secretly transmitting remote-sensing images in real time, a real-time cheating-immune secret sharing approach is introduced in this paper. The proposed approach effectively minimizes both the time and space complexity of secret sharing. It also generates meaningful shares without restricting the number of participants to a fixed value. The shares generated by the proposed approach are cheating-immune: they can authenticate themselves if tampered with. Experimental results show the effectiveness of the proposed approach.
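As background, the simplest image secret sharing construction is an (n, n) XOR scheme: n − 1 shares are random noise and the last absorbs the secret, so all n shares are needed to recover it. The sketch below is this textbook baseline, not the paper's cheating-immune, meaningful-share method:

```python
import numpy as np

def make_shares(secret, n, rng):
    """Split a uint8 image into n XOR shares; all n are needed to recover it."""
    shares = [rng.integers(0, 256, secret.shape, dtype=np.uint8)
              for _ in range(n - 1)]          # n-1 pure-noise shares
    last = secret.copy()
    for s in shares:
        last ^= s                             # fold the noise into the last share
    shares.append(last)
    return shares

def recover(shares):
    """XOR all shares together to reconstruct the secret image."""
    out = np.zeros_like(shares[0])
    for s in shares:
        out ^= s
    return out

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (4, 4), dtype=np.uint8)   # stand-in remote-sensing tile
shares = make_shares(img, 3, rng)
```

Cheating immunity and meaningful-looking shares require extra machinery (authentication data embedded per share), which this baseline deliberately omits.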

105 citations


Journal ArticleDOI
TL;DR: The achieved experimental results show that the proposed solution is suitable for creating a smart, real-time video-surveillance system for fire/smoke detection, and that YOLOv2 is a better option than the other approaches for real-time fire/smoke detection.
Abstract: This work presents real-time video-based fire and smoke detection using the YOLOv2 convolutional neural network (CNN) in anti-fire surveillance systems. YOLOv2 is designed with a lightweight neural network architecture to meet the requirements of embedded platforms. The training stage is processed off-line with fire and smoke image sets covering different indoor and outdoor scenarios. The Ground Truth Labeler app is used to generate the ground-truth data from the training set. The trained model was tested and compared with other state-of-the-art methods. We used a large set of fire/smoke and negative videos in different environments, both indoor (e.g., a railway carriage, container, bus wagon, or home/office) and outdoor (e.g., a storage or parking area). YOLOv2 is a better option than the other approaches for real-time fire/smoke detection. This work has been deployed on a low-cost embedded device (Jetson Nano) with a single fixed camera per scene, working in the visible spectral range. There are no specific requirements for the video camera; hence, when the proposed solution is applied for safety on board vehicles, in transport infrastructure, or in smart cities, cameras already installed in closed-circuit television surveillance systems can be reused. The achieved experimental results show that the proposed solution is suitable for creating a smart, real-time video-surveillance system for fire/smoke detection.

80 citations


Journal ArticleDOI
TL;DR: A convolutional neural network is trained to estimate the point spread function (PSF) parameters using images acquired over a satellite calibration site with a specific pattern, and image deconvolution is performed to improve the image's signal-to-noise ratio (SNR) and modulation transfer function (MTF).
Abstract: In-orbit optical-imaging instruments may suffer from degradation due to space-environment impacts or long-term operation. The degradation blurs the image received on the ground; in particular, degradations arising from defocus and spherical aberration cause blurring of the received image. Image deblurring should therefore be done in a pre-processing step to compensate for these adverse sensor effects. The aberrations are modeled by Zernike polynomials and handled by deep learning in the deblurring method. This paper presents a method to deconvolve the acquired data to improve image quality. A convolutional neural network is trained to estimate the point spread function (PSF) parameters using images acquired over a satellite calibration site with a specific pattern. Image deconvolution is then performed to improve the signal-to-noise ratio (SNR) and modulation transfer function (MTF). The technical and image data used for modeling and experiments come from the VNREDSat-1 satellite (the first operational Vietnamese Earth-observation optical small satellite). The experiment is performed on computers accelerated by graphics processing units (GPUs) to ensure fast computation.
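Once a PSF estimate is in hand, the deconvolution step is classically done in the frequency domain. Below is a minimal Wiener-deconvolution sketch with a synthetic Gaussian PSF (an illustrative baseline under assumed parameters, not the paper's CNN-estimated PSF pipeline):

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=1e-3):
    """Frequency-domain Wiener deconvolution with a known, centered PSF.
    k is a noise-to-signal regularization constant (an assumed value)."""
    H = np.fft.fft2(np.fft.ifftshift(psf))       # PSF spectrum, origin at (0, 0)
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + k)        # Wiener filter
    return np.real(np.fft.ifft2(G * W))

# Synthetic demo: blur a bright square with a Gaussian PSF, then restore it.
x = np.linspace(-1, 1, 64)
xx, yy = np.meshgrid(x, x)
psf = np.exp(-(xx ** 2 + yy ** 2) / 0.01)
psf /= psf.sum()                                 # normalize to unit energy
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img)
                               * np.fft.fft2(np.fft.ifftshift(psf))))
restored = wiener_deconvolve(blurred, psf)
```

With no noise, the residual error comes only from the regularized high frequencies, so the restored image is much closer to the original than the blurred one.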

66 citations


Journal ArticleDOI
TL;DR: This work proposes a new efficient, high-speed image encryption scheme based on the Bülban chaotic map that is extremely secure and fast enough for real-time image processing at 80 fps (frames per second).
Abstract: In recent decades, a large number of image encryption schemes have been proposed. Most of these schemes reach a high security level; however, their slow speed, due to complex processing, makes them unusable in real-time applications. Motivated by this, we propose a new efficient, high-speed image encryption scheme based on the Bülban chaotic map. Unlike most existing schemes, we make wise use of this simple chaotic map to generate only a small number of random rows and columns. Moreover, to further increase the speed, we raise the processing unit from the pixel level to the row/column level. Security of the new scheme is achieved through a substitution–permutation network, where we apply a circular shift of rows and columns to break the strong correlation of adjacent pixels. Then, we combine the XOR operation with the modulo function to mask the pixel values and prevent any leak of information. High-security tests and simulation analysis have been carried out to demonstrate that the scheme is extremely secure and fast enough for real-time image processing at 80 fps (frames per second).
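The row/column circular shifts plus XOR masking can be illustrated with a toy NumPy round trip. Here uniformly random shift vectors and keystream stand in for the Bülban-map output (illustrative only; in the real scheme all key material is derived from the chaotic map):

```python
import numpy as np

def encrypt(img, row_shifts, col_shifts, keystream):
    """Circular-shift each row and column, then XOR-mask with a keystream."""
    c = img.copy()
    for i, s in enumerate(row_shifts):
        c[i] = np.roll(c[i], s)                  # permute within each row
    for j, s in enumerate(col_shifts):
        c[:, j] = np.roll(c[:, j], s)            # permute within each column
    return c ^ keystream                         # diffusion by XOR

def decrypt(cipher, row_shifts, col_shifts, keystream):
    """Undo the XOR, then the column and row shifts, in reverse order."""
    c = cipher ^ keystream
    for j, s in enumerate(col_shifts):
        c[:, j] = np.roll(c[:, j], -s)
    for i, s in enumerate(row_shifts):
        c[i] = np.roll(c[i], -s)
    return c

rng = np.random.default_rng(42)
img = rng.integers(0, 256, (8, 8), dtype=np.uint8)
rs = rng.integers(0, 8, 8)                       # per-row shift amounts
cs = rng.integers(0, 8, 8)                       # per-column shift amounts
ks = rng.integers(0, 256, (8, 8), dtype=np.uint8)
cipher = encrypt(img, rs, cs, ks)
```

Working on whole rows and columns, rather than pixel by pixel, is exactly what gives the paper's scheme its speed advantage.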

65 citations


Journal ArticleDOI
TL;DR: In this article, an artificial intelligence system for social-distancing classification of persons using thermal images is proposed: by exploiting the YOLOv2 (You Only Look Once) approach, a novel deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios.
Abstract: COVID-19 is a disease caused by a severe-respiratory-syndrome coronavirus, identified in December 2019 in Wuhan, China. It has resulted in an ongoing pandemic with many infections and deaths. The coronavirus is primarily spread between people during close contact. Motivated by this, this research proposes an artificial intelligence system for social-distancing classification of persons using thermal images. By exploiting the YOLOv2 (You Only Look Once) approach, a novel deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios. An algorithm is also implemented for measuring and classifying the distance between persons and automatically checking whether social-distancing rules are respected. Hence, this work aims at minimizing the spread of the COVID-19 virus by evaluating if and how persons comply with social-distancing rules. The proposed approach is applied to images acquired through thermal cameras to establish a complete AI system for people tracking, social-distancing classification, and body-temperature monitoring. The training phase is done with two datasets captured from different thermal cameras. The Ground Truth Labeler app is used for labeling the persons in the images. The proposed technique has been deployed on a low-cost embedded system (Jetson Nano) with a single fixed camera. The proposed approach is implemented in a distributed surveillance video system to visualize people from several cameras in one centralized monitoring system. The achieved results show that the proposed method is suitable for setting up a surveillance system in smart cities for people detection, social-distancing classification, and body-temperature analysis.
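The distance-classification step reduces to flagging pairs of detected people whose ground-plane distance falls below a threshold. A minimal sketch (assuming detector centroids already mapped to metres; the 2 m threshold is an assumption, not a value from the paper):

```python
import numpy as np

def violations(centroids, min_dist=2.0):
    """Return index pairs of people closer than min_dist (ground-plane metres)."""
    pts = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distance matrix via broadcasting
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    i, j = np.triu_indices(len(pts), k=1)        # each unordered pair once
    return [(int(a), int(b)) for a, b in zip(i, j) if d[a, b] < min_dist]

print(violations([(0, 0), (1, 0), (5, 5)]))      # [(0, 1)]
```

In the full system these centroids would come from the YOLOv2 detections after a camera-to-ground homography.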

58 citations


Journal ArticleDOI
TL;DR: A metaheuristic method to automatically find near-optimal values of convolutional neural network hyperparameters, based on a modified firefly algorithm, is proposed, along with a system for automatic classification of glioma brain tumor grades from magnetic resonance imaging.
Abstract: The most frequent brain tumor types are gliomas. The magnetic resonance imaging technique helps in diagnosing brain tumors, but a diagnosis in the early stages of a glioma is hard to obtain, even for a highly experienced specialist. Therefore, a reliable and efficient system for magnetic resonance imaging interpretation is required to help the doctor make the diagnosis in the early stages. To classify which grade a glioma belongs to, convolutional neural networks, which have proved able to obtain excellent performance in image classification tasks, can be used. Tuning the convolutional network's hyperparameters is a very important issue in this domain for achieving high classification accuracy; however, this task takes a lot of computational time. Approaching this issue, in this manuscript we propose a metaheuristic method, based on a modified firefly algorithm, to automatically find near-optimal values of convolutional neural network hyperparameters, and we develop a system for automatic classification of glioma brain tumor grades from magnetic resonance imaging. First, we tested the proposed modified algorithm on a set of standard unconstrained benchmark functions and compared its performance to the original algorithm and other modified variants. Upon verifying the efficiency of the proposed approach in general, we applied it to hyperparameter optimization of the convolutional neural network. The IXI dataset and The Cancer Imaging Archive, with further collections of data, are used for evaluation purposes, and additionally the method is evaluated on axial brain tumor images. The obtained experimental results and a comparative analysis with other state-of-the-art algorithms tested under the same conditions show the robustness and efficiency of the proposed method.

47 citations


Journal ArticleDOI
TL;DR: Experimental results covering both security and performance analysis show that the proposed image encryption scheme is secure enough to resist all existing cryptanalytic attacks and is efficient in terms of encryption time.
Abstract: In this paper, we propose a new 2D sine–cosine cross-chaotic (SC3) map to design an image encryption scheme with high confusion and diffusion capability. We evaluate the maximum Lyapunov exponent (MLE) of the proposed SC3 map to measure its degree of sensitivity to initial conditions and perform bifurcation analysis to find the chaotic region. The proposed chaotic map generates two pseudo-random sequences $$R_1$$ and $$R_2$$ , which are used in the confusion (permutation) and diffusion phases, respectively. The confusion layer is designed by shuffling the image pixels, and the diffusion layer is designed by a bitwise XOR operation. The strength of the proposed image encryption scheme is evaluated against statistical attacks (information entropy, correlation coefficient, and histogram analysis), differential attacks (NPCR and UACI), and sensitivity to the secret key. The experimental results of both the security and performance analysis show that the proposed scheme is secure enough to resist all existing cryptanalytic attacks and is efficient in terms of encryption time.
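The confusion/diffusion structure can be sketched end to end: one chaotic sequence drives a pixel permutation (via argsort), a second drives an XOR keystream. The logistic map below is a stand-in for the paper's SC3 map, whose exact equations are not reproduced here:

```python
import numpy as np

def logistic_stream(x0, r, n):
    """Stand-in chaotic sequence (logistic map, NOT the paper's SC3 map)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)
        xs[i] = x
    return xs

def encrypt(img, x0=0.37, r=3.99):
    """Confusion (permutation from sequence 1) + diffusion (XOR with sequence 2)."""
    flat = img.ravel()
    n = flat.size
    perm = np.argsort(logistic_stream(x0, r, n))        # pixel shuffle order
    ks = (logistic_stream(x0 / 2, r, n) * 256).astype(np.uint8)
    return (flat[perm] ^ ks).reshape(img.shape), perm, ks

def decrypt(cipher, perm, ks):
    flat = cipher.ravel() ^ ks                          # undo diffusion
    out = np.empty_like(flat)
    out[perm] = flat                                    # invert the permutation
    return out.reshape(cipher.shape)

rng = np.random.default_rng(7)
img = rng.integers(0, 256, (8, 8), dtype=np.uint8)
cipher, perm, ks = encrypt(img)
```

In a real scheme the receiver regenerates `perm` and `ks` from the shared key (x0, r) rather than transmitting them.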

45 citations


Journal ArticleDOI
TL;DR: In this article, a robust algorithm is proposed to assist robotic surgery for critical procedures in real time, using a reinforcement-learning temporal difference (TD) approach together with assistive techniques.
Abstract: In recent years, enormous advancement has taken place in biomedical engineering, which has paved the way for robot-assisted surgery in various complex surgical procedures. In robotic surgery, the reinforcement-learning Temporal Difference (TD) approach, applied through assistive techniques, has tremendous potential. A Probabilistic Roadmap (PR) can be used to recognize an obstacle-free path to the region of interest, and an Inverse Kinematics (IK) approach can be used for accurate mapping from pixel space to the real-world workspace. Our proposed system is effective in approximating the path length and evaluating depth, with a less invasive contact-force sensor. This article presents a robust algorithm to assist robotic surgery for critical procedures in real time. When working on such soft tissues, software-driven procedures and algorithms must be highly precise in choosing the optimal path to reach the procedural region. Statistical analysis shows that the proposed approach outperforms under a favorable learning rate, discount factor, and exploration factor.
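The temporal-difference core referenced above is, in its simplest tabular form, the TD(0) value update V(s) ← V(s) + α·(r + γ·V(s′) − V(s)). A minimal sketch on a toy 3-state chain (illustrative only; the paper's surgical state and reward design are far richer):

```python
def td0(episodes, gamma=0.9, alpha=0.1, n_states=3):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    V = [0.0] * n_states
    for ep in episodes:
        for s, r, s_next in ep:
            bootstrap = gamma * V[s_next] if s_next is not None else 0.0
            V[s] += alpha * (r + bootstrap - V[s])
    return V

# A 3-state chain 0 -> 1 -> 2, with reward 1 on the terminal transition.
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V = td0([episode] * 200)
```

After repeated sweeps the values converge toward V(2) ≈ 1, V(1) ≈ γ, V(0) ≈ γ², i.e., reward discounted by distance from the goal, which is what steers path selection.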

43 citations


Journal ArticleDOI
TL;DR: A lightweight real-time vehicle detection model developed to run on common computing devices, built from the pre-trained Tiny-YOLOv3 network, which is subsequently pruned and simplified by training on the BIT-Vehicle dataset and excluding some unnecessary layers.
Abstract: In recent years, vehicle detection from video sequences has been one of the important tasks in intelligent transportation systems, used for detecting and tracking vehicles, capturing their violations, and controlling traffic. This paper focuses on a lightweight real-time vehicle detection model developed to run on common computing devices. The method can be deployed on low-power systems (e.g., devices without GPUs or with low-power GPU modules), relying on the proposed real-time lightweight algorithm. The system employs an end-to-end approach for identifying, locating, and classifying vehicles in images. The pre-trained Tiny-YOLOv3 network is adopted as the main reference model and subsequently pruned and simplified by training on the BIT-Vehicle dataset and excluding some unnecessary layers. The results indicate the advantages of the proposed method in terms of accuracy and speed. The network is capable of detecting and classifying six different types of vehicles with mAP = 95.05% at a speed of 17 fps; hence, it is about two times faster than the original Tiny-YOLOv3 network.

35 citations


Journal ArticleDOI
TL;DR: “SD-Net”, an end-to-end CNN architecture that produces high-quality density maps in real time and effectively counts people in extremely overcrowded scenes; evaluated on four publicly available crowd-analysis datasets, it demonstrates superiority over the state of the art in terms of accuracy and model size.
Abstract: Advancements in computer-vision technologies attract many researchers to surveillance applications, particularly the automated analysis of crowded scenes, such as crowd counting in very congested scenes. In crowd counting, the main goal is to count or estimate the number of people in a particular scene. Understanding overcrowded scenes in real time is important for instant responsive actions. However, it is a very difficult task due to key challenges including cluttered backgrounds, occlusion, variations in human pose and scale, and limited surveillance training data, which are inadequately covered in the existing literature. To tackle these challenges, we introduce “SD-Net”, an end-to-end CNN architecture that produces high-quality density maps in real time and effectively counts people in extremely overcrowded scenes. The proposed architecture consists of depthwise-separable, standard, and dilated 2D convolutional layers. Depthwise-separable and standard 2D convolutional layers are used to extract 2D features. Instead of pooling layers, dilated 2D convolutional layers are employed, which enlarge the receptive field and reduce the number of parameters. Our CNN architecture is evaluated on four publicly available crowd-analysis datasets, demonstrating superiority over the state of the art in terms of accuracy and model size.

Journal ArticleDOI
TL;DR: The proposed model achieves higher accuracy and can restore higher-quality dehazed images than the state-of-the-art models, and could be deployed as a real-time application for real-time image processing, remote-sensing imagery, underwater image enhancement, video-guided transportation, outdoor surveillance, and autonomous-driving systems.
Abstract: Haze and fog have a great influence on image quality, and dehazing and defogging are applied to eliminate their effects. For this purpose, an effective and automatic dehazing method is proposed. To dehaze a hazy image, we need to estimate two important parameters: the atmospheric light and the transmission map. For atmospheric-light estimation, a superpixel segmentation method is used to segment the input image. The intensities within each superpixel are then summed and compared across superpixels to extract the maximum-intensity superpixel. Extracting the maximum-intensity superpixel from an outdoor hazy image automatically selects the haziest region (the atmospheric light). Thus, we take the individual channel intensities of the extracted maximum-intensity superpixel as the atmospheric light for our proposed algorithm. Secondly, on the basis of the measured atmospheric light, an initial transmission map is estimated. The transmission map is further refined through a rolling guidance filter, which preserves much of the image information, such as textures, structures, and edges, in the final dehazed output. Finally, the haze-free image is produced by combining the atmospheric light and the refined transmission with the haze imaging model. Through detailed experimentation on several publicly available datasets, we show that the proposed model achieves higher accuracy and restores higher-quality dehazed images than the state-of-the-art models. The proposed model could be deployed as a real-time application for real-time image processing, remote-sensing imagery, underwater image enhancement, video-guided transportation, outdoor surveillance, and autonomous-driving systems.
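The haze imaging model referenced above is commonly written I(x) = J(x)·t(x) + A·(1 − t(x)); once A and t are estimated, the haze-free image follows by inversion. A minimal sketch with synthetic values (not the paper's superpixel and rolling-guidance pipeline):

```python
import numpy as np

def dehaze(I, A, t, t0=0.1):
    """Invert I = J*t + A*(1-t)  ->  J = (I - A) / max(t, t0) + A.
    t0 floors the transmission so division does not blow up in dense haze."""
    return (I - A) / np.maximum(t, t0) + A

# Synthetic check: compose a hazy image from a known scene, then invert.
rng = np.random.default_rng(3)
J = rng.random((4, 4))                  # true scene radiance
A = 0.8                                 # estimated atmospheric light
t = np.full((4, 4), 0.5)                # transmission map
I = J * t + A * (1 - t)                 # forward haze model
J_rec = dehaze(I, A, t)
```

Because the synthetic transmission never falls below the floor t0, the inversion here recovers the scene exactly; on real images, accuracy depends entirely on how well A and t are estimated.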

Journal ArticleDOI
TL;DR: A fabric-defect detection system based on a VAE, deployed on Nvidia's Jetson TX2, that meets the real-time requirements of the project and enables its practical adoption.
Abstract: Automatic detection of fabric defects based on machine vision is an important topic in quality control for cotton textile factories. There are many kinds of defects in fabric production, making it very difficult to classify them automatically. In recent years, deep learning image-processing technology based on convolutional neural networks (CNNs) has made it possible to train on and extract features from target images automatically. Since a complete set of defect samples cannot be collected, we compared unsupervised learning algorithms based on CNNs, including the autoencoder (AE), the variational autoencoder (VAE), and generative adversarial networks (GANs). Because of the large amount of computation and the difficulty of training GANs, we chose the AE and VAE codec networks and introduced the mean structural similarity (MSSIM) as the training loss function, improving on the performance obtained with only an $${L}_{p}$$ -distance loss function for image-brightness comparison. After training, we used the trained model to obtain target defects from SSIM residual maps between the input and reconstructed images. Based on the evaluation results, we finally implemented a fabric-defect detection system based on the VAE on a Jetson TX2 from Nvidia Corporation, USA. The optimized algorithm meets the real-time requirements of the project, enabling its practical adoption.

Journal ArticleDOI
TL;DR: The results indicate that the proposed system provides better predictive power than structured human interviews, personality inventories, occupational-interest testing, and assessment centers, and can be utilized as an effective screening method using a personal-value-based competency model.
Abstract: This work aims to develop a real-time image and video processor, enabled with an artificial intelligence (AI) agent, that can predict a job candidate's behavioral competencies from his or her facial expressions. This is accomplished using a real-time video-recorded interview with a histogram of oriented gradients plus support vector machine (HOG-SVM) pipeline and convolutional neural network (CNN) recognition. Departing from the classical view of recognizing emotional states, this prototype system was developed to automatically decode a job candidate's behaviors from their microexpressions, based on the behavioral ecology view of facial displays (BECV), in the context of employment interviews. An experiment was conducted at a Fortune 500 company, and video records and competency scores were collected from the company's employees and hiring managers. The results indicate that our proposed system provides better predictive power than structured human interviews, personality inventories, occupational-interest testing, and assessment centers. As such, our proposed approach can be utilized as an effective screening method using a personal-value-based competency model.

Journal ArticleDOI
TL;DR: A deep learning fire-recognition algorithm based on model compression and lightweight requirements, built on the lightweight MobileNetV3 model, is proposed to meet the needs of embedded intelligent forest-fire monitoring systems on unmanned aerial vehicles (UAVs).
Abstract: To meet the needs of embedded intelligent forest-fire monitoring systems on unmanned aerial vehicles (UAVs), a deep learning fire-recognition algorithm based on model compression and lightweight requirements is proposed in this study. The algorithm uses the lightweight MobileNetV3 model to reduce the complexity of the conventional YOLOv4 network structure. Redundant channels are eliminated through channel-level sparsity-induced regularization. A knowledge distillation algorithm is used to improve the detection accuracy of the pruned model. The experimental results reveal that the proposed architecture has only 2.64 million model parameters; compared with YOLOv4, this represents a reduction of nearly 95.87%. The inference time decreased from 153.8 to 37.4 ms, a reduction of nearly 75.68%. Our approach thus offers a model with fewer parameters, low memory requirements, and fast inference compared with existing algorithms. The method presented in this paper is specifically tailored for use as a deep learning forest-fire monitoring system on a UAV platform.
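Channel-level sparsity pruning typically ranks channels by the magnitude of their batch-norm scale factors (the γ terms driven toward zero by the sparsity penalty) and drops the smallest. A simplified selection sketch (the keep ratio is an assumed knob, not a value from the paper):

```python
import numpy as np

def prune_channels(gammas, keep_ratio=0.5):
    """Rank channels by |BN scale factor| and keep the top fraction
    (the sparsity-induced-regularization criterion, simplified)."""
    k = max(1, int(len(gammas) * keep_ratio))
    order = np.argsort(-np.abs(gammas))       # largest |gamma| first
    return np.sort(order[:k])                 # indices of channels to keep

print(prune_channels(np.array([0.9, 0.01, 0.5, 0.02]), 0.5))   # [0 2]
```

After pruning, the smaller network is fine-tuned, with knowledge distillation from the unpruned teacher recovering most of the lost accuracy.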

Journal ArticleDOI
TL;DR: In this paper, the effects of the software and hardware components on underwater images are discussed, state-of-the-art strategies and algorithms for underwater image enhancement are surveyed, and algorithm performance is measured from various aspects.
Abstract: In recent years, deep-sea and ocean exploration has attracted more attention in the marine industry. Most marine vehicles, including robots, submarines, and ships, are equipped for automatic imaging of deep sea layers. The quality of images taken by underwater devices is suboptimal due to water properties and impurities: water absorbs a range of colors, so processing becomes more difficult. Scattering and absorption affect the light used in underwater imaging and are collectively called light attenuation in water. Previous examinations have shown that some inherent limitations emerge from artifacts and environmental noise in underwater images. As a result, it is hard to distinguish objects from their backgrounds in those images in a real-time system. This paper discusses the effects of the software and hardware components on underwater images, surveys state-of-the-art strategies and algorithms for underwater image enhancement, and measures algorithm performance from various aspects. We also consider the important studies conducted in the field of quality enhancement for underwater images. We have analyzed the methods from five perspectives: (a) hardware and software tools, (b) the variety of underwater imaging techniques, (c) improving real-time image quality, (d) identifying specific targets in underwater imaging, and (e) assessment. Finally, the advantages and disadvantages of the presented real-time and non-real-time image processing techniques are addressed with a view to improving the quality of underwater images. This systematic review provides an overview of the major underwater image algorithms and real-time/non-real-time processing.

Journal ArticleDOI
TL;DR: A new recurrence algorithm to compute the coefficients of MNPs for high-order polynomials is proposed, based on a derived identity for MNPs that reduces both the number of recurrences used and the number of MNP coefficients computed.
Abstract: Meixner polynomials (MNPs) and their moments are considered significant feature-extraction tools because of their salient representation power in signal processing and computer vision. However, the existing recurrence algorithm for MNPs exhibits numerical instability in the coefficients of high-order polynomials. This paper proposes a new recurrence algorithm to compute the coefficients of MNPs for high-order polynomials. The proposed algorithm is based on a derived identity for MNPs that reduces both the number of recurrences used and the number of MNP coefficients computed. To minimize numerical errors, a new form of the recurrence algorithm is presented. The proposed algorithm computes only $$\sim $$ 50% of the MNP coefficients. A comparison with different state-of-the-art algorithms is performed to evaluate the performance of the proposed recurrence algorithm in terms of computational cost and reconstruction error. In addition, an investigation is performed to find the maximum polynomial size that can be generated. The results show that the proposed algorithm remarkably reduces the computational cost and increases the size of the MNPs that can be generated: it shows an average improvement of $$\sim $$ 77% in terms of computational cost and an improvement of $$\sim $$ 1269% in terms of generated size.

Journal ArticleDOI
TL;DR: A new wavelet-based multi-focus image fusion approach using method noise and anisotropic diffusion, handling two separate cases (with and without a reference image), specifically designed for real-time surveillance applications.
Abstract: This paper presents a new wavelet-based multi-focus image fusion approach using method noise and anisotropic diffusion for two separate cases, i.e., with and without a reference image. It is specifically designed for real-time surveillance applications and proceeds in multiple steps. First, the stationary wavelet transform (SWT) is performed to obtain low- and high-frequency coefficients. Second, the input images' LL bands are fused by averaging, while the remaining bands are fused using a new correlation-coefficient (CC) based fusion strategy, with the threshold value calculated from the structural similarity index metric (SSIM); the inverse SWT is then performed to reconstruct the image from the fused coefficients. Third, anisotropic-diffusion-based method-noise thresholding is introduced to recover unprocessed and still-damaged components of the input images. Finally, the proposed approach's performance is assessed with various qualitative (visual perception) and quantitative (performance-metric) factors. The experimental outcomes show that the proposed approach generates fine edges, high visual quality, clear objects, and little degradation, verifying that the proposed multi-step hybrid technique produces high-quality fused images.
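The band-wise fusion step can be illustrated on precomputed wavelet sub-bands: average the low-frequency LL band, and for detail bands pick the larger-magnitude coefficient from either source. This is the common baseline rule, shown here in place of the paper's more elaborate CC/SSIM-thresholded strategy:

```python
import numpy as np

def fuse_bands(bands_a, bands_b):
    """Fuse wavelet sub-bands of two source images: average LL, and take
    the larger-magnitude coefficient in each detail band (baseline rule)."""
    fused = {"LL": (bands_a["LL"] + bands_b["LL"]) / 2}
    for k in ("LH", "HL", "HH"):
        pick_a = np.abs(bands_a[k]) >= np.abs(bands_b[k])
        fused[k] = np.where(pick_a, bands_a[k], bands_b[k])
    return fused

# Toy 2x2 sub-bands standing in for SWT coefficients of two source images.
a = {"LL": np.ones((2, 2)), "LH": np.array([[3., 0.], [0., 1.]]),
     "HL": np.zeros((2, 2)), "HH": np.full((2, 2), -2.)}
b = {"LL": np.zeros((2, 2)), "LH": np.array([[1., 2.], [0., 0.]]),
     "HL": np.ones((2, 2)), "HH": np.ones((2, 2))}
f = fuse_bands(a, b)
```

Large detail coefficients mark in-focus edges, which is why the max-magnitude rule transfers the sharp regions of each source into the fused result.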

Journal ArticleDOI
TL;DR: The proposed solution is an Augmented Reality (AR)-based smartphone application that recognizes artifacts in real time using deep learning and retrieves supporting multimedia information for visitors, with convolutional neural networks applied to correctly recognize the artifacts.
Abstract: Museums have adapted their traditional ways of providing services with the advent of novel digital technologies to keep pace with the growing needs of the current industrial revolution. Mixed reality has revitalized interpretation in numerous domains by offering immersive experiences spanning the digital and real worlds. In the proposed study, an attempt was made to enrich the user's museum experience with relevant multimedia information and to build a better connection with the artifacts within the Taxila Museum in Pakistan, which beautifully preserves the Gandhara civilization. The proposed solution is an Augmented Reality (AR)-based smartphone application that recognizes artifacts in real time using deep learning and retrieves supporting multimedia information for visitors. To provide users with the exact content, convolutional neural networks (CNNs) are applied to correctly recognize the artifacts. The significance of the proposed application is compared with traditional human-guided or free user tours through a user-centric, questionnaire-based survey. The evaluation is carefully performed using relevant evaluation models, including the Museum Experience Scale (MES) and the triptych model of interactivity. The findings of the study are discussed and assessed comprehensively using statistical methods to highlight its significance.

Journal ArticleDOI
TL;DR: A real-time person tracking and segmentation system using an overhead camera perspective is introduced in this work; the SiamMask algorithm delivers good results, with a tracking accuracy of 95%, and a comparison is performed with other tracking algorithms.
Abstract: Real-time video surveillance systems are widely deployed in various environments, including public areas, commercial buildings, and public infrastructure. Person detection is a key and crucial task in different video surveillance applications, such as person detection, segmentation, and tracking. Researchers have presented different image processing and artificial-intelligence-based approaches (including machine and deep learning) for person detection and tracking, but mainly from a frontal-view camera perspective. A real-time person tracking and segmentation system using an overhead camera perspective is introduced in this work. The system applies a deep learning algorithm, SiamMask: a simple, versatile, and fast method that surpasses other real-time tracking algorithms. The algorithm also segments the target person by attaching a mask branch to the fully convolutional Siamese network used for tracking. First, person video sequences are obtained from an overhead perspective, and additional training is performed with the help of transfer learning. Finally, a comparison is performed with other tracking algorithms. The SiamMask algorithm delivers good results, with a tracking accuracy of 95%.

Journal ArticleDOI
TL;DR: The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input images and a full 8-bit fixed-point datapath to improve hardware resource utilization; layer fusion technology that merges the convolution, batch normalization and Leaky-ReLU is developed to avoid transmission of intermediate data between FPGA and external memory.
Abstract: In recent years, dedicated hardware accelerators for the acceleration of the convolutional neural network (CNN) have been extensively studied. Although many studies have presented efficient designs on FPGAs for image classification neural network models such as AlexNet and VGG, there are still few implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on an Arria-10 GX1150 FPGA. The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input images and a full 8-bit fixed-point datapath to improve hardware resource utilization. Layer fusion technology that merges the convolution, batch normalization and Leaky-ReLU is also developed to avoid transmission of intermediate data between the FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s at a working frequency of 190 MHz. The accelerator can execute YOLOv2 inference ( $$288\times 288$$ resolution) and tiny YOLOv2 ( $$416\times 416$$ resolution) at 35 and 71 FPS, respectively.
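The layer fusion described in the abstract merges convolution, batch normalization, and Leaky-ReLU into one on-chip operation. A minimal sketch of the batch-norm folding step (a 1-D "convolution" modeled as a matrix multiply; shapes and values are illustrative, not the paper's FPGA datapath):

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(y) = gamma*(y-mean)/sqrt(var+eps)+beta into conv weights w
    and bias b. w has shape (out_ch, ...) with one BN parameter set per
    output channel, so no separate BN pass is needed at inference time."""
    scale = gamma / np.sqrt(var + eps)            # per-output-channel scale
    w_fused = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

# Sanity check: conv(x) followed by BN equals the single fused conv.
rng = np.random.default_rng(0)
out_ch, in_dim = 4, 8
w = rng.normal(size=(out_ch, in_dim)); b = rng.normal(size=out_ch)
gamma, beta = rng.normal(size=out_ch), rng.normal(size=out_ch)
mean, var = rng.normal(size=out_ch), rng.uniform(0.5, 2.0, size=out_ch)

x = rng.normal(size=in_dim)
y = w @ x + b                                     # "convolution" as a matmul
bn = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta

wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
assert np.allclose(bn, wf @ x + bf)
```

On hardware this removes one full pass over the feature map, which is why the paper fuses the three layers before mapping them to the FPGA pipeline.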

Journal ArticleDOI
TL;DR: An object detection architecture that seeks to address degrading performance at increased Intersection over Union (IoU) thresholds is used; taking advantage of transfer learning, the architecture is trained on overhead person images, and the newly trained feature layer is added to the existing architecture.
Abstract: Internet of things (IoT) is transforming technological evolution in several practical applications. These applications range from smart cities and smart healthcare to intelligent video surveillance, where the primary interest is person monitoring and detection. The amalgamation of Artificial Intelligence (AI) and IoT-based techniques maintains a balance between computational cost and efficiency that is essential for next-generation IoT networks. In this context, a real-time IoT-enabled people detection system is introduced. The developed system performs the image processing task over the cloud using an internet connection, thus reducing the computational cost of processing high-resolution images. For person detection, a pre-trained Cascade RCNN, a deep learning approach, is used. It is an object detection architecture that seeks to address degrading performance at increased Intersection over Union (IoU) thresholds. As the architecture is pre-trained on the COCO data set and the person body's appearance in the overhead perspective is significantly different, additional training is performed to enhance the detection results. Taking advantage of transfer learning, the architecture is trained on overhead person images, and the newly trained feature layer is added to the existing architecture. Experimental outcomes reveal that the additional training increases the detection architecture's performance, with an accuracy rate of 0.96.
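Cascade RCNN's stages label proposals as positives against progressively stricter IoU thresholds. A small sketch of the underlying IoU test (the thresholds 0.5/0.6/0.7 are the values commonly cited for Cascade RCNN, assumed here for illustration; the box coordinates are made up):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Each cascade stage applies a stricter threshold when deciding
# whether a proposal counts as a positive training sample.
proposal, ground_truth = (10, 10, 50, 50), (15, 15, 55, 55)
overlap = iou(proposal, ground_truth)            # about 0.62 here
labels = [overlap >= t for t in (0.5, 0.6, 0.7)]  # [True, True, False]
```

A proposal that is "good enough" for the first stage can thus still be rejected by a later, stricter stage, which is the mechanism the abstract's "increased IoU thresholds" refers to.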

Journal ArticleDOI
TL;DR: In this article, a hybrid and multi-level digital image denoising approach (MLAC) using a convolutional neural network (CNN) and anisotropic diffusion (AD) is proposed.
Abstract: The elimination of noisy content from digital images is one of the major issues during image pre-processing. The processes of image acquisition, compression, and transmission are a major source of image noise that causes loss of information. This loss of information causes irregularities and errors in many real-time applications, such as computerized photography, hurdle detection and traffic monitoring (computer vision), automatic character recognition, morphing, and surveillance applications. This paper proposes a new hybrid, multi-level digital image denoising approach (MLAC) using a convolutional neural network (CNN) and anisotropic diffusion (AD). The denoising approach uses a hybrid combination of CNN and AD in a multi-level implementation. First, the CNN is applied to noisy images for noise elimination, which yields a denoised image at the first level. The denoised image is then passed to AD at the second level, where AD is applied for edge and corner preservation of objects. This hybrid approach is highly efficient in removing noise while preserving fine details of the image. The proposed denoising method is evaluated on all standard built-in image datasets of the Matlab framework and tested on SAR images as well. The results are compared with those of some of the latest works in the field of CNN and AD. The quality of the denoised image is assessed using naked-eye visual analysis and quantitative metrics such as peak signal-to-noise ratio (PSNR), structural similarity index metric (SSIM), universal image quality index (UIQI), feature similarity index metric (FSIM), equivalent numbers of looks (ENL), noise variance (NV), and mean-squared error (MSE). The denoising results are further critically analyzed using the zooming analysis method, histogram plots, comparative real-time implementation aspects, and time complexity evaluation. The detailed study of the results confirms that the proposed approach gives excellent results in terms of structure, edge preservation, and noise suppression.
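The second-level AD stage can be sketched with the classic Perona-Malik update, in which a conductance function suppresses diffusion across large gradients so edges survive while flat regions are smoothed (the paper's exact AD variant and parameters are not given; kappa, lam, and the iteration count below are illustrative):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=30.0, lam=0.2):
    """Perona-Malik diffusion: smooths flat regions while preserving
    edges, since the conductance g falls where gradients are large.
    lam <= 0.25 keeps the explicit 4-neighbour update stable."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours (wrap-around edges).
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u,  1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u,  1, 1) - u
        g = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping function
        u += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

# Noisy step edge: diffusion reduces noise on each flat side
# while the conductance keeps the step itself sharp.
rng = np.random.default_rng(1)
img = np.zeros((32, 32)); img[:, 16:] = 100.0
noisy = img + rng.normal(0, 5, img.shape)
den = anisotropic_diffusion(noisy)
```

In the paper's pipeline this stage receives the CNN output rather than the raw noisy image; the sketch applies it directly to noise only to make the edge-preserving behavior visible.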

Journal ArticleDOI
TL;DR: The proposed crypt-watermarking system offers a good solution against brute-force attack by producing a huge key space of $$2^{768}$$ , and achieves a good efficiency value of 0.19 MHz/LUT in terms of FPGA resource consumption and speed, making the system a reliable choice for sensitive real-time embedded applications.
Abstract: In this paper, a new approach for designing an invisible, non-blind, full crypto-watermarking system targeting image security on an FPGA platform is presented. The new design is based on the hardware-software co-design approach using the High-Level Synthesis (HLS) tool of Xilinx, which allows a good compromise between development time and performance. For better authentication and robustness of the proposed system, the Discrete Wavelet Transform (DWT) is employed. To further enhance the security level, a newly proposed chaos-based generator is integrated into a stream cipher algorithm to encrypt and decrypt the watermark during the insertion and extraction phases. This approach secures access to the watermark positions and distributes the watermark evenly throughout the image. Three novel customized Intellectual Property (IP) cores designed under the HLS tool, implementing the Haar DWT and the new chaos-based key generator, have been generated, tested, and validated. The generated Register Transfer Level IP (RTL-IP) cores are integrated into a Vivado library that achieves real-time secured watermarking operations for both the embedding and extraction processes. The system has been evaluated using the main metrics: imperceptibility of the produced watermarked images, achieving a Peak Signal-to-Noise Ratio (PSNR) of 47 dB, and robustness against most geometric and image processing attacks, achieving a Normalized Cross-Correlation (NCC) of 0.99. The proposed crypt-watermarking system offers a good solution against brute-force attack by producing a huge key space of $$2^{768}$$ . Finally, the implementation offers a good efficiency value of 0.19 MHz/LUT in terms of FPGA resource consumption and speed, making the system a reliable choice for sensitive real-time embedded applications.
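The chaos-based stream cipher idea can be sketched with a logistic map standing in for the paper's (unspecified) generator: the map's trajectory is quantized into a byte keystream, and XOR with that keystream both encrypts and decrypts the watermark. The seed values, map constant, and watermark bytes below are all illustrative:

```python
def logistic_keystream(x0, r, n):
    """Byte keystream from the logistic map x -> r*x*(1-x).
    Stands in for the paper's chaos-based generator; the real design
    uses a much larger (2^768) key space than this single float seed."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) & 0xFF)
    return bytes(out)

def xor_stream(data, key):
    """XOR stream cipher: applying the same keystream twice is identity."""
    return bytes(d ^ k for d, k in zip(data, key))

watermark = b"OWNER-ID-1234"                      # hypothetical watermark bits
key = logistic_keystream(x0=0.3141, r=3.99, n=len(watermark))
cipher = xor_stream(watermark, key)               # inserted into the image
plain = xor_stream(cipher, key)                   # recovered at extraction
```

Because the keystream depends sensitively on the seed, only a holder of the exact key parameters can regenerate it at the extraction phase, which is the property the abstract's encryption step relies on.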

Journal ArticleDOI
Jun Ma1, Honglin Wan1, Junxia Wang1, Hao Xia1, Chengjie Bai1 
TL;DR: This paper employs a one-stage object detection framework and proposes a pedestrian detection method based on the multi-scale attention mechanism of a convolutional neural network to improve the imbalance between accuracy and speed.
Abstract: In recent years, the performance of the convolutional neural network-based pedestrian detection method has improved significantly. However, an imbalance remains between detection accuracy and speed. In this paper, we employ a one-stage object detection framework and propose a pedestrian detection method based on the multi-scale attention mechanism of a convolutional neural network to improve the imbalance between accuracy and speed. First, a multi-scale convolution module is designed to extract corresponding features at different scales. Second, using the attention module, association information between features is mined from spatial and channel perspectives to strengthen the original features. Then, the enhanced features are passed through a classification and regression module to perform object positioning and bounding box regression. Finally, to learn more pedestrian location information, we improve the loss function to realise better network training. The proposed method achieves competitive results on the challenging CityPersons and Caltech pedestrian detection datasets.
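The channel-perspective half of such an attention module can be sketched in squeeze-and-excitation style: global average pooling per channel, a small bottleneck MLP, and sigmoid gates that rescale each channel. This is illustrative only; the paper's exact attention design, weight shapes (`w1`, `w2`), and reduction ratio are assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel attention on a C x H x W feature map:
    squeeze (global average pool) -> ReLU bottleneck -> sigmoid gates."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)        # one value per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)            # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid in (0, 1)
    return feat * gates.reshape(c, 1, 1)              # rescale channels

rng = np.random.default_rng(2)
feat = rng.normal(size=(8, 16, 16))                   # C x H x W feature map
w1 = rng.normal(size=(2, 8))                          # reduction 8 -> 2
w2 = rng.normal(size=(8, 2))                          # expansion 2 -> 8
out = channel_attention(feat, w1, w2)
```

Since each gate lies in (0, 1), the module can only attenuate or preserve a channel, never amplify it; informative channels are kept near full strength while uninformative ones are suppressed.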

Journal ArticleDOI
TL;DR: A novel real-time traffic sign detection system with a lightweight backbone network named Depth Separable DetNet (DS-DetNet) and a lite fusion feature pyramid network (LFFPN) for efficient feature fusion is proposed.
Abstract: Traffic sign detection (TSD) using convolutional neural networks (CNN) is promising and intriguing for autonomous driving. In particular, with sophisticated large-scale CNN models, TSD can be performed with high accuracy. However, conventional CNN models suffer the drawbacks of being time-consuming and resource-hungry, which limits their application and deployment on platforms of limited resources. In this paper, we propose a novel real-time traffic sign detection system with a lightweight backbone network named Depth Separable DetNet (DS-DetNet) and a lite fusion feature pyramid network (LFFPN) for efficient feature fusion. The new model can achieve a performance trade-off between speed and accuracy using a depthwise separable bottleneck block, a lite fusion module, and an improved SSD detection front-end. The testing results on the MS COCO and GTSDB datasets show that the model achieves 23.1% mAP with 6.39 M parameters and only 1.08B FLOPs on MS COCO, and 81.35% mAP with 5.78 M parameters on GTSDB. With our model, the run speed is 61 frames per second (fps) on a GTX 1080ti, 12 fps on an Nvidia Jetson Nano, and 16 fps on an Nvidia Jetson Xavier NX.
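The parameter saving behind the depthwise separable bottleneck can be computed directly: a standard k x k convolution mixes all channels at once, while the separable form does a per-channel k x k pass followed by a 1 x 1 pointwise mix. The channel sizes below are illustrative, not taken from DS-DetNet:

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) plus
    pointwise (1 x 1 conv mixing channels into c_out)."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 256, 3
std = standard_conv_params(c_in, c_out, k)        # 294912 weights
sep = depthwise_separable_params(c_in, c_out, k)  # 33920 weights
ratio = std / sep                                 # roughly 8.7x fewer
```

The saving approaches k^2 (here 9x) as channel counts grow, which is what lets models like DS-DetNet stay under a few million parameters while keeping accuracy usable.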

Journal ArticleDOI
TL;DR: An embedded system that operates on low-level, lightweight algorithms, based on two types of data, namely, radar signals and camera images with the purpose of identifying and classifying obstacles on the road is proposed.
Abstract: Road safety is an essential issue of modern life that must be tackled and resolved. Using AI technology to develop autonomous vehicles and driver-assistance systems is a promising approach to reduce accidents and preserve users' safety. In this regard, obstacle detection and identification have been topics of much concern for researchers over the last few years. In this paper, we propose an embedded system that operates on low-level, lightweight algorithms based on two types of data, namely radar signals and camera images, with the purpose of identifying and classifying obstacles on the road. The proposed system has two major contributions. The first is the use of machine-learning methods alongside signal processing techniques to optimize the overall computing performance and efficiency. The second is the use of the dynamic reconfiguration feature with DSP48 slices instead of standard CLBs to improve area usage. The overall system was developed on a Xilinx Zedboard Zynq-7000 FPGA.

Journal ArticleDOI
TL;DR: A novel smartphone-based architecture intended for portable and constrained systems is designed and implemented to run CNN-based object recognition in real time and with high efficiency.
Abstract: Machine learning algorithms based on convolutional neural networks (CNNs) have recently been explored in a myriad of object detection applications. Nonetheless, many devices with limited computation resources and strict power consumption constraints are not suitable to run such algorithms designed for high-performance computers. Hence, a novel smartphone-based architecture intended for portable and constrained systems is designed and implemented to run CNN-based object recognition in real time and with high efficiency. The system is designed and optimised by integrating the best components of state-of-the-art machine learning platforms, including OpenCV, TensorFlow Lite, and Qualcomm Snapdragon, informed by empirical testing and evaluation of each candidate framework in a comparable scenario with a highly demanding neural network. The final system has been prototyped by combining the strengths of these frameworks, leading to a new machine learning-based object recognition execution environment embedded in a smartphone, with advantageous performance compared with the previous frameworks.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a real-time method for spillway tunnel defect detection (STDD) using deep learning, which can provide reliable support for the structure safety evaluation.
Abstract: A spillway tunnel eroded by high-speed water for a long time is prone to rebar-exposed defects. Therefore, regular defect detection is very important for the safety of the hydropower station. Traditionally, images of the spillway tunnel are obtained by erecting scaffolding, and the defects are then recognized manually. This method has disadvantages such as high risk, inefficiency, time consumption, and strong subjectivity. To improve the efficiency of defect detection, a real-time method is proposed for spillway tunnel defect detection (STDD) using deep learning. First, images of a spillway tunnel are collected by an Unmanned Aerial Vehicle (UAV) system, and the raw images are cropped and labeled to create a dataset of rebar-exposed defects. Then, the lightweight STDD network is developed using separable convolution and asymmetric convolution, and the network is trained and tested on the dataset. To evaluate the performance of the STDD network, a comparative experiment is conducted with other networks. The results show that the STDD network has better detection performance. For defect segmentation, the recall, precision, F1 and mean intersection over union (mIoU) are 89.92%, 93.48%, 91.59%, and 91.73%, respectively. The STDD network has 1.7 M parameters, and the average inference time is 14.08 ms. In summary, the proposed STDD network achieves accurate and real-time defect detection for spillway tunnels, which can provide reliable support for structure safety evaluation.
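The asymmetric convolution mentioned above factorizes a k x k kernel into a k x 1 pass followed by a 1 x k pass, using 2k instead of k^2 weights. For a rank-1 (outer-product) kernel the two are numerically identical, which a small check can verify (this is a generic illustration, not the STDD implementation; `conv2d_valid` is a naive cross-correlation written here for the demo):

```python
import numpy as np

def conv2d_valid(img, ker):
    """Naive 'valid' 2-D cross-correlation, just for the demonstration."""
    kh, kw = ker.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

# Asymmetric convolution: a 3 x 1 pass then a 1 x 3 pass equals one
# 3 x 3 conv whose kernel is the outer product of the two vectors,
# at 6 weights instead of 9.
rng = np.random.default_rng(3)
img = rng.normal(size=(10, 10))
col = rng.normal(size=(3, 1))
row = rng.normal(size=(1, 3))
two_pass = conv2d_valid(conv2d_valid(img, col), row)
one_pass = conv2d_valid(img, col @ row)           # outer-product kernel
```

Real networks typically use the 1 x k / k x 1 pair with nonlinearities in between rather than enforcing an exact rank-1 factorization, trading a little expressiveness for the parameter and compute savings.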

Journal ArticleDOI
TL;DR: A nighttime object detection scheme based on a lightweight deep learning model in the edge computing mode that can achieve real-time and high-accuracy object detection on edge devices is proposed.
Abstract: Autonomous driving systems in internet of vehicles (IoV) applications usually adopt a cloud computing mode. In these systems, information gathered at the edge is sent to the cloud computing center for data analysis and situation response. However, the conventional IoV faces enormous challenges in meeting the storage, communication, and computing requirements imposed by the considerable amount of information on the traffic environment. Moreover, environment perception during the nighttime is poorer than during the daytime, a problem that also requires addressing. To solve these problems, we propose a nighttime object detection scheme based on a lightweight deep learning model in the edge computing mode. First, pedestrian and vehicle detection algorithms that use thermal images are built on the YOLO architecture. Through the designed lightweight strategy, the model can be implemented on edge devices to achieve real-time detection. Next, spatial and temporal prior information is introduced into the detection algorithm, and the frames are divided into key and non-key frames to increase the performance and speed of the system simultaneously. Finally, we implemented the detection network on the Jetson TX2 edge device for performance and feasibility verification. The experimental results show that the proposed system achieves real-time, high-accuracy object detection on edge devices.
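The key/non-key frame division can be sketched as a simple schedule: the full detector runs only on key frames, and non-key frames reuse the latest detections as spatial/temporal priors. The interval of 5 and the placeholder `detect(...)` string are illustrative; the paper does not specify its key-frame policy:

```python
def schedule_frames(n_frames, key_interval=5):
    """Mark every key_interval-th frame as a key frame; non-key frames
    carry forward the last key frame's detections as priors, so the
    heavy YOLO pass runs only on a fraction of the stream."""
    plan = []
    last_detections = None
    for t in range(n_frames):
        if t % key_interval == 0:
            last_detections = f"detect(frame_{t})"      # full detector pass
            plan.append(("key", t, last_detections))
        else:
            plan.append(("nonkey", t, last_detections))  # propagate priors
    return plan

plan = schedule_frames(12, key_interval=5)
n_detector_runs = sum(1 for kind, _, _ in plan if kind == "key")
# 12 frames but only 3 detector passes (frames 0, 5, 10)
```

In a real system the non-key branch would refine the propagated boxes with a cheap tracker or motion model rather than reusing them verbatim; the schedule itself is what buys the speedup on a device like the Jetson TX2.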