
Showing papers in "Signal, Image and Video Processing in 2020"


Journal ArticleDOI
TL;DR: This research investigates distracted driver posture recognition as a part of the human action recognition framework through a combination of three of the most advanced techniques in deep learning, namely the inception module with a residual block and a hierarchical recurrent neural network.
Abstract: One of the most challenging topics in the field of intelligent transportation systems is the automatic interpretation of the driver’s behavior. This research investigates distracted driver posture recognition as a part of the human action recognition framework. Numerous car accidents have been reported that were caused by distracted drivers. Our aim was to improve the performance of detecting drivers’ distracted actions. The developed system involves a dashboard camera capable of detecting distracted drivers through 2D camera images. We use a combination of three of the most advanced techniques in deep learning, namely the inception module with a residual block and a hierarchical recurrent neural network to enhance the performance of detecting the distracted behaviors of drivers. The proposed method yields very good results. The distracted driver behaviors include texting, talking on the phone, operating the radio, drinking, reaching behind, fixing hair and makeup, and talking to the passenger.
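
The listing gives no implementation details; as a rough, hedged illustration of how an inception module can be wrapped in a residual block, the PyTorch sketch below uses assumed layer widths and kernel sizes that are not the authors' configuration (the hierarchical recurrent part is omitted).

```python
import torch
import torch.nn as nn

class InceptionResidualBlock(nn.Module):
    """Toy inception module with a residual (skip) connection."""
    def __init__(self, channels=64):
        super().__init__()
        # Three parallel branches with different receptive fields
        self.branch1 = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.Conv2d(channels // 4, channels // 4, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.Conv2d(channels // 4, channels // 4, kernel_size=5, padding=2))
        # 1x1 projection so the concatenated branches match the input width
        self.project = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
        out = self.project(out)
        return self.relu(out + x)   # residual connection

block = InceptionResidualBlock(64)
frames = torch.randn(1, 64, 56, 56)   # stand-in feature map from a dashboard-camera frame
print(block(frames).shape)            # torch.Size([1, 64, 56, 56])
```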

56 citations


Journal ArticleDOI
TL;DR: This paper constructs a dot product-like operation from the mf-operator and uses it to define dense and convolutional feed-forwarding passes in AddNet, a computationally efficient additive deep neural network based on a multiplication-free vector operator.
Abstract: In this paper, we introduce a video-based wildfire detection scheme based on a computationally efficient additive deep neural network, which we call AddNet. AddNet is based on a multiplication-free vector operator, which performs only addition and sign manipulation operations. In this regard, we construct a dot product-like operation from the mf-operator and use it to define dense and convolutional feed-forwarding passes in AddNet. We train AddNet on images taken from forestry surveillance cameras. Our experiments show that AddNet can achieve a time saving of 12.4% compared to an equivalent regular convolutional neural network (CNN). Furthermore, the smoke recognition performance of AddNet is as good as that of regular CNNs and substantially better than that of binary-weight neural networks.
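
The mf-operator is not spelled out in the abstract; in the related multiplication-free network literature it is commonly defined as a ⊕ b = sign(a·b)(|a| + |b|). Under that assumption, a minimal NumPy sketch of a dot-product-like dense pass looks as follows (not the authors' implementation).

```python
import numpy as np

def mf_dot(x, W):
    """Dot-product-like dense pass built from an assumed multiplication-free operator:
    a (+) b = sign(a*b) * (|a| + |b|), accumulated by summation instead of a*b."""
    # x: (n_features,), W: (n_features, n_units)
    # sign(x*w) would be computed by sign comparison (no multiply) in hardware
    contrib = np.sign(x[:, None] * W) * (np.abs(x)[:, None] + np.abs(W))
    return contrib.sum(axis=0)

x = np.random.randn(128)          # e.g. a flattened image patch
W = np.random.randn(128, 10)      # dense layer weights
print(mf_dot(x, W).shape)         # (10,)
```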

50 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed adaptive image restoration method obtains better restorations both visually and quantitatively, and avoids the staircasing artifacts associated with TV regularization and its variant models.
Abstract: The total variation (TV) regularization model for image restoration is widely utilized due to its edge preservation properties. Despite its advantages, TV regularization can produce spurious oscillations in flat regions of digital images, and thus recent works advocate high-order TV regularization models. In this work, we propose an adaptive image restoration method based on a combination of first-order and second-order total variation regularization with an inverse-gradient-based adaptive parameter. The proposed model removes noise effectively and preserves image structures. Due to the adaptive parameter estimation based on the inverse gradient, it avoids the staircasing artifacts associated with TV regularization and its variant models. Experimental results indicate that the proposed method obtains better restorations both visually and quantitatively. In particular, our adaptive higher-order TV method obtained PSNR, SSIM, MS-SSIM, F-SIM, and P-SIM values of (19.3159, 0.7172, 0.90985, 0.79934, 0.99838), compared to related models such as the TV-Bounded Hessian (18.9735, 0.6599, 0.8718, 0.73833, 0.99767) and TV-Laplacian (19.0345, 0.6719, 0.88198, 0.75405, 0.99789).

49 citations


Journal ArticleDOI
TL;DR: A new blind image forgery detection technique which employs a new backbone architecture for deep learning which is called ResNet-conv, specifically designed to learn discriminative artifacts from tampered regions.
Abstract: Digital images have become a dominant source of information and means of communication in our society. However, they can easily be altered using readily available image editing tools. In this paper, we propose a new blind image forgery detection technique which employs a new backbone architecture for deep learning called ResNet-conv. ResNet-conv is obtained by replacing the feature pyramid network in ResNet-FPN with a set of convolutional layers. This new backbone is used to generate the initial feature map, which is then used to train the Mask-RCNN to generate masks for spliced regions in forged images. The proposed network is specifically designed to learn discriminative artifacts from tampered regions. Two different ResNet architectures are considered, namely ResNet-50 and ResNet-101. The ImageNet, He_normal, and Xavier_normal initialization techniques are employed and compared based on convergence. To train a robust model for this architecture, several post-processing techniques are applied to the input images. The proposed network is trained and evaluated using a computer-generated image splicing dataset and found to be more efficient than other techniques.

45 citations


Journal ArticleDOI
TL;DR: A computer-vision-based system that can assist the radiologists by analyzing the radiological symptoms in knee x-rays for osteoarthritis is presented, achieving more than 97% detection accuracy.
Abstract: Knee issues are very frequent among people of all ages, and osteoarthritis is one of the most common reasons behind them. The primary feature in assessing the extent and advancement of osteoarthritis is joint space narrowing (cartilage loss), which is manually computed on knee x-rays by a radiologist. Such manual inspection requires an expert radiologist to analyze the x-ray image; moreover, it is a tedious and time-consuming task. In this paper, we present a computer-vision-based system that can assist radiologists by analyzing the radiological symptoms of osteoarthritis in knee x-rays. Different image processing techniques have been applied to knee radiographs to enhance their quality. The knee region is extracted automatically using template matching. The knee joint space width is calculated, and the radiographs are classified based on comparison with the standard normal knee joint space width. The experimental evaluation performed on a large knee x-ray dataset shows that our method is able to efficiently detect osteoarthritis, achieving more than 97% detection accuracy.
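
A minimal OpenCV sketch of the template-matching step for extracting the knee region; the file names, preprocessing, and score handling are placeholders rather than the paper's actual pipeline.

```python
import cv2

# Load a knee radiograph and a previously cropped knee-joint template (paths are placeholders)
xray = cv2.imread("knee_xray.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("knee_template.png", cv2.IMREAD_GRAYSCALE)

# Basic contrast enhancement before matching (one of several possible preprocessing steps)
xray_eq = cv2.equalizeHist(xray)
template_eq = cv2.equalizeHist(template)

# Normalised cross-correlation template matching
result = cv2.matchTemplate(xray_eq, template_eq, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
x, y = max_loc
knee_roi = xray[y:y + h, x:x + w]   # extracted knee-joint region for joint-space measurement
print("match score:", max_val, "ROI shape:", knee_roi.shape)
```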

31 citations


Journal ArticleDOI
TL;DR: Preliminary results show the promise of the proposed age-invariant face recognition system for personal identification despite the aging process, outperforming current state-of-the-art techniques on the same data.
Abstract: Age-invariant face recognition is one of the most crucial computer vision problems, e.g., in passport verification, surveillance systems, and missing individuals identification. The extraction of robust face features is a challenge since facial characteristics change with age progression. In this paper, an age-invariant face recognition system is proposed, which includes four stages: preprocessing, feature extraction, feature fusion, and classification. The preprocessing stage detects faces using the Viola–Jones algorithm and frontal face alignment. Feature extraction is achieved using a CNN architecture based on the VGG-Face model to extract compact face features. Extracted features are fused using real-time feature-level multi-discriminant correlation analysis, which significantly reduces feature dimensions and yields the features most relevant to age-invariant face recognition. Finally, K-nearest neighbor and support vector machine are investigated for classification. Our experiments are performed on two standard face-aging datasets, namely FGNET and MORPH. The rank-1 recognition accuracy of the proposed system is 81.5% on FGNET and 96.5% on MORPH, outperforming current state-of-the-art techniques on the same data. These preliminary results show the promise of the proposed system for personal identification despite the aging process.

28 citations


Journal ArticleDOI
TL;DR: Experimental results and topographies show that the CSP spatial filtering method reveals the relationship between EEG bands, EEG channels, neural efficiency and emotional stimuli types.
Abstract: The application of EEG-based emotional states is one of the most vital phases in the context of neural response decoding. Emotional responses mostly appear in the presence of visual, auditory, tactile, and gustatory arousals. In our work, we use visual stimuli to evaluate the emotional feedback. One of the best performing methods in emotion estimation applications is common spatial patterns (CSP). We implement the CSP method in addition to conventional Welch power spectral density-based analysis. Experimental results and topographies on the collected EEG data show that the CSP spatial filtering method reveals the relationship between EEG bands, EEG channels, neural efficiency and emotional stimuli types.
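
For reference, the standard two-class CSP computation can be written compactly with NumPy/SciPy; the trial shapes, number of filters, and log-variance features below are common conventions, not necessarily the authors' exact settings.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    """Common spatial patterns for two classes of EEG trials.
    trials_*: array of (n_trials, n_channels, n_samples) band-pass-filtered epochs."""
    def mean_cov(trials):
        covs = []
        for x in trials:
            c = x @ x.T
            covs.append(c / np.trace(c))      # normalised spatial covariance
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalised eigenvalue problem: Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)                  # extremes discriminate the two classes best
    picks = np.concatenate([order[:n_filters // 2], order[-n_filters // 2:]])
    return vecs[:, picks].T                   # (n_filters, n_channels)

# Toy usage with random data standing in for two emotional-stimulus conditions
rng = np.random.default_rng(0)
class_a = rng.standard_normal((20, 32, 512))
class_b = rng.standard_normal((20, 32, 512))
W = csp_filters(class_a, class_b)
features = np.log(np.var(W @ class_a[0], axis=1))   # typical log-variance CSP features
print(W.shape, features.shape)
```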

28 citations


Journal ArticleDOI
TL;DR: Various new feature extraction methods employing a long short-term memory (LSTM) network model are presented in this paper, which help in the detection of heart rhythms from electrocardiogram signals.
Abstract: The high mortality rate prevailing among cardiac patients can be reduced to some extent through early detection of heart-related diseases, which can be done with the help of automated computer-aided diagnosis systems. There is a need for an expert system that automatically detects abnormalities in heart rhythms. Various new feature extraction methods employing a long short-term memory (LSTM) network model are presented in this paper, which help in the detection of heart rhythms from electrocardiogram signals. Based on higher-order statistics, wavelets, morphological descriptors, and R–R intervals, the electrocardiogram signals are decomposed into 45 features. All these features are used as a sequence input to a single LSTM model. The publicly available MIT-BIH arrhythmia database has been used for training and testing. The proposed model classifies five distinct arrhythmic rhythms (including normal beats). Performance evaluation of the proposed system yielded a precision of 96.73%, accuracy of 99.37%, specificity of 99.14%, F-score of 95.77%, and sensitivity of 94.89%.
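
A minimal PyTorch sketch of feeding the 45 extracted per-beat features as a sequence into a single LSTM and classifying five rhythm classes; the hidden size and toy training step are assumptions.

```python
import torch
import torch.nn as nn

class BeatLSTM(nn.Module):
    """Treat the 45 extracted features of a heartbeat as a length-45 sequence."""
    def __init__(self, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, 45) feature vectors
        x = x.unsqueeze(-1)          # -> (batch, 45, 1): one feature per time step
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])        # logits over the five rhythm classes

model = BeatLSTM()
beats = torch.randn(8, 45)           # a mini-batch of 8 feature vectors
logits = model(beats)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 5, (8,)))
print(logits.shape, loss.item())
```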

27 citations


Journal ArticleDOI
TL;DR: This work applies iterative closest point and kernel principal component analysis with a circular kernel function for feature extraction, combined with perceptual hashing based on empirical mode decomposition into intrinsic mode functions and the fast Chebyshev transform, and a secure authentication approach that exploits the discrete logarithm problem and Bose–Chaudhuri–Hocquenghem error-correcting codes to generate 128-bit crypto keys.
Abstract: Ear biometrics has generated increased interest in the domain of biometric identification systems due to its robustness and covert acquisition potential. The external structure of the human ear has a bilateral symmetry. Here, we analyse ear biometrics based on ear symmetry features. We apply iterative closest point and kernel principal component analysis with a circular kernel function for feature extraction, combined with perceptual hashing based on empirical mode decomposition into intrinsic mode functions and the fast Chebyshev transform, and a secure authentication approach that exploits the discrete logarithm problem and Bose–Chaudhuri–Hocquenghem error-correcting codes to generate 128-bit crypto keys. We evaluate the proposed ear biometric cryptosecurity system using our data set of ear images acquired from 103 persons. Our results show that the ear biometric-based authentication achieved an equal error rate of 0.13 and a true positive rate (TPR) of 0.85.

26 citations


Journal ArticleDOI
Bharat Garg
TL;DR: In this paper, an adaptive trimmed median (ATM) filter is proposed to remove salt-and-pepper (SAP) noise: it computes the median of an adaptively sized trimmed window of noise-free pixels for low-to-medium noise density (ND), and performs a new interpolation-based procedure at high ND.
Abstract: The paper presents a novel adaptive trimmed median (ATM) filter to remove salt-and-pepper (SAP) noise of high noise density (ND). The proposed filter computes the median of a trimmed window of adaptive size containing noise-free pixels (NFP) for ND up to the medium range, while performing a new interpolation-based procedure at high ND. Further, for rare scenarios, especially at the boundary, where denoising using interpolation is not good enough, the proposed filter denoises the candidate pixel with the help of the nearest processed pixels. The proposed ATM filter effectively suppresses SAP noise because denoising mostly utilizes original non-noisy pixels. The proposed algorithm is evaluated for varying ND (10–90%) on different benchmark images (greyscale and coloured) against existing approaches. On average, the proposed ATM filter provides 1.59 dB and 0.37 dB higher PSNR values on greyscale and color images, respectively.
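
The exact ATM rules (window growth limits, interpolation, boundary handling) are not reproduced here; the NumPy sketch below only illustrates the core idea of taking the median over noise-free pixels in an adaptively grown window.

```python
import numpy as np

def trimmed_median_denoise(img, max_win=7):
    """Toy salt-and-pepper denoiser: for each noisy pixel (0 or 255), grow the
    window until it contains noise-free pixels and take their median."""
    out = img.astype(np.float64).copy()
    noisy = (img == 0) | (img == 255)                 # SAP noise assumption
    pad = max_win // 2
    padded = np.pad(img, pad, mode="reflect")
    for i, j in zip(*np.nonzero(noisy)):
        for half in range(1, pad + 1):                # adaptive window size
            win = padded[i + pad - half:i + pad + half + 1,
                         j + pad - half:j + pad + half + 1]
            clean = win[(win != 0) & (win != 255)]    # trimmed: keep noise-free pixels only
            if clean.size:
                out[i, j] = np.median(clean)
                break
        else:                                         # fallback when no clean pixel is found
            out[i, j] = np.median(padded[i:i + 2 * pad + 1, j:j + 2 * pad + 1])
    return out.astype(img.dtype)

noisy = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
print(trimmed_median_denoise(noisy).shape)
```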

26 citations


Journal ArticleDOI
TL;DR: This study presents a novel full-reference image quality assessment algorithm relying on a Siamese layout of pretrained convolutional neural networks, feature pooling, and a neural network that outperforms other state-of-the-art algorithms.
Abstract: Image quality assessment is an important element of a broad spectrum of applications ranging from automatic video streaming to display technology. In this study, we present a novel full-reference image quality assessment algorithm relying on a Siamese layout of pretrained convolutional neural networks (CNNs), feature pooling, and a neural network. Unlike previous methods, our algorithm handles input images without resizing, cropping, or any modifications. As a consequence, it effectively learns the fine-grained, quality-aware features of images. The proposed model derives its core performance from pretrained CNNs, being trained at a higher resolution than that in previous works. The presented architecture was trained on the recently published KADID-10k, which is the largest image quality database and contains 10,125 digital images. Experimental results on KADID-10k demonstrate that the proposed method outperforms other state-of-the-art algorithms. These results are also confirmed with cross-database tests using other publicly available datasets.
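
A hedged PyTorch sketch of the general design: a shared pretrained backbone applied to the reference and distorted images, global feature pooling, and a small regression head. The VGG-16 backbone, pooling, and fusion used below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class SiameseIQA(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.backbone = vgg.features            # shared (Siamese) feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)     # global pooling: works for any input size
        self.head = nn.Sequential(
            nn.Linear(512 * 2, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, ref, dist):
        f_ref = self.pool(self.backbone(ref)).flatten(1)
        f_dist = self.pool(self.backbone(dist)).flatten(1)
        fused = torch.cat([torch.abs(f_ref - f_dist), f_ref * f_dist], dim=1)
        return self.head(fused)                 # predicted quality score

model = SiameseIQA().eval()
ref = torch.randn(1, 3, 384, 512)               # full-resolution input, no resizing or cropping
dist = torch.randn(1, 3, 384, 512)
with torch.no_grad():
    print(model(ref, dist).item())
```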

Journal ArticleDOI
TL;DR: A combined second-order non-convex total variation with overlapping group sparse regularizer for staircase artifact removal is proposed and the results from peak signal-to-noise ratio and structural similarity index measure show the effectiveness of the proposed method when compared to the mentioned algorithms.
Abstract: Total variation is a very popular method for image restoration, yet it produces undesirable staircase artifacts. In this paper, a combined second-order non-convex total variation with an overlapping group sparse regularizer for staircase artifact removal is proposed. The non-convex higher-order TV regularizer is introduced to model the observation that the staircase effect must be appropriately smoothed out in the restored image while preserving its edges. However, using the non-convex higher-order TV alone tends to smooth the image while amplifying speckle artifacts. To deal with this, the overlapping group sparse regularizer is added to balance the effects produced by the non-convex higher-order TV regularizer. An efficient re-weighted $$\ell _1$$ alternating direction method is formulated to solve the corresponding iterative scheme. Comparative analysis with three algorithms, namely overlapping group sparse total variation, total generalized variation and the fast non-smooth non-convex method, with two different blur kernels is carried out. The results from peak signal-to-noise ratio and structural similarity index measure show the effectiveness of our proposed method when compared to the mentioned algorithms.

Journal ArticleDOI
TL;DR: A novel dual-stage approach for abrupt transition detection is proposed which is robust to certain illumination and motion effects.
Abstract: Much research has been done on shot boundary detection, but the performance of shot boundary detection approaches for videos with sudden illumination and object/camera motion effects has yet to be addressed efficiently. In this paper, a novel dual-stage approach for abrupt transition detection is proposed which is robust to certain illumination and motion effects. Firstly, an adaptive Wiener filter is applied to the lightness component of the frame to retain important information at both frequencies, and LBP-HF is extracted to reduce the illumination effect. Experimentation also confirms that the motion effect is reduced in the first stage. Secondly, Canny edge difference is used to further remove the illumination and motion effects not handled in the first stage. The TRECVid 2001 and TRECVid 2007 datasets are used to analyze and validate the proposed algorithm. Experimental results show that the proposed system outperforms state-of-the-art shot boundary detection techniques.

Journal ArticleDOI
TL;DR: A simple method is introduced here to build a dataset for sentence-level Mandarin lipreading from programs like news, speech and talk show and proposes a model that is composed of a 3D convolutional layer with DenseNet and residual bidirectional long short-term memory.
Abstract: Lipreading is the task of recognizing what speakers say from lip movement alone. Most previous work addresses lipreading in English; for Mandarin, there is little research due to the lack of datasets. For that reason, we introduce a simple method to build a dataset for sentence-level Mandarin lipreading from programs such as news, speeches and talk shows. We use Hanyu Pinyin (a phonemic transcription of Chinese) as the label, giving 349 classes in total, while the number of Chinese characters in our dataset is 1705. Lipreading therefore proceeds in two steps. The first step is to obtain the Hanyu Pinyin sequence. We propose a model composed of a 3D convolutional layer with DenseNet and residual bidirectional long short-term memory. After this, in order to obtain the final Chinese character results, a model with a stack of multi-head attention is applied to convert Hanyu Pinyin into Chinese characters.

Journal ArticleDOI
TL;DR: This paper presents an efficient methodology based on empirical wavelet transform (EWT) to remove cross-terms from the Wigner–Ville distribution (WVD), and normalized Rényi entropy measure is also computed for validating the performance.
Abstract: This paper presents an efficient methodology based on empirical wavelet transform (EWT) to remove cross-terms from the Wigner–Ville distribution (WVD). An EWT-based filter bank method is suggested to remove the cross-terms that occur due to nonlinearity in modulation. Filter bank bandwidths are selected based on mean-square error, and this criterion is applied to boundary selection in the EWT; in this way, a signal-dependent adaptive boundary selection is performed. Thereafter, energy-based segmentation is applied in the time domain to eliminate inter-cross-terms generated between components. Moreover, the WVDs of all the components are added together to produce a complete cross-term-free time–frequency distribution. The proposed method is compared with other existing methods, and the normalized Rényi entropy measure is also computed to validate the performance.

Journal ArticleDOI
TL;DR: It has been found that the performance of the proposed method is better than that of existing CNN-based methods, with an accuracy of 99.84%.
Abstract: The ECG signal is a substantial means of reflecting all the electrical activities of the cardiac system. It is therefore considered by physicians as an essential tool to diagnose and treat heart diseases. To deal with different types of arrhythmia, the physician manually inspects the ECG heartbeat. Since there are tiny alterations in the amplitude, duration and therefore the morphology, computer-based systems are needed to help the physician do this job. In this study, a novel approach to automatically classify ten different arrhythmia types was developed based on deep learning. Consequently, the well-known convolutional neural network (CNN) approach was adopted to classify those types of arrhythmia. The structure of the proposed model consists of 11 layers distributed as follows: four convolution layers interleaved with four max-pooling layers, and finally three fully connected layers. The experiment was conducted with a dataset downloaded from PhysioNet in the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) database and then augmented to obtain a sufficient and balanced dataset. To evaluate the performance of the proposed method and compare it with previous algorithms, the confusion matrix, sensitivity (SEN), specificity (SPE), precision (PRE), area under curve and receiver operating characteristic have been used and calculated. The performance of the proposed method was found to be better than that of existing CNN-based methods, with an accuracy of 99.84%.
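
The paper describes four convolution layers interleaved with four max-pooling layers followed by three fully connected layers; here is a minimal PyTorch sketch of that layout for 1-D heartbeat segments, where the filter counts, kernel sizes, and 256-sample input length are assumptions.

```python
import torch
import torch.nn as nn

class ECGArrhythmiaCNN(nn.Module):
    """4 conv layers interleaved with 4 max-pooling layers, then 3 fully connected layers."""
    def __init__(self, n_classes=10):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2))
        self.features = nn.Sequential(
            block(1, 16), block(16, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16, 128), nn.ReLU(),   # 256 samples / 2^4 pools = 16
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, n_classes))

    def forward(self, x):            # x: (batch, 1, 256) heartbeat segments
        return self.classifier(self.features(x))

model = ECGArrhythmiaCNN()
beats = torch.randn(4, 1, 256)       # assumed 256-sample beat windows
print(model(beats).shape)            # torch.Size([4, 10])
```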

Journal ArticleDOI
TL;DR: It is shown that the total power of extracted brain sources within the BG region in the $$\alpha $$ and $$\beta $$ rhythms can be used effectively to determine the severity of PD.
Abstract: Diagnosis of Parkinson’s disease (PD) in the early stages is very critical for effective treatments. In this paper, we propose a simple and low-cost biomarker to diagnose PD, using the electroencephalography (EEG) signals. In the proposed method, EEG is used to detect the brain electrical activities in internal regions of brain, e.g., basal ganglia (BG). Based on the high correlation between PD and brain activities in the BG, the proposed method provides a highly accurate PD diagnostic measure. Moreover, we obtain a quantitative measure of the disease severity, using the spectral analysis of extracted brain sources. The proposed method is denoted by Parkinson’s disease stage detection (PDSD). The PDSD includes brain sources separation and localization steps. The accuracy of the method in detection and quantification of PD is evaluated and verified by using information of ten patients and ten healthy people. The results show that there is a significant difference in the number of brain sources within the BG region, as well as their power spectral density, between healthy cases and patients. The accuracy and the cross-validation error of PDSD to detect PD are 95% and 6.25%, respectively. Furthermore, it is shown that the total power of extracted brain sources within the BG region in the $$\alpha $$ and $$\beta $$ rhythms can be used effectively to determine the severity of PD.
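
The source separation and localisation steps are beyond a short example, but the spectral part (total power of a separated source in the α and β bands) can be sketched with SciPy; the band limits and sampling rate below are standard assumptions.

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

def band_power(signal, fs, band):
    """Integrate the Welch PSD of a 1-D signal over a frequency band."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return trapezoid(psd[mask], freqs[mask])

fs = 250                                       # assumed EEG sampling rate (Hz)
source = np.random.randn(fs * 60)              # stand-in for one extracted brain source
alpha = band_power(source, fs, (8, 13))        # alpha rhythm, 8-13 Hz
beta = band_power(source, fs, (13, 30))        # beta rhythm, 13-30 Hz
print(alpha, beta)
```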

Journal ArticleDOI
TL;DR: An experimental study on the well-known WIDER Face and FDDB databases proved the efficiency as well as the feasibility of the proposed method for multi-scale face detection problems.
Abstract: In this paper, we introduce a deep learning (CNN) based method for face detection in an uncontrolled environment. The proposed method consists of developing a CNN architecture dedicated to face detection tasks by combining global and local features at multiple scales. Our architecture is composed of two main networks: a region proposal network that generates a list of regions of interest (ROIs), and a second network that uses these ROIs for classification into face/non-face. Both share the full-image convolution features of a pre-trained ResNet-50 model. An experimental study was conducted on the well-known WIDER Face and FDDB databases. The obtained results proved the efficiency as well as the feasibility of the proposed method for multi-scale face detection problems.
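
The authors' network is not available in this listing; torchvision's Faster R-CNN with a ResNet-50 backbone follows the same two-stage design (an RPN proposing ROIs and a head classifying them, both sharing backbone features) and can stand in as a hedged illustration when its head is replaced for a single face class. Note that torchvision's variant adds an FPN on top of ResNet-50, which the paper does not mention.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Two-stage detector: shared backbone -> RPN (region proposals) -> ROI head (classification)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the ROI head so it predicts 2 classes: background and face
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
image = torch.rand(3, 480, 640)                 # dummy image tensor with values in [0, 1]
with torch.no_grad():
    detections = model([image])[0]              # dict with boxes, labels, scores
print(detections["boxes"].shape, detections["scores"].shape)
```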

Journal ArticleDOI
TL;DR: The analysis in this paper shows that the proposed transform can be expressed as a variant of the STFT and as an alternative discretization of the CWT, and could also be considered a variant of the CQT and a special case of multi-resolution STFT.
Abstract: The short-time Fourier transform (STFT) is extensively used to convert signals from the time-domain into the time–frequency domain. However, the standard STFT has the drawback of having a fixed window size. Recently, we proposed a variant of that transform which fixes the window size in the frequency domain (STFT-FD). In this paper, we revisit that formulation, showing its similarity to existing techniques. Firstly, the formulation is revisited from the point of view of the STFT and some improvements are proposed. Secondly, the continuous wavelet transform (CWT) equation is used to formulate the transform in the continuous time using wavelet theory and to discretize it. Thirdly, the constant-Q transform (CQT) is analyzed showing the similarities in the equations of both transforms, and the differences in terms of how the sweep is carried out are discussed. Fourthly, the analogies with multi-resolution STFT are analyzed. Finally, the representations of a period chirp and an electrocardiogram signal in the time–frequency domain and the time-scale domain are obtained and used to compare the different techniques. The analysis in this paper shows that the proposed transform can be expressed as a variant of STFT, and as an alternative discretization of the CWT. It could also be considered a variant of the CQT and a special case of multi-resolution STFT.
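
The STFT-FD itself is not implemented here; the SciPy/PyWavelets sketch below only computes the two classical representations the paper compares against (fixed-window STFT and the CWT) on a test chirp.

```python
import numpy as np
from scipy.signal import stft, chirp
import pywt

fs = 1000
t = np.arange(0, 2, 1 / fs)
x = chirp(t, f0=5, t1=2, f1=200, method="linear")       # test chirp signal

# Fixed-window STFT: time-frequency representation
f, tt, Zxx = stft(x, fs=fs, nperseg=256)

# Continuous wavelet transform: time-scale representation
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)

print(Zxx.shape, coeffs.shape)
```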

Journal ArticleDOI
TL;DR: A new algorithm based on Atanassov's intuitionistic fuzzy sets and fuzzy mathematical morphology for leukocyte segmentation in color images that works directly on the color images without the need to convert the image to gray scale.
Abstract: This work presents a new algorithm based on Atanassov's intuitionistic fuzzy sets and fuzzy mathematical morphology for leukocyte segmentation in color images. The main idea is based on modeling a color image as an Atanassov's intuitionistic fuzzy set using the hue component in the HSV color space. Then, a pixel labeled as leukocyte is selected and compared to the whole image with a similarity measure. Thus, the leukocyte is segmented and separated from the rest of the image. The experimental results show that the algorithm has good performance, reaching a value of 99.41% for the correct classification of leukocytes and 99.23% for the correct classification of the background. Other metrics such as accuracy, precision and recall have been calculated, obtaining 99.32%, 99.41% and 99.24%, respectively. The algorithm presents two important characteristics: it works directly on the color images without the need to convert the image to gray scale, and it does not produce false colors because the fuzzy morphological operators guarantee this.

Journal ArticleDOI
TL;DR: This study segmented osteosarcoma solely utilizing DWI and identified effective and robust technique(s) for tumor segmentation using semi-automated and automated methods.
Abstract: Osteosarcoma is a primary malignant bone tumor in children and adolescents with significant morbidity and poor prognosis. Diffusion weighted imaging (DWI) plays a crucial role in diagnosis and prognosis of this malignant disease by capturing cellular changes in tumor tissue early in the course of treatment without any contrast injection. Segmentation of tumor in DWI is challenging due to low signal-to-noise ratio, partial-volume effects, intensity inhomogeneities and irregular shape of osteosarcoma. The purpose of this study was to segment osteosarcoma solely utilizing DWI and identify effective and robust technique(s) for tumor segmentation. A DWI dataset of fifty-five (N = 55; male:female = 41:14; Age = 17.8 ± 7.4 years) patients with osteosarcoma was acquired before treatment. A total of nine automated and semi-automated segmentation algorithms based on (1) Otsu thresholding (OT), (2) Otsu threshold-based region growing (OT-RG), (3) Active contour (AC), (4) Simple linear iterative clustering Superpixels (SLIC-S), (5) Fuzzy c-means clustering (FCM), (6) Graph cut (GC), (7) Logistic regression (LR), (8) Linear support vector machines (L-SVM) and (9) Deep feed-forward neural network (DNN) were implemented. Segmentation accuracy was estimated by Dice coefficient (DC), Jaccard Index (JI), precision (P) and recall (R) using a ground-truth tumor mask manually demarcated by a radiologist. The apparent diffusion coefficient (ADC) evaluated in the segmented tumor mask and the ground-truth tumor mask was compared using a paired t test for statistical significance (p < 0.05) and the Pearson correlation coefficient (PCC). Automated SLIC-S and FCM showed quantitatively and qualitatively superior segmentation with DC: ~ 79–82%; JI: ~ 67–71%; P: ~ 81–83%; R: ~ 80–86% and PCC = 0.89, 0.88 among all methods. Among semi-automated methods, AC was quantitatively more accurate (DC: ~ 77%; JI: ~ 65%; P: ~ 72%; R: ~ 88%; PCC = 0.85) than OT-RG and GC (DC: ~ 74–75%; JI: ~ 60–61%; P: ~ 67–72%; R: ~ 84–89%; PCC = 0.78, 0.73). Among machine learning algorithms, DNN showed higher accuracy (DC: ~ 73%; JI: ~ 62%; P: ~ 77%; R: ~ 86%; PCC = 0.79) than LR and L-SVM (DC: ~ 70–71%; JI: ~ 58–63%; P: ~ 73%; R: ~ 74–85%; PCC = 0.69, 0.71). Execution times were instantaneous for SLIC-S, FCM and the machine learning methods, while OT-RG, AC and GC took comparable ~ 1–6 s/slice image. Automated SLIC-S, FCM and semi-automated AC methods produced promising tumor segmentation results using DWI of the osteosarcoma dataset.
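
As a hedged illustration of one of the better-performing automated methods (SLIC superpixels), here is a scikit-image sketch on a placeholder slice; grouping superpixels into a tumour mask is reduced to a mean-intensity threshold, which is an assumption rather than the study's procedure.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops

# Placeholder for one diffusion-weighted slice (2-D float array)
dwi_slice = np.random.rand(128, 128)

# SLIC superpixels on a single-channel image
labels = slic(dwi_slice, n_segments=200, compactness=0.1, channel_axis=None)

# Toy grouping rule: call a superpixel "tumour" if its mean intensity is high
mask = np.zeros_like(dwi_slice, dtype=bool)
for region in regionprops(labels, intensity_image=dwi_slice):
    if region.mean_intensity > 0.7:                 # assumed threshold, not from the paper
        mask[labels == region.label] = True

print(mask.sum(), "pixels flagged as tumour")
```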

Journal ArticleDOI
TL;DR: An efficient data hiding method, which has high capacity and high fidelity, is proposed for high-efficiency video coding and exhibits outstanding performance in terms of visual quality and embedding capacity.
Abstract: Nowadays, data hiding has become important for different reasons such as copyright protection and authentication. Conventional data hiding approaches struggle to achieve both high capacity and high fidelity, since increasing capacity generally leads to distortion in the carrier signal. In this paper, an efficient data hiding method with high capacity and high fidelity is proposed for high-efficiency video coding. Hidden data are embedded into the coefficients of the discrete sine transform at the transform domain level. The proposed method is based on a matrix encoding approach that provides high capacity and high fidelity. Additionally, the error propagation issue caused by data hiding is handled in the video encoding, so the data hiding process is carried out without error propagation. Thus, distortion in visual quality is kept at a minimum level during video encoding. Experimental results reveal that the proposed method exhibits outstanding performance in terms of visual quality and embedding capacity.

Journal ArticleDOI
TL;DR: It is demonstrated that the GIF with weighted aggregation performs well in the fields of computational photography and image processing, including single image detail enhancement, tone mapping of high-dynamic-range images, single image haze removal, etc.
Abstract: As a local filter, the guided image filtering (GIF) suffers from halo artifacts. To address this issue, a novel weighted aggregating strategy is proposed in this paper. By introducing the weighted aggregation to GIF, the proposed method called WAGIF can achieve results with sharp edges and avoid/reduce halo artifacts. More specifically, compared to the weighted guided image filtering and the gradient domain guided image filtering, the proposed method can achieve both fine and coarse smoothing results in the flat areas while preserving edges. In addition, the complexity of the proposed approach is O(N) for an image with N pixels. It is demonstrated that the GIF with weighted aggregation performs well in the fields of computational photography and image processing, including single image detail enhancement, tone mapping of high-dynamic-range images, single image haze removal, etc.
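
The weighted-aggregation step is the paper's contribution and is not reproduced here; for reference, the baseline (unweighted) guided image filter it builds on can be written in a few lines of NumPy/SciPy.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Baseline guided image filter (box means via uniform_filter); not the proposed WAGIF."""
    size = 2 * radius + 1
    box = lambda x: uniform_filter(x, size=size)
    mean_I, mean_p = box(guide), box(src)
    cov_Ip = box(guide * src) - mean_I * mean_p
    var_I = box(guide * guide) - mean_I ** 2
    a = cov_Ip / (var_I + eps)            # per-pixel linear coefficients
    b = mean_p - a * mean_I
    # Plain averaging of the coefficients; WAGIF replaces this with weighted aggregation
    return box(a) * guide + box(b)

img = np.random.rand(256, 256)
smoothed = guided_filter(img, img, radius=8, eps=0.02)   # self-guided edge-preserving smoothing
detail_enhanced = img + 1.5 * (img - smoothed)           # simple detail enhancement use case
print(smoothed.shape, detail_enhanced.shape)
```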

Journal ArticleDOI
TL;DR: An attempt is made to recognize objects in underwater images using an adaptive Gaussian mixture model, and inner distance shape matching technique was applied for object recognition.
Abstract: Object recognition in underwater images is a challenging task because of poor visibility conditions. Marine scientists often prefer automation tools for object recognition, as large amounts of data are captured every day with the help of autonomous underwater vehicles. The challenge for classification in such underwater images is the limited color information. An attempt is made to recognize objects in underwater images using an adaptive Gaussian mixture model. The Gaussian mixture model performs accurate object segmentation provided the number of clusters is predefined. Optimization techniques such as the genetic algorithm, particle swarm optimization and differential evolution were analyzed for initializing the parameter set. Differential evolution is known for its accurate decision making in fewer iterations and proved to be better for initializing the number of clusters for the Gaussian mixture model. Further, for object recognition, the inner distance shape matching technique was applied. The proposed classification method achieved a maximum accuracy of 99%.
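
A hedged scikit-learn sketch of the segmentation stage: image pixels clustered with a Gaussian mixture model. The number of components is fixed by hand here, whereas the paper selects it via differential evolution, and the shape-matching stage is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder underwater image (H x W x 3, values in [0, 1])
image = np.random.rand(120, 160, 3)
pixels = image.reshape(-1, 3)

# GMM segmentation; n_components would come from differential evolution in the paper
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(pixels).reshape(image.shape[:2])

# The resulting cluster map can then be passed to shape matching (e.g. inner-distance descriptors)
print(np.bincount(labels.ravel()))
```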

Journal ArticleDOI
TL;DR: This paper presents a novel S-PCP method, called local null space pursuit (LNSP), which achieves a high detection accuracy and real-time performance on aerial images.
Abstract: Recently, accurate detection of moving objects has been achieved via principal component pursuit (PCP). However, in the case of aerial imagery, existing PCP-based detection methods suffer from low accuracy and/or high computational loads. This paper presents a novel S-PCP method, called local null space pursuit (LNSP), which achieves high detection accuracy and real-time performance on aerial images. LNSP models the background as lying in a low-dimensional subspace, while the moving objects are modelled as sparse. Based on these two models, LNSP proposes a new formulation for the detection problem using multiple local null spaces and the $$\ell _1$$-norm. The performance of LNSP is evaluated on challenging aerial datasets, and the results are compared with relevant current state-of-the-art methods.

Journal ArticleDOI
TL;DR: A convolutional neural network (CNN)-based deblocking filter is proposed for SHVC in H.265 that removes blocking artifacts with less computation due to max-pooling and soft-max layers in CNN.
Abstract: The deblocking filter in the standard H.265 SHVC reduces blocking artifacts at the edges and block boundaries of a coded frame with exhaustive computation. However, maximum removal of these blocking artifacts is not achieved, and the computational complexity is still a burden. This has led many research works to address maximal removal of blocking artifacts without considering computational complexity. In this paper, a convolutional neural network (CNN)-based deblocking filter is proposed for SHVC in H.265 that removes blocking artifacts with less computation. The proposed CNN framework learns the reconstructed samples of the input frame in a video sequence. Blocking artifacts are efficiently detected by the preprocessing unit, which is considered the first convolution layer. Next, features are extracted for the frames in a video patch by patch, employing kernels and strides to scan the complete frames. Normalization is applied from the base layer to the enhancement layer in the CNN framework to preserve sharpness in the video by removing artifacts generated due to inter-layer prediction and quantization. The network model is trained with the rectified linear unit activation function to perform nonlinear mappings. In addition, the CNN-based deblocking filter achieves less computation due to the max-pooling and soft-max layers in the CNN. Simulation results show an average 0.76 dB increase in PSNR and 57% time saving compared with the standard SHM reference encoder.

Journal ArticleDOI
TL;DR: A comparative study on modelling the impulsive noise amplitude in indoor PLC systems by utilising several impulsive distributions, which shows that the S$$\alpha $$S distribution achieves the best modelling success when compared to the other families in terms of the statistical error criteria, especially for the tail characteristics of the measured data sets.
Abstract: Powerline communication (PLC) is an emerging technology that has an important role in smart grid systems. Due to making use of existing transmission lines for communication purposes, PLC systems are subject to various noise effects. Among those, the most challenging one is the impulsive noise compared to the background and narrowband noise. In this paper, we present a comparative study on modelling the impulsive noise amplitude in indoor PLC systems by utilising several impulsive distributions. In particular, as candidate distributions, we use the symmetric $$\alpha $$ -Stable (S $$\alpha $$ S), generalised Gaussian, Bernoulli Gaussian and Student’s t distribution families as well as the Middleton Class A distribution, which dominates the literature as the impulsive noise model for PLC systems. Real indoor PLC system noise measurements are investigated for the simulation studies, which show that the S $$\alpha $$ S distribution achieves the best modelling success when compared to the other families in terms of the statistical error criteria, especially for the tail characteristics of the measured data sets.
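
SciPy ships the (symmetric and general) α-stable family, so the model comparison can be sketched as below on synthetic heavy-tailed data standing in for measured PLC noise; the log-likelihood comparison with a Gaussian fit is illustrative only, not the paper's error criteria.

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed sample standing in for measured impulsive PLC noise amplitudes
noise = stats.levy_stable.rvs(alpha=1.6, beta=0.0, scale=1.0, size=1000, random_state=0)

# Fit candidate models (the alpha-stable maximum-likelihood fit can be slow on large samples)
a, b, loc_s, scale_s = stats.levy_stable.fit(noise)
mu, sigma = stats.norm.fit(noise)

# Compare goodness of fit via total log-likelihood
ll_stable = stats.levy_stable.logpdf(noise, a, b, loc=loc_s, scale=scale_s).sum()
ll_gauss = stats.norm.logpdf(noise, mu, sigma).sum()
print("alpha-stable:", ll_stable, " Gaussian:", ll_gauss)   # heavy tails favour the stable fit
```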

Journal ArticleDOI
TL;DR: The results indicated the proposed scheme could characterize the dynamics of EEG signals in three groups, and it is suitable for the detection of epileptic seizures.
Abstract: In this paper, an efficient and simple system for classifying electroencephalogram (EEG) data of normal and epileptic subjects is presented using lagged Poincare plot parameters. To this effect, a benchmark for choosing delays is defined based on the autocorrelation function. For each lag, traditional indicators, including the number of points lying on the identity line, the lengths of the minor (SD1) and major (SD2) axes of the fitted ellipse, the SD1/SD2 ratio, and the area of the ellipse, were calculated. The efficiency of the features in discriminating between the groups was examined based on the statistical significance of the differences. K-nearest neighbor and probabilistic neural network classifiers were employed. The performance of the suggested scheme was evaluated using a publicly available database that includes EEG data from healthy subjects, recordings during epileptic seizures, and seizure-free intervals. The method provides a maximum correct rate of 98.33%. Our results indicate that the proposed scheme can characterize the dynamics of EEG signals in the three groups and is suitable for the detection of epileptic seizures.
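
A NumPy sketch of the lagged Poincaré descriptors named in the abstract (SD1, SD2, their ratio, and the ellipse area) for a given lag; in the paper the lags are chosen from the autocorrelation function, and the identity-line point count is omitted here.

```python
import numpy as np

def poincare_features(x, lag=1):
    """SD1, SD2, SD1/SD2 and ellipse area of the lag-m Poincare plot of a signal."""
    x1, x2 = x[:-lag], x[lag:]                 # (x_n, x_{n+lag}) pairs
    sd1 = np.std((x2 - x1) / np.sqrt(2))       # dispersion perpendicular to the identity line
    sd2 = np.std((x2 + x1) / np.sqrt(2))       # dispersion along the identity line
    return sd1, sd2, sd1 / sd2, np.pi * sd1 * sd2

eeg = np.random.randn(4097)                    # stand-in for one EEG segment
for lag in (1, 5, 10):                         # lags would be picked from the autocorrelation
    print(lag, poincare_features(eeg, lag))
```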

Journal ArticleDOI
TL;DR: A technique is proposed to select the most discriminative feature extraction methods based on the Fisher score, and the results support the superiority of this moment-based technique over state-of-the-art methods in the literature.
Abstract: SAR generates high-resolution images irrespective of weather conditions and solar illumination. Feature-level fusion increases the dimensionality of the feature space as well as the feature redundancy caused by correlation among the features. In this paper, we propose a technique to select the most discriminative feature extraction techniques based on the Fisher score. In this regard, we extract moment features using different moment methods, evaluate the Fisher scores of each moment method and rank the methods accordingly. Finally, we select the top moment methods for feature fusion. The proposed technique improves accuracy while decreasing the feature dimensionality and the feature redundancy. The performance of the proposed method improves on the individual performances of the moment methods considered. Furthermore, the results support the superiority of this proposed moment-based technique over state-of-the-art methods in the literature.
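
A short NumPy sketch of the Fisher score used for ranking; it is computed per feature here, whereas the paper aggregates scores per moment method before ranking and fusing.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

# Toy example: 200 SAR targets, 30 moment features, 4 classes
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
y = rng.integers(0, 4, 200)
scores = fisher_scores(X, y)
top = np.argsort(scores)[::-1][:10]            # keep the most discriminative features
print(top)
```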

Journal ArticleDOI
TL;DR: This study used the maximal overlap discrete wavelet transform to decompose the data, extracted the variance, inter-quartile range, Pearson correlation coefficient, Hoeffding’s D correlation coefficient and Shannon entropy of the wavelet coefficients and used the k-nearest neighbor model to detect MI.
Abstract: Electrocardiography is a useful diagnostic tool for various cardiovascular diseases, such as myocardial infarction (MI). An electrocardiogram (ECG) records the electrical activity of the heart, which can reflect any abnormal activity. MI recognition by visual examination of an ECG requires an expert’s interpretation and is difficult because of the short duration and small amplitude of the changes in ECG signals associated with MI. Therefore, we propose a new method for the automatic detection of MI using ECG signals. In this study, we used the maximal overlap discrete wavelet transform to decompose the data, extracted the variance, inter-quartile range, Pearson correlation coefficient, Hoeffding’s D correlation coefficient and Shannon entropy of the wavelet coefficients and used the k-nearest neighbor model to detect MI. The accuracy, sensitivity and specificity of the model were 99.57%, 99.82% and 98.79%, respectively. Therefore, the system can be used in clinics to help diagnose MI.
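
A hedged PyWavelets/scikit-learn sketch of the pipeline on toy data: wavelet decomposition of each segment, simple statistics of the coefficients as features, and a k-NN classifier. The stationary wavelet transform is used as an undecimated stand-in for MODWT, and only the variance, inter-quartile range, and Shannon entropy features are included.

```python
import numpy as np
import pywt
from scipy.stats import iqr, entropy
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def wavelet_features(segment, wavelet="db4", level=4):
    """Variance, IQR and Shannon entropy of each set of undecimated wavelet coefficients."""
    coeffs = pywt.swt(segment, wavelet, level=level)      # SWT as a stand-in for MODWT
    feats = []
    for cA, cD in coeffs:
        for c in (cA, cD):
            hist, _ = np.histogram(c, bins=32, density=True)
            feats += [np.var(c), iqr(c), entropy(hist + 1e-12)]
    return np.array(feats)

# Toy data: 100 ECG segments of 1024 samples, binary MI vs normal labels
rng = np.random.default_rng(0)
segments = rng.standard_normal((100, 1024))
labels = rng.integers(0, 2, 100)
X = np.array([wavelet_features(s) for s in segments])
knn = KNeighborsClassifier(n_neighbors=3)
print(cross_val_score(knn, X, labels, cv=5).mean())
```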