
Showing papers by "M. Omair Ahmad" published in 2018


Journal Article•DOI•
TL;DR: This paper proposes a scheme for designing a blind multibit watermark decoder incorporating the vector-based HMM in the wavelet domain and shows that the proposed decoder is more robust against various kinds of attacks than the state-of-the-art methods.
Abstract: The vector-based hidden Markov model (HMM) is a powerful statistical model for characterizing the distribution of the wavelet coefficients, since it is capable of capturing the subband marginal distribution as well as the inter-scale and cross-orientation dependencies of the wavelet coefficients. In this paper, we propose a scheme for designing a blind multibit watermark decoder that incorporates the vector-based HMM in the wavelet domain. The decoder is designed based on the maximum likelihood criterion. A closed-form expression is derived for the bit error rate and validated experimentally with Monte Carlo simulations. The performance of the proposed watermark decoder is evaluated using a set of standard test images and shown to outperform the decoders designed based on the Cauchy or generalized Gaussian distributions, both with and without attacks. It is also shown that the proposed decoder is more robust against various kinds of attacks than the state-of-the-art methods.
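For illustration only, the sketch below shows the maximum-likelihood bit decision that underlies such a decoder, but with a plain i.i.d. Gaussian host model standing in for the paper's vector-based wavelet HMM; the function name, the spreading pattern `w`, and the embedding strength are assumptions made for the example.

```python
import numpy as np

def ml_decode_bit(y, w, sigma2=1.0):
    """Decide one watermark bit from a block of wavelet coefficients.

    y      : received (possibly attacked) coefficients of the block
    w      : spreading pattern added (bit 1) or subtracted (bit 0) at embedding
    sigma2 : variance of the simplified i.i.d. Gaussian host model
    The bit maximizing the log-likelihood of y is returned; the paper uses a
    vector-based wavelet HMM as the host model instead of this toy Gaussian.
    """
    ll_one = -np.sum((y - w) ** 2) / (2.0 * sigma2)
    ll_zero = -np.sum((y + w) ** 2) / (2.0 * sigma2)
    return 1 if ll_one > ll_zero else 0

# toy usage with a random block and pattern
rng = np.random.default_rng(0)
w = 0.5 * rng.standard_normal(64)
y = w + rng.standard_normal(64)   # bit 1 embedded, then "attacked" by noise
print(ml_decode_bit(y, w))        # expected to print 1 most of the time
```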

55 citations


Journal Article•DOI•
TL;DR: Experimental results on different settings of mixed-noise show that the proposed CNN-based denoising method performs significantly better than the sparse-representation and patch-based methods in terms of both accuracy and robustness.
Abstract: The removal of mixed-noise is an ill-posed problem due to the high level of non-linearity in the distribution of the noise. The most commonly encountered mixed-noise is the combination of additive white Gaussian noise (AWGN) and impulse noise (IN), which have contrasting characteristics. A number of methods, from the cascade of IN and AWGN reduction to the state-of-the-art sparse representation, have been reported to reduce this common form of mixed-noise. In this paper, a new learning-based algorithm using the convolutional neural network (CNN) model is proposed to reduce the mixed Gaussian-impulse noise in images. The proposed CNN model adopts a computationally efficient transfer-learning approach to obtain an end-to-end map from the noisy image to the noise-free image. The model has a small structure, yet it is capable of providing performance superior to that of well-established methods. Experimental results on different settings of mixed-noise show that the proposed CNN-based denoising method performs significantly better than the sparse-representation and patch-based methods in terms of both accuracy and robustness. Moreover, due to its lightweight structure, the denoising operation of the proposed CNN-based method is computationally faster than that of the previously reported methods.
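As a hedged sketch only, here is a minimal residual-CNN denoiser in PyTorch; the layer widths, depth, and residual formulation are generic assumptions and do not reproduce the paper's transfer-learning architecture.

```python
import torch
import torch.nn as nn

class SmallDenoiser(nn.Module):
    """A lightweight residual CNN that maps a noisy image to an estimate
    of the noise, which is then subtracted (residual learning)."""
    def __init__(self, channels=1, features=32, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.body(x)   # subtract the predicted noise

# toy usage on a random batch of 1-channel 64x64 "noisy" images
x = torch.randn(4, 1, 64, 64)
print(SmallDenoiser()(x).shape)   # torch.Size([4, 1, 64, 64])
```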

48 citations


Journal Article•DOI•
TL;DR: The performance of a multi-biometric system can be improved using an efficient normalization technique under simple sum-rule-based score-level fusion, together with a weighting technique based on the confidence of the matching scores that considers the mean-to-maximum of the genuine scores and the mean-to-minimum of the impostor scores.
Abstract: The performance of a multi-biometric system can be improved using an efficient normalization technique under the simple sum-rule-based score-level fusion. It can also be further improved using normalization techniques along with a weighting method under the weighted sum-rule-based score-level fusion. In this paper, at first, we present two anchored score normalization techniques based on the genuine and impostor scores. Specifically, the proposed normalization techniques utilize the information of the overlap region between the genuine and impostor scores and their neighbors. Second, we propose a weighting technique that is based on the confidence of the matching scores by considering the mean-to-maximum of genuine scores and mean-to-minimum of impostor scores. A multi-biometric system having three biometric traits, fingerprint, palmprint, and earprint, is utilized to evaluate the performance of the proposed techniques. The performance of the multi-biometric system is evaluated in terms of the equal error rate and genuine acceptance rate @0.5% false acceptance rate. The receiver operating characteristics are also plotted in terms of the genuine acceptance rate as a function of the false acceptance rate.
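As a loose sketch of score-level fusion in general (not the anchored normalization or the exact mean-to-extrema weighting proposed here), the snippet below normalizes per-modality scores and fuses them with a weighted sum rule; the min-max normalization and the weight values are placeholders.

```python
import numpy as np

def min_max_normalize(scores):
    """Map one modality's match scores to [0, 1] (placeholder for the
    paper's anchored normalization)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def weighted_sum_fusion(score_matrix, weights):
    """Fuse normalized scores from several modalities with a weighted sum rule.
    score_matrix : (n_samples, n_modalities) normalized scores
    weights      : per-modality confidence weights (renormalized to sum to 1)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.asarray(score_matrix, dtype=float) @ w

# toy usage with three modalities (e.g., fingerprint, palmprint, earprint)
raw = [np.random.rand(10) * 100, np.random.rand(10), np.random.rand(10) * 5]
fused = weighted_sum_fusion(np.column_stack([min_max_normalize(s) for s in raw]),
                            weights=[0.5, 0.3, 0.2])
```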

39 citations


Journal Article•DOI•
TL;DR: The proposed method can reduce the burden on physicians of investigating WCE videos to detect bleeding frames and zones with a high level of accuracy, and it offers significantly better performance than some existing methods in terms of bleeding detection accuracy, sensitivity, specificity, and precision.

18 citations


Journal Article•DOI•
TL;DR: The idea behind the proposed network is to compose a residual signal that is more representative of the features produced by the different layers of the network and is not as sparse.
Abstract: The features produced by the layers of a neural network become increasingly sparse as the network gets deeper, and consequently, the learning capability of the network is not further enhanced as the number of layers is increased. In this paper, a novel residual deep network, called CompNet, is proposed for the single-image super-resolution problem without an excessive increase in the network complexity. The idea behind the proposed network is to compose a residual signal that is more representative of the features produced by the different layers of the network and is not as sparse. The proposed network is evaluated on different benchmark datasets and is shown to outperform the state-of-the-art schemes designed to solve the super-resolution problem.

15 citations


Journal Article•DOI•
TL;DR: This paper presents a high performance quality-guaranteed two-dimensional (2D) single-lead ECG compression algorithm using singular value decomposition (SVD) and lossless-ASCII-character-encoding (LLACE)-based techniques.

14 citations


Journal Article•DOI•
TL;DR: It is observed that the introduction of weights in convex and non-convex penalty functions produces smaller root-mean-square errors in the reconstructed signal than when weights are not used.

14 citations


Journal Article•DOI•
TL;DR: A practical physical channel model is proposed by dividing the angular domain into a finite number of distinct directions and a lower bound on the achievable rate of uplink data transmission is derived using a linear detector for each user and employed in defining the spectral efficiency.
Abstract: This brief investigates the spectral efficiency of multiuser massive multiple-input multiple-output systems with a very large number of antennas at a base station serving single-antenna users. A practical physical channel model is proposed by dividing the angular domain into a finite number of distinct directions. A lower bound on the achievable rate of uplink data transmission is derived using a linear detector for each user and employed in defining the spectral efficiency. The obtained lower bound is further modified for the maximum-ratio combining and zero-forcing receivers. A power control scheme based on the large-scale fading is also proposed to maximize the spectral efficiency under peak power constraint. Experiments are conducted to evaluate the obtained lower bounds and the performance of the proposed method. The numerical results show a comparison between the performances of both receivers along with the advantage of the proposed power control scheme in terms of the spectral efficiency.
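To make the rate expressions concrete, here is a hedged sketch that computes the standard per-user uplink SINRs and rates for maximum-ratio-combining and zero-forcing receivers under a generic i.i.d. channel; these are textbook formulas, not the specific lower bounds or the finite-direction physical channel model derived in the paper.

```python
import numpy as np

def zf_rates(H, p, noise_var=1.0):
    """Per-user uplink rates (bits/s/Hz) with a zero-forcing receiver.
    H : (M, K) complex channel matrix, p : length-K transmit powers."""
    gram_inv = np.linalg.inv(H.conj().T @ H)
    sinr = np.asarray(p) / (noise_var * np.real(np.diag(gram_inv)))
    return np.log2(1.0 + sinr)

def mrc_rates(H, p, noise_var=1.0):
    """Per-user uplink rates with maximum-ratio combining,
    treating multiuser interference as noise."""
    K = H.shape[1]
    rates = np.zeros(K)
    for k in range(K):
        hk = H[:, k]
        signal = p[k] * np.linalg.norm(hk) ** 4
        interference = sum(p[j] * np.abs(hk.conj() @ H[:, j]) ** 2
                           for j in range(K) if j != k)
        noise = noise_var * np.linalg.norm(hk) ** 2
        rates[k] = np.log2(1.0 + signal / (interference + noise))
    return rates

# toy usage: 128 base-station antennas, 8 single-antenna users, unit powers
rng = np.random.default_rng(1)
H = (rng.standard_normal((128, 8)) + 1j * rng.standard_normal((128, 8))) / np.sqrt(2)
print(zf_rates(H, np.ones(8)).mean(), mrc_rates(H, np.ones(8)).mean())
```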

11 citations


Journal Article•DOI•
TL;DR: Extensive simulations are carried out to show that using the proposed decoder leads to considerable improvement in the rate-distortion performance of the distributed video codec, particularly on video sequences with fast motions.
Abstract: Distributed video coding is a relatively novel video coding paradigm that enables lower-complexity video encoding compared to conventional video coding schemes, at the expense of a higher-complexity decoder. Improving the rate-distortion and coding efficiency is a challenging problem in distributed video coding. Using a suitable correlation noise model along with an accurate estimation of its parameter can lead to an improved rate-distortion performance. In a distributed video codec, however, the original Wyner-Ziv frames are not available at the decoder. In addition, the correlation noise is not stationary, and its statistics vary within each frame and across its transform coefficient bands. Hence, the estimation of the correlation noise model parameter is not a straightforward task. In this paper, a new decoder is proposed to estimate the correlation noise parameter and carry out the decoding process progressively and recursively on an augmented factor graph. In the proposed decoder, a recursive message-passing algorithm is used for decoding the bitplanes corresponding to each DCT band in a WZ frame and, simultaneously, for estimating and refining the correlation noise distribution parameter. To approximate the posterior distribution of the correlation noise parameter, and consequently derive a closed-form expression for the messages on the augmented factor graph, a variational Bayes algorithm is employed. Extensive simulations are carried out to show that using the proposed decoder leads to a considerable improvement in the rate-distortion performance of the distributed video codec, particularly on video sequences with fast motion.

10 citations


Journal Article•DOI•
TL;DR: An automatic apnea detection scheme is proposed in this paper using a single lead EEG signal, which can differentiate apnea patients and healthy subjects and also classify apnea and non-apnea frames in the data of an apnea patient.
Abstract: The electroencephalogram (EEG) has lately been receiving special attention in the detection of sleep apnea, as it is directly related to neural activity. However, apnea detection through visual monitoring of the EEG signal by an expert is expensive, difficult, and susceptible to human error. To counter this problem, an automatic apnea detection scheme using a single-lead EEG signal is proposed in this paper, which can differentiate between apnea patients and healthy subjects and also classify apnea and non-apnea frames in the data of an apnea patient. Each sub-frame of a given frame of EEG data is first decomposed into band-limited intrinsic mode functions (BLIMFs) by using the variational mode decomposition (VMD). The advantage of using VMD is to obtain compact BLIMFs with adaptive center frequencies, which provide an opportunity to capture the local information corresponding to varying neural activity. Furthermore, by extracting features from each BLIMF, a temporal within-frame feature-variation pattern is obtained for each mode. We propose to fit the resulting pattern with the Rician model (RiM) and utilize the fitted model parameters as features. The use of such VMD-RiM features not only offers better feature quality but also ensures a very low feature dimension. In order to evaluate the performance of the proposed method, a K-nearest-neighbor classifier is used and various cross-validation schemes are carried out. Detailed experimentation is carried out on several apnea and healthy subjects with various apnea-hypopnea indices from three publicly available datasets, and it is found that the proposed method achieves superior classification performance in comparison to that obtained by the existing methods, in terms of sensitivity, specificity, and accuracy.
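As a hedged sketch of the feature idea only: the snippet fits a Rician distribution to a within-frame feature-variation pattern with SciPy and feeds the fitted parameters to a k-nearest-neighbor classifier; the VMD decomposition, the per-BLIMF features, the data layout, and the labels are assumptions and are not reproduced here.

```python
import numpy as np
from scipy.stats import rice
from sklearn.neighbors import KNeighborsClassifier

def rician_features(pattern):
    """Fit a Rice (Rician) model to a within-frame feature-variation pattern
    (one pattern per BLIMF in the paper) and return the fitted parameters
    as a compact feature vector."""
    b, loc, scale = rice.fit(np.asarray(pattern, dtype=float))
    return np.array([b, loc, scale])

# toy usage: 40 frames, one positive-valued pattern of length 30 per frame
rng = np.random.default_rng(2)
X = np.vstack([rician_features(np.abs(rng.standard_normal(30)) + 1.0)
               for _ in range(40)])
y = rng.integers(0, 2, size=40)            # hypothetical apnea / non-apnea labels
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```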

10 citations


Proceedings Article•DOI•
01 Aug 2018
TL;DR: A novel content-based image retrieval method based on perceptual image hashing is proposed, which creates a signature per image based on image rotation and the DCT, and it is compared with some state-of-the-art methods.
Abstract: Unlike their regular usage in cryptography, hashing methods can be used to extract signatures for the detection of similar images. However, finding a hashing function for detecting image similarity is a challenging task, as the hash code needs to represent the content rather than encrypt it. In this paper, a novel content-based image retrieval method based on perceptual image hashing is proposed. The proposed hashing method creates a signature per image based on image rotation and the DCT. The acquired hash code is then used to train a memory model to find similar images among a large number of images. In order to evaluate the proposed method, we compare it with some state-of-the-art methods. The results show that our method performs faster and better than the leading competitive methods.
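For orientation only, the sketch below computes a generic DCT-based perceptual hash and a Hamming distance between hashes; it omits the rotation step and the memory model of the paper, and the hash size and preprocessing are assumptions.

```python
import numpy as np
from scipy.fft import dctn

def dct_perceptual_hash(img, hash_size=8):
    """Generic DCT perceptual hash: take the lowest-frequency block of the
    2-D DCT of a small grayscale image and threshold it at its median.
    `img` is assumed to be a 2-D float array already resized (e.g. 32x32)."""
    coeffs = dctn(np.asarray(img, dtype=float), norm="ortho")
    low = coeffs[:hash_size, :hash_size].ravel()
    return (low > np.median(low)).astype(np.uint8)

def hamming_distance(h1, h2):
    """Smaller distance means perceptually more similar images."""
    return int(np.count_nonzero(h1 != h2))

# toy usage: a random image and a slightly noisier copy should hash closely
rng = np.random.default_rng(3)
img = rng.random((32, 32))
print(hamming_distance(dct_perceptual_hash(img),
                       dct_perceptual_hash(img + 0.01 * rng.random((32, 32)))))
```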

Proceedings Article•DOI•
01 Aug 2018
TL;DR: A novel classification framework which uses unsupervised autoencoder network to select the subset from given structural and clinical features by exploring the linear and nonlinear relationship among them followed by a supervised multinomial logistic layer to automatically identify the patients having AD, mild cognitive impairment (MCI), and cognitively normal (CN) clinical status is proposed.
Abstract: Although there is no cure for Alzheimer’s disease (AD), the early accurate prediction of clinical status plays a significant role in preventing, treating, and slowing down the progression of the disease. However, the absence of a single diagnostic test and the complexity of AD create delays in diagnosis. In recent years, the diagnosis of AD using different biomarkers through machine learning techniques has been a very active research topic in the medical field. However, a common bottleneck in the diagnostic performance is overfitting due to the presence of many irrelevant features in the training data. In view of this fact, we propose a novel classification framework that uses an unsupervised autoencoder network to select a subset of the given structural and clinical features by exploring the linear and nonlinear relationships among them, followed by a supervised multinomial logistic layer to automatically identify patients having AD, mild cognitive impairment (MCI), and cognitively normal (CN) clinical status. Through experimental results on the Alzheimer’s disease neuroimaging initiative (ADNI) database, it is shown that the proposed classification algorithm achieves better performance in terms of accuracy, sensitivity, and specificity in 5-fold cross-validation when compared to the state-of-the-art methods.
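As a hedged, minimal PyTorch sketch of the kind of architecture described (an autoencoder bottleneck feeding a multinomial logistic layer); the layer sizes, the absence of sparsity constraints, and the joint loss are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AutoencoderClassifier(nn.Module):
    """Autoencoder bottleneck with a reconstruction head and a multinomial
    logistic (softmax) head for 3-way AD / MCI / CN classification."""
    def __init__(self, n_features, n_hidden=32, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)    # reconstruction head
        self.classifier = nn.Linear(n_hidden, n_classes)  # logits for softmax

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

# toy usage: joint reconstruction + classification loss on random data
model = AutoencoderClassifier(n_features=100)
x, y = torch.randn(16, 100), torch.randint(0, 3, (16,))
recon, logits = model(x)
loss = nn.functional.mse_loss(recon, x) + nn.functional.cross_entropy(logits, y)
loss.backward()
```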

Proceedings Article•DOI•
27 May 2018
TL;DR: The experimental results show that the performance of a multi-biometric system using the proposed fusions is superior to that of the uni-biometric systems and to that of the system using existing levels of fusion.
Abstract: In this paper, first, a new fusion technique, referred to as the hybrid fusion (HBF) technique and based on feature-level fusion and the best unimodal system, is proposed for multimodal biometric recognition. Second, a new weighting technique, referred to as the mean-extrema-based confidence weighting (MEBCW) technique and based on the scores obtained from the feature-level fusion and the best unimodal system, is proposed. Finally, a weighted hybrid fusion (WHBF) technique is developed by incorporating MEBCW into HBF in order to improve the overall recognition rate of a multimodal biometric system. The performance of the proposed method, in terms of the equal error rate and the genuine acceptance rates @5.3% and @7.2% false acceptance rates, is evaluated on a multi-biometric system. The experimental results show that the performance of a multi-biometric system using the proposed fusions is superior to that of the uni-biometric systems and to that of the system using existing levels of fusion.

Proceedings Article•DOI•
24 Jun 2018
TL;DR: A novel classification algorithm is proposed to discriminate patients having AD, early mild cognitive impairment (MCI), late MCI, and normal control status in 18F-AV-45 PET using a shearlet-based deep convolutional neural network (CNN).
Abstract: Although there is no cure for Alzheimer’s disease (AD), an accurate early diagnosis is essential for health and social care, and will be of great significance when the course of the disease can be reversed through treatment options. Florbetapir positron emission tomography (18F-AV-45 PET) is proven to be the most powerful imaging technique to investigate the deposition of amyloid plaques, one of the potential hallmarks of AD, signifying the onset of AD before it changes the brain’s structure. In this paper, we propose a novel classification algorithm to discriminate patients having AD, early mild cognitive impairment (MCI), late MCI, and normal control status in 18F-AV-45 PET using a shearlet-based deep convolutional neural network (CNN). It is known that conventional CNNs involve convolution and pooling layers, which in fact produce a smoothed representation of the data, and this results in losing detailed information. In view of this fact, the conventional CNN is integrated with the shearlet transform, incorporating the multiresolution details of the data. Once the model is pretrained to transform the input data into a better stacked representation, the resulting final layer is passed to a softmax classifier, which returns the probabilities of each class. Through experimental results, it is shown that the performance of the proposed classification framework is superior to that of the traditional CNN on the Alzheimer’s disease neuroimaging initiative (ADNI) database in terms of classification accuracy. As a result, it has the potential to distinguish the different stages of AD progression with less clinical prior information.

Proceedings Article•DOI•
24 Jun 2018
TL;DR: Results show that a superior performance is provided by a multi-biometric system using the proposed fusion scheme in comparison with the performance provided by the system using existing fusions or by the unimodal systems.
Abstract: In feature-level fusion, the features extracted from different modalities are fused in order to obtain a single feature set for multimodal biometric recognition systems. These features can be encoded using a binary ('1' or '0') encoding technique. An encoded feature value of '1' provides more information about the feature than '0' does. In view of this, we first propose a fusion to combine the encoded features obtained from the individual feature encoders of a multimodal biometric system, and refer to it as the first-stage fusion (FSF). Next, another fusion is carried out between the unimodal system that provides the best performance in that multimodal system and the proposed FSF, and it is referred to as the second-stage fusion (SSF). The genuine acceptance rates @4.3% and @4.4% false acceptance rates, and the equal error rate, are utilized for evaluating the performance of a multi-biometric system using the proposed fusions. The results show that a superior performance is provided by a multi-biometric system using the proposed fusion scheme in comparison with the performance provided by the system using existing fusions or by the unimodal systems.

Proceedings Article•DOI•
24 Jun 2018
TL;DR: This work proposes a framework based on image hashing and random forest, which is fast and offers high performance, and outperforms competitive methods in terms of both accuracy and speed.
Abstract: The use of large image datasets has become a common occurrence. This, however, makes image searching a highly desired operation in many applications. Most content-based image retrieval (CBIR) methods adopt machine-learning techniques that take the image content into account. These methods are effective, but they are generally too complex and resource demanding. We propose a framework based on image hashing and a random forest, which is fast and offers high performance. The proposed framework consists of a multi-key image hashing technique based on the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), and a random forest based on a normalized B+ tree (NB+ tree) that reduces the high-dimensional input vectors to one dimension, which in turn improves the time complexity significantly. We analyze our method empirically and show that it outperforms competitive methods in terms of both accuracy and speed. In addition, the proposed scheme scales well with increasing dataset size while preserving high accuracy.

Proceedings Article•DOI•
01 Oct 2018
TL;DR: This work proposes a novel stacked sparse autoencoder based method to assign a value in the missing places and to select the significant structural and clinical features in order to discriminate the patients having AD, mild cognitive impairment (MCI), and cognitively normal (CN) clinical status.
Abstract: In recent years, the accurate detection of Alzheimer's disease (AD) at its early stage, using various biomarkers through machine learning techniques, has been given paramount importance in the medical field. However, in reality, the input datasets contain many missing values due to several factors, such as an increasing mortality rate, the avoidance of invasive procedures, and dropping out from the study. In this work, after analyzing the pattern of the structural and clinical data from the TADPOLE study in the Alzheimer's disease neuroimaging initiative (ADNI) database, it has been found that the unobserved data are not missing completely at random. In view of this fact, and with the assumption that the missing-data patterns occur in blocks, we propose a novel stacked sparse autoencoder based method to impute values in the missing places and to select the significant structural and clinical features in order to discriminate patients having AD, mild cognitive impairment (MCI), and cognitively normal (CN) clinical status. Through experimental results, it is shown that the proposed imputation algorithm achieves better performance for semi-supervised AD classification in terms of accuracy, sensitivity, and specificity in 5-fold cross-validation when compared to the state-of-the-art methods.

Journal Article•DOI•
TL;DR: A novel method is proposed to assess the accuracy of TDE by investigating the NCC profile around the estimated time delay and utilizing a support vector machine to classify peak-hopping and jitter errors.
Abstract: The accuracy of time-delay estimation (TDE) in ultrasound elastography is usually measured by calculating the value of the normalized cross correlation (NCC) at the estimated displacement. The NCC value, however, can be very high at a displacement estimate with a large error, a well-known problem in TDE referred to as peak-hopping. Furthermore, the NCC value can suffer from jitter error, which is due to electronic noise and signal decorrelation. Herein, we propose a novel method to assess the accuracy of TDE by investigating the NCC profile around the estimated time delay. We extract several features from the NCC profile and utilize a support vector machine to classify peak-hopping and jitter errors. The results on simulation, phantom, and in vivo data show the significant improvement of the proposed algorithm compared to state-of-the-art techniques.
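A hedged sketch of the idea: compute a normalized cross-correlation profile around the estimated lag, extract a couple of simple profile features, and feed them to an SVM; the specific features and window sizes used in the paper are not reproduced, and the ones below (peak value and a crude sharpness measure) are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def ncc_profile(x, y, max_lag=20):
    """Normalized cross-correlation of two 1-D windows over a range of lags
    (circular shift used for simplicity)."""
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = (y - y.mean()) / (y.std() + 1e-12)
    return np.array([np.mean(x * np.roll(y, lag))
                     for lag in range(-max_lag, max_lag + 1)])

def profile_features(profile):
    """Placeholder features: peak value and a crude peak-sharpness measure."""
    k = int(np.argmax(profile))
    peak = profile[k]
    left, right = profile[max(k - 1, 0)], profile[min(k + 1, len(profile) - 1)]
    return np.array([peak, peak - 0.5 * (left + right)])

# toy usage: features from many window pairs, with hypothetical reliability labels
rng = np.random.default_rng(4)
X = np.vstack([profile_features(ncc_profile(rng.standard_normal(128),
                                             rng.standard_normal(128)))
               for _ in range(50)])
labels = rng.integers(0, 2, size=50)      # 0: reliable, 1: peak-hopping / jitter
clf = SVC(kernel="rbf").fit(X, labels)
```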

Posted Content•
TL;DR: This paper presents a variational step to remove the heavy tail of the noise distribution originating from the IN and shows that this approach can significantly improve the denoising performance of mixed AWGN-IN using well-established methods.
Abstract: The reduction of mixed noise is an ill-posed problem due to the occurrence of contrasting distributions of noise in the image. The mixed noise that is usually encountered is the simultaneous presence of additive white Gaussian noise (AWGN) and impulse noise (IN). A standard approach to denoise an image with such corruption is to apply a rank-order filter (ROF) followed by an efficient linear filter to remove the residual noise. However, the ROF cannot completely remove the heavy tail of the noise distribution originating from the IN, and thus the denoising performance can be suboptimal. In this paper, we present a variational step to remove the heavy tail of the noise distribution. Through experiments, it is shown that this approach can significantly improve the denoising performance for mixed AWGN-IN when using well-established methods.
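As a loose illustration of the ROF-then-linear-filter pipeline discussed above (not the paper's variational step itself), the sketch applies a median filter to suppress the impulse noise and then a total-variation smoothing step from scikit-image; the filter size and TV weight are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.restoration import denoise_tv_chambolle

def rof_then_tv(noisy, median_size=3, tv_weight=0.1):
    """Rank-order (median) filtering to knock out impulse noise, followed by
    a total-variation step to reduce the remaining Gaussian-like residual."""
    impulse_suppressed = median_filter(np.asarray(noisy, dtype=float), size=median_size)
    return denoise_tv_chambolle(impulse_suppressed, weight=tv_weight)

# toy usage: an image corrupted by AWGN plus salt-and-pepper impulses
rng = np.random.default_rng(5)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
mask = rng.random(clean.shape) < 0.05
noisy[mask] = rng.choice([0.0, 1.0], size=mask.sum())
denoised = rof_then_tv(noisy)
```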

Proceedings Article•DOI•
24 Jun 2018
TL;DR: A physical channel model is studied in this paper, where the angular domain is divided into a finite number of distinct directions, and a lower capacity bound of the uplink channel for the zero-forcing detector is derived.
Abstract: This paper considers a power control problem for multiuser massive multiple-input multiple-output systems, where a base station with a massive number of antennas simultaneously receives data signals from the various users. A physical channel model is studied, where the angular domain is divided into a finite number of distinct directions. It is assumed that perfect channel-state information is available at the base station and that each user has knowledge of only the geometric attenuation and shadow fading of the channel-state information. A lower capacity bound of the uplink channel for the zero-forcing detector is derived. According to the geometric attenuation and shadow fading of each user's channel, the power among the users is controlled in such a way that the spectral efficiency is maximized and the minimum energy per bit is achieved in the cell. It is shown that the proposed power control method outperforms existing methods reported in the literature.

Posted Content•
TL;DR: The proposed method outperforms some of the state-of-the-art speech enhancement methods both at high and low levels of SNRs in terms of the standard objective measures and the subjective evaluations including formal listening tests.
Abstract: For the enhancement of noisy speech, a method of threshold determination based on modeling the Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech by an exponential distribution is presented. A custom thresholding function based on a combination of the mu-law and semisoft thresholding functions is designed and exploited to apply the statistically derived threshold to the PWP coefficients. The effectiveness of the proposed method is evaluated for speech signals corrupted by car and multi-talker babble noise through extensive simulations using the NOIZEUS database. The proposed method outperforms some of the state-of-the-art speech enhancement methods at both high and low levels of SNR in terms of the standard objective measures and subjective evaluations, including formal listening tests.
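To illustrate the building blocks only (not the paper's statistically derived threshold or its exact mu-law/semisoft combination), here is a sketch of the discrete Teager energy operator and a plain soft-thresholding step applied to a coefficient array; the threshold choice is a placeholder.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]          # simple edge handling
    return psi

def soft_threshold(coeffs, thr):
    """Soft thresholding of (e.g. wavelet packet) coefficients; stands in for
    the custom mu-law / semisoft function used in the paper."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

# toy usage: threshold a noisy coefficient vector with a TE-based placeholder threshold
rng = np.random.default_rng(6)
coeffs = rng.standard_normal(256)
thr = np.sqrt(np.mean(np.abs(teager_energy(coeffs))))   # hypothetical threshold choice
clean = soft_threshold(coeffs, thr)
```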

Journal Article•DOI•
Ke Li, Hai Wang, Xu Xiaolong, Yu Du, Yuansheng Liu, M. Omair Ahmad
15 May 2018-Sensors
TL;DR: The key factors that impact users’ end-to-end OTT web browsing service perception are analyzed by monitoring crowdsourced user perceptions and the intrinsic relationships among the key factors and the interactions between key quality indicators (KQI) are evaluated.
Abstract: Service perception analysis is crucial for understanding both user experience and network quality, as well as for the maintenance and optimization of mobile networks. Given the rapid development of the mobile Internet and over-the-top (OTT) services, the conventional network-centric mode of network operation and maintenance is no longer effective. Therefore, developing an approach to evaluate and optimize users' service perception has become increasingly important. Meanwhile, the development of a new sensing paradigm, mobile crowdsensing (MCS), makes it possible to evaluate and analyze users' OTT service perception from the end-user's point of view rather than from the network side. In this paper, the key factors that impact users' end-to-end OTT web browsing service perception are analyzed by monitoring crowdsourced user perceptions. The intrinsic relationships among the key factors and the interactions between key quality indicators (KQIs) are evaluated from several perspectives. Moreover, an analytical framework of perceptional degradation and a detailed algorithm are proposed whose goal is to identify the major factors that impact the perceptional degradation of the web browsing service as well as the significance of their contributions. Finally, a case study is presented to show the effectiveness of the proposed method using a dataset crowdsensed from a large number of smartphone users in a real mobile network. The proposed analytical framework forms a valuable solution for mobile network maintenance and optimization and can help improve web browsing service perception and network quality.

Posted Content•
TL;DR: It is argued that this method of noise estimation is capable of estimating the non-stationary noise accurately, and the modified complex spectrum is found to be a better representation of the enhanced speech spectrum.
Abstract: A two-step enhancement method based on spectral subtraction and phase spectrum compensation is presented in this paper for noisy speech in adverse environments involving non-stationary noise and medium to low levels of SNR. The magnitude of the noisy speech spectrum is modified in the first step of the proposed method by a spectral subtraction approach, where a new noise estimation method based on the low-frequency information of the noisy speech is introduced. We argue that this method of noise estimation is capable of estimating the non-stationary noise accurately. The phase spectrum of the noisy speech is modified in the second step, consisting of phase spectrum compensation, where an SNR-dependent approach is incorporated to determine the amount of compensation to be imposed on the phase spectrum. A modified complex spectrum is obtained by aggregating the magnitude from the spectral subtraction step and the modified phase spectrum from the phase compensation step, which is found to be a better representation of the enhanced speech spectrum. Speech files available in the NOIZEUS database are used to carry out extensive simulations for the evaluation of the proposed method.
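A hedged sketch of basic magnitude spectral subtraction with the noisy phase reused at synthesis (the paper additionally compensates the phase and uses a low-frequency-based noise estimate, neither of which is reproduced); the noise estimate below is a simple average of the first few frames and all parameters are placeholders.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, n_noise_frames=6, floor=0.02, nperseg=256):
    """Subtract an estimated noise magnitude from the noisy magnitude spectrum
    and resynthesize with the (unmodified) noisy phase."""
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)  # crude estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)             # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced

# toy usage: a sinusoid buried in white noise at 8 kHz
fs = 8000
n = np.arange(2 * fs)
noisy = (np.sin(2 * np.pi * 440 * n / fs)
         + 0.5 * np.random.default_rng(7).standard_normal(n.size))
enhanced = spectral_subtraction(noisy, fs)
```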

Posted Content•
TL;DR: The proposed method outperforms some of the state-of-the-art speech enhancement methods both at high and low levels of SNRs in terms of standard objective measures and subjective evaluations including formal listening tests.
Abstract: In this paper, for the real-time enhancement of noisy speech, a method of threshold determination based on modeling the Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech and noise by an Erlang-2 PDF is presented. The proposed method is computationally much faster than the existing wavelet-packet-based thresholding methods. A custom thresholding function based on a combination of the mu-law and semisoft thresholding functions is designed and exploited to apply the statistically derived threshold to the PWP coefficients. The proposed custom thresholding function works as a mu-law thresholding function, a semisoft thresholding function, or their combination, based on the probability of speech presence and absence in a subband of the PWP-transformed noisy speech. By using the speech files available in the NOIZEUS database, a number of simulations are performed to evaluate the performance of the proposed method for speech signals in the presence of Gaussian white and street noises. The proposed method outperforms some of the state-of-the-art speech enhancement methods at both high and low levels of SNR in terms of standard objective measures and subjective evaluations, including formal listening tests.

Posted Content•
TL;DR: A confidence parameter of noise estimation is introduced in the gain function of the proposed method to prevent subtraction of the overestimated and underestimated noise, which not only removes the noise efficiently but also prevents the speech distortion.
Abstract: A speech enhancement method based on a probabilistic geometric approach to spectral subtraction (PGA), performed on the short-time magnitude spectrum, is presented in this paper. A confidence parameter of the noise estimation is introduced in the gain function of the proposed method to prevent the subtraction of overestimated or underestimated noise, which not only removes the noise efficiently but also prevents speech distortion. The noise-compensated magnitude spectrum is then recombined with the unchanged phase spectrum to produce a modified complex spectrum prior to synthesizing an enhanced frame. Extensive simulations are carried out using the speech files available in the NOIZEUS database in order to evaluate the performance of the proposed method.

Proceedings Article•DOI•
13 Apr 2018
TL;DR: A new method is proposed to estimate the a priori probability in the MAP metric of the H.264 intra-mode decoder from the intra block modes located in the spatially adjacent macroblocks previously generated up to the current stage of the decoding tree.
Abstract: In this paper, the residual redundancy in compressed videos is exploited to alleviate transmission errors using joint source-channel arithmetic decoding. A new method is proposed to estimate the a priori probability in the MAP metric of the H.264 intra-mode decoder. The decoder generates a decoding tree using a breadth-first search algorithm. The introduced statistical model is then implemented stage by stage over the decoding tree. In this model, the a priori PMF of the intra block modes in a macroblock is estimated from the intra block modes located in its spatially adjacent macroblocks previously generated up to the current stage of the decoding tree. The estimated PMFs are categorized as either reliable or unreliable based on their local entropies. In the unreliable case, the decoder assumes a uniform PMF and switches to the ML metric instead. The simulation results show that the proposed method reduces the error rate by 1% to 13% at various SNRs compared to the ML decoder.

Proceedings Article•DOI•
01 Dec 2018
TL;DR: In this paper, a variational step was proposed to remove the heavy tail of the noise distribution, which can significantly improve denoising performance of mixed AWGN-IN using well-established methods.
Abstract: The reduction of mixed noise is an ill-posed problem due to the occurrence of contrasting distributions of noise in the image. The mixed noise that is usually encountered is the simultaneous presence of additive white Gaussian noise (AWGN) and impulse noise (IN). A standard approach to denoise an image with such corruption is to apply a rank-order filter (ROF) followed by an efficient linear filter to remove the residual noise. However, the ROF cannot completely remove the heavy tail of the noise distribution originating from the IN, and thus the denoising performance can be suboptimal. In this paper, we present a variational step to remove the heavy tail of the noise distribution. Through experiments, it is shown that this approach can significantly improve the denoising performance for mixed AWGN-IN when using well-established methods.

Proceedings Article•DOI•
13 May 2018
TL;DR: This work develops learning techniques inspired by the way a human brain identifies images, building CNN models that are provided with the most useful information by leveraging the joint information from wavelet-compressed image patches and class activation maps.
Abstract: Image aesthetics classification is the method of visualizing and classifying images based on the visual signatures in the data rather than the semantics associated with them. In this work, we develop learning techniques that are inspired by the way a human brain identifies images. We develop CNN models by providing the most useful information to the network, leveraging the joint information from wavelet-compressed image patches and class activation maps (CAMs). The performance of the network in recognizing images based on simple visual-aesthetics signatures is shown to be better than that of existing techniques, with a few caveats.

Proceedings Article•DOI•
01 Oct 2018
TL;DR: A novel technique for an effective segmentation of the moving foreground from video sequences with a dynamic background is developed and results show the superiority of the proposed scheme in providing a segmented foreground binary mask that fits more closely with the corresponding ground truth mask than those obtained by the other methods do.
Abstract: Segmentation of a moving foreground from video sequences, in the presence of a rapidly changing background, is a difficult problem. In this paper, a novel technique for an effective segmentation of the moving foreground from video sequences with a dynamic background is developed. The segmentation problem is treated as a problem of classifying the foreground and background pixels of a video frame using the color components of the pixels as multiple features of the images. The gray levels of the pixels and the hue and saturation level components in the HSV representation of the pixels of a frame are used to form a scalar-valued feature image. This feature image incorporating multiple features of the pixels is then used to devise a simple classification scheme in the framework of a support vector machine classifier. Unlike some other data classification approaches for foreground segmentation in which a priori knowledge of the shape and size of the moving foreground is essential, in the proposed method, training samples are obtained in an automatic manner. In order to assess the effectiveness of the proposed method, the new scheme is applied to a number of video sequences with a dynamic background and the results are compared with those obtained by using other existing methods. The subjective and objective results show the superiority of the proposed scheme in providing a segmented foreground binary mask that fits more closely with the corresponding ground truth mask than those obtained by the other methods do.
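As a hedged sketch of per-pixel classification with color features and an SVM (the automatic training-sample selection and the specific scalar feature-image construction of the paper are not reproduced); scikit-image is used for the HSV conversion and the per-pixel labels here are hypothetical.

```python
import numpy as np
from skimage.color import rgb2hsv, rgb2gray
from sklearn.svm import SVC

def pixel_features(rgb_frame):
    """Stack gray level, hue, and saturation as a 3-feature vector per pixel."""
    hsv = rgb2hsv(rgb_frame)
    gray = rgb2gray(rgb_frame)
    return np.stack([gray, hsv[..., 0], hsv[..., 1]], axis=-1).reshape(-1, 3)

# toy usage: train on a frame with known foreground/background labels,
# then classify the pixels of a new frame to obtain a binary mask
rng = np.random.default_rng(8)
train_frame = rng.random((32, 32, 3))
train_labels = rng.integers(0, 2, size=32 * 32)    # hypothetical per-pixel labels
clf = SVC(kernel="rbf").fit(pixel_features(train_frame), train_labels)
mask = clf.predict(pixel_features(rng.random((32, 32, 3)))).reshape(32, 32)
```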

Journal Article•DOI•
TL;DR: Quantitative and qualitative performance evaluations on three benchmark datasets demonstrate that the proposed tracking algorithm outperforms the state-of-the-art methods.
Abstract: The success of correlation filters in visual tracking has attracted much attention in computer vision due to their high efficiency and performance. However, they are not equipped with a mechanism to cope with challenging situations such as scale variations, out-of-view targets, and camera motion. With the aim of dealing with such situations, a collaborative tracking scheme based on discriminative and generative models is proposed. Instead of finding all the affine motion parameters of the target by the combined likelihood of these models, the correlation filters, based on the discriminative model, are used to find the position of the target, whereas 2D robust coding in a bilateral 2DPCA subspace, based on the generative model, is used to find the other affine motion parameters of the target. Further, a 2D robust coding distance is proposed to differentiate the candidate samples from the subspace and is used to compute the observation likelihood in the generative model. In addition, it is proposed to generate a robust occlusion map from the weights obtained during the residual minimization, and a novel update mechanism of the appearance model for both the correlation filters and the bilateral 2DPCA subspace is proposed. The proposed method is evaluated on the challenging image sequences available in the OTB-50, VOT2016, and UAV20L benchmark datasets, and its performance is compared with that of the state-of-the-art tracking algorithms. In contrast to OTB-50 and VOT2016, the UAV20L dataset contains long-duration sequences with additional challenges introduced by both camera motion and viewpoints in three dimensions. Quantitative and qualitative performance evaluations on the three benchmark datasets demonstrate that the proposed tracking algorithm outperforms the state-of-the-art methods.
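A hedged illustration of the correlation-filter localization step only (the 2DPCA-based generative component, the robust coding distance, and the update mechanism are not sketched); the filter here is an arbitrary template and the computation is plain FFT-based cross-correlation.

```python
import numpy as np

def correlation_peak(search_patch, filt):
    """Locate the target as the peak of the FFT-based cross-correlation
    response between a search patch and a learned filter template."""
    P = np.fft.fft2(search_patch)
    F = np.fft.fft2(filt, s=search_patch.shape)      # zero-pad filter to patch size
    response = np.real(np.fft.ifft2(P * np.conj(F)))
    return np.unravel_index(int(np.argmax(response)), response.shape)

# toy usage: the peak should land where the template was inserted
rng = np.random.default_rng(9)
template = rng.random((8, 8))
patch = 0.05 * rng.random((64, 64))
patch[20:28, 30:38] += template
print(correlation_peak(patch, template))             # roughly (20, 30)
```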