
Showing papers in "EURASIP Journal on Advances in Signal Processing in 2011"


Journal ArticleDOI
TL;DR: This study explains how link and system level simulations are connected and shows how the link level simulator serves as a reference for designing the system level simulator, and compares the accuracy of the PHY modeling at system level by means of simulations performed both with bit-accurate link level simulations and PHY-model-based system level simulations.
Abstract: In this article, we introduce MATLAB-based link and system level simulation environments for UMTS Long-Term Evolution (LTE). The source codes of both simulators are available under an academic non-commercial use license, allowing researchers full access to standard-compliant simulation environments. Owing to the open source availability, the simulators enable reproducible research in wireless communications and comparison of novel algorithms. In this study, we explain how link and system level simulations are connected and show how the link level simulator serves as a reference to design the system level simulator. We compare the accuracy of the PHY modeling at system level by means of simulations performed both with bit-accurate link level simulations and PHY-model-based system level simulations. We highlight some of the currently most interesting research questions for LTE, and explain by some research examples how our simulators can be applied.
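
At system level, the PHY is typically abstracted by compressing post-equalization subcarrier SINRs into an effective SINR and looking up an AWGN block error rate curve obtained from link level runs. Below is a minimal Python sketch of such an exponential effective SINR mapping (EESM); the BLER curve shape and the calibration factor beta are placeholder assumptions, not values from the simulators.

```python
import numpy as np

# Hypothetical AWGN reference: BLER vs. SINR for one MCS, as would be
# produced offline by bit-accurate link level simulations.
sinr_grid_db = np.linspace(-10.0, 20.0, 61)
bler_curve = 1.0 / (1.0 + np.exp(1.5 * (sinr_grid_db - 5.0)))  # placeholder shape

def eesm(subcarrier_sinr_db, beta=2.0):
    """Exponential Effective SINR Mapping: compress per-subcarrier SINRs
    into one scalar comparable against the AWGN reference curve."""
    sinr_lin = 10.0 ** (np.asarray(subcarrier_sinr_db) / 10.0)
    eff_lin = -beta * np.log(np.mean(np.exp(-sinr_lin / beta)))
    return 10.0 * np.log10(eff_lin)

def predicted_bler(subcarrier_sinr_db):
    """System level PHY abstraction: effective SINR -> BLER lookup."""
    return np.interp(eesm(subcarrier_sinr_db), sinr_grid_db, bler_curve)

print(predicted_bler([3.0, 7.5, -1.0, 10.0]))
```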

292 citations


Journal ArticleDOI
TL;DR: A review of the pan-sharpening methods proposed in the literature, giving a clear classification and a description of their main characteristics, together with an analysis of how the quality of pansharpened images can be assessed both visually and quantitatively.
Abstract: There exist a number of satellites on different earth observation platforms, which provide multispectral images together with a panchromatic image, that is, an image containing reflectance data representative of a wide range of bands and wavelengths. Pansharpening is a pixel-level fusion technique used to increase the spatial resolution of the multispectral image while simultaneously preserving its spectral information. In this paper, we provide a review of the pan-sharpening methods proposed in the literature giving a clear classification of them and a description of their main characteristics. Finally, we analyze how the quality of the pansharpened images can be assessed both visually and quantitatively and examine the different quality measures proposed for that purpose.
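
As a concrete instance of the component-substitution family reviewed here, the classic Brovey transform sharpens each multispectral band by the ratio of the panchromatic image to the per-pixel band mean; a minimal sketch (assuming the MS cube is already resampled to the PAN grid) follows.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-6):
    """Brovey transform: scale each multispectral band by the ratio of
    the panchromatic image to the per-pixel band mean (intensity).
    ms:  (H, W, B) float array, already resampled to the PAN grid.
    pan: (H, W) float array."""
    intensity = ms.mean(axis=2, keepdims=True)
    return ms * (pan[..., None] / (intensity + eps))
```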

262 citations


Journal ArticleDOI
TL;DR: If experiments are performed on a large data set, the algorithm is compared to state-of-the-art methods, and the code and/or data are well documented and available online, then the whole community benefits and it becomes easier to build upon each other's work.
Abstract: Reproducible research results are becoming an increasingly important issue as the systems under investigation grow steadily in complexity, making it almost impossible to judge the accuracy of research results merely from the bare paper presentation.

152 citations


Journal ArticleDOI
TL;DR: An implementation of TDD reciprocity based zero-forcing linear precoding on a wireless testbed with considerable performance improvements over reference schemes and a calibration technique which self-calibrates the base-station without the need for help from other nodes is described.
Abstract: We describe an implementation of TDD reciprocity based zero-forcing linear precoding on a wireless testbed. A calibration technique which self-calibrates the base-station without the need for help from other nodes is described. Performance results in terms of downlink channel estimation error as well as bit error rate (BER) and signal to interference noise and distortion ratio (SINDR) are presented for a scenario with two base-stations and two mobile stations, with two antennas at the base-stations and a single antenna at the mobile-station. The results show considerable performance improvements over reference schemes (such as maximum ratio transmission). However, our analysis also reveals that the hardware impairments significantly limit the performance achieved. We further investigate how to model these impairments and attempt to predict the SINDR, such as what would be needed in a coordinated multipoint (CoMP) scenario where scheduling is performed jointly over the two cells. Although the results are obtained for a MISO scenario the general conclusions are relevant also for MIMO scenarios.
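
Stripped of the testbed and calibration machinery, reciprocity-based zero-forcing reduces to the right pseudo-inverse of the estimated downlink channel. A minimal sketch, with hypothetical dimensions matching the paper's scenario (two single-antenna users, two base-station antennas):

```python
import numpy as np

def zf_precoder(H):
    """Zero-forcing precoder for a downlink channel H (n_users x n_tx):
    right pseudo-inverse with total-power normalization, so that H @ W
    is a scaled identity and inter-user interference vanishes."""
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    return W / np.linalg.norm(W, 'fro')

# Toy check: two single-antenna users, two base-station antennas.
H = (np.random.randn(2, 2) + 1j * np.random.randn(2, 2)) / np.sqrt(2)
print(np.round(H @ zf_precoder(H), 3))  # diagonal: no cross-user leakage
```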

107 citations


Journal ArticleDOI
TL;DR: This article proposes a novel hyperspectral imagery super-resolution method utilizing sparse representation and a spectral mixing model, and introduces an adaptive regularization term into the sparse representation framework by incorporating the linear spectral mixing model.
Abstract: Owing to instrument limitations and imperfect imaging optics, it is difficult to acquire high spatial resolution hyperspectral imagery. Low spatial resolution results in many mixed pixels and greatly degrades detection and recognition performance, affecting related applications in civil and military fields. As a powerful statistical image modeling technique, sparse representation can be used to analyze hyperspectral imagery efficiently. Hyperspectral imagery is intrinsically sparse in the spatial and spectral domains, and super-resolution quality largely depends on whether this prior knowledge is exploited properly. In this article, we propose a novel hyperspectral imagery super-resolution method utilizing sparse representation and a spectral mixing model. Based on the sparse representation model and the hyperspectral image acquisition process model, small patches of hyperspectral observations from different wavelengths can be represented as weighted linear combinations of a small number of atoms in a pre-trained dictionary. Super-resolution is then treated as a least squares problem with sparsity constraints. To maintain spectral consistency, we further introduce an adaptive regularization term into the sparse representation framework by incorporating the linear spectral mixing model. Extensive experiments validate that the proposed method achieves much better results.
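
One common way to realize the "least squares with sparse constraints" step is orthogonal matching pursuit over coupled low/high-resolution dictionaries; the sketch below assumes such pre-trained dictionaries D_low and D_high (hypothetical names) and omits the paper's adaptive spectral-mixing regularization.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_sr_patch(y_low, D_low, D_high, n_nonzero=5):
    """Code a low-resolution patch over the low-res dictionary, then
    synthesize the high-res patch from the coupled high-res dictionary.
    y_low: (d_low,); D_low: (d_low, n_atoms); D_high: (d_high, n_atoms)."""
    alpha = orthogonal_mp(D_low, y_low, n_nonzero_coefs=n_nonzero)
    return D_high @ alpha
```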

95 citations


Journal ArticleDOI
TL;DR: This paper presents a meta-modelling system that automates the labor-intensive, time-consuming, and therefore expensive process of designing and implementing digital music synthesis systems.
Affiliations: 1. Dipartimento di Elettronica e Informazione (DEI), Politecnico di Milano, Milano, Italy; 2. Department of Signal Processing and Communications, Helmut-Schmidt-University / University of Federal Armed Forces, Hamburg, Germany; 3. Music Technology Group, Department of Information and Communication Technologies & Audiovisual Institute, Universitat Pompeu Fabra, Barcelona, Spain; 4. Centre for Digital Music (C4DM), School of Electronic Engineering and Computer Science, Queen Mary University of London, UK; 5. Department of Engineering, University of Cambridge, UK

92 citations


Journal ArticleDOI
TL;DR: The results confirm that the short-term Rényi entropy can be an effective tool for estimating the local number of components present in the signal, using a quadratic separable kernel TFD.
Abstract: The time-frequency Rényi entropy provides a measure of the complexity of a nonstationary multicomponent signal in the time-frequency plane. When the complexity of a signal corresponds to the number of its components, this information is measured as the Rényi entropy of the time-frequency distribution (TFD) of the signal. This article presents a solution to the problem of detecting the number of components present in a short-time interval of the signal's TFD, using the short-term Rényi entropy. The method is automatic and does not require any prior information about the signal. The algorithm is applied to both synthetic and real data, using a quadratic separable kernel TFD. The results confirm that the short-term Rényi entropy can be an effective tool for estimating the local number of components present in the signal. The key aspect of selecting a suitable TFD is also discussed.
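
The counting property behind the method can be stated compactly: for (near-)positive TFDs, each additional equal-energy component raises the order-3 Rényi entropy by about one bit, so the local component count can be read off against a single-component reference. A minimal sketch, assuming well-behaved (non-negative) TFD regions:

```python
import numpy as np

def renyi_entropy(tfd, alpha=3):
    """Order-alpha Rényi entropy of a (normalized) TFD region."""
    p = tfd / np.sum(tfd)
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def local_component_count(tfd_window, tfd_reference):
    """Counting property: each extra equal-energy component adds ~1 bit,
    so N is approximately 2**(H - H_ref), with H_ref measured on a
    single-component reference signal."""
    return 2.0 ** (renyi_entropy(tfd_window) - renyi_entropy(tfd_reference))
```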

87 citations


Journal ArticleDOI
TL;DR: This article shows how to use tensor calculus to extend matrix-based MOS schemes and presents the proposed multi-dimensional model order selection scheme based on the closed-form PARAFAC algorithm, which is only applicable to multi-dimensional data.
Abstract: Multi-dimensional model order selection (MOS) techniques achieve an improved accuracy, reliability, and robustness, since they consider all dimensions jointly during the estimation of parameters. Additionally, from fundamental identifiability results of multi-dimensional decompositions, it is known that the number of main components can be larger when compared to matrix-based decompositions. In this article, we show how to use tensor calculus to extend matrix-based MOS schemes and we also present our proposed multi-dimensional model order selection scheme based on the closed-form PARAFAC algorithm, which is only applicable to multi-dimensional data. In general, as shown by means of simulations, the Probability of correct Detection (PoD) of our proposed multi-dimensional MOS schemes is much better than the PoD of matrix-based schemes.
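
A crude way to see how matrix-based MOS extends to tensors: apply a detector to the singular values of each mode unfolding and fuse the per-mode estimates. The largest-gap rule below is a stand-in heuristic, not the statistically grounded criteria (or the closed-form PARAFAC scheme) used in the article.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the n-th axis to the front and flatten."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_order(T, mode, eps=1e-12):
    """Largest-gap detector on the singular values of one unfolding."""
    s = np.linalg.svd(unfold(T, mode), compute_uv=False)
    return int(np.argmax(s[:-1] / (s[1:] + eps)) + 1)

def multidim_order(T):
    """Combine per-mode estimates; taking the max is one crude fusion rule."""
    return max(mode_order(T, m) for m in range(T.ndim))
```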

80 citations


Journal ArticleDOI
TL;DR: The obtained results are compared to the multicomponent-signal ICI-based IF estimation method for various window types and SNRs, showing an estimation accuracy improvement in terms of mean squared error (MSE) of up to 23%.
Abstract: A method for instantaneous frequency (IF) estimation of the components of multicomponent signals at low signal-to-noise ratios (SNR) is proposed. The method combines a newly proposed modification of a blind source separation (BSS) algorithm for component separation with an improved adaptive IF estimation procedure based on the modified sliding pairwise intersection of confidence intervals (ICI) rule. The obtained results are compared to the multicomponent-signal ICI-based IF estimation method for various window types and SNRs, showing an estimation accuracy improvement in terms of mean squared error (MSE) of up to 23%. Furthermore, the highest improvement is achieved at low SNR values, where many existing methods fail.

64 citations


Journal ArticleDOI
TL;DR: The proposed detection method shows high detection accuracy of bird tonal components and significant recognition accuracy improvements over the Mel-frequency cepstral coefficients, in both standard and noise-compensated models, and strong robustness to mismatch between the training and testing conditions.
Abstract: This paper presents a study of automatic detection and recognition of tonal bird sounds in noisy environments. The detection of spectro-temporal regions containing bird tonal vocalisations is based on exploiting the spectral shape to identify sinusoidal components in the short-time spectrum. The detection method provides a tonal-based feature representation that is employed for automatic bird recognition. The recognition system uses Gaussian mixture models to model 165 different bird syllables, produced by 95 bird species. Standard models, as well as models compensating for the effect of the noise, are employed. Experiments are performed on bird sound recordings corrupted by white noise and real-world environmental noise. The proposed detection method shows high detection accuracy of bird tonal components. The employed tonal-based features show significant recognition accuracy improvements over the Mel-frequency cepstral coefficients, in both standard and noise-compensated models, and strong robustness to mismatch between the training and testing conditions.
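
A sinusoidality test of this general kind can be approximated by comparing the local shape of each short-time spectral peak with the analysis window's mainlobe; the sketch below is a crude stand-in for the paper's detector, with all thresholds chosen arbitrarily.

```python
import numpy as np
from scipy.signal import find_peaks, get_window

def tonal_peak_bins(frame, n_fft=1024, shape_thresh=0.95):
    """Flag short-time spectral peaks whose local shape matches the
    analysis window's mainlobe (cosine similarity over 3 bins)."""
    win = get_window('hann', len(frame))
    spec = np.abs(np.fft.rfft(frame * win, n_fft))
    ref = np.abs(np.fft.rfft(win, n_fft))[:3]      # mainlobe template
    ref /= np.linalg.norm(ref)
    peaks, _ = find_peaks(spec, height=0.05 * spec.max())
    tonal = []
    for p in peaks:
        seg = spec[p:p + 3]
        if len(seg) == 3 and seg @ ref / (np.linalg.norm(seg) + 1e-12) > shape_thresh:
            tonal.append(p)
    return tonal
```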

62 citations


Journal ArticleDOI
TL;DR: A model-based algorithm for detection of percussive events is introduced and the results indicate that the approach is promising and applicable in design and development of interactive musical systems.
Abstract: Interactive musical systems require real-time, low-latency, accurate, and reliable event detection and classification algorithms. In this paper, we introduce a model-based algorithm for detection of percussive events and test the algorithm on the detection and classification of different percussive sounds. We focus on tuning the algorithm for a good compromise between temporal precision, classification accuracy and low latency. The model is trained offline on different percussive sounds using the expectation maximization approach for learning spectral templates for each sound and is able to run online to detect and classify sounds from audio stream input by a Hidden Markov Model. Our results indicate that the approach is promising and applicable in design and development of interactive musical systems.

Journal ArticleDOI
TL;DR: Two reduced-complexity (RC) versions of the IAA and IAA based on maximum likelihood (IAA-ML) algorithms are proposed and provide similar results to those obtained with their original counterparts.
Abstract: We address the 2D direction-of-arrival (DOA) estimation problem in scenarios with coherent sources. More specifically, we adopt beamforming solutions based on the iterative adaptive approach (IAA) recently proposed in the literature. The motivation for this adoption mainly comes from the excellent behavior these beamformers provide in scenarios with coherent sources. Nonetheless, these strategies suffer from a prohibitive computational complexity, especially in 2D scenarios. To alleviate this complexity, we propose two reduced-complexity (RC) versions of the IAA and the IAA based on maximum likelihood (IAA-ML) algorithms. The proposed beamformers are referred to as IAA-RC and IAA-ML-RC and provide similar results to those obtained with their original counterparts. Computational complexity, however, is further reduced. Numerical results presented in the paper show that the computational burden can be decreased by 52% with our proposed solutions in the considered scenarios.
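
For reference, the full-complexity IAA iteration that the RC variants approximate looks roughly as follows; the grid, the beamformer initialization, and the iteration count are assumptions of this sketch.

```python
import numpy as np

def iaa(Y, A, n_iter=15):
    """Iterative Adaptive Approach power spectrum over a steering grid.
    Y: (M, N) array snapshots; A: (M, K) steering matrix of K candidates."""
    M = Y.shape[0]
    p = np.mean(np.abs(A.conj().T @ Y) ** 2, axis=1) / M**2   # beamformer init
    for _ in range(n_iter):
        R = (A * p) @ A.conj().T + 1e-9 * np.eye(M)           # R = A diag(p) A^H
        Ri = np.linalg.inv(R)
        num = A.conj().T @ Ri @ Y                             # (K, N)
        den = np.einsum('mk,mn,nk->k', A.conj(), Ri, A)       # a_k^H R^-1 a_k
        p = np.mean(np.abs(num / den[:, None]) ** 2, axis=1)
    return p.real
```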

Journal ArticleDOI
TL;DR: New pixel- and region-based multiresolution image fusion algorithms are introduced in this paper using the Parameterized Logarithmic Image Processing (PLIP) model, a framework more suitable for processing images.
Abstract: New pixel- and region-based multiresolution image fusion algorithms are introduced in this paper using the Parameterized Logarithmic Image Processing (PLIP) model, a framework more suitable for processing images. A mathematical analysis shows that the Logarithmic Image Processing (LIP) model and standard mathematical operators are extreme cases of the PLIP model operators. Moreover, the PLIP model operators also have the ability to take on cases in between LIP and standard operators based on the visual requirements of the input images. PLIP-based multiresolution decomposition schemes are developed and thoroughly applied for image fusion as analysis and synthesis methods. The new decomposition schemes and fusion rules yield novel image fusion algorithms which are able to provide visually more pleasing fusion results. LIP-based multiresolution image fusion approaches are consequently formulated due to the generalized nature of the PLIP model. Computer simulations illustrate that the proposed image fusion algorithms using the Parameterized Logarithmic Laplacian Pyramid, Parameterized Logarithmic Discrete Wavelet Transform, and Parameterized Logarithmic Stationary Wavelet Transform outperform their respective traditional approaches by both qualitative and quantitative means. The algorithms were tested over a range of different image classes, including out-of-focus, medical, surveillance, and remote sensing images.
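
The PLIP operators referred to here can be written down in a few lines; with gamma equal to the maximum gray level they reduce to the LIP operators, and as gamma grows they approach standard arithmetic, which is the "extreme cases" observation in the abstract. A sketch (the gamma value is arbitrary):

```python
def plip_add(a, b, gamma=300.0):
    """PLIP addition: a (+) b = a + b - a*b/gamma.
    gamma = max gray level -> LIP model; gamma -> inf -> ordinary addition."""
    return a + b - (a * b) / gamma

def plip_mult(c, a, gamma=300.0):
    """PLIP scalar multiplication: c (*) a = gamma - gamma*(1 - a/gamma)**c."""
    return gamma - gamma * (1.0 - a / gamma) ** c
```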

Journal ArticleDOI
TL;DR: A new approximation basis, the Gaussian basis, which is more compact in both the time and frequency domains, is proposed, and the reconstruction results from different bases under different parameter settings are compared.
Abstract: Time-encoding circuits operate in an asynchronous mode and thus are very suitable for ultra-wideband applications. However, this asynchronous mode leads to nonuniform sampling that requires computationally complex decoding algorithms to recover the input signals. In the encoding and decoding process, many non-idealities in the circuits and the computing system can affect the final signal recovery. In this article, the sources of the distortion are analyzed for proper parameter setting. In the analysis, the decoding problem is generalized as a function approximation problem. The characteristics of the bases used in existing algorithms are examined. These bases typically require long time support to achieve good frequency properties. Long time support not only increases computational complexity, but also increases the approximation error when the signal is reconstructed through short patches. Hence, a new approximation basis, the Gaussian basis, which is more compact in both the time and frequency domains, is proposed. The reconstruction results from different bases under different parameter settings are compared.
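
Once decoding is cast as function approximation, reconstruction over a Gaussian basis is a linear least-squares problem; a minimal sketch (the centers and width are free parameters the designer must set):

```python
import numpy as np

def gaussian_basis_reconstruct(t, y, centers, width):
    """Least-squares fit of samples (t, y) over a Gaussian basis and
    evaluation of the reconstruction at the same instants.
    t, y: (n,) arrays; centers: (k,) array of basis centers."""
    G = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return G @ coef
```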

Journal ArticleDOI
TL;DR: A method for predicting the user's mental state for the development of more efficient and usable spoken dialogue systems; implemented in the UAH system, it shows that taking the user's mental state into account improves system performance as well as its perceived quality.
Abstract: In this paper we propose a method for predicting the user mental state for the development of more efficient and usable spoken dialogue systems. This prediction, carried out for each user turn in the dialogue, makes it possible to adapt the system dynamically to the user needs. The mental state is built on the basis of the emotional state of the user and their intention, and is recognized by means of a module conceived as an intermediate phase between natural language understanding and the dialogue management in the architecture of the systems. We have implemented the method in the UAH system, for which the evaluation results with both simulated and real users show that taking into account the user's mental state improves system performance as well as its perceived quality.

Journal ArticleDOI
TL;DR: This paper presents a navigation framework that requires no hardware beyond the already existing naval radar sensor and shows that visual radar features can be used to accurately estimate the vessel trajectory over an extensive data set.
Abstract: A vessel navigating in a critical environment such as an archipelago requires very accurate movement estimates. Intentional or unintentional jamming makes GPS unreliable as the only source of information, so an additional independent supporting navigation system should be used. In this paper, we suggest estimating the vessel movements using a sequence of radar images from the preexisting body-fixed radar. Island landmarks in the radar scans are tracked between multiple scans using visual features. This provides information not only about the position of the vessel but also about its course and velocity. We present a navigation framework that requires no hardware beyond the already existing naval radar sensor. Experiments show that visual radar features can be used to accurately estimate the vessel trajectory over an extensive data set.

Journal ArticleDOI
TL;DR: In this paper, a trilinear decomposition-based 2D-DOA estimation algorithm for L-shaped arrays is proposed; the proposed algorithm requires neither spectral peak searching nor pairing.
Abstract: Two-dimensional (2D) direction-of-arrival (DOA) estimation plays an important role in array signal processing. In this article, we address the problem of blind 2D-DOA estimation with an L-shaped array. The article links the 2D-DOA estimation problem to the trilinear model and, exploiting this link, derives a trilinear decomposition-based 2D-DOA estimation algorithm for L-shaped arrays. The proposed algorithm performs well without spectral peak searching or pairing. Moreover, our algorithm has much better 2D-DOA estimation performance than the estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithm and the propagator method. Simulation results illustrate the validity of the algorithm.

Journal ArticleDOI
TL;DR: Under which conditions filtering can visibly improve the image quality is addressed, and it is demonstrated that it is possible to roughly estimate whether or not the visual quality can clearly be improved by filtering.
Abstract: This article addresses the conditions under which filtering can visibly improve image quality. The key points are the following. First, we analyze filtering efficiency for 25 test images from the color image database TID2008. This database allows assessing filter efficiency for images corrupted by different noise types at several levels of noise variance. Second, the limit of filtering efficiency is determined for independent and identically distributed (i.i.d.) additive noise and compared to the output mean square error of state-of-the-art filters. Third, component-wise and vector denoising are studied, and the latter approach is demonstrated to be more efficient. Fourth, using modern visual quality metrics, we determine for which levels of i.i.d. and spatially correlated noise the noise in original images, or the residual noise and distortions due to filtering in output images, is practically invisible. We also demonstrate that it is possible to roughly estimate whether or not the visual quality can clearly be improved by filtering.

Journal ArticleDOI
TL;DR: Three different ways of using SCFDE with DOW communications are proposed and it is shown that they exhibit lower PAPR and provide better bit-error rate (BER) performance in the presence of the LED nonlinearity.
Abstract: We investigate the use of single carrier frequency domain equalization (SCFDE) for diffuse optical wireless (DOW) communications. Recently, orthogonal frequency division multiplexing (OFDM) has been applied to DOW communications. However, due to its high peak-to-average power ratio (PAPR), the performance of OFDM can be severely affected by the nonlinear characteristics of light emitting diodes (LED). To avoid the PAPR problem, we present in this paper a modified form of SCFDE for DOW communications. We propose three different ways of using SCFDE with DOW communications and show that they exhibit lower PAPR and provide better bit-error rate (BER) performance in the presence of the LED nonlinearity.
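
The PAPR gap that motivates SCFDE is easy to reproduce: QPSK symbols sent directly (single carrier, pulse shaping ignored) have a constant envelope, while the same symbols spread over OFDM subcarriers do not. A toy comparison:

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio in dB."""
    p = np.abs(x) ** 2
    return 10.0 * np.log10(p.max() / p.mean())

N = 256
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
sym = np.random.default_rng(0).choice(qpsk, N)

ofdm_time = np.fft.ifft(sym) * np.sqrt(N)  # symbols mapped to subcarriers
sc_time = sym                              # single carrier: symbols sent directly
print(papr_db(ofdm_time), papr_db(sc_time))  # OFDM ~10 dB vs. 0 dB here
```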

Journal ArticleDOI
TL;DR: The recently introduced multivariate empirical mode decomposition (MEMD) is used to quantify multivariate phase synchrony within a network of oscillators using measures of multiple correlation and complexity.
Abstract: Quantifying the phase synchrony between signals is important in many different applications, including the study of chaotic oscillators in physics and the modeling of the joint dynamics between channels of brain activity recorded by electroencephalogram (EEG). Current measures of phase synchrony rely on either the wavelet transform or the Hilbert transform of the signals and suffer from constraints such as the limited time-frequency resolution of wavelet analysis and the prefiltering requirement of the Hilbert transform. Furthermore, current phase synchrony measures are limited to quantifying bivariate relationships and do not reveal any information about multivariate synchronization patterns, which are important for understanding the underlying oscillatory networks. In this paper, we address these two issues by employing the recently introduced multivariate empirical mode decomposition (MEMD) for quantifying multivariate phase synchrony. First, an MEMD-based bivariate phase synchrony measure is defined for a more robust description of time-varying phase synchrony across frequencies. Second, the proposed bivariate phase synchronization index is used to quantify multivariate synchronization within a network of oscillators using measures of multiple correlation and complexity. Finally, the proposed measures are applied to both simulated networks of chaotic oscillators and real EEG data.
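
The bivariate building block is a standard phase-locking value between two instantaneous phase series; in the paper these phases come from frequency-matched MEMD IMFs, whereas the sketch below uses Hilbert phases purely as a stand-in.

```python
import numpy as np
from scipy.signal import hilbert

def plv(x, y):
    """Bivariate phase-locking value in [0, 1]; 1 = perfect locking."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))
```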

Journal ArticleDOI
TL;DR: This paper describes an original method for multilabel classification problems derived from a Bayesian version of the k-nearest neighbor (k-NN) rule, which takes into account the dependencies between labels.
Abstract: In multilabel classification, each instance in the training set is associated with a set of labels, and the task is to output a label set whose size is unknown a priori for each unseen instance. The most commonly used approach for multilabel classification is one where a binary classifier is learned independently for each possible class. However, multilabeled data generally exhibit relationships between labels, and this approach fails to take such relationships into account. In this paper, we describe an original method for multilabel classification problems derived from a Bayesian version of the k-nearest neighbor (k-NN) rule. The method developed here is an improvement on an existing method for multilabel classification, namely multilabel k-NN, in that it takes into account the dependencies between labels. Experiments on simulated and benchmark datasets show the usefulness and the efficiency of the proposed approach as compared to other existing methods.
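
For orientation, a bare-bones neighborhood-vote multilabel k-NN is sketched below; it predicts each label from the fraction of neighbors carrying it and deliberately ignores the label dependencies that the paper's Bayesian rule models (the parameter values are arbitrary).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_multilabel_predict(X_train, Y_train, X_test, k=8, threshold=0.5):
    """Predict each label from the fraction of the k nearest training
    neighbors that carry it. Y_train: (n_train, n_labels) binary matrix."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)
    votes = Y_train[idx].mean(axis=1)          # (n_test, n_labels)
    return (votes >= threshold).astype(int)
```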

Journal ArticleDOI
TL;DR: This study implements fast and exact variants of the hard and fuzzy c-means algorithms with several initialization schemes and then compares the resulting quantizers on a diverse set of images to demonstrate that fuzzy c-means is significantly slower than hard c-means, and that with respect to output quality, the former algorithm is neither objectively nor subjectively superior to the latter.
Abstract: Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. Recent studies have demonstrated the effectiveness of hard c-means (k-means) clustering algorithm in this domain. Other studies reported similar findings pertaining to the fuzzy c-means algorithm. Interestingly, none of these studies directly compared the two types of c-means algorithms. In this study, we implement fast and exact variants of the hard and fuzzy c-means algorithms with several initialization schemes and then compare the resulting quantizers on a diverse set of images. The results demonstrate that fuzzy c-means is significantly slower than hard c-means, and that with respect to output quality, the former algorithm is neither objectively nor subjectively superior to the latter.
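
A hard c-means color quantizer of the kind compared in the study fits in a few lines with a generic k-means implementation (this sketch is not one of the fast/exact variants benchmarked in the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(image, n_colors=16):
    """Hard c-means color quantization of an (H, W, 3) image: cluster the
    pixels in RGB space and replace each by its cluster centroid."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=4).fit(pixels)
    return km.cluster_centers_[km.labels_].reshape(h, w, 3)
```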

Journal ArticleDOI
TL;DR: A low-cost system for 2D eye gaze estimation with low-resolution webcam images is presented; it does not employ the real pupil-center as a reference point for gaze estimation, making it more robust to corneal reflection.
Abstract: In this article, a low-cost system for 2D eye gaze estimation with low-resolution webcam images is presented. Two algorithms are proposed for this purpose: one for eyeball detection with a stable approximate pupil-center, and the other for detecting the direction of eye movements. The eyeball is detected using the deformable angular integral search by minimum intensity (DAISMI) algorithm. The deformable template-based 2D gaze estimation (DTBGE) algorithm is employed as a noise filter for deciding stable movement decisions. While DTBGE employs binary images, DAISMI employs gray-scale images. Right and left eye estimates are evaluated separately. DAISMI finds the stable approximate pupil-center location by calculating the mass-center of the eyeball border vertices, which is then used for initial deformable template alignment. DTBGE starts with this initial alignment and updates the template alignment frame by frame with the resulting eye movements and eyeball size. The horizontal and vertical deviations of eye movements, relative to eyeball size, are treated as directly proportional to the deviations of cursor movements for a given screen size and resolution. The core advantage of the system is that it does not employ the real pupil-center as a reference point for gaze estimation, which makes it more robust to corneal reflection. Visual angle accuracy is used for the evaluation and benchmarking of the system. The effectiveness of the proposed system is presented and experimental results are shown.

Journal ArticleDOI
TL;DR: Since it acquires the multispectral images in one shot, the proposed system can overcome the slow and complex acquisition processes and the high cost of state-of-the-art multispectral imaging systems, opening the way to widespread applications.
Abstract: This paper proposes a one-shot six-channel multispectral color image acquisition system using a stereo camera and a pair of optical filters. The two filters of the best pair, selected from among readily available filters such that they modify the sensitivities of the two cameras so as to produce optimal estimation of spectral reflectance and/or color, are placed in front of the two lenses of the stereo camera. The two images acquired from the stereo camera are then registered for pixel-to-pixel correspondence. The spectral reflectance and/or color at each pixel of the scene are estimated from the corresponding camera outputs in the two images. Both simulations and experiments have shown that the proposed system performs well both spectrally and colorimetrically. Since it acquires the multispectral images in one shot, the proposed system can overcome the slow and complex acquisition processes and the high cost of state-of-the-art multispectral imaging systems, leading to its possible use in widespread applications.
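
The per-pixel reflectance estimation step can be illustrated as a linear mapping learned from training pairs of six-channel camera responses and measured spectra; the least-squares sketch below is a generic stand-in, not necessarily the estimator the system actually uses.

```python
import numpy as np

def train_reflectance_mapping(C_train, R_train):
    """Least-squares linear map from 6-channel responses to reflectance.
    C_train: (n, 6) camera responses; R_train: (n, n_wavelengths) spectra."""
    M, *_ = np.linalg.lstsq(C_train, R_train, rcond=None)
    return M  # apply per pixel: r_hat = c @ M
```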

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the method outperforms the existing score following system in 16 songs out of 20 polyphonic songs and that the error in the prediction of the score position is reduced by 69% on average.
Abstract: Our goal is to develop a coplayer music robot capable of presenting musical expression together with humans. Although many instrument-performing robots exist, they may have difficulty playing with human performers because they lack a synchronization function. The robot has to follow variations in the humans' performance, such as temporal fluctuations, to play with human performers. We classify synchronization and musical expression into two levels, (1) the melody level and (2) the rhythm level, to cope with erroneous synchronizations. The idea is as follows: when the synchronization with the melody is reliable, the robot responds to the pitch it hears; when the synchronization is uncertain, it tries to follow the rhythm of the music. Our method estimates the score position for the melody level and the tempo for the rhythm level. The reliability of the score position estimation is extracted from the probability distribution of the score position. The experimental results demonstrate that our method outperforms the existing score following system in 16 of 20 polyphonic songs. The error in the prediction of the score position is reduced by 69% on average. The results also revealed that the switching mechanism alleviates the error in the estimation of the score position.

Journal ArticleDOI
TL;DR: In this paper, the problem of automatic detection of image areas appropriate for accurate estimation of the additive noise standard deviation (STD), irrespective of the processed image's properties, is considered; two complementary informative maps, noise-informative (NI) and texture-informative (TI), are determined and iteratively upgraded based on the Fisher information on noise STD calculated in a scanning-window fashion.
Abstract: The problem of automatic detection of image areas appropriate for accurate estimation of the additive noise standard deviation (STD), irrespective of the processed image's properties, is considered in this paper. For accurate estimation of either image texture or noise STD, we distinguish two complementary informative maps: noise-informative (NI) and texture-informative (TI) ones. The NI map is determined and iteratively upgraded based on the Fisher information on noise STD calculated in a scanning window (SW) fashion. A fractional Brownian motion (fBm) model for image texture is used to derive the required Fisher information. To extract the final noise STD from the NI map, fBm- and DCT-based estimators are implemented. The performance of these two estimators is comparatively assessed on a large image database for different noise levels. It is also compared with the performance of two competitive state-of-the-art estimators recently published. Utilizing the NI map along with the DCT-based noise STD estimator proved to be significantly more efficient.
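
To make the DCT-based estimation concrete, here is a crude blockwise version: a robust scale estimate over high-frequency DCT coefficients, aggregated by a median. It ignores the NI-map selection that is the paper's actual contribution; the block size and coefficient region are arbitrary choices.

```python
import numpy as np
from scipy.fft import dctn

def noise_std_estimate(img, block=8):
    """Blockwise DCT noise STD: robust scale (MAD) of high-frequency
    coefficients in each block, aggregated by the median over blocks."""
    ests = []
    H, W = img.shape
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            d = dctn(img[i:i + block, j:j + block], norm='ortho')
            hf = d[block // 2:, block // 2:].ravel()
            ests.append(1.4826 * np.median(np.abs(hf - np.median(hf))))
    return float(np.median(ests))
```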

Journal ArticleDOI
TL;DR: This study addresses the problem of automatic detection of speech under stress using a previously developed feature extraction scheme based on the Teager Energy Operator (TEO); to improve detection performance, a selected sub-band frequency-partitioned weighting scheme and a weighting scheme for all frequency bands are proposed.
Abstract: The problem of detecting psychological stress from speech is challenging due to differences in how speakers convey stress. Changes in speech production due to speaker state are not linearly dependent on changes in stress. Research is further complicated by the existence of different stress types and the lack of metrics capable of discriminating stress levels. This study addresses the problem of automatic detection of speech under stress using a previously developed feature extraction scheme based on the Teager Energy Operator (TEO). To improve detection performance, (i) a selected sub-band frequency-partitioned weighting scheme and (ii) a weighting scheme for all frequency bands are proposed. Using the traditional TEO-based feature vector with a closed-speaker Hidden Markov Model-trained stressed speech classifier, error rates of 22.5/13.0% for stress/neutral speech are obtained. With the new weighted sub-band detection scheme, closed-speaker error rates are reduced to 4.7/4.6% for stress/neutral detection, a relative error reduction of 79.1/64.6%, respectively. For the open-speaker case, stress/neutral speech detection error rates of 69.7/16.2% using traditional features are reduced to 13.1/4.0% (a relative 81.3/75.4% reduction) with the proposed automatic frequency sub-band weighting scheme. Finally, issues related to speaker dependent/independent scenarios, vowel duration, and mismatched vowel type on stress detection performance are discussed.
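
The feature extraction is built on the discrete Teager Energy Operator, which is essentially a one-liner:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```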

Journal ArticleDOI
TL;DR: This article investigates the degradation due to DA/AD conversions via sound cards, which can be decomposed into volume change, additional noise, and time-scale modification (TSM), and proposes a watermarking solution that accounts for all three effects.
Abstract: Digital audio watermarking robust against digital-to-analog (D/A) and analog-to-digital (A/D) conversions is an important issue, as D/A and A/D conversions are involved in a number of watermark application scenarios. In this article, we first investigate the degradation due to DA/AD conversions via sound cards, which can be decomposed into volume change, additional noise, and time-scale modification (TSM). Then, we propose a solution for DA/AD conversions that considers the effects of volume change, additional noise, and TSM. For the volume change, we introduce a relation-based watermarking method that modifies the energy relations of groups of three adjacent DWT coefficient sections. For the additional noise, we embed the watermark in the lowest-frequency coefficients. For the TSM, a synchronization technique (with synchronization codes and an interpolation processing operation) is exploited. Simulation tests show that the proposed audio watermarking algorithm performs satisfactorily under DA/AD conversions and common audio processing manipulations.
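
A toy version of relation-based embedding conveys the idea: tilt the energy relation between adjacent sections of the lowest-frequency DWT coefficients to encode a bit. This sketch uses two sections instead of the paper's three and omits the synchronization codes entirely; the strength parameter is arbitrary.

```python
import numpy as np
import pywt

def embed_bit(signal, bit, wavelet='db4', level=5, strength=1.2):
    """Encode one bit by tilting the energy relation between two adjacent
    sections of the lowest-frequency DWT coefficients."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    a = coeffs[0]                       # approximation (lowest-frequency) band
    half = len(a) // 2
    e1, e2 = np.sum(a[:half] ** 2), np.sum(a[half:2 * half] ** 2)
    if bit == 1 and e1 <= strength * e2:        # enforce e1 > strength * e2
        a[:half] *= np.sqrt(strength * e2 / max(e1, 1e-12))
    elif bit == 0 and e2 <= strength * e1:      # enforce e2 > strength * e1
        a[half:2 * half] *= np.sqrt(strength * e1 / max(e2, 1e-12))
    return pywt.waverec(coeffs, wavelet)
```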

Journal ArticleDOI
TL;DR: It is concluded that NMF features can significantly contribute to the robustness of state-of-the-art emotion recognition engines in practical application scenarios where different noise and reverberation conditions have to be faced.
Abstract: We present a comprehensive study on the effect of reverberation and background noise on the recognition of nonprototypical emotions from speech. We carry out our evaluation on a single, well-defined task based on the FAU Aibo Emotion Corpus consisting of spontaneous children's speech, which was used in the INTERSPEECH 2009 Emotion Challenge, the first of its kind. Based on the challenge task, and relying on well-proven methodologies from the speech recognition domain, we derive test scenarios with realistic noise and reverberation conditions, including matched as well as mismatched condition training. As feature extraction based on supervised Nonnegative Matrix Factorization (NMF) has been proposed in automatic speech recognition for enhanced robustness, we introduce and evaluate different kinds of NMF-based features for emotion recognition. We conclude that NMF features can significantly contribute to the robustness of state-of-the-art emotion recognition engines in practical application scenarios where different noise and reverberation conditions have to be faced.
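
NMF features of the general kind evaluated here factorize a magnitude spectrogram V ≈ WH and use the activations H as frame-level features; the sketch below is unsupervised (the paper's supervised variant instead constrains parts of the factorization using labeled data).

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_activation_features(spectrogram, n_components=30):
    """Factorize a magnitude spectrogram V ~ W @ H and return the
    activations H as frame-level features.
    spectrogram: (n_freq, n_frames) non-negative array."""
    model = NMF(n_components=n_components, init='nndsvda', max_iter=400)
    W = model.fit_transform(spectrogram)   # (n_freq, n_components) bases
    return model.components_               # (n_components, n_frames) activations
```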

Journal ArticleDOI
TL;DR: An automatic pavement crack classification approach exploiting the spatial distribution features of the cracks under a neural network model; an empirical study indicates a classification precision of over 98% for the proposed approach.
Abstract: Pavement crack types provide important information for making pavement maintenance strategies. This paper proposes an automatic pavement crack classification approach exploiting the spatial distribution features (i.e., a direction feature and a density feature) of the cracks under a neural network model. In this approach, a direction coding (D-Coding) algorithm is presented to encode the crack subsections and extract the direction features, and a Delaunay triangulation technique is employed to analyze the crack region structure and extract the density features. Because they are computed on skeletonized crack sections rather than individual crack pixels, the spatial distribution features carry considerable significance for each type of crack. An empirical study indicates a classification precision of over 98% for the proposed approach.
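
As an illustration of the density feature, one simple Delaunay-based statistic is the mean edge length over crack skeleton points (shorter edges indicating denser cracking); this is a plausible proxy, not necessarily the exact feature used in the paper.

```python
import numpy as np
from scipy.spatial import Delaunay

def mean_delaunay_edge(points):
    """Density proxy over crack skeleton points (points: (n, 2) array):
    mean edge length of their Delaunay triangulation."""
    tri = Delaunay(points)
    edges = set()
    for s in tri.simplices:
        for i in range(3):
            edges.add(tuple(sorted((s[i], s[(i + 1) % 3]))))
    return float(np.mean([np.linalg.norm(points[a] - points[b])
                          for a, b in edges]))
```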