
Showing papers in "EURASIP Journal on Advances in Signal Processing in 2009"


Journal ArticleDOI
TL;DR: An eight-channel database of head-related impulse responses and binaural room impulse responses is introduced, allowing for a realistic construction of simulated sound fields for hearing instrument research and, consequently, for a realistic evaluation of hearing instrument algorithms.
Abstract: An eight-channel database of head-related impulse responses (HRIRs) and binaural room impulse responses (BRIRs) is introduced. The impulse responses (IRs) were measured with three-channel behind-the-ear (BTE) hearing aids and an in-ear microphone at both ears of a human head and torso simulator. The database aims at providing a tool for the evaluation of multichannel hearing aid algorithms in hearing aid research. In addition to the HRIRs derived from measurements in an anechoic chamber, sets of BRIRs for multiple, realistic head and sound-source positions in four natural environments reflecting daily-life communication situations with different reverberation times are provided. For comparison, analytically derived IRs for a rigid acoustic sphere were computed at the multichannel microphone positions of the BTEs and differences from the measured HRIRs were examined. The scenes' natural acoustic background was also recorded in each of the real-world environments for all eight channels. Overall, the present database allows for a realistic construction of simulated sound fields for hearing instrument research and, consequently, for a realistic evaluation of hearing instrument algorithms.
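To illustrate how such a database is typically used, here is a minimal Python sketch that constructs a simulated multichannel sound field by convolving an anechoic source signal with an eight-channel impulse response; the sampling rate, array shapes, and random placeholder data are assumptions, not the database's actual file format.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000                                  # sampling rate (assumed)
source = np.random.randn(fs * 2)            # 2 s of anechoic source material (placeholder)
brir = np.random.randn(8, 4800)             # 8 channels x 100 ms IR (placeholder)

# Convolve the source with each of the eight microphone channels to
# synthesize what the hearing-aid microphones would have picked up.
mixture = np.stack([fftconvolve(source, brir[ch]) for ch in range(8)])
print(mixture.shape)                        # (8, N + L - 1)
```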

299 citations


Journal ArticleDOI
TL;DR: Experimental results presented in this paper confirm the usefulness of KPCA for the analysis of hyperspectral data and show improved classification accuracy.
Abstract: Kernel principal component analysis (KPCA) is investigated for feature extraction from hyperspectral remote sensing data. Features extracted using KPCA are classified using linear support vector machines. In one experiment, it is shown that kernel principal component features are more linearly separable than features extracted with conventional principal component analysis. In a second experiment, kernel principal components are used to construct the extended morphological profile (EMP). Classification results, in terms of accuracy, are improved in comparison to the original approach, which used conventional principal component analysis for constructing the EMP. Experimental results presented in this paper confirm the usefulness of KPCA for the analysis of hyperspectral data. For one data set, the overall classification accuracy increases from 79% to 96% with the proposed approach.
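A minimal sketch of the two-stage pipeline the abstract describes (KPCA feature extraction followed by a linear SVM), using synthetic stand-ins for hyperspectral pixels; the kernel parameters and data dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 100)                # 500 pixels, 100 spectral bands (synthetic)
y = np.random.randint(0, 5, 500)            # 5 land-cover classes (synthetic)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

kpca = KernelPCA(n_components=20, kernel="rbf", gamma=1.0)  # parameters assumed
Z_tr = kpca.fit_transform(X_tr)             # nonlinear features
Z_te = kpca.transform(X_te)

clf = LinearSVC().fit(Z_tr, y_tr)           # linear classifier on KPCA features
print("overall accuracy:", clf.score(Z_te, y_te))
```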

275 citations


Journal ArticleDOI
TL;DR: Multiview applications and solutions to support generic multiview as well as 3D services are introduced and cover a wide range of requirements for 3D video related to interface, transport of the MVC bitstreams, and MVC decoder resource management.
Abstract: Multiview video has gained wide interest recently. The huge amount of data that must be processed by multiview applications is a heavy burden for both transmission and decoding. The joint video team has recently devoted part of its effort to extending the widely deployed H.264/AVC standard to handle multiview video coding (MVC). The MVC extension of H.264/AVC includes a number of new techniques for improved coding efficiency, reduced decoding complexity, and new functionalities for multiview operations. MVC takes advantage of some of the interfaces and transport mechanisms introduced for the scalable video coding (SVC) extension of H.264/AVC, but the system-level integration of MVC is conceptually more challenging as the decoder output may contain more than one view and can consist of any combination of the views with any temporal level. The generation of all the output views also requires careful consideration and control of the available decoder resources. In this paper, multiview applications and solutions to support generic multiview as well as 3D services are introduced. The proposed solutions, which have been adopted into the draft MVC specification, cover a wide range of requirements for 3D video related to the interface, transport of the MVC bitstreams, and MVC decoder resource management. The features that have been introduced in MVC to support these solutions include marking of reference pictures, support for efficient view switching, structuring of the bitstream, signalling of view scalability supplemental enhancement information (SEI), and parallel decoding SEI.

262 citations


Journal ArticleDOI
TL;DR: The major finding of this work is that even in the presence of oblivious BSs (that is, BSs with no information about the codebooks) multicell processing is able to provide ideal performance with relatively small backhaul capacities, unless the application of interest requires high data rate and the backhaul capacity is not allowed to increase with the SNR.
Abstract: Multicell processing in the form of joint encoding for the downlink of a cellular system is studied under the assumption that the base stations (BSs) are connected to a central processor (CP) via finite-capacity links (finite-capacity backhaul). To obtain analytical insight into the impact of finite-capacity backhaul on the downlink throughput, the investigation focuses on a simple linear cellular system (as for a highway or a long avenue) based on the Wyner model. Several transmission schemes are proposed that require varying degrees of knowledge regarding the system codebooks at the BSs. Achievable rates are derived in closed form and compared with an upper bound. Performance is also evaluated in asymptotic regimes of interest (high backhaul capacity and extreme signal-to-noise ratio, SNR) and further corroborated by numerical results. The major finding of this work is that even in the presence of oblivious BSs (that is, BSs with no information about the codebooks), multicell processing is able to provide ideal performance with relatively small backhaul capacities, unless the application of interest requires high data rates (i.e., high SNR) and the backhaul capacity is not allowed to increase with the SNR. In these latter cases, some form of codebook information at the BSs becomes necessary.

218 citations


Journal ArticleDOI
TL;DR: A new network signal modelling technique for detecting network anomalies, combining wavelet approximation and system identification theory, is proposed; it achieves high detection rates in terms of both attack instances and attack types.
Abstract: Signal processing techniques have been applied recently for analyzing and detecting network anomalies due to their potential to find novel or unknown intrusions. In this paper, we propose a new network signal modelling technique for detecting network anomalies, combining wavelet approximation and system identification theory. In order to characterize network traffic behaviors, we present fifteen features and use them as the input signals in our system. We then evaluate our approach with the 1999 DARPA intrusion detection dataset and conduct a comprehensive analysis of the intrusions in the dataset. Evaluation results show that the approach achieves high detection rates in terms of both attack instances and attack types. Furthermore, we conduct a full day's evaluation in a real large-scale WiFi ISP network, where five attack types are successfully detected from over 30 million flows.
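A minimal sketch of the underlying modelling idea, assuming one of the fifteen traffic features as a 1-D signal: approximate it by its coarse wavelet reconstruction and flag samples with unusually large residuals. The feature, wavelet, and threshold are illustrative assumptions, not the paper's exact detector.

```python
import numpy as np
import pywt

x = np.random.poisson(100, 1024).astype(float)  # e.g., flows per second (synthetic)
x[500:520] += 400                               # injected anomaly

# Coarse wavelet approximation: keep the approximation coefficients, zero the details.
coeffs = pywt.wavedec(x, "db4", level=4)
approx = pywt.waverec([coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]], "db4")

residual = np.abs(x - approx[:len(x)])
threshold = residual.mean() + 3 * residual.std()  # simple 3-sigma rule (assumed)
print("anomalous samples:", np.where(residual > threshold)[0])
```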

215 citations


Journal ArticleDOI
TL;DR: The methods depend strongly on the amount of face and background information included in the face images, and the performance of all methods degrades considerably under outdoor illumination, but LBP-based methods are an excellent choice when real-time operation as well as high recognition rates are needed.
Abstract: The aim of this work is to carry out a comparative study of face recognition methods that are suitable to work in unconstrained environments. The analyzed methods are selected by considering their performance in former comparative studies, in addition to being real-time, requiring just one image per person, and being fully online. In the study, two local-matching methods, histograms of LBP features and Gabor jet descriptors; one holistic method, generalized PCA; and two image-matching methods, SIFT-based and ERCF-based, are analyzed. The methods are compared using the FERET, LFW, UCHFaceHRI, and FRGC databases, which allows evaluating them in real-world conditions that include variations in scale, pose, lighting, focus, resolution, facial expression, accessories, makeup, occlusions, background, and photographic quality. The main conclusions of this study are that the methods depend strongly on the amount of face and background information included in the face images, and that the performance of all methods degrades considerably under outdoor illumination. The analyzed methods are robust, to a large degree, to inaccurate alignment, face occlusions, and variations in expression. LBP-based methods are an excellent choice when real-time operation as well as high recognition rates are needed.
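A minimal sketch of the LBP-histogram representation behind the best-performing local-matching method: uniform LBP codes pooled into a histogram and compared by histogram intersection. The radius, number of sampling points, and whole-image pooling (the method actually pools per block and concatenates) are simplifying assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(face, P=8, R=1):
    codes = local_binary_pattern(face, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def similarity(h1, h2):
    return np.minimum(h1, h2).sum()          # histogram intersection

a = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-ins for aligned face crops
b = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(similarity(lbp_histogram(a), lbp_histogram(b)))
```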

185 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of single-user (SU) and multiuser (MU) MIMO transmissions and derived closed-form approximations for achievable rates for both SU and MU-MIMO.
Abstract: Imperfect channel state information degrades the performance of multiple-input multiple-output (MIMO) communications; its effects on single-user (SU) and multiuser (MU) MIMO transmissions are quite different. In particular, MU-MIMO suffers from residual interuser interference due to imperfect channel state information, while SU-MIMO only suffers from a power loss. This paper compares the throughput loss of both SU and MU-MIMO in the broadcast channel due to delay and channel quantization. Accurate closed-form approximations are derived for the achievable rates of both SU and MU-MIMO. It is shown that SU-MIMO is relatively robust to delayed and quantized channel information, while MU-MIMO with zero-forcing precoding loses its spatial multiplexing gain with a fixed delay or a fixed codebook size. Based on the derived achievable rates, a mode switching algorithm is proposed, which switches between SU and MU-MIMO modes to improve the spectral efficiency based on the average signal-to-noise ratio (SNR), the normalized Doppler frequency, and the channel quantization codebook size. The operating regions for SU and MU modes with different delays and codebook sizes are determined, and they can be used to select the preferred mode. It is shown that the MU mode is active only when the normalized Doppler frequency is very small and the codebook size is large.
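A minimal Monte Carlo sketch of the mode-switching decision: compare the average SU beamforming rate against the zero-forcing MU sum rate computed from random-vector-quantized channel directions, and pick the better mode. The antenna count, feedback bits, and SNR are illustrative assumptions, not the paper's closed-form approximations.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, K, B, snr = 4, 4, 10, 100.0              # antennas, users, feedback bits, SNR (assumed)
codebook = rng.standard_normal((2**B, Nt)) + 1j * rng.standard_normal((2**B, Nt))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def rvq_quantize(h):
    # random vector quantization: codeword best aligned with the channel direction
    return codebook[np.argmax(np.abs(codebook @ h.conj()))]

su, mu = [], []
for _ in range(200):
    H = (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)
    su.append(np.log2(1 + snr * np.linalg.norm(H[0]) ** 2))   # SU: serve one user
    Hq = np.array([rvq_quantize(h) for h in H])               # quantized directions
    W = np.linalg.pinv(Hq)                                    # zero-forcing beams
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    S = np.abs(H @ W) ** 2 * (snr / K)                        # received powers
    sig = np.diag(S)
    interf = S.sum(axis=1) - sig                              # residual interference
    mu.append(np.log2(1 + sig / (1 + interf)).sum())
print("SU %.2f bps/Hz, MU %.2f bps/Hz -> choose %s"
      % (np.mean(su), np.mean(mu), "MU" if np.mean(mu) > np.mean(su) else "SU"))
```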

177 citations


Journal ArticleDOI
TL;DR: A vision-based navigation architecture is proposed which combines inertial sensors, visual odometry, and registration of the on-board video to a geo-referenced aerial image, and which is capable of providing high-rate, drift-free state estimation for autonomous UAV navigation without GPS.
Abstract: This paper investigates the possibility of augmenting an Unmanned Aerial Vehicle (UAV) navigation system with a passive video camera in order to cope with long-term GPS outages. The paper proposes a vision-based navigation architecture which combines inertial sensors, visual odometry, and registration of the on-board video to a geo-referenced aerial image. The vision-aided navigation system developed is capable of providing high-rate and drift-free state estimation for UAV autonomous navigation without the GPS system. Due to the use of image-to-map registration for absolute position calculation, drift-free position performance depends on the structural characteristics of the terrain. Experimental evaluation of the approach based on offline flight data is provided. In addition the architecture proposed has been implemented on-board an experimental UAV helicopter platform and tested during vision-based autonomous flights.

169 citations


Journal ArticleDOI
TL;DR: An alternative approach, where gait is collected by sensors attached to the person's body, reveals that sideways motion of the foot provides the most discrimination, compared to the up-down and forward-backward directions, and that different segments of the gait cycle provide different levels of discrimination.
Abstract: This paper presents an alternative approach, where gait is collected by sensors attached to the person's body. Such wearable sensors record the motion (e.g., acceleration) of the body parts during walking. The recorded motion signals are then investigated for person recognition purposes. We analyzed acceleration signals from the foot, hip, pocket, and arm. Applying various methods, the best equal error rates (EERs) obtained for foot-, pocket-, arm-, and hip-based user authentication were 5%, 7%, 10%, and 13%, respectively. Furthermore, we present the results of our analysis on the security assessment of gait. Studying gait-based user authentication (in the case of hip motion) under three attack scenarios, we revealed that minimal-effort mimicking does not help to improve the acceptance chances of impostors. However, impostors who know their closest person in the database or the genders of the users can be a threat to gait-based authentication. We also provide some new insights toward the uniqueness of gait in the case of foot motion. In particular, we revealed the following: sideways motion of the foot provides the most discrimination, compared to the up-down and forward-backward directions; and different segments of the gait cycle provide different levels of discrimination.
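For reference, the reported EERs can be computed from genuine and impostor comparison scores as sketched below; the Gaussian score model is a synthetic stand-in for distances between gait acceleration cycles.

```python
import numpy as np

rng = np.random.default_rng(1)
genuine = rng.normal(0.3, 0.10, 1000)        # distances for same-person comparisons
impostor = rng.normal(0.6, 0.15, 1000)       # distances for different-person comparisons

thresholds = np.sort(np.concatenate([genuine, impostor]))
far = np.array([(impostor <= t).mean() for t in thresholds])  # false accept rate
frr = np.array([(genuine > t).mean() for t in thresholds])    # false reject rate
i = np.argmin(np.abs(far - frr))             # equal error rate: FAR == FRR
print("EER ~ %.1f%% at threshold %.3f" % (100 * (far[i] + frr[i]) / 2, thresholds[i]))
```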

120 citations


Journal ArticleDOI
TL;DR: A new approximation method called nominal belief-state optimization (NBO), combined with other application-specific approximations and techniques within the POMDP framework, produces a practical design that coordinates the UAVs to achieve good long-term mean-squared-error tracking performance in the presence of occlusions and dynamic constraints.
Abstract: This paper discusses the application of the theory of partially observable Markov decision processes (POMDPs) to the design of guidance algorithms for controlling the motion of unmanned aerial vehicles (UAVs) with onboard sensors to improve tracking of multiple ground targets. While POMDP problems are intractable to solve exactly, principled approximation methods can be devised based on the theory that characterizes optimal solutions. A new approximation method called nominal belief-state optimization (NBO), combined with other application-specific approximations and techniques within the POMDP framework, produces a practical design that coordinates the UAVs to achieve good long-term mean-squared-error tracking performance in the presence of occlusions and dynamic constraints. The flexibility of the design is demonstrated by extending the objective to reduce the probability of a track swap in ambiguous situations.
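A minimal sketch of the receding-horizon idea behind NBO: propagate the belief along its nominal (noise-free) trajectory, score each candidate action sequence by the tracking cost it would accumulate, and apply only the first action of the best sequence. The 1-D dynamics and distance-based cost are toy assumptions, not the paper's UAV and sensor models.

```python
import itertools

def nominal_rollout(uav, target, actions):
    cost = 0.0
    for a in actions:
        uav += a                             # UAV moves by the chosen step
        target += 1.0                        # nominal (mean) target motion, no noise
        cost += (uav - target) ** 2          # surrogate for mean-squared tracking error
    return cost

uav, target = 0.0, 5.0
for step in range(10):                       # receding-horizon control loop
    best = min(itertools.product([-2.0, 0.0, 2.0], repeat=3),
               key=lambda seq: nominal_rollout(uav, target, seq))
    uav += best[0]                           # apply only the first action
    target += 1.0
    print("step %d: uav=%.1f target=%.1f" % (step, uav, target))
```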

111 citations


Journal ArticleDOI
TL;DR: The FFT system shows promise both as a stand-alone system and especially in combination with approaches based on local features; as an approach using global features, it possesses many advantages.
Abstract: We present a novel online signature verification system based on the Fast Fourier Transform. The advantage of using the Fourier domain is the ability to compactly represent an online signature using a fixed number of coefficients. The fixed-length representation leads to fast matching algorithms and is essential in certain applications. The challenge, on the other hand, is to find the right preprocessing steps and matching algorithm for this representation. We report on the effectiveness of the proposed method, along with the effects of individual preprocessing and normalization steps, based on comprehensive tests over two public signature databases. We also propose to use the pen-up duration information in identifying forgeries. The best results obtained on the SUSIG-Visual subcorpus and the MCYT-100 database are 6.2% and 12.1% error rates on skilled forgeries, respectively. The fusion of the proposed system with our state-of-the-art Dynamic Time Warping (DTW) system lowers the error rate of the DTW system by up to about 25%. While the current error rates are higher than state-of-the-art results for these databases, as an approach using global features, the system possesses many advantages. Considering also the suggested improvements, the FFT system shows promise both as a stand-alone system and especially in combination with approaches that are based on local features.
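A minimal sketch of a fixed-length Fourier descriptor of the kind the abstract describes: resample the pen trajectory, take the FFT, and keep a fixed number of magnitude-normalized coefficients so any two signatures can be compared by a simple distance. The preprocessing and coefficient count are illustrative assumptions.

```python
import numpy as np

def fft_descriptor(x, y, n_coeff=20, n_points=256):
    t_new = np.linspace(0, 1, n_points)
    t_old = np.linspace(0, 1, len(x))
    z = np.interp(t_new, t_old, x) + 1j * np.interp(t_new, t_old, y)
    z -= z.mean()                            # translation invariance
    spec = np.fft.fft(z)[:n_coeff]
    return np.abs(spec) / (np.abs(spec[1]) + 1e-12)   # scale normalization

ref = fft_descriptor(np.cumsum(np.random.rand(300)), np.random.rand(300))
qry = fft_descriptor(np.cumsum(np.random.rand(280)), np.random.rand(280))
print("distance:", np.linalg.norm(ref - qry))         # accept if below a threshold
```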

Journal ArticleDOI
TL;DR: This work analyzes the traffic generated by human players versus game bots and proposes general solutions to identify game bots, and discusses the robustness of the proposed methods against countermeasures of bot developers, and considers a number of possible ways to manage the increasingly serious bot problem.
Abstract: Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
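A minimal sketch of the first cue above (regularity in the release time of client commands): bots emit commands with far more regular timing than humans, so thresholding the coefficient of variation of inter-command gaps already separates the synthetic traces below. The traces and threshold are illustrative stand-ins, not the paper's classifiers.

```python
import numpy as np

rng = np.random.default_rng(5)
bot_gaps = 0.5 + 0.01 * rng.standard_normal(1000)   # near-periodic bot commands
human_gaps = rng.exponential(0.5, 1000)             # bursty human behaviour

def looks_like_bot(gaps, cv_threshold=0.2):
    cv = gaps.std() / gaps.mean()                   # coefficient of variation
    return cv < cv_threshold

print("bot trace flagged:  ", looks_like_bot(bot_gaps))
print("human trace flagged:", looks_like_bot(human_gaps))
```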

Journal ArticleDOI
TL;DR: This paper investigates the application of the pure-pursuit path tracking technique for reactive tracking of paths that are implicitly defined by perceived environmental features in an indoor environment.
Abstract: Due to its simplicity and efficiency, the pure-pursuit path tracking method has been widely employed for planned navigation of nonholonomic ground vehicles. In this paper, we investigate the application of this technique for reactive tracking of paths that are implicitly defined by perceived environmental features. Goal points are obtained through an efficient interpretation of range data from an onboard 2D laser scanner to follow persons, corridors, and walls. Moreover, this formulation allows a robotic mission to be composed of a combination of different types of path segments. These techniques have been successfully tested on the tracked mobile robot Auriga-α in an indoor environment.
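For reference, the pure-pursuit law itself is a one-liner: steer with the curvature of the circular arc through the goal point expressed in vehicle coordinates. In the paper the goal point comes from the laser-based interpretation of persons, corridors, or walls; the numbers below are illustrative.

```python
def pure_pursuit_curvature(goal_x, goal_y):
    """Curvature of the arc reaching (goal_x, goal_y), with the vehicle at the
    origin heading along +x: kappa = 2 * lateral_offset / lookahead**2."""
    return 2.0 * goal_y / (goal_x ** 2 + goal_y ** 2)

kappa = pure_pursuit_curvature(2.0, 0.5)     # goal 2 m ahead, 0.5 m to the left
v = 0.5                                      # forward speed (m/s), assumed
omega = v * kappa                            # commanded angular velocity (rad/s)
print("curvature %.3f 1/m, omega %.3f rad/s" % (kappa, omega))
```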

Journal ArticleDOI
TL;DR: A novel methodology that is built on previous work and utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples is presented.
Abstract: It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.

Journal ArticleDOI
TL;DR: An efficient method for person authentication is presented, together with a deep analysis of similarity metric performance; the biometric pattern is a set of feature points representing landmarks in the retinal vessel tree.
Abstract: Biometrics refers to identity verification of individuals based on some physiological or behavioural characteristics. The typical authentication process consists in extracting a biometric pattern from a person and matching it against the stored pattern for the authorised user, obtaining a similarity value between patterns. In this work, an efficient method for person authentication is presented. The biometric pattern of the system is a set of feature points representing landmarks in the retinal vessel tree. The pattern extraction and matching are described. Also, a deep analysis of similarity metric performance is presented for the biometric system. A database with samples of retina images from users acquired at different moments in time is used, thus simulating a hard, realistic verification environment. Even in this scenario, the system makes it possible to establish a wide confidence band for the metric threshold where no errors are obtained for the training and test sets.

Journal ArticleDOI
TL;DR: The results showed that there were significant differences in the performance of the algorithms being evaluated, and the newly proposed measure for jitter, LocJitt, performs in general as well as or better than the commonly used tools MDVP and Praat.
Abstract: This work is focused on the evaluation of different methods to estimate the amount of jitter present in speech signals. The jitter value is a measure of the irregularity of a quasiperiodic signal and is a good indicator of the presence of pathologies in the larynx such as vocal fold nodules or a vocal fold polyp. Given the irregular nature of the speech signal, each jitter estimation algorithm relies on its own model, making a direct comparison of the results very difficult. For this reason, the evaluation of the different jitter estimation methods was targeted at their ability to detect pathological voices. Two databases were used for this evaluation: a subset of the MEEI database and a smaller database acquired in the scope of this work. The results showed that there were significant differences in the performance of the algorithms being evaluated. Surprisingly, in the larger database the best results were not achieved with the commonly used relative jitter, measured as a percentage of the glottal cycle, but with absolute jitter values measured in microseconds. Also, the newly proposed measure for jitter, LocJitt, performs in general as well as or better than the commonly used tools MDVP and Praat.
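For concreteness, the two jitter measures compared above can be computed from a sequence of estimated glottal cycle lengths as sketched below; the synthetic period sequence stands in for the output of a pitch-period extractor.

```python
import numpy as np

periods = 0.008 + 0.0002 * np.random.randn(200)    # ~125 Hz voice, synthetic periods (s)

abs_jitter = np.mean(np.abs(np.diff(periods)))     # absolute jitter, in seconds
rel_jitter = 100.0 * abs_jitter / np.mean(periods) # relative jitter, % of glottal cycle

print("absolute jitter: %.1f us" % (abs_jitter * 1e6))
print("relative jitter: %.2f %%" % rel_jitter)
```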

Journal ArticleDOI
TL;DR: The benefit of using external acoustic sensor nodes for noise reduction in hearing aids is demonstrated in a simulated acoustic scenario with multiple sound sources, and a modification to DANSE is proposed to increase its robustness, yielding a smaller discrepancy between the performance of DANSE and the centralized MWF.
Abstract: The benefit of using external acoustic sensor nodes for noise reduction in hearing aids is demonstrated in a simulated acoustic scenario with multiple sound sources. A distributed adaptive node-specific signal estimation (DANSE) algorithm, which has a reduced communication bandwidth and computational load, is evaluated. Batch-mode simulations compare the noise reduction performance of a centralized multi-channel Wiener filter (MWF) with DANSE. In the simulated scenario, DANSE is observed not to achieve the same performance as its centralized MWF equivalent, although in theory both should generate the same set of filters. A modification to DANSE is proposed to increase its robustness, yielding a smaller discrepancy between the performance of DANSE and the centralized MWF. Furthermore, the influence of several parameters, such as the DFT size used for frequency-domain processing and possible delays in the communication link between nodes, is investigated.

Journal ArticleDOI
TL;DR: A fully distributed least mean-square (D-LMS) algorithm is developed, in which sensors exchange messages with single-hop neighbors to reach consensus on the network-wide estimates adaptively; the analytical results accurately extend to the pragmatic setting whereby sensors acquire temporally correlated, not necessarily Gaussian, data.
Abstract: Low-cost estimation of stationary signals and reduced-complexity tracking of nonstationary processes are well-motivated tasks that can be accomplished using ad hoc wireless sensor networks (WSNs). To this end, a fully distributed least mean-square (D-LMS) algorithm is developed in this paper, in which sensors exchange messages with single-hop neighbors to reach consensus on the network-wide estimates adaptively. The novel approach does not require a Hamiltonian cycle or a special bridge subset of sensors, while communications among sensors are allowed to be noisy. A mean-square error (MSE) performance analysis of D-LMS is conducted in the presence of a time-varying parameter vector, which adheres to a first-order autoregressive model. For sensor observations that are related to the parameter vector of interest via a linear Gaussian model, and after adopting simplifying independence assumptions, exact closed-form expressions are derived for the global and sensor-level MSE evolution as well as its steady-state values. Mean- and MSE-sense stability of D-LMS are also established. Interestingly, extensive numerical tests demonstrate that for small step-sizes the results accurately extend to the pragmatic setting whereby sensors acquire temporally correlated, not necessarily Gaussian, data.
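A minimal sketch of a distributed LMS recursion of this flavour: each sensor takes a local LMS step on its own data and then averages estimates with its single-hop neighbours. This is an illustrative consensus-style variant on a ring topology, not the paper's exact D-LMS update.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_sensors, mu = 4, 6, 0.02                # dimension, sensors, step-size (assumed)
w_true = rng.standard_normal(p)              # parameter vector to estimate
W = np.zeros((n_sensors, p))                 # per-sensor estimates
nbrs = {k: [(k - 1) % n_sensors, (k + 1) % n_sensors] for k in range(n_sensors)}

for t in range(2000):
    for k in range(n_sensors):
        x = rng.standard_normal(p)                      # local regressor
        d = x @ w_true + 0.1 * rng.standard_normal()    # noisy local observation
        W[k] += mu * (d - x @ W[k]) * x                 # local LMS step
    # combine with single-hop neighbours (equal weights, assumed)
    W = np.array([(W[k] + W[nbrs[k][0]] + W[nbrs[k][1]]) / 3.0
                  for k in range(n_sensors)])

print("per-sensor estimation error:", np.linalg.norm(W - w_true, axis=1))
```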

Journal ArticleDOI
TL;DR: This paper quantifies the spectral efficiency gains obtainable under realistic propagation and operational conditions in a typical indoor deployment of network MIMO, a family of techniques whereby each end user in a wireless access network is served through several access points within its range of influence.
Abstract: It is well known that multiple-input multiple-output (MIMO) techniques can bring numerous benefits, such as higher spectral efficiency, to point-to-point wireless links. More recently, there has been interest in extending MIMO concepts to multiuser wireless systems. Our focus in this paper is on network MIMO, a family of techniques whereby each end user in a wireless access network is served through several access points within its range of influence. By tightly coordinating the transmission and reception of signals at multiple access points, network MIMO can transcend the limits on spectral efficiency imposed by cochannel interference. Taking prior information-theoretic analyses of network MIMO to the next level, we quantify the spectral efficiency gains obtainable under realistic propagation and operational conditions in a typical indoor deployment. Our study relies on detailed simulations and, for specificity, is conducted largely within the physical-layer framework of the IEEE 802.16e Mobile WiMAX system. Furthermore, to facilitate the coordination between access points, we assume that a high-capacity local area network, such as Gigabit Ethernet, connects all the access points. Our results confirm that network MIMO stands to provide a multiple-fold increase in spectral efficiency under these conditions.

Journal ArticleDOI
TL;DR: A novel no-reference blockiness metric that provides a quantitative measure of blocking annoyance in block-based DCT coding is presented and is shown to be highly consistent with subjective data at a reduced computational load.
Abstract: A novel no-reference blockiness metric that provides a quantitative measure of blocking annoyance in block-based DCT coding is presented. The metric incorporates properties of the human visual system (HVS) to improve its reliability, while the additional cost introduced by the HVS is minimized to ensure its use for real-time processing. This is mainly achieved by calculating the local pixel-based distortion of the artifact itself, combined with its local visibility by means of a simplified model of visual masking. The overall computational efficiency and metric accuracy are further improved by including a grid detector to identify the exact location of blocking artifacts in a given image. The metric, calculated only at the detected blocking artifacts, is averaged over all blocking artifacts in the image to yield an overall blockiness score. The performance of this metric is compared to existing alternatives in the literature and is shown to be highly consistent with subjective data at a reduced computational load. As such, the proposed blockiness metric is promising in terms of both computational efficiency and practical reliability for real-life applications.
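A minimal sketch of the pixel-based core of such a metric: compare luminance discontinuities across the 8x8 block grid with discontinuities elsewhere. The grid position is assumed known here; the paper's metric adds the grid detector and the visual-masking model on top of this.

```python
import numpy as np

def blockiness_score(img, B=8):
    dh = np.abs(np.diff(img.astype(float), axis=1))   # horizontal pixel differences
    at_grid = dh[:, B - 1::B]                         # differences across block edges
    mask = np.ones(dh.shape[1], dtype=bool)
    mask[B - 1::B] = False
    return at_grid.mean() / (dh[:, mask].mean() + 1e-12)

img = np.random.rand(64, 64)
blocky = np.kron(img.reshape(8, 8, 8, 8).mean(axis=(1, 3)), np.ones((8, 8)))
print("blockiness ratio:", blockiness_score(blocky))  # >> 1 indicates visible blocking
```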

Journal ArticleDOI
TL;DR: A methodology for identity verification that quantifies the minimum number of heartbeats required to authenticate an enrolled individual is presented, based on the statistical theory of sequential procedures.
Abstract: The electrocardiogram (ECG) is an emerging novel biometric for human identification. One challenge for the practical use of ECG as a biometric is minimizing the time needed to acquire user data. We present a methodology for identity verification that quantifies the minimum number of heartbeats required to authenticate an enrolled individual. The approach rests on the statistical theory of sequential procedures. The procedure extracts fiducial features from each heartbeat to compute the test statistics. Sampling of heartbeats continues until a decision is reached--either verifying that the acquired ECG matches the stored credentials of the individual or that the ECG clearly does not match the stored credentials for the declared identity. We present the mathematical formulation of the sequential procedure and illustrate the performance with measured data. The initial test was performed on a limited population, twenty-nine individuals. The sequential procedure arrives at the correct decision in fifteen heartbeats or fewer in all but one instance and in most cases the decision is reached with half as many heartbeats. Analysis of an additional 75 subjects measured under different conditions indicates similar performance. Issues of generalizing beyond the laboratory setting are discussed and several avenues for future investigation are identified.
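A minimal sketch of the sequential decision rule, cast as a Wald sequential probability ratio test on per-heartbeat match scores assumed Gaussian under the genuine and impostor hypotheses; the score model and error targets are illustrative, not the paper's fiducial-feature statistics.

```python
import numpy as np

def sprt(scores, mu_gen=1.0, mu_imp=0.0, sigma=1.0, alpha=0.01, beta=0.01):
    A = np.log((1 - beta) / alpha)           # accept threshold (Wald)
    B = np.log(beta / (1 - alpha))           # reject threshold (Wald)
    llr = 0.0
    for n, s in enumerate(scores, start=1):
        # log-likelihood ratio of one heartbeat score, genuine vs impostor
        llr += ((s - mu_imp) ** 2 - (s - mu_gen) ** 2) / (2 * sigma ** 2)
        if llr >= A:
            return "accept", n               # identity verified after n heartbeats
        if llr <= B:
            return "reject", n
    return "undecided", len(scores)

rng = np.random.default_rng(2)
print(sprt(rng.normal(1.0, 1.0, 50)))        # genuine user: accepted in a few beats
print(sprt(rng.normal(0.0, 1.0, 50)))        # impostor: rejected quickly
```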

Journal ArticleDOI
TL;DR: An object tracking method is contributed that generates and maintains multiple hypotheses consisting of probabilistic state estimates that are generated by the individual information sources.
Abstract: To act intelligently in dynamic environments, mobile robots must estimate object positions using information obtained from a variety of sources. We formally describe the problem of estimating the state of objects where a robot can only task its sensors to view one object at a time. We contribute an object tracking method that generates and maintains multiple hypotheses consisting of probabilistic state estimates that are generated by the individual information sources. These different hypotheses can be generated by the robot's own prediction model and by communicating robot team members. The multiple hypotheses are often spatially disjoint and cannot simultaneously be verified by the robot's limited sensors. Instead, the robot must decide towards which hypothesis its sensors should be tasked by evaluating each hypothesis on its likelihood of containing the object. Our contributed algorithm prioritizes the different hypotheses, according to rankings set by the expected uncertainty in the object's motion model, as well as the uncertainties in the sources of information used to track their positions. We describe the algorithm in detail and show extensive empirical results in simulation as well as experiments on actual robots that demonstrate the effectiveness of our approach.

Journal ArticleDOI
TL;DR: Experimental results with hearing aid scenarios demonstrate that the proposed SDW-MWF incorporating the conditional SPP improves the signal-to-noise ratio compared to a traditional SDW-MWF.
Abstract: A multi-channel noise reduction technique is presented based on a Speech Distortion-Weighted Multi-channel Wiener Filter (SDW-MWF) approach that incorporates the conditional Speech Presence Probability (SPP). A traditional SDW-MWF uses a fixed parameter to trade off noise reduction against speech distortion, without taking speech presence into account. Consequently, the improvement in noise reduction comes at the cost of a higher speech distortion, since the speech-dominant segments and the noise-dominant segments are weighted equally. Incorporating the conditional SPP in the SDW-MWF makes it possible to exploit the fact that speech may not be present at all frequencies and at all times, while the noise can indeed be continuously present. In speech-dominant segments it is then desirable to have less noise reduction to avoid speech distortion, while in noise-dominant segments it is desirable to have as much noise reduction as possible. Experimental results with hearing aid scenarios demonstrate that the proposed SDW-MWF incorporating the conditional SPP improves the signal-to-noise ratio compared to a traditional SDW-MWF.
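A minimal per-frequency-bin sketch of the filter rule: the SDW-MWF solution w = (Rs + mu*Rn)^(-1) Rs e_ref, with the trade-off parameter mu scaled by the conditional SPP so that noise-dominated bins receive more noise reduction. The correlation matrices and the SPP-to-mu mapping are illustrative assumptions.

```python
import numpy as np

def sdw_mwf_weights(Rs, Rn, spp, mu0=1.0, ref=0):
    mu = mu0 / max(spp, 1e-3)                # speech unlikely -> larger mu -> more NR
    e = np.zeros(Rs.shape[0]); e[ref] = 1.0  # select the reference microphone
    return np.linalg.solve(Rs + mu * Rn, Rs @ e)

M = 4                                        # microphones
rng = np.random.default_rng(3)
a = rng.standard_normal((M, 1))              # steering vector of the speech source
Rs = 2.0 * (a @ a.T)                         # rank-1 speech correlation matrix (assumed)
Rn = np.eye(M)                               # spatially white noise (assumed)

for spp in (0.9, 0.1):                       # speech-dominant vs noise-dominant bin
    w = sdw_mwf_weights(Rs, Rn, spp)
    print("SPP=%.1f  speech gain %.2f  output noise power %.3f"
          % (spp, np.abs(w @ a).item(), w @ Rn @ w))
```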

Journal ArticleDOI
TL;DR: A review of the diversity of concepts and motivations for improving the concentration and resolution of time-frequency distributions (TFDs) along the individual components of the multi-component signals can be found in this paper.
Abstract: We present a review of the diversity of concepts and motivations for improving the concentration and resolution of time-frequency distributions (TFDs) along the individual components of multi-component signals. The central idea has been to obtain a distribution that represents the signal's energy concentration simultaneously in time and frequency without blur and cross-components, so that closely spaced components can be easily distinguished. The objective is the precise description of the spectral content of a signal with respect to time, so that, first, the necessary mathematical and physical principles may be developed, and second, an accurate understanding of a time-varying spectrum may become possible. The fundamentals in this area of research have been developing steadily, with significant advances in the recent past.

Journal ArticleDOI
TL;DR: The aim of this paper is to reduce the amount of information needed to describe the motion of the texture video and of the depth map sequences by sharing one common motion vector field; a new bitrate allocation strategy between the texture and its associated per-pixel depth information is also proposed.
Abstract: The video-plus-depth data representation uses a regular texture video enriched with the so-called depth map, providing the depth distance for each pixel. The compression efficiency is usually higher for smooth, gray-level data representing the depth map than for classical video texture. However, improvements in coding efficiency are still possible, taking into account the fact that the video and the depth map sequences are strongly correlated. Classically, the correlation between the texture motion vectors and the depth map motion vectors is not exploited in the coding process. The aim of this paper is to reduce the amount of information needed to describe the motion of the texture video and of the depth map sequences by sharing one common motion vector field. Furthermore, in the literature, the bitrate control scheme generally fixes for the depth map sequence a percentage of 20% of the texture stream bitrate. However, this fixed percentage can affect the depth coding efficiency, and it should also depend on the content of each sequence. We propose a new bitrate allocation strategy between the texture and its associated per-pixel depth information. We provide a comparative analysis to measure the quality of the resulting 3D + t sequences.

Journal ArticleDOI
TL;DR: Two techniques are developed that incorporate a model of the speaker's phonetic confusion matrix into the ASR process and attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence.
Abstract: Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.

Journal ArticleDOI
TL;DR: A new system for single-channel speech enhancement is proposed which achieves a joint suppression of late reverberant speech and background noise with a low signal delay and low computational complexity.
Abstract: A new system for single-channel speech enhancement is proposed which achieves a joint suppression of late reverberant speech and background noise with a low signal delay and low computational complexity. It is based on a generalized spectral subtraction rule which depends on the variances of the late reverberant speech and background noise. The calculation of the spectral variances of the late reverberant speech requires an estimate of the reverberation time (RT), which is accomplished by a maximum likelihood (ML) approach. The enhancement with this blind RT estimation achieves almost the same speech quality as using the actual RT. In comparison to commonly used post-filters in hearing aids, which only perform noise reduction, a significantly better objective and subjective speech quality is achieved. The proposed system performs time-domain filtering with coefficients adapted in the non-uniform (Bark-scaled) frequency domain. This makes it possible to achieve high speech quality with low signal delay, which is important for speech enhancement in hearing aids or related applications such as hands-free communication systems.
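A minimal sketch of the generalized spectral-subtraction rule at its core: a per-bin gain computed from estimates of the late-reverberant and noise variances, with a spectral floor against musical noise. The variance values and flooring constant are illustrative; in the paper the reverberant variance follows from the blind RT estimate.

```python
import numpy as np

def suppression_gain(Y_pow, var_late_reverb, var_noise, floor=0.1):
    # subtract the interference variances from the observed power, floor the result
    gain2 = 1.0 - (var_late_reverb + var_noise) / np.maximum(Y_pow, 1e-12)
    return np.sqrt(np.maximum(gain2, floor ** 2))

Y_pow = np.array([1.0, 0.5, 0.2, 0.05])      # observed power per frequency bin
print(suppression_gain(Y_pow, var_late_reverb=0.1, var_noise=0.05))
```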

Journal ArticleDOI
TL;DR: A detection rate optimized bit allocation (DROBA) principle, which assigns more bits to discriminative features and fewer bits to nondiscriminative features, is presented and is applicable to arbitrary biometric modalities.
Abstract: Extracting binary strings from real-valued biometric templates is a fundamental step in many biometric template protection systems, such as fuzzy commitment, fuzzy extractor, secure sketch, and helper data systems. Previous work has focused on the design of optimal quantization and coding for each single feature component, yet the resulting binary string, the concatenation of all coded feature components, is not optimal. In this paper, we present a detection rate optimized bit allocation (DROBA) principle, which assigns more bits to discriminative features and fewer bits to nondiscriminative features. We further propose a dynamic programming (DP) approach and a greedy search (GS) approach to achieve DROBA. Experiments with DROBA on the FVC2000 fingerprint database and the FRGC face database show good performance. As a universal method, DROBA is applicable to arbitrary biometric modalities, such as fingerprint texture, iris, signature, and face. DROBA will bring significant benefits not only to template protection systems but also to systems with fast matching requirements or constrained storage capability.
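A minimal sketch of the greedy-search flavour of DROBA: repeatedly grant one more bit to the feature component whose refinement costs the least overall detection rate. The per-component detection-rate table below is a synthetic stand-in for the statistics estimated from training data.

```python
import numpy as np

rng = np.random.default_rng(4)
n_features, total_bits, max_bits = 6, 12, 4
# detection_rate[i, b]: probability component i stays in its quantization cell
# when b bits are spent on it (monotonically decreasing in b, synthetic values).
detection_rate = np.sort(rng.uniform(0.5, 1.0, (n_features, max_bits + 1)), axis=1)[:, ::-1]
detection_rate[:, 0] = 1.0                   # zero bits: trivially "detected"

bits = np.zeros(n_features, dtype=int)
for _ in range(total_bits):
    # overall detection rate is the product over components; add the bit that
    # shrinks this product the least
    ratio = [detection_rate[i, bits[i] + 1] / detection_rate[i, bits[i]]
             if bits[i] < max_bits else -1.0 for i in range(n_features)]
    bits[int(np.argmax(ratio))] += 1

overall = np.prod([detection_rate[i, b] for i, b in enumerate(bits)])
print("bit allocation:", bits, "overall detection rate: %.3f" % overall)
```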

Journal ArticleDOI
TL;DR: The modular concept of the system provides the capability to test the antenna hardware, beamforming unit, and beamforming algorithm independently, thus allowing the smart antenna system to be developed and tested in parallel and hence reducing the design time.
Abstract: A new design of smart antenna testbed developed at UKM for digital beamforming purposes is proposed. The UKM smart antenna testbed is built on a modular design employing two novel components: an L-probe fed inverted hybrid E-H (LIEH) array antenna and a software-reconfigurable digital beamforming system (DBS). The antenna uses the novel LIEH microstrip patch element arranged into a 4 × 1 uniform linear array. The modular concept of the system provides the capability to test the antenna hardware, beamforming unit, and beamforming algorithm independently, thus allowing the smart antenna system to be developed and tested in parallel and hence reducing the design time. The DBS was developed using a high-performance TMS320C6711™ floating-point DSP board and a 4-channel RF front-end receiver developed in-house. An interface board is designed to connect the ADC board with the RF front-end receiver. A four-element receiving array testbed operating at 1.88-2.22 GHz is constructed, and digital beamforming on this testbed is successfully demonstrated.

Journal ArticleDOI
TL;DR: The possible usage of the palmprint in a fuzzy vault is investigated to develop a user-friendly and reliable cryptosystem, using both symmetric and asymmetric approaches for the encryption.
Abstract: The combination of cryptology and biometrics has emerged as a promising component of information security. Despite the current popularity of the palmprint biometric, there has not been any attempt to investigate its usage for the fuzzy vault. This paper therefore investigates the possible usage of the palmprint in a fuzzy vault to develop a user-friendly and reliable cryptosystem. We suggest the use of both symmetric and asymmetric approaches for the encryption. The ciphertext of any document is generated by a symmetric cryptosystem; the symmetric key is then encrypted by the asymmetric approach. Further, Reed-Solomon codes are used on the generated asymmetric key to provide some error tolerance during decryption. The experimental results from the proposed approach on palmprint images suggest its possible usage in an automated palmprint-based key generation system.