Showing papers in "Journal of Signal and Information Processing in 2014"
••
TL;DR: Various acoustic features are combined into a feature set to detect voice disorders in children, on the basis of which further treatment can be prescribed by a pathologist, enabling an automatic non-invasive device to diagnose and analyze the patient's voice.
Abstract: The identification and classification of pathological voice remain a challenging area of research in speech processing. Acoustic features of speech are used mainly to discriminate normal voices from pathological voices. This paper explores and compares various classification models to assess the ability of acoustic parameters to differentiate normal voices from pathological voices. An attempt is made to analyze and discriminate pathological voice from normal voice in children using different classification methods. The classification of pathological voice from normal voice is implemented using a Support Vector Machine (SVM) and a Radial Basis Function Neural Network (RBFNN). The normal and pathological voices of children are used to train and test the classifiers. A dataset is constructed by recording speech utterances of a set of Tamil phrases. The speech signal is then analyzed in order to extract acoustic parameters such as signal energy, pitch, formant frequencies, the mean square residual signal, reflection coefficients, jitter, and shimmer. In this study, various acoustic features are combined to form a feature set so as to detect voice disorders in children, on the basis of which further treatment can be prescribed by a pathologist. Hence, a successful pathological voice classification will enable an automatic non-invasive device to diagnose and analyze the voice of the patient.
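Two of the listed perturbation measures, jitter and shimmer, have simple cycle-level definitions. The abstract does not give its exact formulas, so the sketch below uses the common "local" definitions (an assumption) applied to extracted pitch periods and cycle peak amplitudes:

```python
def jitter(periods):
    # Local jitter: mean absolute difference of consecutive pitch periods,
    # normalised by the mean period.
    diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer(amplitudes):
    # Local shimmer: the same measure applied to cycle peak amplitudes.
    diffs = [abs(amplitudes[i] - amplitudes[i - 1]) for i in range(1, len(amplitudes))]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))
```

On a perfectly periodic voice both measures are zero; pathological voices typically show elevated values, which is why they are useful in a discriminative feature set.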
37 citations
••
TL;DR: Results showed that using one-fourth overlapped data buffers with 128-point Hanning windows and no frame averaging leads to the best performance in removing noise from the noisy speech.
Abstract: Spectral subtraction is used in this research as a method to remove noise from noisy speech signals in the frequency domain. This method consists of computing the spectrum of the noisy speech using the Fast Fourier Transform (FFT) and subtracting the average magnitude of the noise spectrum from the noisy speech spectrum. We applied spectral subtraction to the speech signal “Real graph”. A digital audio recorder system embedded in a personal computer was used to sample the speech signal “Real graph”, to which we digitally added vacuum cleaner noise. The noise removal algorithm was implemented using Matlab software by storing the noisy speech data into Hanning time-windowed half-overlapped data buffers, computing the corresponding spectra using the FFT, removing the noise from the noisy speech, and reconstructing the speech back into the time domain using the inverse Fast Fourier Transform (IFFT). The performance of the algorithm was evaluated by calculating the Speech to Noise Ratio (SNR). Frame averaging was introduced as an optional technique that could improve the SNR. Seventeen different configurations with various lengths of the Hanning time windows, various degrees of data buffer overlap, and various numbers of frames to be averaged were investigated with a view to improving the SNR. Results showed that using one-fourth overlapped data buffers with 128-point Hanning windows and no frame averaging leads to the best performance in removing noise from the noisy speech.
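The per-frame core of the method described above can be sketched as follows. This is a minimal stand-in, not the authors' Matlab code: a naive DFT replaces the FFT for brevity, the Hann window is scaled to the frame length, and magnitudes that go negative after subtraction are floored at zero (half-wave rectification, a standard choice):

```python
import cmath
import math

def dft(x):
    """Naive DFT (stands in for the FFT used in the paper)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT returning the real part."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def spectral_subtract_frame(noisy_frame, noise_mag):
    """One frame of magnitude spectral subtraction: window, transform,
    subtract the average noise magnitude, keep the noisy phase, invert."""
    N = len(noisy_frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann
    X = dft([s * w for s, w in zip(noisy_frame, window)])
    cleaned = []
    for Xk, Nk in zip(X, noise_mag):
        mag = max(abs(Xk) - Nk, 0.0)  # floor negative magnitudes at zero
        cleaned.append(cmath.rect(mag, cmath.phase(Xk)))
    return idft(cleaned)
```

Frames processed this way would then be overlap-added (half or one-fourth overlap, as in the paper's configurations) to reconstruct the time-domain signal.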
35 citations
••
TL;DR: In this article, a nonlinear autoregressive approach with exogenous input is used as a novel method for statistical forecasting of the disturbance storm time index, a measure of space weather related to the ring current which surrounds the Earth, and of fluctuations in disturbance storm time field strength as a result of incoming solar particles.
Abstract: A nonlinear autoregressive approach with exogenous input is used as a
novel method for statistical forecasting of the disturbance storm time index, a
measure of space weather related to the ring current which surrounds the Earth,
and fluctuations in disturbance storm time field strength as a result of
incoming solar particles. This ring current produces a magnetic field which
opposes the planetary geomagnetic field. Given the occurrence of solar activity
hours or days before subsequent geomagnetic fluctuations and the potential
effects that geomagnetic storms have on terrestrial systems, it would be useful
to be able to predict geophysical parameters in advance using both historical
disturbance storm time indices and external input of solar winds and the
interplanetary magnetic field. By assessing various statistical techniques it
is determined that artificial neural networks may be ideal for the prediction
of disturbance storm time index values which may in turn be used to forecast geomagnetic
storms. Furthermore, it is found that a Bayesian regularization neural network
algorithm may be the most accurate model, compared with both the other forms of
artificial neural network used and the linear models employing regression
analyses.
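As a hedged illustration of the autoregressive-with-exogenous-input idea (the paper itself uses neural networks with Bayesian regularization, not this linear stand-in), a one-lag ARX model relating the index y to an exogenous driver u can be fitted by least squares:

```python
def fit_arx(y, u):
    """Least-squares fit of y[t] ~ a*y[t-1] + b*u[t-1] (linear ARX stand-in).

    Solves the 2x2 normal equations directly; `y` is the historical index
    series and `u` the exogenous input (e.g. a solar-wind parameter)."""
    s_yy = s_uu = s_yu = s_ty = s_tu = 0.0
    for t in range(1, len(y)):
        s_yy += y[t - 1] * y[t - 1]
        s_uu += u[t - 1] * u[t - 1]
        s_yu += y[t - 1] * u[t - 1]
        s_ty += y[t] * y[t - 1]
        s_tu += y[t] * u[t - 1]
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_ty * s_uu - s_tu * s_yu) / det
    b = (s_tu * s_yy - s_ty * s_yu) / det
    return a, b
```

A NARX network generalizes this by replacing the linear map with a learned nonlinear function of several lags of both the index and the exogenous inputs.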
23 citations
••
TL;DR: It has been shown that applying the wavelet edge detection method to the segmented images generated through the proposed image preprocessing approach yields superior performance compared with the other standard edge detection methods.
Abstract: Edge detection is the process of determining where boundaries of objects fall within an image. So far, several standard operator-based methods have been widely used for edge detection. However, due to the inherent quality of images, these methods prove ineffective if they are applied without any preprocessing. In this paper, an image preprocessing approach has been adopted in order to obtain certain parameters that are useful for performing better edge detection with the standard operator-based edge detection methods. The proposed preprocessing approach involves computation of the histogram, finding the total number of peaks, and suppressing irrelevant peaks. From the intensity values corresponding to the relevant peaks, threshold values are obtained. From these threshold values, optimal multilevel thresholds are calculated using the Otsu method, and then multilevel image segmentation is carried out. Finally, a standard edge detection method can be applied to the resultant segmented image. Simulation results are presented to show that our preprocessing approach, when used with a standard edge detection method, enhances its performance. It is also shown that applying the wavelet edge detection method to the segmented images generated through our preprocessing approach yields superior performance compared with the other standard edge detection methods.
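The thresholding step can be illustrated with a single-level Otsu computation on a grey-level histogram (the paper extends this to optimal multilevel thresholds; the single-level version below is a simplified sketch):

```python
def otsu_threshold(hist):
    """Single-level Otsu: pick the grey level that maximises the
    between-class variance of the two classes it induces."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t in range(len(hist) - 1):
        w0 += hist[t]           # class 0 weight: bins 0..t
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0          # class means
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

For a multilevel extension, the same between-class-variance objective is maximised over tuples of thresholds rather than a single one.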
22 citations
••
TL;DR: A mathematical approach is proposed in order to identify characteristics of these signals from real traffic situations and to quantify their similarity and complexity; the approach also enables the reconstruction of bus ACC and DEC signals.
Abstract: Public transportation by
bus is an essential part of mobility. Braking and starting, e.g., approaching a
bus stop, are documented as the main reason for non-collision incidents. These
situations are evoked by the acceleration forces leading to perturbations of
the passenger’s base of support. In laboratory studies, perturbations are applied
to gain insight into the postural control system and neuromuscular responses.
However, bus perturbations
diverge from laboratory ones with respect to duration, maximum and shape, and
it was shown recently that these characteristics influence the postural
response. Thus, results from posturographic studies cannot be generalised and
transferred to bus perturbations. In this study, acceleration (ACC) and
deceleration (DEC) signals of real traffic situations were examined. A
mathematical approach is proposed in order to identify characteristics of these
signals and to quantify their similarity and complexity. Typical
characteristics (duration, maximum, and shape) of real-world driving manoeuvres
concerning start and stop situations could be identified. A mean duration of
13.6 s for ACC and 9.8 s for DEC signals was found, which is
clearly longer than that of laboratory perturbations. ACC and DEC signals are also more
complex than the signals used for platform displacements in the laboratory. The
proposed method enables the reconstruction of bus ACC and DEC signals. The data can be used as input for studies
on postural control with high ecological validity.
16 citations
••
TL;DR: Inspired by the core foundation of quantum mechanics, a new easy shape representation for content based image retrieval is proposed by borrowing the concept of quantum superposition into the basis of distance histogram.
Abstract: Content Based Image Retrieval (CBIR) is a technique in which images are indexed based on their visual contents, and retrieval is based only upon these indexed contents. Among the visual contents used to describe image details is shape. The shape of an object is considered the most important distinguishable feature, one that living things can easily recognize, and large efforts are currently underway in describing image contents by their shapes. Inspired by the core foundation of quantum mechanics, a new, simple shape representation for content based image retrieval is proposed by borrowing the concept of quantum superposition into the basis of the distance histogram. Results show better retrieval accuracy of the proposed method when compared with the distance histogram.
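The distance histogram that serves as the baseline here is a centroid-based shape signature. The quantum-superposition weighting itself is not specified in the abstract, so the sketch below covers only the baseline descriptor (the function name and binning scheme are assumptions):

```python
import math

def distance_histogram(points, bins=8):
    """Histogram of centroid-to-boundary distances, normalised to sum to 1.

    Normalising by the maximum distance makes the descriptor scale-invariant."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    d = [math.hypot(x - cx, y - cy) for x, y in points]
    dmax = max(d) or 1.0          # guard against a degenerate single point
    hist = [0] * bins
    for di in d:
        k = min(int(bins * di / dmax), bins - 1)
        hist[k] += 1
    return [h / len(points) for h in hist]
```

Two shapes can then be compared by any histogram distance (e.g. L1), which is the retrieval step the paper's superposition-based variant aims to improve.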
9 citations
••
TL;DR: A test on parts of the PB.A57 manuscript, with a 95% level of confidence, concluded that the success percentage of preprocessing in producing Javanese character images ranged from 85.9% to 94.82%.
Abstract: Manuscript preprocessing is the earliest stage in the transliteration process of manuscripts in Javanese scripts. The manuscript preprocessing stage is aimed at producing images of the letters which form the manuscripts, to be processed further in the manuscript transliteration system. There are four main steps in manuscript preprocessing: manuscript binarization, noise reduction, line segmentation, and character segmentation for every line image produced by line segmentation. The result of the test on parts of the PB.A57 manuscript, which contains 291 character images, with a 95% level of confidence, concluded that the success percentage of preprocessing in producing Javanese character images ranged from 85.9% to 94.82%.
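The quoted range reads like a confidence interval on a success proportion. The paper's exact interval method is not stated, so as an assumption the sketch below uses the standard normal-approximation (Wald) interval, p ± z·sqrt(p(1−p)/n):

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a success rate,
    clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)
```

With n = 291 character images, an observed success rate around 90% would give an interval a few percentage points wide, comparable to the reported range.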
7 citations
••
TL;DR: In this paper, an improved partial transmit sequence (PTS) scheme based on combining the grouped discrete cosine transform (DCT) with PTS technique is proposed, where adjacent partitioned data are firstly transformed by a DCT into new modified data, and then the proposed scheme utilizes the conventional PTS technique to further reduce the peak-to-average power ratio (PAPR) of the OFDM signal.
Abstract: The high peak-to-average power ratio (PAPR) is one of the serious problems in the application of OFDM technology. In this paper, an improved partial transmit sequence (PTS) scheme based on combining the grouped discrete cosine transform (DCT) with the PTS technique is proposed. In the proposed scheme, the adjacent partitioned data are first transformed by a DCT into new modified data. After that, the proposed scheme utilizes the conventional PTS technique to further reduce the PAPR of the OFDM signal. The PAPR performance is evaluated using a computer simulation. The simulation results indicate that the proposed scheme may improve the PAPR performance compared with the conventional PTS scheme, the grouped DCT scheme, and the original OFDM.
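The quantity being reduced can be made concrete: the PAPR of one OFDM symbol is the ratio of peak to mean instantaneous power of the time-domain signal after the IDFT. The sketch below uses a naive IDFT instead of an IFFT, for brevity, and reports the ratio in dB:

```python
import cmath
import math

def papr_db(symbols):
    """PAPR of one OFDM symbol: N-point IDFT of the subcarrier symbols,
    then peak power over mean power, in dB."""
    N = len(symbols)
    x = [sum(symbols[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
         for n in range(N)]
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / N))
```

When all subcarriers align in phase (e.g. all-ones symbols), the IDFT concentrates energy into a single peak and the PAPR reaches 10·log10(N) dB, which is the worst case that PTS-style schemes try to avoid by rotating sub-blocks before transmission.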
7 citations
••
TL;DR: A new adaptive control method used to adjust the output voltage and current of DC-DC (DC: Direct Current) power converter under different sudden changes in load is presented.
Abstract: The purpose of this paper is to present a new adaptive control method used to adjust the output voltage and current of a DC-DC (DC: Direct Current) power converter under different sudden changes in load. The controller is a PID controller (Proportional, Integral, and Derivative). The gains of the PID controller (KP, KI, and KD) are tuned using the Simulated Annealing (SA) algorithm, which belongs to the generic probabilistic metaheuristic family. The new control system is expected to have a fast transient response, with less undershoot of the output voltage and less overshoot of the reactor current. Pulse Width Modulation (PWM) will be utilized to switch the power electronic devices.
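The gain-tuning loop can be sketched generically. The cost function here is a placeholder quadratic standing in for the converter's transient-response cost (undershoot/overshoot penalties from a simulation); the cooling schedule and step size are likewise assumptions, since the abstract does not specify them:

```python
import math
import random

def anneal(cost, start, step=0.5, t0=1.0, alpha=0.95, iters=500, seed=1):
    """Generic simulated annealing over a parameter vector such as (KP, KI, KD).

    Accepts downhill moves always; uphill moves with probability
    exp(-delta/T), with temperature T decayed geometrically."""
    rng = random.Random(seed)
    x = list(start)
    best, best_c = list(x), cost(x)
    c, T = best_c, t0
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in x]
        cc = cost(cand)
        if cc < c or rng.random() < math.exp(-(cc - c) / T):
            x, c = cand, cc
            if cc < best_c:
                best, best_c = list(cand), cc
        T *= alpha
    return best, best_c
```

In the paper's setting, `cost` would run the converter model under a load step and score the resulting undershoot, overshoot, and settling time.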
5 citations
••
TL;DR: In this article, a closed-form approximation for the residual inter-symbol interference (ISI) obtained by blind adaptive equalizers is proposed for the noisy and biased input case, where the error of the equalized output signal may be expressed as a polynomial function of order 3.
Abstract: Recently, two expressions (for the noiseless and noisy case) were proposed for the residual inter-symbol interference (ISI) obtained by blind adaptive equalizers, where the error of the equalized output signal may be expressed as a polynomial function of order 3. However, those expressions are not applicable for biased input signals. In this paper, a closed-form approximate expression is proposed for the residual ISI applicable for the noisy and biased input case. This new proposed expression is valid for blind adaptive equalizers, where the error of the equalized output signal may be expressed as a polynomial function of order 3. The new proposed expression depends on the equalizer’s tap length, input signal statistics, channel power, SNR, step-size parameter and on the input signal’s bias. Simulation results indicate a high correlation between the simulated results and those obtained from our new proposed expression.
4 citations
••
TL;DR: The results showed that the rate of correct recognition of the proposed system is about 100% for training files and 95.7% for one testing file for each speaker from the ELSDSR database, and that its results were better than those of the well-known Mel Frequency Cepstral Coefficient (MFCC) and the Zak transform.
Abstract: In this paper, an expert system for security based on biometric human features that can be obtained without any contact with the registering sensor is presented. These features are extracted from the human voice, so the system is called a Voice Recognition System (VRS). The proposed system consists of a combination of three stages: signal pre-processing, feature extraction using the Wavelet Packet Transform (WPT), and feature matching using Artificial Neural Networks (ANNs). The feature vectors are formed in two steps: first, the speech signal is decomposed at level 7 with the Daubechies 20-tap wavelet (db20); second, the energy corresponding to each WPT node is calculated and collected to form a feature vector. A 128-element feature vector for each speaker was fed to the Feed Forward Back-propagation Neural Network (FFBPNN). The data used in this paper are drawn from the English Language Speech Database for Speaker Recognition (ELSDSR), which consists of audio files for training and other files for testing. The performance of the proposed system is evaluated using the test files. Our results showed that the rate of correct recognition of the proposed system is about 100% for the training files and 95.7% for one testing file for each speaker from the ELSDSR database. The proposed method showed better results than the well-known Mel Frequency Cepstral Coefficient (MFCC) and the Zak transform.
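The energy-per-node feature extraction can be sketched with Haar filters standing in for the paper's db20 (an assumption made to keep the example dependency-free). Note that a level-7 decomposition yields 2^7 = 128 node energies, matching the 128-element feature vector above:

```python
import math

def haar_wpt_energies(signal, level):
    """Energy of every wavelet-packet node at the given level, using the
    orthonormal Haar filter pair (signal length must be divisible by 2**level)."""
    nodes = [list(signal)]
    for _ in range(level):
        nxt = []
        for s in nodes:
            # Orthonormal Haar analysis: approximation and detail halves.
            a = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
            d = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
            nxt += [a, d]
        nodes = nxt
    return [sum(v * v for v in s) for s in nodes]
```

Because each analysis step is orthonormal, the node energies sum to the energy of the input signal, so the feature vector is a partition of the signal's energy across time-frequency bands.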
••
TL;DR: The measurements reiterate recent measurements in a large population of human brains showing the superimposition of Schumann power densities in QEEG data, and indicate that intrinsic features of proton densities within cerebral water may be a fundamental basis of consciousness that can be simulated experimentally.
Abstract: The physical properties of water, particularly the nature of interfacial water and the pH shifts associated with the dynamics of the hydronium ion near any surface, may be a primary source of the complex electromagnetic patterns frequently correlated with consciousness. Effectively all of the major correlates of consciousness, including the 40 Hz and 8 Hz coupling between the cerebral cortices and the hippocampal formation, can be accommodated by the properties of water within a specific-shaped volume exposed to a magnetic field. In the present study, quantitative electroencephalographic activity was measured from an experimental simulation of the human head constructed using conductive dough whose pH could be changed systematically. Spectral analyses of electrical potentials generated over the regions equivalent to the left and right temporal lobes in humans exhibited patterns characteristic of the Schumann Resonance. This fundamental and its harmonics are generated within the earth-ionospheric cavity with intensities similar to the volumetric intracerebral magnetic (~2 pT) and electric field (~6 × 10⁻¹ V·m⁻¹) strengths. The power densities for specific pH values were moderately correlated with those obtained from normal human brains for the fundamental (first) and second harmonics at the level simulating the cerebral cortices. Calculations indicated that the effective pH would be similar to that encountered within a single layer of protons near the plasma membrane surface. These results reiterate recent measurements in a large population of human brains showing the superimposition of Schumann power densities in QEEG data, and indicate that intrinsic features of proton densities within cerebral water may be a fundamental basis of consciousness that can be simulated experimentally.
••
TL;DR: Experimental results indicate that this de-noising method for IMU acceleration signals can efficiently and adaptively remove noise and better meet the precision requirement.
Abstract: In track irregularity detection, the acceleration signals output by the inertial measurement unit (IMU) contain low-frequency components and noise; this paper studies a de-noising algorithm for them. Based on the criterion of consecutive mean square error, a de-noising method for IMU acceleration signals based on empirical mode decomposition (EMD) is proposed. This method divides the intrinsic mode functions (IMFs) derived from EMD into signal-dominant modes and noise-dominant modes; the modes reflecting the important structures of the signal are then combined to form a partially reconstructed de-noised signal. Simulations were conducted for simulated signals and a real IMU acceleration signal using this method. Experimental results indicate that this method can efficiently and adaptively remove noise and better meet the precision requirement.
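Given IMFs from an EMD implementation, the consecutive-mean-square-error criterion reduces to the mean power of each IMF, and the partial reconstruction starts at its first local minimum. The sketch below assumes the IMFs are already computed (EMD itself is omitted), and the fallback index when no local minimum exists is an assumption:

```python
def cmse_denoise(imfs):
    """Partial EMD reconstruction using a consecutive-MSE-style criterion.

    The CMSE between successive partial reconstructions equals the mean
    power of the dropped IMF; reconstruct from the first local minimum on."""
    n = len(imfs[0])
    cmse = [sum(v * v for v in imf) / n for imf in imfs]
    # First interior local minimum; fall back to dropping only IMF 0.
    k = next((i for i in range(1, len(cmse) - 1)
              if cmse[i] < cmse[i - 1] and cmse[i] <= cmse[i + 1]), 1)
    return [sum(imf[t] for imf in imfs[k:]) for t in range(n)]
```

The early, high-frequency IMFs before the minimum are treated as noise-dominant and discarded, which is the mode-splitting step the abstract describes.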
••
TL;DR: The convergence property of the regularized low pass filtering algorithm is proved in theory and tested by numerical results.
Abstract: In this paper, the low pass filter is discussed in the noisy case, and a regularized low pass filter is presented. The convergence property of the regularized low pass filtering algorithm is proved in theory and tested by numerical results.
••
TL;DR: A simple method to automatically localize signboard texts within JPEG mobile phone camera images using the Discrete Cosine Transform used by the JPEG compression format is proposed.
Abstract: Extraction of the text
data present in images involves Text detection, Text localization, Text
tracking, Text extraction, Text Enhancement and Text Recognition. Due to its
inherent complexity, traditional text localization algorithms in natural scenes, especially in multi-context scenes, are not implementable under low computational-resource architectures such as mobile phones. In this paper, we propose a simple method to automatically localize signboard texts within JPEG mobile phone camera images. Taking into account the information provided by the Discrete Cosine Transform (DCT) used by the JPEG compression format, we delimit the borders of the most important text region. This system is simple, reliable, affordable, easily implementable, and quick, even when working under architectures with low computational resources.
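The key property exploited here is that JPEG already stores DCT coefficients per 8×8 block, and text-rich blocks concentrate energy in the AC coefficients. A simplified 1-D sketch of that indicator follows (the paper works on the 2-D coefficients taken from the JPEG stream; this version recomputes a 1-D DCT-II instead):

```python
import math

def dct2(block):
    """Unnormalised 1-D DCT-II of one row of pixel values
    (JPEG uses a scaled 8x8 2-D version of this transform)."""
    N = len(block)
    return [sum(block[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

def ac_energy(block):
    """Energy of the AC coefficients; text-rich blocks tend to score high,
    while flat background blocks score near zero."""
    coeffs = dct2(block)
    return sum(c * c for c in coeffs[1:])  # skip the DC term
```

Thresholding this per-block score and grouping high-scoring blocks is one plausible way to delimit a candidate text region without fully decoding the image.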
••
TL;DR: New data analysis methods and new ways to detect people's work location via mobile computing technology are presented, along with a centralized mobile device management (MDM) system.
Abstract: Client software on mobile devices that can remotely control data mining tasks and display the results adds significant value for nomadic users and organizations that need to analyze data stored in a repository far away from the site where the users work, allowing them to generate knowledge regardless of their physical location. This paper presents new data analysis methods and new ways to detect people's work location via mobile computing technology. With the growing number of applications, content, and data that can be accessed from a wide range of devices, it becomes necessary to introduce centralized mobile device management (MDM). MDM is a KDE software package working with enterprise systems using mobile devices. The paper discusses the design of the system in detail.
••
TL;DR: A model of speech synthesis that focuses on intonation is proposed; its output is a prosody file that can be pronounced by a speech engine application.
Abstract: Prosody in speech
synthesis systems (text-to-speech) is a determinant of tone, duration, and
loudness of speech sound. Intonation is a part of prosody which determines the
speech tone. In Indonesian, intonation is determined by the structure of
sentences, types of sentences, and also the position of the word in a sentence.
In this study, a model of speech synthesis that focuses on its intonation is
proposed. The speech intonation is determined by sentence structure, intonation
patterns of the example sentences, and general rules of Indonesian
pronunciation. The model receives texts and intonation patterns as inputs.
Based on the general principles of Indonesian pronunciation, a prosody file was made. Based on the input text, the sentence structure is determined, and then the intervals among the parts of a sentence (phrases) can be determined. These intervals are used to correct the duration of the initial prosody file. Furthermore, the frequencies in the prosody file were corrected using the intonation patterns. The final result is a prosody file that can be pronounced by a speech engine application. Experimental results using the original voice of a radio news announcer and the speech synthesis show that the peaks of F0 are determined by the general rules or by the intonation patterns, whichever is dominant. A similarity test with the PESQ method shows that the result of the synthesis scores 1.18 on the MOS-LQO scale.
••
TL;DR: Some computational geometry preliminaries are discussed, followed by a summary of some different techniques used to address the surface reconstruction problem, emphasizing their advantages and disadvantages.
Abstract: Surface reconstruction is
a problem in the field of computational geometry that is concerned with
recreating a surface from scattered data points sampled from an unknown
surface. To date, the primary application of surface reconstruction algorithms
has been in computer graphics, where physical models are digitized in three
dimensions with laser range scanners or mechanical digitizing probes
(Bernardini et al., 1999 [1]). Surface reconstruction algorithms
are used to convert the set of digitized points into a wire frame mesh model,
which can be colored, textured, shaded, and placed into a 3D scene (in a movie
or television commercial, for example). In this paper, we discuss some
computational geometry preliminaries, and then move on to a summary of some different
techniques used to address the surface reconstruction problem. The coming
sections describe two algorithms: that of Hoppe, et al. (1992 [2]) and Amenta, et al. (1998 [3]). Finally, we present other
applications of surface reconstruction and a brief comparison of some
algorithms in this field, emphasizing their advantages and disadvantages.
••
TL;DR: Fouda proposed an approach that maps template matching from 2-D into 1-D; the article has since been retracted.
Abstract: The following article has been retracted for reasons specific to the authors. This paper, published in Vol. 5, No. 2, 2014, has been removed from this site. Title: Template Matching from 2-D into 1-D. Author: Yasser Fouda.
••
TL;DR: The results obtained by the proposed algorithm show that the tracking process was successfully carried out for a set of color videos with different challenging conditions such as occlusion, illumination changes, cluttered conditions, and object scale changes.
Abstract: This paper presents a new kernel-based algorithm for video object tracking called rebound of region of interest (RROI). The novel algorithm uses a rectangle-shaped section as region of interest (ROI) to represent and track specific objects in videos. The proposed algorithm consists of two stages. The first stage seeks to determine the direction of the object’s motion by analyzing the changing regions around the object being tracked between two consecutive frames. Once the direction of the object’s motion has been predicted, an iterative process is initialized that seeks to minimize a function of dissimilarity in order to find the location of the object being tracked in the next frame. The main advantage of the proposed algorithm is that, unlike existing kernel-based methods, it is immune to highly cluttered conditions. The results obtained by the proposed algorithm show that the tracking process was successfully carried out for a set of color videos with different challenging conditions such as occlusion, illumination changes, cluttered conditions, and object scale changes.
••
TL;DR: On the basis of the tree matrix algorithm, complete descriptions of loops can be obtained, providing a foundation for further research on the relationship between loops and LDPC code performance.
Abstract: LDPC codes are finding increasing use in applications requiring reliable and highly efficient information transfer over bandwidth-limited channels. An LDPC code is defined by a sparse parity-check matrix and can be described by a bipartite graph called a Tanner graph. Loops in the Tanner graph prevent the sum-product algorithm from converging. Further, loops, especially short loops, degrade the performance of the LDPC decoder, because they affect the independence of the extrinsic information exchanged in the iterative decoding. Using graph theory, this paper deduces the cut-node tree graph of an LDPC code and depicts it with a matrix. On the basis of the tree matrix algorithm, complete descriptions of the loops can be obtained, providing a foundation for further research on the relationship between loops and LDPC code performance.
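A concrete way to expose the short loops in question is to compute the girth of the Tanner graph directly from the parity-check matrix, with a BFS from each variable node. This is a standard technique, not the paper's tree-matrix algorithm, which enumerates whole loop structures rather than just the shortest one:

```python
from collections import deque

def tanner_girth(H):
    """Shortest cycle length in the Tanner graph of parity-check matrix H.
    Variable nodes are columns, check nodes are rows; since the graph is
    bipartite, every cycle length is even (girth >= 4)."""
    m, n = len(H), len(H[0])

    def neighbours(node):
        kind, idx = node
        if kind == 'v':
            return [('c', i) for i in range(m) if H[i][idx]]
        return [('v', j) for j in range(n) if H[idx][j]]

    girth = float('inf')
    for j in range(n):
        root = ('v', j)
        dist, parent = {root: 0}, {root: None}
        q = deque([root])
        while q:
            u = q.popleft()
            for w in neighbours(u):
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    q.append(w)
                elif parent[u] != w:
                    # Non-tree edge closes a cycle through the BFS root.
                    girth = min(girth, dist[u] + dist[w] + 1)
    return girth
```

A girth of 4 corresponds to two rows of H sharing ones in two common columns, which is exactly the short-loop pattern that most degrades iterative decoding.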
••
TL;DR: A novel implementation of the level set method that achieves real-time level-set-based object tracking by simple operations such as switching values of thelevel set functions and there is no need to solve any partial differential equations (PDEs).
Abstract: This paper proposes a novel
implementation of the level set method that achieves real-time level-set-based
object tracking. In the proposed algorithm, the evolution of the curve is
realized by simple operations such as switching values of the level set
functions and there is no need to solve any partial differential equations
(PDEs). The object contour could change due to changes in location or
orientation, or due to the changeable nature of the object shape itself. Knowing
the contour, the average color value for the pixels within the contour could be
found. The estimated object color and contour in one frame are the bases for
locating the object in the consecutive one. The color is used to segment the
object pixels and the estimated contour is used to initialize the deformation
process. Thus, the algorithm works in a closed cycle in which the color is used
to segment the object pixels to get the object contour and the contour is used
to get the typical color of the object. With our fast algorithm, a real-time
system has been implemented on a standard PC. Results from standard test
sequences and our real time system are presented.