
Proceedings ArticleDOI

Noise Detection and Classification in Speech Signals with Boosting

26 Aug 2007, pp. 778-782

TL;DR: A novel method to detect and classify sudden noises in speech signals using Boosting, which can create a complex, non-linear boundary that determines whether the observed signal is speech, noise1, noise2, and so on.
Abstract: This paper presents a novel method to detect and classify sudden noises in speech signals. There are many sudden and short-period noises in natural environments, such as inside a car. If a speech recognition system can detect sudden noises, it will make it possible for the system to ask the speaker to repeat the same utterance so that the speech data will be clean. If clean speech data can be input, it will help prevent system operation errors. In this paper, we tried to detect and classify sudden noises in a user's utterances using Boosting. Boosting can create a complex, non-linear boundary that determines whether the observed signal is speech, noise1, noise2, and so on. In our experiments, the proposed method achieved good performance in comparison to a conventional method based on the GMM (Gaussian Mixture Model).

Summary (2 min read)

1. INTRODUCTION

  • Sudden and short-period noises often affect the performance of a speech recognition system.
  • To recognize the speech data correctly, noise reduction or model adaptation to the sudden noise is required.
  • Many studies have been conducted on non-stationary noise reduction in a single channel [1, 2].
  • But it is difficult for these methods to track sudden noises.
  • The authors propose sudden-noise detection and classification based on AdaBoost.

2. SYSTEM OVERVIEW

  • Figure 2 shows the overview of the noise detection and classification system based on AdaBoost.
  • Each segment is converted to the linear spectral domain by applying the discrete Fourier transform.
  • Then the logarithm is applied to the linear power spectrum, and the feature vector (log-mel spectrum) is obtained.
  • Next, the system identifies whether or not the feature vector is noisy speech overlapped by sudden noises using two-class AdaBoost, where multi-class AdaBoost is not used due to the computation cost.
  • Then the system clarifies sudden noise type from only the detected noisy frame using multi-class AdaBoost.

3. NOISE DETECTION USING ADABOOST

  • Boosting is a voting method using weighted weak classifiers, and AdaBoost is one method of Boosting [5].
  • After training the weak learner on the t-th iteration, the error of ht is calculated.

4. NOISE CLASSIFICATION WITH MULTI-CLASS ADABOOST

  • The authors used one-vs-rest for multi-class classification using AdaBoost.
  • The final classifier decides a noise class having the maximum value from all classes in (6).

5. SMOOTHING

  • A signal interval detected by AdaBoost may result in only a few frames (an unrealistically short interval) due to the frame-independent detection and classification.
  • Therefore, in this paper, majority voting is applied to a small number of frames in order to delete such unrealistically short intervals.
  • When carrying out the smoothing of one frame, the prior three and subsequent three frames are also taken into consideration, meaning that majority voting is carried out on a total of seven frames.

6. GMM-BASED NOISE DETECTION AND CLASSIFICATION

  • The authors used a conventional GMM (Gaussian mixture model) for comparison with the proposed method.
  • GMMs are used widely for VAD (Voice Activity Detection) because the model is easy to train and usually powerful [8].
  • Using the two GMMs, the log-likelihood ratio is calculated as L(x) = log[Pr(x | speech model) / Pr(x | noisy model)] (10), and the noise class is decided by C(x) = argmax_k Pr(x | noisy model(k)) (12).
  • When a GMM is used for detection and classification, the smoothing method is the same as Section 5.

7.1. Experimental Conditions

  • To evaluate the proposed method, the authors used six kinds of sudden noises from the RWCP corpus [9].
  • The following sudden noise sounds were used: spraying, telephone sounds, tearing paper, pouring of a granular substance, bell-ringing and horn blowing.
  • In the database, each kind of noise has 50 data samples, which are divided into 20 data samples for training and 30 for testing.
  • The speech signal was sampled at 16 kHz and windowed with a 20-msec Hamming window every 10 msec, and a 24-order log-mel power spectrum and 12-order MFCCs were used as feature vectors.
  • Therefore, recall ratio, precision ratio and F-measure are calculated without considering whether noises are classified correctly or not.

7.2. Experimental Results

  • Figure 5 shows the results of the sudden-noise detection and classification when both thresholds (η in Equations (4) and (11)) are 0.
  • Therefore, an increase of the SNR degrades the performance of the noise detection and classification because the noise power decreases.
  • But AdaBoost had higher performance than GMM at SNRs of 0 and 5 dB, except for precision.


NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING
Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki
Department of Computer and Systems Engineering
Kobe University, Kobe, Japan
miyake@cs.scitec.kobe-u.ac.jp,{takigu,ariki}@kobe-u.ac.jp
ABSTRACT

This paper presents a novel method to detect and classify sudden noises in speech signals. There are many sudden and short-period noises in natural environments, such as inside a car. If a speech recognition system can detect sudden noises, it will make it possible for the system to ask the speaker to repeat the same utterance so that the speech data will be clean. If clean speech data can be input, it will help prevent system operation errors. In this paper, we tried to detect and classify sudden noises in a user's utterances using Boosting. Boosting can create a complex, non-linear boundary that determines whether the observed signal is speech, noise1, noise2, and so on. In our experiments, the proposed method achieved good performance in comparison to a conventional method based on the GMM (Gaussian Mixture Model).

Index Terms: Noise, Acoustic signal detection, Pattern classification
1. INTRODUCTION
Sudden and short-period noises often affect the performance of a speech recognition system. Figure 1 shows a speech wave overlapped by a sudden noise (a telephone call). To recognize the speech data correctly, noise reduction or model adaptation to the sudden noise is required. However, it is difficult to remove such noises because we do not know where the noise overlapped and what the noise was. Many studies have been conducted on non-stationary noise reduction in a single channel [1, 2]. But it is difficult for these methods to track sudden noises. Studies have also been carried out based on the model compensation technique for speech recognition in environments where there is sudden noise [3, 4]. These methods are useful for environments in which it is known what noises there are. But as the number of noises increases, the number of models increases, and recognition time increases.

Fig. 1. Speech wave overlapped by a sudden noise (telephone call)

In this paper, we propose sudden-noise detection and classification based on AdaBoost. If a speech recognition system can detect sudden noises, it will make it possible for the system to ask the speaker to repeat the same utterance so that the speech data will be clean. If clean speech data can be input, it will help prevent system operation errors. Also, if it can be determined what noise is overlapped, the noise characteristics information will be useful in noise reduction or model composition.

"Boosting" is a technique in which a set of weak classifiers is combined to form one high-performance prediction rule, and AdaBoost serves as an adaptive boosting algorithm in which the rule for combining the weak classifiers adapts to the problem and is able to yield extremely efficient classifiers. In this paper, we discuss the AdaBoost algorithm for sudden-noise detection and classification problems. The proposed method shows an improved noise detection rate and classification accuracy compared to that of a conventional method based on the GMM (Gaussian Mixture Model).

In Section 2 of this paper, we describe an overview of the proposed method. In Sections 3, 4 and 5, noise detection and classification using AdaBoost are described. In Section 6, a comparative approach using GMMs is described. In Section 7, a noise detection and classification experiment is described.
2. SYSTEM OVERVIEW
Figure 2 shows the overview of the noise detection and classification system based on AdaBoost. The speech waveform is split into small segments by a window function. Each segment is converted to the linear spectral domain by applying the discrete Fourier transform. Then the logarithm is applied to the linear power spectrum, and the feature vector (log-mel spectrum) is obtained. Next, the system identifies whether or not the feature vector is noisy speech overlapped by sudden noises using two-class AdaBoost, where multi-class AdaBoost is not used due to the computation cost. Then the system clarifies the sudden-noise type from only the detected noisy frames using multi-class AdaBoost.

Fig. 2. System overview of noise detection and classification (input signal → feature extraction → noise detection with AdaBoost → noise classification with AdaBoost; each frame x_1, ..., x_n is labeled (s) speech or (n) noisy speech)
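As a concrete illustration of the front-end described above, the following is a minimal sketch of log-mel feature extraction with the parameter values reported in Section 7.1 (16 kHz sampling, 20-msec Hamming window, 10-msec shift, 24 mel bands). The FFT size and the triangular mel filterbank construction are assumptions, since the paper does not specify them.

```python
import numpy as np

def log_mel_spectrum(signal, fs=16000, win_ms=20, shift_ms=10, n_mels=24, n_fft=512):
    """Frame the signal, apply a Hamming window and the DFT, then take the
    logarithm of a mel-warped power spectrum (the paper's log-mel feature).
    Assumes len(signal) is at least one window long."""
    win = fs * win_ms // 1000
    shift = fs * shift_ms // 1000
    n_frames = 1 + (len(signal) - win) // shift
    frames = np.stack([signal[i * shift: i * shift + win] for i in range(n_frames)])
    frames = frames * np.hamming(win)
    # Linear power spectrum via the discrete Fourier transform.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank (a standard construction, assumed here).
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    # One 24-dimensional log-mel vector per frame.
    return np.log(np.maximum(power @ fb.T, 1e-10))
```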
3. NOISE DETECTION USING ADABOOST
Boosting is a voting method using weighted weak classifiers, and AdaBoost is one method of Boosting [5]. Boosting decides the weak classifiers and their weights based on minimizing a loss function in a two-class problem. Since Boosting is fast and has high performance, it is commonly used for face detection in images [6].

Figure 3 shows the AdaBoost learning algorithm. The AdaBoost algorithm uses a set of training data, {(x_1, y_1), ..., (x_n, y_n)}, where x_i is the i-th feature vector of the observed signal and y is a set of possible labels. For noise detection, we consider just two possible labels, Y = {-1, 1}, where label 1 means noisy speech and label -1 means speech only.

As shown in Figure 3, the weak learner generates a hypothesis h_t : x → {-1, 1} that has a small error. In this paper, single-level decision trees (also known as decision stumps) are used as the base classifiers:

$$h_t(x_i) = \begin{cases} 1, & \text{if } p_t x^{(j)} < p_t \theta_t \\ -1, & \text{otherwise} \end{cases} \quad (3)$$

Here x^{(j)} is the j-th dimensional feature of x_i, θ_t is the threshold, and p_t is the parity indicating the direction of the inequality sign. θ_t and p_t are decided by minimizing the error. After training the weak learner on the t-th iteration, the error of h_t is calculated.
Next, AdaBoost sets a parameter α_t. Intuitively, α_t measures the importance that is assigned to h_t. Then the weight w_t is updated.

Fig. 3. AdaBoost algorithm for noise detection:

Input: n examples Z = {(x_1, y_1), ..., (x_n, y_n)}

Initialize:

$$w_1(z_i) = \begin{cases} \frac{1}{2m}, & \text{if } y_i = 1 \\ \frac{1}{2l}, & \text{if } y_i = -1 \end{cases}$$

where m is the number of positive data and l is the number of negative data.

Do for t = 1, ..., T:

1. Train a base learner with respect to the weighted example distribution w_t and obtain the hypothesis h_t : x → {-1, 1}.

2. Calculate the training error ε_t of h_t:

$$\epsilon_t = \sum_{i=1}^{n} w_t(z_i)\,\frac{I(h_t(x_i) \neq y_i) + 1}{2}, \qquad I(h_t(x_i) \neq y_i) = \begin{cases} 1, & \text{if } h_t(x_i) \neq y_i \\ -1, & \text{otherwise} \end{cases}$$

3. Set

$$\alpha_t = \log\frac{1 - \epsilon_t}{\epsilon_t}$$

4. Update the example distribution w_t:

$$w_{t+1}(z_i) = \frac{w_t(z_i)\exp\{\alpha_t I(h_t(x_i) \neq y_i)\}}{\sum_{j=1}^{n} w_t(z_j)\exp\{\alpha_t I(h_t(x_j) \neq y_j)\}} \quad (1)$$

Output: final hypothesis:

$$f(x) = \frac{1}{\|\alpha\|}\sum_{t} \alpha_t h_t(x) \quad (2)$$
Equation (1) leads to an increase of the weight for the data misclassified by h_t. Therefore, the weight tends to concentrate on "hard" data. After the T-th iteration, the final hypothesis f(x) combines the outputs of the T weak hypotheses using a weighted majority vote. Outputs H(x_i) are decided using f(x_i) in Equation (2) and the threshold η as follows:

$$H(x_i) = \begin{cases} 1, & \text{if } f(x_i) \geq \eta \\ -1, & \text{otherwise} \end{cases} \quad (4)$$

As AdaBoost trains the weights focusing on "hard" data, we can expect that it will achieve extremely high detection rates even if the power of the noise to be detected is low.
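For concreteness, here is a minimal numpy sketch of the detector this section describes: decision stumps as in Eq. (3), the training loop of Fig. 3, and thresholded detection as in Eq. (4). The brute-force stump search and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the feature j, threshold theta and parity p that minimize the
    weighted error of the stump h(x) = 1 if p*x_j < p*theta else -1 (Eq. (3))."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (1, -1):
                pred = np.where(p * X[:, j] < p * theta, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, theta, p)
    return best[1:]

def train_adaboost(X, y, T=500):
    """Two-class AdaBoost of Fig. 3, labels y in {-1, +1}, +1 = noisy speech.
    T = 500 follows Section 7.1; the exhaustive stump search is slow and is
    only meant to illustrate the algorithm."""
    m, l = np.sum(y == 1), np.sum(y == -1)
    w = np.where(y == 1, 1.0 / (2 * m), 1.0 / (2 * l))  # initial distribution
    stumps, alphas = [], []
    for _ in range(T):
        j, theta, p = fit_stump(X, y, w)
        pred = np.where(p * X[:, j] < p * theta, 1, -1)
        eps = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # training error
        alpha = np.log((1 - eps) / eps)                      # step 3
        I = np.where(pred != y, 1.0, -1.0)                   # +1 if misclassified
        w = w * np.exp(alpha * I)                            # Eq. (1)
        w = w / w.sum()
        stumps.append((j, theta, p))
        alphas.append(alpha)
    alphas = np.asarray(alphas)

    def f(Xq):
        """Final hypothesis f(x) of Eq. (2), normalized to [-1, 1]."""
        votes = np.stack([np.where(p * Xq[:, j] < p * theta, 1, -1)
                          for j, theta, p in stumps])
        return alphas @ votes / np.abs(alphas).sum()

    return f

# Detection, Eq. (4): a frame is noisy speech when f(x) >= eta, e.g.
#   f = train_adaboost(X_train, y_train)
#   H = np.where(f(X_test) >= 0.0, 1, -1)
```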
4. NOISE CLASSIFICATION WITH MULTI-CLASS ADABOOST
Because AdaBoost is based on a two-class classifier, it is difficult to classify multi-class noises. Therefore, we use extended multi-class AdaBoost to classify sudden noises. There are some ways to carry out multi-class classification using a pairwise method (such as a tree); for example, K-pairwise or one-vs-rest [7]. In this paper, we used one-vs-rest for multi-class classification using AdaBoost. This method creates multiple two-class classifiers, which distinguish between one class and the other classes. The largest value is selected from the output values and used as the resulting value. The number of classifiers is the same as the number of classes to classify. The multi-class AdaBoost algorithm is as follows:
Input: m examples {(x_1, y_1), ..., (x_m, y_m)}, with y_i ∈ {1, ..., K}.

Do for k = 1, ..., K:

1. Set labels

$$y_i^k = \begin{cases} +1, & \text{if } y_i = k \\ -1, & \text{otherwise} \end{cases} \quad (5)$$

2. Learn the k-th classifier f^k(x) using AdaBoost for the data set Z^k = {(x_1, y_1^k), ..., (x_m, y_m^k)}.

Final classifier:

$$\hat{k} = \arg\max_k f^k(x) \quad (6)$$

The multi-class algorithm is applied to the detected noisy frames overlapped by sudden noises. The number of classifiers, K, corresponds to the number of noise classes. The k-th classifier is designed to separate class k from the other classes (Fig. 4) using AdaBoost, as described in Section 3. The final classifier decides the noise class having the maximum value over all classes in (6).

The multi-class AdaBoost could be applied to the noise detection problem, too. But in this paper, due to the computation cost, the two-class AdaBoost first detects noisy speech, and then only the detected frames are classified into each noise class using multi-class AdaBoost.
Fig. 4. One-vs-rest AdaBoost for noise classification: K two-class classifiers (1st class vs. the others, 2nd class vs. the others, ..., K-th class vs. the others) score the feature vector, and the output is k̂ = argmax_k f^k(x).
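A sketch of the one-vs-rest wrapper follows, reusing the hypothetical train_adaboost function from the sketch in Section 3: each class is relabeled against the rest per Eq. (5), and Eq. (6) picks the highest-scoring class.

```python
import numpy as np

def train_one_vs_rest(X, y, K, T=500):
    """Train one two-class AdaBoost per noise class k = 0, ..., K-1,
    relabeling the data as +1 (class k) vs. -1 (all others), Eq. (5)."""
    return [train_adaboost(X, np.where(y == k, 1, -1), T) for k in range(K)]

def classify_noise(classifiers, Xq):
    """Eq. (6): output the class whose classifier gives the maximum score."""
    scores = np.stack([f(Xq) for f in classifiers])  # shape (K, n_frames)
    return scores.argmax(axis=0)
```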
5. SMOOTHING
A signal interval detected by AdaBoost may result in only a few frames (an unrealistically short interval) due to the frame-independent detection and classification. Therefore, in this paper, majority voting is applied to a small number of frames in order to delete such unrealistically short intervals. When carrying out the smoothing of one frame, the prior three and subsequent three frames are also taken into consideration, meaning that majority voting is carried out on a total of seven frames. For the outputs of detection and classification c_i (i = N-3, ..., N, ..., N+3), majority voting at the N-th frame is as follows:

$$c_N = \arg\max_c \sum_{i=N-3}^{N+3} I(c_i = c) \quad (7)$$

$$I(c_i = c) = \begin{cases} 1, & \text{if } c_i = c \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

This is repeated until c_N does not change. Using this method, only noise that continues for four or more frames can be detected.
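A short sketch of this smoothing step under the stated 7-frame window; how the ends of the sequence are handled is an assumption (a truncated window is used here), since the paper does not spell it out.

```python
import numpy as np

def smooth_labels(c, half=3):
    """Majority voting of Eqs. (7)-(8): each frame takes the most frequent
    label among itself, the 3 prior and the 3 subsequent frames, repeated
    until the sequence stops changing. This removes detected intervals
    shorter than four frames."""
    c = np.asarray(c).copy()
    while True:
        out = c.copy()
        for n in range(len(c)):
            window = c[max(0, n - half): n + half + 1]  # truncated at edges
            vals, counts = np.unique(window, return_counts=True)
            out[n] = vals[counts.argmax()]
        if np.array_equal(out, c):
            return out
        c = out
```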
6. GMM-BASED NOISE DETECTION AND
CLASSIFICATION
We used a conventional GMM (Gaussian mixture model) for comparison with the proposed method. GMMs are used widely for VAD (Voice Activity Detection) because the model is easy to train and usually powerful [8]. GMMs are expressed using an m-th mixture mean vector μ_m and covariance matrix Σ_m as follows:

$$\Pr(x) = \sum_m P(m)\, N(x; \mu_m, \Sigma_m) \quad (9)$$

In this paper, in order to detect sudden noises, we trained two GMMs (a clean speech model and a noisy speech model), where the number of mixtures is 64. Using the two GMMs, the log-likelihood ratio is calculated by

$$L(x) = \log \frac{\Pr(x \mid \text{speech model})}{\Pr(x \mid \text{noisy model})} \quad (10)$$

In a similar way as with AdaBoost, the output H(x_i) is decided using the threshold η:

$$H(x_i) = \begin{cases} 1, & \text{if } L(x_i) \leq \eta \\ -1, & \text{otherwise} \end{cases} \quad (11)$$

In order to classify noise types, we need to train a noise GMM for each noise. Then, for the detected noisy speech only, we find the maximum-likelihood noise from among the noise GMMs:

$$C(x) = \arg\max_k \Pr(x \mid \text{noisy model}^{(k)}) \quad (12)$$

When a GMM is used for detection and classification, the smoothing method is the same as in Section 5.
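A sketch of this baseline using scikit-learn's GaussianMixture (the paper does not specify its GMM implementation); clean_frames, noisy_frames and noise_gmms are placeholder training inputs, and the inequality direction in Eq. (11) follows the reconstruction above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Detection GMMs with 64 mixtures each, as in Section 6. The training
# arrays hold log-mel feature vectors, one row per frame.
speech_gmm = GaussianMixture(n_components=64).fit(clean_frames)
noisy_gmm = GaussianMixture(n_components=64).fit(noisy_frames)

def gmm_detect(X, eta=0.0):
    """Eqs. (10)-(11): flag a frame as noisy speech (label 1) when the
    speech/noisy log-likelihood ratio L(x) falls below the threshold."""
    L = speech_gmm.score_samples(X) - noisy_gmm.score_samples(X)
    return np.where(L <= eta, 1, -1)

def gmm_classify(noise_gmms, X):
    """Eq. (12): maximum-likelihood noise class among per-noise GMMs,
    applied only to frames already detected as noisy speech."""
    loglik = np.stack([g.score_samples(X) for g in noise_gmms])
    return loglik.argmax(axis=0)
```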

7. EXPERIMENTS
7.1. Experimental Conditions
To evaluate the proposed method, we used six kinds of sudden noises from the RWCP corpus [9]. The following sudden-noise sounds were used: spraying, telephone sounds, tearing paper, pouring of a granular substance, bell-ringing and horn blowing. In the database, each kind of noise has 50 data samples, which are divided into 20 data samples for training and 30 for testing. These noises were overlapped with speech signals, so the frames are classified into "speech with spraying," "speech with telephone sounds," "speech with tearing paper," "speech with pouring of a granular substance," "speech with bell-ringing" and "speech with horn blowing."

In order to make noisy speech corrupted by sudden noises, we added the sudden noises to clean speech in the wave domain and used 2,104 utterances of 5 men for testing and 210 utterances of 21 men for training (the total number of training data: 210 utterances × (6 + 1) = 1,470). Noises, whose SNR was adjusted between -5 dB and 5 dB, were overlapped with the training data. Similarly, test-data noise had SNRs of -5 dB, 0 dB and 5 dB, and each sound continued for about 200 ms.

The speech signal was sampled at 16 kHz and windowed with a 20-msec Hamming window every 10 msec, and a 24-order log-mel power spectrum and 12-order MFCCs were used as feature vectors. The number of training iterations, T, was 500, where AdaBoost was composed of 500 weak classifiers.

For evaluation, we used five criteria: recall ratio, precision ratio, F-measure, classification ratio and accuracy. These are calculated by the following equations:

$$\text{Recall} = \frac{tp}{tp + fn} \quad (13)$$

$$\text{Precision} = \frac{tp}{tp + fp} \quad (14)$$

$$\text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \quad (15)$$

$$\text{Classification} = \frac{tp - ce}{tp} \quad (16)$$

$$\text{Accuracy} = \frac{tp - fp - ce}{tp + fn} \quad (17)$$

where tp is the number of true positive frames, i.e., "noisy speech" frames that were actually detected as "noisy speech." Similarly, fp represents false positive frames, i.e., "clean speech" frames detected as "noisy speech," and fn represents false negative frames, i.e., "noisy speech" frames identified as "clean speech." ce is the number of classification error frames. Therefore, recall ratio, precision ratio and F-measure are calculated without considering whether noises are classified correctly or not. In contrast, the classification ratio is calculated only on true positive frames, without considering false positive and false negative detections. Accuracy is a comprehensive evaluation of detection and classification. These criteria are calculated using short time frames.
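The five criteria reduce to simple arithmetic on the frame counts defined above; the following is a direct transcription of Eqs. (13)-(17).

```python
def frame_metrics(tp, fp, fn, ce):
    """Compute Eqs. (13)-(17) from frame counts: tp, fp, fn as defined in
    the text, and ce = detected noisy frames assigned to the wrong class."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    classification = (tp - ce) / tp
    accuracy = (tp - fp - ce) / (tp + fn)
    return recall, precision, f_measure, classification, accuracy
```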
Fig. 5. Results of noise detection and classification at SNRs of -5 dB, 0 dB and 5 dB when thresholds were not adjusted (η = 0); bar charts compare AdaBoost and GMM on recall, precision, F-measure, classification and accuracy.
7.2. Experimental Results
Figure 5 shows the results of the sudden-noise detection and classification when both thresholds (η in Equations (4) and (11)) are 0. Here the SNR is calculated by

$$\text{SNR} = 10 \log \frac{E[s^2]}{E[n^2]} \quad (18)$$

where E[s^2] is the expectation of the power of the clean speech signal and E[n^2] is that of the noise. Therefore, an increase of the SNR degrades the performance of the noise detection and classification because the noise power decreases. Figure 5 shows that GMM has higher performance at an SNR of -5 dB. But AdaBoost had higher performance than GMM at SNRs of 0 and 5 dB, except for precision.
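The noisy test material described in Section 7.1 can be reproduced by scaling each noise burst so that Eq. (18) yields the desired SNR before adding it to the clean waveform. The sketch below does this; the random placement of the burst is an assumption, as the paper does not state where within an utterance the noises were inserted.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that SNR = 10 log10(E[s^2] / E[n^2]) equals snr_db
    (Eq. (18)), then overlap it with the clean speech in the wave domain.
    Assumes the noise burst is shorter than the utterance."""
    gain = np.sqrt(np.mean(speech ** 2) /
                   (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    noisy = speech.copy()
    start = np.random.randint(0, len(speech) - len(noise))
    noisy[start:start + len(noise)] += gain * noise
    return noisy
```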

Fig. 6. Results of noise detection and classification at SNRs of -5 dB, 0 dB and 5 dB when thresholds were adjusted so as to maximize F-measures; bar charts compare AdaBoost and GMM on recall, precision, F-measure, classification and accuracy.
In addition, Figure 6 shows the results when the thresholds were adjusted so as to maximize each F-measure.

As can be seen from this figure, these results clarify the effectiveness of the AdaBoost-based method in comparison to the GMM-based method, particularly in regard to classification. As the SNR increases (and the noise power decreases), the difference in performance becomes large. Since the GMM-based method calculates only the mean and covariance of the training data, it may be difficult to express exactly a complex non-linear boundary between clean speech and noisy speech (overlapped by a low-power noise). On the other hand, the AdaBoost system can obtain good performance at an SNR of 5 dB because AdaBoost can make a non-linear boundary directly from the training data near the boundary.
8. CONCLUSION
We proposed sudden-noise detection and classification with Boosting. Experimental results show that the performance using AdaBoost is better than that of the conventional GMM-based method, especially at a high SNR (meaning, under low-power noise conditions). The reason is that Boosting could train a complex non-linear boundary by weighting the training data heavily, while the GMM approach could not express the complex boundary because the GMM-based method calculates only the mean and covariance of the training data. Future research will include combining noise detection and classification with noise reduction.
9. REFERENCES
[1] V. Barreaud et al., "On-Line Frame-Synchronous Compensation of Non-Stationary Noise," ICASSP, vol. 1, pp. 652-655, 2003.

[2] M. Fujimoto and S. Nakamura, "Particle Filter Based Non-stationary Noise Tracking for Robust Speech Recognition," ICASSP, vol. 1, pp. 257-260, 2005.

[3] A. Betkowska, K. Shinoda, and S. Furui, "FHMM for Robust Speech Recognition in Home Environment," Proc. Symposium on Large-Scale Knowledge Resources, pp. 129-132, 2006.

[4] M. Ida and S. Nakamura, "HMM Composition-Based Rapid Model Adaptation Using a Priori Noise GMM Adaptation Evaluation on Aurora2 Corpus," ICSLP 2002, vol. 1, pp. 437-440, 2002.

[5] Y. Freund et al., "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, 55, pp. 119-139, 1997.

[6] P. Viola et al., "Rapid Object Detection Using a Boosted Cascade of Simple Features," IEEE CVPR, vol. 1, pp. 511-518, 2001.

[7] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 2004.

[8] A. Lee et al., "Noise Robust Real World Spoken Dialog System Using GMM Based Rejection of Unintended Inputs," ICSLP, vol. 1, pp. 173-176, 2004.

[9] S. Nakamura et al., "Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition," 2nd ICLRE, pp. 965-968, 2000.
Citations

DissertationDOI
01 Jan 2014
TL;DR: The approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions, which enables novel methods for SER to be developed based on spectrogramimage processing, which are inspired by techniques from the field of image processing.
Abstract: The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and more generally the rich transcription of acoustic environments. The challenges in such environments are the adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones compared to the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these challenging unstructured environments, and it is important to develop new methods that can perform well for this challenging task. In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing. The motivation for such an approach is based on finding an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information

55 citations


Cites methods from "Noise Detection and Classification ..."

  • ...A simpler approach is therefore to modify the training to include a category containing different combinations of overlapping sound events, then perform conventional classification [254]....



Journal ArticleDOI
Abstract: This paper describes a method for reducing sudden noise using noise detection and classification methods, and noise power estimation. Sudden noise detection and classification have been dealt with in our previous study. In this paper, GMM-based noise reduction is performed using the detection and classification results. As a result of classification, we can determine the kind of noise we are dealing with, but the power is unknown. In this paper, this problem is solved by combining an estimation of noise power with the noise reduction method. In our experiments, the proposed method achieved good performance for recognition of utterances overlapped by sudden noises.

12 citations


Proceedings ArticleDOI
S. Lakshmikanth, K. R. Natraj, K. R. Rekha
03 Apr 2014
TL;DR: A particle swarm optimization (PSO) based wiener filter for enhancement of filtering and a comparative analysis is performed on these algorithms and generated the MSE and PSNR values of signals.
Abstract: Industrial noise is generated due to the number of sources that interfere with the signals. The source and weight of noise signals are hard to analyze, hence a collective form of noise called Gaussian noise is considered in this paper. This noise is a collective form of noise signals that arise in industrial and transmission scales of signal processing. We have implemented a Wiener filter, the least-mean-square (LMS) algorithm and the normalized LMS algorithm for denoising the noisy signals. In this paper we propose a particle swarm optimization (PSO) based Wiener filter for enhancement of filtering. A comparative analysis is performed on these algorithms, generating the MSE and PSNR values of the signals.

4 citations


Proceedings Article
01 Dec 2012
TL;DR: A novel and fast weighting method using an AdaBoost algorithm to find the sensor area contributing to the accurate discrimination of vowels and results for vowel recognition show the large-weight MEG sensors mainly in a language area of the brain and the high classification accuracy.
Abstract: This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). In our previous work, a weighting method using multiple kernel learning was proposed, but this method had a high computational cost. In this paper, we propose a novel and fast weighting method using an AdaBoost algorithm to find the sensor area contributing to the accurate discrimination of vowels. Our AdaBoost simultaneously estimates both the classification boundary and the weight to each MEG sensor, with the MEG amplitude obtained from each pair of sensors being an element of the feature vector. The estimated weight indicates how useful the corresponding sensor is for classifying the MEG response patterns. Our results for vowel recognition show the large-weight MEG sensors mainly in a language area of the brain and a high classification accuracy (91.0%) in the latency range between 50 and 150 ms.

4 citations


Cites background from "Noise Detection and Classification ..."

  • ...Boosting-based algorithms have recently been developed on a wide range of area, such as text processing [12][13], image processing [14][15], speech recognition [16][17], and so on [18]....



References

Proceedings ArticleDOI
Paul A. Viola, Michael Jones
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

17,417 citations


"Noise Detection and Classification ..." refers background in this paper

  • ...Since Boosting is fast and has high performance, it is commonly used for face detection in images [6]....



Journal ArticleDOI
Yoav Freund, Robert E. Schapire
01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Abstract: In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.

14,262 citations


Book
01 Oct 2004
TL;DR: Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts, and discusses many methods from different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining.
Abstract: The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. In order to present a unified treatment of machine learning problems and solutions, it discusses many methods from different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program. The text covers such topics as supervised learning, Bayesian decision theory, parametric methods, multivariate methods, multilayer perceptrons, local models, hidden Markov models, assessing and comparing classification algorithms, and reinforcement learning. New to the second edition are chapters on kernel machines, graphical models, and Bayesian estimation; expanded coverage of statistical tests in a chapter on design and analysis of machine learning experiments; case studies available on the Web (with downloadable results for instructors); and many additional exercises. All chapters have been revised and updated. Introduction to Machine Learning can be used by advanced undergraduates and graduate students who have completed courses in computer programming, probability, calculus, and linear algebra. It will also be of interest to engineers in the field who are concerned with the application of machine learning methods. Adaptive Computation and Machine Learning series

3,947 citations


Proceedings Article
Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, T. Nishiura, +1 more
01 May 2000
TL;DR: LREC2000: the 2nd International Conference on Language Resources and Evaluation, May 31 - June 2, 2000, Athens, Greece.
Abstract: LREC2000: the 2nd International Conference on Language Resources and Evaluation, May 31 - June 2, 2000, Athens, Greece.

246 citations


"Noise Detection and Classification ..." refers methods in this paper

  • ...GMMs are used widely for VAD (Voice Activity Detection) because the model is easy to train and usually powerful [8]....



Proceedings ArticleDOI
04 Oct 2004
TL;DR: ICSLP2004: the 8th International Conference on Spoken Language Processing, October 4-8, 2004, Jeju Island, Korea.
Abstract: ICSLP2004: the 8th International Conference on Spoken Language Processing, October 4-8, 2004, Jeju Island, Korea.

59 citations


"Noise Detection and Classification ..." refers methods in this paper

  • ...Using this method, we are only able to detect 4 or more frames if continuous noise....



Network Information
Related Papers (5)
05 Sep 1995

R. Vergin, Douglas O'Shaughnessy

01 Sep 2003

Michael L. Seltzer, Jasha Droppo +1 more

06 Apr 2003

Francoise Beaufays, D. Boies +2 more

Performance Metrics
Number of citations received by the paper in previous years:

Year    Citations
2014    2
2012    1
2010    1