A statistical model-based voice activity detection
Citations
902 citations
834 citations
Cites background from "A statistical model-based voice act..."
...Additionally, VADs are generally difficult to tune and their reliability severely deteriorates for weak speech components and low input SNR [15], [16], [20]....
[...]
634 citations
Cites methods from "A statistical model-based voice act..."
...More precisely, a statistical-model based voice activity detector (VAD) (Sohn et al., 1999) was used to update the noise spectrum during speech-absent periods....
[...]
...This was surprising at first, but close analysis indicated that the logMMSE-SPU algorithm was sensitive to the noise spectrum estimate, which in our case was obtained with a VAD algorithm....
[...]
...The frame windowing scheme proposed in (Jabloun and Champagne, 2003) was adopted in both VAD methods....
[...]
...The following VAD decision rule was used: $\frac{1}{L} \sum_{k=1}^{L-1} \log \Lambda_k \gtrless$ ...
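The decision rule quoted above compares an average of per-bin log likelihood ratios against a threshold. A minimal Python sketch of that comparison follows. The function name `vad_decision` and the `threshold` parameter are hypothetical, the threshold value is not given in the snippet, and because the summation limits in the snippet are damaged the sketch simply averages over whatever bins it is given.

```python
def vad_decision(log_lrs, threshold):
    """Return True (speech present) when the mean log likelihood ratio
    over the supplied frequency bins exceeds `threshold`.

    `log_lrs` holds one log likelihood ratio per frequency bin.
    `threshold` is an assumed tuning parameter, not a value from the source.
    """
    if not log_lrs:
        raise ValueError("need at least one frequency bin")
    mean_log_lr = sum(log_lrs) / len(log_lrs)
    return mean_log_lr > threshold


# Illustrative values only: three bins with clearly positive log ratios.
print(vad_decision([0.5, 1.5, 1.0], threshold=0.15))
```

Averaging in the log domain is equivalent to thresholding the geometric mean of the per-bin likelihood ratios, which keeps a single dominant bin from deciding the frame on its own.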
[...]
...(3) Incorporating noise estimation algorithms in place of VAD algorithms for updating the noise spectrum did not produce significant improvements in performance....
[...]
569 citations
Cites methods from "A statistical model-based voice act..."
...soft-decision speech pause detection is either implemented on a frame-by-frame basis [12, 22] or estimated independently for individual subbands using an a posteriori signal-to-noise ratio (SNR) [11, 13]....
[...]
554 citations
References
8,442 citations
"A statistical model-based voice act..." refers methods in this paper
...…frame state model, the current state depends on the previous observations as well as the current one, which is reflected on the decision rule in the following way: Based on the above formulations, a recursive formula for is obtained as (11) where denotes the likelihood ratio in (4) atth frame....
[...]
3,905 citations
2,714 citations
"A statistical model-based voice act..." refers background or methods in this paper
...…as follows: (5) Substituting (5) into (4) and applying the LRT yields the Itakura–Saito distortion (ISD) based decision rule [2], i.e., (6) Note that the left-hand side of (6) cannot be smaller than zero, which is the well-known property of ISD and implies that the likelihood ratio is biased to ....
[...]
...The likelihood ratio for the kth frequency band is (3) where and , and they are called the a priori and a posteriori signal-to-noise ratios (SNRs), respectively [3]....
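The snippet above elides equation (3) and its symbols. As a hedged illustration, if the noise-only and speech-plus-noise DFT coefficients in a band are both modelled as zero-mean complex Gaussians, the ratio of the two densities depends only on the a priori SNR (called `xi` here) and the a posteriori SNR (called `gamma` here). Those names, and the closed form in the comment, are assumptions of this sketch rather than text recovered from the page.

```python
import math


def log_likelihood_ratio(xi, gamma):
    """Log likelihood ratio for one frequency band under a complex
    Gaussian model (assumed form, not copied from the source):

        Lambda = 1 / (1 + xi) * exp(gamma * xi / (1 + xi))

    xi    : a priori SNR (speech variance / noise variance), xi >= 0
    gamma : a posteriori SNR (|X|^2 / noise variance), gamma >= 0
    """
    if xi < 0 or gamma < 0:
        raise ValueError("SNR values must be non-negative")
    # log1p(xi) computes log(1 + xi) accurately for small xi.
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


# With no speech energy assumed (xi = 0) the two hypotheses coincide,
# so the log ratio is exactly zero whatever gamma is.
print(log_likelihood_ratio(0.0, 3.0))
```

Working in the log domain avoids overflow when `gamma` is large and gives per-band values that can be averaged directly.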
[...]
...In this letter, we further optimize the decision rule by employing the decision-directed (DD) method for the estimation of the unknown parameters [3]....
[...]
...We adopt the Gaussian statistical model that the DFT coefficients of each process are asymptotically independent Gaussian random variables [3]....
[...]
578 citations
"A statistical model-based voice act..." refers methods in this paper
...The DD method of (7) provides smoother estimates of the a priori SNR than the ML method [4], and consequently reduces the fluctuation of the estimated likelihood ratios during noise-only periods....
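Equation (7) is not reproduced in the snippet above. The sketch below uses the commonly cited decision-directed form, a weighted sum of the previous frame's estimated speech power (normalised by the noise variance) and the floored instantaneous estimate `gamma - 1`. The function name, the argument names, and the default weight `alpha=0.98` are assumptions chosen for illustration.

```python
def dd_a_priori_snr(prev_amp_sq, noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR estimate for one frequency band.

    prev_amp_sq : squared speech amplitude estimated in the previous frame
    noise_var   : noise variance for this band (must be positive)
    gamma       : a posteriori SNR of the current frame
    alpha       : smoothing weight in [0, 1]; 0.98 is an assumed default
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Instantaneous (ML-style) estimate, floored at zero.
    ml_term = max(gamma - 1.0, 0.0)
    return alpha * prev_amp_sq / noise_var + (1.0 - alpha) * ml_term


# Illustrative call: no speech estimated in the previous frame.
print(dd_a_priori_snr(prev_amp_sq=0.0, noise_var=1.0, gamma=3.0, alpha=0.5))
```

With `alpha` close to 1 the estimate is dominated by the previous frame, so a single noisy `gamma` value moves it only slightly. That is the smoothing effect the snippet attributes to the DD method.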
[...]
196 citations