SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word
Citations
Content-based multimedia information retrieval: State of the art and challenges
An overview of automatic speaker diarization systems
Retrieval and browsing of spoken content
Review: Speaker segmentation and clustering
A review on speaker diarization systems and approaches
References
An Introduction to Multivariate Statistical Analysis
SRILM – An Extensible Language Modeling Toolkit
An introduction to latent semantic analysis
Introduction To Multivariate Statistical Analysis
Digital Watermarking
Related Papers (5)
Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition
Frequently Asked Questions (9)
Q2. What is the SMF problem used to design?
The SMF problem is used to design systems that are affine-in-parameters (but not necessarily in the data), subject to a bound on the absolute error between a desired sequence and a linearly filtered version of another sequence.
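As a rough illustration of the error bound described above, the sketch below checks whether a candidate parameter vector keeps the absolute error within a bound at every sample. The FIR filter structure, the function names, and the bound `gamma` are assumptions made for this example; they are not taken from the paper.

```python
def filter_output(weights, x, n):
    """Output at time n of an FIR filter (assumed structure) applied to x."""
    return sum(w * x[n - k] for k, w in enumerate(weights) if n - k >= 0)


def satisfies_error_bound(weights, x, desired, gamma):
    """True if |desired[n] - filtered x[n]| <= gamma holds for every n."""
    return all(
        abs(desired[n] - filter_output(weights, x, n)) <= gamma
        for n in range(len(desired))
    )


# Toy data (hypothetical): the desired sequence is a two-tap average of x.
x = [1.0, 2.0, 3.0, 4.0]
desired = [0.5, 1.5, 2.5, 3.5]

print(satisfies_error_bound([0.5, 0.5], x, desired, gamma=0.1))
print(satisfies_error_bound([1.0, 0.0], x, desired, gamma=0.1))
```

The set of all weight vectors that pass this check is the feasible set for the bound; the first candidate lies inside it and the second does not.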
Q3. What is the way to search for key words in a news story?
For broadcast news (BN) stories, word error rates (WERs) can be low and the stories contain much redundancy, so searching for key words over longer sequences is a reasonable approach.
Q4. Why is the prediction residual used for the stegosignal?
Because the prediction residual associated with the coversignal is used for reconstructing the stegosignal, the autocorrelation values of the stegosignal are different from the modified autocorrelation values derived from the perturbed LP coefficients and the prediction residual.
Q5. What is the common approach to minimizing the shift from the well-trained baseline model?
A conservative approach is to minimize the shift from the well-trained baseline model parameters, given the constraint of no loss of discrimination power along the first dominant eigendirections in the test speaker eigenspace, as expressed in (5). By substituting (4) into (5) and minimizing the objective function using the Lagrange multiplier method, the adapted mean can be obtained using a linear transformation whose nonsingular matrix is given by (6), which also involves an identity matrix.
Q6. How many different types of Gaussian states exist in the acoustic model?
The baseline speaker-independent acoustic model has 6275 context-dependent tied states, each having 16 mixture component Gaussians (i.e., in total, 100,400 diagonal mixture component Gaussians exist in the acoustic model).
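The total follows directly from the two figures quoted in the answer; a short check, with variable names chosen only for this example:

```python
tied_states = 6275        # context-dependent tied states (from the text)
gaussians_per_state = 16  # mixture components per state (from the text)

total_gaussians = tied_states * gaussians_per_state
print(total_gaussians)  # 100400
```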
Q7. What constraint was used to determine the fidelity of the watermarking algorithm?
2) SMF-Based Fidelity Criterion: In [71], a general parameter-embedding problem was considered whose solution is subject to a fidelity constraint on the signal.
Q8. How many terms are in the transcribed audio?
Since the transcribed audio segments have considerable length variations, the authors set the number of terms picked equal to some percentage of the number of terms in each original automatic audio transcription (which achieves better performance than picking a fixed number of terms for all spoken documents).
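A minimal sketch of the length-proportional choice described above, assuming the transcription is available as a list of tokens. The function name, the 10% figure, the rounding-up rule, and counting tokens rather than distinct terms are all assumptions for illustration; the answer does not state them.

```python
def term_budget(transcript_terms, percent):
    """Number of terms to pick: `percent` of the transcript length, rounded up."""
    return (len(transcript_terms) * percent + 99) // 100


# Hypothetical transcripts of very different lengths.
short_doc = ["word"] * 15
long_doc = ["word"] * 200

print(term_budget(short_doc, 10))  # 2
print(term_budget(long_doc, 10))   # 20
```

A fixed count would give both documents the same budget, whereas the proportional rule scales the budget with transcript length.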
Q9. What is the new tfidf weighting scheme?
The tfidf weighting scheme is replaced with Okapi weighting [90], and several query and document expansion technologies are incorporated.
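For readers unfamiliar with Okapi weighting, the sketch below shows the widely used BM25 form of the term weight. The answer does not say which variant [90] defines, so the formula, the parameter values `k1=1.2` and `b=0.75`, and the function name are assumptions, not the authors' exact scheme.

```python
import math


def okapi_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style weight of one term in one document (assumed variant)."""
    # Inverse document frequency: rarer terms get larger weights.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Saturating term frequency, normalized by relative document length.
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf


# Hypothetical statistics: term in 10 of 100 documents, average-length document.
print(okapi_weight(tf=1, df=10, n_docs=100, doc_len=50, avg_doc_len=50))
print(okapi_weight(tf=3, df=10, n_docs=100, doc_len=50, avg_doc_len=50))
```

Unlike plain tfidf, the term-frequency factor saturates as `tf` grows and is discounted for longer-than-average documents, which matters when transcripts vary widely in length.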