scispace - formally typeset
Author

Rolf Jongebloed

Bio: Rolf Jongebloed is an academic researcher from Braunschweig University of Technology. The author has contributed to research in topics: Voice activity detection & Microphone. The author has an h-index of 1 and has co-authored 1 publication receiving 6 citations.

Papers
Proceedings ArticleDOI
15 Apr 2018
TL;DR: This work investigates single- and multi-talk across a wide range of crosstalk levels and improves detection accuracy over a standardized voice activity detection by 12.89% absolute overall.
Abstract: Multichannel recordings of meetings with a (wireless) headset for each person commonly deliver the best audio quality for subsequent analyses. However, speech portions of other participants can still couple into the microphone channel of the associated target speaker. Due to this crosstalk, a speaker activity detection (SAD) is required in order to identify only the speech portions of the target speaker in the related microphone channel. While most solutions are either complex and require a training process, or achieve insufficient results in multi-talk situations, we propose a low-complexity method that can handle both crosstalk and multi-talk situations. We investigate single- and multi-talk across a wide range of crosstalk levels and improve the detection accuracy over a standardized voice activity detection by 12.89% absolute overall, while exceeding a state-of-the-art multichannel SAD by as much as 13.76% absolute.
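The abstract does not spell out the algorithm, so as a hedged illustration of the underlying idea only — deciding target-speaker activity by comparing frame power across the close-talk channels — here is a minimal sketch. The function name, margin, and floor thresholds are invented for illustration and are not taken from the paper; note this simple comparison alone would not resolve equal-level multi-talk, which the paper additionally addresses.

```python
import numpy as np

def multichannel_sad(frames, margin_db=3.0, floor_db=-50.0):
    """Toy speaker activity detection via channel power comparison.

    frames: shape (num_channels, num_frames, frame_len), one close-talk
    channel per speaker. A speaker counts as active in a frame when the
    power of that channel exceeds the loudest other channel by margin_db
    (crosstalk rejection) and lies above an absolute floor.
    Margin and floor values are illustrative, not from the paper.
    """
    power_db = 10.0 * np.log10(np.mean(frames ** 2, axis=2) + 1e-12)
    active = np.zeros(power_db.shape, dtype=bool)
    for ch in range(power_db.shape[0]):
        # loudest competing channel per frame
        others = np.delete(power_db, ch, axis=0).max(axis=0)
        active[ch] = (power_db[ch] > others + margin_db) & (power_db[ch] > floor_db)
    return active
```

Because the decision uses only per-frame powers, the method needs no training, matching the low-complexity motivation in the abstract.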

6 citations


Cited by
Proceedings ArticleDOI
04 May 2020
TL;DR: This work extends an existing approach by integrating methods from acoustic echo cancellation to improve the estimation of the interferer (noise) components of the filter, which improves the signal-to-interferer ratio by up to 2.1 dB absolute at constant speech component quality.
Abstract: Recording a meeting and obtaining clean speech signals for each speaker is a challenging task. Even with a multichannel recording in which all speakers are equipped with a close-talk microphone, speech of an active speaker couples not only into his dedicated microphone, but also into all other microphone channels at a certain level. This is denoted as crosstalk and requires a multichannel speaker interference reduction to enhance the microphone channels for further processing. To address this, we use a Wiener filter based on all individual microphone channels. We extend an existing approach by integrating methods from acoustic echo cancellation to improve the estimation of the interferer (noise) components of the filter, which improves the signal-to-interferer ratio by up to 2.1 dB absolute at constant speech component quality.
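The core of such an approach is the classic Wiener gain per frequency bin; the paper's contribution lies in how the interferer components are estimated. As a hedged sketch under the assumption that the speech and interferer power spectra are already given (function names are invented for illustration):

```python
import numpy as np

def wiener_gain(target_psd, interferer_psd, eps=1e-12):
    """Per-frequency-bin Wiener gain H = Phi_ss / (Phi_ss + Phi_nn).

    target_psd and interferer_psd are nonnegative power spectra (e.g.,
    one STFT frame). The paper estimates the interferer PSD with
    AEC-derived methods; here it is simply assumed known.
    """
    return target_psd / (target_psd + interferer_psd + eps)

def enhance_frame(noisy_spectrum, target_psd, interferer_psd):
    """Attenuate bins dominated by the interferer, pass bins
    dominated by the target speaker."""
    return wiener_gain(target_psd, interferer_psd) * noisy_spectrum
```

The gain lies between 0 and 1 per bin, so the quality of the interferer PSD estimate directly controls how much crosstalk is removed versus how much target speech is preserved.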

7 citations

Journal ArticleDOI
TL;DR: An adaptive filter method originally proposed for acoustic echo cancellation (AEC) is integrated to obtain a well-performing interferer (noise) component estimate, resulting in a speech-to-interferer ratio improved by up to 2.7 dB at constant or even better speech component quality.
Abstract: Microphone leakage, or crosstalk, is a common problem in multichannel close-talk audio recordings (e.g., meetings or live music performances): a target signal couples not only into its dedicated microphone, but also into all other microphone channels. For further signal processing, such as automatic transcription of a meeting, a multichannel speaker interference reduction is required to eliminate the interfering speech signals in the microphone channels. The contribution of this paper is twofold. First, we consider multichannel close-talk recordings of a three-person meeting scenario with various crosstalk levels. To eliminate the crosstalk in the target microphone channel, we extend a multichannel Wiener filter approach that considers all individual microphone channels, integrating an adaptive filter method originally proposed for acoustic echo cancellation (AEC) to obtain a well-performing interferer (noise) component estimate. This improves the speech-to-interferer ratio by up to 2.7 dB at constant or even better speech component quality. Second, since an AEC method typically requires clean reference channels, we investigate and explain why the AEC algorithm is able to successfully estimate the interfering signals and the room impulse responses between the microphones of the interferer and the target speakers, even though the reference signals are themselves disturbed by crosstalk in the considered meeting scenario.
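The adaptive-filter idea borrowed from AEC can be illustrated with the normalized LMS algorithm, the textbook workhorse of echo cancellation. This is a minimal sketch of the general principle only, not the paper's frequency-domain method, and all parameter values are invented defaults; in the real scenario studied in the paper, the reference signal x is itself disturbed by crosstalk.

```python
import numpy as np

def nlms(x, d, filt_len=8, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter.

    x: reference signal (the interferer's own microphone channel),
    d: target microphone containing crosstalk from the interferer.
    Returns the error signal e = d - w*x (the target channel with the
    interferer component reduced) and the learned filter weights,
    which model the path from interferer to target microphone.
    """
    w = np.zeros(filt_len)
    e = np.zeros(len(d))
    for n in range(filt_len - 1, len(d)):
        x_vec = x[n - filt_len + 1:n + 1][::-1]   # newest sample first
        e[n] = d[n] - w @ x_vec                   # cancel predicted crosstalk
        w += mu * e[n] * x_vec / (x_vec @ x_vec + eps)  # normalized update
    return e, w
```

After convergence the residual e carries mainly the target speaker's own speech, which is exactly the interferer (noise) component separation the Wiener filter above needs.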

1 citation

01 Jul 2020
TL;DR: The results show that the proposed CNN-based speech activity detection for the entertainment media domain achieves better performance than previous work in a more complicated noise environment.
Abstract: Speech activity detection (SAD) is a critical preparation step for speech-based applications; it identifies the speech portions of an audio recording. This paper proposes a CNN-based speech activity detection for the entertainment media domain. Two Dense Convolutional Networks (DenseNet) with different feature extraction are fused using Dempster-Shafer theory (DS theory) to classify speech segments. We combine acoustic features, namely the log-mel spectrogram (LM), mel-frequency cepstral coefficients (MFCC), chroma, spectral contrast, and tonnetz, as input. These features are computed from the raw speech signal and fed into the convolutional neural networks for classifying speech. The results show that the proposed speech activity detection achieves better performance (+1% accuracy, +8% precision, and +5% F1 score) than previous work in a more complicated noise environment.
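The Dempster-Shafer fusion step can be sketched for a two-class SAD problem. This is a hedged toy illustration of Dempster's rule of combination only — the mass values and key names are invented, and the paper fuses the outputs of two DenseNets rather than hand-set masses:

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over the
    frame of discernment {speech, nonspeech}; 'theta' carries the
    uncommitted mass assigned to the whole frame (uncertainty).
    """
    # mass assigned to contradictory pairs (speech vs. nonspeech)
    conflict = m1["speech"] * m2["nonspeech"] + m1["nonspeech"] * m2["speech"]
    norm = 1.0 - conflict  # renormalize after discarding conflict
    return {
        "speech": (m1["speech"] * m2["speech"]
                   + m1["speech"] * m2["theta"]
                   + m1["theta"] * m2["speech"]) / norm,
        "nonspeech": (m1["nonspeech"] * m2["nonspeech"]
                      + m1["nonspeech"] * m2["theta"]
                      + m1["theta"] * m2["nonspeech"]) / norm,
        "theta": m1["theta"] * m2["theta"] / norm,
    }
```

When both networks lean toward "speech", the combined speech mass exceeds either individual one, which is the point of the evidence fusion.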
Proceedings ArticleDOI
24 Jan 2021
TL;DR: The purpose of this work is not to improve the MAEC method, but instead to show that it can be successfully applied to microphone leakage reduction, such as in meetings with headset-equipped participants.
Abstract: Microphone leakage occurs in multichannel close-talk audio recordings of a meeting when speech of an active speaker couples into both the dedicated target microphone and all other microphone channels. For an automatic transcription or analysis of a meeting, the interferer signals in the target microphone channels have to be eliminated. To this end, we apply a frequency-domain adaptive filtering-based multichannel acoustic echo cancellation (MAEC) method, which typically requires clean reference channels. We consider a wide range of speech-to-interferer ratios and evaluate two cascading schemes for the MAEC, which improve speech component quality and interferer reduction by up to 0.1 MOS points and 0.5 dB, respectively. However, the purpose of this work is not to improve the MAEC method, but to show that it can be successfully applied to microphone leakage reduction, such as in meetings with headset-equipped participants. We therefore analyze why the MAEC method is able to cancel the interferer signals in this scenario even though the reference signals are themselves disturbed by interfering speech portions.
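The frequency-domain adaptive filtering principle behind such an MAEC can be sketched as an overlap-save fast block LMS filter. This is a bare-bones single-channel illustration under invented parameters, not the paper's multichannel method or its cascading schemes:

```python
import numpy as np

def fdaf(x, d, block=64, mu=0.5, eps=1e-8):
    """Toy overlap-save frequency-domain adaptive filter (FBLMS).

    x: reference (interfering speaker's) signal, d: target microphone.
    Returns the error signal, i.e., the target channel with the
    interferer contribution reduced. Block size and step size are
    illustrative defaults only.
    """
    W = np.zeros(2 * block, dtype=complex)   # frequency-domain weights
    e = np.zeros(len(d))
    for b in range(1, len(d) // block):
        X = np.fft.fft(x[(b - 1) * block:(b + 1) * block])  # old+new block
        y = np.real(np.fft.ifft(W * X))[block:]             # overlap-save output
        e_blk = d[b * block:(b + 1) * block] - y            # cancel interferer
        e[b * block:(b + 1) * block] = e_blk
        E = np.fft.fft(np.concatenate([np.zeros(block), e_blk]))
        # per-bin normalized gradient, then time-domain constraint
        grad = np.real(np.fft.ifft(np.conj(X) * E / (np.abs(X) ** 2 + eps)))
        grad[block:] = 0.0
        W += mu * np.fft.fft(grad)
    return e
```

Adapting in the frequency domain keeps the per-block cost near O(N log N) instead of O(N^2), which is why FDAF is the standard choice for the long impulse responses encountered in echo and leakage cancellation.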