
Showing papers on "Microphone published in 2021"


Journal ArticleDOI
TL;DR: In this paper, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system, where the classical Euler-Lagrange equations are constructed by using the classical Lagrangian approach.
Abstract: In this study, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system. Initially, for both displacement and electrical charge, the classical Euler–Lagrange equations are constructed by using the classical Lagrangian approach. Expanding this classical scheme in a general fractional framework provides the new fractional Euler–Lagrange equations in which non-integer order derivatives involve a general function as their kernel. Applying an appropriate matrix approximation technique changes the latter fractional formulation into a nonlinear algebraic system. Finally, the derived system is solved numerically with a discussion on its dynamical behaviors. According to the obtained results, various features of the capacitor microphone under study are discovered due to the flexibility in choosing the kernel, unlike the previous mathematical formalism.
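The two-step construction can be sketched generically (the notation below is illustrative, not the paper's exact formulation). The classical step applies the Euler–Lagrange equations to the displacement x and the charge q,

$$\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{x}} - \frac{\partial \mathcal{L}}{\partial x} = 0, \qquad \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{q}} - \frac{\partial \mathcal{L}}{\partial q} = 0,$$

while the fractional generalization replaces the time derivative with a non-integer-order operator whose kernel k is left general, e.g. a Caputo-type form

$$\left({}^{k}D^{\alpha} f\right)(t) = \int_{0}^{t} k(t-\tau;\alpha)\,\frac{df}{d\tau}(\tau)\,d\tau, \qquad 0 < \alpha < 1.$$

Choosing different kernels (power-law, exponential, Mittag-Leffler, ...) recovers different named fractional derivatives, which is the flexibility the abstract credits for revealing the various features of the capacitor microphone.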

86 citations


Journal ArticleDOI
TL;DR: In this article, multi-microphone complex spectral mapping (MMCM) was proposed for speaker separation in reverberant conditions, and combined with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation.
Abstract: We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
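The MVDR step in such a pipeline can be sketched in a few lines. This is the generic per-frequency-bin MVDR formula, not the paper's full system; the 4-microphone steering vector and identity noise covariance below are toy assumptions for illustration.

```python
import numpy as np

def mvdr_weights(R_noise, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    Rinv_d = np.linalg.solve(R_noise, steering)
    return Rinv_d / (steering.conj() @ Rinv_d)

# toy example: 4 microphones, spatially white noise (identity covariance)
d = np.exp(-1j * np.pi * np.arange(4) * 0.3)   # hypothetical steering vector
w = mvdr_weights(np.eye(4), d)
```

The defining property of MVDR is the distortionless constraint: the filter passes the target direction with unit gain (w^H d = 1) while minimizing noise power at the output.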

41 citations


Journal ArticleDOI
TL;DR: In this article, photoacoustic spectroscopy (PAS) is used for gas sensing, which usually uses a capacitive microphone as a signal detector, but the electric nature of the microphone limits its performance.
Abstract: Photoacoustic spectroscopy (PAS) is an ultrasensitive method for gas sensing, which usually uses a capacitive microphone as the signal detector. However, the electric nature of the microphone limits...

40 citations


Journal ArticleDOI
TL;DR: A convolutional neural network (CNN) based SR model that takes advantage of information from both time and frequency domains is proposed, which improves the generalization capability for untrained microphone channels and unknown downsampling schemes.
Abstract: Speech super-resolution (SR) aims to increase the sampling rate of a given speech signal by generating high-frequency components. This paper proposes a convolutional neural network (CNN) based SR model that takes advantage of information from both time and frequency domains. Specifically, the proposed CNN is a time-domain model that takes the raw waveform of low-resolution speech as the input, and outputs an estimate of the corresponding high-resolution waveform. During the training stage, we employ a cross-domain loss to optimize the network. We compare our model with several deep neural network (DNN) based SR models, and experiments show that our model outperforms existing models. Furthermore, the robustness of DNN-based models is investigated, in particular regarding microphone channels and downsampling schemes, which have a major impact on the performance of DNN-based SR models. By training with proper datasets and preprocessing, we improve the generalization capability for untrained microphone channels and unknown downsampling schemes.

30 citations


Journal ArticleDOI
TL;DR: In this paper, three distinctive sensors for monitoring the laser powder bed fusion metal additive manufacturing process are applied simultaneously and compared: a microphone for airborne acoustic emissions, an on-axis two-colour pyrometer for melt-pool temperature measurement, and an off-axis thermographic camera.

27 citations


Journal ArticleDOI
29 Jun 2021-Energies
TL;DR: An overview of the importance of sound localization in different applications is presented, along with the uses and limitations of ad-hoc microphones relative to other microphones, and a survey of existing methods for sound localization using microphone arrays in the recent literature.
Abstract: Sound localization is a vast field of research and advancement, used in many applications to facilitate communication, radar, medical aid, and speech enhancement, to name but a few. Many different methods have been presented in this field in recent years. Various types of microphone arrays serve the purpose of sensing the incoming sound. This paper presents an overview of the importance of sound localization in different applications, along with the uses and limitations of ad-hoc microphones relative to other microphones; approaches to overcome these limitations are also presented. A detailed explanation of some of the existing methods for sound localization using microphone arrays in the recent literature is given. Existing methods are studied comparatively, along with the factors that influence the choice of one method over another. This review forms a basis for choosing the method that best fits a given use case.
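A building block common to many of the surveyed methods is TDOA estimation between a microphone pair, classically done with GCC-PHAT. The sketch below is the textbook algorithm (not any one paper's variant); the sampling rate and the 5-sample toy delay are illustrative.

```python
import numpy as np

def gcc_phat(x, y, fs):
    """TDOA of y relative to x via generalized cross-correlation
    with the phase transform (PHAT)."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12          # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n)
    # unwrap circular lags to the range [-n/2, n/2]
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))
    return (np.argmax(cc) - n // 2) / fs

fs = 16000
rng = np.random.default_rng(0)
sig = rng.standard_normal(2048)
delayed = np.concatenate((np.zeros(5), sig[:-5]))   # y lags x by 5 samples
tdoa = gcc_phat(sig, delayed, fs)
```

With TDOAs from several pairs in hand, the source position is then found by intersecting the corresponding hyperbolae (or by a least-squares solver).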

26 citations


Journal ArticleDOI
TL;DR: In this paper, a novel health index is proposed to detect incipient ball-bearing faults and monitor their progression earlier than the appearance of visible faults using a microphone sensor. A frequency energy shift method is employed to extract the health index (HI), based on the finding that energy is concentrated in, and shifts across, specific frequency bands as the fault progresses.

26 citations


Journal ArticleDOI
TL;DR: In this study, a sound-analysis-based multi-device operation monitoring system is developed and applied successfully in monitoring experiments in two different environments: a workshop in which a hand-operated device was used, and a factory with a computer numerical control machine.

24 citations


Journal ArticleDOI
TL;DR: An acoustic source identification system, which includes identifying both the recording device and the environment in which it was recorded, is proposed, which achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively, using unvoiced speech segments.
Abstract: The recording device along with the acoustic environment plays a major role in digital audio forensics. We propose an acoustic source identification system in this paper, which includes identifying both the recording device and the environment in which it was recorded. A hybrid Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) is used in this study to automatically extract environments and microphone features from the speech sound. In the experiments, we investigated the effect of using the voiced and unvoiced segments of speech on the accuracy of the environment and microphone classification. We also studied the effect of background noise on microphone classification in 3 different environments, i.e., very quiet, quiet, and noisy. The proposed system utilizes a subset of the KSU-DB corpus containing 3 environments, 4 classes of recording devices, 136 speakers (68 males and 68 females), and 3600 recordings of words, sentences, and continuous speech. This research combines the advantages of both CNN and RNN (in particular bidirectional LSTM) models, called CRNN. The speech signals were represented as a spectrogram and were fed to the CRNN model as 2D images. The proposed method achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively, using unvoiced speech segments.
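The "spectrogram as 2D image" front end mentioned above is easy to sketch. The window and hop sizes below are illustrative defaults, not the paper's settings, and a Hann window is assumed.

```python
import numpy as np

def log_spectrogram(x, win=256, hop=128):
    """Magnitude STFT on a dB scale -- the kind of 2D 'image' fed to a CRNN.
    (win/hop values here are illustrative, not the paper's parameters.)"""
    frames = np.lib.stride_tricks.sliding_window_view(x, win)[::hop]
    frames = frames * np.hanning(win)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10)       # shape: (time frames, freq bins)

img = log_spectrogram(np.random.default_rng(0).standard_normal(16000))
```

The resulting (time, frequency) array can be batched and passed to a 2D CNN exactly like an image tensor.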

22 citations


Journal ArticleDOI
TL;DR: This paper categorizes existing 3D microphone arrays according to their physical configurations, design philosophies, and purposes, followed by an overview of each array.
Abstract: Along with the recent advance of multichannel 3D audio technologies, a number of new microphone techniques for 3D sound recording have been proposed over the years. To choose a technique that is most suitable for the intended goal of a recording, it is first necessary to understand the design principles, pros, and cons of different techniques. This paper first categorizes existing 3D microphone arrays according to their physical configurations, design philosophies, and purposes, followed by an overview of each array. Studies that have subjectively or objectively evaluated different microphone arrays are also reviewed. Different approaches in the configuration of upper microphone layer are discussed, aiming to provide theoretical and practical insights into how they can contribute to creating an immersive auditory experience. Finally, limitations of previous studies and future research topics in 3D sound recording are identified.

21 citations


Journal ArticleDOI
TL;DR: In this article, a fully convolutional neural network is trained by using two noisy realizations of the same speech signal, one used as the input and the other as the target of the network.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this paper, an acoustic-based in-ear system for general human motion sensing is proposed, which uses the occlusion effect (i.e., the enhancement of low-frequency components of bone-conducted sounds in an occluded ear canal).
Abstract: Smart earbuds are recognized as a new wearable platform for personal-scale human motion sensing. However, due to the interference from head movement or background noise, commonly-used modalities (e.g. accelerometer and microphone) fail to reliably detect both intense and light motions. To obviate this, we propose OESense, an acoustic-based in-ear system for general human motion sensing. The core idea behind OESense is the joint use of the occlusion effect (i.e., the enhancement of low-frequency components of bone-conducted sounds in an occluded ear canal) and inward-facing microphone, which naturally boosts the sensing signal and suppresses external interference. We prototype OESense as an earbud and evaluate its performance on three representative applications, i.e., step counting, activity recognition, and hand-to-face gesture interaction. With data collected from 31 subjects, we show that OESense achieves 99.3% step counting recall, 98.3% recognition recall for 5 activities, and 97.0% recall for five tapping gestures on human face, respectively. We also demonstrate that OESense is compatible with earbuds' fundamental functionalities (e.g. music playback and phone calls). In terms of energy, OESense consumes 746 mW during data recording and recognition and it has a response latency of 40.85 ms for gesture recognition. Our analysis indicates such overhead is acceptable and OESense is potential to be integrated into future earbuds.

Journal ArticleDOI
TL;DR: It is proved that first-order differential beamformers with linear microphone arrays are not steerable and their mainlobes can only be at the endfire directions, and a method to design steerable beamformers with LDMAs using null constraints is developed.
Abstract: Differential microphone arrays (DMAs) can achieve high directivity and frequency-invariant spatial response with small apertures; they also have a great potential to be used in a wide spectrum of applications for high-fidelity sound acquisition. Although many efforts have been made to address the design of linear DMAs (LDMAs), most developed methods so far only work for the situation where the source of interest is incident from the endfire direction. This paper studies the steering problem of differential beamformers with linear microphone arrays. We present new insights into beam steering of LDMAs and propose a series of steerable differential beamformers. The major contributions of this paper are as follows. 1) A series of ideal functions are defined to describe the ideal, target beampatterns of LDMAs. 2) We prove that first-order differential beamformers with linear microphone arrays are not steerable and their mainlobes can only be at the endfire directions. 3) We deduce the fundamental conditions for designing steerable differential beamformers with LDMAs. 4) We develop a method to design steerable beamformers with LDMAs using null constraints. Simulations and experiments validate the properties of the developed method.
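The null-constraint design principle can be illustrated in its simplest setting: two microphones, a distortionless constraint at the endfire and one null. This generic sketch (spacing, frequency, and null angle are toy values) is not the paper's general LDMA method, only the constraint-solving idea behind it.

```python
import numpy as np

def first_order_dma(f, spacing, theta_null, c=343.0):
    """First-order differential beamformer for a 2-mic endfire array:
    unit gain at 0 rad, a null placed at theta_null (radians)."""
    def steer(theta):
        tau = spacing * np.cos(theta) / c        # inter-mic delay at angle theta
        return np.exp(-2j * np.pi * f * tau * np.arange(2))
    # stack the distortionless and null constraints and solve for the weights
    A = np.vstack((steer(0.0), steer(theta_null)))
    w = np.linalg.solve(A, np.array([1.0, 0.0]))
    return w, steer

w, steer = first_order_dma(1000.0, 0.01, np.pi)  # cardioid-like: null at 180 deg
```

Steering the null (theta_null) reshapes the beampattern; the paper's contribution is characterizing when, for longer linear arrays, the mainlobe itself can be steered away from endfire.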

Proceedings ArticleDOI
24 Jun 2021
TL;DR: Owlet as mentioned in this paper uses a carefully designed 3D-printed metamaterial structure that covers the microphone to embed a direction-specific signature in the recorded sounds and learns the directional signatures through a one-time in-lab calibration.
Abstract: This paper presents a low-power and miniaturized design for acoustic direction-of-arrival (DoA) estimation and source localization, called Owlet. The required aperture, power consumption, and hardware complexity of the traditional array-based spatial sensing techniques make them unsuitable for small and power-constrained IoT devices. Aiming to overcome these fundamental limitations, Owlet explores acoustic microstructures for extracting spatial information. It uses a carefully designed 3D-printed metamaterial structure that covers the microphone. The structure embeds a direction-specific signature in the recorded sounds. Owlet system learns the directional signatures through a one-time in-lab calibration. The system uses an additional microphone as a reference channel and develops techniques that eliminate environmental variation, making the design robust to noises and multipaths in arbitrary locations of operations. Owlet prototype shows 3.6° median error in DoA estimation and 10cm median error in source localization while using a 1.5cm × 1.3cm acoustic structure for sensing. The prototype consumes less than a hundredth of the energy required by a traditional microphone array to achieve similar DoA estimation accuracy. Owlet opens up possibilities of low-power sensing through 3D-printed passive structures.
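The calibrate-then-match idea can be caricatured in a few lines. The random vectors below are stand-ins for the direction-dependent spectral responses the metamaterial imprints; the 10-degree grid and correlation matcher are our assumptions, not Owlet's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
angles = np.arange(0, 360, 10)
# hypothetical one-time in-lab calibration: one 64-bin signature per direction
signatures = {a: rng.standard_normal(64) for a in angles}

def estimate_doa(measured):
    """Return the calibrated direction whose signature correlates best
    with the measured spectral feature."""
    corr = lambda a: np.dot(signatures[a], measured) / (
        np.linalg.norm(signatures[a]) * np.linalg.norm(measured) + 1e-12)
    return max(angles, key=corr)
```

In the real system the reference channel normalizes out source and environment spectra before matching, which is what makes the signatures location-independent.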

Journal ArticleDOI
TL;DR: In this study, several DNN models were developed with a training strategy specifically designed for an acoustic imaging task, yielding far better source localization and source strength estimation.

Journal ArticleDOI
TL;DR: In this paper, a wideband fiber-optic Fabry-Perot (F-P) acoustic sensing (FPAS) scheme was realized by utilizing a high-speed absolute cavity length demodulation with a 70-kHz maximum line rate spectrometer.
Abstract: In this paper, we realize a wideband fiber-optic Fabry-Perot (F-P) acoustic sensing (FPAS) scheme by utilizing high-speed absolute cavity length demodulation with a 70-kHz maximum line rate spectrometer. The wideband FPAS is made of a pre-stressed stainless-steel diaphragm based on an F-P interferometric structure. The real-time absolute F-P cavity lengths are calculated by a phase demodulation method, realized by processing the interference image at a 70-kHz frame rate. The acoustic signal is obtained by extracting the AC component of the demodulated cavity length. The experimental results show that the spectrometer can run at a 50-kHz line rate, and an acoustic detection bandwidth of 20 Hz to 20 kHz is obtained. The noise-limited minimum detectable sound pressure level is 18.8 dB, which is sensitive enough for human voice communication. The proposed wideband acoustic sensing scheme is robust and promising as a speech-sound microphone for communication during magnetic resonance imaging procedures.

Journal ArticleDOI
TL;DR: In this article, the authors describe the design methodology for the major wind-tunnel components; validation tests of the flow quality are performed with pitot-tube and hot-wire measurements, and those of the aeroacoustic performance with far-field microphone measurements.

Journal ArticleDOI
TL;DR: In this paper, a 3D printed gradient-index phononic crystal lens was used for focusing audio frequency range acoustic waves in air to enhance sound energy harvesting, which can find applications for wireless sensors and other low-power electronic components.
Abstract: We investigate the harvesting of sound waves by exploiting a 3D-printed gradient-index phononic crystal lens. The concept is demonstrated numerically and experimentally for focusing audio frequency range acoustic waves in air to enhance sound energy harvesting. A finite-element model is developed to design the unit cell dispersion properties and to construct the 3D lens for wave field simulations. Numerical simulations are presented to confirm the focusing of incident plane waves and to study the sensitivity of the refractive index profile to the direction of wave propagation. The theoretical predictions are validated experimentally using a scanning microphone setup under speaker excitation, and a very good agreement is observed between the experimental and numerical wave fields. A circular piezoelectric unimorph harvester is placed at the focal position of the lens, and its performance is characterized with a resistor sweep in the absence and presence of the lens, resulting in more than an order of magnitude enhancement in the harvested power with the lens. The 3D-printed lens presented here substantially enhances the intensity of sound energy via focusing, yielding micro-Watt level power output, which can find applications for wireless sensors and other low-power electronic components.

Journal ArticleDOI
TL;DR: This is the first systematic work on observability analysis of SLAM-based microphone array calibration and sound source localization using a Fisher information matrix approach; it presents necessary and sufficient conditions guaranteeing the full column rankness of the Jacobian, which leads to parameter identifiability.
Abstract: Sensor array-based systems, which adopt time difference of arrival (TDOA) measurements among the sensors, have found many robotic applications. However, for existing frameworks and systems to be useful, the sensor array needs to be calibrated accurately. Of particular interest in this article are microphone array-based robot audition systems. In our recent work, by using a moving sound source, and the graph-based formulation of simultaneous localization and mapping (SLAM), we have proposed a framework for joint sound source localization and calibration of microphone array geometrical information, together with the estimation of microphone time offset and clock difference/drift rates. However, a thorough study on the identifiability question, termed observability analysis here, in the SLAM framework for microphone array calibration and sound source localization, is still lacking in the literature. In this article, we fill this gap via a Fisher information matrix approach. Motivated by the equivalence between the full column rankness of the Fisher information matrix and the Jacobian matrix, we leverage the structure of the latter associated with the SLAM formulation, and present necessary and sufficient conditions guaranteeing its full column rankness, which lead to parameter identifiability. We have thoroughly discussed the 3-D case with asynchronous (with both time offset and clock drifts, or with only one of them) and synchronous microphone arrays, respectively. These conditions are closely related to the motion varieties of the sound source and the microphone array configuration, and have intuitive and physical interpretations. Based on the established conditions, we have also discovered some particular cases where observability is impossible. Connections with the calibration of other sensors, among other topics, are also discussed.
To the best of our knowledge, this is the first systematic work on observability analysis of SLAM-based microphone array calibration and sound source localization. The tools and concepts used in this article are also applicable to other TDOA sensing modalities such as ultrawide band (UWB) sensors.
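The rank condition at the heart of this analysis can be probed numerically. The toy model below is our illustration, not the paper's 3-D setup: one unknown 2-D microphone position, a known reference microphone, and TDOA measurements from a source at several positions. With source motion the Jacobian has full column rank (identifiable); with a static source it does not.

```python
import numpy as np

def numeric_jacobian(f, theta, eps=1e-6):
    """Finite-difference Jacobian of a measurement model f at theta."""
    f0 = f(theta)
    J = np.zeros((len(f0), len(theta)))
    for j in range(len(theta)):
        t = theta.copy()
        t[j] += eps
        J[:, j] = (f(t) - f0) / eps
    return J

c, m_ref = 343.0, np.array([0.0, 0.0])       # speed of sound, known reference mic

def tdoas(mic, sources):
    """TDOA of the unknown mic vs. the reference, per source position."""
    return np.array([(np.linalg.norm(s - mic) - np.linalg.norm(s - m_ref)) / c
                     for s in sources])

mic = np.array([0.3, 0.1])
moving = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
static = [np.array([1.0, 0.0])] * 3          # no source motion
J_moving = numeric_jacobian(lambda m: tdoas(m, moving), mic.copy())
J_static = numeric_jacobian(lambda m: tdoas(m, static), mic.copy())
```

Rank deficiency of J_static mirrors the paper's "motion variety" conditions: without sufficient source motion, the Fisher information matrix is singular and the calibration parameters are unobservable.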

Journal ArticleDOI
TL;DR: In this article, a combination of a convolutional neural network and a recurrent neural network (RNN) was used to detect snoring from audio recordings of 38 patients referred to a clinical center for a sleep study.

Journal ArticleDOI
TL;DR: An optimal algorithm for microphone reference selection that maximizes the output signal-to-noise ratio (SNR) is proposed, along with a lower-complexity algorithm that is still optimal for rank-1 beamformers but sub-optimal for general rank-r beamformers with r > 1.
Abstract: Multi-microphone speech enhancement methods typically require a reference position with respect to which the target signal is estimated. Often, this reference position is arbitrarily chosen as one of the microphones. However, it has been shown that the choice of the reference microphone can have a significant impact on the final noise reduction performance. In this paper, we therefore theoretically analyze the impact of selecting a reference on the noise reduction performance with near-end noise being taken into account. Following the generalized eigenvalue decomposition (GEVD) based optimal variable span filtering framework, we find that for any linear beamformer, the output signal-to-noise ratio (SNR) taking both the near-end and far-end noise into account is reference dependent. Only when the near-end noise is neglected, the output SNR of rank-1 beamformers does not depend on the reference position. However, in general for rank-r beamformers with r > 1 (e.g., the multichannel Wiener filter) the performance does depend on the reference position. Based on this analysis, we propose an optimal algorithm for microphone reference selection that maximizes the output SNR. In addition, we propose a lower-complexity algorithm that is still optimal for rank-1 beamformers, but sub-optimal for general rank-r beamformers with r > 1. Experiments using a simulated microphone array validate the effectiveness of both proposed methods and show that in terms of quality, several dB can be gained by selecting the proper reference microphone.
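A brute-force version of reference selection — build a beamformer per candidate reference and keep the best output SNR — can be sketched as below. This is our illustration, not the paper's algorithm: an MVDR-style rank-1 filter via the relative transfer function, toy covariances, and no near-end noise. In exactly this setting the paper's rank-1 observation holds: the output SNR comes out the same for every reference.

```python
import numpy as np

def output_snrs_per_reference(Rs, Rn):
    """For each candidate reference mic m, build an MVDR-style filter toward
    that reference and compute its output SNR (brute-force sketch)."""
    M = Rs.shape[0]
    snrs = []
    for m in range(M):
        d = Rs[:, m] / Rs[m, m]              # relative transfer function to mic m
        w = np.linalg.solve(Rn, d)
        w = w / (d.conj() @ w)               # distortionless at the reference
        snrs.append(np.real(w.conj() @ Rs @ w) / np.real(w.conj() @ Rn @ w))
    return np.array(snrs)

a = np.array([1.0, 0.8, 0.5])                # toy steering vector
Rs = np.outer(a, a)                          # rank-1 target covariance
Rn = np.diag([1.0, 2.0, 4.0])                # spatially uncorrelated noise
snrs = output_snrs_per_reference(Rs, Rn)
```

For rank-r filters with r > 1 (e.g., a multichannel Wiener filter), the per-reference SNRs would differ, and the argmax over this list is the selection rule the paper formalizes.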

Journal ArticleDOI
Zhuowei Xiang1, Weiyu Dai1, Weiying Rao1, Xun Cai1, Hongyan Fu1 
TL;DR: In this paper, a highly sensitive fiber-optic Fabry-Perot interferometer sensor based on a gold diaphragm for acoustic detection is proposed and experimentally demonstrated.
Abstract: In this paper, a highly sensitive fiber-optic Fabry-Perot interferometer sensor based on a gold diaphragm for acoustic detection is proposed and experimentally demonstrated. The extrinsic Fabry-Perot interferometer (EFPI) comprises a 140-nm-thick gold diaphragm and the end-face of a fiber-optic collimator, both of which are enclosed in a structure made of glass tubes. A fiber-optic collimator is adopted in acoustic sensing for the first time to reduce the light loss and improve the sensitivity. The experimental results show that the proposed sensor has a wide flat response range from 400 Hz to 12 kHz, almost covering the primary frequency components of audible sound. The pressure sensitivity and the minimum detectable acoustic pressure level (MDP) are measured to be −175.7 dB re 1 rad/μPa at 150 Hz and 95.3 μPa/Hz^(1/2) at 2 kHz, respectively. The proposed sensor has the advantages of high sensitivity, wide flat response, low cost, and simple fabrication, which shows its potential as a fiber-optic microphone with high sensitivity and high acoustic quality in practical applications.

Journal ArticleDOI
Lv Na1, Shao-jie Chen1, Chen Qiheng1, Wei Tao1, Hui Zhao1, Shanben Chen1 
TL;DR: In this article, microphone array technology was used to monitor the dynamic pulsed GMAW process; the splash sound signal was successfully separated using the FastICA blind signal separation algorithm, and its frequency-domain energy distribution is mainly concentrated in the 6000–8000 Hz high-frequency band.

Journal ArticleDOI
TL;DR: In this paper, fly Ormia ochracea-inspired MEMS directional microphones were designed identically in circular shape and operated using piezoelectric sensing in the 3-3 transducer mode.
Abstract: The majority of fly Ormia ochracea inspired sound source localization (SSL) works are limited to 1D, and therefore SSL in 2D can include a new vision for ambiguous acoustic applications. This article reports on an analytical and experimental work on SSL in 2D using a pair of fly O. ochracea inspired MEMS directional microphones. The reported directional microphones were designed identically in circular shape and operated using piezoelectric sensing in 3–3 transducer mode. In X–Y plane, they were canted in a 90° phase difference, i.e., one microphone was in X–axis and another one was in Y–axis. As a result, their directionality results from the X–axis (cosine) and Y–axis (sine) formulated the tangent dependent 2D SSL in the X–Y plane. The highest accuracy of the SSL in 2D was found to be ±2.92° at bending frequency (11.9 kHz) followed by a ±3.25° accuracy at rocking frequency (6.4 kHz), a ±4.68° accuracy at 1 kHz frequency, and a ±6.91° accuracy at 18 kHz frequency. The subjected frequencies were selected based on the measured inter-aural sensitivity difference (mISD) which showed a proportional impact on the cue of 2D SSL, i.e., the directionality. Besides, the basic acoustic functionalities, such as sensitivity, SNR, and self–noise were found to be 20.86 mV/Pa, 66.4 dB, and 27.6 dB SPL, respectively at 1 kHz frequency and 1 Pa sound pressure. Considering this trend of microphones, the outstanding contribution of this work is the SSL in 2D with higher accuracy using a pair of high performing bio–inspired directional microphones.

Journal ArticleDOI
TL;DR: In this article, a multiresolution deep learning approach is proposed to encode relevant information contained in unprocessed time-domain acoustic signals captured by microphone arrays for real-time sound source two-dimensional localization tasks.
Abstract: Sound source localization using multichannel signal processing has been a subject of active research for decades. In recent years, the use of deep learning in audio signal processing has significantly improved the performances for machine hearing. This has motivated the scientific community to also develop machine learning strategies for source localization applications. This paper presents BeamLearning, a multiresolution deep learning approach that allows the encoding of relevant information contained in unprocessed time-domain acoustic signals captured by microphone arrays. The use of raw data aims at avoiding the simplifying hypothesis that most traditional model-based localization methods rely on. Benefits of its use are shown for real-time sound source two-dimensional localization tasks in reverberating and noisy environments. Since supervised machine learning approaches require large-sized, physically realistic, precisely labelled datasets, a fast graphics processing unit-based computation of room impulse responses was developed using fractional delays for image source models. A thorough analysis of the network representation and extensive performance tests are carried out using the BeamLearning network with synthetic and experimental datasets. Obtained results demonstrate that the BeamLearning approach significantly outperforms the wideband MUSIC and steered response power-phase transform methods in terms of localization accuracy and computational efficiency in the presence of heavy measurement noise and reverberation.

Posted Content
TL;DR: The Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) as discussed by the authors is an ICASSP2022 Signal Processing Grand Challenge, which consists of two tracks, namely speaker diarization and multi-speaker ASR.
Abstract: Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted increasing attention. However, the lack of large public real meeting data has been a major obstacle to the advancement of the field. Therefore, we release the \emph{AliMeeting} corpus, which consists of 120 hours of real recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. Moreover, we will launch the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT), as an ICASSP2022 Signal Processing Grand Challenge. The challenge consists of two tracks, namely speaker diarization and multi-speaker ASR. In this paper we provide a detailed introduction of the dataset, rules, evaluation methods and baseline systems, aiming to further promote reproducible research in this field.

Journal ArticleDOI
TL;DR: In this paper, a feature-based approach is proposed to tackle the data association problem and achieve multisource localization in 3D in a distributed microphone array, where features are generated by using interchannel phase difference (IPD) information, which indicates the number of times each frequency bin across all time frames has been assigned to sources.
Abstract: Multisource localization using time difference of arrival (TDOA) is challenging because the correct combination of TDOA estimates across different microphone pairs, corresponding to the same source, is usually unknown, which is termed as the data association problem. Moreover, many existing multisource localization techniques are originally demonstrated in two dimensions, and their extensions to three dimensions (3D) are not straightforward and would lead to much higher computational complexity. In this paper, we propose an efficient, feature-based approach to tackle the data association problem and achieve multisource localization in 3D in a distributed microphone array. The features are generated by using interchannel phase difference (IPD) information, which indicates the number of times each frequency bin across all time frames has been assigned to sources. Based on such features, the data association problem is addressed by correlating most similar features across different microphone pairs, which is executed by solving a two-dimensional assignment problem successively. Thereafter, the locations of multiple sources can be obtained by imposing a single-source location estimator on the resulting TDOA combinations. The proposed approach is evaluated using both simulated data and real-world recordings.
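The IPD feature idea can be sketched with a simple histogram. This is a deliberately simplified stand-in for the paper's count-based feature (which tallies per-frequency-bin source assignments); the bin count and the toy constant-offset spectra are our assumptions.

```python
import numpy as np

def ipd_histogram(X1, X2, n_bins=36):
    """Histogram of interchannel phase differences over all TF bins --
    a simplified stand-in for a count-based IPD feature."""
    ipd = np.angle(X1 * np.conj(X2))         # phase difference per TF bin
    hist, _ = np.histogram(ipd, bins=n_bins, range=(-np.pi, np.pi))
    return hist

# toy check: a constant phase offset concentrates the mass in one bin
X1 = np.ones(500, dtype=complex)
X2 = np.exp(-0.4j) * np.ones(500)
h = ipd_histogram(X1, X2)
```

Because a source at a fixed position produces a consistent IPD pattern across frames, similar histograms across microphone pairs can be correlated to solve the data association problem before triangulation.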

Journal ArticleDOI
TL;DR: In this article, an optimal sound absorber with a compact scale consisting of multiple inhomogeneous Helmholtz resonators with extended necks (HRENs) was designed and demonstrated for effective sound absorption in a prescribed frequency range.

Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, a coherence-based selection algorithm is proposed to select the reference signals with high coherence to improve the quality of reference signals and the noise reduction performance of the active noise control system.
Abstract: Feedforward active noise control (ANC) is widely utilized to attenuate the broadband noise picked up by the reference microphone. However, in some situations, it is impractical to obtain a clean reference signal when the noise source is far away from the controller. Hence, we adopt wireless reference microphones to pick up the reference signals around the noise sources. Furthermore, a coherence-based selection algorithm is proposed to select the reference signals with high coherence. The proposed method improves the quality of the reference signals and the noise reduction performance of the ANC system. Numerical simulations and real-time experiments are conducted to validate the effectiveness of the proposed algorithm.
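Coherence-based selection can be sketched with a Welch-style magnitude-squared coherence estimate. The segment length, the 0.5 threshold, and the mean-over-bins decision rule below are our illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def msc(x, y, nperseg=256):
    """Magnitude-squared coherence via Welch-style segment averaging."""
    segs = len(x) // nperseg
    X = np.fft.rfft(x[:segs * nperseg].reshape(segs, nperseg), axis=1)
    Y = np.fft.rfft(y[:segs * nperseg].reshape(segs, nperseg), axis=1)
    Sxy = (X * np.conj(Y)).mean(axis=0)
    Sxx = (np.abs(X) ** 2).mean(axis=0)
    Syy = (np.abs(Y) ** 2).mean(axis=0)
    return np.abs(Sxy) ** 2 / (Sxx * Syy + 1e-12)

def select_references(candidates, primary, threshold=0.5):
    """Keep candidate reference channels whose mean coherence with the
    primary channel exceeds a threshold (illustrative decision rule)."""
    return [i for i, ch in enumerate(candidates)
            if msc(ch, primary).mean() > threshold]

rng = np.random.default_rng(0)
primary = rng.standard_normal(8192)
coherent = primary + 0.05 * rng.standard_normal(8192)   # good wireless reference
incoherent = rng.standard_normal(8192)                  # unrelated channel
```

Averaging over several segments is essential here: coherence estimated from a single segment is identically 1, so it carries no selection information.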

Proceedings ArticleDOI
06 May 2021
TL;DR: ProxiMic as discussed by the authors uses features of the pop noise observed when a user speaks and blows air onto the microphone to detect close-to-mic speech, achieving 94.1% activation recall and 12.3 False Accepts per Week per User (FAWU) with a 68 KB memory size.
Abstract: Wake-up-free techniques (e.g., Raise-to-Speak) are important for improving the voice input experience. We present ProxiMic, a close-to-mic (within 5 cm) speech sensing technique using only one microphone. With ProxiMic, a user keeps a microphone-embedded device close to the mouth and speaks directly to the device without wake-up phrases or button presses. To detect close-to-mic speech, we use the feature from pop noise observed when a user speaks and blows air onto the microphone. Sound input is first passed through a low-pass adaptive threshold filter, then analyzed by a CNN which detects subtle close-to-mic features (mainly pop noise). Our two-stage algorithm can achieve 94.1% activation recall, 12.3 False Accepts per Week per User (FAWU) with 68 KB memory size, which can run at 352 fps on the smartphone. The user study shows that ProxiMic is efficient, user-friendly, and practical.
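The first stage of such a detector can be sketched as a low-pass filter plus an adaptive energy gate. This is our caricature of the idea, not ProxiMic's exact filter or thresholds: a one-pole low-pass, a median-based adaptive threshold, and illustrative cutoff/frame/ratio values.

```python
import numpy as np

def pop_noise_gate(x, fs, cutoff=100.0, ratio=3.0, frame=160):
    """First-stage gate (sketch): low-pass the input, then flag frames whose
    low-frequency energy jumps well above the median frame energy -- pop
    noise from breath hitting a close mic is dominated by such bursts."""
    alpha = 1.0 / (1.0 + fs / (2 * np.pi * cutoff))   # one-pole smoothing factor
    y = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):                          # crude one-pole low-pass
        acc += alpha * (s - acc)
        y[i] = acc
    frames = y[: len(y) // frame * frame].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    return energy > ratio * (np.median(energy) + 1e-12)

fs = 16000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(fs)                     # 1 s of faint background
t = np.arange(800) / fs
x[8000:8800] += np.sin(2 * np.pi * 50 * t)             # a low-frequency 'pop'
mask = pop_noise_gate(x, fs)
```

Only frames passing this cheap gate would be handed to the CNN stage, which keeps the always-on power and compute budget small.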