
Showing papers on "Microphone published in 2022"


Journal ArticleDOI
TL;DR: In this article, the effects of different contaminations on the acoustic spectrum of wire and arc additive manufacturing (WAAM) were analyzed using time- and frequency-domain techniques, namely Power Spectral Density and Short-Time Fourier Transform.
Abstract: Additive Manufacturing (AM) processes allow the creation of complex parts with near-net shapes. Wire and arc additive manufacturing (WAAM) is an AM process that can produce large metallic components with low material waste and high production rates. Typically, WAAM enables over ten times the volumetric deposition rates of powder-based AM processes. However, the high deposition rates of WAAM require high heat input to melt the large volume of material, which in turn results in potential flaws such as pores, cracks, distortion, loss of mechanical properties and low dimensional accuracy. Hence, for practical implementation of the WAAM process in an industrial environment it is necessary to ensure flaw-free production. Accordingly, to guarantee the production-level scalability of WAAM it is fundamental to monitor and detect flaw formation during the process. The objective of this work is to characterize the effects of different contaminations on the acoustic spectrum of WAAM and lay the foundations for a microphone-based acoustic sensing approach for monitoring the quality of WAAM-fabricated parts. To realize this objective, WAAM parts were processed with deliberately introduced flaws, such as material contamination, and the acoustic signals were analyzed using time- and frequency-domain techniques, namely Power Spectral Density and Short-Time Fourier Transform. The signatures obtained were used to pinpoint the location of flaw formation. The results obtained in this study show that the effects of contamination in WAAM can be identified through the analysis of the acoustic spectrum of the process.
• Acoustic monitoring of the WAAM process is performed.
• Defect detection using the acoustic signal is validated with CT scans.
• Acoustic monitoring is an expedient and low-cost solution for defect detection during WAAM.
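The time-frequency analysis described above can be sketched with a short-time Fourier transform that flags when an anomalous spectral band appears. This is a minimal illustration on a synthetic signal, not the paper's pipeline: the 8 kHz sample rate, the 1.5 kHz "contamination" tone, and the band limits are all arbitrary assumptions.

```python
import numpy as np

def stft_mag(x, fs, frame=256, hop=128):
    """Magnitude STFT via a sliding Hann window (minimal sketch)."""
    w = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * w for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)), np.arange(n) * hop / fs

fs = 8000
rng = np.random.default_rng(1)
t = np.arange(fs) / fs                       # 1 s of synthetic "arc audio"
x = 0.1 * rng.standard_normal(fs)            # broadband process noise
x[3000:3400] += np.sin(2 * np.pi * 1500 * t[3000:3400])  # contamination burst

mag, times = stft_mag(x, fs)
band = mag[:, 40:61].mean(axis=1)            # energy near 1.25-1.9 kHz
flaw_time = times[int(np.argmax(band))]      # time at which the anomaly peaks
```

Locating the frame with the strongest band energy is one simple way to "pinpoint the location of flaw formation" along the deposition timeline.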

31 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: A system consisting of deep learning and signal processing to simultaneously suppress echoes, noise, and reverberation is proposed and a novel speech dense-prediction backbone is designed.
Abstract: Speech quality is often degraded by acoustic echoes, background noise, and reverberation. In this paper, we propose a system consisting of deep learning and signal processing to simultaneously suppress echoes, noise, and reverberation. For the deep learning, we design a novel speech dense-prediction backbone. For the signal processing, a linear acoustic echo canceller is used as conditional information for deep learning. To improve the performance of the speech dense-prediction backbone, strategies such as a microphone and reference phase encoder, multi-scale time-frequency processing, and streaming axial attention are designed. The proposed system ranked first in both AEC and DNS Challenge (non-personal track) of ICASSP 2022. In addition, this backbone has also been extended to the multi-channel speech enhancement task, and placed second in ICASSP 2022 L3DAS22 Challenge1.
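The "linear acoustic echo canceller" used as conditional input above is classically an adaptive filter. Below is a generic normalized-LMS (NLMS) echo canceller sketch, not the paper's exact AEC; the tap count, step size, and simulated echo path are illustrative assumptions.

```python
import numpy as np

def nlms_aec(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Normalized-LMS linear echo canceller: adaptively estimates the
    echo path from the far-end reference and subtracts it from the mic."""
    w = np.zeros(taps)       # adaptive echo-path estimate
    buf = np.zeros(taps)     # delay line of recent reference samples
    out = np.empty(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        e = mic[n] - w @ buf                  # residual after echo removal
        w += mu * e * buf / (buf @ buf + eps) # normalized gradient step
        out[n] = e
    return out

rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)                    # far-end (loudspeaker) signal
echo = np.convolve(ref, [0.5, -0.3, 0.2])[:4000]   # simulated 3-tap echo path
residual = nlms_aec(echo, ref)                     # converges toward zero
```

In a hybrid system like the one described, the residual (or the linear AEC output) would then be handed to the deep network for residual echo, noise, and reverberation suppression.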

26 citations


Journal ArticleDOI
01 May 2022-Matter
TL;DR: In this article, an eardrum-inspired acoustic textile was developed that can function as a sensitive audible microphone: it converts even very slight (on the order of 10⁻⁷ atm) pressure waves at audible frequencies into mechanical vibrations with nanometer amplitudes, which in turn induce electrical signals through the fibers, allowing it to detect the direction of sound pulses, broadcast human speech, and capture heartbeats.

23 citations


Journal ArticleDOI
TL;DR: In this paper, a cross-correlation-based direction of arrival (DOA) estimation technique using the time difference of arrival at different microphone pairs, with noise angular spectrum subtraction, is proposed.
Abstract: This paper presents a sound source localization method using an irregular microphone array embedded in a drone. Sound source localization is an integral function of drone audition systems which enables various applications of drones such as search and rescue missions. However, the audio recordings using the on-board microphones obscure the sound emitted by a source on the ground due to drone-generated motor and propeller noise, thus leading to an extremely low signal-to-drone noise ratio (SdNR). In this paper, we propose a cross-correlation-based direction of arrival (DOA) estimation technique using the time difference of arrival (TDOA) at different microphone pairs, with noise angular spectrum subtraction. Through the measured current-specific drone noise spectrum, noise suppression has been achieved from the multi-channel recordings. Experimental results show that the proposed method is capable of estimating the position in three-dimensional space of simultaneously active multiple sound sources on the ground at low SdNR conditions (−30 dB), and of localizing two sound sources at a certain azimuth angular separation with low prediction error, comparable to multiple signal classification (MUSIC)-based algorithms and the generalized cross-correlation with phase transformation (GCC-PHAT) method. Due to its simplicity, applicability to any array geometry, and better robustness against drone noise, the proposed method increases the feasibility of localization under extreme SdNR levels.
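The GCC-PHAT baseline mentioned above estimates the TDOA between a microphone pair from the phase-whitened cross-spectrum. A minimal sketch on synthetic data follows; the 16 kHz sample rate, 20-sample delay, and noise level are arbitrary assumptions, and no drone-noise subtraction is modeled here.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau):
    """Estimate the TDOA of sig relative to ref with GCC-PHAT."""
    n = len(sig) + len(ref)
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12           # phase transform weighting
    cc = np.fft.irfft(cross, n=n)
    shift = int(fs * max_tau)                # restrict to physical delays
    cc = np.concatenate((cc[-shift:], cc[:shift + 1]))
    return (int(np.argmax(np.abs(cc))) - shift) / fs

fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs)                          # broadband source
mic1 = src + 0.01 * rng.standard_normal(fs)            # reference channel
mic2 = np.concatenate((np.zeros(20), src[:-20]))       # 20-sample delayed copy
mic2 += 0.01 * rng.standard_normal(fs)
tau = gcc_phat(mic2, mic1, fs, max_tau=0.005)          # expected ~ 20 / fs
```

Repeating this over several microphone pairs yields the TDOA set from which a DOA or 3-D position can be triangulated.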

17 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of various non-destructive testing (NDT) techniques for wire and arc additive manufacturing (WAAM) and fusion welding is given in this paper, where the minimum defect size that can be identified via NDT methods has been obtained from previous academic research or from tests carried out by companies.
Abstract: In Wire and Arc Additive Manufacturing (WAAM) and fusion welding, various defects such as porosity, cracks, deformation and lack of fusion can occur during the fabrication process. These have a strong impact on the mechanical properties and can also lead to failure of the manufactured parts during service. These defects can be recognized using non-destructive testing (NDT) methods so that the examined workpiece is not harmed. This paper provides a comprehensive overview of various NDT techniques for WAAM and fusion welding, including laser-ultrasonic, acoustic emission with an airborne optical microphone, optical emission spectroscopy, laser-induced breakdown spectroscopy, laser opto-ultrasonic dual detection, thermography and also in-process defect detection via weld current monitoring with an oscilloscope. In addition, the novel research conducted, its operating principle and the equipment required to perform these techniques are presented. The minimum defect size that can be identified via NDT methods has been obtained from previous academic research or from tests carried out by companies. The use of these techniques in WAAM and fusion welding applications makes it possible to detect defects and to take a step towards the production of high-quality final components.

17 citations


Journal ArticleDOI
TL;DR: A system called LASense is proposed that can significantly increase the sensing range for fine-grained human activities using a single pair of speaker and microphone, via a virtual transceiver idea that purely leverages delicate signal processing techniques in software.
Abstract: Acoustic signals have been widely adopted in sensing fine-grained human activities, including respiration monitoring, finger tracking, eye blink detection, etc. One major challenge for acoustic sensing is the extremely limited sensing range, which becomes even more severe when sensing fine-grained activities. Different from the prior efforts that adopt multiple microphones and/or advanced deep learning techniques for long sensing range, we propose a system called LASense, which can significantly increase the sensing range for fine-grained human activities using a single pair of speaker and microphone. To achieve this, LASense introduces a virtual transceiver idea that purely leverages delicate signal processing techniques in software. To demonstrate the effectiveness of LASense, we apply the proposed approach to three fine-grained human activities, i.e., respiration, finger tapping and eye blink. For respiration monitoring, we significantly increase the sensing range from the state-of-the-art 2 m to 6 m. For finer-grained finger tapping and eye blink detection, we increase the state-of-the-art sensing range by 150% and 80%, respectively.

16 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: The AliMeeting corpus, as discussed by the authors, contains 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones.
Abstract: Recent development of speech signal processing, such as speech recognition, speaker diarization, etc., has inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for the deployment of speech technologies. Speaker diarization and multi-speaker automatic speech recognition in meeting scenarios have attracted much attention recently. However, the lack of large public meeting data has been a major obstacle to the advancement of the field. Therefore, we make available the AliMeeting corpus, which consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session is composed of 2-4 speakers with different speaker overlap ratios, recorded in meeting rooms of different sizes. Along with the dataset, we launch the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) with two tracks, namely speaker diarization and multi-speaker ASR, aiming to provide a common testbed for meeting rich transcription and promote reproducible research in this field. In this paper, we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems.

13 citations


Journal ArticleDOI
01 Jun 2022-Sensors
TL;DR: An electronic stethoscope was developed, consisting of a traditional stethoscope with a condenser microphone embedded in the head to collect cardiopulmonary sounds, and an AI-based classifier for cardiopulmonary sounds was proposed; the microphone placed in the stethoscope head surrounded by cork was found to provide better noise reduction.
Abstract: With conventional stethoscopes, auscultation results may vary from one doctor to another due to a decline in hearing ability with age or differences in professional training, and problematic cardiopulmonary sounds cannot be recorded for analysis. In this paper, to resolve the above-mentioned issues, an electronic stethoscope was developed, consisting of a traditional stethoscope with a condenser microphone embedded in the head to collect cardiopulmonary sounds, and an AI-based classifier for cardiopulmonary sounds was proposed. Different placements of the microphone in the stethoscope head with amplification and filter circuits were explored and analyzed using the fast Fourier transform (FFT) to evaluate the effects of noise reduction. After testing, the microphone placed in the stethoscope head surrounded by cork was found to provide better noise reduction. For classifying normal (healthy) and abnormal (pathological) cardiopulmonary sounds, each sample of cardiopulmonary sound is first segmented into several small frames, and a principal component analysis (PCA) is performed on each frame. The difference signal is obtained by subtracting the PCA reconstruction from the original signal. MFCCs (Mel-frequency cepstral coefficients) and statistics are used for feature extraction based on the difference signal, and ensemble learning is used as the classifier. The final results are determined by voting on the classification results of each frame. Two distinct classifiers are proposed, one for heart sounds and one for lung sounds. The best voting threshold for heart sounds falls at 5–45% and the best voting threshold for lung sounds falls at 5–65%. The best accuracy of 86.9%, sensitivity of 81.9%, specificity of 91.8%, and F1 score of 86.1% are obtained for heart sounds using 2 s frame segmentation with a 20% overlap, whereas the best accuracy of 73.3%, sensitivity of 66.7%, specificity of 80%, and F1 score of 71.5% are yielded for lung sounds using 5 s frame segmentation with a 50% overlap.
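The "difference signal" step above (subtracting a low-rank PCA reconstruction from the framed signal) can be sketched as follows. This is a generic interpretation under the assumption that frames are stacked as rows and the top-k principal components are removed; the frame length is arbitrary.

```python
import numpy as np

def pca_difference(x, frame=256, k=1):
    """Frame the signal, remove its rank-k PCA approximation, and return
    the residual ("difference") frames used for feature extraction."""
    n = len(x) // frame
    X = x[:n * frame].reshape(n, frame)        # one frame per row
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k] + mean  # rank-k reconstruction
    return X - approx

# A perfectly periodic signal has identical frames, so the residual vanishes:
beat = np.tile(np.sin(2 * np.pi * np.arange(256) / 256), 10)
res = pca_difference(beat)
```

Intuitively, the principal components capture the repetitive cardiopulmonary cycle, so the residual emphasizes frame-to-frame deviations, which MFCC and statistical features then describe.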

13 citations


Journal ArticleDOI
TL;DR: In this paper, an end-to-end computational model for predicting speech perception with cochlear implants (CI), the most widely used neuroprosthesis, is presented; it integrates signal processing, a finite element model of the electrically stimulated cochlea, and an auditory nerve model to predict neural responses to speech stimuli.
Abstract: Advances in computational models of biological systems and artificial neural networks enable rapid virtual prototyping of neuroprostheses, accelerating innovation in the field. Here, we present an end-to-end computational model for predicting speech perception with cochlear implants (CI), the most widely used neuroprosthesis. The model integrates CI signal processing, a finite element model of the electrically stimulated cochlea, and an auditory nerve model to predict neural responses to speech stimuli. An automatic speech recognition neural network is then used to extract phoneme-level speech perception from these neural response patterns. Compared to human CI listener data, the model predicts similar patterns of speech perception and misperception, captures between-phoneme differences in perceptibility, and replicates effects of stimulation parameters and noise on speech recognition. Information transmission analysis at different stages along the CI processing chain indicates that the bottleneck of information flow occurs at the electrode-neural interface, corroborating studies in CI listeners. An end-to-end model of CI speech perception replicated phoneme-level CI speech perception patterns, and was used to quantify information degradation through the CI processing chain. This type of model shows great promise for developing and optimizing new and existing neuroprostheses.

12 citations


Journal ArticleDOI
TL;DR: In this article, a dual-branched spherical convolutional autoencoder is proposed to obtain high-resolution localization results from conventional spherical beamforming maps, incorporating frequency-variant and distortion-invariant strategies to address the inherent challenges.
Abstract: While sound source localization (SSL) using a spherical microphone array system can be applied to obtain visual beam patterns of source distribution maps in a range of omnidirectional acoustic applications, the present challenges of the spherical measurement system on the valid frequency ranges and the spatial distortion as well as the grid-related limitations of data-driven SSL approaches raise the need to develop an appropriate method. Imbued by these challenges, this study proposes a deep learning (DL) approach to achieve the high-resolution performance of localizing multiple sound sources tailored for omnidirectional acoustic applications. First, we present a spherical target map representation that can panoramically pinpoint the position and strength information of multiple sound sources without any grid-related constraints. Then, a dual-branched spherical convolutional autoencoder is proposed to obtain high-resolution localization results from the conventional spherical beamforming maps while incorporating frequency-variant and distortion-invariant strategies to address the inherent challenges. We quantitatively and qualitatively assess our proposed method’s localization capability for multiple sound sources and validate that the proposed method can achieve far more precise and computationally efficient results than the existing approaches. By extension, we newly present the experimental setup that can create omnidirectional acoustic scenarios for the multiple SSL. By evaluating our proposed method in this experimental setup, we demonstrate the effectiveness and applicability of the proposed method with the experimental data. Our study delivers the proposed approach’s potential of being utilized in various SSL applications.

12 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe a microphone system featuring a new MEMS transducer based on a sealed-dual membrane (SDM) design paired with the latest generation of digital read-out ASIC.
Abstract: This work describes a microphone system featuring a new MEMS transducer based on a sealed-dual membrane (SDM) design paired with the latest generation of digital read-out ASIC. State-of-the-art noise performance is achieved thanks to significant optimizations both on the MEMS and on the ASIC side. The SDM design significantly reduces the magnitude of one of the main noise contributors by moving the air gaps to a sealed low-pressure chamber. The ASIC features an unconventional read-out amplifier based on a power-scalable current-feedback architecture as well as a reconfigurable ΔΣ modulator that allows signal-to-noise ratio (SNR) to be traded off against power consumption. The microphone system achieves an SNR of 72 dB(A) while supporting an acoustical overload point (AOP) of 130 dB SPL. This represents a significant improvement over current state-of-the-art digital microphones.
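The SNR and AOP figures quoted above combine into a dynamic range via standard digital-microphone spec arithmetic, assuming the usual 94 dB SPL (1 Pa at 1 kHz) reference level for SNR; the paper does not state this derivation explicitly.

```python
# Digital MEMS microphone spec arithmetic (94 dB SPL reference assumed):
snr_dba = 72.0      # reported SNR, dB(A)
aop_db_spl = 130.0  # reported acoustic overload point, dB SPL

noise_floor = 94.0 - snr_dba             # equivalent input noise: 22 dB(A) SPL
dynamic_range = aop_db_spl - noise_floor # usable span: 108 dB
```

Under that convention, the reported figures imply roughly a 108 dB dynamic range between the self-noise floor and the overload point.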

Journal ArticleDOI
TL;DR: In this article, two fast iteration algorithms, the alternating direction method of multipliers (ADMM) and the accelerated proximal gradient line search method (APGL), based on truncated nuclear norm regularization (TNNR), are proposed to complete the cross-spectral matrix for non-synchronous measurement beamforming.

Journal ArticleDOI
01 May 2022-Sensors
TL;DR: In this paper, an up-to-date review of the acoustic noise caused by multilayer ceramic capacitors in electronic devices is presented, covering measurement methodologies, solutions, and simulation methods.
Abstract: Multilayer Ceramic Capacitors (MLCC) play a major role in modern electronic devices due to their low price and small size, large range of capacitance, small ESL and ESR, and good frequency response. Unfortunately, the main dielectric material used for MLCCs, Barium Titanate, makes the capacitors vibrate due to the piezoelectric and electrostrictive effects. This vibration is transferred to the PCB, making it resonate in the audible range of 20 Hz–20 kHz, and in this way the singing capacitor phenomenon occurs. This phenomenon is usually measured with a microphone, to measure the sound pressure level, or with a Laser Doppler Vibrometer (LDV), to measure the vibration. Besides these, other methods are mentioned in the literature, for example, the optical fiber and the active excitation method. There are several solutions to attenuate or even eliminate the acoustic noise caused by MLCCs. Specially designed capacitors for low acoustic levels and different layout geometries are only two options found in the literature. To prevent the singing capacitor phenomenon, different simulations can be performed, harmonic analysis being the most popular technique. This paper is an up-to-date review of the acoustic noise caused by MLCCs in electronic devices, covering measurement methodologies, solutions, and simulation methods.
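The microphone measurement mentioned above reports a sound pressure level. As a reminder of the underlying arithmetic, SPL is the RMS pressure expressed in dB relative to 20 µPa; the 0.02 Pa example value below is illustrative, not a measured capacitor level.

```python
import math

P_REF = 20e-6  # reference pressure in Pa (20 uPa, threshold of hearing)

def spl_db(p_rms):
    """Sound pressure level of an RMS pressure, in dB re 20 uPa --
    the quantity a microphone measurement of a singing capacitor yields."""
    return 20.0 * math.log10(p_rms / P_REF)

level = spl_db(0.02)   # a 0.02 Pa RMS tone corresponds to 60 dB SPL
```

A-weighting would normally be applied on top of this when judging audibility in the 20 Hz–20 kHz band.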

Journal ArticleDOI
TL;DR: In this paper , an experimental campaign was carried out in the seaport of Genoa to identify key points in the application of an acoustic camera to the characterization of port noise.

Journal ArticleDOI
TL;DR: A comprehensive literature survey of MEMS-based piezoelectric microphones, along with the fabrication processes involved, application domains, and methodologies used for experimentation, is presented in this article.
Abstract: This paper presents a comprehensive literature survey of MEMS-based piezoelectric microphones, along with the fabrication processes involved, application domains, and methodologies used for experimentation. Advantages and limitations of existing microphones are presented, together with the impact of process parameters during thin-film growth. This review identifies the issues faced by microphone technologies, spanning from the invention of microphones to the most recent state-of-the-art solutions implemented to overcome or address them. A detailed comparison of performance in terms of sensitivity and dynamic range is presented, which can be used to select the piezoelectric material and process for developing sensors based on bandwidth requirements. Electrical and mechanical properties of different piezoelectric materials such as AlN, ZnO, quartz, PZT, PVDF, and other polymers that have great potential to be used as sensing membranes in the development and deployment of these microphones are presented, along with the complications faced during fabrication. Insights on the future of these sensors and emerging application domains are also discussed.

Journal ArticleDOI
16 Mar 2022-Fluids
TL;DR: In this article, a detailed investigation of the statistical properties of the near-field pressure fluctuations induced by an under-expanded jet, with varying nozzle exit shapes, is presented.
Abstract: A detailed investigation of the statistical properties of the near-field pressure fluctuations induced by an under-expanded jet, with varying nozzle exit shapes, is presented. Experiments using different convergent chevron nozzles were carried out in the anechoic chamber at the University of Bristol to assess the importance of the chevron shape on the near pressure field emitted by a single-stream under-expanded jet. Measurements were carried out with an axial microphone array traversed radially to various positions for a jet in an under-expanded condition at Mach number M = 1.3. The intermittent behavior is investigated considering standard statistical indicators and a wavelet-based conditional approach, including the phase angle. The degree of intermittency of various features related to different scales, such as screech tones and broadband shock-associated noise, was estimated. A series of recently developed wavelet-based tools were assessed as a viable approach to investigate the noise emitted by under-expanded jets.

Book ChapterDOI
02 Feb 2022
TL;DR: In this article, a reinforcement learning agent equipped with a novel transformer memory learns motion policies to control its camera and microphone to recover the dynamic target audio, using self-attention to make high-quality estimates for the current timestep while simultaneously improving its past estimates.
Abstract: We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an object of interest. The agent hears a mixed stream of multiple audio sources (e.g., multiple people conversing and a band playing music at a noisy party). Given a limited time budget, it needs to extract the target sound accurately at every step using egocentric audio-visual observations. We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone to recover the dynamic target audio, using self-attention to make high-quality estimates for current timesteps and also simultaneously improve its past estimates. Using highly realistic acoustic SoundSpaces [13] simulations in real-world scanned Matterport3D [11] environments, we show that our model is able to learn efficient behavior to carry out continuous separation of a dynamic audio target. Project: https://vision.cs.utexas.edu/projects/active-av-dynamic-separation/ .

Journal ArticleDOI
TL;DR: In this article, the correlation of the turning process parameters of a CNC machine with tool vibration, acoustic signal, and energy consumption was studied using preliminary data analysis and machine learning methods.

Proceedings ArticleDOI
23 May 2022
TL;DR: In this article, a convolutional recurrent network that performs complex spectral mapping was employed to combine the strengths of AC and BC microphones, using attention-based fusion with early-fusion and late-fusion strategies.
Abstract: Bone-conduction (BC) microphones capture speech signals by converting the vibrations of the human skull into electrical signals. BC sensors are insensitive to acoustic noise, but limited in bandwidth. On the other hand, conventional or air-conduction (AC) microphones are capable of capturing full-band speech, but are susceptible to background noise. We propose to combine the strengths of AC and BC microphones by employing a convolutional recurrent network that performs complex spectral mapping. To better utilize signals from both kinds of microphone, we employ attention-based fusion with early-fusion and late-fusion strategies. Experiments demonstrate the superiority of the proposed method over other recent speech enhancement methods combining BC and AC signals. In addition, our enhancement performance is significantly better than conventional speech enhancement counterparts, especially in low signal-to-noise ratio scenarios.

Journal ArticleDOI
TL;DR: In this article, a structural transfer path analysis was conducted to derive the primary vibro-acoustic paths of structural road noise, and the optimum positions of the reference sensors were determined through analysis of path contributions and the vibro-acoustic transfer function.

Journal ArticleDOI
TL;DR: The motive of the proposed concept is to address these limitations by connecting the sensors to an Internet of Things network and cloud platform for remote recording and monitoring, utilizing the Blynk IoT application and a cloud server for analytics.
Abstract: Human activity monitoring systems play a major role in surveillance applications. Activity can be analyzed through cameras, sensors, and microphones. The traditional approach requires human intervention to validate the human movement recorded by a surveillance camera and microphone. Therefore, sensor-based approaches were developed to raise an alert signal through a buzzer or light, based on a threshold value applied to the sensor output. However, such sensor-based techniques still require human attention in the monitoring room. The motive of the proposed concept is to address these limitations by connecting the sensors to an Internet of Things (IoT) network and cloud platform for remote recording and monitoring. The proposed work utilizes the Blynk IoT application and cloud server for the analytics.

Journal ArticleDOI
TL;DR: In this article, an advanced duct microphone array analysis based on a user-friendly iterative Bayesian Inverse Approach (iBIA) was successfully applied to assess the modal content and the Sound Power Level of fan/outlet guide vane (OGV) broadband noise.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new bionic acoustic sensor based on the fish ear structure, fabricated using microelectromechanical systems (MEMS) technology and encapsulated in castor oil, which has an acoustic impedance close to that of the human body.
Abstract: High-performance medical acoustic sensors are essential in medical equipment and diagnosis. Commercially available medical acoustic sensors are of capacitive and piezoelectric types. When they are used to detect heart sound signals, there is attenuation and distortion due to sound transmission between different media. This paper proposes a new bionic acoustic sensor based on the fish ear structure. Through theoretical analysis and finite element simulation, the optimal parameters of the sensitive structure are determined. The sensor is fabricated using microelectromechanical systems (MEMS) technology, and is encapsulated in castor oil, which has an acoustic impedance close to that of the human body. An electroacoustic test platform was built to test the performance of the sensor. The results showed that the MEMS bionic sensor operated with a bandwidth of 20 Hz–2 kHz. Its linearity and frequency response were better than those of an electret microphone. In addition, the sensor was tested in a heart sound collection application to verify its effectiveness. The proposed sensor can be effectively used in clinical auscultation and has a high SNR.

Journal ArticleDOI
TL;DR: In this article, two linear microphone arrays were used to locate a sound source in an indoor environment, with the generalized cross-correlation algorithm used to calculate the TDOA between the signals received by the arrays.
Abstract: Sound signals have been widely applied in various fields. One of the popular applications is sound localization, where the location and direction of a sound source are determined by analyzing the sound signal. In this study, two linear microphone arrays were used to locate a sound source in an indoor environment. The generalized cross-correlation algorithm is used to calculate the TDOA, handling the delay between the sound signals received by the two microphone arrays. The proposed microphone array system with this algorithm can successfully estimate the sound source's location. The test was performed in a standardized chamber. The experiment used two microphone arrays, each with two microphones. The experimental results prove that the proposed method can detect the sound source and obtain good performance, with a position error of about 2.0–2.3 cm and an angle error of about 0.74 degrees. Therefore, the experimental results demonstrate the feasibility of the system.
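Once each two-microphone pair yields a TDOA, the bearing from each array follows from far-field geometry, and intersecting the two bearing rays gives the source position. The sketch below is a generic 2-D version of this idea, not the paper's exact setup; array positions, spacing, and the test source location are invented for illustration.

```python
import math

C = 343.0  # speed of sound, m/s (room-temperature assumption)

def bearing(tau, d):
    """Far-field DOA from broadside for a 2-mic pair with spacing d."""
    return math.asin(max(-1.0, min(1.0, C * tau / d)))

def triangulate(p1, th1, p2, th2):
    """Intersect two bearing rays (angles from the +y axis, positive
    toward +x) launched from array centers p1 and p2."""
    s1, c1 = math.sin(th1), math.cos(th1)
    s2, c2 = math.sin(th2), math.cos(th2)
    det = s1 * c2 - c1 * s2                  # 0 when rays are parallel
    t = ((p2[0] - p1[0]) * c2 - (p2[1] - p1[1]) * s2) / det
    return (p1[0] + t * s1, p1[1] + t * c1)

# TDOA -> bearing for a 10 cm pair and a source 30 degrees off broadside:
th = bearing(0.1 * math.sin(math.pi / 6) / C, 0.1)

# Two arrays at (0,0) and (1,0) observing a source at (0.4, 1.2):
src = (0.4, 1.2)
th1 = math.atan2(src[0] - 0.0, src[1])
th2 = math.atan2(src[0] - 1.0, src[1])
est = triangulate((0.0, 0.0), th1, (1.0, 0.0), th2)
```

With noise-free bearings the intersection recovers the source exactly; in practice, TDOA quantization and reverberation produce the centimeter-level errors reported above.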

Journal ArticleDOI
TL;DR: In this article, the authors compared three supervised machine-learning methods based on acoustic and vibration data for bearing fault detection, achieving fault classification accuracy above 96% in a rotary machine with vibration and acoustic signals.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the feedback controller design for active noise control headphones under the condition that the frequency responses of the primary and secondary paths corresponding to the feedback microphone do not match those corresponding to the human ear.
Abstract: This paper presents an investigation of the feedback controller design for active noise control headphones under the condition that the frequency responses of the primary and secondary paths corresponding to the feedback microphone do not match those corresponding to the human ear. The influence of such mismatches on the performance is analyzed first, and then an optimization method is proposed to enhance the comprehensive performance at the human ear. In the proposed method, the feedback loop is constructed directly with the feedback microphone, and the extra filters of virtual sensing techniques are avoided. Cascade biquad filters are used as the controller, in accordance with current applications. A differential evolution algorithm was used to solve the proposed optimization problem, and the optimal parameters of the controller were found. The experimental results show that, at the dummy head ear position, good noise reduction performance can be obtained in the low-frequency band with limited noise enhancement at high frequencies, even if large frequency response mismatches exist.
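The cascade-biquad controller mentioned above is built from second-order IIR sections. A minimal direct-form-II-transposed biquad sketch follows; the coefficient values are arbitrary stable examples, not the optimized controller parameters from the paper.

```python
def biquad(xs, b0, b1, b2, a1, a2):
    """One direct-form-II-transposed biquad section; cascading several
    such sections forms the kind of controller described above."""
    z1 = z2 = 0.0
    ys = []
    for x in xs:
        y = b0 * x + z1
        z1 = b1 * x - a1 * y + z2
        z2 = b2 * x - a2 * y
        ys.append(y)
    return ys

# Step response of an example stable section; the steady-state value
# equals the DC gain (b0 + b1 + b2) / (1 + a1 + a2):
y = biquad([1.0] * 500, 0.2, 0.4, 0.2, -0.2, 0.1)
```

A differential evolution search, as in the paper, would tune the (b, a) coefficients of each cascaded section against the measured-path responses.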

Journal ArticleDOI
TL;DR: In this paper, a hybrid microphone array signal processing approach was proposed for the near-field scenario that combines the beamforming technique and DNNs to identify both the sound source location and content; it is well suited to a sound field containing a dominant, stronger sound source and masked, weaker sound sources.
Abstract: Synchronous localization, separation, and reconstruction of multiple sound sources are often necessary in various situations, such as conference rooms, living rooms, and supermarkets. To improve the intelligibility of speech signals, deep neural networks (DNNs) have achieved considerable success in time-domain signal separation and reconstruction. In this paper, we propose a hybrid microphone array signal processing approach for the near-field scenario that combines the beamforming technique and DNNs. Using this method, the challenge of identifying both the sound source location and content can be overcome. Moreover, a sequenced virtual sound field reconstruction process makes the proposed approach well suited to a sound field containing a dominant, stronger sound source and masked, weaker sound sources. Using this strategy, all traceable main sound sources in a given sound field can be discovered by loops. The operational duration and accuracy of localization are further improved by substituting the broadband weighted multiple signal classification (BW-MUSIC) method for the conventional delay-and-sum (DAS) beamforming algorithm. The effectiveness of the proposed method for localizing and reconstructing speech signals was validated by simulations and experiments with promising results: the localization results were accurate, while the similarity and correlation between the reconstructed and original signals were high.
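As a reference point for the DAS baseline that the paper replaces with BW-MUSIC, a narrowband delay-and-sum power scan over candidate directions can be sketched as follows. The array geometry, frequency, and angle grid are illustrative assumptions, and this far-field plane-wave formulation is a simplification of the paper's near-field model:

```python
import numpy as np

def das_spectrum(X, mic_pos, f, c=343.0, angles=np.linspace(-90, 90, 181)):
    """Narrowband delay-and-sum power scan for a linear array.
    X: (num_mics,) complex snapshot at frequency f (e.g. one FFT bin);
    mic_pos: microphone positions (m) along the array axis."""
    k = 2 * np.pi * f / c
    power = []
    for th in np.deg2rad(angles):
        steer = np.exp(1j * k * mic_pos * np.sin(th))  # plane-wave steering vector
        power.append(np.abs(steer.conj() @ X) ** 2)
    return angles, np.array(power)
```

The estimated direction is the angle maximizing the output power; MUSIC-family methods instead exploit the noise subspace of the spatial covariance matrix to sharpen this spectrum.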

Proceedings ArticleDOI
18 Jul 2022
TL;DR: In this paper, a Machine Learning (ML) model is devised that classifies audio signals to distinguish bruxism-related sounds. The classifier uses a Convolutional Neural Network (CNN), and the overall ML model is designed to be deployed on an embedded device featuring a microcontroller and a microphone.
Abstract: Nowadays, more and more people suffer from sleep bruxism, which involves repetitive jaw-muscle activity characterised by teeth clenching or grinding during the night. Although it is not a life-threatening condition, timely diagnosis is fundamental to prevent tooth wear and preserve quality of life. However, the condition is usually detected by dentists only when its effects are overt, and its remote and early diagnosis is challenging and almost unfeasible. Nevertheless, sleep bruxism entails sounds related to teeth grinding or teeth chattering, which can be exploited to detect occurrences of sleep bruxism, providing a first-stage screening that favours quick diagnosis. To this end, this paper proposes an innovative methodology for the remote assessment and diagnosis of sleep bruxism. Specifically, a Machine Learning (ML) model is devised that classifies audio signals to distinguish bruxism-related sounds. The classifier is implemented with a Convolutional Neural Network (CNN), and the overall ML model is designed to be deployed on an embedded device featuring a microcontroller and a microphone. In so doing, a tiny, portable, and ready-to-use embedded ML-enabled device is set up that can detect bruxism phenomena directly in patients' homes.
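To give a feel for how small a microcontroller-class audio classifier can be, the sketch below runs a toy 1-D CNN forward pass (one convolutional layer, global average pooling, dense output) in plain numpy with random weights. The layer sizes and feature shape (64 frames of 40 spectral bands) are assumptions for illustration, not the paper's architecture; on a real device the trained weights would be quantized and executed by an embedded inference runtime:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d(x, w, b):
    """Valid-mode 1-D convolution with ReLU: x (T, Cin), w (K, Cin, Cout)."""
    K, Cin, Cout = w.shape
    T = x.shape[0] - K + 1
    y = np.zeros((T, Cout))
    for t in range(T):
        y[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(y, 0.0)

def tiny_cnn(frames, params):
    """frames: (T, n_bands) spectrogram patch; returns 2 class logits
    (bruxism-related sound vs. other night-time sound)."""
    h = conv1d(frames, params["w1"], params["b1"])
    h = h.mean(axis=0)                        # global average pooling over time
    return h @ params["w2"] + params["b2"]    # dense output layer

params = {
    "w1": rng.normal(0, 0.1, (5, 40, 8)),
    "b1": np.zeros(8),
    "w2": rng.normal(0, 0.1, (8, 2)),
    "b2": np.zeros(2),
}
logits = tiny_cnn(rng.normal(0, 1, (64, 40)), params)
```

With roughly 1,600 conv weights and 16 dense weights, a network of this scale fits comfortably in typical microcontroller flash and RAM budgets.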

Journal ArticleDOI
TL;DR: In this paper, a parametric signal-dependent method for encoding microphone array signals into Ambisonic signals was presented and evaluated in the context of a simulated seven-sensor microphone array mounted on an AR headset device.
Abstract: This article proposes a parametric signal-dependent method for the task of encoding microphone array signals into Ambisonic signals. The proposed method is presented and evaluated in the context of encoding a simulated seven-sensor microphone array, which is mounted on an augmented reality headset device. Given the inherent flexibility of the Ambisonics format, and its popularity within the context of such devices, this array configuration represents a potential future use case for Ambisonic recording. However, due to its irregular geometry and non-uniform sensor placement, conventional signal-independent Ambisonic encoding is particularly limited. The primary aims of the proposed method are to obtain Ambisonic signals over a wider frequency bandwidth, and at a higher spatial resolution, than would otherwise be possible through conventional signal-independent encoding. The proposed method is based on a multi-source sound-field model and employs spatial filtering to divide the captured sound-field into its individual source and directional ambient components, which are subsequently encoded into the Ambisonics format at an arbitrary order. It is demonstrated through both objective and perceptual evaluations that the proposed parametric method outperforms conventional signal-independent encoding in the majority of cases.
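Once a source component and its direction of arrival have been separated, encoding it into Ambisonics reduces to applying spherical-harmonic gains for that direction. The first-order sketch below (ACN channel order, SN3D normalisation) illustrates this final encoding step only; the paper's method estimates the parameters adaptively and encodes at arbitrary order:

```python
import numpy as np

def foa_encode(signal, azi, ele):
    """Encode a mono source into first-order Ambisonics (ACN/SN3D).
    azi, ele in radians; returns a (4, N) array of W, Y, Z, X channels."""
    gains = np.array([
        1.0,                          # W (ACN 0): omnidirectional
        np.sin(azi) * np.cos(ele),    # Y (ACN 1)
        np.sin(ele),                  # Z (ACN 2)
        np.cos(azi) * np.cos(ele),    # X (ACN 3)
    ])
    return gains[:, None] * np.asarray(signal)[None, :]
```

Separated source components would each be encoded this way toward their estimated directions, while the residual ambience is distributed diffusely, and the channel sets summed into the final Ambisonic mix.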

Journal ArticleDOI
TL;DR: In this article, the laser Doppler vibrometer (LDV) was used as an optical laser microphone for human-robot interaction in extremely noisy service environments: the robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound.
Abstract: Domestic robots are often required to understand spoken commands in noisy environments, including service appliances' operating sounds. Most conventional domestic robots use electret condenser microphones (ECMs) to record the sound. However, the ECMs are known to be sensitive to the noise in the direction of sound arrival. The laser Doppler vibrometer (LDV), which has been widely used in the research field of measurement, has the potential to work as a new speech-input device to solve this problem. The aim of this paper is to investigate the effectiveness of using the LDV as an optical laser microphone for human-robot interaction in extremely noisy service environments. Our robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound. We conducted three experiments to assess the performance of speech recognition using the optical laser microphone in various settings and showed stable performance in extremely noisy conditions compared with a conventional ECM.