Indoor location identification technologies for real-time IoT-based applications: An inclusive survey

Insects have a close relationship with humanity, in both positive and negative ways. Mosquito-borne diseases kill millions of people, and insect pests consume and destroy around US$40 billion worth of food each year. In contrast, insects pollinate at least two-thirds of all the food consumed in the world. In order to control populations of disease vectors and agricultural pests, researchers in entomology have developed numerous methods, including chemical, biological and mechanical approaches. However, without knowledge of the exact location of the insects, the use of these techniques becomes costly and inefficient. We are developing a novel sensor as a tool to control disease vectors and agricultural pests. This sensor, which is built from inexpensive commodity electronics, captures insect flight information using laser light and classifies the insects according to their species. The use of machine learning techniques allows the sensor to identify the species automatically, without human intervention. Finally, the sensor can provide real-time estimates of insect species with virtually no time gap between the insect identification and the delivery of population estimates. In this paper, we present our solution to the most important challenge in making this sensor practical: the creation of an accurate classification system. We show that, with the correct combination of feature extraction and machine learning techniques, we can achieve an accuracy of almost 90% in the task of identifying the correct insect species among nine species. Specifically, we show that we can achieve an accuracy of 95% in the task of correctly recognizing whether a given event was generated by a disease vector mosquito.

Exploring Low Cost Laser Sensors to Identify Flying Insect Species

An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including Modified discrete cosine transform (MDCT), Mel-frequency cepstrum coefficients (MFCC), Principal Component Analysis (PCA), Fast Fourier Transform (FFT), Gaussian mixture models (GMM), likelihood estimation, logarithmic moduled complex lapped transform (LMCLT), support vector machine (SVM), constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA) and discrete cosine transform (DCT).

https://rua.ua.es/dspace/bitstream/10045/72592/1/2018_Pires_etal_Sensors.pdf

Recognition of Activities of Daily Living Based on Environmental Analyses Using Audio Fingerprinting Techniques: A Systematic Review.

Speech data collected in uncontrolled environments must be processed to build a robust automatic speech recognition system. In this paper, a method is proposed to process the degraded speech signal. Initially, the significance of spectral subtraction with voice activity detection (SS-VAD) and of magnitude squared spectrum (MSS) estimators is studied for different types of noise. In the SS-VAD method, the degraded speech data is sampled and windowed into frames with 50% overlap. The VAD is used to detect the voiced regions of the speech signal. The minimum mean square error-short time power spectrum, minimum mean square error-spectrum power based on zero crossing (MMSE-SPZC), and maximum a posteriori estimators are studied individually. These MSS estimators are implemented on the assumption that the magnitude squared spectrum of the degraded speech signal is the sum of the magnitude squared spectra of the clean (original) speech signal and the noise model. The experimental results show that the MMSE-SPZC estimator performs better than the other two estimators. This estimator is then combined with the SS-VAD method to improve performance further. The combined SS-VAD and MMSE-SPZC method yields better speech quality than the individual methods by reducing noise in the degraded speech signal.

Speech enhancement by combining spectral subtraction and minimum mean square error-spectrum power estimator based on zero crossing
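
The SS-VAD idea described in the abstract above (50% overlapping frames, a VAD to locate noise-only regions, subtraction of an estimated noise spectrum) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the energy-threshold VAD, the `vad_ratio` parameter, the Hann window, the frame size and all function names are assumptions made for the example, and a naive DFT keeps it free of external dependencies. The signal is assumed to be at least one frame long.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(signal, size=32, vad_ratio=2.0):
    """Magnitude spectral subtraction with a toy energy-based VAD."""
    hop = size // 2  # 50% overlap between consecutive frames
    # Periodic Hann window: copies shifted by size/2 sum to 1 (overlap-add).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / size) for t in range(size)]
    starts = range(0, len(signal) - size + 1, hop)
    spectra = [dft([w * s for w, s in zip(window, signal[i:i + size])])
               for i in starts]

    # Toy VAD (assumption): frames whose energy is within vad_ratio times
    # the quietest frame's energy are treated as noise-only.
    energies = [sum(abs(c) ** 2 for c in spec) for spec in spectra]
    threshold = vad_ratio * min(energies)
    noise = [spec for spec, e in zip(spectra, energies) if e <= threshold]
    noise_mag = [sum(abs(spec[k]) for spec in noise) / len(noise)
                 for k in range(size)]

    out = [0.0] * len(signal)
    for i, spec in zip(starts, spectra):
        # Subtract the noise magnitude, floor at zero, keep the noisy phase.
        cleaned = [cmath.rect(max(abs(c) - noise_mag[k], 0.0), cmath.phase(c))
                   for k, c in enumerate(spec)]
        for t, v in enumerate(idft(cleaned)):
            out[i + t] += v  # overlap-add resynthesis
    return out
```

Flooring the subtracted magnitude at zero is the simplest choice and is what tends to produce musical noise; practical systems usually add smoothing or a spectral floor on top of this.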

An efficient perceptual hashing algorithm for speech authentication is proposed in this paper, combining linear prediction-minimum mean squared error (LP-MMSE) analysis with an improved spectral entropy. Linear prediction analysis is conducted on the speech signal after preprocessing, framing and windowing to obtain the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated using the improved spectral entropy method. The final binary perceptual hash sequence is generated from these two matrices, and speech authentication is completed. Comparing the experimental results of combining the Teager energy operator (TEO) with the linear predictive coefficients (LPC), LP-MMSE and line spectrum pair (LSP) coefficients respectively, the proposed algorithm achieves a good compromise between robustness, discrimination and authentication efficiency, and can meet the requirement of real-time speech authentication in speech communication. Experimental results show that the proposed algorithm is better than other existing methods in compactness.

An efficient perceptual hashing based on improved spectral entropy for speech authentication
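
For orientation, the per-frame spectral entropy that the abstract above builds on can be sketched as below. This shows only the plain Shannon entropy of a frame's normalised power spectrum; the paper's "improved" variant, the LP-MMSE step and the hash construction are not described in the abstract and are not reproduced here. The function names and the use of a naive DFT are assumptions made for the example.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one frame for bins 0..N/2, via a naive DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalised power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0  # a silent frame carries no spectral information
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A pure tone concentrates its power in one bin and gives an entropy near zero, while a flat spectrum (for example a single impulse) gives the maximum value for the number of bins.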

An audio fingerprint is a compact, unique, content-based digital signature of an audio signal. It can be used to identify unknown audio clips. Generally, fingerprinting consists of two parts: extracting fingerprints from audio signals, and matching them against those stored in a fingerprint database that has been set up beforehand. With the rapid growth in the quantity of audio files, the probability of collision between different audio signals becomes relatively high, and it has become very challenging to retrieve an audio recording in real time from the ever-growing database. In this letter, we introduce a reliable audio fingerprinting system, which extracts audio fingerprints from an audio signal based on its spectral energy structure. Preliminary experimental results suggest that this fingerprinting system can work well in the application of broadcast monitoring. (4 pages)

An audio fingerprinting system based on spectral energy structure

Audio is always affected by outside noise during communication. Conventional spectral subtraction (CSS) is widely used because of its low computational complexity, good real-time performance and ease of implementation. Its fatal flaw, however, is that the de-noised signals contain a great deal of "musical noise". This paper aims to reduce musical noise as much as possible. Voice Activity Detection (VAD) is used to detect the start and end of the audio, so that silent segments can be used to estimate the noise spectrum accurately. Furthermore, a spectral decay factor is introduced to estimate the noise effectively. Finally, additional de-noising modules, such as smoothing, threshold calculation and musical noise removal, are added to make the system work stably. Segmental SNR is used to evaluate the de-noising effect. Experimental results indicate its flexibility.

An improved spectral subtraction method

Audio fingerprinting, like human fingerprinting, successfully identifies audio clips from large databases, even when the audio signals are slightly or seriously distorted. In this paper, we propose an improved version, based on 2-D chroma, of the audio fingerprint extraction algorithm proposed by Shazam. The algorithm uses a combinatorial hashed time-frequency analysis of the audio, yielding unusual properties in which multiple tracks mixed together may each be identified. The experimental results verify the improvement in retrieval speed and accuracy.

Robust audio fingerprint extraction algorithm based on 2-D chroma

In audio fingerprinting, an audio clip must be recognized by matching an extracted fingerprint to a database of previously computed fingerprints. The fingerprints should reduce the dimensionality of the input significantly, provide discrimination among different audio clips, and, at the same time, be invariant to distorted versions of the same audio clip. In this paper, we design fingerprints addressing the above issues by extracting the audio fingerprints from the Spectral Flux of the clipped signal. Spectral Flux (SF) is a measure of how quickly the power spectrum of a signal is changing, calculated by comparing the power spectrum of one frame against the power spectrum of the previous frame. More precisely, it is usually calculated as the 2-norm (also known as the Euclidean distance) between the two normalised spectra. Using the SF as the feature of our algorithm, we retrieve audio clips from a database that stores previously computed fingerprints. We test the robustness of the fingerprints under a large number of distortions. The experimental results show that the proposed algorithm performs well in audio retrieval.

Audio fingerprint based on Spectral Flux for audio retrieval
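
The Spectral Flux definition given in the abstract above (the 2-norm between the normalised power spectra of consecutive frames) can be written out as a short sketch. The function names, the naive DFT, and the choice to normalise each spectrum to unit sum are assumptions made for the example; the abstract does not say which normalisation the authors use, nor how the fingerprint is derived from the flux values.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one frame for bins 0..N/2, via a naive DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def normalise(spectrum):
    """Scale a spectrum to unit sum (assumed normalisation)."""
    total = sum(spectrum)
    return [s / total for s in spectrum] if total > 0 else list(spectrum)

def spectral_flux(frames):
    """Return one flux value per pair of consecutive frames."""
    spectra = [normalise(power_spectrum(f)) for f in frames]
    return [math.sqrt(sum((cur - prev) ** 2 for cur, prev in zip(b, a)))
            for a, b in zip(spectra, spectra[1:])]
```

Two identical frames give a flux of zero, and the value grows as spectral energy moves between bins from one frame to the next, which is why the measure responds to onsets and other rapid spectral changes.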

With the rapid expansion of modern multimedia data, a number of audio fingerprinting algorithms have been proposed. An audio fingerprint is a compact, unique, content-based digital signature of an audio signal, which can be used to identify unknown audio clips. Because of interference from different kinds of noise, audio fingerprinting is still a challenging task. In this paper, Nearest Neighbor Estimation (NNE) is used to reduce the interference of noise. First, audio feature points are extracted from audio clips. NNE is then used to reduce the impact of noise on the feature points. Experimental results show that NNE effectively reduces the influence of noise in a white noise environment.

Noise reduction based on Nearest Neighbor Estimation for audio feature extraction