Journal ArticleDOI

Recognition of Audio Depression Based on Convolutional Neural Network and Generative Antagonism Network Model

29 May 2020-IEEE Access (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 8, pp 101181-101191
TL;DR: An audio depression recognition method based on a convolutional neural network (CNN) and a generative adversarial network (GAN) model that effectively reduces depression recognition error compared with existing methods; the RMSE and MAE values obtained on the two datasets improve on the comparison algorithms by more than 5%.
Abstract: This paper proposes an audio depression recognition method based on a convolutional neural network (CNN) and a generative adversarial network (GAN) model. First, the dataset is preprocessed: long silent segments are removed, and the remaining audio is spliced into a new file. Speech features such as Mel-frequency cepstral coefficients (MFCCs), short-term energy, and spectral entropy are then extracted with an audio difference normalization algorithm. The extracted feature matrices, which capture attributes unique to each subject's voice, form the basis for model training. A depression recognition model, DR AudioNet, is then built by combining the CNN and GAN; it optimizes the preceding model and completes recognition and classification using the normalized features of the two segments adjacent to the current audio segment. Experimental results on the AViD-Corpus and DAIC-WOZ datasets show that the proposed method effectively reduces depression recognition error compared with existing methods, and the RMSE and MAE values obtained on the two datasets improve on the comparison algorithms by more than 5%.
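The feature-extraction stage described above can be sketched with NumPy. The framing parameters and the reading of "difference normalization" as a z-scored first-order difference are illustrative assumptions, not the paper's exact algorithm; MFCC extraction (typically done with a library such as librosa) is omitted.

```python
import numpy as np

def frame_signal(y, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])

def short_term_energy(frames):
    # sum of squared samples per frame
    return np.sum(frames ** 2, axis=1)

def spectral_entropy(frames, eps=1e-12):
    # Shannon entropy of each frame's normalized power spectrum
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p = spec / (spec.sum(axis=1, keepdims=True) + eps)
    return -np.sum(p * np.log2(p + eps), axis=1)

def diff_normalize(feats, eps=1e-12):
    # assumed reading of "difference normalization": z-score the
    # first-order frame-to-frame difference of each feature track
    d = np.diff(feats, axis=0)
    return (d - d.mean(axis=0)) / (d.std(axis=0) + eps)

rng = np.random.default_rng(0)
y = rng.standard_normal(16000)        # 1 s of noise at 16 kHz stands in for speech
frames = frame_signal(y)              # (98, 400)
ste = short_term_energy(frames)
ent = spectral_entropy(frames)
feats = np.stack([ste, ent], axis=1)  # per-frame feature matrix
norm = diff_normalize(feats)          # (97, 2)
print(frames.shape, norm.shape)
```

The resulting per-frame matrix is the kind of "matrix vector feature data" the abstract describes feeding into model training.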


Citations
Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the use of Convolutional Neural Network based model for identifying whether a person is suffering from dysarthria is proposed, which makes use of several speech features viz. zero crossing rates, MFCCs, spectral centroids, spectral roll off for analysis of the speech signals.
Abstract: Patients suffering from dysarthria have trouble controlling the muscles involved in speaking, leading to spoken speech that is indiscernible. A number of studies have addressed speech impairments; however, additional research is required on speakers who share the same impairment but differ in its severity. Knowing the type of impairment and the level of severity helps in assessing the progression of dysarthria and in planning therapy. This paper proposes a Convolutional Neural Network (CNN)-based model for identifying whether a person is suffering from dysarthria; early diagnosis is a step towards better management of the impairment. The proposed model uses several speech features, viz. zero crossing rate, MFCCs, spectral centroid, and spectral roll-off, to analyze the speech signals. The TORGO speech database is used for training and testing the proposed model. The CNN shows promising results for early diagnosis of dysarthric speech, with an accuracy score of 93.87%.
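The non-MFCC features this abstract lists can be computed directly with NumPy, as sketched below; the sample rate, frame sizes, and 85% roll-off threshold are common illustrative defaults, not values taken from the paper.

```python
import numpy as np

def zero_crossing_rate(frames):
    # fraction of adjacent sample pairs whose sign changes, per frame
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return np.mean(np.abs(np.diff(signs, axis=1)) / 2, axis=1)

def spectral_centroid(frames, sr=16000):
    # magnitude-weighted mean frequency of each frame's spectrum
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)

def spectral_rolloff(frames, sr=16000, pct=0.85):
    # frequency below which pct of the spectral magnitude is contained
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    cum = np.cumsum(spec, axis=1)
    idx = np.argmax(cum >= pct * cum[:, -1:], axis=1)
    return freqs[idx]

rng = np.random.default_rng(1)
y = rng.standard_normal(16000)        # stand-in for a speech recording
frame_len, hop = 400, 160
n = 1 + (len(y) - frame_len) // hop
frames = np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])
zcr = zero_crossing_rate(frames)
cent = spectral_centroid(frames)
roll = spectral_rolloff(frames)
print(zcr.shape, cent.shape, roll.shape)
```

In practice a library such as librosa provides equivalent feature extractors; this sketch only shows what the quantities are.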

4 citations

Journal ArticleDOI
TL;DR: Wang et al. used an audio depression regression model (DR AudioNet), based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, to identify depression in patients.
Abstract: In recent years, depression has not only caused patients psychological suffering such as self-blame but has also carried a high disability and mortality rate. Early detection, diagnosis, and timely treatment of patients at different levels of severity can improve the cure rate. Quite a few potential depression patients are unaware of their illness; some even suspect they are sick but are unwilling to go to the hospital. In response, this research designed an intelligent depression recognition human-computer interaction system. The main contributions are (1) the use of an audio depression regression model (DR AudioNet), based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, to identify depression in patients. It uses a multiscale audio differential normalization (MADN) feature extraction algorithm; the MADN feature describes non-personalized speech characteristics, and two network models are designed based on the MADN features of two adjacent segments of audio. Comparative experiments show that the method is effective in identifying depression. (2) Based on the conclusions of the previous step, a human-computer interaction system is designed: after users input their own voice, the final recognition result is output by the network model used in this research. The visual operation is convenient for users and has practical application value.
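The idea of describing each audio segment by the features of its two adjacent segments can be sketched as below. The edge-padding choice (reusing the boundary segment as its own missing neighbor) and the feature layout are illustrative assumptions, not the paper's exact MADN construction.

```python
import numpy as np

def adjacent_segment_features(seg_feats):
    """For each segment i, concatenate the (already normalized) features of
    segments i-1 and i+1; edge segments reuse themselves as the missing
    neighbor."""
    padded = np.concatenate([seg_feats[:1], seg_feats, seg_feats[-1:]], axis=0)
    prev_, next_ = padded[:-2], padded[2:]
    return np.concatenate([prev_, next_], axis=1)

# 5 audio segments, 3 features each
seg_feats = np.arange(15, dtype=float).reshape(5, 3)
ctx = adjacent_segment_features(seg_feats)
print(ctx.shape)   # (5, 6): neighbor features for every segment
```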

3 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose a model to detect mixed-mood episodes, which are characterized by combinations of various bipolar disorder symptoms occurring in a random, unpredictable, and uncertain manner.
Abstract: In the present state of health and wellness, mental illness is often deemed less important than physical illness. In reality, mental illness has serious, multi-dimensional adverse effects on the subject's personal life, social life, and financial stability. Bipolar disorder is one of the most prominent mental illnesses and can be triggered by any external stimulation of the subject. The diagnosis and treatment of bipolar disorder differ greatly from those of other illnesses, and the first impediment is correct diagnosis itself. Standard classifications distinguish discrete forms of bipolar disorder, viz. type-I, type-II, cyclothymic, etc., each characterized by specific moods associated with depression and mania. However, no prior study addresses the detection of mixed-mood episodes, in which various bipolar symptoms combine in a random, unpredictable, and uncertain manner. The proposed model therefore contributes granular information on the dynamics of mood transitions. Simulation in MATLAB shows that the resulting model can detect mixed-mood episodes precisely.

2 citations

Journal ArticleDOI
TL;DR: In this article, the Improved Wasserstein Skip-Connection GAN (IWGAN) is proposed to detect anomalies and hazards in the airport environment using autoencoders and GANs.
Abstract: Anomaly detection is an important research topic in the field of artificial intelligence and visual scene understanding. The most significant challenge in real-world anomaly detection problems is the high imbalance of available data (i.e., non-anomalous versus anomalous data). This limits the use of supervised learning methods. Furthermore, the abnormal—and even normal—datasets in the airport field are relatively insufficient, causing them to be difficult to use to train deep neural networks when conducting experiments. Because generative adversarial networks (GANs) are able to effectively learn the latent vector space of all images, the present study adopted a GAN variant with autoencoders to create a hybrid model for detecting anomalies and hazards in the airport environment. The proposed method, which integrates the Wasserstein-GAN (WGAN) and Skip-GANomaly models to distinguish between normal and abnormal images, is called the Improved Wasserstein Skip-Connection GAN (IWGAN). In the experimental stage, we evaluated different hyper-parameters—including the activation function, learning rate, decay rate, training times of discriminator, and method of label smoothing—to identify the optimal combination. The proposed model’s performance was compared with that of existing models, such as U-Net, GAN, WGAN, GANomaly, and Skip-GANomaly. Our experimental results indicate that the proposed model yields exceptional performance.
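At inference time, GANomaly-family detectors typically score a sample by combining a reconstruction error with a latent-feature error. The sketch below shows that scoring idea only; the weight and the L1/L2 norm choices are illustrative assumptions, not the exact IWGAN formulation.

```python
import numpy as np

def anomaly_score(x, x_hat, z, z_hat, w=0.9):
    """Weighted sum of image-space reconstruction error and latent-feature
    error; higher scores indicate likelier anomalies."""
    recon = np.mean(np.abs(x - x_hat))      # L1 reconstruction error
    latent = np.mean((z - z_hat) ** 2)      # L2 latent error
    return w * recon + (1 - w) * latent

rng = np.random.default_rng(2)
x = rng.random((64, 64))    # stand-in for an input image
z = rng.random(128)         # stand-in for its latent code
normal = anomaly_score(x, x, z, z)                          # perfect reconstruction
abnormal = anomaly_score(x, x + 0.3 * rng.random((64, 64)), z, z)
print(normal, abnormal)
```

A threshold on this score then separates normal from abnormal images.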

1 citations

Journal ArticleDOI
TL;DR: In this paper, a hybrid model is proposed for depression detection using deep learning algorithms; it combines textual and audio features of patients' responses, and the results show that an audio CNN is a good model for depression classification.

References
Journal ArticleDOI
TL;DR: The experimental results show that the designed networks achieve excellent performance on the task of recognizing speech emotion, especially the 2D CNN LSTM network outperforms the traditional approaches, Deep Belief Network (DBN) and CNN on the selected databases.

599 citations


"Recognition of Audio Depression Bas..." refers methods in this paper

  • ...For the audio recognition problem, scholars have proposed many methods, in [11] they constructed a one-dimensional long-short term memory (LSTM) and a two-dimensional LSTM to extract local and global emotion related features in speech, which can improve the accuracy of original model by combining the two features....

    [...]

Journal ArticleDOI
TL;DR: This paper proposes to bridge the emotional gap by using a hybrid deep model, which first produces audio–visual segment features with Convolutional Neural Networks and 3D-CNN, then fuses audio– visual segment features in a Deep Belief Networks (DBNs).
Abstract: Emotion recognition is challenging due to the emotional gap between emotions and audio–visual features. Motivated by the powerful feature learning ability of deep neural networks, this paper proposes to bridge the emotional gap by using a hybrid deep model, which first produces audio–visual segment features with Convolutional Neural Networks (CNNs) and 3D-CNN, then fuses audio–visual segment features in a Deep Belief Networks (DBNs). The proposed method is trained in two stages. First, CNN and 3D-CNN models pre-trained on corresponding large-scale image and video classification tasks are fine-tuned on emotion recognition tasks to learn audio and visual segment features, respectively. Second, the outputs of CNN and 3D-CNN models are combined into a fusion network built with a DBN model. The fusion network is trained to jointly learn a discriminative audio–visual segment feature representation. After average-pooling segment features learned by DBN to form a fixed-length global video feature, a linear Support Vector Machine is used for video emotion classification. Experimental results on three public audio–visual emotional databases, including the acted RML database, the acted eNTERFACE05 database, and the spontaneous BAUM-1s database, demonstrate the promising performance of the proposed method. To the best of our knowledge, this is an early work fusing audio and visual cues with CNN, 3D-CNN, and DBN for audio–visual emotion recognition.
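The pooling step this pipeline ends with, averaging a variable number of learned segment features into one fixed-length video feature for the linear SVM, reduces to a single mean over the segment axis; the 256-dimension size below is an illustrative assumption.

```python
import numpy as np

def global_video_feature(segment_feats):
    # average-pool a variable number of segment features into one vector
    return segment_feats.mean(axis=0)

rng = np.random.default_rng(3)
short_clip = rng.random((7, 256))    # 7 segments, 256-dim fused features
long_clip = rng.random((31, 256))    # 31 segments
a = global_video_feature(short_clip)
b = global_video_feature(long_clip)
print(a.shape, b.shape)              # both (256,): ready for a linear SVM
```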

249 citations


"Recognition of Audio Depression Bas..." refers background in this paper

  • ...Clinical observations and studies have found that there is a significant correlation between the audio characteristics and the depression degrees [4], [5]....

    [...]

Proceedings ArticleDOI
Xingchen Ma, Hongyu Yang, Qiang Chen, Di Huang, Yunhong Wang
16 Oct 2016
TL;DR: A deep model is proposed, namely DepAudioNet, to encode the depression related characteristics in the vocal channel, combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation.
Abstract: This paper presents a novel and effective audio-based method for depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in the literature. For the former, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression-related characteristics in the vocal channel, combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results clearly demonstrate the effectiveness of the proposed approach.
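One simple reading of the random sampling strategy, subsampling the majority class down to the minority count so each training epoch sees balanced classes, can be sketched as follows; the exact sampling scheme in DepAudioNet may differ.

```python
import numpy as np

def balanced_sample(X, y, rng):
    """Randomly subsample the majority class to the minority-class count;
    redrawing each epoch exposes different majority subsets."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n = min(len(pos), len(neg))
    idx = np.concatenate([rng.choice(pos, n, replace=False),
                          rng.choice(neg, n, replace=False)])
    rng.shuffle(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(4)
X = np.arange(50, dtype=float).reshape(50, 1)  # toy features
y = np.array([1] * 10 + [0] * 40)              # 1:4 class imbalance
Xb, yb = balanced_sample(X, y, rng)
print(len(yb), int((yb == 1).sum()), int((yb == 0).sum()))
```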

183 citations


"Recognition of Audio Depression Bas..." refers background in this paper

  • ...[27] proposed a binary classification network structure for identifying depression in the 2016 AVEC competition, which is mainly composed of CNN and LSTM....

    [...]

Journal ArticleDOI
TL;DR: The authors discuss several core challenges in embedded and mobile deep learning, as well as recent solutions demonstrating the feasibility of building IoT applications that are powered by effective, efficient, and reliable deep learning models.
Abstract: How can the advantages of deep learning be brought to the emerging world of embedded IoT devices? The authors discuss several core challenges in embedded and mobile deep learning, as well as recent solutions demonstrating the feasibility of building IoT applications that are powered by effective, efficient, and reliable deep learning models.

106 citations


"Recognition of Audio Depression Bas..." refers background or methods in this paper

  • ...The number of filters M is between 20 and 40, and we set M = 40 according to [8]....

    [...]

  • ...Currently, the Beck Depression Inventory II (BDI-II) is the most widely used self-assessment scale for depressive symptoms and is the tool used to assess the degree of depression [8]....

    [...]

Journal ArticleDOI
TL;DR: This study investigates the relationship between rough voice and the presence of subharmonics, which correspond to smaller yet distinct peaks located between two consecutive harmonic peaks in the power spectrum.

62 citations


"Recognition of Audio Depression Bas..." refers methods in this paper

  • ...Based on the measurement method of signal harmonics in the work of Omori et al. [26], we can use the concept of entropy to describe this hypothesis....

    [...]


  • ...[26] K. Omori, H. Kojima, R. Kakani, D. H. Slavit, and S. M. Blaugrund, ‘‘Acoustic characteristics of rough voice: Subharmonics,’’ J. Voice, vol. 11, no. 1, pp. 40–47, Mar. 1997....

    [...]