Journal Article

Detection of asphyxia in infants using deep learning convolutional neural network (CNN) trained on Mel frequency cepstrum coefficient (MFCC) features extracted from cry sounds

TL;DR: This paper shows that Mel Frequency Cepstrum Coefficient (MFCC) features generated from the audio signal of an infant cry can be used as input features for a Convolutional Neural Network (CNN).
Abstract: Deep Learning Neural Network (DLNN) is a new branch of machine learning with a greater capacity for complex feature representation than traditional 4th-generation neural networks. Although it is mainly suited to image features (since it was inspired by the object recognition method of the mammalian visual system), other types of data can be used with a DLNN if their features can be translated into an image. In this paper, we show that Mel Frequency Cepstrum Coefficient (MFCC) features generated from the audio signal of an infant cry can be used as input features for a Convolutional Neural Network (CNN). The results show that a CNN can classify normal and pathological (asphyxiated) cries with 94.3% accuracy on the training set and 92.8% accuracy on the testing set.
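
The idea of turning a cry recording into an image-like input can be sketched as follows. This is a minimal, dependency-free illustration of a textbook MFCC pipeline (framing, Hamming window, power spectrum, mel filterbank, log, DCT), not the authors' implementation: the frame length, hop size, filter count, coefficient count and the synthetic 440 Hz test tone are all assumptions chosen for the example, and the CNN itself is not shown.

```python
import math

def hz_to_mel(freq_hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive DFT over bins 0..N/2; slow, but needs no external library.
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, frame_len, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((frame_len + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (frame_len // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=128, hop=64, n_filters=12, n_coeffs=8):
    # Returns a 2D list: one row per frame, one column per cepstral coefficient.
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        power = power_spectrum(frame)
        # Small constant avoids log(0) for filters that collect no energy.
        log_energy = [math.log(sum(w * p for w, p in zip(f, power)) + 1e-10) for f in bank]
        # DCT-II of the log filterbank energies gives the cepstral coefficients.
        rows.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                         for m, e in enumerate(log_energy))
                     for c in range(n_coeffs)])
    return rows

# Hypothetical stand-in for a cry recording: a short synthetic tone.
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / rate) for t in range(1024)]
features = mfcc(tone, rate)
print(len(features), len(features[0]))  # rows = frames, columns = coefficients
```

The resulting frames-by-coefficients matrix is the kind of two-dimensional array that can be handed to a CNN in place of an image.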


Citations
Journal Article
TL;DR: A machine learning approach based on convolutional neural network to predict the time history response of the transmission tower during the complex wind input and the effectiveness of the CNN surrogate model is validated through a fragility model development, and its robustness is investigated.

23 citations

Proceedings Article
01 Aug 2019
TL;DR: This paper captures the packets from the Narrow Band-Internet of Things (NB-IoT) transmission, Unmanned Aerial Vehicle (UAV) control, 4K video and Facebook access for emulating mMTC, URLLC, eMBB and Internet traffic in 5G.
Abstract: 5G supports more new services, including enhanced Mobile Broadband (eMBB), Ultra-reliable and Low Latency Communications (URLLC) and massive Machine Type Communications (mMTC). The Quality of Service (QoS) requirements of these 5G service types are different. In this paper, we capture the packets from the Narrow Band-Internet of Things (NB-IoT) transmission, Unmanned Aerial Vehicle (UAV) control, 4K video and Facebook access for emulating mMTC, URLLC, eMBB and Internet traffic in 5G. With the captured packets, we investigate using the machine learning technology to classify the packets based on the payload information. Specifically, the Convolutional Neural Network (CNN) model is performed to classify the application packets into suitable groups. In addition, this paper studies the effects of various parameters such as the kernel number, kernel size, pooling window size, the dropout rate and the payload length to find the optimal values for high accuracy and low latency.
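
To see why kernel size, pooling window size and payload length interact, it helps to work out how long a feature map is after each layer. The sketch below uses the standard output-length formula for a 1D convolution and for pooling; the 1000-byte payload, kernel size 5 and pooling window 2 are hypothetical values for illustration, not settings reported in this abstract.

```python
def conv_output_length(input_len, kernel_size, stride=1, padding=0):
    # Standard formula: floor((n + 2p - k) / s) + 1
    return (input_len + 2 * padding - kernel_size) // stride + 1

def pool_output_length(input_len, window, stride=None):
    # Pooling commonly uses a stride equal to the window size.
    stride = window if stride is None else stride
    return (input_len - window) // stride + 1

payload_len = 1000                              # hypothetical payload length in bytes
after_conv = conv_output_length(payload_len, kernel_size=5)
after_pool = pool_output_length(after_conv, window=2)
print(after_conv, after_pool)
```

A longer payload or a smaller pooling window leaves more values for the later layers to process, which is one way these parameters trade accuracy against latency.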

16 citations


Cites methods from "Detection of asphyxia in infants us..."

  • ...) based on the detection model in [13]....

    [...]

Proceedings Article
14 Jul 2019
TL;DR: This paper proposes a novel method that combines weighted prosodic features with acoustic features to form a merged feature matrix for classifying asphyxiated baby crying effectively, while keeping both the robustness and the resolution of the classification model.
Abstract: Asphyxia is a respiratory injury that can cause serious harm to infants. Early detection of asphyxia using artificial intelligence technology helps reduce the infant mortality rate compared with traditional medical diagnosis, which is time-consuming. In this paper, we propose a novel method that generates weighted prosodic features and combines them with acoustic features to form a merged feature matrix to classify asphyxiated baby crying effectively. The weights of the prosodic features are trained at the frame level with labeled data and can be optimized using a deep learning approach with neural networks. The novel merged feature matrix is established with both acoustic and weighted prosodic features. The matrix captures the diversity of variations within infant cries well, especially for asphyxiated samples. Our method has the benefit of keeping the robustness and resolution of the classification model simultaneously. The effectiveness of this approach is evaluated on the Baby Chillanto Database. Our method yields a significant reduction of 3.11%, 3.23%, and 1.43% absolute classification error rate compared with the results using single acoustic features, single prosodic features, and both acoustic and prosodic features, respectively. The testing accuracy of our method reaches 96.74%, which outperforms all other related studies on asphyxiated baby crying classification.

13 citations


Cites background from "Detection of asphyxia in infants us..."

  • ...8% accuracy in a specific testing data [7]....

    [...]

Journal Article
TL;DR: PigTalk is a new approach that automatically mitigates piglet crushing, which could not be achieved in the past; the study indicates that PigTalk can save piglets within 0.05 s with a 99.93% success rate.
Abstract: On pig farms, many piglets die because they are crushed when sows roll from side to side or lie down. On average, 1.2 piglets are crushed by sows every day. To resolve the piglet mortality issue, this article proposes PigTalk, an artificial intelligence (AI) based Internet of Things (IoT) platform for detecting and mitigating piglet crushing. Through real-time analysis of the voice data collected in a farrowing house, PigTalk detects if any piglet screaming occurs, and automatically activates sow-alert actuators for emergency handling of the crushing event. We propose an audio clip transform approach to pre-process the raw voice data, and utilize min-max scaling in machine learning (ML) to detect piglet screams. In our first contribution, the above data preprocessing method together with subtle parameter setups of the machine learning model improves the piglet scream detection accuracy up to 99.4%, which is better than the previous solutions (up to 92.8%). In our second contribution, we show how to design two cyber IoT devices, i.e., DataBank for data pre-processing and ML_device for real-time AI to automatically trigger actuators such as floor vibration and water drop to force a sow to stand up. We conduct analysis and simulation to investigate how the detection delay affects the critical time period to save crushed piglets. Our study indicates that PigTalk can save piglets within 0.05 s with a 99.93% success rate. Such results are validated in a commercial farrowing house. PigTalk is a new approach that automatically mitigates piglet crushing, which could not be achieved in the past.
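
The min-max scaling step mentioned in the abstract can be illustrated with a few lines of code. This is a generic sketch of the technique, assuming a plain list of feature values and a target range of 0 to 1; it does not reproduce PigTalk's audio clip transform or its actual parameter choices.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    # Linearly map the smallest value to `lo` and the largest to `hi`.
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # All values identical: nothing to spread out, so return the lower bound.
        return [lo for _ in values]
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

print(min_max_scale([2.0, 4.0, 6.0]))  # → [0.0, 0.5, 1.0]
```

Scaling every feature into the same range keeps any single feature with large raw values from dominating training.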

10 citations


Cites methods or result from "Detection of asphyxia in infants us..."

  • ...Similar study was conducted to detect infant cry [11] and the prediction accuracy can be up to 92....

    [...]

  • ...Based on the detection model in [11], we develop the CNN model in the SAs of DataBank and ML_device with the steps illustrated in Fig....

    [...]

  • ...1) Similar to the studies in [10] and [11], PigTalk uses CNN to interpret vocalizations from farrowing cages in real time to detect piglet screaming....

    [...]

Journal Article
TL;DR: In this paper, a classification model for classifying students' performance in Sijil Pelajaran Malaysia was proposed to help teachers plan suitable teaching activities for their students based on that performance.
Abstract: The purpose of this paper is to propose a classification model for classifying students’ performance in Sijil Pelajaran Malaysia (SPM) in order to help teachers plan suitable teaching activities for their students based on the students’ performance. Five classifier algorithms were used during the process: Naive Bayes, Random Tree, Multi Class Classifier, Conjunctive Rule and Nearest Neighbour. Data was collected from Maktab Rendah Sains MARA Kuala Berang, Terengganu, Malaysia from May 2011 until December 2014. The students’ performance was evaluated based on the category of students according to their SPM results. Parameters that contribute to students’ performance, such as stream, state, gender and hometown, are also investigated along with the examination data. This research shows that first semester results can be used to identify students’ performance. Keywords: educational data mining; classification model; feature selection

9 citations


Cites methods from "Detection of asphyxia in infants us..."

  • ...outperforms the prediction decision tree and neural network [30-33] methods....

    [...]

References
Journal Article
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.

15,055 citations


"Detection of asphyxia in infants us..." refers background in this paper

  • ...DLNN was proposed by [4] as an improvement to the conventional fourth-generation neural networks....

    [...]

Journal Article
TL;DR: The decaying error flow is theoretically analyzed, methods trying to overcome vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.
Abstract: Recurrent nets are in principle capable to store past inputs to produce the currently desired output. Because of this property recurrent nets are used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps, e.g. between relevant inputs and desired outputs. In this case, however, gradient based learning methods take too much time. The extremely increased learning time arises because the error vanishes as it gets propagated back. In this article the decaying error flow is theoretically analyzed. Then methods trying to overcome vanishing gradients are briefly discussed. Finally, experiments comparing conventional algorithms and alternative methods are presented. With advanced methods long time lag problems can be solved in reasonable time.
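
The decaying error flow can be seen numerically in a toy setting. The sketch below assumes a single recurrent unit with a logistic sigmoid activation, h_t = sigmoid(w * h_{t-1}), and multiplies the chain-rule factors that an error signal picks up on its way back through time; the weight of 1.0 and starting state of 0.5 are arbitrary illustrative choices, not values from the article.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backpropagated_error(weight, steps, h0=0.5):
    # Magnitude of d h_T / d h_0 for the recurrence h_t = sigmoid(weight * h_{t-1}).
    h = h0
    factor = 1.0
    for _ in range(steps):
        h = sigmoid(weight * h)
        # Chain rule: d h_t / d h_{t-1} = weight * sigmoid'(z) = weight * h * (1 - h)
        factor *= weight * h * (1.0 - h)
    return abs(factor)

for steps in (1, 10, 50):
    print(steps, backpropagated_error(1.0, steps))
```

Because the sigmoid derivative never exceeds 0.25, each step back in time shrinks the error by at least a factor of four when the weight is 1, so the signal is effectively gone after a few dozen steps.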

2,203 citations


"Detection of asphyxia in infants us..." refers background in this paper

  • ...Therefore, ReLU was proposed as an alternative to tangent-sigmoid activation function to avoid this issue [18]....

    [...]

Posted Content
TL;DR: In this article, a generalized large-margin softmax (L-Softmax) loss is proposed to encourage intra-class compactness and inter-class separability between learned features.
Abstract: Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

680 citations

Proceedings Article
01 Jun 2016
TL;DR: Wang et al. as mentioned in this paper introduced a complete framework for the object detection from video (VID) task based on still-image object detection and general object tracking, and a temporal convolution network is proposed to incorporate temporal information to regularize the detection results and shows its effectiveness for the task.
Abstract: Deep Convolution Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. For object detection, particularly in still images, the performance has been significantly increased last year thanks to powerful deep networks (e.g. GoogleNet) and detection frameworks (e.g. Regions with CNN features (RCNN)). The lately introduced ImageNet [6] task on object detection from video (VID) brings the object detection task into the video domain, in which objects' locations at each frame are required to be annotated with bounding boxes. In this work, we introduce a complete framework for the VID task based on still-image object detection and general object tracking. Their relations and contributions in the VID task are thoroughly studied and evaluated. In addition, a temporal convolution network is proposed to incorporate temporal information to regularize the detection results and shows its effectiveness for the task. Code is available at https://github.com/myfavouritekk/vdetlib.

338 citations

Journal Article
TL;DR: A pattern recognition system which works with the mechanism of the neocognitron, a neural network model for deformation-invariant visual pattern recognition, is discussed, which has been trained to recognize 35 handwritten alphanumeric characters.
Abstract: A pattern recognition system which works with the mechanism of the neocognitron, a neural network model for deformation-invariant visual pattern recognition, is discussed. The neocognitron was developed by Fukushima (1980). The system has been trained to recognize 35 handwritten alphanumeric characters. The ability to recognize deformed characters correctly depends strongly on the choice of the training pattern set. Some techniques for selecting training patterns useful for deformation-invariant recognition of a large number of characters are suggested.

249 citations


"Detection of asphyxia in infants us..." refers background or methods in this paper

  • ...The work in this paper focuses on CNN as it was reported as an excellent tool in the deep learning framework [5]....

    [...]

  • ...One of its famous usage examples is handwriting recognition [5]....

    [...]