Author

Erik Marchi

Bio: Erik Marchi is an academic researcher from Apple Inc. The author has contributed to research in topics including recurrent neural networks and autism. The author has an h-index of 24 and has co-authored 70 publications receiving 3,183 citations. Previous affiliations of Erik Marchi include Technische Universität München and the University of Passau.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
20 Mar 2016
TL;DR: This paper proposes a solution to the problem of 'context-aware' emotionally relevant feature extraction by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation.
Abstract: The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content for various styles of speaking, while on the other, machine learning algorithms need to be insensitive to outliers while being able to model the context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has provided a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware' emotionally relevant feature extraction by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the proposed topology significantly outperforms traditional approaches based on signal-processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.
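The end-to-end pipeline the abstract describes — convolutional feature extraction directly on the raw waveform, feeding an LSTM, topped by a regression head — can be sketched as a minimal numpy forward pass. All dimensions, kernel sizes, and the linear readout below are toy assumptions for illustration, not the paper's actual topology:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels, stride):
    """Valid 1-D convolution of a raw waveform with a bank of kernels, plus ReLU."""
    k = kernels.shape[1]
    steps = (len(x) - k) // stride + 1
    out = np.empty((steps, kernels.shape[0]))
    for t in range(steps):
        out[t] = kernels @ x[t * stride : t * stride + k]
    return np.maximum(out, 0.0)

def lstm_forward(seq, Wx, Wh, b):
    """Single-layer LSTM over a feature sequence; returns the last hidden state."""
    hdim = Wh.shape[1]
    h, c = np.zeros(hdim), np.zeros(hdim)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in seq:
        z = Wx @ x + Wh @ h + b            # stacked gate pre-activations
        i, f, o, g = np.split(z, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
    return h

# Toy dimensions (assumptions, not the paper's settings)
wave = rng.standard_normal(1600)           # 0.1 s of 16 kHz audio
kernels = rng.standard_normal((8, 40)) * 0.1
feats = conv1d(wave, kernels, stride=20)   # learned "spectral" features

hdim = 16
Wx = rng.standard_normal((4 * hdim, 8)) * 0.1
Wh = rng.standard_normal((4 * hdim, hdim)) * 0.1
b = np.zeros(4 * hdim)
h_last = lstm_forward(feats, Wx, Wh, b)

# Linear head for a continuous emotion prediction (e.g. arousal/valence)
W_out = rng.standard_normal((2, hdim)) * 0.1
pred = W_out @ h_last
print(pred.shape)  # (2,)
```

In the actual system the convolutional kernels and recurrent weights are learned jointly by backpropagation, so the front-end is optimised for the emotion task rather than fixed in advance.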

785 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech and introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve enacted emotional states. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set generated by the openSMILE toolkit and provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations

Proceedings ArticleDOI
02 Sep 2013
TL;DR: A sparse autoencoder method for feature transfer learning in speech emotion recognition: a common emotion-specific mapping rule is learnt from a small set of labelled data in a target domain and applied to data from other domains, significantly improving performance relative to learning each source domain independently.
Abstract: In speech emotion recognition, training and test data used for system development usually tend to fit each other perfectly, but further 'similar' data may be available. Transfer learning helps to exploit such similar data for training despite the inherent dissimilarities in order to boost a recogniser's performance. In this context, this paper presents a sparse autoencoder method for feature transfer learning for speech emotion recognition. In our proposed method, a common emotion-specific mapping rule is learnt from a small set of labelled data in a target domain. Then, newly reconstructed data are obtained by applying this rule to the emotion-specific data in a different domain. The experimental results, evaluated on six standard databases, show that our approach significantly improves the performance relative to learning each source domain independently.
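The mapping described above — a sparse autoencoder trained on a small labelled target set, then applied to reconstruct source-domain data — can be sketched in numpy. Dimensions, learning rate, and the sparsity target below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a small labelled target set and a larger source set
X_target = rng.standard_normal((50, 20))   # few labelled target-domain features
X_source = rng.standard_normal((500, 20))  # emotion-specific source-domain features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single-hidden-layer autoencoder with tied weights; the sparsity penalty
# (a KL term on mean hidden activations) is what makes it *sparse*.
hdim, lr, rho, beta = 10, 0.05, 0.05, 0.1
W = rng.standard_normal((hdim, 20)) * 0.1
b1, b2 = np.zeros(hdim), np.zeros(20)

for _ in range(200):
    H = sigmoid(X_target @ W.T + b1)          # encode
    R = H @ W + b2                            # decode (tied weights)
    err = R - X_target
    rho_hat = H.mean(axis=0)
    # gradient of reconstruction loss + sparsity penalty w.r.t. hidden activations
    d_hidden = (err @ W.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))) * H * (1 - H)
    gW = d_hidden.T @ X_target + H.T @ err    # tied-weight gradient
    W -= lr * gW / len(X_target)
    b1 -= lr * d_hidden.mean(axis=0)
    b2 -= lr * err.mean(axis=0)

# Apply the target-learnt mapping to source-domain data: the
# reconstructions carry the target domain's characteristics.
X_source_mapped = sigmoid(X_source @ W.T + b1) @ W + b2
print(X_source_mapped.shape)  # (500, 20)
```

The reconstructed source data can then be used as additional training material for the target-domain recogniser.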

335 citations

Journal ArticleDOI
TL;DR: The paper describes a new technique that allows automated coding of a non-verbal mode of communication (gestures) and offers the possibility of objective evaluation of gestures, independent of human judgment.
Abstract: Autism spectrum conditions (autism) are diagnosed more frequently in boys than in girls. Females with autism may have been under-identified due not only to a male-biased understanding of autism but also to females' camouflaging. The study describes a new technique that allows automated coding of a non-verbal mode of communication (gestures) and offers the possibility of objective evaluation of gestures, independent of human judgment. The EyesWeb software platform and the Kinect sensor were used during two demonstration activities of ADOS-2 (Autism Diagnostic Observation Schedule, Second Edition). The study group consisted of 33 high-functioning Polish girls and boys with a formal diagnosis of autism or Asperger syndrome, aged 5–10, with fluent speech and average or above-average IQ, and their parents (girls with autism, n = 16; boys with autism, n = 17). All children were assessed during two demonstration activities of Module 3 of ADOS-2, administered in Polish and coded using Polish codes. Children were also assessed with Polish versions of the Eyes and Faces Tests. Parents provided information on the author-reviewed Polish research translation of the SCQ (Social Communication Questionnaire, Current and Lifetime) and the Polish version of the AQ Child (Autism Spectrum Quotient, Child). Girls with autism tended to use gestures more vividly than boys with autism during the two demonstration activities of ADOS-2. Girls with autism made significantly more mistakes than boys with autism on the Faces Test. All children with autism had high scores on the AQ Child, which confirmed the presence of autistic traits in this group. The current communication skills of boys with autism reported by parents in the SCQ were significantly better than those of girls with autism. However, both girls and boys with autism improved in social and communication abilities over their lifetime.
The number of stereotypic behaviours in boys significantly decreased over life, whereas it remained at a comparable level in girls with autism. High-functioning females with autism might present better on the non-verbal (gesture) mode of communication than boys with autism, which may camouflage other diagnostic features and poses a risk of under-diagnosis, or of not receiving the appropriate diagnosis, for this population. Further research is required to examine this phenomenon so that appropriate gender revisions to the diagnostic assessments might be implemented.

214 citations

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper presents a novel unsupervised approach based on a denoising autoencoder which significantly outperforms existing methods by achieving up to 93.4% F-Measure.
Abstract: Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper we present a novel unsupervised approach based on a denoising autoencoder. In our approach, auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We use the reconstruction error between the input and the output of the autoencoder as the activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and conclude that our novel approach significantly outperforms existing methods by achieving up to 93.4% F-Measure.
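The detection rule — score each frame by the autoencoder's reconstruction error and flag frames whose error exceeds a threshold fitted on normal training data — can be sketched with a linear autoencoder (PCA) as a simplified stand-in for the paper's BLSTM denoising autoencoder. The data, dimensions, and threshold percentile are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# "Normal" in-home frames cluster in a low-dimensional subspace (a toy
# stand-in for auditory spectral features); novel events fall outside it.
basis = rng.standard_normal((3, 26))
X_train = rng.standard_normal((400, 3)) @ basis + 0.05 * rng.standard_normal((400, 26))

# Linear autoencoder via PCA: encode into a 3-component "bottleneck",
# decode back, and score by the residual norm.
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
P = Vt[:3]

def reconstruction_error(X):
    Xc = X - mu
    return np.linalg.norm(Xc - (Xc @ P.T) @ P, axis=1)

# Threshold: e.g. the 99th percentile of training reconstruction error
thresh = np.percentile(reconstruction_error(X_train), 99)

normal = rng.standard_normal((20, 3)) @ basis + 0.05 * rng.standard_normal((20, 26))
novel = rng.standard_normal((20, 26)) * 2.0   # off-subspace "abnormal" events
print((reconstruction_error(novel) > thresh).mean())  # fraction flagged (close to 1 here)
```

The same rule applies unchanged when the linear model is replaced by a recurrent denoising autoencoder: only the reconstruction function changes, not the decision logic.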

210 citations


Cited by
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.

14,635 citations

Journal ArticleDOI
TL;DR: This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling, and observes that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
Abstract: Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
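The hyperparameter optimisation the abstract mentions — random search over each variant's hyperparameters, keeping the best configuration — can be sketched as follows. The toy objective, parameter names, and priors are assumptions for illustration, not the study's actual setup:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical objective standing in for "train an LSTM, report validation error";
# its minimum is at lr = 1e-3, hidden = 128, momentum = 0.9.
def validation_error(lr, hidden, momentum):
    return (np.log10(lr) + 3) ** 2 + 0.001 * abs(hidden - 128) + 0.5 * (momentum - 0.9) ** 2

# Random search: sample each hyperparameter independently from a prior
# and keep the best configuration found.
best = None
for _ in range(200):
    cfg = dict(
        lr=10 ** rng.uniform(-6, -1),      # log-uniform learning rate
        hidden=int(rng.uniform(16, 512)),  # hidden layer size
        momentum=rng.uniform(0.5, 0.99),
    )
    err = validation_error(**cfg)
    if best is None or err < best[0]:
        best = (err, cfg)

print(best[1]["lr"])  # near 1e-3, the optimum of the toy objective
```

Sampling the learning rate log-uniformly matters in practice: a uniform prior over [1e-6, 1e-1] would spend almost all its budget in the top decade.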

4,746 citations

Journal ArticleDOI
TL;DR: This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning; the solutions surveyed are independent of data size and can be applied to big data environments.
Abstract: Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is that the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
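The core idea — reuse knowledge from a data-rich source domain when target-domain data is scarce — can be illustrated with a deliberately simple parameter-transfer sketch: ridge regression whose solution is shrunk towards source-learnt weights rather than towards zero. All data and the shrinkage scheme below are toy assumptions, not a method from the survey:

```python
import numpy as np

rng = np.random.default_rng(5)

# Source domain: plenty of data; target domain: related task, little data.
w_true_src = np.array([1.0, -2.0, 0.5])
w_true_tgt = w_true_src + 0.2 * rng.standard_normal(3)   # "different but related"

X_src = rng.standard_normal((1000, 3))
y_src = X_src @ w_true_src + 0.1 * rng.standard_normal(1000)
X_tgt = rng.standard_normal((10, 3))
y_tgt = X_tgt @ w_true_tgt + 0.1 * rng.standard_normal(10)

def ridge(X, y, w0, lam):
    """Least squares shrunk towards w0 (lam sets the shrinkage strength)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)

w_src = ridge(X_src, y_src, np.zeros(3), 1.0)          # learn on the source
w_scratch = ridge(X_tgt, y_tgt, np.zeros(3), 1.0)      # target data only
w_transfer = ridge(X_tgt, y_tgt, w_src, 1.0)           # shrink towards source

X_test = rng.standard_normal((500, 3))
y_test = X_test @ w_true_tgt
err = lambda w: np.mean((X_test @ w - y_test) ** 2)
print(err(w_transfer) < err(w_scratch))  # True with this toy data
```

With only 10 target examples, shrinking towards the source solution reduces variance without introducing much bias, because the two domains' true weights are close.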

2,900 citations

Journal ArticleDOI
01 Jan 2021
TL;DR: Transfer learning aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains, reducing the dependence on large amounts of target-domain data when constructing target learners.
Abstract: Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large number of target-domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research studies, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey article reviews more than 40 representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over 20 representative transfer learning models are used for experiments. The models are performed on three different data sets, that is, Amazon Reviews, Reuters-21578, and Office-31, and the experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.

2,433 citations

09 Mar 2012
TL;DR: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems; this entry introduces ANNs using familiar econometric terminology and provides an overview of the ANN modeling approach and its implementation methods.
Abstract: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems. In this entry, we introduce ANN using familiar econometric terminology and provide an overview of ANN modeling approach and its implementation methods.
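A concrete instance of such a flexible nonlinear model: a two-layer perceptron computing XOR, a function no single linear model can represent. The weights below are set by hand purely for illustration; in practice they are learned from data:

```python
import numpy as np

def mlp(x):
    """Two-layer perceptron with hand-set weights computing XOR."""
    W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])
    h = (W1 @ x + b1 > 0).astype(float)     # step activations: OR and AND
    W2 = np.array([1.0, -1.0])              # OR and not-AND -> XOR
    return float(W2 @ h > 0)

print([mlp(np.array(p)) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0.0, 1.0, 1.0, 0.0]
```

The hidden layer turns the linearly inseparable XOR problem into a linearly separable one, which is exactly the flexibility the entry attributes to ANNs.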

2,069 citations