ISSN: 1783-7677

Journal on Multimodal User Interfaces 

Springer Science+Business Media
About: Journal on Multimodal User Interfaces is an academic journal published by Springer Science+Business Media. The journal publishes mainly in the areas of Sonification and Usability. It has the ISSN identifier 1783-7677. Over its lifetime, it has published 396 papers, which have received 7360 citations. The journal is also known as: Journal on multimodal user interfaces (Print) & Multimodal user interfaces (Internet).


Papers
Journal Article
TL;DR: In this article, the authors present an approach to learning several specialist models with deep learning techniques, each focusing on one modality: a convolutional neural network, a deep belief net, a K-Means-based bag-of-mouths model, and a relational autoencoder.
Abstract: The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network that captures visual information in detected faces, a deep belief net that represents the audio stream, a K-Means based “bag-of-mouths” model that extracts visual features around the mouth region, and a relational autoencoder that addresses spatio-temporal aspects of the videos. We explore multiple methods for combining the cues from these modalities into one common classifier, which achieves a considerably greater accuracy than predictions from our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67 % on the 2014 dataset.

357 citations
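The abstract above combines cues from several modality-specific models into one classifier. As a rough, hypothetical sketch of the general late-fusion idea (not the authors' actual models, weights, or data; the arrays and the fuse_modalities helper below are illustrative only), a weighted average of per-modality class probabilities can be written as:

```python
import numpy as np

def fuse_modalities(prob_list, weights=None):
    """Combine per-modality class-probability vectors by weighted averaging.

    prob_list : list of length-n_classes arrays, one per modality
                (e.g. face model, audio model, bag-of-mouths, autoencoder).
    weights   : optional per-modality weights; defaults to uniform.
    """
    probs = np.stack(prob_list)                # shape (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # convex combination of modalities
    fused = weights @ probs                    # weighted average per class
    return int(np.argmax(fused)), fused

# Hypothetical per-modality outputs over the seven EmotiW emotion classes
face_probs  = np.array([0.05, 0.10, 0.50, 0.10, 0.10, 0.10, 0.05])
audio_probs = np.array([0.10, 0.05, 0.40, 0.15, 0.10, 0.10, 0.10])
label, fused = fuse_modalities([face_probs, audio_probs], weights=[0.7, 0.3])
```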

Journal Article
TL;DR: This paper builds a hierarchical committee of deep CNNs with exponentially-weighted decision fusion for robust facial expression recognition in the third Emotion Recognition in the Wild (EmotiW2015) challenge.
Abstract: This paper describes our approach towards robust facial expression recognition (FER) for the third Emotion Recognition in the Wild (EmotiW2015) challenge. We train multiple deep convolutional neural networks (deep CNNs) as committee members and combine their decisions. To improve this committee of deep CNNs, we present two strategies: (1) in order to obtain diverse decisions from deep CNNs, we vary network architecture, input normalization, and random weight initialization in training these deep models, and (2) in order to form a better committee in structural and decisional aspects, we construct a hierarchical architecture of the committee with exponentially-weighted decision fusion. In solving a seven-class problem of static FER in the wild for the EmotiW2015, we achieve a test accuracy of 61.6 %. Moreover, on other public FER databases, our hierarchical committee of deep CNNs yields superior performance, outperforming or competing with state-of-the-art results for these databases.

219 citations
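One plausible reading of the exponentially-weighted decision fusion described above is to weight each committee member's class posterior by an exponential function of its validation accuracy. The sketch below is only an illustration under that assumption; the member probabilities, accuracies, and the temperature parameter are hypothetical and not the paper's exact formulation.

```python
import numpy as np

def exp_weighted_fusion(member_probs, val_accuracies, temperature=1.0):
    """Fuse committee members' class posteriors, weighting each member
    exponentially by its (hypothetical) validation accuracy."""
    member_probs = np.stack(member_probs)      # shape (n_members, n_classes)
    acc = np.asarray(val_accuracies, dtype=float)
    w = np.exp(acc / temperature)              # exponential weighting of members
    w = w / w.sum()
    return w @ member_probs                    # fused class distribution

# Three hypothetical committee members on a seven-class FER problem
p1 = np.array([0.60, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05])
p2 = np.array([0.30, 0.40, 0.10, 0.05, 0.05, 0.05, 0.05])
p3 = np.array([0.50, 0.20, 0.10, 0.05, 0.05, 0.05, 0.05])
fused = exp_weighted_fusion([p1, p2, p3], val_accuracies=[0.58, 0.55, 0.60])
predicted_class = int(np.argmax(fused))
```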

Journal Article
TL;DR: The multimodal approach increased the recognition rate by more than 10% compared to the most successful unimodal system, and the best bimodal pairing was ‘gesture-speech’.
Abstract: In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal, bimodal and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion at the feature level (before running the classifier) and fusion at the results level (combining the outputs of the per-modality classifiers) were compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than 10% compared to the most successful unimodal system. Bimodal emotion recognition based on all combinations of the modalities (i.e., ‘face-gesture’, ‘face-speech’ and ‘gesture-speech’) was also investigated. The results show that the best pairing is ‘gesture-speech’. Using all three modalities resulted in a 3.3% classification improvement over the best bimodal results.

218 citations
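The abstract above compares fusion at the feature level and at the results level around a Bayesian classifier. A minimal sketch of that contrast, using scikit-learn's Gaussian naive Bayes and randomly generated stand-ins for the face, gesture and speech features (the data, dimensions, and helper below are hypothetical, not the study's), might look like:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 8
y = rng.integers(0, n_classes, size=n_samples)
# Hypothetical stand-ins for the extracted face, gesture, and speech features
face    = rng.normal(size=(n_samples, 10))
gesture = rng.normal(size=(n_samples, 6))
speech  = rng.normal(size=(n_samples, 12))

# Feature-level (early) fusion: concatenate features, train one Bayesian classifier
early_model = GaussianNB().fit(np.hstack([face, gesture, speech]), y)

# Results-level (late) fusion: one classifier per modality, combine the posteriors
models = [GaussianNB().fit(X, y) for X in (face, gesture, speech)]

def late_fusion_predict(feature_sets):
    # Product rule over per-modality class posteriors (one common combination scheme)
    posteriors = [m.predict_proba(X) for m, X in zip(models, feature_sets)]
    return np.argmax(np.prod(posteriors, axis=0), axis=1)

preds = late_fusion_predict([face, gesture, speech])
```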

Journal Article
TL;DR: The authors carried out an objective statistical survey across the various sub-disciplines in the field, applying information analysis and network-theory techniques to answer several key questions; the results reveal that there has been sustained growth in this field.
Abstract: Assistive technology for the visually impaired and blind people is a research field that is gaining increasing prominence owing to an explosion of new interest in it from disparate disciplines. The field has a very relevant social impact on our ever-increasing aging and blind populations. While many excellent state-of-the-art accounts have been written to date, all of them are subjective in nature. We performed an objective statistical survey across the various sub-disciplines in the field and applied information analysis and network-theory techniques to answer several key questions relevant to the field. To analyze the field we compiled an extensive database of scientific research publications over the last two decades. We inferred interesting patterns and statistics concerning the main research areas and underlying themes, identified leading journals and conferences, captured growth patterns of the research field, identified active research communities, and present our interpretation of trends in the field for the near future. Our results reveal that there has been sustained growth in this field, from fewer than 50 publications per year in the mid 1990s to close to 400 scientific publications per year in 2014. Assistive technology for persons with visual impairments is expected to grow at a swift pace and to impact the lives of individuals and the elderly in ways not previously possible.

158 citations
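As a toy illustration of the kind of counting behind the reported growth and venue statistics (the records below are hypothetical stand-ins, not the survey's actual database), publications per year and per venue can be tallied like this:

```python
from collections import Counter

# Hypothetical (year, venue) records standing in for the compiled publication database
records = [
    (1995, "ICCHP"), (1996, "ASSETS"), (2013, "CHI"),
    (2014, "ASSETS"), (2014, "JMUI"), (2014, "CHI"),
]

papers_per_year = Counter(year for year, _ in records)    # growth pattern over time
venue_counts    = Counter(venue for _, venue in records)  # leading journals/conferences

print(sorted(papers_per_year.items()))  # e.g. [(1995, 1), (1996, 1), (2013, 1), (2014, 3)]
print(venue_counts.most_common(2))      # e.g. [('ASSETS', 2), ('CHI', 2)]
```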

Journal Article
TL;DR: The authors exploit the proposition, well known in auditory-visual speech processing, that auditory and visual human communication complement each other, and show the proposed framework’s effectiveness in depression analysis.
Abstract: Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the proposition, well known in auditory-visual speech processing, that auditory and visual human communication complement each other, and we investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at the feature level, score level and decision level. Experiments are performed on an age- and gender-matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework’s effectiveness in depression analysis.

148 citations
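The abstract above builds separate bags of visual and audio features and then compares fusion levels. A minimal sketch of the bag-of-features step and of feature-level fusion, using k-means codebooks over randomly generated descriptors (the descriptors, codebook sizes, and the bag_of_features helper are hypothetical, not the paper's exact pipeline), could look like:

```python
import numpy as np
from sklearn.cluster import KMeans

def bag_of_features(descriptors, codebook):
    """Quantise local descriptors against a learned codebook and return a
    normalised histogram, i.e. the 'bag of features' representation."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
# Hypothetical local descriptors: spatio-temporal interest points from the video
# stream and frame-level audio features (f0, loudness, intensity, MFCCs)
video_desc = rng.normal(size=(500, 32))
audio_desc = rng.normal(size=(800, 16))

video_codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(video_desc)
audio_codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(audio_desc)

# Feature-level fusion: concatenate the two histograms before classification;
# score- and decision-level fusion would instead combine classifier outputs.
fused = np.concatenate([bag_of_features(video_desc, video_codebook),
                        bag_of_features(audio_desc, audio_codebook)])
```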

Performance Metrics
No. of papers from the Journal in previous years
Year  Papers
2023  5
2022  17
2021  43
2020  29
2019  32
2018  23