
Showing papers in "Journal on Multimodal User Interfaces in 2020"


Journal ArticleDOI
TL;DR: Overall, the findings showed that elderly Arab users found the UI design of mHealth acceptable due to its cultural significance, and the impact of age on the relationship between ease of use, usefulness, and intention was significant.
Abstract: In the Arab world, mobile health (mHealth) applications are an effective way to provide health benefits to the medically needy in the absence of health services. However, end users around the world use technology to perform tasks in a way that appears more natural and closer to their cultural and personal preferences. Evidence from prior studies shows that culture is a vital factor in the success of a system or product. In view of this fact, this study investigated elderly Arab users' acceptance of culture-based mHealth User Interface (UI) design. The TAM model shaped the theoretical foundation for this study, with a questionnaire administered to 81 participants as the data-gathering tool. The findings showed that perceived ease of use and attitude towards use had a significant positive influence on users' behavioral intention to use the culture-based mHealth UI design. The impact of age on the relationship between ease of use, usefulness, and intention was significant. Overall, the findings showed that elderly Arab users found the UI design of mHealth acceptable due to its cultural significance. To enhance the design of mobile UIs targeting elderly users, it is important to consider cultural rules and their behavioral applications.

44 citations


Journal ArticleDOI
TL;DR: The results showed that the local worker had difficulty understanding the remote expert's hand gesture cue, and that the main visual communication cue was the pointer cue, with fast completion time and a high level of co-presence.
Abstract: Many researchers have studied various visual communication cues (e.g. pointer, sketch, and hand gesture) in Mixed Reality remote collaboration systems for real-world tasks. However, the effect of combining them has not been well explored. We studied the effect of these cues in four combinations: hand only, hand + pointer, hand + sketch, and hand + pointer + sketch, in two user studies supporting a dependent view and an independent view respectively. In the first user study with the dependent view, the results showed that the hand gesture cue was the main visual communication cue, and adding the sketch cue to the hand gesture cue helped participants complete the task faster. In the second study with the independent view, the results showed that the local worker had difficulty understanding the remote expert's hand gesture cue, and the main visual communication cue was the pointer cue, with fast completion time and a high level of co-presence.

35 citations


Journal ArticleDOI
TL;DR: This paper presents an MR remote collaboration system that shares both spatial auditory and visual cues between collaborators to help them complete a search task and finds that integrating visual cues with the spatial auditory cues significantly improved the local worker’s task performance, social presence, and spatial perception of the environment.
Abstract: Collaborative Mixed Reality (MR) technologies enable remote people to work together by sharing communication cues intrinsic to face-to-face conversations, such as eye gaze and hand gestures. While the role of visual cues has been investigated in many collaborative MR systems, the use of spatial auditory cues remains underexplored. In this paper, we present an MR remote collaboration system that shares both spatial auditory and visual cues between collaborators to help them complete a search task. Through two user studies in a large office, we found that compared to non-spatialized audio, the spatialized remote expert’s voice and auditory beacons enabled local workers to find small occluded objects with significantly stronger spatial perception. We also found that while the spatial auditory cues could indicate the spatial layout and a general direction to search for the target object, visual head frustum and hand gestures intuitively demonstrated the remote expert’s movements and the position of the target. Integrating visual cues (especially the head frustum) with the spatial auditory cues significantly improved the local worker’s task performance, social presence, and spatial perception of the environment.

33 citations


Journal ArticleDOI
TL;DR: This editorial describes five essential factors for remote collaboration: task, local user, remote user, communication, and tool/interface, and then summarizes a brief history of the research areas.
Abstract: Remote collaboration has been studied for more than two decades, and recent advances in immersive technologies such as Virtual, Augmented, and Mixed Reality (VR/AR/MR) now open possibilities for new types of collaboration. However, despite the increasing research interest in remote collaboration with VR/AR/MR technologies, there is still a lack of academic venues specifically focusing on VR/AR/MR remote collaboration research. This special issue provides high-quality papers on the topic of remote collaboration research and increases the visibility of this timely and important research area. We particularly focus on three research aspects in remote collaboration: (1) use of multimodal communication cues, (2) awareness of the task space, and (3) human factors understanding. In this editorial, we first describe five essential factors for remote collaboration: task, local user, remote user, communication, and tool/interface, and then summarize a brief history of the research areas. We also cover the feature papers accepted in this issue, which introduce novel multimodal interfaces for remote collaboration and their effects on task performance and perceptual factors. Finally, we discuss some potential future research directions while concluding the editorial.

29 citations


Journal ArticleDOI
TL;DR: An intelligent interactive head-up display on the windscreen is presented that does not require drivers to take their eyes off the road while undertaking secondary tasks such as playing music, operating vent controls, and watching the navigation map.
Abstract: Modern infotainment systems in automobiles facilitate driving but introduce secondary tasks in addition to the primary task of driving. These secondary tasks have a considerable chance of distracting a driver from the primary driving task, thereby reducing safety or increasing cognitive workload. This paper presents an intelligent interactive head-up display (HUD) on the windscreen that does not require drivers to take their eyes off the road while undertaking secondary tasks like playing music, operating vent controls, watching the navigation map and so on. The interactive HUD allows interaction in the form of pointing and selection just like traditional graphical user interfaces, but by tracking the operator's eye gaze or finger movements. Additionally, the system can estimate drivers' cognitive load and distraction level. User studies show the system improves driving performance in terms of mean deviation from lane in an ISO 26022 lane-changing task compared to a touchscreen system, and participants can undertake ISO 9241 pointing tasks in less than 2 s on average inside a Toyota Etios car.

23 citations


Journal ArticleDOI
TL;DR: The experiment results show that the multi-modal strategy achieves encouraging recognition results compared to the single-modal strategy, and label-preserving transformations are used to enlarge the dataset artificially in order to address the over-fitting and data imbalance of deep neural networks.
Abstract: Emotion recognition based on facial expression is a challenging research topic and has attracted a great deal of attention in the past few years. This paper presents a novel method utilizing a multi-modal strategy to extract emotion features from facial expression images. The basic idea is to combine the low-level empirical feature and the high-level self-learning feature into a multi-modal feature. The 2-dimensional coordinates of facial key points are extracted as the low-level empirical feature, and the high-level self-learning feature is extracted by Convolutional Neural Networks (CNNs). To reduce the number of free parameters of the CNNs, small filters are utilized for all convolutional layers; since multiple small filters are equivalent to a single large filter, this effectively reduces the number of parameters to learn. Label-preserving transformations are used to enlarge the dataset artificially in order to address the over-fitting and data imbalance of deep neural networks. Then, the two kinds of modal features are fused linearly to form the facial expression feature. Extensive experiments are conducted on the extended Cohn–Kanade (CK+) dataset. For comparison, three kinds of feature vectors are adopted: the low-level facial key point feature vector, the high-level self-learning feature vector, and the multi-modal feature vector. The experiment results show that the multi-modal strategy achieves encouraging recognition results compared to the single-modal strategy.
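As a rough illustration of the fusion idea described above (not the authors' exact architecture), the sketch below concatenates a flattened facial key-point coordinate vector with CNN activations built from small 3x3 filters; the layer sizes, the 68-point key-point count, and the seven emotion classes are assumptions.

```python
# Minimal sketch (assumed shapes/sizes): fuse 2-D facial key-point coordinates
# (low-level feature) with CNN activations (high-level feature) by concatenation.
import torch
import torch.nn as nn

class MultiModalEmotionNet(nn.Module):
    def __init__(self, num_keypoints=68, num_classes=7):
        super().__init__()
        # Small 3x3 filters in every convolutional layer to limit free parameters.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),          # -> 32 * 4 * 4 = 512
        )
        fused_dim = 512 + 2 * num_keypoints                  # CNN features + (x, y) coordinates
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, face_image, keypoints_xy):
        high = self.cnn(face_image)                          # high-level self-learned feature
        low = keypoints_xy.flatten(start_dim=1)              # low-level empirical feature
        return self.classifier(torch.cat([high, low], dim=1))  # linear fusion by concatenation

# Example: one grayscale 48x48 face with 68 (x, y) key points.
logits = MultiModalEmotionNet()(torch.randn(1, 1, 48, 48), torch.randn(1, 68, 2))
```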

23 citations


Journal ArticleDOI
TL;DR: An interactive and easy-to-configure MR remote collaboration technique is presented that enables a local worker to easily share his/her environment by integrating 360 panorama images into a low-cost 3D reconstructed scene as photo-bubbles and projective textures.
Abstract: Remote collaboration using mixed reality (MR) enables two separated workers to collaborate by sharing visual cues. A local worker can share his/her environment with the remote worker for a better contextual understanding. However, prior techniques used either 360 video sharing or a complicated 3D reconstruction configuration, which limits the interactivity and practicality of the system. In this paper we present an interactive and easy-to-configure MR remote collaboration technique that enables a local worker to easily share his/her environment by integrating 360 panorama images into a low-cost 3D reconstructed scene as photo-bubbles and projective textures. This enables the remote worker to visit past scenes in either an immersive 360 panoramic scenery or an interactive 3D environment. We developed a prototype and conducted a user study comparing the two modes in which 360 panorama images could be used in a remote collaboration system. Results suggested that both photo-bubbles and projective textures can provide high social presence, co-presence and low cognitive load for solving tasks, while each has its own advantages and limitations. For example, photo-bubbles are good for quick navigation inside the 3D environment without depth perception, while projective textures are good for spatial understanding but require physical effort.

18 citations


Journal ArticleDOI
TL;DR: This paper presents a human-subjects study to understand the impact of accuracy, precision, latency, and dropout-based errors on users' performance when using shared gaze cues to identify a target among a crowd of people, measuring participants' objective performance through their response time and error rate.
Abstract: Augmented reality (AR) technologies provide a shared platform for users to collaborate in a physical context involving both real and virtual content. To enhance the quality of interaction between AR users, researchers have proposed augmenting users' interpersonal space with embodied cues such as their gaze direction. While beneficial in achieving improved interpersonal spatial communication, such shared gaze environments suffer from multiple types of errors related to eye tracking and networking, which can reduce objective performance and subjective experience. In this paper, we present a human-subjects study to understand the impact of accuracy, precision, latency, and dropout-based errors on users' performance when using shared gaze cues to identify a target among a crowd of people. We simulated varying amounts of error and varying target distances, and measured participants' objective performance through their response time and error rate, and their subjective experience and cognitive load through questionnaires. We found significant differences suggesting that the simulated error levels had stronger effects on participants' performance than target distance, with accuracy and latency having a high impact on participants' error rate. We also observed that participants assessed their own performance as lower than it objectively was. We discuss implications for practical shared gaze applications and present a multi-user prototype system.
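For illustration only, the sketch below simulates the four error types examined in the study (accuracy offset, precision jitter, latency, and dropout) on a stream of 2-D gaze points; the parameter values and the frame-based latency model are assumptions, not the study's settings.

```python
# Sketch of simulating the four gaze-sharing error types discussed above
# (accuracy offset, precision jitter, latency, dropout); values are illustrative.
import random
from collections import deque

def simulate_gaze_errors(gaze_points, offset=(0.02, 0.0), jitter=0.01,
                         latency_frames=6, dropout_prob=0.1):
    delayed = deque(maxlen=latency_frames + 1)
    out = []
    for x, y in gaze_points:
        # Accuracy: constant offset; precision: zero-mean Gaussian jitter.
        noisy = (x + offset[0] + random.gauss(0, jitter),
                 y + offset[1] + random.gauss(0, jitter))
        delayed.append(noisy)
        # Latency: only emit a sample once the delay buffer is full.
        sample = delayed[0] if len(delayed) > latency_frames else None
        # Dropout: randomly drop frames (a receiver would reuse the last good sample).
        out.append(None if random.random() < dropout_prob else sample)
    return out

shared_stream = simulate_gaze_errors([(0.5, 0.5)] * 30)
```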

18 citations


Journal ArticleDOI
TL;DR: Under the low cognitive load condition, trust was enhanced under ambiguity uncertainty for users with low Agreeableness, low Neuroticism, high Extraversion, high Conscientiousness, and high Openness, whereas under the high cognitive load condition, high Neuroticism and low Extraversion benefitted trust without the uncertainty presentation.
Abstract: Data analytics-driven solutions are widely used in various intelligent systems, where humans and machines make decisions collaboratively based on predictions. Human factors such as personality and trust have significant effects on such human–machine collaborations. This paper investigates the effects of personality traits on user trust in human–machine collaborations under uncertainty and cognitive load conditions. A user study of 42 subjects in a repeated factorial design experiment found that uncertainty presentation led to increased trust, but only under low cognitive load conditions when users had sufficient cognitive resources to process the information. Presentation of uncertainty under high load conditions led to a decrease in trust. When further drilling down into personality trait groups, users with low Openness showed the highest trust overall. Furthermore, under the low cognitive load condition, trust was enhanced under ambiguity uncertainty for users with low Agreeableness, low Neuroticism, high Extraversion, high Conscientiousness, and high Openness. Under the high cognitive load condition, high Neuroticism and low Extraversion benefitted trust without the uncertainty presentation. The results demonstrated that different personality traits affected trust differently under uncertainty and cognitive load conditions. A user trust feedback loop framework was set up to incorporate the study results into human–machine collaborations for meaningful participatory design.

17 citations


Journal ArticleDOI
TL;DR: Views on multi-modal human–robot interfaces of older people living independently are compared with those of students and university staff, finding that older people's technology needs have differences from, and similarities to, those of the younger people who are likely to be carrying out the research.
Abstract: Numerous projects, normally run by younger people, are exploring robot use by older people. But are older people any different from younger people in the way they want to interact with robots? Understanding older compared to younger people's preferences will give researchers more insight into good design. We compared views on multi-modal human–robot interfaces of older people living independently with those of students and university staff. We showed 96 participants aged under 65 and 18 aged 65+ six videos presenting different scenarios, including interfaces both working properly and failing, of an older man interacting with a robot by speech and by touch screen tablet. Participants were asked about the interfaces they might use and why, using self-completed questionnaires with mainly open-ended questions. People over 65 were more like people under 21 than those aged 22–64 (78%, 67%, and 47% respectively) in preferring speech over tablet for robot–human interaction. But reasons for doing so may differ, for example, hearing and eyesight impairment versus speaking while hands are full. Older participants were more likely (83% vs. 55%) to want a robot in the house than those under 65. Older people were as familiar with tablets and smart speakers as younger people, but less likely to use smart phones. Some younger people suggested interacting with the robot via their smart phones, including while not at home. Answers to similar questions about preferences for robot interaction varied according to position in the questionnaire. User-centred design of human–robot interfaces should include open questions to understand people's preferences, should account for question wording and order in interpreting user preferences, and should include people of all age ranges to better understand interface use. Older people's technology needs have differences from and similarities to those of the younger people who are likely carrying out the research. Our sample of older people were more like people under 21 than those aged in between in their preference for robot–human interaction, and more willing to have a robot in the home than younger people. Differences may come from a more home-based lifestyle and difficulties with vision, hearing, or dexterity rather than lack of interest in technology.

17 citations


Journal ArticleDOI
TL;DR: The psychophysical abilities and limitations of the auditory and vibrotactile modalities are discussed, and the differential sensitivity for suprathreshold signals is compared.
Abstract: In this paper, the psychophysical abilities and limitations of the auditory and vibrotactile modalities are discussed. A direct comparison reveals similarities and differences; knowledge of these is the basis for the design of perceptually optimized auditory-tactile human–machine interfaces or multimodal music applications. Literature data and our own results on psychophysical characteristics are summarized. An overview of the absolute perception thresholds of both modalities is given. The main factors which influence these thresholds are discussed: age, energy integration, masking and adaptation. Subsequently, the differential sensitivity (discrimination of intensity, frequency, temporal aspects and location) for suprathreshold signals is compared.

Journal ArticleDOI
TL;DR: Results show that having both music and dance led to higher accuracy scores for most target emotions, compared to music or dance conditions alone.
Abstract: Sonification has the potential to communicate a variety of data types to listeners including not just cognitive information, but also emotions and aesthetics. The goal of our dancer sonification project is to “sonify emotions as well as motions” of a dance performance via musical sonification. To this end, we developed and evaluated sonification strategies for adding a layer of emotional mappings to data sonification. Experiment 1 developed and evaluated four musical sonifications (i.e., sin-ification, MIDI-fication, melody module, and melody and arrangement module) to see their emotional effects. Videos were recorded of a professional dancer interacting with each of the four musical sonification strategies. Forty-eight participants provided ratings of musicality, emotional expressivity, and sound-motion/emotion compatibility via an online survey. Results suggest that increasing musical mappings led to higher ratings for each dimension for dance-type gestures. Experiment 2 used the musical sonification framework to develop four sonification scenarios that aimed to communicate a target emotion (happy, sad, angry, and tender). Thirty participants compared four interactive sonification scenarios with four pre-composed dance choreographies featuring the same musical and gestural palettes. Both forced choice and multi-dimensional emotional evaluations were collected, as well as motion/emotion compatibility ratings. Results show that having both music and dance led to higher accuracy scores for most target emotions, compared to music or dance conditions alone. These findings can contribute to the fields of movement sonification, algorithmic music composition, as well as affective computing in general, by describing strategies for conveying emotion through sound.

Journal ArticleDOI
TL;DR: The study shows that the polarity design had the highest accuracy rates in the detection task, whereas the stethoscope sonification obtained the best score in the classification assignment; concerning aesthetics, the water ambience sonification was regarded as the most pleasant.
Abstract: This paper presents the design and evaluation of four sonification methods to support monitoring and diagnosis in Electrocardiography (ECG). In particular we focus on an ECG abnormality called ST-elevation, which is an important indicator of a myocardial infarction. Since myocardial infarction represents a life-threatening condition, it is essential to detect an ST-elevation as early as possible. As part of the evaluated sound designs, we propose two novel sonifications: (i) Polarity sonification, a continuous parameter-mapping sonification using a formant synthesizer, and (ii) Stethoscope sonification, a combination of the ECG signal and a stethoscope recording. The other two designs, (iii) the water ambience sonification and (iv) the morph sonification, were presented in our previous work about ECG sonification (Aldana Blanco AL, Steffen G, Thomas H (2016) In: Proceedings of Interactive Sonification Workshop (ISon). Bielefeld, Germany). The study evaluates three components across the proposed sonifications: (1) detection performance, meaning whether participants are able to detect a transition from healthy to unhealthy states, (2) classification accuracy, which evaluates whether participants can accurately classify the severity of the pathology, and (3) aesthetics and usability (pleasantness, informativeness and long-term listening). The study results show that the polarity design had the highest accuracy rates in the detection task whereas the stethoscope sonification obtained the best score in the classification assignment. Concerning aesthetics, the water ambience sonification was regarded as the most pleasant. Furthermore, we found a significant difference between sound/music experts and non-experts in terms of the error rates obtained in the detection task using the morph sonification and also in the classification task using the stethoscope sonification. Overall, the group of experts obtained lower error rates than the group of non-experts, which means that further training could improve accuracy rates; particularly for designs that rely mainly on pitch variations, additional training is needed in the non-experts group.
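As a generic illustration of continuous parameter-mapping sonification (not the paper's formant-based polarity design or its clinical scaling), the sketch below maps a per-beat ST-deviation value onto the pitch of a short decaying tone; the deviation range, pitch range, and beat duration are assumptions.

```python
# Generic parameter-mapping sonification sketch: map an ST-deviation value per
# beat to the pitch of a short tone (illustrative values, not the paper's design).
import numpy as np
from scipy.io import wavfile

def sonify_st_deviation(st_mm, fs=44100, beat_dur=0.5, f_low=220.0, f_high=880.0):
    """st_mm: ST-segment deviation per beat in millimetres (assumed 0..3 mm range)."""
    out = []
    for d in st_mm:
        # Linear mapping of deviation onto a pitch range.
        f = f_low + (f_high - f_low) * np.clip(d / 3.0, 0.0, 1.0)
        t = np.linspace(0, beat_dur, int(fs * beat_dur), endpoint=False)
        tone = 0.3 * np.sin(2 * np.pi * f * t) * np.exp(-3 * t)   # decaying tone per beat
        out.append(tone)
    return np.concatenate(out)

audio = sonify_st_deviation([0.0, 0.5, 1.5, 3.0])                 # rising ST-elevation
wavfile.write("st_sonification.wav", 44100, (audio * 32767).astype(np.int16))
```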

Journal ArticleDOI
TL;DR: This research is the first to actively involve pilots in the exploration of gaze-based interactions in the cockpit, and the results build the foundation for future research, because they not only reflect pilots’ attitudes towards this novel technology, but also provide an overview of situations in which pilots need gaze- based interactions.
Abstract: Flying an aircraft is a mentally demanding task where pilots must process a vast amount of visual, auditory and vestibular information. They have to control the aircraft by pulling, pushing and turning different knobs and levers, while knowing that mistakes in doing so can have fatal outcomes. Therefore, attempts to improve and optimize these interactions should not increase pilots’ mental workload. By utilizing pilots’ visual attention, gaze-based interactions provide an unobtrusive solution to this. This research is the first to actively involve pilots in the exploration of gaze-based interactions in the cockpit. By distributing a survey among 20 active commercial aviation pilots working for an internationally operating airline, the paper investigates pilots’ perception and needs concerning gaze-based interactions. The results build the foundation for future research, because they not only reflect pilots’ attitudes towards this novel technology, but also provide an overview of situations in which pilots need gaze-based interactions.

Journal ArticleDOI
TL;DR: This article describes one such framework, the embodied sonification listening model, which provides a theoretical description of sonification listening in terms of conceptual metaphor theory.
Abstract: This is a theoretical paper that considers the mapping problem, a foundational issue which arises when designing a sonification, as it applies to sonic information design. We argue that this problem can be addressed by using models from the field of embodied cognitive science, including embodied image schema theory, conceptual metaphor theory and conceptual blends, and from research which treats sound and musical structures using these models, when mapping data to sound. However, there are currently very few theoretical frameworks for applying embodied cognition principles in a sonic information design context. This article describes one such framework, the embodied sonification listening model, which provides a theoretical description of sonification listening in terms of conceptual metaphor theory.

Journal ArticleDOI
TL;DR: This paper evaluates cost-effective and portable solutions allowing for independent control of frequency and amplitude over a wide frequency bandwidth with low harmonic distortion, so that flexible and high-quality vibrotactile feedback can be displayed, and compares the results of equalization by performing sinesweep measurements on the implementation.
Abstract: The integration of vibrotactile feedback in digital music instruments (DMIs) is thought to improve the instrument's response and make it more suitable for expert musical interactions. However, given the extreme requirements of musical performances, there is a need for solutions allowing for independent control of frequency and amplitude over a wide frequency bandwidth (40–1000 Hz) with low harmonic distortion, so that flexible and high-quality vibrotactile feedback can be displayed. In this paper, we evaluate cost-effective and portable solutions that meet these requirements. We first measure the magnitude–frequency and harmonic distortion characteristics of two vibrotactile actuators, where the harmonic distortion is quantified in the form of total harmonic distortion (THD). The magnitude–frequency and THD characteristics in two unloaded cases (actuator suspended freely or placed on a sandbag) are observed to be largely identical, with minor attenuation for actuators placed on the sandbag. Loading the actuator (when placed in a DMI) brings resonant features to its magnitude–frequency characteristics, increasing the output THD and imposing a dampening effect. To equalize the system's frequency response, an autoregressive method that automatically estimates minimum-phase filter parameters is introduced, which, by design, remains stable upon inversion. A practical use of this method is demonstrated by implementing vibrotactile feedback in the polyvinyl chloride chassis of an unfinished DMI, the t-Stick. We finally compare the results of equalization by performing sinesweep measurements on the implementation and discuss the degree of equalization achieved using it.
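The THD metric used above to quantify harmonic distortion can be computed from a single-sine measurement as the ratio of harmonic to fundamental magnitude; the sketch below is a minimal illustration of that calculation, not the paper's measurement chain, and the window, harmonic count, and test signal are assumptions.

```python
# Sketch: total harmonic distortion (THD) of a measured actuator response to a
# single sine, as the ratio of harmonic power to fundamental power.
import numpy as np

def thd(signal, fs, f0, n_harmonics=5):
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    def peak(f):                         # magnitude at the bin nearest frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]
    fundamental = peak(f0)
    harmonics = [peak(k * f0) for k in range(2, n_harmonics + 2)]
    return np.sqrt(np.sum(np.square(harmonics))) / fundamental

# Example: a 250 Hz drive with an artificial 2nd harmonic gives a THD of about 0.1.
fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 250 * t) + 0.1 * np.sin(2 * np.pi * 500 * t)
print(f"THD: {thd(x, fs, 250):.3f}")
```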

Journal ArticleDOI
TL;DR: This paper presents a Movement Expectation Sonification Model, based on theories of motor-feedback and expectation, to explore how musical sonification can impact the way people perceive their movement, and presents a study that validates the predictions of this model.
Abstract: When designing movement sonifications, their effect on people’s movement must be considered. Recent work has shown how real-time sonification can be designed to alter the way people move. However, the mechanisms through which these sonifications alter people’s expectations of their movement is not well explained. This is especially important when considering musical sonifications, to which people bring their own associations and musical expectation, and which can, in turn, alter their perception of the sonification. This paper presents a Movement Expectation Sonification Model, based on theories of motor-feedback and expectation, to explore how musical sonification can impact the way people perceive their movement. Secondly, we present a study that validates the predictions of this model by exploring how harmonic stability within sonification interacts with contextual cues in the environment to impact movement behaviour and perceptions. We show how musical expectancy can be built to either reward or encourage movement, and how such an effect is mediated through the presence of additional cues. This model offers a way for sonification designers to create movement sonifications that not only inform movement but can be used to encourage progress and reward successes.

Journal ArticleDOI
TL;DR: A new web-based system specially adapted to the education of Czech pupils with visual impairment, designed to enable teachers to create and manage teaching materials, shows a positive reception and frequent use as well as a preference over classical educational materials.
Abstract: This paper describes a new web-based system specially adapted to the education of Czech pupils with visual impairment. The system integrates speech and language technologies with a web framework in lower secondary education, especially in mathematics and physics subjects. A new interface utilizes text-to-speech (TTS) synthesis for online automatic reading of educational texts. The interface provides several TTS voices, synthesized data caching, and automatic processing of formulas in mathematics and physics. The system was designed to enable teachers to create and manage teaching materials. It also enables the pupils to view and listen to the read forms of these documents online. A school for pupils with visual impairment participated in the development and implementation of the system. After one year of daily use, user experience and evaluation data were collected. The results indicate a positive reception and frequent use of the system as well as a preference over classical educational materials.

Journal ArticleDOI
TL;DR: The authors' automated social skills training is extended by considering user listening skills during conversations with computer agents; the number of nods and backchannels within the utterances contributes to the predictions.
Abstract: Listening skills are critical for human communication. Social skills training (SST), performed by human trainers, is a well-established method for obtaining appropriate skills in social interaction. Previous work automated the process of social skills training by developing a dialogue system that teaches speaking skills through interaction with a computer agent. Even though that work considered speaking skills, the SST framework incorporates other skills, such as listening, asking questions, and expressing discomfort. In this paper, we extend our automated social skills training by considering user listening skills during conversations with computer agents. We prepared two scenarios, Listening 1 and Listening 2, which respectively assume small talk and job training. A female agent spoke to the participants about a recent story and about how to make a telephone call, and the participants listened. We recorded data from 27 Japanese graduate students who interacted with the agent. Two expert external raters assessed the participants' listening skills. We manually extracted features that might be related to the participants' eye fixations and behavioral cues and confirmed that a simple linear regression with selected features correctly predicted listening skills with a correlation coefficient above 0.50 in both scenarios. The number of nods and backchannels within the utterances contributes to the predictions: using just these two features predicted listening skills with a correlation coefficient above 0.43. Since these two features are easier for users to understand, we plan to integrate them into the framework of automated social skills training.
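A minimal sketch of the kind of two-feature regression reported above (nod and backchannel counts predicting a rated listening skill); the data generated here are synthetic placeholders, not the study's recordings or ratings.

```python
# Sketch: predict externally rated listening skill from two behavioral counts
# (nods, backchannels) with ordinary linear regression; the data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
nods = rng.integers(0, 20, size=27)                 # per-participant nod count
backchannels = rng.integers(0, 30, size=27)         # per-participant backchannel count
skill = 2.0 + 0.08 * nods + 0.05 * backchannels + rng.normal(0, 0.3, 27)  # synthetic ratings

X = np.column_stack([nods, backchannels])
model = LinearRegression().fit(X, skill)
predicted = model.predict(X)
r = np.corrcoef(predicted, skill)[0, 1]             # correlation between predicted and rated skill
print(f"correlation: {r:.2f}")
```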

Journal ArticleDOI
TL;DR: Two novel techniques are proposed that use the audio respiration signal captured by a standard microphone placed near the mouth together with supervised machine learning algorithms; the results confirm that it is possible to infer information about movement qualities from respiration audio.
Abstract: In this paper, we explore how the audio respiration signal can contribute to multimodal analysis of movement qualities. To this aim, we propose two novel techniques which use the audio respiration signal captured by a standard microphone placed near the mouth and supervised machine learning algorithms. The first approach consists of the classification of a set of acoustic features extracted from the exhalations of a person performing fluid or fragmented movements. In the second approach, the intrapersonal synchronization between the respiration and the kinetic energy of body movements is used to distinguish the same qualities. First, the value of synchronization between modalities is computed using the Event Synchronization algorithm. Next, a set of features computed from the value of synchronization is used as input to machine learning algorithms. Both approaches were applied to a multimodal corpus composed of short performances by three professionals performing fluid and fragmented movements. The total duration of the corpus is about 17 min. The highest F-score (0.87) for the first approach was obtained for the binary classification task using Support Vector Machines (SVM-LP). The best result for the same task using the second approach was obtained using the Naive Bayes algorithm (F-score of 0.72). The results confirm that it is possible to infer information about movement qualities from respiration audio.
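For illustration, a simplified fixed-window version of the Event Synchronization measure (after Quian Quiroga et al.) applied to event times that might be extracted from respiration audio and body kinetic energy; the window length and example event times are assumptions, not values from the corpus.

```python
# Simplified Event Synchronization between two event-time series, e.g. exhalation
# onsets (from respiration audio) and kinetic-energy peaks (from motion capture).
import numpy as np

def event_synchronization(tx, ty, tau=0.25):
    """tx, ty: sorted event times in seconds; returns a value in [0, 1]."""
    def count(a, b):
        c = 0.0
        for ta in a:
            for tb in b:
                d = ta - tb
                if 0 < d <= tau:
                    c += 1.0          # event in a shortly after event in b
                elif d == 0:
                    c += 0.5          # simultaneous events count half
        return c
    denom = np.sqrt(len(tx) * len(ty))
    return (count(tx, ty) + count(ty, tx)) / denom if denom else 0.0

breath_events = [0.9, 2.1, 3.4, 4.8]       # assumed exhalation onsets (s)
energy_events = [1.0, 2.0, 3.6, 5.5]       # assumed kinetic-energy peaks (s)
print(f"Q = {event_synchronization(breath_events, energy_events):.2f}")
```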

Journal ArticleDOI
TL;DR: It is argued that artistic works created in this field over the last 20 years—and those yet to come—may be of significant importance to the haptics community as new objects that question physicality, tangibility, and creativity from a fresh and rather singular angle.
Abstract: The nature of human/instrument interaction is a long-standing area of study, drawing interest from fields as diverse as philosophy, cognitive sciences, anthropology, human–computer-interaction, and artistic creation. In particular, the case of the interaction between performer and musical instrument provides an enticing framework for studying the instrumental dynamics that allow for embodiment, skill acquisition and virtuosity with (electro-)acoustical instruments, and questioning how such notions may be transferred into the realm of digital music technologies and virtual instruments. This paper offers a study of concepts and technologies allowing for instrumental dynamics with Digital Musical Instruments, through an analysis of haptic-audio creation centred on (a) theoretical and conceptual frameworks, (b) technological components—namely physical modelling techniques for the design of virtual mechanical systems and force-feedback technologies allowing mechanical coupling with them, and (c) a corpus of artistic works based on this approach. Through this retrospective, we argue that artistic works created in this field over the last 20 years—and those yet to come—may be of significant importance to the haptics community as new objects that question physicality, tangibility, and creativity from a fresh and rather singular angle. Following which, we discuss the convergence of efforts in this field, challenges still ahead, and the possible emergence of a new transdisciplinary community focused on multisensory digital art forms.

Journal ArticleDOI
TL;DR: This special issue gathers extended versions of a selection of papers presented at the 2019 HAID workshop, focusing on several directions of research on haptics, audio and HCI, including perceptual studies and the design, evaluation and use of vibrotactile and force-feedback devices in audio, musical and game applications.
Abstract: Haptics, audio and human–computer interaction are three scientific disciplines that share interests, issues and methodologies. Despite these common points, interactions between these communities are sparse, because each of them has its own publication venues, meeting places, etc. A venue to foster interaction between these three communities was created in 2006: the Haptic and Audio Interaction Design workshop (HAID), aiming to provide a meeting place for researchers in these areas. HAID was held yearly from 2006 to 2013, then discontinued. Having worked at the intersection of these areas for several years, we felt the need to revive this event and decided to organize a HAID edition in 2019 in Lille, France. HAID 2019 was attended by more than 100 university, industry and artistic researchers and practitioners, showing the continued interest in such a unique venue. This special issue gathers extended versions of a selection of papers presented at the 2019 workshop. These papers focus on several directions of research on haptics, audio and HCI, including perceptual studies and the design, evaluation and use of vibrotactile and force-feedback devices in audio, musical and game applications.

Journal ArticleDOI
TL;DR: The lesson learned was that various material and physical properties of virtual buttons can be successfully rendered by characteristic frequency and decay cues if correctly reproduced by the device.
Abstract: An experiment is described that tested the possibility of classifying wooden, plastic, and metallic objects based on reproduced auditory and vibrotactile stimuli. The results show that recognition rates are considerably above chance level with either unimodal auditory or vibrotactile feedback. Supported by those findings, the possibility of rendering virtual buttons for professional appliances with different tactile properties was tested. To this end, a touchscreen device was provided with various types of vibrotactile feedback in response to the sensed pressing force and location of a finger. Different virtual button designs were tested by user panels who performed a subjective evaluation of perceived tactile properties and materials. In a first implementation, virtual buttons were designed reproducing the vibration recordings of real materials used in the classification experiment: mainly due to hardware limitations of our prototype and the consequent impossibility of rendering complex vibratory signals, this approach did not prove successful. A second implementation was then optimized for the device capabilities, moreover introducing surface compliance effects and button release cues: the new design led to generally high quality ratings, clear discrimination of different buttons and unambiguous material classification. The lesson learned was that various material and physical properties of virtual buttons can be successfully rendered by characteristic frequency and decay cues if correctly reproduced by the device.
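As a minimal sketch of the lesson stated above, the code below renders a button "click" as an exponentially decaying sinusoid whose frequency and decay constant differ per material; the material parameters and output format are illustrative assumptions, not the recordings or the prototype driver used in the paper.

```python
# Sketch: render "material" cues for a virtual button as an exponentially decaying
# sinusoid with a characteristic frequency and decay time; values are illustrative.
import numpy as np
from scipy.io import wavfile

MATERIALS = {                      # (frequency in Hz, decay time constant in s), assumed values
    "wood":    (100.0, 0.030),
    "plastic": (180.0, 0.020),
    "metal":   (300.0, 0.120),
}

def button_click(material, fs=8000, dur=0.25, force=1.0):
    f, tau = MATERIALS[material]
    t = np.arange(int(fs * dur)) / fs
    # Pressing force could scale the amplitude; frequency and decay carry the material identity.
    return force * np.sin(2 * np.pi * f * t) * np.exp(-t / tau)

wavfile.write("metal_button.wav", 8000, (button_click("metal") * 32767).astype(np.int16))
```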

Journal ArticleDOI
TL;DR: This special issue covers a diverse collection of approaches to auditory displays, involving art, design, science, and research, and aims to provide the state of the art of auditory display research and auditory user interface design.
Abstract: For almost three decades, research on auditory displays and sonification has advanced considerably. Now, the auditory display community has arrived at the stage of sonic information design, with a more systematic, refined approach that goes beyond random mappings between referents and sounds. Due to the innate transdisciplinary nature of auditory display, it would be difficult to unify the methods used to study it. This special issue covers a diverse collection of approaches to auditory displays, involving art, design, science, and research. Accordingly, the works in the present special issue include new theories, frameworks, methods, and applications for auditory displays and auditory user interfaces. We hope that this special issue can provide the state of the art of auditory display research and auditory user interface design, offering fresh inspiration and motivation to researchers and designers for their future work.

Journal ArticleDOI
TL;DR: The design process, implementation and user involvement in the design of an exergame app that is specifically targeted to stroke survivors are outlined, and the lessons learned during the design process are presented.
Abstract: Persons who have survived a stroke might lower the risk of recurrent strokes by adopting a healthier lifestyle with more exercise. One way to promote exercising is through fitness or exergame apps for mobile phones. Health and fitness apps are used by a significant portion of consumers, but these apps are not targeted to stroke survivors, who may experience cognitive limitations (like fatigue and neglect), mobility problems due to hemiplegia, and balance problems. We outline the design process, implementation and user involvement in the design of an exergame app that is specifically targeted to stroke survivors, and present the lessons learned during the design process.

Journal ArticleDOI
TL;DR: A sonification method based on a combination of single-side-band modulation and a pitch modulation of the original data stream allows pure audification to be expanded in a flexible way; a procedure for parameter optimization is suggested to achieve an optimal listening range for any data set, adjusted to human speech.
Abstract: We present a sonification method which we call Focused Audification (FA; previously: Augmented Audification) that allows pure audification to be expanded in a flexible way. It is based on a combination of single-side-band modulation and a pitch modulation of the original data stream. Based on two free parameters, the sonification's frequency range is adjustable to the human hearing range and allows the listener to interactively zoom into the data set at any scale. The parameters were adjusted by laypeople in a multimodal experiment on cardiac data. Following these results, we suggest a procedure for parameter optimization to achieve an optimal listening range for any data set, adjusted to human speech.
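A compact sketch of the two building blocks named above, a single-sideband frequency shift via the analytic (Hilbert) signal plus a crude resampling-based pitch change; the shift frequency, pitch factor, and test signal are placeholders, not the optimized parameters from the experiment.

```python
# Sketch of single-sideband frequency shifting combined with a simple pitch change,
# applied to a slowly varying data stream; parameter values are placeholders.
import numpy as np
from scipy.signal import hilbert, resample

def focused_audification_sketch(data, fs, shift_hz=200.0, pitch_factor=4.0):
    # Pitch modulation: play the data stream faster by resampling to fewer samples.
    sped_up = resample(data, max(2, int(len(data) / pitch_factor)))
    # Single-sideband shift: multiply the analytic signal by a complex exponential
    # and keep the real part, moving the whole spectrum up by shift_hz.
    t = np.arange(len(sped_up)) / fs
    analytic = hilbert(sped_up)
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

fs = 1000                                                     # e.g. a 1 kHz-sampled signal
signal = np.sin(2 * np.pi * 1.2 * np.arange(10 * fs) / fs)    # ~72 bpm placeholder waveform
audio = focused_audification_sketch(signal, fs)
```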

Journal ArticleDOI
TL;DR: Results indicated that the AC–BC relationship is not unique and depends on sound intensity, frequency, and BC transducer location; results also support similar findings indicating that the Mastoid and Condyle locations can be considered interchangeable in terms of their frequency-related sensitivity, while the Forehead was found to be considerably less sensitive than the other locations.
Abstract: The term 'multimodal' typically refers to the combination of two or more sensory modalities; however, through the advancement of technology, modality variations within specific sensory systems are being discovered and compared in regard to physiological perception and response. The ongoing evaluation of air vs bone conduction auditory perception modalities is one such comparison. Despite an increased awareness of the potential benefits of utilizing bone conduction pathways, a complete understanding of the human auditory system, more specifically, the relationship between air conducted and bone conducted sound, remains a critical deficiency hindering the development of advanced multimodal auditory displays. Conduction equivalency ratios (CERs), defined as the difference in sound intensity levels (in dB) between equally loud signals transmitted in air conduction (AC; sound field) and bone conduction (BC) modes, provide a link between these two modes of hearing by determining the relationship between the spectral content of AC and BC sound. The current report aims to describe, in depth, the establishment of such CERs at three BC transducer contact locations on a listener's head over a range of audible frequencies presented at three signal intensities within a controlled free-field listening environment. Results indicated that the AC–BC relationship is not unique and depends on sound intensity, frequency, and BC transducer location. In addition, in terms of head sensitivity, the results support similar findings which indicate that the Mastoid and Condyle locations can be considered interchangeable in terms of their frequency-related sensitivity, while the Forehead was found to be considerably less sensitive than the other locations.
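Written as a formula, the CER definition given above amounts to a level difference at equal loudness; the symbols below are illustrative, not the paper's notation.

```latex
% CER as described in the abstract: the level difference (in dB) between an
% air-conducted and a bone-conducted signal judged equally loud; it varies with
% frequency f, presentation level L, and BC transducer location.
\[
  \mathrm{CER}(f, L, \mathrm{loc}) = L_{\mathrm{AC}}(f) - L_{\mathrm{BC}}(f, \mathrm{loc})
  \quad \text{at equal loudness}
\]
```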

Journal ArticleDOI
TL;DR: This study investigates the effect of direct manipulation and integrality of interaction techniques in 3D audio production tools by classifying the atomic tasks of a general composite task of authoring 3D audio trajectories and then evaluating different interaction techniques across these tasks.
Abstract: With the popularity of immersive media, developing usable tools for content development is important for the production process. In the context of 3D audio production, user interfaces for authoring and editing 3D audio trajectories enable content developers, composers, practitioners, and recording and mixing engineers to define how audio sources travel in time. However, common interaction techniques in 3D audio production tools can make the workflow of this task tedious and difficult to accomplish. This study investigates this problem by classifying the atomic tasks (spatially and temporally) of a general composite task of authoring 3D audio trajectories and then evaluating different interaction techniques across these tasks. Common graphical user interfaces were compared with input devices having varying degrees of freedom for spatial atomic tasks in order to investigate the effect of direct manipulation and integrality of interaction techniques. Continuous and discrete interaction techniques were compared for temporal tasks in order to investigate the effect of direct manipulation. Results suggest that interaction techniques with high degrees of integrality and direct manipulation reduce task completion time compared to standard GUI techniques. The design of temporal tasks can create a visual bias, and discrete-time controls can be a suitable method for traversing a small number of control points. These results and further observations provide directions for the study of interaction technique design for 3D audio tools, which in turn should improve workflows of 3D audio content creation.

Journal ArticleDOI
TL;DR: This work evaluates the practicality of state-of-the-art signal-based HCI research in terms of the following six aspects: granularity, robustness, usability, efficiency, stability, and deployability.
Abstract: Wi-Fi and acoustic signal-based human–computer interaction (HCI) methods have received growing attention in academia. However, there are still issues to be addressed despite their flourishing. In this work, we evaluate the practicality of state-of-the-art signal-based HCI research in terms of the following six aspects: granularity, robustness, usability, efficiency, stability, and deployability. The paper presents our analysis results, observations and prospective research directions. We believe that this work will serve as a standard for future signal-based HCI research in assessing the practicality of newly developed methods.