
Showing papers on "Crossmodal" published in 2023


Journal ArticleDOI
TL;DR: Zhang et al. propose a target and source Modality Co-Reinforcement (MCR) approach to achieve sufficient crossmodal interaction and fusion at different granularities.
Abstract: Perceiving human emotions from a multimodal perspective has received significant attention in knowledge engineering communities. Due to the variable receiving frequency for sequences from various modalities, multimodal streams usually have an inherent asynchronous challenge. Most previous methods performed manual sequence alignment before multimodal fusion, which ignored long-range dependencies among modalities and failed to learn reliable crossmodal element correlations. Inspired by the human perception paradigm, we propose a target and source Modality Co-Reinforcement (MCR) approach to achieve sufficient crossmodal interaction and fusion at different granularities. Specifically, MCR introduces two types of target modality reinforcement units to reinforce the multimodal representations jointly. These target units effectively enhance emotion-related knowledge exchange in fine-grained interactions and capture the crossmodal elements that are emotionally expressive in mixed-grained interactions. Moreover, a source modality update module is presented to provide meaningful features for the crossmodal fusion of target modalities. Eventually, the multimodal representations are progressively reinforced and improved via the above components. Comprehensive experiments are conducted on three multimodal emotion understanding benchmarks. Quantitative results show that MCR significantly outperforms the previous state-of-the-art methods in both word-aligned and unaligned settings. Additionally, qualitative analysis and visualization fully demonstrate the superiority of the proposed modules.

14 citations
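The reinforcement units described above build on crossmodal attention, in which a target modality queries a source modality and folds the retrieved information back into its own representation. The PyTorch sketch below illustrates that general mechanism only; the class name, gating scheme, and dimensions are illustrative assumptions, not the MCR authors' code.

```python
import torch
import torch.nn as nn

class CrossmodalReinforcementUnit(nn.Module):
    """Minimal sketch: the target modality attends to the source modality and
    is updated through a gated residual connection."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, T_t, dim); source: (batch, T_s, dim).
        # The two sequences may be unaligned and of different lengths.
        attended, _ = self.attn(query=target, key=source, value=source)
        g = self.gate(torch.cat([target, attended], dim=-1))
        return self.norm(target + g * attended)

# Example: reinforce a 50-token text stream with 400 unaligned audio frames.
text, audio = torch.randn(8, 50, 128), torch.randn(8, 400, 128)
reinforced_text = CrossmodalReinforcementUnit(128)(text, audio)
```

Because the attention operates over full sequences, no manual word-level alignment between modalities is required, which is the property exercised by the unaligned setting.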


Journal ArticleDOI
TL;DR: This article found that temporal synchrony and enhanced multisensory associative learning capabilities first guide causal inference and initiate early, coarse multisensory integration capabilities.

2 citations


Journal ArticleDOI
TL;DR: The authors propose a Text-centered Fusion Network with crossmodal Attention (TeFNA), which takes the text modality as the primary modality to improve the representation of fusion features.
Abstract: Multimodal sentiment analysis (MSA), which goes beyond the analysis of texts to include other modalities such as audio and visual data, has attracted a significant amount of attention. An effective fusion of sentiment information in multiple modalities is key to improving the performance of MSA. However, aligning multiple modalities during the process of fusion faces challenges such as maintaining modal-specific information. This paper proposes a Text-centered Fusion Network with crossmodal Attention (TeFNA), a multimodal fusion network that uses crossmodal attention to model unaligned multimodal timing information. In particular, TeFNA employs a Text-Centered Aligned fusion method (TCA) that takes text modality as the primary modality to improve the representation of fusion features. In addition, TeFNA maximizes the mutual information between modality pairs to maintain task-related emotional information, thereby ensuring that the key information of modalities from input to fusion is preserved. The results of our comprehensive experiments on the multimodal datasets of CMU-MOSI and CMU-MOSEI show that our proposed model outperforms existing methods on most of the metrics used.

1 citation
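One distinctive ingredient of TeFNA is the maximization of mutual information between modality pairs. A common way to do this in practice is to minimize an InfoNCE-style contrastive loss, whose negative is a lower bound on the mutual information; the sketch below shows that generic estimator as a plausible stand-in, not necessarily the exact estimator used in the paper.

```python
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(text_repr: torch.Tensor, audio_repr: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss over a batch of paired modality representations.
    Matching items are positives; every other pairing acts as a negative."""
    text = F.normalize(text_repr, dim=-1)    # (batch, dim)
    audio = F.normalize(audio_repr, dim=-1)  # (batch, dim)
    logits = text @ audio.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(text.size(0), device=text.device)
    # Symmetric cross-entropy: text-to-audio and audio-to-text retrieval.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

loss = infonce_mi_lower_bound(torch.randn(32, 128), torch.randn(32, 128))
```

Minimizing this loss pulls paired representations together and thereby raises a lower bound on their mutual information.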


Journal ArticleDOI
TL;DR: A growing body of experimental research now demonstrates that neurologically normal individuals associate different taste qualities with design features such as curvature, symmetry, orientation, texture, and movement.
Abstract: A growing body of experimental research now demonstrates that neurologically normal individuals associate different taste qualities with design features such as curvature, symmetry, orientation, texture and movement. The form of everything from the food itself through to the curvature of the plateware on which it happens to be served, and from glassware to typeface, not to mention the shapes of/on food product packaging have all been shown to influence people's taste expectations, and, on occasion, also their taste/food experiences. Although the origins of shape-taste and other form-taste crossmodal correspondences have yet to be fully worked out, it would appear that shape qualities are occasionally elicited directly. However, more often, there may be a metaphorical attempt to translate the temporal qualities of taste sensations into a spatial analogue. At the same time, emotional mediation may sometimes also play a role in the affinity people experience between shape properties and taste. And finally, it should be acknowledged that associative learning of the relation between packaging shapes, glassware shapes, logos, labels and iconic food forms that commonly co-occur with specific taste properties (i.e., in the case of branded food products) may also play an important role in determining the nature of shape-taste correspondences. Ultimately, however, any attempt to use such shape-taste correspondences to nudge people's behaviour/perception in the real world is made challenging due to the fact that shape properties are associated with multiple qualities, and not just taste.

1 citation


Journal ArticleDOI
06 Apr 2023-PLOS ONE
TL;DR: In this article, the authors investigated the association between musical expertise and the processing of audiovisual cross-modal correspondences in a decision reaction-time task and found that musicians were significantly more accurate in their responses than non-musicians.
Abstract: Numerous studies have reported both cortical and functional changes for visual, tactile, and auditory brain areas in musicians, which have been attributed to long-term training-induced neuroplasticity. Previous investigations have reported advantages for musicians in multisensory processing at the behavioural level; however, multisensory integration with tasks requiring higher level cognitive processing has not yet been extensively studied. Here, we investigated the association between musical expertise and the processing of audiovisual crossmodal correspondences in a decision reaction-time task. The visual display varied in three dimensions (elevation, symbolic and non-symbolic magnitude), while the auditory stimulus varied in pitch. Congruency was based on a set of newly learned abstract rules: “The higher the spatial elevation, the higher the tone”, “the more dots presented, the higher the tone”, and “the higher the number presented, the higher the tone”, and accuracy and reaction times were recorded. Musicians were significantly more accurate in their responses than non-musicians, suggesting an association between long-term musical training and audiovisual integration. Contrary to what was hypothesized, no differences in reaction times were found. The musicians’ advantage on accuracy was also observed for rule-based congruency in seemingly unrelated stimuli (pitch-magnitude). These results suggest an interaction between implicit and explicit processing, as reflected in reaction times and accuracy, respectively. This advantage was generalised to congruency in otherwise unrelated stimuli (pitch-magnitude pairs), suggesting an advantage on processes requiring higher order cognitive functions. The results support the notion that accuracy and latency measures may reflect different processes.

1 citation


Journal ArticleDOI
TL;DR: In this paper, the authors show that spiking activity coherently represents a location-specific mapping across auditory cortex (AC) and lateral, secondary visual cortex (V2L) of freely moving rats engaged in a sensory detection task on a figure-8 maze.
Abstract: Neurons in primary visual cortex (V1) may not only signal current visual input but also relevant contextual information such as reward expectancy and the subject’s spatial position. Such contextual representations need not be restricted to V1 but could participate in a coherent mapping throughout sensory cortices. Here, we show that spiking activity coherently represents a location-specific mapping across auditory cortex (AC) and lateral, secondary visual cortex (V2L) of freely moving rats engaged in a sensory detection task on a figure-8 maze. Single-unit activity of both areas showed extensive similarities in terms of spatial distribution, reliability, and position coding. Importantly, reconstructions of subject position based on spiking activity displayed decoding errors that were correlated between areas. Additionally, we found that head direction, but not locomotor speed or head angular velocity, was an important determinant of activity in AC and V2L. By contrast, variables related to the sensory task cues or to trial correctness and reward were not markedly encoded in AC and V2L. We conclude that sensory cortices participate in coherent, multimodal representations of the subject’s sensory-specific location. These may provide a common reference frame for distributed cortical sensory and motor processes and may support crossmodal predictive processing.

1 citation


Journal ArticleDOI
TL;DR: In this article, the authors present the first pre-registered systematic examination of the literature on the cross-modal interactions between audition and taste, concluding that taste may be crossmodally associated with pitch and musical instruments; words, nonwords, and speech sounds; and music and soundtracks.

1 citation


Journal ArticleDOI
TL;DR: In this article, the authors used functional magnetic resonance imaging (fMRI) to investigate how auditory stimuli reduce visual inhibition of return (IOR), a mechanism for preventing attention from returning to previously examined spatial locations.
Abstract: Visual inhibition of return (IOR) is a mechanism for preventing attention from returning to previously examined spatial locations. Previous studies have found that auditory stimuli presented simultaneously with a visual target can reduce or even eliminate the visual IOR. However, the mechanism responsible for decreased visual IOR accompanied by auditory stimuli is unclear. Using functional magnetic resonance imaging, we aimed to investigate how auditory stimuli reduce visual IOR. Behaviorally, we found that the visual IOR accompanied by auditory stimuli was significant but smaller than the visual-only IOR. Neurally, only in the validly cued trials, the superior temporal gyrus showed increased neural coupling with the intraparietal sulcus, presupplementary motor area, and some other areas in audiovisual conditions compared with visual conditions. These results suggest that the reduction in visual IOR by the simultaneous auditory stimuli may be due to a dual mechanism: rescuing the suppressed visual salience and facilitating response initiation. Our results support the view that crossmodal interactions can occur across multiple neural levels and cognitive processing stages. This study provides a new perspective for understanding attention‐orienting networks and response initiation based on crossmodal information.

1 citation


Journal ArticleDOI
19 Jan 2023
TL;DR: In this paper, the authors investigate the effects of two visual design principles, repetition and compositional lines, in a food image on purchase intention in the context of a mobile food delivery app and test the effect of cross-modal correspondences between vision and taste as a processing mechanism.
Abstract: Purpose: This study aims to investigate the effects of two visual design principles, repetition and compositional lines, in a food image on purchase intention in the context of a mobile food delivery app and test the effect of crossmodal correspondences between vision and taste as a processing mechanism. Design/methodology/approach: In this study, two experiments were conducted using burgers and iced tea as stimuli. Findings: The results demonstrate that repetition of an identical food product increases visual appeal for both burgers and iced tea. However, the optimal level of repetition was different between the two products. The findings show that different compositional lines generate different levels of visual appeal and that the effects of compositional lines vary between burgers and iced tea. The results also validate the serial mediation effects of vision and taste between design principles and purchase intention. Originality/value: The findings of this study add substantially to the understanding of visual information processing in food retailing by demonstrating how design principles such as repetition and compositional lines facilitate crossmodal responses between vision and taste and influence purchase decisions on a mobile platform. This study also provides guidance on how food retailers can use design principles (e.g. repetition and compositional lines) effectively for different products when developing visual digital content for a mobile app.

Posted ContentDOI
17 May 2023-bioRxiv
TL;DR: This paper investigated the circumstances under which statistical learning can occur between modalities and found that participants can explicitly learn the statistical regularities between cross-modal pairs even when the upcoming modality is not predictable, as long as the pairs contain semantic information.
Abstract: Statistical learning (SL) refers to the ability to extract statistical regularities from the environment. Previous research has suggested that regularity extraction is modality-specific, occurring within but not between sensory modalities (Frost et al., 2015). The present study investigates the circumstances under which SL can occur between modalities. In the first experiment, participants were presented with a stream of meaningless visual fractals and synthetic sounds while performing an oddball detection task. Stimuli were grouped into unimodal (AA, VV) or crossmodal (VA, AV) pairs based on higher transitional probability between the elements. Using implicit and explicit measures of SL, we found that participants only learned the unimodal pairs. In a second experiment, we presented the pairs in separate unimodal (VVVV, AAAA) and crossmodal (AVAV, VAVA) blocks, allowing participants to anticipate which modality would be presented next. We found that SL for the crossmodal pairs outperformed that of unimodal pairs. This result suggests that modality predictability facilitates a correct crossmodal attention deployment that is crucial for learning crossmodal transitional probabilities. Finally, a third experiment demonstrated that participants can explicitly learn the statistical regularities between crossmodal pairs even when the upcoming modality is not predictable, as long as the pairs contain semantic information. This finding suggests that SL between crossmodal pairs can occur when sensory-level limitations are bypassed, and when learning can unfold at a supramodal level of representation. This study demonstrates that SL is not a modality-specific mechanism and compels revision of the current neurobiological model of SL in which learning of statistical regularities between low-level stimuli features relies on hard-wired learning computations that take place in their respective sensory cortices.
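The pairs in these experiments are defined by transitional probabilities: within a pair, the second element reliably follows the first, whereas transitions across pair boundaries are far less predictable. The short sketch below shows how such probabilities can be estimated from a stimulus stream; the string encoding of the stimuli is purely illustrative.

```python
from collections import Counter, defaultdict

def transitional_probabilities(stream):
    """Estimate P(next | current) from a sequence of stimulus labels,
    e.g. 'V1' for a visual fractal and 'A1' for a synthetic sound."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    tp = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        tp[a][b] = n / first_counts[a]
    return tp

# Toy stream containing a crossmodal pair ('V1' -> 'A1') among filler items.
stream = ['V1', 'A1', 'V3', 'V1', 'A1', 'A2', 'V1', 'A1', 'V2']
print(transitional_probabilities(stream)['V1'])  # {'A1': 1.0}
```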

Journal ArticleDOI
TL;DR: In this article, the authors provide a circuit-level description of the auditory corticothalamic pathway in conjunction with adjacent cortical somatosensory projections, and discuss the functional interactions shared by these pathways, which appear to modulate sensory perception in the complementary domain.

Journal ArticleDOI
TL;DR: In this article , the effects of music on the detection and recognition of taste sensations were investigated. But the authors focused on taste intensity ratings, and less is known about the influence of musical stimuli on other parameters of taste function.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the evidence for crossmodal changes in both developmental and adult-onset deafness, which start as early as mild-moderate hearing loss and show reversibility when hearing is restored.

Posted ContentDOI
27 Apr 2023
TL;DR: The authors propose Crossmodal HArd Synthetic MisAlignment (CHASMA), a method for generating realistic synthetic training data that maintains crossmodal relations between legitimate images and false human-written captions.
Abstract: Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation and the urgent need for effective strategies to detect and prevent its spread. This study focuses on CrossModal Misinformation (CMM) where image-caption pairs work together to spread falsehoods. We contrast CMM with Asymmetric Multimodal Misinformation (AMM), where one dominant modality propagates falsehoods while other modalities have little or no influence. We show that AMM adds noise to the training and evaluation process while exacerbating the unimodal bias, where text-only or image-only detectors can seemingly outperform their multimodal counterparts on an inherently multimodal task. To address this issue, we collect and curate FIGMENTS, a robust evaluation benchmark for CMM, which consists of real world cases of misinformation, excludes AMM and utilizes modality balancing to successfully alleviate unimodal bias. FIGMENTS also provides a first step towards fine-grained CMM detection by including three classes: truthful, out-of-context, and miscaptioned image-caption pairs. Furthermore, we introduce a method for generating realistic synthetic training data that maintains crossmodal relations between legitimate images and false human-written captions that we term Crossmodal HArd Synthetic MisAlignment (CHASMA). We conduct an extensive comparative study using a Transformer-based architecture. Our results show that incorporating CHASMA in conjunction with other generated datasets consistently improved the overall performance on FIGMENTS in both binary (+6.26%) and multiclass settings (+15.8%). We release our code at: https://github.com/stevejpapad/figments-and-misalignments
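The idea of hard synthetic misalignment is to pair a legitimate image with a human-written caption that is plausible yet false for that image. One plausible retrieval-based realization, sketched below on precomputed crossmodal embeddings, gives each image the most similar caption that belongs to a different item; this is an illustrative assumption about the procedure, not the authors' released pipeline (see their repository for the actual implementation).

```python
import numpy as np

def hard_misaligned_pairs(image_embs: np.ndarray, caption_embs: np.ndarray):
    """Given L2-normalized embeddings where row i of each matrix comes from
    the same truthful image-caption pair, match every image with the most
    similar caption from a *different* item, yielding hard negative pairs."""
    sims = image_embs @ caption_embs.T   # (N, N) cosine similarities
    np.fill_diagonal(sims, -np.inf)      # forbid the image's own caption
    hard_idx = sims.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(hard_idx)]

# Toy example with random unit vectors standing in for image/text embeddings.
rng = np.random.default_rng(0)
imgs = rng.normal(size=(5, 64)); imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
caps = rng.normal(size=(5, 64)); caps /= np.linalg.norm(caps, axis=1, keepdims=True)
print(hard_misaligned_pairs(imgs, caps))
```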

Journal ArticleDOI
TL;DR: In this article, steady-state visual evoked potentials (SSVEP) were used to investigate the congruence between pitch and visual features of size, hue or chromatic saturation.

Journal ArticleDOI
TL;DR: In this article, the authors used multiple single-channel recording methods to examine neuronal responses to visual, auditory, somatosensory and combined stimulation in early-deaf cats.
Abstract: Many neural areas, where patterned activity is lost following deafness, have the capacity to become activated by the remaining sensory systems. This crossmodal plasticity can be measured at perceptual/behavioural as well as physiological levels. The dorsal zone (DZ) of auditory cortex of deaf cats is involved in supranormal visual motion detection, but its physiological level of crossmodal reorganisation is not well understood. The present study of early-deaf DZ (and hearing controls) used multiple single-channel recording methods to examine neuronal responses to visual, auditory, somatosensory and combined stimulation. In early-deaf DZ, no auditory activation was observed, but 100% of the neurons were responsive to visual cues of which 21% were also influenced by somatosensory stimulation. Visual and somatosensory responses were not anatomically organised as they are in hearing cats, and fewer multisensory neurons were present in the deaf condition. These crossmodal physiological results closely correspond with and support the perceptual/behavioural enhancements that occur following hearing loss.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the transfer of (mostly) visually acquired knowledge of first- and second-language characters to the tactile modality typically not used in that acquisition process. Children were able to recognize/classify first- and second-language letters or digits presented not only to the (trained) visual modality but also to the (untrained) tactile modality, and, as expected, with greater recognition accuracy and shorter recognition time in the former than in the latter modality.
Abstract: Crossmodal transfer of learning is a neurocognitive process whereby a learner's experience and knowledge acquired through one sensory mode enable him/her to perform a similar task using a different sensory mode. This study examined the transfer of (mostly) visually acquired knowledge of first- and second-language characters to the tactile modality typically not used in that acquisition process. Two experiments were conducted, one to assess letter recognition skills and one to assess digit recognition skills in both Bangla and English, in 30 sighted young children who had mastered those characters through sensory learning in natural settings. Results unequivocally demonstrated that children were able to recognize/classify the first and second language letters or digits presented not only to the (trained) visual modality but to the (untrained) tactile modality as well, and as expected, with greater recognition accuracy and shorter recognition time in the former than the latter modality. Their character recognition performance was found to be significantly influenced not by language but by character type, with digits being more accurately and more speedily recognized than letters. Moreover, language-task modality interaction was found to mediate letter recognition accuracy, digit recognition accuracy, and digit recognition time, whereas character type-task modality interaction was found to significantly mediate character recognition time only. The ecological and theoretical significance of these findings is discussed. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Journal ArticleDOI
TL;DR: This paper investigated the role of an underlying affective mechanism and a semantic path (i.e., regarding the semantic knowledge related to a single common source identity or meaning) in crossmodal associations between visual textures and temperature concepts using an associative learning paradigm.
Abstract: In the last decades, there has been a growing interest in crossmodal correspondences, including those involving temperature. However, only a few studies have explicitly examined the underlying mechanisms behind temperature-related correspondences. Here, we investigated the relative roles of an underlying affective mechanism and a semantic path (i.e., regarding the semantic knowledge related to a single common source identity or meaning) in crossmodal associations between visual textures and temperature concepts using an associative learning paradigm. Two online experiments using visual textures previously shown to be associated with low and high thermal effusivity (Experiment 1) and visual textures with no consensual associations with thermal effusivity (Experiment 2) were conducted. Participants completed a speeded categorization task before and after an associative learning task, in which they learned mappings between the visual textures and specific affective or semantic stimuli related to low and high temperatures. Across the two experiments, both the affective and semantic mappings influenced the categorization of visual textures with the hypothesized temperatures, but there was no influence on the reaction times. The effect of learning semantic mappings was larger than that of affective ones in both experiments, suggesting that a semantic path has more weight than an affective mechanism in the formation of the associations studied here. The associations studied here could be modified through associative learning establishing correlations between visual textures and either affective or semantic stimuli. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Journal ArticleDOI
TL;DR: In this paper, the physicochemical features of odors were extracted using an array of gas sensors, also known as an electronic nose, and a similarity of 49% between the perceptual and physicochemical spaces of the odors was found.
Abstract: During the olfactory perception process, our olfactory receptors are thought to recognize specific chemical features. These features may contribute towards explaining our crossmodal perception. The physicochemical features of odors can be extracted using an array of gas sensors, also known as an electronic nose. The present study investigates the role that the physicochemical features of olfactory stimuli play in explaining the nature and origin of olfactory crossmodal correspondences, which is a consistently overlooked aspect of prior work. Here, we answer the question of whether the physicochemical features of odors contribute towards explaining olfactory crossmodal correspondences and by how much. We found a similarity of 49% between the perceptual and the physicochemical spaces of our odors. All of our explored crossmodal correspondences, namely the angularity of shapes, smoothness of textures, perceived pleasantness, pitch, and colors, have significant predictors for various physicochemical features, including aspects of intensity and odor quality. While it is generally recognized that olfactory perception is strongly shaped by context, experience, and learning, our findings show that a link, albeit small (6-23%), exists between olfactory crossmodal correspondences and their underlying physicochemical features.

Journal ArticleDOI
TL;DR: In this paper, a task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task, and it was found that visual noise increased with the addition of the secondary visual task specifically.
Abstract: We receive information about the world around us from multiple senses which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk Illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied with a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as Schizophrenia, Autism, and ADHD.

Journal ArticleDOI
TL;DR: In this paper, the specific impact of monocular deprivation (MD) on neural correlates of multisensory processes was measured for both the deprived and the non-deprived eye, and the results revealed that MD changed neural activities associated with visual and multisensory processes in an eye-specific manner.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated uni- and bimodal perception of emotional cues as well as multisensory facilitation in autistic (n = 18, mean age: 36.72 years, SD: 11.36) compared to non-autistic (n = 18, mean age: 36.41 years, SD: 12.18) people.
Abstract: Introduction: Deficits in emotional perception are common in autistic people, but it remains unclear to which extent these perceptual impairments are linked to specific sensory modalities, specific emotions or multisensory facilitation. Methods: This study aimed to investigate uni- and bimodal perception of emotional cues as well as multisensory facilitation in autistic (n = 18, mean age: 36.72 years, SD: 11.36) compared to non-autistic (n = 18, mean age: 36.41 years, SD: 12.18) people using auditory, visual and audiovisual stimuli. Results: Lower identification accuracy and longer response time were revealed in high-functioning autistic people. These differences were independent of modality and emotion and showed large effect sizes (Cohen’s d 0.8–1.2). Furthermore, multisensory facilitation of response time was observed in non-autistic people that was absent in autistic people, whereas no differences were found in multisensory facilitation of accuracy between the two groups. Discussion: These findings suggest that processing of auditory and visual components of audiovisual stimuli is carried out more separately in autistic individuals (with equivalent temporal demands required for processing of the respective unimodal cues), but still with similar relative improvement in accuracy, whereas earlier integrative multimodal merging of stimulus properties seems to occur in non-autistic individuals.

Posted ContentDOI
15 Apr 2023
TL;DR: Zhang et al. propose a Coordinated Vision-Language Retrieval method (CoVLR), which aims to study and alleviate the desynchrony problem between the cross-modal alignment and single-modal cluster-preserving tasks by developing an effective meta-optimization-based strategy.
Abstract: Current vision-language retrieval aims to perform cross-modal instance search, in which the core idea is to learn consistent vision-language representations. Although the performance of cross-modal retrieval has greatly improved with the development of deep models, we unfortunately find that traditional hard consistency may destroy the original relationships among single-modal instances, leading to performance degradation for single-modal retrieval. To address this challenge, in this paper, we experimentally observe that the vision-language divergence may cause the existence of strong and weak modalities, and the hard cross-modal consistency cannot guarantee that strong modal instances' relationships are not affected by the weak modality, resulting in the strong modal instances' relationships being perturbed despite learned consistent representations. To this end, we propose a novel Coordinated Vision-Language Retrieval method (dubbed CoVLR), which aims to study and alleviate the desynchrony problem between the cross-modal alignment and single-modal cluster-preserving tasks. CoVLR addresses this challenge by developing an effective meta-optimization-based strategy, in which the cross-modal consistency objective and the intra-modal relation-preserving objective act as the meta-train and meta-test tasks, so that CoVLR encourages both tasks to be optimized in a coordinated way. Consequently, we can simultaneously ensure cross-modal consistency and intra-modal structure. Experiments on different datasets validate that CoVLR can improve single-modal retrieval accuracy whilst preserving crossmodal retrieval capacity compared with the baselines.
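The two objectives that CoVLR coordinates can be made concrete with toy losses: a contrastive cross-modal consistency loss (meta-train) and an intra-modal relation-preserving loss that keeps the pairwise similarity structure of a modality close to its original structure (meta-test). The sketch below is a heavily simplified illustration of coupling them through a lookahead step on a single shared projection; it is an assumption-laden stand-in, not the paper's meta-optimization algorithm.

```python
import torch
import torch.nn.functional as F

def crossmodal_consistency_loss(img, txt, temperature=0.07):
    """Contrastive alignment of paired image/text embeddings (meta-train)."""
    logits = F.normalize(img, dim=-1) @ F.normalize(txt, dim=-1).t() / temperature
    labels = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

def intra_modal_relation_loss(raw, projected):
    """Keep the single-modal similarity structure of the projected embeddings
    close to that of the raw features (meta-test)."""
    s_raw = F.normalize(raw, dim=-1) @ F.normalize(raw, dim=-1).t()
    s_proj = F.normalize(projected, dim=-1) @ F.normalize(projected, dim=-1).t()
    return F.mse_loss(s_proj, s_raw)

# One coordinated step with a single shared projection W (for brevity).
W = torch.randn(256, 128, requires_grad=True)
img_feat, txt_feat = torch.randn(32, 256), torch.randn(32, 256)

meta_train = crossmodal_consistency_loss(img_feat @ W, txt_feat @ W)
grad_W, = torch.autograd.grad(meta_train, W, create_graph=True)
W_lookahead = W - 0.1 * grad_W                       # inner step on alignment
meta_test = intra_modal_relation_loss(img_feat, img_feat @ W_lookahead)
(meta_train + meta_test).backward()                  # gradient flows through both
```

Evaluating the relation-preserving loss at the looked-ahead parameters is what couples the two objectives, instead of simply summing them on the current parameters.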

Journal ArticleDOI
TL;DR: The authors formulate, implement, and examine different ways of performing cross-modal associative recall of perceptual information in machine cognition; results show that different recalls possess their own features and properties, which are useful for equipping a cognitive machine with a variety of cognitive recall capabilities for different purposes.
Abstract: As humans, we actively stimulate our memory via a variety of recalls for information across different modalities in our brains. This variety of recalls enriches our crossmodal recall mechanism and equips our cognitive system to retrieve information for different needs. Inspired by these cognitive capabilities, this paper formulates, implements, and examines different ways of performing crossmodal associative recall of perceptual information in machine cognition. Simulations with different cases and scenarios are conducted and examined on the recalls. Results show that different recalls possess their own features and properties, which are useful for equipping a cognitive machine with a variety of cognitive recall capabilities for different purposes.
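As a concrete illustration of what crossmodal associative recall can mean computationally, the sketch below implements a minimal hetero-associative memory: embedding pairs from two modalities are stored together, and a cue in one modality recalls its stored partner in the other by nearest-neighbour matching. This toy scheme is an illustrative assumption, not the mechanism developed in the paper.

```python
import numpy as np

class CrossmodalAssociativeMemory:
    """Store (audio, visual) embedding pairs; recall the visual item
    associated with an audio cue by nearest-neighbour matching."""

    def __init__(self):
        self.audio, self.visual = [], []

    def store(self, audio_vec, visual_vec):
        self.audio.append(np.asarray(audio_vec, dtype=float))
        self.visual.append(np.asarray(visual_vec, dtype=float))

    def recall_visual(self, audio_cue):
        cues = np.stack(self.audio)
        idx = int(np.argmin(np.linalg.norm(cues - audio_cue, axis=1)))
        return self.visual[idx]

mem = CrossmodalAssociativeMemory()
mem.store([1.0, 0.0], [0.2, 0.9])   # a sound paired with a visual code
mem.store([0.0, 1.0], [0.8, 0.1])
print(mem.recall_visual(np.array([0.9, 0.1])))  # -> [0.2 0.9]
```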

Proceedings ArticleDOI
19 Apr 2023
TL;DR: In this paper, a visuo-haptic cross-modal model of shape perception designed for shape-changing handheld controllers is presented, using the inertia tensor of an object to bridge the two senses.
Abstract: We present a visuo-haptic crossmodal model of shape perception designed for shape-changing handheld controllers. The model uses the inertia tensor of an object to bridge the two senses. The model was constructed from the results of three perceptual experiments. In the first two experiments, we validate that the primary moment and product of inertia (MOI and POI) in the inertia tensor have critical effects on the haptic perception of object length and asymmetry. Then, we estimate a haptic-to-visual shape matching model using MOI and POI as two link variables from the results of the third experiment for crossmodal magnitude production. Finally, we validate in a summative user study that the inverse of the shape matching model is effective for pairing a perceptually-congruent haptic object from a virtual object—the functionality we need for shape-changing handheld interfaces to afford perceptually-fulfilling sensory experiences in virtual reality.
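For reference, the MOI and POI terms that the model links to perceived length and asymmetry are the diagonal and off-diagonal entries of the standard rigid-body inertia tensor; the equations below state that textbook definition for point masses m_i at positions (x_i, y_i, z_i), not the paper's fitted matching model.

```latex
I =
\begin{pmatrix}
 I_{xx} & -I_{xy} & -I_{xz}\\
-I_{xy} &  I_{yy} & -I_{yz}\\
-I_{xz} & -I_{yz} &  I_{zz}
\end{pmatrix},
\qquad
I_{xx} = \sum_i m_i \left(y_i^2 + z_i^2\right),
\qquad
I_{xy} = \sum_i m_i \, x_i \, y_i ,
```

with the remaining moments (I_yy, I_zz) and products (I_xz, I_yz) defined by cyclic permutation of the coordinates. Intuitively, larger moments of inertia reflect mass distributed farther from the corresponding axes (related to perceived length), while non-zero products of inertia indicate an asymmetric mass distribution (related to perceived asymmetry).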

Posted ContentDOI
05 Mar 2023
TL;DR: In this paper, the authors propose a parametric graph construction strategy for the intra-modal edges and learn the crossmodal edges for audiovisual data; the resulting model can adapt to various spatial and temporal scales owing to its parametric construction.
Abstract: Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities. This makes modeling audiovisual data using heterogeneous graphs an attractive option. However, graph structure does not appear naturally in audiovisual data. Graphs for audiovisual data are constructed manually which is both difficult and sub-optimal. In this work, we address this problem by (i) proposing a parametric graph construction strategy for the intra-modal edges, and (ii) learning the crossmodal edges. To this end, we develop a new model, heterogeneous graph crossmodal network (HGCN) that learns the crossmodal edges. Our proposed model can adapt to various spatial and temporal scales owing to its parametric construction, while the learnable crossmodal edges effectively connect the relevant nodes across modalities. Experiments on a large benchmark dataset (AudioSet) show that our model is state-of-the-art (0.53 mean average precision), outperforming transformer-based models and other graph-based models.
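The phrase "learning the crossmodal edges" can be made concrete with a small scorer that assigns a weight to every audio-visual node pair and then aggregates messages across those soft edges. The PyTorch sketch below illustrates that idea; the layer sizes, scoring MLP, and update rule are assumptions for illustration, not the HGCN architecture itself.

```python
import torch
import torch.nn as nn

class LearnableCrossmodalEdges(nn.Module):
    """Score every (audio node, visual node) pair, normalize the scores into
    soft edge weights, and aggregate visual information into the audio nodes."""

    def __init__(self, dim: int):
        super().__init__()
        self.edge_scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                         nn.Linear(dim, 1))
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, audio_nodes, visual_nodes):
        na, nv, d = audio_nodes.size(0), visual_nodes.size(0), audio_nodes.size(1)
        pairs = torch.cat([audio_nodes.unsqueeze(1).expand(na, nv, d),
                           visual_nodes.unsqueeze(0).expand(na, nv, d)], dim=-1)
        edges = torch.softmax(self.edge_scorer(pairs).squeeze(-1), dim=-1)  # (Na, Nv)
        messages = edges @ visual_nodes                                     # (Na, dim)
        return self.update(torch.cat([audio_nodes, messages], dim=-1)), edges

audio = torch.randn(10, 64)   # e.g., 10 audio-segment nodes
visual = torch.randn(25, 64)  # e.g., 25 video-frame nodes
updated_audio, edge_weights = LearnableCrossmodalEdges(64)(audio, visual)
```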

Book ChapterDOI
30 Apr 2023
TL;DR: In this paper, the authors focus on measures of multisensory integration based on numerical data collected from single neurons and in behavioral paradigms: spike numbers, reaction time, frequency of correct or incorrect responses in detection, recognition, and discrimination tasks.
Abstract: The investigation of processes involved in merging information from different sensory modalities has become the subject of research in many areas, including anatomy, physiology, and behavioral sciences. This field of research, termed "multisensory integration", is flourishing, crossing borders between psychology and neuroscience. The focus of this chapter is on measures of multisensory integration based on numerical data collected from single neurons and in behavioral paradigms: spike numbers, reaction time, frequency of correct or incorrect responses in detection, recognition, and discrimination tasks. Defining that somewhat fuzzy term, it has been observed that at least some kind of numerical measurement assessing the strength of crossmodal effects is required. On the empirical side, these measures typically serve to quantify effects of various covariates on multisensory integration like age, certain disorders, developmental conditions, training and rehabilitation, in addition to attention and learning. On the theoretical side, these measures often help to probe hypotheses about underlying integration mechanisms like optimality in combining information or inverse effectiveness, without necessarily subscribing to a specific model.
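One widely used single-neuron measure of this kind, stemming from Stein and Meredith's superior colliculus work, is the multisensory enhancement index, which compares the response to a crossmodal stimulus with the strongest unisensory response. Whether the chapter adopts this exact index is not stated here, so it is given only as a representative example:

```latex
\mathrm{ME} \;=\; \frac{\mathrm{CM} - \max\!\left(\mathrm{SM}_A,\ \mathrm{SM}_V\right)}
                       {\max\!\left(\mathrm{SM}_A,\ \mathrm{SM}_V\right)} \times 100\%,
```

where CM is the response (e.g., mean spike count) to the combined audiovisual stimulus and SM_A, SM_V are the responses to the auditory and visual components alone. ME > 0 indicates crossmodal enhancement, and comparing CM against SM_A + SM_V distinguishes super-additive from sub-additive integration; behavioural analogues replace spike counts with accuracy or reaction-time gains relative to the best unisensory condition.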

Journal ArticleDOI
TL;DR: In this article, the authors argue that human learning and memory systems have evolved to operate under these multisensory and dynamic conditions, and that the brain is sensitive to the relationship between the sensory inputs, continuously updates sensory representations, and encodes memory traces based on the relationships between the senses.
Abstract: Most studies of memory and perceptual learning in humans have employed unisensory settings to simplify the study paradigm. However, in daily life we are often surrounded by complex and cluttered scenes made up of many objects and sources of sensory stimulation. Our experiences are, therefore, highly multisensory both when passively observing the world and when acting and navigating. We argue that human learning and memory systems are evolved to operate under these multisensory and dynamic conditions. The nervous system exploits the rich array of sensory inputs in this process, is sensitive to the relationship between the sensory inputs, and continuously updates sensory representations, and encodes memory traces based on the relationship between the senses. We review some recent findings that demonstrate a range of human learning and memory phenomena in which the interactions between visual and auditory modalities play an important role, and suggest possible neural mechanisms that can underlie some surprising recent findings. We outline open questions as well as directions of future research to unravel human perceptual learning and memory.


Journal ArticleDOI
12 Apr 2023-Eureka
TL;DR: This article explored the manifestations of the McGurk effect in other languages, such as Japanese and Chinese, as well as considerations for age and keenness (hearing acuity), through a literature review of existing research.
Abstract: The McGurk effect denotes a phenomenon of speech perception where a listener attends to mismatched audio and visual stimuli and perceives an illusory third sound, typically a conflation of the audio-visual stimulus. This multimodal interaction has been exploited in various English-language experiments. The article explores the manifestations of this effect in other languages, such as Japanese and Chinese, as well as considerations for age and keenness (hearing acuity), through a literature review of existing research. The literature confirms that the McGurk effect is present in other languages, albeit to differing degrees. The differences in the McGurk effect across languages may be attributed to linguistic and cultural differences. Age differences demonstrate a greater lip-reading reliance as age increases in participants; a similar reliance on visual information is seen in participants as hearing impairment increases. Experimental designs should refine audiovisual stimuli by using immersive technology such as three-dimensional models in virtual reality or ambisonic playback that offers multi-directional sound signals. Future research should also address the influence of audiovisual integration in marketing, foreign language education, and developing better accommodations for the hearing impaired.