scispace - formally typeset

George Sterpu

Researcher at Trinity College, Dublin

Publications -  16
Citations -  158

George Sterpu is an academic researcher from Trinity College, Dublin. The author has contributed to the research topics Modality (human–computer interaction) and Audio-visual speech recognition. The author has an h-index of 4 and has co-authored 15 publications receiving 86 citations.

Papers
Proceedings ArticleDOI

Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

TL;DR: In this paper, an audio-visual fusion strategy is proposed that aligns the two modalities, yielding enhanced representations which increase recognition accuracy in both clean and noisy conditions; on TCD-TIMIT the results show relative improvements of 7% to 30% over the acoustic modality alone, depending on the acoustic noise level.
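The paper's exact fusion architecture is not reproduced here; the sketch below only illustrates the general idea of attention-based alignment between modalities — each audio frame soft-attends over the visual frames and is concatenated with its aligned visual summary. All function names and dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def av_attention_fusion(audio, visual):
    """Fuse modalities: audio frames (Ta, d) attend over visual
    frames (Tv, d); each audio frame is concatenated with its
    attention-weighted visual summary, giving (Ta, 2d)."""
    d = audio.shape[-1]
    scores = audio @ visual.T / np.sqrt(d)   # (Ta, Tv) alignment scores
    weights = softmax(scores, axis=-1)       # soft alignment per audio frame
    aligned_visual = weights @ visual        # (Ta, d) attended visual features
    return np.concatenate([audio, aligned_visual], axis=-1)

# toy example: 5 audio frames, 3 video frames, 4-dim features
rng = np.random.default_rng(0)
fused = av_attention_fusion(rng.normal(size=(5, 4)), rng.normal(size=(3, 4)))
print(fused.shape)  # (5, 8)
```

Because the alignment is learned per audio frame, the two streams need not share a frame rate — a key motivation for attention over plain feature concatenation.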
Journal ArticleDOI

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

TL;DR: The inner workings of AV Align are investigated, and a regularisation method is proposed that involves predicting lip-related Action Units from visual representations; this leads to better exploitation of the visual modality and encourages researchers to rethink the multimodal convergence problem when one modality dominates.
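The Action Unit regularisation described above can be read as a multi-task objective: the speech-recognition loss is augmented with an auxiliary loss for predicting lip-related AU activations from the visual encoder. A minimal sketch under that reading, with an assumed binary cross-entropy AU loss and an illustrative weighting factor (not the paper's actual values):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy over per-AU sigmoid activations."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def regularised_loss(asr_loss, au_pred, au_target, lam=0.1):
    """Total loss = speech-recognition loss + weighted Action Unit
    prediction loss computed from the visual representations."""
    return asr_loss + lam * bce(au_pred, au_target)

au_pred = np.array([0.9, 0.2, 0.7])    # predicted AU activations (illustrative)
au_target = np.array([1.0, 0.0, 1.0])  # AU labels from a face-analysis tool
total = regularised_loss(asr_loss=2.5, au_pred=au_pred, au_target=au_target)
print(round(total, 3))  # 2.523
```

The auxiliary term gives the visual encoder a supervision signal even when the dominant audio stream already solves the main task, which is the convergence problem the TL;DR refers to.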
Posted Content

Towards Lipreading Sentences with Active Appearance Models

TL;DR: The DCT is found to outperform AAMs by more than 6% on a viseme recognition task with 56 speakers, leading to the conclusion that a fundamental rethink of the modelling of visual features may be needed for this task.
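For context on the DCT baseline the paper compares against: a common visual speech feature is the low-frequency block of the 2-D DCT of a grayscale lip region. The sketch below is an illustrative implementation of that general technique, not the authors' pipeline; the ROI size and number of retained coefficients are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(n)       # DC row normalisation
    m[1:] *= np.sqrt(2 / n)      # remaining rows
    return m

def dct_features(roi, keep=6):
    """2-D DCT of a grayscale lip ROI; keep the low-frequency
    top-left keep x keep block, flattened, as the feature vector."""
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

roi = np.random.default_rng(1).random((32, 48))  # stand-in for a lip image
feats = dct_features(roi)
print(feats.shape)  # (36,)
```

Keeping only the top-left block discards high spatial frequencies, giving a compact appearance descriptor — the image-based alternative to the shape-plus-appearance parameters an AAM produces.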