
Showing papers by "Subhashini Venugopalan published in 2023"


Proceedings ArticleDOI
19 Apr 2023
TL;DR: The SpeakFaster Observer, presented in this paper, is a tool developed and field-tested for observing and curating the gaze-typed communication of an eye-gaze-based augmentative and alternative communication (AAC) device user with amyotrophic lateral sclerosis (ALS).
Abstract: Accelerating communication for users with severe motor and speech impairments, in particular for eye-gaze-based augmentative and alternative communication (AAC) device users, is a longstanding area of research. However, observation of such users’ communication over extended durations has been limited. This case study presents the real-world experience of developing and field-testing a tool for observing and curating the gaze typing-based communication of an eye-gaze AAC user with amyotrophic lateral sclerosis (ALS). With the intent to observe and develop technology to accelerate eye-gaze typed communication, we designed a tool and a protocol called the SpeakFaster Observer to measure everyday conversational text entry by the gaze-typing user, as well as several consenting conversation partners of the AAC user. We detail the design of the Observer software and data curation protocol, along with considerations for privacy protection. The deployment of the data protocol from November 2021 to April 2022 yielded a rich dataset of gaze-based AAC text entry from everyday life, consisting of 130+ hours of gaze keystrokes and 5,000+ curated speech utterances from the AAC user and the conversation partners. We present the key statistics of the data, including the speed (8.1 ± 3.9 words per minute) and keystroke saving rate (-0.14 ± 0.83) of gaze typing, patterns of utterance repetition and reuse, and the temporal dynamics of conversation turn-taking in gaze-based communication. We share our findings and also open source our data collection tools to further research in this domain.
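The abstract reports typing speed in words per minute and a keystroke saving rate; the short Python sketch below shows one common way such text-entry metrics are computed from a keystroke log. The conventions used here (a "word" is 5 characters for WPM; KSR = 1 - keystrokes/characters) and the helper names are standard-practice assumptions for illustration, not the Observer's actual code.

```python
# Illustrative only: common AAC text-entry metrics, not the Observer's exact code.
def words_per_minute(num_chars: int, elapsed_seconds: float) -> float:
    """Standard text-entry WPM: one 'word' is 5 characters."""
    return (num_chars / 5.0) / (elapsed_seconds / 60.0)

def keystroke_saving_rate(num_keystrokes: int, num_output_chars: int) -> float:
    """KSR = 1 - keystrokes/characters; negative values mean the user pressed
    more keys than characters produced (e.g. due to corrections)."""
    return 1.0 - (num_keystrokes / num_output_chars)

# Toy example: 42 keystrokes producing a 37-character utterance in 55 seconds.
print(round(words_per_minute(37, 55), 1))       # ~8.1 WPM
print(round(keystroke_saving_rate(42, 37), 2))  # ~ -0.14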

Proceedings ArticleDOI
10 Mar 2023
TL;DR: The Clinical BERTScore (CBERTScore), presented in this paper, is an ASR metric that penalizes clinically relevant transcription mistakes more heavily than others; its alignment with clinician preferences is evaluated on the Clinician Transcript Preference (CTP) benchmark.
Abstract: Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We collect a benchmark of 18 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP) and make it publicly available for the community to further develop clinically-aware ASR metrics. To our knowledge, this is the first public dataset of its kind. We demonstrate that our metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins.
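As a rough illustration of weighting ASR errors by clinical relevance, the sketch below computes a standard BERTScore F1 with the open-source bert-score package and then subtracts a toy penalty for clinical terms missing from the hypothesis. The term list, penalty weight, and helper names are illustrative assumptions and are not the paper's CBERTScore implementation.

```python
# Illustrative sketch: a BERTScore-style metric with an extra penalty for missed
# clinical terms. The term list and weighting are assumptions for demonstration,
# not the paper's CBERTScore.
from bert_score import score  # pip install bert-score

CLINICAL_TERMS = {"metoprolol", "hypertension", "50mg"}  # toy vocabulary

def clinical_penalty(reference: str, hypothesis: str, weight: float = 0.1) -> float:
    """Subtract `weight` for each clinical term in the reference that the
    ASR hypothesis failed to reproduce."""
    ref_tokens = set(reference.lower().split())
    hyp_tokens = set(hypothesis.lower().split())
    missed = (ref_tokens & CLINICAL_TERMS) - hyp_tokens
    return weight * len(missed)

def clinically_weighted_score(reference: str, hypothesis: str) -> float:
    _, _, f1 = score([hypothesis], [reference], lang="en", verbose=False)
    return float(f1[0]) - clinical_penalty(reference, hypothesis)

ref = "patient started on metoprolol 50mg for hypertension"
print(clinically_weighted_score(ref, "patient started on metoprolol 50mg for hypertension"))
print(clinically_weighted_score(ref, "patient started on methadone 50mg for hypertension"))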

TL;DR: The analysis of the learned attention maps to infer depth and occlusion indicates that attention enables learning a physically-grounded rendering, showing the promise of transformers as a universal modeling tool for graphics.
Abstract: We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages. (1) The view transformer leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on the neighboring views. (2) The ray transformer renders novel views using attention to decode the features from the view transformer along the sampled points during ray marching. Our experiments demonstrate that when optimized on a single scene, GNT can successfully reconstruct NeRF without an explicit rendering formula due to the learned ray renderer. When trained on multiple scenes, GNT consistently achieves state-of-the-art performance when transferring to unseen scenes and outperforms all other methods by ~10% on average. Our analysis of the learned attention maps to infer depth and occlusion indicates that attention enables learning a physically-grounded rendering. Our results show the promise of transformers as a universal modeling tool for graphics. Please refer to our project page for video results.
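The two-stage design described above lends itself to a compact attention sketch: one attention block fuses features gathered along epipolar lines across source views into a per-point feature, and a second attends over the samples along a ray to decode a color. The minimal PyTorch sketch below follows that structure; the layer sizes, tensor shapes, and module names are illustrative assumptions rather than the released GNT code.

```python
# Minimal PyTorch sketch of the two-stage attention idea in GNT; shapes and
# layer sizes are illustrative assumptions, not the official implementation.
import torch
import torch.nn as nn

class ViewTransformer(nn.Module):
    """Aggregates features sampled from epipolar lines on the source views
    into one coordinate-aligned feature per 3D sample point."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, epipolar_feats):           # (points, views, dim)
        q = self.query.expand(epipolar_feats.shape[0], -1, -1)
        fused, _ = self.attn(q, epipolar_feats, epipolar_feats)
        return fused.squeeze(1)                  # (points, dim)

class RayTransformer(nn.Module):
    """Attends over the samples along a ray and decodes a single RGB value,
    standing in for a hand-crafted volume-rendering integral."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, point_feats):              # (rays, samples, dim)
        mixed, _ = self.attn(point_feats, point_feats, point_feats)
        return torch.sigmoid(self.to_rgb(mixed.mean(dim=1)))  # (rays, 3)

# Toy forward pass: 2 rays x 8 samples per ray, each seen from 4 source views.
feats = torch.randn(2 * 8, 4, 64)
per_point = ViewTransformer()(feats).reshape(2, 8, 64)
rgb = RayTransformer()(per_point)
print(rgb.shape)  # torch.Size([2, 3])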

Proceedings ArticleDOI
13 Mar 2023
TL;DR: This paper develops dysarthric speech intelligibility classifiers trained on 551,176 disordered speech samples contributed by a diverse set of 468 speakers with a range of self-reported speaking disorders, each sample rated for overall intelligibility on a five-point scale.
Abstract: We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found the models to generalize well (without further training) on the TORGO database (100% accuracy), UASpeech (0.93 correlation), and ALS-TDI PMP (0.81 AUC) datasets, as well as on a dataset of realistic unprompted speech we gathered (106 dysarthric and 76 control speakers, ~2,300 samples).
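The abstract quotes three different generalization metrics (accuracy, correlation, AUC). The sketch below shows how each is typically computed with scikit-learn and scipy on toy labels and scores; the toy data and the pairing of metrics with datasets follow the abstract but are otherwise assumptions for illustration only, not the paper's evaluation pipeline.

```python
# Illustrative only: the three generalization metrics named in the abstract,
# computed on toy data with scikit-learn / scipy.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, roc_auc_score

# Binary intelligible / not-intelligible labels, e.g. a TORGO-style evaluation.
print(accuracy_score([1, 0, 1, 1], [1, 0, 1, 1]))  # 1.0

# Five-point intelligibility ratings vs. model scores, e.g. UASpeech-style.
ratings = np.array([1, 2, 3, 4, 5, 5, 2, 3])
scores = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 0.8, 0.2, 0.6])
print(pearsonr(ratings, scores)[0])                # ~0.98

# Ranking dysarthric vs. control speakers, e.g. an ALS-TDI PMP-style AUC.
labels = [0, 0, 1, 1, 1]
probs = [0.2, 0.4, 0.3, 0.8, 0.9]
print(roc_auc_score(labels, probs))                # ~0.83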