
Showing papers by "Subhashini Venugopalan published in 2023"


Proceedings ArticleDOI
19 Apr 2023
TL;DR: The SpeakFaster Observer, presented in this paper, is a tool developed and field-tested for observing and curating the gaze-typed communication of an eye-gaze-based augmentative and alternative communication (AAC) device user with amyotrophic lateral sclerosis (ALS).
Abstract: Accelerating communication for users with severe motor and speech impairments, in particular for eye-gaze-based augmentative and alternative communication (AAC) device users, is a longstanding area of research. However, observation of such users’ communication over extended durations has been limited. This case study presents the real-world experience of developing and field-testing a tool for observing and curating the gaze typing-based communication of an eye-gaze AAC user with amyotrophic lateral sclerosis (ALS). With the intent to observe and develop technology to accelerate eye-gaze typed communication, we designed a tool and a protocol called the SpeakFaster Observer to measure everyday conversational text entry by the gaze-typing user, as well as several consenting conversation partners of the AAC user. We detail the design of the Observer software and data curation protocol, along with considerations for privacy protection. The deployment of the data protocol from November 2021 to April 2022 yielded a rich dataset of gaze-based AAC text entry from everyday life, consisting of 130+ hours of gaze keystrokes and 5,000+ curated speech utterances from the AAC user and the conversation partners. We present the key statistics of the data, including the speed (8.1 ± 3.9 words per minute) and keystroke saving rate (-0.14 ± 0.83) of gaze typing, patterns of utterance repetition and reuse, and the temporal dynamics of conversation turn-taking in gaze-based communication. We share our findings and also open source our data collection tools to further research in this domain.
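The abstract reports typing speed in words per minute and a keystroke saving rate; the short Python sketch below shows one common way such text-entry metrics are computed from a keystroke log. The conventions used here (a "word" is 5 characters for WPM; KSR = 1 - keystrokes/characters) and the helper names are standard-practice assumptions for illustration, not the Observer's actual code.

```python
# Illustrative only: common AAC text-entry metrics, not the Observer's exact code.
def words_per_minute(num_chars: int, elapsed_seconds: float) -> float:
    """Standard text-entry WPM: one 'word' is 5 characters."""
    return (num_chars / 5.0) / (elapsed_seconds / 60.0)

def keystroke_saving_rate(num_keystrokes: int, num_output_chars: int) -> float:
    """KSR = 1 - keystrokes/characters; negative values mean the user pressed
    more keys than characters produced (e.g. due to corrections)."""
    return 1.0 - (num_keystrokes / num_output_chars)

# Toy example: 42 keystrokes producing a 37-character utterance in 55 seconds.
print(round(words_per_minute(37, 55), 1))       # ~8.1 WPM
print(round(keystroke_saving_rate(42, 37), 2))  # ~ -0.14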

Proceedings ArticleDOI
10 Mar 2023
TL;DR: The Clinical BERTScore (CBERTScore), presented in this paper, is an ASR metric that penalizes clinically relevant transcription mistakes more heavily than others; its alignment with clinician preferences is evaluated on the Clinician Transcript Preference (CTP) benchmark.
Abstract: Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We collect a benchmark of 18 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP) and make it publicly available for the community to further develop clinically-aware ASR metrics. To our knowledge, this is the first public dataset of its kind. We demonstrate that our metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins.
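As a rough illustration of weighting ASR errors by clinical relevance, the sketch below computes a standard BERTScore F1 with the open-source bert-score package and then subtracts a toy penalty for clinical terms missing from the hypothesis. The term list, penalty weight, and helper names are illustrative assumptions and are not the paper's CBERTScore implementation.

```python
# Illustrative sketch: a BERTScore-style metric with an extra penalty for missed
# clinical terms. The term list and weighting are assumptions for demonstration,
# not the paper's CBERTScore.
from bert_score import score  # pip install bert-score

CLINICAL_TERMS = {"metoprolol", "hypertension", "50mg"}  # toy vocabulary

def clinical_penalty(reference: str, hypothesis: str, weight: float = 0.1) -> float:
    """Subtract `weight` for each clinical term in the reference that the
    ASR hypothesis failed to reproduce."""
    ref_tokens = set(reference.lower().split())
    hyp_tokens = set(hypothesis.lower().split())
    missed = (ref_tokens & CLINICAL_TERMS) - hyp_tokens
    return weight * len(missed)

def clinically_weighted_score(reference: str, hypothesis: str) -> float:
    _, _, f1 = score([hypothesis], [reference], lang="en", verbose=False)
    return float(f1[0]) - clinical_penalty(reference, hypothesis)

ref = "patient started on metoprolol 50mg for hypertension"
print(clinically_weighted_score(ref, "patient started on metoprolol 50mg for hypertension"))
print(clinically_weighted_score(ref, "patient started on methadone 50mg for hypertension"))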

TL;DR: The analysis of the learned attention maps to infer depth and occlusion indicates that attention enables learning a physically-grounded rendering, showing the promise of transformers as a universal modeling tool for graphics.
Abstract: We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages. (1) The view transformer leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on the neighboring views. (2) The ray transformer renders novel views using attention to decode the features from the view transformer along the sampled points during ray marching. Our experiments demonstrate that when optimized on a single scene, GNT can successfully reconstruct NeRF without an explicit rendering formula due to the learned ray renderer. When trained on multiple scenes, GNT consistently achieves state-of-the-art performance when transferring to unseen scenes and outperforms all other methods by ~10% on average. Our analysis of the learned attention maps to infer depth and occlusion indicates that attention enables learning a physically-grounded rendering. Our results show the promise of transformers as a universal modeling tool for graphics. Please refer to our project page for video results.
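The two-stage design described above lends itself to a compact attention sketch: one attention block fuses features gathered along epipolar lines across source views into a per-point feature, and a second attends over the samples along a ray to decode a color. The minimal PyTorch sketch below follows that structure; the layer sizes, tensor shapes, and module names are illustrative assumptions rather than the released GNT code.

```python
# Minimal PyTorch sketch of the two-stage attention idea in GNT; shapes and
# layer sizes are illustrative assumptions, not the official implementation.
import torch
import torch.nn as nn

class ViewTransformer(nn.Module):
    """Aggregates features sampled from epipolar lines on the source views
    into one coordinate-aligned feature per 3D sample point."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, epipolar_feats):           # (points, views, dim)
        q = self.query.expand(epipolar_feats.shape[0], -1, -1)
        fused, _ = self.attn(q, epipolar_feats, epipolar_feats)
        return fused.squeeze(1)                  # (points, dim)

class RayTransformer(nn.Module):
    """Attends over the samples along a ray and decodes a single RGB value,
    standing in for a hand-crafted volume-rendering integral."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, point_feats):              # (rays, samples, dim)
        mixed, _ = self.attn(point_feats, point_feats, point_feats)
        return torch.sigmoid(self.to_rgb(mixed.mean(dim=1)))  # (rays, 3)

# Toy forward pass: 2 rays x 8 samples per ray, each seen from 4 source views.
feats = torch.randn(2 * 8, 4, 64)
per_point = ViewTransformer()(feats).reshape(2, 8, 64)
rgb = RayTransformer()(per_point)
print(rgb.shape)  # torch.Size([2, 3])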

Proceedings ArticleDOI
13 Mar 2023
TL;DR: This paper develops dysarthric speech intelligibility classifiers trained on 551,176 disordered speech samples contributed by a diverse set of 468 speakers with a range of self-reported speaking disorders, each sample rated for overall intelligibility on a five-point scale.
Abstract: We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found the models to generalize well (without further training) on the TORGO database (100% accuracy), UASpeech (0.93 correlation), and ALS-TDI PMP (0.81 AUC) datasets, as well as on a dataset of realistic unprompted speech we gathered (106 dysarthric and 76 control speakers, ~2,300 samples).
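The abstract quotes three different generalization metrics (accuracy, correlation, AUC). The sketch below shows how each is typically computed with scikit-learn and scipy on toy labels and scores; the toy data and the pairing of metrics with datasets follow the abstract but are otherwise assumptions for illustration only, not the paper's evaluation pipeline.

```python
# Illustrative only: the three generalization metrics named in the abstract,
# computed on toy data with scikit-learn / scipy.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, roc_auc_score

# Binary intelligible / not-intelligible labels, e.g. a TORGO-style evaluation.
print(accuracy_score([1, 0, 1, 1], [1, 0, 1, 1]))  # 1.0

# Five-point intelligibility ratings vs. model scores, e.g. UASpeech-style.
ratings = np.array([1, 2, 3, 4, 5, 5, 2, 3])
scores = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 0.8, 0.2, 0.6])
print(pearsonr(ratings, scores)[0])                # ~0.98

# Ranking dysarthric vs. control speakers, e.g. an ALS-TDI PMP-style AUC.
labels = [0, 0, 1, 1, 1]
probs = [0.2, 0.4, 0.3, 0.8, 0.9]
print(roc_auc_score(labels, probs))                # ~0.83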