Sankar Basu

Researcher at IBM

Publications - 28
Citations - 1134

Sankar Basu is an academic researcher at IBM. The author has contributed to research on speech processing and audio mining, has an h-index of 16, and has co-authored 28 publications that have received 1133 citations.

Papers
Patent

Method and apparatus for audio-visual speech detection and recognition

TL;DR: In this patent, the authors propose a speech recognition technique for audio-visual input that consists of processing a video signal associated with an arbitrary-content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal using at least the processed video signal to generate an output signal representative of the audio signal.
Patent

Methods and apparatus for audio-visual speaker recognition and utterance verification

TL;DR: In this patent, an identification and/or verification decision is made based on the processed audio signal and the processed video signal, which is referred to as unsupervised utterance verification.
Patent

Method and apparatus for active annotation of multimedia content

TL;DR: In this patent, the authors propose an annotation framework in which supervised training with partially labeled data is facilitated by active learning, which propagates labels to unlabeled data and makes it much easier for the user to annotate large amounts of multimedia content.
Patent

Adaptive probabilistic query expansion

TL;DR: In this patent, an expanding operation expands the query into sub-queries, at least one of which is expanded probabilistically, and an adapting operation modifies the search so that the relevance of the search result increases when the search is repeated.
Proceedings Article

A cascade image transform for speaker independent automatic speechreading

TL;DR: This paper presents a three-stage pixel-based visual front end for automatic speechreading (lipreading) that improves recognition of spoken words or phonemes, with significant classification accuracy gains from each added stage; combined, the stages can yield up to a 27% improvement.