Author

Vikas Joshi

Bio: Vikas Joshi is an academic researcher from IBM. The author has contributed to research in the topics of Speaker recognition and Noise. The author has an h-index of 5 and has co-authored 27 publications receiving 103 citations. Previous affiliations of Vikas Joshi include the Indian Institute of Technology Madras.

Papers
Proceedings Article
25 Jul 2015
TL;DR: This work develops a theoretical formulation for sampling Twitter data, introduces novel statistical metrics to quantify the statistical representativeness of a Tweet sample, and derives sufficient conditions on the number of samples needed to obtain highly representative Tweet samples.
Abstract: The daily volume of Tweets on Twitter is around 500 million, and the impact of this data on applications ranging from public safety and opinion mining to news broadcasting is increasing day by day. Analyzing large volumes of Tweets for various applications requires techniques that scale well with the number of Tweets. In this work we develop a theoretical formulation for sampling Twitter data. We introduce novel statistical metrics to quantify the statistical representativeness of the Tweet sample, and derive sufficient conditions on the number of samples needed for obtaining highly representative Tweet samples. These new statistical metrics quantify the representativeness or goodness of the sample in terms of frequent keyword identification and in terms of restoring the public sentiment associated with these keywords. We use uniform random sampling with replacement as our algorithm, and sampling could serve as a first step before using other, more sophisticated summarization methods to generate summaries for human use. We show that experiments conducted on real Twitter data agree with our bounds. In these experiments, we also compare different kinds of random sampling algorithms. Our bounds are attractive since they do not depend on the total number of Tweets in the universe. Although our ideas and techniques are specific to Twitter, they could find applications in other areas as well.
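
The sampling algorithm the paper analyzes is plain uniform random sampling with replacement, followed by frequent-keyword estimation on the sample. Below is a minimal Python sketch of that pipeline; the function names and the 1% frequency threshold are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

def sample_tweets(tweets, n_samples, seed=None):
    """Uniform random sampling with replacement.

    The paper's bounds depend only on n_samples, not on len(tweets),
    which is what makes the approach attractive at Twitter scale.
    """
    rng = random.Random(seed)
    return [rng.choice(tweets) for _ in range(n_samples)]

def keyword_frequencies(sampled_tweets):
    """Estimate per-keyword frequencies from the sample; frequent keywords
    in the sample approximate frequent keywords in the full stream."""
    counts = Counter()
    for tweet in sampled_tweets:
        counts.update(set(tweet.lower().split()))  # one count per tweet
    total = len(sampled_tweets)
    return {word: c / total for word, c in counts.items()}

# Hypothetical usage: flag keywords appearing in more than 1% of tweets.
tweets = ["Traffic jam on highway 5", "Love the new park", "Traffic is bad today"]
sample = sample_tweets(tweets, n_samples=1000, seed=42)
frequent = {w: f for w, f in keyword_frequencies(sample).items() if f > 0.01}
```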

14 citations

Proceedings Article
01 Jan 2011
TL;DR: A novel modification of the Histogram Equalization approach to robust speech recognition is described, known as Sub-band Histogram Equalization (S-HEQ), which yields better equalization of the sub-bands as well as of the overall cepstral histogram.
Abstract: This paper describes a novel modification of the Histogram Equalization approach to robust speech recognition. We propose separate equalization of the high-frequency and low-frequency bands. We study different combinations of sub-band equalization and obtain the best results when we perform a two-stage equalization. First, conventional Histogram Equalization (HEQ) is performed on the cepstral features; this does not completely equalize the high-frequency and low-frequency bands, even though the overall histogram equalization is good. In the second stage, equalization is done separately on the high-frequency and low-frequency components of the equalized cepstra. We refer to this approach as Sub-band Histogram Equalization (S-HEQ). The new set of features has better equalization of the sub-bands as well as of the overall cepstral histogram. Recognition results show relative improvements of 12% and 15% over conventional HEQ on the Aurora-2 and Aurora-4 databases, respectively.
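
The core HEQ operation maps each feature dimension through its empirical CDF onto a reference Gaussian distribution. The sketch below applies that operation in the two stages the abstract describes; since the abstract does not specify how the cepstra are split into high- and low-frequency bands, the simple partition of cepstral dimensions here is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(features):
    """Core HEQ: map each feature dimension to a reference Gaussian
    via its empirical CDF (features is a T x D array)."""
    T, D = features.shape
    equalized = np.empty((T, D), dtype=float)
    for d in range(D):
        ranks = np.argsort(np.argsort(features[:, d]))
        cdf = (ranks + 0.5) / T           # empirical CDF values in (0, 1)
        equalized[:, d] = norm.ppf(cdf)   # inverse Gaussian CDF
    return equalized

def sub_band_heq(cepstra, split=None):
    """Two-stage S-HEQ sketch: full-band HEQ first, then separate
    equalization of two sub-bands of the equalized cepstra. Splitting by
    cepstral dimension is an illustrative assumption, not the paper's method."""
    stage1 = histogram_equalize(cepstra)
    split = split if split is not None else cepstra.shape[1] // 2
    low = histogram_equalize(stage1[:, :split])
    high = histogram_equalize(stage1[:, split:])
    return np.hstack([low, high])
```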

14 citations

Posted Content
Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li
TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing a 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Abstract: Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems to transfer knowledge from a source to a target language; it is typically done by initializing the target language acoustic model (AM) with the source language AM. TL can also be applied to end-to-end (E2E) ASR systems such as recurrent neural network transducer (RNN-T) models, by initializing the encoder and/or prediction network of the target language with pre-trained models from the source language. Several transfer learning strategies exist in the case of the RNN-T framework, depending upon the choice of the initialization model for the encoder and prediction networks. This paper presents a comparative study of four different TL methods for the RNN-T framework. We show 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model. We also study the impact of TL with varying amounts of training data ranging from 50 hours to 1000 hours and show the efficacy of TL for languages with small amounts of training data.
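
In implementation terms, each of these TL strategies amounts to selectively copying pre-trained source-language weights into the target RNN-T before training. A rough PyTorch sketch, assuming the checkpoint is a plain state dict and the model exposes submodules named "encoder" and "predictor" (both names are hypothetical):

```python
import torch

def init_rnnt_from_source(target_model, source_ckpt_path,
                          init_encoder=True, init_predictor=False):
    """Initialize a target-language RNN-T from source-language weights.

    Toggling init_encoder / init_predictor yields the different TL
    strategies compared in the paper (encoder only, prediction network
    only, both, or neither, i.e., random initialization).
    """
    source_state = torch.load(source_ckpt_path, map_location="cpu")
    target_state = target_model.state_dict()
    for name, tensor in source_state.items():
        wanted = ((init_encoder and name.startswith("encoder."))
                  or (init_predictor and name.startswith("predictor.")))
        if wanted and name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor  # copy source-language weights
    target_model.load_state_dict(target_state)
    return target_model
```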

13 citations

Posted Content
TL;DR: It is shown that fine-tuning ASR models on code-switched speech harms performance on monolingual speech, and the Learning Without Forgetting (LWF) framework is proposed for code-switched ASR when the authors only have access to a monolingual model and do not have the data it was trained on.
Abstract: Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both of the languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech. We point out the need to optimize models for code-switching while also ensuring that monolingual performance is not sacrificed. Monolingual models may be trained on thousands of hours of speech, which may not be available for re-training a new model. We propose using the Learning Without Forgetting (LWF) framework for code-switched ASR when we only have access to a monolingual model and do not have the data it was trained on. We show that it is possible to train models using this framework that perform well on both code-switched and monolingual test sets. In cases where we have access to monolingual training data as well, we propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy. We report improvements in Word Error Rate (WER) on monolingual and code-switched test sets compared to baselines that use pooled data and simple fine-tuning.
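
In the LWF setup described here, the model is fine-tuned on code-switched data while a distillation term keeps its outputs close to those of the frozen monolingual model, which stands in for the unavailable monolingual training data. A hedged PyTorch sketch of such a combined loss; the weight and temperature values are illustrative, not the paper's settings.

```python
import torch.nn.functional as F

def lwf_asr_loss(student_logits, teacher_logits, asr_loss,
                 lwf_weight=0.5, temperature=2.0):
    """Combined loss for Learning Without Forgetting in ASR (a sketch).

    student_logits: outputs of the model being fine-tuned on code-switched data
    teacher_logits: outputs of the frozen monolingual model on the same batch
    asr_loss:       the usual ASR training loss (e.g., CTC or cross-entropy)
    """
    distill = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard temperature scaling for distillation
    return asr_loss + lwf_weight * distill
```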

12 citations

Proceedings ArticleDOI
Vikas Joshi, Amit Das, Eric Sun, Rupesh R. Mehta, Jinyu Li, Yifan Gong
30 Aug 2021

12 citations


Cited by
Posted Content
TL;DR: This work proposes the Learning without Forgetting method, which uses only new-task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
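
For reference, a single LwF training step in the original multi-head classification setting could look like the sketch below: the frozen old network's responses to the new-task images serve as soft targets for the old-task head, while the new head learns from ground-truth labels. The dict-of-heads model interface and the hyperparameter values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def lwf_step(model, old_model, images, new_labels,
             temperature=2.0, lambda_old=1.0):
    """One Learning-without-Forgetting step (a sketch).

    Both models are assumed to return {"old": logits, "new": logits};
    no old-task data is needed anywhere in this step.
    """
    with torch.no_grad():
        old_targets = old_model(images)["old"]   # recorded old-task responses
    outputs = model(images)
    new_loss = F.cross_entropy(outputs["new"], new_labels)
    old_loss = F.kl_div(
        F.log_softmax(outputs["old"] / temperature, dim=-1),
        F.softmax(old_targets / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return new_loss + lambda_old * old_loss
```

The published method uses a temperature-scaled cross-entropy for the old-task term; the KL form above differs from it only by a term that is constant in the model parameters, so the gradients match, and it is a common implementation choice.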

1,037 citations

01 Jan 2016

Statistical Methods for Environmental Pollution Monitoring

624 citations

Journal ArticleDOI
TL;DR: In this article, the authors explored the limits of open innovation by extracting evidence from user-generated content (UGC) on Twitter using social media mining, and found that open innovation is the main driver of change in a business sector that needs to be flexible and resilient, adapting rapidly to change through innovation.

98 citations

Posted Content
TL;DR: This survey reviews computational approaches for code-switched Speech and Natural Language Processing, including language processing tools and end-to-end systems, and concludes with future directions and open problems in the field.
Abstract: Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched Speech and Natural Language Processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.

86 citations

Journal ArticleDOI
Melissa A. Collins
TL;DR: In this article, the authors overview the recent advances in E2E models for automatic speech recognition, focusing on technologies that address the practical challenges of commercial deployment from the industry's perspective.
Abstract: Recently, the speech community has been seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art results on most ASR accuracy benchmarks, hybrid models are still used in a large proportion of commercial ASR systems at the current time. There are many practical factors that affect the production model deployment decision. Traditional hybrid models, having been optimized for production for decades, are usually good at these factors. Without providing excellent solutions to all these factors, it is hard for E2E models to be widely commercialized. In this paper, we overview the recent advances in E2E models, focusing on technologies addressing those challenges from the industry's perspective.

52 citations