Author

Vikas Joshi

Bio: Vikas Joshi is an academic researcher from IBM. The author has contributed to research in the topics of Speaker recognition and Noise. The author has an h-index of 5 and has co-authored 27 publications receiving 103 citations. Previous affiliations of Vikas Joshi include the Indian Institute of Technology Madras.

Papers
Proceedings Article
25 Jul 2015
TL;DR: This work develops a theoretical formulation for sampling Twitter data, introduces novel statistical metrics to quantify the statistical representativeness of a Tweet sample, and derives sufficient conditions on the number of samples needed to obtain highly representative Tweet samples.
Abstract: The daily volume of Tweets on Twitter is around 500 million, and the impact of this data on applications ranging from public safety and opinion mining to news broadcasting is increasing day by day. Analyzing large volumes of Tweets for various applications requires techniques that scale well with the number of Tweets. In this work we develop a theoretical formulation for sampling Twitter data. We introduce novel statistical metrics to quantify the statistical representativeness of the Tweet sample, and derive sufficient conditions on the number of samples needed for obtaining highly representative Tweet samples. These new statistical metrics quantify the representativeness or goodness of the sample in terms of frequent keyword identification and in terms of restoring the public sentiment associated with these keywords. We use uniform random sampling with replacement as our algorithm, and sampling could serve as a first step before using other, more sophisticated summarization methods to generate summaries for human use. We show that experiments conducted on real Twitter data agree with our bounds. In these experiments, we also compare different kinds of random sampling algorithms. Our bounds are attractive since they do not depend on the total number of Tweets in the universe. Although our ideas and techniques are specific to Twitter, they could find applications in other areas as well.
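
The sampling algorithm the paper analyzes is plain uniform random sampling with replacement, followed by frequent-keyword estimation on the sample. Below is a minimal Python sketch of that pipeline; the function names and the 1% frequency threshold are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

def sample_tweets(tweets, n_samples, seed=None):
    """Uniform random sampling with replacement.

    The paper's bounds depend only on n_samples, not on len(tweets),
    which is what makes the approach attractive at Twitter scale.
    """
    rng = random.Random(seed)
    return [rng.choice(tweets) for _ in range(n_samples)]

def keyword_frequencies(sampled_tweets):
    """Estimate per-keyword frequencies from the sample; frequent keywords
    in the sample approximate frequent keywords in the full stream."""
    counts = Counter()
    for tweet in sampled_tweets:
        counts.update(set(tweet.lower().split()))  # one count per tweet
    total = len(sampled_tweets)
    return {word: c / total for word, c in counts.items()}

# Hypothetical usage: flag keywords appearing in more than 1% of tweets.
tweets = ["Traffic jam on highway 5", "Love the new park", "Traffic is bad today"]
sample = sample_tweets(tweets, n_samples=1000, seed=42)
frequent = {w: f for w, f in keyword_frequencies(sample).items() if f > 0.01}
```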

14 citations

Proceedings Article
01 Jan 2011
TL;DR: A novel modification of the Histogram Equalization approach to robust speech recognition is described, known as Sub-band Histogram Equalization (S-HEQ), which yields better equalization of the sub-bands as well as of the overall cepstral histogram.
Abstract: This paper describes a novel modification of the Histogram Equalization approach to robust speech recognition. We propose separate equalization of the high-frequency and low-frequency bands. We study different combinations of sub-band equalization and obtain the best results when we perform a two-stage equalization. First, conventional Histogram Equalization (HEQ) is performed on the cepstral features; this does not completely equalize the high-frequency and low-frequency bands, even though the overall histogram equalization is good. In the second stage, equalization is done separately on the high-frequency and low-frequency components of the equalized cepstra. We refer to this approach as Sub-band Histogram Equalization (S-HEQ). The new set of features has better equalization of the sub-bands as well as of the overall cepstral histogram. Recognition results show relative improvements of 12% and 15% over conventional HEQ on the Aurora-2 and Aurora-4 databases, respectively.
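
The core HEQ operation maps each feature dimension through its empirical CDF onto a reference Gaussian distribution. The sketch below applies that operation in the two stages the abstract describes; since the abstract does not specify how the cepstra are split into high- and low-frequency bands, the simple partition of cepstral dimensions here is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(features):
    """Core HEQ: map each feature dimension to a reference Gaussian
    via its empirical CDF (features is a T x D array)."""
    T, D = features.shape
    equalized = np.empty((T, D), dtype=float)
    for d in range(D):
        ranks = np.argsort(np.argsort(features[:, d]))
        cdf = (ranks + 0.5) / T           # empirical CDF values in (0, 1)
        equalized[:, d] = norm.ppf(cdf)   # inverse Gaussian CDF
    return equalized

def sub_band_heq(cepstra, split=None):
    """Two-stage S-HEQ sketch: full-band HEQ first, then separate
    equalization of two sub-bands of the equalized cepstra. Splitting by
    cepstral dimension is an illustrative assumption, not the paper's method."""
    stage1 = histogram_equalize(cepstra)
    split = split if split is not None else cepstra.shape[1] // 2
    low = histogram_equalize(stage1[:, :split])
    high = histogram_equalize(stage1[:, split:])
    return np.hstack([low, high])
```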

14 citations

Posted Content
Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li
TL;DR: This paper presents a comparative study of four different TL methods for the RNN-T framework, showing a 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model and demonstrating the efficacy of TL for languages with small amounts of training data.
Abstract: Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems to transfer knowledge from a source to a target language; it is typically done by initializing the target language acoustic model (AM) with the source language AM. TL can also be applied to end-to-end (E2E) ASR systems such as recurrent neural network transducer (RNN-T) models, by initializing the encoder and/or prediction network of the target language with pre-trained models from the source language. Several transfer learning strategies exist in the case of the RNN-T framework, depending upon the choice of the initialization model for the encoder and prediction networks. This paper presents a comparative study of four different TL methods for the RNN-T framework. We show 17% relative word error rate reduction with different TL methods over a randomly initialized RNN-T model. We also study the impact of TL with varying amounts of training data ranging from 50 hours to 1000 hours and show the efficacy of TL for languages with small amounts of training data.
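
In implementation terms, each of these TL strategies amounts to selectively copying pre-trained source-language weights into the target RNN-T before training. A rough PyTorch sketch, assuming the checkpoint is a plain state dict and the model exposes submodules named "encoder" and "predictor" (both names are hypothetical):

```python
import torch

def init_rnnt_from_source(target_model, source_ckpt_path,
                          init_encoder=True, init_predictor=False):
    """Initialize a target-language RNN-T from source-language weights.

    Toggling init_encoder / init_predictor yields the different TL
    strategies compared in the paper (encoder only, prediction network
    only, both, or neither, i.e., random initialization).
    """
    source_state = torch.load(source_ckpt_path, map_location="cpu")
    target_state = target_model.state_dict()
    for name, tensor in source_state.items():
        wanted = ((init_encoder and name.startswith("encoder."))
                  or (init_predictor and name.startswith("predictor.")))
        if wanted and name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor  # copy source-language weights
    target_model.load_state_dict(target_state)
    return target_model
```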

13 citations

Posted Content
TL;DR: It is shown that fine-tuning ASR models on code-switched speech harms performance on monolingual speech, and the Learning Without Forgetting (LWF) framework is proposed for code-switched ASR when the authors only have access to a monolingual model and do not have the data it was trained on.
Abstract: Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both of the languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech. We point out the need to optimize models for code-switching while also ensuring that monolingual performance is not sacrificed. Monolingual models may be trained on thousands of hours of speech, which may not be available for re-training a new model. We propose using the Learning Without Forgetting (LWF) framework for code-switched ASR when we only have access to a monolingual model and do not have the data it was trained on. We show that it is possible to train models using this framework that perform well on both code-switched and monolingual test sets. In cases where we have access to monolingual training data as well, we propose regularization strategies for fine-tuning models for code-switching without sacrificing monolingual accuracy. We report improvements in Word Error Rate (WER) on monolingual and code-switched test sets compared to baselines that use pooled data and simple fine-tuning.
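
In the LWF setup described here, the model is fine-tuned on code-switched data while a distillation term keeps its outputs close to those of the frozen monolingual model, which stands in for the unavailable monolingual training data. A hedged PyTorch sketch of such a combined loss; the weight and temperature values are illustrative, not the paper's settings.

```python
import torch.nn.functional as F

def lwf_asr_loss(student_logits, teacher_logits, asr_loss,
                 lwf_weight=0.5, temperature=2.0):
    """Combined loss for Learning Without Forgetting in ASR (a sketch).

    student_logits: outputs of the model being fine-tuned on code-switched data
    teacher_logits: outputs of the frozen monolingual model on the same batch
    asr_loss:       the usual ASR training loss (e.g., CTC or cross-entropy)
    """
    distill = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard temperature scaling for distillation
    return asr_loss + lwf_weight * distill
```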

12 citations

Proceedings ArticleDOI
Vikas Joshi, Amit Das, Eric Sun, Rupesh R. Mehta, Jinyu Li, Yifan Gong
30 Aug 2021

12 citations


Cited by
Posted Content
TL;DR: This work proposes the Learning without Forgetting method, which uses only new-task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
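
For reference, a single LwF training step in the original multi-head classification setting could look like the sketch below: the frozen old network's responses to the new-task images serve as soft targets for the old-task head, while the new head learns from ground-truth labels. The dict-of-heads model interface and the hyperparameter values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def lwf_step(model, old_model, images, new_labels,
             temperature=2.0, lambda_old=1.0):
    """One Learning-without-Forgetting step (a sketch).

    Both models are assumed to return {"old": logits, "new": logits};
    no old-task data is needed anywhere in this step.
    """
    with torch.no_grad():
        old_targets = old_model(images)["old"]   # recorded old-task responses
    outputs = model(images)
    new_loss = F.cross_entropy(outputs["new"], new_labels)
    old_loss = F.kl_div(
        F.log_softmax(outputs["old"] / temperature, dim=-1),
        F.softmax(old_targets / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return new_loss + lambda_old * old_loss
```

The published method uses a temperature-scaled cross-entropy for the old-task term; the KL form above differs from it only by a term that is constant in the model parameters, so the gradients match, and it is a common implementation choice.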

1,037 citations

01 Jan 2016

Statistical Methods for Environmental Pollution Monitoring

624 citations

Journal ArticleDOI
TL;DR: In this article, the authors explored the limits of open innovation by extracting evidence from user-generated content (UGC) on Twitter using social media mining, and found that open innovation is the main driver of change in a business sector that needs to be flexible and resilient, adapting rapidly to change through innovation.

98 citations

Posted Content
TL;DR: This survey reviews computational approaches for code-switched Speech and Natural Language Processing, including language processing tools and end-to-end systems, and concludes with future directions and open problems in the field.
Abstract: Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched Speech and Natural Language Processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.

86 citations

Journal ArticleDOI
Melissa A. Collins
TL;DR: In this article, the authors overview the recent advances in E2E models for automatic speech recognition, focusing on technologies that address the practical challenges of commercial deployment from the industry's perspective.
Abstract: Recently, the speech community has been seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art results on most ASR accuracy benchmarks, hybrid models are still used in a large proportion of commercial ASR systems at the current time. There are many practical factors that affect the production model deployment decision. Traditional hybrid models, having been optimized for production for decades, are usually good at these factors. Without providing excellent solutions to all these factors, it is hard for E2E models to be widely commercialized. In this paper, we overview the recent advances in E2E models, focusing on technologies addressing those challenges from the industry's perspective.

52 citations