Author

Nitendra Rajput

Bio: Nitendra Rajput is an academic researcher at IBM. The author has contributed to research on topics including mobile devices and speech processing, has an h-index of 24, and has co-authored 164 publications that have received 2,169 citations. Previous affiliations of Nitendra Rajput include Indian Institutes of Technology and Nuance Communications.


Papers
Proceedings Article
T.A. Faruquie, Chalapathy Neti, Nitendra Rajput, L. V. Subramaniam, Ashish Verma
31 Jan 2000
TL;DR: This work presents a novel scheme to implement a language independent system for audio-driven facial animation given a speech recognition system for just one language, in this case, English.
Abstract: Audio-driven facial animation is an interesting and evolving technique for human-computer interaction. Based on an incoming audio stream, a face image is animated with full lip synchronization. This requires a speech recognition system in the language in which audio is provided to get the time alignment for the phonetic sequence of the audio signal. However, building a speech recognition system is data intensive and is a very tedious and time consuming task. We present a novel scheme to implement a language independent system for audio-driven facial animation given a speech recognition system for just one language, in our case, English. The method presented here can also be used for text to audio-visual speech synthesis.

150 citations

Proceedings Article
Arun Kumar, Nitendra Rajput, Dipanjan Chakraborty, Sheetal K. Agarwal, Amit A. Nanavati
27 Aug 2007
TL;DR: The World Wide Telecom Web (WWTW) is a network of interconnected voice sites, which are voice-driven applications created by users and hosted in the network. It has the potential to enable the underprivileged population to become a part of the next-generation converged networked world.
Abstract: The World Wide Web (WWW) enabled quick and easy information dissemination and brought about fundamental changes to various aspects of our lives. However, a very large number of people, mostly in developing regions, are still untouched by this revolution. Compared to PCs, the primary access mechanism to the WWW, mobile phones have made a phenomenal penetration into this population segment. Low cost of ownership, a simple user interface consisting of a small keyboard, a limited menu, and voice-based access contribute to the success of mobile phones with the less literate. However, apart from basic voice communication, these people are not able to exploit the benefits of information and services available to WWW users. In this paper, we present the World Wide Telecom Web (WWTW), our vision of a voice-driven ecosystem parallel to that of the WWW. WWTW is a network of interconnected voice sites, which are voice-driven applications created by users and hosted in the network. It has the potential to enable the underprivileged population to become a part of the next-generation converged networked world. We present a whole gamut of existing technology enablers for our vision, as well as research directions and open challenges that need to be solved not only to realize a WWTW but also to enable the two Webs to cross-leverage each other.

107 citations

Proceedings Article
04 Apr 2009
TL;DR: A study comparing speech and dialed input voice user interfaces for farmers in Gujarat, India, found that task completion rates were significantly higher with dialed input, particularly for subjects under age 30 and those with less than an eighth-grade education.
Abstract: In this paper we present a study comparing speech and dialed input voice user interfaces for farmers in Gujarat, India. We ran a controlled, between-subjects experiment with 45 participants. We found that the task completion rates were significantly higher with dialed input, particularly for subjects under age 30 and those with less than an eighth grade education. Additionally, participants using dialed input demonstrated a significantly greater performance improvement from the first to final task, and reported less difficulty providing input to the system.

100 citations

Journal Article
TL;DR: This paper presents two new techniques that have been used to build a large-vocabulary continuous Hindi speech recognition system and proposes a hybrid approach that combines rule-based and statistical approaches in a two-step fashion.
Abstract: In this paper we present two new techniques that have been used to build a large-vocabulary continuous Hindi speech recognition system. We present a technique for fast bootstrapping of initial phone models of a new language. The training data for the new language is aligned using an existing speech recognition engine for another language. This aligned data is used to obtain the initial acoustic models for the phones of the new language. Following this approach requires less training data. We also present a technique for generating baseforms (phonetic spellings) for phonetic languages such as Hindi. As is inherent in phonetic languages, rules generally capture the mapping of spelling to phonemes very well. However, deep linguistic knowledge is required to write all possible rules, and there are some ambiguities in the language that are difficult to capture with rules. On the other hand, purely statistical techniques for baseform generation require large amounts of training data that are not readily available. We propose a hybrid approach that combines rule-based and statistical approaches in a two-step fashion. We evaluate the performance of the proposed approaches through various phonetic classification and recognition experiments.

96 citations

Patent
Nitendra Rajput, Kundan Shrivastava
30 Aug 2011
TL;DR: A method for accessing a specific location in voice site audio content by indexing that location in a voice site index, mapping the audio content to information about the location, and adding the mapped content to the index of the voice site.
Abstract: A method, an apparatus and an article of manufacture for accessing a specific location in voice site audio content. The method includes indexing, in a voice site index, a specific location in the voice site that contains the audio content, mapping the audio content with information regarding the location and adding the mapped content to the index of the voice site, using the index to determine content and location of an input query in the voice site, automatically marking the specific location in the voice site that contains the determined content and location of the input query, and automatically transferring to the marked location in the voice site.

89 citations


Cited by
Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Patent
28 Sep 2012
TL;DR: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent, reduces the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input.
Abstract: A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

593 citations

Book Chapter
22 Oct 2005
TL;DR: The paper focuses on several issues involved implicitly in the whole interactive feedback loop and discusses various methods for each issue in order to examine the state of the art.
Abstract: Affective computing is currently one of the most active research topics and is receiving increasingly intensive attention. This strong interest is driven by a wide spectrum of promising applications in many areas such as virtual reality, smart surveillance, perceptual interfaces, etc. Affective computing draws on a multidisciplinary knowledge background that includes psychology, cognitive science, physiology, and computer science. The paper focuses on several issues involved implicitly in the whole interactive feedback loop. Various methods for each issue are discussed in order to examine the state of the art. Finally, some research challenges and future directions are also discussed.

435 citations

Journal Article
TL;DR: This paper presents a survey of automatic speech recognition (ASR) for under-resourced languages, together with a literature review of recent contributions.

435 citations

Posted Content
TL;DR: The authors conducted a systematic review of articles on the Base/Bottom of the Pyramid (BOP) concept and identified 104 articles published in journals or proceedings over a ten-year period (2000-2009).
Abstract: In 1998-1999, Prahalad and colleagues introduced the Base/Bottom of the Pyramid (BOP) concept in an article and a working paper. This article’s goal is to answer the question: What has become of the concept over the decade following its first systematic exposition in 1999? To answer this question, the authors conducted a systematic review of articles on the BOP, identifying 104 articles published in journals or proceedings over a ten-year period (2000-2009). This count excludes books, chapters, and teaching cases. The review shows that the BOP concept evolved dramatically following Prahalad’s original call to multinational enterprises (MNEs). De-emphasizing the role of MNEs over time, published BOP articles portray a more complex picture, with wide variations in terms of BOP contexts, of BOP initiatives, and of impacts of the BOP approach. A simple framework for organizing the reviewed articles helps discuss findings, identify the gaps that still exist in the literature, and suggest directions for future research.

400 citations