
Showing papers by "Andrew Y. Ng" published in 2014


Posted Content
TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

1,761 citations
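
A minimal sketch of the end-to-end recipe the abstract describes: spectrogram frames go into a recurrent network that emits per-frame character probabilities, trained with a sequence loss (CTC). This assumes PyTorch; the layer types, sizes, and names are illustrative stand-ins, not Deep Speech's actual architecture or multi-GPU training system.

```python
# Sketch only: an RNN over spectrogram frames trained with CTC, in the
# spirit of the paper. Sizes, layer choices, and names are assumptions.
import torch
import torch.nn as nn

class SpeechRNN(nn.Module):
    def __init__(self, n_features=161, n_hidden=256, n_chars=29):
        super().__init__()
        # n_chars = alphabet plus the CTC blank (index 0 by convention here)
        self.rnn = nn.GRU(n_features, n_hidden, num_layers=3,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * n_hidden, n_chars)

    def forward(self, x):                    # x: (batch, time, n_features)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(-1)   # per-frame character log-probs

model = SpeechRNN()
ctc = nn.CTCLoss(blank=0)

# Toy batch: 2 utterances of 100 spectrogram frames, 20-character targets.
x = torch.randn(2, 100, 161)
targets = torch.randint(1, 29, (2, 20))       # 0 is reserved for blank
log_probs = model(x).transpose(0, 1)          # CTCLoss expects (time, batch, chars)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100),
           target_lengths=torch.full((2,), 20))
loss.backward()                               # end-to-end gradient to all layers
```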


Journal ArticleDOI
TL;DR: The DT-RNN model, which uses dependency trees to embed sentences into a vector space in order to retrieve images described by those sentences, outperforms other recursive and recurrent neural networks, kernelized CCA and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa.
Abstract: Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to abstract from the details of word order and syntactic expression. DT-RNNs outperform other recursive and recurrent neural networks, kernelized CCA and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa. They also give more similar representations to sentences that describe the same image.

916 citations
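
A much-simplified sketch of the idea: compose a sentence vector bottom-up over a dependency tree, then retrieve images by inner product. The single shared child transform and fan-in normalization below are illustrative assumptions; the paper's composition is conditioned on the dependency structure more finely, and maps image features into the same space with a learned layer.

```python
# Sketch only: dependency-tree composition with one shared child matrix
# (an assumption, not the paper's parameterization), then image ranking
# by inner-product similarity.
import numpy as np

rng = np.random.default_rng(0)
d = 50                                        # embedding dimension
W_word = rng.normal(scale=0.1, size=(d, d))   # transform for the head word
W_child = rng.normal(scale=0.1, size=(d, d))  # shared transform for dependents

def compose(node, embeddings):
    """node = (word_index, [child nodes]); returns the node's vector."""
    word, children = node
    total = W_word @ embeddings[word]
    for child in children:
        total += W_child @ compose(child, embeddings)
    return np.tanh(total / (1 + len(children)))  # normalize by fan-in

def rank_images(sentence_vec, image_vecs):
    scores = image_vecs @ sentence_vec           # inner-product similarity
    return np.argsort(-scores)                   # best-matching images first

# Toy example: "dog chases ball", with "chases" as the root.
embeddings = rng.normal(size=(3, d))             # rows: dog=0, chases=1, ball=2
tree = (1, [(0, []), (2, [])])
sentence_vec = compose(tree, embeddings)
image_vecs = rng.normal(size=(10, d))            # stand-ins for mapped image features
print(rank_images(sentence_vec, image_vecs)[:3])
```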


Posted Content
TL;DR: This paper demonstrates that a straightforward recurrent neural network architecture can achieve a high level of accuracy and proposes and evaluates a modified prefix-search decoding algorithm that enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems.
Abstract: We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This paper extends this approach in two ways. First, we demonstrate that a straightforward recurrent neural network architecture can achieve a high level of accuracy. Second, we propose and evaluate a modified prefix-search decoding algorithm. This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems. Experiments on the Wall Street Journal corpus demonstrate fairly competitive word error rates, and the importance of bi-directional network recurrence.

172 citations
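
A compact sketch of CTC prefix beam search with a pluggable character-level language model, in the spirit of the first-pass decoder the abstract describes. The `lm` hook, alphabet layout, and pruning rule are illustrative assumptions; the paper's decoder differs in its LM integration details.

```python
# Sketch only: simplified CTC prefix beam search. Each prefix keeps two
# probabilities: ending in blank (p_b) and ending in a non-blank (p_nb).
import numpy as np
from collections import defaultdict

def prefix_beam_search(probs, alphabet, blank=0, beam_width=8,
                       lm=lambda prefix, c: 1.0, alpha=0.8):
    """probs: (T, V) per-frame label probabilities; column `blank` is the
    CTC blank; alphabet[v] is the character for a non-blank index v."""
    p_b, p_nb = defaultdict(float), defaultdict(float)
    p_b[""] = 1.0
    beams = [""]
    for t in range(probs.shape[0]):
        b_, nb_ = defaultdict(float), defaultdict(float)
        for prefix in beams:
            for v in range(probs.shape[1]):
                p = probs[t, v]
                if v == blank:
                    b_[prefix] += p * (p_b[prefix] + p_nb[prefix])
                    continue
                c = alphabet[v]
                ext = prefix + c
                lm_w = lm(prefix, c) ** alpha     # LM-weighted extension
                if prefix and c == prefix[-1]:
                    nb_[prefix] += p * p_nb[prefix]      # repeat collapses
                    nb_[ext] += p * p_b[prefix] * lm_w   # blank separated it
                else:
                    nb_[ext] += p * (p_b[prefix] + p_nb[prefix]) * lm_w
        candidates = set(b_) | set(nb_)
        beams = sorted(candidates, key=lambda s: b_[s] + nb_[s],
                       reverse=True)[:beam_width]
        p_b, p_nb = b_, nb_
    return beams[0]

# Toy run over 4 frames and alphabet {blank, 'a', 'b'}.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.6, 0.2, 0.2],
                  [0.1, 0.2, 0.7]])
print(prefix_beam_search(probs, alphabet={1: "a", 2: "b"}))
```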


Posted Content
TL;DR: An empirical investigation into which aspects of DNN acoustic model design are most important for speech recognition system performance, suggesting that a relatively simple DNN architecture and optimization technique produces strong results.
Abstract: Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation on which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments use the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine performance of large DNN models -- with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produces strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.

82 citations
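
A minimal sketch of the DNN hybrid acoustic model this study varies: a plain feedforward network mapping a context window of acoustic frames to senone (tied HMM state) posteriors, trained with cross-entropy against forced-alignment targets. This assumes PyTorch, and the sizes are illustrative; the paper sweeps depth, width, architecture, and loss, up to far larger models.

```python
# Sketch only: a feedforward hybrid acoustic model. All sizes are
# assumptions, not the configurations swept in the paper.
import torch
import torch.nn as nn

def make_dnn(context_frames=15, n_mel=40, hidden=2048,
             layers=5, n_senones=9000):
    dims = [context_frames * n_mel] + [hidden] * layers
    net = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        net += [nn.Linear(d_in, d_out), nn.ReLU()]
    net.append(nn.Linear(hidden, n_senones))   # senone logits
    return nn.Sequential(*net)

model = make_dnn()
x = torch.randn(32, 15 * 40)               # batch of stacked frame windows
labels = torch.randint(0, 9000, (32,))     # forced-alignment senone targets
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()
```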


Journal ArticleDOI
TL;DR: This paper presents a hand designed for minimalistic dexterous manipulation, in which every stage of the design process also considered manufacturing cost.
Abstract: Historically, robotic hand research has tended to focus on two areas: severely underactuated hands, and high-degree-of-freedom fully actuated hands. Comparatively little research has been done in between those spaces. Furthermore, despite the large number of robotic hand designs that have been proposed in the past few decades, very few robot hands are available for purchase on the commercial market. In this paper, we present a hand designed for minimalistic dexterous manipulation, in which every stage of the design process also considered its manufacturing cost. We discuss the various trade-offs made in the design. Finally, we present the results of experiments in which the robotic hand was affixed to a manipulator arm and teleoperated to grasp and manipulate a variety of objects.

50 citations


Journal ArticleDOI
01 May 2014-Ubiquity
TL;DR: This article outlines the process and biometrics Coursera uses to establish and verify student identity during a course, and presents data suggesting that verified certificate programs help increase student success rates in courses.
Abstract: Massive open online courses (MOOCs) enable the delivery of high-quality educational experiences to large groups of students. Coursera, one of the largest MOOC providers, developed a program to provide students with verified credentials as a record of their MOOC performance. Such credentials help students convey achievements in MOOCs to future employers and academic programs. This article outlines the process and biometrics Coursera uses to establish and verify student identity during a course. We additionally present data that suggest verified certificate programs help increase student success rates in courses.

37 citations


Patent
11 Aug 2014
TL;DR: In this patent, a user is prompted to provide authentication information of at least one of a plurality of types, and the received authentication information is compared to at least a portion of stored enrollment information associated with that user.
Abstract: Performing identity verification for online education is disclosed. In response to receiving a notification of a submission event, a user is prompted to provide authentication information including at least one of a plurality of types of information. Authentication information received is compared to at least a portion of stored enrollment information associated with the user with which the received authentication information is associated. The stored enrollment information includes at least two different types of information collected during an enrollment phase, including the at least one type of information solicited during the user prompting. In the event that matching criteria are met based at least in part on the comparison, a first action is performed. In the event that matching criteria are not met based at least in part on the comparison, a second action that is different from the first action is performed.

34 citations
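
A hypothetical sketch of the claimed flow: on a submission event, solicit one enrolled type of authentication information, compare it against stored enrollment data, and branch on whether matching criteria are met. All names, feature types, and the similarity threshold are illustrative, not the patent's embodiment.

```python
# Sketch only: branch on matching criteria against enrollment data.
# Feature types, the metric, and the threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Enrollment:
    # At least two types of information collected during the enrollment phase.
    typing_profile: list[float]   # e.g., keystroke timing features
    photo_features: list[float]

def similarity(a: list[float], b: list[float]) -> float:
    # Stand-in metric; a real system would use a biometric matcher.
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))

def on_submission(received: list[float], kind: str,
                  enrolled: Enrollment, threshold: float = 0.9) -> str:
    stored = getattr(enrolled, kind)       # enrolled sample of the same type
    if similarity(received, stored) >= threshold:
        return "attach verified identity to submission"   # first action
    return "flag submission for manual review"            # second action

enrolled = Enrollment(typing_profile=[0.12, 0.30, 0.22],
                      photo_features=[0.5, 0.1, 0.9])
print(on_submission([0.13, 0.29, 0.23], "typing_profile", enrolled))
```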


Posted Content
30 Jun 2014
TL;DR: The results show that with sufficient training data, increasing DNN model size is an effective, direct path to performance improvements, and that even smaller DNNs benefit from a larger training corpus.

22 citations


Patent
01 Aug 2014

22 citations


01 Jan 2014
TL;DR: It is shown that when represented in the information form, map posteriors are dominated by a small number of links that tie together nearby features in the map, an insight that is developed into a sparse variant of the EIF, called the sparse extended information filter (SEIF).
Abstract: In this paper we describe a scalable algorithm for the simultaneous localization and mapping (SLAM) problem. SLAM is the problem of acquiring a map of a static environment with a mobile robot. The vast majority of SLAM algorithms are based on the extended Kalman filter (EKF). In this paper we advocate an algorithm that relies on the dual of the EKF, the extended information filter (EIF). We show that when represented in the information form, map posteriors are dominated by a small number of links that tie together nearby features in the map. This insight is developed into a sparse variant of the EIF, called the sparse extended information filter (SEIF). SEIFs represent maps by graphical networks of features that are locally interconnected, where links represent relative information between pairs of nearby features, as well as information about the robot's pose relative to the map. We show that all essential update equations in SEIFs can be executed in constant time, irrespective of the size of the map. We also provide empirical results obtained for a benchmark data set collected in an outdoor environment, and using a multi-robot mapping simulation.

2 citations
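
An illustrative sketch of why information-form updates are local, the property the abstract exploits: with a measurement Jacobian H that is nonzero only in the robot-pose and observed-feature columns, the update Omega += H^T Q^-1 H touches only those blocks of the information matrix. Dimensions and values below are toy assumptions.

```python
# Sketch only: a measurement update in information form, showing that
# only the robot-pose and observed-feature blocks change.
import numpy as np

n_features, pose_dim, feat_dim, meas_dim = 4, 3, 2, 2
dim = pose_dim + n_features * feat_dim

omega = np.eye(dim)            # information matrix
xi = np.zeros(dim)             # information vector

# Observe feature j: H is nonzero only in the pose and feature-j columns.
j = 2
H = np.zeros((meas_dim, dim))
H[:, :pose_dim] = np.random.randn(meas_dim, pose_dim)
col = pose_dim + j * feat_dim
H[:, col:col + feat_dim] = np.random.randn(meas_dim, feat_dim)

Q_inv = np.eye(meas_dim) / 0.01          # measurement information
innovation = np.random.randn(meas_dim)   # stand-in for z - h(mu) + H mu

omega += H.T @ Q_inv @ H                 # only pose/feature-j blocks change
xi += H.T @ Q_inv @ innovation

# Confirm locality: which rows/cols of omega were actually modified.
changed = np.argwhere(H.T @ Q_inv @ H != 0)
print(sorted(set(changed.ravel())))      # indices confined to pose and feature j
```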


ReportDOI
01 Nov 2014
TL;DR: This research project addressed the problem of learning useful deep representations from unlabeled data by developing new unsupervised deep learning algorithms capable of learning important semantic structure in the input data in a domain-general way.
Abstract: This research project addressed the problem of learning useful deep representations from unlabeled data. The major goal was to innovate new unsupervised deep learning algorithms capable of learning important semantic structure in the input data in a domain-general way. At the conclusion of this project, these goals stand fulfilled. The lab produced a variety of new and influential learning algorithms including Independent Subspace Analysis (ISA); Reconstruction Independent Components Analysis (RICA); recursive neural networks; and recursive tensor networks, among others. These algorithms have posted state-of-the-art results across a number of domains and tasks, and have had impact on both academia and industry.
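
A sketch of the RICA objective named above (reconstruction ICA): an L1 sparsity penalty on the features Wx plus a soft reconstruction cost ||W^T W x - x||^2 that replaces ICA's hard orthonormality constraint, enabling overcomplete features. The smoothing epsilon, weighting, and optimizer below are illustrative choices.

```python
# Sketch only: the RICA objective on toy whitened data, minimized with
# a generic optimizer (numerical gradients; fine at this scale).
import numpy as np
from scipy.optimize import minimize

def rica_objective(w_flat, X, k, lam=0.1, eps=1e-8):
    n, d = X.shape
    W = w_flat.reshape(k, d)
    Z = X @ W.T                                  # (n, k) features Wx
    recon = Z @ W                                # (n, d) reconstruction W^T W x
    rec_cost = np.sum((recon - X) ** 2) / n      # soft reconstruction penalty
    sparsity = np.sum(np.sqrt(Z ** 2 + eps)) / n # smoothed L1 sparsity
    return rec_cost + lam * sparsity

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                   # toy stand-in for whitened data
k = 8
w0 = rng.normal(scale=0.1, size=k * 16)
res = minimize(rica_objective, w0, args=(X, k), method="L-BFGS-B")
W = res.x.reshape(k, 16)                         # learned filter matrix
```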