Hasim Sak

Researcher at Google

Publications - 80
Citations - 10,303

Hasim Sak is an academic researcher at Google. He has contributed to research on topics including language modeling and recurrent neural networks. He has an h-index of 33 and has co-authored 78 publications receiving 8,422 citations. Previous affiliations of Hasim Sak include Boğaziçi University.

Papers
Proceedings Article

Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling

TL;DR: The first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines is introduced, and it is shown that a two-layer deep LSTM RNN in which each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance.
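
The linear recurrent projection layer mentioned in this TL;DR is the key architectural idea: the LSTM output is projected down to a smaller vector, and that projected vector is what recurs to the next time step. Below is a minimal numpy sketch of one such step, assuming purely illustrative sizes (40-dim input, 512 cells, 128-dim projection) and omitting details such as peephole connections; it is a sketch of the idea, not the paper's implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstmp_step(x_t, r_prev, c_prev, params):
    """One step of an LSTM cell with a linear recurrent projection layer:
    the cell output h_t is projected down to r_t, and r_t (not h_t) is
    fed back as the recurrent input at the next step."""
    W, U, b, W_proj = params["W"], params["U"], params["b"], params["W_proj"]
    z = W @ x_t + U @ r_prev + b                # all four gates from input and projected state
    i, f, o, g = np.split(z, 4)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    r_t = W_proj @ h_t                          # linear projection shrinks the recurrent state
    return r_t, c_t

# Illustrative sizes only: 40-dim features, 512 cells, 128-dim projection.
n_in, n_cell, n_proj = 40, 512, 128
rng = np.random.default_rng(0)
params = {
    "W": rng.normal(scale=0.01, size=(4 * n_cell, n_in)),
    "U": rng.normal(scale=0.01, size=(4 * n_cell, n_proj)),
    "b": np.zeros(4 * n_cell),
    "W_proj": rng.normal(scale=0.01, size=(n_proj, n_cell)),
}
r, c = np.zeros(n_proj), np.zeros(n_cell)
for x_t in rng.normal(size=(5, n_in)):          # a few random feature frames
    r, c = lstmp_step(x_t, r, c, params)
print(r.shape, c.shape)                         # (128,) (512,)
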
Proceedings Article

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

TL;DR: This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
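
As a rough illustration of the CLDNN ordering described in this TL;DR (convolution over frequency, then recurrence over time, then fully connected layers), here is a toy numpy sketch. It only shows how the blocks compose and what the shapes look like: a plain tanh recurrence stands in for the LSTM, a 1-D frequency convolution stands in for the CNN front-end, and all sizes are made up rather than taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

def freq_conv(frames, filt, stride=2):
    """Tiny 1-D convolution over the frequency axis of each frame,
    standing in for the CNN front-end."""
    width = filt.shape[0]
    return np.stack([
        relu(np.array([f[i:i + width] @ filt
                       for i in range(0, f.size - width + 1, stride)]))
        for f in frames
    ])

def recurrent(frames, W, U):
    """A plain tanh recurrence standing in for the LSTM block."""
    h = np.zeros(U.shape[0])
    outs = []
    for x in frames:
        h = np.tanh(W @ x + U @ h)
        outs.append(h)
    return np.stack(outs)

# Made-up CLDNN-style ordering: CNN -> recurrent -> fully connected layers.
T, n_freq = 20, 40
x = rng.normal(size=(T, n_freq))                       # log-mel-like input frames
conv_out = freq_conv(x, rng.normal(size=8))            # reduce frequency variation
h = recurrent(conv_out,
              rng.normal(scale=0.1, size=(64, conv_out.shape[1])),
              rng.normal(scale=0.1, size=(64, 64)))    # temporal modeling
logits = relu(h @ rng.normal(scale=0.1, size=(64, 32))) @ rng.normal(scale=0.1, size=(32, 10))
print(logits.shape)                                    # (20, 10): per-frame class scores
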
Posted Content

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

TL;DR: Novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.
Proceedings Article

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

TL;DR: An end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system is presented, and the full-attention version of the model is shown to beat state-of-the-art accuracy on the LibriSpeech benchmarks.
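
What makes a Transformer encoder usable in a streaming recognizer, as this TL;DR describes, is restricting self-attention to a limited context window around each frame instead of attending over the whole utterance. The sketch below only builds such an attention mask, with hypothetical parameter names; the Transformer layers and the RNN-T loss themselves are omitted.

import numpy as np

def streaming_attention_mask(n_frames, left_context, right_context=0):
    """Boolean mask where frame t may attend to frames in
    [t - left_context, t + right_context]. right_context = 0 gives a
    causal, fully streamable encoder; larger values trade latency for
    accuracy, and unbounded contexts recover full attention."""
    idx = np.arange(n_frames)
    rel = idx[None, :] - idx[:, None]      # rel[t, s] = s - t
    return (rel >= -left_context) & (rel <= right_context)

mask = streaming_attention_mask(6, left_context=3, right_context=0)
print(mask.astype(int))                    # banded lower triangle: each frame sees itself and up to 3 past frames
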
Posted Content

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

TL;DR: In this paper, the performance of LSTM RNN acoustic models for large vocabulary speech recognition was further improved by frame stacking and reduced frame rate, leading to more accurate models and faster decoding.
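
Frame stacking and frame-rate reduction are data-level operations: consecutive feature frames are concatenated, and the stacked sequence is then subsampled so the recurrent model steps less often. A minimal numpy sketch follows; the stacking and skip factors here are placeholders, not the values used in the paper.

import numpy as np

def stack_and_subsample(frames, stack=3, skip=3):
    """Concatenate `stack` consecutive feature frames, then keep only
    every `skip`-th stacked frame so the acoustic model runs at a
    lower frame rate."""
    T = frames.shape[0]
    stacked = np.stack([frames[t:t + stack].reshape(-1)
                        for t in range(T - stack + 1)])
    return stacked[::skip]

feats = np.random.default_rng(2).normal(size=(100, 40))  # 100 frames of 40-dim features
out = stack_and_subsample(feats)
print(feats.shape, "->", out.shape)                       # (100, 40) -> (33, 120)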