Hasim Sak
Researcher at Google
Publications - 80
Citations - 10303
Hasim Sak is an academic researcher at Google. His research focuses on topics including language modeling and recurrent neural networks. He has an h-index of 33 and has co-authored 78 publications receiving 8,422 citations. Previous affiliations of Hasim Sak include Boğaziçi University.
Papers
Proceedings Article
Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
TL;DR: The first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines is introduced, and it is shown that a two-layer deep LSTM RNN, where each LSTM layer has a linear recurrent projection layer, can exceed state-of-the-art speech recognition performance.
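The key architectural idea in this paper, the linear recurrent projection layer (LSTMP), can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the dimensions, weight initialization, and omission of biases and peephole connections are simplifying assumptions for illustration. The point it shows is that the projected state r_t, not the full cell output h_t, feeds back into the recurrence, which shrinks the recurrent weight matrices.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstmp_step(x, r_prev, c_prev, W, W_proj):
    """One time step of an LSTM cell with a linear recurrent projection.

    W holds the four gate weight matrices (input, forget, output, cell
    candidate), each of shape (n_cell, n_in + n_proj); W_proj has shape
    (n_proj, n_cell). Biases and peepholes are omitted for brevity.
    """
    z = x + r_prev  # concatenate input with the *projected* recurrent state
    i = [sigmoid(a) for a in matvec(W["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(W["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(W["g"], z)]  # cell candidate
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    r = matvec(W_proj, h)  # project h_t down to n_proj dims for the recurrence
    return r, c

# Hypothetical toy dimensions, chosen only for illustration.
n_in, n_cell, n_proj = 4, 8, 3
random.seed(0)
rand_mat = lambda rows, cols: [[random.uniform(-0.1, 0.1) for _ in range(cols)]
                               for _ in range(rows)]
W = {k: rand_mat(n_cell, n_in + n_proj) for k in "ifog"}
W_proj = rand_mat(n_proj, n_cell)

r, c = [0.0] * n_proj, [0.0] * n_cell
for _ in range(5):  # run a short random input sequence
    x = [random.uniform(-1, 1) for _ in range(n_in)]
    r, c = lstmp_step(x, r, c, W, W_proj)
print(len(r), len(c))  # projected state is smaller than the cell state
```

Because the recurrent input has n_proj rather than n_cell dimensions, each gate matrix shrinks from (n_cell, n_in + n_cell) to (n_cell, n_in + n_proj), which is the parameter saving the paper exploits at scale.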
Proceedings Article
Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks
TL;DR: This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
Posted Content
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
TL;DR: Novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition are presented.
Proceedings Article
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
TL;DR: An end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system; it is shown that the full-attention version of the model beats state-of-the-art accuracy on the LibriSpeech benchmarks.
Posted Content
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition
TL;DR: In this paper, the performance of LSTM RNN acoustic models for large vocabulary speech recognition was further improved by frame stacking and reduced frame rate, leading to more accurate models and faster decoding.
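The frame stacking and reduced frame rate idea can be sketched in a few lines. The function name and the particular stack/stride values below are illustrative assumptions, not the paper's exact recipe: the sketch only shows the mechanism, concatenating neighbouring acoustic frames into "super-frames" and advancing several frames at a time, so the recurrent model takes fewer, wider steps and decoding gets faster.

```python
def stack_frames(frames, stack=3, stride=3):
    """Stack consecutive acoustic frames and subsample in time.

    Each output super-frame concatenates `stack` neighbouring frames,
    and the window then advances `stride` frames, lowering the frame
    rate seen by the acoustic model. (Illustrative parameters.)
    """
    stacked = []
    for t in range(0, len(frames) - stack + 1, stride):
        merged = []
        for f in frames[t:t + stack]:
            merged.extend(f)  # concatenate feature vectors
        stacked.append(merged)
    return stacked

# Toy input: 9 frames of 2-dimensional features.
frames = [[t, t + 0.5] for t in range(9)]
out = stack_frames(frames, stack=3, stride=3)
print(len(out), len(out[0]))  # prints "3 6": 3 super-frames, each 6-dim
```

With stack=3 and stride=3 the model processes a third as many time steps, each carrying three frames' worth of features, which is the trade that yields faster decoding without losing the input information.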