Aki Kuusela
Researcher at Google
Publications - 20
Citations - 465
Aki Kuusela is an academic researcher at Google. The author has contributed to research on the topics of encoders and deep learning, has an h-index of 8, and has co-authored 19 publications receiving 274 citations.
Papers
Proceedings Article
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
Amirali Boroumand,Saugata Ghose,Youngsok Kim,Rachata Ausavarungnirun,Eric Shiu,Rahul Thakur,Dae Hyun Kim,Aki Kuusela,Allan Knies,Parthasarathy Ranganathan,Onur Mutlu +10 more
TL;DR: This work comprehensively analyzes the energy and performance impact of data movement for several widely used Google consumer workloads, and finds that processing-in-memory (PIM) can significantly reduce data movement for all of these workloads by performing part of the computation close to memory.
Journal Article
Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors
Claudionor Coelho,Aki Kuusela,Li Shan,Hao Zhuang,Jennifer Ngadiuba,Thea Klaeboe Aarrestad,Vladimir Loncar,Maurizio Pierini,Adrian Alan Pol,Sioni Summers +9 more
TL;DR: In this paper, a method for designing optimally heterogeneously quantized versions of deep neural network models for minimum energy, high-accuracy, nanosecond inference and fully automated deployment on chip is introduced.
Posted Content
Automatic deep heterogeneous quantization of Deep Neural Networks for ultra low-area, low-latency inference on the edge at particle colliders
Claudionor Coelho,Aki Kuusela,Li Shan,Hao Zhuang,Thea Klaeboe Aarrestad,Vladimir Loncar,Jennifer Ngadiuba,Maurizio Pierini,Adrian Alan Pol,Sioni Summers +9 more
TL;DR: A novel method for designing optimally heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip is introduced.
Proceedings Article
Warehouse-scale video acceleration: co-design and deployment in the wild
Parthasarathy Ranganathan,Daniel Stodolsky,Jeff Calow,Jeremy Dorfman,Marisabel Guevara,Clinton Wills Smullen,Aki Kuusela,Raghu Balasubramanian,Sandeep Bhatia,Prakash Chauhan,Anna Cheung,In Suk Chong,Niranjani Dasharathi,Jia Feng,Brian Fosco,Samuel Foss,Ben Gelb,Sara J. Gwin,Yoshiaki Hase,He Dake,C. Richard Ho,Roy W. Huffman,Elisha Indupalli,Indira Jayaram,Poonacha Kongetira,Cho Mon Kyaw,Aaron Laursen,Yuan Li,Fong Lou,Kyle Lucke,JP Maaninen,Ramon Macias,Maire Mahony,David Alexander Munday,Srikanth Muroor,Narayana Penukonda,Eric Perkins-Argueta,Devin Persaud,Alex Ramirez,Ville-Mikko Rautio,Yolanda Ripley,Amir Salek,Sathish Sekar,Sergey N. Sokolov,Robert Springer,Don Stark,Mercedes Tan,Mark S. Wachsler,Andrew C. Walton,David A. Wickeraad,Alvin Wijaya,Hon Kwan Wu +51 more
TL;DR: In this paper, the authors describe the design and deployment of a new accelerator targeted at warehouse-scale video transcoding, and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems.
Posted Content
Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml
Claudionor Coelho,Aki Kuusela,Hao Zhuang,Thea Klaeboe Aarrestad,Vladimir Loncar,Jennifer Ngadiuba,Maurizio Pierini,Sioni Summers +7 more
TL;DR: The QKeras library is introduced, an extension of the Keras library that allows heterogeneously quantized versions of deep neural network models to be created through drop-in replacement of Keras layers. This significantly reduces resource consumption while retaining high accuracy when implemented on FPGA hardware.