Author

Lin Zhong

Other affiliations: Tsinghua University, Yale University, University of Houston
Bio: Lin Zhong is an academic researcher from Rice University. The author has contributed to research in topics: Mobile device & Mobile computing. The author has an h-index of 54, co-authored 207 publications receiving 9,725 citations. Previous affiliations of Lin Zhong include Tsinghua University & Yale University.


Papers
Proceedings Article•DOI•
22 Aug 2012
TL;DR: This work presents the design, realization, and evaluation of Argos, the first reported base station architecture capable of serving many terminals simultaneously through MUBF with a large number of antennas (M >> 10), and reports an Argos prototype with 64 antennas that is capable of serving 15 clients simultaneously.
Abstract: Multi-user multiple-input multiple-output theory predicts manyfold capacity gains by leveraging many antennas on wireless base stations to serve multiple clients simultaneously through multi-user beamforming (MUBF). However, realizing a base station with a large number of antennas is non-trivial and has yet to be achieved in the real world. We present the design, realization, and evaluation of Argos, the first reported base station architecture that is capable of serving many terminals simultaneously through MUBF with a large number of antennas (M >> 10). Designed for extreme flexibility and scalability, Argos exploits hierarchical and modular design principles, properly partitions baseband processing, and holistically considers the real-time requirements of MUBF. Argos employs a novel, completely distributed beamforming technique, as well as an internal calibration procedure to enable implicit beamforming with a channel estimation cost that is independent of the number of base station antennas. We report an Argos prototype with 64 antennas that is capable of serving 15 clients simultaneously. We experimentally demonstrate that by scaling from 1 to 64 antennas the prototype can achieve up to 6.7-fold capacity gains while using a mere 1/64th of the transmission power.
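To make the beamforming concrete, below is a minimal NumPy sketch of multi-user conjugate (maximum-ratio) downlink precoding for M antennas and K single-antenna clients. It is an illustrative toy under an assumed i.i.d. Rayleigh channel, not the Argos implementation, which distributes this computation across modules and adds internal calibration.

import numpy as np

rng = np.random.default_rng(0)
M, K = 64, 15                       # base-station antennas, single-antenna clients

# Assumed i.i.d. Rayleigh downlink channel: H[k, m] links antenna m to client k.
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Conjugate (maximum-ratio) precoder: weight each client's symbol by the
# conjugate of its channel, normalized to unit total transmit power.
W = H.conj().T                      # M x K
W /= np.linalg.norm(W)

s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)   # client symbols
y = H @ (W @ s)                     # noiseless signals received by the K clients

# With many more antennas than clients, H @ W is nearly diagonal, so each
# client predominantly receives its own symbol.
print(np.round(np.abs(H @ W), 2))

With M = 64 and K = 15 the off-diagonal leakage is already small, which is the effect the Argos capacity measurements quantify at scale.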

730 citations

Proceedings Article•DOI•
Jiayang Liu, Zhen Wang, Lin Zhong, Jehan Wickramasuriya, Venugopal Vasudevan
09 Mar 2009
TL;DR: This work evaluates uWave using a large gesture library with over 4000 samples collected from eight users over an extended period of time, for a gesture vocabulary of eight gesture patterns identified by a Nokia research study, and shows that uWave achieves 98.6% accuracy, competitive with statistical methods that require significantly more training samples.
Abstract: The proliferation of accelerometers on consumer electronics has brought an opportunity for interaction based on gestures or physical manipulation of the devices. We present uWave, an efficient recognition algorithm for such interaction using a single three-axis accelerometer. Unlike statistical methods, uWave requires a single training sample for each gesture pattern and allows users to employ personalized gestures and physical manipulations. We evaluate uWave using a large gesture library with over 4000 samples collected from eight users over an extended period of time, for a gesture vocabulary of eight gesture patterns identified by a Nokia research study. It shows that uWave achieves 98.6% accuracy, competitive with statistical methods that require significantly more training samples. To the best of our knowledge, our evaluation data set is the largest and most extensive in published studies. We also present applications of uWave in gesture-based user authentication and interaction with three-dimensional mobile user interfaces using user-created gestures.
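uWave's recognizer matches an input accelerometer trace against a single stored template per gesture using dynamic time warping (DTW). The sketch below shows the core idea in plain Python; the quantization of acceleration values and the exact per-sample distance used in the published algorithm are omitted as simplifying assumptions.

import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping distance between two (T, 3) accelerometer traces.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # per-sample distance (assumed Euclidean)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(trace, templates):
    # Return the label of the single stored template closest to the input trace.
    return min(templates, key=lambda label: dtw_distance(trace, templates[label]))

# Hypothetical usage: one template per gesture, recorded once by the user.
templates = {"circle": np.random.randn(40, 3), "flick": np.random.randn(25, 3)}
print(recognize(np.random.randn(30, 3), templates))

Because the template is a single recorded example rather than a fitted model, adding a personalized gesture only requires recording it once.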

717 citations

Journal Article•DOI•


TL;DR: This work has developed a deployable experimental mMIMO platform that could unleash novel collaborations between communications, computing, and machine learning researchers to completely rethink next-generation networks.
Abstract: Massive multiple-input multiple-output (mMIMO) technology uses a very large number of antennas at base stations to significantly increase efficient use of the wireless spectrum. Thus, mMIMO is considered an essential part of 5G and beyond. However, developing a scalable and reliable mMIMO system is an extremely challenging task, significantly hampering the ability of the research community to research next-generation networks. This "research bottleneck" motivated us to develop a deployable experimental mMIMO platform to enable research across many areas. We also envision that this platform could unleash novel collaborations between communications, computing, and machine learning researchers to completely rethink next-generation networks.

552 citations

Proceedings Article•DOI•
25 Jun 2013
TL;DR: A first-of-its-kind smartphone software system, MoodScope, which infers the mood of its user based on how the smartphone is used and provides mood as an important input to context-aware computing is reported.
Abstract: We report a first-of-its-kind smartphone software system, MoodScope, which infers the mood of its user based on how the smartphone is used. Compared to smartphone sensors that measure acceleration, light, and other physical properties, MoodScope is a "sensor" that measures the mental state of the user and provides mood as an important input to context-aware computing. We run a formative statistical mood study with smartphone-logged data collected from 32 participants over two months. Through the study, we find that by analyzing communication history and application usage patterns, we can statistically infer a user's daily mood average with an initial accuracy of 66%, which gradually improves to an accuracy of 93% after a two-month personalized training period. Motivated by these results, we build a service, MoodScope, which analyzes usage history to act as a sensor of the user's mood. We provide a MoodScope API for developers to use our system to create mood-enabled applications. We further create and deploy a mood-sharing social application.
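At its core, the inference step described above is a regression from usage-history features to a daily mood score that is retrained as labeled days accumulate. The following is a minimal sketch of that idea on synthetic data; the feature names, the model choice (ridge regression), and all numbers are illustrative assumptions rather than details from the paper.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
# Hypothetical daily features: counts of calls, SMS, and app launches per category.
X = rng.poisson(lam=5, size=(60, 8)).astype(float)        # 60 days x 8 usage features
y = X @ rng.normal(size=8) * 0.1 + rng.normal(size=60)    # synthetic daily mood averages

model = Ridge(alpha=1.0).fit(X[:45], y[:45])              # personalize on the earlier days
print("held-out R^2:", round(model.score(X[45:], y[45:]), 2))

One way to read the two-month personalized training period is as accumulating enough labeled days for such a per-user model to become reliable.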

479 citations

Journal Article•DOI•
Clayton Shepard, Ahmad Rahmati, Chad C. Tossell, Lin Zhong, Phillip Kortum
03 Jan 2011
TL;DR: This position paper presents LiveLab, a methodology to measure real-world smartphone usage and wireless networks with a reprogrammable in-device logger designed for long-term user studies, and demonstrates the feasibility and capability of LiveLab.
Abstract: We present LiveLab, a methodology to measure real-world smartphone usage and wireless networks with a reprogrammable in-device logger designed for long-term user studies. We discuss the challenges of privacy protection and power impact in LiveLab and offer our solutions. We present an iPhone 3GS-based deployment of LiveLab with 25 users intended for one year. Early results from the data collection so far highlight the unique strengths and potential of LiveLab. We have two objectives in this position paper. First, we demonstrate the feasibility and capability of LiveLab. By sharing our experience, we seek to advocate LiveLab as a network and user measurement methodology. Second, we present our preliminary findings, and seek feedback from the community regarding what data to collect.

333 citations


Cited by
Journal Article•DOI•
TL;DR: While massive MIMO renders many traditional research problems irrelevant, it uncovers entirely new problems that urgently need attention: the challenge of making many low-cost low-precision components that work effectively together, acquisition and synchronization for newly joined terminals, the exploitation of extra degrees of freedom provided by the excess of service antennas, reducing internal power consumption to achieve total energy efficiency reductions, and finding new deployment scenarios.
Abstract: Multi-user MIMO offers big advantages over conventional point-to-point MIMO: it works with cheap single-antenna terminals, a rich scattering environment is not required, and resource allocation is simplified because every active terminal utilizes all of the time-frequency bins. However, multi-user MIMO, as originally envisioned, with roughly equal numbers of service antennas and terminals and frequency-division duplex operation, is not a scalable technology. Massive MIMO (also known as large-scale antenna systems, very large MIMO, hyper MIMO, full-dimension MIMO, and ARGOS) makes a clean break with current practice through the use of a large excess of service antennas over active terminals and time-division duplex operation. Extra antennas help by focusing energy into ever smaller regions of space to bring huge improvements in throughput and radiated energy efficiency. Other benefits of massive MIMO include extensive use of inexpensive low-power components, reduced latency, simplification of the MAC layer, and robustness against intentional jamming. The anticipated throughput depends on the propagation environment providing asymptotically orthogonal channels to the terminals, but so far experiments have not disclosed any limitations in this regard. While massive MIMO renders many traditional research problems irrelevant, it uncovers entirely new problems that urgently need attention: the challenge of making many low-cost low-precision components that work effectively together, acquisition and synchronization for newly joined terminals, the exploitation of extra degrees of freedom provided by the excess of service antennas, reducing internal power consumption to achieve total energy efficiency reductions, and finding new deployment scenarios. This article presents an overview of the massive MIMO concept and contemporary research on the topic.
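The throughput argument above leans on favorable propagation: channel vectors to different terminals become nearly orthogonal as the number of service antennas grows. Below is a quick numerical check of this effect under an assumed i.i.d. Rayleigh channel (an assumption of this sketch, not a claim from the article about any particular deployment).

import numpy as np

rng = np.random.default_rng(0)
for M in (8, 64, 512, 4096):
    # i.i.d. Rayleigh channels from M service antennas to two different terminals.
    h1 = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    h2 = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    # The normalized inner product shrinks roughly like 1/sqrt(M): the channels decorrelate.
    print(M, round(abs(np.vdot(h1, h2)) / M, 4))

As M grows the printed values fall toward zero, which is the asymptotic orthogonality the article notes experiments have so far not contradicted.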

6,184 citations

Posted Content•
TL;DR: This paper evaluates a custom ASIC-called a Tensor Processing Unit (TPU)-deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the samedatacenters.
Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
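The 92 TOPS peak figure follows directly from the size of the MAC array and the 700 MHz clock rate reported in the paper; a one-line check:

macs = 65_536          # 256 x 256 systolic array of 8-bit MACs
ops_per_mac = 2        # one multiply plus one accumulate per cycle
clock_hz = 700e6       # TPU clock rate reported in the paper
print(macs * ops_per_mac * clock_hz / 1e12, "TOPS")   # ~91.8, quoted as 92 TOPS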

3,067 citations

Journal Article•DOI•
TL;DR: The performance requirements for computing with memristive devices are examined and how the outstanding challenges could be met are examined.
Abstract: Memristive devices are electrical resistance switches that can retain a state of internal resistance based on the history of applied voltage and current. These devices can store and process information, and offer several key performance characteristics that exceed conventional integrated circuit technology. An important class of memristive devices are two-terminal resistance switches based on ionic motion, which are built from a simple conductor/insulator/conductor thin-film stack. These devices were originally conceived in the late 1960s and recent progress has led to fast, low-energy, high-endurance devices that can be scaled down to less than 10 nm and stacked in three dimensions. However, the underlying device mechanisms remain unclear, which is a significant barrier to their widespread application. Here, we review recent progress in the development and understanding of memristive devices. We also examine the performance requirements for computing with memristive devices and detail how the outstanding challenges could be met.
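As a concrete illustration of a resistance that depends on the history of applied voltage and current, the sketch below simulates the canonical linear ion-drift memristor model (Strukov et al.); the parameter values are illustrative assumptions, not figures from this review.

import numpy as np

# Linear ion-drift model: the state x in [0, 1] is the doped-region fraction,
# and the resistance interpolates between R_on and R_off.
R_on, R_off = 100.0, 16e3          # ohms (illustrative values)
mu_v, D = 1e-14, 1e-8              # ion mobility (m^2/(s*V)) and film thickness (m)

def simulate(v_of_t, t_end, steps=2000, x0=0.5):
    dt = t_end / steps
    x = x0
    for n in range(steps):
        R = R_on * x + R_off * (1 - x)                          # resistance set by the state
        i = v_of_t(n * dt) / R                                  # current through the device
        x = float(np.clip(x + mu_v * R_on / D**2 * i * dt, 0.0, 1.0))
    return R

# The final resistance depends on the history of the drive, not just its present value:
print(simulate(lambda t: +1.0, 0.5))   # sustained positive bias drives the device toward R_on
print(simulate(lambda t: -1.0, 0.5))   # sustained negative bias drives the device toward R_off

Running both drives from the same initial state ends in very different resistances, which is exactly the memory effect the review describes.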

3,037 citations

Proceedings Article•DOI•
24 Jun 2017
TL;DR: The Tensor Processing Unit (TPU) as discussed by the authors is a custom ASIC deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) using a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS).
Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X -- 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X -- 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.

2,679 citations

Journal Article•DOI•
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations