scispace - formally typeset
Author

Guohui Wang

Other affiliations: University of Tübingen, Qualcomm
Bio: Guohui Wang is an academic researcher from Rice University. The author has contributed to research in the topics: Throughput & Low-density parity-check code. The author has an h-index of 17 and has co-authored 39 publications receiving 1124 citations. Previous affiliations of Guohui Wang include University of Tübingen & Qualcomm.

Papers
Journal ArticleDOI
TL;DR: This work proposes a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems.
Abstract: Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.

363 citations
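The Neumann-series approximation described in the abstract above can be illustrated with a small sketch. It assumes the common formulation in which the matrix A to be inverted is split into its diagonal part D and its off-diagonal part E. In a detector, A is for example the regularized Gram matrix of a system with many more BS antennas than users. The split gives inv(A) ≈ sum over n = 0..K-1 of (-D^-1 E)^n D^-1. The function names, the 3x3 test matrix and the pure-Python list representation are hypothetical choices for illustration, not the paper's implementation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def neumann_inverse(a, terms):
    """Approximate inv(a) with the first `terms` terms of a Neumann series.

    Splitting a = D + E (D = diagonal, E = off-diagonal part) gives
    inv(a) ~= sum_{n=0}^{terms-1} (-D^-1 E)^n D^-1.
    The series converges only if the spectral radius of D^-1 E is below 1,
    e.g. for strictly diagonally dominant matrices.
    """
    size = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(size)] for i in range(size)]
    # Iteration matrix M = -D^-1 E: zero diagonal, M[i][j] = -a[i][j] / a[i][i].
    m = [[0.0 if i == j else -a[i][j] / a[i][i] for j in range(size)] for i in range(size)]
    term = d_inv  # n = 0 term: M^0 D^-1 = D^-1
    approx = [row[:] for row in d_inv]
    for _ in range(terms - 1):
        term = matmul(m, term)  # next term: M^(n+1) D^-1
        approx = [[approx[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return approx


if __name__ == "__main__":
    # Hypothetical diagonally dominant matrix standing in for a regularized Gram matrix.
    gram = [[10.0, 1.0, 0.5], [1.0, 8.0, 0.3], [0.5, 0.3, 9.0]]
    for k in (1, 2, 3):
        product = matmul(gram, neumann_inverse(gram, k))
        error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                    for i in range(3) for j in range(3))
        print(k, error)  # the deviation from the identity shrinks as k grows
```

With K = 1 the approximation reduces to the diagonal inverse D^-1. Each extra term costs one more matrix product and improves accuracy, which mirrors the error and complexity trade-off the abstract says is analyzed.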

Proceedings ArticleDOI
26 May 2013
TL;DR: This work proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL, and is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.
Abstract: Recently, general-purpose computing on graphics processing units (GPGPU) has been enabled on mobile devices thanks to the emerging heterogeneous programming models such as OpenCL. The capability of GPGPU on mobile devices opens a new era for mobile computing and can enable many computationally demanding computer vision algorithms on mobile devices. As a case study, this paper proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL. We discuss the methodology of exploring the parallelism in the algorithm as well as several optimization techniques. Experimental results demonstrate that our optimization strategies for mobile GPUs have significantly reduced the processing time and made computationally intensive computer vision algorithms feasible for a mobile device. To the best of the authors' knowledge, this work is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.

78 citations
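The abstract does not spell out the inpainting steps. In the classic exemplar-based scheme, the algorithm repeatedly picks a target patch on the boundary of the region to remove. It then searches the known part of the image for the most similar source patch and copies it in. The search step is typically the dominant cost and is where a GPU could exploit parallelism. The sketch below shows only that search, assuming a grayscale image, square patches, and a sum-of-squared-differences (SSD) cost over the target's known pixels. Patch priority, filling and the OpenCL mapping are not modeled, and the function name is hypothetical.

```python
def best_exemplar(image, known, top, left, size):
    """Find the source patch that best matches the target patch at (top, left).

    image: grayscale image as a list of rows of numbers.
    known: same shape, True where the pixel is valid, False where it must be filled.
    Only fully known candidate patches are considered, and the sum of squared
    differences (SSD) is taken over the target's known pixels only.
    Returns the (row, col) of the best candidate's top-left corner, or None.
    """
    rows, cols = len(image), len(image[0])
    best, best_cost = None, float("inf")
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            # A candidate must not contain any pixel that still has to be filled.
            if not all(known[r + i][c + j] for i in range(size) for j in range(size)):
                continue
            # Each candidate is scored independently of the others: this is the
            # data parallelism a GPU kernel could spread across work-items.
            cost = 0.0
            for i in range(size):
                for j in range(size):
                    if known[top + i][left + j]:
                        diff = image[r + i][c + j] - image[top + i][left + j]
                        cost += diff * diff
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best


if __name__ == "__main__":
    img = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [1, 2, 3, 4, 5],
           [6, 7, 8, 9, 10], [0, 0, 0, 0, 0]]
    mask = [[True] * 5 for _ in range(5)]
    mask[3][3] = False  # the pixel to be filled
    print(best_exemplar(img, mask, 2, 2, 2))  # (0, 2): the same pattern two rows up
```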

Proceedings ArticleDOI
26 May 2013
TL;DR: This work presents an implementation of the popular Scale-Invariant Feature Transform (SIFT) feature detection algorithm that incorporates the powerful graphics processing unit (GPU) in mobile devices and proposes a heterogeneous dataflow scheme to achieve near-real-time detection.
Abstract: Emerging mobile applications, such as augmented reality, demand robust feature detection at high frame rates. We present an implementation of the popular Scale-Invariant Feature Transform (SIFT) feature detection algorithm that incorporates the powerful graphics processing unit (GPU) in mobile devices. Where the usual GPU methods are inefficient on mobile hardware, we propose a heterogeneous dataflow scheme. By methodically partitioning the computation, compressing the data for memory transfers, and taking into account the unique challenges that arise out of the mobile GPU, we are able to achieve a speedup of 4-7x over an optimized CPU version, and a 6.4x speedup over a published GPU implementation. Additionally, we reduce energy consumption by 87 percent per image. We achieve near-real-time detection without compromising the original algorithm.

73 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper presents optimization techniques for a parallel LDPC decoder including algorithm optimization, fully coalesced memory access, asynchronous data transfer and multi-stream concurrent kernel execution for modern GPU architectures.
Abstract: In this paper, we present a high throughput and low latency LDPC (low-density parity-check) decoder implementation on GPUs (graphics processing units). The existing GPU-based LDPC decoder implementations suffer from low throughput and long latency, which prevent them from being used in practical SDR (software-defined radio) systems. To overcome this problem, we present optimization techniques for a parallel LDPC decoder including algorithm optimization, fully coalesced memory access, asynchronous data transfer and multi-stream concurrent kernel execution for modern GPU architectures. Experimental results demonstrate that the proposed LDPC decoder achieves 316 Mbps (at 10 iterations) peak throughput on a single GPU. The decoding latency, which is much lower than that of the state of the art, varies from 0.207 ms to 1.266 ms for different throughput requirements from 62.5 Mbps to 304.16 Mbps. When using four GPUs concurrently, we achieve an aggregate peak throughput of 1.25 Gbps (at 10 iterations).

68 citations
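The abstract lists the GPU optimizations but does not name the decoding rule. As an assumption, the sketch below uses the min-sum algorithm with a flooding schedule, a common choice for parallel LDPC decoders. It runs on a toy (7,4) Hamming parity-check matrix rather than the much larger codes used in practical SDR systems. Coalesced memory access, asynchronous transfers and multi-stream execution are not modeled. The nested loops over check nodes and variable nodes only mark the work a GPU implementation would spread across threads. Function and variable names are hypothetical.

```python
def min_sum_decode(h, llr, max_iters=10):
    """Flooding min-sum LDPC decoding (sketch).

    h: parity-check matrix as rows of 0/1 (every check needs degree >= 2).
    llr: channel log-likelihood ratios, positive meaning bit 0 is more likely.
    Returns the hard-decision bits after convergence or max_iters iterations.
    """
    checks = [[v for v, bit in enumerate(row) if bit] for row in h]
    # Variable-to-check messages start out as the channel LLRs.
    v2c = [{v: llr[v] for v in nbrs} for nbrs in checks]
    bits = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iters):
        # Check-node update: product of signs and minimum magnitude of the other inputs.
        c2v = []
        for m, nbrs in enumerate(checks):
            out = {}
            for v in nbrs:
                others = [v2c[m][u] for u in nbrs if u != v]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                out[v] = sign * min(abs(x) for x in others)
            c2v.append(out)
        # Variable-node update: a-posteriori LLR = channel LLR + all check messages.
        total = list(llr)
        for m, nbrs in enumerate(checks):
            for v in nbrs:
                total[v] += c2v[m][v]
        bits = [0 if x >= 0 else 1 for x in total]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[v] for v in nbrs) % 2 == 0 for nbrs in checks):
            break
        # Extrinsic messages: remove each check's own contribution before sending back.
        for m, nbrs in enumerate(checks):
            for v in nbrs:
                v2c[m][v] = total[v] - c2v[m][v]
    return bits


if __name__ == "__main__":
    # Toy (7,4) Hamming parity-check matrix.
    h_matrix = [[1, 1, 0, 1, 1, 0, 0],
                [1, 0, 1, 1, 0, 1, 0],
                [0, 1, 1, 1, 0, 0, 1]]
    # All-zero codeword sent; bit 0 arrives with a weak, wrong-signed LLR.
    print(min_sum_decode(h_matrix, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]))  # [0, 0, 0, 0, 0, 0, 0]
```

The "at 10 iterations" figures in the abstract correspond to a fixed iteration budget like `max_iters` here. Whether the authors also stop early on a zero syndrome is not stated.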

Proceedings ArticleDOI
04 May 2014
TL;DR: This paper proposes, to the best of the authors' knowledge, the first ASIC design for high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems, such as systems building on future 3GPP LTE-Advanced standards.
Abstract: This paper proposes, to the best of our knowledge, the first ASIC design for high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems, such as systems building on future 3GPP LTE-Advanced standards. In order to substantially reduce the complexity of linear soft-output data detection in systems having hundreds of antennas at the base station (BS), the proposed detector builds upon a truncated Neumann series expansion to compute the necessary matrix inverse at low complexity. To achieve high throughput in the 3GPP LTE-A uplink, we develop a systolic VLSI architecture including all necessary processing blocks. We present a corresponding ASIC design that achieves 3.8 Gb/s for a 128-antenna, 8-user 3GPP LTE-A based large-scale MIMO system, while occupying 11.1 mm² in a TSMC 45 nm CMOS technology.

60 citations


Cited by
Book
03 Jan 2018
TL;DR: This monograph summarizes many years of research insights in a clear and self-contained way and provides the reader with the necessary knowledge and mathematical tools to carry out independent research in this area.
Abstract: Massive multiple-input multiple-output (MIMO) is one of the most promising technologies for the next generation of wireless communication networks because it has the potential to provide game-changing improvements in spectral efficiency (SE) and energy efficiency (EE). This monograph summarizes many years of research insights in a clear and self-contained way and provides the reader with the necessary knowledge and mathematical tools to carry out independent research in this area. Starting from a rigorous definition of Massive MIMO, the monograph covers the important aspects of channel estimation, SE, EE, hardware efficiency (HE), and various practical deployment considerations. From the beginning, a very general, yet tractable, canonical system model with spatial channel correlation is introduced. This model is used to realistically assess the SE and EE, and is later extended to also include the impact of hardware impairments. Owing to this rigorous modeling approach, a lot of classic "wisdom" about Massive MIMO, based on too simplistic system models, is shown to be questionable.

1,352 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a recital on the historic heritages and novel challenges facing massive/large-scale multiple-input multiple-output (LS-MIMO) systems from a detection perspective.
Abstract: The emerging massive/large-scale multiple-input multiple-output (LS-MIMO) systems that rely on very large antenna arrays have become a hot topic of wireless communications. Compared to multi-antenna aided systems being built at the time of this writing, such as the long-term evolution (LTE) based fourth generation (4G) mobile communication system which allows for up to eight antenna elements at the base station (BS), the LS-MIMO system entails an unprecedented number of antennas, say 100 or more, at the BS. The huge leap in the number of BS antennas opens the door to a new research field in communication theory, propagation and electronics, where random matrix theory begins to play a dominant role. Interestingly, LS-MIMOs also constitute a perfect example of one of the key philosophical principles of the Hegelian Dialectics, namely, that “quantitative change leads to qualitative change.” In this treatise, we provide a recital on the historic heritages and novel challenges facing LS-MIMOs from a detection perspective. First, we highlight the fundamentals of MIMO detection, including the nature of co-channel interference (CCI), the generality of the MIMO detection problem, the received signal models of both linear memoryless MIMO channels and dispersive MIMO channels exhibiting memory, as well as the complex-valued versus real-valued MIMO system models. Then, an extensive review of the representative MIMO detection methods conceived during the past 50 years (1965–2015) is presented, and relevant insights as well as lessons are inferred for the sake of designing complexity-scalable MIMO detection algorithms that are potentially applicable to LS-MIMO systems. Furthermore, we divide the LS-MIMO systems into two types, and elaborate on the distinct detection strategies suitable for each of them. 
The type-I LS-MIMO corresponds to the case where the number of active users is much smaller than the number of BS antennas, which is currently the mainstream definition of LS-MIMO. The type-II LS-MIMO corresponds to the case where the number of active users is comparable to the number of BS antennas. Finally, we discuss the applicability of existing MIMO detection algorithms in LS-MIMO systems, and review some of the recent advances in LS-MIMO detection.

626 citations
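The "complex-valued versus real-valued MIMO system models" mentioned in the abstract above are related by a standard identity. A complex model y = Hx + n with H of size N_r x N_t becomes a real model of twice the size. It stacks the real parts above the imaginary parts of y, x and n and uses H_r = [[Re H, -Im H], [Im H, Re H]]. The sketch below builds H_r and checks that it reproduces the complex product. Noise is omitted because it maps the same way. The function names and the example channel are hypothetical.

```python
def complex_to_real(h, x):
    """Map y = H x (complex) to the equivalent real-valued model y_r = H_r x_r.

    h: complex channel matrix as a list of rows (N_r x N_t).
    x: complex transmit vector of length N_t.
    Returns H_r (2N_r x 2N_t) and x_r (length 2N_t), where
    H_r = [[Re H, -Im H], [Im H, Re H]] and x_r = [Re x; Im x].
    """
    top = [[e.real for e in row] + [-e.imag for e in row] for row in h]
    bottom = [[e.imag for e in row] + [e.real for e in row] for row in h]
    x_r = [e.real for e in x] + [e.imag for e in x]
    return top + bottom, x_r


def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix (real or complex)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


if __name__ == "__main__":
    # Hypothetical 3x2 channel and a QPSK-like symbol vector; noise omitted.
    h = [[1 + 2j, 0.5 - 1j], [-1j, 3 + 0j], [2 + 1j, -1 + 1j]]
    x = [1 - 1j, -1 + 1j]
    y = matvec(h, x)
    h_r, x_r = complex_to_real(h, x)
    y_r = matvec(h_r, x_r)
    # The real model reproduces [Re y; Im y].
    print(max(abs(a - b) for a, b in zip(y_r, [e.real for e in y] + [e.imag for e in y])) < 1e-12)
```

The real-valued form doubles both dimensions. In exchange, each complex symbol from a square QAM alphabet splits into two independent real (PAM) components, a form that some detectors work with directly.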

Journal ArticleDOI
TL;DR: This work proposes a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems.
Abstract: Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose, to the best of our knowledge, the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.

363 citations

Journal ArticleDOI
TL;DR: The benefits that cloud computing offers for fifth-generation (5G) mobile networks are explored and the implications on the signal processing algorithms are investigated.
Abstract: Cloud computing draws significant attention in the information technology (IT) community as it provides ubiquitous on-demand access to a shared pool of configurable computing resources with minimal management effort. It is also gaining influence in the communication technology (CT) community and is currently discussed as an enabler for flexible, cost-efficient and more powerful mobile network implementations. Although centralized baseband pools are already being investigated for the radio access network (RAN) to allow for efficient resource usage and advanced multicell algorithms, these technologies still require dedicated hardware and do not offer the same characteristics as cloud-computing platforms, i.e., on-demand provisioning, virtualization, resource pooling, elasticity, service metering, and multitenancy. However, these properties of cloud computing are key enablers for future mobile communication systems characterized by an ultradense deployment of radio access points (RAPs), leading to severe multicell interference, in combination with a significant increase in the number of access nodes and huge fluctuations of the rate requirements over time. In this article, we will explore the benefits that cloud computing offers for fifth-generation (5G) mobile networks and investigate the implications for the signal processing algorithms.

272 citations

Journal ArticleDOI
TL;DR: This paper discusses optimal and near-optimal detection principles specifically designed for the massive MIMO system such as detectors based on a local search, belief propagation and box detection, and presents recent advances of detection algorithms which are mostly based on machine learning or sparsity based algorithms.
Abstract: Massive multiple-input multiple-output (MIMO) is a key technology to meet user demands in performance and quality of service (QoS) for next-generation communication systems. Due to the large number of antennas and radio frequency (RF) chains, the complexity of symbol detectors increases rapidly in a massive MIMO uplink receiver. Thus, the search for an ideal massive MIMO detection algorithm with optimal performance and low complexity has gained a lot of attention during the past decade. A plethora of massive MIMO detection algorithms has been proposed in the literature. The aim of this paper is to provide insights on such algorithms to generalists in wireless communications. We gather the massive MIMO detection algorithms and classify them so that a reader can distinguish between different algorithms from a wider range of solutions. We present optimal and near-optimal detection principles specifically designed for the massive MIMO system, such as detectors based on local search, belief propagation and box detection. In addition, we cover detectors based on approximate inversion, which have gained popularity among the VLSI signal processing community due to their deterministic dataflow and low complexity. We also briefly explore several nonlinear small-scale MIMO (2-4 antenna receivers) detectors and their applicability in the massive MIMO context. Furthermore, we present recent advances in detection algorithms, which are mostly based on machine learning or sparsity-based algorithms. In each section, we also mention the related implementations of the detectors. A discussion of the pros and cons of each detector is provided.

262 citations