Other affiliations: Intel
Bio: Tao Wang is an academic researcher from Peking University. The author has contributed to research in topics: Cache & Speedup. The author has an h-index of 28 and has co-authored 134 publications receiving 2448 citations. Previous affiliations of Tao Wang include Intel.
07 Sep 2014
TL;DR: Jigsaw is a floor plan reconstruction system that leverages crowdsensed data from mobile users: it extracts the position, size and orientation of individual landmark objects from images taken by users, and produces complete floor plans with hallway connectivity, room sizes and shapes.
Abstract: The lack of floor plans is a critical reason behind the current sporadic availability of indoor localization service. Service providers have to go through effort-intensive and time-consuming business negotiations with building operators, or hire dedicated personnel to gather such data. In this paper, we propose Jigsaw, a floor plan reconstruction system that leverages crowdsensed data from mobile users. It extracts the position, size and orientation information of individual landmark objects from images taken by users. It also obtains the spatial relation between adjacent landmark objects from inertial sensor data, then computes the coordinates and orientations of these objects on an initial floor plan. By combining user mobility traces and locations where images are taken, it produces complete floor plans with hallway connectivity, room sizes and shapes. Our experiments on 3 stories of 2 large shopping malls show that the 90-percentile errors of positions and orientations of landmark objects are about 1~2m and 5~9°, while the hallway connectivity is 100% correct.
TL;DR: The testing results on FIFA World Cup 2006 videos demonstrate that the method can reach high detection and labeling precision, and track reliably in scenes with player occlusion, moderate camera motion and pose variation.
Abstract: In this paper, we present a method to perform automatic multiple player detection, unsupervised labeling and efficient tracking in broadcast soccer videos. Player detection determines the players' positions and scales. It is achieved by combining dominant color based background subtraction with a boosting detector using Haar features. We then collect hundreds of player samples with the player detector, and learn a codebook based player appearance model with an unsupervised clustering algorithm. A player can be labeled as one of four types: two teams, referee or outlier. The learning capability enables the method to generalize well to different videos without any manual initialization. Based on detection and labeling, we perform multiple player tracking with Markov chain Monte Carlo (MCMC) data association. Some data driven dynamics are proposed to improve the Markov chain's efficiency, such as label and motion consistency and track length. The testing results on FIFA World Cup 2006 videos demonstrate that our method can reach high detection and labeling precision, and track reliably in scenes with player occlusion, moderate camera motion and pose variation.
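The dominant color based background subtraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the frame is given as a grid of integer hue values and that the playing field is the most frequent hue bin; the function name, `bin_width` and `tolerance` are hypothetical choices, not taken from the paper, and the circular nature of hue is ignored for brevity.

```python
from collections import Counter
from typing import List


def dominant_color_mask(hues: List[List[int]],
                        bin_width: int = 10,
                        tolerance: int = 1) -> List[List[bool]]:
    """Return a foreground mask: True where a pixel differs from the dominant hue.

    Assumption (not from the paper): the background is whatever hue bin
    occurs most often in the frame, e.g. the green of the pitch.
    """
    # Histogram of quantized hue values over the whole frame.
    bins = Counter(h // bin_width for row in hues for h in row)
    dominant = bins.most_common(1)[0][0]
    # A pixel is foreground if its bin is more than `tolerance` bins away.
    return [[abs(h // bin_width - dominant) > tolerance for h in row]
            for row in hues]


frame = [[60, 62, 61],
         [59, 0, 63],
         [60, 61, 120]]
mask = dominant_color_mask(frame)
```

In this toy frame only the pixels with hue 0 and 120 are marked as foreground; in the described method, such candidate regions would then be passed to the boosting detector.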
09 Mar 2015
TL;DR: In this paper, a coordinated static and dynamic cache bypassing technique is proposed to improve application performance by identifying the global loads that indicate strong preferences for caching or bypassing through profiling.
Abstract: The massively parallel architecture enables graphics processing units (GPUs) to boost performance for a wide range of applications. Initially, GPUs only employed scratchpad memory as on-chip memory. Recently, to broaden the scope of applications that can be accelerated by GPUs, GPU vendors have used caches in conjunction with scratchpad memory as on-chip memory in the new generations of GPUs. Unfortunately, GPU caches face many performance challenges that arise due to excessive thread contention for cache resources. Cache bypassing, where memory requests can selectively bypass the cache, is one solution that can help to mitigate the cache resource contention problem. In this paper, we propose coordinated static and dynamic cache bypassing to improve application performance. At compile-time, we identify the global loads that indicate strong preferences for caching or bypassing through profiling. For the remaining global loads, our dynamic cache bypassing has the flexibility to cache only a fraction of threads. In the CUDA programming model, the threads are divided into work units called thread blocks. Our dynamic bypassing technique modulates the ratio of thread blocks that cache or bypass at run-time. We choose to modulate at the thread block level in order to avoid memory divergence problems. Our approach combines compile-time analysis that determines the cache or bypass preferences for global loads with run-time management that adjusts the ratio of thread blocks that cache or bypass. Our coordinated static and dynamic cache bypassing technique achieves up to 2.28X (average 1.32X) performance speedup for a variety of GPU applications.
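The run-time part of this scheme, modulating the ratio of thread blocks that cache or bypass, can be sketched as follows. This is an illustrative host-side model only: the abstract does not say how blocks are selected or how the ratio is tuned, so the modulo-based selection, the hill-climbing update, and all names (`block_uses_cache`, `adjust_ratio`, `period`, `step`) are assumptions.

```python
from typing import Tuple


def block_uses_cache(block_id: int, cache_ratio: float, period: int = 8) -> bool:
    """Decide once per thread block whether its global loads use the cache.

    Every thread in a block shares the decision, so threads of one block
    never mix cached and bypassed accesses.
    Assumption: blocks are selected by block_id modulo a fixed period.
    """
    return (block_id % period) < round(cache_ratio * period)


def adjust_ratio(ratio: float, prev_perf: float, curr_perf: float,
                 step: float, direction: int) -> Tuple[float, int]:
    """Hill-climbing update (an assumed policy, not the paper's).

    Keep moving the ratio in the same direction while the measured
    performance improves; reverse direction when it drops.
    """
    if curr_perf < prev_perf:
        direction = -direction
    new_ratio = min(1.0, max(0.0, ratio + direction * step))
    return new_ratio, direction


# With a ratio of 0.25, two of every eight blocks use the cache.
caching_blocks = [b for b in range(16) if block_uses_cache(b, 0.25)]
```

Deciding per block rather than per thread mirrors the motivation stated in the abstract: it avoids the memory divergence that would arise if threads executing the same load took different paths.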
03 Oct 2016
TL;DR: This work designs and implements MobileInsight, a software tool that collects, analyzes and exploits runtime network information from operational cellular networks, and offers a simple API, through which developers and researchers obtain access to low-level network information for their mobile applications.
Abstract: We design and implement MobileInsight, a software tool that collects, analyzes and exploits runtime network information from operational cellular networks. MobileInsight runs on commercial off-the-shelf phones without extra hardware or additional support from operators. It exposes protocol messages on both control plane and (below IP) data plane from the 3G/4G chipset. It provides in-device protocol analysis and operation logic inference. It further offers a simple API, through which developers and researchers obtain access to low-level network information for their mobile applications. We have built three showcases to illustrate how MobileInsight is applied to cellular network research.
14 Jun 2014
TL;DR: A novel memory architecture, Half-DRAM, is proposed, in which the DRAM array is reorganized so that only half of a row is activated, which can achieve both significant performance improvement and power reduction, with negligible design overhead.
Abstract: DRAM memory is a major contributor to the total power consumption in modern computing systems. Consequently, power reduction for DRAM memory is critical to improve system-level power efficiency. Fine-grained DRAM architecture [1, 2] has been proposed to reduce the activation/precharge power. However, that prior work either incurs significant performance degradation or introduces large area overhead. In this paper, we propose a novel memory architecture, Half-DRAM, in which the DRAM array is reorganized so that only half of a row is activated. The half-row activation can effectively reduce activation power while sustaining the full bandwidth one bank can provide. In addition, the half-row activation in Half-DRAM relaxes the power constraint in DRAM, and opens up opportunities for further performance gain. Furthermore, two half-row accesses can be issued in parallel by integrating sub-array level parallelism to improve memory level parallelism. The experimental results show that Half-DRAM can achieve both significant performance improvement and power reduction, with negligible design overhead.
07 Jun 2015
TL;DR: This work proposes a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance, and introduces a CNN-based calibration stage after each of the detection stages in the cascade.
Abstract: In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance. The proposed CNN cascade operates at multiple resolutions, quickly rejects the background regions in the fast low resolution stages, and carefully evaluates a small number of challenging candidates in the last high resolution stage. To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade. The output of each calibration stage is used to adjust the detection window position for input to the subsequent stage. The proposed method runs at 14 FPS on a single CPU core for VGA-resolution images and 100 FPS using a GPU, and achieves state-of-the-art detection performance on two public face detection benchmarks.
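The control flow of such a cascade, cheap stages rejecting most windows and a calibration step adjusting the survivors before the next stage, can be sketched as below. The scoring and calibration callables stand in for the CNNs and are purely hypothetical; only the reject-then-calibrate structure follows the abstract.

```python
from typing import Callable, List, Tuple

Window = Tuple[float, float, float]  # (x, y, size) of a candidate window
# One stage: (detection score function, acceptance threshold, calibration function)
Stage = Tuple[Callable[[Window], float], float, Callable[[Window], Window]]


def run_cascade(windows: List[Window], stages: List[Stage]) -> List[Window]:
    """Pass candidate windows through the stages in order.

    Each stage drops windows scoring below its threshold, then calibrates
    the remaining windows before they are handed to the next stage.
    """
    survivors = list(windows)
    for score, threshold, calibrate in stages:
        survivors = [calibrate(w) for w in survivors if score(w) >= threshold]
    return survivors


# Toy stand-ins for the networks (assumptions for illustration only).
stages: List[Stage] = [
    (lambda w: w[2] / 100, 0.2, lambda w: (w[0] + 1, w[1], w[2])),  # fast, coarse
    (lambda w: w[2] / 100, 0.4, lambda w: w),                       # slow, strict
]
candidates = [(0, 0, 10), (5, 5, 30), (9, 9, 50)]
detections = run_cascade(candidates, stages)
```

Because later stages only see the windows that earlier stages kept, the expensive high-resolution evaluation runs on a small fraction of the input, which is the source of the speed reported in the abstract.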
TL;DR: This survey overviews recent advances on two major areas of Wi-Fi fingerprint localization: advanced localization techniques and efficient system deployment.
Abstract: The growing commercial interest in indoor location-based services (ILBS) has spurred recent development of many indoor positioning techniques. Due to the absence of global positioning system (GPS) signal, many other signals have been proposed for indoor usage. Among them, Wi-Fi (802.11) emerges as a promising one due to the pervasive deployment of wireless LANs (WLANs). In particular, Wi-Fi fingerprinting has been attracting much attention recently because it does not require line-of-sight measurement of access points (APs) and achieves high applicability in complex indoor environment. This survey overviews recent advances on two major areas of Wi-Fi fingerprint localization: advanced localization techniques and efficient system deployment. Regarding advanced techniques to localize users, we present how to make use of temporal or spatial signal patterns, user collaboration, and motion sensors. Regarding efficient system deployment, we discuss recent advances on reducing offline labor-intensive survey, adapting to fingerprint changes, calibrating heterogeneous devices for signal collection, and achieving energy efficiency for smartphones. We study and compare the approaches through our deployment experiences, and discuss some future directions.
TL;DR: This paper surveys the state of the art in augmented reality technology, systems and applications, and discusses the challenges augmented reality faces in moving from the laboratory to industry, as well as the future challenges the authors forecast.
Abstract: This paper surveys the current state-of-the-art of technology, systems and applications in Augmented Reality. It describes work performed by many different research groups, the purpose behind each new Augmented Reality system, and the difficulties and problems encountered when building some Augmented Reality applications. It surveys the challenges of mobile augmented reality systems and the requirements for successful mobile systems. This paper summarizes the current applications of Augmented Reality and speculates on future applications and where current research will lead Augmented Reality's development. The challenges augmented reality faces in each of these applications in going from the laboratory to industry, as well as the future challenges we can forecast, are also discussed in this paper. Section 1 gives an introduction to what Augmented Reality is and the motivations for developing this technology. Section 2 discusses Augmented Reality Technologies with computer vision methods, AR devices, interfaces and systems, and visualization tools. The mobile and wireless systems for Augmented Reality are discussed in Section 3. Four classes of current applications that have been explored are described in Section 4. These applications were chosen as they are the most famous type of applications encountered when researching AR apps. The future of augmented reality and the challenges it will face are discussed in Section 5.
TL;DR: In this article, a deep-learning-based indoor fingerprinting system using channel state information (CSI) is presented, which includes an offline training phase and an online localization phase.
Abstract: With the fast-growing demand of location-based services in indoor environments, indoor positioning based on fingerprinting has attracted significant interest due to its high accuracy. In this paper, we present a novel deep-learning-based indoor fingerprinting system using channel state information (CSI), which is termed DeepFi. Based on three hypotheses on CSI, the DeepFi system architecture includes an offline training phase and an online localization phase. In the offline training phase, deep learning is utilized to train all the weights of a deep network as fingerprints. Moreover, a greedy learning algorithm is used to train the weights layer by layer to reduce complexity. In the online localization phase, we use a probabilistic method based on the radial basis function to obtain the estimated location. Experimental results are presented to confirm that DeepFi can effectively reduce location error, compared with three existing methods in two representative indoor environments.
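The online phase, a probabilistic estimate built on a radial basis function, can be illustrated with a simplified sketch. It assumes each training location stores a plain fingerprint vector and that the estimate is the weighted average of the training locations; DeepFi itself uses the weights of a trained deep network as fingerprints, so this is not the authors' implementation. The names `rbf_weight`, `estimate_location` and the parameter `sigma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Location = Tuple[float, float]


def rbf_weight(observed: Sequence[float], reference: Sequence[float],
               sigma: float) -> float:
    """Radial basis function of the distance between two measurement vectors."""
    sq_dist = sum((o - r) ** 2 for o, r in zip(observed, reference))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def estimate_location(observed: Sequence[float],
                      fingerprints: List[Tuple[Location, Sequence[float]]],
                      sigma: float = 1.0) -> Location:
    """Weighted average of the training locations, weighted by RBF similarity."""
    weights = [rbf_weight(observed, fp, sigma) for _, fp in fingerprints]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation is too far from every stored fingerprint")
    x = sum(w * loc[0] for w, (loc, _) in zip(weights, fingerprints)) / total
    y = sum(w * loc[1] for w, (loc, _) in zip(weights, fingerprints)) / total
    return (x, y)


# Two training locations with one-dimensional toy fingerprints.
database = [((0.0, 0.0), [0.0]), ((2.0, 0.0), [2.0])]
midpoint = estimate_location([1.0], database)
```

An observation equally similar to both fingerprints is placed halfway between the two training locations, while a smaller `sigma` makes the estimate snap to the closest match.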
20 Jun 2011
TL;DR: This work presents a multi-target tracking system that is designed specifically for the provision of stable and accurate head location estimates and uses a more principled approach based on a Minimal Description Length (MDL) objective which accurately models the affinity between observations.
Abstract: The majority of existing pedestrian trackers concentrate on maintaining the identities of targets; however, systems for remote biometric analysis or activity recognition in surveillance video often require stable bounding-boxes around pedestrians rather than approximate locations. We present a multi-target tracking system that is designed specifically for the provision of stable and accurate head location estimates. By performing data association over a sliding window of frames, we are able to correct many data association errors and fill in gaps where observations are missed. The approach is multi-threaded and combines asynchronous HOG detections with simultaneous KLT tracking and Markov-Chain Monte-Carlo Data Association (MCMCDA) to provide guaranteed real-time tracking in high definition video. Where previous approaches have used ad-hoc models for data association, we use a more principled approach based on a Minimal Description Length (MDL) objective which accurately models the affinity between observations. We demonstrate by qualitative and quantitative evaluation that the system is capable of providing precise location estimates for large crowds of pedestrians in real-time. To facilitate future performance comparisons, we make a new dataset with hand annotated ground truth head locations publicly available.