Author

Yan Gu

Bio: Yan Gu is an academic researcher from the University of California, Riverside. The author has contributed to research in topics including parallel algorithms and computer science, has an h-index of 18, and has co-authored 58 publications receiving 848 citations. Previous affiliations of Yan Gu include Carnegie Mellon University and Tsinghua University.


Papers
Proceedings ArticleDOI
19 Jul 2013
TL;DR: An efficient, easily parallelizable algorithm for generating high-quality bounding volume hierarchies using agglomerative clustering; it often produces higher quality hierarchies than a full sweep SAH build, yet executes in less time than the widely used top-down, approximate SAH build algorithm based on binning.
Abstract: We introduce Approximate Agglomerative Clustering (AAC), an efficient, easily parallelizable algorithm for generating high-quality bounding volume hierarchies using agglomerative clustering. The main idea of AAC is to compute an approximation to the true greedy agglomerative clustering solution by restricting the set of candidates inspected when identifying neighboring geometry in the scene. The result is a simple algorithm that often produces higher quality hierarchies (in terms of subsequent ray tracing cost) than a full sweep SAH build yet executes in less time than the widely used top-down, approximate SAH build algorithm based on binning.
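As a concrete illustration of the restricted-candidate idea, here is a minimal, deliberately unoptimized Python sketch: primitives are sorted once by Morton code, and each greedy merge step only inspects cluster pairs that are close in that ordering. All names (`Box`, `build_bvh`, the `window` parameter) are hypothetical; the sketch keeps the restriction idea but not the paper's data structures or performance.

```python
# A sketch of agglomerative BVH construction in the spirit of AAC: instead
# of scanning all cluster pairs, candidates are restricted to boxes that
# are nearby in a Morton-code ordering of the scene. Illustrative only.
import itertools

class Box:
    def __init__(self, lo, hi, children=()):
        self.lo, self.hi, self.children = lo, hi, children

def union(a, b):
    lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))
    return Box(lo, hi, children=(a, b))

def surface_area(b):
    dx, dy, dz = (h - l for h, l in zip(b.hi, b.lo))
    return 2 * (dx * dy + dy * dz + dz * dx)

def morton_key(b, bits=10):
    # Interleave quantized centroid coordinates (assumed to lie in [0, 1]).
    c = [int((l + h) / 2 * ((1 << bits) - 1)) for l, h in zip(b.lo, b.hi)]
    key = 0
    for i in range(bits):
        for d in range(3):
            key |= ((c[d] >> i) & 1) << (3 * i + d)
    return key

def build_bvh(boxes, window=8):
    # One Morton-order sort puts spatially close boxes near each other, so
    # a small index window is a cheap proxy for "nearby in the scene".
    clusters = sorted(boxes, key=morton_key)
    while len(clusters) > 1:
        best, best_cost = None, float("inf")
        for i, j in itertools.combinations(range(len(clusters)), 2):
            if j - i > window:
                continue  # the AAC-style restriction: skip far-apart pairs
            cost = surface_area(union(clusters[i], clusters[j]))
            if cost < best_cost:
                best, best_cost = (i, j), cost
        i, j = best
        merged = union(clusters[i], clusters[j])
        del clusters[j], clusters[i]     # j > i, so delete j first
        clusters.insert(i, merged)       # keep the spatial ordering roughly intact
    return clusters[0]

boxes = [Box((x, x, 0.0), (x + 0.05, x + 0.05, 0.1))
         for x in (0.1, 0.12, 0.5, 0.52, 0.9)]
root = build_bvh(boxes)
```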

74 citations

Journal ArticleDOI
TL;DR: The problem of moving n sensors on a line to form a barrier coverage of a specified segment, such that the maximum moving distance of the sensors is minimized, is settled with an O(n^2 log n) time algorithm.
Abstract: In this paper, we study the problem of moving n sensors on a line to form a barrier coverage of a specified segment of the line such that the maximum moving distance of the sensors is minimized. Previously, it was an open question whether this problem on sensors with arbitrary sensing ranges is solvable in polynomial time. We settle this open question positively by giving an O(n^2 log n) time algorithm. For the special case when all sensors have the same-size sensing range, the previously best solution takes O(n^2) time. We present an O(n log n) time algorithm for this case; further, if all sensors are initially located on the coverage segment, our algorithm takes O(n) time. Also, we extend our techniques to the cycle version of the problem where the barrier coverage is for a simple cycle and the sensors are allowed to move only along the cycle. For sensors with the same-size sensing range, we solve the cycle version in O(n) time, improving the previously best O(n^2) time solution.
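For intuition, the equal-range special case can be sketched with a standard binary-search-plus-greedy approach. This is not the paper's O(n log n) or O(n) algorithm; it assumes an order-preserving optimal solution exists (which holds when all sensing ranges are equal), and the function names are made up.

```python
# Decision version: can sensors at sorted positions xs, each with sensing
# radius r, cover the segment [0, L] if no sensor moves more than D?
def feasible(xs, r, L, D):
    frontier = 0.0  # leftmost point not yet covered
    for x in xs:
        if frontier >= L:
            break
        if x - D > frontier + r:
            return False  # even moving fully left, this sensor leaves a gap
        # Move the sensor as far right as allowed while still touching the
        # frontier, pushing coverage as far as possible.
        center = min(x + D, frontier + r)
        frontier = max(frontier, center + r)
    return frontier >= L

def min_max_move(xs, r, L, eps=1e-9):
    # Binary-search the answer D; each feasibility check is a linear greedy.
    xs = sorted(xs)
    lo, hi = 0.0, L + max(abs(xs[0]), abs(xs[-1])) + r
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(xs, r, L, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Example: three sensors of radius 1 covering [0, 6]; the answer is 2.5.
print(round(min_max_move([0.0, 0.5, 6.0], 1.0, 6.0), 6))
```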

70 citations

Journal ArticleDOI
27 Jul 2014
TL;DR: New shading language abstractions simplify development of shaders for this system, and adaptive techniques using these mechanisms reduce the number of instructions performed during shading by more than a factor of three while maintaining high image quality.
Abstract: Due to complex shaders and high-resolution displays (particularly on mobile graphics platforms), fragment shading often dominates the cost of rendering in games. To improve the efficiency of shading on GPUs, we extend the graphics pipeline to natively support techniques that adaptively sample components of the shading function more sparsely than per-pixel rates. We perform an extensive study of the challenges of integrating adaptive, multi-rate shading into the graphics pipeline, and evaluate two- and three-rate implementations that we believe are practical evolutions of modern GPU designs. We design new shading language abstractions that simplify development of shaders for this system, and design adaptive techniques that use these mechanisms to reduce the number of instructions performed during shading by more than a factor of three while maintaining high image quality.
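The core idea, evaluating a low-frequency component of the shading function at a coarser rate than per-pixel and reusing it across pixels, can be illustrated outside a GPU pipeline. Below is a toy NumPy sketch with made-up stand-in shading terms, not the paper's pipeline extension.

```python
# Split shading into a cheap per-pixel term and an expensive low-frequency
# term; evaluate the expensive term once per 4x4 block and upsample it.
import numpy as np

H, W, BLOCK = 256, 256, 4
ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)

def cheap_term(y, x):
    # High-frequency detail (e.g. specular): keep at per-pixel rate.
    return 0.3 * np.sin(0.5 * x) * np.sin(0.5 * y)

def expensive_term(y, x):
    # Slowly varying (e.g. indirect lighting): safe to undersample.
    return 0.7 * np.exp(-((x - W / 2) ** 2 + (y - H / 2) ** 2) / (2 * 80.0 ** 2))

# Per-pixel reference: every term evaluated at every pixel.
reference = cheap_term(ys, xs) + expensive_term(ys, xs)

# Multi-rate version: expensive term at one sample per BLOCK x BLOCK pixels,
# evaluated at block centers, then replicated back to full resolution.
cy, cx = np.mgrid[0:H:BLOCK, 0:W:BLOCK].astype(np.float64)
coarse = expensive_term(cy + BLOCK / 2, cx + BLOCK / 2)
upsampled = np.repeat(np.repeat(coarse, BLOCK, axis=0), BLOCK, axis=1)
multirate = cheap_term(ys, xs) + upsampled

# 16x fewer expensive evaluations; error stays small because that term is
# low-frequency relative to the block size.
print("expensive evals:", coarse.size, "vs", reference.size)
print("max abs error: %.5f" % np.abs(reference - multirate).max())
```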

67 citations

Proceedings ArticleDOI
25 Jul 2011
TL;DR: An interactive tool for designing v-style pop-ups and an automated construction algorithm from a given geometry are developed, both of which guarantee the pop-uppability of the results.
Abstract: Pop-up books are a fascinating form of paper art with intriguing geometric properties. In this paper, we present a systematic study of a simple but common class of pop-ups consisting of patches falling into four parallel groups, which we call v-style pop-ups. We give sufficient conditions for a v-style paper structure to be pop-uppable. That is, it can be closed flat while maintaining the rigidity of the patches, the closing and opening do not need extra force besides holding two patches and are free of intersections, and the closed paper is contained within the page border. These conditions allow us to identify novel mechanisms for making pop-ups. Based on the theory and mechanisms, we developed an interactive tool for designing v-style pop-ups and an automated construction algorithm from a given geometry, both of which guarantee the pop-uppability of the results.

53 citations

Proceedings ArticleDOI
13 Jun 2015
TL;DR: This work uses ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing the bits of integers in a reduced range bottom-up, it processes the hashed values of keys directly top-down.
Abstract: Semisorting is the problem of reordering an input array of keys such that equal keys are contiguous but different keys are not necessarily in sorted order. Semisorting is important for collecting equal values and is widely used in practice. For example, it is the core of the MapReduce paradigm, is a key component of the database join operation, and has many other applications. We describe a (randomized) parallel algorithm for the problem that is theoretically efficient (linear work and logarithmic depth), but is designed to be more practically efficient than previous algorithms. We use ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing the bits of integers in a reduced range in a bottom-up fashion, we process the hashed values of keys directly top-down. We implement the algorithm and experimentally show on a variety of input distributions that it outperforms a similarly-optimized radix sort on a modern 40-core machine with hyper-threading by about a factor of 1.7--1.9, and achieves a parallel speedup of up to 38x. We discuss the various optimizations used in our implementation and present an extensive experimental analysis of its performance.
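The interface, though not the parallel linear-work algorithm, is easy to sketch sequentially: scatter records into buckets by a salted hash of the key, so equal keys come out contiguous while distinct keys land in no particular order. The names below are illustrative.

```python
# A sequential sketch of the semisort interface (not the paper's parallel
# algorithm): hash-scatter into buckets, then emit each key group whole.
import random

def semisort(items, key=lambda x: x, n_buckets=None):
    if n_buckets is None:
        n_buckets = max(1, len(items))
    salt = random.getrandbits(64)  # randomize the hash across runs
    buckets = [[] for _ in range(n_buckets)]
    for it in items:
        buckets[hash((salt, key(it))) % n_buckets].append(it)
    out = []
    for b in buckets:
        # Within a bucket, group by exact key so equal keys stay
        # contiguous even when different keys collide under the hash.
        groups = {}
        for it in b:
            groups.setdefault(key(it), []).append(it)
        for grp in groups.values():
            out.extend(grp)
    return out

pairs = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
print(semisort(pairs, key=lambda p: p[0]))
# Equal first components are adjacent, e.g.
# [('b', 1), ('b', 3), ('a', 2), ('a', 5), ('c', 4)]
```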

50 citations


Cited by

Proceedings ArticleDOI
23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework specific to shared-memory parallel/multicore machines; it makes graph traversal algorithms easy to write and is significantly more efficient than previously reported results from graph frameworks on machines with many more cores.
Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework specific to shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
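The two-routine programming model is easy to mimic sequentially. Below is a Python sketch with hypothetical `edge_map`/`vertex_map` names, shown on BFS; the real framework is parallel, claims vertices with atomic operations, and switches edge-traversal strategy with frontier density.

```python
# A sequential sketch of an edge-map/vertex-map interface, applied to BFS.
def edge_map(graph, frontier, update, cond):
    """Apply update(src, dst) over edges out of `frontier` whose target
    passes `cond`; return the targets for which update returned True."""
    out = set()
    for u in frontier:
        for v in graph[u]:
            if cond(v) and update(u, v):
                out.add(v)
    return out

def vertex_map(subset, fn):
    for v in subset:
        fn(v)

def bfs(graph, root):
    parent = {}
    vertex_map(graph, lambda v: parent.__setitem__(v, None))  # init all vertices
    parent[root] = root
    frontier = {root}

    def update(u, v):
        parent[v] = u   # a parallel version claims v with compare-and-swap
        return True     # v joins the next frontier

    while frontier:
        frontier = edge_map(graph, frontier, update,
                            cond=lambda v: parent[v] is None)
    return parent

g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(g, 0))  # parent pointers of a BFS tree rooted at 0
```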

816 citations

Journal ArticleDOI
11 Nov 2016
TL;DR: A practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al.
Abstract: Foveated rendering synthesizes images with progressively less detail outside the eye fixation region, potentially unlocking significant speedups for wide field-of-view displays, such as head mounted displays, where target framerate and resolution are increasing faster than the performance of traditional real-time renderers. To study and improve potential gains, we designed a foveated rendering user study to evaluate the perceptual abilities of human peripheral vision when viewing today's displays. We determined that filtering peripheral regions reduces contrast, inducing a sense of tunnel vision. When applying a postprocess contrast enhancement, subjects tolerated up to 2× larger blur radius before detecting differences from a non-foveated ground truth. After verifying these insights on both desktop and head mounted displays augmented with high-speed gaze-tracking, we designed a perceptual target image to strive for when engineering a production foveated renderer. Given our perceptual target, we designed a practical foveated rendering system that reduces the number of shades by up to 70% and allows coarsened shading up to 30° closer to the fovea than Guenter et al. [2012] without introducing perceivable aliasing or blur. We filter both pre- and post-shading to address aliasing from undersampling in the periphery, introduce a novel multiresolution- and saccade-aware temporal antialiasing algorithm, and use contrast enhancement to help recover peripheral details that are resolvable by our eye but degraded by filtering. We validate our system by performing another user study. Frequency analysis shows our system closely matches our perceptual target. Measurements of temporal stability show we obtain quality similar to temporally filtered non-foveated renderings.
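Two of the abstract's observations, blur that grows with eccentricity and a post-process contrast enhancement that counters the induced tunnel vision, can be mocked up on a plain image. Here is a toy NumPy sketch with invented thresholds and gains, not the authors' renderer.

```python
# Eccentricity-dependent blur plus contrast enhancement, on a test image.
import numpy as np

def box_blur(img, radius):
    if radius == 0:
        return img.copy()
    k = 2 * radius + 1
    for axis in (0, 1):  # separable box filter via cumulative sums
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius + 1, radius)
        p = np.pad(img, pad, mode="edge").cumsum(axis=axis)
        hi = np.take(p, range(k, p.shape[axis]), axis=axis)
        lo = np.take(p, range(0, p.shape[axis] - k), axis=axis)
        img = (hi - lo) / k
    return img

H, W = 128, 128
rng = np.random.default_rng(0)
frame = rng.random((H, W))              # stand-in for a rendered frame
ys, xs = np.mgrid[0:H, 0:W]
ecc = np.hypot(ys - H / 2, xs - W / 2)  # distance from fixation, in pixels

# Blur grows with eccentricity: sharp fovea, two coarser peripheral rings.
levels = [box_blur(frame, r) for r in (0, 2, 6)]
out = np.select([ecc < 30, ecc < 70, ecc >= 70], levels)

# Post-process contrast enhancement: push values away from the local mean
# to restore some of the contrast the peripheral blur removed.
mean = box_blur(out, 6)
enhanced = np.clip(mean + 1.4 * (out - mean), 0.0, 1.0)

far = ecc > 70
print("periphery contrast (std): original %.3f, blurred %.3f, enhanced %.3f"
      % (frame[far].std(), out[far].std(), enhanced[far].std()))
```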

347 citations

Journal ArticleDOI
27 Jul 2014
TL;DR: This paper shows how to rigorously formulate the consistency constraint in the functional map setting; the formulation leads to a powerful tool for computing consistent functional maps and for discovering shared structures, such as meaningful shape parts.
Abstract: The construction of networks of maps among shapes in a collection enables a variety of applications in data-driven geometry processing. A key task in network construction is to make the maps consistent with each other. This consistency constraint, when properly defined, leads not only to a concise representation of such networks, but more importantly, it serves as a strong regularizer for correcting and improving noisy initial maps computed between pairs of shapes in isolation. Up to now, however, the consistency constraint has only been fully formulated for point-based maps or for shape collections that are fully similar. In this paper, we introduce a framework for computing consistent functional maps within heterogeneous shape collections. In such collections not all shapes share the same structure --- different types of shared structure may be present within different (but possibly overlapping) sub-collections. Unlike point-based maps, functional maps can encode similarities at multiple levels of detail (points or parts), and thus are particularly suitable for coping with such diversity within a shape collection. We show how to rigorously formulate the consistency constraint in the functional map setting. The formulation leads to a powerful tool for computing consistent functional maps, and also for discovering shared structures, such as meaningful shape parts. We also show how to adapt the procedure for handling very large-scale shape collections. Experimental results on benchmark datasets show that the proposed framework significantly improves upon state-of-the-art data-driven techniques. We demonstrate the usefulness of the framework in shape co-segmentation and various shape exploration tasks.
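The consistency notion itself is compact to state in code: in a reduced basis, each functional map is a small matrix, and a collection of maps is consistent when compositions around every cycle give the identity, which holds exactly when every map factors through a shared latent basis. A NumPy sketch (not the paper's optimization) follows.

```python
# Measure cycle consistency of functional maps represented as k x k matrices.
import numpy as np

rng = np.random.default_rng(1)
k = 20  # reduced basis size

def random_rotation(k):
    q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return q

# Ground-truth consistent maps: each shape's basis relates to a shared
# latent basis, so C_ij = B_j @ B_i.T composes consistently by construction.
B = [random_rotation(k) for _ in range(3)]
C = {(i, j): B[j] @ B[i].T for i in range(3) for j in range(3) if i != j}

def cycle_error(maps, cycle):
    # Compose maps around the cycle and compare against the identity.
    m = np.eye(k)
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        m = maps[(a, b)] @ m
    return np.linalg.norm(m - np.eye(k))

print("consistent maps, cycle 0->1->2->0: %.2e" % cycle_error(C, [0, 1, 2]))

# Noisy maps (as produced by pairwise matching in isolation) break the
# consistency; joint optimization over the collection would repair them.
noisy = {key: m + 0.05 * rng.standard_normal((k, k)) for key, m in C.items()}
print("noisy maps, same cycle: %.2e" % cycle_error(noisy, [0, 1, 2]))
```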

185 citations

01 Jan 1980
TL;DR: In this article, the worst-case cost of sequences of insertions and deletions in weak B-trees is analyzed, in which each node has at least a and at most b sons, where 2a ≤ b.
Abstract: In this paper we explore the use of weak B-trees to represent sorted lists. In weak B-trees each node has at least a and at most b sons where 2a ≤ b. We analyse the worst case cost of sequences of insertions and deletions in weak B-trees. This leads to a new data structure (level-linked weak B-trees) for representing sorted lists when the access pattern exhibits a (time-varying) locality of reference. Our structure is substantially simpler than the one proposed by Guibas, McCreight, Plass and Roberts, yet it has many of its properties. Our structure is as simple as the one proposed by Brown/Tarjan, but our structure can treat arbitrary sequences of insertions and deletions whilst theirs can only treat non-interacting insertions and deletions.
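The degree invariant is the heart of the structure: with between a and b children per node and 2a ≤ b, an overfull node can always split into two legal nodes. Below is a minimal insert-only Python sketch of such an (a, b)-tree, omitting deletions, the level links, and the amortized analysis from the paper; all names are illustrative.

```python
# Insert-only (a, b)-tree sketch: nodes keep between A and B children
# (the root at least 2); 2*A <= B makes every split produce legal halves.
import bisect

A, B = 2, 4

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # leaf: stored keys; internal: separators
        self.children = children  # None marks a leaf

def _split(node):
    """Split an overfull node into (left, separator, right); both halves
    keep at least A keys/children because 2*A <= B."""
    if node.children is None:     # leaf: copy the first right key up
        m = len(node.keys) // 2
        right = Node(node.keys[m:])
        node.keys = node.keys[:m]
        return node, right.keys[0], right
    m = len(node.children) // 2   # internal: move the middle separator up
    sep = node.keys[m - 1]
    right = Node(node.keys[m:], node.children[m:])
    node.keys, node.children = node.keys[:m - 1], node.children[:m]
    return node, sep, right

def _insert(node, key):
    if node.children is None:
        bisect.insort(node.keys, key)
        return _split(node) if len(node.keys) > B else None
    i = bisect.bisect_right(node.keys, key)     # child to descend into
    result = _insert(node.children[i], key)
    if result:                                  # child split: absorb it
        _, sep, right = result
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
    return _split(node) if len(node.children) > B else None

def insert(root, key):
    result = _insert(root, key)
    if result:                    # the root split: the tree grows one level
        left, sep, right = result
        return Node([sep], [left, right])
    return root

def contains(node, key):
    while node.children is not None:
        node = node.children[bisect.bisect_right(node.keys, key)]
    return key in node.keys

tree = Node([])
for k in [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]:
    tree = insert(tree, k)
print(all(contains(tree, k) for k in range(10)))  # True
```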

132 citations