Author

Xuehui Liu

Bio: Xuehui Liu is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in the topics Rendering (computer graphics) and Pixel, has an h-index of 15, and has co-authored 54 publications receiving 706 citations.


Papers
Proceedings ArticleDOI
06 Oct 2004
TL;DR: The method generates the boundary from a 3D scene with solid clipping, allowing the computation to run on the GPU despite the complexity of the whole geometric scene, and modifies the boundary values according to the boundary conditions.
Abstract: In this paper, we solve a 3D fluid dynamics problem in a complex environment by taking advantage of the parallelism and programmability of the GPU. Unlike other methods, we innovate in two respects. First, our method can process more general boundary conditions on the GPU. We generate the boundary from a 3D scene with solid clipping, so the computation runs on the GPU despite the complexity of the whole geometric scene. We then group the voxels into types according to their positions relative to the obstacles, locate the voxel that determines the value of each boundary voxel, and modify the boundary values according to the boundary conditions. Second, a more compact data-packing structure based on flat 3D textures is designed at the fragment-processing level to enhance parallelism and reduce the number of execution passes. The scalar variables, including density and temperature, are packed into the four channels of a texel to accelerate the computation of the 3D Navier-Stokes equations (NSEs). The test results demonstrate the efficiency of our method; as a result, it is feasible to run medium-scale 3D fluid dynamics problems at interactive speed in general environments with complex geometry on a PC platform.
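As a rough illustration of the four-channel packing described above (a minimal sketch, not the authors' fragment program: the kernel name, the choice of packed fields, and the nearest-neighbour semi-Lagrangian step are all assumptions), a CUDA version might update four packed scalar fields per voxel in a single pass:

    // Hypothetical sketch: four scalar fields (e.g. density, temperature
    // and two extra user fields) packed into one float4 per voxel, so a
    // single pass and a single fetch update all four channels at once.
    __global__ void advectPacked(const float4* src, float4* dst,
                                 const float3* vel, float dt,
                                 int nx, int ny, int nz)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z;
        if (x >= nx || y >= ny || z >= nz) return;
        int i = (z * ny + y) * nx + x;

        // Semi-Lagrangian backtrace, nearest-neighbour for brevity
        // (velocity is assumed to be in cells per time step).
        float3 v = vel[i];
        int px = min(max(int(x - dt * v.x), 0), nx - 1);
        int py = min(max(int(y - dt * v.y), 0), ny - 1);
        int pz = min(max(int(z - dt * v.z), 0), nz - 1);

        // One load advects all four packed scalars together.
        dst[i] = src[(pz * ny + py) * nx + px];
    }

The point of the packing is the same as in the paper: one memory fetch serves four scalar fields, cutting both bandwidth and the number of passes.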

123 citations

Proceedings ArticleDOI
01 Aug 2009
TL;DR: An efficient algorithm for multi-layer depth peeling via bucket sort of fragments on GPU, which makes it possible to capture up to 32 layers simultaneously with correct depth ordering in a single geometry pass, and is free of read-modify-write (RMW) hazards.
Abstract: In this paper we present an efficient algorithm for multi-layer depth peeling via bucket sort of fragments on the GPU, which makes it possible to capture up to 32 layers simultaneously, with correct depth ordering, in a single geometry pass. We exploit multiple render targets (MRT) as storage and construct a bucket array of size 32 per pixel. Each bucket is capable of holding only one fragment and can be concurrently updated using the MAX/MIN blending operation. During rasterization, the depth range of each pixel location is divided uniformly into consecutive subintervals, and a linear bucket sort is performed so that fragments within each subinterval are routed into the corresponding bucket. In a following full-screen shader pass, the bucket array can be accessed sequentially to get the sorted fragments for further applications. Collisions happen when more than one fragment is routed to the same bucket, which can be alleviated by a multi-pass approach. We also develop a two-pass approach to further reduce collisions, namely adaptive bucket depth peeling. In the first geometry pass, the depth range is redivided into non-uniform subintervals according to the depth distribution, to ensure that there is only one fragment within each subinterval. In the following bucket-sorting pass, only one fragment is routed into each bucket, and collisions are substantially reduced. Our algorithm shows up to a 32x speedup over classical depth peeling, especially for large scenes with high depth complexity, and the experimental results are visually faithful to the ground truth. It also requires no pre-sorting of geometry or post-sorting of fragments, and is free of read-modify-write (RMW) hazards.
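The paper realizes the bucket routing with MRT storage and MAX/MIN blending inside the raster pipeline; purely as a schematic analogue (the function name, buffer layout, and initialisation convention below are illustrative), the same uniform routing can be sketched in CUDA with one atomic min per bucket:

    #define NUM_BUCKETS 32

    // buckets: per-pixel array of NUM_BUCKETS depth slots, initialised
    // before the pass to 0x7F7FFFFF (the bit pattern of FLT_MAX).
    __device__ void routeFragment(unsigned int* buckets, int pixel,
                                  float depth, float zmin, float zmax)
    {
        // Divide the pixel's depth range uniformly into 32 subintervals
        // and pick the bucket this fragment falls into.
        float t = (depth - zmin) / (zmax - zmin);
        int b = min(max(int(t * NUM_BUCKETS), 0), NUM_BUCKETS - 1);

        // Keep the nearest fragment per bucket. For non-negative floats,
        // unsigned integer ordering matches float ordering, so atomicMin
        // on the raw bits behaves like a MIN blend on depth.
        atomicMin(&buckets[pixel * NUM_BUCKETS + b], __float_as_uint(depth));
    }

A later full-screen pass then reads the 32 slots of each pixel in order, which is exactly the role of the sequential bucket-array access in the paper.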

67 citations

Journal ArticleDOI
TL;DR: This work solves the fluid dynamics problem completely on the GPU by packing the scalar and vector variables into four channels of texels and computing texture-coordinate offsets according to the type of boundary condition at each node to determine the corresponding variables.
Abstract: Taking advantage of the parallelism and programmability of the GPU, we solve the fluid dynamics problem completely on the GPU. Unlike previous methods, our method accelerates the whole computation by packing the scalar and vector variables into four channels of texels. In order to adapt to arbitrary boundary conditions, we group the grid nodes into different types according to their positions relative to obstacles and search for the node that determines the value of the current node. We then compute texture-coordinate offsets according to the type of boundary condition at each node to locate the corresponding variables, achieving interaction of the flow with obstacles placed freely by users. The test results demonstrate the efficiency of our method and exhibit the potential of the GPU for general-purpose computation. Copyright © 2004 John Wiley & Sons, Ltd.
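As an illustrative reconstruction of the node-type/offset idea (not the paper's shader: the type codes, offset table, and sign convention below are assumptions), the boundary update amounts to copying, and possibly negating, the value of the neighbour that determines each boundary node:

    // Hypothetical node types: 0 = fluid, 1..6 = boundary nodes whose
    // determining neighbour lies one cell away along +/-x, +/-y, +/-z.
    __constant__ int3 kOffset[7] = {
        { 0, 0, 0}, {-1, 0, 0}, { 1, 0, 0},
        { 0,-1, 0}, { 0, 1, 0}, { 0, 0,-1}, { 0, 0, 1}
    };

    __global__ void applyBoundary(float* field, const int* nodeType,
                                  int nx, int ny, int nz, float scale)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z;
        if (x >= nx || y >= ny || z >= nz) return;
        int i = (z * ny + y) * nx + x;

        int t = nodeType[i];
        if (t == 0) return;  // interior fluid node: nothing to do

        // Fetch from the determining neighbour; type codes are assumed
        // to point inward so the index j stays in range. scale = 1 for
        // a pure Neumann condition (e.g. pressure), -1 for no-slip velocity.
        int3 o = kOffset[t];
        int j = ((z + o.z) * ny + (y + o.y)) * nx + (x + o.x);
        field[i] = scale * field[j];
    }

On the 2004-era hardware the paper targets, the same lookup is expressed through texture-coordinate offsets in a fragment program, but the control flow is the same.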

66 citations

Proceedings ArticleDOI
19 Feb 2010
TL;DR: FreePipe is presented, a system for programmable parallel rendering that can run entirely on current graphics hardware and has performance comparable with the traditional graphics pipeline.
Abstract: In the past decade, modern GPUs have provided increasing programmability with vertex, geometry and fragment shaders. However, many classical problems have not been solved efficiently using the current graphics pipeline, where some stages are still fixed functions on chip. In particular, multi-fragment effects, especially order-independent transparency, require programmability of the blending stage, which makes them difficult to solve in a single geometry pass. In this paper we present FreePipe, a system for programmable parallel rendering that runs entirely on current graphics hardware and has performance comparable with the traditional graphics pipeline. Within this framework, two schemes for the efficient rendering of multi-fragment effects in a single geometry pass have been developed by exploiting CUDA atomic operations. Both schemes achieve significant speedups compared to state-of-the-art methods based on the traditional graphics pipeline.
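FreePipe's own kernels are not reproduced here; the following is only a generic sketch of the CUDA atomic-operation idea the abstract refers to (the struct, names, and per-pixel capacity are illustrative): each rasterized fragment atomically reserves a slot in a per-pixel list, so fragments from concurrent threads never collide, and depth ordering is deferred to a later sorting pass.

    struct Fragment { float depth; unsigned int color; };

    #define MAX_FRAGS 8  // illustrative per-pixel capacity

    // count: per-pixel fragment counters, zeroed each frame.
    // frags: per-pixel arrays of MAX_FRAGS slots.
    __device__ void storeFragment(unsigned int* count, Fragment* frags,
                                  int pixel, float depth, unsigned int color)
    {
        // atomicAdd hands every fragment a unique slot index, which is
        // what removes the read-modify-write hazard.
        unsigned int slot = atomicAdd(&count[pixel], 1u);
        if (slot < MAX_FRAGS) {
            Fragment f;
            f.depth = depth;
            f.color = color;
            frags[pixel * MAX_FRAGS + slot] = f;
        }
        // A later full-screen pass sorts each pixel's list by depth and
        // blends in order, e.g. for order-independent transparency.
    }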

62 citations


Cited by
Journal ArticleDOI
TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general-purpose computation to graphics hardware.
Abstract: The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware. This survey should be of particular interest to researchers who are interested in using the latest GPGPU applications in their systems of interest.

1,998 citations

Journal ArticleDOI
TL;DR: A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphics processing unit, using the compute unified device architecture interface developed by nVIDIA, is presented.
Abstract: A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphics processing unit, using the Compute Unified Device Architecture (CUDA) interface developed by nVIDIA, is presented. By exploiting the explicit parallelism offered by the graphics hardware, we obtain an efficiency gain of up to two orders of magnitude with respect to the computational performance of a PC. A non-trivial example shows the performance of the LB implementation, which is based on a D3Q13 model that is described in detail.
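The paper's D3Q13 kernel is not reproduced here; to give a flavour of such CUDA LB kernels (the array layout, kernel name, and periodic wrapping below are illustrative assumptions), a pull-style streaming step with one thread per cell might look like this:

    #define Q 13  // number of lattice directions (D3Q13)

    __constant__ int3 c[Q];  // lattice velocity vectors, uploaded once
                             // at startup with cudaMemcpyToSymbol

    // Pull-style streaming: each cell gathers the post-collision value
    // from its upstream neighbour along every direction q. Distributions
    // use a structure-of-arrays layout, f[q * ncells + cell], so that
    // adjacent threads issue coalesced loads.
    __global__ void streamStep(const float* f_src, float* f_dst,
                               int nx, int ny, int nz)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z;
        if (x >= nx || y >= ny || z >= nz) return;

        int ncells = nx * ny * nz;
        int cell = (z * ny + y) * nx + x;

        for (int q = 0; q < Q; ++q) {
            // Periodic wrap keeps the sketch self-contained.
            int ux = (x - c[q].x + nx) % nx;
            int uy = (y - c[q].y + ny) % ny;
            int uz = (z - c[q].z + nz) % nz;
            f_dst[q * ncells + cell] =
                f_src[q * ncells + ((uz * ny + uy) * nx + ux)];
        }
    }

The memory layout is the performance-critical choice: coalesced access to the distribution arrays is a large part of what lets such kernels approach the gains reported above.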

300 citations

Journal ArticleDOI
26 Jul 2010
TL;DR: This work presents a novel rendering system for defocus blur and lens effects and introduces an intuitive control over depth-of-field effects, which achieves more precision than competing real-time solutions and is mostly indistinguishable from offline rendering.
Abstract: We present a novel rendering system for defocus blur and lens effects. It supports physically-based rendering and outperforms previous approaches through a novel GPU-based tracing method. Our solution achieves more precision than competing real-time solutions, and our results are mostly indistinguishable from offline rendering. Our method is also more general and can integrate advanced simulations, such as simple geometric lens models enabling various lens-aberration effects. The latter are crucial for realism, but are often employed in artistic contexts, too. We show that available artistic lenses can be simulated by our method. In this spirit, our work introduces intuitive control over depth-of-field effects. The physical basis is crucial as a starting point to enable new artistic renderings based on a generalized focal surface that emphasizes particular elements in the scene while retaining a realistic look. Our real-time solution provides realistic as well as plausibly expressive results.

256 citations

Journal ArticleDOI
TL;DR: A very efficient implementation of a 2D lattice Boltzmann kernel using the Compute Unified Device Architecture (CUDA) interface developed by nVIDIA® is presented, exploiting the explicit parallelism exposed in the graphics hardware.
Abstract: In this article a very efficient implementation of a 2D lattice Boltzmann kernel using the Compute Unified Device Architecture (CUDA™) interface developed by nVIDIA® is presented. By exploiting the explicit parallelism exposed in the graphics hardware, we obtain more than an order of magnitude in performance gain compared to standard CPUs. A non-trivial example, the flow through a generic porous medium, shows the performance of the implementation.

249 citations