Author

Shihwa Lee

Bio: Shihwa Lee is an academic researcher at Samsung. The author has contributed to research on the topics Very long instruction word and Ray tracing (graphics). The author has an h-index of 9 and has co-authored 42 publications receiving 274 citations.

Papers
Proceedings Article
19 Jul 2013
TL;DR: Simulation results show that SGRT is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers.
Abstract: Recently, with the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs/GPUs, real-time ray tracing has attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because of inadequate computing power, memory bandwidth, and flexibility in mobile GPUs. In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing) in which a fast compact hardware accelerator and a flexible programmable shader are combined. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible and high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.

74 citations

Proceedings Article
05 Aug 2012
TL;DR: Experimental results show that the SGRT can be a versatile graphics solution, as it delivers performance comparable to that of desktop GPU raytracers, and it is the first mobile GPU based on full Whitted raytracing.
Abstract: Recently, with the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs/GPUs, real-time raytracing has attracted considerable attention. Unfortunately, raytracing in the current mobile environment is difficult because of inadequate computing power, memory bandwidth, and flexibility in mobile GPUs. In this work, we present a novel mobile GPU architecture called the SGRT (Samsung reconfigurable GPU based on RayTracing), which enhances our previous work with the following features: 1) a fast compact hardware engine that accelerates traversal and intersection operations, 2) a flexible reconfigurable processor that supports software ray generation and shading, and 3) a parallelization framework that achieves scalable performance. Unlike our previous work, the current architecture is designed for both static and dynamic scenes with a smaller area. Experimental results show that the SGRT can be a versatile graphics solution, as it delivers performance comparable to that of desktop GPU raytracers. To the best of our knowledge, the SGRT is the first mobile GPU based on full Whitted raytracing.

26 citations

Proceedings Article
03 Dec 2010
TL;DR: Experimental results show that the proposed load balancing method dramatically reduces the waiting overhead, eliminating 82.3% of the total.
Abstract: In this paper, we address the problem of mapping an H.264 main profile decoder onto an embedded dual-core system with dynamic load balancing. The H.264 decoder is mapped to a dual-core system with a few hardware accelerators using the proposed functional partitioning, which enables a simple interface to the hardware accelerators and small memory usage for inter-core communication. We also propose a dynamic load balancing method for the functional partitioning. Load balancing is done by dynamically mapping a few selected functions to each core at the macroblock level. In this case, buffer level information is enough to decide which core runs those functions. Because of this simple decision criterion and mechanism, the performance loss from the load balancing process is negligible, and the proposed method can easily be extended to multi-core systems. Experimental results show that the proposed load balancing method dramatically reduces the waiting overhead, eliminating 82.3% of the total.
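The buffer-level decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Core` class, the `pending` counter standing in for buffer level, the cost values, and the tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    pending: int = 0  # hypothetical "buffer level": work units queued on this core


def pick_core(core_a: Core, core_b: Core) -> Core:
    """Pick the core with the lower buffer level (ties go to core_a, an assumption)."""
    return core_a if core_a.pending <= core_b.pending else core_b


def schedule(fixed_costs, movable_cost):
    """Assign the movable functions of each macroblock to the less-loaded core.

    fixed_costs: one (cost_on_core0, cost_on_core1) pair per macroblock for the
    statically partitioned work; movable_cost: cost of the functions that may
    run on either core. All costs are illustrative numbers.
    """
    core0, core1 = Core("core0"), Core("core1")
    assignments = []
    for cost0, cost1 in fixed_costs:
        # Statically partitioned work always lands on its own core.
        core0.pending += cost0
        core1.pending += cost1
        # The decision uses only the two buffer levels.
        target = pick_core(core0, core1)
        target.pending += movable_cost
        assignments.append(target.name)
    return assignments, core0.pending, core1.pending


if __name__ == "__main__":
    print(schedule([(3, 1), (1, 1), (0, 4)], movable_cost=2))
```

Because the decision reads only two counters per macroblock, its cost stays small, which is consistent with the abstract's claim that the balancing overhead is negligible.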

18 citations

Proceedings Article
01 Mar 2012
TL;DR: A novel panoramic imaging method is presented that precisely compensates for tone differences between images by estimating ghost pixels in the overlap area, offering a high-speed panorama system with a software-only implementation.
Abstract: In this paper, a novel panoramic imaging method is presented that precisely compensates for tone differences between images by estimating ghost pixels in the overlap area. The method compensates the overlapped images bilaterally, which prevents the panoramic image from getting darker or lighter as images are stitched. The proposed method therefore allows the overall panorama to keep the proper mid-tone, offering a high-speed panorama system with a software-only implementation while maintaining outstanding performance. Experimental results show that the proposed method can correct 95% of serious corner cases while taking less time than existing methods.

17 citations

Proceedings Article
Won-Jong Lee, Youngsam Shin, Jaedon Lee, Shihwa Lee, Soojung Ryu, Jeongwook Kim
19 Nov 2013
TL;DR: Simulation results show that this platform is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers.
Abstract: In this work, we present a novel mobile computing platform for mobile ray tracing in which a fast compact hardware accelerator and a flexible programmable shader are combined. Our platform has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible and high-performance kernels for shading and ray generation. Simulation results show that our platform is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.

14 citations


Cited by
Book
01 Jan 2006
TL;DR: The author discusses the history and present situation of operating systems, as well as some of the techniques used to design and implement these systems.
Abstract: Table of Contents CHAPTER 1 INTRODUCTION 1.1 WHAT IS AN OPERATING SYSTEM? 1.2 HISTORY OF OPERATING SYSTEMS 1.3 OPERATING SYSTEM CONCEPTS 1.4 SYSTEM CALLS 1.5 OPERATING SYSTEM STRUCTURE 1.6 OUTLINE OF THE REST OF THIS BOOK 1.7 SUMMARY CHAPTER 2 PROCESSES 2.1 INTRODUCTION TO PROCESSES 2.2 INTERPROCESS COMMUNICATION 2.3 CLASSICAL IPC PROBLEMS 2.4 SCHEDULING 2.5 OVERVIEW OF PROCESSES IN MINIX 3 2.6 IMPLEMENTATION OF PROCESSES IN MINIX 3 2.7 THE SYSTEM TASK IN MINIX 3 2.8 THE CLOCK TASK IN MINIX 3 2.9 SUMMARY CHAPTER 3 INPUT/OUTPUT 3.1 PRINCIPLES OF I/O HARDWARE 3.2 PRINCIPLES OF I/O SOFTWARE 3.3 DEADLOCKS 3.4 OVERVIEW OF I/O IN MINIX 3 3.5 BLOCK DEVICES IN MINIX 3 3.6 RAM DISKS 3.7 DISKS 3.8 TERMINALS 3.9 SUMMARY CHAPTER 4 MEMORY MANAGEMENT 4.1 BASIC MEMORY MANAGEMENT 4.2 SWAPPING 4.3 VIRTUAL MEMORY 4.4 PAGE REPLACEMENT ALGORITHMS 4.5 DESIGN ISSUES FOR PAGING SYSTEMS 4.6 SEGMENTATION 4.7 OVERVIEW OF THE MINIX 3 PROCESS MANAGER 4.8 IMPLEMENTATION OF THE MINIX 3 PROCESS MANAGER 4.9 SUMMARY CHAPTER 5 FILE SYSTEMS 5.1 FILES 5.2 DIRECTORIES 5.3 FILE SYSTEM IMPLEMENTATION 5.4 SECURITY 5.5 PROTECTION MECHANISMS 5.6 OVERVIEW OF THE MINIX 3 FILE SYSTEM 5.7 IMPLEMENTATION OF THE MINIX 3 FILE SYSTEM 5.8 SUMMARY CHAPTER 6 READING LIST AND BIBLIOGRAPHY 6.1 SUGGESTIONS FOR FURTHER READING 6.2 ALPHABETICAL BIBLIOGRAPHY APPENDIX A - INSTALLING MINIX 3 APPENDIX B - MINIX 3 SOURCE CODE LISTING APPENDIX C - INDEX TO FILES INDEX

572 citations

Proceedings Article
01 Feb 2015
TL;DR: This paper proposes near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules, substantially reducing energy consumption and improving performance.
Abstract: Energy consumed for transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. In this paper, we propose near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules. NDA transfers most data through high-bandwidth and low-energy 3D interconnects between accelerators and DRAM devices instead of low-bandwidth and high-energy off-chip interconnects between a processor and DRAM devices, substantially reducing energy consumption and improving performance. Unlike previous near-memory processing architectures, NDA is built upon commodity DRAM devices; apart from inserting through-silicon vias (TSVs) to 3D-interconnect DRAM devices and accelerators, NDA requires minimal changes to the commodity DRAM device and standard memory module architectures. This allows NDA to be more easily adopted in both existing and emerging systems. Our experiments demonstrate that, on average, our NDA-based system consumes 46% (68%) lower (data transfer) energy at 1.67× higher performance than a system that integrates the same accelerator logic within the processor itself.

251 citations

Journal Article
15 Apr 2015
TL;DR: This work surveys the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.
Abstract: Reconfigurable architectures can bring unique capabilities to computational tasks. They offer the performance and energy efficiency of hardware with the flexibility of software. In some domains, they are the only way to achieve the required, real-time performance without fabricating custom integrated circuits. Their functionality can be upgraded and repaired during their operational lifecycle and specialized to the particular instance of a task. We survey the field of reconfigurable computing, providing a guide to the body-of-knowledge accumulated in architecture, compute models, tools, run-time reconfiguration, and applications.

178 citations

Journal Article
TL;DR: A taxonomy of deghosting algorithms is proposed which can be used to group existing and future algorithms into meaningful classes, and the results of a subjective experiment are shared which aims to evaluate various state-of-the-art deghosting algorithms.
Abstract: Obtaining a high quality high dynamic range (HDR) image in the presence of camera and object movement has been a long-standing challenge. Many methods, known as HDR deghosting algorithms, have been developed over the past ten years to undertake this challenge. Each of these algorithms approaches the deghosting problem from a different perspective, providing solutions with different degrees of complexity, solutions that range from rudimentary heuristics to advanced computer vision techniques. The proposed solutions generally differ in two ways: (1) how to detect ghost regions and (2) what to do to eliminate ghosts. Some algorithms choose to completely discard moving objects, giving rise to HDR images which only contain the static regions. Some other algorithms try to find the best image to use for each dynamic region. Yet others try to register moving objects from different images in the spirit of maximizing dynamic range in dynamic regions. Furthermore, each algorithm may introduce different types of artifacts as they aim to eliminate ghosts. These artifacts may come in the form of noise, broken objects, under- and over-exposed regions, and residual ghosting. Given the high volume of studies conducted in this field over the recent years, a comprehensive survey of the state of the art is required. Thus, the first goal of this paper is to provide this survey. Secondly, the large number of algorithms brings about the need to classify them. Thus the second goal of this paper is to propose a taxonomy of deghosting algorithms which can be used to group existing and future algorithms into meaningful classes. Thirdly, the existence of a large number of algorithms brings about the need to evaluate their effectiveness, as each new algorithm claims to outperform its precedents. Therefore, the last goal of this paper is to share the results of a subjective experiment which aims to evaluate various state-of-the-art deghosting algorithms.

115 citations

Journal Article
TL;DR: The architecture and design of CGRAs are reviewed thoroughly, a novel multidimensional taxonomy is proposed, and major challenges and the corresponding state-of-the-art techniques are surveyed and analyzed.
Abstract: As general-purpose processors have hit the power wall and chip fabrication cost escalates alarmingly, coarse-grained reconfigurable architectures (CGRAs) are attracting increasing interest from both academia and industry, because they offer the performance and energy efficiency of hardware with the flexibility of software. However, CGRAs are not yet mature in terms of programmability, productivity, and adaptability. This article reviews the architecture and design of CGRAs thoroughly for the purpose of exploiting their full potential. First, a novel multidimensional taxonomy is proposed. Second, major challenges and the corresponding state-of-the-art techniques are surveyed and analyzed. Finally, the future development is discussed.

114 citations