Author

Hiroyuki Takizawa

Other affiliations: University UCINF, Niigata University
Bio: Hiroyuki Takizawa is an academic researcher at Tohoku University. He has contributed to research on topics including Cache & Cache pollution, has an h-index of 18, and has co-authored 167 publications receiving 1,116 citations. His previous affiliations include University UCINF and Niigata University.


Papers
Proceedings ArticleDOI
08 Dec 2009
TL;DR: It is demonstrated that a prototype implementation of CheCUDA can correctly checkpoint and restart a CUDA application written with basic APIs, which also indicates that CheCUDA can migrate a process from one PC to another even if the process uses a GPU.
Abstract: In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not support checkpointing the GPU status, CheCUDA hooks a subset of the basic CUDA driver API calls in order to record the status changes on the main memory. At checkpointing, CheCUDA stores the status changes in a file after copying all necessary data in the video memory to the main memory and then disabling the CUDA runtime. At restarting, CheCUDA reads the file, re-initializes the CUDA runtime, and recovers the resources on GPUs so as to restart from the stored status. This paper demonstrates that a prototype implementation of CheCUDA can correctly checkpoint and restart a CUDA application written with basic APIs. This also indicates that CheCUDA can migrate a process from one PC to another even if the process uses a GPU. Accordingly, CheCUDA is useful not only for enhancing the dependability of CUDA applications but also for enabling dynamic task scheduling of CUDA applications, which is required especially on heterogeneous GPU cluster systems. This paper also shows the timing overhead for checkpointing.
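The record-and-replay mechanism described above can be sketched in plain Python (a toy model with hypothetical names; the real CheCUDA hooks CUDA driver API calls in C and copies actual video memory):

```python
import pickle

class CheckpointingDevice:
    """Toy accelerator whose state-changing calls are logged so they
    can be replayed after a restart (the CheCUDA idea, sketched)."""

    def __init__(self):
        self.log = []          # recorded "API calls"
        self.buffers = {}      # simulated video memory

    def malloc(self, name, size):
        self.log.append(("malloc", name, size))
        self.buffers[name] = [0] * size

    def memcpy(self, name, data):
        self.log.append(("memcpy", name, list(data)))
        self.buffers[name][:len(data)] = data

    def checkpoint(self, path):
        # copy of device state is implicit in the call log; serialize it
        with open(path, "wb") as f:
            pickle.dump(self.log, f)

    @classmethod
    def restart(cls, path):
        # re-initialize a fresh device and replay the recorded calls
        dev = cls()
        with open(path, "rb") as f:
            for call, *args in pickle.load(f):
                getattr(dev, call)(*args)
        return dev
```

Replaying the log on a fresh device reproduces the buffer contents, which is the same reason CheCUDA can restart a process on a different PC: nothing GPU-specific survives in the checkpoint file itself.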

91 citations

Proceedings ArticleDOI
16 May 2011
TL;DR: A new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing, that can perform CPR on an OpenCL application program without any modification and recompilation of its code.
Abstract: In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification or recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process, called an API proxy, which invokes the API function on the application's behalf; thus, two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but rather CheCL handles that keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs, including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL enables process migration of OpenCL applications among distinct nodes and among different kinds of compute devices, such as a CPU and a GPU.
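The API-proxy arrangement can be illustrated with a small Python sketch (hypothetical message names; CheCL itself forwards real OpenCL calls from a C library): the application process holds only opaque integer handles, while the proxy process owns the "device" objects, so the application side stays safely checkpointable.

```python
from multiprocessing import Pipe, Process

def api_proxy(conn):
    # The proxy owns the (simulated) OpenCL objects; the application
    # process never touches them, so it remains a standard process.
    objects, next_handle = {}, 1
    while True:
        msg = conn.recv()
        if msg[0] == "create":              # stands in for e.g. clCreateBuffer
            objects[next_handle] = msg[1]
            conn.send(next_handle)          # opaque CheCL-style handle
            next_handle += 1
        elif msg[0] == "query":
            conn.send(objects[msg[1]])
        elif msg[0] == "quit":
            conn.close()
            return

def forward(conn, *call):
    # Application side: every "API call" is shipped to the proxy.
    conn.send(call)
    return conn.recv()

if __name__ == "__main__":
    app_end, proxy_end = Pipe()
    proxy = Process(target=api_proxy, args=(proxy_end,))
    proxy.start()
    handle = forward(app_end, "create", "buffer")
    print(forward(app_end, "query", handle))   # the app holds only an int
    app_end.send(("quit",))
    proxy.join()
```

On restart, a real tool would relaunch the proxy and replay the recorded object-creation information so the opaque handles become valid again; the sketch only shows the forwarding half of that contract.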

61 citations

Journal ArticleDOI
TL;DR: In this article, a divide-and-conquer approach to parallel data clustering is employed to perform coarse-grain parallel processing by multiple PCs with a message passing mechanism.
Abstract: This paper presents an effective scheme for clustering a huge data set using a PC cluster system, in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is devised to achieve three-level hierarchical parallel processing of massive data clustering. The divide-and-conquer approach to parallel data clustering is employed to perform the coarse-grain parallel processing by multiple PCs with a message passing mechanism. By taking advantage of the GPU's parallel processing capability, moreover, the proposed scheme can exploit two types of fine-grain data parallelism at different levels in the nearest neighbor search, which is the most computationally intensive part of the data-clustering process. The performance of our scheme is discussed in comparison with that of an implementation running entirely on the CPU. Experimental results clearly show that the proposed hierarchical parallel processing can remarkably accelerate the data-clustering task. In particular, GPU co-processing is quite effective in improving the computational efficiency of parallel data clustering on a PC cluster. Although data transfer from GPU to CPU is generally costly, acceleration by GPU co-processing significantly reduces the total execution time of data clustering.
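The three-level organization can be mimicked in a few lines of Python (an illustrative single k-means-style step, not the paper's implementation): chunks stand in for the per-PC partitions, the per-chunk nearest-centroid search is the part the paper offloads to the GPU, and combining partial sums plays the role of message passing.

```python
def nearest(point, centroids):
    # Nearest-centroid search: the computational hot spot that the
    # paper accelerates with fine-grain GPU parallelism.
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[k])))

def partial_sums(chunk, centroids):
    # Per-"node" partial result: coordinate sums and counts per cluster.
    sums = [[0.0] * len(centroids[0]) for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for d, v in enumerate(point):
            sums[k][d] += v
    return sums, counts

def kmeans_step(chunks, centroids):
    # "Message passing": merge the nodes' partial sums, then update.
    totals = [[0.0] * len(centroids[0]) for _ in centroids]
    counts = [0] * len(centroids)
    for sums, cnts in (partial_sums(c, centroids) for c in chunks):
        for k in range(len(centroids)):
            counts[k] += cnts[k]
            for d in range(len(totals[k])):
                totals[k][d] += sums[k][d]
    new_centroids = []
    for k in range(len(centroids)):
        if counts[k]:
            new_centroids.append([s / counts[k] for s in totals[k]])
        else:
            new_centroids.append(centroids[k])
    return new_centroids
```

Because each chunk only contributes small per-cluster sums and counts, the inter-node traffic stays tiny even when the chunks themselves are huge, which is what makes the coarse-grain level scale.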

59 citations

Proceedings ArticleDOI
16 May 2011
TL;DR: This paper presents a checkpoint-restart library for CUDA that first deletes all CUDA resources before checkpointing and then restores them right after checkpointing, and proposes a novel technique that replays memory-related API calls.
Abstract: Today, CUDA is the de facto standard programming framework to exploit the computational power of graphics processing units (GPUs) to accelerate various kinds of applications. For efficient use of a large GPU-accelerated system, one important mechanism is checkpoint-restart, which can be used not only to improve fault tolerance but also to optimize node/slot allocation by suspending a job on one node and migrating the job to another node. Although several checkpoint-restart implementations have been developed so far, they do not support CUDA applications or have some severe limitations for CUDA support. Hence, we present a checkpoint-restart library for CUDA that first deletes all CUDA resources before checkpointing and then restores them right after checkpointing. It is necessary to restore each memory chunk at the same memory address. To this end, we propose a novel technique that replays memory-related API calls. The library supports both the CUDA runtime API and the CUDA driver API. Moreover, the library is transparent to applications; it is not necessary to recompile the applications for checkpointing. This paper demonstrates that the proposed library can achieve checkpoint-restart of various applications at acceptable overheads, and that the library also works for MPI applications such as HPL.
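The address-replay idea can be made concrete with a toy allocator in Python (a sketch under an explicit assumption: replaying the same sequence of allocation calls on the same device reproduces the same addresses, which is the property the library exploits; real GPU allocators give no such guarantee in general):

```python
class ReplayAllocator:
    """Toy deterministic bump allocator: replaying the same sequence
    of allocation calls yields the same addresses, mirroring why the
    checkpoint library re-executes memory-related API calls on restart."""

    def __init__(self):
        self.next_addr = 0x1000   # arbitrary base address
        self.log = []             # sizes, in allocation order
        self.memory = {}

    def malloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        self.log.append(size)
        self.memory[addr] = bytearray(size)
        return addr

    def replay(self, log):
        # Re-issue the recorded calls; pointers stored inside the
        # checkpointed application data stay valid because each chunk
        # comes back at its original address.
        return [self.malloc(size) for size in log]
```

Restoring chunks at their original addresses matters because the checkpointed host memory may contain raw device pointers; if any chunk moved, those pointers would silently dangle.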

51 citations

Proceedings ArticleDOI
31 Oct 2008
TL;DR: The evaluation results clearly indicate that runtime processor selection when executing each kernel with given data streams is promising for energy-aware computing on a hybrid computing system.
Abstract: A commodity personal computer (PC) can be seen as a hybrid computing system equipped with two different kinds of processors, i.e., a CPU and a graphics processing unit (GPU). Since the superiority of GPUs in performance and power efficiency strongly depends on the system configuration and the data size determined at runtime, a programmer cannot always know which processor should be used to execute a certain kernel. Therefore, this paper presents a runtime environment that dynamically selects an appropriate processor so as to improve energy efficiency. The evaluation results clearly indicate that runtime processor selection when executing each kernel with given data streams is promising for energy-aware computing on a hybrid computing system.
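The selection policy reduces to comparing per-device cost estimates at kernel-launch time. A minimal Python sketch, with invented energy models (the paper's runtime uses measured system behavior, not these constants): the GPU carries a fixed launch-and-transfer overhead but a lower per-element cost, so it wins only beyond a crossover data size.

```python
def pick_processor(data_size, models):
    """Choose the device with the lowest estimated energy for this
    kernel invocation, given per-device cost models."""
    return min(models, key=lambda dev: models[dev](data_size))

# Hypothetical energy models (arbitrary units): the GPU pays a fixed
# setup/transfer cost of 5000 but only 0.1 per element, so the
# crossover sits near n = 5000 / (1.0 - 0.1) ~ 5556 elements.
models = {
    "cpu": lambda n: 1.0 * n,
    "gpu": lambda n: 5000.0 + 0.1 * n,
}
```

Because the crossover point shifts with the machine and the data stream, this comparison cannot be resolved at compile time, which is exactly why the paper argues for deciding per kernel launch at runtime.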

43 citations


Cited by
Journal ArticleDOI
TL;DR: The Super Dual Auroral Radar Network (SuperDARN) as discussed by the authors has been operating as an international co-operative organization for over 10 years and has been successful in addressing a wide range of scientific questions concerning processes in the magnetosphere, ionosphere, thermosphere, and mesosphere, as well as general plasma physics questions.
Abstract: The Super Dual Auroral Radar Network (SuperDARN) has been operating as an international co-operative organization for over 10 years. The network has now grown so that the fields of view of its 18 radars cover the majority of the northern and southern hemisphere polar ionospheres. SuperDARN has been successful in addressing a wide range of scientific questions concerning processes in the magnetosphere, ionosphere, thermosphere, and mesosphere, as well as general plasma physics questions. We commence this paper with a historical introduction to SuperDARN. Following this, we review the science performed by SuperDARN over the last 10 years covering the areas of ionospheric convection, field-aligned currents, magnetic reconnection, substorms, MHD waves, the neutral atmosphere, and E-region ionospheric irregularities. In addition, we provide an up-to-date description of the current network, as well as the analysis techniques available for use with the data from the radars. We conclude the paper with a discussion of the future of SuperDARN, its expansion, and new science opportunities.

690 citations

Journal Article
GU Si-yang
TL;DR: A privacy preserving association rule mining algorithm was introduced that preserved privacy of individual values by computing scalar product and the security was analyzed.
Abstract: A privacy preserving association rule mining algorithm was introduced. The algorithm preserved the privacy of individual values by computing a scalar product. Meanwhile, the algorithm for computing the scalar product was given, and its security was analyzed.

658 citations

Journal ArticleDOI
TL;DR: This article surveys Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency and reviews both discrete and fused CPU-GPU systems.
Abstract: As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these Processing Units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this article, we survey Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating Heterogeneous Computing Systems (HCSs). We believe that this article will provide insights into the workings and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.

414 citations

Journal ArticleDOI
TL;DR: The development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling are surveyed.
Abstract: Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks.

408 citations

01 Jan 2003
TL;DR: This work has provided a keyword index to help find articles of interest, and additionally a modern, automatically constructed variant of a thematic index: a WEBSOM interface to the whole article collection of the years 1981-2000.
Abstract: The Self-Organizing Map (SOM) algorithm has attracted a great deal of interest among researchers and practitioners in a wide variety of fields. The SOM has been analyzed extensively, a number of variants have been developed and, perhaps most notably, it has been applied extensively within fields ranging from engineering sciences to medicine, biology, and economics. We have collected a comprehensive list of 5384 scientific papers that use the algorithms, have benefited from them, or contain analyses of them. The list is intended to serve as a source for literature surveys. The present addendum contains 2092 new articles, mainly from the years 1998-2002. We have provided a keyword index to help find articles of interest, and additionally a modern, automatically constructed variant of a thematic index: a WEBSOM interface to the whole article collection of the years 1981-2000. The SOM of SOMs is available at http://websom.hut.fi/websom/somref/search.cgi for browsing and searching the collection.

402 citations