Author

Taisuke Boku

Bio: Taisuke Boku is an academic researcher at the University of Tsukuba. The author has contributed to research in topics including Supercomputer & Benchmark (computing). The author has an h-index of 21 and has co-authored 180 publications receiving 2,358 citations. Previous affiliations of Taisuke Boku include Kyoto Prefectural University & the University of Tokyo.


Papers
Journal ArticleDOI
01 Feb 2011
TL;DR: Describes the work of the community to prepare for the challenges of exascale computing, ultimately combining its efforts in a coordinated International Exascale Software Project.
Abstract: Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.

736 citations

Proceedings ArticleDOI
25 Apr 2006
TL;DR: This paper proposes an optimization algorithm that selects a gear from the execution and power profiles while taking the transition overhead into account, achieving an almost 40% reduction in EDP without performance impact compared to running at the standard clock frequency.
Abstract: Currently, several of the high-performance processors used in PC clusters have a DVS (dynamic voltage scaling) architecture that can dynamically scale processor voltage and frequency. Adaptive scheduling of the voltage and frequency enables us to reduce power dissipation without a performance slowdown during communication and memory access. In this paper, we propose a method of profile-based power-performance optimization by DVS scheduling in a high-performance PC cluster. We divide the program execution into several regions and select the best gear for power efficiency. Selecting the best gear is not straightforward since the overhead of DVS transitions is not free. We propose an optimization algorithm to select a gear using the execution and power profiles, taking the transition overhead into account. We have designed and built a power-profiling system, PowerWatch. With this system we examined the effectiveness of our optimization algorithm on two types of power-scalable clusters (Crusoe and Turion). According to the results of benchmark tests, we achieved an almost 40% reduction in terms of EDP (energy-delay product) with minimal performance impact (less than 5%) compared to results using the standard clock frequency.

129 citations
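
The core of the optimization can be illustrated with a small sketch: given per-region execution-time and power profiles for each gear, choose a gear per region so that the energy-delay product of the whole run, including the cost of switching gears between adjacent regions, is minimized. The profile numbers, overhead constants, and the brute-force search below are illustrative assumptions; the paper's actual algorithm is more sophisticated, but the point that the transition overhead belongs in the objective is the same.

```python
# Minimal sketch of profile-based gear selection with transition overhead.
# All names and numbers are illustrative; they do not come from the paper.
# Each region has a measured (time, power) profile per gear; a gear change
# between adjacent regions costs a fixed transition time and energy.
from itertools import product

# profile[region][gear] = (seconds, watts) measured for that region at that gear
profile = [
    [(1.00, 30.0), (1.05, 22.0), (1.40, 15.0)],   # compute-bound region
    [(2.00, 28.0), (2.02, 20.0), (2.05, 14.0)],   # memory/communication-bound region
    [(0.50, 30.0), (0.55, 22.0), (0.80, 15.0)],
]
TRANSITION_TIME = 0.01      # seconds lost per gear switch (assumed)
TRANSITION_ENERGY = 0.2     # joules per gear switch (assumed)

def edp(gears):
    """Energy-delay product of one gear assignment over all regions."""
    time = energy = 0.0
    prev = None
    for region, gear in enumerate(gears):
        sec, watt = profile[region][gear]
        time += sec
        energy += sec * watt
        if prev is not None and gear != prev:
            time += TRANSITION_TIME
            energy += TRANSITION_ENERGY
        prev = gear
    return energy * time

# Exhaustive search is fine for a handful of regions; the interesting part is
# that switching costs are charged inside the objective being minimized.
n_gears = len(profile[0])
best = min(product(range(n_gears), repeat=len(profile)), key=edp)
print("best gear per region:", best, "EDP:", round(edp(best), 2))
```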

Journal ArticleDOI
TL;DR: A first-principles density functional program is developed that efficiently performs large-scale calculations on massively parallel computers and obtains a self-consistent electronic structure within a few hundred hours.

126 citations

Proceedings ArticleDOI
01 Sep 2006
TL;DR: A new algorithm is proposed that reduces the energy consumption of a parallel program executed on a power-scalable cluster using DVFS: it reclaims slack time by changing the voltage and frequency, allowing a reduction in energy consumption without impacting the performance of the program.
Abstract: It has become important to improve the energy efficiency of high-performance PC clusters. In PC clusters, high-performance microprocessors have a dynamic voltage and frequency scaling (DVFS) mechanism, which allows the voltage and frequency to be set to reduce energy consumption. In this paper, we propose a new algorithm that reduces energy consumption in a parallel program executed on a power-scalable cluster using DVFS. Whenever the computational load is not balanced, parallel programs encounter slack time; that is, they must wait for synchronization of the tasks. Our algorithm reclaims slack time by changing the voltage and frequency, which allows a reduction in energy consumption without impacting the performance of the program. Our algorithm can be applied to parallel programs represented by a directed acyclic task graph (DAG). It selects an appropriate set of voltages and frequencies (called the gear) that allows each task to execute at the lowest frequency that does not increase the overall execution time, while keeping the frequencies as uniform as possible across tasks. We built two different types of power-scalable clusters using AMD Turion and Transmeta Crusoe. For the empirical study of energy reduction in PC clusters, we designed a toolkit called PowerWatch that includes power-monitoring tools and a DVFS control library. This toolkit precisely measures the power consumption of the entire cluster in real time. The experimental results using benchmark problems show that our algorithm reduces energy consumption by 25% with only a 1% loss in performance.

98 citations
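
A minimal sketch of the slack-reclamation idea, reduced to a single barrier-synchronized phase rather than a full DAG: each task is slowed to the lowest gear whose stretched runtime still fits within the phase set by the slowest task. The gear table, task times, and the assumption that runtime scales inversely with frequency are illustrative simplifications, not the paper's model.

```python
# Minimal sketch of DVFS slack reclamation for one synchronization phase.
# Assumes task runtime scales inversely with clock frequency, which is a
# simplification; gears and task data are illustrative, not from the paper.

GEARS_GHZ = [2.0, 1.8, 1.6, 1.2, 0.8]   # available frequency gears, highest first
F_MAX = GEARS_GHZ[0]

def pick_gear(runtime_at_fmax, slack):
    """Lowest gear whose stretched runtime still fits within runtime + slack."""
    budget = runtime_at_fmax + slack
    best = F_MAX
    for f in GEARS_GHZ:
        if runtime_at_fmax * (F_MAX / f) <= budget:
            best = f                      # keep the lowest feasible frequency
    return best

# tasks: (name, runtime at f_max in seconds); the phase ends at a barrier,
# so each task's slack is its gap to the slowest task.
tasks = [("t0", 4.0), ("t1", 2.5), ("t2", 3.0), ("t3", 4.0)]
phase_len = max(r for _, r in tasks)
for name, runtime in tasks:
    gear = pick_gear(runtime, phase_len - runtime)
    print(f"{name}: run at {gear} GHz (was {runtime}s of a {phase_len}s phase)")
```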

Journal ArticleDOI
TL;DR: An overview of the capabilities of the SALMON software package is provided, with several sample calculations of real-time, real-space electron dynamics induced in molecules and solids by an external electric field, obtained by solving the time-dependent Kohn–Sham equation.

93 citations
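
For reference, the time-dependent Kohn–Sham equation mentioned in the TL;DR has the following standard form (atomic units, length-gauge coupling to a uniform field; the equations actually solved by SALMON include pseudopotentials and, for solids, a vector-potential formulation, so this is only the schematic version):

```latex
i\,\frac{\partial}{\partial t}\,\psi_j(\mathbf{r},t)
  = \left[ -\tfrac{1}{2}\nabla^2 + v_{\mathrm{ion}}(\mathbf{r})
          + v_{\mathrm{H}}[\rho](\mathbf{r},t)
          + v_{\mathrm{xc}}[\rho](\mathbf{r},t)
          + \mathbf{r}\cdot\mathbf{E}(t) \right] \psi_j(\mathbf{r},t),
\qquad
\rho(\mathbf{r},t) = \sum_j \lvert \psi_j(\mathbf{r},t) \rvert^2 .
```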


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently, namely those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations
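
A toy sketch of the third (spatial-decomposition) algorithm, which the abstract reports scales best for large problems: each processor owns a fixed region of the simulation box and the atoms currently inside it, so only atoms near subdomain boundaries need to be communicated. The box size, processor grid, and rank mapping below are invented for illustration, not taken from the paper.

```python
# Toy sketch of spatial decomposition: each processor owns a fixed region of
# the simulation box and the atoms currently inside it. Box size, processor
# grid, and atom coordinates are illustrative only.
import random

BOX = 10.0                 # cubic box edge length (assumed)
PGRID = (2, 2, 2)          # processor grid: 8 ranks, one subdomain each

def owner_rank(pos):
    """Map an (x, y, z) position to the rank owning that subdomain."""
    ix = min(int(pos[0] / BOX * PGRID[0]), PGRID[0] - 1)
    iy = min(int(pos[1] / BOX * PGRID[1]), PGRID[1] - 1)
    iz = min(int(pos[2] / BOX * PGRID[2]), PGRID[2] - 1)
    return (ix * PGRID[1] + iy) * PGRID[2] + iz

random.seed(0)
atoms = [tuple(random.uniform(0.0, BOX) for _ in range(3)) for _ in range(500)]

# Per-rank atom lists; as atoms move they migrate between neighboring ranks,
# and only boundary ("ghost") atoms need to be exchanged each timestep.
domains = {}
for atom in atoms:
    domains.setdefault(owner_rank(atom), []).append(atom)
print({rank: len(a) for rank, a in sorted(domains.items())})
```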

Proceedings ArticleDOI
28 Apr 2010
TL;DR: Hedera is presented, a scalable, dynamic flow scheduling system that adaptively schedules a multi-stage switching fabric to efficiently utilize aggregate network resources and delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.
Abstract: Today's data centers offer tremendous aggregate bandwidth to clusters of tens of thousands of machines. However, because of limited port densities in even the highest-end switches, data center topologies typically consist of multi-rooted trees with many equal-cost paths between any given pair of hosts. Existing IP multipathing protocols usually rely on per-flow static hashing and can cause substantial bandwidth losses due to long-term collisions. In this paper, we present Hedera, a scalable, dynamic flow scheduling system that adaptively schedules a multi-stage switching fabric to efficiently utilize aggregate network resources. We describe our implementation using commodity switches and unmodified hosts, and show that for a simulated 8,192 host data center, Hedera delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.

1,602 citations
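
The flavor of central, dynamic flow placement can be sketched with a toy greedy scheduler that assigns each large flow to the first equal-cost core path with enough spare capacity, falling back to default hashing when none fits. Link capacities, flow demands, and path names are invented; Hedera's actual demand estimation and placement heuristics are more elaborate than this sketch.

```python
# Toy sketch of central flow placement over equal-cost core paths. Link
# capacities, flow demands, and path names are invented for illustration.

LINK_CAPACITY = 10.0                       # Gb/s per core path (assumed)
core_load = {"core0": 0.0, "core1": 0.0, "core2": 0.0, "core3": 0.0}

# (flow id, estimated demand in Gb/s); a central scheduler works from
# estimated demands rather than instantaneous sending rates.
flows = [("f1", 4.0), ("f2", 6.0), ("f3", 3.0), ("f4", 5.0), ("f5", 2.0)]

placement = {}
for flow, demand in flows:
    # pick the first core path with enough spare capacity for this flow
    for core, load in core_load.items():
        if load + demand <= LINK_CAPACITY:
            core_load[core] += demand
            placement[flow] = core
            break
    else:
        placement[flow] = None             # no single path fits; leave to static hashing

print(placement)
print(core_load)
```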

Book
01 Jan 1996

1,170 citations

Journal ArticleDOI
TL;DR: An in-depth study of the existing literature on data center power modeling is presented, covering more than 200 models organized in a hierarchical structure with two main branches focusing on hardware-centric and software-centric power models.
Abstract: Data centers are critical, energy-hungry infrastructures that run large-scale Internet-based services. Energy consumption models are pivotal in designing and optimizing energy-efficient operations to curb excessive energy consumption in data centers. In this paper, we survey the state-of-the-art techniques used for energy consumption modeling and prediction for data centers and their components. We conduct an in-depth study of the existing literature on data center power modeling, covering more than 200 models. We organize these models in a hierarchical structure with two main branches focusing on hardware-centric and software-centric power models. Under hardware-centric approaches we start from the digital circuit level and move on to describe higher-level energy consumption models at the hardware component level, server level, data center level, and finally systems of systems level. Under the software-centric approaches we investigate power models developed for operating systems, virtual machines and software applications. This systematic approach allows us to identify multiple issues prevalent in power modeling of different levels of data center systems, including: i) few modeling efforts are targeted at power consumption of the entire data center; ii) many state-of-the-art power models are based on only a few CPU or server metrics; and iii) the effectiveness and accuracy of these power models remain open questions. Based on these observations, we conclude the survey by describing key challenges for future research on constructing effective and accurate data center power models.

741 citations
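
As a concrete example of the "few CPU or server metrics" class of models the survey flags, a widely used server-level model interpolates linearly between idle and peak power as a function of CPU utilization. The wattage figures and the facility-level PUE adjustment below are placeholder assumptions, not values from the paper.

```python
# Minimal sketch of a utilization-based server power model of the kind the
# survey classifies at the server level. Idle and peak figures are placeholders.

P_IDLE_W = 120.0    # server power at 0% CPU utilization (assumed)
P_PEAK_W = 300.0    # server power at 100% CPU utilization (assumed)

def server_power(cpu_util):
    """Linear power model: P(u) = P_idle + (P_peak - P_idle) * u, with u in [0, 1]."""
    u = min(max(cpu_util, 0.0), 1.0)
    return P_IDLE_W + (P_PEAK_W - P_IDLE_W) * u

# A rough facility-level estimate scales the IT load by a PUE factor
# (a common convention, used here only for illustration).
PUE = 1.5
servers_util = [0.10, 0.35, 0.80, 0.60]
it_power = sum(server_power(u) for u in servers_util)
print(f"IT load: {it_power:.0f} W, facility estimate: {it_power * PUE:.0f} W")
```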

01 Jan 2017
TL;DR: The 2017 roadmap on terahertz frequency electromagnetic radiation (100 GHz–30 THz) provides a snapshot of the present state of THz science and technology in 2017 and an opinion on the challenges and opportunities that the future holds.
Abstract: Science and technologies based on terahertz frequency electromagnetic radiation (100 GHz–30 THz) have developed rapidly over the last 30 years. For most of the 20th century, terahertz radiation, then referred to as sub-millimeter wave or far-infrared radiation, was mainly utilized by astronomers and some spectroscopists. Following the development of laser-based terahertz time-domain spectroscopy in the 1980s and 1990s, the field of THz science and technology expanded rapidly, to the extent that it now touches many areas from fundamental science to 'real world' applications. For example, THz radiation is being used to optimize materials for new solar cells, and may also be a key technology for the next generation of airport security scanners. While the field was emerging it was possible to keep track of all new developments; however, the field has now grown so much that it is increasingly difficult to follow the diverse range of new discoveries and applications that are appearing. At this point in time, when the field of THz science and technology is moving from an emerging to a more established and interdisciplinary field, it is apt to present a roadmap to help identify the breadth and future directions of the field. The aim of this roadmap is to present a snapshot of the present state of THz science and technology in 2017, and provide an opinion on the challenges and opportunities that the future holds. To be able to achieve this aim, we have invited a group of international experts to write 18 sections that cover most of the key areas of THz science and technology. We hope that The 2017 Roadmap on THz science and technology will prove to be a useful resource by providing a wide-ranging introduction to the capabilities of THz radiation for those outside or just entering the field, as well as providing perspective and breadth for those who are well established. We also feel that this review should serve as a useful guide for government and funding agencies.

690 citations