Author

Naoki Takada

Bio: Naoki Takada is an academic researcher from Kōchi University. The author has contributed to research in topics: Graphics processing unit & Holography. The author has an h-index of 16 and has co-authored 57 publications that have received 965 citations. Previous affiliations of Naoki Takada include Shohoku College & MediaTech Institute.


Papers
Journal Article
TL;DR: Using an RV870 GPU and OpenCL, a CGH of a 3D object consisting of 1,024 points is calculated in 30 milliseconds, approximately twice as fast as on a GPU made by NVIDIA.
Abstract: In this paper, we report fast calculation of a computer-generated hologram (CGH) using the new architecture of the HD5000 series GPU (RV870) made by AMD and its new software development environment, OpenCL. Using an RV870 GPU and OpenCL, we can calculate a CGH with a resolution of 1,920 × 1,024 from a 3D object consisting of 1,024 points in 30 milliseconds. This is approximately twice as fast as the calculation on a GPU made by NVIDIA.

128 citations

Journal Article
TL;DR: This work presents a new C++ class library for diffraction and CGH calculations, referred to as the CWO++ library, which runs on a CPU and GPU and provides diffraction calculations useful for computer-generated holograms, digital holography, diffractive optical elements, microscopy, image encryption and decryption, and three-dimensional analysis for optical devices.

108 citations

Proceedings Article
15 Nov 2003
TL;DR: The Protein Explorer is a PC cluster equipped with special-purpose engines that calculate nonbonded interactions between atoms, which is the most time-consuming part of the simulations.
Abstract: We are developing the 'Protein Explorer' system, a petaflops special-purpose computer system for molecular dynamics simulations. The Protein Explorer is a PC cluster equipped with special-purpose engines that calculate nonbonded interactions between atoms, which is the most time-consuming part of the simulations. A dedicated LSI 'MDGRAPE-3 chip' performs these force calculations at a speed of 165 gigaflops or higher. The system will have 6,144 MDGRAPE-3 chips to achieve a nominal peak performance of one petaflop. The system will be completed in 2006. In this paper, we describe the project plans and the architecture of the Protein Explorer.
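As a quick check of the figures quoted in this abstract, the nominal peak follows from multiplying the chip count by the per-chip speed. The sketch below assumes every chip runs at exactly the quoted 165 gigaflops; the function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract above.
CHIP_GFLOPS = 165    # per-chip speed ("165 gigaflops or higher")
NUM_CHIPS = 6_144    # planned number of MDGRAPE-3 chips


def nominal_peak_pflops(chips: int, gflops_per_chip: float) -> float:
    """Aggregate peak in petaflops, assuming all chips run at the quoted speed."""
    return chips * gflops_per_chip / 1e6  # 1 petaflop = 1e6 gigaflops


peak = nominal_peak_pflops(NUM_CHIPS, CHIP_GFLOPS)
print(f"{peak:.3f} petaflops")
```

Under that assumption the product is about 1.01 petaflops, consistent with the "nominal peak performance of one petaflop" stated in the abstract.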

91 citations

Journal Article
TL;DR: This work implements an optimized CGH computation on a multi-graphics processing unit cluster system, which can calculate a CGH of 6,400×3,072 pixels from a three-dimensional object composed of 2,048 points in 55 ms.
Abstract: To overcome the computational complexity of a computer-generated hologram (CGH), we implement an optimized CGH computation in our multi-graphics processing unit cluster system. Our system can calculate a CGH of 6,400×3,072 pixels from a three-dimensional (3D) object composed of 2,048 points in 55 ms. Furthermore, in the case of a 3D object composed of 4,096 points, our system is 553 times faster than a conventional central processing unit (using eight threads).

76 citations

Journal Article
01 Apr 2018
TL;DR: It is shown that a special-purpose holography computing board, which uses eight large-scale field-programmable gate arrays, can be used to generate 10^8-pixel holograms that can be updated at a video frame rate, thus allowing 3D movies to be projected.
Abstract: Holography is a method of recording and reproducing three-dimensional (3D) images, and the widespread availability of computers has encouraged the development of holographic 3D screens (electroholography). However, the technology has not yet been used in practical applications because a hologram requires an enormous volume of data and modern computing power is inadequate to process this volume of data in real time. Here, we show that a special-purpose holography computing board, which uses eight large-scale field-programmable gate arrays, can be used to generate 10^8-pixel holograms that can be updated at a video frame rate. With our approach, we achieve a parallel operation of 4,480 hologram calculation circuits on a single board, and by clustering eight of these boards, we can increase the number of parallel calculations to 35,840. Using a 3D image composed of 7,877 points, we show that 10^8-pixel holograms can be updated at a video rate, thus allowing 3D movies to be projected. We also demonstrate that the system speed scales up in a linear manner as the number of parallel circuits is increased. The system operates at 0.25 GHz with an effective speed equivalent to 0.5 petaflops (10^15 floating-point operations per second), matching that of a high-performance computer.

72 citations


Cited by
Proceedings Article
11 Nov 2006
TL;DR: This work presents several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes, including a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time.
Abstract: Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code.

2,035 citations

Journal Article
TL;DR: Computational models provide insights into the complex relationships between the stimuli and the cellular responses, and reveal the mechanisms that are responsible for signal amplification, noise reduction and generation of discontinuous bistable dynamics or oscillations.
Abstract: The specificity of cellular responses to receptor stimulation is encoded by the spatial and temporal dynamics of downstream signalling networks. Temporal dynamics are coupled to spatial gradients of signalling activities, which guide pivotal intracellular processes and tightly regulate signal propagation across a cell. Computational models provide insights into the complex relationships between the stimuli and the cellular responses, and reveal the mechanisms that are responsible for signal amplification, noise reduction and generation of discontinuous bistable dynamics or oscillations.

1,296 citations

Journal Article
TL;DR: The rapidly evolving state of the art for atomic-level biomolecular simulation is described, the types of biological discoveries that can now be made through simulation are illustrated, and challenges motivating continued innovation in this field are discussed.
Abstract: Molecular dynamics simulations capture the behavior of biological macromolecules in full atomic detail, but their computational demands, combined with the challenge of appropriately modeling the relevant physics, have historically restricted their length and accuracy. Dramatic recent improvements in achievable simulation speed and the underlying physical models have enabled atomic-level simulations on timescales as long as milliseconds that capture key biochemical processes such as protein folding, drug binding, membrane transport, and the conformational changes critical to protein function. Such simulation may serve as a computational microscope, revealing biomolecular mechanisms at spatial and temporal scales that are difficult to observe experimentally. We describe the rapidly evolving state of the art for atomic-level biomolecular simulation, illustrate the types of biological discoveries that can now be made through simulation, and discuss challenges motivating continued innovation in this field.

974 citations

Journal Article
01 Jul 2008
TL;DR: A massively parallel machine called Anton is described, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems and has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation.
Abstract: The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry, and medicine. A wide range of biologically interesting phenomena, however, occur over timescales on the order of a millisecond, several orders of magnitude beyond the duration of the longest current MD simulations. We describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized high-speed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.

778 citations

Proceedings Article
01 Jan 2006

563 citations