Journal ArticleDOI

GPU computing in medical physics: a review.

01 May 2011-Medical Physics (American Association of Physicists in Medicine)-Vol. 38, Iss: 5, pp 2685-2697
TL;DR: The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
Abstract: The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
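Many of the applications surveyed in the review reduce to the same data-parallel pattern: assign one GPU thread per voxel or detector element and apply an independent operation. The sketch below illustrates that pattern with a hypothetical voxel-wise window/level rescale kernel in CUDA; the kernel and function names are illustrative and not taken from the paper.

    #include <cuda_runtime.h>

    // One thread per voxel; each voxel is processed independently, so no
    // synchronization between threads is needed.
    __global__ void rescaleVolume(const float* in, float* out, int nVoxels,
                                  float window, float level)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global voxel index
        if (i < nVoxels) {
            float v = (in[i] - (level - 0.5f * window)) / window;
            out[i] = fminf(fmaxf(v, 0.0f), 1.0f);        // clamp to [0, 1]
        }
    }

    // Host-side launch: cover all voxels with 256-thread blocks.
    void launchRescale(const float* d_in, float* d_out, int nVoxels,
                       float window, float level)
    {
        int threads = 256;
        int blocks  = (nVoxels + threads - 1) / threads;
        rescaleVolume<<<blocks, threads>>>(d_in, d_out, nVoxels, window, level);
    }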


Citations
Journal ArticleDOI
TL;DR: An implementation of explicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs, providing results that are statistically indistinguishable from the traditional CPU version of the software, with performance exceeding that achievable by the CPU version running on conventional CPU-based clusters and supercomputers.
Abstract: We present an implementation of explicit solvent all atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs. First released publicly in April 2010 as part of version 11 of the AMBER MD package and further improved and optimized over the last two years, this implementation supports the three most widely used statistical mechanical ensembles (NVE, NVT, and NPT), uses particle mesh Ewald (PME) for the long-range electrostatics, and runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs), providing results that are statistically indistinguishable from the traditional CPU version of the software and with performance that exceeds that achievable by the CPU version of AMBER software running on all conventional CPU-based clusters and supercomputers. We briefly discuss three different precision models developed specifically for this work (SPDP, SPFP, and DPDP) and highlight the technical details of the approach as it extends beyond previously reported work [Gotz et al., J. Chem. Theory Comput. 2012, DOI: 10.1021/ct200909j; Le Grand et al., Comp. Phys. Comm. 2013, DOI: 10.1016/j.cpc.2012.09.022]. We highlight the substantial improvements in performance that are seen over traditional CPU-only machines and provide validation of our implementation and precision models. We also provide evidence supporting our decision to deprecate the previously described fully single precision (SPSP) model from the latest release of the AMBER software package.
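The SPFP precision model mentioned above pairs single-precision arithmetic with 64-bit fixed-point accumulation of forces. The fragment below is a rough, hypothetical CUDA illustration of that idea, not AMBER source code; the scale factor and function names are assumptions made for the sketch.

    #include <cuda_runtime.h>

    // Illustrative fixed-point scale; SPFP-style accumulation converts each
    // single-precision contribution to a 64-bit integer so that atomic
    // addition is exact and independent of thread ordering.
    #define FORCE_SCALE (double)(1ll << 40)

    __device__ void accumulateForce(unsigned long long* fFixed, float f)
    {
        long long q = (long long)(f * FORCE_SCALE);   // float -> fixed point
        atomicAdd(fFixed, (unsigned long long)q);     // exact integer add
    }

    // Convert the accumulated fixed-point value back to floating point.
    inline double fixedToDouble(unsigned long long fFixed)
    {
        return (double)(long long)fFixed / FORCE_SCALE;
    }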

2,418 citations

Journal ArticleDOI
TL;DR: This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations.

360 citations

Journal ArticleDOI
TL;DR: It is concluded that most segmentation methods may benefit from GPU processing due to the methods' data parallel structure and high thread count, however, factors such as synchronization, branch divergence and memory usage can limit the speedup.

230 citations


Cites background from "GPU computing in medical physics: a..."

  • ...Pratx and Xing (2011) provided a review on GPU computing in medical physics with a focus on the applications of image reconstruction, dose calculation and treatment plan optimization, and image processing....

    [...]

  • ...Rumpf and Strzodka (2001) presented a GPU implementation as early as in 2001....

    [...]

  • ...Rumpf and Strzodka (2001) presented a GPU implementation as early as 2001. Updating the level set function φ for voxels far away from the contour does not significantly affect the movement of the contour. This observation has led to two different optimization techniques, known as narrow band and sparse field, both of which reduce the number of voxels updated in each iteration. The narrow band method updates φ only within a thin band around the contour, while the sparse field method updates φ only at the neighbor pixels of the contour. Although these methods reduce the number of threads considerably, they introduce branch divergence. All of these level set methods also require global synchronization after each iteration. Hong and Wang (2004) used shader programming to create a GPU implementation of level sets for 2D images, and reported a speedup of over 10 times that of a CPU implementation....

    [...]
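The narrow-band scheme described in the excerpt above maps naturally to a per-voxel CUDA kernel in which out-of-band threads exit early. The sketch below is illustrative and not taken from any of the cited papers; it shows where the branch divergence arises, and the speed term and names are assumptions.

    // One thread per voxel. Voxels outside the band copy phi through and exit,
    // which is the branch divergence mentioned in the excerpt; the required
    // global synchronization is obtained by launching one kernel per iteration.
    __global__ void narrowBandUpdate(const float* phi, float* phiNext,
                                     const float* speed, int n,
                                     float bandWidth, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        if (fabsf(phi[i]) > bandWidth) {   // outside the narrow band
            phiNext[i] = phi[i];           // no update needed
            return;                        // threads in the same warp may diverge here
        }
        // Inside the band: explicit update phi += dt * F (gradient term omitted
        // for brevity in this sketch).
        phiNext[i] = phi[i] + dt * speed[i];
    }
    // Host loop (sketch): launch narrowBandUpdate once per iteration, then swap
    // the phi / phiNext buffers; the kernel boundary acts as the global sync.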

Journal ArticleDOI
TL;DR: A review of breast tomosynthesis research is performed, with an emphasis on its medical physics aspects, including reconstruction, image processing, and analysis, as well as the advanced applications being investigated for breast tomosynthesis.
Abstract: Many important post-acquisition aspects of breast tomosynthesis imaging can impact its clinical performance. Chief among them is the reconstruction algorithm that generates the representation of the three-dimensional breast volume from the acquired projections. But even after reconstruction, additional processes, such as artifact reduction algorithms, computer aided detection and diagnosis, among others, can also impact the performance of breast tomosynthesis in the clinical realm. In this two part paper, a review of breast tomosynthesis research is performed, with an emphasis on its medical physics aspects. In the companion paper, the first part of this review, the research performed relevant to the image acquisition process is examined. This second part will review the research on the post-acquisition aspects, including reconstruction, image processing, and analysis, as well as the advanced applications being investigated for breast tomosynthesis.

215 citations

Journal ArticleDOI
04 Jun 2013-Sensors
TL;DR: The currently available optoacoustic image reconstruction and quantification approaches are assessed, including back-projection and model-based inversion algorithms, sparse signal representation, wavelet-based approaches, methods for reduction of acoustic artifacts as well as multi-spectral methods for visualization of tissue bio-markers.
Abstract: This paper comprehensively reviews the emerging topic of optoacoustic imaging from the image reconstruction and quantification perspective. Optoacoustic imaging combines highly attractive features, including rich contrast and high versatility in sensing diverse biological targets, excellent spatial resolution not compromised by light scattering, and relatively low cost of implementation. Yet, living objects present a complex target for optoacoustic imaging due to the presence of a highly heterogeneous tissue background in the form of strong spatial variations of scattering and absorption. Extracting quantified information on the actual distribution of tissue chromophores and other biomarkers constitutes therefore a challenging problem. Image quantification is further compromised by some frequently-used approximated inversion formulae. In this review, the currently available optoacoustic image reconstruction and quantification approaches are assessed, including back-projection and model-based inversion algorithms, sparse signal representation, wavelet-based approaches, methods for reduction of acoustic artifacts as well as multi-spectral methods for visualization of tissue bio-markers. Applicability of the different methodologies is further analyzed in the context of real-life performance in small animal and clinical in-vivo imaging scenarios.
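Among the reconstruction approaches listed above, back-projection is the simplest to express on a GPU: each image pixel sums the detector signals at the corresponding time of flight. The CUDA kernel below is a minimal delay-and-sum sketch for a 2D geometry; the array layout, sampling rate fs, and speed of sound c are illustrative assumptions, and real implementations add filtering, interpolation, and weighting.

    // One thread per image pixel; sinogram is assumed to be laid out as
    // [detector][time sample]. fs = sampling rate, c = speed of sound.
    __global__ void backproject2D(const float* sinogram,
                                  const float* detX, const float* detY,
                                  float* image, int nx, int ny,
                                  int nDet, int nSamples,
                                  float dx, float fs, float c)
    {
        int px = blockIdx.x * blockDim.x + threadIdx.x;
        int py = blockIdx.y * blockDim.y + threadIdx.y;
        if (px >= nx || py >= ny) return;

        float x = (px - 0.5f * nx) * dx;                    // pixel position,
        float y = (py - 0.5f * ny) * dx;                    // grid centred at origin

        float sum = 0.0f;
        for (int d = 0; d < nDet; ++d) {
            float dist = hypotf(x - detX[d], y - detY[d]);  // pixel-detector distance
            int   s    = (int)(dist / c * fs + 0.5f);       // time of flight -> sample
            if (s < nSamples)
                sum += sinogram[d * nSamples + s];          // nearest-neighbour read
        }
        image[py * nx + px] = sum / nDet;
    }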

182 citations


Cites background from "GPU computing in medical physics: a..."

  • ...Graphics processing units (GPUs), which have been increasingly used in medical imaging in recent years due to their vast computational power [87,88], are expected to mitigate or significantly reduce the complications associated with the computational complexity of optoacoustic imaging....

    [...]

References
18 Dec 2006
TL;DR: The parallel landscape is framed with seven questions, and the report recommends, among other things, that the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems, and that the target should be 1000s of cores per chip, built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Abstract: Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, SW | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar. • Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.) • “Autotuners” should play a larger role than conventional compilers in translating parallel programs. • To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications. • To be successful, programming models should be independent of the number of processors. • To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism. • Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters. • Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines. • To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost. Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

2,262 citations

Proceedings ArticleDOI
11 Aug 2008
TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
Abstract: The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.
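The transparent scaling described above comes from expressing work as many independent thread blocks that the hardware schedules onto however many multiprocessors a given GPU provides. The grid-stride SAXPY below is a common illustration of this idea (a sketch, not code from the cited paper).

    // Grid-stride SAXPY: the same kernel runs unchanged on GPUs with few or
    // many multiprocessors, because the loop covers any elements not reached
    // by the launch configuration alone.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += gridDim.x * blockDim.x)      // stride = total threads in the grid
        {
            y[i] = a * x[i] + y[i];
        }
    }
    // Launch (sketch): saxpy<<<numBlocks, 256>>>(n, 2.0f, d_x, d_y); the block
    // count can be tuned per device without touching the kernel.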

2,216 citations

Journal ArticleDOI
TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general-purpose computation to graphics hardware.
Abstract: The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware. This survey should be of particular interest to researchers who are interested in using the latest GPGPU applications in their systems of interest.

1,998 citations

Proceedings Article
01 Jan 2005
TL;DR: The techniques used in mapping general-purpose computation to graphics hardware will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques.
Abstract: The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware. This survey should be of particular interest to researchers who are interested in using the latest GPGPU applications in their systems of interest.

1,728 citations

01 May 2008
TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Abstract: The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

1,570 citations