Topic

Speedup

About: Speedup is a research topic. Over its lifetime, 23,618 publications have been published within this topic, receiving 390,005 citations.

[Figure: papers published on a yearly basis]

Papers

Proceedings Article

A dynamic frequency linear array processor for image processing

Nagarajan Ranganathan¹, N. Bhavanishankar¹, N. Vijaykrishnan

University of South Florida¹

25 Aug 1996

TL;DR: The dynamic clocking scheme provided a speedup ranging from 1.5 to 3 over uni-frequency clocking for various low-level pattern recognition and image processing algorithms that were mapped onto the chip.

Abstract: In this paper, we propose a dynamic frequency linear array processor, DFLAP, for real-time image processing applications. The architecture uses a novel concept of dynamic frequency clocking, which allows the chip to operate between a maximum frequency of 400 MHz and a minimum frequency of 50 MHz depending on the operation being performed. The dynamic clocking scheme is especially useful in the context of image processing applications, where some tasks require only logic functions, others only additions, and still others multiplication or division. The proposed architecture provides speedup by supporting two levels of parallelism and by using variable-frequency, single-clock-cycle operations. DFLAP provides parallelism at the array level, using multiple processing elements (PEs), and at the functional level, allowing concurrent use of the various units within a PE. For an N×N image, the array contains N PEs, and each PE in turn contains an 8-bit arithmetic/logic unit, an 8×8 single-cycle multiplier, a shifter, a neighbor communication unit, a 32×8 dual-port SRAM, and a dynamic clocking unit (DCU). The DCU in each PE enables dynamic switching of clock frequencies. The dynamic clocking scheme provided a speedup ranging from 1.5 to 3 over uni-frequency clocking for various low-level pattern recognition and image processing algorithms that were mapped onto the chip.
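The speedup from dynamic clocking can be illustrated with a small timing model. The sketch below assumes that a uni-frequency, single-cycle design must clock every operation at the rate of its slowest operation, while a dynamically clocked PE runs each operation at its own rate. The per-operation frequencies in the hypothetical `OP_FREQ_MHZ` table are placeholders chosen inside the 50 to 400 MHz range stated in the abstract, not values from the paper, and the example operation mix is likewise made up.

```python
# Hypothetical per-operation clock frequencies (MHz). They only respect the
# 50-400 MHz range quoted for DFLAP; the paper's actual values may differ.
OP_FREQ_MHZ = {
    "logic": 400.0,
    "add": 200.0,
    "multiply": 100.0,
    "divide": 50.0,
}


def run_time_ns(ops, dynamic=True):
    """Total time (ns) for a sequence of single-cycle operations.

    With dynamic clocking each operation runs at its own frequency; with a
    single (uni-frequency) clock every cycle must fit the slowest operation.
    """
    slowest = min(OP_FREQ_MHZ.values())
    total = 0.0
    for op in ops:
        freq = OP_FREQ_MHZ[op] if dynamic else slowest
        total += 1000.0 / freq  # one cycle at f MHz lasts 1000/f ns
    return total


def speedup(ops):
    """Uni-frequency time divided by dynamic-frequency time."""
    return run_time_ns(ops, dynamic=False) / run_time_ns(ops, dynamic=True)


if __name__ == "__main__":
    # Hypothetical operation mix for one pixel of some low-level kernel.
    kernel = ["logic"] * 4 + ["add"] * 3 + ["multiply"] * 2 + ["divide"]
    print(f"speedup = {speedup(kernel):.2f}")
```

Under these assumptions the gain depends entirely on the operation mix: logic-heavy kernels benefit most, while divide-heavy kernels gain almost nothing, which is consistent with the paper reporting a range rather than a single figure.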

7 citations

Proceedings Article

Accelerated molecular mechanical and solvation energetics on multicore CPUs and manycore GPUs

Deukhyun Cha, Qin Zhang¹, Jesmin Jahan Tithi², Alexander Rand³, Rezaul Chowdhury², Chandrajit L. Bajaj⁴

CGG¹, State University of New York System², CD-adapco³, University of Texas at Austin⁴

09 Sep 2015

TL;DR: A hybrid method that simultaneously exploits both CPU and GPU cores to provide the best performance for the selected parameters of the approximation scheme is presented; the accompanying CUDA kernels achieve more than two orders of magnitude speedup over serial computation for many of the molecular energetics terms.

Abstract: Motivation. Despite several reported successes in accelerating molecular modeling and simulation tools on programmable GPUs (Graphics Processing Units), the general focus has been on fast computation with small molecules, primarily because of the limited memory size on the GPU. Moreover, the simultaneous use of CPU and GPU cores for a single kernel execution, a necessity for achieving high parallelism, has not been fully considered.

Results. We present fast computation methods for molecular mechanical (Lennard-Jones and Coulombic) and generalized Born solvation energetics that run on commodity multicore CPUs and manycore GPUs. The key idea is to trade off the accuracy of pairwise, long-range atomistic energetics for higher execution speed. A simple yet efficient CUDA kernel for GPU acceleration is presented, which ensures high arithmetic intensity and memory efficiency. Our CUDA kernel uses a cache-friendly, recursive, linear-space octree data structure to handle very large molecular structures with up to several million atoms. Based on this CUDA kernel, we present a hybrid method that simultaneously exploits both CPU and GPU cores to provide the best performance for the selected parameters of the approximation scheme. Our CUDA kernels achieve more than two orders of magnitude speedup over serial computation for many of the molecular energetics terms. The hybrid method is shown to achieve the best performance for all values of the approximation parameter.

Availability. The source code and binaries are freely available as PMEOPA (Parallel Molecular Energetic using Octree Pairwise Approximation) and can be downloaded from http://cvcweb.ices.utexas.edu/software.
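The accuracy-for-speed trade-off described above can be sketched with a much simpler scheme than the paper's octree. In the sketch below, a one-level uniform grid stands in for the octree: atoms are binned into cells, nearby cell pairs are summed exactly, and each far-apart cell pair is replaced by one interaction between the cells' total charges placed at their geometric centroids. The Coulomb constant and units are omitted (energy is taken as the sum of q_i·q_j / r_ij over pairs), Lennard-Jones and generalized Born terms are not covered, and the `cell` and `far_factor` parameters are hypothetical, not PMEOPA's API.

```python
import itertools
import math
from collections import defaultdict


def direct_energy(pos, q):
    """Exact O(n^2) Coulomb-like energy: sum over pairs of q_i*q_j / r_ij."""
    e = 0.0
    for i, j in itertools.combinations(range(len(q)), 2):
        e += q[i] * q[j] / math.dist(pos[i], pos[j])
    return e


def grid_energy(pos, q, cell=1.0, far_factor=2.0):
    """Approximate energy for 3-D positions using a one-level grid (not an octree).

    Pairs of cells whose centroids are at least far_factor * cell apart are
    treated as two point charges at their centroids; closer pairs are exact.
    """
    cells = defaultdict(list)
    for idx, p in enumerate(pos):
        cells[tuple(math.floor(c / cell) for c in p)].append(idx)

    # Per-cell total charge and geometric centroid.
    summary = {}
    for key, members in cells.items():
        total_q = sum(q[m] for m in members)
        centroid = tuple(sum(pos[m][d] for m in members) / len(members) for d in range(3))
        summary[key] = (total_q, centroid)

    e = 0.0
    keys = list(cells)
    for a in range(len(keys)):
        ma = cells[keys[a]]
        # Exact interactions between atoms inside the same cell.
        for i, j in itertools.combinations(ma, 2):
            e += q[i] * q[j] / math.dist(pos[i], pos[j])
        qa, ca = summary[keys[a]]
        for b in range(a + 1, len(keys)):
            mb = cells[keys[b]]
            qb, cb = summary[keys[b]]
            r = math.dist(ca, cb)
            if r >= far_factor * cell:
                e += qa * qb / r  # far field: one aggregated interaction
            else:
                for i in ma:
                    for j in mb:
                        e += q[i] * q[j] / math.dist(pos[i], pos[j])
    return e
```

Setting `far_factor=float("inf")` disables the approximation and reproduces the direct sum, while smaller values skip more atom pairs at the cost of accuracy; a hierarchical octree applies the same idea at multiple scales.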

7 citations

Proceedings Article

A VLSI hardware accelerator for dynamic time warping

V.K. Sundaresan¹, S. Nichani¹, Nagarajan Ranganathan¹, Ravi Sankar¹

University of South Florida¹

30 Aug 1992

TL;DR: The special purpose architecture is used to perform the band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance.

Abstract: This paper describes an area- and time-efficient systolic array architecture for computations in Dynamic Time Warping (DTW). The special-purpose architecture performs band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance. The time complexity of the algorithm is O(nk), where n and k are the numbers of elements in a row of the first and second input matrices, respectively. The number of processors equals the bandwidth w of the output band matrix. The speedup of the parallel algorithm over the sequential algorithm is wz, where z is the number of multiplier stages within a PE. The parallel algorithm can be implemented as a single VLSI chip.
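For context on where the accelerated step fits, the sketch below shows the classic sequential DTW recurrence that consumes the local distances. It is a generic textbook formulation, not the paper's architecture: the local distance defaults to an absolute difference as a stand-in, whereas the paper computes Itakura's log likelihood distance via band matrix multiplication on the systolic array.

```python
def dtw_distance(x, y, local=lambda a, b: abs(a - b)):
    """Classic dynamic time warping by dynamic programming.

    D[i][j] = d(x_i, y_j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]).
    The default local distance |a - b| is a stand-in for the paper's
    Itakura log likelihood distance.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] holds the cheapest warping cost aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # repeated sample is absorbed by warping
```

Every cell of `D` needs one local distance, so computing all n·m local distances dominates the work for expensive metrics; that is the step the described hardware targets.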

7 citations

Particle Swarm Optimization: A Hardware Implementation.

Parviz Palangpour, Ganesh K. Venayagamoorthy, Scott C. Smith

01 Jan 2009

TL;DR: A pipelined architecture for hardware PSO implementation is presented and an execution speedup of several orders of magnitude is observed.

Abstract: Particle Swarm Optimization (PSO) is a popular population-based optimization algorithm. While PSO has been shown to perform well on a large variety of problems, it is typically implemented in software. Population-based optimization algorithms such as PSO are well suited to execution in parallel stages, which allows PSO to be implemented directly in hardware and to achieve much faster execution times than are possible in software. In this paper, a pipelined architecture for hardware PSO implementation is presented. Benchmark functions solved by software and hardware PSO implementations are compared. The hardware PSO design is implemented on a Xilinx Virtex-II Pro Development Kit for evaluation. By implementing PSO directly in hardware, an execution speedup of several orders of magnitude is observed.
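As a reference point for the software side of that comparison, the sketch below is a minimal global-best PSO on the sphere benchmark. It is a generic formulation, not the paper's design: the inertia and acceleration coefficients are commonly used constriction-derived defaults, and the swarm size, iteration count, and bounds are assumptions. The global best is refreshed once per iteration (synchronous PSO), so the per-particle updates within an iteration are independent, which is the property that makes parallel or pipelined hardware evaluation possible.

```python
import random


def sphere(x):
    """Sphere benchmark function; global minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(f, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimal synchronous global-best particle swarm optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        # Each particle's update depends only on its own state and gbest.
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Clamp particles to the search box.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            val = f(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
        # Global best is refreshed once per iteration (synchronous PSO).
        g = min(range(n_particles), key=lambda k: pbest_val[k])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso(sphere, dim=2)
    print(best, value)
```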

7 citations

Book Chapter

Evolving FPGA Based Cellular Automata

Reid B. Porter¹, Neil W. Bergmann¹

Queensland University of Technology¹

24 Nov 1998

TL;DR: Describes the Xilinx XC6216 Field Programmable Gate Array and how it is used to efficiently search a hybrid 2-state, 5-neighbour cellular automata rule space that exhibits computation universality.

Abstract: Cellular Automata (CA) architectures are attractive due to their fine-grain parallelism, simple computational structures and local routing resources. Some researchers have used genetic algorithms to find CA that perform useful computations. Both the inherently parallel cellular automata model and the genetic algorithm are poorly suited to implementation on general-purpose microprocessor-based systems. Field Programmable Gate Arrays (FPGAs) are an alternative that can provide significant speedup. This paper describes the Xilinx XC6216 Field Programmable Gate Array and how it is used to efficiently search a hybrid 2-state, 5-neighbour cellular automata rule space that exhibits computation universality. Its use in an image processing application, binary texture analysis, is also discussed.
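To make "hybrid 2-state, 5-neighbour rule space" concrete, the sketch below steps such a CA in software. Each cell sees its north, west, centre, east and south bits, giving 2^5 = 32 neighbourhood patterns, so a rule is a 32-bit lookup table and there are 2^32 possible rules per cell; "hybrid" is taken here to mean that each cell may carry its own rule. The bit ordering of the neighbourhood index, the wrap-around boundary, and the example rule constants are assumptions for illustration, not the paper's XC6216 mapping.

```python
def ca_step(grid, rules):
    """One synchronous step of a hybrid 2-state, 5-neighbour cellular automaton.

    grid[r][c] is 0 or 1; rules[r][c] is that cell's own 32-bit rule number.
    Boundaries wrap around (torus). The neighbourhood index packs
    (N, W, C, E, S) into 5 bits with N as the most significant bit.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = grid[(r - 1) % rows][c]
            w = grid[r][(c - 1) % cols]
            e = grid[r][(c + 1) % cols]
            s = grid[(r + 1) % rows][c]
            idx = (n << 4) | (w << 3) | (grid[r][c] << 2) | (e << 1) | s
            # The rule's bit at position idx is the cell's next state.
            new[r][c] = (rules[r][c] >> idx) & 1
    return new


# Example rules (hypothetical): keep the centre bit, or OR all five bits.
IDENTITY_RULE = sum(1 << i for i in range(32) if i & 0b00100)
OR_RULE = (1 << 32) - 2  # every pattern except all-zeros maps to 1

if __name__ == "__main__":
    g = [[0] * 5 for _ in range(5)]
    g[2][2] = 1
    for row in ca_step(g, [[OR_RULE] * 5 for _ in range(5)]):
        print(row)
```

In a genetic search, each individual would be an array of such rule numbers; every cell update reads only five neighbouring bits, which is why the model maps naturally onto fine-grained FPGA logic.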

7 citations

Performance metrics

Papers: 26,676
Citations: 455,793

No. of papers in the topic in previous years
Year	Papers
2023	945
2022	2,078
2021	1,318
2020	1,365
2019	1,370
2018	1,406
