Proceedings ArticleDOI

Model-based synthesis and optimization of static multi-rate image processing algorithms

20 Apr 2009, pp. 135-140
TL;DR: This paper presents a new design flow that makes it possible to specify, analyze, and synthesize complex image processing algorithms; a novel buffer requirement analysis exploits tradeoffs between required communication memory and computational logic for multi-rate applications.
Abstract: High computational effort in modern image processing applications like medical imaging or high-resolution video processing often demands massively parallel special-purpose architectures in the form of FPGAs or ASICs. However, their efficient implementation is still a challenge, as the design complexity causes exploding development times and costs. This paper presents a new design flow that makes it possible to specify, analyze, and synthesize complex image processing algorithms. A novel buffer requirement analysis allows exploiting possible tradeoffs between required communication memory and computational logic for multi-rate applications. The derived schedule and buffer results are taken into account for resource-optimized synthesis of the required hardware accelerators. Application to a multi-resolution filter shows that buffer analysis is possible in less than one second and that scheduling alternatives influence the required communication memory by up to 24% and the computational resources by up to 16%.
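The buffer analysis above operates on multi-rate dataflow graphs in which each actor produces and consumes a fixed number of tokens per firing. As a hedged illustration (a toy graph and helper names of my own, not the paper's algorithm): the SDF balance equations yield the repetition vector, and a classical per-edge lower bound on buffer capacity follows from the production and consumption rates.

```python
from fractions import Fraction
from math import gcd

def repetition_vector(edges, actors):
    """Solve the SDF balance equations q[src]*prod == q[dst]*cons
    by propagating rational firing rates over a connected graph."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Scale to the smallest integer vector.
    denom_lcm = 1
    for f in rate.values():
        denom_lcm = denom_lcm * f.denominator // gcd(denom_lcm, f.denominator)
    q = {a: int(rate[a] * denom_lcm) for a in actors}
    g = 0
    for v in q.values():
        g = gcd(g, v)
    return {a: v // g for a, v in q.items()}

def min_buffer_bound(prod, cons):
    """Classical lower bound on the buffer size of a single SDF edge."""
    return prod + cons - gcd(prod, cons)

# Example: A produces 2 tokens per firing toward B (which consumes 3);
# B produces 1 token toward C (which consumes 2).
edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetition_vector(edges, ["A", "B", "C"]))  # {'A': 3, 'B': 2, 'C': 1}
print(min_buffer_bound(2, 3))                     # 4
```

With the repetition vector in hand, one complete graph iteration fires A three times, B twice, and C once, which is the granularity at which schedule and buffer tradeoffs like those in the paper are explored.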
Citations

Proceedings ArticleDOI
24 Oct 2010
TL;DR: A technique and a tool to automatically analyse a scenario-aware dataflow model for its worst-case performance, with a formal semantics of the model given in terms of (max,+) linear system theory and, in particular, (max,+) automata.
Abstract: Synchronous Dataflow (SDF) is a powerful analysis tool for regular, cyclic, parallel task graphs. The behaviour of SDF graphs, however, is static and therefore not always able to accurately capture the behaviour of modern, dynamic dataflow applications, such as embedded multimedia codecs. One approach to tackle this limitation is by means of scenarios. In this paper we introduce a technique and a tool to automatically analyse a scenario-aware dataflow model for its worst-case performance. A system is specified as a collection of SDF graphs representing individual scenarios of behaviour, together with a finite state machine that specifies the possible orders of scenario occurrences. This combination captures dynamic applications more accurately and thus provides tighter results than an existing analysis based on a conservative static dataflow model, which is too pessimistic; looking only at the worst-case individual scenario without considering scenario transitions, on the other hand, can be too optimistic. We introduce a formal semantics of the model in terms of (max,+) linear system theory and, in particular, (max,+) automata. Leveraging existing results and algorithms from this domain, we give throughput analysis and state-space generation algorithms for worst-case performance analysis. The method is implemented in a tool and the effectiveness of the approach is experimentally evaluated.
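In (max,+) semantics a dataflow iteration acts as a linear operator: completion times evolve as x(k+1) = A ⊗ x(k), where ⊗ replaces addition by max and multiplication by +, and the worst-case throughput is the reciprocal of the maximum cycle mean of A. A minimal sketch (toy matrix, not the paper's tool) using Karp's algorithm for the maximum cycle mean:

```python
def max_cycle_mean(A):
    """Karp's algorithm for the maximum cycle mean of a (max,+) matrix.
    A[i][j] is the delay of edge j -> i, or None if the edge is absent."""
    n = len(A)
    NEG = float("-inf")
    # D[k][v]: maximum weight of a walk of exactly k edges ending in v
    # (starting D[0][v] = 0 everywhere, i.e. an implicit super-source).
    D = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        D[0][v] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            for u in range(n):
                w = A[v][u]
                if w is not None and D[k - 1][u] > NEG:
                    D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        worst = min((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] > NEG)
        best = max(best, worst)
    return best

A = [[None, 2],  # edge 1 -> 0 takes 2 time units
     [3,    1]]  # edges 0 -> 1 (3 units) and 1 -> 1 (1 unit)
print(max_cycle_mean(A))  # 2.5 -> worst-case throughput 1/2.5 = 0.4
```

The dominant cycle here is 0 -> 1 -> 0 with total delay 5 over 2 edges, so one iteration completes every 2.5 time units in the worst case.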

104 citations


Cites methods from "Model-based synthesis and optimizat..."

  • ...A number of model-based design approaches [12, 16, 22, 19] for firm real-time parallel streaming applications are based on conservative dataflow actor models such as Synchronous Dataflow [17, 23] or a slight generalisation, CycloStatic Dataflow CSDF [2]....


Dissertation
01 Jan 2009

22 citations

Proceedings ArticleDOI
01 Dec 2010
TL;DR: An almost automatic synthesis of a highly complex, throughput-optimized architecture of an adaptive multiresolution filter, as used in medical image processing, is presented for FPGAs; the FPGA implementation of the multiresolution image processing algorithm is far ahead of a comparable implementation for graphics cards in terms of power efficiency.
Abstract: In this paper we present an almost automatic synthesis of a highly complex, throughput optimized architecture of an adaptive multiresolution filter as used in medical image processing for FPGAs. The filter consists of 16 parallel working modules, where the most computationally intensive module achieves software pipelining of a factor of 85, that is, computations of 85 iterations overlap each other. By applying a state-of-the-art high-level synthesis tool, we show that this approach can be used for real world applications. In addition, we show that our high-level synthesis tool is capable of significantly reducing the well known productivity gap of embedded system design by almost two orders of magnitude. Finally, we can conclude that the FPGA implementation of the multiresolution image processing algorithm is far ahead of a comparable implementation for graphics cards in terms of power efficiency.

16 citations


Cites methods from "Model-based synthesis and optimizat..."

  • ...The dimensioning of intermediate memories in form of shift registers between the different blocks was done by the techniques proposed in [8] and verified by VHDL simulation....


Journal ArticleDOI
TL;DR: The object of the study was to optimize the pattern recognition algorithm in terms of CPU utilization; the durations of the algorithm steps were measured as a function of the allocation of tasks to multiple processor cores.
Abstract: The paper presents the process and results of tests of the pattern recognition algorithm. The algorithm has been developed for a sensor that tracks the activity of the eyes. The study was conducted on a group of ten people, selected so that it was possible to determine the influence of gender, age, eye color, contact lenses, and glasses on the result of the algorithm. A further object of the study was to optimize the pattern recognition algorithm in terms of CPU utilization. For this purpose, the durations of the algorithm steps were measured as a function of the allocation of tasks to multiple processor cores.

14 citations

References
Journal ArticleDOI
01 Jun 1999
TL;DR: This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors that focus primarily on the minimization of code size, and the minimizing of the memory required for the buffers that implement the communication channels in the input dataflow graph.
Abstract: The implementation of software for embedded digital signal processing (DSP) applications is an extremely complex process. The complexity arises from escalating functionality in the applications; intense time-to-market pressures; and stringent cost, power and speed constraints. To help cope with such complexity, DSP system designers have increasingly been employing high-level, graphical design environments in which system specification is based on hierarchical dataflow graphs. Consequently, a significant industry has emerged for the development of data-flow-based DSP design environments. Leading products in this industry include SPW from Cadence, COSSAP from Synopsys, ADS from Hewlett Packard, and DSP Station from Mentor Graphics. This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size, and the minimization of the memory required for the buffers that implement the communication channels in the input dataflow graph. These are critical problems because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and cost penalties for using off-chip memory are often prohibitively high for embedded applications. Furthermore, memory demands of applications are increasing at a significantly higher rate than the rate of increase in on-chip memory capacity offered by improved integrated circuit technology.
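The tension between code size and buffer memory can be seen on even a single SDF edge: a compact single-appearance schedule fires each actor in one block, while an interleaved schedule needs more loop code but keeps fewer tokens alive at once. A small sketch (toy edge and schedules of my own, not taken from the paper) that simulates a static schedule and records the peak edge occupancy:

```python
def peak_buffer(schedule, edges):
    """Simulate a static SDF schedule and record the peak number of
    tokens simultaneously live on each edge (= required capacity)."""
    tokens = {(src, dst): 0 for src, dst, _, _ in edges}
    peak = dict(tokens)
    for actor in schedule:
        # Consume inputs of this firing first, then produce outputs.
        for src, dst, prod, cons in edges:
            if dst == actor:
                tokens[(src, dst)] -= cons
                assert tokens[(src, dst)] >= 0, "schedule not admissible"
        for src, dst, prod, cons in edges:
            if src == actor:
                tokens[(src, dst)] += prod
                peak[(src, dst)] = max(peak[(src, dst)], tokens[(src, dst)])
    return peak

edges = [("A", "B", 2, 3)]                 # A produces 2, B consumes 3
flat = ["A", "A", "A", "B", "B"]           # single-appearance: (3A)(2B)
interleaved = ["A", "A", "B", "A", "B"]    # interleaving lowers the peak
print(peak_buffer(flat, edges))            # {('A', 'B'): 6}
print(peak_buffer(interleaved, edges))     # {('A', 'B'): 4}
```

Both schedules complete one graph iteration, but the interleaved one needs a buffer of 4 tokens instead of 6, mirroring the schedule-dependent memory results quoted in the main paper.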

234 citations


Additional excerpts

  • ...Data flow models of computation allow for efficient determination of system throughput [8] and required buffer sizes [9], [10]....


Journal ArticleDOI
TL;DR: The visual comparison of despeckled in vivo ultrasound images from liver and carotid artery shows that the proposed LPND method could effectively preserve edges and detailed structures while thoroughly suppressing speckle.
Abstract: A new speckle reduction method, i.e., Laplacian pyramid-based nonlinear diffusion (LPND), is proposed for medical ultrasound imaging. With this method, speckle is removed by nonlinear diffusion filtering of bandpass ultrasound images in Laplacian pyramid domain. For nonlinear diffusion in each pyramid layer, a gradient threshold is automatically determined by a variation of median absolute deviation (MAD) estimator. The performance of the proposed LPND method has been compared with that of other speckle reduction methods, including the recently proposed speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD). In simulation and phantom studies, an average gain of 1.55 dB and 1.34 dB in contrast-to-noise ratio was obtained compared to SRAD and NCD, respectively. The visual comparison of despeckled in vivo ultrasound images from liver and carotid artery shows that the proposed LPND method could effectively preserve edges and detailed structures while thoroughly suppressing speckle. These preliminary results indicate that the proposed speckle reduction method could improve image quality and the visibility of small structures and fine details in medical ultrasound imaging.
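The MAD estimator mentioned above gives a robust noise estimate that outliers (true edges) barely perturb: for Gaussian noise, sigma ≈ 1.4826 · median(|x − median(x)|). A minimal sketch of that estimator (the 1.4826 scale constant and its use as a gradient threshold are standard robust-statistics practice, not code from the paper):

```python
def mad_sigma(values):
    """Robust noise estimate via the median absolute deviation (MAD)."""
    def median(xs):
        s = sorted(xs)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    m = median(values)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under Gaussian noise.
    return 1.4826 * median(abs(v - m) for v in values)

# Gradient samples: mostly noise, one large outlier from a real edge.
grads = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0]
print(round(mad_sigma(grads), 4))  # 0.1853
```

A plain standard deviation of the same samples would be dominated by the 3.0 outlier; the MAD-based threshold stays near the noise level, so diffusion smooths speckle while the edge survives.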

204 citations


Additional excerpts

  • ...Application to a multi-resolution filter used in medical imaging [6] shows the benefits of our approach....


Proceedings ArticleDOI
24 Jul 2006
TL;DR: In this paper, exact techniques to chart the Pareto space of throughput and storage trade-offs are presented, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
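Charting the throughput/storage Pareto space amounts to keeping only the non-dominated (storage, throughput) pairs: a point is dominated if some other point uses no more storage yet achieves at least the same throughput. A minimal sketch with made-up design points (the filtering idea, not the paper's exact algorithm):

```python
def pareto_front(points):
    """Keep (storage, throughput) pairs not dominated by any other point:
    lower storage is better, higher throughput is better."""
    front = []
    for s, t in points:
        dominated = any(s2 <= s and t2 >= t and (s2, t2) != (s, t)
                        for s2, t2 in points)
        if not dominated:
            front.append((s, t))
    return sorted(front)

# Hypothetical (buffer size, throughput) results of candidate schedules.
designs = [(4, 0.10), (6, 0.10), (6, 0.25), (8, 0.25), (10, 0.40)]
print(pareto_front(designs))  # [(4, 0.1), (6, 0.25), (10, 0.4)]
```

Given a throughput constraint, the minimal storage is then the smallest first coordinate on the front whose throughput meets the constraint.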

172 citations

01 Jan 2006
TL;DR: This work presents exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
Categories and Subject Descriptors: C.3 [Special-purpose and Application-based Systems]: Signal processing systems. General Terms: Algorithms, Experimentation, Theory.

166 citations


Additional excerpts

  • ...Data flow models of computation allow for efficient determination of system throughput [8] and required buffer sizes [9], [10]....


Journal ArticleDOI
TL;DR: This paper used ESPAM to automatically generate and program several multiprocessor systems that execute three image processing applications, namely Sobel edge detection, Discrete Wavelet Transform, and Motion JPEG encoder, to validate and evaluate the methodology and techniques implemented.
Abstract: For modern embedded systems in the realm of high-throughput multimedia, imaging, and signal processing, the complexity of embedded applications has reached a point where the performance requirements of these applications can no longer be supported by embedded system architectures based on a single processor. Thus, the emerging embedded system-on-chip platforms are increasingly becoming multiprocessor architectures. As a consequence, two major problems emerge, namely how to design and how to program such multiprocessor platforms in a systematic and automated way in order to reduce the design time and to satisfy the performance needs of applications executed on such platforms. As an efficient solution to these two problems, in this paper, we present the methodology and techniques implemented in a tool called Embedded System-level Platform synthesis and Application Mapping (ESPAM) for automated multiprocessor system design, programming, and implementation. ESPAM moves the design specification and programming from the Register Transfer Level and low-level C to a higher system level of abstraction. We explain how, starting from system-level platform, application, and mapping specifications, a multiprocessor platform is synthesized, programmed, and implemented in a systematic and automated way. The class of multiprocessor platforms we consider is introduced as well. To validate and evaluate our methodology, we used ESPAM to automatically generate and program several multiprocessor systems that execute three image processing applications, namely Sobel edge detection, Discrete Wavelet Transform, and Motion JPEG encoder. The performance of the systems that execute these applications is also presented in this paper.
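Of the three ESPAM case studies, Sobel edge detection is the simplest to illustrate: two fixed 3x3 kernels estimate the horizontal and vertical intensity gradients, and their magnitude marks edges. A self-contained sketch (a generic Sobel operator on a tiny image, not ESPAM's generated code):

```python
def sobel(img):
    """Gradient magnitude via the two 3x3 Sobel kernels (valid region only)."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sx = sum(gx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            sy = sum(gy[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y - 1][x - 1] = (sx * sx + sy * sy) ** 0.5
    return out

# A vertical step edge: every valid pixel sits on the intensity jump.
img = [[0, 0, 10, 10]] * 4
print(sobel(img))  # [[40.0, 40.0], [40.0, 40.0]]
```

In a streaming multiprocessor mapping of the kind ESPAM generates, each such kernel becomes a node communicating over FIFO channels, which is what makes the buffer-sizing analyses above relevant.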

143 citations


"Model-based synthesis and optimizat..." refers background in this paper

  • ...In particular for buffer analysis this information can be advantageously exploited [11], [12], [13], [14]....
