Proceedings ArticleDOI

Model-based synthesis and optimization of static multi-rate image processing algorithms

20 Apr 2009, pp. 135-140
TL;DR: This paper presents a new design flow that makes it possible to specify, analyze, and synthesize complex image processing algorithms; a novel buffer requirement analysis exploits tradeoffs between required communication memory and computational logic for multi-rate applications.
Abstract: High computational effort in modern image processing applications like medical imaging or high-resolution video processing often demands massively parallel special-purpose architectures in the form of FPGAs or ASICs. However, their efficient implementation is still a challenge, as the design complexity causes exploding development times and costs. This paper presents a new design flow that makes it possible to specify, analyze, and synthesize complex image processing algorithms. A novel buffer requirement analysis allows exploiting possible tradeoffs between required communication memory and computational logic for multi-rate applications. The derived schedule and buffer results are taken into account for resource-optimized synthesis of the required hardware accelerators. Application to a multi-resolution filter shows that buffer analysis is possible in less than one second and that scheduling alternatives influence the required communication memory by up to 24% and the computational resources by up to 16%.
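The buffer analysis above operates on multi-rate dataflow graphs in which each actor produces and consumes a fixed number of tokens per firing. As a hedged illustration (a toy graph and helper names of my own, not the paper's algorithm): the SDF balance equations yield the repetition vector, and a classical per-edge lower bound on buffer capacity follows from the production and consumption rates.

```python
from fractions import Fraction
from math import gcd

def repetition_vector(edges, actors):
    """Solve the SDF balance equations q[src]*prod == q[dst]*cons
    by propagating rational firing rates over a connected graph."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Scale to the smallest integer vector.
    denom_lcm = 1
    for f in rate.values():
        denom_lcm = denom_lcm * f.denominator // gcd(denom_lcm, f.denominator)
    q = {a: int(rate[a] * denom_lcm) for a in actors}
    g = 0
    for v in q.values():
        g = gcd(g, v)
    return {a: v // g for a, v in q.items()}

def min_buffer_bound(prod, cons):
    """Classical lower bound on the buffer size of a single SDF edge."""
    return prod + cons - gcd(prod, cons)

# Example: A produces 2 tokens per firing toward B (which consumes 3);
# B produces 1 token toward C (which consumes 2).
edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetition_vector(edges, ["A", "B", "C"]))  # {'A': 3, 'B': 2, 'C': 1}
print(min_buffer_bound(2, 3))                     # 4
```

With the repetition vector in hand, one complete graph iteration fires A three times, B twice, and C once, which is the granularity at which schedule and buffer tradeoffs like those in the paper are explored.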
Citations

Proceedings ArticleDOI
24 Oct 2010
TL;DR: A technique and a tool to automatically analyse a scenario-aware dataflow model for its worst-case performance, with a formal semantics of the model given in terms of (max,+) linear system theory and, in particular, (max,+) automata.
Abstract: Synchronous Dataflow (SDF) is a powerful analysis tool for regular, cyclic, parallel task graphs. The behaviour of SDF graphs, however, is static and therefore not always able to accurately capture the behaviour of modern, dynamic dataflow applications, such as embedded multimedia codecs. One approach to tackle this limitation is by means of scenarios. In this paper we introduce a technique and a tool to automatically analyse a scenario-aware dataflow model for its worst-case performance. A system is specified as a collection of SDF graphs representing individual scenarios of behaviour, together with a finite state machine that specifies the possible orders of scenario occurrences. This combination captures dynamic applications more accurately and thus provides tighter results than an existing analysis based on a conservative static dataflow model, which is too pessimistic; looking only at the worst-case individual scenario without considering scenario transitions, on the other hand, can be too optimistic. We introduce a formal semantics of the model in terms of (max,+) linear system theory and, in particular, (max,+) automata. Leveraging existing results and algorithms from this domain, we give throughput analysis and state-space generation algorithms for worst-case performance analysis. The method is implemented in a tool and the effectiveness of the approach is experimentally evaluated.
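In (max,+) semantics a dataflow iteration acts as a linear operator: completion times evolve as x(k+1) = A ⊗ x(k), where ⊗ replaces addition by max and multiplication by +, and the worst-case throughput is the reciprocal of the maximum cycle mean of A. A minimal sketch (toy matrix, not the paper's tool) using Karp's algorithm for the maximum cycle mean:

```python
def max_cycle_mean(A):
    """Karp's algorithm for the maximum cycle mean of a (max,+) matrix.
    A[i][j] is the delay of edge j -> i, or None if the edge is absent."""
    n = len(A)
    NEG = float("-inf")
    # D[k][v]: maximum weight of a walk of exactly k edges ending in v
    # (starting D[0][v] = 0 everywhere, i.e. an implicit super-source).
    D = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        D[0][v] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            for u in range(n):
                w = A[v][u]
                if w is not None and D[k - 1][u] > NEG:
                    D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        worst = min((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] > NEG)
        best = max(best, worst)
    return best

A = [[None, 2],  # edge 1 -> 0 takes 2 time units
     [3,    1]]  # edges 0 -> 1 (3 units) and 1 -> 1 (1 unit)
print(max_cycle_mean(A))  # 2.5 -> worst-case throughput 1/2.5 = 0.4
```

The dominant cycle here is 0 -> 1 -> 0 with total delay 5 over 2 edges, so one iteration completes every 2.5 time units in the worst case.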

104 citations


Cites methods from "Model-based synthesis and optimizat..."

  • ...A number of model-based design approaches [12, 16, 22, 19] for firm real-time parallel streaming applications are based on conservative dataflow actor models such as Synchronous Dataflow [17, 23] or a slight generalisation, CycloStatic Dataflow CSDF [2]....


Dissertation
01 Jan 2009

22 citations

Proceedings ArticleDOI
01 Dec 2010
TL;DR: An almost automatic synthesis of a highly complex, throughput-optimized architecture of an adaptive multiresolution filter, as used in medical image processing, is presented for FPGAs; the FPGA implementation of the multiresolution image processing algorithm is far ahead of a comparable implementation for graphics cards in terms of power efficiency.
Abstract: In this paper we present an almost automatic synthesis of a highly complex, throughput optimized architecture of an adaptive multiresolution filter as used in medical image processing for FPGAs. The filter consists of 16 parallel working modules, where the most computationally intensive module achieves software pipelining of a factor of 85, that is, computations of 85 iterations overlap each other. By applying a state-of-the-art high-level synthesis tool, we show that this approach can be used for real world applications. In addition, we show that our high-level synthesis tool is capable of significantly reducing the well known productivity gap of embedded system design by almost two orders of magnitude. Finally, we can conclude that the FPGA implementation of the multiresolution image processing algorithm is far ahead of a comparable implementation for graphics cards in terms of power efficiency.

16 citations


Cites methods from "Model-based synthesis and optimizat..."

  • ...The dimensioning of intermediate memories in form of shift registers between the different blocks was done by the techniques proposed in [8] and verified by VHDL simulation....


Journal ArticleDOI
TL;DR: The object of the study was to optimize the pattern recognition algorithm in terms of CPU utilization; the durations of the algorithm steps were measured as a function of the allocation of tasks to multiple processor cores.
Abstract: The paper presents the process and results of tests of the pattern recognition algorithm. The algorithm has been developed for a sensor that tracks the activity of the eyes. The study was conducted on a group of ten people, selected so that it was possible to determine the influence of gender, age, eye color, contact lenses, and glasses on the result of the algorithm. A further object of the study was to optimize the pattern recognition algorithm in terms of CPU utilization. For this purpose, the durations of the algorithm steps were measured as a function of the allocation of tasks to multiple processor cores.

14 citations

References
Journal ArticleDOI
01 Jun 1999
TL;DR: This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors that focus primarily on the minimization of code size, and the minimizing of the memory required for the buffers that implement the communication channels in the input dataflow graph.
Abstract: The implementation of software for embedded digital signal processing (DSP) applications is an extremely complex process. The complexity arises from escalating functionality in the applications; intense time-to-market pressures; and stringent cost, power and speed constraints. To help cope with such complexity, DSP system designers have increasingly been employing high-level, graphical design environments in which system specification is based on hierarchical dataflow graphs. Consequently, a significant industry has emerged for the development of data-flow-based DSP design environments. Leading products in this industry include SPW from Cadence, COSSAP from Synopsys, ADS from Hewlett Packard, and DSP Station from Mentor Graphics. This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size, and the minimization of the memory required for the buffers that implement the communication channels in the input dataflow graph. These are critical problems because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and cost penalties for using off-chip memory are often prohibitively high for embedded applications. Furthermore, memory demands of applications are increasing at a significantly higher rate than the rate of increase in on-chip memory capacity offered by improved integrated circuit technology.
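The tension between code size and buffer memory can be seen on even a single SDF edge: a compact single-appearance schedule fires each actor in one block, while an interleaved schedule needs more loop code but keeps fewer tokens alive at once. A small sketch (toy edge and schedules of my own, not taken from the paper) that simulates a static schedule and records the peak edge occupancy:

```python
def peak_buffer(schedule, edges):
    """Simulate a static SDF schedule and record the peak number of
    tokens simultaneously live on each edge (= required capacity)."""
    tokens = {(src, dst): 0 for src, dst, _, _ in edges}
    peak = dict(tokens)
    for actor in schedule:
        # Consume inputs of this firing first, then produce outputs.
        for src, dst, prod, cons in edges:
            if dst == actor:
                tokens[(src, dst)] -= cons
                assert tokens[(src, dst)] >= 0, "schedule not admissible"
        for src, dst, prod, cons in edges:
            if src == actor:
                tokens[(src, dst)] += prod
                peak[(src, dst)] = max(peak[(src, dst)], tokens[(src, dst)])
    return peak

edges = [("A", "B", 2, 3)]                 # A produces 2, B consumes 3
flat = ["A", "A", "A", "B", "B"]           # single-appearance: (3A)(2B)
interleaved = ["A", "A", "B", "A", "B"]    # interleaving lowers the peak
print(peak_buffer(flat, edges))            # {('A', 'B'): 6}
print(peak_buffer(interleaved, edges))     # {('A', 'B'): 4}
```

Both schedules complete one graph iteration, but the interleaved one needs a buffer of 4 tokens instead of 6, mirroring the schedule-dependent memory results quoted in the main paper.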

234 citations


Additional excerpts

  • ...Data flow models of computation allow for efficient determination of system throughput [8] and required buffer sizes [9], [10]....


Journal ArticleDOI
TL;DR: The visual comparison of despeckled in vivo ultrasound images from liver and carotid artery shows that the proposed LPND method could effectively preserve edges and detailed structures while thoroughly suppressing speckle.
Abstract: A new speckle reduction method, i.e., Laplacian pyramid-based nonlinear diffusion (LPND), is proposed for medical ultrasound imaging. With this method, speckle is removed by nonlinear diffusion filtering of bandpass ultrasound images in Laplacian pyramid domain. For nonlinear diffusion in each pyramid layer, a gradient threshold is automatically determined by a variation of median absolute deviation (MAD) estimator. The performance of the proposed LPND method has been compared with that of other speckle reduction methods, including the recently proposed speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD). In simulation and phantom studies, an average gain of 1.55 dB and 1.34 dB in contrast-to-noise ratio was obtained compared to SRAD and NCD, respectively. The visual comparison of despeckled in vivo ultrasound images from liver and carotid artery shows that the proposed LPND method could effectively preserve edges and detailed structures while thoroughly suppressing speckle. These preliminary results indicate that the proposed speckle reduction method could improve image quality and the visibility of small structures and fine details in medical ultrasound imaging.
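The MAD estimator mentioned above gives a robust noise estimate that outliers (true edges) barely perturb: for Gaussian noise, sigma ≈ 1.4826 · median(|x − median(x)|). A minimal sketch of that estimator (the 1.4826 scale constant and its use as a gradient threshold are standard robust-statistics practice, not code from the paper):

```python
def mad_sigma(values):
    """Robust noise estimate via the median absolute deviation (MAD)."""
    def median(xs):
        s = sorted(xs)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    m = median(values)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under Gaussian noise.
    return 1.4826 * median(abs(v - m) for v in values)

# Gradient samples: mostly noise, one large outlier from a real edge.
grads = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0]
print(round(mad_sigma(grads), 4))  # 0.1853
```

A plain standard deviation of the same samples would be dominated by the 3.0 outlier; the MAD-based threshold stays near the noise level, so diffusion smooths speckle while the edge survives.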

204 citations


Additional excerpts

  • ...Application to a multi-resolution filter used in medical imaging [6] shows the benefits of our approach....


Proceedings ArticleDOI
24 Jul 2006
TL;DR: In this paper, exact techniques to chart the Pareto space of throughput and storage trade-offs are presented, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
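Charting the throughput/storage Pareto space amounts to keeping only the non-dominated (storage, throughput) pairs: a point is dominated if some other point uses no more storage yet achieves at least the same throughput. A minimal sketch with made-up design points (the filtering idea, not the paper's exact algorithm):

```python
def pareto_front(points):
    """Keep (storage, throughput) pairs not dominated by any other point:
    lower storage is better, higher throughput is better."""
    front = []
    for s, t in points:
        dominated = any(s2 <= s and t2 >= t and (s2, t2) != (s, t)
                        for s2, t2 in points)
        if not dominated:
            front.append((s, t))
    return sorted(front)

# Hypothetical (buffer size, throughput) results of candidate schedules.
designs = [(4, 0.10), (6, 0.10), (6, 0.25), (8, 0.25), (10, 0.40)]
print(pareto_front(designs))  # [(4, 0.1), (6, 0.25), (10, 0.4)]
```

Given a throughput constraint, the minimal storage is then the smallest first coordinate on the front whose throughput meets the constraint.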

172 citations

01 Jan 2006
TL;DR: This work presents exact techniques to chart the Pareto space of throughput and storage tradeoffs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints, while it minimizes resource usage and energy consumption. The compute intensive kernels of these applications are often specified as Synchronous Dataflow Graphs. Communication between nodes in these graphs requires storage space which influences throughput. We present exact techniques to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal storage space needed to execute a graph under a given throughput constraint. The feasibility of the approach is demonstrated with a number of examples.
Categories and Subject Descriptors: C.3 [Special-purpose and Application-based Systems]: Signal processing systems. General Terms: Algorithms, Experimentation, Theory.

166 citations


Additional excerpts

  • ...Data flow models of computation allow for efficient determination of system throughput [8] and required buffer sizes [9], [10]....


Journal ArticleDOI
TL;DR: This paper used ESPAM to automatically generate and program several multiprocessor systems that execute three image processing applications, namely Sobel edge detection, Discrete Wavelet Transform, and Motion JPEG encoder, to validate and evaluate the methodology and techniques implemented.
Abstract: For modern embedded systems in the realm of high-throughput multimedia, imaging, and signal processing, the complexity of embedded applications has reached a point where the performance requirements of these applications can no longer be supported by embedded system architectures based on a single processor. Thus, the emerging embedded system-on-chip platforms are increasingly becoming multiprocessor architectures. As a consequence, two major problems emerge, namely how to design and how to program such multiprocessor platforms in a systematic and automated way in order to reduce the design time and to satisfy the performance needs of applications executed on such platforms. As an efficient solution to these two problems, in this paper, we present the methodology and techniques implemented in a tool called Embedded System-level Platform synthesis and Application Mapping (ESPAM) for automated multiprocessor system design, programming, and implementation. ESPAM moves the design specification and programming from the Register Transfer Level and low-level C to a higher system level of abstraction. We explain how, starting from system-level platform, application, and mapping specifications, a multiprocessor platform is synthesized, programmed, and implemented in a systematic and automated way. The class of multiprocessor platforms we consider is introduced as well. To validate and evaluate our methodology, we used ESPAM to automatically generate and program several multiprocessor systems that execute three image processing applications, namely Sobel edge detection, Discrete Wavelet Transform, and Motion JPEG encoder. The performance of the systems that execute these applications is also presented in this paper.
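Of the three ESPAM case studies, Sobel edge detection is the simplest to illustrate: two fixed 3x3 kernels estimate the horizontal and vertical intensity gradients, and their magnitude marks edges. A self-contained sketch (a generic Sobel operator on a tiny image, not ESPAM's generated code):

```python
def sobel(img):
    """Gradient magnitude via the two 3x3 Sobel kernels (valid region only)."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sx = sum(gx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            sy = sum(gy[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y - 1][x - 1] = (sx * sx + sy * sy) ** 0.5
    return out

# A vertical step edge: every valid pixel sits on the intensity jump.
img = [[0, 0, 10, 10]] * 4
print(sobel(img))  # [[40.0, 40.0], [40.0, 40.0]]
```

In a streaming multiprocessor mapping of the kind ESPAM generates, each such kernel becomes a node communicating over FIFO channels, which is what makes the buffer-sizing analyses above relevant.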

143 citations


"Model-based synthesis and optimizat..." refers background in this paper

  • ...In particular for buffer analysis this information can be advantageously exploited [11], [12], [13], [14]....
