
Showing papers in "Journal of Real-time Image Processing in 2008"


Journal ArticleDOI
TL;DR: This article presents a method for classifying color points for automotive applications in the Hue Saturation Intensity (HSI) Space based on the distances between their projections onto the SI plane, and some results of applying the metric are shown.
Abstract: This article presents a method for classifying color points for automotive applications in the Hue Saturation Intensity (HSI) space, based on the distances between their projections onto the SI plane. Firstly, the HSI space is analyzed in detail. Secondly, the projection of image points from a typical automotive scene onto the SI plane is shown. The minimal classes relevant for driver assistance applications are derived, and the requirements for classifying points into those classes are obtained. Several weighting functions are proposed and a fast form of a Euclidean metric is investigated in detail. In order to improve the sensitivity of the weighting function, dynamic coefficients are introduced, and it is shown how to compute them automatically to obtain optimal classification results. Finally, some results of applying the metric to sample images are shown and conclusions are drawn.
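
A minimal illustration of the kind of classification the metric performs: pixels, projected onto the saturation–intensity (SI) plane, are assigned to the nearest class prototype under a weighted squared Euclidean distance. The prototypes, class names and weights below are placeholders standing in for the paper's classes and dynamic coefficients, not its actual values.

```python
import numpy as np

# Hypothetical class prototypes in the SI plane (saturation, intensity);
# placeholders, not the paper's classes or values.
PROTOTYPES = {
    "road":         np.array([0.05, 0.35]),
    "lane_marking": np.array([0.10, 0.85]),
    "vegetation":   np.array([0.55, 0.40]),
}

def classify_si(s, i, w_s=1.0, w_i=1.0):
    """Assign the (s, i) projection of a pixel to the nearest prototype
    under a weighted squared Euclidean metric; w_s and w_i stand in for
    the paper's dynamic coefficients."""
    best, best_d = None, np.inf
    for label, proto in PROTOTYPES.items():
        d = w_s * (s - proto[0]) ** 2 + w_i * (i - proto[1]) ** 2
        if d < best_d:
            best, best_d = label, d
    return best

print(classify_si(0.08, 0.80))   # -> "lane_marking" with these placeholder prototypes
```

In the paper the coefficients are recomputed automatically per image; here they are fixed parameters.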

97 citations


Journal ArticleDOI
TL;DR: A fast algorithm is proposed to accelerate the moments' computation, and a comparison with other conventional methods demonstrates the superiority of the proposed method.
Abstract: Zernike polynomials are continuous orthogonal polynomials defined in polar coordinates over the unit disk. Computing Zernike moments with conventional methods produces two types of errors, namely approximation and geometrical errors. Approximation errors are removed by using exact Zernike moments. Geometrical errors are minimized through a proper mapping of the image. Exact Zernike moments are expressed as a combination of exact radial moments, where the exact values of the radial moments are computed by mathematical integration of the monomial polynomials over the digital image pixels. A fast algorithm is proposed to accelerate the moments' computation. A comparison with other conventional methods is performed, and the obtained results demonstrate the superiority of the proposed method.
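
For reference, the quantity being computed exactly is the standard Zernike moment; in its usual textbook form (not the paper's derivation), for an image f supported on the unit disk,

```latex
Z_{nm} = \frac{n+1}{\pi} \iint_{x^{2}+y^{2}\le 1} f(x,y)\, V_{nm}^{*}(\rho,\theta)\, dx\, dy,
\qquad V_{nm}(\rho,\theta) = R_{nm}(\rho)\, e^{\,jm\theta},
```

where R_{nm} is the real-valued radial polynomial. The approximation error of conventional methods comes from replacing the double integral by a sum over pixel centers; the geometrical error comes from mapping the square image onto the unit disk, which is why exact integration over pixel areas removes the former and a proper mapping minimizes the latter.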

79 citations


Journal ArticleDOI
TL;DR: A GPU library, MinGPU, which contains all of the necessary functions to convert an existing CPU code to GPU, and has created GPU implementations of several well known computer vision algorithms, including the homography transformation between two 3D views.
Abstract: In the field of computer vision, it is becoming increasingly popular to implement algorithms, in sections or in their entirety, on a graphics processing unit (GPU). This is due to the superior speed GPUs offer compared to CPUs. In this paper, we present a GPU library, MinGPU, which contains all of the necessary functions to convert an existing CPU code to GPU. We have created GPU implementations of several well known computer vision algorithms, including the homography transformation between two 3D views. We provide timing charts and show that our MinGPU implementation of homography transformations performs approximately 600 times faster than its C++ CPU implementation.
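
For context, the kernel being benchmarked is a per-point projective mapping; a plain CPU-side NumPy sketch of applying a 3 × 3 homography H to points in homogeneous coordinates is shown below. This is not MinGPU's API, only the operation it accelerates.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography and dehomogenize."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])   # to homogeneous coordinates
    mapped = pts_h @ H.T                                    # x' = H x
    return mapped[:, :2] / mapped[:, 2:3]                   # divide by w'

H = np.array([[1.0, 0.1,   5.0],
              [0.0, 1.2,   3.0],
              [0.0, 0.001, 1.0]])
print(apply_homography(H, np.array([[10.0, 20.0]])))
```

On the GPU the same arithmetic is applied independently to every pixel, which is what makes the mapping so amenable to acceleration.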

55 citations


Journal ArticleDOI
TL;DR: Performed on frontal images from the FERET database, the comparisons show that PPBTF is an effective facial representation that is also fast to compute.
Abstract: A pixel-pattern-based texture feature (PPBTF) is proposed for real-time gender recognition. A gray-scale image is transformed into a pattern map in which edges and lines are used to characterize the texture information. On the basis of the pattern map, a feature vector is formed from the numbers of pixels belonging to each pattern. We use the image basis functions obtained by principal component analysis (PCA) as the templates for pattern matching. The characteristics of the feature are comprehensively analyzed through an application to gender recognition. AdaBoost is used to select the most discriminative feature subset, and a support vector machine (SVM) is adopted for classification. Performed on frontal images from the FERET database, the comparisons with Gabor features show that PPBTF is an effective facial representation that is also fast to compute.
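
A behavioural sketch of the pattern-map idea: each local patch is matched against a bank of templates (in the paper, PCA basis functions; here random placeholders), the index of the best-matching template becomes the pixel's pattern label, and the feature vector is the histogram of those labels.

```python
import numpy as np

def ppbtf_histogram(gray, templates):
    """gray: HxW float image; templates: K x (k*k) flattened templates.
    Returns a K-bin histogram of best-matching template indices (illustrative)."""
    k = int(np.sqrt(templates.shape[1]))
    H, W = gray.shape
    hist = np.zeros(templates.shape[0], dtype=int)
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            patch = gray[y:y+k, x:x+k].ravel()
            patch = patch - patch.mean()        # remove local brightness
            scores = templates @ patch          # correlation with each template
            hist[np.argmax(np.abs(scores))] += 1
    return hist

rng = np.random.default_rng(0)
templates = rng.standard_normal((8, 9))         # 8 placeholder 3x3 "basis" templates
image = rng.random((32, 32))
print(ppbtf_histogram(image, templates))
```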

42 citations


Journal ArticleDOI
TL;DR: This paper proposes a real-time, embedded vision solution for human action recognition, implemented on an FPGA-based ubiquitous device, and develops a fast human action recognition system with simple motion features and a linear support vector machine classifier.
Abstract: In recent years, automatic human action recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time, embedded vision solution for human action recognition, implemented on an FPGA-based ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human action recognition system with simple motion features and a linear support vector machine classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template class of approaches, which include “Motion History Image” based techniques. Secondly, we have developed a reconfigurable, FPGA based video processing architecture. One advantage of this architecture is that the system processing performance can be reconfigured for a particular application, with the addition of new or replicated processing cores. Finally, we have successfully implemented a human action recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system is operating reliably at 12 frames/s, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man–machine communications and intelligent environments.
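
One representative temporal-template technique from the class the paper is compared against is the Motion History Image; a minimal sketch of its update rule is given below (constants are illustrative, and the paper's own motion features are not reproduced here).

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=255, delta=16, thresh=30):
    """Update a Motion History Image: pixels that moved get the value tau,
    older motion decays by delta per frame (all constants are illustrative)."""
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    mhi = np.maximum(mhi.astype(int) - delta, 0)   # decay old motion
    mhi[moving] = tau                              # stamp fresh motion
    return mhi.astype(np.uint8)

# A linear SVM is then trained on simple features (e.g. moments or grid sums)
# extracted from such temporal templates.
```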

38 citations


Journal ArticleDOI
TL;DR: This is the first time that a JPEG-LS implementation offers such high-speed encoding, and the experimental results show that encoding is performed at high speed as expected, making it able to serve real-time applications.
Abstract: A new design approach to create an efficient high-performance JPEG-LS encoder is proposed in this paper. The proposed implementation compresses the image data with the lossless mode of JPEG-LS. When valuable image content must be acquired in real time, lossless compression is essential; it is important to critical applications such as the acquisition of medical images and the transmission of high-definition, high-resolution images from space (satellites). The contribution of the paper is to introduce an efficient pipelined JPEG-LS encoder, which requires significantly lower encoding time than any other available JPEG-LS hardware or software implementation. The experimental results show that encoding is performed at high speed as expected, making the encoder able to serve real-time applications. This is the first time that a JPEG-LS implementation offers such high-speed encoding.
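
The lossless mode being implemented is standard JPEG-LS (LOCO-I); for context, its core prediction step, the median edge detector (MED), is sketched below. This is the standard's predictor, not a description of the paper's pipeline stages.

```python
def med_predict(a, b, c):
    """JPEG-LS median edge detector: a = left, b = above, c = upper-left neighbour."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c    # planar prediction in smooth regions

# The encoder then codes the prediction residual, after context modelling,
# with Golomb-Rice codes; the paper pipelines these stages in hardware.
print(med_predict(100, 110, 105))   # -> 105
```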

34 citations


Journal ArticleDOI
TL;DR: Using a well-defined performance metric, this work finds that computing Zernike moments from the geometric moments of digital filters is nearly 70 times faster than the best previously known method, the symmetry method.
Abstract: Zernike moments are important digital image descriptors used in various applications, from image watermarking to image recognition. Many fast algorithms have been proposed to speed up the computation of Zernike moments. This work provides a computational complexity analysis of methods for the computation of Zernike moments, as well as a thorough study and simplification of the methods for finding Zernike moments from geometric moments. A new formula that relates Zernike moments to the moments of digital filters is introduced that is very efficient and accurate. Comparisons are performed using the Zernike-moments-via-geometric-moments method, the Q-recursive method, the coefficient method, and the symmetry method. Using a well-defined performance metric, this work finds that computing Zernike moments from the geometric moments of digital filters is nearly 70 times faster than the best previously known method, the symmetry method.
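
For reference, the geometric moments that the digital-filter route builds on are the standard quantities (a textbook definition, not the paper's new formula):

```latex
m_{pq} = \sum_{x} \sum_{y} x^{p}\, y^{q}\, f(x,y),
```

and each Zernike moment Z_{nm} can be written as a finite linear combination of the m_{pq} with p + q <= n; the paper's contribution is an efficient and accurate way of obtaining that combination from the moments of digital filters.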

33 citations


Journal ArticleDOI
TL;DR: The design and the implementation of a coarse-grain reconfigurable machine used as an accelerator for a programmable RISC core, to speed up the execution of computationally demanding tasks like multimedia applications is presented.
Abstract: This paper presents the design and the implementation of a coarse-grain reconfigurable machine used as an accelerator for a programmable RISC core, to speed up the execution of computationally demanding tasks such as multimedia applications. We created a VHDL model of the proposed architecture and implemented it on an FPGA board for prototyping purposes; we then mapped some DSP and image processing algorithms onto our architecture as a benchmark. In particular, we provided the proposed architecture with subword computation capabilities, which turn out to be extremely effective, especially when dealing with image processing algorithms, achieving significant benefits in terms of speed and efficiency of resource usage. To create the configuration bitstream (configware) we developed a tool based on a graphical user interface (GUI) which provides a first step towards the automation of the programming flow of our design: the tool is meant to ease the life of the programmer, relieving them of the burden of calculating the configuration bits by hand. Synthesis results indicate that the area occupation and the operating frequency of our design are reasonable, also when compared to other similar designs. In addition, the number of clock cycles taken by our machine to perform a given algorithm is orders of magnitude smaller than that required by a corresponding software implementation on a RISC microprocessor.

33 citations


Journal ArticleDOI
TL;DR: The ability to choose the most efficient search technique with respect to speeding up the process and locating the best matching target block leads to the improvement of the quality of service and the performance of the video encoding.
Abstract: A motion estimation architecture allowing the execution of a variety of block-matching search techniques is presented in this paper. The ability to choose the most efficient search technique with respect to speeding up the process and locating the best matching target block leads to the improvement of the quality of service and the performance of the video encoding. The proposed architecture is pipelined to efficiently support a large set of currently used block-matching algorithms including Diamond Search, 3-step, MVFAST and PMVFAST. The proposed design executes the algorithms by providing a set of instructions common for all the block-matching algorithms and a few instructions accommodating the specific actions of each technique. Moreover, the architecture supports the use of different search techniques at the block level. The results and performance measurements of the architecture have been validated on FPGA supporting maximum throughput of 30 frames/s with frame size 1,024 × 768.
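
To make the block-matching family concrete, a plain-software sketch of SAD-based matching with a logarithmic (3-step-style) search is given below; the block size, step pattern and search behaviour are illustrative, and the architecture's instruction set is not modelled here.

```python
import numpy as np

def sad(ref, cur, rx, ry, cx, cy, b):
    """Sum of absolute differences between the b x b block of the current frame
    at (cx, cy) and the reference frame block at (rx, ry)."""
    return np.abs(ref[ry:ry+b, rx:rx+b].astype(int)
                  - cur[cy:cy+b, cx:cx+b].astype(int)).sum()

def three_step_search(ref, cur, cx, cy, b=16, step=4):
    """Return the motion vector minimizing SAD, halving the step each round."""
    mx, my = 0, 0
    best = sad(ref, cur, cx, cy, cx, cy, b)
    while step >= 1:
        best_dx, best_dy = 0, 0
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                rx, ry = cx + mx + dx, cy + my + dy
                if 0 <= rx <= ref.shape[1] - b and 0 <= ry <= ref.shape[0] - b:
                    s = sad(ref, cur, rx, ry, cx, cy, b)
                    if s < best:
                        best, best_dx, best_dy = s, dx, dy
        mx, my = mx + best_dx, my + best_dy
        step //= 2
    return mx, my, best
```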

27 citations


Journal ArticleDOI
TL;DR: A parallel-matching processor architecture with early jump-out (EJO) control is proposed to carry out high-speed biometric fingerprint database retrieval and results in about a 22% gain in computing efficiency.
Abstract: In this paper, a parallel-matching processor architecture with early jump-out (EJO) control is proposed to carry out high-speed biometric fingerprint database retrieval. The processor performs the fingerprint retrieval by using minutia point matching. An EJO method is applied to the proposed architecture to speed up the large database retrieval. The processor is implemented on a Xilinx Virtex-E, and occupies 6,825 slices and runs at up to 65 MHz. The software/hardware co-simulation benchmark with a database of 10,000 fingerprints verifies that the matching speed can achieve the rate of up to 1.22 million fingerprints per second. EJO results in about a 22% gain in computing efficiency.
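
The early jump-out idea is essentially abort-on-partial-score: a candidate is dropped as soon as its accumulated mismatch can no longer beat the acceptance threshold. A minimal software analogue is shown below; the processor's actual minutia pairing and scoring are not reproduced.

```python
def match_score_with_ejo(query, candidate, distance, reject_threshold):
    """Accumulate a partial mismatch score over corresponding minutiae and
    jump out early once the candidate can no longer beat the threshold."""
    score = 0.0
    for q, c in zip(query, candidate):
        score += distance(q, c)
        if score > reject_threshold:   # early jump-out: stop wasting cycles
            return None                # candidate rejected early
    return score
```

Only the few candidates that survive the full loop receive an exact score, which is the effect behind the reported gain in computing efficiency.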

20 citations


Journal ArticleDOI
TL;DR: The architectural solution proposed offers a low-cost implementation for automatic exposure correction in real-time video systems, achieving the required specifications in terms of flexibility, timing, performance and visual quality.
Abstract: This paper presents an FPGA implementation of a novel image enhancement algorithm, which compensates for the under-/over-exposed image regions, caused by the limited dynamic range of contemporary standard dynamic range image sensors. The algorithm, which is motivated by the attributes of the shunting center-surround cells of the human visual system, is implemented in Altera Stratix II GX: EP2SGX130GF1508C5 FPGA device. The proposed implementation, which is synthesized in an FPGA technology, employs reconfigurable pipeline, structured memory management, and data reuse in spatial operations, to render in real-time the huge amount of input data that the video signal comprises. It also avoids the use of computationally intensive operations, achieving the required specifications in terms of flexibility, timing, performance and visual quality. The proposed implementation allows real-time processing of color images with sizes up to 2.5 Mpixels, at frame rate of 25 fps. As a result, the architectural solution described in this work offers a low-cost implementation for automatic exposure correction in real-time video systems.
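
As a rough software analogue of a center-surround enhancement (the FPGA design's exact kernel, gains and parameters are not shown; the divisive form and constants below are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_enhance(lum, sigma=15.0, eps=1e-3):
    """Illustrative shunting-style enhancement: each pixel is divisively
    normalized by its local (Gaussian) surround, boosting under-exposed regions
    and compressing over-exposed ones. Formula and parameters are placeholders,
    not the paper's kernel."""
    x = lum.astype(np.float64) / 255.0
    surround = gaussian_filter(x, sigma)
    y = x / (x + surround + eps)        # divisive (shunting) normalization
    return (255 * y).astype(np.uint8)
```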

Journal ArticleDOI
TL;DR: A VLSI H.264/AVC encoder architecture performing at real-time is described, which complies with the reference software encoder of the standard, follows the baseline profile level 3.0 and constitutes an IP-core and/or an efficient stand-alone solution.
Abstract: Evolving applications related to video technologies require video encoder and decoder implemented with low cost and achieving real-time performance. In order to meet this demand and targeting especially the applications imposing low VLSI area requirements, the present paper describes a VLSI H.264/AVC encoder architecture performing at real-time. The encoder uses a pipeline architecture and all the modules have been optimized with respect to the VLSI cost. The encoder design complies with the reference software encoder of the standard, follows the baseline profile level 3.0 and it constitutes an IP-core and/or an efficient stand-alone solution. The architecture operates at a maximum frequency of 100 MHz and achieves maximum throughput of 30 frames/s with frame size 1,024 × 768. Results and performance measurements of the entire encoder have been validated on FPGA and VLSI 0.18 μm occupying a total area of 3.9 mm2.

Journal ArticleDOI
TL;DR: This work implements several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and builds a medium-sized object database based on a vocabulary tree, which is suitable for the dedicated hardware setup.
Abstract: In the last few years, object recognition has become one of the most popular tasks in computer vision. In particular, this was driven by the development of new powerful algorithms for local appearance based object recognition. So-called “smart cameras” with enough power for decentralized image processing have become more and more popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool, as the robust recognition of suspicious vehicles, persons or objects is a matter of public safety. This simply makes the deployment of recognition capabilities on embedded platforms necessary. In our work we investigate the task of object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suitable for our dedicated hardware setup. We carefully investigate the parameters of the algorithms with respect to the performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.

Journal ArticleDOI
TL;DR: An efficient architecture for the FRIT suitable for VLSI implementation, based on a parallel, systolic Finite Radon Transform (FRAT) sub-block and a Haar Discrete Wavelet Transform (DWT) sub-block, is presented.
Abstract: In this paper, an efficient architecture for the Finite Ridgelet Transform (FRIT), suitable for VLSI implementation and based on a parallel, systolic Finite Radon Transform (FRAT) sub-block and a Haar Discrete Wavelet Transform (DWT) sub-block, is presented. The FRAT sub-block is a novel parametrisable, scalable and high-performance core with a time complexity of O(p²), where p is the block size. Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) implementations are carried out to analyse the performance of the FRIT core developed.
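
For context, the finite Radon transform over a p × p block (p prime) sums the image along the discrete "lines" of Z_p²; a standard form of its definition (not necessarily the paper's notation) is

```latex
\mathrm{FRAT}_f(k,l) = \frac{1}{\sqrt{p}} \sum_{(i,j)\in L_{k,l}} f(i,j),
\qquad L_{k,l} = \{(i,j) : j = (k\,i + l) \bmod p\},\quad 0 \le k \le p-1,
```

together with one additional set of vertical lines for k = p. The FRIT is then obtained by applying a 1-D DWT (here the Haar wavelet) along l for each direction k.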

Journal ArticleDOI
TL;DR: The authors' techniques result in an efficient framework for trading off performance and configurable hardware resource usage based on the constraints of a given application and present a novel architecture that enables dynamically-reconfigurable image registration.
Abstract: Image registration is a computationally intensive application in the medical imaging domain that places stringent requirements on performance and memory management efficiency. This paper develops techniques for mapping rigid image registration applications onto configurable hardware under real-time performance constraints. Building on the framework of homogeneous parameterized dataflow, which provides an effective formal model of design and analysis of hardware and software for signal processing applications, we develop novel methods for representing and exploring the hardware design space when mapping image registration algorithms onto configurable hardware. Our techniques result in an efficient framework for trading off performance and configurable hardware resource usage based on the constraints of a given application. Based on trends that we have observed when applying these techniques, we also present a novel architecture that enables dynamically-reconfigurable image registration. This proposed architecture has the ability to tune its parallel processing structure adaptively based on relevant characteristics of the input images.

Journal ArticleDOI
TL;DR: A hybrid memory sub-system is presented which is able to capture data re-use effectively in spite of data-dependent memory accesses, and is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications.
Abstract: In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on field programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism on these platforms as well as the measure of flexibility afforded to designers. Typically, video data are stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses resulting from many compression algorithms is dependent on the input data (Jain in Proceedings of the IEEE, pp. 349–389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use, and subsequently reduce the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system which is able to capture data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation which employs only an SPM, performance improvements of up to 1.7× and 1.4× are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movement. In addition, reductions of on-chip memory resources by up to 3.2× are achievable using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use when memory accesses are data dependent.

Journal ArticleDOI
TL;DR: Two architectures, based on the replication sort algorithm (RSA) and the rank based network sorting algorithm (RBNS), are presented for the implementation of a rank order filter (ROF), using the concepts of pipelining and grain-level parallelism.
Abstract: In this paper we present two architectures, based on the replication sort algorithm (RSA) and the rank based network sorting algorithm (RBNS), for the implementation of a rank order filter (ROF). This paper focuses on optimization strategies for sorting in terms of operating speed (throughput) and area (number of comparators). The RSA algorithm achieves maximum throughput by sorting that finds the positions of all the window elements in parallel, using eight-bit comparators, a LUT to store the bit sum, and a decoder. The time cost for filtering the complete image remains constant irrespective of the size of the window, and the algorithm is generalized for all rank orders. The RBNS architecture is based on a sorting-network algorithm, optimized for each desired output rank, with O(N) hardware complexity compared to the O(N²) complexity of the existing architectures based on bubble sort and quick sort reported so far. The proposed architectures use the concepts of pipelining and grain-level parallelism and accomplish the task of sorting and filtering each sample appearing at the input window of the filter in one clock cycle, excluding the initial latency.
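
A behavioural sketch of the replication-sort idea: every window element's rank is found by counting, with one comparator per pair in hardware, how many other elements are smaller; the element whose rank equals the desired order is the filter output (the median when the rank is the middle index). The NumPy code below only mimics that counting step in software.

```python
import numpy as np

def rank_order_filter_window(window, rank):
    """Return the element of `window` with the given rank (0 = minimum),
    by counting comparisons rather than sorting -- the replication-sort idea."""
    w = np.asarray(window)
    idx = np.arange(len(w))
    # rank of each element = (# strictly smaller) + (# equal with lower index, to break ties)
    smaller = (w[:, None] > w[None, :]).sum(axis=1)
    ties = ((w[:, None] == w[None, :]) & (idx[:, None] > idx[None, :])).sum(axis=1)
    ranks = smaller + ties
    return w[np.argmax(ranks == rank)]

window = [7, 3, 9, 3, 5, 1, 8, 2, 6]               # a 3x3 neighbourhood, flattened
print(rank_order_filter_window(window, rank=4))    # median of 9 elements -> 5
```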

Journal ArticleDOI
TL;DR: To meet both flexibility and performance requirements, particularly when implementing high-end real-time image/video processing algorithms, the paper proposes to combine the application specific instruction-set processor (ASIP) paradigm with the reconfigurable hardware one.
Abstract: To meet both flexibility and performance requirements, particularly when implementing high-end real-time image/video processing algorithms, the paper proposes to combine the application specific instruction-set processor (ASIP) paradigm with the reconfigurable hardware one. As case studies, the design of partially reconfigurable ASIP (r-ASIP) architectures is presented for two classes of algorithms with widespread diffusion in image/video processing: motion estimation and retinex filtering. Design optimizations are addressed at both algorithmic and architectural levels. Special processor concepts used to trade-off performance versus flexibility and to enable new features of post-fabrication configurability are shown. Silicon implementation results are compared to known ASIC, DSP or reconfigurable designs; the proposed r-ASIPs stand for their better performance–flexibility figures in the respective algorithmic class.

Journal ArticleDOI
TL;DR: This paper presents a fast, computationally efficient, feature-assisted adaptive early termination approach to reduce the computational complexity while maintaining more or less the same video quality.
Abstract: The multiple reference frames motion estimation approach used in H.264 is computationally intensive. This paper presents a fast, computationally efficient, feature-assisted adaptive early termination approach to reduce the computational complexity while maintaining more or less the same video quality. The introduced feature-assisted approach consists of three parts: (1) reduction of the number of available reference frames using predicted motion activity, extracted texture information, and skip mode from neighboring macroblocks, (2) prediction of the most probable reference frame based on neighboring macroblocks, and (3) an adaptive early termination threshold derived from a theoretical analysis of all-zero block detection. Extensive experiments demonstrate the computational gain of the introduced approach over the standard approach for multiple reference frames motion estimation.
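
A simplified illustration of the early-termination control flow (the paper derives its threshold from an all-zero-block analysis; here `threshold` is simply a parameter, and the frame ordering stands in for the most-probable-reference-frame prediction):

```python
def search_reference_frames(frames_to_try, compute_cost, threshold):
    """Try reference frames in order of predicted probability and stop early
    once a cost falls below the adaptive all-zero-block threshold."""
    best_frame, best_cost = None, float("inf")
    for f in frames_to_try:            # ordered with the predicted most-probable frame first
        cost = compute_cost(f)
        if cost < best_cost:
            best_frame, best_cost = f, cost
        if best_cost < threshold:      # residual would code as (near) all zeros: terminate
            break
    return best_frame, best_cost
```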

Journal ArticleDOI
TL;DR: A hybrid wave-pipelining scheme is proposed to obtain the benefits of both pipelining and WP techniques; it is applicable to ASICs as well as to the computation of other image transforms such as the DCT and DHT.
Abstract: In the literature, techniques such as pipelining and wave-pipelining (WP) have been proposed for increasing the operating frequency of a digital circuit. In general, the use of pipelining results in higher speed at the cost of increased area and clock routing complexity. On the other hand, the use of WP results in less clock routing complexity and less area, but allows the digital circuit to be operated only at moderate speeds. In this paper, a hybrid wave-pipelining scheme is proposed to obtain the benefits of both pipelining and WP techniques. The major contributions of this paper are a proposal for the implementation of the 2D DWT using the lifting scheme with hybrid wave-pipelining, and a proposal for the automation of the choice of clock frequency and clock skew between the input and output registers of the wave-pipelined circuit using built-in self-test (BIST) and system-on-chip (SOC) approaches. In the hybrid scheme, the different lifting blocks are interconnected using pipelining registers and the individual blocks are implemented using WP. For the purpose of evaluating the superiority of the proposed schemes, the system for the computation of a one-level 2D DWT is implemented using the following techniques: pipelining, non-pipelining and hybrid wave-pipelining. The BIST approach is used for the implementation on a Xilinx Spartan-II device. The SOC approach is adopted for implementation on Altera and Xilinx field programmable gate array (FPGA) based SOC kits with Nios II or MicroBlaze soft-core processors. From the implementation results, it is verified that the hybrid WP circuit is faster than the non-pipelined circuit by a factor of 1.25–1.39. The pipelined circuit is in turn faster than the hybrid wave-pipelined circuit by a factor of 1.15–1.38, but this is achieved with an increase in the number of registers by a factor of 1.79–3.15 and an increase in the number of LEs by a factor of 1.11–1.65. The soft-core processor based automation scheme has considerably reduced the effort required for the design and testing of the hybrid wave-pipelined circuit. The techniques proposed in this paper are also applicable to ASICs, and the optimization schemes are also applicable to the computation of other image transforms such as the DCT and DHT.
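
For context, one level of the 2-D DWT via lifting, using the Haar wavelet as the simplest lifting factorization; the predict/update pair below is the kind of lifting block the hybrid scheme wave-pipelines, but the wavelet choice and code are only illustrative.

```python
import numpy as np

def haar_lift(x, axis):
    """One Haar lifting step along `axis`: split into even/odd samples,
    predict (detail = odd - even), then update (approx = even + detail/2)."""
    x = np.moveaxis(x.astype(float), axis, 0)
    even, odd = x[0::2], x[1::2]
    d = odd - even                        # predict step
    s = even + d / 2                      # update step
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)

def dwt2_one_level(img):
    """One level of the 2-D DWT via lifting: rows first, then columns."""
    L, H = haar_lift(img, axis=1)         # horizontal pass
    LL, LH = haar_lift(L, axis=0)         # vertical pass on the lowpass band
    HL, HH = haar_lift(H, axis=0)         # vertical pass on the highpass band
    return LL, LH, HL, HH

LL, LH, HL, HH = dwt2_one_level(np.arange(64.0).reshape(8, 8))
print(LL.shape)   # (4, 4)
```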

Journal ArticleDOI
TL;DR: The article outlines mainly the Ardoise tools aspect: development environment and real-time management of the hardware tasks, based on a dynamic management of tasks according to an application scenario written using C++ language.
Abstract: Technology evolution makes possible the integration of heterogeneous components such as programmable elements (processors), dedicated hardware blocks, hierarchical memories and buses. Furthermore, an optimized reconfigurable logic core embedded within a System-on-Chip combines the performance of dedicated architectures with the flexibility of programmable ones. In order to increase performance, some of the applications are carried out in hardware, using dynamically reconfigurable logic, rather than in software, using programmable elements. This approach offers suitable hardware support for designing malleable systems able to adapt themselves to a specific application. This article presents a synthesis of the Ardoise project. The first objective of the Ardoise project was to design and produce a dynamically reconfigurable platform based on commercial FPGAs. The concept of a dynamically reconfigurable architecture depends partly on the elaboration of new design methodologies as well as on the programming environment. The platform architecture was designed to be suitable for real-time image processing. The article mainly outlines the Ardoise tools: the development environment and the real-time management of the hardware tasks. The proposed methodology is based on dynamic management of tasks according to an application scenario written in C++.

Journal ArticleDOI
TL;DR: It is shown that a virtualized Associative Mesh achieves real-time execution for a number of complex image processing algorithms, including split and merge segmentation, watershed segmentation and motion detection.
Abstract: This paper presents the evolution of the Associative Mesh, a massively parallel SIMD architecture based on reconfigurability and asynchronism. To favor its System on Chip implementation, we introduce a reorganization of the structure based on processors virtualization and evaluate its consequences on hardware cost and algorithmic performances. Using an evaluation environment based on a programming library and a parameterized description of the architecture, we show that a virtualized Associative Mesh achieves real-time execution for a number of complex image processing algorithms, including split and merge segmentation, watershed segmentation and motion detection.

Journal ArticleDOI
TL;DR: A new fast mode decision (FMD) algorithm is proposed for the recent H.264/AVC video coding standard, aiming to reduce its computational load without losing coding efficiency; the results show reductions of around 35–65% in motion estimation time.
Abstract: In this paper a new fast mode decision (FMD) algorithm is proposed for the recent H.264/AVC video coding standard, aiming to reduce its computational load without losing coding efficiency. This algorithm identifies redundancy and selects the minimum sub-set of modes for each macroblock (MB) required to provide high rate-distortion (RD) efficiency. It is based on a fast analysis of the histogram of the difference image between frames, which classifies the areas of each frame as active or non-active by means of an adaptive thresholding technique. More coding effort is devoted to active areas, with the selection of a larger sub-set of modes, as these areas are expected to be the most relevant in terms of RD cost. Results show reduction values of around 35–65% in motion estimation (ME) time, preserving the RD cost for the Baseline Profile, by using P-slices and without needing B-slices. Moreover, the strategy works as an intelligent tool for real-time applications with a constrained number of operations per frame: it wisely uses the given operational resources, distributing them among those MBs that need them.
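
A behavioural sketch of the active/non-active classification stage; the paper's adaptive thresholding rule is not reproduced, so the percentile-of-the-difference-histogram threshold below is only a stand-in.

```python
import numpy as np

def classify_macroblocks(prev, cur, mb=16, percentile=70):
    """Mark each 16x16 macroblock as active/non-active from the difference image.
    The percentile-based threshold stands in for the paper's adaptive rule."""
    diff = np.abs(cur.astype(int) - prev.astype(int))
    thresh = np.percentile(diff, percentile)      # adaptive: derived from the difference histogram
    H, W = diff.shape
    active = np.zeros((H // mb, W // mb), dtype=bool)
    for by in range(H // mb):
        for bx in range(W // mb):
            block = diff[by*mb:(by+1)*mb, bx*mb:(bx+1)*mb]
            active[by, bx] = block.mean() > thresh
    return active   # active MBs get the full mode sub-set, the rest a reduced one
```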

Journal ArticleDOI
TL;DR: The computation performance of the best previous algorithm that models this phenomenon is substantially improved using high performance reconfigurable computing through acceleration of the key computationally intensive steps of the algorithm on a field programmable gate array (FPGA).
Abstract: Modelling the effects of wavefront distortions over a finite aperture is an essential component in the simulation of adaptive optics configurations, the prediction of the performance of laser designators, and atmospheric imaging simulations such as the generation of infrared (IR) scenes in the presence of atmospheric turbulence. In all of these applications many thousands of phase screens need to be generated. The computation time required for a large number of iterations of the algorithms that model this effect is an important issue, and for this reason there have been many previous attempts to improve the computation speed of such algorithms. In this paper, the computation performance of the best previous algorithm that models this phenomenon is substantially improved using high performance reconfigurable computing, through acceleration of the key computationally intensive steps of the algorithm on a field programmable gate array (FPGA). Our best hardware implementation provides a speedup of more than 60 times over the original algorithm.

Journal ArticleDOI
TL;DR: This issue constitutes a combined first and second issue or a double issue of the third volume of Journal of Real Time Image Processing (JRTIP) and contains original research articles that address various real-time issues.
Abstract: This issue constitutes a combined first and second issue, or a double issue, of the third volume of the Journal of Real-Time Image Processing (JRTIP). There are nine papers in this issue, two of which belong to the papers that had been originally submitted to the previous special issue on Field-Programmable Technology [1] and could not meet the publication deadline of that special issue. These papers address architectural aspects of FPGA implementations. The first paper discusses the problem of real-time motion estimation while the second one discusses reconfiguration capabilities for multimedia applications. These two papers conclude the above special issue. Considering the subject of these papers, it is worth pointing out that the next issue of JRTIP will be a special issue on reconfigurable architectures for real-time image processing [2], which is scheduled to appear in the second half of 2008. The other seven papers are original research articles that address various real-time issues. The first paper among this set presents the implementation and analysis of optimized architectures based on a replication and a rank-based network sorting algorithm. The next three papers address various real-time aspects of the H.264/AVC video coding standard. The first paper among these three discusses architectural issues for a VLSI implementation of the H.264/AVC encoder operating in real-time. The next one focuses on reducing the computational load of the H.264/AVC video coding standard by a fast mode decision approach. The third one also addresses the computational complexity of H.264/AVC by utilizing a computationally efficient early-termination approach. The next two papers cover fast computations of Zernike moments. The first one provides a computational complexity analysis for different computation methods. The second one discusses a fast algorithm for the exact computation of Zernike moments. Finally, the last paper involves a pixel-pattern-based texture feature for real-time gender recognition involving image basis functions. At this point, we would like to use this opportunity to mention that the editorial board of JRTIP recently met in San Jose, CA, where they were attending the SPIE conference on real-time image processing. The following two major issues were discussed at length by the board members in their meetings: (1) ways to shorten the review cycle time while keeping the quality of reviews high, and (2) ways to enhance the visibility of the journal in the image and video processing community. A picture of some of the editorial board members present at the Springer-sponsored dinner meeting is shown below (Fig. 1). A number of useful recommendations were made at the meetings, which are planned to be implemented this year. Noting that the SPIE conference on real-time image processing is the only conference dedicated to the subject of real-time image and video processing, the board members thought that it would be beneficial if they meet annually during this conference to discuss ways to improve and enhance JRTIP. Henceforth, it is our pleasure to indicate that the second annual meeting of the JRTIP board is expected to be held in January 2009 after the completion of the third volume.

Journal ArticleDOI
Mohamed Akil
TL;DR: This special issue on Reconfigurable Architecture for Real-Time Image Processing presents articles addressing reconfigurable computing for real-time image processing, programming frameworks with FPGA-style mapping as well as real- time and embedded image processing applications and implementations.
Abstract: The performance requirements of image processing applications have led to an increase in the computing power of implementation platforms, in particular when real-time constraints are to be met. Image processing applications may consist of different standards, or different algorithms used at different stages of the processing chain. The computing paradigm of reconfigurable architectures promises a trade-off between flexibility and performance. Reconfigurable architectures can exploit fine-grain parallelism (more suitable for low-level and medium-level operations in the image processing chain) and coarse-grain parallelism (more suitable for high-level operations in the image processing chain). Many reconfigurable architectures have been constructed specifically for image processing using different processors and dedicated circuits such as ASICs and FPGAs. This special issue on Reconfigurable Architecture for Real-Time Image Processing presents articles addressing reconfigurable computing for real-time image processing, programming frameworks with FPGA-style mapping, as well as real-time and embedded image processing applications and implementations. This special issue includes nine papers that are briefly outlined below. The first paper, by Denoulet and Merigot, presents a massively parallel SIMD architecture based on reconfigurability and asynchronism called the Associative Mesh. The introduced virtualized Associative Mesh achieves real-time execution for split and merge segmentation, watershed segmentation and motion detection. The second paper, by Kessal, Karabernou and Demigny, focuses on the Ardoise project. The goal of this project is to realize a dynamically reconfigurable platform. This paper proposes a methodology that can be easily adapted to common SoC architecture. The third paper, by Sen, Hemaraj, Plishker, Shekhar and Bhattacharya, presents a novel architecture that enables dynamically-reconfigurable image registration. It also addresses some data-flow-motivated parallel architectures for image registration. The fourth paper, by Meng, Freeman, Pears and Bailey, details the implementation of a human action recognition system on a reconfigurable, FPGA-based video processing architecture. The fifth paper, by Jiang, Crookes and Bouridane, describes a parallel-matching processor architecture to perform high-speed biometric fingerprint database retrieval. The processor was implemented on a Xilinx Virtex-E and runs at up to 65 MHz. The sixth paper, by Chandrasekaran, Amira, Minghua and Bermak, covers the presentation of an architecture for the Finite Ridgelet Transform. It presents a parallel architecture as well as FPGA and ASIC implementations. The seventh paper, by Srinam and Eng, presents various implementations of the Kolmogorov phase screen generator. The FPGA hardware implementation provides a speedup of more than 60 times over the original algorithm. The eighth paper, by Saponara, Casula and Fanucci, proposes to combine the application specific instruction-set processor (ASIP) paradigm with the reconfigurable hardware one. The ninth paper, by Seetharaman, Venkataramani and Lakshminarayanan, describes how to get the benefits of both pipelining and wave-pipelining through a hybrid wave-pipelining scheme applied to the computation of the 2D DWT.

Journal ArticleDOI
TL;DR: This paper shows that regional operations can be decomposed into a set of union operations on the corresponding components of all horizontal lines, so that the resource-intensive regional operations can be turned into combinations or truncations of line segments.
Abstract: In this paper a new fast region description method, called the line segment table, is proposed. Each element in the table represents a horizontal line, and includes the coordinates of a line segment, the relationship between the upper and lower lines, and the characteristics of the line. Thus a line segment table can be used to describe the shape of a region precisely. This paper shows that regional operations can be decomposed into a set of union operations on the corresponding components of all horizontal lines, so that the resource-intensive regional operations can be turned into combinations or truncations of the line segments. This paper also presents a set of line processing operations which can be applied in binary image processing, such as erosion, dilation, convolution and correlation. Line segment encoding is particularly applicable to line-based processing, and it is also useful for applications such as seed filling. This paper proposes a fast bucket sorting algorithm which is faster than traditional contour tracing and seed filling algorithms. This algorithm simplifies regional processing and improves processing efficiency, and is particularly advantageous for industrial real-time detection applications and hardware design.
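
A small sketch of the underlying representation: extracting the horizontal line segments (runs) of a binary region, which are the elements a line segment table stores; the table's link and characteristic fields are omitted here.

```python
import numpy as np

def line_segments(binary):
    """Return, per row, the list of (start_col, end_col) runs of foreground pixels."""
    segments = []
    for row in binary:
        runs, start = [], None
        for x, v in enumerate(row):
            if v and start is None:
                start = x
            elif not v and start is not None:
                runs.append((start, x - 1))
                start = None
        if start is not None:
            runs.append((start, len(row) - 1))
        segments.append(runs)
    return segments

img = np.array([[0, 1, 1, 0, 1],
                [1, 1, 0, 0, 0]], dtype=bool)
print(line_segments(img))   # [[(1, 2), (4, 4)], [(0, 1)]]
```

Regional operations such as dilation or erosion then reduce to extending, merging or truncating these intervals row by row, which is what makes the representation attractive for real-time detection and hardware design.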