Showing papers in "IEEE Transactions on Circuits and Systems for Video Technology in 1992"


Journal ArticleDOI
TL;DR: An iterative blocking-effect reduction technique based on the theory of projections onto convex sets imposes a number of constraints on the coded image in such a way as to restore it to its original artifact-free form.
Abstract: The authors propose an iterative blocking-effect reduction technique based on the theory of projections onto convex sets. The idea is to impose a number of constraints on the coded image in such a way as to restore it to its original artifact-free form. One such constraint can be derived by exploiting the fact that the transform-coded image suffering from blocking effects contains high-frequency vertical and horizontal artifacts corresponding to vertical and horizontal discontinuities across boundaries of neighboring blocks. Another constraint has to do with the quantization intervals of the transform coefficients. Specifically, the decision levels associated with the transform-coefficient quantizers can be used as lower and upper bounds on the transform coefficients, which in turn define the boundaries of the convex set for projection. A few examples of the proposed approach are presented. >
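
The quantization-interval constraint lends itself to a compact illustration. The sketch below is not the paper's implementation: it assumes 8×8 blocks, a uniform quantizer step per coefficient, and uses SciPy's DCT in place of the coder's transform details; the projection then amounts to clipping each coefficient to its decision interval.

```python
import numpy as np
from scipy.fftpack import dct, idct

def project_quantization_constraint(block, q_step, levels):
    """Project an 8x8 spatial block onto the set of images whose DCT
    coefficients lie inside the quantizer decision intervals.
    `levels` holds the quantized coefficient indices received by the decoder;
    `q_step` is the (assumed uniform) quantizer step size."""
    coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
    lower = (levels - 0.5) * q_step          # decision levels act as lower bounds
    upper = (levels + 0.5) * q_step          # ... and upper bounds
    coeffs = np.clip(coeffs, lower, upper)   # projection = per-coefficient clipping
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')
```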

544 citations


Journal ArticleDOI
TL;DR: It is found that traffic periodicity can cause different sources with identical statistical characteristics to experience differing cell-loss rates, and a multistate Markov chain model that can be derived from three traffic parameters is sufficiently accurate for use in traffic studies.
Abstract: Source modeling and performance issues are studied using a long (30 min) sequence of real video teleconference data. It is found that traffic periodicity can cause different sources with identical statistical characteristics to experience differing cell-loss rates. For a single-stage multiplexer model, some of this source-periodicity effect can be mitigated by appropriate buffer scheduling, and one effective scheduling policy is presented. For the sequence analyzed, the number of cells per frame follows a gamma (or negative binomial) distribution. The number of cells per frame is a stationary stochastic process. For traffic studies, neither an autoregressive model of order two nor a two-state Markov chain model is adequate, because neither correctly models the occurrence of frames with a large number of cells, which are a primary factor in determining cell-loss rates. The order two autoregressive model, however, fits the data well in a statistical sense. A multistate Markov chain model that can be derived from three traffic parameters is sufficiently accurate for use in traffic studies. >
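
As a small, hedged illustration of the gamma fit mentioned in the abstract, a method-of-moments estimate can be computed from a cells-per-frame trace; the variable names and the data source are hypothetical.

```python
import numpy as np

def fit_gamma_moments(cells_per_frame):
    """Method-of-moments fit of a gamma distribution (shape k, scale theta)
    to the number of ATM cells generated per video frame."""
    x = np.asarray(cells_per_frame, dtype=float)
    mean, var = x.mean(), x.var()
    theta = var / mean   # scale parameter
    k = mean / theta     # shape parameter
    return k, theta

# Hypothetical usage with a recorded trace:
# cells = np.loadtxt("cells_per_frame.txt"); k, theta = fit_gamma_moments(cells)
```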

469 citations


Journal ArticleDOI
TL;DR: A variable-block-size multiresolution motion compensation scheme in which the size of a block is adapted to its level in the wavelet pyramid is proposed, and it is shown that the coding approach has a superior performance in terms of the peak-to-peak signal- to-noise ratio as well as the subjective quality.
Abstract: A variable-block-size multiresolution motion compensation (MRMC) scheme in which the size of a block is adapted to its level in the wavelet pyramid is proposed. This scheme not only considerably reduces the searching and matching time but also provides a meaningful characterization of the intrinsic motion structure. The variable-block-size approach also avoids the drawback of the constant-size MRMC in describing small object motion activities. After wavelet decomposition, each scaled subframe tends to have different statistical properties. An adaptive truncation process is implemented, and a bit allocation scheme similar to that in transform coding is examined. Four variations of the proposed motion-compensated wavelet video compression system are identified, and it is shown that the coding approach has a superior performance in terms of the peak-to-peak signal-to-noise ratio as well as the subjective quality. >

296 citations


Journal ArticleDOI
Amy R. Reibman1, B.G. Haskell1
TL;DR: The performance of video that has been encoded using the derived constraints for the leaky bucket channel is presented and it is shown how these ideas might be implemented in a system that controls both the encoded and transmitted bit rates.
Abstract: Constraints on the encoded bit rate of a video signal that are imposed by a channel and encoder and decoder buffers are considered. Conditions that ensure that the video encoder and decoder buffers do not overflow or underflow when the channel can transmit a variable bit rate are presented. Using these conditions and a commonly proposed network-user contract, the effect of a (BISDN) network policing function on the allowable variability in the encoded video bit rate is examined. It is shown how these ideas might be implemented in a system that controls both the encoded and transmitted bit rates. The performance of video that has been encoded using the derived constraints for the leaky bucket channel is presented. >
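
A minimal sketch of the overflow check implied by such constraints is shown below, assuming a frame-synchronous model in which the channel drains a (possibly time-varying) number of bits per frame, e.g. the allowance granted by a leaky-bucket policer. Function and parameter names are illustrative, not the paper's notation.

```python
def encoder_buffer_ok(bits_per_frame, drain_per_frame, buffer_size):
    """Track encoder buffer occupancy frame by frame and flag overflow.
    `drain_per_frame` is the number of bits the channel may remove in each
    frame interval; all parameters are illustrative."""
    occupancy = 0
    for produced, drained in zip(bits_per_frame, drain_per_frame):
        occupancy += produced                  # encoder deposits coded bits
        occupancy -= min(occupancy, drained)   # channel removes up to its allowance
        if occupancy > buffer_size:
            return False                       # overflow: encoded rate violates the constraint
    return True

# Example: a constant-rate channel of 20 kb per frame and a 60 kb buffer
print(encoder_buffer_ok([30000, 10000, 25000, 15000], [20000] * 4, 60000))
```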

259 citations


Journal ArticleDOI
TL;DR: A VLSI architecture for implementing a full-search block-matching algorithm is presented, based on a systolic array processor and shift register arrays with programmable length, which has the following advantages: it allows serial data input to save pin count while performing parallel processing.
Abstract: Block-matching motion estimation is the most popular method for motion-compensated coding of image sequences. A VLSI architecture for implementing a full-search block-matching algorithm is presented. Based on a systolic array processor and shift register arrays with programmable length, the proposed architecture has the following advantages: it allows serial data input, which saves pin count, while performing parallel processing; it adapts flexibly to changes in the search-area dimensions via simple control; it can operate in real time for videoconference applications; and it is simple and modular in design, and thus suitable for VLSI implementation. >
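
For reference, the computation that such an architecture parallelizes can be stated in a few lines. The sketch below uses the sum of absolute differences (SAD) as the matching criterion; the hardware-specific aspects (serial input, programmable search-area dimensions) are not modeled.

```python
import numpy as np

def full_search(cur_block, ref_frame, top, left, search_range=7):
    """Exhaustive block matching: return the displacement (dy, dx) within
    +/- search_range pixels that minimizes the SAD against the reference frame."""
    n = cur_block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref_frame.shape[0] or x + n > ref_frame.shape[1]:
                continue                         # candidate falls outside the frame
            cand = ref_frame[y:y + n, x:x + n]
            sad = np.abs(cur_block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```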

232 citations


Journal ArticleDOI
TL;DR: The issues involved in the design of special-purpose VLSI processors are discussed, and an 8×8 2-D DCT/IDCT processor chip that can be used for high-rate image and video coding is presented.
Abstract: Computation of the discrete cosine transform (DCT) and its inverse (IDCT) at a high data rate is considered. The issues involved in the design of special-purpose VLSI processors are discussed, and an 8×8 2-D DCT/IDCT processor chip that can be used for high-rate image and video coding is presented. A recently published algorithm for computing the DCT and its inverse is outlined. The processor architecture based on the algorithm is presented. Details of functional units and special circuits are discussed. The 8×8 2-D DCT/IDCT processor chip measures 7.9×9.2 mm². It is designed using the MOSIS 2 µm scalable CMOS technology. It takes 16-b inputs, uses 16-b internal memory for coefficients and data, and generates 16-b outputs. A single input line determines whether the chip computes the DCT or the IDCT. The chip is highly pipelined with a latency of 127 cycles and a maximum delay of 18 ns between any two pipeline stages. >
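
The chip's fast algorithm is not reproduced here; as a behavioral reference against which a fixed-point 16-b design could be checked, a plain row-column 8×8 2-D DCT and IDCT in floating point are sketched below.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix (rows indexed by frequency)."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block):
    """Row-column decomposition: 1-D DCT on rows, then on columns."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

block = np.random.randint(0, 256, (8, 8)).astype(float)
assert np.allclose(idct2(dct2(block)), block)    # round-trip sanity check
```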

87 citations


Journal ArticleDOI
TL;DR: A layout of the major parts and a simulation of the critical path of the pipelined constant-input-rate PLA-based architecture, obtained using a high-level synthesis approach, indicate that a decoding throughput of 200 Mb/s with a single chip is achievable in 2.0 µm CMOS technology.
Abstract: Two classes of architectures, the tree-based and the PLA-based, have been discussed in the literature for the variable length code (VLC) decoder. Pipelined or parallel architectures in these two classes are proposed for high-speed implementation. The pipelined tree-based architectures have the advantages of fully pipelined design, short clock cycle, and partial programmability. They are suitable for concurrent decoding of multiple independent bit streams. The PLA-based architectures have greater flexibility and can take advantage of some high-level optimization techniques. The input/output rate can be fixed or variable to meet the application requirements. As an experiment, the authors have constructed a VLC based on a popular video compression system and compared the architectures. A layout of the major parts and a simulation of the critical path of the pipelined constant-input-rate PLA-based architecture, obtained using a high-level synthesis approach, indicate that a decoding throughput of 200 Mb/s with a single chip is achievable in 2.0 µm CMOS technology. >
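
The tree-based decoder class is, at its core, a bit-serial walk of the code tree; the toy sketch below illustrates that operation with a made-up three-symbol code, not the VLC table of any particular standard.

```python
def decode_vlc(bits, code_tree):
    """Bit-serial variable-length-code decoding by walking a binary code tree.
    Internal nodes are dicts {'0': child, '1': child}; leaves are symbols."""
    symbols, node = [], code_tree
    for b in bits:
        node = node[b]
        if not isinstance(node, dict):   # reached a leaf: emit symbol, restart at root
            symbols.append(node)
            node = code_tree
    return symbols

# Toy code: 'a' -> 0, 'b' -> 10, 'c' -> 11
tree = {'0': 'a', '1': {'0': 'b', '1': 'c'}}
assert decode_vlc("01011", tree) == ['a', 'b', 'c']
```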

83 citations


Journal ArticleDOI
P.A. Ruetz1, P. Tong1, D. Bailey1, P.A. Luthi1, P.H. Ang1 
TL;DR: A seven-chip set which performs the functions associated with video and image compression algorithms, and CCITT H.261 in particular, has been designed, fabricated, and is fully functional.
Abstract: A seven-chip set which performs the functions associated with video and image compression algorithms, and CCITT H.261 in particular, has been designed, fabricated, and is fully functional. The major functions performed by the devices include motion estimation, DCT and IDCT, forward and inverse quantization, Huffman coding and decoding, BCH error correction, and loop filtering. The chips that perform the predictive and transform coding section of the algorithm operate with pixel rates up to 40 MHz. Array-based technologies of 1.5 and 1.0 µm CMOS were used extensively to achieve a 28 man-month design time. Each die is less than 10 mm on a side. >

77 citations


Journal ArticleDOI
TL;DR: A VLSI implementation of a lossless data compression algorithm is reported and its performance on several 8-b test images exceeds other techniques employing differential pulse code modulation followed by arithmetic coding, adaptive Huffman coding, and a Lempel-Ziv-Welch (LZW) algorithm.
Abstract: A VLSI implementation of a lossless data compression algorithm is reported. This is the first implementation of an encoder/decoder chip set that uses the Rice algorithm (see JPL Publication 91-1, 1991); an introduction to the algorithm and a description of the high-performance hardware are provided. The algorithm is adaptive over a wide entropy range. Its performance on several 8-b test images exceeds that of other techniques employing differential pulse code modulation (DPCM) followed by arithmetic coding, adaptive Huffman coding, and a Lempel-Ziv-Welch (LZW) algorithm. A major feature of the algorithm is that it requires no look-up tables or external RAM. Only 71000 transistors are required to implement the encoder and decoder. Each chip was fabricated in a 1.0 µm CMOS process and both are only 5 mm on a side. A comparison is made with other hardware realizations. Under laboratory conditions, the encoder compresses at a rate in excess of 50 Msamples/s and the decoder operates at 25 Msamples/s. The current implementation processes quantized data from 4 to 14 b/sample. >
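
The Rice algorithm of JPL Publication 91-1 comprises several coding options; the sketch below shows only a simplified form of its split-sample (Golomb-Rice) option, with a brute-force choice of the split parameter k per block. Function names and the selection rule are illustrative.

```python
def rice_encode_block(samples, k):
    """Simplified Golomb-Rice coding of non-negative residuals: each sample is
    split into a unary-coded quotient and a k-bit remainder."""
    bits = []
    for s in samples:
        q, r = s >> k, s & ((1 << k) - 1)
        bits.append('1' * q + '0')                             # unary part
        bits.append(format(r, '0{}b'.format(k)) if k else '')  # k-bit remainder
    return ''.join(bits)

def best_k(samples, k_max=14):
    """Pick the k that minimizes the coded length for this block (brute force)."""
    return min(range(k_max + 1), key=lambda k: len(rice_encode_block(samples, k)))

residuals = [3, 0, 7, 1, 2, 0, 5, 1]
k = best_k(residuals)
print(k, rice_encode_block(residuals, k))
```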

73 citations


Journal ArticleDOI
TL;DR: In order to demonstrate the video quality of the newly established standard and the feasibility of a cost-effective VLSI solution, a real-time video codec based on H.261 has been constructed using ASICs (application specific integrated circuits).
Abstract: After many years of intensive deliberation, an international low-bit-rate video coding standard, known as CCITT (International Telegraph and Telephone Consultative Committee) Recommendation H.261, has been completed. H.261 covers a wide range of bit rates, p×64 kb/s, where p = 1, 2, ..., 30. A great deal of real-time signal processing power is required to compress an NTSC or similar video signal to these rates for transport and to reconstruct the original signal for display. In order to demonstrate the video quality of the newly established standard and the feasibility of a cost-effective VLSI solution, a real-time video codec based on H.261 has been constructed using ASICs (application specific integrated circuits). A single-board research prototype consisting of 11 ASICs with an aggregate signal processing power of approximately two billion operations per second is presented. >

65 citations


Journal ArticleDOI
TL;DR: Attempts have been made to modify the asymptotic DRF to estimate the performance of real VQs at low bit rates, and the modification is shown to be in good agreement with experimental results.
Abstract: An image-coding technique, in which the discrete cosine transform (DCT) is combined with classified vector quantization (CVQ), is presented. A DCT-transformed input block is classified according to its perceptual features, partitioned into several smaller vectors, and then vector quantized. An efficient edge-oriented classifier employing the DCT coefficients as features for classification is used to maintain edge integrity in the reconstructed image. Based on a smaller geometric mean vector variance, a partition scheme in which the 2-D DCT coefficients are divided into several smaller-size vectors is also investigated. Because the distortion rate function (DRF) used is essential for the bit allocation algorithm to perform well, attempts have been made to modify the asymptotic DRF to estimate the performance of real VQs at low bit rates, and the modification is shown to be in good agreement with experimental results. Simulation results indicate that good visual quality of the coded image is obtained in the range of 0.4 to 0.7 b/pixel. >
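
The abstract does not spell out the classifier, so the sketch below is only a plausible stand-in: it labels an 8×8 DCT block by comparing the AC energy of the first column (which responds to horizontal edges) with that of the first row (vertical edges). The threshold and the class names are invented for illustration.

```python
import numpy as np

def classify_dct_block(coeffs, activity_threshold=500.0):
    """Toy edge-oriented classifier for an 8x8 block of DCT coefficients."""
    col_energy = np.sum(coeffs[1:, 0] ** 2)   # vertical-frequency ACs: horizontal edges
    row_energy = np.sum(coeffs[0, 1:] ** 2)   # horizontal-frequency ACs: vertical edges
    if col_energy + row_energy < activity_threshold:
        return "shade"                        # low-activity (smooth) block
    return "horizontal_edge" if col_energy > row_energy else "vertical_edge"
```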

Journal ArticleDOI
TL;DR: A fully pipelined architecture to compute the 2-D discrete cosine transform (DCT) from a frame-recursive point of view is proposed; the resulting structures are suitable for VLSI implementation in high-speed HDTV systems, and a VLSI implementation using distributed arithmetic to increase computational efficiency and reduce round-off error is discussed.
Abstract: The authors propose a fully pipelined architecture to compute the 2-D discrete cosine transform (DCT) from a frame-recursive point of view. Based on this approach, two real-time parallel lattice structures for successive frame and block 2-D DCT are developed. These structures are fully pipelined with a throughput rate of N clock cycles for an N×N successive input data frame. Moreover, the resulting 2-D DCT architectures are modular, regular, and locally connected and require only two 1-D DCT blocks that are extended directly from the 1-D DCT structure without transposition. They are therefore suitable for VLSI implementation in high-speed HDTV systems. A parallel 2-D DCT architecture and a scanning pattern for HDTV systems to achieve higher performance are proposed. The VLSI implementation of the 2-D DCT using distributed arithmetic to increase computational efficiency and reduce round-off error is discussed. >

Journal ArticleDOI
TL;DR: It is believed that the proposed methods and architectures make it possible to reconstruct arbitrarily high resolution Huffman-coded images and video in real time with current electronics.
Abstract: For pt. I see ibid., vol. 2, no. 2, p. 187, 1992. For applications in graphic computers, image and video composition, high-definition television (HDTV), and optical fiber networks, Huffman-coded images need to be reconstructed at a high throughput rate. Part I showed several architectural and architecture-specific optimization techniques. However, due to the recursion within the reconstruction algorithm, the achievable throughput rate for a given decoding architecture in a given IC technology is limited. The authors propose various concurrent decoding methods to relax the throughput limit by using parallel or pipelined hardware. These methods are simple, effective, flexible, and applicable to general decoder architectures. Unlimited concurrency can be achieved at the expense of additional latency, the overhead is low, and the complexity increases linearly with the throughput improvement. It is believed that the proposed methods and architectures make it possible to reconstruct arbitrarily high resolution Huffman-coded images and video in real time with current electronics. >

Journal ArticleDOI
TL;DR: The architecture of a single-chip video DSP capable of attaining a maximum performance of 300 MOPS (mega-operations per second) using 0.8 µm CMOS technology is described and some performance evaluations are presented.
Abstract: The architecture of a single-chip video DSP capable of attaining a maximum performance of 300 MOPS (mega-operations per second) using 0.8 µm CMOS technology is described. The DSP is designed for applications such as p×64 kb/s single-board video codecs, which call for DSPs with roughly ten times the performance of conventional DSPs. Highly parallel architectures that allow four pipelined processing units to be integrated into one chip are studied extensively. The authors consider data path configurations, program sequencing control, and microinstructions that effectively support multiple pipeline processing. A prototype DSP is fabricated using 0.8 µm CMOS technology, and some performance evaluations are presented. >

Journal ArticleDOI
TL;DR: The design of an encoder for pruned tree-search vector quantization (VQ) is discussed, which allows near-optimal performance in a mean square error sense while keeping the hardware complexity low.
Abstract: The design of an encoder for pruned tree-search vector quantization (VQ) is discussed. This allows near-optimal performance in a mean square error sense while keeping the hardware complexity low. The encoder is partitioned into a slave processor chip that computes the distances and performs minimizations and an off-chip controller that directs the search. Pointer addressing is exploited in the codebook memory to keep the controller hardware simple. Inputs to the slave processor include the source vectors, the code vectors, and external control signals. The slave processor outputs the index of the code vector that best approximates the input in a mean square error sense. The layout for the slave processor has been generated using a 1.2 µm CMOS library and measures 5.76×6.6 mm². Critical path simulation with SPICE indicates a throughput of 89 million multiply-accumulates per second. This implies that real-time processing at MPEG rates can be achieved if the number of levels (N) and the number of children at any node (M) obey the constraint M*N >
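
The search carried out jointly by the controller and the slave processor can be summarized as a tree descent. The sketch below is a software analogue with an assumed node layout (dicts holding 'vector', 'children', and a leaf 'index') rather than the chip's pointer-addressed codebook memory; a pruned tree simply has a varying number of children per node.

```python
import numpy as np

def tree_search_vq(x, tree):
    """Descend the code tree, at each node choosing the child whose test vector
    is closest to the input in mean square error, until a leaf is reached."""
    node = tree
    while node['children']:
        node = min(node['children'],
                   key=lambda c: np.sum((x - c['vector']) ** 2))
    return node['index'], node['vector']     # leaf nodes carry the codeword index
```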

Journal ArticleDOI
TL;DR: It is shown how the proposed performance model can be applied to the design of VLSI-based multiprocessor systems for video coding and the optimization of architectural parameters for SIMD and MIMD architectures is discussed.
Abstract: The authors discuss the performance of multiprocessor architectures to be applied for video coding algorithms. SIMD, SIMD cluster, and MIMD architectures are studied by a unified performance approach under specific constraints of video coding algorithms. The proposed performance model considers communication and computation times as well as the required silicon area. MIMD architectures are compared to SIMD with regard to their performance. The optimization of architectural parameters for SIMD and MIMD architectures is discussed. It is shown how the proposed performance model can be applied to the design of VLSI-based multiprocessor systems for video coding. >

Journal ArticleDOI
TL;DR: An architecture for the efficient and high-speed realization of morphological filters is presented; two-dimensional filters are built from one-dimensional structuring elements using dual erosion and dilation units whose structure is similar to the systolic array architecture used in the implementation of linear digital filters.
Abstract: An architecture for the efficient and high-speed realization of morphological filters is presented. Since morphological filtering can be described in terms of erosion and dilation, two basic building units performing these functions are required for the realization of any morphological filter. Dual architectures for erosion and dilation are proposed and their operations are described. Their structure, similar to the systolic array architecture as used in the implementation of linear digital filters, is highly modular and suitable for efficient very-large-scale integration (VLSI) implementation. A decomposition scheme is proposed to facilitate the implementation of two-dimensional morphological filters based on one-dimensional structuring elements constructed using the dual architectures. The proposed architectures, which also allow the processing of gray-scale images, are appropriate for applications where speed, size, and cost are of critical significance. >
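
The decomposition into one-dimensional structuring elements can be illustrated in a few lines: erosion by a flat square element is obtained as a horizontal 1-D erosion followed by a vertical one (dilation is analogous, with max in place of min). The edge handling and element shape below are illustrative choices.

```python
import numpy as np

def erode_1d(signal, length):
    """Gray-scale erosion of a 1-D signal by a flat structuring element."""
    pad = length // 2
    padded = np.pad(signal, pad, mode='edge')
    return np.array([padded[i:i + length].min() for i in range(len(signal))])

def erode_2d_separable(image, length):
    """2-D erosion by a flat length x length square, built from the 1-D unit:
    erode each row, then erode each column of the intermediate result."""
    rows = np.array([erode_1d(r, length) for r in image])
    return np.array([erode_1d(c, length) for c in rows.T]).T
```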

Journal ArticleDOI
TL;DR: An approach to constrained recursive estimation of the displacement vector field (DVF) in image sequences is presented; it shows improved performance with respect to accuracy, robustness to occlusion, and smoothness of the estimated DVF when applied to typical videoconferencing scenes.
Abstract: An approach to constrained recursive estimation of the displacement vector field (DVF) in image sequences is presented. An estimate of the displacement vector at the working point is obtained by minimizing the linearized displaced frame difference based on a set of observations that belong to a causal neighborhood (mask). An expression for the variance of the linearization error (noise) is obtained. Because the estimation of the DVF is an ill-posed problem, the solution is constrained by considering an autoregressive (AR) model for the DVF. A nonstationary AR model of the DVF is also considered. Additional information about the solution is incorporated into the algorithm using a causal oriented smoothness constraint. A set theoretic regularization approach based on this formulation results in a weighted constrained least-squares estimation of the DVF. The algorithm shows an improved performance with respect to accuracy, robustness to occlusion, and smoothness of the estimated DVF when applied to typical videoconferencing scenes. >
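
The paper's set-theoretic, AR-constrained estimator is considerably richer than what fits here; the sketch below shows only the basic regularized pel-recursive step on the linearized displaced frame difference, with the linearization-error variance playing the role of the regularizer. The scalar form and the names are assumptions.

```python
import numpy as np

def pel_recursive_update(d_pred, dfd, grad, noise_var=1.0):
    """One recursive update of the displacement estimate at a pixel.
    d_pred   : predicted displacement, e.g. from a causal neighborhood / AR model
    dfd      : displaced frame difference evaluated at d_pred
    grad     : spatial gradient of the previous frame at the displaced location
    noise_var: variance of the linearization error, used as regularization."""
    g = np.asarray(grad, dtype=float)
    d = np.asarray(d_pred, dtype=float)
    return d - (dfd / (g @ g + noise_var)) * g
```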

Journal ArticleDOI
TL;DR: The authors demonstrate that it is possible to implement practical high-order conditional entropy codecs using current low-cost very large-scale integration (VLSI) technology.
Abstract: High-order conditional entropy coding has not been practical due to its high complexity and the lack of hardware to extract the conditioning state efficiently. The authors adopt the recently developed incremental-tree-extension technique to design the conditional tree for high-order conditional entropy coding. In order to make the high-speed conditional entropy coder feasible, they introduce several key innovations in the areas of complexity reduction and hardware architecture. For complexity reduction, they develop two techniques: code table reduction and nonlinear quantization of conditioning pixels. For hardware architecture, they propose a pattern-matching technique for fast conditioning state extraction and a multistage pipelined structure to handle the case of a large number of conditioning pixels. Using the complexity reduction techniques and the hardware structures, the authors demonstrate that it is possible to implement practical high-order conditional entropy codecs using current low-cost very large-scale integration (VLSI) technology. >

Journal ArticleDOI
TL;DR: The implementation of video data format converters using a minimum number of registers is considered, and a systematic lifetime analysis of the variables is carried out to determine the latency and the minimum number of registers needed for the converter.
Abstract: The implementation of video data format converters using a minimum number of registers is considered. A systematic lifetime analysis of the variables is carried out to determine the latency and the minimum number of registers needed for the converter. The technique of obtaining the minimum number of registers is illustrated using four classes of data format converters: line-by-line to column-by-column, line-by-line to interleaved skewed one-dimensional block format, line-by-line to interleaved two-dimensional block format, and line-by-line to zigzag format. Closed-form expressions for the minimum number of registers in these converters are obtained in terms of the number of input and output pixels processed per cycle and the number of input and output bits processed per pixel in a cycle. >
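
The lifetime-analysis idea can be demonstrated by brute force for the line-by-line to column-by-column case: enumerate each sample's arrival and departure cycles and take the peak number of simultaneously live samples. The one-pixel-per-cycle model and the latency choice below are simplifications for illustration, not the paper's closed-form derivation.

```python
def min_registers_line_to_column(n):
    """Lifetime analysis for an n x n line-by-line to column-by-column converter
    with one pixel in and one pixel out per cycle. Sample (r, c) arrives at cycle
    r*n + c and departs at cycle latency + c*n + r; the peak number of live
    samples bounds the register count."""
    latency = (n - 1) ** 2   # smallest latency with every departure no earlier than its arrival
    events = []
    for r in range(n):
        for c in range(n):
            events.append((r * n + c, +1))             # arrival
            events.append((latency + c * n + r, -1))   # departure (read out)
    live = peak = 0
    for _, delta in sorted(events):                    # at a tie, departure frees its register first
        live += delta
        peak = max(peak, live)
    return peak

print([min_registers_line_to_column(n) for n in (2, 4, 8)])
```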

Journal ArticleDOI
TL;DR: Presents a parallel architecture for a pixel-recursive motion estimation algorithm: a linear array of processors, each consisting of an initialization part, a data-routing part, and an updating part.
Abstract: A parallel architecture for a pel-recursive motion estimation algorithm is presented. It is a linear array of processors, each consisting of an initialization part, a data-routing part, and an updating part. The initializing part performs a prediction of the motion vector. The routing parts constitute a routing path along which previous-frame data are routed from processors that store to processors that request such data. The router can be data driven or clocked. The latter approach is presented in more detail. The updating part calculates an update to the predicted motion vector. The architecture proposed is derived in a systematic way and is parameterized with respect to certain window sizes. It is thus completely different from the few existing pel-recursive motion estimation architectures. >

Journal ArticleDOI
TL;DR: A codebook design algorithm based on a two-dimensional discrete cosine transform (2-D DCT) is presented for vector quantization (VQ) of images that results in a considerable reduction in computation time and shows better picture quality.
Abstract: A codebook design algorithm based on a two-dimensional discrete cosine transform (2-D DCT) is presented for vector quantization (VQ) of images. The significant features of training images are extracted by using the 2-D DCT. A codebook is generated by partitioning the training set into a binary tree. Each training vector at a nonterminal node of the binary tree is directed to one of the two descendants by comparing a single feature associated with that node to a threshold. Compared with the pairwise nearest neighbor (PNN) algorithm, the algorithm results in a considerable reduction in computation time and shows better picture quality. >
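
A rough software analogue of the tree-growing step is sketched below: it takes already-vectorized training data, uses a 1-D DCT as the feature transform, and splits each node at the median of the highest-variance coefficient. The feature choice and split rule are illustrative, not the paper's exact design.

```python
import numpy as np
from scipy.fftpack import dct

def build_codebook_tree(vectors, depth):
    """Grow a binary codebook tree. Each internal node tests a single DCT
    feature against a threshold; each leaf stores the centroid used as its
    code vector."""
    vectors = np.asarray(vectors, dtype=float)
    feats = dct(vectors, axis=1, norm='ortho')

    def grow(idx, level):
        if level == depth or len(idx) < 2:
            return {'leaf': True, 'code': vectors[idx].mean(axis=0)}
        f = feats[idx]
        j = int(np.argmax(f.var(axis=0)))        # single feature tested at this node
        t = float(np.median(f[:, j]))
        left, right = idx[f[:, j] <= t], idx[f[:, j] > t]
        if len(left) == 0 or len(right) == 0:
            return {'leaf': True, 'code': vectors[idx].mean(axis=0)}
        return {'leaf': False, 'feature': j, 'threshold': t,
                'children': (grow(left, level + 1), grow(right, level + 1))}

    return grow(np.arange(len(vectors)), 0)
```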

Journal ArticleDOI
TL;DR: An adaptive finite-state vector quantization (FSVQ) in which the bit rate and the encoding time can be reduced is described and a threshold is used in FSVQ to decide whether to switch to a full searching VQ.
Abstract: A coding algorithm must have the ability to adapt to changing image characteristics for image sequences. An adaptive finite-state vector quantization (FSVQ) in which the bit rate and the encoding time can be reduced is described. In order to improve the image quality and avoid producing a wrong state for an input vector, a threshold is used in FSVQ to decide whether to switch to a full searching VQ. The codebook is conditionally replenished according to a distortion threshold at a later time to reflect the local statistics of the current frame. After the codebook is replenished, one can quickly reconstruct the state codebooks of FSVQ using the state codebook selection algorithm. In the experiments, the improvement over the static SMVQ is up to 2.40 dB at nearly the same bit rate and the encoding time is only one-ninth the time required by the static SMVQ. Moreover, the improvement over the static VQ is up to 2.91 dB, and the encoding time is only three-fifths the time required by the static VQ for the image sequence 'Claire'. >

Journal ArticleDOI
TL;DR: An expandable multiprocessor architecture for high-speed computation that has been realized as a prototype using commercially available digital signal processor (DSP) chips as the basic processing elements is described.
Abstract: The accuracy and speed characteristics of implementations of several line integration models required for Radon (1917) transform (used for computed tomography image reconstruction) and backprojection computations are described and compared. The fixed-point number system used is evaluated by error comparisons to identical floating-point calculations. An expandable multiprocessor architecture for high-speed computation that has been realized as a prototype using commercially available digital signal processor (DSP) chips as the basic processing elements is described. The simulated performances of two popular DSP chips for this application are discussed and compared. Performance characteristics of the complete prototype hardware system are presented. The computational speed of a four-chip system is measured to be more than 190 times better than that of a Sun 3/160 with a math coprocessor. The architecture and prototype organization are not dependent on the DSP chip chosen, and substitution of the most up-to-date DSP chips can yield even better speed performance. >

Journal ArticleDOI
TL;DR: Results indicate that SC networks can be used in moderate-resolution systems (250×250 pixels) operating at video rate, and the amount of spatial smoothing can be adjusted by varying the clocking scheme applied to the network.
Abstract: The use of switched capacitor (SC) networks as an alternative to large resistor networks for performing computations in VLSI circuits for real-time machine vision and image processing systems is investigated. A mapping can be made from any resistor network to an equivalent SC network that has the same node voltage solution in steady state. However, it takes several switching cycles for charge to be distributed in the SC network before steady state is reached. Results indicate that SC networks can be used in moderate-resolution systems (250×250 pixels) operating at video rate. A chip that implemented several switched capacitor networks for spatial smoothing of images in one dimension was fabricated and tested. The amount of spatial smoothing can be adjusted by varying the clocking scheme applied to the network. >

Journal ArticleDOI
TL;DR: A preliminary study shows the effectiveness of high-order entropy coding for 2-D data and the hardware structure used for decoding variable length codes can be applied to determine the conditioning state based on data in the causal region.
Abstract: A preliminary study shows the effectiveness of high-order entropy coding for 2-D data. The incremental conditioning tree extension method is the key element for reducing the complexity of high-order statistical coding. The determination of the conditioning state in the nonfull tree for an underlying sample is, in functionality, similar to the extraction of a codeword from a variable length coded bit string. Therefore, the hardware structure used for decoding variable length codes can be applied to determine the conditioning state based on data in the causal region. >

Journal ArticleDOI
TL;DR: A programmable general-purpose digital filter IC that employs multiple processing units on a single chip that is capable of realizing a rich variety of filter structures operating at the maximum possible instruction execution rate is described.
Abstract: A programmable general-purpose digital filter IC that employs multiple processing units on a single chip is described. The processors operate in parallel and communicate through on-chip dual-access storage register blocks. The processors are arranged as a ring with the locally shared register blocks between each adjacent pair of processors. It is shown that this ring of processors is capable of realizing a rich variety of filter structures operating at the maximum possible instruction execution rate. Fast real-time processing at video rates is obtained by using a fast hardware multiplier in each processor and by employing a design that permits each processor to simultaneously execute multiplication and addition operations in a single instruction cycle. An advantage of the design is the free passing of data between adjacent processors due to each register block's dual port. A ring of five such processors can be implemented in 2 µm CMOS technology on a single 7.9 mm × 9.2 mm chip operating at data rates up to 40 MHz. >

Journal ArticleDOI
TL;DR: Novel three-dimensional (3-D) recursive ladder filter structures for implementing a class of 3-D frequency-planar and 3- D frequency-beam filters are developed, suitable for high-speed real-time processing of video image sequences.
Abstract: By exploring parallelism, novel three-dimensional (3-D) recursive ladder filter structures for implementing a class of 3-D frequency-planar and 3-D frequency-beam filters are developed. These structures use discrete differentiators and are suitable for high-speed real-time processing of video image sequences. >

Journal ArticleDOI
TL;DR: A global optimization procedure known as simulated annealing is used to determine, under a minimax criterion, filter coefficients that are given by the sum or difference of two power-of-two terms.
Abstract: To eliminate multipliers in the hardware implementation, filters whose coefficients are given by the sum or difference of two power-of-two terms are designed. For a minimax criterion, a global optimization procedure known as simulated annealing has been used to determine the coefficients. Significant improvement in the filter performance is gained over methods that simply round the infinite-precision solution. The algorithm also allows one to find the maximum coefficient wordlength beyond which filter performance does not improve. A number of design examples involving typical video specifications, such as deinterlacing filters and luminance/chrominance separation filters, are reported. The method proves to be very effective, yet the designs are within the dimensional limits of the filters used in current industrial applications. >
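
The coefficient space being searched is small enough to enumerate directly. The helper below finds, for a single real coefficient, the closest sum or difference of two power-of-two terms; the simulated-annealing loop that optimizes the whole filter under the minimax criterion is omitted, and the word-length bound is an illustrative parameter.

```python
def nearest_two_power_of_two(x, frac_bits=8):
    """Return the sum or difference of (at most) two signed power-of-two terms
    closest to x, with exponents down to 2**-frac_bits."""
    terms = [s * 2.0 ** (-e) for e in range(frac_bits + 1) for s in (1, -1)]
    terms.append(0.0)                            # allow a single power-of-two term
    best, best_err = 0.0, abs(x)
    for a in terms:
        for b in terms:
            err = abs(x - (a + b))
            if err < best_err:
                best, best_err = a + b, err
    return best

print(nearest_two_power_of_two(0.3))   # 0.3125 = 2**-2 + 2**-4
```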

Journal ArticleDOI
D. Zhao1, D.G. Daut1
TL;DR: The concept and structure of the processor are described and its operation is compared with that of other processors; it appears to be more economical and flexible than a cellular array processor and more functional than a pipelined systolic array.
Abstract: A column array processor (CAP) architecture for real-time (video rate) morphological image processing is proposed. The basic idea behind this structure is that of a serial-to-parallel data input format and circular data stream processing. Compared to other structures, the column array processor appears to be more economical and flexible than a cellular array processor and more functional than a pipelined systolic array. The authors describe the concept and the structure of the processor and compare its operation to that of other processors. The primary motivation behind such an array processor is to implement morphological image analysis and processing algorithms in real time for applications to video signals. Use of the processor is not limited to morphological operations.