
Showing papers in "Signal Processing-image Communication in 2000"


Journal ArticleDOI
TL;DR: Experimental results on a database of about 6,000 images, covering exact matching under various transformations and similarity-based retrieval, show that the proposed shape descriptor is very effective in representing shapes.
Abstract: In order to retrieve an image from a large image database, the descriptor should be invariant to scale and rotation. It must also have enough discriminating power and immunity to noise for retrieval from a large image database. The Zernike moment descriptor has many desirable properties such as rotation invariance, robustness to noise, expression efficiency, fast computation and multi-level representation for describing the shapes of patterns. In this paper, we show that the Zernike moment can be used as an effective descriptor of the global shape of an image in a large image database. Experimental results on a database of about 6,000 images, covering exact matching under various transformations and similarity-based retrieval, show that the proposed shape descriptor is very effective in representing shapes.
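As a rough illustration of this kind of descriptor (a sketch only, not the authors' implementation; the centroid/radius normalisation and the area normalisation are assumed choices), the following computes rotation-invariant Zernike moment magnitudes of a binary shape with NumPy:

```python
# Sketch: rotation-invariant Zernike moment magnitudes of a binary shape (NumPy only).
import numpy as np
from math import factorial

def radial_poly(rho, n, m):
    """Zernike radial polynomial R_nm(rho), for |m| <= n and n - |m| even."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_magnitudes(shape_img, max_order=8):
    """Return |Z_nm| for all valid (n, m) with n <= max_order."""
    ys, xs = np.nonzero(shape_img)
    cy, cx = ys.mean(), xs.mean()                            # centroid -> translation invariance
    r = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    rho, theta = r / r.max(), np.arctan2(ys - cy, xs - cx)   # map onto unit disk -> scale invariance
    feats = []
    for n in range(max_order + 1):
        for m in range(n + 1):
            if (n - m) % 2:
                continue
            V = radial_poly(rho, n, m) * np.exp(-1j * m * theta)
            Z = (n + 1) / np.pi * V.sum() / len(xs)          # area-normalised (assumed choice)
            feats.append(abs(Z))                             # magnitude -> rotation invariance
    return np.array(feats)
```

Because only the magnitudes |Z_nm| are kept, rotating the shape leaves the feature vector unchanged, which is the property the retrieval application relies on.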

390 citations


Journal ArticleDOI
TL;DR: An efficient way to represent the coarse shape, scale and composition properties of an object is described, which is invariant to resolution, translation and rotation, and may be used for both two- and three-dimensional objects.
Abstract: The description of the spatial characteristics of two- and three-dimensional objects, in the framework of MPEG-7, is considered. The shape of an object is one of its fundamental properties, and this paper describes an efficient way to represent the coarse shape, scale and composition properties of an object. This representation is invariant to resolution, translation and rotation, and may be used for both two-dimensional (2-D) and three-dimensional (3-D) objects. This coarse shape descriptor will be included in the eXperimentation Model (XM) of MPEG-7. Applications of such a description to searching object databases, in particular the CAESAR anthropometric database, are discussed.

338 citations


Journal ArticleDOI
TL;DR: An overview of some of the synthetic visual objects supported by MPEG-4 version-1, namely animated faces and animated arbitrary 2D uniform and Delaunay meshes and integration of the face animation tool with the text-to-speech interface (TTSI), so that face animation can be driven by text input.
Abstract: This paper presents an overview of some of the synthetic visual objects supported by MPEG-4 version-1, namely animated faces and animated arbitrary 2D uniform and Delaunay meshes. We discuss both specification and compression of face animation and 2D-mesh animation in MPEG-4. Face animation makes it possible to animate either a proprietary face model or a face model downloaded to the decoder. We also address integration of the face animation tool with the text-to-speech interface (TTSI), so that face animation can be driven by text input.

224 citations


Journal ArticleDOI
TL;DR: A texture descriptor based on a multiresolution decomposition using Gabor wavelets is proposed that is quite robust to illumination variations and compares favorably with other texture descriptors for similarity retrieval.
Abstract: Image texture is useful in image browsing, search and retrieval. A texture descriptor based on a multiresolution decomposition using Gabor wavelets is proposed. The descriptor consists of two parts: a perceptual browsing component (PBC) and a similarity retrieval component (SRC). The extraction methods of both PBC and SRC are based on a multiresolution decomposition using Gabor wavelets. PBC provides a quantitative characterization of the texture’s structuredness and directionality for browsing application, and the SRC characterizes the distribution of texture energy in different subbands, and supports similarity retrieval. This representation is quite robust to illumination variations and compares favorably with other texture descriptors for similarity retrieval. Experimental results are provided. © 2000 Elsevier Science B.V. All rights reserved.
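A minimal sketch of a Gabor-wavelet texture feature in the same spirit (assuming scikit-image and SciPy; the filter-bank frequencies and orientation count are illustrative, not those of the proposed descriptor):

```python
# Sketch: texture features from the mean and standard deviation of Gabor subband energies.
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

def gabor_features(gray, frequencies=(0.1, 0.2, 0.3, 0.4), n_orientations=6):
    gray = gray.astype(float)
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            kernel = gabor_kernel(frequency=f, theta=k * np.pi / n_orientations)
            response = fftconvolve(gray, kernel, mode='same')   # complex subband response
            magnitude = np.abs(response)
            feats.extend([magnitude.mean(), magnitude.std()])   # energy statistics per subband
    return np.array(feats)
```

Similarity retrieval can then rank images by a (possibly normalised) L1 or L2 distance between such feature vectors.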

222 citations


Journal ArticleDOI
TL;DR: A region-based shape descriptor invariant to rotation, scale and translation is presented and experimental results conforming to the MPEG-7 shape descriptor core experiments are presented.
Abstract: A region-based shape descriptor invariant to rotation, scale and translation is presented in this paper. For a given binary shape, positions of pixels belonging to the shape are regarded as observed vectors of a 2-D random vector and two eigenvectors are obtained from the covariance matrix of the vector population. The shape is divided into four sub-regions by two principal axes corresponding to the two eigenvectors at the center of mass of the shape. Each sub-region is subdivided into four sub-regions in the same way. The sub-division process is repeated for a predetermined number of times. A quadtree representation with its nodes corresponding to regions of the shape is derived from the above process. Four parameters invariant to translation, rotation and scale are calculated for the corresponding region of each node while two parameters are extracted for the root node. The shape descriptor is represented as a vector of all the parameters and the similarity distance between two shapes is calculated by summing up the absolute differences of each element of descriptor vectors. Experimental results conforming to the MPEG-7 shape descriptor core experiments are presented.
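A hypothetical sketch of the principal-axis subdivision described above; the per-node features used here (relative area and a normalised spread) are placeholders for the paper's invariant parameters, which are not reproduced:

```python
# Hypothetical sketch: recursive subdivision of a binary shape along its principal axes.
import numpy as np

def _describe(points, total, depth):
    if depth == 0:
        return []
    if len(points) == 0:
        # keep the descriptor length fixed by padding empty sub-trees with zeros
        n_nodes = (4 ** depth - 1) // 3
        return [0.0] * (2 * n_nodes)
    centred = points - points.mean(axis=0)
    cov = np.cov(centred.T) if len(points) > 1 else np.eye(2)
    _, vecs = np.linalg.eigh(cov)            # columns: the two principal axes
    proj = centred @ vecs                    # coordinates in the principal-axis frame
    feats = [len(points) / total,            # relative area (translation/rotation/scale invariant)
             np.sqrt(proj.var(axis=0)).sum() / np.sqrt(total)]  # placeholder spread measure
    for sx in (proj[:, 0] >= 0, proj[:, 0] < 0):
        for sy in (proj[:, 1] >= 0, proj[:, 1] < 0):
            feats += _describe(points[sx & sy], total, depth - 1)
    return feats

def quadtree_shape_descriptor(binary_img, depth=3):
    pts = np.argwhere(binary_img > 0).astype(float)
    return np.array(_describe(pts, len(pts), depth))
```

Two shapes can then be compared by summing the absolute differences of their descriptor vectors, as the abstract describes.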

162 citations


Journal ArticleDOI
TL;DR: The MPEG-4 visual standard was developed to provide users with a new level of interaction with visual content: it provides technologies to view, access and manipulate objects rather than pixels, with great error robustness over a large range of bit-rates.
Abstract: This paper describes the MPEG-4 standard, as defined in ISO/IEC 14496-2. The MPEG-4 visual standard was developed to provide users with a new level of interaction with visual content. It provides technologies to view, access and manipulate objects rather than pixels, with great error robustness over a large range of bit-rates. Application areas range from digital television and streaming video to mobile multimedia and games. The MPEG-4 natural video standard consists of a collection of tools that support these application areas. The standard provides tools for shape coding, motion estimation and compensation, texture coding, error resilience, sprite coding and scalability. Conformance points in the form of object types, profiles and levels provide the basis for interoperability. Shape coding can be performed in binary mode, where the shape of each object is described by a binary mask, or in gray-scale mode, where the shape is described in a form similar to an alpha channel, allowing transparency and reducing aliasing. Motion compensation is block based, with appropriate modifications for object boundaries. The block size can be 16×16 or 8×8, with half-pixel resolution. MPEG-4 also provides a mode for overlapped motion compensation. Texture coding is based on the 8×8 DCT, with appropriate modifications for object boundary blocks. Coefficient prediction is possible to improve coding efficiency. Static textures can be encoded using a wavelet transform. Error resilience is provided by resynchronization markers, data partitioning, header extension codes, and reversible variable length codes. Scalability is provided for both spatial and temporal resolution enhancement. MPEG-4 provides scalability on an object basis, with the restriction that the object shape has to be rectangular. MPEG-4 conformance points are defined at the Simple Profile, the Core Profile, and the Main Profile. Simple Profile and Core Profile address typical scene sizes of QCIF and CIF, with bit-rates of 64, 128 and 384 kbit/s and 2 Mbit/s. Main Profile addresses typical scene sizes of CIF, ITU-R 601 and HD, with bit-rates of 2, 15 and 38.4 Mbit/s.

141 citations


Journal ArticleDOI
TL;DR: This paper provides an introduction to the use and internal mechanisms of these functions of the new MPEG-4 standard: streaming multimedia content, good compression, and user interactivity.
Abstract: The new MPEG-4 standard provides a suite of functionalities under one standard: streaming multimedia content, good compression, and user interactivity. This paper provides an introduction to the use and internal mechanisms of these functions.

77 citations


Journal ArticleDOI
TL;DR: The natural coding part within MPEG-4 audio describes traditional type speech and high-quality audio coding algorithms and their combination to enable new functionalities like scalability across the boundaries of coding algorithms.
Abstract: MPEG-4 audio represents a new kind of audio coding standard. Unlike its predecessors, MPEG-1 and MPEG-2 high-quality audio coding, and unlike the speech coding standards which have been completed by the ITU-T, it describes not a single or small set of highly efficient compression schemes but a complete toolbox to do everything from low bit-rate speech coding to high-quality audio coding or music synthesis. The natural coding part within MPEG-4 audio describes traditional type speech and high-quality audio coding algorithms and their combination to enable new functionalities like scalability (hierarchical coding) across the boundaries of coding algorithms. This paper gives an overview of the basic algorithms and how they can be combined.

62 citations


Journal ArticleDOI
TL;DR: This paper demonstrates that the fractal image coding algorithm is compatible with other image coding methods and proposes a new mapping in the image space called partial fractal mapping, which provides much flexibility for real implementations.
Abstract: The fractal image compression technique models a natural image using a contractive mapping called fractal mapping in the image space. In this paper, we demonstrate that the fractal image coding algorithm is compatible with other image coding methods. In other words, we can encode only part of the image using fractal technique and model the remaining part using other algorithms. According to such an idea, a new mapping in the image space called partial fractal mapping is proposed. Furthermore, a general framework of fractal-based hybrid image coding encoding/decoding systems is presented. The framework provides us with much flexibility for real implementations. Many different hybrid image coding schemes can be derived from it. Finally, a new hybrid image coding scheme is proposed where non-fractal coded regions are used to help the encoding of fractal coded regions. Experiments show that the proposed system performs better than the quadtree-based fractal image coding algorithm and the JPEG image compression standard at high compression ratios larger than 30.
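For readers unfamiliar with the baseline, the sketch below shows a minimal fixed-block fractal coder, i.e. the contractive-mapping search that the paper builds on; it is not the proposed partial fractal mapping or hybrid scheme, and the block sizes and search stride are arbitrary choices:

```python
# Sketch: minimal fixed-block fractal coder (baseline only, not the paper's hybrid scheme).
import numpy as np

def fractal_encode(img, rsize=8, stride=8):
    """For every range block, find the best domain block and affine map s*D + o."""
    img = img.astype(float)
    H, W = img.shape
    # Domain pool: 2x-larger blocks averaged down to the range-block size.
    domains = []
    for y in range(0, H - 2 * rsize + 1, stride):
        for x in range(0, W - 2 * rsize + 1, stride):
            d = img[y:y + 2 * rsize, x:x + 2 * rsize]
            d = d.reshape(rsize, 2, rsize, 2).mean(axis=(1, 3))   # 2x2 averaging
            domains.append(((y, x), d - d.mean(), d.mean()))
    code = []
    for y in range(0, H - rsize + 1, rsize):
        for x in range(0, W - rsize + 1, rsize):
            r = img[y:y + rsize, x:x + rsize]
            rc, rmean = r - r.mean(), r.mean()
            best = None
            for pos, dc, dmean in domains:
                s = np.clip((rc * dc).sum() / ((dc * dc).sum() + 1e-12), -1.0, 1.0)
                err = ((rc - s * dc) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, pos, s, rmean - s * dmean)       # offset o, so R ~= s*D + o
            code.append(((y, x),) + best[1:])
    return code
```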

54 citations


Journal ArticleDOI
TL;DR: A new algorithm for image compression, named predictive vector quantization (PVQ), is developed based on competitive neural networks and optimal linear predictors, and the performance of the algorithm is discussed.
Abstract: In this paper a new algorithm for image compression, named predictive vector quantization (PVQ), is developed based on competitive neural networks and optimal linear predictors. The semi-closed-loop PVQ methodology is studied. The experimental results are presented and the performance of the algorithm is discussed.
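A rough sketch of the predictive-VQ idea (plain k-means stands in for the competitive neural network, and the fixed (left+up)/2 predictor is an assumption, not the paper's optimal linear predictor):

```python
# Sketch: predictive VQ with a fixed (left+up)/2 predictor and a k-means residual codebook.
import numpy as np

def train_codebook(residual_vectors, k=64, iters=20, seed=0):
    """Plain k-means as a stand-in for competitive-learning codebook design."""
    v = np.asarray(residual_vectors, float)
    rng = np.random.default_rng(seed)
    codebook = v[rng.choice(len(v), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((v[:, None, :] - codebook[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = v[labels == j].mean(0)
    return codebook

def pvq_encode(img, codebook, bs=2):
    """Closed-loop encoding: predict each block from already-reconstructed neighbours."""
    img = img.astype(float)
    H, W = img.shape
    indices, recon = [], np.zeros_like(img)
    for y in range(0, H - bs + 1, bs):
        for x in range(0, W - bs + 1, bs):
            left = recon[y:y + bs, x - bs:x].mean() if x >= bs else 0.0
            up = recon[y - bs:y, x:x + bs].mean() if y >= bs else 0.0
            pred = 0.5 * (left + up)
            res = (img[y:y + bs, x:x + bs] - pred).ravel()
            idx = int(((codebook - res) ** 2).sum(1).argmin())
            indices.append(idx)
            recon[y:y + bs, x:x + bs] = pred + codebook[idx].reshape(bs, bs)
    return indices, recon
```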

49 citations


Journal ArticleDOI
TL;DR: The recently introduced incomplete 3D technique first extracts the texture of the visible surface of a video object acquired with multiple cameras and then performs disparity-compensated projection from that surface onto a view plane.
Abstract: Multi-viewpoint synthesis of video data is a key technology for the integration of video and 3D graphics, as necessary for telepresence and augmented-reality applications. This paper describes a number of important techniques which can be employed to accomplish that goal. The techniques presented are based on the analysis of 2D images acquired by two or more cameras. To determine depth information of single objects present in the scene, it is necessary to perform segmentation and disparity estimation. It is shown how these analysis tools can benefit from each other. For viewpoint synthesis, techniques with different levels of tradeoff between complexity and degrees of freedom are presented. The first approach is disparity-controlled view interpolation, which is capable of generating intermediate views along the interocular axis between two adjacent cameras. The second is the recently introduced incomplete 3D technique, which in a first step extracts the texture of the visible surface of a video object acquired with multiple cameras, and then performs disparity-compensated projection from the surface onto a view plane. In the third and most complex approach, a 3D model of the object is generated, which can be represented by a 3D wire grid. For synthesis, this model can be rotated to arbitrary orientations, and original texture is mapped onto the surface to obtain an arbitrary view of the processed object. The result of this rendering procedure is a virtual image with very natural appearance.

Journal ArticleDOI
Limin Wang
TL;DR: A new rate control approach which addresses the problems associated with degradation in picture quality at scene cuts and nonuniform picture quality due to buffer-dependent variations of the quantization parameter is presented.
Abstract: ISO/IEC MPEG-2 Test Model 5 (TM5) describes a rate control method which consists of three steps: bit allocation, rate control and modulation (ISO/MPEG II, Test Model 5, April 1993). There are, however, two problems associated with TM5 rate control: degradation in picture quality at scene cuts and nonuniform picture quality due to buffer-dependent variations of the quantization parameter. This paper presents a new rate control approach which addresses these issues. To eliminate the impact of scene cuts on the picture quality, the first scheduled P picture in a new scene is coded as an I picture, and the extra I picture is further balanced by coding the next scheduled I picture as a P picture. To achieve a relatively uniform picture quality, the same global quantization parameter is applied to all the macroblocks in a picture. The global quantization parameter is determined by using either an iterative or a binary search algorithm. The simulation results demonstrate that a significant improvement in performance is obtained using the proposed rate control approach.
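The binary search for a single global quantization parameter can be illustrated as follows; the hyperbolic rate model (bits roughly proportional to X/Q) used here is an assumption for the sketch, not the model of the paper:

```python
# Sketch: binary search for one global quantization parameter per picture.
def pick_global_q(bit_budget, mb_complexities, q_min=1, q_max=31, tol=0.02):
    """Smallest quantizer (no coarser than q_max) whose predicted picture bits fit the budget."""
    def predicted_bits(q):
        # Assumed hyperbolic rate model: bits for a macroblock of complexity X ~ X / q.
        return sum(x / q for x in mb_complexities)

    lo, hi = float(q_min), float(q_max)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predicted_bits(mid) > bit_budget:
            lo = mid            # too many bits -> quantize more coarsely
        else:
            hi = mid            # fits -> try a finer quantizer
    return round(hi)

# Example: 396 macroblocks (one CIF picture), equal complexity, 150 kbit picture budget.
q = pick_global_q(150_000, [8_000.0] * 396)   # -> about 21
```

Because the same quantizer is applied to every macroblock, the picture quality stays uniform across the frame, which is the goal stated above.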

Journal ArticleDOI
TL;DR: An overview of Part 1 of ISO/IEC 14496 (MPEG-4 systems) is given, starting from the general architecture up to the description of the individual MPEG-4 Systems tools.
Abstract: This paper gives an overview of Part 1 of ISO/IEC 14496 (MPEG-4 Systems). It first presents the objectives of the MPEG-4 activity. In the MPEG-1 and MPEG-2 standards, “Systems” referred only to overall architecture, multiplexing, and synchronization. In MPEG-4, in addition to these issues, the Systems part encompasses scene description, interactivity, content description, and programmability. The description of the MPEG-4 specification follows, starting from the general architecture up to the description of the individual MPEG-4 Systems tools. Finally, a conclusion describes the future extensions of the specification, as well as a comparison between the solutions provided by MPEG-4 Systems and some alternative technologies.

Journal ArticleDOI
TL;DR: This paper describes description schemes (DSs) for image, video, multimedia, home media, and archive content proposed to the MPEG-7 standard and demonstrates the feasibility and the efficiency of the description schemes by presenting applications that already use the proposed structures or will greatly benefit from their use.
Abstract: In this paper, we describe description schemes (DSs) for image, video, multimedia, home media, and archive content proposed to the MPEG-7 standard. MPEG-7 aims to create a multimedia content description standard in order to facilitate various multimedia searching and filtering applications. During the design process, special care was taken to provide simple but powerful structures that represent generic multimedia data. We use the extensible markup language (XML) to illustrate and exemplify the proposed DSs because of its interoperability and flexibility advantages. The main components of the image, video, and multimedia description schemes are object, feature classification, object hierarchy, entity-relation graph, code downloading, multi-abstraction levels, and modality transcoding. The home media description instantiates the former DSs proposing the 6-W semantic features for objects, and 1-P physical and 6-W semantic object hierarchies. The archive description scheme aims to describe collections of multimedia documents, whereas the former DSs only aim at individual multimedia documents. In the archive description scheme, the content of an archive is represented using multiple hierarchies of clusters, which may be related by entity-relation graphs. The hierarchy is a specific case of entity-relation graph using a containment relation. We explicitly include the hierarchy structure in our DSs because it is a natural way of defining composite objects, a more efficient structure for retrieval, and the representation structure used in MPEG-4. We demonstrate the feasibility and the efficiency of our description schemes by presenting applications that already use the proposed structures or will greatly benefit from their use. These applications are the visual apprentice, the AMOS-search system, a multimedia broadcast news browser, a storytelling system, and an image meta-search engine, MetaSEEk.

Journal ArticleDOI
TL;DR: An algorithm based on Kalman filtering is suggested here to dynamically estimate the background reference image and faces the severe problems of parameter tuning and modeling approximations.
Abstract: A change detection scheme used to detect objects in a complex real-life scene must be able to deal with illumination changes, shadows and structural variations of the environment. Several approaches are based on subtracting a reference image, representing the background, from the current input image. The most used methods estimate the background image by applying some low-pass filter on the input image sequence. Many of them require an accurate calibration phase and rely on a careful selection of critical parameters. An algorithm based on Kalman filtering is suggested here to dynamically estimate the background reference image. The approach extends former works and faces the severe problems of parameter tuning and modeling approximations. An experimental analysis on the behavior of the proposed algorithm in presence of different illumination changes is performed using noisy synthetic data. The results are used to address the choice of values for the filter parameters. The effectiveness and robustness of the algorithm are evaluated on several tests that were carried out on real-life sequences.
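A simplified per-pixel recursive update in the spirit of this approach (the innovation-dependent gain scheduling and the thresholds are illustrative assumptions, not the paper's tuned filter):

```python
# Sketch: per-pixel recursive background update B_t = B_{t-1} + K * (I_t - B_{t-1}).
import numpy as np

def update_background(background, frame, fg_threshold=25.0, gain_bg=0.1, gain_fg=0.01):
    frame = frame.astype(float)
    innovation = frame - background
    foreground = np.abs(innovation) > fg_threshold
    # Small gain where the innovation is large (likely a moving object), so that
    # foreground pixels leak into the background estimate only slowly.
    gain = np.where(foreground, gain_fg, gain_bg)
    return background + gain * innovation, foreground

# Typical use: bg = first_frame.astype(float), then per frame: bg, mask = update_background(bg, frame)
```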

Journal ArticleDOI
TL;DR: The paper describes the elementary stream management (ESM) facilities provided by MPEG-4 Systems and describes the synchronization functionality as well as the system decoder model that defines the timing behavior and buffer resource management of MPEG- 4 receivers.
Abstract: We describe the elementary stream management (ESM) facilities provided by MPEG-4 Systems. Within the extensive set of tools defined by MPEG-4, the ESM tools play a critical role in joining several building blocks together. ESM provides a dual to the scene description language (BIFS) in that it links the streaming resources of a presentation to the scene. We also describe the synchronization functionality as well as the system decoder model that defines the timing behavior and buffer resource management of MPEG-4 receivers. The paper concludes with considerations on data packaging in underlying delivery layer protocols and a description of the MPEG-4 content access procedure.

Journal ArticleDOI
TL;DR: This paper intends to give an overview of the MPEG-4 motivations, objectives, achievements, process and workplan, providing a stimulating starting point for more detailed reading.
Abstract: The MPEG-4 Version 1 standard has recently been finalized. Since MPEG-4 adopted an object-based audiovisual representation model with hyperlinking and interaction capabilities and supports both natural and synthetic content, it is expected that this standard will become the information coding playground for future multimedia applications. This paper intends to give an overview of the MPEG-4 motivations, objectives, achievements, process and workplan, providing a stimulating starting point for more detailed reading.

Journal ArticleDOI
TL;DR: Empirical descriptors for basic visual information features are developed so that invariance against common transformations of visual material is achieved and so that they fit human perception properties.
Abstract: This paper reports about descriptors for basic visual information features, which have been developed in the context of the forthcoming MPEG-7 standard. The four basic features supported are color, texture, shape and motion. A search engine system has been developed which supports combinations of basic feature descriptors in a low-level description scheme for similarity-based retrieval of visual (image and video) data. All basic descriptors have been developed so that invariance against common transformations of visual material, e.g. filtering, contrast/color manipulation, resizing, etc., is achieved, and that they are fitted to human perception properties. Furthermore, descriptors have been designed to allow fast, coarse-to-fine search procedures. Elements described in this contribution have been proposed for the MPEG-7 standard. They are currently either included in the MPEG-7 Experimentation Model (XM) or investigated within core experiments, which are performed during the standard's development.

Journal ArticleDOI
TL;DR: With this algorithm, a compression ratio higher than that of the Lossless JPEG method for 512×512 images can be obtained and the newly proposed algorithm provides a good means for lossless image compression.
Abstract: A novel lossless image-compression scheme is proposed in this paper. A two-stage structure is embedded in this scheme. A linear predictor is used to decorrelate the raw image data in the first stage. Then in the second stage, an effective scheme based on the Huffman coding method is developed to encode the residual image. This newly proposed scheme could reduce the cost for the Huffman coding table while achieving high compression ratio. With this algorithm, a compression ratio higher than that of the Lossless JPEG method for 512×512 images can be obtained. In other words, the newly proposed algorithm provides a good means for lossless image compression.
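The two-stage structure can be sketched as follows (the (left+up)/2 predictor and the omission of the code-table cost are simplifying assumptions, not the paper's design):

```python
# Sketch: (left+up)/2 prediction followed by Huffman coding of the residual symbols.
import heapq
from collections import Counter
import numpy as np

def residuals(img):
    img = img.astype(int)
    pred = np.zeros_like(img)
    pred[1:, 1:] = (img[1:, :-1] + img[:-1, 1:]) // 2   # (left + up) / 2
    pred[0, 1:] = img[0, :-1]                            # first row: left neighbour
    pred[1:, 0] = img[:-1, 0]                            # first column: upper neighbour
    return (img - pred).ravel()

def huffman_code_lengths(symbols):
    """Code length per symbol from a Huffman tree built with a heap."""
    heap = [(freq, i, [sym]) for i, (sym, freq) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    lengths = {syms[0]: 0 for _, _, syms in heap}
    if len(heap) == 1:
        return {next(iter(lengths)): 1}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                              # every merge deepens the subtree by one
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths

def estimated_bits(img):
    """Coded size of the residual image (the cost of the code table itself is ignored here)."""
    res = residuals(img)
    freqs, lengths = Counter(res), huffman_code_lengths(res)
    return sum(freqs[s] * lengths[s] for s in freqs)
```

Decorrelating the image first concentrates the residual histogram around zero, which is what makes the Huffman stage effective.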

Journal ArticleDOI
TL;DR: This paper presents a set of description schemes (DS) dealing with video programs, users and devices designed to support personalization, efficient management of AV information and the expected variability in the capabilities ofAV information access devices.
Abstract: This paper presents a set of description schemes (DS) dealing with video programs, users and devices. Following MPEG-7 terminology, a description of an AV document includes descriptors (termed Ds), which specify the syntax and semantics of a representation entity for a feature of the AV data, and description schemes (termed DSs) which specify the structure and semantics of a set of Ds and DSs. The Program DS is used to describe the physical structure as well as the semantic content of a video program. It focuses on the visual information only. The physical structure is described by the temporal organization of the sequence (segments), the spatial organization of images (regions) as well as the spatio-temporal structure of the video (regions with motion). The semantic description is built around objects and events. Finally, the physical and semantic descriptions are related by a set of links defining where or when instances of specific semantic notions can be found. The User DS is used to describe the personal preferences and usage patterns of a user. It facilitates a smart personalizable device that records and presents to the user audio and video information based upon the user's preferences, prior viewing and listening habits, as well as personal characteristics. Finally, the Device DS keeps a record of the users of the device, available programs, and a description of device capabilities. It allows a device to prepare itself based on the existing users, profiles and available programs. These three types of DSs and the common set of descriptors that they share are designed to support personalization, efficient management of AV information and the expected variability in the capabilities of AV information access devices.

Journal ArticleDOI
TL;DR: The paper concentrates on the motivations and objectives behind MPEG-7, giving some applications, outlining the process and work plan, and explains the relation with the other MPEG standards, notably MPEG-4.
Abstract: The value of information often depends on how easily it can be found, retrieved, accessed, filtered and managed. An incommensurable amount of audiovisual information is becoming available in digital form, in digital archives, on the World Wide Web, in broadcast data streams and in personal and professional databases, and this amount is only growing. In spite of the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult, because of the growing volume. The question of identifying content is not just restricted to database retrieval applications such as digital libraries, but extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services. In 1996, MPEG recognised the need to identify multimedia content and started a work item formally called "Multimedia Content Description Interface", better known as MPEG-7. The new MPEG-7 standard will provide a rich set of standardised tools to describe multimedia content. The people active in defining MPEG-7 represent broadcasters, equipment and chip manufacturers, digital content creators and managers, telecommunication service providers, publishers and intellectual property rights managers, as well as university researchers. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7. This paper presents an overview of the MPEG-7 standardisation project. It concentrates on the motivations and objectives behind MPEG-7, giving some applications and outlining the process and work plan. It also explains the relation with the other MPEG standards, notably MPEG-4.

Journal ArticleDOI
TL;DR: This work considers motion estimation between images of a video sequence in the presence of illumination variations in the scene and proposes a pel-recursive motion estimator adapted to this new motion model in order to estimate both motion and illumination variation fields.
Abstract: We consider motion estimation between images of a video sequence in the presence of illumination variations in the scene. The standard model of motion between consecutive images is extended to a prediction model with multiplicative prediction coefficient. This additional coefficient is interpreted as an illumination variation parameter. A pel-recursive motion estimator is adapted to this new motion model in order to estimate both motion and illumination variation fields. We present experiments on real images containing localised illumination variations and show that the proposed approach allows the prediction error to be largely reduced in comparison with the standard pel-recursive motion estimation algorithm.
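A block-based illustration of the extended prediction model I_t(p) ≈ α·I_{t−1}(p − d): for each candidate displacement, the multiplicative coefficient α has a closed-form least-squares solution. This is only a sketch of the model; the paper uses a pel-recursive estimator rather than block matching:

```python
# Sketch: joint block-matching estimate of displacement (dy, dx) and illumination gain alpha.
import numpy as np

def estimate_block(cur, ref, y, x, bs=16, search=7):
    """Minimise ||B - alpha * R_shifted||^2 over integer displacements and alpha."""
    B = cur[y:y + bs, x:x + bs].astype(float)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bs > ref.shape[0] or xx + bs > ref.shape[1]:
                continue
            R = ref[yy:yy + bs, xx:xx + bs].astype(float)
            alpha = (B * R).sum() / ((R * R).sum() + 1e-12)   # closed-form least-squares gain
            err = ((B - alpha * R) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, dy, dx, alpha)
    return best[1:]        # (dy, dx, alpha)
```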

Journal ArticleDOI
TL;DR: This paper presents a contour-based approach to efficiently code binary shape information in the context of object-based video coding which meets some of the most important requirements identified for the MPEG-4 standard, notably efficient coding and low delay.
Abstract: This paper presents a contour-based approach to efficiently code binary shape information in the context of object-based video coding. This approach meets some of the most important requirements identified for the MPEG-4 standard, notably efficient coding and low delay. The proposed methods support both object-based lossless and quasi-lossless coding modes. For the cases where low delay is a primary requirement, a macroblock-based coding mode is proposed which can take advantage of inter-frame coding to improve the coding efficiency. The approach presented here relies on a grid different from that used for the pixels to represent the shape – the hexagonal grid – which simplifies the task of contour coding. Using this grid, an approach based on a differential chain code (DCC) is proposed for the lossless mode while, for the quasi-lossless case, an approach based on the multiple grid chain code (MGCC) principle is proposed. The MGCC combines both contour simplification and contour prediction to reduce the number of bits needed to code the shapes. Results for alpha plane coding of MPEG-4 video test sequences are presented in order to illustrate the performance of the several modes of operation, and a comparison is made with the shape-coding tool chosen by MPEG-4.

Journal ArticleDOI
TL;DR: The MPEG-4 profiles and levels as discussed by the authors serve two main purposes: (1) ensuring interoperability between MPEG-4 implementations, and (2) allowing conformance to the standard to be tested.
Abstract: Profiles and levels in MPEG-4 are standardised in order to give users a number of well-defined and well-chosen conformance points. They serve two main purposes: (1) ensuring interoperability between MPEG-4 implementations, and (2) allowing conformance to the standard to be tested. Profiles exist not only for the Audio and Visual parts of the standard (audio profiles and visual profiles), but also for the Systems part of the standard, in the form of graphics profiles, scene graph profiles, and an object descriptor profile. Different profiles are created for different application environments. The policy for defining profiles is that they should enable as many applications as possible while keeping the number of different profiles low. MPEG has defined a first set of profiles for MPEG-4, but more are expected. MPEG will be restrictive in defining any new profiles, listening carefully to what its users have to say.

Journal ArticleDOI
TL;DR: This work presents a new technique for reducing the encoding complexity of fractal image compression that is lossless, i.e., it does not sacrifice any image reconstruction quality for the sake of speedup, and outperforms other currently known lossless acceleration methods.
Abstract: In fractal image compression the encoding step is computationally expensive. We present a new technique for reducing the encoding complexity. It is lossless, i.e., it does not sacrifice any image reconstruction quality for the sake of speedup. It is based on a codebook coherence characteristic of fractal image compression and leads to a novel application of the fast Fourier transform-based cross correlation. The proposed method is particularly well suited for use with highly irregular image partitions for which most traditional (lossy) acceleration schemes lose a large part of their efficiency. For large ranges our approach outperforms other currently known lossless acceleration methods.
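The core trick, computing all range/domain inner products at once with an FFT-based cross-correlation, can be sketched as follows (this shows the cross-correlation step only, not the codebook-coherence bookkeeping of the method):

```python
# Sketch: all range/domain inner products in one pass via FFT-based cross-correlation.
import numpy as np
from scipy.signal import fftconvolve

def range_domain_correlations(domain_img, range_block):
    """Entry (i, j) equals sum(domain_img[i:i+B, j:j+B] * range_block)."""
    # Correlation = convolution with the block flipped in both directions.
    return fftconvolve(domain_img.astype(float),
                       range_block[::-1, ::-1].astype(float), mode='valid')
```

These inner products are exactly what the least-squares scale factor of each candidate domain block requires, which is why the search can be accelerated without changing its outcome.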

Journal ArticleDOI
TL;DR: This paper presents two motion descriptors which were recommended by MPEG to become part of the first visual reference model (XM 1.0) of the evolving MPEG-7 standard in development and are important elements in capturing the dynamic content of video sequences in a compact form.
Abstract: This paper presents two motion descriptors which were recommended by MPEG to become part of the first visual reference model (XM 1.0) of the evolving MPEG-7 standard in development. These motion descriptors are: (i) the camera motion descriptor which describes the global motion of the camera or of the observer in a natural 3-D scene, and (ii) the object motion trajectory descriptor which describes how an object moves in 3-D space or in the 2-D image plane. These two descriptors are important elements in capturing the dynamic content of video sequences in a compact form. They are used to index video sequences according to their dynamic content. Applications that use these descriptors include TV program classification, video editing for broadcast TV and movies, broadcast sports, and video surveillance.

Journal ArticleDOI
TL;DR: This paper presents a videotext description scheme and automatic methods for detection and representation of text in video segments, one based on edge characterization and the other based on region analysis.
Abstract: Superimposed text and scene text in video, i.e. videotext, brings important semantic clues into content analysis. In this paper we present a videotext description scheme and automatic methods for detection and representation of text in video segments. One of the methods is based on edge characterization while the other is based on region analysis. Applications of the videotext description scheme are numerous ranging from video indexing and annotation, ticker-tape analysis, commercial detection, transcript analysis to cross-modal querying using text and face information.
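A rough edge-density detector in the spirit of the edge-based method (window size and thresholds are illustrative assumptions, not the paper's parameters):

```python
# Sketch: edge-density candidate mask for superimposed text regions.
import numpy as np
from scipy import ndimage

def text_candidate_mask(gray, win=15, edge_thresh=60.0, density_thresh=0.2):
    gray = gray.astype(float)
    gx, gy = ndimage.sobel(gray, axis=1), ndimage.sobel(gray, axis=0)
    edges = (np.hypot(gx, gy) > edge_thresh).astype(float)      # strong-edge map
    density = ndimage.uniform_filter(edges, size=win)           # local edge density
    mask = density > density_thresh                             # text tends to be edge-dense
    # Horizontal closing so that characters of one text line merge into a single region.
    return ndimage.binary_closing(mask, structure=np.ones((3, 15)))
```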

Journal ArticleDOI
TL;DR: A compression method that can be described as an intermediate stage between traditional transform coding and vector quantization; it can be used for compressing natural images but is best suited for real multidimensional data.
Abstract: Discrete cosine transform (DCT) is used in practically all commercial image and video compression systems. It has been found to be the best choice among all unitary transforms in compressing natural images. We are also quite convinced that it is difficult to find any clear alternative for DCT in typical transform coders. Instead, we decided to change the problem. If another single transform cannot outperform DCT, how about using a set of transforms? This paper tries to give an overview about this idea. It was tested in several experiments, which show that the use of several transforms can improve compression ratio despite the side information needed. The result is a compression method that can be described as an intermediate stage between traditional transform coding and vector quantization. It can be used for compressing natural images, but it would be suited best for real multidimensional data.
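A toy version of per-block transform selection, here choosing between the DCT and an (assumed) Walsh-Hadamard alternative by counting significant quantized coefficients; the paper's actual transform set and cost measure are not reproduced:

```python
# Sketch: per-block choice between the DCT and a Walsh-Hadamard transform.
import numpy as np
from scipy.fft import dctn
from scipy.linalg import hadamard

H8 = hadamard(8) / np.sqrt(8)                 # orthonormal 8x8 Walsh-Hadamard matrix

def best_transform(block, q=16):
    """Pick the transform that leaves fewer significant coefficients after quantization."""
    candidates = {'dct': dctn(block, norm='ortho'),
                  'wht': H8 @ block @ H8.T}
    costs = {name: int(np.count_nonzero(np.round(c / q))) for name, c in candidates.items()}
    choice = min(costs, key=costs.get)
    return choice, costs[choice] + 1          # +1 accounts for the per-block side information

def choose_transforms(img, q=16):
    h, w = (s - s % 8 for s in img.shape)
    return [[best_transform(img[y:y + 8, x:x + 8].astype(float), q)[0]
             for x in range(0, w, 8)] for y in range(0, h, 8)]
```

The side-information term makes the trade-off explicit: a second transform only pays off when it reduces the coefficient cost by more than the bits needed to signal the choice.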

Journal ArticleDOI
TL;DR: This paper introduces a new binary shape coding technique called generalized predictive shape coding (GPSC) to encode the boundary of a visual object compactly by using a vertex-based approach that retains the advantages of existing polygon-based algorithms for visual content description while furnishing better geometric compression.
Abstract: This paper introduces a new binary shape coding technique called generalized predictive shape coding (GPSC) to encode the boundary of a visual object compactly by using a vertex-based approach. GPSC consists of a contour pixel matching algorithm and a motion-compliant contour coding algorithm. The contour pixel matching algorithm utilizes the knowledge of previously decoded contours by using a uniform translational model for silhouette motion, and generalizes polygon approximation for lossless and lossy motion estimation by adjusting a tolerance parameter d_max. To represent motion-compliant regions with minimum information in the transmitted bitstream, we develop a reference index-based coding scheme to represent the 2D positions of the matched segments using 1D reference contour indices. Finally, we encode the mismatched segments by sending residual polygons until the distortion is less than d_max. While GPSC realizes polygon approximation exactly at every encoding stage, we can guarantee quality of service because the peak distortion is no greater than d_max, and we improve coding efficiency as long as a silhouette complies with the model. The tolerance parameter d_max can be assigned to each contour to smooth the transmitted data rate, which is especially useful for constant bandwidth channels. Compared with non-predictive approaches, simulation using MPEG-4 sequences demonstrates that GPSC not only improves objective gain but also enhances visual quality based on MPEG-4 subjective tests. The significance of GPSC is that it provides a generic framework for seamlessly extending conventional vertex coding schemes into the temporal domain yet it retains the advantages of existing polygon-based algorithms for visual content description while furnishing better geometric compression.

Journal ArticleDOI
TL;DR: An efficient method for the lossy encoding of object shapes which are given as 8-connect chain codes, using a mathematical model that approximates a boundary by a second-order B-spline curve and considers the problem of finding the curve with the lowest bit-rate for a given distortion.
Abstract: A major problem in object-oriented video coding is the efficient encoding of the shape information of arbitrarily shaped objects. Efficient shape coding schemes are also needed for encoding the shape information of video objects (VOs) in the upcoming MPEG-4 standard. Furthermore, there are many applications where only the shape needs to be encoded, such as CAD, 3D modeling and signature encoding. In this paper, we present an efficient method for the lossy encoding of object shapes which are given as 8-connect chain codes using a mathematical model. We approximate a boundary by a second-order B-spline curve and consider the problem of finding the curve with the lowest bit-rate for a given distortion. The presented scheme is optimal, efficient and offers complete control over the trade-off between bit-rate and distortion. It is an extension of our previous research where we used polygons to approximate a boundary. The main reason for using curves rather than polygons is that curves have a more natural appearance than polygons and can give better coding efficiencies. We present results of the proposed scheme using object boundaries of different shapes and sizes as well as an MPEG-4 test sequence. © 2000 Elsevier Science B.V. All rights reserved.
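The modelling idea can be illustrated by evaluating a closed uniform second-order (quadratic) B-spline through a reduced set of control points and measuring the peak deviation from the original boundary; the rate-distortion-optimal selection of control points in the paper is not reproduced here:

```python
# Sketch: closed uniform quadratic B-spline through control points, plus peak deviation.
import numpy as np

def quadratic_bspline(control_points, samples_per_span=20):
    """Evaluate a closed uniform second-order B-spline defined by the control points."""
    P = np.asarray(control_points, float)
    n = len(P)
    t = np.linspace(0.0, 1.0, samples_per_span, endpoint=False)
    b0, b1, b2 = 0.5 * (1 - t) ** 2, 0.5 + t * (1 - t), 0.5 * t ** 2   # quadratic B-spline basis
    spans = [b0[:, None] * P[i] + b1[:, None] * P[(i + 1) % n] + b2[:, None] * P[(i + 2) % n]
             for i in range(n)]                                        # wrap around: closed curve
    return np.concatenate(spans)

def peak_deviation(boundary_pixels, control_points):
    """Maximum distance from any original boundary pixel to the approximating curve."""
    curve = quadratic_bspline(control_points)
    d = np.linalg.norm(np.asarray(boundary_pixels, float)[:, None, :] - curve[None, :, :], axis=2)
    return d.min(axis=1).max()
```

Fewer control points mean a lower bit-rate but a larger peak deviation, which is exactly the bit-rate/distortion trade-off the abstract describes.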