
Showing papers in "Storage and Retrieval for Image and Video Databases in 1997"


Proceedings ArticleDOI
TL;DR: A new method based on amplitude modulation is presented that has been shown to be resistant to both classical attacks, such as filtering, and geometrical attacks, and whose signature can be extracted without the original image.
Abstract: Watermarking techniques, also referred to as digital signatures, sign images by introducing changes that are imperceptible to the human eye but easily recoverable by a computer program. Generally, the signature is a number which identifies the owner of the image. The locations in the image where the signature is embedded are determined by a secret key. Doing so prevents possible pirates from easily removing the signature. Furthermore, it should be possible to retrieve the signature from an altered image. Possible alterations of signed images include blurring, compression and geometrical transformations such as rotation and translation. These alterations are referred to as attacks. A new method based on amplitude modulation is presented. Single signature bits are embedded multiple times by modifying pixel values in the blue channel. These modifications are either additive or subtractive, depending on the value of the bit, and proportional to the luminance. This new method has been shown to be resistant to both classical attacks, such as filtering, and geometrical attacks. Moreover, the signature can be extracted without the original image.
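A minimal sketch of the embedding step, assuming 8-bit RGB images held as NumPy arrays; the secret-key selection of pixel locations, the redundancy, and the extraction side are simplified, and the strength parameter is an illustrative value, not the paper's:

```python
import numpy as np

def embed_bit(img, positions, bit, strength=0.1):
    """Embed one signature bit at the given pixel locations by adding or
    subtracting a luminance-proportional amount in the blue channel."""
    out = img.astype(np.float64)
    for (y, x) in positions:
        # Rec. 601 luminance of the pixel
        lum = 0.299 * out[y, x, 0] + 0.587 * out[y, x, 1] + 0.114 * out[y, x, 2]
        delta = strength * lum
        out[y, x, 2] += delta if bit == 1 else -delta
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)                 # stand-in for the secret key
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
positions = [tuple(p) for p in rng.integers(0, 64, (100, 2))]
marked = embed_bit(img, positions, bit=1)
```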

408 citations


Proceedings ArticleDOI
TL;DR: This paper provides counterfeit watermarking schemes that can be applied to a watermarked image to allow multiple claims of rightful ownership, and proposes non-invertible watermarking schemes.
Abstract: Digital watermarks have been proposed in recent literature as a means for copyright protection of multimedia data. In this paper we address the capability of invisible watermarking schemes to resolve copyright ownership. We show that rightful ownership cannot be resolved by current watermarking schemes alone. In addition, in the absence of standardization of watermarking procedures, anyone can claim ownership of any watermarked image. Specifically, we provide counterfeit watermarking schemes that can be applied to a watermarked image to allow multiple claims of rightful ownership. We also propose non-invertible watermarking schemes and discuss in general the usefulness of digital watermarks in identifying the rightful copyright owners. The results, coupled with the recent attacks on some image watermarks, further imply that we have to carefully re-think our approaches to invisible watermarking of images, and re-evaluate the promises, applications and limitations of such digital means of copyright protection.

270 citations


Proceedings ArticleDOI
TL;DR: A relevance-feedback-based interactive retrieval approach is proposed that effectively takes into account two distinct characteristics of CBIR, greatly reduces the user's effort of composing a query, and captures the user's information need more precisely.
Abstract: Content-based image retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Specifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: (1) the gap between high level concepts and low level features; (2) subjectivity of human perception of visual content. This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user's high level query and perception subjectivity are captured by dynamically updated weights based on the user's relevance feedback. The experimental results show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely.
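A minimal sketch of the dynamic re-weighting idea, assuming images are represented as rows of a NumPy feature matrix; the inverse-variance update shown here is one common heuristic (features that vary little among the images the user marks relevant get more weight), and the paper's exact update rule may differ:

```python
import numpy as np

def search(db, query, weights, k=5):
    """Return indices of the k nearest images under a weighted distance."""
    d = np.sqrt(((db - query) ** 2 * weights).sum(axis=1))
    return np.argsort(d)[:k]

def update_weights(db, relevant_ids, eps=1e-6):
    """Re-weight features inversely to their variance among relevant images."""
    var = db[relevant_ids].var(axis=0)
    w = 1.0 / (var + eps)
    return w / w.sum()

rng = np.random.default_rng(0)
db = rng.random((1000, 8))              # 1000 images, 8 features each
query = rng.random(8)
w = np.full(8, 1.0 / 8)                 # start with uniform weights
hits = search(db, query, w)
w = update_weights(db, hits[:3])        # user marks the top 3 as relevant
hits = search(db, query, w)             # refined retrieval round
```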

231 citations


Proceedings ArticleDOI
TL;DR: This paper presents the use of the Virage Video Engine (VVE), a flexible, platform-independent architecture that supports processing of multiple synchronized data streams such as image sequences, audio and closed captions, for content-based video retrieval.
Abstract: The temporal and multi-modal nature of video increases the dimensionality of the content-based retrieval problem. This places new demands on the indexing and retrieval tools required. The Virage Video Engine (VVE), with its default set of primitives, provides the necessary framework and basic tools for video content-based retrieval. The video engine is a flexible, platform-independent architecture which provides support for processing multiple synchronized data streams such as image sequences, audio and closed captions. The architecture allows for multi-modal indexing and retrieval of video through the use of media-specific primitives. This paper presents the use of the VVE framework for content-based video retrieval.

202 citations


Proceedings ArticleDOI
TL;DR: This paper describes an effective technique for image authentication which can prevent malicious manipulations but allow JPEG lossy compression, and shows that the design of the authenticator depends on the number of recompression times and on whether the image is decoded into integral values in the pixel domain during the recompression process.
Abstract: Image authentication verifies the originality of an image by detecting malicious manipulations. This goal is different from that of image watermarking, which embeds into the image a signature surviving most manipulations. Existing methods for image authentication treat all types of manipulation equally (i.e., as unacceptable). However, some applications demand techniques that can distinguish acceptable manipulations (e.g., compression) from malicious ones. In this paper, we describe an effective technique for image authentication which can prevent malicious manipulations but allow JPEG lossy compression. The authentication signature is based on the invariance of the relationship between DCT coefficients of the same position in separate blocks of an image. This relationship is preserved when these coefficients are quantized in a JPEG compression process. Our proposed method can distinguish malicious manipulations from JPEG lossy compression regardless of how high the compression ratio is. We also show that, in different practical cases, the design of the authenticator depends on the number of recompression times and on whether the image is decoded into integral values in the pixel domain during the recompression process. Theoretical and experimental results indicate that this technique is effective for image authentication.
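A minimal sketch of the invariant this signature relies on: because all blocks of an image are quantized with the same JPEG table, the ordering of two DCT coefficients taken from the same position in two different blocks is preserved through compression. The block pairing, coefficient position and signature encoding below are illustrative, not the paper's exact construction:

```python
import numpy as np
from scipy.fft import dctn

def block_dct(block):
    """Level-shifted 8x8 DCT, as in JPEG."""
    return dctn(block.astype(np.float64) - 128, norm='ortho')

def signature(img, pairs, pos=(1, 1)):
    """One comparison bit per block pair, at DCT position `pos`."""
    bits = []
    for (py, px), (qy, qx) in pairs:
        a = block_dct(img[py:py+8, px:px+8])[pos]
        b = block_dct(img[qy:qy+8, qx:qx+8])[pos]
        bits.append(a >= b)
    return np.array(bits)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64)).astype(np.uint8)
pairs = [((0, 0), (8, 8)), ((16, 0), (24, 8))]
sig = signature(img, pairs)
# After JPEG recompression, signature(decoded, pairs) should equal sig
# wherever the two coefficients do not quantize to the same value.
```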

180 citations


Proceedings ArticleDOI
TL;DR: An image and video search engine which utilizes both text-based navigation and content-based technology for searching visually through the catalogued images and videos is introduced.
Abstract: We describe a visual information system prototype for searching for images and videos on the World-Wide Web. New visual information in the form of images, graphics, animations and videos is being published on the Web at an incredible rate. However, cataloging this visual data is beyond the capabilities of current text-based Web search engines. In this paper, we describe a complete system by which visual information on the Web is (1) collected by automated agents, (2) processed in both text and visual feature domains, (3) catalogued and (4) indexed for fast search and retrieval. We introduce an image and video search engine which utilizes both text-based navigation and content-based technology for searching visually through the catalogued images and videos. Finally, we provide an initial evaluation based upon the cataloging of over one half million images and videos collected from the Web.

171 citations



Proceedings ArticleDOI
TL;DR: A basketball annotation system is developed that combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times.
Abstract: Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing and data mining projects. A successful video annotation system should provide users with a useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation remains hard using currently available techniques. However, a wide range of video has inherent structure, so some prior knowledge about the video content can be exploited to improve our understanding of the high-level semantic content of video. In this paper, we develop tools and techniques for analyzing structured video by using the low-level information available directly from MPEG compressed video. Being able to work directly in the compressed domain can greatly reduce the processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times. We expect that our approach can also be extended to structured video in other domains.

142 citations


Proceedings ArticleDOI
TL;DR: A frame-type-independent representation of the various types of frames present in an MPEG video, in which all frames can be considered equivalent, is developed, enabling fast archiving, indexing, and retrieval of video.
Abstract: Development of various multimedia applications hinges on the availability of fast and efficient storage, browsing, indexing, and retrieval techniques. Given that video is typically stored efficiently in a compressed format, if we can analyze the compressed representation directly, we can avoid the costly overhead of decompressing and operating at the pixel level. Compressed-domain parsing of video has been presented in earlier work, where a video clip is divided into shots, subshots, and scenes. In this paper, we describe key frame selection, feature extraction, and indexing and retrieval techniques that are directly applicable to MPEG compressed video. We develop a frame-type-independent representation of the various types of frames present in an MPEG video in which all frames can be considered equivalent. Features are derived from the available DCT, macroblock, and motion vector information and mapped to a low-dimensional space where they can be accessed with standard database techniques. The spatial information is used as the primary index, while the temporal information is used to enhance the robustness of the system during the retrieval process. The techniques presented enable fast archiving, indexing, and retrieval of video. Our operational prototype typically takes a fraction of a second to retrieve similar video scenes from our database, with over 95% success.

114 citations


Proceedings ArticleDOI
TL;DR: A new spatio-temporal segmentation and object-tracking scheme that can handle large motion, together with a hierarchical object-based video representation model, is presented.
Abstract: There is a growing need for new representations of video that allow not only compact storage of data but also content-based functionalities such as search and manipulation of objects. We present here a prototype system, called NeTra-V, that is currently being developed to address some of these content related issues. The system has a two-stage video processing structure: a global feature extraction and clustering stage, and a local feature extraction and object-based representation stage. Key aspects of the system include a new spatio-temporal segmentation and object-tracking scheme, and a hierarchical object-based video representation model. The spatio-temporal segmentation scheme combines the color/texture image segmentation and affine motion estimation techniques. Experimental results show that the proposed approach can handle large motion. The output of the segmentation, the alpha plane as it is referred to in the MPEG-4 terminology, can be used to compute local image properties. This local information forms the low-level content description module in our video representation. Experimental results illustrating spatio- temporal segmentation and tracking are provided.

102 citations


Proceedings ArticleDOI
TL;DR: A technique for automatic classification of video sequences (such as TV broadcasts or movies) is explored; it analyzes incoming video sequences, classifies them into categories, and can be viewed as an on-line parser for video signals.
Abstract: In this paper, we explore a technique for automatic classification of video sequences (such as TV broadcasts or movies). This technique analyzes the incoming video sequences and classifies them into categories. It can be viewed as an on-line parser for video signals. We present two techniques for automatic classification. In the first technique, the incoming video sequence is analyzed to extract the motion information. This information is optimally projected onto a single dimension. This projection information is then used to train Hidden Markov Models (HMMs) that efficiently and accurately classify the incoming video sequence. Preliminary results with 50 different test sequences (25 sports and 25 news sequences) indicate a classification accuracy of 90% by the HMM models. In the second technique, 24 full-length motion picture trailers are classified using HMMs. This classification is compared with the Internet Movie Database, and we find that they correlate well. Only two out of 24 trailers were classified incorrectly.
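A minimal sketch of the first technique's classification stage, assuming the per-frame motion information has already been projected onto a single dimension; it uses the third-party hmmlearn package, and the training data below is a synthetic stand-in for the paper's features:

```python
import numpy as np
from hmmlearn import hmm

def train(seqs, n_states=3):
    """Fit one Gaussian HMM to a list of 1-D observation sequences."""
    X = np.concatenate(seqs).reshape(-1, 1)
    lengths = [len(s) for s in seqs]
    model = hmm.GaussianHMM(n_components=n_states, n_iter=50)
    model.fit(X, lengths)
    return model

def classify(seq, models):
    """Pick the class whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: models[name].score(seq.reshape(-1, 1)))

rng = np.random.default_rng(7)
sports = [rng.normal(1.0, 0.3, 200) for _ in range(25)]   # high-motion sequences
news = [rng.normal(0.2, 0.1, 200) for _ in range(25)]     # low-motion sequences
models = {'sports': train(sports), 'news': train(news)}
print(classify(rng.normal(1.0, 0.3, 200), models))        # -> 'sports'
```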

Proceedings ArticleDOI
TL;DR: MetaSEEk is a content-based meta-search engine used for finding images on the Web based on their visual information designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries.
Abstract: Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the Web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a baseline version of the meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.

Proceedings ArticleDOI
TL;DR: Although the second DCT-based method is slightly less resistant to JPEG compression, it is more resistant to line-shifting and cropping than the first one and is suitable for real-time labeling.
Abstract: In the European project SMASH, a mass multimedia storage device for home usage is being developed. The success of such a storage system depends not only on technical advances, but also on the existence of an adequate copy protection method. Copy protection for visual data requires fast and robust labeling techniques. In this paper, two new labeling techniques are proposed. The first method extends an existing spatial labeling technique. This technique divides the image into blocks and searches for an optimal label-embedding level for each block instead of using a fixed embedding level for the complete image. The embedding level for each block depends on a lower-quality JPEG-compressed version of the labeled block. The second method removes high-frequency DCT coefficients in some areas to embed a label. A JPEG quality factor and the local image structure determine how many coefficients are discarded during the labeling process. Using both methods, a perceptually invisible label of a few hundred bits was embedded in a set of true-color images. The label added by the spatial method is very robust against JPEG compression. However, this method is not suitable for real-time applications. Although the second, DCT-based method is slightly less resistant to JPEG compression, it is more resistant to line-shifting and cropping than the first one and is suitable for real-time labeling.
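A minimal sketch of the second method's core operation: encoding a label bit in a selected 8x8 block by discarding its high-frequency DCT coefficients. In the real method the cutoff follows from a JPEG quality factor and the local image structure; the fixed cutoff and block selection below are illustrative:

```python
import numpy as np
from scipy.fft import dctn, idctn

def label_block(block, bit, cutoff=4):
    """Encode one bit: bit 1 drops high-frequency DCT content,
    bit 0 leaves the block unchanged."""
    c = dctn(block.astype(np.float64), norm='ortho')
    if bit == 1:
        u, v = np.indices(c.shape)
        c[u + v >= cutoff] = 0.0
    return idctn(c, norm='ortho')

rng = np.random.default_rng(3)
block = rng.integers(0, 256, (8, 8)).astype(np.float64)
labeled = label_block(block, bit=1)
# A detector re-applies the DCT and tests whether the block's
# high-frequency energy is (near) zero.
```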

Proceedings ArticleDOI
TL;DR: Current research directions of the QBIC project such as indexing for high-dimensional multimedia data, retrieval of gray level images, and storyboard generation suitable for video are discussed.
Abstract: QBIC™ (Query By Image Content) is a set of technologies and associated software that allows a user to search, browse, and retrieve image, graphic, and video data from large on-line collections. This paper discusses current research directions of the QBIC project such as indexing for high-dimensional multimedia data, retrieval of gray-level images, and storyboard generation suitable for video. It describes aspects of QBIC software including scripting tools, application interfaces, and available GUIs, and gives examples of applications and demonstration systems using it.

Proceedings ArticleDOI
TL;DR: A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution to allow the inquirer to fully control the importance of temporal ordering and duration.
Abstract: In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure and how can it be 'easily' transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. Promising experimental results are presented.

Proceedings ArticleDOI
Minerva M. Yeung, Boon-Lock Yeo
TL;DR: This paper develops models to capture and characterize video by temporal events, namely, dialogues, actions and story units, and presents these events using succinct visual summaries that depict and differentiate the underlying dramatic elements in an intuitive manner.
Abstract: In digital libraries and the Internet, large amounts of data in various modalities have to be transmitted and delivered across networks, subject to bandwidth constraints and network congestion. Among all multimedia data, video is the most difficult to handle, both in terms of its size and of the scarcity of tools and techniques available for efficient delivery, storage and retrieval. Providing tools to help users search and browse large collections of video documents is important. Equally important are the means to deliver and present the essence of video content to the user without noticeable delay. In this paper, we focus on the characterization of video by means of automatic analysis of its visual content, and on the compact presentation of the underlying story content built upon the derived characteristics. We develop models to capture and characterize video by temporal events, namely dialogues, actions and story units. We then present these events using succinct visual summaries that depict and differentiate the underlying dramatic elements in an intuitive manner. The combination of video characterization and visual summary offers compaction of video data size far beyond the numbers achieved by traditional video compression, while retaining essential meanings and semantics of the content, and is particularly useful for digital library and Internet applications.

Proceedings ArticleDOI
TL;DR: This work studies the performance of multiscale Hurst parameters as texture features for database image retrieval over a database consisting of homogeneous textures and compares the retrieval performance of the extended parameters against traditional Hurst features and features obtained from the Gabor wavelet.
Abstract: The increase in the number of multimedia databases consisting of images has created a need for a quick method to search these databases for a particular type of image. An image retrieval system outputs images from the database that are similar to the query image in terms of shape, color, and texture. In this work, we study the performance of multiscale Hurst parameters as texture features for image retrieval over a database consisting of homogeneous textures. These extended Hurst features are a generalization of the Hurst parameter for fractional Brownian motion (fBm), where the extended parameters quantify the texture roughness of an image at various scales. We compare the retrieval performance of the extended parameters against traditional Hurst features and features obtained from the Gabor wavelet. Gabor wavelets have previously been suggested for image retrieval applications because they can be tuned to obtain texture information for a number of different scales and orientations. In our experiments, we form a database combining textures from the Bonn, Brodatz, and MIT VisTex databases. Over the hybrid database, the extended fractal features were able to retrieve a higher percentage of similar textures than the Gabor features. Furthermore, the fractal features are faster to compute than the Gabor features.
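A minimal sketch of estimating Hurst-type roughness at several scales, assuming a grayscale image as a 2-D NumPy array: for fBm, Var[B(t+s) − B(t)] grows as s^(2H), so the slope of log-variance against log-lag gives 2H, and fitting the slope over different lag bands yields one roughness parameter per scale. This is a generic fBm-style estimator, not the paper's exact formulation:

```python
import numpy as np

def multiscale_hurst(img, lag_bands=((1, 4), (4, 16))):
    """Estimate one Hurst-like roughness parameter per lag band,
    using horizontal pixel increments."""
    img = img.astype(np.float64)
    lags = range(1, max(b[1] for b in lag_bands) + 1)
    logv = {s: np.log((img[:, s:] - img[:, :-s]).var()) for s in lags}
    feats = []
    for lo, hi in lag_bands:
        s = np.arange(lo, hi + 1)
        slope = np.polyfit(np.log(s), [logv[i] for i in s], 1)[0]
        feats.append(slope / 2.0)        # H estimate for this scale band
    return np.array(feats)

rng = np.random.default_rng(5)
texture = rng.random((128, 128)).cumsum(axis=1)   # random-walk rows, H near 0.5
print(multiscale_hurst(texture))
```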

Proceedings ArticleDOI
TL;DR: A retrieval algorithm that takes a video clip as a query and searches the database for clips with similar contents that facilitates retrieval of clips for the purpose of video editing, broadcast news retrieval, or copyright violation detection is introduced.
Abstract: This paper presents a novel approach for video retrieval from a large archive of MPEG or Motion JPEG compressed video clips. We introduce a retrieval algorithm that takes a video clip as a query and searches the database for clips with similar contents. Video clips are characterized by a sequence of representative frame signatures, which are constructed from DC coefficients and motion information ('DC+M' signatures). The similarity between two video clips is determined by using their respective signatures. This method facilitates retrieval of clips for the purpose of video editing, broadcast news retrieval, or copyright violation detection.
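A minimal sketch of matching clips through sequences of frame signatures, in the spirit of the 'DC+M' idea: here a signature is simply the image of 8×8-block DC (mean) values of a representative frame, and clip similarity is the average signature distance. The motion component and the paper's actual matching algorithm are omitted:

```python
import numpy as np

def dc_signature(frame):
    """8x8-block DC image: the per-block mean, as a coarse frame signature."""
    h, w = frame.shape
    return frame[:h//8*8, :w//8*8].reshape(h//8, 8, w//8, 8).mean(axis=(1, 3))

def clip_distance(sigs_a, sigs_b):
    """Average absolute difference between aligned signature sequences."""
    n = min(len(sigs_a), len(sigs_b))
    return np.mean([np.abs(a - b).mean() for a, b in zip(sigs_a[:n], sigs_b[:n])])

rng = np.random.default_rng(11)
clip = [rng.random((64, 64)) for _ in range(10)]
noisy = [f + rng.normal(0, 0.01, f.shape) for f in clip]   # re-encoded copy
other = [rng.random((64, 64)) for _ in range(10)]          # unrelated clip
sigs = [dc_signature(f) for f in clip]
print(clip_distance(sigs, [dc_signature(f) for f in noisy]))  # small
print(clip_distance(sigs, [dc_signature(f) for f in other]))  # larger
```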

Proceedings ArticleDOI
TL;DR: An approach to embedding gray scale images using a discrete wavelet transform is proposed, which provides a simple control parameter that can be tailored to either hiding or watermarking purposes, and is robust to operations such as JPEG compression.
Abstract: An approach to embedding gray scale images using a discrete wavelet transform is proposed. The proposed scheme enables using signature images that could be as much as 25% of the host image data and hence could be used both in digital watermarking as well as image/data hiding. In digital watermarking the primary concern is the recovery or checking for signature even when the embedded image has been changed by image processing operations. Thus the embedding scheme should be robust to typical operations such as low-pass filtering and lossy compression. In contrast, for data hiding applications it is important that there should not be any visible changes to the host data that is used to transmit a hidden image. In addition, in both data hiding and watermarking, it is desirable that it is difficult or impossible for unauthorized persons to recover the embedded signatures. The proposed scheme provides a simple control parameter that can be tailored to either hiding or watermarking purposes, and is robust to operations such as JPEG compression. Experimental results demonstrate that high quality recovery of the signature data is possible.
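A minimal sketch of wavelet-domain embedding in this spirit, using the third-party PyWavelets package: a signature image scaled by a control parameter alpha is added to one detail subband of the host's discrete wavelet transform, and recovered by differencing against the original host. The parameter value and choice of subband are illustrative, not the paper's scheme:

```python
import numpy as np
import pywt

def embed(host, signature, alpha=0.1):
    """Add a scaled signature into the horizontal detail subband."""
    cA, (cH, cV, cD) = pywt.dwt2(host.astype(np.float64), 'haar')
    cH = cH + alpha * signature          # signature sized like the subband
    return pywt.idwt2((cA, (cH, cV, cD)), 'haar')

def recover(marked, host, alpha=0.1):
    """Recover the signature by differencing subbands against the original."""
    _, (cH_m, _, _) = pywt.dwt2(marked, 'haar')
    _, (cH_o, _, _) = pywt.dwt2(host.astype(np.float64), 'haar')
    return (cH_m - cH_o) / alpha

rng = np.random.default_rng(12)
host = rng.integers(0, 256, (64, 64)).astype(np.float64)
sig = rng.integers(0, 256, (32, 32)).astype(np.float64)    # 25% of host size
marked = embed(host, sig)
print(np.abs(recover(marked, host) - sig).max() < 1e-6)    # -> True
```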

Proceedings ArticleDOI
TL;DR: This paper explores the use of localized dominant hue and saturation values for color-based image similarity retrieval and results in a relatively compact representation of color images for similarity retrieval.
Abstract: Color is one of the most widely used features for image similarity retrieval. Most of the existing image similarity retrieval schemes employ either global or local color histogramming. In this paper, we explore the use of localized dominant hue and saturation values for color-based image similarity retrieval. This scheme results in a relatively compact representation of color images for similarity retrieval. Experimental results comparing the proposed representation with global and local color histogramming are presented to show the efficacy of the suggested scheme.
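A minimal sketch of a localized dominant-color descriptor along these lines, assuming RGB images as NumPy arrays: the image is split into a grid, and each cell keeps its histogram-peak hue and mean saturation. The grid size and bin count are illustrative, and the paper's actual representation may differ:

```python
import numpy as np
import matplotlib.colors as mcolors   # only for rgb_to_hsv

def dominant_hs(img_rgb, grid=4, bins=12):
    """Per grid cell, return (dominant hue, mean saturation)."""
    hsv = mcolors.rgb_to_hsv(img_rgb / 255.0)
    h, w = hsv.shape[:2]
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = hsv[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist, edges = np.histogram(cell[..., 0], bins=bins, range=(0, 1))
            peak = edges[hist.argmax()] + 0.5 / bins      # bin-center hue
            feats.append((peak, cell[..., 1].mean()))
    return np.array(feats)

rng = np.random.default_rng(2)
img = rng.integers(0, 256, (64, 64, 3)).astype(np.float64)
print(dominant_hs(img).shape)      # (16, 2): one (hue, sat) pair per cell
```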

Proceedings ArticleDOI
TL;DR: This paper introduces FIDS, the 'Flexible Image Database System,' which lets users query an image database with user-defined polynomial combinations of predefined distance measures and uses triangle-inequality-based indexing to return matches without comparing the query against much of the database.
Abstract: There is a growing need for the ability to query image databases based on image content rather than strict keyword search. Most current image database systems that perform query by content require a distance computation for each image in the database. Distance computations can be time consuming, limiting the usability of such systems. There is thus a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for run-time creation of distance measures. In this paper, we introduce FIDS, the 'Flexible Image Database System.' FIDS allows the user to query the database based on user-defined polynomial combinations of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can return matches to the query image without directly comparing the query image to much of the database. FIDS is currently being tested on a database of eighteen hundred images.
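A minimal sketch of the triangle-inequality pruning idea, assuming feature vectors under the Euclidean distance: with distances from every database image to a few key images precomputed offline, |d(q,k) − d(x,k)| lower-bounds d(q,x), so any image whose bound exceeds the search radius is rejected without a full distance computation. The key selection and data below are illustrative:

```python
import numpy as np

def dist(a, b):
    return np.linalg.norm(a - b)

rng = np.random.default_rng(9)
db = rng.random((5000, 16))
keys = db[rng.choice(len(db), 8, replace=False)]
db_to_keys = np.array([[dist(x, k) for k in keys] for x in db])  # offline

def range_query(q, radius):
    """Return indices of database images within `radius` of q."""
    q_to_keys = np.array([dist(q, k) for k in keys])
    lower = np.abs(db_to_keys - q_to_keys).max(axis=1)   # triangle bound
    hits = []
    for i in np.where(lower <= radius)[0]:               # survivors only
        if dist(q, db[i]) <= radius:
            hits.append(i)
    return hits

print(len(range_query(rng.random(16), 0.8)))
```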

Proceedings ArticleDOI
TL;DR: The SS+-tree is described, a tree structure for supporting similarity searches in a high-dimensional Euclidean space, which uses a tighter bounding sphere for each node and makes better use of the clustering property of the available data by using a variant of the k-means clustering algorithm as the split heuristic for its nodes.
Abstract: In this paper, we describe the SS+-tree, a tree structure for supporting similarity searches in a high-dimensional Euclidean space. Compared to the SS-tree, the tree uses a tighter bounding sphere for each node, which is an approximation to the smallest enclosing sphere, and it also makes better use of the clustering property of the available data by using a variant of the k-means clustering algorithm as the split heuristic for its nodes. A local reorganization rule is also introduced during tree building to reduce the overlap between the nodes' bounding spheres.
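A minimal sketch of the split heuristic described above: a node's entries are partitioned by 2-means clustering and each child is bounded by a sphere (its centroid plus the maximum member distance), an approximation to the smallest enclosing sphere. Tree maintenance, the tighter bounding-sphere computation, and the local reorganization rule are omitted:

```python
import numpy as np

def kmeans_split(points, iters=10, seed=0):
    """Split a node's entries into two bounded children via 2-means."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), 2, replace=False)]
    for _ in range(iters):
        assign = np.linalg.norm(points[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([points[assign == c].mean(axis=0)
                            if (assign == c).any() else centers[c]
                            for c in (0, 1)])
    children = []
    for c in (0, 1):
        pts = points[assign == c]
        radius = np.linalg.norm(pts - centers[c], axis=1).max()
        children.append((centers[c], radius, pts))       # bounding sphere + members
    return children

pts = np.random.default_rng(4).random((200, 16))         # high-dimensional entries
for center, radius, members in kmeans_split(pts):
    print(len(members), round(radius, 3))
```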

Proceedings ArticleDOI
TL;DR: Novel approaches are proposed and validated for automating the preparatory phases of browsing and querying video in mass-market storage systems for domestic use, including a detection method for abrupt shot changes that uses a locally computed threshold based on a statistical model of frame-to-frame differences.
Abstract: In the European project SMASH, mass-market storage systems for domestic use are under study. Besides the storage technology developed in this project, the related objective of user-friendly browsing/query of video data is studied as well. Key issues in developing a user-friendly system are (1) minimizing user intervention in preparatory steps (extraction and storage of representative information needed for browsing/query), (2) providing an acceptable representation of the stored video content in view of a higher automation level, (3) the possibility of performing these steps directly on the incoming stream at storage time, and (4) parameter-robustness of the algorithms used for these steps. This paper proposes and validates novel approaches for automating the aforementioned preparatory phases. A detection method for abrupt shot changes is proposed, using a locally computed threshold based on a statistical model for frame-to-frame differences. For the extraction of representative frames (key frames), an approach is presented which distributes a given number of key frames over the sequence depending on content changes in a temporal segment of the sequence. A multimedia database is introduced that is able to automatically store all bibliographic information about a recorded video, as well as a visual representation of the content, without any manual intervention from the user.
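A minimal sketch of cut detection with a locally computed threshold, in the spirit described above: a frame-to-frame difference is declared an abrupt shot change when it exceeds the local mean of recent differences by several local standard deviations. The window size and factor are illustrative; the paper's statistical model is not reproduced here:

```python
import numpy as np

def detect_cuts(diffs, window=10, factor=5.0):
    """Flag frame i as a cut when its difference value is an outlier
    relative to the preceding `window` differences."""
    cuts = []
    for i, d in enumerate(diffs):
        local = diffs[max(0, i - window):i]
        if len(local) >= 3 and d > local.mean() + factor * local.std():
            cuts.append(i)
    return cuts

rng = np.random.default_rng(6)
diffs = rng.normal(1.0, 0.1, 300)      # synthetic frame-difference signal
diffs[[80, 200]] = 5.0                 # two injected abrupt shot changes
print(detect_cuts(diffs))              # should report the two injected cuts
```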

Proceedings ArticleDOI
TL;DR: This paper describes ongoing work toward querying by image content, which will include the capability to search for x-ray images similar to an input image with respect to the vertebral morphometry used to characterize features such as fractures and disc space narrowing.
Abstract: At the Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), we are developing a prototype multimedia database system to provide World Wide Web access to biomedical databases. WebMIRS (Web-based Medical Information Retrieval System) will allow access to databases containing text and images and will allow database query by standard SQL, by image content, or by a combination of the two. The system is being developed in the form of Java applets, which will communicate with the Informix DBMS on an NLM Sun workstation running the Solaris operating system. The system architecture will allow access from any hardware platform that supports a Java-enabled Web browser, such as Netscape or Internet Explorer. Initial databases will include data from two national health surveys conducted by the National Center for Health Statistics (NCHS), and will include x-ray images from those surveys. In addition to describing in-house research in database access systems, this paper describes ongoing work toward querying by image content. Image content search capability will include the ability to search for x-ray images similar to an input image with respect to the vertebral morphometry used to characterize features such as fractures and disc space narrowing.

Proceedings ArticleDOI
TL;DR: An additive watermarking technique for grey-scale pictures, which can be extended to video sequences, is presented, together with preliminary results analyzing the efficacy of the decoding as well as the watermark's resistance to compression and robustness against malevolent treatments.
Abstract: This paper presents an additive watermarking technique for grey-scale pictures, which can be extended to video sequences. It consists of secretly embedding copyright information (a binary sequence) in the picture without degrading its quality. The bits are encoded through the phase of Maximal Length Sequences (MLS). MLS are sequences having good correlation properties, meaning that the result of the autocorrelation is far greater than the crosscorrelations, i.e. correlations made with shifted versions of the sequence. The embedding is performed line by line, going from the top to the bottom of the picture, as the objective was to implement a low-cost, real-time embedding method able to work with common video equipment. The embedding process itself is underlain by a masking criterion that guarantees the invisibility of the watermark. This perceptual criterion, deduced from physiological and psychophysical studies, has already proved its efficiency in a previously presented paper. It is combined with an edge and texture discrimination to determine the embedding level of the MLS, whose bits are actually spread over 32-by-8-pixel blocks. Finally, some preliminary results are presented, which analyze the efficacy of the decoding as well as the resistance of the watermark to compression and its robustness against malevolent treatments.
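A minimal sketch of the correlation property the decoding relies on: a maximal-length sequence generated by a linear feedback shift register has an autocorrelation peak far above its correlations with shifted versions of itself, so an embedded phase shift (the payload) can be recovered by correlation. The LFSR taps below implement x^7 + x^6 + 1; the embedding and masking stages are omitted:

```python
import numpy as np

def mls(degree=7, taps=(7, 6)):
    """Generate a maximal-length sequence with a Fibonacci LFSR,
    mapped to +/-1 for correlation."""
    state = [1] * degree
    seq = []
    for _ in range(2 ** degree - 1):
        seq.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return np.array(seq) * 2 - 1

m = mls()
shifted = np.roll(m, 37)                   # the phase encodes the payload
corr = [np.dot(np.roll(m, k), shifted) for k in range(len(m))]
print(int(np.argmax(corr)))                # -> 37, the recovered phase
```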

Proceedings ArticleDOI
TL;DR: Although more research is required across different photo-scales and sets of images, it is concluded that texture features generated from compressed JPEG images have potential for content-based image retrieval based on texture.
Abstract: We test the performance of a texture feature constructed from the variance of the first eight AC Discrete Cosine Transform (DCT) coefficients of JPEG compressed images. We break the image into sub-images, consisting of many 8×8 blocks, and then calculate the variance of each DCT coefficient across the sub-image. We evaluate the texture feature at two different image resolutions and at three different quality factors. In our high-resolution image, a pixel covered a square of side 4 cm on the ground. Our low-resolution image was generated by subsampling. Representative feature vectors were generated for five subjectively identified textures by averaging a small training set. Each sub-image was then classified according to the representative feature vector closest in feature space. Compression ratio had little effect on the classification result in our study. However, image resolution significantly altered the classification result. Classification correlated much more closely with a subjective classification for the low-resolution image. Feature vectors also fell into much more clearly defined clusters at the lower resolution. Although more research is required across different photo-scales and sets of images, we conclude that texture features generated from compressed JPEG images have potential for content-based image retrieval based on texture.
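A minimal sketch of the feature and classification described above, assuming grayscale sub-images as NumPy arrays: the variance of each of the first eight AC DCT coefficients (in standard JPEG zig-zag order) is computed over all 8×8 blocks of a sub-image, and each sub-image is assigned to the nearest representative vector. The synthetic textures are stand-ins for the paper's data:

```python
import numpy as np
from scipy.fft import dctn

# First eight AC positions in JPEG zig-zag order
ZIGZAG_AC = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1)]

def texture_feature(subimg):
    """Variance of each of the first eight AC DCT coefficients
    across all 8x8 blocks of the sub-image."""
    h, w = subimg.shape
    coeffs = []
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            c = dctn(subimg[y:y+8, x:x+8].astype(np.float64), norm='ortho')
            coeffs.append([c[u, v] for (u, v) in ZIGZAG_AC])
    return np.array(coeffs).var(axis=0)

def classify(feature, prototypes):
    """Nearest representative feature vector in feature space."""
    return min(prototypes, key=lambda k: np.linalg.norm(feature - prototypes[k]))

rng = np.random.default_rng(8)
smooth = rng.random((64, 64)) * 10
rough = rng.random((64, 64)) * 200
protos = {'smooth': texture_feature(smooth), 'rough': texture_feature(rough)}
print(classify(texture_feature(rng.random((64, 64)) * 200), protos))  # -> 'rough'
```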

Proceedings ArticleDOI
TL;DR: This paper describes a three-stage processing system consisting of a shot boundary detection stage, an audio classification stage, and a speaker identification stage to determine the presence of different actors in isolated shots, showing the efficacy of speaker identification for labeling video clips in terms of the persons present in them.
Abstract: Video content characterization is a challenging problem in video databases. The aim of such characterization is to generate indices that can describe a video clip in terms of objects and their actions in the clip. Generally, such indices are extracted by performing image analysis on the video clips. Many such indices can also be generated by analyzing the embedded audio information of video clips. Indices pertaining to context, scene emotion, and actors or characters present in a video clip appear especially suitable for generation via audio analysis techniques of keyword spotting, and speech and speaker recognition. In this paper, we examine the potential of speaker identification techniques for characterizing video clips in terms of actors present in them. We describe a three-stage processing system consisting of a shot boundary detection stage, an audio classification stage, and a speaker identification stage to determine the presence of different actors in isolated shots. Experimental results using the movie A Few Good Men are presented to show the efficacy of speaker identification for labeling video clips in terms of persons present in them.

Proceedings ArticleDOI
TL;DR: The ImageMiner-System, which was developed at the University of Bremen in the AI group, is designed for content-based retrieval of single images by a new combination of techniques and methods from computer vision and artificial intelligence.
Abstract: The large amount of available multimedia information (e.g. videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos start playing a more important role in the frame of multimedia, we want to make these available for content-based retrieval. The ImageMiner-System, which was developed at the University of Bremen in the AI group, is designed for content-based retrieval of single images by a new combination of techniques and methods from computer vision and artificial intelligence. In our approach to making videos available for retrieval in a large database of videos and images, there are two necessary steps: first, the detection and extraction of shots from a video, which is done by a histogram-based method, and second, the composition of the separate frames of a shot into one single still image. This is performed by a mosaicing technique. The resulting mosaiced image gives a one-image visualization of the shot and can be analyzed by the ImageMiner-System. ImageMiner has been tested on several domains (e.g. landscape images, technical drawings), which cover a wide range of applications.

Proceedings ArticleDOI
Boon-Lock Yeo, Minerva M. Yeung
TL;DR: New techniques for classification and simplification of the STG are studied, and better means of visualizing the graph through dynamic visual display and simplified structures are presented to enable more succinct presentation of the graphs.
Abstract: The Scene Transition Graph (STG) [1] is a directed graph structure that compactly captures both image content and temporal flow of video. An STG offers a condensed view of the story content, serves as the summary of the clip represented, and allows nonlinear access to its story elements. It can serve as a valuable tool both for the analysis of video structure and for the presentation of a high-level visual summary in video browsing applications. In this paper, we study new techniques for classification and simplification of the STG, and present better means of visualizing the graph through dynamic visual display and simplified structures. In other words, our techniques significantly improve the existing graph structure to enable more succinct presentation of the graphs, which leads to more efficient utilization of screen space. In addition, a technique that captures and presents visually the temporal dynamics of the video sequence is described. We have tested the graph visualization techniques on various programming types, and the new tools are found to effectively handle a wider variety of video than the existing STG structure. Keywords: Scene Transition Graph, graph simplification, video visualization, content analysis, digital library, video databases, video browsing, dynamic temporal display of video sequences.

Proceedings ArticleDOI
TL;DR: A novel technique is described which reduces a sequence of MPEG encoded video frames to a trail of points in a low dimensional space and lays the groundwork for the complete analysis and representation of the video's physical and semantic structure.
Abstract: A high-level representation of a video clip comprising information about its physical and semantic structure is necessary for providing appropriate processing, indexing and retrieval capabilities for video databases. We describe a novel technique which reduces a sequence of MPEG encoded video frames to a trail of points in a low dimensional space. In our earlier work, we presented techniques applicable in 3-D, but in this paper, we describe techniques that can be extended to higher dimensions where improved performance is expected. In the low-dimensional space, we can cluster frames, analyze transitions between clusters and compute properties of the resulting trail. Portions of the trail can be classified as either stationary or transitional, leading to high-level descriptions of the video. Tracking the interaction of clusters over time, we lay the groundwork for the complete analysis and representation of the video's physical and semantic structure.