
Showing papers in "Storage and Retrieval for Image and Video Databases in 1998"


Proceedings ArticleDOI
TL;DR: Comparative investigations are extended to more recent shot boundary detection algorithms designed explicitly to detect specific complex editing operations, such as fades and dissolves; the results show that while hard cuts and fades can be detected reliably, dissolves are still an open research issue.
Abstract: Various methods of automatic shot boundary detection have been proposed and claimed to perform reliably. Although the detection of edits is fundamental to any kind of video analysis since it segments a video into its basic components, the shots, only a few comparative investigations on early shot boundary detection algorithms have been published. These investigations mainly concentrate on measuring the edit detection performance; however, they do not consider the algorithms' ability to classify the types and to locate the boundaries of the edits correctly. This paper extends these comparative investigations. More recent algorithms designed explicitly to detect specific complex editing operations such as fades and dissolves are taken into account, and their ability to classify the types and locate the boundaries of such edits are examined. The algorithms' performance is measured in terms of hit rate, number of false hits, and miss rate for hard cuts, fades, and dissolves over a large and diverse set of video sequences. The experiments show that while hard cuts and fades can be detected reliably, dissolves are still an open research issue. The false hit rate for dissolves is usually unacceptably high, ranging from 50% up to over 400%. Moreover, all algorithms seem to fail under roughly the same conditions.
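As a rough illustration of the evaluation protocol described above, the sketch below scores a list of detected boundaries against ground-truth annotations and reports hit, miss, and false-hit rates; the tolerance window and the example frame numbers are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: scoring detected shot boundaries against ground truth.
# A detection within `tolerance` frames of an annotated boundary counts as a hit.
def score_boundaries(detected, ground_truth, tolerance=2):
    hits = 0
    unmatched = list(ground_truth)
    for d in detected:
        match = next((g for g in unmatched if abs(d - g) <= tolerance), None)
        if match is not None:
            hits += 1
            unmatched.remove(match)
    misses = len(unmatched)
    false_hits = len(detected) - hits
    n = len(ground_truth)
    return {
        "hit_rate": hits / n if n else 0.0,
        "miss_rate": misses / n if n else 0.0,
        # expressed relative to the number of true edits, so it can exceed 100%
        "false_hit_rate": false_hits / n if n else 0.0,
    }

print(score_boundaries(detected=[10, 55, 120, 300], ground_truth=[10, 57, 200]))
```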

510 citations


Proceedings ArticleDOI
TL;DR: Experimental results show that the estimated Gaussian mixture model fits skin images from a large database and applications of the estimated density function in image and video databases are presented.
Abstract: This paper is concerned with estimating a probability density function of human skin color, using a finite Gaussian mixture model, whose parameters are estimated through the EM algorithm. Hawkins' statistical test on the normality and homoscedasticity (common covariance matrix) of the estimated Gaussian mixture models is performed and McLachlan's bootstrap method is used to test the number of components in a mixture. Experimental results show that the estimated Gaussian mixture model fits skin images from a large database. Applications of the estimated density function in image and video databases are presented.
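A minimal sketch of modelling skin colour with an EM-estimated Gaussian mixture, using scikit-learn's GaussianMixture; the synthetic chrominance samples, the two-component choice, and the decision threshold are placeholders rather than the paper's data or settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative stand-in for chrominance samples of labelled skin pixels,
# e.g. (Cb, Cr) pairs; a real system would collect these from a database.
rng = np.random.default_rng(0)
skin_samples = rng.normal(loc=[110.0, 155.0], scale=[8.0, 10.0], size=(5000, 2))

# EM-estimated finite Gaussian mixture of skin colour (2 components here).
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(skin_samples)

def skin_likelihood(pixels_cbcr):
    """Per-pixel log-density under the skin-colour mixture."""
    return gmm.score_samples(pixels_cbcr)

# Classify new pixels by thresholding the log-density (threshold is ad hoc).
test_pixels = np.array([[112.0, 150.0], [60.0, 220.0]])
print(skin_likelihood(test_pixels) > -12.0)   # [ True False ]
```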

302 citations


Proceedings ArticleDOI
TL;DR: This paper casts the image classification problem in a Bayesian framework, and demonstrates how high-level concepts can be understood from specific low-level image features, under the constraint that the test images do belong to one of the delineated classes.
Abstract: Grouping images into (semantically) meaningful categories using low-level visual features is still a challenging and important problem in content-based image retrieval. Based on these groupings, effective indices can be built for an image database. In this paper, we cast the image classification problem in a Bayesian framework. Specifically, we consider city vs. landscape classification, and further, classification of landscape into sunset, forest, and mountain classes. We demonstrate how high-level concepts can be understood from specific low-level image features, under the constraint that the test images do belong to one of the delineated classes. We further demonstrate that a small codebook (the optimal size is selected using the MDL principle) extracted from a vector quantizer, can be used to estimate the class-conditional densities needed for the Bayesian methodology. Classification based on color histograms, color coherence vectors, edge direction histograms, and edge-direction coherence vectors as features shows promising results. On a database of 2,716 city and landscape images, our system achieved an accuracy of 95.3 percent for city vs. landscape classification. On a subset of 528 landscape images, our system achieves an accuracy of 94.9 percent for sunset vs. forest and mountain classification, and 93.6 percent for forest vs. mountain classification. Our final goal is to combine multiple 2-class classifiers into a single hierarchical classifier.
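The sketch below illustrates the general approach of estimating class-conditional densities from a vector-quantizer codebook and classifying with Bayes' rule; it uses a fixed-size KMeans codebook and toy features, whereas the paper selects the codebook size with the MDL principle and uses colour and edge-direction features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature vectors (stand-ins for colour-histogram features) for two classes.
rng = np.random.default_rng(1)
city = rng.normal(0.3, 0.1, size=(200, 8))
landscape = rng.normal(0.7, 0.1, size=(200, 8))
X = np.vstack([city, landscape])
y = np.array([0] * 200 + [1] * 200)

# Vector-quantizer codebook; the paper chooses the size via MDL, here it is fixed.
K = 16
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# Class-conditional codeword probabilities with Laplace smoothing.
cond = np.ones((2, K))
for c in (0, 1):
    words, counts = np.unique(codebook.predict(X[y == c]), return_counts=True)
    cond[c, words] += counts
cond /= cond.sum(axis=1, keepdims=True)
prior = np.array([0.5, 0.5])

def classify(x):
    w = codebook.predict(x.reshape(1, -1))[0]
    return int(np.argmax(np.log(prior) + np.log(cond[:, w])))

print(classify(rng.normal(0.68, 0.1, size=8)))  # expected: 1 (landscape-like)
```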

71 citations



Proceedings ArticleDOI
TL;DR: This work presents a clustering-based indexing technique, where the images in the database are grouped into clusters of images with similar color content using a hierarchical clustering algorithm, and shows that this clustering-based approach offers a superior response time with high retrieval accuracy.
Abstract: Image retrieval systems that compare the query image exhaustively with each individual image in the database are not scalable to large databases. A scalable search system should ensure that the search time does not increase linearly with the number of images in the database. We present a clustering based indexing technique, where the images in the database are grouped into clusters of images with similar color content using a hierarchical clustering algorithm. At search time the query image is not compared with all the images in the database, but only with a small subset. Experiments show that this clustering based approach offers a superior response time with a high retrieval accuracy. Experiments with different database sizes indicate that for a given retrieval accuracy the search time does not increase linearly with the database size.
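A minimal sketch of such a clustering-based index, assuming colour histograms as features: the database is clustered off-line, and a query is first compared with cluster centroids and then only with the members of the closest cluster. The Ward linkage, cluster count, and toy data are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy database of colour histograms (each row sums to 1).
rng = np.random.default_rng(2)
db = rng.dirichlet(np.ones(16), size=500)

# Off-line: hierarchical clustering of the database, cut into 20 clusters.
labels = fcluster(linkage(db, method="ward"), t=20, criterion="maxclust")
centroids = np.array([db[labels == c].mean(axis=0)
                      for c in range(1, labels.max() + 1)])

def search(query, top_k=5):
    # Compare the query only with the centroids, then with that cluster's members.
    nearest = np.argmin(np.linalg.norm(centroids - query, axis=1)) + 1
    members = np.where(labels == nearest)[0]
    dists = np.linalg.norm(db[members] - query, axis=1)
    return members[np.argsort(dists)[:top_k]]

print(search(db[42]))   # the query image itself should come back first
```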

69 citations


Journal Article
TL;DR: In this article, the authors look at the task of comparing computing machines, reviewing normalization techniques and many important issues which arise during comparisons, but do not attempt to draw definitive conclusions about the merits of the technology alternatives from the small sample set.
Abstract: Reconfigurable computing devices are emerging as a viable alternative to fixed-function components and programmable processors. To expand our knowledge of the role and optimization of these devices, it is increasingly imperative for us to compare implementations of tasks and subroutines across this wide spectrum of implementation options. The fact that most processors, FPGAs, ASICs, and memories are fabricated in a uniform technology medium, CMOS VLSI, where area scaling is moderately well understood, eases our comparison task. Nonetheless, the rapid pace of technology, limited device size selection, and economic artifacts complicate the picture. In this paper, we look at the task of comparing computing machines, reviewing normalization techniques and many important issues which arise during comparisons. This paper includes examples intended to underscore the methodology and comparison issues, but does not attempt to make definitive conclusions about the merits of the technology alternatives from the small sample set. The immediate intent of this work is to help designers faced with tradeoffs between technological alternatives. The longer term intent is to help the community collect and analyze the broad-based data needed to better understand the range of available computing options.

43 citations


Proceedings ArticleDOI
TL;DR: In this paper, a feature point histogram is obtained by discretizing the angles produced by the Delaunay triangulation of a set of unique feature points, which characterize object shape in context, and then counting the number of times each discrete angle occurs in the resulting triangulation.
Abstract: Recent research on image databases has been aimed at the development of content-based retrieval techniques for the management of visual information. Compared with such visual information as color, texture, and spatial constraints, shape is an important feature. Associated with those image objects of interest, shape alone may be sufficient to identify and classify an object completely and accurately. This paper presents a novel method, based on feature point histogram indexing, for object shape representation in image databases. In this scheme, the feature point histogram is obtained by discretizing the angles produced by the Delaunay triangulation of a set of unique feature points, which characterize object shape in context, and then counting the number of times each discrete angle occurs in the resulting triangulation. The proposed shape representation technique is translation, scale, and rotation independent. Our various experiments concluded that the Euclidean distance performs well as the similarity measure function, in combination with the feature point histogram computed by counting the two largest angles of each individual Delaunay triangle. Through further experiments, we also found evidence that an image object representation using a feature point histogram provides an effective cue for image object discrimination.
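The following sketch computes a feature point histogram in the spirit described above: Delaunay-triangulate the feature points, take the two largest interior angles of each triangle, and histogram them with a fixed bin width; the bin width and normalisation are assumptions.

```python
import numpy as np
from scipy.spatial import Delaunay

def feature_point_histogram(points, bin_width_deg=10):
    """Histogram of the two largest interior angles of each Delaunay triangle.

    `points` is an (N, 2) array of feature-point coordinates.  Discretising the
    angles makes the descriptor translation-, scale- and rotation-independent.
    """
    tri = Delaunay(points)
    hist = np.zeros(int(180 / bin_width_deg))
    for ia, ib, ic in tri.simplices:
        a, b, c = points[ia], points[ib], points[ic]
        angles = []
        for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
            u, v = q - p, r - p
            cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
        for ang in sorted(angles)[-2:]:          # keep the two largest angles
            hist[min(int(ang // bin_width_deg), len(hist) - 1)] += 1
    return hist / hist.sum()

pts = np.random.default_rng(3).random((40, 2))
print(feature_point_histogram(pts).round(3))
```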

43 citations


Proceedings ArticleDOI
TL;DR: Recovering the semantic structure of the data enables automated solutions for constructing visual representations that are relevant to the semantics, as well as for establishing useful relationships among data units, such as topic categorization and content-based multimedia hyperlinking.
Abstract: This paper addresses the problem of recovering the semantic structure of broadcast news. A hierarchy of retrievable units is automatically constructed by integrating information from different media. The hierarchy provides a compact, yet meaningful, abstraction of the broadcast news data, similar to a conventional table of contents, that can serve as an effective index table, facilitating the capability of browsing through large amounts of data in a nonlinear fashion. Recovering the semantic structure of the data further enables automated solutions for constructing visual representations that are relevant to the semantics, as well as for establishing useful relationships among data units, such as topic categorization and content-based multimedia hyperlinking. Preliminary experiments of integrating different media for hierarchical segmentation of semantics have yielded encouraging results. Some of the results are presented and discussed in this paper.

39 citations


Proceedings ArticleDOI
TL;DR: This paper presents an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis, and shows that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification.
Abstract: While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data, based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time-frequency analysis of audio signals and the use of the hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
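As a toy illustration of coarse-level labelling from short-term audio features, the sketch below computes frame energy and zero-crossing rate and applies simple threshold rules; the thresholds and rules are illustrative and far simpler than the morphological and statistical analysis used in the paper.

```python
import numpy as np

def short_term_features(signal, sr, frame_ms=20):
    """Frame-level short-term energy and zero-crossing rate."""
    n = int(sr * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return energy, zcr

def coarse_label(energy, zcr, e_sil=1e-4, z_speech=0.1):
    """Toy rule-based labelling: silence / speech / music-or-environmental."""
    labels = []
    for e, z in zip(energy, zcr):
        if e < e_sil:
            labels.append("silence")
        elif z > z_speech:
            labels.append("speech")      # speech tends to show higher, varying ZCR
        else:
            labels.append("music/env")
    return labels

sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.concatenate([np.zeros(sr // 2), 0.5 * np.sin(2 * np.pi * 220 * t)])
e, z = short_term_features(audio, sr)
print(coarse_label(e, z)[:3], coarse_label(e, z)[-3:])
```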

39 citations


Proceedings ArticleDOI
TL;DR: This paper presents a different approach to content-based retrieval and a novel framework for classification of visual information, in which users define their own visual classes and classifiers are learned automatically and multiple fuzzy-classifiers and machine learning techniques are combined for automatic classification at multiple levels.
Abstract: Most existing approaches to content-based retrieval rely on query by example, or user sketch based on low-level features. However, these are not suitable for semantic (object level) distinctions. In other approaches, information is classified according to a predefined set of classes and classification is either performed manually or by using class-specific algorithms. Most of these systems lack flexibility: the user does not have the ability to define or change the classes, and new classification schemes require implementation of new class-specific algorithms and/or the input of an expert. In this paper, we present a different approach to content-based retrieval and a novel framework for classification of visual information, in which (1) users define their own visual classes and classifiers are learned automatically, and (2) multiple fuzzy-classifiers and machine learning techniques are combined for automatic classification at multiple levels (region, perceptual, object-part, object and scene). We present The Visual Apprentice, an implementation of our framework for still images and video that uses a combination of lazy-learning, decision trees, and evolution programs for classification and grouping. Our system is flexible, in that models can be changed by users over time, different types of classifiers are combined, and user-model definitions can be applied to object and scene structure classification. Special emphasis is placed on the difference between semantic and visual classes, and between classification and detection. Examples and results are presented to demonstrate the applicability of our approach to perform visual classification and detection.

39 citations


Proceedings ArticleDOI
TL;DR: The performance of the Video Trails-based algorithms is compared with that of other existing special effect edit-detection algorithms in the literature.
Abstract: Video segmentation plays an integral role in many multimedia applications, such as digital libraries, content management systems, and various other video browsing, indexing, and retrieval systems. Many algorithms for segmentation of video have appeared within the past few years. Most of these algorithms perform well on cuts, but yield poor performance on gradual transitions or special effects edits. A complete video segmentation system must also achieve good performance on special effect edit detection. In this paper, we compare the performance of our Video Trails-based algorithms with that of other existing special effect edit-detection algorithms in the literature. We present results from experiments testing the ability to detect edits in TV programs, ranging from commercials to news magazine programs, including diverse special effect edits which we have introduced.

Proceedings ArticleDOI
TL;DR: This paper describes a snapshot of work in progress on the development of an efficient file-access method for similarity searching in high-dimensional vector spaces, based on using a collection of space-filling curves, as an auxiliary indexing structure.
Abstract: This paper describes a snapshot of work in progress on the development of an efficient file-access method for similarity searching in high-dimensional vector spaces. This method has applications in image databases, where images are accessed via high-dimensional feature vectors, as well as other areas. The technique is based on using a collection of space-filling curves as an auxiliary indexing structure. Initial performance analyses suggest that the method works efficiently in moderately high-dimensional spaces (256 dimensions), with tolerable storage and execution-time overhead.
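As one concrete example of a space-filling-curve key, the sketch below computes a Z-order (Morton) key by bit interleaving; the paper's method uses a collection of curves, so this single-key construction is only meant to show how a high-dimensional vector can be mapped to a one-dimensional index value.

```python
def morton_key(vector, bits=8):
    """Interleave the bits of quantised coordinates into one Z-order key.

    `vector` holds floats in [0, 1); each coordinate is quantised to `bits` bits.
    Nearby points tend to receive nearby keys, so a sorted list of keys can act
    as a one-dimensional index over a high-dimensional feature space.
    """
    coords = [min(int(x * (1 << bits)), (1 << bits) - 1) for x in vector]
    key = 0
    for bit in range(bits - 1, -1, -1):          # most significant bit first
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key

# Keys for three 4-dimensional feature vectors; the two similar vectors
# receive much closer keys than the dissimilar one.
print(morton_key([0.10, 0.20, 0.30, 0.40]))
print(morton_key([0.11, 0.21, 0.31, 0.41]))
print(morton_key([0.90, 0.80, 0.70, 0.60]))
```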

Proceedings ArticleDOI
TL;DR: The concept of an efficient semiautomatic system for analysis, classification, and indexing of TV news program material is presented, along with the feasibility of its practical realization.
Abstract: In this paper we present the concept of an efficient semi-automatic system for analysis, classification and indexing of TV news program material and show the feasibility of its practical realization. The only input into the system, other than the news program itself, is the spoken words serving as keys for topic prespecification. The chosen topics express the user's current professional or private interests and are used for filtering the news material correspondingly. After the basic analysis steps on a news program stream, including the processes of shot change detection and key frame extraction, the system automatically represents the news program as a series of longer higher-level segments. Each of them contains one or more video shots and belongs to one of the coarse categories such as anchorperson (news reader) shots, news shot series, and the starting and ending program sequences. The segmentation procedure is performed on the video component of the news program stream and the results are used to define the corresponding segments in the news audio stream. In the next step, the system uses the prespecified audio keys to index the segments and group them into reports, which are the actual retrieval units. This step is performed on the segmented news audio stream by applying the wordspotting procedure to each segment. As a result, all the reports on prespecified topics are easily reachable for efficient retrieval.

Proceedings ArticleDOI
TL;DR: This paper focuses on the detection of wipes, and proposes an algorithm that makes it possible to automatically detect wipes simply by identifying various lines and curves on the visual rhythm.
Abstract: With the currently existing shot change detection algorithms, abrupt changes are detected fairly well. It is thus more challenging to detect gradual changes, including fades, dissolves, and wipes, as these are often missed or falsely detected. In this paper, we focus on the detection of wipes. The proposed algorithm begins by processing the visual rhythm, a portion of the DC image sequence. It is a single image, a sub-sampled version of a full video, in which the sampling is performed in a predetermined and systematic fashion. The visual rhythm contains distinctive patterns or visual features for many different types of video effects. The different video effects manifest themselves differently on the visual rhythm. In particular, wipes appear as curves, which run from the top to the bottom of the visual rhythm. Thus, using the visual rhythm, it becomes possible to automatically detect wipes, simply by determining various lines and curves on the visual rhythm.
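A minimal sketch of constructing a visual rhythm, assuming the sampling path is the main diagonal of each frame (one common choice); each frame contributes one column, so shot cuts show up as vertical edges and wipes as slanted lines or curves across the resulting image.

```python
import numpy as np

def visual_rhythm(frames):
    """Build a visual rhythm image by sampling the main diagonal of every frame.

    `frames` is an iterable of equally sized 2-D grayscale arrays (e.g. DC images).
    """
    columns = []
    for f in frames:
        h, w = f.shape
        idx = np.arange(min(h, w))
        # sample along the diagonal, stretched to cover the whole frame
        rows = (idx * (h - 1) // (len(idx) - 1)).astype(int)
        cols = (idx * (w - 1) // (len(idx) - 1)).astype(int)
        columns.append(f[rows, cols])
    return np.stack(columns, axis=1)

# Two synthetic "shots" with different intensities produce a visible edge.
frames = [np.full((36, 44), 40, dtype=np.uint8)] * 10 + \
         [np.full((36, 44), 200, dtype=np.uint8)] * 10
print(visual_rhythm(frames).shape)   # (36, 20)
```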

Proceedings ArticleDOI
TL;DR: This paper proposes to incorporate spatial information into the image histogram by computing features from the spatial distances between pixels belonging to the same intensity or color, forming an augmented image histogram.
Abstract: The image histogram is an image feature widely used in content-based image retrieval and video segmentation. It is simple to compute, yet very effective as a feature in detecting image-to-image similarity, or frame-to-frame dissimilarity. While the image histogram captures the global distribution of different intensities or colors well, it does not contain any information about the spatial distribution of pixels. In this paper, we propose to incorporate spatial information into the image histogram by computing features from the spatial distances between pixels belonging to the same intensity or color. In addition to the frequency count of the intensity or color, the mean, variance, and entropy of the distances are computed to form an augmented image histogram. Using the new feature, we performed experiments on a set of color images and a color video sequence. Experimental results demonstrate that the augmented image histogram performs significantly better than the conventional color histogram, both in image retrieval and in video shot segmentation.
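The sketch below builds an augmented histogram along these lines: for every intensity bin it records the relative frequency together with the mean, variance, and entropy of distances between pixels of that intensity. The pair-sampling scheme and the distance binning used for the entropy are our own illustrative choices.

```python
import numpy as np

def augmented_histogram(gray, n_bins=16, max_pairs=2000, seed=0):
    """Intensity histogram augmented with spatial-distance statistics.

    Returns, per bin: (relative frequency, mean, variance, entropy) of distances
    between sampled pixel pairs that fall in the bin.
    """
    rng = np.random.default_rng(seed)
    h, w = gray.shape
    bins = (gray.astype(int) * n_bins // 256).ravel()
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    feats = np.zeros((n_bins, 4))
    for b in range(n_bins):
        pts = coords[bins == b]
        feats[b, 0] = len(pts) / bins.size            # relative frequency
        if len(pts) < 2:
            continue
        i = rng.integers(0, len(pts), size=max_pairs)
        j = rng.integers(0, len(pts), size=max_pairs)
        d = np.linalg.norm(pts[i] - pts[j], axis=1)
        feats[b, 1:3] = d.mean(), d.var()
        p, _ = np.histogram(d, bins=16)
        p = p[p > 0] / p.sum()
        feats[b, 3] = -(p * np.log2(p)).sum()         # entropy of the distances
    return feats

img = (np.random.default_rng(1).random((64, 64)) * 255).astype(np.uint8)
print(augmented_histogram(img)[:3].round(2))
```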

Proceedings ArticleDOI
TL;DR: This paper describes a new approach to managing large image databases, based on a similarity pyramid data structure that hierarchically organizes the database so that it can be efficiently browsed.
Abstract: In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.

Proceedings ArticleDOI
TL;DR: This system does away with histogram techniques for color indexing and retrieval and instead implements color vector techniques, reaching a much smaller index that does not have the granularity of a histogram.
Abstract: A key aspect of image retrieval using color is the creation of robust and efficient indices. In particular, the color histogram remains the most popular index, due primarily to its simplicity. However, the color histogram has a number of drawbacks. Specifically, histograms capture only global activity; they require quantization to reduce dimensionality, are highly dependent on the chosen color space, have no means to exclude a certain color from a query, and can provide erroneous results due to gamma nonlinearity. In this paper, we present a vector angular distance measure, which is implemented as part of our database system. Our system does away with histogram techniques for color indexing and retrieval, and implements color vector techniques. We use color segmentation to extract regions of prominent color and use representative vectors from these extracted regions in the image indices. We therefore reach a much smaller index, which does not have the granularity of a histogram. Rather, similarity is based on our vector angular distance measure between a query color vector and the indexed representative vectors.
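A minimal sketch of an angle-based distance between representative colour vectors; the normalisation shown here is one plausible choice and may differ from the measure defined in the paper.

```python
import numpy as np

def angular_distance(c1, c2):
    """Angle-based distance between two representative colour vectors.

    Normalising by pi/2 maps the distance into [0, 1] for non-negative RGB
    vectors; the exact normalisation used in the paper may differ.
    """
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    cosang = np.dot(c1, c2) / (np.linalg.norm(c1) * np.linalg.norm(c2))
    return (2.0 / np.pi) * np.arccos(np.clip(cosang, -1.0, 1.0))

# Angular distance ignores intensity scaling: a colour and a darker version of
# it are close, while a genuinely different hue is far.
print(angular_distance([200, 40, 40], [100, 20, 20]))   # ~0.0
print(angular_distance([200, 40, 40], [40, 200, 40]))   # clearly larger
```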

Proceedings ArticleDOI
Thomas McGee, Nevenka Dimitrova
TL;DR: Results indicate that adding text detection to cut rate detection, in order to reduce the number of false positives, appears to be a promising method that should further increase reliability.
Abstract: Extracting video information automatically from TV broadcast requires reliable methods for isolating program and commercial segments out of the full broadcast material. In this paper, we present the results from cut, static sequence, black frame, and text detection, for the purpose of isolating non-program segments. These results are evaluated by comparison with human visual inspection using more than 13 hours of varied program content. Using cut rate detection alone produced a high recall with medium precision. Text detection was performed on the commercials and the false positive segments. Adding text detection slightly lowers the recall; however, much higher precision is achieved. A new fast black frame detector algorithm is presented. Black frame detection is important for identifying commercial boundaries. Results indicate that adding detection of text, in addition to cut rate, to reduce the number of false positives, appears to be a promising method. Furthermore, adding information about the position and size of text, and tracking it through an area, should further increase reliability.
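As a toy illustration of the black frame detection step, the sketch below flags frames with very low mean luminance and little variation; the thresholds are illustrative, and a real detector would be more tolerant of noise, logos, and letterboxing.

```python
import numpy as np

def is_black_frame(gray, lum_thresh=20, var_thresh=25):
    """Toy black-frame test: very low average luminance and little variation."""
    return gray.mean() < lum_thresh and gray.var() < var_thresh

dark = np.random.default_rng(0).integers(0, 8, size=(288, 352)).astype(np.uint8)
bright = np.random.default_rng(0).integers(0, 255, size=(288, 352)).astype(np.uint8)
print(is_black_frame(dark), is_black_frame(bright))   # True False
```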

Proceedings ArticleDOI
TL;DR: An algorithm based on motion vectors is proposed to detect sudden scene changes and gradual scene changes (camera movements such as panning, tilting, and zooming) in uncompressed as well as compressed-domain video.
Abstract: Video parsing is an important step in content-based indexing techniques, where the input video is decomposed into segments with uniform content. In video parsing, detection of scene changes is one of the approaches widely used for extracting key frames from the video sequence. In this paper, an algorithm based on motion vectors is proposed to detect sudden scene changes and gradual scene changes (camera movements such as panning, tilting and zooming). Unlike some of the existing schemes, the proposed scheme is capable of detecting both sudden and gradual changes in uncompressed as well as compressed-domain video. It is shown that the resultant motion vector can be used to identify and classify gradual changes due to camera movements. Results show that the algorithm performed as well as the histogram-based schemes with uncompressed video. The performance of the algorithm was also investigated with H.263 compressed video. The detection and classification of both sudden and gradual scene changes was successfully demonstrated.
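The sketch below illustrates the idea of classifying global motion from block motion vectors: a large resultant vector suggests panning or tilting, while vectors that point consistently toward or away from the frame centre suggest a zoom. The thresholds and the "divergence" measure are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def classify_camera_motion(mv, positions, mag_thresh=1.0, div_thresh=0.5):
    """Toy classification of global motion from block motion vectors.

    `mv` is an (N, 2) array of (dx, dy) motion vectors and `positions` the (N, 2)
    block centres relative to the frame centre.
    """
    resultant = mv.mean(axis=0)
    # "divergence": average projection of each vector onto its position direction
    unit_pos = positions / (np.linalg.norm(positions, axis=1, keepdims=True) + 1e-9)
    divergence = np.mean(np.sum(mv * unit_pos, axis=1))
    if abs(divergence) > div_thresh:
        return "zoom in" if divergence > 0 else "zoom out"
    if np.linalg.norm(resultant) > mag_thresh:
        return "pan/tilt"
    return "static or object motion"

grid = np.stack(np.meshgrid(np.arange(-4, 5), np.arange(-4, 5)), -1)
grid = grid.reshape(-1, 2).astype(float)
print(classify_camera_motion(np.tile([3.0, 0.0], (len(grid), 1)), grid))  # pan/tilt
print(classify_camera_motion(grid * 0.4, grid))                           # zoom in
```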

Proceedings ArticleDOI
TL;DR: It is shown that under certain conditions the novel similarity measure, introduced for retrieving images that 'mostly' fit the query image from an image database, is a norm, a fact that can be used to reduce search time using the triangle inequality.
Abstract: A novel similarity measure based on the Choquet integral was introduced for retrieving images that 'mostly' fit the query image from an image database. We show that under certain conditions the measure is a norm, a fact that can be used to reduce the searching time using the triangle inequality. To test the new measure, a content-based image retrieval system was built. The system was benchmarked against the visual retrieval cartridge Virage, built into the Oracle 8 database system. The results suggested that the new measure is useful for image retrieval.

Proceedings ArticleDOI
TL;DR: A prototype of a content-based image retrieval system is implemented which is able to trace the users' interactions during retrieval and refine the retrieval results while the users submit queries stating their specific requirements.
Abstract: A prototype of a content-based image retrieval system is implemented, based on the algorithms introduced in this paper. The image contents at the high levels are extracted. The fuzzy C-means classifier is employed to compute the object clusters and provide useful information for overlapped clusters. Automatic image segmentation and categorization are achieved. To obtain the context for image retrieval, the subjective and objective contexts are modeled by means of fuzzy sets theory. The system is able to trace the users' interactions during retrieval. Refinements of the retrieval results can be made while the users submit queries stating their specific requirements.

Proceedings ArticleDOI
TL;DR: This article describes the use of gesture recognition techniques in computer vision, as a natural interface for video content navigation, and the design of a navigation and browsing system, which caters to these natural means of computer-human interaction.
Abstract: This article describes the use of gesture recognition techniques in computer vision as a natural interface for video content navigation, and the design of a navigation and browsing system which caters to these natural means of computer-human interaction. For consumer applications, video content navigation presents two challenges: (1) how to parse and summarize multiple video streams in an intuitive and efficient manner, and (2) what type of interface will enhance the ease of use for video browsing and navigation in a living room setting or an interactive environment. In this paper, we address the issues and propose the techniques that combine video content navigation with gestures, seamlessly and intuitively, in an integrated system. The current framework can incorporate speech recognition technology. We present a new type of browser for browsing and navigating video content, as well as a gesture recognition interface for this browser. Keywords: Computer vision, pattern recognition, user interface, video content

Proceedings ArticleDOI
TL;DR: This paper provides a comprehensive quantitative comparison of the metrics that have been applied to shot boundary detection, and a mathematical framework for quantitatively comparing metrics is supplied.
Abstract: The detection of shot boundaries in video sequences is an important task for generating indexed video databases. This paper provides a comprehensive quantitative comparison of the metrics which have been applied to shot boundary detection. We will additionally consider several standardized statistical tests, which have not been applied to this problem, and three new metrics. A mathematical framework for quantitatively comparing metrics is supplied. Also included are experimental results based on a video database containing 39,000 frames.

Proceedings ArticleDOI
TL;DR: It is concluded that the optimal texture feature set for texture feature-based similarity retrieval is highly application dependent, and has to be carefully evaluated for each individual application scenario.
Abstract: In this paper, the performance of similarity retrieval from a database of earth core images by using different sets of spatial and transform-based texture features is evaluated and compared. A benchmark consisting of 69 core images from rock samples is devised for the experiments. We show that the Gabor feature set is far superior to other feature sets in terms of precision-recall for the benchmark images. This is in contrast to an earlier report by the authors in which we observed that the spatial-based feature set outperforms the other feature sets by a wide margin for a benchmark image set consisting of satellite images when the evaluation window has to be small (32 X 32) in order to extract homogeneous regions. Consequently, we conclude that the optimal texture feature set for texture feature-based similarity retrieval is highly application dependent, and has to be carefully evaluated for each individual application scenario.
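A minimal sketch of Gabor texture features using scikit-image's gabor filter: the mean and standard deviation of the response magnitude over a small bank of frequencies and orientations form the feature vector. The particular bank used in the paper may differ.

```python
import numpy as np
from skimage.filters import gabor

def gabor_features(gray, frequencies=(0.1, 0.2, 0.3), n_orientations=4):
    """Mean and standard deviation of Gabor filter responses over a small bank."""
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            real, imag = gabor(gray, frequency=f, theta=k * np.pi / n_orientations)
            mag = np.hypot(real, imag)
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)

# Simple synthetic texture: vertical sinusoidal stripes.
texture = np.sin(np.linspace(0, 20 * np.pi, 64))[None, :] * np.ones((64, 1))
print(gabor_features(texture).shape)   # (24,)
```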

Proceedings ArticleDOI
TL;DR: This paper presents a prototype implementation of DRAWSEARCH, an image retrieval by content system that uses color and shape features to index and retrieve images and implements relevance feedback to allow users to dynamically refine queries.
Abstract: Content-based image retrieval has recently become one of the most active research areas, due to the massive increase in the amount and complexity of digitized data being stored, transmitted, and accessed. We present here a prototype implementation of DRAWSEARCH, an image retrieval by content system, which uses color and shape (and texture in the near future) features to index and retrieve images. The system, currently being tested and improved, is designed to increase interactivity with users posing queries over the Internet and provides a Java client for query by sketch. It also implements relevance feedback to allow users to dynamically refine queries. Experiments show that the proposed approach can greatly reduce the user's effort to compose a query, while capturing the needed information with greater precision.

Proceedings ArticleDOI
TL;DR: This work presents a framework for adaptively storing, accessing, and retrieving large images, which uses a space and frequency graph to generate and select image view elements for storing in the database and speeds up retrieval for access and retrieval modes.
Abstract: Enabling the efficient storage, access and retrieval of large volumes of multidimensional data is one of the important emerging problems in databases. We present a framework for adaptively storing, accessing, and retrieving large images. The framework uses a space and frequency graph to generate and select image view elements for storing in the database. By adapting to user access patterns, the system selects and stores those view elements that yield the lowest average cost for accessing the multiresolution subregion image views. The system uses a second adaptation strategy to divide computation between server and client in progressive retrieval of image views using view elements. We show that the system speeds up retrieval for access and retrieval modes, such as drill-down browsing and remote zooming and panning, and minimizes the amount of data transfer over the network.

Proceedings ArticleDOI
TL;DR: A new set of color features for representing color images are proposed, and it is shown how they can be computed and used efficiently to retrieve images that possess certain similarity.
Abstract: Color indexing is a technique by which images in the database could be retrieved on the basis of their color content. In this paper, we propose a new set of color features for representing color images, and show how they can be computed and used efficiently to retrieve images that possess certain similarity. These features are based on the first three moments of each color channel. Two differences distinguish this work from previous work reported in the literature. First, we compute the third moment of the color channel distribution around the second moment, not around the first moment. The second moment is less sensitive to small luminance changes than the first moment. Secondly, we combine all three moment values in a single descriptor. This reduces the number of floating point values needed to index the image and, hence, speeds up the search. To give the user flexibility in terms of defining his center of attention during query time, the proposed approach divides the image into five geometrical regions and allows the user to give different weights to each region to designate its importance. The approach has been tested on databases of 205 images of airplanes and natural scenes. It proved to be insensitive to small rotations and small translations in the image and yielded a better hit rate than similar algorithms previously reported in the literature.
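The sketch below follows the abstract's description of the descriptor: three moments per colour channel, with the third moment taken around the second moment, computed over five geometric regions that can be weighted by the user. The choice of four quadrants plus a centre region, and the example weights, are our own assumptions.

```python
import numpy as np

def channel_moments(channel):
    """First three moments of one colour channel.

    Per the abstract, the third moment is taken around the second moment rather
    than around the mean; the exact formulation is our reading of the text.
    """
    m1 = channel.mean()
    m2 = channel.std()
    m3 = np.cbrt(((channel - m2) ** 3).mean())
    return np.array([m1, m2, m3])

def region_descriptor(img):
    """Moments of each RGB channel over five regions: four quadrants + centre."""
    h, w, _ = img.shape
    regions = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
               img[h // 2:, :w // 2], img[h // 2:, w // 2:],
               img[h // 4:3 * h // 4, w // 4:3 * w // 4]]
    return np.array([[channel_moments(r[..., c]) for c in range(3)] for r in regions])

def weighted_distance(d1, d2, region_weights=(1, 1, 1, 1, 2)):
    """User-supplied weights let one region (e.g. the centre) dominate the query."""
    per_region = np.linalg.norm((d1 - d2).reshape(5, -1), axis=1)
    return float(np.dot(region_weights, per_region))

a = np.random.default_rng(0).random((64, 64, 3))
b = np.random.default_rng(1).random((64, 64, 3))
print(weighted_distance(region_descriptor(a), region_descriptor(b)))
```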

Journal Article
TL;DR: WaveMark, as described in this paper, is a wavelet-based multiresolution digital watermarking system for color images that uses discrete wavelet transforms and error-correcting coding schemes to provide robust watermarking.
Abstract: As more and more digital images are distributed on-line via the Internet and WWW, many copyright owners are concerned about protecting the copyright of digital images. This paper describes WaveMark, a novel wavelet-based multiresolution digital watermarking system for color images. The algorithm in WaveMark uses discrete wavelet transforms and error-correcting coding schemes to provide robust watermarking of digital images. Unlike other wavelet-based algorithms, our watermark recovery procedure does not require a match with an uncorrupted original image. Our algorithm uses Daubechies' advanced wavelets and extended Hamming codes to deal with problems associated with JPEG compression and random additive noise. In addition, the algorithm is able to sustain intentional disturbances introduced by professional robustness testing programs such as StirMark. The use of Daubechies' advanced wavelets makes the watermarked images more perceptively faithful than the images watermarked with the Haar wavelet transform. The watermark is adaptively applied to different areas of the image, based on the smoothness of the areas, to increase robustness within the limits of perception. The system is practical for real-world applications, encoding or decoding images in less than one second each on a Pentium Pro PC.
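For illustration, the sketch below embeds an additive pseudo-random watermark in the detail coefficients of one level of a Daubechies wavelet decomposition using PyWavelets. It is deliberately simplified: unlike WaveMark it is single-level, uses no Hamming coding or smoothness adaptation, and the detector shown here compares against the original image, which WaveMark does not require.

```python
import numpy as np
import pywt

def embed_watermark(gray, key=42, strength=4.0, wavelet="db4"):
    """Simplified additive watermark in the horizontal detail coefficients."""
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), wavelet)
    mark = np.random.default_rng(key).choice([-1.0, 1.0], size=cH.shape)
    marked = pywt.idwt2((cA, (cH + strength * mark, cV, cD)), wavelet)
    return marked

def detect_watermark(marked, original, key=42, wavelet="db4"):
    """Correlate the coefficient difference with the key-generated pattern."""
    _, (cH_m, _, _) = pywt.dwt2(marked.astype(float), wavelet)
    _, (cH_o, _, _) = pywt.dwt2(original.astype(float), wavelet)
    mark = np.random.default_rng(key).choice([-1.0, 1.0], size=cH_o.shape)
    return float(np.mean((cH_m - cH_o) * mark))   # clearly positive if present

img = np.random.default_rng(0).random((128, 128)) * 255
marked = embed_watermark(img)
print(detect_watermark(marked, img) > 1.0)   # True
```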

Proceedings ArticleDOI
TL;DR: This paper addresses the question of increasing the performance of triangle inequality-based content-based image retrieval algorithms by the addition of a data structure known as the Triangle Trie.
Abstract: A new class of algorithms, based on the triangle inequality, has recently been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to speedily compute lower bounds on the distance from the query to each database image. This paper addresses the question of increasing the performance of this algorithm by the addition of a data structure known as the Triangle Trie.
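A minimal sketch of the key-based triangle-inequality pruning that the Triangle Trie accelerates: distances from database images to a small set of key images are precomputed, and at query time |d(q,k) - d(x,k)| gives a lower bound that lets most full distance computations be skipped. The key selection and toy data are illustrative.

```python
import numpy as np

def build_key_index(db, keys, dist):
    """Precompute distances from every database item to every key image."""
    return np.array([[dist(x, k) for k in keys] for x in db])

def search(query, db, keys, key_index, dist):
    """Exact nearest-neighbour search pruned with triangle-inequality lower bounds.

    For any key k, d(q, x) >= |d(q, k) - d(x, k)|, so the full distance is only
    computed while the best lower bound is smaller than the best distance so far.
    """
    q_to_keys = np.array([dist(query, k) for k in keys])
    lower = np.max(np.abs(key_index - q_to_keys), axis=1)
    best_i, best_d = -1, np.inf
    for i in np.argsort(lower):                 # most promising candidates first
        if lower[i] >= best_d:
            break                               # everything after is pruned
        d = dist(query, db[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

rng = np.random.default_rng(0)
db = rng.random((1000, 32))
keys = db[rng.choice(1000, size=8, replace=False)]
euclid = lambda a, b: float(np.linalg.norm(a - b))
index = build_key_index(db, keys, euclid)
print(search(db[123] + 0.01, db, keys, index, euclid))   # finds item 123
```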

Proceedings ArticleDOI
TL;DR: A new, computationally efficient, effective technique for detection of abrupt scene changes in MPEG-4/2 compressed video sequences that combines a dc image-based approach with a bit allocation-based technique.
Abstract: In this paper, we present a new, computationally efficient, effective technique for detection of abrupt scene changes in MPEG-4/2 compressed video sequences. We combine the bit allocation-based approach of Feng, Lo, and Mehrpour with the dc image-based approach of Yeo. The bit allocation-based approach has the advantage of computational simplicity, since it only requires entropy decoding of the sequence. Since extraction of dc images from I-Frames/Objects is simple, the dc image-based technique of Yeo is a good alternative for comparison of I-Frames/Objects. For P-Frames/Objects, however, Yeo's algorithm requires additional computation. We find that the bit allocation-change based approach is prone to false detection in comparison of intracoded objects in MPEG-4 sequences. However, if a suspected scene/object change has been located accurately in a group of consecutive frames/objects, the bit allocation-based technique quickly and accurately locates the cut point therein. This motivates us to use dc image-based detection between successive I-Frames/Objects to identify the subsequences with scene/object changes, and then use bit allocation-based detection to find the cut point therein. Our technique thus has only a marginally greater complexity than the completely bit allocation-based technique, but has greater accuracy. It is applicable to both MPEG-2 sequences and MPEG-4 multiple-object sequences. In the MPEG-4 multiple object case, we use a weighted sum of the change in each object of the frame, using the area of the object as the weight.