Journal ArticleDOI

Fuzzy color histogram-based video segmentation

01 Jan 2010-Computer Vision and Image Understanding (Academic Press)-Vol. 114, Iss: 1, pp 125-134
TL;DR: Experimental results show that the proposed fuzzy color histogram-based shot-boundary detection algorithm effectively detects shot boundaries and reduces false alarms as compared to the state-of-the-art shot-boundary detection algorithms.
Abstract: We present a fuzzy color histogram-based shot-boundary detection algorithm specialized for content-based copy detection applications. The proposed method aims to detect both cuts and gradual transitions (fade, dissolve) effectively in videos where heavy transformations (such as cam-cording, insertions of patterns, strong re-encoding) occur. Along with the color histogram generated with the fuzzy linking method on L*a*b* color space, the system extracts a mask for still regions and the window of picture-in-picture transformation for each detected shot, which will be useful in a content-based copy detection system. Experimental results show that our method effectively detects shot boundaries and reduces false alarms as compared to the state-of-the-art shot-boundary detection algorithms.

Summary (3 min read)

1. Introduction

  • Recent developments in multimedia technology, together with the significant growth of media resources, have introduced content-based copy detection (CBCD) as a new research field, an alternative to the watermarking approach for identifying video sequences.
  • An abrupt transition, also known as hard-cut, is the most common and easy to detect transition type.
  • The paper is organized as follows: Section 2 discusses related work.
  • Section 3 explains the video transformations used for modifying videos.
  • Section 4 describes the proposed shot-boundary detection algorithm, as well as the methods for detecting frame-dropping and picture-in-picture transformations, noise detection, and mask generation.

3. Video transformations

  • In order to develop a shot-boundary detection algorithm specialized for CBCD applications, the authors need to understand the effects of transformations used for modifying videos.
  • Here, the authors discuss the negative effects of video transformations and the corrective actions taken by their method. (i) Frame dropping: dropped frames should be ignored or estimated; otherwise, the shot-boundary detection algorithm classifies each blank frame as a cut.
  • Such frames have mean intensity values near zero.
  • With the extracted window, foreground and background frames can be handled separately.
  • A mask for still regions, which includes the inserted pattern or text, will increase the effectiveness of a CBCD system.

4.1. Detection of frame-dropping transformation

  • Handling frame-dropping transformation is one of the key features of a shot-boundary detection system specialized for CBCD applications, since most of the existing algorithms treat missing frames as hard-cuts.
  • A dropped frame is either exactly or nearly a blank frame, which has a small overall intensity (less than thbf = 0.0039).
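The blank-frame test can be sketched in a few lines. This is a minimal sketch, assuming grayscale frames normalized to [0, 1]; only the threshold value thbf = 0.0039 comes from the paper, and the function name is illustrative:

```python
import numpy as np

TH_BF = 0.0039  # blank-frame threshold from the paper, on a [0, 1] intensity scale

def is_dropped_frame(frame: np.ndarray) -> bool:
    """Flag a frame as dropped/blank when its mean intensity is near zero.

    `frame` is assumed to be a grayscale image with values in [0, 1].
    """
    return float(frame.mean()) < TH_BF
```

An all-black frame is flagged, while any frame with ordinary content passes the test.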

4.2. Noise detection

  • CBCD applications should handle query videos with heavy noise transformations.
  • For the algorithms to work properly, noisy frames/shots should be identified before any further operation that relies on edge detection or on the standard deviation of pixel intensity values.
  • In image processing, a nonlinear median filter is preferred over linear filters for removing salt-and-pepper and white noise.
  • When the average intensity change between a frame and its median-filtered version exceeds a threshold thn, the frame is regarded as noisy.
  • The authors evaluate the noise detection method and the impact of the parameters (the size of the median filter and the threshold value) in Section 5.
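The median-filter test described above can be sketched as follows. This is a simplified sketch: the 3x3 filter size and the th_n value are placeholders (the paper evaluates several filter sizes and threshold values, as noted in Section 5), and the function names are illustrative:

```python
import numpy as np

def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter; edges are handled by edge-replication padding."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the padded image, take per-pixel median.
    shifts = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(shifts), axis=0)

def is_noisy(frame: np.ndarray, th_n: float = 0.1) -> bool:
    """Declare a frame noisy when median filtering changes it a lot.

    The mean absolute difference between the frame and its median-filtered
    version is compared against th_n (a placeholder value here).
    """
    return float(np.abs(frame - median3x3(frame)).mean()) > th_n
```

A clean frame is barely changed by the median filter, while salt-and-pepper speckles are removed, producing a large average intensity change.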

4.3. Mask generation

  • When a video segment is transformed with the various types of transformations summarized in Table 1, the content of the frames clearly changes with respect to color, edge, and shape information.
  • A content-based copy detection system should cut out the artificially inserted texts, patterns, logos, etc., if possible.
  • It should also ignore the bordering black areas produced by shift, crop, and letterbox transformations.
  • As a result, the probability of matching with the original video segment is increased.
  • While detecting shot boundaries, the authors create, for each shot, a mask of still regions by thresholding per-pixel standard deviations at thsr = 0.01.
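The mask computation can be sketched with a per-pixel temporal standard deviation over the shot. This is a simplified sketch, assuming frames normalized to [0, 1]; only the threshold thsr = 0.01 comes from the paper, and the interpretation that low-variance pixels are the still regions (inserted logos, text, black borders) is an assumption consistent with the summary:

```python
import numpy as np

TH_SR = 0.01  # still-region threshold from the paper

def still_region_mask(shot: np.ndarray) -> np.ndarray:
    """Mark still pixels over a shot.

    `shot` has shape (num_frames, H, W) with values in [0, 1].  Pixels whose
    standard deviation over time stays at or below TH_SR are treated as still
    regions (assumed: inserted patterns/text and borders barely change over
    a shot, while genuine content does).
    """
    sigma = shot.std(axis=0)   # per-pixel std over the time axis
    return sigma <= TH_SR      # True = still region
```

Pixels that flicker between frames fall outside the mask, while constant pixels stay inside it.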

4.4. Detection of picture-in-picture transformation

  • In order to detect the window of picture-in-picture transformation, black framings on the sides of the video segment generated by cam-cording, crop, or shift transformations should be extracted first.
  • The authors mark each row (and, analogously, each column), scanning from the beginning and from the end, as a border row if (1/w) Σ_{c=1}^{w} σ_shot(i, c) < th_sr (Eq. 9) holds for that row.
  • The authors crop the borders out of M_shot, and then find the derivatives with a first-order difference along the x-axis using the Prewitt edge detector: E_shot = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]] * M_shot (Eq. 10), where * denotes convolution. Strong vertical edges are extracted from E_shot using Hough lines [29].
  • Only vertical lines are selected, and compared in order to form a rectangular window.
  • The candidate window(s) and the border information for each shot are stored.
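The edge-extraction step can be sketched as below. This is a simplified sketch: the Prewitt kernel matches Eq. (10), but the column-voting step is a stand-in assumption for the Hough-line extraction used in the paper, and all function names and thresholds are illustrative:

```python
import numpy as np

# Vertical Prewitt kernel from Eq. (10)
PREWITT_V = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)

def convolve2d(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Minimal 'valid'-mode 2-D convolution (no SciPy dependency)."""
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    kf = k[::-1, ::-1]  # flip the kernel (convolution, not correlation)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kf)
    return out

def vertical_edge_columns(mask_img, th_edge=1.0, min_frac=0.5):
    """Columns dominated by strong vertical edges -- a simplified stand-in
    for the Hough-line step; indices are in the cropped 'valid' frame."""
    e = np.abs(convolve2d(mask_img, PREWITT_V))
    strong = e > th_edge
    frac = strong.mean(axis=0)        # fraction of strong pixels per column
    return np.where(frac >= min_frac)[0]
```

A vertical intensity step in the mask yields a pair of adjacent responding columns, which candidate window edges can then be assembled from.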

4.5. Shot-boundary detection

  • The authors use a color histogram-based method generated with the fuzzy linking method on L*a*b* color space.
  • L* is divided into 3 regions (black, gray, white), a* into 5 regions (green, greenish, middle, reddish, red), and b* into 5 regions (blue, bluish, middle, yellowish, yellow).
  • The main advantage of the proposed fuzzy color histogram over a conventional color histogram is its accuracy: similar colors fall into the same or neighboring fuzzy regions instead of being split across hard bin boundaries.
  • The fuzzy color histogram of a dropped frame is predicted by averaging the features of the previous two frames.
  • The essential idea of using color histogram for shot-boundary detection is that color content does not change rapidly within a shot.
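The soft assignment behind such a histogram can be sketched as below. This is a simplified variant and not the paper's method: the paper links the 3/5/5 channel regions jointly with fuzzy rules, while this sketch merely concatenates per-channel triangular memberships (13 bins); the region centers, thresholds, and function names are illustrative assumptions:

```python
import numpy as np

# Illustrative region centers (the paper defines its own membership
# functions over L*a*b*: 3 regions for L*, 5 each for a* and b*).
L_CENTERS = np.array([0.0, 50.0, 100.0])               # black, gray, white
A_CENTERS = np.array([-100.0, -50.0, 0.0, 50.0, 100.0])
B_CENTERS = np.array([-100.0, -50.0, 0.0, 50.0, 100.0])

def triangular_memberships(values, centers):
    """Soft-assign each value to its two adjacent centers (weights sum to 1)."""
    v = np.clip(np.asarray(values, dtype=float), centers[0], centers[-1])
    idx = np.clip(np.searchsorted(centers, v, side="right") - 1,
                  0, len(centers) - 2)
    right = (v - centers[idx]) / (centers[idx + 1] - centers[idx])
    m = np.zeros((v.size, len(centers)))
    rows = np.arange(v.size)
    m[rows, idx] = 1.0 - right
    m[rows, idx + 1] += right
    return m

def fuzzy_hist(L, a, b):
    """Concatenated per-channel fuzzy histogram, normalized to sum to 1."""
    h = np.concatenate([triangular_memberships(c.ravel(), cc).sum(axis=0)
                        for c, cc in ((L, L_CENTERS),
                                      (a, A_CENTERS),
                                      (b, B_CENTERS))])
    return h / h.sum()

def is_cut(h1, h2, th=0.5):
    """Declare a hard-cut when the L1 histogram difference is large."""
    return float(np.abs(h1 - h2).sum()) > th
```

Within a shot the histogram difference stays small, while a cut between two differently colored frames produces a large L1 distance, which is exactly the intuition stated above.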

5. Evaluations and discussion

  • Three main parts of the proposed system are evaluated: noise detection, shot-boundary detection, and window extraction for the picture-in-picture transformation.
  • The authors' test dataset is composed of query videos provided for the TRECVID 2008 content-based copy detection task.
  • There are a total of 2010 MPEG-1 videos, amounting to about 80 h of video segments with various transformations applied (see Table 1).
  • The important events that occur in query videos are shown in Fig.

5.1. Evaluation of noise detection

  • The authors used the query video set of TRECVID 2008 CBCD task, and extracted 1 frame per 2 s for each of 2010 videos.
  • After decoding the videos, 33,478 images were manually labeled as 1 or 0, indicating whether the frame is highly noisy.
  • Median filters of different sizes are evaluated and compared in an ROC curve (see Fig. 9).
  • Most of the false alarms are caused by query videos that have noise originally, but not as a transformation.
  • [Table excerpt: Method / Recall / Precision / F1 — RGB CH: 0.6284 / 0.6862 / 0.6560.]
  • The authors note that frames with sea, wavy water, or a textured background generally give high noise-detection outputs.

5.2. Evaluation of picture-in-picture transformation detection

  • Out of 2010 query videos, 545 of them include picture-in-picture transformation.
  • The authors obtained the scale and offset information of all picture-in-picture transformations by processing the groundtruth data used for generating query videos.
  • Missed picture-in-picture transformations are generally caused by complex transformations, i.e., (8)–(10) in Table 1.

5.3. Evaluation of shot-boundary detection algorithm

  • The authors selected a set of shot-boundary detection algorithms from the literature for a comparative evaluation of their method.
  • The algorithms selected for comparison cover most of the major techniques listed in [8,10,7].
  • It should be noted that the methods selected for comparison could perform much better for detecting shot-boundaries of videos on which none of the transformations listed in Table 1 are applied.
  • For most of the transformations, the proposed fuzzy color histogram-based method performs better than other techniques.


Citations
Journal ArticleDOI
TL;DR: A deep review of the state of the art on color image segmentation methods based on edge detection, thresholding, histogram-thresholding, region, feature clustering and neural networks is presented.
Abstract: This article reviews the state of the art on color image segmentation.

122 citations

Journal ArticleDOI
TL;DR: A novel fuzzy color difference histogram (FCDH) is proposed by using fuzzy c-means (FCM) clustering and exploiting the CDH, which reduces the large dimensionality of the histogram bins in the computation and lessens the effect of intensity variation generated due to the fake motion or change in illumination of the background.
Abstract: Detection of moving objects in the presence of complex scenes such as dynamic background (e.g, swaying vegetation, ripples in water, spouting fountain), illumination variation, and camouflage is a very challenging task. In this context, we propose a robust background subtraction technique with three contributions. First, we present the use of color difference histogram (CDH) in the background subtraction algorithm. This is done by measuring the color difference between a pixel and its neighbors in a small local neighborhood. The use of CDH reduces the number of false errors due to the non-stationary background, illumination variation and camouflage. Secondly, the color difference is fuzzified with a Gaussian membership function. Finally, a novel fuzzy color difference histogram (FCDH) is proposed by using fuzzy c-means (FCM) clustering and exploiting the CDH. The use of FCM clustering algorithm in CDH reduces the large dimensionality of the histogram bins in the computation and also lessens the effect of intensity variation generated due to the fake motion or change in illumination of the background. The proposed algorithm is tested with various complex scenes of some benchmark publicly available video sequences. It exhibits better performance over the state-of-the-art background subtraction techniques available in the literature in terms of classification accuracy metrics like $MCC$ and $PCC$ .

77 citations

Proceedings ArticleDOI
03 Nov 2014
TL;DR: A fast and effective heuristic ranking approach based on heterogeneous late fusion by jointly considering three aspects: venue categories, visual scene, and user listening history that recommends appealing soundtracks for UGVs to enhance the viewing experience is proposed.
Abstract: Capturing videos anytime and anywhere, and then instantly sharing them online, has become a very popular activity. However, many outdoor user-generated videos (UGVs) lack a certain appeal because their soundtracks consist mostly of ambient background noise. Aimed at making UGVs more attractive, we introduce ADVISOR, a personalized video soundtrack recommendation system. We propose a fast and effective heuristic ranking approach based on heterogeneous late fusion by jointly considering three aspects: venue categories, visual scene, and user listening history. Specifically, we combine confidence scores, produced by SVMhmm models constructed from geographic, visual, and audio features, to obtain different types of video characteristics. Our contributions are threefold. First, we predict scene moods from a real-world video dataset that was collected from users' daily outdoor activities. Second, we perform heuristic rankings to fuse the predicted confidence scores of multiple models, and third we customize the video soundtrack recommendation functionality to make it compatible with mobile devices. A series of extensive experiments confirm that our approach performs well and recommends appealing soundtracks for UGVs to enhance the viewing experience.

77 citations


Cites methods from "Fuzzy color histogram-based video s..."

  • ...A color histogram [13,24] with 64 dimensions is computed from each UGV video frame by dividing each component of RGB into four bins....

    [...]

Journal ArticleDOI
23 Mar 2018-Entropy
TL;DR: This paper presents a review of an extensive set for SBD approaches and their development, and the advantages and disadvantages of each approach are comprehensively explored.
Abstract: The recent increase in the number of videos available in cyberspace is due to the availability of multimedia devices, highly developed communication technologies, and low-cost storage devices. These videos are simply stored in databases through text annotation. Content-based video browsing and retrieval are inefficient due to the method used to store videos in databases. Video databases are large in size and contain voluminous information, and these characteristics emphasize the need for automated video structure analyses. Shot boundary detection (SBD) is considered a substantial process of video browsing and retrieval. SBD aims to detect transition and their boundaries between consecutive shots; hence, shots with rich information are used in the content-based video indexing and retrieval. This paper presents a review of an extensive set for SBD approaches and their development. The advantages and disadvantages of each approach are comprehensively explored. The developed algorithms are discussed, and challenges and recommendations are presented.

56 citations


Cites background or methods from "Fuzzy color histogram-based video s..."

  • ...[37] L*a*b* color space, video-transformation detection, 15 fuzzy rules (table excerpt)...

    [...]

  • ...VPP encompasses shooting and production processes....

    [...]

  • ...As stated previously, VEP is used in VPP by individuals or institutes....

    [...]

  • ...Concatenation between two or more shots is implemented in the video editing process (VEP) to create a video during the video production process (VPP) [37]....

    [...]

  • ...In [37], fuzzy logic was used to generate a color histogram for HT and ST (fade and dissolve) detection....

    [...]

Proceedings ArticleDOI
10 Dec 2014
TL;DR: Cineast is a multi-feature sketch-based video retrieval engine able to universally retrieve video (sequences) without the need for prior knowledge or semantic understanding and to support powerful search paradigms in large video collections on the basis of user-provided sketches as query input.
Abstract: Despite the tremendous importance and availability of large video collections, support for video retrieval is still rather limited and is mostly tailored to very concrete use cases and collections. In image retrieval, for instance, standard keyword search on the basis of manual annotations and content-based image retrieval, based on the similarity to query image (s), are well established search paradigms, both in academic prototypes and in commercial search engines. Recently, with the proliferation of sketch-enabled devices, also sketch-based retrieval has received considerable attention. The latter two approaches are based on intrinsic image features and rely on the representation of the objects of a collection in the feature space. In this paper, we present Cineast, a multi-feature sketch-based video retrieval engine. The main objective of Cineast is to enable a smooth transition from content-based image retrieval to content-based video retrieval and to support powerful search paradigms in large video collections on the basis of user-provided sketches as query input. Cineast is capable of retrieving video sequences based on edge or color sketches as query input and even supports one or multiple exemplary video sequences as query input. Moreover, Cineast also supports a novel approach to sketch-based motion queries by allowing a user to specify the motion of objects within a video sequence by means of (partial) flow fields, also specified via sketches. Using an emergent combination of multiple different features, Cineast is able to universally retrieve video (sequences) without the need for prior knowledge or semantic understanding. The evaluation with a general purpose video collection has shown the effectiveness and the efficiency of the Cineast approach.

47 citations


Cites methods from "Fuzzy color histogram-based video s..."

  • ...To perform the shot segmentation, Cineast uses a modified version of the fuzzy color histogram [16]....

    [...]

References
Book
01 Aug 1996
TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
Abstract: A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.

52,705 citations

Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
TL;DR: It is pointed out that the use of angle-radius rather than slope-intercept parameters simplifies the computation further, and how the method can be used for more general curve fitting.
Abstract: Hough has proposed an interesting and computationally efficient procedure for detecting lines in pictures. This paper points out that the use of angle-radius rather than slope-intercept parameters simplifies the computation further. It also shows how the method can be used for more general curve fitting, and gives alternative interpretations that explain the source of its efficiency.

6,693 citations

Book
01 Jan 1982

5,834 citations

Proceedings ArticleDOI
26 Oct 2006
TL;DR: An introduction to information retrieval (IR) evaluation from both a user and a system perspective is given, highlighting that system evaluation is by far the most prevalent type of evaluation carried out.
Abstract: The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper, concluding that on balance they have had a very positive impact on research progress.

1,395 citations


"Fuzzy color histogram-based video s..." refers methods in this paper

  • ...We evaluate the proposed method in the query dataset prepared for CBCD task of TRECVID 2008, and show the accuracy of the system....

    [...]

  • ...For the first time in 2008, TRECVID [23] evaluated CBCD systems....

    [...]

  • ...Some of the recent studies on CBCD task of TRECVID 2008 employ the following techniques....

    [...]

  • ...Our test dataset is composed of query videos provided for TRECVID 2008 content-based copy detection task....

    [...]

  • ...We used the query video set of TRECVID 2008 CBCD task, and extracted 1 frame per 2 s for each of 2010 videos....

    [...]

Frequently Asked Questions (12)
Q1. What are the contributions in "Fuzzy color histogram-based video segmentation" ?

The authors present a fuzzy color histogram-based shot-boundary detection algorithm specialized for contentbased copy detection applications. The proposed method aims to detect both cuts and gradual transitions ( fade, dissolve ) effectively in videos where heavy transformations ( such as cam-cording, insertions of patterns, strong re-encoding ) occur. 

As future work, the authors will use the detected shot boundaries, masks of still regions, picture-in-picture window boundaries, and the fuzzy color histogram method in their content-based copy detection system. 

By matching the same objects and scenes using contrast context histogram (CCH) in two adjacent frames, the method decides that there is no shot change. 

In the field of CBCD, representing video with a set of keyframes (one or more representative frame for each shot) is a common approach. 

In [17], histogram differences of consecutive frames are characterized as fuzzy terms, such as small, significant and large, and fuzzy rules for detecting abrupt and gradual transitions are formulated in a fuzzy-logic-based framework for segmentation of video sequences. 

The authors obtained the scale and offset information of all picture-in-picture transformations by processing the groundtruth data used for generating query videos. 

(iv) Local keypoint matching (KM): recognizing the objects and scenes throughout the video is the basic idea of keypoint-matching-based shot-boundary detection methods. 

Douze et al. prefer extracting 2.5 frames per second for query videos, and extracting only a few representative keyframes for the dataset [21]. 

Their tests with 50 query videos, which represent each transformation type with at least 4 videos, showed that the fuzzy color histogram-based shot-boundary detection method can achieve higher accuracy values while reducing false alarms. 

The authors propose a fuzzy color histogram-based shot-boundary detection method for the videos where heavy transformations (such as cam-cording, insertions of patterns, strong re-encoding) occur. 

Their method also achieves a lower false alarm rate with a precision of 93.67%, whereas the precision values of the other methods could only reach up to 53.21%. 

Although the insertion of a pattern or text does not affect the shot-boundary detection process strongly, a mask for still regions, which includes the inserted pattern or text, will increase the effectiveness of a CBCD system.