
Showing papers on "Video tracking published in 1999"


Proceedings ArticleDOI
01 Jan 1999
TL;DR: An observation density for tracking is presented which solves the problem of tracking multiple indistinguishable targets by exhibiting a probabilistic exclusion principle, which arises naturally from a systematic derivation of the observation density, without relying on heuristics.
Abstract: Tracking multiple targets whose models are indistinguishable is a challenging problem. Simply instantiating several independent 1-body trackers is not an adequate solution, because the independent trackers can coalesce onto the best-fitting target. This paper presents an observation density for tracking which solves this problem by exhibiting a probabilistic exclusion principle. Exclusion arises naturally from a systematic derivation of the observation density, without relying on heuristics. Another important contribution of the paper is the presentation of partitioned sampling, a new sampling method for multiple object tracking. Partitioned sampling avoids the high computational load associated with fully coupled trackers, while retaining the desirable properties of coupling.

439 citations


Proceedings ArticleDOI
30 Oct 1999
TL;DR: Methods for automatically creating pictorial video summaries that resemble comic books are presented and how the automatically generated summaries are used to simplify access to a large collection of videos is described.
Abstract: This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to automatically detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance, and then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show how it can be used to improve our summaries. The result is a compact and visually pleasing summary that captures semantically important events, and is suitable for printing or Web access. Such a summary can be further enhanced by including text captions derived from OCR or other methods. We describe how the automatically generated summaries are used to simplify access to a large collection of videos.
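A rough sketch of the pipeline follows. Scoring a segment as length times novelty, and the two-cell/one-cell sizing rule, are simplifying assumptions standing in for the paper's importance measure and packing optimization:

```python
def importance(segments):
    """Score each segment from its (length, novelty) pair, both assumed
    pre-computed and normalized to [0, 1]."""
    return [length * novelty for length, novelty in segments]

def pack_keyframes(scores, row_width=4):
    """Size each keyframe from its importance (2 grid cells if important,
    else 1), then greedily pack the sizes into rows of fixed width."""
    sizes = [2 if s >= 0.5 else 1 for s in scores]
    rows, row, used = [], [], 0
    for size in sizes:
        if used + size > row_width:
            rows.append(row)
            row, used = [], 0
        row.append(size)
        used += size
    if row:
        rows.append(row)
    return rows
```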

404 citations


Patent
09 Sep 1999
TL;DR: In this paper, an object tracking system is provided for tracking the removal of objects from a location and the replacement of the objects at the location, which includes a radio frequency identification (RFID) tag attached to each object to be tracked and each tag has an antenna.
Abstract: An object tracking system is provided for tracking the removal of objects from a location and the replacement of the objects at the location. The system includes a radio frequency identification (RFID) tag attached to each of the objects to be tracked and each tag has an antenna. When activated, the RFID tag of an object transmits a unique code identifying the object. A storage unit is provided at the location and the storage unit has a plurality of receptacles configured to receive objects replaced at the location. Each receptacle has an associated antenna for activating the RFID tag of an object in the receptacle and receiving the radio frequency transmitted code of the object. The antennae of the system can be capacitive plates for conveying the radio frequency transmissions through capacitive coupling or inductive loops for conveying the transmissions through inductive coupling. A computer-based controller is coupled to the antenna of the receptacles for receiving transmitted codes and determining based thereon the absence or presence and location of objects within the storage unit.
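The controller's decision logic — which objects are present, and where — reduces to bookkeeping over the codes read at each receptacle's antenna. This sketch assumes tag codes arrive as plain strings per receptacle; the names are made up for illustration:

```python
def inventory_status(expected_tags, reads_by_receptacle):
    """Map each read tag code to the receptacle whose antenna reported it,
    and list expected objects that no receptacle reported (i.e. removed)."""
    location = {}
    for receptacle, tags in reads_by_receptacle.items():
        for tag in tags:
            location[tag] = receptacle
    absent = sorted(t for t in expected_tags if t not in location)
    return location, absent
```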

384 citations


Journal ArticleDOI
TL;DR: A novel method for generating key frames and previews for an arbitrary video sequence by first applying multiple partitional clustering to all frames of a video sequence and then selecting the most suitable clustering option(s) using an unsupervised procedure for cluster-validity analysis.
Abstract: Key frames and previews are two forms of a video abstract, widely used for various applications in video browsing and retrieval systems. We propose in this paper a novel method for generating these two abstract forms for an arbitrary video sequence. The underlying principle of the proposed method is the removal of the visual-content redundancy among video frames. This is done by first applying multiple partitional clustering to all frames of a video sequence and then selecting the most suitable clustering option(s) using an unsupervised procedure for cluster-validity analysis. In the last step, key frames are selected as centroids of obtained optimal clusters. Video shots, to which key frames belong, are concatenated to form the preview sequence.
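A minimal version of the cluster-then-pick-centroids step might look like this, using scalar per-frame features and plain k-means; the paper clusters richer feature vectors and validates several clustering options rather than fixing k:

```python
def kmeans_keyframes(features, k, iters=20):
    """Cluster per-frame feature values with k-means, then return the index
    of the frame closest to each centroid as that cluster's key frame."""
    # Initialize centroids spread evenly across the sequence.
    centroids = [features[i * (len(features) - 1) // max(k - 1, 1)]
                 for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, f in enumerate(features):
            nearest = min(range(k), key=lambda c: abs(f - centroids[c]))
            clusters[nearest].append(i)
        for c in range(k):
            if clusters[c]:
                centroids[c] = (sum(features[i] for i in clusters[c])
                                / len(clusters[c]))
    keys = []
    for c in range(k):
        if clusters[c]:
            keys.append(min(clusters[c],
                            key=lambda i: abs(features[i] - centroids[c])))
    return sorted(keys)
```

The preview sequence would then be assembled by concatenating the shots containing the returned frame indices.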

369 citations


Book
20 Dec 1999
TL;DR: Covering both image and video compression, this book yields a unique, self-contained reference for practitioners to build a basis for future study, research, and development.
Abstract: Multimedia hardware still cannot accommodate the demand for large amounts of visual data. Without the generation of high-quality video bitstreams, limited hardware capabilities will continue to stifle the advancement of multimedia technologies. Thorough grounding in coding is needed so that applications such as MPEG-4 and JPEG 2000 may come to fruition. Image and Video Compression for Multimedia Engineering provides a solid, comprehensive understanding of the fundamentals and algorithms that lead to the creation of new methods for generating high-quality video bitstreams. The authors present a number of relevant advances along with international standards. New to the Second Edition: a chapter describing the recently developed video coding standard MPEG-4 Part 10 Advanced Video Coding (also known as H.264); fundamental concepts and algorithms of JPEG2000; color systems of digital video; up-to-date video coding standards and profiles. Visual data, image, and video coding will continue to enable the creation of advanced hardware, suitable to the demands of new applications. Covering both image and video compression, this book yields a unique, self-contained reference for practitioners to build a basis for future study, research, and development.

342 citations


Proceedings Article
29 Nov 1999
TL;DR: A system that reconstructs the 3D motion of human subjects from single-camera video, relying on prior knowledge about human motion, learned from training data, to resolve those ambiguities.
Abstract: The three-dimensional motion of humans is underdetermined when the observation is limited to a single camera, due to the inherent 3D ambiguity of 2D video. We present a system that reconstructs the 3D motion of human subjects from single-camera video, relying on prior knowledge about human motion, learned from training data, to resolve those ambiguities. After initialization in 2D, the tracking and 3D reconstruction is automatic; we show results for several video sequences. The results show the power of treating 3D body tracking as an inference problem.

341 citations


Patent
28 Jan 1999
TL;DR: The recognition of instruments by patterns of optically detectable structures provides data on three-dimensional position, orientation, and instrument type; passive or active optical detection is possible via various light sources, reflectors, and pattern structures applicable in various clinical contexts.
Abstract: Camera systems in combination with data processors, image scan data, and computers and associated graphic display provide tracking of instruments, objects, patients, and apparatus in a surgical, diagnostic, or treatment setting. Optically detectable objects are connected to instrumentation, a patient, or a clinician to track their position in space by optical detection systems and methods. The recognition of instruments by patterns of optically detectable structures provides data on three-dimensional position, orientation, and instrument type. Passive or active optical detection is possible via various light sources, reflectors, and pattern structures applicable in various clinical contexts.

304 citations


Journal ArticleDOI
01 Oct 1999
TL;DR: Feedback-based low bit-rate video coding techniques for robust transmission in mobile multimedia networks, applicable to a wide variety of interframe video schemes including various video coding standards, are reviewed.
Abstract: We review feedback-based low bit-rate video coding techniques for robust transmission in mobile multimedia networks. For error control on the source coding level, each decoder has to make provisions for error detection, resynchronization, and error concealment, and we review techniques suitable for that purpose. Further, techniques are discussed for intelligent processing of acknowledgment information by the coding control to adapt the source coder to the channel. We review and compare error tracking, error confinement, and reference picture selection techniques for channel-adaptive source coding. For comparison of these techniques, a system for transmitting low bit-rate video over a wireless channel is presented and the performance is evaluated for a range of transmission conditions. We also show how feedback-based source coding can be employed in conjunction with precompressed video stored on a media server. The techniques discussed are applicable to a wide variety of interframe video schemes, including various video coding standards. Several of the techniques have been incorporated into the H.263 video compression standard, and this standard is used as an example throughout.

284 citations


Journal ArticleDOI
TL;DR: This paper surveys several approaches and algorithms that have been recently proposed to automatically structure audio?visual data, both for annotation and access.

277 citations




Journal ArticleDOI
TL;DR: In this paper, the key frame extraction problem is considered from a set-theoretic point of view, and systematic algorithms are derived to find a compact set of key frames that can represent a video segment for a given degree of fidelity.
Abstract: Extracting a small number of key frames that can abstract the content of video is very important for efficient browsing and retrieval in video databases. In this paper, the key frame extraction problem is considered from a set-theoretic point of view, and systematic algorithms are derived to find a compact set of key frames that can represent a video segment for a given degree of fidelity. The proposed extraction algorithms can be hierarchically applied to obtain a tree-structured key frame hierarchy that is a multilevel abstract of the video. The key frame hierarchy enables an efficient content-based retrieval by using the depth-first search scheme with pruning. Intensive experiments on a variety of video sequences are presented to demonstrate the improved performance of the proposed algorithms over the existing approaches.
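The fidelity constraint can be illustrated with a greedy set-cover sketch: every frame must lie within a distance bound of some key frame. The paper derives systematic algorithms; the `dist` callable and `eps` bound here stand in for whatever frame dissimilarity measure is chosen:

```python
def keyframes_for_fidelity(features, dist, eps):
    """Repeatedly add the frame covering the most still-uncovered frames
    (those within distance eps of it) until every frame is represented."""
    uncovered = set(range(len(features)))
    keys = []
    while uncovered:
        best = max(range(len(features)),
                   key=lambda i: sum(1 for j in uncovered
                                     if dist(features[i], features[j]) <= eps))
        keys.append(best)
        uncovered -= {j for j in uncovered
                      if dist(features[best], features[j]) <= eps}
    return sorted(keys)
```

Shrinking `eps` (demanding higher fidelity) yields more key frames, which is what makes a hierarchy of abstracts at different fidelity levels possible.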

Proceedings ArticleDOI
24 Oct 1999
TL;DR: This method first locates candidate text regions directly in the DCT compressed domain, and then reconstructs the candidate regions for further refinement in the spatial domain, so that only a small amount of decoding is required.
Abstract: We present a method to automatically locate captions in MPEG video. Caption text regions are segmented from the background using their distinguishing texture characteristics. This method first locates candidate text regions directly in the DCT compressed domain, and then reconstructs the candidate regions for further refinement in the spatial domain. Therefore, only a small amount of decoding is required. The proposed algorithm achieves about 4.0% false reject rate and less than 5.7% false positive rate on a variety of MPEG compressed video containing more than 42,000 frames.
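The compressed-domain first stage amounts to thresholding texture energy per DCT block. This sketch assumes each 8x8 block arrives as a flat coefficient list with the DC term first, which simplifies the paper's use of directional texture measures:

```python
def candidate_text_blocks(dct_blocks, threshold):
    """Flag DCT blocks whose AC energy (sum of |coefficients| excluding the
    DC term) exceeds a threshold -- high texture is the caption cue."""
    flagged = []
    for idx, block in enumerate(dct_blocks):
        ac_energy = sum(abs(c) for c in block[1:])  # block[0] is DC
        if ac_energy > threshold:
            flagged.append(idx)
    return flagged
```

Only the flagged blocks would then be decoded to pixels for the spatial-domain refinement step, which is why so little decoding is required.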

Proceedings ArticleDOI
23 Jun 1999
TL;DR: The proposed tracker can deal with partial occlusions and stop-and-go motion in very challenging situations; results are demonstrated on a number of different real sequences.
Abstract: We address the problem of detection and tracking of moving objects in a video stream obtained from a moving airborne platform. The proposed method relies on a graph representation of moving objects, which makes it possible to derive and maintain a dynamic template of each moving object by enforcing temporal coherence. This inferred template, along with the graph representation used in our approach, allows us to characterize object trajectories as optimal paths in a graph. The proposed tracker can handle partial occlusions and stop-and-go motion in very challenging situations. We demonstrate results on a number of different real sequences. We then define an evaluation methodology to quantify our results and show how tracking overcomes detection errors.

Journal ArticleDOI
TL;DR: This survey reviews techniques and systems for image and video retrieval including research, commercial, and World Wide Web-based systems and concludes with an overview of current challenges and future trends.
Abstract: Storage and retrieval of multimedia has become a requirement for many contemporary information systems. These systems need to provide browsing, querying, navigation, and, sometimes, composition capabilities involving various forms of media. In this survey, we review techniques and systems for image and video retrieval. We first look at visual features for image retrieval such as color, texture, shape, and spatial relationships. The indexing techniques are discussed for these features. Nonvisual features include captions, annotations, relational attributes, and structural descriptions. Temporal aspects of video retrieval and video segmentation are discussed next. We review several systems for image and video retrieval including research, commercial, and World Wide Web-based systems. We conclude with an overview of current challenges and future trends for image and video retrieval.

Journal ArticleDOI
TL;DR: To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, an interpolation filter, multi-frame integration and character extraction filters are applied and the overall recognition results are satisfactory for use in news indexing.
Abstract: The automatic extraction and recognition of news captions and annotations can be of great help locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address problems, describe the method by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multiframe integration and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.

Book
01 Dec 1999
TL;DR: Image and Video Compression for Multimedia Engineering is a first, comprehensive graduate/senior level text and a self-contained reference for researchers and engineers that builds a basis for future study, research, and development.
Abstract: From the Publisher: Image and Video Compression for Multimedia Engineering provides a solid, comprehensive understanding of the fundamentals and algorithms of coding and details all of the relevant international coding standards. With the growing popularity of applications that use large amounts of visual data, image and video coding is an active and dynamic field. Image and Video Compression for Multimedia Engineering is a first, comprehensive graduate/senior-level text and a self-contained reference for researchers and engineers that builds a basis for future study, research, and development.

Journal ArticleDOI
TL;DR: This work proposes an alternative compressed domain-based approach that computes motion vectors for the downscaled (N/2 × N/2) video sequence directly from the original motion vectors of the N×N video sequence, and discovers that the scheme produces better results by weighting the original motion vectors adaptively.
Abstract: Digital video is becoming widely available in compressed form, such as a motion JPEG or MPEG coded bitstream. In applications such as video browsing or picture-in-picture, or in transcoding for a lower bit rate, there is a need to downscale the video prior to its transmission. In such instances, the conventional approach to generating a downscaled video bitstream at the video server would be to first decompress the video, perform the downscaling operation in the pixel domain, and then recompress it as, say, an MPEG bitstream for efficient delivery. This process is computationally expensive due to the motion-estimation process needed during the recompression phase. We propose an alternative compressed domain-based approach that computes motion vectors for the downscaled (N/2 × N/2) video sequence directly from the original motion vectors for the N×N video sequence. We further discover that the scheme produces better results by weighting the original motion vectors adaptively. The proposed approach can lead to significant computational savings compared to the conventional spatial (pixel) domain approach. The proposed approach is useful for video servers that provide quality of service in real time for heterogeneous clients.
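The core computation might be sketched as follows, with per-block residual activity (e.g. a count of nonzero DCT coefficients) standing in for whatever adaptive weight the paper derives:

```python
def downscaled_motion_vector(mvs, activities):
    """Combine the four motion vectors of a 2x2 block group into one vector
    for the half-resolution sequence: weight each vector by its block's
    residual activity, then halve the result for the smaller frame."""
    total = sum(activities)
    if total == 0:
        weights = [0.25] * 4          # no cue: plain averaging
    else:
        weights = [a / total for a in activities]
    mx = sum(w * vx for w, (vx, _) in zip(weights, mvs)) / 2.0
    my = sum(w * vy for w, (_, vy) in zip(weights, mvs)) / 2.0
    return (mx, my)
```

The savings come from skipping motion estimation entirely during recompression: each downscaled macroblock reuses this derived vector instead of searching.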

Journal Article
Z.M. Hefed1
TL;DR: Object tracking means tracing the progress of objects (or object features) as they move about in a visual scene, which involves processing spatial and temporal changes.
Abstract: Object tracking means tracing the progress of objects (or object features) as they move about in a visual scene. It involves processing spatial and temporal changes. Some approaches are discussed together with applications and challenges.

Journal ArticleDOI
TL;DR: This work develops robust computer vision methods to detect and track natural features in video images that represent a step toward integrating vision with graphics to produce robust wide-area augmented realities.
Abstract: Natural scene features stabilize and extend the tracking range of augmented reality (AR) pose-tracking systems. We develop robust computer vision methods to detect and track natural features in video images. Point and region features are automatically and adaptively selected for properties that lead to robust tracking. A multistage tracking algorithm produces accurate motion estimates, and the entire system operates in a closed-loop that stabilizes its performance and accuracy. We present demonstrations of the benefits of using tracked natural features for AR applications that illustrate direct scene annotation, pose stabilization, and extendible tracking range. Our system represents a step toward integrating vision with graphics to produce robust wide-area augmented realities.

Journal ArticleDOI
TL;DR: The results indicate that model-based tracking of rigid objects in monocular image sequences may have to be reappraised more thoroughly than anticipated during the recent past.
Abstract: A model-based vehicle tracking system for the evaluation of inner-city traffic video sequences has been systematically tested on about 15 minutes of real-world video data. Methodological improvements during preparatory test phases affected, among other changes, the combination of edge element and optical flow estimates in the measurement process and a more consistent exploitation of background knowledge. The explication of this knowledge in the form of models facilitates the evaluation of video data for different scenes by exchanging the scene-dependent models. An extensive series of experiments with a large test sample demonstrates that the current version of our system appears to have reached a relative optimum: further interactive tuning of tracking parameters no longer promises to improve the overall system performance significantly. Even the incorporation of further knowledge regarding vehicle and scene geometry or illumination has to cope with an increasing level of interaction between different knowledge sources and system parameters. Our results indicate that model-based tracking of rigid objects in monocular image sequences may have to be reappraised more thoroughly than anticipated during the recent past.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A new approach to content-based video indexing using hidden Markov models (HMMs), in which one feature vector is calculated for each image of the video sequence, that allows the classification of complex video sequences.
Abstract: This paper presents a new approach to content-based video indexing using hidden Markov models (HMMs). In this approach one feature vector is calculated for each image of the video sequence. These feature vectors are modeled and classified using HMMs. This approach has many advantages compared to other video indexing approaches. The system has automatic learning capabilities. It is trained by presenting manually indexed video sequences. To improve the system we use a video model that allows the classification of complex video sequences. The presented approach works three times faster than real time. We tested our system on TV broadcast news. The rate of 97.3% correctly classified frames shows the efficiency of our system.
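Classifying a frame sequence with an HMM reduces to Viterbi decoding over the per-frame observations. The states, observation alphabet, and probabilities below are toy values chosen for illustration, not the paper's trained broadcast-news models:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state (content class) sequence for a sequence of
    per-frame observations, computed in log space."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t-1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t-1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In the paper's setting each observation would be a quantized feature vector per frame, and training would estimate the probability tables from the manually indexed sequences.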

Journal ArticleDOI
TL;DR: Two schemes are proposed: retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames, and retrieval using sub-sampled frames is based on matching color and texture features of the sub-Sampled frames.
Abstract: Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using subsampled frames, we uniformly subsample the query clip as well as the database video. Retrieval is based on matching color and texture features of the subsampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.

Proceedings ArticleDOI
30 Oct 1999
TL;DR: NoteLook is a client-server system designed and built to support multimedia note taking in meetings with digital video and ink, integrated into a conference room equipped with computer controllable video cameras, video conference camera, and a large display rear video projector.
Abstract: NoteLook is a client-server system designed and built to support multimedia note taking in meetings with digital video and ink. It is integrated into a conference room equipped with computer controllable video cameras, video conference camera, and a large display rear video projector. The NoteLook client application runs on wireless pen-based notebook computers. Video channels containing images of the room activity and presentation material are transmitted by the NoteLook servers to the clients, and the images can be interactively and automatically incorporated into the note pages. Users can select channels, snap in large background images and sequences of thumbnails, and write freeform ink notes. A smart video source management component enables the capture of high quality images of the presentation material from a variety of sources. For accessing and browsing the notes and recorded video, NoteLook generates Web pages with links from the images and ink strokes correlated to the video.

Patent
17 Nov 1999
TL;DR: In this paper, a system (100) for tracking the movement of multiple objects within a predefined area using a combination of overhead X-Y filming cameras (25) and tracking cameras (24) with attached frequency selective filter (24f).
Abstract: A system (100) for tracking the movement of multiple objects within a predefined area using a combination of overhead X-Y filming cameras (25) and tracking cameras (24) with attached frequency selective filter (24f). Also employed are perspective Z filming cameras (25) and tracking cameras (24) with filter (24f). Objects to be tracked have been marked with a frequency selective reflective material, such as patches (7r and 71), sticker (9) and tape (4a). System (100) radiates selected energy (23a) throughout the area of tracking to reflect off the reflective materials. Reflected energy such as (7m, 9a and 4b) is then received by tracking cameras (24) while all other ambient light is blocked by filter (24f). Local Computer System (60) captures images from tracking cameras (24) and locates said markings. Using the location information along with preknowledge concerning the multiple objects maximum rate of speed and maximum size as well as calculated movement information, system (60) is able to extract from the background the portion of the unfiltered images that represent the multiple objects.

David Beymer1
01 Jan 1999
TL;DR: This paper explores an alternative method that keeps just a single hypothesis per tracked object for computational efficiency, but displays robust performance and recovery from error by employing continuous detection during tracking.
Abstract: Recent investigations have shown the advantages of keeping multiple hypotheses during visual tracking. In this paper we explore an alternative method that keeps just a single hypothesis per tracked object for computational efficiency, but displays robust performance and recovery from error by employing continuous detection during tracking. The method is implemented in the domain of people-tracking, using a novel combination of stereo information for continuous detection and intensity image correlation for tracking. Real-time stereo provides extended information for 3D detection and tracking, even in the presence of crowded scenes, obscuring objects, and large scale changes. We are able to reliably detect and track people in natural environments, on an implemented system that runs at more than 10 Hz on a standard PC.

Patent
30 Nov 1999
TL;DR: Watermarks and related machine-readable coding techniques are used to embed data within the information content on object surfaces as mentioned in this paper, which may be used as a substitute for (or in combination with) standard machinereadable coding methods such as bar codes, magnetic stripes, etc.
Abstract: Watermarks and related machine-readable coding techniques are used to embed data within the information content on object surfaces. These techniques may be used as a substitute for (or in combination with) standard machine-readable coding methods such as bar codes, magnetic stripes, etc. As such, the coding techniques extend to many applications, such as linking objects with network resources, retail point of sale applications, object tracking and counting, production control, object sorting, etc. Object message data, including information about the object, machine instructions, or an index, may be hidden in the surface media of the object. An object messaging system includes an embedder and reader. The embedder converts an object message to an object reference, and encodes this reference in a watermarked signal applied to the object. The reader detects the presence of a watermark and decodes the watermark signal to extract the object reference.

Journal ArticleDOI
TL;DR: A system for real-time object recognition and tracking for remote video surveillance with a unique feature, i.e., the statistical morphological skeleton, which achieves low computational complexity, accuracy of localization, and noise robustness has been presented.
Abstract: A system for real-time object recognition and tracking for remote video surveillance is presented. In order to meet real-time requirements, a unique feature, i.e., the statistical morphological skeleton, which achieves low computational complexity, accuracy of localization, and noise robustness has been considered for both object recognition and tracking. Recognition is obtained by comparing an analytical approximation of the skeleton function extracted from the analyzed image with that obtained from model objects stored into a database. Tracking is performed by applying an extended Kalman filter to a set of observable quantities derived from the detected skeleton and other geometric characteristics of the moving object. Several experiments are shown to illustrate the validity of the proposed method and to demonstrate its usefulness in video-based applications.
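The tracking stage, an extended Kalman filter over skeleton-derived quantities, can be illustrated with an ordinary linear Kalman filter on a 1D position measurement; the paper's EKF handles a richer, nonlinear observation set, and the noise parameters below are arbitrary:

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Constant-velocity Kalman filter over 1D position measurements:
    predict with the motion model, then correct with each observation.
    State is (position, velocity); P = [[p00, p01], [p01, p11]]."""
    x, v = measurements[0], 0.0
    p00, p01, p11 = 1.0, 0.0, 1.0
    estimates = []
    for z in measurements:
        # Predict: x' = x + dt*v, P' = F P F^T + Q.
        x, v = x + dt * v, v
        p00 = p00 + dt * (2 * p01 + dt * p11) + q
        p01 = p01 + dt * p11
        p11 = p11 + q
        # Update with position measurement z (H = [1, 0]).
        k0, k1 = p00 / (p00 + r), p01 / (p00 + r)
        innov = z - x
        x, v = x + k0 * innov, v + k1 * innov
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        estimates.append(x)
    return estimates
```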

Journal ArticleDOI
TL;DR: This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video.
Abstract: The transfer of prerecorded, compressed variable-bit-rate video requires multimedia services to support large fluctuations in bandwidth requirements on multiple time scales. Bandwidth smoothing techniques can reduce the burstiness of a variable-bit-rate stream by transmitting data at a series of fixed rates, simplifying the allocation of resources in video servers and the communication network. This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video. Using MPEG-1 and MJPEG video data and a range of client buffer sizes, we investigate the interplay between the performance metrics and the smoothing algorithms. The results highlight the unique strengths and weaknesses of each bandwidth smoothing algorithm, as well as the characteristics of a diverse set of video clips.
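The feasibility logic behind such schedules can be shown for the simplest case, a single constant rate; real smoothing algorithms instead build a series of constant-rate runs, but each run must satisfy the same underflow/overflow bounds. The function name and units are mine:

```python
def constant_rate_schedule(frame_sizes, buffer_size):
    """Cumulative data sent r*t must stay at or above cumulative playback
    demand (no client underflow) and at most demand + buffer (no client
    overflow). Returns the minimal feasible constant rate, or None."""
    demand, total = [], 0
    for s in frame_sizes:
        total += s
        demand.append(total)
    # Minimal rate that never underflows (frame t is due at time t+1).
    r = max(d / (t + 1) for t, d in enumerate(demand))
    # Overflow check against the client buffer.
    for t, d in enumerate(demand):
        if r * (t + 1) > d + buffer_size:
            return None
    return r
```

When this returns None, the smoothing algorithms in the paper would split the video into segments and change the rate at segment boundaries, trading more rate changes for smaller peak rate and buffer use.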

Patent
03 Sep 1999
TL;DR: In this article, a transition from real-time mode to time-shifted mode is described, where a realtime frame is paused during a transition to the time-shift mode.
Abstract: A time-shifted video method has a real-time mode during which real-time video frames are delivered for display. In a time-shifted mode, time-shifted video frames are delivered for display. The time-shifted video frames are delayed relative to the real-time video frames. A real-time frame is paused during a transition from the real-time mode to the time-shifted mode.

Proceedings ArticleDOI
TL;DR: A novel scheme for matching video sequences base on low-level features that supports fast and efficient matching and can search 450,000 frames of video data within 72 seconds on a 400 MHz Pentium II, for a 50 frame query.
Abstract: Efficient ways to manage digital video data have assumed enormous importance lately. An integral aspect is the ability to browse, index and search huge volumes of video data automatically and efficiently. This paper presents a novel scheme for matching video sequences based on low-level features. The scheme supports fast and efficient matching and can search 450,000 frames (250 minutes) of video data within 72 seconds on a 400 MHz Pentium II, for a 50-frame query. Video sequences are processed in the compressed domain to extract the histograms of the images in the DCT sequence of the compressed video. A variation of histogram intersection of linearized histograms of the frames in the DCT sequence is implemented for matching video clips. The bins of the histograms of successive frames are approximated temporally, using a polynomial approximation technique. The approximation is then used for comparison. This leads to efficient storage and transmission. The histogram representation can be compacted to 4.26 real numbers per frame, while achieving high matching accuracy. Multiple temporal resolution sampling of the videos to be matched is also supported, and any key-frame-based matching scheme thus becomes a particular implementation of this scheme.
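The matching core — histogram intersection of per-frame histograms, slid over the database video — might look like this; the DCT-domain extraction and the polynomial temporal approximation are omitted, so the histograms are assumed given and normalized:

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima
    (1.0 for identical histograms, 0.0 for disjoint ones)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(query_hists, db_hists):
    """Slide the query clip over the database video and return the start
    offset whose frames intersect the query best on average."""
    n = len(query_hists)
    best_off, best_score = -1, -1.0
    for off in range(len(db_hists) - n + 1):
        score = sum(histogram_intersection(q, db_hists[off + i])
                    for i, q in enumerate(query_hists)) / n
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score
```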