Showing papers in "Multimedia Systems in 1999"
TL;DR: An implementation of NeTra, a prototype image retrieval system that uses color, texture, shape and spatial location information in segmented image regions and incorporates a robust automated image segmentation algorithm that allows object- or region-based search.
Abstract: We present here an implementation of NeTra, a prototype image retrieval system that uses color, texture, shape and spatial location information in segmented image regions to search and retrieve similar regions from the database. A distinguishing aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object- or region-based search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects. Images are segmented into homogeneous regions at the time of ingest into the database, and image attributes that represent each of these regions are computed. In addition to image segmentation, other important components of the system include an efficient color representation, and indexing of color, texture, and shape features for fast search and retrieval. This representation allows the user to compose interesting queries such as "retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper one-third of the image", where the individual objects could be regions belonging to different images. A Java-based web implementation of NeTra is available at http://vivaldi.ece.ucsb.edu/Netra.
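The kind of region-based query the abstract describes can be sketched as a weighted combination of per-feature distances between a query region and each database region. This is a minimal illustration, not NeTra's actual indexing or distance functions; the feature names and weights are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def region_distance(query, region, weights):
    """Weighted sum of per-feature distances (color, texture, shape, ...)."""
    return sum(weights[f] * euclidean(query[f], region[f]) for f in weights)

def search(query, regions, weights, k=2):
    """Return the k database regions closest to the query region."""
    ranked = sorted(regions, key=lambda r: region_distance(query, r, weights))
    return ranked[:k]
```

A query such as "color of object A, texture of object B" would simply assemble the query dictionary from features of regions drawn from different images before calling `search`.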
TL;DR: The state of the art in audio information retrieval is reviewed, and recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity are presented with a view towards making audio less “opaque”.
Abstract: The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an "AltaVista" for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less "opaque". A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
TL;DR: This paper presents an effective semantic-level ToC construction technique based on intelligent unsupervised clustering that has the characteristics of better modeling the time locality and scene structure.
Abstract: A fundamental task in video analysis is to extract structures from the video to facilitate user's access (browsing and retrieval). Motivated by the important role that the table of content (ToC) plays in a book, in this paper, we introduce the concept of ToC in the video domain. Some existing approaches implicitly use the ToC, but are mainly limited to low-level entities (e.g., shots and key frames). The drawbacks are that low-level structures (1) contain too many entries to be efficiently presented to the user; and (2) do not capture the underlying semantic structure of the video based on which the user may wish to browse/retrieve. To address these limitations, in this paper, we present an effective semantic-level ToC construction technique based on intelligent unsupervised clustering. It has the characteristics of better modeling the time locality and scene structure. Experiments based on real-world movie videos validate the effectiveness of the proposed approach. Examples are given to demonstrate the usage of the scene-based ToC in facilitating user's access to the video.
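The "time locality" idea behind the scene-level ToC can be sketched as clustering in which a shot joins the current scene only if it is both visually similar to the scene's most recent shot and close to it in time. The scalar shot feature and both thresholds here are illustrative assumptions, not the paper's clustering method.

```python
def cluster_shots(shots, sim_thresh=0.7, max_gap=30.0):
    """shots: list of (start_time, feature) tuples in temporal order.
    Similarity is modeled as 1 - |f1 - f2| for scalar features in [0, 1]."""
    scenes = []
    for t, f in shots:
        if scenes:
            last_t, last_f = scenes[-1][-1]
            # Join the current scene only if similar AND temporally close.
            if (1 - abs(f - last_f)) >= sim_thresh and (t - last_t) <= max_gap:
                scenes[-1].append((t, f))
                continue
        scenes.append([(t, f)])   # otherwise start a new scene entry
    return scenes
```

Each resulting scene becomes one semantic-level ToC entry, far fewer than the raw shot list.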
TL;DR: The algorithm proposed withstands JPEG and MPEG artifacts, even at high compression rates, and can detect and classify production effects that are difficult to detect with previous approaches.
Abstract: We describe a new approach to the detection and classification of production effects in video sequences. Our method can detect and classify a variety of effects, including cuts, fades, dissolves, wipes and captions, even in sequences involving significant motion. We detect the appearance of intensity edges that are distant from edges in the previous frame. A global motion computation is used to handle camera or object motion. The algorithm we propose withstands JPEG and MPEG artifacts, even at high compression rates. Experimental evidence demonstrates that our method can detect and classify production effects that are difficult to detect with previous approaches.
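The core edge-based measurement can be sketched as the fraction of edge pixels in the current frame that have no edge nearby in the previous frame; a high value suggests a cut or other production effect. The global motion compensation step from the paper is omitted, and the neighborhood radius is an assumption.

```python
def entering_edge_fraction(prev_edges, cur_edges, radius=1):
    """prev_edges, cur_edges: sets of (x, y) edge-pixel coordinates.
    Returns the fraction of current edges with no previous edge nearby."""
    if not cur_edges:
        return 0.0

    def has_near(p, edges):
        x, y = p
        return any((x + dx, y + dy) in edges
                   for dx in range(-radius, radius + 1)
                   for dy in range(-radius, radius + 1))

    entering = sum(1 for p in cur_edges if not has_near(p, prev_edges))
    return entering / len(cur_edges)
```

Tracking this fraction over time, a sharp spike would indicate a cut, while a sustained plateau would be more consistent with a dissolve or fade.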
TL;DR: This paper creates a joint histogram by selecting a set of local pixel features and constructing a multidimensional histogram, which incorporates additional information without sacrificing the robustness of color histograms.
Abstract: Color histograms are widely used for content-based image retrieval due to their efficiency and robustness. However, a color histogram only records an image's overall color composition, so images with very different appearances can have similar color histograms. This problem is especially critical in large image databases, where many images have similar color histograms. In this paper, we propose an alternative to color histograms called a joint histogram, which incorporates additional information without sacrificing the robustness of color histograms. We create a joint histogram by selecting a set of local pixel features and constructing a multidimensional histogram. Each entry in a joint histogram contains the number of pixels in the image that are described by a particular combination of feature values. We describe a number of different joint histograms, and evaluate their performance for image retrieval on a database with over 210,000 images. On our benchmarks, joint histograms outperform color histograms by an order of magnitude.
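The joint-histogram construction can be sketched directly: each pixel is described by a tuple of quantized local features, and the histogram counts pixels per feature combination. The particular pair of features here (a color bucket and an edge flag) is illustrative; the paper evaluates several feature combinations.

```python
from collections import Counter

def joint_histogram(pixels, color_bins=4):
    """pixels: list of (intensity 0-255, is_edge bool) per-pixel features."""
    hist = Counter()
    for intensity, is_edge in pixels:
        color = intensity * color_bins // 256   # quantize into color_bins buckets
        hist[(color, is_edge)] += 1             # count this feature combination
    return hist

def intersection(h1, h2):
    """Histogram intersection similarity, normalized by the smaller image."""
    overlap = sum(min(h1[k], h2[k]) for k in h1)
    return overlap / min(sum(h1.values()), sum(h2.values()))
```

Two images with the same overall color mix but different local structure would share color buckets yet diverge on the edge flag, which is exactly the ambiguity joint histograms are meant to resolve.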
TL;DR: Curvature scale space (CSS) image representation along with a small number of global parameters are used for this purpose and the results show the promising performance of the method and its superiority over Fourier descriptors and moment invariants.
Abstract: In many applications, the user of an image database system points to an image, and wishes to retrieve similar images from the database. Computer vision researchers aim to capture image information in feature vectors which describe shape, texture and color properties of the image. These vectors are indexed or compared to one another during query processing to find images from the database. This paper is concerned with the problem of shape similarity retrieval in image databases. Curvature scale space (CSS) image representation along with a small number of global parameters are used for this purpose. The CSS image consists of several arch-shaped contours representing the inflection points of the shape as it is smoothed. The maxima of these contours are used to represent a shape. The method is then tested on a database of 1100 images of marine creatures. A classified subset of this database is used to evaluate the method and compare it with other methods. The results show the promising performance of the method and its superiority over Fourier descriptors and moment invariants.
TL;DR: To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, an interpolation filter, multi-frame integration and character extraction filters are applied and the overall recognition results are satisfactory for use in news indexing.
Abstract: The automatic extraction and recognition of news captions and annotations can be of great help locating topics of interest in digital news video libraries. To achieve this goal, we present a technique, called Video OCR (Optical Character Reader), which detects, extracts, and reads text areas in digital video data. In this paper, we address problems, describe the method by which Video OCR operates, and suggest applications for its use in digital news archives. To solve two problems of character recognition for videos, low-resolution characters and extremely complex backgrounds, we apply an interpolation filter, multiframe integration and character extraction filters. Character segmentation is performed by a recognition-based segmentation method, and intermediate character recognition results are used to improve the segmentation. We also include a method for locating text areas using text-like properties and the use of a language-based postprocessing technique to increase word recognition rates. The overall recognition results are satisfactory for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
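The multi-frame integration step can be sketched with a pixel-wise minimum over consecutive frames: captions are static while the background moves, so bright static text stays bright and a changing background is pushed toward its darkest observed value, improving contrast. This is a simplified reading of the technique; the paper's actual filters are more elaborate.

```python
def integrate_frames(frames):
    """frames: list of equally sized 2-D lists of grayscale values (0-255).
    Returns the pixel-wise minimum image across all frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[min(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]
```

The integrated image would then be passed to interpolation and character-extraction filtering before recognition-based segmentation.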
TL;DR: The shape retrieval performance of the proposed approach to shape representation and similarity measure is better than that of the more established Fourier descriptor method.
Abstract: A region-based approach to shape representation and similarity measure is presented. The shape representation is invariant to translation, scale and rotation. The similarity measure conforms to human similarity perception, i.e., perceptually similar shapes have high similarity measure. An experimental shape retrieval system has been developed and its performance has been studied. The shape retrieval performance of the proposed approach is better than that of the more established Fourier descriptor method.
TL;DR: Two schemes are proposed: retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames, and retrieval using sub-sampled frames is based on matching color and texture features of the sub-sampled frames.
Abstract: Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly subsample the query clip as well as the database video. Retrieval is based on matching color and texture features of the subsampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of feature representation and matching schemes.
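Scheme (ii) can be sketched as uniform subsampling of both clips followed by a sliding-window match over frame features. Frame features are abstracted to scalars here and the step size is an assumption; the paper matches color and texture features.

```python
def subsample(frames, step):
    """Uniformly subsample a frame-feature sequence."""
    return frames[::step]

def best_match(query, video, step=4):
    """Slide the subsampled query over the subsampled video; return the
    (start_offset, distance) pair with the smallest total distance,
    where offsets are in subsampled units."""
    q = subsample(query, step)
    v = subsample(video, step)
    best = None
    for start in range(len(v) - len(q) + 1):
        dist = sum(abs(a - b) for a, b in zip(q, v[start:start + len(q)]))
        if best is None or dist < best[1]:
            best = (start, dist)
    return best
```

Subsampling both sides keeps the comparison cost roughly quadratic in the subsampled lengths rather than in the raw frame counts.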
TL;DR: It is demonstrated that integrated spatial and feature querying using color regions improves image search capabilities over non-spatial content-based image retrieval methods.
Abstract: We present a new system for querying for images by regions and their spatial and feature attributes. The system enables the user to find the images that contain arrangements of regions similar to those diagrammed in a query image. By indexing the attributes of regions, such as sizes, locations and visual features, a wide variety of complex joint spatial and feature queries are efficiently computed. In order to demonstrate the utility of the system, we develop a process for extracting color regions from photographic images. We demonstrate that integrated spatial and feature querying using color regions improves image search capabilities over non-spatial content-based image retrieval methods.
TL;DR: An overview of the principles and methods for synthesizing complex 3D sound scenes by processing multiple individual source signals is given and a real-time modular spatial-sound-processing software system, called Spat, is presented.
Abstract: This paper gives an overview of the principles and methods for synthesizing complex 3D sound scenes by processing multiple individual source signals. Signal-processing techniques for directional sound encoding and rendering over loudspeakers or headphones are reviewed, as well as algorithms and interface models for synthesizing and dynamically controlling room reverberation and distance effects. A real-time modular spatial-sound-processing software system, called Spat, is presented. It allows reproducing and controlling the localization of sound sources in three dimensions and the reverberation of sounds in an existing or virtual space. A particular aim of the Spatialisateur project is to provide direct and computationally efficient control over perceptually relevant parameters describing the interaction of each sound source with the virtual space, irrespective of the chosen reproduction format over loudspeakers or headphones. The advantages of this approach are illustrated in practical contexts, including professional audio, computer music, multimodal immersive simulation systems, and architectural acoustics.
TL;DR: The innovative features of the TELEPORT system include the use of full-wall display surfaces, “merging” of real and virtual environments, viewer tracking, and real-time compositing of live video with synthetic backgrounds.
Abstract: TELEPORT is an experimental teleconferencing system with the goal of enabling small groups of people, although geographically separated, to meet as if face to face. The innovative features of the system include the use of full-wall display surfaces, "merging" of real and virtual environments, viewer tracking, and real-time compositing of live video with synthetic backgrounds.
TL;DR: A system which translates the Itten theory into a formal language that expresses the semantics associated with the combination of chromatic properties of color images and exploits a competitive learning technique to segment images into regions with homogeneous colors.
Abstract: The development of a system supporting querying of image databases by color content tackles a major design choice about properties of colors which are referenced within user queries. On the one hand, low-level properties directly reflect numerical features and concepts tied to the machine representation of color information. On the other hand, high-level properties address concepts such as the perceptual quality of colors and the sensations that they convey. Color-induced sensations include warmth, accordance or contrast, harmony, excitement, depression, anguish, etc. In other words, they refer to the semantics of color usage. In particular, paintings are an example where the message is contained more in the high-level color qualities and spatial arrangements than in the physical properties of colors. Starting from this observation, Johannes Itten introduced a formalism to analyze the use of color in art and the effects that this induces on the user's psyche. In this paper, we present a system which translates the Itten theory into a formal language that expresses the semantics associated with the combination of chromatic properties of color images. The system exploits a competitive learning technique to segment images into regions with homogeneous colors. Fuzzy sets are used to represent low-level region properties such as hue, saturation, luminance, warmth, size and position. A formal language and a set of model-checking rules are implemented to define semantic clauses and verify the degree of truth by which they hold over an image.
TL;DR: This paper provides a survey of VBR source models which can be used to drive network simulations and presents models which have been used for VBR sources containing moderate-to-significant scene changes and moderate- to-full motion.
Abstract: It is predicted that, in the near future, the transport of compressed video will pervade computer networks. Variable-bit-rate (VBR) encoded video is expected to become a significant source of network traffic, due to its advantages in statistical multiplexing gain and consistent video quality. Both systems analysts and developers need to assess and study the impact these sources will have on their networks and networking products. To this end, suitable statistical source models are required to analyze performance metrics such as packet loss, delay and jitter. This paper provides a survey of VBR source models which can be used to drive network simulations. The models are categorized into four groups: Markov chain/linear regression, TES, self-similar and i.i.d/analytical. We present models which have been used for VBR sources containing moderate-to-significant scene changes and moderate-to-full motion. A description of each model is given along with corresponding advantages and shortcomings. Comparisons are made based on the complexity of each model.
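A member of the Markov-chain family the survey categorizes can be sketched as a two-state source: an "active" (scene-change / high-motion) state emits large frames and a "quiet" state emits small ones, with a state transition test on every frame. All frame sizes and probabilities below are illustrative, not fitted to any real trace.

```python
import random

def vbr_trace(n_frames, p_stay=0.9, sizes=(2000, 10000), seed=1):
    """Generate a list of n_frames frame sizes (in bits) from a seeded
    two-state Markov chain: state 0 = quiet, state 1 = active."""
    rng = random.Random(seed)
    state = 0
    trace = []
    for _ in range(n_frames):
        trace.append(sizes[state])
        if rng.random() > p_stay:   # leave the current state with prob 1 - p_stay
            state = 1 - state
    return trace
```

Feeding such synthetic traces into a queueing simulation is how the surveyed models are typically used to estimate packet loss, delay and jitter before deployment.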
TL;DR: Results of the study provide bounds on losses, rate variations and transient synchronization losses as a function of user satisfaction, in the form of Likert values.
Abstract: Perception of multimedia quality, specified by quality-of-service (QoS) metrics, can be used by system designers to optimize customer satisfaction within resource bounds enforced by general-purpose computing platforms. Media losses, rate variations and transient synchronization losses have been suspected to affect human perception of multimedia quality. This paper presents metrics to measure such defects, and results of a series of user experiments that justify such speculations. Results of the study provide bounds on losses, rate variations and transient synchronization losses as a function of user satisfaction, in the form of Likert values. It is shown how these results can be used by algorithm designers of underlying multimedia systems.
TL;DR: A survey of past and present research that has influenced this application area is provided, and research directions for the future are described.
Abstract: As integrated services have become available to the desktop, users have embraced new modes of interaction, such as multimedia conferencing and collaborative computing. In this paper, we provide a survey of past and present research that has influenced this application area, and describe research directions for the future.
TL;DR: A new protocol for synchronized playback is proposed, and the buffer required to achieve both continuity within a single substream and synchronization between related substreams is computed.
Abstract: Multimedia streams such as audio and video impose tight temporal constraints for their presentation. Often, related multimedia streams, such as audio and video, must be presented in a synchronized way. We introduce a novel scheme to ensure the continuous and synchronous delivery of distributed stored multimedia streams across a communications network. We propose a new protocol for synchronized playback and compute the buffer required to achieve both the continuity within a single substream and the synchronization between related substreams. The scheme is very general and does not require synchronized clocks. Using a resynchronization protocol based on buffer level control, the scheme is able to cope with server drop-outs and clock drift. The synchronization scheme has been implemented and the paper concludes with our experimental results.
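Buffer-level resynchronization of the kind the abstract describes can be sketched with watermarks: playout pauses when the buffer drops below a low watermark and skips a unit when it exceeds a high one, which absorbs server drop-outs and clock drift without synchronized clocks. Watermark values and the one-unit-per-tick playout model are assumptions.

```python
def playout_action(fill, low=2, high=8):
    """Decide what the playout process should do at the current fill level."""
    if fill < low:
        return "pause"    # buffer starving: wait for it to refill
    if fill > high:
        return "skip"     # buffer overrunning: drop a unit to catch up
    return "play"

def simulate(arrivals, low=2, high=8):
    """arrivals: units arriving per tick. Returns the action taken each tick."""
    fill, actions = 0, []
    for a in arrivals:
        fill += a
        act = playout_action(fill, low, high)
        if act == "play":
            fill -= 1
        elif act == "skip":
            fill -= 2         # drop one unit and play one
        actions.append(act)
    return actions
```

Inter-stream synchronization would then steer all substream buffers toward the same target fill level rather than managing each in isolation.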
TL;DR: The paper introduces MEs, then a flexible ME architecture, with a special focus on the modeling of the emotional component of the agents forming an ME.
Abstract: Multimodal Environments (MEs) are systems capable of establishing creative, multimodal user interaction by exhibiting real-time adaptive behaviour. In a typical scenario, one or more users are immersed in an environment allowing them to communicate by means of full-body movement, singing or playing. Users get feedback from the environment in real time in terms of sound, music, visual media, and actuators, i.e. movement of semi-autonomous mobile systems including mobile scenography, on-stage robots behaving as actors or players, possibly equipped with music and multimedia output. MEs are therefore a sort of extension of augmented reality environments. From another viewpoint, an ME can be seen as a sort of prolongation of the human mind and senses. From an artificial intelligence perspective, an ME consists of a population of physical and software agents capable of changing their reactions and their social interaction over time. For example, a gesture of the user(s) can mean different things in different situations, and can produce changes in the agents populating the ME. The paradigm adopted for movement recognition is that of a human observer of the dance, where the focus of attention changes according to the evolution of the dance itself and of the music produced. MEs are therefore agents able to observe the user, extract "gesture gestalts", and change their state, including artificial emotions, over time. MEs open new niches of application, many still to be discovered, including music, dance, theatre, interactive arts, entertainment, interactive exhibitions and museal installations, information ateliers, edutainment, training, industrial applications and cognitive rehabilitation (e.g. for autism). The environment can be a theatre, a museum, a discotheque, a school classroom, a rehabilitation centre for patients with a variety of sensory/motor and cognitive impairments, etc.
The ME concept generalizes the bio-feedback methods which have already found widespread applications. The paper introduces MEs, then a flexible ME architecture, with a special focus on the modeling of the emotional component of the agents forming an ME. A description of four applications we recently developed, currently used in several real testbeds, concludes the paper.
TL;DR: An adaptive buffering scheme for implementing intra-stream and inter-stream synchronization in real-time multimedia applications to dynamically enforce equalized delays to incoming media streams in order to piece-wise smooth the network delay variations and to synchronize the streams at the sink.
Abstract: In this paper, we present an adaptive buffering scheme for implementing intra-stream and inter-stream synchronization in real-time multimedia applications. The essence of the proposed scheme is to dynamically enforce equalized delays to incoming media streams, in order to piece-wise smooth the network delay variations and to synchronize the streams at the sink. An adaptive control mechanism based on an event-counting algorithm is employed to calibrate the PlayOut Clocks (POCs), which manage the presentations of multimedia data. The algorithm does not rely on globally synchronized clocks and makes minimal assumptions on the underlying network delay distribution. Also, the user-defined quality of service (QoS) specifications can be directly incorporated into the design parameters of the synchronization algorithm. The proposed synchronization scheme has been experimentally implemented in a teleconference system which consists of separately controllable audio, video, and data channels. The modular structure of the synchronization control provides the flexibility to maintain an arbitrary synchronization group in conjunction with a distributed conference management scheme. This paper also shows the experimental results of the test implementation and the suitability of the proposed scheme with respect to the multimedia traffic across an FDDI/Ethernet network.
TL;DR: An overview is presented of the “Structured Audio” and “AudioBIFS” components of MPEG-4, which enable the description of synthetic soundtracks, musical scores, and effects algorithms and the compositing, manipulation, and synchronization of real and synthetic audio sources.
Abstract: While previous generations of the MPEG multimedia standard have focused primarily on coding and transmission of content digitally sampled from the real world, MPEG-4 contains extensive support for structured, synthetic and synthetic/natural hybrid coding methods. An overview is presented of the "Structured Audio" and "AudioBIFS" components of MPEG-4, which enable the description of synthetic soundtracks, musical scores, and effects algorithms and the compositing, manipulation, and synchronization of real and synthetic audio sources. A discussion of the separation of functionality between the systems layer and the audio toolset of MPEG-4 is presented, and prospects for efficient DSP-based implementations are discussed.
TL;DR: A novel system that strives to achieve advanced content-based image retrieval using a seamless combination of two complementary approaches, and that surpasses other methods under comparison in terms of not only quantitative measures, but also image retrieval capabilities.
Abstract: In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using seamless combination of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture color properties of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain many errors, we make use of the available erroneous, ill-segmented image regions to accomplish the object-region-based image retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval. The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses other methods under comparison in terms of not only quantitative measures, but also image retrieval capabilities.
TL;DR: A playout scheduling framework for supporting the continuous and synchronized presentations of multimedia streams in a distributed multimedia presentation system and develops various playout-scheduling algorithms that are adaptable to quality-of-service parameters.
Abstract: In this paper, we investigate a playout scheduling framework for supporting the continuous and synchronized presentations of multimedia streams in a distributed multimedia presentation system. We assume a situation in which the server and network transmissions provide sufficient support for the delivery of media objects. In this context, major issues regarding the enforcement of the smooth presentation of multimedia streams at client sites must be addressed to deal with rate variance of stream presentations and delay variance of networks. We develop various playout-scheduling algorithms that are adaptable to quality-of-service parameters. The proposed algorithms permit the local adjustment of unsynchronized presentations by gradually accelerating or retarding presentation components, rather than abruptly skipping or pausing the presentation materials. A comprehensive experimental analysis of the proposed algorithms demonstrates that our algorithms can effectively avoid playout gaps (or hiccups) in the presentations. This scheduling framework can be readily used to support customized multimedia presentations.
TL;DR: This paper presents an algorithm for precomputing and storing the optimal schedules for all possible client buffer sizes in a compact manner, and proposes and empirically evaluates an “approximation scheme” that produces a schedule close to optimal but takes much less computation time.
Abstract: Work-ahead smoothing is a technique whereby a server, transmitting stored compressed video to a client, utilizes client buffer space to reduce the rate variability of the transmitted stream. The technique requires the server to compute a schedule of transfer under the constraints that the client buffer neither overflows nor underflows. Recent work established an optimal off-line algorithm (which minimizes peak, variance and rate variability of the transmitted stream) under the assumptions of fixed client buffer size, known worst case network jitter, and strict playback of the client video. In this paper, we examine the practical considerations of heterogeneous and dynamically variable client buffer sizes, variable worst case network jitter estimates, and client interactivity. These conditions require on-line computation of the optimal transfer schedule. We focus on techniques for reducing on-line computation time. Specifically, (i) we present an algorithm for precomputing and storing the optimal schedules for all possible client buffer sizes in a compact manner; (ii) we show that it is theoretically possible to precompute and store compactly the optimal schedules for all possible estimates of worst case network jitter; (iii) in the context of playback resumption after client interactivity, we show convergence of the recomputed schedule with the original schedule, implying greatly reduced on-line computation time; and (iv) we propose and empirically evaluate an "approximation scheme" that produces a schedule close to optimal but takes much less computation time.
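The underflow/overflow "corridor" underlying work-ahead smoothing can be made concrete: cumulative transmission S(t) must satisfy D(t) <= S(t) <= D(t) + B, where D is cumulative consumption and B the client buffer. As a simplification of the paper's optimal piecewise schedules, the sketch below computes the minimum constant rate that avoids underflow and the buffer that rate would require.

```python
def min_constant_rate(frame_sizes):
    """Minimum rate r such that r*t >= D(t) at every frame boundary t,
    where D(t) is cumulative bits consumed after t frames."""
    cum, rate = 0, 0.0
    for t, size in enumerate(frame_sizes, start=1):
        cum += size
        rate = max(rate, cum / t)   # tightest underflow constraint so far
    return rate

def buffer_needed(frame_sizes, rate):
    """Largest backlog r*t - D(t): the client buffer this rate requires
    to avoid overflow."""
    cum, need = 0, 0.0
    for t, size in enumerate(frame_sizes, start=1):
        cum += size
        need = max(need, rate * t - cum)
    return need
```

The paper's contribution is precisely that the optimal (piecewise, non-constant) schedule inside this corridor can be precomputed compactly for all buffer sizes, rather than recomputed online for each client.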
TL;DR: This work proposes "Leaky" ARQ, a progressively reliable transport protocol for delivery of delay-sensitive multimedia over Internet connections with wireless access links that permits corrupt packets to be leaked to the receiving application and then uses retransmissions to progressively refine the quality of subsequent packet versions; it is implemented by modifying Type-II Hybrid/"code combining" ARQ.
Abstract: We propose a progressively reliable transport protocol for delivery of delay-sensitive multimedia over Internet connections with wireless access links. The protocol, termed "Leaky" ARQ, initially permits corrupt packets to be leaked to the receiving application and then uses retransmissions to progressively refine the quality of subsequent packet versions. A Web server would employ Leaky ARQ to quickly deliver a possibly corrupt first version of an image over a noisy bandlimited wireless link for immediate display by a Web browser. Later, Leaky ARQ's retransmissions would enable the browser to eventually display a cleaner image. Forwarding and displaying corrupt error-tolerant image data: (1) lowers the perceptual delay compared to fully reliable packet delivery, and (2) can be shown to produce images with lower distortion than aggressively compressed images when the delay budget only permits weak forward error correction. Leaky ARQ supports delaying of re-transmissions, so that initial packet transmissions can be expedited, and cancelling of retransmissions associated with "out-of-date" data. Leaky ARQ can be parametrized to partially retransmit audio and video. We propose to implement Leaky ARQ by modifying Type-II Hybrid/"code combining" ARQ.
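The receiver side of the progressive-reliability idea can be sketched as follows: the first copy of a packet is handed to the application even if it is corrupt, and later retransmissions replace it only if they are cleaner. The boolean `corrupt` flag stands in for real error detection and code combining, which this sketch omits.

```python
def receive(store, seq, payload, corrupt):
    """store: dict mapping seq -> (payload, corrupt). Keep the best version
    of each packet seen so far; return True if the application's view of
    this packet changed (first arrival, or a cleaner retransmission)."""
    prev = store.get(seq)
    if prev is None or (prev[1] and not corrupt):
        store[seq] = (payload, corrupt)
        return True
    return False   # duplicate that is no cleaner: ignore it
```

In the Web-image scenario from the abstract, the first `True` return triggers immediate display of a possibly corrupt image, and a later `True` on the same sequence number triggers a redraw with the cleaner version.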
TL;DR: This paper proposes and studies a dynamic approach to schedule real-time requests in a video-on-demand (VOD) server, which improves throughput by making use of run-time information to relax admission control and reduces start-up latency.
Abstract: In this paper, we propose and study a dynamic approach to schedule real-time requests in a video-on-demand (VOD) server. Providing quality of service in such servers requires uninterrupted and on-time retrieval of motion video data. VOD services and multimedia applications further require access to the storage devices to be shared among multiple concurrent streams. Most of the previous VOD scheduling approaches use limited run-time information and thus cannot exploit the potential capacity of the system fully. Our approach improves throughput by making use of run-time information to relax admission control. It maintains excellent quality of service under varying playout rates by observing deadlines and by reallocating resources to guarantee continuous service. It also reduces start-up latency by beginning service as soon as it is detected that deadlines of all real-time requests will be met. We establish safe conditions for greedy admission, dynamic control of disk read sizes, fast initial service, and sporadic services. We conduct thorough simulations over a wide range of buffer capacities, load settings, and over varying playout rates to demonstrate the significant improvements in quality of service, throughput and start-up latency of our approach relative to a static approach.
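The safe greedy-admission condition can be illustrated with a deadline feasibility test: admit a new stream only if, assuming worst-case service times, every outstanding request still meets its deadline when served in earliest-deadline-first order. Times are abstract ticks and this is not the paper's exact admission test.

```python
def edf_feasible(requests):
    """requests: list of (service_time, deadline) pairs.
    True if serving them earliest-deadline-first meets every deadline."""
    now = 0
    for service, deadline in sorted(requests, key=lambda r: r[1]):
        now += service                # worst-case completion time so far
        if now > deadline:
            return False
    return True

def admit(current, new_request):
    """Greedy admission: accept the new stream only if the combined
    request set remains schedulable."""
    return edf_feasible(current + [new_request])
```

Using run-time information (actual rather than worst-case read times) would let `admit` accept streams a static worst-case test rejects, which is the throughput gain the abstract claims.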
TL;DR: This paper presents a methodology for automated construction of multimedia presentations, and defines four types of presentation organization constraints that are incorporated into the multimedia data model, independent of any presentation.
Abstract: In this paper, we present a methodology for automated construction of multimedia presentations. Semantic coherency of a multimedia presentation is expressed in terms of presentation inclusion and exclusion constraints. When a user specifies a set of segments for a presentation, the multimedia database system adds segments into and/or deletes segments from the set in order to satisfy the inclusion and exclusion constraints. We discuss the consistency and the satisfiability of inclusion and exclusion constraints when exclusion is allowed. Users express a presentation query by (a) pointing and clicking to an initial set of desired multimedia segments to be included into the presentation, and (b) specifying an upper bound on the time length of the presentation. The multimedia database system then finds the set of segments satisfying the inclusion-exclusion constraints and the time bound. Using priorities for segments and inclusion constraints, we give two algorithms for automated presentation assembly and discuss their complexity. To automate the assembly of a presentation with concurrent presentation streams, we introduce presentation organization constraints that are incorporated into the multimedia data model, independent of any presentation. We define four types of presentation organization constraints that, together with an underlying database ordering, allow us to obtain a unique presentation graph for a given set of multimedia segments. We briefly summarize a prototype system that fully incorporates the algorithms for the segment selection problem.
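The interplay of inclusion and exclusion constraints can be sketched as a fixed-point closure over the user's initial selection, followed by a consistency check. The segment names and function names below are hypothetical; the paper's priority-driven assembly algorithms and time-bound handling are not reproduced here.

```python
def closure(selected, inclusions):
    # Inclusion constraint (a, b): if segment a is in the presentation,
    # segment b must be added too. Iterate to a fixed point.
    result = set(selected)
    changed = True
    while changed:
        changed = False
        for a, b in inclusions:
            if a in result and b not in result:
                result.add(b)
                changed = True
    return result

def consistent(selected, exclusions):
    # Exclusion constraint (a, b): a and b may not appear together.
    return not any(a in selected and b in selected for a, b in exclusions)
```

If the closure of the user's selection violates an exclusion constraint, the system must delete segments (guided by priorities) rather than simply add them, which is where satisfiability becomes non-trivial.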
TL;DR: This paper presents a placement algorithm that interleaves multi-resolution video streams on a disk array and enables a video server to efficiently support playback of these streams at different resolution levels, and combines this placement algorithm with a scalable compression technique to efficiently support interactive scan operations.
Abstract: In this paper, we present a placement algorithm that interleaves multi-resolution video streams on a disk array and enables a video server to efficiently support playback of these streams at different resolution levels. We then combine this placement algorithm with a scalable compression technique to efficiently support interactive scan operations (i.e., fast-forward and rewind). We present an analytical model for evaluating the impact of the scan operations on the performance of disk-array-based servers. Our experiments demonstrate that: (1) employing our placement algorithm substantially reduces seek and rotational latency overhead during playback, and (2) exploiting the characteristics of video streams and human perceptual tolerances enables a server to support interactive scan operations without any additional overhead.
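A toy version of interleaved placement can be written as a mapping from (layer, block) pairs to disks: the layers of each block interval are laid out consecutively and striped round-robin across the array. This mapping, and the function name `place`, are simplifying assumptions for illustration; the paper's algorithm additionally accounts for seek/rotational overhead and scan patterns.

```python
def place(num_disks, layers, blocks_per_layer):
    # Interleave the resolution layers of each block interval, then stripe
    # the interleaved sequence round-robin across the disks, so that
    # full-resolution playback visits disks consecutively.
    placement = {}
    slot = 0
    for b in range(blocks_per_layer):
        for l in range(layers):
            placement[(l, b)] = slot % num_disks
            slot += 1
    return placement
```

Playing back at a lower resolution simply skips the enhancement-layer entries of the mapping; the placement's job is to keep the resulting disk accesses well distributed.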
TL;DR: This paper shows how wavelet analysis of frames of video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments and describes how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based “signature.”
Abstract: We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of frames of video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next we describe how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based "signature." Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we experimentally compare our shot-transition mechanism to a color-histogram implementation, and also evaluate the effectiveness of our database-searching scheme. Our algorithms are very efficient and run in real time on a desktop computer. We describe how this technology could be employed to construct a "smart VCR" that is capable of alerting the viewer to the beginning of a specific program or identifying commercials and then muting the volume on the TV.
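The signature-based shot-transition idea can be sketched with a coarse Haar approximation: keep only the low-frequency averaging coefficients of each frame as its signature, and flag a transition when consecutive signatures differ by more than a threshold. The 1-D luminance model, the L1 distance, and the threshold value are assumptions for illustration; the paper's actual signatures and matching scheme are richer.

```python
def haar_signature(frame, levels=3):
    # Coarse Haar approximation: repeatedly average adjacent pairs.
    # `frame` is a flat list of luminance values whose length must be
    # divisible by 2**levels.
    coeffs = list(frame)
    for _ in range(levels):
        coeffs = [(coeffs[i] + coeffs[i + 1]) / 2 for i in range(0, len(coeffs), 2)]
    return coeffs

def shot_transitions(frames, threshold=10.0):
    # Flag frame indices whose signature is far (L1) from the previous one.
    sigs = [haar_signature(f) for f in frames]
    return [i for i in range(1, len(sigs))
            if sum(abs(a - b) for a, b in zip(sigs[i - 1], sigs[i])) > threshold]
```

Because the averaging coefficients respond to spatial structure rather than to the global color distribution alone, such signatures tolerate brightness shifts better than a plain histogram comparison.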
TL;DR: A multi-level abstraction mechanism is proposed for capturing the spatial and temporal semantics associated with various objects in an input image or in a sequence of video frames, together with an object-oriented paradigm capable of supporting domain-specific views.
Abstract: In this paper, we propose a multi-level abstraction mechanism for capturing the spatial and temporal semantics associated with various objects in an input image or in a sequence of video frames. This abstraction can manifest itself effectively in conceptualizing events and views in multimedia data as perceived by individual users. The objective is to provide an efficient mechanism for handling content-based queries, with the minimum amount of processing performed on raw data during query evaluation. We introduce a multi-level architecture for video data management at different levels of abstraction. The architecture facilitates a multi-level indexing/searching mechanism. At the finest level of granularity, video data can be indexed based on mere appearance of objects and faces. For management of information at higher levels of abstraction, an object-oriented paradigm is proposed which is capable of supporting domain-specific views.
TL;DR: The proposed schemes are based on a slicing technique and use aggressive methods for admission control; performance evaluations through simulations show that the FM and IE algorithms improve both server utilization and quality of service.
Abstract: In this paper, we propose efficient admission control algorithms for multimedia storage servers that provide variable-bit-rate media streams. The proposed schemes are based on a slicing technique and use aggressive methods for admission control. We develop two types of admission control schemes: Future-Max (FM) and Interval Estimation (IE). The FM algorithm uses the maximum bandwidth requirement of the future to estimate the bandwidth requirement. The IE algorithm defines a class of admission control schemes that use a combination of the maximum and average bandwidths within each interval to estimate the bandwidth requirement of the interval. Performance evaluations through simulations show that server utilization is improved by using the FM and IE algorithms. Furthermore, the quality of service is also improved by using the FM and IE algorithms. Several results depicting the trade-off between the implementation complexity, the desired accuracy, the number of accepted requests, and the quality of service are presented.
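The two estimators can be sketched directly from their descriptions: FM takes the peak of the remaining bandwidth trace, while IE blends the per-interval maximum and average. The trace representation (a list of per-slot bandwidth demands), the blend weight `alpha`, and the capacity check in `admit` are illustrative assumptions, not the paper's exact formulation.

```python
def fm_bandwidth(trace, t):
    # Future-Max: peak demand from time slot t to the end of the stream.
    return max(trace[t:])

def ie_bandwidth(trace, interval, alpha=0.5):
    # Interval Estimation: per interval, blend maximum and average demand.
    # alpha = 1.0 degenerates to per-interval max; alpha = 0.0 to the average.
    est = []
    for s in range(0, len(trace), interval):
        chunk = trace[s:s + interval]
        est.append(alpha * max(chunk) + (1 - alpha) * sum(chunk) / len(chunk))
    return est

def admit(active_traces, candidate, capacity, t=0):
    # Admit the candidate iff the summed FM estimates fit within capacity.
    demand = sum(fm_bandwidth(tr, t) for tr in active_traces + [candidate])
    return demand <= capacity
```

Sliding `t` forward as streams play out is what makes the FM estimate "aggressive": reserved bandwidth shrinks as past peaks fall out of the remaining trace.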