
Showing papers on "Smacker video published in 1999"


01 Jan 1999
TL;DR: In this article, the authors review feedback-based low bit-rate video coding techniques for robust transmission in mobile multimedia networks and compare error tracking, error confinement, and reference picture selection techniques for channel-adaptive source coding.
Abstract: We review feedback-based low bit-rate video coding techniques for robust transmission in mobile multimedia networks. For error control on the source coding level, each decoder has to make provisions for error detection, resynchronization, and error concealment, and we review techniques suitable for that purpose. Further, techniques are discussed for intelligent processing of acknowledgment information by the coding control to adapt the source coder to the channel. We review and compare error tracking, error confinement, and reference picture selection techniques for channel-adaptive source coding. For comparison of these techniques, a system for transmitting low bit-rate video over a wireless channel is presented and the performance is evaluated for a range of transmission conditions. We also show how feedback-based source coding can be employed in conjunction with precompressed video stored on a media server. The techniques discussed are applicable to a wide variety of interframe video schemes, including various video coding standards. Several of the techniques have been incorporated into the H.263 video compression standard recently, and this standard is used as an example throughout.
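One of the surveyed feedback techniques, reference picture selection, can be sketched in a few lines: on learning that a frame was damaged, the encoder predicts from the newest frame the decoder has confirmed, cutting off error propagation. The function and the ACK/NACK bookkeeping below are a hypothetical simplification, not the H.263 Annex N syntax.

```python
# Toy sketch of reference picture selection (RPS): given ACK/NACK feedback,
# choose a reference frame that the decoder is known to hold undamaged.

def select_reference(encoded_frames, acked, nacked):
    """Pick the reference frame index for the next frame to encode.

    encoded_frames: list of frame indices already sent, in order.
    acked: set of indices positively acknowledged by the decoder.
    nacked: set of indices reported as damaged.
    """
    for idx in reversed(encoded_frames):
        if idx in nacked:
            continue  # known damaged: never predict from it
        if idx in acked:
            return idx  # confirmed intact at the decoder
        if not nacked:
            return idx  # no outstanding damage: unconfirmed is acceptable
    return None  # no safe reference exists: force an intra frame
```

With a NACK outstanding for frame 2, the encoder falls back to the last ACKed frame instead of the most recent one.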

269 citations


Proceedings ArticleDOI
24 Oct 1999
TL;DR: This method first locates candidate text regions directly in the DCT compressed domain, and then reconstructs the candidate regions for further refinement in the spatial domain, so that only a small amount of decoding is required.
Abstract: We present a method to automatically locate captions in MPEG video. Caption text regions are segmented from the background using their distinguishing texture characteristics. This method first locates candidate text regions directly in the DCT compressed domain, and then reconstructs the candidate regions for further refinement in the spatial domain. Therefore, only a small amount of decoding is required. The proposed algorithm achieves about 4.0% false reject rate and less than 5.7% false positive rate on a variety of MPEG compressed video containing more than 42,000 frames.
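The compressed-domain first pass can be illustrated with a toy sketch: caption text tends to produce high AC energy in 8x8 DCT blocks, so thresholding that energy flags candidate blocks without full decoding. The threshold value and block representation below are assumptions for illustration, not the paper's exact criterion.

```python
# Flag candidate text blocks directly from DCT coefficients by thresholding
# AC energy (everything except the DC term). High-texture blocks pass.

def text_candidate_blocks(dct_blocks, threshold=1000.0):
    """Return indices of 8x8 DCT blocks whose AC energy exceeds threshold.

    dct_blocks: list of 8x8 lists of DCT coefficients (block[0][0] is DC).
    """
    candidates = []
    for i, block in enumerate(dct_blocks):
        ac_energy = sum(
            c * c
            for r, row in enumerate(block)
            for k, c in enumerate(row)
            if (r, k) != (0, 0)  # skip the DC coefficient
        )
        if ac_energy > threshold:
            candidates.append(i)
    return candidates
```

A flat block (DC only) is rejected, while a busy block is kept for spatial-domain refinement.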

268 citations


Book
01 Dec 1999
TL;DR: Image and Video Compression for Multimedia Engineering is a first, comprehensive graduate/senior level text and a self-contained reference for researchers and engineers that builds a basis for future study, research, and development.
Abstract: From the Publisher: Image and Video Compression for Multimedia Engineering provides a solid, comprehensive understanding of the fundamentals and algorithms of coding and details all of the relevant international coding standards "With the growing popularity of applications that use large amounts of visual data, image and video coding is an active and dynamic field Image and Video Compression for Multimedia Engineering is a first, comprehensive graduate/senior level text and a self-contained reference for researchers and engineers that builds a basis for future study, research, and development

206 citations


Book
01 Sep 1999
TL;DR: This book gives an introduction to video coding algorithms, working up from basic principles through to the advanced video compression systems now being developed.
Abstract: From the Publisher: This book gives an introduction to video coding algorithms, working up from basic principles through to the advanced video compression systems now being developed. The main objective is to describe the reasons behind the introduction of a standard codec for a specific application and its chosen parameters. The book should enable readers to appreciate the fundamental elements needed to design a video codec for a given application.

196 citations


Journal ArticleDOI
TL;DR: This work proposes an alternative compressed domain-based approach that computes motion vectors for the downscaled (N/2)×(N/2) video sequence directly from the original motion vectors for the N×N video sequence, and discovers that the scheme produces better results by weighting the original motion vectors adaptively.
Abstract: Digital video is becoming widely available in compressed form, such as a motion JPEG or MPEG coded bitstream. In applications such as video browsing or picture-in-picture, or in transcoding for a lower bit rate, there is a need to downscale the video prior to its transmission. In such instances, the conventional approach to generating a downscaled video bitstream at the video server would be to first decompress the video, perform the downscaling operation in the pixel domain, and then recompress it as, say, an MPEG bitstream for efficient delivery. This process is computationally expensive due to the motion-estimation process needed during the recompression phase. We propose an alternative compressed domain-based approach that computes motion vectors for the downscaled (N/2)×(N/2) video sequence directly from the original motion vectors for the N×N video sequence. We further discover that the scheme produces better results by weighting the original motion vectors adaptively. The proposed approach can lead to significant computational savings compared to the conventional spatial (pixel) domain approach. The proposed approach is useful for video servers that provide quality of service in real time for heterogeneous clients.
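The core step can be sketched as follows: when four macroblocks of the full-size frame collapse into one macroblock of the half-size frame, a new vector is formed as a weighted average of the four original vectors and then halved. Using per-macroblock activity (e.g., nonzero DCT coefficient counts) as the weight is one plausible choice for the adaptive weighting, not necessarily the paper's exact scheme.

```python
# Combine four N-domain macroblock motion vectors into one vector for the
# 2x-downscaled frame: adaptive weighted average, then halve the magnitude.

def downscale_motion_vector(vectors, weights):
    """vectors: four (dx, dy) pairs; weights: four nonnegative activity weights."""
    total = sum(weights)
    if total == 0:
        weights, total = [1, 1, 1, 1], 4  # fall back to a plain average
    dx = sum(w * v[0] for w, v in zip(weights, vectors)) / total
    dy = sum(w * v[1] for w, v in zip(weights, vectors)) / total
    return dx / 2.0, dy / 2.0  # halve for the halved resolution
```

This avoids any motion re-estimation: the downscaled vectors come purely from bitstream data already decoded.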

195 citations


Patent
03 Mar 1999
TL;DR: In this article, an OMFS is configured to divide the received digital video information into one or more packets, each packet having the same number of bytes as a sector on a disk in the disk drive.
Abstract: Apparatus and corresponding methods for storing video information. The apparatus includes a means for receiving video information, a means for converting the received video information into digital video information, and a means for storing the digital video information. Converting the received video information into digital video information can include converting it into an MPEG-compatible digital format. The apparatus can include an MPEG-compatible digital encoder, which can include separate audio and video MPEG encoders. The digital video information can be stored on an electromagnetically writable disk drive with an optimized MPEG file system (OMFS) configured to receive the digital video information and store the digital video information on the disk drive, where the OMFS is configured to divide the received digital video information into one or more packets, each packet having the same number of bytes as a sector on a disk in the disk drive. The OMFS can first accumulate one of the packets of digital video information in a cache memory. The OMFS, upon filling the cache memory with a completed packet of digital video information, then stores the completed packet on a single one of the sectors of the disk drive.
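The sector-alignment idea in this claim can be sketched simply: split the incoming MPEG byte stream into packets exactly one sector long, padding the final packet, so that each packet maps to a single disk sector write. The 512-byte sector size below is an assumed value for illustration.

```python
# Split a byte stream into sector-sized packets, zero-padding the tail packet,
# so each packet occupies exactly one disk sector.

SECTOR_SIZE = 512  # assumed sector size for illustration

def packetize(stream, sector_size=SECTOR_SIZE):
    """Split `stream` (bytes) into sector-sized packets, zero-padding the tail."""
    packets = []
    for offset in range(0, len(stream), sector_size):
        packet = stream[offset:offset + sector_size]
        if len(packet) < sector_size:
            packet += b"\x00" * (sector_size - len(packet))  # pad last packet
        packets.append(packet)
    return packets
```

Aligning packets to sectors means each cache flush is one full-sector write, with no read-modify-write of partially filled sectors.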

187 citations


Journal ArticleDOI
TL;DR: Two schemes are proposed: retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames, while retrieval using subsampled frames is based on matching color and texture features of the subsampled frames.
Abstract: Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using subsampled frames, we uniformly subsample the query clip as well as the database video. Retrieval is based on matching color and texture features of the subsampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.
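The second scheme can be sketched minimally: subsample both query and database sequences at the same rate, reduce each frame to a feature, and slide the query over the database scoring each position. The gray-level histogram and intersection similarity below are illustrative stand-ins for the paper's color and texture features.

```python
# Subsampled-frame clip retrieval: slide the subsampled query over the
# subsampled database video, scoring positions by histogram similarity.

def histogram(frame, bins=4, max_val=256):
    """Toy gray-level histogram of a flat list of pixel values."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def intersection(h1, h2):
    """Histogram intersection similarity (larger = more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(query_frames, db_frames, step=2):
    """Return the best-matching start position in the subsampled database."""
    q = [histogram(f) for f in query_frames[::step]]
    d = [histogram(f) for f in db_frames[::step]]
    best_pos, best_score = -1, -1
    for pos in range(len(d) - len(q) + 1):
        score = sum(intersection(a, b) for a, b in zip(q, d[pos:pos + len(q)]))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos
```

Because only every `step`-th frame is featurized, the cost scales with the subsampling rate rather than the full frame count.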

166 citations


Proceedings ArticleDOI
30 Oct 1999
TL;DR: NoteLook is a client-server system designed and built to support multimedia note taking in meetings with digital video and ink, integrated into a conference room equipped with computer controllable video cameras, video conference camera, and a large display rear video projector.
Abstract: NoteLook is a client-server system designed and built to support multimedia note taking in meetings with digital video and ink. It is integrated into a conference room equipped with computer controllable video cameras, video conference camera, and a large display rear video projector. The NoteLook client application runs on wireless pen-based notebook computers. Video channels containing images of the room activity and presentation material are transmitted by the NoteLook servers to the clients, and the images can be interactively and automatically incorporated into the note pages. Users can select channels, snap in large background images and sequences of thumbnails, and write freeform ink notes. A smart video source management component enables the capture of high quality images of the presentation material from a variety of sources. For accessing and browsing the notes and recorded video, NoteLook generates Web pages with links from the images and ink strokes correlated to the video.

158 citations


Journal ArticleDOI
TL;DR: This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video.
Abstract: The transfer of prerecorded, compressed variable-bit-rate video requires multimedia services to support large fluctuations in bandwidth requirements on multiple time scales. Bandwidth smoothing techniques can reduce the burstiness of a variable-bit-rate stream by transmitting data at a series of fixed rates, simplifying the allocation of resources in video servers and the communication network. This paper compares the transmission schedules generated by the various smoothing algorithms, based on a collection of metrics that relate directly to the server, network, and client resources necessary for the transmission, transport, and playback of prerecorded video. Using MPEG-1 and MJPEG video data and a range of client buffer sizes, we investigate the interplay between the performance metrics and the smoothing algorithms. The results highlight the unique strengths and weaknesses of each bandwidth smoothing algorithm, as well as the characteristics of a diverse set of video clips.
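The constraint that all of these smoothing algorithms work against can be stated as a small checker: at every frame time, the bytes sent so far must stay between the bytes the client has consumed (no decoder starvation) and consumed plus buffer size (no overflow). This is an illustrative formulation of the feasibility condition, not any one paper's algorithm.

```python
# Check a piecewise-constant transmission schedule against client buffer
# constraints: no underflow (decoder starves) and no overflow (buffer full).

def schedule_feasible(frame_sizes, rates, buffer_size):
    """frame_sizes: bytes consumed at each playback instant.
    rates: bytes transmitted during each frame interval (same length)."""
    sent = consumed = 0
    for size, rate in zip(frame_sizes, rates):
        sent += rate
        if sent > consumed + buffer_size:
            return False  # client buffer would overflow
        consumed += size
        if sent < consumed:
            return False  # decoder would starve (underflow)
    return True
```

A constant rate of 20 smooths the bursty [10, 30, 10, 30] sequence with a 30-byte buffer, but fails with a 20-byte buffer: that trade-off between rate variability and buffer size is exactly what the compared algorithms navigate.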

137 citations


Proceedings ArticleDOI
07 Jun 1999
TL;DR: A novel technique for determining keyframes that are different from each other and provide a good representation of the whole video is described, which can determine any number of keyframes.
Abstract: In accessing large collections of digitized videos, it is often difficult to find both the appropriate video file and the portion of the video that is of interest. The paper describes a novel technique for determining keyframes that are different from each other and provide a good representation of the whole video. We use keyframes to distinguish videos from each other, to summarize videos, and to provide access points into them. The technique can determine any number of keyframes by clustering the frames in a video and by selecting a representative frame from each cluster. Temporal constraints are used to filter out some clusters and to determine the representative frame for a cluster. Desirable visual features can be emphasized in the set of keyframes. An application for browsing a collection of videos makes use of the keyframes to support skimming and to provide visual summaries.
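The cluster-then-pick-representative step can be sketched with a few rounds of k-means over toy 1-D frame features (real systems would use color or texture descriptors, and this sketch omits the paper's temporal constraints):

```python
# Cluster frame features with simple k-means, then pick as keyframe the
# member frame nearest each cluster centroid.

def select_keyframes(features, k, rounds=10):
    """Return sorted indices of k representative frames from 1-D features."""
    # Initialize centroids spread across the sequence.
    centroids = [features[i * len(features) // k] for i in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for i, f in enumerate(features):
            nearest = min(range(k), key=lambda c: abs(f - centroids[c]))
            clusters[nearest].append(i)
        centroids = [
            sum(features[i] for i in cl) / len(cl) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    keyframes = []
    for c, cl in enumerate(clusters):
        if cl:  # representative = member frame closest to the centroid
            keyframes.append(min(cl, key=lambda i: abs(features[i] - centroids[c])))
    return sorted(keyframes)
```

Requesting k clusters yields up to k keyframes, which is how the technique supports "any number of keyframes".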

121 citations


Patent
27 May 1999
TL;DR: In this paper, a video distribution apparatus for storing video data and distributing it to a viewer has a memory for video data, and a schedule table for holding a distribution schedule of the stored video data.
Abstract: A video distributing apparatus for storing video data and distributing it to a viewer has a memory for video data and a schedule table for holding a distribution schedule of the stored video data. A controller controls the distribution of the stored video data. A reservation request includes a title of the video data to be distributed, a channel to be used for distribution of the video data of the title, and information to designate a time to start the distribution.

Patent
09 Mar 1999
TL;DR: In this article, an apparatus is provided for editing video which has at least two components: a digital database system (11) and a nonlinear video editor (12) with the capability to decimate the source video segments into decimated video segments of a selected decimation quality.
Abstract: An apparatus (10) is provided for editing video which has at least two components: a digital database system (11) and a nonlinear video editor (12). The digital database system (11) stores source video segments and has the capability to decimate the source video segments into decimated video segments of a selected decimation quality. The nonlinear video editor (12) is connected to selectively access decimated video segments and source video segments from the digital database system (11). The nonlinear video editor (12) is capable of using the decimated video segments during editing of a video program and accessing the source video segments to produce the program at a quality different from the selected decimation quality.

Patent
06 Dec 1999
TL;DR: In this article, a navigational control map, transmitted from the headend to the CATV set-top box in a fixed location in the MPEG-2 video data stream, permits the set-top to find the requested video clip in a predetermined Packet Identifier of the MPEG-2 data stream.
Abstract: An implementation of streaming video in HTML Web pages combines video signals in MPEG digital television format with Internet World Wide Web pages in HTML format. Internet streaming video is transcoded into MPEG-2 digital video format and multiplexed along with other MPEG-2 digital video signals for transport within a multiple channel digital video system. A navigational control map, transmitted from the headend to the CATV set-top box in a fixed location in the MPEG-2 video data stream, permits the CATV set-top to find the requested video clip in a predetermined Packet Identifier of the MPEG-2 data stream. The viewer controls the video clip (e.g., play, pause, resume, restart etc.) during the session. The set-top transmits control commands to the headend. The disclosed arrangement allows the available MPEG-2 decoder hardware in the CATV set-top box to be used to display streaming video without requiring additional hardware or additional RAM memory.

Patent
21 Dec 1999
TL;DR: In this article, a method and system for conveying a video message is described, where video data comprising at least image data and associated audio data is captured and a message structure (401) is created.
Abstract: A method and system (100) for conveying a video message is disclosed. Video data comprising at least image data and associated audio data is captured and a video message structure (401) is created. A link is established between the structure (401) and the video data to create a structured video message. The structured video message is characterized by a video message structure (401) that provides the originator (103) of the message enhanced manipulation capabilities for the video data by manipulating the structure (401). The structured video message can be conveyed to a recipient for viewing and/or for providing the recipient substantially the same enhanced manipulation capabilities.

Patent
30 Sep 1999
TL;DR: In this article, a data structure is associated with each multimedia bitstream and an automatic format conversion process is used to determine whether or not decoding is required before transcoding is performed, which can save processing time and computer resources in those cases where decoding is not required.
Abstract: A multimedia information retrieval system and method including a method and system for automatic format conversion. The invention includes a data structure that is associated with each multimedia bitstream. The data structure identifies the encoding format, e.g., compression technique, used in the multimedia bitstream which is originated by a contents server. An automatic format conversion process then queries information from the client system (requester) and also receives the data structure identifying the encoding format. The client information identifies the decoding format. The automatic format conversion determines the transcoding process required for converting the bitstream from its encoded format to the format recognized by the client system. The format conversion process of the present invention also determines whether or not decoding is required before transcoding is performed thereby saving processing time and computer resources in those cases where decoding is not required. Moreover, the format conversion process also automatically determines the computer memory size required to perform the transcoding process thereby saving computer memory resources. The format converter can be implemented in software as an application and can also be integrated within a data access server. The data access server can be integrated within the client system or within the contents server. The format converter of the invention is particularly useful for electronic devices coupled in a communication network where the encoding format of the sender may not be compatible with the decoding format of the receiver, thereby requiring transcoding between the formats.
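The decision at the heart of this claim, whether decoding is needed before transcoding, can be sketched as a small dispatcher. The format names and the set of compressed-domain-convertible pairs below are hypothetical examples, not taken from the patent.

```python
# Decide the cheapest conversion path for a bitstream headed to a client:
# pass through, transcode in the compressed domain, or fully decode+encode.

COMPRESSED_DOMAIN_PAIRS = {("mpeg1", "mpeg2")}  # assumed example pairs

def plan_conversion(source_format, client_formats):
    """Return the conversion plan given the source format and the formats
    the client reports it can decode (preferred first)."""
    if source_format in client_formats:
        return "passthrough"  # no decoding, no transcoding
    for target in client_formats:
        if (source_format, target) in COMPRESSED_DOMAIN_PAIRS:
            return "transcode-compressed:" + target  # skip the full decode
    return "decode-then-encode:" + client_formats[0]
```

Skipping the decode stage whenever the pair allows it is where the claimed savings in processing time and memory come from.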

Patent
09 Aug 1999
TL;DR: A video processing device is presented for searching video streams for one or more user-selected image text attributes. The device detects and extracts image text from video frames, determines attributes of the extracted text, compares them with the user-selected attributes, and modifies, transfers, or labels at least a portion of the video stream in accordance with user commands.
Abstract: There is disclosed, for use in a video text analysis system, a video processing device for searching video streams for one or more user-selected image text attributes. The video processing device comprises an image processor capable of detecting and extracting image text from video frames, determining attributes of the extracted image text, comparing the extracted image text attributes and the user-selected image text attributes, and, if a match occurs, modifying, transferring, and/or labeling at least a portion of the video stream in accordance with user commands. The invention uses the user-selected image text attributes to search through an archive of video clips to 1) locate particular types of events, such as news programs or sports events; 2) locate programs featuring particular persons or groups; 3) locate programs by name; 4) save or remove all or some commercials, and to otherwise sort, edit, and save all of, or portions of, video clips according to image text that appears in the frames of the video clips.

Patent
01 Apr 1999
TL;DR: In this article, a defibrillator includes circuitry configured to produce defibrillation and an audio/video output unit having a database of video image information stored in a memory, a video display, and a video formulation unit coupled to the memory and configured to retrieve video information from the database and present corresponding information to the video display for display.
Abstract: A defibrillator includes circuitry configured to produce a defibrillatory shock and an audio/video output unit having a database of video image information stored in a memory, a video display, and a video formulation unit coupled to the memory and configured to retrieve video information from the database of video image information and present corresponding information to the video display for display. The video information may include still images, animated images, motion images or a combination of textual information and at least one of still images, animated images and motion images. The audio/video output unit may be configured to receive inputs relating to user inputs, patient signals and device inputs, and to provide video instructions, and optionally audio or textual instructions, relating to operation of the defibrillator based on the current operational state of the defibrillator.

Patent
30 Mar 1999
TL;DR: In this article, a single-chip application specific integrated circuit provides autonomous management of playback of digital video and audio, which includes a digital video decoder and output system, and a central processing unit controlling the decoder.
Abstract: A single-chip application specific integrated circuit provides autonomous management of playback of digital video and audio. The chip includes a digital video decoder and output system, and a central processing unit controlling said digital video decoder and output system. The central processing unit receives commands to establish a current playback state for management of playback of digital video and audio by said digital video decoder and output system, and responds to a video field synchronization signal and a current playback state, without external instruction, to determine whether to display digital video, whether to decode digital video for display, whether to repeat display of previously decoded digital video, and whether to skip over digital video prior to decoding digital video for output. By delivering commands to the central processing unit, the application specific integrated circuit can be caused to transition between playback states to provide desired playback of said digital video and audio.

Patent
25 May 1999
TL;DR: In this paper, a computer system is coupled with a video camera, and the computer system can detect an object in the scene and generate an input signal for the computer in response to the detected object.
Abstract: Providing input signals to a computer system having a display (116), the computer system being coupled to a video camera (501) or other video source, is accomplished by capturing video data signals generated by the video camera (501), the video data signals representing a scene, rendering the scene on the display such that the scene is reflected and transparently visible on the display (116), analyzing the video data signals to detect an object in the scene, and generating an input signal for the computer system in response to the detected object.

Proceedings ArticleDOI
24 Oct 1999
TL;DR: A framework for measuring video similarity across different resolutions-both spatial and temporal is presented and an application to searching MPEG compressed video by example is presented to demonstrate the potential use of the proposed video similarity measure.
Abstract: The usefulness of a video database relies on whether the video of interest can be easily located. To allow exploring, browsing, and retrieving videos according to their visual content, efficient techniques for evaluating the visual similarity between different video clips are necessary. We present a framework for measuring video similarity across different resolutions, both spatial and temporal. In particular, the video clips to be compared can be properly aligned through the use of suitable weighting functions and alignment constraints. Dynamic programming techniques are employed to obtain the video similarity measure with a reasonable computational cost. An application to searching MPEG compressed video by example is presented to demonstrate the potential use of the proposed video similarity measure.
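A dynamic-programming alignment in the spirit of this framework (though not its exact formulation) is a DTW-style recurrence: the minimum total frame-distance alignment tolerates the two clips having different temporal resolutions. Frame features are toy scalars here.

```python
# DTW-style dynamic programming: minimum-cost alignment of two feature
# sequences, allowing frames of either clip to be repeated or skipped.

def align_cost(a, b):
    """Minimum alignment cost between two feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance only in `a`
                                 cost[i][j - 1],      # advance only in `b`
                                 cost[i - 1][j - 1])  # advance both (match)
    return cost[n][m]
```

A clip compared against a temporally stretched copy of itself aligns at zero cost, which is the resolution-tolerance the paper is after; weighting functions and alignment constraints would restrict which cells of the table are reachable.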

Proceedings ArticleDOI
01 Oct 1999
TL;DR: A new video summarization procedure is introduced that produces a dynamic (video) abstract of the original video sequence, in which two semantic events, emotional dialogue and violent-featured action, are characterized and abstracted into the video summary before all other events.
Abstract: In this paper, we introduce a new video summarization procedure that produces a dynamic (video) abstract of the original video sequence. Our technique compactly summarizes video data by preserving its original temporal characteristics (visual activity) and semantically essential information. It relies on an adaptive nonlinear sampling. The local sampling rate is directly proportional to the amount of visual activity in localized sub-shot units of the video. The resulting video abstract is highly compact. To get very short, yet semantically meaningful summaries, we propose an event-oriented abstraction scheme in which two semantic events, emotional dialogue and violent-featured action, are characterized and abstracted into the video summary before all other events. If the length of the summary permits, other non-key events are then added.

Journal ArticleDOI
TL;DR: A video model to generate VBR MPEG video traffic based on the scene content description that may be used to generate traffic of any type of video scenes ranging from a low complexity video conferencing to a highly active sport program.
Abstract: In this paper, we propose a video model to generate VBR MPEG video traffic based on the scene content description. Long sessions of nonhomogeneous video clips are decomposed into homogeneous video shots. The shots are then classified into different classes in terms of their texture and motion complexity. Each shot class is uniquely described with an autoregressive model. Transitions between the shots and their durations have been analyzed. Unlike many classical video source models, this model may be used to generate traffic for any type of video scene, ranging from low-complexity video conferencing to a highly active sports program. The performance of the model is evaluated by measuring the mean cell delay when the generated video traffic is fed to an ATM multiplex buffer.
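The per-class autoregressive idea can be sketched as follows: each shot class carries its own AR(1) parameters, and frame sizes within a shot are drawn from that class's model. The classes and coefficients below are made-up illustrations, not fitted values from the paper.

```python
import random

# shot class -> (mean frame size, AR(1) coefficient, noise std), all assumed
AR_PARAMS = {
    "low_activity": (2000.0, 0.9, 50.0),
    "high_activity": (8000.0, 0.7, 400.0),
}

def generate_shot(shot_class, n_frames, rng):
    """Generate n_frames of VBR frame sizes from the class's AR(1) model."""
    mean, a, std = AR_PARAMS[shot_class]
    sizes, deviation = [], 0.0
    for _ in range(n_frames):
        # AR(1): next deviation = a * previous deviation + Gaussian noise
        deviation = a * deviation + rng.gauss(0.0, std)
        sizes.append(max(0.0, mean + deviation))  # frame sizes can't go negative
    return sizes
```

Concatenating shots generated from different classes, with transitions drawn from the analyzed transition statistics, would yield a full nonhomogeneous session.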

Patent
08 Dec 1999
TL;DR: In this paper, a stereoscopic video apparatus is proposed to convert the format of the input stereoscopic data into a format suitable for output operation, and then output the converted format to a network.
Abstract: When stereoscopic video data is distributed through a network, some terminals cannot process the distributed stereoscopic video data because the format of the data does not correspond to the terminal. Likewise, when video data are input from a plurality of types of cameras with different stereoscopic video data formats, data of an incompatible format cannot be processed. In order to prevent this problem, there is provided a stereoscopic video apparatus which inputs stereoscopic video data, converts the format of the input stereoscopic video data into a format suitable for output, and outputs the converted stereoscopic video data to a network.

Patent
12 Nov 1999
TL;DR: In this paper, a computer system includes a video input device that generates video data representing a field of view in front of the input device, and a buffer and a video processing module.
Abstract: A computer system includes a video input device that generates video data representing a field of view in front of the video input device. The computer system further includes a buffer and a video processing module. The buffer is configured to record a video clip of the field of view in front of the video input device. The video processing module is connected to the buffer and the video input device. The video processing module includes a video capture device to encode the video data, and a signal processing module. The signal processing module processes the encoded video data in order to calculate an average value of the video data indicative of motion within the field of view in front of the video input device. The signal processing module generates an alarm if the average value is above a predetermined threshold value.
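The alarm logic described here reduces to a short function: average the absolute pixel difference between consecutive frames and fire when it exceeds a threshold. The threshold value is an assumed tuning parameter; frames are flat lists of pixel values for simplicity.

```python
# Motion alarm: mean absolute frame-to-frame pixel difference vs. threshold.

def motion_alarm(prev_frame, frame, threshold=10.0):
    """Return True if the mean absolute pixel difference exceeds threshold."""
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame))
    return diff / len(frame) > threshold
```

An unchanged scene stays below the threshold; a large change in the field of view trips the alarm, at which point the buffered clip of the event would be saved.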

Patent
Hiroshi Nojima
24 May 1999
TL;DR: In this paper, the retrieval server performs retrieval processing in units of objects on the basis of the retrieval condition designated for each video object by the user, and judges simultaneous appearance on the basis of appearance time section information of the objects coincident with the retrieval conditions, to extract the user's desired video scene and present it to the user.
Abstract: The video information retrieval system includes a client for inputting video data or retrieving a registered video file, and a retrieval server for registering video data transmitted from the client and retrieving the registered video data in response to a request from the client. The retrieval server analyzes a video stream upon registration of the video data and separates contents of a video constituting the video stream. The server extracts annotation information such as image feature vectors for each of the separated video contents and stores the annotation information in a video information table as a video object. The retrieval server performs retrieval processing in a unit of object on the basis of the retrieval condition designated for each video object by the user and judges the simultaneous appearance on the basis of appearance time section information of the object coincident with the retrieval condition to thereby extract the user's desired video scene and present it to the user.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: A new video summarization procedure that produces a dynamic (video) abstract of the original video sequence that relies on an adaptive nonlinear sampling of the video.
Abstract: We present a new video summarization procedure that produces a dynamic (video) abstract of the original video sequence. Our approach relies on an adaptive nonlinear sampling of the video. The local sampling rate is directly proportional to the amount of visual activity in localized sub-shot units of the video. The resulting video abstract is highly compact. At playtime, linear interpolation is used to provide the viewer with a summary of the video that accurately preserves the relative length and amount of activity in each sub-shot unit.
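The activity-proportional sampling can be sketched as a budget allocation: each sub-shot receives a share of the summary's frame budget proportional to its activity score, so busy segments are sampled densely and quiet ones sparsely. This is an illustrative simplification of the adaptive nonlinear sampling, not the paper's exact rate law.

```python
# Distribute a total frame budget across sub-shots in proportion to their
# activity scores, using largest-remainder rounding so the budget is exact.

def allocate_samples(activities, budget):
    """Return per-sub-shot frame counts summing to `budget`."""
    total = sum(activities)
    if total == 0:
        return [budget // len(activities)] * len(activities)
    raw = [budget * a / total for a in activities]
    counts = [int(r) for r in raw]
    # Hand out leftover frames to the largest fractional remainders.
    leftover = budget - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

A sub-shot three times as active as its neighbor receives three times the frames; at playtime, interpolating between the sampled frames preserves each sub-shot's relative length, as the abstract describes.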

Patent
10 Aug 1999
TL;DR: In this paper, the authors describe a communication apparatus consisting of a wireless communication unit that receives first encoded video data encoded by a first video encoding system from the first apparatus through a wireless network; a decoding unit which decodes the first encoded data to provide decoded video data; an encoding unit which encodes the decoded data into second encoded video using a second video encoding systems; and a wired communication unit which transmits the second encoded audio data to the second apparatus through wired network.
Abstract: A communication apparatus includes a wireless communication unit which receives first encoded video data encoded by a first video encoding system from the first apparatus through a wireless network; a decoding unit which decodes the first encoded video data to provide decoded video data; an encoding unit which encodes the decoded video data into second encoded video data using a second video encoding system; and a wired communication unit which transmits the second encoded video data to the second apparatus through a wired network. The first video encoding system is suitable for a first communication protocol used between the first apparatus and the communication apparatus. The second video encoding system is suitable for a second communication protocol used between the second apparatus and the communication apparatus. The first video encoding system uses a video encoding different from MPEG encoding. The second video encoding system uses MPEG encoding.

Journal ArticleDOI
TL;DR: In order to provide access to video footage within seconds of broadcast, a new pipelined digital video processing architecture which is capable of digitizing, processing, indexing and compressing video in real time on an inexpensive general purpose computer is developed.
Abstract: The VISION (video indexing for searching over networks) digital video library system has been developed in our laboratory as a testbed for evaluating automatic and comprehensive mechanisms for video archive creation and content-based search, filtering and retrieval of video over local and wide area networks. In order to provide access to video footage within seconds of broadcast, we have developed a new pipelined digital video processing architecture which is capable of digitizing, processing, indexing and compressing video in real time on an inexpensive general purpose computer. Incoming videos are automatically partitioned into short scenes using video, audio and closed-caption information. The resulting scenes are indexed based on their captions and stored in a multimedia database. A client-server-based graphical user interface was developed to enable users to remotely search this archive and view selected video segments over networks of different bandwidths. Additionally, VISION classifies the incoming videos with respect to a taxonomy of categories and will selectively send users videos which match their individual profiles.

Proceedings ArticleDOI
07 Jun 1999
TL;DR: The operational library interface shows the geographic entities addressed in a given story, highlighting the regions discussed at any point in the video through a map display, synchronized with the video playback.
Abstract: The Informedia Digital Video Library contains over 1200 hours of video. Through automatic processing, descriptors are derived for the video to improve library access. A new extension to the video processing is the extraction of geographic references from these descriptors. The operational library interface shows the geographic entities addressed in a given story, highlighting the regions discussed at any point in the video through a map display, synchronized with the video playback. The map can also be used as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

Patent
16 Apr 1999
TL;DR: In this article, a system is disclosed that can be used to enhance a video of an event, where sensors are used at the event to acquire information such as pan, tilt and zoom sensors to acquire camera view information.
Abstract: A system is disclosed that can be used to enhance a video of an event. Sensors are used at the event to acquire information. For example, the system can include pan, tilt and zoom sensors to acquire camera view information. This information can be added to the video signal from a camera (e.g. in the vertical blanking interval) or otherwise transmitted to a central studio. At the studio, the sensor information is used to enhance the video for broadcast. Example enhancements include drawing lines or other shapes in the video, adding advertisements to the video or adding other graphics to the video.