
Showing papers on "Video tracking published in 1996"


Journal ArticleDOI
TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.
Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.

1,058 citations
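
The observation model described above can be illustrated with a minimal sketch: each low-resolution frame is modeled as a motion-compensated, subsampled view of an underlying high-resolution still. Integer-pixel shifts and a fixed subsampling factor are simplifying assumptions here; the paper handles subpixel motion and solves the ill-posed inverse problem with a Bayesian prior.

```python
import numpy as np

def observe_lowres(hi, dy, dx, factor=2):
    """One low-resolution frame = shift (motion) then subsample.
    Integer shifts via np.roll are a simplification of the paper's
    subpixel motion-compensated subsampling model."""
    shifted = np.roll(np.roll(hi, dy, axis=0), dx, axis=1)
    return shifted[::factor, ::factor]

hi = np.arange(64, dtype=float).reshape(8, 8)      # toy high-resolution still
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [observe_lowres(hi, dy, dx) for dy, dx in shifts]
# The four 4x4 frames sample four distinct phases of the 8x8 grid -- exactly
# the extra information a multi-frame restoration method can exploit.
```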


Proceedings ArticleDOI
01 Aug 1996
TL;DR: This work presents a hybrid tracking method that combines the accuracy of vision-based tracking with the robustness of magnetic tracking without compromising real-time performance or usability.
Abstract: Accurate registration between real and virtual objects is crucial for augmented reality applications. Existing tracking methods are individually inadequate: magnetic trackers are inaccurate, mechanical trackers are cumbersome, and vision-based trackers are computationally problematic. We present a hybrid tracking method that combines the accuracy of vision-based tracking with the robustness of magnetic tracking without compromising real-time performance or usability. We demonstrate excellent registration in three sample applications.

457 citations


Patent
12 Jul 1996
TL;DR: In this article, a method and apparatus for use in a digital video delivery system is provided, where a digital representation of an audio-visual work, such as an MPEG file, is parsed to produce a tag file.
Abstract: A method and apparatus for use in a digital video delivery system is provided. A digital representation of an audio-visual work, such as an MPEG file, is parsed to produce a tag file. The tag file includes information about each of the frames in the audio-visual work. During the performance of the audio-visual work, data from the digital representation is sent from a video pump to a decoder. Seek operations are performed by causing the video pump to stop transmitting data from the current position in the digital representation, and to start transmitting data from a new position in the digital representation. The information in the tag file is inspected to determine the new position from which to start transmitting data. To ensure that the data stream transmitted by the video pump maintains compliance with the applicable video format, prefix data that includes appropriate header information is transmitted by said video pump prior to transmitting data from the new position. Fast and slow forward and rewind operations are performed by selecting video frames based on the information contained in the tag file and the desired presentation rate, and generating a data stream containing data that represents the selected video frames. A video editor is provided for generating a new video file from pre-existing video files. The video editor selects frames from the pre-existing video files based on editing commands and the information contained in the tag files of the pre-existing video files. A presentation rate, start position, end position, and source file may be separately specified for each sequence to be created by the video editor.

388 citations


Journal ArticleDOI
Wei Ding1, Bede Liu1
TL;DR: A feedback re-encoding method with a rate-quantization model, which can be adapted to changes in picture activities, is developed and used for quantization parameter selection at the frame and slice level.
Abstract: For MPEG video coding and recording applications, it is important to select the quantization parameters at slice and macroblock levels to produce consistent quality image for a given bit budget. A well-designed rate control strategy can improve the overall image quality for video transmission over a constant-bit-rate channel and fulfil the editing requirement of video recording, where a certain number of new pictures are encoded to replace consecutive frames on the storage media using, at most, the same number of bits. We developed a feedback re-encoding method with a rate-quantization model, which can be adapted to changes in picture activities. The model is used for quantization parameter selection at the frame and slice level. The extra computations needed are modest. Experiments show the accuracy of the model and the effectiveness of the proposed rate control method. A new bit allocation algorithm is then proposed for MPEG video coding.

377 citations
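
A feedback rate-quantization loop of the kind described can be sketched as follows. The hyperbolic model R(Q) ≈ X/Q and the exponential-smoothing update are illustrative stand-ins for the paper's model, not its actual formulation.

```python
def select_quantizer(target_bits, activity_X, q_min=1, q_max=31):
    """Pick a quantization parameter from a simple rate model R(Q) = X / Q,
    clamped to the legal MPEG range. Illustrative only."""
    q = activity_X / target_bits
    return max(q_min, min(q_max, round(q)))

def update_activity(activity_X, actual_bits, q_used, alpha=0.5):
    """Feedback re-encoding idea: refine the model constant X from the bits
    actually produced at the quantizer that was used, so the model adapts
    to changes in picture activity."""
    return (1 - alpha) * activity_X + alpha * actual_bits * q_used

X = 120_000.0
q = select_quantizer(target_bits=10_000, activity_X=X)   # model suggests q = 12
X = update_activity(X, actual_bits=9_000, q_used=q)      # fewer bits -> X shrinks
```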


Proceedings ArticleDOI
22 Mar 1996
TL;DR: A metric for the assessment of video coding quality is presented based on a multi-channel model of human spatio-temporal vision that has been parameterized for video coding applications by psychophysical experiments.
Abstract: This paper addresses the problem of quality estimation of digitally coded video sequences. The topic is of great interest since many digital video products are about to be released, and it is thus important to have robust methodologies for testing and performance evaluation of such devices. The inherent problem is that human vision has to be taken into account in order to assess the quality of a sequence with good correlation to human judgment. It is well known that the commonly used metric, the signal-to-noise ratio, is not correlated with human vision. A metric for the assessment of video coding quality is presented. It is based on a multi-channel model of human spatio-temporal vision that has been parameterized for video coding applications by psychophysical experiments. The visual mechanisms are simulated by a spatio-temporal filter bank. The decomposition is then used to account for phenomena such as contrast sensitivity and masking. Once the amount of distortion actually perceived is known, quality can be assessed at various levels. The described metric is able to rate the overall quality of the decoded video sequence as well as the rendition of important features of the sequence such as contours or textures.

372 citations
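
A crude one-dimensional sketch of the multi-channel idea: compare reference and distorted signals band by band rather than with a single global error. The box-blur bands and the uniform band weights below are hypothetical simplifications of the paper's psychophysically parameterized spatio-temporal filter bank.

```python
import numpy as np

def box_blur(x, k):
    """1-D moving average with edge padding (a toy low-pass filter)."""
    pad = np.pad(x, k, mode='edge')
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    return np.convolve(pad, kernel, mode='valid')

def perceptual_distance(ref, dist, scales=(1, 2, 4)):
    """Accumulate per-band differences across scales, very loosely mimicking
    a multi-channel vision model. Unlike a global SNR, a localized artifact
    shows up in the band where it lives."""
    d = 0.0
    prev_r, prev_d = ref.astype(float), dist.astype(float)
    for k in scales:
        br, bd = box_blur(prev_r, k), box_blur(prev_d, k)
        d += np.abs((prev_r - br) - (prev_d - bd)).mean()   # band difference
        prev_r, prev_d = br, bd                             # coarser scale next
    return d

sig = np.sin(np.linspace(0.0, 6.28, 64))
perceptual_distance(sig, sig)            # 0.0 for identical signals
```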


Patent
18 Dec 1996
TL;DR: In this article, a symbol indicating the position of an image generator is dragged and dropped to a specific position on a map, thereby establishing a logical network connection with the video transmission terminal to which the image generator is connected.
Abstract: In order to let an observer freely select, locate, and display an image from a remote place on a monitor, apparatus and methods are disclosed in which a symbol, shown on a map and indicating the position where an image generator is set, is dragged and dropped to a specific position. This establishes a logical network connection with the video transmission terminal to which the image generator is connected and displays its video in an arbitrary display area. Dragging and dropping the displayed video to another video display area changes the video display position, and dragging and dropping it onto a display stop symbol disconnects the logical network connection and stops the video display of the video camera.

353 citations


Proceedings ArticleDOI
16 Sep 1996
TL;DR: In this article, a scheme for robust interoperable watermarking of MPEG-2 encoded video is presented. The watermark is embedded either into the uncoded video or into the MPEG-2 bitstream, and can be retrieved from the decoded video.
Abstract: Embedding information into multimedia data is a topic that has gained increasing attention recently. For video broadcast applications, watermarking of video, and especially of already encoded video, is interesting. We present a scheme for robust interoperable watermarking of MPEG-2 encoded video. The watermark is embedded either into the uncoded video or into the MPEG-2 bitstream, and can be retrieved from the decoded video. The scheme working on encoded video is of much lower complexity than a complete decoding process followed by watermarking in the pixel domain and re-encoding. Although an existing MPEG-2 bitstream is partly altered, the scheme avoids drift problems. The scheme has been implemented and practical results show that a robust watermark can be embedded into MPEG encoded video which can be used to transmit arbitrary binary information at a data rate of several bytes/second.

332 citations
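
One common way to realize low-complexity embedding of this general kind, shown here purely as an illustration, is to force the parity of a quantized mid-frequency transform coefficient. The coefficient index and quantizer step below are hypothetical, and the paper's scheme operates directly on MPEG-2 bitstream syntax elements rather than on raw coefficient arrays.

```python
import numpy as np

def embed_bit(coeffs, bit, idx=5, step=8):
    """Embed one watermark bit in a block of transform coefficients by
    forcing the parity of one mid-frequency coefficient's quantizer level.
    Hypothetical scheme, not the paper's exact method."""
    c = coeffs.copy()
    level = int(round(c[idx] / step))
    if level % 2 != bit:
        level += 1                  # nudge to the nearest level of right parity
    c[idx] = level * step
    return c

def extract_bit(coeffs, idx=5, step=8):
    """Recover the bit from the coefficient's quantizer-level parity."""
    return int(round(coeffs[idx] / step)) % 2

block = np.array([40., 16., -8., 0., 0., 24., 0., 0.])
extract_bit(embed_bit(block, 1))    # the embedded bit survives re-quantization
```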


Proceedings ArticleDOI
22 Feb 1996
TL;DR: This study showed that previously proposed selective encryption schemes for MPEG video security are inadequate for sensitive applications and discusses the tradeoffs between levels of security and computational and compression efficiency.
Abstract: MPEG (Moving Pictures Expert Group) is an industrial standard for video processing and is widely used in multimedia applications on the Internet. However, no security provision is specified in the standard. We conducted an experimental study of previously proposed selective encryption schemes for MPEG video security. This study showed that these methods are inadequate for sensitive applications. We discuss the tradeoffs between levels of security and computational and compression efficiency.

294 citations


Patent
28 Jun 1996
TL;DR: In this article, a system for enhancing the television presentation of an object at a sporting event includes a sensor (210, 212, 214, 216), which determines the location of the object.
Abstract: A system (200) for enhancing the television presentation of an object at a sporting event includes a sensor (210, 212, 214, 216), which determines the location of the object. Based on the location of the object and the field of view of a broadcast camera (201, 202, 203, 204), a processor (302) determines the position of the object in the video frame of the broadcast camera. Once knowing where the object is positioned within the video frame, the television signal can be edited or augmented to enhance the presentation of the object.

286 citations
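
The geometric core of the patent, locating a tracked object inside a camera's video frame, can be sketched for one axis: given the object's bearing from the sensor and the camera's pointing angle and field of view, compute a horizontal pixel coordinate. The function and parameter names are hypothetical.

```python
def object_pixel(obj_angle_deg, cam_center_deg, fov_deg, width_px):
    """Map an object's bearing into a horizontal pixel coordinate of a
    broadcast camera, given the camera's pointing angle and field of view.
    One-axis sketch; a real system also handles tilt, zoom, and distortion."""
    offset = obj_angle_deg - cam_center_deg
    if abs(offset) > fov_deg / 2:
        return None                              # object outside the frame
    return round(width_px / 2 + offset / fov_deg * width_px)

object_pixel(12.0, 10.0, 40.0, 720)              # slightly right of center
```

Once the pixel position is known, the broadcast signal can be edited or augmented at that location, as the abstract describes.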


Patent
16 Feb 1996
TL;DR: In this article, a video terminal device capable of controlling video playback by a controller stores a position of the video program at which it was interrupted by the user, and the interrupted position is stored in a video library of a video server.
Abstract: In order to provide information that helps a viewing user remember the contents of past viewing of a video program, in an easy-to-comprehend form and in as small an information amount as possible, a video terminal device capable of controlling video playback by a controller stores the position of the video program at which playback was interrupted by the user. The interrupted position is stored in a video library of a video server. Images representative of the portion from the start (or another position) of the interrupted video program up to the interrupted position are extracted by a video digest making program. The extracted representative images are presented as a list display based on reduced icons or as a digest image. The list or the digest image is displayed before resuming the interrupted video program.

266 citations


Patent
18 Sep 1996
TL;DR: In this paper, a graphic image system comprising a video camera producing a first video signal defining a first image including a foreground object and a background, the foreground object preferably including an image of a human subject having a head with a face, an image position estimating system for identifying a position with respect to said foreground object, and a computer, responsive to the position estimation system, for defining a mask region separating the foreground objects from said background.
Abstract: A graphic image system comprising a video camera producing a first video signal defining a first image including a foreground object and a background, the foreground object preferably including an image of a human subject having a head with a face; an image position estimating system for identifying a position with respect to said foreground object, e.g., the head, the foreground object having features in constant physical relation to the position; and a computer, responsive to the position estimating system, for defining a mask region separating the foreground object from said background. The computer generates a second video signal including a portion corresponding to the mask region, responsive to said position estimating system, which preferably includes a character having a mask outline. In one embodiment, the mask region of the second video signal is keyed so that the foreground object of the first video signal shows through, with the second video signal having portions which interact with the foreground object. In another embodiment, means are provided, responsive to the position estimating system, for dynamically defining an estimated boundary of the face and for merging the face, as limited by the estimated boundary, within the mask outline of the character. Video and still imaging devices may be flexibly placed in uncontrolled environments, such as in a kiosk in a retail store, with an actual facial image within the uncontrolled environment placed within a computer-generated virtual world replacing the existing background and any non-participants.

Journal ArticleDOI
01 Feb 1996
TL;DR: In this article, the authors present a technique for calibrating the head-eye geometry and the camera intrinsic parameters: if the intrinsic parameters are known, three pure translational motions suffice to determine the camera orientation, while two motion sequences, each consisting of three orthogonal translations, determine both the camera orientation and the intrinsic parameters.
Abstract: A manipulator wrist-mounted camera considerably facilitates motion stereo, object tracking, and active perception. An important issue in active vision is to determine the camera position and orientation relative to the camera platform (head-eye calibration or hand-eye calibration). We present a technique for calibrating the head-eye geometry and the camera intrinsic parameters. The technique allows camera self-calibration because it requires no reference object and directly uses the images of the environment. Camera self-calibration is important especially where the underlying visual tasks do not permit the use of reference objects. Our method exploits the flexibility of the active vision system, and bases camera calibration on a sequence of specially designed motions. It is shown that if the camera intrinsic parameters are known a priori, the orientation of the camera relative to the platform can be solved using three pure translational motions. If the intrinsic parameters are unknown, then two sequences of motion, each consisting of three orthogonal translations, are necessary to determine the camera orientation and intrinsic parameters. Once the camera orientation and intrinsic parameters are determined, the position of the camera relative to the platform can be computed from an arbitrary nontranslational motion of the platform. All the computations in our method are linear. Experimental results with real images are presented.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: Time-constrained clustering of video shots is proposed to collapse visually similar and temporally local shots into a compact structure that allows the automatic segmentation of scenes and story units that cannot be achieved by existing shot boundary detection schemes.
Abstract: Many video programs have story structures that can be recognized through the clustering of video contents based on low-level visual primitives, and the analysis of high level structures imposed by temporal arrangement of composing elements. In this paper time-constrained clustering of video shots is proposed to collapse visually similar and temporally local shots into a compact structure. We show that the proposed clustering formulations, when incorporated into the scene transition graph framework, allows the automatic segmentation of scenes and story units that cannot be achieved by existing shot boundary detection schemes. The proposed method is able to decompose video into meaningful hierarchies and provide compact representations that reflect the flow of story, thus offering efficient browsing and organization of video.
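
A greedy sketch of the time-constrained clustering idea: a shot joins an existing cluster only if it is visually similar to a member of that cluster and that member is temporally close. The similarity callback, threshold, and window below are hypothetical parameters; the paper's formulation and its scene transition graph are more general.

```python
def time_constrained_clusters(shots, sim, threshold=0.7, window=5):
    """Group shots so that visually similar AND temporally local shots
    collapse into one cluster. `sim(a, b)` is a caller-supplied visual
    similarity in [0, 1]; `window` is the maximum shot-index distance."""
    clusters = []
    for i, shot in enumerate(shots):
        placed = False
        for cluster in clusters:
            if any(i - j <= window and sim(shot, shots[j]) >= threshold
                   for j in cluster):
                cluster.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])           # start a new cluster
    return clusters

# Toy run: shots are labels, two interleaved "scenes" plus one outlier.
shots = [0, 1, 0, 1, 9]
same = lambda a, b: 1.0 if a == b else 0.0
time_constrained_clusters(shots, same, threshold=0.5, window=2)
```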

Proceedings ArticleDOI
TL;DR: A generalized top-down hierarchical clustering process, which adopts partition clustering recursively at each level of the hierarchy, is studied and used to build hierarchical views of video shots.
Abstract: The large amount of video data makes it a tedious and hard job to browse and annotate video by just fast-forwarding and rewinding. Recent work in video parsing provides a foundation for building interactive and content-based video browsing systems. In this paper, a generalized top-down hierarchical clustering process, which adopts partition clustering recursively at each level of the hierarchy, is studied and used to build hierarchical views of video shots. With the clustering processes, when a list of video programs or clips is provided, a browsing system can use key-frame and/or shot features to cluster shots into classes, each of which consists of shots of similar content. After such clustering, each class of shots can be represented by an icon, which can then be displayed at the higher levels of a hierarchical browser. As a result, users can know roughly the content of video shots even without moving down to a lower level of the hierarchy.

Patent
02 Aug 1996
TL;DR: In this article, the authors proposed a hybrid tracking system that combines the registration accuracy of vision-based tracking and the robustness of magnetic tracking systems, which is applicable to see-through and video augmented reality systems.
Abstract: Systems, methods and computer program products which have the registration accuracy of vision-based tracking systems and the robustness of magnetic tracking systems. Video tracking of landmarks is utilized as the primary method for determining camera position and orientation, but is enhanced by magnetic or other forms of physical tracking of camera movement and orientation. A physical tracker narrows the landmark search area on images, speeding up the landmark search process. Information from the physical tracker may also be used to select one of several solutions of a non-linear equation resulting from the vision-based tracker. The physical tracker may also act as a primary tracker if the image analyzer cannot locate enough landmarks to provide proper registration, thus avoiding complete loss of registration. Furthermore, if one or two landmarks (not enough for a unique solution) are detected, heuristic methods are used to minimize registration loss. Catastrophic failure may be avoided by monitoring the difference between results from the physical tracker and the vision-based tracker and discarding corrections that exceed a certain magnitude. The hybrid tracking system is equally applicable to see-through and video augmented reality systems.

Proceedings ArticleDOI
Jakub Segen1
25 Aug 1996
TL;DR: The system has numerous applications since various statistics and indicators of human activity can be derived from the motion trajectories, including people counts, presence and time spent in a region, traffic density maps and directional traffic statistics.
Abstract: This paper describes a system for real-time tracking of people in video sequences. The input to the system is live or recorded video data acquired by a stationary camera in an environment where the primary moving objects are people. The output consists of trajectories which give the spatio-temporal coordinates of individual persons as they move in the environment. The system uses a new model-based approach to object tracking. It identifies feature points in each video frame, matches feature points across frames to produce feature "paths", then groups short-lived and partially overlapping feature paths into longer living trajectories representing motion of individual persons. The path grouping is based on a novel model-based algorithm for motion clustering. The system runs on an SGI Indy workstation at an average rate of 14 frames a second. The system has numerous applications since various statistics and indicators of human activity can be derived from the motion trajectories. Examples of these indicators described in the paper include people counts, presence and time spent in a region, traffic density maps and directional traffic statistics.

Journal ArticleDOI
TL;DR: In this article, the authors define a video abstract as a sequence of still or moving images presenting the content of a video in such a way that the target group is rapidly provided with concise information about the content while the essential message of the original is preserved.

Book
31 Dec 1996
TL;DR: Rate-Distortion Based Video Compression establishes a general theory for the optimal bit allocation among dependent quantizers, which is used to design efficient motion estimation schemes, video compression schemes and object boundary encoding schemes.
Abstract: From the Publisher: The book contains a review chapter on video compression, a background chapter on optimal bit allocation and the necessary mathematical tools, such as the Lagrangian multiplier method and Dynamic Programming. These two introductory chapters make the book self-contained and a fast way of entering this exciting field. Rate-Distortion Based Video Compression establishes a general theory for the optimal bit allocation among dependent quantizers. The minimum total (average) distortion and the minimum maximum distortion cases are discussed. This theory is then used to design efficient motion estimation schemes, video compression schemes and object boundary encoding schemes. For the motion estimation schemes, the theory is used to optimally trade the reduction of energy in the displaced frame difference (DFD) for the increase in the rate required to encode the displacement vector field (DVF). These optimal motion estimators are then used to formulate video compression schemes which achieve an optimal distribution of the available bit rate among DVF, DFD and segmentation. This optimal bit allocation results in very efficient video coders. In the last part of the book, the proposed theory is applied to the optimal encoding of object boundaries, where the bit rate needed to encode a given boundary is traded for the resulting geometrical distortion. Again, the resulting boundary encoding schemes are very efficient. Rate-Distortion Based Video Compression is ideally suited for anyone interested in this booming field of research and development, especially engineers who are concerned with the implementation and design of efficient video compression schemes. It also represents a foundation for future research, since all the key elements needed are collected and presented uniformly. Therefore, it is ideally suited for graduate students and researchers working in this field.

Journal ArticleDOI
TL;DR: A formal model for video data is developed and it is shown how spatial data structures, suitably modified, provide an elegant way of storing such data.
Abstract: We describe how video data can be organized and structured so as to facilitate efficient querying. We develop a formal model for video data and show how spatial data structures, suitably modified, provide an elegant way of storing such data. We develop algorithms to process various kinds of video queries and show that, in most cases, the complexity of these algorithms is linear. A prototype system, called the Advanced Video Information System (AVIS), based on these concepts, has been designed at the University of Maryland.
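
A toy sketch of interval-based video indexing in the spirit described: each object is stored with the frame intervals in which it appears, and a query asks which objects are visible at a given frame. The class and method names are illustrative, not the AVIS data structures, and a real system would use the modified spatial data structures the paper describes rather than a linear scan.

```python
class VideoIndex:
    """Toy frame-interval index: object name -> list of (start, end) frame
    intervals (inclusive). Hypothetical API for illustration only."""
    def __init__(self):
        self.intervals = {}

    def add(self, obj, start, end):
        self.intervals.setdefault(obj, []).append((start, end))

    def objects_at(self, frame):
        """All objects whose intervals cover `frame`, in sorted order."""
        return sorted(obj for obj, ivs in self.intervals.items()
                      if any(s <= frame <= e for s, e in ivs))

idx = VideoIndex()
idx.add("car", 0, 100)
idx.add("person", 50, 200)
idx.objects_at(75)          # both objects are on screen at frame 75
```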

Proceedings Article
30 Mar 1996
TL;DR: The concept of immersive video, which employs computer vision and computer graphics technologies to provide viewers of live events a sense of total immersion by providing the viewer with a "virtual camera", is introduced.
Abstract: Interactive video and television viewers should have the power to control their viewing position. To realize this, we introduce the concept of immersive video, which employs computer vision and computer graphics technologies to provide viewers of live events a sense of total immersion by providing the viewer with a "virtual camera". Immersive video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. While replaying this 3D digital movie, interactive viewers are able to explore the scene continuously from any perspective. This is accomplished by combining an a priori static model with a dynamic model that is created by assimilating dynamic information from each video stream into a comprehensive three dimensional environment model. We formalize the concept of immersive video, describe the architecture of our current implementation, and illustrate immersive video in staged karate demonstrations and basketball games. In its full realization, immersive video will be a paradigm shift in visual communication which will revolutionize television and video media, and will become an integral part of future telepresence and virtual reality systems.

Proceedings ArticleDOI
30 Mar 1996
TL;DR: WYSIWYF display as discussed by the authors provides correct visual/haptic registration using a vision based object tracking technique and a video keying technique so that what the user can see via a visual interface is consistent with what he/she can feel through a haptic interface using Chroma Keying, a live video image of the user's hand is extracted and blended with the graphic scene of the virtual environment.
Abstract: We propose a new concept of visual/haptic interfaces called the WYSIWYF display. The proposed concept provides correct visual/haptic registration using a vision-based object tracking technique and a video keying technique, so that what the user can see via a visual interface is consistent with what he/she can feel through a haptic interface. Using chroma keying, a live video image of the user's hand is extracted and blended with the graphic scene of the virtual environment. The user's hand "encounters" the haptic device exactly when his/her hand touches a virtual object in the blended scene. The first prototype has been built and the proposed concept was demonstrated.

Patent
08 Apr 1996
TL;DR: In this article, a video data stream analyzer is proposed to eliminate redundancy in the input video signal, and reorganize the video signal so that the spatial and temporal redundancy is increased.
Abstract: A video data stream analyzer modifies an input digital video signal so that the resulting output digital signal can be optimally compressed by a digital video encoder. The video data stream analyzer eliminates redundancy in the input video signal, and reorganizes the input video signal so that the spatial and temporal redundancy is increased. In addition, the video data stream analyzer generates side channel information that is supplied to the video encoder. The side channel information tells the video encoder whether vertical frame-based filtering or vertical field-based filtering is preferable. Additional side channel information specifies the order and duration of the display of the fields after decoding and this information preferably is encoded with the video signal. The video data stream analyzer provides scan detection of the incoming video digital data, and automatically and reliably detects scene cuts, repeated fields, and mixed-field frames in the incoming digital video data in real time independent of the video source. The video data stream analyzer modifies the input video data stream by dropping repeated fields and replacing a frame with a scene cut with a frame having identical fields for video, cartoon, telecine video sources as well as arbitrary combinations of these video sources.

Patent
06 Dec 1996
TL;DR: A sports event video manipulating system for manipulating a representation of a sports event is described in this article, where a sports editor including a video field grabber and an object tracker is used to track an object through a plurality of successive video fields, an object highlighter receiving input from the object tracker and an operative to highlight the tracked object on each of the plurality of video fields.
Abstract: A sports event video manipulating system for manipulating a representation of a sports event. The sports editor includes a video field grabber operative to grab at least one video field, a video image A/D converter operative to digitize a grabbed video field, an object tracker operative to track an object through a plurality of successive video fields, an object highlighter receiving input from the object tracker and operative to highlight the tracked object on each of the plurality of successive video fields, a D/A image converter operative to convert output of the object highlighter into a video standard format, and a video display monitor.

Journal ArticleDOI
TL;DR: The focus of this research is the use of a society of low-level models for performing relatively high-level tasks, such as retrieval and annotation of image and video libraries.
Abstract: The average person with a computer will soon have access to the world's collections of digital video and images. However, unlike text that can be alphabetized or numbers that can be ordered, image and video has no general language to aid in its organization. Tools that can "see" and "understand" the content of imagery are still in their infancy, but they are now at the point where they can provide substantial assistance to users in navigating through visual media. This paper describes new tools based on "vision texture" for modeling image and video. The focus of this research is the use of a society of low-level models for performing relatively high-level tasks, such as retrieval and annotation of image and video libraries. This paper surveys recent and present research in this fast-growing area.

Patent
Robert J. Gove1
30 Aug 1996
TL;DR: In this paper, a system for stabilizing a video recording of a scene (20, 22, and 24) made with a video camera (34) is provided. The video recording may include video data (36) and audio (38) data.
Abstract: A system (26) for stabilizing a video recording of a scene (20, 22, & 24) made with a video camera (34) is provided. The video recording may include video data (36) and audio (38) data. The system (26) may include source frame storage (64) for storing source video data (36) as a plurality of sequential frames. The system (26) may also include a processor (50) for detecting camera movement occurring during recording and for modifying the video data (36) to compensate for the camera movement. Additionally the system (26) may include destination frame storage (70) for storing the modified video data as plurality of sequential frames.

Patent
19 Nov 1996
TL;DR: In this paper, an approach for detecting a cut in a video comprises arrangements for acquiring video images from a source, for deriving from the video images a pixel-based difference metric, and for measuring video content of video images to provide up-to-date test criteria.
Abstract: Apparatus for detecting a cut in a video comprises arrangements for acquiring video images from a source, for deriving from the video images a pixel-based difference metric, for deriving from the video images a distribution-based difference metric, and for measuring video content of the video images to provide up-to-date test criteria. Arrangements are included for combining the pixel-based difference metric and the distribution-based difference metric, taking into account the up-to-date test criteria provided so as to derive a scene change candidate signal and for filtering the scene change candidate signal so as to generate a scene change frame list.
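The combination of a pixel-based and a distribution-based difference metric can be sketched as follows. Here "pixel-based" is taken to mean mean absolute frame difference and "distribution-based" to mean a grey-level histogram distance; the fixed thresholds stand in for the patent's adaptive, content-derived test criteria, and all names and bin counts are illustrative assumptions.

```python
# Sketch of cut detection that requires both a pixel-based and a
# distribution-based difference metric to agree before flagging a cut.
# Frames are flat lists of grey-level values in [0, 256).

def pixel_diff(a, b):
    """Mean absolute per-pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def histogram(frame, bins=8, maxval=256):
    """Grey-level histogram with equal-width bins."""
    h = [0] * bins
    for v in frame:
        h[v * bins // maxval] += 1
    return h

def hist_diff(a, b, bins=8):
    """L1 distance between grey-level histograms, normalized by frame size."""
    ha, hb = histogram(a, bins), histogram(b, bins)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / len(a)

def detect_cuts(frames, pix_thresh=30.0, hist_thresh=0.5):
    """Flag frame indices where both metrics agree a scene change occurred."""
    cuts = []
    for i, (a, b) in enumerate(zip(frames, frames[1:]), start=1):
        if pixel_diff(a, b) > pix_thresh and hist_diff(a, b) > hist_thresh:
            cuts.append(i)
    return cuts
```

Requiring both metrics suppresses false positives each one suffers alone: pixel differences spike on fast motion, while histogram differences miss cuts between scenes with similar overall brightness.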

Patent
31 Oct 1996
TL;DR: In this paper, a flexible video information analysis apparatus stores a video information data base and a plurality of moving image content analysis algorithms for analyzing the video information in the data base, and a user can manipulate a mouse to select one of the analysis algorithms.
Abstract: A flexible video information analysis apparatus stores a video information data base and a plurality of moving image content analysis algorithms for analyzing the video information in the data base. A user can manipulate a mouse to select one of the analysis algorithms. The selected algorithm is used to analyze video information in the data base.

Proceedings ArticleDOI
17 Jun 1996
TL;DR: The novelty of this work is that it proposes to integrate speech understanding and image analysis algorithms for extracting information in news or sports video indexing, where usually speech analysis is more efficient in detecting events than image analysis.
Abstract: We study an important problem in multimedia databases, namely the automatic extraction of indexing information from raw data based on video content. The goal of our research project is to develop a prototype system for automatic indexing of sports videos. The novelty of our work is that we propose to integrate speech understanding and image analysis algorithms for extracting information. The main thrust of this work comes from the observation that in news or sports video indexing, speech analysis is usually more efficient at detecting events than image analysis. Therefore, in our system, the audio processing modules are first applied to locate candidates in the whole data. This information is passed to the video processing modules, which further analyze the video. The final products of video analysis are pointers to the locations of interesting events in a video. Our algorithms have been tested extensively with real TV programs, and results are presented and discussed.
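The audio-first pipeline described above can be sketched in a few lines: a cheap audio detector proposes candidate times, and a video analyzer is run only at those candidates to confirm events. Both detectors here are hypothetical stand-ins (thresholded audio energy and motion scores), not the paper's actual speech-understanding or image-analysis modules.

```python
# Sketch of a two-stage, audio-first indexing pipeline: audio proposes
# candidate event times, video analysis confirms or rejects each one.

def audio_candidates(energy, thresh=0.8):
    """Propose times where audio energy spikes (e.g. crowd cheering)."""
    return [t for t, e in enumerate(energy) if e > thresh]

def video_confirms(motion, t, thresh=0.5):
    """Confirm a candidate only if the video shows high motion there."""
    return motion[t] > thresh

def index_events(energy, motion):
    """Run the expensive video check only at audio-proposed candidates."""
    return [t for t in audio_candidates(energy) if video_confirms(motion, t)]
```

The payoff is cost: the expensive video modules run only on the short list of audio candidates rather than on every frame of the broadcast.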

Proceedings ArticleDOI
08 Feb 1996
TL;DR: These cameras will be a standard peripheral on all PCs bundled for multimedia applications; given that in excess of 60M PCs will be sold this year, a sizable new market for electronic cameras is being created.
Abstract: Recent advances in video compression and digital networking technology, combined with the ever increasing power of PCs and workstations, are creating enormous opportunities to develop new multimedia products and services built upon sophisticated voice, data, image and video processing. This will create a significant demand for compact, low-cost, low-power electronic cameras for video and still image capture. These cameras will be a standard peripheral on all PCs bundled for multimedia applications. Given that in excess of 60M PCs will be sold this year, a sizable new market for electronic cameras is being created.

Patent
11 Dec 1996
TL;DR: In this article, the position of the detection measurement frame having a feature pattern with the largest similarity to the standard feature pattern obtained from the standard measurement frame is determined, and an imaging condition of a television camera is controlled on the basis of the position information of the detection measurement frame, in order to attain a video camera system able to suitably track the object's motion.
Abstract: A video camera system can suitably track a moving object without being influenced by other objects outside the desired image. Detection feature patterns are formed after brightness and hue frequency feature data are obtained on the basis of image information of the detection measurement frame. The position of the detection measurement frame having a feature pattern with the largest similarity to the standard feature pattern obtained from the standard measurement frame is determined. An imaging condition of a television camera is controlled on the basis of the position information of the detection measurement frame, in order to attain a video camera system able to suitably track the object's motion. Further, a video camera system can obtain a face image of a constant size with a simple construction. The area of the face image on the display plane is detected as the detected face area, and by comparing this with a standard face area, zoom processing is performed so that the difference becomes zero. Thus, it is unnecessary to use a distance sensor or similar device, and a video camera system with a simple construction can be obtained.
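The feature-pattern matching step can be sketched as histogram matching over sliding windows: the standard measurement frame yields a brightness histogram, and the candidate detection frame whose histogram is most similar marks the object's new position. The window size, bin count, and histogram-intersection similarity are illustrative choices; the patent also uses hue features, which this toy omits.

```python
# Toy sketch of tracking by feature-pattern similarity: slide a detection
# window over the frame and return the position whose brightness histogram
# best matches the standard (reference) pattern.

def brightness_hist(pixels, bins=4, maxval=256):
    """Brightness frequency pattern over a flat list of pixel values."""
    h = [0] * bins
    for v in pixels:
        h[v * bins // maxval] += 1
    return h

def window(frame, x, y, w, h):
    """Flatten the w x h window of a 2-D frame whose top-left is (x, y)."""
    return [frame[j][i] for j in range(y, y + h) for i in range(x, x + w)]

def similarity(ha, hb):
    """Histogram intersection: larger means more similar patterns."""
    return sum(min(a, b) for a, b in zip(ha, hb))

def track(frame, standard_hist, w, h):
    """Return the (x, y) of the detection frame most similar to the standard."""
    H, W = len(frame), len(frame[0])
    best, best_sim = (0, 0), -1
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = similarity(brightness_hist(window(frame, x, y, w, h)),
                           standard_hist)
            if s > best_sim:
                best_sim, best = s, (x, y)
    return best
```

The returned position would then drive the camera's pan/tilt control; the face-size zoom loop in the second half of the abstract is the analogous comparison applied to a detected area rather than a histogram.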