Abstract: Each viewer of a video or television scene acts as his or her own proactive editor of the scene, having the ability to interactively dictate and select--in advance of the unfolding of the scene and by high-level command--a particular perspective from which the scene will be depicted as and when the scene unfolds. Video images of the scene are selected, or even synthesized, in response to a viewer-selected (i) spatial perspective on the scene, (ii) static or dynamic object appearing in the scene, or (iii) event depicted in the scene. Multiple video cameras, each at a different spatial location, produce multiple two-dimensional video images of the real-world scene, each from a different spatial perspective. Objects of interest in the scene are identified and classified by computer in these two-dimensional images. The two-dimensional images of the scene, and accompanying information, are then combined in the computer into a three-dimensional video database, or model, of the scene. The computer also receives a user/viewer-specified criterion by which the user/viewer wishes to view the scene. From (i) the model and (ii) the criterion, the computer produces a particular two-dimensional image of the scene that best accords with the user/viewer-specified criterion. This particular two-dimensional image is then displayed on a video display. From its knowledge of the scene and of the objects and events therein, the computer may also answer user/viewer-posed questions regarding the scene and its objects and events.
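The abstract describes selecting, from multiple fixed cameras, the two-dimensional view best matching a viewer-specified spatial perspective. A minimal sketch of that selection step follows; all names, the scoring function, and the camera layout are assumptions for illustration, not the patent's disclosed implementation.

```python
import math

def best_camera(cameras, target, desired_dir):
    """Pick the camera whose line of sight to the target object is
    closest to the viewer's requested viewing direction.

    cameras: {name: (x, y, z)} camera positions (hypothetical layout)
    target: (x, y, z) position of the object of interest in the scene
    desired_dir: unit vector of the viewing direction the viewer asked for
    """
    def score(pos):
        # Direction from this camera toward the target, normalized.
        v = tuple(t - p for p, t in zip(pos, target))
        n = math.sqrt(sum(c * c for c in v)) or 1.0
        v = tuple(c / n for c in v)
        # Cosine similarity with the requested viewing direction:
        # higher means this camera sees the object more nearly as asked.
        return sum(a * b for a, b in zip(v, desired_dir))
    return max(cameras, key=lambda name: score(cameras[name]))

# Two cameras on opposite sides of the scene (illustrative positions).
cams = {"north": (0.0, 50.0, 10.0), "south": (0.0, -50.0, 10.0)}
# Viewer asks to see the object looking along -y, i.e., from the north side.
print(best_camera(cams, (0.0, 0.0, 0.0), (0.0, -1.0, 0.0)))  # prints "north"
```

In a full system of the kind the abstract describes, this per-camera scoring would be one criterion among several (object visibility, event coverage), and a view could also be synthesized from the three-dimensional model when no single camera scores well.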