Author

Zhu Liu

Other affiliations: AT&T Labs, New York University
Bio: Zhu Liu is an academic researcher from AT&T. The author has contributed to research in topics including Metadata and TRECVID, has an h-index of 38, and has co-authored 170 publications receiving 5,230 citations. Previous affiliations of Zhu Liu include AT&T Labs and New York University.


Papers
Journal ArticleDOI
TL;DR: This work describes audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval.
Abstract: Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.
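As one concrete illustration of the segmentation step described above, the sketch below flags candidate shot boundaries by thresholding the color-histogram difference between adjacent frames. The frame source, bin count, and threshold are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: segment a video into shots by thresholding the
# color-histogram difference between adjacent frames.
import numpy as np

def frame_histogram(frame: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized per-channel color histogram of an RGB frame (H x W x 3)."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def shot_boundaries(frames, threshold: float = 0.4):
    """Return frame indices where the histogram difference exceeds the threshold."""
    boundaries = []
    prev = None
    for i, frame in enumerate(frames):
        h = frame_histogram(frame)
        if prev is not None and np.abs(h - prev).sum() > threshold:
            boundaries.append(i)
        prev = h
    return boundaries
```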

552 citations

Journal ArticleDOI
01 Oct 1998
TL;DR: A set of low-level audio features are proposed for characterizing semantic contents of short audio clips and a neural net classifier was successful in separating the above five types of TV programs.
Abstract: Understanding the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
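The intracluster/intercluster scatter analysis mentioned above can be sketched as follows. The separability criterion used here, the trace of pinv(Sw) @ Sb, is a standard choice and is only an assumption; the audio feature extraction and the neural-net classifier are omitted.

```python
# Minimal sketch of scatter-matrix-based feature-set evaluation:
# given clip-level feature vectors X and class labels y, compare
# within-class (intracluster) and between-class (intercluster) scatter.
import numpy as np

def scatter_matrices(X: np.ndarray, y: np.ndarray):
    """X: (n_samples, n_features), y: integer class labels."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class (intracluster) scatter
    Sb = np.zeros((d, d))  # between-class (intercluster) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def separability(X: np.ndarray, y: np.ndarray) -> float:
    """Larger values indicate better linear separability of the classes."""
    Sw, Sb = scatter_matrices(X, y)
    return float(np.trace(np.linalg.pinv(Sw) @ Sb))
```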

272 citations

Patent
06 Jun 2003
TL;DR: In this article, a method and system for automatically identifying and presenting video clips or other media to a user at a client device is described, and a method for updating a user profile or other persistent data store based on user feedback is presented.
Abstract: The invention relates to a method and system for automatically identifying and presenting video clips or other media to a user at a client device. One embodiment of the invention provides a method for updating a user profile or other persistent data store based on user feedback to improve the identification of video clips or other media content responsive to the user's profile. Embodiments of the invention also provide methods for processing user feedback. Related architectures are also disclosed.
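A minimal sketch of the feedback-driven profile update, assuming a simple keyword-weight profile; the weighting scheme and learning rate are illustrative and not taken from the patent.

```python
# Minimal sketch: raise the weights of keywords from clips the user liked,
# lower them otherwise, then rank candidate clips by summed profile weight.
from collections import defaultdict

def update_profile(profile: dict, clip_keywords: list[str],
                   liked: bool, rate: float = 0.1) -> dict:
    """Return an updated keyword-weight profile given one piece of feedback."""
    updated = defaultdict(float, profile)
    delta = rate if liked else -rate
    for kw in clip_keywords:
        updated[kw] += delta
    return dict(updated)

def score_clip(profile: dict, clip_keywords: list[str]) -> float:
    """Score a candidate clip against the profile (higher means more relevant)."""
    return sum(profile.get(kw, 0.0) for kw in clip_keywords)
```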

235 citations

Patent
David Crawford Gibbon, Qian Huang, Zhu Liu, Aaron E. Rosenberg, Behzad Shahraray
15 Oct 2003
TL;DR: In this paper, a system and method for automatically indexing and retrieving multimedia content is presented, which may include separating a multimedia data stream into audio, visual and text components, segmenting the audio and visual components based on semantic differences.
Abstract: The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
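The claimed pipeline can be sketched as the sequence of steps below. Every helper is a placeholder stub with hypothetical names and no real media processing, included only so the step order is explicit and the flow is runnable.

```python
# Skeleton of the indexing pipeline in the order the abstract lists the steps.
from dataclasses import dataclass

@dataclass
class MultimediaDescription:
    speaker: str
    topic: str
    summary: str

def separate_components(stream: dict):
    # Demultiplex into audio, visual, and text components.
    return stream.get("audio"), stream.get("video"), stream.get("text")

def segment_by_semantics(audio, video, text):
    # Placeholder: a real system would segment on semantic differences.
    return [{"audio": audio, "video": video, "text": text}]

def identify_target_speaker(audio, video) -> str:
    return "unknown-speaker"          # placeholder for audio-visual speaker ID

def identify_topic(text, topic_models) -> str:
    return "general"                  # placeholder for topic classification

def summarize(segments, topic, speaker) -> str:
    return f"{len(segments)} segment(s) on '{topic}' featuring {speaker}"

def index_event(stream: dict) -> MultimediaDescription:
    audio, video, text = separate_components(stream)
    segments = segment_by_semantics(audio, video, text)
    speaker = identify_target_speaker(audio, video)
    topic = identify_topic(text, topic_models=None)
    summary = summarize(segments, topic, speaker)
    return MultimediaDescription(speaker, topic, summary)
```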

215 citations

Patent
05 Jan 2005
TL;DR: In this paper, a machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system is described; the method may include storing a dataset in a database.
Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
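A minimal sketch of such a library, assuming a SQLite table whose schema, column names, and sample data are illustrative rather than the patent's.

```python
# Minimal sketch: store previously collected utterances as reusable
# components, tagged with the collection phase they came from.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reusable_components (
        id INTEGER PRIMARY KEY,
        component_type TEXT,        -- e.g. 'utterance'
        transcription TEXT,
        collection_phase TEXT       -- which data-collection phase produced it
    )
""")
conn.execute(
    "INSERT INTO reusable_components (component_type, transcription, collection_phase) "
    "VALUES (?, ?, ?)",
    ("utterance", "I want to check my account balance", "phase-1"),
)
# Reuse: pull all utterances collected during a given phase.
rows = conn.execute(
    "SELECT transcription FROM reusable_components "
    "WHERE component_type = 'utterance' AND collection_phase = ?",
    ("phase-1",),
).fetchall()
```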

177 citations


Cited by
Journal ArticleDOI

08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at the time, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

1,462 citations

Patent
Jong Hwan Kim1
13 Mar 2015
TL;DR: In this article, a mobile terminal is described that includes a body; a touchscreen provided on the front and extending to a side of the body and configured to display content; and a controller configured to detect when one side of the body comes into contact with a side of an external terminal, and to display a first area on the touchscreen corresponding to the contact area between the body and the external terminal and a second area including the content.
Abstract: A mobile terminal including a body; a touchscreen provided on the front and extending to a side of the body and configured to display content; and a controller configured to detect when one side of the body comes into contact with one side of an external terminal, display a first area on the touchscreen corresponding to the contact area between the body and the external terminal and a second area including the content, receive an input moving the content displayed in the second area to the first area, display the content in the first area, and share the content in the first area with the external terminal.

1,441 citations

Patent
01 Feb 1999
TL;DR: An adaptive interface for a programmable system is presented that predicts a desired user function based on user history, as well as machine internal status and context; a predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback.
Abstract: An adaptive interface for a programmable system, for predicting a desired user function, based on user history, as well as machine internal status and context. The apparatus receives an input from the user and other data. A predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback. Also provided is a pattern recognition system for a multimedia device, wherein a user input is matched to a video stream on a conceptual basis, allowing inexact programming of a multimedia device. The system analyzes a data stream for correspondence with a data pattern for processing and storage. The data stream is subjected to adaptive pattern recognition to extract features of interest to provide a highly compressed representation that may be efficiently processed to determine correspondence. Applications of the interface and system include a video cassette recorder (VCR), medical device, vehicle control system, audio device, environmental control system, securities trading terminal, and smart house. The system optionally includes an actuator for effecting the environment of operation, allowing closed-loop feedback operation and automated learning.
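A minimal sketch of the history-based prediction and feedback loop, assuming a simple frequency count per context; the context encoding and the update rule are illustrative assumptions, not the patent's mechanism.

```python
# Minimal sketch: predict the most frequent user function for a given
# context and reinforce or decay the counts from confirm/reject feedback.
from collections import defaultdict
from typing import Optional

class AdaptivePredictor:
    def __init__(self):
        # counts[context][function] -> observed frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, context: str) -> Optional[str]:
        """Return the historically most likely function for this context."""
        options = self.counts.get(context)
        if not options:
            return None
        return max(options, key=options.get)

    def feedback(self, context: str, function: str, confirmed: bool):
        """Reinforce confirmed predictions; decay rejected ones."""
        self.counts[context][function] += 1 if confirmed else -1
```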

1,182 citations

Journal ArticleDOI
TL;DR: This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks.
Abstract: This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and the optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
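A minimal sketch contrasting the feature-level and decision-level fusion strategies discussed in the survey; the classifiers are abstracted away, and the fusion weights are illustrative assumptions.

```python
# Minimal sketch: early fusion concatenates modality features before a
# single classifier; late fusion combines per-modality class scores.
import numpy as np

def feature_level_fusion(audio_feat: np.ndarray, visual_feat: np.ndarray) -> np.ndarray:
    """Early fusion: build one joint feature vector for a single model."""
    return np.concatenate([audio_feat, visual_feat])

def decision_level_fusion(audio_scores: np.ndarray, visual_scores: np.ndarray,
                          w_audio: float = 0.5, w_visual: float = 0.5) -> int:
    """Late fusion: weighted average of per-modality class scores, then argmax."""
    combined = w_audio * audio_scores + w_visual * visual_scores
    return int(np.argmax(combined))
```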

1,019 citations