Author

Zhu Liu

Other affiliations: AT&T Labs, New York University
Bio: Zhu Liu is an academic researcher from AT&T. The author has contributed to research in topics including Metadata and TRECVID, has an h-index of 38, and has co-authored 170 publications receiving 5,230 citations. Previous affiliations of Zhu Liu include AT&T Labs and New York University.


Papers
Journal ArticleDOI
TL;DR: This work describes audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval.
Abstract: Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.
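As one concrete illustration of the segmentation step described above, the sketch below flags candidate shot boundaries by thresholding the color-histogram difference between adjacent frames. The frame source, bin count, and threshold are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: segment a video into shots by thresholding the
# color-histogram difference between adjacent frames.
import numpy as np

def frame_histogram(frame: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized per-channel color histogram of an RGB frame (H x W x 3)."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def shot_boundaries(frames, threshold: float = 0.4):
    """Return frame indices where the histogram difference exceeds the threshold."""
    boundaries = []
    prev = None
    for i, frame in enumerate(frames):
        h = frame_histogram(frame)
        if prev is not None and np.abs(h - prev).sum() > threshold:
            boundaries.append(i)
        prev = h
    return boundaries
```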

552 citations

Journal ArticleDOI
01 Oct 1998
TL;DR: A set of low-level audio features are proposed for characterizing semantic contents of short audio clips and a neural net classifier was successful in separating the above five types of TV programs.
Abstract: Understanding the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
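The intracluster/intercluster scatter analysis mentioned above can be sketched as follows. The separability criterion used here, the trace of pinv(Sw) @ Sb, is a standard choice and is only an assumption; the audio feature extraction and the neural-net classifier are omitted.

```python
# Minimal sketch of scatter-matrix-based feature-set evaluation:
# given clip-level feature vectors X and class labels y, compare
# within-class (intracluster) and between-class (intercluster) scatter.
import numpy as np

def scatter_matrices(X: np.ndarray, y: np.ndarray):
    """X: (n_samples, n_features), y: integer class labels."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class (intracluster) scatter
    Sb = np.zeros((d, d))  # between-class (intercluster) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def separability(X: np.ndarray, y: np.ndarray) -> float:
    """Larger values indicate better linear separability of the classes."""
    Sw, Sb = scatter_matrices(X, y)
    return float(np.trace(np.linalg.pinv(Sw) @ Sb))
```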

272 citations

Patent
06 Jun 2003
TL;DR: In this article, a method and system for automatically identifying and presenting video clips or other media to a user at a client device is described, and a method for updating a user profile or other persistent data store based on user feedback is presented.
Abstract: The invention relates to a method and system for automatically identifying and presenting video clips or other media to a user at a client device. One embodiment of the invention provides a method for updating a user profile or other persistent data store based on user feedback to improve the identification of video clips or other media content responsive to the user's profile. Embodiments of the invention also provide methods for processing user feedback. Related architectures are also disclosed.
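A minimal sketch of the feedback-driven profile update, assuming a simple keyword-weight profile; the weighting scheme and learning rate are illustrative and not taken from the patent.

```python
# Minimal sketch: raise the weights of keywords from clips the user liked,
# lower them otherwise, then rank candidate clips by summed profile weight.
from collections import defaultdict

def update_profile(profile: dict, clip_keywords: list[str],
                   liked: bool, rate: float = 0.1) -> dict:
    """Return an updated keyword-weight profile given one piece of feedback."""
    updated = defaultdict(float, profile)
    delta = rate if liked else -rate
    for kw in clip_keywords:
        updated[kw] += delta
    return dict(updated)

def score_clip(profile: dict, clip_keywords: list[str]) -> float:
    """Score a candidate clip against the profile (higher means more relevant)."""
    return sum(profile.get(kw, 0.0) for kw in clip_keywords)
```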

235 citations

Patent
David Crawford Gibbon, Qian Huang, Zhu Liu, Aaron E. Rosenberg, Behzad Shahraray
15 Oct 2003
TL;DR: In this paper, a system and method for automatically indexing and retrieving multimedia content is presented, which may include separating a multimedia data stream into audio, visual and text components, segmenting the audio and visual components based on semantic differences.
Abstract: The invention provides a system and method for automatically indexing and retrieving multimedia content. The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
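The claimed pipeline can be sketched as the sequence of steps below. Every helper is a placeholder stub with hypothetical names and no real media processing, included only so the step order is explicit and the flow is runnable.

```python
# Skeleton of the indexing pipeline in the order the abstract lists the steps.
from dataclasses import dataclass

@dataclass
class MultimediaDescription:
    speaker: str
    topic: str
    summary: str

def separate_components(stream: dict):
    # Demultiplex into audio, visual, and text components.
    return stream.get("audio"), stream.get("video"), stream.get("text")

def segment_by_semantics(audio, video, text):
    # Placeholder: a real system would segment on semantic differences.
    return [{"audio": audio, "video": video, "text": text}]

def identify_target_speaker(audio, video) -> str:
    return "unknown-speaker"          # placeholder for audio-visual speaker ID

def identify_topic(text, topic_models) -> str:
    return "general"                  # placeholder for topic classification

def summarize(segments, topic, speaker) -> str:
    return f"{len(segments)} segment(s) on '{topic}' featuring {speaker}"

def index_event(stream: dict) -> MultimediaDescription:
    audio, video, text = separate_components(stream)
    segments = segment_by_semantics(audio, video, text)
    speaker = identify_target_speaker(audio, video)
    topic = identify_topic(text, topic_models=None)
    summary = summarize(segments, topic, speaker)
    return MultimediaDescription(speaker, topic, summary)
```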

215 citations

Patent
05 Jan 2005
TL;DR: In this paper, a machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system is described; the method may include storing a dataset in a database.
Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
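A minimal sketch of such a library, assuming a SQLite table whose schema, column names, and sample data are illustrative rather than the patent's.

```python
# Minimal sketch: store previously collected utterances as reusable
# components, tagged with the collection phase they came from.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reusable_components (
        id INTEGER PRIMARY KEY,
        component_type TEXT,        -- e.g. 'utterance'
        transcription TEXT,
        collection_phase TEXT       -- which data-collection phase produced it
    )
""")
conn.execute(
    "INSERT INTO reusable_components (component_type, transcription, collection_phase) "
    "VALUES (?, ?, ?)",
    ("utterance", "I want to check my account balance", "phase-1"),
)
# Reuse: pull all utterances collected during a given phase.
rows = conn.execute(
    "SELECT transcription FROM reusable_components "
    "WHERE component_type = 'utterance' AND collection_phase = ?",
    ("phase-1",),
).fetchall()
```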

177 citations


Cited by
Journal ArticleDOI

08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at the time, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

1,462 citations

Patent
Jong Hwan Kim1
13 Mar 2015
TL;DR: In this article, a mobile terminal is described that includes a body; a touchscreen provided on the front and extending to a side of the body and configured to display content; and a controller configured to detect when one side of the body comes into contact with a side of an external terminal, and to display a first area on the touchscreen corresponding to the contact area between the body and the external terminal and a second area including the content.
Abstract: A mobile terminal including a body; a touchscreen provided on the front and extending to a side of the body and configured to display content; and a controller configured to detect when one side of the body comes into contact with one side of an external terminal, display a first area on the touchscreen corresponding to the contact area between the body and the external terminal and a second area including the content, receive an input moving the content displayed in the second area to the first area, display the content in the first area, and share the content in the first area with the external terminal.

1,441 citations

Patent
01 Feb 1999
TL;DR: An adaptive interface for a programmable system is presented that predicts a desired user function based on user history, as well as machine internal status and context; a predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback.
Abstract: An adaptive interface for a programmable system, for predicting a desired user function, based on user history, as well as machine internal status and context. The apparatus receives an input from the user and other data. A predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback. Also provided is a pattern recognition system for a multimedia device, wherein a user input is matched to a video stream on a conceptual basis, allowing inexact programming of a multimedia device. The system analyzes a data stream for correspondence with a data pattern for processing and storage. The data stream is subjected to adaptive pattern recognition to extract features of interest to provide a highly compressed representation that may be efficiently processed to determine correspondence. Applications of the interface and system include a video cassette recorder (VCR), medical device, vehicle control system, audio device, environmental control system, securities trading terminal, and smart house. The system optionally includes an actuator for effecting the environment of operation, allowing closed-loop feedback operation and automated learning.
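A minimal sketch of the history-based prediction and feedback loop, assuming a simple frequency count per context; the context encoding and the update rule are illustrative assumptions, not the patent's mechanism.

```python
# Minimal sketch: predict the most frequent user function for a given
# context and reinforce or decay the counts from confirm/reject feedback.
from collections import defaultdict
from typing import Optional

class AdaptivePredictor:
    def __init__(self):
        # counts[context][function] -> observed frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, context: str) -> Optional[str]:
        """Return the historically most likely function for this context."""
        options = self.counts.get(context)
        if not options:
            return None
        return max(options, key=options.get)

    def feedback(self, context: str, function: str, confirmed: bool):
        """Reinforce confirmed predictions; decay rejected ones."""
        self.counts[context][function] += 1 if confirmed else -1
```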

1,182 citations

Journal ArticleDOI
TL;DR: This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks.
Abstract: This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process, such as the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and the optimal modality selection, are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
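A minimal sketch contrasting the feature-level and decision-level fusion strategies discussed in the survey; the classifiers are abstracted away, and the fusion weights are illustrative assumptions.

```python
# Minimal sketch: early fusion concatenates modality features before a
# single classifier; late fusion combines per-modality class scores.
import numpy as np

def feature_level_fusion(audio_feat: np.ndarray, visual_feat: np.ndarray) -> np.ndarray:
    """Early fusion: build one joint feature vector for a single model."""
    return np.concatenate([audio_feat, visual_feat])

def decision_level_fusion(audio_scores: np.ndarray, visual_scores: np.ndarray,
                          w_audio: float = 0.5, w_visual: float = 0.5) -> int:
    """Late fusion: weighted average of per-modality class scores, then argmax."""
    combined = w_audio * audio_scores + w_visual * visual_scores
    return int(np.argmax(combined))
```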

1,019 citations