Author

Xiaodan Zhu

Bio: Xiaodan Zhu is an academic researcher from the University of Toronto. The author has contributed to research on topics including automatic summarization and spoken language, has an h-index of 5, and has co-authored 6 publications receiving 92 citations.

Papers
Proceedings ArticleDOI
02 Aug 2009
TL;DR: This model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies.
Abstract: This paper presents a model for summarizing multiple untranscribed spoken documents. Without assuming the availability of transcripts, the model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies. This model is of interest due to its independence from spoken language transcription, an error-prone and resource-intensive process; its ability to integrate multiple sources of information on the same topic; and its novel use of acoustic patterns, which extends previous work on low-level prosodic feature detection. We compare the performance of this model with that achieved using manual and automatic transcripts, and find that this new approach is roughly equivalent to having access to ASR transcripts with word error rates in the 33-37% range without actually having to do the ASR, and it better handles utterances with out-of-vocabulary words.

46 citations
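
To make the selection step above concrete, here is a minimal sketch (not the paper's exact formulation) of similarity-driven extractive summarization: utterances that resemble many others are treated as salient, and an MMR-style penalty suppresses redundancy. The similarity matrix is a stand-in for similarities estimated from re-occurring acoustic patterns.

```python
import numpy as np

def summarize(sim, budget, redundancy_penalty=0.7):
    """Pick salient, non-redundant utterances given pairwise similarities.

    sim    : (n, n) symmetric matrix of utterance similarities, standing in
             for similarities estimated from re-occurring acoustic patterns.
    budget : number of utterances to select.
    """
    n = sim.shape[0]
    salience = sim.sum(axis=1)           # central utterances resemble many others
    selected = []
    for _ in range(budget):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # MMR-style score: prefer salient utterances that do not
            # resemble anything already selected
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            score = salience[i] - redundancy_penalty * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)

# Toy example: utterances 1 and 2 are near-duplicates.
sim = np.array([[1.0, 0.2, 0.1, 0.3],
                [0.2, 1.0, 0.9, 0.4],
                [0.1, 0.9, 1.0, 0.4],
                [0.3, 0.4, 0.4, 1.0]])
print(summarize(sim, budget=2))          # [1, 3]: keeps 1, drops its duplicate 2
```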

Proceedings ArticleDOI
04 Jun 2006
TL;DR: It is found that speech disfluencies, which have been removed as noise in previous work, help identify important utterances, while the structural feature is less effective than it is in broadcast news.
Abstract: This paper is concerned with the summarization of spontaneous conversations. Compared with broadcast news, which has received intensive study, spontaneous conversations have been less addressed in the literature. Previous work has focused on textual features extracted from transcripts. This paper explores and compares the effectiveness of both textual features and speech-related features. The experiments show that these features incrementally improve summarization performance. We also find that speech disfluencies, which have been removed as noise in previous work, help identify important utterances, while the structural feature is less effective than it is in broadcast news.

22 citations
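
A hedged sketch of the feature-combination idea: each utterance gets a vector of textual and speech-related features, with a disfluency count kept as a signal rather than stripped as noise, and a classifier labels utterances as summary-worthy. The feature names and values below are hypothetical, not the paper's actual feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-utterance feature vectors:
# [tf-idf salience, length in words, pitch range, speaking-rate deviation,
#  disfluency count]; disfluencies are kept as a feature, not stripped as noise.
X = np.array([
    [0.8, 25, 0.6, 0.1, 3],   # long, content-rich, disfluent utterance
    [0.1,  4, 0.2, 0.0, 0],   # short backchannel ("uh-huh")
    [0.7, 18, 0.5, 0.2, 2],
    [0.2,  6, 0.1, 0.1, 0],
])
y = np.array([1, 0, 1, 0])    # 1 = utterance belongs in the summary

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))          # extractive summary = utterances predicted 1
```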

Proceedings ArticleDOI
02 Aug 2009
TL;DR: It is demonstrated that transformation-based learning can be used to correct noisy speech recognition transcripts in the lecture domain with an average word error rate reduction of 12.9%.
Abstract: We demonstrate that transformation-based learning can be used to correct noisy speech recognition transcripts in the lecture domain with an average word error rate reduction of 12.9%. Our method is distinguished from earlier related work by its robustness to small amounts of training data, and its resulting efficiency, in spite of its use of true word error rate computations as a rule scoring function.

12 citations
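
The following toy sketch illustrates transformation-based learning for transcript correction under strong simplifications: rules here are bare word substitutions (the paper uses contextual rule templates), and the scoring function is a substitution-only error count standing in for true word error rate.

```python
def errors(hyp, ref):
    # substitution-only error count; a stand-in for true word error rate
    return sum(h != r for h, r in zip(hyp, ref))

def apply_rule(rule, hyps):
    wrong, right = rule
    return [[right if w == wrong else w for w in hyp] for hyp in hyps]

def learn_rules(hyps, refs, max_rules=5):
    rules = []
    for _ in range(max_rules):
        # candidate rules come from observed hypothesis/reference mismatches
        candidates = {(h, r) for hyp, ref in zip(hyps, refs)
                      for h, r in zip(hyp, ref) if h != r}
        def gain(rule):
            before = sum(errors(h, r) for h, r in zip(hyps, refs))
            after = sum(errors(f, r) for f, r in zip(apply_rule(rule, hyps), refs))
            return before - after
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) <= 0:
            break                          # no rule reduces error any further
        rules.append(best)
        hyps = apply_rule(best, hyps)      # greedily commit to the best rule
    return rules

hyps = [["the", "lectern", "starts", "at", "noon"],
        ["the", "lectern", "is", "short"]]
refs = [["the", "lecture", "starts", "at", "noon"],
        ["the", "lecture", "is", "short"]]
print(learn_rules(hyps, refs))             # [('lectern', 'lecture')]
```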

Proceedings ArticleDOI
09 Jul 2006
TL;DR: Experiments show that the use of speech-related features improves summarization performance, and the effectiveness of individual features is examined and compared.
Abstract: To identify important utterances from open-domain spontaneous conversations, previous work has focused on using textual features that are extracted from transcripts, e.g., word frequencies and noun senses. In this paper, we summarize spontaneous conversations with features of a wide variety that have not been explored before. Experiments show that the use of speech-related features improves summarization performance. In addition, the effectiveness of individual features is examined and compared.

9 citations
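
One common way to examine and compare the effectiveness of individual features is leave-one-feature-out ablation; the sketch below illustrates the procedure on synthetic data with hypothetical feature names, not the paper's actual features or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# hypothetical features: [word-frequency score, pitch range, pause duration]
X = rng.normal(size=(n, 3))
y = (0.9 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

full = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"all features: accuracy {full:.2f}")
for i, name in enumerate(["word frequency", "pitch range", "pause duration"]):
    ablated = np.delete(X, i, axis=1)      # drop one feature at a time
    score = cross_val_score(LogisticRegression(), ablated, y, cv=5).mean()
    print(f"without {name}: accuracy {score:.2f}")
```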

Proceedings ArticleDOI
22 Sep 2008
TL;DR: This paper incorporates relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words, and trains latent Dirichlet allocation models on these materials and measures the similarity between slides and transcripts in the acquired hidden-topic space.
Abstract: This paper studies automatic detection of topic transitions for recorded presentations. This can be achieved by directly matching slide content against presentation transcripts with some similarity metric. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. In this paper, we incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matching. Experiments show that the proposed approach reduces the errors in slide transition detection by 17-41% on manual transcripts and 27-37% on automatic transcripts.

6 citations
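
A minimal sketch of the hidden-topic matching described above, using gensim: an LDA model trained on written material (here, a toy stand-in for a textbook) maps slides and transcript utterances into topic space, and the topic-space cosine is interpolated with a crude literal word-overlap score. The corpus, similarity functions, and interpolation weight are all assumptions for illustration.

```python
from gensim import corpora, models, matutils

# Toy stand-in for "relevant written materials" such as a course textbook;
# a real system would train the LDA model on much larger text.
textbook = [
    "bayes theorem computes the posterior probability from prior and likelihood",
    "the posterior distribution updates the prior with observed evidence",
    "gradient descent minimizes a loss function by following its gradient",
    "the learning rate controls the step size of gradient descent updates",
]
docs = [d.split() for d in textbook]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)

def topic_sim(a, b):
    # cosine similarity in the acquired hidden-topic space
    return matutils.cossim(lda[dictionary.doc2bow(a.split())],
                           lda[dictionary.doc2bow(b.split())])

def literal_sim(a, b):
    # crude literal matching: word overlap (Jaccard)
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

slide = "bayes theorem and the posterior probability"
utterance = "so the prior gets updated into the posterior distribution"
alpha = 0.5                    # hypothetical interpolation weight
print(alpha * topic_sim(slide, utterance)
      + (1 - alpha) * literal_sim(slide, utterance))
```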


Cited by
Book
27 Jun 2011
TL;DR: The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.
Abstract: It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain- and genre-specific summarization, and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field. Full text available at: http://dx.doi.org/10.1561/1500000015

697 citations

Journal ArticleDOI
TL;DR: It is argued that multimodal learning analytics can offer new insights into students’ learning trajectories in more complex and open-ended learning environments, and examples of its educational application are presented.
Abstract: New high-frequency multimodal data collection technologies and machine learning analysis techniques could offer new insights into learning, especially when students have the opportunity to generate unique, personalized artifacts, such as computer programs, robots, and solutions to engineering challenges. To date, most of the work on learning analytics and educational data mining has focused on online courses and cognitive tutors, both of which provide a high degree of structure to the tasks and are restricted to interactions that occur in front of a computer screen. In this paper, we argue that multimodal learning analytics can offer new insights into students’ learning trajectories in more complex and open-ended learning environments. We present several examples of this work and its educational application.

269 citations

Posted Content
TL;DR: This paper presents a novel and realistic method for speeding up the training time of a transformation-based learner without sacrificing performance, and shows that the system achieves a significant improvement in training time while matching the performance of a standard transformation-based learner.
Abstract: Transformation-based learning has been successfully employed to solve many natural language processing problems. It achieves state-of-the-art performance on many natural language processing tasks and does not overtrain easily. However, it does have a serious drawback: the training time is often intolerably long, especially on the large corpora which are often used in NLP. In this paper, we present a novel and realistic method for speeding up the training time of a transformation-based learner without sacrificing performance. The paper compares and contrasts the training time needed and performance achieved by our modified learner with two other systems: a standard transformation-based learner, and the ICA system (Hepple, 2000). The results of these experiments show that our system is able to achieve a significant improvement in training time while still achieving the same performance as a standard transformation-based learner. This is a valuable contribution to systems and algorithms which utilize transformation-based learning at any stage of execution.

220 citations
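
The core of such speedups is avoiding a full rescoring pass after each rule application. In the toy sketch below, where a rule is a bare tag rewrite, only rules that read the tags just rewritten can change score, so every other rule keeps its cached value. This illustrates the caching idea generically, not this paper's specific algorithm.

```python
def score(tags, gold, rule):
    # expensive part: net error reduction from applying the rule everywhere
    frm, to = rule
    fixed = [to if t == frm else t for t in tags]
    return sum(f == g for f, g in zip(fixed, gold)) - \
           sum(t == g for t, g in zip(tags, gold))

def train(tags, gold, rules, max_rules=10):
    cache = {r: score(tags, gold, r) for r in rules}   # one full pass up front
    learned = []
    for _ in range(max_rules):
        best = max(cache, key=cache.get)
        if cache[best] <= 0:
            break
        learned.append(best)
        tags = [best[1] if t == best[0] else t for t in tags]
        # cheap pass: only rules that read the tags just rewritten can have
        # changed; every other cached score is still exact
        for r in rules:
            if r[0] in (best[0], best[1]):
                cache[r] = score(tags, gold, r)
    return learned

tags = ["NN", "VB", "NN", "DT"]
gold = ["VB", "VB", "NN", "NN"]
rules = [(a, b) for a in set(tags) for b in set(gold) if a != b]
print(train(tags, gold, rules))            # [('DT', 'NN')]
```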

Journal ArticleDOI
TL;DR: This overview article is intended to provide a thorough overview of the concepts, principles, approaches, and achievements of major technical contributions along this line of investigation.
Abstract: Spoken content retrieval refers to directly indexing and retrieving spoken content based on the audio rather than text descriptions. This potentially eliminates the requirement of producing text descriptions for multimedia content for indexing and retrieval purposes, and is able to precisely locate the exact time the desired information appears in the multimedia. Spoken content retrieval has been very successfully achieved with the basic approach of cascading automatic speech recognition (ASR) with text information retrieval: after the spoken content is transcribed into text or lattice format, a text retrieval engine searches over the ASR output to find desired information. This framework works well when the ASR accuracy is relatively high, but becomes less adequate when more challenging real-world scenarios are considered, since retrieval performance depends heavily on ASR accuracy. This challenge leads to the emergence of another approach to spoken content retrieval: to go beyond the basic framework of cascading ASR with text retrieval in order to have retrieval performances that are less dependent on ASR accuracy. This overview article is intended to provide a thorough overview of the concepts, principles, approaches, and achievements of major technical contributions along this line of investigation. This includes five major directions:
1) Modified ASR for retrieval purposes: cascading ASR with text retrieval, but with the ASR modified or optimized for spoken content retrieval purposes;
2) Exploiting information not present in ASR outputs: trying to utilize the information in speech signals that is inevitably lost when transcribed into phonemes and words;
3) Directly matching at the acoustic level without ASR: for spoken queries, the signals can be directly matched at the acoustic level, rather than at the phoneme or word levels, bypassing all ASR issues;
4) Semantic retrieval of spoken content: trying to retrieve spoken content that is semantically related to the query, but not necessarily including the query terms themselves;
5) Interactive retrieval and efficient presentation of the retrieved objects: with efficient presentation of the retrieved objects, an interactive retrieval process incorporating user actions may produce better retrieval results and user experiences.

117 citations
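
Direction 3 above (matching at the acoustic level without ASR) is often illustrated with dynamic time warping over frame-level acoustic features; the sketch below uses random vectors as stand-ins for real MFCC frames.

```python
import numpy as np

def dtw_cost(q, d):
    """Dynamic time warping cost between two feature sequences (frames x dims).

    A small alignment cost means the spoken query q acoustically resembles
    the document segment d, with no ASR in the loop.
    """
    nq, nd = len(q), len(d)
    D = np.full((nq + 1, nd + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nq + 1):
        for j in range(1, nd + 1):
            cost = np.linalg.norm(q[i - 1] - d[j - 1])    # frame-level distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[nq, nd] / (nq + nd)                           # length-normalized

# Stand-ins for MFCC frames: the "match" segment is a time-stretched,
# lightly perturbed copy of the query; "other" is unrelated.
rng = np.random.default_rng(1)
query = rng.normal(size=(20, 13))
match = np.repeat(query, 2, axis=0) + rng.normal(scale=0.05, size=(40, 13))
other = rng.normal(size=(40, 13))
print(dtw_cost(query, match) < dtw_cost(query, other))     # True
```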