Author

Xiaodan Zhu

Bio: Xiaodan Zhu is an academic researcher from the University of Toronto. The author has contributed to research on topics including automatic summarization and spoken language, has an h-index of 5, and has co-authored 6 publications receiving 92 citations.

Papers
Proceedings ArticleDOI
02 Aug 2009
TL;DR: This model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies.
Abstract: This paper presents a model for summarizing multiple untranscribed spoken documents. Without assuming the availability of transcripts, the model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies. This model is of interest due to its independence from spoken language transcription, an error-prone and resource-intensive process; its ability to integrate multiple sources of information on the same topic; and its novel use of acoustic patterns, which extends previous work on low-level prosodic feature detection. We compare the performance of this model with that achieved using manual and automatic transcripts, and find that this new approach is roughly equivalent to having access to ASR transcripts with word error rates in the 33-37% range without actually having to do the ASR, and it better handles utterances with out-of-vocabulary words.

46 citations
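
To make the selection step above concrete, here is a minimal sketch (not the paper's exact formulation) of similarity-driven extractive summarization: utterances that resemble many others are treated as salient, and an MMR-style penalty suppresses redundancy. The similarity matrix is a stand-in for similarities estimated from re-occurring acoustic patterns.

```python
import numpy as np

def summarize(sim, budget, redundancy_penalty=0.7):
    """Pick salient, non-redundant utterances given pairwise similarities.

    sim    : (n, n) symmetric matrix of utterance similarities, standing in
             for similarities estimated from re-occurring acoustic patterns.
    budget : number of utterances to select.
    """
    n = sim.shape[0]
    salience = sim.sum(axis=1)           # central utterances resemble many others
    selected = []
    for _ in range(budget):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # MMR-style score: prefer salient utterances that do not
            # resemble anything already selected
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            score = salience[i] - redundancy_penalty * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)

# Toy example: utterances 1 and 2 are near-duplicates.
sim = np.array([[1.0, 0.2, 0.1, 0.3],
                [0.2, 1.0, 0.9, 0.4],
                [0.1, 0.9, 1.0, 0.4],
                [0.3, 0.4, 0.4, 1.0]])
print(summarize(sim, budget=2))          # [1, 3]: keeps 1, drops its duplicate 2
```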

Proceedings ArticleDOI
04 Jun 2006
TL;DR: It is found that speech disfluencies, which have been removed as noise in previous work, help identify important utterances, while the structural feature is less effective than it is in broadcast news.
Abstract: This paper is concerned with the summarization of spontaneous conversations. Compared with broadcast news, which has received intensive study, spontaneous conversations have been less addressed in the literature. Previous work has focused on textual features extracted from transcripts. This paper explores and compares the effectiveness of both textual features and speech-related features. The experiments show that these features incrementally improve summarization performance. We also find that speech disfluencies, which have been removed as noise in previous work, help identify important utterances, while the structural feature is less effective than it is in broadcast news.

22 citations
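
A hedged sketch of the feature-combination idea: each utterance gets a vector of textual and speech-related features, with a disfluency count kept as a signal rather than stripped as noise, and a classifier labels utterances as summary-worthy. The feature names and values below are hypothetical, not the paper's actual feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-utterance feature vectors:
# [tf-idf salience, length in words, pitch range, speaking-rate deviation,
#  disfluency count]; disfluencies are kept as a feature, not stripped as noise.
X = np.array([
    [0.8, 25, 0.6, 0.1, 3],   # long, content-rich, disfluent utterance
    [0.1,  4, 0.2, 0.0, 0],   # short backchannel ("uh-huh")
    [0.7, 18, 0.5, 0.2, 2],
    [0.2,  6, 0.1, 0.1, 0],
])
y = np.array([1, 0, 1, 0])    # 1 = utterance belongs in the summary

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))          # extractive summary = utterances predicted 1
```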

Proceedings ArticleDOI
02 Aug 2009
TL;DR: It is demonstrated that transformation-based learning can be used to correct noisy speech recognition transcripts in the lecture domain with an average word error rate reduction of 12.9%.
Abstract: We demonstrate that transformation-based learning can be used to correct noisy speech recognition transcripts in the lecture domain with an average word error rate reduction of 12.9%. Our method is distinguished from earlier related work by its robustness to small amounts of training data, and its resulting efficiency, in spite of its use of true word error rate computations as a rule scoring function.

12 citations
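
The following toy sketch illustrates transformation-based learning for transcript correction under strong simplifications: rules here are bare word substitutions (the paper uses contextual rule templates), and the scoring function is a substitution-only error count standing in for true word error rate.

```python
def errors(hyp, ref):
    # substitution-only error count; a stand-in for true word error rate
    return sum(h != r for h, r in zip(hyp, ref))

def apply_rule(rule, hyps):
    wrong, right = rule
    return [[right if w == wrong else w for w in hyp] for hyp in hyps]

def learn_rules(hyps, refs, max_rules=5):
    rules = []
    for _ in range(max_rules):
        # candidate rules come from observed hypothesis/reference mismatches
        candidates = {(h, r) for hyp, ref in zip(hyps, refs)
                      for h, r in zip(hyp, ref) if h != r}
        def gain(rule):
            before = sum(errors(h, r) for h, r in zip(hyps, refs))
            after = sum(errors(f, r) for f, r in zip(apply_rule(rule, hyps), refs))
            return before - after
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) <= 0:
            break                          # no rule reduces error any further
        rules.append(best)
        hyps = apply_rule(best, hyps)      # greedily commit to the best rule
    return rules

hyps = [["the", "lectern", "starts", "at", "noon"],
        ["the", "lectern", "is", "short"]]
refs = [["the", "lecture", "starts", "at", "noon"],
        ["the", "lecture", "is", "short"]]
print(learn_rules(hyps, refs))             # [('lectern', 'lecture')]
```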

Proceedings ArticleDOI
09 Jul 2006
TL;DR: Experiments show that the use of speech-related features improves summarization performance, and the effectiveness of individual features is examined and compared.
Abstract: To identify important utterances from open-domain spontaneous conversations, previous work has focused on using textual features that are extracted from transcripts, e.g., word frequencies and noun senses. In this paper, we summarize spontaneous conversations with features of a wide variety that have not been explored before. Experiments show that the use of speech-related features improves summarization performance. In addition, the effectiveness of individual features is examined and compared.

9 citations
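
One common way to examine and compare the effectiveness of individual features is leave-one-feature-out ablation; the sketch below illustrates the procedure on synthetic data with hypothetical feature names, not the paper's actual features or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# hypothetical features: [word-frequency score, pitch range, pause duration]
X = rng.normal(size=(n, 3))
y = (0.9 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

full = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"all features: accuracy {full:.2f}")
for i, name in enumerate(["word frequency", "pitch range", "pause duration"]):
    ablated = np.delete(X, i, axis=1)      # drop one feature at a time
    score = cross_val_score(LogisticRegression(), ablated, y, cv=5).mean()
    print(f"without {name}: accuracy {score:.2f}")
```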

Proceedings ArticleDOI
22 Sep 2008
TL;DR: This paper incorporates relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words, and trains latent Dirichlet allocation models on these materials and measures the similarity between slides and transcripts in the acquired hidden-topic space.
Abstract: This paper studies automatic detection of topic transitions for recorded presentations. This can be achieved by directly matching slide content against presentation transcripts with some similarity metric. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. In this paper, we incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matching. Experiments show that the proposed approach reduces the errors in slide transition detection by 17-41% on manual transcripts and 27-37% on automatic transcripts.

6 citations
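
A minimal sketch of the hidden-topic matching described above, using gensim: an LDA model trained on written material (here, a toy stand-in for a textbook) maps slides and transcript utterances into topic space, and the topic-space cosine is interpolated with a crude literal word-overlap score. The corpus, similarity functions, and interpolation weight are all assumptions for illustration.

```python
from gensim import corpora, models, matutils

# Toy stand-in for "relevant written materials" such as a course textbook;
# a real system would train the LDA model on much larger text.
textbook = [
    "bayes theorem computes the posterior probability from prior and likelihood",
    "the posterior distribution updates the prior with observed evidence",
    "gradient descent minimizes a loss function by following its gradient",
    "the learning rate controls the step size of gradient descent updates",
]
docs = [d.split() for d in textbook]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)

def topic_sim(a, b):
    # cosine similarity in the acquired hidden-topic space
    return matutils.cossim(lda[dictionary.doc2bow(a.split())],
                           lda[dictionary.doc2bow(b.split())])

def literal_sim(a, b):
    # crude literal matching: word overlap (Jaccard)
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

slide = "bayes theorem and the posterior probability"
utterance = "so the prior gets updated into the posterior distribution"
alpha = 0.5                    # hypothetical interpolation weight
print(alpha * topic_sim(slide, utterance)
      + (1 - alpha) * literal_sim(slide, utterance))
```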


Cited by
Book
27 Jun 2011
TL;DR: The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.
Abstract: It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain- and genre-specific summarization, and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field. Full text available at: http://dx.doi.org/10.1561/1500000015

697 citations

Journal ArticleDOI
TL;DR: It is argued that multimodal learning analytics can offer new insights into students’ learning trajectories in more complex and open-ended learning environments, and examples of its educational application are presented.
Abstract: New high-frequency multimodal data collection technologies and machine learning analysis techniques could offer new insights into learning, especially when students have the opportunity to generate unique, personalized artifacts, such as computer programs, robots, and solutions to engineering challenges. To date, most of the work on learning analytics and educational data mining has focused on online courses and cognitive tutors, both of which provide a high degree of structure to the tasks and are restricted to interactions that occur in front of a computer screen. In this paper, we argue that multimodal learning analytics can offer new insights into students’ learning trajectories in more complex and open-ended learning environments. We present several examples of this work and its educational application.

269 citations

Posted Content
TL;DR: This paper presents a novel and realistic method for speeding up the training time of a transformation-based learner without sacrificing performance, and shows that the system achieves a significant improvement in training time while matching the performance of a standard transformation-based learner.
Abstract: Transformation-based learning has been successfully employed to solve many natural language processing problems. It achieves state-of-the-art performance on many natural language processing tasks and does not overtrain easily. However, it does have a serious drawback: the training time is often intolerably long, especially on the large corpora which are often used in NLP. In this paper, we present a novel and realistic method for speeding up the training time of a transformation-based learner without sacrificing performance. The paper compares and contrasts the training time needed and performance achieved by our modified learner with two other systems: a standard transformation-based learner, and the ICA system (Hepple, 2000). The results of these experiments show that our system is able to achieve a significant improvement in training time while still achieving the same performance as a standard transformation-based learner. This is a valuable contribution to systems and algorithms which utilize transformation-based learning at any stage of execution.

220 citations
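
The core of such speedups is avoiding a full rescoring pass after each rule application. In the toy sketch below, where a rule is a bare tag rewrite, only rules that read the tags just rewritten can change score, so every other rule keeps its cached value. This illustrates the caching idea generically, not this paper's specific algorithm.

```python
def score(tags, gold, rule):
    # expensive part: net error reduction from applying the rule everywhere
    frm, to = rule
    fixed = [to if t == frm else t for t in tags]
    return sum(f == g for f, g in zip(fixed, gold)) - \
           sum(t == g for t, g in zip(tags, gold))

def train(tags, gold, rules, max_rules=10):
    cache = {r: score(tags, gold, r) for r in rules}   # one full pass up front
    learned = []
    for _ in range(max_rules):
        best = max(cache, key=cache.get)
        if cache[best] <= 0:
            break
        learned.append(best)
        tags = [best[1] if t == best[0] else t for t in tags]
        # cheap pass: only rules that read the tags just rewritten can have
        # changed; every other cached score is still exact
        for r in rules:
            if r[0] in (best[0], best[1]):
                cache[r] = score(tags, gold, r)
    return learned

tags = ["NN", "VB", "NN", "DT"]
gold = ["VB", "VB", "NN", "NN"]
rules = [(a, b) for a in set(tags) for b in set(gold) if a != b]
print(train(tags, gold, rules))            # [('DT', 'NN')]
```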

Journal ArticleDOI
TL;DR: This overview article is intended to provide a thorough overview of the concepts, principles, approaches, and achievements of major technical contributions along this line of investigation.
Abstract: Spoken content retrieval refers to directly indexing and retrieving spoken content based on the audio rather than text descriptions. This potentially eliminates the requirement of producing text descriptions for multimedia content for indexing and retrieval purposes, and is able to precisely locate the exact time the desired information appears in the multimedia. Spoken content retrieval has been very successfully achieved with the basic approach of cascading automatic speech recognition (ASR) with text information retrieval: after the spoken content is transcribed into text or lattice format, a text retrieval engine searches over the ASR output to find desired information. This framework works well when the ASR accuracy is relatively high, but becomes less adequate when more challenging real-world scenarios are considered, since retrieval performance depends heavily on ASR accuracy. This challenge leads to the emergence of another approach to spoken content retrieval: to go beyond the basic framework of cascading ASR with text retrieval in order to have retrieval performances that are less dependent on ASR accuracy. This overview article is intended to provide a thorough overview of the concepts, principles, approaches, and achievements of major technical contributions along this line of investigation. This includes five major directions:
1) Modified ASR for retrieval purposes: cascading ASR with text retrieval, but with the ASR modified or optimized for spoken content retrieval purposes;
2) Exploiting information not present in ASR outputs: trying to utilize the information in speech signals that is inevitably lost when transcribed into phonemes and words;
3) Directly matching at the acoustic level without ASR: for spoken queries, the signals can be directly matched at the acoustic level, rather than at the phoneme or word levels, bypassing all ASR issues;
4) Semantic retrieval of spoken content: trying to retrieve spoken content that is semantically related to the query, but not necessarily including the query terms themselves;
5) Interactive retrieval and efficient presentation of the retrieved objects: with efficient presentation of the retrieved objects, an interactive retrieval process incorporating user actions may produce better retrieval results and user experiences.

117 citations
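
Direction 3 above (matching at the acoustic level without ASR) is often illustrated with dynamic time warping over frame-level acoustic features; the sketch below uses random vectors as stand-ins for real MFCC frames.

```python
import numpy as np

def dtw_cost(q, d):
    """Dynamic time warping cost between two feature sequences (frames x dims).

    A small alignment cost means the spoken query q acoustically resembles
    the document segment d, with no ASR in the loop.
    """
    nq, nd = len(q), len(d)
    D = np.full((nq + 1, nd + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nq + 1):
        for j in range(1, nd + 1):
            cost = np.linalg.norm(q[i - 1] - d[j - 1])    # frame-level distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[nq, nd] / (nq + nd)                           # length-normalized

# Stand-ins for MFCC frames: the "match" segment is a time-stretched,
# lightly perturbed copy of the query; "other" is unrelated.
rng = np.random.default_rng(1)
query = rng.normal(size=(20, 13))
match = np.repeat(query, 2, axis=0) + rng.normal(scale=0.05, size=(40, 13))
other = rng.normal(size=(40, 13))
print(dtw_cost(query, match) < dtw_cost(query, other))     # True
```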