Open Access Proceedings Article

Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information

TL;DR: The use of probabilistic latent topical information for extractive summarization of spoken documents is proposed, and its summarization capabilities are verified by comparison with the conventional vector space model, the latent semantic indexing model, and a hidden Markov model (HMM).
Abstract
The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we propose the use of probabilistic latent topical information for extractive summarization of spoken documents. Various modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model (VSM), the latent semantic indexing (LSI) model, and the hidden Markov model (HMM). The experiments were performed on Chinese broadcast news collected in Taiwan, and noticeable performance gains were obtained.
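The abstract describes the pipeline only at a high level. As a minimal sketch, assuming the "probabilistic latent topical information" takes the form of a PLSA-style topic mixture per sentence (an assumption; the paper says several modeling structures were investigated), each sentence S can be scored by how likely its topic mixture is to generate the whole document D: log P(D|S) = sum over w in D of log sum_k P(w|T_k) P(T_k|S). Here the topic-word distributions P(w|T_k) are taken as already trained, and P(T_k|S) is estimated by EM on the sentence's own words. Sentences are then kept in rank order until the budget set by the summarization ratio is reached and re-sequenced in document order. The names `fold_in_topic_weights`, `topical_score`, `summarize`, the `FLOOR` constant, and the toy topics are hypothetical.

```python
import math
from collections import Counter

FLOOR = 1e-9  # hypothetical floor for words a topic never generates


def fold_in_topic_weights(sentence, p_w_given_t, iters=30):
    """Estimate P(T_k | S) for one sentence by EM, keeping the topic-word
    distributions P(w | T_k) fixed (PLSA-style folding-in)."""
    k = len(p_w_given_t)
    weights = [1.0 / k] * k  # start from a uniform topic mixture
    counts = Counter(sentence)
    if not counts:
        return weights  # empty sentence: nothing to estimate from
    for _ in range(iters):
        expected = [0.0] * k
        for w, c in counts.items():
            # E-step: posterior P(T_k | w, S), weighted by the count of w
            joint = [weights[t] * p_w_given_t[t].get(w, FLOOR) for t in range(k)]
            z = sum(joint)
            for t in range(k):
                expected[t] += c * joint[t] / z
        # M-step: renormalise expected topic counts into a new mixture
        total = sum(expected)
        weights = [x / total for x in expected]
    return weights


def topical_score(sentence, document, p_w_given_t):
    """log P(D | S) = sum over document words w of log sum_k P(w|T_k) P(T_k|S)."""
    weights = fold_in_topic_weights(sentence, p_w_given_t)
    return sum(
        math.log(sum(wt * p_w_given_t[t].get(w, FLOOR) for t, wt in enumerate(weights)))
        for w in document
    )


def summarize(sentences, p_w_given_t, ratio=0.3):
    """Rank sentences by topical_score, keep them while the summary stays
    within `ratio` of the document length (in words), and return the chosen
    indices in original document order."""
    document = [w for s in sentences for w in s]
    budget = ratio * len(document)
    scores = [topical_score(s, document, p_w_given_t) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if chosen and used + len(sentences[i]) > budget:
            continue  # would overshoot the target ratio; try the next one
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)  # re-sequence in original document order


# Toy usage with two hypothetical, pre-trained topics.
topics = [
    {"goal": 0.5, "match": 0.5},    # hypothetical "sports" topic
    {"stock": 0.5, "market": 0.5},  # hypothetical "finance" topic
]
sents = [["goal", "match", "goal"], ["stock", "market"],
         ["stock", "market", "stock"], ["market", "stock"]]
print(summarize(sents, topics, ratio=0.3))
```

Because every sentence is scored against the same set of document words, the log-likelihoods are directly comparable across sentences of different lengths. Measuring the ratio in words is a simplification chosen for the sketch.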

Citations
Proceedings Article

Speech Summarization Without Lexical Features for Mandarin Broadcast News

Jian Zhang and one co-author
TL;DR: It is shown that structural features are superior to lexical features and that the summarizer performs surprisingly well using only acoustic features, reaching an average F-measure of 0.3914; this makes it possible to summarize speech without placing a stringent demand on speech recognition accuracy.
Proceedings Article

A Comparative Study on Speech Summarization of Broadcast News and Lecture Speech

TL;DR: It is found that acoustic and structural features are more important for Broadcast News summarization due to the speaking styles of anchors and reporters, as well as typical news story flow.
Proceedings Article

Improving lecture speech summarization using rhetorical information

TL;DR: It is shown that, despite a 29.7% character error rate in speech recognition, extractive summarization performs relatively well, underlining the fact that spontaneity in lecture speech does not affect the central meaning of lecture speech.
Journal Article

A Probabilistic Generative Framework for Extractive Broadcast News Speech Summarization

TL;DR: A unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking is proposed, and two matching strategies, namely literal term matching and concept matching, are thoroughly investigated.
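As a hedged illustration of ranking by "sentence generative probability plus sentence prior", the sketch below scores each sentence S by log P(D|S) + log P(S), where P(D|S) comes from a unigram model of the sentence smoothed with a background model, i.e. a simple form of literal term matching. The Jelinek-Mercer weight `lam=0.7`, the `background` table, the 1e-6 fallback, and the caller-supplied `priors` are assumptions for illustration, not settings from the cited work; the concept-matching strategy is not shown.

```python
import math
from collections import Counter


def sentence_log_likelihood(sentence, document, background, lam=0.7):
    """log P(D | S) under a unigram sentence model smoothed with a
    background model: P(w|S) = lam * P_ML(w|S) + (1 - lam) * P(w|BG)."""
    counts = Counter(sentence)
    n = len(sentence)
    total = 0.0
    for w in document:
        p_ml = counts[w] / n if n else 0.0
        # 1e-6 is a hypothetical fallback for words missing from the background
        p = lam * p_ml + (1 - lam) * background.get(w, 1e-6)
        total += math.log(p)
    return total


def rank_sentences(sentences, priors, background, lam=0.7):
    """Rank sentence indices by log P(D|S) + log P(S), highest first."""
    document = [w for s in sentences for w in s]
    scores = [
        sentence_log_likelihood(s, document, background, lam) + math.log(p)
        for s, p in zip(sentences, priors)
    ]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


bg = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}  # hypothetical background model
sents = [["a", "b"], ["c"], ["a", "a", "b"]]
print(rank_sentences(sents, [1 / 3] * 3, bg))       # → [2, 0, 1]
print(rank_sentences(sents, [0.98, 0.01, 0.01], bg))  # → [0, 2, 1]
```

The second call shows how a strong sentence prior (which could, for example, be derived from structural cues) can reorder sentences whose likelihoods are close.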
Journal Article

A Comparative Study of Probabilistic Ranking Models for Chinese Spoken Document Summarization

TL;DR: A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers.
References
Proceedings Article

Generic text summarization using relevance measure and latent semantic analysis

Yihong Gong and one co-author
TL;DR: This paper proposes two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents; one of them uses the latent semantic analysis technique to identify semantically important sentences for summary creation.
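The LSA-based selection step, as it is commonly described, can be written compactly. The symbols below are chosen for illustration: A is an m-by-n term-by-sentence matrix (one column per sentence), its singular value decomposition exposes r latent concepts, and for each of the first K right singular vectors (K being the number of sentences wanted) the sentence with the largest entry is selected.

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \quad A \in \mathbb{R}^{m \times n},\ V = [\mathbf{v}_1, \dots, \mathbf{v}_r] \\
s_k &= \operatorname*{arg\,max}_{1 \le i \le n} \, (\mathbf{v}_k)_i, \quad k = 1, \dots, K
\end{aligned}
```

Since the singular vectors are ordered by decreasing singular value, the first picks correspond to the most salient latent concepts of the document.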
Proceedings Article

Summarizing text documents: sentence selection and evaluation metrics

TL;DR: News-article summaries generated by sentence selection are analyzed, using a normalized version of precision-recall curves with a random-sentence-selection baseline to evaluate features; empirical results show the importance of corpus-dependent baseline summarization standards, compression ratios, and carefully crafted long queries.
Journal Article

Speech-to-text and speech-to-speech summarization of spontaneous speech

TL;DR: These methods are applied to the summarization of unrestricted-domain spontaneous presentations and evaluated by objective and subjective measures, and it was confirmed that the proposed methods are effective for spontaneous speech summarization.
Proceedings Article

Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization

TL;DR: It is shown that a summarization system that uses a combination of lexical, prosodic, structural and discourse features produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.