Proceedings Article

Research on Web Document Summarization

TLDR
This paper presents a WDS algorithm based on sentence extraction. Its sentence-structure weight considers both Web formats and hyperlink attributes, and the proportion between word weight and structure weight is learned with a machine learning approach.
Abstract
Web document summarization (WDS) is becoming one of the most active topics in text summarization because of the rapidly increasing number of documents on the Web. WDS differs from traditional text summarization because it must process hyperlinked text. This paper first analyzes the features of Web documents, then gives a definition of WDS, and finally presents a WDS algorithm based on sentence extraction. Each sentence's weight is a weighted sum of its words' weight and its sentence-structure weight. The former is adjusted using a document class graph, and the latter considers both Web formats and hyperlink attributes. The proportion between the word weight and the structure weight is learned with a machine learning approach. Experiments on 2,000 Web documents show that our algorithm is feasible.
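
The weighted-sum scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Sentence` fields, the term-frequency word score, the bonus values in `structure_score`, and the fixed `alpha` are all assumptions made for illustration. The paper learns the word/structure proportion with a machine learning approach and adjusts word weights with a document class graph, neither of which is modeled here.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    in_heading: bool = False  # hypothetical Web-format feature
    has_link: bool = False    # hypothetical hyperlink feature


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_score(sentence, term_freq, max_freq):
    """Mean term frequency of the sentence's tokens, scaled to [0, 1]."""
    tokens = tokenize(sentence.text)
    if not tokens:
        return 0.0
    return sum(term_freq[t] for t in tokens) / (len(tokens) * max_freq)


def structure_score(sentence):
    """Structure weight from format and link flags (illustrative values)."""
    score = 0.0
    if sentence.in_heading:
        score += 0.6
    if sentence.has_link:
        score += 0.4
    return score


def summarize(sentences, alpha=0.5, k=1):
    """Return the k highest-scoring sentences in document order.

    alpha is the proportion given to the word weight; it is fixed by hand
    here, whereas the paper learns this proportion.
    """
    term_freq = Counter(t for s in sentences for t in tokenize(s.text))
    max_freq = max(term_freq.values(), default=1)
    scored = [
        (alpha * word_score(s, term_freq, max_freq)
         + (1 - alpha) * structure_score(s), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i].text for _, i in sorted(top, key=lambda pair: pair[1])]


doc = [
    Sentence("Web summarization overview", in_heading=True),
    Sentence("The weather is nice today"),
    Sentence("Summarization of web pages uses links", has_link=True),
]
print(summarize(doc, alpha=0.5, k=2))
```

Setting `alpha` closer to 1 makes the ranking depend mostly on word frequency, while values closer to 0 favor sentences that appear in headings or carry hyperlinks.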

