Journal Article
ACTS: an automatic Chinese text segmentation system for full text retrieval
Zimin Wu, Gwyneth Tseng, et al.
TL;DR: ACTS is an automatic Chinese text segmentation prototype for Chinese full-text retrieval that applies partial syntactic analysis: the analysis of morphemes, words, and phrases.
Abstract:
Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not mark word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full-text retrieval. It applies partial syntactic analysis: the analysis of morphemes, words, and phrases. The idea was largely inspired by experiments on English text retrieval based on morpheme and phrase analysis, which are particularly germane to Chinese because neither Chinese nor English texts mark morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases longer than two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This hypothesis is examined through three procedures: (1) segmentation, in which texts are divided into one- and two-character segments by matching against a dictionary; (2) category disambiguation, in which the syntactic categories of segments are determined from context; and (3) parsing, in which the segments are analyzed according to the grammar and then combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
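The three procedures described in the abstract can be illustrated with a minimal Python sketch. It is not the ACTS implementation: the abstract gives neither the dictionary, the category set, the disambiguation rules, nor the grammar, so the lexicon, the tags (N, V, DE, X), the greedy two-character matching, the single left-context rule, the one-rule grammar, and the function names `segment`, `disambiguate`, `parse`, and `index_terms` are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical toy lexicon: segment -> possible syntactic categories.
# N = noun, V = verb, DE = the particle 的, X = unknown.
# (Assumed tag set for illustration, not the one used by ACTS.)
LEXICON = {
    "中文": {"N"},
    "信息": {"N"},
    "检索": {"N", "V"},
    "系统": {"N"},
    "方法": {"N"},
    "的": {"DE"},
}

# Hypothetical grammar: a pair of adjacent categories that may be
# concatenated into a larger unit, mapped to the category of the result.
GRAMMAR = {
    ("N", "N"): "N",
}


def segment(text, lexicon):
    """Stage 1: split text into one- and two-character segments.

    Assumed strategy: scan left to right, prefer a two-character
    dictionary match, otherwise emit a single character.
    """
    segments = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in lexicon:
            segments.append(pair)
            i += 2
        else:
            segments.append(text[i])
            i += 1
    return segments


def disambiguate(segments, lexicon):
    """Stage 2: choose one category per segment from left context.

    Toy rule (assumption): an ambiguous segment that can be a noun is
    tagged N when the previous segment is a noun; otherwise the
    alphabetically first candidate is taken.
    """
    tags = []
    for seg in segments:
        candidates = lexicon.get(seg, {"X"})
        if len(candidates) == 1:
            tag = next(iter(candidates))
        elif tags and tags[-1] == "N" and "N" in candidates:
            tag = "N"
        else:
            tag = sorted(candidates)[0]
        tags.append(tag)
    return tags


def parse(segments, tags, grammar):
    """Stage 3: repeatedly merge adjacent units licensed by the grammar."""
    units = list(zip(segments, tags))
    merged = True
    while merged:
        merged = False
        for k in range(len(units) - 1):
            (left, left_tag), (right, right_tag) = units[k], units[k + 1]
            if (left_tag, right_tag) in grammar:
                units[k:k + 2] = [(left + right, grammar[(left_tag, right_tag)])]
                merged = True
                break  # rescan from the start after each merge
    return units


def index_terms(text):
    """Run all three stages and keep noun units as index terms."""
    segs = segment(text, LEXICON)
    tags = disambiguate(segs, LEXICON)
    return [word for word, tag in parse(segs, tags, GRAMMAR) if tag == "N"]


print(index_terms("中文信息检索的方法"))
```

On 中文信息检索的方法 ("methods of Chinese information retrieval"), the sketch segments the text into 中文 / 信息 / 检索 / 的 / 方法, tags the ambiguous 检索 as a noun because it follows 信息, and then merges the noun sequence into the compound 中文信息检索; the particle 的 blocks further merging, so it prints `['中文信息检索', '方法']`.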