Journal Article

ACTS: an automatic Chinese text segmentation system for full text retrieval

TLDR
ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval that applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases.
Abstract
Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not contain word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval. It applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases. The idea was largely inspired by experiments on English text retrieval based on morpheme and phrase analysis, which are particularly germane to Chinese because neither Chinese nor English texts have morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases exceeding two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This hypothesis is tested through three procedures: (1) Segmentation: texts are divided into one- and two-character segments by matching against a dictionary; (2) Category disambiguation: the syntactic categories of segments are determined according to context; (3) Parsing: the segments are analyzed based on the grammar and subsequently combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
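To make procedure (1) concrete, the following is a minimal sketch of dictionary-based segmentation into one- and two-character segments. It is not the ACTS implementation: the abstract does not say how dictionary matches are chosen, so the greedy left-to-right strategy, the function name `segment`, and the four-entry `TOY_DICTIONARY` are all assumptions made for illustration.

```python
def segment(text, dictionary):
    """Split text into one- and two-character segments by dictionary lookup.

    Assumption: greedy left-to-right matching that prefers a two-character
    dictionary entry and otherwise falls back to a single character.
    """
    segments = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # Two-character dictionary entry found at this position.
            segments.append(pair)
            i += 2
        else:
            # No match: emit the single character as its own segment.
            segments.append(text[i])
            i += 1
    return segments


# Hypothetical toy dictionary; a real system would use a large lexicon.
TOY_DICTIONARY = {"中文", "文本", "检索", "系统"}

print(segment("中文文本检索系统", TOY_DICTIONARY))
print(segment("全文检索", TOY_DICTIONARY))
```

In this sketch, characters that do not start a two-character dictionary entry are left as single-character segments, which procedures (2) and (3) would then categorize and recombine into longer words and phrases.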


Citations
Proceedings Article

Comparing representations in Chinese information retrieval

TL;DR: An evaluation of representation methods for Chinese information retrieval shows that 1-gram indexing is good but not sufficiently competitive, while bigram indexing works surprisingly well.
Journal Article

Knowledge map creation and maintenance for virtual communities of practice

TL;DR: The knowledge map creation and maintenance mechanisms developed in this research enable the dynamic knowledge management of communities of practice on the Internet.
Journal Article

Chinese word segmentation and its effect on information retrieval

TL;DR: The findings reveal that the segmentation approach affects IR effectiveness, and that better IR results are obtained by using the same method for query and document processing, as this increases the probability of a query-document match.
Proceedings Article

Chinese text retrieval without using a dictionary

TL;DR: The results show that, for all three sets of queries, the simple bigram indexing and the purely statistical word segmentation perform better than the popular dictionary-based maximum matching method with a dictionary of 138,955 entries.
Journal Article

Applications of n‐grams in textual information systems

TL;DR: Applications that can be implemented efficiently and effectively using sets of n‐grams include spelling error detection and correction, query expansion, information retrieval with serial, inverted and signature files, dictionary look‐up, text compression, and language identification.