ACTS: an automatic Chinese text segmentation system for full text retrieval

doi:10.1002/(SICI)1097-4571(199503)46:2<83::AID-ASI2>3.0.CO;2-0

Journal Article

Zimin Wu, +1 more

- 01 Mar 1995 -

Journal of the Association for Informati...

- Vol. 46, Iss: 2, pp 83-96

TLDR

ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval that applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases.

Abstract:

Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not contain word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval. It applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases. The idea was inspired largely by experiments on English text retrieval based on morpheme and phrase analysis, which are particularly germane to Chinese because neither Chinese nor English texts have morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases exceeding two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This is examined through three procedures: (1) Segmentation: texts are divided into one- and two-character segments by matching against a dictionary; (2) Category disambiguation: the syntactic categories of segments are determined according to context; (3) Parsing: the segments are analyzed based on the grammar, and subsequently combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
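
As a rough illustration of procedure (1) only, the following minimal sketch divides a string into one- and two-character segments by dictionary lookup. The greedy left-to-right strategy, the function name `segment`, and the toy dictionary are assumptions made for this example; the abstract does not say how ACTS resolves competing matches, and procedures (2) and (3) are not modeled here.

```python
def segment(text, dictionary):
    """Split text into one- and two-character segments.

    Assumption: greedy left-to-right matching that prefers a
    two-character dictionary entry over a single character.
    This is an illustrative strategy, not the ACTS algorithm.
    """
    segments = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # A two-character dictionary entry starts at this position.
            segments.append(pair)
            i += 2
        else:
            # Fall back to a single-character segment.
            segments.append(text[i])
            i += 1
    return segments


# Hypothetical toy dictionary of two-character entries.
TOY_DICTIONARY = {"中文", "文本", "检索", "全文"}

print(segment("中文全文检索", TOY_DICTIONARY))  # ['中文', '全文', '检索']
```

A greedy pass like this commits to the first match it finds: for the input "中文本" it returns "中文" and "本", even though "文本" is also in the dictionary. Ambiguities of this kind are the reason later stages that use context and a grammar are needed.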

Citations

Proceedings Article

Comparing representations in Chinese information retrieval

K. L. Kwok

TL;DR: Evaluated representation methods for Chinese information retrieval show that 1-gram indexing is good but not sufficiently competitive, while bigram indexing works surprisingly well.

Journal Article

Knowledge map creation and maintenance for virtual communities of practice

Fu-ren Lin, +1 more

- 01 Mar 2006 -

Information Processing and Management

TL;DR: The knowledge map creation and maintenance mechanisms developed in this research enable the dynamic knowledge management of communities of practice on the Internet.

Journal Article

Chinese word segmentation and its effect on information retrieval

Schubert Foo, +1 more

- 01 Jan 2004 -

Information Processing and Management

TL;DR: The findings reveal that the segmentation approach has an effect on IR effectiveness, and that better IR results are obtained by using the same method for query and document processing, as this increases the probability of a query-document match.

Proceedings Article

Chinese text retrieval without using a dictionary

Aitao Chen, +4 more

TL;DR: The results show that, for all three sets of queries, the simple bigram indexing and the purely statistical word segmentation perform better than the popular dictionary-based maximum matching method with a dictionary of 138,955 entries.

Journal Article

Applications of n‐grams in textual information systems

Alexander M. Robertson, +1 more

- 01 Mar 1998 -

Journal of Documentation

TL;DR: Applications that can be implemented efficiently and effectively using sets of n‐grams include spelling error detection and correction, query expansion, information retrieval with serial, inverted and signature files, dictionary look‐up, text compression, and language identification.

Related Papers (5)

Chinese text segmentation for text retrieval: achievements and problems

A statistical method for finding word boundaries in Chinese text

A stochastic finite-state word-segmentation algorithm for Chinese

PAT-tree-based keyword extraction for Chinese information retrieval

Word identification for Mandarin Chinese sentences