Study of Word-Based Chinese Document Experimental System and Chinese Free-Text Information Extraction Experiment Based on It
24 Aug 2007 - Vol. 5, pp. 120-123
TL;DR: The authors' Word-based Chinese Document System can help advance Chinese Information Processing technology to more advanced application stages.
Abstract: This paper presents a word-based Chinese document experimental system that aims to put Chinese information processing technology on a more reliable and more efficient basis. The system implements a document storage format and a document processing format, both based on the smallest information carrier: the Chinese word. An information extraction (IE) algorithm with a two-step strategy for Chinese free text is then introduced. Using this document system as the experimental platform and the abstracts of Chinese Sci-Tech journals as the free text, an IE experiment was conducted with good results: precision (accuracy ratio) P is 95.03%, recall R is 91.40%, and the F-value is 93.18%. The experimental results suggest that our Word-based Chinese Document System can help advance Chinese Information Processing technology to more advanced application stages.
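The reported F-value is consistent with the balanced F-measure, that is, the harmonic mean of P and R. The abstract does not state which formula the authors used, so the sketch below assumes the balanced form; the function name `f_value` is hypothetical and chosen only for illustration.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed form): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Percentages reported in the abstract.
p, r = 95.03, 91.40
print(f"F = {f_value(p, r):.2f}%")  # prints "F = 93.18%"
```

Under this assumption, the three figures in the abstract agree with one another to two decimal places.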
Citations
••
15 Jan 2008
TL;DR: Presents a WCA-Selection HMM IE algorithm for Chinese free text that takes Chinese Sci-tech journal abstracts as the extraction text, together with a concrete WCA selection optimization strategy.
Abstract: This paper proposes an extraction task for Chinese Sci-tech journal text and presents a WCA-Selection Chinese free-text HMM IE algorithm. The HMM IE algorithm takes Chinese Sci-tech journal abstracts as the extraction text. Based on the features of WCA, the idea of re-optimizing the WCA selection model is proposed, and a concrete WCA selection optimization strategy is developed. Experimental verification is then conducted, with satisfactory results. The results show that the designed extraction algorithm and WCA selection optimization strategy perform well on Chinese Sci-tech journal abstract text.
1 citation
Cites methods from "Study of Word-Based Chinese Documen..."
...Firstly, in our designed Word-based Chinese Document Experimental System [8], the training text and testing text are transformed into the text of CDM format....
[...]
References
••
TL;DR: This work introduces kernels defined over shallow parse representations of text, designs efficient algorithms for computing the kernels, and uses the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text.
Abstract: We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text. We experimentally evaluate the proposed methods and compare them with feature-based learning algorithms, with promising results.
916 citations
••
06 Jul 2002
TL;DR: This work introduces kernels defined over shallow parse representations of text, designs efficient algorithms for computing the kernels, and uses the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text.
Abstract: We present an application of kernel methods to extracting relations from unstructured natural language sources. We introduce kernels defined over shallow parse representations of text, and design efficient algorithms for computing the kernels. We use the devised kernels in conjunction with Support Vector Machine and Voted Perceptron learning algorithms for the task of extracting person-affiliation and organization-location relations from text. We experimentally evaluate the proposed methods and compare them with feature-based learning algorithms, with promising results.
821 citations
"Study of Word-Based Chinese Documen..." refers methods in this paper
...The Machine learning approach [4][5] is relied on the annotated corpus providing examples on which learning algorithms can operate....
[...]
••
TL;DR: The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture.
Abstract: This article surveys the use of empirical, machine-learning methods for a particular natural language-understanding task: information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture.
279 citations
••
TL;DR: An overview of the problems addressed in Information Extraction and current approaches toward solutions, with an assessment of the state of the art and its potential for future progress.
Abstract: In recent years, analysts have been confronted with the increasing availability of on-line sources of information in the form of natural-language texts. This increased accessibility of textual information has led to a corresponding interest in technology for processing this text automatically to extract task-relevant information. This demand for a technological solution to the need to deal with the often-overwhelming quantity of available information has stimulated the development of the field of Information Extraction. This article provides an overview of the problems addressed and current approaches toward solutions, and assesses the state of the art and its potential for future progress.
191 citations
"Study of Word-Based Chinese Documen..." refers methods in this paper
...There are two basic approaches [2] to design the modules of an IE system, which can be called the Knowledge Engineering approach, and the Machine Learning approach....
[...]
01 Jan 1998
TL;DR: For MUC-7, BBN has for the first time fielded a fully-trained system for NE, TE, and TR; results are all the output of statistical language models trained on annotated data, rather than programs executing handwritten rules.
143 citations
"Study of Word-Based Chinese Documen..." refers methods in this paper
...The Machine learning approach [4][5] is relied on the annotated corpus providing examples on which learning algorithms can operate....
[...]