
Showing papers by "Fuji Ren published in 2001"


Proceedings ArticleDOI
11 Jun 2001
TL;DR: A PMT system that takes advantage of high-speed and large-memory computers and existing machine translation systems with different characteristics to solve the difficult machine translation problem is designed and implemented.
Abstract: Parallel machine translation (PMT) is a new machine translation paradigm that takes advantage of high-speed, large-memory computers and existing machine translation systems with different characteristics to attack the difficult machine translation problem. PMT is based on technologies from parallel computing, machine translation, and artificial intelligence. A PMT system consists of many machine translation procedures running in parallel, coordinated by a controller to resolve various ambiguities in machine translation. We have designed and implemented a PMT system based on the above approach at a coarse level of parallelism. The system consists of four independent machine translation subsystems, each implemented using an existing machine translation technique and with its own characteristics. We present the principles and practice of PMT, report some results of experiments with our experimental PMT system, and point out future research directions for PMT.
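The coarse-grained controller-plus-engines design described above can be sketched as follows; the two toy engine functions and the shortest-candidate disambiguation rule are hypothetical stand-ins, not the paper's actual subsystems or combining logic.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_rule_based(sentence):
    # hypothetical rule-based engine: fails (returns None) on questions
    return None if "?" in sentence else "rule: " + sentence

def engine_example_based(sentence):
    # hypothetical example-based engine: always produces something
    return "example: " + sentence

def controller(sentence, engines):
    # run all engines concurrently and collect successful translations
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda e: e(sentence), engines))
    candidates = [r for r in results if r is not None]
    # illustrative disambiguation rule: prefer the shortest candidate
    return min(candidates, key=len) if candidates else None

print(controller("hello world", [engine_rule_based, engine_example_based]))
# -> rule: hello world
```

The controller degrades gracefully: if one engine fails, the others still supply candidates, and only if all fail does the whole translation fail.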

34 citations


Proceedings ArticleDOI
07 Oct 2001
TL;DR: This paper describes some special features of Chinese characters and text, along with statistical information obtained from a real-world Chinese text corpus, and presents a hybrid approach that combines a rule-based method and a probability-based method for automatic checking and error correction of Chinese text.
Abstract: Automatic Chinese text checking and error correction is an important and difficult problem. Compared with automatic checking and error correction of Western text, automatic checking and error correction of Chinese text faces more challenges: the Chinese language has many characters and no delimiters separating words, and it is impossible to detect and correct errors by penetrating into the inner composition of a character. In this paper, we describe some special features of Chinese characters and text, along with statistical information obtained from a real-world Chinese text corpus, and we present a hybrid approach that combines a rule-based method and a probability-based method for automatic checking and error correction of Chinese text. We also present an experimental system, HSACCCT (Hybrid System of Automatic Checking and Correction for Chinese Text), that implements this hybrid approach, and report experimental results on real-world Chinese text.
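A minimal sketch of the hybrid idea, using Latin letters as stand-ins for Chinese characters: a rule table of confusable characters proposes candidate corrections, and bigram counts from a toy corpus decide whether to accept them. All data and thresholds here are illustrative assumptions, not from the paper's corpus or rule set.

```python
from collections import Counter

def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

def train(corpus):
    # probability-based part: character-bigram counts from a (toy) corpus
    return Counter(b for line in corpus for b in bigrams(line))

def score(text, model):
    return sum(model[b] for b in bigrams(text))

def check(sentence, confusables, model):
    # rule-based part: a table of confusable characters proposes candidates;
    # the bigram model accepts a candidate only if it scores higher
    chars = list(sentence)
    for i, ch in enumerate(sentence):
        for cand in confusables.get(ch, ()):
            trial = sentence[:i] + cand + sentence[i + 1:]
            if score(trial, model) > score(sentence, model):
                chars[i] = cand
    return "".join(chars)

model = train(["abcd", "abce"])
print(check("axcd", {"x": ["b"]}, model))
# -> abcd
```

A real system would use word- or character-level probabilities estimated from a large corpus and a much richer rule base; the structure, not the data, is the point here.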

25 citations


Proceedings ArticleDOI
07 Oct 2001
TL;DR: This paper proposes a method that uses both statistical and structural information for sentence extraction; following an analysis of human extractions, several heuristic rules are added to filter out non-important sentences and to prevent similar sentences from being extracted.
Abstract: Being increasingly popular, the Internet has greatly changed our lives. We can conveniently receive and send information via the Internet. With the information explosion on the Web, it is becoming crucial to develop means to automatically extract important sentences from Web articles. In this paper, we propose a method that uses both statistical and structural information for sentence extraction. In addition, following an analysis of human extractions, several heuristic rules are added to filter out non-important sentences and to prevent similar sentences from being extracted. Our experimental results demonstrate the effectiveness of these means; in particular, once the heuristic rules are added, a significant improvement is observed.
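The combination of statistical scoring, structural weighting, and heuristic filtering might be sketched like this; the average term-frequency score, lead-sentence bonus, length threshold, and overlap threshold are all illustrative assumptions, not the paper's actual weights or rules.

```python
from collections import Counter

def extract(sentences, k=2):
    # statistical part: corpus-wide term frequencies
    tf = Counter(w for s in sentences for w in s.lower().split())

    def score(i, s):
        words = s.lower().split()
        stat = sum(tf[w] for w in words) / len(words)  # average term frequency
        struct = 1.5 if i == 0 else 1.0                # structural part: lead-sentence bonus
        return stat * struct

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(i, sentences[i]), reverse=True)
    picked = []
    for i in ranked:
        words = set(sentences[i].lower().split())
        if len(words) < 3:        # heuristic: filter out very short sentences
            continue
        # heuristic: skip sentences too similar to ones already extracted
        if any(len(words & set(p.lower().split())) / len(words) > 0.6
               for p in picked):
            continue
        picked.append(sentences[i])
        if len(picked) == k:
            break
    return picked

print(extract(["the web grows fast",
               "the web grows very fast indeed",
               "ok"]))
# -> ['the web grows fast']
```

Note how the second sentence is dropped by the similarity filter rather than by its score, mirroring the paper's point that heuristics catch what statistics alone miss.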

8 citations


Proceedings ArticleDOI
07 Oct 2001
TL;DR: This paper presents the design and implementation of an intelligent system that interacts with users in a natural language, English, and retrieves information from sources for the users, using part-of-speech tagging, a query knowledge base, query formation, and answer synthesis.
Abstract: As traditional relational databases and new XML document repositories are widely used on the Web as information storage, there is a great need for easy access to these information sources, particularly through natural language interaction. In this paper, we present the design and implementation of an intelligent system that interacts with users in a natural language, English, and retrieves information from the sources for the users. The system consists of four major parts: part-of-speech tagging, a query knowledge base, query formation, and answer synthesis. In implementation, the system first uses QTAG, a Hidden Markov Model-based part-of-speech tagger, to tag each word in the input sentence. Then, important words in the main phrase are identified. A thesaurus is applied to reduce the important words to basic keywords, which are used to query the database. The query is formed based on the query knowledge stored in the query knowledge base. Finally, the query result is synthesized into an English sentence, which is presented to the user as the answer. With an efficient part-of-speech tagger, intelligent subsystems for query formation and synthesis of query results, and a user-friendly interface, the system can answer questions effectively.
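The four-stage pipeline can be illustrated with a toy end-to-end example; the tiny tag table, thesaurus, query knowledge base, and answer template below are hypothetical stand-ins for QTAG and the system's real components.

```python
TAGS = {"who": "WP", "authored": "VBD", "wrote": "VBD",
        "the": "DT", "report": "NN"}               # toy stand-in for QTAG
THESAURUS = {"authored": "wrote"}                  # reduce variants to basic keywords
KNOWLEDGE_BASE = {                                 # toy query knowledge base
    ("wrote", "report"): ("SELECT author FROM docs WHERE type='report'",
                          "The report was written by {}."),
}
DATABASE = {"SELECT author FROM docs WHERE type='report'": "Smith"}

def answer(question):
    words = question.lower().rstrip("?").split()
    tagged = [(w, TAGS.get(w, "NN")) for w in words]           # 1. POS tagging
    keywords = tuple(THESAURUS.get(w, w) for w, t in tagged    # 2. keyword reduction
                     if t in ("VBD", "NN"))
    query, template = KNOWLEDGE_BASE[keywords]                 # 3. query formation
    return template.format(DATABASE[query])                    # 4. answer synthesis

print(answer("Who authored the report?"))
# -> The report was written by Smith.
```

The thesaurus step is what lets "authored" and "wrote" reach the same entry in the query knowledge base.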

5 citations


Proceedings ArticleDOI
07 Oct 2001
TL;DR: A new machine translation (MT) approach using MT engines and sentence partitioning is presented and it is shown that the proposed approach is effective for implementing practical MT systems.
Abstract: In this paper, we present a new machine translation (MT) approach using MT engines and sentence partitioning. A multiple-engine MT system consists of several MT engines running in parallel, coordinated by a controller. Each engine is implemented using an existing MT technique and has its own characteristics. When translating a sentence, each engine translates it independently. If more than one engine translates the sentence successfully, the controller chooses the best translation according to a combining algorithm based on translation statistics. If no engine succeeds in translating the sentence, the controller partitions the sentence, coordinates the engines to translate its constituent simple sentences, and combines the partial results into a translation of the whole input sentence. A complex sentence is partitioned based on conjunctions and punctuation marks such as commas and semicolons. We have developed a multiple-engine MT system based on the above approach. The system consists of four independent MT engines. Experiments show that the proposed approach is effective for implementing practical MT systems.
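The partition-and-recombine fallback might look like the following sketch; the splitting regular expression and the toy engine (which "succeeds" only on short segments) are illustrative assumptions, not the paper's partitioning rules.

```python
import re

def partition(sentence):
    # split a complex sentence on commas, semicolons, and conjunctions
    parts = re.split(r"\s*[,;]\s*|\s+(?:and|but)\s+", sentence)
    return [p for p in parts if p]

def toy_engine(segment):
    # stand-in engine: only "translates" segments of at most three words
    return segment.upper() if len(segment.split()) <= 3 else None

def translate(sentence, engines):
    for engine in engines:
        result = engine(sentence)
        if result is not None:
            return result
    # fallback: partition, translate the pieces, and recombine
    parts = partition(sentence)
    if parts == [sentence]:
        return sentence  # unpartitionable and untranslated: give up
    return ", ".join(translate(p, engines) for p in parts)

print(translate("he came, he saw and he won quickly", [toy_engine]))
# -> HE CAME, HE SAW, HE WON QUICKLY
```

The recursion mirrors the controller's behavior: whole-sentence translation is always attempted first, and partitioning is only a last resort.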

3 citations


Book ChapterDOI
18 Feb 2001
TL;DR: The concept of sensitive words and their efficacy in text segmentation are explained, then a hybrid approach that combines a rule-based method and a probability-based method using the concept of sensitive words is described; experiments show that the presented approach addresses the text segmentation problem effectively.
Abstract: Natural language processing tasks, such as checking and correction of texts, machine translation, and information retrieval, usually start from words. The identification of words in Indo-European languages is a trivial task; however, this problem, known as text segmentation, has been and still is a bottleneck for various Asian languages, such as Chinese. There have been two main groups of approaches to Chinese segmentation: dictionary-based approaches and statistical approaches. However, both have difficulty dealing with some Chinese text. To address these difficulties, we propose a hybrid approach to Chinese text segmentation using the sensitive word concept. Sensitive words are compound words whose syntactic category differs from those of their components. Depending on the segmentation, a sensitive word may play different roles, leading to significantly different syntactic structures. In this paper, we first explain the concept of sensitive words and their efficacy in text segmentation, then describe the hybrid approach that combines a rule-based method and a probability-based method using the concept of sensitive words. Our experimental results show that the presented approach addresses the text segmentation problem effectively.
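A toy sketch of the dictionary-based side plus a sensitive-word tie-break, again with Latin letters standing in for Chinese characters: forward maximum matching proposes a segmentation, and illustrative frequency counts decide between a sensitive compound and its split reading. This is an assumption about how the hybrid might be wired, not the paper's actual algorithm.

```python
def max_match(text, dictionary):
    # dictionary-based part: forward maximum matching
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            # take the longest dictionary word; fall back to a single character
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

def segment(text, dictionary, sensitive, freq):
    out = []
    for w in max_match(text, dictionary):
        if w in sensitive:
            split = sensitive[w]
            # probability-based tie-break between the compound reading
            # and the split reading, using illustrative frequency counts
            if sum(freq.get(p, 0) for p in split) > freq.get(w, 0):
                out.extend(split)
                continue
        out.append(w)
    return out

# "abc" is the sensitive word here: it can also be read as "ab" + "c"
print(segment("abcd", {"ab", "abc", "cd"},
              {"abc": ["ab", "c"]}, {"abc": 1, "ab": 3, "c": 2}))
# -> ['ab', 'c', 'd']
```

Greedy maximum matching alone would keep the compound "abc"; the frequency check is what lets the sensitive word be re-split when the component reading is better supported.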

2 citations