Showing papers on "Computer-assisted translation" published in 1999


Proceedings ArticleDOI
01 Aug 1999
TL;DR: It is shown that a probabilistic translation model achieves performance close to that of an MT system, and the possibility of automatically gathering parallel texts from the Web to construct a reasonable training corpus is investigated.
Abstract: This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that of machine translation (MT). It is shown that, using a probabilistic model, we are able to obtain performance close to that of an MT system. In addition, we investigated the possibility of automatically gathering parallel texts from the Web in an attempt to construct a reasonable training corpus. The results are very encouraging: in several tests, such a training corpus proved as good as a manually constructed one for CLIR purposes.

332 citations
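
The abstract above does not give the model's formulas, so the following is only a minimal sketch of how a probabilistic translation table might be used to translate a query for CLIR. The table contents, the function name `translate_query`, and the rule of summing probabilities per target word are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy table of P(french_word | english_word); a real table
# would be estimated from a parallel corpus.
TRANSLATION_TABLE = {
    "drug": {"drogue": 0.6, "médicament": 0.4},
    "traffic": {"trafic": 0.7, "circulation": 0.3},
}


def translate_query(query_words, table, top_n=3):
    """Return the top_n target words, scored by summed translation probability."""
    scores = defaultdict(float)
    for word in query_words:
        # Source words missing from the table contribute nothing.
        for target, prob in table.get(word, {}).items():
            scores[target] += prob
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    for word, score in translate_query(["drug", "traffic"], TRANSLATION_TABLE):
        print(word, score)
```

The weighted target words could then be passed to a monolingual retrieval engine as the translated query.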


01 Jan 1999
TL;DR: This article illustrates this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.
Abstract: The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.

175 citations


Proceedings ArticleDOI
J. Scott McCarley1
20 Jun 1999
TL;DR: This work investigates information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems.
Abstract: Previous comparisons of document and query translation suffered from differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. We find that hybrids of document and query translation-based systems out-perform query translation systems, even human-quality query translation systems.

135 citations


Book
08 Oct 1999
TL;DR: This book covers machine-aided translation and machine translation, including computational linguistics techniques such as computational morphology and the two-level model, transfer and interlingua MT, and other approaches to MT.
Abstract: 1 Background.- 1 Introduction.- 1.1 Computers in Translation.- 1.2 History of Machine Translation.- 1.3 Strategies for Machine Translation.- 1.4 Artificial Intelligence.- 1.5 Conclusion.- 2 Basic Terminology and Background.- 2.1 Linguistics.- 2.2 Formal Background.- 2.3 Review of Prolog.- 2.4 Conclusion.- 2 Machine-Aided Translation.- 3 Text Processing.- 3.1 Format Preservation.- 3.2 Character Sets and Typography.- 3.3 Input Methods.- 3.4 Conclusion.- 4 Translator's Workbench and Translation Aids.- 4.1 Translator's Workbench.- 4.2 Translation Memory.- 4.3 Bilingual Alignment.- 4.4 Subsentential Alignment.- 4.5 Conclusion.- 3 Machine Translation.- 5 Computational Linguistics Techniques.- 5.1 Introduction.- 5.2 Computational Morphology and the Two-level Model.- 5.3 Syntactic Analysis.- 5.4 Parsing.- 5.5 Generation.- 5.6 Conclusion.- 6 Transfer Machine Translation.- 6.1 Syntactic Transfer MT.- 6.2 Semantic Transfer MT.- 6.3 Lexicalist MT.- 6.4 Conclusion.- 7 Interlingua Machine Translation.- 7.1 Lexical Conceptual Structure MT.- 7.2 Knowledge-Based Machine Translation.- 7.3 Conclusion.- 8 Other Approaches to MT.- 8.1 Example-Based Machine Translation.- 8.2 Statistical Machine Translation.- 8.3 Minimal Recursion Semantics.- 8.4 Constraint Systems.- 8.5 Conclusion.- 4 Common Issues.- 9 Disambiguation.- 9.1 POS Tagging.- 9.2 Disambiguation of Syntactic Analysis.- 9.3 Word Sense Disambiguation.- 9.4 Transfer Disambiguation.- 9.5 Conclusion.- 10 Evaluation.- 10.1 Evaluation Participants.- 10.2 Evaluation Strategies.- 10.3 Quality Measures.- 10.4 Software Evaluation.- 10.5 Software User Needs.- 10.6 Conclusion.- 11 Conclusion.- 11.1 Trends.- 11.2 Further Reading.- Appendix: Useful Resources.

119 citations


01 Jan 1999
TL;DR: In recent years the fields of translation studies, natural language processing and corpus linguistics have come to share one object of study, namely parallel text corpora, and more specifically tra ...
Abstract: In recent years the fields of translation studies, natural language processing and corpus linguistics have come to share one object of study, namely parallel text corpora, and more specifically tra ...

118 citations


Proceedings Article
01 Jan 1999
TL;DR: The Attribute Logic Engine (ALE) is described, along with enhancements that enable it to serve as a complete grammatical infrastructure for applications such as spoken language translation.
Abstract: In this paper, we describe The Attribute Logic Engine (ALE) and enhancements to it that enable it to serve as a complete grammatical infrastructure for applications such as spoken language translation. We indicate how ALE was expanded and combined with off-the-shelf speech components to develop an application that translates English speech to German speech. The translation operates by way of a semantic representation based on typed feature structures, with information on thematic roles (“who did what to whom”) and agreement information that can be used to guide search in less restricted domains and map expressions to more felicitous (but semantically equivalent) constructions in the target language than a more literal, surface-oriented method would admit.

114 citations


Patent
19 Feb 1999
TL;DR: An interactive paper translation using alternate dictionaries includes: a first storage for storing words and strings of more words with respective correct translations so that it forms a dictionary of words and sentences or sentence portions.
Abstract: An interactive paper translation using alternate dictionaries includes: a first storage for storing words and strings of more words with respective correct translations so that it forms a dictionary of words and sentences or sentence portions; a second receiver for receiving a text to be translated; a third storage for storing the translated text in the second screen field; and a fourth searcher for searching in progression for the words of the text to be translated. The method compares translated words with the words of the first storage to obtain a progressive translation and forms a completely automatic translation or an interactive translation or vice versa, before beginning the translation. During the option of interactive translation, there are further displays and windows. The method may also involve a scanner integrated with OCR for direct loading of the sheets to be translated.

52 citations


13 Sep 1999
TL;DR: An overview of the issues in designing a controlled language, the implementation of a controlled language checker, and the deployment of KANT Controlled English for multilingual machine translation is presented.
Abstract: In this paper, we present an overview of the issues in designing a controlled language, the implementation of a controlled language checker, and the deployment of KANT Controlled English for multilingual machine translation. We also discuss some success criteria for introducing controlled language. Finally, a future vision for KANT controlled language development is discussed.

51 citations


Michael Carl1
13 Sep 1999
TL;DR: An example-based machine translation (EBMT) system which relies on various knowledge resources is described, and the possibilities and limits of the translation template induction process are investigated.
Abstract: This paper describes an example-based machine translation (EBMT) system which relies on various knowledge resources. Morphological analyses abstract the surface forms of the languages to be translated. A shallow syntactic rule formalism is used to percolate features in derivation trees. Translation examples serve the decomposition of the text to be translated and determine the transfer of lexical values into the target language. Translation templates determine the word order of the target language and the type of phrases (e.g. noun phrase, prepositional phrase, ...) to be generated in the target language. An induction mechanism generalizes translation templates from translation examples. The paper outlines the basic idea underlying the EBMT system and investigates the possibilities and limits of the translation template induction process.

51 citations


01 Jan 1999
TL;DR: This survey of the present demand and use of computer-based translation software concentrates on systems designed for the production of translations of publishable quality, including developments in controlled language systems, translator workstations, and localisation.
Abstract: This survey of the present demand and use of computer-based translation software concentrates on systems designed for the production of translations of publishable quality, including developments in controlled language systems, translator workstations, and localisation; but it covers also the developments of software for non-translators, in particular for use with Web pages and other Internet applications, and it looks at future needs and systems under development. The final section compares the types of translations that can be met most appropriately by human and by machine (and computer-aided) translation respectively.

47 citations


Patent
27 Dec 1999
TL;DR: A translation supporting apparatus is disclosed which searches a translation example database for examples useful for a translation task; the database stores character strings of a first language and the corresponding translation results in a second language, organized by document.
Abstract: A translation supporting apparatus which searches out a translation example useful for a translation task from within a translation example database is disclosed. The translation example database stores character strings of a first language and translation results of a second language corresponding to the character strings in a unit of a document. A retrieval request inputting apparatus inputs a translation target sentence. A similarity retrieval apparatus determines, for each translation example, a similarity to the translation target sentence, a similarity to a translation example context which is another translation example having such a predetermined relationship that it is included in the same document and is present within one sentence before or after the translation example, a similarity to a retrieval request context which is another translation target character string having such a predetermined relationship that it is included in the same document as the translation target character string and is present within the range of one sentence before or after the translation target character string, and a similarity between the translation example context and the retrieval request context, and integrates the four similarities. A similar example outputting apparatus refers to the integrated similarities and outputs those translation examples similar to the translation target character string.
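
As a rough illustration of the retrieval step described above, the sketch below scores each stored example with four similarities and integrates them. The abstract does not say how similarity is measured or how the four values are combined, so the character-level measure (`difflib.SequenceMatcher`), the weighted sum, the weights, and all function names are assumptions.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def integrated_similarity(example, example_ctx, target, target_ctx,
                          weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine the four similarities with a weighted sum (an assumed integration rule)."""
    scores = (
        sim(example, target),          # example vs. target sentence
        sim(example_ctx, target),      # example's context vs. target sentence
        sim(example, target_ctx),      # example vs. target's context
        sim(example_ctx, target_ctx),  # example's context vs. target's context
    )
    return sum(w * s for w, s in zip(weights, scores))


def rank_examples(database, target, target_ctx, top_k=3):
    """database: list of (source_sentence, context_sentence, translation) tuples."""
    scored = [
        (integrated_similarity(src, ctx, target, target_ctx), src, tgt)
        for src, ctx, tgt in database
    ]
    # Highest integrated similarity first.
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
```

Here each example carries a single neighbouring sentence as its context, which is one reading of "within one sentence before or after"; a fuller version might keep both neighbours.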


01 Sep 1999
TL;DR: This thesis advances the state of the art in example-based machine translation by proposing techniques for predicting the adaptation requirements of a retrieval episode, and a new EBMT scheme is proposed in which the cases encode knowledge about their own reusability, determined by cross-linguistic mappings.
Abstract: Example-Based Machine Translation. Bróna Collins. Supervisor: Pádraig Cunningham. Translation can be viewed as a problem-solving process where a source language text is transformed into its target language equivalent. A machine translation system, solving the problem from first principles, requires more knowledge than has ever been successfully encoded in any system. An alternative approach is to reuse past translation experience encoded in a set of exemplars, or cases. A case which is similar to the input problem will be retrieved and a solution produced by adapting its target language component. This thesis advances the state of the art in example-based machine translation by proposing techniques for predicting the adaptation requirements of a retrieval episode. An Adaptation-Guided Retrieval policy increases the efficiency of the retriever, which will now search for adaptable cases, and relieves the knowledge-acquisition bottleneck of the adaptation component. A flexible case-storage scheme also allows all knowledge required for adaptation to be deduced from the case-base itself. The first part of the thesis contrasts such a CBR-motivated approach with current EBMT systems which are either data-intensive or knowledge-intensive. A new EBMT scheme is proposed in which the cases encode knowledge about their own reusability, determined by cross-linguistic mappings. The information allows cases to be generalised carefully, to the degree that is necessitated by the data. Linguistic and translational divergences (the obstacles to reusability) are investigated in the domain of software-manual translation, and on this basis, a suitable case representation scheme is proposed. The second and third parts of the thesis describe the on-line and off-line processes of an EBMT system in which the case-base is the only knowledge source. Cases are deduced from texts automatically, and at run-time, the matching and retrieval tasks exploit the adaptability information in the cases in order to maximise coverage without compromising on accuracy. The multi-tiered case representation scheme allows adaptation at the sub-sentential and word levels, when necessary. The general performance of the system is shown to degrade gracefully and to improve as the case-base size increases.

01 Jan 1999
TL;DR: Safety apparatus intended to safeguard the operation of a machine and to discourage overloading of an electric motor powering the machine.
Abstract: Safety apparatus intended to safeguard the operation of a machine and to discourage overloading of an electric motor powering the machine. The machine is one which requires the operator to wear an eye or face protector and incorporated in the protector is switch means connected by an electric circuit to the machine motor. Only when the protector is in normal position of use will the machine operate or the work be properly illuminated by a floodlight also forming part of the circuit. A heat sensitive switch in the circuit aids in preventing the operator from overloading the machine motor.

13 Sep 1999
TL;DR: An example-based machine translation (EBMT) system for English-Malay translation is described, which relies solely on example translations kept in a Bilingual Knowledge Bank (BKB).
Abstract: In this paper, we describe an Example-Based Machine Translation (EBMT) system for English-Malay translation. Our approach is an example-based approach which relies solely on example translations kept in a Bilingual Knowledge Bank (BKB). In our approach, a flexible annotation schema called Structured String-Tree Correspondence (SSTC) is used to annotate both the source and target sentences of a translation pair. Each SSTC describes a sentence, a representation tree as well as the correspondences between substrings in the sentence and subtrees in the representation tree. With both the source and target SSTCs established, a translation example in the BKB can then be represented effectively in terms of a pair of synchronous SSTCs. In the process of translation, we first try to build the representation tree for the source sentence (English) based on the example-based parsing algorithm as presented in [1]. By referring to the resultant source parse tree, we then proceed to synthesize the target sentence (Malay) based on the target SSTCs as pointed to by the synchronous SSTCs which encode the relationship between source and target SSTCs.

Book
01 Jan 1999
TL;DR: Graph theory and natural language processing, unification-based formalisms for translation in natural language processing, and the structure and implementation of MILC are presented.
Abstract: Graph Theory and Natural Language Processing.- Unification-Based Formalisms for Translation in Natural Language Processing.- MILC: Structure and Implementation.- Experiments and Results.- Conclusion and Outlook.

13 Sep 1999
TL;DR: C-DAC took up this challenge, as it felt that India, being a multi-lingual and multi-cultural country with a population of approximately 950 million people and 18 constitutionally recognized languages, needs a translation system for instant transfer of information and knowledge.
Abstract: Work in the area of Machine Translation has been going on for several decades and it was only during the early 90s that a promising translation technology began to emerge with advanced research in the field of Artificial Intelligence and Computational Linguistics. This held the promise of successfully developing usable Machine Translation Systems in certain well-defined domains. C-DAC took up this challenge, as we felt that India, being a multi-lingual and multi-cultural country with a population of approximately 950 million people and 18 constitutionally recognized languages, needs a translation system for instant transfer of information and knowledge. The other groups working in this area of English to Hindi Translation are the National Center for Software Technology (NCST), who are working on translation of News Stories, and the Electronics Research & Development Center of India (ER & DCI), who have developed the Machine Assisted Translation System for the Health Domain. A major project on Indian Languages to Indian Languages Translation (Anusaaraka) is also under development at University of Hyderabad.

Patent
30 Nov 1999
TL;DR: A translation script operates to select a translation routine from a set of available translation routines, the selection being based on the nature of the text file, the operating system, and the desired language translation.
Abstract: A method of providing a desired language version of textual portions of a source code program for a computer system. During the system assembly process, a system description record (SDR) is read that identifies the operating system, including the desired language version thereof, and other software programs. A text file corresponding to at least one of the programs is read and a native-language version of the program is installed on the computer system. A translation script operates to select a translation routine from a set of available translation routines, the selection being based on the nature of the text file, the operating system, and the desired language translation. The translation routine locates native-language text strings in the text file and substitutes the desired language translations of those strings. The translation process takes place substantially concurrently with installation of the program in the computer system.

01 Jan 1999
TL;DR: This research is aimed at developing a valency dictionary architecture to comprehensively list the full range of alternations associated with a given predicate sense, both efficiently and robustly.
Abstract: This research is aimed at developing a valency dictionary architecture to comprehensively list the full range of alternations associated with a given predicate sense, both efficiently and robustly. The architecture is designed to incorporate all information available in current on-line resources, as well as additional features such as argument status, grammatical relations, and an augmented case-role representation. Words are divided into senses, which are distinguished on semantic grounds, depending on the core lexical meaning of the verb. Each sense may have one or more alternations, thus keeping the number of senses manageable, while allowing for systematic variation in the lexical realization. Individual syntactic case frames are indexed back to the basic semantic argument component of the given predicate sense.

13 Sep 1999
TL;DR: A controlled language brings out the maximum performance of machine translation systems at the cost of the burden on source text authors, so the controlled language approach is suitable for translation for dissemination of information.
Abstract: A controlled language is a subset of a natural language with artificially restricted vocabulary, grammar, and style. Texts written in a controlled language are usually less complex and less ambiguous than those written in an uncontrolled language. The use of a controlled language therefore produces better results in machine translation. On the other hand, a controlled language reduces the power of expression and decreases the writing speed. In short, a controlled language brings out the maximum performance of machine translation systems at the cost of the burden on source text authors. So the controlled language approach is suitable for translation for dissemination of information. And a controlled language becomes more beneficial when texts are translated into multiple target languages. We should note the distinction between a controlled language and a sublanguage, which are sometimes confused. The term ‘sublanguage’, which means literally a subset of a language, is used when focus is put on a language used in a specific domain (for example, weather forecasting) rather than on the whole of a language. ‘Sublanguage’ does not imply artificially imposed restrictions. We should also mention pre-editing. Pre-editing is a form of human assistance in machine translation. It includes not only rewriting a source text but also inserting special symbols or tags within the text. Pre-editing is not always done by the authors of source texts, but the controlled language is originally expected to be used by the authors themselves.
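
To make the idea of "artificially restricted vocabulary, grammar, and style" concrete, here is a minimal sketch of a checker that flags unapproved words and overlong sentences. The word list, the 20-word limit, and the function name `check_sentence` are hypothetical; real controlled languages define far richer rules than this.

```python
import re

# Hypothetical approved vocabulary and sentence-length limit.
APPROVED_WORDS = {"remove", "the", "filter", "before", "you", "clean", "it"}
MAX_WORDS_PER_SENTENCE = 20


def check_sentence(sentence):
    """Return a list of rule violations; an empty list means the sentence conforms."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    problems = []
    if len(words) > MAX_WORDS_PER_SENTENCE:
        problems.append(
            f"sentence has {len(words)} words (limit {MAX_WORDS_PER_SENTENCE})"
        )
    for word in words:
        if word not in APPROVED_WORDS:
            problems.append(f"unapproved word: {word}")
    return problems


if __name__ == "__main__":
    print(check_sentence("Remove the filter before you clean it."))
    print(check_sentence("Detach the filter."))
```

A checker like this reports problems to the author rather than rewriting the text, which matches the expectation that authors themselves write in the controlled language.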

Journal ArticleDOI
TL;DR: An iterative paradigm is used to examine errors associated with interlingual divergence in meaning arising from the automated machine translation of English proverbs and the need for the development of Web‐based translation systems, which have an explicit cross‐linguistic representation of meaning for successful intercultural communication is discussed.
Abstract: The Internet has the potential to facilitate understanding across cultures and languages by removing the physical barriers to intercultural communication. One possible contributor to this development has been the recent release of freely‐available automated direct machine translation systems, such as AltaVista with SYSTRAN, which translates from English to five other European languages (French, German, Italian, Spanish and Portuguese), and vice versa. However, concerns have recently been raised over the performance of these systems, and the potential for confusion that can be created when the intended meaning of sentences is not correctly translated (i.e. semantic processing errors). In this paper, we use an iterative paradigm to examine errors associated with interlingual divergence in meaning arising from the automated machine translation of English proverbs. The need for the development of Web‐based translation systems, which have an explicit cross‐linguistic representation of meaning for successful intercultural communication, is discussed.

01 Jan 1999
TL;DR: A building blocks approach (a term borrowed from the theoretical framework discussed in Lange et al (1997)), is advantageous in that it extracts fragments of text, from a traditional TM database, that more closely represent those with which a human translator works.
Abstract: Traditional Translation Memory systems that find the best match between a SL input sentence and SL sentences in a database of previously translated sentences are not ideal. Studies in the cognitive processes underlying human translation reveal that translators very rarely process SL text at the level of the sentence. The units with which translators work are usually much smaller, i.e. word, syntactic unit, clause or group of meaningful words. A building blocks approach (a term borrowed from the theoretical framework discussed in Lange et al (1997)) is advantageous in that it extracts fragments of text, from a traditional TM database, that more closely represent those with which a human translator works. The text fragments are combined with the intention of producing TL translations that are more accurate, thus requiring less post-editing on the part of the translator.

13 Sep 1999
TL;DR: A translation method which recursively divides a sentence and translates each part separately and an analogy-based word-level alignment method which predicts word correspondences between source and translation sentences of new translation examples are evaluated.
Abstract: Example-Based Machine Translation can be applied to languages for which resources such as dictionaries and reliable syntactic analyzers are hardly available, because it can learn from new translation examples. However, difficulties still remain in the translation of sentences which are not fully covered by the matching sentence. To solve that problem, we present in this paper a translation method which recursively divides a sentence and translates each part separately. In addition, we evaluate an analogy-based word-level alignment method which predicts word correspondences between source and translation sentences of new translation examples. The translation method was implemented in a French-Japanese machine translation system, and spoken language texts were used as examples. Promising translation results were obtained and the effectiveness of the alignment method in the translation was confirmed.

Proceedings ArticleDOI
12 Oct 1999
TL;DR: Learning methods presented here enable a supervised, human-assisted learning of generalised translation rules, thus making it faster and easier to adapt the machine translation system to new languages.
Abstract: The purpose of this paper is to present learning methods for creating language translation rules from multilingual text samples. The languages concerned are controlled languages, i.e., they are domain-specific sublanguages with ambiguities eliminated by restricting the vocabulary and syntax. Learning methods presented here enable a supervised, human-assisted learning of generalised translation rules, thus making it faster and easier to adapt our machine translation system to new languages.

13 Sep 1999
TL;DR: The technology the authors demonstrate has first been applied to Persian-English machine translation within the Shiraz project and is currently extended to cover languages such as Arabic, Japanese, Korean and others.
Abstract: The Computing Research Laboratory is currently developing technologies that allow rapid deployment of automatic translation capabilities. These technologies are designed to handle low-density languages for which resources, be that human informants or data in electronically readable form, are scarce. All tools are built in an incremental fashion, such that some simple tools (a bilingual dictionary or a glosser) can be delivered early in the development to support initial analysis tasks. More complex applications can be fielded in successive functional versions. The technology we demonstrate has first been applied to Persian-English machine translation within the Shiraz project and is currently extended to cover languages such as Arabic, Japanese, Korean and others.

13 Sep 1999
TL;DR: Results of the implementation show that the approach of using terminology categorization already existing in the machine translation system is very promising.
Abstract: This paper describes an ongoing project which has the goal of improving machine translation quality by increasing knowledge about the text to be translated. A basic piece of such knowledge is the domain or subject field of the text. When this is known, it is possible to improve meaning selection appropriate to that domain. Our current effort consists in automating both recognition of the text’s domain and the assignment of domain-specific translations. Results of our implementation show that the approach of using terminology categorization already existing in the machine translation system is very promising.

01 Jan 1999
TL;DR: The JANUS-II system accepts spontaneous conversational speech in English, German, or Spanish, translates it into German, English, Spanish, Japanese, or Korean, and uses paraphrases for interactive correction of translation errors.
Abstract: We present JANUS-II, a large scale system effort aimed at interactive spoken language translation. JANUS-II now accepts spontaneous conversational speech in a limited domain in English, German or Spanish and produces output in German, English, Spanish, Japanese and Korean. The challenges of coarticulated, disfluent, ill-formed speech are manifold, and have required advances in acoustic modeling, dictionary learning, language modeling, semantic parsing and generation, to achieve acceptable performance. A semantic interlingua that represents the intended meaning of an input sentence, facilitates the generation of culturally and contextually appropriate translation in the presence of irrelevant or erroneous information. Application of statistical, contextual, prosodic and discourse constraints permits a progressively narrowing search for the most plausible interpretation of an utterance. During translation, JANUS-II produces paraphrases that are used for interactive correction of translation errors. Beyond our continuing efforts to improve robustness and accuracy, we have also begun to study possible forms of deployment. Several system prototypes have been implemented to explore translation needs in different settings: speech translation in one-on-one video conferencing, as portable mobile interpreter, or as passive simultaneous conversation translator. We will discuss their usability and performance.

Book
01 Jun 1999
TL;DR: In this paper, the authors investigate the requirements for automatically recognizing idioms and check whether idiom recognition is possible within current translation systems, i.e. machine translation and translation memory systems.
Abstract: Translating idioms is one of the most difficult tasks for human translators and translation machines alike. The main problems consist in recognizing an idiom and in distinguishing idiomatic from non-idiomatic usage. Recognition is difficult since many idioms can be modified and others can be discontinuously spread over a clause. But with the help of systematic idiom collections and special rules the recognition of an idiom candidate is always possible. The distinction between idiomatic and non-idiomatic usage is more problematic. Sometimes this can be done by means of special words that are only used in an idiom. But in general this distinction is a question of semantics and pragmatics and therefore beyond the abilities of current translation systems. In this paper we investigate the requirements for automatically recognizing idioms and we check whether idiom recognition is possible within current translation systems, i.e. machine translation and translation memory systems. This is of current interest since the developers of translation systems have started to include huge idiom collections in their products.

11 May 1999
TL;DR: The first half of the paper describes the details of automating most of the translation from C to C++, as well as the difficulties encountered, and identifies some of the issues and challenges in automating this translation process.
Abstract: As programming languages become more and more diversified, there is an increasing demand to translate programs written in one high-level language into another. Such translation can help us reuse existing code more effectively, especially when automating translation is possible. However, due to many subtle distinctions between different languages, usually only a subset of translation can be automated. The first half of the paper describes the details of automating most of the translation from C to C++, as well as the difficulties encountered. The second half of the paper talks about the experience of manually porting Java programs to C++, and identifies some of the issues and challenges in automating this translation process. Through the discussions, it is evident that translation is heavily language specific. Comprehensive knowledge about the languages and their subtle distinctions is essential. On the other hand, designing tools to allow high-level specification of translation rules and effectively incorporate human interaction is a generic approach to any language translation problem, which is an interesting research problem to explore.

Journal Article
TL;DR: This paper analyses the influence of information society on translation teaching from the following aspects: the traditional translation criterion of “be faithful to the original author” being challenged, diversification of translation object, and change in translation ways and means.
Abstract: The information boom and the Internet bring opportunities and challenges to translation teaching. This paper analyses the influence of information society on translation teaching from the following aspects: 1. The traditional translation criterion of “be faithful to the original author” being challenged; 2. Diversification of translation object; 3. The change in translation ways and means; 4. “Machine translation system” and translation teaching; 5. The development of electronic textbooks. Therefore, as teachers of translation, we should continuously study new knowledge and new technology as well as use computers proficiently.