Showing papers on "Computer-assisted translation" published in 2003


Proceedings ArticleDOI
27 May 2003
TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.
Abstract: We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several, previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models out-perform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.

3,778 citations
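
The "heuristic learning of phrase translations from word-based alignments" mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_phrases`, the `max_len` parameter and the toy data are assumptions made for illustration, and the sketch omits refinements such as extending phrases over unaligned words or estimating phrase probabilities. It keeps a phrase pair only when no word inside it is aligned to a word outside it.

```python
from typing import List, Set, Tuple


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Set[Tuple[int, int]],
                    max_len: int = 3) -> Set[Tuple[str, str]]:
    """Collect phrase pairs that are consistent with a word alignment.

    alignment holds (source_index, target_index) links.
    max_len is a hypothetical cap on phrase length, in words.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any source word in the span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: every link touching the target span
            # must come from inside the source span.
            consistent = all(s_start <= s <= s_end
                             for (s, t) in alignment
                             if t_start <= t <= t_end)
            if consistent:
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Toy example with a swapped word order (noun-adjective vs. adjective-noun).
pairs = extract_phrases(["maison", "bleue"], ["blue", "house"],
                        {(0, 1), (1, 0)})
```

With this toy alignment the sketch yields the two single-word pairs plus the two-word pair covering the whole sentence. The default `max_len` of 3 is arbitrary here and only echoes the abstract's remark that phrases longer than three words had little impact.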


Patent
14 Nov 2003
TL;DR: A system and method for translating electronic communications automatically selects and deploys specialized dictionaries based upon context recognition and other factors. It can be used to translate electronic mail, instant messages, chat, SMS messages, electronic text and word processing files, Internet web pages, Internet search results, and other textual communications for a variety of device types, including wireless devices.
Abstract: A system and method for translation of electronic communications automatically selects and deploys specialized dictionaries based upon context recognition and other factors. Software tools can be employed for continual dictionary enhancement. The invention can accept speech and text inputs and can be used to translate electronic mail, instant messages, chat, SMS messages, electronic text and word processing files, Internet web pages, Internet search results, and other textual communications for a variety of device types, including wireless devices. In one embodiment, language pairs are automatically determined in real-time.

225 citations


Proceedings Article
01 Jan 2003
TL;DR: This paper collects dialogue corpora by letting two people talk, each in his/her native language, through a speech-to-speech translation system; to concentrate on the translation modules, the speech recognition modules are replaced with human typists.
Abstract: This paper presents three approaches to creating corpora that we are working on for speech-to-speech translation in the travel conversation task. The first approach is to collect sentences that bilingual travel experts consider useful for people going to/coming from another country. The resulting English-Japanese aligned corpora are collectively called the basic travel expression corpus (BTEC), which is now being translated into several other languages. The second approach tries to expand this corpus by generating many “synonymous” expressions for each sentence. Although we can create large corpora by the above two approaches relatively cheaply, they may be different from utterances in actual conversation. Thus, as the third approach, we are collecting dialogue corpora by letting two people talk, each in his/her native language, through a speech-to-speech translation system. To concentrate on translation modules, we have replaced speech recognition modules with human typists. We will report some of the characteristics of these corpora as well.

149 citations


Journal ArticleDOI
TL;DR: In this article, the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process was investigated, and the results showed that the Web-based translation models can surpass commercial MT systems in CLIR tasks.
Abstract: Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.

128 citations


Proceedings ArticleDOI
07 Jul 2003
TL;DR: A dedicated noun phrase translation subsystem is built that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features.
Abstract: We define noun phrase translation as a subtask of machine translation. This enables us to build a dedicated noun phrase translation subsystem that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features. We achieved 65.5% translation accuracy in a German-English translation task vs. 53.2% with IBM Model 4.

117 citations


Proceedings ArticleDOI
26 Oct 2003
TL;DR: A decoder for statistical machine translation is described that allows controlled reordering of the words generated in the target language; the effect of the length of the reordering window on the search space and the translation quality is analyzed.
Abstract: We describe a decoder for statistical machine translation which allows controlled reordering of the words generated in the target language. After a general discussion of the structure of a decoder, a particular implementation is discussed which allows for word-to-word and phrase-to-phrase translation. Word reordering is used to improve the translation quality. We analyze the effect of the length of this reordering window on the search space and the translation quality. Results for Chinese-to-English and Arabic-to-English translation tasks are presented.

80 citations


Patent
12 Mar 2003
TL;DR: A translation mediate server automatically translates a client's document using dictionaries registered by individual translators and sends the result to the client terminal (1A), so that the client requesting translation can select the best translator.
Abstract: The dictionaries that each translator has made are registered in the translator dictionary database (11) of the translation mediate server (4), divided according to the special field to which each translator belongs. The client requesting translation sends the document to be translated from, for example, client terminal (1A) to the translation mediate server (4). There, the automatic translating means (10) reads out from the translator dictionary database (11) the translation dictionary corresponding to the document's field and performs automatic translation of the document using this dictionary. The translation result is sent to client terminal (1A) so that the client requesting translation can select the best translator. The selected translator is informed of the translation request at his or her terminal, for example, translator terminal (2A).

75 citations


Proceedings ArticleDOI
12 Apr 2003
TL;DR: A fully fledged translation system is used to ensure the quality of the proposed extensions of the interactive machine translation system and word hypotheses graphs are used as an efficient search space representation to achieve fast response times.
Abstract: The goal of interactive machine translation is to improve the productivity of human translators. An interactive machine translation system operates as follows: the automatic system proposes a translation. Now, the human user has two options: to accept the suggestion or to correct it. During the post-editing process, the human user is assisted by the interactive system in the following way: the system suggests an extension of the current translation prefix. Then, the user either accepts this extension (completely or partially) or ignores it. The two most important factors of such an interactive system are the quality of the proposed extensions and the response time. Here, we will use a fully fledged translation system to ensure the quality of the proposed extensions. To achieve fast response times, we will use word hypotheses graphs as an efficient search space representation. We will show results of our approach on the Verbmobil task and on the Canadian Hansards task.

61 citations
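
The accept-or-correct loop described in the abstract above can be sketched as follows. This is a deliberate simplification under stated assumptions: a plain n-best list stands in for the word hypotheses graph used in the paper, the scores are hypothetical log-probabilities, and the name `suggest_extension` is invented for illustration.

```python
from typing import List, Optional, Tuple


def suggest_extension(prefix: List[str],
                      hypotheses: List[Tuple[float, List[str]]]
                      ) -> Optional[List[str]]:
    """Return the rest of the best-scoring hypothesis that starts with prefix.

    hypotheses is a list of (score, words); higher scores are better.
    Returns None when no hypothesis is compatible with the prefix.
    """
    best = None
    for score, words in hypotheses:
        # Keep only hypotheses that agree with what the user has typed.
        if words[:len(prefix)] == prefix:
            if best is None or score > best[0]:
                best = (score, words)
    if best is None:
        return None
    return best[1][len(prefix):]


# Hypothetical system output for one source sentence.
nbest = [
    (-1.0, ["the", "house", "is", "small"]),
    (-1.5, ["the", "home", "is", "little"]),
]

# The user corrected the second word, so the suggestion switches hypothesis.
extension = suggest_extension(["the", "home"], nbest)
```

A graph-based search space would represent many more hypotheses compactly than an explicit list can, which is why the abstract singles it out as the route to fast response times.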


Book ChapterDOI
01 Jan 2003
TL;DR: A new example-based method of machine translation in which the examples need not be direct translations, allowing the use of currently available sentence-aligned corpora as data.
Abstract: This paper introduces a new example-based method of machine translation in which the examples need not be direct translations. The system will weed out strange examples during translation, allowing the use of currently available sentence-aligned corpora as data. Rule-based modules are used where appropriate. A prototype Japanese-to-English system has been implemented that allows multiple users to share corpora.

60 citations


01 Jan 2003
TL;DR: This workshop incrementally adds new features representing syntactic knowledge that deal with specific problems of the underlying baseline, and extends previous tree-based alignment models by allowing partial tree alignments when the two syntactic structures are not isomorphic.
Abstract: In recent evaluations of machine translation systems, statistical systems have outperformed classical approaches based on interpretation, transfer, and generation. Nonetheless, the output of statistical systems often contains obvious grammatical errors. This can be attributed to the fact that the syntactic well-formedness is only influenced by local n-gram language models and simple alignment models. We aim to integrate syntactic structure into statistical models to address this problem. In the workshop we start with a very strong baseline, the alignment template statistical machine translation system that obtained the best results in the 2002 and 2003 DARPA MT evaluations. This model is based on a log-linear modeling framework, which allows for the easy integration of many different knowledge sources (i.e. feature functions) into an overall model and to train the feature function combination weights discriminatively. During the workshop, we incrementally add new features representing syntactic knowledge that deal with specific problems of the underlying baseline. We want to investigate a broad range of possible feature functions, from very simple binary features to sophisticated tree-to-tree translation models. Simple feature functions test if a certain constituent occurs in the source and the target language parse tree. More sophisticated features are derived from an alignment model where whole sub-trees in source and target can be aligned node by node. We also plan to investigate features based on projection of parse trees from one language onto strings of another, a useful technique when parses are available for only one of the two languages. We extend previous tree-based alignment models by allowing partial tree alignments when the two syntactic structures are not isomorphic.
We work with the Chinese-English data from the recent evaluations, as large amounts of sentence-aligned training corpora, as well as multiple reference translations, are available. This will also allow us to compare results with the various systems participating in the evaluations. In addition, an annotated Chinese-English parallel tree-bank is available. We evaluate the improvement of our system using the BLEU metric. Using the additional feature functions developed during the workshop, the BLEU score improved from 31.6% for the baseline MT system to 33.2% using rescoring of a 1000-best list.

60 citations
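
The log-linear framework and n-best rescoring described in the abstract above can be sketched in a few lines. The feature names, values and weights below are invented for illustration and are not taken from the workshop. The sketch shows only the scoring rule (a weighted sum of feature functions) and the selection of the best candidate, not the discriminative training of the weights.

```python
from typing import Dict, List


def loglinear_score(features: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def rescore(nbest: List[dict], weights: Dict[str, float]) -> dict:
    """Pick the candidate with the highest log-linear score."""
    return max(nbest, key=lambda h: loglinear_score(h["features"], weights))


# Two hypothetical candidates from an n-best list.
nbest = [
    {"text": "candidate A", "features": {"baseline": -2.0, "syntax": 0.0}},
    {"text": "candidate B", "features": {"baseline": -2.2, "syntax": 1.0}},
]

# With a syntactic feature switched on, candidate B overtakes candidate A.
best_with_syntax = rescore(nbest, {"baseline": 1.0, "syntax": 0.5})
best_baseline_only = rescore(nbest, {"baseline": 1.0})
```

Adding a new knowledge source in this setup means adding one more key to each feature dictionary and one more weight, which is the ease of integration the abstract attributes to the log-linear framework.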


01 Jan 2003
TL;DR: A dedicated noun phrase translation subsystem is built that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features and shows overall improvement in translation quality.
Abstract: We define noun phrase translation as a subtask of statistical machine translation. This enables us to build a dedicated noun phrase translation subsystem that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features. We integrate such a system into a state-of-the-art statistical machine translation system with novel methods and show overall improvement in translation quality. We also carry out empirical linguistic studies on noun phrase translatability and the sources of translation errors.

Book ChapterDOI
01 Jan 2003
TL;DR: It is shown that Example-Based Machine Translation, as long as it is linguistically principled, significantly overlaps with other linguistically principled approaches to Machine Translation.
Abstract: We maintain that the essential feature that characterizes a Machine Translation approach and sets it apart from other approaches is the kind of knowledge it uses. From this perspective, we argue that Example-Based Machine Translation is sometimes characterized in terms of nonessential features. We show that Example-Based Machine Translation, as long as it is linguistically principled, significantly overlaps with other linguistically principled approaches to Machine Translation. We make a proposal for translation knowledge bases that make such an overlap explicit. We relate our proposal to translation by analogy, which stands out as an inherently example-based technique.


Book ChapterDOI
21 Aug 2003
TL;DR: Monolingual, bilingual, and multilingual retrieval experiments show that document translation-based retrieval is slightly better than query translation-based retrieval on the CLEF 2003 test collection.
Abstract: This paper describes monolingual, bilingual, and multilingual retrieval experiments using the CLEF 2003 test collection. The paper compares query translation-based multilingual retrieval with document translation-based multilingual retrieval where the documents are translated into the query language by translating the document words individually using machine translation systems or statistical translation lexicons derived from parallel texts. The multilingual retrieval results show that document translation-based retrieval is slightly better than the query translation-based retrieval on the CLEF 2003 test collection. Furthermore, combining query translation and document translation in multilingual retrieval achieves even better performance.

Patent
23 Sep 2003
TL;DR: A system and method for generating language-translated versions of software includes a parsing engine that scans original-language versions and detects textual strings or other expressions which may require translation for other countries or markets.
Abstract: A system and method for generating language-translated versions of software include a parsing engine to scan original-language versions of software, and detect textual string or other expressions which may require translation for other countries or markets. After testing for prior translation, those strings may be converted to appropriate expressions in other languages, and for instance stored in paired-memory or other format. Users may download the original version of the software, and then install run-time, language-specific resources to tailor the software to their market or country. The run-time, language-specific resources may be or include resource-only dynamic link libraries (dlls). In embodiments the target language into which translation may be made may be automatically detected using the regional settings of the user's machine, or otherwise. Because translation resources for various sets of languages may be generated before the release of the original code, software products may be deployed in various markets and countries at the same time as the original code. Staggered release of localized versions of a software product in one country after the other is therefore not necessary, and software maintenance is made more efficient.

Proceedings Article
01 Jan 2003
TL;DR: It is shown that, although domain actions are domain specific, the approach scales up to large domains without an explosion of domain actions and can be coded with high inter-coder reliability across research sites.
Abstract: We describe a coding scheme for machine translation of spoken task-oriented dialogue. The coding scheme covers two levels of speaker intention: domain-independent speech acts and domain-dependent domain actions. Our database contains over 14,000 tagged sentences in English, Italian, and German. We argue that domain actions, and not speech acts, are the relevant discourse unit for improving translation quality. We also show that, although domain actions are domain specific, the approach scales up to large domains without an explosion of domain actions and can be coded with high inter-coder reliability across research sites. Furthermore, although the number of domain actions is on the order of ten times the number of speech acts, sparseness is not a problem for the training of classifiers for identifying the domain action. We describe our work on developing high accuracy speech act and domain action classifiers, which is the core of the source language analysis module of our NESPOLE machine translation system.

Posted Content
TL;DR: Since the machine is at present not capable of automatically interpreting a general text with sufficient accuracy, let alone re-expressing it for a given audience, it fails to perform as FGH-MT.
Abstract: Fully-automatic general-purpose high-quality machine translation systems (FGH-MT) are extremely difficult to build. In fact, there is no system in the world for any pair of languages which qualifies to be called FGH-MT. The reasons are not far to seek. Translation is a creative process which involves interpretation of the given text by the translator. Translation would also vary depending on the audience and the purpose for which it is meant. This would explain the difficulty of building a machine translation system. Since the machine is at present not capable of automatically interpreting a general text with sufficient accuracy, let alone re-expressing it for a given audience, it fails to perform as FGH-MT. (Footnote: The major difficulty that the machine faces in interpreting a given text is the lack of general world knowledge or common sense knowledge.)

Proceedings ArticleDOI
31 May 2003
TL;DR: This article examines the task of identifying the target-language words that correspond to a given set of source-language words in a pair of text segments known to be mutual translations, within the context of a sub-sentential translation-memory system, i.e. a translation support tool capable of proposing translations for portions of a source-language sentence, extracted from an archive of existing translations.
Abstract: The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of source-language (SL) words in a pair of text segments known to be mutual translations. This article examines this task within the context of a sub-sentential translation-memory system, i.e. a translation support tool capable of proposing translations for portions of a SL sentence, extracted from an archive of existing translations. Different methods are proposed, based on a statistical translation model. These methods take advantage of certain characteristics of the application, to produce TL segments submitted to constraints of contiguity and compositionality. Experiments show that imposing these constraints allows important gains in accuracy, with regard to the most probable alignments predicted by the model.

Proceedings Article
01 Jan 2003
TL;DR: A working two-way speech-to-speech translation system that runs in near real-time on a consumer handheld computer that can translate from English to Arabic and Arabic to English in the domain of medical interviews is described.
Abstract: This paper describes a working two-way speech-to-speech translation system that runs in near real-time on a consumer handheld computer. It can translate from English to Arabic and Arabic to English in the domain of medical interviews. We describe the general architecture and frameworks within which we developed each of the components: HMM-based recognition, interlingua translation (both rule and statistically based), and unit selection synthesis.

Proceedings ArticleDOI
07 Jul 2003
TL;DR: An alternative translation model based on text chunks is described under the framework of statistical machine translation; experiments on a broad-coverage Japanese-English travel corpus show improved performance.
Abstract: This paper describes an alternative translation model based on a text chunk under the framework of statistical machine translation. The translation model suggested here first performs chunking. Then, each word in a chunk is translated. Finally, translated chunks are reordered. Under this scenario of translation modeling, we have experimented on a broad-coverage Japanese-English traveling corpus and achieved improved performance.

Book
31 Jul 2003
TL;DR: This chapter discusses translation into the foreign language and its cultural adaptation to the reader, as well as the role of language skills in this process.
Abstract: Introduction to English Translation Foreword Part I: The Theoretical Aspects of Translation Chapter 1: Translation through Interpretation 1.1. The three levels of translation 1.2. Interpreting 1.3. The oral and the written 1.4. The oral origins of the interpretive explanation of translation 1.5. What is interpretation? 1.5.1. Deverbalization 1.5.2. Sense 1.5.3. The immediate grasp of sense 1.5.4. Units of sense 1.6. The written form 1.7. Understanding 1.7.1. Understanding the linguistic component 1.7.2. Understanding what is implicit 1.7.3. Cognitive inputs 1.8. Expression 1.8.1. Reverbalization 1.8.2. The verification stage 1.8.3. Identical contents, equivalent forms Chapter 2: Equivalence and correspondence 2.1. Equivalence and correspondence 2.1.1. What is equivalence? 2.1.2. What is correspondence? 2.2. Translation by equivalence 2.2.1. Cognitive equivalence 2.2.2. Affective equivalence 2.2.3. The global nature of equivalence 2.2.4. Explicit or synecdoche 2.2.5. The spirit of a language and the creation of equivalents 2.2.6. How to evaluate equivalence? 2.3. Correspondences which are appropriate when translating texts 2.3.1. Words chosen deliberately 2.3.2. Enumerations 2.3.3. Technical terms 2.3.4. Polysemy and actualization 2.3.5. The various forms of translation by correspondence 2.4. Faithfulness and freedom Chapter 3: Language and Translation 3.1. Linguistics and translation 3.1.1. Structural linguistics 3.1.2. Generative linguistics 3.1.3. Communication and the interactionist approach 3.2. Langue, parole and text: some definitions 3.3. Macro-signs and hypotheses of senses 3.4. Interpretation 3.5. Two demonstrations of interpretation 3.5.1. Interpretation from the actor 3.5.2. Interpretation made explicit Part II: The Practice of Translation Chapter 4: The Practical Problems of Translation 4.1. A few problems observed in practice 4.1.1. The absence of deverbalization 4.1.2. Deverbalization, a methodological issue 4.1.3. 
The translation unit 4.1.4. Faithfulness 4.1.5. The transfer of culture Chapter 5: Translation and the Teaching of Languages 5.1. The natural tendency of all learners 5.2. Comparative studies and the teaching of translation 5.3. The awkward position of translation 5.4. Translation into the foreign language (thème) and translation into the mother tongue (version) 5.4.1. Translation into the foreign language (thème) 5.4.2. Translation into the mother tongue (version) 5.5. How to improve the language skills of the would-be-translator 5.5.1. The language skills course 5.5.2. The self-study brochure 5.6. The teaching of translation Chapter 6: Translation into the Foreign Language 6.1. Into which language should one translate? 6.2. The limits of translation into the foreign language 6.3. Acceptability in translation 6.3.1. The complementarity between the specialist reader and the foreign language translation 6.3.2. Foreign language translation and its cultural adaptation to the reader 6.3.3. The general public and translation into a foreign language Chapter 7: Machine Translation versus Human Translation 7.1. An historical overview of machine translation 7.2. Machine translation today 7.2.1. Fully automatic machine translation 7.2.2. Human intervention 7.3. How the machine understands languages 7.3.1. Lexical data 7.3.2. Transformational rules 7.3.3. Parsing 7.4. Comparing humans and machines 7.4.1. The differences 7.4.2. The similarities 7.4.3. Real world knowledge and contextual knowledge 7.5. Machines move closer to humans 7.5.1. Knowledge bases 7.5.2. Neural networks 7.6. Machine-aided human translation Afterword Appendix 1 Cannery Row Appendix 2 The Woman behind the Woman

Journal ArticleDOI
TL;DR: This essay argues for a homology between machine translation and global English: both exist in the technocratic mode and abide by the principle of instrumental rationality. It also holds that basic language, with its privileging of communicability and immediate legibility, is the precondition for the global network of programming languages.
Abstract: This essay critiques Warren Weaver's articulation of machine translation as a problem of cryptography and his analogizing of the treatment of language within the context of machine translation to C. K. Ogden and I. A. Richards's Basic English project. Basic language, with its privileging of communicability and immediate legibility, is the precondition for the global network of programming languages. Focusing on the underlying principles of machine translation, functionality, and performativity, this essay argues for a homology between machine translation and global English: both exist in the technocratic mode and abide by the principle of instrumental rationality.

DissertationDOI
01 Jan 2003
TL;DR: This paper presents a meta-analysis of how translation memory tools help human translators recycle portions of their previous work by storing previously translated material in the context of source texts.
Abstract: Translation memory (TM) tools help human translators recycle portions of their previous work by storing previously translated material. This material is aligned, i.e. segments of the source texts are linked with their equivalents in the corresponding target texts. When a translator uses a TM …
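
The storage idea in the abstract above (source segments linked with their target equivalents) can be illustrated with a minimal lookup sketch. It does not reproduce the dissertation's methodology or any particular TM product: the function name `tm_lookup`, the 0.7 threshold, the sample segments and the use of character-level similarity from Python's `difflib` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def tm_lookup(segment: str,
              memory: List[Tuple[str, str]],
              threshold: float = 0.7) -> Optional[Tuple[str, str, float]]:
    """Return the closest stored (source, target, similarity), or None.

    memory is a list of aligned (source_segment, target_segment) pairs.
    threshold is a hypothetical minimum similarity for proposing a match.
    """
    best = None
    for source, target in memory:
        # Character-level similarity in [0, 1]; 1.0 means an exact match
        # once case is ignored.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# A tiny hypothetical English-French memory.
memory = [
    ("Close the file.", "Fermez le fichier."),
    ("Open the window.", "Ouvrez la fenêtre."),
]

exact = tm_lookup("Close the file.", memory)
fuzzy = tm_lookup("Close the files.", memory)
miss = tm_lookup("xyz", memory)
```

In this sketch a near match still returns the stored translation together with its similarity score, so a translator could decide whether the proposal is worth editing, while an unrelated segment returns nothing.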

Proceedings ArticleDOI
27 May 2003
TL;DR: The development of the Speechalator software-based translation system required addressing a number of hard issues, including a new language for the team, close integration on a small device, computational efficiency on a limited platform, and scalable coverage for the domain.
Abstract: This demonstration involves two-way automatic speech-to-speech translation on a consumer off-the-shelf PDA. This work was done as part of the DARPA-funded Babylon project, investigating better speech-to-speech translation systems for communication in the field. The development of the Speechalator software-based translation system required addressing a number of hard issues, including a new language for the team (Egyptian Arabic), close integration on a small device, computational efficiency on a limited platform, and scalable coverage for the domain.

Proceedings ArticleDOI
12 Apr 2003
TL;DR: The latest performance of components and features of a project named Corpus-Centered Computation (C3), which targets a translation technology suitable for spoken language translation, are reported.
Abstract: This paper reports the latest performance of components and features of a project named Corpus-Centered Computation (C3), which targets a translation technology suitable for spoken language translation. C3 places corpora at the center of the technology. Translation knowledge is extracted from corpora by both EBMT and SMT methods, translation quality is gauged by referring to corpora, the best translation among multiple-engine outputs is selected based on corpora and the corpora themselves are paraphrased or filtered by automated processes.

Patent
23 Sep 2003
TL;DR: A parsing engine scans original-language versions of software and detects textual strings or other expressions which may require translation for other countries or markets; after testing for prior translation, those strings may be converted to appropriate expressions in other languages and, for instance, stored in paired-memory or other format.
Abstract: Generating language-translated versions of software includes using a parsing engine to scan original-language versions of software and detect textual strings or other expressions which may require translation for other countries or markets. After testing for prior translation, those strings may be converted to appropriate expressions in other languages, and for instance stored in paired-memory or other format. Users may download the original version of the software, and then install run-time, language-specific resources to tailor the software to their market or country. The run-time, language-specific resources may be or include resource-only dynamic link libraries (dlls). In embodiments the target language into which translation may be made may be automatically detected using the regional settings of the user's machine, or otherwise. Because translation resources for various sets of languages may be generated before the release of the original code, software products may be deployed in various markets and countries at the same time as the original code. Staggered release of localized versions of a software product in one country after the other is therefore not necessary, and software maintenance is made more efficient.

Proceedings ArticleDOI
27 May 2003
TL;DR: A technical overview of the state of the art in machine translation, given that several statistical MT projects have appeared in North America, Europe, and Asia and the literature is growing substantially.
Abstract: Automatic translation from one human language to another using computers, better known as machine translation (MT), is a long-standing goal of computer science. Accurate translation requires a great deal of knowledge about the usage and meaning of words, the structure of phrases, the meaning of sentences, and which real-life situations are plausible. For general-purpose translation, the amount of required knowledge is staggering, and it is not clear how to prioritize knowledge acquisition efforts. Recently, there has been a fair amount of research into extracting translation-relevant knowledge automatically from bilingual texts. In the early 1990s, IBM pioneered automatic bilingual-text analysis. A 1999 workshop at Johns Hopkins University saw a re-implementation of many of the core components of this work, aimed at attracting more researchers into the field. Over the past years, several statistical MT projects have appeared in North America, Europe, and Asia, and the literature is growing substantially. We will provide a technical overview of the state-of-the-art.


Proceedings ArticleDOI
07 Jul 2003
TL;DR: Describes a Web-based English-Chinese concordance system developed to promote translation reuse and encourage authentic and idiomatic use in second language writing, with high-precision bilingual alignment on the sentence, phrase, and word levels.
Abstract: This paper describes a Web-based English-Chinese concordance system, Total-Recall, developed to promote translation reuse and encourage authentic and idiomatic use in second language writing. We exploited and structured existing high-quality translations from the bilingual Sinorama Magazine to build the concordance of authentic text and translation. Novel approaches were taken to provide high-precision bilingual alignment on the sentence, phrase, and word levels. A browser-based user interface (UI) was also developed for ease of access over the Internet. Users can search for a word, phrase, or expression in English or Chinese. The Web-based user interface facilitates the recording of user actions to provide data for further research.
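As a rough illustration of what a bilingual concordance lookup involves, the following Python sketch searches a list of sentence-aligned pairs for a query in either language. It is a simplified assumption, not the Total-Recall implementation: it uses plain substring matching rather than the paper's alignment methods, the names `AlignedPair` and `concordance` are hypothetical, and the sample sentences are invented rather than taken from Sinorama Magazine.

```python
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """One sentence-aligned translation unit (hypothetical structure)."""
    english: str
    chinese: str


def concordance(pairs, query):
    """Return every aligned pair whose English or Chinese side contains the query.

    Matching is a case-insensitive substring test, which is an assumption of
    this sketch; it does no tokenization, stemming, or word alignment.
    """
    needle = query.casefold()
    return [
        pair
        for pair in pairs
        if needle in pair.english.casefold() or needle in pair.chinese.casefold()
    ]


# Invented sample data for illustration only.
SAMPLE_PAIRS = [
    AlignedPair("He plays an important role.", "他扮演重要的角色。"),
    AlignedPair("The weather is fine today.", "今天天氣很好。"),
]
```

Searching `SAMPLE_PAIRS` for "role" returns the first pair and searching for "天氣" returns the second, so a query in either language brings back both sides of the matching translation.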

Journal Article
TL;DR: Notes are divided into three kinds (within sentences or phrases, after sentences, and after texts), and their uses in translation are discussed with a large number of examples.
Abstract: Language and cultural factors inevitably create impediments when translating from the source language to the target language, a process in which the "note" plays a typical role. After dividing notes into three kinds (within sentences or phrases, after sentences, and after texts), the paper discusses their uses in translation, drawing on a large number of examples.