Showing papers on "Computer-assisted translation" published in 2017

Journal Article•DOI•

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson¹, Mike Schuster¹, Quoc V. Le¹, Maxim Krikun¹, Yonghui Wu¹, Zhifeng Chen¹, Nikhil Thorat¹, Fernanda B. Viégas¹, Martin Wattenberg¹, Greg S. Corrado¹, Macduff Hughes¹, Jeffrey Dean¹•Institutions (1)

Google¹

09 Oct 2017-Transactions of the Association for Computational Linguistics

TL;DR: This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.

Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.

1,288 citations
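
As a rough illustration of the target-language token described in the abstract above, the following Python sketch prepends an artificial token to a source sentence before it would be passed to a single multilingual model. The function name and the `<2xx>` token spelling are assumptions made for this example; the abstract only states that an artificial token is placed at the beginning of the input sentence.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prefix a source sentence with an artificial target-language token.

    The "<2xx>" spelling is an assumption for illustration only.
    """
    return f"<2{target_lang}> {source_sentence}"


# The same source sentence can be routed to different target languages
# purely by changing the prefix; the model itself is not modified.
print(add_target_token("Hello, how are you?", "es"))
print(add_target_token("Hello, how are you?", "de"))
```

The point of the design is that the language choice lives in the data rather than in the architecture, which is why the abstract can claim that no changes to a standard NMT model are required.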

Journal Article•DOI•

Multi-way, multilingual neural machine translation

Orhan Firat¹, Kyunghyun Cho², Baskaran Sankaran³, Fatos T. Yarman Vural¹, Yoshua Bengio⁴•Institutions (4)

Middle East Technical University¹, New York University², IBM³, Canadian Institute for Advanced Research⁴

01 Sep 2017-Computer Speech & Language

TL;DR: The first attention-based neural-MT for multi-way, multilingual translation is proposed and it outperforms strong conventional statistical machine translation systems on Turkish-English and Uzbek-English by incorporating the resources of other language pairs.

85 citations

Proceedings Article•DOI•

A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output

Xing Niu¹, Marianna J. Martindale¹, Marine Carpuat¹•Institutions (1)

University of Maryland, College Park¹

01 Sep 2017

TL;DR: This work proposes to use lexical formality models to control the formality level of machine translation output and demonstrates the effectiveness of this approach in empirical evaluations, as measured by automatic metrics and human assessments.

Abstract: Stylistic variations of language, such as formality, carry speakers’ intention beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach in empirical evaluations, as measured by automatic metrics and human assessments.

70 citations

Journal Article•DOI•

Translators and machine translation: knowledge and skills gaps in translator pedagogy

Christopher D. Mellinger¹•Institutions (1)

Walsh University¹

03 Aug 2017-Interpreter and Translator Trainer

TL;DR: For translation graduates to serve as professional post-editors in the language industry, content must be embedded in multiple courses across the curriculum, rather than concentrating the material in a stand-alone course or module.

Abstract: Graduates of translation programmes increasingly encounter machine translation in the language industry. In response to this identified market need, translation education programmes have begun to i...

44 citations

Posted Content•

Neural machine translation for low-resource languages.

Robert Östling, Jörg Tiedemann

18 Aug 2017-arXiv: Computation and Language

TL;DR: It is found that while SMT remains the best option for low-resource settings, this method can produce acceptable translations with only 70000 tokens of training data, a level where the baseline NMT system fails completely.

Abstract: Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during translation. In addition to our novel model, we also present an empirical evaluation of low-resource phrase-based statistical machine translation (SMT) and NMT to investigate the lower limits of the respective technologies. We find that while SMT remains the best option for low-resource settings, our method can produce acceptable translations with only 70000 tokens of training data, a level where the baseline NMT system fails completely.

37 citations

Journal Article•DOI•

Arabic-English Parallel Corpus: A New Resource for Translation Training and Language Teaching

Hind M. Al-Otaibi¹•Institutions (1)

King Saud University¹

15 Oct 2017-Social Science Research Network

TL;DR: This paper describes an ongoing project to compile a 10-million-word Arabic–English parallel corpus as a resource for translation training and language teaching; the bidirectional corpus can be used to compare translated and source language and identify differences.

Abstract: Parallel corpora can be defined as collections of aligned, translated texts of two or more languages. They play a major role in translation and contrastive studies, and are also becoming popular in translation training and language teaching, with the advent of the data-driven learning (DDL) approach. Despite their significance, however, Arabic seems to lack a satisfactory general-use parallel corpus resource. The literature describes few Arabic–English parallel corpora, and these few are usually inaccurate and/or expensive. Some are small in size, while others are restricted in terms of genre, failing to meet the requirements of many academics and researchers. This paper describes an ongoing project at the College of Languages and Translation, King Saud University, to compile a 10-million-word Arabic–English parallel corpus to be used as a resource for translation training and language teaching. The bidirectional corpus can be used to compare translated and source language and identify differences. The corpus has been manually verified at different stages, including translation, text segmentation, alignment, and file preparation; it is available as full-text in XML format and through a user-friendly web interface that provides a concordancer to support bilingual search queries and several filtering options.

36 citations

Journal Article•DOI•

Why Translation Is Difficult: A Corpus-Based Study of Non-Literality in Post-Editing and From-Scratch Translation

Michael Carl¹, Moritz Schaeffer²•Institutions (2)

Renmin University of China¹, University of Mainz²

11 Oct 2017-HERMES - Journal of Language and Communication in Business

TL;DR: A definition of translation literality is developed that is based on the syntactic and semantic similarity of the source and the target texts and it is found that non-literality makes from-scratch translation and post-editing difficult.

Abstract: The paper develops a definition of translation literality that is based on the syntactic and semantic similarity of the source and the target texts. We provide theoretical and empirical evidence that absolute literal translations are easy to produce. Based on a multilingual corpus of alternative translations we investigate the effects of cross-lingual syntactic and semantic distance on translation production times and find that non-literality makes from-scratch translation and post-editing difficult. We show that statistical machine translation systems encounter even more difficulties with non-literality.

34 citations

Other•DOI•

Translation Process Research

Arnt Lykke Jakobsen

18 Feb 2017

29 citations

Proceedings Article•DOI•

On Integrating Discourse in Machine Translation

Karin Sim Smith¹•Institutions (1)

University of Glasgow¹

01 Sep 2017

TL;DR: In order to take Machine Translation to another level, it will need to judge output not based on a single reference translation, but based on notions of fluency and of adequacy – ideally with reference to the source text.

Abstract: As the quality of Machine Translation (MT) improves, research on improving discourse in automatic translations becomes more viable. This has resulted in an increase in the amount of work on discourse in MT. However, many of the existing models and metrics have yet to integrate these insights. Part of this is due to the evaluation methodology, based as it is largely on matching to a single reference. At a time when MT is increasingly being used in a pipeline for other tasks, the semantic element of the translation process needs to be properly integrated into the task. Moreover, in order to take MT to another level, it will need to judge output not based on a single reference translation, but based on notions of fluency and of adequacy – ideally with reference to the source text.

29 citations

Journal Article•DOI•

Leveraging bilingual terminology to improve machine translation in a CAT environment

Mihael Arcan¹, Marco Turchi, Sara Tonelli, Paul Buitelaar¹•Institutions (1)

National University of Ireland¹

01 Sep 2017-Natural Language Engineering

TL;DR: This work evaluates a framework that, taking as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. It compares two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system.

Abstract: This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer Aided Translation scenario. We evaluate the proposed framework that, taking as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. Therefore, we investigate several strategies to extract and align terminology across languages and to integrate it in an SMT system. We compare two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and cache-based model. We test the cache-based model on two different domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 compared to the widely-used XML markup approach.

20 citations

Journal Article•DOI•

Enhancing the communicative dimension of legal translation: comparable corpora in the research-informed classroom

Łucja Biel¹•Institutions (1)

University of Warsaw¹

31 Jul 2017-Interpreter and Translator Trainer

TL;DR: How comparable corpora may be used in the classroom to increase the communicative dimension of legal translations, an aspect which tends to be neglected in training, is demonstrated.

Abstract: The objective of this paper is to demonstrate how comparable corpora may be used in the classroom to increase the communicative dimension of legal translations, an aspect which tends to be neglecte...

Journal Article•DOI•

Segment-based interactive-predictive machine translation

Miguel Domingo¹, Álvaro Peris¹, Francisco Casacuberta¹•Institutions (1)

Polytechnic University of Valencia¹

01 Dec 2017-Machine Translation

TL;DR: This work presents one of these new interactive protocols, which allows the user to validate all correct word sequences in a translation hypothesis, and compares it against the classical prefix-based approach.

Abstract: Machine translation systems require human revision to obtain high-quality translations. Interactive methods provide an efficient human–computer collaboration, notably increasing productivity. Recently, new interactive protocols have been proposed, seeking a more effective user interaction with the system. In this work, we present one of these new protocols, which allows the user to validate all correct word sequences in a translation hypothesis. Thus, the left-to-right barrier from most of the existing protocols is broken. We compare this protocol against the classical prefix-based approach, obtaining a significant reduction of the user effort in a simulated environment. Additionally, we experiment with the use of confidence measures to select the word the user should correct at each iteration, reaching the conclusion that the order in which words are corrected does not affect the overall effort.

Proceedings Article•DOI•

Building a Non-Trivial Paraphrase Corpus Using Multiple Machine Translation Systems.

Yui Suzuki, Tomoyuki Kajiwara¹, Mamoru Komachi¹•Institutions (1)

Tokyo Metropolitan University¹

01 Jul 2017

TL;DR: This work built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs, and proposes a novel sentential paraphrase acquisition method, which focuses on acquiring both non-trivial positive and negative instances.

Abstract: We propose a novel sentential paraphrase acquisition method. To build a well-balanced corpus for Paraphrase Identification, we especially focus on acquiring both non-trivial positive and negative instances. We use multiple machine translation systems to generate positive candidates and a monolingual corpus to extract negative candidates. To collect non-trivial instances, the candidates are uniformly sampled by word overlap rate. Finally, annotators judge whether the candidates are either positive or negative. Using this method, we built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs.

Posted Content•

Machine Translation at Booking.com: Journey and Lessons Learned

Pavel Levin, Nishikant Dhanuka, Maxim Khalilov

25 Jul 2017-arXiv: Computation and Language

TL;DR: This work describes the authors' recently developed neural machine translation (NMT) system and benchmarks it against their own statistical machine translation (SMT) system as well as two other general purpose online engines (statistical and neural).

Abstract: We describe our recently developed neural machine translation (NMT) system and benchmark it against our own statistical machine translation (SMT) system as well as two other general purpose online engines (statistical and neural). We present automatic and human evaluation results of the translation output provided by each system. We also analyze the effect of sentence length on the quality of output for SMT and NMT systems.

Posted Content•

Translating Domain-Specific Expressions in Knowledge Bases with Neural Machine Translation.

Mihael Arcan, Paul Buitelaar

07 Sep 2017

TL;DR: A clear advantage of NMT methods over SMT is observed in domain adaptation and terminology injection; however, because the terminological expressions are so specific and unique, subword segmentation within NMT does not outperform a word-based neural translation model.

Abstract: Our work presented in this paper focuses on the translation of domain-specific expressions represented in semantically structured resources, like ontologies or knowledge graphs. To make knowledge accessible beyond language borders, these resources need to be translated into different languages. The challenge of translating labels or terminological expressions represented in ontologies lies in the highly specific vocabulary and the lack of contextual information, which can guide a machine translation system to translate ambiguous words into the targeted domain. Due to these challenges, we train and translate the terminological expressions in the medical and financial domain with statistical as well as with neural machine translation methods. We evaluate the translation quality of domain-specific expressions with translation systems trained on a generic dataset and experiment with domain adaptation with terminological expressions. Furthermore, we perform experiments on the injection of external knowledge into the translation systems. Through these experiments, we observed a clear advantage in domain adaptation and terminology injection of NMT methods over SMT. Nevertheless, through the specific and unique terminological expressions, subword segmentation within NMT does not outperform a word-based neural translation model.

Posted Content•

Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation

Zi Long¹, Takehito Utsuro¹, Tomoharu Mitsuhashi, Mikio Yamamoto¹•Institutions (1)

University of Tsukuba¹

14 Apr 2017-arXiv: Computation and Language

TL;DR: This paper proposes a method that enables NMT to translate patent sentences comprising a large vocabulary of technical terms, and trains an NMT system on bilingual data wherein technical terms are replaced with technical term tokens; this allows it to translate most of the source sentences except technical terms.

Abstract: Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a larger vocabulary because training complexity and decoding complexity proportionally increase with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In NMTs, words that are out of vocabulary are represented by a single unknown token. In this paper, we propose a method that enables NMT to translate patent sentences comprising a large vocabulary of technical terms. We train an NMT system on bilingual data wherein technical terms are replaced with technical term tokens; this allows it to translate most of the source sentences except technical terms. Further, we use it as a decoder to translate source sentences with technical term tokens and replace the tokens with technical term translations using SMT. We also use it to rerank the 1,000-best SMT translations on the basis of the average of the SMT score and that of the NMT rescoring of the translated sentences with technical term tokens. Our experiments on Japanese-Chinese patent sentences show that the proposed NMT system achieves a substantial improvement of up to 3.1 BLEU points and 2.3 RIBES points over traditional SMT systems and an improvement of approximately 0.6 BLEU points and 0.8 RIBES points over an equivalent NMT system without our proposed technique.
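
The term-token step described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `<TTn>` token spelling, and the `translate_term` callable (standing in for the SMT-based term translation) are all hypothetical, and the NMT translation step between masking and unmasking is omitted.

```python
from typing import Callable, Dict, Iterable, Tuple


def mask_terms(sentence: str, terms: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace known technical terms with indexed placeholder tokens.

    Longer terms are tried first so that a short term does not split a
    longer one that contains it. The "<TTn>" spelling is an assumption.
    """
    mapping: Dict[str, str] = {}
    for term in sorted(terms, key=len, reverse=True):
        if term in sentence:
            token = f"<TT{len(mapping) + 1}>"
            sentence = sentence.replace(term, token)
            mapping[token] = term
    return sentence, mapping


def unmask_terms(sentence: str, mapping: Dict[str, str],
                 translate_term: Callable[[str], str]) -> str:
    """Swap each placeholder token for the translation of its term."""
    for token, term in mapping.items():
        sentence = sentence.replace(token, translate_term(term))
    return sentence


# Hypothetical usage: upper-casing stands in for a real term translator.
masked, mapping = mask_terms(
    "the rectifier circuit feeds the bridge interface",
    ["bridge interface", "rectifier circuit"],
)
print(masked)
print(unmask_terms(masked, mapping, str.upper))
```

Masking keeps the model's vocabulary small because rare technical terms never reach it as words; only the placeholder tokens do.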

Journal Article•DOI•

Meaning preservation in Example-based Machine Translation with structural semantics

Chong Chai Chua¹, Tek Yong Lim¹, Lay-Ki Soon¹, Enya Kong Tang², Bali Ranaivo-Malançon³•Institutions (3)

Multimedia University¹, Universiti Sains Malaysia², Universiti Malaysia Sarawak³

15 Jul 2017-Expert Systems With Applications

TL;DR: An English to Malay EBMT system is presented to demonstrate the practical application of the structural semantics, which is used to support deeper semantic similarity measurement and impose structural constraints in translation examples selection.

Abstract: The main tasks in Example-based Machine Translation (EBMT) comprise source text decomposition, followed by translation example matching and selection, and finally adaptation and recombination of the target translation. As natural language is ambiguous in nature, the preservation of the source text’s meaning throughout these processes is complex and challenging. A structural semantics is introduced as an attempt at a meaning-based approach to improve the EBMT system. The structural semantics is used to support deeper semantic similarity measurement and impose structural constraints in translation example selection. A semantic compositional structure is derived from the structural semantics of the selected translation examples. This semantic compositional structure serves as a representation structure to preserve the consistency and integrity of the input sentence’s meaning structure throughout the recombination process. In this paper, an English to Malay EBMT system is presented to demonstrate the practical application of this structural semantics. Evaluation of the translation test results shows that the new translation framework based on the structural semantics has outperformed the previous EBMT framework.

Journal Article•DOI•

Translating technical terms into Arabic: Microsoft Terminology Collection (English-Arabic) as an example

Sameh Saad Hassan¹•Institutions (1)

Suez Canal University¹

21 Jul 2017-Translation & Interpreting

TL;DR: Results show that it is more appropriate to use translation and/or Arabic-expanding techniques with technical terms derived from common linguistic roots in the source language (SL) to preserve the integrity and authenticity of Arabic as a target language (TL) at a time of a marked increase in the number of SL technical terms.

Abstract: The main aim of this paper is to explore the techniques used in translating English technical terms into Arabic in the Microsoft Terminology Collection (MTC) (English-Arabic) as an example of comprehensive multilingual resources of technical terminology on the Web. MTC is a well-known online IT-glossary available on the Microsoft Language Portal in over ninety languages. It provides users with the opportunity to perform quick searches between different languages and to download files that integrate with Microsoft products and computer-assisted translation (CAT) tools. Some examples of MTC terms in Arabic are examined by the researcher to identify the kinds of translation strategies that MTC follows in order to translate technical terms into Arabic as well as the appropriateness of these strategies to their translation situations through comparison of different translations for the same SL term. The analysis of selected examples from MTC shows that in the Arabic translations of technical terms, MTC uses translation, Arabicisation, and Arabic-expanding techniques inconsistently, either in providing more than one translation for a standard technical term within the same translation situation or in using different translation strategies for similar technical terms in similar translation situations. Results show that it is more appropriate to use translation and/or Arabic-expanding techniques (mainly derivation and compounding) with technical terms derived from common linguistic roots in the source language (SL) to preserve the integrity and authenticity of Arabic as a target language (TL) at a time of a marked increase in the number of SL technical terms, while methods of Arabicisation should only be used with SL proper nouns or any word derived from them to solve problems of non-equivalence at word level between Arabic and English.

Journal Article•DOI•

Errors and non-errors in English-Arabic machine translation of gender-bound constructs in technical texts

Emad A. S. Abu-Ayyash¹•Institutions (1)

British University in Dubai¹

01 Jan 2017-Procedia Computer Science

TL;DR: The qualitative examination of the target language texts revealed that the three MT systems had errors and non-errors in rendering gender-bound constructs from English to Arabic, and that errors transpired in certain co-textual environments.

Journal Article•DOI•

A system for terminology extraction and translation equivalent detection in real time

Antoni Oliver¹•Institutions (1)

Open University of Catalonia¹

17 Oct 2017-Machine Translation

TL;DR: A system for automatic terminology extraction and automatic detection of the equivalent terms in the target language to be used alongside a computer assisted translation (CAT) tool that provides term candidates and their translations in an automatic way each time the translator goes from one segment to the next one.

Abstract: In this paper we present a system for automatic terminology extraction and automatic detection of the equivalent terms in the target language to be used alongside a computer assisted translation (CAT) tool that provides term candidates and their translations in an automatic way each time the translator goes from one segment to the next one. The system uses several sources of information: the text from the segment being translated and from the whole translation project, the translation memories assigned to the project and a translation phrase table from a statistical machine translation system. It also uses the terminological database assigned to the project in order to avoid presenting already known terms. The use of translation phrase tables allows us to use very large parallel corpora in a very efficient way. We have used Moses to calculate and to consult the translation phrase tables. The program is written in Python and it can be used with any CAT tool. In our experiments we have used OmegaT, a well-known open source CAT tool. Evaluation results for English–Spanish and for three subjects (politics, finance, and medicine) are presented.

Proceedings Article•DOI•

C-3MA: Tartu-Riga-Zurich Translation Systems for WMT17

Matīss Rikters, Chantal Amrhein, Maksym Del, Mark Fishel

01 Sep 2017

TL;DR: This paper describes the neural machine translation systems of the University of Latvia, University of Zurich and University of Tartu, which participated in the WMT 2017 shared task on news translation by building systems for two language pairs, based on an attentional encoder-decoder, using BPE subword segmentation.

Abstract: This paper describes the neural machine translation systems of the University of Latvia, University of Zurich and University of Tartu. We participated in the WMT 2017 shared task on news translation by building systems for two language pairs: English↔German and English↔Latvian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation. We experimented with backtranslating the monolingual news corpora and filtering out the best translations as additional training data, enforcing named entity translation from a dictionary of parallel named entities, penalizing over- and under-translated sentences, and combining output from multiple NMT systems with SMT. The described methods give 0.7–1.8 BLEU point improvements over our baseline systems.

Journal Article•DOI•

Teaching Specialized Translation. Error-tagged Translation Learner Corpora

Jarmila Fictumova, Kristýna Štěpánková, Adam Obrusník

07 Jul 2017-Sendebar

TL;DR: The method used in teaching specialised translation in the English Language Translation Master’s programme at Masaryk University is described, with the first results of the research examining a learner corpus of translations from Czech into English.

Abstract: This paper describes the method used in teaching specialised translation in the English Language Translation Master’s programme at Masaryk University. After a brief description of the courses, the focus shifts to translation learner corpora (TLC) compiled in the new Hypal interface, which can be integrated in Moodle. Student translations are automatically aligned (with possible adjustments), PoS (part-of-speech) tagged, and manually error-tagged. Personal student reports based on error statistics for individual translations to show students’ progress throughout the term or during their studies in the four-semester programme can be easily generated. Using the data from the pilot run of the new software, the paper concludes with the first results of the research examining a learner corpus of translations from Czech into English.

Journal Article•DOI•

Translator Attitudes towards Translator-Computer Interaction - Findings from a Workplace Study

Kristine Bundgaard¹•Institutions (1)

Aalborg University¹

11 Oct 2017-HERMES - Journal of Language and Communication in Business

TL;DR: This workplace study finds that translators seem to have a flexible and pragmatic attitude towards TCI, adapting to the tool's imperfections and accommodating its resistances.

Abstract: Today technology is part and parcel of professional translation, and translation has therefore been characterised as Translator-Computer Interaction (TCI) (O’Brien 2012). Translation is increasingly carried out using Translation Memory (TM) systems which incorporate machine translation (MT), referred to as MT-assisted TM translation, and in this type of tool, translators switch between editing TM matches and post-editing MT matches. It is generally assumed that translators’ attitudes towards technology impact on this interaction with the technology. Drawing on Eagly/Chaiken’s (1995) definition of attitudes as evaluations of entities with favour or disfavour and on qualitative data from a workplace study of TCI, conducted as part of a PhD dissertation (Bundgaard 2017) and partly reported on in Bundgaard et al. (2016), this paper explores translator attitudes towards TCI in the form of MT-assisted TM translation. In doing so, the paper has a particular focus on the disfavour towards TCI expressed by translators. Moreover, inspired by Olohan (2011), who applies Pickering’s “mangle of practice” theory and analyses resistance and accommodation in TCI, the paper focuses on how translators accommodate resistances offered by the tool. The study shows that the translators express disfavour towards MT in many respects, but also acknowledge positive aspects of the technology and expect MT to play a significant role in their future working lives. The translators do not make many positive or negative comments about TM which might indicate that TM is a completely integrated part of their processes. The translators seem to have a flexible and pragmatic attitude towards TCI, adapting to the tool’s imperfections and accommodating its resistances.

Journal Article•DOI•

Productivity and quality when editing machine translation and translation memory outputs: an empirical analysis of English to Welsh translation

Benjamin Screen¹•Institutions (1)

Cardiff University¹

26 Sep 2017

TL;DR: In a controlled study of English-to-Welsh translation, Fuzzy Matches in the 70-99% range, Exact Matches and Statistical Machine Translation significantly sped up the translation process without lowering quality compared to manual translation.

Abstract: This article reports on a controlled study carried out to examine the possible benefits of editing Machine Translation and Translation Memory outputs when translating from English to Welsh. Using software capable of timing the translation process per segment, 8 professional translators each translated 75 sentences of differing match percentage, and post-edited a further 25 segments of Machine Translation. Basing the final analysis on 800 sentences and 17,440 words, the use of Fuzzy Matches in the 70-99% match range, Exact Matches and Statistical Machine Translation was found to significantly speed up the translation process. Significant correlations were also found between the processing time data of Exact Matches and Machine Translation post-editing, rather than between Fuzzy Matches and Machine Translation as expected. Two experienced translators were then asked to rate all translations for fidelity, grammaticality and style, whereby it was found that the use of translation technology either did not negatively affect translation quality compared to manual translation, or its use actually improved final quality in some cases. As well as confirming the findings of research in relation to translation technology, these findings also contradict supposed similarities between translation quality in terms of style and post-editing Machine Translation.

Posted Content•

Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

Gyu-Hyeon Choi, Shin Jong Hun¹, Kim Young Kil¹•Institutions (1)

Electronics and Telecommunications Research Institute¹

26 Sep 2017-arXiv: Computation and Language

TL;DR: This paper proposes synthetic methods for extending a low-resource corpus, applies them to a multi-source neural machine translation model, and shows that corpus extension improves machine translation performance.

Abstract: In machine translation, we often try to collect resources to improve performance. However, most of the language pairs, such as Korean-Arabic and Korean-Vietnamese, do not have enough resources to train machine translation systems. In this paper, we propose the use of synthetic methods for extending a low-resource corpus and apply it to a multi-source neural machine translation model. We showed the improvement of machine translation performance through corpus extension using the synthetic method. We specifically focused on how to create source sentences that can make better target sentences, including the use of synthetic methods. We found that the corpus extension could also improve the performance of multi-source neural machine translation. We showed the corpus extension and multi-source model to be efficient methods for a low-resource language pair. Furthermore, when both methods were used together, we found better machine translation performance.

The Translator's Amanuensis 2020

Elisa Alonso¹, Lucas Nunes Vieira•Institutions (1)

Pablo de Olavide University¹

13 Jul 2017

TL;DR: It is argued that the Translator’s Amanuensis 2020 could benefit from existing Translation Studies concepts: the study of translation problems, translation competence models, and the ethics and sociology of translation.

Abstract: This paper is an exercise of imagination. Based on Kay’s (1980) inspiring idea of a translator’s amanuensis, we attempt to describe a post-editing tool that enables ubiquitous translation (Cronin 2010). We argue that a parallelism exists between media remediation (Bolter and Grusin 1999) and the shifting phase translation is undergoing, with machine translation post-editing having an impact on the global workflow of translated content. We take the hybridisation of traditional and machine translation processes as a starting point to envisage the features of forthcoming translation technologies. Results of previous surveys helped us to select features expected to play a central role: versatile devices to which we broadly refer as displayers would enable ubiquity; a relevant knowledge feature would provide human translators with a well-assorted repertoire of reliable sources; and an effort prediction feature would provide post-editors with reliable estimates of how much work lay ahead. Interacting with the Translator’s Amanuensis 2020 would not always be straightforward, however. Translators will have to adapt to richer ways of reading and visualising information. Ultimately, we argue that the Translator’s Amanuensis 2020 could benefit from existing Translation Studies concepts: the study of translation problems, translation competence models, and the ethics and sociology of translation.

Journal Article•DOI•

Problems of machine translation of business texts from Russian into English

A. V. Novikova¹, L. A. Mylnikov¹•Institutions (1)

Perm National Research Polytechnic University¹

19 Aug 2017-Automatic Documentation and Mathematical Linguistics

TL;DR: An integrated functional approach to translating business texts is suggested on the basis of analyzing semantic and morphological features of actual text content and also on axiological and epistemic semantic features that bring to light subjective modality.

Abstract: This article draws on the example of business texts to consider practical aspects of the distortion of meaning in translation from one language to another in the available machine translation (MT) systems and their underlying approach based on word-by-word translation. An integrated functional approach to translating business texts is suggested on the basis of analyzing semantic and morphological features of actual text content and also on axiological and epistemic semantic features that bring to light subjective modality. The suggested technique is used to develop an algorithm of business text MT that makes it possible to resolve the word-by-word translation issue and conveys the meanings of short texts. Cases of testing the suggested technique and the derived algorithm are considered for the Russian–English language pair.

Journal Article•DOI•

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

Francis M. Tyers¹, Hèctor Alòs i Font², Gianfranco Fronteddu³, Adrià Martín-Mor⁴•Institutions (4)

University of Tartu¹, University of Barcelona², University of Cagliari³, Autonomous University of Barcelona⁴

01 Jun 2017-The Prague Bulletin of Mathematical Linguistics

TL;DR: The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

Abstract: This paper describes the process of creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

Journal Article•DOI•

Approaches for Improving Hindi to English Machine Translation System

Rajesh Kumar Chakrawarti¹, Pratosh Bansal²•Institutions (2)

Devi Ahilya Vishwavidyalaya¹, Information Technology Institute²

25 Apr 2017-Indian journal of science and technology

TL;DR: The paper addresses the challenges of MT and the solution efforts made in this direction, and provides approaches for effective Hindi-to-English Machine Translation that can support inexpensive and easy implementation of MT systems.

Abstract: Objectives: To provide approaches for effective Hindi-to-English Machine Translation (MT) that can support inexpensive and easy implementation of MT systems. Methods/Statistical Analysis: The structure of the Hindi and English languages has been studied thoroughly. The possible steps towards natural languages have also been studied. The methods, rules, approaches, tools, resources etc. related to MT have been discussed in detail. Findings: MT is the automatic translation of a language. India is a country full of diversity in culture and languages. More than 20 regional languages are spoken, along with several dialects. Hindi is widely spoken in all the states of the country. A great deal of literature, poetry and other valuable text is available in Hindi, which creates opportunities for translation into English. However, the new generation is learning English rapidly and is also keen to learn it in a simplified, lucid manner. Several efforts have been made in this direction. A large number of approaches and solutions exist for MT, yet there is still considerable scope for further work. The paper addresses the challenges of MT and the solution efforts made in this direction. This motivates researchers to implement new Hindi-to-English Machine Translation systems. Application/Improvements: Efficient, inexpensive and easy translation of available Hindi literature, poetry and other valuable texts into English. Children can easily learn the culture through poetry and literature, hence Machine Translation of these texts will have a valuable impact.

Dissertation•

ClipFlair, Audiovisual Translation and Computer Assisted Language Learning: beyond the four-walled classroom

Alejandro Ros Abaurrea

01 Aug 2017