Generating titles for millions of browse pages on an e-Commerce site
Prashant Mathur, Nicola Ueffing, Gregor Leusch +2 more
- pp 158-167
TLDR
An automatic post-editing approach is presented that learns how to post-edit rule-based titles into curated titles for browse pages; titles are generated in five languages, namely English, German, French, Italian and Spanish.
Abstract:
We present two approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German, we have access to a large amount of already available rule-based generated and curated titles. For these languages we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.
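As a rough illustration of what a rule-based title generator for structured browse pages might look like, the following minimal sketch assembles a title from slot/value pairs in a fixed order. The slot names (`color`, `brand`, `category`), the ordering rule, and the function name `rule_based_title` are all hypothetical and chosen only for illustration; the abstract does not specify the actual rules used.

```python
from typing import Dict, List

# Assumed slot ordering for an English title; purely illustrative,
# not taken from the paper.
SLOT_ORDER: List[str] = ["color", "brand", "category"]


def rule_based_title(slots: Dict[str, str], order: List[str] = SLOT_ORDER) -> str:
    """Concatenate the values of known slots in a fixed, language-specific order.

    Slots that are missing from the browse page are skipped; slots that are
    not listed in `order` are ignored by this simple rule.
    """
    parts = [slots[name] for name in order if name in slots]
    return " ".join(parts)


if __name__ == "__main__":
    page = {"category": "Sneakers", "brand": "Acme", "color": "Red"}
    print(rule_based_title(page))
```

In the automatic post-editing setting described in the abstract, the output of such a generator would play the role of the source text, and the corresponding curated title would be the target that the system learns to produce.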
Citations
Proceedings Article
eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
TL;DR: eSCAPE, as discussed by the authors, is the largest freely-available synthetic corpus for automatic post-editing, consisting of millions of entries in which the MT element of the training triplets was obtained by translating the source side of publicly available parallel corpora, using the target side as an artificial human post-edit.
Posted Content
eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
TL;DR: eSCAPE is presented, the largest freely-available Synthetic Corpus for Automatic Post-Editing released so far, and consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly-available parallel corpora, and using the target side as an artificial human post-edit.
Proceedings ArticleDOI
Generating Titles for Web Tables
TL;DR: The proposed technique is the first to consider text-generation methods for table titles, and establishes a new state of the art for generating high-quality table titles using a sequence-to-sequence neural network model.
Proceedings ArticleDOI
Generating E-Commerce Product Titles and Predicting their Quality
José G. C. de Souza, Michael Kozielski, Prashant Mathur, Ernie Chang, Marco Guerini, Matteo Negri, Marco Turchi, Evgeny Matusov +7 more
TL;DR: This work proposes approaches that automatically generate product titles and predict their quality; both automatic and human evaluations performed on real-world data indicate that these approaches are effective and applicable to existing e-commerce scenarios.
Posted Content
Generating Titles for Web Tables
TL;DR: In this article, a sequence-to-sequence neural network model is proposed to generate titles for web tables. The approach identifies text snippets that have potentially relevant information to the table, encodes them into an input sequence, and uses both copy and generation mechanisms in the decoder to balance relevance and readability of the generated title.
References
Proceedings ArticleDOI
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings ArticleDOI
Moses: Open Source Toolkit for Statistical Machine Translation
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, C. Corbett Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Elena Constantin, Evan Herbst +13 more
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Proceedings Article
KenLM: Faster and Smaller Language Model Queries
TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs and is integrated into the Moses, cdec, and Joshua translation systems.