Generating titles for millions of browse pages on an e-Commerce site

doi:10.18653/V1/W17-3525

Open Access Proceedings Article

Prashant Mathur, +2 more

- pp 158-167

TLDR

An automatic post-editing approach is presented that learns how to post-edit rule-based titles into curated titles for browse pages in five languages: English, German, French, Italian and Spanish.

Abstract:

We present two approaches to generate titles for browse pages in five different languages, namely English, German, French, Italian and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generate these browse page titles. In addition, we also present a hybrid approach which uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For the two languages English and German we have access to a large amount of already available rule-based generated and curated titles. For these languages we present an automatic post-editing approach which learns how to post-edit the rule-based titles into curated titles.
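
To illustrate the rule-based step only, here is a minimal sketch, assuming a browse page is described by a category plus a set of slot/value pairs. The slot names, their ordering, and the function `rule_based_title` are hypothetical: the abstract does not specify the actual rules, and rules for languages such as German or French would differ.

```python
from typing import Dict, List

# Hypothetical slot order (an assumption for illustration, not from the paper).
SLOT_ORDER: List[str] = ["brand", "color", "material"]


def rule_based_title(category: str, slots: Dict[str, str]) -> str:
    """Join the known slot values in a fixed order, then append the category."""
    parts = [slots[name] for name in SLOT_ORDER if name in slots]
    parts.append(category)
    return " ".join(parts)


# Example: a browse page for the category "Backpacks" filtered by two slots.
title = rule_based_title("Backpacks", {"color": "Red", "brand": "Acme"})
print(title)
```

In the hybrid and post-editing setups described above, a title produced this way would serve as the input that a statistical system then rewrites toward a curated title.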

Citations

Proceedings Article

eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

Matteo Negri, +3 more

TL;DR: eSCAPE, the Synthetic Corpus for Automatic Post-Editing, is the largest freely available synthetic corpus for automatic post-editing. It consists of millions of entries in which the MT element of the training triplets was obtained by translating the source side of publicly available parallel corpora, with the target side used as an artificial human post-edit.

Posted Content

eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

Matteo Negri, +3 more

- 20 Mar 2018 -

arXiv: Computation and Language

TL;DR: This paper presents eSCAPE, the largest freely available Synthetic Corpus for Automatic Post-Editing released so far. It consists of millions of entries in which the MT element of the training triplets was obtained by translating the source side of publicly available parallel corpora, with the target side used as an artificial human post-edit.

Proceedings Article

Generating Titles for Web Tables

Braden Hancock, +2 more

TL;DR: The proposed technique is the first to consider text-generation methods for table titles, and establishes a new state of the art for generating high-quality table titles using a sequence-to-sequence neural network model.

Proceedings Article

Generating E-Commerce Product Titles and Predicting their Quality

José G. C. de Souza, +7 more

TL;DR: This work proposes approaches that automatically generate product titles and predict their quality. Both automatic and human evaluations performed on real-world data indicate that these approaches are effective and applicable to existing e-commerce scenarios.

Posted Content

Generating Titles for Web Tables

Braden Hancock, +2 more

- 30 Jun 2018 -

arXiv: Computation and Language

TL;DR: A sequence-to-sequence neural network model is proposed to generate titles for web tables. The approach takes text snippets that have potentially relevant information to the table, encodes them into an input sequence, and uses both copy and generation mechanisms in the decoder to balance relevance and readability of the generated title.

References

Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.

Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Sep 2014 -

arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Proceedings Article

KenLM: Faster and Smaller Language Model Queries

Kenneth Heafield

TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs. It is integrated into the Moses, cdec, and Joshua translation systems.
