Open Access Proceedings Article

Tailoring entity resolution for matching product offers

TLDR
This work proposes a new approach to extract and use so-called product codes to identify products and distinguish them from similar product variations and shows that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
Abstract
Product matching is a challenging variation of entity resolution to identify representations and offers referring to the same product. Product matching is highly difficult due to the broad spectrum of products, many similar but different products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We propose the use of tailored approaches for product matching based on a preprocessing of product offers to extract and clean new attributes usable for matching. In particular, we propose a new approach to extract and use so-called product codes to identify products and distinguish them from similar product variations. We evaluate the effectiveness of the proposed approaches with challenging real-life datasets with product offers from online shops. We also show that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
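The abstract does not spell out how product codes are extracted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that a product code is a title token of at least four characters that mixes letters and digits, and that hyphens inside a code can be ignored when comparing. The function names `extract_product_codes` and `share_product_code` and the `min_length` parameter are hypothetical.

```python
import re
from typing import List

# Alphanumeric tokens, optionally joined by single hyphens (e.g. "DSC-HX100V").
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_product_codes(title: str, min_length: int = 4) -> List[str]:
    """Return normalized code candidates found in a product title.

    Assumed heuristic (not taken from the paper): a candidate mixes
    letters and digits and has at least `min_length` characters once
    hyphens are removed.
    """
    codes: List[str] = []
    for token in TOKEN.findall(title):
        # Normalize so that "DSC-HX100V" and "dschx100v" compare equal.
        compact = token.replace("-", "").upper()
        has_letter = any(c.isalpha() for c in compact)
        has_digit = any(c.isdigit() for c in compact)
        if has_letter and has_digit and len(compact) >= min_length:
            if compact not in codes:
                codes.append(compact)
    return codes


def share_product_code(title_a: str, title_b: str) -> bool:
    """True if the two offer titles have at least one code candidate in common."""
    codes_a = set(extract_product_codes(title_a))
    codes_b = set(extract_product_codes(title_b))
    return bool(codes_a & codes_b)
```

Under these assumptions, two offers titled "Sony Cyber-shot DSC-HX100V 16.2 MP" and "Sony DSCHX100V digital camera black" share a code candidate, while an offer for the similar "DSC-HX9V" variant does not. A real pipeline would still need to filter out false candidates such as capacities or resolutions written as "16GB" or "1080p".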

Citations
Proceedings Article

Deep Learning for Entity Matching: A Design Space Exploration

TL;DR: The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Book Chapter

The Case for Holistic Data Integration

TL;DR: This work outlines different use cases and provides an overview of initial approaches for holistic schema/ontology integration and entity clustering, and considers open data repositories and so-called knowledge graphs.
Proceedings Article

Matching product titles using web-based enrichment

TL;DR: A novel unsupervised matching algorithm is proposed that leverages web search engines to enrich product titles by adding important missing tokens that occur frequently in search results, and computes importance scores for tokens based on their ability to retrieve other (enriched title) tokens in search results.
Journal Article

A machine learning approach for product matching and categorization

TL;DR: This paper uses neural language models to produce word embeddings from large quantities of publicly available product data marked up with Microdata, which boost the performance of the feature extraction model, thus leading to better product matching and categorization performances.
Proceedings Article

Integrating product data from websites offering microdata markup

TL;DR: This paper discusses the challenges that arise in the task of integrating descriptions of electronic products from several thousand e-shops that offer Microdata markup and presents a solution for each step of the data integration process including Microdata extraction, product classification, product feature extraction, identity resolution, and data fusion.
References
Journal Article

Duplicate Record Detection: A Survey

TL;DR: This paper presents an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database and covers similarity metrics that are commonly used to detect similar field entries.
Proceedings Article

A comparison of string distance metrics for name-matching tasks

TL;DR: Using an open-source, Java toolkit of name-matching methods, the authors experimentally compare string distance metrics on the task of matching entity names and find that the best performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme.
Journal Article

Evaluation of entity resolution approaches on real-world match problems

TL;DR: It is found that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.
Journal Article

Frameworks for entity matching: A comparison

TL;DR: This paper comparatively analyzes 11 proposed frameworks for entity matching, covering both frameworks that use training data to semi-automatically find an entity matching strategy for a given match task and frameworks that do not.