OpenTag: Open Attribute Value Extraction from Product Profiles
Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li
pp. 1049–1058
TLDR
OpenTag leverages product profile information such as titles and descriptions to discover missing values of product attributes, and proposes a novel sampling strategy based on active learning to reduce the burden of human annotation.
Abstract
Extraction of missing attribute values is the task of finding values describing an attribute of interest in free-text input. Most prior work on this problem operates under a closed-world assumption, with the possible set of values known beforehand, or relies on dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do this with limited human annotation or supervision? We study this problem in the context of product catalogs, which often have missing values for many attributes of interest. In this work, we leverage product profile information such as titles and descriptions to discover missing values of product attributes. We develop a novel deep tagging model, OpenTag, for this extraction problem with the following contributions: (1) we formalize the problem as a sequence-tagging task and propose a joint model that exploits recurrent neural networks (specifically, bidirectional LSTM) to capture context and semantics, and Conditional Random Fields (CRF) to enforce tagging consistency; (2) we develop a novel attention mechanism to provide interpretable explanations for our model's decisions; (3) we propose a novel sampling strategy based on active learning to reduce the burden of human annotation. OpenTag does not use any dictionary or hand-crafted features as in prior work. Extensive experiments on real-life datasets in different domains show that OpenTag with our active learning strategy discovers new attribute values from as few as 150 annotated samples (a 3.3x reduction in annotation effort) with a high F-score of 83%, outperforming state-of-the-art models.
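The sequence-tagging formulation in contribution (1) can be illustrated in pure Python. The sketch below uses a hypothetical `bio_tag` helper (not from the paper) that converts a known attribute value into {B, I, O} labels over the tokens of a product title; a BiLSTM-CRF model such as OpenTag's would then be trained on the resulting (token, tag) sequences.

```python
def bio_tag(title_tokens, value_tokens):
    """Label the tokens of a product title with B/I/O tags marking one
    attribute-value span (e.g. a flavor). Hypothetical illustration of
    the tagging formulation, not the paper's exact preprocessing."""
    tags = ["O"] * len(title_tokens)  # default: token is Outside any value
    n = len(value_tokens)
    # Scan the title for an exact token-level match of the value.
    for i in range(len(title_tokens) - n + 1):
        if title_tokens[i:i + n] == value_tokens:
            tags[i] = "B"              # Beginning of the value span
            for j in range(i + 1, i + n):
                tags[j] = "I"          # Inside the value span
    return tags

title = "duck flavor dog treats".split()
print(bio_tag(title, ["duck", "flavor"]))
# ['B', 'I', 'O', 'O']
```

Under this encoding, discovering a *new* attribute value reduces to the tagger predicting a B/I span over tokens it has never seen labeled before, which is why no closed value dictionary is required.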
Citations
Posted Content
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
TL;DR: In this article, the authors survey fundamental concepts and practical methods for creating and curating large-scale knowledge bases, including methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies.
Proceedings ArticleDOI
Challenges and Innovations in Building a Product Knowledge Graph
TL;DR: Three advanced extraction technologies are developed to harvest product knowledge from semi-structured sources on the web and from textual product profiles; the OpenTag technique extends state-of-the-art methods such as Recurrent Neural Networks and Conditional Random Fields with attention and active learning.
Journal ArticleDOI
CASIE: Extracting Cybersecurity Event Information from Text
TL;DR: CASIE is a system that extracts information about cybersecurity events from text and populates a semantic model that can incorporate rich linguistic features and word embeddings and shows that each subsystem performs well in the event detection pipeline.
Proceedings ArticleDOI
Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title
TL;DR: A novel approach to support value extraction scaling up to thousands of attributes without losing performance, and explicitly model the semantic representations for attribute and title, and develop an attention mechanism to capture the interactive semantic relations in-between to enforce the framework to be attribute comprehensive.
Proceedings ArticleDOI
AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types
Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han
TL;DR: AutoKnow, an automatic (self-driving) system, addresses the challenges of organizing information about products: sparsity and noise of structured product data, the complexity of a domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, and a large and constantly growing number of products.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Proceedings ArticleDOI
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.