Han Li
Researcher at Amazon.com
Publications - 21
Citations - 922
Han Li is an academic researcher from Amazon.com. The author has contributed to research in topics: Data management & Natural language understanding. The author has an h-index of 8, co-authored 19 publications receiving 615 citations. Previous affiliations of Han Li include University of Wisconsin-Madison.
Papers
Proceedings ArticleDOI
Deep Learning for Entity Matching: A Design Space Exploration
Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra
TL;DR: The results show that DL does not outperform current solutions on structured EM, but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Journal ArticleDOI
Magellan: toward building entity matching management systems
Pradap Konda, Sanjib Das, G C Paul Suganthan, AnHai Doan, Adel Ardalan, Jeff Ballard, Han Li, Fatemah Panahi, Haojun Zhang, Jeffrey F. Naughton, Shishir Prasad, Ganesh Krishnan, Rohit Deep, Vijay Raghavendra
TL;DR: Magellan is novel in four important aspects: it provides how-to guides that tell users what to do in each EM scenario, step by step, and provides tools to help users do these steps; the tools seek to cover the entire EM pipeline, not just matching and blocking as current EM systems do.
Journal ArticleDOI
Magellan: toward building entity matching management systems over data science stacks
Pradap Konda, Sanjib Das, G C Paul Suganthan, AnHai Doan, Adel Ardalan, Jeff Ballard, Han Li, Fatemah Panahi, Haojun Zhang, Jeffrey F. Naughton, Shishir Prasad, Ganesh Krishnan, Rohit Deep, Vijay Raghavendra
TL;DR: This paper discusses the limitations of current EM systems, presents Magellan, a new kind of EM system that addresses these limitations, and proposes demonstration scenarios that show the promise of the Magellan approach.
Proceedings ArticleDOI
Inferring air pollution by sniffing social media
TL;DR: A series of progressively more sophisticated machine learning models are proposed, culminating in a Markov Random Field model that utilizes the text content in social media as well as the spatiotemporal correlation among cities and days to estimate AQI from social media posts.
Proceedings ArticleDOI
Human-in-the-Loop Challenges for Entity Matching: A Midterm Report
AnHai Doan, Adel Ardalan, Jeff Ballard, Sanjib Das, Yash Govind, Pradap Konda, Han Li, Sidharth Mudgal, Paulson Erik S, G C Paul Suganthan, Haojun Zhang
TL;DR: This paper shows how the challenges of EM forced the authors to revise their solution architecture, from a typical RDBMS-style architecture to a very human-centric one, in which human users are first-class objects driving the EM process, using tools at pain-point places.