
Han Li

Researcher at Amazon.com

Publications -  21
Citations -  922

Han Li is an academic researcher at Amazon.com. The author has contributed to research in the topics of data management and natural language understanding. The author has an h-index of 8 and has co-authored 19 publications receiving 615 citations. Previous affiliations of Han Li include the University of Wisconsin-Madison.

Papers
Proceedings Article

Deep Learning for Entity Matching: A Design Space Exploration

TL;DR: The results show that deep learning (DL) does not outperform current solutions on structured entity matching (EM), but it can significantly outperform them on textual and dirty EM, which suggests that practitioners should seriously consider using DL for textual and dirty EM problems.
Journal Article

Magellan: toward building entity matching management systems

TL;DR: Magellan is novel in four important aspects: it provides how-to guides that tell users what to do in each EM scenario, step by step, and provides tools to help users do these steps; the tools seek to cover the entire EM pipeline, not just matching and blocking as current EM systems do.
Journal Article

Magellan: toward building entity matching management systems over data science stacks

TL;DR: This paper discusses the limitations of current EM systems, presents Magellan, a new kind of EM system that addresses these limitations, and proposes demonstration scenarios that show the promise of the Magellan approach.
Proceedings Article

Inferring air pollution by sniffing social media

TL;DR: A series of progressively more sophisticated machine learning models is proposed, culminating in a Markov Random Field model that uses the text content of social media posts, together with the spatiotemporal correlation among cities and days, to estimate the air quality index (AQI).
Proceedings Article

Human-in-the-Loop Challenges for Entity Matching: A Midterm Report

TL;DR: This paper shows how the challenges of EM forced the authors to revise their solution architecture, from a typical RDBMS-style architecture to a highly human-centric one, in which human users are first-class objects driving the EM process, using tools at pain points.