Yeye He

Researcher at Microsoft

Publications - 65
Citations - 1418

Yeye He is an academic researcher at Microsoft. The author has contributed to research in topics: Set (abstract data type) & Table (database). The author has an h-index of 19 and has co-authored 59 publications receiving 1178 citations. Previous affiliations of Yeye He include University of Wisconsin-Madison.

Papers
Journal Article

Anonymization of set-valued data via top-down, local generalization

TL;DR: A top-down, partition-based approach to anonymizing set-valued data that scales linearly with the input size and scores well on an information-loss data quality metric is proposed.
Proceedings Article

SEISA: set expansion by iterative similarity aggregation

TL;DR: A new general framework based on iterative similarity aggregation is proposed, and results are presented to show that, when using general-purpose web data for set expansion, this approach outperforms previous techniques in terms of both precision and recall.
Journal Article

ClusterJoin: a similarity joins framework using map-reduce

TL;DR: ClusterJoin is a framework that partitions the data space based on the underlying data distribution and, based on the distance threshold, distributes each record to the partitions in which it may produce join results. The work also develops a dynamic load-balancing scheme using sampling, which provides strong probabilistic guarantees on the size of partitions and greatly improves scalability.
Proceedings Article

Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning

TL;DR: This work proposes a transfer-learning approach to EM, leveraging pre-trained EM models from large-scale, production knowledge bases (KBs), and suggests that the pre-trained approach is effective and outperforms existing EM methods.
Proceedings Article

Crawling deep web entity pages

TL;DR: This work describes a prototype system that specializes in crawling entity-oriented deep-web sites and proposes techniques tailored to important subproblems, including query generation, empty-page filtering, and URL deduplication, in the specific context of entity-oriented deep-web sites.