Yeye He
Researcher at Microsoft
Publications - 65
Citations - 1418
Yeye He is an academic researcher at Microsoft. The author has contributed to research on topics including Set (abstract data type) and Table (database). The author has an h-index of 19 and has co-authored 59 publications receiving 1178 citations. Previous affiliations of Yeye He include University of Wisconsin-Madison.
Papers
Journal ArticleDOI
Anonymization of set-valued data via top-down, local generalization
Yeye He,Jeffrey F. Naughton +1 more
TL;DR: A top-down, partition-based approach to anonymizing set-valued data that scales linearly with the input size and scores well on an information-loss data quality metric is proposed.
Proceedings ArticleDOI
SEISA: set expansion by iterative similarity aggregation
TL;DR: A new general framework based on iterative similarity aggregation is proposed, and results are presented to show that, when using general-purpose web data for set expansion, this approach outperforms previous techniques in terms of both precision and recall.
Journal ArticleDOI
ClusterJoin: a similarity joins framework using map-reduce
TL;DR: A ClusterJoin framework is proposed that partitions the data space based on the underlying data distribution and distributes each record to the partitions in which it may produce join results based on the distance threshold; a dynamic load-balancing scheme using sampling provides strong probabilistic guarantees on the size of partitions and greatly improves scalability.
Proceedings ArticleDOI
Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning
Chen Zhao,Yeye He +1 more
TL;DR: This work proposes a transfer-learning approach to EM, leveraging pre-trained EM models from large-scale, production knowledge bases (KB), and suggests that the pre-trained approach is effective and outperforms existing EM methods.
Proceedings ArticleDOI
Crawling deep web entity pages
TL;DR: This work describes a prototype system that specializes in crawling entity-oriented deep-web sites and proposes techniques tailored to important subproblems, including query generation, empty page filtering, and URL deduplication, in the specific context of entity-oriented deep-web sites.