
Yeye He

Researcher at Microsoft

Publications: 65
Citations: 1418

Yeye He is an academic researcher at Microsoft. The author has contributed to research on the topics Set (abstract data type) and Table (database). The author has an h-index of 19 and has co-authored 59 publications receiving 1178 citations. Previous affiliations of Yeye He include the University of Wisconsin-Madison.

Papers
Proceedings Article

Concept Expansion Using Web Tables

TL;DR: Develops novel probabilistic ranking methods that model a new type of table-entity relationship and are significantly more effective than state-of-the-art set expansion or holistic ranking techniques.
Journal Article

Keyword++: a framework to improve keyword search over entity databases

TL;DR: Proposes a general framework that improves an existing search interface by translating a keyword query into a structured query, using the keyword-to-attribute-value associations discovered in the results returned by the original search interface.
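The Keyword++ summary describes turning a keyword into a structured predicate by mining keyword-to-attribute-value associations from the original interface's results. A minimal sketch of that idea follows, assuming a toy laptop table, a substring match as a stand-in for the original search interface, and a simple lift score (a value's share among the keyword's results minus its share in the whole table). The table, function names, and score are hypothetical illustrations, not the paper's actual measure.

```python
from collections import Counter

# Hypothetical toy entity table (not data from the paper).
LAPTOPS = [
    {"model": "A1", "screen": "13in", "desc": "ultraportable aluminum"},
    {"model": "A2", "screen": "13in", "desc": "ultraportable carbon"},
    {"model": "A3", "screen": "13in", "desc": "long battery"},
    {"model": "B1", "screen": "15in", "desc": "workstation"},
    {"model": "B2", "screen": "17in", "desc": "gaming"},
]

def baseline_search(keyword, rows, text_field="desc"):
    """Stand-in for the existing search interface: plain keyword match on one text field."""
    return [r for r in rows if keyword.lower() in r[text_field].lower()]

def keyword_to_predicate(keyword, rows, attribute, min_lift=0.2):
    """Map a keyword to (attribute, value) when that value is over-represented in the
    baseline results relative to the whole table. The lift score is an assumed
    simplification, not the paper's exact association measure."""
    hits = baseline_search(keyword, rows)
    if not hits:
        return None
    in_hits = Counter(r[attribute] for r in hits)
    overall = Counter(r[attribute] for r in rows)
    best_value, best_lift = None, float("-inf")
    for value, total in overall.items():
        lift = in_hits[value] / len(hits) - total / len(rows)
        if lift > best_lift:
            best_value, best_lift = value, lift
    return (attribute, best_value) if best_lift >= min_lift else None

def run_structured(rows, predicate):
    """Evaluate the translated structured query as an equality filter."""
    attribute, value = predicate
    return [r["model"] for r in rows if r[attribute] == value]

if __name__ == "__main__":
    pred = keyword_to_predicate("ultraportable", LAPTOPS, "screen")
    print(pred)
    print(run_structured(LAPTOPS, pred))
```

Here the keyword "ultraportable" should map to `("screen", "13in")`, and the structured query then also returns model A3, whose description never mentions the keyword; that recall gain is the kind of improvement the framework targets.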
Journal Article

Auto-join: joining tables by leveraging transformations

TL;DR: This work develops Auto-Join, a system that automatically searches a rich space of operators to compose a transformation program whose execution makes input tables equi-joinable. It also develops an optimal sampling strategy that lets Auto-Join scale efficiently to large datasets while ensuring that joins succeed with high probability.
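To make "composing a transformation program that makes tables equi-joinable" concrete, here is a brute-force sketch under stated assumptions: a tiny hypothetical operator library (lowercase, strip, split-and-take-part), programs of at most two operators, and the fraction of left keys that match some right key as the score. None of these names or choices come from Auto-Join itself, and its sampling strategy is not reproduced.

```python
from itertools import product

def split_op(delim, idx):
    """Hypothetical operator: split on `delim` and keep part `idx` (None if absent)."""
    def op(s):
        parts = s.split(delim)
        return parts[idx] if -len(parts) <= idx < len(parts) else None
    return (f"split({delim!r})[{idx}]", op)

# Hypothetical operator library of (name, function) pairs over strings.
OPERATORS = [("lower", str.lower), ("strip", str.strip)] + [
    split_op(d, i) for d in "@, -/" for i in (0, 1, -1)
]

def apply_program(program, value):
    """Run each operator in order; stop early if a step failed (returned None)."""
    for _, op in program:
        if value is None:
            return None
        value = op(value)
    return value

def join_score(program, left_keys, right_keys):
    """Fraction of left keys that, once transformed, equal some right key."""
    targets = set(right_keys)
    return sum(apply_program(program, k) in targets for k in left_keys) / len(left_keys)

def search_program(left_keys, right_keys, max_len=2):
    """Exhaustively try operator sequences up to max_len and keep the best scorer."""
    best, best_score = (), join_score((), left_keys, right_keys)
    for length in range(1, max_len + 1):
        for program in product(OPERATORS, repeat=length):
            score = join_score(program, left_keys, right_keys)
            if score > best_score:  # strict '>' keeps the shorter program on ties
                best, best_score = program, score
    return [name for name, _ in best], best_score

if __name__ == "__main__":
    emails = ["JDoe@contoso.com", "ASmith@contoso.com", "BLee@fabrikam.com"]
    user_ids = ["jdoe", "asmith", "blee"]
    print(search_program(emails, user_ids))
```

On these sample columns the search should settle on lowercasing followed by keeping the text before `@`, which makes all three keys equi-join. Exhaustive enumeration grows exponentially with program length, which is why a real system needs a smarter search and, per the summary above, sampling to scale.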
Proceedings Article

Preventing equivalence attacks in updated, anonymized data

TL;DR: To deal with the equivalence attack, this work proposes a graph-based anonymization algorithm that leverages solutions to the classic “min-cut/max-flow” problem, and demonstrates that the algorithm is efficient and effective in preventing equivalence attacks.
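The summary names the classic min-cut/max-flow problem but not how the paper maps anonymized records onto a graph. For readers unfamiliar with the problem itself, here is a standard Edmonds-Karp sketch (a swapped-in textbook algorithm, not the paper's anonymization method) that computes a maximum flow and reads off a minimum cut from the final residual graph; the example graph, node names, and capacities are hypothetical.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts capacity graph.
    Returns (max_flow_value, source_side_of_a_min_cut, cut_edges)."""
    # Residual graph: copy forward capacities and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: current flow is maximum
        # Bottleneck capacity along the path found.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from the source form the source side of a minimum cut.
    side = set(parent)
    cut = {(u, v) for u in side for v in capacity.get(u, {}) if v not in side}
    return flow, side, cut

if __name__ == "__main__":
    graph = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
    print(max_flow_min_cut(graph, "s", "t"))
```

By max-flow/min-cut duality, the total capacity of the returned cut edges equals the maximum flow value; on the sample graph both should be 5.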
Proceedings Article

Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks

TL;DR: This work crawled over 4M Jupyter notebooks on GitHub and replayed them step by step, observing not only the full input/output tables at each step but also the exact data-preparation choices that data scientists made and believed were best suited to the input data.