Dan Pelleg

Researcher at Yahoo!

Publications -  76
Citations -  5097

Dan Pelleg is an academic researcher at Yahoo!. The author has contributed to research on the topics Web search query and Cluster analysis, has an h-index of 22, and has co-authored 75 publications receiving 4818 citations. Previous affiliations of Dan Pelleg include Technion – Israel Institute of Technology and Carnegie Mellon University.

Papers
Proceedings Article

X-means: Extending K-means with Efficient Estimation of the Number of Clusters

TL;DR: A new algorithm is introduced that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure.
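As a rough illustration of scoring the number of clusters with BIC, the sketch below runs plain k-means for each candidate k and keeps the k with the highest score. It is a simpler stand-in, not the X-means search itself: it sweeps k exhaustively instead of splitting centroids locally. The BIC form assumes identical spherical Gaussians and may differ from the paper's exact formula. All names (`kmeans`, `bic`, `choose_k`, `make_demo_points`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centers[j])
        if updated == centers:  # assignments are stable, so we have converged
            break
        centers = updated
    return centers, labels


def bic(points, centers, labels):
    """BIC (higher is better) under an assumed shared spherical Gaussian model."""
    r, m, k = len(points), len(points[0]), len(centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    sigma2 = sse / (m * (r - k))  # pooled per-dimension variance estimate
    counts = [labels.count(j) for j in range(k)]
    loglik = sum(n * math.log(n / r) for n in counts if n > 0)  # mixing weights
    loglik -= 0.5 * r * m * math.log(2 * math.pi * sigma2)
    loglik -= 0.5 * m * (r - k)  # equals sse / (2 * sigma2)
    n_params = (k - 1) + k * m + 1  # weights, center coordinates, one variance
    return loglik - 0.5 * n_params * math.log(r)


def choose_k(points, k_max):
    """Return the k in 1..k_max whose k-means solution has the highest BIC."""
    scores = {}
    for k in range(1, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = bic(points, centers, labels)
    return max(scores, key=scores.get)


def make_demo_points(seed=0):
    """Synthetic data: three well-separated 2-D Gaussian blobs of 50 points each."""
    rng = random.Random(seed)
    blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(50)]


best_k = choose_k(make_demo_points(), k_max=5)
```

The exhaustive sweep reruns k-means from scratch for every k, which is the cost the summary above says the algorithm is designed to avoid.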
Proceedings ArticleDOI

Accelerating exact k-means algorithms with geometric reasoning

TL;DR: New algorithms for the k-means clustering problem are presented that use the kd-tree data structure to reduce the large number of nearest-neighbor queries issued by the traditional algorithm.
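The idea of letting a kd-tree absorb many nearest-center queries can be sketched as follows. Each node caches its bounding box, point count and coordinate sum. During one k-means update, a center is dropped from a node's candidate list when another candidate is strictly closer at the box corner most favourable to it. Once a single candidate remains, the whole node is assigned in one step. This is a minimal sketch of that pruning idea, not the paper's algorithm, and every name here (`build`, `assign`, `kmeans_step`, `brute_step`) is hypothetical.

```python
import random


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(points, leaf_size=8, depth=0):
    """Build a kd-tree node that caches bounding box, count and vector sum."""
    dims = len(points[0])
    node = {
        "lo": [min(p[d] for p in points) for d in range(dims)],
        "hi": [max(p[d] for p in points) for d in range(dims)],
        "count": len(points),
        "sum": [sum(p[d] for p in points) for d in range(dims)],
        "points": points,
        "kids": [],
    }
    if len(points) > leaf_size:
        ordered = sorted(points, key=lambda p: p[depth % dims])
        mid = len(ordered) // 2
        node["points"] = None  # only leaves keep their raw points
        node["kids"] = [build(ordered[:mid], leaf_size, depth + 1),
                        build(ordered[mid:], leaf_size, depth + 1)]
    return node


def assign(node, cand, centers, sums, counts):
    """Accumulate the node's points into the sums of their nearest centers."""
    if len(cand) > 1:
        lo, hi = node["lo"], node["hi"]
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        best = min(cand, key=lambda j: sqdist(centers[j], mid))
        kept = []
        for j in cand:
            # Box corner lying farthest in the direction from best towards j.
            corner = [h if cj > cb else l
                      for l, h, cj, cb in zip(lo, hi, centers[j], centers[best])]
            # Drop j only if best is strictly closer even at that corner.
            if j == best or sqdist(centers[j], corner) <= sqdist(centers[best], corner):
                kept.append(j)
        cand = kept
    if len(cand) == 1:  # one owner for the whole box: use the cached statistics
        counts[cand[0]] += node["count"]
        for d, s in enumerate(node["sum"]):
            sums[cand[0]][d] += s
    elif node["kids"]:
        for kid in node["kids"]:
            assign(kid, cand, centers, sums, counts)
    else:  # leaf with several candidates: fall back to per-point search
        for p in node["points"]:
            j = min(cand, key=lambda c: sqdist(centers[c], p))
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x


def _means(sums, counts, centers):
    # A center that owns no points stays where it was.
    return [[s / n for s in row] if n else list(c)
            for row, n, c in zip(sums, counts, centers)]


def kmeans_step(root, centers):
    """One exact k-means update using the kd-tree."""
    sums = [[0.0] * len(centers[0]) for _ in centers]
    counts = [0] * len(centers)
    assign(root, list(range(len(centers))), centers, sums, counts)
    return _means(sums, counts, centers)


def brute_step(points, centers):
    """Reference update: every point is compared against every center."""
    sums = [[0.0] * len(centers[0]) for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda c: sqdist(centers[c], p))
        counts[j] += 1
        for d, x in enumerate(p):
            sums[j][d] += x
    return _means(sums, counts, centers)


rng = random.Random(1)
demo_points = [(rng.random(), rng.random()) for _ in range(500)]
demo_centers = list(demo_points[:5])
fast = kmeans_step(build(demo_points), demo_centers)
slow = brute_step(demo_points, demo_centers)
```

Because pruning only removes centers that cannot own any point in a box, the tree-based update should agree with the brute-force one up to floating-point rounding.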
Journal ArticleDOI

The shark-search algorithm. An application: tailored Web site mapping

TL;DR: The shark-search algorithm is introduced, a refined version of one of the first dynamic Web search algorithms, the “fish search”. It has been built into a dynamic Web site mapping application that enables users to tailor Web maps to their interests.
Proceedings ArticleDOI

What makes a query difficult

TL;DR: This work presents a novel model that captures the main components of a topic and the relationship between those components and topic difficulty. It demonstrates the applicability of the difficulty model for several uses, such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.
Proceedings Article

Active Learning for Anomaly and Rare-Category Detection

TL;DR: A technique is proposed for identifying "rare category" records in an unlabeled, noisy data set with help from a human expert who has a small budget of datapoints that they are prepared to categorize. The technique assumes a mixture model fit to the data but otherwise makes no assumptions about the particular form of the mixture components.