Moshe Dubiner

Researcher at Google

Publications - 9
Citations - 275

Moshe Dubiner is an academic researcher from Google. The author has contributed to research in topics: k-nearest neighbors algorithm & Probability distribution. The author has an h-index of 5 and has co-authored 9 publications, receiving 266 citations.

Papers
Proceedings Article

Large Scale Parallel Document Mining for Machine Translation

TL;DR: A distributed system is described that reliably mines parallel text from large corpora by treating the task as cross-language near-duplicate detection, enabled by an initial, low-quality batch translation.
Journal Article

Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem

TL;DR: Bucketing information is defined and proven to bound the performance of all bucketing codes, and it is shown that on the order of n^(1/p+ε) comparisons suffice, for any ε > 0.
Patent

Parallel document mining

TL;DR: A method is described that takes a collection of documents in multiple languages, identifies from the collection a group of candidate documents in which each candidate shares multiple corresponding rare features, and evaluates pairs of candidate documents in the group using multiple common features present in the collection.
Patent

Identifying nearest neighbors for machine translation

TL;DR: Technologies relating to identifying nearest neighbors are described. In one implementation, a method includes using first and second collections of n-grams and their associated probabilities to generate a plurality of randomized ranked collections.
Journal Article

A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm

TL;DR: An old-style probabilistic formulation is introduced instead of the more general locality-sensitive hashing (LSH) formulation, and it is shown that, at least for sparse problems, it recognizes much more efficient algorithms than the sparseness-destroying LSH random projections.