Sandeep Tata

Researcher at Google

Publications -  53
Citations -  1714

Sandeep Tata is an academic researcher at Google. The author has contributed to research on topics including computer science and information extraction. The author has an h-index of 19 and has co-authored 46 publications receiving 1544 citations. Previous affiliations of Sandeep Tata include the University of Michigan and IBM.

Papers
Journal Article

Estimating the selectivity of tf-idf based cosine similarity predicates

TL;DR: This paper presents the first approach for estimating the selectivity of tf.idf based cosine similarity predicates and shows that this method often produces estimates that are within 40% of the actual selectivity.
Journal Article

Using Paxos to build a scalable, consistent, and highly available datastore

TL;DR: Spinnaker, as discussed by the authors, is an experimental datastore designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads.
Journal Article

Column-oriented storage techniques for MapReduce

TL;DR: This paper describes how column-oriented storage techniques can be incorporated in Hadoop in a way that preserves its popular programming APIs and introduces a novel skip list column format and lazy record construction strategy that avoids deserializing unwanted records to provide an additional 1.5x performance boost.
Proceedings Article

SQAK: doing more with keywords

TL;DR: SQAK provides a novel way to trade off some of the expressive power of SQL in exchange for the ability to express a large class of aggregate queries using simple keywords.