Institution

Yahoo!

Company, London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications, receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.


Papers
Proceedings Article
26 Apr 2010
TL;DR: This paper introduces BitMat, a compressed bit-matrix structure for storing huge RDF graphs, together with a novel, lightweight SPARQL join query processing method that applies an initial pruning technique and then a variable-binding-matching algorithm on BitMats to produce the final results.
Abstract: The Semantic Web community, until now, has used traditional database systems for the storage and querying of RDF data. The SPARQL query language also closely follows SQL syntax. As a natural consequence, most of the SPARQL query processing techniques are based on database query processing and optimization techniques. For SPARQL join query optimization, previous works like RDF-3X and Hexastore have proposed to use 6-way indexes on the RDF data. Although these indexes speed up merge-joins by orders of magnitude, for complex join queries generating large intermediate join results, the scalability of the query processor still remains a challenge. In this paper, we introduce (i) BitMat - a compressed bit-matrix structure for storing huge RDF graphs, and (ii) a novel, light-weight SPARQL join query processing method that employs an initial pruning technique, followed by a variable-binding-matching algorithm on BitMats to produce the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. We have demonstrated our method against RDF graphs of up to 1.33 billion triples - the largest among results published until now (single-node, non-parallel systems), and have compared our method with the state-of-the-art RDF stores - RDF-3X and MonetDB. Our results show that the competing methods are most effective with highly selective queries. On the other hand, BitMat can deliver 2-3 orders of magnitude better performance on complex, low-selectivity queries over massive data.

235 citations
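To make the bit-matrix idea in the entry above concrete, here is a minimal sketch under stated assumptions. It is not the paper's BitMat layout: there is no compression, and the class `BitMatrixStore`, its methods, and the sample triples are hypothetical names chosen for illustration. It only shows how storing triples as bitsets lets a join on a shared subject variable be pruned with a bitwise AND instead of an intermediate join table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class BitMatrixStore:
    """Toy per-predicate subject x object bit matrix (illustrative only)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        triples = list(triples)
        # Assign a dense integer id to every subject/object term.
        terms = sorted({t for s, _, o in triples for t in (s, o)})
        self.terms: List[str] = terms
        self.term_id: Dict[str, int] = {t: i for i, t in enumerate(terms)}
        # rows[predicate][subject_id] is an int used as a bitset of object ids.
        self.rows: Dict[str, Dict[int, int]] = defaultdict(dict)
        for s, p, o in triples:
            sid, oid = self.term_id[s], self.term_id[o]
            self.rows[p][sid] = self.rows[p].get(sid, 0) | (1 << oid)

    def subject_mask(self, predicate: str) -> int:
        """Bitset of subjects that have at least one triple with `predicate`."""
        mask = 0
        for sid in self.rows.get(predicate, {}):
            mask |= 1 << sid
        return mask

    def decode(self, mask: int) -> List[str]:
        """Turn a bitset back into the list of terms it contains."""
        return [t for i, t in enumerate(self.terms) if (mask >> i) & 1]


# Hypothetical sample data.
store = BitMatrixStore([
    ("alice", "worksAt", "yahoo"),
    ("alice", "livesIn", "london"),
    ("bob", "worksAt", "yahoo"),
    ("carol", "livesIn", "london"),
])

# Pruning step for the pattern { ?x worksAt ?y . ?x livesIn ?z }:
# only subjects present in both predicate matrices can bind ?x.
candidates = store.subject_mask("worksAt") & store.subject_mask("livesIn")
print(store.decode(candidates))
```

Python integers serve as arbitrary-length bitsets here, which keeps the sketch short; a real store would use a compressed bitmap representation.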

Proceedings Article
04 Feb 2010
TL;DR: This work shows empirically that user behavior alone can give an accurate picture of the success of the user's web search goals, without considering the relevance of the documents displayed.
Abstract: Web search engines are traditionally evaluated in terms of the relevance of web pages to individual queries. However, relevance of web pages does not tell the complete picture, since an individual query may represent only a piece of the user's information need and users may have different information needs underlying the same queries. In this work, we address the problem of predicting user search goal success by modeling user behavior. We show empirically that user behavior alone can give an accurate picture of the success of the user's web search goals, without considering the relevance of the documents displayed. In fact, our experiments show that models using user behavior are more predictive of goal success than those using document relevance. We build novel sequence models incorporating time distributions for this task and our experiments show that the sequence and time distribution models are more accurate than static models based on user behavior, or predictions based on document relevance.

235 citations

Proceedings Article
07 Jun 2015
TL;DR: The results suggest that summaries generated by visual co-occurrence tend to match more closely with human generated summaries, when compared to several popular unsupervised techniques.
Abstract: We present video co-summarization, a novel perspective to video summarization that exploits visual co-occurrence across multiple videos. Motivated by the observation that important visual concepts tend to appear repeatedly across videos of the same topic, we propose to summarize a video by finding shots that co-occur most frequently across videos collected using a topic keyword. The main technical challenge is dealing with the sparsity of co-occurring patterns, out of hundreds to possibly thousands of irrelevant shots in videos being considered. To deal with this challenge, we developed a Maximal Biclique Finding (MBF) algorithm that is optimized to find sparsely co-occurring patterns, discarding less co-occurring patterns even if they are dominant in one video. Our algorithm is parallelizable with closed-form updates, thus can easily scale up to handle a large number of videos simultaneously. We demonstrate the effectiveness of our approach on motion capture and self-compiled YouTube datasets. Our results suggest that summaries generated by visual co-occurrence tend to match more closely with human generated summaries, when compared to several popular unsupervised techniques.

234 citations

Patent
22 Dec 2003
TL;DR: This patent describes a geographical location extraction method and tool that infers a likely geographical location from one or more search terms entered as a query by a user on a search engine or the like.
Abstract: A geographical location extraction method and tool to infer a likely geographical location from one or more search terms entered as a query by a user on a search engine or the like.

234 citations

Proceedings Article
23 May 2006
TL;DR: This work shows how to adapt recent results from theoretical computer science to expand a seed set into a community with small conductance and a strong relationship to the seed, while examining only a small neighborhood of the entire graph.
Abstract: Expanding a seed set into a larger community is a common procedure in link-based analysis. We show how to adapt recent results from theoretical computer science to expand a seed set into a community with small conductance and a strong relationship to the seed, while examining only a small neighborhood of the entire graph. We extend existing results to give theoretical guarantees that apply to a variety of seed sets from specified communities. We also describe simple and flexible heuristics for applying these methods in practice, and present early experiments showing that these methods compare favorably with existing approaches.

234 citations
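The seed-set expansion entry above judges a community by its conductance. As a hedged illustration of that quantity only (not of the paper's expansion algorithm), the sketch below computes the standard conductance of a node set in an undirected, unweighted graph: the number of edges leaving the set divided by the smaller of the two sides' total degree. The function name `conductance` and the toy graph are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def conductance(graph: Graph, community: Iterable[Hashable]) -> float:
    """Conductance of `community` in an undirected, unweighted graph."""
    inside = set(community)
    # Volume = sum of degrees of the nodes on each side of the cut.
    vol_in = sum(len(graph[v]) for v in inside)
    vol_out = sum(len(nbrs) for v, nbrs in graph.items() if v not in inside)
    # Cut = number of edges with exactly one endpoint inside the community.
    cut = sum(1 for v in inside for u in graph[v] if u not in inside)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
toy_graph: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(conductance(toy_graph, {0, 1, 2}))  # one full triangle
print(conductance(toy_graph, {0, 1}))     # part of a triangle
```

The full triangle scores lower than the partial one, which matches the intuition that a good community has few edges leaving it relative to its internal connectivity.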


Authors

Showing all 26766 results

Name                   H-index  Papers  Citations
Ashok Kumar            151      5654    164086
Alexander J. Smola     122      434     110222
Howard I. Maibach      116      1821    60765
Sanjay Jain            103      881     46880
Amirhossein Sahebkar   100      1307    46132
Marc Davis             99       412     50243
Wenjun Zhang           96       976     38530
Jian Xu                94       1366    52057
Fortunato Ciardiello   94       695     47352
Tong Zhang             93       414     36519
Michael E. J. Lean     92       411     30939
Ashish K. Jha          87       503     30020
Xin Zhang              87       1714    40102
Theunis Piersma        86       632     34201
George Varghese        84       253     28598
Network Information
Related Institutions (5)
University of Toronto
294.9K papers, 13.5M citations

85% related

University of California, San Diego
204.5K papers, 12.3M citations

85% related

University College London
210.6K papers, 9.8M citations

84% related

Cornell University
235.5K papers, 12.2M citations

84% related

University of Washington
305.5K papers, 17.7M citations

84% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2023  2
2022  47
2021  1,088
2020  1,074
2019  1,568
2018  1,352