Institution
Yahoo!
Company • London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.
Papers
23 May 2006 TL;DR: This work combines a novel solution to an interval covering problem with extensions to previous work on score aggregation in order to create an efficient backend system capable of producing visualizations at arbitrary scales on this large dataset in real time.
Abstract: We consider the problem of visualizing the evolution of tags within the Flickr (flickr.com) online image sharing community. Any user of the Flickr service may append a tag to any photo in the system. Over the past year, users have on average added over a million tags each week. Understanding the evolution of these tags over time is therefore a challenging task. We present a new approach based on a characterization of the most interesting tags associated with a sliding interval of time. An animation provided via Flash in a web browser allows the user to observe and interact with the interesting tags as they evolve over time. New algorithms and data structures are required to support the efficient generation of this visualization. We combine a novel solution to an interval covering problem with extensions to previous work on score aggregation in order to create an efficient backend system capable of producing visualizations at arbitrary scales on this large dataset in real time.
291 citations
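The abstract above does not say how "interesting" is scored, so the following is only a minimal sketch under an assumed definition: a tag is interesting in a time window if it occurs often inside the window relative to how often it occurs overall. The function name `interesting_tags`, the `smoothing` constant, and the toy events are all hypothetical, and this naive version rescans the data for every window rather than using the interval-covering backend the paper describes.

```python
from collections import Counter


def interesting_tags(events, start, end, top=3, smoothing=5.0):
    """Rank tags seen in [start, end) by an assumed interestingness score.

    events: iterable of (timestamp, tag) pairs.
    score(tag) = count inside the window / (smoothing + count overall).
    The smoothing term (an assumption) damps tags that are merely rare.
    """
    events = list(events)
    total = Counter(tag for _, tag in events)
    window = Counter(tag for t, tag in events if start <= t < end)
    scores = {tag: window[tag] / (smoothing + total[tag]) for tag in window}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:top]


# Toy data (made up): "sunset" is tagged steadily, "eclipse" only in a burst.
toy_events = [(t, "sunset") for t in range(6)] + [
    (2, "eclipse"), (2, "eclipse"), (3, "eclipse"),
]
burst_window = interesting_tags(toy_events, 2, 4, top=1)
whole_range = interesting_tags(toy_events, 0, 6, top=1)
```

With this scoring, the burst window ranks "eclipse" first while the full range ranks "sunset" first, which is the kind of time-dependent ranking the visualization animates.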
11 Aug 2013 TL;DR: This paper defines a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by the method are compact, dense, and with smaller diameter.
Abstract: Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular one among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
290 citations
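The two baseline measures named in the abstract above can be compared on a small example. This sketch only computes edge density and average degree; it does not implement the paper's new density function, which the abstract does not define. The toy graph (a 4-clique plus a disjoint complete bipartite graph) and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations


def induced_edges(edges, nodes):
    """Edges of the graph with both endpoints inside `nodes`."""
    nodes = set(nodes)
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


def edge_density(edges, nodes):
    """Fraction of possible node pairs that are connected: |E| / C(n, 2)."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    return len(induced_edges(edges, nodes)) / (n * (n - 1) / 2)


def average_degree(edges, nodes):
    """Average degree of the induced subgraph: 2|E| / n."""
    n = len(set(nodes))
    if n == 0:
        return 0.0
    return 2 * len(induced_edges(edges, nodes)) / n


# Toy graph: a 4-clique (nodes 0-3) and a disjoint K_{4,4} (10-13 vs 20-23).
clique = list(range(4))
left, right = list(range(10, 14)), list(range(20, 24))
bipartite = left + right
edges = list(combinations(clique, 2)) + [(l, r) for l in left for r in right]

single_edge = [0, 1]
print(edge_density(edges, single_edge), average_degree(edges, single_edge))
print(edge_density(edges, clique), average_degree(edges, clique))
print(edge_density(edges, bipartite), average_degree(edges, bipartite))
```

A single edge already reaches edge density 1, which is why maximizing edge density directly is uninformative. Average degree prefers the larger bipartite subgraph over the clique even though its edge density is lower, matching the abstract's remark that densest subgraphs tend to be large and sparse.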
08 Feb 2012 TL;DR: The problem of correlating micro-blogging activity with stock-market events, defined as changes in the price and traded volume of stocks, is studied, and it is shown that even relatively small correlations between price and micro-blogging features can be exploited to drive a stock trading strategy that outperforms other baseline strategies.
Abstract: We study the problem of correlating micro-blogging activity with stock-market events, defined as changes in the price and traded volume of stocks. Specifically, we collect messages related to a number of companies, and we search for correlations between stock-market events for those companies and features extracted from the micro-blogging messages. The features we extract can be categorized in two groups. Features in the first group measure the overall activity in the micro-blogging platform, such as number of posts, number of re-posts, and so on. Features in the second group measure properties of an induced interaction graph, for instance, the number of connected components, statistics on the degree distribution, and other graph-based properties. We present detailed experimental results measuring the correlation of the stock-market events with these features, using Twitter as a data source. Our results show that the most correlated features are the number of connected components and the number of nodes of the interaction graph. The correlation is stronger with the traded volume than with the price of the stock. However, by using a simulator we show that even relatively small correlations between price and micro-blogging features can be exploited to drive a stock trading strategy that outperforms other baseline strategies.
290 citations
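One of the graph features mentioned in the abstract above, the number of connected components, can be computed per day and correlated with traded volume. The sketch below assumes a Pearson correlation and a union-find component count; the abstract does not state which correlation measure or graph construction the authors used, and the daily edge lists and volumes are made-up toy values.

```python
import math


def count_components(edges):
    """Number of connected components among nodes that appear in `edges`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    return len({find(x) for x in list(parent)})


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one interaction graph (edge list) per trading day.
daily_edges = [
    [("a", "b")],
    [("a", "b"), ("c", "d")],
    [("a", "b"), ("c", "d"), ("e", "f")],
]
daily_volume = [100, 200, 300]  # invented traded volumes, not real data

components = [count_components(day) for day in daily_edges]
correlation = pearson(components, daily_volume)
```

Real series would need aligned timestamps and, most likely, lagged comparisons; neither is modeled here.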
05 Jun 2011 TL;DR: It is shown that generically, no individually rational mechanism can compensate individuals for the privacy loss incurred due to their reported valuations for privacy; modeling this issue correctly is left as one of the many directions for future work.
Abstract: We initiate the study of markets for private data, through the lens of differential privacy. Although the purchase and sale of private data has already begun on a large scale, a theory of privacy as a commodity is missing. In this paper, we propose to build such a theory. Specifically, we consider a setting in which a data analyst wishes to buy information from a population from which he can estimate some statistic. The analyst wishes to obtain an accurate estimate cheaply, while the owners of the private data experience some cost for their loss of privacy, and must be compensated for this loss. Agents are selfish, and wish to maximize their profit, so our goal is to design truthful mechanisms. Our main result is that such problems can naturally be viewed and optimally solved as variants of multi-unit procurement auctions. Based on this result, we derive auctions which are optimal up to small constant factors for two natural settings: When the data analyst has a fixed accuracy goal, we show that an application of the classic Vickrey auction achieves the analyst's accuracy goal while minimizing his total payment. When the data analyst has a fixed budget, we give a mechanism which maximizes the accuracy of the resulting estimate while guaranteeing that the resulting sum payments do not exceed the analyst's budget. In both cases, our comparison class is the set of envy-free mechanisms, which correspond to the natural class of fixed-price mechanisms in our setting. In both of these results, we ignore the privacy cost due to possible correlations between an individual's private data and his valuation for privacy itself. We then show that generically, no individually rational mechanism can compensate individuals for the privacy loss incurred due to their reported valuations for privacy. This is nevertheless an important issue, and modeling it correctly is one of the many exciting directions for future work.
289 citations
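The fixed-accuracy setting in the abstract above refers to the classic Vickrey auction in a multi-unit procurement form. A minimal sketch of that textbook mechanism follows: the buyer purchases from the `k` lowest bidders and pays each of them the (k+1)-th lowest bid. How the paper maps an accuracy goal to the number of data points `k` is not given in the abstract, so `k` is simply taken as an input here, and the function name and bids are hypothetical.

```python
def run_procurement_auction(bids, k):
    """Uniform-price (k+1)-th price procurement auction.

    bids: dict mapping seller name -> reported cost for selling their data.
    k:    number of sellers to buy from (assumed to be given).
    Returns (winners, price): the k lowest bidders and the single price
    paid to each, equal to the lowest losing bid.
    """
    if not 0 < k < len(bids):
        raise ValueError("k must satisfy 0 < k < number of bidders")
    # Sort by bid, breaking ties by name so the outcome is deterministic.
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winners = [name for name, _ in ranked[:k]]
    price = ranked[k][1]
    return winners, price


# Hypothetical reported privacy costs.
example_bids = {"a": 3, "b": 1, "c": 5, "d": 2}
winners, price = run_procurement_auction(example_bids, k=2)
total_payment = price * len(winners)
```

Because a winner's payment depends only on the lowest losing bid and not on the winner's own report, misreporting one's cost cannot increase one's profit, which is the truthfulness property the abstract asks for.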
20 Apr 2006 TL;DR: Metadata in the form of tags, comments, annotations or favorites may be associated with media objects, which may then be searched according to that metadata and ranked in a variety of ways.
Abstract: Metadata may be associated with media objects by providing media objects for display, and accepting input concerning the media objects, where the input may include at least two different types of metadata. For example, metadata may be in the form of tags, comments, annotations or favorites. The media objects may be searched according to metadata, and ranked in a variety of ways.
289 citations
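A data structure along the lines of the abstract above might look like the following sketch. The class `MediaObject`, its field names, and the choice to rank search hits by number of favorites are all assumptions made for illustration; the abstract only says that objects may be ranked "in a variety of ways".

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    """A media item carrying several kinds of user-supplied metadata."""
    media_id: str
    tags: set = field(default_factory=set)
    comments: list = field(default_factory=list)
    favorited_by: set = field(default_factory=set)


def search_by_tag(objects, tag):
    """Return objects carrying `tag`, most-favorited first (assumed ranking)."""
    hits = [obj for obj in objects if tag in obj.tags]
    return sorted(hits, key=lambda obj: (-len(obj.favorited_by), obj.media_id))


# Hypothetical catalogue of three photos.
photos = [
    MediaObject("p1", tags={"beach", "sunset"}, favorited_by={"u1"}),
    MediaObject("p2", tags={"sunset"}, favorited_by={"u1", "u2", "u3"}),
    MediaObject("p3", tags={"city"}, comments=["nice skyline"]),
]
sunset_ids = [obj.media_id for obj in search_by_tag(photos, "sunset")]
```

Other rankings, such as by comment count or recency, would only change the sort key.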
Authors
Showing all 26766 results
Name | H-index | Papers | Citations |
---|---|---|---|
Ashok Kumar | 151 | 5654 | 164086 |
Alexander J. Smola | 122 | 434 | 110222 |
Howard I. Maibach | 116 | 1821 | 60765 |
Sanjay Jain | 103 | 881 | 46880 |
Amirhossein Sahebkar | 100 | 1307 | 46132 |
Marc Davis | 99 | 412 | 50243 |
Wenjun Zhang | 96 | 976 | 38530 |
Jian Xu | 94 | 1366 | 52057 |
Fortunato Ciardiello | 94 | 695 | 47352 |
Tong Zhang | 93 | 414 | 36519 |
Michael E. J. Lean | 92 | 411 | 30939 |
Ashish K. Jha | 87 | 503 | 30020 |
Xin Zhang | 87 | 1714 | 40102 |
Theunis Piersma | 86 | 632 | 34201 |
George Varghese | 84 | 253 | 28598 |