Institution

Yahoo!

Company•London, United Kingdom•

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics of Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. It is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers

Proceedings Article

Feature hashing for large scale multitask learning

Kilian Q. Weinberger¹, Anirban Dasgupta¹, John Langford¹, Alexander J. Smola¹, Josh Attenberg¹•Institutions (1)

Yahoo!¹

14 Jun 2009

TL;DR: In this article, the authors provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability, and demonstrate the feasibility of this approach with experimental results for a new use case.

Abstract: Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case: multitask learning with hundreds of thousands of tasks.
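
As a rough illustration of the hashing idea described in the abstract, the sketch below maps tokens into a fixed-size vector, using one hash to pick the index and a second hash to pick a ±1 sign. The function names, the MD5-based hash, and the task-prefix scheme for the multitask case are illustrative assumptions, not the authors' implementation.

```python
import hashlib


def _hash_int(key: str, salt: str) -> int:
    """Deterministic integer hash of `key`; `salt` separates the two hash roles."""
    digest = hashlib.md5((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_features(tokens, dim, task=None):
    """Map a bag of tokens to a `dim`-dimensional signed hashed vector.

    If `task` is given (hypothetical multitask variant), every token is hashed
    twice: once as a shared feature and once prefixed with the task id, so all
    tasks share one weight vector of size `dim`.
    """
    vec = [0.0] * dim
    for tok in tokens:
        keys = [tok] if task is None else [tok, f"{task}_{tok}"]
        for key in keys:
            idx = _hash_int(key, "index") % dim
            sign = 1.0 if _hash_int(key, "sign") % 2 == 0 else -1.0
            vec[idx] += sign
    return vec


def dot(u, v):
    """Inner product of two hashed vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))
```

The sign hash is what keeps collisions from biasing inner products: colliding features are as likely to cancel as to add. A call such as `hashed_features(["free", "offer"], 1024, task="user42")` would produce one vector that carries both the shared and the task-specific copies of each token.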

955 citations

Posted Content

Empirical Comparison of Algorithms for Network Community Detection

Jure Leskovec¹, Kevin J. Lang², Michael W. Mahoney¹•Institutions (2)

Stanford University¹, Yahoo!²

20 Apr 2010 - arXiv: Data Structures and Algorithms

TL;DR: In this paper, the authors explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify, and examine several different classes of approximation algorithms that aim to optimize such objective functions.

Abstract: Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
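
To make the size-resolved idea concrete, the sketch below scores node sets by conductance and reports the best set for each size on a toy graph. Conductance is assumed here as one common objective of the kind the abstract describes; the abstract itself does not name it. The brute-force search, the function names, and the toy graph are illustrative only and would not scale to real networks.

```python
from itertools import combinations

# Toy undirected graph (assumption for illustration): two triangles
# {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
TOY_GRAPH = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}


def conductance(adj, nodes):
    """Edges leaving the set divided by the smaller of the two side volumes."""
    s = set(nodes)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def best_by_size(adj, max_size):
    """For each size k, return (score, node set) with the lowest conductance.

    Exhaustive search over all k-subsets; only feasible for tiny graphs.
    """
    best = {}
    for k in range(1, max_size + 1):
        candidates = ((conductance(adj, c), c) for c in combinations(sorted(adj), k))
        best[k] = min(candidates, key=lambda pair: pair[0])
    return best
```

Calling `best_by_size(TOY_GRAPH, 3)` gives one score per size rather than a single global optimum, which is the kind of profile the abstract argues is more informative than a single best cluster.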

950 citations

Proceedings Article

Abusive Language Detection in Online User Content

Chikashi Nobata¹, Joel Tetreault¹, Achint Oommen Thomas, Yashar Mehdad¹, Yi Chang¹•Institutions (1)

Yahoo!¹

11 Apr 2016

TL;DR: This work develops a machine learning based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, and introduces a corpus of user comments annotated for abusive language, the first of its kind.

Abstract: Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech in online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.

945 citations

Proceedings Article

Who says what to whom on twitter

Shaomei Wu¹, Jake M. Hofman², Winter Mason², Duncan J. Watts²•Institutions (2)

Cornell University¹, Yahoo!²

28 Mar 2011

TL;DR: A striking concentration of attention is found on Twitter, in that roughly 50% of URLs consumed are generated by just 20K elite users, where the media produces the most information, but celebrities are the most followed.

Abstract: We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as "lists" to distinguish between elite users - by which we mean celebrities, bloggers, and representatives of media outlets and other formal organizations - and ordinary users. Based on this classification, we find a striking concentration of attention on Twitter, in that roughly 50% of URLs consumed are generated by just 20K elite users, where the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers, etc.; however, bloggers in general rebroadcast more information than the other categories. Next, we re-examine the classical "two-step flow" theory of communications, finding considerable support for it on Twitter. Third, we find that URLs broadcast by different categories of users or containing different types of content exhibit systematically different lifespans. Finally, we examine the attention paid by the different user categories to different news topics.

932 citations

Patent

Dynamic page generator

Farzad Nazem¹, Ashvinkumar P. Patel•Institutions (1)

Yahoo!¹

22 Jan 2007

TL;DR: In this patent, a custom page server is provided with user preferences organized into templates stored in compact data structures, and the live data used to fill the templates is stored local to the page server that is handling user requests for custom pages.

Abstract: A custom page server is provided with user preferences organized into templates stored in compact data structures, and the live data used to fill the templates is stored local to the page server that is handling user requests for custom pages. One process is executed on the page server for every request. The process is provided a user template for the user making the request, where the user template is either generated from user preferences or retrieved from a cache of recently used user templates. Each user process is provided access to a large region of shared memory which contains all of the live data needed to fill any user template. Typically, the pages served are news pages, giving the user a custom selection of stock quotes, news headlines, sports scores, weather, and the like. With the live data stored in a local, shared memory, any custom page can be built within the page server, eliminating the need to make requests from other servers for portions of the live data. While the shared memory might include RAM (random access memory) and disk storage, in many computer systems, it is faster to store all the live data in RAM.
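
The flow described in the abstract (build or fetch a user template, then fill it from local live data) can be sketched roughly as follows. This is a minimal single-process illustration under stated assumptions: `TemplateCache`, `render_page`, the `LIVE_DATA` dictionary standing in for the shared memory region, and its placeholder values are all hypothetical and are not taken from the patent.

```python
from collections import OrderedDict
from string import Template

# Stand-in for the local shared-memory region holding all live data.
# Keys and values are placeholders for illustration.
LIVE_DATA = {
    "weather": "sunny",
    "sports": "2-1",
    "headlines": "placeholder headline",
}


class TemplateCache:
    """Small least-recently-used cache of per-user templates."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.builds = 0  # counts how often a template was generated from preferences

    def get(self, user_id, prefs):
        if user_id in self._items:
            # Cache hit: mark as most recently used and reuse the template.
            self._items.move_to_end(user_id)
            return self._items[user_id]
        # Cache miss: generate the template from the user's preference list.
        template = Template("\n".join(f"{name}: ${name}" for name in prefs))
        self.builds += 1
        self._items[user_id] = template
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return template


def render_page(cache, user_id, prefs, live_data):
    """Fill the user's template entirely from local live data."""
    return cache.get(user_id, prefs).substitute(live_data)
```

Because every placeholder is resolved from `live_data` in local memory, a call like `render_page(cache, "u1", ["weather", "sports"], LIVE_DATA)` needs no request to another server, which is the property the abstract emphasizes.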

919 citations

Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598