Institution

Yahoo!

Company•London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26,749 authors who have published 29,915 publications receiving 732,583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers

Proceedings Article

Facetnet: a framework for analyzing communities and their evolutions in dynamic networks

Yu-Ru Lin¹, Yun Chi, Shenghuo Zhu, Hari Sundaram¹, Belle L. Tseng²•Institutions (2)

Arizona State University¹, Yahoo!²

21 Apr 2008

TL;DR: This paper proposes FacetNet, a novel framework for analyzing communities and their evolutions through a robust unified process, where communities not only generate evolutions but are also regularized by the temporal smoothness of evolutions.

Abstract: We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as paper citation networks. Also, communities may evolve over time, due to changes to individuals' roles and social status in the network as well as changes to individuals' research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyze community evolutions. In the traditional approach, communities are first detected for each time slice, and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolutions through a robust unified process. In this novel framework, communities not only generate evolutions but are also regularized by the temporal smoothness of evolutions. As a result, this framework will discover communities that jointly maximize the fit to the observed data and the temporal evolution. Our approach relies on formulating the problem in terms of non-negative matrix factorization, where communities and their evolutions are factorized in a unified way. Then we develop an iterative algorithm, with proven low time complexity, which is guaranteed to converge to an optimal solution. We perform extensive experimental studies, on both synthetic datasets and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods.

425 citations
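
The FacetNet abstract describes one non-negative matrix factorization objective that trades off fit to the current network snapshot against agreement with the previous time step. The Python sketch below only evaluates such an objective. It assumes, as we understand the paper, a cost of the form α·D(W_t ‖ X_t Λ_t X_tᵀ) + (1 − α)·D(Y_{t−1} ‖ X_t Λ_t), where D is the generalized KL divergence, W_t is the snapshot's weighted adjacency matrix, X_t holds community memberships, Λ_t is a diagonal matrix of community strengths, and Y_{t−1} = X_{t−1}Λ_{t−1} comes from the previous step. All function names are hypothetical, and the iterative update rules that minimize the cost are not shown.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def diag(values: list[float]) -> Matrix:
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def gen_kl(a: Matrix, b: Matrix, eps: float = 1e-12) -> float:
    """Generalized KL divergence: sum of a*log(a/b) - a + b over all entries.

    Entries of b are floored at eps so zero predictions do not divide by zero.
    """
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            y = max(y, eps)
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total


def facetnet_style_cost(w_t: Matrix, x_t: Matrix, lam_t: list[float],
                        y_prev: Matrix, alpha: float) -> float:
    """alpha * snapshot cost + (1 - alpha) * temporal cost (assumed form)."""
    lam = diag(lam_t)
    snapshot_fit = matmul(matmul(x_t, lam), transpose(x_t))  # X_t Λ_t X_tᵀ
    community_profile = matmul(x_t, lam)                      # X_t Λ_t
    snapshot_cost = gen_kl(w_t, snapshot_fit)
    temporal_cost = gen_kl(y_prev, community_profile)
    return alpha * snapshot_cost + (1 - alpha) * temporal_cost
```

With α = 1 the temporal term drops out and each snapshot is fitted on its own, which corresponds to the first step of the two-step approach the authors argue against; smaller α pulls the factorization toward the previous step's communities.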

Journal Article

lp-Norm Multiple Kernel Learning

Marius Kloft¹, Ulf Brefeld², Sören Sonnenburg³, Alexander Zien•Institutions (3)

University of California, Berkeley¹, Yahoo!², Technical University of Berlin³

01 Feb 2011-Journal of Machine Learning Research

TL;DR: Empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art, and two efficient interleaved optimization strategies for arbitrary norms are developed.

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this l1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary norms, that is lp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse and l∞-norm MKL in various scenarios. Importantly, empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and further information are available at http://doc.ml.tu-berlin.de/nonsparse_mkl/.

423 citations
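
The lp-norm MKL abstract mentions interleaved optimization strategies for arbitrary norms. One ingredient of such a scheme, as we understand the paper, is a closed-form update of the kernel weights β given the current per-kernel weight-vector norms ‖w_m‖: β_m = ‖w_m‖^(2/(p+1)) / (Σ_m' ‖w_m'‖^(2p/(p+1)))^(1/p), which places β on the unit lp-sphere. The Python sketch shows that update together with the weighted kernel sum K = Σ_m β_m K_m. The function names are hypothetical, and the SVM step that would supply ‖w_m‖ in each iteration is omitted.

```python
def lp_kernel_weights(block_norms: list[float], p: float) -> list[float]:
    """Closed-form kernel weights for lp-norm MKL (assumed form, see lead-in).

    beta_m = n_m^(2/(p+1)) / (sum_m' n_m'^(2p/(p+1)))^(1/p), with n_m = ||w_m||.
    The result satisfies sum_m beta_m^p == 1. Norms must not all be zero.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    numer = [n ** (2.0 / (p + 1.0)) for n in block_norms]
    denom = sum(n ** (2.0 * p / (p + 1.0)) for n in block_norms) ** (1.0 / p)
    return [x / denom for x in numer]


def combine_kernels(kernels: list[list[list[float]]], weights: list[float]) -> list[list[float]]:
    """Weighted sum of square kernel (Gram) matrices: K = sum_m beta_m * K_m."""
    size = len(kernels[0])
    combined = [[0.0] * size for _ in range(size)]
    for k, beta in zip(kernels, weights):
        for i in range(size):
            for j in range(size):
                combined[i][j] += beta * k[i][j]
    return combined
```

For p = 1 the update reduces to β_m = ‖w_m‖ / Σ_m' ‖w_m'‖, the l1 case; as p grows the exponent 2/(p+1) shrinks and the weights become more even, which is the non-sparse behavior the abstract reports generalizing better.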

Journal Article

Sensing Trending Topics in Twitter

Luca Maria Aiello¹, Georgios Petkos², Carlos Martin³, David Corney³, Symeon Papadopoulos², R. Skraba⁴, Ayse Göker⁵, Ioannis Kompatsiaris², Alejandro Jaimes¹•Institutions (5)

Yahoo!¹, Information Technology Institute², City University London³, Bell Labs⁴, Robert Gordon University⁵

01 Oct 2013-IEEE Transactions on Multimedia

TL;DR: It is found that standard natural language processing techniques can perform well for social streams on very focused topics, but novel techniques designed to mine the temporal distribution of concepts are needed to handle more heterogeneous streams containing multiple stories evolving in parallel.

Abstract: Online social and news media generate rich and timely information about real-world events of all kinds. However, the huge amount of data available, along with the breadth of the user base, requires a substantial effort of information filtering to successfully drill down to relevant topics and events. Trending topic detection is therefore a fundamental building block to monitor and summarize information originating from social sources. There is a wide variety of methods and variables, and they greatly affect the quality of results. We compare six topic detection methods on three Twitter datasets related to major events, which differ in their time scale and topic churn rate. We observe how the nature of the event considered, the volume of activity over time, the sampling procedure and the pre-processing of the data all greatly affect the quality of detected topics, which also depends on the type of detection method used. We find that standard natural language processing techniques can perform well for social streams on very focused topics, but novel techniques designed to mine the temporal distribution of concepts are needed to handle more heterogeneous streams containing multiple stories evolving in parallel. One of the novel topic detection methods we propose, based on n-gram co-occurrence and topic ranking, consistently achieves the best performance across all these conditions, thus being more reliable than other state-of-the-art techniques.

423 citations
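
The trending-topics abstract credits its best method to n-gram co-occurrence plus topic ranking. The Python sketch below illustrates only a ranking step of that general kind: for each time slot it counts how many tweets contain each n-gram, and it scores an n-gram higher when its current document frequency is large relative to its average frequency in earlier slots. The score (df + 1) / (log(mean past df + 1) + 1) is our assumption of a reasonable burstiness measure, not necessarily the paper's exact formula, and the function names are hypothetical. Grouping co-occurring n-grams into topics is not shown.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Distinct word n-grams of a lowercased tweet."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def doc_freq(tweets: list[str], n: int) -> Counter:
    """Number of tweets containing each n-gram (counted once per tweet)."""
    df: Counter = Counter()
    for tweet in tweets:
        df.update(ngrams(tweet, n))
    return df


def burst_scores(current: list[str], history: list[list[str]],
                 n: int = 2) -> list[tuple[tuple[str, ...], float]]:
    """Rank the current slot's n-grams by an assumed burstiness score."""
    df_now = doc_freq(current, n)
    past = [doc_freq(slot, n) for slot in history]
    t = max(len(past), 1)
    scores = {}
    for gram, df in df_now.items():
        mean_past = sum(p[gram] for p in past) / t  # Counter returns 0 if unseen
        scores[gram] = (df + 1) / (math.log(mean_past + 1) + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

An n-gram that is new in the current slot gets the full (df + 1) score, while one that was already common in earlier slots is damped by the logarithmic denominator.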

Journal Article

A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

Arindam Banerjee, Inderjit S. Dhillon¹, Joydeep Ghosh¹, Srujana Merugu², Dharmendra S. Modha³•Institutions (3)

University of Texas at Austin¹, Yahoo!², IBM³

01 Dec 2007-Journal of Machine Learning Research

TL;DR: This paper presents a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved.

Abstract: Co-clustering, or simultaneous clustering of rows and columns of a two-dimensional data matrix, is rapidly becoming a powerful data analysis technique. Co-clustering has enjoyed wide success in varied application domains such as text clustering, gene-microarray analysis, natural language processing and image, speech and video analysis. In this paper, we introduce a partitional co-clustering formulation that is driven by the search for a good matrix approximation: every co-clustering is associated with an approximation of the original data matrix, and the quality of co-clustering is determined by the approximation error. We allow the approximation error to be measured using a large class of loss functions called Bregman divergences that include squared Euclidean distance and KL-divergence as special cases. In addition, we permit multiple structurally different co-clustering schemes that preserve various linear statistics of the original data matrix. To accomplish the above tasks, we introduce a new minimum Bregman information (MBI) principle that simultaneously generalizes the maximum entropy and standard least squares principles, and leads to a matrix approximation that is optimal among all generalized additive models in a certain natural parameter space. Analysis based on this principle yields an elegant meta algorithm, special cases of which include most previously known alternate minimization based clustering algorithms such as k-means and co-clustering algorithms such as information theoretic (Dhillon et al., 2003b) and minimum sum-squared residue co-clustering (Cho et al., 2004). To demonstrate the generality and flexibility of our co-clustering framework, we provide examples and empirical evidence on a variety of problem domains and also describe novel co-clustering applications such as missing value prediction and compression of categorical data matrices.

420 citations
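
The co-clustering abstract measures approximation error with Bregman divergences, d_φ(x, y) = φ(x) − φ(y) − φ′(y)(x − y) for a strictly convex φ, and names squared Euclidean distance and KL divergence as special cases. The Python sketch builds a divergence from φ and its derivative, recovers both special cases (φ(x) = x² gives (x − y)², and φ(x) = x log x gives the generalized I-divergence x log(x/y) − x + y), and scores the simplest co-clustering approximation, in which every entry is replaced by the mean of its co-cluster. Using the block mean as the approximation is our simplification, corresponding as we understand it to the scheme that preserves only co-cluster means; the function names are hypothetical.

```python
import math
from typing import Callable

Divergence = Callable[[float, float], float]


def bregman(phi: Callable[[float], float], grad: Callable[[float], float]) -> Divergence:
    """Return d_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    def divergence(x: float, y: float) -> float:
        return phi(x) - phi(y) - grad(y) * (x - y)
    return divergence


# phi(x) = x^2 gives the squared Euclidean distance (x - y)^2.
squared_euclidean = bregman(lambda x: x * x, lambda y: 2 * y)
# phi(x) = x log x gives x log(x/y) - x + y; here x and y must be > 0.
i_divergence = bregman(lambda x: x * math.log(x), lambda y: math.log(y) + 1)


def block_mean_approx(data: list[list[float]], row_labels: list[int],
                      col_labels: list[int]) -> list[list[float]]:
    """Replace every entry by the mean of its (row cluster, column cluster) block."""
    sums: dict[tuple[int, int], float] = {}
    counts: dict[tuple[int, int], int] = {}
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return [[sums[(row_labels[i], col_labels[j])] / counts[(row_labels[i], col_labels[j])]
             for j in range(len(data[0]))] for i in range(len(data))]


def coclustering_loss(data: list[list[float]], approx: list[list[float]],
                      divergence: Divergence) -> float:
    """Total Bregman loss between the data matrix and its approximation."""
    return sum(divergence(x, y) for row, arow in zip(data, approx) for x, y in zip(row, arow))
```

Searching over row and column labels to reduce this loss, alternating between the two in the style of k-means, is the meta-algorithm's job and is not shown.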

Proceedings Article

How flickr helps us make sense of the world: context and content in community-contributed media collections

Lyndon Kennedy¹, Mor Naaman¹, Shane Ahern¹, Rahul Nair¹, Tye Rattenbury¹•Institutions (1)

Yahoo!¹

29 Sep 2007

TL;DR: A location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset is demonstrated, suggesting that community-contributed media and annotation can enhance and improve access to multimedia resources - and the understanding of the world.

Abstract: The advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web. These collections have a previously unimagined depth and breadth, and have generated new opportunities - and new challenges - for multimedia research. How do we analyze, understand and extract patterns from these new collections? How can we use these unstructured, unrestricted community contributions of media (and annotation) to generate "knowledge"? As a test case, we study Flickr, a popular photo-sharing website. Flickr supports photo, time and location metadata, as well as a lightweight annotation model. We extract information from this dataset using two different approaches. First, we employ a location-driven approach to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. Second, we use a tag-driven approach to automatically extract place and event semantics for Flickr tags, based on each tag's metadata patterns. With the patterns we extract from tags and metadata, vision algorithms can be employed with greater precision. In particular, we demonstrate a location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset. The results suggest that community-contributed media and annotation can enhance and improve our access to multimedia resources - and our understanding of the world.

417 citations
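
The Flickr abstract describes generating "representative tags" for arbitrary areas but does not state the scoring rule. As an illustration only, the Python sketch assumes photos have already been assigned to areas and ranks an area's tags with a TF-IDF-style score: the number of photos in the area carrying the tag, times the log of how rare the tag is across all photos. This scoring is our assumption, not necessarily the method used in the paper, and the function name and data are hypothetical.

```python
import math
from collections import Counter


def representative_tags(photos: list[tuple[str, set[str]]], area: str,
                        top_k: int = 3) -> list[str]:
    """Rank tags for one area with an assumed TF-IDF-style score.

    photos: (area_id, tags) pairs, with area assignment done beforehand.
    score(tag) = photos in `area` with tag * log(all photos / photos with tag).
    """
    total = len(photos)
    df: Counter = Counter()  # photos carrying each tag, anywhere
    tf: Counter = Counter()  # photos carrying each tag inside the target area
    for photo_area, tags in photos:
        df.update(tags)
        if photo_area == area:
            tf.update(tags)
    scores = {tag: count * math.log(total / df[tag]) for tag, count in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return ranked[:top_k]
```

A tag that appears everywhere (for example a weather word) gets a small log factor and sinks, while a tag concentrated in one area, such as a landmark name, rises. One possible refinement is to count distinct users rather than photos, so a single prolific uploader cannot dominate an area's tags.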

Authors

Showing 15 of 26,766 authors, sorted by h-index

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598

Network Information

Related Institutions (5)

University of Toronto

294.9K papers, 13.5M citations

85% related

University of California, San Diego

204.5K papers, 12.3M citations

85% related

University College London

210.6K papers, 9.8M citations

84% related

Cornell University

235.5K papers, 12.2M citations

84% related

University of Washington

305.5K papers, 17.7M citations

84% related

Performance

Metrics

Papers: 29,952

Citations: 823,278

Number of papers from the institution in previous years
Year	Papers
2023	2
2022	47
2021	1,088
2020	1,074
2019	1,568
2018	1,352