Institution

Yahoo!

Company•London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. It is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers

Proceedings Article

Link spam detection based on mass estimation

Zoltan Gyongyi¹, Pavel Berkhin², Hector Garcia-Molina¹, Jan Pedersen²•Institutions (2)

Stanford University¹, Yahoo!²

01 Sep 2006

TL;DR: The concept of spam mass, a measure of the impact of link spamming on a page's ranking, is introduced, along with a discussion of how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming.

Abstract: Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page's ranking. We discuss how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph we use spam mass estimates to successfully identify tens of thousands of instances of heavyweight link spamming.

163 citations

Proceedings Article

A sequential dual method for large scale multi-class linear svms

S. Sathiya Keerthi¹, S. Sundararajan¹, Kai-Wei Chang², Cho-Jui Hsieh², Chih-Jen Lin²•Institutions (2)

Yahoo!¹, National Taiwan University²

24 Aug 2008

TL;DR: A fast dual method for training direct multi-class linear SVMs is presented. It sequentially traverses the training set and optimizes the dual variables associated with one example at a time, and experiments indicate that it is much faster than state-of-the-art solvers such as bundle, cutting plane and exponentiated gradient methods.

Abstract: Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number of examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse through the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state-of-the-art solvers such as bundle, cutting plane and exponentiated gradient methods.

163 citations

Posted Content

Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

Joelle Pineau¹, Joelle Pineau², Philippe Vincent-Lamarre³, Koustuv Sinha², Koustuv Sinha¹, Vincent Larivière³, Alina Beygelzimer⁴, Florence d'Alché-Buc, Emily B. Fox⁵, Emily B. Fox⁶, Hugo Larochelle•Institutions (6)

McGill University¹, Facebook², École Normale Supérieure³, Yahoo!⁴, University of Washington⁵, Apple Inc.⁶

27 Mar 2020•arXiv: Learning

TL;DR: The NeurIPS 2019 reproducibility program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. The paper describes each component, how it was deployed, and what was learned from the initiative.

Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative.

163 citations

Book

Measuring User Engagement

Mounia Lalmas¹, Heather L. O'Brien², Elad Yom-Tov³•Institutions (3)

Yahoo!¹, University of British Columbia², Microsoft³

01 Dec 2014

TL;DR: This book advocates for the development of "good" measures and good measurement practices that will advance the study of user engagement and improve the understanding of this construct, which has become vital in a wired world.

Abstract: User engagement refers to the quality of the user experience that emphasizes the positive aspects of interacting with an online application and, in particular, the desire to use that application longer and repeatedly. User engagement is a key concept in the design of online applications (whether for desktop, tablet or mobile), motivated by the observation that successful applications are not just used, but are engaged with. Users invest time, attention, and emotion in their use of technology, and seek to satisfy pragmatic and hedonic needs. Measurement is critical for evaluating whether online applications are able to successfully engage users, and may inform the design of and use of applications. User engagement is a multifaceted, complex phenomenon; this gives rise to a number of potential measurement approaches. Common ways to evaluate user engagement include using self-report measures, e.g., questionnaires; observational methods, e.g., facial expression analysis, speech analysis; neuro-physiological signal processing methods, e.g., respiratory and cardiovascular accelerations and decelerations, muscle spasms; and web analytics, e.g., number of site visits, click depth. These methods represent various trade-offs in terms of the setting (laboratory versus "in the wild"), object of measurement (user behaviour, affect or cognition) and scale of data collected. For instance, small-scale user studies are deep and rich, but limited in terms of generalizability, whereas large-scale web analytic studies are powerful but negate users' motivation and context. The focus of this book is how user engagement is currently being measured and various considerations for its measurement. Our goal is to leave readers with an appreciation of the various ways in which to measure user engagement, and their associated strengths and weaknesses. We emphasize the multifaceted nature of user engagement and the unique contextual constraints that come to bear upon attempts to measure engagement in different settings, and across different user groups and web domains. At the same time, this book advocates for the development of "good" measures and good measurement practices that will advance the study of user engagement and improve our understanding of this construct, which has become so vital in our wired world. Table of Contents: Preface / Acknowledgments / Introduction and Scope / Approaches Based on Self-Report Methods / Approaches Based on Physiological Measurements / Approaches Based on Web Analytics / Beyond Desktop, Single Site, and Single Task / Enhancing the Rigor of User Engagement Methods and Measures / Conclusions and Future Research Directions / Bibliography / Authors' Biographies / Index

163 citations

Proceedings Article

The role of information diffusion in the evolution of social networks

Lilian Weng¹, Jacob Ratkiewicz², Nicola Perra³, Bruno Gonçalves⁴, Carlos Castillo⁵, Francesco Bonchi⁶, Rossano Schifanella⁷, Filippo Menczer¹, Alessandro Flammini¹•Institutions (7)

Indiana University¹, Google², Northeastern University³, Aix-Marseille University⁴, Qatar Computing Research Institute⁵, Yahoo!⁶, University of Turin⁷

11 Aug 2013

TL;DR: In this paper, the authors present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles, and characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach.

Abstract: Every day millions of users are connected through online social networks, generating a rich trove of data that allows us to study the mechanisms behind human interactions. Triadic closure has been treated as the major mechanism for creating social links: if Alice follows Bob and Bob follows Charlie, Alice will follow Charlie. Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. While the network structure affects the spread of information among users, the network is in turn shaped by this communication activity. This suggests a link creation mechanism whereby Alice is more likely to follow Charlie after seeing many messages by Charlie. We characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach. Triadic closure does have a strong effect on link formation, but shortcuts based on traffic are another key factor in interpreting network evolution. However, individual strategies for following other users are highly heterogeneous. Link creation behaviors can be summarized by classifying users in different categories with distinct structural and behavioral characteristics. Users who are popular, active, and influential tend to create traffic-based shortcuts, making the information diffusion process more efficient in the network.

162 citations

Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598