Institution

Yahoo!

Company, London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.


Papers
Proceedings Article
01 Sep 2006
TL;DR: The concept of spam mass, a measure of the impact of link spamming on a page's ranking, is introduced, along with a discussion of how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming.
Abstract: Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page's ranking. We discuss how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph we use spam mass estimates to successfully identify tens of thousands of instances of heavyweight link spamming.

163 citations

Proceedings Article
24 Aug 2008
TL;DR: A fast dual method for training direct multi-class linear Support Vector Machines is presented. The main idea is to sequentially traverse the training set and optimize the dual variables associated with one example at a time; experiments indicate that the method is much faster than state-of-the-art solvers such as bundle, cutting plane and exponentiated gradient methods.
Abstract: Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number of examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse through the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state-of-the-art solvers such as bundle, cutting plane and exponentiated gradient methods.

163 citations

Posted Content
TL;DR: This paper describes the NeurIPS 2019 reproducibility program, which contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. It covers how each component was deployed and what was learned from the initiative.
Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative.

163 citations

Book
01 Dec 2014
TL;DR: This book advocates for the development of "good" measures and good measurement practices that will advance the study of user engagement and improve the understanding of this construct, which has become so vital in the authors' wired world.
Abstract: User engagement refers to the quality of the user experience that emphasizes the positive aspects of interacting with an online application and, in particular, the desire to use that application longer and repeatedly. User engagement is a key concept in the design of online applications (whether for desktop, tablet or mobile), motivated by the observation that successful applications are not just used, but are engaged with. Users invest time, attention, and emotion in their use of technology, and seek to satisfy pragmatic and hedonic needs. Measurement is critical for evaluating whether online applications are able to successfully engage users, and may inform the design of and use of applications. User engagement is a multifaceted, complex phenomenon; this gives rise to a number of potential measurement approaches. Common ways to evaluate user engagement include using self-report measures, e.g., questionnaires; observational methods, e.g., facial expression analysis, speech analysis; neuro-physiological signal processing methods, e.g., respiratory and cardiovascular accelerations and decelerations, muscle spasms; and web analytics, e.g., number of site visits, click depth. These methods represent various trade-offs in terms of the setting (laboratory versus "in the wild"), object of measurement (user behaviour, affect or cognition) and scale of data collected. For instance, small-scale user studies are deep and rich, but limited in terms of generalizability, whereas large-scale web analytic studies are powerful but negate users' motivation and context. The focus of this book is how user engagement is currently being measured and various considerations for its measurement. Our goal is to leave readers with an appreciation of the various ways in which to measure user engagement, and their associated strengths and weaknesses.
We emphasize the multifaceted nature of user engagement and the unique contextual constraints that come to bear upon attempts to measure engagement in different settings, and across different user groups and web domains. At the same time, this book advocates for the development of "good" measures and good measurement practices that will advance the study of user engagement and improve our understanding of this construct, which has become so vital in our wired world. Table of Contents: Preface / Acknowledgments / Introduction and Scope / Approaches Based on Self-Report Methods / Approaches Based on Physiological Measurements / Approaches Based on Web Analytics / Beyond Desktop, Single Site, and Single Task / Enhancing the Rigor of User Engagement Methods and Measures / Conclusions and Future Research Directions / Bibliography / Authors' Biographies / Index

163 citations

Proceedings Article
11 Aug 2013
TL;DR: In this paper, the authors present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles, and characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach.
Abstract: Every day millions of users are connected through online social networks, generating a rich trove of data that allows us to study the mechanisms behind human interactions. Triadic closure has been treated as the major mechanism for creating social links: if Alice follows Bob and Bob follows Charlie, Alice will follow Charlie. Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. While the network structure affects the spread of information among users, the network is in turn shaped by this communication activity. This suggests a link creation mechanism whereby Alice is more likely to follow Charlie after seeing many messages by Charlie. We characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach. Triadic closure does have a strong effect on link formation, but shortcuts based on traffic are another key factor in interpreting network evolution. However, individual strategies for following other users are highly heterogeneous. Link creation behaviors can be summarized by classifying users in different categories with distinct structural and behavioral characteristics. Users who are popular, active, and influential tend to create traffic-based shortcuts, making the information diffusion process more efficient in the network.

162 citations


Authors

Showing all 26766 results

Name                    H-index  Papers  Citations
Ashok Kumar             151      5654    164086
Alexander J. Smola      122      434     110222
Howard I. Maibach       116      1821    60765
Sanjay Jain             103      881     46880
Amirhossein Sahebkar    100      1307    46132
Marc Davis              99       412     50243
Wenjun Zhang            96       976     38530
Jian Xu                 94       1366    52057
Fortunato Ciardiello    94       695     47352
Tong Zhang              93       414     36519
Michael E. J. Lean      92       411     30939
Ashish K. Jha           87       503     30020
Xin Zhang               87       1714    40102
Theunis Piersma         86       632     34201
George Varghese         84       253     28598

Network Information
Related Institutions (5)
University of Toronto
294.9K papers, 13.5M citations

85% related

University of California, San Diego
204.5K papers, 12.3M citations

85% related

University College London
210.6K papers, 9.8M citations

84% related

Cornell University
235.5K papers, 12.2M citations

84% related

University of Washington
305.5K papers, 17.7M citations

84% related

Performance
Metrics
No. of papers from the Institution in previous years
Year    Papers
2023    2
2022    47
2021    1,088
2020    1,074
2019    1,568
2018    1,352