Yahoo!
Company • London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26,749 authors who have published 29,915 publications receiving 732,583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.
Papers
07 Jun 2015
TL;DR: This work proposes an effective deep learning framework to generate binary hash codes for fast image retrieval by employing a hidden layer for representing the latent concepts that dominate the class labels in convolutional neural networks.
Abstract: Approximate nearest neighbor search is an efficient strategy for large-scale image retrieval. Encouraged by the recent advances in convolutional neural networks (CNNs), we propose an effective deep learning framework to generate binary hash codes for fast image retrieval. Our idea is that when the data labels are available, binary codes can be learned by employing a hidden layer for representing the latent concepts that dominate the class labels. The utilization of the CNN also allows for learning image representations. Unlike other supervised methods that require pair-wised inputs for binary code learning, our method learns hash codes and image representations in a point-wised manner, making it suitable for large-scale datasets. Experimental results show that our method outperforms several state-of-the-art hashing algorithms on the CIFAR-10 and MNIST datasets. We further demonstrate its scalability and efficacy on a large-scale dataset of 1 million clothing images.
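The retrieval step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the hidden layer produces activations in [0, 1] (for example from a sigmoid) and that codes are obtained by thresholding at 0.5, neither of which is stated in the abstract. The function names and toy activation values are hypothetical, and no CNN is trained here.

```python
# Hypothetical sketch: turn hidden-layer activations into binary codes
# and rank database items by Hamming distance to a query code.

def binarize(activations, threshold=0.5):
    """Map each activation to 1 if it is at least `threshold`, else 0.
    The 0.5 threshold is an assumption, not taken from the abstract."""
    return [1 if a >= threshold else 0 for a in activations]

def hamming(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))

def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered by increasing Hamming distance.
    Python's sort is stable, so ties keep their original order."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )

# Toy activations standing in for the output of a trained hidden layer.
database = [
    binarize([0.9, 0.1, 0.8, 0.2]),
    binarize([0.2, 0.7, 0.1, 0.9]),
    binarize([0.8, 0.3, 0.6, 0.4]),
]
query = binarize([0.95, 0.05, 0.7, 0.6])
ranking = rank_by_hamming(query, database)
```

Comparing short binary codes with Hamming distance is what makes the search fast: each comparison is a handful of bit operations rather than a floating-point distance over a long feature vector.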
605 citations
TL;DR: Thanks to the spectral accuracy in both space and time of the proposed method, the storage requirement due to the “global time dependence” can be considerably relaxed, and therefore calculation of the long-time solution becomes possible.
Abstract: In this paper, we consider the numerical solution of the time fractional diffusion equation. Essentially, the time fractional diffusion equation differs from the standard diffusion equation in the time derivative term. In the former case, the first-order time derivative is replaced by a fractional derivative, making the problem global in time. We propose a spectral method in both temporal and spatial discretizations for this equation. The convergence of the method is proven by providing a priori error estimate. Numerical tests are carried out to confirm the theoretical results. Thanks to the spectral accuracy in both space and time of the proposed method, the storage requirement due to the “global time dependence” can be considerably relaxed, and therefore calculation of the long-time solution becomes possible.
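To make the "global in time" remark concrete, one common way to write such a problem is shown below. This is an assumed formulation for illustration: the abstract does not state which fractional derivative definition is used, and the Caputo form, the source term $f$, and the one-dimensional spatial operator are assumptions.

```latex
\partial_t^{\alpha} u(x,t) = \partial_x^{2} u(x,t) + f(x,t),
\qquad
\partial_t^{\alpha} u(x,t)
  = \frac{1}{\Gamma(1-\alpha)} \int_0^{t}
    \frac{\partial_s u(x,s)}{(t-s)^{\alpha}} \, ds,
\quad 0 < \alpha < 1 .
```

Because the integral runs over the whole interval $[0, t]$, the derivative at time $t$ depends on the entire history of $u$, which is why a time-stepping scheme must keep all earlier solutions in storage.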
599 citations
TL;DR: This work investigates two representative ways of approximating the dense similarity matrix and picks the strategy of sparsifying the matrix via retaining nearest neighbors and investigates its parallelization, which can effectively handle large problems.
Abstract: Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nystrom method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863, we show that our parallel algorithm can effectively handle large problems.
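The sparsification strategy mentioned in the abstract above, retaining only nearest neighbors in the similarity matrix, can be sketched on a toy data set. This is an illustrative sketch under stated assumptions: the Gaussian similarity, the parameter names `sigma` and `t`, and the rule of keeping an entry when either point lists the other as a neighbor are all assumptions, and the code builds the dense matrix first, which a large-scale or distributed implementation would avoid.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Dense similarity matrix with exp(-||a - b||^2 / (2 sigma^2)) off the diagonal.
    The Gaussian kernel is an assumption made for this example."""
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim

def sparsify_knn(sim, t):
    """Keep entry (i, j) only if j is among the t most similar points to i,
    or i is among the t most similar points to j; zero out everything else."""
    n = len(sim)
    keep = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: -sim[i][j])
        keep.append(set(others[:t]))
    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j in keep[i] or i in keep[j]:
                sparse[i][j] = sim[i][j]
    return sparse

# Two well-separated pairs of points (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
sparse = sparsify_knn(gaussian_similarity(points), t=1)
```

With `t=1`, only the within-pair similarities survive, so the sparse matrix stores a few nonzero entries per row instead of all `n - 1`.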
591 citations
05 Jul 2011
TL;DR: This paper automatically infers the values of user attributes such as political orientation or ethnicity by leveraging observable information such as the user behavior, network structure and the linguistic content of the user’s Twitter feed through a machine learning approach.
Abstract: This paper addresses the task of user classification in social media, with an application to Twitter. We automatically infer the values of user attributes such as political orientation or ethnicity by leveraging observable information such as the user behavior, network structure and the linguistic content of the user’s Twitter feed. We employ a machine learning approach which relies on a comprehensive set of features derived from such user information. We report encouraging experimental results on 3 tasks with different characteristics: political affiliation detection, ethnicity identification and detecting affinity for a particular business. Finally, our analysis shows that rich linguistic features prove consistently valuable across the 3 tasks and show great promise for additional user classification needs.
584 citations
TL;DR: This work counted daily unique queries originating in the United States that contained influenza-related search terms from the Yahoo! search engine from March 2004 through May 2008, and estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the US.
Abstract: The Internet is an important source of health information. Thus, the frequency of Internet searches may provide information regarding infectious disease activity. As an example, we examined the relationship between searches for influenza and actual influenza occurrence. Using search queries from the Yahoo! search engine ( http://search.yahoo.com ) from March 2004 through May 2008, we counted daily unique queries originating in the United States that contained influenza-related search terms. Counts were divided by the total number of searches, and the resulting daily fraction of searches was averaged over the week. We estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the United States. With use of the frequency of searches, our models predicted an increase in cultures positive for influenza 1-3 weeks in advance of when they occurred (P < .001), and similar models predicted an increase in mortality attributable to pneumonia and influenza up to 5 weeks in advance (P < .001). Search-term surveillance may provide an additional tool for disease surveillance.
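The processing steps in the abstract above (dividing daily counts by total searches, averaging over the week, and fitting a linear model with a lead time) can be sketched as follows. This is a simplified illustration with a single explanatory variable, whereas the study used searches at several lead times; the function names and all numbers below are made up and are not the study's data or results.

```python
def weekly_fraction(daily_flu_queries, daily_total_queries):
    """Divide each daily count by that day's total searches,
    then average the fractions over consecutive 7-day blocks."""
    fractions = [f / t for f, t in zip(daily_flu_queries, daily_total_queries)]
    return [
        sum(fractions[i:i + 7]) / 7
        for i in range(0, len(fractions) - 6, 7)
    ]

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_with_lead(search, outcome, lead_weeks):
    """Regress the outcome in week w on the search fraction in week w - lead_weeks.
    Assumes both series have the same length and are aligned by week."""
    x = search[:len(search) - lead_weeks]
    y = outcome[lead_weeks:]
    return fit_line(x, y)

# Synthetic weekly series: the outcome follows the search fraction by 2 weeks.
search = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
outcome = [0.0, 0.0, 2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_with_lead(search, outcome, lead_weeks=2)
```

Shifting the two series against each other is what lets the model use this week's searches to predict an outcome observed several weeks later.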
584 citations
Authors
Name | H-index | Papers | Citations |
---|---|---|---|
Ashok Kumar | 151 | 5654 | 164086 |
Alexander J. Smola | 122 | 434 | 110222 |
Howard I. Maibach | 116 | 1821 | 60765 |
Sanjay Jain | 103 | 881 | 46880 |
Amirhossein Sahebkar | 100 | 1307 | 46132 |
Marc Davis | 99 | 412 | 50243 |
Wenjun Zhang | 96 | 976 | 38530 |
Jian Xu | 94 | 1366 | 52057 |
Fortunato Ciardiello | 94 | 695 | 47352 |
Tong Zhang | 93 | 414 | 36519 |
Michael E. J. Lean | 92 | 411 | 30939 |
Ashish K. Jha | 87 | 503 | 30020 |
Xin Zhang | 87 | 1714 | 40102 |
Theunis Piersma | 86 | 632 | 34201 |
George Varghese | 84 | 253 | 28598 |