Institution

Yahoo!

Company · London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in topics including Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. It is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.


Papers
Proceedings Article
02 Nov 2009
TL;DR: This paper presents an add-on to traditional information retrieval applications that exploits temporal information associated with documents to present and cluster them along timelines, and shows how temporal expressions are made explicit and used to construct multiple-granularity timelines.
Abstract: Time is an important dimension of any information space and can be very useful in information retrieval, in particular for clustering and exploring search results. Search result clustering is a feature integrated in some of today's search engines, allowing users to further explore search results. However, little work has been done on exploiting temporal information embedded in documents for the presentation, clustering, and exploration of search results along well-defined timelines. In this paper, we present an add-on to traditional information retrieval applications in which we exploit temporal information associated with documents to present and cluster documents along timelines. Temporal information, expressed in the form of, e.g., date and time tokens or temporal references, appears in documents as part of the textual context or metadata. Using temporal entity extraction techniques, we show how temporal expressions are made explicit and used in the construction of multiple-granularity timelines. We discuss how hit-list based search results can be clustered according to temporal aspects, anchored in the constructed timelines, and how time-based document clusters can be used to explore search results that include temporal snippets. We also outline a prototypical implementation and evaluation that demonstrate the feasibility and functionality of our framework.
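The idea of anchoring documents on timelines of several granularities can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes temporal expressions are already normalized to ISO dates (YYYY-MM-DD) in the text, and the names `extract_dates` and `cluster_by_time` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date

# Assumption: only explicit ISO dates are recognized; a real temporal tagger
# would also resolve relative and implicit expressions.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_dates(text):
    """Return the valid ISO dates mentioned in text, skipping impossible ones."""
    found = []
    for y, m, d in ISO_DATE.findall(text):
        try:
            found.append(date(int(y), int(m), int(d)))
        except ValueError:
            pass  # e.g. month 13 or day 40
    return found

def cluster_by_time(docs):
    """Map granularity -> time bucket -> sorted list of document ids."""
    timelines = {
        "year": defaultdict(set),
        "month": defaultdict(set),
        "day": defaultdict(set),
    }
    for doc_id, text in docs.items():
        for dt in extract_dates(text):
            timelines["year"][f"{dt.year:04d}"].add(doc_id)
            timelines["month"][f"{dt.year:04d}-{dt.month:02d}"].add(doc_id)
            timelines["day"][dt.isoformat()].add(doc_id)
    # Sort buckets chronologically and document ids alphabetically.
    return {
        g: {k: sorted(v) for k, v in sorted(buckets.items())}
        for g, buckets in timelines.items()
    }
```

A document that mentions several dates appears in several buckets, so the same hit can be explored at the year, month, or day level.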

150 citations

Proceedings Article
28 Oct 2007
TL;DR: By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques; because no eigenvector computation is involved, it greatly reduces computational cost.
Abstract: Linear discriminant analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. LDA can be performed either in the original input space or in the reproducing kernel Hilbert space (RKHS) into which data points are mapped, which leads to Kernel Discriminant Analysis (KDA). When the data are highly nonlinearly distributed, KDA can achieve better performance than LDA. However, computing the projective functions in KDA involves eigen-decomposition of the kernel matrix, which is very expensive when a large number of training samples exist. In this paper, we present a new algorithm for kernel discriminant analysis, called spectral regression kernel discriminant analysis (SRKDA). By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework which facilitates both efficient computation and the use of regularization techniques. Specifically, SRKDA only needs to solve a set of regularized regression problems and there is no eigenvector computation involved, which is a huge saving in computational cost. Our computational analysis shows that SRKDA is 27 times faster than the ordinary KDA. Moreover, the new formulation makes it very easy to develop an incremental version of the algorithm which can fully utilize the computational results of the existing training samples. Experiments on face recognition demonstrate the effectiveness and efficiency of the proposed algorithm.
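The regression view described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the responses are built by orthogonalizing class indicator vectors against the constant vector, the kernel is an RBF kernel, and the names `rbf_kernel`, `make_responses`, `srkda_fit`, `srkda_transform` and the parameters `delta` and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def make_responses(y):
    """Build c-1 regression targets from class labels.

    Assumption: orthogonalize [ones, class indicators] with a QR factorization
    and drop the constant direction, so every target sums to zero.
    """
    classes = np.unique(y)
    n = y.shape[0]
    # The last indicator is omitted because the indicators sum to the ones vector.
    cols = [np.ones(n)] + [(y == c).astype(float) for c in classes[:-1]]
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q[:, 1:]

def srkda_fit(X, y, delta=0.01, gamma=1.0):
    """Solve (K + delta * I) alpha = Y; no eigen-decomposition is computed."""
    K = rbf_kernel(X, X, gamma)
    Y = make_responses(y)
    # K is positive semidefinite, so K + delta * I admits a Cholesky factor.
    L = np.linalg.cholesky(K + delta * np.eye(X.shape[0]))
    # Two linear solves with the factor (a general solver is used here,
    # which does not exploit the triangular structure).
    return np.linalg.solve(L.T, np.linalg.solve(L, Y))

def srkda_transform(X_train, alpha, X_new, gamma=1.0):
    """Project new points onto the c-1 discriminant directions."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The regularizer `delta` keeps the linear system well conditioned; the cost is dominated by one Cholesky factorization rather than an eigen-decomposition of the kernel matrix.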

150 citations

Patent
11 Feb 2009
TL;DR: An interest type for multimedia content is obtained from a consuming user, and a subset of the multimedia content conforming to the interest type is presented in substantial real-time at the receiving devices of the consuming users.
Abstract: Methods and systems for processing multimedia content captured from a plurality of locations via one or more capturing devices include obtaining multimedia content from one or more capturing devices. The capturing devices identify a type of content being captured and/or location of capture. An interest type for multimedia content is obtained from a consuming user. The multimedia content from the capturing devices is searched based on the interest type of the consuming user. A subset of the multimedia content conforming to the interest type is presented in substantial real-time at the receiving devices of the consuming users. Feedback regarding the presented multimedia content is obtained from consuming users and communicated to the capturing devices in substantial real-time so as to influence future capture of multimedia content. The methods also include receiving a request for recording a live event, wherein the request provides one or more recording preferences including one or more requester preferences for recording the live event. The request is dynamically matched to one or more generating users who have expressed intentions for recording the live event. The generating users are associated with one or more capturing devices that are configured to record the live event based on the recording preferences of the request. The request is then forwarded to the matched one or more generating users for recording the live event. User interactions with the recordings are monitored and fed back to the generating users for further refining the recordings.

150 citations

Journal Article
TL;DR: To map community interventions in LMIC, identify competencies for community-based providers, and highlight research gaps, a review-of-reviews strategy was used and 23 reviews were identified for the narrative synthesis.
Abstract: Community-based mental health services are emphasized in the World Health Organization’s Mental Health Action Plan, the World Bank’s Disease Control Priorities, and the Action Plan of the World Psychiatric Association. There is increasing evidence for effectiveness of mental health interventions delivered by non-specialists in community platforms in low- and middle-income countries (LMIC). However, the role of community components has yet to be summarized. Our objective was to map community interventions in LMIC, identify competencies for community-based providers, and highlight research gaps. Using a review-of-reviews strategy, we identified 23 reviews for the narrative synthesis. Motivations to employ community components included greater accessibility and acceptability compared to healthcare facilities, greater clinical effectiveness through ongoing contact and use of trusted local providers, family involvement, and economic benefits. Locations included homes, schools, and refugee camps, as well as technology-aided delivery. Activities included awareness raising, psychoeducation, skills training, rehabilitation, and psychological treatments. There was substantial variation in the degree to which community components were integrated with primary care services. Addressing gaps in current practice will require assuring collaboration with service users, utilizing implementation science methods, creating tools to facilitate community services and evaluate competencies of providers, and developing standardized reporting for community-based programs.

150 citations

Proceedings Article
16 Oct 2009
TL;DR: Two optimization schemes, prefetching and pre-shuffling, are proposed, which improve the overall performance under the shared environment while retaining compatibility with the native Hadoop.
Abstract: MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Hadoop is used by many companies including Yahoo!, Amazon, and Facebook to perform various data mining on large-scale data sets such as user search logs and visit logs. In these cases, it is very common to share the same computing resources by multiple users due to practical considerations about cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, failing to guarantee high performance in the shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve the overall performance under the shared environment while retaining compatibility with the native Hadoop. The proposed schemes are implemented in the native Hadoop-0.18.3 as a plug-in component called HPMR (High Performance MapReduce Engine). Our evaluation on the Yahoo!Grid platform with three different workloads and seven types of test sets from Yahoo! shows that HPMR reduces the execution time by up to 73%.

150 citations


Authors

Showing all 26766 results

Name | H-index | Papers | Citations
Ashok Kumar | 151 | 5654 | 164086
Alexander J. Smola | 122 | 434 | 110222
Howard I. Maibach | 116 | 1821 | 60765
Sanjay Jain | 103 | 881 | 46880
Amirhossein Sahebkar | 100 | 1307 | 46132
Marc Davis | 99 | 412 | 50243
Wenjun Zhang | 96 | 976 | 38530
Jian Xu | 94 | 1366 | 52057
Fortunato Ciardiello | 94 | 695 | 47352
Tong Zhang | 93 | 414 | 36519
Michael E. J. Lean | 92 | 411 | 30939
Ashish K. Jha | 87 | 503 | 30020
Xin Zhang | 87 | 1714 | 40102
Theunis Piersma | 86 | 632 | 34201
George Varghese | 84 | 253 | 28598
Network Information
Related Institutions (5)
University of Toronto
294.9K papers, 13.5M citations

85% related

University of California, San Diego
204.5K papers, 12.3M citations

85% related

University College London
210.6K papers, 9.8M citations

84% related

Cornell University
235.5K papers, 12.2M citations

84% related

University of Washington
305.5K papers, 17.7M citations

84% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 2
2022 | 47
2021 | 1,088
2020 | 1,074
2019 | 1,568
2018 | 1,352