
Showing papers by "Yahoo!" published in 2006


Journal Article
Lou Jost
01 May 2006 · Oikos
TL;DR: The standard similarity measure based on untransformed indices is shown to give misleading results, but transforming the indices or entropies to effective numbers of species produces a stable, easily interpreted, sensitive general similarity measure.
Abstract: Entropies such as the Shannon–Wiener and Gini–Simpson indices are not themselves diversities. Conversion of these to effective number of species is the key to a unified and intuitive interpretation of diversity. Effective numbers of species derived from standard diversity indices share a common set of intuitive mathematical properties and behave as one would expect of a diversity, while raw indices do not. Contrary to Keylock, the lack of concavity of effective numbers of species is irrelevant as long as they are used as transformations of concave alpha, beta, and gamma entropies. The practical importance of this transformation is demonstrated by applying it to a popular community similarity measure based on raw diversity indices or entropies. The standard similarity measure based on untransformed indices is shown to give misleading results, but transforming the indices or entropies to effective numbers of species produces a stable, easily interpreted, sensitive general similarity measure. General overlap measures derived from this transformed similarity measure yield the Jaccard index, Sorensen index, Horn index of overlap, and the Morisita–Horn index as special cases.

3,677 citations
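The conversion from an index to an effective number of species can be checked numerically. The sketch below assumes natural-log Shannon entropy and uses the standard conversions for these two indices: exp(H) for Shannon entropy H, and 1/(1 - GS) for the Gini-Simpson index GS. The function names are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon-Wiener index H = -sum(p_i * ln p_i), using the natural log."""
    return -sum(x * math.log(x) for x in p if x > 0)


def gini_simpson(p: Sequence[float]) -> float:
    """Gini-Simpson index GS = 1 - sum(p_i ** 2)."""
    return 1.0 - sum(x * x for x in p)


def effective_number_shannon(h: float) -> float:
    # Number of equally common species that would produce entropy h.
    return math.exp(h)


def effective_number_gini_simpson(gs: float) -> float:
    # Number of equally common species that would produce index value gs.
    return 1.0 / (1.0 - gs)
```

Pooling two equally common 8-species communities that share no species takes the raw Gini-Simpson index only from 0.875 to 0.9375, whereas the effective number doubles from 8 to 16, which is the intuitive behavior the abstract attributes to effective numbers.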


Book Chapter
Pavel Berkhin
01 Jan 2006
TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.
Abstract: Clustering is the division of data into groups of similar objects. In clustering, some details are disregarded in exchange for data simplification. Clustering can be viewed as a data modeling technique that provides for concise summaries of the data. Clustering is therefore related to many disciplines and plays an important role in a broad range of applications. The applications of clustering usually deal with large datasets and data with many attributes. Exploration of such data is a subject of data mining. This survey concentrates on clustering algorithms from a data mining perspective.

3,047 citations
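The survey covers many algorithm families. As one classic partitioning example, here is a minimal sketch of Lloyd-style k-means, where the final centroids act as the "concise summary" of each group. The initialization is supplied by the caller, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100):
    """Alternate assignment and centroid-update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels
```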


Journal Article
TL;DR: A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one, based on the largest phylogenetic analysis of living Amphibia so far accomplished, and many subsidiary taxa are demonstrated to be nonmonophyletic.
Abstract: The evidentiary basis of the currently accepted classification of living amphibians is discussed and shown not to warrant the degree of authority conferred on it by use and tradition. A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one. This new taxonomy is based on the largest phylogenetic analysis of living Amphibia so far accomplished. We combined the comparative anatomical character evidence of Haas (2003) with DNA sequences from the mitochondrial transcription unit H1 (12S and 16S ribosomal RNA and tRNA-Valine genes, ≈ 2,400 bp of mitochondrial sequences) and the nuclear genes histone H3, rhodopsin, tyrosinase, and seven in absentia, and the large ribosomal subunit 28S (≈ 2,300 bp of nuclear sequences; ca. 1.8 million base pairs; x̄ = 3.7 kb/terminal). The dataset includes 532 terminals sampled from 522 species representative of the global diversity of amphibians as well as seven of the closest living relatives of amphibians for outgroup comparisons. The...

1,994 citations


Proceedings Article
20 Aug 2006
TL;DR: A simple model of network growth is presented, characterizing users as either passive members of the network; inviters who encourage offline friends and acquaintances to migrate online; and linkers who fully participate in the social evolution of the network.
Abstract: In this paper, we consider the evolution of structure within large online social networks. We present a series of measurements of two such networks, together comprising in excess of five million people and ten million friendship links, annotated with metadata capturing the time of every event in the life of the network. Our measurements expose a surprising segmentation of these networks into three regions: singletons who do not participate in the network; isolated communities which overwhelmingly display star structure; and a giant component anchored by a well-connected core region which persists even in the absence of stars. We present a simple model of network growth which captures these aspects of component structure. The model follows our experimental results, characterizing users as either passive members of the network; inviters who encourage offline friends and acquaintances to migrate online; and linkers who fully participate in the social evolution of the network.

1,151 citations
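The three-region segmentation can be computed directly from a friendship graph. The sketch below is an illustrative assumption about how to operationalize it, not the authors' code. It treats the largest connected component as the giant component and every other component with at least one link as the middle region (the isolated communities), and it counts how many of those communities are stars.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    graph: Graph = {n: set() for n in nodes}
    for u, v in edges:  # undirected friendship links
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def connected_components(graph: Graph) -> List[List[Hashable]]:
    seen: Set[Hashable] = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def is_star(component: List[Hashable], graph: Graph) -> bool:
    """True if one node links to all others and every other node has degree 1."""
    n = len(component)
    if n < 2:
        return False
    degrees = sorted(len(graph[u]) for u in component)
    return degrees[-1] == n - 1 and all(d == 1 for d in degrees[:-1])


def segment(graph: Graph) -> dict:
    """Split nodes into singletons, the giant component, and the middle region."""
    components = connected_components(graph)
    singletons = [c[0] for c in components if len(c) == 1]
    rest = [c for c in components if len(c) > 1]
    giant = max(rest, key=len) if rest else []
    middle = [c for c in rest if c is not giant]
    stars = sum(1 for c in middle if is_star(c, graph))
    return {"singletons": singletons, "giant": giant, "middle": middle, "middle_stars": stars}
```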


Proceedings Article
21 Oct 2006
TL;DR: An improved algorithm for computing approximate PageRank vectors finds a low-conductance set near a starting vertex in time proportional to its size; combining such sets yields a cut with conductance at most $\phi$ and approximately optimal balance in time $O(m \log^4 m / \phi^2)$.
Abstract: A local graph partitioning algorithm finds a cut near a specified starting vertex, with a running time that depends largely on the size of the small side of the cut, rather than the size of the input graph. In this paper, we present a local partitioning algorithm using a variation of PageRank with a specified starting distribution. We derive a mixing result for PageRank vectors similar to that for random walks, and show that the ordering of the vertices produced by a PageRank vector reveals a cut with small conductance. In particular, we show that for any set $C$ with conductance $\Phi$ and volume $k$, a PageRank vector with a certain starting distribution can be used to produce a set with conductance $O(\sqrt{\Phi \log k})$. We present an improved algorithm for computing approximate PageRank vectors, which allows us to find such a set in time proportional to its size. In particular, we can find a cut with conductance at most $\phi$, whose small side has volume at least $2^b$, in time $O(2^b \log^2 m / \phi^2)$, where $m$ is the number of edges in the graph. By combining small sets found by this local partitioning algorithm, we obtain a cut with conductance $\phi$ and approximately optimal balance in time $O(m \log^4 m / \phi^2)$.

1,022 citations
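The abstract does not spell out the procedure, so the following is a sketch under stated assumptions. It uses a push-style approximation of a personalized PageRank vector on the lazy random walk (the variant commonly associated with this line of work) and then sweeps over vertices ordered by p(u)/d(u), keeping the prefix with the smallest conductance, where conductance is cut(S) / min(vol(S), vol(V) - vol(S)). The values alpha = 0.15 and eps = 1e-4 are illustrative. The graph is assumed to be simple and undirected, with no isolated vertices.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # adjacency lists of a simple undirected graph


def approximate_pagerank(graph: Graph, seed: int, alpha: float = 0.15,
                         eps: float = 1e-4) -> Dict[int, float]:
    """Push until every vertex u has residual r[u] < eps * deg(u)."""
    p: Dict[int, float] = {}
    r: Dict[int, float] = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(graph[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue  # stale queue entry
        # Move alpha * r_u into p, keep half of the rest at u (lazy step),
        # and spread the other half evenly over u's neighbours.
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = (1 - alpha) * r_u / 2
        share = (1 - alpha) * r_u / (2 * deg_u)
        for w in graph[u]:
            r[w] = r.get(w, 0.0) + share
            if r[w] >= eps * len(graph[w]):
                queue.append(w)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p


def sweep_cut(graph: Graph, p: Dict[int, float]) -> Tuple[Set[int], float]:
    """Scan prefixes ordered by p(u)/deg(u); return the lowest-conductance prefix."""
    order = sorted(p, key=lambda u: p[u] / len(graph[u]), reverse=True)
    total_volume = sum(len(nbrs) for nbrs in graph.values())
    in_set: Set[int] = set()
    volume = cut = 0
    best_set: Set[int] = set()
    best_phi = float("inf")
    for u in order:
        in_set.add(u)
        volume += len(graph[u])
        for w in graph[u]:
            # An edge to a vertex already in the set no longer crosses the cut.
            cut += -1 if w in in_set else 1
        denom = min(volume, total_volume - volume)
        if denom == 0:
            break  # the prefix now covers the whole volume
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(in_set), phi
    return best_set, best_phi
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3, seeding at vertex 0 should recover the seed's triangle, whose conductance is 1/7 (one cut edge, volume 7 on each side).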


Proceedings Article
22 Aug 2006
TL;DR: A model of tagging systems, specifically in the context of web-based systems, is offered to help illustrate the possible benefits of these tools and a simple taxonomy of incentives and contribution models is provided to inform potential evaluative frameworks.
Abstract: In recent years, tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., "tags") to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems. Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of web-based systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photo-sharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems.

993 citations


Journal Article
TL;DR: An appearance-based face recognition method, called orthogonal Laplacianface, based on the locality preserving projection (LPP) algorithm, which aims at finding a linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the face manifold.
Abstract: Following the intuition that the naturally occurring face data may be generated by sampling a probability distribution that has support on or near a submanifold of ambient space, we propose an appearance-based face recognition method, called orthogonal Laplacianface. Our algorithm is based on the locality preserving projection (LPP) algorithm, which aims at finding a linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the face manifold. However, LPP is nonorthogonal, and this makes it difficult to reconstruct the data. The orthogonal locality preserving projection (OLPP) method produces orthogonal basis functions and can have more locality preserving power than LPP. Since the locality preserving power is potentially related to the discriminating power, the OLPP is expected to have more discriminating power than LPP. Experimental results on three face databases demonstrate the effectiveness of our proposed algorithm.

783 citations


Proceedings Article
23 May 2006
TL;DR: A model for selecting between candidates is built, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions, which improves the quality of the candidates generated.
Abstract: We introduce the notion of query substitution, that is, generating a new query to replace a user's original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. This also contrasts with query relaxation through boolean or TFIDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution, and show that our method performs well at generating new queries related to the original queries. We build a model for selecting between candidates, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions. This further improves the quality of the candidates generated. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.

707 citations


Proceedings Article
20 Aug 2006
TL;DR: This work presents a generic framework for clustering data over time, and discusses evolutionary versions of two widely-used clustering algorithms within this framework: k-means and agglomerative hierarchical clustering.
Abstract: We consider the problem of clustering data over time. An evolutionary clustering should simultaneously optimize two potentially conflicting criteria: first, the clustering at any point in time should remain faithful to the current data as much as possible; and second, the clustering should not shift dramatically from one timestep to the next. We present a generic framework for this problem, and discuss evolutionary versions of two widely-used clustering algorithms within this framework: k-means and agglomerative hierarchical clustering. We extensively evaluate these algorithms on real data sets and show that our algorithms can simultaneously attain both high accuracy in capturing today's data, and high fidelity in reflecting yesterday's clustering.

686 citations
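One simple way to trade off the two criteria is to run k-means on the current timestep while pulling each updated centroid toward its counterpart from the previous timestep with a weight `cp` in [0, 1]. This is assumed here for illustration and is not taken from the paper. With `cp = 0`, history is ignored (pure snapshot fidelity); with `cp = 1`, yesterday's centroids are frozen. The sketch uses one-dimensional data and assumes that cluster j at time t corresponds to cluster j at time t - 1.

```python
from typing import List


def evolutionary_kmeans_1d(points: List[float], prev_centroids: List[float],
                           cp: float, max_iter: int = 100) -> List[float]:
    """k-means on the current snapshot, with centroids blended toward the previous ones."""
    centroids = list(prev_centroids)  # warm start from the previous clustering
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
                  for x in points]
        updated = []
        for j, prev in enumerate(prev_centroids):
            members = [x for x, lab in zip(points, labels) if lab == j]
            mean = sum(members) / len(members) if members else centroids[j]
            # Snapshot quality pulls toward `mean`; history cost pulls toward `prev`.
            updated.append((1 - cp) * mean + cp * prev)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```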


Journal Article
TL;DR: A five-year follow-up of cohorts from three regions of China with different levels of iodine intake suggests that more than adequate or excessive iodine intake may lead to hypothyroidism and autoimmune thyroiditis.
Abstract: Background: Iodine is an essential component of thyroid hormones; either low or high intake may lead to thyroid disease. We observed an increase in the prevalence of overt hypothyroidism, subclinical hypothyroidism, and autoimmune thyroiditis with increasing iodine intake in China in cohorts from three regions with different levels of iodine intake: mildly deficient (median urinary iodine excretion, 84 μg per liter), more than adequate (median, 243 μg per liter), and excessive (median, 651 μg per liter). Participants were enrolled in a baseline study in 1999; during the five-year follow-up through 2004, we examined the effect of regional differences in iodine intake on the incidence of thyroid disease. Methods: Of the 3761 unselected subjects who were enrolled at baseline, 3018 (80.2 percent) participated in this follow-up study. Levels of thyroid hormones and thyroid autoantibodies in serum, and iodine in urine, were measured and B-mode ultrasonography of the thyroid was performed at baseline and follow-up. Results: Among subjects with mildly deficient iodine intake, those with more than adequate intake, and those with excessive intake, the cumulative incidence of overt hypothyroidism was 0.2 percent, 0.5 percent, and 0.3 percent, respectively; that of subclinical hypothyroidism, 0.2 percent, 2.6 percent, and 2.9 percent, respectively; and that of autoimmune thyroiditis, 0.2 percent, 1.0 percent, and 1.3 percent, respectively. Among subjects with euthyroidism and antithyroid antibodies at baseline, the five-year incidence of elevated serum thyrotropin levels was greater among those with more than adequate or excessive iodine intake than among those with mildly deficient iodine intake. A baseline serum thyrotropin level of 1.0 to 1.9 mIU per liter was associated with the lowest subsequent incidence of abnormal thyroid function. Conclusions: More than adequate or excessive iodine intake may lead to hypothyroidism and autoimmune thyroiditis.

626 citations


Proceedings Article
25 Jun 2006
TL;DR: A general model, the collective factorization on related matrices, is proposed for multi-type relational data clustering, and a novel algorithm, spectral relational clustering, is derived to cluster multi-type interrelated data objects simultaneously.
Abstract: Clustering on multi-type relational data has attracted more and more attention in recent years due to its high impact on various important applications, such as Web mining, e-commerce and bioinformatics. However, the research on general multi-type relational data clustering is still limited and preliminary. The contribution of the paper is three-fold. First, we propose a general model, the collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Second, under this model, we derive a novel algorithm, the spectral relational clustering, to cluster multi-type interrelated data objects simultaneously. The algorithm iteratively embeds each type of data objects into low dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects. Extensive experiments demonstrate the promise and effectiveness of the proposed algorithm. Third, we show that the existing spectral clustering algorithms can be considered as the special cases of the proposed model and algorithm. This demonstrates the good theoretic generality of the proposed model and algorithm.

Patent
19 Jun 2006
TL;DR: In this paper, a system and method for recommending tags and/or content items in response to requests received from remote computing devices is presented, where the tag density is defined as the number of times a tag has been associated with a content item by any user of a plurality of users who are members of a community.
Abstract: The present invention relates to a system and method for recommending tags and/or content items in response to requests received from remote computing devices. In one aspect, a content item recommendation system comprises a database configured to store an identifier of a first content item, a first tag and information from which a tag density associated with the first tag and with the first content item may be derived. The tag density may be a measure of times a tag has been associated with a content item by any user of a plurality of users who are members of a community. The system also comprises a recommendation engine configured to receive search results containing the first tag from a search engine and to correlate the first tag with information stored in the database. The recommendation engine may be further configured to determine a recommended tag, based on a recommendation threshold and a tag density, the tag density associated with both the recommended tag and the first content item.
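As a hedged illustration of the tag-density idea, the density of an (item, tag) pair can be counted as the number of times community members applied that tag to that item, and recommended tags can then be filtered by a threshold. The triple layout, function names, and alphabetical tie-breaking below are assumptions, not the claimed system.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Assignment = Tuple[str, str, str]  # (user, item, tag)


def tag_density(assignments: Iterable[Assignment], community: Set[str]) -> Counter:
    """Count how many times community members associated each tag with each item."""
    return Counter((item, tag) for user, item, tag in assignments if user in community)


def recommend_tags(assignments: Iterable[Assignment], community: Set[str],
                   item: str, threshold: int) -> List[str]:
    """Tags on `item` whose density meets `threshold`, densest first, ties by name."""
    density = tag_density(assignments, community)
    candidates = [(count, tag) for (it, tag), count in density.items()
                  if it == item and count >= threshold]
    return [tag for count, tag in sorted(candidates, key=lambda c: (-c[0], c[1]))]
```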

Patent
Ramesh Sarukkai
26 Jun 2006
TL;DR: In this article, a trust network is defined for each user, and annotations by any member of the user's trust network are made visible to the user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus.
Abstract: Computer systems and methods incorporate user annotations (metadata) regarding various pages or sites, including annotations by a querying user and by members of a trust network defined for the querying user into search and browsing of a corpus such as the World Wide Web. A trust network is defined for each user, and annotations by any member of the querying user's trust network are made visible to the querying user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus. Users can also limit searches to content annotated by members of their trust networks or by members of a community selected by the user.
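The visibility rule in this abstract can be sketched as a filter over annotations. The abstract leaves the query-similarity test open, so the Jaccard overlap of query terms with a 0.5 cutoff used below is a hypothetical stand-in, as are the record fields and function names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Annotation:
    author: str
    url: str
    note: str
    query: str  # query the author used to reach the annotated document


def similar_queries(a: str, b: str, cutoff: float = 0.5) -> bool:
    """Hypothetical similarity test: Jaccard overlap of lower-cased query terms."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= cutoff


def visible_annotations(annotations: Iterable[Annotation], trust_network: Set[str],
                        query: str) -> List[Annotation]:
    """Annotations by trust-network members whose queries resemble the current query."""
    return [a for a in annotations
            if a.author in trust_network and similar_queries(a.query, query)]
```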

Journal Article
TL;DR: The results indicate the validity of using the HADS and the GDS to screen for depressive symptoms and to diagnose depressive illness in PD.
Abstract: We assessed the concurrent validity of the Hospital Anxiety and Depression Scale (HADS) and the Geriatric Depression Scale (GDS) against the Hamilton Rating Scale for Depression (Ham-D) in patients with Parkinson's disease (PD). Forty-six non-demented PD patients were assessed by a neurologist on the Ham-D. Patients also completed four mood rating scales: the HADS, the GDS, a visual analogue scale (VAS), and the Face Scale. For the HADS and the GDS, Receiver Operating Characteristics (ROC) curves were obtained and the positive and negative predictive values (PPV, NPV) were calculated for different cut-off scores. Maximum discrimination between depressed and non-depressed PD patients was reached at a cut-off score of 10/11 for both the HADS and the GDS. At the same cut-off score of 10/11 for both the HADS and the GDS, the high sensitivity and NPV make these scales appropriate screening instruments for depression in PD. High specificity and PPV, which are necessary for a diagnostic test, were reached at a cut-off score of 12/13 for the GDS and at a cut-off score of 11/12 for the HADS. The results indicate the validity of using the HADS and the GDS to screen for depressive symptoms and to diagnose depressive illness in PD.
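A cut-off written as "10/11" conventionally means that scores of 10 or below count as negative and scores of 11 or above count as positive. The sketch below shows how sensitivity, specificity, PPV, and NPV follow from such a cut-off when compared against a reference classification (the Ham-D in this study). The function names are illustrative, and the code is not the authors' analysis.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined when the denominator is 0


def screening_metrics(scores: Sequence[int], is_case: Sequence[bool],
                      cutoff: int) -> Dict[str, float]:
    """Scores >= cutoff screen positive, e.g. cutoff=11 for a '10/11' split."""
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

Raising the cut-off generally trades sensitivity and NPV for specificity and PPV, which is why the abstract reports a lower cut-off for screening and a higher one for diagnosis.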

Journal Article
TL;DR: A primal method that decouples the idea of basis functions from the concept of support vectors and greedily finds a set of kernel basis functions of a specified maximum size to approximate the SVM primal cost function well.
Abstract: Support vector machines (SVMs), though accurate, are not preferred in applications requiring great classification speed, due to the number of support vectors being large. To overcome this problem we devise a primal method with the following properties: (1) it decouples the idea of basis functions from the concept of support vectors; (2) it greedily finds a set of kernel basis functions of a specified maximum size ($d_{\max}$) to approximate the SVM primal cost function well; (3) it is efficient and roughly scales as $O(n d_{\max}^2)$, where $n$ is the number of training examples; and (4) the number of basis functions it requires to achieve an accuracy close to the SVM accuracy is usually far less than the number of SVM support vectors.

Patent
28 Apr 2006
TL;DR: A mobile device, system, and method are directed towards sharing multimedia information on a mobile device based at least in part on vitality information and other social networking information as discussed by the authors, where multimedia information captured on the mobile device may be manually and/or automatically annotated and shared with members of the social network.
Abstract: A mobile device, system, and method are directed towards sharing multimedia information on a mobile device based at least in part on vitality information and other social networking information. Multimedia information may be received and/or synchronized on the mobile device based on a relationship between vitality information of members of a social network. The relationship may comprise a common membership in a group, a common multimedia usage behavior, a geographical proximity of members of the social network, a degree of separation of members of the social network, a common search behavior, or the like. Multimedia information captured on the mobile device may be manually and/or automatically annotated and shared with members of the social network. The multimedia information may be displayed in an integrated live view in conjunction with other social networking information.

Patent
01 Nov 2006
TL;DR: In this paper, a GPS coordinate and a search criterion are received from a client device associated with a member of a social network, and a route is determined between the start and end location and through the location of interest.
Abstract: A device, system, and method are directed towards providing location information from a social network. A GPS coordinate and a search criterion are received from a client device associated with a member of a social network. The social network is searched for another member associated with a location name based on the GPS coordinate and the search criterion. The location name may be a sponsored advertisement. The location name is provided to the client device. A communication may be enabled between the member and the other member. Moreover, a start and end location may also be received. The GPS coordinate and/or search criterion may be associated with either the start or end location. The searched location name is used to determine a location of interest. A route is determined between the start and end location and through the location of interest. The route is provided to the client device.

Proceedings Article
23 May 2006
TL;DR: This work combines a novel solution to an interval covering problem with extensions to previous work on score aggregation in order to create an efficient backend system capable of producing visualizations at arbitrary scales on this large dataset in real time.
Abstract: We consider the problem of visualizing the evolution of tags within the Flickr (flickr.com) online image sharing community. Any user of the Flickr service may append a tag to any photo in the system. Over the past year, users have on average added over a million tags each week. Understanding the evolution of these tags over time is therefore a challenging task. We present a new approach based on a characterization of the most interesting tags associated with a sliding interval of time. An animation provided via Flash in a web browser allows the user to observe and interact with the interesting tags as they evolve over time. New algorithms and data structures are required to support the efficient generation of this visualization. We combine a novel solution to an interval covering problem with extensions to previous work on score aggregation in order to create an efficient backend system capable of producing visualizations at arbitrary scales on this large dataset in real time.
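The abstract does not define "interesting." A common stand-in, assumed here, scores a tag by its count inside the window weighted by how rarely it appears across all periods (a tf-idf-style score). The sketch below recomputes the score naively for a single window. It does not reproduce the interval-covering and score-aggregation structures the paper relies on for real-time performance, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def interesting_tags(daily: Dict[int, Counter], start: int, end: int, k: int) -> List[str]:
    """Top-k tags in days [start, end], scored by window_count * log(N / days_with_tag)."""
    n_days = len(daily)
    days_with = Counter(tag for counts in daily.values() for tag in counts)
    window: Counter = Counter()
    for day, counts in daily.items():
        if start <= day <= end:
            window.update(counts)
    # Tags present every day get log(1) = 0, so perennial tags drop out.
    scores = {tag: count * math.log(n_days / days_with[tag]) for tag, count in window.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```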

Patent
20 Apr 2006
TL;DR: In this paper, metadata may be in the form of tags, comments, annotations or favorites, and the media objects may be searched according to metadata, and ranked in a variety of ways.
Abstract: Metadata may be associated with media objects by providing media objects for display, and accepting input concerning the media objects, where the input may include at least two different types of metadata. For example, metadata may be in the form of tags, comments, annotations or favorites. The media objects may be searched according to metadata, and ranked in a variety of ways.

Proceedings Article
26 Oct 2006
TL;DR: A framework for automatically selecting a summary set of photos from a large collection of geo-referenced photographs, based on spatial patterns in photo sets, as well as textual-topical patterns and user (photographer) identity cues, which can be expanded to support social, temporal, and other factors.
Abstract: We describe a framework for automatically selecting a summary set of photos from a large collection of geo-referenced photographs. Such large collections are inherently difficult to browse, and become excessively so as they grow in size, making summaries an important tool in rendering these collections accessible. Our summary algorithm is based on spatial patterns in photo sets, as well as textual-topical patterns and user (photographer) identity cues. The algorithm can be expanded to support social, temporal, and other factors. The summary can thus be biased by the content of the query, the user making the query, and the context in which the query is made. A modified version of our summarization algorithm serves as a basis for a new map-based visualization of large collections of geo-referenced photos, called Tag Maps. Tag Maps visualize the data by placing highly representative textual tags on relevant map locations in the viewed region, effectively providing a sense of the important concepts embodied in the collection. An initial evaluation of our implementation on a set of geo-referenced photos shows that our algorithm and visualization perform well, producing summaries and views that are highly rated by users.

Patent
08 Feb 2006
TL;DR: In this paper, a new class of metrics known as "interestingness" is proposed to rank media objects based on the quantity of user-entered metadata concerning the media object.
Abstract: Media objects, such as images or soundtracks, may be ranked according to a new class of metrics known as “interestingness.” These rankings may be based at least in part on the quantity of user-entered metadata concerning the media object, the number of users who have assigned metadata to the media object, access patterns related to the media object, and/or a lapse of time related to the media object.

Patent
07 Jun 2006
TL;DR: In this article, a system and method are directed towards prefetching content for a mobile terminal based on characteristics of, and tracked usage of, the mobile terminal to request content through an online portal service, which provides access to content in multiple subject areas.
Abstract: A system and method are directed towards prefetching content for a mobile terminal based on characteristics of, and tracked usage of, the mobile terminal to request content through an online portal service, which provides access to content in multiple subject areas. A mobile user profile is created from the characteristics and patterns of the tracked usage. The tracked usage information includes the time, location, and frequency at which the content was requested. Based on the mobile user profile information, content similar to previously requested content is prefetched and cached in anticipation of the mobile terminal making a similar request. Prefetching may also occur based on a trigger event such as the mobile terminal returning to a location from which certain content was previously requested. Prefetching may further be based on a related general user profile that indicates usage of an alternate electronic device to access content through the portal.

Book Chapter
David Cossock, Tong Zhang
22 Jun 2006
TL;DR: In this article, the authors consider the problem of subset ranking, motivated by its important application in web search and present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors.
Abstract: We study the subset ranking problem, motivated by its important application in web-search. In this context, we consider the standard DCG criterion (discounted cumulated gain) that measures the quality of items near the top of the rank-list. Similar to error minimization for binary classification, the DCG criterion leads to a non-convex optimization problem that can be NP-hard. Therefore a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top-portion of the rank-list. We further investigate the generalization ability of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
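For concreteness, one common form of DCG sums relevance grades weighted by a discount of 1/log2(i + 1) at rank position i (positions starting at 1), optionally truncated at depth k. This form is assumed here because the abstract does not state the formula. The discount makes errors near the top of the list cost more than errors lower down, which matches the abstract's emphasis on the top portion of the rank-list.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG = sum over positions i = 1..k of rel_i / log2(i + 1)."""
    ranked = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(ranked, start=1))
```

Swapping the top two items of the ranking [3, 2, 1, 0] lowers DCG by about 0.37, while swapping the bottom two lowers it by only about 0.07.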

Proceedings Article
06 Aug 2006
TL;DR: An implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets, and a variant of TSVM that involves multiple switching of labels.
Abstract: Large-scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease of implementation, interpretability and empirical performance. In this work, we present a family of semi-supervised linear support vector classifiers that are designed to handle partially-labeled sparse datasets with a possibly very large number of examples and features. At their core, our algorithms employ recently developed modified finite Newton techniques. Our contributions in this paper are as follows: (a) We provide an implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets. (b) We propose a variant of TSVM that involves multiple switching of labels. Experimental results show that this variant provides an order of magnitude further improvement in training efficiency. (c) We present a new algorithm for semi-supervised learning based on a Deterministic Annealing (DA) approach. This algorithm alleviates the problem of local minima in the TSVM optimization procedure while also being computationally attractive. We conduct an empirical study on several document classification tasks which confirms the value of our methods in large-scale semi-supervised settings.

Patent
28 Apr 2006
TL;DR: In this article, a system and a method for creating and providing a user interface for optimizing advertiser defined groups of advertisement campaign information is disclosed, which is based on the forecasting information to optimize performance of at least one or more ad groups.
Abstract: A system and method for creating and providing a user interface for optimizing advertiser-defined groups of advertisement campaign information is disclosed. Generally, advertisement campaign information is organized into one or more ad groups. An ad group typically includes advertisements and parameters for advertisements that are to be handled by an advertisement campaign management system in a similar manner. Forecasting information is obtained relating to at least a portion of one of the one or more ad groups. At least a portion of the advertisement campaign information is then modified based at least in part on the forecasting information to optimize performance of at least one of the one or more ad groups.
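The patent abstract does not specify an optimization rule. As a hypothetical illustration only, the sketch below treats forecasting information as a list of (bid, expected clicks, expected cost) points for one ad group and modifies the group's maximum bid to the point with the most expected clicks within a daily budget. `AdGroup`, `ForecastPoint`, `optimize_bid`, and the rule itself are assumptions, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class AdGroup:
    """Hypothetical ad group: ads whose parameters are handled alike."""
    name: str
    ads: list
    max_bid: float


@dataclass
class ForecastPoint:
    """Hypothetical forecast: expected daily clicks and cost at a bid."""
    bid: float
    clicks: float
    cost: float


def optimize_bid(group, forecast, daily_budget):
    """Set group.max_bid to the affordable forecast point with most clicks.

    Assumed rule for illustration; leaves the group unchanged if no
    forecast point fits within the budget.
    """
    affordable = [f for f in forecast if f.cost <= daily_budget]
    if affordable:
        best = max(affordable, key=lambda f: f.clicks)
        group.max_bid = best.bid
    return group
```

With forecast points at bids 0.5, 1.0, and 1.5 costing 40, 90, and 150 per day, a daily budget of 100 moves the group's bid to 1.0, the highest-click point that stays within budget.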

Proceedings ArticleDOI
23 May 2006
TL;DR: This work shows how to adapt recent results from theoretical computer science to expand a seed set into a community with small conductance and a strong relationship to the seed, while examining only a small neighborhood of the entire graph.
Abstract: Expanding a seed set into a larger community is a common procedure in link-based analysis. We show how to adapt recent results from theoretical computer science to expand a seed set into a community with small conductance and a strong relationship to the seed, while examining only a small neighborhood of the entire graph. We extend existing results to give theoretical guarantees that apply to a variety of seed sets from specified communities. We also describe simple and flexible heuristics for applying these methods in practice, and present early experiments showing that these methods compare favorably with existing approaches.
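One common way to realize this kind of local expansion, assumed here rather than taken from the paper, is to compute an approximate personalized PageRank vector from the seed with a local push procedure, then take the best "sweep" set: order vertices by PageRank divided by degree and keep the prefix with the smallest conductance. The function names and the parameters `alpha` and `eps` are illustrative; the graph is assumed undirected, without self-loops, and given as adjacency lists.

```python
from collections import defaultdict


def approximate_ppr(graph, seeds, alpha=0.15, eps=1e-4):
    """Push-based approximate personalized PageRank (lazy random walk).

    Only vertices reached by pushes from the seeds are ever examined.
    """
    p = defaultdict(float)  # approximate PageRank mass
    r = defaultdict(float)  # residual mass still to be pushed
    for s in seeds:
        r[s] += 1.0 / len(seeds)
    queue = list(seeds)
    while queue:
        u = queue.pop()
        deg = len(graph[u])
        if deg == 0 or r[u] < eps * deg:
            continue  # stale queue entry or isolated vertex
        mass = r[u]
        p[u] += alpha * mass
        r[u] = (1 - alpha) * mass / 2  # lazy walk keeps half at u
        share = (1 - alpha) * mass / (2 * deg)
        for v in graph[u]:
            r[v] += share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg:
            queue.append(u)
    return p


def sweep_cut(graph, p):
    """Return the prefix set (ordered by p[v]/deg(v)) of least conductance."""
    # Total volume is assumed cheap to know (e.g. twice the edge count).
    total_vol = sum(len(nbrs) for nbrs in graph.values())
    order = sorted((v for v in p if p[v] > 0),
                   key=lambda v: p[v] / len(graph[v]), reverse=True)
    members, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for v in order:
        for w in graph[v]:
            # Edges to current members stop crossing the cut; others start to.
            cut += -1 if w in members else 1
        members.add(v)
        vol += len(graph[v])
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best, best_phi = set(members), cut / denom
    return best, best_phi
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2–3, seeding at vertex 0 recovers the seed's triangle with conductance 1/7.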

Journal ArticleDOI
Daniel C. Fain1, Jan Pedersen1
TL;DR: Sponsored search on the Web, which lets advertisers have their content appear in the results displayed by a search engine, began in 1998 with GoTo, which was acquired by Yahoo! in 2002.
Abstract: Sponsored search on the Web, which lets advertisers have their content appear in the results displayed by a search engine, began in 1998 with GoTo, acquired by Yahoo! in 2002. The price of appearing on a given site can depend on the number of impressions (cost per thousand), on clicks on a link (cost per click), or on the actions generated (cost per action). Paid search is attracting growing interest within the academic community.
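The three pricing schemes named above reduce to simple arithmetic. The sketch below computes an advertiser's charge under each model; the function names and the example figures are hypothetical and chosen only for illustration.

```python
def cost_per_mille(impressions, price_per_thousand):
    """CPM: the advertiser pays a fixed price per thousand impressions."""
    return impressions / 1000 * price_per_thousand


def cost_per_click(clicks, price_per_click):
    """CPC: the advertiser pays only when a user clicks the link."""
    return clicks * price_per_click


def cost_per_action(actions, price_per_action):
    """CPA: the advertiser pays only for resulting actions (e.g. purchases)."""
    return actions * price_per_action
```

For instance, 20,000 impressions at 2.0 per thousand cost 40.0 under CPM, while 150 clicks at 0.5 per click cost 75.0 under CPC. The models shift risk differently: under CPM the advertiser pays whether or not anyone clicks, whereas under CPC and CPA the search engine bears that uncertainty.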

Patent
Pavel Berkhin1, Zhichen Xu1, Jianchang Mao1, Daniel E. Rose1, Abe Taha1, Farzin Maghoul1
02 Aug 2006
TL;DR: A method for trust propagation that calculates a feature vector for each of two users, compares the two vectors to compute a similarity value, and records a relationship between the users if the similarity value falls within a threshold.
Abstract: The present invention is directed towards systems and methods for trust propagation. The method according to one embodiment comprises calculating a first feature vector for a first user, calculating a second feature vector for a second user and comparing the first feature vector with the second feature vector to calculate a similarity value. A determination is made as to whether the similarity value falls within a threshold. If the similarity value falls within the threshold, a relationship is recorded between the first user and the second user in a first user profile and a second user profile.
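The comparison step can be sketched as follows. The abstract does not say how feature vectors are built or which similarity measure is used, so this sketch assumes cosine similarity, interprets "falls within a threshold" as similarity at or above the threshold, and models user profiles as a hypothetical dict mapping each user to a set of related users. `cosine_similarity` and `propagate_trust` are illustrative names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_trust(profiles, first, second, features, threshold):
    """Record a relationship in both users' profiles if the similarity of
    their feature vectors is at least threshold (assumed interpretation)."""
    similarity = cosine_similarity(features[first], features[second])
    if similarity >= threshold:
        profiles.setdefault(first, set()).add(second)
        profiles.setdefault(second, set()).add(first)
        return True
    return False
```

Recording the relationship in both profiles mirrors the abstract's "first user profile and a second user profile"; a directed variant would update only one side.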

Journal ArticleDOI
01 Dec 2006
TL;DR: This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Abstract: We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts include Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.

Journal ArticleDOI
TL;DR: An assessment of the periodontal status of mandibular central incisors that were proclined during orthodontic treatment found that recession was negatively correlated with keratinized gingival height and with the thickness of the facial gingival margin, and that thickness had greater relevance to recession.