
Showing papers on "Ranking SVM published in 2001"


Proceedings ArticleDOI
01 Apr 2001
TL;DR: A set of techniques for the rank aggregation problem is developed and their performance compared to that of well-known methods, with the goal of designing rank aggregation techniques that can combat spam in Web searches.
Abstract: We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective.

1,982 citations
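
The abstract does not spell out the aggregation techniques themselves, so the sketch below uses the classic Borda-count baseline to illustrate the rank aggregation setting: each input ranking awards points by position, and candidates are re-ranked by total points. Function names and the toy engine results are illustrative assumptions, not the paper's methods.

```python
# Minimal Borda-count rank aggregation: each input ranking awards a candidate
# (n_candidates - position) points; candidates are re-ranked by total points.
# This is a classic baseline, not the paper's aggregation method.

def borda_aggregate(rankings):
    """rankings: list of lists, each an ordering of candidates (best first)."""
    candidates = set(c for r in rankings for c in r)
    scores = {c: 0 for c in candidates}
    for ranking in rankings:
        n = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] += n - pos  # earlier position -> more points
    # Sort candidates by total score, best first
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Three hypothetical search engines ranking the same four URLs
    engine_results = [
        ["url_a", "url_b", "url_c", "url_d"],
        ["url_b", "url_a", "url_d", "url_c"],
        ["url_a", "url_c", "url_b", "url_d"],
    ]
    print(borda_aggregate(engine_results))
```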


Proceedings Article
03 Jan 2001
TL;DR: This article presents a Support Vector Machine like learning system to handle multi-label problems, based on a large margin ranking system that shares a lot of common properties with SVMs.
Abstract: This article presents a Support Vector Machine (SVM) like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.

1,306 citations
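
The large-margin ranking system described above aims to score every relevant label of an instance above every irrelevant one. The snippet below computes that pairwise label-ranking loss for a vector of per-label scores; it illustrates the ranking criterion only and is not the paper's training algorithm (all names are assumptions).

```python
import numpy as np

def label_ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) label pairs that are ordered wrongly.

    scores:   (n_labels,) real-valued score per label for one instance
    relevant: set of indices of the labels that apply to the instance
    """
    irrelevant = [l for l in range(len(scores)) if l not in relevant]
    if not relevant or not irrelevant:
        return 0.0
    bad = sum(scores[k] <= scores[l] for k in relevant for l in irrelevant)
    return bad / (len(relevant) * len(irrelevant))

# Toy example: 4 labels, labels {0, 2} are relevant
scores = np.array([2.1, 0.3, 1.7, 1.9])
print(label_ranking_loss(scores, {0, 2}))  # label 3 outranks label 2 -> loss 0.25
```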


Proceedings Article
03 Jan 2001
TL;DR: A simple and efficient online algorithm is described, its performance in the mistake bound model is analyzed, its correctness is proved, and it outperforms online algorithms for regression and classification applied to ranking.
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.

657 citations
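
The online algorithm referred to in the abstract maintains a weight vector plus ordered thresholds and updates both whenever the predicted rank is wrong. The class below is a sketch following the standard PRank-style update rule; variable names and defaults are assumptions, not the authors' code.

```python
import numpy as np

class PRank:
    """Online ordinal ranker with ranks 1..k (a PRank-style sketch)."""

    def __init__(self, n_features, k):
        self.w = np.zeros(n_features)
        self.b = np.zeros(k - 1)      # thresholds b_1 <= ... <= b_{k-1}
        self.k = k

    def predict(self, x):
        score = self.w @ x
        # Smallest rank r with score < b_r; rank k if above all thresholds
        below = np.nonzero(score < self.b)[0]
        return int(below[0]) + 1 if below.size else self.k

    def update(self, x, y):
        """One online step on instance x with true rank y in {1, ..., k}."""
        if self.predict(x) == y:
            return
        score = self.w @ x
        # y_r = +1 if the true rank lies above threshold r, else -1
        y_r = np.where(np.arange(1, self.k) < y, 1.0, -1.0)
        # Update only the thresholds that are on the wrong side of the score
        tau = np.where(y_r * (score - self.b) <= 0, y_r, 0.0)
        self.w += tau.sum() * x
        self.b -= tau
```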


Journal ArticleDOI
TL;DR: It is shown that the support vector machine (SVM) classification algorithm, a recent development from the machine learning community, proves its potential for structure-activity relationship analysis in a benchmark test against several machine learning techniques currently used in the field.

627 citations


Journal ArticleDOI
TL;DR: The first use of the SVM approach to predict protein secondary structure is described here, achieving good segment overlap accuracy; a useful "reliability index" for the predictions was also developed.

574 citations


Journal Article
TL;DR: A new algorithm that combines Support Vector Machines (SVM) with unsupervised clustering is presented; a new vector representation of web pages is proposed and applied to web page classification.
Abstract: This paper presents a new algorithm that combines Support Vector Machine (SVM) and unsupervised clustering. After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and negative examples separately with the unsupervised clustering algorithm (UC), which produces a number of positive and negative centers. Then, it selects only some of the examples as input to the SVM, according to the ISUC algorithm. Finally, it constructs a classifier through SVM learning. Any text can then be classified either by comparing its distances to the cluster centers or by the SVM: if the text is near one cluster center of a category and far away from all the cluster centers of other categories, UC can classify it correctly with high probability; otherwise, the SVM is employed to decide which category it belongs to. The algorithm utilizes the virtues of both SVM and unsupervised clustering. The experiments show that it not only improves training efficiency but also achieves good precision.

33 citations
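
A minimal sketch of the two-stage idea the abstract describes, assuming scikit-learn: cluster the positive and negative pages separately, keep the examples closest to the opposing class's centers (the likely boundary region), and train an SVM on that reduced set. The selection rule below only approximates the described ISUC-style selection; all names and parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_and_train(X_pos, X_neg, n_clusters=3, keep_frac=0.5):
    """Cluster each class, keep examples nearest the other class's centers,
    and train an SVM on the reduced set (illustrative sketch only)."""
    pos_centers = KMeans(n_clusters=n_clusters, n_init=10).fit(X_pos).cluster_centers_
    neg_centers = KMeans(n_clusters=n_clusters, n_init=10).fit(X_neg).cluster_centers_

    def dist_to_nearest(X, centers):
        # Distance from each example to its nearest center in `centers`
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return d.min(axis=1)

    # Keep the positives closest to negative centers and vice versa:
    # these are the examples most likely to matter for the SVM boundary.
    k_pos = max(1, int(keep_frac * len(X_pos)))
    k_neg = max(1, int(keep_frac * len(X_neg)))
    X_pos_sel = X_pos[np.argsort(dist_to_nearest(X_pos, neg_centers))[:k_pos]]
    X_neg_sel = X_neg[np.argsort(dist_to_nearest(X_neg, pos_centers))[:k_neg]]

    X = np.vstack([X_pos_sel, X_neg_sel])
    y = np.r_[np.ones(len(X_pos_sel)), -np.ones(len(X_neg_sel))]
    return SVC(kernel="rbf").fit(X, y), pos_centers, neg_centers
```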


Book ChapterDOI
02 Jul 2001
TL;DR: Numerical results for different classifiers on a benchmark data set of handwritten digits are presented, and binary trees of SVMs are considered to solve the multi-class pattern recognition problem.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed; in particular, we consider binary trees of SVMs to solve the multi-class pattern recognition problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.

30 citations
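
A sketch of the binary-tree-of-SVMs architecture discussed above: at each node the remaining classes are split into two groups, a binary SVM decides which branch to follow, and the process recurses until a single class is left. The half-split grouping and the names below are illustrative assumptions; the paper's own grouping strategy may differ.

```python
import numpy as np
from sklearn.svm import SVC

class SVMTreeNode:
    """Binary tree of SVMs for multi-class classification (sketch)."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.svm = None
        self.left = self.right = None

    def fit(self, X, y):
        if len(self.classes) == 1:
            return self
        mid = len(self.classes) // 2
        left_cls, right_cls = self.classes[:mid], self.classes[mid:]
        mask = np.isin(y, self.classes)          # only samples of this node's classes
        Xn, yn = X[mask], y[mask]
        side = np.isin(yn, right_cls).astype(int)  # 0 = left group, 1 = right group
        self.svm = SVC(kernel="rbf").fit(Xn, side)
        self.left = SVMTreeNode(left_cls).fit(X, y)
        self.right = SVMTreeNode(right_cls).fit(X, y)
        return self

    def predict_one(self, x):
        node = self
        while len(node.classes) > 1:
            node = node.right if node.svm.predict(x[None, :])[0] else node.left
        return node.classes[0]
```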


Journal Article
TL;DR: A new improved incremental SVM learning algorithm is proposed, based on a sifting factor that accumulates distribution knowledge of the training samples as incremental training proceeds, and thus makes it possible to discard samples optimally.
Abstract: The classification algorithm based on SVM (support vector machine) attracts increasing attention from researchers due to its sound theoretical properties and good empirical results. In this paper, the properties of the support vector set are analyzed thoroughly, and a new learning method is introduced to extend the SVM classification algorithm to the incremental learning setting. A new improved incremental SVM learning algorithm is then proposed, based on a sifting factor. This algorithm accumulates distribution knowledge of the training samples as incremental training proceeds, and thus makes it possible to discard samples optimally. Theoretical analysis and experimental results show that this algorithm not only improves training speed but also reduces storage costs.

23 citations
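
The abstract does not give the sifting factor itself, so the sketch below shows the simpler, commonly used variant of the same idea: after each batch, carry only the current support vectors (plus the new data) into the next round of training, which bounds memory while preserving most of the decision boundary. Assumes scikit-learn; it is an illustration, not the paper's algorithm.

```python
import numpy as np
from sklearn.svm import SVC

def incremental_svm(batches, kernel="rbf", C=1.0):
    """Train an SVM over a stream of (X, y) batches, carrying forward only
    the current support vectors between batches (illustrative sketch)."""
    X_keep = y_keep = None
    model = None
    for X_new, y_new in batches:
        if X_keep is None:
            X_train, y_train = X_new, y_new
        else:
            X_train = np.vstack([X_keep, X_new])
            y_train = np.concatenate([y_keep, y_new])
        model = SVC(kernel=kernel, C=C).fit(X_train, y_train)
        # Discard everything except the support vectors for the next round
        X_keep = X_train[model.support_]
        y_keep = y_train[model.support_]
    return model
```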


Proceedings ArticleDOI
29 Oct 2001
TL;DR: This paper studies the speaker identification and verification problem using a support vector machine, and presents an SVM training method for large-scale samples tailored to speech signals.
Abstract: The support vector machine (SVM) is an important learning method from statistical learning theory and a powerful tool for pattern recognition problems. This paper studies the speaker identification and verification problem using a support vector machine, and presents an SVM training method for large-scale samples tailored to speech signals. A text-independent speaker recognition system based on SVM was implemented, and the results show good performance.

15 citations


Journal Article
TL;DR: SVM architectures for multi-class classification problems are discussed, in particular binary trees of SVMs are considered to solve the multi-class problem.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed, in particular we consider binary trees of SVMs to solve the multi-class problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.

14 citations


Proceedings ArticleDOI
Joaquin Rapela
09 Nov 2001
TL;DR: Using recall/precision evaluations and using collections of HTML documents with different characteristics, it is shown that the automatic method finds weights tailored to specific characteristics of each document collection.
Abstract: Current search engines use several criteria or heuristics to rank HTML documents. HTML ranking heuristics need to be combined into a ranking function that, given a text query, returns a ranked list of HTML documents. The standard approach is to build a weighted average by manually estimating the importance of every heuristic and assigning a weight proportional to the estimated importance. In the current paper we apply an automatic method for combining HTML ranking heuristics. Using recall/precision evaluations, we study the performance of the automatic method, and using collections of HTML documents with different characteristics, we show that the automatic method finds weights tailored to the specific characteristics of each document collection.
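
As an illustration of the weighted-average combination the abstract contrasts with manual tuning, the sketch below fits the heuristic weights by ordinary least squares against relevance judgments and ranks documents by the combined score. The paper's actual learning method is not described in the abstract, so this is an assumption-laden sketch; the heuristic matrix and names are invented for the example.

```python
import numpy as np

def fit_heuristic_weights(H, relevance):
    """Least-squares fit of weights for a linear combination of heuristics.

    H:         (n_documents, n_heuristics) matrix of heuristic scores
               (e.g. title match, anchor-text match, term frequency)
    relevance: (n_documents,) relevance judgments for the training queries
    """
    w, *_ = np.linalg.lstsq(H, relevance, rcond=None)
    return w

def rank_documents(H, w):
    """Return document indices sorted by the combined score, best first."""
    return np.argsort(H @ w)[::-1]

# Toy example: 4 documents, 3 heuristics
H = np.array([[0.9, 0.2, 0.5],
              [0.1, 0.8, 0.3],
              [0.7, 0.7, 0.9],
              [0.2, 0.1, 0.1]])
relevance = np.array([1.0, 0.0, 1.0, 0.0])
w = fit_heuristic_weights(H, relevance)
print(rank_documents(H, w))
```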

15 Jun 2001
TL;DR: This paper experimentally evaluates two ranking methods and two selection methods based on the dichotomy of unithood and termhood, and concludes that the simple threshold method and the proposed window method show little difference in recall and precision.
Abstract: An automatic term extraction system consists of a term candidate extraction subsystem, a ranking subsystem and a selection subsystem. In this paper, we experimentally evaluate two ranking methods and two selection methods. As for ranking, the dichotomy of unithood and termhood is a key notion. We evaluate these two notions experimentally by comparing the Imp-based ranking method, which is based directly on termhood, with the C-value-based method, which is based indirectly on both termhood and unithood. As for selection, we compare the simple threshold method with the window method that we propose. We carried out the experimental evaluation on several Japanese technical manuals. The results do not show much difference in recall and precision. The small differences between the terms extracted by the two ranking methods depend on their ranking mechanisms per se.
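
The C-value measure mentioned in the abstract scores a candidate term by its length and frequency, discounted by how often it occurs nested inside longer candidate terms. A minimal sketch of the standard C-value formula for multi-word candidates follows; the Imp measure, the thresholds, and the proposed window selection method are not reproduced, and the toy terms are assumptions.

```python
import math

def c_value(candidates):
    """Compute C-value for multi-word candidate terms (standard formula, sketch).

    candidates: dict mapping a term (tuple of words) to its corpus frequency.
    Single-word terms are not handled here (log2(1) = 0 gives them score 0).
    """
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of longer candidates containing `term` as a contiguous span
        nesting = [f for other, f in candidates.items()
                   if len(other) > len(term) and any(
                       other[i:i + len(term)] == term
                       for i in range(len(other) - len(term) + 1))]
        length_factor = math.log2(len(term))
        if nesting:
            scores[term] = length_factor * (freq - sum(nesting) / len(nesting))
        else:
            scores[term] = length_factor * freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

terms = {("information", "retrieval"): 12,
         ("information", "retrieval", "system"): 5,
         ("text", "retrieval"): 7}
print(c_value(terms))
```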

Journal ArticleDOI
TL;DR: This paper discusses in what order a search engine should return the URLs it has produced in response to a user's query, so as to show more relevant pages first.
Abstract: The amount of information on the web is growing rapidly, and search engines that rely on keyword matching usually return too many low-quality matches. To improve search results, a challenging task for search engines is how to effectively calculate a relevance ranking for each web page. This paper discusses in what order a search engine should return the URLs it has produced in response to a user's query, so as to show more relevant pages first. Emphasis is placed on the ranking functions adopted by WebGather, which take link structure and user popularity factors into account. Experimental results are also presented to evaluate the proposed strategy.
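
The abstract does not give WebGather's ranking formula. As an illustration of the link-structure factor it mentions, the sketch below runs a basic PageRank-style power iteration over a link graph; such a score could then be blended with query relevance and popularity signals. This is a generic sketch, not the paper's function, and the example graph is invented.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, iterations=50):
    """Basic PageRank power iteration over a link adjacency matrix (sketch).

    adjacency[i, j] = 1 if page i links to page j.
    """
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1                  # avoid dividing by zero
    transition = adjacency / out_degree
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * transition.T @ rank
    return rank

# 4 pages: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
links = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]])
print(pagerank(links))   # page 2 gets the highest score
```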

Journal ArticleDOI
TL;DR: A mathematical programming model is introduced that overcomes the major methodological problem of a large ranking task: respondent fatigue and deteriorated decision quality caused by an excessive number of objects to be ranked.
Abstract: This paper introduces a mathematical programming model that overcomes the major methodological problem of a large ranking task: respondent fatigue and deteriorated decision quality caused by an excessive number of objects to be ranked. The model was applied to the problem of ranking Marketing and International Business journals. There are more than 200 such journals, making direct ranking or rating very difficult, if not impossible. The result shows that the mathematical programming model uses very little information and yet can produce rankings that are in agreement with results obtained from direct ranking studies.

Proceedings ArticleDOI
15 Jul 2001
TL;DR: This paper proposes a generic procedure for ranking, based on 1D self-organizing maps (SOMs), where the similarity metric used by SOM is modified and automatically adjusted to the context by a genetic search.
Abstract: There are applications that require ordered instances modeled by high-dimensional vectors. Despite the reasonable number of papers in the areas of classification and clustering, and the growing importance of ranking, papers on ranking are rare. Usual solutions are not generic and demand expert knowledge to specify the weight of each component and, therefore, to define a ranking function. This paper proposes a generic procedure for ranking, based on 1D self-organizing maps (SOMs). Additionally, the similarity metric used by the SOM is modified and automatically adjusted to the context by a genetic search. This process seeks the best ranking that matches the desired probability distribution provided by the specialist's expectations. Promising results were achieved on the ranking of data from blood bank inspections.
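
A minimal sketch of the 1D-SOM part of the procedure: train a one-dimensional chain of prototype vectors and then order the instances by the chain index of their best-matching unit. The genetic adjustment of the similarity metric described in the abstract is not included, and the hyperparameters and names are assumptions.

```python
import numpy as np

def train_1d_som(X, n_units=10, epochs=50, lr=0.5, radius=2.0, seed=0):
    """Train a 1D self-organizing map and return its prototype vectors.
    Requires len(X) >= n_units for the initialization used here."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    positions = np.arange(n_units)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(prototypes - x, axis=1))
            # Gaussian neighbourhood along the 1D chain, shrinking over time
            h = np.exp(-((positions - bmu) ** 2) / (2 * (radius * decay + 1e-3) ** 2))
            prototypes += (lr * decay) * h[:, None] * (x - prototypes)
    return prototypes

def rank_by_som(X, prototypes):
    """Order instances by the chain index of their best-matching unit."""
    bmus = np.argmin(
        np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2), axis=1)
    return np.argsort(bmus)
```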

Journal Article
TL;DR: It was proved that the mean error of classifying N categories with the SVM decision tree is smaller; the types of SVM decision tree selected in various cases were discussed in detail, and the features of the SVM decision tree were analysed.
Abstract: The support vector machine is a high-performance classification method. The basic support vector machine (SVM) is designed for two-class problems. The principle of SVM was introduced and an SVM-based classifier for a large number of categories was proposed. The method was named the SVM decision tree. The types of SVM decision tree selected in various cases were discussed in detail, and the features of the SVM decision tree were also analysed. It was proved that the mean error of classifying N categories with the SVM decision tree is smaller.

Proceedings ArticleDOI
21 Sep 2001
TL;DR: A novel criterion for both feature ranking and feature selection using Support Vector Machines (SVMs) is demonstrated, based on the bound on the expected error probability of an SVM.
Abstract: This paper demonstrates a novel criterion for both feature ranking and feature selection using Support Vector Machines (SVMs). The method analyses the importance of a feature subset using the bound on the expected error probability of an SVM. In addition, a scheme for feature ranking based on SVMs is presented. Experiments show that the proposed schemes perform well in feature ranking/selection, and that the risk-bound-based criterion is superior to some other criteria.
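
The paper's criterion is a bound on the expected error probability; as a simpler stand-in that still illustrates SVM-based feature ranking, the sketch below ranks features by the weight magnitudes of a linear SVM (in the spirit of SVM-RFE). It is explicitly not the bound-based criterion from the paper; it assumes scikit-learn and a binary labelling.

```python
import numpy as np
from sklearn.svm import SVC

def svm_feature_ranking(X, y, C=1.0):
    """Rank features by the weight magnitude of a linear SVM.

    A simple SVM-based ranking criterion used only to illustrate the idea;
    the cited paper ranks features via an error-probability bound instead.
    """
    model = SVC(kernel="linear", C=C).fit(X, y)
    weights = np.abs(model.coef_).ravel()
    order = np.argsort(weights)[::-1]          # most important feature first
    return order, weights[order]
```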

Proceedings ArticleDOI
02 Dec 2001
TL;DR: The proposed approach combines the merits of two prominent concepts individually used in the literature: the fuzzy reference set and the degree of dominance so that the decisive information embedded in the set of fuzzy numbers to be ranked is sensibly used, and satisfactory ranking outcomes are always achieved.
Abstract: Ranking fuzzy numbers plays a critical role in decision analysis under a fuzzy environment. Existing fuzzy ranking methods may not always be suitable for practical decision problems of large size, due to counter-intuitive ranking outcomes produced or the considerable computational effort required. This paper presents an effective approach to address the fuzzy ranking problem of practical size. The proposed approach combines the merits of two prominent concepts individually used in the literature: the fuzzy reference set and the degree of dominance. As such, the decisive information embedded in the set of fuzzy numbers to be ranked is sensibly used, and satisfactory ranking outcomes are always achieved. The approach is computationally simple and its underlying concepts are logically sound and comprehensible. A comparative study is conducted on all benchmark cases used in the literature to examine its performance in terms of rationality and discriminatory ability. The comparison results show that the approach compares favorably with the methods examined.
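
As a simple point of reference for the fuzzy ranking problem discussed above, the sketch below ranks triangular fuzzy numbers by their centroids, a common baseline in the fuzzy ranking literature. It is not the paper's reference-set/degree-of-dominance approach, and the representation and example numbers are assumptions.

```python
def centroid(tfn):
    """Centroid of a triangular fuzzy number (a, b, c) with peak at b."""
    a, b, c = tfn
    return (a + b + c) / 3.0

def rank_fuzzy_numbers(tfns):
    """Rank triangular fuzzy numbers best-first by centroid (baseline sketch)."""
    return sorted(range(len(tfns)), key=lambda i: centroid(tfns[i]), reverse=True)

# Three triangular fuzzy numbers
numbers = [(0.2, 0.5, 0.8), (0.3, 0.4, 0.9), (0.1, 0.6, 0.7)]
print(rank_fuzzy_numbers(numbers))   # indices ordered by centroid, best first
```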