
Showing papers on "Ranking SVM" published in 2004


Journal Article
TL;DR: An algorithm is derived that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.
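The path algorithm itself is beyond a short sketch, but the motivation is easy to illustrate: the naive alternative refits an SVM from scratch at every value of the cost parameter C, which is exactly what the paper's single-path computation avoids. A minimal sketch, assuming scikit-learn and a toy dataset (neither is from the paper):

```python
# Naive grid over the cost parameter C: each value requires a full
# refit. The paper's contribution is computing the ENTIRE solution
# path in roughly the cost of one such fit.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    acc = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5).mean()
    print(f"C={C:>6}: cv accuracy={acc:.3f}")
```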

699 citations


Proceedings ArticleDOI
21 Jul 2004
TL;DR: This paper presents an innovative unsupervised method for automatic sentence extraction using graph-based ranking algorithms and shows that the results obtained compare favorably with previously published results on established benchmarks.
Abstract: This paper presents an innovative unsupervised method for automatic sentence extraction using graph-based ranking algorithms. We evaluate the method in the context of a text summarization task, and show that the results obtained compare favorably with previously published results on established benchmarks.
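A minimal sketch of the graph-based idea (TextRank-style): build a sentence-similarity graph and run a PageRank-style power iteration over it. The word-overlap similarity and damping factor below are illustrative assumptions, not the authors' exact formulation.

```python
# Graph-based sentence ranking sketch: similarity graph + power iteration.
import numpy as np

def rank_sentences(sentences, d=0.85, iters=50):
    # Bag-of-words overlap similarity between sentence pairs (assumed measure).
    bags = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and bags[i] and bags[j]:
                W[i, j] = len(bags[i] & bags[j]) / (len(bags[i]) + len(bags[j]))
    # Row-normalize, then run a PageRank-style iteration with damping d.
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * P.T @ r
    return sorted(range(n), key=lambda i: -r[i])  # best sentences first
```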

485 citations


Book ChapterDOI
31 Aug 2004
TL;DR: This work adapts and applies principles of probabilistic models from Information Retrieval to structured data in order to solve the problem of ranking answers to a database query when many tuples are returned.
Abstract: We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.

171 citations


Patent
07 May 2004
TL;DR: As discussed by the authors, a modular scoring system uses rank aggregation to merge search results into an ordered list using many different features of documents, such as indegree, page ranking, URL length, proximity to the root server of an intranet, etc.
Abstract: A modular scoring system using rank aggregation merges search results into an ordered list of results using many different features of documents. The ranking functions of the present system can easily be customized to the needs of a particular corpus or collection of users such as an intranet. Rank aggregation is independent of the underlying score distributions between the different factors, and can be applied to merge any set of ranking functions. Rank aggregation holds the advantage of combining the influence of many different heuristic factors in a robust way to produce high-quality results for queries. The modular scoring system combines factors such as indegree, page ranking, URL length, proximity to the root server of an intranet, etc., to form a single ordering on web pages that closely obeys the individual orderings, but also mediates between the collective wisdom of individual heuristics.
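One concrete aggregator that fits this description is Borda counting: each heuristic contributes one ranked list, and list positions are converted to points. This is an illustrative choice; the patent covers rank aggregation methods generally.

```python
# Borda-count rank aggregation sketch: merge several ranked lists.
from collections import defaultdict

def borda_aggregate(rankings):
    """rankings: list of ranked lists of document ids (best first)."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            scores[doc] += n - pos  # higher positions earn more points
    return sorted(scores, key=scores.get, reverse=True)

# Each heuristic (indegree, URL length, proximity to root, ...) would
# contribute one ranked list.
merged = borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]])
print(merged)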

155 citations


Proceedings ArticleDOI
31 Oct 2004
TL;DR: This paper presents a general probabilistic technique for error ranking that exploits correlation behavior amongst reports and incorporates user feedback into the ranking process and observes a factor of 2-8 improvement over randomized ranking for error reports emitted by both intra-procedural and inter-procedural analysis tools.
Abstract: Static program checking tools can find many serious bugs in software, but due to analysis limitations they also frequently emit false error reports. Such false positives can easily render the error checker useless by hiding real errors amidst the false. Effective error report ranking schemes mitigate the problem of false positives by suppressing them during the report inspection process [17, 19, 20]. In this way, ranking techniques provide a complementary method to increasing the precision of the analysis results of a checking tool. A weakness of previous ranking schemes, however, is that they produce static rankings that do not adapt as reports are inspected, ignoring useful correlations amongst reports. This paper addresses this weakness with two main contributions. First, we observe that both bugs and false positives frequently cluster by code locality. We analyze clustering behavior in historical bug data from two large systems and show how clustering can be exploited to greatly improve error report ranking. Second, we present a general probabilistic technique for error ranking that (1) exploits correlation behavior amongst reports and (2) incorporates user feedback into the ranking process. In our results we observe a factor of 2-8 improvement over randomized ranking for error reports emitted by both intra-procedural and inter-procedural analysis tools.

144 citations


Journal Article
TL;DR: In this article, the authors study the general performance of naive Bayes in ranking and show that it outperforms C4.4, the state-of-the-art decision tree algorithm for ranking.
Abstract: It is well-known that naive Bayes performs surprisingly well in classification, but its probability estimation is poor. In many applications, however, a ranking based on class probabilities is desired. For example, a ranking of customers in terms of the likelihood that they buy one's products is useful in direct marketing. What is the general performance of naive Bayes in ranking? In this paper, we study it by both empirical experiments and theoretical analysis. Our experiments show that naive Bayes outperforms C4.4, a state-of-the-art decision-tree algorithm for ranking. We study two example problems that have been used in analyzing the performance of naive Bayes in classification [3]. Surprisingly, naive Bayes performs perfectly on them in ranking, even though it does not in classification. Finally, we present and prove a sufficient condition for the optimality of naive Bayes in ranking.
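Since ranking quality here is about ordering instances by class probability, it is naturally measured with AUC. A minimal sketch of evaluating naive Bayes as a ranker rather than a classifier, assuming scikit-learn and synthetic data (not from the paper):

```python
# Rank test instances by predicted class probability and score the
# ordering with AUC; classification accuracy is never consulted.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
proba = nb.predict_proba(X_te)[:, 1]      # scores used only to order
print("ranking quality (AUC):", roc_auc_score(y_te, proba))
```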

112 citations


Journal ArticleDOI
01 Apr 2004
TL;DR: This paper presents a one-layer recurrent neural network for support vector machine (SVM) learning in pattern classification and regression that can converge exponentially to the optimal solution of SVM learning.
Abstract: This paper presents a one-layer recurrent neural network for support vector machine (SVM) learning in pattern classification and regression. The SVM learning problem is first converted into an equivalent formulation, and then a one-layer recurrent neural network for SVM learning is proposed. The proposed neural network is guaranteed to obtain the optimal solution of support vector classification and regression. Compared with the existing two-layer neural network for the SVM classification, the proposed neural network has a low complexity for implementation. Moreover, the proposed neural network can converge exponentially to the optimal solution of SVM learning. The rate of the exponential convergence can be made arbitrarily high by simply turning up a scaling parameter. Simulation examples based on benchmark problems are discussed to show the good performance of the proposed neural network for SVM learning.
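For orientation, here is the dual quadratic program such a network solves, together with a generic projected gradient-flow dynamics of the kind used in neural optimization. This is a standard formulation sketched for context, not the paper's specific one-layer architecture:

```latex
% SVM dual QP (standard form), with Q_{ij} = y_i y_j K(x_i, x_j):
\min_{\alpha}\; f(\alpha) = \tfrac{1}{2}\,\alpha^{\top} Q\,\alpha - \mathbf{1}^{\top}\alpha
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \textstyle\sum_i \alpha_i y_i = 0.

% Generic projected dynamical system converging to the QP optimum;
% the scaling parameter \lambda > 0 tunes the exponential convergence rate.
\frac{d\alpha}{dt} = \lambda \Big( P_{\Omega}\big(\alpha - \nabla f(\alpha)\big) - \alpha \Big),
\qquad P_{\Omega} = \text{projection onto the feasible set } \Omega .
```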

102 citations


Book ChapterDOI
20 Sep 2004
TL;DR: The authors' experiments show that naive Bayes outperforms C4.4, a state-of-the-art decision-tree algorithm for ranking, and present and prove a sufficient condition for the optimality of naive Bayes in ranking.
Abstract: It is well-known that naive Bayes performs surprisingly well in classification, but its probability estimation is poor. In many applications, however, a ranking based on class probabilities is desired. For example, a ranking of customers in terms of the likelihood that they buy one's products is useful in direct marketing. What is the general performance of naive Bayes in ranking? In this paper, we study it by both empirical experiments and theoretical analysis. Our experiments show that naive Bayes outperforms C4.4, a state-of-the-art decision-tree algorithm for ranking. We study two example problems that have been used in analyzing the performance of naive Bayes in classification [3]. Surprisingly, naive Bayes performs perfectly on them in ranking, even though it does not in classification. Finally, we present and prove a sufficient condition for the optimality of naive Bayes in ranking.

96 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: It is shown that the quality of content-based ranking strategies can be improved by the use of community information as another evidential source of relevance, and the improvements reach up to 48% in terms of average precision.
Abstract: Current search technologies work in a "one size fits all" fashion. Therefore, the answer to a query is independent of specific user information need. In this paper we describe a novel ranking technique for personalized search services that combines content-based and community-based evidences. The community-based information is used in order to provide context for queries and is influenced by the current interaction of the user with the service. Our algorithm is evaluated using data derived from an actual service available on the Web: an online bookstore. We show that the quality of content-based ranking strategies can be improved by the use of community information as another evidential source of relevance. In our experiments the improvements reach up to 48% in terms of average precision.
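A minimal sketch of the combination idea: each document gets a content-based score and a community-based score, mixed into a single ranking score. The linear mixture and its weight below are assumptions for illustration, not the paper's exact combination.

```python
# Combine content and community evidence into one ranking score
# (illustrative linear mixture; weight w is an assumed parameter).
def combined_score(content_score, community_score, w=0.5):
    return (1 - w) * content_score + w * community_score

docs = {"d1": (0.9, 0.2), "d2": (0.6, 0.8)}   # (content, community)
ranked = sorted(docs, key=lambda d: -combined_score(*docs[d]))
print(ranked)
```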

91 citations


Patent
Dmitriy Meyerzon, Hang Li
21 Dec 2004
TL;DR: In this paper, a feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text, and the extracted titles, the text and the formatting information are processed according to a field weighting application for determining a ranking of the given search results item.
Abstract: Methods and computer-readable media are provided for ranking search results using feature extraction data. Each of the results of a search engine query is parsed to obtain data, such as text, formatting information, metadata, and the like. The text, the formatting information and the metadata are passed through a feature extraction application to extract data that may be used to improve a ranking of the search results based on relevance of the search results to the search engine query. The feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text. The extracted titles, the text, the formatting information and the metadata for any given search results item are processed according to a field weighting application for determining a ranking of the given search results item. Ranked search results items may then be displayed according to ranking.
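A minimal sketch of field-weighted scoring: per-field term counts are combined with per-field weights, so a match in an extracted title counts more than one in the body. The field names and weights below are illustrative assumptions, not the patent's values.

```python
# Field-weighted relevance scoring sketch. Extracted titles (found via
# formatting information) would populate the "title" field.
FIELD_WEIGHTS = {"title": 3.0, "metadata": 1.5, "body": 1.0}

def field_weighted_score(query_terms, doc_fields):
    """doc_fields: dict mapping field name -> list of tokens."""
    score = 0.0
    for field, tokens in doc_fields.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        for term in query_terms:
            score += w * tokens.count(term)
    return score

doc = {"title": ["ranking", "search"], "body": ["ranking"] * 3}
print(field_weighted_score(["ranking"], doc))   # 3.0 + 3*1.0 = 6.0
```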

85 citations


Patent
30 Jul 2004
TL;DR: In this article, a ranking algorithm is defined by a plurality of parameters and weights associated with the plurality of parameter values, and an optimizing algorithm is applied to the received input information to identify an optimal ranking algorithm having an optimal score.
Abstract: Improving ranking algorithms for information retrieval. The ranking algorithms operate on search results obtained from a search engine. Input information including information describing a first ranking algorithm, a first score associated with the first ranking algorithm, a second ranking algorithm, a second score associated with the second ranking algorithm, and causal information relating a difference between the first ranking algorithm and the second ranking algorithm with a difference between the first score and the second score is received. An optimizing algorithm is applied to the received input information to identify an optimal ranking algorithm having an optimal score. The optimal ranking algorithm is defined by a plurality of parameters and a plurality of weights associated with the plurality of parameters.

Patent
23 Aug 2004
TL;DR: In this paper, a ranking function for a set of document rank values is iteratively solved with respect to the set of linked documents until a first stability condition is satisfied, after which some of the ranks will have converged.
Abstract: A system and method is disclosed in which a ranking function for a set of document rank values is iteratively solved with respect to a set of linked documents until a first stability condition is satisfied. After that condition is satisfied, some of the ranks will have converged. The ranking function is modified to take these converged ranks into account so as to reduce the ranking function's computation cost. The modified ranking function is then solved until a second stability condition is satisfied, after which more of the ranks will have converged. The ranking function is again modified, and the process continues until all ranks have converged.
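A minimal sketch of the idea, assuming a PageRank-style iteration: once an individual rank value stabilizes, it is frozen and excluded from further updates. In the patented method the ranking function itself is restructured so converged ranks drop out of the computation; here their updates are merely skipped for illustration.

```python
# Adaptive iteration sketch: freeze converged rank components.
import numpy as np

def adaptive_pagerank(P, d=0.85, tol=1e-8, freeze_tol=1e-10, iters=100):
    """P: column-stochastic link matrix (n x n)."""
    n = P.shape[0]
    r = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)          # ranks still being updated
    for _ in range(iters):
        r_new = (1 - d) / n + d * (P @ r)
        delta = np.abs(r_new - r)
        active &= delta > freeze_tol         # freeze effectively-converged ranks
        r[active] = r_new[active]            # only active ranks are updated
        if delta.max() < tol:                # global stability condition
            break
    return r
```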

Book ChapterDOI
17 Mar 2004
TL;DR: A new algorithm, Ranking SVM in a Co-training Framework (RSCF), which takes the clickthrough data containing the items in the search result that have been clicked on by a user as an input, and generates adaptive rankers as an output and it is demonstrated that the RSCF algorithm produces better ranking results than the standard ranking SVM algorithm.
Abstract: The information on the World Wide Web is growing without bound. Users may have very diversified preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to suit the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). Essentially, the RSCF algorithm takes the clickthrough data containing the items in the search result that have been clicked on by a user as an input, and generates adaptive rankers as an output. By analyzing the clickthrough data, RSCF first categorizes the data as the labelled data set, which contains the items that have been scanned already, and the unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.
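The standard Ranking SVM step that RSCF builds on can be sketched concretely: clickthrough data yields pairwise preferences (a clicked result is preferred over unclicked results ranked above it), and a linear SVM is trained on feature-vector differences. The toy feature vectors and the scikit-learn classifier below are assumptions for illustration, not the paper's setup.

```python
# Ranking SVM sketch: clickthrough -> pairwise preferences -> linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_examples(features, clicked):
    """features: (n, d) array for one result list; clicked: set of indexes."""
    X, y = [], []
    for i in clicked:
        for j in range(i):                 # results ranked above a clicked one
            if j not in clicked:           # clicked i is preferred over unclicked j
                X.append(features[i] - features[j]); y.append(1)
                X.append(features[j] - features[i]); y.append(-1)
    return np.array(X), np.array(y)

feats = np.random.rand(10, 5)              # toy feature vectors for 10 results
X, y = pairwise_examples(feats, clicked={2, 5})
ranker = LinearSVC().fit(X, y)             # ranker.coef_ holds the ranking weights
```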

Book ChapterDOI
20 Sep 2004
TL;DR: An ensemble approach for Feature Ranking is proposed, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER; the empirical validation uses a statistical model inspired by the complexity framework proposed in the Constraint Satisfaction domain.
Abstract: A crucial issue for Machine Learning and Data Mining is Feature Selection, selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection is known as Feature Ranking, ranking the features with respect to their relevance. This paper proposes an ensemble approach for Feature Ranking, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER. The convergence of ensemble feature ranking is studied in a theoretical perspective, and a statistical model is devised for the empirical validation, inspired from the complexity framework proposed in the Constraint Satisfaction domain. Comparative experiments demonstrate the robustness of the approach for learning (a limited kind of) non-linear concepts, specifically when the features significantly outnumber the examples.

Proceedings ArticleDOI
15 Jun 2004
TL;DR: An effective strategy for automatic parameter selection for SVM based on the genetic algorithm (GA) is proposed; simulation results demonstrate the effectiveness and high efficiency of the proposed approach.
Abstract: Automatic parameter selection for support vector machines (SVM) is an important issue in making the SVM practically useful, and the commonly used leave-one-out (loo) method involves complex and time-consuming calculation. Motivated by this, an effective strategy for automatic parameter selection for SVM based on the genetic algorithm (GA) is proposed in this paper. Simulation results on a practical data model demonstrate the effectiveness and high efficiency of the proposed approach.
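A deliberately tiny GA over (C, gamma), scored by cross-validation, illustrates the strategy; the paper's encoding, operators, and population sizes may differ.

```python
# GA-based SVM parameter selection sketch (illustrative operators).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.default_rng(0)

def fitness(ind):
    C, gamma = np.exp(ind)                 # individuals live in log-space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

pop = rng.uniform(-5, 5, size=(20, 2))                    # initial population
for _ in range(10):                                       # generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]               # selection: top half
    children = parents + rng.normal(0, 0.5, parents.shape)  # Gaussian mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best (C, gamma):", np.exp(best))
```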

Journal ArticleDOI
TL;DR: RSC, a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models that take the form of "IF-THEN" rules, which have the advantage of being easy to interpret.
Abstract: Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomalies. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules, converting the feature ranking task to a minimal hitting set problem that is addressed using a genetic algorithm (GA). Classical approaches based on the Support Vector Machine (SVM) accomplish this by executing many iterations, each of which removes one useless feature; compared with those methods, our method avoids many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of being easy to interpret. Tests and comparison of RSC with SVM on DARPA benchmark data showed that for Probe and DoS attacks both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).

Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper proposes an asymmetric bagging based SVM, and combines the random subspace method (RSM) and SVM for RF to solve all the three problems and further improve the RF performance.
Abstract: Relevance feedback (RF) schemes based on support vector machine (SVM) have been widely used in content-based image retrieval. However, the performance of SVM based RF is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: (1) the SVM classifier is unstable on a small training set; (2) the SVM's optimal hyper-plane may be biased when the positive feedback samples are much fewer than the negative feedback samples; (3) overfitting, because the feature dimension is much higher than the size of the training set. In this paper, we try to use random sampling techniques to overcome these problems. To address the first two problems, we propose an asymmetric bagging based SVM. For the third problem, we combine the random subspace method (RSM) and SVM for RF. Finally, by integrating bagging and RSM we solve all three problems and further improve the RF performance.
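A minimal sketch of the asymmetric bagging component: every bag keeps all (few) positive feedback samples, bootstraps only the negatives, and the component SVMs' outputs are aggregated. Bag count, bag size, and kernel are illustrative assumptions.

```python
# Asymmetric bagging for SVM relevance feedback (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

def asymmetric_bagging_scores(X_pos, X_neg, X_test, n_bags=10, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_bags):
        # Each bag: ALL positives + a bootstrap negative sample of equal size,
        # which rebalances the classes around the scarce positives.
        idx = rng.choice(len(X_neg), size=len(X_pos), replace=True)
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.r_[np.ones(len(X_pos)), -np.ones(len(X_pos))]
        votes += SVC(kernel="rbf").fit(X, y).decision_function(X_test)
    return votes / n_bags                  # aggregated relevance scores
```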

Book ChapterDOI
07 Jun 2004
TL;DR: GRLVQ is discussed in comparison to the SVM, and its beneficial theoretical properties, which are similar to those of the SVM, are pointed out, while it provides sparse and intuitive solutions.
Abstract: The support vector machine (SVM) constitutes one of the most successful current learning algorithms, with excellent classification accuracy in large real-life problems and a strong theoretical background. However, an SVM solution is given by a non-intuitive classification in terms of extreme values of the training set, and the size of an SVM classifier scales with the number of training data. Generalized relevance learning vector quantization (GRLVQ) has recently been introduced as a simple though powerful expansion of basic LVQ. Unlike SVM, it provides a very intuitive classification in terms of prototypical vectors, the number of which is independent of the size of the training set. Here, we discuss GRLVQ in comparison to the SVM and point out its beneficial theoretical properties, which are similar to those of SVM, while it provides sparse and intuitive solutions. In addition, the competitive performance of GRLVQ is demonstrated in one experiment from computational biology.

Journal Article
TL;DR: This work presents a least squares version of the least squares support vector machine (LS²-SVM), which speeds up the calculations and provides better results, but most importantly yields a sparse solution.
Abstract: In the last decade Support Vector Machines (SVM) - introduced by Vapnik - have been successfully applied to a large number of problems. Lately a new technique, the Least Squares SVM (LS-SVM), has been introduced, which addresses classification and regression problems by formulating a linear equation set. In comparison to the original SVM, which involves a quadratic programming task, LS-SVM simplifies the required computation, but unfortunately the sparseness of standard SVM is lost. The linear equation set of LS-SVM embodies all available information about the learning process. By applying modifications to this equation set, we present a Least Squares version of the Least Squares Support Vector Machine (LS²-SVM). The modifications simplify the formulations, speed up the calculations and provide better results, but most importantly, they yield a sparse solution.
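For reference, the standard LS-SVM classifier training step that the paper modifies: equality constraints turn training into a single linear system, at the cost of sparseness, since every alpha_i is generally nonzero. This is the textbook formulation, not the paper's modified equation set:

```latex
% Standard LS-SVM classifier: training reduces to one linear system,
% with \Omega_{ij} = y_i y_j K(x_i, x_j) and regularization \gamma:
\begin{bmatrix} 0 & \mathbf{y}^{\top} \\ \mathbf{y} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix} .
% Every \alpha_i is generally nonzero -- the lost sparseness that the
% LS^2-SVM modifications aim to restore.
```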

Book ChapterDOI
07 Sep 2004
TL;DR: In this article, an SVM-based learning system for information extraction (IE) is presented, which uses a variant of the SVM with uneven margins, which is particularly helpful for small training datasets.
Abstract: This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.

Book ChapterDOI
30 Aug 2004
TL;DR: In this article, the complexity of the classifier is made dependent on the input image patch by using a Cascaded Reduced Set Vector expansion of the SVM, which has a Haar-like structure enabling a very fast SVM kernel evaluation by using the Integral Image.
Abstract: In this paper, we present a novel method for reducing the computational complexity of a Support Vector Machine (SVM) classifier without significant loss of accuracy. We apply this algorithm to the problem of face detection in images. To achieve high run-time efficiency, the complexity of the classifier is made dependent on the input image patch by use of a Cascaded Reduced Set Vector expansion of the SVM. The novelty of the algorithm is that the Reduced Set Vectors have a Haar-like structure enabling a very fast SVM kernel evaluation by use of the Integral Image. It is shown in the experiments that this novel algorithm provides, for a comparable accuracy, a 200-fold speed-up over the SVM and a 6-fold speed-up over the Cascaded Reduced Set Vector Machine.
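The integral-image trick underpinning the fast kernel evaluation is easy to sketch: after one precomputation pass, the sum over any axis-aligned rectangle costs four lookups regardless of its size, which is what makes Haar-structured reduced set vectors cheap to apply.

```python
# Integral image (summed-area table) sketch.
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[:r+1, :c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 3) == img[1:3, 1:4].sum()
```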

Proceedings ArticleDOI
François Poulet
01 Nov 2004
TL;DR: In this article, the authors present a cooperative approach using both support vector machine (SVM) algorithms and visualization methods to help the user to evaluate and explain the SVM results.
Abstract: We present a cooperative approach using both support vector machine (SVM) algorithms and visualization methods. SVM are widely used today and often give high quality results, but they are used as "black boxes" (it is very difficult to explain the obtained results) and cannot easily handle very large datasets. We have developed graphical methods to help the user evaluate and explain the SVM results. The first method is a graphical representation of the quality of the separating frontier; it is then linked with other visualization tools to help the user explain SVM results. The information provided by these graphical methods is also used for SVM parameter tuning; the methods are then used together with automatic algorithms to deal with very large datasets on standard computers. We present an evaluation of our approach with the UCI and the Kent Ridge Bio-medical data sets.

Proceedings ArticleDOI
Weiguo Fan, Ming Luo, Li Wang, Wensi Xi, Edward A. Fox
25 Jul 2004
TL;DR: This paper argues that the ranking function should be tuned first, using user-provided queries, before applying the blind feedback technique, and shows that combining ranking function tuning and blind feedback can improve search performance by almost 30% over the baseline Okapi system.
Abstract: Both ranking functions and user queries are very important factors affecting a search engine's performance. Prior research has looked at how to improve ad-hoc retrieval performance for existing queries by tuning the ranking function, or at how to modify and expand user queries under a fixed ranking scheme using blind feedback. However, almost no research has looked at how to combine ranking function tuning and blind feedback to improve ad-hoc retrieval performance. In this paper, we look at the performance improvement for ad-hoc retrieval from a more integrated point of view by combining the merits of both techniques. In particular, we argue that the ranking function should be tuned first, using user-provided queries, before applying the blind feedback technique. The intuition is that a highly-tuned ranking offers more high-quality documents at the top of the hit list, and thus offers a stronger baseline for blind feedback. We verify this integrated model on a large-scale heterogeneous collection, and the experimental results show that combining ranking function tuning and blind feedback can improve search performance by almost 30% over the baseline Okapi system.
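A minimal sketch of the blind feedback step applied after the tuned first pass: the top-k first-pass results are assumed relevant and the query is expanded Rocchio-style. The vector representations and parameters below are illustrative assumptions, not the paper's configuration.

```python
# Blind (pseudo-)relevance feedback sketch, Rocchio-style expansion.
import numpy as np

def blind_feedback(query_vec, doc_vecs, first_pass_scores, k=10,
                   alpha=1.0, beta=0.75):
    # Assume the top-k results of the (highly-tuned) first pass are relevant.
    top_k = np.argsort(first_pass_scores)[-k:]
    centroid = doc_vecs[top_k].mean(axis=0)
    return alpha * query_vec + beta * centroid   # expanded query vector
```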

Journal Article
TL;DR: A novel method for reducing the computational complexity of a Support Vector Machine (SVM) classifier without significant loss of accuracy is presented and this algorithm is applied to the problem of face detection in images.
Abstract: In this paper, we present a novel method for reducing the computational complexity of a Support Vector Machine (SVM) classifier without significant loss of accuracy. We apply this algorithm to the problem of face detection in images. To achieve high run-time efficiency, the complexity of the classifier is made dependent on the input image patch by use of a Cascaded Reduced Set Vector expansion of the SVM. The novelty of the algorithm is that the Reduced Set Vectors have a Haar-like structure enabling a very fast SVM kernel evaluation by use of the Integral Image. It is shown in the experiments that this novel algorithm provides, for a comparable accuracy, a 200-fold speed-up over the SVM and a 6-fold speed-up over the Cascaded Reduced Set Vector Machine.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: A generic fuzzy scheme for the ranking of multivariate data is presented, based on a comparison function of two numbers whose values are fused by a T-norm across all components of two vectors, giving the comparison values.
Abstract: This paper presents a generic fuzzy scheme for the ranking of multivariate data. The scheme is based on a comparison function of two numbers. The comparison function values are fused by a T-norm for all components of two vectors, giving the comparison values. For each vector in a set of vectors, the smallest value of its comparison values with all other elements of the set is assigned to this vector as its ranking value. Then, all further processing is based on the ranking values alone. As a suitable comparison function, bounded division is identified. The application of the scheme to define a color morphology and an evolutionary multiobjective optimization algorithm is demonstrated.
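A minimal sketch of the scheme for nonnegative data, using minimum as the T-norm and a bounded-division comparison; the exact comparison function and T-norm in the paper may differ from these illustrative choices.

```python
# Fuzzy multivariate ranking sketch: componentwise comparison fused by
# a T-norm; each vector's ranking value is its smallest comparison
# value against all other vectors.
def bounded_div(a, b):
    """Comparison of two nonnegative numbers, mapped into [0, 1]."""
    return min(1.0, a / b) if b > 0 else 1.0

def ranking_values(vectors):
    n = len(vectors)
    ranks = []
    for i in range(n):
        comps = []
        for j in range(n):
            if i == j:
                continue
            # T-norm (minimum) fuses the per-component comparisons.
            comps.append(min(bounded_div(a, b)
                             for a, b in zip(vectors[i], vectors[j])))
        ranks.append(min(comps))   # worst pairwise comparison value
    return ranks

print(ranking_values([(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]))
```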

Proceedings Article
01 Dec 2004
TL;DR: In this article, the authors argue that the choice of the SVM cost parameter can be critical and derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.

Proceedings Article
01 Dec 2004
TL;DR: The Preference Learning Model is proposed as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective, and an original kernel-based method is introduced and evaluated on a ranking dataset with state-of-the-art results.
Abstract: Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results.

Journal ArticleDOI
TL;DR: The purpose of this paper is to present a learning algorithm to classify data with nonlinear characteristics by applying the SVM method to AVO classification of gas sand and wet sand.
Abstract: The purpose of this paper is to present a learning algorithm to classify data with nonlinear characteristics. The Support Vector Machine (SVM) is a novel type of learning machine based on statistical learning theory [Vapnik, 1998]. It implements the following idea: it maps the input vector X into a high-dimensional feature space Z through some nonlinear mapping, chosen a priori. In this space, an optimal separating hyperplane is constructed to separate data groupings. The SVM learning method can be used to classify seismic data patterns for exploration and reservoir characterization applications, and it is particularly good at classifying data with nonlinear characteristics. As an example, the SVM method is applied to AVO classification of gas sand and wet sand.

Journal ArticleDOI
TL;DR: A DEA-CP model, which initially employs the flexible DEA weighting system, can totally rank the entities without specifying anything arbitrary and avoids relying on the diverse DEA weights.
Abstract: This paper addresses comprehensive ranking systems that determine an ordering of entities by aggregating quantitative data for multiple attributes. We propose a DEA-CP (Data Envelopment Analysis - Compromise Programming) model for comprehensive ranking, including preference voting (ranked voting) to rank candidates in terms of the aggregate vote by rank for each candidate. Although the DEA-CP model initially employs the flexible DEA weighting system, which can vary by entity, it ultimately aims at regressing to common weights across the entities. Therefore, the model can totally rank the entities without specifying anything arbitrary, and avoids using the diverse DEA weights.

Proceedings ArticleDOI
19 May 2004
TL;DR: A novel link-based ranking algorithm, RBS, is presented; it may be viewed as an extension of PageRank with a back-step feature.
Abstract: We present a novel link-based ranking algorithm, RBS, which may be viewed as an extension of PageRank with a back-step feature.