
Showing papers on "Ranking SVM published in 2005"


Proceedings ArticleDOI
07 Aug 2005
TL;DR: RankNet is introduced: an implementation of gradient-descent learning of ranking functions that uses a neural network to model the underlying ranking function; test results on toy data and on data from a commercial internet search engine are presented.
Abstract: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.
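The "simple probabilistic cost function" the abstract refers to is a cross-entropy on pairwise score differences. A minimal sketch in NumPy (our notation; `ranknet_cost` is an illustrative helper, not code from the paper):

```python
import numpy as np

def ranknet_cost(s_i, s_j, p_target):
    """RankNet-style pairwise cost: map the score difference
    o = s_i - s_j through a logistic function to obtain the modelled
    probability that item i ranks above item j, then take the
    cross-entropy against the target pair probability."""
    o = s_i - s_j
    p_model = 1.0 / (1.0 + np.exp(-o))
    return -p_target * np.log(p_model) - (1.0 - p_target) * np.log(1.0 - p_model)
```

Gradient descent on this cost through any differentiable scoring model (a neural network in the paper) yields the training procedure.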

2,813 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: It is shown that existing SVM software can be used to solve the SVM/LDA formulation, and empirical comparisons of the proposed algorithm with SVM and LDA on both synthetic and real-world benchmark data are presented.
Abstract: This paper describes a new large margin classifier, named SVM/LDA. This classifier can be viewed as an extension of support vector machine (SVM) by incorporating some global information about the data. The SVM/LDA classifier can be also seen as a generalization of linear discriminant analysis (LDA) by incorporating the idea of (local) margin maximization into standard LDA formulation. We show that existing SVM software can be used to solve the SVM/LDA formulation. We also present empirical comparisons of the proposed algorithm with SVM and LDA using both synthetic and real world benchmark data.

1,030 citations


Journal ArticleDOI
TL;DR: This paper investigates the predictability of financial movement direction with SVM by forecasting the weekly movement direction of the NIKKEI 225 index, and proposes a combined model integrating SVM with other classification methods.

984 citations


Proceedings ArticleDOI
21 Aug 2005
TL;DR: A novel approach for using clickthrough data to learn ranked retrieval functions for web search results by using query chains to generate new types of preference judgments from search engine logs, thus taking advantage of user intelligence in reformulating queries.
Abstract: This paper presents a novel approach for using clickthrough data to learn ranked retrieval functions for web search results. We observe that users searching the web often perform a sequence, or chain, of queries with a similar information need. Using query chains, we generate new types of preference judgments from search engine logs, thus taking advantage of user intelligence in reformulating queries. To validate our method we perform a controlled user study comparing generated preference judgments to explicit relevance judgments. We also implemented a real-world search engine to test our approach, using a modified ranking SVM to learn an improved ranking function from preference data. Our results demonstrate significant improvements in the ranking given by the search engine. The learned rankings outperform both a static ranking function, as well as one trained without considering query chains.
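The ranking-SVM step reduces preference judgments to pairwise classification. A simplified sketch under our own naming (the paper's specific modifications to the ranking SVM are not reproduced here):

```python
import numpy as np

def pairwise_examples(features, preferences):
    """Turn preference judgments (doc a preferred over doc b), e.g.
    mined from clickthrough logs over query chains, into labelled
    difference vectors: a linear SVM trained on them learns a weight
    vector w such that w.x_a > w.x_b for preferred pairs."""
    X, y = [], []
    for a, b in preferences:
        diff = np.asarray(features[a], float) - np.asarray(features[b], float)
        X.append(diff)
        y.append(1)      # a should rank above b
        X.append(-diff)
        y.append(-1)     # mirrored example
    return np.array(X), np.array(y)
```

Any standard linear SVM trained on `(X, y)` then yields a ranking function `w.x`.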

530 citations


Journal ArticleDOI
TL;DR: This article works within the hubs and authorities framework defined by Kleinberg and proposes new families of algorithms, and provides an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links.
Abstract: The explosive growth and the widespread accessibility of the Web has led to a surge of research activity in the area of information retrieval on the World Wide Web. The seminal papers of Kleinberg [1998, 1999] and Brin and Page [1998] introduced Link Analysis Ranking, where hyperlink structures are used to determine the relative authority of a Web page and produce improved algorithms for the ranking of Web search results. In this article we work within the hubs and authorities framework defined by Kleinberg and we propose new families of algorithms. Two of the algorithms we propose use a Bayesian approach, as opposed to the usual algebraic and graph theoretic approaches. We also introduce a theoretical framework for the study of Link Analysis Ranking algorithms. The framework allows for the definition of specific properties of Link Analysis Ranking algorithms, as well as for comparing different algorithms. We study the properties of the algorithms that we define, and we provide an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links. We conclude the article with an extensive experimental evaluation. We study the quality of the algorithms, and we examine how different structures in the graphs affect their performance.

323 citations


Proceedings ArticleDOI
13 Oct 2005
TL;DR: A comparison of least squares support vector machines (LS-SVM) with SVM for regression shows that LS-SVM is preferable, especially for large-scale problems, because its solution procedure is highly efficient and, after pruning, both its sparseness and its performance are comparable with those of SVM.
Abstract: Support vector machines (SVM) have been widely used in classification and nonlinear function estimation. However, the major drawback of SVM is the high computational burden of its constrained optimization programming. This disadvantage is overcome by least squares support vector machines (LS-SVM), which solve a set of linear equations instead of a quadratic programming problem. This paper compares LS-SVM with SVM for regression. The parallel test results support the conclusion that LS-SVM is preferable, especially for large-scale problems, because its solution procedure is highly efficient and, after pruning, both the sparseness and the performance of LS-SVM are comparable with those of SVM.
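The "linear equations instead of a quadratic programming problem" point can be made concrete. A sketch of standard LS-SVM regression with an RBF kernel (parameter names `gamma` and `sigma` follow the usual LS-SVM formulation; this is an illustration, not the paper's code):

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM regression: rather than solving a QP, solve one linear
    system in (b, alpha):  [0  1^T; 1  K + I/gamma][b; alpha] = [0; y]."""
    n = len(y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # RBF kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]

    def predict(Xt):
        sq_t = np.sum((Xt[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_t / (2.0 * sigma ** 2)) @ alpha + b

    return predict
```

Note the trade-off the abstract mentions: every training point receives a nonzero `alpha`, so sparseness is lost unless pruning is applied afterwards.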

277 citations


Journal ArticleDOI
TL;DR: The SVM algorithm is applied to the problem of virtual screening for molecules with a desired activity by using a modified version of the standard SVM function to rank molecules and employing a simple and novel criterion for picking molecular descriptors.
Abstract: The Support Vector Machine (SVM) is an algorithm that derives a model used for the classification of data into two categories and which has good generalization properties. This study applies the SVM algorithm to the problem of virtual screening for molecules with a desired activity. In contrast to typical applications of the SVM, we emphasize not classification but enrichment of actives by using a modified version of the standard SVM function to rank molecules. The method employs a simple and novel criterion for picking molecular descriptors and uses cross-validation to select SVM parameters. The resulting method is more effective at enriching for active compounds with novel chemistries than binary fingerprint-based methods such as binary kernel discrimination.

257 citations


Journal ArticleDOI
TL;DR: A geometric interpretation of SVM with indefinite kernel functions is given and it is shown that such SVM are optimal hyperplane classifiers not by margin maximization, but by minimization of distances between convex hulls in pseudo-Euclidean spaces.
Abstract: Kernel methods are becoming increasingly popular for various kinds of machine learning tasks, the most famous being the support vector machine (SVM) for classification. The SVM is well understood when using conditionally positive definite (cpd) kernel functions. However, in practice, non-cpd kernels arise and demand application in SVM. The procedure of "plugging" these indefinite kernels in SVM often yields good empirical classification results. However, they are hard to interpret due to missing geometrical and theoretical understanding. In this paper, we provide a step toward the comprehension of SVM classifiers in these situations. We give a geometric interpretation of SVM with indefinite kernel functions. We show that such SVM are optimal hyperplane classifiers not by margin maximization, but by minimization of distances between convex hulls in pseudo-Euclidean spaces. By this, we obtain a sound framework and motivation for indefinite SVM. This interpretation is the basis for further theoretical analysis, e.g., investigating uniqueness, and for the derivation of practical guidelines like characterizing the suitability of indefinite SVM.

242 citations


Proceedings ArticleDOI
15 Aug 2005
TL;DR: A novel ranking scheme named Affinity Ranking (AR) is proposed to re-rank search results by optimizing two metrics: diversity -- which indicates the variance of topics in a group of documents; and information richness -- which measures the coverage of a single document to its topic.
Abstract: In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of documents; and (2) information richness, which measures how well a single document covers its topic. Both metrics are calculated from a directed link graph named the Affinity Graph (AG), which models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results on Yahoo! Directory, ODP, and Newsgroup data demonstrate that our proposed ranking algorithm significantly improves search performance. Specifically, the algorithm achieves relative improvements of 31% in diversity and 12% in information richness within the top 10 search results.
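The diversity/information-richness trade-off can be illustrated with a greedy re-ranker. This sketch is closer to MMR-style re-ranking than to the paper's exact score propagation on the Affinity Graph; the function name and penalty scheme are ours:

```python
import numpy as np

def greedy_rerank(richness, affinity, m):
    """Greedy diversity re-ranking in the spirit of Affinity Ranking:
    repeatedly pick the document with the highest remaining
    information-richness score, then penalise the other documents
    by their affinity (content similarity) to the picked one."""
    scores = np.array(richness, dtype=float)
    chosen = []
    for _ in range(m):
        scores[chosen] = -np.inf          # never re-pick a chosen doc
        best = int(np.argmax(scores))
        chosen.append(best)
        scores = scores - affinity[best]  # diversity penalty
    return chosen
```

With a high affinity between the two richest documents, the re-ranker prefers a less rich but more novel document for the second slot.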

211 citations


Journal ArticleDOI
TL;DR: This article shows that the convergence behavior of the linear programming SVM is almost the same as that of the quadratic programming SVM, and proposes an upper bound for the misclassification error for general probability distributions.
Abstract: Support vector machine (SVM) soft margin classifiers are important learning algorithms for classification problems. They can be stated as convex optimization problems and are suitable for a large data setting. Linear programming SVM classifiers are especially efficient for very large size samples. But little is known about their convergence, compared with the well-understood quadratic programming SVM classifier. In this article, we point out the difficulty and provide an error analysis. Our analysis shows that the convergence behavior of the linear programming SVM is almost the same as that of the quadratic programming SVM. This is implemented by setting a stepping-stone between the linear programming SVM and the classical 1-norm soft margin classifier. An upper bound for the misclassification error is presented for general probability distributions. Explicit learning rates are derived for deterministic and weakly separable distributions, and for distributions satisfying some Tsybakov noise condition.

179 citations


Journal ArticleDOI
TL;DR: This paper proposes a classification technique, called the Genetic Kernel SVM (GK SVM), that uses Genetic Programming to evolve a kernel for an SVM classifier.
Abstract: The Support Vector Machine (SVM) has emerged in recent years as a popular approach to the classification of data. One problem that faces the user of an SVM is how to choose a kernel and the specific parameters for that kernel. Applications of an SVM therefore require a search for the optimum settings for a particular problem. This paper proposes a classification technique, which we call the Genetic Kernel SVM (GK SVM), that uses Genetic Programming to evolve a kernel for an SVM classifier. Results of initial experiments with the proposed technique are presented. These results are compared with those of a standard SVM classifier using the Polynomial, RBF and Sigmoid kernels with various parameter settings.

Journal Article
TL;DR: One distinctive feature of this SVM-based learning system for information extraction is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets.
Abstract: This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.

Journal ArticleDOI
TL;DR: Experiments show that the KMSVM algorithm can speed up the response time of classifiers by reducing the number of support vectors while maintaining testing accuracy similar to SVM.
Abstract: Support vector machines (SVM) have been applied to build classifiers that can help users make well-informed business decisions. Despite their high generalisation accuracy, the response time of SVM classifiers is still a concern when they are applied in real-time business intelligence systems, such as stock market surveillance and network intrusion detection. This paper speeds up the response of SVM classifiers by reducing the number of support vectors, using the K-means SVM (KMSVM) algorithm proposed in this paper. The KMSVM algorithm combines the K-means clustering technique with SVM and requires one more input parameter to be determined: the number of clusters. The criterion and strategy for determining the input parameters of the KMSVM algorithm are given in this paper. Experiments compare the KMSVM algorithm with SVM on real-world databases, and the results show that the KMSVM algorithm can speed up the response time of classifiers by reducing the number of support vectors while maintaining testing accuracy similar to SVM.
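The compression step of a KMSVM-style pipeline can be sketched as follows: cluster each class separately and train any SVM on the cluster centres instead of the raw samples. This is our simplified reading of the idea (plain Lloyd's K-means; function names are ours):

```python
import numpy as np

def kmeans_centres(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means, returning k cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centres[j] = pts.mean(0)
    return centres

def kmsvm_training_set(X, y, k):
    """Replace each class's samples with k cluster centres; an SVM
    trained on the result has at most k candidate support vectors
    per class, shrinking the final classifier."""
    Xs, ys = [], []
    for c in np.unique(y):
        C = kmeans_centres(X[y == c], k)
        Xs.append(C)
        ys.append(np.full(len(C), c))
    return np.vstack(Xs), np.concatenate(ys)
```

The number of clusters `k` is the extra input parameter the abstract mentions.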

Journal ArticleDOI
TL;DR: A new approach to ranking fuzzy numbers by metric distance is proposed; the results indicate that the new method coincides with intuitive ranking and with Lee and Li's fuzzy mean/spread method for each type of weight.

Proceedings ArticleDOI
Hwanjo Yu
21 Aug 2005
TL;DR: The proposed sampling technique effectively learns an accurate SVM ranking function with fewer partial orders, and is applied to the data retrieval application, which enables fuzzy search on relational databases by interacting with users for learning their preferences.
Abstract: Learning ranking (or preference) functions has been a major issue in the machine learning community and has produced many applications in information retrieval. SVMs (Support Vector Machines) - a classification and regression methodology - have also shown excellent performance in learning ranking functions. They effectively learn ranking functions of high generalization based on the "large-margin" principle and also systematically support nonlinear ranking by the "kernel trick". In this paper, we propose an SVM selective sampling technique for learning ranking functions. SVM selective sampling (or active learning with SVM) has been studied in the context of classification. Such techniques reduce the labeling effort in learning classification functions by selecting only the most informative samples to be labeled. However, they are not extendable to learning ranking functions, as the labeled data in ranking is relative ordering, or partial orders of data. Our proposed sampling technique effectively learns an accurate SVM ranking function with fewer partial orders. We apply our sampling technique to the data retrieval application, which enables fuzzy search on relational databases by interacting with users for learning their preferences. Experimental results show a significant reduction of the labeling effort in inducing accurate ranking functions.
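A simple margin-style heuristic conveys the flavour of selective sampling for ranking: ask the user to label the pairs the current model is least certain about. This is our illustration, not the paper's exact selection criterion:

```python
import numpy as np

def most_ambiguous_pairs(scores, pairs, m):
    """Select the m candidate pairs whose current ranking scores are
    closest, i.e. the partial orders the model is least sure of;
    these are the most informative labels to request from the user."""
    gaps = [abs(scores[i] - scores[j]) for i, j in pairs]
    order = np.argsort(gaps)
    return [pairs[t] for t in order[:m]]
```

After each round of labelling, the ranking SVM is retrained and the scores recomputed, so the labeling effort concentrates on genuinely ambiguous orderings.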

Proceedings ArticleDOI
25 Mar 2005
TL;DR: Through the fusion of GA and SVM, the "optimal detection model" for the SVM classifier can be determined; this fusion enhances the overall performance of the SVM-based IDS.
Abstract: In this paper, we propose using a genetic algorithm (GA) to improve a support vector machine (SVM) based intrusion detection system (IDS). SVM is a relatively novel classification technique and has shown higher performance than traditional learning methods in many applications, so several security researchers have proposed SVM-based IDS. We use a fusion of GA and SVM to enhance the overall performance of the SVM-based IDS. Through this fusion, the "optimal detection model" for the SVM classifier can be determined: the SVM-based IDS selects not only "optimal parameters" for the SVM but also an "optimal feature set" from the whole feature set. We demonstrate the feasibility of our method by performing several experiments on the KDD 1999 intrusion detection competition dataset.

Journal ArticleDOI
TL;DR: Compared to the SVM initially trained on all samples, the efficiency of the finally trained SVM is greatly improved, without system degradation.

Book ChapterDOI
27 Jun 2005
TL;DR: A new algorithm, Smooth Margin Ranking, is described, a modification of RankBoost, analogous to Approximate Coordinate Ascent Boosting, that precisely converges to a maximum ranking-margin solution.
Abstract: We present several results related to ranking. We give a general margin-based bound for ranking based on the L∞ covering number of the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin generalize well. We then describe a new algorithm, Smooth Margin Ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to Approximate Coordinate Ascent Boosting. We also prove a remarkable property of AdaBoost: under very natural conditions, AdaBoost maximizes the exponentiated loss associated with the AUC and achieves the same AUC as RankBoost. This explains the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a ranking algorithm, as measured by the AUC.

Journal ArticleDOI
TL;DR: A new centroid-index ranking method for fuzzy numbers is proposed, using the idea of the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS).
Abstract: Ranking fuzzy numbers plays a very important role in decision-making problems. Existing centroid-index ranking methods have some drawbacks. In this article, a new centroid-index ranking method for fuzzy numbers is proposed, based on the idea of the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS). Some numerical examples show that the new method can overcome the drawbacks of the existing methods. Finally, a human selection problem is used to illustrate the efficiency of the proposed fuzzy ranking method.
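The basic centroid-index idea can be shown with triangular fuzzy numbers, whose centroid x-coordinate is the simple average of the three defining points (a standard formula; the paper's TOPSIS-based index is more elaborate than this sketch):

```python
def centroid(tfn):
    """Centroid x-coordinate of a triangular fuzzy number (a, b, c):
    the mean of its three defining points."""
    a, b, c = tfn
    return (a + b + c) / 3.0

def rank_by_centroid(tfns):
    """Rank triangular fuzzy numbers from largest centroid to
    smallest; a minimal centroid-index ranking."""
    return sorted(tfns, key=centroid, reverse=True)
```

The drawbacks the abstract refers to arise when distinct fuzzy numbers share the same centroid index, which is why refined indices such as the proposed TOPSIS-based one are needed.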

Book ChapterDOI
27 Jun 2005
TL;DR: It is shown that kernel-based ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore the bounds can be applied to these algorithms; this is in contrast with previous generalization bounds for ranking, which are based on uniform convergence and in many cases cannot be appliedto these algorithms.
Abstract: The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. We study generalization properties of ranking algorithms, in a particular setting of the ranking problem known as the bipartite ranking problem, using the notion of algorithmic stability. In particular, we derive generalization bounds for bipartite ranking algorithms that have good stability properties. We show that kernel-based ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore our bounds can be applied to these algorithms; this is in contrast with previous generalization bounds for ranking, which are based on uniform convergence and in many cases cannot be applied to these algorithms. A comparison of the bounds we obtain with corresponding bounds for classification algorithms yields some interesting insights into the difference in generalization behaviour between ranking and classification.

Book ChapterDOI
27 Jun 2005
TL;DR: This work investigates learning methods based on empirical minimization of the natural estimates of the ranking risk of U-statistics and U-processes to give a theoretical framework for ranking algorithms based on boosting and support vector machines.
Abstract: A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of natural estimates of the ranking risk. The empirical estimates take the form of a U-statistic. Inequalities from the theory of U-statistics and U-processes are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just as in binary classification, fast rates of convergence are achieved under certain noise assumptions. General sufficient conditions guaranteeing fast rates of convergence are proposed in several special cases.

Proceedings ArticleDOI
07 Nov 2005
TL;DR: A parallel training algorithm for large-scale classification problems is proposed, in which multiple SVM classifiers are applied and may be trained in a distributed computer system; experiments show that this parallel SVM training algorithm is efficient and achieves better classification precision than the standard cascade SVM algorithm.
Abstract: The support vector machine (SVM) has become a popular classification tool, but the main disadvantages of SVM algorithms are their large memory requirements and long computation times on very large datasets. To speed up SVM training, parallel methods have been proposed that split the problem into smaller subsets and train a network to assign samples to different subsets. A parallel training algorithm for large-scale classification problems is proposed, in which multiple SVM classifiers are applied and may be trained in a distributed computer system. As an improvement on the cascade SVM, the support vectors are obtained according to the mean distance of the data samples, and the feedback is alternated rather than taken from the whole final output, to avoid the learning results depending on how the data samples are distributed among the subsets. Experimental results on a real-world text dataset show that this parallel SVM training algorithm is efficient and achieves better classification precision than the standard cascade SVM algorithm.

Proceedings ArticleDOI
06 Oct 2005
TL;DR: Novel measures (both collocation-based and context-based) of the relative compositionality of MWEs of V-N type are defined, and it is shown that the correlation of these features with the human ranking is much superior to the correlation of the traditional features with the human ranking.
Abstract: Measuring the relative compositionality of Multi-word Expressions (MWEs) is crucial to Natural Language Processing. Various collocation-based measures have been proposed to compute the relative compositionality of MWEs. In this paper, we define novel measures (both collocation-based and context-based) of the relative compositionality of MWEs of V-N type. We show that the correlation of these features with the human ranking is much superior to the correlation of the traditional features with the human ranking. We then integrate the proposed features and the traditional features using an SVM-based ranking function to rank the collocations of V-N type by their relative compositionality. Finally, we show that the correlation between the ranks computed by the SVM-based ranking function and the human ranking is significantly better than the correlation between the ranking of individual features and the human ranking.

Journal ArticleDOI
TL;DR: An effective model to rank candidates in a preferential election is proposed that is an extension and simplified form of a recently proposed model for ranking efficient candidates and can be used for ranking inefficient as well as efficient candidates.
Abstract: In this paper an effective model to rank candidates in a preferential election is proposed. It is an extension and simplified form of a recently proposed model for ranking efficient candidates. The model consists of fewer constraints and can be used for ranking inefficient as well as efficient candidates. Some techniques are introduced to decrease the complexity of the proposed model by obtaining some of the results by inspection.

Proceedings ArticleDOI
10 May 2005
TL;DR: Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi, indicating that generic models for definition ranking can be constructed.
Abstract: This paper is concerned with the problem of definition search. Specifically, given a term, we are to retrieve definitional excerpts of the term and rank the extracted excerpts according to their likelihood of being good definitions. This is in contrast to the traditional approaches of either generating a single combined definition or simply outputting all retrieved definitions. Definition ranking is essential for the task. Methods for performing definition ranking are proposed in this paper, which formalize the problem as either classification or ordinal regression. A specification for judging the goodness of a definition is given. We employ SVM as the classification model and Ranking SVM as the ordinal regression model respectively, such that they rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined. An enterprise search system based on this method has been developed and has been put into practical use. Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi. This is true both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.

Book ChapterDOI
21 Mar 2005
TL;DR: This paper investigates a new approach for Single Document Summarization based on a Machine Learning ranking algorithm and proposes an original framework based on ranking for this task, believing that the classification criterion for training a classifier is not adapted for SDS.
Abstract: This paper investigates a new approach for Single Document Summarization based on a Machine Learning ranking algorithm. The use of machine learning techniques for this task allows one to adapt summaries to user needs and to corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting text spans (sentences in our case) and adopt the classification framework, which consists of training a classifier to discriminate between relevant and irrelevant spans of a document. A set of features is first used to produce a vector of scores for each sentence in a given document, and a classifier is trained to make a global combination of these scores. We believe that the classification criterion for training a classifier is not well suited to SDS and propose an original framework based on ranking for this task. A ranking algorithm also combines the scores of different features, but its criterion tends to reduce the relative misordering of sentences within a document. The features we use here are either based on the state of the art or built upon word clusters. These clusters are groups of words that often co-occur with each other and can serve to expand a query or to enrich the representation of the sentences of the documents. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC and the WIPO collection. We perform comparisons with different baseline (non-learning) systems and with a reference trainable summarizer based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems, while the ranking algorithm outperforms the classifier. The difference in performance between the two learning algorithms depends on the nature of the datasets. We explain this fact by the different separability hypotheses about the data made by the two learning algorithms.

Journal ArticleDOI
TL;DR: It is pointed out that, under any type of ranking function, different fuzzy numbers mapping to the same real number cannot be avoided; therefore, a sequential screening approach is necessary for a complete ordering of fuzzy numbers.
Abstract: Fuzzy numbers are used to represent numerical quantities in a vague environment. Their ranking order has been an important issue for application purposes. Most of the existing ranking methods transform a fuzzy number into a real number based on certain criteria. There is as yet no method that always gives a satisfactory solution. This study points out that, under any type of ranking function, different fuzzy numbers mapping to the same real number cannot be avoided. Therefore, a sequential screening approach is necessary for a complete ordering of fuzzy numbers. We summarize the properties of a sufficient and necessary set of ranking criteria. Then, based on the ranking order of these criteria, a Lexicographical Screening Procedure is proposed, which has been shown to be promising for ranking fuzzy numbers with efficient computation and ease of understanding. Six counterexamples to the existing methods were used for comparison.

Journal ArticleDOI
TL;DR: The results suggest that the set-based vector model provides a correlation-based ranking formula that is effective with general collections and computationally practical.
Abstract: This work presents a new approach for ranking documents in the vector space model. The novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and are processed efficiently. Second, term weights are generated using a data mining technique called association rules. This leads to a new ranking mechanism called the set-based vector model. The components of our model are no longer index terms but index termsets, where a termset is a set of index terms. Termsets capture the intuition that semantically related terms appear close to each other in a document. They can be efficiently obtained by limiting the computation to small passages of text. Once termsets have been computed, the ranking is calculated as a function of the termset frequency in the document and its scarcity in the document collection. Experimental results show that the set-based vector model improves average precision for all collections and query types evaluated, while keeping computational costs small. For the 2-gigabyte TREC-8 collection, the set-based vector model leads to a gain in average precision figures of 14.7% and 16.4% for disjunctive and conjunctive queries, respectively, with respect to the standard vector space model. These gains increase to 24.9% and 30.0%, respectively, when proximity information is taken into account. Query processing times are larger but, on average, still comparable to those obtained with the standard vector model (increases in processing time varied from 30% to 300%). Our results suggest that the set-based vector model provides a correlation-based ranking formula that is effective with general collections and computationally practical.

Journal ArticleDOI
TL;DR: The goal is to find a rank-prediction rule that assigns each instance a rank that is as close as possible to the instance's true rank.
Abstract: We discuss the problem of ranking instances. In our framework, each instance is associated with a rank or a rating, which is an integer in 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank that is as close as possible to the instance's true rank. We discuss a group of closely related online algorithms, analyze their performance in the mistake-bound model, and prove their correctness. We describe two sets of experiments, with synthetic data and with the EachMovie data set for collaborative filtering. In the experiments we performed, our algorithms outperform online algorithms for regression and classification applied to ranking.
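The family of online algorithms described here maintains a weight vector and a set of ordered thresholds that cut the score line into the k ranks. A sketch of one such mistake-driven update, in the style of the PRank algorithm (our simplified rendering, not the paper's exact pseudocode):

```python
import numpy as np

def prank_update(w, b, x, y_true):
    """One PRank-style online update. The k-1 thresholds in b split
    the score w.x into k ranks. On a mistaken prediction, w and every
    violated threshold are moved toward the correct rank interval."""
    k = len(b) + 1
    score = w @ x
    y_hat = 1 + int(np.sum(score > b))          # predicted rank in 1..k
    if y_hat != y_true:
        # direction each threshold should move relative to y_true
        y_r = np.where(np.arange(1, k) < y_true, 1, -1)
        tau = np.where((score - b) * y_r <= 0, y_r, 0)
        w = w + tau.sum() * x
        b = b - tau
    return w, b, y_hat
```

Repeating this update over a stream of (instance, rank) pairs drives the number of rank mistakes down on separable data, which is the sense in which these online algorithms are analyzed in the mistake-bound model.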

Journal ArticleDOI
TL;DR: This work considers the convex quadratic programming problem arising in support vector machine (SVM), which is a technique designed to solve a variety of learning and pattern recognition problems, and proposes a decomposition method on the basis of a proximal point modification of the subproblem and a working set selection rule.
Abstract: In this work, we consider the convex quadratic programming problem arising in the support vector machine (SVM), a technique designed to solve a variety of learning and pattern recognition problems. Since the Hessian matrix is dense and real applications lead to large-scale problems, several decomposition methods have been proposed, which split the original problem into a sequence of smaller subproblems. The SVM light algorithm is a commonly used decomposition method for SVM, and its convergence has been proved only recently under a suitable block-wise convexity assumption on the objective function. In the SVM light algorithm, the size q of the working set, i.e. the dimension of the subproblem, can be any even number. In the present paper, we propose a decomposition method based on a proximal point modification of the subproblem and on a working set selection rule that includes, as a particular case, the one used by the SVM light algorithm. We establish the asymptotic convergence of the method.