
Showing papers on "Ranking (information retrieval)" published in 2010


Patent
14 Sep 2010
TL;DR: An improved human-computer interface system is proposed in which a user characteristic or set of characteristics, such as demographic profile or societal role, is employed to define a scope or domain of operation; user privacy and anonymity are maintained by physical and algorithmic controls over access to the personal profiles and by releasing only aggregate data without personally identifying information.
Abstract: An improved human user computer interface system, wherein a user characteristic or set of characteristics, such as demographic profile or societal “role”, is employed to define a scope or domain of operation. The operation itself may be a database search, to interactively define a taxonomic context for the operation, a business negotiation, or other activity. After retrieval of results, a scoring or ranking may be applied according to user-defined criteria, which are, for example, commensurate with the relevance to the context, but may be, for example, by date, source, or other secondary criteria. A user profile is preferably stored in a computer accessible form, and may be used to provide a history of use, persistent customization, collaborative filtering and demographic information for the user. Advantageously, user privacy and anonymity are maintained by physical and algorithmic controls over access to the personal profiles, and by releasing only aggregate data without personally identifying information or data on small groups.

1,465 citations


23 Jun 2010
TL;DR: RankNet, LambdaRank, and LambdaMART have proven to be very successful algorithms for solving real world ranking problems and the details are spread across several papers and reports, so here is a self-contained, detailed and complete description of them.
Abstract: LambdaMART is the boosted tree version of LambdaRank, which is based on RankNet. RankNet, LambdaRank, and LambdaMART have proven to be very successful algorithms for solving real world ranking problems: for example an ensemble of LambdaMART rankers won Track 1 of the 2010 Yahoo! Learning To Rank Challenge. The details of these algorithms are spread across several papers and reports, and so here we give a self-contained, detailed and complete description of them.

1,114 citations
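The lambda-gradient trick at the heart of LambdaRank and LambdaMART is compact enough to sketch: compute RankNet-style pairwise gradients, but weight each by the |ΔNDCG| that swapping the two documents would cause. The following is a minimal illustrative version for a single query, not the papers' exact notation; the sigmoid parameter and toy inputs are assumptions.

```python
import math

def dcg_gain(label, rank):
    """Gain of a document with the given relevance label at a 1-based rank."""
    return (2 ** label - 1) / math.log2(rank + 1)

def lambdas_for_query(scores, labels, sigma=1.0):
    """Pairwise LambdaRank gradients for one query (illustrative sketch)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = {doc: r + 1 for r, doc in enumerate(order)}  # current 1-based ranks
    ideal = sorted(labels, reverse=True)
    idcg = sum(dcg_gain(l, r + 1) for r, l in enumerate(ideal)) or 1.0
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            # |delta NDCG| if documents i and j swapped rank positions
            delta = abs(dcg_gain(labels[i], rank[i]) + dcg_gain(labels[j], rank[j])
                        - dcg_gain(labels[i], rank[j]) - dcg_gain(labels[j], rank[i])) / idcg
            # RankNet gradient of the pairwise logistic loss, scaled by |delta NDCG|
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * rho * delta
            lam[j] -= sigma * rho * delta
    return lam

print(lambdas_for_query(scores=[0.2, 1.3, 0.4], labels=[2, 0, 1]))
```

In LambdaMART, each boosting iteration fits a regression tree to these per-document lambdas.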


Journal Article
TL;DR: OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost; its results suggest that query-independent similarity can be accurately learned even for large-scale data sets that could not be handled before.
Abstract: Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large data sets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a data set with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, data sets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale data set, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale data sets that could not be handled before.

738 citations
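The heart of OASIS is a passive-aggressive update on image triplets under the bilinear similarity S_W(p, q) = pᵀWq. Below is a minimal dense-vector sketch of that single step; the aggressiveness constant C and the random toy vectors are placeholders (the paper works with sparse representations at much larger scale).

```python
import numpy as np

def oasis_update(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive OASIS step on a triplet (p more similar to p_pos)."""
    # Hinge loss of the large-margin constraint S_W(p, p+) >= S_W(p, p-) + 1
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)
    if loss == 0.0:
        return W  # passive: constraint already satisfied
    V = np.outer(p, p_pos - p_neg)                  # gradient of the margin term
    tau = min(C, loss / (np.linalg.norm(V) ** 2))   # aggressive but bounded step
    return W + tau * V

rng = np.random.default_rng(0)
d = 5
W = np.eye(d)                         # OASIS initializes W to the identity
p, p_pos, p_neg = rng.normal(size=(3, d))
W = oasis_update(W, p, p_pos, p_neg)
print(W.round(3))
```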


Proceedings ArticleDOI
01 Jan 2010
TL;DR: This work converts the person re-identification problem from an absolute scoring problem to a relative ranking problem and develops a novel Ensemble RankSVM to overcome the scalability limitation suffered by existing SVM-based ranking methods.

Abstract: Solving the person re-identification problem involves matching observations of individuals across disjoint camera views. The problem becomes particularly hard in a busy public scene as the number of possible matches is very high. This is further compounded by significant appearance changes due to varying lighting conditions, viewing angles and body poses across camera views. To address this problem, existing approaches focus on extracting or learning discriminative features followed by template matching using a distance measure. The novelty of this work is that we reformulate the person re-identification problem as a ranking problem and learn a subspace where the potential true match is given the highest ranking, rather than using any direct distance measure. By doing so, we convert the person re-identification problem from an absolute scoring problem to a relative ranking problem. We further develop a novel Ensemble RankSVM to overcome the scalability limitation suffered by existing SVM-based ranking methods. This new model significantly reduces memory usage and is therefore much more scalable, whilst maintaining high-level performance. We present extensive experiments to demonstrate the performance gain of the proposed ranking approach over existing template matching and classification models.

736 citations
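The shift from absolute scoring to relative ranking can be made concrete with a plain RankSVM: learn a linear scoring direction from difference vectors so that a probe's true match outranks wrong candidates. This is a generic sketch of the underlying RankSVM reduction, not the paper's Ensemble RankSVM; scikit-learn, the absolute-difference features, and the toy data are all assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
d = 16
# Toy re-id setup: for each probe, one true match and several wrong candidates.
probes = rng.normal(size=(20, d))
true_matches = probes + 0.1 * rng.normal(size=(20, d))   # similar appearance
wrong = rng.normal(size=(20, 5, d))                       # unrelated people

# RankSVM reduction: classify difference vectors of (relevant, irrelevant) pairs.
X, y = [], []
for i in range(len(probes)):
    for j in range(wrong.shape[1]):
        diff = np.abs(probes[i] - true_matches[i]) - np.abs(probes[i] - wrong[i, j])
        X.append(diff); y.append(-1)   # true match should have *smaller* weighted distance
        X.append(-diff); y.append(+1)
w = LinearSVC(C=1.0).fit(np.array(X), np.array(y)).coef_[0]

def rank_candidates(probe, candidates):
    """Rank candidate indices by the learned scoring direction (best first)."""
    scores = -np.abs(probe - candidates) @ w   # higher score = better match
    return np.argsort(-scores)

cands = np.vstack([true_matches[0:1], wrong[0]])
print(rank_candidates(probes[0], cands))  # where index 0 lands shows the true match
```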


Journal ArticleDOI
Tao Qin, Tie-Yan Liu, Jun Xu, Hang Li
TL;DR: The details of the LETOR collection are described and it is shown how it can be used in different kinds of research; several state-of-the-art learning to rank algorithms are compared on LETOR.

Abstract: LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performances, and discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help people gain a deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.

486 citations
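LETOR distributes its datasets in an SVMlight-style format: one line per query-document pair carrying a relevance label, a qid, numbered feature values, and a trailing comment. A minimal parser is sketched below; the sample line is invented for illustration.

```python
def parse_letor_line(line):
    """Parse one 'label qid:Q f:v ... # comment' line of a LETOR-style file."""
    body, _, comment = line.partition('#')
    tokens = body.split()
    label = int(tokens[0])
    qid = tokens[1].split(':')[1]
    features = {int(k): float(v) for k, v in
                (tok.split(':') for tok in tokens[2:])}
    return label, qid, features, comment.strip()

sample = "2 qid:10 1:0.031310 2:0.666667 3:0.500000 #docid = GX057-59-4044939"
print(parse_letor_line(sample))
```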


Proceedings ArticleDOI
26 Apr 2010
TL;DR: A novel probabilistic framework for Web search result diversification is introduced that explicitly accounts for the various aspects associated with an underspecified query; a document ranking is diversified by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole.

Abstract: When a Web user's underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.

464 citations
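The greedy re-ranking this framework implies can be sketched compactly: at each step, select the document that best trades off relevance to the query against coverage of aspects not yet satisfied by the documents already chosen. The probabilities below are placeholder inputs (in the paper they come from retrieval scores and query reformulations), and the interpolation weight lam is illustrative.

```python
def diversify(rel, cov, aspect_prob, k, lam=0.5):
    """Greedy aspect-based diversification (illustrative sketch).

    rel[d]         -- P(d|q), relevance of document d to the query
    cov[d][a]      -- P(d|a), how well d satisfies aspect a
    aspect_prob[a] -- P(a|q), importance of aspect a for the query
    """
    selected, remaining = [], set(rel)
    # not_covered[a]: probability aspect a is still unsatisfied by `selected`
    not_covered = dict(aspect_prob)
    while remaining and len(selected) < k:
        def score(d):
            diversity = sum(not_covered[a] * cov[d][a] for a in aspect_prob)
            return (1 - lam) * rel[d] + lam * diversity
        best = max(remaining, key=score)
        selected.append(best); remaining.discard(best)
        for a in aspect_prob:  # aspect a stays uncovered only if best misses it
            not_covered[a] *= (1 - cov[best][a])
    return selected

rel = {'d1': 0.9, 'd2': 0.8, 'd3': 0.6}
cov = {'d1': {'a1': 0.9, 'a2': 0.0},
       'd2': {'a1': 0.8, 'a2': 0.1},
       'd3': {'a1': 0.0, 'a2': 0.9}}
print(diversify(rel, cov, {'a1': 0.6, 'a2': 0.4}, k=3))
```

In the toy run, d3 is picked ahead of the more relevant d2 because it covers the aspect the first pick left unsatisfied.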


Proceedings Article
15 Jul 2010
TL;DR: The participating systems were evaluated by matching their extracted keyphrases against manually assigned ones and the overall ranking of the submitted systems is presented.
Abstract: This paper describes Task 5 of the Workshop on Semantic Evaluation 2010 (SemEval-2010). Systems are to automatically assign keyphrases or keywords to given scientific articles. The participating systems were evaluated by matching their extracted keyphrases against manually assigned ones. We present the overall ranking of the submitted systems and discuss our findings to suggest future directions for this task.

413 citations
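Matching extracted keyphrases against manually assigned ones reduces to set overlap at a cutoff. A minimal scorer of that kind is below; stemming and micro-averaging over a document collection, which a full evaluation would include, are omitted.

```python
def prf_at_k(extracted, gold, k):
    """Precision/recall/F1 of the top-k extracted keyphrases vs. a gold set."""
    top = [kp.lower() for kp in extracted[:k]]
    gold = {kp.lower() for kp in gold}
    tp = sum(1 for kp in top if kp in gold)   # exact matches among the top k
    p = tp / k if k else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

extracted = ["ranking", "information retrieval", "keyphrase extraction", "semantics"]
gold = ["keyphrase extraction", "ranking", "evaluation"]
print(prf_at_k(extracted, gold, k=4))  # (0.5, 0.666..., ~0.571)
```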


Proceedings Article
21 Jun 2010
TL;DR: A general metric learning algorithm is presented, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG.
Abstract: We study metric learning as a problem of information retrieval. We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. We demonstrate experimental results on standard classification data sets, and a large-scale online dating recommendation problem.

371 citations
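The ranking measures named above are all simple functions of a ranked list of relevance labels. For reference, minimal implementations of Precision-at-k, MRR, MAP's per-query average precision, and NDCG are sketched below (AUC is omitted); the label list is a toy example.

```python
import math

def precision_at_k(rels, k):      # rels: relevance labels in ranked order
    return sum(1 for r in rels[:k] if r > 0) / k

def mrr(rels):                    # reciprocal rank of the first relevant item
    return next((1 / (i + 1) for i, r in enumerate(rels) if r > 0), 0.0)

def average_precision(rels):      # averaged over MAP's relevant positions
    hits, total = 0, 0.0
    for i, r in enumerate(rels):
        if r > 0:
            hits += 1
            total += hits / (i + 1)
    return total / max(1, sum(1 for r in rels if r > 0))

def ndcg(rels, k=None):
    k = k or len(rels)
    dcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = sorted(rels, reverse=True)
    idcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0

rels = [3, 0, 1, 0, 2]   # graded labels of retrieved items, best-scored first
print(precision_at_k(rels, 3), mrr(rels), average_precision(rels), ndcg(rels))
```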


Journal ArticleDOI
TL;DR: The interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors are discussed.
Abstract: We propose an unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1740 papers from the Neural Information Processing Systems (NIPS) Conferences, and 121,000 emails from the Enron corporation. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based on perplexity scores for test documents and precision-recall for document retrieval are used to illustrate systematic differences between the proposed author-topic model and a number of alternatives. Extensions to the model, allowing for example, generalizations of the notion of an author, are also briefly discussed.

329 citations
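The two-stage generative process is short enough to write out directly: each word token picks an author uniformly from the paper's authors, a topic from that author's topic distribution, then a word from that topic's word distribution. The sketch below only samples from the model; the toy distributions stand in for parameters the paper learns with Markov chain Monte Carlo.

```python
import random

random.seed(0)
# Placeholder model parameters (learned by MCMC in the paper).
author_topics = {"ann": [0.9, 0.1], "bob": [0.2, 0.8]}   # P(topic | author)
topic_words = [  # P(word | topic)
    {"ranking": 0.5, "retrieval": 0.4, "neuron": 0.1},
    {"neuron": 0.6, "spike": 0.3, "ranking": 0.1},
]

def sample(dist):
    """Draw one key from a {outcome: probability} dict."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate_doc(authors, n_words):
    """Author-topic generative process: author -> topic -> word, per token."""
    words = []
    for _ in range(n_words):
        author = random.choice(authors)                  # uniform over co-authors
        topic = random.choices([0, 1], weights=author_topics[author])[0]
        words.append(sample(topic_words[topic]))
    return words

print(generate_doc(["ann", "bob"], n_words=8))
```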


Book
13 Oct 2010
TL;DR: The editors first offer a thorough introduction, including a systematic categorization according to learning task and learning technique, along with a unified notation, and the first half of the book is organized into parts on applications of preference learning in multiattribute domains, information retrieval, and recommender systems.
Abstract: The topic of preferences is a new branch of machine learning and data mining, and it has attracted considerable attention in artificial intelligence research in previous years. It involves learning from observations that reveal information about the preferences of an individual or a class of individuals. Representing and processing knowledge in terms of preferences is appealing as it allows one to specify desires in a declarative way, to combine qualitative and quantitative modes of reasoning, and to deal with inconsistencies and exceptions in a flexible manner. And, generalizing beyond training data, models thus learned may be used for preference prediction. This is the first book dedicated to this topic, and the treatment is comprehensive. The editors first offer a thorough introduction, including a systematic categorization according to learning task and learning technique, along with a unified notation. The first half of the book is organized into parts on label ranking, instance ranking, and object ranking; while the second half is organized into parts on applications of preference learning in multiattribute domains, information retrieval, and recommender systems. The book will be of interest to researchers and practitioners in artificial intelligence, in particular machine learning and data mining, and in fields such as multicriteria decision-making and operations research.

304 citations


Proceedings Article
23 Aug 2010
TL;DR: This paper proposes a new ranking strategy which uses not only the content relevance of a tweet, but also the account authority and tweet-specific features such as whether a URL link is included in the tweet.
Abstract: Twitter, as one of the most popular micro-blogging services, provides large quantities of fresh information including real-time news, comments, conversation, pointless babble and advertisements. Twitter presents tweets in chronological order. Recently, Twitter introduced a new ranking strategy that considers the popularity of tweets in terms of number of retweets. This ranking method, however, takes into account neither content relevance nor the Twitter account. Therefore, a large number of pointless tweets inevitably flood the relevant tweets. This paper proposes a new ranking strategy which uses not only the content relevance of a tweet, but also the account authority and tweet-specific features such as whether a URL link is included in the tweet. We employ learning to rank algorithms to determine the best set of features through a series of experiments. The experiments demonstrate that URL presence, tweet length, and account authority form the most effective feature combination.
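The recommended feature set is cheap to compute. Below is an illustrative extractor for the three winning signals; the authority proxy (log follower count) is an assumption, since the paper defines account authority over its own data.

```python
import math
import re

def tweet_features(text, follower_count):
    """Extract the three features the study found most effective (illustrative)."""
    return {
        "has_url": bool(re.search(r"https?://\S+", text)),   # URL presence
        "length": len(text),                                 # tweet length
        "authority": math.log1p(follower_count),             # authority proxy
    }

print(tweet_features("Breaking: ranking survey out http://example.com", 12000))
```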

Journal ArticleDOI
TL;DR: A new perspective on this problem is provided by considering the existing shapes as a group and studying their similarity measures to the query shape in a graph structure; the learned similarity achieves promising improvements on both shape classification and shape clustering.

Abstract: Shape similarity and shape retrieval are very important topics in computer vision. The recent progress in this domain has been mostly driven by designing smart shape descriptors for providing better similarity measure between pairs of shapes. In this paper, we provide a new perspective to this problem by considering the existing shapes as a group, and study their similarity measures to the query shape in a graph structure. Our method is general and can be built on top of any existing shape similarity measure. For a given similarity measure, a new similarity is learned through graph transduction. The new similarity is learned iteratively so that the neighbors of a given shape influence its final similarity to the query. The basic idea here is related to PageRank ranking, which forms a foundation of Google Web search. The presented experimental results demonstrate that the proposed approach yields significant improvements over the state-of-the-art shape matching algorithms. We obtained a retrieval rate of 91.61 percent on the MPEG-7 data set, which is the highest ever reported in the literature. Moreover, the learned similarity by the proposed method also achieves promising improvements on both shape classification and shape clustering.
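The PageRank-flavored intuition, letting a shape's neighbors influence its final similarity to the query, can be sketched with a generic diffusion update f ← αPᵀf + (1−α)y on the shape graph. The row-stochastic transition matrix, the weight α, and the toy affinities below are placeholders rather than the paper's exact construction.

```python
import numpy as np

def transduce_similarity(S, query_idx, alpha=0.85, iters=50):
    """Re-rank similarities to a query by diffusion on the shape graph (sketch)."""
    P = S / S.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    y = np.zeros(len(S)); y[query_idx] = 1.0   # restart at the query shape
    f = y.copy()
    for _ in range(iters):
        f = alpha * P.T @ f + (1 - alpha) * y  # neighbors propagate similarity
    return f

# Toy pairwise shape affinities (symmetric, self-similarity on the diagonal).
S = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
print(transduce_similarity(S, query_idx=0).round(3))
```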

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new transductive learning framework for image retrieval is proposed, in which images are taken as vertices in a weighted hypergraph and the task of image search is formulated as the problem of hypergraph ranking.
Abstract: In this paper, we propose a new transductive learning framework for image retrieval, in which images are taken as vertices in a weighted hypergraph and the task of image search is formulated as the problem of hypergraph ranking. Based on the similarity matrix computed from various feature descriptors, we take each image as a ‘centroid’ vertex and form a hyperedge by a centroid and its k-nearest neighbors. To further exploit the correlation information among images, we propose a probabilistic hypergraph, which assigns each vertex v_i to a hyperedge e_j in a probabilistic way. In the incidence structure of a probabilistic hypergraph, we describe both the higher order grouping information and the affinity relationship between vertices within each hyperedge. After feedback images are provided, our retrieval system ranks image labels by a transductive inference approach, which tends to assign the same label to vertices that share many incidental hyperedges, with the constraints that predicted labels of feedback images should be similar to their initial labels. We compare the proposed method to several other methods and its effectiveness is demonstrated by extensive experiments on Corel5K, the Scene dataset and Caltech 101.
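A minimal construction in this spirit: each image anchors a hyperedge containing its k nearest neighbors, incidence weights decay with distance to the centroid (the "probabilistic" soft assignment), and ranking applies the standard hypergraph random-walk smoothing f = (I − αΘ)⁻¹(1 − α)y. The kernel, parameter values, and toy points are assumptions, not the paper's exact settings.

```python
import numpy as np

def hypergraph_rank(X, query_idx, k=2, alpha=0.9, sigma=1.0):
    """Rank items by probabilistic-hypergraph smoothing (illustrative sketch)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    H = np.zeros((n, n))  # H[v, e]: soft membership of vertex v in hyperedge e
    for e in range(n):    # hyperedge e = centroid e plus its k nearest neighbors
        members = np.argsort(D[e])[:k + 1]
        H[members, e] = np.exp(-D[members, e] ** 2 / sigma ** 2)
    w = np.ones(n)                       # uniform hyperedge weights
    Dv = (H * w).sum(axis=1)             # vertex degrees
    De = H.sum(axis=0)                   # hyperedge degrees
    # Theta = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), the random-walk operator
    Theta = (H * w / De) @ H.T / np.sqrt(np.outer(Dv, Dv))
    y = np.zeros(n); y[query_idx] = 1.0  # relevance seed (the query image)
    f = np.linalg.solve(np.eye(n) - alpha * Theta, (1 - alpha) * y)
    return np.argsort(-f)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [2.1, 2.0]])
print(hypergraph_rank(X, query_idx=0))  # the query's cluster should rank first
```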

Proceedings Article
23 Aug 2010
TL;DR: The problem is formulated as a bipartite graph and the well-known web page ranking algorithm HITS is used to find important features and rank them high and demonstrates promising results on diverse real-life datasets.
Abstract: An important task of opinion mining is to extract people's opinions on features of an entity. For example, the sentence, "I love the GPS function of Motorola Droid" expresses a positive opinion on the "GPS function" of the Motorola phone. "GPS function" is the feature. This paper focuses on mining features. Double propagation is a state-of-the-art technique for solving the problem. It works well for medium-size corpora. However, for large and small corpora, it can result in low precision and low recall. To deal with these two problems, two improvements based on part-whole and "no" patterns are introduced to increase the recall. Then feature ranking is applied to the extracted feature candidates to improve the precision of the top-ranked candidates. We rank feature candidates by feature importance which is determined by two factors: feature relevance and feature frequency. The problem is formulated as a bipartite graph and the well-known web page ranking algorithm HITS is used to find important features and rank them high. Experiments on diverse real-life datasets show promising results.
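HITS itself is a short power iteration. The sketch below runs the standard hub/authority updates on a small bipartite adjacency matrix; treating feature candidates as authorities reflects the paper's framing, but the toy graph and iteration count are placeholders.

```python
import numpy as np

def hits(A, iters=50):
    """Standard HITS power iteration; A[h, a] = 1 if hub h points to authority a."""
    hubs = np.ones(A.shape[0])
    auths = np.ones(A.shape[1])
    for _ in range(iters):
        auths = A.T @ hubs; auths /= np.linalg.norm(auths)  # authority update
        hubs = A @ auths;   hubs /= np.linalg.norm(hubs)    # hub update
    return hubs, auths

# Toy bipartite graph: rows are feature indicators, columns are feature candidates.
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1]])
hubs, auths = hits(A)
print("feature importance (authority scores):", auths.round(3))
```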

Posted Content
TL;DR: It is found that in the author co-citation network, citation rank is highly correlated with PageRank under different damping factors and with the different PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index is not significantly correlated with centrality measures.

Abstract: Google's PageRank has created a new synergy to information retrieval for a better ranking of Web pages. It ranks documents depending on the topology of the graphs and the weights of the nodes. PageRank has significantly advanced the field of information retrieval and keeps Google ahead of competitors in the search engine market. It has been deployed in bibliometrics to evaluate research impact, yet few of these studies focus on the important impact of the damping factor (d) for ranking purposes. This paper studies how varied damping factors in the PageRank algorithm can provide additional insight into the ranking of authors in an author co-citation network. Furthermore, we propose weighted PageRank algorithms. We select the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculate the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between these different measures, we compare PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and with the different PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index is not significantly correlated with centrality measures.
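Sweeping the damping factor is a one-parameter experiment once PageRank is written down. A minimal power-iteration version over a weighted adjacency matrix is sketched below; the toy co-citation matrix stands in for the paper's 108-author network, and the sweep grid abbreviates its 0.05-0.95 range.

```python
import numpy as np

def pagerank(W, d=0.85, iters=100):
    """Power-iteration PageRank on a weighted adjacency matrix W (rows = out-links)."""
    n = len(W)
    P = W / W.sum(axis=1, keepdims=True)   # row-normalize to transition probabilities
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)    # damped random-walk update
    return r

# Toy symmetric co-citation counts between four authors.
W = np.array([[0, 5, 1, 0],
              [5, 0, 2, 1],
              [1, 2, 0, 4],
              [0, 1, 4, 0]], dtype=float)
for d in (0.05, 0.5, 0.95):                # sweep the damping factor
    print(f"d={d}: rank order {np.argsort(-pagerank(W, d))}")
```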

Patent
05 Aug 2010
TL;DR: In this paper, a facial recognition search system identifies one or more likely names (or other personal identifiers) corresponding to the facial image(s) in a query as follows: after receiving the visual query with one or multiple facial images, the system identifies images that potentially match the respective facial image in accordance with visual similarity criteria.
Abstract: A facial recognition search system identifies one or more likely names (or other personal identifiers) corresponding to the facial image(s) in a query as follows. After receiving the visual query with one or more facial images, the system identifies images that potentially match the respective facial image in accordance with visual similarity criteria. Then one or more persons associated with the potential images are identified. For each identified person, person-specific data comprising metrics of social connectivity to the requester are retrieved from a plurality of applications such as communications applications, social networking applications, calendar applications, and collaborative applications. An ordered list of persons is then generated by ranking the identified persons in accordance with at least metrics of visual similarity between the respective facial image and the potential image matches and with the social connection metrics. Finally, at least one person identifier from the list is sent to the requester.

Journal ArticleDOI
TL;DR: A diverse relevance ranking scheme is proposed that takes both relevance and diversity into account by exploring the content of images and their associated tags; it is shown that the diversity of search results can be enhanced while maintaining a comparable level of relevance.
Abstract: Recent years have witnessed the great success of social media websites. Tag-based image search is an important approach to accessing the image content on these websites. However, the existing ranking methods for tag-based image search frequently return results that are irrelevant or not diverse. This paper proposes a diverse relevance ranking scheme that is able to take relevance and diversity into account by exploring the content of images and their associated tags. First, it estimates the relevance scores of images with respect to the query term based on both the visual information of images and the semantic information of associated tags. Then, we estimate the semantic similarities of social images based on their tags. Based on the relevance scores and the similarities, the ranking list is generated by a greedy ordering algorithm which optimizes average diverse precision, a novel measure that is extended from the conventional average precision. Comprehensive experiments and user studies demonstrate the effectiveness of the approach. We also apply the scheme for web image search reranking, and it is shown that the diversity of search results can be enhanced while maintaining a comparable level of relevance.

Proceedings ArticleDOI
26 Apr 2010
TL;DR: An approximate index structure summarising graph-structured content of sources adhering to Linked Data principles is developed, an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary is provided, and the system is evaluated using synthetically generated queries.

Abstract: Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase leverages the benefits of Linked Data -- where structured data is accessible live and up-to-date at distributed Web resources that may change constantly -- only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.

Journal ArticleDOI
Olivier Chapelle, Mingrui Wu
TL;DR: This work proposes an algorithm which aims at directly optimizing popular measures such as the Normalized Discounted Cumulative Gain and the Average Precision; the basic idea is to minimize a smooth approximation of these measures with gradient descent.

Abstract: Most ranking algorithms are based on the optimization of some loss functions, such as the pairwise loss. However, these loss functions are often different from the criteria that are adopted to measure the quality of the web page ranking results. To overcome this problem, we propose an algorithm which aims at directly optimizing popular measures such as the Normalized Discounted Cumulative Gain and the Average Precision. The basic idea is to minimize a smooth approximation of these measures with gradient descent. Crucial to this kind of approach is the choice of the smoothing factor. We provide various theoretical analyses of that choice and propose an annealing algorithm to iteratively minimize a less and less smoothed approximation of the measure of interest. Results on the Letor benchmark datasets show that the proposed algorithm achieves state-of-the-art performance.
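One common smoothing strategy in the spirit of this line of work (not necessarily the paper's exact formulation) replaces each document's hard rank with a soft rank built from sigmoids of score differences, making NDCG a smooth function of the scores; a temperature parameter plays the role of the smoothing factor discussed above.

```python
import math

def soft_ndcg(scores, labels, temp=1.0):
    """Smooth, differentiable-in-scores approximation of NDCG (sketch).

    Each hard rank is replaced by a soft rank: 1 plus the sum of sigmoids of
    score differences, so the measure varies smoothly with the scores.
    """
    def sig(x):
        return 1.0 / (1.0 + math.exp(-x / temp))
    n = len(scores)
    dcg = 0.0
    for i in range(n):
        soft_rank = 1.0 + sum(sig(scores[j] - scores[i]) for j in range(n) if j != i)
        dcg += (2 ** labels[i] - 1) / math.log2(1 + soft_rank)
    ideal = sorted(labels, reverse=True)
    idcg = sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(ideal))
    return dcg / idcg

scores, labels = [2.0, 1.0, 0.2], [2, 0, 1]
for temp in (1.0, 0.1, 0.01):   # smaller temperature -> closer to the hard NDCG
    print(temp, round(soft_ndcg(scores, labels, temp), 4))
```

As the temperature shrinks, the soft value approaches the hard NDCG, mirroring the annealing schedule the paper proposes for its smoothing factor.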

Book
David Carmel, Elad Yom-Tov
30 Apr 2010
TL;DR: The goal of this tutorial is to expose participants to current research on query performance prediction (also known as query difficulty estimation); participants will become familiar with state-of-the-art performance prediction methods and with common evaluation methodologies for prediction quality.

Abstract: Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users' queries. Even for systems that succeed very well on average, the quality of results returned for some of the queries is poor. Thus, it is desirable that IR systems will be able to identify "difficult" queries in order to handle them properly. Understanding why some queries are inherently more difficult than others is essential for IR, and a good answer to this important question will help search engines to reduce the variance in performance, hence better servicing their customer needs. The high variability in query performance has driven a new research direction in the IR field on estimating the expected quality of the search results, i.e. the query difficulty, when no relevance feedback is given. Estimating the query difficulty is a significant challenge due to the numerous factors that impact retrieval performance. Many prediction methods have been proposed recently. However, as many researchers observed, the prediction quality of state-of-the-art predictors is still too low to be widely used by IR applications. The low prediction quality is due to the complexity of the task, which involves factors such as query ambiguity, missing content, and vocabulary mismatch. The goal of this tutorial is to expose participants to the current research on query performance prediction (also known as query difficulty estimation). Participants will become familiar with state-of-the-art performance prediction methods, and with common evaluation methodologies for prediction quality. We will discuss the reasons that cause search engines to fail for some of the queries, and provide an overview of several approaches for estimating query difficulty. We then describe common methodologies for evaluating the prediction quality of those estimators, and some experiments conducted recently with their prediction quality, as measured over several TREC benchmarks. We will cover a few potential applications that can utilize query difficulty estimators by handling each query individually and selectively based on its estimated difficulty. Finally we will summarize with a discussion on open issues and challenges in the field.
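As a concrete example of the predictors this area studies, the classic post-retrieval clarity score measures the KL divergence between a language model of the top-ranked results and the collection model; a focused result set diverges more from the background, suggesting an easier query. The sketch below is a simplified unigram version with toy documents; the smoothing scheme and weighting of the full method are abbreviated.

```python
import math
from collections import Counter

def clarity(top_docs, collection, mu=0.5):
    """Simplified query clarity: KL(P(w|top results) || P(w|collection))."""
    res = Counter(w for doc in top_docs for w in doc.split())
    col = Counter(w for doc in collection for w in doc.split())
    n_res, n_col = sum(res.values()), sum(col.values())
    score = 0.0
    for w, c in res.items():
        p_res = (1 - mu) * c / n_res + mu * col[w] / n_col   # smoothed result LM
        p_col = col[w] / n_col                               # background LM
        score += p_res * math.log2(p_res / p_col)
    return score

collection = ["ranking models for search", "query difficulty estimation",
              "neural networks", "search engines rank pages", "cooking pasta recipes"]
top = ["ranking models for search", "search engines rank pages"]
print(round(clarity(top, collection), 3))  # higher = more focused result set
```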

Journal ArticleDOI
TL;DR: A novel correlation-based memetic framework (MA-C), a combination of a genetic algorithm (GA) and local search (LS) using correlation-based filter ranking, is proposed; it outperforms recent methods in the literature in terms of classification accuracy, selected feature size and efficiency.

Abstract: A novel correlation-based memetic framework (MA-C), which is a combination of a genetic algorithm (GA) and local search (LS) using correlation-based filter ranking, is proposed in this paper. The local filter method used here fine-tunes the population of GA solutions by adding or deleting features based on the Symmetrical Uncertainty (SU) measure. The focus here is on filter methods that are able to assess the goodness or ranking of the individual features. An empirical study of MA-C on several commonly used large-scale gene expression datasets indicates that it outperforms recent methods in the literature in terms of classification accuracy, selected feature size and efficiency. Further, we also investigate the balance between local and genetic search to maximize the search quality and efficiency of MA-C.
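Symmetrical Uncertainty, the filter criterion behind the ranking step, is mutual information normalized by the two entropies, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), giving a 0-1 correlation score between a discrete feature and the class. A minimal implementation with made-up data:

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    mi = entropy(x) + entropy(y) - entropy(list(zip(x, y)))  # I(X;Y)
    denom = entropy(x) + entropy(y)
    return 2 * mi / denom if denom else 0.0

# Toy discrete feature values and class labels for eight samples.
feature = [0, 0, 1, 1, 0, 1, 0, 1]
labels  = [0, 0, 1, 1, 0, 1, 1, 0]   # mostly, but not perfectly, aligned
print(round(symmetrical_uncertainty(feature, labels), 3))
```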

Proceedings ArticleDOI
19 Jul 2010
TL;DR: The experimental results clearly show that the context-aware ranking approach improves the ranking of a commercial search engine which ignores context information and outperforms a baseline method which considers context information in ranking.
Abstract: The context of a search query often provides a search engine meaningful hints for answering the current query better. Previous studies on context-aware search were either focused on the development of context models or limited to a relatively small scale investigation under a controlled laboratory setting. In particular, regarding context-aware ranking for Web search, the following two critical problems remain largely unsolved. First, how can we take advantage of different types of contexts in ranking? Second, how can we integrate context information into a ranking model? In this paper, we tackle the above two essential problems analytically and empirically. We develop different ranking principles for different types of contexts. Moreover, we adopt a learning-to-rank approach and integrate the ranking principles into a state-of-the-art ranking model by encoding the context information as features of the model. We empirically test our approach using a large search log data set obtained from a major commercial search engine. Our evaluation uses both human judgments and implicit user click data. The experimental results clearly show that our context-aware ranking approach improves the ranking of a commercial search engine which ignores context information. Furthermore, our method outperforms a baseline method which considers context information in ranking.

Journal ArticleDOI
01 Sep 2010
TL;DR: Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.
Abstract: The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of such a query as being independent when ranking them. However, a relevant result object with nearby objects that are also relevant to the query is likely to be preferable over a relevant object without relevant nearby objects. The paper proposes the concept of prestige-based relevance to capture both the textual relevance of an object to a query and the effects of nearby objects. Based on this, a new type of query, the Location-aware top-k Prestige-based Text retrieval (LkPT) query, is proposed that retrieves the top-k spatial web objects ranked according to both prestige-based relevance and location proximity. We propose two algorithms that compute LkPT queries. Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.

Posted Content
TL;DR: In this article, a critical analysis of the "Academic Ranking of World Universities", published every year by the Institute of Higher Education of the Jiao Tong University in Shanghai and more commonly known as the Shanghai ranking, is presented.
Abstract: This paper proposes a critical analysis of the "Academic Ranking of World Universities", published every year by the Institute of Higher Education of the Jiao Tong University in Shanghai and more commonly known as the Shanghai ranking. After having recalled how the ranking is built, we first discuss the relevance of the criteria and then analyze the proposed aggregation method. Our analysis uses tools and concepts from Multiple Criteria Decision Making (MCDM). Our main conclusions are that the criteria that are used are not relevant, that the aggregation methodology is plagued by a number of major problems and that the whole exercise suffers from an insufficient attention paid to fundamental structuring issues. Hence, our view is that the Shanghai ranking, in spite of the media coverage it receives, does not qualify as a useful and pertinent tool to discuss the "quality" of academic institutions, let alone to guide the choice of students and family or to promote reforms of higher education systems. We outline the type of work that should be undertaken to offer sound alternatives to the Shanghai ranking.

Journal ArticleDOI
TL;DR: The view is that the Shanghai ranking, in spite of the media coverage it receives, does not qualify as a useful and pertinent tool to discuss the “quality” of academic institutions, let alone to guide the choice of students and family or to promote reforms of higher education systems.
Abstract: This paper proposes a critical analysis of the “Academic Ranking of World Universities”, published every year by the Institute of Higher Education of the Jiao Tong University in Shanghai and more commonly known as the Shanghai ranking. After having recalled how the ranking is built, we first discuss the relevance of the criteria and then analyze the proposed aggregation method. Our analysis uses tools and concepts from Multiple Criteria Decision Making (MCDM). Our main conclusions are that the criteria that are used are not relevant, that the aggregation methodology is plagued by a number of major problems and that the whole exercise suffers from an insufficient attention paid to fundamental structuring issues. Hence, our view is that the Shanghai ranking, in spite of the media coverage it receives, does not qualify as a useful and pertinent tool to discuss the “quality” of academic institutions, let alone to guide the choice of students and family or to promote reforms of higher education systems. We outline the type of work that should be undertaken to offer sound alternatives to the Shanghai ranking.

Patent
25 Oct 2010
TL;DR: In this paper, a method for transliteration includes receiving input such as a word, a sentence, a phrase, or a paragraph in a source language; creating source language sub-phonetic units for the word and converting them to target language sub-phonetic units; retrieving a ranking for each of the target language sub-phonetic units from a database; and creating target language words for the word in the source language based on the target language sub-phonetic units and the ranking of each of them.

Abstract: A method for transliteration includes receiving input, such as a word, a sentence, a phrase, or a paragraph, in a source language; creating source language sub-phonetic units for the word and converting them to target language sub-phonetic units; retrieving a ranking for each of the target language sub-phonetic units from a database; and creating target language words for the word in the source language based on the target language sub-phonetic units and the ranking of each of them. The method further includes identifying candidate target language words based on predefined criteria, and displaying the candidate target language words.

Proceedings ArticleDOI
Loïc Lecerf, Boris Chidlovskii
22 Mar 2010
TL;DR: A model of layout indexing of a collection is developed, adapted for the quick retrieval of the top k documents most relevant by layout; a direct evaluation of the similarity between a query and each document in the collection is avoided.

Abstract: In this paper we propose a schema for querying large document collections by document layout. We develop a model of layout indexing of a collection adapted for the quick retrieval of the top k relevant documents. For the sake of scalability, we avoid a direct evaluation of the similarity between a query and each document in the collection; their similarity is instead approximated by the similarity between their projections on the set of representative blocks which are inferred from the collection at the indexing step. The technique also proposes new functions for relevance ranking and cluster pruning that ensure scalable retrieval and ranking.

Patent
Rong Xiao, Qiang Hao, Changhu Wang, Rui Cai, Lei Zhang
08 Jun 2010
TL;DR: In this paper, location-related aspects of user-generated content are automatically learned based on automated analysis of the user-generated content: documents are divided into document segments, and the segments are decomposed into local topics and global topics.
Abstract: Described herein is a technology that facilitates efficient automated mining of topic-related aspects of user-generated content based on automated analysis of the user-generated content. Locations are automatically learned based on dividing documents into document segments, and decomposing the segments into local topics and global topics. Techniques are described that facilitate automatically extracting snippets. These techniques include, for example, computer annotating travelogues with learned tags and images, performing topic learning to obtain an interest model, performing location matching based on the interest model, calculating geographic and semantic relevance scores, ranking snippets based on the geographic and semantic relevance scores, and searching snippets with a “location+context term” query.

Journal ArticleDOI
TL;DR: This work explores the variation in what different people consider relevant to the same query by mining three data sources, finding that people's explicit judgments for the same queries differ greatly.
Abstract: Current Web search tools do a good job of retrieving documents that satisfy the most common intentions associated with a query, but do not do a very good job of discerning different individuals' unique search goals. We explore the variation in what different people consider relevant to the same query by mining three data sources: (1) explicit relevance judgments, (2) clicks on search results (a behavior-based implicit measure of relevance), and (3) the similarity of desktop content to search results (a content-based implicit measure of relevance). We find that people's explicit judgments for the same queries differ greatly. As a result, there is a large gap between how well search engines could perform if they were to tailor results to the individual, and how well they currently perform by returning results designed to satisfy everyone. We call this gap the potential for personalization. The two implicit indicators we studied provide complementary value for approximating this variation in result relevance among people. We discuss several uses of our findings, including a personalized search system that takes advantage of the implicit measures by ranking personally relevant results more highly and improving click-through rates.