Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have been published within this topic, receiving 435,130 citations.


Papers
Journal ArticleDOI
TL;DR: The model provides a formal method for minimizing expected information overload: it predicts the usefulness of a message from the available message features and can be used to rank messages by expected importance or economic worth.
Abstract: The decision to examine a message at a particular point in time should be made rationally and economically if the message recipient is to operate efficiently. Electronic message distribution systems, electronic bulletin board systems, and telephone systems capable of leaving digitized voice messages can contribute to "information overload", defined as the economic loss associated with the examination of a number of non- or less-relevant messages. Our model provides a formal method for minimizing expected information overload. The proposed adaptive model predicts the usefulness of a message based on the available message features and may be useful to rank messages by expected importance or economic worth. The assumptions of binary and two Poisson independent probabilistic distributions of message feature frequencies are examined, and methods of incorporating these distributions into the ranking model are examined. Ways to incorporate user supplied relevance feedback are suggested. Analytic performance m...
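A minimal, illustrative sketch of the kind of feedback-driven ranking this abstract describes: binary message features are weighted by log-odds estimated from user relevance feedback, and messages are ranked by their summed weights. This is an assumption-laden sketch under the binary-independence model, not the authors' implementation; the function names, the 0.5 smoothing, and the message representation are hypothetical.

```python
"""Hypothetical sketch: ranking messages by expected worth using
binary-independence feature weights learned from relevance feedback."""

import math
from collections import Counter

def feature_weights(relevant_msgs, nonrelevant_msgs):
    """Estimate a log-odds weight per feature from user relevance feedback.

    Each message is a set of binary features (e.g. sender, subject terms).
    A 0.5 correction avoids zero probabilities for unseen features.
    """
    R, N = len(relevant_msgs), len(nonrelevant_msgs)
    rel_counts = Counter(f for m in relevant_msgs for f in m)
    non_counts = Counter(f for m in nonrelevant_msgs for f in m)
    weights = {}
    for f in set(rel_counts) | set(non_counts):
        p = (rel_counts[f] + 0.5) / (R + 1.0)   # P(feature | relevant)
        q = (non_counts[f] + 0.5) / (N + 1.0)   # P(feature | non-relevant)
        weights[f] = math.log((p * (1 - q)) / (q * (1 - p)))
    return weights

def rank_messages(messages, weights):
    """Order incoming messages by the sum of their feature weights."""
    score = lambda m: sum(weights.get(f, 0.0) for f in m)
    return sorted(messages, key=score, reverse=True)
```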

233 citations

Patent
08 Aug 2002
TL;DR: A categorization engine classifies incoming documents to topics; for each document-topic assignment it generates a confidence score expressing how confident the algorithm is in that assignment, and this score is compared to the topic's (configurable) threshold.
Abstract: Automatic classification is applied in two stages: classification and ranking. In the first stage, a categorization engine classifies incoming documents to topics. A document may be classified to a single topic or multiple topics or no topics. For each topic, a raw score is generated for a document and that raw score is used to determine whether the document should be at least preliminarily classified to the topic. In the second stage, for each document assigned to a topic (i.e., for each document-topic association) the categorization engine generates confidence scores expressing how confident the algorithm is in this assignment. The confidence score of the assigned document is compared to the topic's (configurable) threshold. If the confidence score is higher than this configurable threshold, the document is placed in the topic's Published list. If not, the document is placed in the topic's Proposed list, where it awaits approval by a knowledge management expert. By modifying a topic's threshold, a knowledge management expert can advantageously control the tradeoff between human oversight and control vs. time and human effort expended.
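The second-stage routing described in this abstract can be pictured with a small sketch: a per-topic confidence score is compared against that topic's configurable threshold, and the document lands in either the Published or the Proposed list. This is a hypothetical illustration only; the class and function names and the threshold values are assumptions, not the patented implementation.

```python
"""Hypothetical sketch: threshold-based routing of classified documents
to a Published list (auto-publish) or Proposed list (expert review)."""

from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    threshold: float = 0.8                      # configurable per topic
    published: list = field(default_factory=list)
    proposed: list = field(default_factory=list)

def route(document, topic, confidence):
    """Stage two: publish automatically or queue for expert approval."""
    if confidence >= topic.threshold:
        topic.published.append(document)        # high confidence: auto-publish
    else:
        topic.proposed.append(document)         # low confidence: awaits review

# Example: raising or lowering a topic's threshold trades human oversight
# against the time and effort a knowledge management expert must spend.
finance = Topic("finance", threshold=0.9)
route("q3-report.txt", finance, confidence=0.95)   # -> Published list
route("memo.txt", finance, confidence=0.60)        # -> Proposed list
```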

233 citations

Proceedings ArticleDOI
23 May 2006
TL;DR: This paper presents a simple and intuitive method for mining search engine query logs to get fast query recommendations on a large-scale, industrial-strength search engine, and combines this method with a traditional content-based similarity method to compensate for the high sparsity of real query log data and, more specifically, the shortness of most query sessions.
Abstract: This paper presents a simple and intuitive method for mining search engine query logs to get fast query recommendations on a large scale industrial strength search engine. In order to get a more comprehensive solution, we combine two methods together. On the one hand, we study and model search engine users' sequential search behavior, and interpret this consecutive search behavior as client-side query refinement, that should form the basis for the search engine's own query refinement process. On the other hand, we combine this method with a traditional content based similarity method to compensate for the high sparsity of real query log data, and more specifically, the shortness of most query sessions. To evaluate our method, we use one hundred days' worth of query logs from SINA's search engine to do off-line mining. Then we analyze three independent editors' evaluations on a query test set. Based on their judgement, our method was found to be effective for finding related queries, despite its simplicity. In addition to the subjective editors' rating, we also perform tests based on actual anonymous user search sessions.
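A toy sketch of the two combined signals the abstract describes: consecutive-query counts mined from user sessions (the client-side refinement signal) blended with a simple content-based similarity to offset sparse session data. The Jaccard similarity, the normalisation, and the mixing weight alpha are assumptions chosen for illustration, not the paper's actual method.

```python
"""Hypothetical sketch: query recommendation from session co-occurrence
plus a cheap content-based similarity, blended with a mixing weight."""

from collections import defaultdict

def session_cooccurrence(sessions):
    """Count consecutive query pairs across user sessions."""
    counts = defaultdict(int)
    for queries in sessions:
        for a, b in zip(queries, queries[1:]):
            if a != b:
                counts[(a, b)] += 1
    return counts

def content_similarity(q1, q2):
    """Jaccard overlap of query terms as a simple content-based signal."""
    t1, t2 = set(q1.split()), set(q2.split())
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

def recommend(query, candidates, cooc, alpha=0.7):
    """Blend session-based refinement evidence with content similarity."""
    max_c = max((cooc.get((query, c), 0) for c in candidates), default=0) or 1
    scored = []
    for c in candidates:
        if c == query:
            continue
        session_score = cooc.get((query, c), 0) / max_c   # normalise to [0, 1]
        score = alpha * session_score + (1 - alpha) * content_similarity(query, c)
        scored.append((score, c))
    return [c for s, c in sorted(scored, reverse=True) if s > 0]
```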

233 citations

Journal ArticleDOI
01 Mar 2011
TL;DR: The results of the evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related.
Abstract: Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences to monolingual plagiarism detection, (2) state-of-the-art solutions for two important subtasks are reviewed, (3) retrieval models for the assessment of cross-language similarity are surveyed, and, (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents which are selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all of the six languages English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well.
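CL-CNG ranks documents across syntactically related languages by comparing character n-gram profiles. The sketch below is a minimal interpretation of that idea: cosine similarity over character 3-gram counts. The normalisation (lowercasing, stripping non-alphanumerics) and the choice of n = 3 are assumptions for illustration, not details taken from the paper.

```python
"""Hypothetical sketch: CL-CNG-style ranking via cosine similarity
over character n-gram profiles, no translation required."""

import math
import re
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram profile of a normalised text."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def rank_candidates(suspicious, collection, n=3):
    """Rank collection documents by character n-gram overlap with the
    suspicious document, most similar first."""
    profile = char_ngrams(suspicious, n)
    return sorted(collection,
                  key=lambda d: cosine(profile, char_ngrams(d, n)),
                  reverse=True)
```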

232 citations

Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, a deep convolutional neural network is proposed to rank photo aesthetics, with the relative ranking of photo aesthetics modeled directly in the loss function, in contrast to previous methods that reduce image aesthetics analysis to a coarse binary categorization.
Abstract: Real-world applications could benefit from the ability to automatically generate a fine-grained ranking of photo aesthetics. However, previous methods for image aesthetics analysis have primarily focused on the coarse, binary categorization of images into high- or low-aesthetic categories. In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics in which the relative ranking of photo aesthetics is directly modeled in the loss function. Our model incorporates joint learning of meaningful photographic attributes and image content information which can help regularize the complicated photo aesthetics rating problem.
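The ranking loss the abstract refers to can be illustrated with a pairwise margin (hinge) ranking loss, which penalises the model whenever a photo rated more aesthetic does not outscore its less-aesthetic pair by at least a margin. The sketch below applies it to plain score arrays; the margin value and toy scores are assumptions, and the paper's full network with attribute and content branches is not reproduced here.

```python
"""Hypothetical sketch: pairwise margin ranking loss over model scores,
so that relative aesthetic order is modeled directly in the loss."""

import numpy as np

def pairwise_ranking_loss(scores_hi, scores_lo, margin=1.0):
    """Hinge loss pushing higher-rated photos above lower-rated ones.

    scores_hi[i] is the model score for the photo rated more aesthetic
    than its pair scores_lo[i]; the loss is zero once the score gap
    exceeds the margin.
    """
    gaps = scores_hi - scores_lo
    return np.maximum(0.0, margin - gaps).mean()

# Toy example: three photo pairs where the first photo should rank higher.
scores_hi = np.array([2.1, 0.4, 1.5])   # scores of the preferred photos
scores_lo = np.array([1.0, 0.9, 1.4])   # scores of the less-preferred photos
print(pairwise_ranking_loss(scores_hi, scores_lo))   # mean hinge penalty
```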

231 citations


Network Information
Related Topics (5)

Topic                             Papers    Citations    Related
Web page                          50.3K     975.1K       83%
Ontology (information science)    57K       869.1K       82%
Graph (abstract data type)        69.9K     1.2M         82%
Feature learning                  15.5K     684.7K       81%
Supervised learning               20.8K     710.5K       81%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168