Journal ISSN: 0306-4573

Information Processing and Management 

Elsevier BV
About: Information Processing and Management is an academic journal published by Elsevier BV. The journal publishes primarily in the areas of computer science and relevance (information retrieval). It has the ISSN identifier 0306-4573. Over its lifetime, 3,895 publications have been published, receiving 151,677 citations. The journal is also known as: Information processing and management.


Papers
Journal Article
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Abstract: The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

9,460 citations
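The single-term weighting schemes surveyed in this paper are the ancestors of the familiar tf-idf family. As a hedged illustration of that baseline idea (not the paper's exact formulation; the function name and the idf variant are our own choices), here is a minimal sketch that computes cosine-normalized tf-idf vectors for a toy collection:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Cosine-normalized tf-idf vectors for a list of tokenized documents.

    A minimal baseline in the spirit of the single-term weighting schemes
    the paper surveys; the weighting variants it actually compares differ
    (tf normalization, idf formulations), so treat this as illustrative.
    """
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # raw term frequency times idf = log(N / df)
        weights = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

docs = [["automatic", "term", "weighting"],
        ["single", "term", "indexing", "models"],
        ["content", "analysis", "procedures"]]
print(tfidf_vectors(docs)[0])
```

More elaborate content analysis procedures would then be compared against retrieval runs built on weights like these.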

Journal Article
TL;DR: This paper presents a systematic analysis of twenty-four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical, to produce a measure invariance taxonomy with respect to all relevant label distribution changes in a classification problem.
Abstract: This paper presents a systematic analysis of twenty-four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical. For each classification task, the study relates a set of changes in a confusion matrix to specific characteristics of data. Then the analysis concentrates on the type of changes to a confusion matrix that do not change a measure and therefore preserve a classifier's evaluation (measure invariance). The result is the measure invariance taxonomy with respect to all relevant label distribution changes in a classification problem. This formal analysis is supported by examples of applications where invariance properties of measures lead to a more reliable evaluation of classifiers. Text classification supplements the discussion with several case studies.

3,945 citations
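The measure-invariance idea can be made concrete with a small worked example. The sketch below is our illustration, not code from the paper: it shows that precision on the positive class is unchanged when only the true-negative count of a binary confusion matrix changes, while accuracy shifts with the label distribution.

```python
def precision(tp, fp, fn, tn):
    # Fraction of predicted positives that are correct; ignores tn entirely.
    return tp / (tp + fp)

def accuracy(tp, fp, fn, tn):
    # Fraction of all examples classified correctly; depends on every cell.
    return (tp + tn) / (tp + fp + fn + tn)

# Baseline confusion matrix, and a variant where only true negatives change.
base    = dict(tp=40, fp=10, fn=5, tn=45)
shifted = dict(tp=40, fp=10, fn=5, tn=450)

print(precision(**base), precision(**shifted))  # 0.8 and 0.8: invariant to the tn change
print(accuracy(**base), accuracy(**shifted))    # 0.85 vs. ~0.97: not invariant
```

The taxonomy in the paper classifies the standard measures by exactly this kind of invariance under changes to the confusion matrix.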

Journal Article
TL;DR: A failure analysis of Excite transaction logs was conducted, identifying trends among user mistakes; the paper concludes with a summary of findings and a discussion of their implications.
Abstract: We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions — changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries — the number of search terms, and the use of logic and modifiers; and (iii) terms — their rank/frequency distribution and the most highly used search terms. We then shift the focus of analysis from the query to the user to gain insight into the characteristics of the Web user. With these characteristics as a basis, we then conducted a failure analysis, identifying trends among user mistakes. We conclude with a summary of findings and a discussion of the implications of these findings. © 2000 Elsevier Science Ltd. All rights reserved.

1,414 citations
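As a rough illustration of the kind of term-level statistics the study reports (mean query length and a rank/frequency distribution of search terms), here is a toy sketch; the example queries are invented and the real Excite log has a different record layout, so the parsing is purely illustrative.

```python
from collections import Counter

# Invented example queries; the actual Excite transaction log format differs.
queries = [
    "star wars pictures",
    "star trek",
    "free music downloads",
    "star wars",
]

term_counts = Counter(term for q in queries for term in q.lower().split())
mean_terms = sum(len(q.split()) for q in queries) / len(queries)

print("mean terms per query:", mean_terms)
# Rank/frequency distribution: most frequent terms first.
for rank, (term, freq) in enumerate(term_counts.most_common(), start=1):
    print(rank, term, freq)
```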

Journal Article
TL;DR: A multi-document summarizer, MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system; an evaluation scheme based on sentence utility and subsumption is also applied.
Abstract: We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

1,121 citations
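A much-simplified sketch of the centroid component of such a summarizer is given below. It is our illustration rather than MEAD itself: the real system also combines positional and first-sentence-overlap features and operates over document clusters produced by a topic detection and tracking system.

```python
import math
from collections import Counter

def centroid_summary(sentences, top_n=2):
    """Score sentences by the total centroid weight of the terms they contain.

    Simplified sketch of the centroid idea; MEAD additionally uses positional
    and overlap features and works across clusters of related documents.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(t for sent in tokenized for t in set(sent))
    tf = Counter(t for sent in tokenized for t in sent)
    # Centroid weight: aggregate tf times a smoothed idf over the cluster.
    centroid = {t: tf[t] * math.log((n + 1) / df[t]) for t in tf}
    scored = [(sum(centroid.get(t, 0.0) for t in set(sent)), s)
              for sent, s in zip(tokenized, sentences)]
    return [s for _, s in sorted(scored, reverse=True)[:top_n]]

sentences = [
    "MEAD generates summaries using cluster centroids.",
    "The centroids come from a topic detection and tracking system.",
    "Two user studies test the summarization models.",
]
print(centroid_summary(sentences, top_n=1))
```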

Journal Article
TL;DR: The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material, and presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions.
Abstract: The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations. Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks, model training, concluding with comparisons with other approaches and an overall assessment. Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2. © 2000 Elsevier Science Ltd. All rights reserved.

1,055 citations
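The probabilistic relevance framework developed in this paper is best known today through the Okapi BM25 weighting function. The sketch below shows one conventional BM25 instantiation purely as an illustration; the parameter values k1 = 1.2 and b = 0.75 are common defaults, not values taken from the paper's TREC experiments.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag-of-words query.

    One concrete realization of the probabilistic model; the paper develops
    the framework more generally and evaluates it experimentally on TREC data.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0 or term not in tf:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

docs = [["probabilistic", "model", "of", "retrieval"],
        ["trec", "test", "collection", "experiments"],
        ["retrieval", "system", "functions"]]
print(bm25_score(["probabilistic", "retrieval"], docs[0], docs))
```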

Performance Metrics

Number of papers from the journal in previous years:

Year    Papers
2023    293
2022    336
2021    257
2020    252
2019    138
2018    77