Journal•ISSN: 1386-4564

Information Retrieval

Springer Science+Business Media

About: Information Retrieval is an academic journal published by Springer Science+Business Media. It publishes mainly in the areas of relevance (information retrieval) and query expansion. Its ISSN is 1386-4564. Over its lifetime, the journal has published 638 papers, which have received 33,001 citations. It is also known as Information retrieval (Dordrecht) and Information retrieval (London).

Topics: Relevance (information retrieval), Query expansion, Ranking (information retrieval), Web query classification, Pattern recognition (psychology)

Papers

Journal Article

An Evaluation of Statistical Approaches to Text Categorization

Yiming Yang¹•Institutions (1)

Carnegie Mellon University¹

15 May 1999-Information Retrieval

TL;DR: Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.

Abstract: This paper focuses on a comparative evaluation of a wide range of text categorization methods, including previously published results on the Reuters corpus and new results of additional experiments. A controlled study using three classifiers, kNN, LLSF and WORD, was conducted to examine the impact of configuration variations in five versions of Reuters on the observed performance of classifiers. Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature. Using the results evaluated on the other versions of Reuters, which exclude the unlabelled documents, the performance of twelve methods is compared directly or indirectly. For indirect comparisons, kNN, LLSF and WORD were used as baselines, since they were evaluated on all versions of Reuters that exclude the unlabelled documents. As a global observation, kNN, LLSF and a neural network method had the best performance; except for a Naive Bayes approach, the other learning algorithms also performed relatively well.

2,130 citations

Journal Article

Eigentaste: A Constant Time Collaborative Filtering Algorithm

Ken Goldberg¹, Theresa M. Roeder¹, Dhruv Gupta¹, Chris Perkins¹•Institutions (1)

University of California, Berkeley¹

01 Jul 2001-Information Retrieval

TL;DR: This work compares Eigentaste to alternative algorithms using data from Jester, an online joke recommending system, and uses the Normalized Mean Absolute Error (NMAE) measure to compare performance of different algorithms.

Abstract: Eigentaste is a collaborative filtering algorithm that uses universal queries to elicit real-valued user ratings on a common set of items and applies principal component analysis (PCA) to the resulting dense subset of the ratings matrix. PCA facilitates dimensionality reduction for offline clustering of users and rapid computation of recommendations. For a database of n users, standard nearest-neighbor techniques require O(n) processing time to compute recommendations, whereas Eigentaste requires O(1) (constant) time. We compare Eigentaste to alternative algorithms using data from Jester, an online joke recommending system. Jester has collected approximately 2,500,000 ratings from 57,000 users. We use the Normalized Mean Absolute Error (NMAE) measure to compare performance of different algorithms. In the Appendix we use Uniform and Normal distribution models to derive analytic estimates of NMAE when predictions are random. On the Jester dataset, Eigentaste computes recommendations two orders of magnitude faster with no loss of accuracy. Jester is online at: http://eigentaste.berkeley.edu
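
The NMAE measure named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NMAE is the mean absolute error divided by the width of the rating scale. The function names `nmae` and `random_baseline_nmae`, and the -10 to +10 scale used in the usage lines, are assumptions made for this example. The second function estimates by simulation the NMAE obtained when both ratings and predictions are drawn independently and uniformly from the scale, which is 1/3 in expectation.

```python
import random


def nmae(predicted, actual, r_min, r_max):
    """Mean absolute error divided by the rating range (r_max - r_min)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
    return mae / (r_max - r_min)


def random_baseline_nmae(n, r_min, r_max, seed=0):
    """Simulated NMAE when ratings and predictions are independent uniform draws.

    Illustrative assumption: both are uniform on [r_min, r_max].
    """
    rng = random.Random(seed)
    actual = [rng.uniform(r_min, r_max) for _ in range(n)]
    predicted = [rng.uniform(r_min, r_max) for _ in range(n)]
    return nmae(predicted, actual, r_min, r_max)


if __name__ == "__main__":
    # Hypothetical ratings on an assumed -10..+10 scale.
    print(nmae([0.0, 0.0], [10.0, -10.0], -10.0, 10.0))
    print(random_baseline_nmae(100_000, -10.0, 10.0))
```

Dividing by the rating range makes errors comparable across systems that use different rating scales, and the random baseline gives a reference point against which a real predictor's NMAE can be read.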

1,618 citations

Journal Article

Automating the Construction of Internet Portals with Machine Learning

Andrew McCallum¹, Kamal Nigam¹, Jason D. M. Rennie², Kristie Seymore¹•Institutions (2)

Carnegie Mellon University¹, Massachusetts Institute of Technology²

21 Jul 2000-Information Retrieval

TL;DR: New research in reinforcement learning, information extraction and text classification is described that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies.

Abstract: Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Web-wide search engines. Unfortunately these portals are difficult and time-consuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a portal for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com. These techniques are widely applicable to portal creation in other domains.

1,081 citations

Journal Article

Learning Algorithms for Keyphrase Extraction

Peter D. Turney¹•Institutions (1)

National Research Council¹

21 May 2000-Information Retrieval

TL;DR: In this paper, the problem of automatically extracting keyphrases from text is treated as a supervised learning task, in which the learning algorithm must learn to classify phrases as positive or negative examples of keyphrases.

Abstract: Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. We evaluate the performance of nine different configurations of C4.5. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for automatically extracting keyphrases from text. The experimental results support the claim that a custom-designed algorithm (GenEx), incorporating specialized procedural domain knowledge, can generate better keyphrases than a general-purpose algorithm (C4.5). Subjective human evaluation of the keyphrases generated by GenEx suggests that about 80% of the keyphrases are acceptable to human readers. This level of performance should be satisfactory for a wide variety of applications.

869 citations

Journal Article

Advances in Automatic Text Summarization

Elizabeth D. Liddy¹•Institutions (1)

Syracuse University¹

01 Apr 2001-Information Retrieval

850 citations

Performance

Metrics

Papers: 642

Citations: 33,008

No. of papers from the Journal in previous years
Year	Papers
2023	4
2022	14
2021	42
2020	7
2019	7
2018	14