scispace - formally typeset
Search or ask a question
Author

M. Gordon

Bio: M. Gordon is an academic researcher. The author has contributed to research in topics: Document retrieval & Document clustering. The author has an hindex of 1, co-authored 1 publications receiving 249 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: Competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.
Abstract: Document retrieval systems are built to provide inquirers with computerized access to relevant documents. Such systems often miss many relevant documents while falsely identifying many non-relevant documents. Here, competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.

252 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This article proposes to deal with the sparsity problem by applying an associative retrieval framework and related spreading activation algorithms to explore transitive associations among consumers through their past transactions and feedback to solve the problem of sparse transactional data.
Abstract: Recommender systems are being widely applied in many application settings to suggest products, services, and information items to potential consumers. Collaborative filtering, the most successful recommendation approach, makes recommendations based on past transactions and feedback from consumers sharing similar interests. A major problem limiting the usefulness of collaborative filtering is the sparsity problem, which refers to a situation in which transactional or feedback data is sparse and insufficient to identify similarities in consumer interests. In this article, we propose to deal with this sparsity problem by applying an associative retrieval framework and related spreading activation algorithms to explore transitive associations among consumers through their past transactions and feedback. Such transitive associations are a valuable source of information to help infer consumer interests and can be explored to deal with the sparsity problem. To evaluate the effectiveness of our approach, we have conducted an experimental study using a data set from an online bookstore. We experimented with three spreading activation algorithms including a constrained Leaky Capacitor algorithm, a branch-and-bound serial symbolic search algorithm, and a Hopfield net parallel relaxation search algorithm. These algorithms were compared with several collaborative filtering approaches that do not consider the transitive associations: a simple graph search approach, two variations of the user-based approach, and an item-based approach. Our experimental results indicate that spreading activation-based approaches significantly outperformed the other collaborative filtering methods as measured by recommendation precision, recall, the F-measure, and the rank score. We also observed the over-activation effect of the spreading activation approach, that is, incorporating transitive associations with past transactional data that is not sparse may "dilute" the data used to infer user preferences and lead to degradation in recommendation performance.

678 citations

Journal ArticleDOI
TL;DR: A survey of the available literature on data mining using soft computing based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model is provided.
Abstract: The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

630 citations

Journal ArticleDOI
TL;DR: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art.
Abstract: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reason for considering Web mining, a separate field from data mining, is explained. The limitations of some of the existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs) are highlighted. A survey of the existing literature on "soft Web mining" is provided along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.

365 citations

Proceedings ArticleDOI
01 Aug 1994
TL;DR: This work proposes a method by which the relevance estimates made by different experts can be automatically combined to result in superior retrieval performance and applies the method to two expert combination tasks.
Abstract: Retrieval performance can often be improved significantly by using a number of different retrieval algorithms and combining the results, in contrast to using just a single retrieval algorithm. This is because different retrieval algorithms, or retrieval experts, often emphasize different document and query features when determining relevance and therefore retrieve different sets of documents. However, it is unclear how the different experts are to be combined, in general, to yield a superior overall estimate. We propose a method by which the relevance estimates made by different experts can be automatically combined to result in superior retrieval performance. We apply the method to two expert combination tasks. The applications demonstrate that the method can identify high performance combinations of experts and also is a novel means for determining the combined effectiveness of experts.

343 citations

Journal ArticleDOI
TL;DR: This article presents three popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms, which are promising in their ability to analyze user queries, identify users' information needs, and suggest alternatives for search.
Abstract: Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades In the 1980s, knowledge-based techniques also made an impressive contribution to “intelligent” information retrieval and indexing More recently, information science researchers have turned to other newer artificial-intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms These newer techniques, which are grounded on diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information storage and retrieval systems In this article, we first provide an overview of these newer techniques and their use in information science research To familiarize readers with these techniques, we present three popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms We discuss their knowledge representations and algorithms in the context of information retrieval Sample implementation and testing results from our own research are also provided for each technique We believe these techniques are promising in their ability to analyze user queries, identify users' information needs, and suggest alternatives for search With proper user-system interactions, these methods can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques © 1995 John Wiley & Sons, Inc

283 citations