Journal Article

SpeedTracer: a Web usage mining and analysis tool

Kun-Lung Wu, Philip S. Yu, A. Ballman
01 Jan 1998 - IBM Systems Journal (IBM Corp.) - Vol. 37, Iss. 1, pp. 89-105
TL;DR: This paper describes the design of SpeedTracer, a Web usage mining and analysis tool, and demonstrates some of its features with a few sample reports that help explain user surfing behavior.
Abstract: SpeedTracer, a World Wide Web usage mining and analysis tool, was developed to understand user surfing behavior by exploring the Web server log files with data mining techniques. As the popularity of the Web has exploded, there is a strong desire to understand user surfing behavior. However, it is difficult to perform user-oriented data mining and analysis directly on the server log files because they tend to be ambiguous and incomplete. With innovative algorithms, SpeedTracer first identifies user sessions by reconstructing user traversal paths. It does not require “cookies” or user registration for session identification. User privacy is protected. Once user sessions are identified, data mining algorithms are then applied to discover the most common traversal paths and groups of pages frequently visited together. Important user browsing patterns are manifested through the frequent traversal paths and page groups, helping the understanding of user surfing behavior. Three types of reports are prepared: user-based reports, path-based reports and group-based reports. In this paper, we describe the design of SpeedTracer and demonstrate some of its features with a few sample reports.
Citations
Journal Article
TL;DR: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. This paper describes its three phases (preprocessing, pattern discovery, and pattern analysis) in detail.
Abstract: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

2,227 citations

Journal Article
TL;DR: This article introduces the modules that comprise a Web personalization system, emphasizing the Web usage mining module, and presents a review of the most common methods that are used as well as technical issues that occur.
Abstract: Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.

941 citations

Journal Article
TL;DR: The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access and show that the option of selective scan is very advantageous and can lead to prominent performance improvement.
Abstract: The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. The solution procedure consists of two steps. First, they derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, one can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, they derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted.

565 citations
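
The first step described in the abstract above, converting a traversal sequence into maximal forward references, can be illustrated with a minimal Python sketch. This is a hedged reading of the abstract, not the authors' code: the function name, the single-letter page labels, and the sample click sequence are all assumptions made for illustration. A forward path is recorded each time the user stops moving forward and revisits a page already on the current path.

```python
def maximal_forward_references(path):
    """Split one traversal sequence into maximal forward references.

    Illustrative sketch only: a revisit to a page already on the current
    path is treated as a backward reference.
    """
    current = []        # pages on the current forward path
    extending = True    # True while the most recent move was a forward move
    results = []
    for page in path:
        if page in current:
            # Backward reference: emit the path only if it had just grown.
            if extending:
                results.append(list(current))
            # Back up to the revisited page and continue from there.
            current = current[: current.index(page) + 1]
            extending = False
        else:
            current.append(page)
            extending = True
    # The sequence may end on a forward move that was never emitted.
    if extending and current:
        results.append(list(current))
    return results


# Hypothetical click sequence; each letter stands for one page.
clicks = list("ABCDCBEGHGWAOUOV")
for reference in maximal_forward_references(clicks):
    print("".join(reference))
# prints ABCD, ABEGH, ABEGW, AOU, AOV (one per line)
```

The frequent traversal patterns mentioned in the second step would then be counted over the forward references collected from all users, a step this sketch leaves out.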

Patent
02 Apr 2010
TL;DR: An improved user interface and method present recommendations to a user when the user adds an item to a shopping cart: a page generation process generates and returns a page that includes a recommendations portion and a condensed view of the shopping cart.
Abstract: An improved user interface and method are provided for presenting recommendations to a user when the user adds an item to a shopping cart. In response to the shopping cart add event, a page generation process generates and returns a page that includes a recommendations portion and a condensed view of the shopping cart. The recommendations portion preferably includes multiple recommendation sections, each of which displays a different respective set of recommended items selected according to a different respective recommendation or selection algorithm (e.g., recommendations based on shopping cart contents, recommendations based on purchase history, etc.). The condensed shopping cart view preferably lacks controls for editing the shopping cart, and lacks certain types of product information, making more screen real estate available for the display of the recommendations content. A link to a full shopping cart page allows the user to edit the shopping cart and view expanded product descriptions.

555 citations

Patent
19 Jul 2001
TL;DR: A DNS server (SPD) load balances network requests among customer Web servers and directs client requests for hosted customer content to the caching server that is closest to the user, is available, and is the least loaded.
Abstract: A content delivery and global traffic management network system provides a plurality of caching servers connected to a network. The caching servers host customer content that can be cached and stored, and respond to requests for Web content from clients. If the requested content does not exist in memory or on disk, it generates a request to an origin site to obtain the content. A DNS Server (SPD) load balances network requests among customer Web servers and directs client requests for hosted customer content to the appropriate caching server which is selected by choosing the caching server that is closest to the user, is available, and is the least loaded. SPD also supports persistence and returns the same IP addresses, for a given client. The entire Internet address space is broken up into multiple zones. Each zone is assigned to a group of SPD servers. If an SPD server gets a request from a client that is not in the zone assigned to that SPD server, it forwards the request to the SPD server assigned to that zone. Servers write information about the content delivered to log files that are picked up by a log server.

466 citations

References
Proceedings Article
01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations
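
As a rough illustration of what an association rule between items means, the sketch below scores two-item rules by support (the fraction of transactions containing both items) and confidence (that fraction divided by the support of the left-hand item). It is a brute-force sketch under stated assumptions, not the algorithm of the paper above, which adds buffer management, estimation, and pruning; the function names, thresholds, and grocery data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every qualifying rule lhs -> rhs."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a significant rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


# Hypothetical sales data: one set of purchased items per customer visit.
visits = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(pair_rules(visits, min_support=0.5, min_confidence=0.9))
# prints [('butter', 'bread', 0.5, 1.0)]
```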

Proceedings Article
06 Mar 1995
TL;DR: Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and empirically evaluating their performance using synthetic data shows that two of them have comparable performance.
Abstract: We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empirically evaluate their performance using synthetic data. Two of the proposed algorithms, AprioriSome and AprioriAll, have comparable performance, albeit AprioriSome performs a little better when the minimum number of customers that must support a sequential pattern is low. Scale-up experiments show that both AprioriSome and AprioriAll scale linearly with the number of customer transactions. They also have excellent scale-up properties with respect to the number of transactions per customer and the number of items in a transaction.

5,663 citations
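
The notion of customers "supporting" a sequential pattern in the abstract above can be made concrete with a small Python sketch. It only counts support for one given pattern and does not implement AprioriSome or AprioriAll; the function names, customer ids, and item numbers are assumptions for illustration. A customer is taken to support a pattern when the pattern's itemsets appear, in order, inside that customer's time-ordered transactions.

```python
def supports(customer_sequence, pattern):
    """True if each itemset of pattern is contained, in order, in the customer's transactions."""
    position = 0  # index of the next pattern itemset still to be matched
    for itemset in customer_sequence:
        if position < len(pattern) and set(pattern[position]) <= set(itemset):
            position += 1
    return position == len(pattern)


def support_count(database, pattern):
    """Number of customers whose transaction sequence supports the pattern."""
    return sum(supports(sequence, pattern) for sequence in database.values())


# Hypothetical database: customer id -> transactions sorted by transaction time.
database = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
print(support_count(database, [{30}, {90}]))      # prints 2
print(support_count(database, [{30}, {40, 70}]))  # prints 2
```

A pattern would count as frequent when this number reaches the chosen minimum number of supporting customers.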

Proceedings Article
22 May 1995
TL;DR: A hash-based algorithm for candidate set generation produces a number of candidate 2-itemsets that is orders of magnitude smaller than that of previous methods, resolving the performance bottleneck; the smaller candidate sets also allow the transaction database to be trimmed at a much earlier stage of the iterations, significantly reducing the computational cost of later iterations.
Abstract: In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for the candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is, in orders of magnitude, smaller than that by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. Extensive simulation study is conducted to evaluate performance of the proposed algorithm.

1,625 citations
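
The idea of hash-based candidate generation in the abstract above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function name, the seven-bucket table, the hash function `(10 * x + y) % 7`, and the integer item ids are all assumptions. While single items are counted in the first pass, every 2-itemset of each transaction is hashed into a bucket; a pair then becomes a candidate only if both of its items are frequent and its bucket count reaches the minimum support count.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(transactions, min_count, num_buckets=7):
    """Return candidate 2-itemsets that survive hash-bucket filtering."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    def bucket(pair):
        x, y = pair  # items are integers with x < y
        return (10 * x + y) % num_buckets  # assumed toy hash function

    # One pass: count single items and hash every pair into a bucket.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)
    # A bucket count is an upper bound on the count of any pair hashed to it,
    # so pairs in light buckets cannot be frequent and are dropped.
    return [
        pair
        for pair in combinations(frequent_items, 2)
        if bucket_counts[bucket(pair)] >= min_count
    ]


# Hypothetical transactions over items numbered 1 to 5.
sales = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
print(candidate_pairs(sales, min_count=2))
# prints [(1, 3), (2, 3), (2, 5), (3, 5)]
```

Without the bucket filter, the four frequent items in this toy data would give six candidate pairs; the filter removes two of them before any pair is counted exactly.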

Proceedings Article
11 Sep 1995
TL;DR: A top-down progressive deepening method is developed for mining multiple-level association rules from large transaction databases by extension of some existing association rule mining techniques.
Abstract: Previous studies on mining association rules find rules at a single concept level; however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. In this study, a top-down progressive deepening method is developed for mining multiple-level association rules from large transaction databases by extension of some existing association rule mining techniques. A group of variant algorithms are proposed based on the ways of sharing intermediate results, with the relative performance tested on different kinds of data. Relaxation of the rule conditions for finding “level-crossing” association rules is also discussed in the paper.

1,128 citations