SpeedTracer: a Web usage mining and analysis tool

doi:10.1147/SJ.371.0089

Journal Article•DOI•

Kun-Lung Wu¹, Philip S. Yu¹, A. Ballman¹•Institutions (1)

IBM¹

01 Jan 1998-IBM Systems Journal (IBM Corp.)-Vol. 37, Iss: 1, pp 89-105

TL;DR: The paper describes the design of SpeedTracer and demonstrates some of its features with a few sample reports that help in understanding user surfing behavior.

Abstract: SpeedTracer, a World Wide Web usage mining and analysis tool, was developed to understand user surfing behavior by exploring the Web server log files with data mining techniques. As the popularity of the Web has exploded, there is a strong desire to understand user surfing behavior. However, it is difficult to perform user-oriented data mining and analysis directly on the server log files because they tend to be ambiguous and incomplete. With innovative algorithms, SpeedTracer first identifies user sessions by reconstructing user traversal paths. It does not require “cookies” or user registration for session identification. User privacy is protected. Once user sessions are identified, data mining algorithms are then applied to discover the most common traversal paths and groups of pages frequently visited together. Important user browsing patterns are manifested through the frequent traversal paths and page groups, helping the understanding of user surfing behavior. Three types of reports are prepared: user-based reports, path-based reports and group-based reports. In this paper, we describe the design of SpeedTracer and demonstrate some of its features with a few sample reports.
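The abstract does not spell out SpeedTracer's session-identification algorithm, so the following is only a minimal sketch of one common cookie-free heuristic, not the authors' method. It assumes each log record is a hypothetical `(timestamp_seconds, host, agent, url)` tuple, treats the `(host, agent)` pair as a visitor, and starts a new session after an assumed 30-minute inactivity gap.

```python
from collections import defaultdict


def identify_sessions(records, timeout=1800):
    """Group log records into sessions without cookies or registration.

    records: iterable of (timestamp_seconds, host, agent, url) tuples
             (a hypothetical, simplified log format).
    timeout: assumed inactivity gap, in seconds, that ends a session.
    Returns a list of ((host, agent), [url, ...]) pairs.
    """
    # Bucket hits per visitor in chronological order.
    by_visitor = defaultdict(list)
    for ts, host, agent, url in sorted(records):
        by_visitor[(host, agent)].append((ts, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            # A long silence closes the current session and opens a new one.
            if ts - last_ts > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((visitor, current))
    return sessions
```

A real tool would likely also use the referrer field to reconstruct traversal paths; that step is omitted here because the abstract gives no details.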

Citations

Journal Article•DOI•

Web usage mining: discovery and applications of usage patterns from Web data

Jaideep Srivastava¹, Robert Cooley¹, Mukund Deshpande¹, Pang-Ning Tan¹•Institutions (1)

University of Minnesota¹

01 Jan 2000-SIGKDD Explorations

TL;DR: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. The paper describes its three phases (preprocessing, pattern discovery, and pattern analysis) in detail.

Abstract: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

2,227 citations

Journal Article•DOI•

Web mining for web personalization

Magdalini Eirinaki¹, Michalis Vazirgiannis¹•Institutions (1)

Athens University of Economics and Business¹

01 Feb 2003-ACM Transactions on Internet Technology

TL;DR: This article introduces the modules that comprise a Web personalization system, emphasizing the Web usage mining module, and presents a review of the most common methods that are used as well as technical issues that occur.

Abstract: Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.

941 citations

Journal Article•DOI•

Efficient data mining for path traversal patterns

Ming-Syan Chen¹, Jong Soo Park², Philip S. Yu•Institutions (2)

National Taiwan University¹, Sungshin Women's University²

01 Mar 1998-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access and show that the option of selective scan is very advantageous and can lead to prominent performance improvement.

Abstract: The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. The solution procedure consists of two steps. First, they derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, one can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, they derive algorithms to determine the frequent traversal patterns (i.e., large reference sequences) from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted.
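The first step in this abstract, converting a traversal log into maximal forward references, can be sketched as follows. This is an illustrative reading of the idea, not the authors' published pseudocode: a revisit to a page already on the current forward path is treated as a backward reference, and the path is emitted only if the preceding move was a forward one. The function name and the list-of-page-labels input are assumptions made for the example.

```python
def maximal_forward_references(path):
    """Split one user's traversal path into maximal forward references.

    path: sequence of page identifiers in the order they were visited.
    Returns a list of forward paths, each a list of page identifiers.
    """
    current = []        # forward path from the starting page to the current page
    extending = False   # True if the previous move was a forward reference
    result = []
    for page in path:
        if page in current:
            # Backward reference: emit the path only if we had been moving forward.
            if extending:
                result.append(list(current))
            # Back up to the revisited page and continue from there.
            del current[current.index(page) + 1:]
            extending = False
        else:
            current.append(page)
            extending = True
    # The path still open at the end of the log is also maximal.
    if extending and current:
        result.append(list(current))
    return result


refs = maximal_forward_references("ABCDCBEGHGWAOUOV")
print(["".join(r) for r in refs])  # ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
```

Frequent traversal patterns would then be mined from the collected forward references of all users; that second step is not shown here.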

565 citations

Patent•

User interface and methods for recommending items to users

Russell A. Dicker¹, Jeffrey T. Brownell¹, Jennifer A. Jacobi¹, Eric A. Benson¹, Gregory D. Linden¹, +1 more•Institutions (1)

Amazon.com¹

02 Apr 2010

TL;DR: An improved user interface and method are provided for presenting recommendations to a user when the user adds an item to a shopping cart: a page generation process generates and returns a page that includes a recommendations portion and a condensed view of the shopping cart.

Abstract: An improved user interface and method are provided for presenting recommendations to a user when the user adds an item to a shopping cart. In response to the shopping cart add event, a page generation process generates and returns a page that includes a recommendations portion and a condensed view of the shopping cart. The recommendations portion preferably includes multiple recommendation sections, each of which displays a different respective set of recommended items selected according to a different respective recommendation or selection algorithm (e.g., recommendations based on shopping cart contents, recommendations based on purchase history, etc.). The condensed shopping cart view preferably lacks controls for editing the shopping cart, and lacks certain types of product information, making more screen real estate available for the display of the recommendations content. A link to a full shopping cart page allows the user to edit the shopping cart and view expanded product descriptions.

555 citations

Patent•

Content delivery and global traffic management network system

Eric Sven-Johan Swildens, Richard David Day¹, Ajit K. Gupta¹•Institutions (1)

Akamai Technologies¹

19 Jul 2001

TL;DR: A DNS Server (SPD) load balances network requests among customer Web servers and directs client requests for hosted customer content to the appropriate caching server, which is selected by choosing the caching server that is closest to the user, is available, and is the least loaded.

Abstract: A content delivery and global traffic management network system provides a plurality of caching servers connected to a network. The caching servers host customer content that can be cached and stored, and respond to requests for Web content from clients. If the requested content does not exist in memory or on disk, the caching server generates a request to an origin site to obtain the content. A DNS Server (SPD) load balances network requests among customer Web servers and directs client requests for hosted customer content to the appropriate caching server, which is selected by choosing the caching server that is closest to the user, is available, and is the least loaded. SPD also supports persistence and returns the same IP addresses for a given client. The entire Internet address space is broken up into multiple zones. Each zone is assigned to a group of SPD servers. If an SPD server gets a request from a client that is not in the zone assigned to that SPD server, it forwards the request to the SPD server assigned to that zone. Servers write information about the content delivered to log files that are picked up by a log server.

466 citations

References

Proceedings Article•DOI•

Mining association rules between sets of items in large databases

Rakesh Agrawal¹, Tomasz Imielinski², Arun N. Swami¹•Institutions (2)

IBM¹, Rutgers University²

01 Jun 1993

TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.

Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations

Proceedings Article•

Fast Algorithms for Mining Association Rules in Large Databases

Rakesh Agrawal, Ramakrishnan Srikant

12 Sep 1994

10,454 citations

Proceedings Article•DOI•

Mining sequential patterns

Rakesh Agrawal¹, Ramakrishnan Srikant¹•Institutions (1)

IBM¹

06 Mar 1995

TL;DR: Three algorithms are presented to solve the problem of mining sequential patterns over databases of customer transactions, and empirically evaluating their performance using synthetic data shows that two of them have comparable performance.

Abstract: We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empirically evaluate their performance using synthetic data. Two of the proposed algorithms, AprioriSome and AprioriAll, have comparable performance, albeit AprioriSome performs a little better when the minimum number of customers that must support a sequential pattern is low. Scale-up experiments show that both AprioriSome and AprioriAll scale linearly with the number of customer transactions. They also have excellent scale-up properties with respect to the number of transactions per customer and the number of items in a transaction.

5,663 citations

Proceedings Article•DOI•

An effective hash-based algorithm for mining association rules

Jong Soo Park¹, Ming-Syan Chen¹, Philip S. Yu¹•Institutions (1)

IBM¹

22 May 1995

TL;DR: The proposed hash-based algorithm generates orders of magnitude fewer candidate 2-itemsets than previous methods, thus resolving the performance bottleneck. The smaller candidate sets also make it possible to trim the transaction database at a much earlier stage of the iterations, significantly reducing the computational cost of later iterations.

Abstract: In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by constructing a candidate set of itemsets first and then, identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm for the candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is, in orders of magnitude, smaller than that by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. Extensive simulation study is conducted to evaluate performance of the proposed algorithm.
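The iterative candidate-then-count scheme described in this abstract can be illustrated with a small level-wise sketch. Note that this is the plain Apriori-style baseline, not the hash-based candidate pruning the paper proposes. The function name, the absolute `min_support` count, and the set-of-items transaction format are assumptions made for the example.

```python
from itertools import combinations


def large_itemsets(transactions, min_support):
    """Find all itemsets appearing in at least `min_support` transactions.

    transactions: iterable of item collections.
    Returns a dict mapping each large itemset (a frozenset) to its count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items to obtain the large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(large)

    k = 2
    while large:
        prev = set(large)
        # Build candidate k-itemsets by joining large (k-1)-itemsets, keeping
        # only those whose (k-1)-subsets are all large.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Scan the database once to count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {s: c for s, c in counts.items() if c >= min_support}
        result.update(large)
        k += 1
    return result
```

The candidate set for k = 2 is where this baseline is most expensive, which is the bottleneck the abstract says its hashing technique targets.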

1,625 citations

Proceedings Article•

Discovery of Multiple-Level Association Rules from Large Databases

Jiawei Han, Yongjian Fu

11 Sep 1995

TL;DR: A top-down progressive deepening method is developed for mining multiple-level association rules from large transaction databases by extension of some existing association rule mining techniques.

Abstract: Previous studies on mining association rules find rules at a single concept level; however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. In this study, a top-down progressive deepening method is developed for mining multiple-level association rules from large transaction databases by extension of some existing association rule mining techniques. A group of variant algorithms are proposed based on the ways of sharing intermediate results, with the relative performance tested on different kinds of data. Relaxation of the rule conditions for finding “level-crossing” association rules is also discussed in the paper.

1,128 citations