
Showing papers on "Web modeling published in 2002"


Journal ArticleDOI
TL;DR: The Representational State Transfer (REST) architectural style is introduced, developed as an abstract model of the Web architecture and used to guide the redesign and definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers.
Abstract: The World Wide Web has succeeded in large part because its software architecture has been designed to meet the needs of an Internet-scale distributed hypermedia application. The modern Web architecture emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems. In this article we introduce the Representational State Transfer (REST) architectural style, developed as an abstract model of the Web architecture and used to guide our redesign and definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers. We describe the software engineering principles guiding REST and the interaction constraints chosen to retain those principles, contrasting them to the constraints of other architectural styles. We then compare the abstract model to the currently deployed Web architecture in order to elicit mismatches between the existing protocols and the applications they are intended to support.

1,581 citations
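As a rough illustration of the style's uniform interface and stateless, self-descriptive messages, the sketch below fetches a representation of a resource identified by a URI over plain HTTP. It is only a minimal sketch; the URL and headers are placeholders, not examples from the article.

```python
# Minimal sketch of a REST-style interaction: a stateless GET on a resource
# identified by a URI, returning a self-descriptive representation.
# The URL and Accept header below are placeholders, not from the article.
from urllib.request import Request, urlopen

def get_representation(uri):
    """Fetch one representation of the resource named by `uri`.

    Each request carries everything the server needs (no session state),
    and the response metadata describes how to interpret the body.
    """
    request = Request(uri, headers={"Accept": "text/html"}, method="GET")
    with urlopen(request) as response:
        headers = dict(response.getheaders())
        body = response.read()
    return headers, body

if __name__ == "__main__":
    headers, body = get_representation("https://example.org/")
    print(headers.get("Content-Type"), len(body), "bytes")
```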


Journal ArticleDOI
TL;DR: This tutorial explores the most salient and stable specifications in each of the three major areas of the emerging Web services framework: the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL), and the Universal Description, Discovery, and Integration (UDDI) directory.
Abstract: This tutorial explores the most salient and stable specifications in each of the three major areas of the emerging Web services framework. They are the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL) and the Universal Description, Discovery, and Integration (UDDI) directory, which is a registry of Web service descriptions.

1,470 citations
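To make the first of the three specifications concrete, the sketch below assembles a minimal SOAP 1.1 envelope for a hypothetical getQuote operation using only the standard library. The service namespace, operation, and parameter are assumptions for illustration; in a real deployment they would come from the service's WSDL, and the endpoint would be located through a UDDI registry.

```python
# Builds a minimal SOAP 1.1 envelope for a hypothetical "getQuote" operation.
# The service namespace and parameter are illustrative placeholders; in
# practice the operation and message shapes would come from the WSDL, and
# the endpoint from a UDDI registry entry.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "urn:example:stockquote"  # hypothetical service namespace

def build_soap_request(symbol):
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}getQuote")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")

if __name__ == "__main__":
    print(build_soap_request("IBM").decode("utf-8"))
```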


Journal ArticleDOI
TL;DR: The development of an instrument that captures key characteristics of web site quality from the user's perspective is reported; the instrument provides an aggregate measure of web quality and should be useful to organizations, web designers, and researchers conducting related web studies.

1,118 citations


Journal ArticleDOI
TL;DR: This work shows that the Web self-organizes and its link structure allows efficient identification of communities and is significant because no central authority or process governs the formation and structure of hyperlinks.
Abstract: The vast improvement in information access is not the only advantage resulting from the increasing percentage of hyperlinked human knowledge available on the Web. Additionally, much potential exists for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis. Millions of individuals operating independently and having a variety of backgrounds, knowledge, goals and cultures author the information on the Web. Despite the Web's decentralized, unorganized, and heterogeneous nature, our work shows that the Web self-organizes and its link structure allows efficient identification of communities. This self-organization is significant because no central authority or process governs the formation and structure of hyperlinks.

1,033 citations


Book ChapterDOI
09 Jun 2002
TL;DR: DAML-S is presented, a DAML+OIL ontology for describing the properties and capabilities of Web Services, and three aspects of the ontology are described: the service profile, the process model, and the service grounding.
Abstract: In this paper we present DAML-S, a DAML+OIL ontology for describing the properties and capabilities of Web Services. Web Services - Web-accessible programs and devices - are garnering a great deal of interest from industry, and standards are emerging for low-level descriptions of Web Services. DAML-S complements this effort by providing Web Service descriptions at the application layer, describing what a service can do, and not just how it does it. In this paper we describe three aspects of our ontology: the service profile, the process model, and the service grounding. The paper focuses on the grounding, which connects our ontology with low-level XML-based descriptions of Web Services.

1,018 citations


Patent
12 Jul 2002
TL;DR: A method and article for providing search-specific page sets and query-results listings are described, along with a method for defining the custom search page and the custom results page without the need for line-by-line computer coding.
Abstract: A method and article for providing search-specific page sets and query-results listings are provided. The method and article provide end-users with customized, search-specific pages upon which to initiate a query. A method is also provided for defining the custom search page and the custom results page without the need for line-by-line computer coding. The present invention provides product and service information to end-users in an intuitive format.

960 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: This paper defines the semantics for a relevant subset of DAML-S in terms of a first-order logical language and provides decision procedures for Web service simulation, verification and composition.
Abstract: Web services - Web-accessible programs and devices - are a key application area for the Semantic Web. With the proliferation of Web services and the evolution towards the Semantic Web comes the opportunity to automate various Web services tasks. Our objective is to enable markup and automated reasoning technology to describe, simulate, compose, test, and verify compositions of Web services. We take as our starting point the DAML-S DAML+OIL ontology for describing the capabilities of Web services. We define the semantics for a relevant subset of DAML-S in terms of a first-order logical language. With the semantics in hand, we encode our service descriptions in a Petri Net formalism and provide decision procedures for Web service simulation, verification and composition. We also provide an analysis of the complexity of these tasks under different restrictions to the DAML-S composite services we can describe. Finally, we present an implementation of our analysis techniques. This implementation takes as input a DAML-S description of a Web service, automatically generates a Petri Net and performs the desired analysis. Such a tool has broad applicability both as a back end to existing manual Web service composition tools, and as a stand-alone tool for Web service developers.

953 citations
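A toy sketch of the Petri net view of composition: places hold tokens, transitions stand in for atomic service invocations, and a naive breadth-first search over markings answers a simple reachability question ("is there a firing sequence that reaches the goal state?"). The travel-booking net below is invented and is not the paper's DAML-S translation or decision procedure.

```python
# Toy 1-bounded Petri net: markings are sets of marked places, and
# transitions stand in for atomic Web service invocations. A breadth-first
# search over markings gives a naive reachability check. This illustrates
# the formalism only, not the paper's DAML-S encoding or complexity results.
from collections import deque

# transition name -> (input places consumed, output places produced)
TRANSITIONS = {
    "planTrip":   ({"request"}, {"needFlight", "needHotel"}),
    "bookFlight": ({"needFlight"}, {"flightBooked"}),
    "bookHotel":  ({"needHotel"}, {"hotelBooked"}),
    "payInvoice": ({"flightBooked", "hotelBooked"}, {"tripConfirmed"}),
}

def enabled(marking, transition):
    inputs, _ = TRANSITIONS[transition]
    return inputs <= marking

def fire(marking, transition):
    inputs, outputs = TRANSITIONS[transition]
    return (marking - inputs) | outputs

def reachable(start, goal):
    """Return a firing sequence reaching a marking that contains `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        marking, path = queue.popleft()
        if goal <= marking:
            return path
        for t in TRANSITIONS:
            if enabled(marking, t):
                nxt = fire(marking, t)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [t]))
    return None

if __name__ == "__main__":
    print(reachable(frozenset({"request"}), frozenset({"tripConfirmed"})))
```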


Proceedings Article
01 Jan 2002
TL;DR: It is argued that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web, and logical criteria are proposed for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient.
Abstract: Motivated by the problem of automatically composing network accessible services, such as those on the World Wide Web, this paper proposes an approach to building agent technology based on the notion of generic procedures and customizing user constraints. We argue that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web. To this end, we adapt and extend the Golog language to enable programs that are generic, customizable and usable in the context of the Web. Further, we propose logical criteria for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient. To support information gathering combined with search, we propose a middle-ground Golog interpreter that operates under an assumption of reasonable persistence of certain information. These contributions are realized in our augmentation of a ConGolog interpreter that combines online execution of information-providing Web services with offline simulation of world-altering Web services, to determine a sequence of Web Services for subsequent execution. Our implemented system is currently interacting with services on the Web.

939 citations


Journal ArticleDOI
TL;DR: A fully-fledged Web Service Modeling Framework is defined that provides the appropriate conceptual model for developing and describing web services and their composition and its philosophy is based on the following principle: maximal de-coupling complemented by a scalable mediation service.

912 citations


Journal ArticleDOI
01 Jun 2002
TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.
Abstract: In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.

760 citations
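As a tiny, standard-library-only illustration of the wrapper-style tools the taxonomy covers, the sketch below pulls (title, price) records out of a fixed HTML fragment. The page structure and class names are invented; the surveyed tools induce or specify such extraction rules in far more robust ways.

```python
# Minimal wrapper-style extractor: pulls (title, price) records out of an
# HTML listing using the standard-library parser. The page structure and
# class names are invented for the example; tools in the survey's taxonomy
# learn or specify such extraction rules far more robustly.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<ul>
  <li><span class="title">Web Mining Primer</span> <span class="price">19.95</span></li>
  <li><span class="title">XML in Practice</span> <span class="price">24.50</span></li>
</ul>
"""

class ListingExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []        # extracted (title, price) tuples
        self._field = None       # which field the next text chunk belongs to
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if "title" in self._current and "price" in self._current:
                self.records.append(
                    (self._current["title"], float(self._current["price"])))
                self._current = {}

if __name__ == "__main__":
    extractor = ListingExtractor()
    extractor.feed(SAMPLE_PAGE)
    print(extractor.records)
```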


Book
01 Jan 2002
TL;DR: This book covers Web infrastructure (crawling and Web search), learning techniques (similarity and clustering, supervised and semi-supervised learning for text), and applications such as social network analysis, resource discovery, and the future of Web mining.
Abstract: Preface. Introduction. I Infrastructure: Crawling the Web. Web search. II Learning: Similarity and clustering. Supervised learning for text. Semi-supervised learning. III Applications: Social network analysis. Resource discovery. The future of Web mining.

Journal ArticleDOI
TL;DR: This work presents an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of the documents.
Abstract: Web-based applications greatly increase information availability and ease of access, which is optimal for public information. The distribution and sharing of information via the Web that must be accessed in a selective way, such as electronic commerce transactions, require the definition and enforcement of security controls, ensuring that information will be accessible only to authorized entities. Different approaches have been proposed that address the problem of protecting information in a Web system. However, these approaches typically operate at the file-system level, independently of the data that have to be protected from unauthorized accesses. Part of this problem is due to the limitations of HTML, historically used to design Web documents. The extensible markup language (XML), a markup language promoted by the World Wide Web Consortium (W3C), is de facto the standard language for the exchange of information on the Internet and represents an important opportunity to provide fine-grained access control. We present an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of the documents. We present a language for the specification of access restrictions, which uses standard notations and concepts, together with a description of a system architecture for access control enforcement based on existing technology. The result is a flexible and powerful security system offering a simple integration with current solutions.
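A minimal sketch of the element-level idea: a policy maps roles to the parts of a document they may read, and the view delivered to a requester is the document with unauthorized elements pruned. The order document, roles, and policy below are invented for illustration; the article's model and specification language are considerably richer.

```python
# Element-level access control on an XML document: the view delivered to a
# requester is the original document with elements the requester's role may
# not read pruned away. The document, roles, and paths are invented for the
# example; the article's model and policy language are much richer.
import xml.etree.ElementTree as ET

ORDER_XML = """
<order id="42">
  <item sku="A-17" qty="2"/>
  <shipping><address>10 Main St</address></shipping>
  <payment><cardNumber>4111-XXXX</cardNumber></payment>
</order>
"""

# role -> set of child elements of <order> that the role may read
POLICY = {
    "warehouse": {"item", "shipping"},
    "billing":   {"item", "payment"},
}

def authorized_view(xml_text, role):
    root = ET.fromstring(xml_text)
    allowed = POLICY.get(role, set())
    for child in list(root):              # copy: we mutate while iterating
        if child.tag not in allowed:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(authorized_view(ORDER_XML, "warehouse"))
```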

Book
30 Dec 2002
TL;DR: This book presents models for designing Web applications (data model, hypertext model, content management model), the corresponding design process, and the implementation of WebML elements.
Abstract: FOREWORD by Adam Bosworth. PREFACE. PART ONE Technology Overview: Technologies for Web Applications. PART TWO Models for Designing Web Applications: Data Model. Hypertext Model. Content Management Model. Advanced Hypertext Model. PART THREE Design of Web Applications: Overview of the Development Process. Requirements Specifications. Data Design. Hypertext Design. PART FOUR Implementation of Web Applications: Architecture Design. Data Implementation. Hypertext Implementation. Advanced Hypertext Implementation. Tools for Model-Based Development of Web Applications. APPENDIX: Summary of WebML Elements. WebML Syntax. OCL Syntax. Summary of WebML Elements Implementation. REFERENCES. INDEX.

Journal ArticleDOI
TL;DR: This article classifies and describes the main mechanisms for splitting the traffic load among server nodes, discussing both the alternative architectures and the load sharing policies.
Abstract: The overall increase in traffic on the World Wide Web is augmenting user-perceived response times from popular Web sites, especially in conjunction with special events. System platforms that do not replicate information content cannot provide the needed scalability to handle large traffic volumes and to match rapid and dramatic changes in the number of clients. The need to improve the performance of Web-based services has produced a variety of novel content delivery architectures. This article will focus on Web system architectures that consist of multiple server nodes distributed on a local area, with one or more mechanisms to spread client requests among the nodes. After years of continual proposals of new system solutions, routing mechanisms, and policies (the first dating back to 1994, when the NCSA Web site had to face its first million requests per day), many problems concerning multiple server architectures for Web sites have been solved. Other issues remain to be addressed, especially at the network application layer, but the main techniques and methodologies for building scalable Web content delivery architectures placed in a single location are now settled. This article classifies and describes the main mechanisms for splitting the traffic load among the server nodes, discussing both the alternative architectures and the load sharing policies. To this end, it focuses on architectures, internal routing mechanisms, and request dispatching algorithms for designing and implementing scalable Web-server systems under the control of one content provider. It also identifies some of the open research issues associated with the use of distributed systems for highly accessed Web sites.
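A rough sketch of two dispatching policies such a locally distributed architecture might use: round-robin spreads requests evenly across the nodes, while least-connections sends each request to the node currently serving the fewest active requests. The server names and load figures are placeholders; real dispatchers also deal with health checks, session affinity, and content-aware routing, as the article discusses.

```python
# Two simple request-dispatching policies for a locally distributed
# Web-server cluster: round-robin and least-connections. Server names and
# load figures are placeholders; production dispatchers also handle health
# checks, session affinity, and content-aware routing.
import itertools

SERVERS = ["web1", "web2", "web3"]

class RoundRobinDispatcher:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, active_connections):
        # Ignores current load: simply rotates through the nodes.
        return next(self._cycle)

class LeastConnectionsDispatcher:
    def pick(self, active_connections):
        # Choose the node currently serving the fewest requests.
        return min(active_connections, key=active_connections.get)

if __name__ == "__main__":
    load = {"web1": 12, "web2": 3, "web3": 7}
    rr = RoundRobinDispatcher(SERVERS)
    lc = LeastConnectionsDispatcher()
    print([rr.pick(load) for _ in range(4)])   # web1, web2, web3, web1
    print(lc.pick(load))                       # web2
```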

Journal ArticleDOI
TL;DR: The architectural elements of Web services are related to a real-world business scenario in order to illustrate how the Web services approach helps solve real business problems.
Abstract: This paper introduces the major components of, and standards associated with, the Web services architecture. The different roles associated with the Web services architecture and the programming stack for Web services are described. The architectural elements of Web services are then related to a real-world business scenario in order to illustrate how the Web services approach helps solve real business problems.

Journal ArticleDOI
TL;DR: This article reviews six web-based methods of customer input as examples of the improved Internet capabilities of communication, conceptualization, and computation and discusses how they complement existing methods.

Book ChapterDOI
09 Jun 2002
TL;DR: This paper focuses on collaborative development of ontologies with OntoEdit, which is guided by a comprehensive methodology.
Abstract: Ontologies now play an important role for enabling the semantic web. They provide a source of precisely defined terms, e.g., for knowledge-intensive applications. The terms are used for concise communication across people and applications. Typically, the development of ontologies involves collaborative efforts of multiple persons. OntoEdit is an ontology editor that integrates numerous aspects of ontology engineering. This paper focuses on collaborative development of ontologies with OntoEdit, which is guided by a comprehensive methodology.

Proceedings ArticleDOI
03 Dec 2002
TL;DR: The use of web mining techniques is suggested to build an agent that recommends on-line learning activities or shortcuts in a course web site, based on learners' access histories, to improve course material navigation and assist the online learning process.
Abstract: A recommender system in an e-learning context is a software agent that tries to "intelligently" recommend actions to a learner based on the actions of previous learners. This recommendation could be an on-line activity such as doing an exercise, reading posted messages on a conferencing system, or running an on-line simulation, or could be simply a web resource. These recommendation systems have been tried in e-commerce to entice purchasing of goods, but haven't been tried in e-learning. This paper suggests the use of web mining techniques to build such an agent that could recommend on-line learning activities or shortcuts in a course web site, based on learners' access histories, to improve course material navigation as well as assist the online learning process. These techniques are considered integrated web mining as opposed to off-line web mining used by expert users to discover on-line access patterns.
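A minimal sketch of the usage-mining idea behind such an agent: sessions from past learners' access logs are mined for activities that tend to occur together, and the activities most often co-occurring with the learner's current page are recommended. The activity names are invented, and this simple co-occurrence count is only a stand-in for the web mining techniques the paper proposes.

```python
# Toy usage-mining recommender: count how often pairs of learning activities
# co-occur in past learners' sessions, then recommend the activities most
# often seen together with the one the current learner just visited.
# Activity names are invented; this co-occurrence count is only a stand-in
# for the web usage mining techniques proposed in the paper.
from collections import defaultdict
from itertools import combinations

PAST_SESSIONS = [
    ["lesson1", "exercise1", "forum"],
    ["lesson1", "exercise1", "simulation"],
    ["lesson2", "forum", "exercise1"],
]

def build_cooccurrence(sessions):
    counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(set(session), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(current_item, counts, top_n=2):
    scored = [(count, other) for (item, other), count in counts.items()
              if item == current_item]
    return [other for count, other in sorted(scored, reverse=True)[:top_n]]

if __name__ == "__main__":
    counts = build_cooccurrence(PAST_SESSIONS)
    print(recommend("lesson1", counts))   # e.g. ['exercise1', ...]
```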

Proceedings ArticleDOI
07 Aug 2002
TL;DR: This paper describes the design and implementation of a distributed Web crawler that runs on a network of workstations that scales to several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications.
Abstract: Broad Web search engines as well as many more specialized search tools rely on Web crawlers to acquire large collections of pages for indexing and analysis. Such a Web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits must be taken into account in order to achieve high performance at a reasonable cost. In this paper, we describe the design and implementation of a distributed Web crawler that runs on a network of workstations. The crawler scales to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the software architecture of the system, discuss the performance bottlenecks, and describe efficient techniques for achieving high performance. We also report preliminary experimental results based on a crawl of 120 million pages on 5 million hosts.
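The sketch below shows the core loop such a crawler is built around, in a greatly simplified single-threaded form: a URL frontier, a visited set, and a per-host politeness delay. It omits robots.txt handling, DNS caching, fault tolerance, and the distribution across workstations that the paper's system provides; the seed URL is a placeholder.

```python
# Greatly simplified, single-threaded crawler core: a URL frontier, a
# visited set, and a per-host politeness delay. The paper's system adds
# distribution over many workstations, robots.txt handling, DNS caching,
# and crash resilience; the seed URL here is only a placeholder.
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def crawl(seed, max_pages=10, delay=1.0):
    frontier, visited, last_hit = deque([seed]), set(), {}
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        host = urlparse(url).netloc
        wait = delay - (time.time() - last_hit.get(host, 0))
        if wait > 0:
            time.sleep(wait)       # politeness: at most one hit per host per delay
        try:
            with urlopen(url, timeout=5) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue               # skip unreachable pages in this toy version
        last_hit[host] = time.time()
        visited.add(url)
        collector = LinkCollector(url)
        collector.feed(html)
        frontier.extend(link for link in collector.links
                        if link.startswith("http") and link not in visited)
    return visited

if __name__ == "__main__":
    print(crawl("https://example.org/"))
```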

Journal ArticleDOI
TL;DR: The goal is to help developers find the most suitable language for representing the semantic information that the Semantic Web requires, solving heterogeneous data exchange in this heterogeneous environment.
Abstract: Ontologies have proven to be an essential element in many applications. They are used in agent systems, knowledge management systems, and e-commerce platforms. They can also generate natural language, integrate intelligent information, provide semantic-based access to the Internet, and extract information from texts in addition to being used in many other applications to explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web, known as the Semantic Web, which has been defined as the conceptual structuring of the Web in an explicit machine-readable way. New ontology-based applications and knowledge architectures are developing for this new Web. A common claim for all of these approaches is the need for languages to represent the semantic information that this Web requires, solving heterogeneous data exchange in this heterogeneous environment. Our goal is to help developers find the most suitable language for their representation needs.

Journal ArticleDOI
TL;DR: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art.
Abstract: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reason for considering Web mining a separate field from data mining is explained. The limitations of some of the existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.

Book ChapterDOI
01 Oct 2002
TL;DR: OntoMat-Annotizer extracts, with the help of Amilcare, knowledge structures from web pages through the use of knowledge extraction rules, which are the result of a learning cycle based on already annotated pages.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, S-CREAM, that allows for creation of metadata and is trainable for a specific domain. Annotating web documents is one of the major techniques for creating metadata on the web. The implementation of S-CREAM, OntoMat-Annotizer, now supports the semi-automatic annotation of web pages. This semi-automatic annotation is based on the information extraction component Amilcare. OntoMat-Annotizer extracts, with the help of Amilcare, knowledge structures from web pages through the use of knowledge extraction rules. These rules are the result of a learning cycle based on already annotated pages.

Journal ArticleDOI
TL;DR: Some of the technological challenges of building today's complex Web software applications, their unique quality requirements, and how to achieve them are discussed.
Abstract: Web applications have very high requirements for numerous quality attributes. This article discusses some of the technological challenges of building today's complex Web software applications, their unique quality requirements, and how to achieve them.

Journal ArticleDOI
01 Sep 2002
TL;DR: This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines.
Abstract: This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.

Journal ArticleDOI
TL;DR: The Web Service model is presented and an overview of existing standards is given; the Web Service life-cycle is sketched, and related technical challenges are discussed, together with how they are addressed by current standards, commercial products and research efforts.
Abstract: The Internet is revolutionizing business by providing an affordable and efficient way to link companies with their partners as well as customers. Nevertheless, there are problems that degrade the profitability of the Internet: closed markets that cannot use each other's services; incompatible applications and frameworks that cannot interoperate or build upon each other; difficulties in exchanging business data. Web Services is a new paradigm for e-business that is expected to change the way business applications are developed and interoperate. A Web Service is a self-describing, self-contained, modular application accessible over the web. It exposes an XML interface, it is registered and can be located through a Web Service registry. Finally, it communicates with other services using XML messages over standard Web protocols. This paper presents the Web Service model and gives an overview of existing standards. It then sketches the Web Service life-cycle, discusses related technical challenges and how they are addressed by current standards, commercial products and research efforts. Finally it gives some concluding remarks regarding the state of the art of Web Services.

Journal ArticleDOI
TL;DR: Experimental results on Computer Science department Web server logs show that highly accurate classification models can be built using the navigational patterns in click-stream data to determine whether a session is due to a robot.
Abstract: Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. There are many reasons why it is important to identify visits by Web robots and distinguish them from other users. First of all, e-commerce retailers are particularly concerned about the unauthorized deployment of robots for gathering business intelligence at their Web sites. In addition, Web robots tend to consume considerable network bandwidth at the expense of other users. Sessions due to Web robots also make it more difficult to perform clickstream analysis effectively on the Web data. Conventional techniques for detecting Web robots are often based on identifying the IP address and user agent of the Web clients. While these techniques are applicable to many well-known robots, they may not be sufficient to detect camouflaged and previously unknown robots. In this paper, we propose an alternative approach that uses the navigational patterns in the click-stream data to determine whether a session is due to a robot. Experimental results on our Computer Science department Web server logs show that highly accurate classification models can be built using this approach. We also show that these models are able to discover many camouflaged and previously unidentified robots.
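A sketch of the feature-extraction step on which navigational-pattern classification rests: each session reconstructed from the server log is summarized by attributes such as the fraction of image requests, the fraction of HEAD requests, and whether robots.txt was fetched. The log records, feature set, and crude threshold rule below are illustrative assumptions; the paper trains proper classification models on labeled sessions.

```python
# Session-level features of the kind used to tell Web robots from human
# visitors by their navigational patterns. The log records, feature set,
# and crude threshold rule below are illustrative; the paper trains real
# classification models on labeled sessions from Web server logs.

# Each log record: (session_id, HTTP method, requested path)
LOG = [
    ("s1", "GET", "/robots.txt"),
    ("s1", "GET", "/page1.html"),
    ("s1", "GET", "/page2.html"),
    ("s2", "GET", "/page1.html"),
    ("s2", "GET", "/img/logo.png"),
    ("s2", "GET", "/page2.html"),
    ("s2", "GET", "/img/photo.jpg"),
]

IMAGE_SUFFIXES = (".png", ".jpg", ".gif")

def session_features(records):
    total = len(records)
    return {
        "image_fraction": sum(path.endswith(IMAGE_SUFFIXES)
                              for _, _, path in records) / total,
        "head_fraction": sum(method == "HEAD"
                             for _, method, _ in records) / total,
        "requested_robots_txt": any(path == "/robots.txt"
                                    for _, _, path in records),
    }

def looks_like_robot(features):
    # Crude stand-in for a learned classifier: crawlers often skip inline
    # images and fetch robots.txt, while browsers request embedded images.
    return features["requested_robots_txt"] or features["image_fraction"] == 0.0

if __name__ == "__main__":
    for sid in ("s1", "s2"):
        records = [r for r in LOG if r[0] == sid]
        feats = session_features(records)
        print(sid, feats, "robot?", looks_like_robot(feats))
```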

01 Jan 2002
TL;DR: This paper describes three benchmarks for evaluating the performance of Web sites with dynamic content, and implemented these three benchmarks with a variety of methods for building dynamic-content applications, including PHP, Java servlets and EJB (Enterprise Java Beans).
Abstract: The absence of benchmarks for Web sites with dynamic content has been a major impediment to research in this area. We describe three benchmarks for evaluating the performance of Web sites with dynamic content. The benchmarks model three common types of dynamic content Web sites with widely varying application characteristics: an online bookstore, an auction site, and a bulletin board. For the online bookstore, we use the TPC-W specification. For the auction site and the bulletin board, we provide our own specification, modeled after ebay.com and slashdot.org, respectively. For each benchmark we describe the design of the database and the interactions provided by the Web server. We have implemented these three benchmarks with a variety of methods for building dynamic-content applications, including PHP, Java servlets and EJB (Enterprise Java Beans). In all cases, we use commonly used open-source software. We also provide a client emulator that allows a dynamic content Web server to be driven with various workloads. Our implementations are available freely from our Web site for other researchers to use. These benchmarks can be used for research in dynamic Web and application server design. In this paper, we provide one example of such possible use, namely discovering the bottlenecks for applications in a particular server configuration. Other possible uses include studies of clustering and caching for dynamic content, comparison of different application implementation methods, and studying the effect of different workload characteristics on the performance of servers. With these benchmarks we hope to provide a common reference point for studies in these areas.
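A minimal sketch of the kind of client emulator such benchmarks rely on: each emulated client repeatedly picks an interaction, issues the request, records the response time, and then sleeps for a randomly drawn think time. The base URL and interaction mix below are placeholders, not the bookstore, auction, or bulletin-board workloads described in the paper.

```python
# Minimal client emulator in the spirit of dynamic-content benchmarking:
# each emulated client alternates between a randomly chosen interaction and
# an exponentially distributed "think time", while response times are
# recorded. The base URL and interaction mix are placeholders, not the
# bookstore/auction/bulletin-board mixes of the paper.
import random
import statistics
import threading
import time
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"                      # hypothetical server
INTERACTIONS = ["/home", "/search?q=web", "/item/1"]    # hypothetical mix
response_times = []
lock = threading.Lock()

def emulated_client(requests_per_client=5, mean_think_time=1.0):
    for _ in range(requests_per_client):
        url = BASE_URL + random.choice(INTERACTIONS)
        start = time.time()
        try:
            with urlopen(url, timeout=10) as response:
                response.read()
        except OSError:
            continue                       # count only completed requests
        with lock:
            response_times.append(time.time() - start)
        time.sleep(random.expovariate(1.0 / mean_think_time))

if __name__ == "__main__":
    clients = [threading.Thread(target=emulated_client) for _ in range(4)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    if response_times:
        print(f"{len(response_times)} requests, "
              f"mean latency {statistics.mean(response_times):.3f}s")
```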

Proceedings ArticleDOI
07 May 2002
TL;DR: A scalable structuring mechanism facilitating the abstraction of security policies from large web-applications developed in heterogenous multi-platform environments is described and a tool which assists programmers develop secure applications which are resilient to a wide range of common attacks is presented.
Abstract: Application-level web security refers to vulnerabilities inherent in the code of a web-application itself (irrespective of the technologies in which it is implemented or the security of the web-server/back-end database on which it is built). In the last few months, application-level vulnerabilities have been exploited with serious consequences: hackers have tricked e-commerce sites into shipping goods for no charge, user-names and passwords have been harvested and confidential information (such as addresses and credit-card numbers) has been leaked. In this paper we investigate new tools and techniques which address the problem of application-level web security. We (i) describe a scalable structuring mechanism facilitating the abstraction of security policies from large web-applications developed in heterogenous multi-platform environments; (ii) present a tool which assists programmers in developing secure applications which are resilient to a wide range of common attacks; and (iii) report results and experience arising from our implementation of these techniques.
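To make the class of vulnerability concrete, the sketch below contrasts an injectable database query built by string concatenation with a parameterized one. This is background illustration of one well-known application-level attack, not the paper's policy-abstraction mechanism or tool; the schema and login flow are invented.

```python
# Illustration of one common application-level vulnerability (SQL
# injection) and its remedy via parameterized queries. The schema and
# login flow are invented; the paper's contribution is a policy-abstraction
# mechanism and checking tool, not this specific fix.
import sqlite3

def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return db

def login_vulnerable(db, name, password):
    # DANGEROUS: attacker-controlled input is spliced into the SQL text.
    query = (f"SELECT * FROM users WHERE name = '{name}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchone() is not None

def login_safe(db, name, password):
    # Placeholders keep user input as data, never as SQL syntax.
    query = "SELECT * FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (name, password)).fetchone() is not None

if __name__ == "__main__":
    db = setup()
    attack = ("alice", "' OR '1'='1")
    print(login_vulnerable(db, *attack))   # True: authentication bypassed
    print(login_safe(db, *attack))         # False: input treated as a literal
```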

Book ChapterDOI
02 Sep 2002
TL;DR: KAON - the Karlsruhe Ontology and Semantic Web Tool Suite - is introduced; it is specifically designed to provide the ontology and metadata infrastructure needed for building, using and accessing semantics-driven applications on the Web and on the desktop.
Abstract: The Semantic Web will bring structure to the content of Web pages, being an extension of the current Web, in which information is given a well-defined meaning. Especially within e-commerce applications, Semantic Web technologies in the form of ontologies and metadata are becoming increasingly prevalent and important. This paper introduces KAON - the Karlsruhe Ontology and Semantic Web Tool Suite. KAON is developed jointly within several EU-funded projects and specifically designed to provide the ontology and metadata infrastructure needed for building, using and accessing semantics-driven applications on the Web and on your desktop.

Proceedings ArticleDOI
20 Apr 2002
TL;DR: The new Cognitive Walkthrough for the Web (CWW) is superior for evaluating how well websites support users' navigation and information search tasks.
Abstract: This paper proposes a transformation of the Cognitive Walkthrough (CW), a theory-based usability inspection method that has proven useful in designing applications that support use by exploration. The new Cognitive Walkthrough for the Web (CWW) is superior for evaluating how well websites support users' navigation and information search tasks. The CWW uses Latent Semantic Analysis to objectively estimate the degree of semantic similarity (information scent) between representative user goal statements (100-200 words) and heading/link texts on each web page. Using an actual website, the paper shows how the CWW identifies three types of problems in web page designs. Three experiments test CWW predictions of users' success rates in accomplishing goals, verifying the value of CWW for identifying these usability problems.
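The similarity computation at the heart of CWW can be approximated as follows. This sketch substitutes a plain bag-of-words cosine for Latent Semantic Analysis, and the goal statement and link texts are invented, so it only conveys the shape of the information-scent estimate.

```python
# Rough stand-in for CWW's information-scent estimate: cosine similarity
# between a user goal statement and each heading/link text on a page.
# CWW uses Latent Semantic Analysis over a large corpus; this sketch uses
# a plain bag-of-words cosine, and the texts below are invented.
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

GOAL = ("Find out how to renew a library book online before it is due "
        "so you do not have to pay a late fine")
LINKS = ["Renew or return borrowed items", "Campus parking permits",
         "Library opening hours", "Pay fines and fees"]

if __name__ == "__main__":
    goal_vector = vectorize(GOAL)
    ranked = sorted(LINKS, reverse=True,
                    key=lambda link: cosine(goal_vector, vectorize(link)))
    for link in ranked:
        print(f"{cosine(goal_vector, vectorize(link)):.2f}  {link}")
```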