
Showing papers on "Web modeling published in 2002"


Journal ArticleDOI
TL;DR: The Representational State Transfer (REST) architectural style is introduced, developed as an abstract model of the Web architecture and used to guide the redesign and definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers.
Abstract: The World Wide Web has succeeded in large part because its software architecture has been designed to meet the needs of an Internet-scale distributed hypermedia application. The modern Web architecture emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems. In this article we introduce the Representational State Transfer (REST) architectural style, developed as an abstract model of the Web architecture and used to guide our redesign and definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers. We describe the software engineering principles guiding REST and the interaction constraints chosen to retain those principles, contrasting them to the constraints of other architectural styles. We then compare the abstract model to the currently deployed Web architecture in order to elicit mismatches between the existing protocols and the applications they are intended to support.

1,581 citations
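As a rough illustration of the style's uniform interface and stateless, self-descriptive messages, the sketch below fetches a representation of a resource identified by a URI over plain HTTP. It is only a minimal sketch; the URL and headers are placeholders, not examples from the article.

```python
# Minimal sketch of a REST-style interaction: a stateless GET on a resource
# identified by a URI, returning a self-descriptive representation.
# The URL and Accept header below are placeholders, not from the article.
from urllib.request import Request, urlopen

def get_representation(uri):
    """Fetch one representation of the resource named by `uri`.

    Each request carries everything the server needs (no session state),
    and the response metadata describes how to interpret the body.
    """
    request = Request(uri, headers={"Accept": "text/html"}, method="GET")
    with urlopen(request) as response:
        headers = dict(response.getheaders())
        body = response.read()
    return headers, body

if __name__ == "__main__":
    headers, body = get_representation("https://example.org/")
    print(headers.get("Content-Type"), len(body), "bytes")
```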


Journal ArticleDOI
TL;DR: This tutorial explores the most salient and stable specifications in each of the three major areas of the emerging Web services framework: the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL), and the Universal Description, Discovery, and Integration (UDDI) directory.
Abstract: This tutorial explores the most salient and stable specifications in each of the three major areas of the emerging Web services framework. They are the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL) and the Universal Description, Discovery, and Integration (UDDI) directory, which is a registry of Web service descriptions.

1,470 citations
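To make the first of the three specifications concrete, the sketch below assembles a minimal SOAP 1.1 envelope for a hypothetical getQuote operation using only the standard library. The service namespace, operation, and parameter are assumptions for illustration; in a real deployment they would come from the service's WSDL, and the endpoint would be located through a UDDI registry.

```python
# Builds a minimal SOAP 1.1 envelope for a hypothetical "getQuote" operation.
# The service namespace and parameter are illustrative placeholders; in
# practice the operation and message shapes would come from the WSDL, and
# the endpoint from a UDDI registry entry.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "urn:example:stockquote"  # hypothetical service namespace

def build_soap_request(symbol):
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}getQuote")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")

if __name__ == "__main__":
    print(build_soap_request("IBM").decode("utf-8"))
```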


Journal ArticleDOI
TL;DR: The development of an instrument that captures key characteristics of web site quality from the user's perspective is reported; the instrument provides an aggregate measure of web quality and should be useful to organizations, web designers, and researchers conducting related web studies.

1,118 citations


Journal ArticleDOI
TL;DR: This work shows that the Web self-organizes and its link structure allows efficient identification of communities and is significant because no central authority or process governs the formation and structure of hyperlinks.
Abstract: The vast improvement in information access is not the only advantage resulting from the increasing percentage of hyperlinked human knowledge available on the Web. Additionally, much potential exists for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis. Millions of individuals operating independently and having a variety of backgrounds, knowledge, goals and cultures author the information on the Web. Despite the Web's decentralized, unorganized, and heterogeneous nature, our work shows that the Web self-organizes and its link structure allows efficient identification of communities. This self-organization is significant because no central authority or process governs the formation and structure of hyperlinks.

1,033 citations


Book ChapterDOI
09 Jun 2002
TL;DR: DAML-S is presented, a DAML+OIL ontology for describing the properties and capabilities of Web Services, and three aspects of the ontology are described: the service profile, the process model, and the service grounding.
Abstract: In this paper we present DAML-S, a DAML+OIL ontology for describing the properties and capabilities of Web Services. Web Services - Web-accessible programs and devices - are garnering a great deal of interest from industry, and standards are emerging for low-level descriptions of Web Services. DAML-S complements this effort by providing Web Service descriptions at the application layer, describing what a service can do, and not just how it does it. In this paper we describe three aspects of our ontology: the service profile, the process model, and the service grounding. The paper focuses on the grounding, which connects our ontology with low-level XML-based descriptions of Web Services.

1,018 citations


Patent
12 Jul 2002
TL;DR: A method and article for providing search-specific page sets and query-results listings are described, along with a method for defining the custom search page and the custom results page without the need for line-by-line computer coding.
Abstract: A method and article for providing search-specific page sets and query-results listings are provided. The method and article provide end-users with customized, search-specific pages upon which to initiate a query. A method is also provided for defining the custom search page and the custom results page without the need for line-by-line computer coding. The present invention provides product and service information to end-users in an intuitive format.

960 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: This paper defines the semantics for a relevant subset of DAML-S in terms of a first-order logical language and provides decision procedures for Web service simulation, verification and composition.
Abstract: Web services - Web-accessible programs and devices - are a key application area for the Semantic Web. With the proliferation of Web services and the evolution towards the Semantic Web comes the opportunity to automate various Web services tasks. Our objective is to enable markup and automated reasoning technology to describe, simulate, compose, test, and verify compositions of Web services. We take as our starting point the DAML-S DAML+OIL ontology for describing the capabilities of Web services. We define the semantics for a relevant subset of DAML-S in terms of a first-order logical language. With the semantics in hand, we encode our service descriptions in a Petri Net formalism and provide decision procedures for Web service simulation, verification and composition. We also provide an analysis of the complexity of these tasks under different restrictions to the DAML-S composite services we can describe. Finally, we present an implementation of our analysis techniques. This implementation takes as input a DAML-S description of a Web service, automatically generates a Petri Net and performs the desired analysis. Such a tool has broad applicability both as a back end to existing manual Web service composition tools, and as a stand-alone tool for Web service developers.

953 citations
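A toy sketch of the Petri net view of composition: places hold tokens, transitions stand in for atomic service invocations, and a naive breadth-first search over markings answers a simple reachability question ("is there a firing sequence that reaches the goal state?"). The travel-booking net below is invented and is not the paper's DAML-S translation or decision procedure.

```python
# Toy 1-bounded Petri net: markings are sets of marked places, and
# transitions stand in for atomic Web service invocations. A breadth-first
# search over markings gives a naive reachability check. This illustrates
# the formalism only, not the paper's DAML-S encoding or complexity results.
from collections import deque

# transition name -> (input places consumed, output places produced)
TRANSITIONS = {
    "planTrip":   ({"request"}, {"needFlight", "needHotel"}),
    "bookFlight": ({"needFlight"}, {"flightBooked"}),
    "bookHotel":  ({"needHotel"}, {"hotelBooked"}),
    "payInvoice": ({"flightBooked", "hotelBooked"}, {"tripConfirmed"}),
}

def enabled(marking, transition):
    inputs, _ = TRANSITIONS[transition]
    return inputs <= marking

def fire(marking, transition):
    inputs, outputs = TRANSITIONS[transition]
    return (marking - inputs) | outputs

def reachable(start, goal):
    """Return a firing sequence reaching a marking that contains `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        marking, path = queue.popleft()
        if goal <= marking:
            return path
        for t in TRANSITIONS:
            if enabled(marking, t):
                nxt = fire(marking, t)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [t]))
    return None

if __name__ == "__main__":
    print(reachable(frozenset({"request"}), frozenset({"tripConfirmed"})))
```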


Proceedings Article
01 Jan 2002
TL;DR: It is argued that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web, and logical criteria are proposed for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient.
Abstract: Motivated by the problem of automatically composing network accessible services, such as those on the World Wide Web, this paper proposes an approach to building agent technology based on the notion of generic procedures and customizing user constraints. We argue that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web. To this end, we adapt and extend the Golog language to enable programs that are generic, customizable and usable in the context of the Web. Further, we propose logical criteria for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient. To support information gathering combined with search, we propose a middle-ground Golog interpreter that operates under an assumption of reasonable persistence of certain information. These contributions are realized in our augmentation of a ConGolog interpreter that combines online execution of information-providing Web services with offline simulation of world-altering Web services, to determine a sequence of Web Services for subsequent execution. Our implemented system is currently interacting with services on the Web.

939 citations


Journal ArticleDOI
TL;DR: A fully-fledged Web Service Modeling Framework is defined that provides the appropriate conceptual model for developing and describing web services and their composition and its philosophy is based on the following principle: maximal de-coupling complemented by a scalable mediation service.

912 citations


Journal ArticleDOI
01 Jun 2002
TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.
Abstract: In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.

760 citations
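As a tiny, standard-library-only illustration of the wrapper-style tools the taxonomy covers, the sketch below pulls (title, price) records out of a fixed HTML fragment. The page structure and class names are invented; the surveyed tools induce or specify such extraction rules in far more robust ways.

```python
# Minimal wrapper-style extractor: pulls (title, price) records out of an
# HTML listing using the standard-library parser. The page structure and
# class names are invented for the example; tools in the survey's taxonomy
# learn or specify such extraction rules far more robustly.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<ul>
  <li><span class="title">Web Mining Primer</span> <span class="price">19.95</span></li>
  <li><span class="title">XML in Practice</span> <span class="price">24.50</span></li>
</ul>
"""

class ListingExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []        # extracted (title, price) tuples
        self._field = None       # which field the next text chunk belongs to
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if "title" in self._current and "price" in self._current:
                self.records.append(
                    (self._current["title"], float(self._current["price"])))
                self._current = {}

if __name__ == "__main__":
    extractor = ListingExtractor()
    extractor.feed(SAMPLE_PAGE)
    print(extractor.records)
```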


Book
01 Jan 2002
TL;DR: This book covers Web infrastructure (crawling and Web search), learning techniques (similarity and clustering, supervised and semi-supervised learning for text), and applications such as social network analysis, resource discovery, and the future of Web mining.
Abstract: Preface. Introduction. I Infrastructure: Crawling the Web. Web search. II Learning: Similarity and clustering. Supervised learning for text. Semi-supervised learning. III Applications: Social network analysis. Resource discovery. The future of Web mining.

Journal ArticleDOI
TL;DR: This work presents an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of the documents.
Abstract: Web-based applications greatly increase information availability and ease of access, which is optimal for public information. The distribution and sharing of information via the Web that must be accessed in a selective way, such as electronic commerce transactions, require the definition and enforcement of security controls, ensuring that information will be accessible only to authorized entities. Different approaches have been proposed that address the problem of protecting information in a Web system. However, these approaches typically operate at the file-system level, independently of the data that have to be protected from unauthorized accesses. Part of this problem is due to the limitations of HTML, historically used to design Web documents. The extensible markup language (XML), a markup language promoted by the World Wide Web Consortium (W3C), is de facto the standard language for the exchange of information on the Internet and represents an important opportunity to provide fine-grained access control. We present an access control model to protect information distributed on the Web that, by exploiting XML's own capabilities, allows the definition and enforcement of access restrictions directly on the structure and content of the documents. We present a language for the specification of access restrictions, which uses standard notations and concepts, together with a description of a system architecture for access control enforcement based on existing technology. The result is a flexible and powerful security system offering a simple integration with current solutions.
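A minimal sketch of the element-level idea: a policy maps roles to the parts of a document they may read, and the view delivered to a requester is the document with unauthorized elements pruned. The order document, roles, and policy below are invented for illustration; the article's model and specification language are considerably richer.

```python
# Element-level access control on an XML document: the view delivered to a
# requester is the original document with elements the requester's role may
# not read pruned away. The document, roles, and paths are invented for the
# example; the article's model and policy language are much richer.
import xml.etree.ElementTree as ET

ORDER_XML = """
<order id="42">
  <item sku="A-17" qty="2"/>
  <shipping><address>10 Main St</address></shipping>
  <payment><cardNumber>4111-XXXX</cardNumber></payment>
</order>
"""

# role -> set of child elements of <order> that the role may read
POLICY = {
    "warehouse": {"item", "shipping"},
    "billing":   {"item", "payment"},
}

def authorized_view(xml_text, role):
    root = ET.fromstring(xml_text)
    allowed = POLICY.get(role, set())
    for child in list(root):              # copy: we mutate while iterating
        if child.tag not in allowed:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(authorized_view(ORDER_XML, "warehouse"))
```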

Book
30 Dec 2002
TL;DR: This book presents models for designing Web applications (data model, hypertext model, content management model), the corresponding design process, and the implementation of WebML elements.
Abstract: FOREWORD by Adam Bosworth. PREFACE. PART ONE Technology Overview: Technologies for Web Applications. PART TWO Models for Designing Web Applications: Data Model. Hypertext Model. Content Management Model. Advanced Hypertext Model. PART THREE Design of Web Applications: Overview of the Development Process. Requirements Specifications. Data Design. Hypertext Design. PART FOUR Implementation of Web Applications: Architecture Design. Data Implementation. Hypertext Implementation. Advanced Hypertext Implementation. Tools for Model-Based Development of Web Applications. APPENDIX: Summary of WebML Elements. WebML Syntax. OCL Syntax. Summary of WebML Elements Implementation. REFERENCES. INDEX.

Journal ArticleDOI
TL;DR: This article classifies and describes the main mechanisms for splitting the traffic load among server nodes, discussing both the alternative architectures and the load sharing policies.
Abstract: The overall increase in traffic on the World Wide Web is augmenting user-perceived response times from popular Web sites, especially in conjunction with special events. System platforms that do not replicate information content cannot provide the needed scalability to handle large traffic volumes and to match rapid and dramatic changes in the number of clients. The need to improve the performance of Web-based services has produced a variety of novel content delivery architectures. This article will focus on Web system architectures that consist of multiple server nodes distributed on a local area, with one or more mechanisms to spread client requests among the nodes. After years of continual proposals of new system solutions, routing mechanisms, and policies (the first dating back to 1994, when the NCSA Web site had to face its first million requests per day), many problems concerning multiple server architectures for Web sites have been solved. Other issues remain to be addressed, especially at the network application layer, but the main techniques and methodologies for building scalable Web content delivery architectures placed in a single location are now settled. This article classifies and describes the main mechanisms for splitting the traffic load among the server nodes, discussing both the alternative architectures and the load sharing policies. To this end, it focuses on architectures, internal routing mechanisms, and request dispatching algorithms for designing and implementing scalable Web-server systems under the control of one content provider. It also identifies some of the open research issues associated with the use of distributed systems for highly accessed Web sites.
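A rough sketch of two dispatching policies such a locally distributed architecture might use: round-robin spreads requests evenly across the nodes, while least-connections sends each request to the node currently serving the fewest active requests. The server names and load figures are placeholders; real dispatchers also deal with health checks, session affinity, and content-aware routing, as the article discusses.

```python
# Two simple request-dispatching policies for a locally distributed
# Web-server cluster: round-robin and least-connections. Server names and
# load figures are placeholders; production dispatchers also handle health
# checks, session affinity, and content-aware routing.
import itertools

SERVERS = ["web1", "web2", "web3"]

class RoundRobinDispatcher:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, active_connections):
        # Ignores current load: simply rotates through the nodes.
        return next(self._cycle)

class LeastConnectionsDispatcher:
    def pick(self, active_connections):
        # Choose the node currently serving the fewest requests.
        return min(active_connections, key=active_connections.get)

if __name__ == "__main__":
    load = {"web1": 12, "web2": 3, "web3": 7}
    rr = RoundRobinDispatcher(SERVERS)
    lc = LeastConnectionsDispatcher()
    print([rr.pick(load) for _ in range(4)])   # web1, web2, web3, web1
    print(lc.pick(load))                       # web2
```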

Journal ArticleDOI
TL;DR: The architectural elements of Web services are related to a real-world business scenario in order to illustrate how the Web services approach helps solve real business problems.
Abstract: This paper introduces the major components of, and standards associated with, the Web services architecture. The different roles associated with the Web services architecture and the programming stack for Web services are described. The architectural elements of Web services are then related to a real-world business scenario in order to illustrate how the Web services approach helps solve real business problems.

Journal ArticleDOI
TL;DR: This article reviews six web-based methods of customer input as examples of the improved Internet capabilities of communication, conceptualization, and computation and discusses how they complement existing methods.

Book ChapterDOI
09 Jun 2002
TL;DR: This paper focuses on collaborative development of ontologies with OntoEdit, which is guided by a comprehensive methodology.
Abstract: Ontologies now play an important role for enabling the semantic web. They provide a source of precisely defined terms, e.g., for knowledge-intensive applications. The terms are used for concise communication across people and applications. Typically, the development of ontologies involves collaborative efforts of multiple persons. OntoEdit is an ontology editor that integrates numerous aspects of ontology engineering. This paper focuses on collaborative development of ontologies with OntoEdit, which is guided by a comprehensive methodology.

Proceedings ArticleDOI
03 Dec 2002
TL;DR: The use of web mining techniques is suggested to build an agent that recommends on-line learning activities or shortcuts in a course web site, based on learners' access histories, to improve course material navigation and assist the online learning process.
Abstract: A recommender system in an e-learning context is a software agent that tries to "intelligently" recommend actions to a learner based on the actions of previous learners. This recommendation could be an on-line activity such as doing an exercise, reading posted messages on a conferencing system, or running an on-line simulation, or could be simply a web resource. These recommendation systems have been tried in e-commerce to entice purchasing of goods, but haven't been tried in e-learning. This paper suggests the use of web mining techniques to build such an agent that could recommend on-line learning activities or shortcuts in a course web site, based on learners' access histories, to improve course material navigation as well as assist the online learning process. These techniques are considered integrated web mining as opposed to off-line web mining used by expert users to discover on-line access patterns.
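A minimal sketch of the usage-mining idea behind such an agent: sessions from past learners' access logs are mined for activities that tend to occur together, and the activities most often co-occurring with the learner's current page are recommended. The activity names are invented, and this simple co-occurrence count is only a stand-in for the web mining techniques the paper proposes.

```python
# Toy usage-mining recommender: count how often pairs of learning activities
# co-occur in past learners' sessions, then recommend the activities most
# often seen together with the one the current learner just visited.
# Activity names are invented; this co-occurrence count is only a stand-in
# for the web usage mining techniques proposed in the paper.
from collections import defaultdict
from itertools import combinations

PAST_SESSIONS = [
    ["lesson1", "exercise1", "forum"],
    ["lesson1", "exercise1", "simulation"],
    ["lesson2", "forum", "exercise1"],
]

def build_cooccurrence(sessions):
    counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(set(session), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(current_item, counts, top_n=2):
    scored = [(count, other) for (item, other), count in counts.items()
              if item == current_item]
    return [other for count, other in sorted(scored, reverse=True)[:top_n]]

if __name__ == "__main__":
    counts = build_cooccurrence(PAST_SESSIONS)
    print(recommend("lesson1", counts))   # e.g. ['exercise1', ...]
```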

Proceedings ArticleDOI
07 Aug 2002
TL;DR: This paper describes the design and implementation of a distributed Web crawler that runs on a network of workstations that scales to several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications.
Abstract: Broad Web search engines as well as many more specialized search tools rely on Web crawlers to acquire large collections of pages for indexing and analysis. Such a Web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits must be taken into account in order to achieve high performance at a reasonable cost. In this paper, we describe the design and implementation of a distributed Web crawler that runs on a network of workstations. The crawler scales to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. We present the software architecture of the system, discuss the performance bottlenecks, and describe efficient techniques for achieving high performance. We also report preliminary experimental results based on a crawl of 120 million pages on 5 million hosts.
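The sketch below shows the core loop such a crawler is built around, in a greatly simplified single-threaded form: a URL frontier, a visited set, and a per-host politeness delay. It omits robots.txt handling, DNS caching, fault tolerance, and the distribution across workstations that the paper's system provides; the seed URL is a placeholder.

```python
# Greatly simplified, single-threaded crawler core: a URL frontier, a
# visited set, and a per-host politeness delay. The paper's system adds
# distribution over many workstations, robots.txt handling, DNS caching,
# and crash resilience; the seed URL here is only a placeholder.
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def crawl(seed, max_pages=10, delay=1.0):
    frontier, visited, last_hit = deque([seed]), set(), {}
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        host = urlparse(url).netloc
        wait = delay - (time.time() - last_hit.get(host, 0))
        if wait > 0:
            time.sleep(wait)       # politeness: at most one hit per host per delay
        try:
            with urlopen(url, timeout=5) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue               # skip unreachable pages in this toy version
        last_hit[host] = time.time()
        visited.add(url)
        collector = LinkCollector(url)
        collector.feed(html)
        frontier.extend(link for link in collector.links
                        if link.startswith("http") and link not in visited)
    return visited

if __name__ == "__main__":
    print(crawl("https://example.org/"))
```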

Journal ArticleDOI
TL;DR: The goal is to help developers find the most suitable language for representing the semantic information that the Semantic Web requires, solving heterogeneous data exchange in this heterogeneous environment.
Abstract: Ontologies have proven to be an essential element in many applications. They are used in agent systems, knowledge management systems, and e-commerce platforms. They can also generate natural language, integrate intelligent information, provide semantic-based access to the Internet, and extract information from texts in addition to being used in many other applications to explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web, known as the Semantic Web, which has been defined as the conceptual structuring of the Web in an explicit machine-readable way. New ontology-based applications and knowledge architectures are developing for this new Web. A common claim for all of these approaches is the need for languages to represent the semantic information that this Web requires, solving heterogeneous data exchange in this heterogeneous environment. Our goal is to help developers find the most suitable language for their representation needs.

Journal ArticleDOI
TL;DR: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art.
Abstract: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reason for considering Web mining a separate field from data mining is explained. The limitations of some of the existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.

Book ChapterDOI
01 Oct 2002
TL;DR: OntoMat-Annotizer extracts, with the help of Amilcare, knowledge structures from web pages through the use of knowledge extraction rules, which are the result of a learning cycle based on already annotated pages.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, S-CREAM, that allows for creation of metadata and is trainable for a specific domain. Annotating web documents is one of the major techniques for creating metadata on the web. The implementation of S-CREAM, OntoMat-Annotizer, now supports the semi-automatic annotation of web pages. This semi-automatic annotation is based on the information extraction component Amilcare. OntoMat-Annotizer extracts, with the help of Amilcare, knowledge structures from web pages through the use of knowledge extraction rules. These rules are the result of a learning cycle based on already annotated pages.

Journal ArticleDOI
TL;DR: Some of the technological challenges of building today's complex Web software applications, their unique quality requirements, and how to achieve them are discussed.
Abstract: Web applications have very high requirements for numerous quality attributes. This article discusses some of the technological challenges of building today's complex Web software applications, their unique quality requirements, and how to achieve them.

Journal ArticleDOI
01 Sep 2002
TL;DR: This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines.
Abstract: This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.

Journal ArticleDOI
TL;DR: The Web Service model is presented and an overview of existing standards is given; the Web Service life-cycle is sketched, and related technical challenges are discussed, together with how they are addressed by current standards, commercial products and research efforts.
Abstract: The Internet is revolutionizing business by providing an affordable and efficient way to link companies with their partners as well as customers. Nevertheless, there are problems that degrade the profitability of the Internet: closed markets that cannot use each other's services; incompatible applications and frameworks that cannot interoperate or build upon each other; difficulties in exchanging business data. Web Services is a new paradigm for e-business that is expected to change the way business applications are developed and interoperate. A Web Service is a self-describing, self-contained, modular application accessible over the web. It exposes an XML interface, it is registered and can be located through a Web Service registry. Finally, it communicates with other services using XML messages over standard Web protocols. This paper presents the Web Service model and gives an overview of existing standards. It then sketches the Web Service life-cycle, discusses related technical challenges and how they are addressed by current standards, commercial products and research efforts. Finally it gives some concluding remarks regarding the state of the art of Web Services.

Journal ArticleDOI
TL;DR: Experimental results on Computer Science department Web server logs show that highly accurate classification models can be built using the navigational patterns in click-stream data to determine whether a session is due to a robot.
Abstract: Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. There are many reasons why it is important to identify visits by Web robots and distinguish them from other users. First of all, e-commerce retailers are particularly concerned about the unauthorized deployment of robots for gathering business intelligence at their Web sites. In addition, Web robots tend to consume considerable network bandwidth at the expense of other users. Sessions due to Web robots also make it more difficult to perform clickstream analysis effectively on the Web data. Conventional techniques for detecting Web robots are often based on identifying the IP address and user agent of the Web clients. While these techniques are applicable to many well-known robots, they may not be sufficient to detect camouflaged and previously unknown robots. In this paper, we propose an alternative approach that uses the navigational patterns in the click-stream data to determine whether a session is due to a robot. Experimental results on our Computer Science department Web server logs show that highly accurate classification models can be built using this approach. We also show that these models are able to discover many camouflaged and previously unidentified robots.
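A sketch of the feature-extraction step on which navigational-pattern classification rests: each session reconstructed from the server log is summarized by attributes such as the fraction of image requests, the fraction of HEAD requests, and whether robots.txt was fetched. The log records, feature set, and crude threshold rule below are illustrative assumptions; the paper trains proper classification models on labeled sessions.

```python
# Session-level features of the kind used to tell Web robots from human
# visitors by their navigational patterns. The log records, feature set,
# and crude threshold rule below are illustrative; the paper trains real
# classification models on labeled sessions from Web server logs.

# Each log record: (session_id, HTTP method, requested path)
LOG = [
    ("s1", "GET", "/robots.txt"),
    ("s1", "GET", "/page1.html"),
    ("s1", "GET", "/page2.html"),
    ("s2", "GET", "/page1.html"),
    ("s2", "GET", "/img/logo.png"),
    ("s2", "GET", "/page2.html"),
    ("s2", "GET", "/img/photo.jpg"),
]

IMAGE_SUFFIXES = (".png", ".jpg", ".gif")

def session_features(records):
    total = len(records)
    return {
        "image_fraction": sum(path.endswith(IMAGE_SUFFIXES)
                              for _, _, path in records) / total,
        "head_fraction": sum(method == "HEAD"
                             for _, method, _ in records) / total,
        "requested_robots_txt": any(path == "/robots.txt"
                                    for _, _, path in records),
    }

def looks_like_robot(features):
    # Crude stand-in for a learned classifier: crawlers often skip inline
    # images and fetch robots.txt, while browsers request embedded images.
    return features["requested_robots_txt"] or features["image_fraction"] == 0.0

if __name__ == "__main__":
    for sid in ("s1", "s2"):
        records = [r for r in LOG if r[0] == sid]
        feats = session_features(records)
        print(sid, feats, "robot?", looks_like_robot(feats))
```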

01 Jan 2002
TL;DR: This paper describes three benchmarks for evaluating the performance of Web sites with dynamic content, and implemented these three benchmarks with a variety of methods for building dynamic-content applications, including PHP, Java servlets and EJB (Enterprise Java Beans).
Abstract: The absence of benchmarks for Web sites with dynamic content has been a major impediment to research in this area. We describe three benchmarks for evaluating the performance of Web sites with dynamic content. The benchmarks model three common types of dynamic content Web sites with widely varying application characteristics: an online bookstore, an auction site, and a bulletin board. For the online bookstore, we use the TPC-W specification. For the auction site and the bulletin board, we provide our own specification, modeled after ebay.com and slashdot.org, respectively. For each benchmark we describe the design of the database and the interactions provided by the Web server. We have implemented these three benchmarks with a variety of methods for building dynamic-content applications, including PHP, Java servlets and EJB (Enterprise Java Beans). In all cases, we use commonly used open-source software. We also provide a client emulator that allows a dynamic content Web server to be driven with various workloads. Our implementations are available freely from our Web site for other researchers to use. These benchmarks can be used for research in dynamic Web and application server design. In this paper, we provide one example of such possible use, namely discovering the bottlenecks for applications in a particular server configuration. Other possible uses include studies of clustering and caching for dynamic content, comparison of different application implementation methods, and studying the effect of different workload characteristics on the performance of servers. With these benchmarks we hope to provide a common reference point for studies in these areas.
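A minimal sketch of the kind of client emulator such benchmarks rely on: each emulated client repeatedly picks an interaction, issues the request, records the response time, and then sleeps for a randomly drawn think time. The base URL and interaction mix below are placeholders, not the bookstore, auction, or bulletin-board workloads described in the paper.

```python
# Minimal client emulator in the spirit of dynamic-content benchmarking:
# each emulated client alternates between a randomly chosen interaction and
# an exponentially distributed "think time", while response times are
# recorded. The base URL and interaction mix are placeholders, not the
# bookstore/auction/bulletin-board mixes of the paper.
import random
import statistics
import threading
import time
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"                      # hypothetical server
INTERACTIONS = ["/home", "/search?q=web", "/item/1"]    # hypothetical mix
response_times = []
lock = threading.Lock()

def emulated_client(requests_per_client=5, mean_think_time=1.0):
    for _ in range(requests_per_client):
        url = BASE_URL + random.choice(INTERACTIONS)
        start = time.time()
        try:
            with urlopen(url, timeout=10) as response:
                response.read()
        except OSError:
            continue                       # count only completed requests
        with lock:
            response_times.append(time.time() - start)
        time.sleep(random.expovariate(1.0 / mean_think_time))

if __name__ == "__main__":
    clients = [threading.Thread(target=emulated_client) for _ in range(4)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    if response_times:
        print(f"{len(response_times)} requests, "
              f"mean latency {statistics.mean(response_times):.3f}s")
```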

Proceedings ArticleDOI
07 May 2002
TL;DR: A scalable structuring mechanism facilitating the abstraction of security policies from large web-applications developed in heterogenous multi-platform environments is described and a tool which assists programmers develop secure applications which are resilient to a wide range of common attacks is presented.
Abstract: Application-level web security refers to vulnerabilities inherent in the code of a web-application itself (irrespective of the technologies in which it is implemented or the security of the web-server/back-end database on which it is built). In the last few months, application-level vulnerabilities have been exploited with serious consequences: hackers have tricked e-commerce sites into shipping goods for no charge, user-names and passwords have been harvested and confidential information (such as addresses and credit-card numbers) has been leaked. In this paper we investigate new tools and techniques which address the problem of application-level web security. We (i) describe a scalable structuring mechanism facilitating the abstraction of security policies from large web-applications developed in heterogenous multi-platform environments; (ii) present a tool which assists programmers in developing secure applications which are resilient to a wide range of common attacks; and (iii) report results and experience arising from our implementation of these techniques.
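To make the class of vulnerability concrete, the sketch below contrasts an injectable database query built by string concatenation with a parameterized one. This is background illustration of one well-known application-level attack, not the paper's policy-abstraction mechanism or tool; the schema and login flow are invented.

```python
# Illustration of one common application-level vulnerability (SQL
# injection) and its remedy via parameterized queries. The schema and
# login flow are invented; the paper's contribution is a policy-abstraction
# mechanism and checking tool, not this specific fix.
import sqlite3

def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return db

def login_vulnerable(db, name, password):
    # DANGEROUS: attacker-controlled input is spliced into the SQL text.
    query = (f"SELECT * FROM users WHERE name = '{name}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchone() is not None

def login_safe(db, name, password):
    # Placeholders keep user input as data, never as SQL syntax.
    query = "SELECT * FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (name, password)).fetchone() is not None

if __name__ == "__main__":
    db = setup()
    attack = ("alice", "' OR '1'='1")
    print(login_vulnerable(db, *attack))   # True: authentication bypassed
    print(login_safe(db, *attack))         # False: input treated as a literal
```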

Book ChapterDOI
02 Sep 2002
TL;DR: KAON - the Karlsruhe Ontology and Semantic Web Tool Suite - is introduced; it is specifically designed to provide the ontology and metadata infrastructure needed for building, using and accessing semantics-driven applications on the Web and on the desktop.
Abstract: The Semantic Web will bring structure to the content of Web pages, being an extension of the current Web, in which information is given a well-defined meaning. Especially within e-commerce applications, Semantic Web technologies in the form of ontologies and metadata are becoming increasingly prevalent and important. This paper introduces KAON - the Karlsruhe Ontology and Semantic Web Tool Suite. KAON is developed jointly within several EU-funded projects and specifically designed to provide the ontology and metadata infrastructure needed for building, using and accessing semantics-driven applications on the Web and on your desktop.

Proceedings ArticleDOI
20 Apr 2002
TL;DR: The new Cognitive Walkthrough for the Web (CWW) is superior for evaluating how well websites support users' navigation and information search tasks.
Abstract: This paper proposes a transformation of the Cognitive Walkthrough (CW), a theory-based usability inspection method that has proven useful in designing applications that support use by exploration. The new Cognitive Walkthrough for the Web (CWW) is superior for evaluating how well websites support users' navigation and information search tasks. The CWW uses Latent Semantic Analysis to objectively estimate the degree of semantic similarity (information scent) between representative user goal statements (100-200 words) and heading/link texts on each web page. Using an actual website, the paper shows how the CWW identifies three types of problems in web page designs. Three experiments test CWW predictions of users' success rates in accomplishing goals, verifying the value of CWW for identifying these usability problems.
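The similarity computation at the heart of CWW can be approximated as follows. This sketch substitutes a plain bag-of-words cosine for Latent Semantic Analysis, and the goal statement and link texts are invented, so it only conveys the shape of the information-scent estimate.

```python
# Rough stand-in for CWW's information-scent estimate: cosine similarity
# between a user goal statement and each heading/link text on a page.
# CWW uses Latent Semantic Analysis over a large corpus; this sketch uses
# a plain bag-of-words cosine, and the texts below are invented.
import math
import re
from collections import Counter

def vectorize(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

GOAL = ("Find out how to renew a library book online before it is due "
        "so you do not have to pay a late fine")
LINKS = ["Renew or return borrowed items", "Campus parking permits",
         "Library opening hours", "Pay fines and fees"]

if __name__ == "__main__":
    goal_vector = vectorize(GOAL)
    ranked = sorted(LINKS, reverse=True,
                    key=lambda link: cosine(goal_vector, vectorize(link)))
    for link in ranked:
        print(f"{cosine(goal_vector, vectorize(link)):.2f}  {link}")
```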