Showing papers on "Web modeling published in 2012"

PDF

Open Access

Journal Article•DOI•

Evolution of the world wide web : from web 1.0 to web 4.0

[...]

31 Jan 2012-International Journal of Web & Semantic Technology

TL;DR: The World Wide Web as the largest information construct has had much progress since its advent and this paper provides a background of the evolution of the web from web 1.0 to web 4.0.

...read moreread less

Abstract: The World Wide Web as the largest information construct has had much progress since its advent. This paper provides a background of the evolution of the web from web 1.0 to web 4.0. Web 1.0 as a web of information connections, Web 2.0 as a web of people connections, Web 3.0 as a web of knowledge connections and web 4.0 as a web of intelligence connections are described as four generations of the web in the paper.

...read moreread less

358 citations

Journal Article•DOI•

Crawling Ajax-Based Web Applications through Dynamic Analysis of User Interface State Changes

[...]

Ali Mesbah¹, Arie van Deursen², Stefan Lenselink²•Institutions (2)

University of British Columbia¹, Delft University of Technology²

01 Mar 2012-ACM Transactions on The Web

TL;DR: A novel technique for crawling Ajax-based applications through automatic dynamic analysis of user-interface-state changes in Web browsers, and incrementally infers a state machine that models the various navigational paths and states within an Ajax application.

...read moreread less

Abstract: Using JavaScript and dynamic DOM manipulation on the client side of Web applications is becoming a widespread approach for achieving rich interactivity and responsiveness in modern Web applications. At the same time, such techniques---collectively known as Ajax---shatter the concept of webpages with unique URLs, on which traditional Web crawlers are based. This article describes a novel technique for crawling Ajax-based applications through automatic dynamic analysis of user-interface-state changes in Web browsers. Our algorithm scans the DOM tree, spots candidate elements that are capable of changing the state, fires events on those candidate elements, and incrementally infers a state machine that models the various navigational paths and states within an Ajax application. This inferred model can be used in program comprehension and in analysis and testing of dynamic Web states, for instance, or for generating a static version of the application. In this article, we discuss our sequential and concurrent Ajax crawling algorithms. We present our open source tool called Crawljax, which implements the concepts and algorithms discussed in this article. Additionally, we report a number of empirical studies in which we apply our approach to a number of open-source and industrial Web applications and elaborate on the obtained results.

...read moreread less

338 citations

Proceedings Article•DOI•

Twitcident: fighting fire with information from social web streams

[...]

Fabian Abel¹, Claudia Hauff¹, Geert-Jan Houben¹, Richard Stronkman, Ke Tao¹ - Show less +1 more•Institutions (1)

Delft University of Technology¹

16 Apr 2012

TL;DR: Twitcident connects to emergency broadcasting services and automatically starts tracking and filtering information from Social Web streams (Twitter) when a new incident occurs and enriches the semantics of streamed Twitter messages to profile incidents and to continuously improve and adapt the information filtering to the current temporal context.

...read moreread less

Abstract: In this paper, we present Twitcident, a framework and Web-based system for filtering, searching and analyzing information about real-world incidents or crises. Twitcident connects to emergency broadcasting services and automatically starts tracking and filtering information from Social Web streams (Twitter) when a new incident occurs. It enriches the semantics of streamed Twitter messages to profile incidents and to continuously improve and adapt the information filtering to the current temporal context. Faceted search and analytical tools allow users to retrieve particular information fragments and overview and analyze the current situation as reported on the Social Web. Demo: http://wis.ewi.tudelft.nl/twitcident/

...read moreread less

220 citations

Book Chapter•DOI•

Understanding tables on the web

[...]

Jingjing Wang¹, Haixun Wang², Zhongyuan Wang², Kenny Q. Zhu³•Institutions (3)

University of Washington¹, Microsoft², Shanghai Jiao Tong University³

15 Oct 2012

TL;DR: This paper argues that in order for computers to understand HTML tables, computers must first have a brain --- a general purpose knowledge taxonomy that is comprehensive enough to cover the concepts of worldly facts in a human mind, and illustrates a two phase process.

...read moreread less

Abstract: The Web contains a wealth of information, and a key challenge is to make this information machine processable. In this paper, we study how to "understand" HTML tables on the Web, which is one step further from finding the schemas of tables. From 0.3 billion Web documents, we obtain 1.95 billion tables, and 0.5-1% of these contain information of various entities and their properties. We argue that in order for computers to understand these tables, computers must first have a brain --- a general purpose knowledge taxonomy that is comprehensive enough to cover the concepts (of worldly facts) in a human mind. Second, we argue that the process of understanding a table is the process of finding the right position for the table in the knowledge taxonomy. Once a table is associated with a concept in the knowledge taxonomy, it will be automatically linked to all other tables that are associated with the same concept, as well as tables associated with concepts related to this concept. In other words, understanding occurs when computers will understand the semantics of the tables through the interconnections of concepts in the knowledge base. In this paper, we illustrate a two phase process. Our experimental results show that the approach is feasible and it may benefit many useful applications such as web search.

...read moreread less

169 citations

Proceedings Article•DOI•

FlowFox: a web browser with flexible and precise information flow control

[...]

Willem De Groef¹, Dominique Devriese¹, Nick Nikiforakis¹, Frank Piessens¹•Institutions (1)

Katholieke Universiteit Leuven¹

16 Oct 2012

TL;DR: FlowFox is presented, the first fully functional web browser that implements a precise and general information flow control mechanism for web scripts based on the technique of secure multi-execution, and can support powerful, yet precise policies refining the same-origin-policy in a way that is compatible with existing websites.

...read moreread less

Abstract: We present FlowFox, the first fully functional web browser that implements a precise and general information flow control mechanism for web scripts based on the technique of secure multi-execution. We demonstrate how FlowFox subsumes many ad-hoc script containment countermeasures developed over the last years. We also show that FlowFox is compatible with the current web, by investigating its behavior on the Alexa top-500 web sites, many of which make intricate use of JavaScript.The performance and memory cost of FlowFox is substantial (a performance cost of around 20% on macro benchmarks for a simple two level policy), but not prohibitive. Our prototype implementation shows that information flow enforcement based on secure multi-execution can be implemented in full-scale browsers. It can support powerful, yet precise policies refining the same-origin-policy in a way that is compatible with existing websites.

...read moreread less

153 citations

Journal Article•DOI•

Challenges for the multilingual Web of Data

[...]

Jorge Gracia¹, Elena Montiel-Ponsoda¹, Philipp Cimiano², Asunción Gómez-Pérez¹, Paul Buitelaar³, John P. McCrae² - Show less +2 more•Institutions (3)

Technical University of Madrid¹, Citec², National University of Ireland, Galway³

01 Mar 2012-Journal of Web Semantics

TL;DR: A vision of a multilingual Web of Data is presented and the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this is discussed.

...read moreread less

146 citations

Patent•

Secure data container for web applications

[...]

Hong Li¹, Rita H. Wouhaybi¹, Tobias M. Kohlenberg¹, Alan D. Ross¹•Institutions (1)

Intel¹

27 Sep 2012

TL;DR: In this article, the authors propose a context-based security policy for detecting an attempt by the web content to access a local data store based on a user profile, a multi-user data source and a cloud service.

...read moreread less

Abstract: Systems and methods may provide for identifying web content and detecting an attempt by the web content to access a local data store. Additionally, a determination may be made as to whether to permit the attempt based on a context-based security policy. In one example, the context-based security policy is obtained from one or more of a user profile, a multi-user data source and a cloud service.

...read moreread less

145 citations

Proceedings Article•

Enemy of the state: a state-aware black-box web vulnerability scanner

[...]

Adam Doupé¹, Ludovico Cavedon¹, Christopher Kruegel¹, Giovanni Vigna¹•Institutions (1)

University of California, Santa Barbara¹

08 Aug 2012

TL;DR: It is shown that the state-aware black-box web vulnerability scanner is able to not only exercise more code of the web application, but also discover vulnerabilities that other vulnerability scanners miss.

...read moreread less

Abstract: Black-box web vulnerability scanners are a popular choice for finding security vulnerabilities in web applications in an automated fashion. These tools operate in a point-and-shootmanner, testing any web application-- regardless of the server-side language--for common security vulnerabilities. Unfortunately, black-box tools suffer from a number of limitations, particularly when interacting with complex applications that have multiple actions that can change the application's state. If a vulnerability analysis tool does not take into account changes in the web application's state, it might overlook vulnerabilities or completely miss entire portions of the web application. We propose a novel way of inferring the web application's internal state machine from the outside--that is, by navigating through the web application, observing differences in output, and incrementally producing a model representing the web application's state. We utilize the inferred state machine to drive a black-box web application vulnerability scanner. Our scanner traverses a web application's state machine to find and fuzz user-input vectors and discover security flaws. We implemented our technique in a prototype crawler and linked it to the fuzzing component from an open-source web vulnerability scanner. We show that our state-aware black-box web vulnerability scanner is able to not only exercise more code of the web application, but also discover vulnerabilities that other vulnerability scanners miss.

...read moreread less

141 citations

Journal Article•DOI•

A middleware framework for scalable management of linked streams

[...]

Danh Le-Phuoc¹, Hoan Quoc Nguyen-Mau¹, Josiane Xavier Parreira¹, Manfred Hauswirth¹•Institutions (1)

National University of Ireland, Galway¹

01 Nov 2012-Journal of Web Semantics

TL;DR: The Linked Stream Middleware is described, which makes it easy to integrate time-dependent data with other Linked Data sources, by enriching both sensor sources and sensor data streams with semantic descriptions, and enabling complex SPARQL-like queries across both dataset types through a novel query processing engine, along with means to mashup the data and process results.

...read moreread less

119 citations

Proceedings Article•DOI•

Race detection for web applications

[...]

Boris Petrov¹, Martin Vechev², Manu Sridharan³, Julian Dolby³•Institutions (3)

Sofia University¹, ETH Zurich², IBM³

11 Jun 2012

TL;DR: This work implemented WebRacer, the first dynamic race detector for web applications, implemented atop the production-quality WebKit engine, enabling testing of full-featured web sites and discovered many harmful races.

...read moreread less

Abstract: Modern web pages are becoming increasingly full-featured, and this additional functionality often requires greater use of asynchrony. Unfortunately, this asynchrony can trigger unexpected concurrency errors, even though web page scripts are executed sequentially.We present the first formulation of a happens-before relation for common web platform features. Developing this relation was a non-trivial task, due to complex feature interactions and browser differences. We also present a logical memory access model for web applications that abstracts away browser implementation details.Based on the above, we implemented WebRacer, the first dynamic race detector for web applications. WebRacer is implemented atop the production-quality WebKit engine, enabling testing of full-featured web sites. WebRacer can also simulate certain user actions, exposing more races.We evaluated WebRacer by testing a large set of Fortune 100 company web sites. We discovered many harmful races, and also gained insights into how developers handle asynchrony in practice.

...read moreread less

119 citations

Proceedings Article•DOI•

Paraimpu: a platform for a social web of things

[...]

Antonio Pintus¹, Davide Carboni¹, Andrea Piras¹•Institutions (1)

Center for Advanced Studies Research and Development in Sardinia¹

16 Apr 2012

TL;DR: This demo presents the prototype of a scalable architecture for a large scale social Web of Things for smart objects and services, named Paraimpu, a Web-based platform which allows to add, use, share and inter-connect real HTTP-enabledSmart objects and "virtual" things like services on the Web and social networks.

...read moreread less

Abstract: The Web of Things is a scenario where potentially billions of connected smart objects communicate using the Web protocols, HTTP in primis. A Web of Things envisioning and design has raised several research issues, from protocols adoption and communication models to architectural styles and social aspects facing. In this demo we present the prototype of a scalable architecture for a large scale social Web of Things for smart objects and services, named Paraimpu. It is a Web-based platform which allows to add, use, share and inter-connect real HTTP-enabled smart objects and "virtual" things like services on the Web and social networks. Paraimpu defines and uses few strong abstractions, in order to allow mash-ups of heterogeneous things introducing powerful rules for data adaptation. Adding and inter-connecting objects is supported through user friendly models and features.

...read moreread less

Journal Article•DOI•

Mining the Semantic Web

[...]

Achim Rettinger¹, Uta Lösch¹, Volker Tresp², Claudia d'Amato³, Nicola Fanizzi³ - Show less +1 more•Institutions (3)

Karlsruhe Institute of Technology¹, Siemens², University of Bari³

01 May 2012-Data Mining and Knowledge Discovery

TL;DR: It is argued that machine learning research has to offer a wide variety of methods applicable to different expressivity levels ofSemantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web.

...read moreread less

Abstract: In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logic methods have not yet proven to be very effective for several reasons: First, there still is the unsolved problem of scalability of reasoning to Web scale. Second, logical reasoning has problems with uncertain information, which is abundant on Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. If moving towards the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since inductive methods can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), like in ontology learning, given the problems mentioned above, we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference on their standard representations. We argue that machine learning research has to offer a wide variety of methods applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic Web representations. Finally we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exiting new research and application area for data mining.

...read moreread less

Journal Article•DOI•

Exploiting web scraping in a collaborative filtering- based approach to web advertising

[...]

Eloisa Vargiu, Mirko Urru¹•Institutions (1)

University of Cagliari¹

05 Dec 2012-Artificial Intelligence Review

TL;DR: This paper proposes a collaborative filtering-based Web advertising system aimed at finding the most relevant ads for a generic Web page by exploiting Web scraping techniques.

...read moreread less

Abstract: Web scraping is the set of techniques used to automatically get some information from a website instead of manually copying it. The goal of a Web scraper is to look for certain kinds of information, extract, and aggregate it into new Web pages. In particular, scrapers are focused on transforming unstructured data and save them in structured databases. In this paper, among others kind of scraping, we focus on those techniques that extract the content of a Web page. In particular, we adopt scraping techniques in the Web advertising field. To this end, we propose a collaborative filtering-based Web advertising system aimed at finding the most relevant ads for a generic Web page by exploiting Web scraping. To illustrate how the system works in practice, a case study is presented.

...read moreread less

Proceedings Article•DOI•

Linked education: interlinking educational resources and the Web of data

[...]

Stefan Dietze, Honq Qing Yu¹, Daniela Giordano², Eleni Kaldoudi³, Nikolas Dovrolis³, Davide Taibi - Show less +2 more•Institutions (3)

Open University¹, University of Catania², Democritus University of Thrace³

26 Mar 2012

TL;DR: This paper describes a general approach to exploit the wealth of already existing TEL data on the Web by allowing its exposure as Linked Data and by taking into account automated enrichment and interlinking techniques to provide rich and well-interlinked data for the educational domain.

...read moreread less

Abstract: Research on interoperability of technology-enhanced learning (TEL) repositories throughout the last decade has led to a fragmented landscape of competing approaches, such as metadata schemas and interface mechanisms. However, so far Web-scale integration of resources is not facilitated, mainly due to the lack of take-up of shared principles, datasets and schemas. On the other hand, the Linked Data approach has emerged as the de-facto standard for sharing data on the Web and offers a large potential to solve interoperability issues in the field of TEL. In this paper, we describe a general approach to exploit the wealth of already existing TEL data on the Web by allowing its exposure as Linked Data and by taking into account automated enrichment and interlinking techniques to provide rich and well-interlinked data for the educational domain. This approach has been implemented in the context of the mEducator project where data from a number of open TEL data repositories has been integrated, exposed and enriched by following Linked Data principles.

...read moreread less

Journal Article•DOI•

The Geoprocessing Web

[...]

Peisheng Zhao¹, Theodor Foerster², Peng Yue³•Institutions (3)

George Mason University¹, University of Münster², Wuhan University³

01 Oct 2012-Computers & Geosciences

TL;DR: A comprehensive overview about the state-of-the-art architecture and technologies, and the most recent developments in the Geoprocessing Web is provided.

...read moreread less

DOI•

Context-based recommendation to support problem solving in software development

[...]

Joel Cordeiro¹, Bruno Antunes¹, Paulo Gomes¹•Institutions (1)

University of Coimbra¹

04 Jun 2012

TL;DR: This work developed a tool that integrates recommendation of question/answer web resources in Eclipse, according to the context of these exception stack traces, and shows that the approach performs better than a simple keyword-based approach.

...read moreread less

Abstract: During the software development process, developers are often faced with problem solving situations. For instance, it is common the occurrence of exceptions, that originate stack traces in the Console View of the IDE. These situations motivate the developer to use the Web to search for information. However, there is a gap between the IDE and the Web, requiring developers to spend significant time searching for relevant information and navigating through web pages in a Web browser. We propose to process the information of exception stack traces and retrieve question-answering web resources to help developers. We developed a tool that integrates recommendation of question/answer web resources in Eclipse, according to the context of these exception stack traces. The results of a preliminary experimentation are promising, showing that our approach performs better than a simple keyword-based approach.

...read moreread less

Proceedings Article•DOI•

Analyzing the Evolution of Web Services Using Fine-Grained Changes

[...]

Daniele Romano¹, Martin Pinzger¹•Institutions (1)

Delft University of Technology¹

24 Jun 2012

TL;DR: WSDLDiff as mentioned in this paper is a tool that extracts fine-grained changes from subsequent versions of a web service interface defined in WSDL to highlight the most frequent types of changes affecting a web services.

...read moreread less

Abstract: In the service-oriented paradigm web service interfaces are considered contracts between web service subscribers and providers. However, these interfaces are continuously evolving over time to satisfy changes in the requirements and to fix bugs. Changes in a web service interface typically affect the systems of its subscribers. Therefore, it is essential for subscribers to recognize which types of changes occur in a web service interface in order to analyze the impact on his/her systems. In this paper we propose a tool called WSDLDiff to extract fine-grained changes from subsequent versions of a web service interface defined in WSDL. In contrast to existing approaches, WSDLDiff takes into account the syntax of WSDL and extracts the WSDL elements affected by changes and the types of changes. With WSDLDiff we performed a study aimed at analyzing the evolution of web services using the fine-grained changes extracted from the subsequent versions of four real world WSDL interfaces. The results of our study show that the analysis of the fine-grained changes helps web service subscribers to highlight the most frequent types of changes affecting a WSDL interface. This information can be relevant for web service subscribers who want to assess the risk associated to the usage of web services and to subscribe to the most stable ones.

...read moreread less

Journal Article•DOI•

Web Content Mining Techniques: A Survey

[...]

Faustina Johnson, Santosh Kumar Gupta

30 Jun 2012-International Journal of Computer Applications

TL;DR: This survey focuses on how to apply content mining on the web contains structured, unstructured, semi structured and multimedia data and how web content mining can be utilized in web usage mining.

...read moreread less

Abstract: The Quest for knowledge has led to new discoveries and inventions. With the emergence of World Wide Web, it became a hub for all these discoveries and inventions. Web browsers became a tool to make the information available at our finger tips. As years passed World Wide Web became overloaded with information and it became hard to retrieve data according to the need. Web mining came as a rescue for the above problem. Web content mining is a subdivision under web mining. This paper deals with a study of different techniques and pattern of content mining and the areas which has been influenced by content mining. The web contains structured, unstructured, semi structured and multimedia data. This survey focuses on how to apply content mining on the above data. It also points out how web content mining can be utilized in web usage mining.

...read moreread less

Proceedings Article•DOI•

AWSR: Active Web Service Recommendation Based on Usage History

[...]

Guosheng Kang¹, Jianxun Liu¹, Mingdong Tang¹, Xiaoqing Liu², Buqing Cao¹, Yu Xu¹ - Show less +2 more•Institutions (2)

Hunan University of Science and Technology¹, Missouri University of Science and Technology²

24 Jun 2012

TL;DR: By conducting large-scale experiments based on a real-world Web services dataset, it is shown that the AWSR system effectively recommends Web services based on users functional interests and non-functional requirements with excellent performance.

...read moreread less

Abstract: Web services are very prevalent nowadays. Recommending Web services that users are interested in becomes an interesting and challenging research problem. In this paper, we present AWSR (Active Web Service Recommendation), an effective Web service recommendation system based on users' usage history to actively recommend Web services to users. AWSR extracts user's functional interests and QoS preferences from his/her usage history. Similarity between user's functional interests and a candidate Web service is calculated first. A hybrid new metric of similarity is developed to combine functional similarity measurement and nonfunctional similarity measurement based on comprehensive QoS of Web services. The AWSR ranks publicly available Web services based on values of the hybrid metric of similarity, so that a Top-K Web service recommendation list is created for a user. AWSR has been implemented and deployed on the Web. By conducting large-scale experiments based on a real-world Web services dataset, it is shown that our system effectively recommends Web services based on users functional interests and non-functional requirements with excellent performance.

...read moreread less

Book Chapter•DOI•

ClowdFlows: a cloud based scientific workflow platform

[...]

Janez Kranjc¹, Vid Podpečan¹, Nada Lavrač²•Institutions (2)

Jožef Stefan Institute¹, University of Nova Gorica²

24 Sep 2012

TL;DR: ClowdFlows is an open cloud based platform for composition, execution, and sharing of interactive data mining workflows based on the principles of service-oriented knowledge discovery, and features interactive scientific workflows.

...read moreread less

Abstract: This paper presents an open cloud based platform for composition, execution, and sharing of interactive data mining workflows It is based on the principles of service-oriented knowledge discovery, and features interactive scientific workflows In contrast to comparable data mining platforms, our platform runs in all major Web browsers and platforms, including mobile devices In terms of crowdsourcing, ClowdFlows provides researchers with an easy way to expose and share their work and results, as only an Internet connection and a Web browser are required to access the workflows from anywhere Practitioners can use ClowdFlows to seamlessly integrate and join different implementations of algorithms, tools and Web services into a coherent workflow that can be executed in a cloud based application ClowdFlows is also easily extensible during run-time by importing Web services and using them as new workflow components

...read moreread less

Journal Article•DOI•

Have things changed now? An empirical study on input validation vulnerabilities in web applications

[...]

Theodoor Scholte, Davide Balzarotti¹, Engin Kirda²•Institutions (2)

Institut Eurécom¹, Northeastern University²

01 May 2012-Computers & Security

TL;DR: An empirical analysis of a large number of web vulnerability reports is performed with the aim of understanding how input validation flaws have evolved in the last decade, and suggests that many web problems are still simple in nature.

...read moreread less

Journal Article•DOI•

A hybrid approach for personalized recommendation of news on the Web

[...]

Hao Wen¹, Liping Fang¹, Ling Guan¹•Institutions (1)

Ryerson University¹

01 Apr 2012-Expert Systems With Applications

TL;DR: A hybrid method for personalized recommendation of news on the Web is presented, which provides Web users with an autonomous tool that is able to minimize repetitive and tedious Web surfing.

...read moreread less

Abstract: A hybrid method for personalized recommendation of news on the Web is presented, which provides Web users with an autonomous tool that is able to minimize repetitive and tedious Web surfing. The proposed approach classifies Web pages by calculating the respective weights of terms. A user's interest and preference models are generated by analyzing the user's navigational history. Based on the content of the Web pages and on a user's interest and preference models, the recommender system suggests news Web pages to the user who is likely interested in the related topics. Moreover, the technique of collaborative filtering, which aims to choose the trusted users, is employed to improve the performance of the recommender system. Experiments are carried out in order to demonstrate the effectiveness of the proposed method. In the experiments, Web news items are classified and recommended to Web users by matching the users' interests with the contents of the news.

...read moreread less

Proceedings Article•DOI•

Characterizing web content, user interests, and search behavior by reading level and topic

[...]

Jinyoung Kim¹, Kevyn Collins-Thompson², Paul N. Bennett², Susan T. Dumais²•Institutions (2)

University of Massachusetts Amherst¹, Microsoft²

08 Feb 2012

TL;DR: This work applies automatic text classifiers to estimate a novel type of profile for important entities in Web search -- users, websites, and queries, and finds that reading level and topic distributions provide an important new representation of Web content and user interests, and that using both together is more effective than using either one separately.

...read moreread less

Abstract: A user's expertise or ability to understand a document on a given topic is an important aspect of that document's relevance. However, this aspect has not been well-explored in information retrieval systems, especially those at Web scale where the great diversity of content, users, and tasks presents an especially challenging search problem. To help improve our modeling and understanding of this diversity, we apply automatic text classifiers, based on reading difficulty and topic prediction, to estimate a novel type of profile for important entities in Web search -- users, websites, and queries. These profiles capture topic and reading level distributions, which we then use in conjunction with search log data to characterize and compare different entities.We find that reading level and topic distributions provide an important new representation of Web content and user interests, and that using both together is more effective than using either one separately. In particular we find that: 1) the reading level of Web content and the diversity of visitors to a website can vary greatly by topic; 2) the degree to which a user's profile matches with a site's profile is closely correlated with the user's preference of the website in search results, and 3) site or URL profiles can be used to predict 'expertness' whether a given site or URL is oriented toward expert vs. non-expert users. Our findings provide strong evidence in favor of jointly incorporating reading level and topic distribution metadata into a variety of critical tasks in Web information systems.

...read moreread less

Journal Article•DOI•

Web evolution and Web Science

[...]

Wendy Hall¹, Thanassis Tiropanis¹•Institutions (1)

University of Southampton¹

01 Dec 2012-Computer Networks

TL;DR: The lessons from this retrospective examination of the evolution of the Web are outlined, the main outcomes of Web Science activities are presented and directions along which future developments could be anticipated are discussed.

...read moreread less

Proceedings Article•DOI•

Combining social web and BPM for improving enterprise performances: the BPM4People approach to social BPM

[...]

Marco Brambilla¹, Piero Fraternali, Carmen Karina Vaca Ruiz¹•Institutions (1)

Polytechnic University of Milan¹

16 Apr 2012

TL;DR: This project-centered demonstration paper proposes a model-driven approach to participatory and social enactment of business processes by defining a specific notation for describing Social BPM behaviors and a methodology that allows enterprises to implement of social processes as Web applications integrated with public or private Web social networks.

...read moreread less

Abstract: Social BPM fuses business process management practices with social networking applications, with the aim of enhancing the enterprise performance by means of a controlled participation of external stakeholders to process design and enactment. This project-centered demonstration paper proposes a model-driven approach to participatory and social enactment of business processes. The approach consists of defining a specific notation for describing Social BPM behaviors (defined as a BPMN 2.0 extension), a methodology, and a technical framework that allows enterprises to implement of social processes as Web applications integrated with public or private Web social networks. The presented work is performed within the BPM4People SME Capacities project

...read moreread less

Book Chapter•DOI•

SPARQL for a web of linked data: semantics and computability

[...]

Olaf Hartig¹•Institutions (1)

Humboldt University of Berlin¹

27 May 2012

TL;DR: In this paper, the applicability of SPARQL as a query language for Linked Data on the Web is investigated, where the scope of a query is the complete set of linked data on the web and a family of reachability-based semantics which restrict the scope to data that is reachable by traversing certain data links.

...read moreread less

Abstract: The World Wide Web currently evolves into a Web of Linked Data where content providers publish and link data as they have done with hypertext for the last 20 years. While the declarative query language SPARQL is the de facto for querying a-priory defined sets of data from the Web, no language exists for querying the Web of Linked Data itself. However, it seems natural to ask whether SPARQL is also suitable for such a purpose In this paper we formally investigate the applicability of SPARQL as a query language for Linked Data on the Web. In particular, we study two query models: 1) a full-Web semantics where the scope of a query is the complete set of Linked Data on the Web and 2) a family of reachability-based semantics which restrict the scope to data that is reachable by traversing certain data links. For both models we discuss properties such as monotonicity and computability as well as the implications of querying aWeb that is infinitely large due to data generating servers.

...read moreread less

Proceedings Article•DOI•

Functional descriptions as the bridge between hypermedia APIs and the Semantic Web

[...]

Ruben Verborgh¹, Thomas Steiner², Davy Van Deursen¹, Sam Coppens¹, Joaquim Gabarró Vallès², Rik Van de Walle¹ - Show less +2 more•Institutions (2)

Ghent University¹, Polytechnic University of Catalonia²

17 Apr 2012

TL;DR: This paper explains why capturing functionality is the connection between those three building blocks, and introduces the functional API description format RESTdesc that creates this bridge between hypermedia APIs and the Semantic Web.

...read moreread less

Abstract: The early visions for the Semantic Web, from the famous 2001 Scientific American article by Berners-Lee et al., feature intelligent agents that can autonomously perform tasks like discovering information, scheduling events, finding execution plans for complex operations, and in general, use reasoning techniques to come up with sense-making and traceable decisions. While today-more than ten years later-the building blocks (1) resource-oriented rest infrastructure, (2) Web APIs, and (3) Linked Data are in place, the envisioned intelligent agents have not landed yet. In this paper, we explain why capturing functionality is the connection between those three building blocks, and introduce the functional API description format RESTdesc that creates this bridge between hypermedia APIs and the Semantic Web. Rather than adding yet another component to the Semantic Web stack, RESTdesc offers instead concise descriptions that reuse existing vocabularies to guide hypermedia-driven agents. Its versatile capabilities are illustrated by a real-life agent use case for Web browsers wherein we demonstrate that RESTdesc functional descriptions are capable of fulfilling the promise of autonomous agents on the Web.

...read moreread less

Proceedings Article•DOI•

Non-tracking web analytics

[...]

Istemi Ekin Akkus¹, Ruichuan Chen¹, Michaela Hardt², Paul Francis¹, Johannes Gehrke³ - Show less +1 more•Institutions (3)

Max Planck Society¹, Twitter², Cornell University³

16 Oct 2012

TL;DR: This paper presents the first design of a system that provides web analytics without tracking, which gives users differential privacy guarantees, can provide better quality analytics than current services, requires no new organizational players, and is practical to deploy.

...read moreread less

Abstract: Today, websites commonly use third party web analytics services t obtain aggregate information about users that visit their sites. This information includes demographics and visits to other sites as well as user behavior within their own sites. Unfortunately, to obtain this aggregate information, web analytics services track individual user browsing behavior across the web. This violation of user privacy has been strongly criticized, resulting in tools that block such tracking as well as anti-tracking legislation and standards such as Do-Not-Track. These efforts, while improving user privacy, degrade the quality of web analytics. This paper presents the first design of a system that provides web analytics without tracking. The system gives users differential privacy guarantees, can provide better quality analytics than current services, requires no new organizational players, and is practical to deploy. This paper describes and analyzes the design, gives performance benchmarks, and presents our implementation and deployment across several hundred users.

...read moreread less

Proceedings Article•DOI•

Penetration Testing Tool for Web Services Security

[...]

Christian Mainka¹, Juraj Somorovsky¹, Jörg Schwenk¹•Institutions (1)

Ruhr University Bochum¹

24 Jun 2012

TL;DR: An overview of the design decisions and evaluation of four Web Services frameworks and their resistance against WS-Addressing spoofing and SOAPAction spoofing attacks is given.

...read moreread less

Abstract: XML-based SOAP Web Services are a widely used technology, which allows the users to execute remote operations and transport arbitrary data. It is currently adapted in Service Oriented Architectures, cloud interfaces, management of federated identities, eGovernment, or millitary services. The wide adoption of this technology has resulted in an emergence of numerous -- mostly complex -- extension specifications. Naturally, this has been followed by a rise in large number of Web Services attacks. They range from specific Denial of Service attacks to attacks breaking interfaces of cloud providers or confidentiality of encrypted messages. By implementing common web applications, the developers evaluate the security of their systems by applying different penetration testing tools. However, in comparison to the well-known attacks as SQL injection or Cross Site Scripting, there exist no penetration testing tools for Web Services specific attacks. This was the motivation for developing the first automated penetration testing tool for Web Services called WS-Attacker. In this paper we give an overview of our design decisions and provide evaluation of four Web Services frameworks and their resistance against WS-Addressing spoofing and SOAPAction spoofing attacks. %WS-Attacker was built with respect to its future extensions with further attacks in order to provide an all-in-one security checking interface.

...read moreread less

Proceedings Article•DOI•

Evolution of Mobile Software Development from Platform-Specific to Web-Based Multiplatform Paradigm

[...]

Luis Corral¹, Alberto Sillitti¹, Giancarlo Succi¹, Alessandro Garibbo, Paolo Ramella - Show less +1 more•Institutions (1)

Free University of Bozen-Bolzano¹

01 Mar 2012

TL;DR: A projection on the trend of using web technologies for creating end-user applications in mobile devices to offer an integral native solution that allows to simplify the soft-ware process and broad its scope to a true, single cross-platform development effort.

...read moreread less

Abstract: In this paper, we outline a projection on the trend of using web technologies for creating end-user applications in mobile devices. Following a paradigm shift in the software industry, from only-binary applications to dynamic web applications, mobile web development tools evolve to offer an integral native solution that allows to simplify the soft-ware process and broad its scope to a true, single cross-platform development effort. Soon, mobile web development tools will be preferred by designers and programmers thanks to their versatility, economy and usefulness, less dependent on specific platforms and SDKs, while fully functional and reliable in comparison to their binary counterparts.

...read moreread less

Collapse