
Showing papers on "Semantic Web Stack published in 2010"


Proceedings Article
03 Nov 2010
TL;DR: SenticNet is a publicly available resource for opinion mining built by exploiting AI and Semantic Web techniques; it uses dimensionality reduction to infer the polarity of common-sense concepts and hence provides a public resource for mining opinions from natural language text at a semantic, rather than just syntactic, level.
Abstract: Today millions of web-users express their opinions about many topics through blogs, wikis, fora, chats and social networks. For sectors such as e-commerce and e-tourism, it is very useful to automatically analyze the huge amount of social information available on the Web, but the extremely unstructured nature of this content makes it a difficult task. SenticNet is a publicly available resource for opinion mining built by exploiting AI and Semantic Web techniques. It uses dimensionality reduction to infer the polarity of common-sense concepts and hence provides a public resource for mining opinions from natural language text at a semantic, rather than just syntactic, level.
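
A rough, self-contained illustration of the dimensionality-reduction idea (not SenticNet's actual pipeline; the concepts, affective features, and scores below are invented): factor a concept-by-feature matrix with truncated SVD and read each concept's coordinate on the dominant latent axis as a polarity estimate.

```python
# Illustrative sketch only: infer concept polarity by reducing a
# concept-by-affective-feature matrix with truncated SVD and reading
# each concept's coordinate on the first latent axis as its polarity.
import numpy as np

concepts = ["birthday_party", "food_poisoning", "sunny_day", "traffic_jam"]
features = ["pleasant", "exciting", "unpleasant", "stressful"]

# Hypothetical concept/feature association scores.
M = np.array([
    [0.9, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.9, 0.6],
    [0.8, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.7, 0.9],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                        # keep only the top-k latent dimensions
M_k = U[:, :k] * s[:k]       # concepts projected into the latent space

# Orient the first axis so that "pleasant" loads positively on it.
sign = np.sign(Vt[0, features.index("pleasant")])
for concept, coords in zip(concepts, M_k):
    print(f"{concept}: polarity ~ {sign * coords[0]:+.2f}")
```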

285 citations


Journal ArticleDOI
TL;DR: A novel vision-based approach that is Web-page-programming-language-independent is proposed; it primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction.
Abstract: Deep Web contents are accessed by queries submitted to Web databases, and the returned data records are enwrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. As Web pages are a popular two-dimensional medium, their contents are always displayed regularly for users to browse. This motivates us to seek a different way for deep Web data extraction that overcomes the limitations of previous works by utilizing some interesting common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language-independent is proposed. This approach primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.
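
The geometric intuition can be sketched in a few lines (an illustration of alignment-based grouping only, not the paper's algorithm; the coordinates are invented): blocks that share a left edge and have similar heights on the rendered page are grouped as repeated data records.

```python
# Minimal sketch of the layout-based intuition: cluster rendered page
# blocks into data records purely from visual geometry -- blocks that
# share a left edge and have similar heights are taken to be repeated
# records in a result list.
from dataclasses import dataclass

@dataclass
class Block:
    left: int      # x coordinate of the rendered bounding box
    top: int
    width: int
    height: int
    text: str

def group_records(blocks, x_tol=5, h_tol=15):
    """Group blocks whose left edges align (within x_tol pixels) and
    whose heights are similar (within h_tol pixels)."""
    groups = []
    for b in sorted(blocks, key=lambda b: b.top):
        for g in groups:
            ref = g[0]
            if abs(b.left - ref.left) <= x_tol and abs(b.height - ref.height) <= h_tol:
                g.append(b)
                break
        else:
            groups.append([b])
    # Treat the largest aligned group as the records; the rest is chrome.
    return max(groups, key=len)

blocks = [
    Block(40, 120, 600, 80, "result 1 ..."),
    Block(40, 210, 600, 85, "result 2 ..."),
    Block(40, 305, 600, 78, "result 3 ..."),
    Block(700, 120, 200, 400, "ads sidebar"),
]
for rec in group_records(blocks):
    print(rec.text)
```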

278 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: This work proposes a formal model of one specific semantic search task: ad-hoc object retrieval and shows that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines.
Abstract: Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
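
Since the paper connects ad-hoc object retrieval to document retrieval and discusses evaluation metrics, a standard graded-relevance metric such as nDCG applies; a minimal sketch with invented relevance judgments:

```python
# nDCG over a ranked list of graded relevance judgments
# (2 = exact entity match, 1 = related, 0 = irrelevant).
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, k=None):
    gains = ranked_gains[:k] if k else ranked_gains
    ideal = sorted(ranked_gains, reverse=True)[:k] if k else sorted(ranked_gains, reverse=True)
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Hypothetical judgments for a ranked list returned for an entity query.
print(ndcg([2, 0, 1, 2, 0], k=5))   # ~0.89
```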

228 citations


Book ChapterDOI
07 Nov 2010
TL;DR: Dbrec, a music recommendation system built on top of DBpedia, offering recommendations for more than 39,000 bands and solo artists, is described, providing relevant insights for people developing applications consuming Linked Data.
Abstract: This paper describes the theoretical background and the implementation of dbrec, a music recommendation system built on top of DBpedia, offering recommendations for more than 39,000 bands and solo artists. We discuss the various challenges and lessons learnt while building it, providing relevant insights for people developing applications consuming Linked Data. Furthermore, we provide a user-centric evaluation of the system, notably by comparing it to last.fm.

217 citations


Journal ArticleDOI
TL;DR: Instead of developing new semantically enabled services from scratch, this work proposes to create profiles of existing services that implement a transparent mapping between the OGC and the Semantic Web world, and points out how to combine SDI with linked data.
Abstract: Building on abstract reference models, the Open Geospatial Consortium (OGC) has established standards for storing, discovering, and processing geographical information. These standards act as a basis for the implementation of specific services and Spatial Data Infrastructures (SDI). Research on geo-semantics plays an increasing role to support complex queries and retrieval across heterogeneous information sources, as well as for service orchestration, semantic translation, and on-the-fly integration. So far, this research targets individual solutions or focuses on the Semantic Web, leaving the integration into SDI aside. What is missing is a shared and transparent Semantic Enablement Layer for SDI which also integrates reasoning services known from the Semantic Web. Instead of developing new semantically enabled services from scratch, we propose to create profiles of existing services that implement a transparent mapping between the OGC and the Semantic Web world. Finally, we point out how to combine SDI with linked data.

177 citations


Journal ArticleDOI
TL;DR: This work presents Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space in which large scale semantic Web indexing, logic reasoning, data aggregation heuristics, ad-hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions.

177 citations


Proceedings Article
23 Mar 2010
TL;DR: It is argued that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision and directions for research to remedy the situation are given.
Abstract: In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked triple collection, it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.

176 citations


Journal ArticleDOI
TL;DR: A review of the area of Web-based simulation, exploring the advantages and disadvantages of WBS over classical simulation systems, a classification of different sub- and related-areas of WBS, an exploration of technologies that enable WBS, and the evolution of the Web in terms of its relationship to WBS are given.

175 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: Sig.ma uses a holistic approach in which large scale semantic web indexing, logic reasoning, data aggregation heuristics, ad hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions.
Abstract: We demonstrate Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space. Sig.ma uses a holistic approach in which large scale semantic web indexing, logic reasoning, data aggregation heuristics, ad hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions. These consolidated entity descriptions then form the base for embeddable data mashups, machine oriented services as well as data browsing services. Finally, we discuss Sig.ma's peculiar characteristics and report on lessons learned and ideas it inspires.

130 citations


Book ChapterDOI
30 May 2010
TL;DR: This work presents the TrOWL infrastructure for transforming, reasoning, and querying OWL2 ontologies which uses novel techniques such as Quality Guaranteed Approximations and Forgetting to achieve this goal.
Abstract: The Semantic Web movement has led to the publication of thousands of ontologies online. These ontologies present and mediate information and knowledge on the Semantic Web. Tools exist to reason over these ontologies and to answer queries over them, but there are no large scale infrastructures for storing, reasoning, and querying ontologies on a scale that would be useful for a large enterprise or research institution. We present the TrOWL infrastructure for transforming, reasoning, and querying OWL2 ontologies which uses novel techniques such as Quality Guaranteed Approximations and Forgetting to achieve this goal.

124 citations


Book ChapterDOI
30 May 2010
TL;DR: Main contributions compared to previous and related work are data aggregations on several dimensions, a graph visualization that displays and connects relationships even between more than two given objects, and an advanced implementation that is highly configurable and applicable to arbitrary RDF datasets.
Abstract: This paper presents an approach for the interactive discovery of relationships between selected elements via the Semantic Web. It emphasizes the human aspect of relationship discovery by offering sophisticated interaction support. Selected elements are first semi-automatically mapped to unique objects of Semantic Web datasets. These datasets are then crawled for relationships which are presented in detail and overview. Interactive features and visual clues allow for a sophisticated exploration of the found relationships. The general process is described and the RelFinder tool as a concrete implementation and proof-of-concept is presented and evaluated in a user study. The application potentials are illustrated by a scenario that uses the RelFinder and DBpedia to assist a business analyst in decision-making. Main contributions compared to previous and related work are data aggregations on several dimensions, a graph visualization that displays and connects relationships even between more than two given objects, and an advanced implementation that is highly configurable and applicable to arbitrary RDF datasets.
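
A toy sketch of the underlying relationship-discovery step (plain breadth-first search over a handful of invented triples, not RelFinder's actual crawler): paths connecting two selected objects are enumerated over an undirected view of the RDF graph.

```python
# Toy relationship discovery: BFS over an RDF-like graph (a plain dict
# of labeled edges) to find connecting paths between two objects.
from collections import deque

# Hypothetical (subject, predicate, object) triples.
triples = [
    ("Innsbruck", "locatedIn", "Austria"),
    ("Leopold_Franzens_University", "locatedIn", "Innsbruck"),
    ("Vienna", "capitalOf", "Austria"),
]

# Undirected adjacency list so paths may traverse edges both ways.
adj = {}
for s, p, o in triples:
    adj.setdefault(s, []).append((p, o))
    adj.setdefault(o, []).append((f"{p}^-1", s))

def find_paths(start, goal, max_len=3):
    paths, queue = [], deque([[(None, start)]])
    while queue:
        path = queue.popleft()
        node = path[-1][1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) > max_len:
            continue
        visited = {n for _, n in path}
        for pred, nxt in adj.get(node, []):
            if nxt not in visited:
                queue.append(path + [(pred, nxt)])
    return paths

for path in find_paths("Leopold_Franzens_University", "Vienna"):
    print(" -> ".join(f"{p}:{n}" if p else n for p, n in path))
```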

Proceedings Article
23 Mar 2010
TL;DR: This paper demonstrates how to measure semantic distance on Linked Data in order to identify relatedness between resources, and how such measures can be used to provide a new kind of self-explanatory recommendations.
Abstract: A frequent topic discussed in the Linked Data community, especially when trying to outreach its values, is "What can we do with all this data?". In this paper, we demonstrate (1) how to measure semantic distance on Linked Data in order to identify relatedness between resources, and (2) how such measures can be used to provide a new kind of self-explanatory recommendations, bringing together Linked Data and Artificial Intelligence principles, and demonstrating how intelligent agents could emerge in the realm of Linked Data.
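
A hedged sketch of the general idea, using a simplified distance that is not Passant's exact LDSD formula: resources sharing more direct and indirect links count as closer, and the shared links double as the explanation behind a recommendation.

```python
# Simplified variant (not the paper's exact formula): resources are
# closer the more direct links and shared outgoing (predicate, object)
# links they have; the shared links serve as the explanation.
def semantic_distance(a, b, triples):
    direct = sum(1 for s, p, o in triples if (s, o) in ((a, b), (b, a)))
    out_a = {(p, o) for s, p, o in triples if s == a}
    out_b = {(p, o) for s, p, o in triples if s == b}
    shared = out_a & out_b
    return 1.0 / (1.0 + direct + len(shared)), shared

triples = [
    ("Ramones", "genre", "Punk_rock"),
    ("The_Clash", "genre", "Punk_rock"),
    ("Ramones", "origin", "New_York"),
]
dist, why = semantic_distance("Ramones", "The_Clash", triples)
print(dist)   # 0.5 -- one shared (genre, Punk_rock) link
print(why)    # the shared link doubles as a human-readable explanation
```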

Book ChapterDOI
03 May 2010
TL;DR: This paper evaluates how the state of the art in data quality research fits the characteristics of the Web of Data, and describes how the SPARQL query language and the SPARQL Inferencing Notation can be utilized to identify data quality problems in Semantic Web data automatically, within the Semantic Web technology stack.
Abstract: The quality of data is a key factor that determines the performance of information systems, in particular with regard (1) to the amount of exceptions in the execution of business processes and (2) to the quality of decisions based on the output of the respective information system. Recently, the Semantic Web and Linked Data activities have started to provide substantial data resources that may be used for real business operations. Hence, it will soon be critical to manage the quality of such data. Unfortunately, we can observe a wide range of data quality problems in Semantic Web data. In this paper, we (1) evaluate how the state of the art in data quality research fits the characteristics of the Web of Data, (2) describe how the SPARQL query language and the SPARQL Inferencing Notation (SPIN) can be utilized to identify data quality problems in Semantic Web data automatically, staying within the Semantic Web technology stack, and (3) evaluate our approach.
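
A minimal sketch of the idea using rdflib and invented example data: a data quality rule is phrased as a SPARQL query whose solutions are exactly the violations (here, persons lacking a foaf:name).

```python
# A data quality rule as a SPARQL query over Semantic Web data:
# every solution is a violation. Requires: pip install rdflib.
from rdflib import Graph, Namespace, Literal, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.bob, RDF.type, FOAF.Person))      # bob has no name: a violation

violations = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person WHERE {
        ?person a foaf:Person .
        FILTER NOT EXISTS { ?person foaf:name ?name }
    }
""")
for row in violations:
    print("missing foaf:name:", row.person)
```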

Journal ArticleDOI
TL;DR: The Web service matchmaking algorithm extends object-based matching techniques used in structural case-based reasoning, allowing the retrieval of Web services based not only on subsumption relationships but also on the structural information of OWL ontologies, and exploiting the classification of Web services in profile taxonomies to perform domain-dependent discovery.
Abstract: In this paper, we describe and evaluate a Web service discovery framework using OWL-S advertisements, combined with the distinction between service and Web service of the WSMO discovery framework. More specifically, we follow the Web service discovery model, which is based on abstract and lightweight semantic Web service descriptions, using the service profile ontology of OWL-S. Our goal is to quickly determine an initial set of candidate Web services for a specific request. This set can then be used in more fine-grained discovery approaches, based on richer Web service descriptions. Our Web service matchmaking algorithm extends object-based matching techniques used in structural case-based reasoning, allowing 1) the retrieval of Web services based not only on subsumption relationships, but also exploiting the structural information of OWL ontologies and 2) the exploitation of the classification of Web services in profile taxonomies, performing domain-dependent discovery. Furthermore, we describe how the typical paradigm of profile input/output annotation with ontology concepts can be extended, allowing ontology roles to be considered as well. We have implemented our framework in the OWLS-SLR system, which we extensively evaluate and compare to the OWLS-MX matchmaker.
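
For background, subsumption-based matchmakers typically grade an advertisement against a request by their relative position in the class hierarchy; the sketch below uses an invented hierarchy and generic degree names, since the exact labels (exact, plug-in, subsumes) and their definitions vary between systems such as OWLS-MX and OWLS-SLR.

```python
# Subsumption-based degree of match over an invented class hierarchy.
hierarchy = {              # child -> parent
    "SportsCar": "Car",
    "Car": "Vehicle",
    "Bicycle": "Vehicle",
}

def ancestors(cls):
    """Yield all superclasses of cls, nearest first."""
    while cls in hierarchy:
        cls = hierarchy[cls]
        yield cls

def degree_of_match(advertised, requested):
    if advertised == requested:
        return "exact"
    if advertised in ancestors(requested):
        return "advertisement subsumes request"
    if requested in ancestors(advertised):
        return "request subsumes advertisement"
    return "fail"

print(degree_of_match("Car", "Car"))        # exact
print(degree_of_match("Vehicle", "Car"))    # advertisement subsumes request
print(degree_of_match("SportsCar", "Car"))  # request subsumes advertisement
print(degree_of_match("Bicycle", "Car"))    # fail
```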

Book ChapterDOI
30 May 2010
TL;DR: This paper describes an approach that allows humans to access information contained in the Semantic Web according to its semantics and thus to leverage the specific characteristic of this Web.
Abstract: While the Semantic Web is rapidly filling up, appropriate tools for searching it are still in their infancy. In this paper we describe an approach that allows humans to access information contained in the Semantic Web according to its semantics and thus to leverage the specific characteristic of this Web. To avoid the ambiguity of natural language queries, users only select already defined attributes organized in facets to build their search queries. The facets are represented as nodes in a graph visualization and can be interactively added and removed by the users in order to produce individual search interfaces. This provides the possibility to generate interfaces of arbitrary complexity and to access arbitrary domains. Even multiple and distantly connected facets can be integrated in the graph, facilitating the access of information from different user-defined perspectives. Challenges include massive amounts of data, massive semantic relations within the data, highly complex search queries and users' unfamiliarity with the Semantic Web.

Proceedings ArticleDOI
22 Mar 2010
TL;DR: A semi-automatic method for the identification and extraction of valid facts aimed at analyzing semantic data expressed as instance stores in RDF/OWL that exploits the semantics and theoretical foundations of Description Logics to derive valid combinations of instances into fact tuples.
Abstract: The Semantic Web has become a new environment that enables organizations to attach semantic annotations taken from ontologies to the information they generate. As a result, large amounts of complex, semi-structured and heterogeneous semantic data repositories are being made available, making new data warehouse tools necessary for analyzing the Semantic Web. In this paper, we present a semi-automatic method for the identification and extraction of valid facts aimed at analyzing semantic data expressed as instance stores in RDF/OWL. The starting point of the method is a multidimensional (MD) star schema (i.e., subject of analysis, dimensions and measures) designed by the analyst by picking up concepts and properties from the ontology. The method exploits the semantics and theoretical foundations of Description Logics to derive valid combinations of instances into fact tuples. Moreover, some specific index structures are applied to the ontology in order to reach scalability and effectiveness.

Journal ArticleDOI
TL;DR: Some of the most relevant challenges of the current Sensor Web are go through, and some ongoing work and open opportunities for the introduction of semantics in this context are described.
Abstract: The combination of sensor networks with the Web, web services and database technologies, was named some years ago as the Sensor Web or the Sensor Internet. Most efforts in this area focused on the provision of platforms that could be used to build sensor-based applications more efficiently, considering some of the most important challenges in sensor-based data management and sensor network configuration. The introduction of semantics into these platforms provides the opportunity of going a step forward into the understanding, management and use of sensor-based data sources, and this is a topic being explored by ongoing initiatives. In this paper we go through some of the most relevant challenges of the current Sensor Web, and describe some ongoing work and open opportunities for the introduction of semantics in this context.

Patent
14 Apr 2010
TL;DR: In this article, a method for automatic mapping of a location identifier pattern of an object to a semantic type using object metadata is described. The method identifies a set of tags associated with a website hosted by a web server, detects web pages in which those tags appear, and extracts URL patterns that are mapped to semantic types.
Abstract: Systems and methods for automatic mapping of a location identifier pattern of an object to a semantic type using object metadata are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of identifying a set of tags associated with a website that is hosted by a web server. The method further includes detecting a web page in the website in which a tag of the set of tags is identified, extracting a pattern from a Universal Resource Locator (URL) of the web page, and/or storing the pattern in a database embodied in a machine-readable storage medium as being mapped to the semantic type. The tag corresponds to a semantic type with which the content embodied in the web page has a semantic relationship, and the pattern corresponds to that same semantic type.
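
A hedged sketch of the pattern-extraction step (generalization rules, names, and URLs are illustrative, not taken from the patent): the URLs of pages carrying a given tag are generalized into a pattern, which is then stored as mapped to the tag's semantic type.

```python
# Illustrative URL pattern extraction: path segments that vary across
# tagged pages are replaced with wildcards; the resulting pattern is
# mapped to the tag's semantic type.
import re
from urllib.parse import urlparse

def url_pattern(urls):
    """Generalize URL paths by wildcarding segments that differ."""
    split_paths = [urlparse(u).path.strip("/").split("/") for u in urls]
    pattern = []
    for segments in zip(*split_paths):
        pattern.append(segments[0] if len(set(segments)) == 1 else r"[^/]+")
    return "/" + "/".join(pattern)

recipe_urls = [                       # pages where a "recipe" tag appeared
    "http://example.org/recipes/1042/chocolate-cake",
    "http://example.org/recipes/2718/lentil-soup",
]
pattern = url_pattern(recipe_urls)
print(pattern)                        # /recipes/[^/]+/[^/]+
mapping = {pattern: "recipe"}         # stored pattern -> semantic type
print(bool(re.fullmatch(pattern, "/recipes/99/banana-bread")))  # True
```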

Journal ArticleDOI
TL;DR: The process of semantic content creation is analyzed in order to identify those tasks that are inherently human-driven, and incentive schemes are proposed that are likely to encourage users to perform exactly those tasks that crucially rely on manual input.
Abstract: Despite significant progress over the last years, the large-scale adoption of semantic technologies is still to come. One of the reasons for this state of affairs is assumed to be the lack of useful semantic content, a prerequisite for almost every IT system or application using semantics. Through its very nature, this content cannot be created fully automatically, but requires, to a certain degree, human contribution. The interest of Internet users in semantics, and in particular in creating semantic content, is, however, low. This is understandable if we think of several characteristics exposed by many of the most prominent semantic technologies, and the applications thereof. One of these characteristics is the high barrier of entry imposed. Interacting with semantic technologies today requires specific skills and expertise on subjects which are not part of the mainstream IT knowledge portfolio. A second characteristic is the set of incentives that are largely missing in the design of most semantic applications. The benefits of using machine-understandable content are in most applications fully decoupled from the effort of creating and maintaining this content. In other words, users do not have a motivation to contribute to the process. Initiatives in the areas of the Social Semantic Web acknowledged this problem, and identified mechanisms to motivate users to dedicate more of their time and resources to participate in the semantic content creation process. Still, even if incentives are theoretically in place, available human labor is limited and must only be used for those tasks that are heavily dependent on human intervention, and cannot be reliably automated. In this article, we concentrate on this step in between. As a first contribution, we analyze the process of semantic content creation in order to identify those tasks that are inherently human-driven. When building semantic applications involving these specific tasks, one has to install incentive schemes that are likely to encourage users to perform exactly these tasks that crucially rely on manual input. As a second contribution of the article, we propose incentives or incentive-driven tools that can be used to increase user interest in semantic content creation tasks. We hope that our findings will be adopted as recommendations for establishing a fundamentally new form of design of semantic applications by the semantic technologies community.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: This work provides a method that tries to reflect the underlying semantics of web services by fully utilizing the terms within WSDL, and shows that this method works well on both service classification and query.
Abstract: Web services have already become an important paradigm for web applications. The growing number of services requires efficient ways of locating desired web services. The similarity metric of web services plays an important role in service search and classification. The very small text fragments in the WSDL descriptions of web services are unsuitable for traditional IR techniques. We describe our approach, which supports similarity search and classification of service operations. The approach first employs external knowledge to compute the semantic distance of terms from two compared services. The similarity of services is measured upon these distances. Previous research treats terms within the same WSDL documents as isolated words and neglects the semantic association among them, hence lowering the accuracy of the similarity metric. We provide a method that tries to reflect the underlying semantics of web services by fully utilizing the terms within WSDL. The experiments show that our method works well on both service classification and query.
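
One plausible instantiation of the "external knowledge" step (the paper's actual resource and formula may differ) is WordNet path similarity between terms harvested from two WSDL documents:

```python
# Term distance via external knowledge, here WordNet path similarity.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def term_similarity(t1, t2):
    """Best path similarity over all synset pairs, 0 if none found."""
    scores = [
        s1.path_similarity(s2) or 0.0
        for s1 in wn.synsets(t1)
        for s2 in wn.synsets(t2)
    ]
    return max(scores, default=0.0)

def service_similarity(terms_a, terms_b):
    """Average best-match similarity of each term in A against B."""
    if not terms_a or not terms_b:
        return 0.0
    return sum(max(term_similarity(a, b) for b in terms_b)
               for a in terms_a) / len(terms_a)

# Hypothetical terms extracted from two WSDL operation/message names.
print(service_similarity(["book", "flight"], ["reserve", "airplane"]))
```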

Proceedings Article
26 Apr 2010
TL;DR: In this paper, the authors describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data.
Abstract: Much of the world’s knowledge is contained in structured documents like spreadsheets, database relations and tables in documents found on the Web and in print. The information in these tables might be much more valuable if it could be appropriately exported or encoded in RDF, making it easier to share, understand and integrate with other information. This is especially true if it could be linked into the growing linked data cloud. We describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data. The techniques have been prototyped for a subset of linked data that covers the core of Wikipedia.
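
The export step can be sketched with rdflib under the assumption that headers have already been linked to ontology terms (the mapping below is invented; the paper's header interpretation is far more involved):

```python
# Emit table rows as RDF triples given a header-to-property mapping.
# Requires: pip install rdflib. The DBpedia terms are illustrative.
from rdflib import Graph, Literal, Namespace, RDF

DBO = Namespace("http://dbpedia.org/ontology/")
EX = Namespace("http://example.org/resource/")

table = {
    "headers": ["City", "Country", "Population"],
    "rows": [["Berlin", "Germany", "3644826"],
             ["Vienna", "Austria", "1897491"]],
}
# Assume headers were already linked to classes/properties (made up here).
column_map = {"Country": DBO.country, "Population": DBO.populationTotal}

g = Graph()
for row in table["rows"]:
    subject = EX[row[0]]                  # key column becomes the subject
    g.add((subject, RDF.type, DBO.City))
    for header, value in zip(table["headers"][1:], row[1:]):
        g.add((subject, column_map[header], Literal(value)))

print(g.serialize(format="turtle"))
```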

Journal ArticleDOI
TL;DR: In this paper, the authors use semantic technologies to augment the underlying Web system's functionalities to improve the performance of emerging Web 3.0 applications, such as WSNs.
Abstract: Emerging Web 3.0 applications use semantic technologies to augment the underlying Web system's functionalities.

Posted Content
TL;DR: This paper reviews existing work on the preprocessing stage of web usage mining, gives a brief overview of various data mining techniques for discovering patterns, and discusses pattern analysis.
Abstract: The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users' accesses are recorded in web logs. Because of the tremendous usage of the web, the log files are growing at a fast rate and their size is becoming huge. Web data mining is the application of data mining techniques to web data. Web usage mining applies mining techniques to log data to extract the behavior of users, which is used in various applications like personalized services, adaptive web sites, customer profiling, prefetching, and creating attractive web sites. Web usage mining consists of three phases: preprocessing, pattern discovery and pattern analysis. Web log data is usually noisy and ambiguous, and preprocessing is an important step before mining. To discover patterns, sessions have to be constructed efficiently. This paper reviews existing work done in the preprocessing stage. A brief overview of various data mining techniques for discovering patterns and of pattern analysis is given. Finally, a glimpse of various applications of web usage mining is also presented.
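
The sessionization part of preprocessing is commonly done with an inactivity timeout; a toy sketch with an invented log layout and the widely used 30-minute threshold:

```python
# Split each user's cleaned log entries into sessions using a
# 30-minute inactivity timeout. The field layout is illustrative.
from collections import defaultdict

TIMEOUT = 30 * 60        # 30 minutes, in seconds

# (user_or_ip, unix_timestamp, requested_url) -- assumed already cleaned
# of robot traffic and image/CSS requests during preprocessing.
log = [
    ("10.0.0.1", 1000, "/index.html"),
    ("10.0.0.1", 1300, "/products.html"),
    ("10.0.0.1", 9000, "/index.html"),       # > 30 min gap: new session
    ("10.0.0.2", 1100, "/index.html"),
]

def sessionize(entries):
    by_user = defaultdict(list)
    for user, ts, url in sorted(entries, key=lambda e: (e[0], e[1])):
        sessions = by_user[user]
        if sessions and ts - sessions[-1][-1][0] <= TIMEOUT:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return by_user

for user, sessions in sessionize(log).items():
    print(user, [[url for _, url in s] for s in sessions])
```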

Proceedings ArticleDOI
26 Apr 2010
TL;DR: This paper analyzes the specific problems of structured IR and how to adapt weighting schemas for semantic document retrieval, noting that structure is one of the most important features of Semantic Web documents.
Abstract: Information Retrieval (IR) approaches for semantic web search engines have become very popular in recent years. The popularization of different IR libraries, like Lucene, that allow IR implementations almost out of the box has made it easier to integrate IR into Semantic Web search engines. However, one of the most important features of Semantic Web documents is their structure, since this structure allows us to represent semantics in a machine-readable format. In this paper we analyze the specific problems of structured IR and how to adapt weighting schemas for semantic document retrieval.
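
The structural-weighting idea can be sketched as field-dependent term weighting (the weights below are invented): a query term matching an rdfs:label counts for more than one matching a comment.

```python
# Field-weighted scoring for structured Semantic Web documents:
# terms contribute according to the field they occur in.
FIELD_WEIGHTS = {"label": 3.0, "comment": 1.5, "other_literal": 1.0}

def score(query_terms, doc_fields):
    """doc_fields maps field name -> list of tokens in that field."""
    total = 0.0
    for field, tokens in doc_fields.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        total += w * sum(tokens.count(t) for t in query_terms)
    return total

doc = {
    "label": ["semantic", "web", "stack"],
    "comment": ["layers", "of", "the", "semantic", "web"],
}
print(score(["semantic", "web"], doc))   # 3.0*2 + 1.5*2 = 9.0
```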

Journal ArticleDOI
TL;DR: It is shown that SPARQL query evaluation can be used to check the truth of preconditions in a given context, construct the postconditions that will result from the execution of a service in a context, and determine whether a service execution with those results will satisfy the goal of an agent.
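
A minimal sketch of the precondition check with rdflib (the service, data, and condition are invented): an ASK query returns whether the precondition holds in the agent's current context.

```python
# Check a service precondition against a context graph with SPARQL ASK.
# Requires: pip install rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

context = Graph()                     # the agent's current world state
context.add((EX.order42, EX.status, Literal("paid")))

# Precondition of a hypothetical "ship order" service: the order is paid.
precondition = """
    PREFIX ex: <http://example.org/>
    ASK { ex:order42 ex:status "paid" }
"""
result = context.query(precondition)
if result.askAnswer:
    print("precondition holds: the service may be executed")
```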

Journal ArticleDOI
TL;DR: The knowledge soup problem is about semantic heterogeneity and can be considered a difficult technical issue, which needs appropriate transformation and inferential pipelines that can help make sense of the different knowledge contexts.
Abstract: With the web of data, the semantic web can be an empirical science. Two problems have to be dealt with. The knowledge soup problem is about semantic heterogeneity, and can be considered a difficult technical issue, which needs appropriate transformation and inferential pipelines that can help make sense of the different knowledge contexts. The knowledge boundary problem is at the core of empirical investigation over the semantic web: what are the meaningful units that constitute the research objects for the semantic web? This question touches many aspects of semantic web studies: data, schemata, representation and reasoning, interaction, linguistic grounding, etc.

Book ChapterDOI
30 May 2010
TL;DR: This work developed an approach for improving the performance of triple stores by caching query results and even complete application objects and selective invalidation of cache objects, following updates of the underlying knowledge bases.
Abstract: The performance of triple stores is one of the major obstacles for the deployment of semantic technologies in many usage scenarios. In particular, Semantic Web applications, which use triple stores as persistence backends, trade performance for the advantage of flexibility with regard to information structuring. In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about what kind of updates will change the query result. We evaluated our approach by extending the BSBM triple store benchmark with an update dimension as well as in typical Semantic Web application scenarios.
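
The invalidation idea can be sketched as matching updated triples against the triple patterns of cached queries (a simplification of the paper's graph-pattern analysis; data and patterns are invented):

```python
# Invalidate cached SPARQL results when an inserted or deleted triple
# matches one of the query's triple patterns.
VAR = "?"                                  # variables start with "?"

def pattern_matches(pattern, triple):
    return all(p.startswith(VAR) or p == t for p, t in zip(pattern, triple))

cache = {
    "all_author_names": {
        "patterns": [("?p", "foaf:name", "?n"),
                     ("?p", "rdf:type", "foaf:Person")],
        "result": ["Alice", "Bob"],       # previously computed answer
    },
}

def on_update(changed_triple):
    """Drop every cached query whose graph pattern the update touches."""
    for key, entry in list(cache.items()):
        if any(pattern_matches(pat, changed_triple) for pat in entry["patterns"]):
            del cache[key]
            print(f"invalidated cache entry: {key}")

on_update(("ex:carol", "foaf:name", "Carol"))   # hits the first pattern
print(cache)                                     # {}
```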

Journal ArticleDOI
TL;DR: A schema theory for SLN, including the concepts, rule-constraint normal forms and relevant algorithms, is proposed, which provides the basis for normalized management of SLN and its applications.

Proceedings Article
11 Jul 2010
TL;DR: A generic framework for representing and reasoning with annotated Semantic Web data, formalise the annotated language, the corresponding deductive system, and address the query answering problem is described.
Abstract: We describe a generic framework for representing and reasoning with annotated Semantic Web data, formalise the annotated language, the corresponding deductive system, and address the query answering problem. We extend previous contributions on RDF annotations by providing a unified reasoning formalism and allowing the seamless combination of different annotation domains. We demonstrate the feasibility of our method by instantiating it on (i) temporal RDF; (ii) fuzzy RDF; and (iii) their combination. A prototype shows that implementing and combining new domains is easy and that RDF stores can easily be extended to our framework.
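
A toy sketch of the annotated-triple idea for the temporal domain: each triple carries a validity interval, and combining two triples during reasoning intersects their intervals (the domain's meet operation). The triples and intervals below are invented.

```python
# Annotated RDF, temporal domain: triples carry [from, to] validity
# intervals; joining triples intersects their annotations.
def meet(iv1, iv2):
    """Intersection of two (start, end) validity intervals, or None."""
    lo, hi = max(iv1[0], iv2[0]), min(iv1[1], iv2[1])
    return (lo, hi) if lo <= hi else None

# (subject, predicate, object) annotated with (valid_from, valid_to).
t1 = (("ex:companyA", "ex:basedIn", "ex:regionA"), (2000, 2008))
t2 = (("ex:regionA", "ex:partOf", "ex:countryB"), (2005, 2010))

# A derived fact (companyA based in countryB) is only valid where both
# input annotations overlap.
print(meet(t1[1], t2[1]))    # (2005, 2008)
```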

Proceedings ArticleDOI
19 Jul 2010
TL;DR: A temporal web link-based ranking scheme, which incorporates features from historical author activities, based on a temporal web graph composed of multiple web snapshots at different time points, which improves upon PageRank in both relevance and freshness of the search results.
Abstract: The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obviously significant factor for ranking. However, traditional link-based web ranking algorithms typically run on a single web snapshot without concern for user activities associated with the dynamics of web pages and links. Therefore, a stale page popular many years ago may still achieve a high authority score due to its accumulated in-links. To remedy this situation, we propose a temporal web link-based ranking scheme, which incorporates features from historical author activities. We quantify web page freshness over time from page and in-link activity, and design a web surfer model that incorporates web freshness, based on a temporal web graph composed of multiple web snapshots at different time points. It includes authority propagation among snapshots, enabling link structures at distinct time points to influence each other when estimating web page authority. Experiments on a real-world archival web corpus show our approach improves upon PageRank in both relevance and freshness of the search results.
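
The freshness idea can be sketched as a PageRank variant whose teleport distribution is biased toward recently updated pages (an illustration, not the paper's surfer model; the graph and freshness scores are invented):

```python
# PageRank with a freshness-biased teleport distribution, so stale pages
# with many old in-links lose some authority to fresher pages.
import numpy as np

# Column-stochastic adjacency: A[j, i] = 1/outdeg(i) if page i links to j.
A = np.array([
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
])
freshness = np.array([0.1, 0.1, 0.8])      # hypothetical per-page recency
teleport = freshness / freshness.sum()     # fresh pages get more restarts

def freshness_pagerank(A, teleport, damping=0.85, iters=100):
    r = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        r = damping * A @ r + (1 - damping) * teleport
    return r

print(freshness_pagerank(A, teleport))
```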