
Showing papers on "Semantic Web Stack published in 2010"


Proceedings Article
03 Nov 2010
TL;DR: SenticNet is a publicly available resource for opinion mining built by exploiting AI and Semantic Web techniques; it uses dimensionality reduction to infer the polarity of common-sense concepts and hence provides a public resource for mining opinions from natural language text at a semantic, rather than just syntactic, level.
Abstract: Today millions of web-users express their opinions about many topics through blogs, wikis, fora, chats and social networks. For sectors such as e-commerce and e-tourism, it is very useful to automatically analyze the huge amount of social information available on the Web, but the extremely unstructured nature of this content makes it a difficult task. SenticNet is a publicly available resource for opinion mining built by exploiting AI and Semantic Web techniques. It uses dimensionality reduction to infer the polarity of common-sense concepts and hence provides a public resource for mining opinions from natural language text at a semantic, rather than just syntactic, level.
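
A rough, self-contained illustration of the dimensionality-reduction idea (not SenticNet's actual pipeline; the concepts, affective features, and scores below are invented): factor a concept-by-feature matrix with truncated SVD and read each concept's coordinate on the dominant latent axis as a polarity estimate.

```python
# Illustrative sketch only: infer concept polarity by reducing a
# concept-by-affective-feature matrix with truncated SVD and reading
# each concept's coordinate on the first latent axis as its polarity.
import numpy as np

concepts = ["birthday_party", "food_poisoning", "sunny_day", "traffic_jam"]
features = ["pleasant", "exciting", "unpleasant", "stressful"]

# Hypothetical concept/feature association scores.
M = np.array([
    [0.9, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.9, 0.6],
    [0.8, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.7, 0.9],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                        # keep only the top-k latent dimensions
M_k = U[:, :k] * s[:k]       # concepts projected into the latent space

# Orient the first axis so that "pleasant" loads positively on it.
sign = np.sign(Vt[0, features.index("pleasant")])
for concept, coords in zip(concepts, M_k):
    print(f"{concept}: polarity ~ {sign * coords[0]:+.2f}")
```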

285 citations


Journal ArticleDOI
TL;DR: A novel vision-based approach that is Web-page-programming-language-independent is proposed; it primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction.
Abstract: Deep Web contents are accessed by queries submitted to Web databases, and the returned data records are enwrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. As Web pages are a popular two-dimensional medium, their contents are always displayed regularly for users to browse. This motivates us to seek a different way for deep Web data extraction that overcomes the limitations of previous works by utilizing some interesting common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language-independent is proposed. This approach primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.
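
The geometric intuition can be sketched in a few lines (an illustration of alignment-based grouping only, not the paper's algorithm; the coordinates are invented): blocks that share a left edge and have similar heights on the rendered page are grouped as repeated data records.

```python
# Minimal sketch of the layout-based intuition: cluster rendered page
# blocks into data records purely from visual geometry -- blocks that
# share a left edge and have similar heights are taken to be repeated
# records in a result list.
from dataclasses import dataclass

@dataclass
class Block:
    left: int      # x coordinate of the rendered bounding box
    top: int
    width: int
    height: int
    text: str

def group_records(blocks, x_tol=5, h_tol=15):
    """Group blocks whose left edges align (within x_tol pixels) and
    whose heights are similar (within h_tol pixels)."""
    groups = []
    for b in sorted(blocks, key=lambda b: b.top):
        for g in groups:
            ref = g[0]
            if abs(b.left - ref.left) <= x_tol and abs(b.height - ref.height) <= h_tol:
                g.append(b)
                break
        else:
            groups.append([b])
    # Treat the largest aligned group as the records; the rest is chrome.
    return max(groups, key=len)

blocks = [
    Block(40, 120, 600, 80, "result 1 ..."),
    Block(40, 210, 600, 85, "result 2 ..."),
    Block(40, 305, 600, 78, "result 3 ..."),
    Block(700, 120, 200, 400, "ads sidebar"),
]
for rec in group_records(blocks):
    print(rec.text)
```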

278 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: This work proposes a formal model of one specific semantic search task: ad-hoc object retrieval and shows that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines.
Abstract: Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
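
Since the paper connects ad-hoc object retrieval to document retrieval and discusses evaluation metrics, a standard graded-relevance metric such as nDCG applies; a minimal sketch with invented relevance judgments:

```python
# nDCG over a ranked list of graded relevance judgments
# (2 = exact entity match, 1 = related, 0 = irrelevant).
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, k=None):
    gains = ranked_gains[:k] if k else ranked_gains
    ideal = sorted(ranked_gains, reverse=True)[:k] if k else sorted(ranked_gains, reverse=True)
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Hypothetical judgments for a ranked list returned for an entity query.
print(ndcg([2, 0, 1, 2, 0], k=5))   # ~0.89
```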

228 citations


Book ChapterDOI
07 Nov 2010
TL;DR: Dbrec, a music recommendation system built on top of DBpedia, offering recommendations for more than 39,000 bands and solo artists, is described, providing relevant insights for people developing applications consuming Linked Data.
Abstract: This paper describes the theoretical background and the implementation of dbrec, a music recommendation system built on top of DBpedia, offering recommendations for more than 39,000 bands and solo artists. We discuss the various challenges and lessons learnt while building it, providing relevant insights for people developing applications consuming Linked Data. Furthermore, we provide a user-centric evaluation of the system, notably by comparing it to last.fm.

217 citations


Journal ArticleDOI
TL;DR: Instead of developing new semantically enabled services from scratch, this work proposes to create profiles of existing services that implement a transparent mapping between the OGC and the Semantic Web world, and points out how to combine SDI with linked data.
Abstract: Building on abstract reference models, the Open Geospatial Consortium (OGC) has established standards for storing, discovering, and processing geographical information. These standards act as a basis for the implementation of specific services and Spatial Data Infrastructures (SDI). Research on geo-semantics plays an increasing role to support complex queries and retrieval across heterogeneous information sources, as well as for service orchestration, semantic translation, and on-the-fly integration. So far, this research targets individual solutions or focuses on the Semantic Web, leaving the integration into SDI aside. What is missing is a shared and transparent Semantic Enablement Layer for SDI which also integrates reasoning services known from the Semantic Web. Instead of developing new semantically enabled services from scratch, we propose to create profiles of existing services that implement a transparent mapping between the OGC and the Semantic Web world. Finally, we point out how to combine SDI with linked data.

177 citations


Journal ArticleDOI
TL;DR: This work presents Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space in which large scale semantic Web indexing, logic reasoning, data aggregation heuristics, ad-hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions.

177 citations


Proceedings Article
23 Mar 2010
TL;DR: It is argued that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision and directions for research to remedy the situation are given.
Abstract: In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked triple collection, it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.

176 citations


Journal ArticleDOI
TL;DR: A review of the area of Web-based simulation, exploring the advantages and disadvantages of WBS over classical simulation systems, a classification of different sub- and related-areas of WBS, an exploration of technologies that enable WBS, and the evolution of the Web in terms of its relationship to WBS are given.

175 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: Sig.ma uses a holistic approach in which large scale semantic web indexing, logic reasoning, data aggregation heuristics, ad hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions.
Abstract: We demonstrate Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space. Sig.ma uses a holistic approach in which large scale semantic web indexing, logic reasoning, data aggregation heuristics, ad hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions. These consolidated entity descriptions then form the base for embeddable data mashups, machine oriented services as well as data browsing services. Finally, we discuss Sig.ma's peculiar characteristics and report on lessons learned and ideas it inspires.

130 citations


Book ChapterDOI
30 May 2010
TL;DR: This work presents the TrOWL infrastructure for transforming, reasoning, and querying OWL2 ontologies which uses novel techniques such as Quality Guaranteed Approximations and Forgetting to achieve this goal.
Abstract: The Semantic Web movement has led to the publication of thousands of ontologies online. These ontologies present and mediate information and knowledge on the Semantic Web. Tools exist to reason over these ontologies and to answer queries over them, but there are no large scale infrastructures for storing, reasoning, and querying ontologies on a scale that would be useful for a large enterprise or research institution. We present the TrOWL infrastructure for transforming, reasoning, and querying OWL2 ontologies which uses novel techniques such as Quality Guaranteed Approximations and Forgetting to achieve this goal.

124 citations


Book ChapterDOI
30 May 2010
TL;DR: Main contributions compared to previous and related work are data aggregations on several dimensions, a graph visualization that displays and connects relationships even between more than two given objects, and an advanced implementation that is highly configurable and applicable to arbitrary RDF datasets.
Abstract: This paper presents an approach for the interactive discovery of relationships between selected elements via the Semantic Web. It emphasizes the human aspect of relationship discovery by offering sophisticated interaction support. Selected elements are first semi-automatically mapped to unique objects of Semantic Web datasets. These datasets are then crawled for relationships which are presented in detail and overview. Interactive features and visual clues allow for a sophisticated exploration of the found relationships. The general process is described and the RelFinder tool as a concrete implementation and proof-of-concept is presented and evaluated in a user study. The application potentials are illustrated by a scenario that uses the RelFinder and DBpedia to assist a business analyst in decision-making. Main contributions compared to previous and related work are data aggregations on several dimensions, a graph visualization that displays and connects relationships even between more than two given objects, and an advanced implementation that is highly configurable and applicable to arbitrary RDF datasets.
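
A toy sketch of the underlying relationship-discovery step (plain breadth-first search over a handful of invented triples, not RelFinder's actual crawler): paths connecting two selected objects are enumerated over an undirected view of the RDF graph.

```python
# Toy relationship discovery: BFS over an RDF-like graph (a plain dict
# of labeled edges) to find connecting paths between two objects.
from collections import deque

# Hypothetical (subject, predicate, object) triples.
triples = [
    ("Innsbruck", "locatedIn", "Austria"),
    ("Leopold_Franzens_University", "locatedIn", "Innsbruck"),
    ("Vienna", "capitalOf", "Austria"),
]

# Undirected adjacency list so paths may traverse edges both ways.
adj = {}
for s, p, o in triples:
    adj.setdefault(s, []).append((p, o))
    adj.setdefault(o, []).append((f"{p}^-1", s))

def find_paths(start, goal, max_len=3):
    paths, queue = [], deque([[(None, start)]])
    while queue:
        path = queue.popleft()
        node = path[-1][1]
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) > max_len:
            continue
        visited = {n for _, n in path}
        for pred, nxt in adj.get(node, []):
            if nxt not in visited:
                queue.append(path + [(pred, nxt)])
    return paths

for path in find_paths("Leopold_Franzens_University", "Vienna"):
    print(" -> ".join(f"{p}:{n}" if p else n for p, n in path))
```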

Proceedings Article
23 Mar 2010
TL;DR: This paper demonstrates how to measure semantic distance on Linked Data in order to identify relatedness between resources, and how such measures can be used to provide a new kind of self-explanatory recommendations.
Abstract: A frequent topic discussed in the Linked Data community, especially when trying to outreach its values, is "What can we do with all this data?". In this paper, we demonstrate (1) how to measure semantic distance on Linked Data in order to identify relatedness between resources, and (2) how such measures can be used to provide a new kind of self-explanatory recommendations, bringing together Linked Data and Artificial Intelligence principles, and demonstrating how intelligent agents could emerge in the realm of Linked Data.
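
A hedged sketch of the general idea, using a simplified distance that is not Passant's exact LDSD formula: resources sharing more direct and indirect links count as closer, and the shared links double as the explanation behind a recommendation.

```python
# Simplified variant (not the paper's exact formula): resources are
# closer the more direct links and shared outgoing (predicate, object)
# links they have; the shared links serve as the explanation.
def semantic_distance(a, b, triples):
    direct = sum(1 for s, p, o in triples if (s, o) in ((a, b), (b, a)))
    out_a = {(p, o) for s, p, o in triples if s == a}
    out_b = {(p, o) for s, p, o in triples if s == b}
    shared = out_a & out_b
    return 1.0 / (1.0 + direct + len(shared)), shared

triples = [
    ("Ramones", "genre", "Punk_rock"),
    ("The_Clash", "genre", "Punk_rock"),
    ("Ramones", "origin", "New_York"),
]
dist, why = semantic_distance("Ramones", "The_Clash", triples)
print(dist)   # 0.5 -- one shared (genre, Punk_rock) link
print(why)    # the shared link doubles as a human-readable explanation
```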

Book ChapterDOI
03 May 2010
TL;DR: This paper evaluates how the state of the art in data quality research fits the characteristics of the Web of Data, and describes how the SPARQL query language and the SPARQL Inferencing Notation can be utilized to identify data quality problems in Semantic Web data automatically, within the Semantic Web technology stack.
Abstract: The quality of data is a key factor that determines the performance of information systems, in particular with regard (1) to the amount of exceptions in the execution of business processes and (2) to the quality of decisions based on the output of the respective information system. Recently, the Semantic Web and Linked Data activities have started to provide substantial data resources that may be used for real business operations. Hence, it will soon be critical to manage the quality of such data. Unfortunately, we can observe a wide range of data quality problems in Semantic Web data. In this paper, we (1) evaluate how the state of the art in data quality research fits the characteristics of the Web of Data, (2) describe how the SPARQL query language and the SPARQL Inferencing Notation (SPIN) can be utilized to identify data quality problems in Semantic Web data automatically, staying within the Semantic Web technology stack, and (3) evaluate our approach.
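
A minimal sketch of the idea using rdflib and invented example data: a data quality rule is phrased as a SPARQL query whose solutions are exactly the violations (here, persons lacking a foaf:name).

```python
# A data quality rule as a SPARQL query over Semantic Web data:
# every solution is a violation. Requires: pip install rdflib.
from rdflib import Graph, Namespace, Literal, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.bob, RDF.type, FOAF.Person))      # bob has no name: a violation

violations = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person WHERE {
        ?person a foaf:Person .
        FILTER NOT EXISTS { ?person foaf:name ?name }
    }
""")
for row in violations:
    print("missing foaf:name:", row.person)
```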

Journal ArticleDOI
TL;DR: The Web service matchmaking algorithm extends object-based matching techniques used in structural case-based reasoning, allowing the retrieval of Web services based not only on subsumption relationships but also on the structural information of OWL ontologies, and exploiting the classification of Web services in profile taxonomies to perform domain-dependent discovery.
Abstract: In this paper, we describe and evaluate a Web service discovery framework using OWL-S advertisements, combined with the distinction between service and Web service of the WSMO discovery framework. More specifically, we follow the Web service discovery model, which is based on abstract and lightweight semantic Web service descriptions, using the service profile ontology of OWL-S. Our goal is to quickly determine an initial set of candidate Web services for a specific request. This set can then be used in more fine-grained discovery approaches, based on richer Web service descriptions. Our Web service matchmaking algorithm extends object-based matching techniques used in structural case-based reasoning, allowing 1) the retrieval of Web services based not only on subsumption relationships, but also exploiting the structural information of OWL ontologies and 2) the exploitation of the classification of Web services in profile taxonomies, performing domain-dependent discovery. Furthermore, we describe how the typical paradigm of profile input/output annotation with ontology concepts can be extended, allowing ontology roles to be considered as well. We have implemented our framework in the OWLS-SLR system, which we extensively evaluate and compare to the OWLS-MX matchmaker.
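
For background, subsumption-based matchmakers typically grade an advertisement against a request by their relative position in the class hierarchy; the sketch below uses an invented hierarchy and generic degree names, since the exact labels (exact, plug-in, subsumes) and their definitions vary between systems such as OWLS-MX and OWLS-SLR.

```python
# Subsumption-based degree of match over an invented class hierarchy.
hierarchy = {              # child -> parent
    "SportsCar": "Car",
    "Car": "Vehicle",
    "Bicycle": "Vehicle",
}

def ancestors(cls):
    """Yield all superclasses of cls, nearest first."""
    while cls in hierarchy:
        cls = hierarchy[cls]
        yield cls

def degree_of_match(advertised, requested):
    if advertised == requested:
        return "exact"
    if advertised in ancestors(requested):
        return "advertisement subsumes request"
    if requested in ancestors(advertised):
        return "request subsumes advertisement"
    return "fail"

print(degree_of_match("Car", "Car"))        # exact
print(degree_of_match("Vehicle", "Car"))    # advertisement subsumes request
print(degree_of_match("SportsCar", "Car"))  # request subsumes advertisement
print(degree_of_match("Bicycle", "Car"))    # fail
```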

Book ChapterDOI
30 May 2010
TL;DR: This paper describes an approach that allows humans to access information contained in the Semantic Web according to its semantics and thus to leverage the specific characteristic of this Web.
Abstract: While the Semantic Web is rapidly filling up, appropriate tools for searching it are still in their infancy. In this paper we describe an approach that allows humans to access information contained in the Semantic Web according to its semantics and thus to leverage the specific characteristic of this Web. To avoid the ambiguity of natural language queries, users only select already defined attributes organized in facets to build their search queries. The facets are represented as nodes in a graph visualization and can be interactively added and removed by the users in order to produce individual search interfaces. This provides the possibility to generate interfaces of arbitrary complexity and to access arbitrary domains. Even multiple and distantly connected facets can be integrated in the graph, facilitating the access of information from different user-defined perspectives. Challenges include massive amounts of data, massive semantic relations within the data, highly complex search queries and users' unfamiliarity with the Semantic Web.

Proceedings ArticleDOI
22 Mar 2010
TL;DR: A semi-automatic method for the identification and extraction of valid facts aimed at analyzing semantic data expressed as instance stores in RDF/OWL that exploits the semantics and theoretical foundations of Description Logics to derive valid combinations of instances into fact tuples.
Abstract: The Semantic Web has become a new environment that enables organizations to attach semantic annotations taken from ontologies to the information they generate. As a result, large amounts of complex, semi-structured and heterogeneous semantic data repositories are being made available, making new data warehouse tools necessary for analyzing the Semantic Web. In this paper, we present a semi-automatic method for the identification and extraction of valid facts aimed at analyzing semantic data expressed as instance stores in RDF/OWL. The starting point of the method is a multidimensional (MD) star schema (i.e., subject of analysis, dimensions and measures) designed by the analyst by picking up concepts and properties from the ontology. The method exploits the semantics and theoretical foundations of Description Logics to derive valid combinations of instances into fact tuples. Moreover, some specific index structures are applied to the ontology in order to reach scalability and effectiveness.

Journal ArticleDOI
TL;DR: Some of the most relevant challenges of the current Sensor Web are go through, and some ongoing work and open opportunities for the introduction of semantics in this context are described.
Abstract: The combination of sensor networks with the Web, web services and database technologies, was named some years ago as the Sensor Web or the Sensor Internet. Most efforts in this area focused on the provision of platforms that could be used to build sensor-based applications more efficiently, considering some of the most important challenges in sensor-based data management and sensor network configuration. The introduction of semantics into these platforms provides the opportunity of going a step forward into the understanding, management and use of sensor-based data sources, and this is a topic being explored by ongoing initiatives. In this paper we go through some of the most relevant challenges of the current Sensor Web, and describe some ongoing work and open opportunities for the introduction of semantics in this context.

Patent
14 Apr 2010
TL;DR: In this article, a method for automatic mapping of a location identifier pattern of an object to a semantic type using object metadata is described. The method identifies a set of tags associated with a website hosted by a web server, detects web pages in which those tags appear, and extracts URL patterns that are mapped to semantic types.
Abstract: Systems and methods for automatic mapping of a location identifier pattern of an object to a semantic type using object metadata are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of identifying a set of tags associated with a website that is hosted by a web server. The method further includes detecting a web page in the website in which a tag of the set of tags is identified, extracting a pattern from a Universal Resource Locator (URL) of the web page, and/or storing the pattern in a database embodied in a machine-readable storage medium as being mapped to the semantic type. The tag corresponds to a semantic type with which the content embodied in the web page has a semantic relationship, and the pattern corresponds to that same semantic type.
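
A hedged sketch of the pattern-extraction step (generalization rules, names, and URLs are illustrative, not taken from the patent): the URLs of pages carrying a given tag are generalized into a pattern, which is then stored as mapped to the tag's semantic type.

```python
# Illustrative URL pattern extraction: path segments that vary across
# tagged pages are replaced with wildcards; the resulting pattern is
# mapped to the tag's semantic type.
import re
from urllib.parse import urlparse

def url_pattern(urls):
    """Generalize URL paths by wildcarding segments that differ."""
    split_paths = [urlparse(u).path.strip("/").split("/") for u in urls]
    pattern = []
    for segments in zip(*split_paths):
        pattern.append(segments[0] if len(set(segments)) == 1 else r"[^/]+")
    return "/" + "/".join(pattern)

recipe_urls = [                       # pages where a "recipe" tag appeared
    "http://example.org/recipes/1042/chocolate-cake",
    "http://example.org/recipes/2718/lentil-soup",
]
pattern = url_pattern(recipe_urls)
print(pattern)                        # /recipes/[^/]+/[^/]+
mapping = {pattern: "recipe"}         # stored pattern -> semantic type
print(bool(re.fullmatch(pattern, "/recipes/99/banana-bread")))  # True
```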

Journal ArticleDOI
TL;DR: The process of semantic content creation is analyzed in order to identify those tasks that are inherently human-driven, and incentive schemes are proposed that are likely to encourage users to perform exactly those tasks that crucially rely on manual input.
Abstract: Despite significant progress over the last years, the large-scale adoption of semantic technologies is still to come. One of the reasons for this state of affairs is assumed to be the lack of useful semantic content, a prerequisite for almost every IT system or application using semantics. Through its very nature, this content cannot be created fully automatically, but requires, to a certain degree, human contribution. The interest of Internet users in semantics, and in particular in creating semantic content, is, however, low. This is understandable if we think of several characteristics exposed by many of the most prominent semantic technologies, and the applications thereof. One of these characteristics is the high barrier of entry imposed. Interacting with semantic technologies today requires specific skills and expertise on subjects which are not part of the mainstream IT knowledge portfolio. A second characteristic is the set of incentives that are largely missing in the design of most semantic applications. The benefits of using machine-understandable content are in most applications fully decoupled from the effort of creating and maintaining this content. In other words, users do not have a motivation to contribute to the process. Initiatives in the areas of the Social Semantic Web acknowledged this problem, and identified mechanisms to motivate users to dedicate more of their time and resources to participate in the semantic content creation process. Still, even if incentives are theoretically in place, available human labor is limited and must only be used for those tasks that are heavily dependent on human intervention, and cannot be reliably automated. In this article, we concentrate on this step in between. As a first contribution, we analyze the process of semantic content creation in order to identify those tasks that are inherently human-driven. When building semantic applications involving these specific tasks, one has to install incentive schemes that are likely to encourage users to perform exactly these tasks that crucially rely on manual input. As a second contribution of the article, we propose incentives or incentive-driven tools that can be used to increase user interest in semantic content creation tasks. We hope that our findings will be adopted as recommendations for establishing a fundamentally new form of design of semantic applications by the semantic technologies community.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: This work provides a method that tries to reflect the underlying semantics of web services by fully utilizing the terms within WSDL, and shows that this method works well on both service classification and query.
Abstract: Web services have already become an important paradigm for web applications. The growing number of services requires efficient ways of locating desired web services. The similarity metric of web services plays an important role in service search and classification. The very small text fragments in the WSDL descriptions of web services are unsuitable for traditional IR techniques. We describe our approach, which supports similarity search and classification of service operations. The approach first employs external knowledge to compute the semantic distance of terms from two compared services. The similarity of services is measured upon these distances. Previous research treats terms within the same WSDL documents as isolated words and neglects the semantic association among them, hence lowering the accuracy of the similarity metric. We provide a method that tries to reflect the underlying semantics of web services by fully utilizing the terms within WSDL. The experiments show that our method works well on both service classification and query.
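
One plausible instantiation of the "external knowledge" step (the paper's actual resource and formula may differ) is WordNet path similarity between terms harvested from two WSDL documents:

```python
# Term distance via external knowledge, here WordNet path similarity.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def term_similarity(t1, t2):
    """Best path similarity over all synset pairs, 0 if none found."""
    scores = [
        s1.path_similarity(s2) or 0.0
        for s1 in wn.synsets(t1)
        for s2 in wn.synsets(t2)
    ]
    return max(scores, default=0.0)

def service_similarity(terms_a, terms_b):
    """Average best-match similarity of each term in A against B."""
    if not terms_a or not terms_b:
        return 0.0
    return sum(max(term_similarity(a, b) for b in terms_b)
               for a in terms_a) / len(terms_a)

# Hypothetical terms extracted from two WSDL operation/message names.
print(service_similarity(["book", "flight"], ["reserve", "airplane"]))
```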

Proceedings Article
26 Apr 2010
TL;DR: In this paper, the authors describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data.
Abstract: Much of the world’s knowledge is contained in structured documents like spreadsheets, database relations and tables in documents found on the Web and in print. The information in these tables might be much more valuable if it could be appropriately exported or encoded in RDF, making it easier to share, understand and integrate with other information. This is especially true if it could be linked into the growing linked data cloud. We describe techniques to automatically infer a (partial) semantic model for information in tables using both table headings, if available, and the values stored in table cells and to export the data the table represents as linked data. The techniques have been prototyped for a subset of linked data that covers the core of Wikipedia.
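
The export step can be sketched with rdflib under the assumption that headers have already been linked to ontology terms (the mapping below is invented; the paper's header interpretation is far more involved):

```python
# Emit table rows as RDF triples given a header-to-property mapping.
# Requires: pip install rdflib. The DBpedia terms are illustrative.
from rdflib import Graph, Literal, Namespace, RDF

DBO = Namespace("http://dbpedia.org/ontology/")
EX = Namespace("http://example.org/resource/")

table = {
    "headers": ["City", "Country", "Population"],
    "rows": [["Berlin", "Germany", "3644826"],
             ["Vienna", "Austria", "1897491"]],
}
# Assume headers were already linked to classes/properties (made up here).
column_map = {"Country": DBO.country, "Population": DBO.populationTotal}

g = Graph()
for row in table["rows"]:
    subject = EX[row[0]]                  # key column becomes the subject
    g.add((subject, RDF.type, DBO.City))
    for header, value in zip(table["headers"][1:], row[1:]):
        g.add((subject, column_map[header], Literal(value)))

print(g.serialize(format="turtle"))
```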

Journal ArticleDOI
TL;DR: In this paper, the authors use semantic technologies to augment the underlying Web system's functionalities to improve the performance of emerging Web 3.0 applications, such as WSNs.
Abstract: Emerging Web 3.0 applications use semantic technologies to augment the underlying Web system's functionalities.

Posted Content
TL;DR: This paper reviews existing work on the preprocessing stage of web usage mining, gives a brief overview of various data mining techniques for discovering patterns, and discusses pattern analysis.
Abstract: The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users' accesses are recorded in web logs. Because of the tremendous usage of the web, the log files are growing at a fast rate and their size is becoming huge. Web data mining is the application of data mining techniques to web data. Web usage mining applies mining techniques to log data to extract the behavior of users, which is used in various applications like personalized services, adaptive web sites, customer profiling, prefetching, and creating attractive web sites. Web usage mining consists of three phases: preprocessing, pattern discovery and pattern analysis. Web log data is usually noisy and ambiguous, and preprocessing is an important step before mining. To discover patterns, sessions have to be constructed efficiently. This paper reviews existing work done in the preprocessing stage. A brief overview of various data mining techniques for discovering patterns and of pattern analysis is given. Finally, a glimpse of various applications of web usage mining is also presented.
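
The sessionization part of preprocessing is commonly done with an inactivity timeout; a toy sketch with an invented log layout and the widely used 30-minute threshold:

```python
# Split each user's cleaned log entries into sessions using a
# 30-minute inactivity timeout. The field layout is illustrative.
from collections import defaultdict

TIMEOUT = 30 * 60        # 30 minutes, in seconds

# (user_or_ip, unix_timestamp, requested_url) -- assumed already cleaned
# of robot traffic and image/CSS requests during preprocessing.
log = [
    ("10.0.0.1", 1000, "/index.html"),
    ("10.0.0.1", 1300, "/products.html"),
    ("10.0.0.1", 9000, "/index.html"),       # > 30 min gap: new session
    ("10.0.0.2", 1100, "/index.html"),
]

def sessionize(entries):
    by_user = defaultdict(list)
    for user, ts, url in sorted(entries, key=lambda e: (e[0], e[1])):
        sessions = by_user[user]
        if sessions and ts - sessions[-1][-1][0] <= TIMEOUT:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return by_user

for user, sessions in sessionize(log).items():
    print(user, [[url for _, url in s] for s in sessions])
```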

Proceedings ArticleDOI
26 Apr 2010
TL;DR: This paper analyzes the specific problems of structured IR and how to adapt weighting schemas for semantic document retrieval, noting that structure is one of the most important features of Semantic Web documents.
Abstract: Information Retrieval (IR) approaches for semantic web search engines have become very popular in recent years. The popularization of different IR libraries, like Lucene, that allow IR implementations almost out of the box has made it easier to integrate IR into Semantic Web search engines. However, one of the most important features of Semantic Web documents is their structure, since this structure allows us to represent semantics in a machine-readable format. In this paper we analyze the specific problems of structured IR and how to adapt weighting schemas for semantic document retrieval.
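
The structural-weighting idea can be sketched as field-dependent term weighting (the weights below are invented): a query term matching an rdfs:label counts for more than one matching a comment.

```python
# Field-weighted scoring for structured Semantic Web documents:
# terms contribute according to the field they occur in.
FIELD_WEIGHTS = {"label": 3.0, "comment": 1.5, "other_literal": 1.0}

def score(query_terms, doc_fields):
    """doc_fields maps field name -> list of tokens in that field."""
    total = 0.0
    for field, tokens in doc_fields.items():
        w = FIELD_WEIGHTS.get(field, 1.0)
        total += w * sum(tokens.count(t) for t in query_terms)
    return total

doc = {
    "label": ["semantic", "web", "stack"],
    "comment": ["layers", "of", "the", "semantic", "web"],
}
print(score(["semantic", "web"], doc))   # 3.0*2 + 1.5*2 = 9.0
```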

Journal ArticleDOI
TL;DR: It is shown that SPARQL query evaluation can be used to check the truth of preconditions in a given context, construct the postconditions that will result from the execution of a service in a context, and determine whether a service execution with those results will satisfy the goal of an agent.
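
A minimal sketch of the precondition check with rdflib (the service, data, and condition are invented): an ASK query returns whether the precondition holds in the agent's current context.

```python
# Check a service precondition against a context graph with SPARQL ASK.
# Requires: pip install rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

context = Graph()                     # the agent's current world state
context.add((EX.order42, EX.status, Literal("paid")))

# Precondition of a hypothetical "ship order" service: the order is paid.
precondition = """
    PREFIX ex: <http://example.org/>
    ASK { ex:order42 ex:status "paid" }
"""
result = context.query(precondition)
if result.askAnswer:
    print("precondition holds: the service may be executed")
```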

Journal ArticleDOI
TL;DR: The knowledge soup problem is about semantic heterogeneity and can be considered a difficult technical issue, which needs appropriate transformation and inferential pipelines that can help make sense of the different knowledge contexts.
Abstract: With the web of data, the semantic web can be an empirical science. Two problems have to be dealt with. The knowledge soup problem is about semantic heterogeneity, and can be considered a difficult technical issue, which needs appropriate transformation and inferential pipelines that can help make sense of the different knowledge contexts. The knowledge boundary problem is at the core of empirical investigation over the semantic web: what are the meaningful units that constitute the research objects for the semantic web? This question touches many aspects of semantic web studies: data, schemata, representation and reasoning, interaction, linguistic grounding, etc.

Book ChapterDOI
30 May 2010
TL;DR: This work developed an approach for improving the performance of triple stores by caching query results and even complete application objects and selective invalidation of cache objects, following updates of the underlying knowledge bases.
Abstract: The performance of triple stores is one of the major obstacles for the deployment of semantic technologies in many usage scenarios. In particular, Semantic Web applications, which use triple stores as persistence backends, trade performance for the advantage of flexibility with regard to information structuring. In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about what kind of updates will change the query result. We evaluated our approach by extending the BSBM triple store benchmark with an update dimension as well as in typical Semantic Web application scenarios.
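
The invalidation idea can be sketched as matching updated triples against the triple patterns of cached queries (a simplification of the paper's graph-pattern analysis; data and patterns are invented):

```python
# Invalidate cached SPARQL results when an inserted or deleted triple
# matches one of the query's triple patterns.
VAR = "?"                                  # variables start with "?"

def pattern_matches(pattern, triple):
    return all(p.startswith(VAR) or p == t for p, t in zip(pattern, triple))

cache = {
    "all_author_names": {
        "patterns": [("?p", "foaf:name", "?n"),
                     ("?p", "rdf:type", "foaf:Person")],
        "result": ["Alice", "Bob"],       # previously computed answer
    },
}

def on_update(changed_triple):
    """Drop every cached query whose graph pattern the update touches."""
    for key, entry in list(cache.items()):
        if any(pattern_matches(pat, changed_triple) for pat in entry["patterns"]):
            del cache[key]
            print(f"invalidated cache entry: {key}")

on_update(("ex:carol", "foaf:name", "Carol"))   # hits the first pattern
print(cache)                                     # {}
```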

Journal ArticleDOI
TL;DR: A schema theory for SLN, including the concepts, rule-constraint normal forms and relevant algorithms, is proposed, which provides the basis for normalized management of SLN and its applications.

Proceedings Article
11 Jul 2010
TL;DR: A generic framework for representing and reasoning with annotated Semantic Web data, formalise the annotated language, the corresponding deductive system, and address the query answering problem is described.
Abstract: We describe a generic framework for representing and reasoning with annotated Semantic Web data, formalise the annotated language, the corresponding deductive system, and address the query answering problem. We extend previous contributions on RDF annotations by providing a unified reasoning formalism and allowing the seamless combination of different annotation domains. We demonstrate the feasibility of our method by instantiating it on (i) temporal RDF; (ii) fuzzy RDF; and (iii) their combination. A prototype shows that implementing and combining new domains is easy and that RDF stores can easily be extended to our framework.
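
A toy sketch of the annotated-triple idea for the temporal domain: each triple carries a validity interval, and combining two triples during reasoning intersects their intervals (the domain's meet operation). The triples and intervals below are invented.

```python
# Annotated RDF, temporal domain: triples carry [from, to] validity
# intervals; joining triples intersects their annotations.
def meet(iv1, iv2):
    """Intersection of two (start, end) validity intervals, or None."""
    lo, hi = max(iv1[0], iv2[0]), min(iv1[1], iv2[1])
    return (lo, hi) if lo <= hi else None

# (subject, predicate, object) annotated with (valid_from, valid_to).
t1 = (("ex:companyA", "ex:basedIn", "ex:regionA"), (2000, 2008))
t2 = (("ex:regionA", "ex:partOf", "ex:countryB"), (2005, 2010))

# A derived fact (companyA based in countryB) is only valid where both
# input annotations overlap.
print(meet(t1[1], t2[1]))    # (2005, 2008)
```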

Proceedings ArticleDOI
19 Jul 2010
TL;DR: A temporal web link-based ranking scheme, which incorporates features from historical author activities, based on a temporal web graph composed of multiple web snapshots at different time points, which improves upon PageRank in both relevance and freshness of the search results.
Abstract: The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obviously significant factor for ranking. However, traditional link-based web ranking algorithms typically run on a single web snapshot without concern for user activities associated with the dynamics of web pages and links. Therefore, a stale page popular many years ago may still achieve a high authority score due to its accumulated in-links. To remedy this situation, we propose a temporal web link-based ranking scheme, which incorporates features from historical author activities. We quantify web page freshness over time from page and in-link activity, and design a web surfer model that incorporates web freshness, based on a temporal web graph composed of multiple web snapshots at different time points. It includes authority propagation among snapshots, enabling link structures at distinct time points to influence each other when estimating web page authority. Experiments on a real-world archival web corpus show our approach improves upon PageRank in both relevance and freshness of the search results.
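
The freshness idea can be sketched as a PageRank variant whose teleport distribution is biased toward recently updated pages (an illustration, not the paper's surfer model; the graph and freshness scores are invented):

```python
# PageRank with a freshness-biased teleport distribution, so stale pages
# with many old in-links lose some authority to fresher pages.
import numpy as np

# Column-stochastic adjacency: A[j, i] = 1/outdeg(i) if page i links to j.
A = np.array([
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
])
freshness = np.array([0.1, 0.1, 0.8])      # hypothetical per-page recency
teleport = freshness / freshness.sum()     # fresh pages get more restarts

def freshness_pagerank(A, teleport, damping=0.85, iters=100):
    r = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        r = damping * A @ r + (1 - damping) * teleport
    return r

print(freshness_pagerank(A, teleport))
```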