
Showing papers in "International Journal on Semantic Web and Information Systems in 2009"


Journal ArticleDOI
TL;DR: The authors describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Abstract: The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions— the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
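The publishing principles behind Linked Data (HTTP URIs as names, dereferenceable URIs, RDF returned on lookup, links to other URIs) can be sketched in a few lines. The snippet below is an illustrative assumption, not code from the article: it builds a request that asks a server for RDF via content negotiation, and the DBpedia URI is merely an example of a dereferenceable identifier.

```python
from urllib.request import Request

def rdf_request(uri):
    """Build an HTTP request that dereferences a Linked Data URI,
    using content negotiation to ask the server for RDF rather
    than HTML."""
    return Request(uri, headers={"Accept": "text/turtle, application/rdf+xml"})

# Example: dereference a DBpedia resource URI (illustrative only;
# any HTTP URI that identifies a thing works the same way).
req = rdf_request("http://dbpedia.org/resource/Berlin")
```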

5,113 citations


Journal ArticleDOI
TL;DR: The Berlin SPARQL Benchmark (BSBM) as mentioned in this paper is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products.
Abstract: The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open Web settings. As SPARQL is taken up by the community, there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores as well as systems that rewrite SPARQL queries to SQL queries against non-RDF relational databases. This article introduces the Berlin SPARQL Benchmark (BSBM) for comparing the performance of native RDF stores with the performance of SPARQL-to-SQL rewriters across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix emulates the search and navigation pattern of a consumer looking for a product. The article discusses the design of the BSBM benchmark and presents the results of a benchmark experiment comparing the performance of four popular RDF stores (Sesame, Virtuoso, Jena TDB, and Jena SDB) with the performance of two SPARQL-to-SQL rewriters (D2R Server and Virtuoso RDF Views) as well as the performance of two relational database management systems (MySQL and Virtuoso RDBMS).
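The benchmark idea, repeatedly running a consumer-style product search and timing it, can be sketched against a toy in-memory triple store. Everything below (the triples, the `match` helper, the query-mix loop) is an invented stand-in for a real SPARQL endpoint, not part of BSBM itself.

```python
import time

# Toy in-memory triple store, standing in for a SPARQL endpoint.
triples = {
    ("ex:p1", "rdf:type", "ex:Product"),
    ("ex:p1", "ex:label", "red touring bicycle"),
    ("ex:p2", "rdf:type", "ex:Vendor"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a basic graph pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# A BSBM-style "query mix" repeats a search/navigation pattern and
# reports the elapsed time as the benchmark metric.
start = time.perf_counter()
for _ in range(1000):
    products = [s for s, _, _ in match(p="rdf:type", o="ex:Product")]
elapsed = time.perf_counter() - start
```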

634 citations


Journal ArticleDOI
TL;DR: This work presents Falcons Object Search, a keyword-based search engine for linked objects, which constructs a comprehensive virtual document including not only associated literals but also the textual descriptions of associated links and linked objects.
Abstract: Along with the rapid growth of the data Web, searching linked objects to satisfy information needs and to reuse data is becoming important for ordinary Web users and developers, respectively. To meet this challenge, we present Falcons Object Search, a keyword-based search engine for linked objects. To serve various keyword queries, the system constructs for each object a comprehensive virtual document that includes not only its associated literals but also the textual descriptions of its associated links and linked objects. The resulting objects are ranked by considering both their relevance to the query and their popularity. For each resulting object, a query-relevant structured snippet is provided to show the associated literals and linked objects matched with the query. In addition, Web-scale class-inclusion reasoning is performed to discover implicit typing information, and users can navigate class hierarchies for incremental class-based results filtering. The results of a task-based experiment show the promising features of the system.
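A minimal sketch of the two ideas in the abstract, with made-up field names: a "virtual document" is assembled from an object's own literals plus the text of its links and neighbours, and results are scored by query relevance combined with popularity.

```python
def virtual_document(obj):
    """Concatenate an object's own literals with the text of its
    links and of the objects it links to."""
    parts = list(obj["literals"])
    for link_text, neighbour in obj["links"]:
        parts.append(link_text)
        parts.extend(neighbour["literals"])
    return " ".join(parts)

def score(obj, query_terms):
    """Rank by relevance (term frequency in the virtual document)
    combined with a precomputed popularity value."""
    words = virtual_document(obj).lower().split()
    relevance = sum(words.count(t) for t in query_terms)
    return relevance * obj["popularity"]

# Toy linked object; the structure is an assumption for this sketch.
person = {
    "literals": ["Tim", "researcher"],
    "links": [("works at", {"literals": ["W3C"]})],
    "popularity": 3,
}
```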

139 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the challenges of reasoning on large scale RDF datasets from the Web and present a rule-based framework for application to web data: they argue their decisions using observations of undesirable examples taken directly from the web.
Abstract: In this article the authors discuss the challenges of performing reasoning on large-scale RDF datasets from the Web. Using ter Horst's pD* fragment of OWL as a base, the authors compose a rule-based framework for application to Web data; they justify their decisions using observations of undesirable examples taken directly from the Web. The authors further temper their OWL fragment through consideration of "authoritative sources", which counteracts an observed behaviour they term "ontology hijacking": new ontologies published on the Web re-defining the semantics of existing entities resident in other ontologies. They then present their system for performing rule-based forward-chaining reasoning, which they call SAOR: Scalable Authoritative OWL Reasoner. Based upon observed characteristics of Web data and reasoning in general, they design their system to scale: the system is based upon a separation of terminological data from assertional data and comprises a lightweight in-memory index, on-disk sorts, and file scans. The authors evaluate their methods on a dataset on the order of a hundred million statements collected from real-world Web sources and present scale-up experiments on a dataset on the order of a billion statements collected from the Web.
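The "authoritative sources" idea can be sketched as a guard on a forward-chaining loop: a subclass axiom only fires when its source owns the term it redefines. The prefixed names and the ownership check below are simplifications invented for illustration, not SAOR's actual implementation.

```python
def authoritative(axiom):
    """An axiom is authoritative when its source namespace owns the
    term whose semantics it redefines (simplified check)."""
    return axiom["term"].startswith(axiom["source"])

def forward_chain(abox, subclass_axioms):
    """Apply rdfs:subClassOf-style rules to a fixpoint, skipping
    non-authoritative axioms ("ontology hijacking")."""
    inferred = set(abox)
    changed = True
    while changed:
        changed = False
        for ax in subclass_axioms:
            if not authoritative(ax):
                continue
            for s, p, o in list(inferred):
                if p == "rdf:type" and o == ax["sub"]:
                    triple = (s, "rdf:type", ax["sup"])
                    if triple not in inferred:
                        inferred.add(triple)
                        changed = True
    return inferred

abox = {("ex:alice", "rdf:type", "ex:Student")}
axioms = [
    {"term": "ex:Student", "source": "ex:", "sub": "ex:Student", "sup": "ex:Person"},
    # A "hijacking" axiom: another source redefining ex:Person; it is ignored.
    {"term": "ex:Person", "source": "evil:", "sub": "ex:Person", "sup": "evil:Robot"},
]
result = forward_chain(abox, axioms)
```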

85 citations


Journal ArticleDOI
TL;DR: The authors present an approach for obtaining complex class descriptions from objects in knowledge bases by using Machine Learning techniques and describe in detail how to leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data.
Abstract: The vision of the Semantic Web is to make use of semantic representations on the largest possible scale - the Web. Large knowledge bases such as DBpedia, OpenCyc, GovTrack, and others are emerging and are freely available as Linked Data and SPARQL endpoints. Exploring and analysing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, the authors present an approach for obtaining complex class descriptions from objects in knowledge bases by using Machine Learning techniques. They describe in detail how they leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. Their algorithms are made available in the open-source DL-Learner project, and they present several real-life scenarios in which the algorithms can be used by Semantic Web applications.
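One way such a learner can rank candidate class descriptions, given here as an assumption for illustration rather than DL-Learner's exact algorithm, is to compare each candidate's instance set against positive and negative example objects using the F1 measure. The candidate expressions and instance sets below are invented.

```python
def f1_score(retrieved, positives, negatives):
    """F1 of a candidate class description, treating the objects it
    covers as retrieved results against labelled examples."""
    tp = len(retrieved & positives)
    fp = len(retrieved & negatives)
    fn = len(positives - retrieved)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical class expressions and the objects they cover.
candidates = {
    "City": {"berlin", "paris", "texas"},
    "City and capitalOf some Country": {"berlin", "paris"},
}
positives = {"berlin", "paris"}
negatives = {"texas"}
best = max(candidates, key=lambda c: f1_score(candidates[c], positives, negatives))
```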

74 citations


Book ChapterDOI
TL;DR: In this paper, the authors introduce an ontology to represent ideas, which provides a common language to foster interoperability between tools and to support the idea life cycle through the use of semantic reasoning and automatic analysis.
Abstract: Exchanging and analyzing ideas across different software tools and repositories is needed to implement the concepts of open innovation and holistic innovation management. However, a precise and formal definition for the concept of an idea is hard to obtain. In this paper, the authors introduce an ontology to represent ideas. This ontology provides a common language to foster interoperability between tools and to support the idea life cycle. Through the use of an ontology, additional benefits like semantic reasoning and automatic analysis become available. The proposed ontology captures both a core idea concept that covers the "heart of the idea" and further concepts to support collaborative idea development, including rating, discussing, tagging, and grouping ideas. This modular approach allows the idea ontology to be complemented by additional concepts like customized evaluation methods. The authors present a case study that demonstrates how the ontology can be used to achieve interoperability between innovation tools and to answer questions relevant for innovation managers that demonstrate the advantages of semantic reasoning.

46 citations


Journal ArticleDOI
TL;DR: This article presents MOAT, which helps solve the problems mentioned previously and weaves user-generated content into the Web of Data, making it more efficiently interoperable and retrievable.
Abstract: The work presented in this article has been funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a mediation infrastructure capable of resolving semantic interoperability conflicts at a pan-European level and provide several examples to illustrate both the need to solve such semantic conflicts and the actual solutions proposed.
Abstract: Interoperability is one of the most challenging problems in modern cross-organizational information systems, which rely on heterogeneous information and process models. Interoperability becomes very important for e-Government information systems that support cross-organizational communication, especially in a cross-border setting. The main goal in this context is to seamlessly provide integrated services to the user (citizen). In this paper we focus on Pan-European e-Services and issues related to their integration. Our analysis uses basic concepts of the generic public service model of the Governance Enterprise Architecture (GEA) and of the Web Service Modeling Ontology (WSMO) to express the semantic description of the e-services. Based on the above, we present a mediation infrastructure capable of resolving semantic interoperability conflicts at a pan-European level. We provide several examples to illustrate both the need to solve such semantic conflicts and the actual solutions we propose.

19 citations


Journal ArticleDOI
TL;DR: The mTableaux algorithm is developed which optimizes the reasoning process to facilitate service selection and improves the performance and scalability of semantic reasoning for mobile devices.
Abstract: With the emergence of high-end smart phones/PDAs there is a growing opportunity to enrich mobile/pervasive services with semantic reasoning. This article presents novel strategies for optimising semantic reasoning for realising semantic applications and services on mobile devices. We have developed the mTableaux algorithm which optimises the reasoning process to facilitate service selection. We present comparative experimental results which show that mTableaux improves the performance and scalability of semantic reasoning for mobile devices.

19 citations


Journal ArticleDOI
TL;DR: In this paper, the concept of unknown word (UW) is defined and a method is proposed to construct a lexical dictionary of unknown words from various document collections scattered on the Web.
Abstract: This article deals with research that automatically constructs a lexical dictionary of unknown words. Lexical dictionaries have been usefully applied to various fields for semantic information processing, but they are limited to processing terms already defined in the dictionary. Under this circumstance, the concept of "Unknown Word (UW)" is defined; a UW, in this research, is a word not defined in WordNet. A new method is proposed to construct a UW lexical dictionary by taking as input various document collections scattered on the Web. We identify terms related to a UW and measure the semantic relatedness (similarity) between the UW and each related term. The relatedness is obtained by calculating both a probabilistic relationship and a semantic relationship. This research can extend the UW lexical dictionary with an abundant number of UWs. It also lays a foundation for semantic retrieval by using the UW lexical dictionary and WordNet together.
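The abstract combines a probabilistic relationship with a semantic relationship. As a hedged illustration of the probabilistic side (the paper's exact formula is not given in the abstract), pointwise mutual information over document counts is one standard way to score how strongly an unknown word and a candidate related term co-occur:

```python
import math

def pmi(cooccur, count_a, count_b, total):
    """Pointwise mutual information between an unknown word and a
    candidate related term, from document counts: how much more
    often the two terms co-occur than chance would predict."""
    p_ab = cooccur / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_ab / (p_a * p_b))
```

A score near zero means the terms co-occur about as often as independence would predict; strongly positive scores suggest a genuine relation.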

14 citations


Journal ArticleDOI
TL;DR: This paper proposes an original method based on a non-parametric learning scheme: the Reduced Coulomb Energy (RCE) Network, and shows that new knowledge is induced and the likelihood of the answers may be provided.
Abstract: The tasks of resource classification and retrieval from knowledge bases in the Semantic Web are the basis for a lot of important applications. In order to overcome the limitations of purely deductive approaches to deal with these tasks, inductive (instance-based) methods have been introduced as efficient and noise-tolerant alternatives. In this paper we propose an original method based on a non-parametric learning scheme: the Reduced Coulomb Energy (RCE) Network. The method requires a limited training effort but turns out to be very effective during the classification phase. Casting retrieval as the problem of assessing the class-membership of individuals with respect to the query concepts, we propose an extension of a classification algorithm using RCE networks based on an entropic similarity measure for OWL. Experimentally we show that the performance of the resulting inductive classifier is comparable with that of a standard reasoner and often more efficient than other inductive approaches. Moreover, we show that new knowledge (not logically derivable) is induced and the likelihood of the answers may be provided.
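The Reduced Coulomb Energy scheme itself is simple to sketch: each training sample becomes a prototype with a radius shrunk so it excludes the nearest sample of another class, and a query point is classified by the prototypes that cover it. The 1-D version below is a toy illustration, not the paper's entropic OWL variant.

```python
def train_rce(samples, max_radius):
    """Reduced Coulomb Energy network (1-D sketch): every training
    sample becomes a prototype whose radius is shrunk so that it
    excludes the nearest sample of a different class."""
    prototypes = []
    for x, label in samples:
        nearest_other = min(
            (abs(x - y) for y, other in samples if other != label),
            default=max_radius,
        )
        prototypes.append((x, label, min(max_radius, nearest_other)))
    return prototypes

def classify(prototypes, x):
    """Return the class whose prototypes cover x, or None when
    coverage is empty or ambiguous (the 'unknown' answer an
    inductive classifier can report)."""
    labels = {label for p, label, r in prototypes if abs(x - p) < r}
    return labels.pop() if len(labels) == 1 else None

prototypes = train_rce([(0.0, "A"), (1.0, "A"), (5.0, "B")], max_radius=2.0)
```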

Journal ArticleDOI
TL;DR: This work proposes an approach enabling people to share various data through an easy-to-use social platform; conceptualizations are consolidated using community-supported semi-automatic schema alignment techniques, and informal lightweight ontologies emerge gradually.
Abstract: User-generated content can help the growth of linked data. However, we lack interfaces enabling ordinary people to author linked data; people also have multiple perspectives on the same concept and different contexts; and not enough ontologies exist to model various data. Therefore, we propose an approach to enable people to share various data through an easy-to-use social platform. Users define their own concepts and multiple conceptualizations are allowed. These are consolidated using semi-automatic schema alignment techniques supported by the community. Further, concepts are grouped semi-automatically by similarity. As a result of consolidation and grouping, informal lightweight ontologies emerge gradually. We have implemented social software, called StYLiD, to realize our approach. It can serve as a platform motivating people to bookmark and share different things. It may also drive vertical portals for specific communities with integrated data from multiple sources. Experimental observations support the validity of our approach.
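The semi-automatic grouping step can be sketched with a simple attribute-overlap heuristic. The Jaccard threshold and the toy concepts below are assumptions for illustration; StYLiD's actual alignment techniques are richer and community-supported.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets."""
    return len(a & b) / len(a | b)

def group_concepts(concepts, threshold=0.5):
    """Greedy grouping of user-defined concepts by attribute overlap,
    a stand-in for semi-automatic consolidation."""
    groups = []
    for name, attrs in concepts.items():
        for group in groups:
            if any(jaccard(attrs, concepts[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Toy user-defined concepts with their attribute names.
concepts = {
    "Car": {"make", "model", "year"},
    "Automobile": {"make", "model", "color"},
    "Recipe": {"ingredients", "steps"},
}
groups = group_concepts(concepts)
```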

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of corpus-based and structural approaches to determine semantic relatedness in ontologies and found that structural measures proposed by Wu and Palmer, and Leacock and Chodorow have superior performance when cut-off values are used.
Abstract: In this paper, the authors compare the performance of corpus-based and structural approaches to determine semantic relatedness in ontologies. A large light-weight ontology and a news corpus are used as materials. The results show that structural measures proposed by Wu and Palmer, and Leacock and Chodorow have superior performance when cut-off values are used. The corpus-based method Latent Semantic Analysis is found more accurate on specific rank levels. In further investigation, the approximation of structural measures and Latent Semantic Analysis show a low level of overlap and the methods are found to approximate different types of relations. The results suggest that a combination of corpus-based methods and structural methods should be used and appropriate cut-off values should be selected according to the intended use case.
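Both structural measures have compact closed forms: Wu-Palmer is twice the depth of the least common subsumer divided by the summed concept depths, and Leacock-Chodorow is the negative log of the shortest path length scaled by twice the taxonomy depth. The depth and path values below are toy inputs for illustration; in practice they come from a full ontology.

```python
import math

def wu_palmer(depth_a, depth_b, depth_lcs):
    """Wu-Palmer similarity from node depths:
    2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth_lcs / (depth_a + depth_b)

def leacock_chodorow(path_length, max_depth):
    """Leacock-Chodorow similarity:
    -log(path_length / (2 * max_depth))."""
    return -math.log(path_length / (2 * max_depth))
```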

Journal ArticleDOI
TL;DR: If the cluster-based WSD method generates several contradictory interpretations for one ambiguous query, the method extracts users' preferences from clickthrough data and determines the concepts or concept clusters that meet users' interests for explaining the ambiguous query.
Abstract: For most Web search applications, queries are commonly ambiguous because words usually carry several meanings. Traditional Word Sense Disambiguation (WSD) methods use statistical models or ontology-based knowledge models to find the most appropriate sense for an ambiguous word. Since queries are usually short, their contexts may not always provide enough information for disambiguation, so more than one interpretation may be found for one ambiguous query. In this paper, we propose a cluster-based WSD method, which finds all appropriate interpretations for the query. Because some senses of one ambiguous word usually have very close semantic relations, we group those similar senses together to explain the ambiguous word in one interpretation. If the cluster-based WSD method generates several contradictory interpretations for one ambiguous query, we extract users' preferences from clickthrough data and determine the concepts or concept clusters that meet users' interests for explaining the ambiguous query.
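The clustering step can be sketched as a greedy grouping of senses whose pairwise similarity exceeds a threshold, so each cluster stands for one interpretation. The sense names and similarity values below are made up for illustration; the paper's similarity measure and the clickthrough-based selection are more involved.

```python
def cluster_senses(senses, similarity, threshold=0.6):
    """Greedily merge senses into interpretation clusters: a sense
    joins a cluster only if it is similar enough to every member."""
    clusters = []
    for sense in senses:
        for cluster in clusters:
            if all(similarity[frozenset((sense, s))] >= threshold
                   for s in cluster):
                cluster.append(sense)
                break
        else:
            clusters.append([sense])
    return clusters

# Toy senses of the ambiguous query "java" with invented similarities.
senses = ["java_island", "java_coffee", "java_language"]
similarity = {
    frozenset(("java_island", "java_coffee")): 0.7,
    frozenset(("java_island", "java_language")): 0.1,
    frozenset(("java_coffee", "java_language")): 0.2,
}
clusters = cluster_senses(senses, similarity)
```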