
Showing papers on "XML published in 2009"


Book ChapterDOI
Gabriella Kazai1
01 Jan 2009
TL;DR: The INEX initiative organises an international, coordinated effort to promote evaluation procedures for content-based XML retrieval, providing an opportunity for participants to evaluate their retrieval methods using uniform scoring procedures and a forum for participating organisations to compare their results.
Abstract: The widespread use of XML prompted the development of appropriate searching and browsing methods for XML documents. This explosion of XML retrieval tools requires the development of appropriate testbeds and evaluation methods. As part of a large-scale effort to improve the efficiency of research in information retrieval and digital libraries, the INEX initiative organises an international, coordinated effort to promote evaluation procedures for content-based XML retrieval. The project provides an opportunity for participants to evaluate their retrieval methods using uniform scoring procedures and a forum for participating organisations to compare their results.

415 citations


Journal ArticleDOI
TL;DR: The NEURON simulation program now allows Python to be used, alone or in combination with NEURON's traditional Hoc interpreter, and the use of the XML module in implementing NEURON's Import3D and CellBuild tools to read MorphML and NeuroML model specifications.
Abstract: The NEURON simulation program now allows Python to be used, alone or in combination with NEURON's traditional Hoc interpreter. Adding Python to NEURON has the immediate benefit of making available a very extensive suite of analysis tools written for engineering and science. It also catalyzes NEURON software development by offering users a modern programming tool that is recognized for its flexibility and power to create and maintain complex programs. At the same time, nothing is lost because all existing models written in Hoc, including GUI tools, continue to work without change and are also available within the Python context. An example of the benefits of Python availability is the use of the XML module in implementing NEURON's Import3D and CellBuild tools to read MorphML and NeuroML model specifications.

380 citations


Proceedings Article
01 Jan 2009
TL;DR: This paper compares two data interchange formats currently used by industry applications, XML and JSON, and finds that JSON is significantly faster than XML; other resource-related metrics are also recorded in the results.
Abstract: This paper compares two data interchange formats currently used by industry applications; XML and JSON. The choice of an adequate data interchange format can have significant consequences on data transmission rates and performance. We describe the language specifications and their respective setting of use. A case study is then conducted to compare the resource utilization and the relative performance of applications that use the interchange formats. We find that JSON is significantly faster than XML and we further record other resource-related metrics in our results.
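The setup of such a comparison can be sketched in a few lines: encode the same records in both formats and measure what goes over the wire. This is only a toy illustration of the kind of measurement the paper performs, not its actual benchmark or workload.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record set encoded in both formats for comparison.
records = [{"id": i, "name": "item%d" % i} for i in range(1000)]

json_text = json.dumps(records)

root = ET.Element("records")
for r in records:
    item = ET.SubElement(root, "record", id=str(r["id"]))
    item.text = r["name"]
xml_text = ET.tostring(root, encoding="unicode")

# XML repeats tag names for every record, so the same payload is larger.
print(len(xml_text) > len(json_text))
```

Payload size is only one of the metrics involved; parse time and memory use follow the same measure-both-sides pattern.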

304 citations


Book ChapterDOI
04 Jul 2009
TL;DR: This chapter presents algorithms for both schema mapping creation via query discovery, and for query generation for data exchange that can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.
Abstract: The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both schema mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.

243 citations


Proceedings ArticleDOI
16 May 2009
TL;DR: This work unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs), and proposes a general approach to the composition of software artifacts written in different languages.
Abstract: Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general-purpose concept, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages. Furthermore, we offer a supporting framework and tool chain, called FEATUREHOUSE. We use attribute grammars to automate the integration of additional languages; in particular, we have integrated Java, C#, C, Haskell, JavaCC, and XML. Several case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties a language must have in order to be ready for superimposition.

231 citations


Proceedings ArticleDOI
29 Mar 2009
TL;DR: This paper designs novel formulae to identify the search for nodes and search via nodes of a query, and presents a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions.
Abstract: Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The difference between text database and XML database results in three new challenges: (1) Identify the user search intention, i.e. identify the XML node types that the user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate their relevance to a given query. However, existing methods cannot resolve these challenges and thus return low result quality in terms of query relevance. In this paper, we propose an IR-style approach which basically utilizes the statistics of underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results. Then based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.
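The core intuition of statistics-based intention identification can be sketched with a plain TF*IDF computed per node type: a keyword that occurs often under one tag and never under another is evidence for which node type to search via. This toy sketch only illustrates the idea; the paper's actual XML TF*IDF formulae are more involved.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Toy XML collection with two candidate "search via" node types.
doc = ET.fromstring(
    "<bib>"
    "<book><title>xml retrieval</title><author>smith</author></book>"
    "<book><title>xml keyword search</title><author>jones</author></book>"
    "<article><title>database systems</title><author>smith</author></article>"
    "</bib>")

def tfidf(keyword, tag):
    # Score a keyword against all nodes of a given type (tag).
    texts = [" ".join(n.itertext()).split() for n in doc.iter(tag)]
    df = sum(1 for t in texts if keyword in t)   # nodes containing the keyword
    if df == 0:
        return 0.0
    tf = sum(Counter(t)[keyword] for t in texts) / len(texts)
    return tf * math.log(1 + len(texts) / df)

# "xml" scores under <title> but never occurs under <author>.
print(tfidf("xml", "title") > tfidf("xml", "author"))
```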

187 citations


Proceedings ArticleDOI
29 Jun 2009
TL;DR: An overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation are given.
Abstract: Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon keyword search, such as keyword based database selection, query generation, and analytical processing. Finally we identify the challenges and opportunities of future research to advance the field.

168 citations


Patent
01 May 2009
TL;DR: In this article, the authors propose a method and system for generation and playback of supplemented videos which include interactive features such as hotspots that allow a video viewer to interact with the video when the video viewer sees an object.
Abstract: The present invention is a method and system for generation and playback of supplemented videos which include interactive features. The supplemented video includes hotspots that allow a video viewer to interact with the video when the video viewer sees an object. The hotspots can be manually defined. Information regarding the object and the hotspot can be stored in a separate XML file. Furthermore, the present invention can be a marketplace where a desired object can be found by searching the XML file. The search results can list the supplemented videos which contain hotspots corresponding to the object and also the time in the supplemented video in which the object is found. The present invention can also aggregate data about the objects based on the playback of the supplemented videos and the video viewer's interaction with the supplemented videos.

148 citations


Book ChapterDOI
07 Dec 2009
TL;DR: The XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as underlying collection are described; and the approaches adopted by the participants are summarized.
Abstract: In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix", or "German spoken Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as underlying collection; and summarizes the approaches adopted by the participants.

147 citations


Proceedings ArticleDOI
06 Jul 2009
TL;DR: The verification steps required to effectively validate an incoming SOAP request are discussed and a practical guideline for achieving a robust and effective SOAP message security validation mechanism is provided.
Abstract: The service-oriented architecture paradigm is influencing modern software systems remarkably and Web Services are a common technology to implement such systems. However, the numerous Web Service standard specifications and especially their ambiguity result in a high complexity which opens the door for security-critical mistakes. This paper aims at raising awareness of this issue while discussing a vulnerability in Amazon’s Elastic Compute Cloud (EC2) services to XML wrapping attacks, which has since been resolved as a result of our findings and disclosure. More importantly, this paper discusses the verification steps required to effectively validate an incoming SOAP request. It reviews the available work in the light of the discovered Amazon EC2 vulnerability and provides a practical guideline for achieving a robust and effective SOAP message security validation mechanism.
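The essence of an XML wrapping (signature wrapping) attack is that the element the signature refers to and the element the application actually processes can diverge. A minimal sketch of the corresponding check, with namespaces and real signature verification omitted and all tag names purely illustrative:

```python
import xml.etree.ElementTree as ET

# Simplified SOAP envelope; the Signature's reference names the signed Body.
envelope = ET.fromstring(
    "<Envelope>"
    "<Header><Signature reference='body-1'/></Header>"
    "<Body id='body-1'><RunInstances count='1'/></Body>"
    "</Envelope>")

signed_id = envelope.find(".//Signature").get("reference")
processed = envelope.find("Body")  # the element the application will act on

# After a wrapping attack, the processed Body carries a different id than
# the signed element, so this check would fail.
print(processed.get("id") == signed_id)
```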

134 citations


Proceedings ArticleDOI
16 Nov 2009
TL;DR: This work combines static analysis with model checking to mine Computation Tree Logic formulas that describe the operations a parameter goes through: "In parseProperties(String xml), the parameter xml normally stems from getProperties()."
Abstract: A caller must satisfy the callee's precondition--that is, reach a state in which the callee may be called. Preconditions describe the state that needs to be reached, but not how to reach it. We combine static analysis with model checking to mine Computation Tree Logic (CTL) formulas that describe the operations a parameter goes through: "In parseProperties(String xml), the parameter xml normally stems from getProperties()." Such operational preconditions can be learned from program code, and the code can be checked for their violations. Applied to AspectJ, our Tikanga prototype found 189 violations of operational preconditions, uncovering 9 unique defects and 36 unique code smells---with 44% true positives in the 50 top-ranked violations.

Proceedings ArticleDOI
01 Apr 2009
TL;DR: An early-stage implementation of WS-TAXI is presented, obtained by the integration of two existing software tools: soapUI, a popular tool for WS testing, and TAXI, an application previously developed for the automated derivation of XML instances from an XML schema.
Abstract: Web Services (WSs) are the W3C-endorsed realization of the Service-Oriented Architecture (SOA). Since they are supposed to be implementation-neutral, WSs are typically tested black-box at their interface. Such an interface is generally specified in an XML-based notation called the WS Description Language (WSDL). Conceptually, these WSDL documents are eligible for fully automated WS test generation using syntax-based testing approaches. Towards this goal, we introduce the WS-TAXI framework, in which we combine the coverage of WS operations with data-driven test generation. In this paper we present an early-stage implementation of WS-TAXI, obtained by the integration of two existing software tools: soapUI, a popular tool for WS testing, and TAXI, an application we have previously developed for the automated derivation of XML instances from an XML schema. WS-TAXI delivers a complete suite of test messages ready for execution. Test generation is driven by basic coverage criteria and by the application of some heuristics. The application of WS-TAXI to a real case study gave encouraging results.

Journal ArticleDOI
TL;DR: This survey gives an overview of formal results on the XML query language XPath and its fragments compared to other formalisms for querying trees, algorithms, and complexity bounds for evaluation of XPath queries, as well as static analysis of XPath queries.
Abstract: This survey gives an overview of formal results on the XML query language XPath. We identify several important fragments of XPath, focusing on subsets of XPath 1.0. We then give results on the expressiveness of XPath and its fragments compared to other formalisms for querying trees, algorithms, and complexity bounds for evaluation of XPath queries, as well as static analysis of XPath queries.
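The fragments the survey studies are built from location paths over axes such as child and descendant. Python's `xml.etree.ElementTree` implements a small XPath subset, which is enough to illustrate those two axes:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c>1</c></b><b><c>2</c><d>3</d></b></a>")

# ".//c" walks the descendant axis: every <c> anywhere below the root.
print([e.text for e in doc.findall(".//c")])

# "./b/d" is a sequence of child-axis steps: <d> children of <b> children.
print([e.text for e in doc.findall("./b/d")])
```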

Proceedings ArticleDOI
21 May 2009
TL;DR: A program called |fiwalk| is developed which produces detailed XML describing all of the partitions and files on a hard drive or disk image, as well as any extractable metadata from the document files themselves.
Abstract: We have developed a program called |fiwalk| which produces detailed XML describing all of the partitions and files on a hard drive or disk image, as well as any extractable metadata from the document files themselves. We show how it is relatively simple to create automated disk forensic applications using a Python module we have written that reads |fiwalk|'s XML files. Finally, we present three applications using this system: a program to generate maps of disk images; an image redaction program; and a data transfer kiosk which uses forensic tools to allow the migration of data from portable storage devices without risk of infection from hostile software that the portable device may contain.
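Reading such a report from Python is indeed simple; the sketch below parses a minimal |fiwalk|-style fragment with the standard library. The element names here are illustrative and are not guaranteed to match the exact schema |fiwalk| emits.

```python
import xml.etree.ElementTree as ET

# A minimal fiwalk-style report fragment (tag names are illustrative).
report = """<dfxml>
  <volume offset="512">
    <fileobject><filename>readme.txt</filename><filesize>120</filesize></fileobject>
    <fileobject><filename>notes.doc</filename><filesize>2048</filesize></fileobject>
  </volume>
</dfxml>"""

root = ET.fromstring(report)
# Collect (name, size) for every file object, regardless of volume.
files = [(f.findtext("filename"), int(f.findtext("filesize")))
         for f in root.iter("fileobject")]
print(files)
```

A disk-map generator or redaction tool only needs to iterate this list rather than reimplement filesystem parsing.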

Proceedings ArticleDOI
26 Jul 2009
TL;DR: This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents, with strong emphasis on comprehensive and detailed representation of both complex and simple layouts, and on colour originals.
Abstract: There is a significant need for a realistic dataset on which to evaluate layout analysis methods and examine their performance in detail. This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents. Strong emphasis is placed on comprehensive and detailed representation of both complex and simple layouts, and on colour originals. In-depth information is recorded both at the page and region level. Ground truth is efficiently created using a new semi-automated tool and stored in a new comprehensive XML representation, the PAGE format. The dataset can be browsed and searched via a web-based front end to the underlying database and suitable subsets (relevant to specific evaluation goals) can be selected and downloaded.

Proceedings ArticleDOI
29 Jun 2009
TL;DR: It is concluded that the ideal database for SaaS has not yet been developed and some suggestions as to how it should be designed are offered.
Abstract: A multi-tenant database system for Software as a Service (SaaS) should offer schemas that are flexible in that they can be extended by different versions of the application and dynamically modified while the system is on-line. This paper presents an experimental comparison of five techniques for implementing flexible schemas for SaaS. In three of these techniques, the database "owns" the schema in that its structure is explicitly defined in DDL. Included here is the commonly-used mapping where each tenant is given their own private tables, which we take as the baseline, and a mapping that employs Sparse Columns in Microsoft SQL Server. These techniques perform well, however they offer only limited support for schema evolution in the presence of existing data. Moreover, they do not scale beyond a certain level. In the other two techniques, the application "owns" the schema in that it is mapped into generic structures in the database. Included here are XML in DB2 and Pivot Tables in HBase. These techniques give the application complete control over schema evolution, however they can produce a significant decrease in performance. We conclude that the ideal database for SaaS has not yet been developed and offer some suggestions as to how it should be designed.
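The "application owns the schema" idea can be sketched with a generic pivot-style storage: every extension field becomes a (tenant, row, column, value) entry rather than a DDL column, and logical rows are reconstructed at query time. The in-memory form below is only an illustration of the mapping, not of DB2's XML columns or HBase specifically.

```python
# Generic entries replacing per-tenant DDL columns.
rows = [
    ("tenant_a", 1, "name", "Acme"),
    ("tenant_a", 1, "vat_id", "DE123"),  # tenant-specific extension field
    ("tenant_b", 7, "name", "Globex"),
]

def reconstruct(tenant, row_id):
    # Rebuild a logical row by pivoting the generic entries back together.
    return {c: v for t, r, c, v in rows if t == tenant and r == row_id}

print(reconstruct("tenant_a", 1))
```

Schema evolution is trivial here (a new field is just a new entry), which is exactly the flexibility the paper finds these mappings buy at the cost of query performance.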

Book ChapterDOI
18 Apr 2009
TL;DR: This paper proposes the concept of a mapping probability, which maps each query word into a related field (or XML element); this probability is used as a weight to combine the language models estimated from each field.
Abstract: Retrieving semistructured (XML) data typically requires either a structured query such as XPath, or a keyword query that does not take structure into account. In this paper, we infer structural information automatically from keyword queries and incorporate this into a retrieval model. More specifically, we propose the concept of a mapping probability, which maps each query word into a related field (or XML element). This mapping probability is used as a weight to combine the language models estimated from each field. Experiments on two test collections show that our retrieval model based on mapping probabilities outperforms baseline techniques significantly.
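The weighting scheme can be sketched as a two-step mixture: estimate per-field language models, derive a mapping probability per query word, and combine. The field models and numbers below are invented for illustration, and the normalization is a deliberately simple stand-in for the paper's estimation method.

```python
# Invented unigram field language models for <title> and <body> elements.
field_lm = {
    "title": {"xml": 0.4, "retrieval": 0.3},
    "body":  {"xml": 0.2, "retrieval": 0.1},
}

def mapping_prob(word):
    # P(field | word): normalize the word's probability across fields.
    total = sum(lm.get(word, 0.0) for lm in field_lm.values())
    return {f: lm.get(word, 0.0) / total for f, lm in field_lm.items()}

def score(word):
    # Mixture of field models weighted by the mapping probability.
    m = mapping_prob(word)
    return sum(m[f] * field_lm[f].get(word, 0.0) for f in field_lm)

print(mapping_prob("xml"))
```

A word that is much more likely under <title> thus pulls the title model's weight up without the user specifying any structure.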

Patent
Eric Williamson1
29 May 2009
TL;DR: In this article, a modeling client can host modeling logic and an application programming interface (API) to create, access, manipulate, and import/export modeling objects used in modeling applications, such as engineering, medical, financial, and other modeling platforms.
Abstract: Embodiments relate to systems and methods for extracting a model object from a multi-dimensional source database. A modeling client can host modeling logic and an application programming interface (API) to create, access, manipulate, and import/export modeling objects used in modeling applications, such as engineering, medical, financial, and other modeling platforms. In aspects, the source data accepted into the modeling client can include consumer or business-level applications, whose database or other content can be extracted and encapsulated in object-oriented format, such as extensible markup language (XML) format. The resulting model object can be pivoted along selected dimensions, or otherwise manipulated. The modeling client can exchange one or more modeling object directly with external platforms, such as mainframe modeling platforms, via the application programming interface (API) on a programmatic basis. Costs and maintenance savings over mainframe-based modeling tools can thereby be achieved, while providing greater power than consumer or business-level tools.

Journal ArticleDOI
01 Oct 2009
TL;DR: The focus of the paper is on the expressive power of families of p-documents, namely, the ability to (efficiently) translate a given p-document of one family into another family and closure under updates.
Abstract: Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied. The first is the ability to (efficiently) translate a given p-document of one family into another family. The second is closure under updates, namely, the ability to (efficiently) represent the result of updating the instances of a p-document of a given family as another p-document of that family. For both issues, we distinguish two variants corresponding to value-based and object-based semantics of p-documents.
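A distributional node can be made concrete with a tiny example: a "mux" node chooses exactly one of its children with a given probability, so each choice yields one possible world. Mux is one of the node types studied in the probabilistic-XML literature; the tuple encoding below is invented purely for illustration.

```python
# A toy p-document: ("node", label) is an ordinary node, ("mux", [...])
# is a distributional node with (probability, child) alternatives.
pdoc = ("mux", [(0.7, ("node", "a")), (0.3, ("node", "b"))])

def worlds(tree):
    # Enumerate the possible worlds with their probabilities.
    if tree[0] == "node":
        return [(1.0, tree[1])]
    # mux: one world per alternative, weighted by that alternative's probability
    return [(p * q, w) for p, child in tree[1] for q, w in worlds(child)]

print(worlds(pdoc))
```

Expressiveness questions in the paper then amount to asking whether one family of node types can encode the same distribution over worlds as another.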

Proceedings ArticleDOI
29 Jun 2009
TL;DR: A novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents which can completely avoid re-labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches.
Abstract: Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not. In this paper, we propose a novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents. For static documents, the labels of DDE are the same as those of dewey which yield compact size and high query performance. When updates take place, DDE can completely avoid re-labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches. In addition, we introduce Compact DDE (CDDE) which is designed to optimize the performance of DDE for insertions. Both DDE and CDDE can be incorporated into existing systems and applications that are based on dewey labeling scheme with minimum efforts. Experiment results demonstrate the benefits of our proposed labeling schemes over the previous approaches.
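The dewey scheme that DDE preserves for static documents can be sketched in a few lines: a label is the list of sibling positions on the path from the root, so ancestry reduces to a prefix test and document order to lexicographic comparison. This shows the classic scheme only; DDE's re-labeling-free insertion rules are not reproduced here.

```python
# Dewey-style labels: a node's label lists sibling positions from the root.
def is_ancestor(a, b):
    # a is an ancestor of b iff a's label is a proper prefix of b's.
    return len(a) < len(b) and b[:len(a)] == a

def doc_order(a, b):
    # Lexicographic comparison of labels gives document order.
    return -1 if a < b else (1 if a > b else 0)

chapter, section, next_chapter = [1, 2], [1, 2, 1], [1, 3]
print(is_ancestor(chapter, section))      # prefix test
print(doc_order(section, next_chapter))   # section precedes next_chapter
```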

Journal ArticleDOI
TL;DR: This paper provides an overview of XML similarity/comparison, presenting existing research in the area and detailing possible applications of XML comparison processes in various fields, ranging over data warehousing, data integration, classification/clustering, and XML querying.

Journal ArticleDOI
TL;DR: The article includes two use cases demonstrating the adaptability of the SIMO framework, which is available as open source software and connected through a common ontology, again defined in XML.

Journal ArticleDOI
TL;DR: The standards prevailing in the area are clarified and distinguished, the XML Data Standards providing generic XML Schemas are presented, and a multi-faceted classification mechanism is proposed, leading to an extensible taxonomy of standards.

Proceedings ArticleDOI
22 Sep 2009
TL;DR: This paper explains the process-resource-product concept which was introduced with version 1.1 of AutomationML and published at the Hannover trade fair 2009 by means of an industrial example.
Abstract: AutomationML (Automation Markup Language) is an XML based data format for the exchange of plant engineering information. Its mission is to interconnect engineering tools of different disciplines, e.g. process, mechanical or control engineering. AutomationML enables the vendor independent storage of interlinked engineering data that is usually distributed into different engineering tools. This enables a wide range of future tool functionality not available and not possible today. For this, AutomationML defines a number of extended concepts for different engineering aspects. One of them is the process-resource-product concept which was introduced with version 1.1 and published at the Hannover trade fair 2009. This paper explains this concept by means of an industrial example.

Journal ArticleDOI
TL;DR: A product representation approach using the strengths of lightweight representation and annotation and markup practices to allow the association of product data from throughout the lifecycle with the geometric form of the product is proposed.
Abstract: Today companies face the unprecedented challenges of a global market, collaborative environments and the concept of management of the entire product life cycle. In supporting this, the challenge is not only how to utilize information management policies, but also how to develop product representation methods to meet the new demands including platform/application independence, support for the product lifecycle, assisting generation of viewpoint-specific representations, rapid sharing of information between geographically distributed applications and users, and protection of commercial security (intellectual property). This paper proposes a product representation approach using the strengths of lightweight representation and annotation and markup practices to allow the association of product data from throughout the lifecycle with the geometric form of the product. The approach, called Lightweight Model with Multi-layer Annotation (LIMMA), integrates the concept of lightweight representation with annotation of boundary-representations (b-rep) of a product and the use of a formalised markup language (XML). Examples of annotation layers and development of models through-life are given, based on the LIMMA approach.

Patent
21 Oct 2009
TL;DR: In this paper, a query represented by an XML file based on a reusable XML query template is executed with respect to the plurality of event objects to produce a plurality of result objects, each result object is produced based on an application of at least one operator of the query.
Abstract: Methods, systems, and computer-readable media are disclosed for event processing with a query based on a reusable XML query template. A particular method includes receiving a plurality of events from a source and generating a plurality of event objects based on the plurality of events. A query represented by an XML file based on a reusable XML query template is executed with respect to the plurality of event objects to produce a plurality of result objects. Each result object is produced based on an application of at least one operator of the query. A plurality of results is generated based on the plurality of result objects, and the plurality of results is transmitted to a sink.

Proceedings ArticleDOI
03 Mar 2009
TL;DR: A new system, called BGPmon, for monitoring the Border Gateway Protocol, which enables scalable real-time monitoring data distribution by allowing monitors to peer with each other and form an overlay network to provide new services and features without modifying the monitors.
Abstract: This paper presents a new system, called BGPmon, for monitoring the Border Gateway Protocol (BGP). BGP is the routing protocol for the global Internet. Monitoring BGP is important for both operations and research; a number of public and private BGP monitors are deployed and widely used. These existing monitors typically collect data using a full implementation of a BGP router. In contrast, BGPmon eliminates the unnecessary functions of route selection and data forwarding to focus solely on the monitoring function. BGPmon uses a publish/subscribe overlay network to provide real-time access to vast numbers of peers and clients. All routing events are consolidated into a single XML stream. XML allows us to add additional features such as labeling updates to allow easy identification of useful data by clients. Clients subscribe to BGPmon and receive the XML stream, performing tasks such as archiving, filtering, or real-time data analysis. BGPmon enables scalable real-time monitoring data distribution by allowing monitors to peer with each other and form an overlay network to provide new services and features without modifying the monitors. We illustrate the effectiveness of the BGPmon data using the Cyclops route monitoring system.
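Because all routing events arrive as one labeled XML stream, a client can filter with an ordinary XML parser instead of a BGP implementation. The element and attribute names below are illustrative only, not the actual BGPmon message schema.

```python
import xml.etree.ElementTree as ET

# A tiny labeled update stream as a client might receive it.
stream = """<stream>
  <update label="withdraw"><prefix>10.0.0.0/8</prefix></update>
  <update label="announce"><prefix>192.0.2.0/24</prefix></update>
</stream>"""

root = ET.fromstring(stream)
# The labels let a client select useful data without parsing raw BGP.
withdrawals = [u.findtext("prefix") for u in root.iter("update")
               if u.get("label") == "withdraw"]
print(withdrawals)
```

In a real deployment the client would process the stream incrementally (e.g. with a pull parser) rather than loading it whole.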

BookDOI
TL;DR: In this paper, the impact of document-level ranking on focused retrieval of XML documents has been investigated in the context of the INEX 2008 Ad Hoc Track.
Abstract: Ad Hoc Track.- Overview of the INEX 2008 Ad Hoc Track.- Experiments with Proximity-Aware Scoring for XML Retrieval at INEX 2008.- Finding Good Elements for Focused Retrieval.- New Utility Models for the Garnata Information Retrieval System at INEX'08.- UJM at INEX 2008: Pre-impacting of Tags Weights.- Use of Multiword Terms and Query Expansion for Interactive Information Retrieval.- Enhancing Keyword Search with a Keyphrase Index.- CADIAL Search Engine at INEX.- Indian Statistical Institute at INEX 2008 Adhoc Track.- Using Collectionlinks and Documents as Context for INEX 2008.- SPIRIX: A Peer-to-Peer Search Engine for XML-Retrieval.- Book Track.- Overview of the INEX 2008 Book Track.- XRCE Participation to the Book Structure Task.- University of Waterloo at INEX 2008: Adhoc, Book, and Link-the-Wiki Tracks.- The Impact of Document Level Ranking on Focused Retrieval.- Adhoc and Book XML Retrieval with Cheshire.- Book Layout Analysis: TOC Structure Extraction Engine.- The Impact of Query Length and Document Length on Book Search Effectiveness.- Efficiency Track.- Overview of the INEX 2008 Efficiency Track.- Exploiting User Navigation to Improve Focused Retrieval.- Efficient XML and Entity Retrieval with PF/Tijah: CWI and University of Twente at INEX'08.- Pseudo Relevance Feedback Using Fast XML Retrieval.- TopX 2.0 at the INEX 2008 Efficiency Track.- Aiming for Efficiency by Detecting Structural Similarity.- Entity Ranking Track.- Overview of the INEX 2008 Entity Ranking Track.- L3S at INEX 2008: Retrieving Entities Using Structured Information.- Adapting Language Modeling Methods for Expert Search to Rank Wikipedia Entities.- Finding Entities in Wikipedia Using Links and Categories.- Topic Difficulty Prediction in Entity Ranking.- A Generative Language Modeling Approach for Ranking Entities.- Interactive Track.- Overview of the INEX 2008 Interactive Track.- Link the Wiki Track.- Overview of the INEX 2008 Link the Wiki Track.- Link-the-Wiki: Performance Evaluation Based on Frequent Phrases.- CMIC@INEX 2008: Link-the-Wiki Track.- Stealing Anchors to Link the Wiki.- Context Based Wikipedia Linking.- Link Detection with Wikipedia.- Wikisearching and Wikilinking.- CSIR at INEX 2008 Link-the-Wiki Track.- A Content-Based Link Detection Approach Using the Vector Space Model.- XML Mining Track.- Overview of the INEX 2008 XML Mining Track.- Semi-supervised Categorization of Wikipedia Collection by Label Expansion.- Document Clustering with K-tree.- Using Links to Classify Wikipedia Pages.- Clustering XML Documents Using Frequent Subtrees.- UJM at INEX 2008 XML Mining Track.- Probabilistic Methods for Link-Based Classification at INEX 2008.- Utilizing the Structure and Content Information for XML Document Clustering.- Self Organizing Maps for the Clustering of Large Sets of Labeled Graphs.

Book ChapterDOI
15 Jan 2009
TL;DR: An integrated development environment is presented that serves as an interface where P-Lingua programs can be written and compiled; the writing, compiling, and simulating processes are illustrated with a family of P systems solving the SAT problem.
Abstract: A new programming language for membrane computing, P-Lingua, is developed in this paper. This language is not designed for a specific simulator. On the contrary, its purpose is to offer a general syntactic framework that could define a unified standard for membrane computing, covering a broad variety of models. At the present stage, P-Lingua can only handle P systems with active membranes, although the authors intend to extend it to other models in the near future. P-Lingua allows programs to be written in a friendly way, as its syntax is very close to standard scientific notation, and parameterized expressions can be used as shorthand for sets of rules. A built-in compiler parses these human-style programs and generates XML documents that can be given as input to simulation tools, while different plugins can be designed to produce outputs suited to existing simulators. Furthermore, we present in this paper an integrated development environment that serves as an interface where P-Lingua programs can be written and compiled. We also present a simulator for the class of recognizer P systems with active membranes, and we illustrate it by following the writing, compiling, and simulating processes with a family of P systems solving the SAT problem.
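The compile step described in the abstract (expand parameterized rule schemes into concrete rules, then emit an XML document for a downstream simulator) can be sketched in Python. This is a minimal, hypothetical illustration: the `a{i} --> b{i+1}` rule notation, the parameter-expansion scheme, and the XML element names are assumptions made for the sketch, not the actual P-Lingua syntax or output format.

```python
# Hypothetical sketch of a P-Lingua-style compile step: expand one
# parameterized rule scheme into concrete rules, then serialize them
# as XML for a simulator. Not the real P-Lingua grammar or schema.
import xml.etree.ElementTree as ET

def expand_rules(scheme, param, values):
    """Expand e.g. 'a{i} --> b{i+1}' for i in 1..3 into concrete rules."""
    return [scheme.replace("{%s}" % param, str(v))
                  .replace("{%s+1}" % param, str(v + 1))
            for v in values]

def to_xml(rules):
    """Emit the rules as an XML document a simulator plugin could consume."""
    root = ET.Element("psystem", model="active_membranes")
    for r in rules:
        lhs, rhs = [s.strip() for s in r.split("-->")]
        ET.SubElement(root, "rule", lhs=lhs, rhs=rhs)
    return ET.tostring(root, encoding="unicode")

rules = expand_rules("a{i} --> b{i+1}", "i", range(1, 4))
xml_doc = to_xml(rules)
print(rules)    # ['a1 --> b2', 'a2 --> b3', 'a3 --> b4']
print(xml_doc)
```

The point of the design, as the abstract describes it, is the separation of concerns: one human-friendly parameterized source, one XML intermediate, and per-simulator plugins on the output side.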

Journal ArticleDOI
01 Aug 2009
TL;DR: This work presents a new approach to translating XPath queries into SQL queries based on a notion of extended XPath expressions and a simple least fixpoint (LFP) operator, and shows that extended XPath expressions are capable of capturing both DTD recursion and XPath queries in a uniform framework.
Abstract: We study the problem of evaluating XPath queries over XML data that is stored in an RDBMS via schema-based shredding. The interaction between recursion (descendants-axis) in XPath queries and recursion in DTDs makes it challenging to answer XPath queries using an RDBMS. We present a new approach to translating XPath queries into SQL queries based on a notion of extended XPath expressions and a simple least fixpoint (LFP) operator. Extended XPath expressions are a mild extension of XPath, and the LFP operator takes a single input relation and is already supported by most commercial RDBMSs. We show that extended XPath expressions are capable of capturing both DTD recursion and XPath queries in a uniform framework. Furthermore, they can be translated into an equivalent sequence of SQL queries with the LFP operator. We present algorithms for rewriting XPath queries over a (possibly recursive) DTD into extended XPath expressions and for translating extended XPath expressions to SQL queries, as well as optimization techniques. The novelty of our approach consists in its capability to answer a large class of XPath queries by means of only low-end RDBMS features already available in most RDBMSs, as well as its flexibility to accommodate existing relational query optimization techniques. In addition, these translation algorithms provide a solution to query answering for certain (possibly recursive) XML views of XML data. Our experimental results verify the effectiveness of our techniques.
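The core idea (a recursive XPath step over shredded XML answered by a least-fixpoint SQL construct) can be illustrated with a sketch. This is not the paper's algorithm: the shredding scheme (a single `node(id, parent, tag)` table) and the sample data are assumptions, and SQLite's `WITH RECURSIVE`, driven from Python's `sqlite3`, stands in for the LFP operator that the paper assumes the RDBMS provides.

```python
# Sketch: answer the recursive XPath query //dept//employee over XML
# shredded into a node(id, parent, tag) table, using WITH RECURSIVE
# as the least-fixpoint (LFP) operator. Schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
-- <dept><group><employee/></group><dept><employee/></dept></dept>
INSERT INTO node VALUES (1, NULL, 'dept'),
                        (2, 1, 'group'),
                        (3, 2, 'employee'),
                        (4, 1, 'dept'),
                        (5, 4, 'employee');
""")

# desc_of_dept computes, as a fixpoint, all descendants of dept nodes;
# the final SELECT keeps the employee-tagged ones.
rows = conn.execute("""
WITH RECURSIVE desc_of_dept(id) AS (
    SELECT c.id FROM node c
    JOIN node p ON c.parent = p.id
    WHERE p.tag = 'dept'
  UNION
    SELECT c.id FROM node c
    JOIN desc_of_dept d ON c.parent = d.id
)
SELECT n.id FROM node n
JOIN desc_of_dept d ON n.id = d.id
WHERE n.tag = 'employee'
ORDER BY n.id
""").fetchall()
print([r[0] for r in rows])  # [3, 5]
```

Note that node 5 is found even though it sits under a nested, recursive `dept` element: the fixpoint iteration is what lets one SQL query cope with DTD recursion of unbounded depth, which is exactly the interaction the paper targets.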