scispace - formally typeset

Showing papers on "Simple API for XML" published in 2018


Journal Article
01 Feb 2018
TL;DR: Experiments over real-world benchmark XML corpora show that the effectiveness of the three approaches improves with contextualized n-grams of suitable length, which confirms the validity of the devised method from multiple clustering perspectives.
Abstract: A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented by three machine-learning approaches previously unexplored in the XML domain: non-negative matrix (tri-)factorization, co-clustering and automatic transactional clustering. A novel class of XML features approximately captures structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments over real-world benchmark XML corpora show that the effectiveness of the three approaches improves with contextualized n-grams of suitable length. This confirms the validity of the devised method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of all three approaches is also investigated.

8 citations
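The abstract describes features that capture structure-constrained phrases as n-grams contextualized by root-to-leaf paths. The Python sketch below shows one plausible way to extract such features in a single streaming pass with a SAX handler. The names `PathNGramHandler` and `path_ngrams` are hypothetical, n-grams are taken only within a single text node, and this is an illustration of the idea rather than the authors' exact feature definition.

```python
import xml.sax
from collections import Counter

class PathNGramHandler(xml.sax.ContentHandler):
    """Counts word n-grams in text nodes, each keyed by its root-to-leaf path."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.path = []        # stack of currently open element names
        self.text = []        # character chunks seen since the last flush
        self.features = Counter()

    def startElement(self, name, attrs):
        self._flush()         # text before a child belongs to the parent path
        self.path.append(name)

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush()
        self.path.pop()

    def _flush(self):
        words = "".join(self.text).split()
        self.text = []
        context = "/".join(self.path)
        # Sliding window of n consecutive words, tagged with the element path.
        for i in range(len(words) - self.n + 1):
            self.features[(context, " ".join(words[i:i + self.n]))] += 1

def path_ngrams(xml_text, n=2):
    handler = PathNGramHandler(n)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.features
```

Each feature is a `(path, n-gram)` pair, so the same phrase occurring under `article/title` and under `article/body/p` yields two distinct features; the resulting counts could then feed a document-by-feature matrix for factorization or co-clustering.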


Proceedings Article
09 Aug 2018
TL;DR: The method consists of two steps: i) unification of XML document structures in order to establish a global, generic view of the distributed document warehouse, and ii) multidimensional modeling of the unified documents for decision-making purposes.
Abstract: Data warehouses and OLAP (On-Line Analytical Processing) technologies analyse huge amounts of structured data that companies store in conventional databases. Recent works underline the importance of textual data for the decision-making process and have therefore led to the construction of document warehouses. In fact, documents help decision makers better understand the evolution of their business activities. In general, these documents are in XML format, are geographically distributed, and are described by multiple, heterogeneous structures. This paper presents a method for building a distributed document warehouse. The method consists of two steps: i) unification of XML document structures in order to establish a global, generic view of the distributed document warehouse, and ii) multidimensional modeling of the unified documents for decision-making purposes. More specifically, this paper focuses on the unification step.

6 citations
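The unification step can be illustrated, in a deliberately naive form, as merging the element paths of several heterogeneous documents into one global view. The Python sketch below assumes that a hand-written synonym map (the hypothetical `synonyms` argument) is enough to reconcile differing tag names; the names `PathCollector` and `unify_structures` are likewise hypothetical, and the paper's actual unification procedure is not described in the abstract.

```python
import xml.sax

class PathCollector(xml.sax.ContentHandler):
    """Records every root-to-element path, renaming tags via a synonym map."""

    def __init__(self, synonyms):
        super().__init__()
        self.synonyms = synonyms
        self.stack = []
        self.paths = set()

    def startElement(self, name, attrs):
        # Map heterogeneous tag names onto one canonical vocabulary.
        self.stack.append(self.synonyms.get(name, name))
        self.paths.add("/".join(self.stack))

    def endElement(self, name):
        self.stack.pop()

def unify_structures(documents, synonyms=None):
    """Merges the element paths of several documents into one global view.

    Returns {path: number of documents containing that path}, sorted by path.
    """
    coverage = {}
    for doc in documents:
        handler = PathCollector(synonyms or {})
        xml.sax.parseString(doc.encode("utf-8"), handler)
        for path in handler.paths:
            coverage[path] = coverage.get(path, 0) + 1
    return dict(sorted(coverage.items()))
```

The per-path document count hints at which parts of the global view are shared by all sources and which are specific to a few, which is one input a later multidimensional-modeling step might use.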


Journal Article
TL;DR: Andromeda is a system for processing queries and updates on large XML documents; it statically and dynamically partitions the input document so as to distribute the computing load among the machines of a MapReduce cluster.
Abstract: In this paper, we present Andromeda, a system for processing queries and updates on large XML documents. The system is based on the idea of statically and dynamically partitioning the input document so as to distribute the computing load among the machines of a MapReduce cluster.

5 citations
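To make the idea of static partitioning concrete, the Python sketch below splits a document into k partitions by assigning the root's child subtrees greedily (largest first, each to the currently lightest partition), balancing serialized size. This is a generic load-balancing heuristic chosen for illustration, not Andromeda's partitioning algorithm. The function name `partition_document` is hypothetical, and the sketch assumes un-namespaced tags; attributes and text placed directly on the root are not preserved.

```python
import xml.etree.ElementTree as ET

def partition_document(xml_text, k):
    """Greedy static partitioning of the root's children into k fragments."""
    root = ET.fromstring(xml_text)
    # Serialize each top-level subtree; its length serves as a load estimate.
    subtrees = [ET.tostring(child, encoding="unicode") for child in root]
    # Largest subtrees first, each assigned to the currently lightest partition.
    order = sorted(range(len(subtrees)), key=lambda i: len(subtrees[i]), reverse=True)
    parts = [[] for _ in range(k)]
    loads = [0] * k
    for i in order:
        j = loads.index(min(loads))
        parts[j].append(i)
        loads[j] += len(subtrees[i])
    # Restore document order inside each partition and rewrap in the root tag.
    return ["<%s>%s</%s>" % (root.tag, "".join(subtrees[i] for i in sorted(p)), root.tag)
            for p in parts]
```

Each returned fragment is a well-formed document that a separate worker could query independently; dynamic repartitioning, as mentioned in the abstract, would additionally require adjusting these fragments at run time.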


Patent
21 Aug 2018
TL;DR: A method for quickly locating and processing an XML tag is presented; compared with a traditional SAX parser, it saves considerable parsing time, provides convenient and flexible handling of XML nodes, and makes the program easy to maintain and extend.
Abstract: The invention discloses a method for quickly locating and processing an XML tag. The quick locating operation for an XML node includes the creation and arrangement of a document context processor, and the quick processing operation for the XML node includes high-efficiency matching of XML element nodes, high-efficiency retrieval of XML element attributes, and an immediate-exit mechanism after processing. With this method, a target XML node can be located quickly and processed, and the whole XML parse can be completed quickly. Compared with a traditional XML SAX parser, considerable parsing time is saved (the parsing time here refers to the time spent before the node is located and the idle time spent after the node is processed), convenient and flexible handling of XML nodes is provided, and the program is very easy to maintain and extend.

1 citation
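The "immediate exit after processing" idea from the abstract can be approximated with a standard SAX parser by raising an exception from the handler once the target node has been fully processed, so the rest of the document is never scanned. The Python sketch below is a generic early-exit pattern, not the patented method; the names `FirstMatchHandler`, `find_first` and `_Found` are hypothetical, and matching is limited to one tag name plus an optional attribute value.

```python
import xml.sax

class _Found(Exception):
    """Raised to abort parsing once the target node has been processed."""

class FirstMatchHandler(xml.sax.ContentHandler):
    def __init__(self, tag, attr=None, value=None):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0            # > 0 while inside the matched element
        self.text = []
        self.result = None
        self.elements_seen = 0    # how many start tags were actually parsed

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if self.depth:
            self.depth += 1
        elif name == self.tag and (self.attr is None or attrs.get(self.attr) == self.value):
            self.depth = 1
            self.result = dict(attrs.items())

    def characters(self, content):
        if self.depth:
            self.text.append(content)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.result["text"] = "".join(self.text)
                raise _Found()    # stop here; skip the remainder of the document

def find_first(xml_text, tag, attr=None, value=None):
    handler = FirstMatchHandler(tag, attr, value)
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except _Found:
        pass
    return handler.result, handler.elements_seen
```

For a catalog of three `<book>` elements, each with a `<title>` child, searching for `book` with `id="2"` returns after 5 of the 7 start tags; the saving grows with the amount of document that follows the target node.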


Book Chapter
23 Oct 2018
TL;DR: This paper describes a unique approach to generating a graph-based ontology using a tool that consists of four phases; the authors report that the improved tool produces more accurate results.
Abstract: This paper describes a unique approach to generating a graph-based ontology. The ontology is created from a text graph. Many other tools are available for creating ontologies, but each tool has its own method and a complex structure for generating them. Ontologies are very popular in many fields today and have become a necessary part of the World Wide Web. In this paper, the proposed tool used to generate the ontology consists of four phases, each with its own purpose. First, the text graph is entered in Notepad and processed by an implementation in Java (Eclipse). Second, the output of the first step is converted into an XML file. Third, the XML file is parsed with a DOM or SAX parser. In the last step, the XML file is converted into an RDF file, which is validated with an online RDF parser. In the future, the RDF file is converted into RDFS, and finally the ontology is created. After this, the proposed tool is improved and produces more accurate results.

1 citation
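The third and fourth phases (parsing the XML file with a SAX parser and converting it into RDF) can be sketched as follows. The Python code assumes a hypothetical XML encoding of the text graph in which each edge is an `<edge from="..." label="..." to="..."/>` element, and mints IRIs under a hypothetical namespace `http://example.org/onto#`; it emits N-Triples. The paper's actual XML layout and RDF vocabulary are not given in the abstract.

```python
import xml.sax
from urllib.parse import quote

BASE = "http://example.org/onto#"  # hypothetical namespace for generated IRIs

class EdgeHandler(xml.sax.ContentHandler):
    """Collects (subject, predicate, object) from <edge from=".." label=".." to=".."/>."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def startElement(self, name, attrs):
        if name == "edge":
            s, p, o = (attrs.get(k) for k in ("from", "label", "to"))
            if s and p and o:  # skip edges with a missing attribute
                self.triples.append((s, p, o))

def iri(term):
    # Spaces become underscores; other unsafe characters are percent-encoded.
    return "<%s%s>" % (BASE, quote(term.replace(" ", "_")))

def xml_to_ntriples(xml_text):
    handler = EdgeHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "\n".join("%s %s %s ." % (iri(s), iri(p), iri(o)) for s, p, o in handler.triples)
```

Because N-Triples puts one triple per line, the output can be checked directly with an RDF validator, matching the validation step described in the abstract.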