
Showing papers on "XML" published in 2020


Book ChapterDOI
23 Aug 2020
TL;DR: This paper proposes an attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code; the model has a structure decoder which reconstructs the table structure and a cell decoder which recognizes cell content.
Abstract: Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular data in unstructured digital documents, e.g. Portable Document Format (PDF) and images, are difficult to parse into structured machine-readable format, due to complexity and diversity in their structure and style. To facilitate image-based table recognition with deep learning, we develop and release the largest publicly available table recognition dataset PubTabNet (https://github.com/ibm-aur-nlp/PubTabNet.), containing 568k table images with corresponding structured HTML representation. PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in PubMed Central™ Open Access Subset (PMCOA). We also propose a novel attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code. The model has a structure decoder which reconstructs the table structure and helps the cell decoder to recognize cell content. In addition, we propose a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition, which more appropriately captures multi-hop cell misalignment and OCR errors than the pre-established metric. The experiments demonstrate that the EDD model can accurately recognize complex tables solely relying on the image representation, outperforming the state-of-the-art by 9.7% absolute TEDS score.
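As a rough illustration of the proposed TEDS metric, the sketch below scores two table structures by tree edit distance over their HTML parse trees. It assumes well-formed markup and the third-party zss package (Zhang-Shasha algorithm), and it ignores cell text, so it is a structure-only approximation of the metric defined in the paper.

```python
# Structure-only sketch of a Tree-Edit-Distance-based Similarity (TEDS) style
# score between two HTML tables. Assumes well-formed markup and the third-party
# `zss` package; the paper's full metric also compares cell content.
import xml.etree.ElementTree as ET
from zss import Node, simple_distance

def to_zss(elem):
    node = Node(elem.tag)
    for child in elem:
        node.addkid(to_zss(child))
    return node

def tree_size(elem):
    return 1 + sum(tree_size(c) for c in elem)

def teds(html_a, html_b):
    a, b = ET.fromstring(html_a), ET.fromstring(html_b)
    dist = simple_distance(to_zss(a), to_zss(b))
    return 1.0 - dist / max(tree_size(a), tree_size(b))

print(teds("<table><tr><td/><td/></tr></table>",
           "<table><tr><td/></tr></table>"))   # 0.75
```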

39 citations


Journal ArticleDOI
TL;DR: This paper presents GTFS-Madrid-Bench, a benchmark to evaluate OBDI engines that can be used for the provision of access mechanisms to virtual knowledge graphs, and introduces several scenarios that aim at measuring the query capabilities, performance and scalability of all these engines, considering their heterogeneity.

28 citations


Journal ArticleDOI
TL;DR: This paper describes Pubmed Parser, a software library to mine PubMed and MEDLINE efficiently that is built on top of Python and can therefore be integrated into a myriad of tools for machine learning such as scikit-learn and deep learning such as tensorflow and pytorch.
Abstract: The number of biomedical publications is increasing exponentially every year. If we had the ability to access, manipulate, and link this information, we could extract knowledge that is perhaps hidden within the figures, text, and citations. In particular, the repositories made available by the PubMed and MEDLINE databases enable these kinds of applications at an unprecedented level. Examples of applications that can be built from this dataset range from predicting novel drug-drug interactions and classifying biomedical text data to searching specific oncological profiles, disambiguating author names, and automatically learning a biomedical ontology. Here, we describe Pubmed Parser (pubmed_parser), a software library to mine PubMed and MEDLINE efficiently. Pubmed Parser is built on top of Python and can therefore be integrated into a myriad of tools for machine learning such as scikit-learn and deep learning such as tensorflow and pytorch.
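A minimal usage sketch of the described tool, assuming the pubmed_parser package is installed and that sample_medline.xml.gz is a local MEDLINE baseline file; the field names follow the project documentation but should be checked against the installed version.

```python
# Minimal Pubmed Parser usage sketch. Assumes `pip install pubmed_parser` and a
# local MEDLINE file named sample_medline.xml.gz; field names per the docs.
import pubmed_parser as pp

records = pp.parse_medline_xml("sample_medline.xml.gz")   # list of dicts
for rec in records[:3]:
    print(rec["pmid"], rec["title"][:80])
```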

17 citations



Proceedings ArticleDOI
20 Nov 2020
TL;DR: This paper discusses a new method for training extraction models directly from the textual value of information and shows that it performs competitively with a standard word classifier without requiring costly word level supervision.
Abstract: The predominant approaches for extracting key information from documents resort to classifiers predicting the information type of each word. However, the word level ground truth used for learning is expensive to obtain since it is not naturally produced by the extraction task. In this paper, we discuss a new method for training extraction models directly from the textual value of information. The extracted information of a document is represented as a sequence of tokens in the XML language. We learn to output this representation with a pointer-generator network that alternately copies the document words carrying information and generates the XML tags delimiting the types of information. The ability of our end-to-end method to retrieve structured information is assessed on a large set of business documents. We show that it performs competitively with a standard word classifier without requiring costly word level supervision.
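To make the training target concrete, the sketch below builds the kind of XML-tagged token sequence the paper describes from a document's extracted field values; the field names are hypothetical and serve only as illustration.

```python
# Sketch of an XML-tagged target sequence of the kind described in the paper:
# extracted field values are serialized as one token stream in which XML tags
# delimit the information types. Field names here are hypothetical.
def to_target_sequence(fields):
    tokens = []
    for name, value in fields.items():
        tokens.append(f"<{name}>")
        tokens.extend(value.split())
        tokens.append(f"</{name}>")
    return tokens

fields = {"invoice_number": "INV-2020-017", "total_amount": "1 234 . 50 EUR"}
print(" ".join(to_target_sequence(fields)))
# <invoice_number> INV-2020-017 </invoice_number> <total_amount> 1 234 . 50 EUR </total_amount>
```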

15 citations


Book
09 Feb 2020
TL;DR: An overall overview of the model and system provided by Author-X for securing XML data, especially conceived for the push dissemination mode, is given; the model allows the specification of both access control and signature policies to satisfy the confidentiality, integrity and authenticity requirements of both the receiving subjects and the information owners.
Abstract: The increasing ability to interconnect computers through internetworking, and the use of the Web as the main means to exchange information, have sped up the development of a new class of information-centered applications focused on the dissemination of XML data. Since it is often the case that data managed by an information system are highly strategic and sensitive, a comprehensive framework for ensuring the main security properties in disseminating XML data is needed. In our poster, we provide an overall overview of both the model and the system provided by Author-X for securing XML data, especially conceived for the push dissemination mode. Our model allows the specification of both access control and signature policies in order to satisfy the confidentiality, integrity and authenticity requirements of both the receiving subjects and the information owners. The Author-X system provides ad hoc techniques for enforcing the specified security policies.

14 citations


Journal ArticleDOI
TL;DR: This paper provides an overview of the state of the art in XML data manipulation in conventional and temporal XML databases, studies the support for such functionality in mainstream commercial DBMSs, and offers remarks on possible future research directions related to this issue.

14 citations


Journal ArticleDOI
TL;DR: Since blockchain-specific security guidance is currently lacking, mapping existing frameworks, such as OWASP, to the blockchain can help in the identification of potential vulnerabilities in blockchain systems.

13 citations


Journal ArticleDOI
01 Jan 2020
TL;DR: This article proposes how content users might benefit from semantic concepts through the delivery of sets of logically connected topics, described as microDocs, which might also play a role in the provisioning of content by web services integrated into different types of content processing and content delivery applications.
Abstract: We address and develop a new concept for the dynamic delivery of topic-based content created within the domain of technical communication. Corresponding content management environments introduced within the last decades have so far focused on semantically structured and mostly XML-based information models and, more recently, on semantic metadata using taxonomies, together leading to concepts of so-called intelligent content. The latest developments attempt to extend these concepts with additional explicit semantic approaches modelled and implemented, for example, using ontologies and related technologies. In this article, we propose how content users might benefit from these semantic concepts through the delivery of sets of logically connected topics, which can be described as microdocuments ("microDocs"). This generic approach of topic assemblies might also play a role in the provisioning of content by web services integrated into different types of content processing and content delivery applications.

12 citations


Journal ArticleDOI
TL;DR: Digital forensics techniques were used to investigate a known case of contract cheating in which the contract author had notified the university and the student subsequently confirmed that they had contracted the work out.
Abstract: Contract cheating is a major problem in Higher Education because it is very difficult to detect using traditional plagiarism detection tools. Digital forensics techniques are already used in law to determine ownership of documents, and also in criminal cases, where it is not uncommon to hide information and images within an ordinary looking document using steganography techniques. These digital forensic techniques were used to investigate a known case of contract cheating where the contract author has notified the university and the student subsequently confirmed that they had contracted the work out. Microsoft Word documents use a format known as Office Open XML Format, and as such, it is possible to review the editing process of a document. A student submission known to have been contracted out was analysed using the revision identifiers within the document, and a tool was developed to review these identifiers. Using visualisation techniques it is possible to see a pattern of editing that is inconsistent with the pattern seen in an authentic document.
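A minimal sketch of the kind of analysis described: Office Open XML keeps the document body in word/document.xml, and its paragraphs and runs carry rsid attributes that record editing sessions. The snippet below (assuming a local submission.docx) tallies those identifiers with the standard library; it is an illustration of the idea, not the authors' tool.

```python
# Extract revision identifiers (rsid attributes) from a Word document.
# Office Open XML stores the main body in word/document.xml; paragraphs and
# runs carry w:rsidR-style attributes recording editing sessions.
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def rsid_histogram(docx_path):
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = Counter()
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if attr.startswith("{%s}rsid" % W):
                counts[value] += 1
    return counts

# A submission edited in very few sessions shows a suspiciously flat histogram.
for rsid, n in rsid_histogram("submission.docx").most_common(10):
    print(rsid, n)
```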

11 citations


Journal ArticleDOI
TL;DR: The current CTWLADE can map the data required and provided by the hydraulic software tool Storm Water Management Model (SWMM) and is ready to be integrated into a Web 3D Service to provide data for 3D dynamic visualization in interactive scenes.
Abstract: Urban flooding, as one of the most serious natural disasters, has caused considerable personal injury and property damage throughout the world. To better cope with the problem of waterlogging, experts have developed many waterlogging models that can accurately simulate the process of pipe network drainage and water accumulation. The study of urban waterlogging involves many data types. These data come from the departments of hydrology, meteorology, planning, surveying, and mapping, among others. The incoordination of space–time scales and format standards has brought huge obstacles to the study of urban waterlogging and is not conducive to interpretation, transmission, and visualization in today's network environment. In this paper, the entities and attributes related to waterlogging are defined. Based on the five modules of urban drainage network, sub-basin, dynamic water body, time series, and meteorological data, the corresponding UML (Unified Modeling Language) model is designed and constructed. On this basis, the urban waterlogging application domain extension model, City Waterlogging Application Domain Extension (CTWLADE), is established. According to the characteristics of different types of data, two different methods based on FME Objects and citygml4j are proposed to realize the corresponding data integration, and a KML (Keyhole Markup Language)/glTF data organization form with a corresponding sharing method is proposed to solve the problem that CTWLADE model data cannot be visualized directly on the web and cannot be interacted with in three dimensions. To evaluate the CTWLADE, a prototype system was implemented, which can convert waterlogging-related multi-source data into Extensible Markup Language (XML) files conforming to the CTWLADE. The current CTWLADE can map the data required and provided by the hydraulic software tool Storm Water Management Model (SWMM) and is ready to be integrated into a Web 3D Service to provide data for 3D dynamic visualization in interactive scenes.

Proceedings Article
Harry Bunt1
01 May 2020
TL;DR: The focus in this paper is on the structuring of the semantic information needed to characterise quantification in natural language and the representation of these structures in QuantML.
Abstract: This paper discusses the current state of developing an ISO standard annotation scheme for quantification phenomena in natural language, as part of the ISO Semantic Annotation Framework (ISO 24617). A proposed approach that combines ideas from the theory of generalised quantifiers and from neo-Davidsonian event semantics was adopted by the ISO organisation in 2019 as a starting point for developing such an annotation scheme. This scheme consists of (1) a conceptual ‘metamodel’ that visualises the types of entities, functions and relations that go into annotations of quantification; (2) an abstract syntax which defines ‘annotation structures’ as triples and other set-theoretic constructs; (3) an XML-based representation of annotation structures (‘concrete syntax’); and (4) a compositional semantics of annotation structures. The latter three components together define the interpreted markup language QuantML. The focus in this paper is on the structuring of the semantic information needed to characterise quantification in natural language and the representation of these structures in QuantML.

Journal ArticleDOI
TL;DR: This study reports a novel platform to store and query massive XML-based biological data collections, and a formal approach to transform the XML query model into the MapReduce query model is proposed.
Abstract: Publishing biological data in XML formats is attractive for organizations who would like to provide their bioinformatics resources in an extensible and machine-readable format. In the era of big data, massive XML-based biological data management has emerged as a challenging issue. With the continuous growth of XML-based biological data sets, it is usually frustrating to use traditional declarative query languages to provide efficient query capabilities in terms of processing speed and scale. In this study, we report a novel platform to store and query massive XML-based biological data collections. A prototype tool for constructing HBase tables from XML-based biological data collections is first developed, and then a formal approach to transform the XML query model into the MapReduce query model is proposed. Finally, an evaluation of the query performance of the proposed approach on existing XML-based biological databases is presented, showing the performance advantages of the proposed solution. The source code of the massive XML-based biological data management platform is freely available at https://github.com/lyotvincent/X2H.
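As an illustration of the loading step, the sketch below flattens an XML record into HBase-style (row key, column qualifier, value) cells using only the standard library; the element names are made up, and the actual HBase write (e.g. through a Thrift client) is left out.

```python
# Flatten an XML record into HBase-style (row key, column qualifier, value)
# cells. Element names are illustrative; the HBase write itself is omitted.
import xml.etree.ElementTree as ET

def xml_to_cells(xml_text, row_key_tag):
    root = ET.fromstring(xml_text)
    row_key = root.findtext(row_key_tag)
    cells = []
    for elem in root.iter():
        if elem is root or not (elem.text and elem.text.strip()):
            continue
        cells.append((row_key, "d:" + elem.tag, elem.text.strip()))
    return cells

record = "<entry><id>P12345</id><name>Example protein</name><organism>E. coli</organism></entry>"
for cell in xml_to_cells(record, "id"):
    print(cell)
```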

Journal ArticleDOI
TL;DR: This paper provides an insight into the relationship between the two standards and a methodology for the conversion from one to the other, and the process of developing software to perform such conversion.
Abstract: The trend of increased usage of both BIM and 3D GIS and the similarity between the two has led to an increase in the overlap between them. A key application of such overlap is providing geospatial context data for BIM models through importing 3D GIS-data to BIM software to help in different design-related issues. However, this is currently difficult because of the lack of support in BIM software for the formats and data models of 3D Geo-information. This paper deals with this issue by developing and implementing a methodology to convert the common open 3D city model data model into the most common open BIM data format, namely CityGML (Groger et al., 2012) to IFC (buildingsmart, 2019b). For the aim of this study, the two standards are divided into 5 comparable subparts: Semantics, Geometry, Geographical coordinates, Topology, and Encoding. The characteristics of each of these subparts are studied and a conversion method is proposed for each of them from the former standard to the latter. This is done by performing a semantic and geometrical mapping between the two standards, converting the georeferencing from global to local, converting the encoding that the two standards use from XML to STEP, and deciding which topological relations are to be retained. A prototype implementation has been created using Python to combine the above tasks. The work presented in this paper can provide a foundation for future work in converting CityGML to IFC. It provides an insight into the relationship between the two standards and a methodology for the conversion from one to the other, and the process of developing software to perform such conversion. This is done in a way that can be extended for future specific needs.
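One step of the described methodology is converting the georeferencing from global to local. A minimal sketch of that idea: globally referenced CityGML coordinates are shifted to a chosen local engineering origin, since IFC models are typically placed near a local origin. The coordinates below are illustrative.

```python
# Shift globally referenced (projected CRS) coordinates to a local engineering
# origin, as done when moving CityGML geometry into an IFC local placement.
def to_local(points, origin):
    ox, oy, oz = origin
    return [(x - ox, y - oy, z - oz) for x, y, z in points]

global_footprint = [(440400.2, 4474520.7, 667.0), (440412.8, 4474528.1, 667.0)]
site_origin = (440400.0, 4474520.0, 667.0)
print(to_local(global_footprint, site_origin))   # small local coordinates
```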

Journal ArticleDOI
TL;DR: The authors propose a classification of XML-based vulnerabilities, based on an exhaustive literature survey, that will help web developers build secure parsers to thwart such attacks.
Abstract: XML-based attacks are executed in web applications through crafted XML documents that force the XML parser to process un-validated documents. This leads to disclosure of sensitive information, maliciou...
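A hedged example of the defensive parsing such classifications motivate: the third-party defusedxml package (an assumption here, not something the paper prescribes) rejects constructs behind common XML attacks, such as external entity expansion (XXE), instead of silently processing them.

```python
# Parse untrusted XML defensively. The defusedxml package (assumed installed:
# `pip install defusedxml`) refuses entity declarations by default, blocking
# common XML-based attacks such as XXE and entity-expansion bombs.
import defusedxml.ElementTree as ET
from defusedxml import EntitiesForbidden

payload = """<?xml version="1.0"?>
<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>"""

try:
    ET.fromstring(payload)
except EntitiesForbidden:
    print("Rejected: document declares entities (possible XXE attempt)")
```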


Proceedings ArticleDOI
17 Sep 2020
TL;DR: This paper focuses on IEEE 1599 applicability to music education and geographically-distributed performance in this period of self-isolation and home schooling due to Coronavirus disease.
Abstract: IEEE 1599 is an international standard aiming to describe music and music-related information through a multilayer approach. The idea is to organize multiple and heterogeneous materials related to a single musical piece within a hierarchical and highly-interconnected structure expressed via the XML syntax, so as to support multimodal and synchronized experience of music content. Another relevant feature is the possibility to enjoy IEEE 1599 materials via network, thanks to ad hoc Web applications already publicly available. This paper focuses on IEEE 1599 applicability to music education and geographically-distributed performance in this period of self-isolation and home schooling due to Coronavirus disease. Moreover, the lesson learned from the experimentation during the emergency phase is inspiring some improvements for the format, which is currently under revision by the IEEE Working Group for XML Musical Application.

Proceedings ArticleDOI
23 Aug 2020
TL;DR: In this article, the authors present approaches for information extraction from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.
Abstract: How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.

Journal ArticleDOI
TL;DR: The finding is that the performance of XML data in a health care environment is better with BaseX than with eXist-DB, and that analytic SQL outperforms XQuery for analytical functions.

Posted Content
TL;DR: This paper builds and evaluates translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search, and provides a detailed human analysis of gaps between the model output and human translations for real-world applications, including suitability for post-editing.
Abstract: This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text segments from the online documentation for an enterprise software platform. These Web pages have been professionally translated from English into 16 languages and maintained by domain experts, and around 100,000 text segments are available for each language pair. We build and evaluate translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search. We also experiment with a non-English pair to show that our dataset has the potential to explicitly enable 17 × 16 translation settings. Our experiments show that learning to translate with the XML tags improves translation accuracy, and the beam search accurately generates XML structures. We also discuss trade-offs of using the copy mechanisms by focusing on translation of numerical words and named entities. We further provide a detailed human analysis of gaps between the model output and human translations for real-world applications, including suitability for post-editing.
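As an illustration of the constraint behind an XML-constrained beam search (not the paper's implementation), the sketch below computes which tag tokens keep a partial hypothesis well formed: any opening tag may be emitted, but a closing tag must match the most recently opened tag.

```python
# Illustrative tag-balance constraint for XML-constrained decoding: given the
# tags generated so far, allow any opening tag, but only the closing tag that
# matches the most recently opened one.
def allowed_tag_tokens(generated, tag_vocab):
    stack = []
    for tok in generated:
        if tok.startswith("</"):
            stack.pop()
        elif tok.startswith("<"):
            stack.append(tok.strip("<>"))
    allowed = {t for t in tag_vocab if not t.startswith("</")}   # opening tags
    if stack:
        allowed.add(f"</{stack[-1]}>")                           # matching close only
    return allowed

vocab = {"<ph>", "</ph>", "<uicontrol>", "</uicontrol>"}
print(allowed_tag_tokens(["<ph>", "hello"], vocab))   # may close </ph>, not </uicontrol>
```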

Posted ContentDOI
20 Mar 2020-bioRxiv
TL;DR: An HDF5 file format ‘mzMLb’ that is optimised for both read/write speed and storage of the raw mass spectrometry data is proposed, demonstrating a flexible format that with or without compression is faster than all existing approaches in virtually all cases.
Abstract: With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established a clear and precise XML representation for data interchange, mzML, receiving substantial uptake; nevertheless, storage and file access efficiency has not been the main focus. We propose an HDF5 file format 'mzMLb' that is optimised for both read/write speed and storage of the raw mass spectrometry data. We provide extensive validation of write speed, random read speed and storage size, demonstrating a flexible format that with or without compression is faster than all existing approaches in virtually all cases, while with compression, is comparable in size to proprietary vendor file formats. Since our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement: mzMLb's design adheres to both HDF5 and NetCDF4 standard implementations, which allows it to be easily utilised by third parties due to their widespread programming language support. A reference implementation within the established ProteoWizard toolkit is provided.
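A sketch of the general idea, not the exact mzMLb layout: numeric spectra go into chunked, compressed HDF5 datasets while the mzML XML metadata is preserved verbatim alongside them. It assumes the h5py and numpy packages.

```python
# Illustrative HDF5 layout in the spirit of mzMLb (not its actual specification):
# spectra as chunked, compressed numeric datasets, mzML XML metadata kept as-is.
import numpy as np
import h5py

mz = np.random.rand(100000)          # stand-in m/z array
intensity = np.random.rand(100000)   # stand-in intensity array
mzml_header = "<mzML xmlns='http://psi.hupo.org/ms/mzml'>...</mzML>"

with h5py.File("example.h5", "w") as f:
    f.create_dataset("spectrum_mz", data=mz, compression="gzip", chunks=True)
    f.create_dataset("spectrum_intensity", data=intensity, compression="gzip", chunks=True)
    f.create_dataset("mzML_metadata",
                     data=np.frombuffer(mzml_header.encode(), dtype=np.uint8))
```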

Proceedings ArticleDOI
TL;DR: In this paper, the Common Variability Language (CVL) is used as a composition-based approach, integrated with annotations, to manage fine-grained variability of a Software Product Line for web applications.
Abstract: Web applications development involves managing a high diversity of files and resources like code, pages or style sheets, implemented in different languages. To deal with the automatic generation of custom-made configurations of web applications, industry usually adopts annotation-based approaches even though the majority of studies encourage the use of composition-based approaches to implement Software Product Lines. Recent work tries to combine both approaches to get the complementary benefits. However, technological companies are reticent to adopt new development paradigms such as feature-oriented programming or aspect-oriented programming. Moreover, it is extremely difficult, or even impossible, to apply these programming models to web applications, mainly because of their multilingual nature, since their development involves multiple types of source code (Java, Groovy, JavaScript), templates (HTML, Markdown, XML), style sheet files (CSS and its variants, such as SCSS), and other files (JSON, YML, shell scripts). We propose to use the Common Variability Language as a composition-based approach and integrate annotations to manage fine-grained variability of a Software Product Line for web applications. In this paper, we (i) show that existing composition and annotation-based approaches, including some well-known combinations, are not appropriate to model and implement the variability of web applications; and (ii) present a combined approach that effectively integrates annotations into a composition-based approach for web applications. We implement our approach and show its applicability with an industrial real-world system.

Journal ArticleDOI
TL;DR: A formal definition of this GPML standard is presented and a case study where GPML is used to implement a model predictive controller for the control of a building heating plant is presented.
Abstract: We propose a genetic programming markup language (GPML), an XML-based standard for the interchange of genetic programming trees, and outline the benefits such a format would bring in allowing the deployment of trained genetic programming (GP) models in applications as well as the subsidiary benefit of allowing GP researchers to directly share trained trees. We present a formal definition of this standard and describe details of an implementation. In addition, we present a case study where GPML is used to implement a model predictive controller for the control of a building heating plant.
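Since the GPML schema itself is defined in the paper, the element names below are hypothetical; the sketch only illustrates the kind of round trip such an interchange format enables: serialize a GP tree as XML, reload it elsewhere, and evaluate it.

```python
# Hypothetical GP-tree XML in the spirit of GPML (NOT the standard's actual
# schema), plus a tiny evaluator, to show the deployment round trip such an
# interchange format enables. The encoded expression is (x * 2.0) + 3.5.
import xml.etree.ElementTree as ET
import operator

TREE = """
<node op="add">
  <node op="mul">
    <node var="x"/>
    <node const="2.0"/>
  </node>
  <node const="3.5"/>
</node>
"""

OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}

def evaluate(node, variables):
    if "const" in node.attrib:
        return float(node.get("const"))
    if "var" in node.attrib:
        return variables[node.get("var")]
    left, right = list(node)
    return OPS[node.get("op")](evaluate(left, variables), evaluate(right, variables))

print(evaluate(ET.fromstring(TREE), {"x": 4.0}))   # 11.5
```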

Journal ArticleDOI
TL;DR: This research presents a novel dynamic XML labelling scheme, called the Pentagonal Scheme, in which data are represented as ordered XML nodes with relationships between them; the scheme efficiently supports random skewed updates, with fast calculations and uncomplicated implementations for handling updates.
Abstract: In XML databases, the indexing process is based on a labelling or numbering scheme and is generally used to label an XML document so that XML queries can be performed using the path node information. Moreover, a labelling scheme helps to capture the structural relationships during query processing without the need to access the physical document. Two of the main problems for XML labelling schemes are duplicated labels and cost efficiency in terms of labelling time and size. This research presents a novel dynamic XML labelling scheme, called the Pentagonal Scheme, in which data are represented as ordered XML nodes with relationships between them. Updating these nodes in large-scale XML documents has been widely investigated and represents a challenging research problem, as it can mean relabelling a whole tree. Our algorithms provide an efficient dynamic XML labelling scheme that supports data updates without duplicating labels or relabelling old nodes. Our work evaluates the labelling process in terms of size and time, and evaluates the labelling scheme's ability to handle several insertions in XML documents. The findings indicate that the Pentagonal Scheme shows better initial labelling time performance than the compared schemes, particularly when using large XML datasets. Moreover, it efficiently supports random skewed updates, with fast calculations and uncomplicated implementations for handling updates. In addition, query response times and relationship computations in the Pentagonal Scheme can be performed efficiently without any extra cost. For these reasons, our labelling scheme achieves the goal of this research.
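The details of the Pentagonal Scheme are specific to the paper, so the sketch below uses generic Dewey-style prefix labels purely to illustrate how a labelling scheme answers structural relationships from labels alone, without re-reading the document.

```python
# Generic illustration (Dewey-style prefix labels, NOT the Pentagonal Scheme
# itself) of how an XML labelling scheme answers structural relationships from
# labels alone, without accessing the physical document.
def is_ancestor(a, b):
    return len(a) < len(b) and b[:len(a)] == a

def is_sibling(a, b):
    return len(a) == len(b) and a[:-1] == b[:-1] and a != b

book     = (1,)
chapter1 = (1, 1)
section2 = (1, 1, 2)
chapter2 = (1, 2)

print(is_ancestor(book, section2))      # True
print(is_sibling(chapter1, chapter2))   # True
```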

Journal ArticleDOI
TL;DR: An approach for generating a Digital Twin efficiently is presented, in which the obstacles are overcome by using fast scans of the shop floor and subsequent object recognition; the focus here is on creating a simulation model.

Journal ArticleDOI
01 Feb 2020
TL;DR: An effective quality analysis of XML web data using a clustering and classification approach, implemented in Java, is used to assess XML data quality and to rank web pages effectively.
Abstract: Our proposed method performs an effective quality analysis of XML web data using a clustering and classification approach. As XML is becoming a standard for representing data, it is attractive to support keyword search in XML databases. A keyword search looks for words anywhere in a record and has emerged as the preferred paradigm for finding information on the web. The most important requirement for keyword search is to rank the results of a query so that the most relevant results appear first. Here, we first collect XML documents, after which feature extraction takes place. Since the extracted features contain both relevant and irrelevant features, it is essential to filter out the irrelevant ones; a probability-based feature selection method is used to select the relevant features. A weighted fuzzy c-means clustering algorithm is then used to cluster the relevant features on the basis of keywords. To assess XML data quality, an optimal neural network (ONN) classifier is utilized, in which the whale optimization algorithm selects the optimal weights. In this way, the web pages are effectively ranked. The efficiency of the proposed method is assessed using clustering and classification accuracy, RMSE, and search time. The proposed method is implemented in Java.

Journal ArticleDOI
TL;DR: Template4EHR is a tool for the dynamic creation of data schemas for electronic health-record storage and for user creation and customization of graphical user interfaces.
Abstract: Template4EHR is a tool for the dynamic creation of data schemas for electronic health-record storage and user creation and customization of graphical user interfaces. In experimental tests with IT and health professionals, Template4EHR obtained an 81.22% satisfaction rate.

Proceedings ArticleDOI
10 Jun 2020
TL;DR: This paper introduces the concept of a new configuration interface for IEC 61499 devices using OPC UA information modeling concepts, as an alternative to the existing XML and Binary XML implementations.
Abstract: In the modern era of industrial automation, the term Industry 4.0 denotes the fourth industrial revolution, a phenomenon in which technologies from various layers of an enterprise are interconnected and form a meshed network of self-regulated, adaptive, re-configurable and self-optimizing devices. These devices range from Programmable Logic Controllers, embedded PCs, and edge nodes to smart sensors and actuators, working as proxies or mediators for real objects in the software domain and integrating into Intelligent Enterprise Applications. Heterogeneous configuration interfaces of these devices hinder a smooth integration and configuration process. A unified way of interacting with the devices for configuration is well defined in the IEC 61499 standard, which specifies the commands, interaction behavior, and interface description for control devices and engineering tools. There are implementations of the configuration interface in XML and Binary XML, which are widely used for their flexible, extensible, and human-readable nature, whereas OPC UA can offer an open configuration interface for IEC 61499 devices and software tools with built-in interoperability solutions. This paper introduces the concept of a new configuration interface for IEC 61499 devices using OPC UA information modeling concepts.
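A hedged sketch of the idea using the third-party python-opcua package (an assumption, and the node names are illustrative rather than the paper's information model): IEC 61499-style configuration data is exposed as nodes in an OPC UA address space that engineering tools can browse and write.

```python
# Sketch (third-party python-opcua package assumed installed) of exposing
# IEC 61499-style device configuration through an OPC UA information model.
# Node names are illustrative, not the paper's actual model.
from opcua import Server

server = Server()
server.set_endpoint("opc.tcp://0.0.0.0:4840/iec61499-config/")
idx = server.register_namespace("http://example.org/iec61499")

device = server.get_objects_node().add_object(idx, "ControlDevice")
state = device.add_variable(idx, "DeviceState", "IDLE")
app = device.add_variable(idx, "ApplicationName", "FunctionBlockApp1")
state.set_writable()   # engineering tools may reconfigure via writes
app.set_writable()

server.start()
try:
    print("Configuration interface available at opc.tcp://0.0.0.0:4840/")
finally:
    server.stop()
```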

Journal ArticleDOI
TL;DR: A new format for MT and related electromagnetic transfer functions, the electromagnetic transfer function Extensible Markup Language (EMTF XML), is developed; it is a novel, self-describing, searchable, and extensible way to store the data.
Abstract: Initial processing of magnetotelluric (MT) data collected at a site results in a small data file that defines the MT transfer functions (MT TFs) or variants at a discrete set of frequencies...

Journal ArticleDOI
TL;DR: Results of an experimental study indicate that XChange provides higher effectiveness and efficiency for understanding changes between versions of XML documents compared with the (syntactic) state-of-the-art approaches.