
Showing papers on "XML" published in 2017


Journal ArticleDOI
TL;DR: The Spoken British National Corpus 2014 is introduced, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016.
Abstract: This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.

159 citations


Journal ArticleDOI
01 Dec 2017
TL;DR: The goal of GasLib is to provide a set of publicly available gas network instances that can be used by researchers in the field of gas transport to save time and compare different models and algorithms on the same specified test sets.
Abstract: The development of mathematical simulation and optimization models and algorithms for solving gas transport problems is an active field of research. In order to test and compare these models and algorithms, gas network instances together with demand data are needed. The goal of GasLib is to provide a set of publicly available gas network instances that can be used by researchers in the field of gas transport. The advantages are that researchers save time by using these instances and that different models and algorithms can be compared on the same specified test sets. The library instances are encoded in an XML (extensible markup language) format. In this paper, we explain this format and present the instances that are available in the library.
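
As a hedged illustration of working with such instances, the sketch below parses a small gas network encoded in XML using Python's standard library. The element and attribute names (network, node, pipe, pressureMin, etc.) are invented for illustration and are not the actual GasLib schema.

```python
# Sketch: reading a hypothetical gas network instance encoded in XML.
# Element/attribute names are illustrative assumptions, not the GasLib schema.
import xml.etree.ElementTree as ET

SAMPLE = """<network>
  <node id="N1" pressureMin="40.0" pressureMax="70.0"/>
  <node id="N2" pressureMin="40.0" pressureMax="70.0"/>
  <pipe id="P1" from="N1" to="N2" length="12.5" diameter="0.9"/>
</network>"""

def load_network(xml_text):
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): {k: float(v) for k, v in n.attrib.items() if k != "id"}
             for n in root.findall("node")}
    pipes = [dict(p.attrib) for p in root.findall("pipe")]
    return nodes, pipes

nodes, pipes = load_network(SAMPLE)
print(len(nodes), "nodes,", len(pipes), "pipes")
```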

113 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: PURE (PUblic REquirements dataset), a dataset of 79 publicly available natural language requirements documents collected from the Web, is presented, and its language is compared with generic English texts, showing the peculiarities of the requirements jargon.
Abstract: This paper presents PURE (PUblic REquirements dataset), a dataset of 79 publicly available natural language requirements documents collected from the Web. The dataset includes 34,268 sentences and can be used for natural language processing tasks that are typical in requirements engineering, such as model synthesis, abstraction identification and document structure assessment. It can be further annotated to work as a benchmark for other tasks, such as ambiguity detection, requirements categorisation and identification of equivalent requirements. In the paper, we present the dataset and we compare its language with generic English texts, showing the peculiarities of the requirements jargon, made of a restricted vocabulary of domain-specific acronyms and words, and long sentences. We also present the common XML format to which we have manually ported a subset of the documents, with the goal of facilitating replication of NLP experiments.

83 citations


Journal ArticleDOI
TL;DR: The analysis of results indicates that ifcJSON4 schema developed in this paper is a valid JSON schema that can guide the creation of valid ifc JSON documents to be used for web-based data transfer and to improve interoperability of Cloud-based BIM applications.
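
As a minimal, hypothetical sketch of the underlying idea rather than the actual ifcJSON4 schema, a JSON document describing an IFC-like entity can be checked against a JSON Schema in Python, assuming the third-party jsonschema package is installed.

```python
# Sketch: validating a JSON document against a toy schema, analogous to
# checking ifcJSON documents against an ifcJSON schema. The schema and the
# sample entity are illustrative assumptions, not ifcJSON4 itself.
from jsonschema import validate  # pip install jsonschema

toy_schema = {
    "type": "object",
    "properties": {
        "type": {"type": "string"},
        "globalId": {"type": "string"},
        "name": {"type": "string"},
    },
    "required": ["type", "globalId"],
}

entity = {"type": "IfcWall", "globalId": "2O2Fr$t4X7Zf8NOew3FLOH", "name": "Wall-001"}
validate(instance=entity, schema=toy_schema)  # raises ValidationError if invalid
print("entity conforms to the toy schema")
```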

77 citations


Proceedings ArticleDOI
01 Nov 2017
TL;DR: Transkribus is a comprehensive platform for the computer-aided transcription, recognition and retrieval of digitized historical documents through an open-source desktop application that incorporates means to segment document images, to add a transcription and to tag entities within.
Abstract: Transkribus is a comprehensive platform for the computer-aided transcription, recognition and retrieval of digitized historical documents. The main user interface is provided via an open-source desktop application that incorporates means to segment document images, to add a transcription and to tag entities within. The desktop application is able to connect to the platform's backend, which implements a document management system as well as several tools for document image analysis, such as layout analysis or automatic/handwritten text recognition (ATR/HTR). Access to documents uploaded to the platform may be granted to other users in order to collaborate on the transcription and to share results.

76 citations


Posted Content
TL;DR: In this article, a deep embedding method that combines non-linear embedding with graph priors-based label space modeling is proposed for extreme multi-label (XML) classification, where the label space can be as large as millions.
Abstract: Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves $2^L$ possible label sets, especially when the label dimension $L$ is huge, e.g., in millions for Wikipedia labels. This paper is motivated to better explore the label space by originally establishing an explicit label graph. Meanwhile, deep learning has been widely studied and used in various classification problems, including multi-label classification; however, it has not been properly introduced to XML, where the label space can be as large as millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification, which harvests the ideas of non-linear embedding and graph priors-based label space modeling simultaneously. Extensive experiments on public datasets for XML show that our method performs competitively against state-of-the-art results.

42 citations


Journal ArticleDOI
TL;DR: The proposed Internet of things–based integrated information system is demonstrated to improve the effectiveness of monitoring processes and decision making in construction informatics applications and highlights the crucial importance of a systematic approach toward integrated information systems for effective information collection and structural health monitoring.
Abstract: The intelligent security monitoring of buildings and their surroundings has become increasingly crucial as the number of high-rise buildings increases. Building structural health monitoring and early warning technology are key components of building safety, the implementation of which remains challenging, and the Internet of things approach provides a new technical measure for addressing this challenge. This article presents a novel integrated information system that combines Internet of things, building information management, early warning system, and cloud services. Specifically, the system involves an intelligent data box with enhanced connectivity and exchangeability for accessing and integrating the data obtained from distributed heterogeneous sensing devices. An extensible markup language (XML)–based uniform data parsing model is proposed to abstract the various message formats of heterogeneous devices to ensure data integration. The proposed Internet of things–based integrated information system is demonstrated to improve the effectiveness of monitoring processes and decision making in construction informatics applications.
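
The paper's parsing model itself is not reproduced here; as a hedged sketch of the general idea, heterogeneous device messages could be abstracted into a single uniform XML record along these lines (tag names and message layouts are assumptions).

```python
# Sketch: abstracting heterogeneous sensor messages (JSON and CSV here) into a
# uniform XML record. Tag names and message layouts are illustrative only.
import json
import xml.etree.ElementTree as ET

def to_uniform_xml(device_id, kind, payload):
    rec = ET.Element("reading", {"device": device_id})
    if kind == "json":
        data = json.loads(payload)
    elif kind == "csv":                      # e.g. "timestamp,metric,value"
        ts, metric, value = payload.split(",")
        data = {"timestamp": ts, metric: value}
    else:
        raise ValueError("unknown message format: " + kind)
    for key, value in data.items():
        ET.SubElement(rec, "field", {"name": key}).text = str(value)
    return ET.tostring(rec, encoding="unicode")

print(to_uniform_xml("accel-01", "json", '{"timestamp": 1700000000, "x": 0.02}'))
print(to_uniform_xml("strain-07", "csv", "1700000000,strain,0.0013"))
```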

32 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: The GTR algorithm is presented, an effective and efficient technique to reduce arbitrary test inputs that can be represented as a tree, such as program code, PDF files, and XML documents, and automatically specializes the tree transformations applied by the algorithm based on examples of input trees.
Abstract: Reducing the test input given to a program while preserving some property of interest is important, e.g., to localize faults or to reduce test suites. The well-known delta debugging algorithm and its derivatives automate this task by repeatedly reducing a given input. Unfortunately, these approaches are limited to blindly removing parts of the input and cannot reduce the input by restructuring it. This paper presents the Generalized Tree Reduction (GTR) algorithm, an effective and efficient technique to reduce arbitrary test inputs that can be represented as a tree, such as program code, PDF files, and XML documents. The algorithm combines tree transformations with delta debugging and a greedy backtracking algorithm. To reduce the size of the considered search space, the approach automatically specializes the tree transformations applied by the algorithm based on examples of input trees. We evaluate GTR by reducing Python files that cause interpreter crashes, JavaScript files that cause browser inconsistencies, PDF documents with malicious content, and XML files used to test an XML validator. The GTR algorithm reduces the trees of these files to 45.3%, 3.6%, 44.2%, and 1.3% of the original size, respectively, outperforming both delta debugging and another state-of-the-art algorithm.
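
GTR itself combines tree transformations, delta debugging and greedy backtracking; the toy sketch below illustrates only the simpler underlying idea of greedily removing subtrees while a property of interest still holds, and is not the published algorithm.

```python
# Sketch: greedily drop subtrees while an "interesting" predicate still holds.
# A toy illustration of tree-based input reduction, not the GTR algorithm.

def reduce_tree(tree, interesting):
    """tree is a nested list, e.g. ['root', ['a', ['b']], ['c']]."""
    changed = True
    while changed:
        changed = False
        # Try to drop each child subtree (index 0 holds the node label).
        for i in range(len(tree) - 1, 0, -1):
            candidate = tree[:i] + tree[i + 1:]
            if interesting(candidate):
                tree = candidate
                changed = True
        # Recurse into the remaining children, keeping the context fixed.
        for i in range(1, len(tree)):
            tree[i] = reduce_tree(
                tree[i],
                lambda sub, i=i: interesting(tree[:i] + [sub] + tree[i + 1:]))
    return tree

# Property of interest: the tree still contains a node labelled "bug".
crashes = lambda t: "bug" in repr(t)
original = ["root", ["f", ["x"], ["bug"]], ["g", ["y"]], ["h"]]
print(reduce_tree(original, crashes))   # -> ['root', ['f', ['bug']]]
```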

31 citations


Posted Content
TL;DR: This paper presents a novel programming-by-example approach, and its implementation in a tool called Mitra, for automatically migrating tree-structured documents to relational tables, and shows that Mitra can automate the desired transformation for all datasets.
Abstract: While many applications export data in hierarchical formats like XML and JSON, it is often necessary to convert such hierarchical documents to a relational representation. This paper presents a novel programming-by-example approach, and its implementation in a tool called Mitra, for automatically migrating tree-structured documents to relational tables. We have evaluated the proposed technique using two sets of experiments. In the first experiment, we used Mitra to automate 98 data transformation tasks collected from StackOverflow. Our method can generate the desired program for 94% of these benchmarks with an average synthesis time of 3.8 seconds. In the second experiment, we used Mitra to generate programs that can convert real-world XML and JSON datasets to full-fledged relational databases. Our evaluation shows that Mitra can automate the desired transformation for all datasets.
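
Mitra synthesizes such transformations from input-output examples; the sketch below shows only the kind of tree-to-table flattening a synthesized program might perform (the document layout and field names are assumptions).

```python
# Sketch: flattening a hierarchical (JSON) document into relational rows.
# This illustrates the target of a tree-to-table transformation; it is not
# the Mitra synthesis algorithm, and the field names are assumptions.
import json

doc = json.loads("""
{"orders": [
  {"id": 1, "customer": "Ada",  "items": [{"sku": "A1", "qty": 2},
                                           {"sku": "B4", "qty": 1}]},
  {"id": 2, "customer": "Alan", "items": [{"sku": "A1", "qty": 5}]}
]}
""")

rows = [(order["id"], order["customer"], item["sku"], item["qty"])
        for order in doc["orders"]
        for item in order["items"]]

for row in rows:
    print(row)   # (order_id, customer, sku, qty) tuples ready for a SQL INSERT
```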

28 citations


Journal ArticleDOI
TL;DR: This work evaluates the proposed framework that, taking as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality, and compares two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system.
Abstract: This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer Aided Translation scenario. We evaluate the proposed framework that, taking as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. Therefore, we investigate several strategies to extract and align terminology across languages and to integrate it in an SMT system. We compare two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and cache-based model. We test the cache-based model on two different domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 compared to the widely-used XML markup approach.

20 citations


Proceedings ArticleDOI
01 Sep 2017
TL;DR: In the experiments, hybrid reinsertion has proven the most accurate method to handle markup, while alignment masking and alignment reinsertion should be regarded as viable alternatives.
Abstract: We present work on handling XML markup in Statistical Machine Translation (SMT). The methods we propose can be used to effectively preserve markup (for instance inline formatting or structure) and to place markup correctly in a machine-translated segment. We evaluate our approaches with parallel data that naturally contains markup or where markup was inserted to create synthetic examples. In our experiments, hybrid reinsertion has proven the most accurate method to handle markup, while alignment masking and alignment reinsertion should be regarded as viable alternatives. We provide implementations of all the methods described and they are freely available as an open-source framework.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This work is particularly interested in establishing what methods and tools exist to create OWL ontologies from implicitly expressed semantics, and focuses on popular data formats i.e. XML, JSON, RDF, Relational Databases and NoSQL Databases.
Abstract: From the general SOA architectural pattern, through distributed computing based on Grids and Clouds, to the Internet of Things, the idea of collaboration between software entities, independent from their vendors and technologies, attracts much attention. This brings about a question: how to achieve interoperability among multiple (existing and upcoming) platforms/systems/applications. The context for the presented research is provided by the INTER-IoT project, which deals with different aspects of interoperability in the Internet of Things (IoT). It aims at the design and implementation of an open framework and associated methodology to provide interoperability among heterogeneous IoT platforms, across a software stack (devices, network, middleware, application services, data and semantics). We focus on the data and semantics layer, specifically on the role of ontologies and semantic data processing as means of achieving interoperability. However, since the vision of the Semantic Web remains mostly unfulfilled, semantics remains implicitly "hidden" in data and in exchanged messages. Therefore, we are particularly interested in establishing what methods and tools exist to create OWL ontologies from implicitly expressed semantics. We focus on popular data formats, i.e. XML, JSON, RDF, Relational Databases and NoSQL Databases.
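
As one hedged illustration of lifting implicitly expressed semantics into an ontology-friendly form (not a method prescribed by the paper), XML elements can be mapped to RDF triples with the third-party rdflib package; the namespace, element names and mapping rule below are assumptions.

```python
# Sketch: lifting a small XML fragment into RDF triples with rdflib.
# The namespace, element names and the mapping rule are illustrative
# assumptions, not a method prescribed by the paper.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace, RDF  # pip install rdflib

EX = Namespace("http://example.org/iot#")
xml_fragment = '<sensor id="s1"><type>temperature</type><unit>C</unit></sensor>'

elem = ET.fromstring(xml_fragment)
g = Graph()
subject = EX[elem.get("id")]
g.add((subject, RDF.type, EX.Sensor))
for child in elem:                      # each child element becomes a property
    g.add((subject, EX[child.tag], Literal(child.text)))

print(g.serialize(format="turtle"))
```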

Journal ArticleDOI
TL;DR: An up-to-date review of current XML-based industry-neutral and domain-specific standardization initiatives aiming at achieving seamless interoperability among communication systems is presented, discussing their commonalities and differences, and highlighting directions for further research and development work.

Proceedings ArticleDOI
01 Aug 2017
TL;DR: Multilayer graphs, namely graphs whose labeled edges belong to a number of predetermined classes, have recently been introduced in social network analysis in order to represent the different interaction options between netizens; the potential of applying this new type of graph to an ontological context, essentially creating an ontological tensor, is outlined and its complexity is assessed.
Abstract: Ontology has been an active research field connecting philosophy, logic, history, mathematics, and computer science, to name a few. Within an ontological context defined over a domain, the entities as well as their associated relationships can be represented by the vertices and the edges of a tree. From the latter, new knowledge can then be inferred through a number of techniques including Horn logic from reasoners and RDF triplets. With the advent of the Semantic Web and sophisticated associated software tools, including graph databases such as Neo4j, Sparksee, and TitanDB or XML parsers such as Xerces, graph mining is done efficiently on the semantic level instead of the combinatorial or algebraic ones. Multilayer graphs, namely graphs whose labeled edges belong to a number of predetermined classes, have recently been introduced in social network analysis in order to represent the different interaction options between netizens. In this work, the potential of applying this new type of graph to an ontological context, essentially creating an ontological tensor, is outlined and its complexity is assessed. A human-readable dataset about Apple in the late 1970s and early 1980s, manually constructed from the 2011 officially authorized biography of Steve Jobs and the 1999 film Pirates of Silicon Valley, serves as a concrete example, complete with Neo4j queries.

Journal ArticleDOI
01 Jun 2017
TL;DR: An extensive evaluation of the proposed citation system is conducted, showing that it represents a suitable solution that can be easily employed in real‐world environments and that reduces human intervention on data to a minimum.
Abstract: The practice of citation is foundational for the propagation of knowledge along with scientific development and it is one of the core aspects on which scholarship and scientific publishing rely. Within the broad context of data citation, we focus on the automatic construction of citations problem for hierarchically structured data. We present the "learning to cite" framework, which enables the automatic construction of human- and machine-readable citations with different levels of coarseness. The main goal is to reduce the human intervention on data to a minimum and to provide a citation system general enough to work on heterogeneous and complex XML data sets. We describe how this framework can be realized by a system for creating citations to single nodes within an XML data set and, as a use case, show how it can be applied in the context of digital archives. We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real-world environments and that reduces human intervention on data to a minimum.

Book ChapterDOI
15 Mar 2017
TL;DR: The basic syntax of the DTD is described and it is compared to its two main rivals: W3C XML Schema and RELAX NG.
Abstract: Document Type Definitions (DTDs) are schemas that describe the structure and, to a limited extent, the content of Extensible Markup Language (XML) and Standard Generalized Markup Language (SGML) documents. At its inception, the XML standard inherited the DTD from SGML as its only schema language. Many alternative schema languages have subsequently been developed for XML. But the DTD is still alive and actively used to define narrative-based document types. This entry describes the basic syntax of the DTD and compares it to its two main rivals: W3C XML Schema and RELAX NG.
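
A minimal example of the entry's subject, using an invented document type: a small DTD is declared and an XML document is validated against it with the third-party lxml package.

```python
# Sketch: declaring a tiny DTD and validating an XML document against it.
# The "article" document type here is invented for illustration.
from io import StringIO
from lxml import etree   # pip install lxml

dtd = etree.DTD(StringIO("""
<!ELEMENT article (title, para+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT para    (#PCDATA)>
<!ATTLIST article lang CDATA #IMPLIED>
"""))

doc = etree.fromstring("<article lang='en'><title>DTDs</title><para>Alive.</para></article>")
print(dtd.validate(doc))   # True when the document conforms to the DTD
```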

Posted Content
TL;DR: In this paper, the authors present a complex data warehousing methodology that exploits XML as a pivot language, which includes the integration of complex data in an ODS, under the form of XML documents; their dimensional modeling and storage in an XML data warehouse; and their analysis with combined OLAP and data mining techniques.
Abstract: The data warehousing and OLAP technologies are now moving onto handling complex data that mostly originate from the Web. However, integrating such data into a decision-support process requires their representation in a form processable by OLAP and/or data mining techniques. We present in this paper a complex data warehousing methodology that exploits XML as a pivot language. Our approach includes the integration of complex data into an ODS, in the form of XML documents; their dimensional modeling and storage in an XML data warehouse; and their analysis with combined OLAP and data mining techniques. We also address the crucial issue of performance in XML warehouses.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a framework that considers both semantic and non-semantic data sources and processes them for data interoperability in a Web Objects (WoO) enabled IoT environment; WoO is a service platform that provides a resourceful infrastructure to deploy IoT services in the World Wide Web environment.
Abstract: Data interoperability is a prerequisite for cross-community and cross-application sharing of information and knowledge. Heterogeneous data from multiple sources, including semantic and non-semantic data sources (e.g. SNS data, web data, relational data, RDF, XML, CSV, etc.), have an important effect on IoT service provisioning. The data are not always of the same type or format, so they need to be processed and transformed into a machine-readable semantic format so that systems can understand each other to create and offer services. This paper proposes a framework that considers both semantic and non-semantic data sources and processes them for data interoperability in a Web Objects (WoO) enabled IoT environment. WoO is a service platform that provides a resourceful infrastructure to deploy IoT services in the World Wide Web environment. For data interoperability, heterogeneous data are transformed following a semantic data schema that has been defined using a standard schema. The heterogeneous data are processed and stored in a knowledge base in RDF/OWL format.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This paper proposes the concept of “Positional Token” to overcome the attack on XML signatures and demonstrates the same.
Abstract: The XML signature standard defined by IETF/W3C references or identifies signed elements by their unique identities, specified by "id" attribute values in the given XML document. Hence, signed XML elements can be shifted from one location to another in an XML document, and this still has no effect on the ability to verify their signature. This flexibility paves the way for an attacker to tweak the original XML message without being noticed by the receiver. In this paper we propose the concept of a "Positional Token" to overcome this attack on XML signatures and demonstrate the same.
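
As a minimal illustration of the weakness the paper targets, not of its Positional Token defence: an id-based reference resolves the signed element wherever it sits in the document, so relocating it goes unnoticed. The document layout below is simplified and invented.

```python
# Sketch: an id-based reference resolves to the same element regardless of
# where it sits in the document, which is the flexibility the paper's
# "Positional Token" aims to remove. Simplified, non-normative XML.
import xml.etree.ElementTree as ET

original = """<order><signedItem id="pay-1">amount=100</signedItem>
              <note>unsigned</note></order>"""
relocated = """<order><note>unsigned</note>
               <wrapper><signedItem id="pay-1">amount=100</signedItem></wrapper></order>"""

def resolve_by_id(xml_text, ref_id):
    root = ET.fromstring(xml_text)
    return next(e for e in root.iter() if e.get("id") == ref_id)

for doc in (original, relocated):
    elem = resolve_by_id(doc, "pay-1")
    print(elem.tag, elem.text)   # the same element is found in both layouts
```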

Journal ArticleDOI
01 Jan 2017-Database
TL;DR: The AnnoSys system accesses collection data from either conventional web resources or the Biological Collection Access Service (BioCASe) and accepts XML-based data standards like ABCD or DarwinCore and proposes best practice procedures for digital annotations of complex records.
Abstract: Biological research collections holding billions of specimens world-wide provide the most important baseline information for systematic biodiversity research. Increasingly, specimen data records become available in virtual herbaria and data portals. The traditional (physical) annotation procedure fails here, so that an important pathway of research documentation and data quality control is broken. In order to create an online annotation system, we analysed, modeled and adapted traditional specimen annotation workflows. The AnnoSys system accesses collection data from either conventional web resources or the Biological Collection Access Service (BioCASe) and accepts XML-based data standards like ABCD or DarwinCore. It comprises a searchable annotation data repository, a user interface, and a subscription based message system. We describe the main components of AnnoSys and its current and planned interoperability with biodiversity data portals and networks. Details are given on the underlying architectural model, which implements the W3C OpenAnnotation model and allows the adaptation of AnnoSys to different problem domains. Advantages and disadvantages of different digital annotation and feedback approaches are discussed. For the biodiversity domain, AnnoSys proposes best practice procedures for digital annotations of complex records. Database url https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys.

Journal ArticleDOI
TL;DR: Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications).
Abstract: The objective of this research is to compare the relational and non-relational (NoSQL) database systems approaches in order to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes have been created in order to evaluate and compare the response times (algorithmic complexity) of six queries of growing complexity, which have been performed on them. Similar appropriate results available in the literature have also been considered. Relational and non-relational NoSQL database systems show almost linear algorithmic complexity in query execution. However, they show very different linear slopes, the former being much steeper than the two latter. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. Visualization and editing of EHR extracts are also document-based tasks more appropriate to NoSQL database systems. However, the appropriate database solution much depends on each particular situation and specific problem.

Patent
01 Feb 2017
TL;DR: In this article, the authors present a method and a system for presenting a webpage with a native user interface assembly, relating to the technical field of computers.
Abstract: The embodiment of the invention provides a method and a system for presenting a webpage by a native user interface assembly, and relates to the technical field of computers. The method comprises the following steps: converting an HTML (Hypertext Markup Language) document specific to the webpage into an XML (Extensible Markup Language) document on a server side; receiving an operation instruction for accessing the webpage, sending a webpage request specific to the webpage to the server according to the operation instruction, and receiving the XML document returned by the server, so that the native user interface assembly renders and presents the XML document. The performance and user experience advantages of native applications are utilized: the HTML document specific to the webpage is converted into the XML document, and the XML document is rendered and presented by the native user interface assembly, so that the performance and user experience problems caused by presentation of the HTML document by a webpage window assembly are solved, a presentation effect the same as that of the webpage window assembly is ensured, and better performance and user experience can be realized via presentation by the native user interface assembly.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The paper shows that there are better alternatives than XML & JAXB and gives guidance in choosing the most appropriate serialization format and library depending on the context, especially in the context of the Internet of Things.
Abstract: Communication between DERs (distributed energy resources) and System Operators is required to provide Demand Response and solve some of the problems caused by the intermittency of much Renewable Energy. An important part of efficient communication is serialization, which is important to ensure a high probability of delivery within a given timeframe, especially in the context of the Internet of Things, using low-bandwidth data connections and constrained devices. The paper shows that there are better alternatives than XML & JAXB and gives guidance in choosing the most appropriate serialization format and library depending on the context.

Journal ArticleDOI
TL;DR: The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices, and has the goal of offering federated instances that can be customized to the sites/research performed.
Abstract: An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and offer command-line tools to automate interaction, connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code is hosted on the GitHub platform, with automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefits of the connected web, going beyond what conventional search and analytics platforms offer in this area. It also has the goal of offering federated instances that can be customized to the sites/research performed. Data are stored using JSON, extending upon previous approaches using XML, building structures that support computational chemistry calculations. These structures were developed to make it easy to process data across different languages, and to send data to a JavaScript-based web client.

Journal ArticleDOI
TL;DR: Two database approaches, namely Extensible Markup Language (XML) and JavaScript Object Notation (JSON), were investigated to evaluate their suitability for handling thousands of records of publication data; the results showed JSON is the best choice for query retrieval speed and CPU usage.
Abstract: Big data is the latest industry buzzword to describe large volumes of structured and unstructured data that can be difficult to process and analyze. Most organizations are looking for the best approach to manage and analyze large volumes of data, especially when making decisions. XML is chosen by many organizations because of its powerful approach to retrieval and storage processes. However, with the XML approach, the execution time for retrieving large volumes of data is still considerably inefficient due to several factors. In this contribution, two database approaches, namely Extensible Markup Language (XML) and JavaScript Object Notation (JSON), were investigated to evaluate their suitability for handling thousands of records of publication data. The results showed JSON is the best choice for query retrieval speed and CPU usage. These are essential to cope with the characteristics of publication data. XML and JSON technologies are relatively new in comparison to relational databases. Indeed, JSON technology demonstrates great potential to become a key database technology for handling huge volumes of data that grow annually.
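
The paper's measurements stand on their own; the hedged sketch below merely shows one way such a retrieval comparison could be set up with Python's standard library (record fields and dataset size are assumptions).

```python
# Sketch: timing retrieval of the same publication records stored as XML and
# as JSON. Record fields and the dataset size are illustrative assumptions.
import json, time
import xml.etree.ElementTree as ET

records = [{"id": i, "title": f"Paper {i}", "year": 2000 + i % 18} for i in range(50_000)]
json_text = json.dumps({"pubs": records})
xml_text = "<pubs>" + "".join(
    f'<pub id="{r["id"]}" year="{r["year"]}">{r["title"]}</pub>' for r in records) + "</pubs>"

t0 = time.perf_counter()
hits_json = [r for r in json.loads(json_text)["pubs"] if r["year"] == 2017]
t1 = time.perf_counter()
hits_xml = [e for e in ET.fromstring(xml_text) if e.get("year") == "2017"]
t2 = time.perf_counter()

print(f"JSON: {len(hits_json)} hits in {t1 - t0:.3f}s; XML: {len(hits_xml)} hits in {t2 - t1:.3f}s")
```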

Journal ArticleDOI
Yiqin Zou, Li Quan
TL;DR: A new service-oriented grid-based method to set up and build the agricultural IoT is proposed, and the Web Service Resource Framework (WSRF)-based Agricultural Internet of Things (AIoT) and its encapsulation method are described.
Abstract: The traditional three-layer Internet of things (IoT) model, which includes the physical perception layer, information transferring layer and service application layer, cannot fully express the complexity and diversity of the agricultural engineering area. It is hard to categorize, organize and manage agricultural things with these three layers. Based on the above requirements, we propose a new service-oriented grid-based method to set up and build the agricultural IoT. Considering the heterogeneous, limited, transparent and layered attributes of agricultural things, we propose an abstract model for all agricultural resources. This model is service-oriented and expressed with the Open Grid Services Architecture (OGSA). Information and data of agricultural things are described and encapsulated using XML in this model. Every agricultural engineering application provides a service by enabling one application node in this service-oriented grid. The Web Service Resource Framework (WSRF)-based Agricultural Internet of Things (AIoT) and the encapsulation method for resource management in this model are also described in this paper.

Journal ArticleDOI
TL;DR: It is demonstrated how update, insert, and delete operations affect the consistency of fuzzy spatiotemporal data in XML documents, and algorithms for fixing these inconsistencies are proposed.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: UML sequence diagrams are chosen as the source for extracting software functions, and to automate the measurement process, the XML structure of the sequence diagram is analyzed to fit existing functional and structural equations.
Abstract: This research aims at automated software size measurement from both the functional and the structural view. Software size is used to estimate schedule, effort, cost and other resources in the software development process. Therefore, the best way to measure software size is to derive software attributes from requirements artifacts to obtain an early estimate. The UML sequence diagram was chosen as the source for extracting software functions, since this diagram provides a high level of functional granularity. Functional size is measured using the COSMIC method, while structural size is calculated based on the control structure of the sequence diagram. To automate the measurement process, the XML structure of the sequence diagram is analyzed to fit the existing functional and structural equations. A well-known rice cooker case study is used to illustrate the proposed method. In addition, a simple support tool is provided.

Journal ArticleDOI
TL;DR: By leveraging semantic web technologies, this platform is able to place computational chemistry data onto web portals as a component of a Giant Global Graph (GGG) such that computer agents, as well as individual chemists, can access the data.
Abstract: This paper presents a formal data publishing platform for computational chemistry using semantic web technologies. This platform encapsulates computational chemistry data from a variety of packages in an Extensible Markup Language (XML) file called CSX (Common Standard for eXchange). On the basis of a Gainesville Core (GC) ontology for computational chemistry, a CSX XML file is converted into the JavaScript Object Notation for Linked Data (JSON-LD) format using an XML Stylesheet Language Transformation (XSLT) file. Ultimately the JSON-LD file is converted to subject–predicate–object triples in a Turtle (TTL) file and published on the web portal. By leveraging semantic web technologies, we are able to place computational chemistry data onto web portals as a component of a Giant Global Graph (GGG) such that computer agents, as well as individual chemists, can access the data.
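
As a hedged, minimal illustration of the XML-to-JSON-LD step (the actual CSX schema and XSLT stylesheet are not reproduced here), an XSLT transformation can be applied with the third-party lxml package; the tiny stylesheet and input are invented.

```python
# Sketch: applying an XSLT stylesheet to an XML document with lxml, in the
# spirit of the CSX -> JSON-LD conversion. Stylesheet and input are invented.
from lxml import etree   # pip install lxml

xslt = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/molecule">
    {"@context": "http://example.org/gc", "name": "<xsl:value-of select="@name"/>",
     "energy": <xsl:value-of select="energy"/>}
  </xsl:template>
</xsl:stylesheet>
""")

doc = etree.XML('<molecule name="water"><energy>-76.026</energy></molecule>')
transform = etree.XSLT(xslt)
print(str(transform(doc)))   # a small JSON-LD-like text document
```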

Journal ArticleDOI
TL;DR: A model is proposed that expands the present XML Encryption standard for data with string and numeric types implemented in the sensors, efficiently and discreetly filters matched streaming data and performs summation in the fog nodes, and decrypts the filtered and aggregated data in the subscribers without revealing private data.
Abstract: The Internet of Things provides visions of innovative services and domain-specific applications. With the development of Internet of Things services, various structural data need to be transferred ...