
Showing papers on "XML" published in 2015


Journal ArticleDOI
TL;DR: The implementation of the DNAS framework into the obXML schema will facilitate the development of occupant information modeling (OIM) by providing interoperability between occupant behavior models and building energy modeling programs.

139 citations


Journal ArticleDOI
TL;DR: The Root System Markup Language (RSML) is described; it has been designed to enable portability of root architecture data between different software tools in an easy and interoperable manner, and to provide a standard format upon which to base the central repositories that will soon arise from the expanding worldwide root phenotyping effort.
Abstract: The number of image analysis tools supporting the extraction of architectural features of root systems has increased over the last years. These tools offer a handy set of complementary facilities, yet it is widely accepted that none of these software tools is able to efficiently extract the growing array of static and dynamic features for different types of images and species. We describe the Root System Markup Language (RSML), which has been designed to overcome two major challenges: (i) to enable portability of root architecture data between different software tools in an easy and interoperable manner, allowing seamless collaborative work, and (ii) to provide a standard format upon which to base central repositories, which will soon arise following the expanding worldwide root phenotyping effort. RSML follows the XML standard to store 2D or 3D image metadata, plant and root properties and geometries, continuous functions along individual root paths, and a suite of annotations at the image, plant, or root scale, at one or several time points. Plant ontologies are used to describe botanical entities that are relevant at the scale of root system architecture. An XML schema describes the features and constraints of RSML, and open-source packages have been developed in several languages (R, Excel, Java, Python, C#) to enable researchers to integrate RSML files into popular research workflows.
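As a flavor of what such portability looks like in practice, the following sketch writes and re-reads a minimal RSML-like file with Python's standard library. The element and attribute names are simplified assumptions drawn from the description above, not the normative XML schema.

    import xml.etree.ElementTree as ET

    # Build a minimal RSML-like document: one plant, one root polyline.
    # Names are assumptions for illustration; consult the RSML schema.
    rsml = ET.Element("rsml")
    scene = ET.SubElement(rsml, "scene")
    plant = ET.SubElement(scene, "plant", id="plant_1")
    root = ET.SubElement(plant, "root", id="root_1")
    polyline = ET.SubElement(ET.SubElement(root, "geometry"), "polyline")
    for x, y in [(0.0, 0.0), (1.2, 3.4), (2.5, 7.1)]:
        ET.SubElement(polyline, "point", x=str(x), y=str(y))
    ET.ElementTree(rsml).write("example.rsml", encoding="utf-8",
                               xml_declaration=True)

    # Another tool re-reads the file and recovers the root coordinates.
    doc = ET.parse("example.rsml")
    print([(float(p.get("x")), float(p.get("y"))) for p in doc.iter("point")])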

108 citations


Journal ArticleDOI
TL;DR: This work begins with an introduction to CellML and two of its early adopters, whose limitations eventually led to the development of OpenCOR, an open source modeling environment that is supported on Windows, Linux and OS X.
Abstract: Computational biologists have been developing standards and formats for nearly two decades, with the aim of easing the description and exchange of experimental data, mathematical models, simulation experiments, etc. One of those efforts is CellML (cellml.org), an XML-based markup language for the encoding of mathematical models. Early CellML-based environments include COR and OpenCell. However, both of those tools have limitations and were eventually replaced with OpenCOR (opencor.ws). OpenCOR is an open source modeling environment that is supported on Windows, Linux and OS X. It relies on a modular approach, which means that all of its features come in the form of plugins. Those plugins can be used to organize, edit, simulate and analyze models encoded in the CellML format. We start with an introduction to CellML and two of its early adopters, whose limitations eventually led to the development of OpenCOR. We then go on to describe the general philosophy behind OpenCOR, as well as its openness and its development process. Next, we illustrate various aspects of OpenCOR, such as its user interface and some of the plugins that come bundled with it (e.g. its editing and simulation plugins). Finally, we discuss some of the advantages and limitations of OpenCOR before drawing some concluding remarks.

90 citations


Proceedings ArticleDOI
30 Mar 2015
TL;DR: This work proposes an algorithm that integrates syntactic and semantic validations in order to overcome limitations in the current Schematron design and implementation based on Extensible Stylesheet Language Transformations.
Abstract: Extensible Markup Language (XML) syntax and semantic validations are critical to the correct specification of service transactions and to service integration based on existing distributed heterogeneous computing services. However, the current Schematron design and implementation based on Extensible Stylesheet Language Transformations (XSLT) have limitations in terms of validation correctness and support for system integration. We propose an algorithm that integrates syntactic and semantic validations in order to overcome the aforementioned limitations. The syntactic validation is based on DTD and XSD, and the semantic validation is based on Schematron. The solution is illustrated by several use cases. Our contributions include combining syntactic and semantic validations, designing and implementing a reusable software component for this integrated validation process, and supporting its invocation through the more flexible observer pattern.
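A minimal sketch of such an integrated validation pass using Python's lxml library (file names are hypothetical): the document must first pass syntactic validation against an XSD, then semantic validation against Schematron rules.

    from lxml import etree, isoschematron

    doc = etree.parse("transaction.xml")  # document under test (hypothetical)

    # Step 1: syntactic validation against an XML Schema (XSD).
    xsd = etree.XMLSchema(etree.parse("transaction.xsd"))
    if not xsd.validate(doc):
        raise ValueError("XSD validation failed: %s" % xsd.error_log)

    # Step 2: semantic validation against Schematron business rules.
    sch = isoschematron.Schematron(etree.parse("rules.sch"), store_report=True)
    if not sch.validate(doc):
        raise ValueError("Schematron validation failed: %s" % sch.validation_report)

    print("document is syntactically and semantically valid")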

82 citations


Proceedings ArticleDOI
01 Nov 2015
TL;DR: A brief review of applications that use AIML chatbots for their conversational services, showing how they not only provide useful services but also interact with customers and solve their problems through an AIML chatbot instead of a human being.
Abstract: Artificial Intelligence Markup Language (AIML) is derived from Extensible Markup Language (XML) and is used to build conversational agents (chatbots). A lot of work has been done on building conversational agents, and AIML's low cost, easy configuration, and availability make it usable in a wide range of applications. In this paper, we give a brief review of applications that use AIML chatbots for their conversational services. These applications relate to cultural heritage, e-learning, e-government, web-based models, dialog models, semantic analysis frameworks, interaction frameworks, humorist experts, network management, and adaptive modular architectures. In these applications, AIML chatbots not only provide useful services but also interact with customers and solve their problems in place of human beings, which makes them increasingly popular with entrepreneurs and users seeking efficient service.
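A minimal AIML category and a round trip through an interpreter look as follows; the sketch assumes the third-party PyAIML package (installable as aiml, or its Python 3 fork python-aiml).

    import aiml  # assumption: pip install python-aiml

    # A single AIML category: one input pattern and its canned response.
    with open("greeting.aiml", "w") as f:
        f.write("""<?xml version="1.0" encoding="UTF-8"?>
    <aiml version="1.0">
      <category>
        <pattern>HELLO</pattern>
        <template>Hi! How can I help you today?</template>
      </category>
    </aiml>""")

    kernel = aiml.Kernel()
    kernel.learn("greeting.aiml")
    print(kernel.respond("hello"))  # -> Hi! How can I help you today?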

57 citations


Book ChapterDOI
11 Apr 2015
TL;DR: In this paper, the authors present a generic policy analysis framework that employs SMT as the underlying reasoning mechanism and demonstrate that a wide range of security properties proposed in the literature can be easily modeled within the framework.
Abstract: The eXtensible Access Control Markup Language (XACML) is an extensible and flexible XML language for the specification of access control policies. However, the richness and flexibility of the language, along with the verbose syntax of XML, come with a price: errors are easy to make and difficult to detect when policies grow in size. If these errors are not detected and rectified, they can result in serious data leakage and/or privacy violations, leading to significant legal and financial consequences. To assist policy authors in the analysis of their policies, several policy analysis tools have been proposed based on different underlying formalisms. However, most of these tools either abstract away functions over non-Boolean domains, and hence cannot provide information about them, or produce very large encodings that hinder performance. In this paper, we present a generic policy analysis framework that employs SMT as the underlying reasoning mechanism. The use of SMT not only allows a more fine-grained analysis of policies but also improves performance. We demonstrate that a wide range of security properties proposed in the literature can be easily modeled within the framework. A prototype implementation and its evaluation are also provided.
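To make the SMT idea concrete, here is a toy sketch using the Z3 Python bindings: the conditions of a Permit and a Deny rule over a non-Boolean (integer) attribute are encoded as formulas, and the solver decides whether both rules can ever fire on the same request. The encoding is illustrative only, not the paper's actual XACML translation.

    from z3 import Int, And, Solver, sat  # assumption: pip install z3-solver

    age = Int("age")                    # a non-Boolean request attribute

    permit = And(age >= 18, age <= 65)  # condition of a Permit rule
    deny = age >= 60                    # condition of a Deny rule

    s = Solver()
    s.add(permit, deny)                 # can both apply to one request?
    if s.check() == sat:
        print("conflict possible, e.g. age =", s.model()[age])
    else:
        print("the rules never overlap")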

53 citations


Journal ArticleDOI
TL;DR: This work presents an XML-based wrapper application for the Insight Toolkit that combines the performance of a pure C++ implementation with an easy-to-use graphical setup of dynamic image analysis pipelines; the tool was successfully applied to the automated analysis of terabyte-scale, time-resolved 3D image data of zebrafish embryos.
Abstract: The Insight Toolkit offers plenty of features for multidimensional image analysis. Current implementations, however, often suffer either from a lack of flexibility due to hard-coded C++ pipelines for a certain task or from slow execution times, e.g. caused by inefficient implementations or multiple read/write operations for separate filter execution. We present an XML-based wrapper application for the Insight Toolkit that combines the performance of a pure C++ implementation with an easy-to-use graphical setup of dynamic image analysis pipelines. Created XML pipelines can be interpreted and executed by XPIWIT in console mode, either locally or on large clusters. We successfully applied the software tool for the automated analysis of terabyte-scale, time-resolved 3D image data of zebrafish embryos. AVAILABILITY AND IMPLEMENTATION: XPIWIT is implemented in C++ using the Insight Toolkit and the Qt SDK. It has been successfully compiled and tested under Windows and Unix-based systems. Software and documentation are distributed under the Apache 2.0 license and are publicly available for download at https://bitbucket.org/jstegmaier/xpiwit/downloads/. CONTACT: johannes.stegmaier@kit.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

47 citations


Journal ArticleDOI
TL;DR: This approach combines several different software technologies: the C language for microcontroller programming, MATLAB to chart data, Simulink to control the system, AJAX and XML for the client-side application, web services in the server-side application, and a serial I2C bus interface to read/write data from/to the Feedback system.

45 citations


Proceedings ArticleDOI
02 Aug 2015
TL;DR: A conceptual design for an additive manufacturing integrated data model, AMIDM, is proposed based on a well-defined product lifecycle management (PLM) data modeling method called PPR (product, process, and resource).
Abstract: Large amounts of data are generated, exchanged, and used during an additive manufacturing (AM) build. While the AM data from a single build is essential for establishing part traceability, when methodically collected, the full processing history of thousands of components can be mined to advance our understanding of AM processes. Hence, this full body of data must be captured, stored, and properly managed for easy query and analysis. An innovative, AM-specific data model is necessary for establishing a comprehensive AM information management system. This paper introduces our work towards designing a complete and integrated data model for AM processes. We begin by defining the scope and specifying the requirements of such a data schema. We investigate how information created and exchanged in the AM process chain is identified based on an AM process activity diagram. A comprehensive survey shows that existing AM standards are unable to provide both the breadth and the depth needed for an integrated AM information model. We propose a conceptual design for an additive manufacturing integrated data model, AMIDM, based on a well-defined product lifecycle management (PLM) data modeling method called PPR (product, process, and resource). The proposed AM model has a core scheme composed of product, process, and resource entities. The process entities play critical roles in transforming product input into product output using assigned resources such as equipment, material, personnel, and software tools. The proposed model has been applied to an information system design for Powder Bed Fusion based AM experimental data management. An XML (eXtensible Markup Language) schema is presented in the paper to demonstrate the effectiveness of the conceptual model.
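A toy rendering of the PPR core scheme as XML, with a process entity transforming a product input into a product output using assigned resources; all element and attribute names below are illustrative, not the actual AMIDM schema.

    import xml.etree.ElementTree as ET

    build = ET.Element("amBuild", id="build_042")
    ET.SubElement(build, "product", id="bracket_v3", material="Ti-6Al-4V")
    process = ET.SubElement(build, "process", type="powderBedFusion",
                            productIn="powder_lot_17", productOut="bracket_v3")
    resources = ET.SubElement(process, "resources")
    ET.SubElement(resources, "equipment", id="pbf_machine_1")
    ET.SubElement(resources, "material", id="powder_lot_17")
    ET.SubElement(resources, "personnel", id="operator_9")
    ET.SubElement(resources, "software", id="slicer_2.4")

    print(ET.tostring(build, encoding="unicode"))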

40 citations


Journal ArticleDOI
TL;DR: The SXSI system performs on par with or better than the fastest known systems, MonetDB and Qizx, on pure tree queries, and on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude.
Abstract: Extensible Markup Language (XML) documents consist of text data plus structured data markup. XPath allows querying both text and structure. Evaluating such hybrid queries is challenging. We present a system for in-memory evaluation of XPath search queries, that is, queries with text and structure predicates, yet without advanced features such as backward axes, arithmetics, and joins. We show that for this query fragment, which contains Forward Core XPath, our system, dubbed Succinct XML Self-Index (SXSI), outperforms existing systems by 1-3 orders of magnitude. SXSI is based on state-of-the-art indexes for text and structure data. It combines two novelties. On one hand, it represents the XML data in a compact indexed form, which allows it to handle larger collections in main memory while supporting powerful search and navigation operations over the text and the structure. On the other hand, it features an execution engine that uses tree automata and cleverly chooses evaluation orders that leverage the speeds of the respective indexes. SXSI is modular and allows seamless replacement of its indexes. This is demonstrated through experiments with (1) a text index specialized for search of bio sequences, and (2) a word-based text index specialized for natural language search.
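The query fragment in question mixes structural navigation with text predicates. The baseline below evaluates one such hybrid XPath query with Python's lxml over an in-memory tree; SXSI answers the same class of queries, but from its succinct text and structure indexes.

    from lxml import etree

    doc = etree.fromstring("""<library>
      <book year="2013"><title>XPath in Practice</title></book>
      <book year="2015"><title>Succinct Indexes</title></book>
    </library>""")

    # Hybrid query: structure (book/title) plus a text predicate (contains).
    print(doc.xpath('//book[contains(title/text(), "Succinct")]/@year'))
    # -> ['2015']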

36 citations


Journal ArticleDOI
TL;DR: SED-ML L1V3 extends SED-ML with the means to describe which datasets, and subsets thereof, to use within a simulation experiment; the language also encodes which models to use and how results should be plotted and reported.
Abstract: The creation of computational simulation experiments to inform modern biological research poses challenges to reproduce, annotate, archive, and share such experiments. Efforts such as SBML or CellML standardize the formal representation of computational models in various areas of biology. The Simulation Experiment Description Markup Language (SED-ML) describes what procedures the models are subjected to, and the details of those procedures. These standards, together with further COMBINE standards, describe models sufficiently well for the reproduction of simulation studies among users and software tools. SED-ML is an XML-based format that encodes, for a given simulation experiment, (i) which models to use; (ii) which modifications to apply to models before simulation; (iii) which simulation procedures to run on each model; (iv) how to post-process the data; and (v) how these results should be plotted and reported. SED-ML Level 1 Version 1 (L1V1) implemented support for the encoding of basic time course simulations. SED-ML L1V2 added support for more complex types of simulations, specifically repeated tasks and chained simulation procedures. SED-ML L1V3 extends L1V2 with the means to describe which datasets, and subsets thereof, to use within a simulation experiment.
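A skeleton of the five parts (i)-(v) in a single SED-ML document; the element names approximate the published SED-ML vocabulary and should be checked against the L1V3 specification.

    sedml = """<?xml version="1.0" encoding="UTF-8"?>
    <sedML xmlns="http://sed-ml.org/sed-ml/level1/version3" level="1" version="3">
      <listOfModels>                                       <!-- (i), (ii) -->
        <model id="m1" language="urn:sedml:language:sbml" source="model.xml"/>
      </listOfModels>
      <listOfSimulations>                                  <!-- (iii) -->
        <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
            outputEndTime="100" numberOfPoints="1000"/>
      </listOfSimulations>
      <listOfTasks>
        <task id="t1" modelReference="m1" simulationReference="sim1"/>
      </listOfTasks>
      <listOfDataGenerators/>                              <!-- (iv) -->
      <listOfOutputs/>                                     <!-- (v) -->
    </sedML>"""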

Journal ArticleDOI
TL;DR: The SPARQL2XQuery Framework, as discussed by the authors, provides a mapping model for the expression of OWL/RDF-S to XML Schema mappings, as well as a method for SPARQL to XQuery translation.
Abstract: In the context of the emergent Web of Data, a large number of organizations, institutes and companies (e.g., DBpedia, Data.gov, GeoNames, PubMed) adopt the Linked Data practices. Utilizing the Semantic Web (SW) technologies, they publish their data and offer SPARQL endpoints (i.e., SPARQL-based search services). On the other hand, the dominant standard for information exchange in the Web today is XML. Additionally, many international standards (e.g., Dublin Core, MPEG-7, METS, TEI, IEEE LOM) in several domains (e.g., Digital Libraries, GIS, Multimedia, e-Learning) have been expressed in XML Schema. The aforementioned have led to an increasing emphasis on XML data, accessed using the XQuery query language. The SW and XML worlds and their developed infrastructures are based on different data models, semantics and query languages. Thus, it is crucial to develop interoperability mechanisms that allow the Web of Data users to access XML datasets, using SPARQL, from their own working environments. It is unrealistic to expect that all the existing legacy data (e.g., Relational, XML, etc.) will be transformed into SW data. Therefore, publishing legacy data as Linked Data and providing SPARQL endpoints over them has become a major research challenge. In this direction, we introduce the SPARQL2XQuery Framework, which creates an interoperable environment where SPARQL queries are automatically translated to XQuery queries, in order to access XML data across the Web. The SPARQL2XQuery Framework provides a mapping model for the expression of OWL/RDF-S to XML Schema mappings as well as a method for SPARQL to XQuery translation. To this end, our Framework supports both manual and automatic mapping specification between ontologies and XML Schemas. In the automatic mapping specification scenario, the SPARQL2XQuery exploits the XS2OWL component which transforms XML Schemas into OWL ontologies. Finally, extensive experiments have been conducted in order to evaluate the schema transformation, mapping generation, query translation and query evaluation efficiency, using both real and synthetic datasets.
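The flavor of the translation can be shown with a toy pair: a SPARQL basic graph pattern and a hand-written XQuery FLWOR expression over a corresponding XML document. The pair is purely illustrative; the Framework derives such translations automatically from the OWL/RDF-S-to-XML Schema mappings.

    # Source: SPARQL over a (hypothetical) books ontology.
    sparql = """
    SELECT ?title WHERE {
      ?book a ex:Book ;
            ex:title ?title ;
            ex:year  ?y .
      FILTER (?y > 2010)
    }"""

    # Target: an equivalent XQuery over the mapped XML schema.
    xquery = """
    for $book in doc('books.xml')//book
    where xs:integer($book/year) > 2010
    return $book/title/text()"""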

Proceedings ArticleDOI
16 May 2015
TL;DR: This work constructed a full island grammar capable of modeling the set of 700,000 Stack Overflow discussions about Java, building a heterogeneous abstract syntax tree (H-AST) for each post (question, answer, or comment) in a discussion.
Abstract: Stack Overflow is the de facto Question and Answer (Q&A) website for developers, and it has been used in many approaches by software engineering researchers to mine useful data. However, the contents of a Stack Overflow discussion are inherently heterogeneous, mixing natural language, source code, stack traces, and configuration files in XML or JSON format. We constructed a full island grammar capable of modeling the set of 700,000 Stack Overflow discussions about Java, building a heterogeneous abstract syntax tree (H-AST) for each post (question, answer, or comment) in a discussion. The resulting dataset models every Stack Overflow discussion, providing a full H-AST for each type of structured fragment (i.e., JSON, XML, Java, stack traces), and complementing this information with a set of basic meta-information, like term frequency, to enable natural language analyses. Our dataset allows the end user to perform combined analyses of Stack Overflow data by visiting the H-AST of a discussion.
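The island-parsing idea, recognizing structured "islands" in a sea of natural language, can be approximated with a few recognizers. The sketch below merely labels a fragment; the actual island grammar builds a full AST for each recognized island.

    import json
    from xml.etree import ElementTree as ET

    def classify_fragment(text: str) -> str:
        """Label a fragment as JSON, XML, Java-like code, or prose."""
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
        try:
            ET.fromstring(text)
            return "xml"
        except ET.ParseError:
            pass
        if any(kw in text for kw in ("public ", "class ", "void ", ";")):
            return "java-like"
        return "natural-language"

    print(classify_fragment('{"id": 42}'))             # json
    print(classify_fragment("<config><a/></config>"))  # xml
    print(classify_fragment("public class Foo {}"))    # java-like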

Journal ArticleDOI
TL;DR: This document provides the specification for Version 5 of SBML Level 2, which defines the data structures prescribed by SBML as well as their encoding in XML, the eXtensible Markup Language.
Abstract: Computational models can help researchers to interpret data, understand biological function, and make quantitative predictions. The Systems Biology Markup Language (SBML) is a file format for representing computational models in a declarative form that can be exchanged between different software systems. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. By supporting SBML as an input/output format, different tools can all operate on an identical representation of a model, removing opportunities for translation errors and assuring a common starting point for analyses and simulations. This document provides the specification for Version 5 of SBML Level 2. The specification defines the data structures prescribed by SBML as well as their encoding in XML, the eXtensible Markup Language. This specification also defines validation rules that determine the validity of an SBML document, and provides many examples of models in SBML form. Other materials and software are available from the SBML project web site, http://sbml.org.
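A minimal SBML Level 2 model skeleton gives a sense of the encoding; the namespace and element names follow the usual SBML pattern but should be checked against the specification itself.

    sbml = """<?xml version="1.0" encoding="UTF-8"?>
    <sbml xmlns="http://www.sbml.org/sbml/level2/version5" level="2" version="5">
      <model id="toy_pathway">
        <listOfCompartments>
          <compartment id="cell" size="1"/>
        </listOfCompartments>
        <listOfSpecies>
          <species id="A" compartment="cell" initialAmount="10"/>
          <species id="B" compartment="cell" initialAmount="0"/>
        </listOfSpecies>
        <listOfReactions>
          <reaction id="A_to_B">
            <listOfReactants><speciesReference species="A"/></listOfReactants>
            <listOfProducts><speciesReference species="B"/></listOfProducts>
          </reaction>
        </listOfReactions>
      </model>
    </sbml>"""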

Journal ArticleDOI
TL;DR: Extended Dominance (E-Dominance) is proposed as a new feature selection criterion and compared favorably with usual feature selection methods based on document frequency, information gain, and entropy on a collection of XML documents from Hamshahri2, commonly used in Persian text classification.
Abstract: With the rapid growth of the World Wide Web and the increasing availability of electronic documents, automatic text classification has become a general and important machine learning problem in the text mining domain. In text classification, feature selection is used to reduce the size of the feature vector and to improve classifier performance. This paper improves Dominance, a feature selection criterion, and proposes Extended Dominance (E-Dominance) as a new criterion. E-Dominance compares favorably with the usual feature selection methods based on document frequency (DF), information gain (IG), entropy, χ2, and Dominance on a collection of XML documents from Hamshahri2, which is commonly used in Persian text classification. The comparative study confirms the effectiveness of the proposed feature selection criterion derived from Dominance.
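For reference, the standard baselines are easy to state in code. A minimal sketch of binary-class information gain, IG(t) = H(C) - H(C|t), over documents represented as term sets with 0/1 labels; the Dominance and E-Dominance criteria themselves are defined in the paper.

    from math import log2

    def entropy(pos: int, neg: int) -> float:
        """Shannon entropy of a two-class split."""
        total = pos + neg
        return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

    def info_gain(term, docs):
        """IG(t) = H(C) - H(C|t); docs is a list of (term_set, label) pairs."""
        pos = sum(lbl for _, lbl in docs)
        h_class = entropy(pos, len(docs) - pos)
        h_cond = 0.0
        for split in ([d for d in docs if term in d[0]],
                      [d for d in docs if term not in d[0]]):
            if split:
                p = sum(lbl for _, lbl in split)
                h_cond += len(split) / len(docs) * entropy(p, len(split) - p)
        return h_class - h_cond

    docs = [({"xml", "schema"}, 1), ({"xml"}, 1),
            ({"sport"}, 0), ({"sport", "xml"}, 0)]
    print(info_gain("schema", docs))  # ~0.31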

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an approach that automatically diversifies XML keyword search based on its different contexts in the XML data, and two efficient algorithms are proposed to incrementally compute top-k qualified query candidates as the diversified search intentions.
Abstract: While keyword queries empower ordinary users to search vast amounts of data, their ambiguity makes it difficult to answer them effectively, especially for short and vague queries. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. Given a short and vague keyword query and the XML data to be searched, we first derive keyword search candidates of the query using a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates must be most relevant to the given query while covering a maximal number of distinct results. Finally, a comprehensive evaluation on real and synthetic datasets demonstrates the effectiveness of our proposed diversification model and the efficiency of our algorithms.
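The two criteria translate into a classic greedy trade-off: repeatedly pick the candidate that combines high relevance with coverage of results not yet covered. A toy sketch, with a scoring function that merely stands in for the paper's diversification model:

    def diversified_top_k(candidates, k):
        """Greedy top-k selection. `candidates` maps each query candidate
        to (relevance, set_of_result_ids); the score is a toy stand-in."""
        chosen, covered = [], set()
        pool = dict(candidates)
        while pool and len(chosen) < k:
            best = max(pool, key=lambda c: pool[c][0] + len(pool[c][1] - covered))
            chosen.append(best)
            covered |= pool.pop(best)[1]
        return chosen

    candidates = {
        "apple fruit":    (0.9, {1, 2, 3}),
        "apple computer": (0.8, {4, 5}),
        "apple records":  (0.5, {2, 3}),
    }
    print(diversified_top_k(candidates, 2))  # ['apple fruit', 'apple computer']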

Journal ArticleDOI
TL;DR: The mzDB described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.

Journal ArticleDOI
TL;DR: A new technique is presented for the interpretation of the International Standards Organization (ISO) 6983 data interface model; it is able to interpret ISO 6983 data and translate it to the internal structure required by the CNC machine.
Abstract: Computer numerical control (CNC) technology is a key technology in machine tools and is also the basis of industrial unit computerization. CNC machines are operated by controllers, which have a software module inside known as the interpreter. The function of an interpreter is to extract data from computer-aided manufacturing (CAM) system-generated code and convert it to controller motion commands. However, with the development of numerical control technology, existing CNC systems are limited by interpreters lacking expandability, modularity, and openness. In order to overcome these problems, open architecture control (OAC) technology was employed in the CNC controller. In this paper, a new technique is presented for the interpretation of the International Standards Organization (ISO) 6983 data interface model. The proposed technique is able to interpret ISO 6983 data and translate it to the internal structure required by the CNC machine. It takes an input file in text format and extracts position, feed rate, tool, spindle, and other data. It is also able to generate output in text and eXtensible Markup Language (XML) files as per a user-defined file structure. The use of .txt and .xml files enables shop floor data modification and internet accessibility in the CNC system. The paper includes an introduction, a brief description of the architecture, the algorithm design, the operational pattern, and validation of the system through experimental studies. The output of these experiments illustrated satisfactory performance of the interpreter with an OAC CNC system.
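A toy version of the interpretation step, splitting an ISO 6983 (G-code) block into address/value words and emitting them as XML, can be written in a few lines; the element names are illustrative and not the paper's user-defined file structure.

    import re
    import xml.etree.ElementTree as ET

    MEANING = {"N": "lineNumber", "G": "prepFunction", "X": "xPosition",
               "Y": "yPosition", "F": "feedRate", "S": "spindleSpeed",
               "T": "tool", "M": "miscFunction"}

    def block_to_xml(block: str) -> str:
        """Extract address/value words from one NC block and emit XML."""
        root = ET.Element("ncBlock")
        for address, value in re.findall(r"([A-Z])([-+]?[\d.]+)", block):
            word = ET.SubElement(root, MEANING.get(address, "word"),
                                 address=address)
            word.text = value
        return ET.tostring(root, encoding="unicode")

    print(block_to_xml("N10 G01 X20.5 Y-4.0 F300 S1200 T02 M03"))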

Journal ArticleDOI
01 Feb 2015
TL;DR: This work addresses the problem of authenticating pattern matching queries over textual data outsourced to an untrusted cloud server; by employing cryptographic accumulators in a novel optimal integrity-checking tool built directly over a suffix tree, it designs the first authenticated data structure for verifiable answers to pattern matching queries featuring fast generation of constant-size proofs.
Abstract: We address the problem of authenticating pattern matching queries over textual data that is outsourced to an untrusted cloud server. By employing cryptographic accumulators in a novel optimal integrity-checking tool built directly over a suffix tree, we design the first authenticated data structure for verifiable answers to pattern matching queries featuring fast generation of constant-size proofs. We present two main applications of our new construction to authenticate: (i) pattern matching queries over text documents, and (ii) exact path queries over XML documents. Answers to queries are verified by proofs of size at most 500 bytes for text pattern matching, and at most 243 bytes for exact path XML search, independently of the document or answer size. By design, our authentication schemes can also be parallelized to offer extra efficiency during data outsourcing. We provide a detailed experimental evaluation of our schemes showing that for both applications the times required to compute and verify a proof are very small, e.g., it takes less than 10 μs to generate a proof for a pattern (mis)match of 10^2 characters in a text of 10^6 characters, once the query has been evaluated.

Journal ArticleDOI
TL;DR: The main contribution of the paper is to provide a reference point for future research by collecting different techniques and methods concerning the topic, classifying them into a number of categories, and creating a complete bibliography of the major published works.
Abstract: Over the past decade, there has been increasing interest in using the eXtensible Markup Language (XML), which has made it a de facto standard for representing and exchanging data over different systems and platforms (specifically the internet). Due to the popularity of XML and the increasing number of XML documents, the process of knowledge discovery from this type of data has attracted more attention. Although several different methods have been proposed for mining XML documents in the last decade, this research field is still in its infancy compared to traditional data mining. As with relational techniques, association rule mining over XML documents attracts strong research interest. In this paper we perform a comprehensive study of all of the major works so far done on mining association rules from XML documents. The main contribution of the paper is to provide a reference point for future research by collecting different techniques and methods concerning the topic, classifying them into a number of categories, and creating a complete bibliography of the major published works. We think this paper can help researchers in the XML association rule mining domain to quickly find the current work as a basis for future activities.

Journal ArticleDOI
TL;DR: The early query answering algorithm turns out to be earliest in practice on most queries from the XPathMark benchmark, and tight upper complexity bounds on time and space consumption are proved.

Journal ArticleDOI
TL;DR: In this article, the authors describe CellML 1.1, an XML-based language for describing and exchanging models of cellular and subcellular processes, in which embedded MathML is used to define the underlying mathematics of models.
Abstract: This document specifies CellML 1.1, an XML-based language for describing and exchanging models of cellular and subcellular processes. MathML embedded in CellML documents is used to define the underlying mathematics of models. Models consist of a network of reusable components, each with variables and equations manipulating those variables. Models may import other models to create systems of increasing complexity. Metadata may be embedded in CellML documents using RDF.
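A skeleton of a CellML 1.1 model, one component with two variables and an embedded MathML equation (dx/dt = -x), gives a sense of the encoding; attribute details should be checked against the specification.

    cellml = """<?xml version="1.0" encoding="UTF-8"?>
    <model name="toy_decay" xmlns="http://www.cellml.org/cellml/1.1#">
      <component name="main">
        <variable name="t" units="second"/>
        <variable name="x" units="dimensionless" initial_value="1"/>
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <apply><eq/>
            <apply><diff/><bvar><ci>t</ci></bvar><ci>x</ci></apply>
            <apply><minus/><ci>x</ci></apply>
          </apply>
        </math>
      </component>
    </model>"""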

Journal ArticleDOI
TL;DR: This paper designs and builds XACS (XML Access Control System), which is capable of fine-grained access control, and suggests an empirical telemedicine application to confirm the adequacy and validity of the proposed method.
Abstract: Because XML can describe meaningful information directly, it can supply a standard format for exchanging the large volumes of data generated by a company's databases and application programs. Given the growing need for efficient management and security of massive volumes of XML data in telemedicine, it is necessary to develop a secure access control mechanism for XML. Existing access control has not taken information structures and semantics fully into consideration, due to the fundamental limitations of HTML; in addition, access control for XML documents often allows read operations only, and the complex authorization evaluation process slows down system performance. To resolve these problems, this paper designs and builds XACS (XML Access Control System), which is capable of fine-grained access control: users searching XML documents in telemedicine are authorized to access only the specific items that correspond to their authority levels. To accomplish this, XACS eliminates the inaccessible parts of a document and transmits the accessible parts according to the user's authority level. Because XML documents are served from normal web sites, the system can also be deployed on existing web servers. Security guidelines for telemedicine are provided to enable a quick and precise understanding of the information, thereby improving safety. Finally, this paper presents an empirical telemedicine application to confirm the adequacy and validity of the proposed method.
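The pruning idea, removing inaccessible subtrees before the document leaves the server, can be sketched with lxml; the role-to-XPath policy table below is hypothetical.

    from lxml import etree

    # Hypothetical policy: the XPaths each role is NOT allowed to see.
    DENIED = {"nurse": ["//patient/diagnosis", "//patient/billing"],
              "billing": ["//patient/diagnosis"],
              "doctor": []}

    def filter_for_role(xml_doc: bytes, role: str) -> bytes:
        """Strip subtrees the role may not access; return the remainder."""
        doc = etree.fromstring(xml_doc)
        for xpath in DENIED.get(role, []):
            for node in doc.xpath(xpath):
                node.getparent().remove(node)
        return etree.tostring(doc)

    record = b"<patient><name>J. Doe</name><diagnosis>flu</diagnosis>" \
             b"<billing>42 USD</billing></patient>"
    print(filter_for_role(record, "nurse").decode())
    # -> <patient><name>J. Doe</name></patient>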

Journal ArticleDOI
TL;DR: A rule-based approach is presented for the generation of ancient Chinese architecture from the Song dynasty; based on the special module system and the hierarchical topology of structural patterns in traditional Chinese architecture, it parameterizes the wooden elements of buildings and formalizes the construction rules for different architectural styles.
Abstract: Ancient Chinese architecture from the Song dynasty is a prominent example of the ancient oriental architectures. The cai-fen system was a module system used for the carpentry of Song architectures, which was specified by the governmental manual, the Yingzao Fashi (State Building Standards) compiled by Li Jie [1103]. We present a rule-based approach for the generation of ancient Chinese architectures from the Song dynasty. Based on the special module system and the hierarchical topology of structural patterns in traditional Chinese architectures, the approach parameterizes the wooden elements of buildings and formalizes the construction rules for different architecture styles. In the approach, XML-based description files are generated for displaying the construction process. What the approach generates are standard architectures that strictly follow the ancient Chinese governmental manual. To demonstrate the efficiency of our approach, architectures in different styles have been generated based on their corresponding rules. The fundamental difference between our approach and previous works is that we apply and implement the module system in the digitalization of ancient Chinese architecture.

Proceedings ArticleDOI
03 Aug 2015
TL;DR: Most of the studied parsers are found to be vulnerable, and so are the systems that use them; this is strong motivation for developers to provide security measures to thwart BIL and XXE attacks before deployment when adopting existing XML parsers.
Abstract: The Extensible Markup Language (XML) is extensively used in software systems and services. Various XML-based attacks, which may result in sensitive information leakage or denial of service, have been discovered and published. However, due to development time pressures and limited security expertise, such attacks are often overlooked in practice. In this paper, following a rigorous and extensive experimental process, we study the presence of two types of XML-based attacks, BIL (Billion Laughs) and XXE (XML External Entity), in 13 popular XML parsers. Furthermore, we investigate whether open-source systems that adopt a vulnerable XML parser apply any mitigation to prevent such attacks. Our objective is to provide clear and solid scientific evidence about the extent of the threat associated with such XML-based attacks and to discuss the implications of the obtained results. Our conclusion is that most of the studied parsers are vulnerable, and so are the systems that use them. Such strong evidence can be used to raise awareness among software developers and is a strong motivation for developers to provide security measures to thwart BIL and XXE attacks before deployment when adopting existing XML parsers.
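Both attacks hinge on XML entity processing, and in a library such as Python's lxml the mitigation is a matter of parser configuration. A sketch with a deliberately downscaled billion-laughs payload (real ones nest far deeper):

    from lxml import etree

    payload = b"""<?xml version="1.0"?>
    <!DOCTYPE lolz [
      <!ENTITY lol "lol">
      <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;">
      <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;">
    ]>
    <lolz>&lol3;</lolz>"""

    # Hardened parser: refusing entity resolution and network access blocks
    # both BIL (entity blow-up) and XXE (external entity fetch); the
    # defusedxml package is another common mitigation.
    hardened = etree.XMLParser(resolve_entities=False, no_network=True)
    doc = etree.fromstring(payload, parser=hardened)
    print(etree.tostring(doc))  # the entity reference is left unexpanded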

Patent
11 Mar 2015
TL;DR: A recording and broadcasting equipment-based intelligent teaching information processing device is proposed, comprising a teacher PC (personal computer), recording cameras, image detection cameras, pickup transmitters, an audio processing device, and a recording and broadcasting host.
Abstract: The invention relates to a recording and broadcasting equipment-based intelligent teaching information processing device, which comprises a teacher PC (personal computer), recording cameras, image detection cameras, pickup transmitters, an audio processing device, and a recording and broadcasting host. The teacher PC, the recording cameras, and the image detection cameras are connected with the recording and broadcasting host; the pickup transmitters are connected with the recording and broadcasting host through the audio processing device. The device also comprises a tracking host, through which the teacher PC, the recording cameras, and the image detection cameras are connected with the recording and broadcasting host. With this device, teaching information can be automatically marked to form an XML (eXtensible Markup Language) file during a teacher's lesson by virtue of an AVA intelligent recording and broadcasting system; knowledge points and teaching link information can be recorded synchronously for retrieval by the teacher and students after a video is recorded; and rapid jumping to the corresponding time points for viewing is achieved by virtue of virtual slice information embedded into a web player and a mobile terminal player.

Journal ArticleDOI
01 Jun 2015
TL;DR: Xigt, an extensible storage format for interlinear glossed text (IGT), is presented and its application to the use case of representing a large, noisy, heterogeneous set of IGT is described.
Abstract: This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of Xigt, and then describe its application to the use case of representing a large, noisy, heterogeneous set of IGT.
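A rough flavor of the tier-based Xigt data model, one IGT instance with phrase, gloss, and translation tiers aligned by ID references; tag and attribute names here approximate the serialization and should be checked against the Xigt schema.

    xigt = """<xigt-corpus>
      <igt id="igt1">
        <tier id="p" type="phrases">
          <item id="p1">inu ga hoeru</item>
        </tier>
        <tier id="g" type="glosses" alignment="p">
          <item id="g1" alignment="p1">dog NOM bark</item>
        </tier>
        <tier id="t" type="translations" alignment="p">
          <item id="t1" alignment="p1">The dog barks.</item>
        </tier>
      </igt>
    </xigt-corpus>"""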

Journal Article
TL;DR: The GROBID tool exploits “Conditional Random Fields” (CRF), a machine-learning technique for extracting and restructuring content automatically from raw and heterogeneous sources into uniform standard TEI (Text Encoding Initiative) documents.
Abstract: Scientific papers potentially offer a wealth of information that allows one to put the corresponding work in context and to offer a wide range of services to researchers. GROBID is a high-performing software environment to extract such information as metadata, bibliographic references, or entities in scientific texts. Most modern digital library techniques rely on the availability of high-quality textual documents. In practice, however, the majority of full-text collections are in raw PDF or in incomplete and inconsistent semi-structured XML. To address this fundamental issue, the development of the Java library GROBID started in 2008 [1]. The tool exploits Conditional Random Fields (CRF), a machine-learning technique, for extracting and restructuring content automatically from raw and heterogeneous sources into uniform, standard TEI (Text Encoding Initiative) documents.
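GROBID is usually deployed as a REST service; a header-extraction call against a locally running instance looks roughly like this (host, port, and file name depend on your deployment):

    import requests

    with open("paper.pdf", "rb") as pdf:
        resp = requests.post(
            "http://localhost:8070/api/processHeaderDocument",
            files={"input": pdf},
        )
    resp.raise_for_status()
    print(resp.text)  # TEI XML with title, authors, abstract, etc.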

Journal ArticleDOI
TL;DR: An automatic document processing system is illustrated for extracting the data contained in medical laboratory results printed on paper, using an open-source OCR engine as the basis for further processing, and for storing the retrieved data in XML format for ease of sharing.

Journal ArticleDOI
TL;DR: This paper presents an up-to-date analysis of existing structural XML clustering algorithms, analyzing and comparing 23 state-of-the-art approaches and arranging them in an original taxonomy.
Abstract: With its presence in data integration, chemistry, biological and geographic systems, XML has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents — an issue that has been thoroughly studied and led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. Additionally, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.
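To make the first two subtasks concrete: one simple document representation is a bag of root-to-node tag paths, and one simple structural similarity is the cosine between two such bags. Each is a deliberately plain stand-in for the families of methods surveyed.

    import xml.etree.ElementTree as ET
    from collections import Counter
    from math import sqrt

    def tag_paths(xml_text: str) -> Counter:
        """Represent a document by counts of its root-to-node tag paths."""
        paths = Counter()
        def walk(node, prefix):
            path = prefix + "/" + node.tag
            paths[path] += 1
            for child in node:
                walk(child, path)
        walk(ET.fromstring(xml_text), "")
        return paths

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[k] * b[k] for k in a)
        norm = sqrt(sum(v * v for v in a.values())) * \
               sqrt(sum(v * v for v in b.values()))
        return dot / norm

    d1 = "<book><title/><author/></book>"
    d2 = "<book><title/><author/><year/></book>"
    d3 = "<flight><from/><to/></flight>"
    print(cosine(tag_paths(d1), tag_paths(d2)))  # high: similar structure
    print(cosine(tag_paths(d1), tag_paths(d3)))  # 0.0: disjoint structure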