
Showing papers on "XML" published in 2014


Journal ArticleDOI
TL;DR: BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform.
Abstract: We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example, BEAST and BEAUti use exactly the same XML file format.

5,183 citations


Book
18 Dec 2014
TL;DR: A hands-on guide to web scraping and text mining for both beginners and experienced users of R that introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, SQL.
Abstract: A hands-on guide to web scraping and text mining for both beginners and experienced users of R. Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, SQL. Provides basic techniques to query web documents and data sets (XPath and regular expressions). An extensive set of exercises is presented to guide the reader through each technique. Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management. Case studies are featured throughout along with examples for each technique presented. R code and solutions to exercises featured in the book are provided on a supporting website.
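
The querying techniques the book covers (XPath and regular expressions) can be illustrated outside R as well. The following is a minimal Python sketch, not taken from the book, using the third-party lxml library for XPath and the standard re module on an inline HTML fragment rather than a live web page.

```python
import re
from lxml import html  # third-party: pip install lxml

# A small HTML fragment standing in for a fetched web page.
page = """
<html><body>
  <div class="paper"><span class="title">BEAST 2</span><span class="year">2014</span></div>
  <div class="paper"><span class="title">PanLex</span><span class="year">2014</span></div>
</body></html>
"""

tree = html.fromstring(page)

# XPath: select the text of every title inside a div of class "paper".
titles = tree.xpath('//div[@class="paper"]/span[@class="title"]/text()')
print(titles)  # ['BEAST 2', 'PanLex']

# Regular expression: pull four-digit years out of the raw markup.
years = re.findall(r"\b(?:19|20)\d{2}\b", page)
print(years)  # ['2014', '2014']
```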

112 citations


Proceedings Article
01 May 2014
TL;DR: The PanLex database, a project of The Long Now Foundation, aims to enable the translation of lexemes among all human languages in the world by focusing on lexemic translations, rather than grammatical or corpus data, and achieves broader lexical and language coverage than related projects.
Abstract: PanLex, a project of The Long Now Foundation, aims to enable the translation of lexemes among all human languages in the world. By focusing on lexemic translations, rather than grammatical or corpus data, it achieves broader lexical and language coverage than related projects. The PanLex database currently documents 20 million lexemes in about 9,000 language varieties, with 1.1 billion pairwise translations. The project primarily engages in content procurement, while encouraging outside use of its data for research and development. Its data acquisition strategy emphasizes broad, high-quality lexical and language coverage. The project plans to add data derived from 4,000 new sources to the database by the end of 2016. The dataset is publicly accessible via an HTTP API and monthly snapshots in CSV, JSON, and XML formats. Several online applications have been developed that query PanLex data. More broadly, the project aims to make a contribution to the preservation of global linguistic diversity.

112 citations


Book
14 Apr 2014
TL;DR: This book introduces the problem of data exchange via examples, both relational and XML; Part II deals with exchanging relational data; Part III focuses on exchanging XML data; and Part IV covers metadata management.
Abstract: The problem of exchanging data between different databases with different schemas is an area of immense importance. Consequently data exchange has been one of the most active research topics in databases over the past decade. Foundational questions related to data exchange largely revolve around three key problems: how to build target solutions; how to answer queries over target solutions; and how to manipulate schema mappings themselves? The last question is also known under the name 'metadata management', since mappings represent metadata, rather than data in the database. In this book the authors summarize the key developments of a decade of research. Part I introduces the problem of data exchange via examples, both relational and XML; Part II deals with exchanging relational data; Part III focuses on exchanging XML data; and Part IV covers metadata management.

104 citations


Journal ArticleDOI
TL;DR: The RSGISLib software was designed to fill the gaps within existing software packages and to provide a platform to ease the implementation of new and innovative algorithms and data processing techniques.

94 citations


Proceedings ArticleDOI
18 Jun 2014
TL;DR: The way in which requirements differ between management of relational data and management of JSON data is analyzed, and three architectural principles that facilitate a schema-less development style within an RDBMS are presented so that RDBMS users can store, query, and index JSON data without requiring schemas.
Abstract: Relational Database Management Systems (RDBMS) have been very successful at managing structured data with well-defined schemas. Despite this, relational systems are generally not the first choice for management of data where schemas are not pre-defined or must be flexible in the face of variations and changes. Instead, No-SQL database systems supporting JSON are often selected to provide persistence to such applications. JSON is a light-weight and flexible semi-structured data format supporting constructs common in most programming languages. In this paper, we analyze the way in which requirements differ between management of relational data and management of JSON data. We present three architectural principles that facilitate a schema-less development style within an RDBMS so that RDBMS users can store, query, and index JSON data without requiring schemas. We show how these three principles can be applied to industry-leading RDBMS platforms, such as the Oracle RDBMS Server, with relatively little effort. Consequently, an RDBMS can unify the management of both relational data and JSON data in one platform and use SQL with an embedded JSON path language as a single declarative language to query both relational data and JSON data. This SQL/JSON approach offers significant benefits to application developers as they can use one product to manage both relational data and semi-structured flexible schema data.
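
The paper's SQL/JSON approach targets the Oracle RDBMS, but the underlying idea of one declarative language over both relational columns and JSON documents can be sketched with any engine that exposes JSON path functions. The snippet below is an illustration using SQLite's JSON functions from Python; the table, column, and path names are invented, it assumes a SQLite build with the JSON1 functions (as recent Python builds provide), and the operators differ in detail from Oracle's SQL/JSON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relational column (name) next to a schema-less JSON document (profile).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
conn.execute(
    "INSERT INTO users (name, profile) VALUES (?, ?)",
    ("alice", '{"city": "Auckland", "tags": ["beast", "xml"]}'),
)

# One SQL statement mixes relational predicates with JSON path expressions.
rows = conn.execute(
    """
    SELECT name, json_extract(profile, '$.city') AS city
    FROM users
    WHERE json_extract(profile, '$.tags[0]') = 'beast'
    """
).fetchall()
print(rows)  # [('alice', 'Auckland')]
```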

78 citations


Proceedings ArticleDOI
27 Jun 2014
TL;DR: A taxonomy and unified perspective on NoSQL systems is provided using multiple facets including system architecture, data model, query language, client API, scalability, and availability to help the reader in choosing an appropriate NoSQL system for a given application.
Abstract: The advent of Big Data created a need for out-of-the-box horizontal scalability for data management systems. This ushered in an array of choices for Big Data management under the umbrella term NoSQL. In this paper, we provide a taxonomy and unified perspective on NoSQL systems. Using this perspective, we compare and contrast various NoSQL systems using multiple facets including system architecture, data model, query language, client API, scalability, and availability. We group current NoSQL systems into seven broad categories: Key-Value, Table-type/Column, Document, Graph, Native XML, Native Object, and Hybrid databases. We also describe application scenarios for each category to help the reader in choosing an appropriate NoSQL system for a given application. We conclude the paper by indicating future research directions.

77 citations


Journal ArticleDOI
TL;DR: A novel executable modeling approach is proposed for system-of-systems architecture by taking full advantage of the Department of Defense Architecture Framework (DoDAF) Meta-model (DM2) and defining the common model transformation specification at the metamodel level.
Abstract: A novel executable modeling approach is proposed for system-of-systems architecture by taking full advantage of the Department of Defense Architecture Framework (DoDAF) Meta-model (DM2) and defining the common model transformation specification at the metamodel level. This methodology provides more flexibility and adaptability for the automated construction of executable models directly from the architectural data rather than from static models. More specifically, the architectural data metamodel is first established to guide architectural data modeling of core data elements and associations in DM2 as the common and consistent data dictionary for architecture modeling, while the executable formalism metamodel is designed to formally define executable models. Then, the mapping rules between both metamodels are presented as the common transformation specification regardless of which modeling language or methodology is employed in developing architectural descriptions. Finally, eXtensible Markup Language (XML) technology, like XML schema languages and eXtensible Stylesheet Language specifications, is discussed to facilitate the automated transformation of executable models from architectural instance data. The feasibility of the proposed approach is illustrated using a published example where colored Petri net (CPN) is used as the executable formalism.
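
As a concrete illustration of the XML/XSL step described above, the sketch below applies a tiny XSLT stylesheet to an XML fragment with Python's lxml; the element names are hypothetical placeholders, not the actual DM2 or CPN schemas used in the paper.

```python
from lxml import etree  # third-party: pip install lxml

# Hypothetical architectural instance data (not real DM2 elements).
arch_data = etree.XML("""
<architecture>
  <activity name="Detect"/>
  <activity name="Engage"/>
</architecture>
""")

# A minimal stylesheet turning each activity into a "transition" of an executable model.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/architecture">
    <executableModel>
      <xsl:for-each select="activity">
        <transition id="{@name}"/>
      </xsl:for-each>
    </executableModel>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(arch_data), pretty_print=True).decode())
```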

62 citations


Proceedings Article
01 Jan 2014
TL;DR: The aim of the paper is to present a clear image of the capabilities of FoLiA and how it relates to other formats, and to open discussion and aid users in their decision for a particular format.
Abstract: In this paper we present FoLiA, a Format for Linguistic Annotation, and conduct a comparative study with other annotation schemes, including the Linguistic Annotation Framework (LAF), the Text Encoding Initiative (TEI) and Text Corpus Format (TCF). An additional point of focus is the interoperability between FoLiA and metadata standards such as the Component MetaData Infrastructure (CMDI), as well as data category registries such as ISOcat. The aim of the paper is to present a clear image of the capabilities of FoLiA and how it relates to other formats. This should open discussion and aid users in their decision for a particular format. FoLiA is a practically-oriented XML-based annotation format for the representation of language resources, explicitly supporting a wide variety of annotation types. It introduces a flexible and uniform paradigm and a representation independent of language or label set. It is designed to be highly expressive, generic, and formalised, whilst at the same time focussing on being as practical as possible to ease its adoption and implementation. The aspiration is to offer a generic format for storage, exchange, and machine-processing of linguistically annotated documents, preventing users as well as software tools from having to cope with a wide variety of different formats, which in the field regularly causes convertibility issues and proliferation of ad-hoc formats. FoLiA emerged from such a practical need in the context of Computational Linguistics in the Netherlands and Flanders. It has been successfully adopted by numerous projects within this community. FoLiA was developed in a bottom-up fashion, with special emphasis on software libraries and tools to handle it.

62 citations


Journal ArticleDOI
TL;DR: The High School Timetabling Archive XHSTT-2011 is announced with 21 instances from 8 countries and an evaluator capable of checking the syntax of instances and evaluating the solutions.
Abstract: We present the progress on the benchmarking project for high school timetabling that was introduced at PATAT 2008. In particular, we announce the High School Timetabling Archive XHSTT-2011 with 21 instances from 8 countries and an evaluator capable of checking the syntax of instances and evaluating the solutions.

54 citations


Book ChapterDOI
01 Jan 2014
TL;DR: Since the agreement on the semantics of the communicated data is only implicit, programmers developing client applications have to manually gain a deep understanding of several APIs from multiple providers.
Abstract: The REST architectural style assumes that client and server form a contract with content negotiation, not only on the data format but implicitly also on the semantics of the communicated data, i.e., an agreement on how the data have to be interpreted [247]. In different application scenarios such an agreement requires vendor-specific content types for the individual services to convey the meaning of the communicated data. The idea behind vendor-specific content types is that service providers can reuse content types and service consumers can make use of specific processors for the individual content types. In practice, however, we see that many RESTful APIs on the Web simply make use of standard non-specific content types, e.g., text/xml or application/json [150]. Since the agreement on the semantics is only implicit, programmers developing client applications have to manually gain a deep understanding of several APIs from multiple providers.
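
A vendor-specific content type is requested through ordinary HTTP content negotiation; the sketch below shows the idea with the Python requests library. The URL and the media type are hypothetical, and, as the chapter notes, many real APIs would simply answer with application/json.

```python
import requests  # third-party: pip install requests

# Ask the server for a vendor-specific representation instead of plain JSON.
response = requests.get(
    "https://api.example.com/orders/42",                        # hypothetical endpoint
    headers={"Accept": "application/vnd.example.order+json"},   # hypothetical media type
)

# The Content-Type of the reply tells the client which semantics were agreed on.
content_type = response.headers.get("Content-Type", "")
print(response.status_code, content_type)
if content_type.startswith("application/vnd.example.order+json"):
    order = response.json()  # a processor specific to this content type could run here
else:
    order = response.json()  # fall back to generic, semantics-free JSON handling
```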

Proceedings Article
01 Jan 2014
TL;DR: Verovio is a library and toolkit for rendering Music Encoding Initiative (MEI) notation encodings that are increasingly used in music research projects and is particularly well-suited for interactive applications, especially in web browsers.
Abstract: Rendering symbolic music notation is a common component of many MIR applications, and many tools are available for this task. There is, however, a need for a tool that can natively render the Music Encoding Initiative (MEI) notation encodings that are increasingly used in music research projects. In this paper, we present Verovio, a library and toolkit for rendering MEI. A significant advantage of Verovio is that it implements MEI’s structure internally, making it the best suited solution for rendering features that make MEI unique. Verovio is designed as a fast, portable, lightweight tool written in pure standard C++ with no dependencies on third-party frameworks or libraries. It can be used as a command-line rendering tool, as a library, or it can be compiled to JavaScript using the Emscripten LLVM-to-JavaScript compiler. This last option is particularly interesting because it provides a complete in-browser music MEI typesetter. The SVG output from Verovio is organized in such a way that the MEI structure is preserved as much as possible. Since every graphic in SVG is an XML element that is easily addressable, Verovio is particularly well-suited for interactive applications, especially in web browsers. Verovio is available under the GPL open-source license.

BookDOI
Alois Zoitl, Robert Lewis
30 May 2014
TL;DR: Modelling Control Systems Using IEC 61499, 2nd Edition provides a concise and yet thorough introduction to the main concepts and models defined in the standard, and will be of interest to research-led control and process engineers and students working in fields that require complex control systems using network-based distributed control.
Abstract: IEC 61499 is a standard for modelling distributed control systems for use in industrial automation, and is already having an impact on the design and implementation of industrial control systems that involve the integration of programmable logic controllers, intelligent devices and sensors. Modelling Control Systems Using IEC 61499, 2nd Edition provides a concise and yet thorough introduction to the main concepts and models defined in the standard. Topics covered include defining applications, systems, distributing applications on the system's devices, function blocks, structuring applications, service interface function blocks, event function blocks, and examples of industrial applications. This second edition has been significantly updated to reflect the current second release of IEC 61499, including changes in the function block model, its execution, and the newly standardized XML exchange format for model artefacts, and to reflect lessons learned from the author's teaching of IEC 61499 over the last ten years. This book will be of interest to research-led control and process engineers and students working in fields that require complex control systems using network-based distributed control.

Patent
06 Nov 2014
TL;DR: In this paper, techniques for providing users with aggregate access to and control over information from multiple storing applications and information services, and for enabling developers to integrate such aggregate access and control into applications are described.
Abstract: Techniques are described for providing users with aggregate access to and control over information from multiple storing applications and information services, and for enabling developers to integrate such aggregate access and control into applications. Textual markup language may represent the structure of grouping items. Examples using XML and XooML (“Cross Tool Mark-up Language,” an XML schema) are provided, such that users need not change the storing application or service in order for those users' informational structures to be represented.

Journal ArticleDOI
01 Sep 2014
TL;DR: The XML serialization of ISO–LAF, the Graph Annotation Format (GrAF) is described and the rationale behind the various decisions that were made in determining the standard is discussed.
Abstract: This paper overviews the International Standards Organization–Linguistic Annotation Framework (ISO–LAF) developed in ISO TC37 SC4. We describe the XML serialization of ISO–LAF, the Graph Annotation Format (GrAF) and discuss the rationale behind the various decisions that were made in determining the standard. We describe the structure of the GrAF headers in detail and provide multiple examples of GrAF representation for text and multi-media. Finally, we discuss the next steps for standardization of interchange formats for linguistic annotations.

Journal ArticleDOI
TL;DR: The qcML format is described, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative), so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema.

28 Sep 2014
TL;DR: The book covers static HTML, VBScript, Access, Cascading Style Sheets, Internet Data Connector, dynamic HTML with ASP, and XML.
Abstract: The book covers: static HTML, VBScript, Access, Cascading Style Sheets, Internet Data Connector, dynamic HTML with ASP, and XML.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: An extensible programming framework to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code is proposed.
Abstract: This paper proposes an extensible programming framework to separate platform-specific optimizations from application codes. The framework allows programmers to define their own code translation rules for special demands of individual systems, compilers, libraries, and applications. Code translation rules associated with user-defined compiler directives are defined in an external file, and the application code is just annotated by the directives. For code transformations based on the rules, the framework exposes the abstract syntax tree (AST) of an application code as an XML document to expert programmers. Hence, the XML document of an AST can be transformed using any XML-based technologies. Our case studies using real applications demonstrate that the framework is effective to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code.
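
The framework exposes a program's AST as XML so that generic XML tooling can rewrite it. The paper works on HPC application codes with a custom directive language; purely as an illustration of the idea, the sketch below dumps a Python AST to an XML document with the standard library, which could then be queried or transformed with XPath or XSLT.

```python
import ast
import xml.etree.ElementTree as ET

def ast_to_xml(node: ast.AST) -> ET.Element:
    """Recursively mirror an AST node as an XML element."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            child = ast_to_xml(value)
            child.set("field", field)
            elem.append(child)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    child = ast_to_xml(item)
                    child.set("field", field)
                    elem.append(child)
        elif value is not None:
            elem.set(field, str(value))
    return elem

source = "y = x * 2 + 1"
xml_ast = ast_to_xml(ast.parse(source))
print(ET.tostring(xml_ast, encoding="unicode"))
```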

Journal ArticleDOI
01 Feb 2014
TL;DR: This paper proposes a novel form of inverted list, the IDList: the IDList for a keyword k consists of ordered nodes that directly or indirectly contain k, so that answering keyword queries can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems.
Abstract: Keyword search over XML data has attracted a lot of research effort in the last decade, where one of the fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the key factor resulting in the inefficiency of existing methods is that they all heavily suffer from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword k consists of ordered nodes that directly or indirectly contain k. We then show that finding keyword query results based on the smallest lowest common ancestor and exclusive lowest common ancestor semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions and with or without using additional indexes. We further propose several algorithms that are based on hash search to simplify the operation of finding common nodes from all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
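
The core reduction in the paper is from keyword query evaluation to intersecting sorted IDLists. A minimal merge-based intersection in Python is sketched below; the real algorithms in the paper add semantics-specific checks (SLCA/ELCA) and skipping via additional indexes, which are omitted here, and the ID values are invented.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of node ids with a linear merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical IDLists: node ids (in document order) whose subtrees contain each keyword.
idlist_kw1 = [2, 5, 9, 14, 21]
idlist_kw2 = [5, 7, 14, 30]
print(intersect_sorted(idlist_kw1, idlist_kw2))  # [5, 14]
```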

Proceedings ArticleDOI
26 May 2014
TL;DR: A specification for communication between FPGA-based systems is presented, based on the combination of reconfigurable FPGA technology with the concepts of distributed computing, which provides an efficient communication structure with a consistent format for highly flexible and adaptive systems.
Abstract: The emerging Internet of Things results in new challenges for the interconnection of devices and the efficient management of available resources. In this paper, a specification for communication between FPGA-based systems is presented. It is based on the combination of reconfigurable FPGA technology with the concepts of distributed computing. JavaScript Object Notation (JSON) is a standardized human-readable data-interchange format. Even though it is based on JavaScript, JSON is completely language independent. The high rate of flexibility in combination with a light-weight structure avoids unnecessary overhead, associated with the benefits of standardization. JSON can be used in heterogeneous compute nodes, as it can be parsed easily and without special requirements. This way it provides an efficient method for transferring various data and for exploiting dynamic and partial reconfiguration in the area of distributed embedded systems. The result is an efficient communication structure with a consistent format for highly flexible and adaptive systems.
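
Because JSON is language-independent and easy to parse, a reconfiguration request for an FPGA node can be exchanged as a small text message; the Python sketch below shows such a message being produced and consumed. The field names are invented for illustration and are not part of the paper's specification.

```python
import json

# Hypothetical message asking a node to load a partial bitstream into region 1.
request = {
    "node": "fpga-03",
    "command": "reconfigure",
    "region": 1,
    "bitstream": "filter_fir.bit",
}

wire_format = json.dumps(request)   # what travels between compute nodes
received = json.loads(wire_format)  # parsed on any platform with a JSON library
print(received["node"], received["command"], received["region"])
```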

Journal ArticleDOI
TL;DR: A set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding are presented.
Abstract: This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards for corpus encoding exist - most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them - these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are rather different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their personal research goals. Therefore, there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding. This paper presents such a set of suggestions, dubbed Modest XML for Corpora, and posits that such a set of pointers to a limited level of XML knowledge could work as part of the normal, general training of corpus linguists. The Modest XML recommendations cover the following set of things, which, according to the foregoing argument, are sufficient knowledge about XML for most corpus linguists’ day-to-day needs: use of tags; adding attribute value pairs; recommended use of attributes; nesting of tags; encoding of special characters; XML well-formedness; a collection of de facto standard tags and attributes; going beyond the basic de facto standard tags; and text headers.
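
As a flavour of the level of markup the recommendations aim at, here is a small hand-made corpus fragment checked for well-formedness with Python's standard library; the tag and attribute names are illustrative only and are not the de facto standard set proposed in the paper.

```python
import xml.etree.ElementTree as ET

# An illustrative corpus text: a header, tags with attribute-value pairs,
# nesting, and an escaped special character (&amp;).
sample = """<text id="t001" lang="en">
  <header><title>Interview 1</title><date>2014-03-02</date></header>
  <u who="A">We looked at HTML &amp; XML.</u>
  <u who="B">Right, <pause dur="short"/> exactly.</u>
</text>"""

# Parsing succeeds only if the document is well-formed XML.
root = ET.fromstring(sample)
print(root.tag, root.get("id"), [u.get("who") for u in root.iter("u")])
```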

Proceedings Article
27 Jul 2014
TL;DR: A novel domain specific language (DSL) is described that expresses transformations over XML structures describing richly formatted content, and a synthesis algorithm is presented that generates a minimal program with respect to a natural subsumption ordering in this DSL.
Abstract: Recent advances in Programming by Example (PBE) have supported new applications to text editing, but existing approaches are limited to simple text strings. In this paper we address transformations in richly formatted documents, using an approach based on the idea of least general generalizations from inductive inference, which avoids the scalability issues faced by state-of-the-art PBE methods. We describe a novel domain specific language (DSL) that expresses transformations over XML structures describing richly formatted content, and a synthesis algorithm that generates a minimal program with respect to a natural subsumption ordering in our DSL. We present experimental results on tasks collected from online help forums, showing an average of 4.17 examples required for task completion.

Journal ArticleDOI
Sun Huh
TL;DR: Publishers and editors should now adopt JATS XML for journal publishing because it is an essential format to present readers with a more user-friendly interface.
Abstract: In the era of information technology, scholarly journals cannot escape the rising tide of technological advancement. To be exposed more easily to readers, the web forms of scholarly journals and articles become more important year after year. Furthermore, there is a trend of print journals closing, and a significant emergence of online journals. Journal Article Tag Suite (JATS) extensible markup language (XML) became a National Information Standards Organization standard language in online journal publishing in 2012. It is an essential format to present readers with a more user-friendly interface. JATS XML was developed from PubMed Central (PMC) XML, which was a deposit form of articles to PMC. Editors and other publishing-related personnel should be able to understand the concept and production process of XML files. When JATS XML is produced, a variety of web presentation views can be generated, such as PubReader and epub 3.0. Further, JATS XML can be easily converted to digital object identifier CrossRef XML, CrossMark XML, and FundRef XML. Small scholarly society journal editors and publishers can promote the visibility of their journals by depositing JATS XML files to PMC or ScienceCentral. Owing to these benefits of JATS XML, publishers and editors should now adopt JATS XML for journal publishing.
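
A minimal, heavily trimmed JATS-style fragment is shown below being read with Python's standard library, just to make the structure of such files concrete; a real JATS article carries a full front/body/back structure and the required DTD/schema declarations, which are omitted here.

```python
import xml.etree.ElementTree as ET

# A trimmed, illustrative JATS-like fragment (not a complete, valid JATS article).
jats = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Journal Article Tag Suite in practice</article-title></title-group>
      <pub-date><year>2014</year></pub-date>
    </article-meta>
  </front>
  <body><sec><title>Introduction</title><p>JATS XML structures article content.</p></sec></body>
</article>"""

root = ET.fromstring(jats)
title = root.findtext("front/article-meta/title-group/article-title")
year = root.findtext("front/article-meta/pub-date/year")
print(title, year)
```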

Journal ArticleDOI
01 Aug 2014
TL;DR: A new publicly available provenance ontology for service discovery is created and a user study indicates the results extend beyond e-Science, and an integrated approach to web service description and discovery is developed.
Abstract: Web services have become common, if not essential, in the areas of business-to-business integration, distributed computing, and enterprise application integration. Yet the XML-based standards for web service descriptions encode only a syntactic representation of the service input and output. The actual meaning of these terms, their formal definitions, and their relationships to other concepts are not represented. This poses challenges for leveraging web services in the development of software capabilities. As the number of services grows and the specificity of users' needs increases, the ability to find an appropriate service for a specific application is strained. In order to overcome this challenge, semantic web services were proposed. For the discovery of web services, semantic web services use ontologies to find matches between user requirements and service capabilities. The computational reasoning afforded by ontologies enables users to find categorizations that weren't explicitly defined. However, there are a number of methodological variants on semantic web service discovery. Based on e-Science, an analog to e-Business, one methodology advocates deep and detailed semantic description of a web service's inputs and outputs. Yet, this methodology predates recent advances in semantic web and provenance research, and it is unclear the extent to which it applies outside of e-Science. We explore this question through a within-subjects experiment and we extend this methodology with current research in provenance, semantic web, and web service standards, developing and empirically evaluating an integrated approach to web service description and discovery. Implications for more advanced web service discovery algorithms and user interfaces are also presented. We address limitations in semantic web service discovery. Our approach is grounded in semantic web standards and W3C provenance ontology. Our user study indicates the results extend beyond e-Science. Our user study provides insights for web service discovery applications. We have created a new publicly available provenance ontology for service discovery.

Journal ArticleDOI
TL;DR: Fundamental AIM concepts are described, along with how to use and extend AIM for various imaging disciplines.
Abstract: Knowledge contained within in vivo imaging annotated by human experts or computer programs is typically stored as unstructured text and separated from other associated information. The National Cancer Informatics Program (NCIP) Annotation and Image Markup (AIM) Foundation information model is an evolution of the National Institute of Health’s (NIH) National Cancer Institute’s (NCI) Cancer Bioinformatics Grid (caBIG®) AIM model. The model applies to various image types created by various techniques and disciplines. It has evolved in response to the feedback and changing demands from the imaging community at NCI. The foundation model serves as a base for other imaging disciplines that want to extend the type of information the model collects. The model captures physical entities and their characteristics, imaging observation entities and their characteristics, markups (two- and three-dimensional), AIM statements, calculations, image source, inferences, annotation role, task context or workflow, audit trail, AIM creator details, equipment used to create AIM instances, subject demographics, and adjudication observations. An AIM instance can be stored as a Digital Imaging and Communications in Medicine (DICOM) structured reporting (SR) object or Extensible Markup Language (XML) document for further processing and analysis. An AIM instance consists of one or more annotations and associated markups of a single finding along with other ancillary information in the AIM model. An annotation describes information about the meaning of pixel data in an image. A markup is a graphical drawing placed on the image that depicts a region of interest. This paper describes fundamental AIM concepts and how to use and extend AIM for various imaging disciplines.

Proceedings ArticleDOI
TL;DR: In order to be able to better integrate SimModel information with other building information, a conversion service has been built that is able to parse the SimModel ontology in the form of XSD schemas and output a SimModel ontology in OWL.
Abstract: Many building energy performance (BEP) simulation tools, such as EnergyPlus and DOE-2, use custom schema definitions (IDD and BDL respectively) as opposed to standardised schema definitions (defined in XSD, EXPRESS, and so forth). A Simulation Domain Model (SimModel) was therefore proposed earlier, representing a new interoperable XML-based data model for the building simulation domain. Its ontology aims at moving away from tool-specific, non-standard nomenclature by implementing an industry-validated terminology aligned with the Industry Foundation Classes (IFC). In this paper, we document our ongoing efforts to make building simulation data more interoperable with other building data. In order to be able to better integrate SimModel information with other building information, we have aimed at representing this information in the Resource Description Framework (RDF). A conversion service has been built that is able to parse the SimModel ontology in the form of XSD schemas and output a SimModel ontology in OWL. In this article, we document this effort and give an indication of what the resulting SimModel ontology in OWL can be used for.
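
The conversion described above maps XSD constructs to OWL ones. A much-simplified sketch of that mapping is given below in Python, declaring an owl:Class for every named xs:complexType found in a schema; the namespace and type names are invented, and the real service handles the full SimModel schema set, inheritance, properties, and annotations, none of which is attempted here.

```python
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, RDF, RDFS, Literal   # third-party: pip install rdflib
from rdflib.namespace import OWL

XS = "{http://www.w3.org/2001/XMLSchema}"
SIM = Namespace("http://example.org/simmodel#")  # hypothetical target namespace

# A tiny stand-in XSD (the real SimModel schemas are far larger).
schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="SimBuilding"/>
  <xs:complexType name="SimZone"/>
</xs:schema>
""")

g = Graph()
g.bind("owl", OWL)
for ctype in schema.findall(f"{XS}complexType"):
    cls = SIM[ctype.get("name")]
    g.add((cls, RDF.type, OWL.Class))           # each named complexType becomes an owl:Class
    g.add((cls, RDFS.label, Literal(ctype.get("name"))))

print(g.serialize(format="turtle"))
```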

Journal ArticleDOI
27 Oct 2014 - Lexikos
TL;DR: The model was developed in the context of the project "Scientific eLexicography for Africa"; the lexicographic database built from it will be implemented with MySQL, and the model and the (empty) database will be freely available by mid-2015.
Abstract: So far, there have been few descriptions of creating structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and the relations between them; MySQL (named after SQL, the Structured Query Language) is a database system of tables containing data and definitions of the relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa" and the lexicographic database built from it will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to, with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description; all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database, with all graphical user interfaces that have been developed, freely available by mid-2015.
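
To make the "sense description at the centre" design concrete, here is a toy relational sketch run against SQLite from Python rather than MySQL, with invented table and column names: lemma forms, translations, and examples all reference a sense row, rather than the lemma being the hub.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sense (
    sense_id    INTEGER PRIMARY KEY,
    definition  TEXT NOT NULL,   -- the sense description is the central entity
    domain      TEXT
);
CREATE TABLE lemma (
    lemma_id    INTEGER PRIMARY KEY,
    form        TEXT NOT NULL,
    language    TEXT NOT NULL,
    sense_id    INTEGER NOT NULL REFERENCES sense(sense_id)
);
CREATE TABLE example (
    example_id  INTEGER PRIMARY KEY,
    text        TEXT NOT NULL,
    sense_id    INTEGER NOT NULL REFERENCES sense(sense_id)
);
""")

conn.execute("INSERT INTO sense (sense_id, definition) VALUES (1, 'large grey mammal with a trunk')")
conn.execute("INSERT INTO lemma (form, language, sense_id) VALUES ('tlou', 'nso', 1)")
conn.execute("INSERT INTO lemma (form, language, sense_id) VALUES ('elephant', 'en', 1)")
print(conn.execute(
    "SELECT l.form, l.language FROM lemma l JOIN sense s ON s.sense_id = l.sense_id WHERE s.sense_id = 1"
).fetchall())
```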

Patent
08 Aug 2014
TL;DR: In this article, a method for enabling the provisioning and execution of a platform-independent application includes receiving, by a mobile computing device, from a provisioning source, an XML document describing at least a portion of functionality provided by an application.
Abstract: A method for enabling the provisioning and execution of a platform-independent application includes receiving, by a mobile computing device, from a provisioning source, an XML document describing at least a portion of functionality provided by an application. An abstraction layer executing on the mobile computing device populates a Document Object Model (“DOM”) structure at least in part representing the running state of the application consistent with the received XML document. The abstraction layer presents a device-appropriate application user interface responsive to the DOM structure. The application receives a user input from within the rendered application user interface. Programming code referenced by the DOM receives, from the abstraction layer, the input event. The programming code reconfigures the DOM, in response to the received input event to reflect a response by the application to the input. The abstraction layer updates the device-appropriate application user interface, responsive to the reconfiguring of the DOM.

Journal ArticleDOI
TL;DR: Two Application Programming Interfaces (APIs) written in Python are described, which simplify the process of developing and modifying models expressed in NeuroML and LEMS.
Abstract: NeuroML is an XML-based model description language, which provides a powerful common data format for defining and exchanging models of neurons and neuronal networks. In the latest version of NeuroML, the structure and behavior of ion channel, synapse, cell, and network model descriptions are based on underlying definitions provided in LEMS, a domain-independent language for expressing hierarchical mathematical models of physical entities. While declarative approaches for describing models have led to greater exchange of model elements among software tools in computational neuroscience, a frequent criticism of XML-based languages is that they are difficult to work with directly. Here we describe two Application Programming Interfaces (APIs) written in Python (http://www.python.org), which simplify the process of developing and modifying models expressed in NeuroML and LEMS. The libNeuroML API provides a Python object model with a direct mapping to all NeuroML concepts defined by the NeuroML Schema, which facilitates reading and writing the XML equivalents. In addition, it offers a memory-efficient, array-based internal representation, which is useful for handling large-scale connectomics data. The libNeuroML API also includes support for performing common operations that are required when working with NeuroML documents. Access to the LEMS data model is provided by the PyLEMS API, which provides a Python implementation of the LEMS language, including the ability to simulate most models expressed in LEMS. Together, libNeuroML and PyLEMS provide a comprehensive solution for interacting with NeuroML models in a Python environment.
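
The flavour of the libNeuroML API can be sketched as follows. This is an assumption-laden illustration rather than an excerpt from the paper: the class, attribute, and writer names follow the NeuroML schema naming and should be checked against the installed library version.

```python
# Sketch of building and serializing a model with libNeuroML; names such as
# NeuroMLDocument, IafCell and NeuroMLWriter follow the NeuroML schema naming
# and may differ slightly between library versions.
from neuroml import NeuroMLDocument, IafCell
import neuroml.writers as writers

doc = NeuroMLDocument(id="example_doc")

cell = IafCell(id="iaf0",
               leak_reversal="-60mV", thresh="-55mV", reset="-70mV",
               C="0.2nF", leak_conductance="0.01uS")
doc.iaf_cells.append(cell)

# Write the in-memory object model out as a NeuroML XML file.
writers.NeuroMLWriter.write(doc, "example.nml")
```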

Journal ArticleDOI
TL;DR: The objective of this study was to identify different operational and performance characteristics of CCR parsing models, including the XML DOM parser, the SAX parser, the PULL parser, and the JSON parser with regard to JSON data converted from XML-based CCR.
Abstract: In a mobile health management system, mobile devices act as the application hosting devices for personal health records (PHRs), and healthcare servers are constructed to exchange and analyze PHRs. One of the most popular PHR standards is the continuity of care record (CCR). The CCR is expressed in XML formats. However, parsing is an expensive operation that can degrade XML processing performance. Hence, the objective of this study was to identify different operational and performance characteristics of CCR parsing models including the XML DOM parser, the SAX parser, the PULL parser, and the JSON parser with regard to JSON data converted from XML-based CCR. Thus, developers can make sensible choices for their target PHR applications to parse CCRs when using mobile devices or servers with different system resources. Furthermore, simulation experiments covering four case studies are conducted to compare parsing performance on Android mobile devices and on a server with large quantities of CCR data.
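
The trade-off the study measures can be reproduced in miniature: a tree-building DOM parse, a streaming (SAX/pull-style) parse, and a JSON parse of the same record differ mainly in how much of the document they hold in memory at once. The sketch below uses Python's standard library on an invented, CCR-like fragment; it only illustrates the kinds of parsers compared, not the paper's Android/server benchmark.

```python
import io
import json
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

ccr_like = "<ccr><body><medication><name>Aspirin</name></medication></body></ccr>"

# DOM-style: the whole tree is materialized in memory before any data is read.
dom = minidom.parseString(ccr_like)
print("DOM:", dom.getElementsByTagName("name")[0].firstChild.data)

# Streaming (SAX/pull-style): elements are visited as they are parsed.
for event, elem in ET.iterparse(io.StringIO(ccr_like), events=("end",)):
    if elem.tag == "name":
        print("stream:", elem.text)

# JSON: the same record converted to JSON parses with a single call.
ccr_json = '{"ccr": {"body": {"medication": {"name": "Aspirin"}}}}'
print("JSON:", json.loads(ccr_json)["ccr"]["body"]["medication"]["name"])
```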