
Showing papers on "Serialization" published in 2020


Proceedings ArticleDOI
01 Nov 2020
TL;DR: The 2020 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks and languages.
Abstract: The 2020 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks and languages. Extending a similar setup from the previous year, five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the English training and evaluation data for the task, packaged in a uniform graph abstraction and serialization; for four of these representation frameworks, additional training and evaluation data was provided for one additional language per framework. The task received submissions from eight teams, of which two do not participate in the official ranking because they arrived after the closing deadline or made use of additional training data. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software are available from the task web site at: http://mrp.nlpl.eu

42 citations
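
The MRP task distributes all five frameworks in a uniform graph abstraction and serialization. The sketch below is only an illustration of what such a per-sentence JSON graph serialization can look like; the field names approximate, but are not guaranteed to match, the official MRP schema documented at http://mrp.nlpl.eu.

    import json

    # One directed graph per sentence, serialized as one JSON object per line.
    graph = {
        "id": "example-0001",
        "input": "The cat sleeps.",
        "framework": "eds",
        "tops": [0],
        "nodes": [
            {"id": 0, "label": "sleep", "anchors": [{"from": 8, "to": 14}]},
            {"id": 1, "label": "cat",   "anchors": [{"from": 4, "to": 7}]},
        ],
        "edges": [
            {"source": 0, "target": 1, "label": "ARG1"},
        ],
    }

    # One graph per line ("JSON Lines") keeps large training sets streamable.
    line = json.dumps(graph, ensure_ascii=False)
    assert json.loads(line)["nodes"][1]["label"] == "cat"
    print(line)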


Posted Content
TL;DR: MOTION is built in a user-friendly, modular, and extensible way, intended to be used as a tool in MPC research and to increase adoption of MPC protocols in practice and is shown to be highly efficient for privacy-preserving neural network inference.
Abstract: We present MOTION, an efficient and generic framework for mixed-protocol secure multi-party computation (MPC). Our framework is built from the ground up and incorporates several important engineering decisions such as full communication serialization, which enables MPC over arbitrary messaging interfaces and removes the need to own network sockets. It is available under the liberal MIT license and independent of external MPC libraries, which often have stricter licenses. MOTION is extensive and thoroughly tested: it currently consists of more than 36 000 lines of code, 20% of which are unit and component tests. It is built in a user-friendly, modular, and extensible way, intended to be used as a tool in MPC research and to increase adoption of MPC protocols in practice. MOTION incorporates several novel performance optimizations that improve the communication complexity and latency, e.g., 2× better online round complexity of precomputed correlated Oblivious Transfer (OT). We instantiate our framework with protocols for N parties and security against up to N−1 passive corruptions: the MPC protocols of Goldreich-Micali-Wigderson (GMW) in their arithmetic and Boolean versions and oblivious transfer (OT)-based BMR (Ben-Efraim et al., CCS'16), as well as novel and highly efficient conversions between them, including a non-interactive conversion from BMR to arithmetic GMW. Moreover, we design a novel garbling technique that saves 20% of communication in the BMR protocol. MOTION is highly efficient, which we demonstrate in our experiments by measuring its run-times in various network settings with different numbers of parties. For secure evaluation of AES-128 with N=3 parties in the high-latency network setting from the OT-based BMR paper, we achieve a 16× better throughput of 16 AES/s using BMR. This shows that the BMR protocol is much more competitive than previously assumed. For N=3 parties and full-threshold protocols in the LAN setting, MOTION is 10×–18× faster than the previous best passively secure implementation from the MP-SPDZ framework, and 190×–586× faster than the actively secure SCALE-MAMBA framework. Finally, we show that our framework is highly efficient for privacy-preserving neural network inference.

39 citations
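
The key engineering decision highlighted in the MOTION abstract is full communication serialization, which decouples the MPC engine from any particular transport. The following is a minimal Python sketch of that design idea only; it is not MOTION's actual (C++) API, and all class and function names are invented for illustration.

    import json
    from abc import ABC, abstractmethod

    class Transport(ABC):
        """Any medium that can move opaque bytes between parties."""
        @abstractmethod
        def send(self, party_id: int, payload: bytes) -> None: ...

    class InMemoryTransport(Transport):
        def __init__(self):
            self.outbox = []   # stand-in for sockets, files, message queues, ...
        def send(self, party_id, payload):
            self.outbox.append((party_id, payload))

    def serialize_share(gate_id: int, share: int) -> bytes:
        # Fully serialized messages mean the protocol engine never touches a socket itself.
        return json.dumps({"gate": gate_id, "share": share}).encode()

    transport = InMemoryTransport()
    transport.send(party_id=1, payload=serialize_share(gate_id=42, share=7))
    print(transport.outbox)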


Journal ArticleDOI
TL;DR: A critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements and their limitations can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs.
Abstract: Expressing machine-interpretable statements in the form of subject-predicate-object triples is a well-established practice for capturing semantics of structured data. However, the standard used for representing these triples, RDF, inherently lacks the mechanism to attach provenance data, which would be crucial to make automatically generated and/or processed data authoritative. This paper is a critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements. The various approaches are assessed in terms of standard compliance, formal semantics, tuple type, vocabulary term usage, blank nodes, provenance granularity, and scalability. This can be used to advance existing solutions and help implementers to select the most suitable approach (or a combination of approaches) for their applications. Moreover, the analysis of the mechanisms and their limitations highlighted in this paper can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs.

39 citations
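
One family of approaches surveyed above attaches provenance at the granularity of named graphs rather than individual triples. As an illustration only (not taken from the paper), the rdflib sketch below places a statement in a named graph and annotates that graph with provenance in the default graph, serialized as TriG; the example IRIs are invented.

    from rdflib import Dataset, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    ds = Dataset()

    # Put the statement into a named graph...
    g = ds.graph(URIRef("http://example.org/graph/1"))
    g.add((EX.alice, EX.worksFor, EX.acme))

    # ...and attach provenance to the graph name in the default graph.
    ds.add((URIRef("http://example.org/graph/1"), PROV.wasAttributedTo, EX.crawler))
    ds.add((URIRef("http://example.org/graph/1"), EX.retrievedOn, Literal("2020-05-01")))

    print(ds.serialize(format="trig"))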


Proceedings ArticleDOI
11 May 2020
TL;DR: A checkpointing technique is proposed that is specifically designed to address the limitations of simple checkpointing techniques, introducing efficient asynchronous methods to hide the overhead of serialization and I/O and to distribute the load over all participating processes.
Abstract: In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications. One common pattern emerging in such applications is frequent checkpointing of the state of the learning model during training, needed in a variety of scenarios: analysis of intermediate states to explain features and correlations with training data, exploration strategies involving alternative models that share a common ancestor, knowledge transfer, resilience, etc. However, with increasing size of the learning models and popularity of distributed data-parallel training approaches, simple checkpointing techniques used so far face several limitations: low serialization performance, blocking I/O, stragglers due to the fact that only a single process is involved in checkpointing. This paper proposes a checkpointing technique specifically designed to address the aforementioned limitations, introducing efficient asynchronous techniques to hide the overhead of serialization and I/O, and distribute the load over all participating processes. Experiments with two deep learning applications (CANDLE and ResNet) on a pre-Exascale HPC platform (Theta) show significant improvement over the state of the art, both in terms of checkpointing duration and runtime overhead.

21 citations
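
The central idea described above is to take serialization and I/O off the training loop's critical path. The sketch below is a generic, single-process illustration of asynchronous checkpointing using a background thread and the standard pickle module; it is not the authors' system, and the class and file names are invented.

    import pickle, threading, queue

    class AsyncCheckpointer:
        """Serialize and write model state in a background thread so the
        training loop is only blocked for the cost of a snapshot copy."""
        def __init__(self, path_template="ckpt-{step}.pkl"):
            self.path_template = path_template
            self.work = queue.Queue()
            self.worker = threading.Thread(target=self._drain, daemon=True)
            self.worker.start()

        def save(self, step, state):
            # Take a cheap shallow copy now; heavy pickling happens off the critical path.
            self.work.put((step, dict(state)))

        def _drain(self):
            while True:
                step, state = self.work.get()
                with open(self.path_template.format(step=step), "wb") as f:
                    pickle.dump(state, f)
                self.work.task_done()

    ckpt = AsyncCheckpointer()
    ckpt.save(step=10, state={"weights": [0.1, 0.2], "epoch": 1})
    ckpt.work.join()   # wait only when durability is actually needed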


Proceedings ArticleDOI
03 Sep 2020
TL;DR: This work is the first to tackle the problem of schema inference for property graphs, presenting a novel end-to-end inference method that handles complex and nested property values, multi-labeled nodes, and node hierarchies.
Abstract: Property graph instances are typically populated without defining a schema beforehand. Although this ensures great flexibility, the lack of a schema means missing opportunities for query optimization, data integration and analytics, to name a few. Since several graph instances exist prior to the schema definition, extracting the schema from those instances in a principled way might become a significant yet daunting task. In this paper, we present a novel end-to-end schema inference method for property graph schemas that tackles complex and nested property values, multi-labeled nodes and node hierarchies. Our method consists of three main steps, the first of which builds upon Cypher queries to extract the node and edge serialization of a property graph. The second step builds over a MapReduce type inference system, working on the serialized output obtained during the first step. The third step analyzes subtypes and supertypes to infer node hierarchies. We describe our schema inference pipeline and its implementation in two variants, one labels-oriented and one properties-oriented. Finally, we experimentally evaluate and compare the scalability and accuracy of our approaches on several real-life datasets. To the best of our knowledge, our work is the first to tackle the problem of schema inference for property graphs.

21 citations
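
The first step described above extracts a node and edge serialization of the property graph via Cypher. The snippet below is only a hedged illustration of what such an extraction pass could look like with the official Neo4j Python driver; the connection details are placeholders and the queries are generic, not the paper's actual pipeline.

    from neo4j import GraphDatabase  # URI and credentials below are placeholders

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    NODE_DUMP = """
    MATCH (n)
    RETURN id(n) AS id, labels(n) AS labels, properties(n) AS props
    """

    EDGE_DUMP = """
    MATCH (a)-[r]->(b)
    RETURN id(a) AS src, type(r) AS rel, id(b) AS dst, properties(r) AS props
    """

    with driver.session() as session:
        nodes = [record.data() for record in session.run(NODE_DUMP)]
        edges = [record.data() for record in session.run(EDGE_DUMP)]

    # 'nodes' and 'edges' are now plain dictionaries, ready for a type-inference pass.
    print(len(nodes), len(edges))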


Proceedings ArticleDOI
30 May 2020
TL;DR: Cereal, a specialized hardware accelerator for memory object serialization, is proposed by co-designing the serialization format with hardware architecture and effectively utilizes abundant parallelism in the S/D process to deliver high throughput.
Abstract: Object serialization and deserialization (S/D) is an essential feature for efficient communication between distributed computing nodes with potentially non-uniform execution environments. S/D operations are widely used in big data analytics frameworks for remote procedure calls and massive data transfers like shuffles. However, frequent S/D operations incur significant performance and energy overheads as they must traverse and process a large object graph. Prior approaches improve S/D throughput by effectively hiding disk or network I/O latency with computation, increasing compression ratio, and/or application-specific customization. However, inherent dependencies in the existing (de)serialization formats and algorithms eventually become the major performance bottleneck. Thus, we propose Cereal, a specialized hardware accelerator for memory object serialization. By co-designing the serialization format with hardware architecture, Cereal effectively utilizes abundant parallelism in the S/D process to deliver high throughput. Cereal also employs an efficient object packing scheme to compress metadata such as object reference offsets and a space-efficient bitmap representation for the object layout. Our evaluation of Cereal using both a cycle-level simulator and synthesizable Chisel RTL demonstrates that Cereal delivers 43.4× higher average S/D throughput than 88 other S/D libraries on Java Serialization Benchmark Suite. For six Spark applications Cereal achieves 7.97× and 4.81× speedups on average for S/D operations over Java built-in serializer and Kryo, respectively, while saving S/D energy by 227.75× and 136.28×.

20 citations


Journal ArticleDOI
TL;DR: This work proposes a novel technique, called adaptive two-layer serialization algorithm, which can achieve good performance in communication for different kinds of messages in autonomous robot systems and shows significant performance improvement over traditional methods in ROS2.
Abstract: With the development of deep learning, autonomous robot systems grow rapidly and require better performance. Robot Operating System 2 (ROS2) has been widely adopted as the main communication framework in autonomous robot systems. However, the performance of ROS2 has become the bottleneck of these real-time systems. From our observations, we find that it can take a large amount of time to serialize complex messages in communication, especially for high-level programming languages such as Python and Java. To address this challenge, we propose a novel technique, called the adaptive two-layer serialization algorithm, which can achieve good performance in communication for different kinds of messages. Experimental results show that our algorithm can achieve significant performance improvement over traditional methods in ROS2, up to 93% improvement in our framework. We have successfully applied our proposed techniques in a real autonomous robot system.

18 citations
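
The general intuition behind a two-layer scheme is to use a cheap fixed-layout encoding for simple messages and fall back to a general-purpose serializer for complex ones. The sketch below illustrates only that intuition in plain Python; it is not the authors' algorithm and does not use the ROS2 API, and the prefix bytes and selection rule are invented.

    import json, struct

    def serialize(msg: dict) -> bytes:
        """Illustrative two-layer scheme: a compact fixed layout for small, flat
        numeric messages, falling back to a general serializer for everything else."""
        if all(isinstance(v, float) for v in msg.values()) and len(msg) <= 4:
            # Layer 1: pack values directly; field order is fixed by sorted keys.
            payload = struct.pack(f"<{len(msg)}d", *(msg[k] for k in sorted(msg)))
            return b"\x01" + payload
        # Layer 2: general-purpose (slower, but handles nested structures).
        return b"\x02" + json.dumps(msg).encode()

    print(serialize({"x": 1.0, "y": 2.0}))                   # fast path
    print(serialize({"pose": {"x": 1.0}, "frame": "map"}))   # fallback path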


Posted Content
TL;DR: This work develops a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format and introduces relaxations to common analytical data formats to efficiently update records.
Abstract: The proliferation of modern data processing tools has given rise to open-source columnar data formats. The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application. These formats, however, are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) systems. We aim to reduce or even eliminate this process by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar storage blocks in a universal open-source format. We introduce relaxations to common analytical data formats to efficiently update records and rely on a lightweight transformation process to convert blocks to a read-optimized layout when they are cold. We also describe how to access data from third-party analytical tools with minimal serialization overhead. To evaluate our work, we implemented our storage engine based on the Apache Arrow format and integrated it into the DB-X DBMS. Our experiments show that our approach achieves comparable performance with dedicated OLTP DBMSs while enabling orders-of-magnitude faster data exports to external data science and machine learning tools than existing methods.

17 citations
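
The abstract above argues for emitting storage blocks in a universal open-source columnar format so external tools can read them with minimal serialization overhead. As a small illustration of that consumption path (not DB-X's interface), the pyarrow snippet below builds an Arrow record batch and hands it to pandas without a row-by-row serialization step; the data is invented.

    import pyarrow as pa

    # A columnar block in the Arrow format (illustrative data only).
    batch = pa.record_batch(
        [pa.array([1, 2, 3], type=pa.int64()),
         pa.array(["a", "b", "c"])],
        names=["id", "payload"],
    )

    # Analytical tools can consume the same columnar buffers directly,
    # e.g. as a pandas DataFrame:
    table = pa.Table.from_batches([batch])
    df = table.to_pandas()
    print(df)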


Journal ArticleDOI
01 Sep 2020
TL;DR: This paper uses blockchains to develop a product serialization method that solves the above security issues in a multi-party perishable good supply chain and proposes a secure serialization protocol to verify the authenticity of serial numbers despite not frequently engaging with the blockchain.
Abstract: Product serialization aims to allocate unique serial numbers to products in a supply chain. The security challenges to product serialization are: (1) valid serial numbers can be stolen and used to label fake products, so the uniqueness of a serial number should be verifiable at any stage of its lifecycle in a supply chain; (2) a planned change of custody of a product in distribution can be corrupted by a few intimidatory nodes, so compliance with the planned change of custody should be verifiable; (3) the manufacturer and the consumer should be able to verify that perishable food products with expired shelf life are discarded. In this paper, we use blockchains to develop a product serialization method that solves the above security issues in a multi-party perishable goods supply chain. Blockchains can revolutionize security and transparency in supply chains by providing a secure data-sharing platform in a multi-party environment. Although blockchains can provide secure data storage of change-of-custody events of products in a supply chain, a high volume of such events poses scalability problems for blockchains. In this paper, we solve the product serialization problem using blockchain offline channels. Our solution significantly reduces the number of transactions that need to be recorded in the blockchain. We propose a secure serialization protocol to verify the authenticity of serial numbers despite not frequently engaging with the blockchain.

17 citations


Journal ArticleDOI
TL;DR: The results show that the proposed model can validly recognize the human motion serialization and achieve 93% recognition accuracy within the initial 20% duration of the activities, which is of great significance for real-time human motion recognition.
Abstract: Motivated by the intrinsic dynamics of physical motion as well as establishment of target motion model, this article addresses the problem of human motion recognition with ultra wide band (UWB) through-the-wall radar (TWR) in a novel view of range profile serialization. Specifically, we first convert the original radar echoes into range profiles. Then, an auto-encoder network (AEN) with three dense layers is adopted to reduce the dimension and extract the features of each range profile. After that, a gated recurrent unit (GRU) network with two hidden layers is employed to deal with the features of each time-range slice and output the recognition results at each slice in real time. Finally, experimental data with respect to four different behind-wall human motions is collected by self-developed UWB TWR to validate the effectiveness of the proposed model. The results show that the proposed model can validly recognize the human motion serialization and achieve 93% recognition accuracy within the initial 20% duration of the activities (the average durations are 4s, 5.5s, 3s and 4.5s), which is of great significance for real-time human motion recognition.

16 citations
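
The pipeline described above feeds per-profile features from a three-dense-layer auto-encoder into a two-layer GRU that emits a prediction at every time slice. The PyTorch sketch below reproduces only that overall shape; all layer sizes and the number of classes are illustrative, not the values used in the paper.

    import torch
    import torch.nn as nn

    class MotionRecognizer(nn.Module):
        """Sketch of the pipeline shape described above: a dense encoder per
        range profile followed by a two-layer GRU and a per-slice classifier."""
        def __init__(self, range_bins=256, feat_dim=32, num_classes=4):
            super().__init__()
            self.encoder = nn.Sequential(          # three dense layers
                nn.Linear(range_bins, 128), nn.ReLU(),
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, feat_dim),
            )
            self.gru = nn.GRU(feat_dim, 64, num_layers=2, batch_first=True)
            self.classifier = nn.Linear(64, num_classes)

        def forward(self, x):                      # x: (batch, time, range_bins)
            feats = self.encoder(x)                # encode each range profile
            out, _ = self.gru(feats)               # temporal modelling
            return self.classifier(out)            # a prediction at every time slice

    model = MotionRecognizer()
    logits = model(torch.randn(2, 50, 256))        # 2 sequences, 50 slices each
    print(logits.shape)                            # torch.Size([2, 50, 4])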


Journal ArticleDOI
TL;DR: This contribution presents the status and plans of the future ROOT 7 event I/O, and shows how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations.
Abstract: The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout drives rapid analyses, as only those parts ("branches") that are really used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which allows users to directly store their event classes without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust, where possible compile-time safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT's new experimental I/O subsystem that combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as file-less storage systems such as object stores.

Book ChapterDOI
22 Jun 2020
TL;DR: This paper proposes PROV-JSONLD, a JSON-LD serialization for the PROV Data Model that provides a lightweight yet natural encoding of PROV expressions, can be processed directly as Linked Data, and is suitable for interchanging provenance information in Web and Linked Data applications.
Abstract: Provenance is information about entities, activities, and people involved in producing a piece of data or a thing, which can be used to form assessments about the data or the thing’s quality, reliability, or trustworthiness. PROV-DM is the conceptual data model that forms the basis for the W3C provenance (PROV) family of specifications. In this paper, we propose a new serialization for PROV in JSON called PROV-JSONLD. It provides a lightweight representation of PROV expressions in JSON, which is suitable to be processed by Web applications, while maintaining a natural encoding that is familiar with PROV practitioners. In addition, PROV-JSONLD exploits JSON-LD to define a semantic mapping that conforms to the PROV-O specification and, hence, the encoded PROV expressions can be readily processed as Linked Data. Finally, we show that the serialization is also efficiently processable in our evaluation. Overall, PROV-JSONLD is designed to be suitable for interchanging provenance information in Web and Linked Data applications, to offer a natural encoding of provenance for its targeted audience, and to allow for fast processing.
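
To make the idea concrete, the snippet below builds a JSON-LD-style provenance document in the spirit of what the abstract describes: ordinary JSON that also carries a semantic mapping via an @context. It is illustrative only; the exact context and key layout of the PROV-JSONLD specification may differ.

    import json

    doc = {
        "@context": {"prov": "http://www.w3.org/ns/prov#",
                     "ex": "http://example.org/"},
        "@graph": [
            {"@id": "ex:report", "@type": "prov:Entity",
             "prov:wasGeneratedBy": {"@id": "ex:analysis"}},
            {"@id": "ex:analysis", "@type": "prov:Activity",
             "prov:wasAssociatedWith": {"@id": "ex:alice"}},
            {"@id": "ex:alice", "@type": "prov:Agent"},
        ],
    }
    print(json.dumps(doc, indent=2))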

Journal ArticleDOI
01 Apr 2020
TL;DR: This work presents an approach to directly model uni- and bidirectional noncontainment relations in RAGs and provide efficient means for navigating and editing them and discusses the efficient and inter-operable serialization and deserialization of such model instances.
Abstract: Just like current software systems, conceptual models are characterised by increasing complexity and rate of change. Yet, these models only become useful if they can be continuously evaluated, validated and serialized. To achieve sufficiently low response times for large models, incremental analysis is required. Reference Attribute Grammars (RAGs) offer mechanisms to perform incremental analysis efficiently using dynamic dependency tracking. However, not all features used in conceptual modelling are directly available in RAGs. In particular, support for noncontainment model relations is only available through encodings. We present an approach called Relational RAGs to directly model uni- and bidirectional noncontainment relations in RAGs and provide efficient means for navigating and editing them. Furthermore, we discuss the efficient and inter-operable serialization and deserialization of such model instances. This approach is evaluated using a scalable benchmark for incremental model editing and the JastAdd RAG system. Our work demonstrates the suitability of RAGs for validating complex and continuously changing models of current software systems.

Journal ArticleDOI
TL;DR: BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas while reducing file sizes by more than a factor of two versus gzip-compressed CIF files, is introduced.
Abstract: 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
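
The "bespoke compression techniques" mentioned above operate column-wise on CIF categories. The sketch below illustrates the general flavor of such encodings (delta encoding followed by run-length encoding of a slowly varying integer column); it is not the actual BinaryCIF codec set, and the sample values are invented.

    def delta_encode(values):
        # Store the first value and successive differences; slowly varying
        # columns (e.g. atom serial numbers) become highly compressible.
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def run_length_encode(values):
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return runs

    serials = [1001, 1002, 1003, 1004, 1010]
    print(delta_encode(serials))                      # [1001, 1, 1, 1, 6]
    print(run_length_encode(delta_encode(serials)))   # [[1001, 1], [1, 3], [6, 1]]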

Journal ArticleDOI
TL;DR: A binary serialization format originating from the family of EXPRESS standards is assessed; it is based on an existing open, binary, hierarchical data format called HDF5 that allows random access to specific instances and therefore efficient retrieval of relevant subsets.
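
The key property named above is random access: a relevant subset can be read without deserializing the whole file. The h5py snippet below demonstrates that property of HDF5 in general terms; the group and dataset names are invented and unrelated to the assessed EXPRESS-based format.

    import h5py
    import numpy as np

    # Write a toy hierarchical file (names are invented).
    with h5py.File("instances.h5", "w") as f:
        f.create_dataset("model/coordinates", data=np.random.rand(100_000, 3))

    # Random access: only the requested slice is read from disk.
    with h5py.File("instances.h5", "r") as f:
        subset = f["model/coordinates"][500:510]
    print(subset.shape)   # (10, 3)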

Journal ArticleDOI
TL;DR: This research proposes a model that can automatically transform a SysML Requirement Diagram into an OWL file so that system designs can be easily understood by both humans and machines.
Abstract: Requirement Diagrams are used by the Systems Modeling Language (SysML) to depict and model non-functional requirements, such as response time, size, or system functionality, which cannot be accommodated in the Unified Modeling Language (UML). Nevertheless, SysML still lacks the capability to represent the semantic contexts within the design. Web Ontology Language (OWL) can be used to capture the semantic context of a system design; hence, the transformation of SysML diagrams into OWL is needed. SysML diagrams are currently transformed into OWL manually, which is error-prone and requires considerable time and effort from system engineers. This research proposes a model that can automatically transform a SysML Requirement Diagram into an OWL file so that system designs can be easily understood by both humans and machines. It also allows users to extract knowledge contained in previous diagrams. The transformation process makes use of a transformation rule and an algorithm that can be used to change a SysML Requirement Diagram into an OWL ontology file. XML Metadata Interchange (XMI) serialization is used as the bridge to perform the transformation. The produced ontology can be viewed in Protégé. The class and subclass hierarchy, as well as the object properties and data properties, are clearly shown. In the experiment, it is also shown that the model can conduct the transformation correctly.

Journal ArticleDOI
02 Jan 2020-PLOS ONE
TL;DR: The Core Scientific Dataset model with JavaScript Object Notation (JSON) serialization is presented as a lightweight, portable, and versatile standard for intra- and interdisciplinary scientific data exchange.
Abstract: The Core Scientific Dataset (CSD) model with JavaScript Object Notation (JSON) serialization is presented as a lightweight, portable, and versatile standard for intra- and interdisciplinary scientific data exchange. This model supports datasets with a p-component dependent variable, {U0, …, Uq, …, Up-1}, discretely sampled at M unique points in a d-dimensional independent variable (X0, …, Xk, …, Xd-1) space. Moreover, this sampling is over an orthogonal grid, regular or rectilinear, where the principal coordinate axes of the grid are the independent variables. It can also hold correlated datasets assuming the different physical quantities (dependent variables) are sampled on the same orthogonal grid of independent variables. The model encapsulates the dependent variables' sampled data values and the minimum metadata needed to accurately represent this data in an appropriate coordinate system of independent variables. The CSD model can serve as a re-usable building block in the development of more sophisticated portable scientific dataset file standards.
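
As a concrete illustration of the model described above, the snippet below serializes a one-dimensional dataset (d = 1) with a single scalar dependent variable (p = 1) sampled at M = 5 grid points. The key names are chosen to mirror the description in the abstract and may not match the exact CSD model schema.

    import json

    dataset = {
        "version": "1.0",
        "dimensions": [                      # the independent-variable grid (d = 1)
            {"type": "linear", "count": 5, "increment": "1.0 s", "origin_offset": "0.0 s"}
        ],
        "dependent_variables": [             # p = 1 component, sampled at M = 5 points
            {"type": "internal", "quantity_type": "scalar", "unit": "V",
             "components": [[0.0, 0.7, 1.0, 0.7, 0.0]]}
        ],
    }
    print(json.dumps(dataset, indent=2))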

22 Jun 2020
TL;DR: Comparing the performance tradeoffs of popular application-layer messaging protocols and binary serialization formats in the context of vehicle-to-cloud communication for maintaining digital twins shows that CoAP has the lowest latency and overhead, but is not able to guarantee reliable transfer, even when using its confirmable message feature.
Abstract: This paper compares the performance tradeoffs of popular application-layer messaging protocols and binary serialization formats in the context of vehicle-to-cloud communication for maintaining digital twins. Of particular interest are solutions that enable emerging delay-sensitive Intelligent Transport System (ITS) features while reducing data usage in mobile networks. The evaluated protocols are Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP), and Message Queuing Telemetry Transport (MQTT), and the serialization formats studied are Protobuf and Flatbuffers. The results show that CoAP – the only User Datagram Protocol (UDP) based protocol evaluated – has the lowest latency and overhead, but is not able to guarantee reliable transfer, even when using its confirmable message feature. For our context, the best performer that guarantees reliable transfer is MQTT. For the serialization formats, Protobuf is shown to have faster serialization speed and three times smaller serialized message size than Flatbuffers. In contrast, Flatbuffers uses less memory and has shorter deserialization time, making it an interesting alternative for applications where the vehicle is the receiver of time-sensitive information. Finally, insights and implications for ITS communication are discussed.
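
Since the study identifies MQTT as the best performer that guarantees reliable transfer, the sketch below shows a minimal QoS-1 ("at-least-once") publish of a binary payload with the paho-mqtt library. The broker hostname and topic are placeholders, and the payload uses the standard-library struct module as a self-contained stand-in for a Protobuf or Flatbuffers message.

    import struct
    import paho.mqtt.publish as publish   # pip install paho-mqtt; broker and topic are placeholders

    # A fixed-layout binary payload standing in for a serialized message:
    # latitude, longitude and speed as three little-endian doubles.
    payload = struct.pack("<3d", 57.7089, 11.9746, 13.9)

    # QoS 1 gives at-least-once delivery, i.e. the reliable transfer that the
    # evaluated CoAP configuration could not guarantee in the study above.
    publish.single("vehicle/123/twin", payload=payload, qos=1,
                   hostname="broker.example.com")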

Posted ContentDOI
16 Nov 2020-bioRxiv
TL;DR: The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as with other life science data formats required for depositing data in public databases.
Abstract: Background The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open-source community specifications and software tools for enabling discovery, exchange and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab – a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, a JSON serialization ISA-JSON was developed. Results In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python class objects. We describe the ISA API feature set, early adopters and its growing user community. Conclusions The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as with other life science data formats required for depositing data in public databases.

Posted Content
TL;DR: A novel refinement technique to progressively improve the CFG, which obtains counterexamples from CFG inclusion queries and uses them to introduce new non-terminals and productions to the grammar while still over-approximating the program’s relevant behavior.
Abstract: Several real-world libraries (e.g., reentrant locks, GUI frameworks, serialization libraries) require their clients to use the provided API in a manner that conforms to a context-free specification. Motivated by this observation, this paper describes a new technique for verifying the correct usage of context-free API protocols. The key idea underlying our technique is to over-approximate the program's feasible API call sequences using a context-free grammar (CFG) and then check language inclusion between this grammar and the specification. However, since this inclusion check may fail due to imprecision in the program's CFG abstraction, we propose a novel refinement technique to progressively improve the CFG. In particular, our method obtains counterexamples from CFG inclusion queries and uses them to introduce new non-terminals and productions to the grammar while still over-approximating the program's relevant behavior. We have implemented the proposed algorithm in a tool called CFPChecker and evaluate it on 10 popular Java applications that use at least one API with a context-free specification. Our evaluation shows that CFPChecker is able to verify correct usage of the API in clients that use it correctly and produces counterexamples for those that do not. We also compare our method against three relevant baselines and demonstrate that CFPChecker enables verification of safety properties that are beyond the reach of existing tools.

Journal ArticleDOI
TL;DR: This paper shows how to write a software program so as to avoid the port-conflict serialization problem in high-level synthesis, applies the method to two image processing programs, and evaluates its effect on them.
Abstract: High-level synthesis (HLS), which automatically converts software into hardware, is a promising technology for reducing the design burden significantly. However, to use HLS technology efficiently, the software program must be described with the hardware organization that the HLS tool will generate in mind. We are developing an HLS image processing library. However, some caution is required when using HLS for programs that read images. When the same image is read through an argument of a function, the input port corresponding to this argument on the hardware generated by the HLS tool may cause a port conflict. As a result, image reading is serialized, and this serialization disturbs the performance of the data path that the HLS tool has otherwise pipelined well. This paper shows how to write a software program to avoid this problem. In addition, we apply this method to two image processing programs and evaluate the effect of our proposal on them.

Proceedings ArticleDOI
11 Jun 2020
TL;DR: The implementation of continuation marks for Chez Scheme (in support of Racket) makes dynamic binding and lookup constant-time and fast, preserves the performance of Chez scheme's first-class continuations, and imposes negligible overhead on program fragments that do not use first- class continuations or marks.
Abstract: Continuation marks enable dynamic binding and context inspection in a language with proper handling of tail calls and first-class, multi-prompt, delimited continuations. The simplest and most direct use of continuation marks is to implement dynamically scoped variables, such as the current output stream or the current exception handler. Other uses include stack inspection for debugging or security checks, serialization of an in-progress computation, and run-time elision of redundant checks. By exposing continuation marks to users of a programming language, more kinds of language extensions can be implemented as libraries without further changes to the compiler. At the same time, the compiler and runtime system must provide an efficient implementation of continuation marks to ensure that library-implemented language extensions are as effective as changing the compiler. Our implementation of continuation marks for Chez Scheme (in support of Racket) makes dynamic binding and lookup constant-time and fast, preserves the performance of Chez Scheme's first-class continuations, and imposes negligible overhead on program fragments that do not use first-class continuations or marks.

Journal ArticleDOI
TL;DR: This paper demonstrates how HDT - a compressed serialization format for RDF - can be extended to support encryption and proposes a number of different graph partitioning strategies, discussing the benefits and tradeoffs of each approach.
Abstract: The publication and interchange of RDF datasets online has experienced significant growth in recent years, promoted by different but complementary efforts, such as Linked Open Data, the Web of Things and RDF stream processing systems. However, the current Linked Data infrastructure does not cater for the storage and exchange of sensitive or private data. On the one hand, data publishers need means to limit access to confidential data (e.g. health, financial, personal, or other sensitive data). On the other hand, the infrastructure needs to compress RDF graphs in a manner that minimises the amount of data that is both stored and transferred over the wire. In this paper, we demonstrate how HDT - a compressed serialization format for RDF - can be extended to support encryption. We propose a number of different graph partitioning strategies and discuss the benefits and tradeoffs of each approach.

ReportDOI
20 Jan 2020
TL;DR: The JCS specification defines how to create a canonical representation of JSON data by building on the strict serialization methods for JSON primitives defined by ECMAScript, constraining JSON data to the I-JSON subset, and by using deterministic property sorting.
Abstract: Cryptographic operations like hashing and signing need the data to be expressed in an invariant format so that the operations are reliably repeatable. One way to address this is to create a canonical representation of the data. Canonicalization also permits data to be exchanged in its original form on the "wire" while cryptographic operations performed on the canonicalized counterpart of the data at the producer and consumer endpoints generate consistent results. This document describes the JSON Canonicalization Scheme (JCS). The JCS specification defines how to create a canonical representation of JSON data by building on the strict serialization methods for JSON primitives defined by ECMAScript, constraining JSON data to the I-JSON subset, and by using deterministic property sorting.
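
The snippet below is a rough Python approximation of the canonicalization idea: sorted object keys, no insignificant whitespace, UTF-8 output, so that equivalent JSON documents hash identically. Note that it does not reproduce JCS's ECMAScript-style number serialization for all values, so it only approximates the specification.

    import hashlib
    import json

    def approx_jcs(obj) -> bytes:
        """Approximate canonical form: lexicographically sorted keys, minimal
        separators, UTF-8.  JCS additionally mandates ECMAScript number
        formatting, which json.dumps does not fully reproduce."""
        return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                          ensure_ascii=False).encode("utf-8")

    a = {"b": 2, "a": 1, "nested": {"y": True, "x": None}}
    b = {"nested": {"x": None, "y": True}, "a": 1, "b": 2}

    # The same logical content hashes identically regardless of original key order.
    assert hashlib.sha256(approx_jcs(a)).hexdigest() == hashlib.sha256(approx_jcs(b)).hexdigest()
    print(approx_jcs(a).decode())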

Book ChapterDOI
Zihao Zhang, Huiqi Hu, Yang Yu, Weining Qian, Ke Shu
24 Sep 2020
Abstract: Modern databases are commonly deployed on multiple commercial machines with quorum-based replication to provide high availability and guarantee strong consistency. A widely adopted consensus protocol is Raft because it is easy to understand and implement. However, Raft's strict serialization limits the concurrency of the system, making it unable to reflect the capability of highly concurrent transaction processing brought by new hardware and concurrency control technologies. Upon realizing this, this work targets improving the parallelism of replication. We propose a variant of the Raft protocol named DP-Raft to support parallel replication of database logs so that it can match the speed of transaction execution. Our key contributions are: (1) we define the rules for using log dependencies to commit and apply logs out of order; (2) DP-Raft is proposed for replicating logs in parallel; it preserves log dependencies to ensure the safety of parallel replication and uses additional data structures to reduce the cost of state maintenance; (3) experiments on the YCSB benchmark show that our method improves throughput and reduces the latency of transaction processing in database systems compared with existing Raft-based solutions.

Journal ArticleDOI
TL;DR: It is found that specific forms of constrained RSD ensure fairness under certain assumptions about the content of those orders but that the general case nevertheless requires unconstrained RSD.
Abstract: The emergence of high frequency trading has resulted in 'bursts' of orders arriving at an exchange (nearly) simultaneously, yet most electronic financial exchanges implement the continuous limit order book, which requires processing of orders serially. Contrary to an assumption that appears throughout the economics literature, the technology that performs serialization provides only constrained random serial dictatorship (RSD), in the sense that not all priority orderings of agents are possible. We provide necessary and sufficient conditions for fairness under different market conditions on orders for constrained RSD mechanisms. Our results show that exchanges relying on the current serialization technology cannot ensure fairness, including exchanges using 'speed bumps.' We find that specific forms of constrained RSD ensure fairness under certain assumptions about the content of those orders but that the general case nevertheless requires unconstrained RSD. Our results have implications for the design of trading exchanges.

Posted Content
01 May 2020
TL;DR: It is shown that SPADE (SPatial DEpendency parser), an end-to-end spatial dependency parser that is serializer-free and capable of modeling an arbitrary number of information layers, outperforms the previous BIO tagging-based approach on name card parsing task and achieves comparable performance on receipt parsing task.
Abstract: Information Extraction (IE) for document images is often approached as a BIO tagging problem, where the model sequentially goes through and classifies each recognized input token into one of the information categories. However, such a problem setup has two inherent limitations: (1) it can only extract a flat list of information and (2) it assumes that the input data is serialized, often by a simple rule-based script. Nevertheless, real-world documents often contain hierarchical information in the form of two-dimensional language data in which the serialization can be highly non-trivial. To tackle these issues, we propose SPADE♠ (SPatial DEpendency parser), an end-to-end spatial dependency parser that is serializer-free and capable of modeling an arbitrary number of information layers, making it suitable for parsing structure-rich documents such as receipts and multimodal documents such as name cards. We show that SPADE♠ outperforms the previous BIO tagging-based approach on the name card parsing task and achieves comparable performance on the receipt parsing task. In particular, when the receipt images have a non-flat manifold representing physical distortion of the receipt paper in the real world, SPADE♠ outperforms the tagging-based method by a large margin of 25.8%, highlighting the strong performance of SPADE♠ on spatially complex documents.

Book ChapterDOI
01 Jan 2020
TL;DR: A new, simpler, non-epigenetic alternative to Plush, called Plushy, is presented that appears to maintain all of the advantages of Plush while providing additional benefits and illustrates the virtues of unconstrained linear genome representations more generally.
Abstract: In many genetic programming systems, the program variation and execution processes operate on different program representations. The representations on which variation operates are referred to as genomes. Unconstrained linear genome representations can provide a variety of advantages, including reduced complexity of program generation, variation, simplification and serialization operations. The Plush genome representation, which uses epigenetic markers on linear genomes to express nonlinear structures, has supported the production of state-of-the-art results in program synthesis with the PushGP genetic programming system. Here we present a new, simpler, non-epigenetic alternative to Plush, called Plushy, that appears to maintain all of the advantages of Plush while providing additional benefits. These results illustrate the virtues of unconstrained linear genome representations more generally, and may be transferable to genetic programming systems that target different languages for evolved programs.