
Showing papers on "Serialization" published in 2021


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors observe that widely deployed NICs possess scatter-gather capabilities that can be re-purposed to accelerate serialization's core task of coalescing and flattening in-memory data structures.
Abstract: Microsecond I/O will make data serialization a major bottleneck for datacenter applications. Serialization is fundamentally about data movement: serialization libraries coalesce and flatten in-memory data structures into a single transmittable buffer. CPU-based serialization approaches will hit a performance limit due to data movement overheads and be unable to keep up with modern networks. We observe that widely deployed NICs possess scatter-gather capabilities that can be re-purposed to accelerate serialization's core task of coalescing and flattening in-memory data structures. It is possible to build a completely zero-copy, zero-allocation serialization library with commodity NICs. Doing so introduces many research challenges, including using the hardware capabilities efficiently for a wide variety of non-uniform data structures, making application memory available for zero-copy I/O, and ensuring memory safety.
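
The core primitive being re-purposed here, gather I/O over a list of non-contiguous buffers, can be illustrated in a few lines of ordinary Python; the sketch below uses POSIX scatter-gather via socket.sendmsg on a Unix socket pair and is only an illustration of the concept, not the NIC-offload design itself.

```python
# Gather-style send: the kernel is handed a list of separate buffers (an iovec)
# and transmits them as one message, without the application first coalescing
# them into a single contiguous buffer. Unix-only; buffers are illustrative.
import socket

header = b"\x01\x00\x00\x00"            # e.g. a message-type tag
payload = bytearray(b"hello, world")     # e.g. a string field elsewhere in memory

left, right = socket.socketpair()
left.sendmsg([header, payload])          # two buffers, one transmitted message
print(right.recv(1024))                  # b'\x01\x00\x00\x00hello, world'
left.close(); right.close()
```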

19 citations


Proceedings ArticleDOI
18 Oct 2021
TL;DR: HyperProtoBench, as presented in this paper, is an open-source benchmark representative of key serialization-framework user services at scale, built from a fleet-wide profile of Protocol Buffers (protobuf) usage.
Abstract: Serialization frameworks are a fundamental component of scale-out systems, but introduce significant compute overheads. However, they are amenable to acceleration with specialized hardware. To understand the trade-offs involved in architecting such an accelerator, we present the first in-depth study of serialization framework usage at scale by profiling Protocol Buffers (“protobuf”) usage across Google’s datacenter fleet. We use this data to build HyperProtoBench, an open-source benchmark representative of key serialization-framework user services at scale. In doing so, we identify key insights that challenge prevailing assumptions about serialization framework usage. We use these insights to develop a novel hardware accelerator for protobufs, implemented in RTL and integrated into a RISC-V SoC. Applications can easily harness the accelerator, as it integrates with a modified version of the open-source protobuf library and is wire-compatible with standard protobufs. We have fully open-sourced our RTL, which, to the best of our knowledge, is the only such implementation currently available to the community. We also present a first-of-its-kind, end-to-end evaluation of our entire RTL-based system running hyperscale-derived benchmarks and microbenchmarks. We boot Linux on the system using FireSim to run these benchmarks and implement the design in a commercial 22nm FinFET process to obtain area and frequency metrics. We demonstrate an average 6.2 × to 11.2 × performance improvement vs. our baseline RISC-V SoC with BOOM OoO cores and despite the RISC-V SoC’s weaker uncore/supporting components, an average 3.8 × improvement vs. a Xeon-based server.
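
To make the compute overhead concrete, the sketch below shows the kind of per-field work a software protobuf serializer performs and that a hardware accelerator would offload: standard base-128 varint encoding of an unsigned integer field. This is the public protobuf wire format, not the paper's RTL or modified library.

```python
# Encode field number 1 (wire type 0 = varint) carrying the value 300.
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)      # set the continuation bit
        else:
            out.append(byte)
            return bytes(out)

def encode_uint_field(field_number: int, value: int) -> bytes:
    key = (field_number << 3) | 0        # tag = field number + wire type
    return encode_varint(key) + encode_varint(value)

assert encode_uint_field(1, 300) == b"\x08\xac\x02"
```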

19 citations


Journal ArticleDOI
TL;DR: PowerSystems.jl implements an abstract hierarchy to represent and customize power systems data and includes data containers for quasi-static and dynamic simulation applications, with efficient management of large quantities of time series data, optimized serialization, and comprehensive validation capabilities.

18 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors argue for offloading serialization logic to the DMA path via specialized hardware, and propose an initial hardware design for such an accelerator, and give preliminary evidence of its feasibility and expected benefits.
Abstract: Achieving zero-copy I/O has long been an important goal in the networking community. However, data serialization obviates the benefits of zero-copy I/O, because it requires the CPU to read, transform, and write message data, resulting in additional memory copies between the real object instances and the contiguous socket buffer. Therefore, we argue for offloading serialization logic to the DMA path via specialized hardware. We propose an initial hardware design for such an accelerator, and give preliminary evidence of its feasibility and expected benefits.

18 citations


Journal ArticleDOI
TL;DR: In this paper, a context-aware recommendation system for improving manufacturing process modeling is proposed, where independent paths and P,Q-grams are efficiently extracted from the manufacturing processes in the repository to represent their typical behavior and structure.
Abstract: Process recommendation is an essential technique to help process modelers effectively and efficiently model a manufacturing process from scratch. However, current process recommendation methods suffer from the following problems: (1) To extract all the execution paths from a manufacturing process, behavior-based methods may run into a state-space explosion problem when unfolding a process with multiple parallel patterns, resulting in low efficiency. (2) Current structure-based methods are inefficient since too many expensive computations of the graph edit distance are involved. (3) Most existing methods manually design their process similarity metrics around a handful of features, so they can only be applied in specific situations. (4) Few works provide visualization tools for process modeling assistance. To resolve these problems, this paper proposes a context-aware recommendation system for improving manufacturing process modeling. First, independent paths and P,Q-grams are efficiently extracted from the manufacturing processes in the repository to represent their typical behavior and structure. Then, the process recommendation problem is transformed into the word prediction problem in natural language processing, where the serialization of an independent path/P,Q-gram is regarded as a sentence and a node in it as a word. The Word2vec model is introduced to automatically learn the relationships among nodes from independent paths and P,Q-grams and to generate vectors with hundreds of context-aware features for nodes in the repository. After that, the top-k similar nodes are recommended for the target node in the process fragment under construction based on the k-nearest neighbors algorithm. Finally, a visualization tool is provided for process modelers to efficiently design a new manufacturing process. Experimental evaluations show that the proposed method performs similarly to or even better than the baseline methods in terms of recommendation quality.
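
The core idea, treating each serialized independent path as a "sentence" of node "words" and learning context-aware node vectors, can be sketched with an off-the-shelf Word2vec implementation; the node names and hyperparameters below are illustrative and not taken from the paper.

```python
# Minimal sketch using gensim's Word2Vec (gensim >= 4 API).
from gensim.models import Word2Vec

paths = [                                # serialized independent paths as "sentences"
    ["cut", "drill", "deburr", "inspect"],
    ["cut", "mill", "deburr", "inspect"],
    ["cast", "mill", "polish", "inspect"],
]
model = Word2Vec(sentences=paths, vector_size=32, window=2, min_count=1, epochs=50)

# Recommend top-k candidate nodes for a node in the fragment under construction.
print(model.wv.most_similar("mill", topn=3))
```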

15 citations


Journal ArticleDOI
TL;DR: A novel extension of Theatre, Parallel Theatre, developed to exploit the computing potential of today's shared-memory multi-core machines, is presented, and the particular control forms developed for untimed and timed parallel systems are described.

14 citations


Journal ArticleDOI
TL;DR: The ISA Metadata Framework as discussed by the authors is a set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences.
Abstract: BACKGROUND The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. RESULTS In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validation of the ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. CONCLUSIONS The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
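
A minimal usage sketch of the programmatic metadata handling described above follows; the module, class, and function names (isatools.model.Investigation, Study, isatools.isatab.dumps) are recalled from the isatools documentation and should be treated as assumptions, not a verified API reference.

```python
# Hypothetical ISA API sketch; names are assumptions and may differ by version.
from isatools.model import Investigation, Study
from isatools import isatab

investigation = Investigation(identifier="i1", title="Example investigation")
study = Study(filename="s_example.txt", title="Example study")
investigation.studies.append(study)

# Serialize the common data model to the ISA-Tab investigation format.
print(isatab.dumps(investigation))
```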

10 citations


Journal ArticleDOI
TL;DR: A deterministic rule-based approach is proposed to overcome serialization specificities and enable extraction of characteristic elements from differently serialized process models, and an online web-based model-driven tool named AMADEOS is implemented that automatically derives conceptual database models from process models represented by different notations and serialized in different ways.
Abstract: Existing tools that aim to derive data models from business process models are typically able to process source models represented by one single notation and serialized in one specific way. However, the standards (e.g., BPMN) enable different serialization formats and also provide serialization flexibility, which leads to various implementations of the standard in different modeling tools and results in differently serialized models in practice, significantly constraining the usability of existing model-driven tools. In this article, we present an approach to the automatic derivation of conceptual database models from business process models represented by different notations, with a particular focus on differently serialized process models. A deterministic rule-based approach is proposed to overcome the serialization specificities and to enable extraction of characteristic elements from differently serialized process models. Based on the proposed approach, we implemented an online web-based model-driven tool named AMADEOS, which is able to automatically derive conceptual database models from process models represented by different notations and also differently serialized. The experimental results show that the proposed approach and implemented tool enable successful extraction of specific elements from differently serialized process models and derivation of the target conceptual database models with very high completeness and precision.
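
The "serialization specificities" the approach overcomes can be seen in miniature below: two exports of the same BPMN task differ in namespace prefixes, yet matching on local element names extracts the same characteristic element from both. The XML snippets and the rule are illustrative only, not the paper's rule set.

```python
import xml.etree.ElementTree as ET

tool_a = ('<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL">'
          '<bpmn:process><bpmn:userTask id="t1" name="Enter order"/></bpmn:process>'
          '</bpmn:definitions>')
tool_b = ('<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">'
          '<process><userTask id="t1" name="Enter order"/></process></definitions>')

def user_tasks(xml_text):
    root = ET.fromstring(xml_text)
    # elem.tag looks like '{namespace}localName'; compare only the local name,
    # so prefixed and default-namespace serializations are treated alike.
    return [e.get("name") for e in root.iter() if e.tag.split("}")[-1] == "userTask"]

assert user_tasks(tool_a) == user_tasks(tool_b) == ["Enter order"]
```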

7 citations


Journal ArticleDOI
04 Jan 2021
TL;DR: CFPChecker as mentioned in this paper is a tool for verifying the correct usage of context-free API protocols by over-approximating the program's feasible API call sequences using a CFG and checking language inclusion between this grammar and the specification.
Abstract: Several real-world libraries (e.g., reentrant locks, GUI frameworks, serialization libraries) require their clients to use the provided API in a manner that conforms to a context-free specification. Motivated by this observation, this paper describes a new technique for verifying the correct usage of context-free API protocols. The key idea underlying our technique is to over-approximate the program’s feasible API call sequences using a context-free grammar (CFG) and then check language inclusion between this grammar and the specification. However, since this inclusion check may fail due to imprecision in the program’s CFG abstraction, we propose a novel refinement technique to progressively improve the CFG. In particular, our method obtains counterexamples from CFG inclusion queries and uses them to introduce new non-terminals and productions to the grammar while still over-approximating the program’s relevant behavior. We have implemented the proposed algorithm in a tool called CFPChecker and evaluate it on 10 popular Java applications that use at least one API with a context-free specification. Our evaluation shows that CFPChecker is able to verify correct usage of the API in clients that use it correctly and produces counterexamples for those that do not. We also compare our method against three relevant baselines and demonstrate that CFPChecker enables verification of safety properties that are beyond the reach of existing tools.
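
For intuition on what makes such a protocol context-free rather than finite-state, consider the reentrant-lock example from the abstract: every lock() must eventually be matched by an unlock(), i.e. the call string must belong to the language generated by S -> lock S unlock S | ε. The trace checker below is only an illustration of that property; CFPChecker instead verifies it statically by over-approximating all call sequences with a CFG and checking language inclusion.

```python
# Dynamic check of the balanced lock/unlock property over one call trace.
def conforms(trace):
    depth = 0
    for call in trace:
        if call == "lock":
            depth += 1
        elif call == "unlock":
            depth -= 1
            if depth < 0:            # unlock without a matching lock
                return False
    return depth == 0                # every lock eventually released

assert conforms(["lock", "lock", "unlock", "unlock"])
assert not conforms(["lock", "unlock", "unlock"])
```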

6 citations


Posted Content
TL;DR: In this paper, a machine learning model is proposed to automatically generate the highly structured 2D sketches that lie at the heart of computer-aided design (CAD) models, which are used in manufacturing to model everything from coffee mugs to sports cars.
Abstract: Computer-Aided Design (CAD) applications are used in manufacturing to model everything from coffee mugs to sports cars. These programs are complex and require years of training and experience to master. A component of all CAD models that is particularly difficult to create is the highly structured 2D sketch that lies at the heart of every 3D construction. In this work, we propose a machine learning model capable of automatically generating such sketches. Through this, we pave the way for developing intelligent tools that help engineers create better designs with less effort. Our method combines a general-purpose language modeling technique with an off-the-shelf data serialization protocol. We show that our approach has enough flexibility to accommodate the complexity of the domain and performs well for both unconditional synthesis and image-to-sketch translation.

6 citations


Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this paper, the authors explore multiple ways to reduce the conflicts caused by the serialization of atomic memory accesses that slows voting algorithms such as Connected Component Analysis on many-core architectures like GPUs, an issue made even more critical by the trend of increasing core counts.
Abstract: Connected Component Analysis is vastly used as a building block for many Computer Vision algorithms from many fields like medical image processing, surveillance, or autonomous driving. It extends Connected Component Labeling by computing some features of the connected components like their bounding box or their surface. As such, Connected Component Analysis is a voting algorithm just like histogram computation or Hough transform. Voting algorithms are difficult on many-core architectures like GPUs because of the serialization of atomic memory accesses. The trend to increase the number of cores makes this issue even more critical. This paper explores multiple ways to reduce those conflicts for voting algorithms and especially for Connected Component Analysis. We show that our new algorithm is from 4 up to 10 times faster than State-of-the-Art on average on an Nvidia A100.

Proceedings ArticleDOI
22 Jun 2021
TL;DR: Salsa as discussed by the authors is an approach to complement existing points-to analysis with respect to serialization-related features to enhance the call graph soundness while not greatly affecting its precision.
Abstract: Although call graphs are crucial for inter-procedural analyses, it is challenging to statically compute them for programs with dynamic features. Prior work focused on supporting certain kinds of dynamic features, but serialization-related features are still not very well supported. Therefore, we introduce Salsa, an approach to complement existing points-to analysis with respect to serialization-related features to enhance the call graph’s soundness while not greatly affecting its precision. We evaluate Salsa’s soundness, precision, and performance using 9 programs from the Java Call graph Assessment & Test Suite (CATS) and 4 programs from the XCorpus dataset. We compared Salsa against off-the-shelf call graph construction algorithms available on Soot, Doop, WALA, and OPAL. Our experiments showed that Salsa improved call graphs’ soundness while not greatly affecting their precision. We also observed that Salsa did not incur an extra overhead on the underlying pointer analysis method.

Journal ArticleDOI
02 Jul 2021-Sensors
TL;DR: In this paper, a new serialization format (PSON) is proposed for Internet of Things (IoT) environments, which simplifies the serialization/deserialization tasks and minimizes the messages to be sent/received.
Abstract: In many Internet of Things (IoT) environments, the lifetime of a sensor is linked to its power supply. Sensor devices capture external information and transmit it. They also receive messages with control commands, which means that one of the largest computational overheads of sensor devices is spent on data serialization and deserialization tasks, as well as data transmission. The simpler the serialization/deserialization and the smaller the size of the information to be transmitted, the longer the lifetime of the sensor device and, consequently, the longer it can remain in service. This paper presents a new serialization format (PSON) for these environments, which simplifies serialization/deserialization tasks and minimizes the messages to be sent and received. The paper presents evaluation results against the most popular serialization formats, demonstrating the improvement obtained with the new PSON format.
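
PSON's wire format itself is defined in the paper and is not reproduced here, but the principle the abstract relies on, that a fixed compact binary layout is both smaller and cheaper to (de)serialize than a self-describing text format, can be sketched as follows; the field layout is an assumption for illustration.

```python
import json
import struct

reading = {"id": 17, "temp": 21.5, "hum": 40.2}     # a typical sensor message

text = json.dumps(reading).encode()                  # self-describing text: 37 bytes
binary = struct.pack("<Hff", reading["id"], reading["temp"], reading["hum"])  # 10 bytes

print(len(text), len(binary))
ident, temp, hum = struct.unpack("<Hff", binary)     # trivial, allocation-light decode
```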

Journal ArticleDOI
TL;DR: In this article, a flipped classroom model for a digital micro-video for a big data English course is presented, which uses certain techniques to apply audiovisual language to the production of specific micro-class videos.
Abstract: This paper provides an in-depth analysis and study of the interactive flipped classroom model for a digital micro-video for a big data English course. To improve the learning efficiency of English courses and reduce the learning pressure of students, the thesis also uses certain techniques to apply audiovisual language to the production of specific micro-class videos, broadcast the successfully recorded micro-class courses to students, and then use the questionnaire to randomly distribute the designed audiovisual language use questionnaire. Micro-classes earnestly perform data statistics for students and finally conduct data analysis to summarize and verify the effects of micro-class audiovisual language use. The improved algorithm can effectively reduce the fluctuation of the consumption of various resources in the cluster and make the services in the cluster more stable. The new distributed interprocess communication based on protocol and serialization technology is more efficient than traditional communication based on protocol standards, reduces bandwidth consumption in the cluster, and improves the throughput of each node in the cluster. The content design and scripting of micro-video teaching resources are based on this. Then, the production process of micro-video teaching resources is explained, according to the selection of tools, the preparation, recording, editing, and generation of materials.

Journal ArticleDOI
TL;DR: In this article, the authors propose a method for planning the use of modular equipment in shale gas fields, establishing an optimization model that considers processing capacity, processing cost, floor area, construction cost, and changes in market supply and demand.
Abstract: The potential technical and economic advantages and flexible operability of modular equipment make it increasingly widely used in gas field production and development. In addition to considering the manufacturing process, the selection and serialization of modular equipment should be made according to changes in the gas well productivity curve, so as to meet field demand to the greatest extent and enhance the flexibility of the gathering and transportation system. This paper proposes a method to determine the use planning of modular equipment in shale gas fields. Considering the processing capacity, processing cost, floor area, and construction cost of modular equipment as well as changes in market supply and demand, an optimization model is established. On the basis of this model, a method for the serialization of modular equipment is proposed. The effectiveness of the model is verified by a real case study. It is shown that the model can optimize the layout of modular equipment, make the modular equipment run efficiently and economically, reduce costs, and increase efficiency. This study provides a reference for optimizing equipment management strategy and promoting green production practice in shale gas production.

Journal ArticleDOI
01 Jun 2021
TL;DR: An extension to Silver itself is described that simplifies writing language extensions for the ableC extensible C specification by allowing language engineers to specify C-language syntax trees using the concrete syntax of C (with typed holes) instead of writing abstract syntax trees.
Abstract: This paper shows how reflection on undecorated syntax trees (terms) used in attribute grammars can significantly reduce the amount of boiler-plate specifications that must be written. The proposed reflection system is implemented in the form of a function mapping terms and other values into a generic representation and a function for the inverse mapping. The system is implemented in the Silver attribute grammar system. We demonstrate the usefulness of this approach to reflection in attribute grammars in several ways. The first use is in the serialization and de-serialization of the interface files Silver generates to support separate compilation; a custom interface language was replaced by a generic reflection-based implementation. Secondly, we describe an extension to Silver itself that simplifies writing language extensions for the ableC extensible C specification by allowing language engineers to specify C-language syntax trees using the concrete syntax of C (with typed holes) instead of writing abstract syntax trees. Third, strategic term rewriting in the style of Stratego is implemented using reflection as a library for, and extension to, Silver . Finally, an experimental implementation of staged interpreters for a small staged functional language is discussed.
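
The reflection idea, one generic function from typed terms to a universal representation plus its inverse, is language-agnostic; the sketch below recreates it for Python dataclasses purely as an illustration of why per-type (de)serialization boiler-plate disappears. It is not Silver code.

```python
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Add:                       # a tiny expression "term"
    left: int
    right: int

def reflect(value):
    """Map a term to a generic constructor/children representation."""
    if is_dataclass(value):
        return {"ctor": type(value).__name__,
                "kids": {f.name: reflect(getattr(value, f.name)) for f in fields(value)}}
    return value                 # leaves (ints, strings, ...) pass through

def reify(generic, ctors):
    """Inverse mapping: rebuild the term from the generic representation."""
    if isinstance(generic, dict):
        kids = {k: reify(v, ctors) for k, v in generic["kids"].items()}
        return ctors[generic["ctor"]](**kids)
    return generic

term = Add(1, 2)
assert reify(reflect(term), {"Add": Add}) == term
```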

Proceedings ArticleDOI
10 Nov 2021
TL;DR: In this paper, the authors propose co-designing the OS and the network around pervasive data identity, combining the code mobility of RPC with first-class data references in a global address space.
Abstract: As data becomes increasingly distributed, traditional RPC and data serialization limit performance, result in rigidity, and hamper expressivity. We believe that technology trends including high-density persistent memory, high-speed networks, and programmable switches make this the right time to revisit prior research on distributed shared memory, global addressing, and content-based networking. Our vision combines the code mobility of RPC with first-class data references in a global address space by co-designing the OS and the network around pervasive data identity. We have initial results showing the promise of the proposed co-design.

Journal ArticleDOI
05 Mar 2021-Langages
TL;DR: In this article, the grammatical status of the V1-V2 (Cy/vy) constructions found in Mbya Guarani is discussed; these constructions can express simultaneous events, among other meanings, and involve a single clause.
Abstract: In this paper, we intend to describe and discuss the grammatical status of the V1-V2 (Cy/vy) constructions found in Mbya Guarani which can express simultaneous events, among other meanings, and which involve a single clause. We suggest here that this verbal complex can be treated as a case of asymmetrical verbal serialization because it contains verbs from a major lexical class, occupying the V1 slot, followed by a more restricted intransitive verbal class, such as movement, postural, or stative verbs, which stands in the V2 position. The curious property of these constructions is that V2 can be transitivized through the attachment of applicative or causative morphemes and “share” its object with transitive V1. “Object sharing” is another property attributed to serialization, as suggested by Baker and Baker and Stewart, which may be seen as a strong argument in favor of the present hypothesis. We will also provide evidence to distinguish Mbya Guarani V1-V2 (Cy/vy) complex from other constructions, such as temporal and purpose subordinate clauses, involving the particle vy.

Posted Content
TL;DR: In this paper, a domain-independent, flexible, and sequence-first Python toolkit for processing and feature extraction is presented, which is capable of handling irregularly-sampled sequences with unaligned measurements.
Abstract: Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly-sampled and asynchronous data. We therefore present tsflex, a domain-independent, flexible, and sequence-first Python toolkit for processing & feature extraction that is capable of handling irregularly-sampled sequences with unaligned measurements. This toolkit is sequence-first as (1) sequence-based arguments are leveraged for strided-window feature extraction, and (2) the sequence index is maintained through all supported operations. tsflex is flexible as it natively supports (1) multivariate time series, (2) multiple window-stride configurations, and (3) integration with processing and feature functions from other packages, while (4) making no assumptions about data sampling rate regularity and synchronization. Other functionalities of this package are multiprocessing, in-depth execution time logging, support for categorical & time-based data, chunking sequences, and embedded serialization. tsflex is developed to enable fast and memory-efficient time series processing & feature extraction. Results indicate that tsflex is more flexible than similar packages while outperforming these toolkits in both runtime and memory usage.

Journal ArticleDOI
TL;DR: A novel task-aware, fine-grained storage-scheme auto-selection mechanism is proposed that automatically determines the storage scheme for caching each data block (the smallest unit during computing) and can offer great performance improvement.
Abstract: In-memory big data computing, widely used in hot areas such as deep learning and artificial intelligence, can meet the demands of ultra-low-latency services and real-time data analysis. However, existing in-memory computing frameworks usually use memory in an aggressive way: memory space is quickly exhausted, which leads to great performance degradation or even task failure. On the other hand, the increasing volumes of raw data and intermediate data introduce huge memory demands, which further exacerbate the memory shortage. To relieve the pressure on memory, these in-memory frameworks provide various storage-scheme options for caching data, which determine where and how data is cached. But their storage-scheme selection mechanisms are simple and insufficient, and are always manually set by users. Besides, such coarse-grained data storage mechanisms cannot satisfy the memory access patterns of each computing unit, which works on only part of the data. In this paper, we propose a novel task-aware fine-grained storage-scheme auto-selection mechanism. It automatically determines the storage scheme for caching each data block, which is the smallest unit during computing. The caching decision is made by considering future tasks, real-time resource utilization, and storage costs, including block creation costs, I/O costs, and serialization costs under each storage scenario. The experiments show that our proposed mechanism, compared with the default storage setting, can offer great performance improvement, especially in memory-constrained circumstances, where it can be as much as 78%.
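
The abstract does not tie the discussion to a specific framework at this point, but Spark-style storage levels are a concrete instance of the manually chosen, dataset-wide "storage scheme" that the proposed mechanism replaces with a per-block, task-aware decision. A minimal PySpark sketch, with an illustrative dataset and level choice:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-scheme-demo").getOrCreate()
df = spark.range(10_000_000)

# Today: one coarse-grained, user-picked scheme for the whole dataset.
df.persist(StorageLevel.MEMORY_AND_DISK)   # alternatives: MEMORY_ONLY, DISK_ONLY, ...
print(df.count())

df.unpersist()
spark.stop()
```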

Patent
15 Jan 2021
TL;DR: In this paper, a heterogeneous data real-time synchronization method is proposed in which an extraction thread pulls incremental data from multiple source databases in parallel, serializes it into an intermediate database, and a writing thread then deserializes the intermediate data, applies idempotent operations, and serially writes the resulting data into a target database.
Abstract: The invention discloses a heterogeneous data real-time synchronization method and device, equipment, and a storage medium, involving an extraction thread, a writing thread, and a plurality of source databases. The method comprises the following steps: in response to a data synchronization request input by a user, extracting incremental data from the source databases in parallel through the extraction thread; executing a serialization analysis operation on the incremental data through the extraction thread, generating intermediate data and writing the intermediate data into an intermediate database; obtaining the intermediate data from the intermediate database in parallel through the writing thread; performing a deserialization analysis operation and an idempotent operation on the intermediate data through the writing thread to obtain to-be-written data; and serially writing the to-be-written data into a target database through the writing thread. A universal data synchronization mode is thus provided, the efficiency of heterogeneous data synchronization is effectively improved, and rapid breakpoint resumption of storage is achieved.
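
The shape of the claimed pipeline, extract, serialize into an intermediate store, then deserialize and write idempotently, can be sketched with standard-library threads; the queue stands in for the intermediate database and all names are illustrative, not from the patent.

```python
import json
import queue
import threading

intermediate = queue.Queue()
target = {}                                    # stands in for the target database

def extractor(changes):
    for change in changes:                     # incremental data from a source DB
        intermediate.put(json.dumps(change))   # serialization analysis -> intermediate data
    intermediate.put(None)                     # end-of-stream marker

def writer():
    while (item := intermediate.get()) is not None:
        change = json.loads(item)              # deserialization analysis
        target[change["key"]] = change["value"]  # idempotent: replaying a record is harmless

changes = [{"key": "a", "value": 1}, {"key": "a", "value": 1}, {"key": "b", "value": 2}]
t1 = threading.Thread(target=extractor, args=(changes,))
t2 = threading.Thread(target=writer)
t1.start(); t2.start(); t1.join(); t2.join()
print(target)                                  # {'a': 1, 'b': 2}
```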

Proceedings ArticleDOI
01 May 2021
TL;DR: FlashByte, as discussed by the authors, is a lightweight native storage that efficiently caches intermediate data by keeping concise metadata in the Java heap and raw data in native storage, reducing garbage-collection overhead as well as serialization and de-serialization costs.
Abstract: In-memory caching of intermediate data is effective in reducing re-computation and I/O cost in distributed data-analytics frameworks, but it also generates a large amount of data in Java heap which increases the overhead of garbage collection (GC). An alternative off-heap approach caches data in native storage by transmitting the data from heap to native storage so as to reduce GC overhead. However, it incurs severe serialization and de-serialization overheads. Serialization also generates non-trivial metadata of the cached data in native storage. We propose and develop FlashByte, a lightweight native storage that efficiently caches intermediate data. FlashByte improves memory efficiency by achieving low GC overhead, low data transmission overhead, and low memory consumption. Specifically, the cached data are divided into two parts: metadata stored in Java heap and raw data stored in native storage. The metadata is generated based on the profile of workloads. Its size is trivial because it only contains a concise format of raw data, which achieves low memory consumption in the heap as well as low GC overhead. Native storage only stores the raw data to reduce its memory consumption. According to the metadata, the raw data are efficiently transmitted between the heap and native storage without serialization and de-serialization. We implement FlashByte in Spark and conduct evaluation with benchmark workloads. Experimental results show that, compared with the in-heap approach of Vanilla Spark, FlashByte achieves up to 4x speedup of the job execution time, reduces GC time by up to 96%, and reduces the memory consumption in heap by up to 36%. Compared with the alternative off-heap approach, FlashByte achieves up to 2.3x speedup of the job execution time, reduces the data transmission time by up to 84%, and reduces the cache size in native storage by up to 34%.

DOI
08 Sep 2021
TL;DR: This work evaluates (de)serialization and transmission cost of mqtt.eclipse.org payloads on 8- to 32-bit microcontrollers and finds that Protocol Buffers and the XDR format, dating back to 1987, are most efficient.
Abstract: IoT devices rely on data exchange with gateways and cloud servers. However, the performance of today's serialization formats and libraries on embedded systems with energy and memory constraints is not well-documented and hard to predict. We evaluate (de)serialization and transmission cost of mqtt.eclipse.org payloads on 8- to 32-bit microcontrollers and find that Protocol Buffers (as implemented by NanoPB) and the XDR format, dating back to 1987, are most efficient.
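
For readers unfamiliar with XDR, the round trip being measured looks like the sketch below, which uses CPython's xdrlib purely for illustration (the module is deprecated since Python 3.11 and removed in 3.13; the study itself targets C implementations on microcontrollers).

```python
import xdrlib

p = xdrlib.Packer()
p.pack_int(42)                 # fixed 4-byte big-endian integer
p.pack_float(21.5)             # 4-byte IEEE float
p.pack_string(b"node-7")       # length-prefixed, padded to a 4-byte boundary
payload = p.get_buffer()

u = xdrlib.Unpacker(payload)
print(u.unpack_int(), u.unpack_float(), u.unpack_string())
```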

Book ChapterDOI
01 Jan 2021
TL;DR: This chapter is a critical review of different approaches proposed over the years in the Semantic Web research community to address RDF's lack of a built-in mechanism for statement-level metadata and context; these approaches are used to capture different types of information, such as data provenance, spatiotemporal data, and certainty.
Abstract: The many benefits of knowledge graphs using or based on the Resource Description Framework (RDF) well justify the utilization and wide deployment of a simple yet powerful, formally grounded data model, its serialization formats, vocabulary, and well-defined interpretation to be used for efficient querying, data integration, and automated reasoning. However, the simplicity of RDF comes at a price: there is no built-in mechanism for RDF statements to store metadata and context. This chapter is a critical review of different approaches proposed over the years in the Semantic Web research community to address this limitation, which are used for capturing different types of information, such as data provenance, spatiotemporal data, and certainty, which are crucial in data science applications to make statements context-aware, authoritative, verifiable, and reproducible.
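
The limitation under review is easy to see in code: a plain triple cannot carry metadata directly, so classic reification (one of the approaches the chapter surveys) wraps the statement in four extra triples before provenance can be attached. The sketch below uses rdflib for illustration; the IRIs are made up.

```python
from rdflib import Graph, URIRef, Literal, BNode, RDF

EX = "http://example.org/"
g = Graph()
s, p, o = URIRef(EX + "alice"), URIRef(EX + "worksFor"), URIRef(EX + "acme")
g.add((s, p, o))                                 # the statement itself

stmt = BNode()                                   # reified copy of the statement
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, s))
g.add((stmt, RDF.predicate, p))
g.add((stmt, RDF.object, o))
g.add((stmt, URIRef(EX + "source"), Literal("HR database")))  # metadata attaches here

print(g.serialize(format="turtle"))
```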

Book ChapterDOI
18 May 2021
TL;DR: In this article, the authors propose a snapshot-based technique to migrate a stateful JavaScript program to enable a non-breaking user experience across different devices by profiling and serializing the runtime state.
Abstract: Recently, researchers have proposed application (app) migration approaches for JavaScript programs to enable a non-breaking user experience across different devices. To migrate a stateful JavaScript app's runtime, past studies have proposed snapshot-based techniques in which the app's runtime state is profiled and serialized into a text form that can be restored later. A common limitation of the existing literature, however, is that it is based on old JavaScript specifications. Since the major updates introduced by ECMAScript 2015 (a.k.a. ES6), JavaScript supports various features that cannot be migrated correctly with existing methods. Some of these features are in fact heavily used in today's real-world apps, which greatly reduces the scope of previous works.


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a C++ support library for ROS is proposed to track the provenance of message data across multiple nodes and apply source changes, reversing any transformation on the tracked data.
Abstract: Interactive controls that enrich visualizations need domain knowledge to create a sensible visual representation, as well as access to parameters and data to manipulate. However, source data and the means to visualize them are often scattered across multiple components, making it hard to link a value change in the interface to the appropriate source data. Provenance, the documentation of the origin and history of message data, can be used to reverse the evaluation of a value and change it at its source. We present a communication pattern as well as a C++ support library for ROS to track the provenance of message data across multiple nodes and apply source changes, reversing any transformation on the tracked data. We demonstrate that it is possible to automatically infer interactive 3D user interfaces from standard, non-interactive ROS visualizations by leveraging this additional tracking information. Preliminary results from a prototypical implementation of multiple origin-tracking-enabled ROS nodes indicate that this tracking introduces a significant but still practicable overhead in message size and serialization performance. To apply this tracking to existing C++ codebases, only small syntactic changes are necessary: a wrapper type around tracked values hides all necessary bookkeeping.


Proceedings ArticleDOI
01 Aug 2021
TL;DR: Match-Extend, as discussed by the authors, is a deterministic serialization algorithm for the derivation of surface forms in Precedence-based (Multiprecedence) phonology.
Abstract: Raimy (1999; 2000a; 2000b) proposed a graphical formalism for modeling reduplication, originally mostly focused on phonological overapplication in a derivational framework. This framework is now known as Precedence-based phonology or Multiprecedence phonology. Raimy's idea is that the segments at the input to the phonology are not totally ordered by precedence. This paper tackles a challenge that arose with Raimy's work: the development of a deterministic serialization algorithm as part of the derivation of surface forms. The Match-Extend algorithm introduced here requires fewer assumptions and adheres more closely to the attested typology. The algorithm also contains no parameter or constraint specific to individual graphs or topologies, unlike previous proposals. Match-Extend requires nothing except knowledge of the last added set of links.

Journal ArticleDOI
15 Oct 2021
TL;DR: In this paper, the authors present an architecture for the efficient storing and querying of large RDF datasets, which is built over HDT, an RDF serialization framework, and its interaction with the Jena query engine.
Abstract: We present an architecture for the efficient storing and querying of large RDF datasets. Our approach seeks to store RDF datasets in very little space while offering complete SPARQL functionality. To achieve this, our proposal was built over HDT, an RDF serialization framework, and its interaction with the Jena query engine. We propose a set of modifications to this framework in order to incorporate a range of space-efficient compact data structures for data storage and access, while using high-level capabilities to answer more complicated SPARQL queries. As a result, our approach provides a standard mechanism for using low-level data structures in complicated query situations requiring SPARQL searches, which are typically not supported by current solutions.