
Showing papers on "Distributed object published in 2021"


Journal ArticleDOI
TL;DR: In this article, the authors propose an end-to-end global-local self-adaptive network (GLSAN) for drone-view object detection, which includes a global-local detection network (GLDN), a simple yet efficient self-adaptive region selecting algorithm (SARSA), and a local super-resolution network (LSRN).
Abstract: Directly benefiting from deep learning methods, object detection has witnessed a great performance boost in recent years. However, drone-view object detection remains challenging for two main reasons: (1) tiny-scale objects, which are blurrier than their ground-view counterparts, offer less valuable information towards accurate and robust detection; (2) unevenly distributed objects make detection inefficient, especially in regions occupied by crowded objects. Confronting such challenges, we propose an end-to-end global-local self-adaptive network (GLSAN) in this paper. The key components of our GLSAN include a global-local detection network (GLDN), a simple yet efficient self-adaptive region selecting algorithm (SARSA), and a local super-resolution network (LSRN). We integrate a global-local fusion strategy into a progressive scale-varying network to perform more precise detection, where the local fine detector adaptively refines the bounding boxes detected by the global coarse detector by cropping the original images for higher-resolution detection. The SARSA dynamically crops the crowded regions in the input images; it is unsupervised and can be easily plugged into the networks. Additionally, we train the LSRN to enlarge the cropped images, providing more detailed information for finer-scale feature extraction and helping the detector distinguish foreground from background more easily. The SARSA and LSRN also contribute to data augmentation during network training, which makes the detector more robust. Extensive experiments and comprehensive evaluations on the VisDrone2019-DET benchmark dataset and the UAVDT dataset demonstrate the effectiveness and adaptivity of our method. Towards industrial application, our network is also applied to a DroneBolts dataset with demonstrated advantages. Our source code is available at https://github.com/dengsutao/glsan .
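
The SARSA step can be pictured as density-based cropping over the coarse detector's output. The sketch below is a minimal illustration of that idea, not the authors' released code (see the repository above for that); the grid heuristic and all names here are our own simplification.

```python
import numpy as np

def crop_crowded_region(boxes, img_w, img_h, grid=8, pad=0.1):
    """Pick the densest grid cell of detection centers and return a padded
    crop rectangle around it (a stand-in for GLSAN's SARSA cropping)."""
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    # Histogram the detection centers over a coarse grid.
    hist, xedges, yedges = np.histogram2d(
        centers[:, 0], centers[:, 1], bins=grid,
        range=[[0, img_w], [0, img_h]])
    gx, gy = np.unravel_index(np.argmax(hist), hist.shape)
    # Expand the densest cell by `pad` of the image size on each side;
    # the crop would then be super-resolved and re-detected.
    x0 = max(0.0, xedges[gx] - pad * img_w)
    x1 = min(float(img_w), xedges[gx + 1] + pad * img_w)
    y0 = max(0.0, yedges[gy] - pad * img_h)
    y1 = min(float(img_h), yedges[gy + 1] + pad * img_h)
    return int(x0), int(y0), int(x1), int(y1)

# Coarse global detections as (x1, y1, x2, y2); three cluster near the top-left.
boxes = np.array([[100, 120, 140, 160], [110, 130, 150, 170],
                  [105, 125, 145, 165], [900, 800, 940, 860]])
print(crop_crowded_region(boxes, img_w=1920, img_h=1080))
```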

39 citations


Journal ArticleDOI
TL;DR: Anveshak as mentioned in this paper is a runtime platform for composing and coordinating distributed tracking applications, which provides a domain-specific dataflow programming model to intuitively compose a tracking application, supporting contemporary CV advances like query fusion and re-identification, and enabling dynamic scoping of the camera network's search space to avoid wasted computation.
Abstract: Advances in deep neural networks (DNN) and computer vision (CV) algorithms have made it feasible to extract meaningful insights from large-scale deployments of urban cameras. Tracking an object of interest across the camera network in near real-time is a canonical problem. However, current tracking platforms have two key limitations: 1) They are monolithic, proprietary, and lack the ability to rapidly incorporate sophisticated tracking models, and 2) They are less responsive to dynamism across wide-area computing resources that include edge, fog, and cloud abstractions. We address these gaps using Anveshak, a runtime platform for composing and coordinating distributed tracking applications. It provides a domain-specific dataflow programming model to intuitively compose a tracking application, supporting contemporary CV advances like query fusion and re-identification, and enabling dynamic scoping of the camera network's search space to avoid wasted computation. We also offer tunable batching and data-dropping strategies for dataflow blocks deployed on distributed resources to respond to network and compute variability. These balance the tracking accuracy, its real-time performance, and the active camera-set size. We illustrate the concise expressiveness of the programming model for four tracking applications. Our detailed experiments for a network of 1000 camera feeds on modest resources exhibit the tunable scalability, performance, and quality trade-offs enabled by our dynamic tracking, batching, and dropping strategies.
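
The tunable batching and dropping knobs can be made concrete with a toy dataflow block. This is only a sketch of the trade-off the paper describes, not Anveshak's programming model or API; all names are illustrative.

```python
from collections import deque

class TunableBlock:
    """Sketch of a dataflow block with Anveshak-style knobs: process up to
    `max_batch` items at a time, but drop the oldest queued items once the
    backlog exceeds `max_queue` (trading tracking accuracy for latency)."""
    def __init__(self, fn, max_batch=8, max_queue=64):
        self.fn, self.max_batch, self.max_queue = fn, max_batch, max_queue
        self.queue = deque()
        self.dropped = 0

    def push(self, item):
        self.queue.append(item)
        while len(self.queue) > self.max_queue:  # backlog too deep under load:
            self.queue.popleft()                 # shed the oldest frames
            self.dropped += 1

    def step(self):
        batch = [self.queue.popleft()
                 for _ in range(min(self.max_batch, len(self.queue)))]
        return self.fn(batch) if batch else None

# A stand-in "detector" that just reports its batch size.
block = TunableBlock(lambda b: f"processed {len(b)} frames",
                     max_batch=4, max_queue=6)
for frame in range(20):
    block.push(frame)
print(block.step(), "| dropped:", block.dropped)
```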

8 citations


Proceedings ArticleDOI
27 Jul 2021
TL;DR: In this article, the authors propose a hinting mechanism for near data processing (NDP) in distributed storage systems to reduce data movement by up to 99% when querying CSV data with NDP co-located with the stored data.
Abstract: Most general-purpose distributed storage systems are not designed with near data processing (NDP) in mind. They do not respect semantic data boundaries when writing data, for example splitting a record across servers. This reduces NDP effectiveness by requiring data collation before computation. While semantic data awareness and NDP functions can be retroactively added to existing distributed storage, doing so is often complex and difficult to accomplish in practice. We propose sharing storage system layout information with data writers so they can adjust data layouts to prevent data alignment issues regardless of the underlying architecture. By doing so, we can simplify NDP implementation by reducing the need for data reassembly, and reduce the need for complex storage system or application extensions. We demonstrate a hinting mechanism on both HDFS with computational block storage and an erasure-coded MinIO deployment, reducing data movement by up to 99% when querying CSV data with NDP co-located with the stored data. This was accomplished purely with client-side data alignment, with no modifications to the server-side write paths and no inter-node collation of data.
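
The core of the hinting idea is that a writer who knows the storage layout (here, just a block size) can pad records so none straddles a boundary. A minimal client-side sketch under that assumption follows; the actual hint exchange with HDFS or MinIO is not shown, and the function is hypothetical.

```python
def align_records(records, block_size):
    """Pack newline-terminated records into fixed-size blocks without ever
    splitting a record across a block boundary, padding each block's tail;
    the real hint exchange with HDFS/MinIO is not shown here."""
    blocks, current = [], b""
    for rec in records:
        if len(rec) > block_size:
            raise ValueError("record larger than a block")
        if len(current) + len(rec) > block_size:
            blocks.append(current.ljust(block_size, b" "))  # pad, don't split
            current = b""
        current += rec
    if current:
        blocks.append(current.ljust(block_size, b" "))
    return blocks

rows = [f"{i},sensor-{i},{i * 1.5}\n".encode() for i in range(100)]
blocks = align_records(rows, block_size=512)
print(f"{len(blocks)} blocks; every CSV record is whole within its block")
```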

5 citations


Book ChapterDOI
20 Sep 2021
TL;DR: In this paper, the authors investigated the concept of proactive monitoring of critical events at distributed objects of the city's engineering communications and in the urban road environment, and proposed a multi-agent approach that uses software agents deployed directly at distributed data-collection sources, robots that collect data from open sources on the Internet, brokers that consolidate and protect the transmitted data, and a distributed information warehouse component.
Abstract: The article proposes and investigates the concept of proactive monitoring of critical events at distributed objects of the city's engineering communications and in the urban road environment. The purpose of monitoring is to determine, assess, and predict the dynamics of the risks of critical events, depending on changes in indicators and the factors correlating with them. The main characteristics of distributed monitoring objects, a classification and analysis of critical events, the reasons for their occurrence, and possible influencing factors are given. For predictive analysis and assessment of the risks of critical events, big sensor and social data are collected, consolidated, and analyzed, reflecting the dynamics of changes in the indicators of monitored objects and external influencing factors. The big data include indicators of monitored objects, information about events and the possible causes of their occurrence, and factors influencing the risks of their development. Cyber-physical (sensor) data is collected from spatially distributed photo and video recording complexes, video surveillance cameras, weather stations, and the measuring instruments and sensors of objects and pipelines of engineering networks. Cyber-social (social) data is collected from open sources of information on the Internet and from mobile communications of the civilian population. The monitoring system uses a multi-agent approach, which involves software agents deployed directly at distributed data-collection sources, robots that collect data from open sources on the Internet, brokers that consolidate and protect the transmitted data, and a distributed information warehouse component.
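
The collection pipeline the article describes (agents at the data sources, a broker for consolidation, warehouse-side analysis) can be sketched with in-process stand-ins. Everything below is illustrative; the names and the risk rule are not from the article.

```python
import json
import queue

broker = queue.Queue()  # stand-in for the data-consolidation broker

def sensor_agent(object_id, reading):
    """Agent deployed at a distributed data source: it tags an indicator
    reading and forwards it to the broker."""
    broker.put(json.dumps({"src": object_id, "kind": "sensor", "value": reading}))

def risk_assessor(threshold):
    """Warehouse-side consumer: flags readings whose risk indicator
    exceeds a threshold (a stand-in for the predictive analysis)."""
    alerts = []
    while not broker.empty():
        msg = json.loads(broker.get())
        if msg["value"] > threshold:
            alerts.append(msg)
    return alerts

sensor_agent("pipeline-17", reading=0.92)   # e.g. a pressure-anomaly score
sensor_agent("crossing-3", reading=0.41)
print(risk_assessor(threshold=0.8))         # -> the pipeline-17 alert
```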

4 citations


Journal ArticleDOI
14 Mar 2021
TL;DR: A basic concept of the types of traditional SIP, covering File-Based, Common Database, Remote Procedure Call, Distributed Objects, and Messaging, is included, and an overview of three Service Interface Design (SID) approaches for systems interoperability is discussed.
Abstract: One of the major issues in system integration is dealing with the interoperability of legacy systems that use traditional System Integration Patterns (SIP). Information cannot be exchanged effectively when the systems involved come from developers whose products were not designed to interoperate, which leads to interoperability problems in heterogeneous system integration. To address these issues, interfacing must be made easier by defining the components, processes, and interfaces that affect the system integration architecture at the initial design stage. This paper includes a basic concept of the types of traditional SIP, covering File-Based, Common Database, Remote Procedure Call (RPC), Distributed Objects, and Messaging. An overview of three Service Interface Design (SID) approaches for systems interoperability is discussed. The discussion of these approaches serves as a basis for solving the interoperability of heterogeneous systems that use traditional SIP.

3 citations


Journal ArticleDOI
TL;DR: A new communication abstraction, called Set-Constrained Delivery Broadcast (SCD-broadcast), whose aim is to provide its users with an appropriate abstraction level when they have to implement objects or distributed tasks in an asynchronous message-passing system prone to process crash failures is introduced.

2 citations


Journal ArticleDOI
Javier López-Gómez1, Jakob Blomer1
TL;DR: The ROOT RNTuple I/O system aims at overcoming TTree's limitations and at providing improved efficiency for modern storage systems, such as NVMe devices and distributed object stores as mentioned in this paper.
Abstract: Over the last two decades, ROOT TTree has been used for storing over one exabyte of High-Energy Physics (HEP) events. The TTree columnar on-disk layout has proved ideal for analyses of HEP data, which typically require access to many events but only a subset of the information stored for each of them. Future colliders, and particularly the HL-LHC, will bring an increase of at least one order of magnitude in the volume of generated data. Therefore, the use of modern storage hardware, such as low-latency, high-bandwidth NVMe devices and distributed object stores, becomes more important. However, TTree was not designed to optimally exploit modern hardware and may become a bottleneck for data retrieval. The ROOT RNTuple I/O system aims at overcoming TTree's limitations and providing improved efficiency on modern storage systems. In this paper, we extend RNTuple with a backend that uses Intel DAOS as the underlying storage, demonstrating that the RNTuple architecture can accommodate high-performance object stores. From the user's perspective, data can be accessed with minimal changes to the code, that is, by replacing a filesystem path with a DAOS URI. Our performance evaluation shows that the new backend can be used for realistic analyses while outperforming the compatibility solution provided by the DAOS project.
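
The "minimal changes" claim boils down to swapping the storage locator string. Below is a hedged PyROOT sketch, assuming a ROOT build where the experimental RNTuple classes and the DAOS backend are available; the ntuple name and the pool/container identifiers are placeholders, and the exact URI scheme may differ between ROOT versions.

```python
import ROOT  # assumes a ROOT build with RNTuple (ROOT::Experimental) and DAOS support

RNTupleReader = ROOT.Experimental.RNTupleReader

# Reading from a local file:
reader = RNTupleReader.Open("Events", "data.root")

# The same analysis code against DAOS: only the storage locator changes
# (pool/container identifiers below are placeholders).
# reader = RNTupleReader.Open("Events", "daos://<pool-uuid>/<container-uuid>")

print(reader.GetNEntries())
```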

2 citations


Journal ArticleDOI
TL;DR: Based on mimic defense theory, the authors construct a principled framework for the distributed object storage system and introduce dynamic redundancy and heterogeneous functions into the architecture, which increases the attack cost and greatly improves the security and availability of data.
Abstract: With the advent of the big data era, cloud computing, the Internet of Things, and other information industries continue to develop, and there is an increasing amount of unstructured data such as pictures, audio, and video on the Internet. The distributed object storage system has become the mainstream cloud storage solution, and with the increasing number of distributed applications, data security in such systems has become a focal concern. Traditional defenses for the distributed object storage system either patch discovered vulnerabilities and backdoors or modify and upgrade the corresponding structures. However, both kinds of measures are reactive and can hardly deal with unknown security threats. Based on mimic defense theory, this paper constructs a principled framework for the distributed object storage system and introduces dynamic redundancy and heterogeneous functions into the architecture, which increases the attack cost and greatly improves the security and availability of data.
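
The dynamic-redundancy idea can be illustrated by dispatching each request to a randomly chosen set of heterogeneous executors and majority-voting their answers, so a minority of compromised variants cannot alter the result. A minimal sketch, not the paper's framework; all names are illustrative.

```python
import random
from collections import Counter

def handle_on_variant(variant, request):
    """Stand-in for heterogeneous executors (e.g. different OS/storage
    stacks). A compromised variant may return a tampered answer."""
    if variant == "compromised":
        return "tampered:" + request
    return "ok:" + request

def mimic_dispatch(request, pool, k=3):
    """Pick k executors at random (dynamic scheduling), run the request on
    each, and return the majority answer, masking a compromised minority."""
    chosen = random.sample(pool, k)
    answers = [handle_on_variant(v, request) for v in chosen]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner if votes > k // 2 else None  # no majority: reject

pool = ["linux-ext4", "bsd-zfs", "compromised", "linux-xfs"]
print(mimic_dispatch("read:object-42", pool))
```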

2 citations


Journal ArticleDOI
15 Oct 2021
TL;DR: In this paper, the authors propose a compositional, atomic distributed object (ADO) model for strongly consistent distributed systems that combines the best of both options, which abstracts over protocol-specific details and decouples high-level correctness reasoning from implementation choices.
Abstract: Despite recent advances, guaranteeing the correctness of large-scale distributed applications without compromising performance remains a challenging problem. Network and node failures are inevitable and, for some applications, careful control over how they are handled is essential. Unfortunately, existing approaches either completely hide these failures behind an atomic state machine replication (SMR) interface, or expose all of the network-level details, sacrificing atomicity. We propose a novel, compositional, atomic distributed object (ADO) model for strongly consistent distributed systems that combines the best of both options. The object-oriented API abstracts over protocol-specific details and decouples high-level correctness reasoning from implementation choices. At the same time, it intentionally exposes an abstract view of certain key distributed failure cases, thus allowing for more fine-grained control over them than SMR-like models. We demonstrate that proving properties even of composite distributed systems can be straightforward with our Coq verification framework, Advert, thanks to the ADO model. We also show that a variety of common protocols including multi-Paxos and Chain Replication refine the ADO semantics, which allows one to freely choose among them for an application's implementation without modifying ADO-level correctness proofs.
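
To make the "abstract failure cases" point concrete: an ADO-style API can report an operation as committed, failed, or uncertain instead of leaking raw network errors or hiding them entirely. The sketch below is our own toy rendering of that interface idea, not the Advert framework's actual API.

```python
class Replica:
    """Toy replica that may be down; apply() reports whether it acked."""
    def __init__(self, alive=True):
        self.alive, self.log = alive, []

    def apply(self, method, *args):
        if self.alive:
            self.log.append((method, args))
        return self.alive

class AtomicDistributedObject:
    """Toy ADO-style wrapper: callers see one object, and failures surface
    as an abstract outcome ('committed' / 'uncertain' / 'failed') rather
    than raw network errors (names are illustrative, not Advert's API)."""
    def __init__(self, replicas):
        self.replicas = replicas

    def invoke(self, method, *args):
        acks = sum(1 for r in self.replicas if r.apply(method, *args))
        if acks > len(self.replicas) // 2:
            return "committed"   # a majority acked: the call took effect
        if acks > 0:
            return "uncertain"   # exposed failure case: may or may not commit
        return "failed"

obj = AtomicDistributedObject([Replica(), Replica(), Replica(alive=False)])
print(obj.invoke("push", 42))  # -> committed (2 of 3 replicas acked)
```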

1 citation


Journal ArticleDOI
TL;DR: In this paper, a new approach based on an object-based storage method was designed and implemented, taking into account the lessons learned and leveraging the ATLAS experience with this kind of system.
Abstract: The Large Hadron Collider (LHC) is about to enter its third run at unprecedented energies. The experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users. The ATLAS EventIndex project, currently running in production, builds a complete catalogue of particle collisions, or events, for the ATLAS experiment at the LHC. The distributed nature of the experiment's data model is exploited by running jobs at over one hundred Grid data centers worldwide. Millions of files with petabytes of data are indexed, extracting a small quantity of metadata per event, which is conveyed by a data collection system in real time to a central Hadoop instance at CERN. After a successful first implementation based on a messaging system, several issues pointed to performance bottlenecks at the challenging higher rates of the experiment's next runs. In this work we characterize the weaknesses of the previous messaging system regarding complexity, scalability, performance, and resource consumption. A new approach based on an object-based storage method was designed and implemented, taking into account the lessons learned and leveraging the ATLAS experience with this kind of system. We present an experiment that we ran for three months in the real worldwide production scenario in order to evaluate the messaging and object-store approaches. The results show that the new object-based storage method can efficiently support large-scale data collection for big-data environments like the next runs of the ATLAS experiment at the LHC.
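
The object-store approach can be sketched as producers packing a job's worth of small per-event metadata records into a single object, rather than streaming one message per event through a broker. The example below uses the boto3 S3 client purely as a stand-in for the actual object store; the endpoint, bucket, key layout, and record fields are invented for illustration.

```python
import json
import boto3  # S3-compatible client used here only as a stand-in object store

# Placeholder endpoint and region; a real deployment would point at the
# experiment's object store.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.org",
                  region_name="us-east-1")

def publish_event_metadata(dataset, job_id, events):
    """Instead of streaming one message per event through a broker, pack a
    Grid job's worth of small metadata records into a single object."""
    body = "\n".join(json.dumps(e) for e in events).encode()
    key = f"{dataset}/{job_id}.jsonl"  # consumers list and read per dataset
    s3.put_object(Bucket="eventindex", Key=key, Body=body)
    return key

# Hypothetical per-event metadata records.
events = [{"run": 358031, "event": i, "guid": f"F-{i:08d}"} for i in range(1000)]
# publish_event_metadata("data18_13TeV", "grid-job-0001", events)
```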

1 citation


Patent
25 May 2021
TL;DR: In this article, a distributed object store can expose object metadata, in addition to object data, to distributed processing systems such as Hadoop and Apache Spark; it may act as an HCFS, exposing object metadata as a collection of records that can be efficiently processed by MapReduce (MR) and other distributed processing frameworks.
Abstract: A distributed object store can expose object metadata, in addition to object data, to distributed processing systems such as Hadoop and Apache Spark. The distributed object store may act as a Hadoop Compatible File System (HCFS), exposing object metadata as a collection of records that can be efficiently processed by MapReduce (MR) and other distributed processing frameworks. Various metadata record formats are supported. Related methods are also described.
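
The value of exposing metadata as records is that analytics can run over metadata without touching object payloads. Below is a toy illustration of such an aggregation; the records and fields are hypothetical, and a real deployment would run this as a MapReduce or Spark job over the HCFS view.

```python
from functools import reduce

# Hypothetical object-metadata records of the kind the store would expose
# as an HCFS record collection (fields invented for illustration).
metadata_records = [
    {"key": "img/001.jpg", "size": 2_048_576, "owner": "alice"},
    {"key": "img/002.jpg", "size": 1_310_720, "owner": "bob"},
    {"key": "log/a.txt",   "size": 4_096,     "owner": "alice"},
]

# MapReduce-style aggregation over metadata only: total bytes per owner.
mapped = [(r["owner"], r["size"]) for r in metadata_records]

def reducer(acc, kv):
    owner, size = kv
    acc[owner] = acc.get(owner, 0) + size
    return acc

print(reduce(reducer, mapped, {}))  # {'alice': 2052672, 'bob': 1310720}
```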

Posted Content
TL;DR: In this article, the authors developed a simple, user-friendly platform built for academic and scientific research collaboration, which consists of a metadata quality control based on blockchain technologies, and the data is stored separately in a distributed object storage that functions as a cloud.
Abstract: The Hybrid Technology Hub and many other research centers work in cross-functional teams whose workflow is not necessarily linear and where, in many cases, technology advances are made through parallel work. The lack of proper tools and platforms for a collaborative environment can create time lags in coordination and limit the sharing of research findings. To solve this, we have developed a simple, user-friendly platform built for academic and scientific research collaboration. To ensure FAIRness compliance, the platform includes metadata quality control based on blockchain technologies. The data is stored separately in a distributed object storage that functions as a cloud. The platform also implements a version control system: it keeps a history of the project and allows its development to be reviewed. This platform aims to be a standardized tool within the Hybrid Technology Hub to ease collaboration, speed up research workflows, and improve research quality.

Proceedings ArticleDOI
26 May 2021
TL;DR: In this article, the authors present the results of research in the field of forming a unified geo-information environment, which is a development of the concept of the cyber environment of virtual enterprises.
Abstract: The article presents the results of research on forming a unified geo-information environment. The concept of the geo-information environment develops the concept of the cyber environment of virtual enterprises, which is formed from agents of three types (individuals, legal entities, and man-made objects), by introducing geo-information into it. The geo-information cyber environment provides tools that are best suited to automating the management of spatially distributed objects.

DOI
11 Aug 2021
TL;DR: In this paper, the authors present a simple program handling an auction sale in a CORBA application, written in the Java programming language, whose method is invoked by a client from another machine that also runs Windows.
Abstract: The use of the Common Object Request Broker Architecture (CORBA) has become one of the answers to the requirement for interoperability among the rapidly increasing number of hardware and software products available nowadays. CORBA has been introduced as a mechanism in distributed computing environments to overcome this interoperability issue. The mechanism allows distributed objects to communicate with each other, whether they operate on remote or local devices, are written in different languages or for different platforms, or sit at different locations on a network. In this paper, the concept of a CORBA application as middleware is presented. To illustrate this concept, a simple program handling an auction sale was developed using the Java programming language. The application is implemented on the Windows Operating System (OS), and its method is invoked by a client from another machine that also runs Windows. The benefits of CORBA and its limitations are also discussed. Keywords—CORBA concept; client-server; benefits of CORBA; application example
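
CORBA itself needs an ORB and IDL-generated stubs, which would not fit in a short listing, but the programming model the paper demonstrates (a client invoking a method on a remote auction object as if it were local) can be shown with Python's standard-library RPC as a stand-in. This is explicitly not CORBA; it only mirrors the remote-invocation pattern.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

class Auction:
    """Remote object: clients on other machines invoke place_bid()."""
    def __init__(self):
        self.highest = 0

    def place_bid(self, amount):
        self.highest = max(self.highest, amount)
        return self.highest

# Server side (the "object implementation" an ORB would host).
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_instance(Auction())
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the remote invocation reads like a local method call,
# which is the same programming model CORBA stubs provide.
client = ServerProxy("http://localhost:8000")
print(client.place_bid(150))  # -> 150
```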

Journal ArticleDOI
01 Mar 2021
TL;DR: Domestic products, including Phytium chips, Starblaze open-channel flash memory devices, and the Kylin operating system, are adopted as components of a new domestic software and hardware platform for the distributed storage of meteorological data.
Abstract: Meteorological data are important information assets in China and have made tremendous contributions to disaster prevention and reduction. However, the current meteorological system has long depended on foreign hardware and software platforms, which poses obvious security problems; completing the domestication of the meteorological system and improving its performance is an urgent task. At the same time, as the black-box architecture of flash memory devices greatly hinders the co-optimization of software and hardware, open-channel flash memory has begun to attract attention as a new type of flash memory architecture. In this article, we adopt excellent new domestic products, including Phytium chips, Starblaze open-channel flash memory devices, and the Kylin operating system, as the components of a new domestic software and hardware platform. An object storage system for meteorological data has been implemented in a real environment with Ceph. We provide a build solution and optimize its parameters. Based on a comparison with a commercial platform's storage system, the problems and challenges faced by an autonomous meteorological-data distributed object storage system are discussed. Finally, this article looks ahead to the distributed storage of meteorological data on domestic platforms.
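
Ceph object storage is typically driven through librados; a hedged sketch using the python-rados binding is shown below, assuming a reachable cluster. The pool name, object key, and ceph.conf path are placeholders.

```python
import rados  # python-rados binding shipped with Ceph

def store_observation(pool, key, payload):
    """Write one meteorological record as a RADOS object and read it back;
    the pool name and ceph.conf path below are placeholders."""
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            ioctx.write_full(key, payload)  # replace object contents atomically
            return ioctx.read(key)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

# store_observation("meteo", "station-042/2021-03-01T00Z", b"t=3.2C;p=1013hPa")
```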

Journal ArticleDOI
30 Aug 2021
TL;DR: It is shown that modern wireless sensor networks can be considered distributed information-measuring and information-control systems.
Abstract: The problems of control and management of geographically distributed objects are considered, with a focus on sensor networks operating over ZigBee technology. The characteristics of the IEEE 802.15.4 ZigBee standard are given, and the advantages of this technology are shown when building networks that are not very critical to traffic delays. The elements of such a network are described, including the primary converters used and their energy characteristics, together with approaches to reducing and compensating for delays in control loops. It is shown that modern wireless sensor networks can be considered distributed information-measuring and information-control systems.

Proceedings ArticleDOI
30 Jun 2021
TL;DR: In this paper, the authors propose a method for replacing a volume-distributed object (a moisture target) with two-dimensional four-point partially coherent matrix-simulator models, and explore the synthesis of the starting multipoint model of the moisture target based on the required power and wind-velocity distributions over the object.
Abstract: A method for replacing a volume-distributed object (a moisture target) with two-dimensional four-point partially coherent matrix-simulator models is proposed in this paper. The synthesis of the starting multipoint model of the moisture target, based on the required power and wind-velocity distributions over the object, is explored. Expressions are derived that allow the moisture target to be replaced with a low-point partially coherent model built from the multipoint starting model.


DOI
09 Nov 2021
TL;DR: In this article, the authors evaluate the performance of CephFS on cost-optimized hardware when it is combined with EOS to supply missing functionalities, including third-party copy, SciTokens, and high-level user and quota management.
Abstract: CephFS is a network filesystem built upon the Reliable Autonomic Distributed Object Store (RADOS). At CERN we have demonstrated its reliability and elasticity while operating several 100-to-1000TB clusters that provide NFS-like storage to infrastructure applications and services. At the same time, our lab developed EOS to offer high-performance 100PB-scale storage for the LHC at extremely low cost while also supporting the complete set of security and functional APIs required by the particle-physics user community. This work evaluates the performance of CephFS on this cost-optimized hardware when it is combined with EOS to supply the missing functionalities. To this end, we have set up a proof-of-concept Ceph Octopus cluster on high-density JBOD servers (840 TB each) with 100Gig-E networking. The system uses EOS to provide an overlaid namespace and protocol gateways for HTTP(S) and XROOTD, and uses CephFS as an erasure-coded object storage backend. The solution also enables operators to aggregate several CephFS instances and adds features such as third-party copy, SciTokens, and high-level user and quota management. Using simple benchmarks, we measure the cost/performance trade-offs of different erasure-coding layouts, as well as the network overheads of these coding schemes. We demonstrate some relevant limitations of the CephFS metadata server and offer improved tunings that can be generally applicable. To conclude, we reflect on the advantages and drawbacks of this architecture, such as RADOS-level free-space requirements and double-network penalties, and offer ideas for future improvements.
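
The space side of the erasure-coding trade-off reduces to simple ratios: an EC(k, m) layout stores (k+m)/k raw bytes per logical byte and sends roughly that much extra data on writes. Below is a small helper to compare layouts; this is illustrative arithmetic only, not the paper's measured numbers.

```python
def ec_costs(k, m, object_mb=64):
    """Raw-space overhead and write network traffic for an EC(k, m) layout,
    where an object is split into k data chunks plus m parity chunks."""
    overhead = (k + m) / k                # raw bytes stored per logical byte
    write_traffic = object_mb * overhead  # MB sent over the network per write
    return overhead, write_traffic

for k, m in [(4, 2), (8, 3), (16, 4)]:
    oh, wt = ec_costs(k, m)
    print(f"EC({k},{m}): {oh:.2f}x raw space, {wt:.0f} MB network per 64 MB write")
```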

Journal ArticleDOI
TL;DR: GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data, developed on an OpenStack-based private cloud with services such as Keystone for authentication, Cinder for block storage, Neutron for networking, and Nova for managing compute instances.
Abstract: GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is developed on an OpenStack-based private cloud with services such as Keystone for authentication, Cinder for block storage, Neutron for networking, and Nova for managing compute instances. GenoVault uses object-based storage, which stores data as objects instead of files or blocks for faster retrieval from different distributed object nodes. Along with a web-based interface, a JavaFX-based desktop client has also been developed to meet the requirements of the large file uploads typical of NGS datasets. Users store files in their respective object-storage areas, and the metadata provided during file upload is used for querying the database. The GenoVault repository is designed with future needs in mind and can scale both vertically and horizontally using OpenStack cloud features. Users have the option to make their data public or keep access private. Data security is ensured because every container is a separate entity in the object-based storage architecture, which also supports the Secure File Transfer Protocol (SFTP) for data upload and download. Data is uploaded by users into individual containers that hold raw read files (fastq), processed alignment files (bam, sam, bed), and the output of variation detection (vcf). The GenoVault architecture allows verification of data integrity and authenticity before making data available to collaborators according to the owner's permissions. GenoVault is useful for maintaining organization-wide NGS data generated in various labs that has not yet been published and submitted to public repositories such as NCBI, and it supports sharing NGS data among collaborating institutions. GenoVault can thus manage vast volumes of NGS data on any OpenStack-based private cloud.
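
An upload into per-user containers of the kind GenoVault describes can be sketched with the python-swiftclient library, assuming the object store exposes a Swift API; the auth URL, credentials, container name, and metadata header are placeholders.

```python
from swiftclient import client as swift  # pip install python-swiftclient

def upload_reads(container, path):
    """Upload one NGS file into a per-user container on OpenStack object
    storage; auth URL, credentials, and metadata header are placeholders."""
    conn = swift.Connection(authurl="https://cloud.example.org/auth/v1.0",
                            user="lab:researcher", key="secret")
    with open(path, "rb") as fh:
        # Custom object metadata (x-object-meta-*) stays queryable later.
        conn.put_object(container, path.split("/")[-1], contents=fh,
                        headers={"x-object-meta-assay": "wgs"})

# upload_reads("alice-private", "sample01.fastq")
```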