
Showing papers on "Metadata repository" published in 2018


Journal ArticleDOI
TL;DR: MIRACUM, as discussed by the authors, is a consortium of academic and hospital partners as well as one industrial partner in eight German cities which have joined forces to create interoperable data integration centres (DIC) and make data within those DIC available for innovative new IT solutions in patient care and medical research.
Abstract: Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Similar to other large international data sharing networks (e.g. OHDSI, PCORnet, eMerge, RD-Connect), MIRACUM is a consortium of academic and hospital partners as well as one industrial partner in eight German cities which have joined forces to create interoperable data integration centres (DIC) and make data within those DIC available for innovative new IT solutions in patient care and medical research. Objectives: Sharing data shall be supported by common interoperable tools and services, in order to leverage the power of such data for biomedical discovery and to move towards a learning health system. This paper aims at illustrating the major building blocks and concepts which MIRACUM will apply to achieve this goal. Governance and Policies: Besides establishing an efficient governance structure within the MIRACUM consortium (based on the steering board, a central administrative office, the general MIRACUM assembly, six working groups and the international scientific advisory board), defining DIC governance rules and data sharing policies, as well as establishing (at each MIRACUM DIC site, but also for MIRACUM in total) use and access committees, are major building blocks for the success of such an endeavor. Architectural Framework and Methodology: The MIRACUM DIC architecture builds on a comprehensive ecosystem of reusable open source tools (MIRACOLIX), which are linkable and interoperable amongst each other, but also with the existing software environment of the MIRACUM hospitals. Efficient data protection measures, considering patient consent, data harmonization and a MIRACUM metadata repository, as well as a common data model are major pillars of this framework. The methodological approach for shared data usage relies on a federated querying and analysis concept. Use Cases: MIRACUM aims at proving the value of its DIC with three use cases: IT support for patient recruitment into clinical trials, the development and routine care implementation of a clinico-molecular predictive knowledge tool, and molecular-guided therapy recommendations in molecular tumor boards. Results: Based on the MIRACUM DIC release in the nine-month conceptual phase, first large-scale analyses for stroke and colorectal cancer cohorts have been pursued. Discussion: Beyond all technological challenges, successfully applying the MIRACUM tools to enrich our knowledge about diagnostic and therapeutic concepts, and thus supporting the concept of a Learning Health System, will be crucial for acceptance and sustainability in the medical community and the MIRACUM university hospitals.
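
To make the federated querying and analysis concept concrete, the following minimal sketch (with hypothetical site names and a simplified criteria format, not the actual MIRACUM query interface) shows how each DIC could evaluate a cohort query locally so that only aggregate counts leave the site:

```python
# Minimal sketch of a federated count query across data integration centres (DIC).
# Site names, patient records and the query format are hypothetical illustrations;
# in a real deployment each site evaluates the query inside its own infrastructure
# and returns only aggregate results.

def local_cohort_count(site_data, criteria):
    """Evaluate inclusion criteria locally and return only an aggregate count."""
    return sum(1 for patient in site_data
               if all(patient.get(k) == v for k, v in criteria.items()))

def federated_count(sites, criteria):
    """Collect per-site counts; individual-level data never leaves a site."""
    return {name: local_cohort_count(data, criteria) for name, data in sites.items()}

if __name__ == "__main__":
    sites = {
        "DIC_A": [{"diagnosis": "stroke", "age_over_65": True},
                  {"diagnosis": "colorectal_cancer", "age_over_65": False}],
        "DIC_B": [{"diagnosis": "stroke", "age_over_65": True}],
    }
    print(federated_count(sites, {"diagnosis": "stroke"}))  # {'DIC_A': 1, 'DIC_B': 1}
```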

70 citations


Journal ArticleDOI
TL;DR: The so-called FAIR Data Point was integrated into OSSE to provide a description of metadata in a FAIR manner, which is an important step towards unified documentation across multiple registries.
Abstract: The Open Source Registry for Rare Diseases (OSSE) provides a concept and a software for the management of registries for patients with rare diseases. A disease is defined as rare if less than 5 out of 10,000 people are affected. To date, approximately 6,000 rare diseases have been catalogued. Networking and data exchange for research purposes remain challenging due to the lack of interoperability and the fact that small data stocks are stored locally. The so-called "Findable, Accessible, Interoperable, Reusable" (FAIR) Data Principles have been developed to improve research in the field of rare diseases. Subsequently, the OSSE architecture was adapted to implement the FAIR Data Principles. Therefore, the so-called FAIR Data Point was integrated into OSSE to provide a description of metadata in a FAIR manner. OSSE relies on the existing metadata repository (MDR), which is used to define data elements in the system. This is an important step towards unified documentation across multiple registries. The integration and use of new procedures to improve interoperability plays an important role in the context of registries for rare diseases.
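
As a rough illustration of exposing registry metadata in a FAIR, machine-readable way, the sketch below uses rdflib and the DCAT vocabulary; the catalog URI and the chosen properties are illustrative assumptions and do not reproduce the actual OSSE/FAIR Data Point layout.

```python
# Minimal sketch of publishing registry metadata as machine-readable RDF, in the
# spirit of a FAIR Data Point. Requires rdflib; the URI and properties below are
# invented for the example.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
catalog = URIRef("https://example.org/fdp/catalog/rare-disease-registry")  # hypothetical
g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCTERMS.title, Literal("Example rare disease registry")))
g.add((catalog, DCTERMS.description, Literal("Registry metadata exposed for findability")))

print(g.serialize(format="turtle"))
```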

12 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution, is proposed and implemented.
Abstract: Scientific workflow management systems play a major role in the design, execution and documentation of computational experiments. However, they have limited support for managing workflow evolution and exploration because they lack rich metadata for the software that implements workflow components. Such metadata could be used to support scientists in exploring local adjustments to a workflow, replacing components with similar software, or upgrading components upon release of newer software versions. To address this challenge, we propose OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution. Our approach uses a novel ontology to describe the functionality and evolution through time of any software used to create workflow components. OntoSoft-VFF is implemented as an online catalog that stores semantic metadata for software to enable workflow exploration through understanding of software functionality and evolution. The catalog also supports comparison and semantic search of software metadata. We showcase OntoSoft-VFF using machine learning workflow examples. We validate our approach by testing that a workflow system could compare differences in software metadata, explain software updates and describe the general functionality of workflow steps.
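
A simple way to picture the comparison of software metadata across versions (not the actual OntoSoft-VFF ontology or data model) is a field-level diff between two version records:

```python
# Illustrative sketch: comparing the metadata of two versions of a workflow
# component to explain what changed between releases. Field names are invented.

def diff_software_metadata(old, new):
    """Return added, removed and changed metadata fields between two versions."""
    added   = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}

v1 = {"name": "classifier", "version": "1.0", "functionality": "decision tree"}
v2 = {"name": "classifier", "version": "2.0", "functionality": "random forest",
      "parameter": "n_estimators"}
print(diff_software_metadata(v1, v2))
```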

12 citations


Journal ArticleDOI
TL;DR: The structure and features of the Samply.MDR as well as its flexible usability are presented by giving an overview of its application in various projects.
Abstract: Collaboration in medical research is becoming common, especially for collecting relevant cases across institutional boundaries. If the data, which is usually very heterogeneously formalized and structured, can be integrated, such a collaboration can facilitate research. An absolute prerequisite for this is an extensive description of the formalization and exact meaning of every data element contained in a dataset. This information is commonly known as metadata. Various research networking projects tackle this challenge with the development of concepts and IT tools. The Samply Metadata Repository (Samply.MDR) is a solution for managing and publishing such metadata in a standardized and reusable way. In this article we present the structure and features of the Samply.MDR as well as its flexible usability by giving an overview of its application in various projects.

10 citations


Journal ArticleDOI
TL;DR: MetaStore is an adaptive metadata management framework based on a NoSQL database and an RDF triple store that automatically segregates the different categories of metadata into their corresponding data models to maximize the utilization of the data models supported by NoSQL databases.
Abstract: In this paper, we present MetaStore, a metadata management framework for scientific data repositories. Scientific experiments are generating a deluge of data, and the handling of associated metadata is critical, as it enables discovering, analyzing, reusing, and sharing of scientific data. Moreover, metadata produced by scientific experiments are heterogeneous and subject to frequent changes, demanding a flexible data model. Existing metadata management systems provide a broad range of features for handling scientific metadata. However, the principal limitation of these systems is an architecture design that is restricted to either a single or at most a few standard metadata models. Support for handling different types of metadata models, i.e., administrative, descriptive, structural, and provenance metadata, as well as community-specific metadata models, is not possible with these systems. To address this challenge, we present MetaStore, an adaptive metadata management framework based on a NoSQL database and an RDF triple store. MetaStore provides a set of core functionalities to handle heterogeneous metadata models by automatically generating the necessary software code (services) and extends the functionality of the framework on the fly. To handle dynamic metadata and to control metadata quality, MetaStore also provides an extended set of functionalities, such as enabling annotation of images and text by integrating the Web Annotation Data Model, allowing communities to define discipline-specific vocabularies using the Simple Knowledge Organization System, and providing advanced search and analytical capabilities by integrating Elasticsearch. To maximize the utilization of the data models supported by NoSQL databases, MetaStore automatically segregates the different categories of metadata into their corresponding data models. Complex provenance graphs and dynamic metadata are modeled and stored in an RDF triple store, whereas the static metadata is stored in a NoSQL database. To enable large-scale harvesting (sharing) of metadata using the METS standard over the OAI-PMH protocol, MetaStore is designed to be OAI-compliant. Finally, to show the practical usability of the MetaStore framework and that the requirements from the research communities have been realized, we describe our experience in the adoption of MetaStore for three communities.
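
The segregation idea can be sketched in a few lines; the in-memory structures below merely stand in for the NoSQL database and the RDF triple store, and the record and predicate names are invented for the example.

```python
# Simplified illustration of MetaStore's segregation idea: static/descriptive
# metadata goes to a document (NoSQL-style) store, while provenance and other
# graph-shaped metadata go to a triple store.

document_store = {}   # stands in for a NoSQL database
triple_store = []     # stands in for an RDF triple store

def ingest(record_id, descriptive_metadata, provenance_triples):
    """Route each category of metadata to its most suitable data model."""
    document_store[record_id] = descriptive_metadata   # static descriptive metadata
    triple_store.extend(provenance_triples)            # provenance as triples

ingest(
    "dataset-42",
    {"title": "Beamline scan", "creator": "Lab A", "format": "HDF5"},
    [("dataset-42", "prov:wasGeneratedBy", "scan-run-7"),
     ("scan-run-7", "prov:used", "sample-13")],
)
print(document_store["dataset-42"]["title"], len(triple_store))
```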

10 citations


07 Feb 2018
TL;DR: In this paper, the authors developed a specific innovative methodology based on recent advances in "big data" intelligent databases applied to the growing amount of high-spatial and multi-wavelength resolution, high-cadence data from NASA's missions and supporting ground-based observatories.
Abstract: The fundamental motivation of the project is that the scientific output of solar research can be greatly enhanced by better exploitation of the existing solar/heliosphere space-data products jointly with ground-based observations. Our primary focus is on developing a specific innovative methodology based on recent advances in "big data" intelligent databases applied to the growing amount of high-spatial and multi-wavelength resolution, high-cadence data from NASA's missions and supporting ground-based observatories. Our flare database is not simply a manually searchable time-based catalog of events or list of web links pointing to data. It is a preprocessed metadata repository enabling fast search and automatic identification of all recorded flares sharing a specifiable set of characteristics, features, and parameters. The result is a new and unique database of solar flares and data search and classification tools for the Heliophysics community, enabling multi-instrument/multi-wavelength investigations of flare physics and supporting further development of flare-prediction methodologies.
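
The kind of parametric search such a preprocessed metadata repository enables can be illustrated with a toy example; the field names are invented for the sketch and do not reflect the project's actual schema.

```python
# Toy illustration of searching a flare metadata repository for all events
# sharing a specifiable set of characteristics and parameters.

def find_flares(repository, **criteria):
    """Return all flare records whose metadata match every given criterion."""
    return [f for f in repository
            if all(f.get(key) == value for key, value in criteria.items())]

flares = [
    {"id": "F1", "goes_class": "M1.0", "has_euv_coverage": True},
    {"id": "F2", "goes_class": "X2.2", "has_euv_coverage": True},
    {"id": "F3", "goes_class": "M1.0", "has_euv_coverage": False},
]
print(find_flares(flares, goes_class="M1.0", has_euv_coverage=True))  # only F1 matches
```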

8 citations


Journal ArticleDOI
TL;DR: The solution, presented in this work, provides extensibility to simple and complex data types, unary and binary operations, type conversions, functions and visuals, thus enabling developers to seamlessly add new features to SLGeometry by implementing them as C# classes annotated with metadata.

7 citations


Patent
23 May 2018
TL;DR: In this article, a closed-loop unified metadata architecture is proposed to provide a meaningful, consistent and normalized view of the metadata that describes the information, as well as to determine data lineage and meaningful data quality metrics.
Abstract: There has been exponential growth in the capture and retention of immense quantities of information in a globally distributed manner. A closed-loop unified metadata architecture includes a universal metadata repository and implements data quality and data lineage analyses. The architecture solves significant technical challenges to provide a meaningful, consistent and normalized view of the metadata that describes the information, as well as to determine data lineage and meaningful data quality metrics.

6 citations


Proceedings ArticleDOI
01 Nov 2018
TL;DR: This work proposes to use data mining techniques to automatically identify similar structures of relational databases by comparing their metadata, which is composed of physical details of the databases, and shows that this solution is flexible, supporting a variety of schema sizes and DBMS.
Abstract: With the expanding diversity of database technologies and database sizes, it is becoming increasingly hard to identify similar relational databases among many large databases stored in different Database Management Systems (DBMS). Therefore, we propose to use data mining techniques to automatically identify similar structures of relational databases by comparing their metadata, which is composed of physical details of the databases. The amount of metadata is proportional to the size of the schema structure, and the number of possible comparison combinations is quadratic in the number of schemas analyzed. Looking for the most efficient technique, we propose to calculate schema similarity by evaluating the distance of all schemas to just one schema, which serves as a starting point. Schemas with small distances to this reference are more similar to it than schemas with larger distances. We compare this proposal against two other approaches. The first approach compares every schema against every other schema, excluding inverse comparisons. The second approach compares schemas within groups of schemas of similar size. To validate our proposal, an experiment was performed with 354 real schemas ranging in size from 2 to 20 thousand metadata elements, together totaling more than 26 thousand tables and 238 thousand columns. These schemas came from 5 different DBMS. The extracted metadata was transformed and formatted for comparing pairs of schemas. Textual features are compared using cosine distance and numerical features using Euclidean distance. Then, hierarchical clustering is used to facilitate visualization of the schemas that most closely resemble one another. Results showed that our approach was the most efficient: it compared all schemas and identified the most structurally similar ones in less than 2 minutes. The extracted metadata was used to create the first version of the metadata repository and an initial version of a data catalog, which contributed to the knowledge of existing data. Using this procedure, duplicated schemas were discovered and then discontinued, resulting in cost savings of 10% while freeing up infrastructure resources. This solution is flexible; it supports a variety of schema sizes and DBMS.
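
A minimal sketch of the distance-to-a-reference-schema idea, with deliberately simplified feature extraction compared with real database metadata, might look as follows:

```python
# Textual features compared with cosine distance over token counts, numeric
# features with Euclidean distance, and every schema measured against a single
# reference schema (the "starting point"). Feature choice is illustrative only.
import math
from collections import Counter

def cosine_distance(text_a, text_b):
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def euclidean_distance(nums_a, nums_b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(nums_a, nums_b)))

def distance_to_reference(reference, schemas):
    """Compare every schema against one reference schema and sort by distance."""
    results = {}
    for name, schema in schemas.items():
        d_text = cosine_distance(reference["columns"], schema["columns"])
        d_num = euclidean_distance(reference["sizes"], schema["sizes"])
        results[name] = d_text + d_num
    return sorted(results.items(), key=lambda kv: kv[1])

reference = {"columns": "id name created_at", "sizes": [3, 1]}   # column count, table count
schemas = {
    "hr_db":    {"columns": "id name created_at salary", "sizes": [4, 1]},
    "sales_db": {"columns": "order_id amount", "sizes": [2, 1]},
}
print(distance_to_reference(reference, schemas))  # hr_db is closest to the reference
```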

6 citations


Journal ArticleDOI
TL;DR: The goal of the study is to quantitatively measure completeness of metadata records and to determine if metadata developed by LTER is more complete with respect to the recommendation than other collections in EML and in CSDGM.

6 citations


Patent
29 Mar 2018
TL;DR: In this article, the authors describe a hybrid data management system that operates by receiving, from a user interface, a modification to a field of data, which is transmitted to the decentralized data management systems.
Abstract: Disclosed herein are system, method, and computer program product embodiments for a hybrid data management system. An embodiment operates by receiving, from a user interface, a modification to a field of data. It is determined that the field of data corresponds to a decentralized data management system based on a look-up to a metadata repository. The modification is transmitted to the decentralized data management system. From the decentralized data management system, an asset identifier corresponding to the modification is received. The asset identifier is stored in a centralized database. Via the user interface, an indication that the field of data has been modified is provided.
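
A hedged sketch of the claimed flow, with hypothetical class and field names, could look like this: the metadata repository decides whether a field lives in the decentralized system, and the returned asset identifier is kept in a centralized database.

```python
# Illustrative sketch only; not the patent's actual implementation.

metadata_repository = {"patient_consent": "decentralized", "display_name": "centralized"}
centralized_db = {}

class DecentralizedSystem:
    def submit(self, field, value):
        """Record the modification and return an asset identifier for it."""
        return f"asset-{abs(hash((field, value))) % 10_000}"

def apply_modification(field, value, decentralized=DecentralizedSystem()):
    if metadata_repository.get(field) == "decentralized":
        asset_id = decentralized.submit(field, value)     # modification routed out
        centralized_db[field] = {"asset_id": asset_id, "modified": True}
    else:
        centralized_db[field] = {"value": value, "modified": True}
    return centralized_db[field]

print(apply_modification("patient_consent", "granted"))
```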

Journal Article
TL;DR: In this article, the authors evaluate how the on-board techniques of a graph database can be used for matching and mapping, applying algorithms for metadata management to different cancer-related datasets.
Abstract: To exchange data across several sites or to interpret it at a later point in time, it is necessary to create a general understanding of the data. As standard practice, this understanding is achieved through metadata. These metadata are usually stored in relational databases, so-called metadata repositories (MDR). Typical functions of such an MDR include pure storage, administration and other specific metadata functionalities such as finding relations among data elements. This results in a multitude of connections between the data elements, which can be described as highly interconnected graphs. Previous studies have already shown that using alternative databases such as graph databases for modelling and visualisation can be beneficial. The objective of this work is to evaluate how the on-board techniques of a graph database can be used for matching and mapping. Different datasets relating to cancer were entered, and algorithms for metadata management were applied.

Patent
27 Dec 2018
TL;DR: In this paper, a plurality of data elements from a data lake associated with an organization are registered with one or more metadata objects through a metadata registration, which is performed using a graphical user interface by either receiving a manual input from a user or using a REST application programming interface.
Abstract: Embodiments provide data handling methods and systems for data lakes. In an embodiment, the method includes accessing a plurality of data elements from a data lake associated with an organization. Each data element is registered with one or more metadata objects through a metadata registration. The metadata registration is performed using a graphical user interface by either receiving a manual input from a user or using a REST application programming interface. A unified metadata repository is formed based on the metadata registration of the plurality of data elements. Moreover, complex computations of the plurality of data elements for various data processing operations and business rules are performed. Graphical processing of the plurality of data elements in the data lake is performed for analyzing entities and their relationships to generate insights. The method further includes performing an analytical operation based at least on machine learning algorithms and deep learning techniques.
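
A minimal sketch of the registration step and the resulting unified metadata repository, with an invented payload format and a placeholder endpoint rather than anything documented in the patent, might look as follows:

```python
# Registering data lake elements with metadata objects and forming a unified
# metadata repository from those registrations. Illustrative sketch only.
import json

unified_metadata_repository = {}

def register_data_element(element_id, metadata_objects):
    """Attach metadata objects to a data element and record the registration."""
    unified_metadata_repository.setdefault(element_id, []).extend(metadata_objects)
    # In a real system this payload could also be POSTed to a registration
    # endpoint, e.g. https://example.org/metadata/register (hypothetical).
    return json.dumps({"element": element_id, "metadata": metadata_objects})

register_data_element("s3://lake/orders/2018.parquet",
                      [{"type": "schema", "fields": ["order_id", "amount"]},
                       {"type": "owner", "team": "sales-analytics"}])
print(unified_metadata_repository)
```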


Proceedings Article
13 Sep 2018
TL;DR: A method for semantically enhancing the metadata stored in a medical multimedia data warehouse is presented, which allows the system to speed up the execution of a query, by computing the results of new, unforeseen queries, from the fact data already stored in the data warehouse.
Abstract: Data warehouses are versatile systems capable of storing and processing large quantities of data. They are most suited for aggregating and reporting. The data managed by these systems vary from simple, numeric data, to more complex, multimedia data. One of the domains in which multimedia data is intensively produced is medicine. We present a method for semantically enhancing the metadata stored in a medical multimedia data warehouse. This semantically rich environment will gain in autonomy, reducing the dependence on human intervention to resolve new, unforeseen queries. Furthermore, the use of the semantic relations defined in the ontology allows the system to speed up the execution of a query, by computing the results of new, unforeseen queries, from the fact data already stored in the data warehouse.
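
As a toy illustration of answering an unforeseen query from facts already stored, consider rolling counts up an "is-a" hierarchy defined in an ontology; the concept names below are invented for the example.

```python
# Counts recorded for specific concepts are aggregated through semantic
# relations to answer a query about a broader concept, with no precomputed
# aggregate for that concept.

is_a = {"MRI": "imaging_exam", "CT": "imaging_exam", "imaging_exam": "exam"}
fact_counts = {"MRI": 120, "CT": 80, "blood_test": 400}

def descends_from(concept, ancestor):
    while concept is not None:
        if concept == ancestor:
            return True
        concept = is_a.get(concept)
    return False

def count_for(concept):
    """Aggregate stored facts for every concept subsumed by the queried one."""
    return sum(n for c, n in fact_counts.items() if descends_from(c, concept))

print(count_for("imaging_exam"))  # 200, derived from the MRI and CT facts
```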

Journal Article
TL;DR: The goal of this work is to provide a way to "inject" the meaning of metadata keys into the web-based frontend of an application to make it "metadata aware".
Abstract: Whenever medical data is integrated from multiple sources, it is regarded good practice to separate data from information about its meaning, such as designations, definitions or permissible values (in short: metadata). However, the ways in which applications work with metadata are imperfect: Many applications do not support fetching metadata from externalized sources such as metadata repositories. In order to display human-readable metadata in any application, we propose not to change the application, but to provide a library that makes a change to the user interface. The goal of this work is to provide a way to "inject" the meaning of metadata keys into the web-based frontend of an application to make it "metadata aware".

Patent
22 Mar 2018
TL;DR: In this paper, a metadata collection system may be executed to automatically populate a metadata template based on the set of potential metadata entries, and the system may update entries in the metadata template using a translation tool and validate the updated entries to ensure that required data elements are present.
Abstract: A back-end application computer server may access a potential metadata entries data store containing a set of potential metadata entries, each entry including at least a data element name and a data element definition. A metadata collection system may be executed to automatically populate a metadata template based on the set of potential metadata entries. The system may update entries in the metadata template using a translation tool and validate the updated entries in the metadata template to ensure that required data elements are present. The system may also certify the validated entries and load the set of certified metadata entries, including the certified data element names and certified data element definitions, into an enterprise metadata repository data store. Electronic messages may be exchanged to support at least one interactive user interface display associated with certification of the metadata template.
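
A hedged sketch of the validate-then-certify step, with illustrative field names rather than the patent's actual data model, could look like this:

```python
# Entries in a metadata template are checked for the required data elements
# before being certified and loaded into an enterprise metadata repository.

REQUIRED_FIELDS = ("data_element_name", "data_element_definition")

def validate_entries(entries):
    """Split template entries into those carrying all required fields and those that do not."""
    valid, invalid = [], []
    for entry in entries:
        (valid if all(entry.get(f) for f in REQUIRED_FIELDS) else invalid).append(entry)
    return valid, invalid

def certify_and_load(entries, repository):
    valid, invalid = validate_entries(entries)
    repository.extend({**e, "certified": True} for e in valid)
    return len(valid), len(invalid)

repository = []
template = [{"data_element_name": "policy_id", "data_element_definition": "Unique policy key"},
            {"data_element_name": "premium"}]  # missing definition, will be rejected
print(certify_and_load(template, repository), repository)
```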

Journal ArticleDOI
TL;DR: This work introduces IntegrityCatalog, a novel software system that can be integrated into any digital repository and introduces a treap‐based persistent authenticated dictionary managing arbitrary length key/value pairs, which it uses to store all integrity metadata.
Abstract: Digital repositories must periodically check the integrity of stored objects to assure users of their correctness. Prior solutions calculate integrity metadata and require the repository to store it alongside the actual data objects. To safeguard and detect damage to this metadata, prior solutions rely on widely visible media (unaffiliated third parties) to store and provide back digests of the metadata to verify it is intact. However, they do not address recovery of the integrity metadata in case of damage or adversarial attack. We introduce IntegrityCatalog, a novel software system that can be integrated into any digital repository. It collects all integrity-related metadata in a single component and treats them as first class objects, managing both their integrity and their preservation. We introduce a treap-based persistent authenticated dictionary managing arbitrary length key/value pairs, which we use to store all integrity metadata, accessible simply by object name. Additionally, IntegrityCatalog is a distributed system that includes a network protocol that manages both corruption detection and preservation of this metadata, using administrator-selected network peers with 2 possible roles. Verifiers store and offer attestations on digests and have minimal storage requirements, while preservers efficiently synchronize a complete copy of the catalog to assist in recovery in case of a detected catalog compromise on the local system. We present our approach in developing the prototype implementation, measure its performance experimentally, and demonstrate its effectiveness in real-world situations. We believe the implementation techniques of our open-source IntegrityCatalog will be useful in the construction of next-generation digital repositories.
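
A greatly simplified sketch of the catalog idea follows; a plain dictionary plus a recomputed digest stands in for the paper's treap-based persistent authenticated dictionary, and no network protocol is modeled.

```python
# Integrity metadata is stored under the object name, and the whole catalog is
# summarized by a single digest that a remote verifier could attest to.
import hashlib

class ToyIntegrityCatalog:
    def __init__(self):
        self._entries = {}

    def put(self, object_name, object_digest):
        self._entries[object_name] = object_digest

    def get(self, object_name):
        return self._entries.get(object_name)

    def root_digest(self):
        """Digest over all entries in canonical order; any change alters it."""
        h = hashlib.sha256()
        for name in sorted(self._entries):
            h.update(name.encode())
            h.update(self._entries[name].encode())
        return h.hexdigest()

catalog = ToyIntegrityCatalog()
catalog.put("report.pdf", hashlib.sha256(b"file bytes").hexdigest())
attested = catalog.root_digest()          # what a verifier would store
print(catalog.root_digest() == attested)  # True until the catalog is tampered with
```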

Patent
14 Jun 2018
TL;DR: In this paper, a system and method for building a hyperdata hub to access an enriched data model is presented, where one or more data models are built based on user input to a user interface, and query definitions are built on the user input.
Abstract: A system and method for building a hyperdata hub to access an enriched data model is presented. One or more data models are built based on user input to a user interface, and one or more query definitions are built based on the user input to the user interface. Data is collected from external data sources and internal data sources, and contextual data is extracted based on the collected data according to the one or more data models and the one or more query definitions. The metadata associated with the one or more data models and one or more query definitions are stored, and data is matched with the contextual data associated with the hyperdata metadata repository.


10 Dec 2018
TL;DR: The Earthdata Search End-to-End Services (E2ES) workflow as mentioned in this paper leverages the Common Metadata Repository's (CMR) newly implemented Unified Metadata Models for Services and Variables as well as a new service broker to expose and seamlessly integrate a collection's service capabilities and variables.
Abstract: The goal of NASA's Earthdata Search End-to-End Services workflow is to take the pain and headache out of searching for data and getting that data back in a usable format that contains only the data relevant to you. For too long scientists have had to jump through endless hoops, use tools that only offer specific data or specific services, and perform any number of other non-science tasks just to get started on their actual project. Earthdata Search leverages the Common Metadata Repository's (CMR) newly implemented Unified Metadata Models for Services and Variables as well as a new service broker to expose and seamlessly integrate a collection's service capabilities and variables into an intuitive user interface. Using the new End-to-End Services workflow, scientists will be able to quickly see what data is available to be customized, what customization options are available, and actually perform those customizations on the data, all within Earthdata Search, regardless of who the data provider is. This talk will demonstrate the simple workflow that will be available to end users and also give an overview covering how the workflow is enabled by the metadata stored within the CMR. (https://search.earthdata.nasa.gov/)
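
For readers who want to explore the underlying metadata, the CMR exposes a public search interface; the snippet below sketches a keyword query against it (it requires the requests package, and the exact URL, parameters and response fields should be treated as assumptions that may change).

```python
# Sketch of querying the Common Metadata Repository's search API for collections
# matching a keyword and printing their titles.
import requests

def search_cmr_collections(keyword, page_size=5):
    resp = requests.get(
        "https://cmr.earthdata.nasa.gov/search/collections.json",  # assumed public endpoint
        params={"keyword": keyword, "page_size": page_size},
        timeout=30,
    )
    resp.raise_for_status()
    return [entry.get("title") for entry in resp.json().get("feed", {}).get("entry", [])]

if __name__ == "__main__":
    print(search_cmr_collections("sea surface temperature"))
```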

Proceedings ArticleDOI
06 Jul 2018
TL;DR: The authors' Universal Metadata Repository (UMR) applied to three in-flight use cases which combine the power of a technical and business view using knowledge graphs for: searching, inferencing, traceability, administration, enforcing accessibility standards, and providing consistent organizational architecture are presented.
Abstract: Managing ever-growing content from heterogeneous data sources is a significant challenge in enterprise environments. Many data analysis tools work in isolation to capture various statistical, quality, and provenance information within the enterprise. Yielding meaningful and consistent information from a landscape of different vendor tools requires a holistic and transparent view over all existing extracted metadata. In this paper, we present our Universal Metadata Repository (UMR) applied to three in-flight use cases which combine the power of a technical and business view using knowledge graphs for: searching, inferencing, traceability, administration, enforcing accessibility standards, and providing consistent organizational architecture.

Patent
01 Mar 2018
TL;DR: In this paper, the authors propose an approach for managing data replication between first and second sites of a distributed computing environment by one or more processors based on an identified data block-set for replication.
Abstract: Embodiments for, in a shared storage environment, managing data replication between first and second sites of a distributed computing environment by one or more processors. Based on an identified data block-set for replication, a unique metadata map is generated as a computed snapshot of the identified data block-set, the metadata map accounting for a predetermined block-size for transfer. The unique metadata map is transferred to the second site. The second site adds the unique metadata map to a global metadata repository.
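
A hedged sketch of building such a metadata map, with an assumed block size and map structure rather than the patent's actual format, might look as follows:

```python
# The map records one digest per block of the identified block-set at a
# predetermined block size, so the second site can add it to a global metadata
# repository and reason about what needs to be transferred.
import hashlib

BLOCK_SIZE = 4096  # predetermined block size for transfer (assumed value)

def build_metadata_map(data: bytes, block_size: int = BLOCK_SIZE):
    """Snapshot the block-set as {block index: digest of that block}."""
    return {
        i // block_size: hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }

global_metadata_repository = {}             # as kept at the second site
snapshot = build_metadata_map(b"x" * 10000)
global_metadata_repository["volume-1"] = snapshot
print(len(snapshot))                        # 3 blocks for 10000 bytes at 4096-byte blocks
```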

Patent
29 Nov 2018
TL;DR: In this article, a system, method, and computer-readable medium are disclosed for performing a deployment operation, comprising: receiving an application module command request; accessing a metadata repository for application modules to obtain metadata corresponding to the application module; determining whether a module corresponding to a command request is loaded within an application based upon metadata corresponding with the application modules.
Abstract: A system, method, and computer-readable medium are disclosed for performing a deployment operation, comprising: receiving an application module command request; accessing a metadata repository for application modules to obtain metadata corresponding to the application module; determining whether an application module corresponding to the application module command request is loaded within an application based upon metadata corresponding to the application module; contacting a package manager to download an application module package if the application module is not loaded within the application or an update to the application module exists; loading the application module package; and, providing an invocation to an entry point of the application module.
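
A hedged sketch of the claimed flow, in which every name is a hypothetical stand-in, could look like this: the module is looked up in a metadata repository, its package is downloaded if it is missing or outdated, and its entry point is then invoked.

```python
# Illustrative sketch only; the package manager and module loading are faked.

metadata_repository = {
    "reporting": {"package": "reporting-module", "version": "2.1", "entry_point": "run"},
}
loaded_modules = {"reporting": "1.0"}   # versions currently loaded in the application

def download_and_load(package, version):
    print(f"downloading and loading {package}=={version} (placeholder for a package manager)")
    return {"run": lambda: f"{package} {version} started"}   # fake module exposing its entry point

def handle_command(module_name):
    meta = metadata_repository[module_name]
    if loaded_modules.get(module_name) != meta["version"]:   # not loaded, or an update exists
        module = download_and_load(meta["package"], meta["version"])
        loaded_modules[module_name] = meta["version"]
    else:
        module = {"run": lambda: f"{meta['package']} already loaded"}
    return module[meta["entry_point"]]()                     # invoke the entry point

print(handle_command("reporting"))
```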