Topic

Metadata repository

About: Metadata repository is a research topic. Over its lifetime, 5,841 publications have been published within this topic, receiving 121,778 citations.


Papers
Posted Content
TL;DR: Introduces HopsFS, a next-generation distribution of the Hadoop Distributed File System that replaces HDFS' single-node in-memory metadata service with a distributed metadata service built on a NewSQL database, enabling clusters an order of magnitude larger and with higher throughput than HDFS.
Abstract: Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS' single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables an order of magnitude larger and higher throughput clusters compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS' capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.

47 citations

Journal Article
TL;DR: A natural language processing method, Labeled Latent Dirichlet Allocation (LLDA), is employed and a regression model is trained via a human participants experiment to address the topic heterogeneity brought by multiple metadata standards and the lack of established semantic search in Linked-Data-driven geoportals.
Abstract: Geoportals provide integrated access to geospatial resources, and enable both authorities and the general public to contribute and share data and services. An essential goal of geoportals is to facilitate the discovery of the available resources. This process relies heavily on the quality of metadata. While multiple metadata standards have been established, data contributors may adopt different standards when sharing their data via the same geoportal. This is especially the case for user-generated content where various terms and topics can be introduced to describe similar datasets. While this heterogeneity provides a wealth of perspectives, it also complicates resource discovery. With the fast development of Semantic Web technologies, there is a rise of Linked-Data-driven portals. Although these novel portals open up new ways of organizing metadata and retrieving resources, they lack effective semantic search methods. This paper addresses the two challenges discussed above, namely the topic heterogeneity brought by multiple metadata standards as well as the lack of established semantic search in Linked-Data-driven geoportals. To harmonize the metadata topics, we employ a natural language processing method, namely Labeled Latent Dirichlet Allocation (LLDA), and train it using standardized metadata from Data.gov. With respect to semantic search, we construct thematic and geographic matching features from the textual metadata descriptions, and train a regression model via a human participants experiment. We evaluate our methods by examining their performance in addressing the two issues. Finally, we implement a semantics-enabled and Linked-Data-driven prototypical geoportal using a sample dataset from Esri’s ArcGIS Online.

47 citations

Patent
28 Feb 2013
TL;DR: An interest-driven data pipeline is automatically compiled to generate reporting data from raw data, based on reporting data requirements automatically derived from at least one report specification defined using the metadata.
Abstract: Interest-driven Business Intelligence (BI) systems in accordance with embodiments of the invention are illustrated. In one embodiment of the invention, a data processing system includes raw data storage containing raw data, metadata storage containing metadata that describes the raw data, and an interest-driven data pipeline that is automatically compiled to generate reporting data using the raw data, wherein the interest-driven data pipeline is compiled based upon reporting data requirements automatically derived from at least one report specification defined using the metadata.

46 citations

Patent
18 Apr 2011
TL;DR: A data management server is provided that is connectable to a plurality of content servers that store content data and metadata that includes content data attribute information, and to a client device that acquires the content data based on the metadata.
Abstract: There is provided a data management server that is connectable to a plurality of content servers that store content data and metadata that includes content data attribute information and to a client device that acquires the content data based on the metadata. The data management server includes a data collection portion, a data processing portion, and a transmission portion. The data collection portion collects the metadata from each of the plurality of the content servers. The data processing portion hierarchically structures the metadata that the data collection portion collected, based on the attribute information that is included in the metadata. The transmission portion, in response to a request from the client device, transmits to the client device the metadata that was hierarchically structured by the data processing portion.

46 citations

Proceedings Article
07 Jun 2005
TL;DR: Describes the form and function of common metadata fields, identifies appropriate performance measures for these fields, and describes and measures the performance of the automatic metadata assignment tools in the iVia virtual library software.
Abstract: This paper describes the development of practical automatic metadata assignment tools to support automatic record creation for virtual libraries, metadata repositories and digital libraries, with particular reference to library-standard metadata. The development process is incremental in nature, and depends upon an automatic metadata evaluation tool to objectively measure its progress. The evaluation tool is based on and informed by the metadata created and maintained by librarian experts at the INFOMINE Project, and uses different metrics to evaluate different metadata fields. In this paper, we describe the form and function of common metadata fields, and identify appropriate performance measures for these fields. The automatic metadata assignment tools in the iVia virtual library software are described, and their performance is measured. Finally, we discuss the limitations of automatic metadata evaluation, and cases where we choose to ignore its evidence in favor of human judgment.

46 citations


Network Information
Related Topics (5)
Information system: 107.5K papers, 1.8M citations, 85% related
User interface: 85.4K papers, 1.7M citations, 81% related
Software: 130.5K papers, 2M citations, 80% related
Mobile computing: 51.3K papers, 1M citations, 80% related
Support vector machine: 73.6K papers, 1.7M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    32
2022    79
2021    13
2020    11
2019    21
2018    24