
Showing papers on "Metadata repository" published in 2009


Patent
08 Jun 2009
TL;DR: Improved techniques for enhancing, associating, and linking various sources of metadata for music files, to allow integration of commercially generated metadata with user-entered metadata, and to ensure that metadata provided to the user is of the highest quality and accuracy available, even when the metadata comes from disparate sources having different levels of credibility as discussed by the authors.
Abstract: Improved techniques for enhancing, associating, and linking various sources of metadata for music files, to allow integration of commercially generated metadata with user-entered metadata, and to ensure that metadata provided to the user is of the highest quality and accuracy available, even when the metadata comes from disparate sources having different levels of credibility. The invention further provides improved techniques for identifying approximate matches when querying metadata databases, and also provides improved techniques for accepting user submissions of metadata, for categorizing user submissions according to relative credibility, and for integrating user submissions with existing metadata.

221 citations


Patent
13 Nov 2009
TL;DR: In this article, a computer-implemented method includes parsing a user query to determine whether the query assigns a field to its first term, the parsing resulting in a parsed query that conforms to a predefined format, and performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content.
Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
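
The parsing step described in this abstract can be illustrated with a minimal sketch. The `field:term` syntax, the function name `parse_query`, and the list-of-dictionaries output format are assumptions made for illustration; the abstract does not specify the query syntax or the predefined format.

```python
from typing import Dict, List, Optional


def parse_query(query: str) -> List[Dict[str, Optional[str]]]:
    """Split a query into terms and record any field assigned to each term.

    Assumed (hypothetical) syntax: "field:term" assigns a field, while a
    bare "term" has no field and would be matched against all metadata.
    """
    parsed = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            # The user explicitly assigned a field to this term.
            parsed.append({"field": field.lower(), "term": term})
        else:
            # No field assigned; keep the token as an unscoped term.
            parsed.append({"field": None, "term": token})
    return parsed
```

A downstream search could then match scoped terms only against triplets whose predicate corresponds to the named field, and unscoped terms against every triplet.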

165 citations


Journal Article
Jung-ran Park
TL;DR: Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
Abstract: This study presents the current state of research and practice on metadata quality through focus on the functional perspective on metadata quality, measurement, and evaluation criteria coupled with mechanisms for improving metadata quality. Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration. The functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness, and consistency are the most common criteria used in measuring metadata quality in the literature. Guidelines embedded within a Web form or template perform a valuable function in improving the quality of the metadata. Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.

147 citations


Proceedings Article
24 Feb 2009
TL;DR: Spyglass achieves fast, scalable performance through the use of several novel metadata search techniques that exploit metadata search properties, including Snapshot-based metadata collection, which is up to 10× faster than existing approaches.
Abstract: The scale of today's storage systems has made it increasingly difficult to find and manage files. To address this, we have developed Spyglass, a file metadata search system that is specially designed for large-scale storage systems. Using an optimized design, guided by an analysis of real-world metadata traces and a user study, Spyglass allows fast, complex searches over file metadata to help users and administrators better understand and manage their files. Spyglass achieves fast, scalable performance through the use of several novel metadata search techniques that exploit metadata search properties. Flexible index control is provided by an index partitioning mechanism that leverages namespace locality. Signature files are used to significantly reduce a query's search space, improving performance and scalability. Snapshot-based metadata collection allows incremental crawling of only modified files. A novel index versioning mechanism provides both fast index updates and "back-in-time" search of metadata. An evaluation of our Spyglass prototype using our real-world, large-scale metadata traces shows search performance that is 1-4 orders of magnitude faster than existing solutions. The Spyglass index can quickly be updated and typically requires less than 0.1% of disk space. Additionally, metadata collection is up to 10× faster than existing approaches.
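
The combination of index partitioning and signature files can be sketched roughly as follows. This is not the Spyglass implementation: the `Partition` class, the 256-bit signature width, and the SHA-1 based hashing are all assumptions chosen to show the general idea that a compact per-partition signature lets a query skip partitions that cannot contain a match.

```python
import hashlib
from typing import Dict, List

SIG_BITS = 256  # assumed signature width, chosen arbitrarily for the sketch


def _bit(value: str) -> int:
    """Map an attribute=value string to one bit position in the signature."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SIG_BITS


class Partition:
    """One index partition, e.g. the files under a single subtree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: List[Dict[str, str]] = []
        self.signature = 0  # Python int used as a bit array

    def add(self, metadata: Dict[str, str]) -> None:
        self.files.append(metadata)
        for key, value in metadata.items():
            self.signature |= 1 << _bit(f"{key}={value}")

    def may_contain(self, key: str, value: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return bool((self.signature >> _bit(f"{key}={value}")) & 1)


def search(partitions: List[Partition], key: str, value: str) -> List[Dict[str, str]]:
    hits = []
    for partition in partitions:
        if not partition.may_contain(key, value):
            continue  # signature rules this partition out; skip the scan
        hits.extend(f for f in partition.files if f.get(key) == value)
    return hits
```

Because a signature can only produce false positives, never false negatives, skipping a partition never loses results; it only saves the cost of scanning it.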

146 citations


Patent
15 Jan 2009
TL;DR: In this article, a server-side system and method for editing metadata in a file is presented, the method including steps of: receiving from a user a request for editing the metadata in the file; presenting a window to the user for display on the user's screen, wherein the window displays properties of the metadata; receiving from the user an edit to the metadata properties; and updating the metadata properties with the edit received from the user, producing updated metadata.
Abstract: A server-side system and method for editing metadata in a file, the method including steps of: receiving from a user a request for editing the metadata in the file; presenting a window to the user for display on the user's screen, wherein the window displays properties of the metadata; receiving from the user an edit to the metadata properties; and updating the metadata properties with the edit received from the user, producing updated metadata.

134 citations


Journal Article
TL;DR: The recently redesigned SPECCHIO system stores spectral and metadata in a relational database based on a non-redundant data model and offers efficient data import, automated metadata generation, editing and retrieval via a Java application.

133 citations


Proceedings Article
16 Oct 2009
TL;DR: This work proposes a mechanism to store small files in HDFS efficiently and improve the space utilization for metadata, and provides for new job functionality to allow for in-job archival of directories and files so that running MapReduce programs may complete without being killed by the JobTracker due to quota policies.
Abstract: Scientific applications are adapting HDFS/MapReduce to perform large scale data analytics. One of the major challenges is that an overabundance of small files is common in these applications, and HDFS manages all its files through a single server, the Namenode. It is anticipated that small files can significantly impact the performance of Namenode. In this work we propose a mechanism to store small files in HDFS efficiently and improve the space utilization for metadata. Our scheme is based on the assumption that each client is assigned a quota in the file system, for both the space and number of files. In our approach, we utilize the compression method 'harballing', provided by Hadoop, to better utilize the HDFS. We provide for new job functionality to allow for in-job archival of directories and files so that running MapReduce programs may complete without being killed by the JobTracker due to quota policies. This approach leads to better functionality of metadata operations and more efficient usage of the HDFS. Our analysis results show that we can reduce the metadata footprint in main memory by a factor of 42.

128 citations


Patent
01 Oct 2009
TL;DR: In this paper, the authors present systems, methods and apparatuses for managing objects (files and directories) in network file systems according to policies. Each policy may have one or more rules, each of which ties a condition to an action.
Abstract: Disclosed are systems, methods and apparatuses for managing objects (files and directories) in network file systems according to policies. Each policy may have one or more rules, each of which ties a condition to an action. Each condition can be expressed in terms of metadata harvested across file systems and stored in a metadata repository. The actions are user-programmable. Users can apply and/or enforce a policy by manipulating the metadata stored in the metadata repository. For example, suppose a policy prohibits storing MP3 files in corporate storage, a user can specify a rule that ties the condition “no MP3 files in volumes A-Z” to an action “delete MP3 files from volumes A-Z.” A file management application may apply a filter to the metadata repository to produce metadata records having values that meet the specified condition and take the corresponding action on managed objects associated with those metadata records.
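
The rule structure described above, a condition over harvested metadata tied to a user-programmable action, can be sketched as follows. The `Rule` dataclass, the `apply_policy` function, and the record fields (`path`, `volume`) are hypothetical names introduced for illustration, not the patent's design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]  # one metadata record from the repository


@dataclass
class Rule:
    condition: Callable[[Record], bool]  # evaluated against metadata only
    action: Callable[[Record], None]     # user-programmable side effect


def apply_policy(rules: List[Rule], repository: List[Record]) -> List[Record]:
    """Filter the metadata repository by each rule's condition and run the
    corresponding action on every matching record."""
    matched = []
    for rule in rules:
        for record in repository:
            if rule.condition(record):
                rule.action(record)
                matched.append(record)
    return matched
```

For the MP3 example in the abstract, the condition would test the volume and the file extension recorded in the metadata, and the action would delete the managed object that the record describes. In a dry run, the action could simply collect the matching paths.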

113 citations


Journal Article
TL;DR: A set of scalable quality metrics for metadata based on the Bruce & Hillman framework for metadata quality control is presented, and it is found that several metrics, especially Text Information Content, correlate well with human evaluation and that the average of all the metrics is roughly as effective as people at flagging low-quality instances.
Abstract: Owing to the recent developments in automatic metadata generation and interoperability between digital repositories, the production of metadata is now vastly surpassing manual quality control capabilities. Abandoning quality control altogether is problematic, because low-quality metadata compromise the effectiveness of services that repositories provide to their users. To address this problem, we present a set of scalable quality metrics for metadata based on the Bruce & Hillman framework for metadata quality control. We perform three experiments to evaluate our metrics: (1) the degree of correlation between the metrics and manual quality reviews, (2) the discriminatory power between metadata sets and (3) the usefulness of the metrics as low-quality filters. Through statistical analysis, we found that several metrics, especially Text Information Content, correlate well with human evaluation and that the average of all the metrics is roughly as effective as people at flagging low-quality instances. The implications of this finding are discussed. Finally, we propose possible applications of the metrics to improve tools for the administration of digital repositories.

111 citations


Patent
04 May 2009
TL;DR: In this paper, a client stores client metadata entries corresponding to a plurality of data objects and requests a user to select from among a predefined set of conflict resolutions to resolve the conflict, and the client performs an action in accordance with the conflict resolution selected by the user.
Abstract: A client stores client metadata entries corresponding to a plurality of data objects. During a first phase of a synchronization process, the client sends one or more client metadata entries to a server. Each client metadata entry sent corresponds to a data object for which at least one metadata parameter has changed since a prior execution of the synchronization process. During a second phase of the synchronization process, the client receives from the server one or more server metadata entries, each having at least one parameter that has changed since a prior execution of the synchronization process. The client identifies any received server metadata entry that conflicts with a corresponding client metadata entry, requests a user to select from among a predefined set of conflict resolutions to resolve the conflict, and then performs an action in accordance with the conflict resolution selected by the user.
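
The conflict-identification step can be sketched as below. The abstract does not define what counts as a conflict or which resolutions are offered, so this sketch assumes that a conflict is an object changed on both sides with differing entries, and that the predefined resolutions are `keep_client` and `keep_server`. All names here are hypothetical.

```python
from typing import Dict, List

Entry = Dict[str, str]  # metadata parameters for one data object

# Assumed resolution set; the abstract only says it is predefined.
RESOLUTIONS = ("keep_client", "keep_server")


def find_conflicts(client_changes: Dict[str, Entry],
                   server_changes: Dict[str, Entry]) -> List[str]:
    """Return ids of objects changed on both sides since the last sync
    whose client and server entries differ."""
    both = client_changes.keys() & server_changes.keys()
    return sorted(i for i in both if client_changes[i] != server_changes[i])


def resolve(client_entry: Entry, server_entry: Entry, choice: str) -> Entry:
    """Apply the resolution the user selected for one conflicting object."""
    if choice == "keep_client":
        return client_entry
    if choice == "keep_server":
        return server_entry
    raise ValueError(f"unknown resolution: {choice!r}")
```

Here `client_changes` holds the entries sent in the first phase and `server_changes` the entries received in the second phase, both keyed by object id.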

110 citations


Patent
Wei Huang, Yizheng Zhou, Bin Yu, Wenting Tang, Christian F. Beedgen
04 Sep 2009
TL;DR: In this article, a logging system includes an event receiver and a storage manager, where the receiver receives log data, processes it, and outputs a column-based data "chunk" whose metadata structure acts as a search index when querying event data.
Abstract: A logging system includes an event receiver and a storage manager. The receiver receives log data, processes it, and outputs a column-based data “chunk.” The manager receives and stores chunks. The receiver includes buffers that store events and a metadata structure that stores metadata about the contents of the buffers. Each buffer is associated with a particular event field and includes values from that field from one or more events. The metadata includes, for each “field of interest,” a minimum value and a maximum value that reflect the range of values of that field over all of the events in the buffers. A chunk is generated for each buffer and includes the metadata structure and a compressed version of the buffer contents. The metadata structure acts as a search index when querying event data. The logging system can be used in conjunction with a security information/event management (SIEM) system.
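
The way per-field minimum and maximum values can act as a search index is sketched below. The `Chunk` layout, the use of `zlib` and JSON for the compressed buffer, and the function names are assumptions for illustration only; the abstract does not specify an encoding.

```python
import json
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    field: str       # the event field this buffer holds
    min_value: int   # metadata: smallest value in the buffer
    max_value: int   # metadata: largest value in the buffer
    payload: bytes   # compressed buffer contents


def build_chunk(field: str, events: List[Dict[str, int]]) -> Chunk:
    values = [event[field] for event in events]
    payload = zlib.compress(json.dumps(values).encode("utf-8"))
    return Chunk(field, min(values), max(values), payload)


def find_value(chunks: List[Chunk], field: str, target: int) -> Tuple[int, int]:
    """Return (number of matches, number of chunks decompressed)."""
    matches = 0
    scanned = 0
    for chunk in chunks:
        in_range = chunk.min_value <= target <= chunk.max_value
        if chunk.field != field or not in_range:
            continue  # metadata rules the chunk out without decompressing it
        scanned += 1
        values = json.loads(zlib.decompress(chunk.payload).decode("utf-8"))
        matches += values.count(target)
    return matches, scanned
```

The second element of the result shows the benefit: chunks whose range excludes the target are never decompressed.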

Patent
02 Nov 2009
TL;DR: In this paper, a digital directory comprising listings is accessed and metadata information associated with individual listings describing the individual listings is modified to generate transformed metadata information for aiding in an automated user input recognition process.
Abstract: Methods and systems of performing user input recognition are disclosed. A digital directory comprising listings is accessed. Metadata information is associated with individual listings describing the individual listings. The metadata information is modified to generate transformed metadata information. Therefore, the transformed metadata information is generated as a function of context information relating to a typical user interaction with the listings. Information is generated for aiding in an automated user input recognition process based on the transformed metadata information.

01 Jan 2009
TL;DR: This paper focuses on the task of item recommendation for social bookmarking websites, i.e. predicting which unseen bookmarks a user might like based on his or her profile, and examines how to incorporate the tags and other metadata into a nearest-neighbor collaborative filtering (CF) algorithm.
Abstract: Social bookmarking websites allow users to store, organize, and search bookmarks of web pages. Users of these services can annotate their bookmarks by using informal tags and other metadata, such as titles, descriptions, etc. In this paper, we focus on the task of item recommendation for social bookmarking websites, i.e. predicting which unseen bookmarks a user might like based on his or her profile. We examine how we can incorporate the tags and other metadata into a nearest-neighbor collaborative filtering (CF) algorithm, by replacing the traditional usage-based similarity metrics by tag overlap, and by fusing tag-based similarity with usage-based similarity. In addition, we perform experiments with content-based filtering by using the metadata content to recommend interesting items. We generate recommendations directly based on Kullback-Leibler divergence of the metadata language models, and we explore the use of this metadata in calculating user and item similarities. We perform our experiments on three data sets from two different domains: Delicious, CiteULike and BibSonomy.
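
A tag-overlap similarity and its fusion with a usage-based similarity can be sketched as follows. The abstract does not say which overlap measure or fusion scheme is used, so the Jaccard coefficient and the weighted linear combination (with the hypothetical weight `lam`) are assumptions made for this example.

```python
from typing import Set


def tag_overlap(tags_a: Set[str], tags_b: Set[str]) -> float:
    """Jaccard overlap between two tag sets (an assumed overlap measure)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def fused_similarity(tag_sim: float, usage_sim: float, lam: float = 0.5) -> float:
    """Weighted linear fusion of tag-based and usage-based similarity.

    lam is a hypothetical mixing weight in [0, 1]: lam=1 uses tags only,
    lam=0 uses usage only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * tag_sim + (1.0 - lam) * usage_sim
```

Either score could then replace the usual similarity when selecting nearest neighbors in a collaborative filtering algorithm.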

Patent
18 Sep 2009
TL;DR: In this article, a storage system analyzes the raw data based on join metadata, removing a certain amount of data that is guaranteed to be irrelevant to the join operation, then returns filtered data to the database server.
Abstract: Processing resources at a storage system for a database server are utilized to perform aspects of a join operation that would conventionally be performed by the database server. When requesting a range of data units from a storage system, the database server includes join metadata describing aspects of the join operation for which the data is being requested. The join metadata may be, for instance, a bloom filter. The storage system reads the requested data from disk as normal. However, prior to sending the requested data back to the database server, the storage system analyzes the raw data based on the join metadata, removing a certain amount of data that is guaranteed to be irrelevant to the join operation. The storage system then returns filtered data to the database server. The database system thereby avoids the unnecessary transfer of certain data between the storage system and the database server.
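
The Bloom filter case mentioned in the abstract can be sketched as follows. The `BloomFilter` class, its sizing parameters, and the `storage_side_filter` function are hypothetical; the point is only that the filter, built by the database server from one side's join keys, lets the storage system discard rows that cannot possibly join.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits      # assumed size, not tuned
        self.num_hashes = num_hashes
        self.bits = 0                 # Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False is definitive; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def storage_side_filter(rows: List[Dict[str, str]], join_column: str,
                        join_metadata: BloomFilter) -> List[Dict[str, str]]:
    """Drop rows whose join key is guaranteed to have no match."""
    return [row for row in rows if join_metadata.might_contain(row[join_column])]
```

Since a Bloom filter never reports a present key as absent, every row that actually joins survives the filter; the database server still performs the exact join on whatever is returned.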

Patent
Richard Offer
15 Jan 2009
TL;DR: In this paper, the authors describe an application specific runtime environment defined by an application environment specification to include a minimal or reduced set of software resources required for execution of the application, which are optionally stored in a resource repository that includes resources associated with a plurality of operating systems and executable applications.
Abstract: Systems and methods of executing and/or provisioning an application in an application specific runtime environment are disclosed. The application specific runtime environment is defined by an application environment specification to include a minimal or reduced set of software resources required for execution of the application. These software resources are optionally stored in a resource repository that includes resources associated with a plurality of operating systems and/or executable applications. Various embodiments of the invention include the development of hierarchical resource metadata configured to characterize the various files, packages and file families included in the resource repository. In some embodiments this metadata is used to select between files and different versions of files when provisioning an application specific runtime environment.

Journal Article
TL;DR: The Dryad repository's metadata best practice balancing of these two needs is presented, and the conclusion summarizes limitations and advantages of the two prongs underlying Dryad's metadata effort.
Abstract: Digital data repositories ought to support immediate operational needs and long-term project goals. This paper presents the Dryad repository's metadata best practice balancing of these two needs. The paper reviews background work exploring the meaning of science, characterizing data, and highlighting data curation metadata challenges. The Dryad repository is introduced, and the initiative's metadata best practice and underlying rationales are described. Dryad's metadata approach includes two prongs: one addressing the long-term goal to align with the Semantic Web via a metadata application profile; and another addressing the immediate need to make content available in DSpace via an extensible markup language (XML) schema. The conclusion summarizes limitations and advantages of the two prongs underlying Dryad's metadata effort.

Patent
18 Nov 2009
TL;DR: In this paper, the authors proposed a method and system for reducing the latencies of retrieving metadata by a user application, which aids a seamless browsing experience as the metadata is immediately available, being already prefetched, to the user upon his actual request.
Abstract: The present invention relates generally to interactive communication systems and, more particularly, to a method and system for reducing latencies of retrieving metadata by a user application. The method comprises providing a browsing interface for the user to browse through; and prefetching metadata likely to be soon accessed in advance of an actual user request for access to said metadata; wherein said prefetching of said metadata is performed dynamically based on prediction of at least one next available user action with respect to a current view of said browsing interface being directed towards fetching said metadata. The present invention aids a seamless browsing experience as the metadata is immediately available, being already prefetched, to the user upon his actual request.

Patent
Matthew William Barringer
11 Feb 2009
TL;DR: In this paper, a system and method for building virtual appliances using a repository metadata server and a dependency resolution service is provided, whereby remote clients may follow a simple and repeatable process for creating virtual appliances.
Abstract: A system and method for building virtual appliances using a repository metadata server and a dependency resolution service is provided. In particular, a hosted web service may provide a collaborative environment for managing origin repositories and software dependencies, whereby remote clients may follow a simple and repeatable process for creating virtual appliances. For example, the repository metadata server may cache and parse metadata associated with an origin repository, download software from the origin repository, and generate resolution data that can be used by the dependency resolution service. The dependency resolution service may then use the resolution data to resolve dependencies for a package selected for an appliance, wherein the dependencies may include packages that are required, recommended, suggested, banned, or otherwise a dependency for the selected package.
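
Resolving the required dependencies of a selected package can be sketched as a graph traversal over resolution data. The `requires` mapping and the function name are hypothetical, and this sketch handles only required packages; how recommended, suggested, or banned packages are treated is not described in enough detail in the abstract to model here.

```python
from typing import Dict, List


def resolve_dependencies(package: str, requires: Dict[str, List[str]]) -> List[str]:
    """Return every package transitively required by `package`, sorted.

    `requires` maps a package name to the packages it directly requires,
    standing in for resolution data parsed from repository metadata.
    """
    seen = {package}
    resolved = []
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in requires.get(current, []):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                resolved.append(dep)
                stack.append(dep)
    return sorted(resolved)
```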

Patent
Jian-Tao Sun, Xiaochuan Ni, Peng Xu, Gang Wang, Ke Tang, Zheng Chen
29 Sep 2009
TL;DR: In this article, a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a computer, cause the computer to implement an opinion search engine is defined.
Abstract: A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a computer, cause the computer to implement an opinion search engine. The instructions to implement an opinion search engine cause the computer to collect opinion data about one or more objects from the Internet, extract metadata about the opinion data from the opinion data, remove duplicate metadata from the metadata to generate a resulting metadata, categorize the resulting metadata for similar objects according to one or more taxonomies from one or more websites on the Internet and rank the similar objects based on the categorized metadata.

Patent
07 May 2009
TL;DR: In this article, systems and methods for leveraging proximity data in a web-based socially-enabled information networking environment are disclosed, including a method of semantic advertising via semantic profiles, which may be implemented on a system.
Abstract: Systems and methods for leveraging proximity data in a web-based socially-enabled information networking environment are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of semantic advertising via semantic profiles. One embodiment can include, receiving a model profile from an advertiser, enforcing a set of rules that govern accessibility of the web content, parsing the model profile to obtain a first set of model user metadata associated with the ideal set of user characteristics, comparing model user metadata of the first set of model user metadata with user metadata of a set of user metadata of a semantic user profile of a user, and generating a correlation index to indicate a degree of correlation between the model profile and the semantic user profile.

Patent
15 Apr 2009
TL;DR: In this paper, techniques for flash memory management, including storing metadata and/or error correcting information separately from payload data, are discussed, where metadata and error correction information are stored in a random access memory within a solid state drive.
Abstract: Disclosed are techniques for flash memory management, including storing metadata and/or error correcting information separately from payload data. In various embodiments, metadata and/or error correcting information are stored in a random access memory within a solid state drive.

Patent
03 Jul 2009
TL;DR: In this paper, the methods and systems for developing, specifying, and assigning descriptive information relating to the contents of a file (i.e., metadata) are described, and user interface controls on a computer screen implement a dynamically changing display which responds to user input by presenting new categories of choices.
Abstract: The present invention generally relates to the methods and systems for developing, specifying, and assigning descriptive information relating to the contents of a file (i.e., metadata). User interface controls on a computer screen implement a dynamically changing display which responds to user input by presenting new categories of choices. Additional controls allow optimization of the process of specifying and assigning descriptive metadata.

Journal Article
TL;DR: In this article, the authors proposed an automatic metadata generation system that leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources.
Abstract: In spite of its tremendous value, metadata is generally sparse and incomplete, thereby hampering the effectiveness of digital information services. Many of the existing mechanisms for the automated creation of metadata rely primarily on content analysis which can be costly and inefficient. The automatic metadata generation system proposed in this article leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources. Because of its independence from content analysis, it can be applied to a wide variety of resource media types and is shown to be computationally inexpensive. The proposed method operates through two distinct phases. Occurrence and cooccurrence algorithms first generate an associative network of repository resources leveraging existing repository metadata. Second, using the associative network as a substrate, metadata associated with metadata-rich resources is propagated to metadata-poor resources by means of a discrete-form spreading activation algorithm. This article discusses the general framework for building associative networks, an algorithm for disseminating metadata through such networks, and the results of an experiment and validation of the proposed method using a standard bibliographic dataset.
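
The second phase, propagating metadata over the associative network by spreading activation, can be sketched as follows. This is a generic discrete spreading-activation routine, not the article's algorithm: the decay factor, the hop limit, and the rule of splitting energy in proportion to edge weight are assumptions.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def spread(graph: Graph, source: str, decay: float = 0.5,
           max_hops: int = 2) -> Dict[str, float]:
    """Propagate one unit of energy from a metadata-rich source node.

    At each hop a node passes on its energy, scaled by `decay` and split
    among its neighbors in proportion to edge weight. The returned mapping
    gives the total energy each other node received, which could be used
    to decide whether the source's metadata value is assigned to it.
    """
    received: Dict[str, float] = {}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier: Dict[str, float] = {}
        for node, energy in frontier.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                continue  # dead end: nothing to pass on
            for neighbor, weight in neighbors.items():
                share = energy * decay * weight / total
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + share
                if neighbor != source:
                    received[neighbor] = received.get(neighbor, 0.0) + share
        frontier = next_frontier
    return received
```

With a graph in which `a` links to `b` and `c` with equal weight and `b` links to `d`, one unit of energy starting at `a` reaches `b` and `c` with 0.25 each and `d` with 0.125 after two hops.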

Patent
James E. Allard
23 Jan 2009
TL;DR: In this article, the content and metadata associated with the content may be provided to a number of users by displaying the content on a display device while the metadata may be transmitted to a remote device corresponding to a receiving user.
Abstract: Content and metadata associated with the content may be provided to a number of users. The content may be displayed on a display device while the metadata may be transmitted to a remote device corresponding to a receiving user. The user may further request desired information or metadata pertaining to the content and the requested information or metadata may be transmitted to the user's remote device. Different users may request different information on the same or different objects being displayed or presented on a display device. Each requesting user may receive requested information on the same or different objects via corresponding remote devices.

Proceedings Article
26 Jul 2009
TL;DR: A package that is designed to extract basic metadata from PDF documents is described, based on a suitable combination of several techniques that include PDF parsing, low level document image processing, and layout analysis.
Abstract: In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract basic metadata from these documents. The package is used in combination with a digital library software suite to easily build personal digital libraries. The proposed software is based on a suitable combination of several techniques that include PDF parsing, low level document image processing, and layout analysis. In addition, we use the information gathered from a widely known citation database (DBLP) to assist the tool in the difficult task of author identification. The system is tested on some paper collections selected from recent conference proceedings.

Patent
02 Mar 2009
TL;DR: In this paper, a system to synchronize metadata for a plurality of applications is presented, where a rules engine is programmed to apply at least a first set of the content administration rules to a metadata record received from a first application of the plurality of applications to control updating corresponding metadata stored in a master database.
Abstract: A system to synchronize metadata for a plurality of applications. The system includes content administration rules programmed to define policies for updating metadata in the master database and policies for propagating updates in the metadata to the plurality of applications. The metadata describes at least one asset represented as data residing in at least one of the plurality of applications. A rules engine is programmed to apply at least a first set of the content administration rules to a metadata record received from a first application of the plurality of applications to control updating corresponding metadata stored in a master database. Changes in the corresponding metadata made to the master database can be propagated to at least one second application of the plurality of applications according to a second set of the content administration rules predefined for each of the at least one second application.

Patent
16 Jul 2009
TL;DR: In this paper, a metadata server includes a directory hierarchy storage unit, a metadata storage unit and a search unit, which stores all directory hierarchies which are stored in the metadata server cluster.
Abstract: Provided are a metadata server cluster and a metadata management method thereof, which distribute metadata for a file to a cluster including a plurality of metadata servers to replicate the metadata. The metadata server includes a directory hierarchy storage unit, a metadata storage unit, and a search unit. The directory hierarchy storage unit stores all directory hierarchies which are stored in the metadata server cluster. The metadata storage unit stores metadata for a data file. The search unit searches the directory hierarchies and the metadata.

Journal Article
TL;DR: A new methodology for knowledge discovery in geographic portals is presented based on the Semantic Web, which exploits the Resource Description Framework (RDF) in order to describe the geoportal's information with ontology-based metadata.

Patent
11 Dec 2009
TL;DR: In this paper, an information search method and an information provision method based on the user's intentions are provided, where an editing device matched to the searcher's intentions ascertained using the results of analysis of searched words is provided.
Abstract: Provided are an information search method and an information provision method based on the user's intentions. The information search method comprises: providing an editing device matched to the searcher's intentions ascertained using the results of analysis of searched words; and searching contents having metadata relating to metadata input through the editing device. In this way, the searcher's intentions can be ascertained from the information input by the searcher, detailed metadata input can be derived based on the ascertained intentions, and a search can be carried out using the input metadata.

Patent
02 Nov 2009
TL;DR: In this paper, a metabase formed from metadata can be used for various data management operations, such as enhanced data management, enhanced data identification, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data.
Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.