scispace - formally typeset

Showing papers on "Metadata repository published in 2005"


Proceedings Article
Phokion G. Kolaitis
13 Jun 2005
TL;DR: The main aim in this paper is to present an overview of recent advances in data exchange and metadata management, where the schema mappings are between relational schemas.
Abstract: Schema mappings are high-level specifications that describe the relationship between database schemas. Schema mappings are prominent in several different areas of database management, including database design, information integration, data exchange, metadata management, and peer-to-peer data management systems. Our main aim in this paper is to present an overview of recent advances in data exchange and metadata management, where the schema mappings are between relational schemas. In addition, we highlight some research issues and directions for future work.

310 citations


Patent
11 Mar 2005
TL;DR: In this paper, a system and method for generating a list is described, which includes a seed item input subsystem, an item identifying subsystem, a descriptive metadata similarity determining subsystem and a list generating subsystem that builds a list based, at least in part, on similarity processing performed on seed item descriptive metadata and user item descriptive metadata.
Abstract: A system and method for generating a list is provided. The system includes a seed item input subsystem, an item identifying subsystem, a descriptive metadata similarity determining subsystem and a list generating subsystem that builds a list based, at least in part, on similarity processing performed on seed item descriptive metadata and user item descriptive metadata and user selected thresholds applied to such similarity processing. The method includes inexact matching between identifying metadata associated with new user items and identifying metadata stored in a reference metadata database. The method further includes subjecting candidate user items to similarity processing, where the degree to which the candidate user items are similar to the seed item is determined, and placing user items in a list of items based on user selected preferences for (dis)similarity between items in the list and the seed item.

280 citations
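The threshold-based list building described in this abstract can be illustrated with a minimal Python sketch. It assumes each item's descriptive metadata is a set of tags and uses Jaccard similarity as a stand-in measure; the patent does not name a similarity measure, and `jaccard` and `generate_list` are hypothetical names.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two tag sets (an assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def generate_list(seed: set[str], items: dict[str, set[str]],
                  threshold: float, prefer_similar: bool = True) -> list[str]:
    """Hypothetical list builder: keep items whose similarity to the seed
    passes a user-selected threshold, ordered by relevance."""
    scored = [(name, jaccard(seed, tags)) for name, tags in items.items()]
    if prefer_similar:
        # Keep items at least `threshold` similar, most similar first.
        kept = sorted((s for s in scored if s[1] >= threshold),
                      key=lambda s: (-s[1], s[0]))
    else:
        # Keep items at most `threshold` similar, least similar first.
        kept = sorted((s for s in scored if s[1] <= threshold),
                      key=lambda s: (s[1], s[0]))
    return [name for name, _ in kept]


seed = {"jazz", "1950s", "trumpet"}
library = {"A": {"jazz", "1950s", "trumpet"}, "B": {"jazz", "piano"}, "C": {"rock"}}
print(generate_list(seed, library, 0.2))                        # ['A', 'B']
print(generate_list(seed, library, 0.2, prefer_similar=False))  # ['C']
```

The `prefer_similar` flag is one simple way to model the user's preference for similarity or dissimilarity to the seed item.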


Patent
18 Jan 2005
TL;DR: In this paper, the authors describe a system for providing rich multimedia metadata usable to generate, e.g., sophisticated entertainment user interfaces in the home, using a server-based software application that feeds multiple, diverse clients.
Abstract: Exemplary embodiments of the present invention provide methods and systems for supplying rich multimedia metadata usable to generate, e.g., sophisticated entertainment user interfaces in the home. These methods and systems can be implemented as a server-based software application that feeds multiple, diverse clients. The server functionality could be distributed, even co-located physically with one or more clients, or centralized. The server aggregates, filters, validates, augments and links metadata from disparate sources. The server transforms the metadata into a more manageable and extensible internal format. The server communicates with client devices using a schema-independent protocol, providing metadata in the appropriate format that suits the clients' needs.

172 citations


Patent
27 Sep 2005
TL;DR: In this paper, the authors present an organization system for organizing items, consisting of a data structure (10) associating at least one semantic metadata (12) with an item (11) to define a directional relationship between a concept and the item.
Abstract: An organisation system (5) for organizing items (11), the system (5) comprising: a data structure (10) associating at least one semantic metadata (12) with an item (11) to define a directional relationship between a concept and the item (11); and a user interface (20) to express the at least one semantic metadata (12) in at least one natural language using a description or at least one keyword corresponding to the concept in the at least one natural language; wherein the at least one semantic metadata (12) corresponds to the concept that is a characteristic of the item (11); and the at least one semantic metadata (12) and the item (11) are referenced by unique machine-readable identifiers.

169 citations


Proceedings Article
10 May 2005
TL;DR: The first step towards this framework for automatic metadata generation is the definition of an Application Programmer Interface (API), which is called the Simple Indexing Interface (SII), and the second step is the definition of a framework for implementation of the SII.
Abstract: In this paper, we focus on the development of a framework for automatic metadata generation. The first step towards this framework is the definition of an Application Programmer Interface (API), which we call the Simple Indexing Interface (SII). The second step is the definition of a framework for implementation of the SII. Both steps are presented in some detail in this paper. We also report on empirical evaluation of the metadata that the SII and supporting framework generated in a real-life context.

163 citations


Patent
28 Jul 2005
TL;DR: In this article, the authors propose an archival storage cluster of preferably symmetric nodes including a metadata management system that organizes and provides access to given metadata, preferably in the form of metadata objects.
Abstract: An archival storage cluster of preferably symmetric nodes includes a metadata management system that organizes and provides access to given metadata, preferably in the form of metadata objects. Each metadata object may have a unique name, and metadata objects are organized into regions. Preferably, a region is selected by hashing one or more object attributes (e.g., the object's name) and extracting a given number of bits of the resulting hash value. The number of bits may be controlled by a configuration parameter. Each region is stored redundantly. A region comprises a set of region copies. In particular, there is one authoritative copy of the region, and zero or more backup copies. The number of backup copies may be controlled by a configuration parameter. Region copies are distributed across the nodes of the cluster so as to balance the number of authoritative region copies per node, as well as the number of total region copies per node. Backup region copies are maintained synchronized to their associated authoritative region copy.

158 citations
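The region-selection and placement scheme in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions: SHA-1 as the hash, the object name as the only hashed attribute, and a simple rotating placement to spread authoritative and total copies; none of these choices, nor the names `region_for` and `place_regions`, come from the patent.

```python
import hashlib


def region_for(object_name: str, region_bits: int) -> int:
    """Select a region by hashing the object's name and keeping the top
    `region_bits` bits of the digest (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(object_name.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - region_bits)


def place_regions(region_bits: int, nodes: list[str],
                  backups: int) -> dict[int, list[str]]:
    """Give each of the 2**region_bits regions one authoritative copy (first
    entry) and `backups` backup copies on distinct nodes, rotating the start
    node so copies are spread roughly evenly across the cluster."""
    if backups + 1 > len(nodes):
        raise ValueError("need at least backups + 1 nodes")
    placement = {}
    for region in range(2 ** region_bits):
        start = region % len(nodes)
        placement[region] = [nodes[(start + i) % len(nodes)]
                             for i in range(backups + 1)]
    return placement


nodes = ["n1", "n2", "n3", "n4"]
layout = place_regions(region_bits=3, nodes=nodes, backups=1)
holders = layout[region_for("invoice-2005-001", 3)]  # nodes storing this object's region
```

Here `region_bits` and `backups` play the role of the two configuration parameters the abstract mentions. Keeping backups synchronized with the authoritative copy is outside this sketch.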


07 Mar 2005
TL;DR: This document specifies an abstract model for DCMI metadata [DCMI], to provide a reference model against which particular DC encoding guidelines can be compared and facilitates the development of better mappings and translations between different syntaxes.
Abstract: This document specifies an abstract model for DCMI metadata [DCMI]. The primary purpose of this document is to provide a reference model against which particular DC encoding guidelines can be compared. To function well, a reference model needs to be independent of any particular encoding syntax. Such a reference model allows us to gain a better understanding of the kinds of descriptions that we are trying to encode and facilitates the development of better mappings and translations between different syntaxes. This document is primarily aimed at the developers of software applications that support Dublin Core metadata, people involved in developing new syntax encoding guidelines for Dublin Core metadata and those people developing metadata application profiles based on the Dublin Core.

128 citations


Patent
19 Aug 2005
TL;DR: In this article, a generic metadata container is presented that allows object metadata to be described in an extensible manner using protocol-neutral and platform-independent methodologies, and a metadata scope refers to a dynamic universe of targets to which the included metadata statements correspond.
Abstract: Methods, systems, and data structures for communicating object metadata are provided. A generic metadata container is presented that allows object metadata to be described in an extensible manner using protocol-neutral and platform-independent methodologies. A metadata scope refers to a dynamic universe of targets to which the included metadata statements correspond. Metadata properties provide a mechanism to describe the metadata itself, and metadata security can be used to ensure authentic metadata is sent and received. Mechanisms are also provided to allow refinement and replacement of metadata statements. The generic metadata container can be adapted to dynamically define access control rights to a range of objects by a range of users, including granted and denied access rights.

116 citations


Journal Article
TL;DR: The MODAL (Metadata Objectives and principles, Domains, and Architectural Layout) framework is introduced as an approach for studying metadata schemes of different types.
Abstract: Although the development and implementation of metadata schemes over the last decade have been extensive, research examining the sum of these activities is limited. This limitation is likely due to the massive scope of the topic. A framework is needed to study the full extent of, and functionalities supported by, metadata schemes. Metadata schemes developed for information resources are analyzed. To begin, the author presents a review of the definition of metadata, metadata functions, and several metadata typologies. Next, a conceptualization for metadata schemes is presented. The emphasis is on semantic container-like metadata schemes (data structures). The last part of this paper introduces the MODAL (Metadata Objectives and principles, Domains, and Architectural Layout) framework as an approach for studying metadata schemes. The paper concludes with a brief discussion on the value of frameworks for examining metadata schemes, including different types of metadata schemes.

107 citations


Patent
07 Feb 2005
TL;DR: In this paper, a system and method that facilitates providing contextual advertisements based on one or more identified terms extracted from a non-text object such as an image, video, and/or audio object is presented.
Abstract: The subject invention provides a unique system and method that facilitates providing contextual advertisements based on one or more identified terms extracted from a non-text object such as an image, video, and/or audio object. Terms can also be identified and extracted from metadata associated with or other data derived from text objects such as email messages and attached text documents. One or more recognition techniques can be employed to identify data found in the non-text object (including the metadata or any other data derived therefrom) and data found in the metadata associated with the text object. Once the identified terms are analyzed, an appropriate contextual advertisement can be presented to the user. If the content of the non-text or text object is deemed of a negative nature, no contextual advertisement is provided.

102 citations


Patent
28 Oct 2005
TL;DR: In this paper, a harvester is disclosed for harvesting metadata of managed objects (files and directories) across file systems that are generally not interoperable in an enterprise environment; the harvested metadata are stored in a metadata repository to facilitate the automated or semi-automated application of policies.
Abstract: A harvester is disclosed for harvesting metadata of managed objects (files and directories) across file systems which are generally not interoperable in an enterprise environment. Harvested metadata may include 1) file system attributes such as size, owner, recency; 2) content-specific attributes such as the presence or absence of various keywords (or combinations of keywords) within documents as well as concepts comprised of natural language entities; 3) synthetic attributes such as mathematical checksums or hashes of file contents; and 4) high-level semantic attributes that serve to classify and categorize files and documents. The classification itself can trigger an action in compliance with a policy rule. Harvested metadata are stored in a metadata repository to facilitate the automated or semi-automated application of policies.

Patent
13 Jun 2005
TL;DR: In this paper, the authors present methods and systems to improve network searching for watermarked content by using a plurality of distributed watermark-enabled web browsers, and a system is provided to allow customer input.
Abstract: The present invention provides methods and systems to improve network searching for watermarked content. In some implementations we employ keyword searching to narrow the universe of possible URL candidates. A resulting URL list is searched for digital watermarking. A system is provided to allow customer input. For example, a customer enters keywords or network locations. The keywords or network locations are provided to a watermark-enabled web browser which accesses locations associated with the keywords or network locations. Some implementations of the present invention employ a plurality of distributed watermark-enabled web browsers. Other aspects of the invention provide methods and systems to facilitate desktop searching and automated metadata gathering and generation. In one implementation a digital watermark is used to determine whether metadata associated with an image or audio file is current or fresh. The metadata is updated when it is out of date. Watermarks can also be used to link to or facilitate so-called online “blogs” (or online conversations).

Patent
07 Oct 2005
TL;DR: In this article, a method and system are described for responding locally to requests for metadata of remotely stored files without downloading the files from the remote location.
Abstract: A method and system for responding locally to requests for file metadata associated with files stored remotely includes a method of responding locally to requests for file metadata without downloading the file from a remote location. A directory structure representing an application program stored by a remote machine, and metadata associated with each file comprising the stored application program, are received from the remote machine. The directory and the metadata are stored. At least one request to access metadata associated with a specific file in the directory structure is received. The stored metadata is used to respond to the at least one request.
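The local-response idea can be sketched in Python as a store that keeps the received directory structure and per-file metadata and answers requests from them. This is a minimal sketch assuming metadata consists of size and modification time and paths use "/" separators; `FileMetadata` and `LocalMetadataStore` are hypothetical names, not the patent's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    size: int        # bytes
    modified: float  # POSIX timestamp


class LocalMetadataStore:
    """Hypothetical local store: keeps the directory structure and per-file
    metadata received from a remote machine and answers metadata requests
    without downloading any file contents."""

    def __init__(self, received: dict[str, FileMetadata]) -> None:
        # Paths use "/" separators, e.g. "app/bin/tool.exe".
        self._entries = dict(received)

    def get_metadata(self, path: str) -> FileMetadata:
        try:
            return self._entries[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def list_dir(self, directory: str) -> list[str]:
        # Immediate children of `directory`, derived from the stored paths.
        prefix = directory.rstrip("/") + "/"
        children = {p[len(prefix):].split("/", 1)[0]
                    for p in self._entries if p.startswith(prefix)}
        return sorted(children)


store = LocalMetadataStore({
    "app/bin/tool.exe": FileMetadata(size=48_128, modified=1_110_000_000.0),
    "app/readme.txt": FileMetadata(size=512, modified=1_110_000_100.0),
})
print(store.get_metadata("app/readme.txt").size)  # 512
print(store.list_dir("app"))                       # ['bin', 'readme.txt']
```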

Patent
14 Nov 2005
TL;DR: In this paper, a method and image data acquisition service arrangement are disclosed that facilitate standardizing the process of adding metadata to acquired image information downloaded from a connected image capture device to a computer system.
Abstract: A method and image data acquisition service arrangement are disclosed that facilitate standardizing the process of adding metadata to acquired image information downloaded from a connected image capture device to a computer system. An image acquisition service analyzes image information downloaded from the image capture device and renders new metadata values based upon applied analytical algorithms/filters. Thereafter, the image information and new metadata are rendered available to other processes that use the image information and new metadata.

Book Chapter
29 May 2005
TL;DR: This paper investigates how to extract and store activity-based context information explicitly as RDF metadata and how to use it, along with additional background information and ontologies, to enhance desktop search.
Abstract: With increasing storage capacities on current PCs, searching the World Wide Web has ironically become more efficient than searching one's own personal computer. The recently introduced desktop search engines are a first step towards coping with this problem, but not yet a satisfying solution. The reason is that desktop search is actually quite different from its web counterpart. Documents on the desktop are not linked to each other in a way comparable to the web, which means that result ranking is poor or even nonexistent, because algorithms like PageRank cannot be used for desktop search. On the other hand, desktop search could potentially profit from a lot of implicit and explicit semantic information available in emails, folder hierarchies, browser cache contexts and others. This paper investigates how to extract and store this activity-based context information explicitly as RDF metadata and how to use it, as well as additional background information and ontologies, to enhance desktop search.

01 Jan 2005
TL;DR: A number of statistical characterizations of metadata samples drawn from a large corpus harvested through the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) are presented, and these findings are interpreted in relation to general quality dimensions and metadata practices that occur at the local level.
Abstract: The federation of digital resources has become increasingly important in realizing the full potential of digital libraries. Federation is often achieved through the aggregation of descriptive metadata; therefore, the decisions resource developers make for the creation, maintenance, and quality assurance of their metadata can have significant impacts on aggregators and service providers. Metadata may be of high quality within a local database or web site, but when it is taken out of this context, information may be lost or its integrity may be compromised. Maintaining consistency and fitness for purpose is also complicated when metadata are combined in a federated environment. A fuller understanding of the criteria for high-quality, “shareable” metadata is crucial to the next step in the development of federated digital libraries. This study of metadata quality was conducted by the IMLS Digital Collections and Content (DCC) project team (http://imlsdcc.grainger.uiuc.edu/) using quantitative and qualitative analysis of metadata authoring practices of several projects funded through the Institute of Museum and Library Services (IMLS) National Leadership Grant (NLG) program. We present a number of statistical characterizations of metadata samples drawn from a large corpus harvested through the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) and interpret these findings in relation to general quality dimensions and metadata practices that occur at the local level. We discuss the impact of these kinds of quality on aggregation and suggest quality control and normalization processes that may improve search and discovery services at the aggregated level.

Patent
06 Jan 2005
TL;DR: In this article, a general technique using semantic analysis is provided that can be used for converting a specific automation test script (and its underlying test case), generated from generally available or proprietary test automation tools, into an abstract test case representation.
Abstract: A general technique using semantic analysis is provided that can be used for converting a specific automation test script (and its underlying test case), generated from generally available or proprietary test automation tools, into an abstract test case representation. The abstract test case representation is based on a test case representation model that includes application states (state information), external interaction sequences (control flow information) and input data. The abstract representation in essence provides a platform independent representation of test cases. An application object model provides the representational capabilities required for capturing structural and behavioral properties of the application under test. The abstract test case representation can be validated against and further enriched by specific object information from an application metadata repository. Finally, object information and input data can be separated from control flow information to provide automatic parameterization of the test script.

Patent
25 May 2005
TL;DR: In this paper, the authors present a data processing system and method for applications such as marketing campaign management, speech recognition and signal processing, where a plurality of processing nodes are adapted to use the plurality of selectable parameters to assemble a first plurality of data from the first and second data repositories and from input data.
Abstract: The various embodiments of the invention provide a data processing system and method, for applications such as marketing campaign management, speech recognition and signal processing. An exemplary system embodiment includes a first data repository adapted to store a plurality of entity and attribute data; a second data repository adapted to store a plurality of entity linkage data; a metadata data repository adapted to store a plurality of metadata modules, with a first metadata module having a plurality of selectable parameters, received through a control interface, and having a plurality of metadata linkages to a first subset of metadata modules; and a multidimensional data structure. The control interface may modify the plurality of selectable parameters in response to received control information. A plurality of processing nodes are adapted to use the plurality of selectable parameters to assemble a first plurality of data from the first and second data repositories and from input data, to reduce the first plurality of data to form a second plurality of data, and to aggregate and dimension the second plurality of data for storage in the multidimensional data structure.

Patent
22 Dec 2005
TL;DR: In this paper, the authors present a method that monitors how a user affects a first rendering of program content, records the affected segments as metadata, and uses that metadata during a second rendering to inhibit the rendering of those segments.
Abstract: Metadata regarding program content is created by monitoring a manner in which a first rendering of the program content is affected by a user whereupon the metadata will include data which defines one or more segments within the program content. The data within the metadata is then usable during a second rendering of the program content to inhibit, e.g., advance over during playing or omit during copying, the rendering of the one or more segments within the program content defined by the data within the metadata.
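The second-rendering step can be illustrated with a short Python sketch, assuming the metadata recorded during the first rendering is a list of (start, end) times, in seconds, of segments the user advanced over. The function name and this representation are hypothetical; the abstract does not specify either.

```python
def playable_ranges(duration: float,
                    skip_segments: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the (start, end) ranges to render, omitting the segments
    recorded in the metadata. Overlapping segments are merged implicitly."""
    ranges = []
    cursor = 0.0
    for start, end in sorted(skip_segments):
        # Clamp each recorded segment to the program's length.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            ranges.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        ranges.append((cursor, duration))
    return ranges


# Segments the user fast-forwarded through during the first rendering.
recorded = [(30.0, 40.0), (10.0, 20.0)]
print(playable_ranges(100.0, recorded))
# [(0.0, 10.0), (20.0, 30.0), (40.0, 100.0)]
```

The same ranges could drive either playback (advancing over segments) or copying (omitting them), the two uses the abstract names.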

Patent
Ali Diab, David Ku, Kevin Lee, Qi Lu, Nam Nguyen, Eckart Walther
15 Mar 2005
TL;DR: In an inverse search, the user submits a query that includes an identifier of a target content item in the corpus, and information (metadata) about that target content item is returned to the user.
Abstract: Inverse search systems and methods operate on identifiers of content items in a corpus such as the World Wide Web. In an inverse search, the user submits a query that includes an identifier of a target content item in the corpus, and information (metadata) about the target content item is returned to the user. Many types of metadata can be returned, including ratings or other metadata related to the target content item obtained from users, popularity data specific to the target content item, information about previously submitted forward search queries that led to the target content item being identified as a hit, and metadata extracted from the target content item.
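An inverse search can be sketched in Python as an index keyed by item identifier that accumulates user ratings, hit counts, and the forward queries that returned each item. This is a hypothetical in-memory illustration; the class and method names are assumptions, and metadata extracted from the item itself is left out.

```python
from collections import defaultdict


class InverseSearchIndex:
    """Hypothetical index answering inverse searches: given a content item's
    identifier (e.g. a URL), return metadata about that item."""

    def __init__(self) -> None:
        self._ratings = defaultdict(list)  # item id -> user ratings
        self._hits = defaultdict(int)      # item id -> times returned as a hit
        self._queries = defaultdict(set)   # item id -> forward queries that hit it

    def record_forward_search(self, query: str, result_ids: list[str]) -> None:
        for item_id in result_ids:
            self._hits[item_id] += 1
            self._queries[item_id].add(query)

    def add_rating(self, item_id: str, rating: float) -> None:
        self._ratings[item_id].append(rating)

    def inverse_search(self, item_id: str) -> dict:
        ratings = self._ratings.get(item_id, [])
        return {
            "id": item_id,
            "average_rating": sum(ratings) / len(ratings) if ratings else None,
            "popularity": self._hits.get(item_id, 0),
            "leading_queries": sorted(self._queries.get(item_id, set())),
        }


index = InverseSearchIndex()
index.record_forward_search("metadata repository", ["http://example.org/a"])
index.record_forward_search("schema mapping", ["http://example.org/a", "http://example.org/b"])
index.add_rating("http://example.org/a", 4)
index.add_rating("http://example.org/a", 5)
print(index.inverse_search("http://example.org/a"))
# {'id': 'http://example.org/a', 'average_rating': 4.5, 'popularity': 2,
#  'leading_queries': ['metadata repository', 'schema mapping']}
```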

Patent
28 Oct 2005
TL;DR: In this article, a method for providing selected content items to a user is presented, where the selection of content items is based on metadata pre-assigned to content items, typically authored content metadata, and on metadata generated and associated afterwards, called derived content metadata.
Abstract: A method is disclosed for providing selected content items to a user; the selection of content items is based on metadata pre-assigned to content items, typically authored content metadata, and on metadata generated and associated afterwards, called derived content metadata; additionally, the selection of content items can be based also on context metadata, particularly derived context metadata. Derived metadata are automatically generated on the basis of derivation rules corresponding to algorithms to be applied to, e.g., the content of content items, authored content metadata and context metadata. User profiles can be used for improving the selection quality; a method is also disclosed for building and maintaining user profiles based on machine learning techniques.

Patent
16 Mar 2005
TL;DR: In this paper, the authors present methods and systems for migrating a data integration facility such as an ETL job from a source data integration platform to a target data integration platform, where the target job is adapted to perform substantially the same functions as the source job.
Abstract: Methods and systems are provided for migrating a data integration facility, such as an ETL job, from a source data integration platform to a target data integration platform. Certain embodiments involve externalizing a metadata representation of a source data integration job; parsing the metadata representation; importing the parsed metadata into a plurality of object representations of the source data integration job; generating an intermediate representation of the source data integration platform based on the plurality of object representations; and translating the intermediate representation to generate a target data integration job; wherein the target data integration job is adapted to perform substantially the same functions as the source data integration job.

Patent
22 Apr 2005
TL;DR: In this article, the authors propose a method and system for application-to-application data exchange which provides data conversion from the format of a source application to a target application upon receipt of data by the target application.
Abstract: A method and system for application-to-application data exchange which provides data conversion from the format of a source application to the format of a target application upon receipt of data by the target application. To achieve compatibility among applications exchanging data, the preferred system uses a standard set of terms and process names for building metadata packets that inform both applications as to their respective data representation. A metadata packet includes a standard name and application specific data format, as well as an optional associated process name. Source metadata provided in connection with source application-specific data enables the conversion of the source format to the format compatible with the target. This method eliminates data conversion at the source application.
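The receive-side conversion can be sketched in Python. The packet layout below, a mapping from an agreed standard term to an application's field name and an optional date format, is an assumption made for illustration; the patent's actual packet structure, term vocabulary, and process names are not given here, and `convert_on_receipt` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Hypothetical metadata packet: standard term -> (application field name,
# optional strptime/strftime format describing how the value is represented).
MetadataPacket = dict[str, tuple[str, Optional[str]]]


def convert_on_receipt(record: dict[str, str], source: MetadataPacket,
                       target: MetadataPacket) -> dict[str, str]:
    """Run by the target application: rename fields and reformat values from
    the source representation to its own, using both metadata packets.
    The source application sends its data unconverted."""
    converted = {}
    for term, (src_field, src_fmt) in source.items():
        tgt_field, tgt_fmt = target[term]
        value = record[src_field]
        if src_fmt and tgt_fmt:
            # Both sides describe the value as a date: reparse and reformat it.
            value = datetime.strptime(value, src_fmt).strftime(tgt_fmt)
        converted[tgt_field] = value
    return converted


source_packet = {"birth_date": ("DOB", "%d/%m/%Y"), "person_name": ("NAME", None)}
target_packet = {"birth_date": ("dateOfBirth", "%Y-%m-%d"), "person_name": ("fullName", None)}
print(convert_on_receipt({"DOB": "31/12/1999", "NAME": "Ada"}, source_packet, target_packet))
# {'dateOfBirth': '1999-12-31', 'fullName': 'Ada'}
```

Because both sides agree only on the standard terms, each application can keep its own field names and formats, and conversion happens once, at the receiver.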

Journal Article
Jelena Tesic
TL;DR: These metadata schemas provide a standard format for creating, processing, and exchanging digital image metadata and enable image management, analysis, indexing, and search applications.
Abstract: Digital image metadata plays a crucial role in managing digital image repositories. It lets us catalog and maintain large image collections as well as search for and find relevant information. Moreover, describing a digital image with defined metadata schemes lets multiple systems with different platforms and interfaces access and process image metadata. Metadata's wide use in commercial, academic, and educational domains as well as on the Web has propelled the development of new standards for digital image data schemes. The Japan Electronics and Information Technology Industries Association has proposed the Exchangeable Image File Format (EXIF) as a standard for storing administrative metadata in digital image files during acquisition. The International Press Telecommunications Council (IPTC) has developed a standard for storing descriptive metadata information within digital images. These metadata schemas, as well as other emerging standards, provide a standard format for creating, processing, and exchanging digital image metadata and enable image management, analysis, indexing, and search applications.

Patent
25 Mar 2005
TL;DR: In this article, a method, system, device and software code product for efficient multicast/broadcast distribution of formatted data is described, where the metadata packets are transmitted at an earlier time location than they occur in the original formatted data file to decrease latency in file playback.
Abstract: A method, system, device and software code product are disclosed which provide efficient multicast/broadcast distribution of formatted data. Formatted data can comprise metadata and content (media data), and several embodiments of the invention disclose retransmitting the metadata in order to increase the reliability of the file distribution by increasing the chances that the metadata is received without error. In addition, embodiments of the invention disclose scheduling transmission of data packets of formatted data so that the metadata packets are transmitted earlier than they occur in the original formatted data file, to decrease latency in file playback.
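Both ideas in this abstract, sending metadata ahead of its original position and retransmitting it, can be combined in a small Python scheduler. This is a minimal sketch assuming packets are ("meta" | "media", payload) pairs in original file order and that metadata is repeated after every `resend_every` media packets; the function name and this policy are hypothetical.

```python
def schedule_packets(packets: list[tuple[str, object]],
                     resend_every: int) -> list[tuple[str, object]]:
    """Hypothetical sender-side scheduler. Metadata packets go out before any
    media packet, regardless of their original position, and are repeated
    after every `resend_every` media packets (0 disables repeats)."""
    meta = [p for p in packets if p[0] == "meta"]
    media = [p for p in packets if p[0] == "media"]
    out = list(meta)  # metadata first, so playback can start sooner
    for i, pkt in enumerate(media, start=1):
        out.append(pkt)
        if resend_every and i % resend_every == 0 and i < len(media):
            out.extend(meta)  # retransmit to improve the chance of error-free receipt
    return out


original = [("media", 1), ("meta", "header"), ("media", 2), ("media", 3), ("media", 4)]
print(schedule_packets(original, resend_every=2))
# [('meta', 'header'), ('media', 1), ('media', 2), ('meta', 'header'), ('media', 3), ('media', 4)]
```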

Book Chapter
31 Oct 2005
TL;DR: A proposal for a metadata standard, so called Ontology Metadata Vocabulary (OMV) which is based on discussions in the EU IST thematic network of excellence Knowledge Web and two complementary reference implementations which show the benefit of such a standard in decentralized and centralized scenarios are presented.
Abstract: Ontologies have seen enormous development and application in many domains within the last years, especially in the context of the next web generation, the Semantic Web. Besides the work of countless researchers across the world, industry has started developing ontologies to support its daily operative business. Currently, most ontologies exist in pure form without any additional information, e.g. authorship information such as that provided by Dublin Core for text documents. This lack of metadata makes it difficult for academia and industry to identify, find and apply (basically meaning to reuse) ontologies effectively and efficiently. Our contribution consists of (i) a proposal for a metadata standard, the so-called Ontology Metadata Vocabulary (OMV), which is based on discussions in the EU IST thematic network of excellence Knowledge Web, and (ii) two complementary reference implementations which show the benefit of such a standard in decentralized and centralized scenarios, i.e. the Oyster P2P system and the Onthology metadata portal.

Patent
29 Apr 2005
TL;DR: In this article, a metadata blob creation function cooperates with the file system APIs to read source metadata associated with specified file data and creates and populates a metadata blob from which a substantial copy of the source metadata can be generated.
Abstract: A network storage system comprises data storage, one or more file system APIs, and a metadata handler. The data storage can comprise file data and associated metadata. The file system APIs can be configured to read and write file data and metadata to and from the data storage. The metadata handler can have a library of functions for handling the metadata. The library can include a metadata blob creation function and a metadata blob extraction function. The metadata blob creation function cooperates with the file system APIs to read source metadata associated with specified file data and creates and populates a metadata blob from which a substantial copy of the source metadata can be generated. The metadata blob extraction function receives at least a metadata blob, extracts information from the metadata blob, and cooperates with the file system APIs to generate destination metadata, a substantial copy of the source metadata.
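A POSIX-oriented Python sketch of the blob creation and extraction functions follows. It assumes the metadata of interest is the permission bits plus access and modification times, serialized as JSON; the function names and blob format are hypothetical, and a real metadata handler might also need to carry ownership, ACLs, or extended attributes.

```python
import json
import os
import stat


def create_metadata_blob(source_path: str) -> bytes:
    """Read selected source metadata through os.stat and pack it into a blob."""
    st = os.stat(source_path)
    meta = {
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "atime": st.st_atime,
        "mtime": st.st_mtime,
    }
    return json.dumps(meta).encode("utf-8")


def extract_metadata_blob(blob: bytes, dest_path: str) -> None:
    """Unpack a blob and apply it to dest_path, producing a substantial copy
    of the source metadata."""
    meta = json.loads(blob.decode("utf-8"))
    os.chmod(dest_path, meta["mode"])
    # Set times last so no later call in this function alters them.
    os.utime(dest_path, (meta["atime"], meta["mtime"]))
```

For example, `extract_metadata_blob(create_metadata_blob("a.txt"), "b.txt")` would give `b.txt` the permission bits and timestamps of `a.txt`, while the blob itself could travel separately from the file data.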

Proceedings Article
14 Jun 2005
TL;DR: This work describes how MetaMatrix captures and manages metadata through the use of the OMG's MOF architecture and multiple domain-specific modeling languages, and how this semantic and syntactic metadata is then used for a variety of purposes, including accessing data in real-time from the underlying enterprise systems.
Abstract: Integrating enterprise information requires an accurate, precise and complete understanding of the disparate data sources, the needs of the information consumers, and how these map to the semantic business concepts of the enterprise. We describe how MetaMatrix captures and manages this metadata through the use of the OMG's MOF architecture and multiple domain-specific modeling languages, and how this semantic and syntactic metadata is then used for a variety of purposes, including accessing data in real-time from the underlying enterprise systems, integrating it, and returning it as information expected by consumers.

Patent
18 Mar 2005
TL;DR: In one embodiment of the present invention, a method for creating spreadsheet metadata comprises receiving an item in a spreadsheet, receiving item metadata about the item, and associating the item metadata with the item to create spreadsheet metadata as discussed by the authors.
Abstract: The present invention generally relates to new and improved embodiments of methods and systems for capturing and providing arbitrarily rich data to be stored or manipulated within a spreadsheet. In one embodiment of the present invention, a method for creating spreadsheet metadata comprises receiving an item in a spreadsheet, receiving item metadata about the item, and associating the item metadata with the item to create spreadsheet metadata.

Patent
21 Mar 2005
TL;DR: In this paper, a system records a video program as well as metadata associated with the video program; the system then receives updated metadata associated with the video program and replaces the previously recorded metadata with the updated metadata.
Abstract: A system records a video program (304) as well as metadata associated with the video program (306). The system then receives updated metadata associated with the video program (308). The previously recorded metadata is replaced with the updated metadata (312).