
Showing papers on "Data management published in 2005"


Journal ArticleDOI
01 Jun 2005-Proteins
TL;DR: The development of a set of software applications that use the Data Model and its associated libraries, thus validating the approach and providing a pipeline for high‐throughput analysis of NMR data.
Abstract: To address data management and data exchange problems in the nuclear magnetic resonance (NMR) community, the Collaborative Computing Project for the NMR community (CCPN) created a "Data Model" that describes all the different types of information needed in an NMR structural study, from molecular structure and NMR parameters to coordinates. This paper describes the development of a set of software applications that use the Data Model and its associated libraries, thus validating the approach. These applications are freely available and provide a pipeline for high-throughput analysis of NMR data. Three programs work directly with the Data Model: CcpNmr Analysis, an entirely new analysis and interactive display program, the CcpNmr FormatConverter, which allows transfer of data from programs commonly used in NMR to and from the Data Model, and the CLOUDS software for automated structure calculation and assignment (Carnegie Mellon University), which was rewritten to interact directly with the Data Model. The ARIA 2.0 software for structure calculation (Institut Pasteur) and the QUEEN program for validation of restraints (University of Nijmegen) were extended to provide conversion of their data to the Data Model. During these developments the Data Model has been thoroughly tested and used, demonstrating that applications can successfully exchange data via the Data Model. The software architecture developed by CCPN is now ready for new developments, such as integration with additional software applications and extensions of the Data Model into other areas of research.

2,906 citations


Book
22 Jun 2005
TL;DR: This book describes a body of practical techniques that can extract useful information from raw data and shows how they work.
Abstract: As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work.

1,807 citations


Journal ArticleDOI
01 Sep 2005
TL;DR: The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it.
Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.

1,214 citations


Book
27 Jan 2005
TL;DR: This book discusses the contemporary importance of knowledge and knowledge management, closing with the production and consumption of knowledge on knowledge management and some reflections on its future as a subject.
Abstract: 1. Introduction: The Contemporary Importance of Knowledge and Knowledge Management
SECTION 1: EPISTEMOLOGIES OF KNOWLEDGE IN THE KNOWLEDGE MANAGEMENT LITERATURE
2. The Objectivist Perspective on Knowledge
3. The Practice-Based Perspective on Knowledge
SECTION 2: AN INTRODUCTION TO KEY CONCEPTS
4. What is Knowledge Management?
5. Knowledge Intensive Firms and Knowledge Workers
6. Learning and Knowledge Management
SECTION 3: KNOWLEDGE CREATION AND LOSS
7. Innovation Dynamics and Knowledge Processes
8. Forgetting and Unlearning Knowledge
SECTION 4: SOCIO-CULTURAL ISSUES RELATED TO MANAGING AND SHARING KNOWLEDGE
9. The Influence of Socio-Cultural Factors in Motivating Workers to Participate in Knowledge Management Initiatives
10. Communities of Practice
11. Cross Community, Boundary Spanning Knowledge Processes
12. Power, Politics, Conflict, and Knowledge Processes
13. Information and Communication Technologies and Knowledge Management
14. Facilitating Knowledge Management via Culture Management and HRM Practices
15. Leadership and Knowledge Management
16. The Production and Consumption of Knowledge on Knowledge Management and Some Reflections on Its Future as a Subject

731 citations


Journal ArticleDOI
01 Dec 2005
TL;DR: This paper proposes dataspaces and their support systems as a new agenda for data management, which encompasses much of the work going on in data management today, while posing additional research objectives.
Abstract: The development of relational database management systems served to focus the data management community for decades, with spectacular results. In recent years, however, the rapidly-expanding demands of "data everywhere" have led to a field comprised of interesting and productive efforts, but without a central focus or coordinated agenda. The most acute information management challenges today stem from organizations (e.g., enterprises, government agencies, libraries, "smart" homes) relying on a large number of diverse, interrelated data sources, but having no way to manage their dataspaces in a convenient, integrated, or principled fashion. This paper proposes dataspaces and their support systems as a new agenda for data management. This agenda encompasses much of the work going on in data management today, while posing additional research objectives.

723 citations


Posted Content
TL;DR: In this paper, the authors focus on the so-called field-tested and grounded technological rule as a possible product of Mode 2 research with the potential to improve the relevance of academic research in management.
Abstract: The relevance problem of academic management research in organization and management is an old and thorny one. Recent discussions on this issue have resulted in proposals to use more Mode 2 knowledge production in our field. These discussions focused mainly on the process of research itself and less on the products produced by this process. Here the focus is on the so-called field-tested and grounded technological rule as a possible product of Mode 2 research with the potential to improve the relevance of academic research in management. Technological rules can be seen as solution-oriented knowledge. Such knowledge may be called Management Theory, while more description-oriented knowledge may be called Organization Theory. In this article the nature of technological rules in management is discussed, as well as their development, their use in actual management practice and the potential for cross-fertilization between Management Theory and Organization Theory.

691 citations


Posted Content
TL;DR: Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects --- finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.
Abstract: This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data is both easier to use and allows much more parallelism.

476 citations
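
As a loose illustration of the abstract's closing point, the sketch below contrasts a procedural, file-centric scan with a non-procedural query over schematized data, where the declarative form leaves the engine room to parallelize. It is only a toy: the table, column names, and magnitude threshold are invented for the example and do not come from the paper.

import csv, io, sqlite3

raw = "object_id,magnitude\n1,17.2\n2,21.4\n3,22.9\n"

# procedural, file-centric: the program spells out the scan over a byte stream
bright = [row for row in csv.DictReader(io.StringIO(raw)) if float(row["magnitude"]) < 20]

# non-procedural: the same request as a declarative query over schematized data
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (object_id INTEGER, magnitude REAL)")
db.executemany("INSERT INTO objects VALUES (?, ?)",
               [(1, 17.2), (2, 21.4), (3, 22.9)])

print(bright)
print(db.execute("SELECT object_id FROM objects WHERE magnitude < 20").fetchall())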


Journal ArticleDOI
01 Dec 2005
TL;DR: In this article, the authors argue that analyzing these vast new data stores requires algorithms that can simultaneously deal with huge datasets and find very subtle effects --- finding both needles in the haystack and very small haystacks that were undetected in previous measurements.
Abstract: Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Since these new instruments have extraordinary precision, the data quality is also rapidly improving. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects --- finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.

432 citations


Journal ArticleDOI
TL;DR: In this policy forum the authors argue that data cleaning is an essential part of the research process, and should be incorporated into study design.
Abstract: In this policy forum the authors argue that data cleaning is an essential part of the research process, and should be incorporated into study design.

396 citations


Proceedings Article
30 Aug 2005
TL;DR: This system enables semantic RFID data filtering and automatic data transformation based on declarative rules, provides powerful query support of RFID object tracking and monitoring, and can be adapted to different RFID-enabled applications.
Abstract: RFID technology can be used to significantly improve the efficiency of business processes by providing the capability of automatic identification and data capture. This technology poses many new challenges on current data management systems. RFID data are time-dependent, dynamically changing, in large volumes, and carry implicit semantics. RFID data management systems need to effectively support such large scale temporal data created by RFID applications. These systems need to have an explicit temporal data model for RFID data to support tracking and monitoring queries. In addition, they need to have an automatic method to transform the primitive observations from RFID readers into derived data used in RFID-enabled applications. In this paper, we present an integrated RFID data management system -- Siemens RFID Middleware -- based on an expressive temporal data model for RFID data. Our system enables semantic RFID data filtering and automatic data transformation based on declarative rules, provides powerful query support of RFID object tracking and monitoring, and can be adapted to different RFID-enabled applications.

352 citations
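
The following is a minimal sketch of the kind of transformation the abstract describes, turning primitive (tag, reader, time) observations into derived temporal records that can support tracking and monitoring queries. It is not the Siemens RFID Middleware; the reader-to-location mapping, field names, and grouping rule are assumptions made for illustration.

READER_LOCATION = {"R1": "dock_door_A", "R2": "shelf_3"}

def derive_stays(observations):
    # group consecutive reads of the same tag at the same location into one stay
    stays = []
    current = {}  # tag -> (location, first_seen, last_seen)
    for tag, reader, ts in sorted(observations, key=lambda o: o[2]):
        loc = READER_LOCATION[reader]
        if tag in current and current[tag][0] == loc:
            current[tag] = (loc, current[tag][1], ts)
        else:
            if tag in current:
                stays.append((tag,) + current[tag])
            current[tag] = (loc, ts, ts)
    stays.extend((tag,) + v for tag, v in current.items())
    return stays  # (tag, location, start, end) records support tracking queries

obs = [("EPC42", "R1", 1), ("EPC42", "R1", 2), ("EPC42", "R2", 10)]
print(derive_stays(obs))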


Patent
02 May 2005
TL;DR: A hierarchical storage management architecture is presented to facilitate data management; it provides methods for evaluating the state of stored data relative to enterprise needs by using weighted parameters that may be user defined.
Abstract: The present invention provides systems and methods for data storage. A hierarchical storage management architecture is presented to facilitate data management. The disclosed system provides methods for evaluating the state of stored data relative to enterprise needs by using weighted parameters that may be user defined. Also disclosed are systems and methods for evaluating costing and risk management associated with stored data.
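
A small sketch of the weighted-parameter idea described above, not the patented system: each stored item is scored against user-defined, weighted parameters to judge its state relative to enterprise needs. The parameter names, weights, normalization ranges, and migration threshold are all assumptions made for this example.

WEIGHTS = {"age_days": 0.5, "days_since_access": 0.3, "tier_cost": 0.2}

def migration_score(item, weights=WEIGHTS):
    # normalize each parameter to [0, 1] and combine with user-defined weights
    normalized = {
        "age_days": min(item["age_days"] / 365.0, 1.0),
        "days_since_access": min(item["days_since_access"] / 90.0, 1.0),
        "tier_cost": item["tier_cost"],  # already 0..1, relative cost of current tier
    }
    return sum(weights[k] * normalized[k] for k in weights)

item = {"age_days": 400, "days_since_access": 80, "tier_cost": 0.9}
score = migration_score(item)
print(score, "migrate to cheaper tier" if score > 0.5 else "keep in place")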

Proceedings Article
01 Jan 2005
TL;DR: This work proposes a new, distributed architecture that allows an organization to outsource its data management to untrusted servers while preserving data privacy, and shows how the presence of two servers enables efficient partitioning of data.
Abstract: Recent trends towards database outsourcing, as well as concerns and laws governing data privacy, have led to great interest in enabling secure database services. Previous approaches to enabling such a service have been based on data encryption, causing a large overhead in query processing. We propose a new, distributed architecture that allows an organization to outsource its data management to two untrusted servers while preserving data privacy. We show how the presence of two servers enables efficient partitioning of data so that the contents at any one server are guaranteed not to breach data privacy. We show how to optimize and execute queries in this architecture, and discuss new challenges that emerge in designing the database schema.
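
As a toy illustration of the general idea only (the paper's actual partitioning and query-execution techniques are more involved), the sketch below splits each record's attributes across two non-colluding servers so that neither server alone holds the sensitive association. The record layout and attribute split are assumptions made for this example.

def partition(record):
    # neither fragment alone links a name to a diagnosis
    server1 = {"rid": record["rid"], "name": record["name"]}
    server2 = {"rid": record["rid"], "diagnosis": record["diagnosis"]}
    return server1, server2

def reassemble(row1, row2):
    # only the trusted client, which queries both servers, rejoins on rid
    assert row1["rid"] == row2["rid"]
    return {**row1, **row2}

s1, s2 = partition({"rid": 7, "name": "Alice", "diagnosis": "flu"})
print(s1, s2)
print(reassemble(s1, s2))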

Journal ArticleDOI
TL;DR: It is hoped that STAT will act as a catalytic foundation, fostering collaboration among users of satellite telemetry, and ensuring maximum value from these studies.
Abstract: Despite the obvious power and advantages of the Argos system to track animals by satellite, the data generated are difficult for many biologists to exploit. A broad range of skills is required to efficiently download, collate, filter and interpret Argos data. Integration of animal movements with other physical (e.g. remote sensing imagery) and anthropogenic (e.g. fishery distributions) datasets presents additional technical and computing challenges. The Satellite Tracking and Analysis Tool (STAT) is a freely available system designed for biologists who work on animal tracking; it includes a set of standardized tools and techniques for data management, analysis, and integration with environmental data. STAT logs in to the Argos computer network each day and downloads all available locations and associated data for each user. These data are parsed and stored in a relational database and automatically backed up to an offsite location. A number of data filtering options are available, including setting maximum speed, time or distance between consecutive points, Argos location class, and turning angle. A variety of environmental data layers, including bathymetry, sea surface temperature, sea surface height, ocean currents and chlorophyll, can be sampled for all locations in the STAT database and can be downloaded and incorporated into tracking maps and animations. STAT also facilitates collaboration and the sharing of animal tracking information with the wider public and funding organizations. We hope that STAT will act as a catalytic foundation, fostering collaboration among users of satellite telemetry, and ensuring maximum value from these studies.
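
Below is a minimal sketch of one of the filtering options mentioned above, a maximum-speed filter between consecutive Argos fixes. It is not the STAT implementation; the field names, the haversine great-circle distance, and the 50 km/h threshold are assumptions made for illustration.

from math import radians, sin, cos, asin, sqrt
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two lat/lon points in kilometres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def speed_filter(fixes, max_kmh=50.0):
    # keep a fix only if the speed implied from the previous retained fix is plausible
    kept = [fixes[0]]
    for fix in fixes[1:]:
        prev = kept[-1]
        hours = (fix["time"] - prev["time"]).total_seconds() / 3600.0
        dist = haversine_km(prev["lat"], prev["lon"], fix["lat"], fix["lon"])
        if hours > 0 and dist / hours <= max_kmh:
            kept.append(fix)
    return kept

fixes = [
    {"time": datetime(2005, 6, 1, 0, 0), "lat": 10.00, "lon": -85.00},
    {"time": datetime(2005, 6, 1, 1, 0), "lat": 10.05, "lon": -85.02},
    {"time": datetime(2005, 6, 1, 2, 0), "lat": 12.00, "lon": -80.00},  # implausible jump
]
print(len(speed_filter(fixes)))  # -> 2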

Journal ArticleDOI
TL;DR: In this article, the authors examine the potential that quality management offers for improving supply chain management performance and find support for a relationship between quality management practices, including the specific set known as co-makership, and supply chain performance measures when supply chain and quality goals are pursued simultaneously.
Abstract: This paper examines the potential that quality management offers for improving supply chain management performance. Based on the theoretical and descriptive literature, four themes related to this topic are extracted. These are related to the pursuit of supply chain and quality goals simultaneously, leading to the development of cumulative capabilities, the relationship between quality management practices and supply chain performance measures and the relationship between a specific set of quality management practices known as co-makership and supply chain performance measures. Hypotheses were developed and tested using an existing database of information from 164 plants in the machinery, electronics and transportation components industries in the USA, Germany, Italy, Japan and England. There was strong support for all four hypotheses, indicating that there is a relationship between quality management and supply chain management. Practical implications and guidelines for managers focus upon leveraging thi...

Patent
06 Jun 2005
TL;DR: A host-client data sharing system manages diabetes care data as discussed by the authors, where the host database uses multiple servers for handling client interactions with the system, and the client or local database stores the data relating to multiple diabetics on a personal appliance such as a PC, or a portable or handheld microprocessor-based computing device.
Abstract: A host-client data sharing system manages diabetes care data. A host database, preferably web or internet based, is implemented for storing diabetes care data relating to multiple diabetics. A client or local database stores the diabetes care data relating to multiple diabetics on a personal appliance such as a PC, or a portable or handheld microprocessor-based computing device. The host database uses multiple servers for handling client interactions with the system.

Journal ArticleDOI
TL;DR: In this article, the authors focus on the development and implementation of a more streamlined digital workflow from the initial data acquisition stage to the final project output, which will permit explicit information on uncertainty to be carried forward from field data to the final product.
Abstract: The development of affordable digital technologies that allow the collection and analysis of georeferenced field data represents one of the most significant changes in field-based geoscientific study since the invention of the geological map. Digital methods make it easier to re-use pre-existing data (e.g. previous field data, geophysical survey, satellite images) during renewed phases of fieldwork. Increased spatial accuracy from satellite and laser positioning systems provides access to geostatistical and geospatial analyses that can inform hypothesis testing during fieldwork. High-resolution geomatic surveys, including laser scanning methods, allow 3D photorealistic outcrop images to be captured and interpreted using novel visualization and analysis methods. In addition, better data management on projects is possible using geospatially referenced databases that match agreed international data standards. Collectively, the new techniques allow 3D models of geological architectures to be constructed directly from field data in ways that are more robust compared with the abstract models constructed traditionally by geoscientists. This development will permit explicit information on uncertainty to be carried forward from field data to the final product. Current work is focused upon the development and implementation of a more streamlined digital workflow from the initial data acquisition stage to the final project output.

Journal ArticleDOI
TL;DR: Knowledge is distinguished from data and information and is viewed as a "fluid mix of framed experience, values, contextual information and expert insight that provide[s] a framework for evaluating and incorporating new experiences and information".

Patent
06 May 2005
TL;DR: In this article, a host driver embedded in an application server connects an application and its data to a cluster and provides a method and apparatus for capturing real-time data transactions in the form of an event journal that is provided to the data management system.
Abstract: A data management system or “DMS” provides a wide range of data services to data sources associated with a set of application host servers. The data management system typically comprises one or more regions, with each region having one or more clusters. A given cluster has one or more nodes that share storage. To facilitate the data service, a host driver embedded in an application server connects an application and its data to a cluster. The host driver provides a method and apparatus for capturing real-time data transactions in the form of an event journal that is provided to the data management system. The driver functions to translate traditional file/database/block I/O into a continuous, application-aware, output data stream. Using the streams generated in this manner, the DMS offers a wide range of data services that include, by way of example only: data protection (and recovery), disaster recovery (data distribution and data replication), data copy, and data query and access.
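
As a rough illustration only, not the patented host driver, the sketch below shows what translating intercepted file/database I/O into an application-aware, timestamped event-journal record might look like; the event fields are assumptions made for this example.

import json, time

def journal_event(app, op, path, payload):
    return {
        "ts": time.time(),       # when the I/O happened
        "app": app,              # application-aware context
        "op": op,                # e.g. "write", "delete"
        "path": path,
        "payload_len": len(payload),
    }

journal = []  # a continuous stream a downstream data service could consume
journal.append(journal_event("orders-db", "write", "/data/orders/42", b"...record bytes..."))
print(json.dumps(journal[-1]))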

Proceedings Article
01 Jan 2005
TL;DR: This paper identifies the key characteristics and data management challenges presented by high fan-in systems, and argues for a uniform, query-based approach towards addressing them, and presents the initial design concepts behind HiFi.
Abstract: Advances in data acquisition and sensor technologies are leading towards the development of “high fan-in” architectures: widely distributed systems whose edges consist of numerous receptors such as sensor networks, RFID readers, or probes, and whose interior nodes are traditional host computers organized using the principles of cascading streams and successive aggregation. Examples include RFID-enabled supply chain management, large-scale environmental monitoring, and various types of network and computing infrastructure monitoring. In this paper, we identify the key characteristics and data management challenges presented by high fan-in systems, and argue for a uniform, query-based approach towards addressing them. We then present our initial design concepts behind HiFi, the system we are building to embody these ideas, and describe a proof-of-concept prototype.
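
The sketch below illustrates the cascading-streams-and-successive-aggregation idea in miniature, not the HiFi system itself; the two-level topology and the count/sum aggregate are assumptions made for this example.

def edge_readings(receptor_id, values):
    # edge receptors (sensors, RFID readers, probes) emit raw readings
    return [{"receptor": receptor_id, "value": v} for v in values]

def aggregate(stream):
    # one interior node summarizing the readings that flow through it
    return {"count": len(stream), "sum": sum(r["value"] for r in stream)}

def merge(aggs):
    # a higher-level node combines its children's partial aggregates
    return {"count": sum(a["count"] for a in aggs),
            "sum": sum(a["sum"] for a in aggs)}

leaf1 = aggregate(edge_readings("rfid_reader_1", [1, 1, 1]))
leaf2 = aggregate(edge_readings("rfid_reader_2", [2, 2]))
print(merge([leaf1, leaf2]))  # {'count': 5, 'sum': 7}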

Patent
05 May 2005
TL;DR: In this article, the authors describe a data management system or "DMS" which provides a wide range of data services to data sources associated with a set of application host servers, including data protection, data distribution, and data replication.
Abstract: A data management system or 'DMS' provides a wide range of data services to data sources (520) associated with a set of application host servers. The data management system typically comprises one or more regions, with each region having one or more clusters. A given cluster has one or more nodes that share storage. To facilitate the data service, a host driver (510) embedded in an application server connects an application (514) and its data to a cluster. The host driver provides a method and apparatus for capturing real-time data modifications and application state notifications and, in response, generating data transactions in the form of an event journal that is provided to the data management system. The driver functions to translate traditional file/database/block I/O into a continuous, application-aware, output data stream. Using the streams generated in this manner, the DMS offers a wide range of data services that include, by way of example only: data protection (and recovery), and disaster recovery (data distribution and data replication).

01 Jan 2005
TL;DR: The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it; this synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs.
Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data product, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.
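
As a toy example of the "derivation history" notion the survey works with, the sketch below records, for one derived data product, its ancestral inputs, the transformation that produced it, and the responsible agent. The field names are assumptions made for illustration, not a proposed standard.

from datetime import datetime, timezone

def provenance_record(output_id, inputs, transformation, agent):
    return {
        "output": output_id,
        "inputs": inputs,                 # ancestral data products
        "transformation": transformation, # the workflow step that produced it
        "agent": agent,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

rec = provenance_record("calibrated_image_7",
                        ["raw_image_7", "calibration_table_v3"],
                        "flat_field_correction", "workflow_engine")
print(rec["inputs"])  # track back to sources if an error is found downstream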

Book
30 Sep 2005
TL;DR: A collection covering theoretical, legal, managerial, organizational, social, and technological aspects of knowledge management, from virtual organizations and the e-economy to knowledge management in distributed organizations and successful knowledge management systems implementation.
Abstract: A sample of contents:
Knowledge management and virtual organizations
Knowledge management for e-economy
Knowledge management in the global economy
Legal aspects of knowledge management
Managerial aspects of knowledge management
Organizational and social aspects of knowledge management
Organizing knowledge management in distributed organizations
Stakeholder-based knowledge management in organizations
Successful knowledge management systems implementation
Technologies of knowledge management
Theoretical aspects of knowledge management.

Patent
26 May 2005
TL;DR: A data management system that lets data be exchanged without the parties needing to know one another's personal information and without data being sent directly between the individuals.
Abstract: PROBLEM TO BE SOLVED: To transmit and receive data without the individuals needing to know one another's personal information and without data being sent directly between them. SOLUTION: The data management system has a data storage means (a database 13 for electronic letters of proxy and a database 14 for electronic notifications) that stores data sent from a terminal through the Internet 4, and a window management means (a window management system 3) that manages terminals' access to the data storage means. Each time data are stored in the data storage means, the window management means generates an electronic permit allowing the data to be retrieved and sends it to the terminal that was the transmission source; when the electronic permit is later sent back from the terminal, the window management means accepts it and allows the data related to that permit to be retrieved from the data storage means.

Journal ArticleDOI
28 Feb 2005
TL;DR: The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment.
Abstract: Understanding the Earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon the Globus toolkit and a variety of other technologies, ESG is developing an environment that addresses authentication, authorization for data access, large-scale data transport and management, services and abstractions for high-performance remote data access, mechanisms for scalable data replication, cataloging with rich semantic and syntactic information, data discovery, distributed monitoring, and Web-based portals for using the system.

Journal ArticleDOI
TL;DR: In this article, the authors propose research into several important new directions for database management systems, driven by the Internet and increasing amounts of scientific and sensor data, and propose new approaches for database needs are changing.
Abstract: Database needs are changing, driven by the Internet and increasing amounts of scientific and sensor data. In this article, the authors propose research into several important new directions for database management systems.

Proceedings Article
01 Jan 2005
TL;DR: The Semex System is described, which offers users a flexible platform for personal information management and leverages the personal information to enable lightweight information integration tasks that are discouragingly difficult to perform with today's tools.
Abstract: The explosion of the amount of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focussed on the WWW, individual computer users have developed their own vast collections of data on their desktops, and these collections are in critical need of good search tools. We describe the Semex System that offers users a flexible platform for personal information management. Semex has two main goals. The first goal is to enable browsing personal information by semantically meaningful associations. The challenge is to automatically create such associations between data items on one’s desktop, and to create enough of them so Semex becomes an indispensable tool. Our second goal is to leverage the personal information space we created to increase users’ productivity. As our first target, Semex leverages the personal information to enable lightweight information integration tasks that are discouragingly difficult to perform with today’s tools.

Proceedings Article
Radu Sion
30 Aug 2005
TL;DR: This work introduces query execution proofs; for each executed batch of queries the database service provider is required to provide a strong cryptographic proof that provides assurance that the queries were actually executed correctly over their entire target data set.
Abstract: In this paper we propose and analyze a method for proofs of actual query execution in an outsourced database framework, in which a client outsources its data management needs to a specialized provider. The solution is not limited to simple selection predicate queries but handles arbitrary query types. While this work focuses mainly on read-only, compute-intensive (e.g. data-mining) queries, it also provides preliminary mechanisms for handling data updates (at additional costs). We introduce query execution proofs; for each executed batch of queries the database service provider is required to provide a strong cryptographic proof that provides assurance that the queries were actually executed correctly over their entire target data set. We implement a proof of concept and present experimental results in a real-world data mining application, proving the deployment feasibility of our solution. We analyze the solution and show that its overheads are reasonable and are far outweighed by the added security benefits. For example an assurance level of over 95% can be achieved with less than 25% execution time overhead.
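
As a toy stand-in only, far simpler than the paper's actual proof mechanism, the sketch below binds a digest to the query text and to every tuple the provider claims to have scanned, so that an auditor with access to the same data can later recompute and compare it. The function names, query, and tuples are assumptions made for illustration.

import hashlib

def execution_digest(query_text, scanned_tuples):
    # commit to the query and the full set of tuples it was executed over
    h = hashlib.sha256(query_text.encode())
    for t in scanned_tuples:
        h.update(repr(t).encode())
    return h.hexdigest()

proof = execution_digest("SELECT * FROM sales WHERE qty > 10",
                         [(1, 12), (4, 15), (9, 30)])
print(proof[:16])  # the client keeps this; an audit re-run must reproduce it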

Journal ArticleDOI
TL;DR: The authors consider what drives knowledge management in today's business environment and analyze the need for fulfilment of the resulting objectives, in order to provide some answers as to why knowledge management is seen as such a significant value contributor in today's business world.


Patent
04 Mar 2005
TL;DR: A universal data management interface (UDMI) as discussed by the authors is a system that allows multiple virtual databases that reside in a single database to be available as a network service, allowing multiple users to access, manage, and manipulate data within each of the multiple standard database management systems.
Abstract: A universal data management interface (UDMI) system includes a processing system that generates a visual interface through which a user can access, manage, and manipulate data on plural different types of remote databases. The UDMI connects to multiple standard database management systems and allows multiple users to access, manage, and manipulate data within each of the multiple standard database management systems. The UDMI also allows multiple virtual databases that reside in a single database to be available as a network service.