Showing papers on "Distributed database published in 2006"

PDF

Open Access

Patent•

Systems and methods for use of structured and unstructured distributed data

[...]

01 Feb 2006

TL;DR: In this article, the authors describe hardware, software and electronic service components and systems to provide large-scale, reliable and secure foundations for distributed databases and content management systems, combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

...read moreread less

Abstract: The invention relates to hardware, software and electronic service components and systems to provide large-scale, reliable, and secure foundations for distributed databases and content management systems, combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

...read moreread less

659 citations

Patent•

Security systems and methods for use with structured and unstructured data

[...]

James Moore, Bela A. Labovitch

01 Feb 2006

TL;DR: In this article, systems and methods including hardware, software and electronic service components and systems to provide large-scale, reliable, and secure foundations for distributed databases and content management systems combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

...read moreread less

Abstract: Disclosed herein are systems and methods including hardware, software and electronic service components and systems to provide large-scale, reliable, and secure foundations for distributed databases and content management systems combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

...read moreread less

576 citations

Journal Article•DOI•

A taxonomy of Data Grids for distributed data sharing, management, and processing

[...]

Srikumar Venugopal¹, Rajkumar Buyya¹, Kotagiri Ramamohanarao¹•Institutions (1)

University of Melbourne¹

29 Jun 2006-ACM Computing Surveys

TL;DR: In this article, the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases.

...read moreread less

Abstract: Data Grids have been adopted as the next generation platform by many scientific communities that need to share, access, transport, process, and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this article, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation, and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration.

...read moreread less

360 citations

Journal Article•DOI•

Efficient in-network moving object tracking in wireless sensor networks

[...]

Chih-Yu Lin¹, Wen-Chih Peng¹, Yu-Chee Tseng¹•Institutions (1)

National Chiao Tung University¹

01 Aug 2006-IEEE Transactions on Mobile Computing

TL;DR: This paper develops several tree structures for in-network object tracking which take the physical topology of the sensor network into consideration and shows a significant improvement over existing solutions.

...read moreread less

Abstract: The rapid progress of wireless communication and embedded microsensing MEMS technologies has made wireless sensor networks possible In light of storage in sensors, a sensor network can be considered as a distributed database, in which one can conduct in-network data processing An important issue of wireless sensor networks is object tracking, which typically involves two basic operations: update and query This issue has been intensively studied in other areas, such as cellular networks However, the in-network processing characteristic of sensor networks has posed new challenges to this issue In this paper, we develop several tree structures for in-network object tracking which take the physical topology of the sensor network into consideration The optimization process has two stages The first stage tries to reduce the location update cost based on a deviation-avoidance principle and a highest-weight-first principle The second stage further adjusts the tree obtained in the first stage to reduce the query cost The way we model this problem allows us to analytically formulate the cost of object tracking given the update and query rates of objects Extensive simulations are conducted, which show a significant improvement over existing solutions

...read moreread less

253 citations

Journal Article•DOI•

Distributed Data Mining in Peer-to-Peer Networks

[...]

Souptik Datta¹, Kanishka Bhaduri¹, C. Giannella¹, Ran Wolff², Hillol Kargupta³ - Show less +1 more•Institutions (3)

University of Baltimore¹, Technion – Israel Institute of Technology², University of Maryland, Baltimore County³

01 Jul 2006-IEEE Internet Computing

TL;DR: An overview of DDM applications and algorithms for P2P environments is offered, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead.

...read moreread less

Abstract: Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner

...read moreread less

239 citations

Patent•

Method and apparatus for categorizing and presenting documents of a distributed database

[...]

Daniel C Fain, Paul T Ryan, Peter Savich

09 Feb 2006

TL;DR: In this article, the authors describe methods for creating categorized documents, categorizing documents in a distributed database and categorizing Resulting Pages. But they do not describe the apparatus for searching such databases.

...read moreread less

Abstract: Described herein are methods for creating categorized documents, categorizing documents in a distributed database and categorizing Resulting Pages. Also described herein is an apparatus for searching a distributed database. The method for creating categorized documents generally comprises: initially assuming all documents are of type 1; filtering out all type 2 documents and placing them in a first category; filtering out all type 3 documents and placing them in a second category; and defining all remaining documents as type 4 documents and placing all type 4 documents in a third category. The apparatus for searching a distributed database generally comprises at least one memory device; a computing apparatus; an indexer; a transactional score generator; and a category assignor; a search server; and a user interface in communication with the search server.

...read moreread less

193 citations

Patent•

Managing database utilities to improve throughput and concurrency

[...]

Douglas P. Brown¹, Anita Richards², Bruce Wayne Britton², Todd A. Walter²•Institutions (2)

NCR Corporation¹, Teradata²

17 May 2006

TL;DR: In this article, a method, computer program, and system for executing a utility on a database system having a plurality of database system nodes is described, and an increased load on the database system required to execute the utility is determined.

...read moreread less

Abstract: A method, computer program, and system are disclosed for executing a utility on a database system having a plurality of database system nodes. Each database system node has an existing load. An increased load on the database system required to execute the utility is determined. The existing load on each of the database system nodes is determined. The increased load is distributed in such a way as to balance the load among the database system nodes.

...read moreread less

133 citations

Patent•

MapReduce for distributed database processing

[...]

Ali Dasdan¹, Hung-chih Yang¹, Ruey-Lung Hsiao¹•Institutions (1)

Yahoo!¹

05 Oct 2006

TL;DR: In this article, an input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology, and map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key).

...read moreread less

Abstract: An input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology. By utilizing such a grouping, map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key). The intermediate results of the map processing (key/value pairs) for a particular key can be processed together in a single reduce function by applying a different iterator to intermediate values for each group. Different iterators can be arranged inside reduce functions in ways however desired.

...read moreread less

129 citations

Proceedings Article•DOI•

Privacy Preserving Clustering on Horizontally Partitioned Data

[...]

Ali Inan¹, Y. Saygyn¹, Erkay Savas¹, Ayca Azgin Hintoglu¹, Albert Levi¹ - Show less +1 more•Institutions (1)

Sabancı University¹

03 Apr 2006

TL;DR: Methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites are proposed.

...read moreread less

Abstract: Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. The power of data mining tools to extract hidden information that cannot be otherwise seen by simple querying proved to be useful. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites.

...read moreread less

101 citations

Patent•

Method for providing extensible software components within a distributed synchronization system

[...]

Scott Durgin, Michael G. Palone, Phil Stanhope, Mikhail Chekmarev

09 Aug 2006

TL;DR: In this article, a synchronization system is provided that distributes synchronization system-based applications and their associated resources and components (hereinafter plug-in applications or plug-ins).

...read moreread less

Abstract: A synchronization system is provided that distributes synchronization system-based applications and synchronization system-based application extensions and their associated resources and components (hereinafter “plug-in applications” or “plug-ins”). Components are maintained such that any synchronization system-based application instantiation may be changed or updated by the synchronization system. In one specific example using the synchronization system, each synchronization system-based application or plug-in is self-contained and self-updateable through a synchronization system synchronization process. A further benefit is that the synchronization system and synchronization system-based applications may be extended independent of device type or operating system. Thus, a system is provided for synchronizing one or more plug-in applications. In one example, the system for synchronizing plug-in applications inclides a synchronization system having at least one distributed database that is configured to store a plug-in application, and a schema for the database. Optionally, the distributed database may be configured to store plug-in application instantiation information, synchronization system-based application association information, role, permissions, access control rights, and data associated with the plug-in application. In one example, each distributed database has at least two instances, and the plug-in application (and optional resources and components) is stored in at least one instance of the distributed database. As described herein, the synchronization system is configured to synchronize the plug-in application (and optional resources and components) between the instances of said distributed database.

...read moreread less

98 citations

Journal Article•DOI•

Locating mobile nodes with EASE: learning efficient routes from encounter histories alone

[...]

Matthias Grossglauser¹, Martin Vetterli¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

01 Jun 2006-IEEE ACM Transactions on Networking

TL;DR: This paper defines and analyze a very simple algorithm called EASE (Exponential Age Search) and shows that in a model where Theta(n) nodes perform independent random walks on a square lattice of size n, the length of the routes computed by EASE are of the same order as the distance between the source and destination even for very large n.

...read moreread less

Abstract: Routing in large-scale mobile ad hoc networks is challenging because all the nodes are potentially moving. Geographic routing can partially alleviate this problem, as nodes can make local routing decisions based solely on the destinations' geographic coordinates. However, geographic routing still requires an efficient location service, i.e., a distributed database recording the location of every destination node. Devising efficient, scalable, and robust location services has received considerable attention in recent years. The main purpose of this paper is to show that node mobility can be exploited to disseminate destination location information without incurring any communication overhead. We achieve this by letting each node maintain a local database of the time and location of its last encounter with every other node in the network. This database is consulted by packets to obtain estimates of their destination's current location. As a packet travels towards its destination, it is able to successively refine an estimate of the destination's precise location, because node mobility has "diffused" estimates of that location. We define and analyze a very simple algorithm called EASE (Exponential Age Search) and show that in a model where Theta(n) nodes perform independent random walks on a square lattice of size n, the length of the routes computed by EASE are of the same order as the distance between the source and destination even for very large n. Therefore, without disseminating any explicit location information, the length of EASE routes are within a constant factor of routes obtained with perfect information. We discuss refinements of the EASE algorithm and evaluate it through extensive simulations. We discuss general conditions such that the mobility diffusion effect leads to efficient routes without an explicit location service. In practical settings, where these conditions may not always be met, we believe that the mobility diffusion effect can complement existing location services and enhance their robustness and scalability

...read moreread less

Patent•

Distributed database in an industrial automation environment

[...]

Gavan W. Hood¹, Ralph Kappelhoff¹, Kenwood H. Hall¹•Institutions (1)

Rockwell Automation¹

11 May 2006

TL;DR: In this article, a distributed database system (100) within an industrial automation environment comprises a plurality of associated programmable logic controllers(104, 106, 108), wherein each of the controllers (110, 112, 114) includes data relating to one of a process and a device.

...read moreread less

Abstract: A distributed database system(100) within an industrial automation environment comprises a plurality of associated programmable logic controllers(104, 106, 108), wherein each of the programmable logic controllers (104, 106, 108) includes data (110, 112, 114) relating to one of a process and a device. Furthermore, the data within the plurality of programmable logic controller (104, 106, 108) can conform to a hierarchically structured data model, which, for example, can be based upon ISA $95, ISA $88, OMAC, or any suitable combination thereof. A reception component (102) receives and services a request for data that is located within at least one of the programmable logic controllers (104, 106, 108).

...read moreread less

Journal Article•

Distributed continuous range query processing on moving objects

[...]

Haojun Wang, Roger Zimmermann, Wei-Shinn Ku

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: In this paper, a distributed server infrastructure is introduced to partition the entire service region into a set of service zones and cooperatively handle requests of continuous range queries, which improves the robustness and flexibility of the system by adapting to a time-varying set of servers.

...read moreread less

Abstract: Recent work on continuous queries has focused on processing queries in very large, mobile environments. In this paper, we propose a system leveraging the computing capacities of mobile devices for continuous range query processing. In our design, continuous range queries are mainly processed on the mobile device side, which is able to achieve real-time updates with minimum server load. Our work distinguish itself from previous work with several important contributions. First, we introduce a distributed server infrastructure to partition the entire service region into a set of service zones and cooperatively handle requests of continuous range queries. This feature improves the robustness and flexibility of the system by adapting to a time-varying set of servers. Second, we propose a novel query indexing structure, which records the difference of the query distribution on a grid model. This approach significantly reduce the size and complexity of the index so that in-memory indexing can be achieved on mobile objects with constrained memory size. We report on the rigorous evaluation of our design, which shows substantial improvement in the efficiency of continuous range query processing in mobile environments.

...read moreread less

Proceedings Article•DOI•

Tashkent: uniting durability with transaction ordering for high-performance scalable database replication

[...]

Sameh Elnikety¹, Steven Dropsho¹, Fernando Pedone²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Lugano²

18 Apr 2006

TL;DR: It is demonstrated that this separation between ordering and durability in a replicated database causes a significant scalability bottleneck, and Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware, and treats an unmodified database as a black box.

...read moreread less

Abstract: In stand-alone databases, the functions of ordering the transaction commits and making the effects of transactions durable are performed in one single action, namely the writing of the commit record to disk. For efficiency many of these writes are grouped into a single disk operation. In replicated databases in which all replicas agree on the commit order of update transactions, these two functions are typically separated. Specifically, the replication middleware determines the global commit order, while the database replicas make the transactions durable.The contribution of this paper is to demonstrate that this separation causes a significant scalability bottleneck. It forces some of the commit records to be written to disk serially, where in a standalone system they could have been grouped together in a single disk write. Two solutions are possible: (1) move durability from the database to the replication middleware, or (2) keep durability in the database and pass the global commit order from the replication middleware to the database.We implement these two solutions. Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware, and treats an unmodified database as a black box. In Tashkent-API, we modify the database API so that the middleware can specify the commit order to the database, thus, combining ordering and durability inside the database. We compare both Tashkent systems to an otherwise identical replicated system, called Base, in which ordering and durability remain separated. Under high update transaction loads both Tashkent systems greatly outperform Base in throughput and response time.

...read moreread less

Dissertation•

Scheduling distributed data-intensive applications on global grids

[...]

Srikumar Venugopal

01 Jul 2006

TL;DR: This paper introduces a heuristic for the selection of resources based on a solution to the Set Covering Problem (SCP), and pair this mapping heuristic with the well-known MinMin scheduling algorithm and conduct performance evaluation through extensive simulations.

...read moreread less

Abstract: The next generation of scientific experiments and studies are being carried out by large collaborations of researchers distributed around the world engaged in analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for such collaborations as it aids communities in sharing resources to achieve common objectives. Data Grids provide services for accessing, replicating and managing data collections in these collaborations. Applications used in such Grids are distributed data-intensive, that is, they access and process distributed datasets to generate results. These applications need to transparently and efficiently access distributed data and computational resources. This thesis investigates properties of data-intensive computing environments and presents a software framework and algorithms for mapping distributed data-oriented applications to Grid resources. The thesis discusses the key concepts behind Data Grids and compares them with other data sharing and distribution mechanisms such as content delivery networks, peer-to-peer networks and distributed databases. This thesis provides comprehensive taxonomies that cover various aspects of Data Grid architecture, data transportation, data replication and resource allocation and scheduling. The taxonomies are mapped to various Data Grid systems not only to validate the taxonomy but also to better understand their goals and methodology. The thesis concentrates on one of the areas delineated in the taxonomy – scheduling distributed data-intensive applications on Grid resources. To this end, it presents the design and implementation of a Grid resource broker that mediates access to distributed computational and data resources running diverse middleware. The broker is able to discover remote data repositories, interface with various middleware services and select suitable resources in order to meet the application requirements. The use of the broker is illustrated by a case study of scheduling a data-intensive high energy physics analysis application on an Australia-wide Grid. The broker provides the framework to realise scheduling strategies with differing objectives. One of the key aspects of any scheduling strategy is the mapping of jobs to the appropriate resources to meet the objectives. This thesis presents heuristics for mapping jobs with data dependencies in an environment with heterogeneous Grid resources and multiple data replicas. These heuristics are then compared with performance evaluation metrics obtained through extensive simulations. This is to certify that (i) the thesis comprises only my original work, (ii) due acknowledgement has been made in the text to all other material used, (iii) the thesis is less than 100,000 words in length, exclusive of table, maps, bibliographies, appendices and footnotes.

...read moreread less

Journal Article•DOI•

Real-time processing of range-monitoring queries in heterogeneous mobile databases

[...]

Ying Cai¹, Kien A. Hua, Guohong Cao, T. Xu¹•Institutions (1)

Iowa State University¹

01 Jul 2006-IEEE Transactions on Mobile Computing

TL;DR: An efficient technique for real-time processing of range-monitoring queries that can be highly scalable in supporting location-based services in a wireless environment that consists of a large number of mobile devices.

...read moreread less

Abstract: Unlike conventional range queries, a range-monitoring query is a continuous query. It requires retrieving mobile objects inside a user-defined region and providing continuous updates as the objects move into and out of the region. In this paper, we present an efficient technique for real-time processing of such queries. In our approach, each mobile object is associated with a resident domain, and when an object moves, it monitors its spatial relationship with its resident domain and the monitoring areas inside it. An object reports its location to the server when it crosses over some query boundary or moves out of its resident domain. In the first case, the server updates the affected query results accordingly, while in the second case, the server determines a new resident domain for the object. This distributive approach achieves an accurate and real-time monitoring effect with minimal mobile communication and server processing costs. Our approach also allows a mobile object to negotiate a resident domain based on its computing capability. By having a larger resident domain, a more capable object has less of a chance of moving out of it and having to request a new one. As a result, both communication and server processing costs are reduced. Our comprehensive performance study shows that the proposed technique can be highly scalable in supporting location-based services in a wireless environment that consists of a large number of mobile devices.

...read moreread less

Journal Article•DOI•

Privacy-Preserving Computation of Bayesian Networks on Vertically Partitioned Data

[...]

Z. Yang¹, Rebecca N. Wright¹•Institutions (1)

Stevens Institute of Technology¹

01 Sep 2006-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper presents an efficient and privacy-preserving protocol to construct a Bayesian network from a database vertically partitioned among two parties, in this setting, two parties owning confidential databases wish to learn theBayesian network on the combination of their databases without revealing anything else about their data to each other.

...read moreread less

Abstract: Traditionally, many data mining techniques have been designed in the centralized model in which all data is collected and available in one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by business, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent the parties from sharing their data. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned among two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data

...read moreread less

Patent•

Method and apparatus for accessing a database through a network

[...]

Mark R. Nelson¹, Jeffery A. Sanders¹, Alan W. Treece¹, William C. Forsythe¹, Clay W. Luther¹ - Show less +1 more•Institutions (1)

Cisco Systems, Inc.¹

14 Feb 2006

TL;DR: In this article, a system includes a client which can communicate through a network and a database layer with any of several databases, in a manner independent of respective protocols specific to each of the databases.

...read moreread less

Abstract: A system includes a client which can communicate through a network and a database layer with any of several databases. The client communicates with the database layer using a public network communication protocol, in a manner independent of respective protocols specific to each of the databases. The database layer handles communication with each database according to the respective protocol of that database.

...read moreread less

Proceedings Article•DOI•

Using queries for distributed monitoring and forensics

[...]

Atul Singh¹, Petros Maniatis², Timothy Roscoe², Peter Druschel¹•Institutions (2)

Rice University¹, Intel²

18 Apr 2006

TL;DR: This paper presents an application logging, monitoring, and debugging facility that is built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor, to demonstrate a range of on-line distributed diagnosis tools.

...read moreread less

Abstract: Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system - to detect and analyze bugs, test for regressions, identify fault-tolerance problems or security compromises - can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to tackle these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a range of on-line distributed diagnosis tools that range from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small, simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of our approach to improving and monitoring running distributed systems continuously is well in tune with its benefits.

...read moreread less

Journal Article•DOI•

The costs and limits of availability for replicated services

[...]

Haifeng Yu¹, Amin Vahdat²•Institutions (2)

Carnegie Mellon University¹, University of California, San Diego²

01 Feb 2006-ACM Transactions on Computer Systems

TL;DR: The benefits of dynamically trading consistency for availability using a continuous consistency model, where applications specify a maximum deviation from strong consistency on a per-replica basis is explored.

...read moreread less

Abstract: As raw system performance continues to improve at exponential rates, the utility of many services is increasingly limited by availability rather than performance. A key approach to improving availability involves replicating the service across multiple, wide-area sites. However, replication introduces well-known trade-offs between service consistency and availability. Thus, this article explores the benefits of dynamically trading consistency for availability using a continuous consistency model. In this model, applications specify a maximum deviation from strong consistency on a per-replica basis. In this article, we: i) evaluate the availability of a prototype replication system running across the Internet as a function of consistency level, consistency protocol, and failure characteristics, ii) demonstrate that simple optimizations to existing consistency protocols result in significant availability improvements (more than an order of magnitude in some scenarios), iii) use our experience with these optimizations to prove tight upper bound on the availability of services, and iv) show that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity, resulting in a communication versus availability trade-off.

...read moreread less

Patent•

Implementation of reliable synchronization of distributed databases

[...]

Nir Arad, Carmi Arad, David Melman

08 May 2006

TL;DR: In this paper, the authors present a method of controlling a plurality of forwarding databases provided in an Ethernet bridge having a plurality-of-devices (MOD) consisting of a first set of entries in a first forwarding database maintained by a first one of the plurality of devices.

...read moreread less

Abstract: A method of controlling a plurality of forwarding databases provided in an Ethernet bridge having a plurality of devices. The method includes aging a first set of entries in a first forwarding database maintained by a first one of the plurality of devices. The first set of entries are owned by the first one of the plurality of devices. The method also includes transmitting one or more new address messages from the first one of the plurality of devices to a second one of the plurality of devices. The method further includes aging a second set of entries in the first forwarding database. The second set of entries are owned by the second one of the plurality of devices.

...read moreread less

Proceedings Article•DOI•

Towards Traceability across Sovereign, Distributed RFID Databases

[...]

Rakesh Agrawal¹, Alvin Cheung², Karin Kailing², Stefan Schonauer²•Institutions (2)

Microsoft¹, IBM²

11 Dec 2006

TL;DR: This work introduces the formal concept of traceability networks and highlights the technical challenges involved in sharing data in such a network, and presents an innovative combination of query processing techniques from P2P networks and distributed as well as parallel databases with confidentiality enforcement techniques.

...read moreread less

Abstract: Tracking and tracing individual items is a new and emerging trend in many industries. Driven by matur- ing technologies such as Radio-Frequency Identification (RFID) and upcoming standards such as the Electronic Product Code (EPC), a rapidly increasing number of enter- prises are collecting vast amounts of tracking data. To en- able traceability over the entire life-cycle of items data has to be shared across independent and possibly competing en- terprises. The need to simultaneously compete and cooper- ate requires a traceability system design that allows compa- nies to share their traceability data while maintaining com- plete sovereignty over what is shared and with whom. Based on an extensive study of traceability applications, we introduce the formal concept of traceability networks and highlight the technical challenges involved in shar- ing data in such a network. To address these challenges, we present an innovative combination of query process- ing techniques from P2P networks and distributed as well as parallel databases with confidentiality enforcement tech- niques.

...read moreread less

Book Chapter•DOI•

Efficient dynamic operator placement in a locally distributed continuous query system

[...]

Yongluan Zhou¹, Beng Chin Ooi¹, Kian-Lee Tan¹, Ji Wu¹•Institutions (1)

National University of Singapore¹

29 Oct 2006

TL;DR: This paper formalizes and analyze the operator placement problem in the context of a locally distributed continuous query system and proposes a solution, that is asynchronous and local, to dynamically manage the load across the system nodes.

...read moreread less

Abstract: In a distributed processing environment, the static placement of query operators may result in unsatisfactory system performance due to unpredictable factors such as changes of servers' load, data arrival rates, etc The problem is exacerbated for continuous (and long running) monitoring queries over data streams as any suboptimal placement will affect the system for a very long time In this paper, we formalize and analyze the operator placement problem in the context of a locally distributed continuous query system We also propose a solution, that is asynchronous and local, to dynamically manage the load across the system nodes Essentially, during runtime, we migrate query operators/fragments from overloaded nodes to lightly loaded ones to achieve better performance Heuristics are also proposed to maintain good data flow locality Results of a performance study shows the effectiveness of our technique.

...read moreread less

Proceedings Article•DOI•

On Quality-of-Service and Publish-Subscribe

[...]

Stefan Behnel, Ludger Fiege¹, Gero Mühl²•Institutions (2)

Siemens¹, Technical University of Berlin²

04 Jul 2006

TL;DR: This paper provides a broad overview of relevant quality-of-service metrics and describes their specific meaning in the context of distributed and decentralized publish-subscribe systems to provide a common base for future evaluations of emerging systems and for the design of qualityof- service aware publish- Subscribe infrastructures.

...read moreread less

Abstract: Publish-subscribe is a powerful paradigm for distributed communication based on decoupled producers and consumers of information. Its event-driven nature makes it very appealing for large-scale data dissemination infrastructures. Various architectures were proposed in recent years that provide very diverse features. However, there are few well-defined metrics in the publish-subscribe area that would allow their evaluation and comparison. In this paper, we provide a broad overview of relevant quality-of-service metrics and describe their specific meaning in the context of distributed and decentralized publish-subscribe systems. Our goal is to provide a common base for future evaluations of emerging systems and for the design of qualityof- service aware publish-subscribe infrastructures.

...read moreread less

Journal Article•DOI•

Maximizing over multiple pattern databases speeds up heuristic search

[...]

Robert C. Holte¹, Ariel Felner², Jack Newton¹, Ram Meshulam³, David Furcy⁴ - Show less +1 more•Institutions (4)

University of Alberta¹, Ben-Gurion University of the Negev², Bar-Ilan University³, University of Wisconsin–Oshkosh⁴

01 Nov 2006-Artificial Intelligence

TL;DR: In all the state spaces considered, the use of multiple smaller pattern databases reduces the number of nodes generated by IDA*.

...read moreread less

Journal Article•DOI•

MobileMed: A PDA-Based Mobile Clinical Information System

[...]

Jinwook Choi¹, Sooyoung Yoo¹, Heekyong Park¹, Jonghoon Chun²•Institutions (2)

Seoul National University¹, Myongji University²

01 Jul 2006

TL;DR: This paper proposes a mobile clinical information system (MobileMed), which integrates the distributed and fragmented patient data across heterogeneous sources and makes them accessible through mobile devices and provides a means for effortless implementation and deployment of such systems.

...read moreread less

Abstract: Patient clinical data are distributed and often fragmented in heterogeneous systems, and therefore the need for information integration is a key to reliable patient care. Once the patient data are orderly integrated and readily available, the problems in accessing the distributed patient clinical data, the well-known difficulties of adopting a mobile health information system, are resolved. This paper proposes a mobile clinical information system (MobileMed), which integrates the distributed and fragmented patient data across heterogeneous sources and makes them accessible through mobile devices. The system consists of four main components: a smart interface, an HL7 message server (HMS), a central clinical database (CCDB), and a web server. The smart interface and the HMS work in concert to generate HL7 messages from the existing legacy systems, which essentially send the patient data in HL7 messages to the CCDB to be stored and maintained. The CCDB and the web server enable the physicians to access the integrated up-to-date patient data. By proposing the smart interface approach, we provide a means for effortless implementation and deployment of such systems. Through a performance study, we show that the HMS is reliable yet fast enough to be able to support efficient clinical data communication

...read moreread less

Book Chapter•DOI•

A framework for distributed XML data management

[...]

Serge Abiteboul¹, Ioana Manolescu¹, Emanuel Taropa¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

26 Mar 2006

TL;DR: This work presents a simple extension to the AXML language, allowing it to declaratively specify and deploy complex applications based solely on XML and XML queries, and enables numerous powerful optimizations across a distributed complex process.

...read moreread less

Abstract: As data management applications grow more complex, they may need efficient distributed query processing, but also subscription management, data archival etc. To enact such applications, the current solution consists of stacking several systems together. The juxtaposition of different computing models prevents reasoning on the application as a whole, and wastes important opportunities to improve performance. We present a simple extension to the AXML [7] language, allowing it to declaratively specify and deploy complex applications based solely on XML and XML queries. Our main contribution is a full algebraic model for complex distributed AXML computations. While very expressive, the model is conceptually uniform, and enables numerous powerful optimizations across a distributed complex process.

...read moreread less

Proceedings Article•DOI•

Database replication policies for dynamic content applications

[...]

Gokul Soundararajan¹, Cristiana Amza¹, Ashvin Goel¹•Institutions (1)

University of Toronto¹

18 Apr 2006

TL;DR: The dynamic replication system dynamically allocates replicas to applications in order to maintain application-level performance in response to either peak loads or failure conditions and shows that dynamic replication requires fewer resources than static partitioning or full overlap replication policies and provides over 90% latency compliance to each application under a range of load and failure scenarios.

...read moreread less

Abstract: The database tier of dynamic content servers at large Internet sites is typically hosted on centralized and expensive hardware. Recently, research prototypes have proposed using database replication on commodity clusters as a more economical scaling solution. In this paper, we propose using database replication to support multiple applications on a shared cluster. Our system dynamically allocates replicas to applications in order to maintain application-level performance in response to either peak loads or failure conditions. This approach allows unifying load and fault management functionality. The main challenge in the design of our system is the lime taken to add database replicas. We present replica allocation policies that take this time delay into account and also design an efficient replica addition method that has minimal impact on other applications.We evaluate our dynamic replication system on a commodity cluster with two standard benchmarks: the TPC-W e-commerce benchmark and the RUBIS auction benchmark. Our evaluation shows that dynamic replication requires fewer resources than static partitioning or full overlap replication policies and provides over 90% latency compliance to each application under a range of load and failure scenarios.

...read moreread less

Monograph•DOI•

Intelligent distributed video surveillance systems

[...]

Sergio A. Velastin, Paolo Remagnino

01 Jan 2006

TL;DR: This book describes a hierarchical multi-sensor framework for event detection in wide environments and describes a distributed database for effective management and evaluation of CCTV systems.

...read moreread less

Abstract: * Chapter 1: A review of the state-of-the-art in distributed surveillance systems * Chapter 2: Monitoring practice: event detection and system design * Chapter 3: A distributed database for effective management and evaluation of CCTV systems * Chapter 4: A distributed domotic surveillance system * Chapter 5: A general-purpose system for distributed surveillance and communication * Chapter 6: Tracking objects across uncalibrated, arbitrary topology camera networks * Chapter 7: A distributed multi-sensor surveillance system for public transport applications * Chapter 8: Tracking football players with multiple cameras * Chapter 9: A hierarchical multi-sensor framework for event detection in wide environments

...read moreread less

Book Chapter•DOI•

“Secure” log-linear and logistic regression analysis of distributed databases

[...]

Stephen E. Fienberg¹, William J. Fulp¹, Aleksandra B. Slavkovic², Tracey A. Wrobel²•Institutions (2)

Carnegie Mellon University¹, Pennsylvania State University²

13 Dec 2006

TL;DR: It is shown how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting, especially in the form of log-linear analysis and logistic regression over partitioned databases.

...read moreread less

Abstract: The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and logistic regression over partitioned databases, while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting.

...read moreread less

Collapse