
Showing papers on "Distributed database published in 2006"


Patent
01 Feb 2006
TL;DR: In this article, the authors describe hardware, software and electronic service components and systems to provide large-scale, reliable and secure foundations for distributed databases and content management systems, combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.
Abstract: The invention relates to hardware, software and electronic service components and systems to provide large-scale, reliable, and secure foundations for distributed databases and content management systems, combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

659 citations


Patent
01 Feb 2006
TL;DR: In this article, systems and methods are described, including hardware, software, and electronic service components and systems that provide large-scale, reliable, and secure foundations for distributed databases and content management systems, combining unstructured and structured data and allowing post-input reorganization to achieve a high degree of flexibility.
Abstract: Disclosed herein are systems and methods including hardware, software and electronic service components and systems to provide large-scale, reliable, and secure foundations for distributed databases and content management systems combining unstructured and structured data, and allowing post-input reorganization to achieve a high degree of flexibility.

576 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases.
Abstract: Data Grids have been adopted as the next generation platform by many scientific communities that need to share, access, transport, process, and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this article, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation, and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration.

360 citations


Journal ArticleDOI
TL;DR: This paper develops several tree structures for in-network object tracking which take the physical topology of the sensor network into consideration and shows a significant improvement over existing solutions.
Abstract: The rapid progress of wireless communication and embedded microsensing MEMS technologies has made wireless sensor networks possible. In light of storage in sensors, a sensor network can be considered as a distributed database, in which one can conduct in-network data processing. An important issue of wireless sensor networks is object tracking, which typically involves two basic operations: update and query. This issue has been intensively studied in other areas, such as cellular networks. However, the in-network processing characteristic of sensor networks has posed new challenges to this issue. In this paper, we develop several tree structures for in-network object tracking which take the physical topology of the sensor network into consideration. The optimization process has two stages. The first stage tries to reduce the location update cost based on a deviation-avoidance principle and a highest-weight-first principle. The second stage further adjusts the tree obtained in the first stage to reduce the query cost. The way we model this problem allows us to analytically formulate the cost of object tracking given the update and query rates of objects. Extensive simulations are conducted, which show a significant improvement over existing solutions.

253 citations


Journal ArticleDOI
TL;DR: An overview of DDM applications and algorithms for P2P environments is offered, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead.
Abstract: Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

239 citations


Patent
09 Feb 2006
TL;DR: In this article, the authors describe methods for creating categorized documents, categorizing documents in a distributed database, and categorizing Resulting Pages, as well as an apparatus for searching such databases.
Abstract: Described herein are methods for creating categorized documents, categorizing documents in a distributed database and categorizing Resulting Pages. Also described herein is an apparatus for searching a distributed database. The method for creating categorized documents generally comprises: initially assuming all documents are of type 1; filtering out all type 2 documents and placing them in a first category; filtering out all type 3 documents and placing them in a second category; and defining all remaining documents as type 4 documents and placing all type 4 documents in a third category. The apparatus for searching a distributed database generally comprises at least one memory device; a computing apparatus; an indexer; a transactional score generator; a category assignor; a search server; and a user interface in communication with the search server.

193 citations
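
The staged filtering above maps naturally onto code. Below is a minimal Python sketch of that flow; the predicates `is_type2` and `is_type3` are hypothetical stand-ins, since the patent does not specify how document types are detected.

```python
def categorize(documents, is_type2, is_type3):
    """Staged filtering: every document starts as type 1; type-2 and
    type-3 documents are filtered into the first and second categories,
    and whatever remains is defined as type 4 (third category)."""
    first, second, third = [], [], []
    for doc in documents:              # initially, all documents are type 1
        if is_type2(doc):
            first.append(doc)          # filtered out as type 2
        elif is_type3(doc):
            second.append(doc)         # filtered out as type 3
        else:
            third.append(doc)          # remaining documents become type 4
    return first, second, third

# Toy predicates, purely illustrative:
docs = ["invoice-a", "memo-b", "report-c", "invoice-d"]
print(categorize(docs,
                 is_type2=lambda d: d.startswith("invoice"),
                 is_type3=lambda d: d.startswith("memo")))
```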


Patent
17 May 2006
TL;DR: In this article, a method, computer program, and system are described for executing a utility on a database system having a plurality of database system nodes; the increased load on the database system required to execute the utility is determined and distributed so as to balance the load among the nodes.
Abstract: A method, computer program, and system are disclosed for executing a utility on a database system having a plurality of database system nodes. Each database system node has an existing load. An increased load on the database system required to execute the utility is determined. The existing load on each of the database system nodes is determined. The increased load is distributed in such a way as to balance the load among the database system nodes.

133 citations
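
The patent does not disclose a concrete balancing algorithm, so the following Python sketch is one plausible greedy reading: the utility's additional load is split into chunks (the chunk size is an assumption) and each chunk is assigned to the currently least-loaded node.

```python
import heapq

def distribute_utility_load(existing_loads, increased_load, chunk=1.0):
    """Assign the utility's extra load chunk by chunk, always to the
    node whose total load is currently smallest."""
    heap = [(load, node) for node, load in enumerate(existing_loads)]
    heapq.heapify(heap)
    assigned = [0.0] * len(existing_loads)
    remaining = increased_load
    while remaining > 0:
        portion = min(chunk, remaining)
        load, node = heapq.heappop(heap)      # least-loaded node so far
        assigned[node] += portion
        heapq.heappush(heap, (load + portion, node))
        remaining -= portion
    return assigned

# Nodes with existing loads 5, 2, and 3 share 6 extra units of work;
# all nodes end up with a total load of 6.
print(distribute_utility_load([5.0, 2.0, 3.0], increased_load=6.0))
```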


Patent
05 Oct 2006
TL;DR: In this article, an input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology, and map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key).
Abstract: An input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology. By utilizing such a grouping, map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key). The intermediate results of the map processing (key/value pairs) for a particular key can be processed together in a single reduce function by applying a different iterator to intermediate values for each group. Different iterators can be arranged inside reduce functions in whatever ways are desired.

129 citations
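
A toy Python sketch of the co-grouped reduce idea: two related datasets are mapped independently, and the reduce for each key receives a separate iterator per group. The `orders`/`profiles` datasets and the join-style reduce are illustrative assumptions, not the patent's own example.

```python
from collections import defaultdict

def map_orders(record):                 # dataset 1: (customer_id, amount)
    cust, amount = record
    yield cust, ("orders", amount)

def map_profiles(record):               # dataset 2: (customer_id, name)
    cust, name = record
    yield cust, ("profiles", name)

def reduce_join(key, groups):
    # one iterator per group, used differently within a single reduce call
    names = list(groups.get("profiles", iter(())))
    total = sum(groups.get("orders", iter(())))
    return key, names, total

def run(orders, profiles):
    intermediate = defaultdict(lambda: defaultdict(list))
    for dataset, mapper in ((orders, map_orders), (profiles, map_profiles)):
        for record in dataset:          # map each dataset independently
            for key, (group, value) in mapper(record):
                intermediate[key][group].append(value)
    return [reduce_join(key, {g: iter(vs) for g, vs in groups.items()})
            for key, groups in intermediate.items()]

print(run(orders=[(1, 9.5), (1, 3.0), (2, 7.0)],
          profiles=[(1, "ada"), (2, "bob")]))
```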


Proceedings ArticleDOI
Ali Inan, Y. Saygyn, Erkay Savas, Ayca Azgin Hintoglu, Albert Levi
03 Apr 2006
TL;DR: Methods are proposed for constructing the dissimilarity matrix of objects from different sites in a privacy-preserving manner; the matrix can then be used for privacy-preserving clustering as well as database joins, record linkage, and other operations that require pair-wise comparison of individual private data objects horizontally distributed across multiple sites.
Abstract: Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. The power of data mining tools to extract hidden information that cannot be otherwise seen by simple querying proved to be useful. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites.

101 citations


Patent
09 Aug 2006
TL;DR: In this article, a synchronization system is provided that distributes synchronization system-based applications and their associated resources and components (hereinafter plug-in applications or plug-ins).
Abstract: A synchronization system is provided that distributes synchronization system-based applications and synchronization system-based application extensions and their associated resources and components (hereinafter “plug-in applications” or “plug-ins”). Components are maintained such that any synchronization system-based application instantiation may be changed or updated by the synchronization system. In one specific example using the synchronization system, each synchronization system-based application or plug-in is self-contained and self-updateable through a synchronization system synchronization process. A further benefit is that the synchronization system and synchronization system-based applications may be extended independent of device type or operating system. Thus, a system is provided for synchronizing one or more plug-in applications. In one example, the system for synchronizing plug-in applications includes a synchronization system having at least one distributed database that is configured to store a plug-in application, and a schema for the database. Optionally, the distributed database may be configured to store plug-in application instantiation information, synchronization system-based application association information, role, permissions, access control rights, and data associated with the plug-in application. In one example, each distributed database has at least two instances, and the plug-in application (and optional resources and components) is stored in at least one instance of the distributed database. As described herein, the synchronization system is configured to synchronize the plug-in application (and optional resources and components) between the instances of said distributed database.

98 citations


Journal ArticleDOI
TL;DR: This paper defines and analyzes a very simple algorithm called EASE (Exponential Age Search) and shows that, in a model where Theta(n) nodes perform independent random walks on a square lattice of size n, the lengths of the routes computed by EASE are of the same order as the distance between the source and destination, even for very large n.
Abstract: Routing in large-scale mobile ad hoc networks is challenging because all the nodes are potentially moving. Geographic routing can partially alleviate this problem, as nodes can make local routing decisions based solely on the destinations' geographic coordinates. However, geographic routing still requires an efficient location service, i.e., a distributed database recording the location of every destination node. Devising efficient, scalable, and robust location services has received considerable attention in recent years. The main purpose of this paper is to show that node mobility can be exploited to disseminate destination location information without incurring any communication overhead. We achieve this by letting each node maintain a local database of the time and location of its last encounter with every other node in the network. This database is consulted by packets to obtain estimates of their destination's current location. As a packet travels towards its destination, it is able to successively refine an estimate of the destination's precise location, because node mobility has "diffused" estimates of that location. We define and analyze a very simple algorithm called EASE (Exponential Age Search) and show that in a model where Theta(n) nodes perform independent random walks on a square lattice of size n, the lengths of the routes computed by EASE are of the same order as the distance between the source and destination even for very large n. Therefore, without disseminating any explicit location information, the lengths of EASE routes are within a constant factor of routes obtained with perfect information. We discuss refinements of the EASE algorithm and evaluate it through extensive simulations. We discuss general conditions such that the mobility diffusion effect leads to efficient routes without an explicit location service. In practical settings, where these conditions may not always be met, we believe that the mobility diffusion effect can complement existing location services and enhance their robustness and scalability.
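
A toy Python sketch of the last-encounter mechanism that EASE builds on; the `refine_estimate` helper and the use of global timestamps are simplifying assumptions for illustration. Each node records the time and place of its last meeting with every other node, and a packet keeps the freshest estimate it encounters.

```python
class Node:
    """Node with a local last-encounter database."""
    def __init__(self, node_id, pos):
        self.id = node_id
        self.pos = pos
        self.last_encounter = {}              # other_id -> (time, position)

    def meet(self, other, now):
        # both nodes record the time and place of this encounter
        self.last_encounter[other.id] = (now, other.pos)
        other.last_encounter[self.id] = (now, self.pos)

def refine_estimate(current, relay, dest_id):
    """A packet keeps the freshest (time, position) estimate seen so far."""
    local = relay.last_encounter.get(dest_id)
    if local is not None and (current is None or local[0] > current[0]):
        return local
    return current

# Node b met the destination long ago; node a met it more recently.
dest = Node("d", (9, 9))
a, b = Node("a", (0, 0)), Node("b", (5, 5))
b.meet(dest, now=10)
a.meet(dest, now=50)
estimate = None
for relay in (b, a):                          # packet passes b, then a
    estimate = refine_estimate(estimate, relay, "d")
print(estimate)                               # -> (50, (9, 9))
```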

Patent
11 May 2006
TL;DR: In this article, a distributed database system (100) within an industrial automation environment comprises a plurality of associated programmable logic controllers (104, 106, 108), each of which includes data (110, 112, 114) relating to one of a process and a device.
Abstract: A distributed database system (100) within an industrial automation environment comprises a plurality of associated programmable logic controllers (104, 106, 108), wherein each of the programmable logic controllers (104, 106, 108) includes data (110, 112, 114) relating to one of a process and a device. Furthermore, the data within the plurality of programmable logic controllers (104, 106, 108) can conform to a hierarchically structured data model, which, for example, can be based upon ISA S95, ISA S88, OMAC, or any suitable combination thereof. A reception component (102) receives and services a request for data that is located within at least one of the programmable logic controllers (104, 106, 108).

Journal Article
TL;DR: In this paper, a distributed server infrastructure is introduced to partition the entire service region into a set of service zones and cooperatively handle requests of continuous range queries, which improves the robustness and flexibility of the system by adapting to a time-varying set of servers.
Abstract: Recent work on continuous queries has focused on processing queries in very large, mobile environments. In this paper, we propose a system leveraging the computing capacities of mobile devices for continuous range query processing. In our design, continuous range queries are mainly processed on the mobile device side, which is able to achieve real-time updates with minimum server load. Our work distinguishes itself from previous work with several important contributions. First, we introduce a distributed server infrastructure to partition the entire service region into a set of service zones and cooperatively handle requests for continuous range queries. This feature improves the robustness and flexibility of the system by adapting to a time-varying set of servers. Second, we propose a novel query indexing structure, which records the difference of the query distribution on a grid model. This approach significantly reduces the size and complexity of the index so that in-memory indexing can be achieved on mobile objects with constrained memory size. We report on the rigorous evaluation of our design, which shows substantial improvement in the efficiency of continuous range query processing in mobile environments.

Proceedings ArticleDOI
18 Apr 2006
TL;DR: It is demonstrated that this separation between ordering and durability in a replicated database causes a significant scalability bottleneck; Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware and treats an unmodified database as a black box.
Abstract: In stand-alone databases, the functions of ordering the transaction commits and making the effects of transactions durable are performed in one single action, namely the writing of the commit record to disk. For efficiency, many of these writes are grouped into a single disk operation. In replicated databases in which all replicas agree on the commit order of update transactions, these two functions are typically separated. Specifically, the replication middleware determines the global commit order, while the database replicas make the transactions durable. The contribution of this paper is to demonstrate that this separation causes a significant scalability bottleneck. It forces some of the commit records to be written to disk serially, where in a standalone system they could have been grouped together in a single disk write. Two solutions are possible: (1) move durability from the database to the replication middleware, or (2) keep durability in the database and pass the global commit order from the replication middleware to the database. We implement these two solutions. Tashkent-MW is a pure middleware solution that combines durability and ordering in the middleware, and treats an unmodified database as a black box. In Tashkent-API, we modify the database API so that the middleware can specify the commit order to the database, thus combining ordering and durability inside the database. We compare both Tashkent systems to an otherwise identical replicated system, called Base, in which ordering and durability remain separated. Under high update transaction loads, both Tashkent systems greatly outperform Base in throughput and response time.
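
A minimal Python sketch of the Tashkent-MW idea of recombining ordering and durability in the middleware: commit records arrive tagged with the agreed global order and become durable together in one grouped disk write. The `GroupCommitLog` class and its API are assumptions for illustration, not the paper's implementation.

```python
import os

class GroupCommitLog:
    """Middleware-side log that groups ordered commit records into a
    single durable write instead of forcing serial per-commit writes."""
    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []                     # (global_order, record)

    def submit(self, global_order, record):
        self.pending.append((global_order, record))

    def flush(self):
        # write every pending record in global commit order, then make
        # the whole group durable with one fsync
        self.pending.sort(key=lambda pair: pair[0])
        for _, record in self.pending:
            self.file.write(record + b"\n")
        self.file.flush()
        os.fsync(self.file.fileno())
        self.pending.clear()

log = GroupCommitLog("commit.log")
log.submit(2, b"txn-B")
log.submit(1, b"txn-A")
log.flush()            # txn-A then txn-B become durable in one group
log.file.close()
```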

Dissertation
01 Jul 2006
TL;DR: This paper introduces a heuristic for the selection of resources based on a solution to the Set Covering Problem (SCP), pairs this mapping heuristic with the well-known MinMin scheduling algorithm, and conducts performance evaluation through extensive simulations.
Abstract: The next generation of scientific experiments and studies are being carried out by large collaborations of researchers distributed around the world engaged in analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for such collaborations as it aids communities in sharing resources to achieve common objectives. Data Grids provide services for accessing, replicating and managing data collections in these collaborations. Applications used in such Grids are distributed data-intensive, that is, they access and process distributed datasets to generate results. These applications need to transparently and efficiently access distributed data and computational resources. This thesis investigates properties of data-intensive computing environments and presents a software framework and algorithms for mapping distributed data-oriented applications to Grid resources. The thesis discusses the key concepts behind Data Grids and compares them with other data sharing and distribution mechanisms such as content delivery networks, peer-to-peer networks and distributed databases. This thesis provides comprehensive taxonomies that cover various aspects of Data Grid architecture, data transportation, data replication and resource allocation and scheduling. The taxonomies are mapped to various Data Grid systems not only to validate the taxonomy but also to better understand their goals and methodology. The thesis concentrates on one of the areas delineated in the taxonomy – scheduling distributed data-intensive applications on Grid resources. To this end, it presents the design and implementation of a Grid resource broker that mediates access to distributed computational and data resources running diverse middleware. The broker is able to discover remote data repositories, interface with various middleware services and select suitable resources in order to meet the application requirements. The use of the broker is illustrated by a case study of scheduling a data-intensive high energy physics analysis application on an Australia-wide Grid. The broker provides the framework to realise scheduling strategies with differing objectives. One of the key aspects of any scheduling strategy is the mapping of jobs to the appropriate resources to meet the objectives. This thesis presents heuristics for mapping jobs with data dependencies in an environment with heterogeneous Grid resources and multiple data replicas. These heuristics are then compared with performance evaluation metrics obtained through extensive simulations.

Journal ArticleDOI
TL;DR: An efficient technique is presented for real-time processing of range-monitoring queries; it is highly scalable in supporting location-based services in a wireless environment that consists of a large number of mobile devices.
Abstract: Unlike conventional range queries, a range-monitoring query is a continuous query. It requires retrieving mobile objects inside a user-defined region and providing continuous updates as the objects move into and out of the region. In this paper, we present an efficient technique for real-time processing of such queries. In our approach, each mobile object is associated with a resident domain, and when an object moves, it monitors its spatial relationship with its resident domain and the monitoring areas inside it. An object reports its location to the server when it crosses over some query boundary or moves out of its resident domain. In the first case, the server updates the affected query results accordingly, while in the second case, the server determines a new resident domain for the object. This distributive approach achieves an accurate and real-time monitoring effect with minimal mobile communication and server processing costs. Our approach also allows a mobile object to negotiate a resident domain based on its computing capability. By having a larger resident domain, a more capable object has less of a chance of moving out of it and having to request a new one. As a result, both communication and server processing costs are reduced. Our comprehensive performance study shows that the proposed technique can be highly scalable in supporting location-based services in a wireless environment that consists of a large number of mobile devices.
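
A small Python sketch of the client-side decision rule described above, under the simplifying assumption that resident domains and monitoring areas are axis-aligned rectangles; the `on_move` handler is an illustrative construction, not the paper's code.

```python
def inside(rect, x, y):
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2

def on_move(obj_pos, resident_domain, monitoring_areas, prev_inside):
    """Return (event, inside_flags) for one position update: "new_domain"
    when the object leaves its resident domain, "report" when it crosses
    any query boundary, or None when nothing needs to be sent."""
    x, y = obj_pos
    if not inside(resident_domain, x, y):
        return "new_domain", prev_inside     # server assigns a new domain
    now_inside = [inside(area, x, y) for area in monitoring_areas]
    if now_inside != prev_inside:            # crossed some query boundary
        return "report", now_inside
    return None, now_inside                  # purely local, no message

# One monitoring area inside a larger resident domain:
domain = (0, 0, 100, 100)
areas = [(40, 40, 60, 60)]
print(on_move((50, 50), domain, areas, prev_inside=[False]))  # report
print(on_move((55, 55), domain, areas, prev_inside=[True]))   # silent
print(on_move((150, 50), domain, areas, prev_inside=[True]))  # new_domain
```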

Journal ArticleDOI
TL;DR: This paper presents an efficient and privacy-preserving protocol to construct a Bayesian network from a database vertically partitioned between two parties; in this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other.
Abstract: Traditionally, many data mining techniques have been designed in the centralized model in which all data is collected and available in one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by business, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent the parties from sharing their data. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned among two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data.

Patent
14 Feb 2006
TL;DR: In this article, a system includes a client which can communicate through a network and a database layer with any of several databases, in a manner independent of respective protocols specific to each of the databases.
Abstract: A system includes a client which can communicate through a network and a database layer with any of several databases. The client communicates with the database layer using a public network communication protocol, in a manner independent of respective protocols specific to each of the databases. The database layer handles communication with each database according to the respective protocol of that database.
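
A minimal Python sketch of the database layer as an adapter: the client issues one kind of request, and the layer translates it to each backend's own protocol. The `PostgresDriver` and `MongoDriver` classes are hypothetical stand-ins; the patent names no particular databases.

```python
class PostgresDriver:
    """Stand-in for a backend speaking a SQL wire protocol."""
    def run(self, sql):
        return f"postgres executed: {sql}"

class MongoDriver:
    """Stand-in for a backend speaking a document-query protocol."""
    def find(self, query):
        return f"mongo executed: {query}"

class DatabaseLayer:
    """Single entry point that hides each database's specific protocol."""
    def __init__(self):
        self.backends = {"pg": PostgresDriver(), "mongo": MongoDriver()}

    def query(self, backend, request):
        db = self.backends[backend]
        if backend == "pg":
            return db.run(request)    # translate to the SQL protocol
        return db.find(request)       # translate to the document protocol

layer = DatabaseLayer()
print(layer.query("pg", "SELECT 1"))
print(layer.query("mongo", {"x": 1}))
```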

Proceedings ArticleDOI
18 Apr 2006
TL;DR: This paper presents an application logging, monitoring, and debugging facility that is built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor, to demonstrate a range of on-line distributed diagnosis tools.
Abstract: Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system - to detect and analyze bugs, test for regressions, identify fault-tolerance problems or security compromises - can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to tackle these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a range of on-line distributed diagnosis tools that range from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small, simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of our approach to improving and monitoring running distributed systems continuously is well in tune with its benefits.

Journal ArticleDOI
TL;DR: This article explores the benefits of dynamically trading consistency for availability using a continuous consistency model, in which applications specify a maximum deviation from strong consistency on a per-replica basis.
Abstract: As raw system performance continues to improve at exponential rates, the utility of many services is increasingly limited by availability rather than performance. A key approach to improving availability involves replicating the service across multiple, wide-area sites. However, replication introduces well-known trade-offs between service consistency and availability. Thus, this article explores the benefits of dynamically trading consistency for availability using a continuous consistency model. In this model, applications specify a maximum deviation from strong consistency on a per-replica basis. In this article, we: i) evaluate the availability of a prototype replication system running across the Internet as a function of consistency level, consistency protocol, and failure characteristics, ii) demonstrate that simple optimizations to existing consistency protocols result in significant availability improvements (more than an order of magnitude in some scenarios), iii) use our experience with these optimizations to prove a tight upper bound on the availability of services, and iv) show that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity, resulting in a communication versus availability trade-off.
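
A toy Python sketch of the per-replica bound in a continuous consistency model. Counting unpropagated writes as the deviation metric is a simplifying assumption here; the actual model admits richer application-specified metrics.

```python
class BoundedReplica:
    """Replica that tolerates a bounded number of unpropagated writes
    before it must synchronize with its peers."""
    def __init__(self, max_deviation, sync):
        self.max_deviation = max_deviation   # per-replica consistency bound
        self.sync = sync                     # anti-entropy with other replicas
        self.unsynced = 0

    def write(self):
        if self.unsynced >= self.max_deviation:
            self.sync()                      # pulled back toward strong consistency
            self.unsynced = 0
        self.unsynced += 1                   # accept the write locally

replica = BoundedReplica(max_deviation=3,
                         sync=lambda: print("synchronizing with peers"))
for _ in range(5):
    replica.write()                          # the fourth write forces a sync
```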

Patent
08 May 2006
TL;DR: In this paper, the authors present a method of controlling a plurality of forwarding databases provided in an Ethernet bridge having a plurality of devices; the method includes aging a first set of entries in a first forwarding database maintained by a first one of the devices.
Abstract: A method of controlling a plurality of forwarding databases provided in an Ethernet bridge having a plurality of devices. The method includes aging a first set of entries in a first forwarding database maintained by a first one of the plurality of devices. The first set of entries are owned by the first one of the plurality of devices. The method also includes transmitting one or more new address messages from the first one of the plurality of devices to a second one of the plurality of devices. The method further includes aging a second set of entries in the first forwarding database. The second set of entries are owned by the second one of the plurality of devices.
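
A small Python sketch of per-owner aging as the patent describes it; the `ForwardingDB` structure and the age threshold are illustrative assumptions.

```python
class ForwardingDB:
    """Forwarding database whose entries are aged per owning device."""
    def __init__(self):
        self.entries = {}                    # mac -> (owner_device, age)

    def learn(self, mac, owner):
        # a local lookup or a "new address" message from another device
        self.entries[mac] = (owner, 0)

    def age_owned_by(self, owner, max_age=3):
        """Advance the age of one owner's entries; evict expired ones."""
        expired = []
        for mac, (own, age) in list(self.entries.items()):
            if own != owner:
                continue                     # another device owns this entry
            if age + 1 >= max_age:
                expired.append(mac)
                del self.entries[mac]
            else:
                self.entries[mac] = (own, age + 1)
        return expired

db = ForwardingDB()
db.learn("aa:bb", owner=0)                   # owned by device 0
db.learn("cc:dd", owner=1)                   # learned from device 1's message
print(db.age_owned_by(0, max_age=1))         # evicts only device 0's entry
```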

Proceedings ArticleDOI
11 Dec 2006
TL;DR: This work introduces the formal concept of traceability networks and highlights the technical challenges involved in sharing data in such a network, and presents an innovative combination of query processing techniques from P2P networks and distributed as well as parallel databases with confidentiality enforcement techniques.
Abstract: Tracking and tracing individual items is a new and emerging trend in many industries. Driven by maturing technologies such as Radio-Frequency Identification (RFID) and upcoming standards such as the Electronic Product Code (EPC), a rapidly increasing number of enterprises are collecting vast amounts of tracking data. To enable traceability over the entire life-cycle of items, data has to be shared across independent and possibly competing enterprises. The need to simultaneously compete and cooperate requires a traceability system design that allows companies to share their traceability data while maintaining complete sovereignty over what is shared and with whom. Based on an extensive study of traceability applications, we introduce the formal concept of traceability networks and highlight the technical challenges involved in sharing data in such a network. To address these challenges, we present an innovative combination of query processing techniques from P2P networks and distributed as well as parallel databases with confidentiality enforcement techniques.

Book ChapterDOI
29 Oct 2006
TL;DR: This paper formalizes and analyzes the operator placement problem in the context of a locally distributed continuous query system and proposes a solution, asynchronous and local, to dynamically manage the load across the system nodes.
Abstract: In a distributed processing environment, the static placement of query operators may result in unsatisfactory system performance due to unpredictable factors such as changes of servers' load, data arrival rates, etc. The problem is exacerbated for continuous (and long-running) monitoring queries over data streams, as any suboptimal placement will affect the system for a very long time. In this paper, we formalize and analyze the operator placement problem in the context of a locally distributed continuous query system. We also propose a solution, asynchronous and local, to dynamically manage the load across the system nodes. Essentially, during runtime, we migrate query operators/fragments from overloaded nodes to lightly loaded ones to achieve better performance. Heuristics are also proposed to maintain good data flow locality. Results of a performance study show the effectiveness of our technique.
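
The paper's heuristics are not reproduced here; the Python sketch below is one plausible local, greedy reading in which an overloaded node sheds its cheapest operators to its least-loaded neighbour (the threshold and cost numbers are illustrative).

```python
def offload(node_load, operators, neighbour_loads, threshold):
    """Return a list of (operator, target) migrations for one node."""
    migrations = []
    # move the cheapest operators first to minimise disruption
    for op, cost in sorted(operators.items(), key=lambda kv: kv[1]):
        if node_load <= threshold:
            break                            # no longer overloaded
        target = min(neighbour_loads, key=neighbour_loads.get)
        # only migrate if it does not simply overload the neighbour
        if neighbour_loads[target] + cost >= threshold:
            break
        migrations.append((op, target))
        node_load -= cost
        neighbour_loads[target] += cost
    return migrations

# An overloaded node (load 10, threshold 7) with two neighbours:
print(offload(node_load=10,
              operators={"join": 4, "filter": 1, "agg": 3},
              neighbour_loads={"n1": 2, "n2": 5},
              threshold=7))
```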

Proceedings ArticleDOI
04 Jul 2006
TL;DR: This paper provides a broad overview of relevant quality-of-service metrics and describes their specific meaning in the context of distributed and decentralized publish-subscribe systems to provide a common base for future evaluations of emerging systems and for the design of quality-of-service aware publish-subscribe infrastructures.
Abstract: Publish-subscribe is a powerful paradigm for distributed communication based on decoupled producers and consumers of information. Its event-driven nature makes it very appealing for large-scale data dissemination infrastructures. Various architectures were proposed in recent years that provide very diverse features. However, there are few well-defined metrics in the publish-subscribe area that would allow their evaluation and comparison. In this paper, we provide a broad overview of relevant quality-of-service metrics and describe their specific meaning in the context of distributed and decentralized publish-subscribe systems. Our goal is to provide a common base for future evaluations of emerging systems and for the design of quality-of-service aware publish-subscribe infrastructures.

Journal ArticleDOI
TL;DR: In all the state spaces considered, the use of multiple smaller pattern databases reduces the number of nodes generated by IDA*.

Journal ArticleDOI
01 Jul 2006
TL;DR: This paper proposes a mobile clinical information system (MobileMed), which integrates the distributed and fragmented patient data across heterogeneous sources and makes them accessible through mobile devices and provides a means for effortless implementation and deployment of such systems.
Abstract: Patient clinical data are distributed and often fragmented in heterogeneous systems, and therefore the need for information integration is a key to reliable patient care. Once the patient data are orderly integrated and readily available, the problems in accessing the distributed patient clinical data, the well-known difficulties of adopting a mobile health information system, are resolved. This paper proposes a mobile clinical information system (MobileMed), which integrates the distributed and fragmented patient data across heterogeneous sources and makes them accessible through mobile devices. The system consists of four main components: a smart interface, an HL7 message server (HMS), a central clinical database (CCDB), and a web server. The smart interface and the HMS work in concert to generate HL7 messages from the existing legacy systems, which essentially send the patient data in HL7 messages to the CCDB to be stored and maintained. The CCDB and the web server enable the physicians to access the integrated up-to-date patient data. By proposing the smart interface approach, we provide a means for effortless implementation and deployment of such systems. Through a performance study, we show that the HMS is reliable yet fast enough to be able to support efficient clinical data communication.

Book ChapterDOI
26 Mar 2006
TL;DR: This work presents a simple extension to the AXML language, allowing it to declaratively specify and deploy complex applications based solely on XML and XML queries, and enables numerous powerful optimizations across a distributed complex process.
Abstract: As data management applications grow more complex, they may need efficient distributed query processing, but also subscription management, data archival etc. To enact such applications, the current solution consists of stacking several systems together. The juxtaposition of different computing models prevents reasoning on the application as a whole, and wastes important opportunities to improve performance. We present a simple extension to the AXML [7] language, allowing it to declaratively specify and deploy complex applications based solely on XML and XML queries. Our main contribution is a full algebraic model for complex distributed AXML computations. While very expressive, the model is conceptually uniform, and enables numerous powerful optimizations across a distributed complex process.

Proceedings ArticleDOI
18 Apr 2006
TL;DR: The proposed dynamic replication system allocates replicas to applications in order to maintain application-level performance in response to either peak loads or failure conditions; it requires fewer resources than static partitioning or full overlap replication policies and provides over 90% latency compliance to each application under a range of load and failure scenarios.
Abstract: The database tier of dynamic content servers at large Internet sites is typically hosted on centralized and expensive hardware. Recently, research prototypes have proposed using database replication on commodity clusters as a more economical scaling solution. In this paper, we propose using database replication to support multiple applications on a shared cluster. Our system dynamically allocates replicas to applications in order to maintain application-level performance in response to either peak loads or failure conditions. This approach allows unifying load and fault management functionality. The main challenge in the design of our system is the time taken to add database replicas. We present replica allocation policies that take this time delay into account and also design an efficient replica addition method that has minimal impact on other applications. We evaluate our dynamic replication system on a commodity cluster with two standard benchmarks: the TPC-W e-commerce benchmark and the RUBIS auction benchmark. Our evaluation shows that dynamic replication requires fewer resources than static partitioning or full overlap replication policies and provides over 90% latency compliance to each application under a range of load and failure scenarios.

MonographDOI
01 Jan 2006
TL;DR: Among other systems, this book describes a hierarchical multi-sensor framework for event detection in wide environments and a distributed database for effective management and evaluation of CCTV systems.
Abstract:
* Chapter 1: A review of the state-of-the-art in distributed surveillance systems
* Chapter 2: Monitoring practice: event detection and system design
* Chapter 3: A distributed database for effective management and evaluation of CCTV systems
* Chapter 4: A distributed domotic surveillance system
* Chapter 5: A general-purpose system for distributed surveillance and communication
* Chapter 6: Tracking objects across uncalibrated, arbitrary topology camera networks
* Chapter 7: A distributed multi-sensor surveillance system for public transport applications
* Chapter 8: Tracking football players with multiple cameras
* Chapter 9: A hierarchical multi-sensor framework for event detection in wide environments

Book ChapterDOI
13 Dec 2006
TL;DR: It is shown how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting, especially in the form of log-linear analysis and logistic regression over partitioned databases.
Abstract: The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and logistic regression over partitioned databases, while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting.
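
As one concrete example of the "secure summation" building block this chapter adapts, here is a toy Python sketch of the classic ring protocol in the honest-but-curious model; the modulus and the single initiator mask are standard simplifications.

```python
import random

MODULUS = 2**32                       # counts are summed modulo a large base

def secure_sum_ring(private_counts):
    """Ring-based secure summation: the initiator adds a random mask,
    each site adds its private count to the running total it receives,
    and only the initiator can remove the mask at the end, so no site
    learns another site's individual value."""
    rng = random.SystemRandom()
    mask = rng.randrange(MODULUS)     # known only to the initiator
    running = mask
    for count in private_counts:      # the message travels around the ring
        running = (running + count) % MODULUS
    return (running - mask) % MODULUS

# Three sites each hold one private cell count of a shared contingency
# table; the protocol reveals only the total.
print(secure_sum_ring([12, 7, 30]))   # -> 49
```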