
Showing papers by "Rajeev Rastogi" published in 2005


Proceedings Article
14 Jun 2005
TL;DR: It is proved that finding minimal-cost repairs in this model is NP-complete in the size of the database, and an approach to heuristic repair-construction based on equivalence classes of attribute values is introduced.
Abstract: Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.
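To make the equivalence-class idea concrete, here is a minimal sketch under simplifying assumptions: a single functional dependency `lhs -> rhs`, value-modification repairs only, and a "most frequent value wins" rule in place of the paper's record-linkage cost model. Cells that the dependency forces to be equal are merged with a union-find structure (which would also let classes merge across several dependencies), and every cell in a class is then set to one value. The names `UnionFind` and `repair_fd` are hypothetical; this is not the paper's greedy algorithm.

```python
from collections import Counter, defaultdict


class UnionFind:
    """Disjoint-set forest over hashable items (here: (row, attribute) cells)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def repair_fd(rows, lhs, rhs):
    """Toy repair of the FD lhs -> rhs by modifying rhs values.

    The rhs cells of all rows that agree on lhs form one equivalence class;
    each class is assigned its most common rhs value. Returns new rows and
    leaves the input untouched.
    """
    uf = UnionFind()
    first_row_for_key = {}
    for i, row in enumerate(rows):
        cell = (i, rhs)
        uf.find(cell)  # register the cell as a singleton class
        key = row[lhs]
        if key in first_row_for_key:
            uf.union(cell, (first_row_for_key[key], rhs))
        else:
            first_row_for_key[key] = i
    classes = defaultdict(list)
    for i in range(len(rows)):
        classes[uf.find((i, rhs))].append(i)
    repaired = [dict(row) for row in rows]
    for members in classes.values():
        target, _ = Counter(rows[i][rhs] for i in members).most_common(1)[0]
        for i in members:
            repaired[i][rhs] = target
    return repaired
```

For example, three rows with `zip = "12345"` whose `city` values are `"Springfield"`, `"Sprngfield"`, `"Springfield"` collapse to one class and are all repaired to `"Springfield"`.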

436 citations


Proceedings Article
14 Jun 2005
TL;DR: This work presents the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network.
Abstract: While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space efficient (at each remote site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality estimates. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking complex holistic aggregates in such a distributed-streams setting; our primary focus is on approximate quantile summaries, but our approach is more broadly applicable and can handle other holistic-aggregate functions (e.g., "heavy-hitters" queries). We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network. In a nutshell, our algorithms employ a combination of local tracking at remote sites and simple prediction models for local site behavior in order to produce highly communication- and space-efficient solutions. We perform extensive experiments with real and synthetic data to explore the various tradeoffs and understand the role of prediction models in our schemes. The results clearly validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.
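The combination of local tracking and a prediction model can be illustrated with a deliberately small toy. Each hypothetical `RemoteSite` tracks the rank of one fixed value `v` exactly (a real scheme would keep a small-space approximate quantile summary instead), and the coordinator uses a static prediction model: it assumes a site's rank has not changed since the last report. A site sends a message only when its true rank drifts more than `theta * n` from that prediction, so the coordinator's summed estimate is always within `theta` times the total stream size. The class, the `simulate` helper, and the parameters are illustrative assumptions, not the paper's schemes.

```python
import random


class RemoteSite:
    """Tracks the local rank of a fixed query value v under a static prediction model."""

    def __init__(self, v, theta):
        self.v, self.theta = v, theta
        self.n = 0          # items observed at this site
        self.rank = 0       # exact local count of items <= v
        self.reported = 0   # last rank value sent to the coordinator
        self.messages = 0   # number of reports sent

    def observe(self, x):
        self.n += 1
        if x <= self.v:
            self.rank += 1
        # Report only when the coordinator's prediction has become too stale.
        if abs(self.rank - self.reported) > self.theta * self.n:
            self.reported = self.rank
            self.messages += 1


def simulate(num_sites=4, v=0.5, theta=0.05, n=10_000, seed=0):
    """Feed uniform random values to randomly chosen sites; return tracking stats."""
    rng = random.Random(seed)
    sites = [RemoteSite(v, theta) for _ in range(num_sites)]
    for _ in range(n):
        rng.choice(sites).observe(rng.random())
    true_rank = sum(s.rank for s in sites)
    estimate = sum(s.reported for s in sites)  # coordinator's view
    messages = sum(s.messages for s in sites)
    return true_rank, estimate, messages
```

The static model is the simplest choice; a model that also predicts how fast a site's rank grows would let the coordinator extrapolate between reports and could reduce messages further.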

234 citations


Journal Article
TL;DR: This is the first study to provide a rigorous solution, with proven guarantees, to the combined problem of computing QoS paths with restoration; the proposed algorithms construct a restoration topology, i.e., a set of bridges, each protecting a portion of the primary QoS path.
Abstract: There is a growing interest among service providers in offering new services with Quality of Service (QoS) guarantees that are also resilient to failures. Supporting QoS connections requires a routing mechanism that computes QoS paths, i.e., paths that satisfy QoS constraints (e.g., delay or bandwidth). Resilience to failures, on the other hand, is achieved by providing, for each primary QoS path, a set of alternative QoS paths used upon the failure of either a link or a node. These objectives, coupled with the need to minimize the global use of network resources, imply that the cost of both the primary path and the restoration topology should be a major consideration of the routing process. We undertake a comprehensive study of problems related to finding suitable restoration topologies for QoS paths. We consider both bottleneck QoS constraints, such as bandwidth, and additive QoS constraints, such as delay and jitter. This is the first study to provide a rigorous solution, with proven guarantees, to the combined problem of computing QoS paths with restoration. It turns out that the widely used approach of disjoint primary and restoration paths is not an optimal strategy. Hence, the proposed algorithms construct a restoration topology, i.e., a set of bridges, each bridge protecting a portion of the primary QoS path. This approach is guaranteed to find a restoration topology with low cost when one exists.
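The distinction between bottleneck and additive constraints matters algorithmically. A bottleneck constraint such as bandwidth can be handled by simply discarding every link that is too thin and then running an ordinary shortest-path search on cost; additive constraints such as delay do not reduce this way. The sketch below shows only that standard pruning step for a primary path (it does not build a restoration topology or bridges), and the function name and edge format are assumptions for illustration.

```python
import heapq


def min_cost_qos_path(edges, src, dst, min_bw):
    """Cheapest src->dst path using only links with bandwidth >= min_bw.

    edges: iterable of (u, v, cost, bandwidth) for undirected links.
    Returns (total_cost, [nodes]) or None if no feasible path exists.
    """
    adj = {}
    for u, v, cost, bw in edges:
        if bw >= min_bw:  # bottleneck constraint: prune thin links up front
            adj.setdefault(u, []).append((v, cost))
            adj.setdefault(v, []).append((u, cost))
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None
```

With links `a-b` (cost 1, bandwidth 10), `b-d` (cost 1, bandwidth 2), `a-c` and `c-d` (cost 2, bandwidth 10), a demand of 1 uses the cheap route `a-b-d`, while a demand of 5 is forced onto `a-c-d`.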

72 citations


Journal Article
TL;DR: This paper initiates a theoretical study of MPLS protocols, giving routing algorithms and lower bounds for a variety of situations; it also shows that planar graphs and graphs with bounded separators have tree covers of logarithmic size, a result that may be of independent interest.
Abstract: A new packet routing model proposed by the Internet Engineering Task Force is MultiProtocol Label Switching, or MPLS [B. Davie and Y. Rekhter, MPLS: Technology and Applications, Morgan Kaufmann (Elsevier), New York, 2000]. Instead of each router parsing the packet's network-layer header and performing its lookups based on that analysis (as in much of conventional packet routing), MPLS ensures that the header is analyzed just once. The packet is then assigned a stack of labels, where the labels are usually much smaller than the packet headers themselves. When a router receives a packet, it examines the label at the top of the label stack and decides where to forward the packet based solely on that label. Before forwarding the packet, it may pop the top label off the stack and may also push new labels onto it. This scheme has several advantages over conventional routing protocols, the two primary ones being (a) a reduced amount of header analysis at intermediate routers, which allows for faster switching times, and (b) better traffic-engineering capabilities and hence easier handling of quality-of-service issues. However, essentially nothing is known at a theoretical level about the performance one can achieve with this protocol, or about the intrinsic trade-offs in its use of resources. This paper initiates a theoretical study of MPLS protocols and gives routing algorithms and lower bounds for a variety of situations. We first study the routing problem on the line, a case which is already nontrivial, and give routing protocols whose trade-offs are close to optimality. We then extend our results for paths to trees, and thence to more general graphs.
These routing algorithms on general graphs are obtained by finding a tree cover of a graph, i.e., a small family of subtrees of the graph such that, for each pair of vertices, one of the trees in the family contains an (almost-)shortest path between them. Our results show tree covers of logarithmic size for planar graphs and graphs with bounded separators, which may be of independent interest.
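The label-stack forwarding rule described in this abstract can be simulated in a few lines. In this sketch each router's table maps a top label to a next hop and a list of labels to push; the router always pops the top label first, so a plain pop pushes nothing and a swap pushes one label. The table layout, router names, and the hop limit are hypothetical choices for illustration, not part of the MPLS specification or the paper's protocols.

```python
def forward(tables, ingress, stack, max_hops=64):
    """Forward a packet through MPLS-style routers; return the routers visited.

    tables[router][label] = (next_hop, labels_to_push). A router looks only at
    the top label, pops it, pushes labels_to_push (the last element ends up on
    top), and sends the packet to next_hop; next_hop None means deliver here.
    """
    stack = list(stack)  # top of the stack is the last element
    path = [ingress]
    router = ingress
    for _ in range(max_hops):
        if not stack:
            return path  # no labels left: the packet leaves the labeled path
        next_hop, push = tables[router][stack.pop()]
        stack.extend(push)
        if next_hop is None:
            return path
        router = next_hop
        path.append(router)
    raise RuntimeError("hop limit exceeded (forwarding loop?)")
```

For example, if `R1` maps label 10 to `("R2", [99, 20])`, `R2` maps 20 to `("R3", [])`, and `R3` maps 99 to `("R4", [])`, a packet entering `R1` with stack `[10]` travels `R1, R2, R3, R4`; only the top label is ever consulted at each hop.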

48 citations
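The tree-cover definition in the MPLS abstract above can be checked mechanically: for every pair of vertices, take the best tree in the family that contains both, and compare its tree distance with the graph distance. The sketch below computes the worst such ratio (the stretch) for an unweighted, connected graph using breadth-first search; the helper names are hypothetical, and the paper's logarithmic-size constructions for planar graphs are not reproduced.

```python
from collections import deque
from itertools import combinations


def to_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


def tree_cover_stretch(graph_edges, trees):
    """Worst-case d_T(u, v) / d_G(u, v) over all vertex pairs, where T is the
    best tree in the family for that pair (unweighted, connected graph)."""
    g = to_adj(graph_edges)
    g_dist = {u: bfs_dist(g, u) for u in g}
    t_dist = [{u: bfs_dist(t, u) for u in t} for t in map(to_adj, trees)]
    worst = 1.0
    for u, v in combinations(g, 2):
        best = min((td[u][v] for td in t_dist if u in td and v in td[u]),
                   default=float("inf"))
        worst = max(worst, best / g_dist[u][v])
    return worst
```

On a 4-cycle `a-b-c-d-a`, the single spanning path `a-b-c-d` has stretch 3 (the pair `a, d` is adjacent in the graph but 3 hops apart in the tree), while adding the second path `c-d-a-b` brings the stretch down to 1.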


Patent
31 Mar 2005
TL;DR: A framework is provided for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. It is based on a specification language, designated Attribute Integration Grammar (AIG), that extends a DTD by associating element types with semantic attributes, computing these attributes via parameterized SQL queries over multiple data sources, and incorporating XML keys and inclusion constraints.
Abstract: A framework is provided for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, designated Attribute Integration Grammar (AIG), that extends a DTD by (1) associating element types with semantic attributes, (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The AIG uniquely operates on semantic attributes and their dependency relations to control context-dependent, DTD-directed construction of XML documents, and checks XML constraints in parallel with document generation.
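A rough feel for "DTD-directed construction driven by parameterized SQL" can be had from a toy: an attribute of a parent element (here a course number) is passed down as the parameter of the query that produces the children, and a key constraint is checked while the document is being built rather than afterwards. The tables `course` and `enroll`, the element names, and the key are hypothetical, and this is only a hand-written evaluation in the spirit of an AIG, not the patented specification language or its evaluator.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db(conn):
    """Build <db><course cno=..><title/><student sid=../>*</course>*</db>.

    Each course's cno is an inherited attribute used as the parameter of the
    query that generates its student children. The XML key "sid is unique
    within a course" is checked during generation.
    """
    root = ET.Element("db")
    courses = conn.execute("SELECT cno, title FROM course ORDER BY cno").fetchall()
    for cno, title in courses:
        course = ET.SubElement(root, "course", cno=cno)
        ET.SubElement(course, "title").text = title
        seen = set()
        rows = conn.execute("SELECT sid FROM enroll WHERE cno = ? ORDER BY sid", (cno,))
        for (sid,) in rows:
            if sid in seen:  # key violation detected in parallel with generation
                raise ValueError(f"key violation: student {sid} in course {cno}")
            seen.add(sid)
            ET.SubElement(course, "student", sid=sid)
    return root
```

Because the check runs inside the generation loop, an inconsistent source is rejected as soon as the offending element would be emitted, instead of after a full document has been produced.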

24 citations


Proceedings Article
13 Jun 2005
TL;DR: This paper proposes the first space-efficient algorithmic solution to the general Join-Distinct estimation problem over continuous data streams (the authors' techniques can actually handle general update streams comprising tuple deletions as well as insertions) and presents lower bounds showing that the space usage of the estimators is within small factors of the best possible.
Abstract: There is growing interest in algorithms for processing and querying continuous data streams (i.e., data that is seen only once in a fixed order) with limited memory resources. Providing (perhaps approximate) answers to queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. The ability to estimate the number of distinct (sub)tuples in the result of a join operation correlating two data streams (i.e., the cardinality of a projection with duplicate elimination over a join) is an important requirement for several data-analysis scenarios. For instance, to enable real-time traffic analysis and load balancing, a network-monitoring application may need to estimate the number of distinct (source, destination) IP-address pairs occurring in the stream of IP packets observed by router R1, where the source address is also seen in packets routed through a different router R2. Earlier work has presented solutions to the individual problems of distinct counting and join-size estimation (without duplicate elimination) over streams. These solutions, however, are fundamentally different and extending or combining them to handle our more complex "Join-Distinct" estimation problem is far from obvious. In this paper, we propose the first space-efficient algorithmic solution to the general Join-Distinct estimation problem over continuous data streams (our techniques can actually handle general update streams comprising tuple deletions as well as insertions). Our estimators are probabilistic in nature and rely on novel algorithms for building and combining a new class of hash-based synopses (termed "JD sketches") for individual update streams.
We demonstrate that our algorithms can provide low error, high-confidence Join-Distinct estimates using only small space and small processing time per update. In fact, we present lower bounds showing that the space usage of our estimators is within small factors of the best possible for the Join-Distinct problem. Preliminary experimental results verify the effectiveness of our approach.
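The quantity being estimated is easiest to pin down with an exact, linear-space baseline on the R1/R2 example from the abstract: count the distinct (source, destination) pairs still present in R1's stream whose source also currently appears in R2's stream, under insertions (+1) and deletions (-1). The sketch below is that exact computation only, with a hypothetical update format; it is the target that JD sketches approximate in small space, not the sketching technique itself.

```python
from collections import Counter


def join_distinct(r1_updates, r2_updates):
    """Exact number of distinct (src, dst) pairs in R1 whose src occurs in R2.

    r1_updates: iterable of ((src, dst), delta); r2_updates: iterable of
    (src, delta), with delta = +1 for an insertion and -1 for a deletion.
    Full multiplicity tables are kept, so space grows with the data.
    """
    r1, r2 = Counter(), Counter()
    for pair, delta in r1_updates:
        r1[pair] += delta
    for src, delta in r2_updates:
        r2[src] += delta
    return sum(1 for (src, _dst), m in r1.items() if m > 0 and r2[src] > 0)
```

Duplicates of the same pair count once, and a deleted pair or a source deleted from R2 drops out, which is why simply combining a distinct counter with a join-size estimator does not directly yield this value.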

23 citations


Patent
13 Dec 2005
TL;DR: The patent describes a method and apparatus for determining the rank of a query value: on receiving a rank query request, it determines, for each remote monitor, a predicted lower-bound rank value and a predicted upper-bound rank value, both derived from at least the prediction model that the monitor uses to compute its local quantile summary.
Abstract: The invention comprises a method and apparatus for determining the rank of a query value. Specifically, the method comprises receiving a rank query request; determining, for each of one or more remote monitors, a predicted lower-bound rank value and a predicted upper-bound rank value according to at least the prediction model that the monitor uses to compute its local quantile summary; computing a predicted average rank value for each remote monitor from its predicted lower-bound and upper-bound rank values; and computing the rank of the query value from the predicted average rank values of the remote monitors.
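The final combination step can be sketched as follows, under an assumption the abstract does not state: each monitor summarizes a disjoint part of the data, so the global rank is the sum of the local ranks. Each monitor contributes the midpoint (the "predicted average rank value") of its predicted interval, and if every true local rank lies inside its interval, the summed half-widths bound the error. The function name and input format are hypothetical.

```python
def estimate_rank(bounds):
    """Combine per-monitor predicted rank bounds into a global rank estimate.

    bounds: list of (lower, upper) predicted rank bounds for the query value,
    one pair per remote monitor. Returns (estimate, max_error), where the
    estimate sums the interval midpoints and max_error sums the half-widths.
    """
    estimate = sum((lo + hi) / 2 for lo, hi in bounds)
    max_error = sum((hi - lo) / 2 for lo, hi in bounds)
    return estimate, max_error
```

For intervals `(10, 14)`, `(0, 2)`, and `(100, 100)` the estimate is 113 with an error of at most 3.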

14 citations


Journal Article
TL;DR: This paper proposes an operations support system (OSS) for VoIP networks that emphasizes provisioning, monitoring, capacity planning, and service creation, with greater focus on service-oriented operations and management.
Abstract: The rapid emergence of voice over Internet Protocol (VoIP) networks and their enticing appeal to a broad base of customers are driven by economic incentives, improved productivity, and the creation of new services. Thus, VoIP networks provide an attractive value proposition to service providers, as they enable additional revenue opportunities through new services coupled with reduced network operational costs. An effective operations support system (OSS) is needed to manage a VoIP network by performing a full range of network management tasks (fault, configuration, accounting, performance, and security [FCAPS]). The primary goal of such an OSS, however, is to ensure that customers perceive no difference, in terms of toll-quality voice and 99.999% network availability, in the transition from the public switched telephone network (PSTN) to a converged services architecture. An OSS for managing VoIP networks therefore requires greater focus on service-oriented operations and management. In this paper, we discuss the challenges that are unique to managing VoIP networks. Specifically, we propose an OSS for VoIP networks with emphasis on provisioning, monitoring, capacity planning, and service creation.
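The 99.999% ("five nines") availability target translates into a concrete downtime budget. The helper below, a hypothetical illustration assuming a 365-day year, shows that five nines leaves roughly 5.26 minutes of outage per year, compared with about 526 minutes for 99.9%.

```python
def downtime_minutes_per_year(availability, days=365.0):
    """Maximum downtime per year, in minutes, allowed by an availability target."""
    minutes_per_year = days * 24 * 60  # 525,600 minutes in a 365-day year
    return (1.0 - availability) * minutes_per_year
```

`downtime_minutes_per_year(0.99999)` is about 5.256, so an OSS for a VoIP network must detect and resolve faults on a timescale of minutes per year to match PSTN expectations.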

12 citations


Patent
26 Sep 2005
TL;DR: The patent describes an approximation method for channel allocation in a wireless local area network (WLAN) that guarantees channel allocations no less than 1/6 of an optimal channel allocation scheme, provided that the interference pattern among the access points (APs) in the WLAN conforms to a unit disk graph.
Abstract: PROBLEM TO BE SOLVED: To provide a method and apparatus for allocating channels to access points (APs) in a wireless local area network (WLAN) in a reasonable time period using an approximation method. SOLUTION: This approximation method guarantees that channel allocations will be no less than 1/6 of an optimal channel allocation scheme, if the interference pattern associated with APs within a given WLAN conforms to a unit disk graph interference pattern. COPYRIGHT: (C)2006,JPO&NCIPI
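The unit disk graph interference model can be illustrated directly: two APs interfere when their distance is at most a fixed radius. The sketch below builds that graph from AP coordinates and then assigns channels with a simple first-fit greedy rule, leaving an AP unserved when every channel is already taken by an interfering neighbor. Both the greedy rule and the function names are assumptions for illustration; the sketch makes no claim to the patent's 1/6 approximation guarantee.

```python
import math


def unit_disk_edges(points, radius=1.0):
    """Interference edges: APs i and j conflict if their distance is <= radius."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= radius}


def greedy_channels(points, channels, radius=1.0):
    """Give each AP the first channel not used by an interfering, already-assigned AP.

    APs for which every channel is taken by a neighbor get None (left unserved).
    """
    nbrs = {i: set() for i in range(len(points))}
    for i, j in unit_disk_edges(points, radius):
        nbrs[i].add(j)
        nbrs[j].add(i)
    assignment = {}
    for i in range(len(points)):
        used = {assignment.get(j) for j in nbrs[i]}
        assignment[i] = next((c for c in channels if c not in used), None)
    return assignment
```

With three mutually interfering APs and the channel set `[1, 6, 11]`, each AP gets a distinct channel; with only `[1, 6]`, the third AP is left without an interference-free channel.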

2 citations


Patent
12 Apr 2005
TL;DR: The patent describes a channel allocation scheme that minimizes mutual interference by sufficiently isolating each cell assigned a given channel from all other cells assigned the same channel.
Abstract: PROBLEM TO BE SOLVED: To allocate channels to one or more cells in a wireless LAN (WLAN) without generating unacceptable interference. SOLUTION: The allocation includes dividing the allocation period into a plurality of frames, each of sufficiently short duration, and, during each frame, allocating one or more channels to one or more WLAN cells based on an allocation vector. The vector minimizes mutual interference by sufficiently isolating each cell assigned a channel from all other cells assigned the same channel. Only the cells allocated the channel are allowed to transmit during a given frame. The allocation vector is determined by a system that optimizes the performance of the WLAN. COPYRIGHT: (C)2006,JPO&NCIPI
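Frame-by-frame allocation vectors can be sketched as follows, assuming a given list of conflicting cell pairs (cells that share a channel and are too close to transmit together). For each frame, a hypothetical greedy scheduler admits cells in order of how little airtime they have received so far, skipping any cell that conflicts with one already admitted, and records the result as a 0/1 vector. This illustrates the frame structure only; it is not the patent's optimizing system.

```python
def allocation_vectors(num_cells, conflicts, num_frames):
    """Per-frame allocation vectors: 0/1 lists saying which cells may transmit.

    Cells joined by a conflict never transmit in the same frame; cells that
    have been served least so far are considered first, so every cell
    eventually gets airtime.
    """
    nbrs = {c: set() for c in range(num_cells)}
    for a, b in conflicts:
        nbrs[a].add(b)
        nbrs[b].add(a)
    served = [0] * num_cells
    vectors = []
    for _ in range(num_frames):
        active = set()
        for c in sorted(range(num_cells), key=lambda c: (served[c], c)):
            if not nbrs[c] & active:  # no conflicting cell admitted yet
                active.add(c)
        for c in active:
            served[c] += 1
        vectors.append([1 if c in active else 0 for c in range(num_cells)])
    return vectors
```

For three cells in a row where only neighbors conflict, the vectors alternate between `[1, 0, 1]` and `[0, 1, 0]`, so the middle cell transmits in every other frame without ever overlapping its neighbors.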

1 citation