Showing papers by "Rajeev Rastogi" published in 2005

Open Access

Proceedings Article•DOI•

A cost-based model and effective heuristic for repairing constraints by value modification

Philip L. Bohannon¹, Wenfei Fan², Michael E. Flaster¹, Rajeev Rastogi¹•Institutions (2)

Alcatel-Lucent¹, University of Edinburgh²

14 Jun 2005

TL;DR: It is proved that finding minimal-cost repairs in this model is NP-complete in the size of the database, and an approach to heuristic repair-construction based on equivalence classes of attribute values is introduced.

Abstract: Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.

436 citations

Proceedings Article•DOI•

Holistic aggregates in a networked world: distributed tracking of approximate quantiles

Graham Cormode¹, Minos Garofalakis¹, S. Muthukrishnan², Rajeev Rastogi¹•Institutions (2)

Bell Labs¹, Rutgers University²

14 Jun 2005

TL;DR: This work presents the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network.

Abstract: While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space efficient (at each remote site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality estimates. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking complex holistic aggregates in such a distributed-streams setting --- our primary focus is on approximate quantile summaries, but our approach is more broadly applicable and can handle other holistic-aggregate functions (e.g., "heavy-hitters" queries). We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network. In a nutshell, our algorithms employ a combination of local tracking at remote sites and simple prediction models for local site behavior in order to produce highly communication- and space-efficient solutions. We perform extensive experiments with real and synthetic data to explore the various tradeoffs and understand the role of prediction models in our schemes. The results clearly validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.

234 citations

Journal Article•DOI•

Algorithms for computing QoS paths with restoration

Yigal Bejerano¹, Yuri Breitbart², Ariel Orda³, Rajeev Rastogi⁴, Alex Sprintson⁵•Institutions (5)

Bell Labs¹, Kent State University², Technion – Israel Institute of Technology³, Alcatel-Lucent⁴, California Institute of Technology⁵

01 Jun 2005-IEEE ACM Transactions on Networking

TL;DR: This is the first study to provide a rigorous solution, with proven guarantees, to the combined problem of computing QoS paths with restoration, and the proposed algorithms construct a restoration topology, i.e., a set of bridges protecting a portion of the primary QoS path.

Abstract: There is a growing interest among service providers to offer new services with Quality of Service (QoS) guarantees that are also resilient to failures. Supporting QoS connections requires a routing mechanism that computes QoS paths, i.e., paths that satisfy QoS constraints (e.g., delay or bandwidth). Resilience to failures, on the other hand, is achieved by providing, for each primary QoS path, a set of alternative QoS paths used upon a failure of either a link or a node. The above objectives, coupled with the need to minimize the global use of network resources, imply that the cost of both the primary path and the restoration topology should be a major consideration of the routing process. We undertake a comprehensive study of problems related to finding suitable restoration topologies for QoS paths. We consider both bottleneck QoS constraints, such as bandwidth, and additive QoS constraints, such as delay and jitter. This is the first study to provide a rigorous solution, with proven guarantees, to the combined problem of computing QoS paths with restoration. It turns out that the widely used approach of disjoint primary and restoration paths is not an optimal strategy. Hence, the proposed algorithms construct a restoration topology, i.e., a set of bridges, each bridge protecting a portion of the primary QoS path. This approach is guaranteed to find a low-cost restoration topology when one exists.

72 citations

Journal Article•DOI•

Traveling with a Pez Dispenser (or, Routing Issues in MPLS)

Anupam Gupta, Arvind Kumar, Rajeev Rastogi

01 Feb 2005-SIAM Journal on Computing

TL;DR: This paper initiates a theoretical study of MPLS protocols, and routing algorithms and lower bounds are given for a variety of situations, and tree covers of logarithmic size for planar graphs and graphs with bounded separators are shown, which may be of independent interest.

Abstract: A new packet routing model proposed by the Internet Engineering Task Force is MultiProtocol Label Switching, or MPLS [B. Davie and Y. Rekhter, MPLS: Technology and Applications, Morgan Kaufmann (Elsevier), New York, 2000]. Instead of each router's parsing the packet network layer header and doing its lookups based on that analysis (as in much of conventional packet routing), MPLS ensures that the analysis of the header is performed just once. The packet is then assigned a stack of labels, where the labels are usually much smaller than the packet headers themselves. When a router receives a packet, it examines the label at the top of the label stack and makes the decision of where the packet is forwarded based solely on that label. It can pop the top label off the stack if it so desires, and can also push some new labels onto the stack, before forwarding the packet. This scheme has several advantages over conventional routing protocols, the two primary ones being (a) reduced amount of header analysis at intermediate routers, which allows for faster switching times, and (b) better traffic engineering capabilities and hence easier handling of quality of service issues. However, essentially nothing is known at a theoretical level about the performance one can achieve with this protocol, or about the intrinsic trade-offs in its use of resources. This paper initiates a theoretical study of MPLS protocols, and routing algorithms and lower bounds are given for a variety of situations. We first study the routing problem on the line, a case which is already nontrivial, and give routing protocols whose trade-offs are close to optimality. We then extend our results for paths to trees, and thence onto more general graphs. 
These routing algorithms on general graphs are obtained by finding a tree cover of a graph, i.e., a small family of subtrees of the graph such that, for each pair of vertices, one of the trees in the family contains an (almost-)shortest path between them. Our results show tree covers of logarithmic size for planar graphs and graphs with bounded separators, which may be of independent interest.
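
The label-stack forwarding step described in this abstract can be sketched as follows. This is a minimal illustration only, not the protocols from the paper: the `Packet` and `Rule` types, the `forward` function, and the example table for a router "B" are all hypothetical names introduced here. The only behaviour taken from the abstract is that a router looks at the top label alone, may pop it, and may push new labels before forwarding.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: str
    # Label stack; the last list element is the top of the stack.
    labels: list = field(default_factory=list)


@dataclass
class Rule:
    pop: bool        # pop the top label before forwarding?
    push: tuple      # labels to push afterwards (the last one ends up on top)
    next_hop: str    # neighbour the packet is forwarded to


def forward(table, packet):
    """Apply one router's rule, selected solely by the top label."""
    top = packet.labels[-1]
    rule = table[top]
    if rule.pop:
        packet.labels.pop()
    packet.labels.extend(rule.push)
    return rule.next_hop


# Hypothetical forwarding table for a router "B" (illustrative values only).
table_b = {
    "x": Rule(pop=True, push=("y", "z"), next_hop="C"),
    "z": Rule(pop=False, push=(), next_hop="D"),
}

pkt = Packet("hello", labels=["w", "x"])
hop = forward(table_b, pkt)
print(hop, pkt.labels)
```

Here the router never inspects the payload or the labels below the top of the stack, which is the property that keeps per-hop header analysis small.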

48 citations

Patent•

System and method for XML data integration

Michael Benedikt¹, Wenfei Fan, Rajeev Rastogi•Institutions (1)

Alcatel-Lucent¹

31 Mar 2005

TL;DR: In this article, a framework is provided for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints, based on a specification language, designated Attribute Integration Grammar (AIG), that extends a DTD by associating element types with semantic attributes, computing these attributes via parameterized SQL queries over multiple data sources, and incorporating XML keys and inclusion constraints.

Abstract: A framework is provided for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, designated Attribute Integration Grammar (AIG), that extends a DTD by (1) associating element types with semantic attributes, (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The AIG uniquely operates on semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, and checks XML constraints in parallel with document generation.

24 citations

Proceedings Article•DOI•

Join-distinct aggregate estimation over update streams

Sumit Ganguly¹, Minos Garofalakis², Amit Kumar³, Rajeev Rastogi²•Institutions (3)

Indian Institute of Technology Kanpur¹, Bell Labs², Indian Institute of Technology Delhi³

13 Jun 2005

TL;DR: This paper proposes the first space-efficient algorithmic solution to the general Join-Distinct estimation problem over continuous data streams (the authors' techniques can actually handle general update streams, comprising tuple deletions as well as insertions) and presents lower bounds for the space usage of the estimators.

Abstract: There is growing interest in algorithms for processing and querying continuous data streams (i.e., data that is seen only once in a fixed order) with limited memory resources. Providing (perhaps approximate) answers to queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. The ability to estimate the number of distinct (sub)tuples in the result of a join operation correlating two data streams (i.e., the cardinality of a projection with duplicate elimination over a join) is an important requirement for several data-analysis scenarios. For instance, to enable real-time traffic analysis and load balancing, a network-monitoring application may need to estimate the number of distinct (source, destination) IP-address pairs occurring in the stream of IP packets observed by router R1, where the source address is also seen in packets routed through a different router R2. Earlier work has presented solutions to the individual problems of distinct counting and join-size estimation (without duplicate elimination) over streams. These solutions, however, are fundamentally different and extending or combining them to handle our more complex "Join-Distinct" estimation problem is far from obvious. In this paper, we propose the first space-efficient algorithmic solution to the general Join-Distinct estimation problem over continuous data streams (our techniques can actually handle general update streams comprising tuple deletions as well as insertions). Our estimators are probabilistic in nature and rely on novel algorithms for building and combining a new class of hash-based synopses (termed "JD sketches") for individual update streams.
We demonstrate that our algorithms can provide low error, high-confidence Join-Distinct estimates using only small space and small processing time per update. In fact, we present lower bounds showing that the space usage of our estimators is within small factors of the best possible for the Join-Distinct problem. Preliminary experimental results verify the effectiveness of our approach.

23 citations

Patent•

Method and apparatus for globally approximating quantiles in a distributed monitoring environment

Graham Cormode¹, Minos Garofalakis¹, S. Muthukrishnan², Rajeev Rastogi²•Institutions (2)

Alcatel-Lucent¹, Rutgers University²

13 Dec 2005

TL;DR: The patent describes a method and apparatus for determining the rank of a query value: on receiving a rank query request, a predicted lower-bound rank value and upper-bound rank value are determined for each remote monitor, according to the prediction model that monitor uses to compute its local quantile summary.

Abstract: The invention comprises a method and apparatus for determining a rank of a query value. Specifically, the method comprises receiving a rank query request, determining, for each of the at least one remote monitor, a predicted lower-bound rank value and upper-bound rank value, wherein the predicted lower-bound rank value and upper-bound rank value are determined according to at least one respective prediction model used by each of the at least one remote monitor to compute the at least one local quantile summary, computing a predicted average rank value for each of the at least one remote monitor using the at least one predicted lower-bound rank value and the at least one predicted upper-bound rank value associated with the respective at least one remote monitor, and computing the rank of the query value using the at least one predicted average rank value associated with the respective at least one remote monitor.
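
The rank computation outlined in this abstract can be sketched as below. This is a rough illustration under stated assumptions, not the claimed method: the abstract does not say how the per-monitor average or the final combination is defined, so the sketch assumes the "predicted average rank" is the midpoint of the two bounds and that the global rank is the sum over monitors. The function names and the sample bounds are hypothetical.

```python
def predicted_average_rank(lower, upper):
    """Midpoint of one monitor's predicted rank bounds (assumed definition)."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return (lower + upper) / 2.0


def estimate_global_rank(bounds):
    """Combine per-monitor averages by summation (assumed combination rule).

    bounds: iterable of (lower, upper) predicted rank bounds, one pair
    per remote monitor.
    """
    return sum(predicted_average_rank(lo, hi) for lo, hi in bounds)


# Hypothetical predicted bounds reported for three remote monitors.
bounds = [(10, 14), (3, 5), (20, 20)]
rank = estimate_global_rank(bounds)
print(rank)
```

Under these assumptions the coordinator needs only two numbers per monitor to answer a rank query, without contacting the monitors for their full local summaries.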

14 citations

Journal Article•DOI•

OSS architecture and requirements for VoIP networks

Gokul Prabhakar¹, Rajeev Rastogi², Marina Thottan²•Institutions (2)

Alcatel-Lucent¹, Bell Labs²

01 Mar 2005-Bell Labs Technical Journal

TL;DR: This paper proposes an OSS for VoIP networks with emphasis on provisioning, monitoring, capacity planning, and service creation, with greater focus on service-oriented operations and management.

Abstract: The rapid emergence of voice over Internet Protocol (VoIP) networks and their enticing appeal to a broad base of customers are driven by economic incentives, improved productivity, and creation of new services. Thus, VoIP networks provide an attractive value proposition to service providers, as they enable additional revenue opportunities through new services coupled with reduced network operational costs. An effective operations support system (OSS) is needed to manage a VoIP network by performing a full range of network management tasks (fault, configuration, accounting, performance, and security [FCAPS]). However, its primary goal is to ensure that customers do not perceive any difference in the transition from the public switched telephone network (PSTN) to converged services architecture in terms of toll-quality voice and 99.999% network availability. An OSS for managing VoIP networks requires greater focus on service-oriented operations and management. In this paper, we discuss the challenges that are unique to managing VoIP networks. Specifically, we propose an OSS for VoIP networks with emphasis on provisioning, monitoring, capacity planning, and service creation.

12 citations

Patent•

Method and device for approximating optimal channel allocation

S. Jamaloddin Golestani, Mark Anthony Smith, Rajeev Rastogi

26 Sep 2005

TL;DR: In this paper, the authors proposed an approximation method for channel allocation in a wireless local area network (WLAN) that guarantees that channel allocations will be no less than 1/6 of an optimal channel allocation scheme, if the interference pattern associated with APs within a given WLAN conforms to a unit disk graph interference pattern.

Abstract: PROBLEM TO BE SOLVED: To provide a method and apparatus for allocating channels to access points (APs) in a wireless local area network (WLAN) in a reasonable time period using an approximation method. SOLUTION: This approximation method guarantees that channel allocations will be no less than 1/6 of an optimal channel allocation scheme, if the interference pattern associated with APs within a given WLAN conforms to a unit disk graph interference pattern. COPYRIGHT: (C)2006,JPO&NCIPI

2 citations

Patent•

Allocation of channel to wireless lan

Rajeev Rastogi, S. Jamaloddin Golestani, Mark Anthony Smith

12 Apr 2005

TL;DR: In this paper, the authors proposed a channel allocation algorithm to minimize the mutual interference by isolating each cell for the channel to be allocated from all other cells for the same channel.

Abstract: PROBLEM TO BE SOLVED: To allocate channels to one or more cells in a wireless LAN (WLAN) without causing unacceptable interference. SOLUTION: The allocation includes dividing the allocation period into a plurality of frames, each of sufficiently short duration, and then, during each frame period, allocating one or more channels to one or more WLAN cells based on an allocation vector. The vector minimizes mutual interference by sufficiently isolating each cell that is allocated a channel from all other cells allocated the same channel. Only cells that are allocated a channel are allowed to transmit during a given frame period. The allocation vector is determined by a system so as to optimize the performance of the WLAN. COPYRIGHT: (C)2006,JPO&NCIPI

1 citation

Patent•

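Computation of optimal channel allocations using decomposition methods and corresponding devices (original German title: Berechnung optimaler Kanalzuweisungen unter Verwendung von Aufteilungsverfahren und entsprechende Einrichtungen)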

S. Jamaloddin Golestani, Rajeev Rastogi, Mark Anthony Smith

22 Sep 2005