Author

S. Kliger

Bio: S. Kliger is an academic researcher. The author has contributed to research on topics including Local area network and Enterprise private network. The author has an h-index of 1 and has co-authored 1 publication, which has received 404 citations.

Papers
Journal ArticleDOI
TL;DR: The authors describe a network management system and illustrate its application to managing a distributed database application on a complex enterprise network.
Abstract: The authors describe a network management system and illustrate its application to managing a distributed database application on a complex enterprise network.

404 citations


Cited by
Proceedings ArticleDOI
23 Jun 2002
TL;DR: Presents a dynamic analysis methodology that automates problem determination in large, dynamic application environments by coarse-grained tagging of numerous real client requests as they travel through the system, then using data mining techniques to correlate the believed failures and successes of these requests and determine which components are most likely to be at fault.
Abstract: Traditional problem determination techniques rely on static dependency models that are difficult to generate accurately in today's large, distributed, and dynamic application environments such as e-commerce systems. We present a dynamic analysis methodology that automates problem determination in these environments by 1) coarse-grained tagging of numerous real client requests as they travel through the system and 2) using data mining techniques to correlate the believed failures and successes of these requests to determine which components are most likely to be at fault. To validate our methodology, we have implemented Pinpoint, a framework for root cause analysis on the J2EE platform that requires no knowledge of the application components. Pinpoint consists of three parts: a communications layer that traces client requests, a failure detector that uses traffic-sniffing and middleware instrumentation, and a data analysis engine. We evaluate Pinpoint by injecting faults into various application components and show that Pinpoint identifies the faulty components with high accuracy and produces few false-positives.
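
The core of the approach can be sketched in a few lines. Below is a minimal illustration in Python (not the Pinpoint implementation; the component names and traces are invented, and a simple per-component failure rate stands in for the paper's data mining step): each traced request records the set of components it touched plus a believed success/failure flag, and components are ranked by how strongly their presence correlates with failure.

    # Minimal sketch of Pinpoint-style fault correlation. All names and
    # data are hypothetical; Pinpoint itself uses data mining/clustering
    # rather than this simple conditional failure rate.
    from collections import defaultdict

    def rank_suspects(traces):
        # traces: iterable of (components, failed) pairs, where components
        # is the set of component names one tagged request passed through
        # and failed is that request's believed-failure flag.
        stats = defaultdict(lambda: [0, 0])  # component -> [failures, appearances]
        for components, failed in traces:
            for c in components:
                stats[c][1] += 1
                stats[c][0] += int(failed)
        # Score each component by the failure rate of requests that touched it.
        return sorted(((f / n, c) for c, (f, n) in stats.items()), reverse=True)

    traces = [
        ({"web", "auth", "db"}, False),
        ({"web", "cart", "db"}, True),
        ({"web", "cart"}, True),
        ({"web", "auth"}, False),
    ]
    for score, component in rank_suspects(traces):
        print(f"{component}: failure correlation {score:.2f}")

On this toy input the "cart" component scores highest, mirroring how Pinpoint's correlation step surfaces the components most associated with failed requests.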

910 citations

Journal ArticleDOI
Klaus Julisch
TL;DR: A novel alarm-clustering method is proposed that supports the human analyst in identifying root causes and shows that the alarm load decreases quite substantially if the identified root causes are eliminated so that they can no longer trigger alarms in the future.
Abstract: It is a well-known problem that intrusion detection systems overload their human operators by triggering thousands of alarms per day. This paper presents a new approach for handling intrusion detection alarms more efficiently. Central to this approach is the notion that each alarm occurs for a reason, which is referred to as the alarm's root cause. This paper observes that a few dozen rather persistent root causes generally account for over 90% of the alarms that an intrusion detection system triggers. Therefore, we argue that alarms should be handled by identifying and removing the most predominant and persistent root causes. To make this paradigm practicable, we propose a novel alarm-clustering method that supports the human analyst in identifying root causes. We present experiments with real-world intrusion detection alarms to show how alarm clustering helped us identify root causes. Moreover, we show that the alarm load decreases quite substantially if the identified root causes are eliminated so that they can no longer trigger alarms in the future.
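
To make the clustering idea concrete, here is a deliberately simplified sketch in Python (the alarms, attributes, and single generalization step are invented; the paper's method uses attribute-oriented induction over generalization hierarchies rather than one fixed subnet cut): alarms are generalized until large groups emerge, and the largest groups are candidate persistent root causes.

    # Toy root-cause-oriented alarm grouping; a stand-in for the paper's
    # attribute-oriented induction, with hypothetical alarms.
    from collections import Counter

    def generalize(alarm):
        # One hand-picked generalization step: lift the source IP to its
        # /24 subnet so alarms sharing a root cause fall into one group.
        signature, src_ip = alarm
        subnet = ".".join(src_ip.split(".")[:3]) + ".0/24"
        return (signature, subnet)

    alarms = [
        ("PORTSCAN", "10.0.1.5"), ("PORTSCAN", "10.0.1.9"),
        ("PORTSCAN", "10.0.1.77"), ("SQLI", "192.168.3.4"),
    ]
    # The largest generalized groups are candidate persistent root causes.
    for group, count in Counter(map(generalize, alarms)).most_common():
        print(count, group)

Removing whatever lies behind the dominant ("PORTSCAN", "10.0.1.0/24") group, for example a misconfigured scanner on that subnet, is what the paper means by eliminating a root cause so it can no longer trigger alarms.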

481 citations

01 Jan 2002
TL;DR: Recovery Oriented Computing (ROC) takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved, and thus offers higher availability.
Abstract: It is time to broaden our performance-dominated research agenda. A four order of magnitude increase in performance since the first ASPLOS in 1982 means that few outside the CS&E research community believe that speed is the only problem of computer hardware and software. Current systems crash and freeze so frequently that people become violent. Fast but flaky should not be our 21st century legacy. Recovery Oriented Computing (ROC) takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved. By concentrating on Mean Time to Repair (MTTR) rather than Mean Time to Failure (MTTF), ROC reduces recovery time and thus offers higher availability. Since a large portion of system administration is dealing with failures, ROC may also reduce total cost of ownership. A one to two order of magnitude reduction in cost means that the purchase price of hardware and software is now a small part of the total cost of ownership. In addition to giving the motivation and definition of ROC, we introduce failure data for Internet sites showing that the leading cause of outages is operator error. We also demonstrate five ROC techniques in five case studies, which we hope will influence designers of architectures and operating systems. If we embrace availability and maintainability, systems of the future may compete on recovery performance rather than just SPEC performance, and on total cost of ownership rather than just system price. Such a change may restore our pride in the architectures and operating systems we craft.
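
The availability arithmetic behind the MTTR argument is worth making explicit; the figures below are illustrative only. With steady-state availability = MTTF / (MTTF + MTTR), a tenfold cut in repair time buys the same availability gain as a tenfold stretch in time to failure, which is the quantitative heart of the ROC position.

    # Back-of-the-envelope availability arithmetic (illustrative numbers).
    def availability(mttf_hours, mttr_hours):
        # Steady-state availability: fraction of time the system is up.
        return mttf_hours / (mttf_hours + mttr_hours)

    print(f"baseline            : {availability(1000, 10):.4%}")   # 99.0099%
    print(f"10x faster recovery : {availability(1000, 1):.4%}")    # 99.9001%
    print(f"10x longer uptime   : {availability(10000, 10):.4%}")  # 99.9001%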

470 citations

Proceedings ArticleDOI
27 Aug 2007
TL;DR: Introduces an Inference Graph model well suited to user-perceptible problems rooted in conditions that cause both partial service degradation and hard faults; taking multi-level structure into account yields a 30% improvement in fault localization over two-level approaches.
Abstract: Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.
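
As a toy illustration of the kind of reasoning such a graph supports, the Python sketch below scores candidate root-cause sets against observed symptoms with a noisy-OR model over a two-level slice (the components, edge probabilities, and observations are all invented; Sherlock's actual Inference Graphs are multi-level, with meta-nodes and approximate inference).

    # Toy noisy-OR root-cause ranking over a two-level dependency slice.
    # All structure and probabilities are hypothetical.
    from itertools import combinations

    # P(symptom appears | root cause is faulty)
    edges = {"server": {"web": 0.9, "dns": 0.1},
             "link":   {"web": 0.7, "dns": 0.8}}
    observed = {"web": True, "dns": False}  # True = failure symptom seen

    def likelihood(faulty):
        # P(observations | exactly this set of root causes is faulty),
        # combining causes per symptom with a noisy-OR.
        p = 1.0
        for symptom, seen in observed.items():
            p_ok = 1.0
            for cause in faulty:
                p_ok *= 1.0 - edges[cause][symptom]
            p_fail = 1.0 - p_ok
            p *= p_fail if seen else 1.0 - p_fail
        return p

    hypotheses = [set(c) for r in (1, 2) for c in combinations(edges, r)]
    for h in sorted(hypotheses, key=likelihood, reverse=True):
        print(sorted(h), f"{likelihood(h):.3f}")

Here the single-fault hypothesis {"server"} wins (0.810), since it explains the web failure while leaving dns healthy; real Inference Graphs extend this ranking across many levels of shared dependencies.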

405 citations

Journal ArticleDOI
TL;DR: The challenges of fault localization in complex communication systems are discussed, and an overview of solutions proposed over the last ten years is presented, along with their advantages and shortcomings.

397 citations