scispace - formally typeset
Search or ask a question

Showing papers by "Martin Rinard published in 2015"


Proceedings ArticleDOI
01 Jan 2015
TL;DR: DroidSafe achieves unprecedented precision and accuracy for Android information flow analysis and detects all malicious information flow leaks inserted into 24 real-world Android applications by three independent, hostile Red-Team organizations.
Abstract: We present DroidSafe, a static information flow analysis tool that reports potential leaks of sensitive information in Android applications. DroidSafe combines a comprehensive, accurate, and precise model of the Android runtime with static analysis design decisions that enable the DroidSafe analyses to scale to analyze this model. This combination is enabled by accurate analysis stubs, a technique that enables the effective analysis of code whose complete semantics lies outside the scope of Java, and by a combination of analyses that together can statically resolve communication targets identified by dynamically constructed values such as strings and class designators. Our experimental results demonstrate that 1) DroidSafe achieves unprecedented precision and accuracy for Android information flow analysis (as measured on a standard previously published set of benchmark applications) and 2) DroidSafe detects all malicious information flow leaks inserted into 24 real-world Android applications by three independent, hostile Red-Team organizations. The previous state-of-the art analysis, in contrast, detects less than 10% of these malicious flows.

440 citations


Proceedings ArticleDOI
30 Aug 2015
TL;DR: SPR, a new program repair system that combines staged program repair and condition synthesis, is presented, to generate correct repairs for over five times as many defects as previous systems evaluated on the same benchmark set.
Abstract: We present SPR, a new program repair system that combines staged program repair and condition synthesis. These techniques enable SPR to work productively with a set of parameterized transformation schemas to generate and efficiently search a rich space of program repairs. Together these techniques enable SPR to generate correct repairs for over five times as many defects as previous systems evaluated on the same benchmark set.

346 citations


Proceedings ArticleDOI
13 Jul 2015
TL;DR: Kali as discussed by the authors is a binary hot patching system that leverages learned invariants to produce patches that enable systems to survive otherwise fatal defects and security attacks, but it is vulnerable to security vulnerabilities and the elimination of desirable functionality.
Abstract: We analyze reported patches for three existing generate-and- validate patch generation systems (GenProg, RSRepair, and AE). The basic principle behind generate-and-validate systems is to accept only plausible patches that produce correct outputs for all inputs in the validation test suite. Because of errors in the patch evaluation infrastructure, the majority of the reported patches are not plausible — they do not produce correct outputs even for the inputs in the validation test suite. The overwhelming majority of the reported patches are not correct and are equivalent to a single modification that simply deletes functionality. Observed negative effects include the introduction of security vulnerabilities and the elimination of desirable functionality. We also present Kali, a generate-and-validate patch generation system that only deletes functionality. Working with a simpler and more effectively focused search space, Kali generates at least as many correct patches as prior GenProg, RSRepair, and AE systems. Kali also generates at least as many patches that produce correct outputs for the inputs in the validation test suite as the three prior systems. We also discuss the patches produced by ClearView, a generate-and-validate binary hot patching system that lever- ages learned invariants to produce patches that enable systems to survive otherwise fatal defects and security attacks. Our analysis indicates that ClearView successfully patches 9 of the 10 security vulnerabilities used to evaluate the system. At least 4 of these patches are correct.

329 citations


21 May 2015
TL;DR: ClearView is presented, a generate-and-validate binary hot patching system that lever- ages learned invariants to produce patches that enable systems to survive otherwise fatal defects and security attacks.
Abstract: We analyze reported patches for three existing generate-and- validate patch generation systems (GenProg, RSRepair, and AE). The basic principle behind generate-and-validate systems is to accept only plausible patches that produce correct outputs for all inputs in the validation test suite. Because of errors in the patch evaluation infrastructure, the majority of the reported patches are not plausible — they do not produce correct outputs even for the inputs in the validation test suite. The overwhelming majority of the reported patches are not correct and are equivalent to a single modification that simply deletes functionality. Observed negative effects include the introduction of security vulnerabilities and the elimination of desirable functionality. We also present Kali, a generate-and-validate patch generation system that only deletes functionality. Working with a simpler and more effectively focused search space, Kali generates at least as many correct patches as prior GenProg, RSRepair, and AE systems. Kali also generates at least as many patches that produce correct outputs for the inputs in the validation test suite as the three prior systems. We also discuss the patches produced by ClearView, a generate-and-validate binary hot patching system that lever- ages learned invariants to produce patches that enable systems to survive otherwise fatal defects and security attacks. Our analysis indicates that ClearView successfully patches 9 of the 10 security vulnerabilities used to evaluate the system. At least 4 of these patches are correct.

293 citations


Proceedings ArticleDOI
12 Oct 2015
TL;DR: It is shown that many popular code bases such as Apache and Nginx use coding practices that create flexibility in their intended control flow graph (CFG) even when a strong static analyzer is used to construct the CFG, which allows an attacker to gain control of the execution while strictly adhering to a fine-grained CFI.
Abstract: Control flow integrity (CFI) has been proposed as an approach to defend against control-hijacking memory corruption attacks. CFI works by assigning tags to indirect branch targets statically and checking them at runtime. Coarse-grained enforcements of CFI that use a small number of tags to improve the performance overhead have been shown to be ineffective. As a result, a number of recent efforts have focused on fine-grained enforcement of CFI as it was originally proposed. In this work, we show that even a fine-grained form of CFI with unlimited number of tags and a shadow stack (to check calls and returns) is ineffective in protecting against malicious attacks. We show that many popular code bases such as Apache and Nginx use coding practices that create flexibility in their intended control flow graph (CFG) even when a strong static analyzer is used to construct the CFG. These flexibilities allow an attacker to gain control of the execution while strictly adhering to a fine-grained CFI. We then construct two proof-of-concept exploits that attack an unlimited tag CFI system with a shadow stack. We also evaluate the difficulties of generating a precise CFG using scalable static analysis for real-world applications. Finally, we perform an analysis on a number of popular applications that highlights the availability of such attacks.

216 citations


Proceedings ArticleDOI
17 May 2015
TL;DR: It is shown that, for architectures that do not support segmentation in which CPI relies on information hiding, CPI's safe region can be leaked and then maliciously modified by using data pointer overwrites.
Abstract: Memory corruption attacks continue to be a major vector of attack for compromising modern systems Numerous defenses have been proposed against memory corruption attacks, but they all have their limitations and weaknesses Stronger defenses such as complete memory safety for legacy languages (C/C++) incur a large overhead, while weaker ones such as practical control flow integrity have been shown to be ineffective A recent technique called code pointer integrity (CPI) promises to balance security and performance by focusing memory safety on code pointers thus preventing most control-hijacking attacks while maintaining low overhead CPI protects access to code pointers by storing them in a safe region that is protected by instruction level isolation On x86-32, this isolation is enforced by hardware, on x86-64 and ARM, isolation is enforced by information hiding We show that, for architectures that do not support segmentation in which CPI relies on information hiding, CPI's safe region can be leaked and then maliciously modified by using data pointer overwrites We implement a proof-of-concept exploit against Nginx and successfully bypass CPI implementations that rely on information hiding in 6 seconds with 13 observed crashes We also present an attack that generates no crashes and is able to bypass CPI in 98 hours Our attack demonstrates the importance of adequately protecting secrets in security mechanisms and the dangers of relying on difficulty of guessing without guaranteeing the absence of memory leaks

165 citations


Proceedings ArticleDOI
03 Jun 2015
TL;DR: Experimental results using seven donor applications to eliminate ten errors in seven recipient applications highlight the ability of CP to transfer code across applications to eliminated out of bounds access, integer overflow, and divide by zero errors.
Abstract: We present Code Phage (CP), a system for automatically transferring correct code from donor applications into recipient applications that process the same inputs to successfully eliminate errors in the recipient. Experimental results using seven donor applications to eliminate ten errors in seven recipient applications highlight the ability of CP to transfer code across applications to eliminate out of bounds access, integer overflow, and divide by zero errors. Because CP works with binary donors with no need for source code or symbolic information, it supports a wide range of use cases. To the best of our knowledge, CP is the first system to automatically transfer code across multiple applications.

129 citations


Proceedings ArticleDOI
14 Mar 2015
TL;DR: DIODE is designed to identify relevant sanity checks that inputs must satisfy to trigger overflows at target memory allocation sites, then generate inputs that satisfy these sanity checks to successfully trigger the overflow.
Abstract: We present a new technique and system, DIODE, for auto- matically generating inputs that trigger overflows at memory allocation sites. DIODE is designed to identify relevant sanity checks that inputs must satisfy to trigger overflows at target memory allocation sites, then generate inputs that satisfy these sanity checks to successfully trigger the overflow. DIODE works with off-the-shelf, production x86 binaries. Our results show that, for our benchmark set of applications, and for every target memory allocation site exercised by our seed inputs (which the applications process correctly with no overflows), either 1) DIODE is able to generate an input that triggers an overflow at that site or 2) there is no input that would trigger an overflow for the observed target expression at that site.

42 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: This work presents a technique and implemented system, Fortuna, for obtaining probabilistic bounds on false positive rates for anomaly detectors that process Internet data, and obtains a theoretical result that may be of independent interest.
Abstract: To address this issue we present a technique and implemented system, Fortuna, for obtaining probabilistic bounds on false positive rates for anomaly detectors that process Internet data. Using a probability distribution based on PageRank and an efficient algorithm to draw samples from the distribution, Fortuna computes an estimated false positive rate and a probabilistic bound on the estimate’s accuracy. By drawing test samples from a well defined distribution that correlates well with data seen in practice, Fortuna improves on ad hoc methods for estimating false positive rate, giving bounds that are reproducible, comparable across different anomaly detectors, and theoretically sound. Experimental evaluations of three anomaly detectors (SIFT, SOAP, and JSAND) show that Fortuna is efficient enough to use in practice — it can sample enough inputs to obtain tight false positive rate bounds in less than 10 hours for all three detectors. These results indicate that Fortuna can, in practice, help place anomaly detection on a stronger theoretical foundation and help practitioners better understand the behavior and consequences of the anomaly detectors that they deploy. As part of our work, we obtain a theoretical result that may be of independent interest: We give a simple analysis of the convergence rate of the random surfer process defining PageRank that guarantees the same rate as the standard, second-eigenvalue analysis, but does not rely on any assumptions about the link structure of the web.

30 citations


Proceedings ArticleDOI
09 Nov 2015
TL;DR: It is shown that 63% of the external communication made by top-popular free Android applications from Google Play has no effect on the user-observable application functionality, and a highly precise and scalable static analysis technique is proposed that is effective for identifying and disabling covert communication.
Abstract: This paper studies communication patterns in mobile applications. Our analysis shows that 63% of the external communication made by top-popular free Android applications from Google Play has no effect on the user-observable application functionality. To detect such covert communication in an efficient manner, we propose a highly precise and scalable static analysis technique: it achieves 93% precision and 61% recall compared to the empirically determined "ground truth", and runs in a matter of a few minutes. Furthermore, according to human evaluators, in 42 out of 47 cases, disabling connections deemed covert by our analysis leaves the delivered application experience either completely intact or with only insignificant interference. We conclude that our technique is effective for identifying and disabling covert communication. We then use it to investigate communication patterns in the 500 top-popular applications from Google Play.

24 citations


26 May 2015
TL;DR: A novel patch generation system that learns a probabilistic model over candidate patches from a database of past successful patches, which enables Prophet to generate correct patches for 15 out of 69 real-world defects in eight open source projects.
Abstract: We present Prophet, a novel patch generation system that learns a probabilistic model over candidate patches from a database of past successful patches. Prophet defines the probabilistic model as the combination of a distribution over program points based on defect localization algorithms and a parameterized log-linear distribution over modification operations. It then learns the model parameters via maximum log-likelihood, which identifies important characteristics of the previous successful patches in the database. For a new defect, Prophet generates a search space that contains many candidate patches, applies the learned model to prioritize those potentially correct patches that are consistent with the identified successful patch characteristics, and then validates the candidate patches with a user supplied test suite. The experimental results indicate that these techniques enable Prophet to generate correct patches for 15 out of 69 real-world defects in eight open source projects. The previous state of the art generate and validate system, which uses a set of hand-code heuristics to prioritize the search, generates correct patches for 11 of these same 69 defects.

Proceedings ArticleDOI
23 Oct 2015
TL;DR: Topaz is a new task-based language for computations that execute on approximate computing platforms that may occasionally produce arbitrarily inaccurate results, and deploys a novel outlier detection mechanism that recognizes and precisely reexecutes outlier tasks.
Abstract: We present Topaz, a new task-based language for computations that execute on approximate computing platforms that may occasionally produce arbitrarily inaccurate results. Topaz maps tasks onto the approximate hardware and integrates the generated results into the main computation. To prevent unacceptably inaccurate task results from corrupting the main computation, Topaz deploys a novel outlier detection mechanism that recognizes and precisely reexecutes outlier tasks. Outlier detection enables Topaz to work effectively with approximate hardware platforms that have complex fault characteristics, including platforms with bit pattern dependent faults (in which the presence of faults may depend on values stored in adjacent memory cells). Our experimental results show that, for our set of benchmark applications, outlier detection enables Topaz to deliver acceptably accurate results (less than 1% error) on our target approximate hardware platforms. Depending on the application and the hardware platform, the overall energy savings range from 5 to 13 percent. Without outlier detection, only one of the applications produces acceptably accurate results.

Proceedings Article
18 May 2015
TL;DR: Lax is presented, a device driver abstraction for interacting with sensors that enables power savings in exchange for occasionally returning erroneous sensor data, and can deliver significant system-level energy savings.
Abstract: Embedded sensor platforms can dissipate most of their energy in accessing sensor integrated circuits such as gyroscopes. But the algorithms which process the sensor data and the humans who consume the overall output of the system may often be able to tolerate some amount of error in the retrieved sensor values. Because devices are accessed through interfaces provided by system software, exploiting the tolerable error for improvements in energy efficiency requires appropriate system software and hardware support. However, no such support currently exists. We present Lax, a device driver abstraction for interacting with sensors that enables power savings in exchange for occasionally returning erroneous sensor data. Our implementation on a hardware prototype delivers savings in sensor dynamic power dissipation of up to 48% (as compared to precise device access) while providing sensor access error rates lower than 5 data acquisition errors per 100 data accesses. Given the significant proportion of system energy budgets in wearable platforms that are devoted to sensors, approximate sensor data acquisition using Lax can deliver significant system-level energy savings.

11 Mar 2015
TL;DR: SPR’s staged repair strategy combines a rich space of potential repairs with a targeted search algorithm that makes this space viably searchable in practice and enables SPR to successfully find correct program repairs within a space that contains many meaningful and useful patches.
Abstract: We present SPR, a new program repair system that uses condition synthesis to instantiate transformation schemas to repair program defects. SPR’s staged repair strategy combines a rich space of potential repairs with a targeted search algorithm that makes this space viably searchable in practice. This strategy enables SPR to successfully find correct program repairs within a space that contains many meaningful and useful patches. The majority of these correct repairs are not within the search spaces of previous automatic program repair systems.

04 Jun 2015
TL;DR: A family of optimal value-deviation-bounded approximate serial encoders that significantly reduce signal transitions (and hence, dynamic power) for bit-serial communication interfaces for wearable and mobile systems and an efficient algorithm that performs close to the Pareto-optimalEncoders.
Abstract: Transferring data between ICs accounts for a growing proportion of system power in wearable and mobile systems. Reducing signal transitions reduces the dynamic power dissipated in this data transfer, but traditional approaches cannot be applied when the transfer interfaces are serial buses. To address this challenge, we present a family of optimal value-deviation-bounded approximate serial encoders (VDBS encoders) that significantly reduce signal transitions (and hence, dynamic power) for bit-serial communication interfaces. When the data in transfer are from sensors, VDBS encoding enables a tradeoff between power efficiency and application fidelity, by exploiting the tolerance of many of the typical algorithms consuming sensor data to deviations in values. We derive analytic formulations for the family of VDBS encoders and introduce an efficient algorithm that performs close to the Pareto-optimal encoders. We evaluate the algorithm in two applications: Encoding data between a camera and processor in a text-recognition system, and between an accelerometer and processor in a pedometer system. For the text recognizer, the algorithm reduces signal transitions by 55 % on average, while maintaining OCR accuracy at over 90 % for previously-correctly-recognized text. For the pedometer, the algorithm reduces signal transitions by an average of 54 % in exchange for step count errors of under 5 %.

26 May 2015
TL;DR: Targeted Automatic Patching (TAP), an automatic buffer and integer overflow discovery and patching system, which dynamically analyzes the execution of the application to locate target memory allocation sites and statements that access dynamically or statically allocated blocks of memory.
Abstract: We present Targeted Automatic Patching (TAP), an automatic buffer and integer overflow discovery and patching system. Starting with an application and a seed input that the application processes correctly, TAP dynamically analyzes the execution of the application to locate target memory allocation sites and statements that access dynamically or statically allocated blocks of memory. It then uses targeted errordiscovery techniques to automatically generate inputs that trigger integer and/or buffer overflows at the target sites. When it discovers a buffer or integer overflow error, TAP automatically matches and applies patch templates to generate patches that eliminate the error. Our experimental results show that TAP successfully discovers and patches two buffer and six integer overflow errors in six real-world applications.

Journal ArticleDOI
TL;DR: This article addresses the issue of the best-case effectiveness of techniques to reduce signal transitions on serial buses, if these techniques also allow some error in the numeric interpretation of transmitted data, and presents a study of the efficiency of these value-deviation-bounded approximate serial data encoders and proofs of their properties.
Abstract: Transferring data between integrated circuits accounts for a growing proportion of system power in wearable and mobile systems. The dynamic component of power dissipated in this data transfer can be reduced by reducing signal transitions. Techniques for reducing signal transitions on communication links have traditionally been targeted at parallel buses and can therefore not be applied when the transfer interfaces are serial buses. In this article, we address the issue of the best-case effectiveness of techniques to reduce signal transitions on serial buses, if these techniques also allow some error in the numeric interpretation of transmitted data. For many embedded applications, exchanging numeric accuracy for power reduction is a worthwhile tradeoff. We present a study of the efficiency of these value-deviation-bounded approximate serial data encoders (VDBS data encoders) and proofs of their properties. The bounds and proofs we present yield new insights into the best possible tradeoffs between dynamic power reduction and approximation error that can be achieved in practice. The insights are important regardless of whether actual practical VDBS data encoders are implemented in software or in hardware.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: A new code transfer technique, program fracture and recombination, for automatically replacing, deleting, and/or combining code from multiple applications, and the ability to automatically search for and find efficient implementations with good numerical properties is presented.
Abstract: We present a new code transfer technique, program fracture and recombination, for automatically replacing, deleting, and/or combining code from multiple applications. Benefits include automatic generation of new applications incorporating the best or most desirable functionality developed anywhere, the automatic elimination of errors and security vulnerabilities, effective software rejuvenation, the automatic elimination of obsolete or undesirable functionality, and improved performance, energy efficiency, simplicity, analyzability, and clarity. The technique may be particularly appropriate for high performance computing. The field has devoted years of effort to developing efficient (but complex) implementations of standard linear algebra operations with good numerical properties. At the same time these operations also have very simple but inefficient implementations, often with poor numerical properties. Program fracture and recombination allows developers to work with the simple implementation during development and testing, then use program fracture and recombination to automatically find and deploy the most appropriate implementation for the hardware platform at hand. The benefits include reduced implementation effort, increased code clarity, and the ability to automatically search for and find efficient implementations with good numerical properties.

14 Apr 2015
TL;DR: A new horizontal code transfer technique, program fracture and recombination, for automatically replacing, deleting, and/or combining code from multiple applications, and improved performance, simplicity, analyzability, and clarity is presented.
Abstract: We present a new horizontal code transfer technique, program fracture and recombination, for automatically replacing, deleting, and/or combining code from multiple applications. Benefits include automatic generation of new applications incorporating the best or most desirable functionality developed anywhere, the automatic elimination of security vulnerabilities, effective software rejuvenation, the automatic elimination of obsolete or undesirable functionality, and improved performance, simplicity, analyzability, and clarity.

27 Dec 2015
TL;DR: Results demonstrate the effectiveness of filtered iterators in eliminating many of the difficulties that developers encounter when developing input-processing code and enabling developers to produce safe and robust input- processing code.
Abstract: We present a new language construct, filtered iterators, for safe and robust input processing. Filtered iterators are designed to eliminate many common input-processing errors while enabling robust continued execution. The design is inspired by (a) observed common input-processing errors and (b) continued execution strategies that are implemented by developers fixing input validation errors. Filtered iterators decompose inputs into input units, atomically and automatically discarding units that trigger errors. Statistically significant results from a developer study highlight the difficulties that developers encounter when developing input-processing code using standard language constructs. These results also demonstrate the effectiveness of filtered iterators in eliminating many of these difficulties and enabling developers to produce safe and robust input-processing code.

01 Jun 2015
TL;DR: The VIBRANCE tool starts with a vulnerable Java application and automatically hardens it against SQL injection, OS command injection, file path traversal, numeric errors, denial of service, and other attacks.
Abstract: : The VIBRANCE tool starts with a vulnerable Java application and automatically hardens it against SQL injection, OS command injection, file path traversal, numeric errors, denial of service, and other attacks. For a large class of attacks, the protection added by VIBRANCE blocks the attacks and safely continues execution.


12 Feb 2015
TL;DR: PCR uses a new condition synthesis technique to efficiently discover logical expressions that generate desired controlflow transfer patterns and leverages condition synthesis to obtain a set of compound modifications that generate a rich, productive, and tractable search space of candidate patches.
Abstract: We present PCR, a new automatic patch generation system. PCR uses a new condition synthesis technique to efficiently discover logical expressions that generate desired controlflow transfer patterns. Presented with a set of test cases, PCR deploys condition synthesis to find and repair incorrect if conditions that cause the application to produce the wrong result for one or more of the test cases. PCR also leverages condition synthesis to obtain a set of compound modifications that generate a rich, productive, and tractable search space of candidate patches. We evaluate PCR on a set of 105 defects from the GenProg benchmark set. For 40 of these defects, PCR generates plausible patches (patches that generate correct outputs for all inputs in the test suite used to validate the patch). For 12 of these defects, PCR generates correct patches that are functionally equivalent to developer patches that appear in subsequent versions. For comparison purposes, GenProg generates plausible patches for only 18 defects and correct patches for only 2 defects. AE generates plausible patches for only 27 defects and correct patches for only 3 defects.

04 May 2015
TL;DR: This paper presents a static analysis that is able to detect non-essential communication with 84% -90% precision and 63%-64% recall, depending on whether advertisement content is interpreted as essential or not.
Abstract: This paper studies communication patterns in mobile applications. Our analysis shows that 65% of the HTTP, socket, and RPC communication in top-popular Android applications from Google Play have no effect on the user-observable application functionality. We present a static analysis that is able to detect non-essential communication with 84% -90% precision and 63%-64% recall, depending on whether advertisement content is interpreted as essential or not. We use our technique to analyze the 500 top-popular Android applications from Google Play and determine that more than 80% of the connection statements in these applications are non-essential.