Showing papers on "Redundant code published in 2016"

PDF

Open Access

Proceedings Article•

Learning unified features from natural and programming languages for locating buggy source code

[...]

Xuan Huo¹, Ming Li¹, Zhi-Hua Zhou¹•Institutions (1)

09 Jul 2016

TL;DR: Experimental results on widely-used software projects indicate that NP-CNN significantly outperforms the state-of-the-art methods in locating the buggy source files.

...read moreread less

Abstract: Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source code according to a bug report remains a great challenge in software maintenance. Many previous studies treated the source code as natural language by representing both the bug report and source code based on bag-of-words feature representations, and correlate the bug report and source code by measuring similarity in the same lexical feature space. However, these approaches fail to consider the structure information of source code which carries additional semantics beyond the lexical terms. Such information is important in modeling program functionality. In this paper, we propose a novel convolutional neural network NP-CNN, which leverages both lexical and program structure information to learn unified features from natural language and source code in programming language for automatically locating the potential buggy source code according to bug report. Experimental results on widely-used software projects indicate that NP-CNN significantly outperforms the state-of-the-art methods in locating the buggy source files.

...read moreread less

137 citations

Proceedings Article•DOI•

Binary code is not easy

[...]

Xiaozhu Meng¹, Barton P. Miller¹•Institutions (1)

University of Wisconsin-Madison¹

18 Jul 2016

TL;DR: New code parsing algorithms in the open source Dyninst tool kit are presented, including a new model for describing jump tables that improves the ability to precisely determine the control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls.

...read moreread less

Abstract: Binary code analysis is an enabling technique for many applications. Modern compilers and run-time libraries have introduced significant complexities to binary code, which negatively affect the capabilities of binary analysis tool kits to analyze binary code, and may cause tools to report inaccurate information about binary code. Analysts may hence be confused and applications based on these tool kits may have degrading quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identified several challenging code constructs that represent hard-to-analyze aspects of binary code, and show code examples for each code construct. As part of this discussion, we present new code parsing algorithms in our open source Dyninst tool kit that support these constructs, including a new model for describing jump tables that improves our ability to precisely determine the control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls. We evaluated how various tool kits fare when handling these code constructs with real software as well as test binaries patterned after each challenging code construct we found in real software.

...read moreread less

112 citations

Proceedings Article•DOI•

JRed: Program Customization and Bloatware Mitigation Based on Static Analysis

[...]

Yufei Jiang¹, Dinghao Wu¹, Peng Liu¹•Institutions (1)

Pennsylvania State University¹

10 Jun 2016

TL;DR: A new static-analysis-enabled approach to trimming unused code from both Java applications and Java Runtime Environment (JRE) automatically is proposed, built on top of the Soot framework and evaluated based on a set of criteria: code size, code complexity, memory footprint, execution and garbage collection time, and security.

...read moreread less

Abstract: Modern software engineering practice increasingly brings redundant code into software products, which has caused a phenomenon called bloatware, leading to software system maintenance, performance and reliability issues as well as security problems. With the rapid advances of smart devices and a more connected world, it is never more important to trim bloatware to improve the leanness, agility, reliability, performance, and security of the interconnected software and network systems. Previous methods have limited scopes and are usually not fully automated. In this paper, we propose a new static-analysis-enabled approach to trimming unused code from both Java applications and Java Runtime Environment (JRE) automatically. We have built a tool called JRed on top of the Soot framework. We have conducted a fairly comprehensive evaluation of JRed based on a set of criteria: code size, code complexity, memory footprint, execution and garbage collection time, and security. Our experimental results show that, Java application size can be reduced by 44.5% on average and the JRE code can be reduced by more than 82.5% on average. The code complexity is significantly reduced according to a set of well-known metrics. Furthermore, we report that by trimming redundant code, 48.6% of the known security vulnerabilities in the Java Runtime Environment JRE 6 update 45 has been removed.

...read moreread less

69 citations

Proceedings Article•DOI•

No-Execute-After-Read: Preventing Code Disclosure in Commodity Software

[...]

Jan Werner¹, George Baltas², Rob Dallara², Nathan Otterness², Kevin Z. Snow², Fabian Monrose², Michalis Polychronakis³ - Show less +3 more•Institutions (3)

Renaissance Computing Institute¹, University of North Carolina at Chapel Hill², Stony Brook University³

30 May 2016

TL;DR: This paper introduces the design and implementation of a novel memory permission primitive, dubbed No-Execute-After-Read (near), that foregoes the problems of XnR and provides strong security guarantees against just-in-time attacks in commodity binaries.

...read moreread less

Abstract: Memory disclosure vulnerabilities enable an adversary to successfully mount arbitrary code execution attacks against applications via so-called just-in-time code reuse attacks, even when those applications are fortified with fine-grained address space layout randomization. This attack paradigm requires the adversary to first read the contents of randomized application code, then construct a code reuse payload using that knowledge. In this paper, we show that the recently proposed Execute-no-Read (XnR) technique fails to prevent just-in-time code reuse attacks. Next, we introduce the design and implementation of a novel memory permission primitive, dubbed No-Execute-After-Read (near), that foregoes the problems of XnR and provides strong security guarantees against just-in-time attacks in commodity binaries. Specifically, near allows all code to be disclosed, but prevents any disclosed code from subsequently being executed, thus thwarting just-in-time code reuse. At the same time, commodity binaries with mixed code and data regions still operate correctly, as legitimate data is still readable. To demonstrate the practicality and portability of our approach we implemented prototypes for both Linux and Android on the ARMv8 architecture, as well as a prototype that protects unmodified Microsoft Windows executables and dynamically linked libraries. In addition, our evaluation on the SPEC2006 benchmark demonstrates that our prototype has negligible runtime overhead, making it suitable for practical deployment.

...read moreread less

65 citations

Proceedings Article•DOI•

Code relatives: detecting similarly behaving software

[...]

Fang-Hsiang Su¹, Jonathan Bell¹, Kenneth Harvey¹, Simha Sethumadhavan¹, Gail E. Kaiser¹, Tony Jebara¹ - Show less +2 more•Institutions (1)

Columbia University¹

01 Nov 2016

TL;DR: DyCLINK records instruction-level traces from sample executions, organizes the traces into instruction- level dynamic dependence graphs, and employs the authors' specialized subgraph matching algorithm to efficiently compare the executions of candidate code relatives.

...read moreread less

Abstract: Detecting “similar code” is useful for many software engineering tasks. Current tools can help detect code with statically similar syntactic and–or semantic features (code clones) and with dynamically similar functional input/output (simions). Unfortunately, some code fragments that behave similarly at the finer granularity of their execution traces may be ignored. In this paper, we propose the term “code relatives” to refer to code with similar execution behavior. We define code relatives and then present DyCLINK, our approach to detecting code relatives within and across codebases. DyCLINK records instruction-level traces from sample executions, organizes the traces into instruction-level dynamic dependence graphs, and employs our specialized subgraph matching algorithm to efficiently compare the executions of candidate code relatives. In our experiments, DyCLINK analyzed 422+ million prospective subgraph matches in only 43 minutes. We compared DyCLINK to one static code clone detector from the community and to our implementation of a dynamic simion detector. The results show that DyCLINK effectively detects code relatives with a reasonable analysis time.

...read moreread less

60 citations

Proceedings Article•DOI•

Juggling the Gadgets: Binary-level Code Randomization using Instruction Displacement

[...]

Hyungjoon Koo¹, Michalis Polychronakis¹•Institutions (1)

Stony Brook University¹

30 May 2016

TL;DR: Direct instruction displacement is presented, a code diversification technique based on static binary instrumentation that does not rely on complete code disassembly coverage that aims to improve the randomization coverage and entropy of existing binary-level code diversifying techniques by displacing any remaining non-randomized gadgets to random locations.

...read moreread less

Abstract: Code diversification is an effective mitigation against return-oriented programming attacks, which breaks the assumptions of attackers about the location and structure of useful instruction sequences, known as "gadgets". Although a wide range of code diversification techniques of varying levels of granularity exist, most of them rely on the availability of source code, debug symbols, or the assumption of fully precise code disassembly, limiting their practical applicability for the protection of closed-source third-party applications. In-place code randomization has been proposed as an alternative binary-compatible diversification technique that is tolerant of partial disassembly coverage, in the expense though of leaving some gadgets intact, at the disposal of attackers. Consequently, the possibility of constructing robust ROP payloads using only the remaining non-randomized gadgets is still open. In this paper we present instruction displacement, a code diversification technique based on static binary instrumentation that does not rely on complete code disassembly coverage. Instruction displacement aims to improve the randomization coverage and entropy of existing binary-level code diversification techniques by displacing any remaining non-randomized gadgets to random locations. The results of our experimental evaluation demonstrate that instruction displacement reduces the number of non-randomized gadgets in the extracted code regions from 15.04% for standalone in-place code randomization, to 2.77% for the combination of both techniques. At the same time, the additional indirection introduced due to displacement incurs a negligible runtime overhead of 0.36% on average for the SPEC CPU2006 benchmarks.

...read moreread less

36 citations

Journal Article•DOI•

Stochastic program optimization

[...]

Eric Schkufza¹, Rahul Sharma¹, Alex Aiken¹•Institutions (1)

Stanford University¹

25 Jan 2016-Communications of The ACM

TL;DR: This work shows that by encoding constraints as terms in a cost function, and using a Markov Chain Monte Carlo sampler to rapidly explore the space of all possible code sequences, it is able to generate aggressively optimized versions of a given target code sequence.

...read moreread less

Abstract: The optimization of short sequences of loop-free, fixed-point assembly code sequences is an important problem in high-performance computing. However, the competing constraints of transformation correctness and performance improvement often force even special purpose compilers to produce sub-optimal code. We show that by encoding these constraints as terms in a cost function, and using a Markov Chain Monte Carlo sampler to rapidly explore the space of all possible code sequences, we are able to generate aggressively optimized versions of a given target code sequence. Beginning from binaries compiled by 11vm --O0, we are able to produce provably correct code sequences that either match or outperform the code produced by qcc --O3, icc --O3, and in some cases expert handwritten assembly.

...read moreread less

31 citations

Journal Article•DOI•

The spirit of ghost code

[...]

Jean-Christophe Filliâtre¹, Léon Gondelman¹, Andrei Paskevich¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Jun 2016

TL;DR: A simple ML-style programming language with mutable state and ghost code is described and non-interference is ensured by a type system with effects, which allows the same data types and functions to be used in both regular andghost code.

...read moreread less

Abstract: In the context of deductive program verification, ghost code is a part of the program that is added for the purpose of specification. Ghost code must not interfere with regular code, in the sense that it can be erased without observable difference in the program outcome. In particular, ghost data cannot participate in regular computations and ghost code cannot mutate regular data or diverge. The idea exists in the folklore since the early notion of auxiliary variables and is implemented in many state-of-the-art program verification tools. However, ghost code deserves rigorous definition and treatment, and few formalizations exist. In this article, we describe a simple ML-style programming language with mutable state and ghost code. Non-interference is ensured by a type system with effects, which allows, notably, the same data types and functions to be used in both regular and ghost code. We define the procedure of ghost code erasure and we prove its safety using bisimulation. A similar type system, with numerous extensions which we briefly discuss, is implemented in the program verification environment Why3.

...read moreread less

28 citations

Journal Article•DOI•

Automatic identifier inconsistency detection using code dictionary

[...]

Suntae Kim¹, Dongsun Kim²•Institutions (2)

Chonbuk National University¹, University of Luxembourg²

01 Apr 2016-Empirical Software Engineering

TL;DR: This paper proposes an approach to detecting inconsistent identifiers based on a custom code dictionary that automatically builds a Code Dictionary from the existing API documents of popular Java projects by using an Natural Language Processing (NLP) parser and conducts an interview with developers who used this approach.

...read moreread less

Abstract: Inconsistent identifiers make it difficult for developers to understand source code. In particular, large software systems written by several developers can be vulnerable to identifier inconsistency. Unfortunately, it is not easy to detect inconsistent identifiers that are already used in source code. Although several techniques have been proposed to address this issue, many of these techniques can result in false alarms since such techniques do not accept domain words and idiom identifiers that are widely used in programming practice. This paper proposes an approach to detecting inconsistent identifiers based on a custom code dictionary. It first automatically builds a Code Dictionary from the existing API documents of popular Java projects by using an Natural Language Processing (NLP) parser. This dictionary records domain words with dominant part-of-speech (POS) and idiom identifiers. This set of domain words and idioms can improve the accuracy when detecting inconsistencies by reducing false alarms. The approach then takes a target program and detects inconsistent identifiers of the program by leveraging the Code Dictionary. We provide CodeAmigo, a GUI-based tool support for our approach. We evaluated our approach on seven Java based open-/proprietary- source projects. The results of the evaluations show that the approach can detect inconsistent identifiers with 85.4 % precision and 83.59 % recall values. In addition, we conducted an interview with developers who used our approach, and the interview confirmed that inconsistent identifiers frequently and inevitably occur in most software projects. The interviewees then stated that our approach can help to better detect inconsistent identifiers that would have been missed through manual detection.

...read moreread less

28 citations

Patent•

Polymorphic obfuscation of executable code

[...]

Siying Yang, Jarrod Overson, Ben Vinegar, Bei Zhang

30 Aug 2016

TL;DR: In this paper, the authors present a method for identifying and interfering with the operation of computer malware, as a mechanism for improving system security, using a computer-implemented method.

...read moreread less

Abstract: This document generally relates to systems, method, and other techniques for identifying and interfering with the operation of computer malware, as a mechanism for improving system security. Some implementations include a computer-implemented method by which a computer security server system performs actions including receiving a request for content directed to a particular content server system; forwarding the request to the particular content server system; receiving executable code from the particular content server system; inserting executable injection code into at least one file of the executable code; applying a security countermeasure to the combined executable code and executable injection code to create transformed code; and providing the transformed code to a client computing device.

...read moreread less

25 citations

Proceedings Article•DOI•

Exploring concept of QR code and its benefits in digital education system

[...]

Saroj Goyal¹, Surendra Yadav¹, Manish Mathuria¹•Institutions (1)

SirsiDynix¹

01 Sep 2016

TL;DR: The testing and result are presented which state that the QR Code is the best way to compose the identical information of any entity to quickly figure out the originality.

...read moreread less

Abstract: This research paper concentrates on the concept of Digital Authentication using QR Code in Digital Education System. This paper aimed to provide a better solution to the Digital Security. There are two challenges of the work i.e. first one is to explore the usability of QR Code in general life and second is to incorporate QR Code technology with an educational document for security to avoid duplicity. The literature review is done to synthesis digital encoding and decoding technique as well as basics of Bar Code and QR Code. The implementation of QRC (Quick Response Code) for verification is presented where web environment, programming logics, and URL embedding are discussed. The result analysis and testing of experiment are done in the sense to get best quality of QR Code though the information embedded should not affected and the QR Code must easily be decoded the embedded information from common tools. The goal of this research paper is to explore and analyze the best image under the testing of Error Correction Level and Matrix Point Size parameters by calculating the PSNR and MSE values for QR Code images with different image file format (PNG and JPG). The calculated values are compared and the final conclusion of the work found which state that the PNG image with Error Correction Level-L and Matrix Point Size 1 are the best to generate quality QR Code. The testing and result are presented which state that the QR Code is the best way to compose the identical information of any entity to quickly figure out the originality.

...read moreread less

Book Chapter•DOI•

Runtime Code Polymorphism as a Protection Against Side Channel Attacks

[...]

Damien Couroussé¹, Thierno Barry¹, Bruno Robisson, Philippe Jaillon², Olivier Potin², Jean-Louis Lanet³ - Show less +2 more•Institutions (3)

University of Grenoble¹, École Normale Supérieure², French Institute for Research in Computer Science and Automation³

26 Sep 2016

TL;DR: This paper presents the implementation of code polymorphism with runtime code generation, which offers many code transformation possibilities: the use of random register allocation, random instruction selection, instruction shuffling and insertion of noise instructions.

...read moreread less

Abstract: We present a generic framework for runtime code polymorphism, applicable to a broad range of computing platforms including embedded systems with low computing resources (e.g. microcontrollers with few kilo-bytes of memory). Code polymorphism is defined as the ability to change the observable behaviour of a software component without changing its functional properties. In this paper we present the implementation of code polymorphism with runtime code generation, which offers many code transformation possibilities: we describe the use of random register allocation, random instruction selection, instruction shuffling and insertion of noise instructions. We evaluate the effectiveness of our framework against correlation power analysis: as compared to an unprotected implementation of AES where the secret key could be recovered in less than 50 traces in average, in our protected implementation, we increased the number of traces necessary to achieve the same attack by more than 20000\(\times \). With regards to the state of the art, our implementation shows a moderate impact in terms of performance overhead.

...read moreread less

Journal Article•DOI•

Enhance fuzzy vault security using nonrandom chaff point generator

[...]

Minh Tan Nguyen, Quang Hai Truong, Tran Khanh Dang

01 Jan 2016-Information Processing Letters

TL;DR: A significant flaw of current CRC-based fuzzy vault schemes is addressed, and an integration of two novel modules into general fuzzy vault scheme, namely chaff points generator and verifier, are proposed, designed to be integrated easily into existing systems as well as simple to enhance.

...read moreread less

Proceedings Article•DOI•

Defending against the attack of the micro-clones

[...]

Rijnard van Tonder¹, Claire Le Goues¹•Institutions (1)

Carnegie Mellon University¹

16 May 2016

TL;DR: The Boa software mining infrastructure is leveraged to detect micro-clones in a data set containing 380,125 Java repositories, and yield thousands of instances where redundant code may be safely removed.

...read moreread less

Abstract: Micro-clones are small pieces of redundant code, such as repeated subexpressions or statements. In this paper, we establish the considerations and value toward automated detection and removal of micro-clones at scale. We leverage the Boa software mining infrastructure to detect micro-clones in a data set containing 380,125 Java repositories, and yield thousands of instances where redundant code may be safely removed. By filtering our results to target popular Java projects on GitHub, we proceed to issue 43 pull requests that patch micro-clones. In summary, 95% of our patches to active GitHub repositories are merged rapidly (within 15 hours on average). Moreover, none of our patches were contested; they either constituted a real flaw, or have not been considered due to repository inactivity. Our results suggest that the detection and removal of micro-clones is valued by developers, can be automated at scale, and may be fixed with rapid turnaround times.

...read moreread less

Proceedings Article•DOI•

Rule-directed code clone synchronization

[...]

Xiao Cheng¹, Hao Zhong¹, Yuting Chen¹, Zhenjiang Hu², Jianjun Zhao¹ - Show less +1 more•Institutions (2)

Shanghai Jiao Tong University¹, National Institute of Informatics²

16 May 2016

TL;DR: This paper proposes CCSync, a novel, rule-directed approach, which paves the structure differences between the code clones and synchronizes them even when code clones become quite different in their structures.

...read moreread less

Abstract: Code clones are prevalent in software systems due to many factors in software development. Detecting code clones and managing consistency between them along code evolution can be very useful for reducing clone-related bugs and maintenance costs. Despite some early attempts at detecting code clones and managing the consistency between them, the state-of-the-art tool can only handle simple code clones whose structures are identical or quite similar. However, existing empirical studies show that clones can have quite different structures with their evolution, which can easily go beyond the capability of the state-of-the-art tool. In this paper, we propose CCSync, a novel, rule-directed approach, which paves the structure differences between the code clones and synchronizes them even when code clones become quite different in their structures. The key steps of this approach are, given two code clones, to (1) extract a synchronization rule from the relationship between the clones, and (2) once one code fragment is updated, propagate the modifications to the other following the synchronization rule. We have implemented a tool for CCSync and evaluated its effectiveness on five Java projects. Our results shows that there are many code clones suitable for synchronization, and our tool achieves precisions of up to 92% and recalls of up to 84%. In particular, more than 76% of our generated revisions are identical with manual revisions.

...read moreread less

Patent•

Runtime state based code re-optimization

[...]

Filip J. Pizlo¹, Gavin Barraclough¹•Institutions (1)

Apple Inc.¹

17 Mar 2016

TL;DR: A method and an apparatus to execute a code compiled from a source code to access an untyped variable are described in this paper, where an optimized access code may be compiled in the code with speculative optimization via a type prediction of runtime value of the untyping variable.

...read moreread less

Abstract: A method and an apparatus to execute a code compiled from a source code to access an untyped variable are described. An optimized access code may be compiled in the code with speculative optimization via a type prediction of runtime value of the untyped variable. Invalidity of the type prediction may be dynamically detected for future runtime values of the untyped variable. The code may be updated with an access code compiled for the access without the speculative optimization based on the invalidity detection. The updated code can be executed for the access to the untyped variable without executing the optimized access code.

...read moreread less

Proceedings Article•DOI•

Identifying Auto-Generated Code by Using Machine Learning Techniques

[...]

Kento Shimonaka¹, Soichi Sumi¹, Yoshiki Higo¹, Shinji Kusumoto¹•Institutions (1)

Osaka University¹

13 Mar 2016

TL;DR: This work proposes a technique to identify auto-generated code automatically by using machine learning techniques, which can identify whether source code is auto- generated code or not by utilizing syntactic information of source code.

...read moreread less

Abstract: Recently, many researchers have conducted mining source code repositories to retrieve useful information about software development. Source code repositories often include auto-generated code, and auto-generated code is usually removed in a preprocessing phase because the presence of auto-generated code is harmful to source code analysis. A usual way to removeauto-generated code is searching particular comments which existamong auto-generated code. However, we cannot identify auto-generated code automatically with such a way if comments have disappeared. In addition, it takes too much time to identify auto-generated code manually. Therefore, we propose a techniqueto identify auto-generated code automatically by using machinelearning techniques. In our proposed technique, we can identifywhether source code is auto-generated code or not by utilizingsyntactic information of source code. In order to evaluate theproposed technique, we conducted experiments on source codegenerated by four kinds of code generators. As a result, weconfirmed that the proposed technique was able to identify auto-generated code with high accuracy.

...read moreread less

Proceedings Article•

Testing error handling code in device drivers using characteristic fault injection

[...]

Jia-Ju Bai¹, Yu-Ping Wang¹, Jie Yin¹, Shi-Min Hu¹•Institutions (1)

Tsinghua University¹

22 Jun 2016

TL;DR: This paper first study the source code of Linux drivers to find useful characteristics of error handling code, then uses these characteristics in fault injection testing, and proposes a novel approach named EH-Test, which can efficiently testerror handling code in drivers.

...read moreread less

Abstract: Device drivers may encounter errors when communicating with OS kernel and hardware. However, error handling code often gets insufficient attention in driver development and testing, because these errors rarely occur in real execution. For this reason, many bugs are hidden in error handling code. Previous approaches for testing error handling code often neglect the characteristics of device drivers, so their efficiency and accuracy are limited. In this paper, we first study the source code of Linux drivers to find useful characteristics of error handling code. Then we use these characteristics in fault injection testing, and propose a novel approach named EH-Test, which can efficiently test error handling code in drivers. To improve the representativeness of injected faults, we design a pattern-based extraction strategy to automatically and accurately extract target functions which can actually fail and trigger error handling code. During execution, we use a monitor to record runtime information and pair checkers to check resource usages. We have evaluated EH-Test on 15 real Linux device drivers and found 50 new bugs in Linux 3.17.2. The code coverage is also effectively increased. Comparison experiments to previous related approaches also show the effectiveness of EH-Test.

...read moreread less

Proceedings Article•DOI•

Hunter: next-generation code reuse for Java

[...]

Yuepeng Wang¹, Yu Feng¹, Ruben Martins¹, Arati Kaushik¹, Isil Dillig¹, Steven P. Reiss² - Show less +2 more•Institutions (2)

University of Texas at Austin¹, Brown University²

01 Nov 2016

TL;DR: A tool called Hunter is presented that facilitates code reuse by finding relevant methods in large code bases and automatically synthesizing any necessary wrapper code and can automatically reuse existing methods even when code adaptation is necessary.

...read moreread less

Abstract: In many common scenarios, programmers need to implement functionality that is already provided by some third party library. This paper presents a tool called Hunter that facilitates code reuse by finding relevant methods in large code bases and automatically synthesizing any necessary wrapper code. Since Hunter internally uses advanced program synthesis technology, it can automatically reuse existing methods even when code adaptation is necessary. We have implemented Hunter as an Eclipse plug-in and evaluate it by (a) comparing it against S6, a state-of-the-art code reuse tool, and (b) performing a user study. Our evaluation shows that Hunter compares favorably with S6 and increases programmer productivity.

...read moreread less

Proceedings Article•DOI•

STAC: A tool for Static Textual Analysis of Code

[...]

Saket Khatiwada¹, Michael Kelly¹, Anas Mahmoud¹•Institutions (1)

Louisiana State University¹

16 May 2016

TL;DR: STAC is designed as a light-weight stand-alone tool that provides a practical one-stop solution for code indexing and provides features for extracting and processing textual patterns found in Java, C++, and C# code artifacts.

...read moreread less

Abstract: Static textual analysis techniques have been recently applied to process and synthesize source code. The underlying tenet is that important information is embedded in code identifiers and internal code comments. Such information can be analyzed to provide automatic aid for several software engineering activities. To facilitate this line of work, we present STAC, a tool for supporting Static Textual Analysis of Code. STAC is designed as a light-weight stand-alone tool that provides a practical one-stop solution for code indexing. Code indexing is the process of extracting important textual information from source code. Accurate indexing has been found to significantly influence the performance of code retrieval and analysis methods. STAC provides features for extracting and processing textual patterns found in Java, C++, and C# code artifacts. These features include identifier splitting, stemming, lemmatization, and spell-checking. STAC is also provided as an API to help researchers to integrate basic code indexing features into their code.

...read moreread less

Proceedings Article•DOI•

Toward improving ability to repair bugs automatically: a patch candidate location mechanism using code similarity

[...]

Haruki Yokoyama¹, Yoshiki Higo¹, Keisuke Hotta¹, Takafumi Ohta¹, Kozo Okano¹, Shinji Kusumoto¹ - Show less +2 more•Institutions (1)

Osaka University¹

04 Apr 2016

TL;DR: Some promising results are reported of the pilot study, which show that using code similarity to select code lines in code regions similar to the faulty code regions is very effective to generate repaired programs in less time.

...read moreread less

Abstract: Automated program repair is a promising way to reduce costs on program debugging dramatically. Some repair techniques with genetic algorithm have been proposed, and they were able to fix several dozen of actual bugs in open source software. However, existing techniques occasionally take a long time to generate a repaired version of a given program. The dominant factor for that is generating so many programs that do not pass given test cases. In this research, we are trying to generate a repaired program, which passes all the test cases, in less time. Our key idea is using code similarity to select code lines to be inserted into a given program. More concretely, we propose to select code lines in code regions similar to the faulty code regions. In existing techniques, code lines for insertion are randomly selected from a given program. Currently, we are still in an early stage of this research. In this paper, we report some promising results of our pilot study, which show that using code similarity is very effective to generate repaired programs in less time.

...read moreread less

Patent•

Technology mapping onto code fragments

[...]

Samit Chaudhuri, Andrew William Fox, Tigran Sargsyan

03 Nov 2016

TL;DR: In this paper, a high-level program description is analyzed to determine locations of one or more cuts within the program, and then a matching process is used to find modules that are suitable replacements for the high level code.

...read moreread less

Abstract: Technology mapping onto code fragments and related concepts are disclosed. Program descriptions are obtained in a high-level language. One or more intrinsic libraries containing modules are obtained. The modules correspond to sections of code intended for execution on the special purpose hardware. The high-level program description is analyzed to determine locations of one or more cuts within the program. The cuts represent portions of the high-level code that are eligible for replacement by one or more modules from intrinsic libraries. A matching process is used to find modules that are suitable replacements for the high level code. Once the replacements are made, additional verification and/or validation are performed by compiler checking and/or execution tests.

...read moreread less

Patent•

Method for programming a control unit of a motor vehicle

[...]

Eckart Schlottmann¹, Udo Schulz¹, Liem Dang¹•Institutions (1)

Bosch¹

24 Feb 2016

TL;DR: In this paper, a method for programming a control unit of a motor vehicle, a previous program code executed in the control unit being stored in a memory area, a new program code being written into the control units, and a check of this new code being carried out, was described.

...read moreread less

Abstract: A method for programming a control unit of a motor vehicle, a previous program code executed in the control unit being stored in a memory area, a new program code being written into the control unit, and a check of this new program code being carried out, the program code being executed by the control unit if the new program code is successfully verified in the course of the check, and the previous program code stored in the memory area being written from the memory area into the control unit and the previous program code being executed by the control unit if the new program code is not successfully verified in the course of the check.

...read moreread less

Patent•

Determining errors and warnings corresponding to a source code revision

[...]

James Hewitt¹, Colin I. Holyoake¹, Richard Postlethwaite¹, Caroline J. Thomas¹•Institutions (1)

IBM¹

22 Mar 2016

TL;DR: The method of tracking errors and warnings of a revision of a source code includes a computer processor that receives a first and second revision of source code as mentioned in this paper, and the computer processor determines a first set of error and warning included in the first revision of the source code and a second set of errors and warning omitted in the second revision.

...read moreread less

Abstract: The method of tracking errors and warnings of a revision of source code includes a computer processor that receives a first and second revision of source code. The computer processor determines a first set of errors and warnings included in the first revision of the source code and a second set of errors and warnings included in the second revision of the source code. The computer processor identifies a third set of errors and warnings that appear in the first revision of the source code, and absent in the second revision of the source code, and the computer processor identifies a fourth set of errors and warnings that are absent in the first revision of the source code and that appear in the second revision of the source code.

...read moreread less

Patent•

Correcting non-compliant source code in an integrated development environment

[...]

Swaminathan Balasubramanian¹, Radha M. De¹, Brian M. O'Connell¹, Keith R. Walker¹•Institutions (1)

IBM¹

01 Apr 2016

TL;DR: In this paper, a non-compliant segment of code, which requires correction, and that was coded by a first developer, is assembled into a stand-alone artifact that is dispatched to a second developer.

...read moreread less

Abstract: Approaches presented herein enable correction of source code that fails to comply with an established coding standard in a code base within an Integrated Development Environment (IDE). Specifically, a non-compliant segment of code, which requires correction, and that was coded by a first developer, is assembled into a stand-alone artifact that is dispatched to a second developer. The code segment is then corrected by the second developer, and the artifact containing the corrected segment is merged back into the code base from whence the segment originated. In one approach, the artifact comprises the code segment, dependent code classes of the segment, unit tests results, and test data. In another approach, the second developer is identified utilizing a skill-matching algorithm. In yet another approach, the corrected segment is unit-tested prior to being merged back into the code base.

...read moreread less

Patent•

Graphical representation of data in a program code editor

[...]

Max Drukman¹, Kenneth S. Orr¹, Page Samuel C¹, Behzad Aghaei¹, Chris Lattner¹ - Show less +1 more•Institutions (1)

Apple Inc.¹

21 Sep 2016

TL;DR: In this paper, a non-transitory computer-readable medium stores instructions for implementing a source code editor within an integrated development environment, where the instructions cause one or more processors to implement a method comprising receiving data at a source Code Editor, the data representing a data value to be used by source code displayed by the source code, inserting the data representation the data value into the source Code, and displaying a graphical representation of the data values in the Source code editor, the graphical representation displayed within a program code statement associated with the value.

...read moreread less

Abstract: In one embodiment, non-transitory computer-readable medium stores instructions for implementing a source code editor within an integrated development environment. The instructions to cause one or more processors to implement a method comprising receiving data at a source code editor, the data representing a data value to be used by source code displayed by the source code editor, inserting the data representing the data value into the source code, and displaying a graphical representation of the data value in the source code editor, the graphical representation displayed within a program code statement associated with the data value.

...read moreread less

Patent•

System and method for providing code coverage

[...]

Sanjay Kumar Yadava¹, Johnson Selwyn¹, S U M Prasad Dhanyamraju¹, Ambica Jain¹, Arivukarasu Sivanesan¹ - Show less +1 more•Institutions (1)

HCL Technologies¹

24 Mar 2016

TL;DR: In this article, a method and system for providing code coverage of a code is presented, where the system may determine a type of file comprising the code and extract a source code from the code, based on the type of files.

...read moreread less

Abstract: Disclosed is a method and system for providing code coverage of a code. The system may determine a type of file comprising the code. The system may extract a source code from the code, based on the type of file. The system may process the source code for generating a structured source code. The system may determine code coverage of the structured source code by executing test cases upon the structured source code. The system may provide a code coverage report comprising line coverages, program coverages, and code block coverages of the structured source code.

...read moreread less

Patent•

Protecting against malware variants using reconstructed code of malware

[...]

David Tolpin¹, Shlomi Boutnaru¹, Yuri Shafet¹•Institutions (1)

PayPal¹

05 Aug 2016

TL;DR: In this paper, the system analyzes system calls from executing a program to generate programming code or executable for a particular OS and/or CPU that would perform the same or similar actions as the program.

...read moreread less

Abstract: A system for discovering programming variants. The system analyzes system calls from executing a program to generate programming code or executable for a particular OS and/or CPU that would perform the same or similar actions as the program. The code that is generated is then mutated, augmented, and/or changed to create variations of the program which still functions and/or obtains the same objectives as the original code.

...read moreread less

Proceedings Article•DOI•

Comparing Test and Production Code Quality in a Large Commercial Multicore System

[...]

Steve Counsell¹, Giuseppe Destefanis¹, Xiaohui Liu¹, Sigrid Eldh², A. Ermedahl², K. Andersson - Show less +2 more•Institutions (2)

Brunel University London¹, Ericsson²

01 Aug 2016

TL;DR: This paper investigates four releases of an industrial embedded multi-core system from four perspectives and compares results for test code with corresponding production code, finding that test code did not fare well when compared with production code.

...read moreread less

Abstract: A fundamental goal of software engineering practice is to ensure that code quality is maintained throughout its lifetime. Measuring and maintaining the quality of test code should be as important as measuring production (in-the-field) code. However, test code often seems to be a second class citizen compared to production code in terms of its upkeep and general maintenance. Many of the code features we might expect in test code are either absent or, included when they should not be. In this paper, we investigate four releases of an industrial embedded multi-core system from four perspectives and compare results for test code with corresponding production code. The four perspectives we considered as indicators of code quality. Firstly, we looked at whether test and production code conformed to a set of in-house designated design rules. Secondly, we explored whether test code contained a reasonable proportion of comment to code lines ratio relative to production code. Thirdly, we examined test and production code and the number of assertions in that code. Finally we investigated the relationship between faults and code features. In terms of results, test code did not fare well when compared with production code. An interesting and startling result related to the use of assertions, they were used liberally in test and production code. However, their effect, if triggered, was much larger in production code.

...read moreread less

Book Chapter•DOI•

Code Bones: Fast and Flexible Code Generation for Dynamic and Speculative PolyhedralźOptimization

[...]

Juan Manuel Martinez Caamaño¹, Willy Wolff¹, Philippe Clauss¹•Institutions (1)

University of Strasbourg¹

24 Aug 2016

TL;DR: A new runtime code generation technique for speculative loop optimization and parallelization is presented that allows to generate on-the-fly codes resulting from any polyhedral optimizing transformation of loop nests, such as tiling, skewing, fission, fusion or interchange, without introducing a penalizing time overhead.

...read moreread less

Abstract: In this paper, we present a new runtime code generation technique for speculative loop optimization and parallelization, that allows to generate on-the-fly codes resulting from any polyhedral optimizing transformation of loop nests, such as tiling, skewing, fission, fusion or interchange, without introducing a penalizing time overhead. The proposed strategy is based on the generation of code bones at compile-time, which are parametrized code snippets either dedicated to speculation management or to computations of the original target program. These code bones are then instantiated and assembled at runtime to constitute the speculatively-optimized code, as soon as an optimizing polyhedral transformation has been determined. Their granularity threshold is sufficient to apply any polyhedral transformation, while still enabling fast runtime code generation. This strategy has been implemented in the speculative loop parallelizing framework Apollo.

...read moreread less