Showing papers on "Redundant code published in 2007"


Journal ArticleDOI
TL;DR: An experiment is presented that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC) and selects techniques that cover the whole spectrum of the state-of-the-art in clone detection.
Abstract: Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state-of-the-art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs.

765 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: An approach that takes queries of the form "Source object type → Destination object type" as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object from the source object given in the query is developed.
Abstract: Programmers commonly reuse existing frameworks or libraries to reduce software development effort. One common problem in reusing existing frameworks or libraries is that programmers know what type of object they need, but do not know how to get that object with a specific method sequence. To help programmers address this issue, we have developed an approach that takes queries of the form "Source object type → Destination object type" as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object from the source object given in the query. Our approach interacts with a code search engine (CSE) to gather relevant code samples and performs static analysis over the gathered samples to extract required sequences. As code samples are collected on demand through the CSE, our approach is not limited to queries on any specific set of frameworks or libraries. We have implemented our approach in a tool called PARSEWeb, and conducted four different evaluations to show that our approach is effective in addressing programmers' queries. We also show that PARSEWeb performs better than existing related tools: Prospector and Strathcona.

446 citations


Proceedings ArticleDOI
02 Nov 2007
TL;DR: This paper proposes a fully dynamic approach that exploits an intrinsic property of hidden code execution: the original code must be present in memory and executed at some point at run time.
Abstract: As reverse engineering becomes a prevalent technique to analyze malware, malware writers leverage various anti-reverse engineering techniques to hide their code. One commonly used technique is code packing, as packed executables hinder code analysis. While this problem has been previously researched, the existing solutions are either unable to handle novel samples or vulnerable to various evasion techniques. In this paper, we propose a fully dynamic approach that exploits an intrinsic property of hidden code execution: the original code must be present in memory and executed at some point at run time. Thus, this approach monitors program execution and memory writes at run time, determines if the code under execution is newly generated, and then extracts the hidden code of the executable. To demonstrate its effectiveness, we implement a system, Renovo, and evaluate it with a large number of real-world malware samples. The experiments show that Renovo is accurate compared to previous work, yet practical in terms of performance.

317 citations


Proceedings ArticleDOI
Stephen Chong, Jed Liu, Andrew C. Myers, Xin Qi, K. Vikram, Lantian Zheng, Xin Zheng
14 Oct 2007
TL;DR: Swift automatically partitions application code while providing assurance that the resulting placement is secure and efficient, and uses a max-flow algorithm to place code and data in a way that minimizes client-server communication.
Abstract: Swift is a new, principled approach to building web applications that are secure by construction. In modern web applications, some application functionality is usually implemented as client-side code written in JavaScript. Moving code and data to the client can create security vulnerabilities, but currently there are no good methods for deciding when it is secure to do so. Swift automatically partitions application code while providing assurance that the resulting placement is secure and efficient. Application code is written as Java-like code annotated with information flow policies that specify the confidentiality and integrity of web application information. The compiler uses these policies to automatically partition the program into JavaScript code running in the browser, and Java code running on the server. To improve interactive performance, code and data are placed on the client side. However, security-critical code and data are always placed on the server. Code and data can also be replicated across the client and server, to obtain both security and performance. A max-flow algorithm is used to place code and data in a way that minimizes client-server communication.

278 citations


Proceedings ArticleDOI
Jens Krinke
28 Oct 2007
TL;DR: The study observes that when there are inconsistent changes to a code clone group in a near version, it is rarely the case that there are additional changes in later versions such that the code clone group then has only consistent changes.
Abstract: Code cloning is regarded as a threat to software maintenance, because it is generally assumed that a change to a code clone usually has to be applied to the other clones of the clone group as well. However, there exists little empirical data that supports this assumption. This paper presents a study on the changes applied to code clones in open source software systems based on the changes between versions of the system. It is analyzed if changes to code clones are consistent to all code clones of a clone group or not. The results show that usually half of the changes to code clone groups are inconsistent changes. Moreover, the study observes that when there are inconsistent changes to a code clone group in a near version, it is rarely the case that there are additional changes in later versions such that the code clone group then has only consistent changes.

214 citations


Proceedings ArticleDOI
24 Jul 2007
TL;DR: A code coverage-based fault localization method to prioritize suspicious code in terms of its likelihood of containing program bugs, and indicates that the method can effectively reduce the search domain for locating program bugs.
Abstract: Localizing a bug in a program can be a complex and time-consuming process. In this paper we propose a code coverage-based fault localization method to prioritize suspicious code in terms of its likelihood of containing program bugs. Code with a higher risk should be examined before that with a lower risk, as the former is more suspicious (i.e., more likely to contain program bugs) than the latter. We also answer a very important question: how can each additional test case that executes the program successfully help locate program bugs? We propose that with respect to a piece of code, the aid introduced by the first successful test that executes it in computing its likelihood of containing a bug is larger than or equal to that of the second successful test that executes it, which is larger than or equal to that of the third successful test that executes it, etc. A tool, chiDebug, was implemented to automate the computation of the risk of the code and the subsequent prioritization of suspicious code for locating program bugs. A case study using the Siemens suite was also conducted. Data collected from our study support the proposal described above. They also indicate that our method (in particular Heuristics III (c), (d), and (e)) can effectively reduce the search domain for locating program bugs.

192 citations
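
The diminishing-contribution idea in the fault localization abstract above can be illustrated with a small sketch. This is not one of the paper's heuristics: the geometric `decay` weighting, the function names, and the toy coverage data are all assumptions chosen only to show a score in which each additional successful test that executes a statement reduces its suspiciousness by no more than the previous one did.

```python
from collections import defaultdict


def suspiciousness(failed_hits, passed_hits, decay=0.5):
    """Failed executions raise the score; each extra passing execution lowers it by less."""
    # Assumed weighting: passing runs earn credit 1, decay, decay**2, ...
    # (non-increasing, in the spirit of the abstract's proposal).
    credit = sum(decay ** k for k in range(passed_hits))
    return failed_hits - credit


def rank_statements(coverage, passed):
    """Return statement ids ordered from most to least suspicious.

    coverage: test name -> set of executed statement ids
    passed:   test name -> True if the test succeeded
    """
    failed_hits = defaultdict(int)
    passed_hits = defaultdict(int)
    for test, statements in coverage.items():
        for stmt in statements:
            if passed[test]:
                passed_hits[stmt] += 1
            else:
                failed_hits[stmt] += 1
    all_statements = set(failed_hits) | set(passed_hits)
    scores = {s: suspiciousness(failed_hits[s], passed_hits[s]) for s in all_statements}
    # Highest score first; ties are broken by statement id for a stable order.
    return sorted(all_statements, key=lambda s: (-scores[s], s))


# Hypothetical data: t1 fails, t2 and t3 pass.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {1}}
passed = {"t1": False, "t2": True, "t3": True}
print(rank_statements(coverage, passed))  # prints [3, 2, 1]
```

Statement 3 is executed only by the failing test, so it is examined first; statement 1 is also executed by two passing tests and drops to the end of the list.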


Patent
29 Aug 2007
TL;DR: In this paper, a system, method, and computer program product are provided for determining whether code is unwanted based on the decompilation of the code.
Abstract: A system, method, and computer program product are provided for determining whether code is unwanted based on the decompilation thereof. In use, code is identified and the code is decompiled. In addition, it is determined whether the code is unwanted, based on the decompiled code.

170 citations


Patent
17 Aug 2007
TL;DR: Split hardware transaction techniques support execution of serial and parallel nesting of code within an atomic block to an arbitrary nesting depth: each child code sequence runs in its own hardware transaction, yet the parent code sequence, the child code sequences, and other code within the atomic block may appear to have been executed as a single transaction.
Abstract: Split hardware transaction techniques may support execution of serial and parallel nesting of code within an atomic block to an arbitrary nesting depth. An atomic block including child code sequences nested within a parent code sequence may be executed using separate hardware transactions for each child, but the execution of the parent code sequence, the child code sequences, and other code within the atomic block may appear to have been executed as a single transaction. If a child transaction fails, it may be retried without retrying the parent code sequence or other child code sequences. Before a child transaction is executed, a determination of memory consistency may be made. If a memory inconsistency is detected, the child transaction may be retried or control may be returned to its parent. Memory inconsistencies between parallel child transactions may be resolved by serializing their execution before retrying at least one of them.

118 citations


Proceedings ArticleDOI
10 Jun 2007
TL;DR: This paper presents a simple but novel Hoare-logic-like framework that supports modular verification of general von Neumann machine code with runtime code manipulation, proves its soundness in the Coq proof assistant, and demonstrates its power by certifying a series of realistic examples and applications.
Abstract: Self-modifying code (SMC), in this paper, broadly refers to any program that loads, generates, or mutates code at runtime. It is widely used in many of the world's critical software systems to support runtime code generation and optimization, dynamic loading and linking, OS boot loaders, just-in-time compilation, binary translation, or dynamic code encryption and obfuscation. Unfortunately, SMC is also extremely difficult to reason about: existing formal verification techniques, including Hoare logic and type systems, consistently assume that program code stored in memory is fixed and immutable; this severely limits their applicability and power. This paper presents a simple but novel Hoare-logic-like framework that supports modular verification of general von Neumann machine code with runtime code manipulation. By dropping the assumption that code memory is fixed and immutable, we are forced to apply local reasoning and separation logic at the very beginning, and to treat program code uniformly as a regular data structure. We address the interaction between separation and code memory and show how to establish the frame rules for local reasoning even in the presence of SMC. Our framework is realistic, but designed to be highly generic, so that it can support assembly code under all modern CPUs (including both x86 and MIPS). Our system is expressive and fully mechanized. We prove its soundness in the Coq proof assistant and demonstrate its power by certifying a series of realistic examples and applications, all of which can directly run on the SPIM simulator or any stock x86 hardware.

100 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: This work presents CodeGenie, a tool that implements a test-driven approach to search and reuse of code available on large-scale code repositories, and relies on Sourcerer, an Internet-scale source code infrastructure that the authors have developed.
Abstract: We present CodeGenie, a tool that implements a test-driven approach to search and reuse of code available on large-scale code repositories. While using CodeGenie, developers design test cases for a desired feature first, similar to Test-driven Development (TDD). However, instead of implementing the feature as in TDD, CodeGenie automatically searches for it based on information available in the tests. To check the suitability of the candidate results in the local context, each result is automatically woven into the developer's project and tested using the original tests. The developer can then reuse the most suitable result. Later, reused code can also be unwoven from the project as wished. For the code searching and wrapping facilities, CodeGenie relies on Sourcerer, an Internet-scale source code infrastructure that we have developed.

90 citations


Proceedings ArticleDOI
01 Jan 2007
TL;DR: Fault injection experiments on SPEC and MiBench benchmark programs compiled with ACCE show that the correct output is produced with high probability and that CFEs are corrected with a latency of a few hundred instructions.
Abstract: Detection of control-flow errors at the software level has been studied extensively in the literature. However, there has not been any published work that attempts to correct these errors. Low-cost correction of CFEs is important for real-time systems where checkpointing is too expensive or impossible. This paper presents automatic correction of control-flow errors (ACCE), an efficient error correction algorithm involving addition of redundant code to the program. ACCE has been implemented by modifying GCC, a widely used C compiler, and performance measurements show that the overhead is very low. Fault injection experiments on SPEC and MiBench benchmark programs compiled with ACCE show that the correct output is produced with high probability and that CFEs are corrected with a latency of a few hundred instructions.

Patent
21 Sep 2007
TL;DR: In this article, a method for protecting a computer program against manipulation and for shielding its communication with other programs against eavesdropping and modification is presented, which is suitable for the protection of online banking, online investment, online entertainment, digital rights management, and other electronic commerce applications.
Abstract: A method for protecting a computer program against manipulation and for shielding its communication with other programs against eavesdropping and modification is presented. The method comprises the creation of individualized program copies to different groups of users, the insertion of or the derivation of individual cryptographic keys from the program code, the obfuscation of the program code, and the self-authentication of the program towards other programs. The method is suitable for the protection of online banking, online investment, online entertainment, digital rights management, and other electronic commerce applications.

Proceedings ArticleDOI
20 Oct 2007
TL;DR: This work presents CodeGenie, a tool that implements a test-driven approach to search and reuse of code available on large-scale code repositories, which automatically searches for an existing implementation based on information available in the tests.
Abstract: We present CodeGenie, a tool that implements a test-driven approach to search and reuse of code available on large-scale code repositories. With CodeGenie, developers design test cases for a desired feature first, similar to Test-driven Development (TDD). However, instead of implementing the feature from scratch, CodeGenie automatically searches for an existing implementation based on information available in the tests. To check the suitability of the candidate results in the local context, each result is automatically woven into the developer's project and tested using the original tests. The developer can then reuse the most suitable result. Later, reused code can also be unwoven from the project as wished. For the code searching and wrapping facilities, CodeGenie relies on Sourcerer, an Internet-scale source code infrastructure that we have developed.

Journal Article
TL;DR: ICD-10 coding software must: be fully integrated with the practice management system; allow the user to combine codes; check for the validity of all codes used; verify the correlation between prescribed minimum benefits and the respective ICD-10 code; and be efficient in the actual user's hands.
Abstract: ICD-10 coding software must: be fully integrated with the practice management system; allow the user to combine codes; check for the validity of all codes used; allow the user to split the combination code when necessary; have the capability of assigning the combination code or parts thereof to the treatment or procedure in the account by line item without having to repeatedly type them; guide the user as to whether a code has to be combined or not; allow the user to find the appropriate code easily; allow the user to view inclusions and/or exclusions for the particular code; allow the correlation between prescribed minimum benefits and the respective ICD-10 code; and be efficient in the actual user's hands (i.e. try it out using the examples given).

Patent
12 Jan 2007
TL;DR: In this article, a method of optimizing code which invokes methods on a system across an interface is described, and high level information relating to the system is accessed and this information is used in performing code transformations in order to optimize the code.
Abstract: A method of optimizing code which invokes methods on a system across an interface is described. High level information relating to the system is accessed and this information is used in performing code transformations in order to optimize the code.

Journal ArticleDOI
TL;DR: This paper proposes methods of visualizing and featuring code clones to support their understanding in large-scale software; the methods have been implemented as a tool called Gemini, which has been applied to an open source software system.
Abstract: Maintaining software systems is becoming more difficult as the size and complexity of software increase. One factor that complicates software maintenance is the presence of code clones. A code clone is a code fragment that has identical or similar code fragments to it in the source code. Code clones are introduced for various reasons such as reusing code by 'copy and paste'. When modifying a code clone with many similar code fragments, we must consider whether to modify each of them. Especially for large-scale software, such a process is very complicated and expensive. In this paper, we propose methods of visualizing and featuring code clones to support their understanding in large-scale software. The methods have been implemented as a tool called Gemini, which has been applied to an open source software system. Application results show the usefulness and capability of our system.

Proceedings ArticleDOI
11 Mar 2007
TL;DR: The proposed approach improves the intra-execution model of code reuse by storing and reusing translations across executions, thereby achieving inter-execution persistence and improving performance over time.
Abstract: Run-time compilation systems are challenged with the task of translating a program's instruction stream while maintaining low overhead. While software managed code caches are utilized to amortize translation costs, they are ineffective for programs with short run times or large amounts of cold code. Such program characteristics are prevalent in real-life computing environments, ranging from Graphical User Interface (GUI) programs to large-scale applications such as database management systems. Persistent code caching addresses these issues. It is described and evaluated in an industry-strength dynamic binary instrumentation system - Pin. The proposed approach improves the intra-execution model of code reuse by storing and reusing translations across executions, thereby achieving inter-execution persistence. Dynamically linked programs leverage inter-application persistence by using persistent translations of library code generated by other programs. New translations discovered across executions are automatically accumulated into the persistent code caches, thereby improving performance over time. Inter-execution persistence improves the performance of GUI applications by nearly 90%, while inter-application persistence achieves a 59% improvement. In more specialized uses, the SPEC2K INT benchmark suite experiences a 26% improvement under dynamic binary instrumentation. Finally, a 400% speedup is achieved in translating the Oracle database in a regression testing environment.
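
The inter-execution persistence described in the abstract above can be sketched as a translation cache that is written to disk and reloaded by the next run. This is only an illustration and says nothing about how Pin stores its code caches: the `PersistentCodeCache` class, the JSON file format, the SHA-256 keys, and the `toy_translate` stand-in for a real translator are all assumptions.

```python
import hashlib
import json
import os
import tempfile


class PersistentCodeCache:
    """Maps a hash of the original code to its translation and survives across runs."""

    def __init__(self, path, translate):
        self.path = path
        self.translate = translate
        self.translations = {}
        self.misses = 0
        # Reuse translations left behind by an earlier execution, if any.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.translations = json.load(f)

    def lookup(self, code):
        key = hashlib.sha256(code.encode("utf-8")).hexdigest()
        if key not in self.translations:
            # Cold code: pay the translation cost once and remember the result.
            self.misses += 1
            self.translations[key] = self.translate(code)
        return self.translations[key]

    def save(self):
        # Newly discovered translations are accumulated into the persistent file.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.translations, f)


def toy_translate(code):
    """Hypothetical stand-in for an expensive binary translation step."""
    return code.upper()


cache_path = os.path.join(tempfile.mkdtemp(), "code_cache.json")

first_run = PersistentCodeCache(cache_path, toy_translate)
first_run.lookup("add r1, r2")
first_run.save()

second_run = PersistentCodeCache(cache_path, toy_translate)
second_run.lookup("add r1, r2")  # served from the persisted cache
second_run.lookup("mul r3, r4")  # new translation, stored on save
second_run.save()
print(first_run.misses, second_run.misses)  # prints 1 1
```

The second run translates only the instruction it has not seen before, which is the effect the abstract attributes to reusing translations across executions.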

Patent
05 Oct 2007
TL;DR: In this article, a device generates code with a technical computing environment (TCE) based on a model and information associated with a target processor, registers an algorithm with the TCE, automatically sets optimization parameters applied during generation of the code based on the algorithm, executes the generated code, receives feedback based on execution of generated code and uses the feedback to automatically update the optimization parameters and to automatically regenerate the code with the TCE until an optimal code is achieved for the target processor.
Abstract: A device generates code with a technical computing environment (TCE) based on a model and information associated with a target processor, registers an algorithm with the TCE, automatically sets optimization parameters applied during generation of the code based on the algorithm, executes the generated code, receives feedback based on execution of the generated code, and uses the feedback to automatically update the optimization parameters and to automatically regenerate the code with the TCE until an optimal code is achieved for the target processor.

Patent
Lik Cheung
15 Nov 2007
TL;DR: In this article, a semantic graph with dependencies and metadata describing the source code sections is constructed, and if changes are received for the semantic graph, the graph is modified with the changes to form a modified semantic graph.
Abstract: A computer implemented method and computer usable program product for version control of source code. In one embodiment, a source code file is scanned for relationships between source code sections. A semantic graph with dependencies and metadata describing the source code sections is constructed. The dependencies indicate the relationships between the source code sections. If changes are received for the semantic graph, the semantic graph is modified with the changes to form a modified semantic graph.

Patent
04 Dec 2007
TL;DR: In this paper, a computer implemented method, data processing system, and computer program product for Java class automatic deployment using byte code instrumentation technology is described, where one or more classloaders are instrumented with byte code instrumentation code such that a class loading event is received when a class is loaded.
Abstract: A computer implemented method, data processing system, and computer program product for Java class automatic deployment using byte code instrumentation technology. One or more classloaders are instrumented with byte code instrumentation code such that a class loading event is received when a class is loaded. If a determination is made that new byte code instrumentation code needs to be loaded with the loaded class, candidate classloaders that load import classes of the new byte code instrumentation code are determined. A correct classloader from the candidate classloaders to load the new byte code instrumentation code is calculated. The correct classloader is instrumented to have an extended classpath, wherein the new byte code instrumentation code is inserted into the extended classpath of the correct classloader. The class is loaded from the extended classpath and original classpath of the correct classloader.

Proceedings ArticleDOI
11 Mar 2007
TL;DR: SMask is a novel approach towards approximating data/code separation by using string masking to persistently mark legitimate code in string values, which is able to identify code that was injected during the processing of an HTTP request.
Abstract: Web applications employ a heterogeneous set of programming languages: the language that was used to write the application's logic and several supporting languages. Supporting languages include server-side languages for data management, such as SQL, and client-side interface languages, such as HTML and JavaScript. These languages are handled as string values by the application's logic. Therefore, no syntactic means exists to differentiate between executable code and generic data. This circumstance is the root of most code injection vulnerabilities: attackers succeed in providing malicious data that is executed by the application as code. In this paper we introduce SMask, a novel approach towards approximating data/code separation. By using string masking to persistently mark legitimate code in string values, SMask is able to identify code that was injected during the processing of an HTTP request. SMask works transparently to the application and is implementable either by integration in the application server or by source-to-source translation using code instrumentation.
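
The string-masking idea in the abstract above can be sketched roughly as follows. This is a heavily simplified illustration and not SMask itself: the SQL keyword list, the control-character token format, and the function names are assumptions, and a real implementation would also have to stop attackers from supplying mask tokens of their own.

```python
import re

# Assumed, deliberately tiny keyword list for illustration.
KEYWORDS = ["SELECT", "FROM", "WHERE", "UNION", "DROP", "TABLE", "OR", "AND"]
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
TOKEN_RE = re.compile("\x01(\\d+)\x02")


def mask(static_code):
    """Replace code keywords in developer-written strings with opaque tokens."""
    return KEYWORD_RE.sub(
        lambda m: "\x01%d\x02" % KEYWORDS.index(m.group(0).upper()), static_code
    )


def injected_keywords(query):
    """Any keyword still visible in plain form did not come from masked static code."""
    return [m.group(0) for m in KEYWORD_RE.finditer(query)]


def unmask(query):
    """Restore the legitimate keywords (in upper case) before the query leaves the application."""
    return TOKEN_RE.sub(lambda m: KEYWORDS[int(m.group(1))], query)


# Static parts are masked once; user data is concatenated unmasked.
PREFIX = mask("SELECT name FROM users WHERE id = '")
SUFFIX = mask("'")


def build_query(user_input):
    return PREFIX + user_input + SUFFIX


benign = build_query("42")
attack = build_query("1' OR 1=1; DROP TABLE users --")
print(injected_keywords(benign))  # prints []
print(injected_keywords(attack))  # prints ['OR', 'DROP', 'TABLE']
print(unmask(benign))             # prints SELECT name FROM users WHERE id = '42'
```

Because the legitimate keywords are hidden behind tokens while the request is processed, a plain keyword that shows up in the final string can only have arrived through the data.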

Patent
26 Apr 2007
TL;DR: In this paper, an emulator uses code translation and recompilation to execute target computer system applications on a host computer system by partitioning target application code into code blocks, combining related blocks into block groups, and translating the block groups.
Abstract: An emulator uses code translation and recompilation to execute target computer system applications on a host computer system. Target application code is partitioned into target application code blocks, and related target application code blocks are combined into block groups and translated. Translated application code block groups are sized to comply with restrictions on branch instruction size. Upon selecting an application code block group for execution, a cache tag is used to determine if a corresponding translated code block group is available and valid. If not, the block group is translated and executed. Sequentially executed translated code blocks are located in adjacent portions of memory to improve performance when switching between translated code blocks. The emulator may use a link register of the host computer system to prefetch instructions and data from translated code blocks. The emulator also takes into account structural hazards in translating instructions.

Patent
05 Dec 2007
TL;DR: In this article, a first program function is invoked to render static components of a web page and identify program code within the web page or associated file, before executing the identified program code.
Abstract: Detecting obfuscated attacks on a computer. A first program function is invoked to render static components of a web page and identify program code within the web page or associated file. In response, before executing the identified program code, a malicious-code detector is invoked to scan the identified program code for malicious code. If the malicious-code detector identifies malicious code in the identified program code, the identified program code is not executed. If no malicious code is detected, a second program function generates revised program code from execution of the identified program code. In response, before executing the revised program code, the malicious-code detector is invoked to scan the revised program code for malicious code. If the malicious-code detector identifies malicious code in the revised program code, the revised program code is not executed.

Patent
Bohuslav Rychlik
03 May 2007
TL;DR: In this paper, a method of executing a conditional instruction within a pipeline processor having a plurality of pipelines is described, where the processor has a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline, and the method uses the most recent condition code value to determine if the conditional instruction should be executed.
Abstract: A method of executing a conditional instruction within a pipeline processor having a plurality of pipelines, the processor having a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline is disclosed. The method saves a most recent condition code value to either the first condition code register or the second condition code register. The method further sets an indicator indicating whether the second condition code register has the most recent condition code value and retrieves the most recent condition code value from either the first or second condition code register based on the indicator. The method uses the most recent condition code value to determine if the conditional instruction should be executed.

Patent
14 Aug 2007
TL;DR: In this article, a method of obfuscating executable computer code to impede reverse-engineering, by interrupting the software's execution flow and replacing in-line code with calls to subroutines that do not represent logical program blocks, is presented.
Abstract: A method of obfuscating executable computer code to impede reverse-engineering, by interrupting the software's execution flow and replacing in-line code with calls to subroutines that do not represent logical program blocks. Embodiments of the present invention introduce decoy code to confuse attackers, and computed branching to relocated code so that actual program flow cannot be inferred from disassembled source representations.

Patent
26 Apr 2007
TL;DR: In this article, the authors provide technologies for estimating code failure proneness probabilities for a code set and/or the files that make up the set, typically based on a set of selected code metrics, the code metrics typically selected based on corresponding failures of a previous version of the software.
Abstract: The present examples provide technologies for estimating code failure proneness probabilities for a code set and/or the files that make up the set. The code set being evaluated is typically comprised of binary and/or source files that embody the software for which the estimates are desired. The estimates are typically based on a set of selected code metrics, the code metrics typically selected based on corresponding failures of a previous version of the software. A historically variant metric feedback factor may also be calculated and code metric values classified relative to a baseline code set embodying the previous version of the software.

Patent
Subhas Kumar Ghosh
18 Jan 2007
TL;DR: In this paper, a first code is read from a user carried device useable in an access control system, and access is permitted only if the first code compares favorably to the second code.
Abstract: A first code is read from a user carried device useable in an access control system. The first code is an encoded form of at least an ID of a user carrying the device and at least one privilege. The privilege defines the user's access to a resource. The first code is compared to a second code, and access is permitted only if the first code compares favorably to the second code. A reader of the access control system computes the second code based on the user ID and the privilege. The first and second codes may be also based on a secret key applicable only to the user.
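
The comparison described in the abstract above can be sketched with a keyed hash. The patent abstract does not say how the codes are encoded, so everything here is an assumption: HMAC-SHA256 as the encoding, the `user_id|privilege` message layout, the function names, and the reading of "compares favorably" as an exact match.

```python
import hashlib
import hmac


def access_code(user_id, privilege, secret_key):
    """Encode a user ID and one privilege under a per-user secret key (assumed scheme)."""
    message = f"{user_id}|{privilege}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def reader_permits(code_from_device, user_id, privilege, secret_key):
    """The reader recomputes the code and grants access only on an exact match."""
    expected = access_code(user_id, privilege, secret_key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(code_from_device, expected)


# Hypothetical user, privilege, and key.
key = b"per-user-secret"
first_code = access_code("alice", "door-3", key)  # stored on the user-carried device

print(reader_permits(first_code, "alice", "door-3", key))  # prints True
print(reader_permits(first_code, "alice", "door-4", key))  # prints False
```

A code issued for one privilege does not open a resource that requires a different one, because the reader's recomputed code no longer matches.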

Proceedings ArticleDOI
03 Sep 2007
TL;DR: An automated analysis is devised that detects unreachable code in the presence of code annotations and is implemented as an enhancement of the extended static checker ESC/Java2 where it serves as a check of coherency of specifications and code.
Abstract: Well-specified programs enable code reuse and therefore techniques that help programmers to annotate code correctly are valuable. We devised an automated analysis that detects unreachable code in the presence of code annotations. We implemented it as an enhancement of the extended static checker ESC/Java2 where it serves as a check of coherency of specifications and code. In this article we define the notion of semantic unreachability, describe an algorithm for checking it and demonstrate on a case study that it detects a class of errors previously undetected, as well as describe different scenarios of these errors.

Patent
27 Mar 2007
TL;DR: In this paper, a computer-implemented method for evaluating software code includes receiving from a static analysis of software code a warning indicating a respective location in the software code of a potential bug and a possible execution path leading to the potential bug.
Abstract: A computer-implemented method for evaluating software code includes receiving from a static analysis of the software code a warning indicating a respective location in the software code of a potential bug and a possible execution path leading to the potential bug. Responsively to the warning, instrumentation is added to the code at one or more locations along the execution path. Upon executing the instrumented code, an output is generated, responsively to the instrumentation, indicating that the execution path was traversed while executing the instrumented code.

Patent
Takashi Endo, Toshio Okochi, Takashi Watanabe, Shunsuke Ota, Tatsuya Kameyama
12 Apr 2007
TL;DR: In this paper, a program to be executed by a computer is divided into a plurality of code blocks, and a unique code block ID is allotted to each code block; when execution of the program starts, the code block ID corresponding to the execution start address is written to memory.
Abstract: A program to be executed by a computer is divided into a plurality of code blocks, and a unique code block ID is allotted to each code block. When execution of the program starts, the code block ID corresponding to the execution start address is written to memory. When control transfers from one code block to another, the code block ID in memory is updated using a code block operation value obtained beforehand from the two code block IDs, and the updated ID in memory is compared with the ID allotted to the code block about to execute, so that a control flow error is detected when they differ.
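
The checking scheme in the abstract above can be sketched as follows. The abstract does not state which operation combines the two code block IDs, so this sketch assumes XOR (as in classic signature-based control-flow checking); the block names, ID values, and the `ControlFlowChecker` class are hypothetical.

```python
# Hypothetical code blocks with unique IDs and the legal transfers between them.
BLOCK_IDS = {"entry": 0b0001, "loop": 0b0110, "exit": 0b1011}
EDGES = [("entry", "loop"), ("loop", "loop"), ("loop", "exit")]

# Operation values computed beforehand from the two IDs of each edge (assumed: XOR).
OP_VALUES = {(src, dst): BLOCK_IDS[src] ^ BLOCK_IDS[dst] for src, dst in EDGES}


class ControlFlowChecker:
    def __init__(self, start_block):
        # The ID of the block at the execution start address is written to memory.
        self.current_id = BLOCK_IDS[start_block]

    def transit(self, src, intended_dst):
        """Update the stored ID at the branch site using the precomputed operation value."""
        self.current_id ^= OP_VALUES[(src, intended_dst)]

    def check(self, executing_block):
        """At block entry, the stored ID must equal the ID allotted to that block."""
        return self.current_id == BLOCK_IDS[executing_block]


# Correct run: entry -> loop -> exit.
good = ControlFlowChecker("entry")
good.transit("entry", "loop")
print(good.check("loop"))  # prints True
good.transit("loop", "exit")
print(good.check("exit"))  # prints True

# Faulty run: the branch was meant to reach "loop" but control lands in "exit".
faulty = ControlFlowChecker("entry")
faulty.transit("entry", "loop")
print(faulty.check("exit"))  # prints False
```

With XOR, applying the operation value for a legal edge turns the source ID into the destination ID, so a mismatch at block entry signals that control arrived somewhere the branch did not intend.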