Comparison and Evaluation of Clone Detection Tools
Citations
Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
A Survey on Software Clone Detection Research
The state of the art in end-user software engineering
NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization
SourcererCC: scaling code clone detection to big-code
References
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
A Space-Economical Suffix Tree Construction Algorithm
Efficient randomized pattern-matching algorithms
Winnowing: local algorithms for document fingerprinting
Clone detection using abstract syntax trees
Frequently Asked Questions (9)
Q2. What future work is mentioned in the paper "Comparison and evaluation of clone detection tools"?
To see how much the results depend upon Bellon, the authors plan to replicate the experiment with different independent judges. The whole benchmark suite with source code of the comparison framework, the data submitted by the participants, the reference set, and evaluation results are available online at [34] so that the experiment can be inspected in detail, replicated, and enhanced for new systems and clone detectors.
Q3. What is the underlying similarity function of clones?
The evaluation is based on clone pairs rather than equivalence classes of clones because, only for type-1 and type-2 clones, the underlying similarity function is reflexive, symmetric, and transitive.
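A small sketch may help show why transitivity is the property that breaks once modified (type-3) clones are allowed. The similarity measure below (shared statements divided by all statements) and the 0.5 threshold are assumptions chosen for illustration, not the paper's definition; the fragments are hypothetical.

```python
def similarity(a, b):
    """Assumed measure: shared statements divided by all distinct statements."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def is_clone(a, b, threshold=0.5):
    """Assumed rule: two fragments are clones if they are similar enough."""
    return similarity(a, b) >= threshold

# Hypothetical fragments, each differing from its neighbour by one statement.
frag_a = ["s1", "s2", "s3", "s4"]
frag_b = ["s2", "s3", "s4", "s5"]
frag_c = ["s3", "s4", "s5", "s6"]

print(is_clone(frag_a, frag_a))  # reflexive: True
print(is_clone(frag_a, frag_b), is_clone(frag_b, frag_a))  # symmetric: True True
print(is_clone(frag_b, frag_c))  # True
print(is_clone(frag_a, frag_c))  # False, so the relation is not transitive
```

Because A matches B and B matches C while A does not match C, the three fragments cannot be grouped into one equivalence class, which is why reporting pairs is the safer unit.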
Q4. By how many lines can two fragments be shifted when p = 0.7?
Because the threshold for the acceptable length of a clone was 6 in the experiment, the choice of p = 0.7 allows two six-line code fragments to be shifted by one line.
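The arithmetic behind this can be checked with a short sketch. It assumes the overlap of two fragments is the number of lines they share divided by the number of lines covered by either one; that formula is not given in this excerpt, and the line ranges are hypothetical.

```python
def line_set(start, end):
    """Lines covered by a fragment with inclusive start and end lines."""
    return set(range(start, end + 1))

def overlap(a, b):
    """Assumed measure: shared lines divided by lines covered by either fragment."""
    la, lb = line_set(*a), line_set(*b)
    return len(la & lb) / len(la | lb)

P = 0.7  # threshold from the text

base = (10, 15)       # six lines
shift_one = (11, 16)  # same length, shifted by one line
shift_two = (12, 17)  # same length, shifted by two lines

print(overlap(base, shift_one) >= P)  # 5/7 is about 0.714, so True
print(overlap(base, shift_two) >= P)  # 4/8 is 0.5, so False
```

Under this assumption a one-line shift of six-line fragments just clears the threshold, while a two-line shift does not.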
Q5. Why is syntax not taken into account?
Because syntax is not taken into account, the found clones may overlap different syntactic units, which cannot be replaced through functional abstraction.
Q6. What were the changes in the try-catch block?
In the try-catch block (in total, 9 and 11 lines, respectively), a method call was replaced by a string literal, an assignment was added, a simple assignment was turned into a declaration with initialization, a throw statement was added, and a package qualifier was extended.
Q7. Why is the yield value in Fig. 8 a lower bound?
Because oracling two identical candidates is negligible given the high absolute number and the low percentage of candidates the authors actually looked at in their experiment, the yield value in Fig. 8 is still a meaningful lower bound for the overall acceptance rate.
Q8. How is a code fragment defined?
Definition 2. A code fragment is a tuple (f, s, e) which consists of the name of the source file f, the start line s, and the end line e of the fragment.
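As a minimal sketch, the definition maps directly onto a small record type. The class name, field names, the `length` helper, and the file name are hypothetical, and the end line is assumed to be inclusive.

```python
from typing import NamedTuple

class Fragment(NamedTuple):
    """A code fragment: source file name, start line, end line."""
    file: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes both start and end lines belong to the fragment.
        return self.end - self.start + 1

frag = Fragment("example.c", 10, 15)
f, s, e = frag  # unpacks in the same order as the definition
print(frag.length)  # 6
```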
Q9. What is the encoding of the functors?
The functors and their parameters are summarized in a suffix tree, a trie that represents all suffixes of the program in a compact fashion.
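To illustrate the idea of a trie over all suffixes, here is a naive, uncompressed suffix trie over a token sequence. It is a sketch only: a real suffix tree compresses chains of single children and is built far more efficiently, and this is not presented as the tool's implementation. The token names and helper functions are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.count = 0  # number of suffixes that pass through this node

def build_suffix_trie(tokens):
    """Insert every suffix of the token sequence into a trie (quadratic size)."""
    root = Node()
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:]:
            node = node.children.setdefault(tok, Node())
            node.count += 1
    return root

def longest_repeat(root):
    """Longest token sequence that occurs at least twice in the input."""
    best = []
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for tok, child in node.children.items():
            if child.count >= 2:  # shared by two or more suffixes
                stack.append((child, path + [tok]))
    return best

# Hypothetical normalized tokens for two identical statements.
tokens = ["id", "=", "id", "+", "num", ";", "id", "=", "id", "+", "num", ";"]
print(longest_repeat(build_suffix_trie(tokens)))
# ['id', '=', 'id', '+', 'num', ';']
```

Any path shared by two or more suffixes corresponds to a repeated token sequence, which is what makes the structure useful for finding clone candidates.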