Finding similarities in source code through factorization

Abstract : The ready availability of a huge number of documents on the Web makes plagiarism attractive and easy. Plagiarism affects every kind of document, from natural-language texts to more structured material such as programs. To cope with this problem, many tools and algorithms have been proposed to find similarities. In this paper we present a new algorithm designed to detect similarities in source code. Unlike existing methods, this algorithm relies on the notion of function and focuses on obfuscation through inlining and outlining of functions. The method is also robust against insertions, deletions and permutations of instruction blocks. It is based on code factorization and uses adapted pattern-matching algorithms and data structures such as suffix arrays.
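
As a rough illustration of why suffix arrays suit this kind of matching, the sketch below finds the longest run of tokens shared by two token sequences. It is not the factorization algorithm of the paper: the tokenization, the function name `longest_common_run` and the naive suffix sorting are assumptions made for brevity, and a practical tool would use a linear-time suffix array construction.

```python
def longest_common_run(tokens_a, tokens_b):
    """Return the longest contiguous token run present in both sequences.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    """
    # Map each distinct token to a small integer so suffixes compare cheaply.
    ids = {}
    a = [ids.setdefault(t, len(ids)) for t in tokens_a]
    b = [ids.setdefault(t, len(ids)) for t in tokens_b]

    # Join both sequences with a sentinel (-1) that matches no real token.
    text = a + [-1] + b
    boundary = len(a)  # indices below this belong to tokens_a

    # Naive suffix array: sort suffix start positions by suffix content.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])

    best_len, best_start = 0, 0
    # The longest common run appears as the longest common prefix of two
    # adjacent suffixes that originate from different sequences.
    for prev, cur in zip(suffix_array, suffix_array[1:]):
        if (prev < boundary) == (cur < boundary):
            continue  # both suffixes come from the same sequence
        n = 0
        while (prev + n < len(text) and cur + n < len(text)
               and text[prev + n] == text[cur + n]):
            n += 1
        if n > best_len:
            best_len, best_start = n, min(prev, cur)

    # min(prev, cur) is always the position inside tokens_a.
    return list(tokens_a[best_start:best_start + best_len])


original = ["x", "=", "a", "+", "b", ";", "return", "x", ";"]
suspect = ["y", "=", "1", ";", "x", "=", "a", "+", "b", ";",
           "print", "(", "x", ")", ";"]
shared = longest_common_run(original, suspect)
```

Working on tokens rather than characters makes the comparison insensitive to whitespace and layout; handling renamed identifiers or inlined functions, as the abstract describes, requires more than this sketch provides.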
Document type : Conference papers

https://hal-upec-upem.archives-ouvertes.fr/hal-00620319
Contributor : Etienne Duris
Submitted on : Friday, September 30, 2011 - 4:32:12 PM
Last modification on : Wednesday, February 26, 2020 - 7:06:05 PM
Long-term archiving on : Tuesday, November 13, 2012 - 2:52:06 PM

File

hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00620319, version 1

Citation

Michel Chilowicz, Étienne Duris, Gilles Roussel. Finding similarities in source code through factorization. 8th Workshop on Language Descriptions, Tools and Applications (LDTA'08), Apr 2008, Budapest, Hungary. pp.47-62. ⟨hal-00620319⟩
