Finding similarities in source code through factorization

Michel Chilowicz; Étienne Duris; Gilles Roussel

Conference paper, Year: 2008


Michel Chilowicz

Role: Author

Laboratoire d'Informatique Gaspard-Monge

Étienne Duris

Role: Author

Laboratoire d'Informatique Gaspard-Monge

Gilles Roussel

Role: Author
PersonId : 182892
IdHAL : gilles-roussel

Laboratoire d'Informatique Gaspard-Monge

Abstract

The wide availability of a huge number of documents on the Web makes plagiarism attractive and easy. Plagiarism affects any kind of document, from natural-language texts to more structured information such as programs. To cope with this problem, many tools and algorithms have been proposed to find similarities. In this paper we present a new algorithm designed to detect similarities in source code. Contrary to existing methods, this algorithm relies on the notion of function and focuses on obfuscation through inlining and outlining of functions. The method is also efficient against insertions, deletions and permutations of instruction blocks. It is based on code factorization and uses adapted pattern-matching algorithms and structures such as suffix arrays.
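To make the role of suffix arrays concrete, here is a minimal sketch of finding the longest token run shared by two code fragments. It is an illustration only, not the factorization algorithm of the paper. The function names, the whitespace tokenization, and the `"\x00"` separator are assumptions introduced for this example, and the naive suffix sort is far slower than the constructions a real tool would use.

```python
def suffix_array(tokens):
    """Return the start indices of all suffixes of `tokens`, in sorted order."""
    # Naive construction by sorting slices: fine for a sketch, too slow at scale.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def longest_common_run(a, b):
    """Return the longest contiguous token run that occurs in both `a` and `b`."""
    # Assumption: tokens are strings and "\x00" never appears as a token,
    # so a match can never extend across the boundary between a and b.
    sep = "\x00"
    text = list(a) + [sep] + list(b)
    sa = suffix_array(text)
    best = []
    # The longest shared run always shows up between two neighbouring
    # suffixes of the sorted order that start in different sequences.
    for i, j in zip(sa, sa[1:]):
        if (i < len(a)) == (j < len(a)):
            continue  # both suffixes start in the same sequence
        k = 0
        while (i + k < len(text) and j + k < len(text)
               and text[i + k] == text[j + k]):
            k += 1
        if k > len(best):
            best = text[i:i + k]
    return best


# Hypothetical token streams: the second renames the variables x and y.
original = "x = a + b ; y = x * 2".split()
suspect = "t = a + b ; u = t * 2".split()
print(longest_common_run(original, suspect))
```

Renaming identifiers shortens the match in this sketch because tokens are compared literally; detectors usually normalize identifiers before matching, a step omitted here.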

Domains

Data Structures and Algorithms [cs.DS]

Main file

hal.pdf (222.89 KB)

Origin: Files produced by the author(s)

Contributor: Etienne Duris

https://hal.science/hal-00620319

Submitted on: Friday, 30 September 2011, 16:32:12

Last modified on: Thursday, 28 March 2024, 03:25:14

Long-term archiving on: Tuesday, 13 November 2012, 14:52:06

Dates and versions

hal-00620319 , version 1 (30-09-2011)

Identifiers

HAL Id : hal-00620319 , version 1

Cite

Michel Chilowicz, Étienne Duris, Gilles Roussel. Finding similarities in source code through factorization. 8th Workshop on Language Descriptions, Tools and Applications (LDTA'08), Apr 2008, Budapest, Hungary. pp.47-62. ⟨hal-00620319⟩


Collections

ENPC CNRS UNIV-MLV LIGM_ALGO PARISTECH LIGM LIGM_LRT ESIEE-PARIS UNIV-EIFFEL JSE2024

61 views

175 downloads
