Report. Year: 2009

Syntax tree fingerprinting: a foundation for source code similarity detection

Abstract

Plagiarism detection and clone refactoring in software depend on one common concern: finding similar source chunks across large repositories. However, since code duplication in software is often the result of copy-paste behaviors, only minor modifications are expected between shared codes. On the contrary, in a plagiarism detection context, edits are more extensive and exact matching strategies show their limits. Among the three main representations used by source code similarity detection tools, namely the linear token sequences, the Abstract Syntax Tree (AST) and the Program Dependency Graph (PDG), we believe that the AST could efficiently support the program analysis and transformations required for the advanced similarity detection process. In this paper we present a simple and scalable architecture based on syntax tree fingerprinting. Thanks to a study of several hashing strategies reducing false-positive collisions, we propose a framework that efficiently indexes AST representations in a database, that quickly detects exact (w.r.t. source code abstraction) clone clusters and that easily retrieves their corresponding ASTs. Our aim is to allow further processing of neighboring exact matches in order to identify the larger approximate matches, dealing with the common modification patterns seen in the intra-project copy-pastes and in the plagiarism cases.
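To make the idea of syntax tree fingerprinting concrete, here is a minimal sketch in Python. It is not the authors' framework: it assumes Python's standard `ast` module as the parser, SHA-1 as the hash function, an in-memory dictionary in place of a database, and a hypothetical `min_size` threshold to skip trivial subtrees. Each subtree is hashed bottom-up from its node type and the hashes of its children, so identifier names and literal values are abstracted away, and subtrees sharing a fingerprint form an exact clone cluster.

```python
import ast
import hashlib
from collections import defaultdict


def fingerprint(node, index, min_size=5):
    """Hash the subtree rooted at `node` and return (digest, size).

    The digest depends only on node types and child digests, so names and
    constants are ignored (the abstraction level assumed by this sketch).
    Subtrees with at least `min_size` nodes are recorded in `index`.
    """
    child_digests = []
    size = 1
    for child in ast.iter_child_nodes(node):
        digest, child_size = fingerprint(child, index, min_size)
        child_digests.append(digest)
        size += child_size
    payload = type(node).__name__ + "(" + ",".join(child_digests) + ")"
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    if size >= min_size:
        index[digest].append(node)
    return digest, size


def clone_clusters(source, min_size=5):
    """Return lists of AST nodes whose subtrees share a fingerprint."""
    index = defaultdict(list)  # stands in for the database index
    fingerprint(ast.parse(source), index, min_size)
    return [nodes for nodes in index.values() if len(nodes) > 1]


if __name__ == "__main__":
    sample = (
        "def f(a, b):\n"
        "    total = a + b\n"
        "    return total * 2\n"
        "\n"
        "def g(x, y):\n"
        "    s = x + y\n"
        "    return s * 2\n"
    )
    for cluster in clone_clusters(sample):
        lines = [getattr(n, "lineno", None) for n in cluster]
        print(type(cluster[0]).__name__, "at lines", lines)
```

Because the index keeps the nodes themselves, each cluster gives direct access to the matching subtrees, which is the starting point the abstract describes for extending neighboring exact matches into larger approximate ones. A real system would also need to handle hash collisions, which this sketch does not address.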
Main file
HAL.pdf (400.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00627811, version 1 (29-09-2011)

Identifiers

  • HAL Id: hal-00627811, version 1

Cite

Michel Chilowicz, Étienne Duris, Gilles Roussel. Syntax tree fingerprinting: a foundation for source code similarity detection. 2009. ⟨hal-00627811⟩
