SOFTENG 2016, The Second International Conference on Advances and Trends in Software Engineering


Sequence Data Mining Approach for Detecting Type-3 Clones

Authors:
Yoshihisa Udagawa
Mitsuyoshi Kitamura

Keywords: Code clone; Maximal frequent sequence; Longest common subsequence (LCS) algorithm; Java source code.

Abstract:
Code clones are introduced into source code when statements in copied code fragments are changed, added, and/or deleted. Thus, finding code clones is essentially a matter of detecting strings that match partially. The proposed algorithm is based on the well-known apriori principle from data mining and is tailored to detect code clones represented as sequences of strings. However, the apriori principle may generate too many sequential patterns. The proposed algorithm therefore finds a compact representation of sequential patterns, known as maximal frequent sequential patterns, which is often two orders of magnitude smaller than the full set of frequent sequential patterns. Early experiments on the lang package of Java SDK 1.7.0.45 report the number of extracted patterns and the elapsed time in several contexts.
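The apriori idea and the notion of maximal frequent sequential patterns can be illustrated with a minimal Python sketch. It assumes each code fragment has already been reduced to a sequence of normalized statement strings (the normalization step, the toy data, and the names `is_subsequence`, `mine_frequent_sequences`, `maximal_patterns`, and `min_support` are all hypothetical). A pattern's support is the number of fragments that contain it as a subsequence with gaps allowed, so added or deleted statements do not break a match. Patterns grow level by level, and any candidate with an infrequent sub-pattern is discarded. This is a generic GSP-style illustration, not the authors' algorithm.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, preserving order.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(1 for seq in sequences if is_subsequence(pattern, seq))


def mine_frequent_sequences(sequences, min_support):
    """Level-wise (apriori-style) mining of all frequent subsequences.

    Anti-monotonicity: if a pattern is infrequent, every pattern containing
    it is infrequent too, so such candidates are pruned before counting.
    """
    items = sorted({item for seq in sequences for item in seq})
    level = [(i,) for i in items if support((i,), sequences) >= min_support]
    frequent_items = [p[0] for p in level]
    frequent = set(level)
    while level:
        candidates = set()
        for pattern in level:
            for item in frequent_items:
                cand = pattern + (item,)
                # Apriori pruning: every subsequence obtained by deleting
                # one element must already be known to be frequent.
                if all(cand[:i] + cand[i + 1:] in frequent for i in range(len(cand))):
                    candidates.add(cand)
        level = sorted(c for c in candidates if support(c, sequences) >= min_support)
        frequent.update(level)
    return frequent


def maximal_patterns(frequent):
    """Frequent patterns that are not a subsequence of a longer frequent pattern."""
    return sorted(
        p for p in frequent
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in frequent)
    )


if __name__ == "__main__":
    # Hypothetical normalized statement sequences for three similar methods;
    # each differs from the others by one added statement (Type-3 style gaps).
    methods = [
        ["open", "read", "log", "close"],
        ["open", "check", "read", "close"],
        ["open", "read", "close", "return"],
    ]
    frequent = mine_frequent_sequences(methods, min_support=3)
    print(len(frequent))                # 7
    print(maximal_patterns(frequent))   # [('open', 'read', 'close')]
```

On this toy input, seven frequent patterns collapse to a single maximal pattern, which is the kind of reduction the abstract describes; the ratio on real source code will differ. Counting support by gapped subsequence containment is one simple choice made for the sketch; a partial-match measure such as the LCS length listed in the keywords could be used to compare fragments instead.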

Pages: 82 to 88

Copyright: Copyright (c) IARIA, 2016

Publication date: February 21, 2016

Published in: conference

ISSN: 2519-8394

ISBN: 978-1-61208-458-9

Location: Lisbon, Portugal

Dates: from February 21, 2016 to February 25, 2016