SOFTENG 2016, The Second International Conference on Advances and Trends in Software Engineering
Sequence Data Mining Approach for Detecting Type-3 Clones
Authors:
Yoshihisa Udagawa
Mitsuyoshi Kitamura
Keywords: Code clone; Maximal frequent sequence; Longest common subsequence (LCS) algorithm; Java source code.
Abstract:
Code clones are introduced into source code when statements in copied code fragments are changed, added, and/or deleted. Finding such clones is therefore essentially a problem of detecting strings that match only partially. The proposed algorithm is based on the well-known Apriori principle from data mining and is tailored to detect code clones represented as sequences of strings. However, the Apriori principle alone may generate too many sequential patterns. The proposed algorithm therefore finds a compact representation of them, known as maximal frequent sequential patterns, which is often two orders of magnitude smaller than the full set of frequent sequential patterns. Early experiments using the lang package of Java SDK 1.7.0.45 report the number of extracted patterns and the elapsed time in several contexts.
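The idea of mining frequent sequential patterns with the Apriori principle and then keeping only the maximal ones can be illustrated with a minimal sketch. This is not the authors' algorithm. It is a generic GSP-style level-wise miner under stated assumptions: each code fragment has already been reduced to a sequence of normalized statement tokens, a pattern "occurs" in a fragment if it is an order-preserving subsequence (gaps allowed, so an added or deleted statement still leaves the shared pattern intact), and `min_support` is an absolute count of fragments. The function names, the tokens, and the example database are hypothetical.

```python
from itertools import chain


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    # Each `in` test consumes the iterator up to the match, preserving order.
    return all(token in it for token in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_sequences(database, min_support):
    """Level-wise (Apriori-style) mining of all frequent subsequence patterns."""
    items = sorted(set(chain.from_iterable(database)))
    level = [(t,) for t in items if support((t,), database) >= min_support]
    frequent = list(level)
    while level:
        level_set = set(level)
        candidates = set()
        for a in level:
            for b in level:
                # Join step: a without its first token must equal b without its last.
                if a[1:] == b[:-1]:
                    cand = a + (b[-1],)
                    # Apriori pruning: every shorter subpattern must be frequent.
                    if all(cand[:i] + cand[i + 1:] in level_set
                           for i in range(len(cand))):
                        candidates.add(cand)
        level = sorted(c for c in candidates if support(c, database) >= min_support)
        frequent.extend(level)
    return frequent


def maximal(frequent):
    """Keep patterns that are not a subsequence of a longer frequent pattern."""
    return [p for p in frequent
            if not any(len(q) > len(p) and is_subsequence(p, q) for q in frequent)]


# Hypothetical fragments, already tokenized into normalized statements.
methods = [
    ["open", "read", "check", "close"],
    ["open", "read", "log", "check", "close"],  # one statement added
    ["open", "write", "close"],
]

all_patterns = frequent_sequences(methods, min_support=2)
print(len(all_patterns), maximal(all_patterns))
```

With `min_support=2`, the sketch finds 15 frequent patterns but only one maximal pattern, `('open', 'read', 'check', 'close')`, which matches the first two fragments despite the inserted `log` statement. The join-and-prune loop is what the Apriori principle buys: a candidate is counted only if all of its shorter subpatterns are already known to be frequent.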
Pages: 82 to 88
Copyright: Copyright (c) IARIA, 2016
Publication date: February 21, 2016
Published in: conference
ISSN: 2519-8394
ISBN: 978-1-61208-458-9
Location: Lisbon, Portugal
Dates: from February 21, 2016 to February 25, 2016