International Journal On Advances in Software, volume 7, numbers 3 and 4, 2014


Combining Association Mining with Topic Modeling to Discover More File Relationships

Authors:
Namita Dave
Karen Potts
Vu Dinh
Hazeline U. Asuncion

Keywords: Association mining; Topic Modeling; Software Engineering.

Abstract:
Software maintenance tasks require familiarity with the entire software system to make proper changes. Often, maintenance engineers who did not develop the software are tasked with corrective or adaptive maintenance tasks. As a result, modifying the software becomes a time-consuming process due to their lack of familiarity with the source code. To help software engineers locate relevant files for a maintenance task, association mining has been used to identify the files that frequently change together in a software repository. However, association mining techniques are limited by the amount of project history stored in a software repository. We address this limitation by using a technique that combines association mining with topic modeling, referred to as Frequent Pattern Growth with Latent Dirichlet Allocation (FP-LDA). Topic modeling aims to uncover file relationships by learning semantic topics from source files. We validated our technique via experiments on seven open source projects with different project characteristics. Our results indicate that FP-LDA can find more related files than association mining alone. We also offer lessons learned from our investigation.

Pages: 539 to 550

Copyright: Copyright (c) to authors, 2014. Used with permission.

Publication date: December 30, 2014

Published in: journal

ISSN: 1942-2628