VALID 2018, The Tenth International Conference on Advances in System Testing and Validation Lifecycle
Learning to Categorize Bug Reports With LSTM Networks
Authors:
Kaushikkumar D. Gondaliya
Jan Peters
Elmar Rueckert
Keywords: classification of text; bug reports; natural language processing; long short-term memory networks; support vector machines.
Abstract:
The manual routing of bug reports to specialized expert teams is a time-consuming and expensive process. In this paper, we investigated how this process can be automated by training deep networks and state-of-the-art classifiers on thousands of real bug reports from a software company. Different combinations of the natural language processing methods lemmatization, part-of-speech (POS) tagging, bigrams, and stopword removal were evaluated with three classification algorithms: linear Support Vector Machines (SVMs), multinomial naive Bayes, and Long Short-Term Memory (LSTM) networks. For feature processing, we used the Term Frequency-Inverse Document Frequency (TF-IDF) method. The best results were obtained by combining bigrams with linear SVMs. LSTM networks achieved similar prediction performance, and their results promise to improve further with larger datasets. The bug triage tool was implemented as a microservice architecture using Docker containers, which allows individual components to be extended and simplifies applying the tool to other text classification problems.
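The best-performing pipeline in the abstract (stopword removal, unigram plus bigram features weighted by TF-IDF, and a linear SVM) can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny stopword list, the example bug reports and team labels, and all class and function names are hypothetical. The IDF smoothing follows scikit-learn's default formula. The linear SVM is trained with the Pegasos stochastic sub-gradient method in a one-vs-rest setup without an intercept term; the paper does not state which SVM solver it used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real system would use a full list.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when", "after"}


def tokenize(text):
    """Lowercase, drop stopwords, and append bigrams of the remaining words."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class TfidfFeatures:
    """Sparse TF-IDF vectors (dicts) with smoothed IDF and L2 normalisation."""

    def fit(self, docs):
        n = len(docs)
        df = Counter(tok for d in docs for tok in set(tokenize(d)))
        # Smoothed IDF, same formula as scikit-learn's default: ln((1+n)/(1+df)) + 1
        self.idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
        return self

    def transform(self, doc):
        # Tokens never seen during fit() are dropped.
        counts = Counter(tok for tok in tokenize(doc) if tok in self.idf)
        vec = {tok: c * self.idf[tok] for tok, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {tok: v / norm for tok, v in vec.items()}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels ys are +1/-1. No intercept term is learned, to keep the sketch short.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * dot(w, xs[i])  # evaluated with the current weights
            for k in w:  # regulariser: shrink every weight
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step towards y_i * x_i
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


class OneVsRestSvm:
    """One linear SVM per team; predict the team whose SVM scores highest."""

    def fit(self, docs, labels):
        self.features = TfidfFeatures().fit(docs)
        xs = [self.features.transform(d) for d in docs]
        self.weights = {
            team: train_binary_svm(xs, [1 if lab == team else -1 for lab in labels])
            for team in sorted(set(labels))
        }
        return self

    def predict(self, doc):
        x = self.features.transform(doc)
        return max(self.weights, key=lambda team: dot(self.weights[team], x))


if __name__ == "__main__":
    # Hypothetical bug reports and team labels, for illustration only.
    reports = [
        "Button is misaligned on the login screen",
        "Dialog window renders blank after resize",
        "Database query times out on large tables",
        "Index corruption in the database after crash",
    ]
    teams = ["ui", "ui", "db", "db"]
    clf = OneVsRestSvm().fit(reports, teams)
    print(clf.predict("Login button misaligned"))  # expected: ui
```

In practice, a library such as scikit-learn (`TfidfVectorizer` with `ngram_range=(1, 2)`, followed by `LinearSVC`) would replace the hand-written parts. The sketch keeps every step explicit so the roles of stopword removal, bigrams, TF-IDF weighting, and the linear classifier are visible.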
Pages: 7 to 12
Copyright: Copyright (c) IARIA, 2018
Publication date: October 14, 2018
Published in: conference proceedings
ISSN: 2308-4316
ISBN: 978-1-61208-671-2
Location: Nice, France
Dates: October 14 to 18, 2018