Learning to Categorize Bug Reports With LSTM Networks

Gondaliya, Kaushikkumar D.; Peters, Jan; Rueckert, Elmar

VALID 2018, The Tenth International Conference on Advances in System Testing and Validation Lifecycle

Authors:
Kaushikkumar D. Gondaliya
Jan Peters
Elmar Rueckert

Keywords: classification of text; bug reports; natural language processing; long short term memory networks; support vector machines.

Abstract:
The manual routing of bug reports to specialized expert teams is a time-consuming and expensive process. In this paper, we investigated how this process can be automated by training deep networks and state-of-the-art classifiers on thousands of real bug reports from a software company. Different combinations of the natural language processing methods lemmatization, part-of-speech (POS) tagging, bigrams, and stopword removal were evaluated with three classification algorithms: linear Support Vector Machines (SVMs), multinomial naive Bayes, and Long Short-Term Memory (LSTM) networks. For feature processing, we used the Term Frequency-Inverse Document Frequency (TF-IDF) method. The best results were obtained with a combination of the bigram method and linear SVMs. Similar prediction performance was observed with LSTM networks, which, however, promise to improve further with larger datasets. The bug triage tool was implemented in a microservice architecture using Docker containers, which allows individual components to be extended and simplifies application to other text classification problems.
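
As an illustration of the feature processing named in the abstract, the following minimal sketch computes TF-IDF weights over unigrams and bigrams in plain Python. It is not the authors' implementation: the abstract does not state the tokenizer or the TF-IDF variant, so whitespace tokenization and the textbook weighting tf * log(N / df) are assumptions, and the function names and the three sample reports are hypothetical.

```python
import math
from collections import Counter


def tokens_with_bigrams(text):
    """Lowercase, split on whitespace, and append adjacent-word bigrams."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: this is the textbook TF-IDF variant, not necessarily the
    one used in the paper.
    """
    tokenized = [tokens_with_bigrams(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))

    vectors = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical bug report texts, invented for illustration only.
reports = [
    "login page crashes on submit",
    "login page loads slowly",
    "database backup fails on submit",
]
vectors = tfidf(reports)
```

In this sketch, a bigram that occurs in only one report (such as "page crashes") receives a higher weight than one shared by two reports (such as "login page"). Vectors of this kind would then be passed to a classifier; according to the abstract, the combination of bigrams and linear SVMs gave the best results.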

Pages: 7 to 12

Copyright: Copyright (c) IARIA, 2018

Publication date: October 14, 2018

Published in: conference

ISSN: 2308-4316

ISBN: 978-1-61208-671-2

Location: Nice, France

Dates: from October 14, 2018 to October 18, 2018