CYBER 2022, The Seventh International Conference on Cyber-Technologies and Cyber-Systems
Authors:
Kaan Oguzhan
Tiago Espinha Gasiba
Akram Louati
Keywords: machine learning; deep learning; industry; software; vulnerabilities
Abstract:
Machine learning has attracted growing attention in recent years. One recent application area is secure software development, where it is used to identify software vulnerabilities. These algorithms depend on the amount and quality of the data used for training. Although many studies on machine learning algorithms are emerging, one must also ask about the data used to train them. This paper addresses this question by investigating and analyzing freely available vulnerable code snippets. We investigate their quantity and quality in terms of the existing categorization of security vulnerabilities used in industrial environments. Furthermore, we investigate how these aspects vary across several programming languages. In addition, we provide the database containing the collected vulnerable code snippets for further research. Our results show that, while a large amount of training data is available for some programming languages, this is not the case for every language. Our results can be used by researchers and industry practitioners who work on machine learning and apply these algorithms to improve software security.
Pages: 25 to 33
Copyright: (c) IARIA, 2022
Publication date: November 13, 2022
Published in: conference
ISSN: 2519-8599
ISBN: 978-1-61208-996-6
Location: Valencia, Spain
Dates: from November 13, 2022 to November 17, 2022