Malware Detection Using Machine Learning: A Comparative Analysis

Mohammed, Sameeruddin; Zhang, Fan; Brishti, Faria; Chen, Baiyun; Wu, Fan

Conference: IARIA Congress 2025, The 2025 IARIA Annual Congress on Frontiers in Science, Technology, Services, and Applications

Keywords: machine learning; malware detection; classification; model comparison; model evaluation.

Abstract:
To address the growing challenges posed by cyber threats, anti-malware organizations have increasingly turned to Machine Learning (ML). In recent years, ML algorithms have become indispensable for solving complex classification problems, outperforming traditional statistical methods by capturing intricate patterns in high-dimensional data. However, selecting the optimal model requires rigorous evaluation across multiple performance metrics while ensuring stability across different data splits. In this study, we used stratified 5-fold cross-validation to conduct a comprehensive assessment of eight machine learning algorithms: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes, Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), and K-Nearest Neighbors (KNN). Our results reveal that RF, LightGBM, DT, and KNN achieve exceptional performance, with identical near-perfect scores in accuracy (0.9918), precision (0.9920), recall (0.9918), F1 score (0.9918), and Area Under the Receiver Operating Characteristic Curve (AUC-ROC) (0.9998), along with remarkably low variance across folds (10⁻⁶ to 10⁻⁸), indicating strong robustness. The study highlights the superiority of tree-based models and KNN in achieving high predictive power and stability, whereas classical algorithms such as logistic regression and naive Bayes lag behind. Despite XGBoost's reputation, its performance here is eclipsed by simpler tree-based methods. Our analysis underscores the importance of considering variance during model selection, particularly for critical applications where stability is paramount, and provides actionable insights for practitioners seeking reliable, high-accuracy classifiers.
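The evaluation protocol summarized in the abstract (stratified 5-fold cross-validation, then comparing models on both mean score and fold-to-fold variance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`stratified_k_fold`, `cross_validate`, `one_nn`, `majority`) are hypothetical, the two toy classifiers and the synthetic data stand in for the paper's eight models and malware dataset, only accuracy is computed, and ranking by higher mean and then lower variance is just one simple way to account for stability.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin across
    the k folds, so every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(fit_predict, X, y, k=5, metric=accuracy, seed=0):
    """Return (mean, population variance) of the metric over the k folds."""
    scores = []
    for train, test in stratified_k_fold(y, k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(metric([y[i] for i in test], preds))
    return mean(scores), pvariance(scores)


# Hypothetical toy models on 1-D features (not the paper's classifiers).
def one_nn(X_train, y_train, X_test):
    """K-nearest neighbors with k=1: copy the label of the closest training point."""
    def nearest(x):
        return min(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
    return [y_train[nearest(x)] for x in X_test]


def majority(X_train, y_train, X_test):
    """Baseline that always predicts the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return [top] * len(X_test)


# Synthetic, clearly separable data with a 30/20 class split (not real malware data).
X = [float(i) for i in range(30)] + [float(100 + i) for i in range(20)]
y = [0] * 30 + [1] * 20

results = {name: cross_validate(model, X, y)
           for name, model in [("1-NN", one_nn), ("majority", majority)]}
# Rank by higher mean score first, then by lower variance across folds.
ranking = sorted(results, key=lambda m: (-results[m][0], results[m][1]))
print(results, ranking)
```

In practice, scikit-learn's `StratifiedKFold` and `cross_validate` perform the same splitting and scoring, and accept scorers such as `"precision_weighted"`, `"recall_weighted"`, `"f1_weighted"`, and `"roc_auc"` for the other metrics. The abstract does not state which averaging was used; note that weighted-average recall is always equal to accuracy, since each class's recall is weighted by its share of the samples.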

Pages: 94 to 99

Copyright: Copyright (c) IARIA, 2025

Publication date: July 6, 2025

Published in: conference

ISBN: 978-1-68558-284-5

Location: Venice, Italy

Dates: from July 6, 2025 to July 10, 2025