IARIA Congress 2025, The 2025 IARIA Annual Congress on Frontiers in Science, Technology, Services, and Applications
Malware Detection Using Machine Learning: A Comparative Analysis
Authors:
Sameeruddin Mohammed
Fan Zhang
Faria Brishti
Baiyun Chen
Fan Wu
Keywords: machine learning; malware detection; classification; model comparison; model evaluation.
Abstract:
To address the growing challenges posed by cyber threats, anti-malware organizations have increasingly turned to Machine Learning (ML). In recent years, machine learning algorithms have become indispensable for solving complex classification problems, outperforming traditional statistical methods by capturing intricate patterns in high-dimensional data. However, selecting the optimal model requires rigorous evaluation across multiple performance metrics while ensuring stability across different data splits. In this study, we conducted a comprehensive assessment of eight machine learning algorithms, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes, Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), and K-Nearest Neighbors (KNN), using stratified 5-fold cross-validation. Our results reveal that RF, LightGBM, DT, and KNN achieve exceptional performance, with identical near-perfect scores in accuracy (0.9918), precision (0.9920), recall (0.9918), F1 score (0.9918), and Area Under the Receiver Operating Characteristic Curve (AUC-ROC) (0.9998), along with remarkably low variance across folds (10^-6 to 10^-8), demonstrating unparalleled robustness. The study highlights the superiority of tree-based ensembles and KNN in achieving high predictive power and stability, whereas classical algorithms such as Logistic Regression and Naive Bayes lag behind. Despite XGBoost's reputation, its performance here is eclipsed by simpler tree-based methods. Our analysis underscores the importance of considering variance during model selection, particularly for critical applications where stability is paramount, and provides actionable insights for practitioners seeking reliable, high-accuracy classifiers.
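The evaluation protocol described in the abstract (stratified 5-fold cross-validation, with the mean and variance of each metric taken across folds) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the two-class data is synthetic, the 3-nearest-neighbor classifier and the helper names (`stratified_folds`, `knn_predict`, `cross_validate_accuracy`) are hypothetical, only accuracy is computed, and the use of population variance is an assumption, since the abstract does not say which variance estimator was used.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share of it.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = defaultdict(int)
    for i in nearest:
        votes[train_y[i]] += 1
    return max(votes, key=votes.get)


def cross_validate_accuracy(X, y, k_folds=5, seed=0):
    """Return one accuracy score per held-out fold."""
    scores = []
    for held_out in stratified_folds(y, k_folds, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i]) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


# Synthetic stand-in data (assumption): two Gaussian clusters, 100 samples each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

fold_scores = cross_validate_accuracy(X, y)
mean_accuracy = statistics.mean(fold_scores)
accuracy_variance = statistics.pvariance(fold_scores)  # spread across folds
print(f"mean accuracy = {mean_accuracy:.4f}, variance = {accuracy_variance:.2e}")
```

Reporting the variance next to the mean is what allows two models with similar average scores to be separated by how stable they are across data splits, which is the selection criterion the abstract emphasizes.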
Pages: 94 to 99
Copyright: Copyright (c) IARIA, 2025
Publication date: July 6, 2025
Published in: conference
ISBN: 978-1-68558-284-5
Location: Venice, Italy
Dates: from July 6, 2025 to July 10, 2025