Identifying Potentially Useful Email Header Features for Email Spam Filtering

Al-Jarrah, Omar; Khater, Ismail; Al-Duwairi, Basheer

ICDS 2012, The Sixth International Conference on Digital Society

Authors:
Omar Al-Jarrah
Ismail Khater
Basheer Al-Duwairi

Keywords: Email spam, Machine Learning

Abstract:
Email spam continues to be a major problem on the Internet. With the spread of malware combined with the power of botnets, spammers are now able to launch large-scale spam campaigns, causing major traffic increases and leading to enormous economic losses. In this paper, we identify potentially useful email header features for email spam filtering by analyzing publicly available datasets. Then, we use these features as input to several machine learning-based classifiers and compare their performance in filtering email spam. These classifiers are: C4.5 Decision Tree (DT), Support Vector Machine (SVM), Multilayer Perceptron (MP), Naive Bayes (NB), Bayesian Network (BN), and Random Forest (RF). Experimental studies based on publicly available datasets show that the RF classifier has the best performance, with an average accuracy, precision, recall, F-Measure, and ROC area of 98.5%, 98.4%, 98.5%, and 98.5%, respectively.
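
As a rough illustration of what header-based feature extraction can look like, the following minimal sketch parses a raw message with Python's standard `email` package and derives a few numeric features. The feature names and the choice of features (`num_received_hops`, `reply_to_differs`, and so on) are assumptions made for this example only; they are not the feature set identified in the paper. A vector like this could then be passed to any of the classifiers listed above.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def extract_header_features(raw_message: str) -> dict:
    """Derive a few illustrative (hypothetical) features from email headers."""
    msg = message_from_string(raw_message)

    # Each relay normally prepends one Received header.
    received = msg.get_all("Received") or []

    # parseaddr returns (display_name, address); only the address is used.
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))

    # Collect every address listed in To and Cc.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

    subject = msg.get("Subject", "")

    return {
        "num_received_hops": len(received),
        "num_recipients": len([addr for _, addr in recipients if addr]),
        "has_message_id": int(msg.get("Message-ID") is not None),
        "has_date": int(msg.get("Date") is not None),
        "reply_to_differs": int(
            bool(reply_addr) and reply_addr.lower() != from_addr.lower()
        ),
        "subject_length": len(subject),
    }


# Hypothetical sample message, used only to exercise the function.
sample = (
    "Received: from a.example by b.example; Mon, 30 Jan 2012 10:00:00 +0000\n"
    "Received: from c.example by a.example; Mon, 30 Jan 2012 09:59:58 +0000\n"
    "From: Alice <alice@example.com>\n"
    "Reply-To: offers@example.net\n"
    "To: bob@example.org, carol@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

features = extract_header_features(sample)
print(features)
```

Only header fields are read here; the message body is ignored, which matches the header-only focus described in the abstract.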

Pages: 140 to 145

Copyright: Copyright (c) IARIA, 2012

Publication date: January 30, 2012

Published in: conference

ISSN: 2308-3956

ISBN: 978-1-61208-176-2

Location: Valencia, Spain

Dates: from January 30, 2012 to February 4, 2012