FUTURE COMPUTING 2013, The Fifth International Conference on Future Computational Technologies and Applications
Twitter Data Preprocessing for Spam Detection
Authors:
Myungsook Klassen
Keywords: data preprocessing; spam detection; social network; classification.
Abstract:
Detecting Twitter spammer accounts with various machine learning classification algorithms was explored from the perspective of data preprocessing techniques. Data normalization, discretization, and transformation were the preprocessing methods used in our study. Additionally, attribute reduction was performed by computing correlation coefficients among attributes and by other attribute selection methods, with the aim of obtaining high classification rates with classifiers such as Support Vector Machine, Neural Networks, J4.8, and Random Forests. When the top 24 attributes were selected and used for these classifiers, the overall classification rates were close to one another, ranging from 84.30% to 89%. No single subset of attributes performed best; instead, various different sets of attributes played important roles.
Pages: 56 to 61
Copyright: Copyright (c) IARIA, 2013
Publication date: May 27, 2013
Published in: conference
ISSN: 2308-3735
ISBN: 978-1-61208-272-1
Location: Valencia, Spain
Dates: from May 27, 2013 to June 1, 2013
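
The abstract mentions data normalization and attribute reduction based on correlation coefficients among attributes. The following is a minimal sketch of one way such steps could look, not the author's actual procedure: the attribute names, the toy values, the min-max scaling choice, and the 0.9 correlation threshold are all assumptions made for illustration.

```python
from math import sqrt


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Walk attributes in order and keep one only if its absolute correlation
    with every previously kept attribute is at or below the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: attribute names and values are invented for illustration.
data = {
    "followers": [10, 200, 35, 4000, 120],
    "following": [20, 400, 70, 8000, 240],  # exactly twice "followers"
    "tweets_per_day": [5, 1, 30, 2, 12],
}

normalized = {name: min_max_normalize(col) for name, col in data.items()}
selected = drop_correlated(normalized)
print(selected)
```

In this toy example "following" is a scaled copy of "followers", so it is treated as redundant and dropped, while "tweets_per_day" is only weakly correlated with "followers" and is kept. The result depends on the order in which attributes are visited, which is one reason a greedy filter like this would typically be combined with other attribute selection methods.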