Twitter Data Preprocessing for Spam Detection

Klassen, Myungsook

FUTURE COMPUTING 2013, The Fifth International Conference on Future Computational Technologies and Applications

Authors:
Myungsook Klassen

Keywords: data preprocessing; spam detection; social network; classification.

Abstract:
This study explores the detection of Twitter spammer accounts with several machine learning classification algorithms, focusing on the effect of data preprocessing techniques. The preprocessing methods used were data normalization, discretization, and transformation. In addition, attribute reduction was performed, both by computing correlation coefficients among attributes and by other attribute selection methods, to obtain high classification rates with classifiers such as Support Vector Machine, Neural Networks, J4.8, and Random Forests. When the top 24 attributes were selected and used with these classifiers, the overall classification rates were close to one another, ranging from 84.30% to 89%. No single subset of attributes performed best; instead, several different sets of attributes played important roles.
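The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which normalization, discretization, or selection variants were used, so min-max scaling, equal-width binning, and a greedy Pearson-correlation filter are assumed here. The function names, the 0.9 threshold, and the sample attribute names and values are hypothetical.

```python
from math import sqrt


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] (assumed normalization variant)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant attribute carries no spread
    return [(x - lo) / (hi - lo) for x in column]


def equal_width_discretize(column, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0 for _ in column]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((x - lo) / width), bins - 1) for x in column]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant attribute
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Greedy attribute reduction: keep an attribute only if its absolute
    correlation with every attribute kept so far is below `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-account attributes, for illustration only.
accounts = {
    "followers": [1, 2, 3, 4],
    "following": [2, 4, 6, 8],       # perfectly correlated with "followers"
    "tweets_per_day": [4, 1, 3, 2],
}
selected = drop_correlated(accounts)
normalized = {name: min_max_normalize(accounts[name]) for name in selected}
```

In this toy data set, `following` is a linear function of `followers`, so the filter drops it and keeps the other two attributes. The normalized columns could then be passed to any of the classifiers listed above.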

Pages: 56 to 61

Copyright: Copyright (c) IARIA, 2013

Publication date: May 27, 2013

Published in: conference

ISSN: 2308-3735

ISBN: 978-1-61208-272-1

Location: Valencia, Spain

Dates: from May 27, 2013 to June 1, 2013