DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics


Classification of Bots and Gender using Topic Unigrams

Authors:
Astrid Fleig
Lisa Geyersbach
Melissa Göhler
Patricia Kurz
Paul Limburg
Dirk Labudde
Michael Spranger

Keywords: Author Profiling; Bot Detection; Gender Detection; Twitter; Spanish; English

Abstract:
Author profiling plays an important role in social networks such as Twitter. It is of particular interest to distinguish between accounts operated by humans and those operated by bots, and to predict the age and gender of human users. This information can help in analyzing possible manipulation, networks, and crimes. This paper presents an approach that uses tweets to distinguish between bots and humans and to determine the gender of human account holders. For each sub-problem, a linear Support Vector Machine (SVM) was used, and different features and feature sets were tested. The analysis showed that the topic model is the best feature for all categories. For this feature, the term frequencies of the most important terms of the topics were used. Compared with other approaches, this approach improved performance: using this feature alone, accuracies between 99.7% and 100% were reached.
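
The core idea of the abstract, term frequencies of the most important topic terms fed into a linear SVM, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the topic names and terms in TOPIC_TERMS are hard-coded rather than derived from a topic model, the four toy accounts are invented, and the SVM is fitted with a plain subgradient descent on the hinge loss rather than with whatever solver the authors used. Labels are +1 for bot and -1 for human; the gender sub-problem would presumably be handled by a second classifier trained the same way.

```python
from collections import Counter

# Hypothetical "most important terms" per topic. In the paper these come from
# a topic model; here they are hard-coded purely for illustration.
TOPIC_TERMS = {
    "promo": ["free", "click", "offer"],
    "daily": ["coffee", "friends", "tired"],
}
VOCAB = sorted({term for terms in TOPIC_TERMS.values() for term in terms})


def term_frequencies(tweets):
    """Map one account's tweets to relative frequencies of the topic terms."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero for empty accounts
    return [counts[term] / total for term in VOCAB]


def score(w, b, x):
    """Signed decision value of the linear model w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=500, lr=0.1, lam=0.01):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            # The regulariser shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # Hinge loss is active: push xi towards its correct side.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (bot) or -1 (human)."""
    return 1 if score(w, b, x) >= 0 else -1


# Invented toy accounts: (list of tweets, label).
accounts = [
    (["click for free offer", "free offer click now"], 1),
    (["free click free offer today"], 1),
    (["coffee with friends", "so tired today"], -1),
    (["tired need coffee", "friends coming over"], -1),
]
X = [term_frequencies(tweets) for tweets, _ in accounts]
y = [label for _, label in accounts]
w, b = train_linear_svm(X, y)
```

A new account would be classified with predict(w, b, term_frequencies(tweets)). Restricting the vocabulary to topic terms keeps the feature vector short, which is one plausible reason a linear model suffices here, although the paper's full text should be consulted for the authors' actual pipeline.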

Pages: 77 to 81

Copyright: Copyright (c) IARIA, 2021

Publication date: October 3, 2021

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-891-4

Location: Barcelona, Spain

Dates: from October 3, 2021 to October 7, 2021