International Journal On Advances in Intelligent Systems, volume 8, numbers 3 and 4, 2015
On the Potential of Grammar Features for Automated Author Profiling
Authors:
Michael Tschuggnall
Günther Specht
Keywords: Author Profiling; Text Classification; Grammar Trees; Textual Features; Machine Learning
Abstract:
The automatic classification of data has become a major research topic in recent years, and the analysis of text in particular has gained interest due to the availability of huge amounts of online documents. In this paper, a novel style feature based on grammar syntax analysis is presented that can be used to automatically profile authors, i.e., to predict the gender and age of the originator. Using full grammar trees of the sentences of a document, substructures of the trees are extracted by utilizing pq-grams. The most frequently used patterns are then stored in a profile and serve as input features for common machine learning algorithms. An extensive evaluation using a state-of-the-art test set containing several thousand English web blogs investigates the optimal parameter and classifier configuration. Promising results indicate that the proposed feature can be used as a standalone, significant characteristic to automatically predict the gender and age of authors. Finally, further evaluations incorporating other commonly used word-based features, such as the number of stop words, the type-token ratio, or different readability indices, underline the high potential of grammar analysis for automated author profiling.
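The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes grammar trees are already available from some syntactic parser and are represented as nested `(label, children)` tuples. The function names `pq_grams` and `build_profile`, the dummy label `*`, the values `p=2`, `q=3`, and the cutoff `top_k` are all hypothetical choices made for illustration.

```python
from collections import Counter

DUMMY = "*"  # padding label for missing ancestors/children (assumed convention)


def pq_grams(tree, p=2, q=3):
    """Return all pq-grams of a tree given as (label, [children]).

    Each pq-gram is a tuple of p stem labels (a node and its nearest
    ancestors) followed by q base labels (consecutive children).
    """
    grams = []

    def visit(node, ancestors):
        label, children = node
        # Stem: the node plus its p-1 nearest ancestors, padded at the top.
        stem = (ancestors + [label])[-p:]
        stem = [DUMMY] * (p - len(stem)) + stem
        if not children:
            # A leaf contributes one pq-gram with an all-dummy base.
            grams.append(tuple(stem + [DUMMY] * q))
            return
        # Pad the child labels with q-1 dummies on each side and slide
        # a window of width q across them.
        labels = [DUMMY] * (q - 1) + [c[0] for c in children] + [DUMMY] * (q - 1)
        for i in range(len(labels) - q + 1):
            grams.append(tuple(stem + labels[i:i + q]))
        for child in children:
            visit(child, ancestors + [label])

    visit(tree, [])
    return grams


def build_profile(trees, p=2, q=3, top_k=100):
    """Map the top_k most frequent pq-grams to their relative frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(pq_grams(tree, p, q))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.most_common(top_k)}


# Toy grammar tree for a two-constituent sentence: S -> NP VP
toy_tree = ("S", [("NP", []), ("VP", [])])
profile = build_profile([toy_tree])
```

Under these assumptions, the resulting dictionary could be turned into a fixed-length feature vector, one dimension per pq-gram, and passed to a standard classifier.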
Pages: 255 to 265
Copyright: Copyright (c) to authors, 2015. Used with permission.
Publication date: December 30, 2015
Published in: journal
ISSN: 1942-2679