IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management


What Grammar Tells About Gender and Age of Authors

Authors:
Michael Tschuggnall
Günther Specht

Keywords: Author Profiling; Text Classification; Grammar Trees; Machine Learning

Abstract:
The automatic classification of data has become a major research topic in recent years, and the analysis of text in particular has gained interest due to the availability of huge amounts of online documents. In this paper, a novel style feature based on grammar syntax analysis is presented that can be used to automatically profile authors, i.e., to predict the gender and age of the person who wrote a text. Using full grammar trees of the sentences of a document, substructures of the trees are extracted by utilizing pq-grams. The most frequently used patterns are then stored in a profile, which serves as input for common machine learning algorithms. An extensive evaluation using a state-of-the-art test set containing thousands of English web blogs investigates the optimal parameter and classifier configuration. Finally, promising results indicate that the proposed feature can be used as a significant characteristic to automatically predict the gender and age of authors.
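The pq-gram extraction and profile building summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each sentence has already been parsed into a grammar tree, represented here as nested `(label, children)` tuples with the words removed. It uses the classic pq-gram construction with dummy `*` nodes. The values `p=2`, `q=3`, the relative-frequency normalization, and the `top_n` cutoff are illustrative assumptions, and `pq_grams`, `build_profile` and `to_feature_vector` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical tree representation: (label, [children]); a leaf is (label, []).
Tree = Tuple[str, list]
PQGram = Tuple[str, ...]
DUMMY = "*"


def pq_grams(tree: Tree, p: int = 2, q: int = 3) -> List[PQGram]:
    """Return all pq-grams of an ordered, labeled tree.

    Each pq-gram is the anchor node with its p-1 ancestors (the stem) followed
    by q consecutive children (the base), padded with dummy '*' nodes.
    """
    grams: List[PQGram] = []

    def visit(node: Tree, stem: PQGram) -> None:
        label, children = node
        # Shift the current node into the stem; the anchor is the last element.
        stem = stem[1:] + (label,)
        base: PQGram = (DUMMY,) * q
        if not children:
            # A leaf yields exactly one pq-gram with an all-dummy base.
            grams.append(stem + base)
            return
        for child in children:
            # Slide the q-wide window over the children from left to right.
            base = base[1:] + (child[0],)
            grams.append(stem + base)
            visit(child, stem)
        # Pad the right end with q-1 dummy siblings.
        for _ in range(q - 1):
            base = base[1:] + (DUMMY,)
            grams.append(stem + base)

    visit(tree, (DUMMY,) * p)
    return grams


def build_profile(trees: Iterable[Tree], p: int = 2, q: int = 3,
                  top_n: int = 100) -> Dict[PQGram, float]:
    """Count pq-grams over all sentence trees of one document and keep the
    top_n most frequent ones, normalized by the total number of pq-grams."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(pq_grams(tree, p, q))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.most_common(top_n)}


def to_feature_vector(profile: Dict[PQGram, float],
                      vocabulary: List[PQGram]) -> List[float]:
    """Map a profile onto a fixed, ordered list of pq-grams (0.0 if absent)."""
    return [profile.get(gram, 0.0) for gram in vocabulary]


if __name__ == "__main__":
    # Toy grammar tree for a sentence such as "She laughed." (words removed).
    sentence = ("S", [("NP", [("PRP", [])]), ("VP", [("VBD", [])])])
    profile = build_profile([sentence], top_n=5)
    vocabulary = sorted(profile)
    print(to_feature_vector(profile, vocabulary))
```

In practice the vocabulary would be collected from the profiles of the training documents, and the resulting vectors would be passed to a standard classifier. Which classifier and which parameter values work best is exactly what the paper's evaluation investigates, so no particular choice is assumed here.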

Pages: 30 to 35

Copyright: Copyright (c) IARIA, 2014

Publication date: July 20, 2014

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-364-3

Location: Paris, France

Dates: from July 20, 2014 to July 24, 2014