IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management
What Grammar Tells About Gender and Age of Authors
Authors:
Michael Tschuggnall
Günther Specht
Keywords: Author Profiling; Text Classification; Grammar Trees; Machine Learning
Abstract:
The automatic classification of data has become a major research topic in recent years, and the analysis of text in particular has gained interest due to the availability of huge amounts of online documents. In this paper, a novel style feature based on grammar syntax analysis is presented that can be used to automatically profile authors, i.e., to predict the gender and age of a document's originator. Using full grammar trees of the sentences of a document, substructures of the trees are extracted by utilizing pq-grams. The most frequently occurring patterns are then stored in a profile, whose entries serve as input features for common machine learning algorithms. An extensive evaluation using a state-of-the-art test set containing thousands of English web blogs investigates the optimal parameter and classifier configuration. Finally, promising results indicate that the proposed feature can be used as a significant characteristic to automatically predict the gender and age of authors.
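The abstract does not spell out how pq-grams are extracted or how the profile is assembled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the usual pq-gram definition: each pq-gram consists of an anchor node, its p-1 nearest ancestors and q consecutive children, with missing positions filled by a dummy label `*`. The tree encoding as nested `(label, [children])` tuples, the restriction to syntactic labels (words dropped), the defaults p=2 and q=3, the relative-frequency weighting, the `top_k` cutoff, and the helper names `pq_grams`, `build_profile` and `to_features` are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical tree encoding: (label, [children]). Only syntactic labels are
# kept; obtaining the grammar tree from a real parser is out of scope here.
Tree = Tuple[str, list]
Gram = Tuple[str, ...]
DUMMY = "*"  # filler label for missing ancestors and children


def pq_grams(root: Tree, p: int = 2, q: int = 3) -> List[Gram]:
    """Return every pq-gram of `root` as a tuple of p + q labels."""
    grams: List[Gram] = []

    def visit(node: Tree, ancestors: List[str]) -> None:
        label, children = node
        # Stem: the p-1 nearest ancestors followed by the anchor node.
        stem = ancestors[1:] + [label]
        if not children:
            # A leaf yields one pq-gram whose q children are all dummies.
            grams.append(tuple(stem + [DUMMY] * q))
            return
        base = [DUMMY] * q
        for child in children:
            # Slide a window of q consecutive children over the child list.
            base = base[1:] + [child[0]]
            grams.append(tuple(stem + base))
            visit(child, stem)
        # Keep sliding past the last child, padding with dummies.
        for _ in range(q - 1):
            base = base[1:] + [DUMMY]
            grams.append(tuple(stem + base))

    visit(root, [DUMMY] * p)
    return grams


def build_profile(trees: List[Tree], top_k: int = 100,
                  p: int = 2, q: int = 3) -> Dict[Gram, float]:
    """Keep the top_k most frequent pq-grams with their relative frequency."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(pq_grams(tree, p, q))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.most_common(top_k)}


def to_features(profile: Dict[Gram, float], vocabulary: List[Gram]) -> List[float]:
    """Fixed-order feature vector; pq-grams absent from the profile map to 0.0."""
    return [profile.get(gram, 0.0) for gram in vocabulary]


# "I like trees" as a grammar tree: (S (NP PRP) (VP VBP (NP NNS)))
sentence = ("S", [("NP", [("PRP", [])]),
                  ("VP", [("VBP", []), ("NP", [("NNS", [])])])])
grams = pq_grams(sentence)
print(len(grams), grams[0])  # 17 ('*', 'S', '*', '*', 'NP')
```

A node with k children contributes k + q - 1 pq-grams and a leaf contributes one, which gives 17 for the example sentence. To feed a classifier, the vocabulary would typically be fixed across all training documents (for instance, the union of their profiles) so that every author maps to a vector of the same length. The values of p, q and `top_k` stand in for the parameters the evaluation tunes; the numbers used here are placeholders only.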
Pages: 30 to 35
Copyright: Copyright (c) IARIA, 2014
Publication date: July 20, 2014
Published in: conference
ISSN: 2326-9332
ISBN: 978-1-61208-364-3
Location: Paris, France
Dates: from July 20, 2014 to July 24, 2014