International Journal On Advances in Life Sciences, volume 13, numbers 1 and 2, 2021
Authors:
Sofia Gagiatsou
Georgios Markopoulos
George Mikros
Keywords: Author profiling; stylometry; personality prediction; Jung Typology Test; Big Five model; corpus processing; computational stylistics; machine learning.
Abstract:
We present a study on predicting an author's personality by applying natural language processing techniques to essays written in Modern Greek by high-school students. Each writer was profiled through two personality questionnaires, one based on Carl Jung's typology and the other on the Five-Factor Model. Personality prediction is also discussed within the broader research framework of author profiling, by examining how effectively several stylometric features predict students' personality types. The feature set combined word and sentence length, the most frequent part-of-speech tags, the most frequent character and word bigrams and trigrams, the most frequent words, and hapax and dis legomena. Since personality prediction is a complex, multidimensional research problem, after extracting the stylometric features we applied various machine learning algorithms to optimize our model's performance. We compared nine machine learning algorithms and ranked them by cross-validated accuracy. The Naive Bayes algorithm gave the best results in predicting the Jung typology types, whereas the Generalized Linear Model (binomial method) prevailed in predicting the Five-Factor Model personality traits. For the classification based on the Jung Typology Test, prediction accuracy reached 80.7% on Extraversion, 79.9% on Intuition, 68.8% on Feeling, and 75.7% on Judging. For the Big Five classification, prediction accuracy reached 85.9% on Openness, 71.2% on Conscientiousness, 67.6% on Extraversion, 70.2% on Agreeableness, and 65.6% on Neuroticism. These results indicate that the approach is competitive for the personality prediction problem.
Furthermore, our research revealed new combinations of stylometric features and corresponding computational techniques that offer promising solutions to the author personality prediction problem for Modern Greek.
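The stylometric feature types listed in the abstract can be illustrated with a minimal Python sketch, assuming naive regex tokenisation. The function names (`stylometric_features`, `char_ngrams`, `word_ngrams`), the `top_n` cutoff, and the normalisation of hapax and dis legomena counts by vocabulary size are illustrative assumptions, not the authors' pipeline. Part-of-speech tag frequencies are omitted because they would require a Greek tagger.

```python
import re
from collections import Counter

# Naive tokenisers (assumption): Unicode-aware \w+ for words, and sentence
# breaks on . ! ? plus ';' and U+037E, both used as the Greek question mark.
WORD_RE = re.compile(r"\w+")
SENT_RE = re.compile(r"[.!?;\u037e]+")


def char_ngrams(text, n):
    """Count character n-grams over the raw text (spaces and punctuation included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(words, n):
    """Count word n-grams as tuples; they run across sentence boundaries."""
    return Counter(zip(*(words[i:] for i in range(n))))


def stylometric_features(text, top_n=5):
    """Compute a subset of the feature types described in the abstract for one essay."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        raise ValueError("text contains no words")
    # Keep only sentence fragments that contain at least one word.
    sentences = [s for s in SENT_RE.split(text) if WORD_RE.search(s)]
    freq = Counter(words)
    lowered = text.lower()

    return {
        "mean_word_length": sum(map(len, words)) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # Hapax legomena occur exactly once, dis legomena exactly twice;
        # both are divided by the number of distinct word types.
        "hapax_ratio": sum(1 for c in freq.values() if c == 1) / len(freq),
        "dis_ratio": sum(1 for c in freq.values() if c == 2) / len(freq),
        "top_words": freq.most_common(top_n),
        "top_char_bigrams": char_ngrams(lowered, 2).most_common(top_n),
        "top_char_trigrams": char_ngrams(lowered, 3).most_common(top_n),
        "top_word_bigrams": word_ngrams(words, 2).most_common(top_n),
        "top_word_trigrams": word_ngrams(words, 3).most_common(top_n),
    }


if __name__ == "__main__":
    print(stylometric_features("the cat sat. the cat ran!", top_n=2))
```

In a setup like the one described, each essay's features would be flattened into a numeric vector and passed to the classifiers being compared by cross-validated accuracy; how the authors vectorised the most-frequent-item lists is not stated here.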
Pages: 124 to 133
Copyright: Copyright (c) to authors, 2021. Used with permission.
Publication date: December 31, 2021
Published in: journal
ISSN: 1942-2660