Taking Advantage of Turkish Characteristic Features to Tackle with Authorship Attribution Problems for Turkish

Saygılı, Neslihan Şirin; Acarman, Tankut; Amghar, Tassadit; Levrat, Bernard

Conference: ICCGI 2016, The Eleventh International Multi-Conference on Computing in the Global Information Technology

Keywords: authorship attribution; Turkish language; stylometry; n-gram; gerunds; Support Vector Machines

Abstract:
The rapid increase in the number of electronic and online texts, such as e-mails, online newspapers and magazines, blog posts, and online forum messages, has accelerated research on authorship attribution. Although studies are less abundant than for English, a considerable body of work on author identification in Turkish has appeared over the last fifteen years. This paper has two parts. The first part is a brief review of Turkish authorship attribution studies, focused on the stylometric features that enable authors to be distinguished from one another. In the second part, we analyze the main characteristics of the Turkish language and describe our first experiments on Turkish corpora. In these experiments, we test different kinds of n-grams and word structures, taking advantage of characteristic features of Turkish, notably the frequent use of gerunds, and use Support Vector Machines as the learning algorithm.
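
To make the kinds of features named in the abstract concrete, the following is a minimal sketch of character n-gram counting and a naive gerund-suffix ratio. It is not the authors' implementation: the suffix list, the function names, and the suffix-matching heuristic are all assumptions made for illustration, and the abstract does not specify the actual feature set. Per-text feature values of this kind could then be passed to a Support Vector Machine classifier, which is not shown here.

```python
from collections import Counter

# Hypothetical, non-exhaustive list of Turkish gerund (converb) suffixes,
# chosen only for illustration; the paper's real feature set is not given.
GERUND_SUFFIXES = ("arak", "erek", "ınca", "ince", "unca", "ünce",
                   "madan", "meden", "ıp", "ip", "up", "üp", "ken")


def turkish_lower(text):
    """Lower-case text using the Turkish dotted/dotless I convention."""
    # str.lower() maps "I" to "i"; Turkish needs "I" -> "ı" and "İ" -> "i".
    return text.replace("I", "ı").replace("İ", "i").lower()


def char_ngrams(text, n):
    """Count overlapping character n-grams in the lower-cased text."""
    text = turkish_lower(text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def gerund_suffix_ratio(text):
    """Share of tokens ending in a listed suffix (naive string match).

    This heuristic does no morphological analysis, so it will produce
    false positives (for example "erken", which ends in "ken").
    """
    tokens = [t.strip(".,;:!?\"'()") for t in turkish_lower(text).split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens
               if len(t) > 3 and t.endswith(GERUND_SUFFIXES))
    return hits / len(tokens)
```

For the sentence "Koşarak geldi ve gülerek oturdu.", `gerund_suffix_ratio` counts two of the five tokens ("koşarak", "gülerek") and returns 0.4. A real system would more likely rely on a morphological analyzer than on raw suffix matching.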

Pages: 26 to 29

Copyright: Copyright (c) IARIA, 2016

Publication date: November 13, 2016

Published in: conference

ISSN: 2308-4529

ISBN: 978-1-61208-513-5

Location: Barcelona, Spain

Dates: from November 13, 2016 to November 17, 2016