DATA ANALYTICS 2013, The Second International Conference on Data Analytics
Content-based Recommender System for Textual Documents Written in Croatian
Authors:
Ivana Ćavar
Zvonko Kavran
Natalija Jolić
Neven Anđelović
Ivan Cvitić
Marko Gović
Keywords: recommender system; k-nearest neighbour; content-based classification; document-term matrix
Abstract:
The paper describes a content-based recommender system that classifies textual documents written in Croatian. We describe how documents are pre-processed, including dimensionality reduction, selection of stop-words and creation of the document-term matrix. For text classification, v-fold cross-validation is combined with the k-nearest neighbours (kNN) method: candidate values of k are first evaluated by v-fold cross-validation, and the results are then used to select the ‘optimal’ value of k. Results are presented as a classification error analysis.
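
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the whitespace tokenizer, the cosine distance, the fold assignment by index, and all function names (`document_term_matrix`, `knn_predict`, `cv_error`, `select_k`) are assumptions made for illustration, and the sketch omits the dimensionality reduction step the paper mentions.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"i", "u", "je", "na", "se", "za"}


def tokenize(text):
    """Lower-case, split on whitespace, and drop stop-words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def cosine_distance(a, b):
    """1 - cosine similarity; assumed metric, defined as 1.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to the query row."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cv_error(x, y, k, v):
    """Misclassification rate of kNN pooled over v folds (fold = index mod v)."""
    errors = 0
    for fold in range(v):
        train = [i for i in range(len(x)) if i % v != fold]
        test = [i for i in range(len(x)) if i % v == fold]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_x, train_y, x[i], k) != y[i]:
                errors += 1
    return errors / len(x)


def select_k(x, y, candidates, v):
    """Pick the candidate k with the lowest cross-validated error.

    Ties are broken in favour of the smaller k.
    """
    errors = {k: cv_error(x, y, k, v) for k in candidates}
    best = min(candidates, key=lambda k: (errors[k], k))
    return best, errors
```

In use, `document_term_matrix` would be applied to the labelled documents, `select_k` would be run on the resulting rows with a list of candidate k values and a fold count v, and the per-k error dictionary it returns corresponds to the kind of classification error analysis the abstract refers to.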
Pages: 25 to 29
Copyright: Copyright (c) IARIA, 2013
Publication date: September 29, 2013
Published in: conference
ISSN: 2308-4464
ISBN: 978-1-61208-295-0
Location: Porto, Portugal
Dates: from September 29, 2013 to October 3, 2013