DATA ANALYTICS 2013, The Second International Conference on Data Analytics


Content-based Recommender System for Textual Documents Written in Croatian

Authors:
Ivana Ćavar
Zvonko Kavran
Natalija Jolić
Neven Anđelović
Ivan Cvitić
Marko Gović

Keywords: recommender system; k-nearest neighbour; content-based classification; document-term matrix

Abstract:
The paper describes a content-based recommender system that classifies textual documents written in Croatian. We describe how documents are pre-processed, including dimensionality reduction, stop-word selection, and creation of the document-term matrix. For text classification, v-fold cross-validation is combined with the k-nearest neighbours (kNN) method: the ‘optimal’ value of k is analyzed first, and the results of v-fold cross-validation are then used to select k. Results are presented as a classification error analysis.
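
The pipeline named in the abstract (stop-word removal, a document-term matrix, kNN classification, and v-fold cross-validation to choose k) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the stop-word list, the toy documents and labels, the cosine similarity measure, the fold assignment, and all function names.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's actual Croatian list is not given here.
STOP_WORDS = {"i", "u", "je", "na", "se", "za"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop-words."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def cosine(a, b):
    """Cosine similarity of two count vectors (assumed measure, 0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows most similar to the query."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine(train_x[i], query), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def cv_error(x, y, k, v):
    """Overall misclassification rate of kNN under v-fold cross-validation.

    Document i is assigned to fold i % v (an assumption of this sketch).
    """
    errors = 0
    for fold in range(v):
        train = [i for i in range(len(x)) if i % v != fold]
        test = [i for i in range(len(x)) if i % v == fold]
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        errors += sum(knn_predict(tx, ty, x[i], k) != y[i] for i in test)
    return errors / len(x)

def select_k(x, y, candidates, v):
    """Pick the candidate k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(x, y, k, v))

# Toy data invented for this sketch; not from the paper's corpus.
docs = [
    "Nogomet je sport", "Košarka i nogomet", "Sport na stadionu nogomet",
    "Računalo i program", "Program za računalo", "Softver program računalo",
]
labels = ["sport", "sport", "sport", "it", "it", "it"]

vocab, dtm = document_term_matrix(docs)
best_k = select_k(dtm, labels, candidates=[1, 3], v=3)
print("vocabulary size:", len(vocab))
print("selected k:", best_k)
```

On a real corpus the candidate values of k and the number of folds v would be chosen to suit the data size, and the per-k error rates returned by `cv_error` are the kind of quantity a classification error analysis would report.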

Pages: 25 to 29

Copyright: Copyright (c) IARIA, 2013

Publication date: September 29, 2013

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-295-0

Location: Porto, Portugal

Dates: from September 29, 2013 to October 3, 2013