International Journal On Advances in Internet Technology, volume 14, numbers 1 and 2, 2021
Authors:
Corné de Ruijt
Sandjai Bhulai
Keywords: Click models; Session clustering; HDBSCAN*
Abstract:
In this paper, we propose a click simulation model capable of simulating users' interactions with a search engine, in particular in the presence of user censoring. We illustrate the simulation model by applying it to the problem of detecting unique users from the session data of a search engine. In real click datasets, the user initiating the session may be censored, as unique users are often determined by their cookies. Therefore, analyzing this problem using a click simulation model, for which we have an uncensored ground truth, allows for studying the effect of cookie churn itself. Furthermore, it allows for studying how well clustering algorithms perform in detecting clusters of sessions that originate from a single user. To cluster sessions, we present and compare various constrained DBSCAN*-type clustering algorithms. From this comparison, we find that although the best DBSCAN*-type algorithm significantly outperformed the other benchmark clustering methods, its clusters were considerably worse than the observed cookie clusters. This result holds under different simulation scenarios, though the results do improve when the user signal is strengthened. While clustering algorithms may be useful to detect similar users for purposes such as user clustering, cookie tracking remains the preferred method for tracking individual users.
Pages: 1 to 13
Copyright: Copyright (c) to authors, 2021. Used with permission.
Publication date: December 31, 2021
Published in: journal
ISSN: 1942-2652