DATA ANALYTICS 2020, The Ninth International Conference on Data Analytics


Detecting Users from Website Sessions: A Simulation Study

Authors:
Corné de Ruijt
Sandjai Bhulai

Keywords: Click models; Session clustering; HDBSCAN*

Abstract:
In real click data sets, the user initiating a web session may be censored, as unique users are commonly determined by cookies. One way to study the effect of this censoring on various website metrics, and to study the effectiveness of algorithms trying to undo this censoring, is by simulation. We therefore propose a click simulation model that can simulate user censoring due to cookie churn or the use of multiple devices, while retaining the uncensored ground truth. To recover unique users from session data, we compare several (H)DBSCAN*-type (Hierarchical Density-based Spatial Clustering of Applications with Noise) algorithms, where we assume that all sessions in a cluster likely originate from the same user. From this comparison, we find that although the best (H)DBSCAN*-type algorithm significantly outperforms other benchmark clustering methods, it performs considerably worse than using the observed cookie clusters. That is, for websites where the assumptions of our simulation model hold, our results suggest that uncovering users from their session data with clustering algorithms may lead to considerably larger errors in user-related website metrics than using cookies to uncover users.
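The core assumption above, that all sessions in a density-based cluster likely originate from the same user, can be illustrated with a minimal sketch. It uses plain DBSCAN (a simpler, non-hierarchical relative of HDBSCAN*, not the variants compared in the paper), implemented in pure Python. The two-dimensional session feature vectors, the `eps` and `min_pts` values, and the rule that each noise session counts as its own user are all hypothetical assumptions for illustration, not the paper's setup.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for sessions that belong to no dense cluster


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: one cluster label per point, NOISE (-1) for noise points."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


def estimate_users(labels: Sequence[int]) -> int:
    """Hypothetical rule: each cluster is one user, each noise session its own user."""
    clusters = {label for label in labels if label != NOISE}
    noise_sessions = sum(1 for label in labels if label == NOISE)
    return len(clusters) + noise_sessions


# Hypothetical session feature vectors (e.g., normalized behavioral features).
sessions = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.19),  # sessions of one user
    (0.80, 0.75), (0.82, 0.78), (0.79, 0.77),  # sessions of another user
    (0.50, 0.05),                              # an isolated session
]

labels = dbscan(sessions, eps=0.05, min_pts=2)
print(labels)                  # [0, 0, 0, 1, 1, 1, -1]
print(estimate_users(labels))  # 3
```

In a simulation study, the estimated user count (or any user-level metric derived from the clusters) would be compared against the uncensored ground truth and against the count implied by cookie IDs.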

Pages: 35 to 40

Copyright: Copyright (c) IARIA, 2020

Publication date: October 25, 2020

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-816-7

Location: Nice, France

Dates: from October 25, 2020 to October 29, 2020