COGNITIVE 2015, The Seventh International Conference on Advanced Cognitive Technologies and Applications
On the Generation of Privatized Synthetic Data Using Distance Transforms
Authors:
Kato Mivule
Keywords: Privatized synthetic data generation; Data privacy; Distance transforms; k-means clustering
Abstract:
Organizations have an interest in research collaborations that involve sharing data with peers. However, such partnerships often carry confidentiality risks, including insider attacks and untrustworthy collaborators who might leak sensitive information. To mitigate these data-sharing vulnerabilities, entities share privatized data from which sensitive information has been redacted. While such data sets might offer some assurance of privacy, maintaining the statistical traits of the original data is often problematic, leading to poor data usability. Therefore, this paper presents a confidential synthetic data generation heuristic that combines data privacy and distance transform techniques. The heuristic is used to generate privatized numeric synthetic data while preserving the statistical traits of the original data. Empirical results are presented from applying unsupervised learning, using k-means, to test the usability of the privatized synthetic data set. Preliminary results from this implementation show that it might be possible to generate privatized synthetic data sets with the same statistical morphological structure as the original, using data privacy and distance transform methods.
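The usability test described in the abstract (clustering the original and the privatized data with k-means and comparing the results) can be illustrated with a minimal sketch. This is not the paper's heuristic: the abstract does not specify the distance transform step, so additive Gaussian noise is used here as a stand-in privatization, and the toy data, the function names (`kmeans`, `privatize`, `centroid_shift`), and the parameter `sigma` are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


def privatize(points, sigma, seed=1):
    """Stand-in privatization (assumption): additive Gaussian noise.

    The paper's actual heuristic combines data privacy and distance
    transform techniques; that step is not reproduced here.
    """
    rng = random.Random(seed)
    return [tuple(x + rng.gauss(0, sigma) for x in p) for p in points]


def centroid_shift(a, b):
    """Largest distance between centroids paired by sorted order."""
    return max(math.dist(x, y) for x, y in zip(a, b))


# Hypothetical toy data: two well-separated numeric groups.
data_rng = random.Random(42)
original = (
    [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
    + [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(100)]
)
synthetic = privatize(original, sigma=0.5)

# A small shift suggests the cluster structure survived privatization.
shift = centroid_shift(kmeans(original, 2), kmeans(synthetic, 2))
print(f"largest centroid shift: {shift:.3f}")
```

Pairing centroids by sorted order is only reasonable for well-separated clusters like the toy data above; a real comparison would need a proper matching between the two sets of centroids.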
Pages: 156 to 161
Copyright: Copyright (c) IARIA, 2015
Publication date: March 22, 2015
Published in: conference
ISSN: 2308-4197
ISBN: 978-1-61208-390-2
Location: Nice, France
Dates: from March 22, 2015 to March 27, 2015