

K-Means on the Graphics Processor: Design And Experimental Analysis

Authors:
Mario Zechner
Michael Granitzer

Keywords: Parallelization, GPGPU, K-Means

Abstract:
Apart from algorithmic improvements, many computationally intensive machine learning algorithms can gain performance through parallelization. Programmable graphics processing units (GPUs) offer a highly data-parallel architecture that is suitable for many computational tasks in machine learning. We present an optimized k-means implementation on the graphics processing unit. NVIDIA’s Compute Unified Device Architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment. Emphasis is placed on optimizations directly targeted at this architecture to best exploit the available computational capabilities. Additionally, drawbacks and limitations of previous related work, e.g. maximum instance, dimension and centroid counts, are addressed. The algorithm is realized in a hybrid manner, parallelizing distance calculations on the GPU while sequentially updating cluster centroids on the CPU based on the results of the GPU calculations. An empirical performance study on synthetic data is given, demonstrating a maximum 14x speedup over a fully SIMD-optimized CPU implementation. We present detailed empirical data on the runtime behavior of the various stages of the implementation, identify bottlenecks, and investigate potential discrepancies arising from different rounding modes on the GPU and CPU. We extend our previous work in [1] by giving a more in-depth description of CUDA as well as including previously omitted experimental data.
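To illustrate the hybrid scheme the abstract describes, the following is a minimal CUDA sketch of the GPU-side step: one thread per data point computes squared Euclidean distances to all centroids and records the index of the nearest one, while the centroid update would be performed afterwards on the CPU from these labels. This is not the authors' implementation; the kernel name, parameter names, and data layout (row-major points and centroids) are illustrative assumptions.

```cuda
// Illustrative CUDA kernel: assign each data point to its nearest centroid.
// points:    n x d matrix, row-major
// centroids: k x d matrix, row-major
// labels:    output array of length n (index of nearest centroid per point)
__global__ void assignLabels(const float *points,
                             const float *centroids,
                             int *labels,
                             int n, int d, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // guard against threads beyond the data size

    int best = 0;
    float bestDist = 3.402823466e38f;  // FLT_MAX

    // Compute squared Euclidean distance to every centroid,
    // keeping track of the closest one.
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (int j = 0; j < d; ++j) {
            float diff = points[i * d + j] - centroids[c * d + j];
            dist += diff * diff;
        }
        if (dist < bestDist) {
            bestDist = dist;
            best = c;
        }
    }
    labels[i] = best;
}
```

On the host, the labels would be copied back and used to recompute centroid means sequentially on the CPU, which is the hybrid division of work the paper investigates.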

Pages: 224 to 235

Copyright: Copyright (c) to authors, 2009. Used with permission.

Publication date: December 1, 2009

Published in: International Journal On Advances in Systems and Measurements, volume 2, numbers 2 and 3, 2009

ISSN: 1942-261X