Home // ALLDATA 2023, The Ninth International Conference on Big Data, Small Data, Linked Data and Open Data // View article


On Factorizing Million Scale Non-Negative Matrices using Compressed Structures

Authors:
Sudhindra Gopal Krishna
Aditya Narasimhan
Sridhar Radhakrishnan
Chandra N Sekharan

Keywords: Compression, Matrix Operations, Matrix Factorization.

Abstract:
Non-negative Matrix Factorization (NMF) is one of the algorithms with a wide range of applications, from dimensionality reduction and computer vision to text mining. The dimensions of these matrices can be of the order of several hundreds of thousands to millions, which is a raw format that would not fit in the main memory. Additionally, while performing matrix factorization on these extremely large matrices, the algorithms involving matrix operations such as transpose, multiplication, and subtraction; demand more storage for intermediate resultant matrices. In this paper, we store the matrices in compressed structures (Compressed Binary Tree CBT and Compressed Sparse Row CSR) that allow factorization without decompression. We also perform factorization CBT without using any intermediate structures by performing a virtual transpose and streaming the intermediate resultant matrices of a sequence of matrix multiplications directly into the compressed structure for every iteration. As an example, for an input matrix A of dimension 65,536 x 65,536 with 1.46M number of non-zero elements, the peak storage in any iteration of the multiplicative update factorization algorithm is 32.98GB when using a 2D array, 200MB when using CSR and 14.8MB for CBT. The ability to stream (add and delete) into the CBT structure without reallocation is why CBT performs the best. Furthermore, we provide a heuristic to reduce memory usage that also aids in faster convergence.

Pages: 39 to 44

Copyright: Copyright (c) IARIA, 2023

Publication date: April 24, 2023

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-68558-041-4

Location: Venice, Italy

Dates: from April 24, 2023 to April 28, 2023