Home // ALLDATA 2023, The Ninth International Conference on Big Data, Small Data, Linked Data and Open Data // View article
On Factorizing Million Scale Non-Negative Matrices using Compressed Structures
Authors:
Sudhindra Gopal Krishna
Aditya Narasimhan
Sridhar Radhakrishnan
Chandra N Sekharan
Keywords: Compression, Matrix Operations, Matrix Factorization.
Abstract:
Non-negative Matrix Factorization (NMF) is one of the algorithms with a wide range of applications, from dimensionality reduction and computer vision to text mining. The dimensions of these matrices can be of the order of several hundreds of thousands to millions, which is a raw format that would not fit in the main memory. Additionally, while performing matrix factorization on these extremely large matrices, the algorithms involving matrix operations such as transpose, multiplication, and subtraction; demand more storage for intermediate resultant matrices. In this paper, we store the matrices in compressed structures (Compressed Binary Tree CBT and Compressed Sparse Row CSR) that allow factorization without decompression. We also perform factorization CBT without using any intermediate structures by performing a virtual transpose and streaming the intermediate resultant matrices of a sequence of matrix multiplications directly into the compressed structure for every iteration. As an example, for an input matrix A of dimension 65,536 x 65,536 with 1.46M number of non-zero elements, the peak storage in any iteration of the multiplicative update factorization algorithm is 32.98GB when using a 2D array, 200MB when using CSR and 14.8MB for CBT. The ability to stream (add and delete) into the CBT structure without reallocation is why CBT performs the best. Furthermore, we provide a heuristic to reduce memory usage that also aids in faster convergence.
Pages: 39 to 44
Copyright: Copyright (c) IARIA, 2023
Publication date: April 24, 2023
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-68558-041-4
Location: Venice, Italy
Dates: from April 24, 2023 to April 28, 2023