Home // International Journal On Advances in Software, volume 10, numbers 3 and 4, 2017 // View article
WaveletTrie: a Compressed Self-Indexed Data Structure Supporting Efficient Database Operations
Authors:
Stefan Böttcher
Rita Hartel
Jonas Manuel
Keywords: Column-oriented database management systems; compression; compressed indexed sequences of strings
Abstract:
An indexed compressed sequence of strings is a representation of a string sequence that on the one hand is more space efficient than the original sequence, but on the other hand supports more efficient operations on these strings. Whenever an indexed compressed sequence of strings supports efficient evaluation of typical database operations, like searching for exact matches or prefixes, range queries, computing the union or the intersection of sets of strings, or data modifications, databases can strongly benefit from storing their table columns in form of an indexed compressed sequence. In this paper, we show how to extend the data structure of the Wavelet Trie to an indexed compressed sequence of strings that supports efficient operations on column oriented databases. Therefore, we present algorithms for executing database operations like union, intersection, and range-queries on string se¬quences represented by an extended Wavelet Trie. Further¬more, in our evaluation, we show that performing these typical database operations on the extended Wavelet Trie is faster than simulating these operations on gzip- or bzip2-compressed data.
Pages: 539 to 552
Copyright: Copyright (c) to authors, 2017. Used with permission.
Publication date: December 31, 2017
Published in: journal
ISSN: 1942-2628