BIOTECHNO 2013, The Fifth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies


Compression-based Algorithms for Comparing Fragmented Genomic Sequences

Authors:
Ramez Mina
Dhundy Bastola
Hesham Ali

Keywords: compression algorithms; Kolmogorov complexity; Lempel-Ziv complexity; tree path difference; next generation sequencing

Abstract:
Sequence comparison is a fundamental tool in bioinformatics research because it helps to distinguish one sequence from another in terms of structure and function. Typically, methods such as global or local alignment are the preferred tools for measuring the distance between sequence samples. Although these methods are often suitable for differentiation work, they can give erroneous results when the sequence data include sequencing errors, gaps, repeats, and translocations, all of which interfere with alignment. Next-generation sequence assembly tasks produce an enormous number of contigs and rely on alignment technologies to correctly place adjacent contigs together in the final sequence. If these alignment methods are confused by interruptions (i.e., fragmentation, gaps, mismatches, or other blemishes) in the sequence data, the assembly task may not succeed. We therefore suggest that sequence comparison can be performed successfully using alignment-free technologies and sequence compression methods, which are less sensitive to the inherent faults of sequencing tasks. In this paper, we evaluate different compression complexities and describe the use of compression algorithms for comparing biological sequence data. We analyze algorithm performance using protein sequence data and mitochondrial genomes with differing levels of interruption. The mitochondrial genome is a small dataset, but it is well studied and is suitable for describing the effectiveness of Lempel-Ziv complexity, Kolmogorov complexity using Lempel-Ziv-Welch, and Kolmogorov complexity using Huffman coding. We conclude our study by showing that sequence comparison via compression techniques is largely successful and could be a major help to high-throughput next-generation sequencing projects.
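
Illustrative sketch (not taken from the paper): the abstract does not give the exact complexity measures or distance formula used, so the Python example below only shows the general idea of an alignment-free, compression-based comparison. It assumes an LZ78-style phrase count as a stand-in for Lempel-Ziv complexity and one common normalization of the complexity difference; the function names `lz78_phrase_count` and `lz_distance` are hypothetical.

```python
def lz78_phrase_count(seq: str) -> int:
    """Count phrases in an LZ78-style parsing of seq.

    Assumption: this phrase count stands in for the Lempel-Ziv
    complexity named in the abstract; the paper may define it differently.
    """
    seen = set()
    phrase = ""
    count = 0
    for ch in seq:
        phrase += ch
        if phrase not in seen:
            # Shortest prefix not seen before: close the phrase.
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        # A trailing, already-seen phrase still counts as one phrase.
        count += 1
    return count


def lz_distance(s: str, t: str) -> float:
    """Normalized complexity-based distance between two sequences.

    Assumption: uses max(c(st) - c(s), c(ts) - c(t)) / max(c(s), c(t)),
    one common normalization, not necessarily the one used in the paper.
    """
    cs, ct = lz78_phrase_count(s), lz78_phrase_count(t)
    denom = max(cs, ct)
    if denom == 0:
        return 0.0  # both sequences empty
    cst = lz78_phrase_count(s + t)
    cts = lz78_phrase_count(t + s)
    return max(cst - cs, cts - ct) / denom


if __name__ == "__main__":
    a = "ACGTACGTTGCA"
    b = "ACGTACCTTGCA"  # toy sequence with a single substitution
    print(lz_distance(a, b))
```

Because the measure depends only on how many new phrases one sequence adds when appended to the other, it needs no alignment step. On very short inputs the values are coarse (identical short strings do not score zero), so a sketch like this is only meaningful on sequences of realistic length.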

Pages: 15 to 22

Copyright: Copyright (c) IARIA, 2013

Publication date: March 24, 2013

Published in: conference

ISSN: 2308-4383

ISBN: 978-1-61208-260-8

Location: Lisbon, Portugal

Dates: from March 24, 2013 to March 29, 2013