BIOTECHNO 2014, The Sixth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies


Identification of Short Motifs for Comparing Biological Sequences and Incomplete Genomes

Authors:
Ramez Mina
Hesham Ali

Keywords: sequence comparison; alignment; biological motifs; alignment-free; k-mers; restriction enzymes; coding sequences; phylogenetic trees

Abstract:
Sequence comparison remains one of the main computational tools in bioinformatics research. It is an essential starting point for addressing many problems in bioinformatics, including those associated with the recognition and classification of organisms. Although sequence alignment provides a well-studied approach for comparing sequences, it is well documented that alignment fails to solve several instances of the sequence comparison problem, particularly for sequences that contain errors or that represent incomplete genomes. In this work, we propose an approach to identify the relatedness among species based on whether their sequences contain similar short sequences or signals. We cluster species based on biological signals, such as restriction enzyme recognition sites or short sequences that occur in coding regions, and use random signals as a baseline for comparison. We focus on identifying the k-mers (motifs) that produce the best results with this approach. The results show that specific k-mers with biological significance, such as restriction enzyme recognition sites, produce excellent results. They also make it possible to obtain good comparisons from shorter or incomplete sequences, a critical property for comparing genomes obtained from next-generation sequencers.
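
The alignment-free idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the motif set (three well-known restriction enzyme recognition sites), the per-1000-base normalization, the Euclidean distance, and the toy sequences are all assumptions chosen for illustration. Each sequence is reduced to a vector of motif frequencies, and pairwise distances between those vectors could then feed a clustering or tree-building step.

```python
from itertools import combinations
from math import sqrt

# Recognition sites of three well-known restriction enzymes.
# Illustrative choice only; not the motif set used in the paper.
MOTIFS = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_profile(sequence, motifs=MOTIFS):
    """Return motif counts per 1000 bases (assumed normalization),
    so that sequences of different lengths remain comparable."""
    sequence = sequence.upper()
    scale = 1000.0 / max(len(sequence), 1)
    return [count_overlapping(sequence, m) * scale for m in motifs.values()]


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(sequences):
    """Map each unordered pair of names to the distance between profiles."""
    profiles = {name: motif_profile(seq) for name, seq in sequences.items()}
    return {
        (a, b): euclidean(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy sequences, not real genomes.
toy = {
    "species_A": "GAATTCAAGGATCCTTGAATTC",
    "species_B": "GAATTCAAGGATCCTTGAATTCGG",
    "species_C": "AAGCTTAAGCTTCCAAGCTTAA",
}
distances = distance_matrix(toy)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
```

Because each profile depends only on motif frequencies, a fragment of a genome can still yield a usable profile, which is consistent with the abstract's point about shorter or incomplete sequences. The distance measure and any later clustering method are design choices this sketch leaves open.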

Pages: 76 to 83

Copyright: Copyright (c) IARIA, 2014

Publication date: April 20, 2014

Published in: conference

ISSN: 2308-4383

ISBN: 978-1-61208-335-3

Location: Chamonix, France

Dates: from April 20, 2014 to April 24, 2014