BIOTECHNO 2014, The Sixth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies
Identification of Short Motifs for Comparing Biological Sequences and Incomplete Genomes
Authors:
Ramez Mina
Hesham Ali
Keywords: sequence comparison; alignment; biological motifs; alignment-free; k-mers; restriction enzymes; coding sequences; phylogenetic trees
Abstract:
Sequence comparison remains one of the main computational tools in bioinformatics research. It is an essential starting point for addressing many problems in bioinformatics, including problems associated with the recognition and classification of organisms. Although sequence alignment provides a well-studied approach for comparing sequences, it is well documented that alignment fails to solve several instances of the sequence comparison problem, particularly for sequences that contain errors or that represent incomplete genomes. In this work, we propose an approach to identify the relatedness among species based on whether their sequences contain similar short sequences or signals. We cluster species based on biological signals, such as restriction enzymes or short sequences that occur in coding regions, as well as random signals for baseline comparison. We focus on identifying the k-mers (motifs) that produce the best results using this approach. The results show that specific k-mers with biological significance, such as restriction enzymes, produce excellent results. They also make it possible to obtain good comparisons while using shorter or incomplete sequences, which is a critical property for comparing genomes obtained from next-generation sequencers.
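The idea of comparing species by shared short signals can be illustrated with a minimal sketch. The abstract does not state which distance measure or clustering method the authors used, so the Jaccard distance on motif presence or absence is an assumption made here for illustration only. The function names, the toy sequences, and the choice of three restriction enzyme recognition sites (EcoRI, BamHI, HindIII) as motifs are likewise hypothetical.

```python
from itertools import combinations


def motif_profile(sequence, motifs):
    """Return the set of motifs that occur at least once in the sequence."""
    seq = sequence.upper()
    return {m for m in motifs if m in seq}


def jaccard_distance(a, b):
    """Jaccard distance between two motif sets (0.0 = identical, 1.0 = disjoint)."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(sequences, motifs):
    """Compute the distance for every unordered pair of named sequences."""
    profiles = {name: motif_profile(seq, motifs) for name, seq in sequences.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical motifs: recognition sites of EcoRI, BamHI, and HindIII.
MOTIFS = ["GAATTC", "GGATCC", "AAGCTT"]

# Hypothetical toy sequences, not data from the paper.
SEQUENCES = {
    "species_a": "TTGAATTCAAGGATCCTT",
    "species_b": "CCGAATTCGGGGATCCAA",
    "species_c": "ACAAGCTTGTACGTACGT",
}

distances = pairwise_distances(SEQUENCES, MOTIFS)
for pair, d in sorted(distances.items()):
    print(pair, d)
```

A distance table of this kind could then be passed to any standard clustering or tree-building method. Because only the presence of short motifs is checked, the comparison needs no alignment and still works on fragments, which is the property the abstract highlights for incomplete genomes.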
Pages: 76 to 83
Copyright: Copyright (c) IARIA, 2014
Publication date: April 20, 2014
Published in: conference
ISSN: 2308-4383
ISBN: 978-1-61208-335-3
Location: Chamonix, France
Dates: from April 20, 2014 to April 24, 2014