An Efficient Algorithm for Read Matching in DNA Databases

Chen, Yangjun; Wu, Yujia; Xie, Jiuyong

DBKDA 2016, The Eighth International Conference on Advances in Databases, Knowledge, and Data Applications

Authors:
Yangjun Chen
Yujia Wu
Jiuyong Xie

Keywords: string matching; DNA sequences; tries; BWT-transformation

Abstract:
In this paper, we discuss an efficient and effective index mechanism to support the matching of massive numbers of reads (short DNA strings) in DNA databases, a task that is central to next-generation sequencing in biological research. The main idea is to construct a trie over all the reads and to search the trie against a BWT-array L created for a genome sequence s, so that all occurrences of every read in s are located all at once. In addition, we replace single-character checking against L with multiple-character checking, by which multiple searches of L are reduced to a single scan of L. In this way, high efficiency can be achieved. Experiments show that our method is promising.
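
The trie-against-BWT idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a naively built suffix array, a trie over the reversed reads (so that the standard BWT backward search can walk each trie path), and non-empty reads. It also omits the multiple-character checking described above. All names (`build_index`, `build_trie`, `match_reads`, `TrieNode`) are hypothetical.

```python
from collections import Counter


class TrieNode:
    """Node of a trie over reversed reads (illustrative helper)."""
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.read_ids = []   # indices of reads that end at this node


def build_index(s):
    """Naive suffix array and BWT tables for s + '$' (toy sizes only)."""
    t = s + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bwt = [t[i - 1] for i in sa]  # character preceding each sorted suffix
    counts = Counter(t)
    # first[c]: number of characters in t strictly smaller than c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(t) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def build_trie(reads):
    """Insert each read reversed, so reads with a common suffix share a path."""
    root = TrieNode()
    for idx, read in enumerate(reads):
        node = root
        for ch in reversed(read):
            node = node.children.setdefault(ch, TrieNode())
        node.read_ids.append(idx)
    return root


def match_reads(s, reads):
    """Return {read index: sorted start positions in s} for non-empty reads."""
    sa, first, occ = build_index(s)
    root = build_trie(reads)
    hits = {i: [] for i in range(len(reads))}
    # Each stack entry is a trie node plus its suffix-array interval [lo, hi).
    stack = [(root, 0, len(s) + 1)]
    while stack:
        node, lo, hi = stack.pop()
        for rid in node.read_ids:
            hits[rid] = sorted(sa[lo:hi])
        for ch, child in node.children.items():
            if ch not in first:
                continue  # character never occurs in s
            new_lo = first[ch] + occ[ch][lo]
            new_hi = first[ch] + occ[ch][hi]
            if new_lo < new_hi:  # descend only while the interval is non-empty
                stack.append((child, new_lo, new_hi))
    return hits


if __name__ == "__main__":
    print(match_reads("ACAGACA", ["ACA", "CA", "AG", "GT"]))
```

In this sketch, the reads "ACA" and "CA" share the trie path for their common suffix "CA", so the backward-search steps for that suffix are performed once rather than once per read.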

Pages: 23 to 34

Copyright: Copyright (c) IARIA, 2016

Publication date: June 26, 2016

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-61208-486-2

Location: Lisbon, Portugal

Dates: from June 26, 2016 to June 30, 2016