MMEDIA 2012, The Fourth International Conferences on Advances in Multimedia
A Database of Artificial Urdu Text in Video Images with Semi-Automatic Text Line Labeling Scheme
Authors:
Imran Siddiqi
Ahsen Raza
Keywords: Data Set; Artificial Urdu Text; Text Detection; Text Localization.
Abstract:
This paper describes a novel database of video images containing artificial (superimposed) Urdu text, together with a semi-automatic text line labeling scheme. The main objective of this study is to provide the community with a standard data set and an auto-labeling scheme for the development and evaluation of algorithms for textual-content-based indexing and retrieval systems. We focus specifically on Urdu text, which has gained increasing research interest in recent years. The data set comprises 1000 video images collected from 19 different channels in 5 different categories. An attempt is made to capture the maximum possible variation in the text in terms of size, location, appearance and background. The data set is completely labeled with the bounding rectangle of each text occurrence, which facilitates the evaluation of text detection and localization systems. Based on our previous work on text localization, an automatic text labeling scheme is also proposed, and the results obtained are compared with manual labeling. Ground truth data supporting tasks such as text recognition and word spotting will be considered in the next version of the data set.
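As an illustration of how an automatically produced bounding rectangle might be compared with a manually labeled one, the following minimal Python sketch computes the intersection-over-union of two rectangles. The abstract does not state which comparison measure the authors use, so intersection-over-union is an assumption here, and the `Box` layout, the function name and the coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding rectangle in pixels (hypothetical layout)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def intersection_over_union(a: Box, b: Box) -> float:
    """Return the overlap ratio of two rectangles, from 0.0 to 1.0."""
    # Edges of the overlapping region, if there is one.
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)

    # A negative extent means the rectangles do not overlap.
    inter = max(0, right - left) * max(0, bottom - top)
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


# Made-up coordinates for one text line, not values from the data set.
manual = Box(x=40, y=300, width=200, height=30)
automatic = Box(x=50, y=305, width=200, height=30)
score = intersection_over_union(manual, automatic)
```

A score near 1.0 would indicate that the automatic label closely matches the manual one, while 0.0 would indicate no overlap at all.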
Pages: 75 to 81
Copyright: Copyright (c) IARIA, 2012
Publication date: April 29, 2012
Published in: conference
ISSN: 2308-4448
ISBN: 978-1-61208-195-3
Location: Chamonix, France
Dates: from April 29, 2012 to May 4, 2012