MMEDIA 2012, The Fourth International Conferences on Advances in Multimedia


A Database of Artificial Urdu Text in Video Images with Semi-Automatic Text Line Labeling Scheme

Authors:
Imran Siddiqi
Ahsen Raza

Keywords: Data Set; Artificial Urdu Text; Text Detection; Text Localization.

Abstract:
This paper describes a novel database of video images containing artificial (superimposed) Urdu text, together with a semi-automatic text line labeling scheme. The main objective of this study is to provide the community with a standard data set and an auto-labeling scheme for the development and evaluation of algorithms for textual-content-based indexing and retrieval systems. We focus specifically on Urdu text, which has attracted growing research interest in recent years. The data set comprises 1000 video images collected from 19 different channels in 5 different categories. An attempt is made to capture the maximum possible variation in the text in terms of size, location, appearance and background. The data set is completely labeled with the bounding rectangle of each text occurrence, facilitating the evaluation of text detection and localization systems. Based on our previous work on text localization, an automatic text labeling scheme is also proposed, and the results it produces are compared with manual labeling. Ground truth data supporting tasks such as text recognition and word spotting will be considered in the next version of the data set.
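
As an illustration of how automatically produced bounding rectangles might be compared with manual labels, the following is a minimal sketch using intersection over union (IoU). The abstract does not state which comparison measure the authors use, so IoU is an assumption here, as are the Box layout (x, y, width, height in pixels) and the function names.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding rectangle in pixels (hypothetical label layout)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def intersection_over_union(a: Box, b: Box) -> float:
    """Return the overlap area divided by the combined area of two boxes."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0, right - left) * max(0, bottom - top)
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def best_overlaps(manual: List[Box], automatic: List[Box]) -> List[float]:
    """For each manual box, return its highest IoU with any automatic box."""
    return [
        max((intersection_over_union(m, a) for a in automatic), default=0.0)
        for m in manual
    ]


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the data set.
    manual_labels = [Box(0, 0, 10, 10)]
    auto_labels = [Box(5, 0, 10, 10), Box(40, 40, 5, 5)]
    print(best_overlaps(manual_labels, auto_labels))
```

A manual text line could then be counted as correctly labeled when its best overlap exceeds a chosen threshold; the threshold value is likewise an assumption and would need to follow the evaluation protocol of the paper.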

Pages: 75 to 81

Copyright: Copyright (c) IARIA, 2012

Publication date: April 29, 2012

Published in: conference

ISSN: 2308-4448

ISBN: 978-1-61208-195-3

Location: Chamonix, France

Dates: from April 29, 2012 to May 4, 2012