AISyS 2025, The Second International Conference on AI-based Systems and Services
Authors:
Alaeddin Türkmen
Barış Bayram
Ahmet Çay
Zehra Hafızoğlu Gökdağ
Keywords: OCR; Automated Document Processing; Image Pre-processing
Abstract:
Extracting expiry dates from official documents is a critical task in numerous administrative and compliance workflows. Traditionally performed manually, this process is time-consuming, error-prone, and costly at scale. In this study, we present a comparative evaluation of multiple optical character recognition (OCR) engines combined with a diverse set of image preprocessing techniques to automate expiry date extraction from scanned and photographed documents, including insurance policies, identity cards, licenses, and inspection reports. A dataset of manually annotated Portable Document Format (PDF) and Joint Photographic Experts Group (JPEG) files was used for benchmarking. Each image was processed using various transformations. The extracted text was parsed using comprehensive regular expression patterns to identify date candidates, from which the latest valid date was selected as the predicted expiry. Our findings indicate that SuryaOCR, particularly when applied to unprocessed raw images, consistently outperformed other configurations, substantially reducing the need for manual intervention.
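The parsing step described in the abstract (regular expressions to find date candidates, then the latest valid date taken as the expiry) might look roughly like the sketch below. It is illustrative only: the abstract does not give the paper's actual patterns, so the two numeric patterns, the day-first reading of ambiguous dates, and the names `find_date_candidates` and `predict_expiry` are all assumptions.

```python
import re
from datetime import date

# Hypothetical patterns; the paper's real regular expressions are not
# given in the abstract. Only numeric day-month-year and year-month-day
# forms with '.', '/' or '-' separators are covered here.
NUMERIC_DMY = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")
NUMERIC_YMD = re.compile(r"\b(\d{4})[./-](\d{1,2})[./-](\d{1,2})\b")


def _safe_date(year, month, day):
    """Return a date, or None if the numbers are not a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def find_date_candidates(text):
    """Collect every valid calendar date matched in OCR output text."""
    candidates = []
    for m in NUMERIC_DMY.finditer(text):
        day, month, year = map(int, m.groups())  # assumes day-first order
        candidates.append(_safe_date(year, month, day))
    for m in NUMERIC_YMD.finditer(text):
        year, month, day = map(int, m.groups())
        candidates.append(_safe_date(year, month, day))
    return [c for c in candidates if c is not None]


def predict_expiry(text):
    """Pick the latest valid candidate as the predicted expiry date."""
    candidates = find_date_candidates(text)
    return max(candidates) if candidates else None


if __name__ == "__main__":
    sample = "Policy issued 01.02.2024, valid until 31/01/2025. Ref 45.13.2024"
    print(predict_expiry(sample))
```

In this sketch, strings that look like dates but are not valid calendar dates (such as `45.13.2024`) are discarded before the maximum is taken, which corresponds to the "latest valid date" rule in the abstract.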
Pages: 46 to 51
Copyright: Copyright (c) IARIA, 2025
Publication date: September 28, 2025
Published in: conference
ISBN: 978-1-68558-303-3
Location: Lisbon, Portugal
Dates: from September 28, 2025 to October 2, 2025