AISyS 2025, The Second International Conference on AI-based Systems and Services


A Comparative Study on Automated Expiry Date Extraction from Official Documents Using OCR and Image Preprocessing

Authors:
Alaeddin Türkmen
Barış Bayram
Ahmet Çay
Zehra Hafızoğlu Gökdağ

Keywords: OCR; Automated Document Processing; Image Pre-processing

Abstract:
Extracting expiry dates from official documents is a critical task in numerous administrative and compliance workflows. Traditionally performed manually, this process is time-consuming, error-prone, and costly at scale. In this study, we present a comparative evaluation of multiple optical character recognition (OCR) engines combined with a diverse set of image preprocessing techniques to automate expiry date extraction from scanned and photographed documents, including insurance policies, identity cards, licenses, and inspection reports. A dataset of manually annotated Portable Document Format (PDF) and Joint Photographic Experts Group (JPEG) files was used for benchmarking. Each image was processed using various transformations. Extracted texts were parsed using comprehensive regular expression patterns to identify date candidates, from which the latest valid date was selected as the predicted expiry. Our findings indicate that SuryaOCR, particularly when applied to unprocessed raw images, consistently outperformed other configurations, substantially reducing the need for manual intervention.
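The parsing step described in the abstract (regular expressions find date candidates, and the latest valid one is taken as the expiry) can be illustrated with a minimal Python sketch. The two patterns, the day-first reading of numeric dates, and the function names `find_date_candidates` and `predict_expiry` are assumptions made for illustration; the abstract does not give the paper's actual expressions, which are described as more comprehensive.

```python
import re
from datetime import date

# Hypothetical patterns for illustration only; the paper's own
# regular expressions are not listed in the abstract.
# Assumption: numeric dates such as 28.02.2025 are read day-first.
NUMERIC_DMY = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")
NUMERIC_YMD = re.compile(r"\b(\d{4})[./-](\d{1,2})[./-](\d{1,2})\b")


def find_date_candidates(text):
    """Return every calendar-valid date found in the OCR text."""
    candidates = []
    for day, month, year in NUMERIC_DMY.findall(text):
        try:
            candidates.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Impossible dates such as 31/02/2024 are discarded.
            pass
    for year, month, day in NUMERIC_YMD.findall(text):
        try:
            candidates.append(date(int(year), int(month), int(day)))
        except ValueError:
            pass
    return candidates


def predict_expiry(text):
    """Pick the latest valid candidate as the predicted expiry date."""
    candidates = find_date_candidates(text)
    return max(candidates) if candidates else None
```

For example, calling `predict_expiry("Issued 01.03.2023, valid until 28.02.2025")` returns `date(2025, 2, 28)`, and text without any recognizable date yields `None`, which a pipeline could route to manual review.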

Pages: 46 to 51

Copyright: Copyright (c) IARIA, 2025

Publication date: September 28, 2025

Published in: conference

ISBN: 978-1-68558-303-3

Location: Lisbon, Portugal

Dates: from September 28, 2025 to October 2, 2025