ACHI 2025, The Eighteenth International Conference on Advances in Computer-Human Interactions


Improving Continuous Japanese Fingerspelling Recognition with Transformers: A Comparative Study against CNN-LSTM Hybrids

Authors:
Akihisa Shitara
Yuhki Shiraishi

Keywords: Deaf and hard of hearing; Sign language; Sensor glove; Recognition.

Abstract:
To achieve smooth communication between d/Deaf and hard of hearing (d/DHH) people and hearing people, we have developed a continuous Japanese fingerspelling (JF) recognition system using sensor gloves and deep learning. We selected a lightweight, inexpensive sensor glove suited to daily use of the system. Our prior system used a machine learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM); although its average micro F-measure over the 76 JF characters was 92.1%, its average macro F-measure was only 64.7%. Two problems cause this gap: the difficulty of distinguishing between static and dynamic fingerspellings, and the decreased recognition rate due to the large number of “ϕ” instances (the transition movements between characters). Therefore, we conducted a quantitative evaluation, using the combined CNN-LSTM model as a baseline, to verify whether a Transformer Encoder could improve JF recognition rates. As a result, for the 76 JF characters, the average micro and macro F-measures were 93.8% (0.2) and 77.4% (1.0), respectively.
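
The gap between micro and macro F-measures reported in the abstract is typical of imbalanced label sets. The sketch below is illustrative only and is not the authors' evaluation code: the function name `f_measures`, the toy labels, and the frame counts are all invented for this example, and whether the paper includes “ϕ” in its averages is not stated here. It shows how a dominant class (standing in for “ϕ”) can keep the micro F-measure high while poorly recognized rare classes pull the macro F-measure down.

```python
from collections import Counter


def f_measures(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: 90 transition frames ("phi") and 5 frames each
# of two fingerspelled characters ("a", "ki").
y_true = ["phi"] * 90 + ["a"] * 5 + ["ki"] * 5
# A model that labels every "phi" frame correctly but recognizes only
# one frame of each character, calling the rest "phi".
y_pred = ["phi"] * 90 + ["a"] + ["phi"] * 4 + ["ki"] + ["phi"] * 4

micro, macro = f_measures(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.920 macro=0.541
```

In this toy case the frequent class dominates the pooled counts, so the micro F-measure stays at 0.92, whereas each rare character contributes an equally weighted per-class score of about 0.33 to the macro average.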

Pages: 13 to 19

Copyright: Copyright (c) IARIA, 2025

Publication date: May 18, 2025

Published in: conference

ISSN: 2308-4138

ISBN: 978-1-68558-268-5

Location: Nice, France

Dates: from May 18, 2025 to May 22, 2025