International Journal On Advances in Systems and Measurements, volume 17, numbers 3 and 4, 2024


Caption Generation for Clothing Image Pair Comparison Using Attribute Prediction and Prompt-based Visual Language Model

Authors:
Soichiro Yokoyama
Kohei Abe
Tomohisa Yamashita
Hidenori Kawamura

Keywords: consumer support; information provision; clothing caption generation; clothing attribute estimation; visual language model.

Abstract:
Detailed information for product comparison is necessary in consumers' purchasing process, especially during the information search and choice evaluation phases. However, conventional product descriptions, which are the primary source of information, tend to focus only on the product in question and thus do not adequately express the differences between products. This study treats garments as the target products and assesses the content required to compare items from clothing comparison articles in lifestyle magazines. Two methods are proposed for generating captions that compare a pair of garment items. The first method generates captions for each item separately and selects a caption pair that expresses their differences. The second uses a Visual Language Model with a prompt designed based on the assessment. Subject experiments confirmed that the proposed Visual Language Model method accurately represented the feature differences between garments and provided helpful information for consumers comparing garments.

Pages: 111 to 126

Copyright: Copyright (c) to authors, 2024. Used with permission.

Publication date: December 30, 2024

Published in: journal

ISSN: 1942-261X