ADVCOMP 2024, The Eighteenth International Conference on Advanced Engineering Computing and Applications in Sciences


Automating Benchmarking Process for Multimodal Large Language Models (MLLMs) in the Context of Waste Disposal

Authors:
Sundus Hammoud
Robert Werner

Keywords: Large Language Model; Multimodal Large Language Model; Benchmarking; LLM-as-a-judge.

Abstract:
Multimodal Large Language Models (MLLMs) are systems that can process both text and non-text input from the user. They can also be prompted with instructions that influence their behavior. These capabilities make them excellent candidates for waste disposal classification. However, because these models are trained on general knowledge while local recycling rules vary across regions, they fail to answer even simple questions about recycling. In addition, language models tend to respond with long, detailed text, which makes it daunting for a human to read through thousands of lines of output while benchmarking such models to evaluate their answers. We propose an approach that automates the benchmarking process in the context of waste disposal and minimizes human intervention by introducing a Large Language Model (LLM) to evaluate the answers of another LLM. We also leverage prompting strategies both for this evaluation and to resolve the problem of region-specific recycling rules. We achieved promising results and sped up the benchmarking process significantly, saving researchers hours of manual evaluation.
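The abstract's LLM-as-a-judge setup could be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the prompt template, rubric wording, and all function names (`build_judge_prompt`, `parse_verdict`) are hypothetical. The key idea it illustrates is embedding the region-specific recycling rules directly in the judge prompt, and reducing the judge's free-text reply to a machine-readable verdict.

```python
# Hypothetical sketch of an LLM-as-a-judge evaluation step; the paper does
# not publish its prompts or code, so every name and template here is an
# assumption chosen to illustrate the idea.

JUDGE_TEMPLATE = """You are grading a waste-disposal answer.
Region-specific rules:
{rules}

Question: {question}
Reference answer: {reference}
Model answer: {candidate}

Reply with exactly one word: CORRECT or INCORRECT."""


def build_judge_prompt(rules: str, question: str,
                       reference: str, candidate: str) -> str:
    """Embed the local recycling rules in the judge prompt, so the
    evaluating LLM grades against regional rules rather than the
    general knowledge it was trained on."""
    return JUDGE_TEMPLATE.format(rules=rules, question=question,
                                 reference=reference, candidate=candidate)


def parse_verdict(judge_reply: str) -> bool:
    """Map the judge LLM's reply to a boolean, tolerating trailing
    punctuation or an appended justification."""
    first_word = judge_reply.strip().split()[0].upper().strip(".,!")
    return first_word == "CORRECT"
```

In use, the prompt returned by `build_judge_prompt` would be sent to the judge model through its chat API, `parse_verdict` applied to the reply, and the resulting booleans aggregated over a labeled question set into an accuracy score, replacing manual reading of the answers.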

Pages: 26 to 31

Copyright: Copyright (c) IARIA, 2024

Publication date: September 29, 2024

Published in: conference

ISSN: 2308-4499

ISBN: 978-1-68558-184-8

Location: Venice, Italy

Dates: from September 29, 2024 to October 3, 2024