ADVCOMP 2024, The Eighteenth International Conference on Advanced Engineering Computing and Applications in Sciences
Authors:
Sundus Hammoud
Robert Werner
Keywords: Large Language Model; Multimodal Large Language Model; Benchmarking; LLM-as-a-judge
Abstract:
Multimodal Large Language Models (MLLMs) are systems that can process both text and non-text user input. They can also be prompted with instructions that influence their behavior. These capabilities make them excellent candidates for waste disposal classification. However, these models are trained on general knowledge, and because local recycling rules vary across regions, they fail to answer even simple questions about recycling. In addition, language models tend to produce long, detailed responses, which makes it daunting for a human to read through thousands of lines of text when benchmarking such models to evaluate their answers. We propose an approach that automates the benchmarking process in the context of waste disposal and minimizes human intervention by introducing a Large Language Model (LLM) to evaluate the answers of another LLM. We also leverage prompting strategies to achieve this and to resolve the problem of region-specific recycling rules. We achieved promising results and sped up the benchmarking process significantly, saving researchers hours of manual evaluation.
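For readers unfamiliar with the LLM-as-a-judge pattern the abstract refers to, the following is a minimal sketch of such an automated benchmarking loop. It assumes the OpenAI Python client; the model names, the region-rule system prompt, and the one-word CORRECT/INCORRECT verdict format are illustrative assumptions rather than the authors' actual configuration.

```python
# Minimal sketch of an LLM-as-a-judge benchmarking loop, as described in the
# abstract. Model names, prompt wording, and the CORRECT/INCORRECT verdict
# format are illustrative assumptions, not the authors' actual setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Region-specific recycling rules are injected via the system prompt so the
# candidate model answers according to local regulations (hypothetical rules).
REGION_RULES = (
    "You are a waste disposal assistant for Venice, Italy. "
    "Glass goes in the green bin; organic waste goes in the brown bin."
)

def ask_candidate(question: str) -> str:
    """Query the candidate (M)LLM being benchmarked."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical candidate model
        messages=[
            {"role": "system", "content": REGION_RULES},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

def judge(question: str, reference: str, answer: str) -> bool:
    """Use a second LLM to grade the candidate's answer against a reference."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        "Does the candidate answer agree with the reference? "
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical judge model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("CORRECT")

# Score a small benchmark without any manual reading of model output.
benchmark = [("Which bin does a glass bottle go in?", "The green bin.")]
score = sum(judge(q, ref, ask_candidate(q)) for q, ref in benchmark)
print(f"{score}/{len(benchmark)} answers judged correct")
```

In practice the judge prompt and verdict parsing would be tuned to the dataset, but the loop above captures the core idea: replacing hours of manual evaluation with an automated pass in which one model grades another.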
Pages: 26 to 31
Copyright: Copyright (c) IARIA, 2024
Publication date: September 29, 2024
Published in: conference
ISSN: 2308-4499
ISBN: 978-1-68558-184-8
Location: Venice, Italy
Dates: from September 29, 2024 to October 3, 2024