Practical Applications of State-Of-The-Art Large Language Models to Solve Real-World Software Engineering Problems Autonomously

Mikhalevich, Yurij

CONTENT 2025, The Seventeenth International Conference on Creative Content Technologies

Authors:
Yurij Mikhalevich

Keywords: code generation; large language models; AI agents; natural language processing

Abstract:
This paper investigates how well state-of-the-art large language models can autonomously solve real-world software engineering problems from problem descriptions written for humans. We selected 10 outstanding GitHub issues of varying difficulty from the Aibyss project and tasked an AI agent with solving each one autonomously, based solely on the GitHub issue description intended for human software engineers. We compared the following large language models: Claude Sonnet 3.7, DeepSeek-V3, DeepSeek-R1, and o3-mini-high, each driven by the Aider agent. Additionally, we evaluated the Claude Code agent as one of the best closed-source AI software engineering agents. The best performance was achieved by Claude Sonnet 3.7 with reasoning enabled, both with the Aider agent and with the Claude Code agent: each configuration provided working solutions to 5 of the 10 GitHub issues. We analyze the agents’ behaviors, including reasoning steps, common failure modes, and the impact of reasoning tokens. The results highlight both the promise and the current limitations of autonomous LLM-based software engineering.

Pages: 13 to 19

Copyright: Copyright (c) IARIA, 2025

Publication date: April 6, 2025

Published in: conference

ISSN: 2308-4162

ISBN: 978-1-68558-262-3

Location: Valencia, Spain

Dates: from April 6, 2025 to April 10, 2025