Offline Reinforcement Learning Agents for Adaptive Reactive Power Control with Renewable Energy Sources

Bhatt, Tejashri; Balduin, Stephan; Veith, Eric MSP

Conference: ENERGY 2025, The Fifteenth International Conference on Smart Grids, Green Communications and IT Energy-aware Technologies

Authors:
Tejashri Bhatt
Stephan Balduin
Eric MSP Veith

Keywords: Smart Grid Management; Reactive Power Control; Artificial Intelligence; Soft Actor-Critic; Behavioral Cloning from Observation; Renewable Energy Integration; Offline Reinforcement Learning.

Abstract:
Conventional reactive power control is typically performed by operators through coordinated switching of power electronic devices. This task is becoming increasingly complex as the integration of renewable energy sources, such as rooftop photovoltaic systems and wind turbines, expands. Maintaining grid stability is critical to ensuring energy supply without risking equipment damage. In this context, Reinforcement Learning (RL) agents for reactive power control can assist operators by suggesting actions, though final decisions remain with the operator. High-performing RL algorithms are essential for this, as they learn complex actions through trial and error and allow what is learned to be transferred adaptably to the real world. While established algorithms, such as Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), Twin-Delayed DDPG and Proximal Policy Optimization (PPO), offer solutions, each has limitations. Training RL agents directly on real-world power grids is impractical because of safety-critical concerns, which calls for an alternative approach. SAC offers benefits in continuous action spaces, such as improved exploration and reuse of past experience, but suffers from long training times. This paper addresses the issue by shortening SAC training through integration of the Behavioral Cloning from Observation (BCO) algorithm. This approach enhances performance by initializing SAC with a high-performing, pre-trained Artificial Neural Network (ANN) rather than a random policy, providing a strong starting point while preserving the benefits of SAC.
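
The warm-start idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional linear system, the matrices `B` and `K_EXPERT`, the helper names, and the dictionary standing in for a SAC actor are all assumptions made for illustration, and no actual SAC training or power-grid simulation is performed. The sketch follows the general BCO recipe (fit an inverse dynamics model on the agent's own transitions, infer the expert's actions from observation-only demonstrations, clone the resulting policy) and then uses the cloned weights as the actor's starting point instead of random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system, not taken from the paper: s' = s + a @ B.T
B = np.array([[1.0, 0.2], [0.0, 0.5]])
# Hypothetical expert policy: a = s @ K_EXPERT.T
K_EXPERT = np.array([[-0.5, 0.1], [0.0, -0.3]])


def step(states, actions):
    """Advance the toy system by one step for a batch of states."""
    return states + actions @ B.T


def fit_linear(inputs, targets):
    """Least-squares fit of targets ~ inputs @ weights."""
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights


# 1) The agent gathers its own transitions using random actions.
own_s = rng.normal(size=(500, 2))
own_a = rng.normal(size=(500, 2))
own_next = step(own_s, own_a)

# 2) Inverse dynamics model: predict the action from (state, next state).
w_inverse = fit_linear(np.hstack([own_s, own_next]), own_a)

# 3) Expert demonstrations contain observations only, no actions.
exp_s = rng.normal(size=(500, 2))
exp_next = step(exp_s, exp_s @ K_EXPERT.T)

# 4) Infer the expert's actions, then clone the policy from them.
inferred_a = np.hstack([exp_s, exp_next]) @ w_inverse
w_cloned = fit_linear(exp_s, inferred_a)

# 5) Warm start: a stand-in for a SAC actor (Gaussian mean weights plus
#    log standard deviation) initialized from the clone versus at random.
warm_actor = {"mean_weights": w_cloned.copy(), "log_std": np.full(2, -1.0)}
random_actor = {"mean_weights": rng.normal(size=(2, 2)), "log_std": np.full(2, -1.0)}


def imitation_error(actor, states):
    """Mean squared gap between the actor's mean action and the expert's."""
    expert_actions = states @ K_EXPERT.T
    return float(np.mean((states @ actor["mean_weights"] - expert_actions) ** 2))


test_s = rng.normal(size=(200, 2))
warm_error = imitation_error(warm_actor, test_s)
random_error = imitation_error(random_actor, test_s)
print(f"warm-start error: {warm_error:.3e}, random-init error: {random_error:.3e}")
```

In this toy setting the cloned actor already reproduces the expert before any RL update, which is the "strong starting point" the abstract refers to. In a real setup the cloned network would be loaded into the SAC actor and then refined by the usual SAC updates.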

Pages: 59 to 67

Copyright: Copyright (c) IARIA, 2025

Publication date: March 9, 2025

Published in: conference

ISSN: 2308-412X

ISBN: 978-1-68558-242-5

Location: Lisbon, Portugal

Dates: from March 9, 2025 to March 13, 2025