International Journal On Advances in Software, volume 16, numbers 3 and 4, 2023


Comparative Analysis of Small Data Acquisition Strategies in Machine Learning Regression Tasks Addressing Potential Uncertainties

Authors:
Xukuan Xu
Felix Conrad
Xingyu Xing
Oskar Loeprecht
Michael Moeckel

Keywords: Small-data; Process uncertainty; Design Of Experiments; Machine learning; Model-based sampling; Auto-sklearn.

Abstract:
As Machine Learning (ML) algorithms mature, the bottleneck in applying ML to engineering, in particular to process analysis, monitoring, and control, is often the limited availability of suitable data and the cost of data acquisition. In many ML projects, datasets are collected independently of the subsequent analysis. In laboratory-based development, data acquisition and the coverage of possible process uncertainties make it challenging to prepare datasets suitable for ML. This paper benchmarks existing design of experiments (DOE) strategies on data generated by a simulation model and discusses their aptitude for training accurate ML regression models. Eleven representative sampling strategies are investigated to provide guidance for data collection under data acquisition constraints, including consideration of possible measurement uncertainties. Because the optimal DOE depends on the available data volume and the uncertainty level, recommendations for DOE selection are given.
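
The kind of benchmark the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy function `simulate` stands in for the simulation model, Gaussian noise stands in for measurement uncertainty, a plain k-nearest-neighbour regressor stands in for the ML model, and only two sampling strategies (uniform random and Latin hypercube) are shown, without any claim that they are among the eleven strategies studied.

```python
import math
import random


def simulate(x, noise_sd, rng):
    """Hypothetical stand-in for a process simulation with noisy measurements."""
    x1, x2 = x
    return math.sin(3.0 * x1) + x2 ** 2 + rng.gauss(0.0, noise_sd)


def random_design(n, dim, rng):
    """Uniform random sampling in the unit hypercube."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def latin_hypercube(n, dim, rng):
    """Latin hypercube: each of the n strata is hit exactly once per dimension."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [list(point) for point in zip(*columns)]


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training points closest to x."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], x))
    nearest = sorted(range(len(train_x)), key=sq_dist)[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def benchmark(design_fn, n_train, noise_sd, seed, n_test=200):
    """RMSE against the noise-free simulation for one design and noise level."""
    rng = random.Random(seed)
    train_x = design_fn(n_train, 2, rng)
    train_y = [simulate(x, noise_sd, rng) for x in train_x]
    test_x = random_design(n_test, 2, rng)
    sq_errors = [
        (knn_predict(train_x, train_y, x) - simulate(x, 0.0, rng)) ** 2
        for x in test_x
    ]
    return math.sqrt(sum(sq_errors) / n_test)


if __name__ == "__main__":
    # Vary the data volume and the uncertainty level, as the abstract suggests.
    for n_train in (10, 40):
        for noise_sd in (0.0, 0.2):
            for name, fn in (("random", random_design), ("lhs", latin_hypercube)):
                rmse = benchmark(fn, n_train, noise_sd, seed=0)
                print(f"n={n_train:3d} noise={noise_sd:.1f} {name:6s} RMSE={rmse:.3f}")
```

A single seed says little about which design is preferable; a meaningful comparison would average the error over many seeds for each combination of training size and noise level.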

Pages: 243 to 253

Copyright: Copyright (c) to authors, 2023. Used with permission.

Publication date: December 30, 2023

Published in: journal

ISSN: 1942-2628