International Journal On Advances in Software, volume 16, numbers 3 and 4, 2023
Authors:
Xukuan Xu
Felix Conrad
Xingyu Xing
Oskar Loeprecht
Michael Moeckel
Keywords: Small-data; Process uncertainty; Design Of Experiments; Machine learning; Model-based sampling; Auto-sklearn.
Abstract:
As algorithms mature, the bottleneck in applying Machine Learning (ML) to engineering, in particular to process analysis, monitoring, and control, is often the limited availability of suitable data and the cost of data acquisition. In many ML projects, datasets have been collected independently of the subsequent analysis. In laboratory-based development, data acquisition and the coverage of possible process uncertainties make it challenging to prepare datasets suitable for ML. This paper benchmarks existing design of experiments (DOE) strategies on data generated by a simulation model and discusses their suitability for training accurate ML regression models. Eleven representative sampling strategies are investigated to provide guidance for data collection under data acquisition constraints, including consideration of possible measurement uncertainties. Because the optimal DOE depends on the available data volume and the uncertainty level, recommendations for DOE selection are given.
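As a rough illustration of what a sampling strategy under measurement uncertainty can look like, the following minimal sketch generates a small space-filling design and evaluates a noisy simulator on it. The abstract does not name the eleven strategies, so Latin hypercube sampling is used here only as a common example and may or may not be among those benchmarked. The function `simulate`, its quadratic response, and the noise level are hypothetical stand-ins, not the authors' simulation model.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # Draw one random value inside each of the n_samples equal-width strata.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def simulate(x, noise_sd, rng):
    """Hypothetical process model: quadratic response plus Gaussian measurement noise."""
    clean = sum((xi - 0.5) ** 2 for xi in x)
    return clean + rng.gauss(0.0, noise_sd)


rng = random.Random(0)                      # fixed seed for reproducibility
design = latin_hypercube(8, 2, rng)         # 8 experiments, 2 process parameters
data = [(x, simulate(x, noise_sd=0.05, rng=rng)) for x in design]
```

The resulting `data` list pairs each design point with a noisy response and could serve as a small training set for a regression model. Varying `n_samples` and `noise_sd` mimics the data-volume and uncertainty constraints the abstract refers to.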
Pages: 243 to 253
Copyright: Copyright (c) to authors, 2023. Used with permission.
Publication date: December 30, 2023
Published in: journal
ISSN: 1942-2628