Faster in Time and Better in Randomness Algorithms for Matching Subjects with Multiple Controls

Chang, Hung-Jui; Hsu, Yu-Hsuan; Hsueh, Chih-Wen; Pan, Mei-Lien; Tsao, Hsiao-Mei; Wang, Da-Wei; Hsu, Tsan-sheng

International Journal On Advances in Software, volume 12, numbers 3 and 4, 2019

Authors:
Hung-Jui Chang
Yu-Hsuan Hsu
Chih-Wen Hsueh
Mei-Lien Pan
Hsiao-Mei Tsao
Da-Wei Wang
Tsan-sheng Hsu

Keywords: matching; observational study; relative entropy.

Abstract:
In the era of learning healthcare systems and big data, observational studies play a vital role in discovering hidden (causal) associations within a dataset. To reduce bias in these observational studies, a matching step is usually adopted to randomly match each case subject with one or more control candidates. A high-quality matching algorithm, RandFlow, is proposed and compared with the commonly used Simple Match, Matchit, and Optmatch algorithms. The comparison covers the execution time, the memory usage, the successful matching rate, the statistical variation of the relative risk, and the randomness of the matchings produced by the different algorithms. RandFlow ran at least 30 times faster than the commonly used methods, with at least a 66% reduction in memory usage. The variation of the relative risk computed by RandFlow was usually smaller than that computed by Simple Match. Simple Match had varying relative entropy, ranging from 0.2 to 0.95, while RandFlow almost uniformly had relative entropy close to 1. RandFlow could find a matching so long as the maximum matching ratio was not reached. To obtain more reliable study results, a two-phase matching procedure is proposed: the first phase identifies the maximum matching ratio, and the second phase performs the matching multiple times and takes an average.
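
The two-phase procedure described in the abstract can be illustrated with a minimal Python sketch. This is not RandFlow: the abstract does not give the algorithm, so the sketch substitutes a plain augmenting-path bipartite matching with a shuffled search order. The function names (`random_match`, `max_ratio`, `averaged_statistic`), the `eligible` and `outcome` dictionaries, and the mean-outcome statistic (standing in for the relative risk) are all assumptions made for illustration.

```python
import random


def random_match(eligible, k, rng):
    """Give every case k distinct controls, using each control at most once.

    eligible maps a case id to the control ids it may be matched with
    (hypothetical input format). Returns {case: [controls]}, or None if no
    such matching exists. Stand-in technique: Kuhn's augmenting paths.
    """
    # One "slot" per (case, copy); k slots per case must all be filled.
    slots = [(case, i) for case in eligible for i in range(k)]
    rng.shuffle(slots)
    owner = {}  # control id -> slot currently holding that control

    def augment(slot, seen):
        candidates = list(eligible[slot[0]])
        rng.shuffle(candidates)  # randomize which control is tried first
        for control in candidates:
            if control in seen:
                continue
            seen.add(control)
            # Take a free control, or re-route its current owner elsewhere.
            if control not in owner or augment(owner[control], seen):
                owner[control] = slot
                return True
        return False

    for slot in slots:
        if not augment(slot, set()):
            return None  # some slot cannot be filled at this ratio
    matching = {case: [] for case in eligible}
    for control, (case, _) in owner.items():
        matching[case].append(control)
    return matching


def max_ratio(eligible, rng):
    """Phase 1: largest k for which every case can receive k controls."""
    if not eligible:
        return 0
    k = 0
    while random_match(eligible, k + 1, rng) is not None:
        k += 1
    return k


def averaged_statistic(eligible, outcome, k, runs, seed=0):
    """Phase 2: repeat the random matching and average a statistic.

    The statistic here is the mean outcome among matched controls, an
    assumed placeholder for the relative risk used in the paper.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        matching = random_match(eligible, k, rng)
        if matching is None:
            raise ValueError("ratio k is not achievable")
        chosen = [c for controls in matching.values() for c in controls]
        values.append(sum(outcome[c] for c in chosen) / len(chosen))
    return sum(values) / len(values)


# Toy data (invented for this sketch): three cases, six controls.
eligible = {
    "s1": ["c1", "c2", "c3"],
    "s2": ["c2", "c3", "c4"],
    "s3": ["c4", "c5", "c6"],
}
outcome = {"c1": 1, "c2": 0, "c3": 0, "c4": 1, "c5": 0, "c6": 1}

k_max = max_ratio(eligible, random.Random(0))  # 2 for this toy data
print(k_max, averaged_statistic(eligible, outcome, 1, runs=100))
```

Averaging over many runs is meant to damp the run-to-run variation that a single random matching would introduce into the estimate.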

Pages: 249 to 258

Publication date: December 30, 2019

Published in: journal

ISSN: 1942-2628