DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics


Integrated Architecture of SQL Engine and Data Analytics Tool with Apache Arrow Flight and Its Performance Evaluation

Authors:
Yuichiro Aoki
Satoru Watanabe

Keywords: relational database; SQL engine; ODBC; Apache Arrow; Apache Arrow Flight; data analytics

Abstract:
Data analytics in enterprise systems requires huge amounts of data, which are generally stored in databases. Conventionally, Structured Query Language (SQL) engines retrieve the data from the database, and data analytics tools, such as Python® scripts, are used to analyze it. In this case, whenever the data moves from the database to the data analytics tool over traditional Open Database Connectivity (ODBC), it must be serialized and deserialized. This is one of the bottlenecks in data analytics performance. In addition, the data needs to be joined for advanced data analytics, and joining the data in the SQL engine takes a lot of time. This is another bottleneck. To remove these bottlenecks, we propose a new architecture integrating the SQL engine and the data analytics tool that reduces the number of data serializations/deserializations and caches joined results to improve the performance of data analytics. Evaluation results show that the data transfer throughput using Apache Arrow/Arrow Flight is 13.1 to 37.4 times higher than that of a conventional data analytics tool using ODBC. Moreover, this architecture runs 2.4 times faster with the caching mechanism than without it.
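
The abstract does not say how joined results are cached. As a rough illustration of the general idea only, the sketch below memoizes the rows of a join query so that a repeated request skips the join entirely. Everything in it is an assumption made for illustration: SQLite from the Python standard library stands in for the SQL engine, the `JoinCache` class and the `orders`/`customers` tables are hypothetical, and there is no cache invalidation or Arrow-based transfer.

```python
import sqlite3


class JoinCache:
    """Hypothetical cache mapping a join query's SQL text to its fetched rows."""

    def __init__(self, conn):
        self.conn = conn
        self._results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        # Assumption: the exact (stripped) SQL text is a good enough cache key.
        key = sql.strip()
        if key in self._results:
            self.hits += 1
            return self._results[key]  # served from cache, no join executed
        self.misses += 1
        rows = self.conn.execute(sql).fetchall()  # run the join once
        self._results[key] = rows
        return rows


def build_demo_db():
    """Create a small in-memory database with two joinable tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
        """
    )
    return conn


JOIN_SQL = """
    SELECT c.name, o.amount
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.id
    ORDER BY o.id
"""

conn = build_demo_db()
cache = JoinCache(conn)
first = cache.query(JOIN_SQL)   # miss: executes the join
second = cache.query(JOIN_SQL)  # hit: returns the stored rows
```

A real system would also need to invalidate cached entries when the underlying tables change, which this sketch leaves out.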

Pages: 40 to 44

Copyright: Copyright (c) IARIA, 2021

Publication date: October 3, 2021

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-891-4

Location: Barcelona, Spain

Dates: from October 3, 2021 to October 7, 2021