DATA ANALYTICS 2018, The Seventh International Conference on Data Analytics


Towards a Scalable Data-Intensive Text Processing Architecture with Python and Cassandra

Authors:
Gregor-Patrick Heine
Thomas Woltron
Alexander Wöhrer

Keywords: Cassandra; Streaming; Python; Multiprocessing; Twitter; Sentiment Analysis

Abstract:
Canonical sentiment analysis implementations hinge on synchronous Hypertext Transfer Protocol (HTTP) calls. This paper introduces an asynchronous streaming approach and proposes a method for public opinion surveillance via stream subscriptions. A prototype combining Twitter streams, Python text processing, and Cassandra storage methods is introduced, elaborating on three major points: 1) a performance comparison of writing methods; 2) multiprocessing procedures employing data parallelization and asynchronous concurrent database writes; and 3) public opinion surveillance via noun-phrase extraction.
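The second point (data parallelization combined with concurrent database writes) can be illustrated with a minimal Python sketch. It is not the authors' code. The word-list scorer (`POSITIVE`, `NEGATIVE`, `score_chunk`), the `InMemoryStore` class, `chunk`, and `run_pipeline` are all hypothetical names, and a thread pool writing to an in-memory dictionary stands in for asynchronous Cassandra inserts. CPU-bound scoring is spread over worker processes, and the resulting rows are then written without waiting on each write in turn.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool
import threading

# Hypothetical toy lexicon; a stand-in for a real sentiment model.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous slices (data parallelization)."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def score_chunk(texts):
    """Work done inside one worker process: score every text in a chunk."""
    results = []
    for text in texts:
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        results.append((text, score))
    return results


class InMemoryStore:
    """Hypothetical stand-in for a Cassandra table (thread-safe dict)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, key, text, score):
        with self._lock:
            self._rows[key] = (text, score)

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def __len__(self):
        with self._lock:
            return len(self._rows)


def run_pipeline(texts, store, processes=4, write_concurrency=8):
    """Score texts in parallel processes, then write the results concurrently."""
    chunks = chunk(texts, max(1, processes))
    if processes > 1:
        # Worker processes each score one chunk; pool.map preserves chunk order.
        with Pool(processes) as pool:
            scored_chunks = pool.map(score_chunk, chunks)
    else:
        scored_chunks = [score_chunk(c) for c in chunks]  # serial fallback
    rows = [row for part in scored_chunks for row in part]

    # Submit every write up front instead of blocking on each one in turn.
    with ThreadPoolExecutor(max_workers=write_concurrency) as executor:
        futures = [executor.submit(store.insert, i, text, score)
                   for i, (text, score) in enumerate(rows)]
        for future in futures:
            future.result()  # re-raise any write error
    return rows
```

On platforms that start workers with `spawn` or `forkserver`, `run_pipeline` with `processes > 1` should be called from under an `if __name__ == "__main__":` guard. Replacing `InMemoryStore.insert` with a real driver call, for example `Session.execute_async` or `cassandra.concurrent.execute_concurrent_with_args` from the DataStax Python driver, would let the driver manage write concurrency instead of a thread pool; this is a suggested substitution, not necessarily what the prototype does.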

Pages: 15 to 18

Copyright: Copyright (c) IARIA, 2018

Publication date: November 18, 2018

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-681-1

Location: Athens, Greece

Dates: from November 18, 2018 to November 22, 2018