DATA ANALYTICS 2014, The Third International Conference on Data Analytics
Scalable System for Textual Analysis of Stock Market Prediction
Authors:
Roy Guanyu Lin
Tzu-Chieh Tsai
Keywords: distributed system; scalability; stock market prediction
Abstract:
Stock market prediction is the problem of forecasting market trends. For short-term investment, news is one of the most important factors influencing stock prices. Based on this idea, our goal is to build a scalable stock market prediction system that processes Chinese news articles to produce a prediction model. Such a system speeds up model training and can take additional training sources into account, e.g., posts from China’s microblog service, Sina Weibo. Moreover, with the emergence of cloud computing, a scalable system can lease more resources from the cloud to handle a growing workload. Our approach is to build the system on mature open source projects, such as Hadoop for parallel computing, Mahout for scalable machine learning, and Jieba for Chinese text segmentation. We provide a basic algorithm for stock trend prediction, build the software stack, collect news published in Taiwan from March 2009 to May 2014, and run experiments to evaluate the scalability of the system. The results show that, in this application, the Jieba Chinese text segmentation tool scales well with multiprocessing, running 80 percent faster with four parallel processes than in sequential mode. However, Mahout does not show a significant speedup in this scenario.
Pages: 95 to 99
Copyright: Copyright (c) IARIA, 2014
Publication date: August 24, 2014
Published in: conference
ISSN: 2308-4464
ISBN: 978-1-61208-358-2
Location: Rome, Italy
Dates: from August 24, 2014 to August 28, 2014