IMMM 2017, The Seventh International Conference on Advances in Information Mining and Management


A Framework for Blog Data Collection: Challenges and Opportunities

Authors:
Muhammad Nihal Hussain
Adewale Obadimu
Kiran Kumar Bandeli
Mohammad Nooman
Samer Al-khateeb
Nitin Agarwal

Keywords: blog; unstructured data; web crawling; blog data collection; blog data analysis tool.

Abstract:
The blogosphere has continued to grow, albeit more slowly since the advent of Twitter, and provides a rich medium for content framing. With no restriction on the number of characters, many users express their opinions on blogs and use other social media channels such as Twitter and Facebook to steer their audience to those blogs. Blogs provide more content than any other social medium and serve as a good platform for agenda-setting. This content can be of great use to sociologists and data scientists for tracking opinions about events. However, the importance of blog tracking has been challenged because collecting blog data and handling unstructured text are complex. This has caused many tracking tools to abandon blogs and move to other media such as Twitter. Nevertheless, blogs continue to be an important part of social media and cannot be ignored. In this paper, we explain the process of collecting blog data, describe the challenges we encounter, and demonstrate the importance of blog tracking through a real-world test case. The blog datasets discussed in this paper are made publicly available to researchers and practitioners through the Blogtrackers tool.

Pages: 35 to 40

Copyright: Copyright (c) IARIA, 2017

Publication date: June 25, 2017

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-566-1

Location: Venice, Italy

Dates: from June 25, 2017 to June 29, 2017