Open Access BASE 2012

How to Extract Relevant Knowledge from Tweets?

Bouillot, Flavien; Phan, Nhat Hai; Béchet, Nicolas; Bringay, Sandra; Ienco, Dino; Matwin, Stan; Poncelet, Pascal; Roche, Mathieu; Teisseire, Maguelonne

Access (Open Access)

Abstract

Tweets exchanged over the Internet are an important source of information, even though their characteristics make them difficult to analyze (e.g., a maximum of 140 characters; noisy data). In this paper, we investigate two different problems. The first concerns the extraction of representative terms from a set of tweets. More precisely, we address the following question: are traditional information retrieval measures appropriate when dealing with tweets? The second problem concerns the evolution of tweets over time for a set of users. With the development of data mining approaches, many efficient methods have been defined to extract patterns hidden in the huge amount of data available. More recently, new spatio-temporal data mining approaches have been defined specifically to deal with the huge amount of moving-object data made available by improvements in positioning technology. Given the particular characteristics of tweets, the second question we investigate is the following: are spatio-temporal mining algorithms appropriate for better understanding the behavior of communities over time? These two problems are illustrated through real applications concerning both health and political tweets.