Using Latent Dirichlet Allocation for Topic Modeling and Document Clustering of Dumaguete City Twitter Dataset


Online communication channels such as social media have become increasingly common because they let people share opinions and exchange information openly, instantly, and at their own convenience. Twitter, a popular social media site and microblogging service, has made it easy for people to share their experiences, adventures, and opinions about places they have visited. These short messages, called tweets, contain useful information that can be analyzed to identify the topics people are talking about and their sentiments toward those topics. Processing such a large Twitter dataset requires substantial information filtering to drill down to relevant topics and determine the sentiment of each topic cluster. This paper discusses the process and methods for generating topics and topic clusters from a Twitter dataset about Dumaguete City and estimates the likely sentiment of each topic cluster. A Latent Dirichlet Allocation (LDA) model was used to generate topics from 99,942 tweets and to cluster the tweets by calculating the probability that each tweet belongs to each topic cluster. A supervised machine learning algorithm, Support Vector Machine (SVM), was used to classify the sentiment of each cluster as positive, negative, or neutral.
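
The topic-generation and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small collapsed Gibbs sampler for LDA written with the Python standard library only, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the topic count, and the toy tweets are all invented for illustration. Each tweet is assigned to the topic with the highest estimated probability, which is one plausible reading of "calculating the probability" of cluster membership. The SVM sentiment step is not covered here.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (theta, topic_word, vocab), where theta[d][k] is the estimated
    probability of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)] # tokens of word v in topic k
    topic_total = [0] * num_topics                             # tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a constant factor.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distribution.
    theta = [
        [
            (doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
            for t in range(num_topics)
        ]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


# Toy, already-tokenized tweets (invented for illustration, not from the dataset).
tweets = [
    ["beach", "sunset", "boulevard", "walk"],
    ["food", "restaurant", "dinner", "delicious"],
    ["sunset", "beach", "swim", "boulevard"],
    ["dinner", "food", "cafe", "delicious"],
]

NUM_TOPICS = 2  # assumed; the abstract does not state the number of topics
theta, topic_word, vocab = lda_gibbs(tweets, NUM_TOPICS)

# Cluster each tweet under its most probable topic.
clusters = [max(range(NUM_TOPICS), key=lambda k: row[k]) for row in theta]

for tokens, row, cluster in zip(tweets, theta, clusters):
    print(cluster, [round(p, 3) for p in row], " ".join(tokens))
```

On a corpus the size of the one in the paper, a library implementation of LDA would normally replace this hand-written sampler; the sketch only shows how per-tweet topic probabilities lead to cluster assignments.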
