Tuesday, July 31, 2012

Summary of "Cool Blog Identification using Topic-based Models - Sriphaew et al."

The authors show how to identify cool blogs based on three assumptions: blogs tend to have definite topics, have enough posts, and tend to have a certain level of consistency among their posts.

The level of consistency, or topical consistency, tries to measure whether a blogger focuses on a solid interest; thus it favours blogs with definite topics, such as reviews of mobile devices. It is based on a mixture of topic probabilities of posts (LDA). The authors measure the similarity between each post and the posts preceding it. Here, the similarity is the distance between the topic probability distributions, which is calculated using the Euclidean, Kullback-Leibler, or Jensen-Shannon distance.
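To make the three distance measures concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `eps` guard in the Kullback-Leibler term, and the choice to average the distance over consecutive posts are all my own assumptions for illustration. Each post is assumed to be already represented as a topic probability vector (for example from LDA).

```python
import math

def euclidean(p, q):
    """Euclidean distance between two topic distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing; eps (an assumption of this
    sketch) keeps the ratio finite when q_i == 0.
    """
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the midpoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def mean_consecutive_distance(posts, dist=jensen_shannon):
    """Average distance between each post and the one right before it.

    Hypothetical aggregation: lower values mean more consistent topics.
    """
    pairs = list(zip(posts, posts[1:]))
    if not pairs:
        return 0.0
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Toy data: each row is one post's distribution over three topics.
focused_blog = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]
scattered_blog = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

focused_score = mean_consecutive_distance(focused_blog)
scattered_score = mean_consecutive_distance(scattered_blog)
```

With this toy data, the focused blog ends up with a smaller average distance than the scattered one, which is the kind of signal the topical consistency feature is meant to capture. Note that Kullback-Leibler is not symmetric, so the order of the two posts matters when it is used as `dist`.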

They conduct a "user study" based on a corporate blog data set and a single person, who categorized 540 blogs as cool or not cool. Using an SVM implementation, the authors were able to show high precision and recall for cool blog recognition.

This is a heuristic approach and can therefore be applied to any language following the same assumptions.
So, check out the paper.
