The authors show how to identify cool blogs based on three assumptions:
blogs tend to have definite topics, have enough posts, and tend to have a
certain level of consistency among their posts.
The level of consistency, or topical consistency, tries to measure
whether a blogger focuses on a solid interest. It thus favours blogs with
specific topics, such as reviews of mobile devices. It is based on the topic
probability mixture of each post, obtained with LDA. The authors measure the
similarity between a post and its preceding posts. Here, the similarity is the
distance between the topic probability distributions, which is calculated
using the Euclidean, Kullback-Leibler, or Jensen-Shannon distance.
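To make the three distance measures concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `eps` smoothing, and the choice to average the distances between consecutive posts are assumptions made for illustration. Each post is assumed to be given already as a list of LDA topic probabilities.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two topic distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q).

    eps is an assumed smoothing constant that avoids division by zero;
    terms with p_i == 0 contribute nothing and are skipped.
    """
    return sum(a * math.log((a + eps) / (b + eps))
               for a, b in zip(p, q) if a > 0)

def jensen_shannon_divergence(p, q):
    """Symmetric Jensen-Shannon divergence (natural log, at most ln 2).

    Its square root is the Jensen-Shannon distance.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def topical_consistency(posts, distance=jensen_shannon_divergence):
    """Mean distance between each post and the one before it.

    Hypothetical aggregation: lower values mean more consistent topics.
    """
    if len(posts) < 2:
        return 0.0
    dists = [distance(posts[i - 1], posts[i]) for i in range(1, len(posts))]
    return sum(dists) / len(dists)

# Made-up topic distributions over three topics.
focused = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.90, 0.05, 0.05]]
scattered = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

focused_score = topical_consistency(focused)
scattered_score = topical_consistency(scattered)
```

With these made-up numbers, the focused blog gets a lower average distance than the scattered one, which is the behaviour a consistency feature should show.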
They conduct a "user study" based on a corporate blog data set
and a single annotator, who categorized 540 blogs into cool and not cool.
Using an SVM implementation, the authors report good precision and recall
for cool blog recognition.
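As a reminder of what precision and recall mean here, the following small Python sketch computes both for the "cool" class. The labels are made up for illustration and are not taken from the paper; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive ("cool") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 1 = cool, 0 = not cool.
annotator = [1, 1, 0, 0, 1]
classifier = [1, 0, 0, 1, 1]
precision, recall = precision_recall(annotator, classifier)
```

Precision is the share of blogs predicted as cool that the annotator also marked as cool; recall is the share of annotated cool blogs that the classifier found.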
This is a heuristic approach and can therefore be applied to blogs in any
language for which the same assumptions hold.
So, check out the paper.