Tuesday, July 31, 2012

Summary of "Domain-Specific Identification of Topics and Trends in the Blogosphere - Schirru et al."

The authors present a system called "Social Media Miner", which extracts topics and, for each topic, the most relevant posts.
Relevance is calculated with a link authority algorithm such as PageRank. The main contribution of the paper is the topic detection and tracking mechanism.
Schirru et al. cluster blog posts using a time-windowing approach. To build the clusters they represent each blog post as a tf-idf vector and apply k-means; cluster labels are extracted with non-negative matrix factorization. The number of clusters is chosen using the residual sum of squares.
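
To make the clustering step more concrete, here is a minimal sketch in plain Python of tf-idf vectors, k-means, and the residual sum of squares (RSS) computed for several values of k. It is my own illustration, not the authors' implementation: the toy posts, the function names, the whitespace tokenisation, and the unsmoothed idf are all assumptions, and the NMF label extraction is left out. How exactly the paper turns the RSS values into a choice of k is not reproduced here; a common heuristic is to look for the point where adding clusters stops reducing the RSS by much.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vectors) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenised for t in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns (assignments, centroids, rss)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # RSS: summed squared distance of every vector to its cluster centroid.
    rss = sum(sq_dist(v, centroids[a]) for v, a in zip(vectors, assignments))
    return assignments, centroids, rss


# Hypothetical posts from one time window.
posts = [
    "python release adds new syntax",
    "new python syntax explained",
    "election results announced today",
    "today election turnout results",
]
vocab, vectors = tfidf_vectors(posts)
rss_by_k = {k: kmeans(vectors, k)[2] for k in range(1, len(posts) + 1)}
for k, rss in rss_by_k.items():
    print(k, round(rss, 3))
```

With one cluster per post the RSS drops to zero, which is why the RSS alone cannot be minimised directly and some trade-off against k is needed.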

Overall, their approach is rather simple: they cluster posts into topics for a given period, find the relevant terms (or labels) for each topic, and visualize the term mentions over time as a Trend Graph.
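
The data behind such a trend graph can be sketched as a count of term mentions per time window. This is only an illustration under my own assumptions: the function name, the fixed seven-day window, the whitespace tokenisation, and the sample posts are hypothetical and not taken from the paper.

```python
from collections import Counter
from datetime import date, timedelta


def term_mentions_per_window(posts, term, start, window_days=7):
    """Count mentions of `term` per fixed-size window, keyed by window start date.

    `posts` is an iterable of (publication_date, text) pairs.
    """
    counts = Counter()
    term = term.lower()
    for published, text in posts:
        if published < start:
            continue  # ignore posts before the observed period
        index = (published - start).days // window_days
        window_start = start + timedelta(days=index * window_days)
        counts[window_start] += text.lower().split().count(term)
    return dict(sorted(counts.items()))


# Hypothetical dated posts.
dated_posts = [
    (date(2012, 7, 2), "python release adds new syntax"),
    (date(2012, 7, 5), "new python syntax explained python"),
    (date(2012, 7, 10), "election results announced today"),
    (date(2012, 7, 12), "python and the election"),
]
trend = term_mentions_per_window(dated_posts, "python", start=date(2012, 7, 1))
for window_start, mentions in trend.items():
    print(window_start.isoformat(), mentions)
```

Plotting these counts against the window start dates gives one line of the graph per term.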

Check out the paper.
