Summary of "Using Blog Content Depth and Breadth to Access and Classify Blogs" - Chen et al.

Here I summarize another interesting paper on content-based ranking of blogs.

The authors present a blog-specific filtering system that measures topic concentration and variation.
They assess the quality of blogs via two main aspects: content depth and breadth. The approach is motivated by the sparseness of links and the highly personal character of the blogosphere.
The related work essentially covers two areas: blog assessment and quality assessment.
Blog assessment. PageRank, HITS, and Technorati's blog authority share two issues: sparseness of links and a time lag in the scores. Further, most blog search engines rely on simple retrieval models because they only access the limited content of feeds and have to cope with real-time constraints.
Quality assessment. According to Joseph Juran, quality is the "fitness for use" of information. Common quality assessment metrics are based on heuristics for a specific situation. For blogs, researchers emphasise the differences in language, structure, and the importance of timeliness. Further, blogs are more interesting and personal, and they reflect the author's opinions and experiences. Thus, researchers define the quality of a blog based on the blogger's expertise, trustworthiness, information quality, and the blog's personal nature. In addition, the credibility of commentators also counts.

In essence, the authors present a score that combines five criteria.
The first criterion is the informativeness of a blog, measured as the number of meaningful words. A meaningful word is one with a high tf-idf score. Secondly, the completeness of a blog indicates how many strongly related words from each mentioned topic are present.
The third criterion is the topic count per blog. Fourthly, the inter-topic distance specifies how many words of a post are shared between topics.
Finally, topic mergence measures the overall overlap between topics.
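
To make two of these criteria more concrete, here is a minimal Python sketch. It is not the authors' implementation: this summary does not give the paper's exact formulas, so the tf-idf variant, the `threshold` value, and the use of Jaccard similarity as a stand-in for topic overlap are all my own assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumption: tf is the relative term frequency and idf is
    log(N / df); the paper may use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def informativeness(docs, threshold=0.1):
    """Count the 'meaningful' words per document.

    A word counts as meaningful if its tf-idf score exceeds
    `threshold` (an arbitrary illustrative cut-off).
    """
    return [
        sum(1 for score in doc_scores.values() if score > threshold)
        for doc_scores in tfidf_scores(docs)
    ]


def topic_overlap(topic_a, topic_b):
    """Jaccard overlap between two topics given as collections of words.

    Assumption: used here only as a simple proxy for the overlap
    between topics; it is not the paper's mergence formula.
    """
    a, b = set(topic_a), set(topic_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two tiny "blogs", already tokenized.
blogs = [
    ["python", "code", "the"],
    ["cooking", "recipe", "the"],
]
meaningful_counts = informativeness(blogs)
overlap = topic_overlap({"python", "code"}, {"code", "recipe"})
```

In this toy corpus the word "the" appears in every blog, so its idf is zero and it never counts as meaningful, while the topic-specific words do. A real system would also need tokenization, stop-word handling, and a topic model to produce the word sets passed to `topic_overlap`.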

