Monday, January 7, 2013

"News Information Flow Tracking, Yay!" - Suen et al.

Nifty is an evolution of the Memetracker project.
This time they implemented an incremental approach to their meme clustering.
What I especially like about their paper is the massive test data set of over 20 terabytes.
This data set consists of 6.1 billion blog posts that they collected over 4 years.
One dry run of their clustering on this data takes less than 5 days.
For detailed information, please refer to the paper.

