Friday, August 23, 2013

"Counting Little Words in Big Data" - C.Chung and J. Pennebaker

I just read a really nice paper about using word counts to derive psychological measures.
Their method, called language style matching (LSM), produced notable correlations with diverse phenomena.
For example, they show a relation between LSM and relationship stability (via transcripts of dating conversations), income distribution (via Craigslist) and wiki site rankings (via discussion threads).
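To make the idea more concrete: LSM is commonly computed per function-word category as 1 - |p1 - p2| / (p1 + p2 + 0.0001), where p1 and p2 are the percentages of each speaker's words that fall into that category, and the per-category scores are then averaged. Below is a minimal Python sketch under that assumption. The word lists are tiny hypothetical stand-ins for the much larger LIWC dictionaries the published measure relies on, so absolute scores will not match published values.

```python
import re
from collections import Counter
from typing import Dict, Set

# Tiny, hypothetical function-word lists (assumption: the real measure uses
# LIWC's far larger dictionaries). They only illustrate the mechanics.
CATEGORIES: Dict[str, Set[str]] = {
    "personal_pronouns": {"i", "me", "my", "you", "your", "we", "us", "he", "she", "they", "them"},
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "at", "to", "of", "with", "for", "from"},
    "conjunctions": {"and", "but", "or", "because", "so"},
    "negations": {"no", "not", "never"},
}


def category_rates(text: str) -> Dict[str, float]:
    """Percentage of the words in `text` that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero for empty text
    counts = Counter(words)
    return {
        cat: 100.0 * sum(counts[w] for w in vocab) / total
        for cat, vocab in CATEGORIES.items()
    }


def lsm(text_a: str, text_b: str) -> float:
    """Mean over categories of 1 - |pa - pb| / (pa + pb + 0.0001)."""
    ra, rb = category_rates(text_a), category_rates(text_b)
    scores = [1 - abs(ra[c] - rb[c]) / (ra[c] + rb[c] + 0.0001) for c in CATEGORIES]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(lsm("I think we should go to the park", "We could go to the lake and swim"))
```

Scores lie between 0 and 1; two texts that use function words at identical rates score exactly 1. The small constant keeps a category that neither text uses from dividing by zero.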

Further, they refer to some nice examples of how huge collections of data can be used to derive new knowledge. gives you a real-time impression of the current mood of Twitter users and serves as a historical view of our linguistic features.

Monday, January 7, 2013

"News Information Flow Tracking, Yay!" - Suen et al.

NIFTY is an evolution of the MemeTracker project.
This time, they implemented an incremental approach to their meme clustering.
What I especially like about the paper is the massive test data set of over 20 terabytes.
It consists of 6.1 billion blog posts that they collected over 4 years.
A single complete clustering run over this data set takes less than 5 days.
For details, please refer to the paper.
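To illustrate what "incremental" means here: each new phrase is processed once as it arrives and is either attached to an existing cluster or starts a new one, so the whole corpus never has to be re-clustered. The sketch below is NOT NIFTY's algorithm (which, as far as I understand, works on a graph of phrases); it is a generic single-pass "leader" clustering with word-level Jaccard similarity, and the class name and threshold are hypothetical.

```python
from typing import List, Set


def tokens(phrase: str) -> Set[str]:
    """Lower-cased word set of a phrase (a deliberately crude representation)."""
    return set(phrase.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two word sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class IncrementalClusterer:
    """Hypothetical single-pass clusterer: each cluster is represented by the
    word set of its first phrase (its 'leader'). A new phrase joins the most
    similar cluster if similarity >= threshold, otherwise it starts a new one."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.leaders: List[Set[str]] = []
        self.members: List[List[str]] = []

    def add(self, phrase: str) -> int:
        """Assign one incoming phrase and return its cluster index."""
        toks = tokens(phrase)
        best, best_sim = -1, 0.0
        for i, leader in enumerate(self.leaders):
            sim = jaccard(toks, leader)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            self.members[best].append(phrase)
            return best
        self.leaders.append(toks)
        self.members.append([phrase])
        return len(self.leaders) - 1


if __name__ == "__main__":
    clusterer = IncrementalClusterer(threshold=0.5)
    for quote in ["yes we can", "yes we can do it", "lipstick on a pig"]:
        print(quote, "->", clusterer.add(quote))
```

The cost of adding one phrase depends only on the number of existing clusters, not on how many posts have been seen before, which is the property that makes incremental processing attractive for a data set of this size.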

"Influence and Correlation in Social Networks" - Anagnostopoulos et al.

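The authors describe a statistical test for how actions spread in social networks.
This test helps to distinguish social influence (causation) from mere correlation in information propagation.
First, the test estimates how strongly a user's probability of performing an action depends on the number of friends who have already performed it.
Afterwards, the authors randomly shuffle the times at which users performed the action and estimate this dependence again. Influence depends on who acted first, so it should weaken under shuffling, whereas correlation alone (for example, similar friends acting alike) should survive it.
For a detailed insight into the statistical concept, please take a look at the paper.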
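A minimal sketch of the shuffle test, assuming discrete time steps, an undirected friendship graph stored as a dict, and a logistic model p(a) = sigmoid(alpha * ln(a + 1) + beta) for the chance that a still-inactive user acts when a friends have already acted (my reading of the paper's setup). All function names are hypothetical, and alpha is fitted by plain gradient ascent rather than a proper statistics package, so treat this as an illustration, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]       # user -> friends (undirected), hypothetical format
Times = Dict[str, Optional[int]]  # user -> step of the action, or None if never


def build_observations(graph: Graph, times: Times, horizon: int) -> List[Tuple[int, bool]]:
    """For each user and each step t while the user is still inactive, record
    (a, acted_at_t), where a = number of friends who acted strictly before t."""
    obs = []
    for user, friends in graph.items():
        for t in range(horizon):
            a = sum(1 for f in friends if times.get(f) is not None and times[f] < t)
            acted = times.get(user) == t
            obs.append((a, acted))
            if acted:
                break  # the user has acted; later steps are no longer at risk
    return obs


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_alpha(obs: List[Tuple[int, bool]], steps: int = 2000, lr: float = 0.1) -> float:
    """Fit p(a) = sigmoid(alpha * ln(a + 1) + beta) by gradient ascent on the
    mean log-likelihood and return alpha (the social correlation coefficient)."""
    if not obs:
        return 0.0
    alpha = beta = 0.0
    n = len(obs)
    for _ in range(steps):
        g_alpha = g_beta = 0.0
        for a, acted in obs:
            x = math.log(a + 1)
            err = (1.0 if acted else 0.0) - _sigmoid(alpha * x + beta)
            g_alpha += err * x
            g_beta += err
        alpha += lr * g_alpha / n
        beta += lr * g_beta / n
    return alpha


def shuffle_test(graph: Graph, times: Times, horizon: int,
                 rounds: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Return (alpha on the real timestamps, mean alpha after randomly
    permuting the timestamps among the users who acted)."""
    rng = random.Random(seed)
    observed = fit_alpha(build_observations(graph, times, horizon))
    active = [u for u, step in times.items() if step is not None]
    shuffled_alphas = []
    for _ in range(rounds):
        perm = [times[u] for u in active]
        rng.shuffle(perm)
        shuffled = dict(times)
        shuffled.update(zip(active, perm))
        shuffled_alphas.append(fit_alpha(build_observations(graph, shuffled, horizon)))
    return observed, sum(shuffled_alphas) / rounds
```

Under these assumptions, an observed alpha clearly above the average shuffled alpha points toward influence, while similar values suggest that the correlation does not depend on the order of actions.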