Nifty is an evolution of the Memetracker project.
This time, they implemented an incremental approach to their meme clustering.
What I especially like about their paper is the massive test data set of over 20 terabytes.
This data set consists of 6.1 billion blog posts that they collected over 4 years.
A single run of their clustering over this data takes less than 5 days.
For detailed information, please refer to the paper.
Phils Blog
Blog of Philipp Berger posting about his daily life and work.
Monday, January 7, 2013
"Influence and Correlation in Social Networks" - Anagnostopoulos et al.
The authors describe a statistical test for actions in social networks.
This test helps to distinguish correlation from causality for information propagation in social networks.
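The authors' test first estimates how likely it is that one user influences another user.
Afterwards, they randomly shuffle the users' action times and run the test again: if the estimate hardly changes, the timing of the actions does not matter, so the observed correlation is unlikely to be caused by influence.
For a detailed insight into the statistical concept, please take a look at the paper.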
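To make the shuffle idea concrete, here is a minimal C sketch. It deliberately swaps the paper's logistic-regression influence coefficient for a much simpler statistic (the share of active users who had a friend act before them); the function names, the fixed MAX_USERS limit, and the LCG random generator are my own assumptions. If the statistic on the real time stamps is clearly higher than the shuffled average, timing matters, which hints at influence.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_USERS 64 /* hypothetical size limit for this sketch */

/* Simplified influence statistic (NOT the paper's logistic-regression
   coefficient): the fraction of active users who had at least one friend
   act strictly before them.
   adj is a row-major n x n friendship matrix; times[i] < 0 means that
   user i never performed the action. */
double early_friend_fraction(size_t n, const int *adj, const double *times)
{
    size_t active = 0, preceded = 0;
    for (size_t i = 0; i < n; i++) {
        if (times[i] < 0) continue;
        active++;
        for (size_t j = 0; j < n; j++) {
            if (adj[i * n + j] && times[j] >= 0 && times[j] < times[i]) {
                preceded++;
                break;
            }
        }
    }
    return active ? (double)preceded / (double)active : 0.0;
}

/* Tiny 64-bit linear congruential generator so the shuffle is reproducible. */
static uint64_t lcg_next(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state >> 33;
}

/* Shuffle test: permute the activation times among the active users
   (who is active stays fixed, only *when* changes), recompute the
   statistic and average it over several rounds.
   Returns -1.0 if n exceeds MAX_USERS or rounds is not positive. */
double shuffled_fraction(size_t n, const int *adj, const double *times,
                         int rounds, uint64_t seed)
{
    double pool[MAX_USERS], shuffled[MAX_USERS];
    double sum = 0.0;
    if (n > MAX_USERS || rounds <= 0) return -1.0;
    for (int r = 0; r < rounds; r++) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++)
            if (times[i] >= 0) pool[k++] = times[i];
        /* Fisher-Yates shuffle of the observed time stamps */
        for (size_t i = k; i > 1; i--) {
            size_t j = (size_t)(lcg_next(&seed) % i);
            double tmp = pool[i - 1];
            pool[i - 1] = pool[j];
            pool[j] = tmp;
        }
        /* hand the permuted times back to the same active users */
        k = 0;
        for (size_t i = 0; i < n; i++)
            shuffled[i] = times[i] >= 0 ? pool[k++] : -1.0;
        sum += early_friend_fraction(n, adj, shuffled);
    }
    return sum / rounds;
}
```

For a path of three friends acting one after another, early_friend_fraction returns 2/3, while every shuffled round yields either 1/3 or 2/3.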
Labels:
Blog Analysis,
Blogs,
social network,
statistics
Thursday, August 16, 2012
Why are PageRank & Co. inferior for blog ranking?
PageRank is one of the most frequently used algorithms for ranking traditional webpages based on the web link graph. It was introduced by Page et al. and is based on the random surfer model. A website's PageRank is defined as the probability of a random surfer visiting this website.
The random surfer traverses the web by repeatedly choosing between two options: clicking on a random link on the current page or jumping to any website at random. The second option is necessary to make sure the random surfer also visits pages that have no incoming links and to make sure that it is possible to escape from pages that have no outgoing links.
The PageRank algorithm is iterative; how many iterations it needs to converge depends on the implementation, e.g. on the chosen damping factor and convergence threshold.
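A minimal power-iteration sketch of the random surfer model in C, assuming a dense adjacency matrix and a fixed number of iterations instead of a convergence check. The damping factor d is the probability of following a link; pages without outgoing links spread their rank evenly over all pages. The function name and signature are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Power-iteration PageRank sketch.
   adj: row-major n x n, adj[i*n+j] != 0 means page i links to page j.
   d: damping factor (probability of following a link), commonly 0.85.
   rank: output array of n entries that sum to 1.
   Returns 0 on success, -1 for n == 0 or allocation failure. */
int pagerank(size_t n, const int *adj, double d, int iterations, double *rank)
{
    double *next = malloc(n * sizeof *next);
    int *outdeg = malloc(n * sizeof *outdeg);
    if (n == 0 || !next || !outdeg) {
        free(next);
        free(outdeg);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        rank[i] = 1.0 / n; /* the surfer starts anywhere with equal chance */
        outdeg[i] = 0;
        for (size_t j = 0; j < n; j++)
            if (adj[i * n + j]) outdeg[i]++;
    }
    for (int it = 0; it < iterations; it++) {
        /* rank of pages without out-links is spread over all pages */
        double dangling = 0.0;
        for (size_t i = 0; i < n; i++)
            if (outdeg[i] == 0) dangling += rank[i];
        /* random jump part */
        for (size_t j = 0; j < n; j++)
            next[j] = (1.0 - d) / n + d * dangling / n;
        /* link-following part: each page splits its rank over its out-links */
        for (size_t i = 0; i < n; i++) {
            if (outdeg[i] == 0) continue;
            double share = d * rank[i] / outdeg[i];
            for (size_t j = 0; j < n; j++)
                if (adj[i * n + j]) next[j] += share;
        }
        for (size_t j = 0; j < n; j++) rank[j] = next[j];
    }
    free(next);
    free(outdeg);
    return 0;
}
```

In a graph without dangling pages, a page without incoming links ends up with exactly (1 - d) / n, the random-jump share.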
TrustRank is an algorithm very similar to PageRank.
In contrast to PageRank, TrustRank is initialized with a fixed seed set of web sites that were manually labelled as trustworthy or untrustworthy. The trust then propagates through the web graph in the same way as the rank does in the PageRank algorithm.
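The same power iteration becomes a TrustRank-style sketch if the random jumps go only to the seed pages. This sketch assumes the seed set contains only the trustworthy pages, splits the jump probability evenly among them, and also routes the rank of pages without out-links back to the seeds; these choices and the function name are my assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* TrustRank sketch: PageRank whose random jumps (and the mass of pages
   without out-links) go only to the trusted seed pages.
   adj: row-major n x n, adj[i*n+j] != 0 means page i links to page j.
   is_seed[i] != 0 marks a manually verified trustworthy page.
   Returns 0 on success, -1 if there is no seed or allocation fails. */
int trustrank(size_t n, const int *adj, const int *is_seed,
              double d, int iterations, double *trust)
{
    size_t seeds = 0;
    for (size_t i = 0; i < n; i++)
        if (is_seed[i]) seeds++;
    if (seeds == 0) return -1;

    double *next = malloc(n * sizeof *next);
    int *outdeg = malloc(n * sizeof *outdeg);
    if (!next || !outdeg) {
        free(next);
        free(outdeg);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        trust[i] = is_seed[i] ? 1.0 / seeds : 0.0; /* start at the seeds */
        outdeg[i] = 0;
        for (size_t j = 0; j < n; j++)
            if (adj[i * n + j]) outdeg[i]++;
    }
    for (int it = 0; it < iterations; it++) {
        double dangling = 0.0;
        for (size_t i = 0; i < n; i++)
            if (outdeg[i] == 0) dangling += trust[i];
        /* jumps and dangling mass return to the seeds only */
        for (size_t j = 0; j < n; j++)
            next[j] = is_seed[j] ? ((1.0 - d) + d * dangling) / seeds : 0.0;
        /* trust flows along links exactly as rank does in PageRank */
        for (size_t i = 0; i < n; i++) {
            if (outdeg[i] == 0) continue;
            double share = d * trust[i] / outdeg[i];
            for (size_t j = 0; j < n; j++)
                if (adj[i * n + j]) next[j] += share;
        }
        for (size_t j = 0; j < n; j++) trust[j] = next[j];
    }
    free(next);
    free(outdeg);
    return 0;
}
```

Trust decays with the link distance from the seeds, and a page that no trusted page can reach keeps a trust of zero.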
Another approach is the Hyperlink-Induced Topic Search (HITS) algorithm by Kleinberg.
It is based on the concept of hubs and authorities. In the traditional view of the web, hubs are link directories and archives that only refer to authorities, which actually offer valuable information.
HITS operates on a subgraph of the web that is related to a specific input query. Each page gets an authority score and a hub score. A page's authority score increases with the hub scores of the pages linking to it, and its hub score increases with the authority scores of the pages it links to.
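The mutual reinforcement of hub and authority scores fits into a few loops. This sketch assumes the query-specific subgraph is already extracted into a dense adjacency matrix and normalises both score vectors to sum 1 in every round; Kleinberg's formulation uses the Euclidean norm, but since the updates are linear, the resulting ranking is the same. The function names are hypothetical.

```c
#include <stddef.h>

/* Scale v so that its entries sum to 1 (no-op for an all-zero vector). */
static void normalise(double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    if (sum > 0.0)
        for (size_t i = 0; i < n; i++) v[i] /= sum;
}

/* HITS sketch on an already extracted, query-specific subgraph.
   adj: row-major n x n, adj[i*n+j] != 0 means page i links to page j. */
void hits(size_t n, const int *adj, int iterations, double *hub, double *auth)
{
    for (size_t i = 0; i < n; i++) hub[i] = auth[i] = 1.0;
    for (int it = 0; it < iterations; it++) {
        /* authority: sum of the hub scores of the pages linking to it */
        for (size_t j = 0; j < n; j++) {
            auth[j] = 0.0;
            for (size_t i = 0; i < n; i++)
                if (adj[i * n + j]) auth[j] += hub[i];
        }
        normalise(auth, n);
        /* hub: sum of the authority scores of the pages it links to */
        for (size_t i = 0; i < n; i++) {
            hub[i] = 0.0;
            for (size_t j = 0; j < n; j++)
                if (adj[i * n + j]) hub[i] += auth[j];
        }
        normalise(hub, n);
    }
}
```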
These traditional ranking algorithms are all based on the web link graph.
However, traditional webpages show a different linking behaviour than blogs. Blogs offer different types of links with different semantics, e.g. trackbacks or blogroll links. Furthermore, the blog link graph tends to be rather sparse in comparison to the overall web.
Thus, tailor-made rankings for blogs are needed that also consider blog-specific characteristics and blogs' content.
Labels:
Blog Analysis,
Blogs,
HITS,
PageRank,
Related Work,
TrustRank,
web link graph
Wednesday, August 1, 2012
Summary of "Credibility Improves Topical Blog Post Retrieval - Weerkamp et al."
The authors introduce 11 indicators of credibility to improve the effectiveness of topical blog retrieval. Their indicators work on the blog level and on the post level. Besides some syntactic indicators, they also present the timeliness of posts, the regularity of blogs, and the consistency of blogs.
The timeliness of a post is defined as the temporal distance of a blog post to a news post of the same topic. In this paper, topics seem to be term occurrences. Nonetheless, it is very interesting to incorporate traditional media.
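A minimal C sketch of the timeliness distance, assuming, as the paper seems to do, that a topic is approximated by a shared term. The NewsItem type and the function name are hypothetical, and the paper's mapping from this distance to a final credibility score is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical news item: publication time and one topic term id
   (topics approximated by term occurrences). */
typedef struct {
    double time;
    int term;
} NewsItem;

/* Timeliness sketch: smallest temporal distance between a blog post and a
   news item that shares one of the post's terms.
   Returns -1.0 if no news item matches. */
double timeliness_distance(double post_time, const int *post_terms, size_t n_terms,
                           const NewsItem *news, size_t n_news)
{
    double best = -1.0;
    for (size_t i = 0; i < n_news; i++) {
        for (size_t k = 0; k < n_terms; k++) {
            if (news[i].term != post_terms[k]) continue;
            double d = post_time - news[i].time;
            if (d < 0.0) d = -d; /* distance in either direction */
            if (best < 0.0 || d < best) best = d;
            break; /* one matching term is enough for this news item */
        }
    }
    return best;
}
```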
The posting/publishing behaviour of a blogger is called regularity. Here, the authors assume that a credible blog has a very regular posting behaviour. In contrast, related research often treats such regularity as an indicator of splogs.
The topical consistency of a blog represents its topical fluctuation. The authors define the consistency similarly to query clarity, which reminds me a bit of the tf/idf score. In contrast to related work, the authors do not use the natural (chronological) ordering of posts.
Nevertheless, the authors show that their indicators can improve topical blog retrieval significantly (using the Blog06 data set).
Take a look at the paper.
Tuesday, July 31, 2012
Summary of "Domain-Specific Identification of Topics and Trends in the Blogosphere - Schirru et al."
The authors present a system called "Social Media Miner". This
system extracts topics and the corresponding, most relevant posts.
The relevance is calculated using a link authority algorithm like
PageRank. The main contribution of the paper is the topic detection and
tracking mechanism.
Schirru et al. cluster blog posts using a time-windowing approach. To create the clusters, they use a tf/idf vector for each blog post and k-means; non-negative matrix factorization is used for label extraction. To define the number of clusters, they use the residual sum of squares.
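To illustrate the residual sum of squares, here is a small C sketch that computes the RSS of a given cluster assignment. Because the RSS of a good k-means clustering typically keeps shrinking as k grows, a common heuristic is to compute it for several k and pick the k where the curve flattens (the "elbow"); whether Schirru et al. use exactly this rule is my assumption, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Residual sum of squares of a clustering: the summed squared Euclidean
   distance of every point to the centroid of its cluster.
   x: n points of dimension dim (row-major), label[i] in [0, k).
   Returns -1.0 on invalid input or allocation failure. */
double rss(size_t n, size_t dim, const double *x, const int *label, int k)
{
    if (k <= 0 || dim == 0) return -1.0;
    double *centroid = calloc((size_t)k * dim, sizeof *centroid);
    size_t *count = calloc((size_t)k, sizeof *count);
    if (!centroid || !count) {
        free(centroid);
        free(count);
        return -1.0;
    }
    /* sum up the members of every cluster */
    for (size_t i = 0; i < n; i++) {
        if (label[i] < 0 || label[i] >= k) {
            free(centroid);
            free(count);
            return -1.0;
        }
        count[label[i]]++;
        for (size_t d = 0; d < dim; d++)
            centroid[(size_t)label[i] * dim + d] += x[i * dim + d];
    }
    /* turn the sums into means */
    for (int c = 0; c < k; c++)
        if (count[c] > 0)
            for (size_t d = 0; d < dim; d++)
                centroid[(size_t)c * dim + d] /= (double)count[c];
    /* squared distance of every point to its centroid */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t d = 0; d < dim; d++) {
            double diff = x[i * dim + d] - centroid[(size_t)label[i] * dim + d];
            sum += diff * diff;
        }
    free(centroid);
    free(count);
    return sum;
}
```

For the one-dimensional points 0, 2, 10, 12, splitting into {0, 2} and {10, 12} gives an RSS of 4, while a single cluster gives 104.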
All in all, their approach is rather simple: they cluster topics for a given period, find relevant terms (or labels), and visualize the term mentions over time as a trend graph.
Check out the paper.
Labels:
Blog Analysis,
Related Work,
Topic Detection,
Trends
Summary of "Cool Blog Identification using Topic-based Models - Sriphaew et al."
The authors show how to identify cool blogs based on three assumptions:
blogs tend to have definite topics, have enough posts, and tend to have a
certain level of consistency among their posts.
The level of consistency, or topical consistency, tries to measure whether a blogger focuses on a solid interest; thus, it favours blogs with specific topics like reviews of mobile devices. It is based on the mixture of topic probabilities of the posts (LDA). The authors measure the similarity of a post to its preceding posts. Here, the similarity is the distance between the topic probability distributions, which is calculated using the Euclidean, Kullback-Leibler, or Jensen-Shannon distance.
They conduct a "user study" based on a corporate blog data set and a single guy who categorized 540 blogs as cool or not cool. Using an SVM implementation, the authors achieve high precision and recall for cool blog recognition.
This is a heuristic approach and can therefore be applied to any language for which the same assumptions hold.
So, check out the paper.
Labels:
Blog Analysis,
Consistency,
LDA,
Related Work,
research,
SVM
Summary of "Splog Filtering based on Writing Consistency - Liuwei et al."
They define the consistency of the writing interval as the inverse variance of the post update intervals. A high writing-interval consistency implies a very constant update interval.
The authors also define a measure for the consistency of the writing structure. Surprisingly, there is no NLP magic behind it; the measure simply relates the variation of words per post to the average number of words per post. The underlying assumption is that splogs are packed with keywords and their posts are all equally long. In contrast, normal bloggers tend to write short and long posts depending on their daily mood.
The consistency on the topic level is defined as the average topical similarity of posts. Each post is compared with its preceding post. The topical similarity is defined as the cosine similarity of the posts' tf/idf word vectors. Blogs with a very high topical consistency tend to be auto-generated.
Finally, they introduce a filtering system and evaluate their feature set with three classification mechanisms (SVM, Bayes, C4.5) on the Blog06 data set. They show that their feature set reaches the same accuracy with a reduced number of features. Further, it is worth mentioning that their heuristic approach is language-independent.
Labels:
Blog Analysis,
Consistency,
Related Work,
research
Wednesday, July 25, 2012
Summary of "Using Blog Content Depth and Breadth to Access and Classify Blogs" - Chen et al.
Here, I sum up another interesting paper concerning a content-based ranking of blogs.
Blog assessment. PageRank, HITS, and Technorati's blog authority have two issues: the sparseness of links and the time lag of the scores. Further, most blog search engines are based on simple retrieval models because they only access the limited content of feeds and have to cope with real-time constraints.
The authors present a blog-specific filtering system that
measures topic concentration and variation.
They assess the quality of blogs via two main aspects: content depth and breadth. This is motivated by the sparseness of links and the highly personal character of the blogosphere.
The related work essentially consists of two areas: blog and
quality assessment.
(Figure: D3 force layout of German blogs)
Quality assessment. According to Joseph Juran, quality is the "fitness for use" of information. Common quality assessment metrics are based on heuristics for a specific situation. For blogs, researchers emphasise the differences in language and structure as well as the importance of timeliness. Further, blogs are more interesting and personal, and they reflect the author's opinions and experiences. Thus, researchers define the quality of a blog based on the blogger's expertise and trustworthiness, the information quality, and the blog's personal nature. In addition, the credibility of commentators also counts.
In essence, the authors present a score that combines five criteria.
The first criterion is the informativeness of a blog, i.e. the number of meaningful words. A meaningful word has a high tf/idf score. Secondly, the completeness of a blog indicates how many strongly related words from each mentioned topic are present.
The third criterion is the number of topics per blog. Fourthly, the inter-topic distance specifies how many words of a post are shared between topics.
Finally, the topic mergence calculates the general overlap
between topics.
The authors conduct a small user study to validate their scoring.
Chen, M. and Ohta, T. (2010), Using Blog Content Depth And Breadth To Access and Classify Blogs. International Journal of Business and Information Volume 5, number 1, June 2010.
Labels:
Blog Analysis,
Consistency,
Related Work,
Topic
Summary of "An effective coherence measure to determine topical consistency in user-generated content - Jiyin He et al."
Here, I sum up an interesting paper concerning a content-based ranking of blogs.
A blog is relevant if it focuses on a central topic. This is called topical consistency.
The authors introduce the coherence score to measure the consistency.
It is based on the intra blog clustering structure relative to the clustering of the background collection.
One has to differentiate between short and long term interest in blogs.
Further, the key features of blogs are a strong social aspect and their inherent noisiness.
The topical noise stems from random-interest blogs or diaries. This creates topical diffuseness (a loose clustering).
One has to find the blogger that is most closely associated with a specific topic.
Blogs mostly fail to maintain a central topical thrust. Nevertheless, the trend is to rank entire blogs in order to recommend interesting feeds to the reader.
One has to take the time and the relevance of topics into account.
Both recurring interest (time-based) and focused interest (cohesiveness of the language of posts) should be measured.
The authors' coherence score captures the topical focus and tightness of subtopics in each blog. Thus, it handles the focused interest.
Lexical cohesion is an alternative to the coherence score. It measures the semantic relationships between content words.
For this, external thesauri like WordNet are used to build lexical chains. The number of chains reflects the number of distinct topics. A so-called chain score measures the significance of a lexical chain.
The lexical cohesion is sensitive to progression of topics, but blind to their hierarchical structure.
The coherence score gives the proportion of coherent document pairs relative to the background collection.
A document pair counts as coherent if the cosine similarity of the two documents exceeds a threshold.
The score measures the relative tightness of the clustering for a blog and prefers structured document sets with fewer sub-clusters.
Thus, the coherence score captures the clustering structure of the data, which the authors call topical consistency.
It is independent of external resources and adapts to the fast changing environment of blogs.
Its complexity is O(average document length × number of documents²), and it can be applied beyond text data (e.g. to blog structure or linkage).
The authors integrate it into a blog ranking to boost topically relevant and topically consistent blogs.
(Figure: force layout of blog interlinkage using D3)
Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke: An effective coherence measure to determine topical consistency in user-generated content. IJDAR 12(3): 185-203 (2009)
Labels:
Blog Analysis,
Consistency,
Related Work,
Topic
Monday, June 25, 2012
Some Links from HCI Research
Acoustic radiation pressure: Radiation pressure--the history of a mislabeled tensor by Robert T. Beyer, a summary/review paper about the 100-year history of radiation pressure. A simpler explanation can be found on the German Wikipedia. Essentially, this effect occurs when acoustic waves travelling in one medium hit another target medium. If the frequency is so high that the target medium cannot stretch fast enough, the air particles get reflected back to the sound source. The effect is easier to imagine if you think about the change of medium at a water-air interface. Check out the paper on the water-air interface experiment.
Tangibles go on the market: Appmates, little racing cars for your iPad.
Think about output to your brain: Switching Neurons. The research area is called optogenetics and might be interesting for the rest of the nervous system as well.
Cheat Sheet for Statistics: just in case you need a refresher, here is a Cheat Sheet.
Hick's Law in Mortal Combat: a web page discussing how the number of choices influences reaction time in martial arts.
Labels:
Acoustic pressure,
BCI,
brain interface,
cheat sheet,
HCI,
hick's law,
mortal combat,
research,
statistics,
Tangibles
Monday, May 21, 2012
Lost in Storage - Find data zombies using Sequoia
Ever wondered where all your disk space went?
Check out SequoiaView!
It is a pretty nice visualization tool from the Eindhoven University of Technology that shows you an explorable treemap of your disk space usage. This way, you can quickly identify old VM images or installers that are lying around wasting your storage.
As you can see in the image below, you just have to follow the huge blobs to find the wasted space.
Check out the homepage of the project:
http://w3.win.tue.nl/nl/onderzoek/onderzoek_informatica/visualization/sequoiaview//
Labels:
disk space,
storage,
visualization,
wasting
Wednesday, May 16, 2012
Touch Paint
How to make your own touch pad and implement a simple drawing app with it?
This was the challenge in the HCI Research lecture by Baudisch at the Hasso-Plattner-Institute. For me, it was probably an even harder challenge because of my lack of C/C++ programming skills and image processing knowledge. Nevertheless, after quite some time I managed it. Check out the video!
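The core of such a touch pad is finding the glowing fingertips in the camera image. Here is a minimal sketch of that step in C, assuming an 8-bit grayscale frame in which touching fingertips are brighter than a fixed threshold, and only a single touch at a time; the Touch type, the function name, and the threshold are hypothetical, and this is not the code from the project ZIP.

```c
#include <stddef.h>

/* Hypothetical result type: centroid of all bright "touch" pixels. */
typedef struct {
    double x, y;
    size_t pixels; /* 0 means no touch detected */
} Touch;

/* Treats every pixel brighter than threshold as part of a glowing fingertip
   and returns the centroid of those pixels. gray is a row-major
   width x height 8-bit grayscale frame. Handles a single touch only. */
Touch find_touch(const unsigned char *gray, size_t width, size_t height,
                 unsigned char threshold)
{
    Touch t = {0.0, 0.0, 0};
    double sx = 0.0, sy = 0.0;
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            if (gray[y * width + x] > threshold) {
                sx += (double)x;
                sy += (double)y;
                t.pixels++;
            }
    if (t.pixels > 0) {
        t.x = sx / (double)t.pixels;
        t.y = sy / (double)t.pixels;
    }
    return t;
}
```

Drawing then boils down to connecting the centroids of consecutive frames; multi-touch would need connected-component labelling instead of a single centroid.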
(Figure: Glowing tips)
(Figure: My Pad)
And remember to set all the necessary path variables (include, library, executable).
Check out my Visual Studio 2010 project ZIP (just a prototype ^^).
Labels:
C language,
C++,
drawing,
image processing,
paint,
touch,
touch controll,
touch pad
Thursday, March 22, 2012
Linux Shell Output Redirect
If you like to run a program in total silence and write all of its output (stdout and stderr) to a file of your choice, use:
YOURCOMMAND &> YOURFILE
This redirects everything to the specified file. Note that &> is a bash feature; in a plain POSIX sh, which cron uses by default, write YOURCOMMAND > YOURFILE 2>&1 instead.
This helps for cron jobs or huge output generators like Hadoop.
Tuesday, March 13, 2012
CouchDB Lucene: Connection refused
Yesterday I got the following exception from my CouchDB:
Traceback (most recent call last):
File "/opt/couchdb-lucene-0.7-SNAPSHOT/tools/couchdb-external-hook.py", line 40, in main
resp = respond(res, req, opts.key)
File "/opt/couchdb-lucene-0.7-SNAPSHOT/tools/couchdb-external-hook.py", line 81, in respond
res.request(method, path, headers=req_headers)
File "/usr/lib/python2.6/httplib.py", line 914, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.6/httplib.py", line 951, in _send_request
self.endheaders()
File "/usr/lib/python2.6/httplib.py", line 908, in endheaders
self._send_output()
File "/usr/lib/python2.6/httplib.py", line 780, in _send_output
self.send(msg)
File "/usr/lib/python2.6/httplib.py", line 739, in send
self.connect()
File "/usr/lib/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib/python2.6/socket.py", line 561, in create_connection
raise error, msg
error: [Errno 111] Connection refused
Because it had worked like a charm for six months, I got pretty depressed. To solve the issue, I began to randomly search the web for this stack trace of my Lucene view, but I could not find anything useful. After reading the third tutorial about how to set up Lucene with CouchDB, I figured out that couchdb-lucene was simply not running anymore, which is exactly what the final "Connection refused" (Errno 111) says: nothing was listening where the external hook tried to connect. After restarting the couchdb-lucene daemon, everything works fine again. Just execute this to start your couchdb-lucene again:
nohup /opt/couchdb-lucene-0.7-SNAPSHOT/bin/run &
Labels:
connection,
couchdb,
lucene,
python,
stacktrace
Wednesday, February 29, 2012
Real-Time measurement in C
For those of you who could not find it anywhere else:
Here is the code snippet to get time measurements with microsecond resolution on a Linux system with pure C.
Fun fact: the struct is already there; struct timeval is defined in sys/time.h.
rt_printk is the real-time version of printf (e.g. in RTAI); to read its output, use the command dmesg.
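#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    char buffer[30];
    struct timeval tv;
    time_t curtime;

    gettimeofday(&tv, NULL); /* seconds + microseconds since the epoch */
    curtime = tv.tv_sec;
    strftime(buffer, sizeof buffer, "%m-%d-%Y %T.", localtime(&curtime));
    /* %06ld keeps the leading zeros of the microseconds; in an RTAI task use rt_printk */
    printf("%s%06ld\n", buffer, (long)tv.tv_usec);
    return 0;
}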
Labels:
C,
C language,
coding,
measurement,
programming,
real-time,
time
Wednesday, February 15, 2012
Social Networks and Academic Research
The world is getting faster and faster, but most reputation in research still comes from printed journals.
Now it seems that times are changing: social networks for researchers are emerging.
These are not like Facebook, with sharing pics and useless stuff to procrastinate. Instead, the research networks focus on publications and on answering small research questions collaboratively. (see also the German article of Welt-Online)
So check them out, it might become an advantage soon.
researchgate - Social Researcher Network (German startup)
academia - Social Researcher Network (US version)
mendeley - Collaborative Paper Platform
Wednesday, February 8, 2012
Start of an idea - Eclipse Badges
Have you ever coded day and night without getting enough reward for it? Here comes the Eclipse Badges Extension. Okay, it is not developed yet, but the idea is too cool to die.
So here are some bullets for what you could earn badges for while coding:
- X hours coding at once
- Master of Shortcuts
- Challenger of Refactorings
- Best Refactorings in a Row
- longest class name
- Deepest Hierarchy
- lines of code per minute
- longest method
- deepest call hierarchy
- Quick-fix Master
- Web of Eclipse (a lot of complex dependencies)
- longest build time ever
- Antitrust, Eclipse as notepad
- Archaeologist open the oldest projects
- Plugin Master - you can catch them all
- most active views
- nightly coding is appreciated
- Your Rank as developer in the Eclipse Badge Community
All this stuff is coming soon...
Please comment for more ideas!
Thursday, November 10, 2011
German Museum of Technology in Berlin
I am currently taking part in the Wissen und Macht (Knowledge and Power) symposium in Berlin. Besides presenting some facts about my university and about the Blog-Intelligence project, I can listen to diverse talks about the power and opportunities of the web. Here, the main interest is in improving the quality and distribution of knowledge. The message is that universities and online communities should interact more to build a trustworthy knowledge base.
One counterpoint mentioned is that this will make the web more exclusive, so normal people like you and me without a reputation will run into problems.
The important point discussed now is the influence of social media and especially of the blogosphere. For example, a journalist presents the Egyptian blogger phenomenon.
Check out the homepage and the Tele-Task for more information and streams.
Thursday, October 27, 2011
Setting up a new notebook
Hey Guys,
I am just setting up my new notebook. Business as usual, installing the same stuff and configuring it in the same way as before.
I found a nice tutorial for transferring your PuTTY configuration from one PC to another: Link to Tutorial.
So you just have to know the right registry key (PuTTY keeps its settings under HKEY_CURRENT_USER\Software\SimonTatham), export it, and all your SSH config is back.
Monday, October 17, 2011
Citations of The Day
For the beginning of this semester, here are my quotes of the day:
"Misquotation is, in fact, the pride and privilege of the learned. A widely read man never quotes accurately, for the rather obvious reason that he has read too widely." - Hesketh Pearson
"Men who seek happiness are like drunkards who can never find their house but are sure that they have one." - Voltaire
"A weak man has doubts before a decision; a strong man has them afterwards." - Karl Kraus
"No one on his deathbed ever said, "I wish I had spent more time on my business."" - Arnold Zack
Sunday, September 11, 2011
Trip to Florida
Guys, I just went to Miami for vacation. It was an awesome trip. Let's begin!
First, we arrived in Miami. So warm, unbelievable compared to the ultra-cold AC of the plane.
We went directly to the Miami International Hostel, just 500 meters from the beach. And the beach was nice: white, long, and lots of ladies around there. We were a bit unlucky with the weather, though: cloudy, with some heavy rain.
Nevertheless, chilling around the beach promenade can be expensive, but it is worth it. Lots of bars and quite a busy place.
Next, we went north to Titusville, with a nice Days Inn motel nearby. From there it went to space: Cape Canaveral, Kennedy Space Center. Pretty awesome to see the rockets in real life; I can recommend the Rocket Garden and the Cape tour with the launch pad. The simulations of the moon landing and of an Apollo launch are definitely a must-see. All in all: a full day.
Next station: Orlando with all its theme parks. First Sea World, with incredibly cool animal shows featuring dolphins and orca whales, plus some awesome roller coaster rides. Next day, next park: Aquatica, a chilled water park with swimming, sunbathing, and scary slides. And finally, on the last day in Orlando, we visited the Universal Studios park. It felt like being in a movie, with really nice rides with fire and other stunning effects, including the Mummy ride and the white shark boat trip. Pretty cool.
Next up: the best beach ever, Siesta Key. Rip currents all over, really scary, but we managed it. After a swim and a long beach walk, we continued to the Everglades National Park. Simply impressive nature all over the place. We made a nice boat trip, and the guide explained everything about the plants and crocodiles to us. The next day we made a cool bike trip through the park and chilled in the evening at the Hard Rock Cafe Miami Beach.
And finally, we went on a trip to Key West. All in all, a nice, sleepy pirate village with quite a good bar mile and an awesome sunset.
So much for the summary of 12 days in Florida.
See you in the sun!
Labels:
Florida,
Trip,
United States
Location:
Miami, FL, USA
Wednesday, June 1, 2011
Creating a Data centric website
Hey Folks,
I am just starting to create a data-centric web app. The main goal is to allow the user to explore a huge dataset.
Therefore, I am currently searching for suitable libraries. Here are some early results:
Post on Highcharts JS
Post on JS Charts
Article on Accessible Data Visualization
Post on Google Chart API
To sum up, Google Charts seems to be the fastest solution, but it is only good enough for a prototype. For highly interactive and explorable charts, I will dive into Highcharts and JS Charts.
Stay tuned!
Thursday, April 21, 2011
Apache Nutch and I
Hey Folks,
I just started working with Apache Nutch during my studies. It seems to be quite a sophisticated spider/search engine, so it might fit perfectly into the blogIntelligence project I am currently working on. Let's see. In any case, it is an official part of the Search Engine lecture at HPI, so I will have to get along with it.
So I decided to start an extra blog on Nutch; check it out:
http://nutchinprogress.blogspot.com/