Tuesday, December 15, 2009

Nice Idea for a Wave Gadget

Currently, after creating a wave and inviting somebody, one cannot kick users out of the wave again.
So the idea was to build a gadget/robot that manages a buddy list of wave users. The wave gets encrypted, and with every change the key changes as well.
After a user is kicked out of the wave, that user's key is no longer updated, so he can no longer participate in the wave.
In practice it may still be possible to insert new unencrypted content into the wave, but the encryption mechanism will automatically hide such unauthorized messages.
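
The key handling described above could be sketched roughly as follows. This is only a toy model in plain Python, not code against the Wave gadget or robot API: the class `WaveKeyManager` and its methods are hypothetical names, and no real encryption happens here. It only tracks which users hold the current key.

```python
import secrets


class WaveKeyManager:
    """Toy model: one shared wave key that is rotated on every change."""

    def __init__(self, members):
        self.members = set(members)
        self.version = 0
        self.current_key = b""
        self.keys_held = {}  # user -> (version, key) last handed to that user
        self.rotate()

    def rotate(self):
        """Create a fresh key and hand it only to the current members."""
        self.version += 1
        self.current_key = secrets.token_bytes(32)
        for user in self.members:
            self.keys_held[user] = (self.version, self.current_key)

    def kick(self, user):
        """Remove a user; the rotation that follows leaves him with a stale key."""
        self.members.discard(user)
        self.rotate()

    def can_participate(self, user):
        """A user can read and write only while holding the newest key."""
        return self.keys_held.get(user) == (self.version, self.current_key)


manager = WaveKeyManager(["alice", "bob"])
manager.kick("bob")
print(manager.can_participate("alice"))  # alice still holds the newest key
print(manager.can_participate("bob"))    # bob only holds an outdated key
```

A real gadget would additionally have to encrypt the wave content with the current key and deliver new keys to the members over a secure channel, both of which are left out of this sketch.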

SWT Project Sprint Planning Session 3

Amazing what people implement for a lecture at the Hasso Plattner Institute. Some guys really put in 200% more work than anybody asks for, but in contrast some features are merely functional. Nobody seems to test the UI, so it looks like an incoherent cluster of input fields combined with some weird checkboxes.
Guys, please use your stuff before presenting it; your users will be grateful.

Working on a Web Crawler

For a couple of weeks now, a friend of mine and I have been working on a blog crawler to save the blogosphere. So far we have saved about 50,000 blogs and processed around 3 million web pages. That sounds like a lot, but it is not. We are competing with a lot of other crawlers out there, including Google. Crawlers usually fetch about 300 sites per second overall. In contrast, our crawler fetches just 2,500 sites per day. That is really weird.
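
To put the gap into numbers, here is a quick back-of-the-envelope calculation. It takes the two rates quoted above at face value; the variable names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

typical_sites_per_second = 300  # rate quoted for other crawlers
our_sites_per_day = 2500        # rate of our crawler

# Convert both rates into the other unit so they can be compared directly.
typical_sites_per_day = typical_sites_per_second * SECONDS_PER_DAY
our_sites_per_second = our_sites_per_day / SECONDS_PER_DAY
slowdown_factor = typical_sites_per_day / our_sites_per_day

print(f"typical crawler: {typical_sites_per_day:,} sites per day")
print(f"our crawler:     {our_sites_per_second:.3f} sites per second")
print(f"slowdown factor: {slowdown_factor:,.0f}x")
```

So the crawler is roughly four orders of magnitude slower than the quoted figure.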
This is despite the fact that we distributed our crawler to gain more computation power, and that is exactly where the problem lies. To guarantee that all blogs stay linked in the database and to avoid duplicates in the DB, we need a central database, which seems to be the bottleneck.
Every crawler client waits several minutes for the database just to answer a very simple SELECT statement.
Well, my database knowledge is limited to the basics taught at university. Currently I am trying to speed up the query using CREATE INDEX.
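
As an illustration of what CREATE INDEX changes, here is a minimal sketch using Python's built-in `sqlite3` module. The table `blogs`, its `url` column, and the index name `idx_blogs_url` are assumptions made for this example; the crawler's real schema and database system may differ. The sketch asks the database for its query plan before and after the index exists.

```python
import sqlite3

# In-memory stand-in for the central database (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blogs (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO blogs (url) VALUES (?)",
    [(f"http://blog{i}.example.com/",) for i in range(10000)],
)


def query_plan(connection, sql, params):
    """Return the human-readable plan SQLite would use for the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT id FROM blogs WHERE url = ?"
params = ("http://blog4242.example.com/",)

plan_before = query_plan(conn, lookup, params)  # full table scan
# A UNIQUE index speeds up the lookup and also rejects duplicate URLs.
conn.execute("CREATE UNIQUE INDEX idx_blogs_url ON blogs (url)")
plan_after = query_plan(conn, lookup, params)   # lookup via the index

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the database has to scan every row for each lookup; with it, the same SELECT becomes an index search. Making the index UNIQUE lets the database itself guard against duplicate blogs.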
Recently I also heard about asynchronous distributed message queues. This seems to be a good option for getting the whole job handling out of the database. But I suspect that it is mainly the INSERTs that consume so much time.
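
The queue idea could look roughly like the sketch below. It is only an in-process stand-in: Python's `queue.Queue` and threads take the place of a real distributed message queue and of the crawler clients, and the table `pages` as well as the functions `writer` and `crawler_client` are hypothetical. The point is that clients never wait for the database; a single writer drains the queue and commits the INSERTs in batches.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel that tells the writer to flush and exit


def writer(jobs, conn, batch_size=100):
    """Single consumer: collects URLs and writes them in batched transactions."""
    batch = []
    while True:
        item = jobs.get()
        if item is not STOP:
            batch.append((item,))
        if batch and (item is STOP or len(batch) >= batch_size):
            with conn:  # one transaction (and one commit) per batch
                conn.executemany(
                    "INSERT OR IGNORE INTO pages (url) VALUES (?)", batch
                )
            batch.clear()
        if item is STOP:
            break


def crawler_client(jobs, client_id, page_count):
    """Producer: only enqueues results and never touches the database."""
    for n in range(page_count):
        jobs.put(f"http://example.com/client{client_id}/page{n}")


# check_same_thread=False because the writer thread uses this connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY)")

jobs = queue.Queue()
writer_thread = threading.Thread(target=writer, args=(jobs, conn))
writer_thread.start()

clients = [
    threading.Thread(target=crawler_client, args=(jobs, i, 250)) for i in range(4)
]
for client in clients:
    client.start()
for client in clients:
    client.join()

jobs.put(STOP)
writer_thread.join()

stored = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
print("pages stored:", stored)
```

Batching matters because each commit is comparatively expensive; grouping many INSERTs into one transaction is a common way to reduce that cost. Whether it would fix this particular bottleneck is an open question.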

So to sum up, buying new hardware is a good option.

Start Blogging

Hello World,
this is my first blog, so let's test it. My intention in creating this blog is primarily to practise Web 2.0 and to share my knowledge. So I hope to offer some useful information for everyone here.