Semantic Clustering
06 / 2007

This page shows a few examples of clustering web content, using algorithms and visualisations that we developed during my time at Netbreeze. Clustering means dividing a large set into many smaller groups of closely related items (in our case mostly words or phrases). We implemented several clustering algorithms for documents, words, adjectives, phrases, products, and even drug ingredients that appear on the internet. Using a single CPU, we were able to cluster the interesting terms of around one million web documents on user request (i.e. within a few seconds of computing time). The dynamic visualisation examples below show the relations between terms, as well as their importance (item size) and trend (item color). The visualisation applet was built using Prefuse.
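
The page does not say which clustering algorithms were used, so the following is only a minimal sketch of the general idea, under assumed choices: terms are compared by the Jaccard overlap of the documents they occur in, and terms whose overlap reaches a threshold are merged into one group (single-link clustering via union-find). The function names, the threshold, and the toy documents are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of document ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(docs, threshold=0.5):
    """Group terms whose document sets overlap by at least `threshold`.

    Illustrative single-link clustering; not the method used for the applets.
    """
    # Map each term to the set of documents it appears in.
    postings = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    # Union-find structure: every term starts in its own cluster.
    parent = {term: term for term in postings}

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    # Merge the clusters of every sufficiently similar pair of terms.
    for a, b in combinations(sorted(postings), 2):
        if jaccard(postings[a], postings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for term in postings:
        clusters.setdefault(find(term), set()).add(term)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy input: each document is a list of already extracted terms.
docs = [
    ["climate", "warming", "emissions"],
    ["climate", "warming", "emissions"],
    ["heart", "attack", "cholesterol"],
    ["heart", "attack"],
    ["climate", "warming"],
]
clusters = cluster_terms(docs, threshold=0.5)
```

Comparing all pairs of terms is quadratic, so a sketch like this would not scale to a million documents as it stands; it only shows what "groups of closely related terms" can mean in practice.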

Clustering of trendy phrases related to climate change (Data source: a selection of internet documents containing 'climate change' up to May 2007):

If you can read this text, you probably don't have Java installed (version 1.4.2 or later is required).

Get Java here.


Applet and Data © by Netbreeze GmbH

Meaning of the visualisation:
The size of an item represents how frequently the term occurred in the given set of documents, and the color of an item represents its trendiness (i.e. how much the use of the term has increased or decreased lately).
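
As a rough illustration of this mapping, the sketch below derives a size and a color for each term from two time windows of documents. Everything specific here is an assumption: the trend formula, the 0.25 cut-off, the size range, and the color names are hypothetical and are not taken from the applet.

```python
from collections import Counter


def visual_attributes(earlier_docs, recent_docs, min_size=10.0, max_size=40.0):
    """Return {term: {"size", "trend", "color"}} for every term seen."""
    earlier = Counter(t for doc in earlier_docs for t in doc)
    recent = Counter(t for doc in recent_docs for t in doc)
    total = earlier + recent
    if not total:
        return {}
    top = max(total.values())

    attrs = {}
    for term, freq in total.items():
        # Size grows linearly with the overall frequency of the term.
        size = min_size + (max_size - min_size) * freq / top
        # Trend in [-1, 1]: +1 means only recent use, -1 only earlier use.
        trend = (recent[term] - earlier[term]) / freq
        if trend > 0.25:
            color = "green"  # assumed color for rising terms
        elif trend < -0.25:
            color = "red"    # assumed color for falling terms
        else:
            color = "grey"   # assumed color for stable terms
        attrs[term] = {"size": size, "trend": trend, "color": color}
    return attrs


# Toy input: two earlier and two recent documents.
earlier_docs = [["climate", "kyoto", "warming"], ["kyoto"]]
recent_docs = [["climate", "carbon", "warming"], ["climate", "carbon"]]
attrs = visual_attributes(earlier_docs, recent_docs)
```

In this toy data "climate" is the most frequent term and gets the largest size, "kyoto" only appears in the earlier window and is marked as falling, and "warming" is equally frequent in both windows and is marked as stable.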

How to use the applet:
Use the two slider controls at the bottom of the applet to adjust the number of items displayed per cluster and the maximum distance, measured from a selected node, up to which items are displayed. You can also use your mouse scroll wheel to zoom into the visualisation, and you can drag any item with the mouse. By clicking on the hidden control panel on the right, you can adjust further options such as the gravitational force and the distance between the nodes.
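
The effect of the two sliders can be sketched as two simple filters. This is an assumed reading of the controls, not the applet's code: "distance" is taken to mean the number of hops in the term graph, and the graph, the frequencies, and the function names are hypothetical.

```python
from collections import deque


def nodes_within(graph, selected, max_distance):
    """Breadth-first search: hop distance of every node within range."""
    dist = {selected: 0}
    queue = deque([selected])
    while queue:
        node = queue.popleft()
        if dist[node] == max_distance:
            continue  # do not expand beyond the slider value
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def top_items(clusters, per_cluster):
    """Keep the `per_cluster` most frequent items of each cluster."""
    return {
        name: sorted(freqs, key=lambda t: (-freqs[t], t))[:per_cluster]
        for name, freqs in clusters.items()
    }


# Toy term graph as an adjacency list (undirected edges listed both ways).
graph = {
    "climate change": ["emissions", "kyoto"],
    "emissions": ["climate change", "carbon tax"],
    "kyoto": ["climate change"],
    "carbon tax": ["emissions", "eu"],
    "eu": ["carbon tax"],
}
visible = nodes_within(graph, "climate change", max_distance=2)

# Toy clusters with term frequencies.
clusters = {"policy": {"kyoto": 5, "carbon tax": 9, "eu": 2}}
shown = top_items(clusters, per_cluster=2)
```

With a maximum distance of 2, the node "eu" (three hops away from "climate change") is hidden, and with two items per cluster only the two most frequent terms of the "policy" cluster remain.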



Clustering of phrases linked to heart attack (Data from May 2007):



Applet and Data © by Netbreeze GmbH

Clustering of drug ingredients based on internet data (Data from May 2007):



Applet and Data © by Netbreeze GmbH

Links:

Netbreeze GmbH is a Swiss company building knowledge generators based on internet data.
Prefuse is an interactive information visualization toolkit (a very nice open-source Java toolkit).