Automated Market Research
06 / 2007

As always, the phone rings at the most inconvenient moment, and the person on the other end of the line urgently needs to know your personal opinion of toothpaste A compared with competitor toothpaste B with respect to 'cleanness feeling after brushing both sides 2.5 times'. And how strongly, on a scale from 1 to 10, you associate their product with 'cool', 'ugly', 'good quality' or whatever else, even if you have never used it. The next time you face a telephone survey for market research purposes, you will probably feel even a bit more annoyed, because (apart from the fact that many of these surveys are boring, very expensive and not statistically significant) many results of this kind could be generated fully automatically from freely available internet data. And not just for a single point in time, as with a phone poll, but in real time over any period.

The amount of user-generated content on the internet (blogs, discussion forums, personal homepages) has grown tremendously in recent years and is probably still underestimated by the classical market research industry. For example, there are around 200,000 blog postings on the subject of 'toothpaste' alone, all of them freely available to everyone.

My visualisation below shows one of the simplest and most straightforward ways to analyse public internet documents about a set of car brands (in this case, a large share of all English-language blog postings and online news published worldwide between February and June 2007 was used). The brands are rated by how often they appear together with attributes such as 'female', 'male', 'ecological', 'family friendly', 'arrogant', 'cool', 'cheap', 'luxury' and others. Of course, this example is in no way representative or conclusive, and I'm not even interested in car brands, but the nice thing is that a lot of such data is immediately and freely available on the internet and, most notably, that changes in brand perception can be tracked in nearly real time.

Applet and data © Netbreeze GmbH

Meaning of the visualisation:
The brands are positioned by their relative occurrence together with a given attribute. For example, by the end of June, Mini Cooper scores 539% for 'cute' and 200% for 'arrogant'. This means that, in the period up to the end of June 2007, the word 'cute' occurred in connection* with Mini Cooper about 5.4 times as often as the average over all car brands, while 'arrogant' appeared together with Mini Cooper twice as often as the average. (See also the information box that appears when you point the mouse at an item.)
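To make the percentage concrete, here is a minimal Python sketch of one way such a score could be computed, assuming co-occurrence counts per brand are already available. The exact normalisation Netbreeze used is not described here: dividing by how often each brand is mentioned at all is an assumption, and the function name, brand names and counts are hypothetical, not data from the study.

```python
from typing import Dict


def relative_occurrence(
    cooccurrences: Dict[str, int],
    brand_mentions: Dict[str, int],
) -> Dict[str, float]:
    """Return each brand's association with one attribute, in percent.

    Hypothetical normalisation: a brand's rate is the number of paragraphs
    in which it co-occurs with the attribute divided by the number of
    paragraphs mentioning the brand at all. A score of 100% means the
    brand's rate equals the average rate over all brands.
    """
    rates = {
        brand: cooccurrences.get(brand, 0) / mentions
        for brand, mentions in brand_mentions.items()
        if mentions > 0  # skip brands that never appear
    }
    if not rates:
        return {}
    average = sum(rates.values()) / len(rates)
    if average == 0:
        # No brand co-occurs with the attribute, so nothing stands out.
        return {brand: 0.0 for brand in rates}
    return {brand: 100.0 * rate / average for brand, rate in rates.items()}


# Invented counts for three hypothetical brands (not the study's data).
mentions = {"Brand A": 1000, "Brand B": 1000, "Brand C": 1000}
cute = {"Brand A": 30, "Brand B": 10, "Brand C": 20}
scores = relative_occurrence(cute, mentions)
print({brand: round(score) for brand, score in scores.items()})
# prints {'Brand A': 150, 'Brand B': 50, 'Brand C': 100}
```

Under this reading, a score of 539% means a rate 5.39 times the all-brand average. Normalising by brand mentions rather than using raw counts is a design choice that keeps frequently discussed brands from dominating every attribute.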

* Two words are counted as occurring together only if they appear in the same paragraph. Ordinary web search engines cannot count such co-occurrences directly: Google, for example, can only search for words occurring in the same document or directly following each other, not for more restrictive criteria such as 'in the same sentence', 'in the same paragraph' or 'at most x words apart'. Netbreeze is able to perform such more restrictive searches on the web.
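The paragraph criterion itself is simple to express in code. The Python sketch below is not Netbreeze's implementation (which is not described here); it assumes plain-text documents whose paragraphs are separated by blank lines and uses case-insensitive whole-word matching. All function names and sample texts are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable


def split_paragraphs(document: str) -> list[str]:
    """Split plain text into paragraphs at blank lines.

    Assumption: paragraphs are separated by empty lines; real web pages
    would need HTML-aware splitting (e.g. on <p> elements).
    """
    return [p for p in re.split(r"\n\s*\n", document) if p.strip()]


def term_pattern(term: str) -> re.Pattern[str]:
    """Case-insensitive whole-word pattern for a (possibly multi-word) term."""
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def count_paragraph_cooccurrences(
    documents: Iterable[str], brand: str, attribute: str
) -> int:
    """Count paragraphs that mention both the brand and the attribute."""
    brand_re = term_pattern(brand)
    attribute_re = term_pattern(attribute)
    return sum(
        1
        for document in documents
        for paragraph in split_paragraphs(document)
        if brand_re.search(paragraph) and attribute_re.search(paragraph)
    )


docs = [
    "The Mini Cooper is so cute.\n\nAnother paragraph about cute puppies.",
    "My mini  cooper looks arrogant.\n\nNothing cute here, just a Mini Cooper.",
]
print(count_paragraph_cooccurrences(docs, "Mini Cooper", "cute"))      # prints 2
print(count_paragraph_cooccurrences(docs, "Mini Cooper", "arrogant"))  # prints 1
```

Sentence-level or word-distance criteria would follow the same pattern with a different unit of text. Note that "Nothing cute here" is still counted for 'cute': plain co-occurrence ignores negation, which is one of the shortcomings that semantic text analysis would have to address.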

How to use the applet:
For each axis of the graph (horizontal and vertical), you can choose the attribute of your choice from the menus above, which lets you immediately compare all brands with respect to these two attributes. To change the time frame, just drag the time slider left or right; this lets you follow how each brand evolves over time. Combining these settings yields 3468 different plots to explore. Did you know, for example, that over the last few months VW and Mini Cooper have moved in exactly opposite directions in the 'ecological vs. cool' space?

The visualisation above is only a first, still quite primitive approach, and it analyses only a subset of the user-generated content on the web. The quality of the results is likely to improve significantly both by taking all available data into account and by applying more sophisticated analysis methods such as sentiment detection, semantic text analysis (handling negations etc.) and better statistical methods.


Netbreeze GmbH: a Swiss company building knowledge generators based on internet data.
TNS Emnid Semiometrie™: TNS Emnid, a large global market research company, offers its clients a visualisation for brand positioning similar to the one above, but only in two dimensions, without time evolution, and based on surveys rather than internet data. On their website you can take the survey to find the position of your own personality in their emotional space.
Prefuse: interactive information visualization toolkit (a very nice open-source Java toolkit).
Google Blog Search: blog search for 'toothpaste'.