Monday, October 04, 2004

New Search Engine


What does a new search engine have to do with Philosophy? Why, it puts me on the first page of search results for "Melbourne Philosophy" of course! Apart from that, it's good to see some plausible competition for Google. Your average user might not care about the subtle limitations of Google, but we philosophers care about the rightness of things, not just the function of things. Okay, Google is possibly one of the best-behaved corporate citizens in the world, with its low-visibility advertising, blindingly fast results and an objective page-ranking system. Okay, they have provided a valuable research tool into the vast and largely unmapped World Wide Web. But there are problems.

How does Google work? Tech people may be familiar with the PageRank system, by which every link to a site is a vote for that site. Search results are a list of sites containing the search criteria, ordered by the number of votes. As a result, well-known sites are rewarded, and little-known sites are not. This gives you a kind of power-law distribution for how highly a site is ranked. The vast majority of sites get just one or two links, with some sites having truly enormous rankings. It is as much art as science getting a website to be sufficiently well-known to appear in Google's search results. Popularity is more important than quality. The success of Google relies on the usually true assumption that quality is what leads to popularity. Popularity can be measured quickly according to this straw-poll mechanism, while quality cannot yet be well measured without human intervention.
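For the tech people, here is a minimal sketch of the straw poll described above, assuming the simplified "one link, one vote" picture. Real PageRank also weights each vote by the rank of the page casting it, which this sketch leaves out. The function name and the example sites are made up for illustration.

```python
from collections import Counter


def rank_by_inbound_links(links, query_matches):
    """Order the pages matching a query by their number of inbound links.

    links: iterable of (source, target) pairs, one per hyperlink.
    query_matches: pages that contain the search terms.
    """
    votes = Counter()
    # Count each distinct (source, target) pair once, and ignore self-links,
    # so a site cannot vote for itself or vote twice.
    for source, target in set(links):
        if source != target:
            votes[target] += 1
    # Most votes first; ties broken alphabetically to keep the order stable.
    return sorted(query_matches, key=lambda page: (-votes[page], page))


# Hypothetical link graph: three sites link to big.example, one to small.example.
links = [
    ("a.example", "big.example"),
    ("b.example", "big.example"),
    ("c.example", "big.example"),
    ("a.example", "small.example"),
    ("big.example", "big.example"),  # self-link, not counted
]
matches = ["small.example", "big.example", "unknown.example"]
ranking = rank_by_inbound_links(links, matches)
print(ranking)
```

Note that nothing in the sketch looks at what a page actually says. A page with no inbound links sinks to the bottom however good it is, which is the weakness the rest of this post is about.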

The problem with this - and really the point of this article - is that Google can fail as a research tool, because it is unable to discriminate websites into groups or classes. The "needle in the haystack" has no way of being discovered against the background radiation of a million other websites. The vast majority of users are happy, because the vast majority of users are looking for the most popular websites. (Clearly this must be definitionally true - although one could imagine a process similar to that of astroturfing, whereby false links are deposited around the web. This is a well-known "attack" on Google often employed by meme-gamers. To see an example, try the "I'm Feeling Lucky" button on Google with the search term "French Military Victories". Clusty didn't do so well on this search term, however.)

Clusty helps to separate the results into broad classes, allowing much faster access to information. Tech-users might like to think of this as a tree-size problem. Google presents a list of results, 10 at a time. Finding the 50th site takes you to depth 5. Let's face it, almost nobody is going to go as deep as 5 levels unless they _really_ want that information. Clusty is different. As well as the 10 best-matching items, it presents more ways of going deeper into the tree. Instead of choosing "the next 10", you are given an easy-to-use navigation bar which suggests logical data groupings by analysing the data itself. Google offers something similar by allowing "search within results". However, Clusty wins, because you don't have to have any domain knowledge to move through its options. Groups of web sites containing many common keywords become "clusters" through which you can search, allowing the data itself to describe its groupings. While Google is a syntax-only popularity contest, Clusty places more importance on the content of the documents returned, and the hope is that the clusters will be separated semantically, allowing people to more quickly access the right information.
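To make the tree idea concrete, here is a rough sketch of both halves of the comparison: how deep a result sits in a flat paged list, and how results might be grouped by the keywords they share. I don't know how Clusty actually builds its clusters, so the grouping rule below (any non-query keyword shared by at least two results becomes a cluster) is purely an assumption for illustration, as are the function names and the sample results.

```python
from collections import defaultdict


def page_depth(position, per_page=10):
    """Page on which the result at a 1-based position appears."""
    return (position + per_page - 1) // per_page


def cluster_by_keyword(results, query_terms, min_size=2):
    """Group result titles under each keyword they have in common.

    results: list of (title, keywords) pairs.
    query_terms: words from the search itself, skipped because every
        result contains them and they would not separate anything.
    min_size: smallest number of results that counts as a cluster.
    """
    clusters = defaultdict(list)
    for title, keywords in results:
        for keyword in sorted(set(keywords) - set(query_terms)):
            clusters[keyword].append(title)
    return {kw: titles for kw, titles in clusters.items() if len(titles) >= min_size}


# Hypothetical results for the search "melbourne philosophy".
results = [
    ("University department", {"melbourne", "philosophy", "university", "courses"}),
    ("Another university", {"melbourne", "philosophy", "university"}),
    ("A philosophy blog", {"melbourne", "philosophy", "blog"}),
    ("A teaching blog", {"melbourne", "philosophy", "blog", "courses"}),
]
clusters = cluster_by_keyword(results, {"melbourne", "philosophy"})

print(page_depth(50))
for keyword, titles in clusters.items():
    print(keyword, titles)
```

The clusters here come from the documents themselves, so you can pick "blog" or "university" from the list without knowing in advance that those were the right words to search for.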


a process of emanation said...

two words: Nigritude Ultramarine

10/05/2004 01:01:00 AM  
