Collaborative Filtering
Principia Cybernetica Web

Collaborative filtering systems can produce personal recommendations by computing the similarity between your preferences and those of other people.

Recently a number of methods have been developed for the "collaborative filtering" or "social filtering" of information (Resnick et al. 1994; Shardanand & Maes 1995; Breese et al. 1998). The main idea is to automate the process of "word-of-mouth" by which people recommend products or services to one another. If you need to choose between a variety of options with which you have no experience, you will often rely on the opinions of others who do have such experience. However, when there are thousands or millions of options, as on the Web, it becomes practically impossible for an individual to locate reliable experts who can give advice about each of the options. By shifting from an individual to a collective method of recommendation, the problem becomes more manageable.

Instead of asking each individual for an opinion, you might try to determine an "average opinion" for the group. This, however, ignores your particular interests, which may differ from those of the "average person". You would rather hear the opinions of those people whose interests are similar to your own; that is, you would prefer a "division-of-labor" type of organization, in which people contribute only to the domain they specialize in.

The basic mechanism behind collaborative filtering systems is the following:

  • the preferences of a large group of people are registered;
  • using a similarity metric, a subgroup of people is selected whose preferences are similar to the preferences of the person who seeks advice;
  • a (possibly weighted) average of the preferences for that subgroup is calculated;
  • the resulting preference function is used to recommend options on which the advice-seeker has expressed no personal opinion as yet.

Typical similarity metrics are Pearson correlation coefficients between the users' preference functions and (less frequently) vector distances or dot products.
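
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular system: the `ratings` data, the user names, the 1 to 10 scale, the `min_similarity` threshold, and the function names `pearson` and `recommend` are all hypothetical.

```python
from math import sqrt

# Hypothetical registered preferences: user -> {option: score on a 1-10 scale}
ratings = {
    "ann":  {"A": 9, "B": 7, "C": 2, "D": 8},
    "bob":  {"A": 8, "B": 6, "C": 1, "E": 9},
    "cara": {"A": 2, "B": 3, "C": 9, "E": 1},
    "dave": {"A": 9, "B": 8, "C": 3},
}

def pearson(u, v):
    """Pearson correlation over the options that both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[k] for k in common) / len(common)
    mean_v = sum(v[k] for k in common) / len(common)
    cov = sum((u[k] - mean_u) * (v[k] - mean_v) for k in common)
    dev_u = sqrt(sum((u[k] - mean_u) ** 2 for k in common))
    dev_v = sqrt(sum((v[k] - mean_v) ** 2 for k in common))
    if dev_u == 0 or dev_v == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / (dev_u * dev_v)

def recommend(target, ratings, min_similarity=0.5):
    """Rank the options the target user has not rated yet."""
    mine = ratings[target]
    # Select the subgroup of users whose preferences resemble the target's.
    neighbors = {}
    for name, prefs in ratings.items():
        if name == target:
            continue
        similarity = pearson(mine, prefs)
        if similarity >= min_similarity:
            neighbors[name] = similarity
    # Similarity-weighted average of the subgroup's scores per unseen option.
    totals, weights = {}, {}
    for name, similarity in neighbors.items():
        for option, score in ratings[name].items():
            if option in mine:
                continue
            totals[option] = totals.get(option, 0.0) + similarity * score
            weights[option] = weights.get(option, 0.0) + similarity
    predictions = {o: totals[o] / weights[o] for o in totals}
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("dave", ratings))
```

With this toy data, "ann" and "bob" correlate strongly with "dave" while "cara" correlates negatively and is excluded, so option E (rated only by "bob") is ranked ahead of option D (rated only by "ann").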

If the similarity metric has indeed selected people with similar tastes, chances are good that the options rated highly by that group will also be appreciated by the advice-seeker. The typical application is the recommendation of books, music CDs, or movies. More generally, the method can be used for the selection of documents, services or products of any kind.

The main bottleneck in existing collaborative filtering systems is the collection of preferences (cf. Shardanand & Maes 1995). To be reliable, the system needs a very large number of people (typically thousands) to express their preferences about a relatively large number of options (typically dozens). This requires quite a lot of effort from a lot of people. Since the system only becomes useful after a "critical mass" of opinions has been collected, people will not be very motivated to express detailed preferences in the beginning stages (e.g. by scoring dozens of music records on a 10-point scale), when the system cannot yet help them.

One way to avoid this start-up problem is to collect preferences that are implicit in people's actions (Nichols 1998). For example, people who order books from an Internet bookshop implicitly express their preference for the books they buy over the books they do not buy. Customers who have bought the same book are likely to have similar preferences for other books as well. This principle is applied by the Amazon web bookshop, which for each book offers a list of related books that were bought by the same people.
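
As a rough illustration of purchase-based implicit preferences, the sketch below counts how often two titles appear in the same customer's order history and lists the most frequent companions of a given title. It is not a description of how any real bookshop computes its lists; the `orders` data and the function names `co_purchase_counts` and `also_bought` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: customer -> set of purchased titles
orders = {
    "c1": {"Dune", "Neuromancer", "Foundation"},
    "c2": {"Dune", "Foundation"},
    "c3": {"Dune", "Emma"},
    "c4": {"Emma", "Persuasion"},
}

def co_purchase_counts(orders):
    """For each title, count how many customers also bought each other title."""
    counts = {}
    for basket in orders.values():
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_bought(title, orders, top_n=3):
    """Titles most often bought by the same customers as `title`."""
    counts = co_purchase_counts(orders).get(title, Counter())
    return [other for other, _ in counts.most_common(top_n)]

print(also_bought("Dune", orders))
```

No explicit rating is ever requested here: the act of buying is the only preference signal, which is what lets such a system start producing suggestions without asking users to score anything.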

There are even more straightforward ways to collect implicit preferences on the web. One method is to register all the documents on a website that have been consulted by a given user (cf. Breese et al. 1998). The list of all available documents, with preference 1 for those that have been consulted and preference 0 for the others, then determines a preference function for that user. Using a similarity metric on these preference vectors makes it possible to determine neighborhoods of users with similar interests.
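
A minimal sketch of this idea follows, assuming a small hypothetical site and visit log. The 0/1 preference vectors are compared with cosine similarity (a normalized dot product), which is one possible choice of metric rather than one prescribed by the text; the names `documents`, `visits`, `preference_vector`, `cosine`, and `neighborhood` are illustrative.

```python
from math import sqrt

# Hypothetical site contents and visit log: user -> set of consulted documents
documents = ["intro", "faq", "api", "tutorial", "changelog"]
visits = {
    "u1": {"intro", "tutorial"},
    "u2": {"intro", "tutorial", "faq"},
    "u3": {"api", "changelog"},
}

def preference_vector(consulted, documents):
    """1 for each document the user has consulted, 0 for all others."""
    return [1 if doc in consulted else 0 for doc in documents]

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no visits is similar to nobody
    return dot / (norm_a * norm_b)

def neighborhood(user, visits, documents, k=1):
    """The k users whose preference vectors are closest to the given user's."""
    target = preference_vector(visits[user], documents)
    scored = [
        (cosine(target, preference_vector(consulted, documents)), other)
        for other, consulted in visits.items()
        if other != user
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

print(preference_vector(visits["u1"], documents))
print(neighborhood("u1", visits, documents))
```

In this toy log, "u1" and "u2" share two consulted documents and end up in the same neighborhood, while "u3" has no overlap with "u1" and gets a similarity of zero.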

Copyright © 2001 Principia Cybernetica

F. Heylighen,

Jan 31, 2001 (modified)
Mar 24, 1999 (created)

