Web Connectivity Analysis
Principia Cybernetica Web


Different algorithms exist to extract information from the pattern of links (connectivity) between web pages.


The links connecting documents on the web are in principle all equivalent: the web itself expresses no preference for one link or one document over another. Yet the connectivity, or pattern of linkages between pages, contains a great deal of implicit information about the relative importance of documents. The author of a web document will normally only include links to other documents that are relevant to the general subject of the page and of sufficient quality. Thus, locating one document relevant to your goals may be enough to guide you to further information on that issue. High-quality documents, which contain clear, accurate and useful information, are likely to have many links pointing to them, while low-quality documents will attract few or none. Although no explicit preference function is attached to any individual link, a preference is therefore implicit in the total number of links pointing to a document. This preference is produced collectively, by the group of all web authors.
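As a minimal illustration of this implicit, collectively produced preference, incoming links can simply be counted; the page names and link structure below are hypothetical:

```python
from collections import Counter

# Hypothetical link structure: each page maps to the pages it links to.
links = {
    "intro.html": ["survey.html", "tutorial.html"],
    "blog.html":  ["survey.html"],
    "notes.html": ["survey.html", "tutorial.html"],
}

# The number of incoming links is an implicit preference: heavily
# linked-to pages are presumed to be of higher quality.
in_links = Counter(target for targets in links.values() for target in targets)
print(in_links.most_common())  # "survey.html" has the most incoming links
```

In this sketch "survey.html" accumulates three incoming links, so it would be ranked as the most preferred document even though no author stated an explicit preference.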

Different mathematical techniques exist to extract this information. Recently, two algorithms have been developed for this purpose: PageRank (Brin & Page 1998) and HITS (Kleinberg 1998). Both use a bootstrapping approach: they determine the quality or "authority" of a web page on the basis of the number and quality of the pages that link to it. Since the definition is recursive (a page has high quality if many high-quality pages point to it), the algorithm needs several iterations to determine the overall quality of a page. Mathematically, this is equivalent to computing the eigenvectors of the matrix that represents the linking pattern in the selected part of the web. PageRank uses the linking matrix directly; HITS uses the product of the matrix and its transpose. The latter method distinguishes two types of pages: authorities, which are pointed to by many good "hubs" (indexes or lists of web pages), and hubs, which point to many good authorities. In combination with a keyword search, which restricts the pages whose quality is computed to a specific problem "neighborhood", these methods seem to return answers of much higher quality for a given query.
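The iterative, recursive computation both algorithms rely on can be sketched as a power iteration on a small linking matrix. The four-page adjacency matrix and the damping factor below are illustrative assumptions, not details taken from this page:

```python
import numpy as np

# Hypothetical adjacency matrix for a tiny 4-page web:
# A[i, j] = 1 means page i links to page j.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def pagerank(A, d=0.85, iters=100):
    """Iterate PageRank: a page's score is the damped sum of the scores
    of the pages linking to it, each divided by that page's out-degree."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)   # out-degree of each page
    M = (A / out).T                      # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)              # start from a uniform distribution
    for _ in range(iters):
        r = (1 - d) / n + d * M @ r      # repeated application converges to
    return r                             # the dominant eigenvector of M

def hits(A, iters=100):
    """Iterate HITS: authorities are the dominant eigenvector of A^T A,
    hubs of A A^T, found by alternating the two update steps."""
    h = np.ones(A.shape[0])
    for _ in range(iters):
        a = A.T @ h                      # authority = sum of incoming hub scores
        a /= np.linalg.norm(a)
        h = A @ a                        # hub = sum of outgoing authority scores
        h /= np.linalg.norm(h)
    return h, a

print(pagerank(A))  # page 2, with three in-links, scores highest
print(hits(A))
```

In this toy graph the most linked-to page comes out on top under both measures, showing how the recursive definition is resolved by iterating until the scores stabilise.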

The disadvantage of these methods is that they are static: they merely use the (rather sparse) linking pattern that already exists; they do not allow the web to adapt to the way it is used, as the learning web algorithms propose. However, the two approaches can complement each other, since connectivity matrices are not required to contain only binary values (link present or absent). The learning web and other techniques will produce denser matrices with graded numerical values that can be analysed in the same way, and these are likely to yield more fine-grained and reliable results.
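A brief sketch of this last point, assuming a hypothetical weight matrix learned from usage: the same eigenvector computation (power iteration) applies unchanged when link strengths are graded rather than binary:

```python
import numpy as np

# Hypothetical weighted connectivity matrix: W[i, j] is the strength of
# the link from page i to page j, e.g. learned from usage frequencies.
W = np.array([
    [0.0, 0.7, 0.3],
    [0.2, 0.0, 0.8],
    [0.9, 0.1, 0.0],
])

# The same eigenvector analysis applies: repeatedly propagate scores
# along the normalised weighted links until they converge.
M = (W / W.sum(axis=1, keepdims=True)).T   # column-stochastic matrix
r = np.full(3, 1 / 3)
for _ in range(200):
    r = M @ r
print(r)  # stationary scores reflecting graded, non-binary link strengths
```

The resulting scores are a fixed point of the propagation (r = M r), exactly as in the binary case, but they now exploit the finer-grained information in the weights.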

Copyright © 2000 Principia Cybernetica

Author: F. Heylighen

Date: May 31, 2000 (modified); Mar 24, 1999 (created)
