Bootstrapping Methods for Knowledge Structuring
Assume that we have a mental model in the form of a semantic network, elicited, for example, from an individual user or a group of users. How can we make sure that the model is as simple, as complete and as coherent as possible? In other words, how can we optimize the structure of the knowledge expressed by the model? To support this knowledge structuring, we have developed an algorithm based on a bootstrapping principle. "Bootstrapping", in this case, means improving the quality of knowledge by relying only on the knowledge already available in the model itself, without the need to consult external sources. The bootstrapping techniques assume that the model is organized as a network of nodes, representing concepts and instances, connected by a variety of semantic links, representing for example if-then rules and IS_A hierarchies. The algorithm searches for pairs of nodes whose sets of incoming and outgoing links are similar.
Knowledge, expressed as a network of nodes and links, can be structured better by bootstrapping the distinctions between nodes, leading to the merging, differentiation or integration of ambiguously distinguished concepts.
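The similarity search on incoming and outgoing links can be sketched as follows. This is a minimal sketch, not the authors' algorithm: it assumes that links are stored as (source, link_type, target) triples and that overlap is measured with the Jaccard index, averaged over incoming and outgoing links. The text does not specify a similarity measure, and all function names here are hypothetical.

```python
from typing import Set, Tuple

# Assumed representation: a link is a (source, link_type, target) triple.
Link = Tuple[str, str, str]

def outgoing(links: Set[Link], node: str) -> Set[Tuple[str, str]]:
    """(link_type, target) pairs leaving `node`."""
    return {(t, dst) for (src, t, dst) in links if src == node}

def incoming(links: Set[Link], node: str) -> Set[Tuple[str, str]]:
    """(source, link_type) pairs entering `node`."""
    return {(src, t) for (src, t, dst) in links if dst == node}

def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def link_similarity(links: Set[Link], x: str, y: str) -> float:
    """Average overlap of the incoming and outgoing link sets of x and y."""
    return (jaccard(incoming(links, x), incoming(links, y))
            + jaccard(outgoing(links, x), outgoing(links, y))) / 2

# The "pet" / "domesticated animal" example, modelling both superclasses as IS_A links.
links = {
    ("pet", "IS_A", "animal"),
    ("pet", "IS_A", "lives with humans"),
    ("domesticated animal", "IS_A", "animal"),
    ("domesticated animal", "IS_A", "lives with humans"),
    ("Fido", "IS_A", "pet"),
    ("Fido", "IS_A", "domesticated animal"),
}
print(link_similarity(links, "pet", "domesticated animal"))  # 1.0

# A distinguishing instance lowers the score.
cow = {("Moo the cow", "IS_A", "domesticated animal")}
print(link_similarity(links | cow, "pet", "domesticated animal"))  # 0.75
```

A score of 1.0 means the two nodes cannot be told apart by their links, which is the ambiguity the algorithm flags. Adding "Moo the cow" as an instance of "domesticated animal" only lowers the score to 0.75, because the outgoing links still coincide; this is the kind of distinguishing information that differentiation asks the user to supply.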
For example, suppose that we have two concepts, "pet" and "domesticated animal". Suppose that the superclasses (outgoing IS_A links) of these concepts are the same: both are "animals" and both "live with humans". Suppose further that their subclasses or instances are also the same: "Fido" is a "pet" but is also a "domesticated animal". The algorithm interprets this situation as an ambiguity, which can be resolved in one of three ways:
- Identification (merging):
"Pet" and "domesticated animal" might be considered synonyms within the model, in which case the two nodes should be identified, leaving a single node "pet-domesticated animal".
- Differentiation:
The two concepts are actually different, but the model lacks the links to distinguish them. For example, "pet" should have the property "lives in the house", which "domesticated animal" lacks; or there are instances of "domesticated animal", such as "Moo the cow", which are not instances of "pet". In that case, the algorithm asks the user to provide the missing information.
- Integration (clustering):
Even if the two concepts differ in their details, they may have so many properties in common that it is worth integrating them into a new, higher-order concept. For example, both concepts are special cases of the broader category of "animals that are dependent on people", which also includes other categories, such as "rats" and "sparrows". In that case, the algorithm may cluster the concepts whose links overlap strongly, and suggest that cluster to the user as a new concept.
Each of these operations changes the pattern of nodes and links in the mental model, and thus triggers a new round of searching for similarities. For example, if two nodes A and B are identified, two nodes C and D that were previously distinguished by their links to A and B respectively will now both point to the same node A-B. Therefore, C and D may themselves need to be identified, differentiated or integrated. Thus, bootstrapping operations cascade through the network, triggering an ongoing process of restructuring until all ambiguities have been resolved.
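The cascade of restructuring described in the text can be sketched as a loop that keeps merging nodes until no two nodes remain indistinguishable. This is a minimal sketch under stated assumptions: links are stored as (source, link_type, target) triples, and only the identification case is automated, for nodes whose incoming and outgoing link sets coincide exactly; in the text, the choice among identification, differentiation and integration is left to the user. The helper names (`signature`, `identify`, `cascade`), the "A-B" naming of merged nodes, and the extra nodes X and Z are hypothetical.

```python
from itertools import combinations
from typing import Optional, Set, Tuple

# Assumed representation: a link is a (source, link_type, target) triple.
Link = Tuple[str, str, str]

def signature(links: Set[Link], node: str) -> Tuple[frozenset, frozenset]:
    """Incoming and outgoing link sets of `node`, used to compare nodes."""
    inc = frozenset((s, t) for (s, t, d) in links if d == node)
    out = frozenset((t, d) for (s, t, d) in links if s == node)
    return inc, out

def identify(links: Set[Link], a: str, b: str) -> Set[Link]:
    """Merge nodes a and b into a single node named 'a-b'."""
    merged = f"{a}-{b}"
    def rename(n: str) -> str:
        return merged if n in (a, b) else n
    # Duplicate links collapse automatically because the result is a set.
    return {(rename(s), t, rename(d)) for (s, t, d) in links}

def find_indistinguishable(links: Set[Link]) -> Optional[Tuple[str, str]]:
    """Return the first pair of distinct nodes with identical link sets, if any."""
    nodes = sorted({s for s, _, _ in links} | {d for _, _, d in links})
    for x, y in combinations(nodes, 2):
        if signature(links, x) == signature(links, y):
            return x, y
    return None

def cascade(links: Set[Link]) -> Set[Link]:
    """Keep identifying indistinguishable pairs until none remain."""
    while (pair := find_indistinguishable(links)) is not None:
        links = identify(links, *pair)
    return links

# C and D are distinguished only by pointing to A and B respectively.
links = {
    ("A", "IS_A", "X"), ("B", "IS_A", "X"),
    ("C", "IS_A", "A"), ("D", "IS_A", "B"),
    ("C", "IS_A", "Z"), ("D", "IS_A", "Z"),
}
print(cascade(links) == links)  # True: nothing is indistinguishable yet

# Once the user identifies A and B, C and D collapse as well.
restructured = cascade(identify(links, "A", "B"))
print(sorted(restructured))
# [('A-B', 'IS_A', 'X'), ('C-D', 'IS_A', 'A-B'), ('C-D', 'IS_A', 'Z')]
```

Each merge removes one node, so the loop always terminates. A fuller version would replace the exact-equality test with a similarity threshold and ask the user which of the three operations to apply at each step.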
Bootstrapping can be applied not only to mental models derived from individual knowledge, but also to the collective mental models implicit in the structure of the web. For example, two web documents may have a high overlap in their incoming and outgoing links, or in their semantic associations. This may mean that: 1) the two documents are merely copies of the same text stored at different addresses, and should be treated by a web agent as identical; 2) the pattern of links does not sufficiently reflect the intrinsic differences between the documents, so it is worth creating additional links that differentiate them; 3) the documents belong to a class of similar documents, and it is worth generating a new "overview" or "index" document that groups links to all such documents in a single place.
Copyright © 1999 Principia Cybernetica
Mar 29, 1999 (modified)
Aug 2, 1994 (created)