Bloom filters

From Wikipedia:
The Bloom filter, conceived by Burton H. Bloom, is a space-efficient probabilistic data structure that is used to test whether or not an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives.
A good list of papers about applications at the end of the article, including P2P networks. Certainly could be applied to automatic aggregation of Topic Maps too. To be compared with methods of Subject Identity Measure already mentioned.

1 comment:

  1. Anonymous22.6.05

    I'm not sure Bloom filters are that relevant to topic map classification, really. A Bloom filter is a string lookup that's just a little bit too wide (it finds too many matches). The only real benefit to it, compared with an exact match, is that it saves space. It's the sort of thing you'd use to implement the T9 input method on a mobile phone.

    For automated classification things like Bayesian inferencing and Latent Semantic Indexing are probably more appropriate.


Comments welcome