mndrix/gist:4642708

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
    Indexing is done by things which implement the Clauses interface.  There should be one implementation that handles large clause lists of arbitrary complexity using "The Path-Indexing Method for Indexing Terms" by Stickel (see my Google Drive for a copy).  This is the fallback implementation since it can handle everything we throw at it.
There's another implementation for small clause lists (up to 100 or so).  It calculates a special hash (see below) for each clause.  To answer a query, it calculates a similar hash (see below).  It runs through the list of all prepared hashes finding those for which query & candidate == query (using bitwise AND).  Because the list is short and bitwise arithmetic is fast, this should be a fast operation.  The hashes give us indexing over all arguments to a reasonable depth.  This technique should work well for 80-90% of real world clause definitions.
Term Hashing
The fundamental operation is func Hash(terms []Term, bits int) int64 which hashes a list of terms into a bitstring of bits bits long.  Hashing the head of a clause, we pass all the arguments of the term.  The functor occupies the first bits/4 bits.  The first three arguments occupy the rest (bits/4 bits each).  If any of these values is a term itself, recursively call Hash with the functor and arguments in terms and the allowed bitspace for bits.  Any atom fills the bitspace with a normal hash of itself.  During this phase, a variable populates its bitspace with all 1s.  During query hashing, a variable fills its bitspace with all 0s.  This algorithm converts an arbitrary term into a bitstring that encodes both its content and structure.  As with any hash technique, there will be collisions, but they will be resolved during final unification.  All we need to do is quickly trim the list of possible candidates to something practically small.