@idibidiart
Last active October 7, 2019 16:19
Combining Neural Machine Learning and Logical Analysis for Classification Tasks

Theory of Operation

Training a neural embeddings model on domain-specific data is a way to build an Always Evolving Pattern Dictionary for that domain, one that associates the different patterns found in that data via their cosine similarity in vector space. The patterns can be any sequence of symbols, including whole words, letters, digits, etc.
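As a minimal sketch of what such a pattern dictionary does, the snippet below ranks patterns by cosine similarity over a few hypothetical, hand-written vectors; in a real system the vectors would come from training an embeddings model (e.g. Word2Vec) on the domain corpus, not be hard-coded.

```python
from math import sqrt

# Hypothetical 3-d embedding vectors for illustration only; a real pattern
# dictionary would be learned from a domain-specific corpus.
EMBEDDINGS = {
    "CAC":     [0.9, 0.1, 0.0],
    "CACC":    [0.1, 0.8, 0.2],
    "billing": [0.7, 0.3, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(pattern, k=2):
    """Rank every other pattern in the dictionary by cosine similarity."""
    scores = [(other, cosine(EMBEDDINGS[pattern], vec))
              for other, vec in EMBEDDINGS.items() if other != pattern]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

With these toy vectors, `most_similar("CAC")` ranks "billing" above "CACC", showing how the dictionary surfaces neighbors by vector geometry rather than by spelling.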

Some neural embeddings models, such as Word2Vec, work only at the whole-word level (not the letter level) and are context independent: when the trained model is fed a test pattern, each pattern is represented by exactly one vector, regardless of the context around it in the test pattern. Others, such as BERT, ELMo, and Flair, work at the sub-word level and are context dependent: a pattern may have one or more vectors, and the matched vector depends on the context in the test pattern. Simple embeddings that work at the word level and are context independent do well when patterns such as product names and acronyms have a very specific meaning regardless of the context they appear in. For example, "CAC" and "CACC" refer to two completely different products, so they should not be confused as being potentially similar. The same goes for the meaning of "some app": its context in the test pattern shouldn't affect its meaning. However, this assumes there is some axiomatic knowledge we can reference, for how else would we infer with certainty that "CACC" is factually not a misspelling of "CAC", or decide the correct set of labels for "some app" when our embeddings model suggests that "some app" is similar to "another app"?
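The role of that axiomatic knowledge can be sketched as an exact-match check that runs before any similarity lookup. The set of known terms and the fallback function below are hypothetical stand-ins for the domain's real axioms and embeddings model.

```python
# Hypothetical axiomatic knowledge: terms known to have a definite,
# non-negotiable meaning in the domain.
KNOWN_TERMS = {"CAC", "CACC"}

def resolve(pattern, similar_fn):
    """Return the pattern itself when it is a known term (axiomatic match).

    Only patterns *outside* the axiomatic vocabulary fall back to the
    nearest-neighbor lookup in vector space (similar_fn). This is what
    keeps "CACC" from ever being "corrected" to "CAC" by the embeddings.
    """
    if pattern in KNOWN_TERMS:
        return pattern
    return similar_fn(pattern)
```

For instance, `resolve("CACC", some_lookup)` returns "CACC" without consulting the embeddings at all, while an unknown pattern like a misspelled word is handed to the similarity lookup.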

We find that a simple embeddings model like Word2Vec, where each word/pattern is given one meaning regardless of context in the test pattern and the model works at the whole-word level, is the right choice when the concepts in the domain have a definite meaning, and when we have either a labeled dataset that covers all the topics in the domain or an ontology/taxonomy of topics for a closed domain of knowledge. But when the patterns contain noise, e.g. misspellings, or take on different meanings in different contexts, we should use a context-dependent embeddings model like BERT, ELMo, or Flair. The latter is usually used with open domains of knowledge, such as all knowledge on DBpedia, Wikipedia, etc., and requires an open-ended reasoner, which is an open research topic.

In this PoC, we use an ontology (or a taxonomy of topics for the domain) to filter out all the potential topics, irrelevant to the domain, that the Word2Vec embeddings model would otherwise happily produce (embeddings have no knowledge of the domain, only of the language used in it). We also use the ontology/taxonomy to discover logical relations between the found topics, not just semantic similarity: some things are used in similar contexts but are not logically related, and conversely, some are logically related but not used in similar contexts.
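The filtering and relation-discovery steps above can be sketched as follows. The tiny topic-to-parent taxonomy here is hypothetical; in the PoC this knowledge would come from the domain ontology rather than be hard-coded.

```python
# Hypothetical taxonomy fragment: topic -> parent topic.
TAXONOMY = {
    "billing": "customer-account",
    "invoicing": "customer-account",
    "customer-account": "domain-root",
}

def in_domain(topic):
    """A candidate topic is relevant only if the taxonomy knows it."""
    return topic in TAXONOMY

def filter_candidates(candidates):
    """Drop embedding-suggested (topic, score) pairs outside the domain."""
    return [(t, score) for t, score in candidates if in_domain(t)]

def logically_related(a, b):
    """Sibling test: two topics are related if they share a parent.

    This is a deliberately simple stand-in for the richer relations a
    real ontology (is-a, part-of, etc.) would support.
    """
    return a in TAXONOMY and b in TAXONOMY and TAXONOMY[a] == TAXONOMY[b]
```

Note how the two checks are independent: "billing" and "weather" may sit close together in vector space (similar contexts) yet "weather" is filtered out, while "billing" and "invoicing" are logically related whether or not the embeddings consider them similar.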

In other words, we combine simple neural learning with axiomatic knowledge (which enables logical analysis) to produce a system that not only processes input reactively but also analyzes it in the context of the domain knowledge it holds. This combines both the cognitive and logical dimensions of AI and maximizes the opportunity for automation by reducing or eliminating noise in the output of the classifier.
