@chancyk
Created May 7, 2014 23:59
Profile of zope.index.TextIndex.apply
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    66                                               @profile
    67                                               def apply(self, querytext, start=0, count=None):
    68      8734        34597      4.0      0.3          parser = QueryParser(self.lexicon)
    69      8734      1622182    185.7     14.1          tree = parser.parseQuery(querytext)
    70      8645      2041557    236.2     17.7          results = tree.executeQuery(self.index)
    71      8645        12093      1.4      0.1          if results:
    72      8645       675324     78.1      5.9              qw = self.index.query_weight(tree.terms())
    73
    74                                                       # Hack to avoid ZeroDivisionError
    75      8645        12596      1.5      0.1              if qw == 0:
    76                                                           qw = 1.0
    77
    78      8645         9764      1.1      0.1              qw *= 1.0
    79
    80   2623759      2421427      0.9     21.0              for docid, score in six.iteritems(results):
    81   2615114      2049659      0.8     17.8                  try:
    82   2615114      2636516      1.0     22.9                      results[docid] = score/qw
    83                                                           except TypeError:
    84                                                               # We overflowed the score, perhaps wildly unlikely.
    85                                                               # Who knows.
    86                                                               results[docid] = 2**64 // 10
    87
    88      8645         7268      0.8      0.1          return results
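
Reading the numbers above: lines 80 to 82 (the per-document normalization loop) account for 21.0 + 17.8 + 22.9 = 61.7% of the recorded time, and the loop body runs 2615114 / 8645, or roughly 302, times per call. The sketch below only restates what that loop does, so the hot path is easier to reason about in isolation. It assumes `results` behaves like a plain dict mapping docid to score; `normalize_scores` is a hypothetical name, not part of zope.index, and the sketch is not a measured or proposed replacement.

```python
def normalize_scores(results, query_weight):
    """Divide each score by the query weight (mirrors lines 71-88 of the profile).

    Hypothetical helper for illustration only; assumes `results` is a plain
    dict of docid -> score rather than whatever mapping zope.index returns.
    """
    if not results:
        return results

    # Same guard as the profiled code: avoid ZeroDivisionError.
    qw = float(query_weight)
    if qw == 0:
        qw = 1.0

    # Build a new dict instead of assigning into `results` during iteration.
    return {docid: score / qw for docid, score in results.items()}


if __name__ == "__main__":
    print(normalize_scores({1: 2.0, 2: 4.0}, 2.0))
```

Whether a rewrite along these lines is faster than the original try/except loop would need its own profile run; the figures above say only where the time currently goes.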
chancyk commented May 8, 2014

TF-IDF:0.8 canopy creation with 5000 records.

tfidf blocking...
tfidf blocking... name
Index created: 0.27
INFO:dedupe.tfidf:Canopy: TF-IDF:0.8name
Canopy Keys: 9.09
tfidf blocking... address
Index created: 0.22
INFO:dedupe.tfidf:Canopy: TF-IDF:0.8address
Canopy Keys: 9.79