@djour
Created April 18, 2012 16:11
Define a tf-idf subclass of Xapian::Weight for the weighting scheme
diff --git a/xapian-core/include/xapian/weight.h b/xapian-core/include/xapian/weight.h
index 09cbe46..7592af1 100644
--- a/xapian-core/include/xapian/weight.h
+++ b/xapian-core/include/xapian/weight.h
@@ -506,6 +506,51 @@ class XAPIAN_VISIBILITY_DEFAULT TradWeight : public Weight {
double get_maxextra() const;
};
+/** Xapian::Weight subclass implementing the basic tf-idf scheme.
+ *
+ * This class implements the basic tf-idf weighting scheme as
+ * described in SMART; the corresponding parameter string in
+ * SMART is "nnn". That means:
+ * new-tf = tf.
+ * new-wt = new-tf.
+ * norm-weight = new-wt.
+ * This basic implementation takes no parameters. In the future,
+ * parameters could select different re-computations of tf and df,
+ * and different normalizations of the entire subvector.
+ */
+class XAPIAN_VISIBILITY_DEFAULT Tf_idfWeight : public Weight {
+ /// The inverse document frequency (idf) of the term.
+ mutable double idf;
+
+ Tf_idfWeight * clone() const;
+
+ void init(double factor);
+
+ public:
+ /** Construct a tf-idf weight.
+ * Parameters selecting different re-computations may be added
+ * in the future.
+ */
+ Tf_idfWeight() {
+ need_stat(COLLECTION_SIZE);
+ need_stat(TERMFREQ);
+ need_stat(WDF);
+ need_stat(WDF_MAX);
+ }
+
+ std::string name() const;
+
+ std::string serialise() const;
+ Tf_idfWeight * unserialise(const std::string & s) const;
+
+ double get_sumpart(Xapian::termcount wdf,
+ Xapian::termcount doclen) const;
+ double get_maxpart() const;
+
+ double get_sumextra(Xapian::termcount doclen) const;
+ double get_maxextra() const;
+};
+
}
#endif // XAPIAN_INCLUDED_WEIGHT_H
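
The doc comment in the patch describes the SMART "nnn" scheme but the diff only declares the interface. As a rough illustration of what that description implies, here is a minimal, self-contained sketch that does not depend on Xapian. The names `TermStats`, `nnn_sumpart`, and `nnn_maxpart` are hypothetical and are not part of the patch. The `factor` parameter is assumed to play the role of the scaling factor passed to `init()`. The sketch also assumes that the per-term upper bound is obtained by substituting the maximum wdf for the wdf.

```cpp
// Hypothetical stand-in for the per-term statistics the patch requests
// through need_stat(WDF) and need_stat(WDF_MAX). Not a Xapian type.
struct TermStats {
    unsigned wdf;      // within-document frequency of the term
    unsigned wdf_max;  // upper bound on wdf over the documents considered
};

// SMART "nnn": natural tf, no idf component, no normalization.
// Mirrors the three steps listed in the patch's doc comment.
inline double nnn_sumpart(const TermStats& s, double factor = 1.0) {
    double new_tf = static_cast<double>(s.wdf);  // new-tf = tf
    double new_wt = new_tf;                      // new-wt = new-tf
    double norm_weight = new_wt;                 // norm-weight = new-wt
    return norm_weight * factor;
}

// Assumed upper bound on nnn_sumpart: the same formula evaluated at wdf_max.
inline double nnn_maxpart(const TermStats& s, double factor = 1.0) {
    return static_cast<double>(s.wdf_max) * factor;
}
```

Under these assumptions, a term with `wdf = 3` and `wdf_max = 7` contributes 3.0 to the document's score and can contribute at most 7.0 in any document, so the bound never falls below the actual contribution.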