@drin
Created October 14, 2022 22:35
Trying to test that HashMultiColumn produces expected hash values for int32_t input values

A simplified version of HashIntImp for testing:

// hash_int based on key_hash.cc:HashIntImp (672431b)
template <typename T>
uint64_t hash_int(T val) {
  constexpr uint64_t int_const = 11400714785074694791ULL;
  uint64_t cast_val            = static_cast<uint64_t>(val);

  return static_cast<uint64_t>(BYTESWAP(cast_val * int_const));
}
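
The expected values can also be reproduced without linking against Arrow. The sketch below is a minimal standalone variant, assuming that BYTESWAP reverses the bytes of a 64-bit value (as `__builtin_bswap64` does). The names `bswap64`, `hash_int_portable`, and `check_known_hashes` are hypothetical and are not part of key_hash.cc.

```cpp
#include <cassert>
#include <cstdint>

// Portable 64-bit byte reversal. Assumption: this matches what the
// BYTESWAP macro used in hash_int does for 64-bit inputs.
constexpr uint64_t bswap64(uint64_t x) {
  return ((x & 0x00000000000000FFULL) << 56) |
         ((x & 0x000000000000FF00ULL) << 40) |
         ((x & 0x0000000000FF0000ULL) << 24) |
         ((x & 0x00000000FF000000ULL) << 8)  |
         ((x & 0x000000FF00000000ULL) >> 8)  |
         ((x & 0x0000FF0000000000ULL) >> 24) |
         ((x & 0x00FF000000000000ULL) >> 40) |
         ((x & 0xFF00000000000000ULL) >> 56);
}

// Same arithmetic as hash_int: widen to uint64_t (sign-extending negative
// inputs), multiply modulo 2^64, then reverse the byte order.
template <typename T>
constexpr uint64_t hash_int_portable(T val) {
  constexpr uint64_t int_const = 11400714785074694791ULL;
  return bswap64(static_cast<uint64_t>(val) * int_const);
}

// Spot-check against three of the expected values listed in this gist.
inline void check_known_hashes() {
  assert(hash_int_portable<int32_t>(0) == 0ULL);
  assert(hash_int_portable<int32_t>(3) == 10763536662319179482ULL);
  assert(hash_int_portable<int32_t>(-1) == 8733909567890966625ULL);
}
```

Because the functions are constexpr, the same checks could be written as static_assert to catch a mismatch at compile time.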

Calculate hash of test values individually:

std::vector<int32_t> test_values {3, -1, 2, 0, 127, 64};
UInt64Builder result_builder;
ASSERT_OK(result_builder.Reserve(test_values.size()));

for (size_t val_ndx = 0; val_ndx < test_values.size(); ++val_ndx) {
  uint64_t expected_hash = hash_int<int32_t>(test_values[val_ndx]);
  ASSERT_OK(result_builder.Append(expected_hash));
}

ASSERT_OK_AND_ASSIGN(auto expected_results, result_builder.Finish());
ARROW_LOG(INFO) << "expected results: " << expected_results->ToString();

Expected results (identical on every run):

expected results: [
  10763536662319179482,
  8733909567890966625,
  1050982531982388796,
  0,
  17976392168077493629,
  13880642133967036045
]

Use hash_64 to calculate hash values for the same int32_t values:

auto test_array = ArrayFromJSON(int32(), "[3, -1, 2, 0, 127, 64]");
ASSERT_OK_AND_ASSIGN(Datum hash_result, CallFunction("hash_64", {test_array}));

ArraySpan result_span { *(hash_result.array()) };
ARROW_LOG(INFO) << "actual results: " << result_span.ToArray()->ToString();

Actual results (different every time this is run):

actual results: [
  3710583505152,
  0,
  0,
  0,
  0,
  0
]