@Aupajo
Created October 13, 2019 07:35
require 'digest/md5'
# This demonstrates an approach you can use to deterministically generate fake
# data based on user data to anonymize it.
# An array of substitute names read from a file (one name per line)
substitute_names = File.read('names.txt').split("\n")
# Real customer data, e.g., from a database
real_name = "John Realname"
# Hash the real name
name_hash = Digest::MD5.hexdigest(real_name) # => "ee4743cf5e5e2e90e27bf15f44e4afa9"
# Convert the hexadecimal hash to an integer
name_hash_as_integer = name_hash.to_i(16) # => 316726291424644485479127505202124730281
# Take the integer modulo the number of substitute names to get a valid array index
new_name_index = name_hash_as_integer % substitute_names.count
# New name to use instead
replacement_name = substitute_names[new_name_index] # => "Ava Johnson"
puts replacement_name
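The steps above can be wrapped in a small reusable method. This is only a sketch: the method name `pseudonym_for` and the inline `sample_names` list are hypothetical and stand in for the contents of `names.txt`. It shows that the same real name always maps to the same substitute. Because the list of substitutes is finite, different real names can map to the same substitute.

```ruby
require 'digest/md5'

# Hypothetical helper (not part of the original script): maps a real name to a
# substitute name by hashing it and reducing the hash modulo the list size.
def pseudonym_for(real_name, substitute_names)
  raise ArgumentError, 'substitute_names must not be empty' if substitute_names.empty?

  index = Digest::MD5.hexdigest(real_name).to_i(16) % substitute_names.size
  substitute_names[index]
end

# Illustrative stand-in for the names read from names.txt
sample_names = ['Ava Johnson', 'Liam Patel', 'Noah Kim', 'Mia Garcia']

first  = pseudonym_for('John Realname', sample_names)
second = pseudonym_for('John Realname', sample_names)

puts first == second              # prints "true": the mapping is deterministic
puts sample_names.include?(first) # prints "true": the result comes from the list
```

Note that the mapping depends on the size and order of the substitute list, so changing `names.txt` changes which substitute a given real name receives.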