@stellard
Created April 11, 2012 16:57
pipeline lookup transform
module ETL #:nodoc:
  module Transform #:nodoc:
    # Transform which looks up the value and replaces it with another value
    # previously seen in the pipeline.
    class PipelineLookupTransform < ETL::Transform::Transform
      class DuplicateKeyError < ETLError #:nodoc:
      end

      # The resolver to use if the foreign key is not found in the collection
      # (declared here but not used by this transform).
      attr_accessor :resolver

      # The default value to use if none is found.
      attr_accessor :default

      # Initialize the pipeline lookup transform.
      #
      # Configuration options:
      # * <tt>:collection</tt>: A Hash mapping lookup-key values to value-key values. If this is
      #   not specified then an empty Hash will be used. This Hash caches the pairs seen in
      #   earlier rows for future use.
      # * <tt>:value_key</tt>: The row attribute whose value is stored in the collection
      # * <tt>:lookup_key</tt>: The row attribute whose value is used as the key in the collection
      # * <tt>:default</tt>: A default value to return if the lookup finds nothing
      # * <tt>:ignore_duplicates</tt>: default => false. When false, raise an exception if a key
      #   is seen more than once; when true, just use the last value seen.
      def initialize(control, name, configuration={})
        super
        @collection = (configuration[:collection] || {})
        @value_key = configuration[:value_key]
        @lookup_key = configuration[:lookup_key]
        @default = configuration[:default]
        raise ConfigurationError, ":value_key must be set" unless @value_key
        raise ConfigurationError, ":lookup_key must be set" unless @lookup_key
        # Honour the configured flag; fall back to false when it is not given.
        @ignore_duplicates = configuration[:ignore_duplicates] || false
      end

      # Transform the value by resolving it to a value that has already been seen in the pipeline.
      # The current row is cached first, so a row can also resolve against itself.
      def transform(name, value, row)
        cache_lookup_value(row)
        @collection[value] || @default
      end

      private

      # Store this row's lookup_key => value_key pair for use by later rows.
      def cache_lookup_value(row)
        storage_key = row[@lookup_key]
        if @collection.key?(storage_key) && !@ignore_duplicates
          raise DuplicateKeyError, "Duplicate key found for #{storage_key.inspect} and :ignore_duplicates is not set"
        end
        @collection[storage_key] = row[@value_key]
      end
    end
  end
end
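
The caching behaviour described in the comments can be tried without loading the ETL framework. What follows is a minimal standalone sketch, not the transform itself: `SketchLookup`, `DuplicateKey`, and the `:code`, `:id`, and `:parent_code` row attributes are hypothetical names chosen for illustration, and the class only mirrors the cache-then-resolve order of `transform` above.

```ruby
# Standalone sketch of the cache-then-resolve behaviour; it does not load the ETL gem.
# SketchLookup and DuplicateKey are hypothetical names used only in this example.
class SketchLookup
  class DuplicateKey < StandardError; end

  def initialize(lookup_key:, value_key:, default: nil, ignore_duplicates: false)
    @collection = {}
    @lookup_key = lookup_key
    @value_key = value_key
    @default = default
    @ignore_duplicates = ignore_duplicates
  end

  # Cache this row's pair first, then resolve the incoming value against the cache.
  def transform(value, row)
    storage_key = row[@lookup_key]
    if @collection.key?(storage_key) && !@ignore_duplicates
      raise DuplicateKey, "Duplicate key found for #{storage_key.inspect}"
    end
    @collection[storage_key] = row[@value_key]
    @collection[value] || @default
  end
end

lookup = SketchLookup.new(lookup_key: :code, value_key: :id, default: -1)

# Each row refers to a parent by its code; the parent must appear earlier in the pipeline.
rows = [
  { code: "A", id: 1, parent_code: nil },
  { code: "B", id: 2, parent_code: "A" },
  { code: "C", id: 3, parent_code: "Z" }
]

# Rows whose parent has not been seen fall back to the default.
parent_ids = rows.map { |row| lookup.transform(row[:parent_code], row) }

# With ignore_duplicates enabled, a repeated key overwrites the earlier value.
lenient = SketchLookup.new(lookup_key: :code, value_key: :id, ignore_duplicates: true)
lenient.transform(nil, { code: "A", id: 1 })
latest = lenient.transform("A", { code: "A", id: 9 })
```

Because a lookup can only hit keys cached from the current or earlier rows, the order of rows in the source matters: a row that refers to a key appearing later in the pipeline gets the default instead.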