Created
April 11, 2012 16:57
pipeline lookup transform
module ETL #:nodoc:
  module Transform #:nodoc:
    # Transform which looks up the value and replaces it with another value previously seen in the pipeline
    class PipelineLookupTransform < ETL::Transform::Transform
      class DuplicateKeyError < ETLError #:nodoc:
      end

      # The resolver to use if the foreign key is not found in the collection
      attr_accessor :resolver

      # The default value to use if none is found.
      attr_accessor :default

      # Initialize the pipeline lookup transform.
      #
      # Configuration options:
      # * <tt>:collection</tt>: A Hash of natural keys mapped to surrogate keys. If this is not specified then
      #   an empty Hash will be used. This Hash will be used to cache values that have been resolved already
      #   for future use.
      # * <tt>:value_key</tt>: The row attribute whose value is stored in the collection
      # * <tt>:lookup_key</tt>: The row attribute whose value is used as the key in the collection
      # * <tt>:default</tt>: A default foreign key to use if no foreign key is found
      # * <tt>:ignore_duplicates</tt>: Defaults to false. When false, an exception is raised if a key is
      #   seen more than once; when true, the last value seen for the key is used.
      def initialize(control, name, configuration={})
        super
        @collection = (configuration[:collection] || {})
        @value_key = configuration[:value_key]
        @lookup_key = configuration[:lookup_key]
        @default = configuration[:default]
        raise ConfigurationError, ":value_key must be set" unless @value_key
        raise ConfigurationError, ":lookup_key must be set" unless @lookup_key
        # Honor the configured flag; fall back to false when it is not given
        @ignore_duplicates = configuration[:ignore_duplicates] || false
      end

      # Transform the value by resolving it to a value that it has already seen before in the pipeline
      def transform(name, value, row)
        # Cache the current row first so later rows (and this one) can resolve against it
        cache_lookup_value row
        @collection[value] || @default
      end

      private

      # Store row[@value_key] under row[@lookup_key], raising on a repeated key unless duplicates are ignored
      def cache_lookup_value row
        storage_key = row[@lookup_key]
        if @collection.key?(storage_key) && !@ignore_duplicates
          raise DuplicateKeyError, "Duplicate key found for #{storage_key} and :ignore_duplicates is not set to true"
        end
        @collection[storage_key] = row[@value_key]
      end
    end
  end
end
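
The caching and lookup behavior of the transform can be tried without the ETL framework. The following is a minimal sketch, not the gist's class: `PipelineLookupSketch`, its keyword arguments, and the sample rows are hypothetical names introduced for illustration. It assumes the intended semantics are "cache each row under its lookup key, then resolve the incoming value against the cache, falling back to the default".

```ruby
# Hypothetical stand-in for PipelineLookupTransform with no ETL dependency.
class PipelineLookupSketch
  class DuplicateKeyError < StandardError; end

  def initialize(value_key:, lookup_key:, default: nil, ignore_duplicates: false)
    @collection = {}
    @value_key = value_key
    @lookup_key = lookup_key
    @default = default
    @ignore_duplicates = ignore_duplicates
  end

  # Cache the row, then resolve value against everything cached so far.
  def transform(value, row)
    cache_lookup_value(row)
    @collection[value] || @default
  end

  private

  def cache_lookup_value(row)
    storage_key = row[@lookup_key]
    if @collection.key?(storage_key) && !@ignore_duplicates
      raise DuplicateKeyError, "Duplicate key found for #{storage_key}"
    end
    @collection[storage_key] = row[@value_key]
  end
end

# Sample rows (assumed data): each row may point at an earlier row by its code.
rows = [
  { id: 1, code: "A", parent_code: nil },
  { id: 2, code: "B", parent_code: "A" },
  { id: 3, code: "C", parent_code: "Z" } # "Z" never appears, so the default is used
]

sketch = PipelineLookupSketch.new(value_key: :id, lookup_key: :code, default: 0)
results = rows.map { |row| sketch.transform(row[:parent_code], row) }
p results # => [0, 1, 0]
```

Because each row is cached before its own value is resolved, a lookup only succeeds for keys that appeared in the current row or an earlier one; rows that reference a later row fall back to the default.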