Skip to content

Instantly share code, notes, and snippets.

@murphyslaw
Created June 3, 2011 19:18
Show Gist options
  • Save murphyslaw/1006979 to your computer and use it in GitHub Desktop.
Save murphyslaw/1006979 to your computer and use it in GitHub Desktop.
A ruby script that transforms an array from a csv file into an array with the same structure but with transformed column names and values. You can optionally define transformations on each column.
#!/usr/bin/ruby -w
require 'active_support/core_ext'
# The transformer class uses a source array and a mapping table to
# create a target array, with the same structure as the source array
# but transformed values. The structure of the source array is oriented
# on what is returned after parsing a csv file.
#
# Optionally, it is also possible to define transformation methods for each column, which
# are described below for the different transformation types.
#
# For example:
# source: [["column1", "column2"], ["first", "second"], ["a", "b"]]
# mapping: { "column1" => "custom_column1", "column2" => "custom_column2" }
# target: [["custom_column1", "custom_column2"], ["FIRST", "second"], ["A", "b"]]
#
# Any errors encountered during the run are stored in the +errors+ structure, that
# holds errors for each column.
#
# For example:
# transformer = Transformer.new(source, mapping, options)
# transformer.run
# transformer.errors['column2'] # returns all errors for 'column2'
#
# There are also a couple of options available, that are explained in more detail
# in the +set_default_options+ method.
class Transformer
attr_reader :source
attr_reader :mapping
attr_reader :source_columns
attr_writer :target_columns
attr_reader :options
attr_reader :errors
# To initialize the transformer you need to pass in the source array
# and the mapping table.
def initialize(source, mapping, options = {})
@source = source
@mapping = mapping
@source_columns = @source.delete_at(0)
set_default_options(options)
initialize_errors
end
# Sets the default options for the transformer.
def set_default_options(options)
@options = options.reverse_merge!({
:display_transformations => true, # If enabled, each source and transformed target row will be displayed.
})
end
# Creates the data structure to hold errors for each of the columns.
def initialize_errors
@errors = {}
@source_columns.each do |column|
@errors[column] = []
end
end
# Returns the index in the source array for the given column name.
def source_column_index(name)
source_columns.index(name)
end
# Returns the name of the target column as defined in the mapping table
# for a given source column name.
def target_column_name(source_column_name)
mapping[source_column_name]
end
# Returns an array containing the target column names.
def target_columns
@target_columns ||=
mapping.keys.inject([]) do |target_columns, source_column_name|
target_columns << target_column_name(source_column_name)
target_columns
end
end
# Transforms a single source row, depending on the mapping
# and transform methods.
def run_transformations(row)
transformed_row = []
mapping.each do |source_column_name, target_column_name|
source_column_index = source_column_index(source_column_name)
source_value = row[source_column_index]
target_value = transform(source_column_name, source_value)
transformed_row << target_value
end
if options[:display_transformations]
puts "#{row.inspect}"
puts "#{transformed_row.inspect}\n\n"
end
transformed_row
end
# Checks wether there is a transformation method for the
# given column and executes it, otherwise it returns the
# original value unchanged.
def transform(source_column_name, value)
method_name = "transform_#{source_column_name}"
respond_to?(method_name) ? send(method_name, value) : value
end
# Main method to run the transformer. It turns the source array
# into the target array.
def run
target = []
target << target_columns
source.map { |row| target << run_transformations(row) }
target
end
# Transformations
#
# To enable a transformation you need to define a method, that
# starts with "transform_" followed by the source column name
# you want to transform. The method is given the value and
# should return the target value.
# This transformation is an example for changing the values format.
# It also handles the case, when the value is nil.
def transform_column1(value)
value ||= ''
value.upcase
end
# This transformation is an example for only allowing certain
# values in the target and discarding other ones.
def transform_column2(value)
allowed_values = [
'yes',
'no',
]
index = allowed_values.index(value)
errors['column2'] << "Warning: '#{value}' discarded, because it is not allowed. #{allowed_values.inspect}" unless index
index ? value : ''
end
# This transformation is an example for mapping specific source values
# to target values and discarding the rest.
def transform_column3(value)
mapping = {
'y' => 'yes',
'n' => 'no',
}
index = mapping.keys.index(value)
errors['column3'] << "Warning: '#{value}' discarded, because it is not mapped. #{mapping.keys.inspect}" unless index
index ? mapping[value] : ''
end
end
#
# Prepare source, mapping and options.
#
source = [
['column1', 'column2', 'column3'],
['first', 'second', 'third'],
['no', 'yes', 'y'],
]
mapping = {
'column1' => 'custom_column1',
'column2' => 'custom_column2',
'column3' => 'custom_column3',
}
options = {}
transformer = Transformer.new(source, mapping, options)
#
# Output results.
#
puts "source (#{source.size}): #{source.inspect}\n\n"
target = transformer.run
puts "target (#{target.size}): #{target.inspect}\n\n"
transformer.errors.each do |column, errors|
puts errors
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment