@audy
Created July 9, 2012 19:03
Shannon Diversity Index
#!/usr/bin/env ruby
#
# Compute Shannon Diversity Index from a CSV file
# Input has samples as rows, variables (clusters) as columns
#
#
# Usage
# ruby sdi.rb input.txt
#
#
# Example Input
#
# ID,cluster-1,...,cluster-n
# sample-1,10,...,55
# ...
# sample-n,20,...,39
#
SEP = ','
##
# Calculate SDI given an input stream to a CSV file
#
def sdi(handle)
  id_to_measurements = {}
  handle.gets # skip the header row
  handle.each do |line|
    fields = line.strip.split(SEP)
    next if fields.empty? # ignore blank lines
    id, measurements = fields[0], fields[1..-1].map(&:to_f)
    id_to_measurements[id] = measurements
  end
  # Convert each sample's counts to proportions (divide by the row total)
  id_to_measurements.each do |id, measurements|
    total = measurements.inject(0.0, :+)
    measurements.map! { |x| x / total }
  end
  # The SDI formula: H = -sum(p * ln(p)); zero counts yield NaN terms, which are dropped
  id_to_measurements.map do |id, measurements|
    [id, measurements.map { |x| -1 * x * Math.log(x) }.reject { |x| x.nan? }.inject(0.0, :+)]
  end
end
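##
# Illustrative sketch, not part of the original script: the same formula,
# H = -sum(p_i * ln(p_i)), applied to a single row of raw counts.
# The name shannon_index is hypothetical. Zero counts are skipped, following
# the usual convention that 0 * ln(0) contributes nothing to the sum.
#
def shannon_index(counts)
  total = counts.inject(0.0) { |sum, x| sum + x }
  return 0.0 if total.zero? # no observations, so no diversity to measure
  counts.reject { |x| x.zero? }
        .map { |x| x / total }
        .inject(0.0) { |h, prob| h - prob * Math.log(prob) }
end
# e.g. shannon_index([1, 1]) is approximately Math.log(2):
# two equally abundant clusters give the maximum diversity for two classes.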
##
# gogogo gl hf
#
def main
  id_to_sdi = File.open(ARGV[0]) { |stream| sdi(stream) }
  puts "sample_id,sdi"
  id_to_sdi.each do |id, value|
    puts "#{id},#{value}"
  end
end
if $0 == __FILE__
  main
end
__END__
-,a,b,c
A,1,2,3
B,2,3,4
C,4,5,6