
@trevorturk
Last active September 11, 2023 20:29
Converting from a hash-of-arrays into an array-of-hashes

Most weather data sources give you an array of hashes (representing hours or days) with data points like so:

[
  {
    temperature: 73,
    windSpeed: 10,
  },
  {
    temperature: 70,
    windSpeed: 15,
  },
]

...but The Weather Company gives you a hash of arrays, like so:

{
  temperature: [73, 70, ...],
  windSpeed: [10, 15, ...],
}

Iterating over a large hash of arrays and digging out each value by array index turned out to be really slow -- it's much better to convert the hash of arrays into an array of hashes first and then map over it... but why?
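For illustration, the conversion itself can be done in one pass with `transpose` (a sketch, not the code from the diff below; note `transpose` requires every array to have the same length):

```ruby
# Hash-of-arrays, shaped like The Weather Company's response
data = {
  temperature: [73, 70],
  windSpeed:   [10, 15],
}

# Pair up the nth element of every array, then zip each row back with the keys
hours = data.values.transpose.map { |vals| data.keys.zip(vals).to_h }
# => [{ temperature: 73, windSpeed: 10 }, { temperature: 70, windSpeed: 15 }]
```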

diff --git a/app/models/api/sources/the_weather_company.rb b/app/models/api/sources/the_weather_company.rb
index 8f5b31885..8ad5816aa 100644
--- a/app/models/api/sources/the_weather_company.rb
+++ b/app/models/api/sources/the_weather_company.rb
@@ -69,25 +69,25 @@ class Api::Sources::TheWeatherCompany < Api::Sources::Base
Api::Hourly.new(
summary: -> { DATA_MISSING },
data: -> do
- HOURLY_HOURS.times.map.with_index do |index|
+ to_hourly_hours(hourly_data).map do |hour|
Api::Hourly::Hour.new(
- temperature: -> { hourly_data.dig(:temperature, index) },
- wind_speed: -> { hourly_data.dig(:windSpeed, index) },
+ temperature: -> { hour.dig(:temperature) },
+ wind_speed: -> { hour.dig(:windSpeed) },
)
end
end
@@ -201,7 +201,7 @@ class Api::Sources::TheWeatherCompany < Api::Sources::Base
end
- HOURLY_HOURS = 120
+ HOURLY_HOURS = 360
def hourly_data
fetch_data do
@@ -306,6 +306,37 @@ class Api::Sources::TheWeatherCompany < Api::Sources::Base
end
end
+ # The way TWC sends hourly data is odd and slow to `dig` from, so we convert
+ # the hash-of-arrays into an array-of-hashes
+ def to_hourly_hours(data)
+ hours = HOURLY_HOURS.times.map { Hash.new }
+ data.each do |key, vals|
+ vals.each.with_index do |val, index|
+ hours[index][key] = val
+ end
+ end
+ hours
+ end
+
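Pulled out of the diff, the new helper is small enough to run standalone (a sketch; `HOURLY_HOURS` is shortened here for illustration, the real source uses 360):

```ruby
HOURLY_HOURS = 2 # 360 in the real source; shortened for illustration

# Convert the hash-of-arrays into an array of per-hour hashes
def to_hourly_hours(data)
  hours = HOURLY_HOURS.times.map { Hash.new }
  data.each do |key, vals|
    vals.each.with_index do |val, index|
      hours[index][key] = val
    end
  end
  hours
end

to_hourly_hours(temperature: [73, 70], windSpeed: [10, 15])
# => [{ temperature: 73, windSpeed: 10 }, { temperature: 70, windSpeed: 15 }]
```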
A standalone benchmark comparing the two approaches:

require "benchmark"
require "securerandom"

# Build a hash with the type of data point => array of values, e.g:
# {
#   temperature: [73, 70, ...],
#   windSpeed: [10, 15, ...],
# }
KEYS = 20
VALS = 360
hash = Hash.new
keys = KEYS.times.map { SecureRandom.hex(3) }
keys.each { |key| hash[key] = VALS.times.map { rand(89) + 10 } }

# Compare mapping the hash vs converting to an array first
Benchmark.bm do |x|
  x.report("h") do
    VALS.times.map do |idx|
      h = {}
      keys.each do |key|
        h[key.upcase] = hash.dig(key, idx)
      end
      h
    end
  end

  x.report("a") do
    # Convert the hash of arrays into an array of hashes, resulting in e.g:
    # [
    #   { temperature: 73, windSpeed: 10 },
    #   { temperature: 70, windSpeed: 15 },
    # ]
    array = VALS.times.map { Hash.new }
    hash.each do |key, vals|
      vals.each.with_index do |val, idx|
        array[idx][key] = val
      end
    end

    array.map do |a|
      h = {}
      keys.each do |key|
        h[key.upcase] = a.dig(key)
      end
      h
    end
  end
end
# user system total real
# h 0.001677 0.000035 0.001712 ( 0.001715)
# a 0.002436 0.000055 0.002491 ( 0.002493)
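Notably, the isolated benchmark above slightly favors the dig path, since the conversion itself costs roughly the same loop plus the final map. One hedged guess at the production win: each data point may be read more than once (the real code wraps values in lambdas), and once converted, each read is a single hash lookup instead of a hash lookup plus an array index. A sketch of that effect (`REPEATS` and the data shape are hypothetical, not from the original source):

```ruby
require "benchmark"

KEYS = 20
VALS = 360
REPEATS = 50 # hypothetical: how many times each data point gets read

keys = KEYS.times.map { |i| "key#{i}" }
hash = keys.to_h { |key| [key, VALS.times.map { rand(89) + 10 }] }

# One-time conversion to an array of hashes
array = VALS.times.map { Hash.new }
hash.each do |key, vals|
  vals.each.with_index do |val, idx|
    array[idx][key] = val
  end
end

Benchmark.bm do |x|
  # Each read pays for a hash lookup plus an array index
  x.report("dig") do
    REPEATS.times do
      VALS.times { |idx| keys.each { |key| hash.dig(key, idx) } }
    end
  end

  # Each read is a single hash lookup
  x.report("converted") do
    REPEATS.times do
      VALS.times { |idx| keys.each { |key| array[idx][key] } }
    end
  end
end
```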
@trevorturk (Author):
This change resulted in a massive speedup (1000ms -> 200ms) and much better memory usage on Heroku.

(Screenshot from 2023-09-09 showing the improved Heroku metrics)
