  • Save camallen/0279a30a414902eeaa0ef6b5321c3ae4 to your computer and use it in GitHub Desktop.
Broken survey task labels on AmazonCam Tambopata project (invalid translation strings data)
# 1. The underlying cause of the issue
# ~17:00-17:30 on Friday 18th October, 2019
# the commands I used to work on the translation fix issue
# fix the workflow strings for an older survey task (missing descriptions)
workflow = Workflow.find 2485
tasks = workflow.tasks
task_string_extractor =
# I believe I reused the workflow tasks variable here and mangled the workflow translation strings,
# as the workflow.tasks value had already been run through `TasksVisitors::ExtractStrings`.
# Instead, I should have reloaded the workflow object to ensure the tasks were fresh and untouched.
# Note: always take care when working in a Rails console;
# best practice is to use a test project to vet your changes before running them on production / live projects
# this modifies tasks in place instead of returning a new task object.
stripped_tasks = tasks
strings = task_string_extractor.collector
workflow.strings = strings
# I believe reusing the workflow variable here is where I mangled the translation strings
# and thus added the broken labels to the english language translation record
# instead I should have reloaded the workflow object to ensure the tasks were fresh and untouched.
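# A plain-Ruby sketch (no Rails, no Panoptes code) of the failure mode described above:
# running an in-place string extractor twice over the same tasks hash "extracts" the
# translation keys themselves, so the collected strings become keys instead of labels.
# `extract_labels!`, `deep_copy` and the "T0.label" key format are hypothetical
# simplifications, not the real TasksVisitors::ExtractStrings behavior.

```ruby
require "json"

# Hypothetical in-place extractor: swaps each task's "label" for a
# translation key and records the original text in the collector.
def extract_labels!(tasks, collector = {})
  tasks.each do |task_key, task|
    next unless task.is_a?(Hash) && task.key?("label")

    key = "#{task_key}.label"
    collector[key] = task["label"]
    task["label"] = key # mutates the caller's hash
  end
  collector
end

# Deep copy via a JSON round trip; fine for plain JSON task data
# (hashes, arrays, strings, numbers) and gives independent nested objects.
def deep_copy(obj)
  JSON.parse(JSON.generate(obj))
end

tasks = { "T0" => { "label" => "Tayra" } }

# Pitfall: the second pass runs over an already-extracted hash,
# so it collects the key "T0.label" as if it were the label text.
first_pass  = extract_labels!(tasks) # => { "T0.label" => "Tayra" }
second_pass = extract_labels!(tasks) # => { "T0.label" => "T0.label" }

# Safer: extract from a fresh copy (in the console, a freshly reloaded record).
fresh_tasks = { "T0" => { "label" => "Tayra" } }
safe_pass   = extract_labels!(deep_copy(fresh_tasks)) # fresh_tasks stays untouched
```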
# now update the translation strings
translated_strings =
translation = Translation.find_or_initialize_by(
  translated: workflow,
  language: workflow.primary_language
)
translation.update_strings_and_versions(translated_strings, workflow.latest_version_id)
translation.save!
# testing what the injected strings look like after loading from a fresh representation of the workflow
t_workflow = Workflow.find 2485
hmm =
# 2. The fixes for the issue
# ~15:30-19:00 on Sunday 20th October, 2019
# the commands I used to fix the broken workflow strings from a restored backup
# and to fix the translation issues
workflow = Workflow.where(id: 2485).first
good_workflow_data = nil
bad_workflow_data = nil
# get the bad strings (host == panoptes main db)
#"tmp/workflow_2485_bad_strings.csv", "wb") do |csv|
# csv << ["id", "tasks", "strings"]
# csv << [, workflow.tasks.to_json, workflow.strings.to_json]
# end
# I exported a file to share the state of the bad strings for testing
bad_workflow_data ="tmp/workflow_2485_bad_strings.csv")
# bork them again first
borked_tasks = JSON.parse(bad_workflow_data[1][1])
borked_strings = JSON.parse(bad_workflow_data[1][2])
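# The export / re-import round trip used for the bad (and later the good) strings can be
# sketched with Ruby's standard csv and json libraries. The sample tasks and strings below
# are made up, and a temporary directory stands in for tmp/. CSV.read returns every cell
# as a String, which is why the JSON columns need JSON.parse and why row 0 is the header.

```ruby
require "csv"
require "json"
require "tmpdir"

# Made-up sample data standing in for a workflow's tasks and strings columns.
workflow_id = 2485
tasks   = { "T0" => { "type" => "survey", "choices" => { "TAYRA" => { "label" => "T0.choices.TAYRA.label" } } } }
strings = { "T0.choices.TAYRA.label" => "Tayra" }

rows = Dir.mktmpdir do |dir|
  path = File.join(dir, "workflow_#{workflow_id}_strings.csv")

  # Export: one header row, then one row with the JSON-encoded columns.
  CSV.open(path, "wb") do |csv|
    csv << ["id", "tasks", "strings"]
    csv << [workflow_id, tasks.to_json, strings.to_json]
  end

  # Re-import: an array of rows, each an array of Strings, so rows[1] is
  # the first data row and rows[1][1] is its "tasks" column.
  CSV.read(path)
end # the temp dir is removed here; the block's value is returned

restored_tasks   = JSON.parse(rows[1][1])
restored_strings = JSON.parse(rows[1][2])
```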
# ---------------------------------------------------------------------------------------------
# now get the good strings (change the host to the restored backup)
#"tmp/workflow_2485_good_strings.csv", "wb") do |csv|
# csv << ["id", "tasks", "strings"]
# csv << [, workflow.tasks.to_json, workflow.strings.to_json]
# end
# get the good strings from the saved file
good_workflow_data ="tmp/workflow_2485_good_strings.csv")
good_tasks = JSON.parse(good_workflow_data[1][1])
good_strings = JSON.parse(good_workflow_data[1][2])
# method to inspect differences in the nested data
def hash_diff(first, second)
  first.
    dup.
    delete_if { |k, v| second[k] == v }.
    merge!(second.dup.delete_if { |k, v| first.has_key?(k) })
end
# have a look at them to see what they look like
borked_tasks["T0"]['choices']['TAYRA'] { |k| k.include?('TAYRA') } { |k,v| k.include?('TAYRA') } { |k,v| k.include?('TAYRA') }
good_tasks["T0"] do |k|
  if good_tasks["T0"][k].is_a?(Hash)
    hash_diff(good_tasks["T0"][k], borked_tasks["T0"][k])
  end
end
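# `hash_diff` only compares one level, so a nested change (e.g. a missing description inside
# one survey choice) shows up as a whole differing sub-hash. A hypothetical recursive variant
# that reports the dotted path of each differing leaf might look like this; the sample choice
# data is illustrative, not the real T0 task. Arrays are compared as whole values, and a key
# that is absent on one side is reported with nil on that side.

```ruby
# Hypothetical helper: recursively diff two nested hashes and return
# { "dotted.path" => [value_in_first, value_in_second] } for each difference.
def deep_diff(first, second, path = [])
  (first.keys | second.keys).each_with_object({}) do |key, diffs|
    a = first[key]
    b = second[key]
    current = path + [key]
    if a.is_a?(Hash) && b.is_a?(Hash)
      diffs.merge!(deep_diff(a, b, current))
    elsif a != b
      diffs[current.join(".")] = [a, b]
    end
  end
end

good_choice = {
  "choices" => {
    "TAYRA" => {
      "label" => "T0.choices.TAYRA.label",
      "description" => "T0.choices.TAYRA.description"
    }
  }
}
borked_choice = { "choices" => { "TAYRA" => { "label" => "T0.choices.TAYRA.label" } } }

missing = deep_diff(good_choice, borked_choice)
# => { "choices.TAYRA.description" => ["T0.choices.TAYRA.description", nil] }
```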
# attempt to restore the good string data to the broken workflow configs
# 1st combine the good strings and bad strings back together
# so that we can properly extract them to get the 'proper' (not only label) representations of the
# extracted strings, e.g. the old data was extracted without descriptions etc.
tasks_to_inject = good_tasks.dup
strings_to_inject = good_strings.dup
# get the strings / tasks from the injected data
task_string_extractor =
exracted_tasks = tasks_to_inject
extracted_strings = task_string_extractor.collector
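# The "combine, then re-extract" step can be illustrated with a simplified, non-mutating
# model: inject the strings into the tasks (keys -> text), then extract them again
# (text -> keys), which regenerates a key for every translatable field, descriptions
# included. The TRANSLATABLE field list, the dotted key format and both helpers are
# assumptions for illustration; they are not the Panoptes TasksVisitors classes.

```ruby
# Fields treated as translatable in this sketch (an assumption).
TRANSLATABLE = %w[instruction label description].freeze

# Replace translation keys with their text; returns a new structure.
def inject_strings(node, strings)
  case node
  when Hash
    node.to_h do |k, v|
      if TRANSLATABLE.include?(k) && v.is_a?(String)
        [k, strings.fetch(v, v)]
      else
        [k, inject_strings(v, strings)]
      end
    end
  when Array
    node.map { |v| inject_strings(v, strings) }
  else
    node
  end
end

# Replace translatable text with a dotted key built from its path and
# collect key => text pairs. Returns [new_tasks, strings]; input is untouched.
def extract_strings(node, path = [], collector = {})
  result =
    case node
    when Hash
      node.to_h do |k, v|
        key_path = path + [k]
        if TRANSLATABLE.include?(k) && v.is_a?(String)
          key = key_path.join(".")
          collector[key] = v
          [k, key]
        else
          [k, extract_strings(v, key_path, collector).first]
        end
      end
    when Array
      node.each_with_index.map { |v, i| extract_strings(v, path + [i], collector).first }
    else
      node
    end
  [result, collector]
end

good_tasks = {
  "T0" => {
    "type" => "survey",
    "choices" => {
      "TAYRA" => {
        "label" => "T0.choices.TAYRA.label",
        "description" => "T0.choices.TAYRA.description"
      }
    }
  }
}
good_strings = {
  "T0.choices.TAYRA.label" => "Tayra",
  "T0.choices.TAYRA.description" => "Sample description text"
}

injected = inject_strings(good_tasks, good_strings)
roundtrip_tasks, roundtrip_strings = extract_strings(injected)
```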
# now update the workflow column data (avoid callbacks and version histories, we are replacing them manually)
workflow.update_column(:strings, extracted_strings)
workflow.update_column(:tasks, exracted_tasks)
# find and fix the borked workflow version tasks & strings
# "6394416,6394417,6394418,6394419,6394420,6394421,6394422,6394423,6394424,6394425,6394426,6394427,6394428,6394429,6394430,6394431,6394432,6394433,6394434,6394435,6394436,6394437,6394438,6394439,6394440,6394441,6498095"
# manually check the last two (highest ids) - 6394441,6498095
# only the last one is broken after looking at them (the 2nd last was made in January when migrating to the new version system)
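# Picking the two highest ids out of the version id list in the comment above is a
# one-liner in plain Ruby; the list is copied verbatim from that comment, and how it was
# originally produced is not shown here.

```ruby
version_ids_csv = "6394416,6394417,6394418,6394419,6394420,6394421,6394422,6394423,6394424," \
                  "6394425,6394426,6394427,6394428,6394429,6394430,6394431,6394432,6394433," \
                  "6394434,6394435,6394436,6394437,6394438,6394439,6394440,6394441,6498095"
version_ids = version_ids_csv.split(",").map(&:to_i)

# Array#max(n) returns the n largest values, in descending order.
last_two = version_ids.max(2) # => [6498095, 6394441]
```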
wv = WorkflowVersion.find(6498095)
wv.update_column(:strings, extracted_strings)
wv.update_column(:tasks, exracted_tasks)
# ok - now fix the synced primary translations
workflow = Workflow.where(id: 2485).first
language = workflow.primary_language
translation = Translation.find_or_initialize_by(translated: workflow, language: language)
translated_strings =
translation.update_strings_and_versions(translated_strings, workflow.latest_version_id)
# check the translation record strings are correct now
# check the translation string_versions look good as well (refer to the correct workflow_version record we found above)
# do not save: it would create a new translation version; instead, just update this one
translation.update_column(:strings, translation.strings)
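# A hedged sketch of the "check the string_versions look good" step, assuming
# string_versions maps each string key to the id of the workflow version it was synced
# from (an assumption about the Panoptes schema). It reports keys without a version,
# versions for keys that no longer exist, and versions that are not among the workflow's
# known version ids. The sample keys are illustrative; the ids come from the list above.

```ruby
# Hypothetical consistency check over plain hashes.
def string_versions_report(strings, string_versions, known_version_ids)
  {
    missing_versions: strings.keys - string_versions.keys,
    stale_keys: string_versions.keys - strings.keys,
    unknown_versions: string_versions.reject { |_key, id| known_version_ids.include?(id) }
  }
end

known_ids = [6394441, 6498095]
strings = {
  "T0.choices.TAYRA.label" => "Tayra",
  "T0.choices.TAYRA.description" => "Sample description text"
}

good_versions = {
  "T0.choices.TAYRA.label" => 6498095,
  "T0.choices.TAYRA.description" => 6498095
}
bad_versions = {
  "T0.choices.TAYRA.label" => 6498095,
  "T0.choices.OLD.label" => 123
}

ok_report  = string_versions_report(strings, good_versions, known_ids)
bad_report = string_versions_report(strings, bad_versions, known_ids)
```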
# now update the translation version we previously borked
translation = Translation.find_or_initialize_by(translated: workflow, language: language)
tv = TranslationVersion.find 118625
tv.strings = translation.strings
tv.string_versions = translation.string_versions
# check the changes
tv.changes.keys #(only strings)
# update the strings so this version is correct
tv.update_column(:strings, tv.strings)
# update the busted classifications
incident_date = - 3.days - 1.hour - 30.minutes
borked_cs = Classification.where(project_id:"created_at > ?", incident_date).where(workflow_version: '587.31') { |cs| cs.update_column(:metadata, cs.metadata.merge('invalid_translation' => 'original lables not showing correctly')) }
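# The classification clean-up boils down to "records newer than a cutoff, on workflow
# version 587.31, get an extra metadata flag". A plain-Ruby sketch over hashes, assuming
# the cutoff is measured back from the time the fix was run (the line above elides the
# base time); Time arithmetic here is in seconds rather than ActiveSupport durations, and
# the sample records and timestamp are made up.

```ruby
FLAG_NOTE = "original labels not showing correctly"

# 3 days, 1 hour and 30 minutes before `now`, in seconds.
def incident_cutoff(now)
  now - (3 * 24 * 60 * 60 + 60 * 60 + 30 * 60)
end

# Returns new records; matching ones get a merged metadata flag,
# everything else is passed through unchanged.
def flag_invalid_translations(classifications, cutoff, workflow_version)
  classifications.map do |cs|
    next cs unless cs[:created_at] > cutoff && cs[:workflow_version] == workflow_version

    cs.merge(metadata: cs[:metadata].merge("invalid_translation" => FLAG_NOTE))
  end
end

cutoff = incident_cutoff(Time.utc(2019, 10, 20, 18, 30)) # => 2019-10-17 17:00:00 UTC

records = [
  { id: 1, created_at: Time.utc(2019, 10, 18, 17, 15), workflow_version: "587.31", metadata: { "source" => "sample" } },
  { id: 2, created_at: Time.utc(2019, 10, 16, 12, 0), workflow_version: "587.31", metadata: {} },
  { id: 3, created_at: Time.utc(2019, 10, 19, 9, 0), workflow_version: "587.30", metadata: {} }
]

flagged = flag_invalid_translations(records, cutoff, "587.31")
```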
# emailed the researchers Sunday evening, 20th October ~18:45
# emailed the project translator Sunday evening, 20th October ~18:55