@joecorall
Last active January 12, 2024 15:03
Batch media ingest from i7 to i2
#!/usr/bin/env bash
set -euo pipefail
BASE_CONFIG=configs/i7.yml
# Example BASE_CONFIG:
# task: add_media
# host: "https://example.com"
# username: admin
# password: password
# input_csv: add_media_take2.csv
# enable_http_cache: false
# Read the input_csv value from the base config (assumes a top-level "key: value" line)
INPUT_CSV="input_data/$(grep '^input_csv:' "$BASE_CONFIG" | awk -F ': ' '{print $2}')"
# Which CSV row to start on
START=1
# if this job was killed, pick up where it left off
if [ -f batch.iter ]; then
  START=$(cat batch.iter)
fi
# how many CSV rows each workbench process should execute at a time
COUNT=10
# how many workbench processes can run at once
PARALLEL_EXECUTIONS=7
# When to stop processing: the number of data rows (line count minus the header row)
MAX=$(wc -l < "$INPUT_CSV")
MAX=$((MAX - 1))
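# -le rather than -lt so the final row is not skipped when START lands exactly on MAX
while [ "$START" -le "$MAX" ]; do
  # reset each round so we only wait on the jobs launched in this round
  job_ids=()
  for ((i = 0; i < PARALLEL_EXECUTIONS; i++)); do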
    STOP=$((START + COUNT - 1))
    if [ "$STOP" -gt "$MAX" ]; then
      STOP=$MAX
    fi
    # each batch gets its own copy of the base config with its log file and row range appended
    BATCH_CONFIG="$BASE_CONFIG-batch-$i.yml"
    cp "$BASE_CONFIG" "$BATCH_CONFIG"
    echo "log_file_path: logs/i7-batch-$i.log" >> "$BATCH_CONFIG"
    echo "csv_start_row: $START" >> "$BATCH_CONFIG"
    echo "csv_stop_row: $STOP" >> "$BATCH_CONFIG"
    python3 ./workbench --config "$BATCH_CONFIG" &
    job_ids+=("$!")
    START=$((START + COUNT))
    # save our place in case we have to run this again
    echo "$START" > batch.iter
    # break once we're past the max
    if [ "$START" -gt "$MAX" ]; then
      break
    fi
  done
  echo "Waiting for jobs to complete."
  for job_id in "${job_ids[@]}"; do
    wait "$job_id" || echo "One job failed, but continuing anyway"
  done
  echo "Jobs completed."
done
# -f so this does not fail if no batch was ever started
rm -f batch.iter
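
The batching arithmetic in the script above can be checked without launching Workbench. The following is a minimal sketch and is not part of the original gist: `print_batches` is a hypothetical helper that prints the `csv_start_row` / `csv_stop_row` pair each batch would receive. It assumes, as the script does, that data rows are numbered from 1 and that both bounds are inclusive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the original script): print the inclusive
# "start stop" row pair for every batch.
#   $1 = total number of data rows (MAX in the script)
#   $2 = rows per batch (COUNT in the script)
#   $3 = optional first row to process (START in the script, default 1)
print_batches() {
  local max=$1 count=$2 start=${3:-1} stop
  while [ "$start" -le "$max" ]; do
    stop=$((start + count - 1))
    # clamp the last batch to the final row
    if [ "$stop" -gt "$max" ]; then
      stop=$max
    fi
    echo "$start $stop"
    start=$((start + count))
  done
}

# Example: 25 data rows in batches of 10
print_batches 25 10
```

Running it for 25 rows with a batch size of 10 prints `1 10`, `11 20` and `21 25`, so the last batch is clamped to the final row. Passing a third argument, for example the value stored in `batch.iter`, shows which ranges a resumed run would cover.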