#!/bin/bash
set -euo pipefail

mkdir zero
for i in {1..1000000}; do
    touch "zero/$i.data"
done
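Creating a million files with one touch process each is slow. A batched variant passes many filenames to each touch invocation via xargs; the count parameter and its lowered default are additions here for testing at smaller scale (the original hard-codes 1000000):

```shell
#!/bin/bash
set -euo pipefail

# Batched variant: xargs packs as many filenames as fit onto each
# touch invocation instead of forking one process per file.
count=${1:-1000}
mkdir -p zero
seq 1 "$count" | sed 's|.*|zero/&.data|' | xargs touch
```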
#!/bin/bash
set -euo pipefail

mkdir 3k
for i in {1..1000000}; do
    truncate -s 3k "3k/$i.data"
done
Follow the gomnia example to generate file sizes drawn from a particular distribution with a given total size. Once completed:
#!/bin/bash
set -euo pipefail

mkdir 1TB
i=0
# Sizes are read one per line from standard input.
while read -r size; do
    truncate -s "$size" "1TB/$i.data"
    ((i+=1))
done
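The script above consumes one size per line on standard input, so the gomnia output can be piped straight in. As a stand-in for testing the plumbing, the following generates n random sizes; the uniform distribution and the 1 KiB to 1 MiB range are placeholders, not the gomnia example's:

```shell
#!/bin/bash
set -euo pipefail

# Placeholder size generator: n uniformly random sizes between
# 1 KiB and 1 MiB, one per line. Swap in gomnia for the real
# distribution and total size.
n=${1:-100}
awk -v n="$n" 'BEGIN {
    srand()
    for (i = 0; i < n; i++)
        print int(1024 + rand() * (1048576 - 1024))
}'
```

Pipe its output into the filler script, e.g. ./gen-sizes 1000000 | ./fill-1tb (both names hypothetical).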
Download for Linux:
https://github.com/calebcase/rclone/releases/tag/v1.50.2-362-g28d7db32-feature-storj-beta
Or you can build it yourself:
git clone https://github.com/calebcase/rclone
cd rclone
git checkout feature/storj
go build
Run rclone config and follow the interactive prompts. You will need a scope/access from uplink setup or uplink share.
For example, my config for the atlanta cluster contains something like:
[atlanta]
type = storj
scope = supersecretscope
skip-peer-ca-whitelist = true
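The same remote can also be created non-interactively with rclone config create. This is a sketch: the key names mirror the config sample above, and the scope value is a placeholder for your real scope/access:

```shell
rclone config create atlanta storj \
    scope supersecretscope \
    skip-peer-ca-whitelist true
```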
Then create a test bucket:

rclone mkdir atlanta:test
Create an upload script with the following:
#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}
concurrency=${4:-64}

date -u
time rclone --transfers "$concurrency" -v \
    copy "$dataset" "$site:test/$dataset.$attempt" &>> "$site.$dataset.$attempt.log"
This will copy the local directory $dataset to the backend $site. Invoke it like this:

./upload atlanta zero 1

You should end up with a local log file atlanta.zero.1.log.
Check the upload for errors as it progresses. In particular, we are interested in timeout events.
Create an errors script with the following:
#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}

log="$site.$dataset.$attempt.log"

(
    printf 'Now: %s\n' "$(date -u --iso=s)"

    general=$(
        (grep ERROR "$log" || true) |
        (grep -v 'already closed' || true) |
        wc -l
    )
    printf 'General: %d\n' "$general"

    timeouts=$(
        (grep ERROR "$log" || true) |
        (grep -v 'already closed' || true) |
        (grep 'timed out waiting on copy' || true) |
        wc -l
    )
    printf 'Timeouts: %d\n' "$timeouts"
) | column -t

# Bucket the timeout events by minute (log timestamps are local CET).
(grep ERROR "$log" || true) |
    (grep -v 'already closed' || true) |
    (grep 'timed out waiting on copy' || true) |
    awk '{print $1 " " $2 " CET"}' |
    xargs -I{} date -u --iso=m -d {} |
    uniq -c
Invoke errors like this:
./errors atlanta 3k 1
You should see output like this:
Now: 2020-01-21T12:16:23+00:00
General: 0
Timeouts: 0
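The tail of the errors script buckets timeout events per minute. Here is that same pipeline run against two fabricated log lines (the timestamps and filenames are invented for illustration; CET is the log's local timezone, as in the script):

```shell
# Two fake rclone log lines sharing the same minute; the pipeline
# converts their local CET timestamps to UTC and counts events
# per minute with uniq -c.
printf '%s\n' \
    '2020/01/21 12:01:10 ERROR : 5.data: timed out waiting on copy' \
    '2020/01/21 12:01:55 ERROR : 9.data: timed out waiting on copy' |
    awk '{print $1 " " $2 " CET"}' |
    xargs -I{} date -u --iso=m -d {} |
    uniq -c
```

Both lines fall in the same CET minute, so they collapse to a single UTC bucket with a count of 2.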
Create a listings script with the following:
#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}

printf 'Recursive Listing\n'
date -u
time rclone ls "$site:test/$dataset.$attempt" | wc -l

printf '\nNon-recursive Listing\n'
date -u
time rclone lsf "$site:test/$dataset.$attempt" | wc -l
Invoke listings like this:

./listings atlanta 3k 1
You should see output like this:
Recursive Listing
Tue 21 Jan 2020 11:45:11 AM UTC
87744
real 1m23.694s
user 0m8.545s
sys 0m1.133s
Non-recursive Listing
Tue 21 Jan 2020 11:46:35 AM UTC
90945
real 0m58.549s
user 0m8.117s
sys 0m0.456s