Prepare datasets

Zero files

#!/bin/bash
set -euo pipefail

# Create 1,000,000 empty (zero-byte) files.
mkdir zero
for i in {1..1000000}; do
  touch "zero/$i.data"
done

3k files

#!/bin/bash
set -euo pipefail

# Create 1,000,000 sparse files of 3 KiB each.
mkdir 3k
for i in {1..1000000}; do
  truncate -s 3k "3k/$i.data"
done

Others

Follow the gomnia example to generate file sizes with a particular distribution and total size. Once completed, feed the generated sizes to the following script on stdin, one size per line:

#!/bin/bash
set -euo pipefail

mkdir 1TB

# Read one file size per line from stdin and create a sparse file of that size.
i=0
while read -r size; do
  truncate -s "$size" "1TB/$i.data"
  ((i+=1))
done
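
For example, assuming the script above is saved as make-1tb and the generated sizes are in sizes.txt, one size per line (both filenames are illustrative):

./make-1tb < sizes.txt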

Install rclone with the storj patch

Download for Linux:

https://github.com/calebcase/rclone/releases/tag/v1.50.2-362-g28d7db32-feature-storj-beta

Or you can build it yourself:

git clone https://github.com/calebcase/rclone
cd rclone
git checkout feature/storj
go build
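
Either way, assuming the resulting binary is named rclone and sits in the current directory, you can confirm it runs and reports the patched version:

./rclone version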

Configure rclone

Run rclone config and follow the interactive prompts. You will need a scope (access grant) from uplink setup or uplink share.

For example, my config for the atlanta cluster contains something like:

[atlanta]
type = storj
scope = supersecretscope
skip-peer-ca-whitelist = true
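
As a quick check that the remote registered, you can list the configured remotes; atlanta: should appear in the output:

rclone listremotes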

Make target bucket

rclone mkdir atlanta:test
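
You can verify the bucket was created by listing the buckets on the remote:

rclone lsd atlanta: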

Upload

Create an upload script with the following:

#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}
concurrency=${4:-64}

# Copy the dataset with $concurrency parallel transfers, appending all
# rclone output to a per-site, per-dataset, per-attempt log file.
date -u
time rclone --transfers "$concurrency" -v \
  copy "$dataset" "$site:test/$dataset.$attempt" &>> "$site.$dataset.$attempt.log"

This will copy the local directory $dataset to the backend $site. Invoke it like this:

./upload atlanta zero 1

You should end up with a local log file atlanta.zero.1.log.
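
To run the full suite against one site, you can loop over the datasets (shown here for attempt 1):

for dataset in zero 3k 1TB; do
  ./upload atlanta "$dataset" 1
done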

Errors

Check the upload for errors as it progresses. In particular, we are interested in timeout events.

Create an errors script with the following:

#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}

(
  printf 'Now: %s\n' "$(date -u --iso=s)"

  # All errors, ignoring the 'already closed' ones.
  general=$(
    (grep ERROR "$site.$dataset.$attempt.log" || true) |
      (grep -v 'already closed' || true) |
      wc -l
  )
  printf 'General: %d\n' "$general"

  # Just the copy timeout errors.
  timeouts=$(
    (grep ERROR "$site.$dataset.$attempt.log" || true) |
      (grep -v 'already closed' || true) |
      (grep 'timed out waiting on copy' || true) |
      wc -l
  )
  printf 'Timeouts: %d\n' "$timeouts"
) | column -t

# Bucket the timeout errors by minute: the log timestamps are local
# (CET here), so tag them with the zone and convert to UTC.
(grep ERROR "$site.$dataset.$attempt.log" || true) |
  (grep -v 'already closed' || true) |
  (grep 'timed out waiting on copy' || true) |
  awk '{print $1 " " $2 " CET"}' |
  xargs -I{} date -u --iso=m -d {} |
  uniq -c

Invoke errors like this:

./errors atlanta 3k 1

You should see output like this:

Now:       2020-01-21T12:16:23+00:00
General:   0
Timeouts:  0
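
To keep an eye on an in-progress upload, one option is to rerun the script periodically with watch:

watch -n 60 ./errors atlanta 3k 1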

Listings

Create a listings script with the following:

#!/bin/bash
set -euo pipefail

site=${1?site name}
dataset=${2?path to dataset}
attempt=${3?attempt number}

printf 'Recursive Listing\n'
date -u
time rclone ls "$site:test/$dataset.$attempt" | wc -l

printf '\nNon-recursive Listing\n'
date -u
time rclone lsf "$site:test/$dataset.$attempt" | wc -l

Invoke listings like this:

./listings atlanta 3k 1

You should see output like this:

Recursive Listing
Tue 21 Jan 2020 11:45:11 AM UTC
87744

real    1m23.694s
user    0m8.545s
sys     0m1.133s

Non-recursive Listing
Tue 21 Jan 2020 11:46:35 AM UTC
90945

real    0m58.549s
user    0m8.117s
sys     0m0.456s
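
As a sanity check, you can compare the remote object count and total size against the local dataset; rclone size reports both:

rclone size atlanta:test/3k.1
find 3k -type f | wc -l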
