The exercise is to compress a text file supplied on standard input using the method given as the first argument ("zip", "7z", or "tar").
The evaluation pipeline checks that the archive produced by the user's solution can be extracted and that the extracted data is identical to the original data. The generated input is also compressed by a reference implementation, and the sizes of the two archives are compared.
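The two checks described above can be sketched in Python for the "zip" method only. This is an illustration under assumptions, not the actual judge: the helper names `reference_zip`, `extract_zip`, and `evaluate` are hypothetical, the standard `zipfile` module stands in for the reference compressor, and the size comparison is reported as a plain quotient because the exact scoring formula is not given here.

```python
import io
import zipfile


def reference_zip(data: bytes) -> bytes:
    """Compress data into an in-memory zip archive (stand-in for the reference compressor)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("text.in", data)
    return buf.getvalue()


def extract_zip(archive: bytes) -> bytes:
    """Return the content of the single file stored in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
        if len(names) != 1:
            raise ValueError("expected exactly one file in the archive")
        return zf.read(names[0])


def evaluate(original: bytes, submitted: bytes) -> tuple[bool, float]:
    """Return (round trip is correct, submitted size / reference size)."""
    try:
        correct = extract_zip(submitted) == original
    except (zipfile.BadZipFile, ValueError):
        correct = False
    # Assumption: a quotient below 1.0 means the submission beat the reference.
    ratio = len(submitted) / len(reference_zip(original))
    return correct, ratio
```

A submission that is byte-for-byte as large as the reference archive yields a quotient of 1.0, and an unreadable archive fails the correctness check regardless of its size.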
The pipeline can be defined in YAML:
tests:
  common:
    pipeline: "compress"
  cases:
    test1:
      length: 1000
      method: "zip"
    test2:
      length: 1000
      method: "7z"
    test3:
      length: 1000000
      method: "zip"
    test4:
      length: 1000000
      method: "tar"
    test5:
      length: 1000000000
      method: "zip"
    test6:
      length: 1000000000
      method: "7z"
    test7:
      length: 1000000000
      method: "tar"
pipelines:
  compress:
    output:
      score: "j"
    generator:
      type: "recodex/random-text-generator"
      options:
        # seed: 12345
        length: "${length}"
      input:
      output:
        text: "g"
    compile:
      type: "recodex/compilation"
      input:
        source_files:
          - "...${source_files}"
      output:
        executable: "e"
    exec:
      type: "recodex/execution"
      input:
        executable: "${e}"
        args:
          - "${method}"
        files:
          # the generated text, not the executable, is passed as the input file
          "text.in": "${g}"
      output:
        files:
          "archive.out": "z"
    compress:
      type: "recodex/compression"
      input:
        archive: "${g}"
        method: "${method}"
      output: "c"
    extract:
      type: "recodex/extraction"
      input:
        archive: "${z}"
        method: "${method}"
      output: "u"
    judge_correctness:
      type: "recodex/diff"
      options:
        encoding: "utf-8"
        mode: "strict"
      input:
        expected: "${g}"
        actual: "${u}"
      output:
        score: "jc"
    judge_ratio:
      type: "recodex/ratio"
      options:
        mode: "smaller-is-better"
      input:
        expected: "@file_size(${c})"
        actual: "@file_size(${z})"
      output:
        score: "jr"
    judge:
      type: "recodex/min"
      input:
        - "${jc}"
        - "${jr}"
      output:
        score: "j"
The pipeline flow is described by the following DOT graph:
strict digraph {
rankdir=LR;
node [ shape=ellipse, color=green ] source_files;
node [ shape=note, color=grey ] generator method;
node [ shape=box, color=blue ] compress extract compile exec judge judge_correctness judge_ratio;
node [ shape=doublecircle, color=green ] score;
source_files -> compile [ label="s" ];
generator -> exec [ label="g" ];
compile -> exec [ label="e" ];
method -> exec [ label="method" ];
generator -> compress [ label="g" ];
method -> compress [ label="method" ];
exec -> extract [ label="z" ];
method -> extract [ label="method" ];
extract -> judge_correctness [ label="u" ];
generator -> judge_correctness [ label="g" ];
exec -> judge_ratio [ label="z" ];
compress -> judge_ratio [ label="c" ];
judge_correctness -> judge [ label="jc" ];
judge_ratio -> judge [ label="jr" ];
judge -> score [ label="j" ]
}
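One way to read the graph is as a dependency order: a node can run once all nodes with edges pointing into it have finished. The sketch below copies the edges of the graph into a dictionary and uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to derive one valid order. The names `DEPENDENCIES` and `execution_order` are hypothetical and not part of any pipeline API.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it depends on, copied from the edges of the DOT graph
DEPENDENCIES = {
    "compile": {"source_files"},
    "exec": {"generator", "compile", "method"},
    "compress": {"generator", "method"},
    "extract": {"exec", "method"},
    "judge_correctness": {"extract", "generator"},
    "judge_ratio": {"exec", "compress"},
    "judge": {"judge_correctness", "judge_ratio"},
    "score": {"judge"},
}


def execution_order(deps):
    """Return one valid order in which the nodes can be processed."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEPENDENCIES)))
```

Several orders are valid (for instance, `compress` and `extract` are independent of each other), so the exact sequence printed is not significant; only the relative order of dependent nodes is.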
The variable names should be "obfuscated" to support expansion and optimization. One way to do this is a "functional" naming scheme, in which each output variable is written as a function of its input variables. For example, an output variable "A" that depends on input variables "R" and "X" is obfuscated as "A(R, X)": the parameters are sorted lexicographically and separated by a comma and a space. Variables that change over iterations are marked with an indexer-like square bracket syntax containing an iterator variable.
strict digraph {
rankdir=LR;
node [ shape=ellipse, color=green ] source_files;
node [ shape=note, color=grey ] generator method;
node [ shape=box, color=blue ] compress extract compile exec judge judge_correctness judge_ratio;
node [ shape=doublecircle, color=green ] score;
source_files -> compile [ label="s" ];
generator -> exec [ label="g[i]" ];
compile -> exec [ label="c(s)" ];
method -> exec [ label="m[j]" ];
generator -> compress [ label="g[i]" ];
method -> compress [ label="m[j]" ];
exec -> extract [ label="e(c(s), g[i], m[j])" ];
method -> extract [ label="m[j]" ];
extract -> judge_correctness [ label="u(e(c(s), g[i], m[j]), m[j])" ];
generator -> judge_correctness [ label="g[i]" ];
exec -> judge_ratio [ label="e(c(s), g[i], m[j])" ];
compress -> judge_ratio [ label="z(g[i], m[j])" ];
judge_correctness -> judge [ label="jc(g[i], u(e(...), m[j]))" ];
judge_ratio -> judge [ label="jr(e(...), z(g[i], m[j]))" ];
judge -> score [ label="j(jc(...), jr(...))" ];
}
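The functional naming rule can be sketched as a small Python helper. The function name `obfuscate` is hypothetical; the rule itself (sort the parameters lexicographically, join them with a comma and a space) is the one stated above, and nested names are built by passing already obfuscated names as inputs.

```python
def obfuscate(name, inputs=()):
    """Build the functional name of an output variable from its input variables."""
    if not inputs:
        return name
    # Parameters are sorted lexicographically and joined with ", ".
    return f"{name}({', '.join(sorted(inputs))})"


if __name__ == "__main__":
    compiled = obfuscate("c", ["s"])
    archive = obfuscate("e", ["m[j]", "g[i]", compiled])
    extracted = obfuscate("u", [archive, "m[j]"])
    print(compiled)   # c(s)
    print(archive)    # e(c(s), g[i], m[j])
    print(extracted)  # u(e(c(s), g[i], m[j]), m[j])
```

Because the name is derived only from the inputs, two outputs with the same name are guaranteed to come from the same inputs, which is what makes them comparable later.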
Once the variables are obfuscated, the test cases can be instantiated. Nodes that have the same type, the same options, and the same sets of input and output variables can then be merged into one (e.g., the compilation is done only once). The resulting graph can be turned into a JobConfig for a given programming language or runtime environment:
@todo
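The instantiate-and-merge step can be sketched in Python as follows. This is an illustration only: the names `CASES`, `instantiate`, and `merge` are hypothetical, only five of the nodes are modelled, the iterator variables are replaced directly by the concrete length and method, and the generator is assumed to be deterministic for a given length (which would require a fixed seed). Nodes whose instantiated output names coincide are treated as mergeable.

```python
# (length, method) per test case, copied from the YAML definition
CASES = {
    "test1": (1000, "zip"),
    "test2": (1000, "7z"),
    "test3": (1000000, "zip"),
    "test4": (1000000, "tar"),
    "test5": (1000000000, "zip"),
    "test6": (1000000000, "7z"),
    "test7": (1000000000, "tar"),
}


def instantiate(length, method):
    """Return node name -> instantiated output variable for one test case."""
    g = f"g[{length}]"
    m = f"m[{method}]"
    c = "c(s)"
    e = f"e({c}, {g}, {m})"
    z = f"z({g}, {m})"
    u = f"u({e}, {m})"
    return {"generator": g, "compile": c, "exec": e, "compress": z, "extract": u}


def merge(cases):
    """Group test cases by (node, output variable); equal keys mean one shared node."""
    merged = {}
    for test, (length, method) in cases.items():
        for node, var in instantiate(length, method).items():
            merged.setdefault((node, var), []).append(test)
    return merged


if __name__ == "__main__":
    for (node, var), tests in merge(CASES).items():
        print(f"{node:10} {var:45} used by {', '.join(tests)}")
```

Under these assumptions the seven test cases share a single compilation node and three generator nodes (one per length), while the execution, compression, and extraction nodes stay separate because every case has a distinct combination of length and method.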