Skip to content

Instantly share code, notes, and snippets.

@simonrozsival
Last active April 23, 2017 13:13
Show Gist options
  • Save simonrozsival/745aa83d884d29431b3d6b1344417b24 to your computer and use it in GitHub Desktop.
Save simonrozsival/745aa83d884d29431b3d6b1344417b24 to your computer and use it in GitHub Desktop.
Compression algorithm ReCodEx exercise - configuration specification proposal

Compression exercise

The exercise is to compress a text file supplied through standard input using the method given as the first argument ("zip", "7z", "tar").

The evaluation pipeline evaluates whether the archive created by the user's algorithm can be extracted and the yielded data is the same as the original data. The generated input is compressed for reference and the file sizes are compared.

Pipeline:

The definition of the pipeline can be expressed by an YAML code:

tests:
    common:
        pipeline: "compress"
    cases:
        test1:
            length: 1000
            method: "zip"
        test2:
            length: 1000
            method: "7z"
        test3:
            length: 1000000
            method: "zip"
        test4:
            length: 1000000
            method: "tar"
        test5:
            length: 1000000000
            method: "zip"
        test6:
            length: 1000000000
            method: "7z"
        test7:
            length: 1000000000
            method: "tar"

pipelines:
    compress:
        output:
            score: "j"
        generator:
            type: "recodex/random-text-generator"
            options:
                # seed: 12345
                length: "${length}"
            input:
            output:
                text: "g"

        compile:
            type: "recodex/compilation"
            input:
                source_files:
                    - "...${source_files}"
            output:
                executable: "e"

        exec:
            type: "recodex/execution"
            input:
                executable: "${e}"
                args:
                    - "${method}"
                files:
                    "text.in": ${e}
            output:
                files:
                    "archive.out": "z"

        compress:
            type: "recodex/compression"
            input:
                archive: "${g}"
                method: "${method}"
            output: "c"

        extract:
            type: "recodex/extraction"
            input:
                archive: "${z}"
                method: "${method}"
            output: "u"

        judge_correctness:
            type: "recodex/diff"
            options:
                encoding: "utf-8"
                mode: "strict"
            input:
                expected: "${g}"
                actual: "${u}"
            output:
                score: "jc"

        judge_ratio:
            type: "recodex/ratio"
            options:
                mode: "smaller-is-better"
            input:
                expected: "@file_size(${c})"
                actual: "@file_size(${z})"
            output:
                score: "jr"

        judge:
            type: "recodex/min"
            input:
                - "${jc}"
                - "${jr}"
            output:
                score: "j"

Pipeline flow is defined by this DOT graph:

pipeline graph

strict digraph {

    rankdir=LR;

    node [ shape=ellipse, color=green ] source_files;
    node [ shape=note, color=grey ] generator method;
    node [ shape=box, color=blue ] compress extract compile exec judge judge_correctness judge_ratio;
    node [ shape=doublecircle, color=green ] score;

    source_files -> compile [ label="s" ];
    
    generator -> exec [ label="g" ];
    compile -> exec [ label="e" ];
    method -> exec [ label="method" ];
    
    generator -> compress [ label="g" ];
    method -> compress [ label="method" ];

    exec -> extract [ label="z" ];
    method -> extract [ lable="method" ]

    extract -> judge_correctness [ label="u" ];
    generator -> judge_correctness [ label="g" ];

    exec -> judge_ratio [ label="z" ];
    compress -> judge_ratio [ label="c" ];

    judge_correctness -> judge [ label="jc" ];
    judge_ratio -> judge [ label="jr" ];

    judge -> score [ label="j" ]

}

The variable names should be "obfuscated" for the sake of expansion and optimalization. The obfuscation can be done for example in a "functional" way: each output variable is in fact a function of the input variables. So for example an output variable "A", wich depends on input variables "R" and "X", would be obfuscated as "A(R, X)" -- the parameters are sorted lexically and separated by commas and whitespaces. The variables, which change over iterations are marked with indexer-like square brackets syntax with an iterator variable.

the same graph with obfiscated variables

strict digraph {

    rankdir=LR;

    node [ shape=ellipse, color=green ] source_files;
    node [ shape=note, color=grey ] generator method;
    node [ shape=box, color=blue ] compress extract compile exec judge judge_correctness judge_ratio;
    node [ shape=doublecircle, color=green ] score;

    source_files -> compile [ label="s" ];
    
    generator -> exec [ label="g[i]" ];
    compile -> exec [ label="c(s)" ];
    method -> exec [ label="m[j]" ];
    
    generator -> compress [ label="g[i]" ];
    method -> compress [ label="m[j]" ];

    exec -> extract [ label="e(c(s), g[i], m[j])" ];
    method -> extract [ lable="m[j]" ]

    extract -> judge_correctness [ label="u(e(c(s), g[i], m[j]), m[j])" ];
    generator -> judge_correctness [ label="g[i]" ];

    exec -> judge_ratio [ label="e(c(s), g[i], m[j])" ];
    compress -> judge_ratio [ label="z(g[i], m[j])" ];

    judge_correctness -> judge [ label="jc(u(e(...), g[i], m[j])" ];
    judge_ratio -> judge [ label="jr(e(...), z(g[i], m[j]))" ];

    judge -> score [ label="j(jr(...), jc(...))" ]

}

Pipeline expansion

When the variables are obfuscated, then the test cases can be instantiated and the nodes, which have the same set of input variables and output variables and have the same type and options can be merged into one (e.g., the compilation is done only once) and create a graph, which can be then turned into a JobConfig for a given programming language or runtime environment:

expanded pipeline

Job config

@todo

Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
strict digraph {
rankdir=LR;
node [ shape=ellipse, color=green ] source_files;
node [ shape=note, color=grey ] "generator[0]" "generator[1]" "generator[2]" "generator[3]" "generator[4]" "generator[5]" "generator[6]" "method[0]" "method[1]" "method[2]";
node [ shape=box, color=blue ] "compress[0][0]" "compress[1][1]" "compress[2][0]" "compress[3][1]" "compress[4][0]" "compress[5][1]" "compress[6][2]" "extract[0][0]" "extract[1][1]" "extract[2][0]" "extract[3][1]" "extract[4][0]" "extract[5][1]" "extract[6][2]" compile "exec[0][0]" "exec[1][1]" "exec[2][0]" "exec[3][1]" "exec[4][0]" "exec[5][1]" "exec[6][2]" "judge[0][0]" "judge[1][1]" "judge[2][0]" "judge[3][1]" "judge[4][0]" "judge[5][1]" "judge[6][2]" "judge_correctness[0][0]" "judge_correctness[1][1]" "judge_correctness[2][0]" "judge_correctness[3][1]" "judge_correctness[4][0]" "judge_correctness[5][1]" "judge_correctness[6][2]" "judge_ratio[0][0]" "judge_ratio[1][1]" "judge_ratio[2][0]" "judge_ratio[3][1]" "judge_ratio[4][0]" "judge_ratio[5][1]" "judge_ratio[6][2]";
node [ shape=doublecircle, color=green ] "score[0][0]" "score[1][1]" "score[2][0]" "score[3][1]" "score[4][0]" "score[5][1]" "score[6][2]";
source_files -> compile [ label="s" ];
compile -> "exec[0][0]" [ label="c(s)" ];
"generator[0]" -> "exec[0][0]" [ label="g[0]" ];
"method[0]" -> "exec[0][0]" [ label="m[0]" ];
"generator[0]" -> "compress[0][0]" [ label="g[0]" ];
"method[0]" -> "compress[0][0]" [ label="m[0]" ];
"exec[0][0]" -> "extract[0][0]" [ label="e(c(s), g[0], m[0])" ];
"method[0]" -> "extract[0][0]" [ lable="m[0]" ]
"extract[0][0]" -> "judge_correctness[0][0]" [ label="u(e(c(s), g[0], m[0]), m[0])" ];
"generator[0]" -> "judge_correctness[0][0]" [ label="g[0]" ];
"exec[0][0]" -> "judge_ratio[0][0]" [ label="e(c(s), g[0], m[0])" ];
"compress[0][0]" -> "judge_ratio[0][0]" [ label="z(g[0], m[0])" ];
"judge_correctness[0][0]" -> "judge[0][0]" [ label="jc(u(e(c(s), g[0], m[0]), g[0], m[0])" ];
"judge_ratio[0][0]" -> "judge[0][0]" [ label="jr(e(c(s), g[0], m[0]), z(g[0], m[0]))" ];
"judge[0][0]" -> "score[0][0]" [ label="j(jc(u(e(c(s), g[0], m[0]), g[0], m[0]), jr(e(c(s), g[0], m[0]), z(g[0], m[0])))" ]
compile -> "exec[1][1]" [ label="c(s)" ];
"generator[1]" -> "exec[1][1]" [ label="g[1]" ];
"method[1]" -> "exec[1][1]" [ label="m[1]" ];
"generator[1]" -> "compress[1][1]" [ label="g[1]" ];
"method[1]" -> "compress[1][1]" [ label="m[1]" ];
"exec[1][1]" -> "extract[1][1]" [ label="e(c(s), g[1], m[1])" ];
"method[1]" -> "extract[1][1]" [ lable="m[1]" ]
"extract[1][1]" -> "judge_correctness[1][1]" [ label="u(e(c(s), g[1], m[1]), m[1])" ];
"generator[1]" -> "judge_correctness[1][1]" [ label="g[1]" ];
"exec[1][1]" -> "judge_ratio[1][1]" [ label="e(c(s), g[1], m[1])" ];
"compress[1][1]" -> "judge_ratio[1][1]" [ label="z(g[1], m[1])" ];
"judge_correctness[1][1]" -> "judge[1][1]" [ label="jc(u(e(c(s), g[1], m[1]), g[1], m[1])" ];
"judge_ratio[1][1]" -> "judge[1][1]" [ label="jr(e(c(s), g[1], m[1]), z(g[1], m[1]))" ];
"judge[1][1]" -> "score[1][1]" [ label="j(jc(u(e(c(s), g[1], m[1]), g[1], m[1]), jr(e(c(s), g[1], m[1]), z(g[1], m[1])))" ]
compile -> "exec[2][0]" [ label="c(s)" ];
"generator[2]" -> "exec[2][0]" [ label="g[2]" ];
"method[0]" -> "exec[2][0]" [ label="m[0]" ];
"generator[2]" -> "compress[2][0]" [ label="g[2]" ];
"method[0]" -> "compress[2][0]" [ label="m[0]" ];
"exec[2][0]" -> "extract[2][0]" [ label="e(c(s), g[2], m[0])" ];
"method[0]" -> "extract[2][0]" [ lable="m[0]" ]
"extract[2][0]" -> "judge_correctness[2][0]" [ label="u(e(c(s), g[2], m[0]), m[0])" ];
"generator[2]" -> "judge_correctness[2][0]" [ label="g[2]" ];
"exec[2][0]" -> "judge_ratio[2][0]" [ label="e(c(s), g[2], m[0])" ];
"compress[2][0]" -> "judge_ratio[2][0]" [ label="z(g[2], m[0])" ];
"judge_correctness[2][0]" -> "judge[2][0]" [ label="jc(u(e(c(s), g[2], m[0]), g[2], m[0])" ];
"judge_ratio[2][0]" -> "judge[2][0]" [ label="jr(e(c(s), g[2], m[0]), z(g[2], m[0]))" ];
"judge[2][0]" -> "score[2][0]" [ label="j(jc(u(e(c(s), g[2], m[0]), g[2], m[0]), jr(e(c(s), g[2], m[0]), z(g[2], m[0])))" ]
compile -> "exec[3][1]" [ label="c(s)" ];
"generator[3]" -> "exec[3][1]" [ label="g[3]" ];
"method[1]" -> "exec[3][1]" [ label="m[1]" ];
"generator[3]" -> "compress[3][1]" [ label="g[3]" ];
"method[1]" -> "compress[3][1]" [ label="m[1]" ];
"exec[3][1]" -> "extract[3][1]" [ label="e(c(s), g[3], m[1])" ];
"method[1]" -> "extract[3][1]" [ lable="m[1]" ]
"extract[3][1]" -> "judge_correctness[3][1]" [ label="u(e(c(s), g[3], m[1]), m[1])" ];
"generator[3]" -> "judge_correctness[3][1]" [ label="g[3]" ];
"exec[3][1]" -> "judge_ratio[3][1]" [ label="e(c(s), g[3], m[1])" ];
"compress[3][1]" -> "judge_ratio[3][1]" [ label="z(g[3], m[1])" ];
"judge_correctness[3][1]" -> "judge[3][1]" [ label="jc(u(e(c(s), g[3], m[1]), g[3], m[1])" ];
"judge_ratio[3][1]" -> "judge[3][1]" [ label="jr(e(c(s), g[3], m[1]), z(g[3], m[1]))" ];
"judge[3][1]" -> "score[3][1]" [ label="j(jc(u(e(c(s), g[3], m[1]), g[3], m[1]), jr(e(c(s), g[3], m[1]), z(g[3], m[1])))" ]
compile -> "exec[4][0]" [ label="c(s)" ];
"generator[4]" -> "exec[4][0]" [ label="g[4]" ];
"method[0]" -> "exec[4][0]" [ label="m[0]" ];
"generator[4]" -> "compress[4][0]" [ label="g[4]" ];
"method[0]" -> "compress[4][0]" [ label="m[0]" ];
"exec[4][0]" -> "extract[4][0]" [ label="e(c(s), g[4], m[0])" ];
"method[0]" -> "extract[4][0]" [ lable="m[0]" ]
"extract[4][0]" -> "judge_correctness[4][0]" [ label="u(e(c(s), g[4], m[0]), m[0])" ];
"generator[4]" -> "judge_correctness[4][0]" [ label="g[4]" ];
"exec[4][0]" -> "judge_ratio[4][0]" [ label="e(c(s), g[4], m[0])" ];
"compress[4][0]" -> "judge_ratio[4][0]" [ label="z(g[4], m[0])" ];
"judge_correctness[4][0]" -> "judge[4][0]" [ label="jc(u(e(c(s), g[4], m[0]), g[4], m[0])" ];
"judge_ratio[4][0]" -> "judge[4][0]" [ label="jr(e(c(s), g[4], m[0]), z(g[4], m[0]))" ];
"judge[4][0]" -> "score[4][0]" [ label="j(jc(u(e(c(s), g[4], m[0]), g[4], m[0]), jr(e(c(s), g[4], m[0]), z(g[4], m[0])))" ]
compile -> "exec[5][1]" [ label="c(s)" ];
"generator[5]" -> "exec[5][1]" [ label="g[5]" ];
"method[1]" -> "exec[5][1]" [ label="m[1]" ];
"generator[5]" -> "compress[5][1]" [ label="g[5]" ];
"method[1]" -> "compress[5][1]" [ label="m[1]" ];
"exec[5][1]" -> "extract[5][1]" [ label="e(c(s), g[5], m[1])" ];
"method[1]" -> "extract[5][1]" [ lable="m[1]" ]
"extract[5][1]" -> "judge_correctness[5][1]" [ label="u(e(c(s), g[5], m[1]), m[1])" ];
"generator[5]" -> "judge_correctness[5][1]" [ label="g[5]" ];
"exec[5][1]" -> "judge_ratio[5][1]" [ label="e(c(s), g[5], m[1])" ];
"compress[5][1]" -> "judge_ratio[5][1]" [ label="z(g[5], m[1])" ];
"judge_correctness[5][1]" -> "judge[5][1]" [ label="jc(u(e(c(s), g[5], m[1]), g[5], m[1])" ];
"judge_ratio[5][1]" -> "judge[5][1]" [ label="jr(e(c(s), g[5], m[1]), z(g[5], m[1]))" ];
"judge[5][1]" -> "score[5][1]" [ label="j(jc(u(e(c(s), g[5], m[1]), g[5], m[1]), jr(e(c(s), g[5], m[1]), z(g[5], m[1])))" ]
compile -> "exec[6][2]" [ label="c(s)" ];
"generator[6]" -> "exec[6][2]" [ label="g[6]" ];
"method[2]" -> "exec[6][2]" [ label="m[2]" ];
"generator[6]" -> "compress[6][2]" [ label="g[6]" ];
"method[2]" -> "compress[6][2]" [ label="m[2]" ];
"exec[6][2]" -> "extract[6][2]" [ label="e(c(s), g[6], m[2])" ];
"method[2]" -> "extract[6][2]" [ lable="m[2]" ]
"extract[6][2]" -> "judge_correctness[6][2]" [ label="u(e(c(s), g[6], m[2]), m[2])" ];
"generator[6]" -> "judge_correctness[6][2]" [ label="g[6]" ];
"exec[6][2]" -> "judge_ratio[6][2]" [ label="e(c(s), g[6], m[2])" ];
"compress[6][2]" -> "judge_ratio[6][2]" [ label="z(g[6], m[2])" ];
"judge_correctness[6][2]" -> "judge[6][2]" [ label="jc(u(e(c(s), g[6], m[2]), g[6], m[2])" ];
"judge_ratio[6][2]" -> "judge[6][2]" [ label="jr(e(c(s), g[6], m[2]), z(g[6], m[2]))" ];
"judge[6][2]" -> "score[6][2]" [ label="j(jc(u(e(c(s), g[6], m[2]), g[6], m[2]), jr(e(c(s), g[6], m[2]), z(g[6], m[2])))" ]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment