@msm-code
Created December 28, 2021 23:26
Notes from my karton benchmarking setup

Karton benchmark setup notes

  1. Bought a DigitalOcean VM
  2. Setup
apt update
apt install docker-compose docker.io python3.8-venv
git clone https://github.com/CERT-Polska/karton-playground.git
cd karton-playground
sudo docker-compose up
  1. Logged in and set up local port forwards
ssh root@xxxxxxxx -L 8080:127.0.0.1:8080 -L 8030:127.0.0.1:8030 -L 8090:127.0.0.1:8090 -L 6379:127.0.0.1:6379
  1. Set up the karton-strings example for good measure
cp config/karton.ini karton-strings/karton.ini
python3 karton-strings/karton-strings.py
  1. Created a karton producer following the docs (I would swear it's already integrated in some OS component, huh)
mkdir karton-producer
cp karton-strings/karton.ini karton-producer/
vim karton-producer/producer.py

File contents:

import sys
from karton.core import Config, Producer, Task, Resource

config = Config("karton.ini")
producer = Producer(config)

filename = sys.argv[1]
with open(filename, "rb") as f:
    contents = f.read()

resource = Resource(os.path.basename(filename), contents)

task = Task({"type": "sample", "kind": "raw"})

task.add_resource("sample", resource)
task.add_payload("tags", ["simple_producer"])
task.add_payload("additional_info", ["This sample has been added by simple producer example"])

logging.info('pushing file to karton %s, task %s' % (name, task))
producer.send_task(task)
  1. Test it:
(venv) root@kartonbench:~/karton-playground/karton-producer# python3 producer.py /bin/ls
Traceback (most recent call last):
  File "producer.py", line 11, in <module>
    resource = Resource(os.path.basename(filename), contents)
NameError: name 'os' is not defined
  1. Welp. Fixed the example and created a PR: https://github.com/CERT-Polska/karton/compare/master...msm-code:patch-1. In the end it doesn't work anyway, because mwdb-reporter expects to manage karton tasks (it fails with an error about a missing karton-analysis). I've removed mwdb-reporter and replaced it with consume.py:
from karton.core import Karton, Task
import sys
from datetime import datetime

# Number of tasks to consume before reporting the elapsed time
HOWMANY = int(sys.argv[1])
PROCESSED = 0


class GenericUnpacker(Karton):
    # Takes over the identity of the removed reporter
    identity = "karton.mwdb-reporter"
    filters = [
        {
            "type": "sample",
        }
    ]

    def process(self, task: Task) -> None:
        global PROCESSED
        if PROCESSED == 0:
            # The clock starts when the first task arrives
            self.start = datetime.now()

        PROCESSED += 1
        if PROCESSED == HOWMANY:
            print("done")
            print("took", (datetime.now() - self.start).total_seconds())
            sys.exit()

if __name__ == "__main__":
    # Here comes the main loop
    GenericUnpacker().loop()

(I've also added a parameter to producer to produce more tasks in a loop).
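
The looping producer itself isn't shown in these notes. A minimal sketch of what it could look like, with the missing `import os` added and the logging line (which uses an undefined `logging` and `name`) dropped. The optional `COUNT` argument and the `parse_args` and `main` helper names are hypothetical, not taken from the actual script:

```python
import os
import sys


def parse_args(argv):
    """Return (filename, count); count defaults to 1 when omitted (assumed CLI shape)."""
    filename = argv[1]
    count = int(argv[2]) if len(argv) > 2 else 1
    return filename, count


def main(argv):
    # Imported lazily so parse_args can be used without karton installed
    from karton.core import Config, Producer, Resource, Task

    filename, count = parse_args(argv)
    with open(filename, "rb") as f:
        contents = f.read()

    producer = Producer(Config("karton.ini"))
    for _ in range(count):
        # Build a fresh task and resource for every iteration
        resource = Resource(os.path.basename(filename), contents)
        task = Task({"type": "sample", "kind": "raw"})
        task.add_resource("sample", resource)
        task.add_payload("tags", ["simple_producer"])
        producer.send_task(task)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: producer.py FILE [COUNT]")
    else:
        main(sys.argv)
```

Under these assumptions, `python3 producer.py /bin/ls 2000` would push 2000 copies of the same sample.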

  1. Final step: removed karton-system from the docker-compose setup and installed it from source
git clone https://github.com/CERT-Polska/karton.git
cd karton
python3 -m venv venv
source ./venv/bin/activate
pip install .
[copy karton.ini]
karton-system --setup-bucket
  1. Ok, time for a first test. Started a producer with 2000 tasks (so the "reporter" will receive 6000 tasks in total: 2000 raw, 2000 from strings, and 2000 from classifier). This took:
  • producer: 0m18.790s
  • consumer: 55.294401s (about 36 initial tasks per second)

This is not enough, since we want to stress test the GC too. I've increased the number of produced tasks to 20000. Times:

  • producer: 3m15.755s (10.42x slower)
  • consumer: 614.747157s (32.53 initial tasks per second, 11.12x slower)
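
The rates and slowdown factors can be recomputed from the raw timings. A small sketch, where `to_seconds` and `summarize` are hypothetical helper names and the task counts are the initial tasks sent by the producer (2000 and 20000):

```python
import re


def to_seconds(text):
    """Convert timings such as '3m15.755s' or '55.294401s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


def summarize(label, tasks_small, time_small, tasks_big, time_big):
    """Return per-run throughput and the slowdown between the two runs."""
    small, big = to_seconds(time_small), to_seconds(time_big)
    return {
        "label": label,
        "rate_small": tasks_small / small,  # initial tasks per second
        "rate_big": tasks_big / big,
        "slowdown": big / small,
    }


runs = [
    summarize("producer", 2000, "0m18.790s", 20000, "3m15.755s"),
    summarize("consumer", 2000, "55.294401s", 20000, "614.747157"),
]
for run in runs:
    print(
        f"{run['label']}: {run['rate_small']:.2f} -> {run['rate_big']:.2f} tasks/s, "
        f"{run['slowdown']:.2f}x slower"
    )
```

Both slowdown factors come out slightly above 10 for a tenfold increase in tasks, so throughput drops a little at the larger task count.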
  1. To avoid optimising the wrong thing, I've started with the obvious and redirected all logs from the terminal to /dev/null (printing to a terminal can accidentally skew the results in many ways: the terminal may be slow, the speed may be limited by the bandwidth to my VPS, or other irrelevant things). This didn't change the results significantly.

  2. Helper script to run things under cProfile:

(venv) root@kartonbench:~/karton# cat run.py
from karton.system import SystemService
import io
import cProfile, pstats

# Profile the whole karton-system run (Profile as a context manager needs Python 3.8+)
with cProfile.Profile() as pr:
    try:
        SystemService.main()
    except Exception as e:
        print(e)

# Dump the collected stats, sorted by cumulative time
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumtime')
ps.print_stats()
print(s.getvalue())
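
The full `print_stats()` dump can get long. A possible variation, sketched here with a hypothetical `profile_call` helper and a stand-in `busy` workload, wraps any callable and trims the report to the top entries by cumulative time:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func under cProfile; return (result, report limited to `top` entries)."""
    profiler = cProfile.Profile()
    result = None
    try:
        result = profiler.runcall(func, *args, **kwargs)
    except Exception as exc:
        # Same idea as run.py: report the error but still produce the stats
        print(exc)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumtime")
    stats.print_stats(top)  # an integer argument limits the number of lines
    return result, stream.getvalue()


def busy():
    # Stand-in workload for illustration only
    return sum(i * i for i in range(100_000))


result, report = profile_call(busy, top=5)
print(report)
```

With karton installed, passing `SystemService.main` instead of `busy` would play the role of the `with` block in run.py.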