@pridhi-arora
Created October 14, 2022 18:32
Proposal to Store ChRIS Logs in Object Storage

Store ChRIS Logs in Object Storage

---
title: Store ChRIS Logs in Object Storage
authors:
  - "@PridhiArora"
reviewers:
  - 
  
creation-date: 2022-10-14
last-updated: 2022-10-14
status: implementable
---


Glossary

  • S3 - Amazon S3 is an object storage service that stores data as objects within buckets
  • Minio - Minio is an open-source distributed object storage server written in Go, designed for Private Cloud infrastructure providing S3 storage functionality

Summary

The ChRIS project currently stores log messages as strings in its Postgres database, which fills the database with relatively large strings and slows down the backend. One quick fix, already implemented, is to truncate each log string: right now only the first 3000 words are written to the database and the rest is discarded. This deprives us of a lot of information. Log strings are also decorated to make messages aesthetically pleasing, and removing unnecessary characters, such as emojis and Chinese characters, reduces the string size from 4000B to 2000B. Neither fix is enough, because log size increases daily. A better solution is needed, and this is where object storage comes into the picture.
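To make the size argument concrete, the following is a minimal sketch of how UTF-8 byte size differs from character count, and of how a byte-budget truncation discards the tail of a log. The helper names `utf8_size` and `truncate_utf8` are hypothetical and are not taken from the ChRIS codebase; the actual truncation logic in ChRIS may differ.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 (hypothetical helper).

    Slicing the encoded bytes can split a multi-byte character, so the
    incomplete trailing bytes are dropped when decoding.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    plain = "job finished"
    decorated = "job finished 🚀 完成"
    # Decorated messages need more bytes than their character count suggests.
    print(len(plain), utf8_size(plain))
    print(len(decorated), utf8_size(decorated))
    # Everything past the byte budget is lost, which is the information
    # loss described above.
    print(truncate_utf8(decorated, 14))
```

Emojis take four bytes each and most Chinese characters take three, which is why stripping decoration shrinks the stored string while truncation simply throws data away.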

Motivation

The absence of a proper storage system for logs slows down the backend and frustrates many users. The idea is to design log storage appropriately so that the ChRIS system runs seamlessly.

Goals

  1. To allow logs to be stored in object storage.
  2. To improve performance by using asynchronous data streams instead of buffering full responses.
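The second goal can be illustrated with a small sketch of consuming a log as an asynchronous stream of chunks rather than reading the whole response into memory first. This uses only the standard library `asyncio.StreamReader`; the function name `iter_log_chunks`, the chunk size, and the sample data are assumptions for illustration and do not reflect how the ChRIS backend currently reads logs.

```python
import asyncio


async def iter_log_chunks(reader: asyncio.StreamReader, chunk_size: int = 1024):
    """Yield the log in pieces of at most chunk_size bytes until EOF."""
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


async def demo():
    # Simulate an incoming log stream; in practice the reader would come
    # from a network connection.
    reader = asyncio.StreamReader()
    reader.feed_data(b"line 1\nline 2\nline 3\n")
    reader.feed_eof()

    sizes = []
    async for chunk in iter_log_chunks(reader, chunk_size=8):
        # Each chunk could be forwarded to storage here without ever
        # holding the complete log in memory.
        sizes.append(len(chunk))
    return sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape, memory use is bounded by the chunk size instead of the total log size.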

Proposal

User Stories

Story - Decreasing the latency in ChRIS backend.

Stacy works in the IT department of an organization that uses the ChRIS backend; she intends to use it regularly and needs logs to refer to. Storing logs in object storage would give her more readable logs.

Requirements

Functional Requirements

  • FR1: Run a Docker container with Minio.
  • FR2: Connect the Docker container to the ChRIS backend.
  • FR3: Store logs inside Minio.

Non-Functional Requirements

  • NFR1: Unit tests MUST exist for the object storage support.

Implementation Details/Notes/Constraints

Proposed Changes

Currently, the summary gets saved in Postgres.

Once object storage is implemented, the following lines will need to change.

self.c_plugin_inst.status = 'started'
self.c_plugin_inst.summary = self.get_job_status_summary()  # initial status
self.c_plugin_inst.raw = json_zip2str(d_resp)
self.c_plugin_inst.save()

The following code snippet, taken from the official Minio docs, can be used as a starting point for implementing the functionality.

from minio import Minio
from minio.error import S3Error


def main():
    # Create a client with the MinIO server playground, its access key
    # and secret key.
    client = Minio(
        "play.min.io",
        access_key="Q3AM3UQ867SPQQA43P2F",
        secret_key="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG",
    )

    # Make 'asiatrip' bucket if not exist.
    found = client.bucket_exists("asiatrip")
    if not found:
        client.make_bucket("asiatrip")
    else:
        print("Bucket 'asiatrip' already exists")

    # Upload '/home/user/Photos/asiaphotos.zip' as object name
    # 'asiaphotos-2015.zip' to bucket 'asiatrip'.
    client.fput_object(
        "asiatrip", "asiaphotos-2015.zip", "/home/user/Photos/asiaphotos.zip",
    )
    print(
        "'/home/user/Photos/asiaphotos.zip' is successfully uploaded as "
        "object 'asiaphotos-2015.zip' to bucket 'asiatrip'."
    )


if __name__ == "__main__":
    try:
        main()
    except S3Error as exc:
        print("error occurred.", exc)
Run File Uploader
$ python file_uploader.py
'/home/user/Photos/asiaphotos.zip' is successfully uploaded as object 'asiaphotos-2015.zip' to bucket 'asiatrip'.
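The example above uploads a file from disk, whereas the ChRIS summary is an in-memory string. The following is a minimal sketch, under stated assumptions, of how such a string could be written with `put_object`, which takes a readable stream and its length. The function `store_log`, the bucket name `chris-logs`, and the object name are hypothetical. `InMemoryClient` is a stand-in that mimics only the three `Minio` client methods used here so the sketch runs without a server; a real `Minio` client could be passed in its place.

```python
import io


class InMemoryClient:
    """Hypothetical stand-in for the three Minio client methods used below."""

    def __init__(self):
        self.buckets = {}

    def bucket_exists(self, bucket_name):
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name):
        self.buckets[bucket_name] = {}

    def put_object(self, bucket_name, object_name, data, length,
                   content_type="application/octet-stream"):
        # Read exactly `length` bytes from the stream, as an upload would.
        self.buckets[bucket_name][object_name] = data.read(length)


def store_log(client, bucket_name, object_name, log_text):
    """Upload a log string as a UTF-8 object and return its size in bytes."""
    payload = log_text.encode("utf-8")
    if not client.bucket_exists(bucket_name):
        client.make_bucket(bucket_name)
    client.put_object(
        bucket_name,
        object_name,
        io.BytesIO(payload),
        length=len(payload),
        content_type="text/plain; charset=utf-8",
    )
    return len(payload)


if __name__ == "__main__":
    client = InMemoryClient()
    # Bucket and object names are illustrative only.
    size = store_log(client, "chris-logs", "plugin-instance-1/summary.log",
                     "job finished 🚀")
    print(size)
```

Because the whole string is stored as an object, no truncation is needed and non-ASCII characters are preserved as UTF-8 bytes.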

Test Plan

  • Unit tests.

Graduation Criteria

  • Docker-supported Minio
  • Integration of ChRIS with Minio
  • Unit tests to test the functionality
@jennydaman

You did a good job describing the current behavior and its technical problems (storage of truncated logs in PostgreSQL).

Regarding the motivation for storing logs in object storage vs. a SQL DB, correct me if I am wrong, but I would assume that a SQL DB is more performant? The problem is that SQL is more limited, e.g. the value must be a valid UTF-8 string of fewer than 4,000 bytes (Postgres exploits these limitations to optimize performance). Chinese characters and emojis are not "unimportant"; they're very important. The problem is that they are unsupported.

ChRIS already depends on an S3-compatible object storage service called OpenStack Swift. We would rather use an existing service than introduce another dependency (Minio). Moreover, we should try to use a Django abstraction (wrapper) around the file/object storage API rather than using the client library directly. (The motivation for doing so is interoperability between object storage providers and leveraging features from the Django framework.)
