Saswata Dutta saswata-dutta

@saswata-dutta
saswata-dutta / json_cleaner.py
Created April 13, 2024 14:48 — forked from pepoluan/json_cleaner.py
Python-based JSON Cleanup Preprocessor
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
# Context: https://stackoverflow.com/a/56701174/149900
from io import StringIO
WHITESPACE = " \t\r\n"
import boto3
s3_resource = boto3.resource('s3')
objects = list(s3_resource.Bucket('my-bucket').objects.filter(Prefix='my-folder/'))
objects.sort(key=lambda o: o.last_modified)
print(objects[-1].key)
struct X {
unsigned int length;
char *str;
};
std::string serialize(const X &x) {
std::string s;
s.resize(sizeof(x.length) + x.length);
// c_str() returns a pointer to const, so write through &s[0] instead.
std::memcpy(&s[0], &x.length, sizeof(x.length));
std::memcpy(&s[0] + sizeof(x.length), x.str, x.length);
import numpy as np
from scipy.spatial import KDTree
rgb_list = np.array([(0, 255, 126), (255, 34, 121)]) # imagine 10k items
rgb_list_normalized = rgb_list / 255.0
tree = KDTree(rgb_list_normalized)
def custom_distance(rgb1, rgb2):
cat some.logs | ggrep -n -P -o '"orderId":"[0-9-]+"|x-account-id=[0-9]+|"errorCode":"\w+"' | gawk -F':' '{arr[$1]=arr[$1]","$0} END {for (i in arr) print arr[i]}'
,1:"orderId":"1",1:x-account-id=6398441113,1:"errorCode":"e1",1:"errorCode":"e11"
,3:"orderId":"2",3:x-account-id=6398441113,3:"errorCode":"e2",3:"errorCode":"e12"
,5:"orderId":"3",5:x-account-id=6398441113,5:"errorCode":"e3",5:"errorCode":"e13"
saswata-dutta / s3_select_key_prefix.py
Created August 8, 2023 10:26
s3 select over a "folder prefix"
import boto3
import datetime
s3 = boto3.client("s3")
bucket = "???"
prefix_base = "actionType=cancel"
query = """SELECT subjectid, s."timestamp" FROM s3object s where clientid = '???' and status = 'UNCANCELLABLE'"""
const { Signer } = require("@aws-sdk/rds-signer");
const mysql = require("mysql2/promise");
const headers = {
"Content-Type": "application/json",
};
const rdsProps = {
hostname: "???.???.ap-south-1.rds.amazonaws.com",
port: 3306,
saswata-dutta / logback.xml
Created July 20, 2023 09:21
logback size and time based rolling
<configuration>
<appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>./logs/myapp.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!-- hourly rollover: the %d pattern below includes the hour (HH) -->
<fileNamePattern>./logs/myapp-%d{yyyy-MM-dd_HH}.%i.log.gz</fileNamePattern>
<!-- keep 10 days' worth of history (240 hourly periods) capped at 3GB total size -->
<maxHistory>240</maxHistory>
<maxFileSize>100MB</maxFileSize>
saswata-dutta / ddb_based_file_system.md
Last active July 17, 2023 11:33
create a file system in DynamoDb

Index:

  1. Use a ULID as the identifier for every file and folder in the index, so that a rename does not require rewriting entries.
  2. Keep the S3 URL separate from the file and folder index, so that moving an entry does not require moving the S3 object.
  3. Limit folder depth to 3 or 5 levels in both the UI and the back end, to avoid bulky folder-level operations.
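
The three index rules above can be illustrated with a small in-memory model. This is only a sketch under assumptions: `FileIndex`, `MAX_DEPTH`, the dictionary layout, and the example S3 keys are hypothetical names chosen for illustration, and plain dictionaries stand in for the real tables.

```python
MAX_DEPTH = 5  # hypothetical limit; the note suggests 3 or 5


class FileIndex:
    """In-memory stand-in for the index: entries are keyed by ULID."""

    def __init__(self):
        self.entries = {}  # id -> {"name": display name, "parent": parent id or None}
        self.s3_keys = {}  # id -> S3 key, stored apart from the tree structure

    def depth(self, entry_id):
        # Count the entry and all of its ancestors; None (root) has depth 0.
        levels = 0
        while entry_id is not None:
            levels += 1
            entry_id = self.entries[entry_id]["parent"]
        return levels

    def add(self, entry_id, name, parent_id=None, s3_key=None):
        if self.depth(parent_id) + 1 > MAX_DEPTH:
            raise ValueError("folder depth limit exceeded")
        self.entries[entry_id] = {"name": name, "parent": parent_id}
        if s3_key is not None:
            self.s3_keys[entry_id] = s3_key

    def rename(self, entry_id, new_name):
        # Only the name attribute changes; the id and the S3 key stay put.
        self.entries[entry_id]["name"] = new_name

    def move(self, entry_id, new_parent_id):
        # Only the parent pointer changes; no S3 object is copied or moved.
        # Simplification: this checks the moved entry, not its whole subtree.
        if self.depth(new_parent_id) + 1 > MAX_DEPTH:
            raise ValueError("folder depth limit exceeded")
        self.entries[entry_id]["parent"] = new_parent_id

    def path(self, entry_id):
        # Rebuild the display path by walking up to the root.
        parts = []
        while entry_id is not None:
            entry = self.entries[entry_id]
            parts.append(entry["name"])
            entry_id = entry["parent"]
        return "/" + "/".join(reversed(parts))
```

After a `rename` or `move`, `path()` reflects the change while `s3_keys` is untouched, which is the property rules 1 and 2 are after.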

DynamoDB schema:

  • pk: acc-id
  • sk: parentPath + "___" + ulid (ULID strings sort by creation time, so including one in the sk makes each row unique and keeps siblings in time order)
  • The root is just '/'; there is no row for the root.
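
A minimal sketch of how the sort key described above might be built, assuming the separator is exactly `"___"` and that children of a folder are listed by matching on the key prefix (for example with a `begins_with` key condition in a DynamoDB query). The helper names `new_ulid`, `make_sort_key`, and `children_prefix` are hypothetical, and the ULID generator is a simplified stand-in for a real ULID library.

```python
import os
import time

# Crockford base32 alphabet used by ULIDs; it is in ascending ASCII order,
# so string comparison of ULIDs follows their numeric value.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
SEPARATOR = "___"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID: 48-bit millisecond timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])  # take 5 bits at a time
        value >>= 5
    return "".join(reversed(chars))


def make_sort_key(parent_path, ulid):
    """Build the sk as parentPath + "___" + ulid."""
    return f"{parent_path}{SEPARATOR}{ulid}"


def children_prefix(parent_path):
    """Prefix shared by the sort keys of all direct children of parent_path."""
    return f"{parent_path}{SEPARATOR}"


# Usage: two entries in the same folder, created at different times.
older = make_sort_key("/docs", new_ulid(timestamp_ms=1_000))
newer = make_sort_key("/docs", new_ulid(timestamp_ms=2_000))
```

Because the timestamp occupies the leading characters of the ULID, `older` sorts before `newer`, and a prefix match on `children_prefix("/docs")` excludes entries under `/docs/sub`, whose keys continue with `/` rather than the separator.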
@FunctionalInterface
interface ThrowingConsumer<T, E extends Exception> {
void accept(T t) throws E;
static <T> Consumer<T> unchecked(ThrowingConsumer<T, Exception> t) {
return arg -> {
try {
t.accept(arg);
} catch (Exception ex) {