@recalde
recalde / db_config.json
Created March 13, 2024 01:28
Export Data Script
{
  "sql_server": {
    "driver": "{SQL Server}",
    "server": "your_server",
    "database": "your_database",
    "username": "your_username",
    "password": "your_password"
  },
  "oracle": {
    "dsn": "your_dsn",
import pandas as pd
import pyarrow.csv as pv
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import calendar
from pathlib import Path
import os
def read_large_csv(filepath):
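    # Hypothetical body -- the gist preview cuts off at the def line. Assumes the
    # pyarrow.csv module imported above as pv does the heavy lifting of parsing.
    table = pv.read_csv(filepath)   # multi-threaded CSV parse into an Arrow table
    return table.to_pandas()        # hand back a pandas DataFrame for downstream use

# Example usage (path is a placeholder, not from the gist):
# df = read_large_csv(Path("/path/to/export.csv"))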
@recalde
recalde / auto_complete_chat.md
Created March 6, 2024 15:30
Auto Complete Chat

Question :

You are an AWS Python developer who cares about speed, performance, and memory optimization. You have been given a file with 366,000 distinct account names, sorted alphabetically; the file is 10 MB in total. You have a front-end website that needs to auto-complete a text field using a string-contains filter. There should be a Lambda web API that returns data to the front end. You can use Lambda, S3, DynamoDB, an in-memory cache, and a browser-side cache to solve this problem. Describe several options to solve this problem.

Answer :

Solving this problem involves efficiently handling a large dataset for auto-completion with constraints on speed, performance, and memory optimization. Here are several approaches using AWS services:

1. Lambda with S3 and In-Memory Cache

import os
import csv
import boto3
from concurrent.futures import ThreadPoolExecutor, as_completed
from FjaPb import FjaPb # Assuming this is the correct import for FjaPb
# Load AWS credentials and target bucket details from environment variables
ACCESS_KEY = os.getenv("AWS_ACCESS_KEY_ID")
SECRET_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
BUCKET_URL = os.getenv("S3_BUCKET_URL")
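As a hedged illustration of option 1 above (a sketch, not the gist's actual code), a Lambda handler can load the account-name file from S3 once per warm container and answer contains-filtered queries from memory; the bucket and key environment variables below are placeholders:

import os
import boto3

s3 = boto3.client("s3")
_names = None  # populated on first invocation, reused while the container stays warm

def _load_names():
    global _names
    if _names is None:
        # Placeholder bucket/key supplied via environment variables
        obj = s3.get_object(Bucket=os.environ["NAMES_BUCKET"], Key=os.environ["NAMES_KEY"])
        _names = obj["Body"].read().decode("utf-8").splitlines()
    return _names

def lambda_handler(event, context):
    query = (event.get("queryStringParameters") or {}).get("q", "").lower()
    matches = [n for n in _load_names() if query in n.lower()][:50]  # cap the response size
    return {"statusCode": 200, "body": "\n".join(matches)}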
@recalde
recalde / auto_complete_acct_nm.py
Last active February 28, 2024 23:30
Account name cache
import boto3
import csv
from datetime import datetime, timedelta
from io import StringIO
def scan_dynamodb_table(dynamodb_table_name, last_execution_time=None):
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(dynamodb_table_name)
# Calculate the start and end time for the time range (1 month ago from last execution time or now)
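    # Hypothetical continuation -- the gist preview cuts off at the comment above.
    # Assumes items carry an ISO-8601 'updated_at' attribute (an assumption, not shown in the gist).
    from boto3.dynamodb.conditions import Attr
    end_time = last_execution_time or datetime.utcnow()
    start_time = end_time - timedelta(days=30)
    scan_kwargs = {
        'FilterExpression': Attr('updated_at').between(start_time.isoformat(), end_time.isoformat())
    }
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get('Items', []))
        if 'LastEvaluatedKey' not in response:
            break
        scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
    return items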
@recalde
recalde / summarize.py
Last active February 28, 2024 20:20
s3-stats
import pandas as pd
import matplotlib.pyplot as plt
import pyarrow.parquet as pq
import os
# Reading data using Apache Arrow
data_dir = '/path/to/your/data/directory'
file_paths = [os.path.join(data_dir, file) for file in os.listdir(data_dir) if file.endswith('.parquet')]
dfs = [pq.read_table(file).to_pandas() for file in file_paths]
df = pd.concat(dfs, ignore_index=True)
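# Hypothetical next step -- not shown in the preview. Summarize the combined frame
# and plot one distribution; the 'size_bytes' column name is a placeholder.
print(df.describe(include='all'))
if 'size_bytes' in df.columns:
    df['size_bytes'].plot(kind='hist', bins=50, title='Object size distribution')
    plt.xlabel('bytes')
    plt.tight_layout()
    plt.savefig('size_distribution.png')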
import os
import boto3
import json
import jwt
from jwt.algorithms import RSAAlgorithm
import requests
import logging
# Configure logging
logger = logging.getLogger()
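# Hypothetical continuation -- the preview shows only the imports. A typical RS256
# verification flow against a JWKS endpoint; the JWKS_URL default is a placeholder.
JWKS_URL = os.environ.get('JWKS_URL', 'https://example.com/.well-known/jwks.json')

def verify_token(token):
    """Verify an RS256 JWT against the JWKS endpoint and return its claims."""
    jwks = requests.get(JWKS_URL, timeout=5).json()
    kid = jwt.get_unverified_header(token)['kid']
    key_data = next(k for k in jwks['keys'] if k['kid'] == kid)
    public_key = RSAAlgorithm.from_jwk(json.dumps(key_data))
    # Audience/issuer checks are omitted in this sketch
    claims = jwt.decode(token, public_key, algorithms=['RS256'], options={'verify_aud': False})
    logger.info('verified token for subject %s', claims.get('sub'))
    return claims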
import boto3
import requests
import json
import datetime
from dateutil import parser
from requests_ntlm import HttpNtlmAuth
# AWS clients
s3 = boto3.client('s3')
ssm = boto3.client('ssm')
To integrate an AWS Lambda function with an on-premises IIS server using Windows authentication and cache HTTP responses, follow these streamlined instructions focusing on authentication and caching strategies:
Handling Windows Authentication
HTTP Client: Utilize a Python HTTP client library that supports Windows authentication, such as requests-ntlm for NTLM or requests-negotiate-sspi for Kerberos, alongside the standard requests library.
Implementing Caching
You can choose from several caching strategies based on your requirements:
Amazon S3: Suitable for storing large JSON responses. Serialize the response and store it in an S3 bucket with a unique key for each entry to facilitate easy retrieval.
Amazon DynamoDB: Ideal for structured data caching, allowing queries based on response attributes. Use a TTL attribute for automatic expiration of old entries.
Amazon ElastiCache: Offers high-performance, low-latency caching suitable for frequently accessed data. Choose between Redis and Memcached based on your requirements.
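A hedged sketch tying the snippet above to these notes: an NTLM-authenticated request to the IIS endpoint whose JSON response is cached in S3, with credentials read from SSM Parameter Store. The bucket, cache key, and parameter names are placeholders:

import json
import boto3
import requests
from requests_ntlm import HttpNtlmAuth

s3 = boto3.client('s3')
ssm = boto3.client('ssm')

def fetch_with_cache(url, bucket, cache_key):
    """Return the cached JSON body from S3 if present, otherwise call IIS with NTLM auth."""
    try:
        cached = s3.get_object(Bucket=bucket, Key=cache_key)
        return json.loads(cached['Body'].read())
    except s3.exceptions.NoSuchKey:
        pass  # cache miss -- fall through to the origin request
    # Credentials held in SSM Parameter Store (parameter names are placeholders)
    user = ssm.get_parameter(Name='/iis/user')['Parameter']['Value']
    password = ssm.get_parameter(Name='/iis/password', WithDecryption=True)['Parameter']['Value']
    response = requests.get(url, auth=HttpNtlmAuth(user, password), timeout=30)
    response.raise_for_status()
    body = response.json()
    s3.put_object(Bucket=bucket, Key=cache_key, Body=json.dumps(body).encode('utf-8'))
    return body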
@recalde
recalde / chat_gtp_test_sql.md
Created January 25, 2024 15:11
chat_gtp_test_sql

To modify the program to write the results to a markdown file with a table, we'll adjust the script to collect the results in a list and then write this list to a markdown file at the end. Each row in the markdown table will contain the SQL file name and the output status.

Here's the updated script:

import os
import psycopg2

# Database configuration
db_config = {
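    # Hypothetical continuation -- the preview cuts off at the opening brace.
    # Connection details and the 'sql' directory below are placeholders.
    'host': 'your_host',
    'dbname': 'your_database',
    'user': 'your_username',
    'password': 'your_password',
}

# Run each .sql file and collect (file name, status) rows for the markdown table
results = []
with psycopg2.connect(**db_config) as conn:
    for sql_file in sorted(f for f in os.listdir('sql') if f.endswith('.sql')):
        with conn.cursor() as cur, open(os.path.join('sql', sql_file)) as fh:
            try:
                cur.execute(fh.read())
                results.append((sql_file, 'OK'))
            except Exception as exc:
                conn.rollback()
                results.append((sql_file, f'FAILED: {exc}'))

# Write the collected results to a markdown file as a two-column table
with open('results.md', 'w') as md:
    md.write('| SQL File | Status |\n')
    md.write('|---|---|\n')
    for name, status in results:
        md.write(f'| {name} | {status} |\n')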