sriraj kadimisetty srirajk

@srirajk
srirajk / application.properties
Created April 16, 2018 20:31
kylo-services
###
# #%L
# thinkbig-service-app
# %%
# Copyright (C) 2017 ThinkBig Analytics
# %%
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
###
# #%L
# thinkbig-ui-app
# %%
# Copyright (C) 2017 ThinkBig Analytics
# %%
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
version: "3.0"
services:
  cassandra:
    hostname: cassandra
    container_name: cassandra
    image: cassandra:4.0
    ports:
      - 9042:9042
    environment:
      - CASSANDRA_CLUSTER_NAME="Test Cluster"
#!/bin/bash
# Find processes matching the criteria and filter out the PID
pids=$(ps -ef | grep "uid123" | grep "java -jar" | grep "spark-submit" | grep -v grep | awk '{print $2}')
# Check if any matching processes were found
if [ -n "$pids" ]; then
  echo "Matching processes found: $pids"
  # Kill the processes
JDK_JAVA_OPTIONS=--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED

Apache Spark with AsyncRDDActions and Delta Lake

Scalability and Performance:

  • Distributed Processing: Spark distributes data processing tasks across multiple nodes, which shortens processing times for large datasets.
  • In-Memory Computing: Spark can keep intermediate data in memory instead of writing it to disk between steps, which speeds up iterative algorithms and chained transformations.
  • Resource Management: Spark runs on cluster managers such as YARN, Mesos, or Kubernetes, which handle resource allocation and let a job scale with the cluster.

Fault Tolerance and Data Integrity:

  • Lineage-Based Fault Recovery: Spark's RDDs record the lineage of transformations that produced them, so lost data can be recomputed automatically without manual intervention.
  • ACID Transactions with Delta Lake: Integrating with Delta Lake provides ACID properties to data operations, ensuring data integrity acr

Here's an expanded version of the problem statement and solution approach, incorporating more context about using either Apache Spark or Apache Flink for the initial data load:


Problem Statement

In an application ecosystem where data lookups are critical for performance, the reliance on an Oracle database presents significant challenges. The database is not owned by the development team and undergoes frequent updates, leading to latency issues and increased load on the database server. Traditional caching mechanisms like Hibernate's second-level cache are not viable due to the unpredictability of data changes. This scenario demands an efficient and responsive solution to minimize latency and ensure data consistency.

Why Spring Boot Alone is Insufficient for Initial Load

Redis Cache Integration in Microservices Architecture: Documentation for Architectural Review

Objective:

This document outlines the strategy for integrating Redis as a shared cache in the process layer of our microservices architecture to support efficient data sharing and pagination.

Problem Statement:

The process layer orchestrates multiple API calls, and it must do so efficiently. To support features such as pagination, this orchestration needs temporary storage for the data it gathers.
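As a rough illustration of the temporary storage described above, the sketch below stores the aggregated result of several API calls under a request key and serves it back one page at a time. It is only a sketch under stated assumptions: a `ConcurrentHashMap` stands in for the shared cache (it is not Redis and is not shared across instances), and the class name, key format, and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PaginationCacheSketch {
    // In-memory stand-in for the shared cache; a real deployment would put Redis here.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    /** Stores the combined result of the orchestrated API calls under a request-scoped key. */
    void store(String requestKey, List<String> aggregated) {
        cache.put(requestKey, List.copyOf(aggregated));
    }

    /** Returns one zero-based page; empty when the key is unknown or the page is out of range. */
    List<String> page(String requestKey, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        List<String> all = cache.getOrDefault(requestKey, List.of());
        long from = (long) pageIndex * pageSize;
        if (from >= all.size()) {
            return List.of();
        }
        int to = (int) Math.min(all.size(), from + pageSize);
        return all.subList((int) from, to);
    }

    public static void main(String[] args) {
        PaginationCacheSketch sketch = new PaginationCacheSketch();
        // Hypothetical key and payload: the merged output of several downstream calls.
        sketch.store("req-1", List.of("a", "b", "c", "d", "e"));
        System.out.println(sketch.page("req-1", 0, 2));
        System.out.println(sketch.page("req-1", 2, 2));
    }
}
```

Later page requests read from the stored result instead of repeating the downstream calls. A shared cache would additionally need an expiry policy so abandoned result sets do not accumulate.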

Proposed Solution:

  • Redis as a Shared Cache in the Process Layer:
@srirajk
srirajk / spring_batch_deployment_strategy.md
Last active June 3, 2024 17:20
Spring Batch Deployment

Deployment Strategies for Spring Batch on Kubernetes

Overview

Spring Batch applications can be deployed on Kubernetes using three main strategies, each with its own advantages and trade-offs. This document describes each strategy in detail, along with its pros and cons, to support an informed decision.

1. Deploying a New Kubernetes Job for Each Batch File

Description: This strategy creates a new Kubernetes Job for each batch file that needs processing. Each Job runs in its own isolated environment, so resources are allocated when the Job starts and released when it completes. This approach relies on Kubernetes' native Job management to handle batch processing tasks.
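One way to picture the per-file approach is a small helper that renders a Job manifest for each incoming file. The following is a minimal sketch, not the author's tooling: the image name, the `--input.file` argument, the `batch-` name prefix, and the `backoffLimit` and `ttlSecondsAfterFinished` values are all assumptions chosen for illustration, and file names are assumed to contain no characters that need YAML escaping.

```java
import java.util.Locale;

class BatchJobManifestSketch {
    /** Derives a lowercase, dash-separated Job name (at most 63 characters) from a file name. */
    static String jobName(String fileName) {
        String slug = fileName.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // collapse anything else into a dash
                .replaceAll("^-+|-+$", "");      // trim leading and trailing dashes
        String name = "batch-" + slug;
        if (name.length() > 63) {
            name = name.substring(0, 63);
        }
        return name.replaceAll("-+$", "");
    }

    /** Renders a batch/v1 Job manifest that processes exactly one file. */
    static String manifest(String fileName, String image) {
        return String.join("\n",
                "apiVersion: batch/v1",
                "kind: Job",
                "metadata:",
                "  name: " + jobName(fileName),
                "spec:",
                "  backoffLimit: 2",                 // assumed retry budget
                "  ttlSecondsAfterFinished: 3600",   // assumed cleanup delay
                "  template:",
                "    spec:",
                "      restartPolicy: Never",
                "      containers:",
                "        - name: batch",
                "          image: " + image,
                // Hypothetical job parameter; the real application may expect a different one.
                "          args: [\"--input.file=" + fileName + "\"]",
                "");
    }

    public static void main(String[] args) {
        // Hypothetical image and file name.
        System.out.print(manifest("Orders_2024-06-03.csv", "registry.example.com/spring-batch-app:1.0"));
    }
}
```

The rendered text could then be applied with `kubectl apply -f -` or submitted through a Kubernetes client library. Deriving the Job name from the file name makes it easy to trace a Job back to its input.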

@srirajk
srirajk / Threads.md
Created June 4, 2024 11:29
platform vs virtual threads

OS Threads

  1. High Resource Use: Creating an OS thread requires significant memory and CPU resources, including a large stack size for each thread.
  2. Expensive Context Switching: Switching between threads involves saving the state of one thread and restoring the state of another, which is computationally expensive and slow.
  3. Kernel Mode Transitions: OS threads need to frequently transition between user and kernel modes, adding overhead to thread operations.
  4. Scheduling Overhead: The OS has to manage and schedule all threads, and as the number of threads grows, this overhead increases.
  5. Complex Synchronization: Managing access to shared resources between multiple threads requires complex synchronization, adding further overhead and potential for inefficiencies.
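Because of the costs listed above, code built on OS (platform) threads commonly reuses a small, fixed set of them rather than creating one per task. The sketch below illustrates that pattern with the standard `Executors.newFixedThreadPool`; the class name, pool size, task count, and the 5 ms sleep that simulates blocking work are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PlatformThreadPoolSketch {
    /** Runs taskCount short blocking tasks on a fixed pool and reports how many distinct threads ran them. */
    static int distinctThreadsUsed(int poolSize, int taskCount) throws Exception {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    threadNames.add(Thread.currentThread().getName());
                    try {
                        Thread.sleep(5); // simulated blocking work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task to finish
            }
        } finally {
            pool.shutdown();
        }
        return threadNames.size();
    }

    public static void main(String[] args) throws Exception {
        int used = distinctThreadsUsed(4, 100);
        System.out.println("100 tasks ran on " + used + " platform threads");
    }
}
```

The pool caps the number of threads the OS has to schedule, at the price of tasks queuing whenever all pooled threads are blocked.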

Virtual Threads