sriraj kadimisetty srirajk

## application.properties
###
# #%L
# thinkbig-service-app
# %%
# Copyright (C) 2017 ThinkBig Analytics
# %%
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#

## application.properties
###
# #%L
# thinkbig-ui-app
# %%
# Copyright (C) 2017 ThinkBig Analytics
# %%
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#

## atlas-compose.yaml
version: "3.0"
services:
  cassandra:
    hostname: cassandra
    container_name: cassandra
    image: cassandra:4.0
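    ports:
      - "9042:9042"
    environment:
      # List-form values are taken literally, so surrounding quotes would become part of the cluster name.
      - CASSANDRA_CLUSTER_NAME=Test Cluster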

## test-script
#!/bin/bash

# Find processes matching the criteria and filter out the PID
pids=$(ps -ef | grep "uid123" | grep "java -jar" | grep "spark-submit" | grep -v grep | awk '{print $2}')

# Check if any matching processes were found
if [ -n "$pids" ]; then
    echo "Matching processes found: $pids"

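    # Kill the processes (assumed completion: the preview is cut off here, so a plain SIGTERM is a guess)
    # $pids is left unquoted on purpose so each PID becomes its own argument
    kill $pids
fi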

## spark-java-17-env.txt
JDK_JAVA_OPTIONS=--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED

## comparisons.md

      
Created April 29, 2024 16:26

Apache Spark with AsyncRDDActions and Delta Lake

Scalability and Performance:

- Distributed Processing: Spark distributes data processing tasks across multiple nodes, which shortens processing time for large datasets.
- In-Memory Computing: Spark keeps intermediate data in memory, which is faster than disk-based processing and reduces the time spent on iterative algorithms and data transformations.
- Resource Management: Spark integrates with cluster managers such as YARN, Mesos, and Kubernetes, which allows efficient resource allocation and scaling.

Fault Tolerance and Data Integrity:

- Lineage-Based Fault Recovery: Spark's RDDs maintain a lineage of transformations that allows them to rebuild lost data automatically, enhancing fault tolerance without the need for manual intervention.
- ACID Transactions with Delta Lake: Integrating with Delta Lake provides ACID properties to data operations, ensuring data integrity acr
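
The lineage-based recovery idea above can be illustrated with a small toy model. This is only a sketch in plain Java, not Spark's API: the class `LineageSketch` and its methods `fromSource`, `map`, `collect`, and `simulateLoss` are hypothetical names chosen for illustration. The dataset remembers its source and the ordered list of transformations, so a lost materialized result can be rebuilt by replaying them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Toy model of lineage-based recovery (hypothetical class, not part of Spark).
public class LineageSketch {
    private final Supplier<List<Integer>> source;           // how to re-read the input
    private final List<Function<Integer, Integer>> lineage; // ordered transformations
    private List<Integer> materialized;                     // null means "not computed or lost"

    private LineageSketch(Supplier<List<Integer>> source,
                          List<Function<Integer, Integer>> lineage) {
        this.source = source;
        this.lineage = lineage;
    }

    public static LineageSketch fromSource(Supplier<List<Integer>> source) {
        return new LineageSketch(source, new ArrayList<>());
    }

    // Records the transformation instead of applying it immediately.
    public LineageSketch map(Function<Integer, Integer> step) {
        List<Function<Integer, Integer>> extended = new ArrayList<>(lineage);
        extended.add(step);
        return new LineageSketch(source, extended);
    }

    // Returns the result, replaying the lineage if nothing is materialized.
    public List<Integer> collect() {
        if (materialized == null) {
            List<Integer> data = source.get();
            for (Function<Integer, Integer> step : lineage) {
                data = data.stream().map(step).collect(Collectors.toList());
            }
            materialized = data;
        }
        return materialized;
    }

    // Simulates losing the computed data, for example after a node failure.
    public void simulateLoss() {
        materialized = null;
    }

    public static void main(String[] args) {
        LineageSketch dataset = LineageSketch
                .fromSource(() -> Arrays.asList(1, 2, 3))
                .map(x -> x * 10)
                .map(x -> x + 1);
        System.out.println(dataset.collect());
        dataset.simulateLoss();
        System.out.println(dataset.collect()); // rebuilt from the recorded lineage
    }
}
```

The design point is that only the recipe is stored, not a replica of the data, which is why recovery needs no manual intervention as long as the source can be read again.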


## cache-management.md

      
Created May 14, 2024 16:18

Here's an expanded version of the problem statement and solution approach, incorporating more context about using either Apache Spark or Apache Flink for the initial data load:

Problem Statement

In an application ecosystem where data lookups are critical for performance, the reliance on an Oracle database presents significant challenges. The database is not owned by the development team and undergoes frequent updates, leading to latency issues and increased load on the database server. Traditional caching mechanisms like Hibernate's second-level cache are not viable due to the unpredictability of data changes. This scenario demands an efficient and responsive solution to minimize latency and ensure data consistency.

Why Spring Boot Alone is Insufficient for Initial Load


## redis-problem.md

      
Created June 3, 2024 00:33

Redis Cache Integration in Microservices Architecture: Documentation for Architectural Review

Objective:

This document outlines the strategy for integrating Redis as a shared cache in the process layer of our microservices architecture to support efficient data sharing and pagination.

Problem Statement:

In the process layer, multiple API calls must be managed and orchestrated efficiently. This orchestration requires temporary data storage to support features such as pagination.

Proposed Solution:


Redis as a Shared Cache in the Process Layer:
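
The problem statement above asks for temporary storage of orchestrated results so that they can be served page by page. The following is a minimal sketch of that idea, assuming the aggregated rows are stored once under a request key and then sliced per page. It uses an in-memory map as a stand-in for Redis so that it runs without a client library; the class `PagedResultCacheSketch`, its methods, and the `req-1` key are hypothetical, and a real implementation would use a Redis client and would probably set an expiry on each entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-memory stand-in for a shared cache used for pagination.
public class PagedResultCacheSketch {
    // Stand-in for the shared cache; in the proposed design this would live in Redis.
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();

    // Stores the combined result of the orchestrated API calls under one key.
    public void put(String requestId, List<String> aggregatedRows) {
        store.put(requestId, Collections.unmodifiableList(new ArrayList<>(aggregatedRows)));
    }

    // Returns one zero-based page, or an empty list when the key or page does not exist.
    public List<String> page(String requestId, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        List<String> rows = store.getOrDefault(requestId, Collections.emptyList());
        long from = (long) pageNumber * pageSize; // long avoids int overflow
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(rows.size(), from + pageSize);
        return rows.subList((int) from, to);
    }

    public static void main(String[] args) {
        PagedResultCacheSketch cache = new PagedResultCacheSketch();
        cache.put("req-1", Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(cache.page("req-1", 0, 2));
        System.out.println(cache.page("req-1", 2, 2));
    }
}
```

Because every page is cut from the same stored result, later page requests do not need to repeat the upstream API calls.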


## spring_batch_deployment_strategy.md

      
Last active June 3, 2024 17:20

Spring Batch Deployment

Deployment Strategies for Spring Batch on Kubernetes

Overview

There are three main strategies for deploying Spring Batch applications on Kubernetes, each with its own advantages and trade-offs. This document describes each strategy and lists its pros and cons to support an informed decision.

1. Deploying a New Kubernetes Job for Each Batch File

Description:
This strategy involves creating a new Kubernetes Job for each batch file that needs processing. Each job runs in its own isolated environment, which ensures that the resources are dynamically allocated and freed up after the job completes. This approach leverages Kubernetes' native job management capabilities to handle batch processing tasks.

  
## Threads.md

      
Created June 4, 2024 11:29

platform vs virtual threads

OS Threads


- High Resource Use: Creating an OS thread requires significant memory and CPU resources, including a large stack size for each thread.
- Expensive Context Switching: Switching between threads involves saving and loading thread states, which is computationally expensive and slow.
- Kernel Mode Transitions: OS threads need to frequently transition between user and kernel modes, adding overhead to thread operations.
- Scheduling Overhead: The OS has to manage and schedule all threads, and as the number of threads grows, this overhead increases.
- Complex Synchronization: Managing access to shared resources between multiple threads requires complex synchronization, adding further overhead and potential for inefficiencies.
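
One common way to contain the per-thread costs listed above is to reuse a small, fixed set of platform threads instead of creating one per task. The sketch below, in plain Java, runs many trivial tasks on a fixed pool and reports how many distinct threads executed them. The class name `PlatformThreadPoolSketch`, the method `distinctThreadsUsed`, and the pool size of 4 are illustrative assumptions, not something taken from the gist.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: many tasks sharing a few platform (OS-backed) threads.
public class PlatformThreadPoolSketch {

    // Runs taskCount trivial tasks on a fixed pool and returns the number of
    // distinct threads that executed at least one of them.
    public static int distinctThreadsUsed(int taskCount, int poolSize) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task only records which pooled thread picked it up.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        int used = distinctThreadsUsed(1_000, 4);
        System.out.println("1000 tasks ran on " + used + " platform threads");
    }
}
```

The number of threads stays bounded by the pool size however many tasks are submitted, which caps stack memory and scheduling overhead at the cost of queuing tasks while all threads are busy.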

Virtual Threads