Fatih Nar fenar

💭
I may be slow to respond.
View GitHub Profile
@fenar
fenar / clean-replicaset.sh
Created February 18, 2025 04:03
CleanUp-Zombie-ReplicaSets
#!/bin/bash
# Get all ReplicaSets in all namespaces whose DESIRED and CURRENT replica counts are both 0
replicasets=$(oc get rs --all-namespaces --no-headers | awk '$3 == "0" && $4 == "0" {print $1, $2}')
# Loop through the list of ReplicaSets and delete them
echo "$replicasets" | while read -r namespace rsname
do
  # Skip blank input so an empty result never runs "oc delete" without a name
  [ -z "$rsname" ] && continue
  echo "Deleting ReplicaSet $rsname in namespace $namespace"
  oc delete rs "$rsname" -n "$namespace"
done
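
Before deleting anything, it can help to preview what the awk filter selects. The sketch below applies the same filter to text on stdin. It assumes the usual column order of `oc get rs --all-namespaces --no-headers` (NAMESPACE, NAME, DESIRED, CURRENT, READY, AGE). The helper names `filter_zombie_rs` and `sample_rs_output` are hypothetical, and the sample rows are invented for illustration rather than taken from a real cluster.

```shell
#!/bin/bash
# Sketch: the same awk filter as the cleanup script, reading from stdin.
# Assumed columns: NAMESPACE NAME DESIRED CURRENT READY AGE
# filter_zombie_rs is a hypothetical helper name.
filter_zombie_rs() {
  # Keep rows where DESIRED ($3) and CURRENT ($4) are both 0,
  # and print only the namespace and the ReplicaSet name.
  awk '$3 == "0" && $4 == "0" {print $1, $2}'
}

# Invented sample rows for illustration (not real cluster output).
sample_rs_output() {
  cat <<'EOF'
demo-a   web-5d9c7b   0   0   0   12d
demo-a   web-7f6b9c   3   3   3   2d
demo-b   api-6c8d4f   0   0   0   30d
EOF
}
```

Against a live cluster, `oc get rs --all-namespaces --no-headers | filter_zombie_rs` would list the candidates without deleting them.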
fenar / rlhf.md
Created December 18, 2024 15:16
RLHF

The Ultimate Reinforcement Learning Bible: Concepts, Code, and Applications

1. Introduction to Reinforcement Learning

1.1 What is Reinforcement Learning?

Imagine teaching a dog new tricks. You don't tell the dog exactly how to sit or roll over. Instead, you reward good behavior with treats and perhaps gently discourage unwanted behavior. Over time, the dog learns which actions lead to treats and begins to make better decisions.

This is, in essence, how reinforcement learning works in the world of artificial intelligence!

#Issue: fenar@macpro71 elastic % oc logs -n openshift-logging elasticsearch-cdm-9p1cy8c7-1-655b488ddd-hcnds --all-containers
cp: cannot create regular file '/etc/elasticsearch/elasticsearch.yml': Permission denied
cp: cannot create regular file '/etc/elasticsearch/log4j2.properties': Permission denied
#check which scc is in use for the problem pod
% oc get pod elasticsearch-cdm-9p1cy8c7-1-655b488ddd-hcnds -o yaml | grep scc
openshift.io/scc: anyuid
#edit your scc RunAsUser to MustRunAsRange OR use a different SCC with RUNASUSER set to MustRunAsRange
% oc edit scc anyuid
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
name: alloy-logs-scc
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
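
The two checks above (which SCC admitted the pod, and which runAsUser strategy that SCC uses) can also be read directly with jsonpath instead of grepping YAML. This is a minimal sketch: the function names `pod_scc` and `scc_run_as_user` are hypothetical, and it assumes `oc` is already logged in to the cluster.

```shell
#!/bin/bash
# Sketch: report the SCC a pod was admitted under and that SCC's
# runAsUser strategy. Function names are hypothetical helpers.

# Print the value of the pod's openshift.io/scc annotation.
# The dot inside the annotation key is escaped for jsonpath.
pod_scc() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}

# Print the runAsUser strategy of an SCC,
# for example RunAsAny or MustRunAsRange.
scc_run_as_user() {
  local scc="$1"
  oc get scc "$scc" -o jsonpath='{.runAsUser.type}'
}
```

For the pod in this example, that would be `pod_scc openshift-logging elasticsearch-cdm-9p1cy8c7-1-655b488ddd-hcnds` followed by `scc_run_as_user anyuid`.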
fenar / grfn.yaml
Created August 22, 2024 20:40
pod-perm-debugging
fenar@macpro71 projects % oc get pods
NAME             READY   STATUS    RESTARTS   AGE
tme-aix-wb01-0   2/2     Running   0          6h54m
fenar@macpro71 projects % oc exec tme-aix-wb01-0 -- ls -l /var/log
Defaulted container "tme-aix-wb01" out of: tme-aix-wb01, oauth-proxy
total 612
-rw-rw----. 1 root utmp 0 Jun 28 14:09 btmp
-rw-r--r--. 1 root root 147607 Jun 28 14:49 dnf.librepo.log
-rw-r--r--. 1 root root 399490 Jun 28 14:49 dnf.log

Key Points of the Configuration

Pod Anti-Affinity: The podAntiAffinity rule prevents the ODF pods from being scheduled on the same node. This ensures high availability and fault tolerance by distributing the pods across different nodes.

Node Affinity: The nodeAffinity rule specifies that the ODF pods should only be scheduled on the specified nodes (worker-odf-01.acmhub2.narlabs.io, worker-odf-02.acmhub2.narlabs.io, worker-odf-03.acmhub2.narlabs.io). This guarantees that the ODF pods run only on these designated nodes, which may have specific configurations or resources required by the application.
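
The two rules described above could look roughly like the stanza printed by this sketch. It is not the original configuration, which is not shown here: the `app: odf-example` label is a hypothetical placeholder, the use of the "required" (hard) form for both rules is an assumption based on the words "prevents" and "only", and it assumes each node's `kubernetes.io/hostname` label matches the node name given in the text. `print_odf_affinity` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print an affinity stanza for a pod spec.
# Assumptions: hard ("required") rules, a placeholder pod label, and
# kubernetes.io/hostname label values equal to the node names.
print_odf_affinity() {
  cat <<'EOF'
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-odf-01.acmhub2.narlabs.io
                - worker-odf-02.acmhub2.narlabs.io
                - worker-odf-03.acmhub2.narlabs.io
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: odf-example   # hypothetical label
        topologyKey: kubernetes.io/hostname
EOF
}
```

With `topologyKey: kubernetes.io/hostname`, no two pods carrying the selected label may share a node, so three replicas would land on three different nodes from the list.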

Additional Considerations

Resource Requests and Limits: The resources requested and limited for the containers ensure that each pod has sufficient CPU and memory. Adjust these values based on actual workload requirements.

Persistent Volume and PVC: The configuration includes a sample PersistentVolume and PersistentVolumeClaim for storage. Customize these to fit the specific storage needs and

fenar / NVIDIA-SMI-WATCH.md
Last active June 12, 2024 22:05
CLI GPU Observability

fenar@macpro71 acm-observability % oc get pods -n nvidia-gpu-operator
NAME                                                  READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-s9jsp                           1/1     Running     0          42m
gpu-operator-9f47fbdc-d2g9k                           1/1     Running     0          44m
nvidia-container-toolkit-daemonset-7k46j              1/1     Running     0          42m
nvidia-cuda-validator-4br5g                           0/1     Completed   0          40m
nvidia-dcgm-2z4px                                     1/1     Running     0          42m
nvidia-dcgm-exporter-7wgq2                            1/1     Running     0          42m
nvidia-device-plugin-daemonset-mlvcv                  1/1     Running     0          42m
nvidia-driver-daemonset-415.92.202405281402-0-ww2nb   2/2     Running     0          43m

Configure the NVIDIA DCGM Exporter Dashboard on Red Hat OCP

(1) Download the latest NVIDIA DCGM Exporter Dashboard from the DCGM Exporter repository on GitHub:
curl -LfO https://github.com/NVIDIA/dcgm-exporter/raw/main/grafana/dcgm-exporter-dashboard.json

(2) Create a config map from the downloaded file in the openshift-config-managed namespace:
oc create configmap nvidia-dcgm-exporter-dashboard -n openshift-config-managed --from-file=dcgm-exporter-dashboard.json

(3) Label the config map to expose the dashboard in the Administrator perspective of the web console:
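
A minimal sketch of the labeling step, wrapped in a function so it can be reviewed before it is run. The label key `console.openshift.io/dashboard` is an assumption taken from NVIDIA's published OpenShift guide rather than from this note, and `label_dcgm_dashboard` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of step (3): label the config map created in step (2).
# Assumed label key: console.openshift.io/dashboard (from NVIDIA's guide).
# label_dcgm_dashboard is a hypothetical helper name.
label_dcgm_dashboard() {
  oc label configmap nvidia-dcgm-exporter-dashboard \
    -n openshift-config-managed \
    "console.openshift.io/dashboard=true"
}
```

Calling `label_dcgm_dashboard` after step (2) applies the label to the config map.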

fenar / format-hdd.sh
Created June 12, 2024 17:43
Format All Disks
#!/bin/bash
# Ensure the script is run as root
if [ "$EUID" -ne 0 ]; then
  echo "Please run as root"
  exit 1
fi
# List all the hard drives
drives=$(lsblk -dpno NAME,TYPE | awk '$2 == "disk" {print $1}')
#!/bin/bash
# Sample Script for VM migration between OpenStack Deployments
# This is for inspiration purposes, use it wisely.
# Variables
SOURCE_OS_AUTH_URL="http://source-openstack:5000/v3"
SOURCE_OS_PROJECT_NAME="source_project"
SOURCE_OS_USERNAME="source_user"
SOURCE_OS_PASSWORD="source_password"
SOURCE_IMAGE_ID="source_image_id"