Mark Brooks onefoursix

@onefoursix
onefoursix / groovy_retry_on_unknown_host_exception.groovy
Last active June 19, 2024 01:03
StreamSets Groovy script retry on UnknownHostException
In the Groovy Stage's init script, look for this line of code (around line 138):
// Execute the request
HttpResponse response = httpClient.execute(request)
Comment out that line and replace it with this section of code:
// mbrooks 06/01/2024 -- comment out this line
// HttpResponse response = httpClient.execute(request)
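The replacement code itself is cut off in this preview. As a rough illustration of the general retry-on-resolution-failure pattern the gist title describes, here is a minimal sketch in Python rather than Groovy. The function name, parameters, and defaults are hypothetical and not taken from the gist; `socket.gaierror` is used as Python's closest analogue to Java's `UnknownHostException`.

```python
import socket
import time


def execute_with_retry(send, max_attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Call send() and retry when host-name resolution fails.

    Hypothetical helper: `send` is any zero-argument callable that performs
    the request and raises socket.gaierror when the host cannot be resolved.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except socket.gaierror:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts:
                raise
            # Otherwise pause briefly before trying again.
            sleep(wait_seconds)
```

Only resolution failures are retried here; any other exception propagates immediately, so genuine request errors are not masked.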
import os
from datetime import datetime
from time import time
import sys
from streamsets.sdk import ControlHub
import json
# Set to true to echo the metrics to stdout
print_metrics = True
onefoursix / get-sdc-metrics-and-pipeline-and-record-counts.py
Created September 18, 2023 06:28
Python-based StreamSets REST API script that gets SDC CPU and memory metrics as well as running pipeline record counts
#!/usr/bin/env python
'''
This script writes a continuous stream of CPU and Memory metrics for a given SDC
as well as counts of all running pipelines' input records, output records, and error records.
The script uses StreamSets Platform's REST API to pull metrics directly from SDCs; it does not connect to Control Hub.
On each refresh interval, the script will record CPU and memory metrics and then the record counts for each running pipeline.
onefoursix / curl-sdc-streamsets-platform.sh
Created August 28, 2023 15:44
An example of how to call an SDC REST API on StreamSets Platform
export CRED_ID="..."
export CRED_TOKEN="..."
curl -X GET "https://rancher.onefoursix.com:31910/rest/v1/system/jmx" \
-H "X-SS-App-Component-Id: $CRED_ID" \
-H "X-SS-App-Auth-Token: $CRED_TOKEN"
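For comparison, the same call can be sketched with Python's standard library. This is only an illustration under stated assumptions: the helper name `build_jmx_request` and the example base URL are hypothetical, while the endpoint path and the two header names come from the curl command above.

```python
import urllib.request


def build_jmx_request(base_url, cred_id, cred_token):
    """Build a GET request for an SDC's /rest/v1/system/jmx endpoint.

    Hypothetical helper: base_url is the SDC's own URL, and the two
    credentials correspond to CRED_ID and CRED_TOKEN in the curl example.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/rest/v1/system/jmx",
        headers={
            "X-SS-App-Component-Id": cred_id,
            "X-SS-App-Auth-Token": cred_token,
        },
        method="GET",
    )


# Example usage (performs a network call, so it is left commented out):
# import os
# req = build_jmx_request("https://sdc.example.com:18630",
#                         os.environ["CRED_ID"], os.environ["CRED_TOKEN"])
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the credentials handling easy to check without contacting an engine.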
onefoursix / get-engine-metrics-for-platform.py
Last active August 28, 2023 05:55
Python script to get pipeline and engine metrics for StreamSets platform
#!/usr/bin/env python
'''
This script writes a rolling log file that contains running Pipeline names and record counts,
along with SDC CPU usage and JVM heap memory metrics for all Data Collectors registered
with StreamSets Platform that match the specified set of Labels.
The script writes sdc-resource-metrics.log, a rolling log file with the pipeline and SDC metrics,
as well as sdc-resource-metrics-messages-and-errors.log, which shows the SDCs that are discovered
and whether connections to them are successful.
onefoursix / delete-stuck-activating-deployments.sh
Last active July 23, 2023 04:25
A Bash script to call the StreamSets Platform REST API to delete Kubernetes Deployments stuck in an Activating state
#!/usr/bin/env bash
# Script that calls the /provisioning/rest/v1/csp/deployment/${DEP_ID}/unsafeForceStop
# endpoint to delete StreamSets Kubernetes Deployments stuck in an Activating state.
#
# Use this REST API endpoint only if Kubernetes Deployments are stuck
# in an Activating state.
#
# After the script completes, the selected Deployments will be in a
# "Deactivation Error" state and can be deleted using the Control Hub UI
onefoursix / sdc-keystore.yaml
Created July 18, 2023 22:24
SDC Deployment with a keystore loaded from a Secret
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: streamsets-deployment-576a6314-076a-4eba-9916-1482c89cae72
  name: streamsets-deployment-576a6314-076a-4eba-9916-1482c89cae72
  namespace: ns100
spec:
  replicas: 1
  selector:
onefoursix / get-sdc-memory-and-cpu-metrics.py
Last active July 15, 2023 20:27
A StreamSets SDK for Python script for Control Hub 3.x to capture CPU and memory metrics as well as running pipelines from Data Collector
#!/usr/bin/env python
'''
This script writes a rolling log file that contains CPU usage and JVM heap memory metrics
for a given Data Collector registered with Control Hub 3.x, with a user-definable refresh
interval, along with the number and names of Jobs running on the Data Collector at the time
of metrics collection.
Prerequisites:
onefoursix / dataops-sdc-svc-ingress.yaml
Created April 21, 2023 23:27
StreamSets DataOps Platform Kubernetes manifest for SDC + Service + Ingress with TLS all the way down
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: streamsets-deployment-cfa81f1d-baf1-4d7d-9136-e0114a083bc9
  name: streamsets-deployment-cfa81f1d-baf1-4d7d-9136-e0114a083bc9
  namespace: ns1
spec:
  replicas: 1
  selector:
onefoursix / dataops-backup.py
Last active March 11, 2023 03:01
A Python script to back up objects from StreamSets DataOps Platform
#!/usr/bin/env python
'''
This script exports Fragments, Pipelines, Jobs, and Job Templates from StreamSets DataOps Platform.
The current version of this script does not export Connections, Tasks, or Topologies.
Prerequisites:
- Python 3.6+; Python 3.9+ preferred