@reedv
reedv / hdfs_file_retention_cleanup.sh
Last active August 19, 2024 21:12
Script I've used for deleting older .BAK archive files from HDFS after some set retention period
#!/bin/bash
PROJECT_HOME=$1
TABLENAME=$2
{
# Read settings from the project's conf.json (paths quoted so spaces do not break them)
DATASTORE="$(jq -r '."datastore"' "$PROJECT_HOME/conf.json")/$TABLENAME"
NFS_PATH="$(jq -r '."nfs_path"' "$PROJECT_HOME/conf.json")"
EXPORT_STAGE="$(jq -r '."export_stage"' "$PROJECT_HOME/conf.json")/$TABLENAME"
echo "NFS path: $NFS_PATH"
reedv / airflow_restart.sh
Last active August 19, 2024 21:08
Steps for restarting Airflow daemons when a process fails (I don't know whether one should also check that all scheduler threads have closed)
# check if scheduler and webserver daemons are running and kill them
cat $HOME/airflow/airflow-webserver.pid | xargs kill
cat $HOME/airflow/airflow-webserver-monitor.pid | xargs kill
cat $HOME/airflow/airflow-scheduler.pid | xargs kill
# delete .log, .err, .out, and .pid files
rm $HOME/airflow/airflow-webserver.*
rm $HOME/airflow/airflow-webserver-monitor.*
rm $HOME/airflow/airflow-scheduler.*
# restart daemons
airflow scheduler -D
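The description asks whether one should also confirm that the old processes have really closed before restarting. A minimal sketch of one way to check, assuming the PID-file layout used above: `wait_for_exit` is a hypothetical helper (not part of Airflow), and the timeouts are arbitrary. It polls with `kill -0`, which sends no signal and only tests whether the PID can still be signalled.

```shell
#!/bin/bash
# Hypothetical helper: wait until the process with the given PID has exited.
# Returns 0 once the process is gone, 1 if it is still alive after the timeout.
# Caveat: kill -0 also fails for processes owned by another user, so run this
# as the same user that started the daemons.
wait_for_exit() {
    local pid=$1
    local timeout=${2:-10}   # seconds to wait; 10 is an arbitrary default
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: stop the scheduler, then confirm it is gone before deleting its files.
pidfile="$HOME/airflow/airflow-scheduler.pid"
if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    kill "$pid" 2>/dev/null
    wait_for_exit "$pid" 30 || echo "scheduler (pid $pid) is still running" >&2
fi
```

The same check could be repeated for the webserver and webserver-monitor PID files before the `rm` steps.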
reedv / pre-ambari-install.md
Last active December 2, 2019 19:36
Everything you'll need installed before installing Apache Ambari

Note that version requirements may differ between versions of Ambari. The binaries shown here are the ones I am trying to use for v2.7.3 (see https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.7.3). If you need a later version of Maven, see https://www.tecmint.com/install-apache-maven-on-centos-7/

[root@HW001 ~]# java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
[root@HW001 ~]#
reedv / python_env_test_001.md
Last active October 30, 2019 00:24
Testing effect if a python script runs another child python script that sets env vars

Here we test whether running a Python script that calls another script which sets env variables also changes the env of the parent:

envcheck_parent.py

import os
import subprocess

# parenthesized single-argument print works under both Python 2 and Python 3
print("1p: %s\n" % os.environ['USER'])
print("2p: %s\n" % ('TESTVAR' in os.environ))
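The question being tested is a property of processes in general rather than of Python: a child process receives a copy of its parent's environment, so changes made in the child stay in the child. As a quick cross-check, here is a shell analog of the experiment (not the Python test itself); the value `from_child` is made up for illustration.

```shell
#!/bin/bash
# Make sure the variable is not set in the parent to begin with.
unset TESTVAR

# The child shell sets and exports TESTVAR in its own environment, then exits.
child_view=$(sh -c 'export TESTVAR=from_child; echo "$TESTVAR"')

# Back in the parent: the child's change is not visible here.
parent_view=${TESTVAR:-unset}

echo "child saw:  $child_view"
echo "parent has: $parent_view"
```

The child reports its own value, while the parent still has no `TESTVAR`.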
reedv / mapr-client-6.0.0-install-notes.md
Last active January 31, 2023 10:49
some notes on installing mapr client

In addition to the instructions here: https://mapr.com/docs/60/AdvancedInstallation/SettingUptheClient-install-mapr-client.html and here: https://mapr.com/docs/60/AdvancedInstallation/SettingUptheClient-redhat.html here are a few more helpful details to note...

You need to install the mapr-client rpm, and yum install mapr-client does not always work. I go to the repo in a web browser, navigate to the version and OS I need (e.g. https://package.mapr.com/releases/v6.0.0/redhat/), find the latest mapr-client* rpm, copy the link, then...

[root@airflowetl ~]# mkdir tmp
[root@airflowetl ~]# cd tmp/
[root@airflowetl ~]# wget https://package.mapr.com/releases/v6.0.0/redhat/mapr-client-6.0.0.20171109191718.GA-1.x86_64.rpm
[root@airflowetl ~]# rpm -i mapr-client-6.0.0.20171109191718.GA-1.x86_64.rpm
reedv / checking columns before bcp.md
Last active October 7, 2019 19:53
checking columns before bcp with pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from datetime import datetime
import subprocess

import sys
# %-formatting keeps the output identical under Python 2 and Python 3
print("This is the name of the script: %s" % sys.argv[0])
TABLE = sys.argv[1]
reedv / continue-trap-loop.md
Last active October 9, 2018 02:09
Example of trapping individual iterations of a bash loop via exit code without actually exiting from the whole loop

Suppose you have a bash script that runs over a set of files and does something to them. If it fails to process one of those files, you want to send an email to yourself with a trap (like so https://gist.github.com/reedv/600f310c2b4a427b8fd91a8f870b6adc). This can be done if your logic lives in a separate script file that gets called by the loop and that looks something like

#!/bin/bash

{
    loop_item=$1
    <do some stuff to the loop item> 
} || { echo -e "\n\nFailed to do the stuff, exiting script with explicit error code"; exit 255; }
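The calling side can then look at each item's exit code and carry on. A minimal sketch under stated assumptions: `process_item` is a hypothetical stand-in for the per-item script above (a subshell plays the role of the separate script file), and the item names are invented.

```shell
#!/bin/bash
# Hypothetical stand-in for the per-item script: because the body runs in a
# subshell, "exit 255" ends only this one item, not the calling loop.
process_item() {
    (
        loop_item=$1
        case "$loop_item" in
            bad*) exit 255 ;;   # simulate a failure for items named bad*
        esac
        echo "processed $loop_item"
    )
}

failed=""
for item in good1 bad1 good2; do
    # The non-zero exit code is caught here; the loop itself keeps running.
    if ! process_item "$item"; then
        echo "item $item failed, continuing" >&2
        failed="$failed $item"
        # a notification (such as the email mentioned above) could be sent here
    fi
done
echo "failed items:$failed"
```

Running the per-item logic in its own process is what makes this work: an `exit` inside the loop body itself would terminate the whole script.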
reedv / tensorflow4cygwin.md
Last active May 5, 2021 20:46
Getting tensorflow-gpu on a Windows10 cygwin environment

When just using cygwin's python3 to try to use tensorflow, e.g. something like...

apt-cyg install python3-devel
cd python-virtualenv-base
virtualenv -p `which python3` tensorflow-examples

I found that there were some problems with installing the tensorflow-gpu package using cygwin's python. I was seeing the error

$ pip install tensorflow --user
Collecting tensorflow
reedv / centos-cache-flush.md
Last active July 19, 2018 02:23
addressing CentOS7 taking up large amounts of memory seemingly to do nothing

I was having problems on a CentOS system where there seemed to be very little free memory. Going through some debugging commands:

$ cat /proc/meminfo
$ free -m

I could see that the output of "cat /proc/meminfo" and "free -m" showed large amounts of buffer/cache memory (on the order of ~20GB) being used by the node's host OS (CentOS 7).
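To put a single number on that observation, the Buffers and Cached fields of /proc/meminfo can be summed directly. A rough sketch: `buffer_cache_mb` is a hypothetical helper name, and the total is only an approximation of the buff/cache column that free prints, which may count additional reclaimable memory depending on the procps version.

```shell
#!/bin/bash
# Hypothetical helper: sum the Buffers and Cached fields (reported in kB) of a
# meminfo-style file and print the total in whole MB.
# Defaults to /proc/meminfo when no file is given.
buffer_cache_mb() {
    awk '/^(Buffers|Cached):/ { kb += $2 } END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Only meaningful on Linux, where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    echo "buffers + cached: $(buffer_cache_mb) MB"
fi
```

The leading `^` in the pattern keeps SwapCached out of the sum.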