Mark Brooks onefoursix

onefoursix / impala-1.4-CDH-4.7-JDBC-maven-deps.xml
Last active August 29, 2015 14:05
Impala 1.4 CDH 4.7 JDBC Maven deps
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.10.0-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
onefoursix / impala-cdh-5.1.0-jdbc-maven-deps
Created August 8, 2014 06:38
Impala CDH5.1.0 JDBC Maven deps
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>0.12.0-cdh5.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
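Both dependency lists pull in the Hive JDBC driver, which Impala accepts on its HiveServer2-compatible port (21050 by default). The hypothetical Python helper `impala_jdbc_url` below is a minimal sketch of the connection string a Java client would pass to `org.apache.hive.jdbc.HiveDriver`. It assumes `auth=noSasl` for a cluster without Kerberos and a `principal=` parameter for a Kerberized one.

```python
# Sketch: build a jdbc:hive2 URL for an impalad, for use with the hive-jdbc
# driver (org.apache.hive.jdbc.HiveDriver). Helper name is hypothetical.
IMPALA_HS2_PORT = 21050  # default impalad port for HiveServer2-protocol clients


def impala_jdbc_url(host, port=IMPALA_HS2_PORT, database="", principal=None):
    """Return a JDBC URL for Impala.

    Without Kerberos, Impala expects auth=noSasl; with Kerberos, pass the
    Impala service principal (e.g. "impala/host@REALM") instead.
    """
    params = "principal=" + principal if principal else "auth=noSasl"
    return "jdbc:hive2://%s:%d/%s;%s" % (host, port, database, params)
```

For example, `impala_jdbc_url("impalad1.example.com")` yields `jdbc:hive2://impalad1.example.com:21050/;auth=noSasl`.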
onefoursix / gist:2ef20bbc959ef45a6c74
Last active August 29, 2015 14:06
Example of how to restart the HS2 role using the Cloudera Manager API (in Python), CM 4.6
#!/usr/bin/python
## **********************************************************************
## restart-hs2-role.py
##
## Example of how to restart the HS2 role in the Hive service using the Cloudera Manager API
##
## Make sure to set the CM host, CM port, login, password, cluster name, and Hive service name
## in the "Settings" section below
##
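A minimal sketch of the restart, assuming the `cm_api` Python client (`ApiResource`, `get_cluster`, `get_service`, `get_all_roles`, `restart_roles`). The function names `connect`, `hs2_role_names`, and `restart_hs2` are hypothetical, and the `cm_api` import is deferred so the role-selection logic can be tried against stand-in objects.

```python
# Sketch: restart only the HiveServer2 role(s) of a Hive service via cm_api.
# Assumes the cm_api package and a reachable Cloudera Manager server.
HS2_ROLE_TYPE = "HIVESERVER2"


def connect(cm_host, cm_port, cm_login, cm_password):
    """Open a CM API connection (imported lazily; needs cm_api installed)."""
    from cm_api.api_client import ApiResource
    return ApiResource(cm_host, cm_port, cm_login, cm_password)


def hs2_role_names(hive_service):
    """Names of all HiveServer2 roles in the given Hive service."""
    return [r.name for r in hive_service.get_all_roles() if r.type == HS2_ROLE_TYPE]


def restart_hs2(api, cluster_name, hive_service_name):
    """Restart every HS2 role and wait for each restart command to finish."""
    service = api.get_cluster(cluster_name).get_service(hive_service_name)
    names = hs2_role_names(service)
    if not names:
        raise RuntimeError("No %s role in service %s" % (HS2_ROLE_TYPE, hive_service_name))
    for cmd in service.restart_roles(*names):
        result = cmd.wait()  # block until CM reports the command finished
        if not result.success:
            raise RuntimeError("Restart failed: %s" % result.resultMessage)
    return names
```

For example, `restart_hs2(connect("cm-host", 7180, "admin", "admin"), "Cluster 1", "hive")`, where the host, credentials, cluster, and service names are placeholders. Restarting only the HS2 roles, rather than the whole Hive service, leaves the metastore running.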
onefoursix / impalaQueries.py
Last active January 16, 2021 23:30
Python CM-API Example to pull Impala Query metrics
#!/usr/bin/python
## *******************************************************************************************
## impalaQueries.py
##
## Getting Info on Impala Queries
##
## Usage: ./impalaQueries.py
##
## *******************************************************************************************
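One way the query pull could look, assuming `cm_api`'s `ApiService.get_impala_queries(start_time, end_time, filter_str, limit)` and the `queryId`, `user`, and `durationMillis` fields of the returned queries. `fetch_impala_queries` and `slowest` are hypothetical helpers, not functions from the gist.

```python
# Sketch: list recent Impala queries via cm_api's get_impala_queries and rank
# them by duration. Field names are assumed from cm_api's ApiImpalaQuery.
import datetime


def fetch_impala_queries(impala_service, hours=1, filter_str="", limit=100):
    """Return the queries CM reports for the last `hours` hours."""
    end = datetime.datetime.now()
    start = end - datetime.timedelta(hours=hours)
    response = impala_service.get_impala_queries(start, end, filter_str, limit)
    return response.queries


def slowest(queries, n=10):
    """(queryId, user, seconds) for the n longest-running queries."""
    ranked = sorted(queries, key=lambda q: q.durationMillis or 0, reverse=True)
    return [(q.queryId, q.user, (q.durationMillis or 0) / 1000.0) for q in ranked[:n]]
```

Queries with no reported duration are treated as 0 ms, so they sort last rather than raising an error.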
onefoursix / get-bdr-history.py
Last active October 12, 2017 23:04
Example of how to retrieve BDR history using the Cloudera Manager API
#!/usr/bin/python
## ********************************************************************************
## get-bdr-history.py
##
## Example of how to retrieve BDR command history using the Cloudera Manager API
##
## Usage: ./get-bdr-history.py <limit>
##
## <limit> is the maximum number of replication commands to retrieve
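A hedged sketch of the `<limit>` handling, assuming the replication schedules returned by `cm_api`'s `get_replication_schedules()` each carry an `id` and a `history` list of commands with `startTime`, `endTime`, and `success`. The helper names are hypothetical, and CM may itself truncate the history it embeds in each schedule.

```python
# Sketch: merge BDR command history across replication schedules, newest
# first, capped at <limit>. Schedule/command fields are assumed from cm_api.
import sys


def parse_limit(argv, default=20):
    """Read <limit> from the command line; fall back to a default."""
    if len(argv) < 2:
        return default
    limit = int(argv[1])
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return limit


def latest_commands(schedules, limit):
    """(schedule_id, command) pairs across all schedules, newest first."""
    commands = []
    for schedule in schedules:
        for cmd in schedule.history or []:
            commands.append((schedule.id, cmd))
    commands.sort(key=lambda pair: pair[1].startTime, reverse=True)
    return commands[:limit]


def print_history(service, argv=None):
    """Print one tab-separated line per replication command."""
    limit = parse_limit(sys.argv if argv is None else argv)
    for schedule_id, cmd in latest_commands(service.get_replication_schedules(), limit):
        print("%s\t%s\t%s\t%s" % (schedule_id, cmd.startTime, cmd.endTime, cmd.success))
```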
onefoursix / get-hive-yarn-jobs-for-sentry-user.py
Last active June 13, 2019 02:19
Example of how to get info on Hive YARN jobs for a specific Sentry user using the Cloudera Manager API
#!/usr/bin/python
## ********************************************************************************
## get-hive-yarn-jobs-for-sentry-user.py
##
## Example of how to retrieve info on YARN Hive jobs for a given Sentry user
## using the Cloudera Manager API
##
## Usage: ./get-hive-yarn-jobs-for-sentry-user.py <sentry_user_name>
##
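With Sentry enabled, HiveServer2 typically submits YARN jobs as the `hive` user, so the end user has to be read from an application attribute. The sketch below assumes that attribute is `hive_sentry_subject_name` in each application's `attributes` dict and uses `cm_api`'s `get_yarn_applications`. Both helper names are hypothetical; check which attributes CM actually reports for your applications.

```python
# Sketch: select Hive-on-YARN applications run on behalf of a Sentry user.
# The attribute key below is an assumption, not confirmed by the gist.
import datetime

SENTRY_USER_ATTR = "hive_sentry_subject_name"  # assumed attribute key


def apps_for_sentry_user(apps, sentry_user, attr=SENTRY_USER_ATTR):
    """Keep applications whose attribute `attr` equals `sentry_user`."""
    return [a for a in apps if (a.attributes or {}).get(attr) == sentry_user]


def fetch_hive_apps(yarn_service, sentry_user, hours=24, limit=1000):
    """Fetch up to `limit` YARN apps from the last `hours` hours, then filter."""
    end = datetime.datetime.now()
    start = end - datetime.timedelta(hours=hours)
    response = yarn_service.get_yarn_applications(start, end, limit=limit)
    return apps_for_sentry_user(response.applications, sentry_user)
```

Filtering client-side keeps the sketch independent of CM's filter syntax, at the cost of fetching every application in the window.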
onefoursix / get-yarn-long-running-jobs.py
Last active October 28, 2020 12:20
Example of using the Cloudera Manager API to poll for YARN health checks and to list long-running jobs using a tsquery
#!/usr/bin/python
## ********************************************************************************
## get-yarn-long-running-jobs.py
##
## Usage: ./get-yarn-long-running-jobs.py
##
## Edit the settings below to connect to your cluster
##
## ********************************************************************************
import sys
from cm_api.api_client import ApiResource
## ** CM Connection Settings ******************************
cm_host = "localhost"
cm_port = "7180"
cm_login = "admin"
cm_password = "admin"
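Building on these connection settings, here is a minimal sketch of the two checks the script describes. It assumes `cm_api`'s `healthChecks` attribute on the YARN service and `get_yarn_applications` for the job list. Note that it swaps the tsquery for a plain application listing filtered client-side, and that comparing against `utcnow()` assumes `cm_api` returns naive UTC timestamps. The function names are hypothetical.

```python
# Sketch: report non-GOOD YARN health checks and list applications running
# longer than a threshold (client-side filter instead of a tsquery).
import datetime


def failing_health_checks(yarn_service):
    """(name, summary) for every health check whose summary is not GOOD."""
    return [(c.name, c.summary) for c in (yarn_service.healthChecks or [])
            if c.summary != "GOOD"]


def long_running(apps, now, threshold=datetime.timedelta(hours=1)):
    """Apps with no endTime (still running) whose age exceeds `threshold`."""
    return [a for a in apps if a.endTime is None and now - a.startTime > threshold]


def report(api, cluster_name, yarn_service_name, threshold_hours=1):
    yarn = api.get_cluster(cluster_name).get_service(yarn_service_name)
    for name, summary in failing_health_checks(yarn):
        print("Health check %s is %s" % (name, summary))
    now = datetime.datetime.utcnow()  # assumes cm_api timestamps are naive UTC
    start = now - datetime.timedelta(days=1)
    apps = yarn.get_yarn_applications(start, now, filter_str="", limit=1000).applications
    for app in long_running(apps, now, datetime.timedelta(hours=threshold_hours)):
        print("%s %s running since %s" % (app.applicationId, app.user, app.startTime))
```

For example, `report(ApiResource(cm_host, cm_port, cm_login, cm_password), "Cluster 1", "yarn")`, with placeholder cluster and service names.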
onefoursix / mr-usage-by-user.py
#!/usr/bin/python
## ********************************************************************************
## mr-usage-by-user.py
##
## Aggregates YARN MapReduce usage by day and user and writes the results to the console and to a file
##
## Because the CM-API call "yarn.get_yarn_applications" returns at most 1000 jobs per call, the script makes
## multiple calls to yarn.get_yarn_applications and aggregates all results between the script's global start and end times
##
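The comment above says the script works around the 1000-job cap by making several `get_yarn_applications` calls. The sketch below shows one hedged way to do that: it pages with the `offset` argument over a single time range (an assumption; the gist's own strategy is not shown) and then aggregates per day and user. "Usage" is assumed here to mean job count and total elapsed seconds; `fetch_all_apps` and `usage_by_day_and_user` are hypothetical names.

```python
# Sketch: page past the 1000-application cap of get_yarn_applications and
# aggregate finished jobs per (day, user).
from collections import defaultdict

MAX_PER_CALL = 1000


def fetch_all_apps(yarn_service, start, end, filter_str=""):
    """Call get_yarn_applications repeatedly until a short page comes back."""
    apps, seen, offset = [], set(), 0
    while True:
        page = yarn_service.get_yarn_applications(
            start, end, filter_str=filter_str, limit=MAX_PER_CALL, offset=offset
        ).applications
        for app in page:
            if app.applicationId not in seen:  # guard against overlapping pages
                seen.add(app.applicationId)
                apps.append(app)
        if len(page) < MAX_PER_CALL:
            return apps
        offset += MAX_PER_CALL


def usage_by_day_and_user(apps):
    """{(YYYY-MM-DD, user): [job_count, total_seconds]} for finished apps."""
    totals = defaultdict(lambda: [0, 0.0])
    for app in apps:
        if app.endTime is None:
            continue  # still running; no final duration yet
        key = (app.startTime.strftime("%Y-%m-%d"), app.user)
        totals[key][0] += 1
        totals[key][1] += (app.endTime - app.startTime).total_seconds()
    return dict(totals)
```

Stopping on the first short page means a range with exactly a multiple of 1000 jobs costs one extra, empty call.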
onefoursix / dockerfile-for-sdc-on-openshift
Last active October 10, 2019 16:08
sdc-docker-openshift
FROM streamsets/datacollector:3.11.0
RUN sudo chgrp -R 0 /etc/sdc /logs /data /resources /opt/streamsets-datacollector-3.11.0 && \
    sudo chmod -R g=u /etc/sdc /logs /data /resources /opt/streamsets-datacollector-3.11.0
RUN sudo sed -i 's/http.realm.file.permission.check=true/http.realm.file.permission.check=false/' /etc/sdc/sdc.properties