Nur Kholis M kholis
@timrobertson100
timrobertson100 / notes.txt
Last active February 8, 2024 06:08
Spark 2.4 on CDH 5.12
Based on ideas here, expanded to enable Hive support
https://www.linkedin.com/pulse/running-spark-2xx-cloudera-hadoop-distro-cdh-deenar-toraskar-cfa/
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-without-hadoop.tgz
tar -xvzf spark-2.4.8-bin-without-hadoop.tgz
cd spark-2.4.8-bin-without-hadoop
cp -R /etc/spark2/conf/* conf/
cp /etc/hive/conf/hive-site.xml conf/
@asdaraujo
asdaraujo / kafka-python-sasl-gssapi.py
Last active June 5, 2024 07:32
kafka-python example with Kerberos auth
# Requirements: kafka-python gssapi krbticket
import os
import time
from kafka import KafkaConsumer, KafkaProducer
from krbticket import KrbConfig, KrbCommand
try:
    os.environ['KRB5CCNAME'] = '/tmp/krb5cc_<myusername>'
    kconfig = KrbConfig(principal='araujo', keytab='/path/to/<myusername>.keytab')
    KrbCommand.kinit(kconfig)
@mrpeardotnet
mrpeardotnet / PVE-HP-ssacli-smart-storage-admin.md
Created November 25, 2019 22:10
HP Smart Storage Admin CLI (ssacli) installation and usage on Proxmox PVE (6.x)

Why use HP Smart Storage Admin CLI?

You can use the ssacli (Smart Storage Administrator command-line interface) tool to manage any supported HP Smart Array controller in your Proxmox host without rebooting the server to access Smart Storage Administrator in the BIOS. That means no host downtime when managing your storage.

The CLI is not as convenient as the GUI provided by the BIOS or desktop utilities, but it still lets you fully manage your controller, physical disks, and logical drives on the fly, with no Proxmox host downtime.

ssacli replaces the older hpssacli; it shares the same syntax and adds support for newer servers and controllers.

Installation

@achintya-kumar
achintya-kumar / Kafka-console-consumer-with-kerberos.md
Last active March 1, 2023 03:59
Kafka console consumer with Kerberos

1. Create a jaas.conf file with the following contents:

KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="keytabFile.keytab"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
@wagnerjgoncalves
wagnerjgoncalves / example_dataframe_api.py
Last active February 8, 2024 11:49
Pyspark using SparkSession example
# -*- coding: utf-8 -*-
"""
Example of Python Data Frame with SparkSession.
"""
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
@croxton
croxton / SSL-certs-OSX.md
Last active March 3, 2024 18:58 — forked from leevigraham/Generate ssl certificates with Subject Alt Names on OSX.md
Generate ssl certificates with Subject Alt Names

Generate ssl certificates with Subject Alt Names on OSX

Open ssl.conf in a text editor.

Edit the domain(s) listed under the [alt_names] section so that they match the local domain name you want to use for your project, e.g.

DNS.1   = my-project.dev

Additional FQDNs can be added if required:
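
As a rough illustration of the numbering convention used in the [alt_names] section, here is a minimal Python sketch that renders one `DNS.n` line per domain. The helper name `render_alt_names` and the second domain are hypothetical placeholders for illustration, not entries taken from the original gist.

```python
def render_alt_names(domains):
    """Return the text of an OpenSSL-style [alt_names] section.

    Hypothetical helper: it only formats text and does not touch ssl.conf.
    """
    lines = ["[alt_names]"]
    for index, domain in enumerate(domains, start=1):
        # Each entry gets its own numbered key: DNS.1, DNS.2, ...
        lines.append(f"DNS.{index}   = {domain}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder domains, assumed for illustration only.
    print(render_alt_names(["my-project.dev", "www.my-project.dev"]))
```

The generated lines could then be pasted under the [alt_names] heading in ssl.conf by hand.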

@linar-jether
linar-jether / simple_python_datasource.py
Last active May 24, 2023 01:22
Grafana Python datasource using pandas for timeseries and table data. Inspired by and compatible with the simple JSON datasource. Up-to-date version maintained at https://github.com/panodata/grafana-pandas-datasource
from flask import Flask, request, jsonify, json, abort
from flask_cors import CORS, cross_origin
import pandas as pd
app = Flask(__name__)
cors = CORS(app)
app.config['CORS_HEADERS'] = 'Content-Type'
@TonyWuLihu
TonyWuLihu / parquetmerge.py
Created December 26, 2016 08:59
Merge parquet files locally
import sys
from datetime import date,datetime,timedelta
import datetime
import string
from pexpect import *
from os import remove,listdir
import os
import pprint
def chunks(l, n):
@aseigneurin
aseigneurin / Spark high availability.md
Created November 1, 2016 16:42
Spark - High availability

Components in play

As a reminder, here are the components in play to run an application:

  • The cluster:
    • Spark Master: coordinates the resources
    • Spark Workers: offer resources to run the applications
  • The application:
@rajkrrsingh
rajkrrsingh / LLAP_DEMO.md
Last active February 20, 2018 13:18
LLAP Application demo on HDP2.5 using tpc-ds dataset
su - hdfs 
wget https://github.com/hortonworks/hive-testbench/archive/hive14.zip
unzip hive14.zip
cd hive-testbench-hive14/
vi settings/load-partitioned.sql  # remove G1GC and use the Parallel GC scheme
yum install gcc
echo 'export JAVA_HOME=/usr/jdk64/jdk1.8.0_77' >> ~/.bashrc
echo 'PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc