
Marcus Rehm marcusrehm

@marcusrehm
marcusrehm / UnionSparkDataFrame.scala
Last active September 25, 2020 15:16
Function to union (append) a sequence of Spark DataFrames with a single SparkContext.union call, which can perform better than chaining many DataFrame.union calls.
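import org.apache.spark.sql.{DataFrame, SparkSession}

// Union every DataFrame in one SparkContext.union call.
// Assumes `dfs` is non-empty and all DataFrames share the first one's schema
// (rows are combined as-is, so columns are matched by position).
def unionAll(dfs: Seq[DataFrame]): DataFrame = {
  val spark = SparkSession.builder().getOrCreate()
  spark.createDataFrame(
    spark.sparkContext.union(dfs.map(_.rdd)),
    dfs.head.schema
  )
}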
marcusrehm / Alpine_Dockerfile_Openjdk11
Last active May 2, 2019 18:00
Configure Azul Zulu OpenJDK 11 on Alpine (Docker)
# Define Java environment (Azul Zulu JDK 11 build for musl, as used by Alpine)
RUN mkdir -p /usr/lib/jvm/
COPY /install/zulu11.31.11-ca-jdk11.0.3-linux_musl_x64.tar.gz /usr/lib/jvm/
# Unpack the JDK, point default-jvm at it, link java into /usr/bin and delete
# the archive (its bytes still occupy space in the COPY layer above)
RUN tar -xzf /usr/lib/jvm/zulu11.31.11-ca-jdk11.0.3-linux_musl_x64.tar.gz -C /usr/lib/jvm/ \
    && ln -s /usr/lib/jvm/zulu11.31.11-ca-jdk11.0.3-linux_musl_x64 /usr/lib/jvm/default-jvm \
    && ln -s /usr/lib/jvm/default-jvm/bin/java /usr/bin/java \
    && rm /usr/lib/jvm/zulu11.31.11-ca-jdk11.0.3-linux_musl_x64.tar.gz
ENV JAVA_HOME=/usr/lib/jvm/default-jvm/
ENV LD_LIBRARY_PATH=/usr/lib/jvm/default-jvm/lib:/usr/lib/jvm/default-jvm/lib/server
marcusrehm / databricks_add_group_member.py
Created November 30, 2018 16:50
Databricks: group and user administration
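import requests
import base64

DOMAIN = ''                     # Databricks workspace host name (left blank in the gist)
TOKEN = b''                     # access token as bytes (presumably base64-encoded later)
user_name = ""                  # user to add to the group
parent_name = "data-engineers"  # group that receives the new member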
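The gist preview stops after the configuration variables. A minimal sketch of what the add-member call could look like, assuming the Databricks Groups API 2.0 endpoint `/api/2.0/groups/add-member` (whose JSON body uses the same `user_name` and `parent_name` keys as the variables above) and a personal access token sent as a bearer token. The function `build_add_member_request` and the host, token and user values in the usage are hypothetical; the gist's `base64` import suggests the original may have used Basic authentication instead. It uses the standard library's `urllib.request` rather than `requests` so the request can be built and inspected without sending it.

```python
import json
import urllib.request


def build_add_member_request(domain: str, token: str, user_name: str,
                             parent_name: str) -> urllib.request.Request:
    """Build (but do not send) a Groups API 2.0 add-member request.

    Assumption: the token is a personal access token sent as a bearer token.
    """
    url = "https://{0}/api/2.0/groups/add-member".format(domain)
    # The API expects the member's user name and the target (parent) group.
    body = json.dumps({"user_name": user_name,
                       "parent_name": parent_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_add_member_request("example.cloud.databricks.com", "dapi-hypothetical",
                               "jane@example.com", "data-engineers")
```

Passing the returned object to `urllib.request.urlopen(req)` would actually send it.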
marcusrehm / model_importance.py
Created July 3, 2018 18:15
View feature importances of a Spark ML model
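# Print feature importances from a fitted PipelineModel, highest first.
# Assumes the stage at index 7 is a tree-based model exposing featureImportances
# and `features` lists the input columns in the order they were assembled.
print('Feature Importance:')
featureImportances = model.stages[7].featureImportances
# The list is small, so sort it locally instead of launching a Spark job.
for name, importance in sorted(zip(features, featureImportances.toArray()),
                               key=lambda pair: pair[1], reverse=True):
    print('{0}: {1}'.format(name, importance))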
marcusrehm / extract_telefone.py
Created November 7, 2017 13:10
Regular expression for Brazilian landline and mobile phone numbers.
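import re

# Mobile: optional trunk prefix 0, optional two-digit area code (DDD), then an
# 8-digit number starting with 7 or 8, or a 9-digit number starting with 9 and 1-9.
p_celular = re.compile(r"^(0){0,1}([1-9]{2}){0,1}([7-8]|9[1-9])[0-9]{3}[0-9]{4}$")
print("Mobile match: ", p_celular.search('022999860983'))
print("Not a mobile number: ", p_celular.search('01938735599'))

# Landline: same optional prefixes, then an 8-digit number starting with 2 to 6.
p_fixo = re.compile(r"^(0){0,1}([1-9]{2}){0,1}([2-6])[0-9]{3}[0-9]{4}$")
print("Landline match: ", p_fixo.search('01938735599'))
print("Not a landline: ", p_fixo.search('022999860983'))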
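A small wrapper can combine the two patterns to classify a number. The function name `classify_phone` and the stripping of spaces, dots, dashes and parentheses before matching are assumptions for illustration; the patterns themselves are copied unchanged from the gist.

```python
import re

# Patterns copied from the gist above.
P_CELULAR = re.compile(r"^(0){0,1}([1-9]{2}){0,1}([7-8]|9[1-9])[0-9]{3}[0-9]{4}$")
P_FIXO = re.compile(r"^(0){0,1}([1-9]{2}){0,1}([2-6])[0-9]{3}[0-9]{4}$")


def classify_phone(raw: str) -> str:
    """Return 'mobile', 'landline' or 'unknown' for a Brazilian phone number.

    Formatting characters (whitespace, parentheses, dots, dashes) are removed
    first, because the patterns only accept digits.
    """
    digits = re.sub(r"[\s().-]", "", raw)
    if P_CELULAR.fullmatch(digits):
        return "mobile"
    if P_FIXO.fullmatch(digits):
        return "landline"
    return "unknown"
```

For example, `classify_phone("(022) 99986-0983")` reduces the input to the gist's own mobile test number before matching.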
marcusrehm / genderize.io.R
Last active August 29, 2015 14:15
Simple function to look up the genders for a vector of people's first names.
require('jsonlite')
#Get genders for a vector of first names.
#Usage: genders <- getGenders(c('joão', 'fábio', 'lúcia', 'rúbens'))
getGenders <- function (raw_names) {
url_api <- 'http://api.genderize.io?'
for(i in seq_along(raw_names)) {
url_api <- paste(url_api, 'name[', i - 1, ']=',
iconv(raw_names[i], to='ASCII//TRANSLIT'),
'&',
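The R preview is cut off while the query string is being assembled. The same URL format can be sketched in Python: `name[0]=...&name[1]=...` appended to `http://api.genderize.io?`, with accents transliterated to ASCII. The helper names are hypothetical, Unicode NFKD decomposition stands in for `iconv(..., to='ASCII//TRANSLIT')`, and whether the current genderize.io API still accepts this indexed form is not verified here.

```python
import unicodedata
from urllib.parse import quote

API_URL = "http://api.genderize.io?"  # base URL as written in the R gist


def to_ascii(name: str) -> str:
    """Drop diacritics, roughly like iconv(name, to='ASCII//TRANSLIT')."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def build_genderize_url(raw_names) -> str:
    """Build name[0]=...&name[1]=... using the R loop's zero-based index."""
    params = "&".join(
        "name[{0}]={1}".format(i, quote(to_ascii(n)))
        for i, n in enumerate(raw_names)
    )
    return API_URL + params
```

Joining with `"&"` avoids the trailing separator that appending `'&'` after every name leaves behind.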
marcusrehm / odi_ssas_process_procedure.py
Last active August 29, 2015 14:14
Script to process SQL Server Analysis Services (SSAS) cubes over XMLA from Oracle Data Integrator (ODI) procedures. The procedure options "SSAS_URL", "SSAS Database", "Cube" and "Processing Option" must be created before use.
import urllib
import urllib2
# SSAS XMLA endpoint, read from the "SSAS_URL" procedure option
url = '<%=odiRef.getOption("SSAS_URL")%>'
data = '<?xml version="1.0"?><Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">' + \
    '<Body>' + \
    '<Execute xmlns="urn:schemas-microsoft-com:xml-analysis"><Command>' + \
    '<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">' + \
    '<Parallel>' + \
'<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservic
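The SOAP body above is truncated in the preview. As a sketch of the XMLA structure it builds, here is a Python 3 function (outside ODI; the name `build_process_envelope` and the sample IDs are hypothetical) that wraps one cube in an Analysis Services scripting language `Process` command inside `Execute`/`Batch`/`Parallel`. `Object`/`DatabaseID`/`CubeID` and `Type` follow the ASSL `Process` command; the empty `Properties` element is included because XMLA `Execute` takes it alongside `Command`; the extra namespace declarations on `Process` in the original are omitted. `DatabaseID` and `CubeID` are object IDs, which in SSAS can differ from display names, so they are XML-escaped in case they contain `&` or `<`.

```python
from xml.sax.saxutils import escape

XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"
ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def build_process_envelope(database_id: str, cube_id: str,
                           process_type: str = "ProcessFull") -> str:
    """Return a SOAP envelope asking SSAS to process a single cube."""
    return (
        '<?xml version="1.0"?>'
        '<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">'
        '<Body>'
        '<Execute xmlns="{xmla}"><Command>'
        '<Batch xmlns="{engine}"><Parallel>'
        '<Process><Object>'
        '<DatabaseID>{db}</DatabaseID><CubeID>{cube}</CubeID>'
        '</Object><Type>{ptype}</Type></Process>'
        '</Parallel></Batch>'
        '</Command><Properties/></Execute>'
        '</Body></Envelope>'
    ).format(xmla=XMLA_NS, engine=ENGINE_NS, db=escape(database_id),
             cube=escape(cube_id), ptype=escape(process_type))


env = build_process_envelope("Sales DW", "Sales & Margin")
```

In the ODI procedure, the three arguments would presumably come from the "SSAS Database", "Cube" and "Processing Option" options.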
marcusrehm / change_sql_server_db_state.sql
Last active September 1, 2020 17:37
This script helps bring all databases that are Suspect, Recovery Pending or in single-user mode back online in SQL Server 2012.
declare @dbname varchar(255);
DECLARE dbname_cursor CURSOR FOR
SELECT name--, database_id, create_date, STATE_DESC
FROM sys.databases
WHERE STATE_DESC IN ('SUSPECT', 'RECOVERY_PENDING')
   OR USER_ACCESS_DESC = 'SINGLE_USER'; -- single-user mode is reported in user_access_desc, not state_desc
OPEN dbname_cursor
FETCH NEXT FROM dbname_cursor
INTO @dbname