Gilroy Gordon gggordon
@gggordon
gggordon / spark_tips_and_tricks.md
Created September 30, 2021 15:14 — forked from dusenberrymw/spark_tips_and_tricks.md
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If all values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by up to a factor of 8 (relative to 8-byte integers).
  • Partition DataFrames into evenly distributed partitions of roughly 128 MB each (an empirical finding). When in doubt, err on the side of more partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually produces a DataFrame with a [much] larger number of rows, yet the number of partitions stays the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
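
The partition-sizing tips above can be turned into a small calculation. The following is a minimal sketch, not part of the original gist: the helper name `target_partition_count` and the example data size are hypothetical, and it assumes the data is spread evenly across partitions and that you can estimate the total size after any expansion step.

```python
def target_partition_count(total_bytes: int, target_partition_mb: int = 128) -> int:
    """Estimate how many partitions keep each one at or below the target size.

    Hypothetical helper: assumes an even distribution of data across partitions.
    """
    if total_bytes < 0 or target_partition_mb <= 0:
        raise ValueError("total_bytes must be >= 0 and target_partition_mb > 0")
    target_bytes = target_partition_mb * 1024 * 1024
    # Integer ceiling division, so a partial final partition still counts as one.
    partitions = -(-total_bytes // target_bytes)
    # Always keep at least one partition, even for empty data.
    return max(1, partitions)


# Hypothetical estimate: 10 GiB of data after an expansion such as flatMap.
estimated_bytes = 10 * 1024**3
num_partitions = target_partition_count(estimated_bytes)

# With PySpark, the result could then be passed to DataFrame.repartition, e.g.
#   expanded_df = expanded_df.repartition(num_partitions)
# where `expanded_df` is a hypothetical DataFrame holding the expanded rows.
```

Rounding up follows the advice to err on the higher side with respect to the number of partitions; the estimate of the expanded size is the part that needs care in practice.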
@gggordon
gggordon / gh-pages-deploy.md
Created January 31, 2021 05:33 — forked from cobyism/gh-pages-deploy.md
Deploy to `gh-pages` from a `dist` folder on the master branch. Useful with [Yeoman](http://yeoman.io).

Deploying a subfolder to GitHub Pages

Sometimes you want to have a subdirectory on the master branch be the root directory of a repository’s gh-pages branch. This is useful for things like sites developed with Yeoman, or if you have a Jekyll site contained in the master branch alongside the rest of your code.

For the sake of this example, let’s pretend the subfolder containing your site is named dist.

Step 1

Remove the dist directory from the project’s .gitignore file (it’s ignored by default by Yeoman).

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions targetNamespace="http://om.open.ac.uk/" xmlns:apachesoap="http://xml.apache.org/xml-soap" xmlns:impl="http://om.open.ac.uk/" xmlns:intf="http://om.open.ac.uk/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:wsdlsoap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<wsdl:types>
<schema targetNamespace="http://om.open.ac.uk/" xmlns="http://www.w3.org/2001/XMLSchema">
<import namespace="http://schemas.xmlsoap.org/soap/encoding/"/>
<complexType name="ArrayOf_soapenc_string">
<complexContent>
<restriction base="soapenc:Array">
<attribute ref="soapenc:arrayType" wsdl:arrayType="soapenc:string[]"/>
@gggordon
gggordon / reset_mysql_root_password.sh
Created October 26, 2019 23:12
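Reset MySQL root password
#!/usr/bin/env bash
# Reset the MySQL root password. Run this script as root (e.g. via sudo).
NEW_PASSWORD='password'  # placeholder value; replace before use
# `su root` inside a script opens an interactive shell and stalls the script,
# so require root privileges up front instead.
if [ "$(id -u)" -ne 0 ]; then
  echo "This script must be run as root." >&2
  exit 1
fi
systemctl stop mysqld
# Make the next mysqld start skip grant tables and refuse network connections.
systemctl set-environment MYSQLD_OPTS="--skip-grant-tables --skip-networking"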
@gggordon
gggordon / students.csv
Created October 22, 2019 00:22
Students.csv
001 Rajiv Reddy 9848022337 Hyderabad 50.3
002 siddarth Battacharya 9848022338 Kolkata 23.8
003 Rajesh Khanna 9848022339 Delhi 89.7
004 Preethi Agarwal 9848022330 Pune 77.0
005 Trupthi Mohanthy 9848022336 Bhuwaneshwar 90
006 Archana Mishra 9848022335 Chennai 55
@gggordon
gggordon / install_mongodb_unattended.sh
Created August 21, 2019 01:22
Install MongoDB unattended
#!/usr/bin/env bash
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install -y --allow-unauthenticated mongodb-org
sudo systemctl start mongod
sudo systemctl status mongod
sudo systemctl enable mongod
@gggordon
gggordon / install_nodejs_unattended.sh
Created August 21, 2019 01:21
Install Node.js unattended
#!/usr/bin/env bash
curl -sL https://deb.nodesource.com/setup_8.x -o nodesource_setup.sh
sudo bash nodesource_setup.sh
sudo apt-get install -y nodejs
@gggordon
gggordon / ubuntu-lamp-php7-phpmyadmin-install.sh
Last active August 15, 2020 17:22
Ubuntu 16.04 Unattended LAMP installation
#!/usr/bin/env bash
# Copyright gggordon 2015
# License @MIT
# adapted from https://gist.github.com/gggordon/4b068ee89cf92a82d078
set -e
MYSQL_ROOT_PASS='password'
@gggordon
gggordon / hr_data.csv
Created October 16, 2017 09:22
Data Visualization Using Python - HR Data
satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,left,promotion_last_5years,sales,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.8,0.86,5,262,6,0,1,0,sales,medium
0.11,0.88,7,272,4,0,1,0,sales,medium
0.72,0.87,5,223,5,0,1,0,sales,low
0.37,0.52,2,159,3,0,1,0,sales,low
0.41,0.5,2,153,3,0,1,0,sales,low
0.1,0.77,6,247,4,0,1,0,sales,low
0.92,0.85,5,259,5,0,1,0,sales,low
0.89,1,5,224,5,0,1,0,sales,low
@gggordon
gggordon / conda-install-requirements-extended.bat
Last active October 30, 2017 04:30
Data Visualization using Python - Package Requirements
conda install alembic==0.9.6
conda install beautifulsoup4==4.4.1
conda install bleach==2.1.1
conda install blinker==1.3
conda install bokeh==0.12.9
conda install branca==0.2.0
conda install brewer2mpl==1.4.1
conda install Brlapi==0.6.4
conda install chardet==2.3.0
conda install checkbox-support==0.22