💭
How can we make lives better today 🤔?

Gilroy Gordon gggordon

@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active February 8, 2023 05:11
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically encode them as 1-byte unsigned integers, decreasing the size of the saved DataFrame by up to a factor of 8 relative to 8-byte (64-bit) integers.
  • Partition DataFrames into evenly distributed partitions of roughly 128 MB each (an empirical finding). When in doubt, err on the side of more partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually produces a DataFrame with a much larger number of rows, yet the number of partitions remains the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g., converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
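The partition-sizing tips above can be sketched as a small calculation. This is a minimal sketch in plain Python, not part of the original gist: the helper name `suggested_num_partitions`, the row counts, and the per-row sizes are all hypothetical, and only the ~128 MB target comes from the tips.

```python
import math

# ~128 MB per partition, the empirical target mentioned in the tips.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggested_num_partitions(num_rows, bytes_per_row,
                             target_bytes=TARGET_PARTITION_BYTES):
    """Return a partition count that keeps each partition at or below target_bytes.

    Hypothetical helper: the inputs are rough estimates supplied by the caller.
    """
    total_bytes = num_rows * bytes_per_row
    # Round up so that we err on the side of more partitions.
    return max(1, math.ceil(total_bytes / target_bytes))


# Assumed numbers for illustration: 1 million small index rows become,
# after flatMap and a conversion to large vectors, 50 million rows of ~8 KB.
before = suggested_num_partitions(1_000_000, 64)
after = suggested_num_partitions(50_000_000, 8 * 1024)

print(before, after)
# In PySpark the result could then be passed to df.repartition(after).
```

The point of the comparison is that a partition count that was fine before the expansion can be far too small afterwards, so the estimate should be redone using the post-flatMap row count and row size.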
@gggordon
gggordon / ntfs_write_support_for_mac.txt
Created February 4, 2016 09:13
NTFS Write Support for Mac
NTFS Write Support On OS X Mountain Lion
Posted on April 21, 2013
Source: https://prateekvjoshi.com/2013/04/21/ntfs-write-support-on-os-x-mountain-lion/
As you may have noticed, Mac OS X does not write to NTFS disks by default. You do not need to install any third-party drivers to change this: Mountain Lion 10.8.3 has built-in support for both reading and writing NTFS, but Apple does not enable write support by default.
Here is what you should do:
Uninstall other 3rd-party NTFS software, like Paragon, Tuxera or NTFS-3G.
Edit /etc/fstab (you can do this with “sudo vi /etc/fstab”)
Add the following line:
@stephanetimmermans
stephanetimmermans / ubuntu-maven-3
Last active November 16, 2020 09:04
Install Maven3 on Ubuntu 14.04
#sudo apt-get remove maven2
sudo add-apt-repository "deb http://ppa.launchpad.net/natecarlson/maven3/ubuntu precise main"
sudo apt-get update
sudo apt-get install maven3
#If you encounter this:
#The program 'mvn' can be found in the following packages:
# * maven
# * maven2
@cobyism
cobyism / gh-pages-deploy.md
Last active May 25, 2024 08:30
Deploy to `gh-pages` from a `dist` folder on the master branch. Useful for use with [yeoman](http://yeoman.io).

Deploying a subfolder to GitHub Pages

Sometimes you want to have a subdirectory on the master branch be the root directory of a repository’s gh-pages branch. This is useful for things like sites developed with Yeoman, or if you have a Jekyll site contained in the master branch alongside the rest of your code.

For the sake of this example, let’s pretend the subfolder containing your site is named dist.

Step 1

Remove the dist directory from the project’s .gitignore file (it’s ignored by default by Yeoman).
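Step 1 is normally a manual edit, but it can be scripted. The following is a minimal sketch, assuming the ignore rule is written as one of `dist`, `dist/`, `/dist`, or `/dist/`; the helper name `drop_ignore_entry` and the sample file contents are hypothetical.

```python
def drop_ignore_entry(gitignore_text, folder="dist"):
    """Return gitignore_text without the lines that ignore `folder`.

    Hypothetical helper: only the four common spellings of the rule are
    matched; negations, globs, and comments are left untouched.
    """
    variants = {folder, folder + "/", "/" + folder, "/" + folder + "/"}
    kept = [line for line in gitignore_text.splitlines()
            if line.strip() not in variants]
    return "\n".join(kept) + ("\n" if kept else "")


# Assumed sample contents of a .gitignore file.
sample = "node_modules\ndist\n.tmp\n"
cleaned = drop_ignore_entry(sample)
print(cleaned, end="")
```

To apply it to a real project, read the `.gitignore` file into a string, pass it through the function, and write the result back.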