Chong Tang ChongTang

@ChongTang
ChongTang / pyspark_memory_tune.md
Last active February 27, 2019 00:50
Things about how to tune Spark memory related parameters.

How to tune Spark w.r.t memory issues?

If you just want to know how to set the parameter values properly, go straight to this section.

Some notes about cache:

Spark provides its own native caching mechanisms, which can be used through different methods such as .persist(), .cache(), and CACHE TABLE. This native caching is effective with small data sets as well as in ETL pipelines where you need to cache intermediate results. However, Spark native caching currently does not work well with partitioning, since a cached table does not retain the partitioning data. A more generic and reliable caching technique is storage layer caching.
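As a minimal sketch of the three native caching entry points mentioned above, the helper below applies each one to a toy DataFrame. The function name `cache_three_ways`, the view name `numbers`, and the toy data are assumptions made for illustration; `pyspark` is imported inside the function only so the module can be loaded on a machine without Spark.

```python
def cache_three_ways(spark):
    """Cache toy data via .cache(), .persist() and CACHE TABLE.

    `spark` is an existing SparkSession. Returns a dict reporting
    whether each object is marked as cached.
    """
    # Lazy import (an illustration choice, not a Spark requirement)
    from pyspark import StorageLevel

    df = spark.range(1000)  # toy DataFrame with a single `id` column

    # 1. .cache(): shorthand for persist() with the default storage level
    evens = df.filter("id % 2 = 0").cache()

    # 2. .persist(): same idea, with an explicit storage level
    odds = df.filter("id % 2 = 1").persist(StorageLevel.MEMORY_AND_DISK)

    # 3. CACHE TABLE: SQL-level caching of a table or temp view
    df.createOrReplaceTempView("numbers")
    spark.sql("CACHE TABLE numbers")

    # .cache() and .persist() are lazy; an action materializes the data
    evens.count()
    odds.count()

    status = {
        "cache": evens.is_cached,
        "persist": odds.is_cached,
        "cache_table": spark.catalog.isCached("numbers"),
    }

    # Release the cached data again
    evens.unpersist()
    odds.unpersist()
    spark.catalog.uncacheTable("numbers")
    return status
```

To try it, create a local session with `SparkSession.builder.master("local[1]").getOrCreate()` and pass it in.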

A nice image showing how memory is used on each executor: yarn-spark-memory.png

  1. yarn.nodemanager.resource.memory-mb: controls the maximum sum of memory used by all containers on each Spark node.
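To make the effect of this limit concrete, here is a rough sketch of how many executor containers fit on one node. It assumes Spark's documented default for executor memory overhead (10% of executor memory, with a 384 MB floor) and ignores YARN's rounding to its minimum allocation size. The function name and the example figures are hypothetical.

```python
def executors_per_node(node_memory_mb, executor_memory_mb,
                       overhead_factor=0.10, min_overhead_mb=384):
    """Estimate how many executor containers fit in the node's YARN budget.

    node_memory_mb     -- value of yarn.nodemanager.resource.memory-mb
    executor_memory_mb -- value of spark.executor.memory, in MB
    Returns (number_of_executors, container_size_mb).
    """
    # Off-heap overhead requested on top of the executor heap
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    # Each YARN container must hold the heap plus the overhead
    container_mb = executor_memory_mb + overhead_mb
    # Whole containers only; leftover memory stays unused
    return node_memory_mb // container_mb, container_mb


# Hypothetical node: 64 GB given to YARN, 8 GB executors
count, size = executors_per_node(65536, 8192)
print(count, size)  # 7 9011
```

The real number can be lower, because YARN rounds each request up to its allocation increment and the application master also needs a container.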
ChongTang / convert id_rsa to pem
Created June 30, 2017 12:56 — forked from mingfang/convert id_rsa to pem
Convert id_rsa to pem file
openssl rsa -in ~/.ssh/id_rsa -outform pem > id_rsa.pem
chmod 600 id_rsa.pem
ChongTang / myModule.js
Created July 21, 2015 01:47
AngularJS auto focus on another input when an input's text reach certain length
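// The AngularJS code
var app = angular.module('autofocus', []);
// Directive that focuses its element once the watched text is long enough
app.directive('autofocusWhen', function () {
  return function (scope, element, attrs) {
    // Watch the scope value `maxLengthReach`
    scope.$watch('maxLengthReach', function (newValue) {
      // newValue is undefined on the first digest, so guard before reading .length
      if (newValue && newValue.length >= 5) {
        element[0].focus();
      }
    });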