Lasse Benninga lassebenni

```javascript
/**
 * Retrieves all the rows in the active spreadsheet that contain data and logs the
 * values for each row.
 * For more information on using the Spreadsheet API, see
 * https://developers.google.com/apps-script/service_spreadsheet
 */
function readRows() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();

  // log the values for each row, as described above
  for (var i = 0; i < numRows; i++) {
    Logger.log(values[i]);
  }
}
```

"Saving" your files in Git is a bit different than in most programs or editors. In Git nomenclature, saving your code or changes is referred to as committing: the changes are recorded as part of the history of the current working directory.

> A commit is the Git equivalent of a "save". Traditional saving should be thought of as a file system operation. ~ Saving changes, https://www.atlassian.com/git/tutorials/saving-changes

A commit applies to the entire directory: all changes (removing a file, editing some code, renaming a folder) are staged and recorded together in one commit. These commits (saved changes) can be stacked in the developer's local repo before being pushed out to the source repository. This makes it possible for the developer to keep working on a feature (or several) without an internet connection, all the while keeping a clear history of their changes. When the developer is ready to connect to the source/public repo, all these commits can be pushed in one go and added to the source repository.
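The stage-commit-push flow looks like this on the command line. This is a minimal sketch: the file name, commit message, and the `origin`/`main` remote and branch names are placeholders, not taken from the text above.

```shell
# one-time setup: create a repository and identify yourself
git init demo && cd demo
git config user.email "you@example.com"
git config user.name "Your Name"

# stage every change in the working directory, then record one commit
echo "hello" > file.txt
git add .
git commit -m "Add a greeting file"

# local commits stack up; push sends them all to the remote in one go
# (needs a remote named 'origin'; commented out so the sketch runs offline)
# git push origin main
```

Each `git commit` only touches the local repository, which is why you can keep committing offline and push the whole stack later.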

@lassebenni
lassebenni / hadoop-tldr.md
Created April 29, 2020 19:46
About Hadoop

TL;DR: Hadoop is a framework for distributed storage and distributed processing of very large data sets on a cluster.

The distributed storage part means that the data is stored in pieces (blocks) on multiple computers (nodes) in a resilient way, so that in case of hardware failure on one of the nodes the data stays available. This storage system is called HDFS (the Apache Hadoop Distributed File System) and acts like a single storage device (e.g. one hard disk), even though it is composed of many, many different nodes that each store a piece of the entire data set. This is very cost-efficient, as one can keep adding hardware to keep up with increasing amounts of data (horizontal scaling).

The next part is distributed processing, which means that a task (varying from simple to complex) can be split into roughly equal parts and handed to nodes (each using its own memory and CPU) to process in parallel. That way, just as in distributed storage, many computers pool their resources to act as a single "super computer" when handling a task.
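The split-process-combine idea can be sketched without Hadoop, using Python's `multiprocessing` module to stand in for the worker nodes. The chunking helper and the word-count task below are my own illustration, not Hadoop code:

```python
from multiprocessing import Pool

def count_words(chunk):
    # each "node" processes its own chunk independently
    return sum(len(line.split()) for line in chunk)

def split_into_chunks(lines, n):
    # crude stand-in for HDFS blocks: n roughly equal pieces
    size = (len(lines) + n - 1) // n
    return [lines[i:i + size] for i in range(0, len(lines), size)]

if __name__ == "__main__":
    lines = ["hello world", "foo bar baz", "one two", "three"] * 100
    chunks = split_into_chunks(lines, 4)
    with Pool(4) as pool:
        partials = pool.map(count_words, chunks)  # parallel "map" step
    total = sum(partials)                         # combine ("reduce") step
    print(total)  # 800
```

The combine step works because word counting is independent per chunk; Hadoop's MapReduce generalizes exactly this map-then-reduce shape across machines instead of processes.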

  1. Connect to Spark by creating a SparkContext.

```python
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName('somename').setMaster('local')
sc = SparkContext(conf=conf)
```

The `appName` parameter is a name for your application to show on the cluster UI. `master` is a Spark, Mesos or YARN cluster URL, or a special "local" string to run in local mode. In practice, when running on a cluster, you will not want to hardcode `master` in the program, but rather launch the application with `spark-submit` and receive it there. However, for local testing and unit tests, you can pass "local" to run Spark in-process.

  2. Create an RDD. RDDs are distributed objects that contain data, mostly used for list-like collections. Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system.
@lassebenni
lassebenni / pylang.md
Last active April 30, 2020 11:12
Python language

Memory allocation in Python

Python variables are fundamentally different from variables in C or C++. In fact, Python doesn't even have variables: Python has names, not variables.
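A quick illustration of names versus variables, in plain CPython (the list contents are just an example):

```python
# In Python, assignment binds a name to an object; it does not copy a value
a = [1, 2, 3]
b = a            # 'b' is a second name bound to the same list object
b.append(4)

print(a)               # [1, 2, 3, 4] -- the change is visible through both names
print(a is b)          # True: one object, two names
print(id(a) == id(b))  # True: both names refer to the same object identity
```

In C, `b = a` would copy the value into a second storage location; in Python it only attaches another name tag to the same object.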

@lassebenni
lassebenni / webdev.md
Created April 29, 2020 06:28
web development

Hosting websites

Virtual hosts are a way to host multiple sites on a single IP address, instead of just one site per IP.
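As a sketch, name-based virtual hosting in nginx might look like the config below: two `server` blocks listen on the same IP and port, and nginx picks the one whose `server_name` matches the request's Host header. The domain names and paths are placeholders.

```nginx
# two sites served from one IP; selection is by the Host header
server {
    listen 80;
    server_name site-one.example;
    root /var/www/site-one;
}

server {
    listen 80;
    server_name site-two.example;
    root /var/www/site-two;
}
```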


@lassebenni
lassebenni / bash.md
Last active January 20, 2021 10:36
#bash tricks

Output intermediate results to terminal using tee

TL;DR: `tee /dev/tty`

Use the `tee /dev/tty` command in between pipes. It copies its stdin to the terminal while also passing it along, which lets you confirm the intermediate result between pipes:

```shell
$ echo "hello" | sed -e 's/o//g' | tee /dev/tty | sed -e 's/hell/heaven/g'
hell
heaven
```

@lassebenni
lassebenni / display_variables.py
Last active July 8, 2019 07:04
display all variables in Jupyter Notebook
```python
from IPython.core.interactiveshell import InteractiveShell

# display every top-level expression result in a cell, not only the last one
InteractiveShell.ast_node_interactivity = "all"
```
@lassebenni
lassebenni / blurImage
Created January 8, 2015 13:35
blurImage method
```java
private Bitmap blurImage(Bitmap input) {
    Bitmap outputBitmap = Bitmap.createBitmap(input.getWidth(), input.getHeight(), Bitmap.Config.ARGB_8888);
    Canvas c = new Canvas(outputBitmap);
    Paint paint = new Paint();
    ColorFilter filter = new LightingColorFilter(0xff727272, 0x00000000);
    paint.setColorFilter(filter);
    c.drawBitmap(input, 0, 0, paint); // draw a darkened copy before blurring
    RenderScript rs = RenderScript.create(getActivity());
    ScriptIntrinsicBlur theIntrinsic = ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
    Allocation tmpIn = Allocation.createFromBitmap(rs, outputBitmap);
    Allocation tmpOut = Allocation.createFromBitmap(rs, outputBitmap);
    theIntrinsic.setRadius(25f);
    theIntrinsic.setInput(tmpIn);
    theIntrinsic.forEach(tmpOut);
    tmpOut.copyTo(outputBitmap);
    return outputBitmap;
}
```