Francisco Lopez fran0x
@fran0x
fran0x / cheat-sheet-sbt.md
Created August 11, 2016 09:38
Cheat Sheet SBT

To install sbt on OS X, run brew install sbt (requires Homebrew to be installed first).

Basic commands are the following:

| Command | Action |
| --- | --- |
| clean | Deletes all generated files (in the target directory) |
| compile | Compiles the main sources (in src/main/scala and src/main/java directories) |
fran0x / digitalocean-swarm.sh
Last active March 16, 2023 13:22
Script to create a Docker Swarm cluster in Digital Ocean
#!/bin/bash
# Configuration
#export DIGITALOCEAN_ACCESS_TOKEN= # Digital Ocean Token (mandatory to provide)
export DIGITALOCEAN_SIZE=512mb # default
export DIGITALOCEAN_REGION=nyc3 # default
export DIGITALOCEAN_PRIVATE_NETWORKING=true # default=false
#export DIGITALOCEAN_IMAGE="ubuntu-15-04-x64" # default
# For other settings see defaults in https://docs.docker.com/machine/drivers/digital-ocean/
class HowardHinnantDate < Formula
  desc "C++ library for date and time operations based on <chrono>"
  homepage "https://github.com/HowardHinnant/date"
  url "https://github.com/HowardHinnant/date/archive/v2.4.1.tar.gz"
  sha256 "98907d243397483bd7ad889bf6c66746db0d7d2a39cc9aacc041834c40b65b98"
  bottle do
    cellar :any
    sha256 "4a838948afe43157af491b4310d36ae88e5cb731181568a19f66819198f24aee" => :catalina
  end
class MysqlConnectorCxx < Formula
  desc "MySQL database connector for C++ applications"
  homepage "https://dev.mysql.com/downloads/connector/cpp/"
  url "https://dev.mysql.com/get/Downloads/Connector-C++/mysql-connector-c++-8.0.18-src.tar.gz"
  sha256 "63b20e446c0aadeddbbc5cef36db8222d602793e6f1e6de511bdf7bcb2181f86"
  revision 2
  depends_on "boost" => :build
  depends_on "cmake" => :build
  depends_on "mysql-client"
#!/bin/bash
# forked from http://codegists.com/code/spark-submit-emr/
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
In the neural network terminology:
- one epoch = one forward pass and one backward pass of all the training examples
- batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
- number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).
Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
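The arithmetic in that example can be sketched as a small shell function. This is only an illustration: the name iterations_per_epoch is hypothetical, and it assumes a final, smaller batch still counts as one iteration (ceiling division).

```shell
#!/bin/bash
# Hypothetical helper: iterations needed to complete one epoch.
# Assumes a final, smaller batch still counts as one iteration,
# hence the ceiling division.
iterations_per_epoch() {
  local examples=$1
  local batch_size=$2
  echo $(( (examples + batch_size - 1) / batch_size ))
}

iterations_per_epoch 1000 500  # prints 2
```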
fran0x / s3count.md
Created February 9, 2018 16:11 — forked from cjdd3b/s3count.md
How to count files in an S3 bucket

Counting files in S3 buckets and folders is harder than it should be. But here's a way to get it done using s3cmd:

  1. Install s3cmd
  • On Mac, brew install s3cmd
  • On Windows, go here
  2. From the command line, run s3cmd --configure
  3. Add your credentials when prompted.
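Once s3cmd is configured, one way the count might be done is to list a path recursively and count the output lines. This is a sketch, not necessarily the method the original gist goes on to describe: the function name count_s3_objects and the bucket path are placeholders, and it assumes s3cmd ls --recursive prints one object per line.

```shell
#!/bin/bash
# Hypothetical helper: count objects under an S3 path with s3cmd.
# Assumes s3cmd is installed and configured (steps above), and that
# "ls --recursive" prints one object per line.
count_s3_objects() {
  local s3_path=$1   # e.g. s3://my-bucket/some/folder/ (placeholder)
  s3cmd ls --recursive "$s3_path" | wc -l
}
```

It would be called as count_s3_objects s3://my-bucket/some/folder/, with the placeholder path swapped for a real one.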

fran0x / Spark_Jupyter_OS_X.md
Last active January 27, 2018 18:15
Steps to configure Jupyter (IPython Notebook) with Python (3.5.1) and Spark (1.6.0) kernel on Mac OS X (El Capitan)

Install Python 3, Scala and Apache Spark via Homebrew (http://brew.sh/)

brew update
brew install python3
brew install scala
brew install apache-spark

Set environment variables

fran0x / coursera_deep_learning_3.md
Created December 21, 2017 20:08
coursera_deep_learning_3.md

orthogonalization: know what to tune to achieve what effect; for this it helps to have orthogonal controls, each with a well-defined impact (steering wheel, acceleration, braking); however, that's not usually the case in machine learning

assumptions we always make in ML:

  • fit training set well on cost function (human-level performance); knobs: bigger network, better optimization algorithm (Adam)
  • hope it does well on dev set; knobs: bigger training set, regularization
  • hope it does well on test set; knob: bigger dev set
  • performs well in real world; knobs: change dev set or cost function
fran0x / spark_k8s.md
Last active October 12, 2017 13:28
Spark in Kubernetes