Francisco Lopez frank-leap

  • GSR
  • Spain
@frank-leap
frank-leap / s3count.md
Created February 9, 2018 16:11 — forked from cjdd3b/s3count.md
How to count files in an S3 bucket

Counting files in S3 buckets and folders is harder than it should be. But here's a way to get it done using s3cmd:

  1. Install S3cmd
  • On Mac, brew install s3cmd
  • On Windows, go here
  2. From the command line, run s3cmd --configure

  3. Add your credentials when prompted.
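The preview stops before the counting step itself. As a rough sketch only, assuming the count is taken from the text printed by `s3cmd ls --recursive` (one line per object, ending in an `s3://` URI), the listing could be tallied like this. The bucket name and the helper names `count_objects` and `list_bucket` are made up for illustration and are not from the original gist.

```python
import subprocess


def count_objects(listing):
    """Count object lines in text printed by `s3cmd ls`."""
    count = 0
    for line in listing.splitlines():
        fields = line.split()
        # Object lines start with a date; "DIR" lines are prefixes, not files.
        if "s3://" in line and fields[0] != "DIR":
            count += 1
    return count


def list_bucket(uri):
    """Run `s3cmd ls --recursive <uri>` and return its stdout (needs s3cmd configured)."""
    result = subprocess.run(
        ["s3cmd", "ls", "--recursive", uri],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical listing, shaped like s3cmd output, so the sketch runs offline.
s3_sample = """\
2018-02-09 16:11      1234   s3://example-bucket/logs/a.txt
2018-02-09 16:12      5678   s3://example-bucket/logs/b.txt
                       DIR   s3://example-bucket/archive/
"""
print(count_objects(s3_sample))  # 2
```

Against a real bucket, `count_objects(list_bucket("s3://example-bucket/"))` would combine the two helpers.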

frank-leap / build.sbt
Created November 20, 2016 19:24 — forked from r-wheeler/build.sbt
build.sbt for stanford NLP
name := "Simple Project"
version := "1.0"
libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0"
libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0" classifier "models"
frank-leap / 00_OSX_Docker_Machine_Setup.md
Created September 13, 2016 15:19 — forked from bitjockey42/00_OSX_Docker_Machine_Setup.md
Use native virtualization on OS X docker with xhyve

What is this?

One of the painful points of using docker on OS X is that you need to run a virtualbox VM, which often suffers from performance issues. With xhyve, an OS X virtualization system, and docker-machine-xhyve, you can now have docker use the native OS X hypervisor to run containers.

No more dealing with virtualbox shenanigans!

In this script, I've also set up a way to autoconfigure terminal sessions to load docker's environment vars (dependent on docker-machine) so you do not have to run eval $(docker-machine env whatever) every time you open a new terminal window.

Requirements

frank-leap / machine-learning.md
Created July 10, 2016 17:22 — forked from jaceklaskowski/machine-learning.md
Machine Learning for the very Impatient

How much of machine learning is statistics and vice versa?

Learning using https://www.coursera.org/learn/machine-learning/home/welcome

  • machine learning = teaching a computer to learn concepts using data — without being explicitly programmed.
  • Supervised learning = "right answers" given
  • Regression problem
    • continuous valued output
    • deduce the function for a given data set and predict other values
  • "in regression problems, we are taking input variables and trying to map the output onto a continuous expected result function."
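To make "deduce the function for a given data set and predict other values" concrete, here is a minimal sketch of a one-variable regression. It uses the closed-form least-squares line, which is one way to fit such a function and not necessarily the method the course uses at this point. The data and the names `fit_line` and `predict` are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new input: a continuous-valued output."""
    slope, intercept = model
    return slope * x + intercept


# Made-up "right answers": price is exactly 3 * size in this toy data set.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

model = fit_line(sizes, prices)
print(predict(model, 90.0))  # 270.0
```

The labelled pairs are the "right answers" of supervised learning, and the prediction for an unseen size is the continuous output that makes this a regression problem.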
frank-leap / spark.md
Last active August 8, 2016 07:20 — forked from jaceklaskowski/spark-intro.md
Introduction to Apache Spark

Introducing Apache Spark

  • What use cases are a good fit for Apache Spark? How to work with Spark?
    • create RDDs, transform them, and execute actions to get result of a computation
    • All computations in memory = "memory is cheap" (we do need enough memory to fit all the data in)
      • the fewer disk operations, the faster (you do know it, don't you?)
    • You develop such computation flows or pipelines using a programming language - Scala, Python or Java <-- that's where ability to write code is paramount
    • Data is usually on a distributed file system like Hadoop HDFS or NoSQL databases like Cassandra
    • Data mining = analysis / insights / analytics
  • log mining
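The create / transform / act flow from the notes above can be sketched with a plain-Python analogy. This is not the Spark API: generators stand in for lazy transformations and `Counter` stands in for the action that forces the computation. In Spark the same shape would typically be `filter` and `map` on an RDD followed by a counting action. The log lines and the name `error_counts` are invented for illustration.

```python
from collections import Counter


def error_counts(lines):
    """Count ERROR lines per component (second whitespace-separated field)."""
    # "Transformations": generators are lazy, so nothing is processed yet.
    errors = (line for line in lines if line.startswith("ERROR"))
    components = (line.split()[1] for line in errors)
    # "Action": consuming the generator runs the whole pipeline.
    return Counter(components)


# Made-up log data in the form "<LEVEL> <component> <message>".
log = [
    "INFO scheduler job started",
    "ERROR storage disk full",
    "ERROR network timeout",
    "ERROR storage write failed",
]

counts = error_counts(log)
print(counts.most_common(1))  # [('storage', 2)]
```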
frank-leap / jvm-tools.md
Created July 1, 2016 07:32 — forked from jaceklaskowski/jvm-tools.md
I should have known these tools earlier - a story about jps, jstat and jmap

From http://stackoverflow.com/a/32393044/1305344:

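import scala.io.StdIn.readLine

object size extends App {
  // Allocate a million (String, Unit) tuples so there is heap activity to inspect
  (1 to 1000000).map(i => ("foo" + i, ()))
  // Block on input so the JVM stays alive while jps, jstat and jmap are run
  val input = readLine("prompt> ")
}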

Run it with sbt 'runMain size' and then use jps (to find the pids), jstat -gc pid (to query GC statistics) and jmap (similar to jstat) to analyze resource allocation.
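The jps-then-jstat/jmap sequence can be scripted. The sketch below assumes jps prints one `<pid> <name>` pair per line and only builds the follow-up command lines without running them. The sample output is invented, and which name jps reports for the program depends on how it was launched (for example, it may appear under the sbt launcher instead of `size`). The helper names `find_pids` and `inspection_commands` are hypothetical.

```python
def find_pids(jps_output, name):
    """Return the pids whose jps name matches `name` exactly."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[1] == name:
            pids.append(int(parts[0]))
    return pids


def inspection_commands(pid):
    """Build the command lines for GC statistics and a heap histogram."""
    return [
        ["jstat", "-gc", str(pid)],
        ["jmap", "-histo", str(pid)],
    ]


# Invented jps output, used so the sketch runs without a JVM.
jps_sample = """\
4242 Jps
4101 sbt-launch.jar
4180 size
"""

for pid in find_pids(jps_sample, "size"):
    for command in inspection_commands(pid):
        print(" ".join(command))
```

Each command list can be passed to `subprocess.run` on a machine where the JDK tools are on the PATH.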

frank-leap / spark-jobserver-docker-macos.md
Created July 1, 2016 07:32 — forked from jaceklaskowski/spark-jobserver-docker-macos.md
How to run spark-jobserver on Docker and Mac OS (using docker-machine)