Francisco Lopez frank-leap

## s3count.md

      
Created February 9, 2018 16:11 (forked from cjdd3b/s3count.md)

How to count files in an S3 bucket

Counting files in S3 buckets and folders is harder than it should be. But here's a way to get it done using s3cmd:

1. Install s3cmd
   - On Mac, brew install s3cmd
   - On Windows, go here
2. From the command line, run s3cmd --configure
3. Add your credentials when prompted.
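
As a minimal sketch of the counting step, assuming s3cmd is installed and configured as in the steps above: listing a prefix recursively with s3cmd ls --recursive prints one line per object, so counting the output lines gives the object count. The Scala wrapper below is illustrative only; the names S3Count, countListing and countObjects, and any bucket URI passed in, are hypothetical and not part of s3cmd.

```scala
import scala.sys.process._

object S3Count {
  // Count the non-empty lines of a listing; each line describes one object.
  def countListing(listing: String): Int =
    listing.split("\n").count(_.trim.nonEmpty)

  // Runs `s3cmd ls --recursive <uri>` and counts its output lines.
  // Assumes s3cmd is on the PATH and already configured;
  // `!!` throws if the command exits with a non-zero status.
  def countObjects(uri: String): Int =
    countListing(Seq("s3cmd", "ls", "--recursive", uri).!!)
}
```

Calling S3Count.countObjects with an s3:// URI of your own bucket and folder would then return the count. For very large buckets the whole listing is held in memory, so streaming the output may be preferable.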


## build.sbt
name := "Simple Project"

version := "1.0"

libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0"

libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0" classifier "models"

## 00_OSX_Docker_Machine_Setup.md

      
Created September 13, 2016 15:19 (forked from bitjockey42/00_OSX_Docker_Machine_Setup.md)

Use native virtualization on OS X docker with xhyve

What this?

So one of the painful points of using Docker on OS X is that you need to run a VirtualBox VM, which often suffers from performance issues. With xhyve, an OS X virtualization system, and docker-machine-xhyve, you can now have Docker use the native OS X hypervisor to run containers.
No more dealing with VirtualBox shenanigans!
In this script, I've also set up a way to autoconfigure terminal sessions to load docker's environment vars (dependent on docker-machine) so you do not have to run eval $(docker-machine env whatever) every time you open a new terminal window.
Requirements


## machine-learning.md

      
Created July 10, 2016 17:22 (forked from jaceklaskowski/machine-learning.md)

Machine Learning for the very Impatient

How much of machine learning is statistics and vice versa?
Learning using https://www.coursera.org/learn/machine-learning/home/welcome

- machine learning = teaching a computer to learn concepts using data, without being explicitly programmed.
- Supervised learning = "right answers" given
- Regression problem
  - continuous valued output
  - deduce the function for a given data set and predict other values
- "in regression problems, we are taking input variables and trying to map the output onto a continuous expected result function."

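The regression idea above (deduce a function from a data set, then predict other values) can be sketched with a straight-line fit. This is a minimal illustration using the closed-form least-squares solution for one input variable, which is one common choice and not necessarily the method the course uses; LinearFit, fit and predict are hypothetical names.

```scala
object LinearFit {
  // Returns (intercept, slope) minimising the squared error of
  // y ≈ intercept + slope * x over the given paired points.
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Covariance-like and variance-like sums around the means
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx != 0.0, "xs must not all be equal")
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  // Continuous-valued prediction for a new input x.
  def predict(model: (Double, Double), x: Double): Double =
    model._1 + model._2 * x
}
```

For example, fitting the points (1, 3), (2, 5), (3, 7), (4, 9) recovers the line y = 1 + 2x, and predict on that model then gives a continuous output for inputs not in the data set.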

## spark.md

      
Last active August 8, 2016 07:20 (forked from jaceklaskowski/spark-intro.md)

Introduction to Apache Spark

Introducing Apache Spark


What use cases are a good fit for Apache Spark? How to work with Spark?

- create RDDs, transform them, and execute actions to get the result of a computation
- All computations in memory = "memory is cheap" (we do need enough memory to fit all the data in)
  - the fewer disk operations, the faster (you do know it, don't you?)
- You develop such computation flows or pipelines using a programming language - Scala, Python or Java <-- that's where the ability to write code is paramount
- Data is usually on a distributed file system like Hadoop HDFS or NoSQL databases like Cassandra
- Data mining = analysis / insights / analytics
  - log mining
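
The create, transform, act flow described above can be sketched without a Spark installation by letting a lazy Scala collection view stand in for an RDD. This is an analogy only and not Spark's API: the view plays the role of transformations (nothing runs yet) and forcing it plays the role of an action. LogMiningSketch and the sample log lines are hypothetical.

```scala
object LogMiningSketch {
  // Hypothetical sample log lines; real input would come from HDFS or similar.
  val lines: Seq[String] = Seq(
    "INFO service started",
    "ERROR disk full",
    "WARN slow response",
    "ERROR connection reset"
  )

  // "Transformations": a view is lazy, so nothing is computed here yet.
  val errorMessages = lines.view
    .filter(_.startsWith("ERROR"))
    .map(_.stripPrefix("ERROR").trim)

  // "Actions": forcing the view runs the pipeline and yields a result.
  val errorCount: Int = errorMessages.size
  val errors: List[String] = errorMessages.toList
}
```

Each forcing call re-runs the filter and map steps, which loosely mirrors why repeated actions on the same data benefit from keeping intermediate results in memory.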


## jvm-tools.md

      
Created July 1, 2016 07:32 (forked from jaceklaskowski/jvm-tools.md)

I should have known these tools earlier - a story about jps, jstat and jmap

From http://stackoverflow.com/a/32393044/1305344:
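import scala.io.StdIn.readLine

object size extends App {
  // Allocate a million tuples so the heap has something to inspect
  (1 to 1000000).map(i => ("foo" + i, ()))
  // Block on input to keep the JVM alive while jps, jstat and jmap run
  val input = readLine("prompt> ")
}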

Run it with sbt 'runMain size' and then use jps (to find the pids), jstat -gc pid (to query GC statistics) and jmap (for heap details, e.g. jmap -histo pid) to analyse resource allocation.

  
## spark-jobserver-docker-macos.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                frank-leap
                / spark-jobserver-docker-macos.md
            
            
              Created
              July 1, 2016 07:32
                — forked from jaceklaskowski/spark-jobserver-docker-macos.md
            
              
                How to run spark-jobserver on Docker and Mac OS (using docker-machine)
              
          
    From https://github.com/spark-jobserver/spark-jobserver#getting-started-with-spark-job-server:

The easiest way to get started is to try the Docker container which prepackages a Spark distribution with the job server and lets you start and deploy it.

➜  spark-jobserver git:(master) docker-machine version
docker-machine version 0.7.0, build a650a40

// https://gist.github.com/radekg/ec5a1575c450a48e5cba
	name := "Simple Project"

	version := "1.0"

	libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0"

	libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.3.0" classifier "models"