Jacek Laskowski jaceklaskowski

:octocat:
Enjoying developer life...
View GitHub Profile
@jaceklaskowski
jaceklaskowski / tensorflow.md
Last active August 7, 2017 06:29
Notes about TensorFlow (before settling on Apache BEAM and Databricks' TensorFrames)

TensorFlow

What is TensorFlow?

  • Google's TensorFlow is an open source machine learning library for deep learning (neural networks)
    • Grew out of DistBelief, the deep learning system built by the Google Brain project (TensorFlow is its second generation)
  • It aims to simplify deploying large-scale machine learning models to a variety of hardware (thousands of servers in datacenters, smartphones, GPUs).
  • Much like Theano, a popular deep learning framework.
  • A Data Flow Graph (aka Computational Graph or TensorFlow Graph of Computation) has nodes for data or operations and edges for the flow of data between nodes.
  • A tensor is a multi-dimensional array that flows along the edges between nodes.
jaceklaskowski / spark-exercise-custom-defaultsource.md
Last active January 9, 2018 19:19
Exercise: Creating Custom Format for DataFrameReader in Apache Spark
  1. Create a Scala/sbt project
     • Use IntelliJ IDEA
  2. Add libraryDependencies for Spark 2.0.0 (RC2)
  3. Create class mf.DefaultSource (or similar)
  4. publishLocal (or similar)
  5. ./bin/spark-shell --packages organization:spark-mf-format_2.11:1.0.0
  6. spark.read.format("mf").load("mojFormat.mf")
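A minimal sketch of step 3, assuming `libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"` from step 2 is on the classpath. The class and schema here are illustrative (a single `line` column backed by `textFile`); a real `.mf` format would parse the file properly. Spark resolves `format("mf")` by looking up a class named `DefaultSource` in the `mf` package:

```scala
package mf

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// spark.read.format("mf") resolves to mf.DefaultSource
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new MfRelation(sqlContext, parameters("path"))
}

// Toy relation: one string column, every line of the file becomes a Row
class MfRelation(val sqlContext: SQLContext, path: String)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(StructField("line", StringType, nullable = true) :: Nil)

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.textFile(path).map(Row(_))
}
```

After publishLocal (step 4), `spark.read.format("mf").load("mojFormat.mf")` returns a DataFrame with the single `line` column.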

For the bravest:

jaceklaskowski / spark-summit-sf-2016-talks.md
Last active November 2, 2016 12:48
Reviews of Spark Summit 2016 Talks -- Must-watches

Awesome Talks -- Watch them!

  1. Deep Dive: Apache Spark Memory Management - An excellent talk about Spark's memory management in past releases and the upcoming 2.0. No code. The slides were awesome, with a superb presentation style. Very informative.
  2. A Deep Dive Into Structured Streaming -- a superb talk about the upcoming Structured Streaming in Spark 2.0.
  3. Structuring Spark: Dataframes, Datasets And Streaming -- another superb talk about the reasons for structuring Spark using Datasets by the one and only Michael Armbrust.
  4. Large-Scale Deep Learning with TensorFlow by Jeff Dean (Google) -- just yesterday I was thinking about feature vectors and how closely they map to the real objects they are supposed to represent, and that gave me the aha moment that the more features the better, but you need to be careful with over-featuring the m
jaceklaskowski / sparksummit-west-2016.md
Last active March 24, 2017 11:43
Spark Summit West 2016 Sparked My Interest -- Spark Summit West 2016 in San Francisco (to review at the earliest convenience)
jaceklaskowski / spark-hackathon.md
Created May 14, 2016 14:52
Apache Spark Hackathon
jaceklaskowski / spark-jobserver-docker-macos.md
Last active August 1, 2018 11:28
How to run spark-jobserver on Docker and Mac OS (using docker-machine)
jaceklaskowski / apache-spark-meetup.md
Last active October 15, 2015 06:50
What people asked to cover at Apache Spark meetups

The Warsaw Scala Enthusiasts meetup about Apache Spark was themed Let's Scala few Apache Spark apps together!, with the follow-up Let's Scala few Apache Spark apps together - part 2!.

Many, many people answered the question:

EN: What and how would you like to learn at the meetup (about Apache Spark)?

The answers are as follows (and are going to be the foundation for the agenda):

  1. Set up a cluster using many laptops and see how much it could handle.
  2. MLlib with a simple classification like logistic regression.
jaceklaskowski / jvm-tools.md
Created September 4, 2015 09:55
I should have known these tools earlier - a story about jps, jstat and jmap

From http://stackoverflow.com/a/32393044/1305344:

object size extends App {
  // allocate a million tuples to make the heap interesting to inspect
  (1 to 1000000).map(i => ("foo" + i, ()))
  // block on stdin so the JVM stays alive while jps/jstat/jmap run
  val input = scala.io.StdIn.readLine("prompt> ")
}

Run it with sbt 'runMain size' and then use jps (to find the pid), jstat -gc pid (to query GC statistics) and jmap (to inspect the heap, similar to jstat) to analyse resource allocation.
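A sketch of that workflow with the standard JDK tools (the pid is a placeholder you read off the jps output; the 1000 ms sampling interval is arbitrary):

```shell
# list JVM pids with their main classes and arguments
jps -lm

# sample GC heap sizes and collection counts every 1000 ms
jstat -gc <pid> 1000

# print a histogram of live heap objects by class
jmap -histo <pid> | head -20
```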

jaceklaskowski / spark-intro.md
Last active February 29, 2020 19:38
Introduction to Apache Spark

Introducing Apache Spark

  • What use cases are a good fit for Apache Spark? How to work with Spark?
    • create RDDs, transform them, and execute actions to get result of a computation
    • All computations in memory = "memory is cheap" (though we do need enough memory to fit all the data in)
      • the fewer disk operations, the faster (you do know it, don't you?)
    • You develop such computation flows or pipelines using a programming language - Scala, Python or Java <-- that's where the ability to write code is paramount
    • Data usually lives on a distributed file system like Hadoop HDFS or in NoSQL databases like Cassandra
    • Data mining = analysis / insights / analytics
  • log mining
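The create-transform-action flow from the bullets above can be sketched in Scala as a tiny log-mining job (the file path and the ERROR filter are illustrative; assumes spark-core on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogMining extends App {
  val conf = new SparkConf().setAppName("log-mining").setMaster("local[*]")
  val sc = new SparkContext(conf)

  val lines  = sc.textFile("logs/app.log")        // create an RDD (lazy)
  val errors = lines.filter(_.contains("ERROR"))  // transformation (still lazy)
  val count  = errors.count()                     // action: triggers the computation

  println(s"ERROR lines: $count")
  sc.stop()
}
```

Nothing is read or computed until `count()` runs; the transformations only build up the lineage of the computation.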
jaceklaskowski / sphinx-dockerd.md
Last active April 23, 2021 06:33
Writing docs using Sphinx (inside Docker)