Last active May 21, 2017 17:09
PyDataBcn 2017

keynote Travis Oliphant

  • NumFocus (ngo)
  • ML -> python
  • anaconda makes ML magic available to mortals
    • modeling, predicting, classif, visualization
    • feat labeling, data clean, data extrac, scaling, deploy
  • spyder IDE
  • recommends: Scikit, TF, Keras, XGBoost
  • intros to numpy, scipy (stats helpers), matplotlib
  • numba (bigger nodes, scale up) vs dask (more nodes, scale out), blaze (best of both: GPU cluster)
  • jupyterLab

keynote Holden Karau

  • PySpark
  • RDDs/Dataframes
  • FP
  • DAG (& the query plan)
  • Py4J (py access java obj in JVM)
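
The lazy, functional style behind RDDs can be sketched in plain Python. This is not the PySpark API: `LazyDataset`, `explain` and the other names are hypothetical, and the plan here is a simple linear chain rather than a general DAG. The point is only that transformations record steps, while an action runs them.

```python
from functools import reduce


class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on demand."""

    def __init__(self, source, plan=()):
        self._source = source   # any re-iterable collection of input records
        self._plan = plan       # tuple of (step_name, function) pairs

    def map(self, fn):
        # Transformation: returns a new dataset with a longer plan, runs nothing.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def explain(self):
        # A linear "query plan": the chain of steps an action would execute.
        return " -> ".join(["source"] + [name for name, _ in self._plan])

    def _iterate(self):
        stream = iter(self._source)
        for name, fn in self._plan:
            stream = map(fn, stream) if name == "map" else filter(fn, stream)
        return stream

    def collect(self):
        # Action: only now is the plan executed.
        return list(self._iterate())

    def reduce(self, fn):
        # Action: folds the results into a single value.
        return reduce(fn, self._iterate())


calls = []


def square(x):
    calls.append(x)  # side effect that reveals when work actually happens
    return x * x


ds = LazyDataset(range(6)).filter(lambda x: x % 2 == 0).map(square)
plan = ds.explain()
calls_before_action = list(calls)  # still empty: nothing has run yet
result = ds.collect()
total = ds.reduce(lambda a, b: a + b)
```

Because each transformation returns a new object, the same plan can be re-run by several actions, which is roughly why a real engine can inspect and optimize the plan before executing it.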

climate data

  • weather API
  • github/ibm-cds-labs/python-notebooks
  • pixiedust (graphs with menu editable)
    • mapbox
  • and
  • scipy.interpolate to make a map
  • forecast weather to change retail offers
  • pd.merge_

Happiness inside the job

  • Tuesday is the saddest day
  • Exploring data
    • Choose day to post job offer
  • graphs employeeA - employeeB
    • intracompany interactions
  • ML churn prediction input
    • Employee individual features
    • Company wide features
    • Employee-company features
    • Social features

Analyzing code contributions with networkX and matplotlib

  • cohesion of a group
    • robustness
    • overlap
  • connectivity: remove actors until the group is disconnected, or count the different paths between actors
    • k-components
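
The "remove actors until the group is disconnected" idea can be sketched with the standard library alone. This is a brute-force illustration for small graphs, not the talk's code; the function names are hypothetical. In practice networkX provides `node_connectivity` and `k_components` for the same questions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected graph as a dict mapping each node to its set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, removed=frozenset()):
    """Depth-first search over the nodes that have not been removed."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in removed and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)


def min_actors_to_disconnect(adj):
    """Smallest set of nodes whose removal disconnects the group.

    Tries every subset in order of size, so it is exponential in the
    number of nodes. Returns (size, nodes), or None when no removal can
    disconnect the group (a complete graph).
    """
    nodes = list(adj)
    for k in range(len(nodes) - 1):
        for cut in combinations(nodes, k):
            if not is_connected(adj, frozenset(cut)):
                return k, set(cut)
    return None


# Two triangles sharing contributor "c": one removal splits the group.
bowtie = build_adjacency(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
)
# A ring of four contributors: two removals are needed.
ring = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
# Everyone connected to everyone: cannot be disconnected.
triangle = build_adjacency([("a", "b"), ("b", "c"), ("a", "c")])

bowtie_cut = min_actors_to_disconnect(bowtie)
ring_cut = min_actors_to_disconnect(ring)
triangle_cut = min_actors_to_disconnect(triangle)
```

A higher number means a more robust group: the ring survives the loss of any single contributor, while the two triangles depend entirely on "c".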

Rolling Pandas

  • in series or dataframes
  • inclusion-exclusion, summed area tables
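
A minimal sketch of the inclusion-exclusion idea behind rolling sums, in plain Python rather than pandas; the function names are hypothetical. Each window sum is a difference of two running totals, and the 2-D version of the same trick is the summed area table. Unlike pandas' `rolling(...).sum()`, which by default yields NaN for incomplete leading windows, this sketch returns only the full windows.

```python
def rolling_sum(values, window):
    """Rolling sum via prefix sums: each window is a difference of two totals."""
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [prefix[i + window] - prefix[i] for i in range(len(values) - window + 1)]


def summed_area_table(grid):
    """sat[r][c] is the sum of grid rows 0..r-1 and columns 0..c-1.

    The extra row and column of zeros avoid special cases at the borders.
    """
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def box_sum(sat, r0, c0, r1, c1):
    """Sum of rows r0..r1-1, columns c0..c1-1 by inclusion-exclusion."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


series_windows = rolling_sum([1, 2, 3, 4, 5], window=3)

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
sat = summed_area_table(grid)
lower_right = box_sum(sat, 1, 1, 3, 3)  # the 2x2 block 5, 6, 8, 9
whole_grid = box_sum(sat, 0, 0, 3, 3)
```

After the one-off pass that builds the table, every window or box query costs a constant number of lookups regardless of its size.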

Asteroid prediction impact

  • TensorFlow
    • tensor = n-dimensional
    • flow = graph that shows flow of the data
  • TensorFlow Google neural network visualization
  • Google released a library of labeled images
  • TensorFlow codelab in her GitHub
  • steps: explore dataset, recognition protocol, 1st layer, evaluation

Squeeze your big data

  • old-time tale: the faster the transmission line, the less compression is needed
    • modern CPUs are so fast that memory bus is bottleneck
  • Blosc -> compressor that uses multiple cores
  • data containers, chunked containers
    • On disk: HDF5 format, NetCDF4
    • In memory: bcolz, zarr
  • compression in ML
    • Tuple Oriented Coding
  • Bandwidth that sends data to GPU is slow, compress from CPU to GPU.
  • Only in recent CPUs
  • Use compressed data chunks
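
A minimal sketch of a chunked, compressed in-memory container. It assumes the standard library's `zlib` as a stand-in for Blosc, and the class name and parameters are hypothetical, so it shows only the chunking idea, not the multi-core speed that makes Blosc interesting or the actual APIs of bcolz and zarr.

```python
import struct
import zlib


class ChunkedCompressedArray:
    """Stores float64 values as independently compressed chunks."""

    def __init__(self, values, chunk_len=1024, level=6):
        self.chunk_len = chunk_len
        self.length = len(values)
        self.chunks = []
        for start in range(0, len(values), chunk_len):
            block = values[start:start + chunk_len]
            raw = struct.pack(f"<{len(block)}d", *block)  # little-endian doubles
            self.chunks.append(zlib.compress(raw, level))

    def _load_chunk(self, index):
        raw = zlib.decompress(self.chunks[index])
        return struct.unpack(f"<{len(raw) // 8}d", raw)

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Only the chunk that holds element i is decompressed.
        chunk_index, offset = divmod(i, self.chunk_len)
        return self._load_chunk(chunk_index)[offset]

    def sum(self):
        # Stream chunk by chunk so one decompressed chunk is live at a time.
        return sum(sum(self._load_chunk(k)) for k in range(len(self.chunks)))

    @property
    def compressed_bytes(self):
        return sum(len(c) for c in self.chunks)


data = [float(i % 10) for i in range(10_000)]
arr = ChunkedCompressedArray(data, chunk_len=1000)

uncompressed_bytes = 8 * len(data)
one_value = arr[4321]
total = arr.sum()
```

Keeping chunks independent is the design choice that matters: random access and streaming reductions touch one small buffer at a time, which is what lets the compressed form travel over the slow link and be expanded near where it is used.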


  • control all HW easily
  • used python in all steps!

Jupyter as interactive dashboard

  • relies on a web front end, but that is not ready until later stages of the project
  • Early stage -> prototype to validate, understand the data
  • js UI is difficult -> use jupyter
  • ipywidgets
    • add components
    • layout widgets (boxes, tabs, accordion)
    • jupyter in dashboard mode
    • far from ideal for production
  • good for prototype
    • for production: kibana, grafana


  • Distributed ledger (you need C in CAP)
  • pyledger

Marketing data science

  • 27 features to represent customers
  • personality segments & groups