- NumFOCUS (NGO)
- ML -> python
- anaconda makes ML magic available to mortals
- modeling, prediction, classification, visualization
- feature labeling, data cleaning, data extraction, scaling, deployment
- spyder IDE
- recommends: scikit-learn, TensorFlow, Keras, XGBoost
- intros to NumPy, SciPy (stats helpers), matplotlib
- numba (bigger nodes, scale up) vs dask (more nodes, scale out), blaze (best of both: GPU cluster)
- JupyterLab
- PySpark
- RDDs/Dataframes
- FP
- DAG (& the query plan)
- Py4J (lets Python access Java objects in the JVM)
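A rough sketch of the RDD / FP / DAG idea noted above: transformations only record a plan, and nothing runs until an action is called. This is plain Python, not the PySpark API; `LazyDataset`, `explain` and `collect` are hypothetical names chosen for illustration.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, plan=()):
        self._source = source   # any iterable of records
        self._plan = plan       # tuple of (op_name, function) steps

    def map(self, fn):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only extends the recorded plan.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def explain(self):
        # The recorded lineage, similar in spirit to a query plan.
        return ["source"] + [name for name, _ in self._plan]

    def collect(self):
        # Action: only now are the recorded steps applied, in order.
        data = iter(self._source)
        for name, fn in self._plan:
            data = map(fn, data) if name == "map" else filter(fn, data)
        return list(data)


squares_of_evens = (
    LazyDataset(range(6))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
)
print(squares_of_evens.explain())  # ['source', 'filter', 'map']
print(squares_of_evens.collect())  # [0, 4, 16]
```

Each transformation returns a new object instead of mutating the old one, which is the functional style the notes point to.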
- console.ng.bluemix.net weather API
- datascience.ibm.com, github/ibm-cds-labs/python-notebooks
- PixieDust (charts editable through a menu)
- mapbox
- climexp.knmi.nl and ecmwf.int/en/forecast/datasets
- scipy.interpolate to make a map
- forecast weather to change retail offers
- pd.merge_
- medium.com/ibm-watson-data-lab
- seti.org/ML4SETI
- Tuesday is the saddest day
- Exploring data
- Choose day to post job offer
- graphs employeeA - employeeB
- intracompany interactions
- ML churn prediction input
- Employee individual features
- Company wide features
- Employee-company features
- Social features
- cohesion of a group
- robustness
- overlap
- connectivity: remove actors until the group is disconnected, or count the different paths between them
- k-components
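The "remove actors until the group is disconnected" measure can be sketched with a brute-force search on a small interaction graph. This is a minimal illustration, assuming an undirected graph stored as a dict of neighbor sets; the function names and the sample `team` and `ring` graphs are made up for the example, not data from the talk. For real graphs a library such as NetworkX (which provides `node_connectivity` and `k_components`) would be the usual choice.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """True if the graph stays in one piece after dropping the `removed` nodes."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in removed and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)


def node_connectivity(adj):
    """Smallest number of actors whose removal disconnects the group.

    Brute force over all subsets, so only suitable for tiny graphs.
    """
    n = len(adj)
    for k in range(n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, frozenset(cut)):
                return k
    return n - 1  # complete graph: n - 1 by convention


# Hypothetical example: two triangles that share a single employee "C".
team = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}
# Hypothetical example: four people in a ring.
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}

print(node_connectivity(team))  # 1: removing "C" splits the group
print(node_connectivity(ring))  # 2
```

A higher value means a more robust group: more people have to leave before it falls apart.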
- in series or dataframes
- inclusion-exclusion, summed area tables
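A summed area table uses inclusion-exclusion to answer "sum of any rectangle" queries in constant time after one pass over the data. The sketch below works on plain nested lists; the function names and the 3x3 sample grid are illustrative assumptions, and the notes do not record how this was applied to series or dataframes.

```python
def summed_area_table(grid):
    """sat[r][c] holds the sum of grid[0:r][0:c]; row 0 and column 0 are zero padding."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Inclusion-exclusion: add the two neighbors, subtract the overlap.
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def rect_sum(sat, r0, c0, r1, c1):
    """Sum of rows r0..r1-1 and columns c0..c1-1 (half-open ranges)."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sat = summed_area_table(grid)
print(rect_sum(sat, 0, 0, 3, 3))  # 45, the whole grid
print(rect_sum(sat, 1, 1, 3, 3))  # 28 = 5 + 6 + 8 + 9
```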
- TensorFlow
- tensor = n-dimensional
- flow = graph that shows flow of the data
- TensorFlow: Google neural network visualization
- Google released a library of labeled images
- TensorFlow codelab on her GitHub
- steps: explore dataset, recognition protocol, 1st layer, evaluation
- old-time tale: the faster the transmission line, the less compression is needed
- modern CPUs are so fast that the memory bus is the bottleneck
- Blosc -> compressor that uses multiple cores
- data containers, chunked containers
- On disk: HDF5 format, NetCDF4
- In memory: bcolz, zarr
- compression in ML
- Tuple Oriented Coding
- Bandwidth from CPU to GPU is slow, so compress the data sent from CPU to GPU.
- Only in recent CPUs
- Use compressed data chunks
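The chunked-container idea can be sketched with the standard library alone: split the buffer into chunks, compress each chunk independently (here on a thread pool), and decompress only the chunk that is needed. This uses `zlib` as a stand-in and is not the Blosc, bcolz or zarr API; `CHUNK_SIZE`, the worker count and the function names are assumptions for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not a value from the talk


def compress_chunks(data, chunk_size=CHUNK_SIZE, workers=4):
    """Split `data` into fixed-size chunks and compress each one independently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def read_chunk(compressed, index):
    """Decompress a single chunk without touching the others."""
    return zlib.decompress(compressed[index])


def decompress_all(compressed):
    """Rebuild the original buffer from all chunks."""
    return b"".join(zlib.decompress(c) for c in compressed)


data = bytes(range(256)) * 4096          # 1 MiB of repetitive sample data
compressed = compress_chunks(data)

print(len(compressed))                    # 16 chunks of 64 KiB each
print(decompress_all(compressed) == data) # True
print(read_chunk(compressed, 3) == data[3 * CHUNK_SIZE:4 * CHUNK_SIZE])  # True
```

Because chunks are independent, they can be compressed in parallel and read selectively, which is the property chunked containers rely on.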
- control all HW easily
- used python in all steps!
- relies on a web frontend, but it is not ready until later stages of the project
- Early stage -> prototype to validate and understand the data
- JS UI is difficult -> use Jupyter
- ipywidgets
- add components
- layout widgets (boxes, tabs, accordion)
- Jupyter in dashboard mode
- far from ideal for production
- good for prototype
- for production: Kibana, Grafana
- Distributed ledger (you need the C, consistency, in CAP)
- pyledger
- 27 features to represent customers
- personality segments & groups