@martin-kokos
Created July 11, 2018 12:30
Personal notes from PyData Berlin 2018

Populism & ML (political sciences)

  • scholars do not agree on a definition of populism
  • scrape manifestos and speeches, have experts label them
  • correlate populism x time in office x exit strategies
  • don't evaluate countries below the democratic threshold (outliers)
  • "Populists in Europe"
  • used: TF-IDF, scattertext, gensim, pyLDAvis
  • cat-and-mouse problem: populists can avoid looking like populists
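
A minimal sketch of the TF-IDF weighting mentioned above, using only the standard library. The example speeches and the `tf_idf` helper are made up for illustration and are not the speaker's code; real libraries (scikit-learn, gensim) apply extra smoothing and normalisation, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain tf * log(N/df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, one string per "speech".
speeches = [
    "the people against the elite",
    "the budget for the next year",
    "the elite ignores the people",
]
scores = tf_idf(speeches)
# Terms that best distinguish the first speech from the others.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:2]
```

A word that appears in every document ("the") gets weight zero, while a word unique to one speech ("against") ranks highest for it.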

Smart contracts (Ethereum)

  • Python library Populus for unit testing smart contracts written in Solidity
  • remix.ethereum.org - Solidity IDE
  • metamask.io - dApps in browser

Visualising CNN

NLP and psychology

  • bunch.ai - Culture analytics

ctparse

  • parser for natural-language time expressions

spaCy & Prodigy (NLP)

Maximizing failure probability

  • underthink
  • overexpect
  • outsource
  • wire all together

spaCy features

  • generic entity types
  • Prodigy annotation (and modeling) tool
  • "time to first evidence" concept

mobile.de (eBay) personalized recommendations

  • Bayes approach
  • user segmentation

buying journey (market)

Apache Arrow (data pipes)

  • Spark > Apache Hive
  • Arrow eliminates data conversion by providing shared data structure with bindings
  • 200x performance in piping data
  • data pipelines blog https://www.inovex.de/blog/

Spark, Beam, TF (data pipes)

  • feature preparation pipeline: TFX, Kubeflow, TF.Transform
  • Dask is better if one uses only Python

Going Full-stack (product management)

Data systems performance (technical progress)

  • delta encoding, etc.
  • be clear when communicating, no buzzwords
  • define what is desired, e.g. clustering by attributes
  • Daimond.ai
  • prof. Jens Dittrich on YouTube
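
A minimal sketch of delta encoding as noted above: store the first value and then only the differences between neighbours, which are usually small numbers for sorted or slowly changing columns. The function names and the sample timestamps are illustrative assumptions, not taken from the talk.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    return list(accumulate(deltas))


# Hypothetical column of increasing timestamps.
timestamps = [1000, 1003, 1007, 1008, 1015]
encoded = delta_encode(timestamps)
decoded = delta_decode(encoded)
```

The small deltas fit into fewer bits than the raw values, which is where the space saving comes from once a compact integer encoding is applied on top.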

Extending pandas (data handling)

  • df.info()

0.23+

  • ExtensionDtype
  • ExtensionArray

Apache Arrow

  • user defined functions
  • more native types
  • efficient memory, I/O

Numba

  • speeds up for-loops with just a decorator
  • jitclass for data store
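
A minimal sketch of the decorator approach, assuming Numba is installed. The `sum_of_squares` function is a made-up example; the fallback `njit` is only there so the snippet still runs as plain Python when Numba is missing, in which case nothing is accelerated.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, use a no-op decorator.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Plain Python loop; Numba compiles it to machine code on first call."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(10)
```

The first call includes compilation time, so any speed-up only shows on later calls or on large inputs.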

Archer

  • custom data types to avoid py objects
  • similar: cyberpandas, geopandas

Rasa workshop (chatbot)

  • Open Source AI conversational framework
  • very nice
  • just see the presentation

Multi-armed Bandits workshop
