Lots and lots of people!
- List of rules
- Operator (worker), DAG (instructions), Task (job), Connection (credentials), Hooks (common interfaces to external services, Slack Hook), Variables (envs), XComs (small messages between Tasks)
- github.com/karpenkovarya/airflow_for_beginners
- Focus on an ML workflow/pipeline
- MLOps: source control, create environments, use pipelines, CI+CD
- Work with docker images
- ml.azure.com
- bit.ly/PyConDE-mlops
- meta-programming
- DRY or punish your coworkers
- Why not repeat yourself
- decorators
- using class hierarchies
- metaclass
- `__init_subclass__`
- template = dedent(), exec(template, var)
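
A minimal sketch of two of the meta-programming items above: `__init_subclass__` used as a subclass registry, and a `dedent()` template passed to `exec()`. The names `Plugin`, `CsvLoader`, `JsonLoader` and `get_<field>` are made up for illustration and are not from the talk.

```python
from textwrap import dedent


class Plugin:
    """Base class that records every subclass in a registry (hypothetical example)."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Called automatically once for each new subclass; no metaclass needed.
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__.lower()] = cls


class CsvLoader(Plugin):
    pass


class JsonLoader(Plugin):
    pass


# Code generation: fill in a source template, then exec it into a namespace.
template = dedent(
    """
    def get_{name}(record):
        return record["{name}"]
    """
)

namespace = {}
for field in ("id", "title"):
    exec(template.format(name=field), namespace)

record = {"id": 7, "title": "notes"}
print(sorted(Plugin.registry))
print(namespace["get_id"](record), namespace["get_title"](record))
```

The registry avoids repeating a manual "register this class" call in every subclass. `exec` on generated source is the bluntest of the listed tools and is only safe when the template inputs are trusted.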
- predict the missing word using bidirectional context (left and right sides of the sentence)
- pretraining unsupervised + fine-tuning supervised (e.g. POS of words)
- Bert input: word embedding + position in the sentence + segment
- from raw text to tokens: word tokenization
- https://speakerdeck.com/stecklin/why-you-should-not-train-your-own-bert-model-for-different-languages-or-domains
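
A toy sketch of the "word embedding + position + segment" input from the notes above, in plain Python with random vectors. The vocabulary, the dimension `DIM`, the maximum length of 8 and the whole-word tokens are assumptions chosen for illustration; a real BERT model learns these tables and uses its own subword tokenizer.

```python
import random

rng = random.Random(0)
DIM = 4  # toy embedding size (assumed)
vocab = ["[CLS]", "[SEP]", "[MASK]", "the", "cat", "sat"]


def table(rows):
    """Build a random lookup table with `rows` vectors of length DIM."""
    return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(rows)]


token_emb = table(len(vocab))  # one vector per vocabulary entry
position_emb = table(8)        # one vector per position, max length 8 (assumed)
segment_emb = table(2)         # one vector per segment (sentence A / sentence B)


def embed(tokens, segments):
    """Sum token, position and segment vectors element-wise for each token."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        t = token_emb[vocab.index(tok)]
        p = position_emb[pos]
        s = segment_emb[seg]
        vectors.append([a + b + c for a, b, c in zip(t, p, s)])
    return vectors


# The word to predict is replaced by [MASK]; context on both sides stays visible.
tokens = ["[CLS]", "the", "[MASK]", "sat", "[SEP]"]
segments = [0, 0, 0, 0, 0]
inputs = embed(tokens, segments)
print(len(inputs), len(inputs[0]))
```

Because the position vector is part of the sum, the same word at two different positions gets two different input vectors.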
- research environment (jupyter) vs development env
- results inconsistency, features inc...
- pay attention: feature calculation
- you can limit data sources
- features: calculation, test business values
- Use tags to define parts of the process: feature retrieving, engineering, etc
- Notebook(Exploration) -> script
- Looking for data column lineage
- Languages fight in different uses/platforms
- Compiler to webassembly?
- Gradient (saas, CI addon to check experiments)
- sacred (defined in code) + omniboard
- mlflow (in code) enforces local env
- dvc
- data collection(x), management(x), evaluations(), application ()
- use titanic instructive example
- high quality data -> good predictions
- it will not always work
- https://pmbaumgartner.github.io/blog/applied-nlp-lessons/
- Swagger 2.0
- bravado* and pyramid-swagger
- parallel with dask
- save state
- feature names
- ColumnTransformer
- permutation_importance
- Real tests are costly
- eli5
- partial dependence plots
- shap
- Tech debt factory
- Pyqtgraph
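
A from-scratch sketch of the idea behind `permutation_importance` from the notes above: shuffle one feature column at a time and measure how much the model's error grows. This is not scikit-learn's implementation (that one lives in `sklearn.inspection`); the function name `permutation_importance_sketch`, the toy data and the fixed "model" are assumptions for illustration.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over the dataset."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance_sketch(model, X, y, n_repeats=5, seed=0):
    """Return, per column, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(50)]
y = [2.0 * row[0] for row in X]


def model(row):
    """Stand-in for an already trained model (hypothetical)."""
    return 2.0 * row[0]


scores = permutation_importance_sketch(model, X, y)
print(scores)
```

Feature 0 gets a positive score because shuffling it breaks the predictions, while feature 1 scores zero because the model never reads it.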