- [2018-09-04 Tue 23:42] got this far with pytorch gan run: [Epoch 146/200] [Batch 885/938] [D loss: 0.372291] [G loss: 1.665622]
- [2018-10-12 Fri 12:56] note: the generator in a GAN never sees the real images directly – perhaps that makes it slow to train. maybe it should have access to the images? but it seems to converge fairly quickly to an approximation – the problems come when it tries to get the fine details right.
- now this makes me think that perhaps a dialog between G and D about each part of the image might help.
- idea: a training schedule with multiple iterations per image, so G tries to get close to one image at a time
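A minimal sketch of that schedule idea (all names here are hypothetical placeholders, not a real framework API): per batch, one discriminator update followed by several generator updates, so G gets repeated attempts at the same target.

```python
# Sketch: give G several iterations per batch so it can chase one image at
# a time. d_step/g_step/batches are hypothetical stand-ins.

def train(batches, d_step, g_step, g_iters_per_batch=5):
    history = []
    for real in batches:
        d_loss = d_step(real)               # one discriminator update
        for _ in range(g_iters_per_batch):  # several generator updates
            g_loss = g_step()
        history.append((d_loss, g_loss))
    return history

# Dummy step functions just to show the shape of the schedule:
calls = {"d": 0, "g": 0}

def fake_d_step(real):
    calls["d"] += 1
    return 0.5

def fake_g_step():
    calls["g"] += 1
    return 1.5

hist = train([1, 2, 3], fake_d_step, fake_g_step)
print(calls)  # {'d': 3, 'g': 15}
```

With 3 batches and 5 G-iterations each, D runs 3 times and G runs 15 times.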
- [2018-11-16 Fri 06:10] downloaded all pdfs for CS224n NLP course. Also made playlist of youtube lectures.
- [2019-02-06 Wed 09:32] my trial run a couple weeks ago (meet#3?) of the spinningup code for RL went well, on wks. shell history:
  conda install gym
  pip install gym
  conda create -n spinupRL python=3.6
  cd DevAcademics/ReinforcementLearning/spinningup
  pip install -e .
  conda list
  python -m spinup.run ppo --hid [32,32] --env LunarLander-v2 --exp_name installtest --gamma 0.999
  python -m spinup.run plot /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0
  python -m spinup.run test_policy /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0
Machine Learning Playlist - YouTube
- Machine Learning | 160 out of 160 videos
- Average Duration: 13 minutes and 12 seconds
- Total Duration: 1 day, 11 hours, 13 minutes and 33 seconds
use this material to start on autoencoders, via TDLS slack channel: Ehsan [7 hours ago] Just for the channel I copy my answer here as well: There is an abundance of work on autoencoders … most of my knowledge comes from reading articles. Here we go:
- This paper is a must for VAEs: https://arxiv.org/abs/1312.6114
- But read this one first: http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf
- This is a great review: http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf
- You will particularly like this one: https://arxiv.org/pdf/1502.04156.pdf
"""Grab a cookie."""
import numpy as np
import matplotlib.pyplot as plt
z = np.linspace(-1,3, num=10000)
binary = z <= 0
hinge = np.maximum(0,1 -z)
quad = (z - 1) ** 2
logistic = np.log2(1 + np.exp(-z))
for name, loss in dict(binary=binary, hinge=hinge, logistic=logistic, quadractic=quad).items():
plt.plot(z, loss, label=name)
plt.legend(loc="best")
plt.tight_layout()
outfile = "losses.png"
plt.savefig(outfile, dpi=200, bbox_inches="tight")
print(outfile)
plt.show()
didn’t write down which one; could be any of:
- Learning by Abstraction: The Neural State Machine – I think this one
- Learning Neurosymbolic Generative Models via Program Synthesis
- Neuro-Symbolic Program Synthesis
MichaelMMeskhi/MtL-Progress-github.io: Repository to track the progress in Meta-Learning (MtL), including the datasets and the current state-of-the-art for the most common MtL [2019-08-24 Sat 12:00]
- All the best big data tools and how to use them - Import.io
- Jupyter Notebook for beginners - most powerful tips - YouTube
- github desktop: all awesome lists, data science/ML repos
- academiclog tag
- both docker tech + data science tooling.
- container services need ad-hoc swarming to get things working together. otherwise just make one big container with everything, for faster prototyping and turnaround.
- perhaps first one big box, and then develop swarms as proficiency increases and as targets are clearer.
- speed, experimentation first.
- data-sci, emacs
- replicate the workflow described here: emacs + ipython workflow
- emacs Academic: {0/4} want emacs dev boxes 1st to test out on.
- Container DevOps
- also link sets in chrome
- python guide from D Kinghorn
- drivendata/cookiecutter-data-science: A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Using IPython/Jupyter Notebook with PyCharm - Help | PyCharm Installing, Uninstalling and Upgrading Packages - Help | PyCharm
- installing packages: option to install to the user’s site-packages directory
- (on winwk, Will\AppData\Roaming\Python)
- did this for jupyter, matplotlib, sympy. as per tutorial (using dummy project)
- many other dep packages also installed
- jupyter is a metapackage: installs all the jupyter components
- jupyter install had an error: needs MS Visual C++ 10.0. can install that, or use pip --user in a shell
- same error.
- seems this can all be done outside pycharm, and then just select in projects preferences.
jupyter nbconvert notebook.ipynb --to markdown
pandoc notebook.md -o notebook.org
Widgets: Building Interactive Dashboards with Jupyter Project Jupyter | Widgets Interactive Visualizations In Jupyter Notebook – Towards Data Science
Connect to an existing kernel · Issue #2044 · jupyterlab/jupyterlab Initial server management implementation by lucbouchard1 · Pull Request #71 · jupyterlab/jupyterlab_app jwkvam/jupyterlab_vim: Vim notebook cell bindings for JupyterLab
- tensorflow
- the conda package is not maintained by the tensorflow team themselves
- conda env > virtualenv > pip > conda (docker another use case)
- docker for GPU recommended
- [X] update conda / anaconda
- pip3 requirements empty, all in pip, 322 total
- conda list - 574 total
- several are duplicates with pip installs
- latest conda root env export to yaml file: 358 in conda, 98 in pip
PackagesNotFoundError: The following packages are not available from current channels:
- r-r6==2.2.2=r3.4.1_0
- r-tibble==1.4.2=r3.4.1_0
- ca-certificates==2018.4.16=0
- r-bindr==0.1.1=r3.4.1_0
- zope.interface==4.5.0=py36h470a237_0
- yellowbrick==0.7=py36_1
- r-dbi==1.0.0=r341_0
- r-utf8==1.1.3=r3.4.1_0
- rpy2==2.9.3=py36r3.4.1_0
- jupyterlab==0.33.12=py36_0
- pcre==8.39=0
- constantly==15.1.0=py_0
- r-bit64==0.9_5=r3.4.1_0
- pytest-runner==4.2=py_0
- r-rlang==0.2.0=r3.4.1_0
- r-crayon==1.3.4=r3.4.1_0
- incremental==17.5.0=py_0
- readline==7.0=0
- r-git2r==0.21.0=r341h0c37787_0
- r-dbplyr==1.2.1=r341_0
- r-glue==1.2.0=r3.4.1_0
- json-rpc==1.10.3=py36_0
- r-blob==1.1.1=r3.4.1_0
- r-purrr==0.2.4=r3.4.1_0
- r-pillar==1.2.2=r341_0
- pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
- torchvision==0.2.1=py36_1
- protobuf==3.5.2=py36_0
- r-dplyr==0.7.4=r3.4.1_0
- r-base==3.4.1=3
- r-rcpp==0.12.15=r3.4.1_0
- hyperlink==17.3.1=py_0
- r-digest==0.6.15=r3.4.1_0
- onnx==1.1.2=py36h0c63530_0
- pyasn1-modules==0.2.1=py_0
- tzlocal==1.5.1=py_0
- libedit==3.1.20170329=0
- cssselect==1.0.3=py_0
- r-cli==1.0.0=r3.4.1_0
- r-tidyselect==0.2.4=r3.4.1_0
- yapf==0.22.0=py_0
- libprotobuf==3.5.2=0
- pyasn1==0.4.3=py_0
- r-magrittr==1.5=r3.4.1_0
- r-rsqlite==2.0=r3.4.1_0
- xgboost==0.72=py36_0
- service_identity==17.0.0=py_0
- r-prettyunits==1.0.2=r3.4.1_0
- r-bh==1.66.0_1=r3.4.1_0
- did conda install -c r r-git2r to see if that helps -> nope
- conda install pytorch torchvision -c pytorch
- previous install was the GPU version; won’t work on mac
- conda install -c districtdatalabs yellowbrick
- r-dbplyr==1.2.1=r341_0
- r-pillar==1.2.2=r341_0
- torchvision==0.2.1=py36_1
- pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
- yellowbrick==0.7=py36_1
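The PackagesNotFoundError above is typically because the exported specs pin exact build strings (the trailing `=py36_0`-style part), which don't exist on another platform or channel; if I remember right, `conda env export --no-builds` avoids this at export time. A stdlib-only sketch of doing the same cleanup on an already-exported pin list (`strip_build` is a hypothetical helper, not part of conda):

```python
# Strip conda build strings so an exported environment can resolve elsewhere:
# "name==version=build" -> "name==version"; leave "name==version" untouched.

def strip_build(spec):
    if spec.count("=") > 2:            # has a build suffix after the version
        return spec.rsplit("=", 1)[0]  # drop everything after the last "="
    return spec

pins = [
    "r-r6==2.2.2=r3.4.1_0",
    "pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1",
    "numpy==1.15.0",
]
print([strip_build(p) for p in pins])
# ['r-r6==2.2.2', 'pytorch==0.4.1', 'numpy==1.15.0']
```

This only relaxes the build pin; version pins may still need loosening across platforms.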
- py-spy sampling profiler benfred/py-spy: Sampling profiler for Python programs
- sympy, symbolic
Pweave is good for creating reports, tutorials, presentations etc. with embedded Python code. It can also be used to make websites together with e.g. Sphinx or rest2web.
- Numba: HPC Python library built on LLVM (LLVM is used in other libs too). Numba: High-Performance Python with CUDA Acceleration | Hacker News; Numba: High-Performance Python with CUDA Acceleration | Parallel Forall
Another good library arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library. arrayfire/arrayfire: ArrayFire: a general purpose GPU library.
- cuda is a target, but can compile to others numba/numba: NumPy aware dynamic Python compiler using LLVM
- cupy/cupy: NumPy-like API accelerated with CUDA
- CuPy is an implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of numpy.ndarray interface.
- like a smaller version of above inducer/pycuda: CUDA integration for Python, plus shiny features
- cuda: grids contain blocks, blocks contain threads
- threadIdx is 3D
- the thread does the execution
- the block helps with indexing ?
- blocks should execute independently; threads within a block share memory
- cuda kernels are C/C++ code with additional syntax, most importantly __global__ for identifying the kernel function, and the <<<…>>> syntax for specifying grid size and block size
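A plain-Python sketch (hypothetical names, serial execution) of how the grid/block/thread hierarchy above turns into a flat array index – the same blockIdx * blockDim + threadIdx arithmetic a real 1D CUDA kernel does:

```python
# Serially emulate a 1D CUDA kernel launch: grid_dim blocks, each running
# block_dim threads; a thread's global index is
#   global_idx = block_idx * block_dim + thread_idx

def launch_kernel(kernel, grid_dim, block_dim, *args):
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def double_kernel(block_idx, block_dim, thread_idx, data):
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(data):                       # bounds guard, as real kernels need
        data[i] *= 2

data = list(range(10))
launch_kernel(double_kernel, 3, 4, data)  # 3 blocks x 4 threads = 12 threads cover 10 items
print(data)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The guard matters because the grid is usually rounded up past the data size, exactly as in the 12-threads-for-10-items launch here.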
- for career, real world use
- this is a major goal for all tooling
- want large scale as well
- much of the pipeline automated such that only some selection is needed
- any and all tools that simplify any of the process
- will use different cloud platform than for internet facing.
- security is more an issue there. the ML cluster will be more isolated
- also all the current tools are likely less secure anyways
NEW LAUNCH! Integrating Amazon SageMaker into your Enterprise - MCL34… Machine Learning Models & Algorithms | Amazon SageMaker on AWS
Research Blog: Facets: An Open Source Visualization Tool for Machine Learning Training Data PAIR-code/facets: Visualizations for machine learning datasets
- Solved: Automatically cleaning your data - Microsoft Power BI Community
- Making data cleaning simple with the Sparkling.data library
- https://namara.io/#/ - some signup
- IBM BDU Labs | My Data
- OpenRefine/OpenRefine: OpenRefine is a free, open source power tool for working with messy data and improving it
- data prepping II [2018-08-25 Sat 16:27]
- best data cleaning munging tools - Google Search
- Seven Free Data Wrangling Tools
- What are the best data cleansing tools? - Quora
- What are the best resources to learn data wrangling (data cleaning)? - Quora
- What are the best languages and libraries for cleaning data? - Quora
- 7 Steps to Mastering Data Preparation with Python
- Janitor, a good R package for data cleaning – SWIMMING IN THE DATA LAKE – Medium
How’s Julia language (MIT) for ML? : MachineLearning Julia vs. Python: Julia language rises for data science | InfoWorld
JuliaEditorSupport JuliaCon 2018 | Making the test-debug cycle more efficient | Tim Holy - YouTube JuliaCon 2018 | Tools for making program analysis and debugging manageable | Jameson Nash - YouTube JuliaCon 2018 | Cassette: Dynamic, Context-Specific Compiler Pass Injection for Julia | J Revels - YouTube
DeepLearningFrameworks/Knet_CNN.ipynb at master · ilkarman/DeepLearningFrameworks TIOBE Index | TIOBE - The Software Quality Company Julia and “deep learning” : Julia
TensorFlow.jl/why_julia.md at master · malmaud/TensorFlow.jl,
- whole DatSci pipeline on GPU, lots of big names, easy scale out, python integration.
- rapidsai/cudf: cuDF - GPU DataFrame Library
- rapidsai/cuml: cuML - RAPIDS Machine Learning Library
- Other projects
- RAPIDS + BLAZINGSQL
- RAPIDS + DASK
- RAPIDS + XGBOOST
- RAPIDS + SPARK
OpenML Home [2019-08-17 Sat 09:48]
OpenML — OpenML 0.10.0 documentation
Democratizing Machine Learning As machine learning is enhancing our ability to understand nature and build a better future, it is crucial that we make it transparent and easily accessible to everyone in research, education and industry. The Open Machine Learning project is an inclusive movement to build an open, organized, online ecosystem for machine learning. We build open source tools to discover (and share) open data from any domain, easily draw them into your favourite machine learning environments, quickly build models alongside (and together with) thousands of other data scientists, analyse your results against the state of the art, and even get automatic advice on how to build better models. Stand on the shoulders of giants and make the world a better place.
- cookbook first to study, and with awesomelist to organize these link dumps
- A Kretz
- [ ] Dat Sci Cookbook: local repo DatSci Cookbook Repo [2019-08-21 Wed]
- pdf and code examples
- Plumbers of Data Science - YouTube - Kretz, cookbook author
- awesome datsci https://github.com/EthicalML/awesome-production-machine-learning
- links in academiclog: big browser dump – go thru ML experiment frameworks
- prior notes [2019-08-15 Thu]
- when to use:
- dask
- mlflow
- polyaxon - seems more related to kubernetes, managing in production on clusters
- DVC - like git-lfs + makefiles
- [ ] use with hservers and store big files on them?
- issue
- large files in a project folder that will need to be kept separate somehow
- big data
- big models
- FGLab (Kaixhin): only 3 people, a smaller project
- MLflow, Sacred, FGLab, Polyaxon are alternatives (competitors).
- h2o, datarobot are also alternatives
- kubeflow complements them; can run the others on top of it
- DVC: a complement?
- sagemaker, airflow, glue go together
- airflow can be used to build pipelines that run on kubernetes
- tutorial vids: google, mlflow, machine-learning-yearning, etc
- [ ] model / data parallel example
- Manifold company is an example boutique biz ? emulate
- NullConvergence/torch_temp: A(nother) Pytorch experimental template - uses sacred
- victoresque/pytorch-template: PyTorch deep learning projects made easy.
- ml-tooling/ml-project-template: ML project template facilitating both research and production phases.
- from ml-tooling Berlin group
- research & production
- MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture.
- williamFalcon/pytorch-lightning: The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate
- my clones
- distillpub/template: This is the repository for the distill web framework
- h5bp/html5-boilerplate: A professional front-end template for building fast, robust, and adaptable web apps or sites.
- how do you setup your ml pipeline? : MachineLearning
- medium - How do you manage your Machine Learning Experiments?
- How experiment management can improve the ROI of your machine learning projects
- How to Work With Stakeholders as a Data Scientist - Towards Data Science
- Reproducible model training: deep dive - Towards Data Science
- Slurm Workload Manager - Wikipedia
- How do you manage your machine learning experiments? : MachineLearning
- Discussion How do you manage and keep track of your experiments? : MachineLearning
- Best way to manage ML experiments : MachineLearning
- How do you keep track of your experiment results? : MachineLearning
- What tools are used in practice to schedule training jobs, annotate datasets, keep track of past experiments… ? : MachineLearning
- using git for deep learning experiments - Google Search
- Compare to other ML e2e platforms · Issue #58 · mlflow/mlflow
- What are the current open source alternatives to MLflow? | Hacker News
- Towards Reproducible Research with PyTorch Hub | Hacker News
- Home - Guild AI
- Verta Enterprise Runthrough - YouTube
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction – Google AI
- Weights & Biases
- Forge, or how do you manage your machine learning experiments?
- neptune.ml: Experiment management tool that fits any workflow
- fast.ai · Making neural nets uncool again
- gems-uff/noworkflow
- Supporting infrastructure to run scientific experiments without a scientific workflow management system.
- for general python script experiments
- github group from Berlin, 8 repos Machine Learning Tooling
- IDSIA/sacred: Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
- MIC-DKFZ/trixi
- Manage your machine learning experiments with trixi - modular, reproducible, high fashion. An experiment infrastructure optimized for PyTorch, but flexible enough to work for your framework and your tastes.
- trixi/pytorch_experiment.ipynb at master · MIC-DKFZ/trixi
- seba-1511/randopt: Streamlined machine learning experiment management.
- kubeflow/pipelines: Machine Learning Pipelines for Kubeflow
- TRAINS (fewer stars)
- TRAINS: An open-source, zero-integration tool to boost machine learning research
- allegroai/trains: TRAINS - Auto-Magical Experiment Manager & Version Control for AI
- allegroai/trains-server: TRAINS Server - Auto-Magical Experiment Manager & Version Control for AI
- trains/brief.md at master · allegroai/trains
- Allegro.ai - Deep Learning Computer Vision Platform
- trains - Allegro.AI
- Home - Guild AI
- Comet.ml | Supercharging Machine Learning
- mlflow/mlflow: Open source platform for the machine learning lifecycle
- Weights & Biases
- kubeflow/kubeflow: Machine Learning Toolkit for Kubernetes
- seba-1511/randopt: Streamlined machine learning experiment management.
- richardliaw/track: Track your ML project!
- kubeflow mlflow - Google Search
Github Search · machine learning project - useful looking stuff
- What’s your favorite logger? : MachineLearning
- How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform - Databricks
- Kaixhin/FGLab: Future Gadget Laboratory
- Semantic Versioning 2.0.0 | Semantic Versioning
- danielwaterworth/metricmachine: Simple flask app for displaying live timeseries data
- polyaxon/polyaxon: A platform for reproducible and scalable machine learning and deep learning on kubernetes
- Pachyderm - Scalable, Reproducible Data Science
- Controlled Experiments in Machine Learning
- rquintino (Rui Quintino)
How to check if an object is a generator object in python? - Stack Overflow Why can’t python module dill pickle the generator function? - Stack Overflow pickle iterators and generators · Issue #10 · uqfoundation/dill python - Why can’t generators be pickled? - Stack Overflow Issue 1092962: Make Generators Pickle-able - Python tracker Where does a generator store it’s values? : Python Automatically remove generator object from memory at StopIteration (Python) - Stack Overflow Python multiprocessing PicklingError: Can’t pickle <type ‘function’> - Stack Overflow UsingPickle - Python Wiki Change Fork Name For Github - Stack Overflow
Jenkins (software) - Wikipedia Category:Build automation - Wikipedia Continuous integration - Wikipedia Continuous Integration. CircleCI vs Travis CI vs Jenkins - By Django Stars Continuous delivery - Wikipedia Continuous deployment - Wikipedia Comparison of continuous integration software - Wikipedia reddit.com: search results - continuous deployment magit-circleci: See the latest CircleCI builds from the Magit status buffer. : emacs
rmuslimov/jenkins.el: Jenkins plugin for emacs kljohann/mpv.el: control mpv for easy note taking Jenkins Is Getting Old | Hacker News Product Vision - CI/CD | GitLab Fun with Gitlab CI - VADOSWARE
Can’t see the forks of a project on GitHub when “Too many forks to display” is shown - Web Applications Stack Exchange Intuitive way to view most active fork in GitHub - Stack Overflow Popular github Forks GitPop2: Find the most popular fork on GitHub Active GitHub Forks Enhanced GitHub - Chrome Web Store
How to write a production-level code in Data Science? Refactoring Python Code for Machine Learning Projects. Python “Spaghetti Code” Everywhere!
How to Write Beautiful Python Code With PEP 8 – Real Python How to write a production-level code in Data Science? styleguide | Style guides for Google-originated open-source projects Coding Style Guidelines — Pylearn 0.1 documentation
Packaging Python Projects — Python Packaging User Guide Making a PyPI-friendly README — Python Packaging User Guide Minimal Structure — Python Packaging Tutorial Over 10% of Python Packages on PyPI are Distributed Without Any License | Snyk Choose an open source license | Choose a License Licenses | Choose a License TLDRLegal - Software Licenses Explained in Plain English
A template to make good README.md template-python/README.md at master · jacebrowning/template-python
Where do you keep your files? : emacs Rational ClearCase - Wikipedia
D What’s your favorite logger? : MachineLearning D How do you manage your machine learning experiments? : MachineLearning Discussion How do you manage and keep track of your experiments? : MachineLearning D Best way to manage ML experiements : MachineLearning D How do you keep track of your experiment results? : MachineLearning D What tools are used in practice to schedule training jobs, annotate datasets, keep track of past experiments… ? : MachineLearning
Home - Guild AI guildai/guildai: Open source experiment tracking and optimization for machine learning Comet.ml | Supercharging Machine Learning mlflow/mlflow: Open source platform for the machine learning lifecycle Tutorial — MLflow 1.2.0 documentation Weights & Biases kubeflow/kubeflow: Machine Learning Toolkit for Kubernetes Kubeflow | Kubeflow seba-1511/randopt: Streamlined machine learning experiment management. richardliaw/track: Track your ML project!
Introducing MLflow: an Open Source Platform for the Complete Machine Learning Lifecycle How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform - Databricks What are the current open source alternatives to MLflow? | Hacker News
FGLab: Machine Learning Dashboard Kaixhin/FGLab: Future Gadget Laboratory
Semantic Versioning 2.0.0 | Semantic Versioning danielwaterworth/metricmachine: Simple flask app for displaying live timeseries data polyaxon/polyaxon: A platform for reproducible and scalable machine learning and deep learning on kubernetes Pachyderm - Scalable, Reproducible Data Science mlflow sacred weights and biases - Google Search Compare to other ML e2e platforms · Issue #58 · mlflow/mlflow Controlled Experiments in Machine Learning rquintino (Rui Quintino) Towards Reproducible Research with PyTorch Hub | Hacker News Tutorial — Airflow Documentation
TRAINS: An open-source, zero-integration tool to boost machine learning research allegroai/trains: TRAINS - Auto-Magical Experiment Manager & Version Control for AI allegroai/trains-server: TRAINS Server - Auto-Magical Experiment Manager & Version Control for AI trains/brief.md at master · allegroai/trains Allegro.ai - Deep Learning Computer Vision Platform trains - Allegro.AI
Home - Cookiecutter Data Science drivendata/cookiecutter-data-science: A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. manifoldai/docker-cookiecutter-data-science: A fork of the cookiecutter-data-science leveraging Docker for local development. An AI Engineering Services Firm | Manifold
Machine Learning Version Control System · DVC Data Version Control - Machine Learning Time Travel - YouTube iterative/dvc: 🦉Data Version Control | Git for Data & Models
Workflow management system - Wikipedia
github - How do Git LFS and git-annex differ? - Stack Overflow
D How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs : MachineLearning importlib — The implementation of import — Python 3.7.4 documentation pytorch-template/README.md at master · victoresque/pytorch-template MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture. toolkit/pytorch_project_template at master · gmum/toolkit
Rules of Machine Learning: | ML Universal Guides | Google Developers Machine Learning Crash Course | Google Developers Introduction to Machine Learning | Machine Learning Crash Course | Google Developers Introducing ML - YouTube Education – Google AI
excellent reddit discussion {D} How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs : MachineLearning
[2019-08-15 Thu 13:21]
r/MachineLearning
Posted by u/__Julia . 9 months ago Archived
Hi, In the data science community, I have seen a wide adoption of this project structure https://github.com/drivendata/cookiecutter-data-science. However, I am still struggling to find a unified way to structure ML experiments that save readers time to understand the structure of the project.
- snaf77 38 points · 9 months ago
- Yes, my team has (somewhat). I have written a couple of medium articles on that: https://medium.com/@mbednarski
- Some of our rules of thumb:
- Notebooks are allowed only for private projects, we do not even commit them.
- All steps need to be reproducible from raw data to trained models (we use gnu make)
- Teams need to know more about Python than just the basics - this allows them to write better code.
- git flow
- strict package versioning - we use pip-tools
- swagger for api
- It is a regular software project; ML does not allow “shortcuts because >>science<<”. So all principles like DRY, KISS apply. Exception is when performance is an issue (but it should be profiled first)
- (unit) test where possible - e.g. data loaders, preprocessors etc
- Well defined entry points (one common cli is better than bunch of scripts)
- DOCUMENTATION for things that are not obvious from reading the code
- I prefer to keep as much configuration (batch size, optimizer, etc) in json files, but not everyone likes it (i like to have mapping: json config -> results dir)
- AllenNLP is a good inspiration for me
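The "json config -> results dir" mapping mentioned in the rules above can be sketched in a few stdlib lines (the hashing scheme is my own assumption, not the commenter's):

```python
# Map an experiment's JSON config to a deterministic results directory,
# so each config corresponds to exactly one output location.
import hashlib
import json

def results_dir_for(config, root="results"):
    # Canonical JSON (sorted keys) so the same config always maps to the
    # same directory name, regardless of key order.
    blob = json.dumps(config, sort_keys=True).encode()
    return f"{root}/{hashlib.sha1(blob).hexdigest()[:10]}"

cfg = {"batch_size": 32, "optimizer": "adam", "lr": 3e-4}
print(results_dir_for(cfg))  # results/<10-hex-char digest of the config>
```

Because the digest is derived from the config content, re-running an experiment with the same JSON lands in the same directory, and any change to a hyperparameter gets a fresh one.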
- LiberalSexist 2 points · 9 months ago
- Just read the first part of your structured ML series and found plenty of great ideas applicable for data-science projects in general.
- “AllenNLP is a good inspiration for me” – What work/code of AllenNLP do you mean in particular?
- BatJedi121 1 point 8 months ago
- I personally like how almost everything is configurable through JSON files. A lot of the boilerplate (vocab, masking sequences, LSTMs-to-vectors / LSTMs-to-multiple-outputs, typical attention types) is handled in really simple ways, as is setting up the training/val loop, collecting metrics, and serializing/loading models.
- I think getting to AllenNLP level for projects is overkill - but I think future libraries should definitely follow their design principles. take the boilerplate out.
- mate_classic 13 points · 9 months ago
- I was thinking about the same recently. Right now I’m trying to refactor my code to resemble this template here: https://github.com/victoresque/pytorch-template/blob/master/README.md
- thatguydr 1 point · 9 months ago
- I like the OP’s link a lot better for the top-down structure, as it separates all of the code into a single spot that can be committed easily. I like your link because it separates the model classes, the loader classes, the trainer class(es), and the utility classes (though I’d put the abstracts in that spot).
- What I don’t like in either is the rather cavalier treatment of the reporting/evaluation. If you could tie the evaluation and the original config in with the code, all of that could be committed, provided you make a low-memory evaluation format and not dozens of plots. (Maybe throw the requirements in with them.) That’d be a very clean solution.
- mate_classic 1 point 9 months ago
- I didn’t really think about evaluation data. Maybe because I’m working only with generative models right now, where evaluation is looking for the most beautiful picture most of the time.
- [deleted] 1 point · 8 months ago
- I really like this template. Gonna start using this.
- ranihorev 8 points · 9 months ago
- The biggest challenge for me is how to do the transition from the notebook to production fast and smooth
- srossi93 26 points · 9 months ago
- Simple, never use notebooks! Notebook is a great tool for visualization, simple experiments, debugging, but as soon as the lines of code are >50, I immediately switch to a more structured organization. BTW, PyTorch is super OO and it’s super easy to derive, inherit, extend functionalities!
- tidier 2 points 9 months ago
- Never might be a bit strong, but I absolutely agree with everything else. Notebooks are excellent for experimentation, but I aggressively shift stuff to Python files and use importlib.reload.
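The notebook-to-module loop tidier describes (keep logic in a .py file, pick up edits with importlib.reload) can be shown self-contained; the scratch module name and file here are hypothetical:

```python
# Write a tiny module, import it, "edit" the file, then reload to pick up
# the change -- the loop you'd run from a notebook instead of pasting code.
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # always recompile from source on reload

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "scratch_mod.py").write_text("ANSWER = 1\n")
sys.path.insert(0, str(tmp))    # make the scratch module importable

import scratch_mod              # noqa: E402 (import after path tweak)
print(scratch_mod.ANSWER)       # 1

(tmp / "scratch_mod.py").write_text("ANSWER = 2  # edited\n")
importlib.reload(scratch_mod)   # re-executes the module in place
print(scratch_mod.ANSWER)       # 2
```

One caveat: reload re-executes the module, but objects created from the old class definitions keep the old code, which is part of why people eventually move fully out of the notebook.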
- ranihorev 1 point 9 months ago
- I completely agree. The tricky part is to identify the point in which the experiment is done…
- JanneJM 2 points 9 months ago
- You can run external programs from the notebook though. One benefit of doing that is that you could have a self-documenting pipeline, with the final (or just preview) results inline with the commands running the model, glue logic and so on.
- MoreDonuts 1 point 9 months ago
- And compose.
- gionnelles 1 point 9 months ago
- My team follows this exclusively.
- pickwickdick 3 points · 9 months ago
- I structure my code using the following conventions:
  1. I keep the model and the train/eval logic separate.
  2. I have a ParamParser class in a utils folder that takes in a JSON file as input and exposes all keys as member variables.
  3. In my model.py file I define the loss function and accuracy and expose them via a metrics dictionary. Now in train.py I can simply call metrics['accuracy'](out, label) to compute the accuracy (or loss).
- Hopefully, this helps answer your questions OP.
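A stdlib-only sketch of the ParamParser and metrics-dictionary conventions from that comment (ParamParser and the metrics dict are the commenter's names; this implementation is my guess at them, with toy metric functions):

```python
import json
import tempfile

class ParamParser:
    """Load a JSON config and expose each key as an attribute."""
    def __init__(self, json_path):
        with open(json_path) as f:
            for key, value in json.load(f).items():
                setattr(self, key, value)

# The metrics-dict convention: train.py only needs the dictionary, not the
# individual functions, so it can call metrics['accuracy'](out, label).
def accuracy(out, label):
    correct = sum(1 for o, t in zip(out, label) if o == t)
    return correct / len(label)

def loss(out, label):
    return 1.0 - accuracy(out, label)  # toy 0/1 loss as a stand-in

metrics = {"accuracy": accuracy, "loss": loss}

# Demo with a throwaway JSON config file:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"batch_size": 32, "lr": 0.001}, f)
params = ParamParser(f.name)
print(params.batch_size, params.lr)                     # 32 0.001
print(metrics["accuracy"]([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

The point of the dict is indirection: model code decides what to measure, and the training loop stays generic.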
ekshaks 3 points · 9 months ago Dealing with tensor shapes and documenting them for others is a pervasive problem. I use shape annotations using the tsalib library to document shapes throughout the data and model pipeline. https://github.com/ofnote/tsalib
mentatf 2 points · 9 months ago Using scikit learn guidelines and skorch that fits perfectly with that. ( https://github.com/dnouri/skorch)
RoastDepreciation 1 point · 9 months ago
- Cookiecutter is an excellent starting point. Adapt to your team’s needs.
- Not committing notebooks is simply not an option. They’re here to stay. Instead commit notebooks without output and store regular html exports in a separate reports root folder for future reference and reporting in the team.
katyngate 0 points · 9 months ago Here’s one possible way: https://github.com/gmum/toolkit/tree/master/pytorch_project_template
- skimming through:
- they should have an api for notebooks
- slides on autoML /home/will/Downloads/Chicago ML - Applied Engineering Workshop.pdf
Hi @Mohammedi Haroune and welcome! By default, Guild inspects the script you want to run (or the main module specified in the Guild file for the operation) and checks for the use of argparse. If the script uses argparse, Guild runs the script with the --help option and uses that dry run to inspect the available arguments, using those as the flag definitions (see note concerning magic below). If the script does not use argparse, Guild checks for global variable assignments of numbers and strings and uses those as flags. With that information (either from argparse or globals) it lets the user redefine flag values using NAME=VALUE on the command line. Before it runs the operation, Guild prints the flag values as a preview. You can also see what Guild is importing by running guild help or guild run SCRIPT_OR_OPERATION --help-op.
If the flags come from argparse, Guild passes those as command-line options to the script. If defined as globals, Guild sets the global values to the user-provided values by dynamically modifying the module AST as it's loaded. This is all a bit magical and everyone reading this should feel a little uneasy at this point 🙂 The reason for all this implicit logic is to let you pick up a script and just run it - Guild captures the experiment as expected. In most cases this magic just works and everyone's happy. But when it doesn't, it's mysterious and frustrating!
The good news is that all of this behavior can be strictly controlled - and even disabled altogether - with a few lines in a Guild file (a file named guild.yml in your project directory). In the Guild file, you can provide explicit information about the flags for an operation as well as how the flags are set. This scheme is quite under-documented atm. For a fairly exhaustive list of examples on this topic, see: https://github.com/guildai/guildai/tree/master/guild/tests/samples/projects/flags You can change to that directory (after cloning the repo) and run each example to see the behavior.
Of course that's an exercise for the uber curious 🙂 If you want to accomplish something and it's not falling into place for you - please just post your question here and someone can help!
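A minimal sketch of the globals-as-flags behavior described above. The script and flag names here are hypothetical, not from any real project; the idea is that Guild would import the global number/string assignments as flags, so you could run e.g. guild run train.py lr=0.01:

```python
# train.py -- hypothetical script. Guild imports the global number and
# string assignments below as flags, overridable from the command line.
lr = 0.1           # numeric global -> imported as a flag
epochs = 5         # numeric global -> imported as a flag
optimizer = "sgd"  # string global -> imported as a flag

def train():
    # stand-in for real training: just report the effective flag values
    return {"lr": lr, "epochs": epochs, "optimizer": optimizer}

if __name__ == "__main__":
    print(train())
```

When Guild rewrites the globals via the module AST, train() simply sees the new values; the script itself needs no argparse boilerplate.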
I saw that any value output in the form key: value will be captured by guild. Is there any other way to explicitly tell guild to capture this, or is outputting to stdout the only way as of now? (edited)
Garrett 9:56 AM @Abhinv Ramesh Kashyap The short answer is yes, definitely - Guild happily reads any generated TF event files. By default Guild parses your script output for patterns KEY: NUMBER as you observed. However you can control that behavior using a Guild file. Here's an example that steps you through the concepts and shows you how to configure an operation for both modified parsing behavior and also how to disable the parsing altogether when you just want to log values directly. https://github.com/guildai/examples/tree/master/custom-scalars (edited)
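A sketch of the default output capture described above: each stdout line matching the KEY: NUMBER pattern would be picked up as a scalar. The function name is made up for illustration:

```python
# hypothetical training loop whose stdout matches Guild's default
# KEY: NUMBER scalar pattern, so each printed line is captured as a scalar
def log_metrics(step, loss, acc):
    lines = [f"step: {step}", f"loss: {loss:.4f}", f"acc: {acc:.4f}"]
    for line in lines:
        print(line)  # e.g. "loss: 0.2500" -> captured as scalar "loss"
    return lines

log_metrics(1, 0.25, 0.9)
```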
@here I've created a new repo that we can use to work on/communicate issue resolution: https://github.com/guildai/issues Sometimes (often) it's handy to systematically reproduce a bug/issue and be able to quickly re-run steps against new releases to confirm expected behavior. Our first example is related to source code copies - an important topic for many Guild users. If you're interested in how Guild decides what files to save as source code, this https://github.com/guildai/issues/tree/master/issue-39 is a step-by-step walkthrough. (edited) 0.6.6rc2 is available for pre-release eval. I snuck in a pretty cool feature that I'd love to get some feedback on. Now when you run guild tensorboard, Guild will prepare TensorBoard HParam summaries in the background so you can compare run hyperparams and metrics in the HParams tab. This is a really nice feature offered by TensorBoard!
Garrett Sep 3rd at 7:09 AM The gist of the question is related to a common pipeline: prepare data from some raw source, engineer features on the prepared data (a second stage of data prep), train a model, validate a model. In a Guild file, each of these stages is defined as a separate operation. The operations are related to one another through resource dependencies. The first operation will depend on the raw data. Subsequent operations will depend on their upstream operations. Something like this:
prepare-data:
  requires:
    - file: data.csv
add-features:
  requires:
    - operation: prepare-data
train:
  requires:
    - operation: add-features
validate:
  requires:
    - operation: train
It might make sense for some of these operations to be melded into one. E.g. prepare-data and add-features could be one operation (i.e. roll the feature engineering work into the data prep script). Or train and validate could be one (validate as a part of the training script - this is very common). The triggers for creating a separate operation (e.g. split up raw data prep and feature engineering) are:
- Does the operation take long? (a subjective term - but usually you know it when you see it) - If yes, consider creating a separate operation to simply avoid having to re-run the operation when you can re-use artifacts as a dependency.
- Could the operation potentially be run multiple times, each time with different hyperparameters or inputs (flag values) for a given set of upstream dependencies? For example, for validation, it's common to validate against new data sets as new labeled examples become available. You probably don't want to retrain a model just to revalidate with new data. In this case, validate should be a separate operation. (edited)
María Benavente 6 days ago awesome! let’s say, for example, that add-features accesses as well the data file, would it be necessary to set the requirement also for that operation? (edited)
Garrett 6 days ago Yes, indeed it would! You could get to data by way of the prepare-data operation but this is not a good idea - and arguably Guild should treat that as an error (or warn you). You should instead list data as a required resource for add-features. This is where defining your resources in separate named sections is a good idea. Then you can simply reference the resource by name and not have to redefine it every time it's needed. To define a named resource, you need to define a model. For example:
- model: my-model
  resources:
    raw-data:
      sources:
        - file: data-1.csv
        - file: data-2.csv
    prepared-data:
      sources:
        - operation: prepare-data
  operations:
    prepare-data:
      requires: raw-data
    add-features:
      requires:
        - raw-data
        - prepared-data
  …
Note that I went ahead and defined a prepared-data resource. Even if a resource is only used once, I think it's nice to define named resources as it keeps the operation requires config simple and readable. (edited)
Garrett 6 days ago Note that I edited the example above to include a sources attr under each resource. Guild requires this atm. (I’m actually going to fix this right now to make sources optional - for now you have to use it.)
María Benavente 6 days ago wow, okey! that’s really clean
María Benavente 6 days ago and in order to keep those files out of the source code copy (sourcecode / exclude), would it be possible to reference it also that way? Example:
sourcecode:
  - exclude: raw-data
Garrett 6 days ago Yes but you have to spell that as - exclude: raw-data/*
Garrett 6 days ago I don’t really like that requirement - I’ll look into fixing that so you can just list the directory there. 👍 1
Garrett 6 days ago But for now, use the glob pattern.
María Benavente 6 days ago alright
María Benavente 6 days ago I'm finding a behavior quite confusing that I'm running into as a result of requiring specific files into each operation:
- model: claim-detection
  description: Classifier for claim-tagged data
  resources:
    excels:
      sources:
        - file: data/excels/
        - file: data/raw/
    raw:
      sources:
        - file: data/raw/
        - file: data/processed/
        - operation: generatedataset
Now that I do this, locally to my code, those folders "lose the data prefix". It wasn't a big deal to update my global_path variable in the code, but I'm not sure whether the global path should remain or not
María Benavente 6 days ago (did I explain myself here?)
Garrett 6 days ago Yes, that's right - the way you're specifying the data files, they will not appear under a data path. They are selected and linked to using their base names (e.g. excels, raw, etc.) If you want these selected files/dirs to appear under a data path, you can do a couple of things. First, you could just select data and leave it at that:
sources:
  - file: data
This will create a link to data and you'll have access to everything in that directory. If you'd prefer to be more specific (generally a good idea) you can specify a path attr to indicate that links to selected files/dirs should be created in a sub-directory. Like this:
excels:
  path: data
  sources:
    - file: data/excels
    - file: data/raw
This will create the directory structure that you're expecting - but only include the two specified dirs as links.
Garrett 6 days ago Btw, in cases like this where you're trying to sort out the directory layout, there's a --stage DIR option to the run command that will only lay out the run directory and not actually run the operation. You can inspect DIR in this case to see what Guild is doing.
Garrett 6 days ago The third option is the one you mention, which is to adjust your script to look for the resources in something other than data. In most cases, I just specify data as a source and am done with it. Remember this creates a symlink - it's not copying anything. The only harm in including data is that you have access to everything in that dir, which could mask some bugs. It's also less explicit. There's a point, however, when being explicit has diminishing returns - so it's a judgment call.
good commentary (R) pytorch-lightning - The researcher’s version of keras : MachineLearning [2019-10-20 Sun 22:39]
awesome, comprehensive: A Comparison of Reinforcement Learning Frameworks: Dopamine, RLLib, Keras-RL, Coach, TRFL, Tensorforce, Coach and more [2019-10-20 Sun 22:55]
- BEST (Raschka) --> /home/will/DevAcademics/LanguageThemed/python_reference
look at chrome links in NOW
- need to revisit these:
- I can readily reinstall all packages into an env, and then delete all in root env, both pip and conda
- setuptools, easy_install old, don’t
- PYTHONPATH, don’t
- also mentioned: for science packages, you usually want the most up-to-date. The dependency issue is more for web frameworks and other apps. There, just have one env for conda and keep that updated; only use envs for rarer cases
- Best answer: python - What is the difference between venv, pyvenv, pyenv, virtualenv, virtualenvwrapper, pipenv, etc? - Stack Overflow
- DKinghorn: very good reasons for Anaconda
- decent guides
- How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda
- How to Learn Python for Data Science (Updated)
- detailed explanations How to Set Up Your Python Environment on a Mac — davidculley.com
- meh
- The definitive guide to setup my Python workspace – Henrique Bastos 2017
- uses pyenv (with pyenv-virtualenv, pyenv-virtualenvwrapper),
- puts anaconda IN pyenv?
- seems knowledgeable
- Simple Python Environments For Data Science 🐍 – Rick Galbo – Medium
- comments+1 Freezing Python’s Dependency Hell | Hacker News
- 52 days ago. complaints of pipenv; poetry, nix again
- deeper discussion why nix way is better way
- nix vs conda, similar approach
- meh, Python Virtual Environments – a Primer 2016 | Hacker News
- Nix? Vex? heard nix several times now in guides. composable
- conda again
- good review Pipenv review, after using it in production – David Jetelina – Medium
- maybe go back to the basics virtualenv + pip, when not conda
- more complaints, 4mths ago Pipenv: A Guide to the New Python Packaging Tool : Python
- homepage Pipenv: Python Dev Workflow for Humans — pipenv 2018.7.1.dev0 documentation
- Pipenv: One Year Later and a Call for Help | Hacker News
- Advanced Usage of Pipenv — pipenv 2018.7.1.dev0 documentation
- To use Pipenv with a third-party Python distribution (e.g. Anaconda), you simply provide the path to the Python binary:
- $ pipenv install --python=/path/to/python
- Anaconda uses Conda to manage packages. To reuse Conda-installed Python packages, use the --site-packages flag:
- $ pipenv --python=/path/to/python --site-packages
- gitter, irc for python, anaconda, etc about datsci practices, ie failure of portable conda envs, need to customize
- one or few best envs for general datsci research. there should be only a few.
- any overall package guide? prob not
- Python Closures: How to use it and Why?
- python map function - Google Search
- best python coding slack channels - Google Search
- python coding gitter - Google Search
- reddit: the front page of the internet
- Python
- Python coding: a subreddit for people who know Python
- Quick python tips to add to your collection
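The first two searches above (closures and map) fit in one small sketch; the names here are illustrative:

```python
def make_scaler(factor):
    # closure: the inner function captures `factor` from the enclosing scope
    def scale(x):
        return x * factor
    return scale

double = make_scaler(2)
# map applies a function lazily over an iterable; list() forces evaluation
result = list(map(double, [1, 2, 3]))
print(result)  # [2, 4, 6]
```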
- [2019-07-15 Mon]
- Proposal Named Axes/Dimensions or Tensor Shape Annotations · Issue #4164 · pytorch/pytorch
- Tensor Considered Harmful
- Tensor Considered Harmful Pt. 2
- harvardnlp/namedtensor: Named Tensor implementation for Torch
- pydata/xarray: N-D labeled arrays and datasets in Python
- xarray: N-D labeled arrays and datasets in Python — xarray 0.12.2 documentation
- @xarray_dev (@xarray_dev) | Twitter
- ofnote/tsalib: Tensor Shape Annotation Library (numpy, tensorflow, pytorch, …)
- naming axis in tensors (Thu Jan 10 2019)
- NVIDIA/OpenSeq2Seq: Toolkit for efficient experimentation with various sequence-to-sequence models (https://github.com/NVIDIA/OpenSeq2Seq)
- ctongfei/nexus: Experimental typesafe tensors / deep learning / probabilistic programming in Scala (https://github.com/ctongfei/nexus)
- harvardnlp/namedtensor: Proof of concept for a dynamic named tensor for pytorch (https://github.com/harvardnlp/namedtensor)
- Tensor Considered Harmful (http://nlp.seas.harvard.edu/NamedTensor)
- [D] Tensor Considered Harmful (A polemic against numpy / pytorch and a proposal for a named tensor) : MachineLearning (https://www.reddit.com/r/MachineLearning/comments/accmek/d_tensor_considered_harmful_a_polemic_against/)
- harvardnlp on Twitter: “”Tensor Considered Harmful” (https://t.co/iueFvrYT6O). A polemic against numpy / pytorch and a proposal for a named tensor (https://t.co/MVBUm7OyBq). (New year’s goal, be more troublesome.)… https://t.co/fLmk8RR4Xy” (https://twitter.com/harvardnlp/status/1080911225427496966)
- Yann LeCun on Twitter: “A pretty cool proposal from Sasha Rush for “named tensors”, i.e. tensors with named indices. With an implementation in PyTorch. https://t.co/8TGYmrjxAG https://t.co/8TGYmrjxAG” (https://twitter.com/ylecun/status/1080974471689687040)
- Dynamic shapes · Issue #3 · KhronosGroup/NNEF-Tools (KhronosGroup/NNEF-Tools#3)
- @xarray_dev (@xarray_dev) | Twitter (https://twitter.com/xarray_dev)
- xarray: N-D labeled arrays and datasets in Python — xarray 0.11.2+1.gd6bed01 documentation (http://xarray.pydata.org/en/stable/)
- NumFOCUS: Open Code = Better Science - NumFOCUS (https://numfocus.org/)
- pydata/xarray: N-D labeled arrays and datasets in Python (https://github.com/pydata/xarray)
- ofnote/tsalib: Tensor Shape (Annotation) Library (https://github.com/ofnote/tsalib)
- Introducing Tensor Shape Annotation Library : tsalib (https://towardsdatascience.com/introducing-tensor-shape-annotation-library-tsalib-963b5b13c35b)
- [Proposal] Named Axes/Dimensions or Tensor Shape Annotations · Issue #4164 · pytorch/pytorch (pytorch/pytorch#4164)
- Tensor considered harmful | Hacker News (https://news.ycombinator.com/item?id=18823777)
- [D] Tensor Considered Harmful (A polemic against numpy / pytorch and a proposal for a named tensor) : MachineLearning (https://www.reddit.com/r/MachineLearning/comments/accmek/d_tensor_considered_harmful_a_polemic_against/)
- Tensor Considered Harmful (http://nlp.seas.harvard.edu/NamedTensor)
- NVIDIA/OpenSeq2Seq: Toolkit for efficient experimentation with various sequence-to-sequence models (https://github.com/NVIDIA/OpenSeq2Seq)
- ctongfei/nexus: Experimental typesafe tensors / deep learning / probabilistic programming in Scala (https://github.com/ctongfei/nexus)
- NumFOCUS: Open Code = Better Science - NumFOCUS (https://numfocus.org/)
- Dynamic shapes · Issue #3 · KhronosGroup/NNEF-Tools (KhronosGroup/NNEF-Tools#3)
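The core idea behind the named-tensor links above can be shown without numpy or torch; this toy class (entirely my own, not from namedtensor or tsalib) just addresses axes by name instead of by position:

```python
# toy illustration of the "named tensor" idea: look up axes by name
# rather than remembering positional indices (pure Python, nested lists)
class NamedTensor:
    def __init__(self, data, dims):
        self.data = data  # nested lists
        self.dims = dims  # axis names, outermost first

    def size(self, dim):
        # find the named axis, then descend to that depth to measure it
        axis = self.dims.index(dim)
        d = self.data
        for _ in range(axis):
            d = d[0]
        return len(d)

# batch of 2 examples, 3 features each
t = NamedTensor([[1, 2, 3], [4, 5, 6]], dims=("batch", "feature"))
print(t.size("batch"), t.size("feature"))  # 2 3
```

Libraries like xarray and harvardnlp/namedtensor carry this much further (broadcasting, reduction, and alignment by name), but the readability win is already visible here.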
Ahmad Moussa [1 day ago] if it’s remotely you could send yourself an email via a python script
Pyrestone [7 hours ago] I also use the python telegram api sometimes. It’s pretty simple and you can send messages to your phone.
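The email suggestion above, as a hedged sketch using the stdlib; the SMTP host, port, login, and addresses are placeholders, not real settings:

```python
# sketch: email yourself when a long run finishes; all server details
# and addresses below are placeholders
import smtplib
from email.message import EmailMessage

def build_run_report(run_name, epoch, d_loss, g_loss):
    msg = EmailMessage()
    msg["Subject"] = f"[{run_name}] epoch {epoch} done"
    msg["From"] = "me@example.com"  # placeholder
    msg["To"] = "me@example.com"    # placeholder
    msg.set_content(f"D loss: {d_loss:.6f}\nG loss: {g_loss:.6f}")
    return msg

msg = build_run_report("gan", 146, 0.372291, 1.665622)
# to actually send (untested placeholder server and credentials):
# with smtplib.SMTP("smtp.example.com", 587) as s:
#     s.starttls(); s.login("user", "password"); s.send_message(msg)
print(msg["Subject"])
```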
- for:
- downloading their data to process
- run benchmarks, tasks
- starred repos
- dir in devacademics
All papers with abstracts https://paperswithcode.com/media/about/papers-with-abstracts.json.gz Links between papers and code https://paperswithcode.com/media/about/links-between-papers-and-code.json.gz Evaluation tables https://paperswithcode.com/media/about/evaluation-tables.json.gz
The last JSON is in the sota-extractor format, and the code from there can be used to load the JSON into a set of Python classes.
At the moment, data is regenerated once a week (over the weekend).
Part of the data is coming from the sources listed in the sota-extractor README.
links-btw-paper-and-code
{
"paper_title": "FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model",
"paper_arxiv_id": "1205.5407",
"paper_url_abs": "http://arxiv.org/abs/1205.5407v2",
"paper_url_pdf": "http://arxiv.org/pdf/1205.5407v2.pdf",
"repo_url": "https://github.com/denizyuret/fastsubs-googlecode",
"mentioned_in_paper": false,
"mentioned_in_github": true
},
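A record like the one above can be loaded with plain json; for the full dump you would first decompress the downloaded .json.gz (e.g. with gzip.open) and json.load the whole array. Minimal sketch against an inline copy of the sample:

```python
# sketch: parse one links-between-papers-and-code record; the full dump
# is a gzipped JSON array of records shaped like this one
import json

record_json = '''
{
  "paper_title": "FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model",
  "paper_arxiv_id": "1205.5407",
  "repo_url": "https://github.com/denizyuret/fastsubs-googlecode",
  "mentioned_in_paper": false,
  "mentioned_in_github": true
}
'''

record = json.loads(record_json)
# e.g. filter on link provenance: here the repo links to the paper,
# not the other way around
print(record["repo_url"], record["mentioned_in_github"])
```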
evaluation-tables
{
"categories": [
"Computer Vision"
],
"datasets": [],
"description": "The average of the normalized top-1 prediction scores of unseen classes in the generalized zero-shot learning setting, where the label of a test sample is predicted among all (seen + unseen) classes.",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "Generalized Zero-Shot Learning - Unseen"
},
{
"categories": [
"Medical"
],
"datasets": [],
"description": "",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "breast density classification"
},
{
"categories": [
"Medical"
],
"datasets": [],
"description": "",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "epilepsy prediction"
},
{
"categories": [
"Methodology"
],
"datasets": [],
"description": "",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "Sparse Learning"
},
{
"categories": [
"Robots"
],
"datasets": [],
"description": "",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "Calibration"
},
{
"categories": [
"Graphs"
],
"datasets": [],
"description": "",
"source_link": null,
"subtasks": [],
"synonyms": [],
"task": "hypergraph partitioning"
}
papers-with-abstracts
{
"arxiv_id": null,
"title": "Towards a Discourse Model for Knowledge Elicitation",
"abstract": "",
"url_abs": "https://www.aclweb.org/anthology/papers/R/R13/R13-2006/",
"url_pdf": "https://www.aclweb.org/anthology/R13-2006",
"proceeding": "RANLP 2013 9"
},
{
"arxiv_id": "1508.05902",
"title": "A Framework for Comparing Groups of Documents",
"abstract": "We present a general framework for comparing multiple groups of documents. A\nbipartite graph model is proposed where document groups are represented as one\nnode set and the comparison criteria are represented as the other node set.\nUsing this model, we present basic algorithms to extract insights into\nsimilarities and differences among the document groups. Finally, we demonstrate\nthe versatility of our framework through an analysis of NSF funding programs\nfor basic research.",
"url_abs": "http://arxiv.org/abs/1508.05902v1",
"url_pdf": "http://arxiv.org/pdf/1508.05902v1.pdf",
"proceeding": null
},
{
"arxiv_id": null,
"title": "DysList: An Annotated Resource of Dyslexic Errors",
"abstract": "",
"url_abs": "https://www.aclweb.org/anthology/papers/L/L14/L14-1492/",
"url_pdf": "http://www.lrec-conf.org/proceedings/lrec2014/pdf/612_Paper.pdf",
"proceeding": "LREC 2014 5"
},
Hands-on with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost - YouTube [2019-09-25 Wed 10:25] Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU Description In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow. Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google. KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking. Airflow is the most-widely used pipeline orchestration framework in machine learning.
- most ppl not using pytorch, 1st mover advantage
- airflow better than Luigi
- next gpu will be multi-user thread friendly
- on-prem 39%, pretty good
- A/B and multi-armed bandit testing of models
- EKS - Amazon Elastic Kubernetes Service
- kubeflow doesn’t offer native airflow integration, pipelineai kubeflow version does along with MLflow (databricks)
- each github star is worth $1,500 in SV land.
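The "A/B and multi-armed bandit testing of models" bullet above boils down to something like epsilon-greedy routing; the model names and stats here are made up for illustration:

```python
# minimal epsilon-greedy bandit over deployed model variants:
# route most traffic to the best-observed model, explore occasionally
import random

def choose_model(stats, epsilon=0.1, rng=random):
    # stats: {model_name: (successes, trials)}
    if rng.random() < epsilon:
        return rng.choice(list(stats))  # explore a random variant
    # exploit: highest observed success rate (untried models count as 0.0)
    return max(stats, key=lambda m: stats[m][0] / max(stats[m][1], 1))

stats = {"model_a": (90, 100), "model_b": (50, 100)}
rng = random.Random(0)  # seeded for reproducibility
picks = [choose_model(stats, epsilon=0.1, rng=rng) for _ in range(100)]
print(picks.count("model_a"), "of 100 requests routed to model_a")
```

In production you would update stats from live feedback after each request, which is exactly the advantage over a fixed A/B split: traffic shifts toward the better model as evidence accumulates.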
PipelineAI - Products [2019-09-25 Wed 11:08]
- Multi/Hybrid-Cloud
- CPU + GPU + TPU
- Dynamic Auto Scaling
- Adaptive Traffic Shift
- Continuous Model Training
- Continuous Pipeline Optimization
- Continuous Model Validation
- Kafka Streaming
- Private Dashboards
- Logging Integration
- SAML + LDAP + OAuth + IAM
- 24x7 Support
- Github’s Top Open Datasets For Machine Learning
- https://www.kaggle.com/datasets
- ~/DevAcademics/Datasets
- Google Dataset Search
- Academic Torrents
Stanford DAWN Deep Learning Benchmark (DAWNBench) ~/Documents/2Research/OnlineEdu/datasciencemasters-go ~/Documents/2Research/OnlineEdu/open-source-machine-learning-degree
awesome-datascience ~/Documents/2Research/DataScience/awesome-datascience-ideas ~/Documents/2Research/DataScience/datascience-awesome-cheat-sheets ~/Documents/2Research/DataScience/free-data-science-books
~/Documents/2Research/DataScience/awesome-crawler ~/Documents/2Research/DataScience/awesome-crawler/README.html ~/Documents/2Research/DataScience/awesome-crawler/README.md
Web Scraping Tutorial with Python: Tips and Tricks
- kennethreitz/twitter-scraper: Scrape the Twitter Frontend API without authentication.
- taspinar/twitterscraper: Scrape Twitter for Tweets
- haccer/tweep: An advanced Twitter scraping tool written in Python that doesn't use Twitter's API, evading most API limitations.
- tweepy/tweepy: Twitter for Python!
- Twitter scraper tutorial with Python: Requests, BeautifulSoup, and Selenium — Part 1
- Mining Twitter Data with Python (Part 1: Collecting data) – Marco Bonzanini
- bonzanini/Book-SocialMediaMiningPython: Companion code for the book "Mastering Social Media Mining with Python"
- Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation
- Selenium - Web Browser Automation
- BruceDone/awesome-crawler: A collection of awesome web crawler,spider in different languages
- be a good web-scraping citizen:
- What is the best open source web crawler that is very scalable and fast? And why? - Quora
- top are Heritrix, Nutch, Scrapy
- scrapy
- michael-yin/awesome-scrapy: A curated list of … from the Scrapy community.
- Scrapy Tutorial — Scrapy 1.5.0 documentation
- Scrapy at a glance — Scrapy 1.5.0 documentation
- Scrapy | Community
- Scrapy: An open source web scraping framework for Python
- scrapinghub/portia: Visual scraping for Scrapy - worth getting
- others
- How do I choose between using Beautiful Soup or Scrapy? - Quora, some useful points
- KDnuggets
- via Selenium Webdriver vs Mechanize - Stack Overflow, good answer, they overlap some These are completely different tools that somewhat “cross” in the web-scraping, web automation, automated data extraction scope. selenium usually becomes a “fall-back” tool - when someone cannot web-scrape a site with mechanize or RoboBrowser or MechanicalSoup (note - another alternative) Also note that you should, first, consider using an API (if provided by the target website) instead of going down to web-scraping.
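On the "be a good web-scraping citizen" point above: the first courtesy is honoring robots.txt, which the stdlib handles directly. The rules below are an inline example, not from any real site:

```python
# good-citizen sketch: check robots.txt rules before fetching a URL
from urllib.robotparser import RobotFileParser

rules = """User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)  # for a real site: rp.set_url(".../robots.txt"); rp.read()
print(rp.can_fetch("mybot", "https://example.com/public/page"))   # True
print(rp.can_fetch("mybot", "https://example.com/private/page"))  # False
```

Beyond this, rate-limit your requests (the Crawl-delay hint above), set an honest User-Agent, and prefer an official API when the site offers one.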
Home - Nurture.AI GitXiv: Collaborative Open Computer Science Papers with Code : the latest in machine learning
3.3.1 Discrete Variables and Probability Mass Functions
3.9.1 Bernoulli Distribution; 3.9.2 Multinoulli Distribution; 3.9.3 Gaussian Distribution; 3.9.4 Exponential and Laplace Distributions; 3.9.5 The Dirac Distribution and Empirical Distribution; 3.9.6 Mixtures of Distributions
4.3.1 Beyond the Gradient: Jacobian and Hessian Matrices
5.1.1 The Task, T; 5.1.2 The Performance Measure, P; 5.1.3 The Experience, E; 5.1.4 Example: Linear Regression
5.2.1 The No Free Lunch Theorem; 5.2.2 Regularization
5.3.1 Cross-Validation
5.4.1 Point Estimation; 5.4.2 Bias; 5.4.3 Variance and Standard Error; 5.4.4 Trading off Bias and Variance to Minimize Mean Squared Error; 5.4.5 Consistency
5.5.1 Conditional Log-Likelihood and Mean Squared Error; 5.5.2 Properties of Maximum Likelihood
5.6.1 Maximum A Posteriori (MAP) Estimation
5.7.1 Probabilistic Supervised Learning; 5.7.2 Support Vector Machines; 5.7.3 Other Simple Supervised Learning Algorithms
5.8.1 Principal Components Analysis; 5.8.2 k-means Clustering
5.11.1 The Curse of Dimensionality; 5.11.2 Local Constancy and Smoothness Regularization; 5.11.3 Manifold Learning
6.2.1 Cost Functions; 6.2.1.1 Learning Conditional Distributions with Maximum Likelihood; 6.2.1.2 Learning Conditional Statistics; 6.2.2 Output Units; 6.2.2.1 Linear Units for Gaussian Output Distributions; 6.2.2.2 Sigmoid Units for Bernoulli Output Distributions; 6.2.2.3 Softmax Units for Multinoulli Output Distributions; 6.2.2.4 Other Output Types; 6.3 Hidden Units
need to relink to new pdf location
Link on page 6: http://www.cs.nyu.edu/˜roweis/data.html
B Solution of -D_KL(q(z) || p(z)), Gaussian case
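That appendix entry is from the VAE paper (Kingma & Welling, arXiv:1312.6114, linked in these notes), which derives the Gaussian-case KL term in closed form. Quoted from memory of Appendix B, so worth checking against the PDF: for q(z) = N(mu, sigma^2) and p(z) = N(0, I) over J latent dimensions,

```latex
-D_{KL}\left(q(z)\,\|\,p(z)\right)
  = \frac{1}{2}\sum_{j=1}^{J}\left(1 + \log\sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right)
```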
Link on page 1: tmiller@ unimelb. edu. au
Link on page 24: https://www.youtube.com/watch?v=VTNmLt7QX8E
Link on page 60: https://www.ijcai.org/proceedings/2017/0023.pdf
Link on page 61: DARPA, Explainable Artificial Intelligence (XAI) Program, http://www.darpa.mil/program/explainable-artificial-intelligence
Link on page 61: full solicitation at http://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf
Link on page 61: [[https://arxiv.org/pdf/1709.10256][Explainable Planning, in: IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017]]
Link on page 61: N. Frosst, G. Hinton, Distilling a Neural Network Into a Soft Decision Tree, https://arxiv.org/abs/1711.09784
Link on page 62: https://arxiv.org/abs/1802.00541
Link on page 64: the Asylum, in: IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 36–42, URL http://people.eng.unimelb.edu.au/tmiller/pubs/explanation-inmates.pdf
Link on page 64: G. Nott, ‘Explainable Artificial Intelligence’: Cracking open the black box of AI, Computer World https://www.computerworld.com.au/article/617359/
Link on page 66: D. S. Weld, G. Bansal, Intelligible Artificial Intelligence, arXiv e-prints 1803.04263, URL https://arxiv.org/pdf/1803.04263.pdf
Link on page 29: "CS231n: Convolutional Neural Networks for Visual Recognition," http://cs231n.stanford.edu/
Link on page 30: "Intel Math Kernel Library," https://software.intel.com/en-us/mkl
Link on page 30: "Caffe LeNet MNIST," http://caffe.berkeleyvision.org/gathered/examples/mnist.html
Link on page 30: "Caffe Model Zoo," http://caffe.berkeleyvision.org/model_zoo.html
Link on page 30: "Matconvnet Pretrained Models," http://www.vlfeat.org/matconvnet/pretrained/
Link on page 30: "TensorFlow-Slim image classification library," https://github.com/tensorflow/models/tree/master/slim
Link on page 30: "Deep Learning Frameworks," https://developer.nvidia.com/deep-learning-frameworks
Link on page 30: "THE MNIST DATABASE of handwritten digits," http://yann.lecun.com/exdb/mnist/
Link on page 30: CIFAR datasets, https://www.cs.toronto.edu/~kriz/cifar.html
Link on page 30: PASCAL VOC, http://host.robots.ox.ac.uk/pascal/VOC/
Link on page 30: MS COCO, http://mscoco.org/
Link on page 30: "Google Open Images," https://github.com/openimages/dataset
Link on page 30: "YouTube-8M," https://research.google.com/youtube8m/
Link on page 30: "AudioSet," https://research.google.com/audioset/index.html
Link on page 31: Eyeriss energy estimation tool, http://eyeriss.mit.edu/energy.html
Building machines that learn and think like people - 579 citations- Google Scholar
4.1.1. Intuitive physics; 4.1.2. Intuitive psychology
4.2.1. Compositionality; 4.2.2. Causality; 4.2.3. Learning-to-learn
4.3.1. Approximate inference in structured models; 4.3.2. Model-based and model-free reinforcement learning
5.1. Comparing the learning speeds of humans and neural networks on specific tasks is not meaningful, because humans have extensive prior experience
The architecture challenge: Future artificial-intelligence systems will require sophisticated architectures, and knowledge of the brain might guide their construction
10.1017/S0140525X17000036
Gianluca Baldassarre, Vieri Giuliano Santucci, Emilio Cartoni, and Daniele Caligiore
Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Rome, Italy.
gianluca.baldassarre@istc.cnr.it, vieri.santucci@istc.cnr.it, emilio.cartoni@istc.cnr.it, daniele.caligiore@istc.cnr.it
http://www.istc.cnr.it/people/gianluca-baldassarre http://www.istc.cnr.it/people/vieri-giuliano-santucci http://www.istc.cnr.it/people/emilio-cartoni http://www.istc.cnr.it/people/daniele-caligiore
In this commentary, we highlight a crucial challenge posed by the proposal of Lake et al. to introduce key elements of human cognition into deep neural networks and future artificial-intelligence systems: the need to design effective sophisticated architectures. We propose that looking at the brain is an important means of facing this great challenge.
We agree with the claim of Lake et al. that to obtain human-level learning speed and cognitive flexibility, future artificial-intelligence (AI) systems will have to incorporate key elements of human cognition: from causal models of the world, to intuitive psychological theories, compositionality, and knowledge transfer. However, the authors largely overlook the importance of a major challenge to implementation of the functions they advocate: the need to develop sophisticated architectures to learn, represent, and process the knowledge related to those functions. Here we call this the architecture challenge. In this commentary, we make two claims: (1) tackling the architecture challenge is fundamental to success in developing human-level AI systems; (2) looking at the brain can furnish important insights on how to face the architecture challenge.
The difficulty of the architecture challenge stems from the fact that the space of the architectures needed to implement the several functions advocated by Lake et al. is huge. The authors get close to this problem when they recognize that one thing that the enormous genetic algorithm of evolution has done in millions of years of the stochastic hill-climbing search is to develop suitable brain architectures. One possible way to attack the architecture challenge, also mentioned by Lake et al., would be to use evolutionary techniques mimicking evolution. We think that today this strategy is out of reach, given the “ocean-like” size of the search space. At most, we can use such techniques to explore small, interesting “islands lost within the ocean.” But how do we find those islands in the first place? We propose looking at the architecture of real brains, the product of the evolution genetic algorithm, and try to “steal insights” from nature. Indeed, we think that much of the intelligence of the brain resides in its architecture. Obviously, identifying the proper insights is not easy to do, as the brain is very difficult to understand. However, it might be useful to try, as the effort might give us at least some general indications, a compass, to find the islands in the ocean. Here we present some examples to support our intuition. When building architectures of AI systems, even when following cognitive science indications (e.g., Franklin 2007), the tendency is to “divide and conquer,” that is, to list the needed high-level functions, implement a module for each of them, and suitably interface the modules. However, the organisation of the brain can be understood on the basis of not only high-level functions (see below), but also “low-level” functions (usually called “mechanisms”). 
An example of a mechanism is brain organisation based on macro-structures, each having fine repeated micro-architectures implementing specific computations and learning processes (Caligiore et al. 2016; Doya 1999): the cortex to statically and dynamically store knowledge acquired by associative learning processes (Penhune & Steele 2012; Shadmehr & Krakauer 2008), the basal ganglia to learn to select information by reinforcement learning (Graybiel 2005; Houk et al. 1995), the cerebellum to implement fast time-scale computations possibly acquired with supervised learning (Kawato et al. 2011; Wolpert et al. 1998), and the limbic brain structures interfacing the brain to the body and generating motivations, emotions, and the value of things (Mirolli et al. 2010; Mogenson et al. 1980). Each of these mechanisms supports multiple, high-level functions (see below). Brain architecture is also forged by the fact that natural intelligence is strongly embodied and situated (an aspect not much stressed by Lake et al.); that is, it is shaped to adaptively interact with the physical world (Anderson 2003; Pfeifer & Gómez 2009) to satisfy the organism's needs and goals (Mannella et al. 2013). Thus, the cortex is organised along multiple cortical pathways running from sensors to actuators (Baldassarre et al. 2013a) and “intercepted” by the basal ganglia selective processes in their last part closer to action (Mannella & Baldassarre 2015). These pathways are organised in a hierarchical fashion, with the higher ones that process needs and motivational information controlling the lower ones closer to sensation/action. The lowest pathways dynamically connect musculoskeletal body proprioception with primary motor areas (Churchland et al. 2012). Higher-level “dorsal” pathways control the lowest pathways by processing visual/auditory information used to interact with the environment (Scott 2004). 
Even higher-level “ventral” pathways inform the brain on the identity and nature of resources in the environment to support decisions (Caligiore et al. 2010; Milner & Goodale 2006). At the hierarchy apex, the limbic brain supports goal selection based on visceral, social, and other types of needs/goals. Embedded within the higher pathways, an important structure involving basal ganglia–cortical loops learns and implements stimulus–response habitual behaviours (used to act in familiar situations) and goal-directed behaviours (important for problem solving and planning when new challenges are encountered) (Baldassarre et al. 2013b; Mannella et al. 2013). These brain structures form a sophisticated network, knowledge of which might help in designing the architectures of human-like embodied AI systems able to act in the real world. A last example of the need for sophisticated architectures starts with the recognition by Lake et al. that we need to endow AI systems with a “developmental start-up software.” In this respect, together with other authors (e.g., Weng et al. 2001; see Baldassarre et al. 2013b; 2014, for collections of works) we believe that human-level intelligence can be achieved only through open-ended learning, that is, the cumulative learning of progressively more complex skills and knowledge, driven by intrinsic motivations, which are motivations related to the acquisition of knowledge and skills rather than material resources (Baldassarre 2011). The brain (e.g., Lisman & Grace 2005; Redgrave & Gurney 2006) and computational theories and models (e.g., Baldassarre & Mirolli 2013; Baldassarre et al. 2014; Santucci et al. 
2016) indicate how the implementation of these processes indeed requires very sophisticated architectures able to store multiple skills, to transfer knowledge while avoiding catastrophic interference, to explore the environment based on the acquired skills, to self-generate goals/tasks, and to focus on goals that ensure a maximum knowledge gain.
Building machines that learn and think for themselves
Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning
- comparing their system with other autoML frameworks:
- Autosklearn
- autostacker
- TPOT
- OpenML datasets
- given: dataset, well defined task, performance criteria
- DARPA D3M (Data Driven Discovery)
- AlphaZero as a starting point
- single-player game
- DNN for predicting
- pipeline performance (value, or Q fn), and
- action probabilities
- DL meetup group
- Heatmapping.org
- Tutorial: Implementing Layer-Wise Relevance Propagation
- A Quick Introduction to Deep Taylor Decomposition
- albermax/innvestigate: A toolbox to iNNvestigate neural networks’ predictions!
- sebastian-lapuschkin/lrp_toolbox: The LRP Toolbox provides simple and accessible stand-alone implementations of LRP for artificial neural networks supporting Matlab and Python. The Toolbox realizes LRP functionality for the Caffe Deep Learning Framework as an extension of Caffe source code published in 10/2015.
- VigneshSrinivasan10/interprettensor
- ArrasL/LRP_for_LSTM
- Also TDLS: Explainable Neural Networks based on Additive Index Models - YouTube
The Bayesian Zig Zag: Developing Probabilistic Models Using Grid Methods and MCMC
- [ ] remember part2 of this talk with tpus.
- multi-armed bandit (traffic routing?)
- injectable functions?
- offline/online(production)
- all to docker image -> sagemaker, cloud, personal premises, etc
- recorded
- tensorflow + etl engine, batch
- nvlink 16-32 gpus now, switch
- XLA libs (in tensorflow); it's a cost optimizer that fuses layers/operations
- hiring, jr/sr, san fran incubator - his house, 6mths mentor, nice part of san fran
- AI - Computer Vision | Meetup
- 1st csharma (Cartik)(github)
- cartik.sharma@gmail.com
- Cartik Sharma | LinkedIn (followed)
- NeuroMorph Inc. - Engineering Director
- NEUROMORPH, QUBITS FOR CORTICAL MODELING - researchgate
- qubits for cortical modeling
- image classify products
- yolo, inceptionV3,
- product ID via pics
ACM webinar: Project Jupyter: From Computational Notebooks to Large Scale Data Science with Sensitive Data
- Speaker:
- Project Jupyter: From Computational Notebooks to Large Scale Data Science with Sensitive Data - ACM Learning Webinars - Association for Computing Machinery
- orgs slide
- LIGO Open Science Center
- NumFOCUS: parent 501(c), big org for lots of FOSS
- message specification JSON
- transport layer over ZeroMQ or WebSockets
- see client and server in repo
- nteract: alt frontend, simple
- binder: https://mybinder.org, turn repo into working notebooks, handles all deps
- Regulations: HIPAA, FERPA, GDPR, FedRAMP, Title 13, Title 26, SOX, GLBA, California Consumer Privacy Act (AB 375)
- Five Safes (Desai, Ritchie, Welpton)
- Jlab: bash, C++11,14, Javascript
- A High-Level Grammar of Interactive Graphics | Vega-Lite
- GenePattern
- jupyter/nbdime: Tools for diffing and merging of Jupyter notebooks.
- jupyter display - Google Search, for better visuals
- Observable - ???
- kubeflow
- middleware for ML applications
- “model inference is hard…latency, accuracy, size, throughput, energy eff, and rate of experimentation”
- current sols research focused & offline
- crowded space, not lots of production engineering
- bespoke 1-off, offline
- kafka?
- explainability in here too, use LIME
- DeepInterpreter? (his notebook 6)
- tensorflow lite (aka mobile, tf compile), smaller, faster format, optimize
- kafka for data access
- autograph - converts python into tensorflow graphs
- chris@pipeline.ai for jobs, questions,
- PipelineAI - Community
- Other Projects:
- on github:
- Good points:
- explanation vs justification
- explanation vs causality
- not prescriptive
- decisions in healthcare
- heuristics majority
- rules based system
- ML based system
- # of factors in diagnosis
- [Sculley 2015], ML code is small part of ML in healthcare
- Q: At slide x, what do the abbreviations LOS, ROR, SSI EF stand for? (slide with healthcare utilization) – LOS .- Length of Stay, RoR - Risk of Readmission, SSI - surgical site infection, EF - ejection fraction (cardiac)
- [ ] Transparent
- Falling Rule Lists
- GAM (Generalized Additive Models)
- GA2M (Generalized Additive Models with pairwise interactions)
- LIME (Local Interpretable Model-agnostic Explanations)
- Naïve Bayes
- Regression Models
- Shapley Values
- semi - shallow ensembles
- non-transparent
- deep learning
- SVM
- gradient boosting models
- transparency, fidelity, trust
- howto validate explanations?
- missed some stuff just before here:
- https://youtu.be/JO0LwmIlWw0?t=4707
- Sonnet, for this course they don’t use high level sonnet or keras,
- alternative frameworks https://youtu.be/JO0LwmIlWw0?t=4890
- then colab
- just before https://youtu.be/JO0LwmIlWw0?t=5358, we want visualizations to see what’s happening
- tf.reset_default_graph()
- there’s a default graph
- graphs get complicated quickly, notoriously hard to debug
- minor debug tips (around) https://youtu.be/JO0LwmIlWw0?t=6299
- NOTES:
- Xiyangs notebook in my gan-explorations repo has dcgan and conditional dcgan, not sagan
- Conditional GANs | Kaggle, nice ref
- full attendance
- Paper Read
- Intro
- [ ] generative models broadly?
- Related work
- [ ] why are markov chains needed in previous ones?
- [ ] VAEs
- Adversarial Nets
- [ ] train on data period first, then compete?
- [ ] scores are errors?
- [ ]
- cross entropy
- z ~ uniform()
- Theoretical results
- Experiments
- [ ] Gaussian Parzen window?
- Pros Cons
- [ ] helvetica scenario?
- [ ] negative chain boltzmann machine
- markov chains, inference, not needed
- Mchains need blurry distributions for chains to mix between modes.
- this can represent sharp, even degenerate distributions
- Conclusion
- learned approximate inference
- variational inference
- MCMC inference
- AIS?
- Parzen density?
- Intro
- Lecture 13 | Generative Models - YouTube
- for G, a better objective: maximize log D(G(z)) (push D to be wrong) instead of minimizing log(1 - D(G(z)))
- Wasserstein GAN supposed to avoid issue with balancing training between G and D
- Tips:
- replace any pooling layers with strided convs(D), and fractional-strided cons(G)
- use batchnorm in both G and D
- remove fully connected hidden layers for deeper architectures
- use relu in G for all layers, output use Tanh
- use leakyRelu in D for all layers
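The first two tips can be sketched numerically: a strided convolution stands in for pooling when downsampling in D, and a fractionally-strided (transposed) convolution upsamples in G via zero insertion followed by a regular convolution. A minimal NumPy sketch (function names are mine, not from any repo):

```python
import numpy as np

def conv2d_strided(x, k, stride):
    """'Valid' 2-D convolution with a stride; replaces a pooling layer in D."""
    H, W = x.shape
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k)
                      for j in range(0, W - kw + 1, stride)]
                     for i in range(0, H - kh + 1, stride)])

def zero_insert_upsample(x, stride):
    """Zero insertion used by fractionally-strided convs in G: a regular
    convolution applied after this realizes the transposed convolution."""
    H, W = x.shape
    y = np.zeros((H * stride, W * stride))
    y[::stride, ::stride] = x
    return y
```

With a 2x2 averaging kernel and stride 2, `conv2d_strided` halves each spatial dimension exactly like 2x2 average pooling, except that in the real layer the kernel weights are learned.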
- active research:
- better loss functions, more stable training (Wasserstein, LSGAN, etc)
- conditional GANs
- all kinds of applications
- current active generative models research
- PixelRNN and PixelCNN
- explicit density model
- optimizes exact likelihood
- good samples
- inefficient sequential generation
- VAE
- optimize variational lower bound on likelihood
- useful latent representation
- inference queries
- samples not great
- GANs
- game-theoretic approach, best samples
- tricky & unstable to train
- no inference queries
- recent work to also combine the above
- PixelRNN and PixelCNN
- Goodfellow Tutorial 2016
- Generating Pokemon with a Generative Adversarial Network - YouTube
- DCGAN deep convolutional GAN, 1st improvement
- batchnorm must for both
- avoid fully connected hidden units
- avoid pooling, simply stride the conv (or capsule)
- relu like other guides
- use this for baseline comparison, esp for non-simple datasets
- CGANs conditional GANs
- concatenate the same y input to both z (for G) and x (for D), i.e. text labels; that's the neat trick
- Wasserstein
- improve loss fn, eg. when to stop?
- highest training stability
- informative & interpretable loss fn
- [2018-09-11 Tue]
- Intro
- Scores
- Inception score
- Frechet Inception distance
- ImageNet dataset
- SAGAN part
- image features –> 2 weighted feature spaces f,g
- β’s between different regions of f,g
- optimize params: w_f, w_g, w_h,
- derived: βj,i, o_i,
- β - NxN attention map
- i,j ∈ {1..N}
- f, g: learned weight arrays (f(x) = W_f x, g(x) = W_g x)
- βj,i = softmax over i of f(xi)^T g(xj)
- γ - training hyperparameter
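As a sanity check on the equations above, a NumPy sketch of the attention map and output (plain projection matrices stand in for the paper's 1x1 convolutions, and W_h keeps the full channel count here so the residual add works; names are mine):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_f, W_g, W_h, gamma=0.0):
    """x: (N, C) image features flattened over N spatial positions."""
    f = x @ W_f                   # queries f(x_i)
    g = x @ W_g                   # keys    g(x_j)
    h = x @ W_h                   # values
    s = f @ g.T                   # s[i, j] = f(x_i)^T g(x_j)
    beta = softmax(s, axis=0).T   # beta[j, i]: weight on position i when synthesizing j
    o = beta @ h                  # o_j = sum_i beta[j, i] h(x_i)
    return gamma * o + x, beta    # y = gamma * o + x; gamma starts at 0 in training
```

With gamma = 0 the layer is an identity, which is how SAGAN starts training before gradually letting the attention branch contribute.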
- hinge loss
- [X] with no pooling, how to reduce dims btw conv layers?
- 1x1 convs?
- batchnorm?
- spectral normalization (G and D)
- TTUR (two-timescale update rule)
- imbalanced learning rate
- TTUR: separate learning rates
- for Self-Attention-GAN-Tensorflow repo, changed default dataset to mnist, from celebA (only 3 pics in there)
For MNIST
- generator
- layers = 8-3 = 5
- 1 layer 1024 channels
- 3 conv layers (1024, 512, 256)
- attention layer (128)
- 2 conv layers (128, 256)
- 1 conv layer sigmoid
- discriminator
- layers = 8-3 = 5
- 1 layer 64 channels
- 3 conv layers (64, 128, 256)
- attention layer (256 ch)
- 2 conv layers (256, 512)
- 1 conv layer (4), flatten
- 1 dense layer sigmoid
- not just D
- every layer for both
- to compensate for slow learning since D has regularization applied
- IS (Inception Score)
- KL divergence btw conditional class dist p(y|x) and marginal class dist p(y)
- higher is better
- has problems
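The score described above (KL between the conditional and marginal class distributions, exponentiated) can be written in a few lines of NumPy, given a matrix of classifier probabilities:

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """p_yx: (n_samples, n_classes) softmax outputs of a pretrained classifier.
    IS = exp( mean_x KL( p(y|x) || p(y) ) ); higher is better."""
    p_y = p_yx.mean(axis=0, keepdims=True)   # marginal class distribution
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

Identical uniform predictions give IS = 1 (worst case), while confident predictions spread evenly over k classes give IS = k, the maximum for a k-class classifier.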
- FID (Frechet Inception Distance) is a more principled and comprehensive metric, and has been shown to be more consistent with human evaluation in assessing the realism and variation of the generated samples
- Wasserstein-2 distance between generated and real images in the feature space of an Inception-v3 network.
- lower values mean closer distances between synthetic and real data distributions
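The Fréchet distance between the two fitted Gaussians is ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}). Since Tr((Σ_r Σ_g)^{1/2}) equals the sum of square roots of the eigenvalues of Σ_r Σ_g, it can be sketched in plain NumPy (extracting the Inception-v3 feature statistics is assumed to have happened already):

```python
import numpy as np

def fid(mu_r, cov_r, mu_g, cov_g):
    """Frechet Inception Distance between real and generated feature Gaussians.
    Lower is better; 0 means identical statistics."""
    diff = mu_r - mu_g
    # Tr((cov_r cov_g)^{1/2}) = sum of sqrt of eigenvalues of cov_r @ cov_g
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)
```

With equal covariances the score reduces to the squared mean shift, which makes a quick sanity check easy.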
- 128 x 128 images
- spec-norm every layer on both G and D
- conditional batch normalization for G, and projection type for D.
- Adam optimizer:
- beta1 = 0, beta2 = 0.9
- learn rate for D = 0.0004
- learn rate for G = 0.0001
- SAGAN uses conditional batch normalization in the generator and projection in the discriminator.
- attention mid to late layer is best
- both G and D
- complements convolution, which is strong in modeling local dependencies
- We observe that the network learns to allocate attention according to similarity of color and texture, rather than just spatial adjacency.
- Xiyang recommended references
- Dave Macdonald RangleIO guy trying to get ML going
- 1 month? presentation
- Q’s
- regions = ? (ith and jth), pixel, some arbitrary rectangle
- embedding space? (non-local paper. embedded gaussian sec) dimension reduction?
- you don’t need softmax constraint? absolute value can …?
- [ ] hinge loss could use followup, add to doc
- spectral normalization (see paper)
- lipschitz condition
- compute eigenvalues of W^T W; the spectral norm is the sqrt of the highest, but exact computation is expensive, so power iteration is used in practice
- [ ] followup
- [ ] measures:
- FID
https://github.com/soumith/ganhacks
While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.
Here are a summary of some of the tricks.
[Here's a link to the authors of this document](#authors)
If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.
- normalize the images between -1 and 1
- Tanh as the last layer of the generator output
In GAN papers, the loss function to optimize G is `min log(1 - D)`, but in practice folks use `max log D`
- because the first formulation has vanishing gradients early on
- Goodfellow et. al (2014)
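A quick numeric check of the vanishing-gradient point, with p = sigmoid(a) the discriminator's output on a fake sample (a is the logit, which is very negative early in training because D easily rejects fakes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = -6.0        # D's logit on a fake sample early in training
p = sigmoid(a)  # D(G(z)) is close to 0

# d/da of log(1 - sigmoid(a)) = -sigmoid(a): nearly zero, so G stalls
grad_min_formulation = -p
# d/da of log(sigmoid(a)) = 1 - sigmoid(a): nearly one, strong learning signal
grad_max_formulation = 1.0 - p
```

Both formulations have the same fixed point, but only the `max log D` version gives G a usable gradient when D is winning.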
In practice, works well:
- Flip labels when training generator: real = fake, fake = real
- Don't sample from a Uniform distribution
![cube](images/cube.png "Cube")
- Sample from a gaussian distribution
![sphere](images/sphere.png "Sphere")
- When doing interpolations, do the interpolation via a great circle, rather than a straight line from point A to point B
- Tom White's [Sampling Generative Networks](https://arxiv.org/abs/1609.04468) ref code https://github.com/dribnet/plat has more details
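The great-circle interpolation is the slerp from White's paper; a NumPy sketch that falls back to linear interpolation when the endpoints are nearly parallel:

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation between latent vectors a and b at fraction t."""
    omega = np.arccos(np.clip(np.dot(a / np.linalg.norm(a),
                                     b / np.linalg.norm(b)), -1.0, 1.0))
    if np.sin(omega) < 1e-8:  # nearly parallel endpoints: plain lerp is fine
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

Unlike straight-line interpolation, slerp keeps intermediate points at the norms the prior actually produces, so samples along the path stay on-distribution.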
- Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
- when batchnorm is not an option use instance normalization (for each sample, subtract mean and divide by standard deviation).
![batchmix](images/batchmix.png "BatchMix")
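A minimal version of the instance normalization mentioned above: per sample and per channel, subtract the spatial mean and divide by the spatial standard deviation (NHWC layout assumed; real layers add learnable scale and shift):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """x: (N, H, W, C). Normalizes each (sample, channel) over its H x W plane."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Because the statistics come from a single sample, this works at batch size 1, which is exactly the case where batchnorm breaks down.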
- the stability of the GAN game suffers if you have sparse gradients
- LeakyReLU = good (in both G and D)
- For Downsampling, use: Average Pooling, Conv2d + stride
- For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
- PixelShuffle: https://arxiv.org/abs/1609.05158
- Label Smoothing, i.e. if you have two target labels: Real=1 and Fake=0, then for each incoming sample, if it is real, replace the label with a random number between 0.7 and 1.2, and if it is a fake sample, replace it with a random number between 0.0 and 0.3 (for example).
- Salimans et. al. 2016
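The smoothing rule above as a tiny helper, with the ranges taken straight from the note (helper name is mine):

```python
import numpy as np

def smooth_labels(n, real, rng):
    """Real targets drawn from [0.7, 1.2), fake targets from [0.0, 0.3)."""
    lo, hi = (0.7, 1.2) if real else (0.0, 0.3)
    return rng.uniform(lo, hi, size=n)
```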
- make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
- Use DCGAN when you can. It works!
- if you can't use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN
- Experience Replay
- Keep a replay buffer of past generations and occasionally show them
- Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
- All stability tricks that work for deep deterministic policy gradients
- See Pfau & Vinyals (2016)
- optim.Adam rules!
- See Radford et. al. 2015
- Use SGD for discriminator and ADAM for generator
- D loss goes to 0: failure mode
- check norms of gradients: if they are over 100 things are screwing up
- when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
- if loss of generator steadily decreases, then it’s fooling D with garbage (says martin)
- Don't try to find a (number of G / number of D) schedule to uncollapse training
- It’s hard and we’ve all tried it.
- If you do try it, have a principled approach to it, rather than intuition
For example:
```
while lossD > A:
  train D
while lossG > B:
  train G
```
- if you have labels available, training the discriminator to also classify the samples: auxiliary GANs
- Add some artificial noise to inputs to D (Arjovsky et. al., Huszar, 2016)
- adding gaussian noise to every layer of generator (Zhao et. al. EBGAN)
- Improved GANs: OpenAI code also has it (commented out)
- especially when you have noise
- hard to find a schedule of number of D iterations vs G iterations
- Mixed results
- Use an Embedding layer
- Add as additional channels to images
- Keep embedding dimensionality low and upsample to match image channel size
- Provide noise in the form of dropout (50%).
- Apply on several layers of our generator at both training and test time
- https://arxiv.org/pdf/1611.07004v1.pdf
Authors
- Soumith Chintala
- Emily Denton
- Martin Arjovsky
- Michael Mathieu
Deep Learning with Generative Adversarial Networks – ICLR 2017 Discoveries - https://amundtveit.com/2016/11/12/deep-learning-with-generative-and-generative-adverserial-networks-iclr-2017-discoveries/
2018-05-07 Monday
- Improved Techniques for Training GANs - I Goodfellow 2016
- Wasserstein GAN
- On the regularization of Wasserstein GANs | OpenReview
- Training GANs with Optimism
- Progressive Growing of GANs for Improved Quality, Stability, and Variation | OpenReview
no notes
- try with celebA dataset:
- [-] sthalles repo
- [X] trying his dcgan first.
- [ ] get working
- [ ] Xiyang code?
- [ ] hhhhhao/paper repo
- [ ] mod to accept new data
- [ ] test run
- [ ] add spectral normalization to hhhhhhhao/paper repo
- [-] sthalles repo
- [ ] get celebA in tfFrames
- [ ] what dim changes to get code to run these?
- [ ]
implement attention layer on top of Werner's tensorflow GAN base repo as a starting point
- 128x128 batch 8 to 16
- bigger batches better for both G and D, better gradients and ?? something else Dave said
- recommended Deep Residual Learning for Image Recognition paper
- Dave:
- cycle GANs intro
- doodles = a source distribution (besides real images)
- his generated doodles end up monochrome - why?
- synthetic data used - horizontal flip, random rotation; since dataset fairly small
- image batch readers on repeat (actual method)
- different design ideas decisions
- eve optimizer didn't work well with GANs
- spectralnorm
def conv(o, channels, ks=3, strides=1, norm=None, padding='SAME', name=None):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        if norm is not None:
            o = norm(o, name)
        o = LeakyReLU()(o)
        in_channels = o.get_shape()[-1]
        w = tf.get_variable("kernel", shape=[ks, ks, in_channels, channels],
                            initializer=tf.keras.initializers.he_uniform())
        b = tf.get_variable("bias", [channels], initializer=tf.constant_initializer(0.0))
        o = tf.nn.conv2d(o, spectral_norm(w, name_prefix="w"), [1, strides, strides, 1], padding) + b
        return o
# https://github.com/taki0112/Spectral_Normalization-Tensorflow/blob/master/spectral_norm.py
def spectral_norm(w, iteration=1, name_prefix=""):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])
    u = tf.get_variable(name_prefix + "u", [1, w_shape[-1]],
                        initializer=tf.random_normal_initializer(), trainable=False)
    u_hat = u
    v_hat = None
    for i in range(iteration):
        # power iteration; usually iteration = 1 will be enough
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)
        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)
    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)
    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))
    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)
    return w_norm
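A NumPy mirror of the power iteration above makes it easy to check that sigma converges to the largest singular value of w (more iterations are used here since there is no persistent u carried between calls, as the TF variable provides):

```python
import numpy as np

def spectral_norm_np(w, iterations=100, seed=0):
    """Estimate sigma_max(w) by power iteration, mirroring the TF code above."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(1, w.shape[-1]))
    v = None
    for _ in range(iterations):
        v = u @ w.T
        v /= np.linalg.norm(v)
        u = v @ w
        u /= np.linalg.norm(u)
    sigma = float(v @ w @ u.T)
    return w / sigma, sigma
```

After normalization the largest singular value of w/sigma is 1, which is exactly the Lipschitz constraint noted in the questions above.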
- softened hinge-loss is better for cycle-GAN (Dave)
- Softened hinge loss objectives for Generator and Discriminator:
fake_term = tf.reduce_mean(tf.nn.softplus( fake * SCALE + OFFSET))
real_term = tf.reduce_mean(tf.nn.softplus(-real * SCALE + OFFSET))
gen_term = tf.reduce_mean(tf.nn.softplus(-fake * SCALE + OFFSET))
- hinge-loss gradients smaller (but nicer?)
- Dave found 10x diff in learning rate of cycle-GAN high
- L1 loss on pixels themselves -> a strong signal
- large gradients
- loss with cyclic check…
- but with hinge-loss gradients are limited to 1 so then not a problem
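The "gradients limited to 1" remark follows from softplus'(x) = sigmoid(x), which stays in (0, 1) no matter how wrong the prediction is. A quick check with a numerically stable softplus:

```python
import numpy as np

def softplus(x):
    # stable form of log(1 + e^x): max(x, 0) + log1p(e^{-|x|})
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softplus_grad(x):
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid(x), always in (0, 1)
```

So the softened hinge terms above (softplus applied to scaled/offset scores) can never push a gradient of magnitude greater than SCALE, unlike a raw L1 pixel loss.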
Link on page 2: http://www.iangoodfellow.com/slides/2016-12-04-NIPS.key The video was recorded by the NIPS foundation and should be ma
Link on page 6: create images. A video demonstration of iGAN is available at the following URL: https://www.youtube.com/watch?v=9c4z6YsBGQ0
Link on page 7: https://www.youtube.com/watch?v=FDELBFSeqQs
Link on page 30: GitHub repository associated with Soumith’s talk: https://github.com/soumith/ganhacks
Link on page 54: Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org
- DQN
- tricks
- AlphaGo
- monte carlo tree search
- use residual networks
- monte carlo tree search
- driving https://youtu.be/MQ6pP65o7OM?t=2527
- watch videos from slack /rl
- review last week
- Review:
- MDP (Markov decision process), actual costs, actions & inputs
- agent approximations of above
- state, value function, value of state
- oracle provides actual values (π)
- state value fn – v^(s,w)
- action value function q_pi actual, q^(s,a,w)
- s = state, a = action, w = weights of net
- RL problems do not have oracle information about actual π values of either functions, we must get estimates via environment interactions
- 1st strategy: MC learning
- [ ] notes, Qs
- [ ] generally:
- [ ] what criteria(theory) is there to select the different types of layers
- [ ] activation functions
- other aspects
- [ ] and why certain number of layers?
- [ ] what criteria(theory) is there to select the different types of layers
- [ ] somehow keep track all the different variables,fns in RL
- [ ] https://youtu.be/SWpyiEezfp4?t=55
- [ ] https://youtu.be/MqTXoCxQ_eY?t=78
- [ ] generally:
- [ ] why is network used to get reward value with TD,q-learning?
- ie why in the target valuation?
- [ ] how to keep track of things?
- [ ] use Deep Q-learning doc on Gdrive to post Q’s and rough work
- NO, traffic - javascript based, old. also NO on cart-pole? atari gym
- missed #5, first miss
attended. links:
- layers.py - tkipf/gcn - Sourcegraph
- utils.py - tkipf/pygcn - Sourcegraph
- train.py - tkipf/pygcn - Sourcegraph
- Sparse matrices (scipy.sparse) — SciPy v1.2.1 Reference Guide
- scipy.sparse.coo_matrix — SciPy v1.2.1 Reference Guide
- scipy.sparse.eye — SciPy v1.2.1 Reference Guide
- scipy.sparse.diags — SciPy v1.2.1 Reference Guide
- pygcn/data/cora at master · tkipf/pygcn
- main.py - pytorch/examples - Sourcegraph
- [[https://aisc.a-i.science/events/2019-03-27/][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | Lunch & Learn | A.I. Socratic Circles (#AISC)]]
- [[https://www.youtube.com/watch?v=eEs-qXs_9Dc][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | AISC Lunch & Learn - YouTube]]
- 0711.0189.pdf
- networkx.generators.random_graphs.barabasi_albert_graph — NetworkX 2.3rc1.dev20190329133857 documentation
- AlxndrMlk/Barabasi-Albert_Network: Barabási–Albert Network. A Step-by-Step Model with Visualizations created in Python 3.
- algorithm - Python: implementing a step-by-step, modified Barabasi-Albert model for scale-free networks - Stack Overflow
- python-igraph manual
- GraphSAGE [[https://arxiv.org/abs/1706.02216][[1706.02216] Inductive Representation Learning on Large Graphs]]
- NetworkX — NetworkX
- Beyond Grids: Learning Graph Representations for Visual Recognition
- Papers With Code : Search for graph convolution
[2019-05-25 Sat] python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --iterations 100000
python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --iterations 500
python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --save_interval 10 --iterations 500
- meet1 no notes, can’t remember what was said
- meet2 [2019-06-22 Sat 14:58]
- trying to understand BERT
- watching recommended talk to explain bert/nlp Language Learning with BERT - TensorFlow and Deep Learning Singapore - YouTube
Other links from previous meets
- GloVe: Global Vectors for Word Representation
- tensor2tensor/common_attention.py at d9f807cf2738323d19aba0a20a8cf0c7f7da8b27 · tensorflow/tensor2tensor
- The Annotated Transformer
- Attention Is All You Need - YouTube
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – Jay Alammar – Visualizing machine learning one concept at a time
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) – Jay Alammar – Visualizing machine learning one concept at a time
- via above
- AI2 - they have great nlp, esp for research papers
- read papers - we are looking at landscape what’s out there
for Xiyangs graph-nn group
ICLR 2018 report Quora i
- [12] Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
- [13] [1711.00740] Learning to Represent Programs with Graphs
- [14] [1802.03691] Tree-to-tree Neural Networks for Program Translation
ICLR 2018 report Quora ii
- [13] [1711.00740] Learning to Represent Programs with Graphs
- [19] https://tkipf.github.io/graph-co…
- [19b] [1712.00268] Deformable Shape Completion with Graph Convolutional Autoencoders
- [20] Graph Attention Networks
- [21] https://www-cs.stanford.edu/grou…
- [22] [1711.04043] Few-Shot Learning with Graph Neural Networks
- 1609.02907 Semi-Supervised Classification with Graph Convolutional Networks
- 1706.02216 Inductive Representation Learning on Large Graphs
- 1710.10903 Graph Attention Networks
- 1611.08402 Geometric deep learning on graphs and manifolds using mixture model CNNs
- Graph Convolutional Networks | Thomas Kipf | PhD Student @ University of Amsterdam
- 1801.10247 FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
- DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning/dl_01 Introduction to Machine Learning Based AI.pdf at master · enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning
- [[https://arxiv.org/abs/1810.09202][[1810.09202] Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation]]
- Graph Neural Network - YouTube
- Graph Neural Networks - YouTube
- Graph Convolution Learning - YouTube
- Xavier Bresson: "Convolutional Neural Networks on Graphs" - YouTube
- williamleif/GraphSAGE: Representation learning on large graphs using stochastic graph convolutions.
- matenure/FastGCN: The sample codes for our ICLR18 paper "FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling"
- Laplacian Operator core of spectral graph theory
- Q: fixed input size?
- number of vertices
- and/or edges
- K&W,
- X - feature matrix of nodes
- A - adjacency matrix
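With X and A as above, the Kipf & Welling propagation rule is H = σ(D^{-1/2}(A + I)D^{-1/2} X W). A dense NumPy sketch of one layer, with ReLU as σ (real implementations use the sparse matrices linked above):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One Kipf & Welling GCN layer: relu(D^-1/2 (A+I) D^-1/2 X W).
    A: (n, n) adjacency, X: (n, f) node features, W: (f, f') weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^-1/2 as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)
```

Each layer averages a node's features with its neighbours' (symmetrically degree-normalized), so k stacked layers mix information over k-hop neighbourhoods.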
1709.05429.pdf, Research - Hector Zenil, Complexity Explorer,
- 1812.04202.pdf
- 1806.01261 Relational inductive biases, deep learning, and graph networks
- 1811.05868 Pitfalls of Graph Neural Network Evaluation
- 1812.08434 Graph Neural Networks: A Review of Methods and Applications
- Meet Deep Graph Library, a Python Package For Graph Neural Networks
Learnability can be undecidable | Main Stream | A.I. Socratic Circles (#AISC)
- organic, connection, psych, bio-unrealistic
- computationalism
- Hinton PDP 1st book
- bio neurons also fire randomly
- bio/psych qualitative more than quantitative
- anatomy of learning – levels
- focus on the function only
- think of a crypto function; it is not learnable
- connection between computational complexity and learnability? Yes
- applies to QM computing, since QM comp is still a function
- bare minimum, a function that learns.
- start with set X
- family of
- most papers make an inductive argument: the assumption is that their algo works for a certain set of problems, and they try to convince of generalizability by testing on samples from the problem space.
- this paper deductive
- [ ] finitely supported def
- Yaxal - temporal pattern attention for multivariate time series forecasting
- Ramya B - time series forecasting based on wavelet decomposition and feature extraction
- non-stationary signals.
- wavelet for SP, not fourier nowadays?
- cascade correlation algo used
- also use PCA
- Albert Lai - captchas?
- a closer look at spatiotemporal convolutions for action recognition - Owen Ho
- a resnet (how?), 3D instead of 2D
- video recognition
- Florian Goebels - A generic framework for privacy preserving deep learning
- federated learning, interesting.
- a reinforcement learning framework for explainable recommendation - Omar Nada
- Jiri Stodulka - Deep RL based recommendation with explicit user-item interactions modeling
- Jorge Lopez - predicting crime using twitter and kernel density estimation
- Morten Dahl Dropout Labs
- homomorphic encryption
- secret sharing
- TFE
- underneath, 3rd party libs: TEE, HE, MPC
- 1,2 orders of magnitude slower
- uses current TF distributed code for communication (orchestration)
- tf.device
- collaboration with OpenMined for pytorch – more federated. they are not so focused on federation
- can access OpenMined code from tfe
- ie. costs - ReLu uses a comparison which is expensive for encryption
- can try to approximate which could impact accuracy
- future
- ethical issues, privacy,
(TF-Encrypted) Private machine learning in tensorflow with secure computing | Lunch & Learn | A.I. Socratic Circles (#AISC) 1810.08130 Private Machine Learning in TensorFlow using Secure Computation tf-encrypted/tf-encrypted: A Framework for Machine Learning on Encrypted Data
Interpreting Scenes, Words & Sentences From Natural Supervision - language semantic parsing -> hierarchical program
- visual concept annotation and program annotation -> symbolic reasoning module
- curriculum learning
- root then query, then filter for program
- once object is identified, it is filtered out
- multiple candidate programs for sentence / concept are sampled
- reinforcement with
- bidirectional GRU encoder for
- concept decoder (hard coded)
- algo1 string to tree semantic parser
- think of it as a recursive algorithm as it needs to be re-called
- 2 separate GRU cells (hardcoded), for given function, it can call 2 other functions
- once objects recognized and program generated (semantic parsing), then symbolic reasoning
- parts:
- from pic
- object detection
- feature extraction
- concept box
- text
- semantic parsing
- concept embeddings
- program box
- from pic
- off-policy search process for program selection (semantic parsing) (the reasoning process?)
- how is reward determined? updates weights once correct answer is found??
- attribute - shape, concept - sphere, etc
- different embeddings for different concepts
- but vocab vectors represent attributes
- embedding space is hard-coded, but within the space it is learned
- curriculum learning: what exactly?
- stupidly simple to start,
- mask r-cnn, resnet are pretrained
- concept embeddings, semantic parsing are trained, as is the neuro-symbolic reasoning (NSR)
- runs fns over objects and embeddings (RL part)
- semantic parser is trained with RL, not the others (those are backpropped)
- NSR it is differentiable, this whole model is end-to-end apparently
- RL is the GRUs? programs are built by the GRUs
- if program gives correct answer reward = 1 otherwise = 0
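The 0/1-reward REINFORCE setup described above, reduced to a toy bandit over a handful of candidate programs. Everything here (4 candidates, the learning rate, the softmax policy) is illustrative, not the paper's actual GRU-based parser:

```python
import numpy as np

rng = np.random.default_rng(0)
K, correct = 4, 2          # 4 candidate programs; index 2 yields the right answer
logits = np.zeros(K)
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(200):
    p = softmax(logits)
    a = rng.choice(K, p=p)            # sample a candidate program
    r = 1.0 if a == correct else 0.0  # reward only when the answer checks out
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - p
    g = -p
    g[a] += 1.0
    logits += lr * r * g

print(softmax(logits).argmax())  # settles on the correct program
```

Because the reward is 0 otherwise, only trajectories that happen to produce the correct answer update the policy, which is why the paper needs curriculum learning to make early successes likely at all.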
UoW — 2 types of features. machine intelligence course 5th lecture Alice Rueda Aggregating local image descriptors into compact codes
Alice Rueda This is the VLAD paper
Discussion lead: Santo Fortunato Motivation: Identifying fundamental drivers of science and developing predictive models to capture its evolution are instrumental for the design of policies that can improve the scientific enterprise—for example, through enhanced career paths for scientists, better performance evaluation for organizations hosting research, discovery of novel effective funding vehicles, and even identification of promising regions along the scientific frontier. The science of science uses large-scale data on the production of science to search for universal and domain-specific patterns. Here, we review recent developments in this transdisciplinary field. slides:
- scientometrics
- 2 founders
- one guy started WOS (web of science)
- data
- WOK (Web of Knowledge), Thomson Reuters
- scopus elsevier
- gscholar AI
- MS academic graph, bigger than everyone else AI
- H-index
- c / c_0
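The H-index mentioned above is simple to compute; a sketch:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cs = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cs, start=1):  # i-th most-cited paper
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations
```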
- interesting point: prob that paper A cites older paper B
- cite dynamics 3 paper specific parameters
- preferential attachment: a paper is more likely to be cited the more citations it already has
- time decay survival prob (obsolescence)
- intrinsic fitness η_i of the paper
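The three paper-specific parameters can be mocked up as a toy citation simulation: attachment probability proportional to fitness times current citations (preferential attachment) times a log-normal aging factor (obsolescence). All parameter values below are made up for illustration, not the fitted ones from the talk:

```python
import math
import random

random.seed(1)

def simulate(n_papers=300, refs_per_paper=5, c0=1.0, mu=2.0, sigma=1.0):
    """Toy citation model: each new paper cites earlier ones with probability
    proportional to fitness * (citations + c0) * log-normal aging factor."""
    fitness = [random.lognormvariate(0, 0.5) for _ in range(n_papers)]  # eta_i
    cites = [0] * n_papers
    for t in range(1, n_papers):
        weights = []
        for i in range(t):
            age = t - i
            aging = math.exp(-((math.log(age) - mu) ** 2) / (2 * sigma ** 2))
            weights.append(fitness[i] * (cites[i] + c0) * aging)
        for _ in range(min(refs_per_paper, t)):
            j = random.choices(range(t), weights=weights)[0]
            cites[j] += 1
    return cites

cites = simulate()
print(max(cites))  # a few "hit" papers accumulate most citations
```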
- teams
- science is becoming more and more team science
- team size is growing
- team papers are more cited
- Q: does team size affect the type of contribution?
- yes. small teams disrupt, new ideas, while large teams develop existing ideas
- disruption index
- DL is popular
- API
- careers - scientists can peak at anytime in their life, there is no pattern
- make queries on their static local dataset, generated own network
- have dataset locally if its big
- pubmed
- dataset shared? invest with grants to get them.
- pubmed, American physical society - free
- word2vec vs node2vec
Motivation: Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.
- info on his startup and the scene
- cylance hack: enable dynamic debugging
- coverage guided fuzzing
- property based testing
- approximate nearest neighbour
- CGF (fuzzing) hard to do for ANNs
- tensorfuzz
- send in NN graph, not code
- images or text are ok
- discussion points
- not real ablation studies, they are retraining on prior tasks
- not properly continuous learning?
- Ehsan is bringing this up, thinks another paper should be done with ablation studies, and to do only sampling from prior tasks to see if catastrophic forgetting happens
- how much improvement is architecture vs just more data and bigger?
- we need cheap ASICs ASAP, except those with the budget will just get bigger yet
- reproducible, trackable, testable, maintainable
- gives a higher-level view; you can customize
- ML automates decision making
- trading: buy, sell?
- health: is there tumor?
- market pricing?
- 2012 Knight Capital story: lost $465M in 45 min
- hidden tech debt in ML systems, Sculley
- prep data, build & train, deploy
- [ ] we’ll be using azure – free credit
/home/will/Zotero/storage/69G8AVWR/Francois-Lavet et al. - 2018 - An Introduction to Deep Reinforcement Learning.pdf
- esp relevant to NLP
- other methods to reduce complexity of softmax layer.
- relates softmax bottleneck to matrix factorization bottleneck
- some kind of binary tree structure? probability of a word as a product over nodes along a tree path (hierarchical softmax)
- M vocab size, N reps different contexts (# words in training data)
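The tree idea above (word probability = product of binary decisions along a tree path, i.e. hierarchical softmax) in toy form; the tree layout, vocab size, and dimensions here are arbitrary:

```python
import numpy as np

# Toy hierarchical softmax over a 4-word vocab arranged as a complete
# binary tree: each word is a leaf reached by left/right decisions,
# each decision a sigmoid over the context vector -> O(log M) per word
# instead of O(M) for a flat softmax.
rng = np.random.default_rng(0)
dim = 8
node_w = rng.normal(size=(3, dim))  # one weight vector per internal node
# path to each word: list of (internal_node, go_left) decisions
paths = {0: [(0, 1), (1, 1)], 1: [(0, 1), (1, 0)],
         2: [(0, 0), (2, 1)], 3: [(0, 0), (2, 0)]}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_prob(word, context):
    """P(word | context) = product of branch probabilities along its path."""
    p = 1.0
    for node, go_left in paths[word]:
        s = sigmoid(node_w[node] @ context)
        p *= s if go_left else (1.0 - s)
    return p

ctx = rng.normal(size=dim)
total = sum(word_prob(w, ctx) for w in paths)
print(total)  # the tree defines a valid distribution: probs sum to 1
```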
- dirichlet -> prior , when ppl want to sample discrete distributions
- taylor expansions -> when they know that the higher order derivatives is smaller .. close to 0?
- why conditional?
- why mcmc needed?
- what is contrastive divergence?
- training goal: make all visible states more probable
- uses log-likelihood of V()
- what is the e symbol?
- how relate to matrix factorization techniques?
- conditional factored RBMs?
- collaborative filtering
- SVD is a naive matrix factorization, there are many matrix factorizations?
- Harriet related to LDA somehow
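Naive SVD-based matrix factorization for collaborative filtering, as a sketch. The ratings matrix is made up, and zeros (unobserved entries) are treated as actual ratings, which is exactly why plain SVD is considered naive compared to methods that factorize only over observed entries:

```python
import numpy as np

# 4 users x 4 items; 0 = unobserved (but SVD can't know that)
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
# best rank-2 approximation in Frobenius norm (Eckart-Young)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_hat, 1))  # "predicted" ratings, including the zero slots
```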
- Alex Zhavoronkov, CEO of Insilico Medicine
- novel drug discovery
- de-novo molecule creation
- drug discovery (DD)
- slow, lots of failures
- 10 years whole pipeline , 5.5 years for research/pre-clinical
- GAN text to image synthesis
- GANs in drug discovery “make perfect needles” vs needle in haystack
- go to Alan Aspuru-Guzik lab
- adding RL to GAN
- 3 years of using generative models
- can’t use G for everything
- GA (genetic algorithms) can sometimes outperform G though
- SMILES and graphs?
- Daniel
- EU head of division for DL
- AE, VAE
- latent codes: AE sparse, VAE gaussian distributed - can do better inference
- VAE KL regulator on latent space
- AAE
- why AE? no mode collapse, works with discrete data out of the box
- for molecules small differences matter, not like images
- SMILES - rep mol as a string
- build a spanning tree
- write atoms in depth-first search order
- pip package: rdkit
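The spanning-tree / depth-first idea behind SMILES, in toy form. This handles no rings, bond orders, or charges; real writers (e.g. rdkit's Chem.MolToSmiles) deal with those via ring-closure digits and bond symbols. The molecule and function here are illustrative:

```python
def dfs_smiles(graph, atoms, root=0):
    """Emit atoms of an acyclic molecular graph in depth-first order,
    putting side chains in parentheses, SMILES-style.
    graph: adjacency dict; atoms: node -> element symbol."""
    out, seen = [], set()

    def walk(u):
        seen.add(u)
        out.append(atoms[u])
        branches = [v for v in graph[u] if v not in seen]
        for i, v in enumerate(branches):
            if i < len(branches) - 1:   # side chains go in parentheses
                out.append("(")
                walk(v)
                out.append(")")
            else:                       # last branch continues the main chain
                walk(v)

    walk(root)
    return "".join(out)

# isopropanol skeleton, written starting from the central carbon
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
atoms = {0: "C", 1: "C", 2: "O", 3: "C"}
print(dfs_smiles(graph, atoms))  # C(C)(O)C
```

This makes the note about small differences concrete: one changed atom or branch order produces a different string, even for the same molecule.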
- they combine both
- conditional generation x ~ p(x | properties)
- optimization : quality(x) -> max_x
- GENTRL (theirs) vs ORGAN, RANC, ATNC
- multi-modal priors: get artifacts at boundaries
- tensor train
- "A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models"
- marginals and conditionals derivation
- continuous case: Gaussian mixture
- optimize reward with REINFORCE
- optimize the latent manifold
- SOM
- crystal -> analyze surface – sounds like static
- template - small molecule that binds to this protein (target)
- use that mol as a template
- also research into synthetic prediction and automation of synthesizing mols
- Aspuru-Guzik lab again
- model zoo: reps for molecules, lots on smiles/seqs, more now on graph approaches, but smiles still better
- fingerprints (his fav), take fingerprint – gen all chemical space that has same fingerprint
- gen fingerprints by conditioning
- 3D, point clouds, don’t know conformation of molecules
- find a lab, gradually transition into field
- pharma field is a pain
- deepgenomics good
- MS also in field, or join them
- american chemical society journals, etc, not just nips, iclr
- https://test.ai/ - example use to auto-test websites
- DOM tree
- html, dom,
- miniWOB+ dataset - web tasks.
- style info important?
- leverage headless browser to render dom tree and then take it for the DNNs
- then get nodes and attributes from DOM tree
- then rendered html page from dom
- generate json from node tree (dom)
- then generate html from json file
- 225 attributes per node
- DNN would need to learn about inheritance
- pro team - acceptance criteria, steps to reproduce, QA finish closes ticket.
- 3 levels, functional, business verification,
- test.ai
- rendered image recognize a button,
- once dom tree breaks, query fails?
- [ ] anything relevant to tree structure
- [X] think of tasks
- [ ] graph generation part?
- [ ] decoder? for graph
- [ ] GCN are efficient and speed matters for us since we will have a lot of data points
- also parallelization matters for high throughput
- [ ] meetup Zhang go through his code
- upsampling tree structure from z-space
- work on the input processing part for Graph NN (i.e. translate the JSON DOM tree into one-hot features plus adjacency matrices)? There are still details and techniques that need to be hammered out, but maybe a good place to start is @Sheng Jia’s codebase.
- Sheng Jia pytorch
- miniwob/env.py and miniwob/custom.py does some preprocessing, flatten out the tree into lists and record the indices etc
- The adjacent nodes are labeled for each node as a key-value pair "adj_V" by my wrapper environment,
- and the actual adjacency matrix is created in models/dom_qnet.py line 112.
- Essentially my custom environment wrapper returns
- a list of dom nodes, each of which has multiple key-value pairs for the attributes including the adj_V.
- Those are processed by the model for getting the actual feature vectors.
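A hypothetical mini version of that preprocessing: flatten a JSON DOM tree into a node list, one-hot tag features, and an adjacency matrix. The field names ("tag", "children") and tag vocabulary are illustrative, not the actual miniWoB wrapper schema:

```python
import numpy as np

dom = {"tag": "div", "children": [
    {"tag": "button", "children": []},
    {"tag": "input", "children": []},
]}

TAGS = ["div", "button", "input", "span"]  # toy tag vocabulary

def flatten(tree):
    """Depth-first flatten: node list plus (parent, child) edge list."""
    nodes, edges = [], []
    def visit(node, parent):
        idx = len(nodes)
        nodes.append(node["tag"])
        if parent is not None:
            edges.append((parent, idx))
        for child in node["children"]:
            visit(child, idx)
    visit(tree, None)
    return nodes, edges

nodes, edges = flatten(dom)
X = np.zeros((len(nodes), len(TAGS)))       # one-hot tag features
for i, tag in enumerate(nodes):
    X[i, TAGS.index(tag)] = 1.0
A = np.zeros((len(nodes), len(nodes)))      # symmetric adjacency matrix
for p, c in edges:
    A[p, c] = A[c, p] = 1.0
print(nodes, A.sum())  # ['div', 'button', 'input'] 4.0
```

In the real setup each node would carry many more attributes (the 225 mentioned above), concatenated onto the one-hot tag features.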
- generative code modelling on graphs paper:
- He said one way to enable efficient graph generation during training is to put a batch of graphs into a single graph where they are not connected to each other, and in implementation just use sparse matrix.
- [ ] what would sparse matrix look like?
- no autograd support for sparse matrices
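What the batched sparse matrix could look like: in COO form, the block-diagonal adjacency of a batch of disconnected graphs is just each graph's edge list with node indices offset by the sizes of the preceding graphs. A sketch (my own illustration, not Zhang's actual implementation):

```python
import numpy as np

def batch_edges(edge_lists, num_nodes):
    """Merge per-graph edge lists into one big graph's COO (row, col) indices."""
    rows, cols, offset = [], [], 0
    for edges, n in zip(edge_lists, num_nodes):
        for r, c in edges:
            rows.append(r + offset)
            cols.append(c + offset)
        offset += n  # next graph's nodes start after this graph's
    return np.array(rows), np.array(cols), offset

# graph 1: single edge on 2 nodes; graph 2: 3-node path
# (undirected -> store both directions)
g1 = [(0, 1), (1, 0)]
g2 = [(0, 1), (1, 0), (1, 2), (2, 1)]
rows, cols, n = batch_edges([g1, g2], [2, 3])
print(n, list(zip(rows, cols)))  # 5 nodes; graph-2 edges shifted by +2
```

Because no edges cross the blocks, message passing over the merged graph is exactly per-graph message passing, done in one pass.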
- trial run of graphNN/gae
- several failures before getting the env versioning right.
- need python3.6 first, then run setup. tensorflow=1.13
- env = gae.
- train.py | 200 epochs
- Test ROC score: 0.9160440573364683, Test AP score: 0.9308764945564731
- python train.py --dataset citeseer | 200 epochs
- Test ROC score: 0.8687404902789517, Test AP score: 0.8759123955009718
utils/tensorise.py test_data/tensorised test_data/exprs-types.json.gz test_data/graphs/
- output (trimmed): no GPU (could not dlopen libcuda.so.1; cuInit failed; no nvidia kernel driver), so running on CPU. Lots of tf.compat.v1 deprecation warnings (tf.Session, tf.ConfigProto, tf.set_random_seed, tf.get_variable, tf.placeholder, keep_prob -> rate, GRUCell -> tf.keras.layers.GRUCell, tf.where, tf.train.AdamOptimizer) plus repeated "Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory." warnings.
- imputed grammar:
  - Expression -[00]-> ! Expression
  - Expression -[01]-> - Expression
  - Expression -[02]-> -- Expression
  - Expression -[03]-> CharLiteral
  - Expression -[04]-> Expression * Expression
  - Expression -[05]-> Expression + Expression
  - Expression -[06]-> Expression ++
  - Expression -[07]-> Expression . IndexOf ( Expression )
  - Expression -[08]-> Expression . IndexOf ( Expression , Expression , Expression )
  - Expression -[09]-> Expression . StartsWith ( Expression )
  - Expression -[10]-> Expression < Expression
  - Expression -[11]-> Expression > Expression
  - Expression -[12]-> Expression ? Expression : Expression
  - Expression -[13]-> Expression [ Expression ]
  - Expression -[14]-> IntLiteral
  - Expression -[15]-> StringLiteral
  - Expression -[16]-> Variable
- known literals: IntLiteral: ['%UNK%', '0', '1', '2', '4', '43']; CharLiteral: ['%UNK%', "'-'"]; StringLiteral: ['"foobar"', '%UNK%']
utils/train.py trained_models/overtrain test_data/tensorised/{,}
- output (trimmed): same as the tensorise run: no GPU (libcuda.so.1 not found, running on CPU) plus the same tf.compat.v1 deprecation and IndexedSlices-to-dense memory warnings.
Starting training run NAG-2019-06-22-12-10-21 of model NAGModel with following hypers: {"optimizer": "Adam", "seed": 0, "dropout_keep_rate": 0.9, "learning_rate": 0.00025, "learning_rate_decay": 0.98, "momentum": 0.85, "gradient_clip": 1, "max_epochs": 500, "patience": 5, "max_num_cg_nodes_in_batch": 100000, "excluded_cg_edge_types": [], "cg_add_subtoken_nodes": true, "cg_node_label_embedding_style": "Token", "cg_node_label_vocab_size": 10000, "cg_node_label_char_length": 16, "cg_node_label_embedding_size": 32, "cg_node_type_vocab_size": 54, "cg_node_type_max_num": 10, "cg_node_type_embedding_size": 32, "cg_ggnn_layer_timesteps": [3, 1, 3, 1], "cg_ggnn_residual_connections": {"1": [0], "3": [0, 1]}, "cg_ggnn_hidden_size": 64, "cg_ggnn_use_edge_bias": false, "cg_ggnn_use_edge_msg_avg_aggregation": false, "cg_ggnn_use_propagation_attention": false, "cg_ggnn_graph_rnn_activation": "tanh", "cg_ggnn_graph_rnn_cell": "GRU", "eg_token_vocab_size": 100, "eg_literal_vocab_size": 10, "eg_max_variable_choices": 10, "eg_propagation_substeps": 50, "eg_hidden_size": 64, "eg_edge_label_size": 16, "exclude_edge_types": [], "eg_graph_rnn_cell": "GRU", "eg_graph_rnn_activation": "tanh", "eg_use_edge_bias": false, "eg_use_vars_for_production_choice": true, "eg_update_last_variable_use_representation": true, "eg_use_literal_copying": true, "eg_use_context_attention": true, "eg_max_context_tokens": 500, "run_id": "NAG-2019-06-22-12-10-21"}
(plus a one-time warning that XLA:CPU is off because TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set)
==== Epoch 0 ====
Epoch 0 (train) took 12.28s [processed 1 samples/second] Training Loss: 9.437750
Epoch 0 (valid) took 3.41s [processed 4 samples/second] Validation Loss: 8.566715
Best result so far - saving model as 'trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz'.
==== Epoch 136 ====
Epoch 136 (train) took 0.88s [processed 17 samples/second] Training Loss: 0.451712
Epoch 136 (valid) took 0.40s [processed 37 samples/second] Validation Loss: 0.392952
utils/test.py trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz test_data/graphs/ trained_models/overtrain/test_results/
WARNING: Logging before flag parsing goes to stderr. W0622 12:19:52.787774 140550576658240 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.W0622 12:19:52.905472 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2019-06-22 12:19:52.955281: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory 2019-06-22 12:19:52.955433: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303) 2019-06-22 12:19:52.955526: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist 2019-06-22 12:19:52.974973: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz 2019-06-22 12:19:52.975514: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561a7a83d810 executing computations on platform Host. Devices: 2019-06-22 12:19:52.975536: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined> W0622 12:19:52.976696 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
W0622 12:19:52.976995 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W0622 12:19:53.046406 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0622 12:19:53.993441 140550576658240 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead. W0622 12:19:54.005863 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:19:54.663139 140550576658240 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. W0622 12:19:54.760896 140550576658240 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0. W0622 12:19:55.945921 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. 
Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:19:55.965353 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:20:01.999933 140550576658240 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where
/home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. (this warning repeated many times)
W0622 12:20:19.303285 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
2019-06-22 12:20:23.390381: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=–tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass –vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=–xla_hlo_profile. W0622 12:20:31.159235 140550576658240 deprecation.py:506] From utils/../exprsynth/nagdecoder.py:580: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version. Instructions for updating: dim is deprecated, use axis instead Groundtruth: args [ 0 ] @1 Prob. 0.561: args [ 0 ] @2 Prob. 0.044: args [ classVar ] @3 Prob. 0.030: args . IndexOf ( ‘-’ ) @4 Prob. 0.020: args [ “foobar” ] @5 Prob. 0.016: args ? 1 : classVar Groundtruth: intVar + classVar @1 Prob. 0.419: intVar + classVar @2 Prob. 0.074: intVar + 4 @3 Prob. 0.050: intVar . IndexOf ( ‘-’ ) @4 Prob. 0.040: intVar . IndexOf ( ‘-’ , 2 , 43 ) @5 Prob. 0.014: intVar . IndexOf ( ‘-’ , 2 , classVar ) Groundtruth: foo . StartsWith ( “foobar” ) @1 Prob. 0.424: foo . StartsWith ( “foobar” ) @2 Prob. 0.102: foo . StartsWith ( ‘-’ ) @3 Prob. 0.034: foo . IndexOf ( ‘-’ ) @4 Prob. 0.031: foo . StartsWith ( %UNK% ) @5 Prob. 0.003: foo . IndexOf ( ‘-’ , 43 , 43 ) Groundtruth: foo . IndexOf ( ‘-’ ) @1 Prob. 0.276: foo . IndexOf ( ‘-’ ) @2 Prob. 0.062: foo . IndexOf ( ‘-’ , 2 , 43 ) @3 Prob. 0.051: foo [ 0 ] @4 Prob. 0.033: foo + classVar @5 Prob. 0.025: foo . StartsWith ( ‘-’ ) Groundtruth: foo . IndexOf ( ‘-’ , 2 , 43 ) @1 Prob. 0.215: foo . IndexOf ( ‘-’ , 2 , 43 ) @2 Prob. 0.133: foo . IndexOf ( ‘-’ ) @3 Prob. 0.070: foo + classVar @4 Prob. 0.055: foo + 4 @5 Prob. 0.005: foo . IndexOf ( ‘-’ , 2 , classVar ) Groundtruth: arr [ 1 ] @1 Prob. 0.353: b [ 1 ] @2 Prob. 0.167: arr [ 1 ] @3 Prob. 0.126: i [ 1 ] @4 Prob. 
0.050: b [ i ] @5 Prob. 0.025: b ? 1 : i Groundtruth: j > classVar2 @1 Prob. 0.564: j > classVar2 @2 Prob. 0.119: j > 4 @3 Prob. 0.114: – j @4 Prob. 0.028: ! j @5 Prob. 0.019: j > - classVar2 Groundtruth: – j @1 Prob. 0.531: – j @2 Prob. 0.160: j > classVar2 @3 Prob. 0.059: j > 4 @4 Prob. 0.052: j ++ @5 Prob. 0.042: ! j Groundtruth: iarr [ j ] * - 1 + 4 @1 Prob. 0.193: iarr + j @2 Prob. 0.161: iarr + 4 @3 Prob. 0.013: iarr [ j ] * - 1 + 4 @4 Prob. 0.005: iarr [ 1 ] * - 1 + 4 @5 Prob. 0.005: iarr [ 1 ] * j + 4 Groundtruth: ! b @1 Prob. 0.474: ! b @2 Prob. 0.079: b ++ @3 Prob. 0.062: b > 4 @4 Prob. 0.039: b > j @5 Prob. 0.027: b < iarr Groundtruth: j > 4 @1 Prob. 0.300: j > 4 @2 Prob. 0.066: j < iarr @3 Prob. 0.064: j > iarr @4 Prob. 0.042: 4 < j @5 Prob. 0.036: j < 4 Groundtruth: 4 < classVar2 @1 Prob. 0.156: classVar2 > 4 @2 Prob. 0.136: 4 < classVar2 @3 Prob. 0.056: 4 > 4 @4 Prob. 0.050: classVar2 ++ @5 Prob. 0.049: classVar2 < j Groundtruth: classVar2 ++ @1 Prob. 0.442: classVar2 ++ @2 Prob. 0.085: ! classVar2 @3 Prob. 0.082: classVar2 > 4 @4 Prob. 0.059: – classVar2 @5 Prob. 0.057: classVar2 > j Groundtruth: b ? 2 : i @1 Prob. 0.352: b ? 2 : i @2 Prob. 0.085: b ? 2 : - i @3 Prob. 0.082: b ? 2 : arr @4 Prob. 0.041: b ? 2 : 4 @5 Prob. 0.026: b ? 2 : - 1 Groundtruth: b ? 1 : - i @1 Prob. 0.276: b ? 1 : i @2 Prob. 0.114: b ? 1 : arr @3 Prob. 0.071: b ? 1 : - i @4 Prob. 0.045: b [ 1 ] @5 Prob. 0.029: b ? 1 : 4 Num samples: 15 (15 before filtering) Avg Sample Perplexity: 1.39 Std Sample Perplexity: 0.21 Accuracy@1: 73.3333% Accuracy@5: 100.0000%
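The Accuracy@1 / Accuracy@5 numbers above are just top-k hit rates over the ranked candidate lists. A dependency-free sketch of the computation (function name and toy data are mine, in the spirit of the log output):

```python
def accuracy_at_k(samples, k):
    """Fraction of samples whose groundtruth appears among the top-k
    candidates. `samples` is a list of (groundtruth, ranked_candidates)
    pairs, candidates sorted by descending model probability."""
    hits = sum(1 for truth, cands in samples if truth in cands[:k])
    return hits / len(samples)

# toy ranked outputs mimicking the predictions above
samples = [
    ("args [ 0 ]", ["args [ 0 ]", "args [ classVar ]"]),
    ("intVar + classVar", ["intVar + classVar", "intVar + 4"]),
    ("4 < classVar2", ["classVar2 > 4", "4 < classVar2"]),
]
print(accuracy_at_k(samples, 1))  # 2 of 3 groundtruths ranked first
print(accuracy_at_k(samples, 5))  # all 3 within the top 5
```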
Xiyang Chen [4:05 AM] After some thinking and surveying, I’m leaning towards a dual encoder-decoder setup as a possible backbone architecture for our unsupervised learning task, where the latent spaces z could be tied together via a loss (maybe Wasserstein/optimal transport loss). This is also being used for some works on multimodal training tasks such as VQA, where one autoencoder is responsible for the visual (image) part and the other autoencoder for the text part. For our case it would be something like one autoencoder for HTML tree and another for screenshot, maybe conditioned on width of the window as well as desktop/mobile mode. More on this later. Meanwhile let me know if you have any thoughts/opinions. (edited)
Xiyang Chen [4:12 AM] Another radically different/maybe related approach is just use a CNN on screenshots with auxiliary HTML DOM node info, with skip connections on the hierarchy. But this idea is still vague. (edited)
Sheng Jia [11:43 AM] Is there any work on generating the graph or tree structure? (about the autoencoder for HTML) (edited) or are we assuming the fixed structure for now
Traceback (most recent call last):
  File "exec.py", line 85, in <module>
    main_f(res_dir, settings, hparams_list, paths_list, prints_dict)
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/entries/q_template.py", line 224, in main
    num_atoms=qlearn_hs.get("num_atoms")
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/algorithms/qlearn.py", line 73, in multitask_train
    t_config.batch_device
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/actors/dqn_actor.py", line 93, in __init__
    self._raw_s_t = self._env.reset()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/env.py", line 36, in reset
    self._instance.force_stop()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/instance.py", line 134, in force_stop
    self._driver.execute_script('return core.endEpisode(0);')
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: core is not defined
https://chromedriver.storage.googleapis.com/index.html?path=75.0.3770.90/
docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome-debug:3.6.0-bromine
docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-radium
CHROMEDRIVER SELENIUM ChromeDriver - WebDriver for Chrome ChromeDriver · SeleniumHQ/selenium Wiki Selenium Documentation — Selenium Documentation Selenium using Python - Geckodriver executable needs to be in PATH - Stack Overflow SeleniumHQ/selenium: A browser automation framework and ecosystem. How to Setup Selenium with ChromeDriver on Ubuntu 18.04 & 16.04 – TecAdmin python - selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: File not found error invoking send_keys() using Selenium - Stack Overflow selenium (Session info: headless chrome=75.0.3770.100) - Google Search DevToolsActivePort file doesn’t exist. · Issue #46 · heroku/heroku-buildpack-google-chrome 2470 - Chromedriver produces error when run via cron -> DevToolsActivePort file doesn’t exist - chromedriver - Monorail selenium - WebDriverException: unknown error: DevToolsActivePort file doesn’t exist while trying to initiate Chrome Browser - Stack Overflow python - I got error message while input string into selenium webdriver - Stack Overflow url encoding - How to urlencode a querystring in Python? - Stack Overflow How to convert a url string to safe characters with python? - Stack Overflow Live Coding: Selenium Browser Automation | DevDungeon SeleniumHQ/docker-selenium: Docker images for Selenium Grid Server (Standalone, Hub, and Nodes). 
Using Selenium-Server on Docker to run your Browser Tests - Meltwater Engineering Blog DevDungeon | Virtual Hackerspace python - urllib.urlencode: TypeError not a valid non-string sequence or mapping object - Stack Overflow selenium.common.exceptions — Selenium 3.14 documentation javascript - Error selenium.common.exceptions.JavascriptException: Message: ReferenceError: room is not defined - Stack Overflow selenium.webdriver.chrome.options.Options Python Example python - selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally with ChromeDriver Chrome and Selenium - Stack Overflow Version Selection - ChromeDriver - WebDriver for Chrome Autograd: Automatic Differentiation — PyTorch Tutorials 1.1.0 documentation [1810.10531v1] A mathematical theory of semantic development in deep neural networks - https://arxiv.org/abs/1810.10531v1
DOCKER Docker Selenium. Getting Started - YouTube selenium webdriver docker standalone - YouTube How To Run Your Selenium Tests Headlessly in Docker - Chris Kenst HN Search powered by Algolia Show HN: Python script to automate filling out Google form using Selenium | Hacker News vedipen/AutomateGoogleForm: Python script for automating Google form filling. Automate the Boring Stuff with Python Learn Selenium - Best Selenium Tutorials (Ranked) | Hackr.io 7 Must-read Selenium Tutorials - Applitools Blog christian-bromann/awesome-selenium: A curated list of delightful Selenium resources. Reviews of ‘TestNG Tutorials for Selenium Webdriver’ for learning Selenium | Hackr.io selenium blog/How to use selenium with docker.md at 512d4b69242af09af7dd1f83261e1bc661ec1d23 · Windsooon/blog 7. WebDriver API — Selenium Python Bindings 2 documentation Getting Started with Hub and Nodes · SeleniumHQ/docker-selenium Wiki Upgrade chromedriver to 2.32 · Issue #560 · SeleniumHQ/docker-selenium Selenium Standalone v.3.141.59 2. Getting Started — Selenium Python Bindings 2 documentation Chrome Options & Desiredcapabilities: AdBlocker, Incognito, Headless Headless chrome is not working in the docker · Issue #520 · SeleniumHQ/docker-selenium SchulteMarkus/selenium-standalone-chrome-spring-boot-demo: Demonstrating “selenium/standalone-chrome” in a Spring Boot project WebDriver Hub
HN Search powered by Algolia Free Hotel Wifi with Python and Selenium | Hacker News Free Hotel Wifi with Python and Selenium · Gokberk Yaltirakli Selenium 2.0: Out Now | Hacker News How to make Selenium tests reliable, scalable, and maintainable | Hacker News Selenium: 7 Things You Need To Know - Lucidchart Consistent Selenium Testing in Python | Hacker News Hacker News Hacker News How to Scrape Web using Python, Selenium and Beautiful Soup · Swetha’s Blog Hacker News weskerfoot/DeleteFB: Automate Scrubbing your Facebook Presence Hacker News Hacker News Selenium With Headless Chrome On Travis CI GUI and Headless Browser Testing - Travis CI Hacker News Running Headless Selenium with Chrome Hacker News How To Install Node.js on Ubuntu 18.04 | DigitalOcean
- latex doc Representation learning of HTML at large scale and its applications to downstream tasks - Online LaTeX Editor Overleaf
- 3rd part experiments
- refs
- model of DOMs. OUTPUT:
- (automated) Web navigation
- Classification (easiest)
- HTML generation
- Sheng: which areas are clickable (for RL)? filter out what is clickable or not
- test.ai is classifying buttons
- QA/test automation
- pixelwise presentation screen (images),
- action space factorization
- Xiyang interested in parsing part
- only one cite
- using GCN for parse (classification, etc, transductive not inductive – all nodes need to be known ahead of time)
- closest alt is molecular generation (aisc talk)
- test.ai is classifying buttons
- some small task
- like an autoencoder?
- input simplest: RNN – any length
- 2 AEs
- html
- images
- regularize the 2 latent codes so they relate to each other
- 2-4 papers, image labelling
- graphNN for encode/decode html
- alts: transformer (150 tokens max / gpu)
- gae, graphsage
- image AE
- conv
- transformer decoding is sequential
- sketch to code
- Turning Design Mockups Into Code With Deep Learning
- LSTM can do it, but it's slow and doesn't scale; we want scale
- transformer has hierarchical representations; he thinks its representational power is better than LSTM's
- but MS active with graphs for code generation
- so we know it works for RNNs; good reason to think it will work for Transformers; graphNN proof from the MS work.
- How to deal with in page images?
- also text
first two priorities:
- literature review
- crawling 1000-2000 pages for data
- sketch website has a simple dataset
- then small experiment for theoretical, use data we crawled
- optimal transport – make 2 distributions as close as possible while keeping the loss differentiable.
- see if any repos of the papers
- DOM tree into form good for DNNs
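On the optimal-transport note above: for two 1-D empirical distributions with equal sample counts, the W1 (earth mover's) distance reduces to the mean absolute difference of the order statistics – a cheap sanity check for any differentiable OT loss. A minimal sketch (function name is mine):

```python
def w1_empirical(xs, ys):
    """W1 distance between two 1-D empirical distributions of equal size:
    average absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# identical samples -> distance 0; shifting every point by c -> distance c
print(w1_empirical([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # 0.0
print(w1_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```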
import numpy as np
import scipy.sparse as sp

def normalize_adj(adj):
    """Symmetrically normalize adjacency matrix: D^-1/2 * A * D^-1/2."""
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1))
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()

def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx
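A dense, dependency-free sketch of the same symmetric normalization D^-1/2 * A * D^-1/2, handy for sanity-checking the sparse version on a tiny graph (the function name is mine):

```python
def normalize_adj_dense(adj):
    """Symmetrically normalize a dense adjacency matrix (list of lists):
    returns D^-1/2 * A * D^-1/2, with zero-degree rows left at zero."""
    n = len(adj)
    d_inv_sqrt = []
    for row in adj:
        s = sum(row)
        d_inv_sqrt.append(s ** -0.5 if s > 0 else 0.0)
    return [[adj[i][j] * d_inv_sqrt[i] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]

# 2-node graph with a single edge: both nodes have degree 1,
# so each nonzero entry becomes 1/sqrt(1*1) = 1
print(normalize_adj_dense([[0, 1], [1, 0]]))  # [[0.0, 1.0], [1.0, 0.0]]
```

On a triangle (all degrees 2) every edge entry becomes 1/2, as expected.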
- two new papers:
- generalized zero- and few-shot paper discusses the motivation and history of the problem
- refs their old paper from ICLR
- poster for this one
- pluralistic image completion
- encoder: lstm, transformer, graphnn?
- decoder:
- generating code in parallel – we want it to scale
- anything that mentions tree or hierarchy
- molecule generation via graphnn – heavy constraints though
- cloned GRAN repo here: ~/DevAcademics/GraphNN/GRAN
AISC State of NLP in 2019
- DEALS
- To show our appreciation for your patience, we would like to offer you a 50% off discount for any of our workshops you register until the end of September. Code: nlpjun27
- cheap I suggest you to use GCP ($ 400 of free credit ) and use the image provided by fast.ai to do that because it comes with all of required packages (hopefully) installed. Here is a guide how to do it: https://course.fast.ai/start_gcp.html Then download the notebook from google CoLab and upload it on your GPU VM on GCP.
- stoi = string to index (token → id)
- itos = index to string (id → token)
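In torchtext these are just two lookup tables, one the inverse of the other; a toy illustration (vocabulary is made up):

```python
# build a toy vocab: itos maps index -> string, stoi is the reverse mapping
itos = ["<pad>", "<unk>", "the", "cat"]
stoi = {tok: i for i, tok in enumerate(itos)}

tokens = ["the", "cat"]
ids = [stoi[t] for t in tokens]     # encode: tokens -> ids
print(ids)                          # [2, 3]
print([itos[i] for i in ids])       # decode back: ['the', 'cat']
```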
[2019-06-27 Thu]
- logistic (sigmoid) activations are worse for deeper nets – vanishing/exploding gradients problem, hence ReLU
- NLP predict next word
- Viterbi algo
- RNN drawbacks, hard to compete against transformers
- Part II
- Huggingface BERT
- stanford QA set
- Huggingface good
- 3 levels tokenization - words, etc
- autoencoding: representation learning
- Neural Network Methods for Natural Language Processing (Goldberg) - tel aviv
- Foundations of Statistical Natural Language Processing - Chris Manning & Hinrich Schütze - older
- matrix factorization
- Jurafsky & Martin - Speech and Language Processing
BERT is just the encoder part of the transformer, trained with different tasks
from aisc/nlp-workshop posts july6: Passing the weights to CrossEntropyLoss correctly - PyTorch Forums – with this method, weightings are applied in the loss calculation to give different weightings to the output classes. ufoym/imbalanced-dataset-sampler: A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones – this looks like a resampler, like SMOTE/ROSE.
from sampler import ImbalancedDatasetSampler

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    sampler=ImbalancedDatasetSampler(train_dataset),  # this is the module
    batch_size=args.batch_size,
    **kwargs
)
this is about label-smoothing loss. With label smoothing, the KL-divergence between the smoothed ground-truth distribution q(w) and the distribution p(w) computed by the model is minimized. Label smoothing relaxes one-hot targets from (0, 1) to numbers representing some uncertainty, e.g. (0.1, 0.9). An example is given where this is used with dirty data, when some instances are mislabeled. Both frameworks have args for this. OpenNMT-py/loss.py at e8622eb5c6117269bb3accd8eb6f66282b5e67d9 · OpenNMT/OpenNMT-py
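The smoothing itself is a one-liner over the one-hot target; a dependency-free sketch (the helper name and eps value are mine):

```python
def smooth_one_hot(target_idx, num_classes, eps=0.1):
    """Relax a one-hot target: the true class gets 1 - eps, and the
    remaining eps mass is spread uniformly over the other classes."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if i == target_idx else off
            for i in range(num_classes)]

q = smooth_one_hot(1, 2, eps=0.1)
print(q)        # ≈ [0.1, 0.9], as in the note above
print(sum(q))   # still sums to 1: a valid distribution
```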
Assignment 2: the second assignment aims to use the encoder of the transformer architecture to tackle the QUORA DEDUPLICATION task that you worked on during the first assignment. Can you improve the performance of your model using the transformer architecture? Remember you can only swap the MODEL part of your old code with the transformer encoder and change the relevant parameters (input size and alike). To that end you can use BERT’s Encoder or the code in the study material above (the latter I recommend).
import copy
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Embedder(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        return self.embed(x)

class PositionalEncoder(nn.Module):
    def __init__(self, d_model, max_seq_len = 80):
        super().__init__()
        self.d_model = d_model
        # create constant 'pe' matrix with values dependent on
        # pos and i
        pe = torch.zeros(max_seq_len, d_model)
        for pos in range(max_seq_len):
            for i in range(0, d_model, 2):
                pe[pos, i] = \
                    math.sin(pos / (10000 ** ((2 * i)/d_model)))
                pe[pos, i + 1] = \
                    math.cos(pos / (10000 ** ((2 * (i + 1))/d_model)))
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # make embeddings relatively larger
        x = x * math.sqrt(self.d_model)
        # add constant to embedding
        seq_len = x.size(1)
        x = x + Variable(self.pe[:,:seq_len],
                         requires_grad=False).cuda()
        return x
batch = next(iter(train_iter))
input_seq = batch.English.transpose(0,1)
input_pad = EN_TEXT.vocab.stoi['<pad>']
# creates mask with 0s wherever there is padding in the input
input_msk = (input_seq != input_pad).unsqueeze(1)

For the target_seq we do the same, but then add an additional step:

# create mask as before
target_seq = batch.French.transpose(0,1)
target_pad = FR_TEXT.vocab.stoi['<pad>']
target_msk = (target_seq != target_pad).unsqueeze(1)
size = target_seq.size(1) # get seq_len for matrix
nopeak_mask = np.triu(np.ones((1, size, size)),
                      k=1).astype('uint8')
nopeak_mask = Variable(torch.from_numpy(nopeak_mask) == 0)
target_msk = target_msk & nopeak_mask
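The resulting no-peek mask is just a lower-triangular boolean matrix; a dependency-free illustration (a numpy-free version of the np.triu trick above, function name mine):

```python
def nopeak_mask(size):
    """True where position j may be attended to from position i (j <= i),
    mirroring `np.triu(..., k=1) == 0` above."""
    return [[j <= i for j in range(size)] for i in range(size)]

for row in nopeak_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```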
class MultiHeadAttention(nn.Module):
    def __init__(self, heads, d_model, dropout = 0.1):
        super().__init__()
        self.d_model = d_model
        self.d_k = d_model // heads
        self.h = heads
        self.q_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        bs = q.size(0)
        # perform linear operation and split into h heads
        k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
        q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
        v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
        # transpose to get dimensions bs * h * sl * d_k
        k = k.transpose(1,2)
        q = q.transpose(1,2)
        v = v.transpose(1,2)
        # calculate attention using function we will define next
        scores = attention(q, k, v, self.d_k, mask, self.dropout)
        # concatenate heads and put through final linear layer
        concat = scores.transpose(1,2).contiguous()\
            .view(bs, -1, self.d_model)
        output = self.out(concat)
        return output

def attention(q, k, v, d_k, mask=None, dropout=None):
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        mask = mask.unsqueeze(1)
        scores = scores.masked_fill(mask == 0, -1e9)
    scores = F.softmax(scores, dim=-1)
    if dropout is not None:
        scores = dropout(scores)
    output = torch.matmul(scores, v)
    return output

class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff=2048, dropout = 0.1):
        super().__init__()
        # We set d_ff as a default to 2048
        self.linear_1 = nn.Linear(d_model, d_ff)
        self.dropout = nn.Dropout(dropout)
        self.linear_2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        x = self.dropout(F.relu(self.linear_1(x)))
        x = self.linear_2(x)
        return x
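A numpy-free numeric check of the scaled dot-product attention above, for a single query against two keys (toy values, function name mine):

```python
import math

def attention_1q(q, ks, vs):
    """Scaled dot-product attention for one query vector q against
    key vectors ks and value vectors vs (lists of equal-length lists)."""
    d_k = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
              for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    # weighted sum of the value vectors
    return [sum(w * v[j] for w, v in zip(weights, vs))
            for j in range(len(vs[0]))]

# a query identical to the first key attends mostly to the first value
out = attention_1q([1.0, 0.0], ks=[[1.0, 0.0], [0.0, 1.0]], vs=[[1.0], [0.0]])
print(out)   # single value between 0.5 and 1: biased toward the first value
```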
class Norm(nn.Module):
    def __init__(self, d_model, eps = 1e-6):
        super().__init__()
        self.size = d_model
        # create two learnable parameters to calibrate normalisation
        self.alpha = nn.Parameter(torch.ones(self.size))
        self.bias = nn.Parameter(torch.zeros(self.size))
        self.eps = eps

    def forward(self, x):
        norm = self.alpha * (x - x.mean(dim=-1, keepdim=True)) \
            / (x.std(dim=-1, keepdim=True) + self.eps) + self.bias
        return norm

# build an encoder layer with one multi-head attention layer and
# one feed-forward layer
class EncoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout = 0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.attn = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model)
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)

    def forward(self, x, mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn(x2,x2,x2,mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.ff(x2))
        return x

# build a decoder layer with two multi-head attention layers and
# one feed-forward layer
class DecoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout=0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.norm_3 = Norm(d_model)
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        self.dropout_3 = nn.Dropout(dropout)
        self.attn_1 = MultiHeadAttention(heads, d_model)
        self.attn_2 = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model).cuda()

    def forward(self, x, e_outputs, src_mask, trg_mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn_1(x2, x2, x2, trg_mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.attn_2(x2, e_outputs, e_outputs,
                                           src_mask))
        x2 = self.norm_3(x)
        x = x + self.dropout_3(self.ff(x2))
        return x
# We can then build a convenient cloning function that can generate multiple layers:
def get_clones(module, N):
    return nn.ModuleList([copy.deepcopy(module) for i in range(N)])

class Encoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(EncoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)

    def forward(self, src, mask):
        x = self.embed(src)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, mask)
        return self.norm(x)

class Decoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(DecoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)

    def forward(self, trg, e_outputs, src_mask, trg_mask):
        x = self.embed(trg)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, e_outputs, src_mask, trg_mask)
        return self.norm(x)

class Transformer(nn.Module):
    def __init__(self, src_vocab, trg_vocab, d_model, N, heads):
        super().__init__()
        self.encoder = Encoder(src_vocab, d_model, N, heads)
        self.decoder = Decoder(trg_vocab, d_model, N, heads)
        self.out = nn.Linear(d_model, trg_vocab)

    def forward(self, src, trg, src_mask, trg_mask):
        e_outputs = self.encoder(src, src_mask)
        d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
        output = self.out(d_output)
        return output

# we don't perform softmax on the output as this will be handled
# automatically by our loss function
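On that last comment: the loss applies log-softmax internally, so the model emits raw logits. A small pure-Python numeric check of cross-entropy from logits via the log-sum-exp trick (what nn.CrossEntropyLoss does to the decoder output; function name is mine):

```python
import math

def cross_entropy_from_logits(logits, target_idx):
    """-log softmax(logits)[target_idx], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_idx]

# uniform logits over 3 classes -> loss = log(3)
print(cross_entropy_from_logits([0.0, 0.0, 0.0], 0))  # ≈ 1.0986
```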
running the transformer article repo on gpgpu, needed to do the following (env 'main'):
817 conda install -c derickl torchtext
827 conda install dill
832 python -m spacy download en
833 python -m spacy download fr
829 python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr
start time [2019-07-11 Thu] 09:52pm
- I moved opt.train to cuda, check on gpgpu later
if low VRAM: reduce batch size and number of encoder layers
Wen Ho encoder model (printed repr; the six EncoderLayers are identical, so layers (1)-(5) are elided):

EncoderWrapper(
  (encoder): Encoder(
    (embed): Embedder( (embed): Embedding(85519, 256) )
    (pe): PositionalEncoder( (dropout): Dropout(p=0.3) )
    (layers): ModuleList(
      (0): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (1)-(5): same as (0)
    )
    (norm): Norm()
  )
  (out): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)
hyper params:
- wen: Configuration of the Encoder net: hidden dimension = 256, N=6 and heads = 8
- Tracy Pham: My last Linear layer has `in_features = (max_seq_length * d_model)`. in your model, `in_features=256` = d_model?
- Got the correct dimension of the last linear layer, `in_features = (max_seq_length * d_model)`
- Now accuracy is at ~.78 with `d_model=512`, `N=1`, `heads=2`, and `batch_size=50`.
- but d_model should be the embedding dimension, judging by the example repo code
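To illustrate the dimension question above: if the classifier head flattens the whole encoder output, its `in_features` must be `max_seq_length * d_model`, not `d_model` alone. A hypothetical sketch (the sizes are made up for illustration):

```python
import torch
import torch.nn as nn

batch, max_seq_length, d_model = 4, 20, 512
enc_out = torch.randn(batch, max_seq_length, d_model)  # encoder output

head = nn.Linear(max_seq_length * d_model, 1)          # flattens the whole sequence
logits = head(enc_out.reshape(batch, -1))              # (batch, 1)
```

With `in_features=d_model` only, the Linear would instead be applied per-token, giving (batch, max_seq_length, 1).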
Werner on imports for that transformer post:
import torch
import torch.nn as nn
from torch.autograd import Variable
import spacy
import torchtext
from torchtext import data
from torchtext.data import Field, BucketIterator, TabularDataset
from sklearn.model_selection import train_test_split
# from Batch import MyIterator, batch_size_fn
# from Tokenize import tokenize
import os
import numpy as np
import pandas as pd
import math
import copy
import torch.nn.functional as F
import time
Headline | Time
---|---
Total time | 4:48
\_ AISC Math of DL Workshops [4/5] | 4:48
- 1st TA Working Session, Introduction: 30 min
- 1st Notebook preparation: 45 min
- 2nd Notebook preparation + related work: 240 min
- visualization search; cleanup drive, files; cleanup notebooks
- time on slack with students: unknown

Name | Date | Task | Minutes
---|---|---|---
Willy Rempel | Jul 17, 2019 | 1st TA Working Session, Introduction | 30
Willy Rempel | Jul 16, 2019 | 1st Notebook preparation | 45
Willy Rempel | Jul 24, 2019 | 2nd Notebook preparation + related work? | 80
Willy Rempel | Jul 25, 2019 | " " | 160

315 min / 60 = 5.25 hours
plus 9 hours of class time = 14.25 hours
14 * 14.25 = 199.50
- issues people had:
- layers in convnet
- multiple layers
- imaginary numbers
- how to make it more hands-on?
- issue: not much time
@Willy Rempel will work on selecting some visualizations that help people understand layers in convnet and optimization etc
- conv net example to supplement
- 2nd section autograd
- when we come back – optimizers example walkthrough
- handson notebook: – go through cells.
- random code and comments so they have to edit
- simple english. ‘this code is a breaker, you need to fix below to continue ’
- for notebook – Amir has shared link and in slides for content
Amir F – draft document on 3rd assignment: writing the blog post if we have any ideas on it please feedback
- overall: what grad is, how it propagates
- exp with autograd
- solving a problem GD
- putting together
- linear regression example ?
- visualizations
- convnet visualization
- autograd, what it means and code
- solve problem with GD
- handson
- 2nd visualization at the end, optimization
- pytorch that contains all the concepts
- datascience blog post
# Create a rank-2 tensor of all ones
x = torch.ones(2, 2, requires_grad=True)
print(x)
# Define y to be a function of x
y = x + 2
# And z to be a function of y (and hence x):
z = 3 * y * y
out = z.mean()
print(z, out)
# Now backprop:
out.backward()
# print gradients d(out)/dx
print(x.grad)  # every entry is 4.5: d(out)/dx_i = 3*2*(x_i+2)/4 = 1.5*3
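Following on from the autograd snippet, the "solving a problem with GD" idea above can be sketched with the same machinery; the toy objective here is my own choice, not from the workshop material:

```python
import torch

# toy problem: minimize f(w) = (w - 3)^2 by plain gradient descent with autograd
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1
for _ in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()           # fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad      # manual gradient-descent step
        w.grad.zero_()        # clear the accumulated gradient before the next step
# w converges to the minimum at 3.0
```

Swapping the manual update for `torch.optim.SGD([w], lr=0.1)` with `step()`/`zero_grad()` gives the optimizer-based version.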
- 1 TA for wed sesh
- different levels of engagement for asgns
- grading?
- students ready and have access to material
- examples codes into colab notebooks
- prelim content
- read this paper & code: dynamic deep networks for retinal vessel segmentation (sraashis/ature)
- 1st session
- 1 neuron in pytorch
- affine maps
- tensors
- non-linear
- parameters
- linear algebra
- quick intro to pytorch
- define tensor in pytorch
- fill pytorch with a certain scalar
- fill a pytorch tensor with rands
- find a min value of a pytorch tensor
- simple imports
- pytorch beginner tutorial
- convert a py list to a pytorch tensor and vice versa
- tensors and scalars
- everything in numpy -> do in pytorch
- transpose
- dot products
- exercise: transpose images
- matrix manipulation
- matrix determinant
- eigenvalues
- then hands-on for module ii
- non-linearities, activation functions
- types and what we use in pytorch
- hands on: activation fns
- use prior image and apply activation fns to it
- loss fns
- where in pytorch
- 1 neuron in pytorch
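A quick sketch covering several of the tensor exercises listed above (the example values are my own):

```python
import torch

t = torch.empty(3, 3).fill_(7.0)   # fill a tensor with a certain scalar
r = torch.rand(3, 3)               # fill a tensor with random numbers in [0, 1)
m = r.min()                        # find the minimum value of a tensor

lst = [1.0, 2.0, 3.0]
as_tensor = torch.tensor(lst)      # python list -> pytorch tensor
back = as_tensor.tolist()          # pytorch tensor -> python list
```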
ws-math-dl-all ws-math-dl-breakout-1 ws-math-dl-breakout-2 ws-math-dl-breakout-3 Breakout Room x - TA Willy Rempel
my online group Ridwan A Jen L Motasem Vikash
This is a working session with one of the TAs where you can ask questions about the workshop, set up, and hands on parts. We will spend the first session making sure everyone has the info they need. The booking is for 2 hours but will most probably end earlier.
We will cover part 1 of chapter 2 Except 2.12. Well, most of it. And the rest will be given as a reading assignment.
my hangout: https://hangouts.google.com/call/xkxqyaNdzAx8t1qGUy8pAEEE
ask people to introduce themselves
fastai has an awesome library for preprocessing image data (post is 5 days old)
gated convnets
fast.ai recommend
import imageio
from scipy import ndimage
import matplotlib.pyplot as plt

# image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
image = imageio.imread("Veins.png", as_gray=True)
# other attempts that were tried:
# image = imageio.imread("Veins.png", format='PNG', as_gray=True)
# image = imageio.mimread('Veins.png')  # mimread is for multi-frame images
# image = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)

fig = plt.figure(); plt.gray()  # show the filtered result in grayscale
ax1 = fig.add_subplot(121)  # left side
ax2 = fig.add_subplot(122)  # right side
result = ndimage.sobel(image)  # needed the missing `from scipy import ndimage`
ax1.imshow(image)
ax2.imshow(result)
plt.show()
# original (the open?id= link serves a preview page, not the raw file)
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"
# werner's fix: the uc?id= form is the direct-download URL
!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"
# other attempts that hit the same open?id= problem:
!curl "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" > Veins.png
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb"
!wget https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb -O "Veins.png"
img = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)
!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O 'Veins.jpg'
Authors: Amir Hajian Presenter: [name] Facilitators: [names]
July 2019
Outline What we will learn in these 9 hours? How can I get the most out of it? How to do deep learning in 2019?
What to aim for: You should be able to follow this work by the end of the workshop
Recap from last session
Dissecting A DL Architecture
An artificial neuron
Simplifying the notation It’s all about matrices, vectors and exploring the parameter space to find the right parameters!
Linear Algebra: Tensors and Scalars
What is PyTorch, how to import it
Exercises with Tensors and Scalars:
- Define a tensor in PyTorch
- Fill a PyTorch tensor with a certain scalar
- Fill a PyTorch tensor with random numbers
- Find the minimum value of a PyTorch tensor
- Convert a Py list to a PyTorch tensor and vice versa
Linear Algebra: Matrix Transpose: torch.t()

data = torch.randn(200,250)
data[100:120,:] = 0.5
imshow(data)
imshow(data.t())
Linear Algebra: Dot Product
torch.matmul(a, b)
Linear Algebra: Matrix Multiplication: torch.matmul(M1, M2)

data = torch.randn(5)
torch.matmul(data, data)      # 1-D arguments: dot product

data = torch.randn(2,5)
torch.matmul(data, data.t())  # was `data.ta()`, a typo
Exercise: Transpose images
Simulated data: Create a random 2D matrix with dimensions 200x250, set columns 100:120 to zero, display it, transpose the matrix, display it again.
Real image data: Read the image provided to you, display it, transpose it, and display it again.

data = torch.randn(200,250)
data[100:120,:] = 0.5
imshow(data)
imshow(data.t())
Linear Algebra: Matrix Determinant

data = torch.randn(2,2)
torch.det(data)

Linear Algebra: Eigenvalues

data = torch.randn(2,2)
torch.eig(data)  # the slide repeated torch.det here; eigenvalues come from torch.eig
Non-linearities, activation functions
Types of activation functions and what we use in PyTorch
Affine maps: f(x) = Ax + b
We'll do the first example from here: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lin = nn.Linear(5, 3)  # maps from R^5 to R^3, parameters A, b
data = torch.randn(2, 5)
print(lin(data))
Types of activation functions and what we use in PyTorch
Non-linearities: composing affine maps gives another affine map, so without non-linearities depth adds nothing:
f(x) = Ax + b,  g(x) = Cx + d
f(g(x)) = A(Cx + d) + b = ACx + (Ad + b)
What to use?
Hands on: Apply non-linearities in PyTorch
- Define a ReLU layer in PyTorch
- Plot a relu function
- Plot a tanh function
- Plot a sigmoid function and observe how it is a distribution function
We'll follow the first example from here: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
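The relu/tanh/sigmoid hands-on above can be sketched without plotting; the assertions spell out the "sigmoid is a distribution function" observation (bounded in (0, 1) and monotonically increasing, like a CDF):

```python
import torch

data = torch.arange(-2, 2, step=0.1)
relu_y = torch.relu(data)        # zeroes all negative inputs
tanh_y = torch.tanh(data)        # squashes into (-1, 1)
sig_y = torch.sigmoid(data)      # squashes into (0, 1), strictly increasing
```

Plotting each against `data.numpy()` reproduces the three curves asked for in the exercise.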
Loss functions in PyTorch: torch.nn.MSELoss, torch.nn.CrossEntropyLoss

Checklist:
- Define a tensor in PyTorch
- Fill a PyTorch tensor with a certain scalar
- Fill a PyTorch tensor with random numbers
- Find the minimum value of a PyTorch tensor
- Reshape a tensor
- Flatten a tensor
- Convert a Py list to a PyTorch tensor and vice versa
- Multiply tensors with a scalar
- Dot product of two tensors
- Transpose a matrix in PyTorch
- Matrix multiplications in PyTorch
- Define a ReLU layer in PyTorch
- Work with non-linearities: plot a relu function, a tanh function, and a sigmoid function (observe how sigmoid is a distribution function)

Something like this for playing with non-linearities:
data = torch.arange(-2, 2, step=0.1)
plot(data.numpy(), torch.tanh(data).numpy())
show()

We will follow these examples for playing with non-linearities: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
For tensors we will follow these: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
- [X] hands-on colab notebook - Me
- hands-on #1 1D conv
- hide cells?
- have them search and google
- give hints, but have elements missing
- also todo in pytorch, have to go from np to pytorch
- handson #2
- from Amir, edge detection code
- also do edge detection on bio-med pic from last time.
- dropouts?
- hands-on #1 1D conv
other guys:
- skeleton of blog post
- proof of work instead
- involved in the last week
- TA hour tomorrow
Mathematics of Deep Learning - II Authors: Amir Hajian
July 2019
Outline:
- Convolution: Why do we need more than an MLP? What is convolution? What is a kernel? What does it do?
- 1D convolution, 2D convolution
- Hands-on experiments with convolutions in Python
- Efficient convolution algorithms
- ConvNets: a lightning-fast introduction to their structure (conv layers, pooling, etc.) and their applications
- Dropout: How do we prevent overfitting in neural networks? What is the math behind it? How do you use it in PyTorch?
What is a convolution? Formal definition: convolution is a mathematical operation on two functions (f and g) to obtain a third function that expresses how the shape of one is modified by the other.
Practical definition:
a. Take a function f
b. Take a function g
c. Shift g by a finite amount T
d. Multiply f with the shifted g: f(t) g(t-T)
e. Sum over the whole range to get the value of (f*g) at point T
f. Go to step c and repeat for all T values.
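The steps above can be coded directly; a plain-Python sketch of discrete convolution (for real work use scipy.signal.convolve, as the slides note):

```python
def convolve(f, g):
    # discrete version of the steps above: for each shift T,
    # multiply f by the shifted (flipped) g and sum the products
    n, m = len(f), len(g)
    out = [0.0] * (n + m - 1)
    for T in range(n + m - 1):
        s = 0.0
        for t in range(n):
            k = T - t
            if 0 <= k < m:
                s += f[t] * g[k]
        out[T] = s
    return out

# convolving with a (discrete) delta function returns a shifted copy of f,
# which previews the delta-with-Gaussian example on the next slides
delta = [0.0, 1.0, 0.0]
f = [1.0, 2.0, 3.0]
# convolve(f, delta) == [0.0, 1.0, 2.0, 3.0, 0.0]
```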
What is a convolution? A simple example: What is the result of convolving a delta function with a Gaussian kernel? [build-up figure slides omitted]
What is a convolution? A visual example: [animation slides omitted]
What is a convolution? How to code it up?
In Python: scipy.signal.convolve for 1D conv, scipy.signal.convolve2d for 2D
Experiments: Hands-on
Experiment with convolutions in 1D by smoothing a top-hat function with a Hann window:
- Define a top-hat function that is non-zero in the range [100:200]
- Define a Hann window of length 50 (hint: use scipy.signal.hann)
- Apply the Hann window to the top-hat (hint: use signal.convolve)
- Plot the signal before and after smoothing to see the result
- Discuss with your teammates to make sure you understand the results
- Repeat it with the PyTorch conv function at home

from scipy import signal
import numpy as np
sig = np.repeat([0., 1., 0.], 100)
win = signal.hann(50)
filtered = signal.convolve(sig, win, mode='same') / sum(win)

import matplotlib.pyplot as plt
fig, (ax_orig, ax_win, ax_filt) = plt.subplots(3, 1, sharex=True)
ax_orig.plot(sig)
ax_orig.set_title('Original pulse')
ax_orig.margins(0, 0.1)
ax_win.plot(win)
ax_win.set_title('Filter impulse response')
ax_win.margins(0, 0.1)
ax_filt.plot(filtered)
ax_filt.set_title('Filtered signal')
ax_filt.margins(0, 0.1)
fig.tight_layout()
fig.show()
Application: smoothing/binning noisy functions
In PyTorch: torch.conv1d(original_signal, kernel, padding=5)
- Pick the kernel to be a top-hat function of length L
- Convolve a noisy function with the kernel
- Observe how the function is binned by this operation

original_signal = torch.randn([1,1,100])
kernel = torch.ones([1,1,10])
smooth_signal = torch.conv1d(original_signal, kernel, padding=5) / kernel.sum()

Note: to plot you need to convert to numpy and flatten:
plot(original_signal.numpy().flatten(), label="Original Signal")
plot(smooth_signal.numpy().flatten(), label="Smooth Signal")
Hands-On: 1D Convolution in PyTorch
Experiment with 1D conv in PyTorch by recreating this plot to bin a noisy function. Get creative: pick your own function, add noise to it, pick different kernels, experiment with the width and shape of the kernels.
Convolutions in 2D: A step towards ConvNets
2D Convolution: Detect Edges with the Sobel Operator

import imageio
from scipy import ndimage, misc
import matplotlib.pyplot as plt
image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
fig = plt.figure(); plt.gray()  # show the filtered result in grayscale
ax1 = fig.add_subplot(121)  # left side
ax2 = fig.add_subplot(122)  # right side
result = ndimage.sobel(image)
ax1.imshow(image)
ax2.imshow(result)
plt.show()
Experiments: Hands-on time
Experiment with convolutions in 2D to detect edges in an image:
- Read the image and convert it to grey
- Define the kernel
- Apply the kernel to the image using scipy.signal.convolve2d
- Plot the results
- Try the Sobel kernel as well as the Scharr kernel. See the difference in the results?
Note for TAs: here is a sample solution.
import imageio
from scipy import signal
import numpy as np
image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
sobel = signal.convolve2d(image, sobel_y, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(sobel), cmap='gray')  # was np.absolute(sobel_y), which would display the 3x3 kernel
ax_mag.set_title('Sobel Applied')
ax_mag.set_axis_off()
fig.show()
Exercise 1: find edges in the x-direction using sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])
Exercise 2: combine the x and y results to get a final result
import imageio
from scipy import signal
import numpy as np
image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])
result = signal.convolve2d(image, sobel_x, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray'); ax_orig.set_title('Original'); ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray'); ax_mag.set_title('Sobel Applied'); ax_mag.set_axis_off()
fig.show()
import imageio
from scipy import signal
import numpy as np
image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
# imaginary units put the y-gradient in the imaginary part, so one complex kernel holds Gx + j*Gy
sobel_y = np.array([[-1j, -2j, -1j], [0, 0, 0], [1j, 2j, 1j]])
sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])
result = signal.convolve2d(image, sobel_x + sobel_y, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray'); ax_orig.set_title('Original'); ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray')  # |Gx + j*Gy| is the gradient magnitude
ax_mag.set_title('Sobel Applied'); ax_mag.set_axis_off()
fig.show()
smoothing = np.ones([50,50])
result = signal.convolve2d(image, smoothing, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray'); ax_orig.set_title('Original'); ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray')
ax_mag.set_title('Smoothing Applied')  # was 'Sobel Applied', a copy-paste leftover
ax_mag.set_axis_off()
fig.show()
2D Convolution: Detect Edges with the Scharr Operator

import imageio
from scipy import signal
import numpy as np
image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
scharr = np.array([[ -3-3j, 0-10j,  +3-3j],
                   [-10+0j, 0+ 0j, +10+0j],
                   [ -3+3j, 0+10j,  +3+3j]])  # Gx + j*Gy
grad = signal.convolve2d(image, scharr, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray'); ax_orig.set_title('Original'); ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(grad), cmap='gray'); ax_mag.set_title('Gradient magnitude'); ax_mag.set_axis_off()
fig.show()
Dropouts or how not to overfit
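The dropout slides were images that didn't survive extraction; a minimal sketch of how dropout behaves in PyTorch. During training each unit is zeroed with probability p and survivors are scaled by 1/(1-p); in eval mode dropout is the identity:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()     # training mode: randomly zero units, scale survivors by 1/(1-p) = 2
y_train = drop(x)
drop.eval()      # eval mode: dropout is a no-op
y_eval = drop(x)
# y_train entries are all 0.0 or 2.0; y_eval equals x exactly
```

The 1/(1-p) scaling keeps the expected activation the same at train and eval time, which is why no rescaling is needed at inference.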
Reading:
- Chapter 9 of the Deep Learning Book
- ConvNets Tutorial in PyTorch
- Understanding ConvNets (blogpost, paper)
- Understanding Dropouts (blogpost, paper)
Programming:
- Learn to work with ConvNets. Follow these tutorials to learn how to use ConvNets for various tasks in PyTorch:
  - Beginner: Training a classifier
  - Advanced: Gated ConvNets for Neural NLP
  - Explainability: How ConvNets characterize images (Understanding ConvNets)
Headline | Time |
---|---|
Total time | 14:41 |
Headline | Time |
---|---|
Total time | 1d 2:15 |
- fwiw: ppl remember beginning and end of event most
- Andring notebooks for presentation
- 2.him.
- capstone: blog post OR reproduce
- 1st n
- NOdule
- intro dcgan, gen, discr, 2.5 hrs
- pointed code
- do we now focus more on theory?
- sont some more advanced stuff
- mareating some of the layers
- check with individuals who want the more advanced material
- cyclegan intro, handson,
- end with 1hr applications and evaluation and conclusions
- Amir won’t be there
- [X] go thru notebook, run and review in detail (prep for wed)
- students tasks:
- wed: short qualitative draft about what the paper is about
- 2 weeks to try to reproduce; expected to write up observations of the attempt, not necessarily success
- can be as minimal as reading the code and writing observations
- why detach?
- why that view call?
- [X] ask individuals who want more advanced
- those without mics.
- how to engage? talk or chat?
- ask for progress
- suggest others mute or reduce volume when conversation not relevant to them
- remind everyone to mute mics when talk comes on
- [X] try out screen share with lower-right webcam
Headline | Time |
---|---|
Total time | 16:52 |
- less 2.5hrs for that saturday = 17 - 2.5 = 14.5
Introduction Week 1 - Introduction:
Intro to GANs and adversarial training Writing a GAN Training your GAN (hands on) Week 2 - Image to image translation:
Intro to image to image translation GANs Writing a cycleGAN Training a cycleGAN (hands on) Week 3 - Advanced Topics:
Evaluating GANs Current research and state of the art Beyond GANs: applications of adversarial training
[one paragraph covering the gist of what was covered in session 1]
For the first session the groundwork was laid with an overview of Generative Adversarial Networks and a solid introduction to the theory. Instead of the usual image tasks typical of GANs, participants worked on a minimal GAN that simply converted a random uniform distribution to a Gaussian. In this way the focus was on the core essentials that uniquely define GANs.

[one paragraph covering the gist of what was covered in session 2]
In session two, participants progressed from foundations to more contemporary GAN architectures. The hands-on exercise involved using the ubiquitous DCGAN architecture for an image-to-image translation task. The lecture portion filled out the picture with the necessary theory and its historical progression.

[one paragraph covering the gist of what was covered in session 3]
The last session moved to an even more challenging GAN architecture: the cycleGAN. After the lecture portion, an implementation of a cycleGAN along with the training code was completed by the participants. While training ran, our instructor Andrew finished off the workshop by discussing current research and the state of the art in the field. He made it clear that GANs are not just for images, but have a place in many areas of machine learning, such as: time series, <insert more here>.

[one paragraph outlining the post; has to be written once the sections are filled]
During the month of August, AISC (Aggregated Intellect Socratic Circles) held a workshop, 'Generative Adversarial Networks and Beyond'. Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises. Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs.

Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own. Our workshop finished with a capstone project, the result of which is this post. It is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the paper's results. Below are the results of their efforts.
[brief intro to each entry]
[Lastly, we finish off with <last entry>. <some closing sentence>]
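The session-1 exercise described above (a minimal GAN mapping uniform noise to a Gaussian) can be sketched roughly as follows; the layer sizes, learning rates, and step count are my own guesses, not the workshop notebook. The `detach()` on the fake batch, a question that came up in the notes, stops the discriminator's update from backpropagating into the generator:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(200):
    real = torch.randn(64, 1)       # samples from the target standard Gaussian
    fake = G(torch.rand(64, 1))     # generator maps uniform noise to samples
    # discriminator step: push real -> 1, fake -> 0;
    # detach() keeps this loss from updating G's parameters
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()
    # generator step: try to make D score the fakes as real
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```

With only 200 steps this won't match the Gaussian well; it is the training-loop structure, not a converged model.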
scratch- [X] blogpost guideline detailed - ask on staff if ?
- intro to whole blog post
- 1 paragraph – takaway of 3 sessions, high level overview
During the month of August, AISC (Aggregated Intellect Socratic Circles) held the ‘Generative Adversarial Networks and Beyond’ workshop.
—
Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students were engaged in both lectures from our instructor Andrew B. Martin and hands-on training with notebooks.
—
Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises. Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs.
—
Over the course of the 3 weekly sessions, students went from the core basics of GANs to advanced architectures, through the ubiquitous DCGAN, and ending with CycleGANs.
—
Over the course of 3 weekly sessions, students went from the core basics of GANs, to the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students heard lectures from our instructor Andrew B. Martin and dove right into hands-on training with notebooks.
—
Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own.
— Our workshop finished with a capstone project that is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the papers results. Below are the results of their efforts. [brief intro to each entry]
The final assignment for this workshop is the capstone blog post.
—
To finish off the workshop, students were given a capstone project to complete. Students were broken up into nnn teams. Each team had the option to either write a qualitative summary of a selected paper, or to reproduce the paper's results.
They were broken up into nnn teams and each team chose one of two options. The first option was to write a qualitative summary of a selected paper; alternatively, the second was to reproduce the paper's results in a coding project.
—
- content points
- Instructor Andrew works in industry
- uses GANs in
- he provided practical insights to use GANs in real-life
- capstone
- Instructor Andrew works in industry
Workshop Overview
Generative Adversarial Networks have been very popular in recent years for various tasks like image generation and data augmentation. A large number of papers at ICLR 2019 were focused on GANs. With the fast pace of the field, you won’t be able to stay up to date with the latest if you don’t have the right foundational knowledge of how these networks work under the hood. We are offering this workshop to help you step into the depth of GANs. Are you ready?
In this workshop, you will learn the theory and gather hands on experience in some of the most fundamental concepts and practical tips about GANs.
Important Dates
Please note that this workshop will happen on 3 separate evenings:
August 14, 2019
August 21, 2019
August 28, 2019
Office hours will happen on,
August 20, 2019 (in person and online participants, Group office hour with the TAs)
August 27, 2019 (in person and online participants, Group office hour with the TAs)
Date TBC (Group office hour with the instructor, has to be purchased separately)
“Why should I care about GANs?”
GANs and adversarial ML are widely discussed and increasingly used in AI
GANs are rapidly being applied in many of the cutting edge AI applications
“But I don’t care about generating fake photos!”
Adversarial ML has become an integral part of many of the recent ML algorithms
This workshop goes beyond GANs: you will explore adversarial training and its numerous potential applications
Why you should attend
In this 3-session intensive workshop, we will bring you up to speed with everything needed to build a strong background in GANs. It will be a combination of theory and hands-on applications in PyTorch.
This workshop is built on the instructor's extensive experience in academia and industry on related topics.
This workshop is the first in its series and paves the way theoretically and technically for many application specific workshops to follow.
Target Audience
Data Scientists, Machine Learning Engineers, Software Engineers, Students, Other Analytics Roles (data analysts, managers, product owners, etc)
Prerequisites
Knowledge of Python
Knowledge of Machine Learning
Familiarity with deep learning
Familiarity with PyTorch or other deep learning frameworks is a plus
This is a beginner to intermediate workshop
Learning Outcomes
We will build a working application in Python using GANs and image processing to generate and translate images
Understand how a vanilla GAN works
Understand how an image-to-image translation GAN works
Be able to explain limitations, current research directions, and applications
Build a GAN to generate images
Build another GAN to translate images
You will get a deeper understanding of how to apply GANs and adversarial loss to your own deep learning pipeline, in supervised, unsupervised and semi-supervised settings
Pre-workshop reading material
TBD
Learning Material
All participants will have access to the following learning material:
Slides from the sessions
Hands-on notebooks
Video recording of the sessions (you can use the videos to watch the parts that you missed, or re-watch any parts that are still unclear to you; access to videos beyond one week after the workshop is available for purchase; see tickets >> add-ons)
Instructor
Andrew Martin
Head of Data @ Looka Inc
Andrew is a data scientist with 8 years of experience working in deep learning and optimization. He currently leads the data team at Looka, where they use generative models like GANs to make great design accessible and delightful to everyone.
Course Modules
The workshop happens on 3 evenings, 3 hours each; each module below will be 50 mins.
Week 1 - Introduction:
Intro to GANs and adversarial training
Writing a GAN
Training your GAN (hands-on)
Week 2 - Image to image translation:
Intro to image to image translation GANs
Writing a cycleGAN
Training a cycleGAN (hands-on)
Week 3 - Advanced Topics:
Evaluating GANs Current research and state of the art Beyond GANs: applications of adversarial training
@Willy Rempel @Werner could you please start looking at the parts people have copied and provide them with feedback? Just leave comments directly on their write-ups. Perhaps only focus on the technical side of what they have rather than the language, unless it's very difficult to read or the flow is very bad, etc.
- 3 ppl, Alice et al
- population of Gs
- evo selection
- 1st entry started about 1hr ago. Also did several hours work yesterday.
- restarting at [2019-08-09 Fri] 12:43
- less than 24h I was at epoch=133.
facades: 400 images from the CMP Facades dataset. [Citation]
cityscapes: 2975 images from the Cityscapes training set. [Citation]
maps: 1096 training images scraped from Google Maps.
horse2zebra: 939 horse images and 1177 zebra images downloaded from ImageNet using keywords wild horse and zebra.
apple2orange: 996 apple images and 1020 orange images downloaded from ImageNet using keywords apple and navel orange.
summer2winter_yosemite: 1273 summer Yosemite images and 854 winter Yosemite images were downloaded using the Flickr API. See more details in our paper.
monet2photo, vangogh2photo, ukiyoe2photo, cezanne2photo: The art images were downloaded from Wikiart. The real photos are downloaded from Flickr using the combination of the tags landscape and landscapephotography. The training set size of each class is Monet:1074, Cezanne:584, Van Gogh:401, Ukiyo-e:1433, Photographs:6853.
iphone2dslr_flower: both classes of images were downloaded from Flickr. The training set size of each class is iPhone:1813, DSLR:3316. See more details in our paper.
python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps
python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps --display_server="http://192.168.0.35"
bi-cubic order = 3
!pip install torch
!pip install torchvision
from <gitrepo> import util, models, options, data
no importing: datasets (.sh .py) scripts (all .sh)
explicit: has p(x); implicit: has a black box, just get samples from the distribution
- emphasis on GANS, less pytorch ??
- [X] share on staff channel
- [X] elu vs relu vs leaky
- [X] notebook images from Andrew properly reffed so they show up in case of reset
- The generator needs to go from a small input to a larger output, so we need to do ‘transpose convolution’. here’s one explanation. https://towardsdatascience.com/transpose-convolution-77818e55a123
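A quick sanity check on the standard conv-arithmetic formulas (an assumption here, not from the linked article) shows how a stride-2 transposed convolution doubles a feature map where a stride-2 convolution halves it:

```python
# Sketch (assumption: standard convolution-arithmetic formulas) of how strided
# and transposed convolutions change spatial size in a generator/discriminator.

def conv_out(n, k, s, p):
    """Output size of a strided convolution (downsampling)."""
    return (n + 2 * p - k) // s + 1

def convT_out(n, k, s, p):
    """Output size of a transposed convolution (upsampling)."""
    return (n - 1) * s - 2 * p + k

# A kernel-4, stride-2, padding-1 pair halves / doubles the feature map:
assert conv_out(16, k=4, s=2, p=1) == 8
assert convT_out(8, k=4, s=2, p=1) == 16
```

Chaining the convT_out step from a small latent map up to the image size is exactly what a DCGAN-style generator does.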
GANs workshop day 2
Today's itinerary: Previous class; Simple GAN to DCGAN; Coding a DCGAN; Intro to CycleGAN; Assignments, exercises, and next week
OVERVIEW: Selected work
Approximate P(x,y) rather than P(x | y)
Generated symbols OUR PROJECTS
Generated Typefaces OUR PROJECTS
Previous Class: What we did
Overview of Generative Adversarial Networks
1 hidden layer fully connected generator and discriminator
Stochastic Gradient Descent optimization
Binary cross entropy loss function
Reading and Assignments
Goodfellow et al (2014), Radford et al (2015)
Pix2pix, CycleGAN, and/or batch norm papers
Challenges
Update the model to use 32x32 or 128x128 images
Interpolate through the latent space of the trained DCGAN
Adapt the first GAN to run on GPU
Prepare your own dataset to run through DCGAN
Try removing batch norm and see what happens
Questions: Does training convergence indicate better results? How would you estimate memory needs for a GAN?
Simple GAN to DCGAN
OVERVIEW OF GANS
Training: Through training the generator learns to turn random noise into realistic samples
Generator and discriminator: Two networks, the generator and discriminator, competing in a two-player minimax game
Discriminator trained to identify whether a sample comes from the training set or the generator
Generator trained to generate samples that trick the discriminator
Model description
Transform random uniform noise into a normal distribution
1 hidden layer fully connected generator and discriminator
Stochastic Gradient Descent optimization
Binary cross entropy loss function
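The two binary cross entropy objectives of the minimax game can be sketched numerically (an illustration with hand-picked discriminator outputs, assuming the non-saturating generator loss from Goodfellow et al. 2014):

```python
# Sketch of the discriminator and generator BCE objectives in the two-player
# game (assumption: toy scalar probabilities, non-saturating generator loss).
import math

def bce(p, label):
    """Binary cross entropy for one predicted probability p and target label."""
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

# D(x) is the discriminator's probability that a sample is real.
d_real = 0.9   # D's output on a real training sample
d_fake = 0.2   # D's output on a generated sample

# Discriminator: push real samples toward label 1, fakes toward label 0.
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)

# Generator (non-saturating): push D's output on fakes toward label 1.
g_loss = bce(d_fake, 1.0)

print(round(d_loss, 3), round(g_loss, 3))   # 0.329 1.609
```

Note the asymmetry: the generator never sees real images, only the gradient of the discriminator's judgment of its fakes.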
First GAN
DCGAN (Radford et al 2015)
Model description
Transform random noise into images of font sheets
Multiple convolutional hidden layers with batch normalization and leaky ReLU
Model weights initialized from Normal with mean 0 and stddev 0.2
Adam optimization
Binary cross entropy loss function
DCGAN
Model description
Similarities with Simple GAN
Generator still maps latent space to target distribution samples
Discriminator still maps samples to a classification
Training loop is virtually unchanged
Model still uses binary cross entropy loss function
DCGAN
Concepts for DCGAN
Convolutional hidden layers
All convolutional network (Springenberg et al., 2014)
Efficient for getting a representation of images
Strided convolutions let the network learn its own pooling function
Concepts for DCGAN
Strided convolutions Concepts for DCGAN
Transposed strided convolutions Concepts for DCGAN
Batch Norm
Introduced in Ioffe & Szegedy (2015)
Normalize input to each unit to have mean 0 and variance 1
Allows gradients to flow for deeper generators with no mode collapse
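The normalization step itself is simple; a minimal sketch of what batch norm does to one unit's pre-activations across a batch (assuming the learnable scale gamma=1 and shift beta=0 are omitted for clarity):

```python
# Sketch of batch normalization for a single unit across a batch
# (assumption: gamma=1, beta=0, i.e. the learnable affine step is skipped).
def batch_norm(xs, eps=1e-5):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Shift to mean 0, scale to variance 1 (eps guards against divide-by-zero).
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]

out = batch_norm([2.0, 4.0, 6.0, 8.0])
mean_out = sum(out) / len(out)
print(round(mean_out, 6))   # 0.0: normalized activations are zero-centered
```

In a real network this runs per channel, and gamma/beta let the layer undo the normalization if that helps training.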
Adam optimizer
Introduced in Kingma & Ba (2014): adaptive moment estimation
Stochastic gradient descent has one learning rate; Adam has an adaptive learning rate for each network parameter
Uses moving averages of first and second moments of the gradient
Four hyperparameters: initial learning rate, decay rates on the moving averages, and epsilon
In practice very robust and doesn't need as much tuning as other algorithms
Coding DCGAN
Questions: What prevents this architecture from being used for large models? What would happen if we replaced batch norm with spectral norm? What do I mean when I said strided convolutions replace pooling functions?
Intro to CycleGAN
Horse to zebra
Unpaired image to image translation
High level
Unpaired image to image translation
Builds on the pix2pix model from a year earlier
Two generators and two discriminators
Concept of cycle consistency loss
Transfers style from one collection to another collection
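The cycle consistency idea is easy to state in code: translating A→B→A should return the original input. A minimal sketch, where hypothetical stand-in "generators" and an L1 loss replace the real G_AB/G_BA networks and images:

```python
# Sketch of cycle-consistency loss (assumption: toy invertible functions g_ab
# and g_ba stand in for the real CycleGAN generators; vectors stand in for images).
def l1(a, b):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def g_ab(x):  # hypothetical generator A -> B
    return [v + 1.0 for v in x]

def g_ba(x):  # hypothetical generator B -> A
    return [v - 1.0 for v in x]

x = [0.2, 0.5, 0.9]
# Forward cycle: x -> G_AB(x) -> G_BA(G_AB(x)) should reconstruct x.
cycle_loss = l1(g_ba(g_ab(x)), x)
print(round(cycle_loss, 12))   # 0.0 for these perfectly inverse generators
```

CycleGAN adds this term (in both directions) to the adversarial losses, which is what lets it train without paired images.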
Model training CycleGAN Zhu et al 2017
Paper implementation
Generator: 6-9 residual blocks
Discriminator: 70x70 PatchGAN to reduce size
Least squares loss replacing BCE for stability
Train generator with a history of generated images rather than the latest, to reduce oscillation
Train with Adam and a batch size of 1
See PyTorch code: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

Course implementation
Generator: Transposed convolutions with a residual block
Discriminator: convolutional discriminator
Least squares loss replacing BCE for stability
Train with generated images from the current batch
Train with Adam and a batch size of 16

What's next? Next week
Coding cycleGAN
Adversarial training more generally
Evaluating GANs and where the technology is heading

Assignments and reading
Pix2pix and cycleGAN papers
Take a look at the capstone papers
Look at this code base: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/tree/master/models
Collect all your questions. About anything…

Thank you
update gans intro, edit group posts, 3 paragraphs / tech covered
Headline | Time |
---|---|
Total time | 11:04 |
likely more accurate time. except for breaks though
Headline | Time |
---|---|
Total time | 12:56 |
3 teams – ask Qs, are engaged, understand things, convey we are here to help more attention from instructor online relaxing FG will do most of it.
pencil&paper exercise? exercises get points setup rules clearly
Vahid — oct3rd, oct4th to EU
log in sheet – 15min answering Qs offline add prep time add
udacity RL repo- interesting when starting up jupyter-lab for this repo: Build Recommended JupyterLab build is suggested: @jupyter-widgets/jupyterlab-manager needs to be included in build
For online participants, please familiarize yourself with the online participation guide:
- You can join this call for the hands on sections to discuss any issues you face, or any questions you have https://meet.google.com/ake-cozz-kkx?hs=122. It is the same for all sessions. This is not the same as the link for the workshop streaming itself (located here https://member.ai.science/workshops/rl-2019-09&sa=D&usd=2&usg=AOvVaw0zvT5wO7_6Sofktww4pQ2J)
- Please don’t forget to mute your mic when the instructor session begins, or if there is background noise at your location.
- Do not hesitate to ask questions, we are here to help.
- Screen sharing in meet.google.com: https://support.google.com/meet/answer/7290345?co=GENIE.Platform%3DDesktop&hl=en, or this video clip https://www.youtube.com/watch?v=6FCIqvv68NY. For some issues, we may need to see your screen in order to help you.
- The hands-on notebook will be in the google drive prior to the workshop. Don’t forget to make a copy in your own drive and use the copy for all your work. If you have time, try and look thru the notebook prior to the session.
- We will use <link> as an online whiteboard as a convenience. Sometimes it is easier to explain with a diagram.
- title
- “3ed” -> “3rd”
- cap “W”orkshop
- #6 “process converges to to” duplicate “to”
- copy session3 sol notebook to std folder
- dl to my files
- start reading texts
could you populate this doc with some of the important links shared with the class, or useful links that the student shared? https://docs.google.com/document/d/16KX0-fJMBn1ByaG23TEZQHTVAMx-CHEO8jY23utBdjw/edit
This blog post is the collective work of the participants of the “Reinforcement Learning” workshop. This post serves as a proof of work, and covers some of the concepts covered in the workshop in addition to advanced concepts pursued by students. The blog post will be shared on the AISC Blog, and other major ML/DS related outlets.
Objective The objective of this post is to demonstrate an understanding of the concepts learned in the Workshop.Teams will be tasked with summarizing a selected RL paper and providing additional insight into the findings and concepts discussed.
Contribution Declaration
Each team will provide contribution acknowledgement to its members for participating in the blog post writing and idea generation.
Steps
- Refer to the References section and select one of the suggested papers. Given the topics discussed in the workshop, you should be familiar with the concepts used in each of the papers.
- Please list the names of your group and team members that contributed to the development at the top of each section.
- Work with your breakout team to go through the paper, and support each other to make sure everyone understands it well; you can divide and conquer by reading/researching different parts and explaining to each other.
- Come up with a work plan for each member of the team to contribute something (the updated version of this should later turn into the "contribution declaration").
- Collaborate and draft your learnings into a section in this article; the alternative is to create a video about what you learned.
- AISC blog editors will provide light technical and language feedback on the post prior to publication on the AISC Blog.
- AISC blog editors will provide guidance before submitting the blog post to other mediums such as Towards Data Science (Medium.com) and KDNuggets.
Important Dates
There are 2 deliverable options based on your available time commitment. Please select the option that suits your team members.
- SEP 20: Select the paper you want to work on.
- SEP 30: Write a qualitative summary of what you learned about the paper you selected. Please provide enough technical details to demonstrate your understanding.
- OCT 14: Reproduce the results of the paper by going through the code, and rerunning it on the given data set. This should link back to your github repo, or any other pages showing your results/code.
If you've completed both exercises and would like an additional challenge, please contact us at events@ai.science and we can provide guidance to find an extension that could result in some sort of research publication, or a simple application that you can use as part of your portfolio.
Recommendations
- You may pick a paper that is not listed here as long as you convince your team; please add that paper to the References section.
- We generally prefer that you don't work alone, but if you have very good reasons for it, we might consider it.
- You are encouraged to read other sections and provide constructive feedback in the form of comments, but please do not alter them.
- If you claim a section to write, then you need to deliver by the due dates above. If you miss the deadline, the post will be published without your section.
- Breakout teams are combinations of in-person and online audience, and it's everyone's responsibility to make sure that all team members are engaged and informed about plans. You can use the slack channel for your communication as much as you want, but you can also arrange for video calls etc.
- If you use any other resources, add their information to the References section, but make sure you don't modify the existing References.
The foundation of all Reinforcement Learning (RL) is that of an agent that acts on, and is acted on by, its environment.
The agent-environment relationship forms a closed loop:
- the environment receives from the agent an action
- this action can change the environment. That is, the environment changes state
- the agent then receives from the environment both state and reward information
This loop can be formalized as a Markov Decision Process (MDP),
$$p(s', r \mid s, a) = \Pr(S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a)$$
where the next state s' and its associated reward r depend only on the preceding state s and action a (the Markov property).
Environments can be discrete or continuous. State changes can be deterministic or not. An environment can be described by its states, the allowed actions at each state, and a transition function
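A tiny discrete environment of this kind can be written down directly as a transition table p(s', r | s, a) (a hypothetical two-state MDP invented for illustration):

```python
# Sketch: a tiny discrete MDP as a transition table, p(s', r | s, a)
# (assumption: toy two-state environment with made-up states and rewards).
# Each entry maps (state, action) to a list of ((next_state, reward), prob).
P = {
    ("s0", "go"):   [(("s1", 1.0), 0.8), (("s0", 0.0), 0.2)],
    ("s0", "stay"): [(("s0", 0.0), 1.0)],
    ("s1", "stay"): [(("s1", 0.0), 1.0)],
}

# Probabilities over next (state, reward) pairs sum to 1 for every (s, a):
for outcomes in P.values():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```

The states, the allowed actions per state, and this table together fully describe the environment; stochasticity lives in the probabilities, determinism is the special case where each list has one entry.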
- Maintain approximate value and policy functions
- The policy is iteratively improved with respect to the value function, while the value function is evaluated with respect to the policy
- This feedback loop converges to optimal policy and value functions
- the value function is used to structure and constrain the policy search
Dynamic Programming algorithms are at one extreme of RL methods, requiring a perfect model of the environment and typically exponential computation cost. They are used for the theoretical underpinning of reinforcement learning as opposed to practical use. Briefly, dynamic programming involves finding optimal solutions by progressively building from optimal solutions to sub-problems. At the other extreme, Monte Carlo (MC) methods have no model and rely solely on experience from agent-environment interaction. The value of a state s is computed by averaging over the total rewards of several traces starting from s. These methods require completing entire episodes (traces) before the value function can be updated. Temporal Difference (TD) learning is an invaluable approach that combines advantages from both DP and MC methods. As the name implies, valuation updates are done recursively by the difference between time steps. It does not require an environment model like DP, and unlike MC it can update prior to episode completion. Like MC, it learns directly from experience, and like DP it iteratively updates estimates. This is easiest to show by comparing value function update rules: Monte Carlo $$V(S_t) \leftarrow V(S_t) + \alpha[G_t - V(S_t)]$$
Dynamic Programming $$v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$$
Temporal Difference $$V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$$
where α is the step size, γ the discount factor, and G_t the return following time t.
SARSA is a TD algorithm that is an 'on-policy' learning method. On-policy methods evaluate and improve the same policy that is used to select actions.
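The TD update rules above, and SARSA's action-value variant, can be sketched as one-line functions (toy hand-picked numbers; α and γ values are assumptions for illustration):

```python
# Sketch of the tabular TD(0) state-value update and the SARSA action-value
# update (assumption: toy numbers, alpha=0.1, gamma=0.9 chosen for illustration).
alpha, gamma = 0.1, 0.9

def td0_update(v_s, r, v_s_next):
    """V(S_t) <- V(S_t) + alpha * [R_{t+1} + gamma * V(S_{t+1}) - V(S_t)]"""
    return v_s + alpha * (r + gamma * v_s_next - v_s)

def sarsa_update(q_sa, r, q_sa_next):
    """Q(S,A) <- Q(S,A) + alpha * [R + gamma * Q(S',A') - Q(S,A)]
    On-policy: Q(S',A') uses the action A' actually chosen by the policy."""
    return q_sa + alpha * (r + gamma * q_sa_next - q_sa)

print(round(td0_update(0.5, 1.0, 0.0), 4))   # 0.5 + 0.1*(1.0 - 0.5) = 0.55
```

Q-learning's off-policy twist is a one-symbol change: replace Q(S',A') with max over actions of Q(S',·), regardless of which action the behavior policy takes.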
- Double deep Q-Learning
- Duelling DQN
- Action Advantage
- Noisy Networks
- Multi-step Learning
- Prioritized Experience Replay
Lastly, Policy Gradient Methods directly optimize a parameterized, differentiable policy function that does not require the use of the value function for action selection. For example, the REINFORCE Monte Carlo Policy Gradient algorithm trains a policy $$\pi(a \mid S_t, \theta)$$ with parameters θ
where
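A minimal sketch of one REINFORCE step, assuming a hypothetical two-action softmax policy with a single parameter θ (the softmax of (0, θ) reduces to a sigmoid):

```python
# Sketch of one REINFORCE update for a two-action softmax policy with a single
# parameter theta (assumption: toy setup and learning rate invented here).
import math

def pi(theta):
    """P(action=1 | theta); softmax over logits (0, theta) = sigmoid(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))

def reinforce_step(theta, action, G, lr=0.1):
    """theta <- theta + lr * G * d/dtheta log pi(action | theta)."""
    grad_logp = (1.0 - pi(theta)) if action == 1 else -pi(theta)
    return theta + lr * G * grad_logp

theta = 0.0
theta = reinforce_step(theta, action=1, G=2.0)
assert pi(theta) > 0.5   # a positive return makes the taken action likelier
```

The reward signal G scales the gradient of the log-probability directly, with no value function in the update (adding a baseline or critic is the usual variance-reduction extension).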
- Markov Decision Process (MDP) + Markov Property
- Env
- states
- rewards
- transition
- recursive, iterative, game, terminate, infinite, trace/trajectory/episode, goal, G, R/rrr,
- discounting
- exploration vs exploitation
- e-greedy
- 1-armed bandit
- Agent
- policy
- actions
- epsilon-greedy
- value
- value functions and Bellman Equation
- policy
- Almost all RL algorithms are GPI
- Maintain both an approximate value function and an approximate policy
- Iteratively improve the policy with respect to the value function, while the value function is driven toward the value function of the current policy
- Overall process converges to the optimal policy and optimal value function
- Generalized Policy Iteration (or in PGMs?)
- Dynamic programming
- model vs model free - simulates future states
- Florians:
- Use value function to perform a structured search of good policies
- Iterative approximations v1, v2, v3, v4, … of vπ(s) by using Bellman equation as update rule
- Replace v(s) with the new value calculated from the old values of v(s'). This is called an expected update.
- Terminates once the value function shows minimal change after an iteration
- Monte Carlo Methods
- DP requires distribution of the next events
- Monte Carlo based methods rely only on experience No prior knowledge of the environment is required Averages sample returns (remember k-armed bandits)
- TD learning
- Compare to DP and MC
- Does not require model of environment (unlike DP)
- MC needs to wait until the episode finishes; TD can update online
- With MC it's hard to estimate the value of a state-action pair
- On-Policy, Off-Policy
- off-policy
- Importance Sampling
- Transform Weight
- Weight Importance Scaling
- off-policy
- SARSA
- Q-Learning is also a temporal difference learning algorithm. However, unlike SARSA, it is off-policy.
- Compare to DP and MC
- Deep Q-Learning
- ?optimal bellman for QL
- Experience Replay
- All the learnable parts: policy, value fns,
- Rainbow DQN
- Double deep Q-Learning
- Duelling DQN
- Action Advantage
- Noisy Networks
- Multi-step Learning
- Prioritized Experience Replay
- Policy Gradient Methods
- describe
- Regression towards optimal policy - we don’t know optimal policy
- REINFORCE Monte Carlo Policy Gradient Control
- stronger convergence guarantees
- Deep Deterministic Policy Gradients
- Actor-Critic
- DDPG: Deep Deterministic Policy Gradient algorithm
In all the discussion up to now, any model was learned indirectly by translating reward information into a loss function. Policy Gradient Methods instead incorporate the reward signal directly into the gradient updates of the policy function.
draft
An agent must learn the value of states
value fns expanded: $$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a)[r + \gamma v_\pi(s')], \text{ for all } s \in S$$ q version: $$q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a)[r + \gamma \sum_{a'} \pi(a' \mid s') q_\pi(s',a')], \text{ for all } s \in S, a \in A(s)$$
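Using the Bellman equation as an update rule gives iterative policy evaluation; a sketch on a hypothetical two-state chain with a fixed policy and deterministic transitions (γ chosen for illustration):

```python
# Sketch of iterative policy evaluation: sweep the Bellman equation as an
# update rule until the value function barely changes
# (assumption: toy deterministic 2-state chain, fixed policy, gamma=0.9).
gamma = 0.9
# For each state: (reward, next_state) under the fixed policy; "end" is terminal.
dynamics = {"s0": (0.0, "s1"), "s1": (1.0, "end"), "end": (0.0, "end")}

V = {s: 0.0 for s in dynamics}
for _ in range(100):
    new_V = {}
    for s, (r, s_next) in dynamics.items():
        new_V[s] = 0.0 if s == "end" else r + gamma * V[s_next]
    # Terminate once the value function shows minimal change after a sweep.
    if max(abs(new_V[s] - V[s]) for s in V) < 1e-9:
        break
    V = new_V

print(V["s0"], V["s1"])   # 0.9 1.0: s0's value is the discounted reward ahead
```

Each sweep replaces v(s) with a value computed from the old v(s') values, exactly the "expected update" described in the notes above.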
The foundation of all Reinforcement Learning (RL) is the agent in an environment. From this flow all aspects of RL
- agent volition (policy)
- agent
The agent is an actor which receives information about the environment and acts so as to make a change. An agent that acts, receives input
Environments can be discrete or continuous. State changes can be deterministic or not.
An environment can be described by its states, the allowed actions at each state, and a transition function
- pdf reading [2019-10-21 Mon 13:49], about 1.5 hours till now.
- before hand sporadic
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Title of the paper: A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
Team members: Larry Li, Nick Buryonk, Gurinder Ghotra
Contributions:
Title of the paper: Deep Reinforcement Learning in Large Discrete Action Spaces
Team members: Eric Djona Fegnem, Ariel Wang, Brendan McGivern, Yuanhui Lang, Ike Okonkwo
Contributions:
Title of the paper: AlphaD3M: Machine Learning Pipeline Synthesis
Team members: Most Husne Jahan, Gunjan Lakhlani, Andriy Kourinnyi, Alireza Darbehani
Contributions:
- 1st sesh, distracted.
- didn’t count this,
- Policy:
- defines how our agent chooses its action for a given state
- Reward Signal:
- the reward for a given action (not state?)
- Value function:
- Expected future rewards of a state (how good is a given state)
- Environment models:
- Simulates future states, allows for planning
https://awwapp.com/# https://www.tutorialspoint.com/free_online_whiteboard.htm https://drive.google.com/drive/folders/1f8okO2XDraTJydT7MetOKuPrU8U1pBU2 https://colab.research.google.com/drive/1ymctv7BRNG9UL7Tara_vyYgi103sdMnW#scrollTo=AT8EfnJGgbc7 https://colab.research.google.com/drive/1zEgL1uYgtJgZMJ00LBKPs02dOkOnnVa8#scrollTo=CeiiZKYVgcVq https://drive.google.com/drive/folders/1ffIXP5k9k5Rcvbo4ouAXFJeR8XoTcUIb https://colab.research.google.com/drive/13tk0npWP6JhEx9JVw0bdcC8u3skD2fm0#scrollTo=5zH8U8OGe_fd
**** Notes
***** My AWS Account Info
AWSAccessKeyId=AKIAICLYXL5KD3AURP4A
AWSSecretKey=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
region=us-east-2
***** My Azure Account Info
az account show
{ "environmentName": "AzureCloud", "id": "38e53cfe-df59-42bc-ac0c-b50136568522", "isDefault": true, "name": "Free Trial", "state": "Enabled", "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272", "user": { "cloudShellID": true, "name": "live.com#willy.rempel@gmail.com", "type": "user" } }
azureProfile.json
{"subscriptions": [{"id": "38e53cfe-df59-42bc-ac0c-b50136568522", "name": "Free Trial", "state": "Enabled", "user": {"name": "willy.rempel@gmail.com", "type": "user"}, "isDefault": true, "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272", "environmentName": "AzureCloud"}]}
ML service workspace
name | azure-ml-ws-1 |
Subscription | Free Trial |
Resource group | cloud-shell-storage-eastus |
Location | East US 2 |
***** Important links doc
****** Azure instructions
- Set up your free Azure Credit
- Install miniconda for Python 3.7: https://docs.conda.io/en/latest/miniconda.html (make sure you updated the Path for the conda command)
- Run the following commands:
  conda create --name azureml python=3.7
  conda activate azureml
  conda install scikit-learn
  pip install tensorflow==1.14
  pip install azureml-sdk[explain,automl,notebooks,automl,services]
  pip install pandas
  pip install jupyter
- Github:
  - Install github desktop
  - Create a github account
  - Set up github on your machine (login etc)
- Install VSCode
- Install extensions (https://code.visualstudio.com/docs/editor/extension-gallery):
  - Python (Author: Microsoft)
  - Azure Account (Author: Microsoft)
  - Azure Machine Learning (Author: Microsoft)
  - Git Graph (Author: mhutchie)
***** session1 notes
- 10-15 years of DevOps
- MLops
- own entire lifecycle: build and deploy
- IT (Ops) only focus on infrastructure
- at least be able to talk to Ops team,
- cloud
- managed resources
- huge abstraction
- serverless, auto scalability
- Separation of resources
- lower cost
- @1:07 git starts
**** Archive **** TODOs [22/43]
***** AWS Study [2/7]
****** [x] sagemaker api
****** [x] boto3 api
****** [.] aws deepdive series
******* vid1 - managed notebook EC2 VM instance; managed means:
- doesn’t show up in EC2 console
- no SSH access
- EBS volume 5GB default
- persists
- add, create git repo
- config shell 15min time limit, use '&'
- Elastic Inference – attach GPU
****** [-] aws cli
****** [.] build own pipeline
might need more, check serverless repo
******* [.] yaml formation
******* [.] aws shell
***** Azure Study [0/2]
Here are the topics and the respective contents that need to be created for AWS and GCP: For every bullet for each day we need to gather:
- Title and a short description of the technology in a paragraph or two. This will be used to explain technology.
- 2-3 links for further studies.
- If it requires implementation: simple notebook. If it’s an architecture 2-3 images
Hossein mentioned - appendices included for hands-on and home-work (HW)
Overall Architecture for ML Stack in GCP and AWS (one or two paragraphs + architecture diagram)
[c] CI/CD frameworks on AWS and GCP + Integrating them with GCP AI Hub or AWS SageMaker for ML Pipelines (GitOps) (Simple diagram showing the flow + Short Notebook if applicable)
- native clouds, or FOSS as example
- data engineers
- tracking/logging metrics in AWS
- hard to find?
- exptrack – collect logs, vs #4 below.
- autoML –
- autoML algos are in marketplace
AWS Search
- Find
- Evaluate
- Verify datasets used by training jobs
- Trace Model lineage
- dataset
- algorithm
- hyperparameters
- metrics
Tags are used to track experiments and group them together. You apply them in your code, and can search using the AWS Console, a web front-end, or the API.
AutoML - this is offered by the AWS marketplace, where there are several options
[x] 2-3 links for further studies.
AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/search.html
Sample Notebook: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb
[x] If it requires implementation: simple notebook. If it's an architecture: 2-3 images
[x] code sample
[x] screenshot
[x] Hyperparameter tuning techniques and parallel training engines in GCP or AWS (one or two paragraphs + code samples + screenshot if applicable)
- bayesian search has good tutorial links
- built-in come with metrics
- metrics found in cloudwatch logs
- define metrics. send to ??? https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html
Example from Documents: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html
[x] If it requires implementation: simple notebook. If it's an architecture: 2-3 images
[x] code sample
[x] screenshot
- is he presenting our stuff or are we?
- am I doing online?
- What is expected of us? thorough knowledge of our cloud
- FOSS do I pick my own favourite?
- NO: do we talk about the high level services (turn-key)
- ie. protect a branch
- discuss different deployment scenarios - batch and online
- build release pipeline for model deployment
- monitoring and logging techniques for ML model in the wild
- best practices to build scalable ML pipelines
- bring model explainability in the pipeline (training and deployment)
cell 14: add `model_name = "tf_mnist_pipeline.model"`
cell 15: create score directory in root folder
cell 19: add `import os`
cell 19: `Execution script score.py doesn't exist.`
- pipeline= series of transforms
- keep data pre-processing separate from model building
- [ ] actual compute targets
- [ ] bring in data from redshift? other than S3
- [ ] what enviros for work?
- reproducible enviro for training for day1
- multi-pipelines options for the whole pipeline, so people know about them
- be able to coach ppl on how to do the work on cloud use for capstone
- [ ] 1st part architecture: data - preprocc - train - etc
- deploy, orchestration next day
- end-to-end initial, what it looks like
- exp tracking , any logs, not just
- use these as guides: azure experiments, mlflow experiments
- [ ] high level understanding for initial
- [ ] find some github links for examples
- [ ] separate g docs for days
- use boto, SDK
- they pick a project that groups build from beginning to end
- Azure
- focused
- FOSS stacks as well, and theoretical aspects
- QFlow, etc ???
- Spod,
- dataprod, databricks,
- ppl tend to use managed FOSS
- focus for enterprise ready on the cloud
- including as alt to azure
- AWS
- TODO US: for AWS
- slides
- simple notebook
- ie. track metrics on aws or gcp
- TODO US: for AWS
- google cloud compute (GCP) - other TAs
- AWS
- full ML pipeline finished by end of WK3
- git, more advanced
- team focused
- 1hr including hands-on, some examples
- cherry picking, bisecting optional HW
- some contest? missed. Monday, tue
- sagemaker - fancy version of jupyter
- some stuff, not all.
- AWS behind
- exp tracking
- sparkML -> big data pipeline
- ie ML exp & metrics https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-track-experiments
- sagemaker: no container,
- ci/cd: use CodePipeline
- AWS acct: ML-Ops Staff
- sagemaker: do everything there.
- sagemaker estimator least flexible
- [ ] can we go lower level?
- AWS lambda for inference?
- CloudFormation - it is the orchestration
- use a definition, and reverse engineer
- ECR - amazons container (elastic container register)
- amazon dynamoDB for logging etc, but you are not stuck with it
- cloudformation button, deploy whole thing
- Xiyang will send AWS
Input artifact bld Output artifact trigger FunctionName aws-mlops-model-cicd-pipeli-LambdaSageMakerTrigger-DKC87JX8ZI2C
- 50-55 ppl
- feedback:
- lots about git
- issues about enviros and setup
- some too slow, some too fast
- etc
- Amir:
- cutting down material
- TAs more proactive
- howto for online?
- fixed (poll) hours
- 2nd session
- continue with git
- reduced and abstracted content a lot
- platform agnostic content
- but gets advanced, not enough time
- give extra material
- additional session for advanced students
- will not cover: containerization, deploy dockers
- not trivial
- MLops is devops for ML
- 3rd deploy to kub or docker into wild – from telemetry make data driven actions
- kubernetes env on AWS and howto get telemetries?
- refresher of entire workshop
- TA hour 1 [2019-10-12 Sat] : G1, G3
- Jiri
- workshop circle with Joffri on AWS
- aisc: Omar recommender systems
- RL recommender system
- wants to do both Azure & AWS, will do Azure with Zain
- Zain, has less time, will do Azure only
- irl for their next meetup sat.
- mnist model
- cloud agnostic for Fatin
- tracking, reproducibility, Docker images?,
- anyone else to join?
- fyi: Terraform instead of CloudFormation at his job
TheAlgorithms/Python: All Algorithms implemented in Python
- PYCON UK 2017: Machine learning libraries you’d wish you’d known about - YouTube - great talk by a seasoned data scientist; followed him on GitHub.
- ianozsvald/data_science_delivered: Observations from Ian on successfully delivering data science products - his repo with real-world advice for production and at-scale data science, with notebooks. repo cloned in ~/DevAcademics/PythonNotebooks/data_science_delivered
- scikit-learn-contrib/forest-confidence-interval: Confidence intervals for scikit-learn forest algorithms - via readme from above. How good are the models?
- will install all 5 libraries he mentioned.
- Web REST API Benchmark on a Real Life Application – Mihai Cracan – Medium
- Top 20 Python libraries for data science in 2018 | ActiveWizards: data science and engineering lab , this started it all for last couple of days. (found via linkedin)
- Graphviz - Graph Visualization Software
- StatsModels: Statistics in Python — statsmodels 0.9.0 documentation
- Overview — ELI5 0.7 documentation
- ONNX - Getting Started
- NLP
- PYCON UK 2017: Machine learning libraries you’d wish you’d known about - YouTube
- DistrictDataLabs/yellowbrick: Visual analysis and diagnostic tools to facilitate machine learning model selection.
- marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier
- EpistasisLab/tpot: A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
- Dask: Scalable analytics in Python
- xgboost/python-package at master · dmlc/xgboost
- Installation Guide — xgboost 0.72 documentation
- How to Install XGBoost for Python on macOS
- in the comments someone did conda install -c conda-forge xgboost, which saves lots of steps
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.hist(np.random.randn(20000), bins=200)
print("hello world")
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.hist(np.random.randn(20000), bins=200)
# IPython.kernel no longer exists; kernel management moved to jupyter_client
# import jupyter_client.multikernelmanager as km
# km.MultiKernelManager().list_kernel_ids()
# print("hello world")
# print("another ipython kernel!")
gradientTape example
import tensorflow as tf
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
4.1 apihelper.py
def info(object, spacing=10, collapse=1):
    """Print methods and doc strings. Takes module, class, list, dictionary, or string."""
    methodList = [method for method in dir(object) if callable(getattr(object, method))]
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print("\n".join(["%s %s" %
                     (method.ljust(spacing),
                      processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList]))

if __name__ == "__main__":
    print(info.__doc__)
import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')
print(sym.latex(sym.Integral(1/x, x)))
- l list
- n next
- c continue
- s step
- r return
- b break
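Those pdb commands can also be practiced non-interactively by feeding them to a `pdb.Pdb` instance as a string (the scripted `n`/`c` sequence below is just illustrative):

```python
import io
import pdb

def add(a, b):
    total = a + b
    return total

# Feed commands (n = next, c = continue) from a string instead of a TTY
commands = io.StringIO("n\nc\nc\n")
debugger = pdb.Pdb(stdin=commands, stdout=io.StringIO())
debugger.use_rawinput = False  # required so pdb reads from our StringIO
result = debugger.runcall(add, 2, 3)
print(result)  # the debugged call still returns normally
```

Interactively the same session would be `python -m pdb script.py`, typing the commands at the `(Pdb)` prompt.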
- And python
import sys
def is_venv():
    return (hasattr(sys, 'real_prefix') or
            (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix))
The check for sys.real_prefix covers virtualenv, the equality of non-empty sys.base_prefix with sys.prefix covers venv. Consider a script that uses the function like this:
if is_venv():
    print('inside virtualenv or venv')
else:
    print('outside virtualenv or venv')
open all csv files in a dir and do some row stuff
- 36.11. pipes — Interface to shell pipelines — Python 2.7.14 documentation
- 13.1. csv — CSV File Reading and Writing — Python 2.7.14 documentation
import os
import csv
path = os.getcwd()
filenames = os.listdir(path)
for filename in filenames:
    if filename.endswith('.csv'):
        new_data = []
        with open(filename) as f:
            for row in csv.reader(f):
                row[-1] = row[-1].replace("S-D", "S")
                new_data.append(row)
        newfilename = "".join(filename.split(".csv")) + "_edited.csv"
        with open(newfilename, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerows(new_data)
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('dark_background')
data = [np.random.randn(100).cumsum() for _ in range(8)]  # placeholder series
cmap = plt.get_cmap('viridis')
colors = cmap(np.linspace(0, 1.0, len(data)))
for i, series in enumerate(data):
    plt.plot(series, color=colors[i])
plt.show()
A mile Hy - My experience with lispy Python | Modern Emacs
- setv - set variables
- cond - cases wrapped in []
- do = progn
- (for [i (range 10)] …)
Kitchin loves it.
(import numpy)
(setv a (numpy.array [1 2 3]))
(setv b (numpy.array [1 2 3]))
(print (numpy.dot a b))
(defn simple-conversation []
(print "hello! yadda yadda")
(setv name (input "What name? "))
(setv age (input "What age? "))
(print (+ "hello " name "! I see you are " age " years old.")))
(simple-conversation)
a = 5
b = 2**5
hi = "Hello World!"
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
a = 888
# plt.hist(np.random.randn(20000), bins=200)
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# import tensorflow as tf
plt.hist(np.random.randn(20000), bins=200)
# import tensorflow as tf
x = np.random.randn(10000)
y = np.sin(x)
import numpy as np
import tensorflow as tf
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)
# print(node1, node2)
# [foo(x) + 7 for x in range(20)]
"what?"
sess = tf.Session()  # TF1 API; in TF2 use tf.compat.v1.Session()
# sess.run([node1, node2])
node3 = tf.add(node1, node2)
print("sess.run(node3):", sess.run(node3))
# %matplotlib inline
# import matplotlib.pyplot as plt
# import numpy as np
# import tensorflow as tf
# plt.hist(np.random.randn(20000), bins=200)
# def foo(x):
# return x + 9
# [foo(x) + 7 for x in range(7)]
import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')
print(sym.latex(sym.Integral(1/x,x)))
print(sym.latex(sym.besseli(x,k)))
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
t = np.linspace(0, 20 * np.pi, 350)
x = np.exp(-0.1 * t) * np.sin(t)
y = np.exp(-0.1 * t) * np.cos(t)
plt.plot(x, y)
plt.axis('equal')
plt.figure()
plt.plot(y, x)
plt.axis('equal')
print('Length of t = {}'.format(len(t)))
print('x .dot. y = {}'.format(x @ y))
+---------+
| |
| Willy |
| |
+----+----+---+
|Bar |Baz |
| | |
+----+--------+
find where ditaa should go
(expand-file-name
"ditaa.jar"
(file-name-as-directory
(expand-file-name
"scripts"
(file-name-as-directory
(expand-file-name
"../contrib"
(file-name-directory (org-find-library-dir "org")))))))
+------+ +-----+ +-----+ +-----+
|{io} | |{d} | |{s} | |cBLU |
| Foo +---+ Bar +---+ Baz +---+ Moo |
| | | | | | | |
+------+ +-----+ +--+--+ +-----+
|
/-----\ | +------+
| | | | c1AB |
| Goo +------+---=--+ Shoo |
\-----/ | |
+------+
AISC RL Workshop DLRL RL workshop
https://opensource.google/projects/dopamine https://opensource.google/projects/deepmind-lab https://opensource.google/projects/magenta