@vv111y
Created October 24, 2019 19:45

DatSciWorkbook

HEADER

START [0/6]

LOG

  • [2018-09-04 Tue 23:42] got this far with pytorch gan run: [Epoch 146/200] [Batch 885/938] [D loss: 0.372291] [G loss: 1.665622]
  • [2018-10-12 Fri 12:56] note: G in GANs doesn’t see the real image – perhaps that makes it too slow to train. maybe it should have access to the image? but it seems to converge fairly quickly to an approximation; problems come when it is trying to get fine details.
    • now this makes me think that perhaps a dialog between G and D for each part of the image might help.
    • training schedule of multiple iterations per image, so G tries to get close to one image at a time
  • [2018-11-16 Fri 06:10] downloaded all pdfs for CS224n NLP course. Also made playlist of youtube lectures.
  • [2019-02-06 Wed 09:32] my trial run couple weeks ago (meet#3?) of the spinningup code for RL. went well. on wks.
    conda install gym
    pip install gym
    conda create -n spinupRL python=3.6
    cd DevAcademics/ReinforcementLearning/spinningup
    pip install -e .
    conda list
    python -m spinup.run ppo --hid [32,32] --env LunarLander-v2 --exp_name installtest --gamma 0.999
    python -m spinup.run plot /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0
    python -m spinup.run test_policy /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0

[.] mathematicalmonk ML playlist

Machine Learning Playlist - YouTube

  • Machine Learning | 160 out of 160 videos
  • Average Duration: 0 days, 0 hours, 13 minutes and 12 seconds
  • Total Duration: 1 days, 11 hours, 13 minutes and 33 seconds

[.] keras trials with own code

[.] learn autoencoders

use this material to start on autoencoders, via TDLS slack channel:

Ehsan [7 hours ago] Just for the channel I copy my answer here as well: There is an abundance of work on autoencoders … hmmmm most of my knowledge comes from reading articles. Here we go:

  • This paper is a must for VAEs: https://arxiv.org/abs/1312.6114
  • But read this one first: http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf
  • This is a great review: http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf
  • You will particularly like this one: https://arxiv.org/pdf/1502.04156.pdf

[.] infographic niagara data group

[.] [#B] O’Reilly for coding?

neat graph on loss function and convexification

"""Grab a cookie."""
import numpy as np
import matplotlib.pyplot as plt

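z = np.linspace(-1, 3, num=10000)  # margin values
binary = (z <= 0).astype(float)  # 0-1 loss, cast to float for plotting
hinge = np.maximum(0, 1 - z)
quad = (z - 1) ** 2
logistic = np.log2(1 + np.exp(-z))  # base 2, so the curve passes through 1 at z = 0

# plot every loss against the same margin axis
losses = dict(binary=binary, hinge=hinge, logistic=logistic, quadratic=quad)
for name, loss in losses.items():
    plt.plot(z, loss, label=name)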

plt.legend(loc="best")
plt.tight_layout()
outfile = "losses.png"
plt.savefig(outfile, dpi=200, bbox_inches="tight")
print(outfile)
plt.show()
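
The "convexification" in the heading can be checked numerically. The sketch below is my own addition, not part of the original snippet. It recomputes the same four losses and tests whether each convex surrogate sits on or above the 0-1 loss across the plotted range. The small tolerance is an assumption to absorb floating-point rounding.

```python
import numpy as np

z = np.linspace(-1, 3, num=1001)  # same margin range as the plot

zero_one = (z <= 0).astype(float)         # the non-convex 0-1 ("binary") loss
surrogates = {
    "hinge": np.maximum(0, 1 - z),
    "quadratic": (z - 1) ** 2,
    "logistic": np.log2(1 + np.exp(-z)),  # base 2 so the loss equals 1 at z = 0
}

for name, loss in surrogates.items():
    gap = loss - zero_one
    # Tolerance of 1e-12 is an illustrative choice for rounding error.
    is_upper_bound = bool(np.all(gap >= -1e-12))
    print(f"{name}: upper bound on 0-1 loss = {is_upper_bound}")
```

All three surrogates touch or exceed the 0-1 loss at every margin, which is why minimising them is a reasonable stand-in for minimising classification error.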

[.] symbolic paper from Xiyang – Learning by Abstraction: The Neural State Machine

didn’t write down which one could be either:

  • Learning by Abstraction: The Neural State Machine – I think this
  • Learning Neurosymbolic Generative Models via Program Synthesis
  • NEURO-SYMBOLIC PROGRAM SYNTHESIS

MichaelMMeskhi/MtL-Progress-github.io: Repository to track the progress in Meta-Learning (MtL), including the datasets and the current state-of-the-art for the most common MtL

DatSci Tooling (toolsplan) [2/10]

Start

Notes

jupyter notebook in pycharm

  • Using IPython/Jupyter Notebook with PyCharm - Help | PyCharm
  • Installing, Uninstalling and Upgrading Packages - Help | PyCharm

  • installing package: option to install to user’s site packages directory
    • (on winwk, Will\AppData\Roaming\Python)
    • did this for jupyter, matplotlib, sympy. as per tutorial (using dummy project)
    • many other dep packages also installed
    • jupyter is metapackage: install all jupyter components
    • jupyter install had error: needs MS visual C++ 10.0. can do it that way, or pip --user in shell
      • same error.
      • seems this can all be done outside pycharm, and then just select in projects preferences.

convert notebook to orgmode

jupyter nbconvert notebook.ipynb --to markdown
pandoc notebook.md -o notebook.org

interactive jupyter via widgets

  • Widgets: Building Interactive Dashboards with Jupyter
  • Project Jupyter | Widgets
  • Interactive Visualizations In Jupyter Notebook – Towards Data Science

jupyter management

  • Connect to an existing kernel · Issue #2044 · jupyterlab/jupyterlab
  • Initial server management implementation by lucbouchard1 · Pull Request #71 · jupyterlab/jupyterlab_app
  • jwkvam/jupyterlab_vim: Vim notebook cell bindings for JupyterLab

rodeo, R studio type ide for python

Python Config Notes

  • tensorflow
    • conda package not watched by them
    • conda env > virtualenv > pip > conda (docker another use case)
    • docker for GPU recommended
  • [X] update conda / anaconda
  • pip3 requirements empty, all in pip, 322 total
  • conda list - 574 total
    • several are duplicates with pip installs
    • latest conda root env export to yaml file: 358 in conda, 98 in pip
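
The conda/pip counts above can be reproduced from an exported environment file. This is a minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML, not shown here). The `env` contents and the helper names are hypothetical. It relies only on the standard layout of `conda env export`, where `dependencies` holds plain strings plus one nested `pip` mapping.

```python
def package_name(spec):
    """Return the normalised package name from 'name=ver=build' or 'name==ver'."""
    return spec.split("=")[0].strip().lower().replace("_", "-")


def split_deps(env):
    """Separate conda-managed and pip-managed specs in a parsed environment.yml."""
    conda_specs, pip_specs = [], []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):            # the nested {"pip": [...]} section
            pip_specs.extend(dep.get("pip", []))
        else:                                # plain conda spec string
            conda_specs.append(dep)
    return conda_specs, pip_specs


# Hypothetical, tiny stand-in for a real export.
env = {
    "name": "root",
    "dependencies": [
        "numpy=1.15.0=py36_0",
        "pandas=0.23.4=py36_0",
        {"pip": ["numpy==1.15.0", "yapf==0.22.0"]},
    ],
}

conda_specs, pip_specs = split_deps(env)
duplicates = {package_name(s) for s in conda_specs} & {package_name(s) for s in pip_specs}
print(len(conda_specs), "in conda,", len(pip_specs), "in pip, duplicates:", sorted(duplicates))
```

Packages that show up in both lists are the candidates for the duplicates noted above.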

packages not found in conda channels

PackagesNotFoundError: The following packages are not available from current channels:

  • r-r6==2.2.2=r3.4.1_0
  • r-tibble==1.4.2=r3.4.1_0
  • ca-certificates==2018.4.16=0
  • r-bindr==0.1.1=r3.4.1_0
  • zope.interface==4.5.0=py36h470a237_0
  • yellowbrick==0.7=py36_1
  • r-dbi==1.0.0=r341_0
  • r-utf8==1.1.3=r3.4.1_0
  • rpy2==2.9.3=py36r3.4.1_0
  • jupyterlab==0.33.12=py36_0
  • pcre==8.39=0
  • constantly==15.1.0=py_0
  • r-bit64==0.9_5=r3.4.1_0
  • pytest-runner==4.2=py_0
  • r-rlang==0.2.0=r3.4.1_0
  • r-crayon==1.3.4=r3.4.1_0
  • incremental==17.5.0=py_0
  • readline==7.0=0
  • r-git2r==0.21.0=r341h0c37787_0
  • r-dbplyr==1.2.1=r341_0
  • r-glue==1.2.0=r3.4.1_0
  • json-rpc==1.10.3=py36_0
  • r-blob==1.1.1=r3.4.1_0
  • r-purrr==0.2.4=r3.4.1_0
  • r-pillar==1.2.2=r341_0
  • pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
  • torchvision==0.2.1=py36_1
  • protobuf==3.5.2=py36_0
  • r-dplyr==0.7.4=r3.4.1_0
  • r-base==3.4.1=3
  • r-rcpp==0.12.15=r3.4.1_0
  • hyperlink==17.3.1=py_0
  • r-digest==0.6.15=r3.4.1_0
  • onnx==1.1.2=py36h0c63530_0
  • pyasn1-modules==0.2.1=py_0
  • tzlocal==1.5.1=py_0
  • libedit==3.1.20170329=0
  • cssselect==1.0.3=py_0
  • r-cli==1.0.0=r3.4.1_0
  • r-tidyselect==0.2.4=r3.4.1_0
  • yapf==0.22.0=py_0
  • libprotobuf==3.5.2=0
  • pyasn1==0.4.3=py_0
  • r-magrittr==1.5=r3.4.1_0
  • r-rsqlite==2.0=r3.4.1_0
  • xgboost==0.72=py36_0
  • service_identity==17.0.0=py_0
  • r-prettyunits==1.0.2=r3.4.1_0
  • r-bh==1.66.0_1=r3.4.1_0

conda installs special channels, not in conda-forge

  • did conda install -c r r-git2r, to try and see if that helps -> nope
  • conda install pytorch torchvision -c pytorch.
    • previous install was gpu version. won’t work on mac
  • conda install -c districtdatalabs yellowbrick
  • r-dbplyr==1.2.1=r341_0
  • r-pillar==1.2.2=r341_0
  • torchvision==0.2.1=py36_1
  • pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
  • yellowbrick==0.7=py36_1

py packages [0/0]

dit - discrete info theory py

dit: discrete information theory — dit 1.0.2 documentation

pweave - scientific report generator

mpastell/Pweave [2019-08-29 Thu 12:26] Pweave is a scientific report generator and a literate programming tool for Python. Pweave can capture the results and plots from data analysis and works well with NumPy, SciPy and matplotlib. It is able to run python code from source document and include the results and capture matplotlib plots in the output.

Pweave is good for creating reports, tutorials, presentations etc. with embedded python code. It can also be used to make websites together with e.g. Sphinx or rest2web.

HPC python

(llvm used in other libs)

HPC python library

  • Numba: High-Performance Python with CUDA Acceleration | Hacker News
  • Numba: High-Performance Python with CUDA Acceleration | Parallel Forall

Another good library

  • arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library.
  • arrayfire/arrayfire: ArrayFire: a general purpose GPU library.

Python on CUDA packages

DatSci Automation [0/0]

  • for career, real world use
    • this is a major goal for all tooling
  • want large scale as well
  • much of the pipeline automated such that only some selection is needed
  • any and all tools that simplify any of the process

data analysis ML clusters

  • will use different cloud platform than for internet facing.
  • security is more an issue there. the ML cluster will be more isolated
  • also all the current tools are likely less secure anyways

ex: amazon sagemaker

  • NEW LAUNCH! Integrating Amazon SageMaker into your Enterprise - MCL34…
  • Machine Learning Models & Algorithms | Amazon SageMaker on AWS

Visualizations [0/0]

Visualize | Keen IO

facets data visualization

  • Research Blog: Facets: An Open Source Visualization Tool for Machine Learning Training Data
  • PAIR-code/facets: Visualizations for machine learning datasets

holoviews

HoloViews — HoloViews

Data Cleaning [0/0]

Other Languages

julia tensorflow / ML

Goodies: check out the videos

  • How’s Julia language (MIT) for ML? : MachineLearning
  • Julia vs. Python: Julia language rises for data science | InfoWorld

  • JuliaEditorSupport
  • JuliaCon 2018 | Making the test-debug cycle more efficient | Tim Holy - YouTube
  • JuliaCon 2018 | Tools for making program analysis and debugging manageable | Jameson Nash - YouTube
  • JuliaCon 2018 | Cassette: Dynamic, Context-Specific Compiler Pass Injection for Julia | J Revels - YouTube

  • DeepLearningFrameworks/Knet_CNN.ipynb at master · ilkarman/DeepLearningFrameworks
  • TIOBE Index | TIOBE - The Software Quality Company
  • Julia and “deep learning” : Julia
  • TensorFlow.jl/why_julia.md at master · malmaud/TensorFlow.jl

High Level Frameworks: OpenML, Rapids

OpenML Home [2019-08-17 Sat 09:48]

OpenML — OpenML 0.10.0 documentation

Democratizing Machine Learning

As machine learning is enhancing our ability to understand nature and build a better future, it is crucial that we make it transparent and easily accessible to everyone in research, education and industry. The Open Machine Learning project is an inclusive movement to build an open, organized, online ecosystem for machine learning. We build open source tools to discover (and share) open data from any domain, easily draw them into your favourite machine learning environments, quickly build models alongside (and together with) thousands of other data scientists, analyse your results against the state of the art, and even get automatic advice on how to build better models. Stand on the shoulders of giants and make the world a better place.

[.] Exp-frameworks, templates, tool-notes [0/3]

Start

  • prior notes [2019-08-15 Thu]
    • when to use:
      • dask
      • mlflow
      • polyaxon - seems more related to kubernetes, managing in production on clusters
      • DVC - github-lfs + makefiles
        • [ ] use with hservers and store big files on them?
    • issue: large files in a project folder that will need to be kept separate somehow
      • big data
      • big models
    • FGLab (Kaixhin) 3 ppl only, smaller project
    • MLflow, Sacred, FGLab, Polyaxon are alts (competitors).
      • h2o, datarobot also alts
      • kubeflow complements them; the others can run on top of it.
      • DVC a complement?
    • sagemaker, airflow, glue go together
    • airflow can use to build pipelines to work on kubernetes
    • tutorial vids: google, mlflow, machine-learning-yearning, etc
    • [ ] model / data parallel example
    • Manifold company is an example boutique biz ? emulate

The Data Engineering Cookbook Notes

Skeleton

Notes for page 4
Introduction
How To Use This Cookbook
Data Engineer vs Data Scientists
  • Data Scientist
  • Data Engineer
  • Who Companies Need
Basic Data Engineering Skills
Learn To Code
Get Familiar With Git
Agile Development
  • Why is agile so important?
  • Agile rules I learned over the years
  • Is the method making a difference?
  • The problem with outsourcing
  • Knowledge is king: A lesson from Elon Musk
  • How you really can be agile
  • Agile Frameworks
  • Scrum
  • OKR
  • Software Engineering Culture
Learn how a Computer Works
  • CPU,RAM,GPU,HDD
  • Differences between PCs and Servers
Computer Networking - Data Transmission
  • OSI Model
  • IP Subnetting
  • Switch, Level 3 Switch
  • Router
  • Firewalls
Security and Privacy
  • SSL Public & Private Key Certificates
  • What is a certificate authority
  • JSON Web Tokens
  • GDPR regulations
  • Privacy by design
Linux
  • OS Basics
  • Shell scripting
  • Cron jobs
  • Packet management
The Cloud
  • IaaS vs PaaS vs SaaS
  • AWS,Azure, IBM, Google Cloud basics
  • Cloud vs On-Premises
  • Security
  • Hybrid Clouds
Security Zone Design
  • How to secure a multi layered application
  • Cluster security with Kerberos
  • Kerberos Tickets
Big Data
  • What is big data and where is the difference to data science and data analytics?
  • The 4Vs of Big Data
  • Why Big Data?
  • Planning is Everything
  • The Problem With ETL
  • Scaling Up
  • Scaling Out
  • Please Don’t go Big Data
My Big Data Platform Blueprint
  • Ingest
  • Analyse / Process
  • Store
  • Display
Lambda Architecture
  • Batch Processing
  • Stream Processing
  • Should you do stream or batch processing?
  • Lambda Architecture Alternative
  • Kappa Architecture
  • Kappa Architecture with Kudu
  • Why a Good Data Platform Is Important
Data Warehouse vs Data Lake
Hadoop Platforms
  • What is Hadoop
  • What makes Hadoop so popular?
  • Hadoop Ecosystem Components
  • Hadoop Is Everywhere?
  • Should you learn Hadoop?
  • How does a Hadoop System architecture look like
  • What tools are usually in a with Hadoop Cluster
  • How to select Hadoop Cluster Hardware
Docker
  • What is docker and what do you use it for
  • Don’t Mess Up Your System
  • Preconfigured Images
  • Take It With You
  • Kubernetes Container Deployment
  • How to create, start,stop a Container
  • Docker micro services?
  • Kubernetes
  • Why and how to do Docker container orchestration
  • Useful Docker Commands
REST APIs
  • API Design
  • Implementation Frameworks
  • OAuth security
Databases
  • SQL Databases
  • PostgreSQL DB
  • Database Design
  • SQL Queries
  • Stored Procedures
  • ODBC/JDBC Server Connections
  • NoSQL Stores
  • KeyValue Stores (HBase)
  • Document Store HDFS
  • Document Store MongoDB
  • Elasticsearch Search Engine and Document Store
  • Hive Warehouse
  • Impala
  • Kudu
  • Apache Druid
  • InfluxDB Time Series Database
  • MPP Databases (Greenplum)
Data Processing and Analytics - Frameworks
  • Is ETL still relevant for Analytics?
  • Stream Processing
  • Three methods of streaming
  • At Least Once
  • At Most Once
  • Exactly Once
  • Check The Tools!
  • MapReduce
  • How does MapReduce work
  • Example
  • What is the limitation of MapReduce?
  • Apache Spark
  • What is the difference to MapReduce?
  • How does Spark fit to Hadoop?
  • Where’s the difference?
  • Spark and Hadoop is a perfect fit
  • Spark on YARN:
  • My simple rule of thumb:
  • Available Languages
  • How Spark works: Driver, Executor, Sparkcontext
  • Spark batch vs stream processing
  • How does Spark use data from Hadoop
  • What are RDDs and how to use them
  • How and why to use SparkSQL?
  • What are DataFrames how to use them
  • Machine Learning on Spark? (Tensor Flow)
  • MLlib:
  • Spark Setup
  • Spark Resource Management
  • Apache Nifi
  • StreamSets
Apache Kafka
  • Why a message queue tool?
  • Kakfa architecture
  • What are topics
  • What does Zookeeper have to do with Kafka
  • How to produce and consume messages
  • KAFKA Commands
Machine Learning
  • Training and Applying models
  • What is deep learning
  • How to do Machine Learning in production
  • Why machine learning in production is harder then you think
  • Models Do Not Work Forever
  • Where The Platforms That Support This?
  • Training Parameter Management
  • What’s Your Solution?
  • How to convince people machine learning works
  • No Rules, No Physical Models
  • You Have The Data. USE IT!
  • Data is Stronger Than Opinions
  • AWS Sagemaker
Data Visualization
  • Android & IOS
  • How to design APIs for mobile apps
  • How to use Webservers to display content
  • Tomcat
  • Jetty
  • NodeRED
  • React
  • Business Intelligence Tools
  • Tableau
  • PowerBI
  • Quliksense
  • Identity & Device Management
  • What is a digital twin?
  • Active Directory
Data Engineering Course: Building A Data Platform
What We Want To Do
Thoughts On Choosing A Development Environment
A Look Into the Twitter API
Ingesting Tweets with Apache Nifi
Writing from Nifi to Apache Kafka
Apache Zeppelin
  • Install and Ingest Kafka Topic
  • Processing Messages with Spark & SparkSQL
  • Visualizing Data
Switch Processing from Zeppelin to Spark
  • Install Spark
  • Ingest Messages from Kafka
  • Writing from Spark to Kafka
  • Move Zeppelin Code to Spark
Case Studies
How I do Case Studies
  • Data Science @Airbnb
  • Data Science @Amazon
  • Data Science @Baidu
  • Data Science @Blackrock
  • Data Science @BMW
  • Data Science @Booking.com
  • Data Science @CERN
  • Data Science @Disney
  • Data Science @Drivetribe
  • Data Science @Dropbox
  • Data Science @Ebay
  • Data Science @Expedia
  • Data Science @Facebook
  • Data Science @Google
  • Data Science @Grammarly
  • Data Science @ING Fraud
  • Data Science @Instagram
  • Data Science @LinkedIn
  • Data Science @Lyft
  • Data Science @NASA
  • Data Science @Netflix
  • Data Science @OLX
  • Data Science @OTTO
  • Data Science @Paypal
  • Data Science @Pinterest
  • Data Science @Salesforce
  • Data Science @Siemens Mindsphere
  • Data Science @Slack
  • Data Science @Spotify
  • Data Science @Symantec
  • Data Science @Tinder
  • Data Science @Twitter
  • Data Science @Uber
  • Data Science @Upwork
  • Data Science @Woot
  • Data Science @Zalando
1001 Data Engineering Interview Questions
Live Streams
All Interview Questions

[#B] big browser dump – go thru ML experiment frameworks

Experiment Templates

  1. NullConvergence/torch_temp: A(nother) Pytorch experimental template - uses sacred
  2. victoresque/pytorch-template: PyTorch deep learning projects made easy.
  3. ml-tooling/ml-project-template: ML project template facilitating both research and production phases.
    • from ml-tooling Berlin group
    • research & production
  4. MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture.
  5. williamFalcon/pytorch-lightning: The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate

Start

ML Frameworks

Github Search · machine learning project - useful looking stuff

more ML

pythonfu - pickling, generator, etc

  • How to check if an object is a generator object in python? - Stack Overflow
  • Why can’t python module dill pickle the generator function? - Stack Overflow
  • pickle iterators and generators · Issue #10 · uqfoundation/dill
  • python - Why can’t generators be pickled? - Stack Overflow
  • Issue 1092962: Make Generators Pickle-able - Python tracker
  • Where does a generator store it’s values? : Python
  • Automatically remove generator object from memory at StopIteration (Python) - Stack Overflow
  • Python multiprocessing PicklingError: Can’t pickle <type ‘function’> - Stack Overflow
  • UsingPickle - Python Wiki
  • Change Fork Name For Github - Stack Overflow
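
The links above circle two questions: how to tell whether an object is a generator, and why pickling one fails. A small sketch of both follows, assuming CPython and only the standard library. The `countdown` function is a made-up example, and materialising the values into a list is one possible workaround, not the only one.

```python
import inspect
import pickle


def countdown(n):
    """Toy generator function used only for illustration."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
print(inspect.isgeneratorfunction(countdown))  # checks the function definition
print(inspect.isgenerator(gen))                # checks the running generator object

try:
    pickle.dumps(gen)
    picklable = True
except TypeError as err:
    # CPython refuses because the generator holds a live stack frame.
    picklable = False
    print(f"pickle refused: {err}")

# Workaround sketch: pickle the produced values instead of the generator itself.
restored = pickle.loads(pickle.dumps(list(gen)))
print(restored)
```

The failed `pickle.dumps` call does not advance the generator, so the list still contains every value.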

CI CB CD stuff

  • Jenkins (software) - Wikipedia
  • Category:Build automation - Wikipedia
  • Continuous integration - Wikipedia
  • Continuous Integration. CircleCI vs Travis CI vs Jenkins - By Django Stars
  • Continuous delivery - Wikipedia
  • Continuous deployment - Wikipedia
  • Comparison of continuous integration software - Wikipedia
  • reddit.com: search results - continuous deployment
  • magit-circleci: See the latest CircleCI builds from the Magit status buffer. : emacs

  • rmuslimov/jenkins.el: Jenkins plugin for emacs
  • kljohann/mpv.el: control mpv for easy note taking
  • Jenkins Is Getting Old | Hacker News
  • Product Vision - CI/CD | GitLab
  • Fun with Gitlab CI - VADOSWARE

fork searching

  • Can’t see the forks of a project on GitHub when “Too many forks to display” is shown - Web Applications Stack Exchange
  • Intuitive way to view most active fork in GitHub - Stack Overflow
  • Popular github Forks
  • GitPop2: Find the most popular fork on GitHub
  • Active GitHub Forks
  • Enhanced GitHub - Chrome Web Store

major link dump tooling [2019-08-15 Thu]

Write proper python

  • How to write a production-level code in Data Science?
  • Refactoring Python Code for Machine Learning Projects. Python “Spaghetti Code” Everywhere!
  • How to Write Beautiful Python Code With PEP 8 – Real Python
  • styleguide | Style guides for Google-originated open-source projects
  • Coding Style Guidelines — Pylearn 0.1 documentation

Python Packaging

  • Packaging Python Projects — Python Packaging User Guide
  • Making a PyPI-friendly README — Python Packaging User Guide
  • Minimal Structure — Python Packaging Tutorial
  • Over 10% of Python Packages on PyPI are Distributed Without Any License | Snyk
  • Choose an open source license | Choose a License
  • Licenses | Choose a License
  • TLDRLegal - Software Licenses Explained in Plain English

  • A template to make good README.md
  • template-python/README.md at master · jacebrowning/template-python
  • activescott/python-package-example: A simple example of creating and consuming a distributable Python package.

  • Where do you keep your files? : emacs
  • Rational ClearCase - Wikipedia

Reddit

  • D What’s your favorite logger? : MachineLearning
  • D How do you manage your machine learning experiments? : MachineLearning
  • Discussion How do you manage and keep track of your experiments? : MachineLearning
  • D Best way to manage ML experiements : MachineLearning
  • D How do you keep track of your experiment results? : MachineLearning
  • D What tools are used in practice to schedule training jobs, annotate datasets, keep track of past experiments… ? : MachineLearning

several frameworks

  • Home - Guild AI
  • guildai/guildai: Open source experiment tracking and optimization for machine learning
  • Comet.ml | Supercharging Machine Learning
  • mlflow/mlflow: Open source platform for the machine learning lifecycle
  • Tutorial — MLflow 1.2.0 documentation
  • Weights & Biases
  • kubeflow/kubeflow: Machine Learning Toolkit for Kubernetes
  • Kubeflow | Kubeflow
  • seba-1511/randopt: Streamlined machine learning experiment management.
  • richardliaw/track: Track your ML project!

MLflow

  • Introducing MLflow: an Open Source Platform for the Complete Machine Learning Lifecycle
  • How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform - Databricks
  • What are the current open source alternatives to MLflow? | Hacker News

FGLab

  • FGLab: Machine Learning Dashboard
  • Kaixhin/FGLab: Future Gadget Laboratory

frameworks & discussions

  • Semantic Versioning 2.0.0 | Semantic Versioning
  • danielwaterworth/metricmachine: Simple flask app for displaying live timeseries data
  • polyaxon/polyaxon: A platform for reproducible and scalable machine learning and deep learning on kubernetes
  • Pachyderm - Scalable, Reproducible Data Science
  • mlflow sacred weights and biases - Google Search
  • Compare to other ML e2e platforms · Issue #58 · mlflow/mlflow
  • Controlled Experiments in Machine Learning
  • rquintino (Rui Quintino)
  • Towards Reproducible Research with PyTorch Hub | Hacker News
  • Tutorial — Airflow Documentation

Build end-to-end machine learning workflows with Amazon SageMaker and Apache Airflow | AWS Machine Learning Blog

TRAINS

  • TRAINS: An open-source, zero-integration tool to boost machine learning research
  • allegroai/trains: TRAINS - Auto-Magical Experiment Manager & Version Control for AI
  • allegroai/trains-server: TRAINS Server - Auto-Magical Experiment Manager & Version Control for AI
  • trains/brief.md at master · allegroai/trains
  • Allegro.ai - Deep Learning Computer Vision Platform
  • trains - Allegro.AI

cookiecutter

  • Home - Cookiecutter Data Science
  • drivendata/cookiecutter-data-science: A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
  • manifoldai/docker-cookiecutter-data-science: A fork of the cookiecutter-data-science leveraging Docker for local development.
  • An AI Engineering Services Firm | Manifold

DVC etc

  • Machine Learning Version Control System · DVC
  • Data Version Control - Machine Learning Time Travel - YouTube
  • iterative/dvc: 🦉Data Version Control | Git for Data & Models

Workflow management system - Wikipedia

Git Large File Storage | Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.

github - How do Git LFS and git-annex differ? - Stack Overflow

D How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs

  • D How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs : MachineLearning
  • importlib — The implementation of import — Python 3.7.4 documentation
  • pytorch-template/README.md at master · victoresque/pytorch-template
  • MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture.
  • toolkit/pytorch_project_template at master · gmum/toolkit

google ML

  • Rules of Machine Learning: | ML Universal Guides | Google Developers
  • Machine Learning Crash Course | Google Developers
  • Introduction to Machine Learning | Machine Learning Crash Course | Google Developers
  • Introducing ML - YouTube
  • Education – Google AI

[2019-08-15 Thu 13:21]

r/MachineLearning · Posted by u/__Julia · 9 months ago · Archived

Hi, In the data science community, I have seen a wide adoption of this project structure https://github.com/drivendata/cookiecutter-data-science. However, I am still struggling to find a unified way to structure ML experiments that saves readers time in understanding the structure of the project.

20 comments

  • snaf77 38 points · 9 months ago
    • Yes, my team have (somewhat). I have written couple of medium articles on that: https://medium.com/@mbednarski
    • Some our rules of thumb:
      • Notebooks are allowed only for private projects, we do not even commit them.
      • All steps need to be reproducible from raw data to trained models (we use gnu make)
      • Teams need to know more about Python, not only basics - this allows them to write better code.
      • git flow
      • strict package versioning - we use pip-tools
      • swagger for api
      • It is a regular software project, ML does not allow “shortcuts because >>science<<”. So all principles like DRY, KISS apply. The exception is when performance is an issue (but it should be profiled first)
      • (unit) test where possible - e.g. data loaders, preprocessors etc
      • Well defined entry points (one common cli is better than bunch of scripts)
      • DOCUMENTATION for things that are not obvious from reading the code
      • I prefer to keep as much configuration (batch size, optimizer, etc) in json files, but not everyone likes it (i like to have mapping: json config -> results dir)
    • AllenNLP is a good inspiration for me
    • LiberalSexist 2 points · 9 months ago
      • Just read the first part of your structured ML series and found plenty of great ideas applicable for data-science projects in general.
      • “AllenNLP is a good inspiration for me” – What work/code of AllenNLP do you mean in particular?
      • BatJedi121 1 point 8 months ago
        • I personally like how almost everything is configurable through JSON files. A lot of the boilerplate (vocab, masking sequences, LSTMs to vectors/LSTMs to multiple outputs, typical attention types) is handled in really simple ways, as is setting up the training/val loop, collecting metrics, serializing/loading models.
        • I think getting to AllenNLP level for projects is overkill - but I think future libraries should definitely follow their design principles. take the boilerplate out.
  • mate_classic 13 points · 9 months ago
    • I was thinking about the same recently. Right now I’m trying to refactor my code to resemble this template here: https://github.com/victoresque/pytorch-template/blob/master/README.md
    • thatguydr 1 point · 9 months ago
      • I like the OP’s link a lot better for the top-down structure, as it separates all of the code into a single spot that can be committed easily. I like your link because it separates the model classes, the loader classes, the trainer class(es), and the utility classes (though I’d put the abstracts in that spot).
      • What I don’t like in either is the rather cavalier treatment of the reporting/evaluation. If you could tie the evaluation and the original config in with the code, all of that could be committed, provided you make a low-memory evaluation format and not dozens of plots. (Maybe throw the requirements in with them.) That’d be a very clean solution.
    • mate_classic 1 point 9 months ago
      • I didn’t really think about evaluation data. Maybe because I’m working only with generative models right now, where evaluation is looking for the most beautiful picture most of the time.
    • [deleted] 1 point · 8 months ago
      • I really like this template. Gonna start using this.
  • ranihorev 8 points · 9 months ago
    • The biggest challenge for me is how to do the transition from the notebook to production fast and smooth
      • srossi93 26 points · 9 months ago
        • Simple, never use notebooks! Notebook is a great tool for visualization, simple experiments, debugging, but as soon as the lines of code are >50, I immediately switch to a more structured organization. BTW, PyTorch is super OO and it’s super easy to derive, inherit, extend functionalities!
          • tidier 2 points 9 months ago
            • Never might be a bit strong, but I absolutely agree with everything else. Notebooks are excellent for experimentation, but I aggressively shift stuff to Python files and use importlib.reload.
          • ranihorev 1 point 9 months ago
            • I completely agree. The tricky part is to identify the point in which the experiment is done…
      • JanneJM 2 points 9 months ago
        • You can run external programs from the notebook though. One benefit of doing that is that you could have a self-documenting pipeline, with the final (or just preview) results inline with the commands running the model, glue logic and so on.
      • MoreDonuts 1 point 9 months ago
        • And compose.
      • gionnelles 1 point 9 months ago
        • My team follows this exclusively.
  • pickwickdick 3 points · 9 months ago
    • I structure my code using the following conventions:
      • 1. I keep the model and the train/eval logic separate.
      • 2. I have a ParamParser class in a utils folder that takes in a JSON file as input and exposes all keys as member variables.
      • 3. In my model.py file I define the loss function, accuracy and expose it via a metrics dictionary. Now in train.py I can simply call metrics['accuracy'](out,label) to compute the accuracy (or loss).
    • Hopefully, this helps answer your questions OP.

  • ekshaks 3 points · 9 months ago
    • Dealing with tensor shapes and documenting them for others is a pervasive problem. I use shape annotations using the tsalib library to document shapes throughout the data and model pipeline. https://github.com/ofnote/tsalib
  • mentatf 2 points · 9 months ago
    • Using scikit learn guidelines and skorch that fits perfectly with that. (https://github.com/dnouri/skorch)
  • RoastDepreciation 1 point · 9 months ago
    • Cookiecutter is an excellent starting point. Adapt to your team’s needs.
    • Not committing notebooks is simply not an option. They’re here to stay. Instead commit notebooks without output and store regular html exports in a separate reports root folder for future reference and reporting in the team.
  • katyngate 0 points · 9 months ago
    • Here’s one possible way: https://github.com/gmum/toolkit/tree/master/pytorch_project_template
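
The conventions pickwickdick describes in the thread (a JSON-backed `ParamParser` and a `metrics` dictionary exposed by the model module) can be sketched in a few lines. This is my own guess at the shape of that code, not the commenter's implementation. It reads JSON from a string instead of a file to stay self-contained, and the metric functions are plain-Python stand-ins.

```python
import json


class ParamParser:
    """Expose the keys of a JSON config as attributes (hypothetical sketch)."""

    def __init__(self, json_text):
        self.__dict__.update(json.loads(json_text))


def accuracy(out, label):
    """Fraction of predictions that match the labels."""
    correct = sum(int(o == l) for o, l in zip(out, label))
    return correct / len(label)


def zero_one_loss(out, label):
    """Stand-in loss: the fraction of mismatches."""
    return 1.0 - accuracy(out, label)


# model.py would expose this dictionary; train.py only looks names up in it.
metrics = {"accuracy": accuracy, "loss": zero_one_loss}

params = ParamParser('{"batch_size": 32, "optimizer": "adam", "lr": 0.001}')
out, label = [1, 0, 1, 1], [1, 0, 0, 1]
print(params.batch_size, params.optimizer)
print(metrics["accuracy"](out, label))
```

Keeping the metric lookup in a dictionary means the training loop does not need to change when a new metric is added to the model module.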

[.] MLflow tryout

[.] guildAI tryout

[.] pytorch-lightning, trains, wandb

guildAI slack

  • skimming through:
    • they should have an api for notebooks
    • slides on autoML /home/will/Downloads/Chicago ML - Applied Engineering Workshop.pdf

explanation of how guild works

Hi @Mohammedi Haroune and welcome! By default, Guild inspects the script you want to run (or the main module specified in the Guild file for the operation) and checks for the use of argparse. If the script uses argparse, Guild runs the script with the --help option and uses that dry run to discover the available arguments, which it uses as defined (see the note concerning magic below). If the script does not use argparse, Guild checks for global variable assignments of numbers and strings and uses those as flags.

With that information (either from argparse or globals) it lets the user redefine flag values using NAME=VALUE on the command line. Before it runs the operation, Guild prints the flag values as a preview. You can also see what Guild is importing by running guild help or guild run SCRIPT_OR_OPERATION --help-op.

If the flags come from argparse, Guild passes those as command line options to the script. If defined as globals, Guild sets the global values to the user-provided values by dynamically modifying the module AST as it’s loaded. This is all a bit magical and everyone reading this should feel a little uneasy at this point 🙂

The reason for all this implicit logic is to let you pick up a script and just run it - Guild captures the experiment as expected. In most cases this magic just works and everyone’s happy. But when it doesn’t, it’s mysterious and frustrating! The good news is that all of this behavior can be strictly controlled - and even disabled altogether - with a few lines in a Guild file (a file named guild.yml in your project directory). In the Guild file, you can provide explicit information about the flags for an operation as well as how the flags are set.

This scheme is quite under-documented atm. For a fairly exhaustive list of examples on this topic, see: https://github.com/guildai/guildai/tree/master/guild/tests/samples/projects/flags You can change to that directory (after cloning the repo obviously) and run each example to see the behavior. Of course that’s an exercise for the uber curious 🙂 If you want to accomplish something and it’s not falling into place for you - please just post your question here and someone can help!
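The two flag styles described above can be sketched in a single toy script. This is a minimal illustration only: the script, the flag names (learning_rate, optimizer) and the function names are hypothetical, and nothing here is Guild code. A real script would normally use one style or the other and call main() under the usual `if __name__ == "__main__":` guard.

```python
import argparse

# Style 1 (hypothetical names): module-level globals. Per the message above,
# when a script does not use argparse, number and string assignments like
# these are what Guild treats as flags.
learning_rate = 0.01
optimizer = "adam"


def parse_flags(argv=None):
    """Style 2: argparse options, which the message says are found via --help."""
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--learning-rate", type=float, default=learning_rate)
    parser.add_argument("--optimizer", default=optimizer)
    return parser.parse_args(argv)


def main(argv=None):
    flags = parse_flags(argv)
    # Print the effective values so a run log shows what was actually used.
    print(f"learning_rate: {flags.learning_rate}")
    print(f"optimizer: {flags.optimizer}")
    return flags
```

With either style, the message above says a value is overridden with NAME=VALUE on the guild run command line, and that argparse flags are then passed to the script as command line options.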

Abhinv Ramesh Kashyap 6:15 AM

I saw that any value that is output in the format key: value will be captured by Guild. Is there any other way to explicitly tell Guild to capture this, or is outputting to stdout the only way as of now?

Garrett 9:56 AM

@Abhinv Ramesh Kashyap The short answer is yes, definitely - Guild happily reads any generated TF event files. By default Guild parses your script output for patterns KEY: NUMBER as you observed. However you can control that behavior using a Guild file. Here’s an example that steps you through the concepts and shows you how to configure an operation for both modified parsing behavior and also how to disable the parsing altogether when you just want to log values directly. https://github.com/guildai/examples/tree/master/custom-scalars
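To make the KEY: NUMBER idea concrete, here is a rough sketch of what scanning script output for such lines could look like. The regular expression and the function name are assumptions made for illustration; this is not Guild's actual parser, and its real matching rules may differ.

```python
import re

# Assumed pattern: a key without spaces, a colon, then a plain or
# scientific-notation number, and nothing else on the line.
SCALAR_LINE = re.compile(r"^(\S+):\s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$")


def parse_scalars(output):
    """Return {key: float} for every line of `output` shaped like KEY: NUMBER."""
    scalars = {}
    for line in output.splitlines():
        match = SCALAR_LINE.match(line.strip())
        if match:
            scalars[match.group(1)] = float(match.group(2))
    return scalars


log = "epoch 1 done\nloss: 0.372291\naccuracy: 0.91\nnote: fine\n"
print(parse_scalars(log))
```

Lines such as "epoch 1 done" or "note: fine" are ignored because they do not end in a number after the colon.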

Garrett 7:51 AM

@here I’ve created a new repo that we can use to work on/communicate issue resolution: https://github.com/guildai/issues Sometimes (often) it’s handy to systematically reproduce a bug/issue and be able to quickly re-run steps against new releases to confirm expected behavior. Our first example is related to source code copies - an important topic for many Guild users. If you’re interested in how Guild decides what files to save as source code, this https://github.com/guildai/issues/tree/master/issue-39 is a step-by-step walkthrough.

0.6.6rc2 is available for pre-release eval. I snuck in a pretty cool feature that I’d love to get some feedback on. Now when you run guild tensorboard, Guild will prepare TensorBoard HParam summaries in the background so you can compare run hyperparams and metrics in the HParam tab. This is a really nice feature offered by TensorBoard!

I got a question about workflow and I’d like to answer here for everyone’s benefit.

Garrett Sep 3rd at 7:09 AM

The gist of the question is related to a common pipeline: prepare data from some raw source, engineer features on the prepared data (a second stage of data prep), train a model, validate a model. In a Guild file, each of these stages is defined as a separate operation. The operations are related to one another through resource dependencies. The first operation will depend on the raw data. Subsequent operations will depend on their upstream operations. Something like this:

    prepare-data:
      requires:
        - file: data.csv

    add-features:
      requires:
        - operation: prepare-data

    train:
      requires:
        - operation: add-features

    validate:
      requires:
        - operation: train
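As a quick sanity check on a chain like this, the operation-to-operation dependencies can be written down as a plain mapping and sorted. This is a sketch for reasoning about the pipeline only: it uses Python's standard graphlib module (Python 3.9+), it is not how Guild resolves dependencies, and the data.csv file dependency is left out because it is a file rather than an operation.

```python
from graphlib import TopologicalSorter

# Each operation maps to the upstream operations it requires.
requires = {
    "prepare-data": [],
    "add-features": ["prepare-data"],
    "train": ["add-features"],
    "validate": ["train"],
}


def run_order(requires):
    """Return the operations ordered so that every upstream stage comes first."""
    return list(TopologicalSorter(requires).static_order())


print(run_order(requires))
```

If two stages were later melded into one, the mapping would simply lose a key and the order would shorten accordingly.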

It might make sense for some of these operations to be melded into one. E.g. prepare-data and add-features could be one operation (i.e. roll the feature engineering work into the data prep script). Or train and validate could be one (validate as a part of the training script - this is very common). The triggers for creating a separate operation (e.g. split up raw data prep and feature engineering) are:

  • Does the operation take long? (a subjective term - but usually you know it when you see it) - If yes, consider creating a separate operation to simply avoid having to re-run the operation when you can re-use artifacts as a dependency.
  • Could the operation potentially be run multiple times, each time with different hyperparameters or inputs (flag values) for a given set of upstream dependencies? For example, for validation, it’s common to validate against new data sets as new labeled examples become available. You probably don’t want to retrain a model just to revalidate with new data. In this case, validate should be a separate operation.
María Benavente 6 days ago awesome! let’s say, for example, that add-features also accesses the data file - would it be necessary to set the requirement for that operation as well?

Garrett 6 days ago Yes, indeed it would! You could get to data by way of the prepare-data operation but this is not a good idea - and arguably Guild should treat that as an error (or warn you). You should instead list data as a required resource for add-features. This is where defining your resources in separate named sections is a good idea. Then you can simply reference the resource by name and not have to redefine it every time it’s needed. To define a named resource, you need to define a model. For example

    - model: my-model
      resources:
        raw-data:
          sources:
            - file: data-1.csv
            - file: data-2.csv
        prepared-data:
          sources:
            - operation: prepare-data
      operations:
        prepare-data:
          requires: raw-data
        add-features:
          requires:
            - raw-data
            - prepared-data

… Note that I went ahead and defined a prepared-data resource. Even if a resource is only used once, I think it’s nice to define named resources as it keeps the operation requires config simple and readable. (edited)

Garrett 6 days ago Note that I edited the example above to include a sources attr under each resource. Guild requires this atm. (I’m actually going to fix this right now to make sources optional - for now you have to use it.)

María Benavente 6 days ago wow, okey! that’s really clean

María Benavente 6 days ago and in order to avoid those files running with sourcecode // exclude would it be possible to reference it also that way? Example:

    sourcecode:
      - exclude: raw-data

Garrett 6 days ago Yes but you have to spell that as - exclude: raw-data/*

Garrett 6 days ago I don’t really like that requirement - I’ll look into fixing that so you can just list the directory there.

Garrett 6 days ago But for now, use the glob pattern.

María Benavente 6 days ago alright

María Benavente 6 days ago I’m quite confused by a behavior I’m seeing as a result of requiring specific files in each operation:

    - model: claim-detection
      description: Classifier for claim-tagged data
      resources:
        excels:
          sources:
            - file: data/excels/
            - file: data/raw/
        raw:
          sources:
            - file: data/raw/
            - file: data/processed/
            - operation: generatedataset

Now that I do this, locally to my code, those folders “lose the data prefix”. It wasn’t a big deal to update my global_path variable in the code, but I’m not sure whether the global path should remain or not

María Benavente 6 days ago (did I explain myself here?)

Garrett 6 days ago Yes, that’s right - the way you’re specifying the data files, they will not appear under a data path. They are selected and linked to using their base names (e.g. excels, raw, etc.). If you want these selected files/dirs to appear under a data path, you can do a couple of things. First, you could just select data and leave it at that:

    sources:
      - file: data

This will create a link to data and you’ll have access to everything in that directory. If you’d prefer to be more specific (generally a good idea) you can specify a path attr to indicate that links to selected files/dirs should be created in a sub-directory. Like this:

    excels:
      path: data
      sources:
        - file: data/excels
        - file: data/raw

This will create the directory structure that you’re expecting - but only include the two specified dirs as links.
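The difference between linking by base name and linking under a path can be sketched as a small mapping from link location to source. This is only an illustration of the behavior described in the messages above, with a hypothetical function name; it does not call Guild or create any links.

```python
from pathlib import PurePosixPath


def link_layout(sources, path=None):
    """Map each link location inside a run directory to the source it points at.

    Links are named after the source's base name and, when `path` is given,
    placed under that sub-directory.
    """
    layout = {}
    for source in sources:
        name = PurePosixPath(source).name
        link = PurePosixPath(path) / name if path else PurePosixPath(name)
        layout[str(link)] = source
    return layout


# Without a path attr the "data" prefix disappears from the run directory.
print(link_layout(["data/excels", "data/raw"]))
# With path="data" the links land where the script expects them.
print(link_layout(["data/excels", "data/raw"], path="data"))
```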

Garrett 6 days ago Btw, in cases like this where you’re trying to sort out the directory layout, there’s a --stage DIR option to the run command that will only lay out the run directory and not actually run the operation. You can inspect DIR in this case to see what Guild is doing.

Garrett 6 days ago The third option is the one you mention, which is to adjust your script to look for the resources in something other than data. In most cases, I just specify data as a source and be done with it. Remember this creates a symlink - it’s not copying anything. The only harm in including data is that you have access to everything in that dir, which could mask some bugs. It’s also less explicit. There’s a point however when being explicit has diminishing returns - so it’s a judgment call.

[2019-10-20 Sun]

good commentary (R) pytorch-lightning - The researcher’s version of keras : MachineLearning [2019-10-20 Sun 22:39]

awesome, comprehensive: A Comparison of Reinforcement Learning Frameworks: Dopamine, RLLib, Keras-RL, Coach, TRFL, Tensorforce, Coach and more [2019-10-20 Sun 22:55]

[.] mastery python system -includes Arch [1/4]

  • BEST (Raschka) –> /home/will/DevAcademics/LanguageThemed/python_reference
  • look at chrome links in NOW

[.] conda, pyenvs, the whole thing, and setup proper envs policy

[.] learn pytorch - use new notebooks

maybe - Deep Neural Networks with PyTorch - Stefan Otte - YouTube

[.] python dev chat groups QA

  • gitter, irc for python, anaconda, etc about datsci practices, ie failure of portable conda envs, need to customize
  • one or few best envs for general datsci research. there should be only a few.
  • any overall package guide? prob not

  • Python Closures: How to use it and Why?
  • python map function - Google Search
  • best python coding slack channels - Google Search
  • python coding gitter - Google Search
  • reddit: the front page of the internet
  • Python
  • Python coding: a subreddit for people who know Python
  • Quick python tips to add to your collection

[x] how does package/module system work

named tensors

misc links that were in named tensor heading

[x] notify when job done - trying telegram

I’d recommend making actual .py files and running your code through there if it is taking that long. You can then use notify2 (“pip install notify2”) to send yourself a desktop notification when your code finishes.

Ahmad Moussa [1 day ago] if it’s remotely you could send yourself an email via a python script

Pyrestone [7 hours ago] I also use the python telegram api sometimes. It’s pretty simple and you can send messages to your phone.
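A minimal sketch of the Telegram idea, assuming a bot token and chat id are already available (both are placeholders here). It calls the Telegram Bot API sendMessage method directly over HTTP with the standard library instead of the python telegram package mentioned above, so the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_telegram_request(token, chat_id, text):
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=payload.encode("utf-8"))


def notify(token, chat_id, text, timeout=10):
    """Send the message and return the decoded JSON reply."""
    request = build_telegram_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling notify(token, chat_id, "training finished") as the last line of a long-running .py script would send the message once the job completes; wrapping the job in try/finally would also report failures.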

[-] SOTA papers with code work

  • for:
    • downloading their data to process
    • run benchmarks, tasks
  • starred repos
  • dir in devacademics

json links

  • All papers with abstracts: https://paperswithcode.com/media/about/papers-with-abstracts.json.gz
  • Links between papers and code: https://paperswithcode.com/media/about/links-between-papers-and-code.json.gz
  • Evaluation tables: https://paperswithcode.com/media/about/evaluation-tables.json.gz

The last JSON is in the sota-extractor format and the code from there can be used to load in the JSON into a set of Python classes.

At the moment, data is regenerated once a week (over the weekend).

Part of the data is coming from the sources listed in the sota-extractor README.

papers with code json data snips

links-btw-paper-and-code

{
  "paper_title": "FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model",
  "paper_arxiv_id": "1205.5407",
  "paper_url_abs": "http://arxiv.org/abs/1205.5407v2",
  "paper_url_pdf": "http://arxiv.org/pdf/1205.5407v2.pdf",
  "repo_url": "https://github.com/denizyuret/fastsubs-googlecode",
  "mentioned_in_paper": false,
  "mentioned_in_github": true
},
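A small sketch for working with the links file once downloaded, using only the standard library. It assumes the .json.gz file decompresses to a JSON array of objects shaped like the snippet above; the function names and the local file path are hypothetical.

```python
import gzip
import json


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def repos_by_arxiv_id(links):
    """Group repository URLs by arXiv id, skipping entries without one."""
    index = {}
    for link in links:
        arxiv_id = link.get("paper_arxiv_id")
        if arxiv_id:
            index.setdefault(arxiv_id, []).append(link["repo_url"])
    return index
```

For example, repos_by_arxiv_id(load_json_gz("links-between-papers-and-code.json.gz")) would give a lookup from an arXiv id such as 1205.5407 to its linked repositories.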

evaluation-tables

{
  "categories": [
    "Computer Vision"
  ],
  "datasets": [],
  "description": "The average of the normalized top-1 prediction scores of unseen classes in the generalized zero-shot learning setting, where the label of a test sample is predicted among all (seen + unseen) classes.",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Generalized Zero-Shot Learning - Unseen"
},
{
  "categories": [
    "Medical"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "breast density classification"
},
{
  "categories": [
    "Medical"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "epilepsy prediction"
},
{
  "categories": [
    "Methodology"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Sparse Learning"
},
{
  "categories": [
    "Robots"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Calibration"
},
{
  "categories": [
    "Graphs"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "hypergraph partitioning"
}

papers-with-abstracts

{
  "arxiv_id": null,
  "title": "Towards a Discourse Model for Knowledge Elicitation",
  "abstract": "",
  "url_abs": "https://www.aclweb.org/anthology/papers/R/R13/R13-2006/",
  "url_pdf": "https://www.aclweb.org/anthology/R13-2006",
  "proceeding": "RANLP 2013 9"
},
{
  "arxiv_id": "1508.05902",
  "title": "A Framework for Comparing Groups of Documents",
  "abstract": "We present a general framework for comparing multiple groups of documents. A\nbipartite graph model is proposed where document groups are represented as one\nnode set and the comparison criteria are represented as the other node set.\nUsing this model, we present basic algorithms to extract insights into\nsimilarities and differences among the document groups. Finally, we demonstrate\nthe versatility of our framework through an analysis of NSF funding programs\nfor basic research.",
  "url_abs": "http://arxiv.org/abs/1508.05902v1",
  "url_pdf": "http://arxiv.org/pdf/1508.05902v1.pdf",
  "proceeding": null
},
{
  "arxiv_id": null,
  "title": "DysList: An Annotated Resource of Dyslexic Errors",
  "abstract": "",
  "url_abs": "https://www.aclweb.org/anthology/papers/L/L14/L14-1492/",
  "url_pdf": "http://www.lrec-conf.org/proceedings/lrec2014/pdf/612_Paper.pdf",
  "proceeding": "LREC 2014 5"
},
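The snippet above shows that some entries (the ACL anthology ones here) carry an empty abstract and no arXiv id. A quick way to see how common that is, assuming the full file is a list of objects with the same keys, is a small tally; the function name is hypothetical.

```python
from collections import Counter


def abstract_coverage(papers):
    """Count papers by source (arXiv or other) and by whether an abstract is present."""
    counts = Counter()
    for paper in papers:
        source = "arxiv" if paper.get("arxiv_id") else "other"
        status = "with_abstract" if paper.get("abstract") else "no_abstract"
        counts[(source, status)] += 1
    return counts
```

Entries counted under ("other", "no_abstract") would need their abstracts fetched from the url_abs page before any text processing.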

pipelineAI is kubeflow as a service (KASS)

Hands-on with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost - YouTube [2019-09-25 Wed 10:25]

Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU

Description: In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow. Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google. KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking. Airflow is the most-widely used pipeline orchestration framework in machine learning.

  • most ppl not using pytorch, 1st mover advantage
  • airflow better than luigi
  • next gpu will be multi-user thread friendly
  • on-prem 39%, pretty good
  • A/B and multi-armed bandit testing of models
  • EKS - Amazon Elastic Kubernetes Service
  • kubeflow doesn’t offer native airflow integration, pipelineai kubeflow version does along with MLflow (databricks)
  • each github star is worth $1,500 in SV land.

PipelineAI - Products [2019-09-25 Wed 11:08]

  • Multi/Hybrid-Cloud
  • CPU + GPU + TPU
  • Dynamic Auto Scaling
  • Adaptive Traffic Shift
  • Continuous Model Training
  • Continuous Pipeline Optimization
  • Continuous Model Validation
  • Kafka Streaming
  • Private Dashboards
  • Logging Integration
  • SAML + LDAP + OAuth + IAM
  • 24x7 Support

DataSets

  1. Github’s Top Open Datasets For Machine Learning
  2. https://www.kaggle.com/datasets
  3. ~/DevAcademics/Datasets
    1. ~/DevAcademics/Datasets/awesome-public-datasets
  4. Google Dataset Search
  5. Academic Torrents

Reference

  • Stanford DAWN Deep Learning Benchmark (DAWNBench)
  • ~/Documents/2Research/OnlineEdu/datasciencemasters-go
  • ~/Documents/2Research/OnlineEdu/open-source-machine-learning-degree

local awesome lists

  • awesome-datascience
  • ~/Documents/2Research/DataScience/awesome-datascience-ideas
  • ~/Documents/2Research/DataScience/datascience-awesome-cheat-sheets
  • ~/Documents/2Research/DataScience/free-data-science-books

Web Scraping Info

  • ~/Documents/2Research/DataScience/awesome-crawler
  • ~/Documents/2Research/DataScience/awesome-crawler/README.html
  • ~/Documents/2Research/DataScience/awesome-crawler/README.md

python web/twitter scraping

Web Scraping Tutorial with Python: Tips and Tricks

  • kennethreitz/twitter-scraper: Scrape the Twitter Frontend API without authentication.
  • taspinar/twitterscraper: Scrape Twitter for Tweets
  • haccer/tweep: An advanced Twitter scraping tool written in Python that doesn’t use Twitter’s API, evading most API limitations.
  • tweepy/tweepy: Twitter for Python!

  • Twitter scraper tutorial with Python: Requests, BeautifulSoup, and Selenium — Part 1
  • Mining Twitter Data with Python (Part 1: Collecting data) – Marco Bonzanini
  • bonzanini/Book-SocialMediaMiningPython: Companion code for the book “Mastering Social Media Mining with Python”

  • Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation
  • Selenium - Web Browser Automation

web scraping info

Papers with Code

  • Home - Nurture.AI
  • GitXiv: Collaborative Open Computer Science
  • Papers with Code : the latest in machine learning

Books noter

DeepLearningBook_cropped

Skeleton

Contents

Website

Acknowledgments

Notation

Chapter 1 - Introduction

1.1 Who Should Read This Book?
1.2 Historical Trends in Deep Learning
1.2.1 The Many Names and Changing Fortunes of Neural Networks
1.2.2 Increasing Dataset Sizes
1.2.3 Increasing Model Sizes
1.2.4 Increasing Accuracy, Complexity and Real-World Impact

Part I - Applied Math and Machine Learning Basics

Chapter 2 - Linear Algebra
2.1 Scalars, Vectors, Matrices and Tensors
2.2 Multiplying Matrices and Vectors
2.3 Identity and Inverse Matrices
2.4 Linear Dependence and Span
2.5 Norms
2.6 Special Kinds of Matrices and Vectors
2.7 Eigendecomposition
2.8 Singular Value Decomposition
2.9 The Moore-Penrose Pseudoinverse
2.10 The Trace Operator
2.11 The Determinant
2.12 Example: Principal Components Analysis
Chapter 3 - Probability and Information Theory
3.1 Why Probability?
3.2 Random Variables
3.3 Probability Distributions
3.3.1 Discrete Variables and Probability Mass Functions
3.4 Marginal Probability
3.5 Conditional Probability
3.6 The Chain Rule of Conditional Probabilities
3.7 Independence and Conditional Independence
3.8 Expectation, Variance and Covariance
3.9 Common Probability Distributions
3.9.1 Bernoulli Distribution
3.9.2 Multinoulli Distribution
3.9.3 Gaussian Distribution
3.9.4 Exponential and Laplace Distributions
3.9.5 The Dirac Distribution and Empirical Distribution
3.9.6 Mixtures of Distributions
3.10 Useful Properties of Common Functions
3.11 Bayes’ Rule
3.12 Technical Details of Continuous Variables
3.13 Information Theory
3.14 Structured Probabilistic Models
Chapter 4 - Numerical Computation
4.1 Overflow and Underflow
4.2 Poor Conditioning
4.3 Gradient-Based Optimization
4.3.1 Beyond the Gradient: Jacobian and Hessian Matrices
4.4 Constrained Optimization
4.5 Example: Linear Least Squares
Chapter 5 - Machine Learning Basics
5.1 Learning Algorithms
5.1.1 The Task, T
5.1.2 The Performance Measure, P
5.1.3 The Experience, E
5.1.4 Example: Linear Regression
5.2 Capacity, Overfitting and Underfitting
5.2.1 The No Free Lunch Theorem
5.2.2 Regularization
5.3 Hyperparameters and Validation Sets
5.3.1 Cross-Validation
5.4 Estimators, Bias and Variance
5.4.1 Point Estimation
5.4.2 Bias
5.4.3 Variance and Standard Error
5.4.4 Trading off Bias and Variance to Minimize Mean Squared Error
5.4.5 Consistency
5.5 Maximum Likelihood Estimation
5.5.1 Conditional Log-Likelihood and Mean Squared Error
5.5.2 Properties of Maximum Likelihood
5.6 Bayesian Statistics
5.6.1 Maximum A Posteriori (MAP) Estimation
5.7 Supervised Learning Algorithms
5.7.1 Probabilistic Supervised Learning
5.7.2 Support Vector Machines
5.7.3 Other Simple Supervised Learning Algorithms
5.8 Unsupervised Learning Algorithms
5.8.1 Principal Components Analysis
5.8.2 k-means Clustering
5.9 Stochastic Gradient Descent
5.10 Building a Machine Learning Algorithm
5.11 Challenges Motivating Deep Learning
5.11.1 The Curse of Dimensionality
5.11.2 Local Constancy and Smoothness Regularization
5.11.3 Manifold Learning

Part II - Deep Networks: Modern Practices

Chapter 6 - Deep Feedforward Networks
6.1 Example: Learning XOR
6.2 Gradient-Based Learning
6.2.1 Cost Functions
6.2.1.1 Learning Conditional Distributions with Maximum Likelihood
6.2.1.2 Learning Conditional Statistics
6.2.2 Output Units
6.2.2.1 Linear Units for Gaussian Output Distributions
6.2.2.2 Sigmoid Units for Bernoulli Output Distributions
6.2.2.3 Softmax Units for Multinoulli Output Distributions
6.2.2.4 Other Output Types
6.3 Hidden Units
6.3.1 Rectified Linear Units and Their Generalizations
6.3.2 Logistic Sigmoid and Hyperbolic Tangent
6.3.3 Other Hidden Units
6.4 Architecture Design
6.4.1 Universal Approximation Properties and Depth
6.4.2 Other Architectural Considerations
6.5 Back-Propagation and Other Differentiation Algorithms
6.5.1 Computational Graphs
6.5.2 Chain Rule of Calculus
6.5.3 Recursively Applying the Chain Rule to Obtain Backprop
6.5.4 Back-Propagation Computation in Fully-Connected MLP
6.5.5 Symbol-to-Symbol Derivatives
6.5.6 General Back-Propagation
6.5.7 Example: Back-Propagation for MLP Training
6.5.8 Complications
6.5.9 Differentiation outside the Deep Learning Community
6.5.10 Higher-Order Derivatives
6.6 Historical Notes
Chapter 7 - Regularization for Deep Learning
7.1 Parameter Norm Penalties
7.1.1 L2 Parameter Regularization
7.1.2 L1 Regularization
7.2 Norm Penalties as Constrained Optimization
7.3 Regularization and Under-Constrained Problems
7.4 Dataset Augmentation
7.5 Noise Robustness
7.6 Semi-Supervised Learning
7.7 Multi-Task Learning
7.8 Early Stopping
7.9 Parameter Tying and Parameter Sharing
7.10 Sparse Representations
7.11 Bagging and Other Ensemble Methods
7.12 Dropout
7.13 Adversarial Training
7.14 Tangent Distance, Tangent Prop, and Manifold Tangent Classifier
Chapter 8 - Optimization for Training Deep Models
8.1 How Learning Differs from Pure Optimization
8.1.1 Empirical Risk Minimization
8.1.2 Surrogate Loss Functions and Early Stopping
8.1.3 Batch and Minibatch Algorithms
8.2 Challenges in Neural Network Optimization
8.2.1 Ill-Conditioning
8.2.2 Local Minima
8.2.3 Plateaus, Saddle Points and Other Flat Regions
8.2.4 Cliffs and Exploding Gradients
8.2.5 Long-Term Dependencies
8.2.6 Inexact Gradients
8.2.7 Poor Correspondence between Local and Global Structure
8.2.8 Theoretical Limits of Optimization
8.3 Basic Algorithms
8.3.1 Stochastic Gradient Descent
8.3.2 Momentum
8.3.3 Nesterov Momentum
8.4 Parameter Initialization Strategies
8.5 Algorithms with Adaptive Learning Rates
8.5.1 AdaGrad
8.5.2 RMSProp
8.5.3 Adam
8.5.4 Choosing the Right Optimization Algorithm
8.6 Approximate Second-Order Methods
8.6.1 Newton’s Method
8.6.2 Conjugate Gradients
8.6.3 BFGS
8.7 Optimization Strategies and Meta-Algorithms
8.7.1 Batch Normalization
8.7.2 Coordinate Descent
8.7.3 Polyak Averaging
8.7.4 Supervised Pretraining
8.7.5 Designing Models to Aid Optimization
8.7.6 Continuation Methods and Curriculum Learning
Chapter 9 - Convolutional Networks
9.1 The Convolution Operation
9.2 Motivation
9.3 Pooling
9.4 Convolution and Pooling as an Infinitely Strong Prior
9.5 Variants of the Basic Convolution Function
9.6 Structured Outputs
9.7 Data Types
9.8 Efficient Convolution Algorithms
9.9 Random or Unsupervised Features
9.10 The Neuroscientific Basis for Convolutional Networks
9.11 Convolutional Networks and the History of Deep Learning
Chapter 10 - Sequence Modeling: Recurrent and Recursive Nets
10.1 Unfolding Computational Graphs
10.2 Recurrent Neural Networks
10.2.1 Teacher Forcing and Networks with Output Recurrence
10.2.2 Computing the Gradient in a Recurrent Neural Network
10.2.3 Recurrent Networks as Directed Graphical Models
10.2.4 Modeling Sequences Conditioned on Context with RNNs
10.3 Bidirectional RNNs
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
10.5 Deep Recurrent Networks
10.6 Recursive Neural Networks
10.7 The Challenge of Long-Term Dependencies
10.8 Echo State Networks
10.9 Leaky Units and Other Strategies for Multiple Time Scales
10.9.1 Adding Skip Connections through Time
10.9.2 Leaky Units and a Spectrum of Different Time Scales
10.9.3 Removing Connections
10.10 The Long Short-Term Memory and Other Gated RNNs
10.10.1 LSTM
10.10.2 Other Gated RNNs
10.11 Optimization for Long-Term Dependencies
10.11.1 Clipping Gradients
10.11.2 Regularizing to Encourage Information Flow
10.12 Explicit Memory
Chapter 11 - Practical Methodology
11.1 Performance Metrics
11.2 Default Baseline Models
11.3 Determining Whether to Gather More Data
11.4 Selecting Hyperparameters
11.4.1 Manual Hyperparameter Tuning
11.4.2 Automatic Hyperparameter Optimization Algorithms
11.4.3 Grid Search
11.4.4 Random Search
11.4.5 Model-Based Hyperparameter Optimization
11.5 Debugging Strategies
11.6 Example: Multi-Digit Number Recognition
Chapter 12 - Applications
12.1 Large Scale Deep Learning
12.1.1 Fast CPU Implementations
12.1.2 GPU Implementations
12.1.3 Large Scale Distributed Implementations
12.1.4 Model Compression
12.1.5 Dynamic Structure
12.1.6 Specialized Hardware Implementations of Deep Networks
12.2 Computer Vision
12.2.1 Preprocessing
12.2.1.1 Contrast Normalization
12.2.1.2 Dataset Augmentation
12.3 Speech Recognition
12.4 Natural Language Processing
12.4.1 n-grams
12.4.2 Neural Language Models
12.4.3 High-Dimensional Outputs
12.4.3.1 Use of a Short List
12.4.3.2 Hierarchical Softmax
12.4.3.3 Importance Sampling
12.4.3.4 Noise-Contrastive Estimation and Ranking Loss
12.4.4 Combining Neural Language Models with n-grams
12.4.5 Neural Machine Translation
12.4.5.1 Using an Attention Mechanism and Aligning Pieces of Data
12.4.6 Historical Perspective
12.5 Other Applications
12.5.1 Recommender Systems
12.5.1.1 Exploration Versus Exploitation
12.5.2 Knowledge Representation, Reasoning and Question Answering
12.5.2.1 Knowledge, Relations and Question Answering

Part III - Deep Learning Research

Chapter 13 - Linear Factor Models
13.1 Probabilistic PCA and Factor Analysis
13.2 Independent Component Analysis (ICA)
13.3 Slow Feature Analysis
13.4 Sparse Coding
13.5 Manifold Interpretation of PCA
Chapter 14 - Autoencoders
14.1 Undercomplete Autoencoders
14.2 Regularized Autoencoders
14.2.1 Sparse Autoencoders
14.2.2 Denoising Autoencoders
14.2.3 Regularizing by Penalizing Derivatives
14.3 Representational Power, Layer Size and Depth
14.4 Stochastic Encoders and Decoders
14.5 Denoising Autoencoders
14.5.1 Estimating the Score
14.5.1.1 Historical Perspective
14.6 Learning Manifolds with Autoencoders
14.7 Contractive Autoencoders
14.8 Predictive Sparse Decomposition
14.9 Applications of Autoencoders
Chapter 15 - Representation Learning
15.1 Greedy Layer-Wise Unsupervised Pretraining
15.1.1 When and Why Does Unsupervised Pretraining Work?
15.2 Transfer Learning and Domain Adaptation
15.3 Semi-Supervised Disentangling of Causal Factors
15.4 Distributed Representation
15.5 Exponential Gains from Depth
15.6 Providing Clues to Discover Underlying Causes
Chapter 16 - Structured Probabilistic Models for Deep Learning
16.1 The Challenge of Unstructured Modeling
16.2 Using Graphs to Describe Model Structure
16.2.1 Directed Models
16.2.2 Undirected Models
16.2.3 The Partition Function
16.2.4 Energy-Based Models
16.2.5 Separation and D-Separation
16.2.6 Converting between Undirected and Directed Graphs
16.2.7 Factor Graphs
16.3 Sampling from Graphical Models
16.4 Advantages of Structured Modeling
16.5 Learning about Dependencies
16.6 Inference and Approximate Inference
16.7 The Deep Learning Approach to Structured Probabilistic Models
16.7.1 Example: The Restricted Boltzmann Machine
Chapter 17 - Monte Carlo Methods
17.1 Sampling and Monte Carlo Methods
17.1.1 Why Sampling?
17.1.2 Basics of Monte Carlo Sampling
17.2 Importance Sampling
17.3 Markov Chain Monte Carlo Methods
17.4 Gibbs Sampling
17.5 The Challenge of Mixing between Separated Modes
17.5.1 Tempering to Mix between Modes
17.5.2 Depth May Help Mixing
Chapter 18 - Confronting the Partition Function
18.1 The Log-Likelihood Gradient
18.2 Stochastic Maximum Likelihood and Contrastive Divergence
18.3 Pseudolikelihood
18.4 Score Matching and Ratio Matching
18.5 Denoising Score Matching
18.6 Noise-Contrastive Estimation
18.7 Estimating the Partition Function
18.7.1 Annealed Importance Sampling
18.7.2 Bridge Sampling
Chapter 19 - Approximate Inference
19.1 Inference as Optimization
19.2 Expectation Maximization
19.3 MAP Inference and Sparse Coding
19.4 Variational Inference and Learning
19.4.1 Discrete Latent Variables
19.4.2 Calculus of Variations
19.4.3 Continuous Latent Variables
19.4.4 Interactions between Learning and Inference
19.5 Learned Approximate Inference
19.5.1 Wake-Sleep
19.5.2 Other Forms of Learned Inference
Chapter 20 - Deep Generative Models
20.1 Boltzmann Machines
20.2 Restricted Boltzmann Machines
20.2.1 Conditional Distributions
20.2.2 Training Restricted Boltzmann Machines
20.3 Deep Belief Networks
20.4 Deep Boltzmann Machines
20.4.1 Interesting Properties
20.4.2 DBM Mean Field Inference
20.4.3 DBM Parameter Learning
20.4.4 Layer-Wise Pretraining
20.4.5 Jointly Training Deep Boltzmann Machines
20.5 Boltzmann Machines for Real-Valued Data
20.5.1 Gaussian-Bernoulli RBMs
20.5.2 Undirected Models of Conditional Covariance
20.6 Convolutional Boltzmann Machines
20.7 Boltzmann Machines for Structured or Sequential Outputs
20.8 Other Boltzmann Machines
20.9 Back-Propagation through Random Operations
20.9.1 Back-Propagating through Discrete Stochastic Operations
20.10 Directed Generative Nets
20.10.1 Sigmoid Belief Nets
20.10.2 Differentiable Generator Nets
20.10.3 Variational Autoencoders
20.10.4 Generative Adversarial Networks
20.10.5 Generative Moment Matching Networks
20.10.6 Convolutional Generative Networks
20.10.7 Auto-Regressive Networks
20.10.8 Linear Auto-Regressive Networks
20.10.9 Neural Auto-Regressive Networks
20.10.10 NADE
20.11 Drawing Samples from Autoencoders
20.11.1 Markov Chain Associated with any Denoising Autoencoder
20.11.2 Clamping and Conditional Sampling
20.11.3 Walk-Back Training Procedure
20.12 Generative Stochastic Networks
20.12.1 Discriminant GSNs
20.13 Other Generation Schemes
20.14 Evaluating Generative Models
20.15 Conclusion

Bibliography

Index

pdfs noter [29/29]

Kingma and Welling - 2013 - Auto-Encoding Variational Bayes.pdf

need to relink to new pdf location

Skeleton

1 Introduction

2 Method

2.1 Problem scenario
2.2 The variational bound
2.3 The SGVB estimator and AEVB algorithm
2.4 The reparameterization trick

3 Example: Variational Auto-Encoder

4 Related work

5 Experiments

6 Conclusion

7 Future work

A Visualisations

B Solution of $-D_{KL}(q_\phi(\mathbf{z}) \,\|\, p_\theta(\mathbf{z}))$, Gaussian case

C MLP’s as probabilistic encoders and decoders

C.1 Bernoulli MLP as decoder
C.2 Gaussian MLP as encoder or decoder

D Marginal likelihood estimator

E Monte Carlo EM

F Full VB

F.1 Example

Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

2. Re-examine the Internal Representations

3. Towards Interpretable DNNs

Explanation in Artificial Intelligence: Insights from the Social Sciences

Skeleton

1 Introduction

1.1 Scope
1.2 Major Findings
1.3 Outline
1.4 Example

2 Philosophical Foundations — What Is Explanation?

2.1 Definitions
2.1.1 Causality
2.1.2 Explanation
2.1.3 Explanation as a Product
2.1.4 Explanation as Abductive Reasoning
2.1.5 Interpretability and Justification
2.2 Why People Ask for Explanations
2.3 Contrastive Explanation
2.4 Types and Levels of Explanation
2.5 Structure of Explanation
2.6 Explanation and XAI
2.6.1 Causal Attribution is Not Causal Explanation
2.6.2 Contrastive Explanation
2.6.3 Explanatory Tasks and Levels of Explanation
2.6.4 Explanatory Model of Self
2.6.5 Structure of Explanation

3 Social Attribution — How Do People Explain Behaviour?

3.1 Definitions
3.2 Intentionality and Explanation
3.3 Beliefs, Desires, Intentions, and Traits
3.3.1 Malle’s Conceptual Model for Social Attribution
3.4 Individual vs. Group Behaviour
3.5 Norms and Morals
3.6 Social Attribution and XAI
3.6.1 Folk Psychology
3.6.2 Malle’s Models
3.6.3 Collective Intelligence
3.6.4 Norms and Morals

4 Cognitive Processes — How Do People Select and Evaluate Explanations?

4.1 Causal Connection, Explanation Selection, and Evaluation
4.2 Causal Connection: Abductive Reasoning
4.2.1 Abductive Reasoning and Causal Types
4.2.2 Background and Discounting
4.2.3 Explanatory Modes
4.2.4 Inherent and Extrinsic Features
4.3 Causal Connection: Counterfactuals and Mutability
4.3.1 Abnormality
4.3.2 Temporality
4.3.3 Controllability and Intent
4.3.4 Social Norms
4.4 Explanation Selection
4.4.1 Facts and Foils
4.4.2 Abnormality
4.4.3 Intentionality and Functionality
4.4.4 Necessity, Sufficiency and Robustness
4.4.5 Responsibility
4.4.6 Preconditions, Failure, and Intentions
4.5 Explanation Evaluation
4.5.1 Coherence, Simplicity, and Generality
4.5.2 Truth and Probability
4.5.3 Goals and Explanatory Mode
4.6 Cognitive Processes and XAI
4.6.1 Abductive Reasoning
4.6.2 Mutability and Computation
4.6.3 Abnormality
4.6.4 Intentionality and Functionality
4.6.5 Perspectives and Controllability
4.6.6 Evaluation of Explanations

5 Social Explanation — How Do People Communicate Explanations?

5.1 Explanation as Conversation
5.1.1 Logic and Conversation
5.1.2 Relation & Relevance in Explanation Selection
5.1.3 Argumentation and Explanation
5.1.4 Linguistic structure
5.2 Explanatory Dialogue
5.3 Social Explanation and XAI
5.3.1 Conversational Model
5.3.2 Dialogue
5.3.3 Theory of Mind
5.3.4 Implicature
5.3.5 Dilution
5.3.6 Social and Interactive Explanation

6 Conclusions

Link on page 61: DARPA-BAA-16-53.pdf
Link on page 61: [[https://arxiv.org/pdf/1709.10256][Explainable Planning, in: IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017]]
Link on page 62: https://arxiv.org/abs/1802.00541

Geometric deep learning: going beyond Euclidean data

Skeleton

I Introduction

II Geometric learning problems

III Deep learning on Euclidean domains

IV The geometry of manifolds and graphs

V Spectral methods

VI Spectrum-free methods

VII Charting-based methods

VIII Combined spatial/spectral methods

IX Applications

X Open problems and future directions

References

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Skeleton

I Introduction

II Background on Deep Neural Networks (DNN)

II-A Artificial Intelligence and DNNs
II-B Neural Networks and Deep Neural Networks (DNNs)
II-C Inference versus Training
II-D Development History
II-E Applications of DNN
II-F Embedded versus Cloud

III Overview of DNNs

III-A Convolutional Neural Networks (CNNs)
III-A1 Non-Linearity
III-A2 Pooling
III-A3 Normalization
III-B Popular DNN Models

IV DNN development resources

IV-A Frameworks
IV-B Models
IV-C Popular Datasets for Classification
IV-D Datasets for Other Tasks

V Hardware for DNN Processing

V-A Accelerate Kernel Computation on CPU and GPU Platforms
V-B Energy-Efficient Dataflow for Accelerators
V-B1 Weight stationary (WS)
V-B2 Output stationary (OS)
V-B3 No local reuse (NLR)
V-B4 Row stationary (RS)
V-B5 Energy comparison of different dataflows

VI Near-Data Processing

VI-A DRAM
VI-B SRAM
VI-C Non-volatile Resistive Memories
VI-D Sensors

VII Co-design of DNN models and Hardware

VII-A Reduce Precision
VII-A1 Linear quantization
VII-A2 Non-linear quantization
VII-B Reduce Number of Operations and Model Size
VII-B1 Exploiting Activation Statistics
VII-B2 Network Pruning
VII-B3 Compact Network Architectures
VII-B4 Knowledge Distillation

VIII Benchmarking Metrics for DNN Evaluation and Comparison

VIII-A Metrics for DNN Models
VIII-B Metrics for DNN Hardware

IX Summary

Link on page 30: http://www.vlfeat.org/

UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION

Learning by Abstraction: The Neural State Machine

Skeleton

1 Introduction

2 Related work

3 The Neural State Machine

3.1 Concept vocabulary
3.2 States and edge transitions
3.3 Reasoning instructions
3.4 Model simulation

4 Experiments

4.1 Compositional question answering
4.2 Generalization experiments

5 Conclusion

6 Supplementary material

6.1 Related work (full version)
6.2 Ablation studies
6.3 Concept vocabulary
6.4 Scene graph generation
6.5 Implementation and training details

Building machines that learn and think like people

Building machines that learn and think like people - 579 citations- Google Scholar

Skeleton

Building machines that learn and think like people

Introduction
1.1. What this article is not
1.2. Overview of the key ideas
Cognitive and neural inspiration in artificial intelligence
Challenges for building more human-like machines
3.1. The Characters Challenge
3.2. The Frostbite Challenge
Core ingredients of human intelligence
4.1. Developmental start-up software
4.1.1. Intuitive physics
4.1.2. Intuitive psychology
4.2. Learning as rapid model building
4.2.1. Compositionality
4.2.2. Causality
4.2.3. Learning-to-learn
4.3. Thinking Fast
4.3.1. Approximate inference in structured models
4.3.2. Model-based and model-free reinforcement learning
Responses to common questions
5.1. Comparing the learning speeds of humans and neural networks on specific tasks is not meaningful, because humans have extensive prior experience
5.2. Biological plausibility suggests theories of intelligence should start with neural networks
5.3. Language is essential for human intelligence. Why is it not more prominent here?
Looking forward
6.1. Promising directions in deep learning
6.2. Future applications to practical AI problems
6.3. Toward more human-like learning and thinking machines

Open Peer Commentary

The architecture challenge: Future artificial-intelligence systems will require sophisticated architectures, and knowledge of the brain might guide their construction
10.1017/S0140525X17000036
Gianluca Baldassarre, Vieri Giuliano Santucci, Emilio Cartoni, and Daniele Caligiore
Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Rome, Italy.
gianluca.baldassarre@istc.cnr.it vieri.santucci@istc.cnr.it emilio.cartoni@istc.cnr.it daniele.caligiore@istc.cnr.it
http://www.istc.cnr.it/people/ http://www.istc.cnr.it/people/gianluca-baldassarre http://www.istc.cnr.it/people/vieri-giuliano-santucci http://www.istc.cnr.it/people/emilio-cartoni http://www.istc.cnr.it/people/daniele-caligiore
In this commentary, we highlight a crucial challenge posed by the proposal of Lake et al. to introduce key elements of human cognition into deep neural networks and future artificial-intelligence systems: the need to design effective sophisticated architectures. We propose that looking at the brain is an important means of facing this great challenge.
We agree with the claim of Lake et al. that to obtain human-level learning speed and cognitive flexibility, future artificial-intelligence (AI) systems will have to incorporate key elements of human cognition: from causal models of the world, to intuitive psychological theories, compositionality, and knowledge transfer. However, the authors largely overlook the importance of a major challenge to implementation of the functions they advocate: the need to develop sophisticated architectures to learn, represent, and process the knowledge related to those functions. Here we call this the architecture challenge. In this commentary, we make two claims: (1) tackling the architecture challenge is fundamental to success in developing human-level AI systems; (2) looking at the brain can furnish important insights on how to face the architecture challenge.
The difficulty of the architecture challenge stems from the fact that the space of the architectures needed to implement the several functions advocated by Lake et al. is huge. The authors get close to this problem when they recognize that one thing that the enormous genetic algorithm of evolution has done in millions of years of the stochastic hill-climbing search is to develop suitable brain architectures. One possible way to attack the architecture challenge, also mentioned by Lake et al., would be to use evolutionary techniques mimicking evolution. We think that today this strategy is out of reach, given the “ocean-like” size of the search space. At most, we can use such techniques to explore small, interesting “islands lost within the ocean.” But how do we find those islands in the first place? We propose looking at the architecture of real brains, the product of the evolution genetic algorithm, and try to “steal insights” from nature. Indeed, we think that much of the intelligence of the brain resides in its architecture. Obviously, identifying the proper insights is not easy to do, as the brain is very difficult to understand. However, it might be useful to try, as the effort might give us at least some general indications, a compass, to find the islands in the ocean. Here we present some examples to support our intuition.
When building architectures of AI systems, even when following cognitive science indications (e.g., Franklin 2007), the tendency is to “divide and conquer,” that is, to list the needed high-level functions, implement a module for each of them, and suitably interface the modules. However, the organisation of the brain can be understood on the basis of not only high-level functions (see below), but also “low-level” functions (usually called “mechanisms”).
An example of a mechanism is brain organisation based on macro-structures, each having fine repeated micro-architectures implementing specific computations and learning processes (Caligiore et al. 2016; Doya 1999): the cortex to statically and dynamically store knowledge acquired by associative learning processes (Penhune & Steele 2012; Shadmehr & Krakauer 2008), the basal ganglia to learn to select information by reinforcement learning (Graybiel 2005; Houk et al. 1995), the cerebellum to implement fast time-scale computations possibly acquired with supervised learning (Kawato et al. 2011; Wolpert et al. 1998), and the limbic brain structures interfacing the brain to the body and generating motivations, emotions, and the value of things (Mirolli et al. 2010; Mogenson et al. 1980). Each of these mechanisms supports multiple, high-level functions (see below).
Brain architecture is also forged by the fact that natural intelligence is strongly embodied and situated (an aspect not much stressed by Lake et al.); that is, it is shaped to adaptively interact with the physical world (Anderson 2003; Pfeifer & Gómez 2009) to satisfy the organism's needs and goals (Mannella et al. 2013). Thus, the cortex is organised along multiple cortical pathways running from sensors to actuators (Baldassarre et al. 2013a) and “intercepted” by the basal ganglia selective processes in their last part closer to action (Mannella & Baldassarre 2015). These pathways are organised in a hierarchical fashion, with the higher ones that process needs and motivational information controlling the lower ones closer to sensation/action. The lowest pathways dynamically connect musculoskeletal body proprioception with primary motor areas (Churchland et al. 2012). Higher-level “dorsal” pathways control the lowest pathways by processing visual/auditory information used to interact with the environment (Scott 2004). Even higher-level “ventral” pathways inform the brain on the identity and nature of resources in the environment to support decisions (Caligiore et al. 2010; Milner & Goodale 2006). At the hierarchy apex, the limbic brain supports goal selection based on visceral, social, and other types of needs/goals. Embedded within the higher pathways, an important structure involving basal ganglia–cortical loops learns and implements stimulus–response habitual behaviours (used to act in familiar situations) and goal-directed behaviours (important for problem solving and planning when new challenges are encountered) (Baldassarre et al. 2013b; Mannella et al. 2013). These brain structures form a sophisticated network, knowledge of which might help in designing the architectures of human-like embodied AI systems able to act in the real world.
A last example of the need for sophisticated architectures starts with the recognition by Lake et al. that we need to endow AI systems with a “developmental start-up software.” In this respect, together with other authors (e.g., Weng et al. 2001; see Baldassarre et al. 2013b; 2014, for collections of works) we believe that human-level intelligence can be achieved only through open-ended learning, that is, the cumulative learning of progressively more complex skills and knowledge, driven by intrinsic motivations, which are motivations related to the acquisition of knowledge and skills rather than material resources (Baldassarre 2011). The brain (e.g., Lisman & Grace 2005; Redgrave & Gurney 2006) and computational theories and models (e.g., Baldassarre & Mirolli 2013; Baldassarre et al. 2014; Santucci et al. 2016) indicate how the implementation of these processes indeed requires very sophisticated architectures able to store multiple skills, to transfer knowledge while avoiding catastrophic interference, to explore the environment based on the acquired skills, to self-generate goals/tasks, and to focus on goals that ensure a maximum knowledge gain.
Building machines that learn and think for themselves
Digging deeper on “deep” learning: A computational ecology approach
Back to the future: The return of cognitive functionalism
Theories or fragments?
The humanness of artificial non-normative personalities
Children begin with the same start-up software, but their software updates are cultural
Deep-learning networks and the functional architecture of executive control
Causal generative models are just a start
Thinking like animals or thinking like colleagues?
Evidence from machines that learn and think like people
What can the brain teach us about building artificial intelligence?
Building brains that communicate like machines
The importance of motivation and emotion for explaining human cognition
Building on prior knowledge without building it in
Building machines that adapt and compute like brains
Will human-like machines make human-like mistakes?
Benefits of embodiment
Understand the cogs to understand cognition
Social-motor experience and perception-action learning bring efficiency to machines
The argument for single-purpose robots
Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning
Human-like machines: Transparency and comprehensibility
Intelligent machines and human minds
The fork in the road
Avoiding frostbite: It helps to learn from others
Crossmodal lifelong learning in hybrid neural embodied architectures
Summary
Nature versus nurture
Coherent theories versus theory fragments
Symbolic versus sub-symbolic representations
Additional ingredients
R5.1. Machines that feel: Emotion
R5.2. Machines that act: Action and embodiment
R5.3. Machines that learn from others: Culture and pedagogy
R5.4. Machines that explore: Open-ended learning and intrinsic motivation
Insights from neuroscience and the brain
Coda: Ethics, responsibility, and opportunities

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

repo in /Users/Will/DevAcademics/DNN-Misc/tensorfuzz

Skeleton

1 Introduction

2 Background

2.1 Coverage-guided fuzzing
2.2 Testing of Neural Networks
2.3 Opportunities for improvement

3 The TensorFuzz library

3.1 The basic fuzzing procedure
3.2 Details of the fuzzing procedure
3.3 Batching and nondeterminism

4 Experimental results

4.1 CGF can efficiently find numerical errors in trained neural networks
4.2 CGF surfaces disagreements between models and their quantized versions
4.3 CGF surfaces undesirable behavior in character level language models

5 Conclusion

AlphaD3M: Machine Learning Pipeline Synthesis

  • comparing their system with other autoML frameworks:
    • Autosklearn
    • autostacker
    • TPOT
  • OpenML datasets
  • given: dataset, well defined task, performance criteria
  • DARPA D3M (Data Driven Discovery)
  • AlphaZero as a starting point
    • single-player game

Notes for page 2

  • DNN for predicting
    • pipeline performance (value, or Q fn), and
    • action probabilities

GAN org-noter pdfs

Karras et al_2018_Progressive Growing of GANs for Improved Quality, Stability, and Variation.pdf

2 progressive growing of gans

related work, alternative architectures, but not quite the same

3 increasing variation using minibatch standard deviation

Wang et al. - 2018 - Evolutionary Generative Adversarial Networks.pdf

Skeleton

1 Introduction
2 Related Works
2.1 Generative Adversarial Networks
2.2 Evolutionary Algorithms
3 Method
3.1 Generative Adversarial Networks
3.2 Evolutionary Algorithm
3.3 Mutations
3.3.1 Minimax mutation
3.3.2 Heuristic mutation
3.3.3 Least-squares mutation
3.4 Evaluation
3.5 E-GAN
4 Experiments
4.1 Implementation Details
4.2 Synthetic Datasets and Mode Collapse
4.3 CIFAR-10 and Inception Score
4.4 LSUN and Architecture Robustness
4.5 CelebA and Space Continuity
5 Conclusion

Kumar et al. - 2017 - Semi-supervised Learning with GANs Manifold Invar.pdf

Skeleton

1 Introduction
2 Semi-supervised learning using GANs
2.1 Estimating the tangent space of data manifold
2.1.1 Training the inverse mapping (the encoder)
2.1.2 Estimating the dominant tangent space
2.2 Injecting invariances into the classifier using tangents
2.3 GAN discriminator as the classifier for semi-supervised learning: effect of fake examples
3 Experiments
4 Discussion
A Tangent Plots
B Reconstruction Plots

Workshops & Meetups

webinars | meetups

Explainability of AI

ACM webinar - failed

The Bayesian Zig Zag: Developing Probabilistic Models Using Grid Methods and MCMC

PipelineAI webinar

  • [ ] remember part2 of this talk with tpus.
  • multi-armed bandit (traffic routing?)
  • injectable functions?
  • offline/online(production)
  • all to docker image -> sagemaker, cloud, personal premises, etc
  • recorded
  • tensorflow + etl engine, batch
  • nvlink 16-32 gpus now, switch
    • xla libs (in tensorflow): a cost optimizer to fuse layers/operations
  • hiring, jr/sr, san fran incubator - his house, 6mths mentor, nice part of san fran

AI computer vision TMLS

ACM webinar: Project Jupyter: From Computational Notebooks etc

ACM webinar: Project Jupyter: From Computational Notebooks to Large Scale Data Science with Sensitive Data

PipelineAI webinar II

Explainable ML in Healthcare ACM webinar

  • Good points:
    • explanation vs justification
    • explanation vs causality
      • not prescriptive
  • decisions in healthcare
    • heuristics majority
    • rules based system
    • ML based system
    • # of factors in diagnosis
  • [Sculley 2015], ML code is small part of ML in healthcare
  • Q: At slide x, what do the abbreviations LOS, RoR, SSI, EF stand for? (slide with healthcare utilization) A: LOS - Length of Stay, RoR - Risk of Readmission, SSI - Surgical Site Infection, EF - Ejection Fraction (cardiac)
  • [ ] Transparent
    • Falling Rule Lists
    • GAM (Generalized Additive Models)
    • GA2M (Generalized Additive Models with
    • LIME (Local Interpretable Model-agnostic Explanations)
    • Naïve Bayes
    • Regression Models
    • Shapley Values
  • semi - shallow ensembles
  • non-transparent
    • deep learning
    • SVM
    • gradient boosting models
  • transparency, fidelity, trust
  • how to validate explanations?

Dave DeepMind 2

pipeline.ai talk

import fairing mlflow knyfe, pycuda, ipykernel, kanren, requests, pytorch-cpu torchvision-cpu -c pytorch, accimage, html5lib, Hy, BeautifulSoup4, libgcc-ng

DLRL

DLRL workshop #1 Intro to GANs [0/0]

  • NOTES:
    • Xiyang’s notebook in my gan-explorations repo has dcgan and conditional dcgan, not sagan
    • Conditional GANs | Kaggle, nice ref

Meeting 1 [2018-09-01]

  • full attendance
  • Paper Read
    1. Intro
      1. [ ] generative models broadly?
    2. Related work
      1. [ ] why are markov chains needed in previous ones?
      2. [ ] VAEs
    3. Adversarial Nets
      1. [ ] train on data period first, then compete?
      2. [ ] scores are errors?
      3. [ ]
      4. cross entropy
        1. z ~ uniform()
    4. Theoretical results
    5. Experiments
      1. [ ] Gaussian Parzen window?
    6. Pros Cons
      • [ ] helvetica scenario?
      • [ ] negative chain boltzmann machine
      • markov chains, inference, not needed
        • Markov chains need blurry distributions for the chains to mix between modes.
        • this can represent sharp, even degenerate distributions
    7. Conclusion
      1. learned approximate inference
      2. variational inference
      3. MCMC inference
      4. AIS?
      5. Parzen density?

Meet 2 [2018-09-08 Sat]

  • Lecture 13 | Generative Models - YouTube
    • for G, a better objective is to maximize log D(G(z)) (i.e. maximize D getting it wrong), instead of minimizing log(1 - D(G(z)))
    • Wasserstein GAN supposed to avoid issue with balancing training between G and D
    • Tips:
      • replace any pooling layers with strided convs (D), and fractional-strided convs (G)
      • use batchnorm in both G and D
      • remove fully connected hidden layers for deeper architectures
      • use relu in G for all layers, output use Tanh
      • use leakyRelu in D for all layers
    • active research:
      • better loss functions, more stable training (Wasserstein, LSGAN, etc)
      • conditional GANs
      • all kinds of applications
    • current active generative models research
      • PixelRNN and PixelCNN
        • explicit density model
        • optimizes exact likelihood
        • good samples
        • inefficient sequential generation
      • VAE
        • optimize variational lower bound on likelihood
        • useful latent representation
        • inference queries
        • samples not great
      • GANs
        • game-theoretic approach, best samples
        • tricky & unstable to train
        • no inference queries
      • recent work to also combine the above
  • Goodfellow Tutorial 2016
  • Generating Pokemon with a Generative Adversarial Network - YouTube
    • DCGAN deep convolutional GAN, 1st improvement
      • batchnorm must for both
      • avoid fully connected hidden units
      • avoid pooling, simply stride the conv (or capsule)
      • relu like other guides
      • use this for baseline comparison, esp for non-simple datasets
    • CGANs conditional GANs
      • concatenate same y’ input to both z (for G), and x (for D), ie. text labels for that neat trick
    • Wasserstein
      • improve loss fn, eg. when to stop?
      • highest training stability
      • informative & interpretable loss fn
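
A small numeric check of the "maximize log D(G(z))" note from the lecture above. This is only a sketch: it assumes D ends in a sigmoid, so D = sigmoid(a) for a logit a, and the function names are my own, not from the lecture or any repo.

```python
import numpy as np

def generator_losses(d_fake, eps=1e-12):
    """d_fake: discriminator outputs D(G(z)) as probabilities in (0, 1)."""
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    saturating = np.mean(np.log(1.0 - d_fake))   # original form: minimize log(1 - D(G(z)))
    non_saturating = -np.mean(np.log(d_fake))    # practical form: minimize -log D(G(z))
    return saturating, non_saturating

def loss_gradients_wrt_logit(d_fake):
    """Gradients of both losses w.r.t. the logit a, assuming D = sigmoid(a).

    d/da log(1 - sigmoid(a)) = -D
    d/da (-log sigmoid(a))   = -(1 - D)
    """
    d_fake = np.asarray(d_fake, dtype=float)
    return -d_fake, -(1.0 - d_fake)

# Early in training D confidently rejects fakes, so D(G(z)) is close to 0.
grad_sat, grad_ns = loss_gradients_wrt_logit(np.array([0.01]))
print(grad_sat, grad_ns)
```

When D(G(z)) is near 0 the saturating gradient is near 0 as well, while the non-saturating one stays near -1, which is the vanishing-gradient argument in the notes.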

SAGAN paper notes

  • [2018-09-11 Tue]
    • Intro
      • Scores
        • Inception score
        • Frechet Inception distance
      • ImageNet dataset

sagan part

  • SAGAN part
    • image features –> 2 weighted feature spaces f,g
    • β’s between different regions of f,g
  • optimize params: w_f, w_g, w_h,
  • derived: βj,i, o_i,
    • β - NxN attention map
    • i,j ∈ {1..N}
# f w array
# g w array
# softmax f(xi)^T * g(xj)
  • γ - training hyperparameter
  • hinge loss
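
To make the f/g/β notes above concrete, here is a NumPy sketch of the attention step as I read it from the paper: f(x) = w_f x, g(x) = w_g x, h(x) = w_h x, β_{j,i} = softmax over i of f(x_i)^T g(x_j), o_j = Σ_i β_{j,i} h(x_i), output = γ o + x. The shapes and the function name are assumptions for illustration; the real layers use 1x1 convolutions on feature maps, which amounts to the same matrix products once the spatial dimensions are flattened to N locations.

```python
import numpy as np

def self_attention(x, w_f, w_g, w_h, gamma):
    """x: (C, N) features at N locations; w_f, w_g: (C_bar, C); w_h: (C, C)."""
    f = w_f @ x                               # (C_bar, N)
    g = w_g @ x                               # (C_bar, N)
    h = w_h @ x                               # (C, N)
    s = f.T @ g                               # s[i, j] = f(x_i)^T g(x_j), shape (N, N)
    s = s - s.max(axis=0, keepdims=True)      # subtract column max for a stable softmax
    e = np.exp(s)
    beta = e / e.sum(axis=0, keepdims=True)   # beta[i, j] = β_{j,i}; each column sums to 1
    o = h @ beta                              # o[:, j] = Σ_i β_{j,i} h(x_i)
    return gamma * o + x, beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))                   # C = 4 channels, N = 6 locations
y, beta = self_attention(x,
                         rng.normal(size=(2, 4)),
                         rng.normal(size=(2, 4)),
                         rng.normal(size=(4, 4)),
                         gamma=0.0)
print(beta.shape, np.allclose(y, x))
```

With γ = 0 the layer is the identity, so the network can start from plain convolutional features and learn how much attention to mix in.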

Q’s

  1. [X] with no pooling, how to reduce dims btw conv layers?
    • 1x1 convs?
    • batchnorm?

GAN stabilization

  • spectral normalization (G and D)
    • TTUR (two-timescale update rule)
  • imbalanced learning rate
    • TTUR separate learning rates
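
A rough NumPy sketch of what spectral normalization computes, to check my understanding: estimate the largest singular value of the (flattened) weight by power iteration and divide the weight by it. The function name and iteration counts are my own choices; real implementations keep the vector u between training steps and run a single iteration per step.

```python
import numpy as np

def spectral_norm_estimate(w, n_iter=50, seed=0):
    """Estimate the largest singular value of w by power iteration.

    w is flattened to 2-D as (-1, out_channels), like a conv kernel
    of shape [ks, ks, in_channels, out_channels].
    """
    w2d = w.reshape(-1, w.shape[-1])
    rng = np.random.default_rng(seed)
    u = rng.normal(size=w2d.shape[1])
    for _ in range(n_iter):
        v = w2d @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w2d.T @ v
        u /= np.linalg.norm(u) + 1e-12
    return float(v @ w2d @ u)                 # sigma ≈ v^T W u

kernel = np.random.default_rng(1).normal(size=(3, 3, 8, 16))
sigma = spectral_norm_estimate(kernel, n_iter=500)
kernel_sn = kernel / sigma                    # normalized kernel, largest singular value ≈ 1
print(sigma, np.linalg.svd(kernel_sn.reshape(-1, 16), compute_uv=False)[0])
```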

Study session [2018-09-22 Sat]

  • for Self-Attention-GAN-Tensorflow repo, changed default dataset to mnist, from celebA (only 3 pics in there)
Papers/sagan repo

For MNIST

  • generator
    • layers = 8-3 = 5
      • 1 layer 1024 channels
      • 3 conv layers (1024, 512, 256)
      • attention layer (128)
      • 2 conv layers (128, 256)
      • 1 conv layer sigmoid
  • discriminator
    • layers = 8-3 = 5
      • 1 layer 64 channels
      • 3 conv layers (64, 128, 256)
      • attention layer (256 ch)
      • 2 conv layers (256, 512)
      • 1 conv layer (4), flatten
      • 1 dense layer sigmoid

SAGAN paper org-noter

Skeleton

1 Introduction
2 Related Work
3 Self-Attention Generative Adversarial Networks
4 Techniques to stabilize GAN training
4.1 Spectral normalization for both generator and discriminator
  • not just D
  • every layer for both
4.2 Imbalanced learning rate for generator and discriminator updates
  • to compensate for slow learning since D has regularization applied
5 Experiments
evaluation metrics:
  • IS (Inception Score)
    • KL divergence btw conditional class and marginal class.
    • higher better
    • has problems
  • FID (Frechet Inception Distance) is a more principled and comprehensive metric, and has been shown to be more consistent with human evaluation in assessing the realism and variation of the generated samples
    • Wasserstein-2 distance between generated and real images in the feature space of an Inception-v3 network.
    • lower values mean closer distances between synthetic and real data distributions
Network structure & implementation
  • 128 x 128 images
  • spec-norm every layer on both G and D
  • conditional batch normalization for G, and projection type for D.
  • Adam optimizer:
    • beta1 = 0, beta2 = 0.9
    • learn rate for D = 0.0004
    • learn rate for G = 0.0001
  • SAGAN uses conditional batch normalization in the generator and projection in the discriminator.
5.1 Evaluating the proposed stabilization techniques.
5.2 Self-attention mechanism.
  • attention mid to late layer is best
  • both G and D
  • complements convolution, which is strong in modeling local dependencies
visualize attention maps
  • We observe that the network learns to allocate attention according to similarity of color and texture, rather than just spatial adjacency.
5.3 Comparison with the state-of-the-art
6 Conclusion
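
For the FID metric noted in section 5, a sketch of the Fréchet (Wasserstein-2) distance between two Gaussians fitted to feature vectors: ||μ_r - μ_f||² + Tr(Σ_r + Σ_f - 2 (Σ_r Σ_f)^{1/2}). The feature arrays here are random stand-ins; in the real metric they would be Inception-v3 activations, which this sketch does not compute. The function name is hypothetical.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """feats_*: arrays of shape (num_samples, D) with D >= 2."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr((cov_r cov_f)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # PSD inputs; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
print(frechet_distance(real, real))        # identical features: close to 0
print(frechet_distance(real, real + 3.0))  # pure mean shift: close to 4 * 3**2
```

Lower is better, matching the note above: identical distributions give a distance near zero.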

Meet 3

  • Xiyang recommended references
  • Dave Macdonald RangleIO guy trying to get ML going
    • 1 month? presentation
  • Q’s
    • regions = ? (ith and jth), pixel, some arbitrary rectangle
    • embedding space? (non-local paper. embedded gaussian sec) dimension reduction?
      • you don’t need softmax constraint? absolute value can …?
  • [ ] hinge loss could use followup, add to doc
  • spectral normalization (see paper)
    • lipschitz condition
    • compute eigenvalues of W^T W; the spectral norm is the sqrt of the highest, but
    • [ ] followup
  • [ ] measures:
    • FID

Repo: How to Train a GAN? Tips and tricks to make GANs work

https://github.com/soumith/ganhacks

While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.

Here is a summary of some of the tricks.

[Here’s a link to the authors of this document](#authors)

If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.

1. Normalize the inputs
  • normalize the images between -1 and 1
  • Tanh as the last layer of the generator output
2: A modified loss function

In GAN papers, the loss function to optimize G is `min (log 1-D)`, but in practice folks practically use `max log D`

  • because the first formulation has vanishing gradients early on
  • Goodfellow et al. (2014)

In practice, works well:

  • Flip labels when training generator: real = fake, fake = real
3: Use a spherical Z
  • Don’t sample from a Uniform distribution

![cube](images/cube.png "Cube")

  • Sample from a gaussian distribution

![sphere](images/sphere.png "Sphere")

4: BatchNorm
  • Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
  • when batchnorm is not an option use instance normalization (for each sample, subtract mean and divide by standard deviation).

![batchmix](images/batchmix.png "BatchMix")

5: Avoid Sparse Gradients: ReLU, MaxPool
  • the stability of the GAN game suffers if you have sparse gradients
  • LeakyReLU = good (in both G and D)
  • For Downsampling, use: Average Pooling, Conv2d + stride
  • For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
6: Use Soft and Noisy Labels
  • Label Smoothing, i.e. if you have two target labels: Real=1 and Fake=0, then for each incoming sample, if it is real, replace the label with a random number between 0.7 and 1.2, and if it is a fake sample, replace it with a random number between 0.0 and 0.3 (for example).
    • Salimans et al. 2016
  • make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
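
A sketch of trick 6 in NumPy, to see what the targets look like. The ranges 0.7 to 1.2 and 0.0 to 0.3 come from the text above; the 5% flip probability and the function name are my own assumptions.

```python
import numpy as np

def soft_noisy_labels(is_real, rng, flip_prob=0.05):
    """Return discriminator targets with label smoothing and occasional flips.

    is_real: boolean array, True for real samples and False for generated ones.
    flip_prob: chance of swapping a sample's label (assumed value, tune as needed).
    """
    is_real = np.asarray(is_real, dtype=bool)
    flipped = is_real ^ (rng.random(is_real.shape) < flip_prob)   # noisy labels
    real_targets = rng.uniform(0.7, 1.2, size=is_real.shape)      # soft "real"
    fake_targets = rng.uniform(0.0, 0.3, size=is_real.shape)      # soft "fake"
    return np.where(flipped, real_targets, fake_targets)

rng = np.random.default_rng(0)
print(soft_noisy_labels([True, True, False, False], rng))
```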
7: DCGAN / Hybrid Models
  • Use DCGAN when you can. It works!
  • if you can’t use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN
8: Use stability tricks from RL
  • Experience Replay
    • Keep a replay buffer of past generations and occasionally show them
    • Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
  • All stability tricks that work for deep deterministic policy gradients
  • See Pfau & Vinyals (2016)
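
One possible shape for the "replay buffer of past generations" idea, as a plain-Python sketch. The class name, capacity and 50% replay probability are assumptions of mine, not something the document specifies; the samples can be any objects, e.g. generated image batches.

```python
import random

class ReplayBuffer:
    """Keeps past generator outputs and occasionally returns an old one."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def push_and_sample(self, sample, p_replay=0.5):
        # While the buffer is filling up, store the sample and show it as-is.
        if len(self.items) < self.capacity:
            self.items.append(sample)
            return sample
        # Once full: with probability p_replay show D an old sample and
        # store the new one in its place, otherwise show the new sample.
        if self.rng.random() < p_replay:
            idx = self.rng.randrange(self.capacity)
            old = self.items[idx]
            self.items[idx] = sample
            return old
        return sample

buf = ReplayBuffer(capacity=2, seed=0)
for fake in ["gen_0", "gen_1", "gen_2", "gen_3"]:
    print(buf.push_and_sample(fake))
```

The discriminator would then be trained on whatever `push_and_sample` returns instead of always on the newest generated sample.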
9: Use the ADAM Optimizer
  • optim.Adam rules!
    • See Radford et al. 2015
  • Use SGD for discriminator and ADAM for generator
10: Track failures early
  • D loss goes to 0: failure mode
  • check norms of gradients: if they are over 100 things are screwing up
  • when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
  • if loss of generator steadily decreases, then it’s fooling D with garbage (says martin)
11: Don’t balance loss via statistics (unless you have a good reason to)
  • Don’t try to find a (number of G / number of D) schedule to uncollapse training
  • It’s hard and we’ve all tried it.
  • If you do try it, have a principled approach to it, rather than intuition

For example

```
while lossD > A:
  train D
while lossG > B:
  train G
```

12: If you have labels, use them
  • if you have labels available, train the discriminator to also classify the samples: auxiliary GANs
13: Add noise to inputs, decay over time
14: [notsure] Train discriminator more (sometimes)
  • especially when you have noise
  • hard to find a schedule of number of D iterations vs G iterations
15: [notsure] Batch Discrimination
  • Mixed results
16: Discrete variables in Conditional GANs
  • Use an Embedding layer
  • Add as additional channels to images
  • Keep embedding dimensionality low and upsample to match image channel size
17: Use Dropouts in G in both train and test phase

Authors

  • Soumith Chintala
  • Emily Denton
  • Martin Arjovsky
  • Michael Mathieu

Deep Learning with Generative Adverserial Networks – ICLR 2017 Discoveries

Deep Learning with Generative Adverserial Networks – ICLR 2017 Discoveries - https://amundtveit.com/2016/11/12/deep-learning-with-generative-and-generative-adverserial-networks-iclr-2017-discoveries/

2018-05-07 Monday
  • Improved Techniques for Training GANs - I Goodfellow 2016
  • Wasserstein GAN
  • On the regularization of Wasserstein GANs | OpenReview
  • Training GANs with Optimism
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation | OpenReview

Meet 4

no notes

[.] Meet 5

  • try with celebA dataset:
    • [-] sthalles repo
      • [X] trying his dcgan first.
      • [ ] get working
    • [ ] Xiyang code?
    • [ ] hhhhhao/paper repo
      • [ ] mod to accept new data
      • [ ] test run
      • [ ] add spectral normalization to hhhhhhhao/paper repo
  • [ ] get celebA in tfFrames
    • [ ] what dim changes to get code to run these?
    • [ ]

[.] DLRL sagan attention layer

implement attention layer on top of base repo Werner tensorflow gan base start

Meet7

  • 128x128 batch 8 to 16
  • bigger batches better for both G and D, better gradients and ?? something else Dave said
  • recommended Deep Residual Learning for Image Recognition paper
  • Dave:
    • cycle GANs intro
    • doodles = a source distribution (besides real images)
      • his generated doodles end up monochrome - why?
    • synthetic data used - horizontal flip, random rotation; since dataset fairly small
      • image batch readers on repeat (actual method)
    • different design ideas decisions
    • eve optimizer didn’t work well with GANs
  • spectralnorm
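import tensorflow as tf  # TF 1.x API (tf.variable_scope, tf.get_variable)
from tensorflow.keras.layers import LeakyReLU  # assumed source of LeakyReLU

def conv(o, channels, ks=3, strides=1, norm=None, padding='SAME', name=None):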

    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):

        if norm is not None:
            o = norm(o, name)

        o = LeakyReLU() (o)

        in_channels = o.get_shape()[-1]

        w = tf.get_variable("kernel", shape=[ks, ks, in_channels, channels], initializer=tf.keras.initializers.he_uniform())
        b = tf.get_variable("bias", [channels], initializer=tf.constant_initializer(0.0))


        o = tf.nn.conv2d(o, spectral_norm(w, name_prefix="w"), [1, strides, strides, 1], padding) + b
    return o
# https://github.com/taki0112/Spectral_Normalization-Tensorflow/blob/master/spectral_norm.py
def spectral_norm(w, iteration=1, name_prefix=""):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])

    u = tf.get_variable(name_prefix+"u", [1, w_shape[-1]], initializer=tf.random_normal_initializer(), trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):
        # power iteration; usually iteration = 1 will be enough

        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)

        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)


    return w_norm
  • softened hinge-loss is better for cycle-GAN (Dave)
    • Softened hinge loss objectives for Generator and Discriminator:
fake_term = tf.reduce_mean(tf.nn.softplus( fake * SCALE + OFFSET))
real_term = tf.reduce_mean(tf.nn.softplus(-real * SCALE + OFFSET))
gen_term  = tf.reduce_mean(tf.nn.softplus(-fake * SCALE + OFFSET))
  • hinge-loss gradients smaller (but nicer?)
  • Dave found 10x diff in learning rate of cycle-GAN high
    • L1 loss on pixels themselves -> a strong signal
    • large gradients
    • loss with cyclic check…
    • but with hinge-loss gradients are limited to 1 so then not a problem
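The point about bounded gradients can be checked numerically. A small sketch, assuming `SCALE = 1.0` and `OFFSET = 0.0` (the notes do not give their values): the derivative of softplus is the sigmoid, so the gradient of each softened hinge term with respect to a discriminator logit never exceeds `SCALE`, however wrong the logit is.

```python
import math

SCALE = 1.0   # assumed value; not given in the notes
OFFSET = 0.0  # assumed value; not given in the notes


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fake_term_grad(fake_logit):
    """d/d(fake) of softplus(fake * SCALE + OFFSET) = SCALE * sigmoid(...)."""
    return SCALE * sigmoid(fake_logit * SCALE + OFFSET)


if __name__ == "__main__":
    for logit in (-50.0, -1.0, 0.0, 1.0, 50.0):
        print(logit, softplus(logit * SCALE + OFFSET), fake_term_grad(logit))
```

By contrast, an L1 pixel loss contributes a gradient of magnitude 1 per pixel, which adds up over a whole image. That may be one reason the cycle-GAN terms needed such different learning rates.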

Meet 8,9?

NIPS 2016 Tutorial Generative Adversarial Network

Skeleton

1 Why study generative modeling?
2 How do generative models work? How do GANs compare to others?
2.1 Maximum likelihood estimation
2.2 A taxonomy of deep generative models
2.3 Explicit density models
2.3.1 Tractable explicit models
2.3.2 Explicit models requiring approximation
2.4 Implicit density models
2.5 Comparing GANs to other generative models
3 How do GANs work?
3.1 The GAN framework
3.2 Cost functions
3.2.1 The discriminator’s cost, J(D)
3.2.2 Minimax
3.2.3 Heuristic, non-saturating game
3.2.4 Maximum likelihood game
3.2.5 Is the choice of divergence a distinguishing feature of GANs?
3.2.6 Comparison of cost functions
3.3 The DCGAN architecture
3.4 How do GANs relate to noise-contrastive estimation and maximum likelihood?
4 Tips and Tricks
4.1 Train with labels
4.2 One-sided label smoothing
4.3 Virtual batch normalization
4.4 Can one balance G and D?
5 Research Frontiers
5.1 Non-convergence
5.1.1 Mode collapse
5.1.2 Other games
5.2 Evaluation of generative models
5.3 Discrete outputs
5.4 Semi-supervised learning
5.5 Using the code
5.6 Developing connections to reinforcement learning
6 Plug and Play Generative Networks
7 Exercises
7.1 The optimal discriminator strategy
7.2 Gradient descent for games
7.3 Maximum likelihood in the GAN framework
8 Solutions to exercises
8.1 The optimal discriminator strategy
8.2 Gradient descent for games
8.3 Maximum likelihood in the GAN framework
9 Conclusion

RL meet #2

[x] [#B] prep RL meet #2 tomorrow

  • watch videos from slack /rl
  • review last week

RL meet #3

  • Review:
    • MDP (Markov decision process), actual costs, actions & inputs
    • agent approximations of above
    • state, value function, value of state
      • oracle provides actual values (π)
      • state value fn – v^(s,w)
    • action value function q_pi actual, q^(s,a,w)
      • s = state, a = action, w = weights of net
    • RL problems do not have oracle information about the actual π values of either function; we must estimate them via environment interactions
    • 1st strategy: MC learning
    • [ ] notes, Qs
  • [ ] why is network used to get reward value with TD,q-learning?
    • ie why in the target valuation?
  • [ ] how to keep track of things?
  • [ ] use Deep Q-learning doc on Gdrive to post Q’s and rough work
  • NO, traffic - javascript based, old. also NO on cart-pole? atari gym
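On the open question of why the network shows up in the target for TD and Q-learning: a Monte Carlo target waits for the episode to end and uses the full observed return, while a TD target looks one step ahead and fills in the rest of the return with the current estimate (bootstrapping). A minimal sketch with plain numbers standing in for the network outputs v^(s,w) and q^(s,a,w); the function names are illustrative only.

```python
def mc_return(rewards, gamma):
    """Monte Carlo target: discounted sum of all rewards until episode end."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward, next_state_value, gamma, done):
    """TD(0) target: one real reward plus the estimated value of the next state."""
    return reward if done else reward + gamma * next_state_value


def q_learning_target(reward, next_q_values, gamma, done):
    """Q-learning target: bootstrap from the best estimated action value."""
    return reward if done else reward + gamma * max(next_q_values)


if __name__ == "__main__":
    print(mc_return([1.0, 1.0, 1.0], gamma=0.5))
    print(td_target(1.0, next_state_value=2.0, gamma=0.5, done=False))
    print(q_learning_target(0.0, [1.0, 3.0], gamma=0.5, done=False))
```

In deep Q-learning `next_q_values` would come from the network itself (or a frozen copy), which is why the network appears on both sides of the update.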

RL meet #4 & 5

  • missed #5, first miss

graph NN meet 3

attended. links:

  • layers.py - tkipf/gcn - Sourcegraph
  • utils.py - tkipf/pygcn - Sourcegraph
  • train.py - tkipf/pygcn - Sourcegraph
  • Sparse matrices (scipy.sparse) — SciPy v1.2.1 Reference Guide
  • scipy.sparse.coo_matrix — SciPy v1.2.1 Reference Guide
  • scipy.sparse.eye — SciPy v1.2.1 Reference Guide
  • scipy.sparse.diags — SciPy v1.2.1 Reference Guide
  • pygcn/data/cora at master · tkipf/pygcn
  • main.py - pytorch/examples - Sourcegraph
  • [[https://aisc.a-i.science/events/2019-03-27/][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | Lunch & Learn | A.I. Socratic Circles (#AISC)]]
  • [[https://www.youtube.com/watch?v=eEs-qXs_9Dc][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | AISC Lunch & Learn - YouTube]]
  • 0711.0189.pdf
  • networkx.generators.random_graphs.barabasi_albert_graph — NetworkX 2.3rc1.dev20190329133857 documentation
  • AlxndrMlk/Barabasi-Albert_Network: Barabási–Albert Network. A Step-by-Step Model with Visualizations created in Python 3.
  • algorithm - Python: implementing a step-by-step, modified Barabasi-Albert model for scale-free networks - Stack Overflow
  • python-igraph manual
  • GraphSAGE
  • [[https://arxiv.org/abs/1706.02216][[1706.02216] Inductive Representation Learning on Large Graphs]]
  • NetworkX — NetworkX
  • Beyond Grids: Learning Graph Representations for Visual Recognition
  • Papers With Code : Search for graph convolution

DLRL may18-19

[2019-05-25 Sat] python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --iterations 100000

python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --iterations 500

python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300 --dec_lr=10000 --save_interval 10 --iterations 500

DLRL BERT June 2019

  • meet1 no notes, can’t remember what was said
  • meet2 [2019-06-22 Sat 14:58]
    • trying to understand BERT

BERT meet 3

Other links from previous meets

https://tfhub.dev/google/

AISC

AISC Talks

Speaker List

Toronto Deep Learning Series (TDLS) TDLS speaker list - Google Sheets

AISC graph-nn-research @ TDLS [1/1]

  • read papers - we are looking at landscape what’s out there

[x] graph-nn read Kipf & Welling paper

for Xiyangs graph-nn group

graph neural nets info

Oriol Vinyals on Twitter: “Graph Neural Networks / Relational Networks are models worth studying. We wrote a pretty comprehensive review about them which I hope you will find helpful (code forthcoming!). https://t.co/D46XCkUIeb… https://t.co/RxvY4yZGHe”

ICLR 2018 report Quora in AcademicLog

ICLR 2018 report Quora i [12] Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis [13] [1711.00740] Learning to Represent Programs with Graphs [14] [1802.03691] Tree-to-tree Neural Networks for Program Translation

ICLR 2018 report Quora ii [13] [1711.00740] Learning to Represent Programs with Graphs [19] https://tkipf.github.io/graph-co… [19b] [1712.00268] Deformable Shape Completion with Graph Convolutional Autoencoders [20] Graph Attention Networks [21] https://www-cs.stanford.edu/grou… [22] [1711.04043] Few-Shot Learning with Graph Neural Networks

links

  • 1609.02907 Semi-Supervised Classification with Graph Convolutional Networks
  • 1706.02216 Inductive Representation Learning on Large Graphs
  • 1710.10903 Graph Attention Networks
  • 1611.08402 Geometric deep learning on graphs and manifolds using mixture model CNNs
  • Graph Convolutional Networks | Thomas Kipf | PhD Student @ University of Amsterdam
  • 1801.10247 FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
  • [[https://www.google.com/search?q=%5B13%5D+%5B1711.00740%5D+Learning+to+Represent+Programs+with+Graphs&oq=%5B13%5D+%5B1711.00740%5D+Learning+to+Represent+Programs+with+Graphs&aqs=chrome..69i57.651j0j7&sourceid=chrome&ie=UTF-8][[13] [1711.00740] Learning to Represent Programs with Graphs - Google Search]]
  • DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning/dl_01 Introduction to Machine Learning Based AI.pdf at master · enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning
  • [[https://arxiv.org/abs/1810.09202][[1810.09202] Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation]]
  • Graph Neural Network - YouTube
  • Graph Neural Networks - YouTube
  • Graph Convolution Learning - YouTube
  • Xavier Bresson: “Convolutional Neural Networks on Graphs” - YouTube
  • williamleif/GraphSAGE: Representation learning on large graphs using stochastic graph convolutions.
  • matenure/FastGCN: The sample codes for our ICLR18 paper “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling”

meet 1

  • Laplacian Operator core of spectral graph theory
  • Q: fixed input size?
    • number of vertices
    • and/or edges
  • K&W,
    • X - feature matrix of nodes
    • A - adjacency matrix
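The two inputs listed for K&W feed a single propagation rule, H = ReLU(D^-1/2 (A + I) D^-1/2 X W), which is the layer from the Kipf & Welling paper. A minimal NumPy sketch of one such layer; the toy 3-node path graph and the random weights are assumptions for illustration.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the renormalized adjacency."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # degree of each node incl. self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, X, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)


if __name__ == "__main__":
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
    X = np.eye(3)                                # one-hot node features
    W = np.random.default_rng(0).normal(size=(3, 2))
    print(gcn_layer(normalize_adjacency(A), X, W))
```

On the fixed-input-size question: `W` has shape (features, hidden) and does not depend on the number of vertices or edges, so the same weights can be applied to graphs of different sizes as long as the per-node feature dimension matches.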

1709.05429.pdf, Research - Hector Zenil, Complexity Explorer,

papers/info for GNN meet

  • 1812.04202.pdf
  • 1806.01261 Relational inductive biases, deep learning, and graph networks
  • 1811.05868 Pitfalls of Graph Neural Network Evaluation
  • 1812.08434 Graph Neural Networks: A Review of Methods and Applications
  • Meet Deep Graph Library, a Python Package For Graph Neural Networks

AISC Learnability can be undecidable

Learnability can be undecidable | Main Stream | A.I. Socratic Circles (#AISC)

  • organic, connection, psych, bio-unrealistic
  • computationalism
  • Hinton PDP 1st book
  • bio neurons also fire randomly
  • bio/psych qualitative more than quantitative
  • anatomy of learning – levels
  • focus on the function only
    • think a crypto function, it is not learnable
    • connection between computational complexity and learnability? Yes
    • applies to QM computing, since QM comp is still a function
    • bare minimum, a function that learns.
    • start with set X
      • family of
  • inductive argument for most papers, inductive assumption, that their algo works for a certain set of problems. try to convince generalizability by testing on samples from the problem space.
    • this paper deductive
  • [ ] finitely supported def

AISC abstract night

  • Yaxal - temporal pattern attention for multivariate time series forecasting
  • Ramya B - time series forecasting based on wavelet decomposition and feature extraction
    • not stationary signals.
    • wavelet for SP, not fourier nowadays?
    • cascade correlation algo used
    • also use PCA
  • Albert Lai - captchas?
  • a closer look at spatiotemporal convolutions for action recognition - Owen Ho
    • a resnet (how?), 3D instead of 2D
    • video recognition
  • Florian Goebels - A generic framework for privacy preserving deep learning
    • federated learning, interesting.
  • a reinforcement learning framework for explainable recommendation - Omar Nada
  • Jiri Stodulka - Deep RL based recommendation with explicit user-item interactions modeling
  • Jorge Lopez - predicting crime using twitter and kernel density estimation

AISC Unsupervised Data Augmentation

AISC L&L Private machine learning in tensorflow with secure computing

  • Morten Dahl Dropout Labs
  • homomorphic encryption
  • secret sharing
  • TFE
    • underneath, 3rd party libs: TEE, HE, MPC
  • 1,2 orders of magnitude slower
  • uses current TF distributed code for communication (orchestration)
    • tf.device
  • collaboration with OpenMined for pytorch – more federated. they are not so focused on federation
    • can access OpenMined code from tfe
  • ie. costs - ReLU uses a comparison which is expensive for encryption
    • can try to approximate which could impact accuracy
  • future
    • ethical issues, privacy,

(TF-Encrypted) Private machine learning in tensorflow with secure computing | Lunch & Learn | A.I. Socratic Circles (#AISC) 1810.08130 Private Machine Learning in TensorFlow using Secure Computation tf-encrypted/tf-encrypted: A Framework for Machine Learning on Encrypted Data

AISC The Neuro-Symbolic Concept Learner

Interpreting Scenes, Words & Sentences From Natural Supervision
  • language semantic parsing -> hierarchical program
  • visual concept annotation and program annotation –> symbolic reasoning module
  • curriculum learning
  • root then query, then filter for program
    • once object is identified, it is filtered out
  • multiple candidate programs for sentence / concept are sampled
    • reinforcement with
  • bidirectional GRU encoder for
    • concept decoder (hard coded)
    • algo1 string to tree semantic parser
    • think of it as a recursive algorithm as it needs to be re-called
    • 2 separate GRU cells (hardcoded), for given function, it can call 2 other functions
  • once objects recognized and program generated (semantic parsing), then symbolic reasoning
  • parts:
    • from pic
      • object detection
      • feature extraction
      • concept box
    • text
      • semantic parsing
      • concept embeddings
      • program box
  • off-policy search process for program selection (semantic parsing) (the reasoning process?)
    • how is reward determined? updates weights once correct answer is found??
  • attribute - shape, concept - sphere, etc
    • different embeddings for different concepts
    • but voc vector represent attributes
    • embeddings space is hard-coded, but within space is learned
  • curriculum learning – what exactly?
    • stupidly simple to start,
  • mask r-cnn, resnet are pretrained
    • concept embeddings, semantic parsing are trained, as is the neuro-symbolic reasoning (NSR)
      • runs fns over objects and embeddings (RL part)
      • semantic parser is trained on RL, not others (those are backpropped)
  • NSR it is differentiable, this whole model is end-to-end apparently
    • RL is the GRUs? programs are built by the GRUs
    • if program gives correct answer reward = 1 otherwise = 0

AISC XLnet talk -empty

AISC Resnet talk [2019-08-12 Mon]

UoW: 2 types of features. machine intelligence course 5th lecture

Alice Rueda: Aggregating local image descriptors into compact codes

Alice Rueda: This is the VLAD paper

AISC L&L SciSci

Discussion lead: Santo Fortunato Motivation: Identifying fundamental drivers of science and developing predictive models to capture its evolution are instrumental for the design of policies that can improve the scientific enterprise—for example, through enhanced career paths for scientists, better performance evaluation for organizations hosting research, discovery of novel effective funding vehicles, and even identification of promising regions along the scientific frontier. The science of science uses large-scale data on the production of science to search for universal and domain-specific patterns. Here, we review recent developments in this transdisciplinary field.

slides:

  • scientometrics
    • 2 founders
    • one guy started WOS (web of science)
  • data
    • WOK (Web of Knowledge): Thomson Reuters
    • scopus elsevier
    • gscholar AI
    • MS academic graph, bigger than everyone else AI
  • H-index
  • c / c_0
  • interesting points prob that paper A cites older paper B
  • cite dynamics 3 paper specific parameters
    • preferential attachment - the more citations paper i already has, the more it gets cited
    • time decay survival prob (obsolescence)
    • intrinsic fitness η_i of the paper
  • teams
    • science is becoming more and more team science
    • team size is growing
    • team papers are more cited
    • Q team size affect type of contribution
      • yes. small teams disrupt, new ideas, while large teams develop existing ideas
      • disruption index
    • DL is popular
  • API
  • careers - scientists can peak at anytime in their life, there is no pattern
  • make queries on their static local dataset, generated own network
    • have dataset locally if its big
    • pubmed
    • dataset shared? invest with grants to get them.
    • pubmed, American physical society - free
  • word2vec vs node2vec
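For reference, the H-index mentioned in the slides is the largest h such that an author has h papers with at least h citations each. A short sketch; the citation counts in the example are made up.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # made-up citation counts
```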

AISC TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

Motivation: Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.

  • info on his startup and the scene
  • cylance hack: enable dynamic debugging
  • coverage guided fuzzing
  • property based testing
  • approximate nearest neighbour
  • CGF (fuzzing) hard to do for ANNs
  • tensorfuzz
    • send in NN graph, not code
    • images or text are ok
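The coverage-guided loop from the abstract can be sketched in a few lines. This is a toy illustration, not the TensorFuzz API: `toy_network` stands in for a real graph, the Gaussian mutation and the threshold are assumptions, and an exact linear scan replaces the approximate nearest-neighbour lookup the paper describes.

```python
import math
import random


def toy_network(x):
    """Stand-in for a real network: maps a 2-D input to an activation vector."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]


def is_new_coverage(activation, seen, threshold):
    """New coverage if no stored activation lies within `threshold`."""
    return all(math.dist(activation, s) > threshold for s in seen)


def fuzz(seed_input, violates, steps=1000, threshold=0.1, rng=None):
    """Mutate corpus members at random and keep mutants that add coverage."""
    rng = rng or random.Random(0)
    corpus = [list(seed_input)]
    seen = [toy_network(seed_input)]
    for _ in range(steps):
        parent = rng.choice(corpus)
        child = [v + rng.gauss(0.0, 0.5) for v in parent]
        activation = toy_network(child)
        if violates(activation):
            return child  # input that breaks the user-specified property
        if is_new_coverage(activation, seen, threshold):
            corpus.append(child)
            seen.append(activation)
    return None


if __name__ == "__main__":
    found = fuzz([0.0, 0.0], violates=lambda a: a[0] > 0.999)
    print("violating input:", found)
```

Because coverage is defined on activations rather than on code branches, the fuzzer only needs the network's graph and its outputs, which matches the note about sending in the NN graph rather than code.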

Ernie 2.0: A Continual Pre-Training Framework for Language Understanding | AISC

  • discussion points
    • not real ablation studies, they are retraining on prior tasks
    • not properly continuous learning?
    • Ehsan is bringing this up, thinks another paper should be done with ablation studies, and to do only sampling from prior tasks to see if catastrophic forgetting happens
    • how much improvement is architecture vs just more data and bigger?

we need cheap ASICS ASAP, except those with the budget will just get bigger yet

MLops LaL [2019-09-25 Wed 12:19]

  • reproducible, trackable, testable, maintainable
  • give higher view, you can customi
  • ML automates decision making
    • trading: buy, sell?
    • health: is there tumor?
    • market pricing?
  • 2012 Knight Capital story: lost $465mil in 45min
    • hidden tech debt in ML systems, sculley
  • prep data, build & train, deploy
  • [ ] we’ll be using azure – free credit

/home/will/Zotero/storage/69G8AVWR/Francois-Lavet et al. - 2018 - An Introduction to Deep Reinforcement Learning.pdf

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities [2019-10-08 Tue]

  • esp relevant to NLP
  • other methods to reduce complexity of softmax layer.
  • relates softmax bottleneck to matrix factorization bottleneck
  • something of a binary structure?? probability of nodes in a tree.
  • M vocab size, N reps different contexts (# words in training data)
  • dirichlet -> prior , when ppl want to sample discrete distributions
  • taylor expansions -> used when they know that the higher-order derivatives are small .. close to 0?

AISC Restricted Boltzmann Machines for Collaborative Filtering [2019-10-22 Tue]

  • why conditional?
  • why mcmc needed?
  • what is contrastive divergence?
  • training goal: make the observed visible states more probable
    • uses log-likelihood of V()
  • what is the e symbol?
  • how relate to matrix factorization techniques?
  • conditional factored RBMs?
  • collaborative filtering
  • SVD is a naive matrix factorization, there are many matrix factorizations?
    • Harriet related to LDA somehow
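On the MCMC and contrastive divergence questions: the log-likelihood gradient needs an expectation under the model that is intractable to compute exactly, so it is approximated by Gibbs sampling. Contrastive divergence (CD-1) starts the chain at the data and runs just one step. The sketch below is a plain binary RBM in NumPy, not the paper's conditional model with softmax rating units; all sizes and the learning rate are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 step for a binary RBM on a batch v0 of shape (batch, n_vis)."""
    # Positive phase: hidden probabilities given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # One Gibbs step: reconstruct visibles, then recompute hidden probabilities.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)
    # Update: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v0 = (rng.random((8, 6)) < 0.5).astype(float)   # toy binary data
    W = 0.01 * rng.normal(size=(6, 3))
    W, b_vis, b_hid = cd1_update(v0, W, np.zeros(6), np.zeros(3), 0.1, rng)
    print(W.shape, b_vis.shape, b_hid.shape)
```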

AISC Deep learning enables rapid identification of potent DDR1 kinase inhibitors [2019-10-23 Wed]

  • A Zhavoronkov CEO Insilico Medicine
    • novel drug discovery
    • de-novo molecule creation
    • drug discovery (DD)
      • slow, lots of failures
      • 10 years whole pipeline , 5.5 years for research/pre-clinical
  • GAN text to image synthesis
  • GANs in drug discovery “make perfect needles” vs needle in haystack
  • go to Alan Aspuru-Guzik lab
  • adding RL to GAN
    • 3 years of using generative models
    • can’t use G for everything
    • GA though can outperform G sometimes
    • SMILES and graphs?
  • Daniel
    • EU head of division for DL
    • AE, VAE
      • latent codes: AE sparse, VAE gaussian distributed - can do better inference
      • VAE KL regulator on latent space
      • AAE
      • why AE? no mode collapse, works with discrete data out of the box
    • for molecules small differences matter, not like images
    • SMILES - rep mol as a string
      • build a spanning tree
      • write atoms in depth-first search order
      • pip package: rdkit
    • they combine both
      • conditional generation x ~ p(x | properties)
      • optimization : quality(x) -> max_x
      • GENTRL (theirs) vs ORGAN, RANC, ATNC
    • multi-modal priors – get artifacts at boundaries
      • tensor train
      • A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
      • marginals and conditionals derivation
      • cts case – gaussian mixture
    • optimize reward with REINFORCE
    • optimize the latent manifold
    • SOM
  • crystal -> analyze surface – sounds like static
  • template - small molecule that binds to this protein (target)
    • use that mol as a template
  • also research into synthetic prediction and automation of synthesizing mols
    • Aspuru-Guzik lab again
  • model zoo: reps for molecules, lots on smiles/seqs, more now on graph approaches, but smiles still better
    • fingerprints (his fav), take fingerprint – gen all chemical space that has same fingerprint
      • gen fingerprints by conditioning
    • 3D, point clouds, don’t know conformation of molecules
  • find a lab, gradually transition into field
    • pharma field is a pain
    • deepgenomics good
    • MS also in field, or join them
    • american chemical society journals, etc, not just nips, iclr

AISC Projects

neural-dom [0/10]

REPO : https://bitbucket.org/aggregate-intellect/neural-html/src/master/ Chris: c_bobotsis@hotmail.com Shen, Kai

[.] prog synthetic email – the japanese language thing

[.] msg TS guy re DOM extraction code in WOB .core lib

[.] gather more material, prep for DOM encoding

[.] graphnn group meet1

  • https://test.ai/ - example use to auto-test websites
  • DOM tree
    • html, dom,
  • miniWOB+ dataset - web tasks.
    • style info important?
  • leverage headless browser to render dom tree and then take it for the DNNs
  • then get nodes and attributes from DOM tree
    • then rendered html page from dom
    • generate json from node tree (dom)
      • then generate html from json file
      • 225 attributes per node
        • DNN would need to learn about inheritance
  • pro team - acceptance criteria, steps to produce, QA finish closes ticket.
    • 3 levels, functional, business verification,
    • test.ai
      • rendered image recognize a button,
      • once dom tree breaks, query fails?
  • [ ] anything relevant to tree structure
  • [X] think of tasks
  • [ ] graph generation part?
  • [ ] decoder? for graph
  • [ ] GCN are efficient and speed matters for us since we will have a lot of data points
    • also parallelization matters for high throughput
  • [ ] meetup Zhang go through his code
  • upsampling tree structure from z-space

[.] input proc of graphNN

  • work on the input processing part for Graph NN (i.e. translate the JSON DOM tree into one-hot features plus adjacency matrices)? There are still details and techniques that need to be hammered out, but maybe a good place to start is @Sheng Jia’s codebase.
  • Sheng Jia pytorch
    • miniwob/env.py and miniwob/custom.py does some preprocessing, flatten out the tree into lists and record the indices etc
      • The adjacent nodes are labeled for each node as in key-value pair “adj_V” by my wrapper environment,
    • and the actual adjacency matrix is created in models/dom_qnet.py line 112.
    • Essentially my custom environment wrapper returns
      • a list of dom nodes, each of which has multiple key-value pairs for the attributes including the adj_V.
      • Those are processed by the model for getting the actual feature vectors.
  • generative code modelling on graphs paper:
    • He said one way to enable efficient graph generation during training is to put a batch of graphs into a single graph where they are not connected to each other, and in implementation just use sparse matrix.
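The input-processing step described above (JSON DOM tree to one-hot features plus an adjacency matrix) might start out like this. A sketch only, not Sheng Jia's code: the JSON layout with `tag` and `children` keys and the tiny tag vocabulary are assumptions, and only the tag is encoded, not the ~225 attributes per node.

```python
import numpy as np


def flatten_dom(node, parent=None, nodes=None, edges=None):
    """Depth-first walk: give each node an index, record parent-child edges."""
    if nodes is None:
        nodes, edges = [], []
    index = len(nodes)
    nodes.append(node["tag"])
    if parent is not None:
        edges.append((parent, index))
    for child in node.get("children", []):
        flatten_dom(child, index, nodes, edges)
    return nodes, edges


def encode(dom, vocab):
    """Return (X, A): one-hot tag features and a symmetric adjacency matrix."""
    nodes, edges = flatten_dom(dom)
    X = np.zeros((len(nodes), len(vocab)))
    for i, tag in enumerate(nodes):
        X[i, vocab.index(tag)] = 1.0
    A = np.zeros((len(nodes), len(nodes)))
    for p, c in edges:
        A[p, c] = A[c, p] = 1.0
    return X, A


if __name__ == "__main__":
    dom = {"tag": "html", "children": [
        {"tag": "body", "children": [{"tag": "button"}, {"tag": "div"}]}]}
    X, A = encode(dom, vocab=["html", "body", "div", "button"])
    print(X)
    print(A)
```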
Session [2019-06-22 Sat]
  • [ ] what would sparse matrix look like?
    • no autograd on sparseM
  • trial run of graphNN/gae
    • several fails with proper env versioning.
      • need python3.6 first, then run setup. tensorflow=1.13
      • env = gae.
    • train.py | 200 epochs
      • Test ROC score: 0.9160440573364683, Test AP score: 0.9308764945564731
    • python train.py --dataset citeseer | 200 epochs
      • Test ROC score: 0.8687404902789517, Test AP score: 0.8759123955009718
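As for what the sparse matrix would look like when a batch of graphs is packed into one disconnected graph: the adjacency becomes block-diagonal, and in coordinate (COO) form that is just each graph's edge indices shifted by the number of nodes that came before it. A pure-Python sketch; the function name and the returned `graph_ids` list are illustrative assumptions.

```python
def batch_graphs(edge_lists, sizes):
    """Merge graphs into one disconnected graph by offsetting node indices.

    Returns COO-style row and column index lists plus, for every node,
    the id of the graph it came from.
    """
    rows, cols, graph_ids = [], [], []
    offset = 0
    for gid, (edges, n) in enumerate(zip(edge_lists, sizes)):
        for src, dst in edges:
            rows.append(src + offset)
            cols.append(dst + offset)
        graph_ids.extend([gid] * n)
        offset += n
    return rows, cols, graph_ids


if __name__ == "__main__":
    # Graph 0 has 2 nodes and 1 edge; graph 1 has 3 nodes and 2 edges.
    rows, cols, ids = batch_graphs([[(0, 1)], [(0, 1), (1, 2)]], sizes=[2, 3])
    print(rows, cols, ids)
```

The `rows` and `cols` lists can be handed to something like `scipy.sparse.coo_matrix` together with a list of ones as data, and `graph_ids` lets per-graph outputs be pooled back out of the batch.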
Generative Code Modeling with Graphs - tryout [2019-06-22 Sat]

utils/tensorise.py test_data/tensorised test_data/exprs-types.json.gz test_data/graphs/

WARNING: Logging before flag parsing goes to stderr. W0622 12:03:49.424902 140087753975616 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0622 12:03:49.474851 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2019-06-22 12:03:49.512044: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2019-06-22 12:03:49.512087: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-06-22 12:03:49.512111: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist
2019-06-22 12:03:49.533996: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz
2019-06-22 12:03:49.534493: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c315b29e0 executing computations on platform Host. Devices:
2019-06-22 12:03:49.534519: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>

Imputed grammar:
  Expression -[00]-> ! Expression
  Expression -[01]-> - Expression
  Expression -[02]-> -- Expression
  Expression -[03]-> CharLiteral
  Expression -[04]-> Expression * Expression
  Expression -[05]-> Expression + Expression
  Expression -[06]-> Expression ++
  Expression -[07]-> Expression . IndexOf ( Expression )
  Expression -[08]-> Expression . IndexOf ( Expression , Expression , Expression )
  Expression -[09]-> Expression . StartsWith ( Expression )
  Expression -[10]-> Expression < Expression
  Expression -[11]-> Expression > Expression
  Expression -[12]-> Expression ? Expression : Expression
  Expression -[13]-> Expression [ Expression ]
  Expression -[14]-> IntLiteral
  Expression -[15]-> StringLiteral
  Expression -[16]-> Variable

Known literals:
  IntLiteral: [‘%UNK%’, ‘0’, ‘1’, ‘2’, ‘4’, ‘43’]
  CharLiteral: [‘%UNK%’, “’-‘”]
  StringLiteral: [‘“foobar”’, ‘%UNK%’]

W0622 12:03:49.610398 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0622 12:03:49.610724 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W0622 12:03:49.658701 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0622 12:03:50.152516 140087753975616 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead.

W0622 12:03:50.158362 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor

W0622 12:03:50.908026 140087753975616 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

W0622 12:03:51.013405 140087753975616 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.

W0622 12:03:52.276013 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor

W0622 12:03:52.378436 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor

W0622 12:03:58.355536 140087753975616 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where

/home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”

[... same UserWarning repeated; duplicates omitted ...]

W0622 12:04:13.613733 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

utils/train.py trained_models/overtrain test_data/tensorised/{,}

WARNING: Logging before flag parsing goes to stderr. W0622 12:10:21.039777 139776961083200 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0622 12:10:21.053452 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2019-06-22 12:10:21.085190: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory 2019-06-22 12:10:21.085327: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303) 2019-06-22 12:10:21.085415: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist 2019-06-22 12:10:21.107267: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz 2019-06-22 12:10:21.107748: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5639c2adef40 executing computations on platform Host. Devices: 2019-06-22 12:10:21.107769: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined> W0622 12:10:21.111878 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0622 12:10:21.112083 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W0622 12:10:21.163701 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0622 12:10:21.684241 139776961083200 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead. W0622 12:10:21.692071 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:10:22.428447 139776961083200 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. W0622 12:10:22.530452 139776961083200 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0. W0622 12:10:23.880191 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. 
Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:10:23.902530 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:10:30.169962 139776961083200 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” W0622 12:10:45.005255 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

Starting training run NAG-2019-06-22-12-10-21 of model NAGModel with following hypers:
{"optimizer": "Adam", "seed": 0, "dropout_keep_rate": 0.9, "learning_rate": 0.00025, "learning_rate_decay": 0.98, "momentum": 0.85, "gradient_clip": 1, "max_epochs": 500, "patience": 5, "max_num_cg_nodes_in_batch": 100000, "excluded_cg_edge_types": [], "cg_add_subtoken_nodes": true, "cg_node_label_embedding_style": "Token", "cg_node_label_vocab_size": 10000, "cg_node_label_char_length": 16, "cg_node_label_embedding_size": 32, "cg_node_type_vocab_size": 54, "cg_node_type_max_num": 10, "cg_node_type_embedding_size": 32, "cg_ggnn_layer_timesteps": [3, 1, 3, 1], "cg_ggnn_residual_connections": {"1": [0], "3": [0, 1]}, "cg_ggnn_hidden_size": 64, "cg_ggnn_use_edge_bias": false, "cg_ggnn_use_edge_msg_avg_aggregation": false, "cg_ggnn_use_propagation_attention": false, "cg_ggnn_graph_rnn_activation": "tanh", "cg_ggnn_graph_rnn_cell": "GRU", "eg_token_vocab_size": 100, "eg_literal_vocab_size": 10, "eg_max_variable_choices": 10, "eg_propagation_substeps": 50, "eg_hidden_size": 64, "eg_edge_label_size": 16, "exclude_edge_types": [], "eg_graph_rnn_cell": "GRU", "eg_graph_rnn_activation": "tanh", "eg_use_edge_bias": false, "eg_use_vars_for_production_choice": true, "eg_update_last_variable_use_representation": true, "eg_use_literal_copying": true, "eg_use_context_attention": true, "eg_max_context_tokens": 500, "run_id": "NAG-2019-06-22-12-10-21"}
2019-06-22 12:10:48.280541: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
==== Epoch 0 ====
  Epoch 0 (train) took 12.28s [processed 1 samples/second]
  Training Loss: 9.437750
  Epoch 0 (valid) took 3.41s [processed 4 samples/second]
  Validation Loss: 8.566715
  Best result so far -- saving model as 'trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz'.

==== Epoch 136 ====
  Epoch 136 (train) took 0.88s [processed 17 samples/second]
  Training Loss: 0.451712
  Epoch 136 (valid) took 0.40s [processed 37 samples/second]
  Validation Loss: 0.392952

utils/test.py trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz test_data/graphs/ trained_models/overtrain/test_results/

WARNING: Logging before flag parsing goes to stderr. W0622 12:19:52.787774 140550576658240 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0622 12:19:52.905472 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2019-06-22 12:19:52.955281: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory 2019-06-22 12:19:52.955433: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303) 2019-06-22 12:19:52.955526: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist 2019-06-22 12:19:52.974973: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz 2019-06-22 12:19:52.975514: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561a7a83d810 executing computations on platform Host. Devices: 2019-06-22 12:19:52.975536: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined> W0622 12:19:52.976696 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0622 12:19:52.976995 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W0622 12:19:53.046406 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0622 12:19:53.993441 140550576658240 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead. W0622 12:19:54.005863 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:19:54.663139 140550576658240 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`. W0622 12:19:54.760896 140550576658240 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0. W0622 12:19:55.945921 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. 
Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:19:55.965353 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor W0622 12:20:01.999933 140550576658240 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ” W0622 12:20:19.303285 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

2019-06-22 12:20:23.390381: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0622 12:20:31.159235 140550576658240 deprecation.py:506] From utils/../exprsynth/nagdecoder.py:580: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version. Instructions for updating: dim is deprecated, use axis instead

Groundtruth: args [ 0 ]
  @1 Prob. 0.561: args [ 0 ]
  @2 Prob. 0.044: args [ classVar ]
  @3 Prob. 0.030: args . IndexOf ( '-' )
  @4 Prob. 0.020: args [ "foobar" ]
  @5 Prob. 0.016: args ? 1 : classVar
Groundtruth: intVar + classVar
  @1 Prob. 0.419: intVar + classVar
  @2 Prob. 0.074: intVar + 4
  @3 Prob. 0.050: intVar . IndexOf ( '-' )
  @4 Prob. 0.040: intVar . IndexOf ( '-' , 2 , 43 )
  @5 Prob. 0.014: intVar . IndexOf ( '-' , 2 , classVar )
Groundtruth: foo . StartsWith ( "foobar" )
  @1 Prob. 0.424: foo . StartsWith ( "foobar" )
  @2 Prob. 0.102: foo . StartsWith ( '-' )
  @3 Prob. 0.034: foo . IndexOf ( '-' )
  @4 Prob. 0.031: foo . StartsWith ( %UNK% )
  @5 Prob. 0.003: foo . IndexOf ( '-' , 43 , 43 )
Groundtruth: foo . IndexOf ( '-' )
  @1 Prob. 0.276: foo . IndexOf ( '-' )
  @2 Prob. 0.062: foo . IndexOf ( '-' , 2 , 43 )
  @3 Prob. 0.051: foo [ 0 ]
  @4 Prob. 0.033: foo + classVar
  @5 Prob. 0.025: foo . StartsWith ( '-' )
Groundtruth: foo . IndexOf ( '-' , 2 , 43 )
  @1 Prob. 0.215: foo . IndexOf ( '-' , 2 , 43 )
  @2 Prob. 0.133: foo . IndexOf ( '-' )
  @3 Prob. 0.070: foo + classVar
  @4 Prob. 0.055: foo + 4
  @5 Prob. 0.005: foo . IndexOf ( '-' , 2 , classVar )
Groundtruth: arr [ 1 ]
  @1 Prob. 0.353: b [ 1 ]
  @2 Prob. 0.167: arr [ 1 ]
  @3 Prob. 0.126: i [ 1 ]
  @4 Prob. 0.050: b [ i ]
  @5 Prob. 0.025: b ? 1 : i
Groundtruth: j > classVar2
  @1 Prob. 0.564: j > classVar2
  @2 Prob. 0.119: j > 4
  @3 Prob. 0.114: -- j
  @4 Prob. 0.028: ! j
  @5 Prob. 0.019: j > - classVar2
Groundtruth: -- j
  @1 Prob. 0.531: -- j
  @2 Prob. 0.160: j > classVar2
  @3 Prob. 0.059: j > 4
  @4 Prob. 0.052: j ++
  @5 Prob. 0.042: ! j
Groundtruth: iarr [ j ] * - 1 + 4
  @1 Prob. 0.193: iarr + j
  @2 Prob. 0.161: iarr + 4
  @3 Prob. 0.013: iarr [ j ] * - 1 + 4
  @4 Prob. 0.005: iarr [ 1 ] * - 1 + 4
  @5 Prob. 0.005: iarr [ 1 ] * j + 4
Groundtruth: ! b
  @1 Prob. 0.474: ! b
  @2 Prob. 0.079: b ++
  @3 Prob. 0.062: b > 4
  @4 Prob. 0.039: b > j
  @5 Prob. 0.027: b < iarr
Groundtruth: j > 4
  @1 Prob. 0.300: j > 4
  @2 Prob. 0.066: j < iarr
  @3 Prob. 0.064: j > iarr
  @4 Prob. 0.042: 4 < j
  @5 Prob. 0.036: j < 4
Groundtruth: 4 < classVar2
  @1 Prob. 0.156: classVar2 > 4
  @2 Prob. 0.136: 4 < classVar2
  @3 Prob. 0.056: 4 > 4
  @4 Prob. 0.050: classVar2 ++
  @5 Prob. 0.049: classVar2 < j
Groundtruth: classVar2 ++
  @1 Prob. 0.442: classVar2 ++
  @2 Prob. 0.085: ! classVar2
  @3 Prob. 0.082: classVar2 > 4
  @4 Prob. 0.059: -- classVar2
  @5 Prob. 0.057: classVar2 > j
Groundtruth: b ? 2 : i
  @1 Prob. 0.352: b ? 2 : i
  @2 Prob. 0.085: b ? 2 : - i
  @3 Prob. 0.082: b ? 2 : arr
  @4 Prob. 0.041: b ? 2 : 4
  @5 Prob. 0.026: b ? 2 : - 1
Groundtruth: b ? 1 : - i
  @1 Prob. 0.276: b ? 1 : i
  @2 Prob. 0.114: b ? 1 : arr
  @3 Prob. 0.071: b ? 1 : - i
  @4 Prob. 0.045: b [ 1 ]
  @5 Prob. 0.029: b ? 1 : 4

Num samples: 15 (15 before filtering)
Avg Sample Perplexity: 1.39
Std Sample Perplexity: 0.21
Accuracy@1: 73.3333%
Accuracy@5: 100.0000%
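
For reference, the Accuracy@1 / Accuracy@5 figures at the end of the test log are consistent with the ranked lists: 11 of the 15 ground truths sit at rank 1 (11/15 = 73.33%), and all 15 appear within the top 5. A minimal sketch of how such a top-k metric is typically computed follows. The function name `accuracy_at_k` and the data layout are assumptions for illustration; this is not the code in `utils/test.py`.

```python
from typing import List, Sequence, Tuple


def accuracy_at_k(samples: Sequence[Tuple[str, List[str]]], k: int) -> float:
    """Fraction of samples whose ground truth is among the top-k ranked predictions."""
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for truth, ranked in samples if truth in ranked[:k])
    return hits / len(samples)


# Three samples copied from the log (top-3 predictions only, best first).
samples = [
    ("args [ 0 ]", ["args [ 0 ]", "args [ classVar ]", "args . IndexOf ( '-' )"]),
    ("arr [ 1 ]", ["b [ 1 ]", "arr [ 1 ]", "i [ 1 ]"]),
    ("iarr [ j ] * - 1 + 4", ["iarr + j", "iarr + 4", "iarr [ j ] * - 1 + 4"]),
]

acc1 = accuracy_at_k(samples, 1)  # only the first sample is a rank-1 hit
acc3 = accuracy_at_k(samples, 3)  # all three ground truths are within the top 3
```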

[.] JSON DOM tree -> 1hot features
[.] JSON DOM tree -> adjacency M
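
A minimal sketch of the two conversions above, assuming a hypothetical JSON layout where every node has a `"tag"` string and an optional `"children"` list (the real DOM dump may use different keys). It walks the tree once, then builds a one-hot tag feature per node and a symmetric adjacency matrix over parent-child links.

```python
from typing import List, Tuple


def flatten_dom(root: dict) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Depth-first walk; returns each node's tag and (parent, child) index pairs."""
    tags: List[str] = []
    edges: List[Tuple[int, int]] = []
    stack = [(root, -1)]
    while stack:
        node, parent = stack.pop()
        idx = len(tags)
        tags.append(node["tag"])
        if parent >= 0:
            edges.append((parent, idx))
        # Push children reversed so they are visited in document order.
        for child in reversed(node.get("children", [])):
            stack.append((child, idx))
    return tags, edges


def one_hot_features(tags: List[str], vocab: List[str]) -> List[List[int]]:
    """One row per node, one column per tag in vocab."""
    index = {tag: i for i, tag in enumerate(vocab)}
    rows = []
    for tag in tags:
        row = [0] * len(vocab)
        row[index[tag]] = 1
        rows.append(row)
    return rows


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Symmetric 0/1 matrix; parent-child links are treated as undirected."""
    adj = [[0] * n for _ in range(n)]
    for parent, child in edges:
        adj[parent][child] = 1
        adj[child][parent] = 1
    return adj


# Hypothetical toy DOM, not taken from the project data.
dom = {"tag": "div", "children": [
    {"tag": "span"},
    {"tag": "button", "children": [{"tag": "span"}]},
]}
tags, edges = flatten_dom(dom)
vocab = sorted(set(tags))
X = one_hot_features(tags, vocab)
A = adjacency_matrix(len(tags), edges)
```

Treating edges as undirected is a design choice for the sketch; keeping only `adj[parent][child]` would preserve the tree direction instead.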
[2019-06-24 Mon]

Xiyang Chen [4:05 AM] After some thinking and surveying, I’m leaning towards a dual encoder-decoder setup as a possible backbone architecture for our unsupervised learning task, where the latent spaces z could be tied together via a loss (maybe Wasserstein/optimal transport loss). This is also being used for some works on multimodal training tasks such as VQA, where one autoencoder is responsible for the visual (image) part and the other autoencoder for the text part. For our case it would be something like one autoencoder for HTML tree and another for screenshot, maybe conditioned on width of the window as well as desktop/mobile mode. More on this later. Meanwhile let me know if you have any thoughts/opinions. (edited)

Xiyang Chen [4:12 AM] Another radically different/maybe related approach is just use a CNN on screenshots with auxiliary HTML DOM node info, with skip connections on the hierarchy. But this idea is still vague. (edited)

Sheng Jia [11:43 AM] Is there any work on generating the graph or tree structure? (about the autoencoder for HTML) (edited) or are we assuming the fixed structure for now
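
A rough sketch of the objective the dual encoder-decoder idea in this thread implies: two reconstruction terms plus a term that ties the two latent codes together. The thread suggests a Wasserstein/optimal transport loss for the tie; the plain squared distance below is only a stand-in, and all names (`dual_ae_loss`, `tie_weight`) are assumptions for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_ae_loss(html_in, html_out, img_in, img_out, z_html, z_img,
                 tie_weight: float = 1.0) -> float:
    """Reconstruction loss of both autoencoders plus a latent-alignment penalty."""
    recon = mse(html_in, html_out) + mse(img_in, img_out)
    tie = mse(z_html, z_img)  # stand-in for the Wasserstein/OT term
    return recon + tie_weight * tie


# Perfect reconstructions, but the two latent codes disagree.
loss = dual_ae_loss(
    html_in=[1.0, 0.0], html_out=[1.0, 0.0],
    img_in=[0.5, 0.5], img_out=[0.5, 0.5],
    z_html=[0.0, 2.0], z_img=[0.0, 0.0],
    tie_weight=0.5,
)
```

With both reconstructions exact, the whole loss comes from the tie term, which is the part that pushes the HTML code and the screenshot code toward a shared latent space.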

[2019-06-28 Fri]

Traceback (most recent call last):
  File "exec.py", line 85, in <module>
    main_f(res_dir, settings, hparams_list, paths_list, prints_dict)
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/entries/q_template.py", line 224, in main
    num_atoms=qlearn_hs.get("num_atoms")
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/algorithms/qlearn.py", line 73, in multitask_train
    t_config.batch_device
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/actors/dqn_actor.py", line 93, in __init__
    self._raw_s_t = self._env.reset()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/env.py", line 36, in reset
    self._instance.force_stop()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/instance.py", line 134, in force_stop
    self._driver.execute_script('return core.endEpisode(0);')
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: core is not defined

https://chromedriver.storage.googleapis.com/index.html?path=75.0.3770.90/

docker commands

docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome-debug:3.6.0-bromine

docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-radium
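
A small stdlib-only sketch for checking that one of the containers above is accepting sessions before pointing the RL code at it. It assumes the server exposes the usual Selenium 3 status endpoint at `/wd/hub/status` on the mapped port 4444 and reports a `ready` flag under `value`; the older 3.6.0 image may format the payload differently, so treat both the path and the field as assumptions. The helper names are hypothetical.

```python
import json
import urllib.request


def status_url(host: str = "localhost", port: int = 4444) -> str:
    """URL of the (assumed) Selenium 3 status endpoint."""
    return f"http://{host}:{port}/wd/hub/status"


def is_ready(payload: dict) -> bool:
    """True if the status payload carries value.ready == true (assumed field)."""
    value = payload.get("value")
    return isinstance(value, dict) and bool(value.get("ready", False))


def fetch_status(url: str, timeout: float = 5.0) -> dict:
    """GET the status endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a container running, `is_ready(fetch_status(status_url()))` should return True once the server is up.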

link dump

CHROMEDRIVER SELENIUM ChromeDriver - WebDriver for Chrome ChromeDriver · SeleniumHQ/selenium Wiki Selenium Documentation — Selenium Documentation Selenium using Python - Geckodriver executable needs to be in PATH - Stack Overflow SeleniumHQ/selenium: A browser automation framework and ecosystem. How to Setup Selenium with ChromeDriver on Ubuntu 18.04 & 16.04 – TecAdmin python - selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: File not found error invoking send_keys() using Selenium - Stack Overflow selenium (Session info: headless chrome=75.0.3770.100) - Google Search DevToolsActivePort file doesn’t exist. · Issue #46 · heroku/heroku-buildpack-google-chrome 2470 - Chromedriver produces error when run via cron -> DevToolsActivePort file doesn’t exist - chromedriver - Monorail selenium - WebDriverException: unknown error: DevToolsActivePort file doesn’t exist while trying to initiate Chrome Browser - Stack Overflow python - I got error message while input string into selenium webdriver - Stack Overflow url encoding - How to urlencode a querystring in Python? - Stack Overflow How to convert a url string to safe characters with python? - Stack Overflow Live Coding: Selenium Browser Automation | DevDungeon SeleniumHQ/docker-selenium: Docker images for Selenium Grid Server (Standalone, Hub, and Nodes). 
Using Selenium-Server on Docker to run your Browser Tests - Meltwater Engineering Blog DevDungeon | Virtual Hackerspace python - urllib.urlencode: TypeError not a valid non-string sequence or mapping object - Stack Overflow selenium.common.exceptions — Selenium 3.14 documentation javascript - Error selenium.common.exceptions.JavascriptException: Message: ReferenceError: room is not defined - Stack Overflow selenium.webdriver.chrome.options.Options Python Example python - selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally with ChromeDriver Chrome and Selenium - Stack Overflow Version Selection - ChromeDriver - WebDriver for Chrome Autograd: Automatic Differentiation — PyTorch Tutorials 1.1.0 documentation [[https://arxiv.org/abs/1810.10531v1?utm_content=buffer48508&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer][[1810.10531v1] A mathematical theory of semantic development in deep neural networks]]

DOCKER Docker Selenium. Getting Started - YouTube selenium webdriver docker standalone - YouTube How To Run Your Selenium Tests Headlessly in Docker - Chris Kenst HN Search powered by Algolia Show HN: Python script to automate filling out Google form using Selenium | Hacker News vedipen/AutomateGoogleForm: Python script for automating Google form filling. Automate the Boring Stuff with Python Learn Selenium - Best Selenium Tutorials (Ranked) | Hackr.io 7 Must-read Selenium Tutorials - Applitools Blog christian-bromann/awesome-selenium: A curated list of delightful Selenium resources. Reviews of ‘TestNG Tutorials for Selenium Webdriver’ for learning Selenium | Hackr.io selenium blog/How to use selenium with docker.md at 512d4b69242af09af7dd1f83261e1bc661ec1d23 · Windsooon/blog 7. WebDriver API — Selenium Python Bindings 2 documentation Getting Started with Hub and Nodes · SeleniumHQ/docker-selenium Wiki Upgrade chromedriver to 2.32 · Issue #560 · SeleniumHQ/docker-selenium Selenium Standalone v.3.141.59 2. Getting Started — Selenium Python Bindings 2 documentation Chrome Options & Desiredcapabilities: AdBlocker, Incognito, Headless Headless chrome is not working in the docker · Issue #520 · SeleniumHQ/docker-selenium SchulteMarkus/selenium-standalone-chrome-spring-boot-demo: Demonstrating “selenium/standalone-chrome” in a Spring Boot project WebDriver Hub

HN Search powered by Algolia Free Hotel Wifi with Python and Selenium | Hacker News Free Hotel Wifi with Python and Selenium · Gokberk Yaltirakli Selenium 2.0: Out Now | Hacker News How to make Selenium tests reliable, scalable, and maintainable | Hacker News Selenium: 7 Things You Need To Know - Lucidchart Consistent Selenium Testing in Python | Hacker News Hacker News Hacker News How to Scrape Web using Python, Selenium and Beautiful Soup · Swetha’s Blog Hacker News weskerfoot/DeleteFB: Automate Scrubbing your Facebook Presence Hacker News Hacker News Selenium With Headless Chrome On Travis CI GUI and Headless Browser Testing - Travis CI Hacker News Running Headless Selenium with Chrome Hacker News How To Install Node.js on Ubuntu 18.04 | DigitalOcean

[.] Meet : Restart [2019-10-12 Sat]

  • latex doc Representation learning of HTML at large scale and its applications to downstream tasks - Online LaTeX Editor Overleaf
  • 3rd part experiments
  • refs
  • model of DOMs. OUTPUT:
    • (automated) Web navigation
    • Classification (easiest)
    • HTML generation
  • Sheng: which areas are clickable (for RL)? filter out what is clickable or not
    • test.ai is classifying buttons
      • QA/test automation
      • pixelwise presentation screen (images),
    • action space factorization
      • Xiyang interested in parsing part
    • only one cite
    • using GCN for parse (classification, etc, transductive not inductive – all nodes need to be known ahead of time)
      • closest alt is molecular generation (aisc talk)
  • some small task
  • like an autoencoder?
    • input simplest: RNN – any length
  • 2 AEs
    • html
    • images
    • somehow regularize 2 codes so they somehow relate to each other
      • 2-4 papers, image labelling
    • graphNN for encode/decode html
      • alts: transformer (150 tokens max / gpu)
      • gae, graphsage
    • image AE
      • conv
  • transformer decoding is sequential
  • sketch to code
  • Turning Design Mockups Into Code With Deep Learning
    • LSTM can do it, but slow, not scale. we want scale
    • transformer has hierarchy rep, representational power he thinks better than LSTM
    • but MS active with graphs for code generation
  • so we know works for RNNs, good reason to think it will work for Transformer, graphNN proof from MS work.
  • How to deal with in page images?
    • also text

[.] jobs

first two priority:

  1. literature review
  2. crawling 1000-2000 pages for data
    • sketch website has a simple dataset
  3. then small experiment for theoretical, use data we crawled
  4. optimal transport – make 2 distrib as close as possible while making differentiable.
  5. see if any repos of the papers
  6. DOM tree into form good for DNNs
[.] review other latest, relevant papers

normalizing adj code

gcn-repo-normalizefn

def normalize_adj(adj):
    """Symmetrically normalize adjacency matrix."""
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1))
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()

pygcn-repo-normalefn

def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx
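
The two helpers above normalize differently: the gcn one computes D^-1/2 A D^-1/2 (symmetric), the pygcn one computes D^-1 A (each row sums to 1). A minimal dense sketch in plain Python, without SciPy, to see the difference; the 3-node toy graph and the function names are made up for illustration:

```python
import math

def sym_normalize(adj):
    """Dense D^-1/2 A D^-1/2, the quantity normalize_adj computes."""
    deg = [sum(row) for row in adj]
    # Degree-0 nodes get a scale of 0, like the isinf guard in the original.
    d_inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[d_inv_sqrt[i] * adj[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def row_normalize(mx):
    """Dense D^-1 A, the quantity normalize computes."""
    out = []
    for row in mx:
        s = sum(row)
        r_inv = 1.0 / s if s > 0 else 0.0
        out.append([r_inv * v for v in row])
    return out

# Hypothetical toy graph: path 0 - 1 - 2 with self-loops added (A + I).
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]

sym = sym_normalize(adj)   # stays symmetric, rows do not sum to 1
row = row_normalize(adj)   # rows sum to 1, matrix is no longer symmetric
print(sym)
print(row)
```

Note that the original `normalize` uses `np.power(rowsum, -1)`, which raises on integer arrays, so the input there is expected to be float.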

neural-dom meet 2 [2019-10-19 Sat]

  • two new papers:
    • generalized zero and few-shot paper talks motivation and history of why
      • refs their old paper from ICLR
      • poster for this one
    • pluralistic image completion
  • encoder: lstm, transformer, graphnn?
  • decoder:
    • generating code in parallel – we want it to scale
  • anything that mentions tree or hierarchy
  • molecule generation via graphnn – heavy constraints though

[2019-10-20 Sun]

paper
Efficient Graph Generation with Graph Recurrent Attention Networks ~/Zotero/storage/J484VNAT/Liao et al. - 2019 - Efficient Graph Generation with Graph Recurrent At.pdf
Skeleton
Link on page 1: t deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN
  • 1 Introduction
    • Notes for page 2
  • 2 Model
    • 2.1 Representation of Graphs and the Generation Process
    • 2.2 Graph Recurrent Attention Networks
    • 2.3 Learning with Families of Canonical Orderings
  • 3 Related Work
  • 4 Experiments
    • 4.1 Dataset and Evaluation Metrics
    • 4.2 Benchmarking Sample Quality
    • 4.3 Efficiency vs. Sample Quality
    • 4.4 Ablation Study
  • 5 Conclusion
  • 6 Appendix
    • 6.1 K-Core Node Ordering
    • 6.2 Lobster Graphs
    • 6.3 Full Ablation Study & Visual Examples

AISC Workshops

NLP workshop

AISC State of NLP in 2019

  • DEALS
    • To show our appreciation for your patience, we would like to offer you a 50% off discount for any of our workshops you register until the end of September. Code: nlpjun27
    • cheap I suggest you to use GCP ($ 400 of free credit ) and use the image provided by fast.ai to do that because it comes with all of required packages (hopefully) installed. Here is a guide how to do it: https://course.fast.ai/start_gcp.html Then download the notebook from google CoLab and upload it on your GPU VM on GCP.
  • stoi = string to identifier
  • itos = id to string

NLP workshop #1

[2019-06-27 Thu]

  • logistic (sigmoid) activations get worse as nets get deeper – vanishing/exploding gradients problem, hence ReLU
  • NLP predict next word
  • Viterbi algo
  • RNN drawbacks, hard to compete against transformers
  • Part II
  • Huggingface BERT
  • stanford QA set
  • Huggingface good
  • 3 levels tokenization - words, etc
  • autoencoding: representation learning
  • neural network methods in language tel aviv
  • Foundations of Statistical Natural Language Processing, Chris Manning & Hinrich Schütze - older
    • matrix factorization
  • Jurafsky & Martin - Speech and Language Processing

NLP workshop #3

BERT is just the encoder part of the transformer, trained with different tasks

for unbalanced data

from aisc/nlp-workshop posts july6 Passing the weights to CrossEntropyLoss correctly - PyTorch Forums with this method weightings are applied to the calculations to give a different weighting for output classes.
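
A rough sketch of what class weighting does to the loss, in plain Python so it runs without PyTorch. As far as I understand, `torch.nn.CrossEntropyLoss(weight=...)` with the default mean reduction scales each example's loss by the weight of its target class and divides by the sum of those weights; the function names, logits, and weights below are made up for illustration:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one row of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Weighted mean of per-example negative log-likelihoods."""
    total, weight_sum = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        w = class_weights[y]
        total += -w * log_softmax(logits)[y]
        weight_sum += w
    return total / weight_sum

# Hypothetical batch: the model always prefers class 0; class 1 is the rare class.
batch_logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
targets = [0, 0, 1]

plain = weighted_cross_entropy(batch_logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(batch_logits, targets, [1.0, 5.0])
print(plain, weighted)  # the missed rare-class example dominates the weighted loss
```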

ufoym/imbalanced-dataset-sampler: A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones. this looks like a resampler like smote, rose.

from sampler import ImbalancedDatasetSampler

train_loader = torch.utils.data.DataLoader(
    train_dataset, 
    sampler=ImbalancedDatasetSampler(train_dataset),  # this is the module
    batch_size=args.batch_size, 
    **kwargs
)

this is about label-smoothing loss: With label smoothing, the KL-divergence between q_{smoothed ground truth prob.}(w) and p_{prob. computed by model}(w) is minimized. Label smoothing softens one-hot targets from (0, 1) to values that carry some uncertainty, e.g. (0.1, 0.9). An example is given where this is used with dirty data, when some instances are mislabeled. Both frameworks have args for this. OpenNMT-py/loss.py at e8622eb5c6117269bb3accd8eb6f66282b5e67d9 · OpenNMT/OpenNMT-py
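
A small sketch of the idea in plain Python. The smoothing scheme here (true class gets 1 - eps, the rest share eps equally) is a common simple variant and an assumption on my part; the OpenNMT implementation linked above differs in details such as how padding is handled. All names are hypothetical:

```python
import math

def smooth_targets(true_idx, num_classes, eps=0.1):
    """Turn a one-hot target into a smoothed distribution q."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if k == true_idx else off_value
            for k in range(num_classes)]

def kl_divergence(q, p):
    """KL(q || p) = sum_k q_k * log(q_k / p_k); terms with q_k == 0 contribute 0."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)

q = smooth_targets(true_idx=1, num_classes=2, eps=0.1)  # roughly (0.1, 0.9)

overconfident = [0.001, 0.999]  # model is almost certain
calibrated = [0.1, 0.9]         # model matches the smoothed target

print(kl_divergence(q, overconfident))  # positive: over-confidence is penalized
print(kl_divergence(q, calibrated))     # about 0: this is the minimum
```

The point for dirty data: the loss is minimized by predicting 0.9 rather than 1.0, so a mislabeled instance cannot push the model toward full confidence in the wrong class.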

[x] NLP ASGN 2

Assignment 2: the second assignment aims to use the encoder of the transformer architecture to tackle the QUORA DEDUPLICATION task that you worked on during the first assignment. Can you improve the performance of your model using the transformer architecture? Remember you can only swap the MODEL part of your old code with the transformer encoder and change the relevant parameters (input size and alike). To that end you can use BERT’s Encoder or the code in the study material above (the latter I recommend).

Transformer II - version from medium page
Embedding
class Embedder(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
    def forward(self, x):
        return self.embed(x)
positional encoding
class PositionalEncoder(nn.Module):
    def __init__(self, d_model, max_seq_len = 80):
        super().__init__()
        self.d_model = d_model
        
        # create constant 'pe' matrix with values dependant on 
        # pos and i
        pe = torch.zeros(max_seq_len, d_model)
        for pos in range(max_seq_len):
            for i in range(0, d_model, 2):
                pe[pos, i] = \
                math.sin(pos / (10000 ** ((2 * i)/d_model)))
                pe[pos, i + 1] = \
                math.cos(pos / (10000 ** ((2 * (i + 1))/d_model)))
                
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)
 
    
    def forward(self, x):
        # make embeddings relatively larger
        x = x * math.sqrt(self.d_model)
        # add constant to embedding; 'pe' is a registered buffer, so it
        # follows the module's device and needs no Variable/.cuda() wrapper
        seq_len = x.size(1)
        x = x + self.pe[:, :seq_len]
        return x
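
For comparison, the sinusoidal encoding as I understand it from the original Transformer paper uses the pair index k, so both members of a pair share one frequency: PE(pos, 2k) = sin(pos / 10000^(2k/d_model)) and PE(pos, 2k+1) = cos(pos / 10000^(2k/d_model)). The loop above feeds the even dimension index i into 2*i/d_model and uses i+1 for the cosine, which gives different frequencies. A minimal plain-Python sketch of the paper-style table (function name hypothetical, assumes even d_model):

```python
import math

def positional_encoding(max_seq_len, d_model):
    """Build a max_seq_len x d_model table of sinusoidal position codes."""
    pe = [[0.0] * d_model for _ in range(max_seq_len)]
    for pos in range(max_seq_len):
        for k in range(d_model // 2):
            # One shared angle per (sin, cos) pair.
            angle = pos / (10000 ** (2 * k / d_model))
            pe[pos][2 * k] = math.sin(angle)
            pe[pos][2 * k + 1] = math.cos(angle)
    return pe

pe = positional_encoding(max_seq_len=4, d_model=6)
print(pe[0])  # position 0: sin(0) = 0 and cos(0) = 1 in every pair
```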
masks
batch = next(iter(train_iter))
input_seq = batch.English.transpose(0,1)
input_pad = EN_TEXT.vocab.stoi['<pad>']
# creates mask with 0s wherever there is padding in the input
input_msk = (input_seq != input_pad).unsqueeze(1)

for the target_seq we do the same, but then add an additional step

# create mask as before
target_seq = batch.French.transpose(0,1)
target_pad = FR_TEXT.vocab.stoi['<pad>']
target_msk = (target_seq != target_pad).unsqueeze(1)
size = target_seq.size(1) # get seq_len for matrix
# np.ones takes the shape as a single tuple
nopeak_mask = np.triu(np.ones((1, size, size)),
k=1).astype('uint8')
nopeak_mask = torch.from_numpy(nopeak_mask) == 0
target_msk = target_msk & nopeak_mask
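
To see what the combined target mask looks like, here is a toy version in plain Python lists (no batching). The token ids and helper names are made up, and 0 stands in for the `<pad>` index:

```python
def nopeak_mask(size):
    # True where position i may look at position j, i.e. j is not in the future.
    return [[j <= i for j in range(size)] for i in range(size)]

def combine(pad_row, causal):
    # Same effect as broadcasting target_msk (1 x sl) against nopeak_mask (sl x sl).
    return [[pad_row[j] and causal[i][j] for j in range(len(pad_row))]
            for i in range(len(causal))]

target = [5, 8, 3, 0, 0]              # hypothetical token ids, 0 = <pad>
pad = [tok != 0 for tok in target]    # False wherever there is padding
mask = combine(pad, nopeak_mask(len(target)))

for row in mask:
    print([int(v) for v in row])
```

Row i shows which positions token i may attend to: nothing after itself, and never a padding position.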
MHA
class MultiHeadAttention(nn.Module):
    def __init__(self, heads, d_model, dropout = 0.1):
        super().__init__()
        
        self.d_model = d_model
        self.d_k = d_model // heads
        self.h = heads
        
        self.q_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, d_model)
    
    def forward(self, q, k, v, mask=None):
        
        bs = q.size(0)
        
        # perform linear operation and split into h heads
        
        k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
        q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
        v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
        
        # transpose to get dimensions bs * h * sl * d_k
       
        k = k.transpose(1,2)
        q = q.transpose(1,2)
        v = v.transpose(1,2)
        # calculate attention using function we will define next
        scores = attention(q, k, v, self.d_k, mask, self.dropout)
        
        # concatenate heads and put through final linear layer
        concat = scores.transpose(1,2).contiguous()\
        .view(bs, -1, self.d_model)
        
        output = self.out(concat)
        return output
    
def attention(q, k, v, d_k, mask=None, dropout=None):
    
    scores = torch.matmul(q, k.transpose(-2, -1)) /  math.sqrt(d_k)
    if mask is not None:
        mask = mask.unsqueeze(1)
        scores = scores.masked_fill(mask == 0, -1e9)
    scores = F.softmax(scores, dim=-1)
    
    if dropout is not None:
        scores = dropout(scores)
        
    output = torch.matmul(scores, v)
    return output
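
The same computation, softmax(q k^T / sqrt(d_k)) v, for a single head on plain Python lists, to check the masking behaviour by hand. This is a sketch with made-up toy values, not the code from the article:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v, mask=None):
    """Scaled dot-product attention for one head; q, k, v are lists of vectors."""
    d_k = len(q[0])
    out, weights = [], []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        if mask is not None:
            # Masked positions get a huge negative score, so softmax gives them ~0.
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights

# Toy inputs: two positions, d_k = 2, causal mask (position 0 cannot see position 1).
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [True, True]]

out, weights = attention(q, k, v, mask)
print(weights)  # first row puts all weight on position 0
print(out)
```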
FFNN
class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff=2048, dropout = 0.1):
        super().__init__() 
        # We set d_ff as a default to 2048
        self.linear_1 = nn.Linear(d_model, d_ff)
        self.dropout = nn.Dropout(dropout)
        self.linear_2 = nn.Linear(d_ff, d_model)
    def forward(self, x):
        x = self.dropout(F.relu(self.linear_1(x)))
        x = self.linear_2(x)
        return x
normalization
class Norm(nn.Module):
    def __init__(self, d_model, eps = 1e-6):
        super().__init__()
    
        self.size = d_model
        # create two learnable parameters to calibrate normalisation
        self.alpha = nn.Parameter(torch.ones(self.size))
        self.bias = nn.Parameter(torch.zeros(self.size))
        self.eps = eps
    def forward(self, x):
        norm = self.alpha * (x - x.mean(dim=-1, keepdim=True)) \
        / (x.std(dim=-1, keepdim=True) + self.eps) + self.bias
        return norm
E D layers
# build an encoder layer with one multi-head attention layer and one
# feed-forward layer
class EncoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout = 0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.attn = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model)
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        
    def forward(self, x, mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn(x2,x2,x2,mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.ff(x2))
        return x
    
# build a decoder layer with two multi-head attention layers and
# one feed-forward layer
class DecoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout=0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.norm_3 = Norm(d_model)
        
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        self.dropout_3 = nn.Dropout(dropout)
        
        self.attn_1 = MultiHeadAttention(heads, d_model)
        self.attn_2 = MultiHeadAttention(heads, d_model)
        # no hard-coded .cuda(): the layer moves with the parent module
        self.ff = FeedForward(d_model)

    def forward(self, x, e_outputs, src_mask, trg_mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn_1(x2, x2, x2, trg_mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.attn_2(x2, e_outputs, e_outputs,
        src_mask))
        x2 = self.norm_3(x)
        x = x + self.dropout_3(self.ff(x2))
        return x
# We can then build a convenient cloning function that can generate multiple layers:
def get_clones(module, N):
    return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
E & D
class Encoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(EncoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
    def forward(self, src, mask):
        x = self.embed(src)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, mask)
        return self.norm(x)
    
class Decoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(DecoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
    def forward(self, trg, e_outputs, src_mask, trg_mask):
        x = self.embed(trg)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, e_outputs, src_mask, trg_mask)
        return self.norm(x)
transformer
class Transformer(nn.Module):
    def __init__(self, src_vocab, trg_vocab, d_model, N, heads):
        super().__init__()
        self.encoder = Encoder(src_vocab, d_model, N, heads)
        self.decoder = Decoder(trg_vocab, d_model, N, heads)
        self.out = nn.Linear(d_model, trg_vocab)
    def forward(self, src, trg, src_mask, trg_mask):
        e_outputs = self.encoder(src, src_mask)
        d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
        output = self.out(d_output)
        return output
# we don't perform softmax on the output as this will be handled 
# automatically by our loss function

[x] NLP ASGN 3

see A3.org

[2019-07-11 Thu]

running the transformer article repo on gpgpu, needed to do following (env ‘main’):
conda install -c derickl torchtext
conda install dill
python -m spacy download en
python -m spacy download fr
python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr
start time [2019-07-11 Thu] 09:52pm

[2019-07-12 Fri]

  • I moved opt.train to cuda, check on gpgpu later

[2019-07-26 Fri] going thru slack posts

if low VRAM: reduce batch size and number of encoder layers

Wen Ho encoder model

EncoderWrapper(
  (encoder): Encoder(
    (embed): Embedder(
      (embed): Embedding(85519, 256)
    )
    (pe): PositionalEncoder(
      (dropout): Dropout(p=0.3)
    )
    (layers): ModuleList(
      (0): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (1): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (2): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (3): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (4): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (5): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
    )
    (norm): Norm()
  )
  (out): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)

hyper params:

  • wen: Configuration of the Encoder net: hidden dimension = 256, N=6 and heads = 8
  • Tracy Pham: My last Linear layer has `in_features = (max_seq_length * d_model)`. in your model, `in_features=256` = d_model?
  • Got the correct dimension of the last linear layer, `in_features = (max_seq_length * d_model)`
  • Now accuracy is at ~.78 with `d_model=512`, `N=1`, `heads=2`, and `batch_size=50`.
    • but d_model should be embedding dimension by reading the example repo code
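
The two `in_features` values in this thread correspond to two ways of turning the encoder output (one d_model vector per token) into a single vector for the final Linear layer. A toy sketch in plain Python; whether Wen's model actually pools this way is not stated in the posts, so mean pooling here is only an assumption, and the names are hypothetical:

```python
def flatten(encoded):
    """Concatenate all token vectors: length = max_seq_len * d_model."""
    return [v for token in encoded for v in token]

def mean_pool(encoded):
    """Average over tokens: length = d_model, independent of sequence length."""
    n = len(encoded)
    return [sum(tok[c] for tok in encoded) / n for c in range(len(encoded[0]))]

max_seq_len, d_model = 3, 4
# Stand-in for encoder output: max_seq_len rows of d_model values.
encoded = [[float(pos * d_model + c) for c in range(d_model)]
           for pos in range(max_seq_len)]

print(len(flatten(encoded)))    # 12 -> in_features = max_seq_len * d_model
print(len(mean_pool(encoded)))  # 4  -> in_features = d_model
```

Flattening ties the classifier to one fixed sequence length and grows the last layer with it; pooling keeps the layer at d_model.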

werner on imports for that transformer post

import torch
import torch.nn as nn
from torch.autograd import Variable

import spacy
import torchtext
from torchtext import data
from torchtext.data import Field, BucketIterator, TabularDataset

from sklearn.model_selection import train_test_split

# from Batch import MyIterator, batch_size_fn
# from Tokenize import tokenize

import os

import numpy as np
import pandas as pd

import math
import copy

import torch.nn.functional as F
import time

AISC Math of DL Workshops [5/5]

| Headline                           | Time |
| Total time                         | 4:48 |
| \_ AISC Math of DL Workshops [4/5] | 4:48 |

TODOs

[x] send note on ticket payment for refund
Time schedule

  • 1st TA Working Session, Introduction: 30
  • 1st Notebook preparation: 45
  • 2nd Notebook preparation + related work: 240
  • visualization search
  • cleanup drive, files, cleanup notebooks
  • time on slack with students – unknown

| Name         | Date         | Task                                     | Minutes |
| Willy Rempel | Jul 17, 2019 | 1st TA Working Session, Introduction     | 30      |
| Willy Rempel | Jul 16, 2019 | 1st Notebook preparation                 | 45      |
| Willy Rempel | Jul 24, 2019 | 2nd Notebook preparation + related work? | 80      |
| Willy Rempel | Jul 25, 2019 | ” ”                                      | 160     |
|              |              | 315 / 60 = 5.25 hours                    | 315     |
|              |              | 9 hours class                            |         |
|              |              | 14 * 14.25 = 199.50                      | 14.25   |
[x] 3rd prep meet
  • issues people had:
    • layers in convnet
    • multiple layers
    • imaginary nums
  • how better handson?
    • issue: not much time

@Willy Rempel will work on selecting some visualizations that help people understand layers in convnet and optimization etc

  1. conv net example to supplement
  2. 2nd section autograd
  3. when we come back – optimizers example walkthrough
  • handson notebook: – go through cells.
    • random code and comments so they have to edit
    • simple english. ‘this code is a breaker, you need to fix below to continue ’
    • for notebook – Amir has shared link and in slides for content

Amir F – draft document on 3rd assignment: writing the blog post if we have any ideas on it please feedback

Amir H. class walkthrough
  • overall: what grad is, how it propagates
    1. exp with autograd
    2. solving a problem GD
    3. putting together
      • linear regression example ?
  1. visualizations
    • convnet visualization
    • autograd, what it means and code
  2. solve problem with GD
    • handson
    • 2nd visualization at the end, optimization
  3. pytorch that contains all the concepts
    • datascience blog post
[x] autograd slide 40
Hands-on: autograd
# Create a Rank-2 tensor of all ones
x = torch.ones(2, 2, requires_grad=True)
print(x)
# Define y to be a function of x
y = x+2
# And z to be a function of y (and hence x):
z = 3*y*y
out = z.mean()
print(z, out)
# Now backprop:
out.backward()
# print gradients d(out)/dx
print(x.grad)
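
What `x.grad` should contain can be worked out by hand: out = (1/4) Σ 3(x_i + 2)^2, so d(out)/dx_i = (6/4)(x_i + 2), which is 4.5 at x_i = 1. A quick check in plain Python with a finite-difference estimate; the helper names are made up and a flat list of 4 values stands in for the 2x2 tensor:

```python
def out_fn(x):
    # Same computation as the cell above: y = x + 2, z = 3*y*y, out = mean(z).
    z = [3 * (xi + 2) ** 2 for xi in x]
    return sum(z) / len(z)

def analytic_grad(x):
    # d(out)/dx_i = 6 * (x_i + 2) / N
    return [6 * (xi + 2) / len(x) for xi in x]

def numeric_grad(f, x, h=1e-6):
    # Central differences, one coordinate at a time.
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grads.append((f(up) - f(down)) / (2 * h))
    return grads

x = [1.0, 1.0, 1.0, 1.0]
print(out_fn(x))                # 27.0
print(analytic_grad(x))         # [4.5, 4.5, 4.5, 4.5]
print(numeric_grad(out_fn, x))  # approximately the same values
```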

prelim meet

  • 1 TA for wed sesh
  • different levels of engagement for asgns
  • grading?
  • students ready and have access to material
    • examples codes into colab notebooks
  • prelim content
    • read this paper&code dynamic deep networks for retinal vessel segmentation sraashis/ature
    • 1st session
      • 1 neuron in pytorch
        • affine maps
        • tensors
        • non-linear
        • parameters
        • linear algebra
      • quick intro to pytorch
        • define tensor in pytorch
        • fill pytorch with a certain scalar
        • fill a pytorch tensor with rands
        • find a min value of a pytorch tensor
        • simple imports
        • pytorch beginner tutorial
        • convert a py list to a pytorch tensor and vice versa
        • tensors and scalars
        • everything in numpy -> do in pytorch
        • transpose
        • dot products
      • exercise: transpose images
        • matrix manipulation
        • matrix determinant
      • eigenvalues
        • then hands-on for module ii
      • non-linearities, activation functions
        • types and what we use in pytorch
        • hands on: activation fns
        • use prior image and apply activation fns to it
      • loss fns
        • where in pytorch

Notes

ws-math-dl-all ws-math-dl-breakout-1 ws-math-dl-breakout-2 ws-math-dl-breakout-3 Breakout Room x - TA Willy Rempel

my online group Ridwan A Jen L Motasem Vikash

Working session 1

This is a working session with one of the TAs where you can ask questions about the workshop, set up, and hands on parts. We will spend the first session making sure everyone has the info they need. The booking is for 2 hours but will most probably end earlier.

Workshop 1

We will cover part 1 of chapter 2 Except 2.12. Well, most of it. And the rest will be given as a reading assignment.

Working session 2

my hangout: https://hangouts.google.com/call/xkxqyaNdzAx8t1qGUy8pAEEE ask people to introduce themselves

fastai has awesome library for preprocessing image data 5 days old

gated convnets

fast.ai recommend

import imageio
# image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
image = imageio.imread("Veins.png", as_gray=True)
image = imageio.imread("Veins.png", format='PNG', as_gray=True)

image = imageio.mimread('Veins.png', as_gray=True)

# image = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)
from scipy import ndimage  # needed for ndimage.sobel below
import matplotlib.pyplot as plt
fig = plt.figure(); plt.gray()  # show the filtered result in grayscale
ax1 = fig.add_subplot(121)  # left side
ax2 = fig.add_subplot(122)  # right side
result = ndimage.sobel(image)
ax1.imshow(image)
ax2.imshow(result)
plt.show()

# original
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"

# werner
!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"

!curl "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" > Veins.png
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" 
!wget https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb -O "Veins.png"
img = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)

!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O 'Veins.jpg'

Workshop 2

Workshop 3

[x] prep notebook 1

Mathematics of DL

Authors: Amir Hajian Presenter: [name] Facilitators: [names]

July 2019

Outline
  • What we will learn in these 9 hours?
  • How can I get the most out of it?
  • How to do deep learning in 2019?

What to aim for
You should be able to follow this work by the end of the workshop

Recap from last session

Dissecting A DL Architecture

An artificial neuron

Simplifying the notation
It’s all about matrices, vectors and exploring the parameter space to find the right parameters!

Linear Algebra: Tensors and Scalars

Getting started with PyTorch:
  • What is PyTorch
  • How to import it

Exercises with Tensors and Scalars
  • Define a tensor in PyTorch
  • Fill a PyTorch tensor with a certain scalar
  • Fill a PyTorch tensor with random numbers
  • Find minimum value of a PyTorch tensor
  • Convert a Py list to a PyTorch tensor and vice versa

Linear Algebra: Matrix Transpose torch.t()
data = torch.randn(200,250)
data[100:120,:]=0.5
imshow(data)

imshow(data.t())

Linear Algebra: Dot Product

torch.matmul(a, b)

Linear Algebra: Matrix Multiplication torch.matmul(M1, M2)
data = torch.randn(5)
torch.matmul(data,data)

data = torch.randn(2,5)
torch.matmul(data,data.t())

Exercise: Transpose images
Simulated Data: Create a random 2D matrix with dimensions 200x250, set columns 100:120 to zero, display it, transpose the matrix, display it again.

Real image data: Read the image provided to you, display it, transpose it, and display it again.
data = torch.randn(200,250)
data[100:120,:]=0.5
imshow(data)

imshow(data.t())

Linear Algebra: Matrix Determinant

data = torch.randn(2,2)
torch.det(data)

Linear Algebra: Eigenvalues

data = torch.randn(2,2)
torch.linalg.eigvals(data)

Non-linearities

Non-linearities, activation functions
Types of activation functions and what we use in PyTorch
Affine Maps: f(x)=Ax+b

PyTorch way:
lin = nn.Linear(5, 3)
data = torch.randn(2, 5)
lin(data)

We’ll do the first example from here: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

lin = nn.Linear(5, 3) # maps from R^5 to R^3, parameters A, b

data = torch.randn(2, 5)
print(lin(data)) # yes

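Non-linearities:

Types of activation functions and what we use in PyTorch
Non-linearities
f(x) = Ax+b
g(x) = Cx+d
f(g(x)) = A(Cx+d)+b = ACx + (Ad + b)
Composing two affine maps gives another affine map, so stacking linear layers without a non-linearity in between adds no expressive power.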
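
A numeric check of the identity f(g(x)) = ACx + (Ad + b) in plain Python, plus what changes once a ReLU sits between the two maps. The matrices, vectors, and helper names are arbitrary values chosen for illustration:

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(u):
    return [max(0.0, a) for a in u]

A, b = [[1.0, 2.0], [0.0, 1.0]], [1.0, -1.0]
C, d = [[2.0, 0.0], [1.0, 3.0]], [0.5, 0.5]
x = [3.0, -2.0]

def f(u):
    return add(matvec(A, u), b)   # f(u) = Au + b

def g(u):
    return add(matvec(C, u), d)   # g(u) = Cu + d

stacked = f(g(x))                                                   # two affine layers
collapsed = add(matvec(matmul(A, C), x), add(matvec(A, d), b))      # one affine layer
with_relu = f(relu(g(x)))                                           # non-linearity in between

print(stacked, collapsed)  # identical
print(with_relu)           # differs: can no longer be collapsed into one affine map
```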

What to use?


Hands on: Types of activation functions and what we use in PyTorch
Apply non-linearities in PyTorch
  • Define a ReLU layer in PyTorch
  • Work with non-linearities:
    • Plot a relu function
    • Plot a tanh function
    • Plot a sigmoid function and observe how it is a distribution function
  • Apply ReLU to the image we uploaded earlier

Loss functions

Loss functions in PyTorch
torch.nn.MSELoss
torch.nn.CrossEntropyLoss

Experiments

  • Define a tensor in PyTorch
  • Fill a PyTorch tensor with a certain scalar
  • Fill a PyTorch tensor with random numbers
  • Find minimum value of a PyTorch tensor
  • Reshape a tensor
  • Flatten a tensor
  • Convert a Py list to a PyTorch tensor and vice versa
  • Multiply tensors with a scalar
  • Dot product two tensors
  • Transpose a matrix in PyTorch
  • Matrix Multiplications in PyTorch
  • Define a ReLU layer in PyTorch

Work with non-linearities:
  • Plot a relu function
  • Plot a tanh function
  • Plot a sigmoid function and observe how it is a distribution function

Something like this for playing with non-linearities:

data = torch.arange(-2,2,step=0.1)
plot(data.numpy(),torch.tanh(data).numpy())
show()

We will follow these examples for playing with non-linearities: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html

For tensors we will follow these: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html

[x] A2 notebook

  • [X] hands-on colab notebook - Me
    • hands-on #1 1D conv
      • hide cells?
      • have them search and google
      • give hints, but have elements missing
      • also todo in pytorch, have to go from np to pytorch
    • handson #2
      • from Amir, edge detection code
      • also do edge detection on bio-med pic from last time.
    • dropouts?

other guys:

  • skeleton of blog post
    • proof of work instead
    • involved in the last week
  • TA hour tomorrow
slides skeleton
1st part

Mathematics of Deep Learning - II Authors: Amir Hajian

July 2019

Outline
  • Convolution:
    • Why do we need more than an MLP?
    • What is convolution?
    • What is a kernel? What does it do?
    • 1D convolution
    • 2D convolution
    • Hands-on experiments with convolutions in python
    • Efficient convolution algorithms
  • ConvNets: a lightning fast introduction to their structure (conv layers, pooling, etc) and their applications.
  • Dropout:
    • How do we prevent overfitting in neural networks?
    • What is the math behind it?
    • How do you use it in PyTorch?

What is a convolution?

Formal definition: convolution is a mathematical operation on two functions (f and g) to obtain a third function that expresses how the shape of one is modified by the other.

Practical definition:
  a. Take a function f
  b. Take a function g and flip it: g(-t)
  c. Shift the flipped g by a finite amount T: g(T - t)
  d. Multiply f with the flipped, shifted g: f(t) g(T - t)
  e. Sum over the whole range to get the value of f*g at point T
  f. Go to step c and repeat for all T values.
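The practical recipe can be written out directly as two loops. This is a minimal sketch for discrete signals, using the standard definition in which g is flipped before shifting (index `T - t`); the helper name `convolve_direct` and the sample arrays are made up for illustration. It is checked against `numpy.convolve`.

```python
import numpy as np

def convolve_direct(f, g):
    """Full discrete convolution: out[T] = sum over t of f[t] * g[T - t]."""
    out = np.zeros(len(f) + len(g) - 1)
    for T in range(len(out)):          # repeat for all shifts T
        for t in range(len(f)):        # sum over the whole range
            if 0 <= T - t < len(g):    # g is zero outside its support
                out[T] += f[t] * g[T - t]
    return out

f = np.array([0.0, 1.0, 1.0, 1.0, 0.0])   # small top-hat
g = np.array([1.0, 2.0, 3.0])             # asymmetric kernel, so the flip matters

result = convolve_direct(f, g)
print(result)
print(np.allclose(result, np.convolve(f, g)))
```

For a symmetric kernel such as a Gaussian or Hann window, flipping makes no difference, which is why the flip is easy to overlook.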

What is a convolution? A simple example: What is the result of convolving a delta function with a Gaussian kernel? 5
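The answer can be checked numerically: convolving a delta function with a kernel returns a copy of the kernel centred where the delta sits. A minimal sketch, with the grid size, delta position and Gaussian width chosen arbitrarily:

```python
import numpy as np

# Normalised Gaussian kernel on 21 points (width 2 is an arbitrary choice).
x = np.arange(-10, 11)
gaussian = np.exp(-0.5 * (x / 2.0) ** 2)
gaussian /= gaussian.sum()

# Discrete delta function: a single 1 at index 30.
delta = np.zeros(51)
delta[30] = 1.0

out = np.convolve(delta, gaussian, mode="same")

print(np.argmax(out))                       # position of the peak
print(np.allclose(out[20:41], gaussian))    # the kernel reappears around the delta
```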

What is a convolution? A visual example: 9

What is a convolution? How to code it up?

handson

In Python: scipy.signal.convolve for 1D conv, scipy.signal.convolve2d for 2D

Experiments: Hands-on

Experiment with convolutions in 1D by smoothing a top-hat function with a Hann function:
  • Define a top-hat function that is non-zero in the range of [100:200]
  • Define a Hann function between 0 and 50 - Hint: use scipy.signal.windows.hann (scipy.signal.hann in older SciPy releases)
  • Apply the Hann function to the top-hat - Hint: use signal.convolve
  • Plot the signal before and after smoothing to see the result.
  • Discuss with your teammates to make sure you understand the results.
  • Repeat it with PyTorch Conv function at home.

import numpy as np
from scipy import signal

sig = np.repeat([0., 1., 0.], 100)   # top-hat: ones on [100:200]
win = signal.windows.hann(50)
filtered = signal.convolve(sig, win, mode='same') / sum(win)

import matplotlib.pyplot as plt
fig, (ax_orig, ax_win, ax_filt) = plt.subplots(3, 1, sharex=True)
ax_orig.plot(sig)
ax_orig.set_title('Original pulse')
ax_orig.margins(0, 0.1)
ax_win.plot(win)
ax_win.set_title('Filter impulse response')
ax_win.margins(0, 0.1)
ax_filt.plot(filtered)
ax_filt.set_title('Filtered signal')
ax_filt.margins(0, 0.1)
fig.tight_layout()
fig.show()

Application: smoothing/binning noisy functions
  • Pick the kernel to be a top-hat function of length L
  • Convolve a noisy function with the kernel
  • Observe how the function is binned using this operation.

In PyTorch: torch.conv1d(original_signal, kernel, padding=5)

original_signal = torch.randn([1,1,100])
kernel = torch.ones([1,1,10])
smooth_signal = torch.conv1d(original_signal, kernel, padding=5)/kernel.sum()

Note: To plot you need to convert to numpy and flatten:

plot(original_signal.numpy().flatten(), label="Original Signal")
plot(smooth_signal.numpy().flatten(), label="Smooth Signal")

Hands-On: 1D Convolution in PyTorch

Experiment with 1D conv in PyTorch by recreating this plot to bin a noisy function. Get creative. Pick your own function. Add noise to it. Pick different kernels, experiment with the width and shape of the kernels.

original_signal = torch.randn([1,1,100])
kernel = torch.ones([1,1,10])
smooth_signal = torch.conv1d(original_signal, kernel, padding=5)/kernel.sum()

Note: To plot you need to convert to numpy and flatten:

plot(original_signal.numpy().flatten(), label="Original Signal")
plot(smooth_signal.numpy().flatten(), label="Smooth Signal")

Convolutions in 2D: A step towards ConvNets

2D Convolution: Detect Edges with Sobel Operator

import imageio
import matplotlib.pyplot as plt
from scipy import ndimage

image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
fig = plt.figure()
plt.gray()  # show the filtered result in grayscale
ax1 = fig.add_subplot(121)  # left side
ax2 = fig.add_subplot(122)  # right side
result = ndimage.sobel(image)
ax1.imshow(image)
ax2.imshow(result)
plt.show()

Experiments: Hands-on time

Experiment with convolutions in 2D to detect edges in an image:
  • Read the image and convert it to grey.
  • Define the kernel
  • Apply the kernel to the image using scipy.signal.convolve2d
  • Plot the results
  • Try Sobel Kernel as well as Scharr Kernel. See the difference in the results?

Note for TA’s: here is a sample solution.

2D Convolution: Detect Edges with Sobel Operator

import imageio
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])
sobel = signal.convolve2d(image, sobel_y, boundary='symm', mode='same')

fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(sobel), cmap='gray')  # show the filtered image, not the kernel
ax_mag.set_title('Sobel Applied')
ax_mag.set_axis_off()
fig.show()

Exercises:
  1. Find edges in the x-direction using sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])
  2. Combine x and y results to get a final result
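One way to approach the second exercise is to treat the x and y responses as the two components of a gradient and take its magnitude. This is a sketch on a synthetic image (a bright square on a dark background) rather than the Veins.png file, so it runs anywhere. It also checks that the complex-kernel trick, Gx + j*Gy, gives the same magnitude.

```python
import numpy as np
from scipy import signal

# Synthetic test image: a bright square on a dark background (an assumption,
# standing in for the workshop image).
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

gx = signal.convolve2d(image, sobel_x, boundary="symm", mode="same")
gy = signal.convolve2d(image, sobel_y, boundary="symm", mode="same")

# Gradient magnitude: sqrt(gx**2 + gy**2).
magnitude = np.hypot(gx, gy)

# Equivalent single pass with a complex kernel: real part is Gx, imaginary is Gy.
complex_result = signal.convolve2d(image, sobel_x + 1j * sobel_y,
                                   boundary="symm", mode="same")

print(np.allclose(magnitude, np.abs(complex_result)))
```

The magnitude is zero in flat regions and non-zero along all four sides of the square, whereas each single kernel responds to edges in one direction only.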

import imageio
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])

result = signal.convolve2d(image, sobel_x, boundary='symm', mode='same')

fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray')
ax_mag.set_title('Sobel Applied')
ax_mag.set_axis_off()
fig.show()

import imageio
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
# Imaginary part carries the y-direction kernel, real part the x-direction kernel
sobel_y = np.array([[-1j, -2j, -1j], [0, 0, 0], [1j, 2j, 1j]])
sobel_x = np.array([[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]])

result = signal.convolve2d(image, sobel_x + sobel_y, boundary='symm', mode='same')

fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray')  # magnitude combines x and y
ax_mag.set_title('Sobel Applied')
ax_mag.set_axis_off()
fig.show()

smoothing = np.ones([50,50])  # 50x50 top-hat (box) kernel

result = signal.convolve2d(image, smoothing, boundary='symm', mode='same')

import matplotlib.pyplot as plt
fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(result), cmap='gray')
ax_mag.set_title('Smoothing Applied')
ax_mag.set_axis_off()
fig.show()

2D Convolution: Detect Edges with Scharr Operator

import imageio
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
scharr = np.array([[ -3-3j, 0-10j,  +3 -3j],
                   [-10+0j, 0+ 0j, +10 +0j],
                   [ -3+3j, 0+10j,  +3 +3j]])  # Gx + j*Gy
grad = signal.convolve2d(image, scharr, boundary='symm', mode='same')

fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
ax_orig.imshow(image, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_mag.imshow(np.absolute(grad), cmap='gray')
ax_mag.set_title('Gradient magnitude')
ax_mag.set_axis_off()
fig.show()

Dropouts or how not to overfit 23
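The slide content for this part did not survive in these notes, so here is only a minimal sketch of how dropout behaves in PyTorch, using `nn.Dropout`. The drop probability 0.5 and the all-ones input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
drop.train()
train_out = drop(x)

# Evaluation mode: dropout is a no-op.
drop.eval()
eval_out = drop(x)

print(train_out[:10])
print((train_out == 0).float().mean())   # fraction dropped, close to p
print(torch.equal(eval_out, x))
```

Forgetting to call `model.eval()` before validation leaves dropout active and makes evaluation results noisy.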

Reading:
  • Chapter 9 of Deep Learning Book
  • ConvNets Tutorial in PyTorch
  • Understanding Convnets (blogpost, paper)
  • Understanding Dropouts (blogpost, paper)

Programming: Learn to work with ConvNets. Follow these tutorials to learn how to use ConvNets for various tasks in PyTorch:
  • Beginner: Training a classifier
  • Advanced: Gated ConvNets for Neural NLP
  • Explainability: How ConvNets characterize images
  • Understanding ConvNets

GAN Workshop [12/29]

HeadlineTime
Total time14:41
HeadlineTime
Total time1d 2:15

Notes

  • fwiw: ppl remember beginning and end of event most
prep meets
meet 1 [2019-08-12 Mon]
  • Andring notebooks for presentation
    • 2.him.
  • capsblog post OR reproduce
  • 1st n
    • NOdule
    • indcgan, gen, discr, 2.5hrs
      • pointed code
meet 2 [2019-08-19 Mon]
  • we no focus more on theory?
    • sont some more advanced stuff
    • mareating some of the layers
  • we ck individuals who want the more advanced
meet 3 [2019-08-26 Mon]
  • cyclegan intro, handson,
    • end with 1hr applications and evaluation and conclusions
  • Amir won’t be there
  • [X] go thru notebook, run and review in detail (prep for wed)
  • students tasks:
    • wed short draft qualitative about what paper is about
    • 2 weeks try to reproduce, expect write observations of attempt, not necessarily success
      • can be as low as reading code and observations

TODOs [4/21]

WK3 cycleGan notebook Review [0/6]
[.] super(resnetblock, self) - calling itself
[.] opt namedtuple
[.] whole cycle thing - still check why for D selection
[.] BCEWithLogitsLoss
[.] wgangp no loss?
[.] label generator register_buffer
HW todo from WK2 slides [0/11]
[.] Does training convergence indicate better results?
[.] How would you estimate memory needs for a GAN?
[.] Update the model to use 32x32 or 128x128 images
[.] Interpolate through the latent space of the trained DCGAN
[.] Adapt the first GAN to run on GPU
[.] Prepare your own dataset to run through DCGAN
[.] Try removing batch norm and see what happens
[.] What prevents this architecture from being used for large models?
[.] What would happen if we replaced batch norm with spectral norm?
[.] What do I mean when I said strided convolutions replace pooling functions?
[.] WK2 training Qs
  • why detach?
  • why that view call?
[c] share:gan training anim
better online experience
  1. [X] ask individuals who want more advanced
  2. those without mics.
    1. how to engage? talk or chat?
  3. ask for progress
  4. suggest others mute or reduce volume when conversation not relevant to them
  5. remind everyone to mute mics when talk comes on
  6. [X] try out screen share with lower-right webcam
[x] Blog post Intro
HeadlineTime
Total time16:52
  • less 2.5hrs for that saturday = 17 - 2.5 = 14.5
2nd

Introduction

Week 1 - Introduction:
  • Intro to GANs and adversarial training
  • Writing a GAN
  • Training your GAN (hands on)

Week 2 - Image to image translation:
  • Intro to image to image translation GANs
  • Writing a cycleGAN
  • Training a cycleGAN (hands on)

Week 3 - Advanced Topics:
  • Evaluating GANs
  • Current research and state of the art
  • Beyond GANs: applications of adversarial training

[one paragraph covering the gist of what was covered in session 1]
For the first session the groundwork was laid with an overview of Generative Adversarial Networks and a solid introduction to the theory. Instead of the usual image tasks typical of GANs, participants worked on a minimal GAN that simply converted a random uniform distribution to a Gaussian. In this way the focus was on the core essentials that uniquely define GANs.

[one paragraph covering the gist of what was covered in session 2]
In session two, participants progressed from foundations to more contemporary GAN architectures. The hands-on exercise involved using the ubiquitous DCGAN architecture for an image to image translation task. The lecture portion filled out the picture with the necessary theory and its historical progression.

[one paragraph covering the gist of what was covered in session 3]
The last session moved to an even more challenging GAN architecture: the cycleGAN. After the lecture portion, an implementation of a cycleGAN along with the training code was completed by the participants. As execution continued, our instructor Andrew finished off the workshop by discussing current research and the state of the art in the field. He made it clear that GANs are not just for images, but have a place in many areas of machine learning, such as: time series, <insert more here>.

[one paragraph outlining the post; has to be written once the sections are filled]
1st draft - wrong

During the month of August, AISC (Aggregated Intellect Socratic Circles) held a workshop - ‘Generative Adversarial Networks and Beyond’. Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises. Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs. Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own. Our workshop finished with a capstone project, the result of which is this post. It is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the paper’s results. Below are the results of their efforts. [brief intro to each entry] [Lastly, we finish off with <last entry>. <some closing sentence>]

scratch
  • [X] blogpost guideline detailed - ask on staff if ?
    • intro to whole blog post
    • 1 paragraph – takeaway of 3 sessions, high level overview

During the month of August, AISC (Aggregated Intellect Socratic Circles) held the ‘Generative Adversarial Networks and Beyond’ workshop.

Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students were engaged in both lectures from our instructor Andrew B. Martin and hands-on training with notebooks.

Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises. Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs.

Over the course of the 3 weekly sessions, students went from the core basics of GANs advanced architectures through the ubiquitous DCGAN, and ending with CycleGANs.

Over the course of 3 weekly sessions, students went from the core basics of GANs, to the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students heard lectures from our instructor Andrew B. Martin and dove right into hands-on training with notebooks.

Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own.

— Our workshop finished with a capstone project that is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the papers results. Below are the results of their efforts. [brief intro to each entry]

The final assignments for this workshop are the capstone blog posts.

To finish off the workshop, students were given a capstone project to complete. Students were broken up into nnn teams. Each team had the option to either write a qualitative summary of a selected paper, or to reproduce the paper’s results.

They were broken up into nnn teams and each team chose one of two options. The first option was to write a qualitative summary of a selected paper, or alternatively, as a second option, to reproduce the paper’s results in a coding project.

  • content points
    • Instructor Andrew works in industry
      • uses GANs in
      • he provided practical insights to use GANs in real-life
    • capstone
workshop blurb

Workshop Overview

Generative Adversarial Networks have been very popular in recent years for various tasks like image generation and data augmentation. A large number of papers at ICLR 2019 were focused on GANs. With the fast pace of the field, you won’t be able to stay up to date with the latest if you don’t have the right foundational knowledge of how these networks work under the hood. We are offering this workshop to help you step into the depth of GANs. Are you ready?

In this workshop, you will learn the theory and gather hands on experience in some of the most fundamental concepts and practical tips about GANs.

Important Dates

Please note that this workshop will happen on 3 separate evenings:

August 14, 2019

August 21, 2019

August 28, 2019

Office hours will happen on,

August 20, 2019 (in person and online participants, Group office hour with the TAs)

August 27, 2019 (in person and online participants, Group office hour with the TAs)

Date TBC (Group office hour with the instructor, has to be purchased separately)

“Why should I care about GANs?”

GANs and adversarial ML are widely discussed and increasingly used in AI

GANs are rapidly being applied in many of the cutting edge AI applications

“But I don’t care about generating fake photos!”

Adversarial ML has become an integral part of many of the recent ML algorithms

This workshop goes beyond GANs where you will explore adversarial training and their numerous potential applications

Why you should attend

In this 3-session intensive workshop, we will bring you up to speed with everything needed to build a strong background in GANs. It will be a combination of theory and hands-on applications in PyTorch.

This workshop is built on the instructor’s extensive experience in academia and industry on related topics.

This workshop is the first in its series and paves the way theoretically and technically for many application specific workshops to follow.

Target Audience

Data Scientists, Machine Learning Engineers, Software Engineers, Students, Other Analytics Roles (data analysts, managers, product owners, etc)

Prerequisites

  • Knowledge of Python
  • Knowledge of Machine Learning
  • Familiarity with deep learning
  • Familiarity with PyTorch or other deep learning frameworks is a plus
  • This is a beginner to intermediate workshop

Learning Outcomes

We will build a working application in Python using GANs and image processing to generate and translate images

  • Understand how a vanilla GAN works
  • Understand how an image to image translation GAN works
  • Be able to explain limitations, current research directions, and applications
  • Build GAN to generate images
  • Build another GAN to translate images
  • You will get a deeper understanding on how to apply GANs and adversarial loss to your own deep learning pipeline, in supervised, unsupervised and semi-supervised settings

Pre-workshop reading material

TBD

Learning Material

All participants will have access to the following learning material:

  • Slides from the sessions
  • Hands on notebooks
  • Video recording of the sessions (you can use the videos to watch the parts that you missed, or re-watch any parts that are still unclear for you; access to videos beyond one week after the workshop is available to be purchased; see tickets >> add-ons)

Instructor

Andrew Martin

Head of Data @ Looka Inc

Andrew is a data scientist of 8 years working in deep learning and optimization. He currently leads the data team at Looka where they use generative models like GANs to make great design accessible and delightful to everyone.

Course Modules

The workshop happens on 3 evenings, 3 hours each; each module below will be 50 mins.

Week 1 - Introduction:
  • Intro to GANs and adversarial training
  • Writing a GAN
  • Training your GAN (hands on)

Week 2 - Image to image translation:
  • Intro to image to image translation GANs
  • Writing a cycleGAN
  • Training a cycleGAN (hands on)

Week 3 - Advanced Topics:
  • Evaluating GANs
  • Current research and state of the art
  • Beyond GANs: applications of adversarial training

[x] student writeup editing
@Willy Rempel @Werner could you please start looking at the parts people have copied and provide them with feedback? just leave comments directly on their write up. Perhaps only focus on the technical side of what they have rather than language, unless it’s very difficult to read or the flow is very bad etc
[x] fix clock last night

Breakout Groups [0/0]

4 progressive GAN
  • 3 ppl
5 EvoGan
  • 3 ppl, Alice et al
  • population of Gs
    • evo selection

Archive

cycleGAN testing
  • 1st entry started about 1hr ago. Also did several hours work yesterday.
horse2zebra :: started 2:20
batch_size: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
crop_size: 256
dataroot: ./datasets/horse2zebra [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: 1
display_ncols: 4
display_port: 8097
display_server: http://192.168.0.35 [default: http://localhost]
display_winsize: 256
epoch: latest
epoch_count: 1
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 286
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_layers_D: 3
name: CycleZebra1 [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_9blocks
ngf: 64
niter: 100
niter_decay: 100
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: resize_and_crop
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
verbose: False

(epoch: 133, iters: 1112, time: 2.520, data: 0.003) D_A: 0.090 G_A: 0.543 cycle_A: 0.625 idt_A: 0.233 D_B: 0.213 G_B: 0.239 cycle_B: 0.807 idt_B: 0.198

Traceback (most recent call last):
  File "train.py", line 43, in <module>
  File "/home/will/DevAcademics/GANs/pytorch-CycleGAN-and-pix2pix/data/__init__.py", line 90, in __iter__
    for i, data in enumerate(self.dataloader):
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
FileNotFoundError: Traceback (most recent call last):
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/will/DevAcademics/GANs/pytorch-CycleGAN-and-pix2pix/data/unaligned_dataset.py", line 57, in __getitem__
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/PIL/Image.py", line 2770, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './datasets/horse2zebra/trainA/n02381460_1266.jpg'
  • restarting at [2019-08-09 Fri] 12:43
  • less than 24h I was at epoch=133.
datasets

  • facades: 400 images from the CMP Facades dataset. [Citation]
  • cityscapes: 2975 images from the Cityscapes training set. [Citation]
  • maps: 1096 training images scraped from Google Maps.
  • horse2zebra: 939 horse images and 1177 zebra images downloaded from ImageNet using keywords wild horse and zebra
  • apple2orange: 996 apple images and 1020 orange images downloaded from ImageNet using keywords apple and navel orange.
  • summer2winter_yosemite: 1273 summer Yosemite images and 854 winter Yosemite images were downloaded using Flickr API. See more details in our paper.
  • monet2photo, vangogh2photo, ukiyoe2photo, cezanne2photo: The art images were downloaded from Wikiart. The real photos are downloaded from Flickr using the combination of the tags landscape and landscapephotography. The training set size of each class is Monet:1074, Cezanne:584, Van Gogh:401, Ukiyo-e:1433, Photographs:6853.
  • iphone2dslr_flower: both classes of images were downloaded from Flickr. The training set size of each class is iPhone:1813, DSLR:3316. See more details in our paper.

execution scratch

python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps

python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps --display_server="http://192.168.0.35"

bi-cubic order = 3

colab scratch

!pip install torch
!pip install torchvision

colab notebook from big repo

from <gitrepo> import util, models, options, data

no importing: datasets (.sh .py) scripts (all .sh)

L&L GANs

explicit: has p(x)
implicit: has a black box, just get samples from distribution

Workshop 2 [7/7]
  • emphasis on GANS, less pytorch ??
[x] followup Qs
  • [X] share on staff channel
  • [X] elu vs relu vs leaky
  • [X] notebook images from Andrew properly reffed so they show up in case of reset
[x] share:gan hacks
other prep tomorrow [3/4]
[x] conv stride, padding,
[x] transpose conv
[c] pytorch study different modules: datasets mostly, util, nn, optim,
[x] collect relevant webpages for screen sharing and ref
[x] pull up relevant pdfs too
text of slides for today

GANs workshop day 2

Today’s itinerary
  • Previous class
  • Simple GAN to DCGAN
  • Coding a DCGAN
  • Intro to CycleGAN
  • Assignments, exercises, and next week

OVERVIEW

Selected work Approximate P(x,y) rather than P(x | y)

Generated symbols OUR PROJECTS

Generated Typefaces OUR PROJECTS

Previous Class Approximate P(x,y) rather than P(x | y)

What we did
  • Overview of Generative Adversarial Networks
  • 1 hidden layer fully connected generator and discriminator
  • Stochastic Gradient Descent optimization
  • Binary cross entropy loss function

Reading and Assignments
  • Goodfellow et al (2014), Radford et al (2015)
  • Pix2pix, CycleGAN, and/or batch norm papers

Challenges
  • Update the model to use 32x32 or 128x128 images
  • Interpolate through the latent space of the trained DCGAN
  • Adapt the first GAN to run on GPU
  • Prepare your own dataset to run through DCGAN
  • Try removing batch norm and see what happens

Questions:
  • Does training convergence indicate better results?
  • How would you estimate memory needs for a GAN?

Previous class

Simple GAN to DCGAN Approximate P(x,y) rather than P(x | y)

OVERVIEW OF GANS Training Through training the generator learns to turn random noise into realistic samples

Generator and discriminator Two networks, the generator and discriminator, competing in a two player minimax game

Discriminator trained to identify whether a sample comes from the training set or the generator

Generator trained to generate samples that trick the discriminator OVERVIEW OF GANS

Model description
  • Transform random uniform noise into a normal distribution
  • 1 hidden layer fully connected generator and discriminator
  • Stochastic Gradient Descent optimization
  • Binary cross entropy loss function
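A minimal sketch of a GAN matching this description. It is not the workshop notebook: the hidden width (16), learning rate, batch size and number of steps are assumptions picked so the example runs in a second or two, and no claim is made that it has converged after so few steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One hidden layer, fully connected generator and discriminator.
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.SGD(G.parameters(), lr=0.01)
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
bce = nn.BCELoss()

batch = 64
ones = torch.ones(batch, 1)
zeros = torch.zeros(batch, 1)

for step in range(200):
    real = torch.randn(batch, 1)     # samples from the target N(0, 1)
    noise = torch.rand(batch, 1)     # uniform noise in [0, 1)
    fake = G(noise)

    # Discriminator step: real -> 1, generated -> 0.
    # detach() keeps this step from pushing gradients into G.
    opt_d.zero_grad()
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make D label generated samples as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), ones)
    g_loss.backward()
    opt_g.step()

samples = G(torch.rand(1000, 1)).detach()
print(d_loss.item(), g_loss.item(), samples.mean().item(), samples.std().item())
```

Comparing a histogram of `samples` against `torch.randn(1000)` after longer training is a simple way to see whether the generator is approaching the target distribution.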

First GAN

DCGAN OVERVIEW OF GANS Radford et al 2015

Model description
  • Transform random noise into images of font sheets
  • Multiple convolutional hidden layers with batch normalization and leaky ReLU
  • Model weights initialized from Normal with mean 0 and stddev 0.02
  • Adam optimization
  • Binary cross entropy loss function

DCGAN

Model description OVERVIEW OF GANS

Similarities with Simple GAN
  • Generator still maps latent space to target distribution samples
  • Discriminator still maps samples to a classification
  • Training loop is virtually unchanged
  • Model still uses binary cross entropy loss function

DCGAN

Concepts for DCGAN Approximate P(x,y) rather than P(x | y)

Convolutional hidden layers
  • All convolutional network (Springenberg et al., 2014)
  • Efficient for getting a representation of images
  • Strided convolutions to learn its own pooling function

Concepts for DCGAN

Strided convolutions Concepts for DCGAN

Transposed strided convolutions Concepts for DCGAN
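A short shape check for the two operations, as a sketch. The channel counts and the kernel 4 / stride 2 / padding 1 combination are assumptions chosen because they halve and double the spatial size exactly.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

# Strided convolution: downsamples, playing the role of a learned pooling.
down = nn.Conv2d(3, 8, kernel_size=4, stride=2, padding=1)
h = down(x)      # (64 + 2*1 - 4) // 2 + 1 = 32

# Transposed strided convolution: upsamples, used in the generator.
up = nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1)
y = up(h)        # (32 - 1) * 2 - 2*1 + 4 = 64

print(h.shape, y.shape)
```

The discriminator stacks layers like `down` to shrink an image to a single score, while the generator stacks layers like `up` to grow a latent vector into an image.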

Batch Norm
  • Introduced in Ioffe & Szegedy (2015)
  • Normalize input to each unit to have mean 0 and variance 1
  • Allows gradients to flow for deeper generators with no mode collapse
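The normalisation step can be seen directly with `nn.BatchNorm1d`. A minimal sketch; the batch size, feature count and the shift and scale of the input are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)                  # one mean/variance pair per feature
x = torch.randn(256, 4) * 5.0 + 3.0     # inputs with mean ~3 and std ~5

bn.train()                              # use batch statistics
out = bn(x)

print(x.mean(dim=0), x.var(dim=0, unbiased=False))
print(out.mean(dim=0), out.var(dim=0, unbiased=False))
```

In training mode each feature of the output has mean close to 0 and variance close to 1 over the batch. The learnable scale and shift start at 1 and 0, so they do not change this at initialisation.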

Concepts for DCGAN

Adam optimizer
  • Introduced in Kingma & Ba (2014)
  • Adaptive moment estimation
  • Stochastic gradient descent has one learning rate. Adam has an adaptive learning rate for each network parameter
  • Uses moving averages of first and second moments of gradient
  • Four hyper parameters: initial learning rate, two decay rates on the moving averages (one per moment), and epsilon
  • In practice very robust and doesn’t need as much tuning as other algos

Concepts for DCGAN

Coding DCGAN

Questions
  • What prevents this architecture from being used for large models?
  • What would happen if we replaced batch norm with spectral norm?
  • What do I mean when I said strided convolutions replace pooling functions?

Coding DCGAN

Intro to CycleGAN
Approximate P(x,y) rather than P(x | y)

Horse to zebra CycleGAN

Unpaired image to image translation CycleGAN

High level
  • Unpaired image to image translation
  • Builds on the pix2pix model from a year earlier
  • Two generators and two discriminators
  • Concept of cycle consistency loss
  • Transfers style from one collection to another collection

CycleGAN
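A sketch of the cycle consistency term: translating A to B and back should return the original image, and likewise for B. The two generators here are single conv layers used as hypothetical stand-ins, not the ResNet generators from the paper. The weight of 10 follows the `lambda_A` / `lambda_B` values in the training options logged in these notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in generators (real ones are much deeper).
G_ab = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # domain A -> domain B
G_ba = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # domain B -> domain A

real_a = torch.randn(2, 3, 16, 16)   # unpaired batch from domain A
real_b = torch.randn(2, 3, 16, 16)   # unpaired batch from domain B

l1 = nn.L1Loss()
lambda_cycle = 10.0

# A -> B -> A should reconstruct real_a; B -> A -> B should reconstruct real_b.
cycle_a = l1(G_ba(G_ab(real_a)), real_a)
cycle_b = l1(G_ab(G_ba(real_b)), real_b)
cycle_loss = lambda_cycle * (cycle_a + cycle_b)

print(cycle_loss.item())
```

During training this term is added to the adversarial losses of the two generators, which is what lets the model learn from unpaired collections.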

Model training CycleGAN Zhu et al 2017

Paper implementation
  • Generator: 6 - 9 residual blocks
  • Discriminator: 70x70 PatchGAN to reduce size
  • Least squares loss replacing BCE for stability
  • Train discriminators with a history of generated images rather than only the latest, to reduce oscillation
  • Train with Adam and a batch size of 1
  • See PyTorch code: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

CycleGAN

Course implementation
  • Generator: Transposed convolutions with a residual block
  • Discriminator: convolutional discriminator
  • Least squares loss replacing BCE for stability
  • Train with generated images from the current batch
  • Train with Adam and a batch size of 16

CycleGAN

What’s next?

Next week
  • Coding cycleGAN
  • Adversarial training more generally
  • Evaluating GANs and where the technology is heading

What’s next?

Assignments and reading
  • Pix2pix and cycleGAN papers
  • Take a look at the capstone papers
  • Look at this code base: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/tree/master/models
  • Collect all your questions. About anything…

What’s next?

Thank you

[x] Capstone Blog Post

update gans intro , edit group posts 3 paragraphs / tech covered
Headline    Time
Total time  11:04
edit group posts

likely more accurate time. except for breaks though

[#A] RL Workshop [10/15]

Headline    Time
Total time  12:56

Notes

Prep Meet1 [2019-09-16 Mon]
1st Sutton book 2nd rainbow 3rd policy gradients

3 teams – ask Qs, are engaged, understand things, convey we are here to help more attention from instructor online relaxing FG will do most of it.

pencil&paper exercise? exercises get points setup rules clearly

Vahid — oct3rd, oct4th to EU

log in sheet – 15min answering Qs offline add prep time add

Prep Meet3 [2019-09-30 Mon]

Archive

TODOs [9/14]

[x] review
udacity RL repo
  • interesting when starting up jupyter-lab for this repo: Build Recommended JupyterLab build is suggested: @jupyter-widgets/jupyterlab-manager needs to be included in build
[c] followup jalan
she could not get things to work
[c] followup knowing probabilities
[x] aisc online-participant guide post

For online participants, please familiarize yourself with the online participation guide:

  1. You can join this call for the hands on sections to discuss any issues you face, or any questions you have https://meet.google.com/ake-cozz-kkx?hs=122. It is the same for all sessions. This is not the same as the link for the workshop streaming itself (located here https://member.ai.science/workshops/rl-2019-09&sa=D&usd=2&usg=AOvVaw0zvT5wO7_6Sofktww4pQ2J)
  2. Please don’t forget to mute your mic when the instructor session begins, or if there is background noise at your location.
  3. Do not hesitate to ask questions, we are here to help.
  4. Screen sharing in meet.google.com: https://support.google.com/meet/answer/7290345?co=GENIE.Platform%3DDesktop&hl=en, or this video clip https://www.youtube.com/watch?v=6FCIqvv68NY. For some issues, we may need to see your screen in order to help you.
  5. The hands-on notebook will be in the google drive prior to the workshop. Don’t forget to make a copy in your own drive and use the copy for all your work. If you have time, try and look thru the notebook prior to the session.
  6. We will use <link> as an online whiteboard as a convenience. Sometimes it is easier to explain with a diagram.
[x] prep RL – proread Florians, read his material
2 papers and blog posts policy gradient in np actor-critic deep deterministic
[x] proof slides
  • title
    • “3ed” -> “3rd”
    • cap “W”orkshop
  • #6 “process converges to to” duplicate “to”
[x] test notebook
[x] routine work
  • copy session3 sol notebook to std folder
  • dl to my files
  • start reading texts
[x] breakdown time
[.] populate this doc with important links

could you populate this doc with some of the important links shared with the class, or useful links that the student shared? https://docs.google.com/document/d/16KX0-fJMBn1ByaG23TEZQHTVAMx-CHEO8jY23utBdjw/edit

Capstone blog post [0/4]
Guidelines

This blog post is the collective work of the participants of the “Reinforcement Learning” workshop. This post serves as a proof of work, and covers some of the concepts covered in the workshop in addition to advanced concepts pursued by students. The blog post will be shared on the AISC Blog, and other major ML/DS related outlets.

Objective The objective of this post is to demonstrate an understanding of the concepts learned in the Workshop.

Teams will be tasked with summarizing a selected RL paper and providing additional insight into the findings and concepts discussed.

Contribution Declaration Each team will provide contribution acknowledgement to its members for participating in the blog post writing and idea generation.

Steps

  1. Refer to the References section and select one of the suggested papers. Given the topics discussed in the workshop, you should be familiar with the concepts used in each of the papers.
  2. Please list the names of your group and team members that contributed to the development at the top of each section.
  3. Work with your breakout team to go through the paper, and support each other to make sure everyone understands it well; you can divide and conquer by reading/researching different parts and explaining to each other
  4. Come up with a work plan for each member of the team to contribute something (the updated version of this should later turn into the “contribution declaration”)
  5. Collaborate and draft your learnings into a section in this article; the alternative is to create a video about what you learned
  6. AISC blog editors will provide light technical and language feedback on the post prior to publication on the AISC Blog.
  7. AISC blog editors will provide guidance before submitting the blog post to other mediums such as Towards Data Science (Medium.com) and KDNuggets.

Important Dates

There are 2 deliverable options based on your available time commitment. Please select the option that suits your team members.

  • SEP 20: Select the paper you want to work on
  • SEP 30: Write a qualitative summary of what you learned about the paper you selected. Please provide enough technical details to demonstrate your understanding
  • OCT 14: Reproduce the results of the paper by going through the code, and rerunning it on the given data set. This should link back to your github repo, or any other pages showing your results/code

If you’ve completed both exercises and would like an additional challenge, please contact us at events@ai.science and we can provide guidance to find an extension that could result in some sort of research publication, or a simple application that you can use as part of your portfolio

Recommendations

  • You may pick a paper that is not listed here as long as you convince your team; please add that paper to the References section
  • We generally prefer that you don’t work alone, but if you have very good reasons for it, we might consider it
  • You are encouraged to read other sections and provide constructive feedback in the form of comments, but please do not alter them
  • If you claim a section to write, then you need to deliver by the due dates above. If you miss the deadline, the post will be published without your section
  • Breakout teams are combinations of in-person and online audience, and it’s everyone’s responsibility to make sure that all team members are engaged and informed about plans.
  • You can use the slack channel for your communication as much as you want, but also can arrange for video calls etc.
  • If you use any other resources, add their information to the References section, but make sure you don’t modify the existing References
[.] capstone intro

The foundation of all Reinforcement Learning (RL) is an agent that acts on, and is acted on by, its environment.

The agent-environment relationship forms a closed loop:

  1. the environment receives from the agent an action
  2. this action can change the environment. That is, the environment changes state
  3. the agent then receives from the environment both state and reward information

This loop can be formalized as a Markov Decision Process (MDP), $$p(s', r \mid s, a) = \Pr(S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a)$$ where the next state $$s'$$ and its associated reward $$r \in \mathbb{R}$$ depend only on the previous state $$s$$ and the action taken $$a$$. Every part of this system can be elaborated upon, and from this follows all the rest of the field.
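As a rough illustration of the loop and of $$p(s', r \mid s, a)$$, here is a toy MDP written as a plain dictionary. The two states, the action names and all the probabilities are made up for the example.

```python
import random

# Hypothetical two-state MDP: P[(s, a)] is a list of (probability, next_state, reward).
P = {
    ("cold", "heat"): [(0.9, "warm", 1.0), (0.1, "cold", 0.0)],
    ("cold", "wait"): [(1.0, "cold", 0.0)],
    ("warm", "heat"): [(1.0, "warm", 0.5)],
    ("warm", "wait"): [(0.7, "warm", 1.0), (0.3, "cold", 0.0)],
}

def step(state, action, rng):
    """Environment side of the loop: sample (next_state, reward) from p(s', r | s, a)."""
    outcomes = P[(state, action)]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

def run(policy, start, n_steps, seed=0):
    """Agent side of the loop: pick an action, then observe the new state and reward."""
    rng = random.Random(seed)
    state, trace = start, []
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trace.append((state, action, reward, next_state))
        state = next_state
    return trace

trace = run(lambda s: "heat" if s == "cold" else "wait", "cold", 5)
```

The next state and reward are sampled only from the current state and action, which is the Markov property in code form.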

Environments can be discrete or continuous. State changes can be deterministic or not. An environment can be described by its states, the allowed actions at each state, and a transition function $$t(s,a) = s'$$ that accepts a state $$s$$ and an action $$a$$, and returns the next state. If the environment is not deterministic, then $$t(s,a)$$ returns $$P(S')$$, the probability vector where each $$p(s_i) \in P(S')$$ is the probability that the action will result in the environment changing to state $$s_i$$.

The agent's goal is to maximize its total reward over time. Thus it needs to choose a particular sequence of actions that will achieve this. Agents choose their actions based on a policy function: $$\mu(s)$$ if it is deterministic, or $$\pi(a \mid s)$$ if it is probabilistic. The policy can be as simple as a static look-up table, or as complex as a large, deep net. An $$\varepsilon$$-greedy policy is one where the agent chooses a random action with probability $$\varepsilon$$, or the action currently estimated to be most rewarding otherwise. This highlights the trade-off between exploitation and exploration that is a common theme of policies.

When the agent-environment loop is indefinite in duration, the rewards cannot be simply summed, because the sum may not converge. A discounting factor $$0 < \gamma < 1$$ is used to attenuate future rewards: $$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \gamma^3 R_{t+4} + \dots$$

Agents also have either a value function $$v(s)$$ that returns a value for state $$s$$, or $$q(s,a)$$ that values a state-action pair (the value of taking action $$a$$ while in state $$s$$). Under a policy $$\pi$$, both are expected discounted returns: $$v_\pi(s) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\Big\vert\, S_t = s\right], \text{ for all } s \in S$$ $$q_\pi(s,a) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\Big\vert\, S_t = s, A_t = a\right], \text{ for all } s \in S, a \in A$$ Because $$G_t = R_{t+1} + \gamma G_{t+1}$$, both value functions satisfy a recursive relationship, expressed by the Bellman equations.

Almost all reinforcement learning algorithms are Generalized Policy Iteration (GPI) methods:

  • Maintain approximate value and policy functions
  • The policy is iteratively improved with respect to the value function, while the value function is evaluated with respect to the policy
  • This feedback loop converges to optimal policy and value functions
  • the value function is used to structure and constrain the policy search
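Two of the ideas from the paragraphs before this list, the discounted return and the ε-greedy rule, are small enough to sketch directly. The function names and the example numbers are my own.

```python
import random

def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... for a finite reward list."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # uses the recursion G_t = R_{t+1} + gamma * G_{t+1}
    return g

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=q_values.__getitem__)     # exploit

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))   # always greedy when epsilon is 0
```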

Dynamic Programming (DP) algorithms are at one extreme of RL methods, requiring a perfect model of the environment and computation that becomes very expensive as the state space grows. They are used for the theoretical underpinning of reinforcement learning as opposed to practical use. Briefly, dynamic programming involves finding optimal solutions by progressively building from optimal solutions to sub-problems.

At the other extreme, Monte Carlo (MC) methods have no model and rely solely on experience from agent-environment interaction. The value of a state $$s$$ is computed by averaging over the total rewards of several traces starting from $$s$$. These methods require completing entire episodes (traces) before the value function can be updated.

Temporal Difference (TD) learning is an invaluable approach that combines advantages from both DP and MC methods. As the name implies, value updates are driven by the difference between estimates at successive time steps. It does not require an environment model like DP, and unlike MC it can update prior to episode completion. Like MC, it learns directly from experience, and like DP it iteratively updates estimates. This is easiest to show by comparing the value function update rules:

Monte Carlo $$V(S_t) \leftarrow V(S_t) + \alpha\left[G_t - V(S_t)\right]$$

Dynamic Programming $$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma G_{t+1} \mid S_t = s\right] = \mathbb{E}_\pi\left[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s\right]$$

Temporal Difference $$V(S_t) \leftarrow V(S_t) + \alpha\left[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\right]$$

where $$\alpha$$ is the learning rate, and the term $$R_{t+1} + \gamma V(S_{t+1})$$ is the updated estimate (the TD target) that $$V(S_t)$$ is moved towards.
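The Monte Carlo and TD update rules above differ only in their target, which a short sketch makes visible. The dictionary-based value table and the state names are assumptions for the example.

```python
def mc_update(V, state, G, alpha):
    """Monte Carlo: move V(S_t) towards the full observed return G_t (needs a finished episode)."""
    V[state] += alpha * (G - V[state])

def td0_update(V, state, reward, next_state, alpha, gamma):
    """TD(0): move V(S_t) towards the bootstrapped target R_{t+1} + gamma * V(S_{t+1})."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

# Hypothetical value table with states "A", "B" and a terminal state "T".
V_td = {"A": 0.0, "B": 1.0, "T": 0.0}
td0_update(V_td, "A", reward=0.0, next_state="B", alpha=0.5, gamma=1.0)

V_mc = {"A": 0.0, "B": 1.0, "T": 0.0}
mc_update(V_mc, "A", G=2.0, alpha=0.5)
```

`td0_update` can run after every single transition, while `mc_update` has to wait until the return `G` of the whole episode is known.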

SARSA is a TD algorithm that is an ‘on-policy’ learning method. On-policy methods evaluate and improve the same policy $$\pi$$ that is used to make the action decisions. In contrast, an off-policy method uses two policies: a behavioural policy $$b$$ that is more willing to explore traces outside of the current optimal estimates, and the target policy $$\pi$$ to be optimized. Q-learning is an off-policy TD method. It is defined by: $$Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\left[R_{t+1} + \gamma \max_a Q(S_{t+1},a) - Q(S_t,A_t)\right]$$

Deep Q-learning (DQN) uses a deep neural network to approximate the action-value function, from which the policy is derived. The cost function for DQN is $$\Big( \text{DQNnet}(S_t, A_t) - \big(r + \gamma \max_a \text{DQNnet}(S_{t+1}, a)\big) \Big)^2$$

An important addition to the architecture is experience replay by the use of a memory $$D$$. Agent experiences are stored as tuples $$e_t = (s_t, a_t, r_t, s_{t+1})$$ over many episodes. During training, minibatches of samples are drawn from $$D$$ at random for standard SGD optimization.

Rainbow DQN combines several architectural innovations to vanilla DQN that have proven to be beneficial:

  • Double deep Q-Learning
  • Duelling DQN
  • Action Advantage
  • Noisy Networks
  • Multi-step Learning
  • Prioritized Experience Replay
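Before any of these refinements, the two basic building blocks described earlier are the Q-learning update and the replay memory $$D$$. Here is a minimal tabular sketch. The class and function names are hypothetical, and the `done` flag is an addition of mine (the tuple in the text has four elements) so that terminal transitions do not bootstrap.

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-size memory D of (s, a, r, s_next, done) experience tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)   # uniform random minibatch

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

Q = np.zeros((2, 2))            # 2 states x 2 actions
Q[1] = [1.0, 2.0]
memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(0, 1, 1.0, 1, False)
for s, a, r, s_next, done in memory.sample(1):
    q_learning_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.5)
```

In DQN the table `Q` would be replaced by a network and the update by a gradient step on the squared error above.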

Lastly, Policy Gradient Methods directly optimize a parameterized, differentiable policy function that does not require the use of the value function for action selection. For example, the REINFORCE Monte Carlo Policy Gradient algorithm trains policy $$\pi(a \mid S_t, \theta)$$ with parameters $$\theta$$ by the update rule $$\theta_{t+1} = \theta_t + \alpha \gamma^t G_t \frac{\nabla \pi(A_t \mid S_t, \theta_t)}{\pi(A_t \mid S_t, \theta_t)}$$

where $$\alpha$$ is the learning rate, $$\gamma$$ is the discounting factor, and $$G_t$$ is the return from time step $$t$$ to the end of the episode. A useful algebraic trick is to reformulate the right-most term above as $$\frac{\nabla \pi(A_t \mid S_t, \theta_t)}{\pi(A_t \mid S_t, \theta_t)} = \nabla \ln \pi(A_t \mid S_t, \theta_t)$$
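The log-derivative identity can be checked numerically. The sketch below assumes a stateless softmax policy (one preference per action, no state input), which is a simplification of my own. It compares the analytic gradient of $$\ln \pi$$ with a finite-difference estimate of $$\nabla \pi / \pi$$, then applies one REINFORCE step.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - np.max(theta))   # subtract the max for numerical stability
    return z / z.sum()

def grad_log_pi(theta, action):
    """Analytic gradient of ln pi(action | theta) for a softmax policy: onehot(action) - pi."""
    g = -softmax(theta)
    g[action] += 1.0
    return g

def grad_pi_over_pi(theta, action, h=1e-6):
    """Central finite-difference estimate of grad pi(action) divided by pi(action)."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = h
        grad[i] = (softmax(theta + d)[action] - softmax(theta - d)[action]) / (2 * h)
    return grad / softmax(theta)[action]

def reinforce_step(theta, action, G, t, alpha=0.1, gamma=1.0):
    """theta <- theta + alpha * gamma^t * G_t * grad ln pi(A_t | theta)."""
    return theta + alpha * gamma ** t * G * grad_log_pi(theta, action)

theta = np.array([0.1, -0.3, 0.5])
new_theta = reinforce_step(theta, action=1, G=1.0, t=0)
```

With a positive return, the step raises the probability of the action that was taken.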

points
  • Markov Decision Process (MDP) + Markov Property
  • Env
    • states
    • rewards
    • transition
  • recursive, iterative, game, terminate, infinite, trace/trajectory/episode, goal, G, R/rrr,
    • discounting
    • exploration vs exploitation
      • e-greedy
      • 1-armed bandit
  • Agent
    • policy
      • actions
      • epsilon-greedy
    • value
      • value functions and Bellman Equation
  • Almost all RL algorithms are GPI (Generalized Policy Iteration)
    • Maintain both an approximate value function and an approximate policy
    • Iteratively improve policy with respect to value function, and value function always drives to the value function of the current policy
    • Overall process converges to optimal policy and optimal value function (or in PGMs?)
  • Dynamic programming
    • model vs model free - simulates future states
    • Florians:
      • Use value function to perform a structured search of good policies
      • Iterative approximations v1, v2, v3, v4, … of vπ(s) by using Bellman equation as update rule
      • Replace v(s) with a new value calculated from the old values of v(s’). This is called an expected update.
      • Terminates once the value function changes only minimally after an iteration
  • Monte Carlo Methods
    • DP requires distribution of the next events
    • Monte Carlo based methods rely only on experience
      • No prior knowledge of the environment is required
      • Averages sample returns (remember k-armed bandits)
  • TD learning
    • Compare to DP and MC
      • Does not require model of environment (unlike DP)
      • MC needs to wait until the episode finishes, TD can update online
      • With MC it’s hard to estimate the value of an action-state pair
    • On-Policy, Off-Policy
      • off-policy
        • Importance Sampling
        • Transform Weight
        • Weight Importance Scaling
    • SARSA
    • Q-Learning is also a temporal difference learning algorithm. However, unlike SARSA, it is off-policy.
  • Deep Q-Learning
    • ?optimal bellman for QL
    • Experience Replay
    • All the learnable parts: policy, value fns,
    • Rainbow DQN
      • Double deep Q-Learning
      • Duelling DQN
      • Action Advantage
      • Noisy Networks
      • Multi-step Learning
      • Prioritized Experience Replay
  • Policy Gradient Methods
    • describe
    • Regression towards optimal policy - we don’t know optimal policy
    • REINFORCE Monte Carlo Policy Gradient Control
    • stronger convergence guarantees
    • Deep Deterministic Policy Gradients
    • Actor-Critic
    • DDPG Deep Deterministic Policy Gradient algorithm

In all the discussion up to now, any model is learned indirectly by translating reward information into a loss function. Policy Gradient Methods directly apply the reward signal to the gradient updates of the policy function.

draft

An agent must learn the value of states

value fns expanded: $$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_\pi(s')\right], \text{ for all } s \in S$$ q version: $$q_\pi(s,a) = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \sum_{a'} \pi(a' \mid s') q_\pi(s', a')\right], \text{ for all } s \in S, a \in A$$
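Used as an update rule, the expanded $$v_\pi$$ equation gives iterative policy evaluation (the "expected update" from the points list). A small sketch, with a made-up two-state MDP and policy, and with in-place updates as one possible design choice:

```python
def policy_evaluation(states, actions, p, pi, gamma=0.9, tol=1e-8):
    """Iterate v(s) <- sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma v(s')] until stable.

    p[(s, a)] is a list of (prob, next_state, reward); pi[s][a] is the probability of a in s.
    """
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            new_v = sum(
                pi[s][a] * sum(prob * (r + gamma * v[s2]) for prob, s2, r in p[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v              # in-place update, reused within the same sweep
        if delta < tol:               # stop once a full sweep barely changes v
            return v

# Hypothetical deterministic MDP and policy.
states, actions = ["A", "B"], ["stay", "go"]
p = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"): [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "go"): [(1.0, "A", 0.0)],
}
pi = {"A": {"stay": 0.0, "go": 1.0}, "B": {"stay": 1.0, "go": 0.0}}
v = policy_evaluation(states, actions, p, pi)
```

For this policy the fixed point can be checked by hand: v(B) = 2 / (1 - 0.9) = 20 and v(A) = 1 + 0.9 * 20 = 19.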

The foundation of all Reinforcement Learning (RL) is of the agent in an environment. From this flows all aspects of RL

  • agent volition (policy)
  • agent

The agent is an actor which receives information about the environment, acts so as to make a change An agent that acts, receives input

Environments can be discrete or continuous. State changes can be deterministic or not. An environment can be described by its states, the allowed actions at each state, and a transition function $$t(s,a) = s'$$ that accepts a state $$s$$ and an action $$a$$, and returns the next state. If the environment is not deterministic, then $$t(s,a)$$ returns $$p(S')$$, the probability vector where each $$p(s_i) \in p(S')$$ is the probability that the action will result in the environment changing to state $$s_i$$. A game such as chess is discrete and deterministic; players take turns one at a time, and moving the pieces has certain, well defined state transitions. A fisherman who is fishing has neither. The environment state is continuously changing, and even though the fisherman performs the action sequence of fishing, the result of catching a fish is not certain, but is a probability. Notice that fishing was described as a sequence of actions. Such a sequence is called a trace (or trajectory). The sequences are recursive in nature: each step follows from the one before. The chess game eventually terminates, so all possible trajectories are finite. Some agent-environment loops are indefinite, such as gambling with a one-armed bandit (a casino slot machine). A game with a one-armed bandit has a goal to make money over time; in that case the player is rewarded $$r_i$$ each time the bandit returns a win at time $$t_i$$. But each move costs: each non-winning state has a reward of, say, -1. In chess there is a single goal G to win the game, there are few terminal states that have a reward, and the intermediate states have none. But some states are more likely to result in a win and thus are more valuable. An agent has associated with it both a value function and a policy function. It is possible for an agent to have An agent must learn the value of states

[.] capstone edit #4 1st edit
[.] capstone edit #9 2nd edit
[.] capstone edit #4 2nd edit
Breakout 4
  • pdf reading [2019-10-21 Mon 13:49], about 1.5 hours till now.
    • before hand sporadic

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Breakout 7
Title of the paper: A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem Team members: Larry Li , Nick Buryonk , Gurinder Ghotra Contributions:
Breakout 8
Title of the paper: Deep Reinforcement Learning in Large Discrete Action Spaces Team members: Eric Djona Fegnem, Ariel Wang, Brendan McGivern, Yuanhui Lang, Ike Okonkwo Contributions:
Breakout 9
Title of the paper: AlphaD3M: Machine Learning Pipeline Synthesis Team members: Most Husne Jahan, Gunjan Lakhlani, Andriy Kourinnyi, Alireza Darbehani Contributions:
  • 1st sesh, distracted.
    • didn’t count this,

Breakout Groups [0/0]

Session 1 [0/0]

how to write own env keras or el
  • Policy:
    • defines how our AI chooses its action for a given state
  • Reward Signal:
    • the reward for a given action (not state?)
  • Value function:
    • Expected future rewards of a state (how good is a given state)
  • Environment models:
    • Simulates future states, allows for planning

Session 2 [1/1]

[x] prep
links

https://awwapp.com/# https://www.tutorialspoint.com/free_online_whiteboard.htm https://drive.google.com/drive/folders/1f8okO2XDraTJydT7MetOKuPrU8U1pBU2 https://colab.research.google.com/drive/1ymctv7BRNG9UL7Tara_vyYgi103sdMnW#scrollTo=AT8EfnJGgbc7 https://colab.research.google.com/drive/1zEgL1uYgtJgZMJ00LBKPs02dOkOnnVa8#scrollTo=CeiiZKYVgcVq https://drive.google.com/drive/folders/1ffIXP5k9k5Rcvbo4ouAXFJeR8XoTcUIb https://colab.research.google.com/drive/13tk0npWP6JhEx9JVw0bdcC8u3skD2fm0#scrollTo=5zH8U8OGe_fd

Session 3 [0/0]

MLops Workshop [22/44]

**** Notes
***** My AWS Account Info
AWSAccessKeyId=[REDACTED]
AWSSecretKey=[REDACTED]
region=us-east-2

***** My Azure Account Info
az account show

{
  "environmentName": "AzureCloud",
  "id": "38e53cfe-df59-42bc-ac0c-b50136568522",
  "isDefault": true,
  "name": "Free Trial",
  "state": "Enabled",
  "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272",
  "user": {
    "cloudShellID": true,
    "name": "live.com#willy.rempel@gmail.com",
    "type": "user"
  }
}

azureProfile.json

{"subscriptions": [{"id": "38e53cfe-df59-42bc-ac0c-b50136568522", "name": "Free Trial", "state": "Enabled", "user": {"name": "willy.rempel@gmail.com", "type": "user"}, "isDefault": true, "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272", "environmentName": "AzureCloud"}]}

ML service workspace

nameazure-ml-ws-1
SubscriptionFree Trial
Resource groupcloud-shell-storage-eastus
LocationEast US 2

***** Important links doc
****** Azure instructions
  • Set up your free Azure Credit
  • Install mini conda for Python 3.7: https://docs.conda.io/en/latest/miniconda.html (make sure you updated the Path for conda command)
  • Run the following commands
    • conda create --name azureml python=3.7
    • conda activate azureml
    • conda install scikit-learn
    • pip install tensorflow==1.14
    • pip install azureml-sdk[explain,automl,notebooks,services]
    • pip install pandas
    • pip install jupyter
  • Github:
    • Install github desktop
    • Create a github account
    • Set up github on your machine (login etc)
  • Install VSCode
  • Install extension: (https://code.visualstudio.com/docs/editor/extension-gallery)
    • Python (Author: Microsoft)
    • Azure Account (Author: Microsoft)
    • Azure Machine Learning (Author: Microsoft)
    • Git Graph (Author: mhutchie)
***** session1 notes

  • 10-15 years of DevOps
  • MLops
    • own entire lifecycle: build and deploy
    • IT (Ops) only focus on infrastructure
    • at least be able to talk to Ops team,
  • cloud
    • managed resources
    • huge abstraction
    • serverless, auto scalability
    • Separation of resources
    • lower cost
  • @1:07 git starts

**** Archive
**** TODOs [22/43]

***** AWS Study [2/7]
****** [x] sagemaker api
****** [x] boto3 api
****** [.] aws deepdive series
******* vid1
  • managed notebook EC2 VM instance, managed means
    • doesn’t show up in EC2 console
    • no SSH access
  • EBS volume 5GB default
    • persists
  • add, create git repo
  • config shell 15min time limit, use’&’
  • Elastic Inference – attach GPU

****** [-] aws cli
****** [.] build own pipeline
might need more, check serverless repo
******* [.] yaml formation
******* [.] aws shell
***** Azure Study [0/2]

[.] yaml files method
[.] relation of dev.azure.com , portal.azure.com
Info for Docs For all doc entries

Here are the topics and the respective contents that need to be created for AWS and GCP: For every bullet for each day we need to gather:

  • Title and a short description of the technology in a paragraph or two. This will be used to explain technology.
  • 2-3 links for further studies.
  • If it requires implementation: simple notebook. If it’s an architecture 2-3 images

Hossein mentioned - appendices included for hands-on and home-work (HW)

Doc Day 1: [18/18]
[c] Overall Architecture for ML Stack
Overall Architecture for ML Stack in GCP and AWS (one or two paragraphs + architecture diagram)
[c] CI | CD frameworks on AWS and GCP + Integrating them with GCP AI Hub or AWS SageMaker for ML Pipelines (GitOps) (Simple diagram showing the flow + Short Notebook if applicable)
  • native clouds, or FOSS as example
  • data engineers
[x] Experiment tracking tools (one or two paragraphs + code samples + screenshot if applicable)
  • tracking/logging metrics in AWS
  • hard to find?
  • exptrack – collect logs, vs #4 below.
  • autoML –
[x] Title and a short description of the technology. This will be used to explain technology. 2 paragraphs
  • autoML algos are in marketplace

AWS Search

  • Find
  • Evaluate
  • Verify datasets used by training jobs
  • Trace Model lineage
    • dataset
    • algorithm
    • hyperparameters
    • metrics

Tags are used to track experiments and group them together. You apply them in your code, and can search using either the AWS Console, a web front-end, or by the API.
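A rough sketch of what searching by tag could look like through the API. The helper name `build_search_request`, the tag key and value, and the metric name are placeholders of mine. The request layout follows my reading of the SageMaker `search` call in boto3, so treat it as an assumption to check against the AWS documentation linked below.

```python
def build_search_request(tag_key, tag_value, metric_name, max_results=10):
    """Build keyword arguments for a SageMaker training-job search filtered by one tag."""
    return {
        "Resource": "TrainingJob",
        "SearchExpression": {
            "Filters": [
                # Tag filters are addressed as "Tags.<key>"
                {"Name": f"Tags.{tag_key}", "Operator": "Equals", "Value": tag_value}
            ]
        },
        "SortBy": f"Metrics.{metric_name}",   # rank the matching jobs by a tracked metric
        "SortOrder": "Descending",
        "MaxResults": max_results,
    }

request = build_search_request("Project", "demo-project", "validation:accuracy")

# With AWS credentials configured, the request could then be sent like this (not run here):
# import boto3
# results = boto3.client("sagemaker").search(**request)
```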

AutoML - this is offered by the AWS marketplace, where there are several options

[x] 2-3 links for further studies.
AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/search.html

Sample Notebook: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb

[x] If it requires implementation: simple notebook. If it’s an architecture 2-3 images
[x] code sample
[x] screenshot
[x] Hyperparameter tuning techniques and parallel training engines in GCP or AWS (one or two paragraphs + code samples + screenshot if applicable)
Notes reading AWS docs
[x] title + 2 paragraphs
Title and a short description of the technology. This will be used to explain technology.
  • Define Metrics
  • Define Hyperparameter Ranges
  • Early Stopping Options
  • Bayesian or Random Search Options
[x] 2-3 links for further studies.
AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html

Example from Documents: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html

[x] If it requires implementation: simple notebook. If it’s an architecture 2-3 images
[x] code sample
[x] screenshot
[x] Q for Hossien
  • is he presenting our stuff or are we?
    • am I doing online?
  • What is expected of us? thorough knowledge of our cloud
  • FOSS do I pick my own favourite?
  • NO: do we talk about the high level services (turn-key)
[x] addition from Hossein
Important Update - extra actions before Day 1
Please accomplish two extra steps after: conda activate azureml
  • pip install pandas
  • pip install jupyter
After all of the steps for number 2 are finished (after pip install azureml-sdk …), test whether the AzureML SDK is properly installed. In command prompt:
  • make sure azureml conda environment is activated (conda activate azureml)
  • type: python
  • type: import azureml.core
  • type: print(azureml.core.VERSION)
[x] check Werners diagram
[x] sagemaker search
only on console?
[#A] Day2 [2/14]
Doc Day 2:
[.] Model Management / Model Store (one or two paragraphs + code samples (register model/access model) + screenshot if applicable)
[.] Containerize the model into an image (one or two paragraphs + code samples (register model/access model) + screenshot if applicable)
[.] Perform a dummy test at model registration time into the model store - (one or two paragraphs + code samples + screenshot if applicable)
[.] Integrate CI/CD pipeline with the model store using the dummy test (one or two paragraphs + code samples + screenshot if applicable)
[x] move D Rangel
[.] G3:Nour helpful extra stuff
  • ie. protect a branch
[x] session2 prep - tuesday night deck and notebooks (drafts), go thru
[.] pre-session pack
[.] TAs more proactive - howto for online?
[.] code samples and guides to do capstone. also for future pipeline builds
[.] AWS Notebooks
[.] Q data sanity check
[.] Q telemetry info
[.] Q register model
[.] G3 M2
Doc Day3: [0/0]
  • discuss different deployment scenarios - batch and online
  • build release pipeline for model deployment
  • monitoring and logging techniques for ML model in the wild
  • best practices to build scalable ML pipelines
  • bring model explainability in the pipeline (training and deployment)
[.] notebook fixes

  • cell 14 : add model_name = "tf_mnist_pipeline.model"
  • cell 15 : create score directory in root folder
  • cell 19 : add import os
  • cell 19 : “Execution script score.py doesn’t exist.”

[#A] MLops Capstone: [0/1]

[.] [#B] Capstone & Documents Guide (meet Hossien) [2019-10-07 Mon 18:00]
  • pipeline= series of transforms
  • data pre-process keep separate from model building
  • [ ] actual compute targets
  • [ ] bring in data from redshift? other than S3
  • [ ] what enviros for work?
  • reproducible enviro for training for day1
  • multi-pipelines options for the whole pipeline, so people know about them
  • be able to coach ppl on how to do the work on cloud use for capstone
  • [ ] 1st part architecture: data - preprocc - train - etc
    • deploy, orchestration next day
  • end-to-end initial, what it looks like
  • exp tracking , any logs, not just
  • use these as guides: azure experiments, mlflow experiments
  • [ ] high level understanding for initial
    • [ ] find some github links for examples
  • [ ] separate g docs for days
  • use boto, SDK
  • they pick a project that groups build from beginning to end
? help Jiri with their AWS documents - Joffri

meeting

meet #1 [2019-10-01 Mon]
  • Azure
    • focused
  • FOSS stacks as well, and theoretical aspects
    • QFlow, etc ???
    • Spod,
    • dataprod, databricks,
    • ppl tend to use managed FOSS
    • focus for enterprise ready on the cloud
  • including as alt to azure
    • AWS
      • TODO US: for AWS
        • slides
        • simple notebook
          • ie. track metrics on aws or gcp
    • google cloud compute (GCP) - other TAs
  • full ML pipeline finished by end of WK3
meet #2 fri [2019-10-04 Fri]
  • git, more advanced
    • team focused
    • 1hr including hands-on, some examples
    • cherry picking, bisecting optional HW
  • some contest? missed. Monday, tue
Day1 Werner meets
werner call
meet 2.1 Werner
  • sagemaker: no container,
  • ci/cd use co-pipline e
  • AWS acct: ML-Ops Staff
  • sagemaker: do everything there.
  • sagemaker estimator least flexible
    • [ ] can we go lower level?
  • AWS lambda for inference?
  • CloudFormation - it is the orchestration
    • use a definition, and reverse engineer
    • ECR - Amazon's container registry (Elastic Container Registry)
    • amazon dynamoDB for logging etc, but you are not stuck with it
    • cloudformation button, deploy whole thing
  • Xiyang will send AWS
meet 2.2 Werner tryout ci/cd on aws
Action execution failed Parameter validation failed: Invalid length for parameter InputDataConfig, value: 0, valid range: 1-inf

Input artifact bld Output artifact trigger FunctionName aws-mlops-model-cicd-pipeli-LambdaSageMakerTrigger-DKC87JX8ZI2C

meet #3 fri [2019-10-11 Fri]
  • 50-55 ppl
  • feedback:
    • lots about git
    • issues about enviros and setup
    • some too slow, some too fast
    • etc
  • Amir:
    • cutting down material
    • TAs more proactive
      • howto for online?
      • fixed (poll) hours
meet #4 [2019-10-14 Mon]
  • 2nd session
    • continue with git
    • reduced and abstracted content a lot
    • platform agnostic content
      • but gets advanced, not enough time
      • give extra material
      • additional session for advanced students
    • will not cover: containerization, deploy dockers
      • not trivial
    • MLops is devops for ML
  • 3rd deploy to kub or docker into wild – from telemetry make data driven actions
    • kubernetes env on AWS and howto get telemetries?
    • refresher of entire workshop
meet #5 [2019-10-18 Fri]

Breakout Groups [0/0]

  • TA hour 1 [2019-10-12 Sat] : G1, G3
G1
Jiri Stodulka Michael Smart Ramya Balasubramaniam Zain Nasrullah
TA hour 1 [2019-10-12 Sat]
  • Jiri
    • workshop circle with Joffri on AWS
    • aisc: Omar recommendor systems
      • RL recommender system
    • wants to do both Azure & AWS, will do Azure with Zain
  • Zain, has less time, will do Azure only
G2
Alvin Jin Doug Rangel Farhan Lediona Nishani
meet #1
  • irl for their next meetup sat.
  • mnist model
G3
Alex Fatin Haque Andriy Kourinyi Nour Fahmy
TA hour 1 [2019-10-12 Sat]
  • cloud agnostic for Fatin
  • tracking, reproducibility, dockerimages?,
    • anyone else to join?
    • fyi : terraform instead of cloudformation at his job

Code Stuff

python Ref

TheAlgorithms/Python: All Algorithms implemented in Python

python libs + datsci advice notebooks

python libs II

ob-ipython [2018-01-03 Wed]

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.hist(np.random.randn(20000), bins=200)

./obipy-resources/38602vRO.png

print("hello world")

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.hist(np.random.randn(20000), bins=200)

./obipy-resources/38602Jma.png

# IPython.kernel moved to the jupyter_client package (IPython 4+)
import jupyter_client.multikernelmanager as km
# km.list_kernel_ids()
# km.MultiKernelManager.list_kernel_ids()
# print(print_me)
# print("hello world")
# print("another ipython kernel!")

some tensorflow [2018-09-14 Fri]

gradientTape example

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)  # constants are not tracked automatically
    y = x * x
dy_dx = g.gradient(y, x)  # evaluates to 6.0
print(dy_dx)
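A quick sanity check on the 6.0 from the GradientTape example, as a minimal sketch in plain Python (no TensorFlow needed). `numeric_grad` is a hypothetical helper that approximates the derivative with a central difference:

```python
def numeric_grad(f, x, eps=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Same function as in the tape example: y = x * x, evaluated at x = 3.0
approx = numeric_grad(lambda x: x * x, 3.0)
print(approx)
```

The printed value should agree with the tape's gradient up to floating-point error.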

diveintopython book [2018-01-05 Fri]

CH 4 The power of introspection

4.1 apihelper.py

# Python 3 port of the book's Python 2 listing (print is a function now)
def info(object, spacing=10, collapse=1):
    """Print methods and doc strings. Takes module, class, list, dictionary, or string."""
    methodList = [method for method in dir(object) if callable(getattr(object, method))]
    # the book's and-or trick: pick the whitespace-collapsing function when collapse is true
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print("\n".join(["%s %s" %
    (method.ljust(spacing),
     processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList]))

if __name__ == "__main__":
    print(info.__doc__)

import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')
print(sym.latex(sym.Integral(1/x, x)))

pdb debug

  • l list
  • n next
  • c continue
  • s step
  • r return
  • b break
  • anything else is run as Python in the current frame
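To try the commands without typing at a prompt, here is a minimal sketch that feeds a scripted session (n, p, c) to pdb and captures the transcript. `square_plus_one` is a made-up function to step through; the `stdin`, `stdout`, `nosigint` and `readrc` arguments are standard `pdb.Pdb` parameters.

```python
import io
import pdb


def square_plus_one(x):
    y = x * x
    return y + 1


# Scripted session: step over one line, print y, then continue to the end
commands = io.StringIO("n\np y\nc\n")
transcript = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
result = debugger.runcall(square_plus_one, 3)

print(transcript.getvalue())
print("result:", result)
```

In normal use, dropping `breakpoint()` into the code (Python 3.7+) or running `python -m pdb script.py` gives the same prompt interactively.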

code to check if in a venv

import sys

def is_venv():
    return (hasattr(sys, 'real_prefix') or
            (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix))
  

The check for sys.real_prefix covers virtualenv; a sys.base_prefix that exists and differs from sys.prefix covers venv. Consider a script that uses the function like this:

if is_venv():
    print('inside virtualenv or venv')
else:
    print('outside virtualenv or venv')
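When the check gives a surprising answer, it helps to see the prefixes themselves. A minimal sketch, where `describe_environment` is a hypothetical helper that reports the same attributes the check reads:

```python
import sys


def describe_environment():
    """Collect the sys attributes that the venv check relies on."""
    real_prefix = getattr(sys, "real_prefix", None)        # set by older virtualenv
    base_prefix = getattr(sys, "base_prefix", sys.prefix)  # set by venv
    return {
        "prefix": sys.prefix,
        "base_prefix": base_prefix,
        "real_prefix": real_prefix,
        "in_venv": real_prefix is not None or base_prefix != sys.prefix,
    }


env_info = describe_environment()
for key, value in env_info.items():
    print("{:12} {}".format(key, value))
```

Outside any environment, prefix and base_prefix print the same path and real_prefix is None.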

Code Snippets

open all csv files in a dir and do some row stuff
  • 36.11. pipes — Interface to shell pipelines — Python 2.7.14 documentation
  • 13.1. csv — CSV File Reading and Writing — Python 2.7.14 documentation

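import os
import csv

path = os.getcwd()

filenames = os.listdir(path)

for filename in filenames:
    # skip files written by an earlier run of this script
    if filename.endswith('.csv') and not filename.endswith('_edited.csv'):
        new_data = []
        # newline='' is what the csv module expects on Python 3
        with open(os.path.join(path, filename), newline='') as f:
            for row in csv.reader(f):
                if row:  # blank lines come back as empty lists
                    row[-1] = row[-1].replace("S-D", "S")
                new_data.append(row)

        newfilename = filename[:-len(".csv")] + "_edited.csv"
        with open(os.path.join(path, newfilename), "w", newline='') as f:
            writer = csv.writer(f)
            writer.writerows(new_data)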

Dave’s fancy time series plot

import matplotlib.pyplot as plt
import numpy as np

# data: a sequence of 1-D series, assumed to be defined earlier
plt.style.use('dark_background')
cmap = plt.get_cmap('viridis')
colors = cmap(np.linspace(0, 1.0, len(data)))  # one colour per series
for i, series in enumerate(data):
    plt.plot(series, color=colors[i])
plt.show()

Hy

A mile Hy - My experience with lispy Python | Modern Emacs

  • setv - set variables
  • cond - cases wrapped in []
  • do = progn
  • (for [i (range 10)] …)

try some hy

Kitchin loves it.

(import numpy)
(setv a (numpy.array [1 2 3]))
(setv b (numpy.array [1 2 3]))
(print (numpy.dot a b))
(defn simple-conversation []
  (print "hello! yadda yadda")
  (setv name (input "What name? "))
  (setv age (input "What age? "))
  (print (+ "hello " name "! I see you are " age " years old.")))

(simple-conversation)

sample ob-ipython

a = 5
b = 2**5
hi = "Hello World!"
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
a = 888

# plt.hist(np.random.randn(20000), bins=200)
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# import tensorflow as tf

plt.hist(np.random.randn(20000), bins=200)

# import tensorflow as tf

x = np.random.randn(10000)
y = np.sin(x)

/tmp/image.png

import numpy as np
import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)

# print(node1, node2)
# [foo(x) + 7 for x in range(20)]
"what?"

sess = tf.Session()

# return (node1, node2)

# return (sess.run([node1, node2]))

node3 = tf.add(node1, node2)
return ("sess.run(node3):", sess.run(node3))
# %matplotlib inline
# import matplotlib.pyplot as plt
# import numpy as np
# import tensorflow as tf

# plt.hist(np.random.randn(20000), bins=200)
# def foo(x):
#     return x + 9

# [foo(x) + 7 for x in range(7)]
import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')

print(sym.latex(sym.Integral(1/x,x)))
print(sym.latex(sym.besseli(x,k)))

kitchin

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 20 * np.pi, 350)
x = np.exp(-0.1 * t) * np.sin(t)
y = np.exp(-0.1 * t) * np.cos(t)

plt.plot(x, y)
plt.axis('equal')

plt.figure()
plt.plot(y, x)
plt.axis('equal')

print('Length of t = {}'.format(len(t)))
print('x .dot. y = {}'.format(x @ y))

ditaa org-babel

+---------+
|         |
| Willy   |
|         |
+----+----+---+
|Bar |Baz     |
|    |        |
+----+--------+

++
++

find where ditaa should go

(expand-file-name
             "ditaa.jar"
      (file-name-as-directory
            (expand-file-name
                "scripts"
               (file-name-as-directory
                  (expand-file-name
                      "../contrib"
                      (file-name-directory (org-find-library-dir "org")))))))
+------+   +-----+   +-----+   +-----+
|{io}  |   |{d}  |   |{s}  |   |cBLU |
| Foo  +---+ Bar +---+ Baz +---+ Moo |
|      |   |     |   |     |   |     |
+------+   +-----+   +--+--+   +-----+
                        |
           /-----\      |      +------+
           |     |      |      | c1AB |
           | Goo +------+---=--+ Shoo |
           \-----/             |      |
                               +------+

Scratch

RL

AISC RL Workshop DLRL RL workshop

GoogleAI RL Projects

https://opensource.google/projects/dopamine
https://opensource.google/projects/deepmind-lab
https://opensource.google/projects/magenta
