vv111y/DatSciWorkBook.org

## DatSciWorkBook.org

      
    DatSciWorkbook

HEADER

START [0/6]

LOG


  [2018-09-04 Tue 23:42] got this far with pytorch gan run: [Epoch 146/200] [Batch 885/938] [D loss: 0.372291] [G loss: 1.665622]
  [2018-10-12 Fri 12:56] note: G in a GAN never sees the real images, which perhaps makes it too slow to train. Maybe it should have access to the image? But it seems to converge fairly quickly to an approximation; the problems come when it tries to get fine details.
    
      now this makes me think that perhaps a dialog between G and D for each part of image might help.
      training schedule of multiple iterations per one image so G tries to get close to one image at a time
    
  
  [2018-11-16 Fri 06:10] downloaded all pdfs for CS224n NLP course. Also made playlist of youtube lectures.
  [2019-02-06 Wed 09:32] my trial run couple weeks ago (meet#3?) of the spinningup code for RL. went well. on wks.
    conda install gym
    pip install gym
    conda create -n spinupRL python=3.6
    cd DevAcademics/ReinforcementLearning/spinningup
    pip install -e .
    conda list
    python -m spinup.run ppo --hid [32,32] --env LunarLander-v2 --exp_name installtest --gamma 0.999
    python -m spinup.run plot /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0
    python -m spinup.run test_policy /home/will/DevAcademics/ReinforcementLearning/spinningup/data/installtest/installtest_s0

[.] mathematicalmonk ML playlist

Machine Learning Playlist - YouTube

  Machine Learning  |  160 out of 160 videos
  Average Duration: 0 days, 0 hours, 13 minutes and 12 seconds
  Total Duration: 1 days, 11 hours, 13 minutes and 33 seconds

[.] keras trials with own code

[.] learn autoencoders

use this material to start on autoencoders, via TDLS slack channel:
  Ehsan [7 hours ago]
  Just for the channel I copy my answer here as well:
  There is an abundance of work on autoencoders … hmmmm most of my knowledge comes from reading articles. Here we go:
  This paper is a must for VAEs: https://arxiv.org/abs/1312.6114
But read this one first: http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf
  This is a great review: http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf
  You will particularly like this one: https://arxiv.org/pdf/1502.04156.pdf
[.] infographic niagara data group

[.] [#B] O’Reilly for coding?

neat graph on loss function and convexification

"""Grab a cookie."""
import numpy as np
import matplotlib.pyplot as plt

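# z is presumably the classification margin y * f(x); each loss is plotted against it.
z = np.linspace(-1, 3, num=10000)
binary = z <= 0                      # 0-1 loss (boolean array, plotted as 0/1)
hinge = np.maximum(0, 1 - z)
quad = (z - 1) ** 2
logistic = np.log2(1 + np.exp(-z))   # base-2 log, so the curve passes through (0, 1)

losses = dict(binary=binary, hinge=hinge, logistic=logistic, quadratic=quad)
for name, loss in losses.items():
    plt.plot(z, loss, label=name)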

plt.legend(loc="best")
plt.tight_layout()
outfile = "losses.png"
plt.savefig(outfile, dpi=200, bbox_inches="tight")
print(outfile)
plt.show()
[.] symbolic paper from Xiyang – Learning by Abstraction: The Neural State Machine

didn’t write down which one; it could be any of:
Learning by Abstraction: The Neural State Machine – I think this
  Learning Neurosymbolic Generative Models via Program Synthesis
  NEURO-SYMBOLIC PROGRAM SYNTHESIS
MtL-Progress-github.io: Repository to track the progress in Meta-Learning  [2019-08-24 Sat 12:00]

MichaelMMeskhi/MtL-Progress-github.io: Repository to track the progress in Meta-Learning (MtL), including the datasets and the current state-of-the-art for the most common MtL
DatSci Tooling (toolsplan) [2/10]

Start


  All the best big data tools and how to use them - Import.io
    
      All the Best Big Data Tools and How to Use Them - Import.io
    
  
  Jupyter Notobook for beginner - most powerful tips - YouTube
  github desktop: all awesome lists, data science/ML repos
  academiclog tag
  both docker tech + data science tooling.
    
      container services need ad-hoc swarming to get things working together. otherwise just make one big container with everything, for faster prototyping and turnaround.
      perhaps first one big box, and then develop swarms as proficiency increases and as targets are clearer.
      speed, experimentation first.
    
  
  data-sci, emacs
    
      replicate workflow as per here emacs + ipython workflow
      emacs Academic: {0/4}  want emacs dev boxes 1st to test out on.
    
  
  Container DevOps
    
      also link sets in chrome
    
  
  python guide from D Kinghorn
    
      How to Install Anaconda Python and First Steps for Linux and Windows
    
  
  drivendata/cookiecutter-data-science: A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Notes

jupyter notebook in pycharm

Using IPython/Jupyter Notebook with PyCharm - Help | PyCharm
  Installing, Uninstalling and Upgrading Packages - Help | PyCharm

  installing package: option to install to user’s site-packages directory
    
      (on winwk, Will\AppData\Roaming\Python)
      did this for jupyter, matplotlib, sympy. as per tutorial (using dummy project)
      many other dep packages also installed
      jupyter is metapackage: install all jupyter components
      jupyter install had error: needs MS Visual C++ 10.0. can do it that way, or pip install --user in shell
        
          same error.
          seems this can all be done outside pycharm, and then just select in projects preferences.
        
      
convert notebook to orgmode

jupyter nbconvert notebook.ipynb --to markdown
pandoc notebook.md -o notebook.org
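
The two commands above can be chained from a small script. This is only a sketch: the helper names `conversion_commands` and `convert` are made up for illustration, and it assumes `jupyter` and `pandoc` are on the PATH and that nbconvert writes `notebook.md` next to the notebook.

```python
from pathlib import Path
import subprocess


def conversion_commands(notebook):
    """Build the nbconvert and pandoc command lines for one notebook.

    Hypothetical helper: it only mirrors the two shell commands in these notes.
    """
    nb = Path(notebook)
    md = nb.with_suffix(".md")    # assumed nbconvert output location
    org = nb.with_suffix(".org")
    return [
        ["jupyter", "nbconvert", str(nb), "--to", "markdown"],
        ["pandoc", str(md), "-o", str(org)],
    ]


def convert(notebook):
    """Run both steps in order, stopping at the first failure."""
    for cmd in conversion_commands(notebook):
        subprocess.run(cmd, check=True)
```

Calling `convert("notebook.ipynb")` should leave `notebook.org` beside the notebook; the intermediate `notebook.md` is kept.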
interactive jupyter via widgets

Widgets:
  Building Interactive Dashboards with Jupyter
  Project Jupyter | Widgets
  Interactive Visualizations In Jupyter Notebook – Towards Data Science
jupyter management

Connect to an existing kernel · Issue #2044 · jupyterlab/jupyterlab
  Initial server management implementation by lucbouchard1 · Pull Request #71 · jupyterlab/jupyterlab_app
  jwkvam/jupyterlab_vim: Vim notebook cell bindings for JupyterLab
rodeo, R studio type ide for python

Python Config Notes


  tensorflow
    
      conda package not watched by them
      conda env > virtualenv > pip > conda (docker another use case)
      docker for GPU recommended
    
  
  [X] update conda / anaconda
  pip3 requirements empty, all in pip, 322 total
  conda list - 574 total
    
      several are duplicates with pip installs
      latest conda root env export to yaml file: 358 in conda, 98 in pip
    
  
packages not found in conda channels

PackagesNotFoundError: The following packages are not available from current channels:

  r-r6==2.2.2=r3.4.1_0
  r-tibble==1.4.2=r3.4.1_0
  ca-certificates==2018.4.16=0
  r-bindr==0.1.1=r3.4.1_0
  zope.interface==4.5.0=py36h470a237_0
  yellowbrick==0.7=py36_1
  r-dbi==1.0.0=r341_0
  r-utf8==1.1.3=r3.4.1_0
  rpy2==2.9.3=py36r3.4.1_0
  jupyterlab==0.33.12=py36_0
  pcre==8.39=0
  constantly==15.1.0=py_0
  r-bit64==0.9_5=r3.4.1_0
  pytest-runner==4.2=py_0
  r-rlang==0.2.0=r3.4.1_0
  r-crayon==1.3.4=r3.4.1_0
  incremental==17.5.0=py_0
  readline==7.0=0
  r-git2r==0.21.0=r341h0c37787_0
  r-dbplyr==1.2.1=r341_0
  r-glue==1.2.0=r3.4.1_0
  json-rpc==1.10.3=py36_0
  r-blob==1.1.1=r3.4.1_0
  r-purrr==0.2.4=r3.4.1_0
  r-pillar==1.2.2=r341_0
  pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
  torchvision==0.2.1=py36_1
  protobuf==3.5.2=py36_0
  r-dplyr==0.7.4=r3.4.1_0
  r-base==3.4.1=3
  r-rcpp==0.12.15=r3.4.1_0
  hyperlink==17.3.1=py_0
  r-digest==0.6.15=r3.4.1_0
  onnx==1.1.2=py36h0c63530_0
  pyasn1-modules==0.2.1=py_0
  tzlocal==1.5.1=py_0
  libedit==3.1.20170329=0
  cssselect==1.0.3=py_0
  r-cli==1.0.0=r3.4.1_0
  r-tidyselect==0.2.4=r3.4.1_0
  yapf==0.22.0=py_0
  libprotobuf==3.5.2=0
  pyasn1==0.4.3=py_0
  r-magrittr==1.5=r3.4.1_0
  r-rsqlite==2.0=r3.4.1_0
  xgboost==0.72=py36_0
  service_identity==17.0.0=py_0
  r-prettyunits==1.0.2=r3.4.1_0
  r-bh==1.66.0_1=r3.4.1_0
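
Every entry in the error is pinned as `name==version=build`, and build strings are usually specific to a platform or channel. One possible workaround (an assumption, not something tried in these notes) is to drop the build part and keep only the version pin, which is roughly what `conda env export --no-builds` produces. The helper name `strip_build` is hypothetical.

```python
def strip_build(spec):
    """Turn 'name==version=build' into 'name==version'.

    Specs without a version pin are returned unchanged (minus whitespace).
    """
    name, sep, rest = spec.strip().partition("==")
    if not sep:
        return name
    version = rest.split("=", 1)[0]  # everything before the build string
    return f"{name}=={version}"


# A few lines copied from the error message above.
specs = [
    "r-r6==2.2.2=r3.4.1_0",
    "pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1",
    "r-bh==1.66.0_1=r3.4.1_0",
]
loose = [strip_build(s) for s in specs]
print("\n".join(loose))
```

Version-only pins may still fail if a channel has dropped that version, so this narrows the problem rather than guaranteeing a solve.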

conda installs special channels, not in conda-forge


  did conda install -c r r-git2r, to try and see if that helps -> nope
  conda install pytorch torchvision -c pytorch.
    
      previous install was gpu version. won’t work on mac
    
  
  conda install -c districtdatalabs yellowbrick
  r-dbplyr==1.2.1=r341_0
  r-pillar==1.2.2=r341_0
  torchvision==0.2.1=py36_1
  pytorch==0.4.1=py36_cuda0.0_cudnn0.0_1
  yellowbrick==0.7=py36_1

py packages [0/0]


  py-spy sampling profiler benfred/py-spy: Sampling profiler for Python programs
  sympy, symbolic

dit - discrete info theory py
dit: discrete information theory — dit 1.0.2 documentation
pweave - scientific report generator
mpastell/Pweave [2019-08-29 Thu 12:26]
  Pweave is a scientific report generator and a literate programming tool for Python. Pweave can capture the results and plots from data analysis and works well with NumPy, SciPy and matplotlib. It is able to run python code from source document and include the results and capture matplotlib plots in the output.
Pweave is good for creating reports, tutorials, presentations etc. with embedded python code It can also be used to make websites together with e.g. Sphinx or rest2web.
HPC python
(llvm used in other libs)
HPC python library
  Numba: High-Performance Python with CUDA Acceleration | Hacker News
  Numba: High-Performance Python with CUDA Acceleration | Parallel Forall
Another good library
  arrayfire/arrayfire-python: Python bindings for ArrayFire: A general purpose GPU library.
  arrayfire/arrayfire: ArrayFire: a general purpose GPU library.
Python on CUDA packages


  cuda is a target, but can compile to others numba/numba: NumPy aware dynamic Python compiler using LLVM
  cupy/cupy: NumPy-like API accelerated with CUDA
    
      CuPy is an implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of numpy.ndarray interface.
    
  
  like a smaller version of above inducer/pycuda: CUDA integration for Python, plus shiny features
    
      Welcome to PyCUDA’s documentation! — PyCUDA 2019.1.2 documentation
      Is PyCuda even worth it? : CUDA mar-19, good info
    
  
  cuda: grids, blocks, then threads
    
      threadidx 3D
      thread does the execution
      block helps with indexing ?
      blocks should execute independently. threads within a block can share memory.
      cuda kernels are c/c++ code with additional syntax, most importantly __global__ for identifying the kernel function, and the <<<…>>> syntax for specifying grid size and block size
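
On the indexing question in the notes above: in the usual 1D pattern each thread computes its global index as `blockIdx.x * blockDim.x + threadIdx.x`. The sketch below only simulates that arithmetic sequentially in plain Python; it is not CUDA code, and `launch` and `add_one` are hypothetical names standing in for a `<<<grid, block>>>` launch and a `__global__` kernel.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for kernel<<<grid_dim, block_dim>>>(args)."""
    for block_idx in range(grid_dim):        # blocks are independent of each other
        for thread_idx in range(block_dim):  # threads within one block
            kernel(block_idx, block_dim, thread_idx, *args)


def add_one(block_idx, block_dim, thread_idx, data):
    """Each simulated thread handles at most one element."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(data):                        # guard: the grid may overshoot the data
        data[i] += 1


data = list(range(10))
launch(add_one, 3, 4, data)  # 3 blocks of 4 threads = 12 threads for 10 elements
print(data)
```

The bounds check matters because the grid size is normally rounded up, so the last block can contain threads with no element to process.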
    
  
DatSci Automation [0/0]


  for career, real world use
    
      this is a major goal for all tooling
    
  
  want large scale as well
  much of the pipeline automated such that only some selection is needed
  any and all tools that simplify any of the process

data analysis ML clusters


  will use different cloud platform than for internet facing.
  security is more an issue there. the ML cluster will be more isolated
  also all the current tools are likely less secure anyways

ex: amazon sagemaker

NEW LAUNCH! Integrating Amazon SageMaker into your Enterprise - MCL34…
  Machine Learning Models & Algorithms | Amazon SageMaker on AWS
Visualizations [0/0]

Visualize | Keen IO
facets data visualization

Research Blog: Facets: An Open Source Visualization Tool for Machine Learning Training Data
  PAIR-code/facets: Visualizations for machine learning datasets
holoviews

HoloViews — HoloViews
Data Cleaning [0/0]


  Solved: Automatically cleaning your data - Microsoft Power BI Community
  Making data cleaning simple with the Sparkling.data library
  https://namara.io/#/ - some signup
  IBM BDU Labs | My Data
  OpenRefine/OpenRefine: OpenRefine is a free, open source power tool for working with messy data and improving it
    
      Home · OpenRefine/OpenRefine Wiki
      OpenRefine - Wikipedia
      UnaVista TR Public Data 11a_20170627 xls - OpenRefine
      openrefine.github.com
      Documentation For Users · OpenRefine/OpenRefine Wiki
    
  
  data prepping II [2018-08-25 Sat 16:27]
    
      best data cleaning munging tools - Google Search
      Seven Free Data Wrangling Tools
      What are the best data cleansing tools? - Quora
      What are the best resources to learn data wrangling (data cleaning)? - Quora
      What are the best languages and libraries for cleaning data? - Quora
      7 Steps to Mastering Data Preparation with Python
      Janitor, a good R package for data cleaning – SWIMMING IN THE DATA LAKE – Medium
    
  
Other Languages

julia tensorflow / ML
Goodies: check out the videos
How’s Julia language (MIT) for ML? : MachineLearning
  Julia vs. Python: Julia language rises for data science | InfoWorld
JuliaEditorSupport
  JuliaCon 2018 | Making the test-debug cycle more efficient | Tim Holy - YouTube
  JuliaCon 2018 | Tools for making program analysis and debugging manageable | Jameson Nash - YouTube
  JuliaCon 2018 | Cassette: Dynamic, Context-Specific Compiler Pass Injection for Julia | J Revels - YouTube
DeepLearningFrameworks/Knet_CNN.ipynb at master · ilkarman/DeepLearningFrameworks
  TIOBE Index | TIOBE - The Software Quality Company
  Julia and “deep learning” : Julia
TensorFlow.jl/why_julia.md at master · malmaud/TensorFlow.jl,
High Level Frameworks: OpenML, Rapids

Open GPU Data Science | RAPIDS


  whole DatSci pipeline on GPU, lots of big names, easy scale out, python integration.
  rapidsai/cudf: cuDF - GPU DataFrame Library
  rapidsai/cuml: cuML - RAPIDS Machine Learning Library
  Other projects
    
      RAPIDS + BLAZINGSQL
      RAPIDS + DASK
      RAPIDS + XGBOOST
      RAPIDS + SPARK
    
  
OpenML Home  [2019-08-17 Sat 09:48]

OpenML — OpenML 0.10.0 documentation
Democratizing Machine Learning
  As machine learning is enhancing our ability to understand nature and build a better future, it is crucial that we make it transparent and easily accessible to everyone in research, education and industry. The Open Machine Learning project is an inclusive movement to build an open, organized, online ecosystem for machine learning. We build open source tools to discover (and share) open data from any domain, easily draw them into your favourite machine learning environments, quickly build models alongside (and together with) thousands of other data scientists, analyse your results against the state of the art, and even get automatic advice on how to build better models. Stand on the shoulders of giants and make the world a better place.
[.] Exp-frameworks, templates, tool-notes [0/3]

Start


  cookbook first to study, and with awesomelist to organize these link dumps
    
      A Kretz
        
          [ ] Dat Sci Cookbook: local repo  DatSci Cookbook Repo  [2019-08-21 Wed]
            
              pdf and code examples
            
          
          Plumbers of Data Science - YouTube - Kretz, cookbook author
        
      
      awesome datsci https://github.com/EthicalML/awesome-production-machine-learning
      links in academiclog: big browser dump – go thru ML experiment frameworks
    
  
  prior notes [2019-08-15 Thu]
    
      when to use:
        
          dask
          mlflow
          polyaxon - seems more related to kubernetes, managing in production on clusters
          DVC - github-lfs + makefiles
            
              [ ] use with hservers and store big files on them?
            
          
      issue
large files in a project folder that will need to be kept separate somehow
        
          big data
          big models
        
      
      FGLab (Kaixhin) 3 ppl only, smaller project
      MLflow, Sacred, FGLab, Polyaxon alts(competitors).
        
          h2o, datarobot also alts
          kubeflow complements, can run the others on top of it.
          DVC complement?
        
      
      sagemaker, airflow, glue go together
      airflow can use to build pipelines to work on kubernetes
      tutorial vids: google, mlflow, machine-learning-yearning, etc
      [ ] model / data parallel example
      Manifold company is an example boutique biz ? emulate
    
  
The Data Engineering Cookbook Notes

Skeleton

Notes for page 4

Introduction

How To Use This Cookbook

Data Engineer vs Data Scientists
  Data Scientist
  Data Engineer
  Who Companies Need
Basic Data Engineering Skills

Learn To Code

Get Familiar With Git

Agile Development
  Why is agile so important?
  Agile rules I learned over the years
  Is the method making a difference?
  The problem with outsourcing
  Knowledge is king: A lesson from Elon Musk
  How you really can be agile
  Agile Frameworks
  Scrum
  OKR
  Software Engineering Culture
Learn how a Computer Works
  CPU, RAM, GPU, HDD
  Differences between PCs and Servers
Computer Networking - Data Transmission
  OSI Model
  IP Subnetting
  Switch, Level 3 Switch
  Router
  Firewalls
Security and Privacy
  SSL Public & Private Key Certificates
  What is a certificate authority
  JSON Web Tokens
  GDPR regulations
  Privacy by design
Linux
  OS Basics
  Shell scripting
  Cron jobs
  Packet management
The Cloud
  IaaS vs PaaS vs SaaS
  AWS, Azure, IBM, Google Cloud basics
  Cloud vs On-Premises
  Security
  Hybrid Clouds
Security Zone Design
  How to secure a multi layered application
  Cluster security with Kerberos
  Kerberos Tickets
Big Data
  What is big data and where is the difference to data science and data analytics?
  The 4Vs of Big Data
  Why Big Data?
  Planning is Everything
  The Problem With ETL
  Scaling Up
  Scaling Out
  Please Don’t go Big Data
My Big Data Platform Blueprint
  Ingest
  Analyse / Process
  Store
  Display
Lambda Architecture
  Batch Processing
  Stream Processing
  Should you do stream or batch processing?
  Lambda Architecture Alternative
  Kappa Architecture
  Kappa Architecture with Kudu
  Why a Good Data Platform Is Important
Data Warehouse vs Data Lake

Hadoop Platforms
  What is Hadoop
  What makes Hadoop so popular?
  Hadoop Ecosystem Components
  Hadoop Is Everywhere?
  Should you learn Hadoop?
  How does a Hadoop System architecture look like
  What tools are usually in a with Hadoop Cluster
  How to select Hadoop Cluster Hardware
Docker
  What is docker and what do you use it for
  Don’t Mess Up Your System
  Preconfigured Images
  Take It With You
  Kubernetes Container Deployment
  How to create, start, stop a Container
  Docker micro services?
  Kubernetes
  Why and how to do Docker container orchestration
  Useful Docker Commands
REST APIs
  API Design
  Implementation Frameworks
  OAuth security
Databases
  SQL Databases
  PostgreSQL DB
  Database Design
  SQL Queries
  Stored Procedures
  ODBC/JDBC Server Connections
  NoSQL Stores
  KeyValue Stores (HBase)
  Document Store HDFS
  Document Store MongoDB
  Elasticsearch Search Engine and Document Store
  Hive Warehouse
  Impala
  Kudu
  Apache Druid
  InfluxDB Time Series Database
  MPP Databases (Greenplum)
Data Processing and Analytics - Frameworks
  Is ETL still relevant for Analytics?
  Stream Processing
  Three methods of streaming
  At Least Once
  At Most Once
  Exactly Once
  Check The Tools!
  MapReduce
  How does MapReduce work
  Example
  What is the limitation of MapReduce?
  Apache Spark
  What is the difference to MapReduce?
  How does Spark fit to Hadoop?
  Where’s the difference?
  Spark and Hadoop is a perfect fit
  Spark on YARN:
  My simple rule of thumb:
  Available Languages
  How Spark works: Driver, Executor, Sparkcontext
  Spark batch vs stream processing
  How does Spark use data from Hadoop
  What are RDDs and how to use them
  How and why to use SparkSQL?
  What are DataFrames how to use them
  Machine Learning on Spark? (Tensor Flow)
  MLlib:
  Spark Setup
  Spark Resource Management
  Apache Nifi
  StreamSets
Apache Kafka
  Why a message queue tool?
  Kakfa architecture
  What are topics
  What does Zookeeper have to do with Kafka
  How to produce and consume messages
  KAFKA Commands
Machine Learning
  Training and Applying models
  What is deep learning
  How to do Machine Learning in production
  Why machine learning in production is harder then you think
  Models Do Not Work Forever
  Where The Platforms That Support This?
  Training Parameter Management
  What’s Your Solution?
  How to convince people machine learning works
  No Rules, No Physical Models
  You Have The Data. USE IT!
  Data is Stronger Than Opinions
  AWS Sagemaker
Data Visualization
  Android & IOS
  How to design APIs for mobile apps
  How to use Webservers to display content
  Tomcat
  Jetty
  NodeRED
  React
  Business Intelligence Tools
  Tableau
  PowerBI
  Quliksense
  Identity & Device Management
  What is a digital twin?
  Active Directory
Data Engineering Course: Building A Data Platform

What We Want To Do

Thoughts On Choosing A Development Environment

A Look Into the Twitter API

Ingesting Tweets with Apache Nifi

Writing from Nifi to Apache Kafka

Apache Zeppelin
  Install and Ingest Kafka Topic
  Processing Messages with Spark & SparkSQL
  Visualizing Data
Switch Processing from Zeppelin to Spark
  Install Spark
  Ingest Messages from Kafka
  Writing from Spark to Kafka
  Move Zeppelin Code to Spark
Case Studies

How I do Case Studies
  Data Science @Airbnb
  Data Science @Amazon
  Data Science @Baidu
  Data Science @Blackrock
  Data Science @BMW
  Data Science @Booking.com
  Data Science @CERN
  Data Science @Disney
  Data Science @Drivetribe
  Data Science @Dropbox
  Data Science @Ebay
  Data Science @Expedia
  Data Science @Facebook
  Data Science @Google
  Data Science @Grammarly
  Data Science @ING Fraud
  Data Science @Instagram
  Data Science @LinkedIn
  Data Science @Lyft
  Data Science @NASA
  Data Science @Netflix
  Data Science @OLX
  Data Science @OTTO
  Data Science @Paypal
  Data Science @Pinterest
  Data Science @Salesforce
  Data Science @Siemens Mindsphere
  Data Science @Slack
  Data Science @Spotify
  Data Science @Symantec
  Data Science @Tinder
  Data Science @Twitter
  Data Science @Uber
  Data Science @Upwork
  Data Science @Woot
  Data Science @Zalando
1001 Data Engineering Interview Questions

Live Streams

All Interview Questions

[#B] big browser dump – go thru ML experiment frameworks

Experiment Templates


  NullConvergence/torch_temp: A(nother) Pytorch experimental template - uses sacred
  victoresque/pytorch-template: PyTorch deep learning projects made easy.
  ml-tooling/ml-project-template: ML project template facilitating both research and production phases.
    
      from ml-tooling Berlin group
      research & production
    
  
  MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture.
  williamFalcon/pytorch-lightning: The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate
    
      PyTorch lightning Documentation  [2019-10-12 Sat 21:10]
      williamFalcon/pytorch-lightning-conference-seed: Pytorch Lightning code guideline for conferences
    
  
  my clones
    
      vv111y/torch_temp: A(nother) Pytorch experimental template
      vv111y/pytorch-template: PyTorch deep learning projects made easy.
    
  
  distillpub/template: This is the repository for the distill web framework
  h5bp/html5-boilerplate: A professional front-end template for building fast, robust, and adaptable web apps or sites.

Start


  how do you setup your ml pipeline? : MachineLearning
    
      medium - How do you manage your Machine Learning Experiments?
      How experiment management can improve the ROI of your machine learning projects
      How to Work With Stakeholders as a Data Scientist - Towards Data Science
      Reproducible model training: deep dive - Towards Data Science
      Slurm Workload Manager - Wikipedia
    
  
  How do you manage your machine learning experiments? : MachineLearning
    
      Discussion How do you manage and keep track of your experiments? : MachineLearning
      Best way to manage ML experiements : MachineLearning
      How do you keep track of your experiment results? : MachineLearning
      What tools are used in practice to schedule training jobs, annotate datasets, keep track of past experiments… ? : MachineLearning
    
  
  using git for deep learning experiments - Google Search
    
      How to Plan and Run Machine Learning Experiments Systematically
      How do you manage your machine learning experiments? : MachineLearning
    
  
  Compare to other ML e2e platforms · Issue #58 · mlflow/mlflow
  What are the current open source alternatives to MLflow? | Hacker News
  Towards Reproducible Research with PyTorch Hub | Hacker News
  Home - Guild AI
  Verta Enterprise Runthrough - YouTube
  The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction – Google AI
  Weights & Biases
  Forge, or how do you manage your machine learning experiments?
  neptune.ml: Experiment management tool that fits any workflow
  fast.ai · Making neural nets uncool again
    
      Fast AI Video Viewer
      | fastai
    
  
  gems-uff/noworkflow
    
      Supporting infrastructure to run scientific experiments without a scientific workflow management system.
      for general python script experiments
    
  
ML Frameworks


  github group from Berlin, 8 repos Machine Learning Tooling
  IDSIA/sacred: Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
  MIC-DKFZ/trixi
    
      Manage your machine learning experiments with trixi - modular, reproducible, high fashion. An experiment infrastructure optimized for PyTorch, but flexible enough to work for your framework and your tastes.
      trixi/pytorch_experiment.ipynb at master · MIC-DKFZ/trixi
    
  
  seba-1511/randopt: Streamlined machine learning experiment management.
    
      seba1511.net/randopt/
    
  
  kubeflow/pipelines: Machine Learning Pipelines for Kubeflow
    
      Research dill vs. cloudpickle for pickling functions · Issue #1387 · kubeflow/pipelines
    
  
  TRAINS (fewer stars)
    
      TRAINS: An open-source, zero-integration tool to boost machine learning research
      allegroai/trains: TRAINS - Auto-Magical Experiment Manager & Version Control for AI
      allegroai/trains-server: TRAINS Server - Auto-Magical Experiment Manager & Version Control for AI
      trains/brief.md at master · allegroai/trains
      Allegro.ai - Deep Learning Computer Vision Platform
      trains - Allegro.AI
    
  
  Home - Guild AI
    
      guildai/guildai: Open source experiment tracking and optimization for machine learning
    
  
  Comet.ml | Supercharging Machine Learning
  mlflow/mlflow: Open source platform for the machine learning lifecycle
    
      MLflow - A platform for the machine learning lifecycle | MLflow
      Introducing MLflow: an Open Source Platform for the Complete Machine Learning Lifecycle
    
  
  Weights & Biases
  kubeflow/kubeflow: Machine Learning Toolkit for Kubernetes
    
      Kubeflow | Kubeflow
    
  
  richardliaw/track: Track your ML project!
  kubeflow mlflow - Google Search

Github Search · machine learning project - useful looking stuff
more ML


  What’s your favorite logger? : MachineLearning
  How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform - Databricks
  Kaixhin/FGLab: Future Gadget Laboratory
    
      FGLab: Machine Learning Dashboard
    
  
  Semantic Versioning 2.0.0 | Semantic Versioning
  danielwaterworth/metricmachine: Simple flask app for displaying live timeseries data
  polyaxon/polyaxon: A platform for reproducible and scalable machine learning and deep learning on kubernetes
  Pachyderm - Scalable, Reproducible Data Science
  Controlled Experiments in Machine Learning
  rquintino (Rui Quintino)

pythonfu - pickling, generator, etc

How to check if an object is a generator object in python? - Stack Overflow
  Why can’t python module dill pickle the generator function? - Stack Overflow
  pickle iterators and generators · Issue #10 · uqfoundation/dill
  python - Why can’t generators be pickled? - Stack Overflow
  Issue 1092962: Make Generators Pickle-able - Python tracker
  Where does a generator store it’s values? : Python
  Automatically remove generator object from memory at StopIteration (Python) - Stack Overflow
  Python multiprocessing PicklingError: Can’t pickle <type ‘function’> - Stack Overflow
  UsingPickle - Python Wiki
  Change Fork Name For Github - Stack Overflow
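
The generator and pickling links above boil down to two things that are easy to check with the standard library: how to recognise a generator, and the fact that `pickle` refuses generator objects. A minimal sketch (the `countdown` function is just an illustration):

```python
import inspect
import pickle
import types


def countdown(n):
    """Tiny generator function used only for this demonstration."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)

# Two equivalent ways to check for a generator object.
is_gen = inspect.isgenerator(gen)
is_gen_type = isinstance(gen, types.GeneratorType)
# The function that creates it is a generator function, not a generator.
is_gen_func = inspect.isgeneratorfunction(countdown)

# Generators hold a suspended stack frame, which pickle does not serialise.
try:
    pickle.dumps(gen)
    pickle_error = None
except TypeError as exc:
    pickle_error = exc

# One workaround: materialise the values first and pickle those instead.
restored = pickle.loads(pickle.dumps(list(gen)))
print(is_gen, is_gen_type, is_gen_func, type(pickle_error).__name__, restored)
```

Materialising only works for finite generators; for long or infinite streams the usual alternative is to pickle the arguments needed to recreate the generator.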
CI CB CD stuff

Jenkins (software) - Wikipedia
  Category:Build automation - Wikipedia
  Continuous integration - Wikipedia
  Continuous Integration. CircleCI vs Travis CI vs Jenkins - By Django Stars
  Continuous delivery - Wikipedia
  Continuous deployment - Wikipedia
  Comparison of continuous integration software - Wikipedia
  reddit.com: search results - continuous deployment
  magit-circleci: See the latest CircleCI builds from the Magit status buffer. : emacs
rmuslimov/jenkins.el: Jenkins plugin for emacs
  kljohann/mpv.el: control mpv for easy note taking
  Jenkins Is Getting Old | Hacker News
  Product Vision - CI/CD | GitLab
  Fun with Gitlab CI - VADOSWARE
fork searching

Can’t see the forks of a project on GitHub when “Too many forks to display” is shown - Web Applications Stack Exchange
  Intuitive way to view most active fork in GitHub - Stack Overflow
  Popular github Forks
  GitPop2: Find the most popular fork on GitHub
  Active GitHub Forks
  Enhanced GitHub - Chrome Web Store
major link dump tooling [2019-08-15 Thu]

Write proper python

How to write a production-level code in Data Science?
  Refactoring Python Code for Machine Learning Projects. Python “Spaghetti Code” Everywhere!
How to Write Beautiful Python Code With PEP 8 – Real Python
  styleguide | Style guides for Google-originated open-source projects
  Coding Style Guidelines — Pylearn 0.1 documentation
Python Packaging

Packaging Python Projects — Python Packaging User Guide
  Making a PyPI-friendly README — Python Packaging User Guide
  Minimal Structure — Python Packaging Tutorial
  Over 10% of Python Packages on PyPI are Distributed Without Any License | Snyk
  Choose an open source license | Choose a License
  Licenses | Choose a License
  TLDRLegal - Software Licenses Explained in Plain English

A template to make good README.md
  template-python/README.md at master · jacebrowning/template-python
activescott/python-package-example: A simple example of creating and consuming a distributable Python package.

Where do you keep your files? : emacs
  Rational ClearCase - Wikipedia
Reddit

D What’s your favorite logger? : MachineLearning
  D How do you manage your machine learning experiments? : MachineLearning
  Discussion How do you manage and keep track of your experiments? : MachineLearning
  D Best way to manage ML experiements : MachineLearning
  D How do you keep track of your experiment results? : MachineLearning
  D What tools are used in practice to schedule training jobs, annotate datasets, keep track of past experiments… ? : MachineLearning
several frameworks

Home - Guild AI
  guildai/guildai: Open source experiment tracking and optimization for machine learning
  Comet.ml | Supercharging Machine Learning
  mlflow/mlflow: Open source platform for the machine learning lifecycle
  Tutorial — MLflow 1.2.0 documentation
  Weights & Biases
  kubeflow/kubeflow: Machine Learning Toolkit for Kubernetes
  Kubeflow | Kubeflow
  seba-1511/randopt: Streamlined machine learning experiment management.
  richardliaw/track: Track your ML project!
MLflow

Introducing MLflow: an Open Source Platform for the Complete Machine Learning Lifecycle
  How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform - Databricks
  What are the current open source alternatives to MLflow? | Hacker News
FGLab

FGLab: Machine Learning Dashboard
  Kaixhin/FGLab: Future Gadget Laboratory
frameworks & discussions

Semantic Versioning 2.0.0 | Semantic Versioning
  danielwaterworth/metricmachine: Simple flask app for displaying live timeseries data
  polyaxon/polyaxon: A platform for reproducible and scalable machine learning and deep learning on kubernetes
  Pachyderm - Scalable, Reproducible Data Science
  mlflow sacred weights and biases - Google Search
  Compare to other ML e2e platforms · Issue #58 · mlflow/mlflow
  Controlled Experiments in Machine Learning
  rquintino (Rui Quintino)
  Towards Reproducible Research with PyTorch Hub | Hacker News
  Tutorial — Airflow Documentation
Build end-to-end machine learning workflows with Amazon SageMaker and Apache Airflow | AWS Machine Learning Blog
TRAINS

TRAINS: An open-source, zero-integration tool to boost machine learning research
  allegroai/trains: TRAINS - Auto-Magical Experiment Manager & Version Control for AI
  allegroai/trains-server: TRAINS Server - Auto-Magical Experiment Manager & Version Control for AI
  trains/brief.md at master · allegroai/trains
  Allegro.ai - Deep Learning Computer Vision Platform
  trains - Allegro.AI
cookiecutter

Home - Cookiecutter Data Science
  drivendata/cookiecutter-data-science: A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
  manifoldai/docker-cookiecutter-data-science: A fork of the cookiecutter-data-science leveraging Docker for local development.
  An AI Engineering Services Firm | Manifold
DVC etc

Machine Learning Version Control System · DVC
  Data Version Control - Machine Learning Time Travel - YouTube
  iterative/dvc: 🦉Data Version Control | Git for Data & Models
Workflow management system - Wikipedia
Git Large File Storage | Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
github - How do Git LFS and git-annex differ? - Stack Overflow
D How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs

D How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs : MachineLearning
  importlib — The implementation of import — Python 3.7.4 documentation
  pytorch-template/README.md at master · victoresque/pytorch-template
  MrGemy95/Tensorflow-Project-Template: A best practice for tensorflow project template architecture.
  toolkit/pytorch_project_template at master · gmum/toolkit
google ML

Rules of Machine Learning:  |  ML Universal Guides  |  Google Developers
  Machine Learning Crash Course  |  Google Developers
  Introduction to Machine Learning  |  Machine Learning Crash Course  |  Google Developers
  Introducing ML - YouTube
  Education – Google AI
excellent reddit discussion {D} How do you structure your PyTorch deep learning Implementations/Projects/PythonLibs : MachineLearning

[2019-08-15 Thu 13:21]
r/MachineLearning
  Posted by u/__Julia . 9 months ago Archived
  
Hi, In the data science community, I have seen a wide adoption of this project structure https://github.com/drivendata/cookiecutter-data-science. However, I am still struggling to find a unified way to structure ML experiments that save readers time to understand the structure of the project.
20 comments

  snaf77 38 points · 9 months ago
    
      Yes, my team has (somewhat). I have written a couple of Medium articles on that: https://medium.com/@mbednarski
      Some of our rules of thumb:
        
          Notebooks are allowed only for private projects, we do not even commit them.
          All steps need to be reproducible, from raw data to trained models (we use GNU make)
          Teams need to know more about Python than just the basics - this allows them to write better code.
          git flow
          strict package versioning - we use pip-tools
          swagger for api
          It is a regular software project; ML does not allow “shortcuts because >>science<<”. So all the usual principles apply, like DRY and KISS. The exception is when performance is an issue (but it should be profiled first)
          (unit) test where possible - e.g. data loaders, preprocessors etc
          Well defined entry points (one common cli is better than bunch of scripts)
          DOCUMENTATION for things that are not obvious from reading the code
          I prefer to keep as much configuration (batch size, optimizer, etc) in json files, but not everyone likes it (i like to have mapping: json config -> results dir)
        
      
      AllenNLP is a good inspiration for me
      LiberalSexist 2 points · 9 months ago
        
          Just read the first part of your structured ML series and found plenty of great ideas applicable for data-science projects in general.
          “AllenNLP is a good inspiration for me” – What work/code of AllenNLP do you mean in particular?
          BatJedi121 1 point 8 months ago
            
              I personally like how almost everything is configurable through JSON files. A lot of the boilerplate is handled in really simple ways: vocab, masking sequences, LSTMs to vectors or to multiple outputs, typical attention types, setting up the training/val loop, collecting metrics, serializing/loading models.
              I think getting to AllenNLP level for projects is overkill - but I think future libraries should definitely follow their design principles. take the boilerplate out.
            
          
  mate_classic 13 points · 9 months ago
    
      I was thinking about the same recently. Right now I’m trying to refactor my code to resemble this template here: https://github.com/victoresque/pytorch-template/blob/master/README.md
      thatguydr 1 point · 9 months ago
        
          I like the OP’s link a lot better for the top-down structure, as it separates all of the code into a single spot that can be committed easily. I like your link because it separates the model classes, the loader classes, the trainer class(es), and the utility classes (though I’d put the abstracts in that spot).
          What I don’t like in either is the rather cavalier treatment of the reporting/evaluation. If you could tie the evaluation and the original config in with the code, all of that could be committed, provided you make a low-memory evaluation format and not dozens of plots. (Maybe throw the requirements in with them.) That’d be a very clean solution.
        
      
      mate_classic 1 point 9 months ago
        
          I didn’t really think about evaluation data. Maybe because I’m working only with generative models right now, where evaluation is looking for the most beautiful picture most of the time.
        
      
      [deleted] 1 point · 8 months ago
        
          I really like this template. Gonna start using this.
        
      
  ranihorev 8 points · 9 months ago
    
      The biggest challenge for me is how to do the transition from the notebook to production fast and smooth
        
          srossi93 26 points · 9 months ago
            
              Simple, never use notebooks! Notebook is a great tool for visualization, simple experiments, debugging, but as soon as the lines of code are >50, I immediately switch to a more structured organization. BTW, PyTorch is super OO and it’s super easy to derive, inherit, extend functionalities!
                
                  tidier 2 points 9 months ago
                    
                      Never might be a bit strong, but I absolutely agree with everything else. Notebooks are excellent for experimentation, but I aggressively shift stuff to Python files and use importlib.reload.
                    
                  
                  ranihorev 1 point 9 months ago
                    
                      I completely agree. The tricky part is to identify the point in which the experiment is done…
                    
                  
          JanneJM 2 points 9 months ago
            
              You can run external programs from the notebook though. One benefit of doing that is that you could have a self-documenting pipeline, with the final (or just preview) results inline with the commands running the model, glue logic and so on.
            
          
          MoreDonuts 1 point 9 months ago
            
              And compose.
            
          
          gionnelles 1 point 9 months ago
            
              My team follows this exclusively.
            
          
  pickwickdick 3 points · 9 months ago
    
      I structure my code using the following conventions:
        
          1. I keep the model and the train/eval logic separate.
          2. I have a ParamParser class in a utils folder that takes in a JSON file as input and exposes all keys as member variables.
          3. In my model.py file I define the loss function, accuracy and expose it via a metrics dictionary. Now in train.py I can simply call metrics['accuracy'](out,label) to compute the accuracy (or loss).
        
      
      Hopefully, this helps answer your questions OP.
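The conventions in this comment can be sketched roughly as follows. This is only an illustration: the commenter's actual ParamParser is not shown in the thread, so the class body, the accuracy function and the sample config keys are all assumptions.

```python
import json
import tempfile
from pathlib import Path


class ParamParser:
    """Hypothetical helper: expose the top-level keys of a JSON file as attributes."""

    def __init__(self, json_path):
        with open(json_path) as f:
            self.__dict__.update(json.load(f))


def accuracy(outputs, labels):
    """Fraction of predictions that match the labels (plain lists, for illustration)."""
    correct = sum(1 for o, l in zip(outputs, labels) if o == l)
    return correct / len(labels)


# "model.py" side: metrics are exposed through a dictionary of callables.
metrics = {"accuracy": accuracy}

# "train.py" side: read the config and look up a metric by name.
config_path = Path(tempfile.mkdtemp()) / "params.json"
config_path.write_text(json.dumps({"batch_size": 32, "optimizer": "adam"}))

params = ParamParser(config_path)
out, label = [1, 0, 1, 1], [1, 0, 0, 1]
score = metrics["accuracy"](out, label)
print(params.batch_size, params.optimizer, score)
```

Keeping the metric lookup behind a dictionary means train.py never needs to import individual metric functions by name.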
    
  
ekshaks 3 points · 9 months ago
  Dealing with tensor shapes and documenting them for others is a pervasive problem. I use shape annotations using the tsalib library to document shapes throughout the data and model pipeline.
  https://github.com/ofnote/tsalib
mentatf 2 points · 9 months ago
  Using scikit learn guidelines and skorch that fits perfectly with that. ( https://github.com/dnouri/skorch)
RoastDepreciation 1 point · 9 months ago

  Cookiecutter is an excellent starting point. Adapt to your team’s needs.
  Not committing notebooks is simply not an option. They’re here to stay. Instead commit notebooks without output and store regular html exports in a separate reports root folder for future reference and reporting in the team.

katyngate 0 points · 9 months ago
  Here’s one possible way: https://github.com/gmum/toolkit/tree/master/pytorch_project_template
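The importlib.reload workflow mentioned by tidier in the thread above can be sketched like this. The module name my_experiment and its contents are made up for the demo; in a notebook you would edit the .py file in an editor instead of rewriting it from code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo only: skip bytecode caching so the reload always re-reads the source file.
sys.dont_write_bytecode = True

# Create a throwaway module on disk (stands in for code moved out of a notebook).
workdir = Path(tempfile.mkdtemp())
module_path = workdir / "my_experiment.py"
module_path.write_text("def learning_rate():\n    return 0.1\n")
sys.path.insert(0, str(workdir))

my_experiment = importlib.import_module("my_experiment")
first = my_experiment.learning_rate()

# Edit the file, then reload it without restarting the interpreter or kernel.
module_path.write_text("def learning_rate():\n    return 0.01\n")
importlib.invalidate_caches()
my_experiment = importlib.reload(my_experiment)
second = my_experiment.learning_rate()

print(first, second)
```

Note that reload only rebinds names inside the module object; objects created from the old definitions keep the old behaviour.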
[.] MLflow tryout

[.] guildAI tryout

[.] pytorch-lightning, trains, wandb

guildAI slack


  skimming through:
    
      they should have an api for notebooks
      slides on autoML /home/will/Downloads/Chicago ML - Applied Engineering Workshop.pdf
    
  
explanation how guild works

Hi @Mohammedi Haroune and welcome! By default, Guild inspects the script you want to run (or the main module specified in the Guild for the operation) and checks for the use of argparse. If the script uses argparse, Guild runs the script with the --help option and uses that dry-run to inspect the arguments available and uses those as defined (see note concerning magic below). If the script does not use argparse, Guild checks for global variable assignments of numbers and strings and uses those as flags. With that information (either from argparse or globals) it lets the user redefine flag values using NAME=VALUE on the command line. Before it runs the operation, Guild prints the flag values as a preview. You can also see what Guild is importing by running guild help or guild run SCRIPT_OR_OPERATION --help-op.
  If the flags come from argparse, Guild passes those as command line options to the script. If defined as globals, Guild sets the global values to the user-provided values by dynamically modifying the module AST as it’s loaded.
  This is all a bit magical and everyone reading this should feel a little uneasy at this point 🙂 The reason for all this implicit logic is to let you pick up a script and just run it - Guild captures the experiment as expected. In most cases this magic just works and everyone’s happy. But when it doesn’t, it’s mysterious and frustrating!
  The good news is that all of this behavior can be strictly controlled - and even disabled altogether - with a few lines in a Guild file (a file named guild.yml in your project directory). In the Guild file, you can provide explicit information about the flags for an operation as well as how the flags are set. This scheme is quite under-documented atm. For a fairly exhaustive list of examples on this topic, see:
  https://github.com/guildai/guildai/tree/master/guild/tests/samples/projects/flags
  You can change to that directory (after cloning the repo obviously) and run each example to see the behavior. Of course that’s an exercise for the uber curious 🙂 If you want to accomplish something and it’s not falling into place for you - please just post your question here and someone can help!
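To make the "globals as flags" idea above more concrete, here is a rough sketch of rewriting top-level constant assignments through the ast module before running a script. It is not Guild's code; the function name run_with_flags and the sample script are assumptions, and the real implementation may work quite differently.

```python
import ast

# A tiny "script" whose top-level number assignments act as flags.
SOURCE = "learning_rate = 0.01\nepochs = 3\nresult = learning_rate * epochs\n"


def run_with_flags(source, flags):
    """Replace top-level constant assignments named in `flags`, then run the code."""
    tree = ast.parse(source)
    for node in tree.body:  # only module-level statements are considered
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id in flags
            and isinstance(node.value, ast.Constant)
        ):
            new_value = ast.Constant(value=flags[node.targets[0].id])
            node.value = ast.copy_location(new_value, node.value)
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<script>", "exec"), namespace)
    return namespace


defaults = run_with_flags(SOURCE, {})
overridden = run_with_flags(SOURCE, {"learning_rate": 0.5})
print(defaults["result"], overridden["result"])
```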
Abhinv Ramesh Kashyap 6:15 AM

I saw that any value that is output in the format key:value will be captured by guild. Is there any other way to explicitly tell guild to capture this or outputting to stdout is the only way as of now?
  Garrett 9:56 AM
  @Abhinv Ramesh Kashyap The short answer is yes, definitely - Guild happily reads any generated TF event files. By default Guild parses your script output for patterns KEY: NUMBER as you observed. However you can control that behavior using a Guild file. Here’s an example that steps you through the concepts and shows you how to configure an operation for both modified parsing behavior and also how to disable the parsing altogether when you just want to log values directly.
  https://github.com/guildai/examples/tree/master/custom-scalars (edited)
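A minimal script shaped the way these messages describe might look like the sketch below: flags come from argparse, and results are printed as KEY: NUMBER lines. The flag names, the fake training loop and the regular expression are assumptions for illustration; the regex only mimics the described default pattern and is not Guild's parser.

```python
import argparse
import re


def parse_flags(argv=None):
    """argparse-based flags; a real script would call parse_flags() with no arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args(argv)


def train(flags):
    """Stand-in for training: returns the lines a script would print to stdout."""
    lines = []
    loss = 1.0
    for _ in range(flags.epochs):
        loss *= 1.0 - flags.learning_rate
        lines.append(f"loss: {loss:.4f}")
    return lines


# Illustrative pattern for "KEY: NUMBER" lines.
SCALAR_LINE = re.compile(r"^(\w+): ([-+]?\d+(?:\.\d+)?)$")


def parse_scalars(lines):
    """Collect (key, value) pairs from lines that look like 'KEY: NUMBER'."""
    matches = (SCALAR_LINE.match(line) for line in lines)
    return [(m.group(1), float(m.group(2))) for m in matches if m]


flags = parse_flags(["--epochs", "2", "--learning-rate", "0.5"])
output = train(flags)
scalars = parse_scalars(output)
print("\n".join(output))
```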
Garrett 7:51 AM

@here I’ve created a new repo that we can use to work on/communicate issue resolution:
  https://github.com/guildai/issues
  Sometimes (often) it’s handy to systematically reproduce a bug/issue and be able to quickly re-run steps against new releases to confirm expected behavior.
  Our first example is related to source code copies - an important topic for many Guild users. If you’re interested in how Guild decides what files to save as source code, this https://github.com/guildai/issues/tree/master/issue-39 is a step-by-step walk through.
  0.6.6rc2 is available for pre-release eval. I snuck in a pretty cool feature that I’d love to get some feedback on. Now when you run guild tensorboard Guild will prepare TensorBoard HParam summaries in the background so you can compare run hyperparams and metrics in the HParam tab. This is a really nice feature offered by TensorBoard!
I got a question about workflow and I’d like to answer here for everyone’s benefit.

Garrett Sep 3rd at 7:09 AM
  The gist of the question is related to a common pipeline: prepare data from some raw source, engineer features on the prepared data (a second stage of data prep), train a model, validate a model.
  In a Guild file, each of these stages are defined as separate operations. The operations are related to one another through resource dependencies. The first operation will depend on the raw data. Subsequent operations will depend on their upstream operations. Something like this:
  prepare-data:
    requires:
      - file: data.csv
  add-features:
    requires:
      - operation: prepare-data
  train:
    requires:
      - operation: add-features
  validate:
    requires:
      - operation: train

It might make sense for some of these operations to be melded into one. E.g. prepare-data and add-features could be one operation (i.e. roll the feature engineering work into the data prep script). Or train and validate could be one (validate as a part of the training script - this is very common). The triggers for creating a separate operation (e.g. split up raw data prep and feature engineering) are:

  Does the operation take long? (a subjective term - but usually you know it when you see it) - If yes, consider creating a separate operation to simply avoid having to re-run the operation when you can re-use artifacts as a dependency.
  Could the operation potentially be run multiple times, each time with different hyperparameters or inputs (flag values) for a given set of upstream dependencies? For example, for validation, it’s common to validate against new data sets as new labeled examples become available. You probably don’t want to retrain a model just to revalidate with new data. In this case, validate should be a separate operation.
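The dependency chain described above can be modelled as a small graph to see which operations must run first and which ones are affected by a change. This is just a sketch of the idea using the standard library (graphlib, Python 3.9+); it says nothing about how Guild itself resolves dependencies, and the helper names are made up.

```python
from graphlib import TopologicalSorter

# Each operation maps to the operations it requires (its upstream dependencies).
requires = {
    "prepare-data": [],
    "add-features": ["prepare-data"],
    "train": ["add-features"],
    "validate": ["train"],
}


def run_order(requires):
    """Return the operations so that every dependency comes before its dependents."""
    return list(TopologicalSorter(requires).static_order())


def downstream(requires, changed):
    """Operations that depend, directly or indirectly, on `changed`."""
    affected = []
    for op in run_order(requires):
        if any(dep == changed or dep in affected for dep in requires[op]):
            affected.append(op)
    return affected


order = run_order(requires)
stale = downstream(requires, "add-features")
print(order)
print(stale)
```

Keeping validate as its own node is what makes it possible to re-run it alone: nothing is downstream of it, and its upstream artifacts can be reused.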
María Benavente  6 days ago
  awesome! let’s say, for example, that add-features accesses as well the data file, would it be necessary to set the requirement also for that operation? (edited)
Garrett  6 days ago
  Yes, indeed it would! You could get to data by way of the prepare-data operation but this is not a good idea - and arguably Guild should treat that as an error (or warn you). You should instead list data as a required resource for add-features.
  This is where defining your resources in separate named sections is a good idea. Then you can simply reference the resource by name and not have to redefine it every time it’s needed. To define a named resource, you need to define a model. For example

  - model: my-model
    resources:
      raw-data:
        sources:
          - file: data-1.csv
          - file: data-2.csv
      prepared-data:
        sources:
          - operation: prepare-data
    operations:
      prepare-data:
        requires: raw-data
      add-features:
        requires:
          - raw-data
          - prepared-data
  
…
  Note that I went ahead and defined a prepared-data resource. Even if a resource is only used once, I think it’s nice to define named resources as it keeps the operation requires config simple and readable. (edited)
Garrett  6 days ago
  Note that I edited the example above to include a sources attr under each resource. Guild requires this atm. (I’m actually going to fix this right now to make  sources optional - for now you have to use it.)
María Benavente  6 days ago
  wow, okey! that’s really clean
María Benavente  6 days ago
  and in order to avoid those files running with sourcecode // exclude would it be possible to reference it also that way? Example:
  sourcecode:
    - exclude: raw-data

Garrett  6 days ago
  Yes but you have to spell that as - exclude: raw-data/*
Garrett  6 days ago
  I don’t really like that requirement - I’ll look into fixing that so you can just list the directory there.
Garrett  6 days ago
  But for now, use the glob pattern.
María Benavente  6 days ago
  alright
María Benavente  6 days ago
  i’m finding quite confusing a behavior I’m experimenting as a result of requiring specific files into each operation:

  - model: claim-detection
    description: Classifier for claim-tagged data
    resources:
      excels:
        sources:
          - file: data/excels/
          - file: data/raw/
      raw:
        sources:
          - file: data/raw/
          - file: data/processed/
          - operation: generatedataset
    
  
Now that I do this, locally to my code, those folders “lose the data prefix”. It wasn’t a big deal to update my global_path variable in the code, but I’m not sure whether the global path should remain or not
María Benavente  6 days ago
  (did I explain myself here?)
Garrett  6 days ago
  Yes, that’s right - the way you’re specifying the data files, they will not appear under a data path. They are selected and linked to using their base names (e.g. excels, raw, etc.)
  If you want these selected files/dirs to appear under a data path, you do a couple things. First, you could just select data and leave it at that:
  sources:
    - file: data

This will create a link to data and you’ll have access to everything in that directory.
  If you’d prefer to be more specific (generally a good idea) you can specify a path attr to indicate that links to selected files/dirs should be created in a sub-directory. Like this:
  excels:
    path: data
    sources:
      - file: data/excels
      - file: data/raw

This will create the directory structure that you’re expecting - but only include the two specified dirs as links.
Garrett  6 days ago
  Btw, in cases like this where you’re trying to sort out the directory layout, there’s a --stage DIR option to the run command that will only lay out the run directory and not actually run the operation. You can inspect DIR in this case to see what Guild is doing.
Garrett  6 days ago
  The third option is the one you mention, which is to adjust your script to look for the resources in something other than data.
  In most cases, I just specify data as a source and be done with it. Remember this creates a symlink - it’s not copying anything. The only harm in including data is that you have access to everything in that dir, which could mask some bugs. It’s also less explicit. There’s a point however when being explicit has diminishing returns - so it’s a judgment call.
[2019-10-20 Sun]

good commentary (R) pytorch-lightning - The researcher’s version of keras : MachineLearning  [2019-10-20 Sun 22:39]
awesome, comprehensive: A Comparison of Reinforcement Learning Frameworks: Dopamine, RLLib, Keras-RL, Coach, TRFL, Tensorforce, Coach and more  [2019-10-20 Sun 22:55]
[.] mastery python system -includes Arch [1/4]


  BEST (Raschka) –> /home/will/DevAcademics/LanguageThemed/python_reference
  look at chrome links in NOW

[.] conda, pyenvs, the whole thing, and setup proper envs policy


  need to revisit these:
    
      Using Anaconda Properly/Safely in Arch : archlinux
      arch pythong packages anaconda - Google Search
      AUR (en) - Search Criteria: Anaconda
      How To Make Package Managers Cry - YouTube
      reddit sidebar guides?
    
  
  I can readily reinstall all packages into an env, and then delete all in root env, both pip and conda
  setuptools, easy_install old, don’t
  PYTHONPATH, don’t
  also mentioned: for science packages you usually want the most up-to-date versions. The dependency issue is more for web frameworks and other apps. So just have one env for conda and keep that updated; only use envs for rarer cases
  Best answer: python - What is the difference between venv, pyvenv, pyenv, virtualenv, virtualenvwrapper, pipenv, etc? - Stack Overflow
  DKinghorn: very good reasons for Anaconda
    
      How to Install Anaconda Python and First Steps for Linux and Windows
      Install Intel Python using conda from Anaconda Python
    
  
  decent guides
    
      How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda
      How to Learn Python for Data Science (Updated)
      detailed explanations How to Set Up Your Python Environment on a Mac — davidculley.com
        
          How to Install Software via Homebrew — davidculley.com
        
      
      meh
        
          The definitive guide to setup my Python workspace – Henrique Bastos 2017
            
              uses pyenv (with pyenv-virtualenv, pyenv-virtualenvwrapper),
              puts anaconda IN pyenv?
              seems to knowledgable
            
          
          Simple Python Environments For Data Science 🐍 – Rick Galbo – Medium
        
      
  comments+1 Freezing Python’s Dependency Hell | Hacker News
    
      52 days ago. complaints of pipenv; poetry, nix again
      deeper discussion why nix way is better way
      nix vs conda, similar approach
      meh, Python Virtual Environments – a Primer 2016 | Hacker News
        
          Nix? Vex? heard nix several times now in guides. composable
          conda again
        
      
  good review Pipenv review, after using it in production – David Jetelina – Medium
    
      maybe go back to the basics virtualenv + pip, when not conda
      more complaints, 4mths ago Pipenv: A Guide to the New Python Packaging Tool : Python
    
  
  homepage Pipenv: Python Dev Workflow for Humans — pipenv 2018.7.1.dev0 documentation
    
      Pipenv: One Year Later and a Call for Help | Hacker News
      Advanced Usage of Pipenv — pipenv 2018.7.1.dev0 documentation
        
          To use Pipenv with a third-party Python distribution (e.g. Anaconda), you simply provide the path to the Python binary:
            
              $ pipenv install --python=/path/to/python
            
          
          Anaconda uses Conda to manage packages. To reuse Conda-installed Python packages, use the --site-packages flag:
            
              $ pipenv --python=/path/to/python --site-packages
            
          
[.] learn pytorch - use new notebooks
maybe - Deep Neural Networks with PyTorch - Stefan Otte - YouTube
[.] python dev chat groups QA


  gitter, irc for python, anaconda, etc about datsci practices, ie failure of portable conda envs, need to customize
  one or few best envs for general datsci research. there should be only a few.
  any overall package guide? prob not

Python Closures: How to use it and Why?
  python map function - Google Search
  best python coding slack channels - Google Search
  python coding gitter - Google Search
  reddit: the front page of the internet
  Python
  Python coding: a subreddit for people who know Python
  Quick python tips to add to your collection
[x] how does package/module system work

named tensors


  [2019-07-15 Mon]
    
      Proposal Named Axes/Dimensions or Tensor Shape Annotations · Issue #4164 · pytorch/pytorch
      Tensor Considered Harmful
      Tensor Considered Harmful Pt. 2
      harvardnlp/namedtensor: Named Tensor implementation for Torch
      pydata/xarray: N-D labeled arrays and datasets in Python
      xarray: N-D labeled arrays and datasets in Python — xarray 0.12.2 documentation
      @xarray_dev (@xarray_dev) | Twitter
      ofnote/tsalib: Tensor Shape Annotation Library (numpy, tensorflow, pytorch, …)
    
  
  naming axis in tensors (Thu Jan 10 2019)
    
      NVIDIA/OpenSeq2Seq: Toolkit for efficient experimentation with various sequence-to-sequence models (https://github.com/NVIDIA/OpenSeq2Seq)
      ctongfei/nexus: Experimental typesafe tensors / deep learning / probabilistic programming in Scala (https://github.com/ctongfei/nexus)
      harvardnlp/namedtensor: Proof of concept for a dynamic named tensor for pytorch (https://github.com/harvardnlp/namedtensor)
        
          Tensor Considered Harmful (http://nlp.seas.harvard.edu/NamedTensor)
            
              [D] Tensor Considered Harmful (A polemic against numpy / pytorch and a proposal for a named tensor) : MachineLearning (https://www.reddit.com/r/MachineLearning/comments/accmek/d_tensor_considered_harmful_a_polemic_against/)
                
                  harvardnlp on Twitter: “”Tensor Considered Harmful” (https://t.co/iueFvrYT6O). A polemic against numpy / pytorch and a proposal for a named tensor (https://t.co/MVBUm7OyBq). (New year’s goal, be more troublesome.)… https://t.co/fLmk8RR4Xy” (https://twitter.com/harvardnlp/status/1080911225427496966)
                  Yann LeCun on Twitter: “A pretty cool proposal from Sasha Rush for “named tensors”, i.e. tensors with named indices. With an implementation in PyTorch. https://t.co/8TGYmrjxAG https://t.co/8TGYmrjxAG” (https://twitter.com/ylecun/status/1080974471689687040)
                    
                      Dynamic shapes · Issue #3 · KhronosGroup/NNEF-Tools (KhronosGroup/NNEF-Tools#3)
                      @xarray_dev (@xarray_dev) | Twitter (https://twitter.com/xarray_dev)
                        
                          xarray: N-D labeled arrays and datasets in Python — xarray 0.11.2+1.gd6bed01 documentation (http://xarray.pydata.org/en/stable/)
                            
                              NumFOCUS: Open Code = Better Science - NumFOCUS (https://numfocus.org/)
                              pydata/xarray: N-D labeled arrays and datasets in Python (https://github.com/pydata/xarray)
                            
                          
                  ofnote/tsalib: Tensor Shape (Annotation) Library (https://github.com/ofnote/tsalib)
                  Introducing Tensor Shape Annotation Library : tsalib (https://towardsdatascience.com/introducing-tensor-shape-annotation-library-tsalib-963b5b13c35b)
                  [Proposal] Named Axes/Dimensions or Tensor Shape Annotations · Issue #4164 · pytorch/pytorch (pytorch/pytorch#4164)
                
              
              Tensor considered harmful | Hacker News (https://news.ycombinator.com/item?id=18823777)
            
          
misc links that were in named tensor heading


  NVIDIA/OpenSeq2Seq: Toolkit for efficient experimentation with various sequence-to-sequence models (https://github.com/NVIDIA/OpenSeq2Seq)
  ctongfei/nexus: Experimental typesafe tensors / deep learning / probabilistic programming in Scala (https://github.com/ctongfei/nexus)
  NumFOCUS: Open Code = Better Science - NumFOCUS (https://numfocus.org/)
  Dynamic shapes · Issue #3 · KhronosGroup/NNEF-Tools (KhronosGroup/NNEF-Tools#3)

[x] notify when job done - trying telegram
I’d recommend making actual .py files and running your code through there if it is taking that long. You can then use notify2 (“pip install notify2”) to send yourself a desktop notification when your code finishes.
Ahmad Moussa  [1 day ago]
  if it’s remotely you could send yourself an email via a python script
Pyrestone  [7 hours ago]
  I also use the python telegram api sometimes. It’s pretty simple and you can send messages to your phone.
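The email suggestion above can be sketched with the standard library alone. The addresses, the job name and the SMTP host and port are placeholders, and the actual send is left commented out because it needs a reachable mail server.

```python
import smtplib
import time
from email.message import EmailMessage


def build_done_message(job_name, seconds, sender, recipient):
    """Compose a short 'job finished' email."""
    msg = EmailMessage()
    msg["Subject"] = f"Job finished: {job_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{job_name} finished after {seconds:.1f} seconds.")
    return msg


def send_message(msg, host="localhost", port=25):
    """Send through an SMTP server; host and port here are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


start = time.time()
result = sum(i * i for i in range(1000))  # stand-in for the long-running job
msg = build_done_message(
    "demo-job", time.time() - start, "me@example.com", "me@example.com"
)
print(msg["Subject"])
# send_message(msg)  # enable once an SMTP server is configured
```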
[-] SOTA papers with code work


  for:
    
      downloading their data to process
      run benchmarks, tasks
    
  
  starred repos
  dir in devacademics

json links

All papers with abstracts	https://paperswithcode.com/media/about/papers-with-abstracts.json.gz
  Links between papers and code	https://paperswithcode.com/media/about/links-between-papers-and-code.json.gz
  Evaluation tables	https://paperswithcode.com/media/about/evaluation-tables.json.gz
The last JSON is in the sota-extractor format and the code from there can be used to load in the JSON into a set of Python classes.
At the moment, data is regenerated once a week (over the weekend).
Part of the data is coming from the sources listed in the sota-extractor README.
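A small sketch for reading one of these .json.gz dumps after downloading it. It assumes each file holds a single JSON array of records, which matches the snippets kept in these notes but is not verified against the current dumps; the demo writes its own tiny file instead of downloading anything.

```python
import gzip
import json
import tempfile
from pathlib import Path


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Demo file with one record copied from the links-between-papers-and-code snippet.
sample = [
    {
        "paper_arxiv_id": "1205.5407",
        "repo_url": "https://github.com/denizyuret/fastsubs-googlecode",
        "mentioned_in_paper": False,
        "mentioned_in_github": True,
    }
]
sample_path = Path(tempfile.mkdtemp()) / "links-sample.json.gz"
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    json.dump(sample, f)

records = load_json_gz(sample_path)
print(len(records), records[0]["repo_url"])
```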
papers with code json data snips

links-btw-paper-and-code
{
  "paper_title": "FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model",
  "paper_arxiv_id": "1205.5407",
  "paper_url_abs": "http://arxiv.org/abs/1205.5407v2",
  "paper_url_pdf": "http://arxiv.org/pdf/1205.5407v2.pdf",
  "repo_url": "https://github.com/denizyuret/fastsubs-googlecode",
  "mentioned_in_paper": false,
  "mentioned_in_github": true
},
evaluation-tables
{
  "categories": [
    "Computer Vision"
  ],
  "datasets": [],
  "description": "The average of the normalized top-1 prediction scores of unseen classes in the generalized zero-shot learning setting, where the label of a test sample is predicted among all (seen + unseen) classes.",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Generalized Zero-Shot Learning - Unseen"
},
{
  "categories": [
    "Medical"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "breast density classification"
},
{
  "categories": [
    "Medical"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "epilepsy prediction"
},
{
  "categories": [
    "Methodology"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Sparse Learning"
},
{
  "categories": [
    "Robots"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "Calibration"
},
{
  "categories": [
    "Graphs"
  ],
  "datasets": [],
  "description": "",
  "source_link": null,
  "subtasks": [],
  "synonyms": [],
  "task": "hypergraph partitioning"
}
papers-with-abstracts
{
  "arxiv_id": null,
  "title": "Towards a Discourse Model for Knowledge Elicitation",
  "abstract": "",
  "url_abs": "https://www.aclweb.org/anthology/papers/R/R13/R13-2006/",
  "url_pdf": "https://www.aclweb.org/anthology/R13-2006",
  "proceeding": "RANLP 2013 9"
},
{
  "arxiv_id": "1508.05902",
  "title": "A Framework for Comparing Groups of Documents",
  "abstract": "We present a general framework for comparing multiple groups of documents. A\nbipartite graph model is proposed where document groups are represented as one\nnode set and the comparison criteria are represented as the other node set.\nUsing this model, we present basic algorithms to extract insights into\nsimilarities and differences among the document groups. Finally, we demonstrate\nthe versatility of our framework through an analysis of NSF funding programs\nfor basic research.",
  "url_abs": "http://arxiv.org/abs/1508.05902v1",
  "url_pdf": "http://arxiv.org/pdf/1508.05902v1.pdf",
  "proceeding": null
},
{
  "arxiv_id": null,
  "title": "DysList: An Annotated Resource of Dyslexic Errors",
  "abstract": "",
  "url_abs": "https://www.aclweb.org/anthology/papers/L/L14/L14-1492/",
  "url_pdf": "http://www.lrec-conf.org/proceedings/lrec2014/pdf/612_Paper.pdf",
  "proceeding": "LREC 2014 5"
},
pipelineAI is kubeflow as a service (KASS)

Hands-on with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost - YouTube  [2019-09-25 Wed 10:25]
  Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
  Description
  In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
  Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
  KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
  Airflow is the most-widely used pipeline orchestration framework in machine learning.

  most ppl not using pytorch, 1st mover advantage
  airflow better than luigi
  next gpu will be multi-user thread friendly
  on-prem 39%, pretty good
  A/B and multi-armed bandit testing of models
  EKS - Amazon Elastic Kubernetes Service
  kubeflow doesn’t offer native airflow integration, pipelineai kubeflow version does along with MLflow (databricks)
  each github star is worth $1,500 in SV land.
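
  A minimal sketch of the "multi-armed bandit testing of models" note above, assuming an epsilon-greedy policy that routes requests between model variants. The function name, the success rates and the request count are hypothetical illustrations, not anything shown in the workshop or taken from PipelineAI.

```python
import random


def epsilon_greedy_route(success_rates, n_requests=5000, epsilon=0.1, seed=0):
    """Route simulated requests between model variants with epsilon-greedy.

    success_rates are hypothetical per-model probabilities that a request
    counts as a success (e.g. a click); in production these are unknown and
    only observed through feedback.
    """
    rng = random.Random(seed)
    n_arms = len(success_rates)
    pulls = [0] * n_arms  # requests sent to each model
    wins = [0] * n_arms   # successes observed for each model

    for _ in range(n_requests):
        # Explore with probability epsilon, or while any model is untried.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the model with the best observed success rate.
            arm = max(range(n_arms), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        if rng.random() < success_rates[arm]:
            wins[arm] += 1
    return pulls, wins


pulls, wins = epsilon_greedy_route([0.1, 0.6])
```

  Unlike a fixed 50/50 A/B split, the bandit shifts most traffic to the better-performing model while the test is still running.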

PipelineAI - Products  [2019-09-25 Wed 11:08]
  Multi/Hybrid-Cloud
  CPU + GPU + TPU
  Dynamic Auto Scaling
  Adaptive Traffic Shift
  Continuous Model Training
  Continuous Pipeline Optimization
  Continuous Model Validation
  Kafka Streaming
  Private Dashboards
  Logging Integration
  SAML + LDAP + OAuth + IAM
  24x7 Support
DataSets


  Github’s Top Open Datasets For Machine Learning
  https://www.kaggle.com/datasets
  ~/DevAcademics/Datasets
    
      ~/DevAcademics/Datasets/awesome-public-datasets
    
  
  Google Dataset Search
  Academic Torrents

Reference

Stanford DAWN Deep Learning Benchmark (DAWNBench)
  ~/Documents/2Research/OnlineEdu/datasciencemasters-go
  ~/Documents/2Research/OnlineEdu/open-source-machine-learning-degree
local awesome lists

awesome-datascience
  ~/Documents/2Research/DataScience/awesome-datascience-ideas
  ~/Documents/2Research/DataScience/datascience-awesome-cheat-sheets
  ~/Documents/2Research/DataScience/free-data-science-books
Web Scraping Info

~/Documents/2Research/DataScience/awesome-crawler
  ~/Documents/2Research/DataScience/awesome-crawler/README.html
  ~/Documents/2Research/DataScience/awesome-crawler/README.md
python web/twitter scraping

Web Scraping Tutorial with Python: Tips and Tricks
kennethreitz/twitter-scraper: Scrape the Twitter Frontend API without authentication.
  taspinar/twitterscraper: Scrape Twitter for Tweets
  haccer/tweep: An advanced Twitter scraping tool written in Python that doesn’t use Twitter’s API, evading most API limitations.
  tweepy/tweepy: Twitter for Python!
Twitter scraper tutorial with Python: Requests, BeautifulSoup, and Selenium — Part 1
  Mining Twitter Data with Python (Part 1: Collecting data) – Marco Bonzanini
  bonzanini/Book-SocialMediaMiningPython: Companion code for the book “Mastering Social Media Mining with Python”
Beautiful Soup Documentation — Beautiful Soup 4.4.0 documentation
  Selenium - Web Browser Automation
web scraping info


  BruceDone/awesome-crawler: A collection of awesome web crawler,spider in different languages
  be a good web-scraping citizen:
    
      web scraping - How to be a good citizen when crawling web sites? - Software Engineering Stack Exchange
      html - Web scraping etiquette - Stack Overflow
    
  
  What is the best open source web crawler that is very scalable and fast? And why? - Quora
    
      top are Heritrix, Nutch, Scrapy
    
  
  scrapy
    
      michael-yin/awesome-scrapy: A curated list of … from the Scrapy community.
      Scrapy Tutorial — Scrapy 1.5.0 documentation
      Scrapy at a glance — Scrapy 1.5.0 documentation
      Scrapy | Community
      Scrapy: An open source web scraping framework for Python
      scrapinghub/portia: Visual scraping for Scrapy - worth getting
    
  
  others
    
      Selenium - Web Browser Automation
      Heritrix 3.0 and 3.1 User Guide - Heritrix - IA Webteam Confluence
      Apache Nutch™ -
      binux/pyspider: A Powerful Spider(Web Crawler) System in Python.
    
  
  How do I choose between using Beautiful Soup or Scrapy? - Quora, some useful points
  KDnuggets
    
      Web Content Mining, Screen Scraping
      Web Scraping Tutorial with Python: Tips and Tricks
      How to get structured data from the web without crawling
    
  
  via Selenium Webdriver vs Mechanize - Stack Overflow, good answer, they overlap some
    These are completely different tools that somewhat “cross” in the web-scraping, web automation, automated data extraction scope.
    selenium usually becomes a “fall-back” tool - when someone cannot web-scrape a site with mechanize or RoboBrowser or MechanicalSoup (note - another alternative)
    Also note that you should, first, consider using an API (if provided by the target website) instead of going down to web-scraping.
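
    A minimal sketch of the "good web-scraping citizen" point above, assuming the crawler checks robots.txt before fetching. It uses only the standard-library urllib.robotparser; the robots.txt content, the bot name and the example.com URLs are hypothetical and inlined so the sketch runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (inlined so no network access is needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt allows this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]
ok = allowed_urls(parser, "my-notes-bot", candidates)
delay = parser.crawl_delay("my-notes-bot")  # seconds to wait between requests
```

    Against a live site, parser.set_url(...) followed by parser.read() would fetch the real robots.txt instead of the inlined text.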

Papers with Code

Home - Nurture.AI
  GitXiv: Collaborative Open Computer Science
  Papers with Code : the latest in machine learning
Books noter

DeepLearningBook_cropped

Skeleton

Contents

Website

Acknowledgments

Notation

Chapter 1 - Introduction

1.1 Who Should Read This Book?

1.2 Historical Trends in Deep Learning

1.2.1 The Many Names and Changing Fortunes of Neural Networks

1.2.2 Increasing Dataset Sizes

1.2.3 Increasing Model Sizes

1.2.4 Increasing Accuracy, Complexity and Real-World Impact

Part I - Applied Math and Machine Learning Basics

Chapter 2 - Linear Algebra

2.1 Scalars, Vectors, Matrices and Tensors

2.2 Multiplying Matrices and Vectors

2.3 Identity and Inverse Matrices

2.4 Linear Dependence and Span

2.5 Norms

2.6 Special Kinds of Matrices and Vectors

2.7 Eigendecomposition

2.8 Singular Value Decomposition

2.9 The Moore-Penrose Pseudoinverse

2.10 The Trace Operator

2.11 The Determinant

2.12 Example: Principal Components Analysis

Chapter 3 - Probability and Information Theory

3.1 Why Probability?

3.2 Random Variables

3.3 Probability Distributions
3.3.1 Discrete Variables and Probability Mass Functions
3.4 Marginal Probability

3.5 Conditional Probability

3.6 The Chain Rule of Conditional Probabilities

3.7 Independence and Conditional Independence

3.8 Expectation, Variance and Covariance

3.9 Common Probability Distributions
3.9.1 Bernoulli Distribution
3.9.2 Multinoulli Distribution
3.9.3 Gaussian Distribution
3.9.4 Exponential and Laplace Distributions
3.9.5 The Dirac Distribution and Empirical Distribution
3.9.6 Mixtures of Distributions
3.10 Useful Properties of Common Functions

3.11 Bayes’ Rule

3.12 Technical Details of Continuous Variables

3.13 Information Theory

3.14 Structured Probabilistic Models

Chapter 4 - Numerical Computation

4.1 Overflow and Underflow

4.2 Poor Conditioning

4.3 Gradient-Based Optimization
4.3.1 Beyond the Gradient: Jacobian and Hessian Matrices
4.4 Constrained Optimization

4.5 Example: Linear Least Squares

Chapter 5 - Machine Learning Basics

5.1 Learning Algorithms
5.1.1 The Task, T
5.1.2 The Performance Measure, P
5.1.3 The Experience, E
5.1.4 Example: Linear Regression
5.2 Capacity, Overfitting and Underfitting
5.2.1 The No Free Lunch Theorem
5.2.2 Regularization
5.3 Hyperparameters and Validation Sets
5.3.1 Cross-Validation
5.4 Estimators, Bias and Variance
5.4.1 Point Estimation
5.4.2 Bias
5.4.3 Variance and Standard Error
5.4.4 Trading off Bias and Variance to Minimize Mean Squared Error
5.4.5 Consistency
5.5 Maximum Likelihood Estimation
5.5.1 Conditional Log-Likelihood and Mean Squared Error
5.5.2 Properties of Maximum Likelihood
5.6 Bayesian Statistics
5.6.1 Maximum A Posteriori (MAP) Estimation
5.7 Supervised Learning Algorithms
5.7.1 Probabilistic Supervised Learning
5.7.2 Support Vector Machines
5.7.3 Other Simple Supervised Learning Algorithms
5.8 Unsupervised Learning Algorithms
5.8.1 Principal Components Analysis
5.8.2 k-means Clustering
5.9 Stochastic Gradient Descent

5.10 Building a Machine Learning Algorithm

5.11 Challenges Motivating Deep Learning
5.11.1 The Curse of Dimensionality
5.11.2 Local Constancy and Smoothness Regularization
5.11.3 Manifold Learning
Part II - Deep Networks: Modern Practices

Chapter 6 - Deep Feedforward Networks

6.1 Example: Learning XOR

6.2 Gradient-Based Learning
6.2.1 Cost Functions
6.2.1.1 Learning Conditional Distributions with Maximum Likelihood
6.2.1.2 Learning Conditional Statistics
6.2.2 Output Units
6.2.2.1 Linear Units for Gaussian Output Distributions
6.2.2.2 Sigmoid Units for Bernoulli Output Distributions
6.2.2.3 Softmax Units for Multinoulli Output Distributions
6.2.2.4 Other Output Types
6.3 Hidden Units
6.3.1 Rectified Linear Units and Their Generalizations
6.3.2 Logistic Sigmoid and Hyperbolic Tangent
6.3.3 Other Hidden Units
6.4 Architecture Design
6.4.1 Universal Approximation Properties and Depth
6.4.2 Other Architectural Considerations
6.5 Back-Propagation and Other Differentiation Algorithms
6.5.1 Computational Graphs
6.5.2 Chain Rule of Calculus
6.5.3 Recursively Applying the Chain Rule to Obtain Backprop
6.5.4 Back-Propagation Computation in Fully-Connected MLP
6.5.5 Symbol-to-Symbol Derivatives
6.5.6 General Back-Propagation
6.5.7 Example: Back-Propagation for MLP Training
6.5.8 Complications
6.5.9 Differentiation outside the Deep Learning Community
6.5.10 Higher-Order Derivatives
6.6 Historical Notes

Chapter 7 - Regularization for Deep Learning

7.1 Parameter Norm Penalties
7.1.1 L2 Parameter Regularization
7.1.2 L1 Regularization
7.2 Norm Penalties as Constrained Optimization

7.3 Regularization and Under-Constrained Problems

7.4 Dataset Augmentation

7.5 Noise Robustness

7.6 Semi-Supervised Learning

7.7 Multi-Task Learning

7.8 Early Stopping

7.9 Parameter Tying and Parameter Sharing

7.10 Sparse Representations

7.11 Bagging and Other Ensemble Methods

7.12 Dropout

7.13 Adversarial Training

7.14 Tangent Distance, Tangent Prop, and Manifold Tangent Classifier

Chapter 8 - Optimization for Training Deep Models

8.1 How Learning Differs from Pure Optimization
8.1.1 Empirical Risk Minimization
8.1.2 Surrogate Loss Functions and Early Stopping
8.1.3 Batch and Minibatch Algorithms
8.2 Challenges in Neural Network Optimization
8.2.1 Ill-Conditioning
8.2.2 Local Minima
8.2.3 Plateaus, Saddle Points and Other Flat Regions
8.2.4 Cliffs and Exploding Gradients
8.2.5 Long-Term Dependencies
8.2.6 Inexact Gradients
8.2.7 Poor Correspondence between Local and Global Structure
8.2.8 Theoretical Limits of Optimization
8.3 Basic Algorithms
8.3.1 Stochastic Gradient Descent
8.3.2 Momentum
8.3.3 Nesterov Momentum
8.4 Parameter Initialization Strategies

8.5 Algorithms with Adaptive Learning Rates
8.5.1 AdaGrad
8.5.2 RMSProp
8.5.3 Adam
8.5.4 Choosing the Right Optimization Algorithm
8.6 Approximate Second-Order Methods
8.6.1 Newton’s Method
8.6.2 Conjugate Gradients
8.6.3 BFGS
8.7 Optimization Strategies and Meta-Algorithms
8.7.1 Batch Normalization
8.7.2 Coordinate Descent
8.7.3 Polyak Averaging
8.7.4 Supervised Pretraining
8.7.5 Designing Models to Aid Optimization
8.7.6 Continuation Methods and Curriculum Learning
Chapter 9 - Convolutional Networks

9.1 The Convolution Operation

9.2 Motivation

9.3 Pooling

9.4 Convolution and Pooling as an Infinitely Strong Prior

9.5 Variants of the Basic Convolution Function

9.6 Structured Outputs

9.7 Data Types

9.8 Efficient Convolution Algorithms

9.9 Random or Unsupervised Features

9.10 The Neuroscientific Basis for Convolutional Networks

9.11 Convolutional Networks and the History of Deep Learning

Chapter 10 - Sequence Modeling: Recurrent and Recursive Nets

10.1 Unfolding Computational Graphs

10.2 Recurrent Neural Networks
10.2.1 Teacher Forcing and Networks with Output Recurrence
10.2.2 Computing the Gradient in a Recurrent Neural Network
10.2.3 Recurrent Networks as Directed Graphical Models
10.2.4 Modeling Sequences Conditioned on Context with RNNs
10.3 Bidirectional RNNs

10.4 Encoder-Decoder Sequence-to-Sequence Architectures

10.5 Deep Recurrent Networks

10.6 Recursive Neural Networks

10.7 The Challenge of Long-Term Dependencies

10.8 Echo State Networks

10.9 Leaky Units and Other Strategies for Multiple Time Scales
10.9.1 Adding Skip Connections through Time
10.9.2 Leaky Units and a Spectrum of Different Time Scales
10.9.3 Removing Connections
10.10 The Long Short-Term Memory and Other Gated RNNs
10.10.1 LSTM
10.10.2 Other Gated RNNs
10.11 Optimization for Long-Term Dependencies
10.11.1 Clipping Gradients
10.11.2 Regularizing to Encourage Information Flow
10.12 Explicit Memory

Chapter 11 - Practical Methodology

11.1 Performance Metrics

11.2 Default Baseline Models

11.3 Determining Whether to Gather More Data

11.4 Selecting Hyperparameters
11.4.1 Manual Hyperparameter Tuning
11.4.2 Automatic Hyperparameter Optimization Algorithms
11.4.3 Grid Search
11.4.4 Random Search
11.4.5 Model-Based Hyperparameter Optimization
11.5 Debugging Strategies

11.6 Example: Multi-Digit Number Recognition

Chapter 12 - Applications

12.1 Large Scale Deep Learning
12.1.1 Fast CPU Implementations
12.1.2 GPU Implementations
12.1.3 Large Scale Distributed Implementations
12.1.4 Model Compression
12.1.5 Dynamic Structure
12.1.6 Specialized Hardware Implementations of Deep Networks
12.2 Computer Vision
12.2.1 Preprocessing
12.2.1.1 Contrast Normalization
12.2.1.2 Dataset Augmentation
12.3 Speech Recognition

12.4 Natural Language Processing
12.4.1 n-grams
12.4.2 Neural Language Models
12.4.3 High-Dimensional Outputs
12.4.3.1 Use of a Short List
12.4.3.2 Hierarchical Softmax
12.4.3.3 Importance Sampling
12.4.3.4 Noise-Contrastive Estimation and Ranking Loss
12.4.4 Combining Neural Language Models with n-grams
12.4.5 Neural Machine Translation
12.4.5.1 Using an Attention Mechanism and Aligning Pieces of Data
12.4.6 Historical Perspective
12.5 Other Applications
12.5.1 Recommender Systems
12.5.1.1 Exploration Versus Exploitation
12.5.2 Knowledge Representation, Reasoning and Question Answering
12.5.2.1 Knowledge, Relations and Question Answering
Part III - Deep Learning Research

Chapter 13 - Linear Factor Models

13.1 Probabilistic PCA and Factor Analysis

13.2 Independent Component Analysis (ICA)

13.3 Slow Feature Analysis

13.4 Sparse Coding

13.5 Manifold Interpretation of PCA

Chapter 14 - Autoencoders

14.1 Undercomplete Autoencoders

14.2 Regularized Autoencoders
14.2.1 Sparse Autoencoders
14.2.2 Denoising Autoencoders
14.2.3 Regularizing by Penalizing Derivatives
14.3 Representational Power, Layer Size and Depth

14.4 Stochastic Encoders and Decoders

14.5 Denoising Autoencoders
14.5.1 Estimating the Score
14.5.1.1 Historical Perspective
14.6 Learning Manifolds with Autoencoders

14.7 Contractive Autoencoders

14.8 Predictive Sparse Decomposition

14.9 Applications of Autoencoders

Chapter 15 - Representation Learning

15.1 Greedy Layer-Wise Unsupervised Pretraining
15.1.1 When and Why Does Unsupervised Pretraining Work?
15.2 Transfer Learning and Domain Adaptation

15.3 Semi-Supervised Disentangling of Causal Factors

15.4 Distributed Representation

15.5 Exponential Gains from Depth

15.6 Providing Clues to Discover Underlying Causes

Chapter 16 - Structured Probabilistic Models for Deep Learning

16.1 The Challenge of Unstructured Modeling

16.2 Using Graphs to Describe Model Structure
16.2.1 Directed Models
16.2.2 Undirected Models
16.2.3 The Partition Function
16.2.4 Energy-Based Models
16.2.5 Separation and D-Separation
16.2.6 Converting between Undirected and Directed Graphs
16.2.7 Factor Graphs
16.3 Sampling from Graphical Models

16.4 Advantages of Structured Modeling

16.5 Learning about Dependencies

16.6 Inference and Approximate Inference

16.7 The Deep Learning Approach to Structured Probabilistic Models
16.7.1 Example: The Restricted Boltzmann Machine
Chapter 17 - Monte Carlo Methods

17.1 Sampling and Monte Carlo Methods
17.1.1 Why Sampling?
17.1.2 Basics of Monte Carlo Sampling
17.2 Importance Sampling

17.3 Markov Chain Monte Carlo Methods

17.4 Gibbs Sampling

17.5 The Challenge of Mixing between Separated Modes
17.5.1 Tempering to Mix between Modes
17.5.2 Depth May Help Mixing
Chapter 18 - Confronting the Partition Function

18.1 The Log-Likelihood Gradient

18.2 Stochastic Maximum Likelihood and Contrastive Divergence

18.3 Pseudolikelihood

18.4 Score Matching and Ratio Matching

18.5 Denoising Score Matching

18.6 Noise-Contrastive Estimation

18.7 Estimating the Partition Function
18.7.1 Annealed Importance Sampling
18.7.2 Bridge Sampling
Chapter 19 - Approximate Inference

19.1 Inference as Optimization

19.2 Expectation Maximization

19.3 MAP Inference and Sparse Coding

19.4 Variational Inference and Learning
19.4.1 Discrete Latent Variables
19.4.2 Calculus of Variations
19.4.3 Continuous Latent Variables
19.4.4 Interactions between Learning and Inference
19.5 Learned Approximate Inference
19.5.1 Wake-Sleep
19.5.2 Other Forms of Learned Inference
Chapter 20 - Deep Generative Models

20.1 Boltzmann Machines

20.2 Restricted Boltzmann Machines
20.2.1 Conditional Distributions
20.2.2 Training Restricted Boltzmann Machines
20.3 Deep Belief Networks

20.4 Deep Boltzmann Machines
20.4.1 Interesting Properties
20.4.2 DBM Mean Field Inference
20.4.3 DBM Parameter Learning
20.4.4 Layer-Wise Pretraining
20.4.5 Jointly Training Deep Boltzmann Machines
20.5 Boltzmann Machines for Real-Valued Data
20.5.1 Gaussian-Bernoulli RBMs
20.5.2 Undirected Models of Conditional Covariance
20.6 Convolutional Boltzmann Machines

20.7 Boltzmann Machines for Structured or Sequential Outputs

20.8 Other Boltzmann Machines

20.9 Back-Propagation through Random Operations
20.9.1 Back-Propagating through Discrete Stochastic Operations
20.10 Directed Generative Nets
20.10.1 Sigmoid Belief Nets
20.10.2 Differentiable Generator Nets
20.10.3 Variational Autoencoders
20.10.4 Generative Adversarial Networks
20.10.5 Generative Moment Matching Networks
20.10.6 Convolutional Generative Networks
20.10.7 Auto-Regressive Networks
20.10.8 Linear Auto-Regressive Networks
20.10.9 Neural Auto-Regressive Networks
20.10.10 NADE
20.11 Drawing Samples from Autoencoders
20.11.1 Markov Chain Associated with any Denoising Autoencoder
20.11.2 Clamping and Conditional Sampling
20.11.3 Walk-Back Training Procedure
20.12 Generative Stochastic Networks
20.12.1 Discriminant GSNs
20.13 Other Generation Schemes

20.14 Evaluating Generative Models

20.15 Conclusion

Bibliography

Index

pdfs noter [29/29]

Kingma and Welling - 2013 - Auto-Encoding Variational Bayes.pdf

need to relink to new pdf location
Skeleton

1 Introduction

2 Method

2.1 Problem scenario

2.2 The variational bound

2.3 The SGVB estimator and AEVB algorithm

2.4 The reparameterization trick

3 Example: Variational Auto-Encoder

4 Related work

5 Experiments

Link on page 6: http://www.cs.nyu.edu/~roweis/data.html

6 Conclusion

7 Future work

A Visualisations

B Solution of -D_KL(q_φ(z) || p_θ(z)), Gaussian case

C MLP’s as probabilistic encoders and decoders

C.1 Bernoulli MLP as decoder

C.2 Gaussian MLP as encoder or decoder

D Marginal likelihood estimator

E Monte Carlo EM

F Full VB

F.1 Example

Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

2. Re-examine the Internal Representations

3. Towards Interpretable DNNs

Explanation in Artificial Intelligence: Insights from the Social Sciences

Skeleton

Link on page 1: tmiller@unimelb.edu.au

1 Introduction

1.1 Scope

1.2 Major Findings

1.3 Outline

1.4 Example

2 Philosophical Foundations — What Is Explanation?

2.1 Definitions

2.1.1 Causality

2.1.2 Explanation

2.1.3 Explanation as a Product

2.1.4 Explanation as Abductive Reasoning

2.1.5 Interpretability and Justification

2.2 Why People Ask for Explanations

2.3 Contrastive Explanation

2.4 Types and Levels of Explanation

2.5 Structure of Explanation

2.6 Explanation and XAI

2.6.1 Causal Attribution is Not Causal Explanation

2.6.2 Contrastive Explanation

2.6.3 Explanatory Tasks and Levels of Explanation

2.6.4 Explanatory Model of Self

2.6.5 Structure of Explanation

3 Social Attribution — How Do People Explain Behaviour?

3.1 Definitions

3.2 Intentionality and Explanation

Link on page 24: https://www.youtube.com/watch?v=VTNmLt7QX8E

3.3 Beliefs, Desires, Intentions, and Traits

3.3.1 Malle’s Conceptual Model for Social Attribution

3.4 Individual vs. Group Behaviour

3.5 Norms and Morals

3.6 Social Attribution and XAI

3.6.1 Folk Psychology

3.6.2 Malle’s Models

3.6.3 Collective Intelligence

3.6.4 Norms and Morals

4 Cognitive Processes — How Do People Select and Evaluate Explanations?

4.1 Causal Connection, Explanation Selection, and Evaluation

4.2 Causal Connection: Abductive Reasoning

4.2.1 Abductive Reasoning and Causal Types

4.2.2 Background and Discounting

4.2.3 Explanatory Modes

4.2.4 Inherent and Extrinsic Features

4.3 Causal Connection: Counterfactuals and Mutability

4.3.1 Abnormality

4.3.2 Temporality

4.3.3 Controllability and Intent

4.3.4 Social Norms

4.4 Explanation Selection

4.4.1 Facts and Foils

4.4.2 Abnormality

4.4.3 Intentionality and Functionality

4.4.4 Necessity, Sufficiency and Robustness

4.4.5 Responsibility

4.4.6 Preconditions, Failure, and Intentions

4.5 Explanation Evaluation

4.5.1 Coherence, Simplicity, and Generality

4.5.2 Truth and Probability

4.5.3 Goals and Explanatory Mode

4.6 Cognitive Processes and XAI

4.6.1 Abductive Reasoning

4.6.2 Mutability and Computation

4.6.3 Abnormality

4.6.4 Intentionality and Functionality

4.6.5 Perspectives and Controllability

4.6.6 Evaluation of Explanations

5 Social Explanation — How Do People Communicate Explanations?

5.1 Explanation as Conversation

5.1.1 Logic and Conversation

5.1.2 Relation & Relevance in Explanation Selection

5.1.3 Argumentation and Explanation

5.1.4 Linguistic structure

5.2 Explanatory Dialogue

5.3 Social Explanation and XAI

5.3.1 Conversational Model

5.3.2 Dialogue

5.3.3 Theory of Mind

5.3.4 Implicature

5.3.5 Dilution

5.3.6 Social and Interactive Explanation

6 Conclusions

Link on page 60: https://www.ijcai.org/proceedings/2017/0023.pdf

Link on page 61: DARPA, Explainable Artificial Intelligence (XAI) Program, http://www.darpa.mil/program/explainable-artificial-intelligence

Link on page 61: full solicitation at http://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf

Link on page 61: https://arxiv.org/pdf/1709.10256

Link on page 61: https://arxiv.org/abs/1711.09784

Link on page 62: https://arxiv.org/abs/1802.00541

Link on page 64: http://people.eng.unimelb.edu.au/tmiller/pubs/explanation-inmates.pdf

Link on page 64: G. Nott, ‘Explainable Artificial Intelligence’: Cracking open the black box of AI, Computer World https://www.computerworld.com.au/article/617359/

Link on page 66: D. S. Weld, G. Bansal, Intelligible Artificial Intelligence, arXiv e-prints 1803.04263, URL https://arxiv.org/pdf/1803.04263.pdf

Geometric deep learning: going beyond Euclidean data

Skeleton

I Introduction

II Geometric learning problems

III Deep learning on Euclidean domains

IV The geometry of manifolds and graphs

V Spectral methods

VI Spectrum-free methods

VII Charting-based methods

VIII Combined spatial/spectral methods

IX Applications

X Open problems and future directions

References

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Skeleton

I Introduction

II Background on Deep Neural Networks (DNN)

II-A Artificial Intelligence and DNNs

II-B Neural Networks and Deep Neural Networks (DNNs)

II-C Inference versus Training

II-D Development History

II-E Applications of DNN

II-F Embedded versus Cloud

III Overview of DNNs

III-A Convolutional Neural Networks (CNNs)

III-A1 Non-Linearity

III-A2 Pooling

III-A3 Normalization

III-B Popular DNN Models

IV DNN development resources

IV-A Frameworks

IV-B Models

IV-C Popular Datasets for Classification

IV-D Datasets for Other Tasks

V Hardware for DNN Processing

V-A Accelerate Kernel Computation on CPU and GPU Platforms

V-B Energy-Efficient Dataflow for Accelerators

V-B1 Weight stationary (WS)

V-B2 Output stationary (OS)

V-B3 No local reuse (NLR)

V-B4 Row stationary (RS)

V-B5 Energy comparison of different dataflows

VI Near-Data Processing

VI-A DRAM

VI-B SRAM

VI-C Non-volatile Resistive Memories

VI-D Sensors

VII Co-design of DNN models and Hardware

VII-A Reduce Precision

VII-A1 Linear quantization

VII-A2 Non-linear quantization

VII-B Reduce Number of Operations and Model Size

VII-B1 Exploiting Activation Statistics

VII-B2 Network Pruning

VII-B3 Compact Network Architectures

VII-B4 Knowledge Distillation

VIII Benchmarking Metrics for DNN Evaluation and Comparison

VIII-A Metrics for DNN Models

VIII-B Metrics for DNN Hardware

IX Summary

Link on page 29: http://cs231n.stanford.edu/

Link on page 30: “Intel Math Kernel Library,” https://software.intel.com/en-us/mkl

Link on page 30: “Caffe LeNet MNIST,” http://caffe.berkeleyvision.org/gathered/examples/mnist.html

Link on page 30: “Caffe Model Zoo,” http://caffe.berkeleyvision.org/model_zoo.html

Link on page 30: “Matconvnet Pretrained Models,” http://www.vlfeat.org/matconvnet/pretrained/

Link on page 30: “TensorFlow-Slim image classification library,” https://github.com/tensorflow/models/tree/master/slim

Link on page 30: “Deep Learning Frameworks,” https://developer.nvidia.com/deep-learning-frameworks

Link on page 30: “THE MNIST DATABASE of handwritten digits,” http://yann.lecun.com/exdb/mnist/

Link on page 30: https://www.cs.toronto.edu/~kriz/cifar.html

Link on page 30: http://host.robots.ox.ac.uk/pascal/VOC/

Link on page 30: http://mscoco.org/

Link on page 30: https://github.com/openimages/dataset

Link on page 30: “YouTube-8M,” https://research.google.com/youtube8m/

Link on page 30: “AudioSet,” https://research.google.com/audioset/index.html

Link on page 31: http://eyeriss.mit.edu/energy.html

Link on page 32: “Benchmarking DNN Processors,” http://eyeriss.mit.edu/benchmarking.html

UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION

Learning by Abstraction: The Neural State Machine

Skeleton

1 Introduction

2 Related work

3 The Neural State Machine

3.1 Concept vocabulary

3.2 States and edge transitions

3.3 Reasoning instructions

3.4 Model simulation

4 Experiments

4.1 Compositional question answering

4.2 Generalization experiments

5 Conclusion

6 Supplementary material

6.1 Related work (full version)

6.2 Ablation studies

6.3 Concept vocabulary

6.4 Scene graph generation

6.5 Implementation and training details

Building machines that learn and think like people

  Building machines that learn and think like people - 579 citations- Google Scholar
Skeleton

Building machines that learn and think like people

Introduction

1.1. What this article is not

1.2. Overview of the key ideas

Cognitive and neural inspiration in artificial intelligence

Challenges for building more human-like machines

3.1. The Characters Challenge

3.2. The Frostbite Challenge

Core ingredients of human intelligence

4.1. Developmental start-up software
4.1.1. Intuitive physics
4.1.2. Intuitive psychology
4.2. Learning as rapid model building
4.2.1. Compositionality
4.2.2. Causality
4.2.3. Learning-to-learn
4.3. Thinking Fast
4.3.1. Approximate inference in structured models
4.3.2. Model-based and model-free reinforcement learning.
Responses to common questions

5.1. Comparing the learning speeds of humans and neural networks on specific tasks is not meaningful, because humans have extensive prior experience

5.2. Biological plausibility suggests theories of intelligence should start with neural networks

5.3. Language is essential for human intelligence. Why is it not more prominent here?

Looking forward

6.1. Promising directions in deep learning

6.2. Future applications to practical AI problems

6.3. Toward more human-like learning and thinking machines

Open Peer Commentary

The architecture challenge: Future artificial-intelligence systems will require sophisticated architectures, and knowledge of the brain might guide their construction
doi:10.1017/S0140525X17000036
Gianluca Baldassarre, Vieri Giuliano Santucci, Emilio Cartoni, and Daniele Caligiore
Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Rome, Italy.
gianluca.baldassarre@istc.cnr.it vieri.santucci@istc.cnr.it emilio.cartoni@istc.cnr.it daniele.caligiore@istc.cnr.it
http://www.istc.cnr.it/people/gianluca-baldassarre http://www.istc.cnr.it/people/vieri-giuliano-santucci http://www.istc.cnr.it/people/emilio-cartoni http://www.istc.cnr.it/people/daniele-caligiore
In this commentary, we highlight a crucial challenge posed by the proposal of Lake et al. to introduce key elements of human cognition into deep neural networks and future artificial-intelligence systems: the need to design effective sophisticated architectures. We propose that looking at the brain is an important means of facing this great challenge.
We agree with the claim of Lake et al. that to obtain human-level learning speed and cognitive flexibility, future artificial-intelligence (AI) systems will have to incorporate key elements of human cognition: from causal models of the world, to intuitive psychological theories, compositionality, and knowledge transfer. However, the authors largely overlook the importance of a major challenge to implementation of the functions they advocate: the need to develop sophisticated architectures to learn, represent, and process the knowledge related to those functions. Here we call this the architecture challenge. In this commentary, we make two claims: (1) tackling the architecture challenge is fundamental to success in developing human-level AI systems; (2) looking at the brain can furnish important insights on how to face the architecture challenge.
The difficulty of the architecture challenge stems from the fact that the space of the architectures needed to implement the several functions advocated by Lake et al. is huge. The authors get close to this problem when they recognize that one thing that the enormous genetic algorithm of evolution has done in millions of years of the stochastic hill-climbing search is to develop suitable brain architectures. One possible way to attack the architecture challenge, also mentioned by Lake et al., would be to use evolutionary techniques mimicking evolution. We think that today this strategy is out of reach, given the &ldquo;ocean-like&rdquo; size of the search space. At most, we can use such techniques to explore small, interesting &ldquo;islands lost within the ocean.&rdquo; But how do we find those islands in the first place? We propose looking at the architecture of real brains, the product of the evolution genetic algorithm, and try to &ldquo;steal insights&rdquo; from nature. Indeed, we think that much of the intelligence of the brain resides in its architecture. Obviously, identifying the proper insights is not easy to do, as the brain is very difficult to understand. However, it might be useful to try, as the effort might give us at least some general indications, a compass, to find the islands in the ocean. Here we present some examples to support our intuition. When building architectures of AI systems, even when following cognitive science indications (e.g., Franklin 2007), the tendency is to &ldquo;divide and conquer,&rdquo; that is, to list the needed high-level functions, implement a module for each of them, and suitably interface the modules. However, the organisation of the brain can be understood on the basis of not only high-level functions (see below), but also &ldquo;low-level&rdquo; functions (usually called &ldquo;mechanisms&rdquo;). 
An example of a mechanism is brain organisation based on macro-structures, each having fine repeated micro-architectures implementing specific computations and learning processes (Caligiore et al. 2016; Doya 1999): the cortex to statically and dynamically store knowledge acquired by associative learning processes (Penhune & Steele 2012; Shadmehr & Krakauer 2008), the basal ganglia to learn to select information by reinforcement learning (Graybiel 2005; Houk et al. 1995), the cerebellum to implement fast time-scale computations possibly acquired with supervised learning (Kawato et al. 2011; Wolpert et al. 1998), and the limbic brain structures interfacing the brain to the body and generating motivations, emotions, and the value of things (Mirolli et al. 2010; Mogenson et al. 1980). Each of these mechanisms supports multiple, high-level functions (see below). Brain architecture is also forged by the fact that natural intelligence is strongly embodied and situated (an aspect not much stressed by Lake et al.); that is, it is shaped to adaptively interact with the physical world (Anderson 2003; Pfeifer & Gómez 2009) to satisfy the organism’s needs and goals (Mannella et al. 2013). Thus, the cortex is organised along multiple cortical pathways running from sensors to actuators (Baldassarre et al. 2013a) and “intercepted” by the basal ganglia selective processes in their last part closer to action (Mannella & Baldassarre 2015). These pathways are organised in a hierarchical fashion, with the higher ones that process needs and motivational information controlling the lower ones closer to sensation/action. The lowest pathways dynamically connect musculoskeletal body proprioception with primary motor areas (Churchland et al. 2012). Higher-level “dorsal” pathways control the lowest pathways by processing visual/auditory information used to interact with the environment (Scott 2004).
Even higher-level “ventral” pathways inform the brain on the identity and nature of resources in the environment to support decisions (Caligiore et al. 2010; Milner & Goodale 2006). At the hierarchy apex, the limbic brain supports goal selection based on visceral, social, and other types of needs/goals. Embedded within the higher pathways, an important structure involving basal ganglia–cortical loops learns and implements stimulus–response habitual behaviours (used to act in familiar situations) and goal-directed behaviours (important for problem solving and planning when new challenges are encountered) (Baldassarre et al. 2013b; Mannella et al. 2013). These brain structures form a sophisticated network, knowledge of which might help in designing the architectures of human-like embodied AI systems able to act in the real world. A last example of the need for sophisticated architectures starts with the recognition by Lake et al. that we need to endow AI systems with a “developmental start-up software.” In this respect, together with other authors (e.g., Weng et al. 2001; see Baldassarre et al. 2013b; 2014, for collections of works) we believe that human-level intelligence can be achieved only through open-ended learning, that is, the cumulative learning of progressively more complex skills and knowledge, driven by intrinsic motivations, which are motivations related to the acquisition of knowledge and skills rather than material resources (Baldassarre 2011). The brain (e.g., Lisman & Grace 2005; Redgrave & Gurney 2006) and computational theories and models (e.g., Baldassarre & Mirolli 2013; Baldassarre et al. 2014; Santucci et al.
2016) indicate how the implementation of these processes indeed requires very sophisticated architectures able to store multiple skills, to transfer knowledge while avoiding catastrophic interference, to explore the environment based on the acquired skills, to self-generate goals/tasks, and to focus on goals that ensure a maximum knowledge gain.

Building machines that learn and think for themselves

Digging deeper on “deep” learning: A computational ecology approach

Back to the future: The return of cognitive functionalism

Theories or fragments?

The humanness of artificial non-normative personalities

Children begin with the same start-up software, but their software updates are cultural

Deep-learning networks and the functional architecture of executive control

Causal generative models are just a start

Thinking like animals or thinking like colleagues?

Evidence from machines that learn and think like people

What can the brain teach us about building artificial intelligence?

Building brains that communicate like machines

The importance of motivation and emotion for explaining human cognition

Building on prior knowledge without building it in

Building machines that adapt and compute like brains

Will human-like machines make human-like mistakes?

Benefits of embodiment

Understand the cogs to understand cognition

Social-motor experience and perception-action learning bring efficiency to machines

The argument for single-purpose robots

Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning

Human-like machines: Transparency and comprehensibility

Intelligent machines and human minds

The fork in the road

Avoiding frostbite: It helps to learn from others

Crossmodal lifelong learning in hybrid neural embodied architectures

Summary

Nature versus nurture

Coherent theories versus theory fragments

Symbolic versus sub-symbolic representations

Additional ingredients

R5.1. Machines that feel: Emotion

R5.2. Machines that act: Action and embodiment

R5.3. Machines that learn from others: Culture and pedagogy

R5.4. Machines that explore: Open-ended learning and intrinsic motivation

Insights from neuroscience and the brain

Coda: Ethics, responsibility, and opportunities

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
repo in /Users/Will/DevAcademics/DNN-Misc/tensorfuzz
Skeleton

1 Introduction

2 Background

2.1 Coverage-guided fuzzing

2.2 Testing of Neural Networks

2.3 Opportunities for improvement

3 The TensorFuzz library

3.1 The basic fuzzing procedure

3.2 Details of the fuzzing procedure

3.3 Batching and nondeterminism

4 Experimental results

4.1 CGF can efficiently find numerical errors in trained neural networks

4.2 CGF surfaces disagreements between models and their quantized versions

4.3 CGF surfaces undesirable behavior in character level language models

5 Conclusion

AlphaD3M: Machine Learning Pipeline Synthesis


  comparing their system with other autoML frameworks:
    
      Autosklearn
      autostacker
      TPOT
    
  
  OpenML datasets
  given: dataset, well defined task, performance criteria
  DARPA D3M (Data Driven Discovery)
  AlphaZero as a starting point
    
      single-player game
    
  
Notes for page 2


  DNN for predicting
    
      pipeline performance (value, or Q fn), and
      action probabilities
    
  
GAN org-noter pdfs

Karras et al_2018_Progressive Growing of GANs for Improved Quality, Stability, and Variation.pdf

2 progressive growing of gans

related work, alternative architectures, but not quite the same

3 increasing variation using minibatch standard deviation

Wang et al. - 2018 - Evolutionary Generative Adversarial Networks.pdf

Skeleton

1 Introduction

2 Related Works

2.1 Generative Adversarial Networks

2.2 Evolutionary Algorithms

3 Method

3.1 Generative Adversarial Networks

3.2 Evolutionary Algorithm

3.3 Mutations
3.3.1 Minimax mutation

3.3.2 Heuristic mutation

3.3.3 Least-squares mutation

3.4 Evaluation

3.5 E-GAN

4 Experiments

4.1 Implementation Details

4.2 Synthetic Datasets and Mode Collapse

4.3 CIFAR-10 and Inception Score

4.4 LSUN and Architecture Robustness

4.5 CelebA and Space Continuity

5 Conclusion

Kumar et al. - 2017 - Semi-supervised Learning with GANs Manifold Invar.pdf

Skeleton

1 Introduction

2 Semi-supervised learning using GANs

2.1 Estimating the tangent space of data manifold
2.1.1 Training the inverse mapping (the encoder)

2.1.2 Estimating the dominant tangent space

2.2 Injecting invariances into the classifier using tangents

2.3 GAN discriminator as the classifier for semi-supervised learning: effect of fake examples

3 Experiments

4 Discussion

A Tangent Plots

B Reconstruction Plots

Workshops & Meetups

webinars | meetups

Explainability of AI


  DL meetup group
  Heatmapping.org
  Tutorial: Implementing Layer-Wise Relevance Propagation
  A Quick Introduction to Deep Taylor Decomposition
  albermax/innvestigate: A toolbox to iNNvestigate neural networks’ predictions!
  sebastian-lapuschkin/lrp_toolbox: The LRP Toolbox provides simple and accessible stand-alone implementations of LRP for artificial neural networks supporting Matlab and Python. The Toolbox realizes LRP functionality for the Caffe Deep Learning Framework as an extension of Caffe source code published in 10/2015.
  VigneshSrinivasan10/interprettensor
  ArrasL/LRP_for_LSTM
  Also TDLS: Explainable Neural Networks based on Additive Index Models - YouTube

ACM webinar - failed

The Bayesian Zig Zag: Developing Probabilistic Models Using Grid Methods and MCMC
PipelineAI webinar


  [ ] remember part2 of this talk with tpus.
  multi-armed bandit (traffic routing?)
  injectable functions?
  offline/online(production)
  all to docker image -> sagemaker, cloud, personal premises, etc
  recorded
  tensorflow + etl engine, batch
  nvlink 16-32 gpus now, switch
    
      xla libs (in tensorflow), its cost optimizer fuses layers/operations
    
  
  hiring, jr/sr, san fran incubator - his house, 6mths mentor, nice part of san fran

AI computer vision TMLS

  
    AI - Computer Vision | Meetup
    1st csharma (Cartik)(github)
      
        cartik.sharma@gmail.com
        Cartik Sharma | LinkedIn (followed)
        NeuroMorph Inc. - Engineering Director
          
            Neuromorph Inc. | Crunchbase
          
        
        NEUROMORPH, QUBITS FOR CORTICAL MODELING - researchgate
          
            Spherical harmonics - Wikipedia
            Source localization - Scholarpedia
            Juxtaposition of markovian spaces (further explanation)
          
        
        qubits for cortical modeling
      
    
    image classify products
      
        yolo, inceptionV3,
        product ID via pics
      
    
ACM webinar: Project Jupyter: From Computational Notebooks etc

ACM webinar: Project Jupyter: From Computational Notebooks to Large Scale Data Science with Sensitive Data

  Speaker:
    
      ellisonbg (Brian E. Granger)
      Brian Granger (@ellisonbg) | Twitter
    
  
  Project Jupyter: From Computational Notebooks to Large Scale Data Science with Sensitive Data - ACM Learning Webinars - Association for Computing Machinery
  orgs slide
    
      LIGO Open Science Center
      NumFOCUS: parent 501(c), big org for lots of FOSS
    
  
  message specification JSON
    
      transport layer over ZeroMQ or WebSockets
      see client and server in repo
    
  
  nteract: alt frontend, simple
  binder: https://mybinder.org, turn repo into working notebooks, handles all deps
  Regulations: HIPAA, FERPA, GDPR, FedRAMP, Title 13, Title 26, SOX, GLBA, California Consumer Privacy Act (AB 375)
    
      Five Safes (Desai, Ritchie, Welpton)
    
  
  Jlab: bash, C++11,14, Javascript
  A High-Level Grammar of Interactive Graphics | Vega-Lite
  GenePattern
  jupyter/nbdime: Tools for diffing and merging of Jupyter notebooks.
  jupyter display - Google Search, for better visuals
    
      pandas - Show DataFrame as table in iPython Notebook - Stack Overflow
      Module: display — IPython 6.5.0 documentation
    
  
  Observable - ???

PipelineAI webinar II


  kubeflow
  middleware for ML applications
  “model inference is hard…latency, accuracy, size, throughput, energy eff, and rate of experimentation”
  current sols research focused & offline
    
      crowded space, not lots of production engineering
      bespoke 1-off, offline
    
  
  kafka?
  explainability in here too, use LIME
    
      DeepInterpreter? (his notebook 6)
    
  
  tensorflow lite (aka mobile, tf compile), smaller, faster format, optimize
  kafka for data access
  autograph - converts python into tensorflow graphs
  chris@pipeline.ai for jobs, questions,
  PipelineAI - Community
    
      PipelineAI- notebooks
      PipelineAI - YouTube
      Chris Fregly presentations | SlideShare
    
  
  Other Projects:
    
      Kubeflow
      Kubernetes-native microservices API gateway: Ambassador
      Argo Project
      Pachyderm - Scalable, Reproducible Data Science
      Chainer: A flexible framework for neural networks
      pipeline/docs/quickstart at master · PipelineAI/pipeline
      Apache Kafka
      AutoGraph: Easy control flow for graphs  |  TensorFlow
    
  
  on github:
    
      PipelineAI/pipeline: PipelineAI: Real-Time Enterprise AI Platform
      pipeline/docs/quickstart/docker at master · PipelineAI/pipeline
      PipelineAI/notebooks: Sample Notebooks for PipelineAI
      PipelineAI/models: Sample Models for PipelineAI
    
  
Explainable ML in Healthcare ACM webinar


  Good points:
    
      explanation vs justification
      explanation vs causality
        
          not prescriptive
        
      
  decisions in healthcare
    
      heuristics majority
      rules based system
      ML based system
      # of factors in diagnosis
    
  
  [Sculley 2015], ML code is small part of ML in healthcare
  Q: At slide x, what do the abbreviations LOS, RoR, SSI, EF stand for? (slide with healthcare utilization) – LOS - Length of Stay, RoR - Risk of Readmission, SSI - Surgical Site Infection, EF - Ejection Fraction (cardiac)
  [ ] Transparent
    
      Falling Rule Lists
      GAM (Generalized Additive Models)
      GA2M (Generalized Additive Models with …)
      LIME (Local Interpretable Model-agnostic Explanations)
      Naïve Bayes
      Regression Models
      Shapley Values
    
  
  semi - shallow ensembles
  non-transparent
    
      deep learning
      SVM
      gradient boosting models
    
  
  transparency, fidelity, trust
  howto validate explanations?

Dave DeepMind 2


  missed some stuff just before here:
    
      https://youtu.be/JO0LwmIlWw0?t=1673
      https://youtu.be/JO0LwmIlWw0?t=2296
      https://youtu.be/JO0LwmIlWw0?t=2990
      https://youtu.be/JO0LwmIlWw0?t=3358
      https://youtu.be/JO0LwmIlWw0?t=3934
      https://youtu.be/JO0LwmIlWw0?t=4428
    
  
  https://youtu.be/JO0LwmIlWw0?t=4707
    
      Sonnet, for this course they don’t use high level sonnet or keras,
    
  
  alternative frameworks https://youtu.be/JO0LwmIlWw0?t=4890
  then colab
    
      just before https://youtu.be/JO0LwmIlWw0?t=5358, we want visualizations to see what’s happening
      tf.reset_default_graph
      there’s a default graph
      graphs get complicated quickly, notoriously hard to debug
      minor debug tips (around) https://youtu.be/JO0LwmIlWw0?t=6299
    
  
pipeline.ai talk
import fairing
  mlflow
  knyfe, pycuda, ipykernel, kanren, requests, pytorch-cpu torchvision-cpu -c pytorch, accimage, html5lib, Hy, BeautifulSoup4, libgcc-ng
DLRL

DLRL workshop #1 Intro to GANs [0/0]


  NOTES:
    
      Xiyangs notebook in my gan-explorations repo has dcgan and conditional dcgan, not sagan
      Conditional GANs | Kaggle, nice ref
    
  
Meeting 1 [2018-09-01]


  full attendance


  Paper Read
    
      Intro
        
          [ ] generative models broadly?
        
      
      Related work
        
          [ ] why are markov chains needed in previous ones?
          [ ] VAEs
        
      
      Adversarial Nets
        
          [ ] train on data period first, then compete?
          [ ] scores are errors?
          [ ]
          cross entropy
            
              z ~ uniform()
            
          
      Theoretical results
        
          
      Experiments
        
          [ ] Gaussian Parzen window?
        
      
      Pros Cons
        
          [ ] helvetica scenario?
          [ ] negative chain boltzmann machine
          markov chains, inference, not needed
            
              Mchains need blurry distributions for chains to mix between modes.
              this can represent sharp, even degenerate distributions
            
          
      Conclusion
        
          learned approximate inference
          variational inference
          MCMC inference
          AIS?
          Parzen density?
        
      
Meet 2 [2018-09-08 Sat]


  Lecture 13 | Generative Models - YouTube
    
      for G, a better objective is to maximize the chance that D gets it wrong, max log D(G(z)), instead of minimizing log(1 - D(G(z)))
      Wasserstein GAN supposed to avoid issue with balancing training between G and D
      Tips:
        
          replace any pooling layers with strided convs (D) and fractional-strided convs (G)
          use batchnorm in both G and D
          remove fully connected hidden layers for deeper architectures
          use relu in G for all layers, output use Tanh
          use leakyRelu in D for all layers
        
      
      active research:
        
          better loss functions, more stable training (Wasserstein, LSGAN, etc)
          conditional GANs
          all kinds of applications
        
      
      current active generative models research
        
          PixelRNN and PixelCNN
            
              explicit density model
              optimizes exact likelihood
              good samples
              inefficient sequential generation
            
          
          VAE
            
              optimize variational lower bound on likelihood
              useful latent representation
              inference queries
              samples not great
            
          
          GANs
            
              game-theoretic approach, best samples
              tricky & unstable to train
              no inference queries
            
          
          recent work to also combine the above
        
      
  Goodfellow Tutorial 2016
    
      https://github.com/soumith/ganhacks
    
  
  Generating Pokemon with a Generative Adversarial Network - YouTube
    
      DCGAN deep convolutional GAN, 1st improvement
        
          batchnorm must for both
          avoid fully connected hidden units
          avoid pooling, simply stride the conv (or capsule)
          relu like other guides
          use this for baseline comparison, esp for non-simple datasets
        
      
      CGANs conditional GANs
        
          concatenate same y’ input to both z (for G), and x (for D), ie. text labels for that neat trick
        
      
      Wasserstein
        
          improve loss fn, eg. when to stop?
          highest training stability
          informative & interpretable loss fn
        
      
SAGAN paper notes


  [2018-09-11 Tue]
    
      Intro
        
          Scores
            
              Inception score
              Frechet Inception distance
            
          
          ImageNet dataset
        
      
sagan part


  SAGAN part
    
      image features –> 2 weighted feature spaces f,g
      β’s between different regions of f,g
    
  
  optimize params: w_f, w_g, w_h,
  derived: β_j,i, o_i,
    
      β - NxN attention map
      i,j ∈ {1..N}
    
  
# f w array
# g w array
# softmax f(xi)^T * g(xj)


  γ - training hyperparameter
  hinge loss
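
The attention computation noted above can be sketched in NumPy. This is only an illustration of the formulas in these notes, not code from any of the repos: the flattened `(C, N)` layout, the reduced channel count `C_bar`, and the function name are assumptions made for the example.

```python
import numpy as np

def self_attention(x, w_f, w_g, w_h, gamma):
    """Self-attention over N locations (illustrative sketch).

    x:        (C, N) feature map flattened over its N spatial locations
    w_f, w_g: (C_bar, C) projections into the two feature spaces f and g
    w_h:      (C, C) projection for h
    gamma:    learned scalar weighting the attention output
    """
    f = w_f @ x                                # (C_bar, N)
    g = w_g @ x                                # (C_bar, N)
    h = w_h @ x                                # (C, N)
    s = f.T @ g                                # s[i, j] = f(x_i)^T g(x_j)
    s = s - s.max(axis=0, keepdims=True)       # numerical stability only
    e = np.exp(s)
    beta = e / e.sum(axis=0, keepdims=True)    # beta[i, j] = β_{j,i}; each column sums to 1
    o = h @ beta                               # o[:, j] = sum_i β_{j,i} h(x_i)
    return gamma * o + x, beta                 # attention output added to the input
```

With `gamma = 0` the layer returns `x` unchanged, so attention is blended in gradually as γ is learned.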
    
      
Q’s


  [X] with no pooling, how to reduce dims btw conv layers?
    
      1x1 convs?
      batchnorm?
    
  
GAN stabilization


  spectral normalization (G and D)
    
      TTUR (two-timescale update rule)
    
  
  imbalanced learning rate
    
      TTUR separate learning rates
    
  
Study session [2018-09-22 Sat]


  for Self-Attention-GAN-Tensorflow repo, changed default dataset to mnist, from celebA (only 3 pics in there)

Papers/sagan repo

For MNIST

  generator
    
      layers = 8-3 = 5
        
          1 layer 1024 channels
          3 conv layers (1024, 512, 256)
          attention layer (128)
          2 conv layers (128, 256)
          1 conv layer sigmoid
        
      
  discriminator
    
      layers = 8-3 = 5
        
          1 layer 64 channels
          3 conv layers (64, 128, 256)
          attention layer (256 ch)
          2 conv layers (256, 512)
          1 conv layer (4), flatten
          1 dense layer sigmoid
        
      
SAGAN paper org-noter

Skeleton

1 Introduction

2 Related Work

3 Self-Attention Generative Adversarial Networks

4 Techniques to stabilize GAN training

4.1 Spectral normalization for both generator and discriminator


  not just D
  every layer for both

4.2 Imbalanced learning rate for generator and discriminator updates


  to compensate for slow learning since D has regularization applied

5 Experiments

evaluation metrics:


  IS (Inception Score)
    
      KL divergence btw conditional class and marginal class.
      higher better
      has problems
    
  
  FID (Frechet Inception Distance) is a more principled and comprehensive metric, and has been shown to be more consistent with human evaluation in assessing the realism and variation of the generated samples
    
      Wasserstein-2 distance between generated and real images in the feature space of an Inception-v3 network.
      lower values mean closer distances between synthetic and real data distributions
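
A rough NumPy sketch of the FID computation, assuming the Inception-v3 features have already been extracted (that step is not shown, and the function name is made up for this example). It fits a Gaussian to each feature set and evaluates the closed-form Wasserstein-2 distance between the two Gaussians.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: arrays of shape (num_samples, feature_dim),
    assumed to be precomputed network features.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_fake, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative
    # for covariance matrices (up to round-off, hence the clip).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Identical feature sets give a distance of (numerically) zero, and shifting one set by a constant vector adds exactly the squared length of that shift.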
    
  
Network structure & implementation


  128 x 128 images
  spec-norm every layer on both G and D
  conditional batch normalization for G, and projection type for D.
  Adam optimizer:
    
      beta1 = 0, beta2 = 0.9
      learn rate for D = 0.0004
      learn rate for G = 0.0001
    
  
  SAGAN uses conditional batch normalization in the generator and projection in the discriminator.

5.1 Evaluating the proposed stabilization techniques.

5.2 Self-attention mechanism.


  attention mid to late layer is best
  both G and D
  complements convolution, which is strong in modeling local dependencies

visualize attention maps


  We observe that the network learns to allocate attention according to similarity of color and texture, rather than just spatial adjacency.

5.3 Comparison with the state-of-the-art

6 Conclusion

Meet 3


  Xiyang recommended references
  Dave Macdonald RangleIO guy trying to get ML going
    
      1 month? presentation
    
  
  Q’s
    
      regions = ? (ith and jth), pixel, some arbitrary rectangle
      embedding space? (non-local paper. embedded gaussian sec) dimension reduction?
        
          you don’t need softmax constraint? absolute value can …?
        
      
  [ ] hinge loss could use followup, add to doc
  spectral normalization (see paper)
    
      lipschitz condition
      compute eigenvalues of weights - sqrt of highest - but
      [ ] followup
    
  
  [ ] measures:
    
      FID
      
    
Repo: How to Train a GAN? Tips and tricks to make GANs work

https://github.com/soumith/ganhacks
While research in Generative Adversarial Networks (GANs) continues to improve the
  fundamental stability of these models,
  we use a bunch of tricks to train them and make them stable day to day.
Here is a summary of some of the tricks.
[Here’s a link to the authors of this document](#authors)
If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document.
  If we find it to be reasonable and verified, we will merge it in.
1. Normalize the inputs


  normalize the images between -1 and 1
  Tanh as the last layer of the generator output
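
A small sketch of the scaling, assuming 8-bit input images; the helper names are made up for this example.

```python
import numpy as np

def to_tanh_range(images_uint8):
    """Map uint8 pixels in [0, 255] to floats in [-1, 1], matching a Tanh output."""
    return images_uint8.astype(np.float32) / 127.5 - 1.0

def from_tanh_range(images):
    """Map generator output in [-1, 1] back to uint8 pixels for viewing."""
    return np.clip((images + 1.0) * 127.5, 0, 255).round().astype(np.uint8)
```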

2: A modified loss function

In GAN papers, the loss function to optimize G is `min (log 1-D)`, but in practice folks practically use `max log D`

  because the first formulation has vanishing gradients early on
  Goodfellow et. al (2014)

In practice, works well:

  Flip labels when training generator: real = fake, fake = real
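
To see why the first formulation stalls early on, here is a sketch comparing the gradient of each generator loss with respect to the discriminator logit `a`, where `D(G(z)) = sigmoid(a)`. The function names are assumptions for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    # (the "min log(1 - D)" generator objective)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = -(1 - sigmoid(a))
    # (the "max log D" generator objective, written as a loss to minimize)
    return -(1.0 - sigmoid(a))
```

Early in training D rejects fakes confidently, so `a` is very negative: at `a = -6` the saturating gradient is about -0.0025 while the non-saturating one is about -0.9975.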

3: Use a spherical Z


  Dont sample from a Uniform distribution

![cube](images/cube.png “Cube”)

  Sample from a gaussian distribution

![sphere](images/sphere.png “Sphere”)

  When doing interpolations, do the interpolation via a great circle, rather than a straight line from point A to point B
  Tom White’s [Sampling Generative Networks](https://arxiv.org/abs/1609.04468) ref code https://github.com/dribnet/plat has more details
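
A minimal sketch of great-circle (spherical linear) interpolation between two latent vectors, written from the standard slerp formula and not taken from the plat code.

```python
import numpy as np

def slerp(t, a, b):
    """Interpolate from a (t=0) to b (t=1) along the great circle between them."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))  # angle between a and b
    if np.isclose(omega, 0.0):
        return (1.0 - t) * a + t * b   # nearly parallel: plain lerp is fine
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

For two orthogonal unit vectors the slerp midpoint keeps norm 1, while the straight-line midpoint shrinks to about 0.71, which is where the straight line leaves the region a Gaussian prior actually samples from.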

4: BatchNorm


  Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
  when batchnorm is not an option use instance normalization (for each sample, subtract mean and divide by standard deviation).

![batchmix](images/batchmix.png “BatchMix”)
5: Avoid Sparse Gradients: ReLU, MaxPool


  the stability of the GAN game suffers if you have sparse gradients
  LeakyReLU = good (in both G and D)
  For Downsampling, use: Average Pooling, Conv2d + stride
  For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
    
      PixelShuffle: https://arxiv.org/abs/1609.05158
    
  
6: Use Soft and Noisy Labels


  Label Smoothing, i.e. if you have two target labels: Real=1 and Fake=0, then for each incoming sample, if it is real, then replace the label with a random number between 0.7 and 1.2, and if it is a fake sample, replace it with a random number between 0.0 and 0.3 (for example).
    
      Salimans et. al. 2016
    
  
  make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
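
One way to generate such targets, as a sketch. The ranges come from the tip above; the function name and the `flip_prob` default are assumptions for this example, not values from the README.

```python
import numpy as np

def soft_noisy_labels(is_real, rng, flip_prob=0.05):
    """Build soft discriminator targets with occasional label flips.

    is_real:   boolean array, True for real samples
    rng:       a numpy.random.Generator
    flip_prob: chance of flipping each label (hypothetical default)
    """
    is_real = np.asarray(is_real, dtype=bool)
    flipped = rng.random(is_real.shape) < flip_prob
    target_real = is_real ^ flipped                          # flip a few labels
    real_vals = rng.uniform(0.7, 1.2, size=is_real.shape)    # soft "real" targets
    fake_vals = rng.uniform(0.0, 0.3, size=is_real.shape)    # soft "fake" targets
    return np.where(target_real, real_vals, fake_vals)
```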

7: DCGAN / Hybrid Models


  Use DCGAN when you can. It works!
  if you can’t use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN

8: Use stability tricks from RL


  Experience Replay
    
      Keep a replay buffer of past generations and occasionally show them
      Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
    
  
  All stability tricks that work for deep deterministic policy gradients
  See Pfau & Vinyals (2016)
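
A sketch of one possible replay buffer for generated samples. The class name, `capacity`, and `replay_prob` are assumptions for this example; the tip itself does not prescribe a scheme.

```python
import random

class ReplayBuffer:
    """Keeps past generator outputs and occasionally returns one of them."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity          # hypothetical default size
        self.items = []
        self.rng = random.Random(seed)

    def push_and_sample(self, sample, replay_prob=0.5):
        """Store `sample`; return either it or an older stored sample."""
        if len(self.items) < self.capacity:
            self.items.append(sample)     # still filling up: pass through
            return sample
        if self.rng.random() < replay_prob:
            idx = self.rng.randrange(self.capacity)
            old = self.items[idx]
            self.items[idx] = sample      # new sample takes the old slot
            return old                    # D sees a past generation
        return sample
```

Each freshly generated batch would be passed through `push_and_sample` before being shown to D, so D occasionally trains on older generations.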

9: Use the ADAM Optimizer


  optim.Adam rules!
    
      See Radford et. al. 2015
    
  
  Use SGD for discriminator and ADAM for generator

10: Track failures early


  D loss goes to 0: failure mode
  check norms of gradients: if they are over 100 things are screwing up
  when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
  if loss of generator steadily decreases, then it’s fooling D with garbage (says martin)

11: Dont balance loss via statistics (unless you have a good reason to)


  Dont try to find a (number of G / number of D) schedule to uncollapse training
  It’s hard and we’ve all tried it.
  If you do try it, have a principled approach to it, rather than intuition

For example
  ```
  while lossD > A:
    train D
  while lossG > B:
    train G
  ```
12: If you have labels, use them


  if you have labels available, train the discriminator to also classify the samples: auxiliary GANs

13: Add noise to inputs, decay over time


  Add some artificial noise to inputs to D (Arjovsky et. al., Huszar, 2016)
    
      http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
      https://openreview.net/forum?id=Hk4_qw5xe
    
  
  adding gaussian noise to every layer of generator (Zhao et. al. EBGAN)
    
      Improved GANs: OpenAI code also has it (commented out)
    
  
14: [notsure] Train discriminator more (sometimes)


  especially when you have noise
  hard to find a schedule of number of D iterations vs G iterations

15: [notsure] Batch Discrimination


  Mixed results

16: Discrete variables in Conditional GANs


  Use an Embedding layer
  Add as additional channels to images
  Keep embedding dimensionality low and upsample to match image channel size
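
A NumPy sketch of the idea, assuming channels-last images and reading "upsample" as tiling each label embedding over the image's height and width. The function names and the embedding table are hypothetical; in a real model the table would be a learned Embedding layer.

```python
import numpy as np

def label_channels(labels, embedding, height, width):
    """Look up each label's embedding and tile it over the spatial grid."""
    emb = embedding[labels]                  # (B, E) low-dimensional embeddings
    maps = emb[:, None, None, :]             # (B, 1, 1, E)
    # Nearest-neighbour "upsampling" of a 1x1 map to the image size.
    return np.broadcast_to(maps, (len(labels), height, width, emb.shape[1]))

def condition_images(images, labels, embedding):
    """Append the label maps to the images as extra channels."""
    _, h, w, _ = images.shape                # images: (B, H, W, C)
    return np.concatenate([images, label_channels(labels, embedding, h, w)], axis=-1)
```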

17: Use Dropouts in G in both train and test phase


  Provide noise in the form of dropout (50%).
  Apply on several layers of our generator at both training and test time
  https://arxiv.org/pdf/1611.07004v1.pdf

Authors

  Soumith Chintala
  Emily Denton
  Martin Arjovsky
  Michael Mathieu

Deep Learning with Generative Adverserial Networks – ICLR 2017 Discoveries

Deep Learning with Generative Adverserial Networks – ICLR 2017 Discoveries - https://amundtveit.com/2016/11/12/deep-learning-with-generative-and-generative-adverserial-networks-iclr-2017-discoveries/
How was the ICLR 2018 conference? - Quora

2018-05-07 Monday
  Improved Techniques for Training GANs - I Goodfellow 2016
  Wasserstein GAN
  On the regularization of Wasserstein GANs | OpenReview
  Training GANs with Optimism
  Progressive Growing of GANs for Improved Quality, Stability, and Variation | OpenReview
Meet 4

no notes
[.] Meet 5


  try with celebA dataset:
    
      [-] sthalles repo
        
          [X] trying his dcgan first.
          [ ] get working
        
      
      [ ] Xiyang code?
      [ ] hhhhhao/paper repo
        
          [ ] mod to accept new data
          [ ] test run
          [ ] add spectral normalization to hhhhhhhao/paper repo
        
      
  [ ] get celebA in tfFrames
    
      [ ] what dim changes to get code to run these?
      [ ]
    
  
[.] DLRL sagan attention layer

implement attention layer on top of base repo
  Werner tensorflow gan base start
Meet7


  128x128 batch 8 to 16
  bigger batches better for both G and D, better gradients and ?? something else Dave said
  recommended Deep Residual Learning for Image Recognition paper
  Dave:
    
      cycle GANs intro
      doodles = a source distribution (besides real images)
        
          his generated doodles end up monochrome - why?
        
      
      synthetic data used - horizontal flip, random rotation; since dataset fairly small
        
          image batch readers on repeat (actual method)
        
      
      different design ideas decisions
      eve optimizer didn’t work well with GANs
    
  
  spectralnorm

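import tensorflow as tf  # TF 1.x graph-mode API (tf.variable_scope, tf.get_variable)
from tensorflow.keras.layers import LeakyReLU  # assumed source of LeakyReLU; the original import is not shown

def conv(o, channels, ks=3, strides=1, norm=None, padding='SAME', name=None):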

    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):

        if norm is not None:
            o = norm(o, name)

        o = LeakyReLU() (o)

        in_channels = o.get_shape()[-1]

        w = tf.get_variable("kernel", shape=[ks, ks, in_channels, channels], initializer=tf.keras.initializers.he_uniform())
        b = tf.get_variable("bias", [channels], initializer=tf.constant_initializer(0.0))


        o = tf.nn.conv2d(o, spectral_norm(w, name_prefix="w"), [1, strides, strides, 1], padding) + b
    return o
# https://github.com/taki0112/Spectral_Normalization-Tensorflow/blob/master/spectral_norm.py
def spectral_norm(w, iteration=1, name_prefix=""):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])

    u = tf.get_variable(name_prefix+"u", [1, w_shape[-1]], initializer=tf.random_normal_initializer(), trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):

        """
        power iteration
        Usually iteration = 1 will be enough
        """

        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)

        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)


    return w_norm
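
To check what the power iteration above is estimating, here is a NumPy sketch that mirrors the same update and compares the result with the largest singular value from an SVD. The function name and iteration count are assumptions for the example; the TensorFlow version instead runs one step per training update and keeps `u` between updates.

```python
import numpy as np

def spectral_norm_estimate(w, iterations=50, seed=0):
    """Estimate the largest singular value of a kernel by power iteration."""
    w2d = w.reshape(-1, w.shape[-1])          # same flattening as the TF code
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(1, w2d.shape[1]))    # random start vector
    for _ in range(iterations):
        v = u @ w2d.T
        v /= np.linalg.norm(v)                # l2-normalize
        u = v @ w2d
        u /= np.linalg.norm(u)
    return (v @ w2d @ u.T).item()             # sigma = v W u^T
```

Dividing the kernel by this estimate gives a matrix whose largest singular value is approximately 1, which is the Lipschitz constraint spectral normalization is after.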

  softened hinge-loss is better for cycle-GAN (Dave)
    
      Softened hinge loss objectives for Generator and Discriminator:
    
  
fake_term = tf.reduce_mean(tf.nn.softplus( fake * SCALE + OFFSET))
real_term = tf.reduce_mean(tf.nn.softplus(-real * SCALE + OFFSET))
gen_term  = tf.reduce_mean(tf.nn.softplus(-fake * SCALE + OFFSET))

  hinge-loss gradients smaller (but nicer?)
  Dave found 10x diff in learning rate of cycle-GAN high
    
      L1 loss on pixels themselves -> a strong signal
      large gradients
      loss with cyclic check…
      but with hinge-loss gradients are limited to 1 so then not a problem
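A minimal numeric sketch of the three softened hinge terms above, in plain Python rather than TensorFlow. SCALE and OFFSET are not given in these notes, so the values below are placeholders for illustration only. It also checks the gradient remark: the derivative of softplus(-f * SCALE + OFFSET) with respect to f is -SCALE * sigmoid(...), so with SCALE = 1 its magnitude never exceeds 1.

```python
import math

SCALE = 1.0   # assumption: not specified in the notes
OFFSET = 0.0  # assumption: not specified in the notes

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    """Derivative of softplus, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(real, fake):
    """real_term + fake_term from the notes, on lists of raw D outputs."""
    real_term = mean([softplus(-r * SCALE + OFFSET) for r in real])
    fake_term = mean([softplus(f * SCALE + OFFSET) for f in fake])
    return real_term + fake_term

def generator_loss(fake):
    """gen_term from the notes."""
    return mean([softplus(-f * SCALE + OFFSET) for f in fake])

def gen_term_grad(f):
    """d/df softplus(-f * SCALE + OFFSET); magnitude is at most SCALE."""
    return -SCALE * sigmoid(-f * SCALE + OFFSET)

real_scores = [2.0, 0.5, -1.0]   # made-up discriminator outputs
fake_scores = [-3.0, 0.0, 4.0]
print(discriminator_loss(real_scores, fake_scores))
print(generator_loss(fake_scores))
print([round(gen_term_grad(f), 4) for f in fake_scores])
```

Unlike an L1 pixel loss, whose gradient scales with the number of pixels, each term here saturates, which fits the note that the gradients are bounded.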
    
  
Meet 8,9?

NIPS 2016 Tutorial Generative Adversarial Network

Skeleton

Link on page 1: http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf

Link on page 2: http://www.iangoodfellow.com/slides/2016-12-04-NIPS.key The video was recorded by the NIPS foundation and should be ma

1 Why study generative modeling?

Link on page 6: A video demonstration of iGAN is available at the following URL: https://www.youtube.com/watch?v=9c4z6YsBGQ0

Link on page 7: https://www.youtube.com/watch?v=FDELBFSeqQs

2 How do generative models work? How do GANs compare to others?

2.1 Maximum likelihood estimation

2.2 A taxonomy of deep generative models

2.3 Explicit density models
2.3.1 Tractable explicit models
2.3.2 Explicit models requiring approximation
2.4 Implicit density models

2.5 Comparing GANs to other generative models

3 How do GANs work?

3.1 The GAN framework

3.2 Cost functions
3.2.1 The discriminator’s cost, J(D)
3.2.2 Minimax
3.2.3 Heuristic, non-saturating game
3.2.4 Maximum likelihood game
3.2.5 Is the choice of divergence a distinguishing feature of GANs?
3.2.6 Comparison of cost functions
3.3 The DCGAN architecture

3.4 How do GANs relate to noise-contrastive estimation and maximum likelihood?

4 Tips and Tricks

Link on page 30: GitHub repository associated with Soumith’s talk: https://github.com/soumith/ganhacks

4.1 Train with labels

4.2 One-sided label smoothing

4.3 Virtual batch normalization

4.4 Can one balance G and D?

5 Research Frontiers

5.1 Non-convergence
5.1.1 Mode collapse
5.1.2 Other games
5.2 Evaluation of generative models

5.3 Discrete outputs

5.4 Semi-supervised learning

5.5 Using the code

5.6 Developing connections to reinforcement learning

6 Plug and Play Generative Networks

7 Exercises

7.1 The optimal discriminator strategy

7.2 Gradient descent for games

7.3 Maximum likelihood in the GAN framework

8 Solutions to exercises

8.1 The optimal discriminator strategy

8.2 Gradient descent for games

8.3 Maximum likelihood in the GAN framework

9 Conclusion

Link on page 54: Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org

RL meet #2


  DQN
    
      tricks
    
  
  AlphaGo
    
      monte carlo tree search
        
          use residual networks
          
        
  driving https://youtu.be/MQ6pP65o7OM?t=2527

[x] [#B] prep RL meet #2 tomorrow

  
    watch videos from slack /rl
    review last week
  
RL meet #3


  Review:
    
      MDP (Markov decision process), actual costs, actions & inputs
      agent approximations of above
      state, value function, value of state
        
          oracle provides actual values (π)
          state value fn – v^(s,w)
        
      
      action value function q_pi actual, q^(s,a,w)
        
          s = state, a = action, w = weights of net
        
      
      RL problems do not have oracle information about the actual (π) values of either function; we must get estimates via environment interactions
      1st strategy: MC learning
      [ ] notes, Qs
        
          [ ] generally:
            
              [ ] what criteria(theory) is there to select the different types of layers
                
                  [ ] activation functions
                  other aspects
                
              
              [ ] and why certain number of layers?
            
          
          [ ] somehow keep track all the different variables,fns in RL
          [ ] https://youtu.be/SWpyiEezfp4?t=55
          [ ] https://youtu.be/MqTXoCxQ_eY?t=78
        
      
  [ ] why is network used to get reward value with TD,q-learning?
    
      ie why in the target valuation?
    
  
  [ ] how to keep track of things?
  [ ] use Deep Q-learning doc on Gdrive to post Q’s and rough work
  NO, traffic - javascript based, old. also NO on car-pole? atari gym
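On the question above of why the estimate itself shows up in the target: in TD and Q-learning the environment only hands back the immediate reward r, so the rest of the return is bootstrapped from the current estimate at the next state. A tabular sketch, with a dict standing in for the network q^(s,a,w); the 4-state chain environment and the constants are made up for illustration.

```python
import random

GAMMA = 0.9       # discount factor (illustrative value)
ALPHA = 0.5       # step size (illustrative value)
N_STATES = 4      # toy chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right

def step(state, action):
    """Hypothetical toy environment: reward 1 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def td_target(q, reward, next_state, done):
    """r + gamma * max_a' q(s', a'): observed reward plus bootstrapped estimate."""
    if done:
        return reward
    return reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # random behaviour policy (off-policy)
            nxt, reward, done = step(state, action)
            target = td_target(q, reward, nxt, done)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = nxt
    return q

q = train()
for s in range(N_STATES - 1):
    print(s, [round(q[(s, a)], 3) for a in ACTIONS])
```

MC learning would instead wait until the end of the episode and use the full observed return as the target, with no estimate inside it.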

RL meet #4 & 5


  missed #5, first miss

graph NN meet 3

  attended. links:
layers.py - tkipf/gcn - Sourcegraph
  utils.py - tkipf/pygcn - Sourcegraph
  train.py - tkipf/pygcn - Sourcegraph
  Sparse matrices (scipy.sparse) — SciPy v1.2.1 Reference Guide
  scipy.sparse.coo_matrix — SciPy v1.2.1 Reference Guide
  scipy.sparse.eye — SciPy v1.2.1 Reference Guide
  scipy.sparse.diags — SciPy v1.2.1 Reference Guide
  pygcn/data/cora at master · tkipf/pygcn
  main.py - pytorch/examples - Sourcegraph
  [[https://aisc.a-i.science/events/2019-03-27/][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | Lunch & Learn | A.I. Socratic Circles (#AISC)]]
  [[https://www.youtube.com/watch?v=eEs-qXs_9Dc][[GCN] Semi-Supervised Classification with Graph Convolutional Networks | AISC Lunch & Learn - YouTube]]
  0711.0189.pdf
  networkx.generators.random_graphs.barabasi_albert_graph — NetworkX 2.3rc1.dev20190329133857 documentation
  AlxndrMlk/Barabasi-Albert_Network: Barabási–Albert Network. A Step-by-Step Model with Visualizations created in Python 3.
  algorithm - Python: implementing a step-by-step, modified Barabasi-Albert model for scale-free networks - Stack Overflow
  python-igraph manual
  GraphSAGE
  [[https://arxiv.org/abs/1706.02216][[1706.02216] Inductive Representation Learning on Large Graphs]]
  NetworkX — NetworkX
  Beyond Grids: Learning Graph Representations for Visual Recognition
  Papers With Code : Search for graph convolution
DLRL may18-19

[2019-05-25 Sat]
  python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300  --dec_lr=10000  --iterations 100000
python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300  --dec_lr=10000  --iterations 500
python3 main.py --exp_name $EXPNAME --dataset omniglot --test_N_way 5 --train_N_way 5 --train_N_shots 1 --test_N_shots 1 --batch_size 300  --dec_lr=10000  --save_interval 10 --iterations 500
DLRL BERT June 2019


  meet1 no notes, can’t remember what was said
  meet2 [2019-06-22 Sat 14:58]
    
      trying to understand BERT
    
  
BERT meet 3

  
    watching recommended talk to explain bert/nlp Language Learning with BERT - TensorFlow and Deep Learning Singapore - YouTube
  
Other links from previous meets

  GloVe: Global Vectors for Word Representation
  tensor2tensor/common_attention.py at d9f807cf2738323d19aba0a20a8cf0c7f7da8b27 · tensorflow/tensor2tensor
    
      Tensor2Tensor Intro - Colaboratory
    
  
  The Annotated Transformer
    
      Attention Is All You Need - YouTube
      The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time
      The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – Jay Alammar – Visualizing machine learning one concept at a time
      Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) – Jay Alammar – Visualizing machine learning one concept at a time
    
  
  via above
    
      Kullback-Leibler Divergence Explained — Count Bayesie
      Visual Information Theory – colah’s blog
    
  
  AI2 - they have great nlp, esp for research papers
  

https://tfhub.dev/google/
AISC

AISC Talks

Speaker List
Toronto Deep Learning Series (TDLS)
  TDLS speaker list - Google Sheets
AISC graph-nn-research @ TDLS [1/1]

  
    read papers - we are looking at landscape what’s out there
  
[x] graph-nn read Kipf & Welling paper

for Xiyangs graph-nn group
graph neural nets info

Oriol Vinyals on Twitter: “Graph Neural Networks / Relational Networks are models worth studying. We wrote a pretty comprehensive review about them which I hope you will find helpful (code forthcoming!). https://t.co/D46XCkUIeb… https://t.co/RxvY4yZGHe”
ICLR 2018 report Quora in AcademicLog

ICLR 2018 report Quora i
  [12] Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
  [13] [1711.00740] Learning to Represent Programs with Graphs
  [14] [1802.03691] Tree-to-tree Neural Networks for Program Translation
ICLR 2018 report Quora ii
  [13] [1711.00740] Learning to Represent Programs with Graphs
  [19] https://tkipf.github.io/graph-co…
  [19b] [1712.00268] Deformable Shape Completion with Graph Convolutional Autoencoders
  [20] Graph Attention Networks
  [21] https://www-cs.stanford.edu/grou…
  [22] [1711.04043] Few-Shot Learning with Graph Neural Networks
links

1609.02907 Semi-Supervised Classification with Graph Convolutional Networks
  1706.02216 Inductive Representation Learning on Large Graphs
  1710.10903 Graph Attention Networks
  1611.08402 Geometric deep learning on graphs and manifolds using mixture model CNNs
  Graph Convolutional Networks | Thomas Kipf | PhD Student @ University of Amsterdam
  1801.10247 FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
  [[https://www.google.com/search?q=%5B13%5D+%5B1711.00740%5D+Learning+to+Represent+Programs+with+Graphs&oq=%5B13%5D+%5B1711.00740%5D+Learning+to+Represent+Programs+with+Graphs&aqs=chrome..69i57.651j0j7&sourceid=chrome&ie=UTF-8][[13] [1711.00740] Learning to Represent Programs with Graphs - Google Search]]
  DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning/dl_01 Introduction to Machine Learning Based AI.pdf at master · enggen/DeepMind-Advanced-Deep-Learning-and-Reinforcement-Learning
  [[https://arxiv.org/abs/1810.09202][[1810.09202] Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation]]
  Graph Neural Network - YouTube
  Graph Neural Networks - YouTube
  Graph Convolution Learning - YouTube
  Xavier Bresson: “Convolutional Neural Networks on Graphs” - YouTube
  williamleif/GraphSAGE: Representation learning on large graphs using stochastic graph convolutions.
  matenure/FastGCN: The sample codes for our ICLR18 paper “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling”
meet 1

  
    Laplacian Operator core of spectral graph theory
    Q: fixed input size?
      
        number of vertices
        and/or edges
      
    
    K&W,
      
        X - feature matrix of nodes
        A - adjacency matrix
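With X and A as above, the layer rule in the Kipf & Welling paper is H = ReLU(Â X W), where Â = D^-1/2 (A + I) D^-1/2 and D is the degree matrix of A + I. A dependency-free sketch on a 3-node path graph; the weights W are arbitrary placeholder numbers, and real code would use sparse matrices (scipy.sparse or torch) instead of nested lists.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product for small dense matrices (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    a_hat = normalize_adjacency(adj)
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]

# Path graph 0 - 1 - 2
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]   # 3 nodes, 2 features each
W = [[0.5, -1.0],
     [0.25, 1.0]]  # placeholder weights, 2 -> 2

A_hat = normalize_adjacency(A)
H = gcn_layer(A, X, W)
for row in H:
    print([round(v, 4) for v in row])
```

On the fixed-input-size question: W only depends on the feature dimension, so the same layer applies to graphs with any number of vertices or edges.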
      
    
1709.05429.pdf,
  Research - Hector Zenil,
  Complexity Explorer,
papers/info for GNN meet

1812.04202.pdf,
  1806.01261 Relational inductive biases, deep learning, and graph networks,
  1811.05868 Pitfalls of Graph Neural Network Evaluation,
  1812.08434 Graph Neural Networks: A Review of Methods and Applications,
  Meet Deep Graph Library, a Python Package For Graph Neural Networks,
AISC Learnability can be undecidable

Learnability can be undecidable | Main Stream | A.I. Socratic Circles (#AISC)

  organic, connection, psych, bio-unrealistic
  computationalism
  Hinton PDP 1st book
  bio neurons also fire randomly
  bio/psych qualitative more than quantitative
  anatomy of learning – levels
  focus on the function only
    
      think a crypto function, it is not learnable
      connection between computational complexity and learnability? Yes
      applies to QM computing, since QM comp is still a function
      bare minimum, a function that learns.
      start with set X
        
          family of
        
      
  inductive argument for most papers, inductive assumption, that their algo works for a certain set of problems. try to convince generalizability by testing on samples from the problem space.
    
      this paper deductive
    
  
  [ ] finitely supported def

AISC abstract night


  Yaxal - temporal pattern attention for multivariate time series forecasting
    
      
  Ramya B - time series forecasting based on wavelet decomposition and feature extraction
    
      not stationary signals.
      wavelet for SP, not fourier nowadays?
      cascade correlation algo used
      also use PCA
    
  
  Albert Lai - captchas?
  a closer look at spatiotemporal convolutions for action recognition - Owen Ho
    
      a resnet (how?), 3D instead of 2D
      video recognition
    
  
  Florian Goebels - A generic framework for privacy preserving deep learning
    
      federated learning, interesting.
    
  
  a reinforcement learning framework for explainable recommendation - Omar Nada
  Jiri Stodulka - Deep RL based recommendation with explicit user-item interactions modeling
  Jorge Lopez - predicting crime using twitter and kernel density estimation
  

AISC Unsupervised Data Augmentation

AISC L&L Private machine learning in tensorflow with secure computing


  Morten Dahl Dropout Labs
  homomorphic encryption
  secret sharing
  TFE
    
      underneath, 3rd party libs: TEE, HE, MPC
    
  
  1,2 orders of magnitude slower
  uses current TF distributed code for communication (orchestration)
    
      tf.device
    
  
  collaboration with OpenMined for pytorch – more federated. they are not so focused on federation
    
      can access OpenMined code from tfe
    
  
  i.e. costs - ReLU uses a comparison, which is expensive under encryption
    
      can try to approximate which could impact accuracy
    
  
  future
    
      ethical issues, privacy,
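The "secret sharing" item above can be illustrated with additive sharing over a modulus. This is a generic textbook sketch, not the TF Encrypted API; the modulus and the three-party split are assumptions. Additions and multiplications by public constants are local per party, while a comparison such as the one inside ReLU has no such local form, which is where the cost noted above comes from.

```python
import secrets

Q = 2**61 - 1  # modulus for the shares (illustrative choice)

def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % Q
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; totals above Q // 2 are read as negative numbers."""
    total = sum(shares) % Q
    return total - Q if total > Q // 2 else total

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares; no communication needed."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]

def scale_shared(shares, public_constant):
    """Multiplying by a public constant is also a local operation."""
    return [(s * public_constant) % Q for s in shares]

x_shares = share(25)
y_shares = share(-7)
print(reconstruct(add_shared(x_shares, y_shares)))  # 18
print(reconstruct(scale_shared(x_shares, 3)))        # 75
```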
    
  
(TF-Encrypted) Private machine learning in tensorflow with secure computing | Lunch & Learn | A.I. Socratic Circles (#AISC)
  1810.08130 Private Machine Learning in TensorFlow using Secure Computation
  tf-encrypted/tf-encrypted: A Framework for Machine Learning on Encrypted Data
AISC The Neuro-Symbolic Concept Learner
Interpreting Scenes, Words & Sentences From Natural Supervision

  language semantic parsing -> hierarchical program
  visual concept annotation and program annotation -> symbolic reasoning module
  curriculum learning
  root then query, then filter for program
    
      once object is identified, it is filtered out
    
  
  multiple candidate programs for sentence / concept are sampled
    
      reinforcement with
    
  
  bidirectional GRU encoder for
    
      concept decoder (hard coded)
      algo1 string to tree semantic parser
      think of it as a recursive algorithm as it needs to be re-called
      2 separate GRU cells (hardcoded), for given function, it can call 2 other functions
    
  
  once objects recognized and program generated (semantic parsing), then symbolic reasoning
    
      
  parts:
    
      from pic
        
          object detection
          feature extraction
          concept box
        
      
      text
        
          semantic parsing
          concept embeddings
          program box
        
      
  off-policy search process for program selection (semantic parsing) (the reasoning process?)
    
      how is reward determined? updates weights once correct answer is found??
    
  
  attribute - shape, concept - sphere, etc
    
      different embeddings for different concepts
      but voc vector represent attributes
      embeddings space is hard-coded, but within space is learned
    
  
  curriculum learning - what exactly?
    
      stupidly simple to start,
    
  
  mask r-cnn, resnet are pretrained
    
      concept embeddings, semantic parsing are trained, as is the neuro-symbolic reasoning (NSR)
        
          runs fns over objects and embeddings (RL part)
          semantic parser is trained on RL, not others (those are backpropped)
        
      
  NSR it is differentiable, this whole model is end-to-end apparently
    
      RL is the GRUs? programs are built by the GRUs
      if program gives correct answer reward = 1 otherwise = 0
    
  
AISC XLnet talk -empty

AISC Resnet talk [2019-08-12 Mon]

UoW — 2 types of features. machine intelligence course 5th lecture
  Alice Rueda
  Aggregating local image descriptors into compact codes
Alice Rueda
  This is the VLAD paper
AISC L&L SciSci
Discussion lead: Santo Fortunato
  Motivation:
  Identifying fundamental drivers of science and developing predictive models to capture its evolution are instrumental for the design of policies that can improve the scientific enterprise—for example, through enhanced career paths for scientists, better performance evaluation for organizations hosting research, discovery of novel effective funding vehicles, and even identification of promising regions along the scientific frontier. The science of science uses large-scale data on the production of science to search for universal and domain-specific patterns. Here, we review recent developments in this transdisciplinary field.
slides:

  scientometrics
    
      2 founders
      one guy started WOS (web of science)
    
  
  data
    
      WOK (Thomson, Thomson Reuters)
      scopus elsevier
      gscholar	AI
      MS academic graph, bigger than everyone else	AI
    
  
  H-index
  c / c_0
  interesting points prob that paper A cites older paper B
  cite dynamics 3 paper specific parameters
    
      preferential attachment - the more citations paper i has, the more it gets cited
      time decay / survival prob (obsolescence)
      intrinsic fitness η_i of the paper
    
  
  teams
    
      science is becoming more and more team science
      team size is growing
      team papers are more cited
      Q team size affect type of contribution
        
          yes. small teams disrupt, new ideas, while large teams develop existing ideas
          disruption index
        
      
      DL is popular
    
  
  API
    
      
  careers - scientists can peak at anytime in their life, there is no pattern
  make queries on their static local dataset, generated own network
    
      have dataset locally if it's big
      pubmed
      dataset shared? invest with grants to get them.
      pubmed, American physical society - free
    
  
  word2vec vs node2vec
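For reference, the H-index mentioned in the slides is the largest h such that the author has h papers with at least h citations each. A small sketch; the citation counts are made up.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Made-up citation counts for illustration
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
print(h_index([]))                # 0
```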

AISC TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

Motivation:
  Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.

  info on his startup and the scene
  cylance hack: enable dynamic debugging
  coverage guided fuzzing
  property based testing
  approximate nearest neighbour
  CGF (fuzzing) hard to do for ANNs
  tensorfuzz
    
      send in NN graph, not code
      images or text are ok
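The coverage-guided loop from the abstract can be sketched as: mutate an input from the corpus, compute a coverage vector from the activations, and keep the mutant only if no previously seen coverage vector is within some distance of it. Everything below is a stand-in and not the TensorFuzz API: the toy "network", the distance threshold, the exact nearest-neighbour search (the paper uses approximate nearest neighbours) and the objective are all assumptions.

```python
import math
import random

def toy_network(x):
    """Stand-in for a model: returns (activations, output). Hypothetical."""
    hidden = [math.tanh(x[0] - x[1]), math.tanh(x[0] + x[1])]
    output = hidden[0] * hidden[1]
    return hidden, output

def nearest_distance(point, seen):
    """Exact nearest-neighbour distance over all coverage vectors seen so far."""
    return min(math.dist(point, s) for s in seen)

def mutate(x, rng, sigma=0.3):
    """Random mutation: add Gaussian noise to every input coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def fuzz(seed_input, objective, iterations=2000, threshold=0.2, seed=0):
    rng = random.Random(seed)
    corpus = [seed_input]
    seen_coverage = [toy_network(seed_input)[0]]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage, output = toy_network(candidate)
        if objective(output):
            return candidate, len(corpus)  # user-specified constraint met
        if nearest_distance(coverage, seen_coverage) > threshold:
            corpus.append(candidate)       # new coverage: keep the mutant
            seen_coverage.append(coverage)
    return None, len(corpus)

# Illustrative objective: drive the output above 0.9.
found, corpus_size = fuzz([0.0, 0.0], objective=lambda out: out > 0.9)
print(found, corpus_size)
```

In the real setting the objective would be something like a NaN in the outputs or a disagreement between a model and its quantized version, as listed in the abstract.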
      
    
Ernie 2.0: A Continual Pre-Training Framework for Language Understanding | AISC


  discussion points
    
      not real ablation studies, they are retraining on prior tasks
      not properly continuous learning?
      Ehsan is bringing this up, thinks another paper should be done with ablation studies, and to do only sampling from prior tasks to see if catastrophic forgetting happens
      how much improvement is architecture vs just more data and bigger?
      
    
we need cheap ASICS ASAP,  except those with the budget will just get bigger yet
MLops LaL [2019-09-25 Wed 12:19]


  reproducible, trackable, testable, maintainable
  give higher view, you can customi
  ML automates decision making
    
      trading: buy, sell?
      health: is there tumor?
      market pricing?
    
  
  2012 Knight Capital story: lost $465mil in 45min
    
      hidden tech debt in ML systems, sculley
    
  
  prep data, build & train, deploy
  [ ] we’ll be using azure – free credit

/home/will/Zotero/storage/69G8AVWR/Francois-Lavet et al. - 2018 - An Introduction to Deep Reinforcement Learning.pdf
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities [2019-10-08 Tue]


  esp relevant to NLP
  other methods to reduce complexity of softmax layer.
  relates softmax bottleneck to matrix factorization bottleneck
  something of a binary structure?? probability of nodes in a tree.
  M vocab size, N reps different contexts (# words in training data)
  dirichlet -> prior , when ppl want to sample discrete distributions
  taylor expansions -> when they know that the higher order derivatives are smaller .. close to 0?

AISC Restricted Boltzmann Machines for Collaborative Filtering [2019-10-22 Tue]


  why conditional?
  why mcmc needed?
  what is contrastive divergence?
  training goal: make all visible states more probable
    
      uses log-likelihood of V()
    
  
  what is the e symbol?
  how relate to matrix factorization techniques?
  conditional factored RBMs?
  collaborative filtering
  SVD is a naive matrix factorization; there are many matrix factorizations?
    
      Harriet related to LDA somehow
    
  
AISC Deep learning enables rapid identification of potent DDR1 kinase inhibitors [2019-10-23 Wed]


  A. Zhavoronkov, CEO of Insilico Medicine
    
      novel drug discovery
      de-novo molecule creation
      drug discovery (DD)
        
          slow, lots of failures
          10 years whole pipeline , 5.5 years for research/pre-clinical
        
      
  GAN text to image synthesis
  GANs in drug discovery “make perfect needles” vs needle in haystack
  go to Alan Aspuru-Guzik lab
  adding RL to GAN
    
      3 years of using generative models
      can’t use G for everything
      GA though can outperform G sometimes
      SMILES and graphs?
    
  
  Daniel
    
      EU head of division for DL
      AE, VAE
        
          latent codes: AE sparse, VAE gaussian distributed - can do better inference
          VAE KL regulator on latent space
          AAE
          why AE? no mode collapse, works with discrete data out of the box
        
      
      for molecules small differences matter, not like images
      SMILES - rep mol as a string
        
          build a spanning tree
          write atoms in depth-first search order
          pip package: rdkit
        
      
      they combine both
        
          conditional generation: x ~ p(x | properties)
          optimization: quality(x) -> max_x
          GENTRL (theirs) vs ORGAN, RANC, ATNC
        
      
      multi-modal priors – get artifacts at boundaries
        
          tensor train
          A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
          marginals and conditionals derivation
          cts case – gaussian mixture
        
      
      optimize reward with REINFORCE
      optimize the latent manifold
      SOM
    
  
  crystal -> analyze surface – sounds like static
  template - small molecule that binds to this protein (target)
    
      use that mol as a template
    
  
  also research into synthetic prediction and automation of synthesizing mols
    
      Aspuru-Guzik lab again
    
  
  model zoo: reps for molecules, lots on smiles/seqs, more now on graph approaches, but smiles still better
    
      fingerprints (his fav), take fingerprint – gen all chemical space that has same fingerprint
        
          gen fingerprints by conditioning
        
      
      3D, point clouds, don’t know conformation of molecules
    
  
  find a lab, gradually transition into field
    
      pharma field is a pain
      deepgenomics good
      MS also in field, or join them
      american chemical society journals, etc, not just nips, iclr
    
  
AISC Projects

neural-dom [0/10]
REPO : https://bitbucket.org/aggregate-intellect/neural-html/src/master/
  Chris: c_bobotsis@hotmail.com
  Shen, Kai
[.] prog synthetic email – the japanese language thing

[.] msg TS guy re DOM extraction code in WOB .core lib

[.] gather more material, prep for DOM encoding

[.] graphnn group meet1


  https://test.ai/ - example use to auto-test websites
  DOM tree
    
      html, dom,
    
  
  miniWOB+ dataset - web tasks.
    
      style info important?
    
  
  leverage headless browser to render dom tree and then take it for the DNNs
  then get nodes and attributes from DOM tree
    
      then rendered html page from dom
      generate json from node tree (dom)
        
          then generate html from json file
          225 attributes per node
            
              DNN would need to learn about inheritance
            
          
  pro team - acceptance criteria, steps to produce, QA finish closes ticket.
    
      3 levels, functional, business verification,
      test.ai
        
          rendered image recognize a button,
          once dom tree breaks, query fails?
        
      
  [ ] anything relevant to tree structure
  [X] think of tasks
  [ ] graph generation part?
  [ ] decoder? for graph
  [ ] GCN are efficient and speed matters for us since we will have a lot of data points
    
      also parallelization matters for high throughput
    
  
  [ ] meetup Zhang go through his code
  upsampling tree structure from z-space

[.] input proc of graphNN


  work on the input processing part for Graph NN (i.e. translate the JSON DOM tree into one-hot features plus adjacency matrices)?  There are still details and techniques that need to be hammered out, but maybe a good place to start is @Sheng Jia’s codebase.
  Sheng Jia  pytorch
    
      miniwob/env.py and miniwob/custom.py does some preprocessing, flatten out the tree into lists and record the indices etc
        
          The adjacent nodes are labeled for each node as a key-value pair “adj_V” by my wrapper environment,
        
      
      and the actual adjacency matrix is created in models/dom_qnet.py line 112.
      Essentially my custom environment wrapper returns
        
          a list of dom nodes, each of which has multiple key-value pairs for the attributes including the adj_V.
          Those are processed by the model for getting the actual feature vectors.
        
      
  generative code modelling on graphs paper:
    
      He said one way to enable efficient graph generation during training is to put a batch of graphs into a single graph where they are not connected to each other, and in implementation just use sparse matrix.
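A rough sketch of the input-processing step described at the top of this list: flatten a JSON DOM tree into a node list, one-hot tag features and an adjacency matrix. The JSON layout (`tag` and `children` keys) and the tag vocabulary are assumptions for illustration; this is neither the MiniWoB format nor Sheng Jia's code.

```python
import json

def flatten_dom(node, parent=None, nodes=None, edges=None):
    """Depth-first walk: give every node an index, record parent-child edges."""
    if nodes is None:
        nodes, edges = [], []
    index = len(nodes)
    nodes.append(node["tag"])
    if parent is not None:
        edges.append((parent, index))
    for child in node.get("children", []):
        flatten_dom(child, index, nodes, edges)
    return nodes, edges

def one_hot_features(tags, vocab):
    """One row per node, one column per known tag; unknown tags map to column 0."""
    lookup = {tag: i for i, tag in enumerate(vocab)}
    rows = []
    for tag in tags:
        row = [0] * len(vocab)
        row[lookup.get(tag, 0)] = 1
        rows.append(row)
    return rows

def adjacency_matrix(n, edges):
    """Dense symmetric adjacency; a sparse format would be used for large DOMs."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

dom = json.loads("""
{"tag": "body", "children": [
    {"tag": "div", "children": [{"tag": "button"}, {"tag": "input"}]},
    {"tag": "span"}]}
""")
VOCAB = ["<unk>", "body", "div", "button", "input"]  # assumed vocabulary

tags, edges = flatten_dom(dom)
X = one_hot_features(tags, VOCAB)
A = adjacency_matrix(len(tags), edges)
print(tags)
print(edges)
```

With around 225 attributes per node, the same pattern would extend to one feature block per attribute, which is where the open details mentioned above come in.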
    
  
Session [2019-06-22 Sat]


  [ ] what would sparse matrix look like?
    
      no autograd on sparseM
    
  
  trial run of graphNN/gae
    
      several fails with proper env versioning.
        
          need python3.6 first, then run setup. tensorflow=1.13
          env = gae.
        
      
      train.py | 200 epochs
        
          Test ROC score: 0.9160440573364683, Test AP score: 0.9308764945564731
        
      
      python train.py --dataset citeseer | 200 epochs
        
          Test ROC score: 0.8687404902789517, Test AP score: 0.8759123955009718
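On the open question in this session of what the sparse matrix would look like: batching several graphs into one disconnected graph gives a block-diagonal adjacency matrix, which in COO (coordinate) form is just the edge lists concatenated, with each graph's node indices shifted by an offset. A dependency-free sketch; the function names are made up, and scipy.sparse (coo_matrix, block_diag) covers the same ground.

```python
def batch_graphs(graphs):
    """Merge graphs into one disconnected graph in COO form.

    graphs: list of (num_nodes, edge_list) with edges as (src, dst) pairs.
    Returns (total_nodes, rows, cols, graph_id), where graph_id[k] says which
    original graph node k came from.
    """
    rows, cols, graph_id = [], [], []
    offset = 0
    for g_index, (num_nodes, edges) in enumerate(graphs):
        for src, dst in edges:
            rows.append(src + offset)
            cols.append(dst + offset)
        graph_id.extend([g_index] * num_nodes)
        offset += num_nodes
    return offset, rows, cols, graph_id

def to_dense(n, rows, cols):
    """Expand COO indices to a dense 0/1 matrix, only to visualise the blocks."""
    dense = [[0] * n for _ in range(n)]
    for r, c in zip(rows, cols):
        dense[r][c] = 1
    return dense

triangle = (3, [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)])
pair = (2, [(0, 1), (1, 0)])

n, rows, cols, graph_id = batch_graphs([triangle, pair])
for line in to_dense(n, rows, cols):
    print(line)
print(graph_id)
```

The graph_id vector is what lets a per-graph readout (sum or mean over nodes) be computed after message passing on the merged graph.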
        
      
Generative Code Modeling with Graphs - tryout [2019-06-22 Sat]

utils/tensorise.py test_data/tensorised test_data/exprs-types.json.gz test_data/graphs/
  WARNING: Logging before flag parsing goes to stderr.
  W0622 12:03:49.424902 140087753975616 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
  W0622 12:03:49.474851 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
  2019-06-22 12:03:49.512044: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2019-06-22 12:03:49.512087: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
    2019-06-22 12:03:49.512111: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist
    2019-06-22 12:03:49.533996: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz
    2019-06-22 12:03:49.534493: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c315b29e0 executing computations on platform Host. Devices:
    2019-06-22 12:03:49.534519: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
    Imputed grammar:
    Expression  -[00]->  ! Expression
    Expression  -[01]->  - Expression
    Expression  -[02]->  -- Expression
    Expression  -[03]->  CharLiteral
    Expression  -[04]->  Expression * Expression
    Expression  -[05]->  Expression + Expression
    Expression  -[06]->  Expression ++
    Expression  -[07]->  Expression . IndexOf ( Expression )
    Expression  -[08]->  Expression . IndexOf ( Expression , Expression , Expression )
    Expression  -[09]->  Expression . StartsWith ( Expression )
    Expression  -[10]->  Expression < Expression
    Expression  -[11]->  Expression > Expression
    Expression  -[12]->  Expression ? Expression : Expression
    Expression  -[13]->  Expression [ Expression ]
    Expression  -[14]->  IntLiteral
    Expression  -[15]->  StringLiteral
    Expression  -[16]->  Variable
    Known literals:
    IntLiteral: [‘%UNK%’, ‘0’, ‘1’, ‘2’, ‘4’, ‘43’]
    CharLiteral: [‘%UNK%’, “’-‘”]
    StringLiteral: [‘“foobar”’, ‘%UNK%’]
    W0622 12:03:49.610398 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
  W0622 12:03:49.610724 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
  W0622 12:03:49.658701 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
  W0622 12:03:50.152516 140087753975616 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use keras.layers.dense instead.
    W0622 12:03:50.158362 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0622 12:03:50.908026 140087753975616 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
    W0622 12:03:51.013405 140087753975616 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
    Instructions for updating:
    This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
    W0622 12:03:52.276013 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0622 12:03:52.378436 140087753975616 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0622 12:03:58.355536 140087753975616 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
    “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
    W0622 12:04:13.613733 140087753975616 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
utils/train.py trained_models/overtrain test_data/tensorised/{,}
  WARNING: Logging before flag parsing goes to stderr.
  W0622 12:10:21.039777 139776961083200 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
W0622 12:10:21.053452 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2019-06-22 12:10:21.085190: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
  2019-06-22 12:10:21.085327: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
  2019-06-22 12:10:21.085415: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist
  2019-06-22 12:10:21.107267: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz
  2019-06-22 12:10:21.107748: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5639c2adef40 executing computations on platform Host. Devices:
  2019-06-22 12:10:21.107769: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
  W0622 12:10:21.111878 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
W0622 12:10:21.112083 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W0622 12:10:21.163701 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0622 12:10:21.684241 139776961083200 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
  Instructions for updating:
  Use keras.layers.dense instead.
  W0622 12:10:21.692071 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:10:22.428447 139776961083200 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
  Instructions for updating:
  Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
  W0622 12:10:22.530452 139776961083200 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
  Instructions for updating:
  This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
  W0622 12:10:23.880191 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:10:23.902530 139776961083200 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:10:30.169962 139776961083200 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
  Instructions for updating:
  Use tf.where in 2.0, which has the same broadcast rule as np.where
  /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
  W0622 12:10:45.005255 139776961083200 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
Starting training run NAG-2019-06-22-12-10-21 of model NAGModel with following hypers:
  {"optimizer": "Adam", "seed": 0, "dropout_keep_rate": 0.9, "learning_rate": 0.00025, "learning_rate_decay": 0.98, "momentum": 0.85, "gradient_clip": 1, "max_epochs": 500, "patience": 5, "max_num_cg_nodes_in_batch": 100000, "excluded_cg_edge_types": [], "cg_add_subtoken_nodes": true, "cg_node_label_embedding_style": "Token", "cg_node_label_vocab_size": 10000, "cg_node_label_char_length": 16, "cg_node_label_embedding_size": 32, "cg_node_type_vocab_size": 54, "cg_node_type_max_num": 10, "cg_node_type_embedding_size": 32, "cg_ggnn_layer_timesteps": [3, 1, 3, 1], "cg_ggnn_residual_connections": {"1": [0], "3": [0, 1]}, "cg_ggnn_hidden_size": 64, "cg_ggnn_use_edge_bias": false, "cg_ggnn_use_edge_msg_avg_aggregation": false, "cg_ggnn_use_propagation_attention": false, "cg_ggnn_graph_rnn_activation": "tanh", "cg_ggnn_graph_rnn_cell": "GRU", "eg_token_vocab_size": 100, "eg_literal_vocab_size": 10, "eg_max_variable_choices": 10, "eg_propagation_substeps": 50, "eg_hidden_size": 64, "eg_edge_label_size": 16, "exclude_edge_types": [], "eg_graph_rnn_cell": "GRU", "eg_graph_rnn_activation": "tanh", "eg_use_edge_bias": false, "eg_use_vars_for_production_choice": true, "eg_update_last_variable_use_representation": true, "eg_use_literal_copying": true, "eg_use_context_attention": true, "eg_max_context_tokens": 500, "run_id": "NAG-2019-06-22-12-10-21"}
  2019-06-22 12:10:48.280541: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
  ==== Epoch 0 ====
  Epoch 0 (train) took 12.28s [processed 1 samples/second]
  Training Loss: 9.437750
  Epoch 0 (valid) took 3.41s [processed 4 samples/second]
  Validation Loss: 8.566715
  Best result so far -- saving model as 'trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz'.
==== Epoch 136 ====
  Epoch 136 (train) took 0.88s [processed 17 samples/second]
  Training Loss: 0.451712
  Epoch 136 (valid) took 0.40s [processed 37 samples/second]
  Validation Loss: 0.392952
utils/test.py trained_models/overtrain/NAGModel_NAG-2019-06-22-12-10-21_model_best.pkl.gz test_data/graphs/ trained_models/overtrain/test_results/
  WARNING: Logging before flag parsing goes to stderr.
  W0622 12:19:52.787774 140550576658240 deprecation_wrapper.py:119] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfutils/gradratiologgingoptimizer.py:19: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
W0622 12:19:52.905472 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:123: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2019-06-22 12:19:52.955281: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
  2019-06-22 12:19:52.955433: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
  2019-06-22 12:19:52.955526: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (workstation): /proc/driver/nvidia/version does not exist
  2019-06-22 12:19:52.974973: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3701080000 Hz
  2019-06-22 12:19:52.975514: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561a7a83d810 executing computations on platform Host. Devices:
  2019-06-22 12:19:52.975536: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
  W0622 12:19:52.976696 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:175: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
W0622 12:19:52.976995 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/contextgraphmodel.py:142: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W0622 12:19:53.046406 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:212: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
W0622 12:19:53.993441 140550576658240 deprecation.py:323] From utils/../exprsynth/contextgraphmodel.py:184: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
  Instructions for updating:
  Use keras.layers.dense instead.
  W0622 12:19:54.005863 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:19:54.663139 140550576658240 deprecation.py:506] From utils/../exprsynth/contextgraphmodel.py:186: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
  Instructions for updating:
  Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
  W0622 12:19:54.760896 140550576658240 deprecation.py:323] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/dpu_utils/tfmodels/sparsegnn.py:95: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
  Instructions for updating:
  This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
  W0622 12:19:55.945921 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:19:55.965353 140550576658240 deprecation.py:506] From /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
  Instructions for updating:
  Call initializer instance with the dtype argument instead of passing it to the constructor
  W0622 12:20:01.999933 140550576658240 deprecation.py:323] From utils/../exprsynth/nagdecoder.py:420: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
  Instructions for updating:
  Use tf.where in 2.0, which has the same broadcast rule as np.where
  /home/will/anaconda3/envs/graph-code-modelling/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  “Converting sparse IndexedSlices to a dense Tensor of unknown shape. ”
  W0622 12:20:19.303285 140550576658240 deprecation_wrapper.py:119] From utils/../exprsynth/model.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
2019-06-22 12:20:23.390381: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
  W0622 12:20:31.159235 140550576658240 deprecation.py:506] From utils/../exprsynth/nagdecoder.py:580: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
  Instructions for updating:
  dim is deprecated, use axis instead
  Groundtruth: args [ 0 ]
  @1 Prob. 0.561: args [ 0 ]
  @2 Prob. 0.044: args [ classVar ]
  @3 Prob. 0.030: args . IndexOf ( '-' )
  @4 Prob. 0.020: args [ "foobar" ]
  @5 Prob. 0.016: args ? 1 : classVar
  Groundtruth: intVar + classVar
  @1 Prob. 0.419: intVar + classVar
  @2 Prob. 0.074: intVar + 4
  @3 Prob. 0.050: intVar . IndexOf ( '-' )
  @4 Prob. 0.040: intVar . IndexOf ( '-' , 2 , 43 )
  @5 Prob. 0.014: intVar . IndexOf ( '-' , 2 , classVar )
  Groundtruth: foo . StartsWith ( "foobar" )
  @1 Prob. 0.424: foo . StartsWith ( "foobar" )
  @2 Prob. 0.102: foo . StartsWith ( '-' )
  @3 Prob. 0.034: foo . IndexOf ( '-' )
  @4 Prob. 0.031: foo . StartsWith ( %UNK% )
  @5 Prob. 0.003: foo . IndexOf ( '-' , 43 , 43 )
  Groundtruth: foo . IndexOf ( '-' )
  @1 Prob. 0.276: foo . IndexOf ( '-' )
  @2 Prob. 0.062: foo . IndexOf ( '-' , 2 , 43 )
  @3 Prob. 0.051: foo [ 0 ]
  @4 Prob. 0.033: foo + classVar
  @5 Prob. 0.025: foo . StartsWith ( '-' )
  Groundtruth: foo . IndexOf ( '-' , 2 , 43 )
  @1 Prob. 0.215: foo . IndexOf ( '-' , 2 , 43 )
  @2 Prob. 0.133: foo . IndexOf ( '-' )
  @3 Prob. 0.070: foo + classVar
  @4 Prob. 0.055: foo + 4
  @5 Prob. 0.005: foo . IndexOf ( '-' , 2 , classVar )
  Groundtruth: arr [ 1 ]
  @1 Prob. 0.353: b [ 1 ]
  @2 Prob. 0.167: arr [ 1 ]
  @3 Prob. 0.126: i [ 1 ]
  @4 Prob. 0.050: b [ i ]
  @5 Prob. 0.025: b ? 1 : i
  Groundtruth: j > classVar2
  @1 Prob. 0.564: j > classVar2
  @2 Prob. 0.119: j > 4
  @3 Prob. 0.114: -- j
  @4 Prob. 0.028: ! j
  @5 Prob. 0.019: j > - classVar2
  Groundtruth: -- j
  @1 Prob. 0.531: -- j
  @2 Prob. 0.160: j > classVar2
  @3 Prob. 0.059: j > 4
  @4 Prob. 0.052: j ++
  @5 Prob. 0.042: ! j
  Groundtruth: iarr [ j ] * - 1 + 4
  @1 Prob. 0.193: iarr + j
  @2 Prob. 0.161: iarr + 4
  @3 Prob. 0.013: iarr [ j ] * - 1 + 4
  @4 Prob. 0.005: iarr [ 1 ] * - 1 + 4
  @5 Prob. 0.005: iarr [ 1 ] * j + 4
  Groundtruth: ! b
  @1 Prob. 0.474: ! b
  @2 Prob. 0.079: b ++
  @3 Prob. 0.062: b > 4
  @4 Prob. 0.039: b > j
  @5 Prob. 0.027: b < iarr
  Groundtruth: j > 4
  @1 Prob. 0.300: j > 4
  @2 Prob. 0.066: j < iarr
  @3 Prob. 0.064: j > iarr
  @4 Prob. 0.042: 4 < j
  @5 Prob. 0.036: j < 4
  Groundtruth: 4 < classVar2
  @1 Prob. 0.156: classVar2 > 4
  @2 Prob. 0.136: 4 < classVar2
  @3 Prob. 0.056: 4 > 4
  @4 Prob. 0.050: classVar2 ++
  @5 Prob. 0.049: classVar2 < j
  Groundtruth: classVar2 ++
  @1 Prob. 0.442: classVar2 ++
  @2 Prob. 0.085: ! classVar2
  @3 Prob. 0.082: classVar2 > 4
  @4 Prob. 0.059: -- classVar2
  @5 Prob. 0.057: classVar2 > j
  Groundtruth: b ? 2 : i
  @1 Prob. 0.352: b ? 2 : i
  @2 Prob. 0.085: b ? 2 : - i
  @3 Prob. 0.082: b ? 2 : arr
  @4 Prob. 0.041: b ? 2 : 4
  @5 Prob. 0.026: b ? 2 : - 1
  Groundtruth: b ? 1 : - i
  @1 Prob. 0.276: b ? 1 : i
  @2 Prob. 0.114: b ? 1 : arr
  @3 Prob. 0.071: b ? 1 : - i
  @4 Prob. 0.045: b [ 1 ]
  @5 Prob. 0.029: b ? 1 : 4
  Num samples: 15 (15 before filtering)
  Avg Sample Perplexity: 1.39
  Std Sample Perplexity: 0.21
  Accuracy@1: 73.3333%
  Accuracy@5: 100.0000%
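
  note: a quick check of how the Accuracy@k figures follow from the listing above. This is only a sketch: the rank list is read off by hand from the 15 groundtruth/prediction groups, and `accuracy_at_k` is my own helper name, not something from utils/test.py.

```python
from typing import Sequence


def accuracy_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of samples whose ground truth is among the top-k predictions.

    `ranks` holds the 1-based position of the ground truth in each
    sample's ranked prediction list.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if 1 <= r <= k)
    return hits / len(ranks)


# Rank of the ground truth for each of the 15 samples, in listing order.
# Four samples miss at @1: arr [ 1 ] (rank 2), iarr [ j ] * - 1 + 4 (rank 3),
# 4 < classVar2 (rank 2), and b ? 1 : - i (rank 3).
ranks = [1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 1, 1, 3]

print(f"Accuracy@1: {accuracy_at_k(ranks, 1):.4%}")
print(f"Accuracy@5: {accuracy_at_k(ranks, 5):.4%}")
```

  11 of 15 samples have the ground truth at rank 1, which gives the reported 73.3333%; all 15 are within the top 5.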
[.] JSON DOM tree -> 1hot features

[.] JSON DOM tree -> adjacency M
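
  possible starting point for both TODO items above (1hot features and adjacency M). Assumptions, none of them taken from a real dataset: each JSON node is a dict with a "tag" string and an optional "children" list, the vocabulary is just the sorted set of tags seen, and the adjacency matrix is symmetric (parent-child edges in both directions). Plain Python lists, no numpy.

```python
from typing import Any, Dict, List, Sequence, Tuple

Node = Dict[str, Any]  # hypothetical schema: {"tag": str, "children": [Node, ...]}


def flatten(tree: Node) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Depth-first walk; returns node tags and (parent, child) index pairs."""
    tags: List[str] = []
    edges: List[Tuple[int, int]] = []
    stack: List[Tuple[Node, int]] = [(tree, -1)]
    while stack:
        node, parent = stack.pop()
        idx = len(tags)
        tags.append(node["tag"])
        if parent >= 0:
            edges.append((parent, idx))
        # Push children reversed so they are visited in document order.
        for child in reversed(node.get("children", [])):
            stack.append((child, idx))
    return tags, edges


def one_hot(tags: Sequence[str], vocab: Sequence[str]) -> List[List[int]]:
    """One row per node, one column per vocabulary entry."""
    index = {tag: i for i, tag in enumerate(vocab)}
    rows = []
    for tag in tags:
        row = [0] * len(vocab)
        row[index[tag]] = 1
        rows.append(row)
    return rows


def adjacency(n: int, edges: Sequence[Tuple[int, int]],
              symmetric: bool = True) -> List[List[int]]:
    """Dense n x n adjacency matrix built from (parent, child) pairs."""
    matrix = [[0] * n for _ in range(n)]
    for parent, child in edges:
        matrix[parent][child] = 1
        if symmetric:
            matrix[child][parent] = 1
    return matrix


dom = {"tag": "div", "children": [
    {"tag": "span"},
    {"tag": "button", "children": [{"tag": "span"}]},
]}
tags, edges = flatten(dom)
vocab = sorted(set(tags))
features = one_hot(tags, vocab)
adj = adjacency(len(tags), edges)
```

  For a GNN the edge list from `flatten` is probably the more useful form; the dense matrix is only practical for small pages.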

[2019-06-24 Mon]

Xiyang Chen [4:05 AM]
  After some thinking and surveying, I’m leaning towards a dual encoder-decoder setup as a possible backbone architecture for our unsupervised learning task, where the latent spaces z could be tied together via a loss (maybe a Wasserstein/optimal transport loss). This setup is also used in some work on multimodal tasks such as VQA, where one autoencoder handles the visual (image) part and the other handles the text part.
  For our case it would be something like one autoencoder for the HTML tree and another for the screenshot, maybe conditioned on the width of the window as well as desktop/mobile mode. More on this later. Meanwhile, let me know if you have any thoughts/opinions.
Xiyang Chen [4:12 AM]
  Another radically different (maybe related) approach is to just use a CNN on screenshots with auxiliary HTML DOM node info, with skip connections on the hierarchy. But this idea is still vague.
Sheng Jia [11:43 AM]
  Is there any work on generating the graph or tree structure (for the HTML autoencoder)?
  Or are we assuming a fixed structure for now?
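
  note to self: a toy version of the loss structure from the 4:05 AM message, to make the idea concrete. Everything here is an assumption for illustration: the encoders/decoders are plain linear maps, the inputs are tiny vectors, and the term tying the two latents together is a squared L2 distance used as a stand-in, not the Wasserstein/optimal transport loss that was suggested.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Vector) -> list:
    """Matrix-vector product, one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def dual_autoencoder_loss(x_dom: Vector, x_img: Vector,
                          enc_dom: Matrix, dec_dom: Matrix,
                          enc_img: Matrix, dec_img: Matrix,
                          align_weight: float = 1.0) -> float:
    """Two reconstruction terms plus a term pulling the latents together."""
    z_dom = linear(enc_dom, x_dom)          # latent for the HTML tree input
    z_img = linear(enc_img, x_img)          # latent for the screenshot input
    recon_dom = sq_dist(linear(dec_dom, z_dom), x_dom)
    recon_img = sq_dist(linear(dec_img, z_img), x_img)
    align = sq_dist(z_dom, z_img)           # stand-in for an OT/Wasserstein loss
    return recon_dom + recon_img + align_weight * align


# Toy inputs: a 2-d "DOM" vector and a 3-d "image" vector, 1-d latent.
x_dom, x_img = [1.0, 0.0], [0.0, 1.0, 0.0]
enc_dom, dec_dom = [[1.0, 0.0]], [[1.0], [0.0]]
enc_img, dec_img = [[0.0, 0.5, 0.0]], [[0.0], [1.0], [0.0]]

loss = dual_autoencoder_loss(x_dom, x_img, enc_dom, dec_dom,
                             enc_img, dec_img, align_weight=2.0)
```

  In a real setup the linear maps would be replaced by a tree/graph encoder and a CNN, and the alignment term by whatever distance between latent distributions ends up being chosen.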
[2019-06-28 Fri]

Traceback (most recent call last):
  File "exec.py", line 85, in <module>
    main_f(res_dir, settings, hparams_list, paths_list, prints_dict)
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/entries/q_template.py", line 224, in main
    num_atoms=qlearn_hs.get("num_atoms")
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/algorithms/qlearn.py", line 73, in multitask_train
    t_config.batch_device
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/actors/dqn_actor.py", line 93, in __init__
    self._raw_s_t = self._env.reset()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/env.py", line 36, in reset
    self._instance.force_stop()
  File "/home/will/DevAcademics/GraphNN/DOM-Q-NET/miniwob/instance.py", line 134, in force_stop
    self._driver.execute_script('return core.endEpisode(0);')
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: core is not defined
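
  note: "core is not defined" means the script ran in a page where the MiniWoB `core` object did not exist. One possible cause, not confirmed here, is that the task page had not loaded (or failed to load) when `force_stop` was called. A small guard like the one below could help tell the cases apart. It is only a sketch: `wait_for_js_global` is my own helper name, it relies only on the driver's `execute_script(script, *args)` method, and it assumes `core` is attached to `window`.

```python
import time


def wait_for_js_global(driver, name: str,
                       timeout: float = 10.0, poll: float = 0.25) -> bool:
    """Poll until window[name] is defined in the current page.

    Returns True as soon as the global exists, False if `timeout`
    seconds pass first. `driver` is anything exposing Selenium's
    execute_script(script, *args) method.
    """
    script = "return typeof window[arguments[0]] !== 'undefined';"
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(script, name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

  Usage idea: call `wait_for_js_global(self._driver, "core")` before `core.endEpisode(0)`. If it returns False, the page itself is the problem (wrong URL, blocked file access, driver/browser mismatch) rather than timing.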
https://chromedriver.storage.googleapis.com/index.html?path=75.0.3770.90/
docker commands

docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome-debug:3.6.0-bromine
docker run -d -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:3.141.59-radium
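
quick way to check that a container started by one of the commands above is accepting sessions, before pointing the RL code at it. Sketch only, standard library only. Assumptions: the server is reachable on localhost:4444, the status endpoint is /wd/hub/status, and the response JSON carries a `value.ready` flag (I believe the 3.141.59 image does; the older 3.6.0 image may not report it). `grid_is_ready` and `fetch_status` are my own helper names.

```python
import json
from urllib.request import urlopen


def grid_is_ready(status_json: str) -> bool:
    """True if the status payload reports value.ready == true."""
    payload = json.loads(status_json)
    value = payload.get("value", {})
    return bool(isinstance(value, dict) and value.get("ready", False))


def fetch_status(url: str = "http://localhost:4444/wd/hub/status",
                 timeout: float = 5.0) -> str:
    """Fetch the raw status JSON from a running Selenium server."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# With a container running:
#   print(grid_is_ready(fetch_status()))
```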
link dump

CHROMEDRIVER SELENIUM
  ChromeDriver - WebDriver for Chrome
  ChromeDriver · SeleniumHQ/selenium Wiki
  Selenium Documentation — Selenium Documentation
  Selenium using Python - Geckodriver executable needs to be in PATH - Stack Overflow
  SeleniumHQ/selenium: A browser automation framework and ecosystem.
  How to Setup Selenium with ChromeDriver on Ubuntu 18.04 & 16.04 – TecAdmin
  python - selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: File not found error invoking send_keys() using Selenium - Stack Overflow
  selenium (Session info: headless chrome=75.0.3770.100) - Google Search
  DevToolsActivePort file doesn’t exist. · Issue #46 · heroku/heroku-buildpack-google-chrome
  2470 - Chromedriver produces error when run via cron -> DevToolsActivePort file doesn’t exist - chromedriver - Monorail
  selenium - WebDriverException: unknown error: DevToolsActivePort file doesn’t exist while trying to initiate Chrome Browser - Stack Overflow
  python - I got error message while input string into selenium webdriver - Stack Overflow
  url encoding - How to urlencode a querystring in Python? - Stack Overflow
  How to convert a url string to safe characters with python? - Stack Overflow
  Live Coding: Selenium Browser Automation | DevDungeon
  SeleniumHQ/docker-selenium: Docker images for Selenium Grid Server (Standalone, Hub, and Nodes).
  Using Selenium-Server on Docker to run your Browser Tests - Meltwater Engineering Blog
  DevDungeon | Virtual Hackerspace
  python - urllib.urlencode: TypeError not a valid non-string sequence or mapping object - Stack Overflow
  selenium.common.exceptions — Selenium 3.14 documentation
  javascript - Error selenium.common.exceptions.JavascriptException: Message: ReferenceError: room is not defined - Stack Overflow
  selenium.webdriver.chrome.options.Options Python Example
  python - selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally with ChromeDriver Chrome and Selenium - Stack Overflow
  Version Selection - ChromeDriver - WebDriver for Chrome
  Autograd: Automatic Differentiation — PyTorch Tutorials 1.1.0 documentation
  [[https://arxiv.org/abs/1810.10531v1?utm_content=buffer48508&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer][[1810.10531v1] A mathematical theory of semantic development in deep neural networks]]
DOCKER
  Docker Selenium. Getting Started - YouTube
  selenium webdriver docker standalone - YouTube
  How To Run Your Selenium Tests Headlessly in Docker - Chris Kenst
  HN Search powered by Algolia
  Show HN: Python script to automate filling out Google form using Selenium | Hacker News
  vedipen/AutomateGoogleForm: Python script for automating Google form filling.
  Automate the Boring Stuff with Python
  Learn Selenium - Best Selenium Tutorials (Ranked) | Hackr.io
  7 Must-read Selenium Tutorials - Applitools Blog
  christian-bromann/awesome-selenium: A curated list of delightful Selenium resources.
  Reviews of ‘TestNG Tutorials for Selenium Webdriver’ for learning Selenium | Hackr.io
  selenium
  blog/How to use selenium with docker.md at 512d4b69242af09af7dd1f83261e1bc661ec1d23 · Windsooon/blog
  7. WebDriver API — Selenium Python Bindings 2 documentation
  Getting Started with Hub and Nodes · SeleniumHQ/docker-selenium Wiki
  Upgrade chromedriver to 2.32 · Issue #560 · SeleniumHQ/docker-selenium
  Selenium Standalone v.3.141.59
  2. Getting Started — Selenium Python Bindings 2 documentation
  Chrome Options & Desiredcapabilities: AdBlocker, Incognito, Headless
  Headless chrome is not working in the docker · Issue #520 · SeleniumHQ/docker-selenium
  SchulteMarkus/selenium-standalone-chrome-spring-boot-demo: Demonstrating “selenium/standalone-chrome” in a Spring Boot project
  WebDriver Hub
HN Search powered by Algolia
  Free Hotel Wifi with Python and Selenium | Hacker News
  Free Hotel Wifi with Python and Selenium · Gokberk Yaltirakli
  Selenium 2.0: Out Now | Hacker News
  How to make Selenium tests reliable, scalable, and maintainable | Hacker News
  Selenium: 7 Things You Need To Know - Lucidchart
  Consistent Selenium Testing in Python | Hacker News
  Hacker News
  Hacker News
  How to Scrape Web using Python, Selenium and Beautiful Soup · Swetha’s Blog
  Hacker News
  weskerfoot/DeleteFB: Automate Scrubbing your Facebook Presence
  Hacker News
  Hacker News
  Selenium With Headless Chrome On Travis CI
  GUI and Headless Browser Testing - Travis CI
  Hacker News
  Running Headless Selenium with Chrome
  Hacker News
  How To Install Node.js on Ubuntu 18.04 | DigitalOcean
[.] Meet : Restart [2019-10-12 Sat]


  latex doc Representation learning of HTML at large scale and its applications to downstream tasks - Online LaTeX Editor Overleaf
  3rd part experiments
  refs
  model of DOMs. OUTPUT:
    
      (automated) Web navigation
      Classification (easiest)
      HTML generation
    
  
  Sheng: which areas are clickable (for RL)? filter out what is clickable or not
    
      test.ai is classifying buttons
        
          QA/test automation
          pixelwise presentation screen (images),
        
      
      action space factorization
        
          Xiyang interested in parsing part
        
      
      only one cite
      using GCN for parse (classification, etc, transductive not inductive – all nodes need to be known ahead of time)
        
          closest alt is molecular generation (aisc talk)
        
      
  some small task
  like an autoencoder?
    
      input simplest: RNN – any length
    
  
  2 AEs
    
      html
      images
      somehow regularize 2 codes so they somehow relate to each other
        
          2-4 papers, image labelling
        
      
      graphNN for encode/decode html
        
          alts: transformer (150 tokens max / gpu)
          gae, graphsage
        
      
      image AE
        
          conv
        
      
  transformer decoding is sequential
  sketch to code
  Turning Design Mockups Into Code With Deep Learning
    
      LSTM can do it, but slow, not scale. we want scale
      transformer has hierarchy rep, representational power he thinks better than LSTM
      but MS active with graphs for code generation
    
  
  so we know works for RNNs, good reason to think it will work for Transformer, graphNN proof from MS work.
  How to deal with in page images?
    
      also text
    
  
[.] jobs

first two priority:

  literature review
  crawling 1000-2000 pages for data
    
      sketch website has a simple dataset
    
  
  then small experiment for theoretical, use data we crawled
  optimal transport – make 2 distrib as close as possible while making differentiable.
  see if any repos of the papers
  DOM tree into form good for DNNs

[.] review other latest, relevant papers

normalizing adj code

gcn-repo-normalizefn
def normalize_adj(adj):
    """Symmetrically normalize adjacency matrix."""
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1))
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
pygcn-repo-normalefn
def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx
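To see how the two normalizations differ, here is a small dense sketch in plain Python (nested lists instead of scipy sparse matrices, which is my simplification). It computes the symmetric form D^-1/2 A D^-1/2 and the row form D^-1 A on a 3-node path graph with self-loops.

```python
def degree(adj):
    """Row sums of a dense adjacency matrix."""
    return [sum(row) for row in adj]


def normalize_sym(adj):
    """Symmetric normalization: D^-1/2 A D^-1/2."""
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in degree(adj)]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def normalize_row(adj):
    """Row normalization: D^-1 A, so every non-empty row sums to 1."""
    return [[a / d if d > 0 else 0.0 for a in row]
            for row, d in zip(adj, degree(adj))]


# Path graph 0 - 1 - 2 with self-loops already added (A + I)
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
sym = normalize_sym(adj)
row = normalize_row(adj)
```

The symmetric version keeps the matrix symmetric (entry (0,1) equals entry (1,0)), while the row version turns each row into an average over neighbours and is generally not symmetric.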
neural-dom meet 2 [2019-10-19 Sat]


  two new papers:
    
      generalized zero and few-shot paper talks motivation and history of why
        
          refs their old paper from ICLR
          poster for this one
        
      
      pluralistic image completion
    
  
  encoder: lstm, transformer, graphnn?
  decoder:
    
      generating code in parallel – we want it to scale
    
  
  anything mentions tree or hierarchy
  molecule generation via graphnn – heavy constraints though

[2019-10-20 Sun]


  cloned GRAN repo here ~/DevAcademics/GraphNN/GRAN
    
      pdf-link
    
  
paper
Efficient Graph Generation with Graph Recurrent Attention Networks
  ~/Zotero/storage/J484VNAT/Liao et al. - 2019 - Efficient Graph Generation with Graph Recurrent At.pdf
Skeleton
Link on page 1: t deep graph generative model that can scale to this size. Our code is released at: https://github.com/lrjconan/GRAN
  1 Introduction
  Notes for page 2
  2 Model
  2.1 Representation of Graphs and the Generation Process
  2.2 Graph Recurrent Attention Networks
  2.3 Learning with Families of Canonical Orderings
  3 Related Work
  4 Experiments
  4.1 Dataset and Evaluation Metrics
  4.2 Benchmarking Sample Quality
  4.3 Efficiency vs. Sample Quality
  4.4 Ablation Study
  5 Conclusion
  6 Appendix
  6.1 K-Core Node Ordering
  6.2 Lobster Graphs
  6.3 Full Ablation Study & Visual Examples
AISC Workshops

NLP workshop

AISC State of NLP in 2019

  DEALS
    
      To show our appreciation for your patience, we would like to offer you a 50% off discount for any of our workshops you register until the end of September. Code: nlpjun27
      cheap I suggest you to use GCP ($ 400 of free credit ) and use the image provided by fast.ai to do that because it comes with all of required packages (hopefully) installed. Here is a guide how to do it: https://course.fast.ai/start_gcp.html Then download the notebook from google CoLab and upload it on your GPU VM on GCP.
    
  
  stoi = string to identifier
  itos = id to string
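A minimal sketch of the two lookups, in plain Python rather than torchtext. `build_vocab` and the special tokens are my own illustrative choices; torchtext's vocab object exposes the same two names as attributes.

```python
def build_vocab(tokens, specials=("<unk>", "<pad>")):
    """Return (stoi, itos): token -> id dict and id -> token list."""
    itos = list(specials)
    for tok in tokens:
        if tok not in itos:
            itos.append(tok)
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos


stoi, itos = build_vocab("the cat sat on the mat".split())

# Encode a sentence; unknown words fall back to the <unk> id
ids = [stoi.get(tok, stoi["<unk>"]) for tok in "the dog sat".split()]
# Decode back to strings
decoded = [itos[i] for i in ids]
```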

NLP workshop #1

[2019-06-27 Thu]

  logistic (sigmoid) activations get worse in deeper nets: vanishing/exploding gradients problem, hence ReLU
  NLP predict next word
  Viterbi algo
  RNN drawbacks, hard to compete against transformers
  Part II
  Huggingface BERT
  stanford QA set
  Huggingface good
  3 levels tokenization - words, etc
  autoencoding: representation learning
  neural network methods in language tel aviv
  Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze - older
    
      matrix factorization
    
  
  Jurafsky & Martin - Speech and Language Processing

NLP workshop #3

BERT is just the encoder part of the transformer,
  trained with different tasks
for unbalanced data
from aisc/nlp-workshop posts july6
  Passing the weights to CrossEntropyLoss correctly - PyTorch Forums
  with this method weightings are applied to the calculations to give a different weighting for output classes.
ufoym/imbalanced-dataset-sampler: A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.
  this looks like a resampler like smote, rose.
from sampler import ImbalancedDatasetSampler

train_loader = torch.utils.data.DataLoader(
    train_dataset, 
    sampler=ImbalancedDatasetSampler(train_dataset),  # this is the module
    batch_size=args.batch_size, 
    **kwargs
)
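The other option from the forum post, class weights in the loss, can be sketched without PyTorch. As I understand the docs, `nn.CrossEntropyLoss(weight=...)` with the default mean reduction divides by the sum of the weights of the targets in the batch; the code below mirrors that. The inverse-frequency rule in `inverse_frequency_weights` is my own choice of weighting, not something from the post.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """sum_i w[y_i] * (-log p_i[y_i]) / sum_i w[y_i]"""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        w = class_weights[y]
        num += w * -log_softmax(logits)[y]
        den += w
    return num / den


def inverse_frequency_weights(targets, num_classes):
    """Weight each class by n / (num_classes * count), rare classes weigh more."""
    n = len(targets)
    counts = [targets.count(c) for c in range(num_classes)]
    return [n / (num_classes * c) if c else 0.0 for c in counts]


targets = [0, 0, 0, 1]                      # class 1 is the rare one
weights = inverse_frequency_weights(targets, num_classes=2)
```

Unlike the sampler, this leaves the batches unbalanced and only changes how much each example contributes to the gradient.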
this is about label-smoothing loss
  With label smoothing,
  KL-divergence between q_{smoothed ground truth prob.}(w)
  p_{prob. computed by model}(w) is minimized.
  label smoothing reduces onehot targets from (0,1) to numbers representing some uncertainty ie (0.1, 0.9). Example is given where this is used with dirty data when some instances are mis-labeled. Both frameworks have args for this.
  OpenNMT-py/loss.py at e8622eb5c6117269bb3accd8eb6f66282b5e67d9 · OpenNMT/OpenNMT-py
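A small worked version of the smoothing step, assuming the simplest scheme where the leftover mass `eps` is spread evenly over the other classes (the OpenNMT code may spread it differently). With 2 classes and eps = 0.1 it reproduces the (0.1, 0.9) example above.

```python
import math


def smooth_targets(true_class, num_classes, eps=0.1):
    """One-hot target with eps of the mass spread over the other classes."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if k == true_class else off for k in range(num_classes)]


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions given as lists."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


q = smooth_targets(true_class=1, num_classes=2, eps=0.1)   # [0.1, 0.9]
loss_confident = kl_divergence(q, [0.001, 0.999])  # model is over-confident
loss_matched = kl_divergence(q, [0.1, 0.9])        # model matches the smoothed target
```

The loss is zero only when the model's distribution equals the smoothed target, so pushing the true-class probability all the way to 1 is penalized.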
[x] NLP ASGN 2

Assignment 2:
  the second assignment aims to use the encoder of the transformer architecture to tackle the QUORA DEDUPLICATION task that you worked on during the first assignment. Can you improve the performance of your model using the transformer architecture?
  Remember you can only swap the MODEL part of your old code with the transformer encoder and change the relevant parameters (input size and alike). To that end you can use BERT’s Encoder or the code in the study material above (the latter I recommend).
Transformer II - version from medium page

Embedding

class Embedder(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
    def forward(self, x):
        return self.embed(x)
positional encoding

class PositionalEncoder(nn.Module):
    def __init__(self, d_model, max_seq_len = 80):
        super().__init__()
        self.d_model = d_model
        
        # create constant 'pe' matrix with values dependant on 
        # pos and i
        pe = torch.zeros(max_seq_len, d_model)
        for pos in range(max_seq_len):
            for i in range(0, d_model, 2):
                pe[pos, i] = \
                math.sin(pos / (10000 ** ((2 * i)/d_model)))
                pe[pos, i + 1] = \
                math.cos(pos / (10000 ** ((2 * (i + 1))/d_model)))
                
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)
 
    
    def forward(self, x):
        # make embeddings relatively larger
        x = x * math.sqrt(self.d_model)
        #add constant to embedding
        seq_len = x.size(1)
        # 'pe' is a registered buffer, so it moves with the module and needs no grad
        x = x + self.pe[:, :seq_len]
        return x
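For comparison, a plain-Python sketch of the encoding as written in the "Attention Is All You Need" paper: PE(pos, 2k) = sin(pos / 10000^(2k/d_model)) and PE(pos, 2k+1) = cos of the same angle. In the loop above `i` already steps by 2, so as far as I can tell the `2 * i` there doubles the exponent and the cosine gets a slightly different frequency than its sine partner. This version uses `i / d_model` and shares one angle per pair.

```python
import math


def positional_encoding(max_seq_len, d_model):
    """Sinusoidal position table as nested lists, shape [max_seq_len][d_model]."""
    pe = [[0.0] * d_model for _ in range(max_seq_len)]
    for pos in range(max_seq_len):
        for i in range(0, d_model, 2):          # i = 2k, the even dimension index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = positional_encoding(max_seq_len=3, d_model=4)
```

Position 0 always comes out as alternating 0 and 1, which makes a quick sanity check.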
masks

batch = next(iter(train_iter))
input_seq = batch.English.transpose(0,1)
input_pad = EN_TEXT.vocab.stoi['<pad>']
# creates mask with 0s wherever there is padding in the input
input_msk = (input_seq != input_pad).unsqueeze(1)
for the target_seq we do the same, but then add an additional step
# create mask as before
target_seq = batch.French.transpose(0,1)
target_pad = FR_TEXT.vocab.stoi['<pad>']
target_msk = (target_seq != target_pad).unsqueeze(1)
size = target_seq.size(1) # get seq_len for matrix
# np.ones takes the shape as a single tuple
nopeak_mask = np.triu(np.ones((1, size, size)),
k=1).astype('uint8')
nopeak_mask = Variable(torch.from_numpy(nopeak_mask) == 0)
target_msk = target_msk & nopeak_mask
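What the combined target mask looks like for one short sequence, as a plain-Python sketch (booleans in nested lists instead of tensors; the helper names are mine). Entry [i][j] is True when position i may attend to position j.

```python
def padding_mask(seq, pad_id):
    """True for real tokens, False for padding."""
    return [tok != pad_id for tok in seq]


def no_peek_mask(size):
    """Lower-triangular mask: position i may see positions 0..i only."""
    return [[j <= i for j in range(size)] for i in range(size)]


def target_mask(seq, pad_id):
    """Combine both: allowed only if j is not padding and j <= i."""
    pad = padding_mask(seq, pad_id)
    causal = no_peek_mask(len(seq))
    n = len(seq)
    return [[causal[i][j] and pad[j] for j in range(n)] for i in range(n)]


# Token ids 5 and 7 followed by one pad token (pad id 1)
mask = target_mask([5, 7, 1], pad_id=1)
```

This is the same broadcast as `target_msk & nopeak_mask`: the padding mask varies along the key axis only, the no-peek mask along both.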
MHA

class MultiHeadAttention(nn.Module):
    def __init__(self, heads, d_model, dropout = 0.1):
        super().__init__()
        
        self.d_model = d_model
        self.d_k = d_model // heads
        self.h = heads
        
        self.q_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, d_model)
    
    def forward(self, q, k, v, mask=None):
        
        bs = q.size(0)
        
        # perform linear operation and split into h heads
        
        k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
        q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
        v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
        
        # transpose to get dimensions bs * h * sl * d_k
       
        k = k.transpose(1,2)
        q = q.transpose(1,2)
        v = v.transpose(1,2)

        # calculate attention using function we will define next
        scores = attention(q, k, v, self.d_k, mask, self.dropout)
        
        # concatenate heads and put through final linear layer
        concat = scores.transpose(1,2).contiguous()\
        .view(bs, -1, self.d_model)
        
        output = self.out(concat)
        return output
    
def attention(q, k, v, d_k, mask=None, dropout=None):
    
    scores = torch.matmul(q, k.transpose(-2, -1)) /  math.sqrt(d_k)

    if mask is not None:
        mask = mask.unsqueeze(1)
        scores = scores.masked_fill(mask == 0, -1e9)

    scores = F.softmax(scores, dim=-1)
    
    if dropout is not None:
        scores = dropout(scores)
        
    output = torch.matmul(scores, v)
    return output
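The same scaled dot-product attention for a single head, written out in plain Python so the steps are visible: scores = q·k / sqrt(d_k), masked positions set to -1e9, softmax over keys, then a weighted sum of the values. A sketch for tiny inputs only, with no batching or dropout.

```python
import math


def softmax(row):
    """Numerically stable softmax of one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    """q, k: [seq][d_k]; v: [seq][d_v]; mask[i][j] False means 'hide j from i'."""
    d_k = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        if mask is not None:
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal = [[True, False], [True, True]]
out = attention(q, k, v, mask=causal)
```

With the causal mask the first position can only copy its own value row, while the second mixes both rows and leans toward its own.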
FFNN

class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff=2048, dropout = 0.1):
        super().__init__() 
        # We set d_ff as a default to 2048
        self.linear_1 = nn.Linear(d_model, d_ff)
        self.dropout = nn.Dropout(dropout)
        self.linear_2 = nn.Linear(d_ff, d_model)
    def forward(self, x):
        x = self.dropout(F.relu(self.linear_1(x)))
        x = self.linear_2(x)
        return x
normalization

class Norm(nn.Module):
    def __init__(self, d_model, eps = 1e-6):
        super().__init__()
    
        self.size = d_model
        # create two learnable parameters to calibrate normalisation
        self.alpha = nn.Parameter(torch.ones(self.size))
        self.bias = nn.Parameter(torch.zeros(self.size))
        self.eps = eps
    def forward(self, x):
        norm = self.alpha * (x - x.mean(dim=-1, keepdim=True)) \
        / (x.std(dim=-1, keepdim=True) + self.eps) + self.bias
        return norm
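A hand-computed version of this `Norm` for one vector. Two details worth noting: `x.std()` in PyTorch uses the unbiased estimate (divides by n - 1) by default, and here eps is added to the standard deviation, whereas the built-in `nn.LayerNorm` uses the biased variance with eps inside the square root. The sketch follows the class above, not `nn.LayerNorm`.

```python
import math


def layer_norm(x, alpha=None, bias=None, eps=1e-6):
    """alpha * (x - mean) / (std + eps) + bias over one feature vector."""
    n = len(x)
    alpha = alpha or [1.0] * n
    bias = bias or [0.0] * n
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)   # unbiased, like torch.std
    std = math.sqrt(var)
    return [a * (v - mean) / (std + eps) + b for v, a, b in zip(x, alpha, bias)]


normed = layer_norm([1.0, 2.0, 3.0])   # mean 2, std 1
```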
E D layers

# build an encoder layer with one multi-head attention layer and one
# feed-forward layer
class EncoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout = 0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.attn = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model)
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        
    def forward(self, x, mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn(x2,x2,x2,mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.ff(x2))
        return x
    
# build a decoder layer with two multi-head attention layers and
# one feed-forward layer
class DecoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout=0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.norm_3 = Norm(d_model)
        
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        self.dropout_3 = nn.Dropout(dropout)
        
        self.attn_1 = MultiHeadAttention(heads, d_model)
        self.attn_2 = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model)

    def forward(self, x, e_outputs, src_mask, trg_mask):
        x2 = self.norm_1(x)
        x = x + self.dropout_1(self.attn_1(x2, x2, x2, trg_mask))
        x2 = self.norm_2(x)
        x = x + self.dropout_2(self.attn_2(x2, e_outputs, e_outputs,
        src_mask))
        x2 = self.norm_3(x)
        x = x + self.dropout_3(self.ff(x2))
        return x
# We can then build a convenient cloning function that can generate multiple layers:
def get_clones(module, N):
    return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
E & D

class Encoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(EncoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
    def forward(self, src, mask):
        x = self.embed(src)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, mask)
        return self.norm(x)
    
class Decoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(DecoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
    def forward(self, trg, e_outputs, src_mask, trg_mask):
        x = self.embed(trg)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, e_outputs, src_mask, trg_mask)
        return self.norm(x)
transformer

class Transformer(nn.Module):
    def __init__(self, src_vocab, trg_vocab, d_model, N, heads):
        super().__init__()
        self.encoder = Encoder(src_vocab, d_model, N, heads)
        self.decoder = Decoder(trg_vocab, d_model, N, heads)
        self.out = nn.Linear(d_model, trg_vocab)
    def forward(self, src, trg, src_mask, trg_mask):
        e_outputs = self.encoder(src, src_mask)
        d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
        output = self.out(d_output)
        return output
# we don't perform softmax on the output as this will be handled 
# automatically by our loss function
[x] NLP ASGN 3
see A3.org
[2019-07-11 Thu]

running the transformer article repo on gpgpu, needed to do following (env ‘main’)
  817  conda install -c derickl torchtext
  827  conda install dill
  832  python -m spacy download en
  833  python -m spacy download fr
  829  python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr
  start time [2019-07-11 Thu] 09:52pm
[2019-07-12 Fri]


  I moved opt.train to cuda, check on gpgpu later

[2019-07-26 Fri] going thru slack posts

if low VRAM: reduce batch size and number of encoder layers
Wen Ho encoder model
EncoderWrapper(
  (encoder): Encoder(
    (embed): Embedder(
      (embed): Embedding(85519, 256)
    )
    (pe): PositionalEncoder(
      (dropout): Dropout(p=0.3)
    )
    (layers): ModuleList(
      (0): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (1): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (2): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (3): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (4): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
      (5): EncoderLayer(
        (norm_1): Norm()
        (norm_2): Norm()
        (attn): MultiHeadAttention(
          (q_linear): Linear(in_features=256, out_features=256, bias=True)
          (v_linear): Linear(in_features=256, out_features=256, bias=True)
          (k_linear): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.3)
          (out): Linear(in_features=256, out_features=256, bias=True)
        )
        (ff): FeedForward(
          (linear_1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.3)
          (linear_2): Linear(in_features=2048, out_features=256, bias=True)
        )
        (dropout_1): Dropout(p=0.3)
        (dropout_2): Dropout(p=0.3)
      )
    )
    (norm): Norm()
  )
  (out): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)
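A rough parameter count for the printed model, to see where the memory goes. Assumptions: each `Norm` holds two vectors of size d_model (alpha and bias, as in the Norm class earlier in these notes) and the positional encoder has no learnable parameters. The function name is mine.

```python
def linear_params(n_in, n_out):
    """Weights plus bias of one Linear layer."""
    return n_in * n_out + n_out


def encoder_param_count(vocab_size, d_model, n_layers, d_ff=2048):
    """Parameter counts for the EncoderWrapper layout printed above."""
    embed = vocab_size * d_model
    attn = 4 * linear_params(d_model, d_model)          # q, k, v, out
    ff = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    norms = 2 * (2 * d_model)                           # norm_1 and norm_2
    per_layer = attn + ff + norms
    final_norm = 2 * d_model
    head = linear_params(d_model, 1)                    # (out): Linear(256, 1)
    total = embed + n_layers * per_layer + final_norm + head
    return {"embedding": embed, "per_layer": per_layer, "total": total}


counts = encoder_param_count(vocab_size=85519, d_model=256, n_layers=6)
```

Under these assumptions the embedding table alone is roughly three quarters of the parameters, so dropping encoder layers saves about 1.3M parameters each while a smaller vocabulary or d_model saves far more. Activation memory, which depends on batch size and sequence length, is not counted here.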

hyper params:

  wen: Configuration of the Encoder net: hidden dimension = 256, N=6 and heads = 8
  Tracy Pham: My last Linear layer has `in_features = (max_seq_length * d_model)`. in your model, `in_features=256` = d_model?
  Got the correct dimension of the last linear layer, `in_features = (max_seq_length * d_model)`
  Now accuracy is at ~.78 with `d_model=512`, `N=1`, `heads=2`, and `batch_size=50`.
    
      but d_model should be embedding dimension by reading the example repo code
    
  
werner on imports for that transformer post
import torch
import torch.nn as nn
from torch.autograd import Variable

import spacy
import torchtext
from torchtext import data
from torchtext.data import Field, BucketIterator, TabularDataset

from sklearn.model_selection import train_test_split

# from Batch import MyIterator, batch_size_fn
# from Tokenize import tokenize

import os

import numpy as np
import pandas as pd

import math
import copy

import torch.nn.functional as F
import time
AISC Math of DL Workshops [5/5]


  Headline Time 
  Total time 4:48 
  \_  AISC Math of DL Workshops [4/5] 4:48

TODOs

[x] send note on ticket payment for refund

Time schedule

1st TA Working Session, Introduction	30
  1st Notebook preparation	45
  2nd Notebook preparation + related work	240
  visualization search
  cleanup drive, files, cleanup notebooks
  time on slack with students – unknown

  Willy Rempel Jul 17, 2019 1st TA Working Session, Introduction 30
  Willy Rempel Jul 16, 2019 1st Notebook preparation 45
  Willy Rempel Jul 24, 2019 2nd Notebook preparation + related work? 80
  Willy Rempel Jul 25, 2019 ” ” 160
  315 / 60 = 5.25 hours 315
  9 hours class 
  14 * 14.25 = 199.50 14.25 

[x] 3rd prep meet


  issues people had:
    
      layers in convnet
      multiple layers
      imagenary nums
    
  
  how better handson?
    
      issue: not much time
    
  
@Willy Rempel will work on selecting some visualizations that help people understand layers in convnet and optimization etc

  conv net example to supplement
  2nd section autograd
  when we come back – optimizers example walkthrough


  handson notebook: – go through cells.
    
      random code and comments so they have to edit
      simple english. ‘this code is a breaker, you need to fix below to continue ’
      for notebook – Amir has shared link and in slides for content
    
  
Amir F – draft document on 3rd assignment: writing the blog post
  if we have any ideas on it please feedback
Amir H. class walkthrough


  overall: what grad is, how it propagates
    
      exp with autograd
      solving a problem GD
      putting together
        
          linear regression example ?
        
      
  visualizations
    
      convnet visualization
      autograd, what it means and code
    
  
  solve problem with GD
    
      handson
      2nd visualization at the end, optimization
    
  
  pytorch that contains all the concepts
    
      datascience blog post
    
  
[x] autograd slide 40
Hands-on: autograd
import torch
# Create a rank-2 tensor of all ones
x = torch.ones(2, 2, requires_grad=True)
print(x)
# Define y to be a function of x
y = x+2
# And z to be a function of y (and hence x):
z = 3*y*y
out = z.mean()
print(z, out)
# Now backprop:
out.backward()
# print gradients d(out)/dx
print(x.grad)
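The gradient can be checked by hand: out = mean(3 (x + 2)^2) over 4 elements, so d(out)/dx_i = 6 (x_i + 2) / 4, which is 4.5 at x_i = 1. A small finite-difference check in plain Python (no autograd) confirms the same number; the helper names are mine.

```python
def out_fn(x):
    """Same function as the slide: mean of 3 * (x + 2)^2 over all elements."""
    return sum(3 * (v + 2) ** 2 for v in x) / len(x)


def numerical_grad(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


x = [1.0] * 4                      # the 2x2 tensor of ones, flattened
grads = numerical_grad(out_fn, x)
```

Each entry of `grads` should agree with the corresponding entry of `x.grad` up to floating-point error.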
prelim meet


  1 TA for wed sesh
  different levels of engagement for asgns
  grading?
  students ready and have access to material
    
      examples codes into colab notebooks
    
  
  prelim content
    
      read this paper&code dynamic deep networks for retinal vessel segmentation sraashis/ature
      1st session
        
          1 neuron in pytorch
            
              affine maps
              tensors
              non-linear
              parameters
              linear algebra
            
          
          quick intro to pytorch
            
              define tensor in pytorch
              fill pytorch with a certain scalar
              fill a pytorch tensor with rands
              find a min value of a pytorch tensor
              simple imports
              PyTorch beginner tutorial
              convert a py list to a pytorch tensor and vice versa
              tensors and scalars
              everything in numpy -> do in pytorch
              transpose
              dot products
            
          
          exercise: transpose images
            
              matrix manipulation
              matrix determinant
            
          
          eigenvalues
            
              then hands-on for module ii
            
          
          non-linearities, activation functions
            
              types and what we use in  pytorch
              hands on: activation fns
              use prior image and apply activation fns to it
            
          
          loss fns
            
              where in pytorch
            
          
Notes

ws-math-dl-all
  ws-math-dl-breakout-1
  ws-math-dl-breakout-2
  ws-math-dl-breakout-3
  Breakout Room x - TA Willy Rempel
my online group
  Ridwan A
  Jen L
  Motasem
  Vikash
Working session 1

This is a working session with one of the TAs where you can ask questions about the workshop, set up, and hands on parts. We will spend the first session making sure everyone has the info they need. The booking is for 2 hours but will most probably end earlier.
Workshop 1

We will cover part 1 of chapter 2 Except 2.12. Well, most of it. And the rest will be given as a reading assignment.
Working session 2

my hangout:
  https://hangouts.google.com/call/xkxqyaNdzAx8t1qGUy8pAEEE
  ask people to introduce themselves
fastai has awesome library for preprocessing image data
  5 days old
gated convnets
fast.ai recommend
import imageio
# image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
image = imageio.imread("Veins.png", as_gray=True)
image = imageio.imread("Veins.png", format='PNG', as_gray=True)

image = imageio.mimread('Veins.png', as_gray=True)

# image = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)
import matplotlib.pyplot as plt
from scipy import ndimage  # provides the sobel filter used below
fig = plt.figure(); plt.gray()  # show the filtered result in grayscale
ax1 = fig.add_subplot(121)  # left side
ax2 = fig.add_subplot(122)  # right side
result = ndimage.sobel(image)
ax1.imshow(image)
ax2.imshow(result)
plt.show()

# original
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"

# werner
!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O "Veins.png"

!curl "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" > Veins.png
!wget "https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" 
!wget https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb -O "Veins.png"
img = imageio.imread("https://drive.google.com/open?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb", as_gray=True)

!wget "https://drive.google.com/uc?id=1DeAk2H22KadwmVmLshtbll6K-NkO5Vwb" -O 'Veins.jpg'
Workshop 2

Workshop 3

[x] prep notebook 1

Mathematics of DL

Authors: Amir Hajian
  Presenter: [name]
  Facilitators: [names]
July 2019
Outline
  What we will learn in these 9 hours?
  How can I get the most out of it?
  How to do deep learning in 2019?
2
What to aim for
  You should be able to follow this work by the end of the workshop
  3
Recap from last session
  4
Dissecting A DL Architecture
  5
An artificial neuron
  6
Simplifying the notation
  It’s all about matrices, vectors and exploring the parameter space to find the right parameters!
Linear Algebra: Tensors and Scalars
8
Getting started with PyTorch:

What is PyTorch
  How to import it
Exercises with Tensors and Scalars
  9
  Define a tensor in PyTorch
  Fill a PyTorch tensor with a certain scalar
  Fill a PyTorch tensor with random numbers
  Find minimum value of a PyTorch tensor
  Convert a Py list to a PyTorch tensor and vice versa
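A minimal sketch of the tensor exercises listed above. The variable names are only illustrative, and the seed is an arbitrary choice to make the random fill repeatable.

```python
import torch

# Define a tensor directly from values.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Fill a tensor with a single scalar.
sevens = torch.full((2, 3), 7.0)

# Fill a tensor with random numbers (standard normal).
torch.manual_seed(0)
noise = torch.randn(2, 3)

# Minimum value of a tensor, pulled out as a Python float.
smallest = x.min().item()

# Python list -> tensor -> Python list.
as_tensor = torch.tensor([1, 2, 3])
as_list = as_tensor.tolist()
```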
Linear Algebra: Matrix Transpose
  torch.t()
  10
  data = torch.randn(200,250)
  data[100:120,:]=0.5
  imshow(data)
imshow(data.t())
Linear Algebra: Dot Product
  torch.matmul(a, b)
Linear Algebra: Matrix Multiplication
  torch.matmul(M1, M2)
  14
  data = torch.randn(5)
  torch.matmul(data,data)
data = torch.randn(2,5)
  torch.matmul(data,data.t())
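A small sketch with fixed numbers, so the results of torch.matmul can be checked by hand. The values are made up for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# 1-D with 1-D: matmul is the dot product, 1*4 + 2*5 + 3*6 = 32.
dot = torch.matmul(a, b)

# 2-D with 2-D: ordinary matrix multiplication, (2,3) x (3,2) -> (2,2).
M = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
gram = torch.matmul(M, M.t())
```

Each entry of gram is the dot product of two rows of M, which is why the inner dimensions have to match.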
Exercise: Transpose images
  Simulated Data:
  Create a random 2D matrix with dimensions 200x250, set columns 100:120 to zero, display it, transpose the matrix, display it again.
Real image data:
  Read the image provided to you, display it, transpose it, and display it again.
  15
  data = torch.randn(200,250)
  data[100:120,:]=0.5
  imshow(data)
imshow(data.t())
Linear Algebra: Matrix Determinant
16
  data = torch.randn(2,2)
  torch.det(data)
Linear Algebra: Eigenvalues
17
  data = torch.randn(2,2)
  torch.linalg.eigvals(data)  # eigenvalues, complex in general (older PyTorch releases used torch.eig)
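A sketch tying the determinant and eigenvalue slides together: the determinant equals the product of the eigenvalues. It assumes a PyTorch version that ships torch.linalg, and it uses a hand-picked symmetric matrix so the eigenvalues are real.

```python
import torch

# Symmetric 2x2 matrix; its eigenvalues are 1 and 3.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 2.0]])

# eigvalsh is for symmetric/Hermitian input and returns real
# eigenvalues in ascending order.
eigenvalues = torch.linalg.eigvalsh(A)

# det(A) = 2*2 - 1*1 = 3, the product of the eigenvalues.
det = torch.det(A)
```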
Non-linearities

18
Non-linearities, activation functions
  Types of activation functions and what we use in PyTorch
  Affine Maps:
  f(x)=Ax+b
PyTorch way:
  lin = nn.Linear(5, 3)
  data = torch.randn(2, 5)
  lin(data)
  19
  We’ll do the first example from here: https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  import torch.optim as optim
torch.manual_seed(1)
lin = nn.Linear(5, 3)  # maps from R^5 to R^3, parameters A, b
data = torch.randn(2, 5)
  print(lin(data))  # yes
Non-linearities:

Types of activation functions and what we use in PyTorch
  Non-linearities
  f(x) = Ax+b
  g(x) = Cx+d
  f(g(x)) = A(Cx+d)+b
  = ACx + ( Ad + b)
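The algebra above can be checked numerically. This is a sketch with arbitrary layer sizes: two stacked nn.Linear layers with nothing in between give the same output as the single affine map ACx + (Ad + b).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
g = nn.Linear(5, 4)   # g(x) = Cx + d
f = nn.Linear(4, 3)   # f(x) = Ax + b
x = torch.randn(2, 5)

with torch.no_grad():
    stacked = f(g(x))

    # Collapse the two layers into one affine map.
    AC = f.weight @ g.weight                 # (3,4) @ (4,5) -> (3,5)
    Ad_plus_b = f.weight @ g.bias + f.bias   # shape (3,)
    collapsed = x @ AC.t() + Ad_plus_b

    # With a non-linearity in between, no such collapse is possible.
    with_relu = f(torch.relu(g(x)))
```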
What to use?
20
Hands on:
  Types of activation functions and what we use in PyTorch
  Apply non-linearities in PyTorch
21
  Define a ReLU layer in PyTorch
  Work with non-linearities:
  Plot a relu function
  Plot a tanh function
  Plot a sigmoid function and observe how it is a distribution function
Apply ReLU to the image we uploaded earlier

Loss functions

Loss functions in PyTorch
  22
  torch.nn.MSELoss
  torch.nn.CrossEntropyLoss
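A short sketch of the two losses on made-up numbers. Note that nn.CrossEntropyLoss takes raw scores (logits) and integer class labels, and applies log-softmax internally.

```python
import torch
import torch.nn as nn

# Mean squared error: mean of (0^2, 0^2, 2^2) = 4/3.
mse = nn.MSELoss()
prediction = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
mse_value = mse(prediction, target)

# Cross entropy: one sample, three classes, true class 0.
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1]])
label = torch.tensor([0])
ce_value = ce(logits, label)

# The same number by hand: minus the log-softmax at the true class.
manual = -torch.log_softmax(logits, dim=1)[0, 0]
```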
Experiments

23
  Define a tensor in PyTorch
  Fill a PyTorch tensor with a certain scalar
  Fill a PyTorch tensor with random numbers
  Find minimum value of a PyTorch tensor
  Reshape a tensor
  Flatten a tensor
  Convert a Py list to a PyTorch tensor and vice versa
  Multiply tensors with a scalar
  Dot product two tensors
  Transpose a matrix in PyTorch
  Matrix Multiplications in PyTorch
  Define a ReLU layer in PyTorch
Work with non-linearities:
  Plot a relu function
  Plot a tanh function
  Plot a sigmoid function and observe how it is a distribution function
  Something like this for playing with non-linearities:
data = torch.arange(-2,2,step=0.1)
  plot(data.numpy(),torch.tanh(data).numpy())
  show()
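A sketch that computes all three activations on the same grid as the tanh snippet above; each result can be plotted the same way. The sigmoid rises monotonically from 0 to 1, which is why it looks like a cumulative distribution function.

```python
import torch

data = torch.arange(-2, 2, step=0.1)

relu_out = torch.relu(data)        # max(0, x)
tanh_out = torch.tanh(data)        # values in (-1, 1)
sigmoid_out = torch.sigmoid(data)  # values in (0, 1), non-decreasing

# Differences between neighbouring sigmoid values are never negative.
sigmoid_steps = sigmoid_out[1:] - sigmoid_out[:-1]
```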
We will follow these examples for playing with non-linearities:
  https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
For tensors we will follow these:
  https://pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html
[x] A2 notebook


  [X] hands-on colab notebook  - Me
    
      hands-on #1 1D conv
        
          hide cells?
          have them search and google
          give hints, but have elements missing
          also todo in pytorch, have to go from np to pytorch
        
      
      handson #2
        
          from Amir, edge detection code
          also do edge detection on bio-med pic from last time.
        
      
      dropouts?
    
  
other guys:

  skeleton of blog post
    
      proof of work instead
      involved in the last week
    
  
  TA hour tomorrow

slides skeleton

1st part

Mathematics of Deep Learning - II
  Authors: Amir Hajian
July 2019
Outline
  Convolution:
  Why we need more than MLP?
  What is convolution? What is a kernel?
  What does it do?
  1D convolution
  2D convolution
  Hands-on experiments with convolutions in python
  Efficient convolution algorithms
  ConvNets: a lightning fast introduction to their structure (conv layers, pooling, etc) and their applications.
  Dropout:
  How we prevent overfitting in neural networks?
  What is the math behind it?
  How do you use it in PyTorch?
2
  What is a convolution?
  Formal definition: convolution is a mathematical operation on two functions (f and g) to obtain a third function that expresses how  the shape of one is modified by the other.
  Practical definition:
  a. Take a function f
  b. Take a function g
  c. Flip g and shift it by a finite amount T, giving g(T-t)
  d. Multiply f with the flipped, shifted g: f(t) g(T-t)
  e. Sum over the whole range of t to get the value of f*g at point T
  f. Go to step c and repeat for all T values.
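The recipe can be written as a direct double loop. This is a sketch for finite sequences, using the standard definition (f*g)(T) = sum over t of f(t) g(T-t); the function name convolve_full is made up here. The result is compared against NumPy's own np.convolve.

```python
import numpy as np

def convolve_full(f, g):
    """Discrete convolution by the shift, multiply, sum recipe."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for T in range(len(out)):        # one output value per shift T
        for t in range(len(f)):      # sum over the range of f
            if 0 <= T - t < len(g):  # g is zero outside its support
                out[T] += f[t] * g[T - t]
    return out

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
result = convolve_full(f, g)
reference = np.convolve(f, g)
```

The double loop is O(len(f) * len(g)); it is here only to make the definition concrete, not as something to use on real signals.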
What is a convolution?
  A simple example:
  What is the result of convolving a delta function with a Gaussian kernel?
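A numerical sketch of this example, with an arbitrary kernel width and spike position: convolving a unit spike (a discrete delta) with a Gaussian kernel gives back a copy of the kernel, centred where the spike was.

```python
import numpy as np

# Normalised Gaussian kernel on 21 points, sigma = 2 samples.
x = np.arange(-10, 11)
gaussian = np.exp(-0.5 * (x / 2.0) ** 2)
gaussian /= gaussian.sum()

# Discrete delta: a single spike at index 30.
delta = np.zeros(101)
delta[30] = 1.0

# mode="same" keeps the output the same length as the longer input.
response = np.convolve(delta, gaussian, mode="same")
```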
What is a convolution?
  A visual example:
What is a convolution?
  How to code it up?
handson

In Python:
  scipy.signal.convolve        for 1D conv
  scipy.signal.convolve2d      for 2D
  13
Experiments: Hands-on
  14
  Experiment with convolutions in 1D by smoothing a top-hat function with a Hann function
  Define a top-hat function that is non-zero in the range of [100:200]
  Define a Hann function between 0 and 50 - Hint: use scipy.signal.windows.hann (scipy.signal.hann in older SciPy releases)
  Apply the Hann function to the top-hat - Hint: use signal.convolve
  Plot the signal before and after smoothing to see the result.
  Discuss with your teammates to make sure you understand the results.
  Repeat it with PyTorch Conv function at home.
  import numpy as np
  import matplotlib.pyplot as plt
  from scipy import signal
  sig = np.repeat([0., 1., 0.], 100)  # top-hat: non-zero on [100:200]
  win = signal.windows.hann(50)       # Hann window of length 50
  filtered = signal.convolve(sig, win, mode='same') / sum(win)
  fig, (ax_orig, ax_win, ax_filt) = plt.subplots(3, 1, sharex=True)
  ax_orig.plot(sig)
  ax_orig.set_title('Original pulse')
  ax_orig.margins(0, 0.1)
  ax_win.plot(win)
  ax_win.set_title('Filter impulse response')
  ax_win.margins(0, 0.1)
  ax_filt.plot(filtered)
  ax_filt.set_title('Filtered signal')
  ax_filt.margins(0, 0.1)
  fig.tight_layout()
  plt.show()
Application: smoothing/binning noisy functions
  original_signal = torch.randn([1,1,100])
  kernel = torch.ones([1,1,10])
  smooth_signal = torch.conv1d(original_signal, kernel, padding=5)/kernel.sum()
  15
  Pick the kernel to be a top-hat function of length L
  Convolve a noisy function with the kernel
  Observe how the function is binned using this operation.
  Note:
  To plot you need to convert to numpy and flatten:
  plot(original_signal.numpy().flatten(), label="Original Signal")
  plot(smooth_signal.numpy().flatten(), label="Smooth Signal")
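A runnable sketch of the smoothing snippet, with the tensor shapes spelled out. One detail worth noticing: an even kernel width of 10 with padding=5 returns 101 samples rather than 100. The odd-width variant is an addition for comparison, not part of the original exercise.

```python
import torch

torch.manual_seed(0)
original_signal = torch.randn(1, 1, 100)  # (batch, channels, length)
kernel = torch.ones(1, 1, 10)             # (out_channels, in_channels, width)

# Dividing by the kernel sum turns the top-hat into a moving average.
smooth_signal = torch.conv1d(original_signal, kernel, padding=5) / kernel.sum()

# Output length = 100 + 2*5 - 10 + 1 = 101.
# An odd width with padding = width // 2 keeps the length at 100.
odd_kernel = torch.ones(1, 1, 11)
same_length = torch.conv1d(original_signal, odd_kernel, padding=5) / odd_kernel.sum()
```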
Application: smoothing/binning noisy functions
  In PyTorch:
  torch.conv1d(original_signal, kernel, padding=5)
Hands-On: 1D Convolution in PyTorch
  Experiment with 1D conv in PyTorch by recreating this plot to bin a noisy function.
  Get creative. Pick your own function.
  Add noise to it.
  Pick different kernels, experiment with the width and shape of the kernels.
Convolutions in 2D: A step towards ConvNets
2D Convolution: Detect Edges with Sobel Operator
  import imageio
  from scipy import ndimage
  import matplotlib.pyplot as plt
  image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
  fig = plt.figure(); plt.gray()  # show the filtered result in grayscale
  ax1 = fig.add_subplot(121)  # left side
  ax2 = fig.add_subplot(122)  # right side
  result = ndimage.sobel(image)  # Sobel derivative along the last axis (the default)
  ax1.imshow(image)
  ax2.imshow(result)
  plt.show()
Experiments: Hands-on time
  20
  Experiment with convolutions in 2D to detect edges in an image
  Read the image and convert it to grey.
  Define the kernel
  Apply the kernel to the image using scipy.signal.convolve2d
  Plot the results
  Try Sobel Kernel as well as Scharr Kernel.
  See the difference in the results?
  Note for TA’s: here is a sample solution.
2D Convolution: Detect Edges with Sobel Operator

  import imageio
  import numpy as np
  import matplotlib.pyplot as plt
  from scipy import signal
  image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
  sobel_y = np.array([[-1, -2, -1],
                      [ 0,  0,  0],
                      [ 1,  2,  1]])
  sobel = signal.convolve2d(image, sobel_y, boundary='symm', mode='same')
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
  ax_orig.imshow(image, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(np.absolute(sobel), cmap='gray')  # show the filtered image, not the kernel
  ax_mag.set_title('Sobel Applied')
  ax_mag.set_axis_off()
  plt.show()
Exercise: 1) find edges in the x-direction using
  sobel_x = np.array([[ -1, 0,  +1],
  [-2, 0, +2],
  [ -1, 0,  +1]])
  Exercise: 2) combine x and y results to get a final result
  # Exercise 1: edges in the x-direction
  import imageio
  import numpy as np
  import matplotlib.pyplot as plt
  from scipy import signal
  image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
  sobel_y = np.array([[-1, -2, -1],
                      [ 0,  0,  0],
                      [ 1,  2,  1]])
  sobel_x = np.array([[-1, 0, +1],
                      [-2, 0, +2],
                      [-1, 0, +1]])
  result = signal.convolve2d(image, sobel_x, boundary='symm', mode='same')
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
  ax_orig.imshow(image, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(np.absolute(result), cmap='gray')
  ax_mag.set_title('Sobel Applied')
  ax_mag.set_axis_off()
  plt.show()
  # Exercise 2: combine x and y in one complex kernel, Gx + j*Gy
  import imageio
  import numpy as np
  import matplotlib.pyplot as plt
  from scipy import signal
  image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
  sobel_y = np.array([[-1j, -2j, -1j],
                      [  0,   0,   0],
                      [ 1j,  2j,  1j]])
  sobel_x = np.array([[-1, 0, +1],
                      [-2, 0, +2],
                      [-1, 0, +1]])
  result = signal.convolve2d(image, sobel_x+sobel_y, boundary='symm', mode='same')
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
  ax_orig.imshow(image, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(np.absolute(result), cmap='gray')  # |Gx + j*Gy| is the gradient magnitude
  ax_mag.set_title('Sobel Applied')
  ax_mag.set_axis_off()
  plt.show()
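A sketch of exercise 2 on a small synthetic image, so it runs without the Veins.png file. It checks that combining the separate x and y responses as sqrt(Gx^2 + Gy^2) gives the same picture as the absolute value of the single complex-kernel convolution. The 8x8 test square is made up for illustration.

```python
import numpy as np
from scipy import signal

# Synthetic image: a bright 4x4 square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

# Separate responses, then the gradient magnitude.
gx = signal.convolve2d(image, sobel_x, boundary="symm", mode="same")
gy = signal.convolve2d(image, sobel_y, boundary="symm", mode="same")
magnitude = np.hypot(gx, gy)

# One convolution with the complex kernel Gx + j*Gy.
complex_response = signal.convolve2d(
    image, sobel_x + 1j * sobel_y, boundary="symm", mode="same"
)
```

Convolution is linear, so the real part of the complex response is gx and the imaginary part is gy; that is all the complex-kernel trick relies on.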
  # Smoothing with a 50x50 box kernel (reuses image, np, signal and plt from the snippet above)
  smoothing = np.ones([50,50])
  result = signal.convolve2d(image, smoothing, boundary='symm', mode='same')
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
  ax_orig.imshow(image, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(np.absolute(result), cmap='gray')
  ax_mag.set_title('Smoothed')
  ax_mag.set_axis_off()
  plt.show()
2D Convolution: Detect Edges with Scharr Operator
  import imageio
  import numpy as np
  import matplotlib.pyplot as plt
  from scipy import signal
  image = imageio.imread("/Users/amirh/Downloads/Veins.png", as_gray=True)
  scharr = np.array([[ -3-3j, 0-10j,  +3-3j],
                     [-10+0j, 0+0j,  +10+0j],
                     [ -3+3j, 0+10j,  +3+3j]])  # Gx + j*Gy
  grad = signal.convolve2d(image, scharr, boundary='symm', mode='same')
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2)
  ax_orig.imshow(image, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(np.absolute(grad), cmap='gray')
  ax_mag.set_title('Gradient magnitude')
  ax_mag.set_axis_off()
  plt.show()
Dropouts or how not to overfit
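A minimal sketch of dropout in PyTorch, assuming p=0.5 and an all-ones input chosen only to make the effect visible. In training mode each entry is zeroed with probability p and the survivors are scaled by 1/(1-p), so the expected value is unchanged; in eval mode the layer does nothing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: entries are either 0 (dropped) or 1/(1-0.5) = 2 (kept, rescaled).
drop.train()
train_out = drop(x)

# Eval mode: dropout is the identity.
drop.eval()
eval_out = drop(x)
```

Forgetting to call model.eval() before validation is a common reason for noisy test metrics with dropout layers.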
Reading:
  Chapter 9 of Deep Learning Book
  ConvNets Tutorial in PyTorch
  Understanding Convnets (blogpost, paper)
  Understanding Dropouts (blogpost, paper)
  Programming
  Learn to work with ConvNets. Follow these tutorials to learn how to use ConvNets for various tasks in PyTorch
  Beginner: Training a classifier
  Advanced: Gated ConvNets for Neural NLP
  Explainability:
  How ConvNets characterize images
  Understanding ConvNets
GAN Workshop [12/29]


  Headline Time
  Total time 14:41


  Headline Time
  Total time 1d 2:15

Notes


  fwiw: ppl remember beginning and end of event most

prep meets

meet 1 [2019-08-12 Mon]


  Andring notebooks for presentation
    
      2.him.
    
  
  capsblog post OR reproduce
  1st n
    
      NOdule
      indcgan, gen, discr, 2.5hrs
        
          pointed code
        
      
meet 2 [2019-08-19 Mon]


  we no focus more on theory?
    
      sont some more advanced stuff
      mareating some of the layers
    
  
  we ck individuals who want the more advanced

meet 3 [2019-08-26 Mon]


  cyclegan intro, handson,
    
      end with 1hr applications and evaluation and conclusions
    
  
  Amir won’t be there
  [X] go thru notebook, run and review in detail (prep for wed)
  students tasks:
    
      wed short draft qualitative about what paper is about
      2 weeks try to reproduce, expect write observations of attempt, not necessarily success
        
          can be as low as reading code and observations
        
      
TODOs [4/21]

WK3 cycleGan notebook Review [0/6]

[.] super(resnetblock, self) - calling itself

[.] opt namedtuple

[.] whole cycle thing - still check why for D selection

[.] BCEWithLogitsLoss
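A small sketch for this review item, on made-up numbers: nn.BCEWithLogitsLoss takes raw scores and gives the same value as a sigmoid followed by nn.BCELoss. The fused version is generally preferred because it is more numerically stable.

```python
import torch
import torch.nn as nn

logits = torch.tensor([-2.0, 0.0, 3.0])   # raw discriminator scores
targets = torch.tensor([0.0, 1.0, 1.0])   # 0 = fake, 1 = real

# Sigmoid and binary cross entropy in one call.
fused = nn.BCEWithLogitsLoss()(logits, targets)

# The same thing in two explicit steps.
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
```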

[.] wgangp no loss?

[.] label generator register_buffer

HW todo from WK2 slides [0/11]

[.] Does training convergence indicate better results?

[.] How would you estimate memory needs for a GAN?

[.] Update the model to use 32x32 or 128x128 images

[.] Interpolate through the latent space of the trained DCGAN

[.] Adapt the first GAN to run on GPU

[.] Prepare your own dataset to run through DCGAN

[.] Try removing batch norm and see what happens

[.] What prevents this architecture from being used for large models?

[.] What would happen if we replaced batch norm with spectral norm?

[.] What do I mean when I said strided convolutions replace pooling functions?

[.] WK2 training Qs


  why detach?
  why that view call?
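A sketch for the detach question, using tiny stand-in linear layers rather than the workshop's actual networks. In the discriminator step the fake batch is detached so that the backward pass stops at D and does not compute gradients for G; in the generator step it is not detached, so the gradients flow through D into G.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Linear(2, 2)   # stand-in generator (hypothetical)
D = nn.Linear(2, 1)   # stand-in discriminator (hypothetical)
z = torch.randn(4, 2)

fake = G(z)

# Discriminator step: detach cuts the graph, so G gets no gradient.
d_loss = D(fake.detach()).mean()
d_loss.backward()
g_has_grad_after_d_step = G.weight.grad is not None

# Generator step: no detach, so the gradient reaches G through D.
g_loss = D(fake).mean()
g_loss.backward()
g_has_grad_after_g_step = G.weight.grad is not None
```

As for the view call: in DCGAN-style code a call such as .view(-1) usually just flattens the discriminator output to one score per sample so it matches the label vector. Whether that is what the WK2 notebook does is an assumption to check against the code.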

[c] share:gan training anim

better online experience


  [X] ask individuals who want more advanced
  those without mics.
    
      how to engage? talk or chat?
    
  
  ask for progress
  suggest others mute or reduce volume when conversation not relevant to them
  remind everyone to mute mics when talk comes on
  [X] try out screen share with lower-right webcam

[x] Blog post Intro


  Headline Time
  Total time 16:52


  less 2.5hrs for that saturday = 17 - 2.5 = 14.5

2nd

Introduction
  Week 1 - Introduction:
Intro to GANs and adversarial training
  Writing a GAN
  Training your GAN (hands on)
  Week 2 - Image to image translation:
Intro to image to image translation GANs
  Writing a cycleGAN
  Training a cycleGAN (hands on)
  Week 3 - Advanced Topics:
Evaluating GANs
  Current research and state of the art
  Beyond GANs: applications of adversarial training
[one paragraph covering the gist of what was covered in session 1]
  For the first session the groundwork was laid with an overview of Generative Adversarial Networks and a solid introduction to the theory. Instead of the image tasks typical of GANs, participants worked on a minimal GAN that simply converted a random uniform distribution to a Gaussian. In this way the focus stayed on the core essentials that uniquely define GANs.
[one paragraph covering the gist of what was covered in session 2]
  In session two, participants progressed from foundations to more contemporary GAN architectures. The hands-on exercise involved using the ubiquitous DCGAN architecture for an image to image translation task. The lecture portion filled out the picture with the necessary theory and its historical progression.
[one paragraph covering the gist of what was covered in session 3]
  The last session moved to an even more challenging GAN architecture: the cycleGAN. After the lecture portion, participants completed an implementation of a cycleGAN along with its training code. As execution continued, our instructor Andrew finished off the workshop by discussing current research and the state of the art in the field. He made it clear that GANs are not just for images, but have a place in many areas of machine learning, such as: time series, <insert more here>.
[one paragraph outlining the post; has to be written once the sections are filled]
1st draft - wrong

During the month of August, AISC (Aggregated Intellect Socratic Circles) held a workshop - ‘Generative Adversarial Networks and Beyond’.
  Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises.
  Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs.
  Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own. Our workshop finished with a capstone project, the result of which is this post. It is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the paper's results.
  Below are the results of their efforts. [brief intro to each entry]
  [Lastly, we finish off with <last entry>. <some closing sentence>]
scratch

  [X] blogpost guideline detailed - ask on staff if ?
    
      intro to whole blog post
      1 paragraph – takeaway of 3 sessions, high level overview
    
  
During the month of August, AISC (Aggregated Intellect Socratic Circles) held the ‘Generative Adversarial Networks and Beyond’ workshop.
—Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students were engaged in both lectures from our instructor Andrew B. Martin and hands-on training with notebooks.Attending either on location or online, students were engaged in both lectures from our instructor, Andrew B. Martin, as well as hands-on coding exercises.
  Over the course of the 3 weekly sessions, students went from the core basics of GANs, through the ubiquitous DCGAN, and ending with CycleGANs.Over the course of the 3 weekly sessions, students went from the core basics of GANs advanced architectures through the ubiquitous DCGAN, and ending with CycleGANs.
Over the course of 3 weekly sessions, students went from the core basics of GANs, to the ubiquitous DCGAN, and ending with CycleGANs. Attending either on location or online, students heard lectures from our instructor Andrew B. Martin and dove right into hands-on training with notebooks.
—
Along the way, students were given supplementary material to expand on the class contents and provide guidance to better enable them to continue with GANs on their own.
—
  Our workshop finished with a capstone project that is the collective work of all our students. Teams were formed and each team could choose to either write a qualitative summary of a selected paper, or to reproduce the paper's results.
  Below are the results of their efforts. [brief intro to each entry]
The final assignment for this workshop is the capstone blog post.
To finish off the workshop, students were given a capstone project to complete.
  Students were broken up into nnn teams. Each team had the option to either write a qualitative summary of a selected paper, or to reproduce the paper's results.
They were broken up into nnn teams and each team chose one of two options. The first option was to write a qualitative summary of a selected paper; the second was to reproduce the paper's results in a coding project.
—

  content points
    
      Instructor Andrew works in industry
        
          uses GANs in
          he provided practical insights to use GANs in real-life
        
      
      capstone
    
  
workshop blurb

Workshop Overview
Generative Adversarial Networks have been very popular in recent years for various tasks like image generation and data augmentation. A large number of papers at ICLR 2019 were focused on GANs. With the fast pace of the field, you won’t be able to stay up to date with the latest if you don’t have the right foundational knowledge of how these networks work under the hood. We are offering this workshop to help you step into the depth of GANs. Are you ready?
In this workshop, you will learn the theory and gather hands on experience in some of the most fundamental concepts and practical tips about GANs.
Important Dates
Please note that this workshop will happen on 3 separate evenings:
August 14, 2019
August 21, 2019
August 28, 2019
Office hours will happen on,
August 20, 2019 (in person and online participants, Group office hour with the TAs)
August 27, 2019 (in person and online participants, Group office hour with the TAs)
Date TBC (Group office hour with the instructor, has to be purchased separately)
“Why should I care about GANs?”
GANs and adversarial ML are widely discussed and increasingly used in AI
GANs are rapidly being applied in many of the cutting edge AI applications
“But I don’t care about generating fake photos!”
Adversarial ML has become an integral part of many of the recent ML algorithms
This workshop goes beyond GANs where you will explore adversarial training and their numerous potential applications
Why you should attend
In this 3-session intensive workshop, we will bring you up to speed with everything needed to build a strong background in GANs. It will be a combination of theory and hands-on applications in PyTorch.
This workshop is built on the instructor's extensive experience in academia and industry on related topics.
This workshop is the first in its series and paves the way theoretically and technically for many application specific workshops to follow.
Target Audience
Data Scientists, Machine Learning Engineers, Software Engineers, Students, Other Analytics Roles (data analysts, managers, product owners, etc)
Prerequisites
Knowledge of Python
  Knowledge of Machine Learning
  Familiarity with deep learning
  Familiarity with PyTorch or other deep learning frameworks is a plus
  This is a beginner to intermediate workshop
  Learning Outcomes
We will build a working application in Python using GANs and image processing to generate and translate images
Understand how a vanilla GAN works
  Understand how an image to image translation GAN works
  Be able to explain limitations, current research directions, and applications
  Build GAN to generate images
  Build another GAN to translate images
  You will get a deeper understanding of how to apply GANs and adversarial loss to your own deep learning pipeline, in supervised, unsupervised and semi-supervised settings
  Pre-workshop reading material
TBD
Learning Material
All participants will have access to the following learning material:
Slides from the sessions
  Hands on notebooks
  Video recording of the sessions (you can use the videos to watch the parts that you missed, or re-watch any parts that are still unclear for you; access to videos beyond one week after the workshop is available to be purchased; see tickets >> add-ons)
  Instructor
Andrew Martin
Head of Data @ Looka Inc
Andrew is a data scientist of 8 years working in deep learning and optimization. He currently leads the data team at Looka where they use generative models like GANs to make great design accessible and delightful to everyone.
Course Modules
The workshop happens on 3 evenings, 3 hours each; each module below will be 50 mins.
Week 1 - Introduction:
Intro to GANs and adversarial training
  Writing a GAN
  Training your GAN (hands on)
  Week 2 - Image to image translation:
Intro to image to image translation GANs
  Writing a cycleGAN
  Training a cycleGAN (hands on)
  Week 3 - Advanced Topics:
Evaluating GANs
  Current research and state of the art
  Beyond GANs: applications of adversarial training
[x] student writeup editing
@Willy Rempel @Werner could you please start looking at the parts people have copied and provide them with feedback? just leave comments directly on their write up. Perhaps only focus on the technical side of what they have rather than language, unless it’s very difficult to read or the flow is very bad etc
[x] fix clock last night

Breakout Groups [0/0]

4 progressive GAN


  3 ppl

5 EvoGan


  3 ppl, Alice et al
  population of Gs
    
      evo selection
    
  
Archive

cycleGAN testing


  1st entry started about 1hr ago. Also did several hours work yesterday.

horse2zebra :: started 2:20
batch_size: 1
  beta1: 0.5
  checkpoints_dir: ./checkpoints
  continue_train: False
  crop_size: 256
  dataroot: ./datasets/horse2zebra               [default: None]
  dataset_mode: unaligned
  direction: AtoB
  display_env: main
  display_freq: 400
  display_id: 1
  display_ncols: 4
  display_port: 8097
  display_server: http://192.168.0.35                  [default: http://localhost]
  display_winsize: 256
  epoch: latest
  epoch_count: 1
  gan_mode: lsgan
  gpu_ids: 0
  init_gain: 0.02
  init_type: normal
  input_nc: 3
  isTrain: True                                 [default: None]
  lambda_A: 10.0
  lambda_B: 10.0
  lambda_identity: 0.5
  load_iter: 0                                    [default: 0]
  load_size: 286
  lr: 0.0002
  lr_decay_iters: 50
  lr_policy: linear
  max_dataset_size: inf
  model: cycle_gan
  n_layers_D: 3
  name: CycleZebra1                          [default: experiment_name]
  ndf: 64
  netD: basic
  netG: resnet_9blocks
  ngf: 64
  niter: 100
  niter_decay: 100
  no_dropout: True
  no_flip: False
  no_html: False
  norm: instance
  num_threads: 4
  output_nc: 3
  phase: train
  pool_size: 50
  preprocess: resize_and_crop
  print_freq: 100
  save_by_iter: False
  save_epoch_freq: 5
  save_latest_freq: 5000
  serial_batches: False
  suffix:
  update_html_freq: 1000
  verbose: False
(epoch: 133, iters: 1112, time: 2.520, data: 0.003) D_A: 0.090 G_A: 0.543 cycle_A: 0.625 idt_A: 0.233 D_B: 0.213 G_B: 0.239 cycle_B: 0.807 idt_B: 0.198
Traceback (most recent call last):
  File "train.py", line 43, in <module>
  File "/home/will/DevAcademics/GANs/pytorch-CycleGAN-and-pix2pix/data/__init__.py", line 90, in __iter__
    for i, data in enumerate(self.dataloader):
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
FileNotFoundError: Traceback (most recent call last):
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/will/DevAcademics/GANs/pytorch-CycleGAN-and-pix2pix/data/unaligned_dataset.py", line 57, in __getitem__
  File "/home/will/anaconda3/envs/main/lib/python3.7/site-packages/PIL/Image.py", line 2770, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './datasets/horse2zebra/trainA/n02381460_1266.jpg'

  restarting at [2019-08-09 Fri] 12:43
  less than 24h I was at epoch=133.

datasets

facades: 400 images from the CMP Facades dataset. [Citation]
  cityscapes: 2975 images from the Cityscapes training set. [Citation]
  maps: 1096 training images scraped from Google Maps.
  horse2zebra: 939 horse images and 1177 zebra images downloaded from ImageNet using keywords wild horse and zebra
  apple2orange: 996 apple images and 1020 orange images downloaded from ImageNet using keywords apple and navel orange.
  summer2winter_yosemite: 1273 summer Yosemite images and 854 winter Yosemite images were downloaded using Flickr API. See more details in our paper.
  monet2photo, vangogh2photo, ukiyoe2photo, cezanne2photo: The art images were downloaded from Wikiart. The real photos are downloaded from Flickr using the combination of the tags landscape and landscapephotography. The training set size of each class is Monet:1074, Cezanne:584, Van Gogh:401, Ukiyo-e:1433, Photographs:6853.
  iphone2dslr_flower: both classes of images were downloaded from Flickr. The training set size of each class is iPhone:1813, DSLR:3316. See more details in our paper.
execution scratch

python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps
python train.py --name CycleMaps1 --model cycle_gan --display_id 1 --dataroot ./datasets/maps --display_server="http://192.168.0.35"
bi-cubic order = 3
colab scratch

!pip install torch
  !pip install torchvision
colab notebook from big repo

from <gitrepo> import util, models, options, data
no importing:
  datasets (.sh .py)
  scripts (all .sh)
L&L GANs

explicit has p(x)
  implicit – has a black box, just get samples from distribution
Workshop 2 [7/7]


  emphasis on GANS, less pytorch ??

[x] followup Qs


  [X] share on staff channel
  [X] elu vs relu vs leaky
  [X] notebook images from Andrew properly reffed so they show up in case of reset

[x] share:gan hacks

other prep tomorrow [3/4]
[x] conv stride, padding,[x] transpose conv
  
    The generator needs to go from a small input to a larger output, so we need to do  ‘transpose convolution’. here’s one explanation. https://towardsdatascience.com/transpose-convolution-77818e55a123
  [c] pytorch study different modules: datasets mostly, util, nn, optim,
[x] collect relevant webpages for screen sharing and ref

[x] pull up relevant pdfs too
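For the conv stride / transpose conv items above, a minimal sketch of how the spatial size changes. It uses the output-size formulas from the PyTorch `Conv2d` and `ConvTranspose2d` documentation with dilation 1. The helper names are illustrative, and the kernel 4, stride 2, padding 1 setting is an assumption (a common DCGAN-style choice), not something recorded in these notes.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length of a strided convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Generator direction: start from a 4x4 feature map and double it four times.
size = 4
sizes = [size]
for _ in range(4):
    size = conv_transpose_out_size(size, kernel=4, stride=2, padding=1)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]

# Discriminator direction: the same settings halve the size again.
print(conv_out_size(64, kernel=4, stride=2, padding=1))  # 32
```

With these settings each transposed convolution exactly undoes the size change of the matching strided convolution, which is why the generator and discriminator can mirror each other.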

text of slides for today

GANs workshop day 2
Today’s itinerary
  Previous class
  Simple GAN to DCGAN
  Coding a DCGAN
  Intro to CycleGAN
  Assignments, exercises, and next week
  OVERVIEW
Selected work
  Approximate P(x,y) rather than P(x | y)
Generated symbols
  OUR PROJECTS
Generated Typefaces
  OUR PROJECTS
Previous Class
What we did
  Overview of Generative Adversarial Networks
  1 hidden layer fully connected generator and discriminator
  Stochastic Gradient Descent optimization
  Binary cross entropy loss function
Reading and Assignments
  Goodfellow et al (2014), Radford et al (2015)
  Pix2pix, CycleGAN, and/or batch norm papers
  Challenges
  Update the model to use 32x32 or 128x128 images
  Interpolate through the latent space of the trained DCGAN
  Adapt the first GAN to run on GPU
  Prepare your own dataset to run through DCGAN
  Try removing batch norm and see what happens
  Questions:
  Does training convergence indicate better results?
  How would you estimate memory needs for a GAN?
  Previous class
Simple GAN to DCGAN
  OVERVIEW OF GANS
  Training
  Through training the generator learns to turn random noise into realistic samples
Generator and discriminator
  Two networks, the generator and discriminator, competing in a two player minimax game
Discriminator trained to identify whether a sample comes from the training set or the generator
Generator trained to generate samples that trick the discriminator
  OVERVIEW OF GANS
Model description
  Transform random uniform noise into a normal distribution
  1 hidden layer fully connected generator and discriminator
  Stochastic Gradient Descent optimization
  Binary cross entropy loss function
First GAN
DCGAN
  OVERVIEW OF GANS
  Radford et al 2015
Model description
  Transform random noise into images of font sheets
  Multiple convolutional hidden layers with batch normalization and leaky ReLU
  Model weights initialized from Normal with mean 0 and stddev 0.02
  Adam optimization
  Binary cross entropy loss function
DCGAN
Model description
  OVERVIEW OF GANS
Similarities with Simple GAN
  Generator still maps latent space to target distribution samples
  Discriminator still maps samples to a classification
  Training loop is virtually unchanged
  Model still uses binary cross entropy loss function
DCGAN
Concepts for DCGAN
Convolutional hidden layers
  All convolutional network (Springenberg et al., 2014)
  Efficient for getting a representation of images
  Strided convolutions to learn its own pooling function
Concepts for DCGAN
Strided convolutions
  Concepts for DCGAN
Transposed strided convolutions
  Concepts for DCGAN
Batch Norm
  Introduced in Ioffe & Szegedy (2015)
  Normalize input to each unit to have mean 0 and variance 1
  Allows gradients to flow for deeper generators with no mode collapse
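A minimal sketch of the normalization step on this slide, for one unit over a mini-batch. It covers training-mode batch statistics only. The learnable scale `gamma` and shift `beta` follow Ioffe & Szegedy; the `eps` value and the function name are assumptions for illustration.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations over a mini-batch to mean 0, variance 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Scale and shift are learnable in a real layer; fixed here for clarity.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(activations)
print(normed)
```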
Concepts for DCGAN
Adam optimizer
  Introduced in Kingma & Ba (2014)
  Adaptive moment estimation
  Stochastic gradient descent has one learning rate. Adam has an adaptive learning rate for each network parameter
  Uses moving averages of first and second moments of gradient
  Four hyperparameters: initial learning rate, two decay rates for the moving averages (β1 and β2), and epsilon
  In practice very robust and doesn’t need as much tuning as other algos
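The bullets above can be sketched as a single-parameter Adam step, following the update in Kingma & Ba (2014). The default values are the ones suggested in that paper; the function name and the toy objective f(x) = x² are assumptions for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy use: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
print(x)
```

Because the step is divided by the root of the second-moment estimate, each parameter effectively gets its own step size, which is the "adaptive learning rate" the slide refers to.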
  Concepts for DCGAN
Coding DCGAN
  Questions
  What prevents this architecture from being used for large models?
  What would happen if we replaced batch norm with spectral norm?
  What do I mean when I said strided convolutions replace pooling functions?
  Coding DCGAN
Intro to CycleGAN
Horse to zebra
  CycleGAN
Unpaired image to image translation
  CycleGAN
High level
  Unpaired image to image translation
  Builds on the pix2pix model from a year earlier
  Two generators and two discriminators
  Concept of cycle consistency loss
  Transfers style from one collection to another collection
  CycleGAN
Model training
  CycleGAN
  Zhu et al 2017
Paper implementation
  Generator: 6 - 9 residual blocks
  Discriminator: 70x70 PatchGAN to reduce size
  Least squares loss replacing BCE for stability
  Train discriminators with a history of generated images rather than only the latest ones, to reduce oscillation
  Train with Adam and a batch size of 1
  See PyTorch code: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
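A minimal sketch of the image-history idea from the list above: a fixed-size buffer that sometimes hands back an older generated image instead of the newest one. The buffer size of 50 is the value reported in Zhu et al. (2017). The class name, the 50/50 swap rule as written here, and the use of strings in place of image tensors are assumptions for illustration, not a copy of the repository code.

```python
import random


class ImageHistory:
    """Buffer of past generator outputs, queried one image at a time."""

    def __init__(self, capacity=50, rng=None):
        self.capacity = capacity
        self.images = []
        self.rng = rng or random.Random()

    def query(self, image):
        # Fill the buffer first; until it is full, return the new image as-is.
        if len(self.images) < self.capacity:
            self.images.append(image)
            return image
        # Half of the time, swap the new image in and return an older one.
        if self.rng.random() < 0.5:
            idx = self.rng.randrange(self.capacity)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image


pool = ImageHistory(capacity=3, rng=random.Random(0))
returned = [pool.query(f"fake_{i}") for i in range(10)]
print(returned)
```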
  CycleGAN
Course implementation
  Generator: Transposed convolutions with a residual block
  Discriminator: convolutional discriminator
  Least squares loss replacing BCE for stability
  Train with generated images from the current batch
  Train with Adam and a batch size of 16
  CycleGAN
What’s next?
Next week
  Coding cycleGAN
  Adversarial training more generally
  Evaluating GANs and where the technology is heading
  What’s next?
Assignments and reading
  Pix2pix and cycleGAN papers
  Take a look at the capstone papers
  Look at this code base: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/tree/master/models
  Collect all your questions. About anything…
  What’s next?
Thank you
[x] Capstone Blog Post
update gans intro , edit group posts
    3 paragraphs / tech covered

  Headline Time
  Total time 11:04

edit group posts

likely more accurate time. except for breaks though
[#A] RL Workshop [10/15]


  Headline Time
  Total time 12:56

Notes

Prep Meet1 [2019-09-16 Mon]
1st Sutton book
  2nd rainbow
  3rd policy gradients
3 teams – ask Qs, are engaged, understand things, convey we are here to help
  more attention from instructor online
  relaxing FG will do most of it.
pencil&paper exercise?
  exercises get points
  setup rules clearly
Vahid — oct3rd, oct4th to EU
log in sheet – 15min
  answering Qs offline add
  prep time add
Prep Meet3 [2019-09-30 Mon]

Archive

TODOs [9/14]

[x] review
udacity RL repo

  interesting when starting up jupyter-lab for this repo:
    Build Recommended
    JupyterLab build is suggested:
    @jupyter-widgets/jupyterlab-manager needs to be included in build
  

[c] followup jalan
she could not get things to work
[c] followup knowing probabilities

[x] aisc online-participant guide post

For online participants, please familiarize yourself with the online participation guide:

  You can join this call for the hands-on sections to discuss any issues you face, or any questions you have https://meet.google.com/ake-cozz-kkx?hs=122. It is the same for all sessions. This is not the same as the link for the workshop streaming itself (located here https://member.ai.science/workshops/rl-2019-09).
  Please don’t forget to mute your mic when the instructor session begins, or if there is background noise at your location.
  Do not hesitate to ask questions, we are here to help.
  Screen sharing in meet.google.com:    https://support.google.com/meet/answer/7290345?co=GENIE.Platform%3DDesktop&hl=en, or this video clip  https://www.youtube.com/watch?v=6FCIqvv68NY. For some issues, we may need to see your screen in order to help you.
  The hands-on notebook will be in the google drive prior to the workshop. Don’t forget to make a copy in your own drive and use the copy for all your work. If you have time, try and look thru the notebook prior to the session.
  We will use <link> as an online whiteboard as a convenience. Sometimes it is easier to explain with a diagram.

[x] prep RL – proread Florians, read his material
2 papers and blog posts
  policy gradient in np
  actor-critic
  deep deterministic
[x] proof slides


  title
    
      “3ed” -> “3rd”
      cap “W”orkshop
    
  
  #6 “process converges to to” duplicate “to”

[x] test notebook

[x] routine work


  copy session3 sol notebook to std folder
  dl to my files
  start reading texts

[x] breakdown time

[.] populate this doc with important links

could you populate this doc with some of the important links shared with the class, or useful links that the student shared?
  https://docs.google.com/document/d/16KX0-fJMBn1ByaG23TEZQHTVAMx-CHEO8jY23utBdjw/edit
Capstone blog post [0/4]

Guidelines

This blog post is the collective work of the participants of the “Reinforcement Learning” workshop. This post serves as a proof of work, and covers some of the concepts covered in the workshop in addition to advanced concepts pursued by students. The blog post will be shared on the AISC Blog, and other major ML/DS related outlets.
Objective
  The objective of this post is to demonstrate an understanding of the concepts learned in the Workshop.
Teams will be tasked with summarizing a selected RL paper and providing additional insight into the findings and concepts discussed.
Contribution Declaration
  Each team will provide contribution acknowledgement to its members for participating in the blog post writing and idea generation.
  Steps
  Refer to the References section and select one of the suggested papers. Given the topics discussed in the workshop, you should be familiar with the concepts used in each of the papers.
  Please list the names of your group and team members that contributed to the development at the top of each section.
  Work with your breakout team to go through the paper, and support each other to make sure everyone understands it well; you can divide and conquer by reading/researching different parts and explaining to each other
  Come up with a work plan for each member of the team to contribute something (the updated version of this should later turn into the “contribution declaration”)
  Collaborate and draft your learnings into a section in this article; the alternative is to create a video about what you learned
  AISC blog editors will provide light technical and language feedback on the post prior to publication on the AISC Blog.
  AISC blog editors will provide guidance before submitting the blog post to other mediums such as Towards Data Science (Medium.com) and KDNuggets.
Important Dates
  There are 2 deliverable options based on your available time commitment. Please select the option that suits your team members.
  SEP 20: Select the paper you want to work on
  SEP 30: Write a qualitative summary of what you learned about the paper you selected. Please provide enough technical details to demonstrate your understanding
  OCT 14: Reproduce the results of the paper by going through the code, and rerunning it on the given data set. This should link back to your github repo, or any other pages showing your results/code
  If you’ve completed both exercises and would like an additional challenge, please contact us at events@ai.science and we can provide guidance to find an extension that could result in some sort of research publication, or a simple application that you can use as part of your portfolio
Recommendations
  You may pick a paper that is not listed here as long as you convince your team; please add that paper to the References section
  We generally prefer that you don’t work alone, but if you have very good reasons for it, we might consider it
  You are encouraged to read other sections and provide constructive feedback in the form of comments, but please do not alter them
  If you claim a section to write, then you need to deliver by the due dates above. If you miss the deadline, the post will be published without your section
  Breakout teams are combinations of in-person and online audience, and it’s everyone’s responsibility to make sure that all team members are engaged and informed about plans. You can use the slack channel for your communication as much as you want, but also can arrange for video calls etc.
  If you use any other resources, add their information to the References section, but make sure you don’t modify the existing References
[.] capstone intro

The foundation of all Reinforcement Learning (RL) is an agent that acts on, and is acted on by, its environment.
The agent-environment relationship forms a closed loop:

  the environment receives from the agent an action
  this action can change the environment. That is, the environment changes state
  the agent then receives from the environment both state and reward information

This loop can be formalized as a Markov Decision Process (MDP),
  $$p(s', r \mid s, a) = \Pr(S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a)$$
  where the next state $$s'$$ and its associated reward $$r \in \mathbb{R}$$ depend only on the previous state s and the action taken a. Every part of this system can be elaborated upon, and from this follows all the rest of the field.
Environments can be discrete or continuous. State changes can be deterministic or not. An environment can be described by its states, the allowed actions at each state, and a transition function $$t(s,a) = s'$$ that accepts a state s and an action a, and returns the next state. If the environment is not deterministic, then $$t(s,a)$$ returns $$P(S')$$, the probability vector where each $$p(s_i) \in P(S')$$ is the probability that the action will result in the environment changing to state $$s_i$$.
  The agent's goal is to maximize its total reward over time. Thus it needs to choose a particular sequence of actions that will achieve this. Agents choose their actions based on a policy function $$μ(s)$$ if it is deterministic, or $$π(s)$$ if it is probabilistic. The policy can be as simple as a static look-up table, or as complex as a large, deep net. An $$ε$$-greedy policy is one where the agent chooses a random action with probability $$ε$$, or the maximally rewarding action otherwise. This highlights the trade-off between exploitation and exploration that is a common theme of policies.
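A minimal sketch of ε-greedy action selection as described above. It assumes "maximally rewarding" means greedy with respect to a list of estimated action values; the list `q_values` and the function name are illustrative, not from the workshop material.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit


estimates = [0.1, 0.9, 0.3]
print(epsilon_greedy(estimates, epsilon=0.0))  # 1
```

Setting `epsilon=0.0` gives a purely exploiting agent and `epsilon=1.0` a purely exploring one; values in between trade the two off.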
  When the agent-environment loop is indefinite in duration, the rewards cannot be simply summed. A discounting factor $$0 < \gamma < 1$$ is used to attenuate future rewards:
  $$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \gamma^3 R_{t+4} + \dots$$
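A short sketch of the discounted return for a finite list of rewards, assuming the list holds R_{t+1}, R_{t+2}, ... in order and that the episode is truncated after the last entry. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0, 1.0], gamma=0.5))  # 1.875
```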
  Agents also have either a value function v(s) that returns a value for state s, or q(s,a) that values a state-action pair (the value of taking action a while in state s).
  Both value functions are defined as expected discounted returns, and both satisfy the recursive relationships expressed by the Bellman equations
  $$v_\pi(s) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right], \text{ for all } s \in S$$
  $$q_\pi(s,a) = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right], \text{ for all } s \in S, a \in A$$
  Almost all reinforcement learning algorithms are Generalized Policy Iteration (GPI) methods:

  Maintain approximate value and policy functions
  The policy is iteratively improved with respect to the value function, while the value function is evaluated with respect to the policy
  This feedback loop converges to optimal policy and value functions
  the value function is used to structure and constrain the policy search

Dynamic Programming algorithms are at one extreme of RL methods requiring a perfect model of the environment and typically exponential computation cost. They are used for the theoretical underpinning of reinforcement learning as opposed to practical use. Briefly, dynamic programming involves finding optimal solutions by progressively building from optimal solutions to sub-problems.
  At the other extreme, Monte Carlo (MC) methods have no model and rely solely on experience from agent-environment interaction. The value of a state s is computed by averaging over the total rewards of several traces starting from s. These methods require completing entire episodes (traces) before the value function can be updated.
  Temporal Difference (TD) learning is an invaluable approach that combines advantages from both DP and MC methods. As the name implies, valuation updates are done recursively by the difference between time steps. It does not require an environment model like DP, and unlike MC it can update prior to episode completion. Like MC, it learns directly from experience, and like DP it iteratively updates estimates. This is easiest to show by comparing the value function update rules:
  Monte Carlo
  $$V(S_t) \leftarrow V(S_t) + \alpha [G_t - V(S_t)]$$
Dynamic Programming
  $$v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$$
Temporal Difference
  $$V(S_t) \leftarrow V(S_t) + \alpha [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$$
where $$\alpha$$ is the learning rate, and the term
  $$R_{t+1} + \gamma V(S_{t+1})$$ is the updated estimate of $$V(S_t)$$
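The Monte Carlo and TD update rules above can be sketched for a tabular value function stored in a dict. The state names and numbers are made up for illustration; the point is that the MC update needs the full return G, while the TD(0) update only needs one reward and the current estimate of the next state.

```python
def mc_update(V, state, G, alpha):
    """Monte Carlo: move V(s) toward the observed return G of a finished episode."""
    V[state] += alpha * (G - V[state])


def td0_update(V, state, reward, next_state, alpha, gamma):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])


V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", reward=0.5, next_state="B", alpha=0.1, gamma=0.9)
print(V["A"])  # approximately 0.14

V_mc = {"A": 0.0}
mc_update(V_mc, "A", G=2.0, alpha=0.1)
print(V_mc["A"])  # 0.2
```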
SARSA is a TD algorithm that is an ‘on-policy’ learning method. On-policy methods evaluate and improve the same policy $$π$$ that is used to make the action decisions.
  In contrast, an off-policy method uses two policies: a behavioural policy $$b$$ that is more amenable to explore traces outside of current optimal estimates, and the target policy $$π$$ to be optimized.
  Q-learning is an off-policy TD method. It is defined by:
  $$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)]$$
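A minimal tabular sketch of the Q-learning update, with Q stored in a `defaultdict` keyed by (state, action). The state and action names are hypothetical, and terminal states (where the bootstrap term should be dropped) are not handled here.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 2.0
q_learning_update(Q, "s0", "left", reward=1.0, next_state="s1",
                  actions=actions, alpha=0.5, gamma=0.9)
print(Q[("s0", "left")])  # approximately 1.4
```

The `max` over next actions is what makes the method off-policy: the target assumes greedy behaviour regardless of which action the behaviour policy actually takes next.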
  Deep Q-learning (DQN) uses deep neural networks for the policy and value functions. The cost function for DQN is
  $$\left( \mathrm{DQN}_{\text{net}}(S_t, a) - \left(r + \gamma \max_{a'} \mathrm{DQN}_{\text{net}}(S_{t+1}, a')\right) \right)^2$$
  An important addition to the architecture is experience replay by the use of a memory D. Agent experiences are stored as tuples $$e_t = (s_t, a_t, r_t, s_{t+1})$$ over many episodes. During training, minibatches of samples are taken from D at random for standard SGD optimization.
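A minimal sketch of the replay memory D described above, assuming a fixed capacity with oldest-first eviction and uniform random sampling. The class name, field names, and capacity are illustrative choices.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])


class ReplayMemory:
    """Fixed-size store of experience tuples with uniform minibatch sampling."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off when full

    def push(self, state, action, reward, next_state):
        self.buffer.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=100)
for step in range(150):
    memory.push(state=step, action=0, reward=1.0, next_state=step + 1)
batch = memory.sample(8)
print(len(memory), len(batch))  # 100 8
```

Sampling at random breaks the correlation between consecutive steps of one episode, which is the reason the minibatches are drawn from D rather than from the latest trajectory.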
  Rainbow DQN combines several architectural innovations to vanilla DQN that have proven to be beneficial:

  Double deep Q-Learning
  Duelling DQN
  Action Advantage
  Noisy Networks
  Multi-step Learning
  Prioritized Experience Replay

Lastly, Policy Gradient Methods directly optimize a parameterized, differentiable policy function that does not require the use of the value function for action selection. For example, the REINFORCE Monte Carlo Policy Gradient algorithm trains policy $$\pi(a \mid S_t, \theta)$$ with parameters $$\theta$$ by the update rule
  $$\theta_{t+1} = \theta_t + \alpha \gamma^t G_t \frac{\nabla \pi(A_t \mid S_t, \theta_t)}{\pi(A_t \mid S_t, \theta_t)}$$
where $$\alpha$$ is the learning rate, $$\gamma$$ is the discounting factor, and $$G_t$$ is the return from time step $$t$$ to the end of the episode. A useful algebraic trick is to reformulate the right-most term above as
  $$\frac{\nabla \pi(A_t \mid S_t, \theta_t)}{\pi(A_t \mid S_t, \theta_t)} = \nabla \ln \pi(A_t \mid S_t, \theta_t)$$
points

  Markov Decision Process (MDP) + Markov Property
  Env
    
      states
      rewards
      transition
    
  
  recursive, iterative, game, terminate, infinite, trace/trajectory/episode, goal, G, R/rrr,
    
      discounting
      exploration vs exploitation
        
          e-greedy
          1-armed bandit
        
      
  Agent
    
      policy
        
          actions
          epsilon-greedy
        
      
      value
        
          value functions and Bellman Equation
        
      
  Almost all RL algorithms are GPI
    
      Maintain both an approximate value function and an approximate policy
      Iteratively improve policy with respect to value function, and value function always drives to the value function of the current policy
      Overall process converges to optimal policy and optimal value function: Generalized Policy Iteration (or in PGMs?)
    
  
  Dynamic programming
    
      model vs model free - simulates future states
      Florians:
        
          Use value function to perform a structured search of good policies
          Iterative approximations v1, v2, v3, v4, … of vπ(s) by using Bellman equation as update rule
          Replace v(s) with new value calculated by the old values of v(s’). This is called expected update.
          Terminates once the value function changes minimally after an iteration
        
      
  Monte Carlo Methods
    
      DP requires distribution of the next events
      Monte Carlo based methods rely only on experience. No prior knowledge of the environment is required. Averages sample returns (remember k-armed bandits)
    
  
  TD learning
    
      Compare to DP and MC
        
          Does not require model of environment (unlike DP)
          MC needs to wait until the episode finishes, TD can update online
          MC it’s hard to estimate value of action-state pair
        
      
      On-Policy, Off-Policy
        
          off-policy
            
              Importance Sampling
              Transform Weight
              Weight Importance Scaling
            
          
      SARSA
      Q-Learning is also a temporal difference learning algorithm. However, unlike SARSA, it is off-policy.
    
  
  Deep Q-Learning
    
      ?optimal bellman for QL
      Experience Replay
      All the learnable parts: policy, value fns,
      Rainbow DQN
        
          Double deep Q-Learning
          Duelling DQN
          Action Advantage
          Noisy Networks
          Multi-step Learning
          Prioritized Experience Replay
        
      
  Policy Gradient Methods
    
      describe
      Regression towards optimal policy - we don’t know optimal policy
      REINFORCE Monte Carlo Policy Gradient Control
      stronger convergence guarantees
      Deep Deterministic Policy Gradients
      Actor-Critic
      DDPG Deep Deterministic Policy Gradient algorithm
    
  
In all the discussion up to now, any model was learned indirectly by translating reward information into a loss function. Policy Gradient Methods directly apply the reward signal to the gradient updates of the policy function.
draft
An agent must learn the value of states
value fns expanded:
  $$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) [r + \gamma v_\pi(s')], \text{ for all } s \in S$$
  q version:
  $$q_\pi(s,a) = \sum_{s', r} p(s', r \mid s, a) \left[r + \gamma \sum_{a'} \pi(a' \mid s') q_\pi(s', a')\right], \text{ for all } s \in S, a \in A$$
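The expanded Bellman equation for the state-value function can be used directly as an update rule (iterative policy evaluation). Below is a minimal sketch on a made-up two-state MDP. The data layout, where `dynamics[(s, a)]` is a list of `(next_state, reward, probability)` triples, and all names and numbers are assumptions for illustration.

```python
def policy_evaluation(states, actions, policy, dynamics, gamma, tol=1e-10):
    """Sweep the Bellman expectation update until the values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            new_v = sum(
                policy[s][a]
                * sum(p * (r + gamma * V[s2]) for (s2, r, p) in dynamics[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


states = ["s0", "s1"]
actions = ["stay", "go"]
dynamics = {
    ("s0", "stay"): [("s0", 0.0, 1.0)],
    ("s0", "go"): [("s1", 1.0, 1.0)],   # only this transition pays a reward
    ("s1", "stay"): [("s1", 0.0, 1.0)],
    ("s1", "go"): [("s0", 0.0, 1.0)],
}
uniform = {s: {a: 0.5 for a in actions} for s in states}
values = policy_evaluation(states, actions, uniform, dynamics, gamma=0.9)
print(values)  # approximately {'s0': 2.75, 's1': 2.25}
```

Solving the two linear Bellman equations by hand gives the same values, 2.75 and 2.25, which is a quick check that the sweep converges to the fixed point.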
The foundation of all Reinforcement Learning (RL) is the agent in an environment. From this flow all aspects of RL

  agent volition (policy)
  agent

The agent is an actor which receives information about the environment, acts so as to make a change
  An agent that acts, receives input
Environments can be discrete or continuous. State changes can be deterministic or not.
  An environment can be described by its states, the allowed actions at each state, and a transition function $$t(s,a) = s'$$ that accepts a state s and an action a, and returns the next state. If the environment is not deterministic, then $$t(s,a)$$ returns $$p(S')$$, the probability vector where each $$p(s_i) \in p(S')$$ is the probability that the action will result in the environment changing to state $$s_i$$.
  A game such as chess is discrete and deterministic; players take turns one at a time, and moving the pieces has certain, well defined state transitions. A fisherman who is fishing has neither. The environment state is continuously changing, and even though the fisherman performs the action sequence of fishing, the result of catching a fish is not certain, but is a probability.
  Notice that fishing was described as a sequence of actions. Such a sequence is called a trace (or trajectory). The sequences are recursive in nature: each step follows from the previous one. The chess game eventually terminates, so all possible trajectories are finite. Some agent-environment loops are indefinite, such as gambling with a one-arm bandit (a casino slot machine).
  A game with a one-arm bandit has a goal to make money over time; in that case the player is rewarded $$r_i$$ each time the bandit returns a win at time $$t_i$$.
  But each move costs, each non-winning state has a reward of say -1.
  In chess there is a single goal G to win the game, there are few terminal states that have a reward, and the intermediate states have none. But some states are more likely to result in a win and thus are more valuable. An agent has associated with it both a value function and a policy function. It is possible for an agent to have
  An agent must learn the value of states
[.] capstone edit #4 1st edit

[.] capstone edit #9 2nd edit

[.] capstone edit #4 2nd edit

Breakout 4


  pdf reading [2019-10-21 Mon 13:49], about 1.5 hours till now.
    
      before hand sporadic
    
  
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Breakout 7
Title of the paper: A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
  Team members: Larry Li , Nick Buryonk , Gurinder Ghotra
  Contributions:
Breakout 8
Title of the paper: Deep Reinforcement Learning in Large Discrete Action Spaces		Team members: Eric Djona Fegnem, Ariel Wang, Brendan McGivern, Yuanhui Lang, Ike Okonkwo
  Contributions:
Breakout 9
Title of the paper: AlphaD3M: Machine Learning Pipeline Synthesis
  Team members: Most Husne Jahan, Gunjan Lakhlani, Andriy Kourinnyi, Alireza Darbehani
  Contributions:

  1st sesh, distracted.
    
      didn’t count this,
    
  
Breakout Groups [0/0]

Session 1 [0/0]
how to write own env
  keras or el

  Policy:
    
      defines how our AI chooses its action for a given state
    
  
  Reward Signal:
    
      the reward for a given action (not state?)
    
  
  Value function:
    
      Expected future rewards of a state (how good is a given state)
    
  
  Environment models:
    
      Simulates future states, allows for planning
      
    
Session 2 [1/1]

[x] prep

links

https://awwapp.com/#
  https://www.tutorialspoint.com/free_online_whiteboard.htm
  https://drive.google.com/drive/folders/1f8okO2XDraTJydT7MetOKuPrU8U1pBU2
  https://colab.research.google.com/drive/1ymctv7BRNG9UL7Tara_vyYgi103sdMnW#scrollTo=AT8EfnJGgbc7
  https://colab.research.google.com/drive/1zEgL1uYgtJgZMJ00LBKPs02dOkOnnVa8#scrollTo=CeiiZKYVgcVq
  https://drive.google.com/drive/folders/1ffIXP5k9k5Rcvbo4ouAXFJeR8XoTcUIb
  https://colab.research.google.com/drive/13tk0npWP6JhEx9JVw0bdcC8u3skD2fm0#scrollTo=5zH8U8OGe_fd
Session 3 [0/0]

MLops Workshop [22/44]
**** Notes
  ***** My AWS Account Info
AWSAccessKeyId=AKIAICLYXL5KD3AURP4A
AWSSecretKey=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
region=us-east-2

***** My Azure Account Info
  az account show
{
  "environmentName": "AzureCloud",
  "id": "38e53cfe-df59-42bc-ac0c-b50136568522",
  "isDefault": true,
  "name": "Free Trial",
  "state": "Enabled",
  "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272",
  "user": {
    "cloudShellID": true,
    "name": "live.com#willy.rempel@gmail.com",
    "type": "user"
  }
}

azureProfile.json
{"subscriptions": [{"id": "38e53cfe-df59-42bc-ac0c-b50136568522", "name": "Free Trial", "state": "Enabled", "user": {"name": "willy.rempel@gmail.com", "type": "user"}, "isDefault": true, "tenantId": "0b7a2c43-b11b-4048-a4ba-cf3fdd2b2272", "environmentName": "AzureCloud"}]}

ML service workspace

  name azure-ml-ws-1
  Subscription Free Trial
  Resource group cloud-shell-storage-eastus
  Location East US 2

***** Important links doc
  ****** Azure instructions
  • Set up your free Azure Credit
  • Install mini conda for Python 3.7: https://docs.conda.io/en/latest/miniconda.html (make sure you updated the Path for conda command)
  • Run the following commands
  • conda create --name azureml python=3.7
  • conda activate azureml
  • conda install scikit-learn
  • pip install tensorflow==1.14
  • pip install azureml-sdk[explain,automl,notebooks,automl,services]
  • pip install pandas
  • pip install jupyter
  • Github:
  • Install github desktop
  • Create a github account
  • Set up github on your machine (login etc)
  • Install VSCode
  • Install extension: (https://code.visualstudio.com/docs/editor/extension-gallery)
  • Python (Author: Microsoft)
  • Azure Account (Author: Microsoft)
  • Azure Machine Learning (Author: Microsoft)
  • Git Graph (Author: mhutchie)
  ***** session1 notes

  10-15 years of DevOps
  MLops
    
      own entire lifecycle: build and deploy
      IT (Ops) only focus on infrastructure
      at least be able to talk to Ops team,
    
  
  cloud
    
      managed resources
      huge abstraction
      serverless, auto scalability
      Separation of resources
      lower cost
    
  
  @1:07 git starts

**** Archive
  **** TODOs [22/43]
  ***** AWS Study [2/7]
  ****** [x] sagemaker api
  ****** [x] boto3 api
  ****** [.] aws deepdive series
  ******* vid1

  managed notebook EC2 VM instance, managed means
    
      doesn’t show up in EC2 console
      no SSH access
    
  
  EBS volume 5GB default
    
      persists
    
  
  add, create git repo
  config shell 15min time limit, use ‘&’
  Elastic Inference – attach GPU
  

****** [-] aws cli
  ****** [.] build own pipeline
  might need more, check serverless repo
  ******* [.] yaml formation
  ******* [.] aws shell
  ***** Azure Study [0/2]
[.] yaml files method

[.] relation of dev.azure.com , portal.azure.com

Info for Docs For all doc entries

Here are the topics and the respective contents that need to be created for AWS and GCP:
  For every bullet for each day we need to gather:

  Title and a short description of the technology in a paragraph or two. This will be used to explain technology.
  2-3 links for further studies.
  If it requires implementation: simple notebook. If it’s an architecture 2-3 images

Hossein mentioned - appendices included for hands-on and home-work (HW)
Doc Day 1: [18/18]

[c] Overall Architecture for ML Stack
Overall Architecture for ML Stack in GCP and AWS (one or two paragraphs + architecture diagram)
[c] CI | CD  frameworks on AWS and GCP + Integrating them with GCP AI Hub or AWS SageMaker for ML Pipelines (GitOps) (Simple diagram showing the flow + Short Notebook if applicable)


  native clouds, or FOSS as example
  data engineers

[x] Experiment tracking tools (one or two paragraphs + code samples + screenshot if applicable)


  tracking/logging metrics in AWS
  hard to find?
  exptrack – collect logs, vs #4 below.
  autoML –
[x] Title and a short description of the technology
Title and a short description of the technology. This will be used to explain technology. 2 paragraphs

  autoML algos are in marketplace

AWS Search

  Find
  Evaluate
  Verify datasets used by training jobs
  Trace Model lineage
    
      dataset
      algorithm
      hyperparameters
      metrics
    
  
Tags are used to track experiments and group them together. You apply them in your code, and can search using either the AWS Console, a web front-end, or by the API.
AutoML - this is offered by the AWS marketplace, where there are several options
[x] 2-3 links for further studies.
AWS Documentation:
  https://docs.aws.amazon.com/sagemaker/latest/dg/search.html
Sample Notebook:
  https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/search/ml_experiment_management_using_search.ipynb
[x] If it requires implementation: simple notebook. If it’s an architecture 2-3 images
[x] code sample
[x] screenshot
[x] Hyperparameter tuning techniques and parallel training engines in GCP or AWS (one or two paragraphs + code samples + screenshot  if applicable)
Notes reading AWS docs

  bayesian search has good tutorial links
  built-in come with metrics
    
      metrics found in cloudwatch logs
    
  
  define metrics. send to ??? https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html
  
[x] title + 2 paragraphs: Title and a short description of the technology. This will be used to explain technology.
  Define Metrics
  Define Hyperparameter Ranges
  Early Stopping Options
  Bayesian or Random Search Options
[x] 2-3 links for further studies.
AWS Documentation:
  https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html
Example from Documents:
  https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html
[x] If it requires implementation: simple notebook. If it’s an architecture 2-3 images
[x] code sample
[x] screenshot
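The four pieces listed above (define metrics, define hyperparameter ranges, early stopping, Bayesian or random search) map onto a tuning job request roughly as below. Sketch only: the metric name, regex, sample log line, and the `learning_rate` / `batch_size` ranges are made-up examples, and the field names follow the boto3 `create_hyper_parameter_tuning_job` request as I understand it, so check them against the docs before use. Nothing is submitted to AWS.

```python
import re

# A metric definition is a name plus a regex applied to the training logs.
# Both the name and the log format here are hypothetical.
metric_definition = {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9.]+)"}


def build_tuning_config(strategy="Bayesian", early_stopping="Auto"):
    """Return a HyperParameterTuningJobConfig-style dict (not submitted anywhere)."""
    return {
        "Strategy": strategy,  # "Bayesian" or "Random"
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": metric_definition["Name"],
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 20,
            "MaxParallelTrainingJobs": 2,
        },
        "ParameterRanges": {
            # Range bounds are passed as strings in this request.
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1"}
            ],
            "IntegerParameterRanges": [
                {"Name": "batch_size", "MinValue": "32", "MaxValue": "256"}
            ],
            "CategoricalParameterRanges": [],
        },
        "TrainingJobEarlyStoppingType": early_stopping,  # "Auto" or "Off"
    }


# Local check that the regex pulls the metric out of a sample log line.
sample_log_line = "epoch 3 val_acc=0.9712 loss=0.08"
match = re.search(metric_definition["Regex"], sample_log_line)
extracted = float(match.group(1))

config = build_tuning_config(strategy="Random")
print(extracted, config["Strategy"])
```

Testing the regex locally against a real log line first is cheap, since a metric that never matches only shows up after the tuning job has already started.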
[x] Q for Hossein


  is he presenting our stuff or are we?
    
      am I doing online?
    
  
  What is expected of us? thorough knowledge of our cloud
  FOSS do I pick my own favourite?
  NO: do we talk about the high level services (turn-key)

[x] addition from Hossein
Important Update - extra actions before Day 1 please accomplish two extra steps after: conda activate azureml
  • pip install pandas
  • pip install jupyter
  After all of the steps for number 2 are finished (after pip install azureml-sdk …) => test whether AzureML SDK is properly installed:
  In command prompt: make sure azureml conda environment is activated (conda activate azureml)
  type: python
  type: import azureml.core
  type: print(azureml.core.VERSION)
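The same install check can be run as a script instead of typing it into the prompt. A minimal sketch: the helper name `installed_version` is my own, and it only assumes that `azureml.core` exposes `VERSION`, as the steps above say.

```python
import importlib


def installed_version(module_name, attr="VERSION"):
    """Return the module's version attribute, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, "unknown")


version = installed_version("azureml.core")
if version is None:
    print("azureml.core not importable; is the azureml conda env active?")
else:
    print("azureml.core", version)
```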
[x] check Werners diagram

[x] sagemaker search
only on console?
[#A] Day2 [2/14]

Doc Day 2:
[.] Model Management / Model Store (one or two paragraphs + code samples (register model/access model) + screenshot if applicable)
[.] Containerize the model into an image (one or two paragraphs + code samples (register model/access model) + screenshot if applicable)
[.] Perform a dummy test at model registration time into the model store - (one or two paragraphs + code samples + screenshot  if applicable)
[.] Integrate CI/CD pipeline with the model store using the dummy test (one or two paragraphs + code samples + screenshot  if applicable)
[x] move D Rangel

[.] G3:Nour helpful extra stuff


  ie. protect a branch

[x] session2 prep - tuesday night deck and notebooks (drafts), go thru

[.] pre-session pack

[.] TAs more proactive - howto for online?

[.] code samples and guides to do capstone. also for future pipeline builds
[.] AWS Notebooks
[.] Q data sanity check

[.] Q telemetry info

[.] Q register model

[.] G3 M2

Doc Day3: [0/0]


  discuss different deployment scenarios - batch and online
  build release pipeline for model deployment
  monitoring and logging techniques for ML model in the wild
  best practices to build scalable ML pipelines
  bring model explainability in the pipeline (training and deployment)

[.] notebook fixes

cell 14 : add `model_name = "tf_mnist_pipeline.model"`
  cell 15 : create score directory in root folder
  cell 19 : add `import os`
  cell 19 : `Execution script score.py doesn’t exist.`
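A small helper covering the cell 15 and cell 19 fixes: create the score directory and check for the script before the notebook reaches the step that complains. Assumption on my part: that `score.py` is expected inside the `score` directory; the notes only say the directory goes in the root folder. The demo runs in a temporary directory so nothing is created in the working tree.

```python
import os
import tempfile


def prepare_score_dir(root="."):
    """Create <root>/score if missing and report whether score.py is there."""
    score_dir = os.path.join(root, "score")
    os.makedirs(score_dir, exist_ok=True)
    script_path = os.path.join(score_dir, "score.py")
    return score_dir, os.path.isfile(script_path)


model_name = "tf_mnist_pipeline.model"  # the name added in cell 14

# Demo in a throwaway directory; in the notebook call prepare_score_dir() as is.
with tempfile.TemporaryDirectory() as tmp:
    score_dir, has_script = prepare_score_dir(tmp)
    created = os.path.isdir(score_dir)

print(created, has_script)
```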
[#A] MLops Capstone: [0/1]

[.] [#B] Capstone & Documents Guide (meet Hossein) [2019-10-07 Mon 18:00]


  pipeline= series of transforms
  data pre-process keep separate from model building
  [ ] actual compute targets
  [ ] bring in data from redshift? other than S3
  [ ] what enviros for work?
  reproducible enviro for training for day1
  multi-pipelines options for the whole pipeline, so people know about them
  be able to coach ppl on how to do the work on cloud use for capstone
  [ ] 1st part architecture: data - preprocc - train - etc
    
      deploy, orchestration next day
    
  
  end-to-end initial, what it looks like
  exp tracking , any logs, not just
  use these as guides: azure experiments, mlflow experiments
  [ ] high level understanding for initial
    
      [ ] find some github links for examples
    
  
  [ ] separate g docs for days
  use boto, SDK
  they pick a project that groups build from beginning to end
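The first two points of the list above (pipeline = series of transforms, preprocessing kept separate from model building) can be sketched in plain Python. The function names and toy data are invented for illustration; a real capstone pipeline would use the cloud SDK's pipeline steps rather than this.

```python
from functools import reduce


def run_pipeline(steps, data):
    """Apply each transform in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, data)


# --- preprocessing stage: knows nothing about the model ---
def drop_missing(rows):
    return [r for r in rows if r["x"] is not None]


def scale(rows):
    largest = max(r["x"] for r in rows)
    return [{"x": r["x"] / largest, "y": r["y"]} for r in rows]


# --- model-building stage: only sees preprocessed rows ---
def fit_mean_model(rows):
    """Toy 'model': predicts the mean of y."""
    return sum(r["y"] for r in rows) / len(rows)


raw = [{"x": 2.0, "y": 1}, {"x": None, "y": 0}, {"x": 4.0, "y": 3}]

preprocess = [drop_missing, scale]
features = run_pipeline(preprocess, raw)
model = run_pipeline([fit_mean_model], features)
print(features, model)
```

Keeping `preprocess` as its own list means the same transforms can be reused at inference time without touching the training code.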

? help Jiri with their AWS documents - Joffri

meeting

meet #1 [2019-10-01 Mon]


  Azure
    
      focused
    
  
  FOSS stacks as well, and theoretical aspects
    
      QFlow, etc ???
      Spod,
      dataprod, databricks,
      ppl tend to use managed FOSS
      focus for enterprise ready on the cloud
    
  
  including as alt to azure
    
      AWS
        
          TODO US: for AWS
            
              slides
              simple notebook
                
                  ie. track metrics on aws or gcp
                
              
      google cloud compute (GCP) - other TAs
    
  
  full ML pipeline finished by end of WK3

meet #2 fri [2019-10-04 Fri]


  git, more advanced
    
      team focused
      1hr including hands-on, some examples
      cherry picking, bisecting optional HW
    
  
  some contest? missed. Monday, tue

Day1 Werner meets

werner call


  sagemaker - fancy version of jupyter
    
      some stuff, not all.
    
  
  AWS behind
  exp tracking
  sparkML -> big data pipeline
  ie ML exp & metrics https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-track-experiments

meet 2.1 Werner


  sagemaker: no container,
  ci/cd use CodePipeline
  AWS acct:
    ML-Ops Staff
  sagemaker: do everything there.
  sagemaker estimator least flexible
    
      [ ] can we go lower level?
    
  
  AWS lambda for inference?
  CloudFormation - it is the orchestration
    
      use a definition, and reverse engineer
      ECR - Amazon's container registry (Elastic Container Registry)
      amazon dynamoDB for logging etc, but you are not stuck with it
      cloudformation button, deploy whole thing
    
  
  Xiyang will send AWS

meet 2.2 Werner tryout ci/cd on aws
Action execution failed
  Parameter validation failed: Invalid length for parameter InputDataConfig, value: 0, valid range: 1-inf
Input artifact
  bld
  Output artifact
  trigger
  FunctionName
  aws-mlops-model-cicd-pipeli-LambdaSageMakerTrigger-DKC87JX8ZI2C
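The validation error above says `InputDataConfig` arrived as an empty list, while the API wants at least one channel. A sketch of building a non-empty channel list and failing early on the client side; the channel name and S3 URI are placeholders, and the field names follow the boto3 `create_training_job` request shape as I understand it.

```python
def build_input_data_config(channels):
    """channels: dict mapping channel name -> S3 URI. Returns an InputDataConfig list."""
    if not channels:
        # Mirrors the service-side rule from the error: valid range is 1-inf.
        raise ValueError("InputDataConfig needs at least one channel")
    return [
        {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
        for name, uri in channels.items()
    ]


# Placeholder bucket and prefix, not a real location.
config = build_input_data_config({"train": "s3://example-bucket/train/"})
print(len(config), config[0]["ChannelName"])
```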
meet #3 fri [2019-10-11 Fri]


  50-55 ppl
  feedback:
    
      lots about git
      issues about enviros and setup
      some too slow, some too fast
      etc
    
  
  Amir:
    
      cutting down material
      TAs more proactive
        
          howto for online?
          fixed (poll) hours
        
      
meet #4 [2019-10-14 Mon]


  2nd session
    
      continue with git
      reduced and abstracted content a lot
      platform agnostic content
        
          but gets advanced, not enough time
          give extra material
          additional session for advanced students
        
      
      will not cover: containerization, deploy dockers
        
          not trivial
        
      
      MLops is devops for ML
    
  
  3rd deploy to kub or docker into wild – from telemetry make data driven actions
    
      kubernetes env on AWS and howto get telemetries?
      refresher of entire workshop
    
  
meet #5 [2019-10-18 Fri]

Breakout Groups [0/0]


  TA hour 1 [2019-10-12 Sat] : G1, G3

G1
Jiri Stodulka
  Michael Smart
  Ramya Balasubramaniam
  Zain Nasrullah
TA hour 1 [2019-10-12 Sat]


  Jiri
    
      workshop circle with Joffri on AWS
      aisc: Omar recommender systems
        
          RL recommender system
        
      
      wants to do both Azure & AWS, will do Azure with Zain
    
  
  Zain, has less time, will do Azure only

G2
Alvin Jin
  Doug Rangel
  Farhan
  Lediona Nishani
meet #1


  irl for their next meetup sat.
  mnist model

G3
Alex
  Fatin Haque
  Andriy Kourinyi
  Nour Fahmy
TA hour 1 [2019-10-12 Sat]


  cloud agnostic for Fatin
  tracking, reproducibility, dockerimages?,
    
      anyone else to join?
      fyi : terraform instead of cloudformation at his job
    
  
Code Stuff

python Ref

TheAlgorithms/Python: All Algorithms implemented in Python
python libs + datsci advice notebooks


  PYCON UK 2017: Machine learning libraries you’d wish you’d known about - YouTube - great talk, seasoned datsci. followed him on github.
    
      ianozsvald/data_science_delivered: Observations from Ian on successfully delivering data science products - his repo with real-world advice for production and scale datsci, with notebooks. repo cloned in ~/DevAcademics/PythonNotebooks/data_science_delivered
      scikit-learn-contrib/forest-confidence-interval: Confidence intervals for scikit-learn forest algorithms - via readme from above. How good are the models?
      will install all 5 libraries he mentioned.
    
  
python libs II


  Web REST API Benchmark on a Real Life Application – Mihai Cracan – Medium
  Top 20 Python libraries for data science in 2018 | ActiveWizards: data science and engineering lab , this started it all for last couple of days. (found via linkedin)
  Graphviz - Graph Visualization Software
  StatsModels: Statistics in Python — statsmodels 0.9.0 documentation
  Overview — ELI5 0.7 documentation
    
      Mikhail Korobov - Explaining behavior of Machine Learning models with eli5 library - YouTube
    
  
  ONNX - Getting Started
    
      onnx/onnx: Open Neural Network Exchange
    
  
  NLP
    
      gensim: Topic modelling for humans
      spaCy · Industrial-strength Natural Language Processing in Python
      Natural Language Toolkit — NLTK 3.3 documentation
    
  
  PYCON UK 2017: Machine learning libraries you’d wish you’d known about - YouTube
    
      DistrictDataLabs/yellowbrick: Visual analysis and diagnostic tools to facilitate machine learning model selection.
      marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier
      EpistasisLab/tpot: A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
      Dask: Scalable analytics in Python
    
  
  xgboost/python-package at master · dmlc/xgboost
    
      Installation Guide — xgboost 0.72 documentation
      How to Install XGBoost for Python on macOS
        
          in the comments guy did conda install -c conda-forge xgboost, saves lots of steps
        
      
ob-ipython [2018-01-03 Wed]

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.hist(np.random.randn(20000), bins=200)


print("hello world")

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.hist(np.random.randn(20000), bins=200)


import IPython.kernel.multikernelmanager as km
# km.list_kernel_ids()
# km.MultiKernelManager.list_kernel_ids()

  # print print_me
  # print "hello world"
# print "another ipython kernel!"

some tensorflow [2018-09-14 Fri]

gradientTape example
import tensorflow as tf

# On TF 1.x, call tf.enable_eager_execution() first.
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)  # constants are not watched automatically
    y = x * x
dy_dx = g.gradient(y, x)  # evaluates to 6.0
print(dy_dx)


diveintopython book [2018-01-05 Fri]

CH 4 The power of introspection

4.1 apihelper.py
def info(object, spacing=10, collapse=1):
    """Print methods and doc strings. Takes module, class, list, dictionary, or string."""
    # Names of all callable attributes on the object.
    methodList = [method for method in dir(object) if callable(getattr(object, method))]
    # Optionally collapse runs of whitespace in each docstring to single spaces.
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print("\n".join(["%s %s" %
                     (method.ljust(spacing),
                      processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList]))

if __name__ == "__main__":
    print(info.__doc__)


import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')
print(sym.latex(sym.Integral(1/x, x)))

pdb debug


  l list
  n next
  c continue
  s step
  r return
  b break
  And python

code to check if in a venv

import sys

def is_venv():
    return (hasattr(sys, 'real_prefix') or
            (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix))
  
The check for sys.real_prefix covers virtualenv, the equality of non-empty sys.base_prefix with sys.prefix covers venv.
  Consider a script that uses the function like this:
if is_venv():
    print('inside virtualenv or venv')
else:
    print('outside virtualenv or venv')
Code Snippets

open all csv files in a dir and do some row stuff
  36.11. pipes — Interface to shell pipelines — Python 2.7.14 documentation
  13.1. csv — CSV File Reading and Writing — Python 2.7.14 documentation
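import os
import csv

path = os.getcwd()

for filename in os.listdir(path):
    # Skip outputs of earlier runs so they are not edited again.
    if not filename.endswith('.csv') or filename.endswith('_edited.csv'):
        continue

    # newline="" is what the Python 3 csv docs recommend for readers and writers.
    with open(os.path.join(path, filename), newline="") as f:
        new_data = []
        for row in csv.reader(f):
            if row:  # blank lines come back as empty lists
                row[-1] = row[-1].replace("S-D", "S")
            new_data.append(row)

    newfilename = filename[:-len(".csv")] + "_edited.csv"
    with open(os.path.join(path, newfilename), "w", newline="") as f:
        csv.writer(f).writerows(new_data)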
Dave’s fancy time series plot

import matplotlib.pyplot as plt
import numpy as np

# `data` is assumed to be a sequence of 1-D series defined elsewhere.
plt.style.use('dark_background')
cmap = plt.get_cmap('viridis')
colors = cmap(np.linspace(0, 1.0, len(data)))
for i, series in enumerate(data):
    plt.plot(series, color=colors[i])
plt.show()
Hy

A mile Hy - My experience with lispy Python | Modern Emacs

  setv - set variables
  cond - cases wrapped in []
  do = progn
  (for [i (range 10)] …)

try some hy

Kitchin loves it.
(import numpy)
(setv a (numpy.array [1 2 3]))
(setv b (numpy.array [1 2 3]))
(print (numpy.dot a b))
(defn simple-conversation []
  (print "hello! yadda yadda")
  (setv name (input "What name? "))
  (setv age (input "What age? "))
  (print (+ "hello " name "! I see you are " age " years old.")))

(simple-conversation)
sample ob-ipython

a = 5
b = 2**5
hi = "Hello World!"

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

a = 888


# plt.hist(np.random.randn(20000), bins=200)

# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# import tensorflow as tf

plt.hist(np.random.randn(20000), bins=200)

# import tensorflow as tf

x = np.random.randn(10000)
y = np.sin(x)

import numpy as np
import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)

# print(node1, node2)
# [foo(x) + 7 for x in range(20)]
"what?"

sess = tf.Session()

# return (node1, node2)

# return (sess.run([node1, node2]))

node3 = tf.add(node1, node2)
return ("sess.run(node3):", sess.run(node3))
# %matplotlib inline
# import matplotlib.pyplot as plt
# import numpy as np
# import tensorflow as tf

# plt.hist(np.random.randn(20000), bins=200)

# def foo(x):
#     return x + 9

# [foo(x) + 7 for x in range(7)]

import sympy as sym
x = sym.Symbol('x')
k = sym.Symbol('k')

print(sym.latex(sym.Integral(1/x,x)))
print(sym.latex(sym.besseli(x,k)))

kitchin

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 20 * np.pi, 350)
x = np.exp(-0.1 * t) * np.sin(t)
y = np.exp(-0.1 * t) * np.cos(t)


plt.plot(x, y)
plt.axis('equal')

plt.figure()
plt.plot(y, x)
plt.axis('equal')

print('Length of t = {}'.format(len(t)))
print('x .dot. y = {}'.format(x @ y))

ditaa org-babel

+---------+
|         |
| Willy   |
|         |
+----+----+---+
|Bar |Baz     |
|    |        |
+----+--------+

++
++

find where ditaa should go
(expand-file-name
             "ditaa.jar"
      (file-name-as-directory
            (expand-file-name
                "scripts"
               (file-name-as-directory
                  (expand-file-name
                      "../contrib"
                      (file-name-directory (org-find-library-dir "org")))))))
+------+   +-----+   +-----+   +-----+
|{io}  |   |{d}  |   |{s}  |   |cBLU |
| Foo  +---+ Bar +---+ Baz +---+ Moo |
|      |   |     |   |     |   |     |
+------+   +-----+   +--+--+   +-----+
                        |
           /-----\      |      +------+
           |     |      |      | c1AB |
           | Goo +------+---=--+ Shoo |
           \-----/             |      |
                               +------+

Scratch

RL

AISC RL Workshop
  DLRL RL workshop
GoogleAI RL Projects

https://opensource.google/projects/dopamine
  https://opensource.google/projects/deepmind-lab
  https://opensource.google/projects/magenta
Headline	Time
Total time	4:48
\_ AISC Math of DL Workshops [4/5]		4:48
Willy Rempel	Jul 25, 2019	” ”	160
Willy Rempel	Jul 17, 2019	1st TA Working Session, Introduction	30
Willy Rempel	Jul 16, 2019	1st Notebook preparation	45
Willy Rempel	Jul 24, 2019	2nd Notebook preparation + related work?	80
		315 / 60 = 5.25 hours	315
		9 hours class
	14 * 14.25 = 199.50	14.25
name	azure-ml-ws-1
Subscription	Free Trial
Resource group	cloud-shell-storage-eastus
Location	East US 2